Skip to content

Vector and Matrix Norms

A norm measures the size of a vector or matrix. Different norms encode different notions of “size” (Euclidean length, max coordinate, sum of magnitudes, …) and which one to use is usually dictated by the geometry of the problem.

A function that satisfies (1) and (2) but not (3) is sometimes called a quasi-norm. The third condition is what makes a norm geometrically well-behaved: it forces the unit ball {x:x1}\{\xv : \|\xv\| \le 1\} to be convex.

For p1p \ge 1 and x=(x1,,xn)Rn\xv = (x_1, \ldots, x_n) \in \R^n, define

xp=(i=1nxip)1/p.\|\xv\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}.

Three values of pp get their own names:

  • p=1p = 1: the Manhattan or taxicab norm, x1=ixi\|\xv\|_1 = \sum_i |x_i|.
  • p=2p = 2: the Euclidean norm, x2=ixi2\|\xv\|_2 = \sqrt{\sum_i x_i^2}.
  • p=p = \infty: the sup or Chebyshev norm, x=maxixi\|\xv\|_\infty = \max_i |x_i| (this is the limit of xp\|\xv\|_p as pp \to \infty).

The condition p1p \ge 1 is essential. For 0<p<10 < p < 1 the formula is still defined, but the triangle inequality fails and the unit ball becomes non-convex.

The shape of the unit ball {x:xp1}\{\xv : \|\xv\|_p \le 1\} is the cleanest visual handle on how the norm depends on pp. In R2\R^2:

  • p=1p = 1: diamond (square rotated 45°), vertices at (±1,0),(0,±1)(\pm 1, 0), (0, \pm 1).
  • p=2p = 2: circle of radius 1.
  • p=p = \infty: axis-aligned square, vertices at (±1,±1)(\pm 1, \pm 1).
  • 1<p<1 < p < \infty: a convex shape interpolating between the diamond and the square.
  • 0<p<10 < p < 1: a concave four-pointed star.
p = 2.00
convex (valid norm)
-1-111r = 1.000
r = distance from origin to ball boundary along the diagonal y = x (equals 21/2 − 1/p)

The diagonal extent (the distance from the origin to the ball boundary along the line y=xy = x) is 21/21/p2^{1/2 - 1/p}, plotted live above. It interpolates from 21/20.7072^{-1/2} \approx 0.707 at p=1p = 1 to 11 at p=2p = 2 to 21.414\sqrt{2} \approx 1.414 at p=p = \infty, and shrinks rapidly toward 00 for p<1p < 1. The faint dashed circle is the 2\ell^2 ball for reference.

A central fact in finite-dimensional analysis: all norms on Rn\R^n are equivalent.

Concretely, the p\ell^p norms satisfy

x    x2    x1    nx2    nx,\|\xv\|_\infty \;\le\; \|\xv\|_2 \;\le\; \|\xv\|_1 \;\le\; \sqrt{n} \, \|\xv\|_2 \;\le\; n \, \|\xv\|_\infty,

so convergence in any one p\ell^p norm is convergence in every other. The constants n,n\sqrt{n}, n degrade with dimension, which is exactly why “switching norms” is innocuous in low dimensions but matters in high-dimensional statistics.

A matrix norm is just a norm on the vector space of m×nm \times n matrices. There are two flavors that dominate practice.

Treat the matrix as a long vector and apply the Euclidean norm:

AF=i,jAij2=Trace(ATA)=iσi2,\|\Av\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\text{Trace}(\Av^{\rm T} \Av)} = \sqrt{\sum_{i} \sigma_i^2},

where σi\sigma_i are the singular values of A\Av. The last form is the most useful in proofs: it shows that AF\|\Av\|_F depends only on the singular value spectrum, not on the bases U,V\Uv, \Vv in the SVD. In particular, the Frobenius norm is invariant under orthogonal transformations on either side: Q1AQ2F=AF\|\Qv_1 \Av \Qv_2\|_F = \|\Av\|_F when Q1,Q2\Qv_1, \Qv_2 are orthogonal.

Given a vector norm p\|\cdot\|_p, the induced operator norm of a matrix is

Ap=supx0Axpxp=supxp=1Axp.\|\Av\|_p = \sup_{\xv \ne 0} \frac{\|\Av \xv\|_p}{\|\xv\|_p} = \sup_{\|\xv\|_p = 1} \|\Av \xv\|_p.

This measures the worst-case stretching of the unit ball under A\Av. Three cases have clean closed forms:

  • A1=maxjiAij\|\Av\|_1 = \max_j \sum_i |A_{ij}| (maximum column sum).
  • A=maxijAij\|\Av\|_\infty = \max_i \sum_j |A_{ij}| (maximum row sum).
  • A2=σ1(A)\|\Av\|_2 = \sigma_1(\Av) (largest singular value). Also called the spectral norm.

The spectral norm and the Frobenius norm coincide for rank-1 matrices but differ in general. They satisfy

A2    AF    rA2,\|\Av\|_2 \;\le\; \|\Av\|_F \;\le\; \sqrt{r} \cdot \|\Av\|_2,

where rr is the rank of A\Av. The upper bound is tight when all nonzero singular values are equal; the lower bound is tight when the spectrum is dominated by a single mode.

All induced operator norms (and the Frobenius norm) satisfy

ABp    ApBp,\|\Av \Bv\|_p \;\le\; \|\Av\|_p \cdot \|\Bv\|_p,

which is exactly the property needed for bounding products and powers. For the spectral norm this follows from ABx2A2Bx2A2B2x2\|\Av \Bv \xv\|_2 \le \|\Av\|_2 \|\Bv \xv\|_2 \le \|\Av\|_2 \|\Bv\|_2 \|\xv\|_2.

Submultiplicativity is the reason matrix norms are useful in numerical analysis: error bounds compound as products of operator norms, so a small per-step bound on a transformation gives a small bound on the composition.