
Singular Value Decomposition

The Singular Value Decomposition (SVD) is one of the most powerful and widely used tools in linear algebra. Unlike the eigenvalue decomposition, which is only defined for square matrices, the SVD exists for any $m \times n$ matrix $A$. It factors $A$ as $A = U \Sigma V^{\rm T}$, where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal and $\Sigma$ ($m \times n$) is diagonal with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$, the singular values.

The SVD is closely related to the eigenvalues and eigenvectors of the symmetric matrices $AA^{\rm T}$ and $A^{\rm T}A$.

  • The columns of $U$ are orthonormal eigenvectors of $AA^{\rm T}$.
  • The columns of $V$ are orthonormal eigenvectors of $A^{\rm T}A$.
  • The non-zero singular values $\sigma_i$ are the square roots of the non-zero eigenvalues of both $AA^{\rm T}$ and $A^{\rm T}A$:

$$\sigma_i = \sqrt{\lambda_i(A^{\rm T}A)} = \sqrt{\lambda_i(AA^{\rm T})}$$
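This relationship is easy to verify numerically; a minimal NumPy sketch (the matrix size and seed are arbitrary choices for illustration):

```python
import numpy as np

# Arbitrary 4x3 example matrix (any real matrix works).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Singular values of A, returned in descending order.
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A^T A, sorted descending to match the SVD ordering.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# sigma_i == sqrt(lambda_i(A^T A)) up to floating-point error.
print(np.allclose(s, np.sqrt(eigvals)))  # True
```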

The SVD can be thought of as decomposing any linear transformation into three simple steps:

  1. Rotation in the domain ($V^{\rm T}$): Rotate the input vector to align with the principal axes.
  2. Scaling ($\Sigma$): Stretch or compress the vector along these axes by the singular values.
  3. Rotation in the codomain ($U$): Rotate the resulting vector to its final orientation.

This tells us that every matrix maps the unit sphere to a (possibly degenerate) hyper-ellipse. The singular values $\sigma_i$ are the lengths of the semi-axes of this ellipse.
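The three-step picture is just the factored product $U \Sigma V^{\rm T}$ applied right to left; a quick NumPy check that the three factors reconstruct the original transformation (the example matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

# Full SVD: U (rotation in codomain), s (scalings), Vt = V^T (rotation in domain).
U, s, Vt = np.linalg.svd(A)

# Applying rotate -> scale -> rotate recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(A, reconstructed))  # True
```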

One of the most important applications of the SVD is the Eckart–Young–Mirsky theorem. It states that the best rank-$k$ approximation of a matrix $A$ (in either the Frobenius or the spectral norm) is obtained by keeping only the top $k$ singular values and their corresponding singular vectors.

If $A = \sum_{i=1}^{r} \sigma_i u_i v_i^{\rm T}$, where $r$ is the rank of $A$, then the best rank-$k$ approximation $A_k$ ($k < r$) is:

$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^{\rm T}$$

This is the foundation for techniques like Principal Component Analysis (PCA) and image compression.
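A short NumPy sketch of truncated-SVD approximation; it also checks the Eckart–Young prediction that the spectral-norm error of the best rank-$k$ approximation equals the first discarded singular value (the test matrix and $k$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Keep only the top-k singular triplets: A_k = sum_{i=1..k} sigma_i u_i v_i^T.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Spectral-norm error ||A - A_k||_2 equals sigma_{k+1}, the first discarded value.
err = np.linalg.norm(A - A_k, 2)
print(np.isclose(err, s[k]))  # True
```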

In many cases, particularly when $m \gg n$ or the matrix is not full rank, we use the Reduced SVD (or thin SVD). If $A$ is $m \times n$ with $m > n$, the reduced SVD keeps only the first $n$ columns of $U$ and the upper $n \times n$ block of $\Sigma$:

$$A = U_n \Sigma_n V^{\rm T}$$

This is more computationally efficient while preserving all the information needed to reconstruct A\Av.
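In NumPy the thin SVD is requested with `full_matrices=False`; a minimal sketch on an arbitrary tall matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 4))  # tall matrix, m >> n

# full_matrices=False returns the thin SVD: U is 100x4 instead of 100x100.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (100, 4) (4,) (4, 4)

# The thin factors still reconstruct A exactly.
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True
```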

  • Rank: The number of non-zero singular values is equal to the rank of the matrix.
  • Invertibility: A square matrix is invertible if and only if all its singular values are non-zero.
  • Condition Number: The ratio $\sigma_{\max} / \sigma_{\min}$ is the condition number of the matrix, which measures how sensitive the solution of a linear system is to errors in the data.
  • Pseudo-inverse: The Moore–Penrose pseudo-inverse is easily computed as $A^\dagger = V \Sigma^\dagger U^{\rm T}$, where $\Sigma^\dagger$ is obtained by transposing $\Sigma$ and inverting its non-zero singular values.
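A minimal sketch of computing the pseudo-inverse via the SVD, compared against NumPy's built-in `np.linalg.pinv` (the tolerance cutoff below is a common heuristic, not a fixed standard):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the singular values above a small tolerance; zero the rest.
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)

# A^+ = V Sigma^+ U^T
A_pinv = Vt.T @ np.diag(s_inv) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```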