Least Squares
When there are more equations than unknowns, usually has no solution: does not lie in the column space . Rather than give up, we ask for the that comes closest, minimizing the residual in the Euclidean norm. This single problem has four standard solutions, and seeing that they agree ties together projection, orthogonality, , and the SVD.
1. The normal equations
Section titled “1. The normal equations”The objective is a convex quadratic, so its minimizers are exactly its stationary points. Setting the gradient to zero, , gives the normal equations.
2. The geometry: projection onto the column space
Section titled “2. The geometry: projection onto the column space”The normal equations say , i.e. the error is orthogonal to every column of . So , and the fitted vector is the orthogonal projection of onto the column space: the closest point of to .
Least squares splits into a part inside and a part orthogonal to it. Minimizing is choosing the point of the plane nearest , the foot of the perpendicular.
When is invertible, substituting gives with the projection matrix
which satisfies and , the algebraic signature of an orthogonal projection.
Fitting a line or parabola to data is exactly this. Each data point contributes a row to and an entry to ; the least-squares coefficients place the curve so the vertical residuals have the smallest total square. Drag the points below and watch the fit and the residual norm respond.
3. Through QR
Section titled “3. Through QR”Forming explicitly is the numerically fragile step. With the factorization ( orthonormal columns, upper triangular and invertible when columns are independent), the normal equations collapse: and , so
This is one triangular solve, and it never forms , so it avoids the conditioning penalty below. Here the projection is .
4. Through the SVD and pseudoinverse
Section titled “4. Through the SVD and pseudoinverse”The most general route uses . The least-squares solution of minimum norm is
where is the pseudoinverse. Unlike the first three routes, this needs no independence assumption: when the columns are dependent there are infinitely many least-squares solutions, and selects the shortest one. This unifies least squares with the underdetermined case and is the subject of the next page.
When the routes disagree numerically
Section titled “When the routes disagree numerically”All four give the same in exact arithmetic. They differ in stability. The normal-equations matrix has condition number
so nearly dependent columns, large , are squared into a much worse problem, and small errors in or in storing are amplified.