Randomized Linear Algebra

When a matrix is too large to factor in full, sampling its rows and columns gives an unbiased estimator of matrix products and a low-rank approximation of the SVD at a fraction of the cost. The key fact is that, with the right sampling distribution, the variance of the estimator is controlled by the same Frobenius norms that already govern Eckart–Young accuracy.

Approximate matrix multiplication

Write the product $\Av\Bv$ as a sum of rank-one terms,

\Av\Bv \;=\; \sum_{i=1}^n \av_i \, \bv_i^{\rm T},

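where $\av_i$ is the $i$-th column of $\Av$ and $\bv_i^{\rm T}$ is the $i$-th row of $\Bv$. Replace the full sum by a sample: draw $p$ indices $i_1, \dots, i_p$ independently, choosing index $i$ with probability $q_i$ (the $q_i$ sum to one), and average the rescaled terms,

\widetilde{\Av\Bv} \;=\; \frac{1}{p} \sum_{t=1}^{p} \frac{\av_{i_t} \, \bv_{i_t}^{\rm T}}{q_{i_t}}.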

Each summand has expectation $\sum_i q_i \cdot \av_i\bv_i^{\rm T}/q_i = \Av\Bv$, so $\widetilde{\Av\Bv}$ is unbiased. Its mean-square Frobenius error is the variance of the estimator,

\E\!\left[\,\lVert \widetilde{\Av\Bv} - \Av\Bv \rVert_F^2\,\right] \;=\; \frac{1}{p}\left( \sum_{i=1}^n \frac{\lVert \av_i \rVert^2\,\lVert \bv_i \rVert^2}{q_i} \;-\; \lVert \Av\Bv \rVert_F^2 \right),

and is minimized over the choice of $q_i$ by importance sampling proportional to column–row magnitudes:

q_i^{\star} \;=\; \frac{\lVert \av_i \rVert \, \lVert \bv_i \rVert}{\sum_j \lVert \av_j \rVert \, \lVert \bv_j \rVert}.

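With this choice, the sum in the variance formula collapses to $\big(\sum_i \lVert \av_i\rVert\,\lVert \bv_i\rVert\big)^2$, so $\E\!\left[\lVert \widetilde{\Av\Bv} - \Av\Bv \rVert_F^2\right] \le \tfrac{1}{p}\big(\sum_i \lVert \av_i\rVert\,\lVert \bv_i\rVert\big)^2 \le \tfrac{1}{p}\lVert \Av \rVert_F^2\,\lVert \Bv \rVert_F^2$, where the last step is the Cauchy–Schwarz inequality. The bound depends only on the Frobenius mass of the factors. When a few column–row pairs carry most of that mass, the sampler picks them almost every time, and a small $p$ suffices.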
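The sampling scheme above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a tuned implementation: the function name `sampled_matmul` and its parameters are hypothetical, and it assumes dense arrays with at least one nonzero column–row pair so that the probabilities are well defined.

```python
import numpy as np


def sampled_matmul(A, B, p, rng=None):
    """Estimate A @ B from p sampled column-row outer products.

    Hypothetical helper for illustration; assumes at least one
    column of A and matching row of B are both nonzero.
    """
    rng = np.random.default_rng(rng)
    # Importance weights: (norm of column i of A) * (norm of row i of B).
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    q = weights / weights.sum()
    # Draw p indices i.i.d. (with replacement) with probabilities q.
    idx = rng.choice(A.shape[1], size=p, p=q)
    # Rescale each sampled term by 1 / (p * q_i) to keep the estimator unbiased.
    scale = 1.0 / (p * q[idx])
    return (A[:, idx] * scale) @ B[idx, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 200))
    B = rng.standard_normal((200, 15))
    exact = A @ B
    for p in (50, 500, 5000):
        est = sampled_matmul(A, B, p, rng=rng)
        rel = np.linalg.norm(est - exact) / np.linalg.norm(exact)
        print(f"p={p:5d}  relative Frobenius error = {rel:.3f}")
```

Since the mean-square error scales like $1/p$, the printed relative error should shrink roughly like $1/\sqrt{p}$ as $p$ grows.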

Randomized range finder

Many problems do not need the full product, only a good basis for the column space of $\Av$. To find one, multiply $\Av$ by a thin random matrix and orthonormalize the result.

Let $\Av \in \R^{m \times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots$, fix a target rank $k$ and an oversampling parameter $\ell \ge 2$, and let $\Omegav \in \R^{n \times (k+\ell)}$ have i.i.d. standard Gaussian entries. Let the columns of $\Qv$ form an orthonormal basis for the range of $\Yv = \Av\Omegav$. Then, with probability at least $1 - 6\ell^{-\ell}$, the spectral-norm error of projecting $\Av$ onto that basis satisfies

\lVert \Av - \Qv\Qv^{\rm T} \Av \rVert \;\le\; \left(1 + 11\sqrt{(k+\ell)\min(m,n)}\right) \sigma_{k+1}.

The factor $11\sqrt{(k+\ell)\min(m,n)}$ is the simple worst-case bound; the expected error is much smaller, often only $\sqrt{k\ell}$ above the optimum. A few steps of power iteration, replacing $\Yv = \Av\Omegav$ by $\Yv = (\Av\Av^{\rm T})^q \Av\Omegav$ for a small integer $q$, shrink the gap further whenever $\Av$ has any decay in its singular values: the new matrix has the same singular vectors as $\Av$ but singular values $\sigma_j^{2q+1}$, so the unwanted tail beyond $\sigma_k$ falls off much faster.
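The range finder with optional power iteration might look like the following NumPy sketch. The function name `randomized_range_finder` and its parameters (`oversample` for $\ell$, `power_iters` for $q$) are hypothetical. One deliberate deviation is labeled in the comments: instead of forming $(\Av\Av^{\rm T})^q \Av\Omegav$ directly, the sketch re-orthonormalizes after every multiplication, which spans the same subspace in exact arithmetic and is a common safeguard against round-off.

```python
import numpy as np


def randomized_range_finder(A, k, oversample=5, power_iters=0, rng=None):
    """Return Q with orthonormal columns approximately spanning range(A).

    Hypothetical helper for illustration. `oversample` plays the role
    of ell and `power_iters` the role of q in the text.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    # Gaussian test matrix with k + ell columns.
    Omega = rng.standard_normal((n, k + oversample))
    # Orthonormal basis for the range of the sketch Y = A @ Omega.
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(power_iters):
        # Assumption: orthonormalize between multiplications instead of
        # forming (A A^T)^q A Omega explicitly; the span is the same in
        # exact arithmetic, but small singular directions survive rounding.
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Test matrix with slowly decaying singular values 1, 1/2, 1/3, ...
    U, _ = np.linalg.qr(rng.standard_normal((300, 100)))
    V, _ = np.linalg.qr(rng.standard_normal((200, 100)))
    sigma = 1.0 / np.arange(1, 101)
    A = (U * sigma) @ V.T
    k = 10
    print("optimal spectral error sigma_{k+1}:", sigma[k])
    for q in (0, 1, 2):
        Q = randomized_range_finder(A, k, oversample=5, power_iters=q, rng=1)
        err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
        print(f"q={q}: spectral error = {err:.4f}")
```

The demo compares the projection error with $\sigma_{k+1}$ for a few values of $q$, which is one way to see how much the power steps help on a given matrix.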

Randomized SVD

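Combining the two ideas gives a fast approximate SVD of a large matrix: compute an orthonormal basis $\Qv$ for the range as above, take the exact SVD of the small $(k+\ell) \times n$ matrix $\Qv^{\rm T}\Av$, and multiply its left singular vectors by $\Qv$ to obtain approximate left singular vectors of $\Av$.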

The expensive step is the matrix multiplication $\Av\Omegav$, which touches each entry of $\Av$ once and can be streamed or done in parallel (forming $\Qv^{\rm T}\Av$ costs one more pass of the same kind); the rest works on thin matrices of size $m \times (k+\ell)$ or $(k+\ell) \times n$. For an $m \times n$ matrix the cost drops from $O(mn\min(m,n))$ for a full SVD to roughly $O(mn(k+\ell))$, with error close to the Eckart–Young optimum $\sigma_{k+1}$. This is what lets PCA, latent-semantic indexing, and modern recommendation systems run on matrices that would never fit in memory in factored form.
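Putting the pieces together, a basic randomized SVD can be sketched as below. This is an illustrative outline under stated assumptions, not a reference implementation: the name `randomized_svd_sketch` and its parameters are hypothetical, the input is assumed to be a dense in-memory array, and no power iteration is included.

```python
import numpy as np


def randomized_svd_sketch(A, k, oversample=5, rng=None):
    """Approximate rank-k SVD of A via a Gaussian range finder.

    Hypothetical helper for illustration; returns (U, s, Vt) with
    A approximately equal to (U * s) @ Vt.
    """
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    # Step 1: sketch the range with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # m x (k + ell)
    # Step 2: compress A onto the basis; this is the second pass over A.
    small = Q.T @ A                           # (k + ell) x n
    # Step 3: exact SVD of the small matrix.
    U_small, s, Vt = np.linalg.svd(small, full_matrices=False)
    # Step 4: lift the left singular vectors back to R^m and truncate to rank k.
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Low-rank signal plus small noise, as a stand-in for real data.
    A = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 300))
    A += 1e-3 * rng.standard_normal(A.shape)
    U, s, Vt = randomized_svd_sketch(A, k=8, rng=1)
    approx = (U * s) @ Vt
    rel = np.linalg.norm(A - approx) / np.linalg.norm(A)
    print("relative Frobenius error of the rank-8 approximation:", rel)
```

Only the two products with `A` scale with the full matrix size; the QR and SVD calls see matrices with just $k+\ell$ columns or rows.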