When a matrix is too large to factor in full, sampling its rows and columns gives an unbiased estimator of products and an extremely effective approximation of the SVD. The key fact is that, with the right sampling distribution, the variance of the estimator is controlled by the same Frobenius norms that already govern Eckart–Young accuracy.
and is minimized over the choice of qi by importance sampling proportional to column–row magnitudes:
qi⋆=∑j∥aj∥∥bj∥∥ai∥∥bi∥.
With this choice, the bound becomes E[∥AB−AB∥F2]≤p1(∑i∥ai∥∥bi∥)2, depending only on the total Frobenius mass of the factors. Matrices with a few dominant columns concentrate that mass, and a small p suffices.
Many problems do not need the full product, only a good basis for the column space of A. Hit A with a small random matrix.
The constant 11(k+ℓ)min(m,n) is the simple worst-case bound; the expected error is much smaller, often only kℓ above the optimum. A single extra step of power iteration, replacing Y=AΩ by Y=(AAT)qAΩ, polynomially shrinks the gap whenever A has any decay in its singular values.
Combining the two ideas gives the fast approximate SVD of a large matrix.
The expensive step is the matrix multiplication AΩ, which touches each entry of A once and can be streamed or done in parallel; the rest works on tall-thin matrices of size m×(k+ℓ) or (k+ℓ)×n. For an m×n matrix the cost drops from O(mnmin(m,n)) for a full SVD to roughly O(mn(k+ℓ)), with error close to the Eckart–Young optimum σk+1. This is what lets PCA, latent-semantic indexing, and modern recommendation systems run on matrices that would never fit in memory in factored form.