The first question to ask about a matrix is not how to invert it but what it can produce. Reading Ax as a combination of the columns of A turns that into a geometric object, the column space, and a single counting invariant, the rank. The factorization A=CR makes both explicit and settles, in one line, that a matrix has as many independent rows as independent columns.
Because $Ax = x_1 a_1 + \cdots + x_n a_n$, the system $Ax = b$ is solvable precisely when $b \in C(A)$. The column space is therefore the set of attainable right-hand sides, and the rank, the dimension of $C(A)$, measures how much of $\mathbb{R}^m$ the matrix can reach.
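The solvability criterion can be tested mechanically: b lies in the column space exactly when appending b to A as an extra column leaves the rank unchanged. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the helper names `rank` and `is_solvable` and the sample right-hand sides are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def rank(M):
    """Rank by forward Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        # look for a nonzero entry in column c at or below row r
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def is_solvable(A, b):
    """Ax = b is solvable iff appending b as a column does not raise the rank."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

# Illustrative matrix with columns (1,1,1), (2,2,2), (4,5,6).
A = [[1, 2, 4], [1, 2, 5], [1, 2, 6]]
print(is_solvable(A, [5, 6, 7]))  # (5,6,7) is column 1 plus column 3
print(is_solvable(A, [1, 0, 0]))  # (1,0,0) is not a combination of the columns
```

Exact fractions are used here so that the zero tests are reliable; a floating-point version would need a tolerance.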
Keep only the columns that carry new directions. Scanning $a_1, \dots, a_n$ left to right and discarding any column that is already a combination of the ones before it leaves $r$ independent columns; collect them as the matrix $C \in \mathbb{R}^{m \times r}$. Every column of $A$ is a combination of these, and recording the coefficients gives the second factor.
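For
$$A = \begin{pmatrix} 1 & 2 & 4 \\ 1 & 2 & 5 \\ 1 & 2 & 6 \end{pmatrix},$$
the second column is 2 times the first, so columns 1 and 3 are independent and $r = 2$. The factorization is
$$A = CR = \begin{pmatrix} 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$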
The independent columns of A (here columns 1 and 3) become C; the factor R says how to rebuild each column of A as a combination of them. The shared inner dimension is the rank r.
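One standard way to compute both factors is row reduction: the pivot columns of the reduced row echelon form are exactly the columns that are not combinations of earlier ones, and the nonzero rows of that form hold the coefficients. The sketch below assumes this route and exact fractions; the names `cr_factorization` and `matmul` are hypothetical helpers, and this is one possible implementation rather than a prescribed one.

```python
from fractions import Fraction

def cr_factorization(A):
    """Return (C, R, pivots) with A = C R, via the reduced row echelon form."""
    rows = [[Fraction(x) for x in row] for row in A]
    m, n = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue  # column c is a combination of earlier columns
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]  # scale the pivot to 1
        for i in range(m):
            if i != r:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    C = [[row[j] for j in pivots] for row in A]  # independent columns of A
    R = rows[:r]                                 # nonzero rows of the RREF
    return C, R, pivots

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# The example matrix, with columns (1,1,1), (2,2,2), (4,5,6).
A = [[1, 2, 4], [1, 2, 5], [1, 2, 6]]
C, R, pivots = cr_factorization(A)
print(pivots)               # 0-based indices of the independent columns: [0, 2]
print(matmul(C, R) == A)    # True
```

Column j of R lists the coefficients that rebuild column j of A from the columns of C, which is why the pivot columns of R form an identity block.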
The factorization pays off immediately. In $A = CR$, every row of $A$ is a combination of the rows of $R$ (row $i$ of $A$ is row $i$ of $C$ times $R$). So the row space of $A$ is contained in the row space of $R$, which has only $r$ rows; hence the row rank of $A$ is at most $r$, the number of independent columns. Applying the same argument to $A^T$ gives the reverse inequality: the column rank is at most the row rank. The two are therefore equal.
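As a worked check of the row statement, take the example matrix with columns (1,1,1), (2,2,2), (4,5,6), whose factor R has rows (1,2,0) and (0,0,1). Each row of A is the matching row of C, namely (1,4), (1,5), or (1,6), used as coefficients on the rows of R:

```latex
\begin{aligned}
\text{row 1 of } A &= 1\,(1,2,0) + 4\,(0,0,1) = (1,2,4),\\
\text{row 2 of } A &= 1\,(1,2,0) + 5\,(0,0,1) = (1,2,5),\\
\text{row 3 of } A &= 1\,(1,2,0) + 6\,(0,0,1) = (1,2,6).
\end{aligned}
```

All three rows lie in the span of two vectors, so the row rank is at most 2, matching the two independent columns.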
The column-space reading also bounds the rank of a product. Every column of AB is a combination of the columns of A, so C(AB)⊆C(A) and rank(AB)≤rank(A). Every row of AB is a combination of the rows of B, so rank(AB)≤rank(B). Together,
rank(AB)≤min{rank(A),rank(B)}.
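The bound can be checked numerically on small cases. The sketch below is illustrative only: the matrices `B_eq` and `B_strict` are hypothetical examples chosen to show one case where the bound is attained and one where it is strict, and `rank` and `matmul` are ad hoc helpers using exact fractions.

```python
from fractions import Fraction

def rank(M):
    """Rank by forward Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 2, 4], [1, 2, 5], [1, 2, 6]]          # rank 2
B_eq = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]       # identity, rank 3
B_strict = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # rank 2, keeps only columns 1 and 2 of A

for B in (B_eq, B_strict):
    AB = matmul(A, B)
    # rank(AB) never exceeds the smaller of the two ranks
    assert rank(AB) <= min(rank(A), rank(B))
    print(rank(AB), min(rank(A), rank(B)))
```

With `B_strict` the product keeps only the two dependent columns of A, so its rank drops to 1 even though both factors have rank 2; the inequality need not be an equality.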
In particular A=CR writes a rank-r matrix as a product of an m×r and an r×n factor, the smallest possible inner dimension; this is the prototype of every low-rank factorization, made optimal later by the singular value decomposition and Eckart–Young.