
Projection Perspective

For square-integrable random variables, conditional expectation has a clean geometric interpretation: $\E(X \mid \cG)$ is the orthogonal projection of $X$ onto the closed subspace $L^2(\Omega, \cG, \Pr)$ inside the Hilbert space $L^2(\Omega, \cF, \Pr)$. Two equivalent characterizations make this precise:

  • Orthogonality. The residual $X - \E(X \mid \cG)$ is uncorrelated with every $\cG$-measurable square-integrable random variable.
  • Minimal distance. Among all $\cG$-measurable square-integrable $Z$, the choice $Z = \E(X \mid \cG)$ minimizes the mean-squared error $\E[(X - Z)^2]$.

Throughout this page, $X$ has $\E[X^2] < \infty$, so $X \in L^2(\Omega, \cF, \Pr)$. The inner product on $L^2$ is $\langle X, Y \rangle = \E[XY]$, and the norm is $\|X\|_2 = \sqrt{\E[X^2]}$. Two zero-mean random variables are uncorrelated iff their inner product (covariance) is zero.

In the drawings below, I use dotted lines to denote something perpendicular to the plane and dashed lines to represent something within the plane.

Conditional expectation as orthogonal projection in L²

$L^2(\cG)$, shorthand for $L^2(\Omega, \cG, \Pr)$, is the closed subspace of $\cG$-measurable square-integrable random variables. The projection $\E(X \mid \cG)$ sits in this subspace; the residual $X - \E(X \mid \cG)$ is perpendicular to it.

The reading: covariance is the inner product on the space of mean-zero $L^2$ random variables, and the residual has mean zero because $\E[\E(X \mid \cG)] = \E[X]$. The proposition says the residual is orthogonal to (uncorrelated with) the whole subspace $L^2(\Omega, \cG, \Pr)$. This is exactly the defining property of an orthogonal projection in Hilbert space.
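The orthogonality claim can be checked on a small example. This is a minimal sketch assuming a hypothetical six-point probability space whose weights `P`, values `X`, and partition `BLOCKS` (which generates $\cG$) are made up for illustration. On such a space, with every block of positive probability, $\E(X \mid \cG)$ is the $\Pr$-weighted average of $X$ over each block. The helper names `cond_exp` and `g_measurable` are hypothetical, and exact `Fraction` arithmetic is used so the identities hold exactly rather than up to rounding.

```python
from fractions import Fraction as F

# Hypothetical finite probability space Omega = {0, ..., 5}. The weights P,
# the values of X, and the partition BLOCKS generating G are illustrative only.
P = [F(1, 10), F(1, 5), F(3, 20), F(1, 4), F(1, 5), F(1, 10)]
X = [F(1), F(3), F(-2), F(4), F(1, 2), F(2)]
BLOCKS = [[0, 1], [2, 3, 4], [5]]


def E(U):
    """Expectation under P."""
    return sum(p * u for p, u in zip(P, U))


def cov(U, V):
    mu, mv = E(U), E(V)
    return E([(u - mu) * (v - mv) for u, v in zip(U, V)])


def cond_exp(U):
    """E(U | G): the P-weighted average of U over each block (blocks have positive mass)."""
    out = [F(0)] * len(U)
    for block in BLOCKS:
        avg = sum(P[w] * U[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            out[w] = avg
    return out


def g_measurable(values):
    """Hypothetical helper: the G-measurable variable equal to values[i] on BLOCKS[i]."""
    out = [F(0)] * len(P)
    for block, v in zip(BLOCKS, values):
        for w in block:
            out[w] = F(v)
    return out


residual = [x - xh for x, xh in zip(X, cond_exp(X))]

# The residual has mean zero, so <residual, Z> = E[residual * Z] equals Cov(residual, Z).
assert E(residual) == 0
for values in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [5, -2, 7], [F(1, 3), 9, -4]):
    Z = g_measurable(values)
    assert E([r * z for r, z in zip(residual, Z)]) == 0  # orthogonal
    assert cov(residual, Z) == 0  # hence uncorrelated
```

Every $\cG$-measurable variable here is a linear combination of the three block indicators, so checking those (plus a few arbitrary combinations) covers the whole subspace.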

Intuition. Any blue line ($X - Z$) is longer than the red perpendicular ($X - \E(X \mid \cG)$): the projection is the point of the subspace at minimal distance from $X$.

Minimal distance characterization: the perpendicular from X to L²(G) is shorter than any other line from X to the subspace

Any alternative $Z \in L^2(\cG)$ produces a longer line $X - Z$ than the perpendicular $X - \E(X \mid \cG)$. The Pythagorean identity $\|X - Z\|_2^2 = \|X - \E(X \mid \cG)\|_2^2 + \|\E(X \mid \cG) - Z\|_2^2$, which holds because $\E(X \mid \cG) - Z$ lies in $L^2(\cG)$ and is therefore orthogonal to the residual, is the algebraic content of the picture.
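The Pythagorean identity and the resulting minimal-distance property can be verified numerically. This sketch reuses the same kind of hypothetical six-point model (illustrative weights, values, and partition; hypothetical helper names) and checks the identity exactly for several $\cG$-measurable candidates $Z$, one of which shares two of its three block values with $\E(X \mid \cG)$.

```python
from fractions import Fraction as F

# Hypothetical six-point model: weights P, variable X, and the partition
# BLOCKS that generates G (all numbers illustrative only).
P = [F(1, 10), F(1, 5), F(3, 20), F(1, 4), F(1, 5), F(1, 10)]
X = [F(1), F(3), F(-2), F(4), F(1, 2), F(2)]
BLOCKS = [[0, 1], [2, 3, 4], [5]]


def norm_sq(U):
    """Squared L^2 norm: ||U||_2^2 = E[U^2]."""
    return sum(p * u * u for p, u in zip(P, U))


def minus(U, V):
    return [u - v for u, v in zip(U, V)]


def cond_exp(U):
    """E(U | G) as P-weighted block averages."""
    out = [F(0)] * len(U)
    for block in BLOCKS:
        avg = sum(P[w] * U[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            out[w] = avg
    return out


def g_measurable(values):
    """Hypothetical helper: the G-measurable variable equal to values[i] on BLOCKS[i]."""
    out = [F(0)] * len(P)
    for block, v in zip(BLOCKS, values):
        for w in block:
            out[w] = F(v)
    return out


Xhat = cond_exp(X)
best = norm_sq(minus(X, Xhat))  # squared length of the perpendicular

for values in ([0, 0, 0], [2, 1, 2], [F(7, 3), F(4, 3), 3], [-1, 5, F(1, 2)]):
    Z = g_measurable(values)
    dist = norm_sq(minus(X, Z))
    # Pythagorean identity: ||X - Z||^2 = ||X - Xhat||^2 + ||Xhat - Z||^2 ...
    assert dist == best + norm_sq(minus(Xhat, Z))
    # ... so no G-measurable Z gets closer to X than Xhat does.
    assert dist >= best
```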

The two characterizations (orthogonality, minimal distance) are equivalent statements of the Hilbert-space projection theorem specialized to the closed subspace $L^2(\Omega, \cG, \Pr) \subseteq L^2(\Omega, \cF, \Pr)$. For any closed subspace $M$ of a Hilbert space $H$ and any $X \in H$:

  • A unique $\hat X \in M$ minimizes $\|X - Z\|$ over $Z \in M$.
  • This $\hat X$ is characterized by $X - \hat X \perp M$.

Conditional expectation realizes this projection concretely: $\hat X = \E(X \mid \cG)$. The construction we gave via Radon-Nikodym handles all integrable $X$ (not just $X \in L^2$); on the subspace $L^2 \subseteq L^1$ it coincides with the projection, so most of the intuition from the geometric picture transfers.
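To see the two constructions agree, the sketch below computes a generic Hilbert-space projection onto $L^2(\cG)$ by solving the normal equations $\sum_j \langle b_i, b_j \rangle c_j = \langle b_i, X \rangle$ for a basis $b_1, \ldots, b_m$ of the subspace, and compares the result with the block-average formula for $\E(X \mid \cG)$. It assumes the same hypothetical six-point model; `solve`, `project`, and `indicator` are hypothetical helpers, and the basis is deliberately chosen non-orthogonal so the agreement is not automatic.

```python
from fractions import Fraction as F

# Hypothetical six-point model (illustrative numbers only).
P = [F(1, 10), F(1, 5), F(3, 20), F(1, 4), F(1, 5), F(1, 10)]
X = [F(1), F(3), F(-2), F(4), F(1, 2), F(2)]
BLOCKS = [[0, 1], [2, 3, 4], [5]]


def inner(U, V):
    """<U, V> = E[UV]."""
    return sum(p * u * v for p, u, v in zip(P, U, V))


def solve(A, rhs):
    """Solve A c = rhs exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def project(U, basis):
    """Generic Hilbert-space projection of U onto span(basis) via the normal equations."""
    gram = [[inner(a, b) for b in basis] for a in basis]
    coef = solve(gram, [inner(a, U) for a in basis])
    return [sum(c * v[w] for c, v in zip(coef, basis)) for w in range(len(U))]


def cond_exp(U):
    """E(U | G) as P-weighted block averages."""
    out = [F(0)] * len(U)
    for block in BLOCKS:
        avg = sum(P[w] * U[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            out[w] = avg
    return out


def indicator(block):
    return [F(1) if w in block else F(0) for w in range(len(P))]


# A non-orthogonal basis of L^2(G): the constant 1 and the indicators of the
# first two blocks (the third indicator is 1 minus these two).
basis = [[F(1)] * len(P), indicator(BLOCKS[0]), indicator(BLOCKS[1])]
assert project(X, basis) == cond_exp(X)
```

Using the three block indicators themselves as the basis would make the Gram matrix diagonal, and `project` would then reduce directly to the block averages.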

A few corollaries that fall out immediately:

  • Variance decomposition. Taking $Z = \E[X]$ (the projection onto the trivial $\sigma$-field, which lies in $L^2(\cG)$ because constants are $\cG$-measurable) in the Pythagorean identity gives
$$\Var(X) \;=\; \E[\Var(X \mid \cG)] + \Var(\E(X \mid \cG)),$$

the law of total variance: unconditional variance splits into the conditional-variance average plus the variance of the conditional mean.

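  • Best linear prediction. Taking $\cG = \sigma(Y_1, \ldots, Y_k)$ for a finite collection $\{Y_1, \ldots, Y_k\}$ makes $\E(X \mid \cG)$ the best predictor of $X$ from the $Y_i$. Projecting instead onto the smaller (finite-dimensional, hence closed) subspace of linear combinations of the $Y_i$, together with the constant $1$ for an intercept, recovers ordinary least-squares regression at the population level. The conditional expectation is the best predictor; OLS is the best linear predictor.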

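  • Idempotence. Projection is idempotent: $\E(\E(X \mid \cG) \mid \cG) = \E(X \mid \cG)$, which is the tower property with both $\sigma$-fields equal. More generally, for $\cH \subseteq \cG$, projecting onto $L^2(\cG)$ and then onto $L^2(\cH)$ gives the projection onto $L^2(\cH)$: this is the tower property restricted to $L^2$.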
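All three corollaries can be checked on the same hypothetical six-point model with exact `Fraction` arithmetic. This sketch assumes a predictor `Y` whose level sets are exactly the blocks of the partition, so $\sigma(Y) = \cG$, and a coarser partition `H_BLOCKS` standing in for a sub-$\sigma$-field $\cH \subseteq \cG$; all numbers and helper names are illustrative. The slope and intercept use the standard one-regressor population OLS formulas $b = \operatorname{Cov}(X, Y) / \Var(Y)$ and $a = \E[X] - b\,\E[Y]$.

```python
from fractions import Fraction as F

# Hypothetical six-point model (illustrative numbers only). Y is constant on
# exactly the blocks {0, 1}, {2, 3, 4}, {5}, so sigma(Y) = G.
P = [F(1, 10), F(1, 5), F(3, 20), F(1, 4), F(1, 5), F(1, 10)]
X = [F(1), F(3), F(-2), F(4), F(1, 2), F(2)]
Y = [F(0), F(0), F(1), F(1), F(1), F(2)]
G_BLOCKS = [[0, 1], [2, 3, 4], [5]]
H_BLOCKS = [[0, 1, 2, 3, 4], [5]]  # coarser partition: a sub-sigma-field H of G


def E(U):
    return sum(p * u for p, u in zip(P, U))


def var(U):
    m = E(U)
    return E([(u - m) ** 2 for u in U])


def cond_exp(U, blocks):
    """E(U | sigma(blocks)) as P-weighted block averages."""
    out = [F(0)] * len(U)
    for block in blocks:
        avg = sum(P[w] * U[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            out[w] = avg
    return out


def mse(U, V):
    return E([(u - v) ** 2 for u, v in zip(U, V)])


Xhat = cond_exp(X, G_BLOCKS)

# Variance decomposition: Var(X) = E[Var(X | G)] + Var(E(X | G)).
cond_var = cond_exp([(x - xh) ** 2 for x, xh in zip(X, Xhat)], G_BLOCKS)
assert var(X) == E(cond_var) + var(Xhat)

# Best linear prediction: population OLS of X on (1, Y) versus E(X | Y).
b = (E([x * y for x, y in zip(X, Y)]) - E(X) * E(Y)) / var(Y)
a = E(X) - b * E(Y)
linear = [a + b * y for y in Y]
assert mse(X, Xhat) <= mse(X, linear)

# Idempotence, and the tower property for H contained in G.
assert cond_exp(Xhat, G_BLOCKS) == Xhat
assert cond_exp(Xhat, H_BLOCKS) == cond_exp(X, H_BLOCKS)
assert cond_exp(cond_exp(X, H_BLOCKS), G_BLOCKS) == cond_exp(X, H_BLOCKS)
```

Because the linear predictor is itself $\cG$-measurable, its excess mean-squared error over $\E(X \mid Y)$ equals $\|\E(X \mid Y) - (a + bY)\|_2^2$, which is the Pythagorean identity again.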