Decomposing Variance

The law of total variance decomposes $\Var(X)$ into two pieces driven by an auxiliary random variable $Y$: the average within-$Y$ variance and the variance of the conditional mean $\E[X \mid Y]$. It is the second-moment analog of the law of iterated expectations and a direct corollary of the orthogonality of the residual.

The conditional variance $\Var(X \mid Y) = \E[(X - \E[X \mid Y])^2 \mid Y]$ is the random variable obtained by squaring the residual $X - \E[X \mid Y]$ and taking its conditional expectation given $Y$. It is a function of $Y$ and is itself a random variable, just like $\E[X \mid Y]$.

Imagine $X$ is a measurement (height, income, response time) and $Y$ is a grouping variable (school, region, treatment arm). The law of total variance reads:

The total spread of $X$ equals the average spread within each group plus the spread of the group averages.
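
In symbols, the statement and the orthogonality argument behind it can be sketched as follows. This is a minimal derivation that assumes $X$ has a finite second moment; `\Var` and `\E` are taken to be this page's macros for variance and expectation, and the underbrace labels are added here for illustration.

```latex
\begin{align*}
\Var(X) &= \E[\Var(X \mid Y)] + \Var(\E[X \mid Y]) \\[6pt]
X - \E[X] &= \underbrace{(X - \E[X \mid Y])}_{\text{residual}}
           + \underbrace{(\E[X \mid Y] - \E[X])}_{\text{function of } Y} \\[6pt]
\E[(X - \E[X])^2]
  &= \E[(X - \E[X \mid Y])^2] + \E[(\E[X \mid Y] - \E[X])^2] \\
  &\quad + 2\,\underbrace{\E[(X - \E[X \mid Y])(\E[X \mid Y] - \E[X])]}_{=\,0 \text{ by orthogonality of the residual}} \\[6pt]
  &= \E[\Var(X \mid Y)] + \Var(\E[X \mid Y])
\end{align*}
```

The last step uses the tower property twice: $\E[(X - \E[X \mid Y])^2] = \E[\E[(X - \E[X \mid Y])^2 \mid Y]]$ for the first term, and $\E[\E[X \mid Y]] = \E[X]$ to recognize the second term as a variance.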

Figure: Total variance splits into within-group (intra) and between-group (inter) variance.

On the left, all data points pooled together: spread is $\Var(X)$. On the right, the same points grouped by $Y$: the orange bars show the within-group spread averaged across groups, $\E[\Var(X \mid Y)]$, and the purple bar shows the spread of the group means, $\Var(\E[X \mid Y])$. The two pieces sum back to the total variance on the left.
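
The claim that the two pieces sum back to the total can be checked numerically. The sketch below uses only the Python standard library; the group names, their means and standard deviations, and the helper `decompose_variance` are hypothetical and chosen purely for illustration.

```python
import random
from statistics import fmean, pvariance

random.seed(0)

# Hypothetical groups: label -> (mean, standard deviation) of X within the group.
GROUPS = {"a": (0.0, 1.0), "b": (2.0, 0.5), "c": (5.0, 2.0)}
NAMES = list(GROUPS)

# Draw a group label Y uniformly, then X from that group's normal distribution.
labels = [random.choice(NAMES) for _ in range(5_000)]
values = [random.gauss(*GROUPS[g]) for g in labels]


def decompose_variance(values, labels):
    """Return (total, within, between) using empirical population moments."""
    n = len(values)
    by_group = {}
    for v, g in zip(values, labels):
        by_group.setdefault(g, []).append(v)

    grand_mean = fmean(values)
    # E[Var(X | Y)]: group variances weighted by group frequency.
    within = sum(len(vs) / n * pvariance(vs) for vs in by_group.values())
    # Var(E[X | Y]): weighted spread of the group means around the grand mean.
    between = sum(
        len(vs) / n * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return pvariance(values), within, between


total, within, between = decompose_variance(values, labels)
print(f"total={total:.4f}  within={within:.4f}  between={between:.4f}")
print(f"within + between = {within + between:.4f}")
```

With population moments (dividing by $n$) the identity holds exactly in the sample, up to floating-point rounding. With sample variances (dividing by $n - 1$) it would hold only approximately.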

Concretely:

  • Within-group (intra) variance = $\E[\Var(X \mid Y)]$. The average of the variances inside each $Y$-block. Captures how much $X$ wiggles around its conditional mean.
  • Between-group (inter) variance = $\Var(\E[X \mid Y])$. The variance of the conditional means $\E[X \mid Y]$. Captures how much the group means themselves spread out.

The same decomposition underlies several standard tools:

  • Analysis of variance (ANOVA). The within/between decomposition is the algebraic core of one-way ANOVA, where $Y$ is a categorical treatment label and the F-statistic compares the two pieces.
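  • Variance reduction. Replacing $X$ by $\E[X \mid Y]$ keeps the mean and drops the within-group term $\E[\Var(X \mid Y)]$, so when $\E[X \mid Y]$ is easy to compute it gives a lower-variance estimator. This is the basis for Rao-Blackwellization in statistics. Stratified sampling in Monte Carlo works from the other side: sampling each $Y$-stratum in proportion to its probability removes the between-group term, which leaves a low-variance estimator when $\Var(X \mid Y)$ is small.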
  • Regression decomposition. With $Y$ replaced by regressors $\mathbf{Z}$ and the conditional mean by a fitted regression $\hat X = f(\mathbf{Z})$, the same identity gives the standard “explained vs. unexplained” variance decomposition. The $R^2$ statistic is the ratio of $\Var(\E[X \mid \mathbf{Z}])$ to $\Var(X)$, with the best linear predictor standing in for $\E[X \mid \mathbf{Z}]$.
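
The ANOVA and regression readings meet in the one-way case: regressing $X$ on group indicators gives fitted values equal to the group means, so $R^2$ is the between-group share of the total. The sketch below computes the F-statistic and $R^2$ from the two sums of squares using only the standard library. The treatment-arm data and the helper `one_way_anova` are hypothetical, for illustration only.

```python
from statistics import fmean

# Hypothetical measurements X for three treatment arms Y.
data = {
    "control": [4.1, 5.0, 4.6, 5.3, 4.8],
    "dose_low": [5.9, 6.4, 5.5, 6.1, 6.6],
    "dose_high": [7.2, 6.8, 7.9, 7.4, 7.0],
}


def one_way_anova(groups):
    """Return (f_stat, r_squared, ss_between, ss_within, ss_total)."""
    all_values = [v for vs in groups.values() for v in vs]
    n, k = len(all_values), len(groups)
    grand_mean = fmean(all_values)
    group_means = {g: fmean(vs) for g, vs in groups.items()}

    # Between-group sum of squares: n * Var(E[X | Y]) with empirical moments.
    ss_between = sum(
        len(vs) * (group_means[g] - grand_mean) ** 2 for g, vs in groups.items()
    )
    # Within-group sum of squares: n * E[Var(X | Y)] with empirical moments.
    ss_within = sum(
        (v - group_means[g]) ** 2 for g, vs in groups.items() for v in vs
    )
    # Total sum of squares: n * Var(X).
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    # F compares the two pieces after dividing each by its degrees of freedom.
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    # R^2 of the regression on group indicators: explained share of variance.
    r_squared = ss_between / ss_total
    return f_stat, r_squared, ss_between, ss_within, ss_total


f_stat, r_squared, ss_between, ss_within, ss_total = one_way_anova(data)
print(f"F = {f_stat:.2f}, R^2 = {r_squared:.3f}")
print(f"SS_between + SS_within = {ss_between + ss_within:.4f}, SS_total = {ss_total:.4f}")
```

Turning the F-statistic into a p-value needs the F distribution with $(k - 1, n - k)$ degrees of freedom, which is outside the standard library and is left out of this sketch.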