Decomposing Variance
The law of total variance decomposes $\operatorname{Var}(Y)$ into two pieces driven by an auxiliary random variable $X$: the average within-$X$ variance $\mathbb{E}[\operatorname{Var}(Y \mid X)]$ and the variance of the conditional mean $\operatorname{Var}(\mathbb{E}[Y \mid X])$. It is the second-moment analog of the law of iterated expectations and a direct corollary of the orthogonality of the residual $Y - \mathbb{E}[Y \mid X]$.
Conditional variance
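The conditional variance of $Y$ given $X$ is defined as

$$\operatorname{Var}(Y \mid X) = \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2 \mid X\big].$$

This is the random variable obtained by squaring the residual $Y - \mathbb{E}[Y \mid X]$ and taking its conditional expectation given $X$. It is a function of $X$ and is itself a random variable, just like $\mathbb{E}[Y \mid X]$.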
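A common computational shortcut follows by expanding the square in the definition. The sketch below assumes the notation $Y$ for the variable of interest and $X$ for the conditioning variable, and uses only the fact that $\mathbb{E}[Y \mid X]$ behaves like a constant inside a conditional expectation given $X$:

```latex
\begin{aligned}
\operatorname{Var}(Y \mid X)
  &= \mathbb{E}\big[Y^2 - 2\,Y\,\mathbb{E}[Y \mid X] + (\mathbb{E}[Y \mid X])^2 \,\big|\, X\big] \\
  &= \mathbb{E}[Y^2 \mid X] - 2\,(\mathbb{E}[Y \mid X])^2 + (\mathbb{E}[Y \mid X])^2 \\
  &= \mathbb{E}[Y^2 \mid X] - (\mathbb{E}[Y \mid X])^2 .
\end{aligned}
```

This mirrors the unconditional formula $\operatorname{Var}(Y) = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2$, applied within each value of $X$.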
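Theorem
For random variables $X$ and $Y$ with $\mathbb{E}[Y^2] < \infty$,

$$\operatorname{Var}(Y) = \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big] + \operatorname{Var}\big(\mathbb{E}[Y \mid X]\big).$$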
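The identity $\operatorname{Var}(Y) = \mathbb{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\mathbb{E}[Y \mid X])$ can be derived from the orthogonality of the residual in a few lines. The following is a sketch, where the shorthand $m(X) = \mathbb{E}[Y \mid X]$ and $\mu = \mathbb{E}[Y]$ is introduced here only for readability:

```latex
\begin{aligned}
\operatorname{Var}(Y)
  &= \mathbb{E}\big[(Y - m(X) + m(X) - \mu)^2\big] \\
  &= \mathbb{E}\big[(Y - m(X))^2\big]
     + 2\,\mathbb{E}\big[(Y - m(X))(m(X) - \mu)\big]
     + \mathbb{E}\big[(m(X) - \mu)^2\big] \\
  &= \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]
     + 0
     + \operatorname{Var}\big(\mathbb{E}[Y \mid X]\big).
\end{aligned}
```

The first term uses iterated expectations on the definition of $\operatorname{Var}(Y \mid X)$. The cross term vanishes because $m(X) - \mu$ is a function of $X$ and $\mathbb{E}[Y - m(X) \mid X] = 0$. The last term is a variance because $\mathbb{E}[m(X)] = \mu$.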
Intuition: Within-group and between-group
Imagine $Y$ is a measurement (height, income, response time) and $X$ is a grouping variable (school, region, treatment arm). The law of total variance reads:
The total spread of $Y$ equals the average spread within each group plus the spread of the group averages.
On the left, all data points are pooled together: the spread is $\operatorname{Var}(Y)$. On the right, the same points are grouped by $X$: the orange bars show the within-group spread averaged across groups, $\mathbb{E}[\operatorname{Var}(Y \mid X)]$, and the purple bar shows the spread of the group means, $\operatorname{Var}(\mathbb{E}[Y \mid X])$. The two pieces sum back to the total variance on the left.
Concretely:
- Within-group (intra) variance = $\mathbb{E}[\operatorname{Var}(Y \mid X)]$. The average of the variances inside each $X$-block. Captures how much $Y$ wiggles around its conditional mean.
- Between-group (inter) variance = $\operatorname{Var}(\mathbb{E}[Y \mid X])$. The variance of the conditional means $\mathbb{E}[Y \mid X]$. Captures how much the group means themselves spread out.
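The within/between split can be checked numerically. The sketch below uses NumPy on simulated data; the three groups, their means and standard deviations, the sample size, and the seed are all made-up values chosen for illustration. Variances are computed with `ddof=0` (the population convention), under which the decomposition holds exactly for the empirical distribution, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; arbitrary choice

# Hypothetical setup: three groups with different means and spreads.
group_means = np.array([0.0, 2.0, 5.0])
group_sds = np.array([1.0, 0.5, 2.0])

x = rng.integers(0, 3, size=10_000)           # grouping variable X
y = rng.normal(group_means[x], group_sds[x])  # measurement Y

labels, counts = np.unique(x, return_counts=True)
weights = counts / counts.sum()               # empirical P(X = g)

# Conditional mean and variance of Y inside each group (ddof=0).
cond_mean = np.array([y[x == g].mean() for g in labels])
cond_var = np.array([y[x == g].var() for g in labels])

within = np.sum(weights * cond_var)                      # E[Var(Y | X)]
between = np.sum(weights * (cond_mean - y.mean()) ** 2)  # Var(E[Y | X])
total = y.var()                                          # Var(Y)

print(f"total   = {total:.6f}")
print(f"within  = {within:.6f}")
print(f"between = {between:.6f}")
print(f"within + between = {within + between:.6f}")
```

With the sample (`ddof=1`) variance the two sides agree only approximately, which is why one-way ANOVA tracks degrees of freedom separately for each piece.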
Applications
- Analysis of variance (ANOVA). The within/between decomposition is the algebraic core of one-way ANOVA, where $X$ is a categorical treatment label and the F-statistic compares the two pieces.
- Variance reduction. If $\mathbb{E}[Y \mid X]$ is easy to compute and $\operatorname{Var}(\mathbb{E}[Y \mid X])$ is small, conditioning on $X$ gives a low-variance estimator. This is the basis for Rao-Blackwellization in statistics and stratified sampling in Monte Carlo.
- Regression decomposition. With $\mathbb{E}[Y \mid X]$ replaced by a fitted regression $\hat{Y}$, the same identity gives the standard “explained vs. unexplained” variance decomposition. The $R^2$ statistic is the ratio of $\operatorname{Var}(\hat{Y})$ to $\operatorname{Var}(Y)$, restricted to the best linear predictor.
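The regression case can be illustrated with a small simulation. This is a minimal sketch, assuming a simple linear model with made-up coefficients and noise level; it fits ordinary least squares with an intercept via `np.linalg.lstsq` and checks that the variance of the response splits into the variance of the fitted values plus the variance of the residuals.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; arbitrary choice
n = 5_000

# Hypothetical data-generating model: Y = 1 + 3 X + noise.
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 3.0 * x + rng.normal(0.0, 2.0, size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ coef  # fitted values (in-sample best linear predictor)
resid = y - y_hat      # residuals, uncorrelated with y_hat in-sample

explained = y_hat.var()    # "between" analog: Var(fitted values)
unexplained = resid.var()  # "within" analog: Var(residuals)
total = y.var()

r_squared = explained / total
print(f"total = {total:.4f}, explained + unexplained = {explained + unexplained:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

The exact split relies on the intercept: it forces the residuals to have mean zero and to be uncorrelated with the fitted values in-sample, which is the linear counterpart of the orthogonality used in the general identity.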