Examples

A few canonical settings where the CLT applies, followed by one where it fails. Each example specifies $\mu = \mathbb{E}[X_1]$ and $\sigma^2 = \text{Var}(X_1)$; the standardization

$$Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1)$$

then gives the limit shape.

Pick a base distribution and slide $n$. The blue bars are the exact distribution of the $n$-fold sum (computed by repeatedly convolving the base PMF with itself, not simulated). The red dashed curve is the Gaussian with mean $n\mu$ and variance $n\sigma^2$. Notice how rapidly the bars match the curve as $n$ grows, even when the base is asymmetric (Bernoulli(0.3)) or bimodal.

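[Interactive plot: exact distribution of the sum (bars) with Gaussian fit (dashed curve); x-axis: sum value, y-axis: probability. Initial readout: $n = 1$, $\mu_n = 0.30$, $\sigma_n = 0.46$.]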
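The repeated-convolution computation described above can be sketched in a few lines. This is a minimal standard-library illustration, not the widget's actual code: the helper names (`convolve`, `nfold_pmf`, `max_gap`) are hypothetical, and the base PMF is assumed to live on the integers $0, 1, 2, \dots$.

```python
import math

def convolve(p, q):
    """Discrete convolution of two PMFs stored as lists indexed by value 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def nfold_pmf(base, n):
    """Exact PMF of the sum of n i.i.d. draws from `base`."""
    pmf = [1.0]  # point mass at 0, i.e. the empty sum
    for _ in range(n):
        pmf = convolve(pmf, base)
    return pmf

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Assumed base distribution: Bernoulli(0.3), so P(X=0)=0.7 and P(X=1)=0.3.
base = [0.7, 0.3]
mu = sum(k * p for k, p in enumerate(base))
var = sum((k - mu) ** 2 * p for k, p in enumerate(base))

def max_gap(n):
    """Largest pointwise gap between the exact PMF and the Gaussian density."""
    pmf = nfold_pmf(base, n)
    return max(abs(p - gaussian_pdf(k, n * mu, n * var)) for k, p in enumerate(pmf))

for n in (1, 5, 20, 100):
    print(n, round(max_gap(n), 4))
```

Because the PMF is convolved exactly rather than sampled, the printed gap carries no Monte Carlo noise; it only reflects how far the lattice distribution sits from the Gaussian density.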

The historical CLT: a sum of $n$ independent Bernoulli$(p)$ variables is Binomial$(n, p)$. With $\mu = p$ and $\sigma^2 = p(1-p)$,

$$\frac{S_n - np}{\sqrt{n p (1-p)}} \xrightarrow{d} N(0, 1).$$

Reading. For large $n$, $\text{Binomial}(n, p) \approx N(np, \, np(1-p))$. This is the de Moivre–Laplace theorem (1733/1812), historically the first CLT, predating the i.i.d. CLT by over a century. The widget’s Bernoulli(0.3) option is exactly this setting: the $n$-fold convolution of Bernoulli$(0.3)$ is Binomial$(n, 0.3)$, and the dashed Gaussian overlay is the de Moivre–Laplace approximation $N(0.3 n, \, 0.21 n)$. Slide $n$ to watch the binomial bars relax onto the bell curve.

Rule of thumb. The approximation is excellent when $np \ge 10$ and $n(1-p) \ge 10$. For small $p$ (rare events, where $np$ stays moderate even for large $n$), the approximation degrades and the Poisson limit is more appropriate.
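The rule of thumb can be checked numerically by comparing the exact binomial CDF with its Gaussian approximation. The sketch below is illustrative only: the function names are hypothetical, and it applies the standard half-unit continuity correction, a refinement the text above does not mention.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(S_n <= k) for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

def max_cdf_error(n, p):
    """Largest gap between the exact CDF and N(np, np(1-p)) over k = 0..n.

    Evaluates the normal CDF at k + 0.5 (continuity correction, an assumption
    of this sketch rather than something stated in the text).
    """
    mean, var = n * p, n * p * (1 - p)
    return max(
        abs(binom_cdf(k, n, p) - normal_cdf(k + 0.5, mean, var))
        for k in range(n + 1)
    )

def rule_of_thumb(n, p):
    return n * p >= 10 and n * (1 - p) >= 10

for n, p in [(100, 0.3), (100, 0.01)]:
    print(n, p, rule_of_thumb(n, p), round(max_cdf_error(n, p), 4))
```

The second case ($n = 100$, $p = 0.01$) fails the rule of thumb, and its CDF error comes out roughly an order of magnitude larger than in the first case, which is the rare-event regime where the Poisson limit is the better tool.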

For $X_i$ i.i.d. $\text{Uniform}(0, 1)$, $\mu = 1/2$ and $\sigma^2 = 1/12$, so

$$\sqrt{12 n} \, \left(\overline{X}_n - \tfrac{1}{2}\right) \xrightarrow{d} N(0, 1).$$

Reading. The sum $S_n$ has the Irwin–Hall distribution, supported on $[0, n]$. Even at $n = 6$, the distribution is already strikingly bell-shaped (this is the basis for the classic “sum of 12 uniforms minus 6” trick for crude Gaussian random number generation).
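The “sum of 12 uniforms minus 6” trick works because the sum has mean $12 \cdot \tfrac{1}{2} = 6$ and variance $12 \cdot \tfrac{1}{12} = 1$, so subtracting 6 standardizes it with no further scaling. A minimal sketch follows; the name `crude_gaussian` is hypothetical, and this is a teaching device rather than a recommended generator, since its output can never leave $[-6, 6]$.

```python
import random
import statistics

def crude_gaussian(rng):
    """Sum of 12 Uniform(0,1) draws minus 6: mean 0, variance 12 * (1/12) = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [crude_gaussian(rng) for _ in range(100_000)]

print("mean    :", round(statistics.fmean(samples), 3))
print("variance:", round(statistics.pvariance(samples), 3))
print("range   :", round(min(samples), 2), round(max(samples), 2))
```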

For $X_i$ i.i.d. $\text{Exponential}(\lambda)$, $\mu = 1/\lambda$ and $\sigma^2 = 1/\lambda^2$. The sum is $\text{Gamma}(n, \lambda)$:

$$\sqrt{n} \, \lambda \, \left(\overline{X}_n - \tfrac{1}{\lambda}\right) \xrightarrow{d} N(0, 1).$$

Reading. The exponential distribution is heavily right-skewed (skewness $= 2$). The CLT applies, but convergence is slower than for symmetric distributions: the gamma’s skewness is $2 / \sqrt{n}$, so even at $n = 30$ a noticeable right skew remains ($2/\sqrt{30} \approx 0.37$). The widget above renders this exactly: select Exponential(1) and slide $n$. At $n = 1$ the curve is the bare exponential decay; the rightward tail visibly persists past $n = 20$, while the symmetric die has already locked onto the Gaussian by then.
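The $2/\sqrt{n}$ decay of the skewness can be checked by simulation. The sketch below, with hypothetical helper names, sums $n$ Exponential(1) draws many times and compares the moment-based sample skewness with the theoretical value; the figures are Monte Carlo estimates and will vary slightly with the seed.

```python
import math
import random

def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def skewness_of_sum(n, trials, rng, lam=1.0):
    """Sample skewness of `trials` independent sums of n Exponential(lam) draws."""
    sums = [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]
    return sample_skewness(sums)

rng = random.Random(0)  # fixed seed for reproducibility
for n in (1, 5, 30):
    estimate = skewness_of_sum(n, 20_000, rng)
    print(n, "simulated:", round(estimate, 3), "theory:", round(2 / math.sqrt(n), 3))
```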

Rolling $n$ fair six-sided dice and summing gives $S_n$. A single die has $\mu = 3.5$ and $\sigma^2 = 35/12$, so $S_n$ has mean $3.5 n$ and variance $(35/12) n$. The standardized sum converges to $N(0, 1)$.

Reading. Already by $n = 10$ the histogram of the standardized sum is virtually indistinguishable from the $N(0, 1)$ density at typical plotting resolution. This is why summing dice rolls is the canonical introduction to the CLT in undergraduate texts.
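The single-die moments, and the way they scale for the sum, can be verified with exact rational arithmetic. This is a small sketch with hypothetical helper names; `sum_pmf` rebuilds the distribution of the sum by repeated convolution.

```python
from fractions import Fraction

# PMF of one fair six-sided die, stored as {face: probability}.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def moments(pmf):
    """Exact mean and variance of a PMF given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var

def sum_pmf(n):
    """Exact PMF of the sum of n dice, built by repeated convolution."""
    pmf = {0: Fraction(1)}  # empty sum
    for _ in range(n):
        nxt = {}
        for s, ps in pmf.items():
            for k, pk in die.items():
                nxt[s + k] = nxt.get(s + k, Fraction(0)) + ps * pk
        pmf = nxt
    return pmf

print("one die :", moments(die))
print("ten dice:", moments(sum_pmf(10)))
```

For one die this yields mean $7/2$ and variance $35/12$; for ten dice the mean is $35$ and the variance $175/6 = 10 \cdot 35/12$, matching the linear scaling in $n$.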

The Cauchy distribution has density $f(x) = \frac{1}{\pi (1 + x^2)}$ and no finite mean, let alone variance. The CLT does not apply.

In fact, for i.i.d. Cauchy$(0, 1)$ variables,

$$\frac{S_n}{n} = \overline{X}_n \;\sim\; \text{Cauchy}(0, 1) \quad \text{for every } n,$$

not just in the limit. This follows from the characteristic function $\varphi(t) = e^{-|t|}$: the sample mean has characteristic function $\varphi(t/n)^n = e^{-|t|}$, the same as a single draw. Averaging never narrows the distribution; the sample mean is no better an estimator of the location parameter (the “center”, since the mean does not exist) than a single observation. This is the canonical heavy-tailed counterexample, ruled out by the finite-variance hypothesis of CLT 1.
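A quick simulation makes the failure visible. Since Cauchy$(0, 1)$ has quartiles at $\pm 1$, its interquartile range is 2, and the sample mean should keep that same spread for every $n$. The sketch below uses hypothetical helper names and draws Cauchy variates by inverse-CDF sampling; the printed values are Monte Carlo estimates.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy draw by inverse-CDF sampling: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(n, trials, rng):
    """Interquartile range of `trials` sample means, each over n Cauchy draws."""
    means = [sum(cauchy(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(0)  # fixed seed for reproducibility
for n in (1, 10, 100):
    print(n, "IQR of sample mean:", round(iqr_of_means(n, 4000, rng), 2))
```

The interquartile range is used instead of the standard deviation because the sample standard deviation of Cauchy data does not settle down as the number of trials grows.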