The proof proceeds along the following plan:
Truncate. Replace X k X_k X k with Y k = X k 1 { ∣ X k ∣ ≤ k } Y_k = X_k \, \mathbb{1}_{\{|X_k| \le k\}} Y k = X k 1 { ∣ X k ∣ ≤ k } and show that the truncation changes the sum only finitely often (Borel–Cantelli), so it suffices to prove T n / n → μ T_n/n \to \mu T n / n → μ a.s. where T n = ∑ i = 1 n Y i T_n = \sum_{i=1}^n Y_i T n = ∑ i = 1 n Y i .
Reduce to X ≥ 0 X \ge 0 X ≥ 0 . Split X = X + − X − X = X^+ - X^- X = X + − X − .
Prove convergence along the geometric subsequence k ( n ) = ⌊ α n ⌋ k(n) = \lfloor \alpha^n \rfloor k ( n ) = ⌊ α n ⌋ using Chebyshev and Borel–Cantelli; use dominated convergence to identify the limit as μ \mu μ .
Fill in the gaps between consecutive k ( n ) k(n) k ( n ) and k ( n + 1 ) k(n+1) k ( n + 1 ) using monotonicity of T m T_m T m in m m m (this is where X ≥ 0 X \ge 0 X ≥ 0 is used). Let α ↓ 1 \alpha \downarrow 1 α ↓ 1 .
The argument relies on two technical lemmas. The first is a pure series estimate; the second uses the first to bound the variances of the truncated variables.
Lemma 2 (variance bound)
With Y k = X k 1 { ∣ X k ∣ ≤ k } Y_k = X_k \, \mathbb{1}_{\{|X_k| \le k\}} Y k = X k 1 { ∣ X k ∣ ≤ k } ,
∑ k = 1 ∞ Var ( Y k ) k 2 ≤ 4 E ∣ X 1 ∣ < ∞ . \sum_{k=1}^{\infty} \frac{\text{Var}(Y_k)}{k^2} \le 4 \, \mathbb{E}|X_1| < \infty. k = 1 ∑ ∞ k 2 Var ( Y k ) ≤ 4 E ∣ X 1 ∣ < ∞. Use the tail identity E [ Z 2 ] = ∫ 0 ∞ 2 y P ( Z > y ) d y \mathbb{E}[Z^2] = \int_0^{\infty} 2y \, \mathbb{P}(Z > y) \, dy E [ Z 2 ] = ∫ 0 ∞ 2 y P ( Z > y ) d y for non-negative Z Z Z (see the lemma in WLLN 2 ) applied to Z = ∣ Y k ∣ Z = |Y_k| Z = ∣ Y k ∣ . Since ∣ Y k ∣ ≤ k |Y_k| \le k ∣ Y k ∣ ≤ k , P ( ∣ Y k ∣ > y ) = 0 \mathbb{P}(|Y_k| > y) = 0 P ( ∣ Y k ∣ > y ) = 0 for y ≥ k y \ge k y ≥ k , so the integral truncates at k k k :
Var ( Y k ) ≤ E [ Y k 2 ] = ∫ 0 k 2 y P ( ∣ Y k ∣ > y ) d y ≤ ∫ 0 k 2 y P ( ∣ X 1 ∣ > y ) d y , \text{Var}(Y_k) \le \mathbb{E}[Y_k^2] = \int_0^k 2y \, \mathbb{P}(|Y_k| > y) \, dy \le \int_0^k 2y \, \mathbb{P}(|X_1| > y) \, dy, Var ( Y k ) ≤ E [ Y k 2 ] = ∫ 0 k 2 y P ( ∣ Y k ∣ > y ) d y ≤ ∫ 0 k 2 y P ( ∣ X 1 ∣ > y ) d y , using P ( ∣ Y k ∣ > y ) ≤ P ( ∣ X k ∣ > y ) = P ( ∣ X 1 ∣ > y ) \mathbb{P}(|Y_k| > y) \le \mathbb{P}(|X_k| > y) = \mathbb{P}(|X_1| > y) P ( ∣ Y k ∣ > y ) ≤ P ( ∣ X k ∣ > y ) = P ( ∣ X 1 ∣ > y ) .
Summing and swapping order of summation/integration (Fubini, non-negative integrand):
∑ k = 1 ∞ Var ( Y k ) k 2 ≤ ∑ k = 1 ∞ 1 k 2 ∫ 0 k 2 y P ( ∣ X 1 ∣ > y ) d y = ∫ 0 ∞ ( ∑ k > y k ∈ Z ≥ 1 2 y k 2 ) P ( ∣ X 1 ∣ > y ) d y . \sum_{k=1}^{\infty} \frac{\text{Var}(Y_k)}{k^2}
\le \sum_{k=1}^{\infty} \frac{1}{k^2} \int_0^k 2y \, \mathbb{P}(|X_1| > y) \, dy
= \int_0^{\infty} \left(\sum_{\substack{k > y \\ k \in \mathbb{Z}_{\ge 1}}} \frac{2y}{k^2}\right) \mathbb{P}(|X_1| > y) \, dy. k = 1 ∑ ∞ k 2 Var ( Y k ) ≤ k = 1 ∑ ∞ k 2 1 ∫ 0 k 2 y P ( ∣ X 1 ∣ > y ) d y = ∫ 0 ∞ k > y k ∈ Z ≥ 1 ∑ k 2 2 y P ( ∣ X 1 ∣ > y ) d y . By Lemma 1, the inner sum is at most 4 4 4 :
≤ 4 ∫ 0 ∞ P ( ∣ X 1 ∣ > y ) d y = 4 E ∣ X 1 ∣ , \le 4 \int_0^{\infty} \mathbb{P}(|X_1| > y) \, dy = 4 \, \mathbb{E}|X_1|, ≤ 4 ∫ 0 ∞ P ( ∣ X 1 ∣ > y ) d y = 4 E ∣ X 1 ∣ , where the last equality is the standard tail formula E ∣ X 1 ∣ = ∫ 0 ∞ P ( ∣ X 1 ∣ > y ) d y \mathbb{E}|X_1| = \int_0^{\infty} \mathbb{P}(|X_1| > y) \, dy E ∣ X 1 ∣ = ∫ 0 ∞ P ( ∣ X 1 ∣ > y ) d y for the non-negative variable ∣ X 1 ∣ |X_1| ∣ X 1 ∣ .
Reduction to the truncated variables.
Define Y k = X k 1 { ∣ X k ∣ ≤ k } Y_k = X_k \, \mathbb{1}_{\{|X_k| \le k\}} Y k = X k 1 { ∣ X k ∣ ≤ k } and T n = ∑ i = 1 n Y i T_n = \sum_{i=1}^n Y_i T n = ∑ i = 1 n Y i .
The events { X k ≠ Y k } = { ∣ X k ∣ > k } \{X_k \ne Y_k\} = \{|X_k| > k\} { X k = Y k } = { ∣ X k ∣ > k } satisfy
∑ k = 1 ∞ P ( ∣ X k ∣ > k ) = ∑ k = 1 ∞ P ( ∣ X 1 ∣ > k ) ≤ ∫ 0 ∞ P ( ∣ X 1 ∣ > t ) d t = E ∣ X 1 ∣ < ∞ , \sum_{k=1}^{\infty} \mathbb{P}(|X_k| > k) = \sum_{k=1}^{\infty} \mathbb{P}(|X_1| > k) \le \int_0^{\infty} \mathbb{P}(|X_1| > t) \, dt = \mathbb{E}|X_1| < \infty, k = 1 ∑ ∞ P ( ∣ X k ∣ > k ) = k = 1 ∑ ∞ P ( ∣ X 1 ∣ > k ) ≤ ∫ 0 ∞ P ( ∣ X 1 ∣ > t ) d t = E ∣ X 1 ∣ < ∞ , where the inequality compares a Riemann sum at integer points to the integral of the non-increasing function t ↦ P ( ∣ X 1 ∣ > t ) t \mapsto \mathbb{P}(|X_1| > t) t ↦ P ( ∣ X 1 ∣ > t ) .
By the first Borel–Cantelli lemma , P ( X k ≠ Y k i.o. ) = 0 \mathbb{P}(X_k \ne Y_k \text{ i.o.}) = 0 P ( X k = Y k i.o. ) = 0 . So almost surely, X k = Y k X_k = Y_k X k = Y k for all k k k beyond some (random) finite index. Then S n − T n S_n - T_n S n − T n stabilizes at a finite constant as n → ∞ n \to \infty n → ∞ , and ( S n − T n ) / n → 0 (S_n - T_n)/n \to 0 ( S n − T n ) / n → 0 almost surely. This proves the claim.
Reduction to X ≥ 0 X \ge 0 X ≥ 0 .
Decomposing X k = X k + − X k − X_k = X_k^+ - X_k^- X k = X k + − X k − , both { X k + } \{X_k^+\} { X k + } and { X k − } \{X_k^-\} { X k − } are identically distributed, pairwise independent (each is a function of a single X k X_k X k ), and have finite mean. If the theorem holds for non-negative summands, applying it separately to X k + X_k^+ X k + and X k − X_k^- X k − and subtracting gives the general case. Assume X k ≥ 0 X_k \ge 0 X k ≥ 0 for the remainder of the proof.
Convergence along the geometric subsequence.
Fix α > 1 \alpha > 1 α > 1 and set k ( n ) = ⌊ α n ⌋ k(n) = \lfloor \alpha^n \rfloor k ( n ) = ⌊ α n ⌋ .
Chebyshev bound on T k ( n ) T_{k(n)} T k ( n ) .
Apply Chebyshev’s inequality to T k ( n ) T_{k(n)} T k ( n ) : for ϵ > 0 \epsilon > 0 ϵ > 0 ,
P ( ∣ T k ( n ) − E T k ( n ) ∣ k ( n ) > ϵ ) ≤ Var ( T k ( n ) ) ϵ 2 k ( n ) 2 . \mathbb{P}\!\left(\frac{|T_{k(n)} - \mathbb{E} T_{k(n)}|}{k(n)} > \epsilon\right) \le \frac{\text{Var}(T_{k(n)})}{\epsilon^2 \, k(n)^2}. P ( k ( n ) ∣ T k ( n ) − E T k ( n ) ∣ > ϵ ) ≤ ϵ 2 k ( n ) 2 Var ( T k ( n ) ) .
Sum over n n n and swap with Fubini.
Since { X m } \{X_m\} { X m } are pairwise independent, so are { Y m } \{Y_m\} { Y m } (each Y m Y_m Y m is a function of a single X m X_m X m ), and the variance of a sum of pairwise uncorrelated variables is the sum of variances:
Var ( T k ( n ) ) = ∑ m = 1 k ( n ) Var ( Y m ) . \text{Var}(T_{k(n)}) = \sum_{m=1}^{k(n)} \text{Var}(Y_m). Var ( T k ( n ) ) = m = 1 ∑ k ( n ) Var ( Y m ) .
Summing the Chebyshev bound over n n n and swapping order:
∑ n = 1 ∞ Var ( T k ( n ) ) k ( n ) 2 = ∑ n = 1 ∞ 1 k ( n ) 2 ∑ m = 1 k ( n ) Var ( Y m ) = ∑ m = 1 ∞ Var ( Y m ) ∑ n : k ( n ) ≥ m 1 k ( n ) 2 . \sum_{n=1}^{\infty} \frac{\text{Var}(T_{k(n)})}{k(n)^2}
= \sum_{n=1}^{\infty} \frac{1}{k(n)^2} \sum_{m=1}^{k(n)} \text{Var}(Y_m)
= \sum_{m=1}^{\infty} \text{Var}(Y_m) \!\!\sum_{n : k(n) \ge m} \frac{1}{k(n)^2}. n = 1 ∑ ∞ k ( n ) 2 Var ( T k ( n ) ) = n = 1 ∑ ∞ k ( n ) 2 1 m = 1 ∑ k ( n ) Var ( Y m ) = m = 1 ∑ ∞ Var ( Y m ) n : k ( n ) ≥ m ∑ k ( n ) 2 1 .
Bound the inner sum by a geometric series.
For n ≥ 1 n \ge 1 n ≥ 1 , k ( n ) = ⌊ α n ⌋ ≥ α n / 2 k(n) = \lfloor \alpha^n \rfloor \ge \alpha^n / 2 k ( n ) = ⌊ α n ⌋ ≥ α n /2 , hence k ( n ) − 2 ≤ 4 α − 2 n k(n)^{-2} \le 4 \alpha^{-2n} k ( n ) − 2 ≤ 4 α − 2 n . The condition k ( n ) ≥ m k(n) \ge m k ( n ) ≥ m is equivalent to α n ≥ m \alpha^n \ge m α n ≥ m (since m m m is an integer). The smallest such n n n gives α − 2 n ≤ 1 / m 2 \alpha^{-2n} \le 1/m^2 α − 2 n ≤ 1/ m 2 , and the rest form a geometric series with ratio α − 2 \alpha^{-2} α − 2 :
∑ n : k ( n ) ≥ m 1 k ( n ) 2 ≤ ∑ n : α n ≥ m 4 α 2 n ≤ 4 / m 2 1 − α − 2 . \sum_{n : k(n) \ge m} \frac{1}{k(n)^2} \le \sum_{n : \alpha^n \ge m} \frac{4}{\alpha^{2n}} \le \frac{4/m^2}{1 - \alpha^{-2}}. n : k ( n ) ≥ m ∑ k ( n ) 2 1 ≤ n : α n ≥ m ∑ α 2 n 4 ≤ 1 − α − 2 4/ m 2 .
Combine with Lemma 2.
Substituting,
∑ n = 1 ∞ Var ( T k ( n ) ) k ( n ) 2 ≤ 4 1 − α − 2 ∑ m = 1 ∞ Var ( Y m ) m 2 ≤ 16 E ∣ X 1 ∣ 1 − α − 2 < ∞ . \sum_{n=1}^{\infty} \frac{\text{Var}(T_{k(n)})}{k(n)^2} \le \frac{4}{1 - \alpha^{-2}} \sum_{m=1}^{\infty} \frac{\text{Var}(Y_m)}{m^2} \le \frac{16 \, \mathbb{E}|X_1|}{1 - \alpha^{-2}} < \infty. n = 1 ∑ ∞ k ( n ) 2 Var ( T k ( n ) ) ≤ 1 − α − 2 4 m = 1 ∑ ∞ m 2 Var ( Y m ) ≤ 1 − α − 2 16 E ∣ X 1 ∣ < ∞.
Therefore ∑ n P ( ∣ T k ( n ) − E T k ( n ) ∣ / k ( n ) > ϵ ) < ∞ \sum_n \mathbb{P}(\,|T_{k(n)} - \mathbb{E} T_{k(n)}|/k(n) > \epsilon) < \infty ∑ n P ( ∣ T k ( n ) − E T k ( n ) ∣/ k ( n ) > ϵ ) < ∞ for every ϵ > 0 \epsilon > 0 ϵ > 0 , and the first Borel–Cantelli lemma gives
P ( ∣ T k ( n ) − E T k ( n ) ∣ k ( n ) > ϵ i.o. ) = 0 , \mathbb{P}\!\left(\frac{|T_{k(n)} - \mathbb{E} T_{k(n)}|}{k(n)} > \epsilon \text{ i.o.}\right) = 0, P ( k ( n ) ∣ T k ( n ) − E T k ( n ) ∣ > ϵ i.o. ) = 0 ,
so lim sup n ∣ T k ( n ) − E T k ( n ) ∣ / k ( n ) ≤ ϵ \limsup_n |T_{k(n)} - \mathbb{E} T_{k(n)}|/k(n) \le \epsilon lim sup n ∣ T k ( n ) − E T k ( n ) ∣/ k ( n ) ≤ ϵ a.s. Taking ϵ \epsilon ϵ through a countable sequence → 0 \to 0 → 0 ,
T k ( n ) − E T k ( n ) k ( n ) → a . s . 0. —(1) \frac{T_{k(n)} - \mathbb{E} T_{k(n)}}{k(n)} \xrightarrow{a.s.} 0. \quad\textcolor{gray}{\text{---(1)}} k ( n ) T k ( n ) − E T k ( n ) a . s . 0. —(1)
Identify the limit via DCT.
Each Y m = X m 1 { ∣ X m ∣ ≤ m } Y_m = X_m \, \mathbb{1}_{\{|X_m| \le m\}} Y m = X m 1 { ∣ X m ∣ ≤ m } has the same law as X 1 1 { ∣ X 1 ∣ ≤ m } X_1 \, \mathbb{1}_{\{|X_1| \le m\}} X 1 1 { ∣ X 1 ∣ ≤ m } , which converges pointwise to X 1 X_1 X 1 and is dominated by ∣ X 1 ∣ |X_1| ∣ X 1 ∣ . By the dominated convergence theorem ,
E [ Y m ] = E [ X 1 1 { ∣ X 1 ∣ ≤ m } ] ⟶ E [ X 1 ] = μ . \mathbb{E}[Y_m] = \mathbb{E}\!\left[X_1 \, \mathbb{1}_{\{|X_1| \le m\}}\right] \;\longrightarrow\; \mathbb{E}[X_1] = \mu. E [ Y m ] = E [ X 1 1 { ∣ X 1 ∣ ≤ m } ] ⟶ E [ X 1 ] = μ .
The Cesàro average of a convergent sequence has the same limit:
E T k ( n ) k ( n ) = 1 k ( n ) ∑ m = 1 k ( n ) E [ Y m ] ⟶ μ . —(2) \frac{\mathbb{E} T_{k(n)}}{k(n)} = \frac{1}{k(n)} \sum_{m=1}^{k(n)} \mathbb{E}[Y_m] \;\longrightarrow\; \mu. \quad\textcolor{gray}{\text{---(2)}} k ( n ) E T k ( n ) = k ( n ) 1 m = 1 ∑ k ( n ) E [ Y m ] ⟶ μ . —(2)
Combine (1) and (2).
T k ( n ) k ( n ) = T k ( n ) − E T k ( n ) k ( n ) + E T k ( n ) k ( n ) → a . s . 0 + μ = μ . —(3) \frac{T_{k(n)}}{k(n)} = \frac{T_{k(n)} - \mathbb{E} T_{k(n)}}{k(n)} + \frac{\mathbb{E} T_{k(n)}}{k(n)} \xrightarrow{a.s.} 0 + \mu = \mu. \quad\textcolor{gray}{\text{---(3)}} k ( n ) T k ( n ) = k ( n ) T k ( n ) − E T k ( n ) + k ( n ) E T k ( n ) a . s . 0 + μ = μ . —(3)
Filling in the gaps.
The subsequence result (3) controls T k ( n ) / k ( n ) T_{k(n)}/k(n) T k ( n ) / k ( n ) . For an arbitrary m m m , pick n n n with k ( n ) ≤ m < k ( n + 1 ) k(n) \le m < k(n+1) k ( n ) ≤ m < k ( n + 1 ) . Since X k ≥ 0 X_k \ge 0 X k ≥ 0 , T m T_m T m is non-decreasing in m m m , so
T k ( n ) k ( n + 1 ) ≤ T m m ≤ T k ( n + 1 ) k ( n ) . \frac{T_{k(n)}}{k(n+1)} \le \frac{T_m}{m} \le \frac{T_{k(n+1)}}{k(n)}. k ( n + 1 ) T k ( n ) ≤ m T m ≤ k ( n ) T k ( n + 1 ) . The ratios k ( n + 1 ) / k ( n ) → α k(n+1)/k(n) \to \alpha k ( n + 1 ) / k ( n ) → α as n → ∞ n \to \infty n → ∞ (since ⌊ α n + 1 ⌋ / ⌊ α n ⌋ → α \lfloor \alpha^{n+1} \rfloor / \lfloor \alpha^n \rfloor \to \alpha ⌊ α n + 1 ⌋ / ⌊ α n ⌋ → α ). Therefore, by (3),
T k ( n ) k ( n + 1 ) = T k ( n ) k ( n ) ⋅ k ( n ) k ( n + 1 ) → a . s . μ α , \frac{T_{k(n)}}{k(n+1)} = \frac{T_{k(n)}}{k(n)} \cdot \frac{k(n)}{k(n+1)} \xrightarrow{a.s.} \frac{\mu}{\alpha}, k ( n + 1 ) T k ( n ) = k ( n ) T k ( n ) ⋅ k ( n + 1 ) k ( n ) a . s . α μ , T k ( n + 1 ) k ( n ) = T k ( n + 1 ) k ( n + 1 ) ⋅ k ( n + 1 ) k ( n ) → a . s . α μ . \frac{T_{k(n+1)}}{k(n)} = \frac{T_{k(n+1)}}{k(n+1)} \cdot \frac{k(n+1)}{k(n)} \xrightarrow{a.s.} \alpha \mu. k ( n ) T k ( n + 1 ) = k ( n + 1 ) T k ( n + 1 ) ⋅ k ( n ) k ( n + 1 ) a . s . α μ . The sandwich gives
μ α ≤ lim inf m → ∞ T m m ≤ lim sup m → ∞ T m m ≤ α μ a.s. \frac{\mu}{\alpha} \le \liminf_{m \to \infty} \frac{T_m}{m} \le \limsup_{m \to \infty} \frac{T_m}{m} \le \alpha \mu \quad \text{a.s.} α μ ≤ m → ∞ lim inf m T m ≤ m → ∞ lim sup m T m ≤ α μ a.s. This holds for every α > 1 \alpha > 1 α > 1 . Taking α ↓ 1 \alpha \downarrow 1 α ↓ 1 along a countable sequence,
T m m → a . s . μ , \frac{T_m}{m} \xrightarrow{a.s.} \mu, m T m a . s . μ , which (by the reduction at the start) gives S m / m → a . s . μ S_m / m \xrightarrow{a.s.} \mu S m / m a . s . μ .