We investigate the conditions required to reverse the implications between different modes of convergence. We know that almost sure convergence and $L^p$ convergence both imply convergence in probability. The reverse implications generally require additional conditions.
Convergence in probability does not imply almost sure convergence (as seen in the “Typewriter” counter-example). However, it does imply that a subsequence converges almost surely.
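To make this concrete, here is a minimal NumPy sketch of the typewriter sequence (the indicator-of-sliding-intervals construction is assumed here; details may differ from the version in the notes). It shows that $\mathbb{P}(X_n = 1) \to 0$, so $X_n \xrightarrow{P} 0$, while every fixed sample point is hit infinitely often, so $X_n(\omega)$ does not converge.

```python
import numpy as np

def typewriter(n, u):
    """X_n(u) for the typewriter sequence: write n = 2**j + k with
    0 <= k < 2**j; X_n is the indicator of [k/2**j, (k+1)/2**j)."""
    j = int(np.floor(np.log2(n)))
    k = n - 2**j
    return float(k / 2**j <= u < (k + 1) / 2**j)

rng = np.random.default_rng(0)
u = rng.uniform()  # one fixed sample point omega

# P(|X_n - 0| > eps) = 2**(-j) -> 0, so X_n -> 0 in probability ...
for n in [1, 2, 4, 8, 1024]:
    j = int(np.floor(np.log2(n)))
    print(f"n={n:5d}  P(X_n = 1) = {2.0 ** -j:.4f}")

# ... but every block {2**j, ..., 2**(j+1) - 1} contains exactly one n
# with X_n(u) = 1, so X_n(u) = 1 infinitely often: no a.s. convergence.
hits = [n for n in range(1, 1025) if typewriter(n, u) == 1.0]
print("indices n with X_n(u) = 1:", hits)
```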
Theorem
If $X_n \xrightarrow{P} X$, then there exists a subsequence $\{n_m\}_{m \ge 1}$ such that:
$$X_{n_m} \xrightarrow{a.s.} X \quad \text{as } m \to \infty$$

Set $n_0 = 0$ (a dummy term). For each $m \ge 1$, we choose an index $n_m$ large enough, as follows.
Let $n_m = \inf\{ n > n_{m-1} : \mathbb{P}(|X_n - X| > 1/m) \le 2^{-m} \}$.
Such an $n_m$ always exists: $X_n \xrightarrow{P} X$ implies $\mathbb{P}(|X_n - X| > 1/m) \to 0$ as $n \to \infty$, so this probability eventually drops below $2^{-m}$.
Now consider the sum of these probabilities:
$$\sum_{m=1}^\infty \mathbb{P}\left(|X_{n_m} - X| > \frac{1}{m}\right) \le \sum_{m=1}^\infty 2^{-m} = 1 < \infty$$

By the First Borel–Cantelli Lemma:
Since the sum of probabilities is finite, the probability that the events $A_m = \{ |X_{n_m} - X| > 1/m \}$ happen infinitely often is $0$.
$$\mathbb{P}(A_m \text{ i.o.}) = 0$$

Equivalently, almost surely, $|X_{n_m} - X| > 1/m$ holds for only finitely many $m$.
Thus, for almost every $\omega$, there exists an $m_0(\omega)$ such that for all $m \ge m_0$:
$$|X_{n_m}(\omega) - X(\omega)| \le \frac{1}{m}$$

Since $1/m \to 0$, this implies $X_{n_m} \xrightarrow{a.s.} X$. $\square$
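Continuing the typewriter example (again an assumption, not necessarily the notes' version), the proof's recursion can be carried out explicitly. Since $X_n \in \{0, 1\}$, we have $\mathbb{P}(|X_n - 0| > 1/m) = 2^{-\lfloor \log_2 n \rfloor}$ for every $m \ge 2$, and the construction picks out essentially $n_m = 2^m$:

```python
import numpy as np

def tail_prob(n, m):
    """P(|X_n - 0| > 1/m) for the typewriter sequence: X_n is a 0/1
    indicator of an interval of length 2**(-floor(log2 n))."""
    if m == 1:  # |X_n| > 1 is impossible for a 0/1 variable
        return 0.0
    return 2.0 ** -int(np.floor(np.log2(n)))

# The proof's recursion: n_m = inf{n > n_{m-1} : P(|X_n - X| > 1/m) <= 2**-m}
n_prev, subseq = 0, []
for m in range(1, 11):
    n = n_prev + 1
    while tail_prob(n, m) > 2.0 ** -m:
        n += 1
    subseq.append(n)
    n_prev = n

print(subseq)  # [1, 4, 8, 16, ...]: essentially n_m = 2**m
# Along the subsequence, X_{n_m} is the indicator of [0, 2**-m), so for
# every u > 0 we get X_{n_m}(u) = 0 for all large m: a.s. convergence.
```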
Convergence in probability implies $L^1$ convergence if and only if the sequence is Uniformly Integrable (U.I.). This condition prevents “mass escaping to infinity”.
Essentially, the contribution to the expectation from the “tails” of the distribution goes to $0$ uniformly across all $X_n$: formally, $\sup_n \mathbb{E}\left[|X_n| \mathbb{1}_{\{|X_n| > K\}}\right] \to 0$ as $K \to \infty$.
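As a hedged illustration of this definition, the following Monte Carlo sketch uses the standard example $X_n = n \cdot \mathbb{1}_{\{U \le 1/n\}}$ with $U \sim \text{Uniform}[0,1]$ (my choice of example, not from the notes): the family is not U.I., because the tail contribution $\mathbb{E}[|X_n| \mathbb{1}_{\{|X_n| > K\}}]$ stays near $1$ no matter how large $K$ is.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)  # Monte Carlo sample of U ~ Uniform[0,1]

def tail_mass(xn, K):
    """Monte Carlo estimate of E[|X_n| 1{|X_n| > K}]."""
    return np.mean(np.abs(xn) * (np.abs(xn) > K))

# X_n = n * 1{U <= 1/n}: E|X_n| = 1 for every n, but the whole
# expectation sits on an event of probability 1/n -- mass escapes.
for K in [1, 10, 100]:
    sup_tail = max(tail_mass(n * (u <= 1 / n), K) for n in range(1, 501))
    print(f"K = {K:4d}   sup_n E[|X_n| 1{{|X_n| > K}}] ≈ {sup_tail:.3f}")
# The supremum does not shrink as K grows, so the family is not U.I.
```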
Theorem
If $X_n \xrightarrow{P} X$ and the sequence $\{X_n\}_{n \ge 1}$ is Uniformly Integrable, then:
$$X_n \xrightarrow{L^1} X$$

Without loss of generality, assume $X \equiv 0$ (otherwise consider $Y_n = X_n - X$; note that U.I. of $\{X_n\}$ together with integrability of $X$ implies U.I. of $\{X_n - X\}$).
We want to show $\mathbb{E}[|X_n|] \to 0$.
Fix any $\epsilon > 0$.
Tail Control (U.I.):

Since $\{X_n\}$ is U.I., there exists a threshold $K > 0$ (large enough) such that for all $n$:

$$\mathbb{E}\left[ |X_n| \mathbb{1}_{\{|X_n| > K\}} \right] \le \epsilon \quad \text{(1)}$$
Probability Control:

Since $X_n \xrightarrow{P} 0$, for the fixed $K$ above, there exists an $N$ such that for all $n \ge N$:

$$\mathbb{P}(|X_n| > \epsilon) \le \frac{\epsilon}{K} \quad \text{(2)}$$
(Note: proofs of this result split the events in various ways; here we follow the standard epsilon-split used in the notes.)
Refined Split:

Decompose the expectation $\mathbb{E}[|X_n|]$ into three regions based on the value of $|X_n|$:
Small: $|X_n| \le \epsilon$
Medium: $\epsilon < |X_n| \le K$
Large: $|X_n| > K$
$$\begin{aligned}
\mathbb{E}[|X_n|] &= \int_{|X_n| \le \epsilon} |X_n| \, d\mathbb{P} + \int_{\epsilon < |X_n| \le K} |X_n| \, d\mathbb{P} + \int_{|X_n| > K} |X_n| \, d\mathbb{P} \\
&\le \int \epsilon \, d\mathbb{P} + \int K \cdot \mathbb{1}_{\{|X_n| > \epsilon\}} \, d\mathbb{P} + \epsilon \quad \text{(using (1))} \\
&\le \epsilon + K \cdot \mathbb{P}(|X_n| > \epsilon) + \epsilon
\end{aligned}$$

Now apply the probability control (2): for $n \ge N$, $\mathbb{P}(|X_n| > \epsilon) \le \epsilon/K$, so
$$\mathbb{E}[|X_n|] \le \epsilon + K \left(\frac{\epsilon}{K}\right) + \epsilon = 3\epsilon$$

Since $\epsilon$ was arbitrary, $\mathbb{E}[|X_n|] \to 0$. Thus $X_n \xrightarrow{L^1} 0$. $\square$
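To see the theorem, and the necessity of U.I., numerically, here is a small Monte Carlo contrast (again with hypothetical toy families, not taken from the notes): both $\sqrt{n}\,\mathbb{1}_{\{U \le 1/n\}}$ (U.I.) and $n\,\mathbb{1}_{\{U \le 1/n\}}$ (not U.I.) converge to $0$ in probability, but only the first converges in $L^1$.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=500_000)  # Monte Carlo sample of U ~ Uniform[0,1]

# Both families satisfy P(|X_n| > eps) <= 1/n -> 0 (convergence in
# probability to 0), but E|X_n| behaves very differently.
for n in [10, 100, 1000, 10_000]:
    ui     = np.sqrt(n) * (u <= 1 / n)  # U.I.: E|X_n| = 1/sqrt(n) -> 0
    not_ui = n * (u <= 1 / n)           # not U.I.: E|X_n| = 1 for all n
    print(f"n = {n:6d}   E|X_n| (U.I.) ≈ {ui.mean():.4f}   "
          f"E|X_n| (not U.I.) ≈ {not_ui.mean():.4f}")
```

The second column stays pinned near $1$: that sequence converges to $0$ in probability yet not in $L^1$, exactly the failure mode the U.I. hypothesis rules out.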
The following diagram summarizes the hierarchy of convergence modes.