Here we define three standard modes of convergence for random variables: almost sure convergence, convergence in $L^p$ (mean), and convergence in probability. Convergence in distribution (weak convergence) is treated separately due to its depth.
Definition
A sequence of random variables $X_1, X_2, \dots$ is said to converge almost surely (a.s.) to a random variable $X$ if:
$$\mathbb{P}\left( \lim_{n \to \infty} X_n = X \right) = 1$$
Denoted by $X_n \xrightarrow{a.s.} X$.
Equivalent formulations:
- $\mathbb{P}(\{ \omega : X_n(\omega) \to X(\omega) \}) = 1$
- $\mathbb{P}(\{ \omega : \lim_{n \to \infty} X_n(\omega) \neq X(\omega) \}) = 0$

Essentially, the set of sample paths $\omega$ on which the sequence fails to converge has probability zero.
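As a quick illustration (not a proof), one can simulate sample paths of the running average of fair coin flips, which converges a.s. to $1/2$ by the strong law of large numbers. This is a minimal Python sketch; the path count, horizon, and tolerance are arbitrary choices of mine:

```python
import random

random.seed(0)

def sample_path(n_max):
    """One path of X_n = (fraction of heads among the first n fair coin flips)."""
    heads = 0
    path = []
    for n in range(1, n_max + 1):
        heads += random.random() < 0.5
        path.append(heads / n)
    return path

# The strong law of large numbers gives X_n -> 1/2 almost surely.
# Empirically: every simulated path's tail (n >= 5000) stays within eps of 1/2.
eps = 0.05
paths = [sample_path(10_000) for _ in range(50)]
tail_ok = all(abs(x - 0.5) < eps for p in paths for x in p[5000:])
print(tail_ok)
```

Of course, finitely many paths over a finite horizon can only illustrate the definition, not verify it.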
Convergence in $L^p$ is also called mean convergence. Let's look at some context to get a better understanding of the definition.
$X \in L^p$ means $\mathbb{E}[|X|^p] < \infty$.

Norm: in a general measure space, $\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}$.

In a probability space ($L^p(\Omega, \mathcal{F}, \mathbb{P})$), recall that since expectation is Lebesgue integration with respect to the probability measure, this becomes:
$$\|X\|_p = \left( \mathbb{E}[|X|^p] \right)^{1/p}$$
Thus, $X_n \xrightarrow{L^p} X \iff \|X_n - X\|_p \to 0$, i.e. $\mathbb{E}[|X_n - X|^p] \to 0$.
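To make the norm concrete, here is a small Monte Carlo sketch (helper name is mine) that estimates $\|X\|_p$ for $X \sim \mathrm{Uniform}[0,1]$ and compares it with the exact value $\|X\|_p = (1/(p+1))^{1/p}$, which follows from $\mathbb{E}[X^p] = \int_0^1 x^p \, dx = 1/(p+1)$:

```python
import random

random.seed(1)

def lp_norm(sample, p):
    """Empirical L^p norm (E[|X|^p])^(1/p), estimated from a sample."""
    return (sum(abs(x) ** p for x in sample) / len(sample)) ** (1 / p)

# For X ~ Uniform[0,1]: E[X^p] = 1/(p+1), so ||X||_p = (1/(p+1))^(1/p).
sample = [random.random() for _ in range(200_000)]
for p in (1, 2, 4):
    exact = (1 / (p + 1)) ** (1 / p)
    print(p, round(lp_norm(sample, p), 3), round(exact, 3))
```

Notice the estimates increase with $p$; this monotonicity of $\|\cdot\|_p$ on a probability space is exactly what the $L^q \Rightarrow L^p$ theorem below exploits.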
Exercise 11
Prove that $\mathbb{E}[|X_n - X|] \to 0 \implies \mathbb{E}[X_n] \to \mathbb{E}[X]$.

Find an example where the reverse direction is not true.

For the first part, use the fact that $|\mathbb{E}[X_n] - \mathbb{E}[X]| = |\mathbb{E}[X_n - X]| \le \mathbb{E}[|X_n - X|]$.

For the second part, look for a sequence whose expectations match $\mathbb{E}[X]$ even though the variables themselves stay far from $X$.
Theorem
Almost sure convergence is stronger than convergence in probability:
$$X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X$$
Proof. Fix $\epsilon > 0$. Define the sets $A_n = \{ |X_m - X| \le \epsilon \text{ for all } m \ge n \}$.

Notice that the sequence $\{A_n\}$ is non-decreasing ($A_n \subseteq A_{n+1}$).

The limit set is:
$$\lim_n A_n = \bigcup_n A_n = \{ \exists n \text{ s.t. } |X_m - X| \le \epsilon \text{ for all } m \ge n \}$$
This set contains the event $\{ \lim_{n \to \infty} X_n = X \}$.

Since $X_n \xrightarrow{a.s.} X$, we have $\mathbb{P}(\lim X_n = X) = 1$, which implies $\mathbb{P}(\lim_n A_n) = 1$.

By continuity of probability for non-decreasing sets:
$$\lim_{n \to \infty} \mathbb{P}(A_n) = \mathbb{P}(\lim_n A_n) = 1$$
On the other hand, notice that $A_n \subseteq \{ |X_n - X| \le \epsilon \}$.

Thus:
$$\mathbb{P}(|X_n - X| \le \epsilon) \ge \mathbb{P}(A_n) \to 1$$
This means $\mathbb{P}(|X_n - X| > \epsilon) \to 0$, so $X_n \xrightarrow{P} X$. $\square$
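The continuity step used in the proof can be seen in a toy computation. With Lebesgue measure on $[0,1]$ and the non-decreasing sets $B_n = [0, 1 - 1/n]$, whose union is $[0,1)$, exact rational arithmetic shows $\mathbb{P}(B_n)$ climbing to $\mathbb{P}(\lim_n B_n) = 1$ (the $B_n$ here are my own example, not the $A_n$ from the proof):

```python
from fractions import Fraction

# Continuity from below: B_n = [0, 1 - 1/n] is non-decreasing with union [0, 1),
# so P(B_n) must converge to P(union of the B_n) = 1.
def prob_B(n):
    return 1 - Fraction(1, n)  # Lebesgue measure of [0, 1 - 1/n]

probs = [prob_B(n) for n in (1, 2, 10, 1000)]
print([float(p) for p in probs])  # -> [0.0, 0.5, 0.9, 0.999]
```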
Theorem
Convergence in mean ($L^p$) is stronger than convergence in probability:
$$X_n \xrightarrow{L^p} X \implies X_n \xrightarrow{P} X$$
Proof. For any $\epsilon > 0$:
$$\mathbb{P}(|X_n - X| > \epsilon) = \mathbb{P}(|X_n - X|^p > \epsilon^p)$$
By Markov's inequality:
$$\mathbb{P}(|X_n - X|^p > \epsilon^p) \le \frac{\mathbb{E}[|X_n - X|^p]}{\epsilon^p}$$
Since $X_n \xrightarrow{L^p} X$, the numerator goes to $0$ as $n \to \infty$, so the probability goes to $0$. $\square$
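The Markov step is easy to sanity-check numerically. In the sketch below (helper name is mine), $D_n = U/n$ with $U \sim \mathrm{Uniform}[0,1]$ stands in for $|X_n - X|$; the empirical $\mathbb{P}(D_n > \epsilon)$ never exceeds the bound $\mathbb{E}[D_n^p]/\epsilon^p$, and both tend to $0$:

```python
import random

random.seed(2)

def markov_check(sample, eps, p):
    """Empirical P(|D| > eps) next to its Markov bound E[|D|^p] / eps^p."""
    n = len(sample)
    prob = sum(abs(d) > eps for d in sample) / n
    bound = sum(abs(d) ** p for d in sample) / n / eps ** p
    return prob, bound

# D_n = U / n stands in for |X_n - X|: E[D_n^p] -> 0,
# so the Markov bound forces P(D_n > eps) -> 0.
eps = 0.1
for n in (1, 10, 100):
    sample = [random.random() / n for _ in range(100_000)]
    prob, bound = markov_check(sample, eps, p=2)
    assert prob <= bound + 1e-12  # Markov's inequality holds sample-wise
    print(n, round(prob, 4), round(bound, 4))
```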
Theorem
If $1 \le p < q < \infty$, then convergence in a higher moment implies convergence in a lower moment:
$$X_n \xrightarrow{L^q} X \implies X_n \xrightarrow{L^p} X$$
This assumes we are in a finite measure space (such as a probability space).

Proof. We want to bound $\mathbb{E}[|X_n - X|^p]$. Fix $0 < \epsilon < 1$.

Split the expectation based on the size of the difference:
$$\mathbb{E}[|X_n - X|^p] = \mathbb{E}\left[ |X_n - X|^p \mathbb{1}_{\{|X_n - X| \ge \epsilon\}} \right] + \mathbb{E}\left[ |X_n - X|^p \mathbb{1}_{\{|X_n - X| < \epsilon\}} \right]$$
Analyze the two terms:
- Large difference ($|X_n - X| \ge \epsilon$): here $|X_n - X|^p = |X_n - X|^q \cdot |X_n - X|^{p-q}$. Since $|X_n - X| \ge \epsilon$ and $p - q < 0$, we have $|X_n - X|^{p-q} \le \epsilon^{p-q}$, hence $|X_n - X|^p \le \epsilon^{p-q} |X_n - X|^q$.
- Small difference ($|X_n - X| < \epsilon$): here $|X_n - X|^p < \epsilon^p$.

Combining these bounds:
$$\mathbb{E}[|X_n - X|^p] \le \epsilon^{p-q}\, \mathbb{E}\left[ |X_n - X|^q \mathbb{1}_{\{|X_n - X| \ge \epsilon\}} \right] + \epsilon^p$$
Dropping the indicator (which only makes the expectation larger):
$$\le \epsilon^{p-q}\, \mathbb{E}[|X_n - X|^q] + \epsilon^p$$
Now take limits:
$$\limsup_{n \to \infty} \mathbb{E}[|X_n - X|^p] \le \epsilon^{p-q} \underbrace{\limsup_{n \to \infty} \mathbb{E}[|X_n - X|^q]}_{0 \ (L^q \text{ convergence})} + \epsilon^p = \epsilon^p$$
Since $\epsilon$ can be arbitrarily small, the limit must be $0$. Thus $X_n \xrightarrow{L^p} X$. $\square$
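The inequality behind this theorem, $\|Y\|_p \le \|Y\|_q$ for $p < q$ on a probability space, also holds exactly for any empirical distribution (it is a probability measure on the sample points), so it can be checked directly. A sketch with my own helper; the Gaussian sample is an arbitrary choice:

```python
import random

random.seed(3)

def lp_norm(sample, p):
    """Empirical L^p norm (E[|Y|^p])^(1/p) of a sample."""
    return (sum(abs(y) ** p for y in sample) / len(sample)) ** (1 / p)

# On a probability space, ||Y||_p <= ||Y||_q whenever p < q (Jensen),
# which is why L^q convergence of X_n - X forces L^p convergence.
sample = [random.gauss(0, 1) for _ in range(50_000)]
norms = [lp_norm(sample, p) for p in (1, 2, 3, 4)]
print([round(v, 3) for v in norms])
assert norms == sorted(norms)  # monotone in p, exactly, for any sample
```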
In general, convergence in probability does not imply almost sure convergence or $L^p$ convergence.
The following example shows that convergence in probability does not imply almost sure convergence.
Consider $\Omega = [0,1]$ with Lebesgue measure. Let $X = 0$.

Define $X_n$ as indicator functions of intervals that "scan" across $[0,1]$ repeatedly:
- $X_1 = \mathbb{1}_{[0,1]}$
- $X_2 = \mathbb{1}_{[0,1/2]}, \quad X_3 = \mathbb{1}_{[1/2,1]}$
- $X_4 = \mathbb{1}_{[0,1/3]}, \quad X_5 = \mathbb{1}_{[1/3,2/3]}, \quad X_6 = \mathbb{1}_{[2/3,1]}$
- And so on.

Convergence in probability: Yes. The measure of the support of $X_n$ is $1/k$ during the $k$-th scan, and $k \to \infty$ as $n \to \infty$. Thus $\mathbb{P}(|X_n| > \epsilon) \to 0$ for every $\epsilon > 0$.

Almost sure: No. Any fixed point $\omega \in [0,1]$ is covered by the moving interval at least once per scan, hence infinitely often. Thus $\lim X_n(\omega)$ does not exist for any $\omega$: the sequence $X_n(\omega)$ takes the value $1$ infinitely often and the value $0$ infinitely often.
$$X_n \xrightarrow{P} 0 \quad \text{but} \quad X_n \not\xrightarrow{a.s.} 0$$
Note: this sequence does converge in $L^p$, since $\mathbb{E}[|X_n|^p] = \text{length of the interval} \to 0$.
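This "typewriter" sequence can be built explicitly. The sketch below (helper names are mine) lists the intervals scan by scan with exact rational endpoints, checks that the lengths shrink to $0$, and that a fixed $\omega$ keeps getting covered:

```python
from fractions import Fraction

def typewriter_intervals(max_k):
    """Intervals [(j-1)/k, j/k] for k = 1..max_k, j = 1..k, in scan order."""
    out = []
    for k in range(1, max_k + 1):
        for j in range(1, k + 1):
            out.append((Fraction(j - 1, k), Fraction(j, k)))
    return out

intervals = typewriter_intervals(50)

# Convergence in probability: the interval lengths 1/k shrink to 0.
lengths = [b - a for a, b in intervals]
print(float(lengths[-1]))  # last length = 1/50 = 0.02

# No a.s. convergence: a fixed omega is covered at least once per scan,
# hence infinitely often, so X_n(omega) = 1 for infinitely many n.
omega = Fraction(1, 3)
hits = sum(a <= omega <= b for a, b in intervals)
print(hits >= 50)  # covered in every one of the 50 scans
```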
The following example shows that convergence in probability does not imply L p L^p L p convergence.
Consider the same space. Let $X_n$ be tall, thin rectangles that maintain constant area:
$$X_n = n \cdot \mathbb{1}_{[0, 1/n]}$$

Convergence in probability: Yes. The support is $[0, 1/n]$, which has measure $1/n \to 0$:
$$\mathbb{P}(|X_n| > \epsilon) = 1/n \to 0 \quad \text{(for } n > \epsilon\text{)}$$

$L^p$ convergence: No.
$$\mathbb{E}[|X_n - 0|] = \int_0^{1/n} n \, dx = 1 \not\to 0$$
(For $p > 1$, $\mathbb{E}[|X_n|^p] = n^{p-1} \to \infty$.)
$$X_n \xrightarrow{P} 0 \quad \text{but} \quad X_n \not\xrightarrow{L^1} 0$$
Note: this sequence does converge almost surely to $0$. For any $\omega > 0$, eventually $1/n < \omega$, so $X_n(\omega) = 0$ for all large $n$. (Convergence fails only at $\omega = 0$, which has probability $0$.)
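The constant-area behavior can be verified exactly with rational arithmetic (helper name is mine):

```python
from fractions import Fraction

def mass_and_mean(n):
    """For X_n = n * 1_[0,1/n] on [0,1]: (P(|X_n| > eps), E[X_n]), exactly."""
    support = Fraction(1, n)  # P(|X_n| > eps) for any 0 < eps < n
    mean = n * support        # E[X_n] = n * (1/n) = 1 for every n
    return support, mean

for n in (1, 10, 1000):
    support, mean = mass_and_mean(n)
    print(n, float(support), float(mean))
# The support mass shrinks to 0 (convergence in probability)
# while E[X_n] stays fixed at 1 (no L^1 convergence).
```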