Also called convergence in distribution.
Definition
A sequence of random variables $\{X_n\}$ with cumulative distribution functions $F_n$ is said to converge in distribution (or converge weakly) to a random variable $X$ with distribution function $F$ if
$$\lim_{n \to \infty} F_n(x) = F(x)$$
for all points $x$ where $F$ is continuous.
Notation: $X_n \Rightarrow X$ or $X_n \xrightarrow{d} X$.
Remark
Convergence in distribution is fundamentally different from other modes because it describes the convergence of the laws (distributions), not the random variables themselves, i.e., it is not about the relation between $X_n(\omega)$ and $X(\omega)$, but only about their distributions.
It does not imply that $X_n$ and $X$ are close to each other in value.
It does not even require $X_n$ and $X$ to be defined on the same probability space!
It only asserts that for large $n$, the statistical behavior of $X_n$ is modeled by $F$.
This is the classic "Law of Rare Events": if we have a sequence of Binomial distributions where the number of trials $n$ goes to infinity and the probability of success $p_n$ goes to $0$ such that $n p_n \to \lambda$, the distribution converges to a Poisson distribution.
Let $X_n \sim \text{Bin}(n, p_n)$ with $p_n = \lambda/n$.
The probability mass function (PMF) for a fixed $k$ is:
$$\begin{aligned}
\mathbb{P}(X_n = k) &= \binom{n}{k} p_n^k (1-p_n)^{n-k} \\
&= \frac{n(n-1)\dots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!} \underbrace{\frac{n(n-1)\dots(n-k+1)}{n^k}}_{\to 1} \underbrace{\left(1-\frac{\lambda}{n}\right)^n}_{\to e^{-\lambda}} \underbrace{\left(1-\frac{\lambda}{n}\right)^{-k}}_{\to 1}
\end{aligned}$$
Taking the limit as $n \to \infty$:
$$\lim_{n \to \infty} \mathbb{P}(X_n = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
This is the PMF of a Poisson$(\lambda)$ distribution, and for integer-valued random variables pointwise convergence of the PMFs implies convergence of the CDFs. Thus $X_n \xrightarrow{d} \text{Pois}(\lambda)$.
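As a numerical sanity check (not part of the proof), the sketch below compares the $\text{Bin}(n, \lambda/n)$ PMF with the $\text{Pois}(\lambda)$ PMF; the choices $\lambda = 2$ and the range $k < 10$ are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    # P(Bin(n, p) = k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # P(Pois(lam) = k)
    return lam**k * exp(-lam) / factorial(k)

lam = 2.0
# With p_n = lam/n, the Binomial PMF should approach the Poisson PMF as n grows.
for n in (10, 100, 10_000):
    err = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k)) for k in range(10))
    print(f"n = {n:>6}: max |Bin - Pois| over k < 10 is {err:.2e}")
```

The maximum pointwise difference shrinks roughly like $1/n$, consistent with the limit computed above.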
A classic example is the convergence of a scaled Geometric distribution to an Exponential distribution.
Consider a sequence of independent Bernoulli trials with success probability $p$.
Let $X_p$ be the number of trials until the first success. This follows a Geometric distribution with parameter $p$:
$$\begin{aligned}
\mathbb{P}(X_p = n) &= p(1-p)^{n-1} \\
\mathbb{P}(X_p > n) &= (1-p)^n
\end{aligned}$$
for $n = 1, 2, \dots$. Note that $X_p > n$ means the first $n$ trials were failures.
As $p \to 0$, we analyze the distribution function of $p X_p$, i.e., $\mathbb{P}(p X_p \le x)$.
$$\begin{aligned}
\lim_{p \to 0} \mathbb{P}(p X_p > x) &= \lim_{p \to 0} \mathbb{P}\left(X_p > \frac{x}{p}\right) \\
&= \lim_{p \to 0} (1-p)^{x/p}
\end{aligned}$$
Recall the limit definition of the exponential function: $\lim_{m \to \infty} (1 - \frac{1}{m})^m = e^{-1}$. To apply it, make the substitution $m = 1/p$; as $p \to 0$, $m \to \infty$.
$$\lim_{p \to 0} (1-p)^{x/p} = \lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^{mx} = \left[ \lim_{m \to \infty} \left(1 - \frac{1}{m}\right)^m \right]^x = e^{-x}$$
Thus:
$$\lim_{p \to 0} \mathbb{P}(p X_p \le x) = 1 - e^{-x} \quad \text{for all } x > 0$$
This is precisely the CDF of an Exponential distribution with rate parameter $\lambda = 1$:
$$p X_p \xrightarrow{d} \text{Exp}(1) \quad \text{as } p \to 0$$
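The tail computation above can be checked numerically. In the sketch below (an illustrative setup, not from the text), the tail is evaluated at the integer $\lfloor x/p \rfloor$ since $X_p$ is integer-valued; $x = 1.5$ is an arbitrary choice.

```python
from math import exp, floor

def scaled_geom_tail(p, x):
    # P(p * X_p > x) = P(X_p > x/p): the first floor(x/p) trials all fail.
    return (1 - p) ** floor(x / p)

x = 1.5
# As p -> 0, the tail of p * X_p approaches the Exp(1) tail e^{-x}.
for p in (0.1, 0.01, 0.0001):
    print(f"p = {p:<7}: P(p*X_p > {x}) = {scaled_geom_tail(p, x):.6f}   e^-x = {exp(-x):.6f}")
```

For small $p$ the two values agree to several decimal places, matching the limit $(1-p)^{x/p} \to e^{-x}$.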
We know that convergence in probability is stronger than convergence in distribution. Here we provide a proof.
Proposition
Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables.
If $X_n \xrightarrow{P} X$, then $X_n \xrightarrow{d} X$.
Let $F_n$ and $F$ be the CDFs of $X_n$ and $X$ respectively.
Let $a$ be any point where $F$ is continuous. We want to show that $F_n(a) \to F(a)$.
Let $\epsilon > 0$. Since $F$ is continuous at $a$, there exists $\delta > 0$ such that:
$$F(a) - \epsilon < F(a-\delta) \le F(a) \le F(a+\delta) < F(a) + \epsilon \quad\textcolor{gray}{\text{---(1.1)}}$$
Since $X_n \xrightarrow{P} X$, there exists $N_{\epsilon} > 0$ such that:
$$\mathbb{P}(|X_n - X| > \delta) < \epsilon \quad \text{for all } n \ge N_{\epsilon} \quad\textcolor{gray}{\text{---(1.2)}}$$
Now, for such $n$, consider the event $\{X_n \le a\}$. We can decompose it as:
$$\begin{aligned}
F_n(a) &= \mathbb{P}(X_n \le a) \\
&= \mathbb{P}(X_n \le a, |X_n - X| \le \delta) + \mathbb{P}(X_n \le a, |X_n - X| > \delta)
\end{aligned}$$
Upper Bound
The second probability term is bounded by $\epsilon$ from $(1.2)$:
$$\mathbb{P}(X_n \le a, |X_n - X| > \delta) \le \mathbb{P}(|X_n - X| > \delta) < \epsilon$$
For the first term, observe the implication:
$$X_n \le a \text{ and } |X_n - X| \le \delta \implies X \le a + \delta$$
This gives the inclusion of events $\{X_n \le a, |X_n - X| \le \delta\} \subseteq \{X \le a+\delta\}$. Thus:
$$\mathbb{P}(X_n \le a, |X_n - X| \le \delta) \le \mathbb{P}(X \le a+\delta) = F(a+\delta)$$
Combining these and using the continuity bound $(1.1)$:
$$F_n(a) < F(a+\delta) + \epsilon < (F(a) + \epsilon) + \epsilon = F(a) + 2\epsilon$$
Lower Bound
To find a lower bound for the first term $\mathbb{P}(X_n \le a, |X_n - X| \le \delta)$, consider the reverse implication:
$$X \le a-\delta \text{ and } |X_n - X| \le \delta \implies X_n \le a$$
This gives the inclusion $\{X \le a-\delta, |X_n - X| \le \delta\} \subseteq \{X_n \le a, |X_n - X| \le \delta\}$.
Now, write $\mathbb{P}(X \le a-\delta)$ as the sum of disjoint events based on the condition $|X_n - X| \le \delta$:
$$\mathbb{P}(X \le a-\delta) = \mathbb{P}(X \le a-\delta, |X_n - X| \le \delta) + \underbrace{\mathbb{P}(X \le a-\delta, |X_n - X| > \delta)}_{\le \mathbb{P}(|X_n - X| > \delta) < \epsilon}$$
Rearranging this inequality:
$$\mathbb{P}(X \le a-\delta, |X_n - X| \le \delta) \ge F(a-\delta) - \epsilon$$
Since $F_n(a)$ is at least the first term of the decomposition (the second term is non-negative), the inclusion above gives:
$$F_n(a) \ge \mathbb{P}(X \le a-\delta, |X_n - X| \le \delta) \ge F(a-\delta) - \epsilon$$
Using the continuity bound $(1.1)$ again:
$$F_n(a) > (F(a) - \epsilon) - \epsilon = F(a) - 2\epsilon$$
Conclusion
Combining both bounds, for $n \ge N_{\epsilon}$:
$$F(a) - 2\epsilon < F_n(a) < F(a) + 2\epsilon$$
Since $\epsilon$ is arbitrary, $F_n(a) \to F(a)$.
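The proposition can also be seen numerically. The sketch below uses an assumed setup (not from the text): $X \sim N(0,1)$ and $X_n = X + Z/n$ with independent $Z \sim N(0,1)$, so $|X_n - X| \to 0$ in probability, and the empirical $F_n(a)$ should approach $\Phi(a)$.

```python
import random
from statistics import NormalDist

random.seed(42)
N = 200_000
a = 0.5  # Phi is continuous everywhere, so any point works
x_samples = [random.gauss(0, 1) for _ in range(N)]

# X_n = X + Z/n: the perturbation shrinks, so F_n(a) -> F(a) = Phi(a).
for n in (1, 5, 50):
    f_n_a = sum(x + random.gauss(0, 1) / n <= a for x in x_samples) / N
    print(f"n = {n:>3}: empirical F_n({a}) = {f_n_a:.4f}")
print(f"F({a}) = {NormalDist().cdf(a):.4f}")
```

For $n = 1$ the extra noise visibly distorts the CDF, while by $n = 50$ the empirical $F_n(a)$ matches $\Phi(a)$ to within Monte Carlo error.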
Does $X_n \xrightarrow{d} X \implies X_n \xrightarrow{P} X$?
Generally no, because convergence in distribution does not require the variables to be close in value (or even defined on the same space).
However, if the limit is a constant, the converse holds.
Proposition
Let $\{X_n\}_{n \ge 1}$ be a sequence of random variables.
If $X_n \xrightarrow{d} c$ for some constant $c$, then $X_n \xrightarrow{P} c$.
For any $\epsilon > 0$, we want to show that $\mathbb{P}(|X_n - c| \le \epsilon) \to 1$.
$$\begin{aligned}
\mathbb{P}(|X_n - c| \le \epsilon) &= \mathbb{P}(c - \epsilon \le X_n \le c + \epsilon) \\
&\ge \mathbb{P}(c - \epsilon < X_n \le c + \epsilon) \\
&= F_n(c + \epsilon) - F_n(c - \epsilon)
\end{aligned}$$
Consider the limit distribution $F$ of the constant random variable $X \equiv c$:
$$F(x) = \begin{cases} 0 & x < c \\ 1 & x \ge c \end{cases} = \mathbb{1}_{\{x \ge c\}}$$
This function $F$ is continuous at all points except $x = c$.
Therefore, for any $\epsilon > 0$, $F$ is continuous at $c + \epsilon$ and $c - \epsilon$.
By definition of weak convergence:
$$\begin{aligned}
F_n(c + \epsilon) &\to F(c + \epsilon) = 1 \\
F_n(c - \epsilon) &\to F(c - \epsilon) = 0
\end{aligned}$$
Thus:
$$\lim_{n \to \infty} \mathbb{P}(|X_n - c| \le \epsilon) \ge 1 - 0 = 1$$
Since probabilities cannot exceed $1$, the limit is exactly $1$. Hence $X_n \xrightarrow{P} c$.
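A quick numerical illustration under an assumed model (my choice, not from the text): $X_n \sim N(c, 1/n)$ converges in distribution to the constant $c$, and the simulated probabilities $\mathbb{P}(|X_n - c| \le \epsilon)$ approach $1$ as the proposition predicts.

```python
import random

random.seed(7)
c, eps, N = 3.0, 0.1, 100_000  # illustrative constants

# X_n ~ Normal(c, variance 1/n) concentrates at c, so it converges in
# distribution to c, and hence (by the proposition) in probability.
for n in (1, 100, 10_000):
    sd = (1 / n) ** 0.5  # random.gauss takes the standard deviation
    prob = sum(abs(random.gauss(c, sd) - c) <= eps for _ in range(N)) / N
    print(f"n = {n:>6}: P(|X_n - c| <= {eps}) = {prob:.4f}")
```

The degenerate limit is what makes the converse work here: both $F_n(c+\epsilon)$ and $F_n(c-\epsilon)$ are evaluated at continuity points of the step-function CDF.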