Also known as Lévy’s Continuity Theorem. It identifies weak convergence of probability measures with pointwise convergence of their characteristic functions, turning a question about distributions into a question about complex-valued functions of a real variable.
Before stating the theorem, we record a basic fact used in its proof.
Proposition
Let $\varphi$ be the characteristic function of a probability measure $\mu$. Then $\varphi$ is continuous on $\mathbb{R}$.
Let $X \sim \mu$, so $\varphi(t) = \mathbb{E}[e^{itX}]$. Fix $t \in \mathbb{R}$ and let $s \to t$. Then:
Pointwise convergence: As $s \to t$, $e^{isX} \to e^{itX}$ pointwise on $\Omega$, since the map $s \mapsto e^{isx}$ is continuous for every $x \in \mathbb{R}$.
Domination: $|e^{isX}| = 1$, and the constant $1$ is integrable since $\mu$ is a probability measure.
By the dominated convergence theorem, $\varphi(s) = \mathbb{E}[e^{isX}] \to \mathbb{E}[e^{itX}] = \varphi(t)$.
Theorem (Lévy)
Let $\{\mu_n\}_{n \ge 1}$ and $\mu$ be probability measures on $(\mathbb{R}, \mathcal{B})$ with characteristic functions $\varphi_n, \varphi$. Then
$$
\mu_n \Rightarrow \mu \quad \Longleftrightarrow \quad \varphi_n(t) \to \varphi(t) \text{ for all } t \in \mathbb{R}.
$$
In short: weak convergence of measures is the same as pointwise convergence of characteristic functions.
The forward direction is a one-line application of Portmanteau. The reverse direction is the substantive content: pointwise convergence of $\varphi_n$ to the characteristic function $\varphi$ contains enough information to force tightness, after which Helly's selection theorem and the inversion formula close the argument.
(⟹) Weak convergence implies pointwise convergence of ch.f.
Let $X_n \sim \mu_n$ and $X \sim \mu$. The map $x \mapsto e^{itx}$ is bounded ($|e^{itx}| = 1$) and continuous. By the Portmanteau theorem (condition 2: expectations of bounded continuous functions converge):
$$
\mathbb{E}[e^{itX_n}] \to \mathbb{E}[e^{itX}],
$$
which is exactly $\varphi_n(t) \to \varphi(t)$ for every $t \in \mathbb{R}$.
(Strictly speaking, $e^{itx}$ is complex-valued. Apply Portmanteau separately to the real part $\cos(tx)$ and the imaginary part $\sin(tx)$, both of which are bounded continuous real-valued functions.)
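As a numerical sanity check of the forward direction, the sketch below takes $\mu_n$ to be the uniform distribution on the grid $\{0, 1/n, \dots, (n-1)/n\}$, which converges weakly to the uniform distribution on $[0, 1]$. The choice of distributions and the helper names `phi_n` and `phi_limit` are illustrative assumptions, not part of the argument above.

```python
import cmath


def phi_n(t: float, n: int) -> complex:
    """Characteristic function of the uniform distribution on {0, 1/n, ..., (n-1)/n}."""
    return sum(cmath.exp(1j * t * k / n) for k in range(n)) / n


def phi_limit(t: float) -> complex:
    """Characteristic function of Uniform[0, 1]: (e^{it} - 1) / (it), equal to 1 at t = 0."""
    if t == 0.0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * t) - 1.0) / (1j * t)


if __name__ == "__main__":
    t = 2.0  # any fixed real t works; 2.0 is an arbitrary choice
    for n in (10, 100, 1000, 10000):
        # The gap |phi_n(t) - phi(t)| should shrink as n grows.
        print(n, abs(phi_n(t, n) - phi_limit(t)))
```

The printed gap should shrink as $n$ grows, roughly like $|t|/n$, since `phi_n` is a left Riemann sum for the integral defining `phi_limit`.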
(⟸) Pointwise convergence of ch.f. implies weak convergence.
This direction has three steps: derive a tail bound from $\varphi_n$, conclude tightness, then identify every subsequential weak limit with $\mu$.
Tail bound from the characteristic function (Fubini trick).
Let $X_n \sim \mu_n$. For any $u > 0$, compute
$$
\frac{1}{u} \int_{-u}^{u} \big(1 - \varphi_n(t)\big) \, dt
= \frac{1}{u} \int_{-u}^{u} \mathbb{E}\big[1 - e^{itX_n}\big] \, dt.
$$
The integrand is bounded by $2$ on a bounded interval, so by Fubini we may swap:
$$
\frac{1}{u} \int_{-u}^{u} \big(1 - \varphi_n(t)\big) \, dt
= \mathbb{E}\left[\frac{1}{u} \int_{-u}^{u} \big(1 - e^{itX_n}\big) \, dt\right]
= \mathbb{E}\left[2 - \frac{2\sin(uX_n)}{uX_n}\right]
= 2\,\mathbb{E}\left[1 - \frac{\sin(uX_n)}{uX_n}\right],
$$
where the inner integral $\int_{-u}^{u} e^{itx} \, dt = \frac{2 \sin(ux)}{x}$ was evaluated directly (the imaginary part $\sin(tx)$ is odd in $t$ and integrates to zero), with the convention $\sin(0)/0 = 1$ on the event $\{X_n = 0\}$.
Now $|\sin y / y| \le 1$ always, and for $|y| \ge 2$ we have $|\sin y / y| \le 1/|y| \le 1/2$. Restricting to the event $\{|X_n| \ge 2/u\}$, i.e., $\{|uX_n| \ge 2\}$:
$$
2\,\mathbb{E}\left[1 - \frac{\sin(uX_n)}{uX_n}\right]
\ge 2\,\mathbb{E}\left[\left(1 - \frac{1}{|uX_n|}\right) \mathbb{1}_{\{|X_n| \ge 2/u\}}\right]
\ge 2 \cdot \tfrac{1}{2} \cdot \mathbb{P}\!\left(|X_n| \ge \tfrac{2}{u}\right).
$$
The first inequality also used that the integrand $1 - \sin y / y$ is non-negative everywhere (because $|\sin y / y| \le 1$), so dropping the complement of $\{|X_n| \ge 2/u\}$ can only decrease the expectation. We have arrived at the bound
$$
\mathbb{P}\!\left(|X_n| \ge \tfrac{2}{u}\right) \le \frac{1}{u} \int_{-u}^{u} \big(1 - \varphi_n(t)\big) \, dt. \tag{1.1}
$$
Choosing $u$ small makes $2/u$ large, so $(1.1)$ controls the probability that $X_n$ falls outside a large interval.
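The bound $(1.1)$ can be checked in closed form for distributions whose characteristic function is known. The sketch below does this for two illustrative choices that are not part of the text above: the standard normal, with $\varphi(t) = e^{-t^2/2}$, and the standard Cauchy, with $\varphi(t) = e^{-|t|}$. The function names are hypothetical.

```python
import math
from typing import Tuple


def normal_sides(u: float) -> Tuple[float, float]:
    """Both sides of (1.1) for X ~ N(0, 1), where phi(t) = exp(-t^2 / 2)."""
    # Left side: P(|X| >= 2/u) = erfc((2/u) / sqrt(2)).
    lhs = math.erfc((2.0 / u) / math.sqrt(2.0))
    # Integral of phi over [-u, u] equals sqrt(2*pi) * erf(u / sqrt(2)).
    integral_phi = math.sqrt(2.0 * math.pi) * math.erf(u / math.sqrt(2.0))
    # Right side: (1/u) * integral of (1 - phi) over [-u, u].
    rhs = 2.0 - integral_phi / u
    return lhs, rhs


def cauchy_sides(u: float) -> Tuple[float, float]:
    """Both sides of (1.1) for standard Cauchy X, where phi(t) = exp(-|t|)."""
    # Left side: P(|X| >= a) = 1 - (2/pi) * atan(a) with a = 2/u.
    lhs = 1.0 - (2.0 / math.pi) * math.atan(2.0 / u)
    # Integral of phi over [-u, u] equals 2 * (1 - exp(-u)).
    rhs = 2.0 - 2.0 * (1.0 - math.exp(-u)) / u
    return lhs, rhs


if __name__ == "__main__":
    for u in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(u, normal_sides(u), cauchy_sides(u))
```

In both cases the first entry of each pair should not exceed the second. The Cauchy case is worth including because the bound requires no moment assumptions.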
Tightness.
Fix $\epsilon > 0$. By the preliminary proposition, $\varphi$ is continuous, and $\varphi(0) = 1$ (property 1 of CHFs). Choose $u > 0$ small enough that
$$
\frac{1}{u} \int_{-u}^{u} \big(1 - \varphi(t)\big) \, dt < \frac{\epsilon}{2}. \tag{1.2}
$$
Each $\varphi_n$ is bounded by $1$ in modulus, so $|1 - \varphi_n(t)| \le 2$. Pointwise convergence $\varphi_n \to \varphi$ together with this bound on $[-u, u]$ and the DCT gives
$$
\frac{1}{u} \int_{-u}^{u} \big(1 - \varphi_n(t)\big) \, dt \;\longrightarrow\; \frac{1}{u} \int_{-u}^{u} \big(1 - \varphi(t)\big) \, dt < \frac{\epsilon}{2}.
$$
So there exists $N$ such that for all $n \ge N$, the left side is below $\epsilon$. Combining with $(1.1)$,
$$
\mathbb{P}\!\left(|X_n| \ge \tfrac{2}{u}\right) < \epsilon \quad \text{for all } n \ge N.
$$
The remaining finitely many indices $n < N$ can be handled one at a time: since each $\mu_n$ is a probability measure, there is an $M_n$ with $\mathbb{P}(|X_n| \ge M_n) < \epsilon$. Setting $M_\epsilon$ to be the maximum of $2/u$ and $M_1, \dots, M_{N-1}$ gives $\mathbb{P}(|X_n| \ge M_\epsilon) < \epsilon$ for all $n$. So $\{\mu_n\}$ is tight.
Every subsequential weak limit equals $\mu$.
By the tightness theorem, every subsequence of $\{\mu_n\}$ has a further subsequence $\{\mu_{n_k}\}$ converging weakly to some probability measure $\nu$. Applying the forward direction (already proved) to this sub-subsequence,
$$
\varphi_{n_k}(t) \to \widehat{\nu}(t) \quad \text{for all } t.
$$
But by assumption $\varphi_{n_k}(t) \to \varphi(t) = \widehat{\mu}(t)$. So $\widehat{\nu} = \widehat{\mu}$, and by the inversion formula characteristic functions determine the measure: $\nu = \mu$.
Every weak subsequential limit of $\{\mu_n\}$ equals $\mu$. This forces $\mu_n \Rightarrow \mu$. Write $F_n$ and $F$ for the distribution functions of $\mu_n$ and $\mu$. If $F_n(x) \not\to F(x)$ at some continuity point $x$ of $F$, then along some subsequence $F_{n_k}(x)$ stays bounded away from $F(x)$. But tightness lets us extract a sub-subsequence converging weakly, necessarily to $\mu$, hence $F_{n_{k_j}}(x) \to F(x)$ at continuity points, contradicting the choice of subsequence.
Two distributions are equal iff their characteristic functions agree (by the inversion formula); the continuity theorem extends this to the dynamic setting: $\mu_n$ converges weakly to $\mu$ iff their characteristic functions converge pointwise. Among the consequences:
The Central Limit Theorem reduces to showing that the characteristic function of the standardized sum converges to $e^{-t^2/2}$, the characteristic function of the standard normal.
Sums of independent random variables become tractable because characteristic functions of independent sums multiply, so weak convergence of sums reduces to a multiplicative computation.
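Both consequences can be seen in one small example. The sketch below assumes i.i.d. random signs $X_i = \pm 1$ with probability $1/2$ each (an illustrative choice, not taken from the text), which have mean $0$, variance $1$, and characteristic function $\cos t$. By independence the characteristic function of $(X_1 + \dots + X_n)/\sqrt{n}$ is the product $\cos(t/\sqrt{n})^n$, which should approach $e^{-t^2/2}$.

```python
import math


def phi_standardized_sum(t: float, n: int) -> float:
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n) for i.i.d. signs X_i = +/-1.

    Each X_i has characteristic function cos(t). Independence turns the sum
    into an n-fold product, and dividing by sqrt(n) rescales the argument.
    """
    return math.cos(t / math.sqrt(n)) ** n


def phi_standard_normal(t: float) -> float:
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)


if __name__ == "__main__":
    t = 1.5  # arbitrary fixed argument
    for n in (10, 100, 1000, 10000):
        print(n, abs(phi_standardized_sum(t, n) - phi_standard_normal(t)))
```

By the continuity theorem, this pointwise convergence at every $t$ is all that is needed to conclude that the standardized sums converge weakly to the standard normal.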
A pointwise limit $\varphi_n \to g$ does not automatically yield weak convergence to a probability measure: it requires $g$ to be the characteristic function of a probability measure, which by the proposition above requires $g$ to be continuous at $0$. Continuity of the limit at $0$ is what guarantees no mass escapes to infinity.
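A standard example of mass escaping to infinity is $\mu_n = N(0, n)$, chosen here purely for illustration. Its characteristic function $e^{-n t^2/2}$ converges pointwise to the indicator of $\{t = 0\}$, which is discontinuous at $0$, while the mass in any fixed interval $[-M, M]$ tends to $0$. The helper names below are hypothetical.

```python
import math


def phi_n(t: float, n: int) -> float:
    """Characteristic function of N(0, n): exp(-n * t^2 / 2)."""
    return math.exp(-n * t * t / 2.0)


def mass_within(M: float, n: int) -> float:
    """P(|X_n| <= M) for X_n ~ N(0, n), i.e. erf(M / sqrt(2n))."""
    return math.erf(M / math.sqrt(2.0 * n))


if __name__ == "__main__":
    for n in (1, 100, 10**4, 10**6):
        # phi_n(0) stays at 1, phi_n(0.1) collapses to 0, and the mass in [-10, 10] drains away.
        print(n, phi_n(0.0, n), phi_n(0.1, n), mass_within(10.0, n))
```

The limit function equals $1$ at $t = 0$ and $0$ elsewhere, so it is not a characteristic function, and the sequence is not tight.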