Helly’s selection theorem produces a subsequential limit $F$ that is right-continuous and non-decreasing, but the total mass of $F$ may be strictly less than $1$ (mass can escape to $\pm\infty$). Tightness is the condition that rules this out and promotes Helly’s conclusion to weak convergence to a probability measure.
Definition: Tightness
A sequence of probability measures $\{\mu_n\}_{n \ge 1}$ on $(\mathbb{R}, \mathcal{B})$ is called tight if for every $\epsilon > 0$, there exists $M > 0$ such that
$$\mu_n((-M, M]) \ge 1 - \epsilon \quad \text{for all } n \ge 1.$$
A sequence of distribution functions is tight if the corresponding sequence of measures is tight.
Written in terms of distribution functions, this is equivalent to
$$1 - F_n(M) + F_n(-M) \le \epsilon \quad \text{for all } n.$$
Some equivalent formulations of the same condition, each required for every $\epsilon > 0$:
$\liminf_{n \to \infty} \mu_n((-M, M]) \ge 1 - \epsilon$ for some $M$,
$\limsup_{n \to \infty} \big(1 - F_n(M) + F_n(-M)\big) \le \epsilon$ for some $M$,
$\mu_n((-M, M]) \ge 1 - \epsilon$ for all $n$ (after enlarging $M$).
The equivalence between holding for all $n$ and holding eventually is immediate: any finite collection of measures can be absorbed by taking $M$ large enough, since each individual $\mu_n$ is a probability measure and so $\mu_n((-M, M]) \to 1$ as $M \to \infty$.
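To make the uniform tail condition concrete, here is a minimal numerical sketch. The two Gaussian families below are illustrative assumptions, not examples taken from the text, and the maximum over $n \le 200$ is only a finite stand-in for the supremum over all $n$. It evaluates the tail mass $1 - F_n(M) + F_n(-M)$ for a family with bounded spread and for a family whose mass drifts to $+\infty$.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """Distribution function of N(mean, std^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def tail_mass(cdf, M):
    """Mass outside (-M, M]: 1 - F(M) + F(-M)."""
    return 1.0 - cdf(M) + cdf(-M)

def sup_tail_mass(family, M, n_max=200):
    """Largest tail mass over n = 1..n_max (finite stand-in for sup over n)."""
    return max(tail_mass(family(n), M) for n in range(1, n_max + 1))

def tight_family(n):
    """Hypothetical tight family: F_n is the CDF of N(0, 1 + 1/n)."""
    return lambda x: normal_cdf(x, 0.0, math.sqrt(1.0 + 1.0 / n))

def drifting_family(n):
    """Hypothetical non-tight family: F_n is the CDF of N(n, 1)."""
    return lambda x: normal_cdf(x, float(n), 1.0)

for M in (2.0, 5.0, 10.0):
    # Columns: M, worst tail mass (tight family), worst tail mass (drifting family)
    print(M, sup_tail_mass(tight_family, M), sup_tail_mass(drifting_family, M))
```

For the first family the worst-case tail mass shrinks as $M$ grows, while for the drifting family it stays close to $1$ for every $M$ tried, since some $n$ always places its mass beyond $M$.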
Theorem
A sequence $\{\mu_n\}_{n \ge 1}$ of probability measures on $(\mathbb{R}, \mathcal{B})$ is tight if and only if every subsequence has a further subsequence that converges weakly to a probability measure.
(⟹) Tightness implies subsequential weak convergence.
Fix any subsequence $\{F_{n_k}\}$. By Helly’s selection theorem, there is a further subsequence (which we relabel $\{F_{n_m}\}$) and a right-continuous, non-decreasing function $F$ with $F_{n_m}(y) \to F(y)$ at every continuity point $y$ of $F$.
It remains to show $\lim_{y \to -\infty} F(y) = 0$ and $\lim_{y \to \infty} F(y) = 1$, since then $F$ is a distribution function and $\{F_{n_m}\}$ converges to it weakly. Both limits exist because $F$ is monotone and bounded in $[0,1]$, and they satisfy $\lim_{y \to -\infty} F(y) \ge 0$ and $\lim_{y \to \infty} F(y) \le 1$. So it suffices to show
$$\lim_{y \to \infty} F(y) - \lim_{y \to -\infty} F(y) = 1.$$
Pick $M_\epsilon$ from tightness.
For $\epsilon > 0$, choose $M_\epsilon$ such that
$$\limsup_{n \to \infty} \big(1 - F_n(M_\epsilon) + F_n(-M_\epsilon)\big) \le \epsilon. \quad\textcolor{gray}{\text{---(1.1)}}$$
Compare $F$ to $F_{n_m}$ at $M_\epsilon$ via continuity points.
Pick continuity points $r < -M_\epsilon$ and $s > M_\epsilon$ of $F$. Such points exist since $F$ has at most countably many discontinuities. By monotonicity of each $F_{n_m}$,
$$F_{n_m}(s) \ge F_{n_m}(M_\epsilon), \quad F_{n_m}(r) \le F_{n_m}(-M_\epsilon),$$
so
$$1 - F_{n_m}(s) + F_{n_m}(r) \le 1 - F_{n_m}(M_\epsilon) + F_{n_m}(-M_\epsilon).$$
Pass to the limit.
Since $r, s$ are continuity points of $F$, $F_{n_m}(r) \to F(r)$ and $F_{n_m}(s) \to F(s)$, so the left side converges to $1 - F(s) + F(r)$. Taking $\limsup$ on the right, noting that the $\limsup$ along a subsequence is at most the $\limsup$ along the full sequence, and using $(1.1)$,
$$1 - F(s) + F(r) \le \epsilon.$$
Equivalently, $F(s) - F(r) \ge 1 - \epsilon$.
Send $r \to -\infty$ and $s \to \infty$.
Choosing continuity points $r_k \to -\infty$ and $s_k \to \infty$,
$$\lim_{s \to \infty} F(s) - \lim_{r \to -\infty} F(r) \ge 1 - \epsilon.$$
Since $\epsilon > 0$ was arbitrary, the difference is at least $1$. Combined with $\lim_{y \to \infty} F(y) \le 1$ and $\lim_{y \to -\infty} F(y) \ge 0$, both limits must take their extreme values: $\lim_{y \to \infty} F(y) = 1$ and $\lim_{y \to -\infty} F(y) = 0$.
Thus $F$ is a distribution function, and the corresponding measure $\mu$ is a probability measure with $\mu_{n_m} \Rightarrow \mu$.
(⟸) No tightness implies no convergent sub-subsequence.
We prove the contrapositive: if $\{\mu_n\}$ is not tight, then some subsequence has no further subsequence converging weakly to a probability measure.
Non-tightness gives an $\epsilon > 0$ such that for every $M$, some $n$ satisfies $1 - F_n(M) + F_n(-M) \ge \epsilon$. The violating indices cannot stay bounded as $M$ grows: for any fixed $N$, each of the finitely many measures $\mu_1, \dots, \mu_N$ has tail mass tending to $0$ as $M \to \infty$, so for $M$ large enough the violating index must exceed $N$. Building this index by index, there exist sequences $M_k \to \infty$ and strictly increasing indices $n_k$ with
$$1 - F_{n_k}(M_k) + F_{n_k}(-M_k) \ge \epsilon \quad \text{for all } k. \quad\textcolor{gray}{\text{---(2.1)}}$$
By Helly’s selection theorem, $\{F_{n_k}\}$ has a further subsequence (still denoted $\{F_{n_k}\}$) with $F_{n_k}(y) \to F(y)$ at continuity points of a right-continuous, non-decreasing $F$. We show $F$ is not a distribution function, hence does not correspond to a probability measure.
Bound $F$ at any pair of continuity points $r < 0 < s$.
Since $M_k \to \infty$, for $k$ large enough, $M_k > s$ and $-M_k < r$. By monotonicity of $F_{n_k}$,
$$F_{n_k}(s) \le F_{n_k}(M_k), \quad F_{n_k}(r) \ge F_{n_k}(-M_k),$$
so
$$1 - F_{n_k}(s) + F_{n_k}(r) \ge 1 - F_{n_k}(M_k) + F_{n_k}(-M_k) \ge \epsilon$$
using $(2.1)$.
Pass to the limit and send $r \to -\infty$, $s \to \infty$.
Taking $k \to \infty$ at continuity points $r, s$,
$$1 - F(s) + F(r) \ge \epsilon.$$
Letting $s \to \infty$ and $r \to -\infty$ along continuity points,
$$\lim_{s \to \infty} F(s) - \lim_{r \to -\infty} F(r) \le 1 - \epsilon < 1.$$
So $F$ does not satisfy property 2 of a distribution function: either $\lim_{y \to \infty} F(y) < 1$ or $\lim_{y \to -\infty} F(y) > 0$ (or both).
Therefore the subsequence $\{F_{n_k}\}$ has no further subsequence converging weakly to a probability measure: any such sub-sub-limit must agree with $F$ at continuity points, but $F$ is not a distribution function.
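The mass loss in this argument can be seen in a small sketch. The family $\mu_n = \tfrac12\delta_0 + \tfrac12\delta_n$ is an assumed example, chosen only for illustration: half of the mass sits at $0$ and half escapes to $+\infty$. The code checks a version of $(2.1)$ with $\epsilon = \tfrac12$, $M_k = k$, $n_k = k + 1$, and that the pointwise limit has total mass $\tfrac12$.

```python
def F(n, y):
    """Distribution function of mu_n = 0.5*delta_0 + 0.5*delta_n (hypothetical example)."""
    return 0.5 * (y >= 0) + 0.5 * (y >= n)

def F_limit(y):
    """Pointwise limit of F(n, y) as n -> infinity: a jump of size 0.5 at 0."""
    return 0.5 if y >= 0 else 0.0

def tail_mass(n, M):
    """Mass of mu_n outside (-M, M]: 1 - F_n(M) + F_n(-M)."""
    return 1.0 - F(n, M) + F(n, -M)

eps = 0.5
for k in (1, 10, 100):
    M_k, n_k = k, k + 1
    # The atom at n_k lies outside (-M_k, M_k], so the tail mass is 0.5 >= eps.
    print(k, tail_mass(n_k, M_k) >= eps)

# Total mass of the limit, read off far out in both tails.
print(F_limit(1e9) - F_limit(-1e9))
```

Here the limit function is right-continuous and non-decreasing, as Helly's theorem guarantees, but it tops out at $\tfrac12$, which matches the bound $\lim_{s \to \infty} F(s) - \lim_{r \to -\infty} F(r) \le 1 - \epsilon$.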
The forward direction is the workhorse: in practice, one shows a sequence of laws is tight (often via a moment bound such as $\sup_n \mathbb{E}[|X_n|] < \infty$, which controls the tails through Markov’s inequality), then invokes the theorem to extract a weakly convergent subsequence. If a uniqueness argument pins down all possible subsequential limits to the same $\mu$, the full sequence converges weakly to $\mu$.
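The moment-bound route can be sketched numerically. If $\sup_n \mathbb{E}[|X_n|] \le C$, Markov's inequality gives $P(|X_n| \ge M) \le C/M$, so $M = C/\epsilon$ works in the definition of tightness. The exponential family below, and the helper names `markov_radius`, `exp_tail_mass`, and `worst_tail`, are assumptions made for this illustration only.

```python
import math

def markov_radius(moment_bound, eps):
    """Radius M with moment_bound / M <= eps, so Markov gives tail mass <= eps."""
    return moment_bound / eps

def exp_tail_mass(mean, M):
    """Mass outside (-M, M] for an exponential law with the given mean, M > 0.

    The law lives on [0, infinity), so only the upper tail P(X > M) contributes.
    """
    return math.exp(-M / mean)

def worst_tail(eps, moment_bound=1.0, n_max=1000):
    """Largest tail mass at the Markov radius over n = 1..n_max.

    Hypothetical family: X_n exponential with mean n / (n + 1) <= moment_bound.
    """
    M = markov_radius(moment_bound, eps)
    return max(exp_tail_mass(n / (n + 1), M) for n in range(1, n_max + 1))

for eps in (0.1, 0.01):
    # Columns: eps, Markov radius, worst observed tail mass (should not exceed eps)
    print(eps, markov_radius(1.0, eps), worst_tail(eps))
```

The Markov radius is usually far from sharp (the actual exponential tails are much smaller than $\epsilon$ here), but tightness only needs some finite $M$ that works uniformly in $n$.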
The reverse direction says tightness is not just sufficient but necessary: any sequence enjoying this subsequential compactness must control its tails uniformly.