We now define higher-order statistics of a random variable in terms of the expectation.
Definition: k-th Moment
The $k$-th moment of a random variable $X$ is defined as $\mathbb{E}[X^k]$ for $k = 1, 2, \dots$.
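As a quick numerical illustration of this definition, the $k$-th moment can be estimated by averaging $X^k$ over simulated draws. The minimal sketch below uses NumPy with a Uniform$(0,1)$ variable, an arbitrary choice made purely for illustration (its exact $k$-th moment is $1/(k+1)$):

```python
import numpy as np

# Monte Carlo sketch: estimate the k-th moment E[X^k] as the average of X^k
# over many simulated draws.  X ~ Uniform(0, 1) is an arbitrary illustrative
# choice; its exact k-th moment is 1/(k+1).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)

for k in (1, 2, 3):
    print(k, (x ** k).mean(), 1 / (k + 1))   # empirical vs exact moment
```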
**Bernoulli Distribution**: $X \sim \text{Bern}(p)$.
$$\mathbb{P}(X=1) = p, \quad \mathbb{P}(X=0) = 1-p$$
**Mean**: $\mathbb{E}[X] = 1 \cdot p + 0 \cdot (1-p) = p$.
**Second Moment**: Since $X$ takes values in $\{0, 1\}$, we have $X^2 = X$. Thus
$$\mathbb{E}[X^2] = \mathbb{E}[X] = p$$
**Variance**:
$$\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = p - p^2 = p(1-p)$$
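A small simulation sketch (with $p = 0.3$ as an arbitrary illustrative value, not from the notes) confirms these formulas:

```python
import numpy as np

# Simulation sketch for X ~ Bern(p): the empirical mean should be close to p
# and the empirical variance close to p(1-p).  p = 0.3 is arbitrary.
rng = np.random.default_rng(1)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

print(x.mean(), p)                 # E[X]   vs p
print(x.var(), p * (1 - p))        # Var(X) vs p(1-p)
print((x ** 2).mean(), x.mean())   # since X^2 = X, the second moment equals the mean
```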
**Poisson Distribution**: $X \sim \text{Poi}(\lambda)$, $\lambda > 0$.
$$\mathbb{P}(X=k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, \dots$$
To calculate moments, it is easier to use **factorial moments**. Consider $\mathbb{E}[X(X-1)\dots(X-k+1)]$.
$$\begin{aligned}
\mathbb{E}[X(X-1)\dots(X-k+1)] &= \sum_{j=0}^{\infty} j(j-1)\dots(j-k+1)\, e^{-\lambda} \frac{\lambda^j}{j!} \\
&= \sum_{j=k}^{\infty} \frac{j!}{(j-k)!}\, e^{-\lambda} \frac{\lambda^j}{j!} = \sum_{j=k}^{\infty} e^{-\lambda} \frac{\lambda^j}{(j-k)!}
\end{aligned}$$
Let $i = j-k$. Then:
$$= \lambda^k \sum_{i=0}^{\infty} e^{-\lambda} \frac{\lambda^i}{i!} = \lambda^k \underbrace{\left( \sum_{i=0}^{\infty} e^{-\lambda} \frac{\lambda^i}{i!} \right)}_{1 \text{ (sum of PMF)}} = \lambda^k$$
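The closed form $\lambda^k$ can also be checked by truncating the sum numerically; the sketch below (with $\lambda = 2.5$ and the cutoff both chosen arbitrarily for illustration) agrees with the derivation for small $k$:

```python
import math

# Numerical check of the factorial moment E[X(X-1)...(X-k+1)] = lambda^k for
# X ~ Poi(lambda), by truncating the infinite sum.  lambda and the cutoff are
# arbitrary illustrative choices; the Poisson tail beyond j = 100 is negligible.
lam = 2.5

for k in (1, 2, 3):
    total = 0.0
    for j in range(100):
        falling = math.prod(j - i for i in range(k))        # j(j-1)...(j-k+1)
        total += falling * math.exp(-lam) * lam**j / math.factorial(j)
    print(k, total, lam**k)                                  # the two should agree
```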
**Mean**: Put $k=1$. $\mathbb{E}[X] = \lambda^1 = \lambda$.
**Second Moment**: Using $k=2$, $\mathbb{E}[X(X-1)] = \lambda^2$.
$$\mathbb{E}[X^2] = \mathbb{E}[X(X-1)] + \mathbb{E}[X] = \lambda^2 + \lambda$$
**Variance**:
$$\text{Var}(X) = (\lambda^2 + \lambda) - \lambda^2 = \lambda$$
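As with the Bernoulli case, a quick simulation sketch (here with $\lambda = 4$, an arbitrary illustrative value) matches both results:

```python
import numpy as np

# Simulation sketch for X ~ Poi(lambda): mean and variance should both be
# close to lambda.  lambda = 4.0 is an arbitrary illustrative value.
rng = np.random.default_rng(2)
lam = 4.0
x = rng.poisson(lam, size=1_000_000)

print(x.mean(), lam)    # E[X]   vs lambda
print(x.var(), lam)     # Var(X) vs lambda
```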
**Exponential Distribution**: $X \sim \text{Exp}(\lambda)$.
Density $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$.
$$\mathbb{E}[X^k] = \int_0^{\infty} x^k \lambda e^{-\lambda x} \, dx$$
Substitute $y = \lambda x \implies dx = dy/\lambda$.
$$= \frac{1}{\lambda^k} \int_0^{\infty} y^k e^{-y} \, dy = \frac{\Gamma(k+1)}{\lambda^k} = \frac{k!}{\lambda^k}$$
**Mean**: $\mathbb{E}[X] = 1!/\lambda^1 = 1/\lambda$.
**Variance**:
$$\text{Var}(X) = \frac{2!}{\lambda^2} - \left( \frac{1}{\lambda} \right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$
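The formula $\mathbb{E}[X^k] = k!/\lambda^k$ can be checked by evaluating the defining integral numerically; the sketch below uses SciPy's `quad`, with $\lambda = 1.5$ as an arbitrary illustrative rate:

```python
import math
import numpy as np
from scipy.integrate import quad

# Numerical check of E[X^k] = k!/lambda^k for X ~ Exp(lambda), by integrating
# x^k * lambda * exp(-lambda*x) over [0, inf).  lambda = 1.5 is arbitrary.
lam = 1.5

for k in (1, 2, 3):
    moment, _ = quad(lambda x, k=k: x**k * lam * math.exp(-lam * x), 0, np.inf)
    print(k, moment, math.factorial(k) / lam**k)   # the two should agree
```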
If the MGF exists in a neighborhood of $t=0$, we can expand $e^{tX}$ as a Taylor series:
$$e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots$$
Taking expectations (assuming we can swap sum and expectation):
$$M_X(t) = 1 + t\mathbb{E}[X] + \frac{t^2}{2!}\mathbb{E}[X^2] + \frac{t^3}{3!}\mathbb{E}[X^3] + \dots$$
Thus, the $k$-th derivative at $t=0$ generates the $k$-th moment:
$$M_X^{(k)}(0) = \mathbb{E}[X^k]$$
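As an illustration, the sketch below differentiates a known MGF symbolically with SymPy. It uses the Poisson MGF $M_X(t) = e^{\lambda(e^t - 1)}$, a standard result that is quoted here rather than derived in this section; the first two derivatives at $t = 0$ recover $\lambda$ and $\lambda^2 + \lambda$, matching the moments computed above.

```python
import sympy as sp

# Sketch: differentiating an MGF at t = 0 yields the moments.  The Poisson MGF
# M(t) = exp(lambda*(e^t - 1)) is a standard result used here for illustration.
t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))

m1 = sp.diff(M, t, 1).subs(t, 0)   # expect lambda
m2 = sp.diff(M, t, 2).subs(t, 0)   # expect lambda^2 + lambda
print(sp.simplify(m1), sp.expand(m2))
```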
The MGF does not always exist! The term $e^{tX}$ grows very fast. For $\mathbb{E}[e^{tX}]$ to be finite, the probability density of $X$ must decay fast enough (faster than $e^{-tx}$) to counteract this growth.
**Light Tails**: If the tails decay exponentially (as for the Poisson, Exponential, and Normal distributions), the MGF usually exists in a neighborhood of $0$.
**Heavy Tails**: If the tails decay more slowly (polynomially), the integral may diverge for every $t > 0$.
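The heavy-tailed case can be seen numerically. The sketch below uses a Pareto-type density $f(x) = 2/x^3$ on $[1, \infty)$ (my choice of example, not from the notes): truncated integrals of $e^{tx} f(x)$ keep growing as the truncation point increases, so $\mathbb{E}[e^{tX}]$ is infinite for any $t > 0$.

```python
import numpy as np
from scipy.integrate import quad

# Sketch: for a density with polynomial tails (here f(x) = 2/x^3 on [1, inf),
# an arbitrary Pareto-type example), the truncated integral of e^{t x} f(x)
# grows without bound, so the MGF is infinite for every t > 0.
t = 0.1
f = lambda x: 2.0 / x**3

for upper in (10, 50, 100, 200):
    val, _ = quad(lambda x: np.exp(t * x) * f(x), 1, upper)
    print(upper, val)   # keeps increasing as the truncation point grows
```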
Exercise 9
Let $X \sim \text{Exp}(\lambda)$ with density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$.
Calculate $M_X(t) = \mathbb{E}[e^{tX}]$.
For which $t$ is this finite?
**Hint**: Integrate $\lambda e^{-(\lambda - t)x}$. What happens when $t \ge \lambda$?
Let $X$ have the standard Cauchy density:
$$f(x) = \frac{1}{\pi (1+x^2)}, \quad x \in \mathbb{R}$$
Show that the MGF $M_X(t)$ is infinite for all $t \ne 0$.
**Hint**: For $t > 0$, look at the behavior as $x \to \infty$. Does $\frac{e^{tx}}{1+x^2}$ have a finite integral?
Consider the Taylor series expansion of $M_X(t)$.
Why does the **radius of convergence** $R$ (the largest value such that the series converges for all $|t| < R$) relate to the “heaviness” of the tails of $X$?
What if $R=0$?
What if $R=\infty$?