Inequalities

We can use integrability to bound the probability of “tail events” (values far from the mean). These inequalities form the basis for proving laws of large numbers.

General case

Statement

Let $X$ be a random variable and $g: \mathbb{R} \to [0, \infty)$ a non-negative measurable function. For any Borel set $B \in \mathcal{B}$, let $L$ be the greatest lower bound (infimum) of $g$ on $B$:

$$L = \inf \{ g(x) : x \in B \}$$

Then, whenever $L > 0$:

$$\mathbb{P}(X \in B) \le \frac{\mathbb{E}[g(X)]}{L}$$

Figure: A non-negative function $g$ dominating the scaled indicator step $L \cdot \mathbb{1}_B$ on the region $B$.

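The whole argument in one picture: the step $L \cdot \mathbb{1}_B$ never rises above $g(x)$, because $g \ge L$ on $B$ and $g \ge 0$ everywhere else; in the picture the two touch only at the boundary of $B$. Integrating this pointwise domination against the law of $X$ turns area into probability: $L \cdot \mathbb{P}(X \in B) \le \mathbb{E}[g(X)]$, and dividing by $L > 0$ gives $\mathbb{P}(X \in B) \le \mathbb{E}[g(X)] / L$. Markov’s inequality is the special case of a non-negative $X$ with $g(x) = \max(x, 0)$ (which equals $x$ on the range of $X$) and $B = \{x \ge a\}$ for some $a > 0$, where $L = a$.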
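The general bound can be checked exactly on a small discrete example. The sketch below is illustrative only: the helper name `general_tail_bound`, the fair-die distribution, and the choice $g(x) = x^2$, $B = \{x \ge 5\}$ (so $L = 25$) are assumptions made for this example, not part of the text above.

```python
from fractions import Fraction

def general_tail_bound(pmf, g, in_B, L):
    """Return (P(X in B), E[g(X)] / L) for a finite pmf, using exact arithmetic.

    pmf  : dict mapping each value of X to its probability
    g    : non-negative function
    in_B : predicate describing the set B
    L    : a lower bound of g on B (ideally the infimum), must be positive
    """
    if L <= 0:
        raise ValueError("the bound is only meaningful for L > 0")
    prob_B = sum(p for x, p in pmf.items() if in_B(x))
    expected_g = sum(p * g(x) for x, p in pmf.items())
    return prob_B, expected_g / L

# Hypothetical example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# g(x) = x^2 and B = {x >= 5}; the infimum of g on B is 5^2 = 25.
prob, bound = general_tail_bound(die, lambda x: x * x, lambda x: x >= 5, L=25)
print(prob, bound)  # prints: 1/3 91/150
```

Here the true probability $1/3$ sits well below the bound $91/150$, which illustrates that the inequality is valid but not necessarily tight.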

Important special cases

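Markov’s Inequality

For a non-negative random variable $X$ and any $a > 0$:

$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}[X]}{a}$$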

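Chebyshev’s Inequality

For a random variable $X$ with mean $\mu = \mathbb{E}[X]$ and finite variance, and any $a > 0$:

$$\mathbb{P}(|X - \mu| \ge a) \le \frac{\mathrm{Var}(X)}{a^2}$$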

Alternatively, for the raw second moment and any $a > 0$:

$$\mathbb{P}(|X| \ge a) \le \frac{\mathbb{E}[X^2]}{a^2}$$
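As a quick check that the raw second-moment form is another instance of the general bound, one can take $g(x) = x^2$ and $B = \{|x| \ge a\}$ with $a > 0$, so that $L = a^2$. The chain below is a sketch of that substitution:

$$a^2 \, \mathbb{1}_{\{|X| \ge a\}} \le X^2 \;\Longrightarrow\; a^2 \, \mathbb{P}(|X| \ge a) \le \mathbb{E}[X^2] \;\Longrightarrow\; \mathbb{P}(|X| \ge a) \le \frac{\mathbb{E}[X^2]}{a^2}.$$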

Figure: A parabola centered at the mean dominating a two-sided indicator step outside the band $\mu \pm a$.

Chebyshev is the general bound with $g(x) = (x - \mu)^2$ and $B = \{|x - \mu| \ge a\}$ for $a > 0$, so that $L = a^2$. The parabola $(x-\mu)^2/a^2$ sits above the indicator of the tail event, touching its upper step exactly at $\mu \pm a$. Integrating turns the squared deviation into variance: $\mathbb{P}(|X - \mu| \ge a) \le \mathrm{Var}(X)/a^2$.
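A small exact computation can make the Chebyshev bound concrete. This is a sketch under stated assumptions: the helper `chebyshev_check` and the fair-die example with $a = 2$ are hypothetical choices for illustration.

```python
from fractions import Fraction

def chebyshev_check(pmf, a):
    """Return (P(|X - mu| >= a), Var(X) / a^2) for a finite pmf, exactly."""
    if a <= 0:
        raise ValueError("a must be positive")
    mu = sum(p * x for x, p in pmf.items())               # mean of X
    var = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # variance of X
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= a)
    return tail, var / a ** 2

# Hypothetical example: a fair six-sided die, with mean 7/2 and variance 35/12.
die = {k: Fraction(1, 6) for k in range(1, 7)}
tail, bound = chebyshev_check(die, a=2)
print(tail, bound)  # prints: 1/3 35/48
```

Only the outcomes 1 and 6 lie at distance at least 2 from the mean, so the tail probability is $1/3$, comfortably below the bound $35/48$.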

Jensen’s Inequality

The previous inequalities bound tail probabilities. The next one is of a different kind: it relates the expectation of a convex transformation to the transformation of the expectation. It is the workhorse behind moment comparisons and the contraction properties of averaging operators.

Figure: A convex curve sagging below its chord, with the Jensen gap between $\varphi(\mathbb{E}[X])$ on the curve and $\mathbb{E}[\varphi(X)]$ on the chord.

For a two-point variable taking $x_1$ and $x_2$ with equal weight, $\mathbb{E}[X]$ is the midpoint and $\mathbb{E}[\varphi(X)]$ is the chord’s height there. Convexity keeps the chord on or above the curve, so $\varphi(\mathbb{E}[X])$ (on the curve) sits at or below $\mathbb{E}[\varphi(X)]$ (on the chord). The vertical gap is the slack in the inequality, and it closes only when $\varphi$ is affine on the interval between $x_1$ and $x_2$ (trivially so when $x_1 = x_2$).
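The two-point picture is easy to reproduce numerically. The sketch below assumes a hypothetical helper `jensen_gap` and the illustrative values $x_1 = 1$, $x_2 = 3$; it compares a strictly convex function with an affine one.

```python
def jensen_gap(phi, x1, x2):
    """Gap E[phi(X)] - phi(E[X]) for X uniform on the two points {x1, x2}."""
    mean = (x1 + x2) / 2                  # E[X], the midpoint
    chord_height = (phi(x1) + phi(x2)) / 2  # E[phi(X)], height of the chord at the midpoint
    return chord_height - phi(mean)

# Strictly convex phi(x) = x^2: chord height 5, curve height 4.
gap_convex = jensen_gap(lambda x: x * x, 1.0, 3.0)

# Affine phi(x) = 2x + 1: chord and curve coincide.
gap_affine = jensen_gap(lambda x: 2 * x + 1, 1.0, 3.0)

print(gap_convex, gap_affine)  # prints: 1.0 0.0
```

The gap is positive for the parabola and vanishes for the affine function, matching the equality condition described above.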

Statement

Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a convex function, and let $X$ be an integrable random variable ($\mathbb{E}|X| < \infty$). Then

$$\varphi\!\big(\mathbb{E}[X]\big) \;\le\; \mathbb{E}\!\big[\varphi(X)\big],$$

where the right side is well-defined in $(-\infty, +\infty]$.

If $\varphi$ is concave, the inequality reverses. (A familiar instance: $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$ for an integrable random variable $X > 0$.)

A single supporting line suffices in the proof because the mean $\mathbb{E}[X]$ is a fixed number, so we only need a supporting line of $\varphi$ at that one point (the tangent line, when $\varphi$ is differentiable there). When the deterministic mean is replaced by the random variable $\mathbb{E}(X \mid \mathcal{G})$, no single line works for all outcomes at once, and the argument has to invoke a whole countable family of supporting lines simultaneously. That refinement is carried out in the conditional version of Jensen’s inequality.
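For reference, the single-line argument can be sketched as follows. Write $m = \mathbb{E}[X]$ and let $c$ be the slope of any supporting line of $\varphi$ at $m$ (the symbols $m$ and $c$ are introduced here for illustration only). Convexity gives the first inequality for every $x$; substituting $X$ for $x$ and taking expectations gives the second:

$$\varphi(x) \;\ge\; \varphi(m) + c\,(x - m) \quad \text{for all } x \in \mathbb{R},$$

$$\mathbb{E}[\varphi(X)] \;\ge\; \varphi(m) + c\,\big(\mathbb{E}[X] - m\big) \;=\; \varphi\!\big(\mathbb{E}[X]\big).$$

The affine lower bound also explains why $\mathbb{E}[\varphi(X)]$ cannot equal $-\infty$: $\varphi(X)$ is bounded below by an integrable random variable.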