Inequalities
We can use integrability to bound the probability of “tail events” (values far from the mean). These inequalities form the basis for proving laws of large numbers.
General case
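Let $\varphi \ge 0$, let $A$ be a Borel set, and write $i_A = \inf\{\varphi(y) : y \in A\}$. The whole argument in one picture: the step $i_A \mathbf{1}_A(x)$ never rises above $\varphi(x)$, touching it only at the boundary of $A$. Integrating this pointwise domination against the law of $X$ turns area into probability and gives $i_A \, P(X \in A) \le E[\varphi(X)]$. Markov’s inequality is the special case $\varphi(x) = |x|$, $A = \{x : |x| \ge a\}$, where $i_A = a$.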
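The pointwise domination can be checked numerically. The following is a minimal sketch, not part of the original notes: the names `phi`, `in_A` and `i_A` are hypothetical, standing for a nonnegative function, membership in a set $A$, and the infimum of the function over $A$. The bound being checked is $i_A \, P(X \in A) \le E[\varphi(X)]$, with both sides estimated from the same simulated Gaussian samples.

```python
import random

def general_bound_check(phi, in_A, i_A, samples):
    """Return (lhs, rhs) for the bound i_A * P(X in A) <= E[phi(X)],
    with both sides estimated from the same samples."""
    n = len(samples)
    prob_A = sum(1 for x in samples if in_A(x)) / n
    mean_phi = sum(phi(x) for x in samples) / n
    return i_A * prob_A, mean_phi

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

a = 1.5  # illustrative threshold (an assumption, not from the text)
# Markov-type instance: phi(x) = |x|, A = {|x| >= a}, so inf of phi over A is a.
lhs, rhs = general_bound_check(abs, lambda x: abs(x) >= a, a, samples)
print(lhs <= rhs)  # True: the domination holds sample by sample
```

Because the step lies below the function at every sample point, the inequality holds for the empirical averages as well, not only in expectation.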
Important special cases
Markov’s Inequality
For $a > 0$: $P(|X| \ge a) \le E|X| / a$.
Chebyshev’s Inequality
For $a > 0$ and $\mu = E[X]$: $P(|X - \mu| \ge a) \le \operatorname{Var}(X) / a^2$.
Alternatively, for the raw second moment: $P(|X| \ge a) \le E[X^2] / a^2$.
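Chebyshev is the general bound with $\varphi(x) = (x - \mu)^2$ and $A = \{x : |x - \mu| \ge a\}$, where $\mu = E[X]$ and $i_A = a^2$. The parabola $(x - \mu)^2 / a^2$ sits above the indicator of the tail event, touching it on the tail precisely at $x = \mu \pm a$ (the two also agree at $x = \mu$, where both vanish). Integrating turns the squared deviation into variance: $P(|X - \mu| \ge a) \le \operatorname{Var}(X) / a^2$.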
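A small exact example shows how much slack the variance bound can leave. This sketch assumes a fair six-sided die and the threshold $a = 2$, both chosen here purely for illustration. The helper `chebyshev_check` is a hypothetical name, and exact rational arithmetic avoids any rounding.

```python
from fractions import Fraction

def chebyshev_check(values, a):
    """Exact tail probability P(|X - mu| >= a) and the bound Var(X) / a^2
    for X uniform on `values`."""
    n = len(values)
    mu = Fraction(sum(values), n)
    var = sum((Fraction(v) - mu) ** 2 for v in values) / n
    tail = Fraction(sum(1 for v in values if abs(v - mu) >= a), n)
    return tail, var / a ** 2

tail, bound = chebyshev_check([1, 2, 3, 4, 5, 6], a=2)
print(tail, bound)  # 1/3 35/48
```

Here the true tail probability is 1/3 while the bound is 35/48. The inequality holds but is far from tight, which is typical for a bound that uses only the variance.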
Jensen’s Inequality
The previous inequalities bound tail probabilities. The next one is of a different kind: it relates the expectation of a convex transformation to the transformation of the expectation. It is the workhorse behind moment comparisons and the contraction properties of averaging operators.
For convex $\varphi$, with $X$ and $\varphi(X)$ integrable: $\varphi(E[X]) \le E[\varphi(X)]$.
For a two-point variable $X$ taking $x_1$ and $x_2$ with equal weight, $E[X]$ is the midpoint and $E[\varphi(X)]$ is the chord’s height there. Convexity keeps the chord above the curve, so $\varphi(E[X])$ (on the curve) sits below $E[\varphi(X)]$ (on the chord). The vertical gap is the slack in the inequality, and it closes only when $\varphi$ is affine on the interval spanned by the values of $X$.
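The two-point picture is easy to reproduce numerically. In this minimal sketch the function name `jensen_gap` and the sample points 1 and 3 are assumptions made for illustration. It computes the chord height at the midpoint minus the curve height there, once for a convex function and once for an affine one.

```python
def jensen_gap(phi, x1, x2):
    """Gap E[phi(X)] - phi(E[X]) for X taking x1 and x2 with equal weight."""
    midpoint = (x1 + x2) / 2                # E[X]
    chord_height = (phi(x1) + phi(x2)) / 2  # E[phi(X)]
    return chord_height - phi(midpoint)

print(jensen_gap(lambda x: x * x, 1.0, 3.0))      # 1.0  (strictly convex: positive gap)
print(jensen_gap(lambda x: 2 * x + 1, 1.0, 3.0))  # 0.0  (affine: the gap closes)
```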
A single supporting line suffices here because the mean $E[X]$ is a fixed number, so we only need a supporting line at that one point. When the deterministic mean is replaced by the random variable $E[X \mid \mathcal{G}]$, no single line works for all outcomes at once, and the argument has to invoke a whole countable family of supporting lines simultaneously. That refinement is carried out in the conditional version of Jensen’s inequality.
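The single-line argument can be written out in two steps. This is a sketch under assumptions introduced here for illustration: $X$ and $\varphi(X)$ are integrable, $\mu$ denotes the mean $E[X]$, and $c$ is the slope of one supporting line of the convex function $\varphi$ at $\mu$.

$$
\varphi(x) \ge \varphi(\mu) + c\,(x - \mu) \ \text{ for all } x
\quad\Longrightarrow\quad
E[\varphi(X)] \ge \varphi(\mu) + c\,\bigl(E[X] - \mu\bigr) = \varphi(E[X]).
$$

The first inequality is the definition of a supporting line. The second follows by substituting $x = X$ and taking expectations, after which the linear term vanishes because $E[X] = \mu$.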