
Regular Conditional Distribution

The introduction chose Idea 2 (define conditional expectation directly) over Idea 1 (define the conditional distribution first, then derive the expectation as an integral) because Idea 1 is technically harder. With the theory of conditional expectation in hand, we can now circle back and construct conditional distributions rigorously. The right object is the regular conditional distribution (RCD).

How should we define the conditional distribution of $X$ given $Y = y$ when $Y$ is continuous? Concretely, what is $\Pr(X \in A \mid Y = y)$?

The elementary formula $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$ breaks down because $\Pr(Y = y) = 0$.

A natural rephrasing uses conditional expectation:

$$\Pr(X \in A \mid Y = y) \;=\; \mathbb{E}\!\left[ \mathbb{1}_{\{X \in A\}} \,\big|\, Y \right](\omega) \quad \text{for some } \omega \in \{Y = y\}.$$

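But conditional expectation is only defined up to a $\Pr$-null set, and $\{Y = y\}$ is itself a $\Pr$-null set. The value of $\mathbb{E}[\mathbb{1}_{\{X \in A\}} \mid Y]$ on $\{Y = y\}$ is therefore unconstrained: it can be modified freely there without ceasing to be a version. Worse, as $A$ ranges over Borel sets, the family of conditional probabilities so obtained need not assemble into a probability measure. Countable additivity in $A$ holds only almost surely: for each sequence of disjoint Borel sets $A_1, A_2, \dots$ there is a null set off which $\Pr(X \in \bigcup_n A_n \mid Y) = \sum_n \Pr(X \in A_n \mid Y)$, but that null set depends on the sequence, and since there are uncountably many such sequences, the union of the exceptional sets need not be null.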

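Working with a general sub-$\sigma$-algebra $\mathcal{G}$ (the case above is $\mathcal{G} = \sigma(Y)$), the fix is to demand a single function $\mu : \Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]$, $(\omega, A) \mapsto \mu(\omega, A)$, that is simultaneously:

  • (1) a version of $\Pr(X \in A \mid \mathcal{G})$ in $\omega$, for each fixed Borel set $A$;
  • (2) a probability measure in $A$, for each fixed $\omega$.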

Condition (1) is the “conditional probability” content: $\mu(\cdot, A)$ matches the conditional expectation of the indicator $\mathbb{1}_{\{X \in A\}}$. Condition (2) is the regularity condition: for each $\omega$, the slice $\mu(\omega, \cdot)$ is a genuine probability measure on $\mathbb{R}$, not just a collection of numbers indexed by Borel sets. A function $\mu$ satisfying both is a regular conditional distribution of $X$ given $\mathcal{G}$.
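For intuition, the case where $\mathcal{G}$ is generated by a finite partition $B_1, \dots, B_k$ of $\Omega$ with each $\Pr(B_i) > 0$ is fully elementary: for $\omega \in B_i$ one can take $\mu(\omega, A) = \Pr(\{X \in A\} \cap B_i) / \Pr(B_i)$. The sketch below assumes a hypothetical five-point sample space (the probabilities, the values of $X$, and the partition are made up for illustration), builds $\mu$ this way, and checks both conditions exactly with rational arithmetic. Since every event of $\mathcal{G}$ is a union of cells, condition (1) reduces to a finite check.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite sample space: outcome -> (probability, value of X).
OMEGA = {
    "a": (Fraction(1, 8), 0),
    "b": (Fraction(1, 8), 1),
    "c": (Fraction(1, 4), 1),
    "d": (Fraction(1, 4), 2),
    "e": (Fraction(1, 4), 0),
}
# Hypothetical partition of OMEGA generating the sub-sigma-algebra G.
PARTITION = [frozenset({"a", "b"}), frozenset({"c", "d", "e"})]
VALUES = sorted({x for _, x in OMEGA.values()})

def powerset(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def prob(event):
    """P(event) for a set of outcomes."""
    return sum((OMEGA[w][0] for w in event), Fraction(0))

def mu(w, A):
    """mu(w, A) = P({X in A} ∩ B) / P(B), where B is the partition cell containing w."""
    B = next(cell for cell in PARTITION if w in cell)
    return prob({v for v in B if OMEGA[v][1] in A}) / prob(B)

def is_version():
    """Condition (1): E[mu(., A); G] = P({X in A} ∩ G) for every event G of the sigma-algebra and every A."""
    for cells in powerset(PARTITION):
        G = frozenset().union(*cells)  # each event of the sigma-algebra is a union of cells
        for A in powerset(VALUES):
            lhs = sum((OMEGA[w][0] * mu(w, A) for w in G), Fraction(0))
            if lhs != prob({w for w in G if OMEGA[w][1] in A}):
                return False
    return True

def is_regular():
    """Condition (2): for every w, A -> mu(w, A) is a probability measure on the range of X."""
    for w in OMEGA:
        if mu(w, VALUES) != 1:
            return False
        # On a finite space, additivity over singletons is all that is needed.
        for A in powerset(VALUES):
            if mu(w, A) != sum((mu(w, (x,)) for x in A), Fraction(0)):
                return False
    return True

print(is_version(), is_regular())   # True True
print(mu("a", (0,)), mu("c", (2,)))  # 1/2 1/3
```

In this discrete setting both conditions hold trivially; the point of the general theory is that the same two properties can be achieved when every atom of $\mathcal{G}$ is null.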

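Once an RCD exists, the conditional expectation of any function of $X$ is computed by integrating against it: if $\mu$ is an RCD of $X$ given $\mathcal{G}$ and $f : \mathbb{R} \to \mathbb{R}$ is Borel with $\mathbb{E}|f(X)| < \infty$, then

$$\mathbb{E}[f(X) \mid \mathcal{G}](\omega) \;=\; \int_{\mathbb{R}} f(x) \, \mu(\omega, dx) \quad \text{for } \Pr\text{-almost every } \omega.$$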

This recovers the elementary picture from the introduction: when $\mathcal{G} = \sigma(Y)$, each $\mu(\cdot, A)$ is $\sigma(Y)$-measurable and hence a function of $Y$, so conditioning on $Y = y$ amounts to integrating against the conditional measure $\mu(\omega, \cdot)$ for any $\omega \in \{Y = y\}$.
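As a concrete check of integration against an RCD, consider $X$ and $Y$ each standard normal with correlation $\rho \in (-1, 1)$. Here one can take $\mu(\omega, \cdot) = N(\rho Y(\omega), 1 - \rho^2)$; this is the standard Gaussian conditioning fact, used below as an assumption rather than derived. The sketch integrates $f(x) = x^2$ against this kernel with composite Simpson's rule on a window of 12 conditional standard deviations and compares the result with the closed form $\mathbb{E}[X^2 \mid Y = y] = \rho^2 y^2 + 1 - \rho^2$. The helper names, the window width, and the grid size are illustrative choices.

```python
import math

def kernel_density(x, y, rho):
    """Density at x of N(rho*y, 1 - rho^2), the assumed conditional law of X given Y = y."""
    var = 1.0 - rho * rho
    return math.exp(-(x - rho * y) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate_against_kernel(f, y, rho, width=12.0, n=2000):
    """Approximate the integral of f against K(y, dx) by composite Simpson's rule
    on [m - width*s, m + width*s], with m, s the conditional mean and std.
    n must be even."""
    m = rho * y
    s = math.sqrt(1.0 - rho * rho)
    a, b = m - width * s, m + width * s
    h = (b - a) / n
    total = f(a) * kernel_density(a, y, rho) + f(b) * kernel_density(b, y, rho)
    for i in range(1, n):
        x = a + i * h
        weight = 4 if i % 2 == 1 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * f(x) * kernel_density(x, y, rho)
    return total * h / 3.0

rho, y = 0.6, 1.5
numeric = integrate_against_kernel(lambda x: x * x, y, rho)
exact = rho**2 * y**2 + 1 - rho**2
print(f"numeric={numeric:.8f} exact={exact:.8f}")
```

The tails beyond 12 standard deviations carry negligible mass, so the truncation error is far below the quadrature error.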

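Every real-valued $X$ admits an RCD given any sub-$\sigma$-algebra $\mathcal{G}$. The construction proceeds via conditional cumulative distribution functions on the rationals, extends them to $\mathbb{R}$, and then converts each resulting CDF into a measure.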
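Written out, the three steps look roughly as follows. This is a condensed sketch of the standard argument rather than a full proof; the symbols $H$, $F$, $N$, and $F_0$ are notation introduced here for illustration. The null set $N$ exists because monotonicity of conditional expectation gives each inequality $H(\cdot, q) \le H(\cdot, q')$ almost surely, and there are only countably many pairs of rationals.

```latex
\begin{aligned}
&\textbf{Step 1 (rationals).}\ \text{For each } q \in \mathbb{Q}, \text{ fix a version } H(\cdot, q) \text{ of } \Pr(X \le q \mid \mathcal{G}).
  \text{ Off a null set } N \in \mathcal{G}, \\
&\qquad q \mapsto H(\omega, q) \text{ is nondecreasing on } \mathbb{Q}, \quad
  \lim_{q \to -\infty} H(\omega, q) = 0, \quad \lim_{q \to +\infty} H(\omega, q) = 1. \\[4pt]
&\textbf{Step 2 (extend to } \mathbb{R}\textbf{).}\
  F(\omega, x) := \inf_{q \in \mathbb{Q},\, q > x} H(\omega, q) \ \ (\omega \notin N), \qquad
  F(\omega, \cdot) := F_0 \ \ (\omega \in N), \\
&\qquad \text{with } F_0 \text{ any fixed CDF. Each } F(\omega, \cdot) \text{ is nondecreasing and right-continuous, with limits } 0 \text{ and } 1. \\[4pt]
&\textbf{Step 3 (measure).}\ \mu(\omega, \cdot) := \text{the Lebesgue--Stieltjes measure of } F(\omega, \cdot),
  \text{ so } \mu(\omega, (-\infty, x]) = F(\omega, x). \\
&\qquad \text{Conditional dominated convergence gives } F(\cdot, x) = \lim_{q \downarrow x} H(\cdot, q) = \Pr(X \le x \mid \mathcal{G}) \text{ a.s.}, \\
&\qquad \text{and a } \pi\text{-}\lambda \text{ argument extends condition (1) from half-lines to all Borel sets.}
\end{aligned}
```

Setting $\mu(\omega, \cdot)$ to a fixed measure on $N$ is what makes condition (2) hold for every $\omega$, not just almost every one; since $N \in \mathcal{G}$, this does not spoil the $\mathcal{G}$-measurability required by condition (1).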

  • The role of $\mathbb{R}$. The existence proof relied on $\mathbb{R}$ in exactly one place: building a measure from its values on the countable $\pi$-system of half-lines $\{(-\infty, r] : r \in \mathbb{Q}\}$. The same machinery works for any random variable taking values in a Borel space (a measurable space isomorphic to a Borel subset of a Polish space), which covers $\mathbb{R}^n$, complete separable metric spaces, and most state spaces of practical interest. For general measurable target spaces, RCDs need not exist.
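  • Uniqueness. Two RCDs $\mu_1, \mu_2$ of $X$ given $\mathcal{G}$ agree as measures for $\Pr$-almost every $\omega$. The proof mirrors the integration-identity argument above: for each $r \in \mathbb{Q}$, both $\mu_1(\cdot, (-\infty, r])$ and $\mu_2(\cdot, (-\infty, r])$ are versions of $\Pr(X \le r \mid \mathcal{G})$, so they agree off a null set $N_r$. Off the null set $\bigcup_{r \in \mathbb{Q}} N_r$ the two measures agree on the rational half-lines, and the $\pi$-$\lambda$ theorem upgrades this to equality on all Borel sets.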
  • Disintegration. When $\mathcal{G} = \sigma(Y)$ for some random variable $Y$ taking values in a Borel space, RCDs assemble into a disintegration of the joint law of $(X, Y)$: a Markov kernel $K(y, dx) := \mu(\omega, dx)$ for any $\omega$ with $Y(\omega) = y$, well-defined up to a $\Pr_Y$-null set of $y$, satisfying $\Pr(X \in A, Y \in B) = \int_B K(y, A) \, \Pr_Y(dy)$. Conditioning takes the elementary form $\mathbb{E}[f(X) \mid Y = y] = \int_{\mathbb{R}} f(x) \, K(y, dx)$, closing the loop with the introduction’s discussion of the discrete and absolutely continuous cases.
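A finite example makes the disintegration concrete. Assuming a hypothetical joint pmf for $(X, Y)$ on a small grid (the numbers are made up for illustration), the sketch builds the kernel $K(y, A) = \Pr(X \in A, Y = y) / \Pr(Y = y)$ on the support of $Y$, checks that each $K(y, \cdot)$ is a probability measure, verifies the identity $\Pr(X \in A, Y \in B) = \sum_{y \in B} K(y, A) \Pr(Y = y)$ for every pair of sets $A$, $B$, and evaluates $\mathbb{E}[f(X) \mid Y = y]$ as the integral of $f$ against $K(y, \cdot)$.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

# Hypothetical joint pmf of (X, Y): (x, y) -> probability.
JOINT = {
    (0, "u"): Fr(1, 10), (1, "u"): Fr(3, 10),
    (0, "v"): Fr(1, 5),  (2, "v"): Fr(1, 5),
    (1, "w"): Fr(1, 5),
}
XS = sorted({x for x, _ in JOINT})
YS = sorted({y for _, y in JOINT})

def p_Y(y):
    """Marginal law of Y."""
    return sum((p for (_, yy), p in JOINT.items() if yy == y), Fr(0))

def K(y, A):
    """Markov kernel K(y, A) = P(X in A, Y = y) / P(Y = y), for y with P(Y = y) > 0."""
    return sum((p for (x, yy), p in JOINT.items() if yy == y and x in A), Fr(0)) / p_Y(y)

def cond_exp(f, y):
    """E[f(X) | Y = y] as the integral of f against K(y, .)."""
    return sum((f(x) * K(y, {x}) for x in XS), Fr(0))

def powerset(items):
    """All subsets of a finite list, as Python sets."""
    items = list(items)
    return [set(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def disintegration_holds():
    """Check P(X in A, Y in B) == sum over y in B of K(y, A) * P(Y = y), for all A and B."""
    for A in powerset(XS):
        for B in powerset(YS):
            lhs = sum((p for (x, y), p in JOINT.items() if x in A and y in B), Fr(0))
            rhs = sum((K(y, A) * p_Y(y) for y in B), Fr(0))
            if lhs != rhs:
                return False
    return True

print(all(K(y, set(XS)) == 1 for y in YS), disintegration_holds())  # True True
print(cond_exp(lambda x: x, "v"))  # 1
```

In the discrete case the kernel is just the elementary conditional pmf; the content of the theorem is that a kernel with the same disintegration property exists even when every $\{Y = y\}$ is null.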