Conditional expectation inherits the structural properties of the integral (linearity, monotonicity, monotone convergence) and adds a few of its own (tower property, take-out-what-is-known, independence). All the proofs reduce to checking the two conditions in the definition: $\cG$-measurability and the integration identity on $\cG$-events.
Throughout this page, $(\Omega, \cF, \Pr)$ is a probability space, $\cG \subseteq \cF$ is a sub-$\sigma$-field, and "$\E(\cdot \mid \cG)$" refers to any version of the conditional expectation.
Theorem: Linearity
For integrable $X, Y$ and any $a \in \R$,
$$\E(a X + Y \mid \cG) \;\stackrel{\text{a.s.}}{=}\; a \, \E(X \mid \cG) + \E(Y \mid \cG).$$

The right-hand side is $\cG$-measurable (a linear combination of $\cG$-measurable functions). For any $A \in \cG$,

$$\int_A \big(a \E(X \mid \cG) + \E(Y \mid \cG)\big) \, d\Pr \;=\; a \int_A X \, d\Pr + \int_A Y \, d\Pr \;=\; \int_A (a X + Y) \, d\Pr,$$

using linearity of the integral in both steps, together with the integration identities for $X$ and $Y$ in the first. So the right-hand side satisfies both defining conditions for $\E(a X + Y \mid \cG)$, and by uniqueness the two agree a.s.
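Linearity can be sanity-checked on a finite model. The sketch below assumes a six-point sample space with uniform probabilities and a $\sigma$-field generated by a partition, so that conditional expectation is just a probability-weighted block average. The names `OMEGA`, `P`, `G`, and `cond_exp` are illustrative and do not come from the text.

```python
from fractions import Fraction

# Hypothetical finite model: Omega = {0,...,5} with uniform P, and G
# generated by the partition {0,1}, {2,3,4}, {5}.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}
G = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(X, P, partition):
    """E(X | sigma(partition)) as a dict omega -> value (block averages)."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

X = {w: Fraction(w * w) for w in OMEGA}
Y = {w: Fraction(3 - w) for w in OMEGA}
a = Fraction(-2)

# Left side: condition the combination. Right side: combine the conditionals.
lhs = cond_exp({w: a * X[w] + Y[w] for w in OMEGA}, P, G)
EX, EY = cond_exp(X, P, G), cond_exp(Y, P, G)
rhs = {w: a * EX[w] + EY[w] for w in OMEGA}

linearity_holds = (lhs == rhs)
print(linearity_holds)
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance check.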
Theorem: Monotonicity
If $X \le Y$ a.s. with $X, Y$ integrable, then
$$\E(X \mid \cG) \;\le\; \E(Y \mid \cG) \quad \text{a.s.}$$

Mirror the uniqueness argument. Suppose

$$A \;:=\; \{\E(X \mid \cG) > \E(Y \mid \cG)\}$$

has $\Pr(A) > 0$. Since both conditional expectations are $\cG$-measurable, $A \in \cG$. Applying the integration identity on $A$,

$$\int_A \E(X \mid \cG) \, d\Pr \;=\; \int_A X \, d\Pr \;\le\; \int_A Y \, d\Pr \;=\; \int_A \E(Y \mid \cG) \, d\Pr,$$

so $\int_A \big(\E(X \mid \cG) - \E(Y \mid \cG)\big) \, d\Pr \le 0$. But the integrand is strictly positive on $A$, and $\Pr(A) > 0$, so the integral is strictly positive. Contradiction. Hence $\Pr(A) = 0$.
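As a concrete check of the monotonicity statement, the following sketch builds the exceptional set $A$ from the proof on a finite model and confirms it is empty. It assumes a partition-generated $\sigma$-field; `OMEGA`, `P`, `G`, and `cond_exp` are hypothetical names chosen for the illustration.

```python
from fractions import Fraction

# Hypothetical finite model: uniform P on {0,...,5}, G generated by a partition.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}
G = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(X, P, partition):
    """E(X | sigma(partition)) as a dict omega -> value (block averages)."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

X = {w: Fraction(w) for w in OMEGA}
Y = {w: Fraction(w + (w % 2)) for w in OMEGA}   # Y >= X pointwise

EX, EY = cond_exp(X, P, G), cond_exp(Y, P, G)

# The set A = {E(X|G) > E(Y|G)} from the proof; it should have probability 0.
A = {w for w in OMEGA if EX[w] > EY[w]}
print(A)
```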
Theorem: Conditional MCT
If $0 \le X_n \uparrow X$ a.s. with $\E[X] < \infty$, then
$$\E(X_n \mid \cG) \;\uparrow\; \E(X \mid \cG) \quad \text{a.s.}$$

Let $Y_n := X - X_n$ and $Z_n := \E(Y_n \mid \cG)$. By assumption $Y_n \ge 0$ and $Y_n \downarrow 0$ a.s. By linearity,

$$Z_n \;=\; \E(X \mid \cG) - \E(X_n \mid \cG),$$

so $\E(X_n \mid \cG) \uparrow \E(X \mid \cG)$ is equivalent to $Z_n \downarrow 0$ a.s.
Step 1: $\int_A Z_n \, d\Pr \to 0$ for every $A \in \cG$. Apply the integration identity:

$$\int_A Z_n \, d\Pr \;=\; \int_A Y_n \, d\Pr.$$

Since $0 \le Y_n \le X$ and $\E[X] < \infty$, the dominated convergence theorem gives $\int_A Y_n \, d\Pr \to 0$.
Step 2: $Z_n$ has an a.s. limit. Monotonicity applied to $Y_{n+1} \le Y_n$ gives $Z_{n+1} \le Z_n$ a.s. So $Z_n$ is non-increasing in $n$ and bounded below by $0$ (apply monotonicity again to $Y_n \ge 0$). Hence $Z_n \downarrow Z_\infty$ a.s. for some $Z_\infty \ge 0$.
Step 3: $Z_\infty = 0$ a.s. Since $Z_n \le Z_1 \le \E(X \mid \cG)$, and $\E[\E(X \mid \cG)] = \E[X] < \infty$, the $Z_n$ are uniformly dominated by an integrable function. Dominated convergence applied again gives

$$\int_A Z_n \, d\Pr \;\longrightarrow\; \int_A Z_\infty \, d\Pr.$$

Combining with Step 1, $\int_A Z_\infty \, d\Pr = 0$ for every $A \in \cG$. Since $Z_\infty$ is $\cG$-measurable (as an a.s. limit of $\cG$-measurable functions) and non-negative, taking $A = \{Z_\infty > 0\} \in \cG$ forces $Z_\infty = 0$ a.s.
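The statement of the conditional MCT can be illustrated with truncations $X_n = \min(X, n)$, which satisfy $0 \le X_n \uparrow X$. The sketch below is only a finite-space illustration under assumed names (`OMEGA`, `P`, `G`, `cond_exp`, `seq`). On a finite space the limit is reached after finitely many steps, so it shows the monotone behaviour but not the limiting argument of the proof.

```python
from fractions import Fraction

# Hypothetical finite model: uniform P on {0,...,5}, G generated by a partition.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}
G = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(X, P, partition):
    """E(X | sigma(partition)) as a dict omega -> value (block averages)."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

X = {w: Fraction(w * w) for w in OMEGA}   # values 0, 1, 4, 9, 16, 25
target = cond_exp(X, P, G)

# Conditional expectations of the truncations X_n = min(X, n), n = 0..25.
seq = [cond_exp({w: min(X[w], Fraction(n)) for w in OMEGA}, P, G)
       for n in range(26)]

nondecreasing = all(seq[n][w] <= seq[n + 1][w]
                    for n in range(25) for w in OMEGA)
below_target = all(seq[n][w] <= target[w]
                   for n in range(26) for w in OMEGA)
reaches_target = (seq[25] == target)
print(nondecreasing, below_target, reaches_target)
```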
The analogous conditional Fatou and conditional dominated convergence theorems follow by the same machine (apply the unconditional versions inside $\int_A \cdot \, d\Pr$ and translate via the integration identity).
The reading: if $Y$ carries no information about $X$, then conditioning on $Y$ is the same as not conditioning at all.
This is the opposite extreme of the independence case: if $X$ is already $\cG$-measurable, then $\cG$ carries all the information about $X$ and conditioning is the identity.
Theorem: Tower Property
Let $\cF_1 \subseteq \cF_2 \subseteq \cF$ be sub-$\sigma$-fields. For any integrable $X$,
$$\begin{aligned}
\E\!\big( \E(X \mid \cF_1) \,\big|\, \cF_2 \big) &\;\stackrel{\text{a.s.}}{=}\; \E(X \mid \cF_1), \\
\E\!\big( \E(X \mid \cF_2) \,\big|\, \cF_1 \big) &\;\stackrel{\text{a.s.}}{=}\; \E(X \mid \cF_1).
\end{aligned}$$

Iterated conditional expectations collapse to the smaller $\sigma$-field, regardless of the order.
(a) Inner $\cF_1$, outer $\cF_2$. $\E(X \mid \cF_1)$ is $\cF_1$-measurable, hence $\cF_2$-measurable (since $\cF_1 \subseteq \cF_2$). By self-measurability, conditioning an $\cF_2$-measurable function on $\cF_2$ returns the function itself.
(b) Inner $\cF_2$, outer $\cF_1$. Take any $A \in \cF_1$. Since $\cF_1 \subseteq \cF_2$, also $A \in \cF_2$. Applying the integration identity at each level,

$$\int_A \E(X \mid \cF_1) \, d\Pr \;=\; \int_A X \, d\Pr \;=\; \int_A \E(X \mid \cF_2) \, d\Pr.$$

The right-hand side is $\int_A Y \, d\Pr$ with $Y = \E(X \mid \cF_2)$, so $\E(X \mid \cF_1)$ satisfies the integration identity that defines $\E(Y \mid \cF_1)$. Combined with the $\cF_1$-measurability of $\E(X \mid \cF_1)$, uniqueness gives $\E(\E(X \mid \cF_2) \mid \cF_1) = \E(X \mid \cF_1)$ a.s.
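Both orders of the tower property can be checked by hand on a small model. The sketch below assumes two nested partitions, with `F2` refining `F1` so that $\cF_1 \subseteq \cF_2$, and a non-uniform probability vector. All names (`OMEGA`, `P`, `F1`, `F2`, `cond_exp`) are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical finite model with non-uniform probabilities (they sum to 1).
OMEGA = range(6)
weights = [1, 3, 2, 2, 1, 3]
P = {w: Fraction(weights[w], 12) for w in OMEGA}

# F2 refines F1: every block of F1 is a union of blocks of F2.
F1 = [{0, 1}, {2, 3, 4}, {5}]
F2 = [{0}, {1}, {2, 3}, {4}, {5}]

def cond_exp(X, P, partition):
    """E(X | sigma(partition)) as a dict omega -> value (block averages)."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

X = {w: Fraction(w * w) for w in OMEGA}
E1 = cond_exp(X, P, F1)
E2 = cond_exp(X, P, F2)

inner_F1_outer_F2 = cond_exp(E1, P, F2)   # case (a)
inner_F2_outer_F1 = cond_exp(E2, P, F1)   # case (b)

print(inner_F1_outer_F2 == E1, inner_F2_outer_F1 == E1)
```

In this model `E2` differs from `E1`, so the collapse to the coarser $\sigma$-field is not an accident of the two conditional expectations coinciding.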
Specializing to $\cF_1 = \{\emptyset, \Omega\}$ (the trivial $\sigma$-field, for which $\E(\cdot \mid \cF_1) = \E[\cdot]$) and $\cF_2 = \cG$ gives the law of iterated expectations:

$$\E\!\big[ \E(X \mid \cG) \big] \;=\; \E[X].$$
Averaging the conditional expectation against the full distribution recovers the unconditional expectation.
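A small worked instance, assuming a fair six-sided die with $\cG$ generated by the parity of the roll: the conditional expectation is 3 on odd outcomes and 4 on even outcomes, and averaging these recovers $\E[X] = 7/2$. The helper names `cond_exp` and `expectation` are hypothetical.

```python
from fractions import Fraction

# Assumed model: a fair die, with G generated by the partition {odd}, {even}.
OMEGA = range(1, 7)
P = {w: Fraction(1, 6) for w in OMEGA}
G = [{1, 3, 5}, {2, 4, 6}]
X = {w: Fraction(w) for w in OMEGA}

def expectation(Z, P):
    """Unconditional expectation of Z under P."""
    return sum(Z[w] * P[w] for w in P)

def cond_exp(X, P, partition):
    """E(X | sigma(partition)) as a dict omega -> value (block averages)."""
    out = {}
    for block in partition:
        mass = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

EXG = cond_exp(X, P, G)
print(EXG[1], EXG[2])                            # value on odd, value on even
print(expectation(EXG, P), expectation(X, P))    # the two sides of the identity
```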
Note that this is the same indicator-simple-non-negative-general progression used to define the Lebesgue integral itself. Every "extend to integrable $f$" argument in measure theory looks the same.