We define the joint distribution to study the behavior of multiple random variables simultaneously.
Independence can be characterized purely in terms of distribution functions.
This generalizes to $n$ random variables $X_1, \dots, X_n$: they are mutually independent if and only if

$$P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i)$$

for all $x_1, \dots, x_n \in \mathbb{R}$.
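As an empirical illustration of this criterion, here is a minimal Python sketch; the Exponential(1) and standard normal marginals, the sample size, and the test points are arbitrary choices for the demo, not part of the theory:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two independent samples: X ~ Exponential(1), Y ~ N(0, 1).
x = rng.exponential(1.0, size=n)
y = rng.standard_normal(size=n)

def joint_cdf(x1, x2):
    """Empirical P(X <= x1, Y <= x2)."""
    return np.mean((x <= x1) & (y <= x2))

def marginal_cdf(sample, t):
    """Empirical P(sample <= t)."""
    return np.mean(sample <= t)

# The factorization should hold (up to Monte Carlo error) at every point.
for x1, x2 in [(0.5, 0.0), (1.0, 1.0), (2.0, -0.5)]:
    lhs = joint_cdf(x1, x2)
    rhs = marginal_cdf(x, x1) * marginal_cdf(y, x2)
    print(f"P(X<={x1}, Y<={x2}) = {lhs:.4f},  product = {rhs:.4f}")
```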
Returning to the two-variable case, define a measure $\eta$ on $\mathbb{R}^2$ as the joint distribution of the random vector $(X, Y)$; on half-infinite rectangles it is given by

$$\eta\big((-\infty, x_1] \times (-\infty, x_2]\big) = P(X \le x_1, Y \le x_2).$$
If $X \perp Y$, this factorizes:

$$\eta\big((-\infty, x_1] \times (-\infty, x_2]\big) = \mu\big((-\infty, x_1]\big) \cdot \nu\big((-\infty, x_2]\big),$$

where $\mu$ and $\nu$ are the distributions (laws) of $X$ and $Y$ respectively. Since the half-infinite rectangles $(-\infty, x_1] \times (-\infty, x_2]$ form a $\pi$-system that generates the Borel $\sigma$-algebra on $\mathbb{R}^2$, two probability measures agreeing on them agree everywhere; hence $\eta = \mu \times \nu$ is the product measure, satisfying $\eta(A \times B) = \mu(A)\,\nu(B)$ for all Borel sets $A, B$.
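The rectangle identity can be checked numerically as well; a minimal sketch under the same assumed setup as above, with arbitrarily chosen finite intervals for $A$ and $B$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Independent X ~ Exponential(1), Y ~ N(0, 1), as in the earlier sketch.
x = rng.exponential(1.0, size=n)
y = rng.standard_normal(size=n)

# Borel sets A = [1, 2] and B = [-1, 0] (illustrative choices).
in_A = (x >= 1.0) & (x <= 2.0)
in_B = (y >= -1.0) & (y <= 0.0)

eta = np.mean(in_A & in_B)                    # eta(A x B): joint law of (X, Y)
mu_times_nu = np.mean(in_A) * np.mean(in_B)   # mu(A) * nu(B)
print(f"eta(AxB) = {eta:.4f},  mu(A)*nu(B) = {mu_times_nu:.4f}")
```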
The product structure lets us compute expectations of functions of independent random variables as iterated integrals: by the Fubini–Tonelli theorem,

$$E[g(X, Y)] = \int_{\mathbb{R}^2} g \, d(\mu \times \nu) = \int_{\mathbb{R}} \left( \int_{\mathbb{R}} g(x, y) \, \mu(dx) \right) \nu(dy),$$

and in particular $E[XY] = E[X]\,E[Y]$ when $X$ and $Y$ are integrable.
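As a sketch of this computation (the marginals, the choice $g(x, y) = xy$, and the use of scipy.integrate.quad are illustrative assumptions, not from the text), one can compare a Monte Carlo estimate of $E[g(X, Y)]$ under the joint law with the corresponding iterated integral over the product measure:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(2)
n = 100_000

# Independent X ~ Exponential(1) (density e^{-x} on [0, inf))
# and Y ~ Uniform(0, 1) (density 1 on [0, 1]); take g(x, y) = x * y.
x = rng.exponential(1.0, size=n)
y = rng.uniform(0.0, 1.0, size=n)

# Monte Carlo estimate of E[g(X, Y)] under the joint law eta.
mc = np.mean(x * y)

# Iterated integral over mu x nu: by Fubini-Tonelli, the double
# integral of x*y splits as (int x e^{-x} dx) * (int y dy).
inner, _ = quad(lambda t: t * np.exp(-t), 0, np.inf)  # E[X] = 1
outer, _ = quad(lambda t: t, 0, 1)                    # E[Y] = 1/2
print(f"Monte Carlo: {mc:.4f},  iterated integral: {inner * outer:.4f}")
```

Both values should agree up to Monte Carlo error, illustrating $E[XY] = E[X]\,E[Y]$ for independent integrable variables.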