We now introduce concepts to measure the linear relationship between two random variables.
Definition: Covariance
Let $X, Y$ be two random variables. Their covariance is
$$\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])],$$
provided the expectations exist.
Computational Formula:
$$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$$
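The computational formula follows by expanding the product and applying linearity of expectation (note that $\mathbb{E}[X]$ and $\mathbb{E}[Y]$ are constants):

```latex
\begin{align*}
\text{Cov}(X, Y)
  &= \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] \\
  &= \mathbb{E}\big[XY - X\,\mathbb{E}[Y] - Y\,\mathbb{E}[X] + \mathbb{E}[X]\mathbb{E}[Y]\big] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]
\end{align*}
```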
Why normalize? Covariance captures how $X$ and $Y$ relate, but it depends on the scale:
$$\text{Cov}(aX, bY) = ab \, \text{Cov}(X, Y)$$
$$\text{Var}(aX) = a^2 \, \text{Var}(X)$$
Dividing by the square roots of the variances, as in $\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}$, cancels the scaling factors ($a$ and $b$, assuming they are positive), making correlation dimensionless and easier to interpret.
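A quick numerical check of this scaling behavior (a sketch using NumPy; the particular distributions, seed, and factors $a = 3$, $b = 5$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x by construction

a, b = 3.0, 5.0  # arbitrary positive scaling factors

cov = np.cov(x, y)[0, 1]                 # sample covariance
cov_scaled = np.cov(a * x, b * y)[0, 1]  # covariance after rescaling

corr = np.corrcoef(x, y)[0, 1]                 # sample correlation
corr_scaled = np.corrcoef(a * x, b * y)[0, 1]  # correlation after rescaling

# Covariance picks up the factor a*b; correlation is unchanged.
print(cov_scaled / cov)    # a * b = 15, up to floating-point rounding
print(corr_scaled - corr)  # ~0
```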
$X$ and $Y$ are called uncorrelated if
$$\text{Cov}(X, Y) = 0$$
Note
We use covariance instead of correlation to define uncorrelatedness, because correlation is undefined if $\text{Var}(X) = 0$ (i.e., $X$ is constant), whereas covariance is simply $0$ in that case.
Independence $\implies$ Uncorrelatedness (if variances exist)
If $X \perp Y$, then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$.
Thus $\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0$.
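A quick sanity check by simulation (a sketch; the distributions, seed, and sample size are arbitrary choices): when $X$ and $Y$ are drawn independently, the sample covariance is close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.normal(loc=2.0, scale=3.0, size=n)   # X, drawn independently of Y
y = rng.uniform(low=-1.0, high=1.0, size=n)  # Y, drawn independently of X

# For independent samples the sample covariance converges to Cov(X, Y) = 0.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # close to 0
```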
Technical Note: If $X$ or $Y$ does not have a finite expectation (e.g., the Cauchy distribution), then the covariance is undefined, so the pair cannot be “uncorrelated” even if independent. Thus, strictly speaking, independence is not “stronger” than uncorrelatedness, because it does not enforce the existence of moments.
Uncorrelatedness $\not\implies$ Independence
$\text{Cov}(X, Y) = 0$ provides only a single number summarizing a linear relationship; it does not guarantee the full factorization of probabilities required for independence.
Exercise 10
Find an example where $\text{Cov}(X, Y) = 0$ but $X \not\perp Y$.
Let $X \sim \mathcal{N}(0, 1)$ and $Y = |X|$.
Clearly $X$ and $Y$ are dependent (knowing $X$ fully determines $Y$).
$\mathbb{E}[X] = 0$.
$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[X|X|] - 0$.
$x|x|$ is an odd function (e.g., $(-x)|-x| = -x|x|$).
Since the standard normal PDF is even, the integrand below is odd, so the integral over $\mathbb{R}$ vanishes:
$$\mathbb{E}[X|X|] = \int_{-\infty}^{\infty} \frac{x|x|}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = 0$$
Thus $\text{Cov}(X, Y) = 0$, so they are uncorrelated but dependent.
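A simulation illustrating this example (a sketch; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1_000_000)
y = np.abs(x)  # Y is fully determined by X, hence dependent

# Sample covariance is near 0, matching Cov(X, |X|) = E[X|X|] = 0.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # close to 0

# Dependence shows up beyond the linear summary: X**2 and Y = |X| are
# strongly correlated, which would be impossible if X and Y were independent.
corr_sq = np.corrcoef(x**2, y)[0, 1]
print(corr_sq)  # far from 0
```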
Note
$\text{Corr}(X, Y) = 1$ iff $Y = aX + b$ for some $a > 0$.
$\text{Corr}(X, Y) = -1$ iff $Y = aX + b$ for some $a < 0$.
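A numerical illustration of these boundary cases (a sketch; the data and the line $Y = \pm 2X + 3$ are arbitrary choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # arbitrary non-constant data

corr_pos = np.corrcoef(x, 2.0 * x + 3.0)[0, 1]   # a > 0: perfect positive linear fit
corr_neg = np.corrcoef(x, -2.0 * x + 3.0)[0, 1]  # a < 0: perfect negative linear fit

print(corr_pos)  # +1, up to floating-point rounding
print(corr_neg)  # -1, up to floating-point rounding
```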
Let $X \sim \text{Bin}(n, p)$.
We can model $X$ as the sum of $n$ independent trials:
$$X = Y_1 + \dots + Y_n$$
where $Y_i \sim \text{Bernoulli}(p)$ are i.i.d. (independent and identically distributed).
Expectation:
Since expectation is always linear:
$$\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[Y_i] = n \, \mathbb{E}[Y_1] = np$$
Variance:
Since the $Y_i$ are independent, $\text{Cov}(Y_i, Y_j) = 0$ for $i \ne j$. Thus the variance of the sum is the sum of the variances:
$$\text{Var}(X) = \sum_{i=1}^n \text{Var}(Y_i)$$
For a Bernoulli variable, $\text{Var}(Y_i) = p(1-p)$.
$$\implies \text{Var}(X) = np(1-p)$$
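These formulas can be checked exactly against the binomial PMF, with no simulation needed (a sketch; the parameters $n = 10$, $p = 0.3$ are arbitrary choices):

```python
from math import comb

def binom_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary parameters

# Compute E[X] and Var(X) directly from the definition of expectation.
mean = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
var = sum((k - mean) ** 2 * binom_pmf(n, p, k) for k in range(n + 1))

print(mean)  # n * p = 3.0, up to floating-point rounding
print(var)   # n * p * (1 - p) = 2.1, up to floating-point rounding
```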