We now introduce concepts to measure the linear relationship between two random variables.
Definition: Covariance
Let $X, Y$ be two random variables. Their covariance is
$$\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])],$$
provided the expectations exist.
Computational Formula:
$$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]$$
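The computational formula follows by expanding the product and applying linearity of expectation (note that $\mathbb{E}[X]$ and $\mathbb{E}[Y]$ are constants):

```latex
\begin{align*}
\text{Cov}(X, Y)
  &= \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] \\
  &= \mathbb{E}\big[XY - X\,\mathbb{E}[Y] - Y\,\mathbb{E}[X] + \mathbb{E}[X]\mathbb{E}[Y]\big] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] - \mathbb{E}[X]\mathbb{E}[Y] + \mathbb{E}[X]\mathbb{E}[Y] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]
\end{align*}
```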
Why normalize? Covariance captures how $X$ and $Y$ relate, but it depends on the scale:
$$\text{Cov}(aX, bY) = ab \, \text{Cov}(X, Y)$$
$$\text{Var}(aX) = a^2 \, \text{Var}(X)$$
Dividing by the square roots of the variances, as in $\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}$, cancels the scaling factors ($a$ and $b$, assuming they are positive), making correlation dimensionless and easier to interpret.
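A quick numerical check of this scaling behavior (a sketch using NumPy; the particular distributions, seed, and factors $a = 3$, $b = 5$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x by construction

a, b = 3.0, 5.0  # arbitrary positive scaling factors

cov = np.cov(x, y)[0, 1]                 # sample covariance
cov_scaled = np.cov(a * x, b * y)[0, 1]  # covariance after rescaling

corr = np.corrcoef(x, y)[0, 1]                 # sample correlation
corr_scaled = np.corrcoef(a * x, b * y)[0, 1]  # correlation after rescaling

# Covariance picks up the factor a*b; correlation is unchanged.
print(cov_scaled / cov)    # a * b = 15, up to floating-point rounding
print(corr_scaled - corr)  # ~0
```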
$X$ and $Y$ are called uncorrelated if
$$\text{Cov}(X, Y) = 0$$
Note
We use covariance instead of correlation to define uncorrelatedness, because correlation is undefined if $\text{Var}(X) = 0$ (i.e., $X$ is constant), whereas covariance is simply $0$ in that case.
Independence $\implies$ Uncorrelatedness (if variances exist)
If $X \perp Y$, then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$.
Thus $\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0$.
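A quick sanity check by simulation (a sketch; the distributions, seed, and sample size are arbitrary choices): when $X$ and $Y$ are drawn independently, the sample covariance is close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.normal(loc=2.0, scale=3.0, size=n)   # X, drawn independently of Y
y = rng.uniform(low=-1.0, high=1.0, size=n)  # Y, drawn independently of X

# For independent samples the sample covariance converges to Cov(X, Y) = 0.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # close to 0
```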
Technical Note: If $X$ or $Y$ does not have a finite expectation (e.g., the Cauchy distribution), then the covariance is undefined, so the pair cannot be “uncorrelated” even if independent. Thus, strictly speaking, independence is not “stronger” than uncorrelatedness, because it does not enforce the existence of moments.
Uncorrelatedness $\not\implies$ Independence
$\text{Cov}(X, Y) = 0$ provides only a single number summarizing a linear relationship; it does not guarantee the full factorization of probabilities required for independence.
Exercise 10
Find an example where $\text{Cov}(X, Y) = 0$ but $X \not\perp Y$.
Let $X \sim \mathcal{N}(0, 1)$ and $Y = |X|$.
Clearly $X$ and $Y$ are dependent (knowing $X$ fully determines $Y$).
$\mathbb{E}[X] = 0$.
$\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = \mathbb{E}[X|X|] - 0$.
$x|x|$ is an odd function (e.g., $(-x)|-x| = -x|x|$).
Since the standard normal PDF is even, the integrand below is odd, so the integral over $\mathbb{R}$ vanishes:
$$\mathbb{E}[X|X|] = \int_{-\infty}^{\infty} \frac{x|x|}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = 0$$
Thus $\text{Cov}(X, Y) = 0$, so they are uncorrelated but dependent.
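A simulation illustrating this example (a sketch; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(1_000_000)
y = np.abs(x)  # Y is fully determined by X, hence dependent

# Sample covariance is near 0, matching Cov(X, |X|) = E[X|X|] = 0.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # close to 0

# Dependence shows up beyond the linear summary: X**2 and Y = |X| are
# strongly correlated, which would be impossible if X and Y were independent.
corr_sq = np.corrcoef(x**2, y)[0, 1]
print(corr_sq)  # far from 0
```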
Note
$\text{Corr}(X, Y) = 1$ iff $Y = aX + b$ for some $a > 0$.
$\text{Corr}(X, Y) = -1$ iff $Y = aX + b$ for some $a < 0$.
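A numerical illustration of these boundary cases (a sketch; the data and the line $Y = \pm 2X + 3$ are arbitrary choices):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # arbitrary non-constant data

corr_pos = np.corrcoef(x, 2.0 * x + 3.0)[0, 1]   # a > 0: perfect positive linear fit
corr_neg = np.corrcoef(x, -2.0 * x + 3.0)[0, 1]  # a < 0: perfect negative linear fit

print(corr_pos)  # +1, up to floating-point rounding
print(corr_neg)  # -1, up to floating-point rounding
```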
Let $X \sim \text{Bin}(n, p)$.
We can model $X$ as the sum of $n$ independent trials:
$$X = Y_1 + \dots + Y_n$$
where $Y_i \sim \text{Bernoulli}(p)$ are i.i.d. (independent and identically distributed).
Expectation:
Since expectation is always linear:
$$\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[Y_i] = n \, \mathbb{E}[Y_1] = np$$
Variance:
Since the $Y_i$ are independent, $\text{Cov}(Y_i, Y_j) = 0$ for $i \ne j$. Thus the variance of the sum is the sum of the variances:
$$\text{Var}(X) = \sum_{i=1}^n \text{Var}(Y_i)$$
For a Bernoulli variable, $\text{Var}(Y_i) = p(1-p)$.
$$\implies \text{Var}(X) = np(1-p)$$
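These formulas can be checked exactly against the binomial PMF, with no simulation needed (a sketch; the parameters $n = 10$, $p = 0.3$ are arbitrary choices):

```python
from math import comb

def binom_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary parameters

# Compute E[X] and Var(X) directly from the definition of expectation.
mean = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
var = sum((k - mean) ** 2 * binom_pmf(n, p, k) for k in range(n + 1))

print(mean)  # n * p = 3.0, up to floating-point rounding
print(var)   # n * p * (1 - p) = 2.1, up to floating-point rounding
```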