Covariance & Correlation

We now introduce concepts to measure the linear relationship between two random variables.

Why normalize? Covariance captures how $X$ and $Y$ relate, but it depends on the scale of the variables:

  • $\text{Cov}(aX, bY) = ab \, \text{Cov}(X, Y)$
  • $\text{Var}(aX) = a^2 \text{Var}(X)$

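Dividing by the product of the standard deviations (the square roots of the variances) cancels the scaling factors $a$ and $b$, provided they are positive, making correlation dimensionless and easier to interpret. A negative factor flips the sign of the correlation but leaves its magnitude unchanged.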
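The scaling behaviour can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mean`, `cov`, and `corr`, the sample data, and the factors `a` and `b` are illustrative assumptions, and `cov` uses the population form (dividing by the number of samples).

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of paired samples: mean of (x - mx) * (y - my)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))

# Illustrative data and positive scaling factors (assumed, not from the notes).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
a, b = 100.0, 0.5

scaled_xs = [a * x for x in xs]
scaled_ys = [b * y for y in ys]

# Covariance picks up the factor a * b; correlation does not change.
print("Cov(X, Y)    =", cov(xs, ys))
print("Cov(aX, bY)  =", cov(scaled_xs, scaled_ys))
print("Corr(X, Y)   =", corr(xs, ys))
print("Corr(aX, bY) =", corr(scaled_xs, scaled_ys))
```

Changing the units of $X$ (here by a factor of 100) changes the covariance by the same factor, while the correlation stays the same up to floating-point rounding.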

$X$ and $Y$ are called uncorrelated if:

$$\text{Cov}(X, Y) = 0$$
  1. Independence $\implies$ Uncorrelatedness (if variances exist). If $X \perp Y$, then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$. Thus $\text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0$.

    Technical Note: If $X$ or $Y$ does not have a finite expectation (e.g., the Cauchy distribution), then the covariance is undefined, so they cannot be called “uncorrelated” even if they are independent. Thus, strictly speaking, independence is not “stronger” than uncorrelatedness, because independence does not guarantee that the required moments exist.

  2. Uncorrelatedness $\;\not\!\!\!\implies$ Independence. $\text{Cov}(X, Y) = 0$ provides only a single number summarizing a linear relationship; it does not guarantee the full factorization of probabilities required for independence.
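A standard counterexample makes the second point concrete. The sketch below assumes a specific pair not taken from the notes: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. It uses exact fractions, so the comparison involves no rounding; all variable names are illustrative.

```python
from fractions import Fraction

# Assumed example: X is uniform on {-1, 0, 1} and Y = X**2,
# so Y is completely determined by X.
support = [-1, 0, 1]
p = Fraction(1, 3)  # P(X = x) for each x in the support

e_x = sum(p * x for x in support)            # E[X]
e_y = sum(p * x**2 for x in support)         # E[Y] = E[X^2]
e_xy = sum(p * x * x**2 for x in support)    # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = p        # P(X = 0)
p_y0 = p        # P(Y = 0): Y = 0 happens exactly when X = 0
p_joint = p     # P(X = 0, Y = 0): the same event as X = 0

print("Cov(X, Y)       =", cov_xy)
print("P(X=0, Y=0)     =", p_joint)
print("P(X=0) * P(Y=0) =", p_x0 * p_y0)
```

The covariance is zero because the relationship between $X$ and $Y$ is symmetric rather than linear, yet the joint probability does not factorize, so the variables are dependent.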

Let $X \sim \text{Bin}(n, p)$. We can model $X$ as the sum of $n$ independent trials:

$$X = Y_1 + \dots + Y_n$$

where $Y_i \sim \text{Bernoulli}(p)$ are i.i.d. (independent and identically distributed).

  1. Expectation: Since expectation is always linear:

    $$\mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[Y_i] = n \mathbb{E}[Y_1] = np$$
  2. Variance: Since the $Y_i$ are independent, $\text{Cov}(Y_i, Y_j) = 0$ for $i \ne j$. Thus the variance of the sum is the sum of the variances:

    $$\text{Var}(X) = \sum_{i=1}^n \text{Var}(Y_i)$$

    For a Bernoulli variable, $\text{Var}(Y_i) = p(1-p)$.

    $$\implies \text{Var}(X) = n p(1-p)$$
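Both results can be cross-checked against the binomial probability mass function directly. This is a small sketch using exact fractions; the function name `binomial_pmf` and the values `n = 10`, `p = 3/10` are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Bin(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Illustrative parameters (assumed, not from the notes).
n, p = 10, Fraction(3, 10)
pmf = binomial_pmf(n, p)

# Moments computed from the definition, without using the sum-of-Bernoullis argument.
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2

print("E[X]   from pmf:", mean, " formula np:      ", n * p)
print("Var(X) from pmf:", var, " formula np(1-p): ", n * p * (1 - p))
```

Because the arithmetic is exact, the values computed from the pmf agree with $np$ and $np(1-p)$ exactly rather than approximately.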