For a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and measurable space $(\mathbb{R}, \mathcal{B})$, let $X$ be a random variable, i.e., a measurable map whose preimage $X^{-1}(A)$ lies in $\mathcal{F}$ for every $A \in \mathcal{B}$. Recall that the induced distribution is $\mu(A) = \mathbb{P}(X^{-1}(A))$.
If $X$ and $Y$ induce the same distribution on $(\mathbb{R}, \mathcal{B})$, i.e., $F_X = F_Y$, then we say they have the same distribution, or are equal in distribution, denoted by:

$$X \overset{\text{d}}{=} Y$$
Important
In general, $X(\omega) \neq Y(\omega)$ even when $X \overset{\text{d}}{=} Y$.
Example: Consider $\Omega = \{H, T\}$ with $\mathbb{P}(H) = \mathbb{P}(T) = 0.5$.
Let $X(H) = 1$, $X(T) = 0$ and $Y(H) = 0$, $Y(T) = 1$.
Then $X \overset{\text{d}}{=} Y$ (both are Bernoulli(0.5)), but $X(\omega) \neq Y(\omega)$ for every $\omega$.
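The coin example above can be checked by simulation. This is a minimal sketch (function names and the sample size are illustrative choices): both random variables are evaluated on the same outcomes $\omega$, their empirical means agree, yet they disagree at every single $\omega$.

```python
import random

# Sample space Ω = {H, T} with P(H) = P(T) = 0.5.
def X(omega):          # X(H) = 1, X(T) = 0
    return 1 if omega == "H" else 0

def Y(omega):          # Y(H) = 0, Y(T) = 1
    return 0 if omega == "H" else 1

random.seed(0)
omegas = [random.choice("HT") for _ in range(100_000)]

# Both are Bernoulli(0.5): empirical means agree up to sampling error...
mean_X = sum(X(w) for w in omegas) / len(omegas)
mean_Y = sum(Y(w) for w in omegas) / len(omegas)

# ...yet X(ω) ≠ Y(ω) for every single outcome ω.
assert all(X(w) != Y(w) for w in omegas)
print(mean_X, mean_Y)  # both near 0.5
```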
Definition: Probability Density Function
If there exists a function $f$ such that for all $x \in \mathbb{R}$,

$$\mathbb{P}(X \le x) = F(x) = \int_{-\infty}^x f(y) \, dy,$$

then $f$ is called the probability density function (PDF).
If a distribution has a density function, it is called absolutely continuous .
Note that this is different from just “continuous”, which is defined below.
Properties:
For any $a < b$:

$$\begin{aligned}
\mathbb{P}(X \in (a, b]) &= F(b) - F(a) \\
&= \int_{-\infty}^b f(y) \, dy - \int_{-\infty}^a f(y) \, dy \\
&= \int_a^b f(y) \, dy
\end{aligned}$$
Also, $\mathbb{P}(X = x) \le \mathbb{P}(X \in (x - \epsilon, x + \epsilon]) = \int_{x-\epsilon}^{x+\epsilon} f(y) \, dy$.
Taking the limit as $\epsilon \to 0$, we get:

$$\implies \text{for a r.v. with a density function, } \mathbb{P}(X = x) = 0$$
This also implies $\mathbb{P}(a < X < b) = \mathbb{P}(a \le X \le b) = \int_a^b f(y) \, dy$.
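The identity $\mathbb{P}(a < X \le b) = F(b) - F(a) = \int_a^b f(y)\,dy$ can be verified numerically. The density $f(x) = 2x$ on $[0, 1]$ (so $F(x) = x^2$) is a hypothetical choice made purely for illustration; any valid density works the same way.

```python
# Hypothetical density f(x) = 2x on [0, 1], with CDF F(x) = x^2
# (an illustrative assumption, not from the notes).
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def F(x):
    return min(max(x, 0.0), 1.0) ** 2

a, b = 0.2, 0.7

# P(a < X <= b) = F(b) - F(a)
exact = F(b) - F(a)

# ...equals the integral of f over (a, b], here via a midpoint Riemann sum.
n = 100_000
h = (b - a) / n
numeric = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(exact, numeric)  # 0.45 and ≈ 0.45
```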
Definition: Continuous Distribution
If a distribution has

$$\mathbb{P}(X = x) = F(x) - F(x-) = 0 \quad \text{for all } x,$$

i.e., its distribution function $F$ is continuous, then the distribution is called continuous.
Relationship:
Absolutely continuous $\implies$ continuous.
But continuous $\;\not\!\!\!\implies$ absolutely continuous. (There exist continuous distributions which do not have a density function.)
Example: The Cantor distribution is continuous ($F$ is continuous) but "singular" with respect to Lebesgue measure (its derivative is 0 almost everywhere), so it has no PDF.
A distribution is called singular if there exists a set $A \in \mathcal{B}$ with Lebesgue measure $\lambda(A) = 0$ but $\mathbb{P}(X \in A) = 1$, while $X$ is still continuous.
Look at the Cantor set or some fractals for examples.
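A sketch of sampling from the Cantor distribution: draw each ternary digit uniformly from $\{0, 2\}$. Every sample then lands in the Cantor set, which has Lebesgue measure zero, even though the CDF (the Cantor function) is continuous. The 30-digit truncation is an implementation assumption.

```python
import random

def cantor_sample(rng, digits=30):
    """Draw X = sum d_k / 3^k with each ternary digit d_k uniform on {0, 2}.

    The law of X is the Cantor distribution: its CDF is continuous, but all
    mass sits on the Cantor set (Lebesgue measure zero), so X has no PDF.
    """
    return sum(rng.choice((0, 2)) / 3**k for k in range(1, digits + 1))

rng = random.Random(0)
samples = [cantor_sample(rng) for _ in range(10_000)]

# All samples lie in [0, 1], and none falls in the removed middle third
# (1/3, 2/3), the first "gap" of the Cantor set.
assert all(0.0 <= x <= 1.0 for x in samples)
assert not any(1/3 < x < 2/3 for x in samples)
```

By symmetry the mean of the Cantor distribution is $1/2$, which the empirical average of these samples reproduces.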
This is why a general distribution cannot be assumed to be a mixture of just a part with density and a part with probability mass on discrete points. There can also be “singular” parts.
A general distribution can be decomposed as:
$$\text{General Distr.} = \underbrace{\text{Abs. Continuous Part}}_{\text{with density } f} + \underbrace{\text{Discrete Part}}_{\text{with point masses}} + \underbrace{\text{Singular Part}}_{\text{continuous but no density}}$$
Examples
Uniform distribution on $[0, 1]$: $\text{Unif}[0, 1]$ or $U[0, 1]$.

$$f(x) = 1 \quad \text{for } x \in [0, 1]$$

$$F(x) = \begin{cases}
0 & x < 0 \\
x & x \in [0, 1) \\
1 & x \ge 1
\end{cases}$$
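This CDF can be compared against empirical frequencies. A minimal sketch, assuming Python's `random.random()` as the $\text{Unif}[0, 1)$ sampler:

```python
import random

def F(x):
    # CDF of Unif[0, 1] as in the piecewise formula above
    if x < 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x

random.seed(0)
n = 100_000
u = [random.random() for _ in range(n)]

# The empirical frequency of {X <= x} should track F(x) = x on [0, 1).
for x in (0.1, 0.5, 0.9):
    emp = sum(1 for v in u if v <= x) / n
    print(x, emp)  # emp ≈ x
```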
Exponential distribution:

$$f(x) = \begin{cases}
\lambda e^{-\lambda x} & x \ge 0 \\
0 & \text{otherwise}
\end{cases}$$
Computation of the CDF for $x \ge 0$:

$$F(x) = \int_{-\infty}^x f(y) \, dy = \int_0^x \lambda e^{-\lambda y} \, dy = -e^{-\lambda y} \Big|_0^x = 1 - e^{-\lambda x}$$

(and $F(x) = 0$ for $x < 0$).
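The closed form $F(x) = 1 - e^{-\lambda x}$ can be cross-checked against a direct numerical integration of $f$. The rate $\lambda = 1.5$ and the evaluation point are arbitrary choices for illustration.

```python
import math

lam = 1.5  # rate parameter λ (arbitrary choice for illustration)

def f(y):
    # Exponential density: λ e^{-λy} for y >= 0, else 0
    return lam * math.exp(-lam * y) if y >= 0 else 0.0

def F(x):
    # Closed form derived above: F(x) = 1 - e^{-λx} for x >= 0, else 0
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# Compare F(x) with a midpoint Riemann sum for the integral of f over [0, x].
x = 2.0
n = 100_000
h = x / n
numeric = sum(f((i + 0.5) * h) for i in range(n)) * h
print(F(x), numeric)  # the two values agree to many decimal places
```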
Standard Normal: for $x \in \mathbb{R}$,

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

The distribution function $F(x)$ does not have a closed-form expression in this case.
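Although $F$ has no elementary closed form, it can be expressed through the special function $\operatorname{erf}$, which Python's standard library provides as `math.erf`:

```python
import math

def std_normal_pdf(x):
    # f(x) = e^{-x^2/2} / sqrt(2π)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(x):
    # No elementary closed form, but F(x) = (1 + erf(x / sqrt(2))) / 2,
    # where erf is the (non-elementary) error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(std_normal_cdf(0.0))   # 0.5 by symmetry of the density
print(std_normal_cdf(1.96))  # ≈ 0.975, the familiar two-sided 95% cutoff
```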
Point mass at 0: $\mathbb{P}(X = 0) = 1$.

$$F(x) = \begin{cases}
0 & x < 0 \\
1 & x \ge 0
\end{cases}$$