Probability/Conditional Distributions

Motivation

Suppose there is an earthquake. Let $X$ be the number of casualties and $Y$ be the magnitude of the earthquake on the Richter scale.

(a) Without being given any further information, what is the distribution of $X$?

(b) Given that $Y = 1$ , what is the distribution of $X$ ?

(c) Given that $Y = 9$ , what is the distribution of $X$ ?

Remark. Are your answers to (a), (b) and (c) different?

In (b) and (c), we have the conditional distribution of $X$ given $Y = 1$, and the conditional distribution of $X$ given $Y = 9$, respectively.

In general, we have the conditional distribution of $X$ given $Y$ (before observing the value of $Y$), or of $X$ given $Y = y$ (after observing the value of $Y$).

Conditional distributions

Recall the definition of conditional probability: $ℙ (A | B) = \frac{ℙ (A \cap B)}{ℙ (B)},$ in which $A, B$ are events, with $ℙ (B) > 0$. Applying this definition to discrete random variables $X, Y$, we have $ℙ (X = x | Y = y) = \frac{ℙ (\{X = x\} \cap \{Y = y\})}{ℙ (Y = y)} = \frac{f (x, y)}{f_{Y} (y)},$ where $f (x, y)$ is the joint pmf of $X$ and $Y$, and $f_{Y} (y)$ is the marginal pmf of $Y$.

It is natural to call such a conditional probability the conditional pmf, and we denote it by $f_{X | Y} (x | y)$. This is the definition of the conditional pmf: the conditional pmf of $X$ given $Y = y$ is the conditional probability $ℙ (X = x | Y = y)$. Naturally, we expect the conditional pdf to be defined similarly. This is indeed the case:

Definition. (Conditional probability function) The conditional probability function of $X$ given $Y = y$ is $f_{X | Y} (x | y) = \frac{f (x, y)}{f_{Y} (y)},$ provided that $f_{Y} (y) > 0$, where $f (x, y)$ is the joint probability function of $X$ and $Y$, and $f_{Y} (y)$ is the marginal probability function of $Y$.

To understand the definition more intuitively for the continuous case, consider the following diagram.

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *===============* <--- corresponding interval
        |               |
        |               |
        *---------------*
        |
        *---------------- x

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/############/#\ /                              
   | |y *\===========/===*                               
   | | /  *---------*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:
             
    |
    |
    |               
    *\     
    |#\    
    |##\   
    |###\             
    |####\   <------ Area: f_Y(y)
    |#####*--------*  
    |###############\ 
    *================*-------------- x

*---*
|###| : corresponding cross section from joint pdf
*---*

We can see that when we are conditioning on $Y = y$, we take a "slice" out of the region under the joint pdf. The area of the "whole slice" is the area between the univariate function $f (x, y)$ (the joint pdf with $y$ fixed and $x$ variable) and the $x$-axis. This area is given by $\int_{- \infty}^{\infty} f (x, y) d x = f_{Y} (y)$, while according to the probability axioms, the area under a pdf should equal 1. Hence, we scale the area of the "slice" by a factor of $f_{Y} (y)$, by dividing the univariate function $f (x, y)$ by $f_{Y} (y)$. After that, the curve at the top of the scaled "slice" is the graph of the conditional pdf $\frac{f (x, y)}{f_{Y} (y)}$.
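The "slice and rescale" idea has a direct discrete analogue: fix $y$, take the corresponding entries of the joint pmf, and divide them by $f_{Y} (y)$. The following is a minimal sketch of that computation; the joint pmf values and the helper names `marginal_y` and `conditional_x_given_y` are made up for illustration and do not come from the text.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
# The numbers are illustrative assumptions only.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def marginal_y(joint, y):
    """Marginal pmf f_Y(y): sum the joint pmf over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_x_given_y(joint, y):
    """Conditional pmf f_{X|Y}(. | y) = f(., y) / f_Y(y), as a dict over x."""
    fy = marginal_y(joint, y)
    if fy == 0:
        raise ValueError("f_Y(y) must be positive")
    return {x: p / fy for (x, yy), p in joint.items() if yy == y}

cond = conditional_x_given_y(joint, 1)
print(cond)                # conditional pmf of X given Y = 1
print(sum(cond.values()))  # the rescaled "slice" sums to 1
```

Exact fractions are used here so that the normalisation can be checked without rounding error.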

So far, we have discussed the case where both random variables are discrete or both are continuous. How about the case where one of them is discrete and the other is continuous? In this case, there is no "joint probability function" of the two random variables, since one is discrete and the other is continuous. However, we can still define the conditional probability function in another way.

To motivate the following definition, suppose $X$ is continuous and $Y$ is discrete, and let $F_{X | Y} (x | y)$ be the conditional probability $ℙ (X \leq x | Y = y)$. Then, differentiating $F_{X | Y} (x | y)$ with respect to $x$ should yield the conditional pdf $f_{X | Y} (x | y)$. So, we have

$\begin{aligned} f_{X | Y} (x | y) = \frac{d}{d x} F_{X | Y} (x | y) & = \lim_{h \to 0} \frac{ℙ (X \leq x + h | Y = y) - ℙ (X \leq x | Y = y)}{h} \\ & = \lim_{h \to 0} \frac{ℙ (x < X \leq x + h | Y = y)}{h} \\ & = \lim_{h \to 0} \frac{ℙ (Y = y | x < X \leq x + h) ℙ (x < X \leq x + h)}{h ℙ (Y = y)} \\ & = \lim_{h \to 0} \frac{ℙ (Y = y | x < X \leq x + h)}{ℙ (Y = y)} \lim_{h \to 0} \frac{ℙ (x < X \leq x + h)}{h} \\ & = \frac{ℙ (Y = y | X = x) \frac{d}{d x} F_{X} (x)}{ℙ (Y = y)} \\ & = \frac{ℙ (Y = y | X = x) f_{X} (x)}{ℙ (Y = y)} . \end{aligned}$

Thus, it is natural to have the following definition.

Definition. (Conditional pdf of a continuous random variable given a discrete one) Let $X$ be continuous and $Y$ be discrete. The conditional pdf of $X$ given $Y = y$ is $f_{X | Y} (x | y) = \frac{ℙ (Y = y | X = x) f_{X} (x)}{ℙ (Y = y)},$ provided that $ℙ (Y = y) > 0$.

Now, how about the case where $X$ is discrete and $Y$ is continuous? Let us use the above definition as motivation. We interchange $X$ and $Y$ so that the assumptions are still satisfied. Then, we get $f_{Y | X} (y | x) = \frac{ℙ (X = x | Y = y) f_{Y} (y)}{ℙ (X = x)} .$ Here $X$ is discrete, so it is natural to define the conditional pmf of $X$ given $Y = y$ as the quantity $ℙ (X = x | Y = y)$ in this expression. After rearranging the terms, we get $ℙ (X = x | Y = y) = \frac{f_{Y | X} (y | x) ℙ (X = x)}{f_{Y} (y)} .$ Thus, we have the following definition.
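Definition. (Conditional pmf of a discrete random variable given a continuous one) Let $X$ be discrete and $Y$ be continuous. The conditional pmf of $X$ given $Y = y$ is $f_{X | Y} (x | y) = \frac{f_{Y | X} (y | x) ℙ (X = x)}{f_{Y} (y)},$ provided that $f_{Y} (y) > 0$.

Based on the definitions of conditional probability functions, it is natural to define the conditional cdf as follows.

Definition. (Conditional cdf) The conditional cdf of $X$ given $Y = y$ is $F_{X | Y} (x | y) = ℙ (X \leq x | Y = y) = \begin{cases} \sum_{u \leq x} f_{X | Y} (u | y), & X \text{ is discrete}; \\ \int_{- \infty}^{x} f_{X | Y} (u | y) d u, & X \text{ is continuous} . \end{cases}$

Graphical illustration of the definition (continuous random variables):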

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *=========@=====* <--- corresponding interval
        |         x     |
        |               |
        *---------------*
        |
        *---------------- 

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/#########   / \ /                              
   | |y *\========@==/===*                               
   | | /  *-------x-*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:

    |
    |
    |
    *\      
    |#\    
    |##\              
    |###\             
    |####\   <------------- Area: f_Y(y)         
    |#####*--------*  
    |###########    \ 
    *==========@=====*--------------  
               x
*---*
|###| : the desired region from the cross section from joint pdf, whose area is the probability from the cdf
*---*
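For the mixed case (discrete $X$, continuous $Y$), the formula $ℙ (X = x | Y = y) = f_{Y | X} (y | x) ℙ (X = x) / f_{Y} (y)$ can be evaluated numerically once a model is chosen. The sketch below assumes, purely for illustration, that $X$ takes the values 0 and 1 with probabilities 0.7 and 0.3, and that $Y$ given $X = x$ is normal with mean 0 or 2 and unit variance; none of these choices come from the text.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Assumed model (illustrative only):
# P(X = 0) = 0.7, P(X = 1) = 0.3, and Y | X = x ~ N(means[x], 1).
prior = {0: 0.7, 1: 0.3}
means = {0: 0.0, 1: 2.0}

def conditional_pmf_x_given_y(y):
    """P(X = x | Y = y) = f_{Y|X}(y | x) P(X = x) / f_Y(y)."""
    weighted = {x: normal_pdf(y, means[x], 1.0) * prior[x] for x in prior}
    # f_Y(y) is obtained by summing f_{Y|X}(y | x) P(X = x) over x.
    f_y = sum(weighted.values())
    return {x: w / f_y for x, w in weighted.items()}

print(conditional_pmf_x_given_y(1.0))
print(conditional_pmf_x_given_y(3.0))
```

At $y = 1$ the two conditional densities coincide by symmetry, so the conditional pmf equals the unconditional one; larger values of $y$ shift probability towards $X = 1$.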

If $Y = 𝟏 \{A\}$ for some event $A$, we have some special notations for simplicity:

the conditional probability function of $X$ given $Y = y$ becomes

$f_{X | Y} (x | y) = \begin{cases} f (x | A), & y = 1; \\ f (x | A^{c}), & y = 0 . \end{cases}$

the conditional cdf of $X$ given $Y = y$ becomes

$F_{X | Y} (x | y) = ℙ (X \leq x | Y = y) = \begin{cases} F (x | A), & y = 1; \\ F (x | A^{c}), & y = 0 . \end{cases}$

Proposition. If random variables $X$ and $Y$ are independent, then $f_{X | Y} (x | y) = f_{X} (x)$ and $f_{Y | X} (y | x) = f_{Y} (y)$ for each $x, y$ at which the conditional probability functions are defined.

Proof. Recall the definition of independence between two random variables: $X, Y$ are independent if $f (x, y) = f_{X} (x) f_{Y} (y)$ for each $x, y$.

Since $f_{X | Y} (x | y) = \frac{\overbrace{f (x, y)}^{f_{X} (x) f_{Y} (y)}}{f_{Y} (y)} = f_{X} (x)$ and $f_{Y | X} (y | x) = \frac{\overbrace{f (y, x)}^{f_{Y} (y) f_{X} (x)}}{f_{X} (x)} = f_{Y} (y)$ for each $x, y$, we have the desired result.

$◻$

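We can extend the definition of conditional probability function and cdf to groups of random variables, using joint cdf's and joint probability functions, as follows:

Definition. Let $𝐗 = (X_{1}, \dots, X_{r})^{T}$ and $𝐘 = (Y_{1}, \dots, Y_{s})^{T}$ be random vectors. The conditional probability function of $𝐗$ given $𝐘 = (y_{1}, \dots, y_{s})^{T}$ is $f_{𝐗 | 𝐘} (x_{1}, \dots, x_{r} | y_{1}, \dots, y_{s}) = \frac{f (x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s})}{f_{𝐘} (y_{1}, \dots, y_{s})},$ provided that $f_{𝐘} (y_{1}, \dots, y_{s}) > 0$, and the conditional cdf is $F_{𝐗 | 𝐘} (x_{1}, \dots, x_{r} | y_{1}, \dots, y_{s}) = ℙ (X_{1} \leq x_{1}, \dots, X_{r} \leq x_{r} | Y_{1} = y_{1}, \dots, Y_{s} = y_{s})$.

Then, we also have a similar proposition for determining independence of two random vectors.

Proposition. If random vectors $𝐗$ and $𝐘$ are independent, then $f_{𝐗 | 𝐘} (x_{1}, \dots, x_{r} | y_{1}, \dots, y_{s}) = f_{𝐗} (x_{1}, \dots, x_{r})$ and $f_{𝐘 | 𝐗} (y_{1}, \dots, y_{s} | x_{1}, \dots, x_{r}) = f_{𝐘} (y_{1}, \dots, y_{s})$ for each $x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s}$ at which the conditional probability functions are defined.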

Proof. The definition of independence between two random vectors is: $𝐗 = (X_{1}, \dots, X_{r})^{T}, 𝐘 = (Y_{1}, \dots, Y_{s})^{T}$ are independent if $f (x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s}) = f_{𝐗} (x_{1}, \dots, x_{r}) f_{𝐘} (y_{1}, \dots, y_{s})$ for each $x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s}$.

Since $f_{𝐗 | 𝐘} (x_{1}, \dots, x_{r} | y_{1}, \dots, y_{s}) = \frac{\overbrace{f (x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s})}^{f_{𝐗} (x_{1}, \dots, x_{r}) f_{𝐘} (y_{1}, \dots, y_{s})}}{f_{𝐘} (y_{1}, \dots, y_{s})} = f_{𝐗} (x_{1}, \dots, x_{r})$ and $f_{𝐘 | 𝐗} (y_{1}, \dots, y_{s} | x_{1}, \dots, x_{r}) = \frac{\overbrace{f (y_{1}, \dots, y_{s}, x_{1}, \dots, x_{r})}^{f_{𝐘} (y_{1}, \dots, y_{s}) f_{𝐗} (x_{1}, \dots, x_{r})}}{f_{𝐗} (x_{1}, \dots, x_{r})} = f_{𝐘} (y_{1}, \dots, y_{s})$ for each $x_{1}, \dots, x_{r}, y_{1}, \dots, y_{s}$, we have the desired result.

$◻$

Conditional distributions of bivariate normal distribution

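Recall from the Important Distributions chapter that the joint pdf of $𝒩_{2} (μ, Σ)$ is $f (x, y) = \frac{1}{2 π σ_{X} σ_{Y} \sqrt{1 - ρ^{2}}} \exp \left(- \frac{1}{2 (1 - ρ^{2})} \left(\left(\frac{x - μ_{X}}{σ_{X}}\right)^{2} - 2 ρ \left(\frac{x - μ_{X}}{σ_{X}}\right) \left(\frac{y - μ_{Y}}{σ_{Y}}\right) + \left(\frac{y - μ_{Y}}{σ_{Y}}\right)^{2}\right)\right), \quad (x, y) \in ℝ^{2},$ in which $ρ = ρ (X, Y)$ and $σ_{X}, σ_{Y}$ are positive. In this case, $X \sim 𝒩 (μ_{X}, σ_{X}^{2})$ and $Y \sim 𝒩 (μ_{Y}, σ_{Y}^{2})$.

Proposition. (Conditional distributions of bivariate normal distribution) Let $(X, Y)^{T} \sim 𝒩_{2} (μ, Σ)$. Then $X | (Y = y) \sim 𝒩 (μ_{X} + ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y}), σ_{X}^{2} (1 - ρ^{2}))$ and $Y | (X = x) \sim 𝒩 (μ_{Y} + ρ \cdot \frac{σ_{Y}}{σ_{X}} (x - μ_{X}), σ_{Y}^{2} (1 - ρ^{2}))$.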

Proof.

First, the conditional pdf is

$\begin{aligned} f_{X | Y} (x | y) & \overset{\text{def}}{=} \frac{f (x, y)}{f_{Y} (y)} \\ & = \frac{\frac{1}{2 π σ_{X} σ_{Y} \sqrt{1 - ρ^{2}}} \exp \left(- \frac{1}{2 (1 - ρ^{2})} \left(\left(\frac{x - μ_{X}}{σ_{X}}\right)^{2} - 2 ρ \left(\frac{x - μ_{X}}{σ_{X}}\right) \left(\frac{y - μ_{Y}}{σ_{Y}}\right) + \left(\frac{y - μ_{Y}}{σ_{Y}}\right)^{2}\right)\right)}{\frac{1}{\sqrt{2 π σ_{Y}^{2}}} \exp \left(- \frac{(y - μ_{Y})^{2}}{2 σ_{Y}^{2}}\right)} \\ & = \frac{1}{\sqrt{2 π σ_{X}^{2} (1 - ρ^{2})}} \exp \left(- \frac{1}{2 (1 - ρ^{2})} \left(\left(\frac{x - μ_{X}}{σ_{X}}\right)^{2} - 2 ρ \left(\frac{x - μ_{X}}{σ_{X}}\right) \left(\frac{y - μ_{Y}}{σ_{Y}}\right) + \left(\frac{y - μ_{Y}}{σ_{Y}}\right)^{2}\right) + \frac{(y - μ_{Y})^{2}}{2 σ_{Y}^{2}}\right) \\ & = \frac{1}{\sqrt{2 π σ_{X}^{2} (1 - ρ^{2})}} \exp \left(- \frac{1}{2 (1 - ρ^{2})} \left(\left(\frac{x - μ_{X}}{σ_{X}}\right)^{2} - 2 ρ \left(\frac{x - μ_{X}}{σ_{X}}\right) \left(\frac{y - μ_{Y}}{σ_{Y}}\right) + \left(\frac{y - μ_{Y}}{σ_{Y}}\right)^{2} - (1 - ρ^{2}) \left(\frac{y - μ_{Y}}{σ_{Y}}\right)^{2}\right)\right) \\ & = \frac{1}{\sqrt{2 π σ_{X}^{2} (1 - ρ^{2})}} \exp \left(- \frac{1}{2 σ_{X}^{2} (1 - ρ^{2})} \left((x - μ_{X})^{2} - 2 ρ \cdot \frac{σ_{X}}{σ_{Y}} (x - μ_{X}) (y - μ_{Y}) + \left(ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y})\right)^{2}\right)\right) \\ & = \frac{1}{\sqrt{2 π σ_{X}^{2} (1 - ρ^{2})}} \exp \left(- \frac{1}{2 σ_{X}^{2} (1 - ρ^{2})} \left((x - μ_{X}) - ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y})\right)^{2}\right) \\ & = \frac{1}{\sqrt{2 π σ_{X}^{2} (1 - ρ^{2})}} \exp \left(- \frac{1}{2 σ_{X}^{2} (1 - ρ^{2})} \left(x - μ_{X} - ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y})\right)^{2}\right) . \end{aligned}$

Then, we can see that $X | (Y = y) \sim 𝒩 (μ_{X} + ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y}), σ_{X}^{2} (1 - ρ^{2}))$ ,
and by symmetry (interchanging $X$ and $Y$ , and also interchanging $x$ and $y$ ), $Y | (X = x) \sim 𝒩 (μ_{Y} + ρ \cdot \frac{σ_{Y}}{σ_{X}} (x - μ_{X}), σ_{Y}^{2} (1 - ρ^{2}))$ .

$◻$
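The result of this proof can be checked numerically: the ratio $f (x, y) / f_{Y} (y)$ should agree with the normal pdf having mean $μ_{X} + ρ \cdot \frac{σ_{X}}{σ_{Y}} (y - μ_{Y})$ and variance $σ_{X}^{2} (1 - ρ^{2})$. The sketch below does this with the standard library only; the parameter values and function names are arbitrary choices made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bivariate_normal_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Joint pdf of the bivariate normal distribution, as recalled above."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = zx * zx - 2.0 * rho * zx * zy + zy * zy
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / (2.0 * (1.0 - rho * rho))) / norm

def conditional_params(y, mu_x, mu_y, sd_x, sd_y, rho):
    """Mean and variance of X | (Y = y) according to the proposition."""
    mean = mu_x + rho * sd_x / sd_y * (y - mu_y)
    var = sd_x ** 2 * (1.0 - rho ** 2)
    return mean, var

# Arbitrary illustrative parameters (assumptions, not from the text).
params = dict(mu_x=1.0, mu_y=-2.0, sd_x=2.0, sd_y=0.5, rho=0.6)
y = -1.5
mean, var = conditional_params(y, **params)

for x in (-1.0, 0.0, 2.5):
    ratio = bivariate_normal_pdf(x, y, **params) / normal_pdf(
        y, params["mu_y"], params["sd_y"] ** 2)
    direct = normal_pdf(x, mean, var)
    print(x, ratio, direct)  # the last two columns should agree
```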

Conditional version of concepts

We can obtain conditional versions of concepts previously established for 'unconditional' distributions analogously for conditional distributions, by replacing the 'unconditional' cdf, pdf or pmf, i.e. $F (\cdot)$ or $f (\cdot)$, with their conditional counterparts, i.e. $F (\cdot | \cdot)$ or $f (\cdot | \cdot)$.

Conditional independence

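Definition. (Conditional independence) Random variables $X_{1}, \dots, X_{n}$ are conditionally independent given $Y$ if $f_{X_{1}, \dots, X_{n} | Y} (x_{1}, \dots, x_{n} | y) = f_{X_{1} | Y} (x_{1} | y) \cdots f_{X_{n} | Y} (x_{n} | y)$ for each $x_{1}, \dots, x_{n}$ and each $y$ at which the conditional probability functions are defined.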

Conditional expectation

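Definition. (Conditional expectation) The conditional expectation of $X$ given $Y = y$ is $𝔼 [X | Y = y] = \begin{cases} \sum_{x} x f_{X | Y} (x | y), & X \text{ is discrete}; \\ \int_{- \infty}^{\infty} x f_{X | Y} (x | y) d x, & X \text{ is continuous} . \end{cases}$ Writing $h (y) = 𝔼 [X | Y = y]$, the random variable $h (Y)$ is denoted by $𝔼 [X | Y]$.

Similarly, we have a conditional version of the law of the unconscious statistician.

Proposition. (Conditional law of the unconscious statistician) $𝔼 [g (X) | Y = y] = \begin{cases} \sum_{x} g (x) f_{X | Y} (x | y), & X \text{ is discrete}; \\ \int_{- \infty}^{\infty} g (x) f_{X | Y} (x | y) d x, & X \text{ is continuous} . \end{cases}$

Proposition. If $X$ and $Y$ are independent, then $𝔼 [g (X) | Y] = 𝔼 [g (X)]$.

Proof. Since $X$ and $Y$ are independent, $f_{X | Y} (x | y) = f_{X} (x)$. Hence, $𝔼 [g (X) | Y] = \begin{cases} \sum_{x} g (x) f_{X | Y} (x | Y) = \sum_{x} g (x) f_{X} (x) = 𝔼 [g (X)], & X \text{ is discrete}; \\ \int_{- \infty}^{\infty} g (x) f_{X | Y} (x | Y) d x = \int_{- \infty}^{\infty} g (x) f_{X} (x) d x = 𝔼 [g (X)], & X \text{ is continuous} . \end{cases}$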

$◻$

The properties of $𝔼 [\cdot]$ still hold for conditional expectations $𝔼 [\cdot | Y]$, with each 'unconditional' expectation replaced by the corresponding conditional expectation and some suitable modifications. For instance, linearity becomes $𝔼 [a X + b Z | Y] = a 𝔼 [X | Y] + b 𝔼 [Z | Y]$ for constants $a, b$.

Proof. The proof is similar to the one for 'unconditional' expectations.

$◻$

The following theorem about conditional expectation is quite important.

Theorem. (Law of total expectation) $𝔼 [g (X)] = 𝔼 [𝔼 [g (X) | Y]]$.

Proof. $𝔼 [𝔼 [g (X) | Y]] = \begin{cases} \sum_{y} 𝔼 [g (X) | Y = y] f_{Y} (y) = \sum_{x} \left(\sum_{y} g (x) \overbrace{f_{X | Y} (x | y)}^{f (x, y) / f_{Y} (y)} f_{Y} (y)\right) = \sum_{x} g (x) \overbrace{\sum_{y} f (x, y)}^{f_{X} (x)} = 𝔼 [g (X)], & X, Y \text{ are discrete}; \\ \int_{- \infty}^{\infty} 𝔼 [g (X) | Y = y] f_{Y} (y) d y = \int_{- \infty}^{\infty} \left(\int_{- \infty}^{\infty} g (x) \underbrace{f_{X | Y} (x | y)}_{f (x, y) / f_{Y} (y)} d x\right) f_{Y} (y) d y = \int_{- \infty}^{\infty} g (x) \underbrace{\int_{- \infty}^{\infty} f (x, y) d y}_{f_{X} (x)} d x = 𝔼 [g (X)], & X, Y \text{ are continuous} . \end{cases}$

$◻$
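The discrete branch of this proof can be traced on a small example: compute $𝔼 [g (X) | Y = y]$ for each $y$, weight by $f_{Y} (y)$, and compare with $𝔼 [g (X)]$ computed directly. The joint pmf and the choice $g (x) = x^{2}$ below are illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {
    (1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}

def g(x):
    return x * x

def expectation(joint, g):
    """E[g(X)] computed directly from the joint pmf."""
    return sum(g(x) * p for (x, _), p in joint.items())

def marginal_y(joint, y):
    """Marginal pmf f_Y(y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_expectation(joint, g, y):
    """E[g(X) | Y = y] = sum_x g(x) f(x, y) / f_Y(y)."""
    fy = marginal_y(joint, y)
    return sum(g(x) * p / fy for (x, yy), p in joint.items() if yy == y)

def iterated_expectation(joint, g):
    """E[E[g(X) | Y]] = sum_y E[g(X) | Y = y] f_Y(y)."""
    ys = {y for (_, y) in joint}
    return sum(conditional_expectation(joint, g, y) * marginal_y(joint, y) for y in ys)

print(conditional_expectation(joint, g, 0), conditional_expectation(joint, g, 1))
print(expectation(joint, g), iterated_expectation(joint, g))
```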

Corollary. For each event $A$, $ℙ (A) = 𝔼 [ℙ (A | Y)]$.

Proof.

First,

$𝔼 [𝟏 \{A\} | Y] = 1 \cdot ℙ (𝟏 \{A\} = 1 | Y) + 0 \cdot ℙ (𝟏 \{A\} = 0 | Y) = ℙ (A | Y) .$

Then, using the law of total expectation,

$𝔼_{Y} [ℙ (A | Y)] \overset{\text{above}}{=} 𝔼_{Y} [𝔼 [𝟏 \{A\} | Y]] = 𝔼 [𝟏 \{A\}] = ℙ (A) .$

$◻$

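Corollary. (Expectation computed by weighted average of conditional expectations) Suppose the events $A_{1}, A_{2}, \dots$ are mutually exclusive and exhaustive, with $ℙ (A_{i}) > 0$ for each $i$. Then $𝔼 [X] = \sum_{i = 1}^{\infty} 𝔼 [X | A_{i}] ℙ (A_{i})$.

Proof. Define $Y = i$ if $A_{i}$ occurs, in which $i$ is a positive integer. Then, $𝔼 [X] = 𝔼_{Y} [𝔼_{X} [X | Y]] = \sum_{i = 1}^{\infty} 𝔼_{X} [X | Y = i] ℙ (Y = i) = \sum_{i = 1}^{\infty} 𝔼 [X | A_{i}] ℙ (A_{i}) .$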

$◻$

Corollary. If $ℙ (A) > 0$, then $𝔼 [X | A] = \frac{𝔼 [X 𝟏 \{A\}]}{ℙ (A)}$.

Proof. By the formula of expectation computed by weighted average of conditional expectations, $𝔼 [X 𝟏 \{A\}] = 𝔼 [X \underbrace{𝟏 \{A\}}_{1} | A] ℙ (A) + 𝔼 [X \underbrace{𝟏 \{A\}}_{0} | A^{c}] ℙ (A^{c}) = 𝔼 [X | A] ℙ (A),$ and the result follows if $ℙ (A) > 0$.

$◻$
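As a small worked instance of the last two corollaries, take $X$ to be the outcome of a fair six-sided die and $A$ the event that the outcome is even (both are illustrative choices, not from the text). The sketch computes $𝔼 [X | A]$ as $𝔼 [X 𝟏 \{A\}] / ℙ (A)$ and then recovers $𝔼 [X]$ as the weighted average over $A$ and $A^{c}$.

```python
from fractions import Fraction

# Fair six-sided die: X is the face value, each with probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def in_a(x):
    """Indicator of the illustrative event A = {X is even}."""
    return x % 2 == 0

p_a = sum(p for x, p in pmf.items() if in_a(x))          # P(A)
e_x_ind = sum(x * p for x, p in pmf.items() if in_a(x))  # E[X 1{A}]
e_x_given_a = e_x_ind / p_a                              # E[X | A]

p_ac = 1 - p_a                                           # P(A^c)
e_x_given_ac = sum(x * p for x, p in pmf.items() if not in_a(x)) / p_ac

e_x = sum(x * p for x, p in pmf.items())                 # E[X]
weighted = e_x_given_a * p_a + e_x_given_ac * p_ac       # weighted average

print(e_x_given_a, e_x_given_ac, e_x, weighted)
```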

After defining conditional expectation, we can also define conditional variance, covariance and correlation coefficient, since variance, covariance and correlation coefficient are built upon expectation.

Conditional expectations of bivariate normal distribution

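Proposition. (Conditional expectations of bivariate normal distribution) Let $(X, Y)^{T} \sim 𝒩_{2} (μ, Σ)$. Then $𝔼 [X | Y] = μ_{X} + ρ \cdot \frac{σ_{X}}{σ_{Y}} (Y - μ_{Y})$ and $𝔼 [Y | X] = μ_{Y} + ρ \cdot \frac{σ_{Y}}{σ_{X}} (X - μ_{X})$.

Proof.

The result follows readily from the proposition about conditional distributions of bivariate normal distribution.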

$◻$

Conditional variance

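Definition. (Conditional variance) The conditional variance of $X$ given $Y$ is $Var (X | Y) = 𝔼 [(X - 𝔼 [X | Y])^{2} | Y]$.

Similarly, conditional variance has properties analogous to those of variance.

Proposition. (Properties of conditional variance) For constants $a, b$, we have $Var (X | Y) = 𝔼 [X^{2} | Y] - (𝔼 [X | Y])^{2}$ and $Var (a X + b | Y) = a^{2} Var (X | Y)$.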

Proof. The proof is similar to the one for properties of variance.

$◻$

Besides the law of total expectation, we also have the law of total variance, as follows:

Proposition. (Law of total variance) $Var (X) = 𝔼 [Var (X | Y)] + Var (𝔼 [X | Y])$.

Proof. $\begin{aligned} 𝔼 [Var (X | Y)] + Var (𝔼 [X | Y]) & = 𝔼 [𝔼 [X^{2} | Y] - (𝔼 [X | Y])^{2}] + 𝔼 [(𝔼 [X | Y])^{2}] - (𝔼 [𝔼 [X | Y]])^{2} \\ & = 𝔼 [𝔼 [X^{2} | Y]] - 𝔼 [(𝔼 [X | Y])^{2}] + 𝔼 [(𝔼 [X | Y])^{2}] - (𝔼 [𝔼 [X | Y]])^{2} \\ & = 𝔼 [X^{2}] - (𝔼 [X])^{2} \quad \text{(by the law of total expectation)} \\ & = Var (X) . \end{aligned}$

$◻$
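The decomposition can be verified on a small discrete example by computing $𝔼 [Var (X | Y)]$ and $Var (𝔼 [X | Y])$ separately and comparing their sum with $Var (X)$. The joint pmf below is a made-up illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {
    (1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}

def marginal_y(y):
    """Marginal pmf f_Y(y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_moment(y, k):
    """E[X^k | Y = y], computed from the conditional pmf f(x, y) / f_Y(y)."""
    return sum(x ** k * p for (x, yy), p in joint.items() if yy == y) / marginal_y(y)

ys = sorted({y for (_, y) in joint})

# E[Var(X | Y)]: average the conditional variances over the distribution of Y.
expected_cond_var = sum(
    (cond_moment(y, 2) - cond_moment(y, 1) ** 2) * marginal_y(y) for y in ys)

# Var(E[X | Y]): variance of the random variable y -> E[X | Y = y].
mean_of_cond_mean = sum(cond_moment(y, 1) * marginal_y(y) for y in ys)
var_of_cond_mean = sum(
    cond_moment(y, 1) ** 2 * marginal_y(y) for y in ys) - mean_of_cond_mean ** 2

# Var(X) computed directly from the joint pmf.
e_x = sum(x * p for (x, _), p in joint.items())
e_x2 = sum(x * x * p for (x, _), p in joint.items())
var_x = e_x2 - e_x ** 2

print(expected_cond_var, var_of_cond_mean, var_x)
```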


Conditional variances of bivariate normal distribution

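Proposition. (Conditional variances of bivariate normal distribution) Let $(X, Y)^{T} \sim 𝒩_{2} (μ, Σ)$. Then $Var (X | Y) = σ_{X}^{2} (1 - ρ^{2})$ and $Var (Y | X) = σ_{Y}^{2} (1 - ρ^{2})$.

Proof.

The result follows readily from the proposition about conditional distributions of bivariate normal distribution.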

$◻$
