Probability/Conditional Probability


Motivation

In the previous chapter, we have actually only dealt with unconditional probabilities. To be more precise, in the problems encountered in the previous chapter, the sample space is defined initially, and all probabilities are assigned with respect to that initial sample space. However, in many situations, after defining the initial sample space for a random experiment, we may get some new information about the random experiment. Hence, we may need to update the sample space based on that information. The probability based on this updated sample space is known as a conditional probability.

To illustrate how we get the new information and update the sample space correspondingly, consider the following example: Template:Colored example From the example above, we are able to calculate the (conditional) probability "reasonably" through some arguments (see (b)) when the sample points in the initial sample space are equally likely. Furthermore, we can notice that the condition should be the occurrence of an event, which involves the sample points in the sample space. When the condition does not involve the sample points at all, it is irrelevant to the random experiment. For example, if the condition is "the poker deck costs $10", then this is clearly not an event in the sample space and does not involve the sample points. It is thus irrelevant to this experiment.

To motivate the definition of conditional probability, let us consider more precisely how we obtain the (conditional) probability in (b). In (b), we are given that an ace has been drawn out from the poker deck beforehand. This means that this ace can never appear in our draw. This corresponds to the occurrence of the event (with respect to the original sample space) {not drawing that ace from the poker deck} (denoted by B), which consists of 51 sample points, resulting from excluding that ace from the original 52 sample points. Thus, we can regard the condition as the occurrence of event B. Now, under this condition, the sample space is updated to be the set B; that is, only points in B are regarded as valid sample points now.

Consider part (b) again. Let us denote by A the event {an ace is drawn from the deck} (with respect to the original sample space).

Now, only some of the points in the set A (those which also lie in the set B) are regarded as valid sample points. All other points in set A are no longer valid sample points under this condition. In other words, only the points in both sets A and B (i.e., in the set <math>A\cap B</math>) are valid sample points of event A under this condition.

In part (b) above, only the three aces remaining in the deck (in both sets A and B, and hence in the set <math>A\cap B</math>) are considered to be valid sample points. The other ace in set A (the ace drawn out in the condition) is not considered to be a valid sample point, since that ace is not in the deck at all!

To summarize, when we want to calculate the conditional probability of event A given the occurrence of event B, we do the following:

  1. We update the sample space to the set B.
  2. We only regard the sample points in the set <math>A\cap B</math> to be the (valid) sample points of event A.

In the above example, we encounter a special case where the sample points in the initial sample space (assumed to be finite) are equally likely (and hence the sample points in the updated sample space B are also equally likely). In this case, using the result about combinatorial probability (in the previous chapter), the conditional probability, denoted by <math>\mathbb P(A\mid B)</math>, is given by <math>\mathbb P(A\mid B)=\frac{|A\cap B|}{|B|}=\frac{\text{number of (valid) sample points in }A}{\text{number of sample points in the updated sample space }B}.</math> (Notice that "<math>\mathbb P(A\mid B)</math>" is just a notation: <math>\mathbb P(\cdot\mid B)</math> is a function, and A is the "input". Merely "<math>A\mid B</math>" means nothing. In particular, "<math>A\mid B</math>" is not an event/set.)
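As a quick sanity check of this counting formula, here is a short Python sketch of the card-drawing setup (the encoding of the deck and the choice of the ace of spades as the removed card are illustrative assumptions):

```python
from fractions import Fraction

# Sample space: a standard 52-card deck, encoded as (rank, suit) pairs;
# all sample points are equally likely.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
omega = {(r, s) for r in ranks for s in suits}

# Condition B: one particular ace (say the ace of spades) has already been
# drawn out, so it cannot appear in our draw; 51 sample points remain.
B = omega - {("A", "spades")}
# Event A: an ace is drawn (with respect to the original sample space).
A = {(r, s) for (r, s) in omega if r == "A"}

# P(A | B) = |A ∩ B| / |B| for equally likely sample points.
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)  # 1/17  (i.e., 3/51)
```

Only the three aces still in the deck lie in <math>A\cap B</math>, matching the counting argument above.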

When the sample points are not equally likely, we can apply a theorem from the previous chapter for constructing a probability measure on the updated sample space B. (Here, we assume that B is countable.) In particular, since we are only regarding the sample points in the set <math>A\cap B</math> as the (valid) sample points of event A, it suggests that the (naive) "conditional probability" of A given the occurrence of event B should be given by <math>\mathbb P(A\cap B)=\sum_{\omega\in A\cap B}\mathbb P(\{\omega\})</math> according to that theorem (where <math>\mathbb P</math> is the probability measure in the original probability space <math>(\Omega,\mathcal F,\mathbb P)</math>).

However, when we apply the original probability measure (in the original probability space) to every singleton event in the new sample space B, we face an issue: the sum of those probabilities is just <math>\sum_{\omega\in B}\mathbb P(\{\omega\})=\mathbb P(B)</math>, which is not 1 in general! But that theorem requires this sum to be 1! A natural remedy to this problem is to define a new probability measure <math>\mathbb P(\cdot\mid B):\mathcal F\to[0,1]</math>, based on the original probability measure and the above (naive) "conditional probability" <math>\mathbb P(A\cap B)</math>, such that the sum is 1. After noticing that <math>\frac{\mathbb P(B)}{\mathbb P(B)}=1</math>, a natural choice of such a probability measure is given by <math>\mathbb P(A\mid B)=\frac{\mathbb P(A\cap B)}{\mathbb P(B)}</math>, for every event <math>A\in\mathcal F</math>. The probability <math>\mathbb P(B)</math> can be interpreted as the normalizing constant, and every (naive) "conditional probability" (as suggested previously) is scaled by a factor of <math>\frac{1}{\mathbb P(B)}</math>. (It turns out that the probability measure <math>\mathbb P(\cdot\mid B):\mathcal F\to[0,1]</math> defined in this way also satisfies all the probability axioms. We will prove this later.)
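The rescaling step can be illustrated with sample points that are not equally likely. The loaded die below is an illustrative assumption; the point is that summing the original masses over <math>A\cap B</math> and dividing by the normalizing constant <math>\mathbb P(B)</math> yields a measure with total mass 1 on B:

```python
from fractions import Fraction

# A loaded die (illustrative assumption): sample points not equally likely.
p = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
     4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}
assert sum(p.values()) == 1

B = {2, 4, 6}  # condition: the outcome is even
A = {4, 5, 6}  # event: the outcome is at least 4

# Naive "conditional probability": sum the original masses over A ∩ B ...
naive = sum(p[w] for w in A & B)
# ... then rescale by the normalizing constant P(B).
p_B = sum(p[w] for w in B)
p_A_given_B = naive / p_B

print(p_A_given_B)                    # (1/6 + 1/4) / (1/2) = 5/6
assert sum(p[w] for w in B) / p_B == 1  # rescaled measure sums to 1 on B
```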

When comparing this formula with the formula for the equally likely sample points case, the two formulas actually look quite similar. In fact, we can express the formula for the equally likely sample points case in the same form as this formula (since the equally likely case is actually just a special case of the theorem we are considering): <math>\frac{|A\cap B|}{|B|}=\frac{|A\cap B|/|\Omega|}{|B|/|\Omega|}=\frac{\mathbb P(A\cap B)}{\mathbb P(B)}.</math>

So, we have now developed a reasonable and natural formula to calculate the conditional probability <math>\mathbb P(A\mid B)</math> when the outcomes are equally likely (applicable to a finite sample space), and when the outcomes are not equally likely (for a countable sample space). It is thus natural to also use the same formula when the sample space is uncountable. This motivates the following definition of conditional probability:

Definition

Template:Colored definition Template:Colored remark Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored remark Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored remark The following is a generalization of the formula <math>\mathbb P(A\cap B)=\mathbb P(A\mid B)\mathbb P(B)</math>. It is useful when we calculate the probability of multiple events occurring together, by considering the events one by one. Template:Colored proposition

Proof. With the assumptions, we have by definition <math>\mathbb P(E_1)\mathbb P(E_2\mid E_1)\mathbb P(E_3\mid E_1\cap E_2)\cdots\mathbb P(E_n\mid E_1\cap\cdots\cap E_{n-1})\overset{\text{def}}{=}\mathbb P(E_1)\cdot\frac{\mathbb P(E_2\cap E_1)}{\mathbb P(E_1)}\cdot\frac{\mathbb P(E_3\cap E_1\cap E_2)}{\mathbb P(E_1\cap E_2)}\cdots\frac{\mathbb P(E_n\cap E_1\cap\cdots\cap E_{n-1})}{\mathbb P(E_1\cap\cdots\cap E_{n-1})}=\mathbb P(E_1\cap E_2\cap\cdots\cap E_n),</math> since the product telescopes.
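As a quick numerical check of this multiplication rule, the following Python sketch computes the probability of drawing three aces in a row from a 52-card deck without replacement (an illustrative experiment), both step by step and by a direct count:

```python
from fractions import Fraction

# Multiplication rule: P(E1 ∩ E2 ∩ E3) = P(E1) P(E2 | E1) P(E3 | E1 ∩ E2).
# E_i = {the i-th card drawn is an ace}, drawing without replacement.
step_by_step = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count over ordered draws: 4·3·2 favourable out of 52·51·50.
direct = Fraction(4 * 3 * 2, 52 * 51 * 50)

assert step_by_step == direct
print(step_by_step)  # 1/5525
```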

Template:Colored remark Template:Colored example Two important theorems related to conditional probability, namely the law of total probability and Bayes' theorem, will be discussed in the following section.

Law of total probability and Bayes' theorem

Sometimes, it is not an easy task to assign a suitable unconditional probability to an event. For instance, suppose Amy will perform a COVID-19 test, and the result is either positive or negative (invalid results are impossible). Let P = {Amy tests positive at the COVID-19 test}. What should <math>\mathbb P(P)</math> be? It is actually quite difficult to answer directly, since this probability is without any condition. In particular, it is unknown whether Amy is infected by COVID-19 or not, and clearly the infection will affect the probability assignment quite significantly.

On the other hand, it may be easier to assign/calculate related conditional/unconditional probabilities. Now, let I = {Amy gets infected by COVID-19}. The conditional probability <math>\mathbb P(P\mid I)</math>, called the sensitivity, may be known based on research on the COVID-19 test. Also, the conditional probability <math>\mathbb P(P^c\mid I^c)</math>, called the specificity, may also be known based on the research. Besides, the probability <math>\mathbb P(I)</math> may be obtained from studies of COVID-19 infection for Amy's place of living. Since <math>\mathbb P(P)=\mathbb P(P\cap I)+\mathbb P(P\cap I^c)</math>, by the definition of conditional probability we have <math>\mathbb P(P)=\mathbb P(P\mid I)\mathbb P(I)+\mathbb P(P\mid I^c)\mathbb P(I^c)</math>. Since the conditional probability satisfies the probability axioms, we have the relation <math>\mathbb P(P\mid I^c)=1-\mathbb P(P^c\mid I^c)</math>, and thus the value of <math>\mathbb P(P\mid I^c)</math> can be obtained. The remaining terms in the expression can also be obtained, as suggested above. Thus, we can finally obtain the value of <math>\mathbb P(P)</math>.
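The computation above can be sketched in a few lines of Python; the sensitivity, specificity, and prevalence values below are illustrative assumptions, not real test statistics:

```python
# Law of total probability for the COVID-19 test example.
# All numbers are illustrative assumptions.
sensitivity = 0.95   # P(P | I)
specificity = 0.99   # P(P^c | I^c)
prevalence = 0.02    # P(I)

# P(P | I^c) = 1 - P(P^c | I^c), by the axioms applied to P(· | I^c).
false_positive_rate = 1 - specificity

# P(P) = P(P | I) P(I) + P(P | I^c) P(I^c)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
print(round(p_positive, 4))  # 0.0288
```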

This shows that conditional probabilities can be quite helpful for calculating unconditional probabilities, especially when we condition appropriately so that the conditional probabilities and the probability of the condition are known in some way.

The following theorem is an important theorem that relates unconditional probabilities and conditional probabilities, as in above discussion. Template:Colored theorem

Proof. Here we only prove the finite case. The proof for the countably infinite case is similar and thus left as an exercise.

Under the assumptions, <math>B_1,B_2,\dots,B_n</math> are pairwise disjoint, and thus <math>A\cap B_1,A\cap B_2,\dots,A\cap B_n</math> are also pairwise disjoint (by observing that <math>(A\cap B_1)\cap(A\cap B_2)=A\cap(B_1\cap B_2)=A\cap\varnothing=\varnothing</math>, and the other intersections have similar results). Also, since <math>\mathbb P(B_1),\mathbb P(B_2),\dots,\mathbb P(B_n)>0</math>, the conditional probabilities <math>\mathbb P(\cdot\mid B_1),\mathbb P(\cdot\mid B_2),\dots,\mathbb P(\cdot\mid B_n)</math> are defined. Moreover, since <math>A\subseteq B_1\cup B_2\cup\cdots\cup B_n</math>, we can observe that <math>A=(A\cap B_1)\cup(A\cap B_2)\cup\cdots\cup(A\cap B_n)</math> (informally, through a Venn diagram). It follows that <math>\mathbb P(A)=\mathbb P\big((A\cap B_1)\cup(A\cap B_2)\cup\cdots\cup(A\cap B_n)\big)=\mathbb P(A\cap B_1)+\mathbb P(A\cap B_2)+\cdots+\mathbb P(A\cap B_n)</math> (finite additivity) <math>=\mathbb P(A\mid B_1)\mathbb P(B_1)+\mathbb P(A\mid B_2)\mathbb P(B_2)+\cdots+\mathbb P(A\mid B_n)\mathbb P(B_n)</math> (definition).

Template:Colored exercise Now, suppose Amy has performed a COVID-19 test, and the result is positive! So now Amy is worried about whether she really is infected by COVID-19, or it is just a false positive. Therefore, she would like to know the conditional probability <math>\mathbb P(I\mid P)</math> (the conditional probability of being infected given testing positive). Notice that the conditional probability <math>\mathbb P(P\mid I)</math> may be known (based on some research). However, it does not in general equal the conditional probability <math>\mathbb P(I\mid P)</math>. (These two probabilities refer to two different things.) So, we are now interested in whether there is a formula relating these two probabilities, which have somewhat "similar" expressions. See the following exercise for deriving the relationship between <math>\mathbb P(A\mid B)</math> and <math>\mathbb P(B\mid A)</math>: Template:Colored exercise The following theorem is the generalization of the above result. Template:Colored theorem

Proof.

Finite case: Under the assumptions, we have by the law of total probability <math>\mathbb P(A)=\mathbb P(A\mid B_1)\mathbb P(B_1)+\mathbb P(A\mid B_2)\mathbb P(B_2)+\cdots+\mathbb P(A\mid B_n)\mathbb P(B_n)</math>. On the other hand, by the definition of conditional probability, <math>\mathbb P(A\cap B_i)=\mathbb P(A\mid B_i)\mathbb P(B_i)</math>. Since <math>\mathbb P(B_i\mid A)=\frac{\mathbb P(A\cap B_i)}{\mathbb P(A)}</math> by the definition of conditional probability, the result follows.

Countably infinite case: Under the assumptions, we have by the law of total probability <math>\mathbb P(A)=\mathbb P(A\mid B_1)\mathbb P(B_1)+\mathbb P(A\mid B_2)\mathbb P(B_2)+\cdots</math>. On the other hand, by the definition of conditional probability, <math>\mathbb P(A\cap B_i)=\mathbb P(A\mid B_i)\mathbb P(B_i)</math>. Since <math>\mathbb P(B_i\mid A)=\frac{\mathbb P(A\cap B_i)}{\mathbb P(A)}</math> by the definition of conditional probability, the result follows.
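Continuing the COVID-19 test example, a short Python sketch applies Bayes' theorem to obtain <math>\mathbb P(I\mid P)</math>; the numbers are the same illustrative assumptions as before, not real test statistics:

```python
# Bayes' theorem for the COVID-19 test example (illustrative numbers).
sensitivity = 0.95   # P(P | I)
specificity = 0.99   # P(P^c | I^c)
prevalence = 0.02    # P(I)

# Denominator from the law of total probability: P(P).
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: P(I | P) = P(P | I) P(I) / P(P).
p_infected_given_positive = sensitivity * prevalence / p_positive
print(round(p_infected_given_positive, 4))  # 0.6597
```

Note that <math>\mathbb P(I\mid P)\approx 0.66</math> is far from the sensitivity <math>\mathbb P(P\mid I)=0.95</math>, illustrating that the two conditional probabilities are genuinely different.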

Template:Colored example Template:Colored example Template:Colored example The following is a famous problem. Template:Colored example Template:Colored remark Template:Colored example Template:Colored example Template:Colored example

Independence

From the previous discussion, we know that the conditional probability of event A given the occurrence of event B can be interpreted as the probability of A where the sample space is updated to the event B. In general, this update affects the probability of A. But what if the probability is somehow the same as the one before the update?

If this is the case, then the occurrence of the particular event B does not actually affect the probability of event A. Symbolically, this means <math>\mathbb P(A\mid B)=\mathbb P(A)</math>. If this holds, then we have <math>\mathbb P(B\mid A)=\frac{\mathbb P(A\mid B)\mathbb P(B)}{\mathbb P(A)}=\frac{\mathbb P(A)\mathbb P(B)}{\mathbb P(A)}=\mathbb P(B)</math>. This means the occurrence of event A also does not affect the probability of event B. This result matches our intuitive interpretation of the independence of two events, so it seems quite reasonable to define the independence of events A and B as follows: Template:Quote However, this definition has some slight issues. If <math>\mathbb P(A)=0</math> or <math>\mathbb P(B)=0</math>, then some of the conditional probabilities involved may be undefined. So, for some events, we may not be able to tell whether they are independent or not using this "definition". To deal with this, we consider an alternative definition that is equivalent to the above when <math>\mathbb P(A)\neq 0</math> and <math>\mathbb P(B)\neq 0</math>. To motivate that definition, we can see that <math>\mathbb P(A\mid B)=\mathbb P(A)</math> or <math>\mathbb P(B\mid A)=\mathbb P(B)</math> if and only if <math>\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B)</math>, when both <math>\mathbb P(A)</math> and <math>\mathbb P(B)</math> are nonzero. This results in the following definition: Template:Colored definition Template:Colored remark But what if there are more than two events involved? Intuitively, we may consider the following as the general "definition" of independence: Template:Quote But we will get some strange results by using this as the "definition": Template:Colored example From this example, merely requiring <math>\mathbb P(E_1\cap E_2\cap\cdots\cap E_n)=\mathbb P(E_1)\mathbb P(E_2)\cdots\mathbb P(E_n)</math> cannot ensure that every pair of the events involved is independent. So, this suggests another definition: Template:Quote However, we will again get some strange results by using this as the "definition": Template:Colored example The above examples suggest that we actually need both of the requirements

  1. <math>\mathbb P(A\cap B\cap C)=\mathbb P(A)\mathbb P(B)\mathbb P(C)</math>. (This ensures <math>\mathbb P(C\mid A\cap B)=\mathbb P(C)</math>, <math>\mathbb P(A\mid B\cap C)=\mathbb P(A)</math> and <math>\mathbb P(B\mid A\cap C)=\mathbb P(B)</math>. That is, if we condition on the intersection of two of the events, the probability of the remaining one is not affected.)
  2. All the pairs of events among A, B, C are independent. (That is, <math>\mathbb P(A\cap B)=\mathbb P(A)\mathbb P(B)</math>, <math>\mathbb P(B\cap C)=\mathbb P(B)\mathbb P(C)</math> and <math>\mathbb P(A\cap C)=\mathbb P(A)\mathbb P(C)</math>.)

in the definition of independence of events A,B,C for the definition to "make sense".
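The need for both requirements can be seen concretely. The following Python sketch builds the classic two-coin example (an illustrative choice): the three events are pairwise independent, yet the product rule fails for the triple intersection.

```python
from fractions import Fraction
from itertools import product

# Toss two fair coins; all four outcomes are equally likely.
omega = list(product("HT", repeat=2))

def prob(event):
    # Events are subsets of omega, so probability is a ratio of counts.
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}  # first coin shows heads
B = {w for w in omega if w[1] == "H"}  # second coin shows heads
C = {w for w in omega if (w[0] == "H") != (w[1] == "H")}  # exactly one head

# Every pair satisfies the product rule ...
assert prob(A & B) == prob(A) * prob(B)
assert prob(B & C) == prob(B) * prob(C)
assert prob(A & C) == prob(A) * prob(C)
# ... but the triple intersection does not: P(A∩B∩C) = 0 ≠ 1/8.
assert prob(A & B & C) != prob(A) * prob(B) * prob(C)
```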

Similarly, the independence of four events A,B,C,D should require

  1. <math>\mathbb P(A\cap B\cap C\cap D)=\mathbb P(A)\mathbb P(B)\mathbb P(C)\mathbb P(D)</math>.
  2. All the triples of events are independent.
  • In other words, we need the probability of the intersection of any three and any two of the events to "split as a product of probabilities of single events", as above.

This leads us to the following general definition: Template:Colored definition Template:Colored remark Template:Colored example Template:Colored example Template:Colored example In general, we have the following result. Template:Colored proposition

Proof. Assume that the events <math>E_1,E_2,\dots,E_n</math> are independent. Then, we have <math>\mathbb P\left(\bigcap_{i=1}^n E_i\right)=\prod_{i=1}^n\mathbb P(E_i)</math>. Now, suppose we replace <math>E_1</math> by <math>E_1^c</math>, and we want to prove that the independence still holds: <math>\mathbb P\left(E_1^c\cap\bigcap_{i=2}^n E_i\right)=\mathbb P\left(\bigcap_{i=2}^n E_i\right)-\mathbb P\left(E_1\cap\bigcap_{i=2}^n E_i\right)</math> (a property) <math>=\prod_{i=2}^n\mathbb P(E_i)-\mathbb P(E_1)\prod_{i=2}^n\mathbb P(E_i)</math> (independence of <math>E_1,E_2,\dots,E_n</math>) <math>=\prod_{i=2}^n\mathbb P(E_i)\left(1-\mathbb P(E_1)\right)=\prod_{i=2}^n\mathbb P(E_i)\,\mathbb P(E_1^c)</math>. Thus, <math>E_1^c,E_2,\dots,E_n</math> are still independent. By symmetry, we can instead replace an event other than <math>E_1</math> by its respective complement, and the independence still holds.

Notice that we can split the process of replacing several of the events into multiple steps, where we replace one event in each step (that is, we replace the events one by one). Now, we can apply the above argument in each step to ensure that the resulting events are still independent. Thus, after all the steps, when we have finished the replacement, the independence still holds.
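The proposition can also be checked numerically on a small example; the three-coin-toss events below are an illustrative assumption, and the helper checks the product rule over every sub-collection:

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: three fair coin tosses, all outcomes equally likely.
omega = list(product("HT", repeat=3))

def prob(event):
    return Fraction(len(event), len(omega))

def mutually_independent(events):
    # Check the product rule for every sub-collection of size >= 2.
    for k in range(2, len(events) + 1):
        for combo in combinations(events, k):
            inter = set(omega)
            p = Fraction(1)
            for e in combo:
                inter &= e
                p *= prob(e)
            if prob(inter) != p:
                return False
    return True

E = [{w for w in omega if w[i] == "H"} for i in range(3)]  # i-th toss heads
assert mutually_independent(E)

# Replace E1 by its complement: the events remain independent.
E_complement = [set(omega) - E[0], E[1], E[2]]
assert mutually_independent(E_complement)
```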

Template:Colored example Template:Colored remark Template:Colored example Template:Colored exercise Template:Colored exercise Template:Colored example

Conditional independence

Conditional independence is a conditional version of independence, and has the following definition, which is similar to that of independence. Template:Colored definition Template:Colored remark Template:Colored example
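As a small illustration of this notion (using an assumed joint distribution, in the spirit of the earlier testing example), two events can satisfy <math>\mathbb P(A\cap B\mid C)=\mathbb P(A\mid C)\mathbb P(B\mid C)</math> without being unconditionally independent:

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: two tests A and B are independent given the
# infection status C, with hit rate 9/10 when infected and 1/10 otherwise.
half, hit, miss = Fraction(1, 2), Fraction(9, 10), Fraction(1, 10)

# Joint distribution over (infected, test_a, test_b).
joint = {}
for inf, a, b in product([True, False], repeat=3):
    pa = hit if inf else miss
    pb = hit if inf else miss
    joint[(inf, a, b)] = half * (pa if a else 1 - pa) * (pb if b else 1 - pb)

def prob(pred):
    return sum(p for w, p in joint.items() if pred(w))

P_C = prob(lambda w: w[0])
P_AB_given_C = prob(lambda w: w[0] and w[1] and w[2]) / P_C
P_A_given_C = prob(lambda w: w[0] and w[1]) / P_C
P_B_given_C = prob(lambda w: w[0] and w[2]) / P_C

# Conditionally independent given C ...
assert P_AB_given_C == P_A_given_C * P_B_given_C
# ... but not unconditionally independent.
assert prob(lambda w: w[1] and w[2]) != prob(lambda w: w[1]) * prob(lambda w: w[2])
```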

Template:BookCat