Probability/Probability Spaces

Terminologies

The name of this chapter, probability space, refers to a mathematical construct that models a random experiment. To be more precise: Template:Colored definition Let us first give the definitions related to the sample space. The definitions of event space and probability will be discussed in later sections. Template:Colored definition Template:Colored remark Template:Colored example Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored example

Probability interpretations

In this chapter, we will discuss probability mathematically, and we will give an axiomatic and abstract definition of probability (as a function). By an axiomatic definition, we mean defining probability to be a function that satisfies some axioms, called probability axioms. Such an axiomatic definition does not tell us how we should interpret the term "probability", so the definition is said to be independent of the interpretation of probability. This independence makes the formal definition always applicable, no matter how you interpret probability.

However, the axiomatic definition does not suggest a way to construct a probability measure (i.e., a way of assigning probabilities to events): it just states that probability is a function satisfying certain axioms, but how can we construct such a function in the first place? In this section, we will discuss two main types of probability interpretations, subjectivism and frequentism, each of which comes with a method of assigning probabilities to events.

Subjectivism

Intuitively and naturally, the probability of an event is often regarded as a numerical measure of the "chance" that the event occurs (that is, how likely the event is to occur). So, it is natural for us to assign a probability to an event based on our own assessment of this "chance". (In order for the probability to be valid according to the axiomatic definition, the assignment needs to satisfy the probability axioms.) But different people may assess the "chance" differently, depending on their personal opinions. So, such an interpretation of probability is somewhat subjective, since different people may assign different probabilities to the same event. Hence, we call this probability interpretation subjectivism (also known as Template:Colored em). Template:Colored example The main issue with subjectivism is the lack of objectivity: since different probabilities can be assigned to the same event based on personal opinion, we may have difficulty choosing which of them should be used for that event. To mitigate this issue, we may adjust our degree of belief in an event as more data are observed, through Template:Colored em, which will be discussed in a later chapter, so that the value is assigned in a more objective way. However, even after the adjustment, the value is still not assigned in an Template:Colored em objective way, since the adjusted value (known as the posterior probability) still depends on the initial value (known as the prior probability), which is assigned subjectively.

Frequentism

Another probability interpretation, which is objective, is called frequentism. We denote by $n (E)$ the number of occurrences of an event $E$ in $n$ repetitions of an experiment. (An experiment is any action or process with an outcome that is subject to uncertainty or randomness.) Then, we call $\frac{n (E)}{n}$ the relative frequency of the event $E$ . Intuitively, we expect that the relative frequency fluctuates less and less as $n$ gets larger and larger, and approaches a constant limiting value (we call this the limiting relative frequency) as $n$ tends to infinity, i.e., the limiting relative frequency is $\lim_{n \to \infty} \frac{n (E)}{n}$ . It is thus natural to take the limiting relative frequency as the probability of the event $E$ . This is exactly the definition of probability in frequentism. In particular, the existence of such a limiting relative frequency is an assumption, or axiom, in frequentism. (As a side result, when $n$ is large enough, the relative frequency of the event $E$ may be used to approximate the probability of the event $E$ .)
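The stabilisation of the relative frequency can be illustrated with a small simulation. The following is only a sketch: the fair six-sided die, the event "the roll is a six", the helper name `relative_frequencies`, and the fixed seed are all assumptions made for illustration, and a pseudo-random generator merely imitates repetitions of a real experiment.

```python
import random

def relative_frequencies(event, sample_outcome, checkpoints, seed=0):
    """Simulate repetitions and return {n: n(E)/n} for each n in checkpoints."""
    rng = random.Random(seed)      # fixed seed so the run is reproducible
    wanted = set(checkpoints)
    occurrences = 0                # this is n(E) so far
    result = {}
    for n in range(1, max(checkpoints) + 1):
        if sample_outcome(rng) in event:
            occurrences += 1
        if n in wanted:
            result[n] = occurrences / n
    return result

# Hypothetical experiment: roll a fair die; E = "the roll is a six".
freqs = relative_frequencies(
    event={6},
    sample_outcome=lambda rng: rng.randint(1, 6),
    checkpoints=[10, 100, 1_000, 100_000],
)
for n, f in freqs.items():
    print(f"n = {n:>7}: n(E)/n = {f:.4f}")
```

For large $n$ the printed relative frequency should be close to $1/6$, which is the value one would take as the probability of the event under the frequentist interpretation.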

However, an issue with frequentism is that it may be infeasible to repeat the experiment many times for some events. Hence, no probability can be assigned to those events, which is clearly a limitation of frequentism. Template:Colored example Template:Colored example Because of these issues, we will instead use a modern axiomatic and abstract approach to define probability, which was proposed by the Russian mathematician Andrey Nikolaevich Kolmogorov in 1933. By an axiomatic definition, we mean defining probability quite broadly and abstractly as something that satisfies certain axioms (called probability axioms). These probability axioms are the mathematical foundation of modern probability theory.

Probability axioms

Since we want to use the probability measure $ℙ$ to assign a probability $ℙ (E)$ to every event $E$ in the sample space, it seems natural to set the domain of the probability measure $ℙ$ to be the set containing all subsets of $Ω$ , i.e., the power set of $Ω$ , $𝒫 (Ω)$ . Unfortunately, the situation is not that simple: there are some technical difficulties if we set the domain like this when the sample space $Ω$ is uncountable. Template:Colored remark This is because the power set of such an uncountable sample space includes some "badly behaved" sets, which cause problems when we try to assign probabilities to them. (Here, we will not discuss those sets and these technical difficulties in detail.) Thus, instead of setting the domain of the probability measure to be $𝒫 (Ω)$ , we set the domain to be a $σ$ -algebra (sigma-algebra) containing some "sufficiently well-behaved" events: Template:Colored definition Template:Colored remark Template:Colored proposition

Proof.

Property 1: By the closure under complementation, since $S \in Σ$ , it follows that $\emptyset = S^{c} \in Σ$ .

Property 2: By the closure under countable unions, for every infinite sequence of sets $A_{1}, A_{2}, \dots \in Σ$ , we have $⋃_{i = 1}^{\infty} A_{i} \in Σ$ . So, in particular, we can choose the sequence to be $A_{1}, A_{2}, \dots, A_{n}, \emptyset, \emptyset, \dots$ (recall that $\emptyset \in Σ$ by property 1), where $A_{1}, A_{2}, \dots, A_{n}$ is an arbitrary finite sequence of sets in $Σ$ . Then, $⋃_{i = 1}^{n} A_{i} = ⋃_{i = 1}^{\infty} A_{i} \in Σ .$ Thus, we have the desired result.

Property 3: For every infinite sequence of sets $A_{1}, A_{2}, \dots \in Σ$ , by the closure under complementation, we have $A_{1}^{c}, A_{2}^{c}, \dots \in Σ .$ Then, by the closure under countable unions, we have $⋃_{i = 1}^{\infty} A_{i}^{c} \in Σ$ . After that, we use De Morgan's law: ${(⋂_{i = 1}^{\infty} A_{i})}^{c} = ⋃_{i = 1}^{\infty} A_{i}^{c} \in Σ .$ Using the closure under complementation again, we have $⋂_{i = 1}^{\infty} A_{i} \in Σ$ as desired.

Property 4: The proof is similar to that of property 2, and hence left as an exercise.

$◻$
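For a finite set, the defining closure properties used in the proof above can be checked by brute force. The sketch below is illustrative only: the helper names `power_set`, `is_sigma_algebra` and `closed_under_intersections` are hypothetical, and replacing countable unions by pairwise unions is an assumption that is adequate only because a family of subsets of a finite set is itself finite.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite set, as a set of frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(omega, family):
    """Check the defining properties of a sigma-algebra on a finite set."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if not all(a <= omega for a in family):
        return False                       # every member must be a subset
    contains_omega = omega in family
    closed_complement = all(omega - a in family for a in family)
    # Pairwise unions suffice here because the family is finite.
    closed_union = all(a | b in family for a in family for b in family)
    return contains_omega and closed_complement and closed_union

def closed_under_intersections(family):
    """Derived property: pairwise intersections stay in the family."""
    family = {frozenset(a) for a in family}
    return all(a & b in family for a in family for b in family)

omega = {1, 2, 3}
trivial = [set(), omega]                   # the "smallest" choice
split = [set(), {1}, {2, 3}, omega]
missing_complement = [set(), {1}, omega]   # {2, 3} is missing

print(is_sigma_algebra(omega, trivial))             # True
print(is_sigma_algebra(omega, split))               # True
print(is_sigma_algebra(omega, power_set(omega)))    # True
print(is_sigma_algebra(omega, missing_complement))  # False
print(closed_under_intersections(split))            # True
```

The last line illustrates, on one example, that closure under intersections comes for free once the defining properties hold.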

Template:Colored remark Template:Colored exercise Template:Colored example Template:Colored exercise We have seen two examples of $σ$ -algebras in the example above. Often, the "smallest" $σ$ -algebra is not chosen to be the domain of the probability measure, since we are usually interested in events other than $\emptyset$ and $Ω$ .

The "largest" $σ$ -algebra, on the other hand, contains every event, but we may not be interested in some of them. In particular, we are usually interested in "well-behaved" events rather than "badly behaved" ones. Indeed, it may even be impossible to assign probabilities properly to the latter (those events are called Template:Colored em).

Fortunately, when the sample space $Ω$ is countable, every set in $𝒫 (Ω)$ is "well-behaved", so we can take this power set to be the $σ$ -algebra that serves as the domain of the probability measure.

However, when the sample space $Ω$ is uncountable, even though the power set $𝒫 (Ω)$ is a $σ$ -algebra, it contains "too many" events; in particular, it includes some "badly behaved" events. Therefore, we will not choose this power set to be the domain of the probability measure. Instead, we choose a $σ$ -algebra that includes only the "well-behaved" events to be the domain, so that we are able to assign a probability properly to every event in it. Those "well-behaved" events are usually the events of interest, so all events of interest are contained in that $σ$ -algebra, that is, in the domain of the probability measure.

To motivate the probability axioms, we consider some properties that the "probability" in frequentism (as a limiting relative frequency) possesses:

The limiting relative frequency must be nonnegative. (We call this property nonnegativity.)
The limiting relative frequency of the whole sample space $Ω$ ( $Ω$ is also an event) must be 1 (since by definition $Ω$ contains all sample points, this event must occur in every repetition). (We call this property Template:Colored em.)
If the events $E_{1}, E_{2}, \dots$ are pairwise disjoint (i.e., $E_{i} \cap E_{j} = \emptyset$ for every $i, j$ with $i \neq j$ ), then the limiting relative frequency of the event $⋃_{i = 1}^{\infty} E_{i} \overset{def}{=} E_{1} \cup E_{2} \cup \dots$ (a union of subsets of $Ω$ is a subset of $Ω$ , so it can be called an event) is

$\begin{matrix} \lim_{n \to \infty} \frac{n (⋃_{i = 1}^{\infty} E_{i})}{n} & = \lim_{n \to \infty} \frac{n (E_{1}) + n (E_{2}) + \dots}{n} & \text{(the events are pairwise disjoint)} \\ & = \lim_{n \to \infty} \frac{n (E_{1})}{n} + \lim_{n \to \infty} \frac{n (E_{2})}{n} + \dots & \text{(every limit exists by the axiom in frequentism)} \\ & = \sum_{i = 1}^{\infty} \lim_{n \to \infty} \frac{n (E_{i})}{n}, & \end{matrix}$

which is the sum of the limiting relative frequencies of the events $E_{1}, E_{2}, \dots$ . (We call this property countable additivity.)

It is thus very natural to set the probability axioms to be the three properties mentioned above: Template:Colored definition Template:Colored remark Using the probability axioms alone, we can prove many well-known properties of probability.
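On a finite sample space with the power set as event space, the three properties can be checked mechanically. The following is a minimal sketch, not a general implementation: the helper names `power_set` and `satisfies_axioms` and the biased coin are hypothetical, and additivity is only tested for pairs of disjoint events, which is an assumption that is adequate here because the event space is finite.

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def satisfies_axioms(omega, prob):
    """Check the three properties for a dict `prob` mapping events to numbers."""
    omega = frozenset(omega)
    nonnegative = all(p >= 0 for p in prob.values())
    whole_space_is_one = prob.get(omega) == 1
    additive = all(
        prob[a | b] == prob[a] + prob[b]
        for a in prob
        for b in prob
        if not (a & b)          # only pairs of disjoint events
    )
    return nonnegative and whole_space_is_one and additive

# Hypothetical biased coin: heads with probability 1/3.
omega = {"H", "T"}
weights = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
good = {e: sum((weights[w] for w in e), Fraction(0)) for e in power_set(omega)}

# A faulty assignment: the two singleton probabilities no longer add up to 1.
bad = dict(good)
bad[frozenset({"T"})] = Fraction(1, 2)

print(satisfies_axioms(omega, good))  # True
print(satisfies_axioms(omega, bad))   # False
```

Exact fractions are used instead of floating-point numbers so that the equality tests are not affected by rounding.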

Basic properties of probability

Let us start the discussion with some simple properties of probability. Template:Colored theorem

Proof. Consider the infinite sequence of events $Ω, \emptyset, \emptyset, \dots$ (recall that $\emptyset$ and $Ω$ must be in the $σ$ -algebra $ℱ$ ). We can see that the events are pairwise disjoint. Also, the union of these events is $Ω$ . Hence, by the countable additivity of probability, we have $\underbrace{ℙ (Ω)}_{= 1} = \underbrace{ℙ (Ω)}_{= 1} + ℙ (\emptyset) + ℙ (\emptyset) + \dots ⟹ ℙ (\emptyset) + ℙ (\emptyset) + \dots = 1 - 1 = 0 ⟹ \sum_{i = 1}^{\infty} ℙ (\emptyset) = 0 .$ It can then be shown that $ℙ (\emptyset) = 0$ . ^[1]

$◻$

Using this result, we can obtain finite additivity from the countable additivity of probability: Template:Colored theorem

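Proof. Consider the infinite sequence of events $A_{1}, A_{2}, \dots$ defined by $A_{i} = E_{i}$ for $i \leq n$ and $A_{i} = \emptyset$ for $i > n$ , that is, the sequence $E_{1}, E_{2}, \dots, E_{n}, \emptyset, \emptyset, \dots$ (recall that $\emptyset \in ℱ$ always). These events are pairwise disjoint, since $E_{1}, E_{2}, \dots, E_{n}$ are pairwise disjoint and $\emptyset$ is disjoint from every set. Then, $\begin{matrix} ℙ (⋃_{i = 1}^{n} E_{i}) & = ℙ (⋃_{i = 1}^{\infty} A_{i}) & \text{(appending empty sets does not change the union)} \\ & = \sum_{i = 1}^{\infty} ℙ (A_{i}) & \text{(countable additivity)} \\ & = \sum_{i = 1}^{n} ℙ (E_{i}) + \sum_{i = n + 1}^{\infty} ℙ (\emptyset) & \\ & = \sum_{i = 1}^{n} ℙ (E_{i}) . & \end{matrix}$ (The last equality follows since $ℙ (\emptyset) = 0$ : every partial sum of $\sum_{i = n + 1}^{\infty} ℙ (\emptyset)$ equals $0$ , so the series, being the limit of its partial sums, equals $0$ as well.)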

$◻$

Finite additivity makes the proofs of some of the following results simpler. Template:Colored theorem

Proof.

Property 1:

First, notice that by definition $Ω = A \cup A^{c}$ . Furthermore, since $A \in ℱ$ , we have $A^{c} \in ℱ$ by the closure under complementation of $σ$ -algebra. Also, the sets $A$ and $A^{c}$ are disjoint. Thus, by the finite additivity, we have $ℙ (A \cup A^{c}) = ℙ (A) + ℙ (A^{c}) .$ On the other hand, $ℙ (A \cup A^{c}) = ℙ (Ω) \overset{P2}{=} 1 .$ Thus, we have the desired result.

Property 2: By property 1, we have $ℙ (A) = 1 - \underbrace{ℙ (A^{c})}_{\geq 0 \text{ by P1}} \leq 1$ . We then have the desired numeric bound on $ℙ (A)$ , since $ℙ (A) \geq 0$ also holds by the nonnegativity of probability.

Property 3: $\begin{matrix} ℙ (B) & = ℙ (B \cap Ω) & \text{(} B \subseteq Ω \text{, so } B = B \cap Ω \text{)} \\ & = ℙ (B \cap (A \cup A^{c})) & \text{(} Ω = A \cup A^{c} \text{ by definition)} \\ & = ℙ ((B \cap A) \cup (B \cap A^{c})) & \text{(distributive law)} \\ & = ℙ (B \cap A) + ℙ (B ∖ A) . & \text{(} B \cap A, B \cap A^{c} \in ℱ \text{ are disjoint, and } B ∖ A = B \cap A^{c} \text{)} \end{matrix}$

Property 4: By property 3, we have $\begin{matrix} ℙ (A \cup B) & = ℙ ((A \cup B) \cap A) + ℙ ((A \cup B) ∖ A) & \text{(property 3)} \\ & = ℙ (A) + ℙ (B ∖ A) & \text{(} (A \cup B) \cap A = A \text{ and } (A \cup B) ∖ A = B ∖ A \text{)} \\ & = ℙ (A) + ℙ (B) - ℙ (B \cap A) . & \text{(property 3)} \end{matrix}$

Property 5: Assume that $A \subseteq B$ . Then, $B \cap A = A$ . Hence, by property 3, $ℙ (B) = ℙ (B \cap A) + \underbrace{ℙ (B ∖ A)}_{\geq 0 \text{ by P1}} \geq ℙ (B \cap A) = ℙ (A) .$

$◻$
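The five properties proved above can be checked numerically on a small example. The sketch below is illustrative only: the loaded die, its weights, and the function name `prob` are assumptions, and the probability of an event is computed by adding up the weights of its outcomes, which is one convenient way to obtain a function satisfying the axioms on a finite sample space.

```python
from fractions import Fraction

# Hypothetical loaded die: the weights are nonnegative and sum to 1.
WEIGHTS = {
    1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
    4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4),
}
OMEGA = frozenset(WEIGHTS)

def prob(event):
    """Probability of an event, obtained by adding the outcome weights."""
    return sum((WEIGHTS[w] for w in event), Fraction(0))

A = frozenset({2, 4, 6})
B = frozenset({4, 5, 6})
C = frozenset({4, 6})      # C is a subset of A

checks = {
    "property 1 (complement)": prob(OMEGA - A) == 1 - prob(A),
    "property 2 (bounds)": 0 <= prob(A) <= 1,
    "property 3 (split)": prob(B) == prob(B & A) + prob(B - A),
    "property 4 (union)": prob(A | B) == prob(A) + prob(B) - prob(A & B),
    "property 5 (monotonicity)": prob(C) <= prob(A),
}
for name, holds in checks.items():
    print(name, holds)
```

Here $ℙ (A) = 1/2$ , $ℙ (B) = 2/3$ , $ℙ (A \cap B) = 5/12$ and $ℙ (A \cup B) = 3/4$ , so property 4 reads $3/4 = 1/2 + 2/3 - 5/12$ .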

Template:Colored remark Template:Colored example Template:Colored example Template:Colored exercise Template:Colored example Template:Colored remark Template:Colored example Template:Colored exercise Template:Colored example

Constructing a probability measure

As we have said, the axiomatic definition does not suggest a way to construct a probability measure. Indeed, even for the same experiment, there can be many ways to construct a probability measure that satisfies the above probability axioms if insufficient information is provided: Template:Colored example However, we have previously mentioned that we may assign probabilities to events subjectively (as in subjectivism), or according to their limiting relative frequencies (as in frequentism). Through these two probability interpretations, we may provide some background information for a random experiment, by assigning probabilities to some of the events before constructing the probability measure, to the extent that there is exactly one way to construct a probability measure. Consider the coin tossing example again: Template:Colored example In general, the background information need not assign a probability to every event in the event space for us to be able to construct the probability measure in exactly one way. Consider the following example. Template:Colored example We can see from this example that, to provide sufficient background information for the probability measure to be constructible in exactly one way, we just need the probability of each of the singleton events (which should be nonnegative and sum to one to satisfy the probability axioms). After that, we can calculate the probability of each of the other events in the event space, and hence construct the only possible probability measure.

This is true when the sample space is countable, in general: Template:Colored theorem

Proof.

Case 1: $Ω$ is finite. Then, we can write $Ω = \{ω_{1}, ω_{2}, \dots, ω_{n}\}$ . It follows that every event $E \in ℱ$ can be expressed as $E = ⋃_{i} \{ω_{i}\}$ (which indices $i$ the union is taken over depends on the event $E$ ). Notice also that the sets $\{ω_{i}\}$ are pairwise disjoint. (Each set contains a different sample point, and so the intersection of any pair of them is the empty set.) Then, by the finite additivity of probability, we have for every event $E \in ℱ$ , $ℙ (E) = \sum_{i} ℙ (\{ω_{i}\}) = \sum_{ω \in E} ℙ (\{ω\}) .$

Case 2: $Ω$ is countably infinite. Then, we can write $Ω = \{ω_{1}, ω_{2}, \dots\}$ . It follows that every event $E \in ℱ$ can be expressed as $E = ⋃_{i} \{ω_{i}\}$ (which indices $i$ the union is taken over depends on the event $E$ ). Notice also that the sets $\{ω_{i}\}$ are pairwise disjoint. Then, by the countable additivity (if $E$ is infinite) or the finite additivity (if $E$ is finite) of probability, we have for every event $E \in ℱ$ , $ℙ (E) = \sum_{i} ℙ (\{ω_{i}\}) = \sum_{ω \in E} ℙ (\{ω\}) .$

$◻$
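The formula $ℙ (E) = \sum_{ω \in E} ℙ (\{ω\})$ from the proof above translates directly into code for a finite sample space. The following is a sketch under stated assumptions: the function name `measure_from_singletons` and the three-outcome example are hypothetical, and the validation step simply enforces that the singleton probabilities are nonnegative and sum to one.

```python
from fractions import Fraction

def measure_from_singletons(singleton_probs):
    """Build P from the probabilities of the singleton events.

    `singleton_probs` maps each sample point to P({point}).
    """
    if any(p < 0 for p in singleton_probs.values()):
        raise ValueError("singleton probabilities must be nonnegative")
    if sum(singleton_probs.values()) != 1:
        raise ValueError("singleton probabilities must sum to one")

    def prob(event):
        # P(E) is the sum of P({w}) over the sample points w in E.
        return sum((singleton_probs[w] for w in event), Fraction(0))

    return prob

# Hypothetical experiment with three outcomes.
P = measure_from_singletons(
    {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
)
print(P({"a", "c"}))       # 2/3
print(P(set()))            # 0
print(P({"a", "b", "c"}))  # 1
```

Once the singleton probabilities are fixed, the probability of every other event is forced, which is the uniqueness the surrounding text describes.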

Template:Colored example The following is an important special case for the above theorem. Template:Colored corollary

Proof. Under the assumptions, the probability of every singleton event is nonnegative. Also, the sum of the probabilities is $\underbrace{\frac{1}{| Ω |} + \frac{1}{| Ω |} + \dots + \frac{1}{| Ω |}}_{| Ω | \text{ times}} = 1 .$ Thus, for every event $E$ , we have by the previous theorem $ℙ (E) = \sum_{ω \in E} ℙ (\{ω\}) = \underbrace{\frac{1}{| Ω |} + \frac{1}{| Ω |} + \dots + \frac{1}{| Ω |}}_{| E | \text{ times}} = \frac{| E |}{| Ω |} .$

$◻$
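When all outcomes are equally likely, the formula $ℙ (E) = | E | / | Ω |$ reduces probability to counting. As a brief illustration, assuming a hypothetical experiment of rolling two fair dice (the names `OMEGA`, `classical_prob` and `sum_is_seven` are made up for this sketch), the outcomes can simply be enumerated:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of faces, assumed equally likely.
OMEGA = set(product(range(1, 7), repeat=2))

def classical_prob(event):
    """P(E) = |E| / |OMEGA| for an event E contained in OMEGA."""
    return Fraction(len(event), len(OMEGA))

sum_is_seven = {w for w in OMEGA if sum(w) == 7}
print(len(OMEGA))                    # 36
print(classical_prob(sum_is_seven))  # 1/6
```

The event contains the six pairs $(1, 6), (2, 5), \dots, (6, 1)$ , so its probability is $6 / 36 = 1 / 6$ .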

Template:Colored remark Template:Colored example Template:Colored example Template:Colored exercise Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example

More advanced properties of probability

Recall the inclusion-exclusion principle in combinatorics. We have a similar result for probability: Template:Colored theorem

Proof. We can prove this by mathematical induction.

Let $P (n)$ be the statement $ℙ (E_{1} \cup E_{2} \cup \dots \cup E_{n}) = \sum_{i_{1}}^{} ℙ (E_{i_{1}}) - \sum_{i_{1} < i_{2}}^{} ℙ (E_{i_{1}} \cap E_{i_{2}}) + \sum_{i_{1} < i_{2} < i_{3}}^{} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap E_{i_{3}}) - \dots + (- 1)^{n + 1} \sum_{i_{1} < i_{2} < \dots < i_{n}}^{} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap \dots \cap E_{i_{n}}) .$ We wish to prove that $P (n)$ is true for every positive integer $n$ .

Basis Step: When $n = 1$ , $P (1)$ is clearly true since it merely states that $ℙ (E_{1}) = ℙ (E_{1})$ .

Inductive Hypothesis: Assume that $P (k)$ is true for an arbitrary positive integer $k$ .

Inductive Step:

Case 1: $k = 1$ . Then, $P (k + 1) = P (2)$ is true by a property of probability (recall that we have " $ℙ (A \cup B) = ℙ (A) + ℙ (B) - ℙ (A \cap B)$ ").

Case 2: $k \geq 2$ . We wish to prove that $P (k + 1)$ is true. The main idea is to regard $E_{1} \cup E_{2} \cup \dots \cup E_{k} \cup E_{k + 1}$ as $(E_{1} \cup E_{2} \cup \dots \cup E_{k}) \cup E_{k + 1}$ , apply the above property of probability, and then apply the inductive hypothesis twice, to two probabilities that each involve a union of $k$ events. Through some (somewhat complicated) algebraic manipulations, we then get the desired result. The details are as follows (may be omitted): $\begin{matrix} ℙ (E_{1} \cup E_{2} \cup \dots \cup E_{k} \cup E_{k + 1}) & = ℙ ((E_{1} \cup E_{2} \cup \dots \cup E_{k}) \cup E_{k + 1}) & \\ & = ℙ (E_{1} \cup E_{2} \cup \dots \cup E_{k}) + ℙ (E_{k + 1}) - ℙ ((E_{1} \cup E_{2} \cup \dots \cup E_{k}) \cap E_{k + 1}) & \text{(using the above property of probability)} \\ & = ℙ (E_{1} \cup E_{2} \cup \dots \cup E_{k}) + ℙ (E_{k + 1}) - ℙ ((E_{1} \cap E_{k + 1}) \cup (E_{2} \cap E_{k + 1}) \cup \dots \cup (E_{k} \cap E_{k + 1})) & \text{(distributive law)} \\ & = \sum_{i_{1}} ℙ (E_{i_{1}}) - \sum_{i_{1} < i_{2}} ℙ (E_{i_{1}} \cap E_{i_{2}}) + \sum_{i_{1} < i_{2} < i_{3}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap E_{i_{3}}) - \dots + (- 1)^{k + 1} \sum_{i_{1} < i_{2} < \dots < i_{k}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap \dots \cap E_{i_{k}}) & \\ & \quad + ℙ (E_{k + 1}) - \sum_{i_{1}} ℙ (E_{i_{1}} \cap E_{k + 1}) + \sum_{i_{1} < i_{2}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap E_{k + 1}) - \dots + (- 1)^{k + 1} \sum_{i_{1} < i_{2} < \dots < i_{k - 1}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap E_{i_{3}} \cap \dots \cap E_{i_{k - 1}} \cap E_{k + 1}) & \\ & \quad + (- 1)^{k + 2} \sum_{i_{1} < i_{2} < \dots < i_{k}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap \dots \cap E_{i_{k}} \cap E_{k + 1}) & \text{(using the inductive hypothesis twice; all indices range over } 1, \dots, k \text{)} \\ & = \sum_{i_{1}} ℙ (E_{i_{1}}) - \sum_{i_{1} < i_{2}} ℙ (E_{i_{1}} \cap E_{i_{2}}) + \sum_{i_{1} < i_{2} < i_{3}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap E_{i_{3}}) - \dots + (- 1)^{k + 1} \sum_{i_{1} < i_{2} < \dots < i_{k}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap \dots \cap E_{i_{k}}) & \\ & \quad + (- 1)^{k + 2} \sum_{i_{1} < i_{2} < \dots < i_{k} < i_{k + 1}} ℙ (E_{i_{1}} \cap E_{i_{2}} \cap \dots \cap E_{i_{k}} \cap E_{i_{k + 1}}) & \text{(all indices now range over } 1, \dots, k + 1 \text{, so the sums involve } E_{1}, E_{2}, \dots, E_{k + 1} \text{)} \end{matrix}$ So, $P (k + 1)$ is true.

Hence, by the principle of mathematical induction, $P (n)$ is true for every positive integer $n$ .

$◻$
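The alternating sum in the statement $P (n)$ can be evaluated by iterating over all nonempty subfamilies of the events. The following is a minimal sketch under stated assumptions: the sample space (a number drawn uniformly from 1 to 30), the three divisibility events, and the names `prob` and `inclusion_exclusion` are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical experiment: a number is drawn uniformly from 1..30.
OMEGA = frozenset(range(1, 31))

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |OMEGA|."""
    return Fraction(len(event), len(OMEGA))

def inclusion_exclusion(events):
    """Evaluate the alternating sum over all nonempty subfamilies."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)          # + for r = 1, - for r = 2, ...
        for chosen in combinations(events, r):
            total += sign * prob(frozenset.intersection(*chosen))
    return total

# E_1, E_2, E_3: the number is divisible by 2, 3, 5 respectively.
events = [frozenset(w for w in OMEGA if w % d == 0) for d in (2, 3, 5)]

direct = prob(frozenset.union(*events))
via_formula = inclusion_exclusion(events)
print(direct, via_formula)  # 11/15 11/15
```

In this example the formula reads $(15 + 10 + 6 - 5 - 3 - 2 + 1) / 30 = 22 / 30 = 11 / 15$ , in agreement with counting the union directly.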

Template:Colored remark Template:Colored example Template:Colored example The following is a classical example demonstrating the application of the inclusion-exclusion principle. Template:Colored example Template:Colored theorem

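Proof. Assume that $E_{n} ↑ E$ . Now, define $F_{1} = E_{1}, F_{2} = E_{2} ∖ E_{1}, F_{3} = E_{3} ∖ E_{2}, \dots ,$ that is, $F_{n} = E_{n} ∖ E_{n - 1}$ for every positive integer $n$ , with the convention $E_{0} = \emptyset$ . ^[2] Claim: $F_{1}, F_{2}, \dots \in ℱ$ are pairwise disjoint.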

Proof. We wish to prove that $F_{i} \cap F_{j} = \emptyset$ for every $i, j$ such that $i \neq j$ . $\begin{matrix} F_{i} \cap F_{j} & = (E_{i} \cap E_{i - 1}^{c}) \cap (E_{j} \cap E_{j - 1}^{c}) & \\ & = (E_{i} \cap E_{j}) \cap (E_{i - 1} \cup E_{j - 1})^{c} . & \text{(De Morgan's law)} \end{matrix}$ Case 1: $i < j$ . Then, $E_{i} \cap E_{j} = E_{i}$ ( $E_{i} \subseteq E_{j}$ ) and $E_{i - 1} \cup E_{j - 1} = E_{j - 1}$ ( $E_{i - 1} \subseteq E_{j - 1}$ ). Hence, $F_{i} \cap F_{j} = E_{i} ∖ E_{j - 1} = \emptyset$ . (Since $i < j$ , we have $i \leq j - 1$ , and so $E_{i} \subseteq E_{j - 1}$ .)

Case 2: $i > j$ . Then, $E_{i} \cap E_{j} = E_{j}$ ( $E_{j} \subseteq E_{i}$ ) and $E_{i - 1} \cup E_{j - 1} = E_{i - 1}$ . Hence, $F_{i} \cap F_{j} = E_{j} ∖ E_{i - 1} = \emptyset$ similarly.

$◻$

Also, we have $⋃_{i = 1}^{n} F_{i} = E_{n}$ , and $⋃_{i = 1}^{\infty} F_{i} = ⋃_{i = 1}^{\infty} E_{i} \in ℱ$ . Then, we have $\begin{matrix} \lim_{n \to \infty} ℙ (E_{n}) & = \lim_{n \to \infty} ℙ (⋃_{i = 1}^{n} F_{i}) & \text{(above)} \\ & = \lim_{n \to \infty} \sum_{i = 1}^{n} ℙ (F_{i}) & \text{(finite additivity)} \\ & = \sum_{i = 1}^{\infty} ℙ (F_{i}) & \text{(definition)} \\ & = ℙ (⋃_{i = 1}^{\infty} F_{i}) & \text{(countable additivity)} \\ & = ℙ (⋃_{i = 1}^{\infty} E_{i}) & \text{(above)} \\ & = ℙ (E) . & \end{matrix}$

$◻$
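The argument above can be traced on a concrete increasing sequence. The following sketch rests on assumptions chosen purely for illustration: the sample space is the set of positive integers with $ℙ (\{k\}) = 2^{- k}$ , and $E_{n} = \{1, 2, \dots, n\}$ , so that $F_{n} = E_{n} ∖ E_{n - 1} = \{n\}$ and $E_{n} ↑ Ω$ ; the function names are hypothetical.

```python
from fractions import Fraction

def singleton_prob(k):
    """Assumed singleton probabilities: P({k}) = 2**(-k) for k = 1, 2, ..."""
    return Fraction(1, 2 ** k)

def prob_E(n):
    """P(E_n) for E_n = {1, ..., n}, via the disjoint pieces F_i = {i}."""
    return sum((singleton_prob(i) for i in range(1, n + 1)), Fraction(0))

# The gap 1 - P(E_n) shrinks to 0, so P(E_n) increases to P(OMEGA) = 1.
steps = (1, 5, 10, 20)
gaps = [1 - prob_E(n) for n in steps]
for n, gap in zip(steps, gaps):
    print(f"n = {n:>2}: 1 - P(E_n) = {gap}")
```

Here $ℙ (E_{n}) = 1 - 2^{- n}$ , which tends to $1 = ℙ (Ω)$ as $n \to \infty$ , as the theorem predicts.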

Template:Colored corollary

Proof. Assume that $E_{n} ↓ E$ as $n \to \infty$ . Then, by De Morgan's law and a property of subsets, $E_{n}^{c} ↑ E^{c}$ . So, $\begin{matrix} \lim_{n \to \infty} ℙ (E_{n}) & = 1 - \lim_{n \to \infty} ℙ (E_{n}^{c}) & \text{(complementary event property)} \\ & = 1 - ℙ (E^{c}) & \text{(continuity from below)} \\ & = 1 - (1 - ℙ (E)) & \text{(complementary event property)} \\ & = ℙ (E) . & \end{matrix}$

$◻$

Template:Colored example Template:Colored exercise Template:Colored example Template:Colored theorem

Proof. Define $F_{1} = E_{1}, F_{2} = E_{2} ∖ E_{1}, F_{3} = E_{3} ∖ (E_{2} \cup E_{1}), \dots F_{k} = E_{k} ∖ (E_{k - 1} \cup E_{k - 2} \cup \dots \cup E_{2} \cup E_{1}), \dots .$ Claim: $F_{1}, F_{2}, F_{3}, \dots \in ℱ$ are pairwise disjoint.

Proof. We wish to prove that $F_{i} \cap F_{j} = \emptyset$ for every $i, j$ such that $i \neq j$ . $\begin{matrix} F_{i} \cap F_{j} & = (E_{i} \cap {(E_{i - 1} \cup E_{i - 2} \cup \dots \cup E_{1})}^{c}) \cap (E_{j} \cap {(E_{j - 1} \cup E_{j - 2} \cup \dots \cup E_{1})}^{c}) & \\ & = (E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{1}^{c}) \cap (E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{1}^{c}) . & \text{(De Morgan's law)} \end{matrix}$ Case 1: $i < j$ . Then, $E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{1}^{c} = E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{i}^{c} \cap \dots \cap E_{1}^{c} \subseteq E_{i}^{c}$ . So, it follows that $(E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{1}^{c}) \cap (E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{1}^{c}) \subseteq (E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{1}^{c}) \cap E_{i}^{c} = \emptyset,$ which means $F_{i} \cap F_{j} = \emptyset$ (since the only subset of $\emptyset$ is $\emptyset$ ).

Case 2: $i > j$ . Then, $E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{1}^{c} = E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{j}^{c} \cap \dots \cap E_{1}^{c} \subseteq E_{j}^{c}$ . So, it follows that $(E_{i} \cap E_{i - 1}^{c} \cap E_{i - 2}^{c} \cap \dots \cap E_{1}^{c}) \cap (E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{1}^{c}) \subseteq E_{j}^{c} \cap (E_{j} \cap E_{j - 1}^{c} \cap E_{j - 2}^{c} \cap \dots \cap E_{1}^{c}) = \emptyset,$ which means $F_{i} \cap F_{j} = \emptyset$ (since the only subset of $\emptyset$ is $\emptyset$ ).

$◻$

Furthermore, we have $F_{i} \subseteq E_{i}$ for every $i = 1, 2, \dots$ , and $⋃_{j = 1}^{\infty} F_{j} = ⋃_{j = 1}^{\infty} E_{j} \in ℱ$ . Hence, $\begin{matrix} ℙ (⋃_{i = 1}^{\infty} E_{i}) & = ℙ (⋃_{i = 1}^{\infty} F_{i}) & \\ & = \sum_{i = 1}^{\infty} ℙ (F_{i}) & \text{(countable additivity)} \\ & \leq \sum_{i = 1}^{\infty} ℙ (E_{i}) . & \text{(monotonicity)} \end{matrix}$

$◻$
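The construction $F_{k} = E_{k} ∖ (E_{k - 1} \cup \dots \cup E_{1})$ used in the proof can be carried out explicitly for finitely many events. The sketch below is only an illustration under assumptions: the sample space of twelve equally likely outcomes, the three overlapping events, and the names `prob` and `disjointify` are hypothetical.

```python
from fractions import Fraction

# Hypothetical sample space with 12 equally likely outcomes.
OMEGA = frozenset(range(1, 13))

def prob(event):
    return Fraction(len(event), len(OMEGA))

def disjointify(events):
    """Return F_1, F_2, ... where F_k = E_k minus (E_1 u ... u E_{k-1})."""
    seen = frozenset()          # union of the events processed so far
    pieces = []
    for e in events:
        pieces.append(e - seen)
        seen = seen | e
    return pieces

events = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}), frozenset({6, 7})]
pieces = disjointify(events)

union_prob = prob(frozenset.union(*events))
sum_of_pieces = sum((prob(f) for f in pieces), Fraction(0))
sum_of_events = sum((prob(e) for e in events), Fraction(0))
print(union_prob, sum_of_pieces, sum_of_events)  # 7/12 7/12 5/6
```

The pieces are $\{1, 2, 3, 4\}$ , $\{5, 6\}$ and $\{7\}$ : they are pairwise disjoint, have the same union as the original events, and each satisfies $F_{k} \subseteq E_{k}$ , which gives $7 / 12 \leq 5 / 6$ .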

Template:Colored remark Template:Colored example

[1] One may prove this by contradiction: Assume that $ℙ (\emptyset) \neq 0$ . Then, by the nonnegativity of probability, this means $ℙ (\emptyset) > 0$ . Then, $\sum_{i = 1}^{\infty} ℙ (\emptyset) = \lim_{n \to \infty} (n \underbrace{ℙ (\emptyset)}_{> 0}) = \infty$ (that is, the sum diverges). So, $\sum_{i = 1}^{\infty} ℙ (\emptyset) \neq 0$ , which is a contradiction.

[2] Graphically,

      .           .
      .         .
      .       .
*-----------*
|###########|
*--------*##|
|////////|##|
*----*///|##|  ...
|....|///|##|
|....|///|##|
*----*---*--*
  E_1
    E_2
      E_3 ...
..
.. : F_1  

//
// : F_2

##
## : F_3
