Probability/Probability Spaces
Terminologies
The name of this chapter, probability space, is a mathematical construct that models a random experiment. To be more precise: Template:Colored definition Let us first give the definitions related to the sample space. The definitions of event space and probability will be discussed in later sections. Template:Colored definition Template:Colored remark Template:Colored example Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored example
Probability interpretations
In this chapter, we will discuss probability mathematically, and we will give an axiomatic and abstract definition of probability (as a function). By an axiomatic definition, we mean defining probability to be a function that satisfies some axioms, called probability axioms. But such an axiomatic definition does not tell us how we should interpret the term "probability", so the definition is said to be independent of the interpretation of probability. This independence makes the formal definition always applicable, no matter how you interpret probability.
However, the axiomatic definition does not suggest a way to construct a probability measure (i.e., to assign probabilities to events): it just states that probability is a function satisfying certain axioms, but how can we construct such a function in the first place? In this section, we will discuss two main types of probability interpretations, subjectivism and frequentism, each of which comes with a method of assigning probabilities to events.
Subjectivism
Intuitively and naturally, the probability of an event is often regarded as a numerical measure of the "chance" that the event occurs (that is, how likely the event is to occur). So, it is natural to assign a probability to an event based on our own assessment of the "chance". (For the probability to be valid according to the axiomatic definition, the assignment needs to satisfy the probability axioms.) But different people may assess the "chance" differently, depending on their personal opinions. So, such an interpretation of probability is somewhat subjective, since different people may assign different probabilities to the same event. Hence, we call this probability interpretation subjectivism (also known as Bayesian probability). Template:Colored example The main issue with subjectivism is its lack of objectivity, since different probabilities can be assigned to the same event based on personal opinion. Then, we may have difficulty choosing which of the probabilities should be used for that event. To mitigate this lack of objectivity, we may adjust our degree of belief in an event as more data are observed, through Bayes' theorem, which will be discussed in a later chapter, so that the value is assigned in a more objective way. However, even after the adjustment, the value is still not assigned in an entirely objective way, since the adjusted value (known as the posterior probability) still depends on the initial value (known as the prior probability), which is assigned subjectively.
Frequentism
Another probability interpretation, which is objective, is called frequentism. We denote by $n(E)$ the number of occurrences of an event $E$ in $n$ repetitions of an experiment. (An experiment is any action or process with an outcome that is subject to uncertainty or randomness.) Then, we call $n(E)/n$ the relative frequency of the event $E$. Intuitively, we expect that the relative frequency fluctuates less and less as $n$ gets larger and larger, and approaches a constant limiting value (we call this the limiting relative frequency) as $n$ tends to infinity, i.e., the limiting relative frequency is $\lim_{n\to\infty} n(E)/n$. It is thus natural to take the limiting relative frequency as the probability of the event $E$. This is exactly the definition of probability in frequentism. In particular, the existence of such a limiting relative frequency is an assumption or axiom in frequentism. (As a side result, when $n$ is large enough, the relative frequency of the event $E$ may be used to approximate the probability of the event $E$.)
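The stabilising behaviour of the relative frequency can be illustrated with a small simulation. The following is a minimal sketch, assuming a fair coin as the experiment and "heads" as the event; the function names, the seed, and the choice of experiment are illustrative assumptions, not part of the text above.

```python
import random


def relative_frequency(event, experiment, n, seed=0):
    """Return n(E)/n: the fraction of n repetitions in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    occurrences = sum(1 for _ in range(n) if event(experiment(rng)))
    return occurrences / n


def toss_fair_coin(rng):
    """Hypothetical experiment: one toss of a fair coin."""
    return rng.choice(["H", "T"])


def is_heads(outcome):
    """Hypothetical event: the toss shows heads."""
    return outcome == "H"


if __name__ == "__main__":
    # The printed values should settle near 0.5 as n grows.
    for n in (10, 100, 10_000, 100_000):
        print(n, relative_frequency(is_heads, toss_fair_coin, n))
```

A simulation can only suggest that the relative frequency settles down; it does not prove that the limit exists, which frequentism takes as an assumption.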
However, an issue with frequentism is that it may be infeasible to repeat the experiment many times for some events. Hence, no probability can be assigned to those events, and this is clearly a limitation of frequentism. Template:Colored example Template:Colored example Because of these issues, we will instead use a modern axiomatic and abstract approach to define probability, which was suggested by the Russian mathematician Andrey Nikolaevich Kolmogorov in 1933. By an axiomatic approach, we mean defining probability quite broadly and abstractly as something that satisfies certain axioms (called probability axioms). These probability axioms are the mathematical foundation of modern probability theory.
Probability axioms
Since we want to use the probability measure to assign a probability to every event in the sample space, it seems natural to set the domain of the probability measure to be the set containing all subsets of $\Omega$, i.e., the power set of $\Omega$, $\mathcal{P}(\Omega)$. Unfortunately, the situation is not that simple: there are technical difficulties if we set the domain like this when the sample space $\Omega$ is uncountable. Template:Colored remark This is because the power set of such an uncountable sample space includes some "badly behaved" sets, which cause problems when assigning probabilities to them. (Here, we will not discuss those sets and these technical difficulties in detail.) Thus, instead of setting the domain of the probability measure to be $\mathcal{P}(\Omega)$, we set the domain to be a $\sigma$-algebra (sigma-algebra) containing some "sufficiently well-behaved" events: Template:Colored definition Template:Colored remark Template:Colored proposition
Proof.
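Property 1: By the closure under complementation, since $\Omega \in \mathcal{F}$, it follows that $\varnothing = \Omega^c \in \mathcal{F}$.
Property 2: By the closure under countable unions, we have for every infinite sequence of sets $E_1, E_2, \dots$, if the sets $E_1, E_2, \dots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} E_i \in \mathcal{F}$. So, in particular, we can choose the sequence to be $E_1, \dots, E_n, \varnothing, \varnothing, \dots$ ($\varnothing \in \mathcal{F}$ by property 1) where $E_1, \dots, E_n$ is an arbitrary sequence such that $E_1, \dots, E_n \in \mathcal{F}$. Then, $\bigcup_{i=1}^{n} E_i = E_1 \cup \cdots \cup E_n \cup \varnothing \cup \varnothing \cup \cdots \in \mathcal{F}$. Thus, we have the desired result.
Property 3: For every infinite sequence of sets $E_1, E_2, \dots \in \mathcal{F}$, by the closure under complementation, we have $E_1^c, E_2^c, \dots \in \mathcal{F}$. Then, by the closure under countable unions, we have $\bigcup_{i=1}^{\infty} E_i^c \in \mathcal{F}$. After that, we use De Morgan's law: $\bigcup_{i=1}^{\infty} E_i^c = \left(\bigcap_{i=1}^{\infty} E_i\right)^c$. Using the closure under complementation again, we have $\bigcap_{i=1}^{\infty} E_i \in \mathcal{F}$ as desired.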
Property 4: The proof is similar to that of property 2, and hence left as an exercise.
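On a finite sample space, the closure conditions used in this proof can be checked mechanically. The sketch below assumes the usual definition of a sigma-algebra (it contains the whole sample space and is closed under complementation and countable unions); on a finite sample space, closure under pairwise unions is enough, because any countable union of sets from a finite collection reduces to a finite union. The function name `is_sigma_algebra` is hypothetical.

```python
from itertools import combinations


def is_sigma_algebra(omega, collection):
    """Check whether `collection` is a sigma-algebra on the finite set `omega`."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # The whole sample space must belong to the collection.
    if omega not in sets:
        return False
    # Closure under complementation.
    if any(omega - s not in sets for s in sets):
        return False
    # Closure under unions (pairwise closure suffices for a finite collection).
    return all(a | b in sets for a, b in combinations(sets, 2))


if __name__ == "__main__":
    omega = {1, 2, 3, 4}
    print(is_sigma_algebra(omega, [set(), omega]))                  # smallest one
    print(is_sigma_algebra(omega, [set(), {1}, {2, 3, 4}, omega]))  # generated by {1}
    print(is_sigma_algebra(omega, [set(), {1}, omega]))             # complement of {1} is missing
```

The check does not test closure under intersections separately: as the proof shows, that follows from the other two closure properties.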
Template:Colored remark Template:Colored exercise Template:Colored example Template:Colored exercise We have seen two examples of $\sigma$-algebras in the example above. Often, the "smallest" $\sigma$-algebra is not chosen to be the domain of the probability measure, since we are usually interested in events other than $\varnothing$ and $\Omega$.
The "largest" $\sigma$-algebra, on the other hand, contains every event, but we may not be interested in some of them. In particular, we are usually interested in events that are "well-behaved", rather than "badly behaved" events, to which it may even be impossible to assign probabilities properly (those events are called non-measurable).
Fortunately, when the sample space $\Omega$ is countable, every set in $\mathcal{P}(\Omega)$ is "well-behaved", so we can take this power set to be the $\sigma$-algebra that serves as the domain of the probability measure.
However, when the sample space $\Omega$ is uncountable, even though the power set $\mathcal{P}(\Omega)$ is a $\sigma$-algebra, it contains "too many" events; in particular, it includes some "badly behaved" events. Therefore, we will not choose such a power set to be the domain of the probability measure. Instead, we choose a $\sigma$-algebra that includes the "well-behaved" events to be the domain, so that we are able to assign a probability properly to every event in it. Those "well-behaved" events are usually the events of interest, so all events of interest are contained in that $\sigma$-algebra, that is, in the domain of the probability measure.
To motivate the probability axioms, we consider some properties that the "probability" in frequentism (as a limiting relative frequency) possesses:
- The limiting relative frequency must be nonnegative. (We call this property nonnegativity.)
- The limiting relative frequency of the whole sample space $\Omega$ ($\Omega$ is also an event) must be 1 (since by definition $\Omega$ contains all sample points, this event must occur in every repetition). (We call this property unitarity.)
- If the events $E_1, E_2, \dots$ are pairwise disjoint (i.e., $E_i \cap E_j = \varnothing$ for every $i, j$ with $i \ne j$), then the limiting relative frequency of the event $\bigcup_{i=1}^{\infty} E_i$ (a union of subsets of $\Omega$ is a subset of $\Omega$, so it can be called an event) is $\lim_{n\to\infty} \frac{n\left(\bigcup_{i=1}^{\infty} E_i\right)}{n} = \lim_{n\to\infty} \frac{n(E_1) + n(E_2) + \cdots}{n} = \sum_{i=1}^{\infty} \lim_{n\to\infty} \frac{n(E_i)}{n}$, where $n(E)$ is the number of occurrences of $E$ in $n$ repetitions. This is the sum of the limiting relative frequencies of the events $E_1, E_2, \dots$. (We call this property countable additivity.)
It is thus very natural to set the probability axioms to be the three properties mentioned above: Template:Colored definition Template:Colored remark Using the probability axioms alone, we can prove many well-known properties of probability.
Basic properties of probability
Let us start the discussion with some simple properties of probability. Template:Colored theorem
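Proof. Consider the infinite sequence of events $\Omega, \varnothing, \varnothing, \dots$ (recall that $\Omega$ and $\varnothing$ must be in the $\sigma$-algebra $\mathcal{F}$). We can see that the events are pairwise disjoint. Also, the union of these events is $\Omega$. Hence, by the countable additivity of probability, we have $\mathbb{P}(\Omega) = \mathbb{P}(\Omega) + \mathbb{P}(\varnothing) + \mathbb{P}(\varnothing) + \cdots$. It can then be shown that $\mathbb{P}(\varnothing) = 0$. [1]
Using this result, we can obtain finite additivity from the countable additivity of probability: Template:Colored theorem
Proof. Let $E_1, \dots, E_n$ be pairwise disjoint events, and consider the infinite sequence of events $E_1, \dots, E_n, \varnothing, \varnothing, \dots$ (recall that $\varnothing \in \mathcal{F}$ always). These events are still pairwise disjoint. Then, $\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) = \mathbb{P}(E_1 \cup \cdots \cup E_n \cup \varnothing \cup \varnothing \cup \cdots) = \sum_{i=1}^{n} \mathbb{P}(E_i) + \sum_{i=n+1}^{\infty} \mathbb{P}(\varnothing) = \sum_{i=1}^{n} \mathbb{P}(E_i)$. (The last equality follows since $\mathbb{P}(\varnothing) = 0$, and it can be shown that $\sum_{i=n+1}^{\infty} 0 = 0$ using some concepts of limits, to be mathematically rigorous.)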
Finite additivity makes the proofs of some of the following results simpler. Template:Colored theorem
Proof.
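Property 1:
First, notice that by definition $E \cup E^c = \Omega$. Furthermore, since $E \in \mathcal{F}$, we have $E^c \in \mathcal{F}$ by the closure under complementation of the $\sigma$-algebra. Also, the sets $E$ and $E^c$ are disjoint. Thus, by finite additivity, we have $\mathbb{P}(E \cup E^c) = \mathbb{P}(E) + \mathbb{P}(E^c)$. On the other hand, $\mathbb{P}(E \cup E^c) = \mathbb{P}(\Omega) = 1$. Thus, we have the desired result.
Property 2: By property 1, we have $\mathbb{P}(E) = 1 - \mathbb{P}(E^c) \le 1$, since $\mathbb{P}(E^c) \ge 0$. We then have the desired numeric bound on $\mathbb{P}(E)$, since also $\mathbb{P}(E) \ge 0$ by the nonnegativity of probability.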
Property 3:
Property 4: By property 3, we have
Property 5: Assume that . Then, . Hence, by property 3,
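Identities of this kind can be spot-checked on a small finite probability space. The sketch below assumes one roll of a fair six-sided die with equally likely outcomes, and checks four standard consequences of the axioms: the complement rule, the bound between 0 and 1, the two-event union formula, and monotonicity. How these map onto the numbered properties above is an assumption, and all names are hypothetical.

```python
from fractions import Fraction

# Assumption for illustration: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of OMEGA) with equally likely outcomes."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


def check_basic_properties(e, f):
    """Return which of the standard identities hold for the events e and f."""
    e, f = frozenset(e), frozenset(f)
    return {
        "complement": prob(OMEGA - e) == 1 - prob(e),
        "numeric_bound": 0 <= prob(e) <= 1,
        "union_formula": prob(e | f) == prob(e) + prob(f) - prob(e & f),
        # Monotonicity only says something when e is a subset of f.
        "monotonicity": (not e <= f) or prob(e) <= prob(f),
    }


if __name__ == "__main__":
    even, at_least_four = {2, 4, 6}, {4, 5, 6}
    print(check_basic_properties(even, at_least_four))
    print(check_basic_properties({6}, at_least_four))
```

Exact `Fraction` arithmetic is used so that the equalities are tested without floating-point rounding. Passing checks on one example are of course not a proof.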
Template:Colored remark Template:Colored example Template:Colored example Template:Colored exercise Template:Colored example Template:Colored remark Template:Colored example Template:Colored exercise Template:Colored example
Constructing a probability measure
As we have said, the axiomatic definition does not suggest a way to construct a probability measure. In fact, even for the same experiment, there can be many ways to construct a probability measure that satisfies the above probability axioms if insufficient information is provided: Template:Colored example However, we have previously mentioned that we may assign probabilities to events subjectively (as in subjectivism), or according to their limiting relative frequencies (as in frequentism). Through these two probability interpretations, we may provide some background information for a random experiment, by assigning probabilities to some of the events before constructing the probability measure, to the extent that there is exactly one way to construct a probability measure. Consider the coin tossing example again: Template:Colored example In general, the background information need not assign a probability to every event in the event space for us to be able to construct the probability measure in exactly one way. Consider the following example. Template:Colored example We can see from this example that, to provide sufficient background information so that the probability measure can be constructed in exactly one way, we just need the probability of each of the singleton events (which should be nonnegative and sum to one to satisfy the probability axioms). After that, we can calculate the probability of each of the other events in the event space, and hence construct the only possible probability measure.
This is true when the sample space is countable, in general: Template:Colored theorem
Proof.
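Case 1: $\Omega$ is finite. Then, we can write $\Omega = \{\omega_1, \dots, \omega_n\}$. It follows that every event $E$ can be expressed as $E = \bigcup_{i : \omega_i \in E} \{\omega_i\}$ (which $i$'s the union is taken over depends on the event $E$). Notice also that the sets $\{\omega_i\}$ are disjoint. (Every set contains a different sample point, and so the intersection of any pair of them is the empty set.) Then, by the finite additivity of probability, we have for every event $E$, $\mathbb{P}(E) = \sum_{i : \omega_i \in E} \mathbb{P}(\{\omega_i\})$.
Case 2: $\Omega$ is countably infinite. Then, we can write $\Omega = \{\omega_1, \omega_2, \dots\}$. It follows that every event $E$ can be expressed as $E = \bigcup_{i : \omega_i \in E} \{\omega_i\}$ (which $i$'s the union is taken over depends on the event $E$). Notice also that the sets $\{\omega_i\}$ are disjoint. Then, by the countable additivity of probability (if $E$ is infinite) or the finite additivity of probability (if $E$ is finite), we have for every event $E$, $\mathbb{P}(E) = \sum_{i : \omega_i \in E} \mathbb{P}(\{\omega_i\})$.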
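For a finite sample space, the construction in this proof translates directly into code: once the probabilities of the singleton events are given, the probability of any event is the sum over its sample points. The following is a minimal sketch; the function name `make_probability_measure` and the loaded-die numbers are illustrative assumptions.

```python
from fractions import Fraction


def make_probability_measure(singleton_probs):
    """Build P from a dict mapping each sample point to P({sample point}).

    Exact numbers (Fraction or int) are assumed; with floats, the check that
    the values sum to one may fail because of rounding.
    """
    if any(p < 0 for p in singleton_probs.values()):
        raise ValueError("singleton probabilities must be nonnegative")
    if sum(singleton_probs.values()) != 1:
        raise ValueError("singleton probabilities must sum to one")

    def prob(event):
        # Additivity over the disjoint singleton events that make up the event.
        return sum((singleton_probs[w] for w in set(event)), Fraction(0))

    return prob


if __name__ == "__main__":
    # Hypothetical loaded die: 6 comes up half the time, the rest share the other half.
    weights = {face: Fraction(1, 10) for face in range(1, 6)}
    weights[6] = Fraction(1, 2)
    prob = make_probability_measure(weights)
    print(prob({2, 4, 6}))       # probability of an even face
    print(prob(range(1, 7)))     # probability of the whole sample space
    print(prob(set()))           # probability of the empty event
```

An event containing a point outside the sample space raises a `KeyError` here, which seems a reasonable way to signal that the argument is not an event of this experiment.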
Template:Colored example The following is an important special case for the above theorem. Template:Colored corollary
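Proof. Under the assumptions, the probability of every singleton event is $1/|\Omega|$, which is nonnegative. Also, the sum of these probabilities is $\sum_{\omega \in \Omega} \frac{1}{|\Omega|} = 1$. Thus, for every event $E$, we have by the previous theorem $\mathbb{P}(E) = \sum_{\omega \in E} \frac{1}{|\Omega|} = \frac{|E|}{|\Omega|}$.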
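When all outcomes of a finite sample space are equally likely, computing a probability reduces to counting. The sketch below assumes that this is the setting of the corollary and uses two fair dice as a hypothetical example; the function name `classical_probability` is illustrative, and events are passed as predicates on outcomes.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """Return |E| / |Omega| for a finite sample space with equally likely outcomes.

    `event` is a predicate: it returns True for the outcomes belonging to E.
    """
    omega = set(sample_space)
    favourable = {w for w in omega if event(w)}
    return Fraction(len(favourable), len(omega))


if __name__ == "__main__":
    # Hypothetical experiment: roll two fair dice; outcomes are ordered pairs.
    two_dice = list(product(range(1, 7), repeat=2))
    print(classical_probability(two_dice, lambda w: w[0] + w[1] == 7))
    print(classical_probability(two_dice, lambda w: w[0] == w[1]))
```

The counting formula is only valid under the equal-likelihood assumption; for a loaded die, the singleton probabilities would have to be summed individually instead.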
Template:Colored remark Template:Colored example Template:Colored example Template:Colored exercise Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example Template:Colored example
More advanced properties of probability
Recall the inclusion-exclusion principle in combinatorics. We have similar results for probability: Template:Colored theorem
Proof. We can prove this by mathematical induction.
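Let $P(n)$ be the statement $\mathbb{P}(E_1 \cup \cdots \cup E_n) = \sum_{i} \mathbb{P}(E_i) - \sum_{i<j} \mathbb{P}(E_i \cap E_j) + \sum_{i<j<l} \mathbb{P}(E_i \cap E_j \cap E_l) - \cdots + (-1)^{n+1} \mathbb{P}(E_1 \cap \cdots \cap E_n)$. We wish to prove that $P(n)$ is true for every positive integer $n$.
Basis Step: When $n = 1$, $P(1)$ is clearly true since it merely states that $\mathbb{P}(E_1) = \mathbb{P}(E_1)$.
Inductive Hypothesis: Assume that $P(k)$ is true for an arbitrary positive integer $k$.
Inductive Step:
Case 1: $k = 1$. Then, $P(2)$ is true by a property of probability (recall that we have "$\mathbb{P}(E_1 \cup E_2) = \mathbb{P}(E_1) + \mathbb{P}(E_2) - \mathbb{P}(E_1 \cap E_2)$").
Case 2: $k \ge 2$. We wish to prove that $P(k+1)$ is true. The main idea is to regard $E_1 \cup \cdots \cup E_{k+1}$ as $(E_1 \cup \cdots \cup E_k) \cup E_{k+1}$, and then apply the above property of probability to get $\mathbb{P}\left(\bigcup_{i=1}^{k+1} E_i\right) = \mathbb{P}\left(\bigcup_{i=1}^{k} E_i\right) + \mathbb{P}(E_{k+1}) - \mathbb{P}\left(\bigcup_{i=1}^{k} (E_i \cap E_{k+1})\right)$. We then apply the inductive hypothesis twice, to the two probabilities involving a union of $k$ events. Ultimately, through some (somewhat complicated) algebraic manipulations, we get the desired result. So, $P(k+1)$ is true.
Hence, by the principle of mathematical induction, $P(n)$ is true for every positive integer $n$.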
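The alternating sum in the inclusion-exclusion formula can be evaluated by looping over all nonempty groups of events. The following is a minimal sketch on a finite sample space with equally likely outcomes (an assumption made so that probabilities can be computed by counting); the function names and the example with multiples of 2, 3 and 5 are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, omega):
    """Probability of an event under equally likely outcomes on omega."""
    return Fraction(len(event & omega), len(omega))


def union_probability(events, omega):
    """Compute P(E_1 u ... u E_n) by the inclusion-exclusion formula."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized groups, subtract even-sized ones
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group), omega)
    return total


if __name__ == "__main__":
    # Hypothetical example: draw a number from 1 to 30 uniformly at random.
    omega = frozenset(range(1, 31))
    events = [frozenset(x for x in omega if x % d == 0) for d in (2, 3, 5)]
    print(union_probability(events, omega))             # via inclusion-exclusion
    print(prob(frozenset().union(*events), omega))      # via the union directly
```

The loop visits every nonempty subset of the events, so its cost grows like 2^n; it is meant as a check of the formula, not as an efficient way to compute the probability of a union.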
Template:Colored remark Template:Colored example Template:Colored example The following is a classical example for demonstrating the application of inclusion-exclusion principle. Template:Colored example Template:Colored theorem
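Proof. Assume that $E_1 \subseteq E_2 \subseteq \cdots$. Now, define $F_1 = E_1$ and $F_i = E_i \setminus E_{i-1}$ for every $i \ge 2$. [2] Claim: $F_1, F_2, \dots$ are pairwise disjoint.
Proof. We wish to prove that $F_i \cap F_j = \varnothing$ for every $i, j$ such that $i \ne j$. Case 1: $i < j$. Then, $F_i \subseteq E_i$ (by the definition of $F_i$) and $F_j \subseteq E_{j-1}^c$ (since $F_j = E_j \setminus E_{j-1}$). Hence, $F_i \cap F_j \subseteq E_i \cap E_{j-1}^c = \varnothing$. (Since $i < j$, $E_i \subseteq E_{j-1}$ ($i$ can be at most $j - 1$).)
Case 2: $i > j$. Then, $F_j \subseteq E_j$ (by the definition of $F_j$) and $F_i \subseteq E_{i-1}^c$. Hence, $F_i \cap F_j = \varnothing$ similarly.
Also, we have $\bigcup_{i=1}^{n} F_i = E_n$, and $\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i$. Then, we have $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \mathbb{P}\left(\bigcup_{i=1}^{\infty} F_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(F_i) = \lim_{n\to\infty} \sum_{i=1}^{n} \mathbb{P}(F_i) = \lim_{n\to\infty} \mathbb{P}\left(\bigcup_{i=1}^{n} F_i\right) = \lim_{n\to\infty} \mathbb{P}(E_n)$.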
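Proof. Assume that $E_1 \supseteq E_2 \supseteq \cdots$. Then, by a property of subsets, $E_1^c \subseteq E_2^c \subseteq \cdots$, and by De Morgan's law, $\left(\bigcap_{i=1}^{\infty} E_i\right)^c = \bigcup_{i=1}^{\infty} E_i^c$. So, $\mathbb{P}\left(\bigcap_{i=1}^{\infty} E_i\right) = 1 - \mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i^c\right) = 1 - \lim_{n\to\infty} \mathbb{P}(E_n^c) = \lim_{n\to\infty} \left(1 - \mathbb{P}(E_n^c)\right) = \lim_{n\to\infty} \mathbb{P}(E_n)$.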
Template:Colored example Template:Colored exercise Template:Colored example Template:Colored theorem
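Proof. Define $F_1 = E_1$ and $F_i = E_i \setminus (E_1 \cup \cdots \cup E_{i-1})$ for every $i \ge 2$. Claim: $F_1, F_2, \dots$ are pairwise disjoint.
Proof. We wish to prove that $F_i \cap F_j = \varnothing$ for every $i, j$ such that $i \ne j$. Case 1: $i < j$. Then, $F_i \subseteq E_i \subseteq E_1 \cup \cdots \cup E_{j-1}$. So, it follows that $F_i \cap F_j \subseteq (E_1 \cup \cdots \cup E_{j-1}) \cap F_j = \varnothing$, which means $F_i \cap F_j = \varnothing$ (since the only subset of $\varnothing$ is $\varnothing$).
Case 2: $i > j$. Then, $F_j \subseteq E_j \subseteq E_1 \cup \cdots \cup E_{i-1}$. So, it follows that $F_i \cap F_j \subseteq F_i \cap (E_1 \cup \cdots \cup E_{i-1}) = \varnothing$, which means $F_i \cap F_j = \varnothing$ (since the only subset of $\varnothing$ is $\varnothing$).
Furthermore, we have $F_i \subseteq E_i$ for every $i$, and $\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i$. Hence, by countable additivity and monotonicity, $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \mathbb{P}\left(\bigcup_{i=1}^{\infty} F_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(F_i) \le \sum_{i=1}^{\infty} \mathbb{P}(E_i)$.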
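The "disjointification" step used in this proof (removing from each event everything already covered by the earlier ones) is easy to carry out on finite sets. The sketch below assumes a finite list of events on a finite sample space with equally likely outcomes, and checks that the pieces are pairwise disjoint, have the same union, and yield the resulting upper bound on the probability of the union. The function names and the example are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def disjointify(events):
    """Return F_1, F_2, ... where F_i is E_i minus the union of the earlier events."""
    seen = frozenset()
    pieces = []
    for e in events:
        e = frozenset(e)
        pieces.append(e - seen)  # keep only the sample points not covered so far
        seen = seen | e
    return pieces


def prob(event, omega):
    """Probability of an event under equally likely outcomes on omega."""
    return Fraction(len(event & omega), len(omega))


if __name__ == "__main__":
    # Hypothetical example: draw a number from 1 to 30 uniformly at random.
    omega = frozenset(range(1, 31))
    events = [frozenset(x for x in omega if x % d == 0) for d in (2, 3, 5)]
    pieces = disjointify(events)
    union = frozenset().union(*events)
    print(all(not (a & b) for a, b in combinations(pieces, 2)))  # pairwise disjoint
    print(frozenset().union(*pieces) == union)                   # same union
    print(prob(union, omega), "<=", sum(prob(e, omega) for e in events))
```

In this example the sum of the individual probabilities exceeds 1, which illustrates that the bound can be loose when the events overlap heavily.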
Template:Colored remark Template:Colored example
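- [1] One may prove this by contradiction: Assume that $\mathbb{P}(\varnothing) \ne 0$. Then, by the nonnegativity of probability, this means $\mathbb{P}(\varnothing) > 0$. Then, $\mathbb{P}(\varnothing) + \mathbb{P}(\varnothing) + \cdots = \infty$ (that is, the sum diverges). So, $\mathbb{P}(\Omega) = \mathbb{P}(\Omega) + \infty = \infty$, which contradicts $\mathbb{P}(\Omega) = 1$.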
- [2] Graphically,

                 . . .
                   . . .
    *-----------*
    |###########|
    *--------*##|
    |////////|##|
    *----*///|##|  ...
    |....|///|##|
    |....|///|##|
    *----*---*--*
     E_1  E_2 E_3  ...

    ....  : F_1
    ////  : F_2
    ####  : F_3