Probability/Important Distributions
Distributions of a discrete random variable
Preliminary concept: Bernoulli trial
A Bernoulli trial is an experiment with only two possible outcomes, called 'success' ($S$) and 'failure' ($F$). For a sequence of Bernoulli trials, let $S_i$ denote the event $\{i\text{th trial is a success}\}$.
Binomial distribution
Motivation
Consider $n$ independent Bernoulli trials with the same success probability $p$. We would like to calculate the probability $P(\{x \text{ successes in } n \text{ trials}\})$.
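Let $S_i$ be the event $\{i\text{th Bernoulli trial is a success}\}$, as in the previous section. Let's consider a particular sequence of outcomes such that there are $x$ successes in $n$ trials: $$\underbrace{S\cdots S}_{x}\,\underbrace{F\cdots F}_{n-x}.$$ Its probability is $$P(S_1\cap\dots\cap S_x\cap S_{x+1}^c\cap\dots\cap S_n^c)\overset{\text{indpt.}}{=}P(S_1)\cdots P(S_x)\,P(S_{x+1}^c)\cdots P(S_n^c)=p^x(1-p)^{n-x}$$ [1]. Since every other sequence with $x$ successes occurring in other trials has the same probability, and there are $\binom{n}{x}$ distinct possible sequences [2], $$P(\{x \text{ successes in } n \text{ trials}\})=\binom{n}{x}p^x(1-p)^{n-x}.$$ This is the pmf of a random variable following the binomial distribution.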
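As a quick numerical check of the counting argument above, here is a minimal Python sketch. The function name `binomial_pmf` and the values $n=10$, $p=0.3$ are illustrative choices, not part of the original text. It evaluates $\binom{n}{x}p^x(1-p)^{n-x}$ and sums the values over $x=0,\dots,n$, which should give 1 up to rounding.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x successes in n independent Bernoulli trials with success probability p)."""
    # comb(n, x) counts the sequences with exactly x successes;
    # each such sequence has probability p**x * (1 - p)**(n - x).
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # illustrative values
pmf_values = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf_values)  # should equal 1 up to floating-point rounding
print(pmf_values[3], total)
```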
Definition
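Definition. A random variable $X$ follows the binomial distribution with $n$ independent Bernoulli trials and success probability $p$, denoted by $X\sim\operatorname{Binom}(n,p)$, if its pmf is $$f(x)=\binom{n}{x}p^x(1-p)^{n-x},\quad x\in\{0,1,\dots,n\}.$$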
Bernoulli distribution
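The Bernoulli distribution is simply the special case of the binomial distribution with a single trial ($n=1$): a random variable $X$ follows the Bernoulli distribution with success probability $p$ if its pmf is $f(x)=p^x(1-p)^{1-x}$, $x\in\{0,1\}$.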
Poisson distribution
Motivation
The Poisson distribution can be viewed as the 'limit case' for the binomial distribution.
Consider $n$ independent Bernoulli trials with success probability $p$. By the binomial distribution, $$P(\{x \text{ successes in } n \text{ trials}\})=\binom{n}{x}p^x(1-p)^{n-x}.$$
After that, consider a unit time interval, with a (positive) occurrence rate $\lambda$ of a rare event (i.e. the mean number of occurrences of the rare event in the interval is $\lambda$). We can divide the unit time interval into $n$ time subintervals of length $1/n$ each. If $n$ is large and $\lambda/n$ is small, such that the probability of two or more occurrences of the rare event in a single subinterval is negligible, then the probability of exactly one occurrence in each subinterval is $\lambda/n$ by definition of mean. Then, we can view the unit time interval as a sequence of $n$ Bernoulli trials [3] with success probability $p=\lambda/n$. After that, we can use $\operatorname{Binom}(n,\lambda/n)$ to model the number of occurrences of the rare event. To be more precise, for each fixed $x$, $$\binom{n}{x}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}=\frac{\lambda^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}\to\frac{\lambda^x}{x!}\cdot 1\cdot e^{-\lambda}\cdot 1=\frac{e^{-\lambda}\lambda^x}{x!}$$ as $n\to\infty$. This is the pmf of a random variable following the Poisson distribution, and this result is known as the Poisson limit theorem (or law of rare events). We will introduce it formally after introducing the definition of the Poisson distribution.
Definition
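Definition. A random variable $X$ follows the Poisson distribution with positive rate parameter $\lambda$, denoted by $X\sim\operatorname{Pois}(\lambda)$, if its pmf is $$f(x)=\frac{e^{-\lambda}\lambda^x}{x!},\quad x\in\{0,1,2,\dots\}.$$

Theorem (Poisson limit theorem). For each fixed $x\in\{0,1,2,\dots\}$, the pmf of $\operatorname{Binom}(n,\lambda/n)$ at $x$ approaches the pmf of $\operatorname{Pois}(\lambda)$ at $x$ as $n\to\infty$.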
Proof. The result follows from the result proved above: the pmf of $\operatorname{Binom}(n,\lambda/n)$ approaches the pmf of $\operatorname{Pois}(\lambda)$ as $n\to\infty$.
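The limit can also be observed numerically. The following Python sketch compares the pmf of the binomial distribution with $n$ trials and success probability $\lambda/n$ against $e^{-\lambda}\lambda^x/x!$ for growing $n$. The helper names (`binomial_pmf`, `poisson_pmf`, `max_pmf_gap`), the choice $\lambda=2$, and the cutoff `x_max=20` are assumptions made for illustration only.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """pmf of the binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """pmf of the Poisson distribution with rate lam."""
    return exp(-lam) * lam**x / factorial(x)


def max_pmf_gap(n: int, lam: float, x_max: int = 20) -> float:
    """Largest absolute pmf difference over x = 0, ..., min(x_max, n)."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam))
        for x in range(min(x_max, n) + 1)
    )


# The gap should shrink as n grows, in line with the limit theorem.
for n in (10, 100, 1000, 10000):
    print(n, max_pmf_gap(n, lam=2.0))
```

This only checks finitely many values of $x$ and $n$, so it illustrates the theorem rather than proving it.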
Geometric distribution
Motivation
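Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $P(\{x \text{ failures before first success}\})$. By considering this sequence of outcomes: $$\underbrace{F\cdots F}_{x}\,S,$$ we can calculate that $$P(\{x \text{ failures before first success}\})=(1-p)^x p,\quad x\in\{0,1,2,\dots\}$$ [4]. This is the pmf of a random variable following the geometric distribution.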
Definition
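Definition. A random variable $X$ follows the geometric distribution with success probability $p$, denoted by $X\sim\operatorname{Geo}(p)$, if its pmf is $$f(x)=(1-p)^x p,\quad x\in\{0,1,2,\dots\}.$$

Proposition (memorylessness of the geometric distribution). If $X\sim\operatorname{Geo}(p)$, then $$P(X>m+n\mid X\ge m)=P(X>n)$$ for all nonnegative integers $m$ and $n$.

Proof.
- For each nonnegative integer $n$, $P(X>n)=\sum_{x=n+1}^{\infty}(1-p)^x p=(1-p)^{n+1}$, and so $P(X\ge m)=(1-p)^m$. Hence, $$P(X>m+n\mid X\ge m)=\frac{P(\{X>m+n\}\cap\{X\ge m\})}{P(X\ge m)}=\frac{P(X>m+n)}{P(X\ge m)}=\frac{(1-p)^{m+n+1}}{(1-p)^m}=(1-p)^{n+1}=P(X>n).$$
- In particular, $\{X>m+n\}\cap\{X\ge m\}=\{X>m+n\}$ since $m+n\ge m$.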
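The memorylessness property can be checked numerically. The sketch below assumes the convention that the geometric random variable $X$ counts the failures before the first success, so that its pmf is $(1-p)^x p$ for $x=0,1,2,\dots$. The names `geometric_pmf` and `tail`, the truncation at 2000 terms, and the values $p=0.3$, $m=4$, $n=6$ are hypothetical choices for illustration.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(x failures before the first success), assuming success probability p."""
    return (1 - p) ** x * p


def tail(n: int, p: float, terms: int = 2000) -> float:
    """P(X > n), approximated by summing the pmf over x = n+1, ..., n+terms."""
    return sum(geometric_pmf(x, p) for x in range(n + 1, n + 1 + terms))


p, m, n = 0.3, 4, 6  # illustrative values
# P(X > m+n | X >= m) = P(X > m+n) / P(X >= m), and P(X >= m) = P(X > m-1).
lhs = tail(m + n, p) / tail(m - 1, p)
rhs = tail(n, p)  # P(X > n)
print(lhs, rhs)  # the two values should agree up to rounding
```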
Negative binomial distribution
Motivation
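Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $P(\{x \text{ failures before } k\text{th success}\})$. By considering this sequence of outcomes: $$\underbrace{F\cdots F}_{x}\,\underbrace{S\cdots S}_{k-1}\,S,$$ we can calculate that its probability is $(1-p)^x p^k$. Since every other sequence, with some of the failures occurring in other trials (and some of the successes, excluding the $k$th success, which must occur in the last trial, occurring in other trials), has the same probability, and there are $\binom{x+k-1}{x}$ (or $\binom{x+k-1}{k-1}$, which is the same numerically) distinct possible sequences [5], $$P(\{x \text{ failures before } k\text{th success}\})=\binom{x+k-1}{x}(1-p)^x p^k,\quad x\in\{0,1,2,\dots\}.$$ This is the pmf of a random variable following the negative binomial distribution.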
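A short Python sketch of this pmf follows, assuming that the random variable counts the failures before the $k$th success, i.e. that the pmf is $\binom{x+k-1}{x}(1-p)^x p^k$. The function name `negative_binomial_pmf`, the values $k=3$, $p=0.4$, and the truncation of the infinite sum at 400 terms are illustrative assumptions.

```python
from math import comb


def negative_binomial_pmf(x: int, k: int, p: float) -> float:
    """P(x failures before the k-th success), success probability p."""
    # comb(x + k - 1, x) chooses which of the first x + k - 1 trials are failures;
    # the last trial is always the k-th success.
    return comb(x + k - 1, x) * (1 - p) ** x * p**k


k, p = 3, 0.4  # illustrative values
# The two binomial coefficients mentioned in the text agree numerically.
same_count = comb(5 + k - 1, 5) == comb(5 + k - 1, k - 1)
# Truncated sum over x; the neglected tail is tiny for these parameters.
total = sum(negative_binomial_pmf(x, k, p) for x in range(400))
print(same_count, total)
```

With $k=1$ the coefficient is 1 and the expression reduces to $(1-p)^x p$, the probability of $x$ failures before the first success.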
Definition
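Definition. A random variable $X$ follows the negative binomial distribution with success probability $p$ and number of successes $k$, denoted by $X\sim\operatorname{NB}(k,p)$, if its pmf is $$f(x)=\binom{x+k-1}{x}(1-p)^x p^k,\quad x\in\{0,1,2,\dots\}.$$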
Hypergeometric distribution
Motivation
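Consider a sample of size $n$ drawn without replacement from a population of size $N$, containing $K$ objects of type 1 and $N-K$ of another type. Then, the probability $$P(\{x \text{ type 1 objects are found when } n \text{ objects are drawn from } N \text{ objects}\})=\frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}},\quad x\in\{\max\{0,n-N+K\},\dots,\min\{n,K\}\}$$ [6].
- $\binom{K}{x}$: unordered selection of $x$ objects of type 1 from $K$ (distinguishable) objects of type 1 without replacement;
- $\binom{N-K}{n-x}$: unordered selection of $n-x$ objects of another type from $N-K$ (distinguishable) objects of another type without replacement;
- $\binom{N}{n}$: unordered selection of $n$ objects from $N$ (distinguishable) objects without replacement.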
This is the pmf of a random variable following the hypergeometric distribution.
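The three counts above combine into a pmf that is easy to evaluate directly. The following Python sketch assumes the symbols $N$ (population size), $K$ (number of type 1 objects) and $n$ (sample size), which are labels chosen here for illustration. The function name `hypergeometric_pmf` and the card-deck numbers are also hypothetical.

```python
from math import comb


def hypergeometric_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(x type 1 objects in a sample of n drawn without replacement from N,
    of which K are of type 1)."""
    # Outside this range a binomial coefficient would be undefined, so the
    # probability is taken to be zero.
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Illustration: number of hearts in a 5-card hand from a 52-card deck.
N, K, n = 52, 13, 5
pmf_values = [hypergeometric_pmf(x, N, K, n) for x in range(n + 1)]
total = sum(pmf_values)  # should equal 1 up to rounding
print(pmf_values, total)
```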
Definition
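Definition. A random variable $X$ follows the hypergeometric distribution with population size $N$, $K$ objects of type 1, and sample size $n$, denoted by $X\sim\operatorname{HypGeo}(N,K,n)$, if its pmf is $$f(x)=\frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}},\quad x\in\{\max\{0,n-N+K\},\dots,\min\{n,K\}\}.$$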
Finite discrete distribution
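This type of distribution is a generalization of all discrete distributions with finite support, e.g. the Bernoulli distribution and the hypergeometric distribution.
Another special case of this type of distribution is the discrete uniform distribution, which is similar to the continuous uniform distribution (to be discussed later).

Definition. A random variable $X$ follows a finite discrete distribution if its support is a finite set $\{x_1,\dots,x_n\}$ and its pmf is $f(x_i)=p_i$, $i=1,\dots,n$, where each $p_i>0$ and $p_1+\dots+p_n=1$.

Definition. A random variable $X$ follows the discrete uniform distribution on $\{x_1,\dots,x_n\}$ if its pmf is $f(x_i)=\frac{1}{n}$, $i=1,\dots,n$.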
Distributions of a continuous random variable
Uniform distribution (continuous)
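The continuous uniform distribution is a model for 'no preference', i.e. all intervals of the same length on its support are equally likely [7] (this can be seen from the pdf of the continuous uniform distribution). There is also a discrete uniform distribution, but it is less important than the continuous uniform distribution. So, from now on, simply 'uniform distribution' refers to the continuous one, instead of the discrete one.

Definition. A random variable $X$ follows the uniform distribution on $[a,b]$ (with $a<b$), denoted by $X\sim\mathcal U[a,b]$, if its pdf is $$f(x)=\frac{1}{b-a},\quad a\le x\le b,$$ and $f(x)=0$ otherwise.

Proposition (cdf of the uniform distribution). The cdf of $\mathcal U[a,b]$ is $$F(x)=\begin{cases}0,&x<a;\\ \dfrac{x-a}{b-a},&a\le x\le b;\\ 1,&x>b.\end{cases}$$

Proof. For $a\le x\le b$, $$F(x)=\int_{-\infty}^{x}f(y)\,dy=\int_a^x\frac{1}{b-a}\,dy=\frac{x-a}{b-a}.$$ Also, $F(x)=0$ for $x<a$ and $F(x)=1$ for $x>b$, since the pdf vanishes outside $[a,b]$. Then, the result follows.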
Exponential distribution
The exponential distribution with rate parameter $\lambda$ is often used to describe the interarrival time of rare events with rate $\lambda$.
Comparing this with the Poisson distribution, the exponential distribution describes the interarrival time of rare events, while the Poisson distribution describes the number of occurrences of rare events within a fixed time interval.
By definition of rate, when the rate $\uparrow$, the interarrival time $\downarrow$ (i.e. the frequency of the rare event $\uparrow$).
So, we would like the pdf to be more skewed to the left when $\lambda\uparrow$ (i.e. the pdf has higher values for small $x$ when $\lambda\uparrow$), so that the areas under the pdf for intervals involving small values of $x$ $\uparrow$ when $\lambda\uparrow$.
Also, with a fixed rate $\lambda$, the interarrival time should be less likely to take higher values. So, intuitively, we would also like the pdf to be a strictly decreasing function, so that the probability involved (the area under the pdf for an interval of fixed length) $\downarrow$ when $x\uparrow$.
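As we can see, the pdf of the exponential distribution satisfies both of these properties.

Definition. A random variable $X$ follows the exponential distribution with positive rate parameter $\lambda$, denoted by $X\sim\operatorname{Exp}(\lambda)$, if its pdf is $$f(x)=\lambda e^{-\lambda x},\quad x\ge 0,$$ and $f(x)=0$ otherwise.

Proposition (cdf of the exponential distribution). The cdf of $\operatorname{Exp}(\lambda)$ is $F(x)=1-e^{-\lambda x}$ for $x\ge 0$, and $F(x)=0$ for $x<0$.

Proof. Suppose $X\sim\operatorname{Exp}(\lambda)$. The cdf of $X$ is $$F(x)=\int_0^x\lambda e^{-\lambda y}\,dy=\left[-e^{-\lambda y}\right]_0^x=1-e^{-\lambda x}$$ for $x\ge 0$, and $F(x)=0$ for $x<0$ since the pdf vanishes there.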
Gamma distribution
The gamma distribution is a generalized exponential distribution, in the sense that we can also change the shape of the pdf of the exponential distribution.
Beta distribution
The beta distribution is a generalized $\mathcal U[0,1]$, in the sense that we can also change the shape of the pdf, using two shape parameters.
Cauchy distribution
The Cauchy distribution is a heavy-tailed distribution [8]. As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties, e.g. undefined mean and variance, even though its mean and variance seem to be defined when we look at its graph directly.
Normal distribution (very important)
The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often follow normal distributions approximately, by the central limit theorem. As a result, the normal distribution is important in statistics.
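Definition. A random variable $X$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$, denoted by $X\sim\mathcal N(\mu,\sigma^2)$, if its pdf is $$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),\quad x\in\mathbb R.$$

Proposition (linear transformation of a normal random variable). If $X\sim\mathcal N(\mu,\sigma^2)$, then $aX+b\sim\mathcal N(a\mu+b,a^2\sigma^2)$ for all constants $a\ne 0$ and $b$.

Proof. Assume $a>0$ [9]. Let $Y=aX+b$, and let $F_X$ and $F_Y$ be the cdf of $X$ and $Y$ respectively. Since $$F_Y(y)=P(aX+b\le y)=P\left(X\le\frac{y-b}{a}\right)=F_X\left(\frac{y-b}{a}\right),$$ by differentiation, $$f_Y(y)=\frac{1}{a}f_X\left(\frac{y-b}{a}\right)=\frac{1}{a\sigma\sqrt{2\pi}}\exp\left(-\frac{\left(\frac{y-b}{a}-\mu\right)^2}{2\sigma^2}\right)=\frac{1}{\sqrt{2\pi a^2\sigma^2}}\exp\left(-\frac{\left(y-(a\mu+b)\right)^2}{2a^2\sigma^2}\right),$$ which is the pdf of $\mathcal N(a\mu+b,a^2\sigma^2)$.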
Important distributions for statistics especially
The following distributions are important in statistics especially, and they are all related to normal distribution. We will introduce them briefly.
Chi-squared distribution
The chi-squared distribution is a special case of the gamma distribution, and is also related to the normal distribution.
Student's t-distribution
The Student's $t$-distribution is related to the chi-squared distribution and the normal distribution.
F-distribution
The $F$-distribution is sort of a generalized Student's $t$-distribution, in the sense that it has one more changeable parameter for another degrees of freedom. If you are interested in knowing how the chi-squared distribution, the Student's $t$-distribution, and the $F$-distribution are useful in statistics, then you may briefly look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).
Joint distributions
Multinomial distribution
Motivation
The multinomial distribution is a generalized binomial distribution, in the sense that each trial can have more than two outcomes.
Suppose $n$ objects are to be allocated to $k$ cells independently, where each object is allocated to one and only one cell, with probability $p_i$ of being allocated to the $i$th cell ($i=1,2,\dots,k$) [10]. Let $X_i$ be the number of objects allocated to cell $i$. We would like to calculate the probability $P(X_1=x_1,\dots,X_k=x_k)$, i.e. the probability that the $i$th cell has $x_i$ objects for each $i$ (with $x_1+\dots+x_k=n$).
We can regard each allocation as an independent trial with $k$ outcomes (since an object can be allocated to one and only one of the $k$ cells). We can recognize that the allocation of the $n$ objects is a partition of $n$ objects into $k$ groups. There are hence $\binom{n}{x_1,\dots,x_k}=\frac{n!}{x_1!\cdots x_k!}$ ways of allocation.
So, $$P(X_1=x_1,\dots,X_k=x_k)=\binom{n}{x_1,\dots,x_k}p_1^{x_1}\cdots p_k^{x_k}.$$ In particular, the probability of allocating $x_i$ particular objects to the $i$th cell is $p_i^{x_i}$ by independence, and so that of a particular case of allocation of $n$ objects to $k$ cells is $p_1^{x_1}\cdots p_k^{x_k}$ by independence.
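The allocation argument above can be turned into a small Python sketch. It assumes the pmf has the form $\frac{n!}{x_1!\cdots x_k!}\,p_1^{x_1}\cdots p_k^{x_k}$, where $x_i$ is the number of objects in cell $i$ and $p_i$ is the probability of cell $i$. The function name `multinomial_pmf` and the example numbers are illustrative assumptions.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs) -> float:
    """P(X_1 = counts[0], ..., X_k = counts[-1]) for n = sum(counts) objects."""
    n = sum(counts)
    # Number of ways to partition n objects into groups of the given sizes:
    # n! / (x_1! * ... * x_k!). Each integer division below is exact.
    ways = factorial(n)
    for x in counts:
        ways //= factorial(x)
    # Probability of one particular allocation, by independence.
    one_allocation = prod(p**x for p, x in zip(probs, counts))
    return ways * one_allocation


# Illustration: 4 objects, 3 cells with probabilities 0.5, 0.3, 0.2.
value = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))
print(value)
```

With two cells the expression reduces to the binomial pmf, which gives a simple sanity check.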
Definition
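Definition. A random vector $(X_1,\dots,X_k)$ follows the multinomial distribution with $n$ trials and probability vector $(p_1,\dots,p_k)$ if its joint pmf is $$f(x_1,\dots,x_k)=\binom{n}{x_1,\dots,x_k}p_1^{x_1}\cdots p_k^{x_k},\quad x_1,\dots,x_k\ge 0,\ x_1+\dots+x_k=n.$$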
Multivariate normal distribution
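The multivariate normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the (univariate) normal distribution.

Definition. A random vector $\mathbf X=(X_1,\dots,X_k)^T$ follows the $k$-dimensional normal distribution with mean vector $\boldsymbol\mu$ and positive definite covariance matrix $\boldsymbol\Sigma$, denoted by $\mathbf X\sim\mathcal N_k(\boldsymbol\mu,\boldsymbol\Sigma)$, if its joint pdf is $$f(\mathbf x)=\frac{\exp\left(-\frac12(\mathbf x-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right)}{\sqrt{(2\pi)^k\det\boldsymbol\Sigma}},\quad\mathbf x\in\mathbb R^k.$$

Proposition (joint pdf of the bivariate normal distribution). If $(X,Y)$ follows the bivariate normal distribution with means $\mu_X,\mu_Y$, variances $\sigma_X^2,\sigma_Y^2$ and correlation coefficient $\rho$ ($|\rho|<1$), then its joint pdf is $$f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right).$$

Proof. For the bivariate normal distribution,
- the mean vector is $\boldsymbol\mu=(\mu_X,\mu_Y)^T$;
- the covariance matrix is $\boldsymbol\Sigma=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\ \rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$.
- Hence, $\det\boldsymbol\Sigma=\sigma_X^2\sigma_Y^2(1-\rho^2)$ and $\boldsymbol\Sigma^{-1}=\dfrac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)}\begin{pmatrix}\sigma_Y^2&-\rho\sigma_X\sigma_Y\\ -\rho\sigma_X\sigma_Y&\sigma_X^2\end{pmatrix}$.
- It follows that the joint pdf is $$f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right).$$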
- ↑ 'indpt.' stands for independence.
- ↑ This is because there is an unordered selection of $x$ (distinguishable and ordered) trials for $S$ without replacement from $n$ trials (the remaining positions are then for $F$).
- ↑ Occurrence of the rare event is viewed as 'success' and non-occurrence of the rare event is viewed as 'failure'.
- ↑ Unlike the outcomes for the binomial distribution, there is only one possible sequence for each $x$.
- ↑ There is an unordered selection of $x$ trials for $F$ (or $k-1$ trials for $S$) from $x+k-1$ trials without replacement.
- ↑ The restriction on $x$ is imposed so that the binomial coefficients are defined, i.e. the expression 'makes sense'. In practice, we rarely use this condition directly. Instead, we usually directly determine whether a specific value of $x$ 'makes sense'.
- ↑ The probability is 'distributed uniformly over an interval'.
- ↑ A random variable following a heavy-tailed distribution has a relatively high probability of taking extreme values, compared with light-tailed distributions (e.g. the normal distribution). Graphically, the 'tails' (i.e. left end and right end) of the pdf are thicker.
- ↑ The case for $a<0$ holds similarly (the inequality sign is in the opposite direction, and eventually we will have two negative signs cancelling each other). Also, when $a=0$, the r.v. becomes a non-random constant, and so we are not interested in this case.
- ↑ Then, $p_1+p_2+\dots+p_k=1$.