Probability/Important Distributions

From testwiki
Revision as of 11:48, 4 December 2024 by imported>R. Henrik Nilsson (occured > occurred)


Distributions of a discrete random variable

Preliminary concept: Bernoulli trial

Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored example Template:Colored remark

Binomial distribution

Motivation

Consider $n$ independent Bernoulli trials with the same success probability $p$. We would like to calculate the probability $\mathbb{P}(\{r\text{ successes in }n\text{ trials}\})$.

Let $S_i$ be the event $\{i\text{th Bernoulli trial is a success}\}$, $i=1,2,\dots$, as in the previous section. Consider a particular sequence of outcomes with $r$ successes in $n$ trials:

$$\underbrace{S\cdots S}_{r\text{ successes}}\underbrace{F\cdots F}_{n-r\text{ failures}}$$

Its probability is

$$\mathbb{P}(S_1\cap\cdots\cap S_r\cap S_{r+1}^c\cap\cdots\cap S_n^c)\overset{\text{indpt.}}{=}\mathbb{P}(S_1)\cdots\mathbb{P}(S_r)\,\mathbb{P}(S_{r+1}^c)\cdots\mathbb{P}(S_n^c)=p^r(1-p)^{n-r}$$ [1]

Every other sequence in which the $r$ successes occur in other trials has the same probability. Since there are $\binom{n}{r}$ distinct possible sequences [2],

$$\mathbb{P}(\{r\text{ successes in }n\text{ trials}\})=\binom{n}{r}p^r(1-p)^{n-r}.$$

This is the pmf of a random variable following the binomial distribution.
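The counting argument can be checked by brute force for a small $n$. The Python sketch below uses hypothetical function names `binom_pmf` and `brute_force_prob`. It enumerates all $2^n$ success/failure sequences, adds up the probabilities of those with exactly $r$ successes, and compares the total with $\binom{n}{r}p^r(1-p)^{n-r}$.

```python
from itertools import product
from math import comb, isclose

def binom_pmf(r: int, n: int, p: float) -> float:
    """pmf derived above: C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def brute_force_prob(r: int, n: int, p: float) -> float:
    """Sum the probabilities of all S/F sequences of length n with exactly r successes."""
    total = 0.0
    for seq in product("SF", repeat=n):
        if seq.count("S") == r:
            # by independence, each such sequence has probability p^r (1 - p)^(n - r)
            total += p ** seq.count("S") * (1 - p) ** seq.count("F")
    return total

n, p = 6, 0.3
for r in range(n + 1):
    assert isclose(binom_pmf(r, n, p), brute_force_prob(r, n, p))

# the pmf sums to 1 over its support r = 0, 1, ..., n
assert isclose(sum(binom_pmf(r, n, p) for r in range(n + 1)), 1.0)
```

Enumeration grows like $2^n$, so this only confirms the formula for small cases. The closed form is what one would actually use.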

Definition

Template:Colored definition Template:Colored remark

Bernoulli distribution

The Bernoulli distribution is simply a special case of the binomial distribution, as follows: Template:Colored definition Template:Colored remark

Poisson distribution

Motivation

The Poisson distribution can be viewed as the 'limit case' for the binomial distribution.

Consider $n$ independent Bernoulli trials with success probability $p=\lambda/n$. By the binomial distribution, $\mathbb{P}(r\text{ successes in }n\text{ trials})=\binom{n}{r}(\lambda/n)^r(1-\lambda/n)^{n-r}$.

Now, consider a unit time interval with (positive) rate $\lambda$ of a rare event, i.e. the mean number of occurrences of the rare event in the unit time interval is $\lambda$. We can divide the unit time interval into $n$ subintervals of length $1/n$ each. Suppose $n$ is large and $p$ is small, so that the probability of two or more rare events in a single subinterval is negligible. Then, by definition of the mean, the probability that the rare event occurs in each subinterval is $p=\lambda/n$. We can therefore view the unit time interval as a sequence of $n$ Bernoulli trials [3] with success probability $p=\lambda/n$, and use $\operatorname{Binom}(n,\lambda/n)$ to model the number of occurrences of the rare event. To be more precise,

$$\mathbb{P}(\underbrace{r\text{ successes in }n\text{ trials}}_{\approx\,r\text{ rare events in the unit time}})=\binom{n}{r}\left(\frac{\lambda}{n}\right)^r\left(1-\frac{\lambda}{n}\right)^{n-r}=\frac{n(n-1)\cdots(n-r+1)}{r!}\cdot\frac{\lambda^r}{n^r}\left(1-\frac{\lambda}{n}\right)^{n-r}=\frac{\lambda^r}{r!}\underbrace{\Big(1-\underbrace{\tfrac{1}{n}}_{\to0}\Big)\cdots\Big(1-\underbrace{\tfrac{r-1}{n}}_{\to0}\Big)}_{\to1}\underbrace{\left(1-\frac{\lambda}{n}\right)^{n}}_{\to e^{-\lambda}}\underbrace{\left(1-\frac{\lambda}{n}\right)^{-r}}_{\to1}\to\frac{e^{-\lambda}\lambda^r}{r!}\quad\text{as }n\to\infty.$$

This is the pmf of a random variable following the Poisson distribution. The result is known as the Poisson limit theorem (or law of rare events), and we will state it formally after introducing the definition of the Poisson distribution.
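The limit can also be observed numerically. The Python sketch below uses hypothetical names and the arbitrary example values $\lambda=2$, $r=3$. It evaluates the $\operatorname{Binom}(n,\lambda/n)$ pmf at $r$ for increasing $n$ and records how far each value is from $e^{-\lambda}\lambda^r/r!$.

```python
from math import comb, exp, factorial

def binom_pmf(r: int, n: int, p: float) -> float:
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r: int, lam: float) -> float:
    return exp(-lam) * lam**r / factorial(r)

lam, r = 2.0, 3
ns = (10, 100, 1000, 10000)
errors = []
for n in ns:
    approx = binom_pmf(r, n, lam / n)         # success probability p = lambda / n
    errors.append(abs(approx - poisson_pmf(r, lam)))
    print(n, approx, errors[-1])
```

The gap shrinks roughly by a factor of 10 each time $n$ grows by a factor of 10, which is consistent with the limit derived above.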

Definition

Template:Colored definition Template:Colored remark Template:Colored theorem

Proof. The result follows from the result proved above: the pmf of $\operatorname{Binom}(n,\lambda/n)$ approaches the pmf of $\operatorname{Pois}(\lambda)$ as $n\to\infty$.

Template:Colored remark

Geometric distribution

Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb{P}(\{x\text{ failures before first success}\})$. By considering the sequence of outcomes $\underbrace{F\cdots F}_{x\text{ failures}}S$, we can calculate that

$$\mathbb{P}(\{x\text{ failures before first success}\})=(1-p)^xp,\quad x\in\operatorname{supp}(X)=\{0,1,2,\dots\}$$ [4]

This is the pmf of a random variable following the geometric distribution.
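The derivation describes an experiment that is easy to simulate. In the Python sketch below (function names hypothetical, seed and $p=0.3$ arbitrary), we run Bernoulli($p$) trials until the first success, count the failures, and compare the empirical frequencies with $(1-p)^xp$.

```python
import random

def geom_pmf(x: int, p: float) -> float:
    """P(x failures before the first success) = (1 - p)^x * p."""
    return (1 - p) ** x * p

def failures_before_first_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the first success and return the number of failures."""
    failures = 0
    while rng.random() >= p:  # rng.random() < p counts as a success
        failures += 1
    return failures

rng = random.Random(0)  # fixed seed so the run is reproducible
p, trials = 0.3, 100_000
counts = {}
for _ in range(trials):
    x = failures_before_first_success(p, rng)
    counts[x] = counts.get(x, 0) + 1

for x in range(5):
    print(x, counts.get(x, 0) / trials, geom_pmf(x, p))
```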

Definition

Template:Colored definition Template:Colored remark Template:Colored proposition

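Proof.

$$\begin{aligned}\mathbb{P}(X>m+n\mid X\ge m)&\overset{\text{def}}{=}\frac{\mathbb{P}(X>m+n\cap X\ge m)}{\mathbb{P}(X\ge m)}=\frac{\mathbb{P}(X>m+n)}{\mathbb{P}(X\ge m)}\overset{\text{def}}{=}\frac{p\left((1-p)^{m+n+1}+(1-p)^{m+n+2}+\cdots\right)}{p\left((1-p)^{m}+(1-p)^{m+1}+\cdots\right)}\\&=\frac{(1-p)^{m+n+1}/(1-(1-p))}{(1-p)^{m}/(1-(1-p))}\quad\text{(by geometric series formula)}\\&=(1-p)^{n+1}\cdot\frac{p}{p}=\frac{p(1-p)^{n+1}}{1-(1-p)}\\&=p\left((1-p)^{n+1}+(1-p)^{n+2}+\cdots\right)\quad\text{(by geometric series formula)}\\&\overset{\text{def}}{=}\mathbb{P}(X>n)\quad\text{since }X>n\iff X=n+1,n+2,\dots.\end{aligned}$$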

  • In particular, $\{X>m+n\}\cap\{X\ge m\}=\{X>m+n\}$, since $X>m+n\iff X=m+n+1,m+n+2,\dots$ and $X\ge m\iff X=m,m+1,\dots$.
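The memoryless property can also be checked in exact arithmetic. The Python sketch below uses hypothetical helper names and $p=1/3$ as an example value. It computes the tail probabilities as $1$ minus finite sums of the pmf with `fractions.Fraction`, which avoids any appeal to the geometric series formula. It then confirms $\mathbb{P}(X>m+n\mid X\ge m)=\mathbb{P}(X>n)$ for small $m,n$.

```python
from fractions import Fraction

def geom_pmf(x: int, p: Fraction) -> Fraction:
    return (1 - p) ** x * p

def tail_gt(k: int, p: Fraction) -> Fraction:
    """P(X > k) = 1 - sum_{x=0}^{k} P(X = x), computed exactly."""
    return 1 - sum(geom_pmf(x, p) for x in range(k + 1))

def tail_ge(k: int, p: Fraction) -> Fraction:
    """P(X >= k) = 1 - sum_{x=0}^{k-1} P(X = x), computed exactly."""
    return 1 - sum(geom_pmf(x, p) for x in range(k))

p = Fraction(1, 3)
for m in range(6):
    for n in range(6):
        # {X > m+n} ∩ {X >= m} = {X > m+n}, so the conditional probability is a ratio of tails
        conditional = tail_gt(m + n, p) / tail_ge(m, p)
        assert conditional == tail_gt(n, p)
```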

Template:Colored remark

Negative binomial distribution

Motivation

Consider a sequence of independent Bernoulli trials with success probability $p$. We would like to calculate the probability $\mathbb{P}(\{x\text{ failures before }k\text{th success}\})$. Consider this sequence of outcomes:

$$\underbrace{\underbrace{F\cdots F}_{x_1\text{ failures}}S\underbrace{F\cdots F}_{x_2\text{ failures}}S\cdots S\underbrace{F\cdots F}_{x_k\text{ failures}}}_{x+k-1\text{ trials}}\underbrace{S}_{k\text{th success}},\qquad x_1+x_2+\cdots+x_k=x.$$

By independence, its probability is $(1-p)^xp^k$. The $k$th success must occur in the last trial. Every other sequence, in which the $x$ failures and the first $k-1$ successes occur in other positions among the first $x+k-1$ trials, has the same probability. Since there are $\binom{x+k-1}{x}$ (or $\binom{x+k-1}{k-1}$, which is numerically the same) distinct possible sequences [5],

$$\mathbb{P}(\{x\text{ failures before }k\text{th success}\})=\binom{x+k-1}{x}(1-p)^xp^k,\quad x\in\operatorname{supp}(X)=\{0,1,2,\dots\}.$$

This is the pmf of a random variable following the negative binomial distribution.
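The same brute-force idea as for the binomial case applies here. The Python sketch below uses hypothetical names and $p=0.4$ as an example. It enumerates every arrangement of the first $x+k-1$ trials containing exactly $k-1$ successes, appends the final $k$th success, and sums the probabilities.

```python
from itertools import product
from math import comb, isclose

def nbinom_pmf(x: int, k: int, p: float) -> float:
    """P(x failures before the k-th success) = C(x+k-1, x) (1-p)^x p^k."""
    return comb(x + k - 1, x) * (1 - p) ** x * p**k

def brute_force(x: int, k: int, p: float) -> float:
    """Sum over all sequences: x+k-1 trials with exactly k-1 successes, then a final success."""
    total = 0.0
    for head in product("SF", repeat=x + k - 1):
        if head.count("S") == k - 1:  # the other x trials in `head` are failures
            total += (1 - p) ** x * p**k
    return total

p = 0.4
for k in range(1, 4):
    for x in range(6):
        assert isclose(nbinom_pmf(x, k, p), brute_force(x, k, p))
```

With $k=1$ the formula reduces to the geometric pmf $(1-p)^xp$.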

Definition

Template:Colored definition Template:Colored remark

Hypergeometric distribution

Motivation

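Suppose a sample of size $n$ is drawn without replacement from a population of size $N$, containing $K$ objects of type 1 and $N-K$ objects of another type. Then,

$$\mathbb{P}(\{k\text{ type 1 objects are found when }n\text{ objects are drawn from }N\text{ objects}\})=\frac{\overbrace{\binom{K}{k}}^{\text{type 1}}\overbrace{\binom{N-K}{n-k}}^{\text{another type}}}{\underbrace{\binom{N}{n}}_{\text{all outcomes}}},\quad k\in\{\max\{n-N+K,0\},\dots,\min\{K,n\}\}$$ [6].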

  • $\binom{K}{k}$: unordered selection of $k$ objects of type 1 from $K$ (distinguishable) objects of type 1 without replacement;
  • $\binom{N-K}{n-k}$: unordered selection of $n-k$ objects of another type from $N-K$ (distinguishable) objects of another type without replacement;
  • $\binom{N}{n}$: unordered selection of $n$ objects from $N$ (distinguishable) objects without replacement.

This is the pmf of a random variable following the hypergeometric distribution.
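The three counting steps can be verified by listing every possible sample. The Python sketch below uses hypothetical names and the example values $N=10$, $K=4$, $n=5$. It labels the objects $0,\dots,N-1$ and treats the first $K$ labels as type 1. It then enumerates all $\binom{N}{n}$ unordered samples and compares exact frequencies with the formula over the stated support.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """C(K, k) C(N-K, n-k) / C(N, n), as an exact fraction."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def brute_force(k: int, N: int, K: int, n: int) -> Fraction:
    """Enumerate every unordered sample of n labelled objects; objects 0..K-1 are type 1."""
    samples = list(combinations(range(N), n))
    hits = sum(1 for s in samples if sum(obj < K for obj in s) == k)
    return Fraction(hits, len(samples))

N, K, n = 10, 4, 5
support = range(max(n - N + K, 0), min(K, n) + 1)
for k in support:
    assert hypergeom_pmf(k, N, K, n) == brute_force(k, N, K, n)
assert sum(hypergeom_pmf(k, N, K, n) for k in support) == 1
```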

Definition

Template:Colored definition Template:Colored remark

Finite discrete distribution

This type of distribution is a generalization of all discrete distributions with finite support, e.g. the Bernoulli distribution and the hypergeometric distribution.

Another special case of this type of distribution is the discrete uniform distribution, which is similar to the continuous uniform distribution (discussed later). Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored remark Template:Colored example Template:Colored example

Exercises

Template:Colored exercise

Distributions of a continuous random variable

Uniform distribution (continuous)

The continuous uniform distribution is a model for 'no preference', i.e. all intervals of the same length within its support are equally probable [7] (this can be seen from the pdf of the continuous uniform distribution). There is also a discrete uniform distribution, but it is less important than the continuous one. So, from now on, 'uniform distribution' simply refers to the continuous one. Template:Colored definition Template:Colored remark Template:Colored proposition

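Proof.

$$F(x)=\int_{-\infty}^{x}\frac{\mathbf 1\{a\le y\le b\}}{b-a}\,dy=\frac{1}{b-a}\int_{-\infty}^{x}\mathbf 1\{a\le y\le b\}\,dy=\begin{cases}0/(b-a),&x<a;\\ [y]_a^x/(b-a),&a\le x\le b;\\ [y]_a^b/(b-a),&x>b,\end{cases}$$

i.e. $F(x)=0$, $(x-a)/(b-a)$ and $1$ on the three pieces respectively. Then, the result follows.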
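One way to see the piecewise cdf in action is to compare it with empirical frequencies. The Python sketch below uses the hypothetical name `uniform_cdf` and the example interval $[a,b]=[2,5]$. It draws samples with `random.uniform` and checks the proportion of samples at most $x$ against $F(x)$ on all three pieces.

```python
import random

def uniform_cdf(x: float, a: float, b: float) -> float:
    """Piecewise cdf from the proof: 0 for x < a, (x - a)/(b - a) on [a, b], 1 for x > b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

rng = random.Random(1)  # fixed seed for reproducibility
a, b = 2.0, 5.0
samples = [rng.uniform(a, b) for _ in range(100_000)]
for x in (1.0, 2.5, 3.0, 4.2, 6.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 4), uniform_cdf(x, a, b))
```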


Exponential distribution

The exponential distribution with rate parameter $\lambda$ is often used to describe the interarrival time of rare events with rate $\lambda$.

Compared with the Poisson distribution, the exponential distribution describes the interarrival time of rare events, while the Poisson distribution describes the number of occurrences of rare events within a fixed time interval.

By definition of rate, when the rate $\lambda$ increases, the interarrival time decreases (i.e. the frequency of the rare event increases).

So, we would like the pdf to be more concentrated towards the left (near $x=0$) when $\lambda$ increases. That is, the pdf should take higher values for small $x$ when $\lambda$ increases, so that the areas under the pdf over intervals of small $x$ increase as $\lambda$ increases.

Also, with a fixed rate $\lambda$, longer interarrival times should be less likely. So, intuitively, we would also like the pdf to be a strictly decreasing function on its support, so that the probability involved (the area under the pdf over an interval of fixed length) decreases as $x$ increases.

As we can see, the pdf of the exponential distribution satisfies both of these properties. Template:Colored definition Template:Colored proposition

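Proof. Suppose $X\sim\operatorname{Exp}(\lambda)$. The cdf of $X$ is

$$F(x)=\int_{-\infty}^{x}\lambda e^{-\lambda y}\mathbf 1\{y\ge0\}\,dy=\begin{cases}\int_0^x\lambda e^{-\lambda y}\,dy,&x\ge0;\\0,&x<0\end{cases}=\mathbf 1\{x\ge0\}\,\lambda\int_0^x e^{-\lambda y}\,dy=\mathbf 1\{x\ge0\}\,\frac{\lambda}{-\lambda}\left[e^{-\lambda y}\right]_0^x=-\mathbf 1\{x\ge0\}\left(e^{-\lambda x}-1\right)=\left(1-e^{-\lambda x}\right)\mathbf 1\{x\ge0\}.$$

(When $x<0$, the interval $(-\infty,x]$ contains no point of $\operatorname{supp}(X)$, so $F(x)=\mathbb{P}(X\le x)=0$.)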
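The closed form can be cross-checked by integrating the pdf numerically. The Python sketch below uses hypothetical function names and the example rate $\lambda=1.5$. It applies the midpoint rule to $\lambda e^{-\lambda y}$ on $[0,x]$; the part of the integral over $(-\infty,0)$ is zero because the indicator vanishes there. The result is compared with $(1-e^{-\lambda x})\mathbf 1\{x\ge0\}$.

```python
from math import exp, isclose

def exp_pdf(y: float, lam: float) -> float:
    """lambda * e^(-lambda y) * 1{y >= 0}."""
    return lam * exp(-lam * y) if y >= 0 else 0.0

def exp_cdf(x: float, lam: float) -> float:
    """Closed form from the proof: (1 - e^(-lambda x)) * 1{x >= 0}."""
    return (1 - exp(-lam * x)) if x >= 0 else 0.0

def cdf_by_midpoint_rule(x: float, lam: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the pdf over [0, x]; the pdf is 0 on (-inf, 0)."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(exp_pdf((i + 0.5) * h, lam) for i in range(steps))

lam = 1.5
for x in (-1.0, 0.0, 0.5, 2.0, 4.0):
    assert isclose(cdf_by_midpoint_rule(x, lam), exp_cdf(x, lam), abs_tol=1e-6)
```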

Template:Colored proposition

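Proof.

$$\mathbb{P}(X>s+t\mid X>s)\overset{\text{def}}{=}\frac{\mathbb{P}(X>s+t\cap X>s)}{\mathbb{P}(X>s)}=\frac{\mathbb{P}(X>s+t)}{\mathbb{P}(X>s)}=\frac{1-\left(1-e^{-\lambda(s+t)}\right)}{1-\left(1-e^{-\lambda s}\right)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=\mathbb{P}(X>t).$$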
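The memoryless property can also be seen empirically. The Python sketch below uses the arbitrary example values $\lambda=0.5$, $s=1$, $t=2$ and a fixed seed. It draws exponential samples with `random.expovariate`, whose argument is the rate $\lambda$. Among the samples exceeding $s$, it measures the fraction that also exceed $s+t$, and compares this with the unconditional fraction exceeding $t$ and with $e^{-\lambda t}$.

```python
import random
from math import exp

rng = random.Random(2)  # fixed seed for reproducibility
lam, s, t = 0.5, 1.0, 2.0
samples = [rng.expovariate(lam) for _ in range(200_000)]  # expovariate takes the rate lambda

survived_s = [x for x in samples if x > s]
conditional = sum(x > s + t for x in survived_s) / len(survived_s)  # estimates P(X > s+t | X > s)
unconditional = sum(x > t for x in samples) / len(samples)          # estimates P(X > t)
print(round(conditional, 4), round(unconditional, 4), round(exp(-lam * t), 4))
```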

Template:Colored remark

Gamma distribution

The gamma distribution is a generalized exponential distribution, in the sense that we can also change the shape of the pdf of the exponential distribution. Template:Colored definition Template:Colored remark
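The definition itself is not reproduced on this page. The Python sketch below therefore assumes the common shape–rate form $f(x)=\lambda^\alpha x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$ for $x>0$, with the hypothetical name `gamma_pdf`. Under that assumption, shape $\alpha=1$ gives back the exponential pdf $\lambda e^{-\lambda x}$. Larger $\alpha$ changes the shape, so that the pdf rises from $0$ before decaying.

```python
from math import exp, gamma, isclose

def gamma_pdf(x: float, alpha: float, lam: float) -> float:
    """Assumed shape-rate form: lam^alpha x^(alpha-1) e^(-lam x) / Gamma(alpha), for x > 0."""
    if x <= 0:
        return 0.0
    return lam**alpha * x ** (alpha - 1) * exp(-lam * x) / gamma(alpha)

def exp_pdf(x: float, lam: float) -> float:
    return lam * exp(-lam * x) if x >= 0 else 0.0

lam = 2.0
# shape alpha = 1 reduces the gamma pdf to the exponential pdf (checked for x > 0)
for x in (0.1, 0.5, 1.0, 3.0):
    assert isclose(gamma_pdf(x, 1.0, lam), exp_pdf(x, lam))

# other shapes: print a few pdf values to see how the curve changes
for alpha in (1.0, 2.0, 5.0):
    print(alpha, [round(gamma_pdf(x, alpha, lam), 4) for x in (0.25, 0.5, 1.0, 2.0)])
```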

Beta distribution

The beta distribution is a generalized $\mathcal U[0,1]$, in the sense that we can also change the shape of the pdf using its two shape parameters. Template:Colored definition Template:Colored remark
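As with the gamma case, the definition is not shown on this page. The Python sketch below assumes the standard form $f(x)=x^{a-1}(1-x)^{b-1}/B(a,b)$ on $(0,1)$ with $B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)$, under the hypothetical name `beta_pdf`. It shows that $a=b=1$ recovers the $\mathcal U[0,1]$ pdf, and prints one other choice of parameters that reshapes the pdf on the same support.

```python
from math import gamma, isclose

def beta_pdf(x: float, a: float, b: float) -> float:
    """Assumed standard form: x^(a-1) (1-x)^(b-1) / B(a, b) on (0, 1)."""
    if not 0 < x < 1:
        return 0.0
    beta_fn = gamma(a) * gamma(b) / gamma(a + b)  # B(a, b) via the gamma function
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn

# a = b = 1 gives the U[0,1] pdf, which equals 1 on (0, 1)
for x in (0.1, 0.5, 0.9):
    assert isclose(beta_pdf(x, 1, 1), 1.0)

# other parameter values change the shape on the same support
print([round(beta_pdf(x, 2, 5), 3) for x in (0.1, 0.3, 0.5, 0.9)])
```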

Cauchy distribution

The Cauchy distribution is a heavy-tailed distribution [8]. As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties. For example, its mean and variance are undefined, even though they seem to be defined when we look at its graph directly. Template:Colored definition Template:Colored remark
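The undefined mean can be made concrete with a short computation. The Python sketch below assumes the standard Cauchy pdf $f(x)=1/(\pi(1+x^2))$ (location $0$, scale $1$), since the definition is not shown here. It computes $\int_{-M}^{M}|x|f(x)\,dx=\ln(1+M^2)/\pi$ and checks that value with a midpoint-rule integral for one $M$. The truncated integral grows without bound as $M\to\infty$, so $\mathbb{E}|X|=\infty$ and the mean is undefined, despite the symmetric graph.

```python
from math import isclose, log1p, pi

def cauchy_pdf(x: float) -> float:
    """Standard Cauchy pdf (location 0, scale 1), assumed here: 1 / (pi (1 + x^2))."""
    return 1 / (pi * (1 + x * x))

def truncated_abs_moment(M: float) -> float:
    """Closed form of the integral of |x| f(x) over [-M, M]: ln(1 + M^2) / pi."""
    return log1p(M * M) / pi

def truncated_abs_moment_numeric(M: float, steps: int = 200_000) -> float:
    """Midpoint-rule check of the same integral, using symmetry (2 * integral over [0, M])."""
    h = M / steps
    return 2 * h * sum(((i + 0.5) * h) * cauchy_pdf((i + 0.5) * h) for i in range(steps))

assert isclose(truncated_abs_moment_numeric(10.0), truncated_abs_moment(10.0), rel_tol=1e-6)

# E|X| would be the limit as M -> infinity, but the integral keeps growing like (2/pi) ln M
for M in (10, 1e3, 1e6, 1e12):
    print(M, round(truncated_abs_moment(M), 3))
```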

Normal distribution (very important)

The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often approximately follow normal distributions, by the central limit theorem. As a result, the normal distribution is important in statistics.
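Here is a quick illustration of the central-limit claim. The Python sketch below uses a fixed seed and the arbitrary choices $n=30$ and $50\,000$ repetitions. It sums $n$ independent $\mathcal U[0,1]$ draws (mean $1/2$, variance $1/12$ each) and standardizes the sum. It then compares the empirical cdf of the standardized sums with the standard normal cdf $\Phi(z)=\tfrac12\left(1+\operatorname{erf}(z/\sqrt2)\right)$ at a few points.

```python
import random
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

rng = random.Random(3)  # fixed seed for reproducibility
n, reps = 30, 50_000
# a sum of n U[0,1] draws has mean n/2 and variance n/12
mean, sd = n / 2, sqrt(n / 12)
standardized = [(sum(rng.random() for _ in range(n)) - mean) / sd for _ in range(reps)]

for z in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(s <= z for s in standardized) / reps
    print(z, round(empirical, 4), round(std_normal_cdf(z), 4))
```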

Template:Colored definition Template:Colored remark Template:Colored proposition

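Proof. Assume $a>0$ [9]. Let $F_X$ and $F_Y$ be the cdfs of $X$ and $Y$ respectively. Since

$$F_Y(y)=\mathbb{P}(Y\le y)=\mathbb{P}(aX+b\le y)=\mathbb{P}\left(X\le\frac{y-b}{a}\right)=F_X\left(\frac{y-b}{a}\right),$$

by differentiation,

$$f_Y(y)=\frac{1}{a}f_X\left(\frac{y-b}{a}\right)=\frac{1}{a\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left((y-b)/a-\mu\right)^2}{2\sigma^2}\right)=\frac{1}{\sqrt{2\pi a^2\sigma^2}}\exp\left(-\frac{\left(y-(a\mu+b)\right)^2}{2a^2\sigma^2}\right)\quad\text{since }a>0,$$

which is the pdf of $\mathcal N(a\mu+b,a^2\sigma^2)$.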
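The identity in the last display can be spot-checked numerically. The Python sketch below uses the hypothetical helper `normal_pdf` and arbitrary example values with $a>0$. At several points $y$ it evaluates the change-of-variable expression $\frac1a f_X\left(\frac{y-b}{a}\right)$ and checks that it agrees with the $\mathcal N(a\mu+b,a^2\sigma^2)$ pdf.

```python
from math import exp, isclose, pi, sqrt

def normal_pdf(x: float, mu: float, var: float) -> float:
    """pdf of N(mu, var)."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)

mu, var, a, b = 1.0, 4.0, 3.0, -2.0  # arbitrary example values with a > 0
for y in (-10.0, -2.0, 0.0, 1.0, 7.5):
    via_change_of_variable = normal_pdf((y - b) / a, mu, var) / a  # f_Y(y) = f_X((y - b)/a) / a
    claimed = normal_pdf(y, a * mu + b, a * a * var)               # pdf of N(a mu + b, a^2 sigma^2)
    assert isclose(via_change_of_variable, claimed)
```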

Template:Colored remark


Important distributions for statistics especially

The following distributions are especially important in statistics, and they are all related to the normal distribution. We will introduce them briefly.

Chi-squared distribution

The chi-squared distribution is a special case of the gamma distribution, and is also related to the normal distribution.
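Both relationships can be compared by simulation. The Python sketch below assumes the usual definition of $\chi^2_k$ as the sum of squares of $k$ independent $\mathcal N(0,1)$ variables. It also assumes that the gamma special case is shape $k/2$ and rate $1/2$ (i.e. scale $2$). It samples both ways with a fixed seed, using `random.gauss` and `random.gammavariate` (which takes shape and scale), and compares their empirical cdfs at a few points.

```python
import random

rng = random.Random(4)  # fixed seed for reproducibility
k, reps = 3, 100_000

# Route 1 (assumed definition): sum of squares of k independent N(0, 1) variables
sum_sq = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]
# Route 2: gamma with shape k/2 and scale 2 (i.e. rate 1/2)
gamma_draws = [rng.gammavariate(k / 2, 2.0) for _ in range(reps)]

for c in (0.5, 1.0, 3.0, 6.0):
    p1 = sum(v <= c for v in sum_sq) / reps
    p2 = sum(v <= c for v in gamma_draws) / reps
    print(c, round(p1, 4), round(p2, 4))
```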

Template:Colored definition Template:Colored remark

Student's t-distribution

Student's t-distribution is related to the normal distribution and the chi-squared distribution. Template:Colored definition Template:Colored remark

F-distribution

The F-distribution is, in a sense, a generalized Student's t-distribution: it has one more adjustable parameter, namely a second degrees of freedom. Template:Colored definition Template:Colored remark If you are interested in how the chi-squared distribution, Student's t-distribution and the F-distribution are useful in statistics, you may have a brief look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).

Joint distributions


Multinomial distribution

Motivation

The multinomial distribution is a generalized binomial distribution, in the sense that each trial can have more than two outcomes.

Suppose $n$ objects are to be allocated to $k$ cells independently. Each object is allocated to one and only one cell, with probability $p_i$ of being allocated to the $i$th cell ($i=1,2,\dots,k$) [10]. Let $X_i$ be the number of objects allocated to cell $i$. We would like to calculate the probability $\mathbb{P}(\mathbf X\overset{\text{def}}{=}(X_1,\dots,X_k)^T=\mathbf x\overset{\text{def}}{=}(x_1,\dots,x_k)^T)$, i.e. the probability that the $i$th cell has $x_i$ objects for each $i$.

We can regard each allocation as an independent trial with $k$ outcomes, since each object is allocated to one and only one of the $k$ cells. An allocation of the $n$ objects is a partition of the $n$ objects into $k$ groups of sizes $x_1,\dots,x_k$. Hence, there are $\binom{n}{x_1,\dots,x_k}=\frac{n!}{x_1!\cdots x_k!}$ ways of allocation.

So, $\mathbb{P}(\mathbf X=\mathbf x)=\binom{n}{x_1,\dots,x_k}p_1^{x_1}\cdots p_k^{x_k}$. This is because the probability of allocating $x_i$ particular objects to the $i$th cell is $p_i^{x_i}$ by independence. So, again by independence, the probability of a particular allocation of the $n$ objects to the $k$ cells is $p_1^{x_1}\cdots p_k^{x_k}$.
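As in the binomial case, the counting argument can be confirmed by enumeration. The Python sketch below uses hypothetical names and the example probabilities $(0.2,0.5,0.3)$. It lists all $k^n$ ways of sending $n$ labelled objects to $k$ cells, multiplies the per-object probabilities (independence), and sums over the allocations that produce the cell counts $\mathbf x$.

```python
from itertools import product
from math import factorial, isclose, prod

def multinomial_pmf(x, p):
    """n! / (x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with n = x_1 + ... + x_k."""
    n = sum(x)
    coeff = factorial(n) // prod(factorial(xi) for xi in x)
    return coeff * prod(pj**xi for pj, xi in zip(p, x))

def brute_force(x, p):
    """Enumerate all k^n ways to send each of n labelled objects to a cell."""
    k, n = len(p), sum(x)
    total = 0.0
    for cells in product(range(k), repeat=n):
        counts = tuple(cells.count(i) for i in range(k))
        if counts == tuple(x):
            total += prod(p[c] for c in cells)  # independence across objects
    return total

p = (0.2, 0.5, 0.3)
for x in [(2, 1, 1), (0, 4, 0), (1, 1, 2), (4, 0, 0)]:
    assert isclose(multinomial_pmf(x, p), brute_force(x, p))
```

With $k=2$ this reduces to the binomial pmf, where $x_2=n-x_1$.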

Definition

Template:Colored definition Template:Colored remark

Multivariate normal distribution

The multivariate normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the (univariate) normal distribution. Template:Colored definition Template:Colored remark Template:Info Template:Colored proposition

Proof. For the bivariate normal distribution,

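  • the mean vector is $\boldsymbol\mu=(\mu_X,\mu_Y)^T$;
  • the covariance matrix is $\Sigma=\begin{pmatrix}\operatorname{Cov}(X,X)&\operatorname{Cov}(X,Y)\\\operatorname{Cov}(Y,X)&\operatorname{Cov}(Y,Y)\end{pmatrix}=\begin{pmatrix}\operatorname{Var}(X)&\operatorname{Cov}(X,Y)\\\operatorname{Cov}(X,Y)&\operatorname{Var}(Y)\end{pmatrix}=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$.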
  • Hence,

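$$\begin{aligned}(\mathbf x-\boldsymbol\mu)^T\Sigma^{-1}(\mathbf x-\boldsymbol\mu)&=\frac{1}{\det\Sigma}\begin{pmatrix}x-\mu_X&y-\mu_Y\end{pmatrix}\begin{pmatrix}\sigma_Y^2&-\rho\sigma_X\sigma_Y\\-\rho\sigma_X\sigma_Y&\sigma_X^2\end{pmatrix}\begin{pmatrix}x-\mu_X\\y-\mu_Y\end{pmatrix}\\&=\frac{1}{\det\Sigma}\begin{pmatrix}(x-\mu_X)\sigma_Y^2-(y-\mu_Y)\rho\sigma_X\sigma_Y&-(x-\mu_X)\rho\sigma_X\sigma_Y+(y-\mu_Y)\sigma_X^2\end{pmatrix}\begin{pmatrix}x-\mu_X\\y-\mu_Y\end{pmatrix}\\&=\frac{1}{\sigma_X^2\sigma_Y^2-(\rho\sigma_X\sigma_Y)^2}\Big((x-\mu_X)^2\sigma_Y^2\underbrace{-(x-\mu_X)(y-\mu_Y)\rho\sigma_X\sigma_Y-(x-\mu_X)(y-\mu_Y)\rho\sigma_X\sigma_Y}_{=-2\rho(x-\mu_X)(y-\mu_Y)\sigma_X\sigma_Y}+(y-\mu_Y)^2\sigma_X^2\Big)\\&=\frac{(x-\mu_X)^2\sigma_Y^2-2\rho(x-\mu_X)(y-\mu_Y)\sigma_X\sigma_Y+(y-\mu_Y)^2\sigma_X^2}{\sigma_X^2\sigma_Y^2(1-\rho^2)}\\&=\frac{1}{1-\rho^2}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right).\end{aligned}$$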

  • It follows that the joint pdf is

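$$\begin{aligned}f(x,y)&=\frac{1}{\sqrt{(2\pi)^2\det\Sigma}}\exp\left(-\frac{1}{2}\cdot\frac{1}{1-\rho^2}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)\right)\\&=\frac{1}{2\pi\sqrt{\sigma_X^2\sigma_Y^2(1-\rho^2)}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)\right)\\&=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)\right).\end{aligned}$$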
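The hand-derived bivariate formula can be compared with the general matrix form. The Python sketch below uses hypothetical names and arbitrary example parameters, and requires NumPy. It evaluates $\exp\left(-\tfrac12(\mathbf x-\boldsymbol\mu)^T\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right)/\sqrt{(2\pi)^k\det\Sigma}$ with `numpy.linalg.solve` and `numpy.linalg.det`, and checks it against the final expression above at several points.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """General form: exp(-(x-mu)^T Sigma^{-1} (x-mu) / 2) / sqrt((2 pi)^k det Sigma)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    k = d.size
    quad = d @ np.linalg.solve(Sigma, d)  # (x-mu)^T Sigma^{-1} (x-mu) without forming the inverse
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))

def bvn_pdf(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Bivariate formula derived above."""
    zx, zy = (x - mu_x) / s_x, (y - mu_y) / s_y
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-0.5 * q) / (2 * np.pi * s_x * s_y * np.sqrt(1 - rho**2))

mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 1.5, 0.5, 0.6
Sigma = np.array([[s_x**2, rho * s_x * s_y],
                  [rho * s_x * s_y, s_y**2]])
for x, y in [(0.0, 0.0), (1.0, -2.0), (2.5, -1.7), (-1.0, -2.4)]:
    assert np.isclose(bvn_pdf(x, y, mu_x, mu_y, s_x, s_y, rho),
                      mvn_pdf([x, y], [mu_x, mu_y], Sigma))
```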


  1. 'indpt.' stands for independence.
  2. This is because there is an unordered selection of $r$ (distinguishable and ordered) trials for the successes, without replacement, from the $n$ trials. The remaining positions are then for the failures.
  3. Occurrence of the rare event is viewed as 'success' and non-occurrence of the rare event is viewed as 'failure'.
  4. Unlike the outcomes for the binomial distribution, there is only one possible sequence for each $x$.
  5. There is an unordered selection of $x$ trials for the failures (or $k-1$ trials for the successes) from $x+k-1$ trials without replacement.
  6. The restriction on $k$ is imposed so that the binomial coefficients are defined, i.e. the expression 'makes sense'. In practice, we rarely use this condition directly. Instead, we usually directly determine whether a specific value of $k$ 'makes sense'.
  7. The probability is 'distributed uniformly over an interval'.
  8. A random variable following a heavy-tailed distribution has a relatively high probability of taking extreme values, compared with random variables following distributions that are not heavy-tailed (e.g. the normal distribution). Graphically, the 'tails' (i.e. the left and right ends) of the pdf are 'heavy'.
  9. The case $a<0$ holds similarly: the inequality sign is reversed, and eventually two negative signs cancel each other. When $a=0$, the r.v. becomes a non-random constant, so we are not interested in this case.
  10. Then, $p_1+p_2+\cdots+p_k=1$.