Statistics/Distributions/Hypergeometric

Revision as of 19:56, 8 September 2022 by 78.190.155.1 (talk) (Added and Edited Info Box)

Hypergeometric Distribution


The hypergeometric distribution describes the number of successes in a sequence of n draws without replacement from a population of N items that contains m successes in total.

Its probability mass function is:

$$f(x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}} \quad \text{for all } x \in [0, n]$$

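Technically, the support of the function is only $x \in [\max(0, n+m-N), \min(m, n)]$. When this range is narrower than $[0, n]$, $f(x) = 0$ outside it, because $\binom{a}{k} = 0$ whenever $k > a \ge 0$: if $x > m$ then $\binom{m}{x} = 0$, and if $n - x > N - m$ then $\binom{N-m}{n-x} = 0$.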
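As a minimal sketch, the pmf and the sampling scheme it describes can be written in a few lines of Python. The function names `hypergeom_pmf` and `draw_successes` and the parameter values are illustrative assumptions, not part of the original text. The sketch relies on `math.comb` returning 0 when the lower argument exceeds the upper one, which mirrors the remark about the support.

```python
import random
from math import comb

def hypergeom_pmf(x, n, m, N):
    """P(X = x) for n draws without replacement from N items, m of which are successes."""
    if x < 0 or x > n:
        return 0.0
    # math.comb(a, k) returns 0 when k > a, so points outside the support get probability 0.
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

def draw_successes(n, m, N, rng):
    """Simulate one experiment: count successes among n items drawn without replacement."""
    population = [1] * m + [0] * (N - m)  # 1 marks a success, 0 a failure
    return sum(rng.sample(population, n))

n, m, N = 8, 3, 10                      # illustrative values only
lo, hi = max(0, n + m - N), min(m, n)   # support is [1, 3] for these values
print([round(hypergeom_pmf(x, n, m, N), 4) for x in range(n + 1)])

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [draw_successes(n, m, N, rng) for _ in range(1000)]
print(all(lo <= d <= hi for d in draws))  # every simulated count lies in the support
```

With these values the pmf is zero at x = 0 and for x > 3, and no simulated draw falls outside [1, 3].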

Probability Mass Function

We first check that f(x) is a valid pmf. This requires that it is non-negative everywhere and that it sums to 1. The first condition holds because binomial coefficients are non-negative. For the second condition we start with Vandermonde's identity:

$$\sum_{x=0}^{n} \binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}$$
$$\sum_{x=0}^{n} \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} = 1$$

Setting a=m and b=N-m, we see that the condition is satisfied.
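The normalization argument can be spot-checked numerically. The following is a small sketch using exact rational arithmetic. The helper names `vandermonde_holds` and `pmf_total` and the test values are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def vandermonde_holds(a, b, n):
    """Check sum_x C(a, x) * C(b, n - x) == C(a + b, n) for one choice of (a, b, n)."""
    lhs = sum(comb(a, x) * comb(b, n - x) for x in range(n + 1))
    return lhs == comb(a + b, n)

def pmf_total(n, m, N):
    """Exact sum of the hypergeometric pmf over x = 0..n."""
    return sum(Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
               for x in range(n + 1))

# a = m = 3 and b = N - m = 7, matching the substitution in the text
print(vandermonde_holds(3, 7, 8))  # True
print(pmf_total(8, 3, 10))         # 1
```

A check like this only confirms the identity for the values tried; the general statement rests on Vandermonde's identity itself.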

Mean

We derive the mean as follows:

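$$E[X] = \sum_{x=0}^{n} x\, f(x; n, m, N) = \sum_{x=0}^{n} x\, \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$$
$$E[X] = 0 \cdot \frac{\binom{m}{0}\binom{N-m}{n-0}}{\binom{N}{n}} + \sum_{x=1}^{n} x\, \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}$$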

We use the identity $\binom{a}{b} = \frac{a}{b}\binom{a-1}{b-1}$ in the denominator.
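For readers who want to see where this identity comes from, it follows from the factorial definition of the binomial coefficient, assuming integers $a \ge b \ge 1$:

```latex
\binom{a}{b}
  = \frac{a!}{b!\,(a-b)!}
  = \frac{a}{b} \cdot \frac{(a-1)!}{(b-1)!\,\bigl((a-1)-(b-1)\bigr)!}
  = \frac{a}{b} \binom{a-1}{b-1}
```

Multiplying both sides by $b$ gives the equivalent form $b\binom{a}{b} = a\binom{a-1}{b-1}$.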

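$$E[X] = 0 + \sum_{x=1}^{n} x\, \frac{\binom{m}{x}\binom{N-m}{n-x}}{\frac{N}{n}\binom{N-1}{n-1}}$$
$$E[X] = \frac{n}{N} \sum_{x=1}^{n} x\, \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N-1}{n-1}}$$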

Next we use the identity $b\binom{a}{b} = a\binom{a-1}{b-1}$ in the first binomial of the numerator.

$$E[X] = \frac{n}{N} \sum_{x=1}^{n} \frac{m\binom{m-1}{x-1}\binom{N-m}{n-x}}{\binom{N-1}{n-1}}$$

Next, for each variable inside the sum we define a corresponding primed variable that is one less: $x' = x - 1$, $n' = n - 1$, $m' = m - 1$, $N' = N - 1$. Note that $N - m = N' - m'$ and $n - x = n' - x'$.

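$$E[X] = \frac{mn}{N} \sum_{x'=0}^{n'} \frac{\binom{m'}{x'}\binom{N'-m'}{n'-x'}}{\binom{N'}{n'}}$$
$$E[X] = \frac{mn}{N} \sum_{x'=0}^{n'} f(x'; n', m', N')$$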

Now we see that the sum is the total sum over a Hypergeometric pmf with modified parameters. This is equal to 1. Therefore

$$E[X] = \frac{nm}{N}$$
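The closed form can be compared against a direct summation of x f(x). This is a sketch under assumed parameter values; the names `pmf` and `mean_by_summation` are illustrative. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Exact hypergeometric pmf as a Fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def mean_by_summation(n, m, N):
    """E[X] computed directly from the definition, summing x * f(x) over x = 0..n."""
    return sum(x * pmf(x, n, m, N) for x in range(n + 1))

n, m, N = 8, 3, 10  # illustrative values only
print(mean_by_summation(n, m, N))  # 12/5
print(Fraction(n * m, N))          # 12/5, the closed form nm/N
```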

Variance

We first determine $E[X^2]$.

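$$E[X^2] = \sum_{x=0}^{n} f(x; n, m, N)\, x^2 = \sum_{x=0}^{n} \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}\, x^2$$
$$E[X^2] = \frac{\binom{m}{0}\binom{N-m}{n-0}}{\binom{N}{n}} \cdot 0^2 + \sum_{x=1}^{n} \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}\, x^2$$
$$E[X^2] = 0 + \sum_{x=1}^{n} \frac{m\binom{m-1}{x-1}\binom{N-m}{n-x}}{\frac{N}{n}\binom{N-1}{n-1}}\, x$$
$$E[X^2] = \frac{mn}{N} \sum_{x=1}^{n} \frac{\binom{m-1}{x-1}\binom{N-m}{n-x}}{\binom{N-1}{n-1}}\, x$$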

We use the same variable substitution as when deriving the mean.

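$$E[X^2] = \frac{mn}{N} \sum_{x'=0}^{n'} \frac{\binom{m'}{x'}\binom{N'-m'}{n'-x'}}{\binom{N'}{n'}}\, (x'+1)$$
$$E[X^2] = \frac{mn}{N} \left[ \sum_{x'=0}^{n'} \frac{\binom{m'}{x'}\binom{N'-m'}{n'-x'}}{\binom{N'}{n'}}\, x' + \sum_{x'=0}^{n'} \frac{\binom{m'}{x'}\binom{N'-m'}{n'-x'}}{\binom{N'}{n'}} \right]$$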

The first sum is the expected value of a hypergeometric random variable with parameters (n', m', N'). The second sum is the total sum of that random variable's pmf, which equals 1.

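$$E[X^2] = \frac{mn}{N} \left[ \frac{n'm'}{N'} + 1 \right]$$
$$E[X^2] = \frac{mn}{N} \left[ \frac{(n-1)(m-1)}{N-1} + 1 \right] = \frac{mn}{N} \left[ \frac{(n-1)(m-1) + (N-1)}{N-1} \right]$$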
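As a sanity check on the second-moment formula, the sketch below sums x² f(x) directly and compares it with the closed form. The function names and parameter values are assumptions chosen for illustration, and the closed form requires N > 1.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Exact hypergeometric pmf as a Fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def second_moment_by_summation(n, m, N):
    """E[X^2] from the definition, summing x^2 * f(x) over x = 0..n."""
    return sum(x * x * pmf(x, n, m, N) for x in range(n + 1))

def second_moment_closed_form(n, m, N):
    """E[X^2] = (mn/N) * [((n-1)(m-1) + (N-1)) / (N-1)], valid for N > 1."""
    return Fraction(m * n, N) * Fraction((n - 1) * (m - 1) + (N - 1), N - 1)

n, m, N = 8, 3, 10  # illustrative values only
print(second_moment_by_summation(n, m, N))  # 92/15
print(second_moment_closed_form(n, m, N))   # 92/15
```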

We then solve for the variance

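$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$
$$\operatorname{Var}(X) = \frac{mn}{N} \left[ \frac{(n-1)(m-1) + (N-1)}{N-1} \right] - \left( \frac{mn}{N} \right)^2$$
$$\operatorname{Var}(X) = \frac{Nmn}{N^2} \left[ \frac{(n-1)(m-1) + (N-1)}{N-1} \right] - \frac{(N-1)(mn)^2}{(N-1)N^2}$$
$$\operatorname{Var}(X) = \frac{nm(N-n)(N-m)}{N^2(N-1)}$$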

or, equivalently,

$$\operatorname{Var}(X) = \frac{nm}{N} \left( 1 - \frac{n}{N} \right) \left( 1 - \frac{m-1}{N-1} \right)$$
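The two variance expressions can be checked against each other and against the definition. The following is a minimal sketch with exact fractions. All function names and the parameter values are illustrative assumptions, and both closed forms require N > 1.

```python
from fractions import Fraction
from math import comb

def pmf(x, n, m, N):
    """Exact hypergeometric pmf as a Fraction."""
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))

def variance_by_summation(n, m, N):
    """Var(X) = E[X^2] - (E[X])^2, with both moments summed from the pmf."""
    mean = sum(x * pmf(x, n, m, N) for x in range(n + 1))
    second = sum(x * x * pmf(x, n, m, N) for x in range(n + 1))
    return second - mean ** 2

def variance_form_1(n, m, N):
    """nm(N-n)(N-m) / (N^2 (N-1))."""
    return Fraction(n * m * (N - n) * (N - m), N * N * (N - 1))

def variance_form_2(n, m, N):
    """(nm/N) (1 - n/N) (1 - (m-1)/(N-1))."""
    return Fraction(n * m, N) * (1 - Fraction(n, N)) * (1 - Fraction(m - 1, N - 1))

n, m, N = 8, 3, 10  # illustrative values only
print(variance_by_summation(n, m, N))  # 28/75
print(variance_form_1(n, m, N))        # 28/75
print(variance_form_2(n, m, N))        # 28/75
```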
