Statistics/Distributions/Hypergeometric

Hypergeometric Distribution

The hypergeometric distribution describes the number of successes in n draws, made without replacement, from a population of N items that contains m successes in total.

Its probability mass function is:

f (x) = \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N}{n})} for all x \in [0, n]

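Technically, the support of the function is only the integers x∈[max(0, n+m-N), min(m, n)]. When this range is smaller than [0,n], f(x)=0 at the excluded values, since $(\binom{a}{k}) = 0$ whenever k>a: for x>m the first binomial in the numerator vanishes, and for x<n+m-N the second one does.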
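
As a rough illustration of the pmf and its support, the formula can be evaluated exactly with Python's standard library. This is only a sketch: the function name `hypergeom_pmf` and the example parameters (N = 10, m = 4, n = 7) are assumptions made for illustration and do not come from the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, n, m, N):
    """Exact pmf f(x; n, m, N) as a Fraction: x successes in n draws without
    replacement from a population of N that contains m successes."""
    # math.comb(a, k) already returns 0 for k > a, but it raises ValueError
    # on negative arguments, so values outside the support are handled first.
    if x < max(0, n + m - N) or x > min(m, n):
        return Fraction(0)
    return Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))


# Hypothetical example: N = 10 items, m = 4 successes, n = 7 draws.
# The support runs from max(0, 7 + 4 - 10) = 1 to min(4, 7) = 4.
for x in range(0, 8):
    print(x, hypergeom_pmf(x, n=7, m=4, N=10))
```

With these parameters the values at x = 0 and at x = 5, 6, 7 come out as 0, matching the restricted support described above.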

Probability Mass Function

We first check that f(x) is a valid pmf. This requires that it is non-negative everywhere and that its values sum to 1. The first condition holds because every binomial coefficient is non-negative. For the second condition we start with Vandermonde's identity

\sum_{x = 0}^{n} (\binom{a}{x}) (\binom{b}{n - x}) = (\binom{a + b}{n})

Dividing both sides by $(\binom{a + b}{n})$ gives

\sum_{x = 0}^{n} \frac{(\binom{a}{x}) (\binom{b}{n - x})}{(\binom{a + b}{n})} = 1

Setting a=m and b=N-m turns the summand into f(x), so the condition is satisfied.
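
The normalization can also be spot-checked numerically. The sketch below sums the pmf with exact rational arithmetic for a few arbitrarily chosen parameter triples; the helper name `pmf_total` and the triples themselves are illustrative assumptions, and a handful of cases is of course not a proof.

```python
from fractions import Fraction
from math import comb


def pmf_total(n, m, N):
    """Sum f(x; n, m, N) over x = 0..n using exact rational arithmetic."""
    # math.comb(a, k) returns 0 when k > a, so terms outside the support vanish.
    return sum(
        Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    )


# Arbitrary illustrative choices of (n, m, N), including two edge cases.
for n, m, N in [(5, 3, 12), (7, 4, 10), (0, 2, 6), (6, 6, 6)]:
    print((n, m, N), pmf_total(n, m, N))
```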

Mean

We derive the mean as follows:

E [X] = \sum_{x = 0}^{n} x \cdot f (x; n, m, N) = \sum_{x = 0}^{n} x \cdot \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N}{n})}

E [X] = 0 \cdot \frac{(\binom{m}{0}) (\binom{N - m}{n - 0})}{(\binom{N}{n})} + \sum_{x = 1}^{n} x \cdot \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N}{n})}

We use the identity $(\binom{a}{b}) = \frac{a}{b} (\binom{a - 1}{b - 1})$ in the denominator.

E [X] = 0 + \sum_{x = 1}^{n} x \cdot \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{\frac{N}{n} (\binom{N - 1}{n - 1})}

E [X] = \frac{n}{N} \sum_{x = 1}^{n} x \cdot \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N - 1}{n - 1})}

Next we use the identity $b (\binom{a}{b}) = a (\binom{a - 1}{b - 1})$ in the first binomial of the numerator.

E [X] = \frac{n}{N} \sum_{x = 1}^{n} \frac{m (\binom{m - 1}{x - 1}) (\binom{N - m}{n - x})}{(\binom{N - 1}{n - 1})}

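Next, for the variables inside the sum we define corresponding prime variables that are one less. So $x^{'} = x - 1$, $m^{'} = m - 1$, $n^{'} = n - 1$, $N^{'} = N - 1$. Note that $N^{'} - m^{'} = N - m$ and $n^{'} - x^{'} = n - x$, so the second binomial in the numerator is unchanged.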

E [X] = \frac{m n}{N} \sum_{x^{'} = 0}^{n^{'}} \frac{(\binom{m^{'}}{x^{'}}) (\binom{N^{'} - m^{'}}{n^{'} - x^{'}})}{(\binom{N^{'}}{n^{'}})}

E [X] = \frac{m n}{N} \sum_{x^{'} = 0}^{n^{'}} f (x^{'}; n^{'}, m^{'}, N^{'})

Now we see that the sum is the total sum over a hypergeometric pmf with the modified parameters, which is equal to 1. Therefore

E [X] = \frac{n m}{N}
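
As a sanity check on this result, the defining sum can be evaluated directly and compared with n m / N. This is a minimal sketch; the function name `hypergeom_mean_by_sum` and the parameter values are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_mean_by_sum(n, m, N):
    """E[X] computed directly as the sum of x * f(x; n, m, N) for x = 0..n."""
    return sum(
        x * Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    )


# Illustrative parameters (not from the text): n = 5 draws, m = 3, N = 12.
n, m, N = 5, 3, 12
print(hypergeom_mean_by_sum(n, m, N), Fraction(n * m, N))
```

Both printed values should agree; exact fractions are used so that the comparison is not affected by floating-point rounding.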

Variance

We first determine $E [X^{2}]$.

E [X^{2}] = \sum_{x = 0}^{n} f (x; n, m, N) \cdot x^{2} = \sum_{x = 0}^{n} \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N}{n})} \cdot x^{2}

E [X^{2}] = \frac{(\binom{m}{0}) (\binom{N - m}{n - 0})}{(\binom{N}{n})} \cdot 0^{2} + \sum_{x = 1}^{n} \frac{(\binom{m}{x}) (\binom{N - m}{n - x})}{(\binom{N}{n})} \cdot x^{2}

Applying the same two binomial identities as in the derivation of the mean absorbs one of the two factors of x:

E [X^{2}] = 0 + \sum_{x = 1}^{n} \frac{m (\binom{m - 1}{x - 1}) (\binom{N - m}{n - x})}{\frac{N}{n} (\binom{N - 1}{n - 1})} \cdot x

E [X^{2}] = \frac{m n}{N} \sum_{x = 1}^{n} \frac{(\binom{m - 1}{x - 1}) (\binom{N - m}{n - x})}{(\binom{N - 1}{n - 1})} \cdot x

We use the same variable substitution as when deriving the mean.

E [X^{2}] = \frac{m n}{N} \sum_{x^{'} = 0}^{n^{'}} \frac{(\binom{m^{'}}{x^{'}}) (\binom{N^{'} - m^{'}}{n^{'} - x^{'}})}{(\binom{N^{'}}{n^{'}})} (x^{'} + 1)

E [X^{2}] = \frac{m n}{N} [\sum_{x^{'} = 0}^{n^{'}} \frac{(\binom{m^{'}}{x^{'}}) (\binom{N^{'} - m^{'}}{n^{'} - x^{'}})}{(\binom{N^{'}}{n^{'}})} x^{'} + \sum_{x^{'} = 0}^{n^{'}} \frac{(\binom{m^{'}}{x^{'}}) (\binom{N^{'} - m^{'}}{n^{'} - x^{'}})}{(\binom{N^{'}}{n^{'}})}]

The first sum is the expected value of a hypergeometric random variable with parameters (n',m',N'). The second sum is the total sum of that random variable's pmf, which equals 1.

E [X^{2}] = \frac{m n}{N} [\frac{n^{'} m^{'}}{N^{'}} + 1]

E [X^{2}] = \frac{m n}{N} [\frac{(n - 1) (m - 1)}{(N - 1)} + 1] = \frac{m n}{N} [\frac{(n - 1) (m - 1) + (N - 1)}{(N - 1)}]

We then solve for the variance

Var (X) = E [X^{2}] - (E [X])^{2}

Var (X) = \frac{m n}{N} [\frac{(n - 1) (m - 1) + (N - 1)}{(N - 1)}] - {(\frac{m n}{N})}^{2}

Var (X) = \frac{N m n}{N^{2}} [\frac{(n - 1) (m - 1) + (N - 1)}{(N - 1)}] - \frac{(N - 1) (m n)^{2}}{(N - 1) N^{2}}

Placing both terms over the common denominator $N^{2} (N - 1)$ and factoring out m n gives

Var (X) = \frac{m n [N ((n - 1) (m - 1) + (N - 1)) - (N - 1) m n]}{N^{2} (N - 1)}

The bracket expands to $N^{2} - N n - N m + m n = (N - n) (N - m)$, so

Var (X) = \frac{n m (N - n) (N - m)}{N^{2} (N - 1)}

or, equivalently,

Var (X) = \frac{n m}{N} (1 - \frac{n}{N}) (1 - \frac{m - 1}{N - 1})
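
The two closed forms can be compared against a direct computation of E[X²] - (E[X])². The following is a sketch under assumed names (`hypergeom_variance_by_sum`, `hypergeom_variance_closed`, `hypergeom_variance_factored`) and assumed example parameters; it checks one case rather than proving the identity.

```python
from fractions import Fraction
from math import comb


def hypergeom_variance_by_sum(n, m, N):
    """Var(X) = E[X^2] - (E[X])^2, with both moments summed over x = 0..n."""
    pmf = [
        Fraction(comb(m, x) * comb(N - m, n - x), comb(N, n))
        for x in range(n + 1)
    ]
    mean = sum(x * p for x, p in enumerate(pmf))
    second_moment = sum(x * x * p for x, p in enumerate(pmf))
    return second_moment - mean ** 2


def hypergeom_variance_closed(n, m, N):
    """Closed form n m (N - n)(N - m) / (N^2 (N - 1)); requires N > 1."""
    return Fraction(n * m * (N - n) * (N - m), N * N * (N - 1))


def hypergeom_variance_factored(n, m, N):
    """Factored form (n m / N)(1 - n/N)(1 - (m - 1)/(N - 1)); requires N > 1."""
    return Fraction(n * m, N) * (1 - Fraction(n, N)) * (1 - Fraction(m - 1, N - 1))


# Illustrative parameters (not from the text): n = 5 draws, m = 3, N = 12.
n, m, N = 5, 3, 12
print(hypergeom_variance_by_sum(n, m, N))
print(hypergeom_variance_closed(n, m, N))
print(hypergeom_variance_factored(n, m, N))
```

All three printed fractions should be identical for any valid parameters with N > 1.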
