Statistics/Point Estimation
Introduction
Usually, a random variable resulting from a random experiment is *assumed* to follow a certain distribution with an unknown (but *fixed*) parameter (vector) $\theta$ [1] [2] (the number of components $k$ is a positive integer, and its value depends on the distribution), taking value in a set $\Theta$, called the parameter space. For example, suppose the random variable is assumed to follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$. Then, in this case, the parameter vector $(\mu, \sigma^2)$ is unknown, and the parameter space is $\Theta = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma^2 > 0\}$. It is often useful to *estimate* those unknown parameters in some way to "understand" the random variable better. We would like to make sure the estimation is "good" [3] enough, so that the understanding is more accurate.
Intuitively, the (realization of the) *random sample* should be useful. Indeed, the estimators introduced in this chapter are all based on the random sample in some sense, and this is what *point estimators* mean. To be more precise, let us define *point estimator* and *point estimate*. In the following, we will introduce two well-known point estimators, which are actually quite "good", namely the *maximum likelihood estimator* and the *method of moments estimator*.
Maximum likelihood estimator (MLE)
As suggested by the name of this estimator, it is the estimator that *maximizes* some kind of "likelihood". Now, we would like to know what "likelihood" we should maximize to estimate the unknown parameter(s) (in a "good" way). Also, as mentioned in the introduction section, the estimator is based on the random sample in some sense. Hence, this "likelihood" should also be based on the random sample in some sense.
To motivate the definition of the maximum likelihood estimator, consider a coin-flipping experiment in which we observe particular realizations of the flips. Intuitively, with these particular realizations (fixed), we would like to find a value of the unknown parameter that maximizes the probability of obtaining them, i.e., makes the realizations obtained the ones that are "most probable" or "with maximum likelihood". With this motivation, we can formally define the likelihood function and the MLE, and then find the MLE of the unknown parameter in the coin-flipping example. Sometimes, a constraint is imposed on the parameter when we are finding its MLE; the MLE of the parameter in this case is called a *restricted* MLE. To find the MLE, we sometimes use methods other than the derivative test, and then we do not need to find the log-likelihood function. The same ideas also apply when we find the MLE of a parameter vector.
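As a concrete illustration, here is a minimal numerical sketch (the data, seed, and grid search are illustrative assumptions, not taken from the original example): for $n$ independent coin flips with unknown probability of heads $p$, maximizing the log-likelihood over a grid recovers the closed-form MLE $\hat p = \bar{x}$ given by the derivative test.

```python
# Minimal sketch: MLE of the heads probability p from simulated coin flips.
# The data-generating value 0.7, the seed, and the grid are assumptions.
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=100)  # 100 Bernoulli(0.7) flips; 1 = heads

def log_likelihood(p, x):
    # ln L(p; x) = sum_i [x_i ln p + (1 - x_i) ln(1 - p)]
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

grid = np.linspace(0.001, 0.999, 999)  # grid over the parameter space (0, 1)
mle_numeric = grid[np.argmax([log_likelihood(p, data) for p in grid])]
mle_closed_form = data.mean()          # derivative test: p-hat = sample mean

print(mle_numeric, mle_closed_form)    # agree up to the grid resolution
```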
Method of moments estimator (MME)
For maximum likelihood estimation, we need to utilize the likelihood function, which is found from the joint pmf or pdf of the random sample from a distribution. However, we may not know exactly the pmf or pdf of the distribution in practice. Instead, we may just know some information about the distribution, e.g. its mean, variance, and some moments (the $k$th moment of a random variable $X$ is $\mathbb{E}[X^k]$; we denote it by $\mu_k$ for simplicity). Such moments often contain information about the unknown parameter. For example, for a normal distribution $\mathcal{N}(\mu, \sigma^2)$, we know that $\mu_1 = \mu$ and $\mu_2 = \sigma^2 + \mu^2$. Because of this, when we want to estimate the parameters, we can do this through estimating the moments.
Now, we would like to know how to estimate the moments. We let $M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ be the $k$th *sample moment* [4], where the $X_i$'s are independent and identically distributed. By the *weak law of large numbers* (assuming the conditions are satisfied), we have
- $M_1 = \overline{X} \overset{p}{\to} \mu_1$ as $n \to \infty$.
In general, we have $M_k \overset{p}{\to} \mu_k$, since the conditions are still satisfied after replacing the "$X_i$" by "$X_i^k$" in the weak law of large numbers, and so we can still apply it.
Because of these results, we can estimate the $k$th moment $\mu_k$ using the $k$th sample moment $M_k$, and the estimation is "better" when $n$ is large. For example, in the above normal distribution example, we can estimate $\mu_1 = \mu$ by $M_1$ and $\mu_2 = \sigma^2 + \mu^2$ by $M_2$, and the resulting estimators of $\mu$ and $\sigma^2$ are actually called the *method of moments estimators*.
To be more precise, we have the following definition of the *method of moments estimator*: it is obtained by equating the first sample moments to the corresponding (population) moments, and solving the resulting system of equations for the unknown parameters.
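For instance, here is a small sketch of this recipe (the true parameter values and the seed are assumptions): solving the two moment equations $M_1 = \mu$ and $M_2 = \sigma^2 + \mu^2$ gives $\hat\mu = M_1$ and $\hat\sigma^2 = M_2 - M_1^2$.

```python
# Minimal sketch: method of moments estimates for N(mu, sigma^2).
# The true values mu = 2, sigma = 3 and the seed are assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)

M1 = np.mean(x)       # first sample moment, estimates mu_1 = mu
M2 = np.mean(x ** 2)  # second sample moment, estimates mu_2 = sigma^2 + mu^2

mu_mme = M1
sigma2_mme = M2 - M1 ** 2  # solve the moment equations for sigma^2
print(mu_mme, sigma2_mme)  # close to 2 and 9 for this sample size
```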
Properties of estimator
In this section, we will introduce some criteria for evaluating how "good" a point estimator is, namely *unbiasedness*, *efficiency* and *consistency*.
Unbiasedness
For $\hat\theta$ to be a "good" estimator of a parameter $\theta$, a desirable property of $\hat\theta$ is that its expected value equals the value of the parameter $\theta$, or is at least close to that value. Because of this, we introduce a value, namely the *bias* $\operatorname{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta$, to measure how close the mean of $\hat\theta$ is to $\theta$; an estimator with zero bias is called *unbiased*. We will also define some terms related to the bias.
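As an illustration (the normal setup, sample size, and seed below are assumptions), a quick simulation contrasts a biased and an unbiased estimator of $\sigma^2$: dividing the sum of squared deviations by $n$ gives expected value $\frac{n-1}{n}\sigma^2$, while dividing by $n - 1$ gives expected value $\sigma^2$.

```python
# Illustrative sketch: bias of the "divide by n" sample variance vs. the
# unbiased "divide by n - 1" version. The setup below is an assumption.
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, reps = 5, 4.0, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
biased = x.var(axis=1)             # divides by n
unbiased = x.var(axis=1, ddof=1)   # divides by n - 1

print(biased.mean())    # about (n - 1)/n * sigma2 = 3.2, i.e., bias about -0.8
print(unbiased.mean())  # about 4.0, i.e., bias about 0
```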
Efficiency
We have discussed how to evaluate the unbiasedness of estimators. Now, if we are given two unbiased estimators, how should we compare their goodness? Their goodness is the same if we are only comparing them in terms of unbiasedness. Therefore, we need another criterion in this case. One possible way is to compare their *variances*, and the one with the smaller variance is better, since on average that estimator deviates less from its mean, which is the value of the unknown parameter by the definition of an unbiased estimator; thus the one with the smaller variance is more accurate in some deviation sense. Indeed, an unbiased estimator can still have a large variance, and thus deviate a lot from its mean. Such an estimator is unbiased because the positive deviations and negative deviations somehow cancel each other out. This is the idea of *efficiency*.
Actually, for the variance of an unbiased estimator, since the mean of the unbiased estimator is the unknown parameter $\theta$, the variance measures the mean of the squared deviation from $\theta$, and we have a specific term for this quantity, namely the *mean squared error* (MSE): $\operatorname{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big]$. Notice that in the definition of the MSE, we do not require $\hat\theta$ to be an unbiased estimator. Thus, $\hat\theta$ in the definition may be biased. We have mentioned that when $\hat\theta$ is unbiased, its variance is actually its MSE. In the following, we will give a more general relationship between $\operatorname{MSE}(\hat\theta)$ and $\operatorname{Var}(\hat\theta)$, not just for unbiased estimators: $\operatorname{MSE}(\hat\theta) = \operatorname{Var}(\hat\theta) + \big(\operatorname{Bias}(\hat\theta)\big)^2$.
Proof. By definition, we have $\operatorname{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big]$ and $\operatorname{Var}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]$. From these, we are motivated to write
$\operatorname{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\big] = \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big] + 2\big(\mathbb{E}[\hat\theta] - \theta\big)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] + \big(\mathbb{E}[\hat\theta] - \theta\big)^2 = \operatorname{Var}(\hat\theta) + \big(\operatorname{Bias}(\hat\theta)\big)^2,$
since $\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big] = 0$, as desired.
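A quick numerical check of this decomposition (the setup is an assumption, reusing the biased "divide by $n$" variance estimator from above):

```python
# Sketch: verify MSE = Var + Bias^2 by simulation for the "divide by n"
# variance estimator under N(0, 4) sampling. The setup is an assumption.
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 5, 4.0, 500_000

est = rng.normal(0.0, 2.0, size=(reps, n)).var(axis=1)

mse = np.mean((est - sigma2) ** 2)
var_plus_bias2 = est.var() + (est.mean() - sigma2) ** 2
print(mse, var_plus_bias2)  # the two printed values agree
```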
Another simple but useful observation is the following proposition: $\operatorname{MSE}(\hat\theta) = 0$ if and only if $\operatorname{Var}(\hat\theta) = 0$ and $\operatorname{Bias}(\hat\theta) = 0$.
Proof.
- "if" part is simple. Assume and . Then, .
- "only if" part: we can use proof by contrapositive, i.e., proving that if Template:Colored em , then .
- Case 1: when , it means since the variance is nonnegative. Also, . It follows that , i.e., the MSE does not equal zero.
- Case 2: when , it means . Also, . It follows that , i.e., the MSE does not equal zero.
Uniformly minimum-variance unbiased estimator
Now, we know that the smaller the variance of an unbiased estimator, the more efficient (and "better") it is. Thus, it is natural that we want to know the *most* efficient (i.e., the "best") unbiased estimator, i.e., the unbiased estimator with the smallest variance. We have a specific name for such an unbiased estimator, namely the *uniformly minimum-variance unbiased estimator* (UMVUE) [5]. Indeed, the UMVUE is *unique*, i.e., there is exactly one unbiased estimator with the smallest variance among all unbiased estimators, and we will prove this in the following.
Proof. Assume that $\hat\theta_1$ is an UMVUE of $\theta$, and $\hat\theta_2$ is another UMVUE of $\theta$. Define the estimator $\hat\theta_3 = \frac{1}{2}\big(\hat\theta_1 + \hat\theta_2\big)$. Since $\mathbb{E}[\hat\theta_3] = \frac{1}{2}\big(\mathbb{E}[\hat\theta_1] + \mathbb{E}[\hat\theta_2]\big) = \theta$, $\hat\theta_3$ is an unbiased estimator of $\theta$.
Now, we consider the variance of $\hat\theta_3$:
$\operatorname{Var}(\hat\theta_3) = \tfrac{1}{4}\operatorname{Var}(\hat\theta_1) + \tfrac{1}{4}\operatorname{Var}(\hat\theta_2) + \tfrac{1}{2}\operatorname{Cov}(\hat\theta_1, \hat\theta_2) \le \tfrac{1}{4}\operatorname{Var}(\hat\theta_1) + \tfrac{1}{4}\operatorname{Var}(\hat\theta_2) + \tfrac{1}{2}\sqrt{\operatorname{Var}(\hat\theta_1)\operatorname{Var}(\hat\theta_2)} = \operatorname{Var}(\hat\theta_1),$
using the covariance inequality and $\operatorname{Var}(\hat\theta_1) = \operatorname{Var}(\hat\theta_2)$. Thus, we now have either $\operatorname{Var}(\hat\theta_3) < \operatorname{Var}(\hat\theta_1)$ or $\operatorname{Var}(\hat\theta_3) = \operatorname{Var}(\hat\theta_1)$. If the former is true, then $\hat\theta_1$ is *not* an UMVUE of $\theta$ by definition, since we can find another unbiased estimator, namely $\hat\theta_3$, with smaller variance than it. Hence, we must have the latter, i.e., $\operatorname{Var}(\hat\theta_3) = \operatorname{Var}(\hat\theta_1)$. This implies that when we apply the covariance inequality, the equality holds, i.e., $\operatorname{Cov}(\hat\theta_1, \hat\theta_2) = \sqrt{\operatorname{Var}(\hat\theta_1)\operatorname{Var}(\hat\theta_2)}$, which means $\hat\theta_2$ is increasing linearly with $\hat\theta_1$, i.e., we can write $\hat\theta_2 = a\hat\theta_1 + b$ for some constants $a > 0$ and $b$.
Now, we consider the covariance $\operatorname{Cov}(\hat\theta_1, \hat\theta_2)$. On one hand, $\operatorname{Cov}(\hat\theta_1, \hat\theta_2) = \operatorname{Cov}(\hat\theta_1, a\hat\theta_1 + b) = a\operatorname{Var}(\hat\theta_1)$. On the other hand, since the equality holds in the covariance inequality, and $\operatorname{Var}(\hat\theta_1) = \operatorname{Var}(\hat\theta_2)$ (since they are both UMVUE), $\operatorname{Cov}(\hat\theta_1, \hat\theta_2) = \operatorname{Var}(\hat\theta_1)$. Thus, we have $a = 1$.
It remains to show that $b = 0$ to prove that $\hat\theta_1 = \hat\theta_2$, and therefore conclude that the UMVUE is *unique*.
From above, we currently have $\hat\theta_2 = \hat\theta_1 + b$; taking expectations on both sides gives $\theta = \theta + b$, and hence $b = 0$, as desired.
Cramer-Rao lower bound
Without using some results, it is quite difficult to determine the UMVUE, since there are many (perhaps even infinitely many) possible unbiased estimators, so it is quite hard to ensure that one particular unbiased estimator is more efficient than every other possible unbiased estimator.
Therefore, we will introduce some approaches that help us find the UMVUE. For the first approach, we find a *lower bound* [6] on the variances of all possible unbiased estimators. After getting such a lower bound, if we can find an unbiased estimator whose variance is exactly equal to the lower bound, then the lower bound is the minimum value of the variances, and hence such an unbiased estimator is an UMVUE by definition. A common way to find such a lower bound is to use the *Cramer-Rao lower bound* (CRLB), which we get through the *Cramer-Rao inequality*. Before stating the inequality, let us define some related terms, in particular the *Fisher information*. The regularity conditions which allow the interchange of derivative and integral include:
- the partial derivatives involved should exist, i.e., the natural log of the functions involved should be differentiable
- the integrals involved should be differentiable
- the support does not depend on the parameter(s) involved
We have some results that assist us in computing the Fisher information: for a single observation $X$ with pdf or pmf $f(x;\theta)$, the Fisher information is $I(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\ln f(X;\theta)\right]$, and for a random sample of size $n$, $I_n(\theta) = nI(\theta)$.
Proof. The result $I_n(\theta) = nI(\theta)$ follows since the log-likelihood of a random sample is the sum of $n$ independent and identically distributed terms. For the other equality, it suffices to prove that $\mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\ln f(X;\theta)\right] = -\mathbb{E}\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right]$, which is true since
$\frac{\partial^2}{\partial\theta^2}\ln f = \frac{\partial}{\partial\theta}\left(\frac{\partial f/\partial\theta}{f}\right) = \frac{\partial^2 f/\partial\theta^2}{f} - \left(\frac{\partial f/\partial\theta}{f}\right)^2 = \frac{\partial^2 f/\partial\theta^2}{f} - \left(\frac{\partial}{\partial\theta}\ln f\right)^2,$
and, under the regularity conditions, $\mathbb{E}\left[\frac{\partial^2 f(X;\theta)/\partial\theta^2}{f(X;\theta)}\right] = \int \frac{\partial^2 f/\partial\theta^2}{f}\, f\,dx = \frac{\partial^2}{\partial\theta^2}\int f\,dx = \frac{\partial^2}{\partial\theta^2}(1) = 0.$
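To make this concrete, here is a small symbolic sketch (the Bernoulli example is an assumption) checking that the two expressions for the Fisher information agree:

```python
# Sketch: for the Bernoulli(p) pmf f(x; p) = p^x (1 - p)^(1 - x), x in {0, 1},
# check E[(d/dp ln f)^2] = -E[d^2/dp^2 ln f] = 1 / (p (1 - p)).
import sympy as sp

p, x = sp.symbols('p x')
log_f = x * sp.log(p) + (1 - x) * sp.log(1 - p)

def E(expr):
    # expectation over x with P(X = 1) = p and P(X = 0) = 1 - p
    return sp.simplify(p * expr.subs(x, 1) + (1 - p) * expr.subs(x, 0))

info_from_score = E(sp.diff(log_f, p) ** 2)
info_from_curvature = -E(sp.diff(log_f, p, 2))
print(info_from_score, info_from_curvature)  # both simplify to 1/(p*(1 - p))
```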
With the Fisher information in hand, we can state the *Cramer-Rao inequality*: under the regularity conditions, every unbiased estimator $\hat\theta$ of $\theta$ satisfies $\operatorname{Var}(\hat\theta) \ge \frac{1}{I_n(\theta)}$, and the right-hand side is the CRLB.
Proof. Since $\hat\theta$ is an unbiased estimator of $\theta$, we have by definition $\mathbb{E}[\hat\theta] = \theta$. By the definition of expectation, we have $\theta = \int\cdots\int \hat\theta\, \mathcal{L}(\theta; x_1,\dots,x_n)\, dx_1\cdots dx_n$, where $\mathcal{L}$ is the likelihood function. Thus, differentiating both sides with respect to $\theta$ (interchanging derivative and integral by the regularity conditions),
$1 = \int\cdots\int \hat\theta\, \frac{\partial \ln \mathcal{L}}{\partial\theta}\, \mathcal{L}\, dx_1\cdots dx_n = \mathbb{E}\left[\hat\theta\, \frac{\partial \ln\mathcal{L}}{\partial\theta}\right] = \operatorname{Cov}\left(\hat\theta, \frac{\partial \ln\mathcal{L}}{\partial\theta}\right)$
(since $\mathbb{E}\left[\frac{\partial\ln\mathcal{L}}{\partial\theta}\right] = 0$ by the remark about the Fisher information). Consider the covariance inequality: $\left(\operatorname{Cov}\left(\hat\theta, \frac{\partial\ln\mathcal{L}}{\partial\theta}\right)\right)^2 \le \operatorname{Var}(\hat\theta)\operatorname{Var}\left(\frac{\partial\ln\mathcal{L}}{\partial\theta}\right)$. We have $\operatorname{Var}\left(\frac{\partial\ln\mathcal{L}}{\partial\theta}\right) = I_n(\theta)$ (by the remark about the Fisher information), and hence $1 \le \operatorname{Var}(\hat\theta)\, I_n(\theta)$, i.e., $\operatorname{Var}(\hat\theta) \ge \frac{1}{I_n(\theta)}$.
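For example, here is a simulation sketch (the Bernoulli setup is an assumption) of an unbiased estimator that attains the CRLB:

```python
# Sketch: for Bernoulli(p), I(p) = 1/(p(1 - p)), so the CRLB for unbiased
# estimators is p(1 - p)/n; the sample mean attains it. Setup is assumed.
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 20, 0.3, 200_000

xbar = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

crlb = p * (1 - p) / n      # 1 / (n I(p))
print(xbar.var(), crlb)     # both about 0.0105, so X-bar is an UMVUE here
```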
Sometimes, we cannot use the CRLB method for finding the UMVUE, because
- the regularity conditions may not be satisfied, and thus we cannot use the Cramer-Rao inequality, and
- the variance of the unbiased estimator may not be equal to the CRLB, but we cannot conclude that it is not an UMVUE, because it may be the case that the CRLB is not attainable at all, and the smallest variance among all unbiased estimators is actually the variance of that estimator, which is larger than the CRLB.
We will illustrate some examples for these two cases in the following. Since the CRLB is sometimes attainable and sometimes not, it is natural to ask *when* the CRLB can be attained. In other words, we would like to know the *attainment conditions* for the CRLB, which are stated in the following corollary: under the regularity conditions, the variance of an unbiased estimator $\hat\theta$ attains the CRLB if and only if $\frac{\partial \ln\mathcal{L}}{\partial\theta} = a(\theta)\,\big(\hat\theta - \theta\big)$ for some function $a(\theta)$.
Proof. Considering the proof for the Cramer-Rao inequality, the CRLB is attained exactly when equality holds in the covariance inequality there. We can write $\operatorname{Cov}\left(\hat\theta, \frac{\partial\ln\mathcal{L}}{\partial\theta}\right) = \rho\,\sqrt{\operatorname{Var}(\hat\theta)\operatorname{Var}\left(\frac{\partial\ln\mathcal{L}}{\partial\theta}\right)}$ (by the result about covariance), where $\rho$ is the correlation coefficient between the two random variables. Thus, equality holds if and only if $|\rho| = 1$. This means $\frac{\partial\ln\mathcal{L}}{\partial\theta}$ increases or decreases linearly with $\hat\theta$, i.e., $\frac{\partial\ln\mathcal{L}}{\partial\theta} = a(\theta)\big(\hat\theta - \theta\big) + c$ for some constants $a(\theta)$ and $c$ not depending on the sample. Now, it suffices to show that the constant $c$ is actually zero.
We know that $\mathbb{E}[\hat\theta] = \theta$ (since $\hat\theta$ is an unbiased estimator of $\theta$), and $\mathbb{E}\left[\frac{\partial\ln\mathcal{L}}{\partial\theta}\right] = 0$ (from the remark about the Fisher information). Thus, applying expectations on both sides gives $0 = a(\theta)\,(\theta - \theta) + c = c$. Then, the result follows.
We have discussed the MLE previously, and the MLE is actually a "best choice" asymptotically (i.e., as the sample size $n \to \infty$) according to the following theorem: under the regularity conditions, if $\hat\theta_n$ is the MLE based on a random sample of size $n$, then $\sqrt{n}\,\big(\hat\theta_n - \theta\big) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{I(\theta)}\right)$, i.e., the MLE is asymptotically unbiased and its asymptotic variance matches the CRLB.
Proof. *Sketch*: we consider the Taylor series of order 2 for $\ell'(\hat\theta_n)$, the derivative of the log-likelihood function, about $\theta$, and we will get
$\ell'(\hat\theta_n) = \ell'(\theta) + \ell''(\theta)\,(\hat\theta_n - \theta) + \tfrac{1}{2}\,\ell'''(\theta^*)\,(\hat\theta_n - \theta)^2,$
where $\theta^*$ is between $\hat\theta_n$ and $\theta$. Since $\hat\theta_n$ is the MLE of $\theta$, from the derivative test, we know that $\ell'(\hat\theta_n) = 0$ (we apply regularity conditions to ensure the existence of this derivative). Hence, we have
$\sqrt{n}\,(\hat\theta_n - \theta) = \frac{\ell'(\theta)/\sqrt{n}}{-\ell''(\theta)/n - \ell'''(\theta^*)\,(\hat\theta_n - \theta)/(2n)}.$
By the central limit theorem, $\ell'(\theta)/\sqrt{n} \overset{d}{\to} \mathcal{N}\big(0, I(\theta)\big)$. Furthermore, we apply the weak law of large numbers to show that $-\ell''(\theta)/n \overset{p}{\to} I(\theta)$. It can be shown in a quite complicated way (and using regularity conditions) that $\ell'''(\theta^*)\,(\hat\theta_n - \theta)/(2n) \overset{p}{\to} 0$. Combining the latter two results, using the property of convergence in probability, the denominator converges in probability to $I(\theta)$. Then, applying Slutsky's theorem to the numerator and the denominator, we have $\sqrt{n}\,(\hat\theta_n - \theta) \overset{d}{\to} \frac{Z}{I(\theta)}$, where $Z \sim \mathcal{N}\big(0, I(\theta)\big)$, and hence $\frac{Z}{I(\theta)} \sim \mathcal{N}\left(0, \frac{I(\theta)}{I(\theta)^2}\right) = \mathcal{N}\left(0, \frac{1}{I(\theta)}\right)$. This means $\sqrt{n}\,(\hat\theta_n - \theta) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{I(\theta)}\right)$, as desired.
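A simulation sketch of this theorem (the Bernoulli setup is an assumption; there the MLE is the sample mean and $1/I(p) = p(1-p)$):

```python
# Sketch: sqrt(n) (p-hat - p) is approximately N(0, p(1 - p)) for large n,
# where p-hat is the Bernoulli MLE (the sample mean). Setup is assumed.
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 500, 0.3, 20_000

p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (p_hat - p)

print(z.mean(), z.var(), p * (1 - p))  # mean about 0, variance about 0.21
```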
Since we are not able to use the CRLB to find the UMVUE in some situations, we will introduce another method to find the UMVUE in the following, which uses the concepts of *sufficiency* and *completeness*.
Sufficiency
Intuitively, a *sufficient statistic* $T = T(X_1, \dots, X_n)$, which is a function of a given random sample $X_1, \dots, X_n$, contains all the information needed for estimating the unknown parameter (vector) $\theta$. Thus, the statistic itself is "sufficient" for estimating the unknown parameter (vector) $\theta$.
Formally, $T$ is a *sufficient statistic* for $\theta$ if the conditional distribution of the random sample, given $T$, does not depend on $\theta$. As a remark, a one-to-one transformation of a sufficient statistic is again a sufficient statistic, and this can be stated formally as a proposition. Now, we discuss a theorem that helps us check the sufficiency of a statistic, namely the (Fisher-Neyman) *factorization theorem*: $T$ is a sufficient statistic for $\theta$ if and only if the joint pmf or pdf of the random sample can be written as $f(\mathbf{x};\theta) = g\big(T(\mathbf{x});\theta\big)\,h(\mathbf{x})$, where $g$ depends on the data only through $T(\mathbf{x})$ and $h$ does not depend on $\theta$.
Proof. Since the proof for the continuous case is quite complicated, we will only give a proof for the discrete case. For simplicity of presentation, let $\mathbf{X} = (X_1, \dots, X_n)$, $\mathbf{x} = (x_1, \dots, x_n)$, $T = T(\mathbf{X})$, and $t = T(\mathbf{x})$. By definition, $f(\mathbf{x};\theta) = \mathbb{P}(\mathbf{X} = \mathbf{x})$. Also, we have $\{\mathbf{X} = \mathbf{x}\} \subseteq \{T = t\}$, so $\mathbb{P}(\mathbf{X} = \mathbf{x}) = \mathbb{P}(\mathbf{X} = \mathbf{x}, T = t)$. Thus, we can write $f(\mathbf{x};\theta) = \mathbb{P}(T = t)\,\mathbb{P}(\mathbf{X} = \mathbf{x} \mid T = t)$.
"only if" ($\Rightarrow$) direction: Assume $T$ is a sufficient statistic. Then, we choose $g(t;\theta) = \mathbb{P}(T = t)$ and $h(\mathbf{x}) = \mathbb{P}(\mathbf{X} = \mathbf{x} \mid T = t)$, which does not depend on $\theta$ by the definition of a sufficient statistic. It remains to verify that the equation actually holds for this choice.
Hence, $f(\mathbf{x};\theta) = \mathbb{P}(T = t)\,\mathbb{P}(\mathbf{X} = \mathbf{x} \mid T = t) = g(t;\theta)\,h(\mathbf{x})$, as required.
"if" ($\Leftarrow$) direction: Assume we can write $f(\mathbf{x};\theta) = g(t;\theta)\,h(\mathbf{x})$. Then, $\mathbb{P}(T = t) = \sum_{\mathbf{y}:\,T(\mathbf{y}) = t} f(\mathbf{y};\theta) = g(t;\theta) \sum_{\mathbf{y}:\,T(\mathbf{y}) = t} h(\mathbf{y})$. Now, we aim to show that $\mathbb{P}(\mathbf{X} = \mathbf{x} \mid T = t)$ does not depend on $\theta$, which means $T$ is a sufficient statistic for $\theta$. We have
$\mathbb{P}(\mathbf{X} = \mathbf{x} \mid T = t) = \frac{\mathbb{P}(\mathbf{X} = \mathbf{x}, T = t)}{\mathbb{P}(T = t)} = \frac{g(t;\theta)\,h(\mathbf{x})}{g(t;\theta) \sum_{\mathbf{y}:\,T(\mathbf{y}) = t} h(\mathbf{y})} = \frac{h(\mathbf{x})}{\sum_{\mathbf{y}:\,T(\mathbf{y}) = t} h(\mathbf{y})},$
which does not depend on $\theta$, as desired.
For some "nice" distributions, which belong to the *exponential family*, sufficient statistics can be found easily and more conveniently using an alternative method. This method works because of the "nice" form of the pdf or pmf of those distributions, which can be characterized as follows: a distribution belongs to the (one-parameter) exponential family if its pdf or pmf can be written in the form $f(x;\theta) = c(\theta)\,h(x)\exp\big(w(\theta)\,t(x)\big)$ for some functions $c$, $h$, $w$ and $t$. In that case, $\sum_{i=1}^n t(X_i)$ is a sufficient statistic for $\theta$.
Proof. Since the distribution belongs to the exponential family, the joint pdf or pmf of $X_1, \dots, X_n$ can be expressed as
$\prod_{i=1}^n f(x_i;\theta) = \big[c(\theta)\big]^n \left[\prod_{i=1}^n h(x_i)\right] \exp\left(w(\theta) \sum_{i=1}^n t(x_i)\right).$
From here, for applying the factorization theorem, we can identify the $g$ part of the function as "$\big[c(\theta)\big]^n \exp\big(w(\theta)\sum_{i=1}^n t(x_i)\big)$", and the $h$ part of the function as "$\prod_{i=1}^n h(x_i)$". We can notice that the $g$ part of the function depends on $\mathbf{x}$ only through $\sum_{i=1}^n t(x_i)$. The result follows.
Now, we will start discussing how sufficient statistics are related to the UMVUE. We begin our discussion with the *Rao-Blackwell theorem*: if $\hat\theta$ is an unbiased estimator of $\theta$ and $T$ is a sufficient statistic for $\theta$, then $\hat\theta^* = \mathbb{E}[\hat\theta \mid T]$ is also an unbiased estimator of $\theta$, and $\operatorname{Var}(\hat\theta^*) \le \operatorname{Var}(\hat\theta)$.
Proof. Assume $\hat\theta$ is an arbitrary unbiased estimator of $\theta$, and $T$ is a sufficient statistic for $\theta$.
First, we prove that $\hat\theta^* = \mathbb{E}[\hat\theta \mid T]$ is an unbiased estimator of $\theta$. Before proving the unbiasedness, we should ensure that $\hat\theta^*$ is actually an estimator, i.e., it is a statistic, which is a function of the random sample, and does not involve $\theta$ (so that it is calculable): since $\hat\theta$ is a function of the random sample, and $T$ is a sufficient statistic, the conditional distribution of $\hat\theta$, given $T$, is *independent* of $\theta$, and so $\hat\theta^*$ does not involve $\theta$. Also, $\hat\theta^*$ is a function of $T$, and thus is also a function of the random sample.
Now, we prove that $\hat\theta^*$ is an unbiased estimator of $\theta$: since $\mathbb{E}[\hat\theta^*] = \mathbb{E}\big[\mathbb{E}[\hat\theta \mid T]\big] = \mathbb{E}[\hat\theta] = \theta$ by the law of total expectation, $\hat\theta^*$ is an unbiased estimator of $\theta$.
Next, we prove that $\operatorname{Var}(\hat\theta^*) \le \operatorname{Var}(\hat\theta)$: by the law of total variance, we have $\operatorname{Var}(\hat\theta) = \mathbb{E}\big[\operatorname{Var}(\hat\theta \mid T)\big] + \operatorname{Var}\big(\mathbb{E}[\hat\theta \mid T]\big) \ge \operatorname{Var}(\hat\theta^*)$, since $\mathbb{E}\big[\operatorname{Var}(\hat\theta \mid T)\big] \ge 0$, as desired.
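A simulation sketch of Rao-Blackwellization (the Bernoulli setup is an assumption): starting from the crude unbiased estimator $X_1$ and conditioning on the sufficient statistic $T = \sum_i X_i$ gives $\mathbb{E}[X_1 \mid T] = T/n$, the sample mean, with much smaller variance.

```python
# Sketch: Rao-Blackwellizing the unbiased estimator X_1 of p using the
# sufficient statistic T = sum(X) for Bernoulli(p). Setup is assumed.
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 10, 0.3, 200_000

x = rng.binomial(1, p, size=(reps, n))
crude = x[:, 0]                      # unbiased, Var = p(1 - p)
rao_blackwellized = x.mean(axis=1)   # E[X_1 | T] = T / n

print(crude.mean(), rao_blackwellized.mean())  # both about p = 0.3
print(crude.var(), rao_blackwellized.var())    # about 0.21 vs about 0.021
```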
To actually determine the UMVUE, we need another theorem, called the *Lehmann-Scheffé theorem*, which is based on the Rao-Blackwell theorem, and requires the concept of *completeness*.
Completeness
A statistic $T$ is *complete* for $\theta$ if, for every function $g$, $\mathbb{E}[g(T)] = 0$ for all $\theta$ implies $\mathbb{P}\big(g(T) = 0\big) = 1$ for all $\theta$. When a random sample is from a distribution in the exponential family, a complete statistic can also be found easily, similar to the case for sufficient statistics: the statistic $\sum_{i=1}^n t(X_i)$ is also complete.
Proof. Omitted.
With completeness in hand, we can state the *Lehmann-Scheffé theorem*: if $T$ is a complete sufficient statistic for $\theta$ and $\hat\theta$ is an unbiased estimator of $\theta$, then $\hat\theta^* = \mathbb{E}[\hat\theta \mid T]$ is the unique UMVUE of $\theta$.
Proof. Assume $T$ is a *complete sufficient statistic* for $\theta$, and $\hat\theta$ is an unbiased estimator of $\theta$.
Since $T$ is a sufficient statistic for $\theta$, we can apply the Rao-Blackwell theorem. From the Rao-Blackwell theorem, if $\hat\theta$ is an arbitrary unbiased estimator of $\theta$, then $\hat\theta^* = \mathbb{E}[\hat\theta \mid T]$ is another unbiased estimator where $\operatorname{Var}(\hat\theta^*) \le \operatorname{Var}(\hat\theta)$.
To prove that $\hat\theta^*$ is the unique UMVUE of $\theta$, we proceed to show that *regardless* of the choice of the unbiased estimator of $\theta$, we get the *same* $\hat\theta^*$ from the Rao-Blackwell theorem (with probability 1). Then, we will have $\operatorname{Var}(\hat\theta^*) \le \operatorname{Var}(\hat\theta)$ for *every* possible unbiased estimator $\hat\theta$ of $\theta$ (with probability 1) [7], which means $\hat\theta^*$ is the UMVUE, and it is also the *unique* UMVUE since we always get the same $\hat\theta^*$ [8].
Assume that $\hat\theta'$ is *another* unbiased estimator of $\theta$ ($\hat\theta' \ne \hat\theta$). By the Rao-Blackwell theorem again, there is an unbiased estimator $\hat\theta^{*\prime} = \mathbb{E}[\hat\theta' \mid T]$ where $\operatorname{Var}(\hat\theta^{*\prime}) \le \operatorname{Var}(\hat\theta')$. Since both $\hat\theta^*$ and $\hat\theta^{*\prime}$ are unbiased estimators of $\theta$, we have, for each $\theta$, $\mathbb{E}[\hat\theta^* - \hat\theta^{*\prime}] = \theta - \theta = 0$. Since $T$ is a complete statistic and $\hat\theta^* - \hat\theta^{*\prime}$ is a function of $T$, we have $\mathbb{P}\big(\hat\theta^* - \hat\theta^{*\prime} = 0\big) = 1$, which means $\hat\theta^* = \hat\theta^{*\prime}$ (with probability 1), i.e., we get the same $\hat\theta^*$ from the Rao-Blackwell theorem in this case (with probability 1).
Consistency
In the previous sections, we have discussed *unbiasedness* and *efficiency*. In this section, we will discuss another property called *consistency*: an estimator $\hat\theta_n$ (based on a sample of size $n$) is a *consistent estimator* of $\theta$ if $\hat\theta_n \overset{p}{\to} \theta$, i.e., for each $\varepsilon > 0$, $\mathbb{P}\big(|\hat\theta_n - \theta| \ge \varepsilon\big) \to 0$ as $n \to \infty$. A useful sufficient condition is the following proposition: if $\hat\theta_n$ is an (asymptotically) unbiased estimator of $\theta$ and $\operatorname{Var}(\hat\theta_n) \to 0$ as $n \to \infty$, then $\hat\theta_n$ is consistent.
Proof. Assume $\hat\theta_n$ is an (asymptotically) unbiased estimator of an unknown parameter $\theta$ and $\operatorname{Var}(\hat\theta_n) \to 0$ as $n \to \infty$. Since $\hat\theta_n$ is an (asymptotically) unbiased estimator of $\theta$, we have $\operatorname{Bias}(\hat\theta_n) = \mathbb{E}[\hat\theta_n] - \theta \to 0$ (this is true for both an asymptotically unbiased estimator and an unbiased estimator of $\theta$). In addition to this, we have by assumption that $\operatorname{Var}(\hat\theta_n) \to 0$. By the decomposition of the mean squared error, these imply that $\operatorname{MSE}(\hat\theta_n) = \operatorname{Var}(\hat\theta_n) + \big(\operatorname{Bias}(\hat\theta_n)\big)^2 \to 0$. Thus, by Chebyshev's inequality (notice that the means and variances involved exist from the above), for each $\varepsilon > 0$,
$\mathbb{P}\big(|\hat\theta_n - \theta| \ge \varepsilon\big) \le \frac{\mathbb{E}\big[(\hat\theta_n - \theta)^2\big]}{\varepsilon^2} = \frac{\operatorname{MSE}(\hat\theta_n)}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$
Since probability is nonnegative, and this probability is less than or equal to an expression that tends to 0 as $n \to \infty$, we conclude that this probability tends to zero as $n \to \infty$. That is, $\hat\theta_n$ is a *consistent estimator* of $\theta$.
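A simulation sketch of consistency (the normal setup is an assumption): the sample mean of $\mathcal{N}(\mu, 1)$ data has bias 0 and variance $1/n \to 0$, so $\mathbb{P}\big(|\overline{X}_n - \mu| \ge \varepsilon\big)$ shrinks as $n$ grows.

```python
# Sketch: P(|sample mean - mu| >= eps) decreases toward 0 as n grows,
# illustrating consistency of the sample mean. Setup is assumed.
import numpy as np

rng = np.random.default_rng(7)
mu, eps, reps = 2.0, 0.1, 10_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) >= eps))  # roughly 0.75, 0.32, 0.002
```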
- ↑ For the parameter vector, it contains all parameters governing the distribution.
- ↑ We will simply write $\theta$ when we do not know whether it is a parameter vector or just a single parameter. We may write $\boldsymbol{\theta}$ instead if we know it is indeed a parameter vector.
- ↑ We will discuss some criteria for being "good" in the #Properties of estimator section.
- ↑ For each positive integer $k$, the sample moment $M_k$ always exists, unlike the (population) moment $\mu_k$.
- ↑ "Uniformly" means that the variance is minimum compared with other unbiased estimators *uniformly in $\theta$* (i.e., for each possible value of $\theta$). That is, the variance is not just minimum for a particular value of $\theta$, but for all possible values of $\theta$.
- ↑ This is different from the minimum value. A *lower bound* only needs to be smaller than (or equal to) all variances involved, and there may not be any variance that actually achieves this lower bound. However, the minimum value has to be one of the values of the variance.
- ↑ Notice that this is a stronger result than the result in the Rao-Blackwell theorem, where the latter only states that $\operatorname{Var}(\hat\theta^*) \le \operatorname{Var}(\hat\theta)$ *for the particular unbiased estimator $\hat\theta$ being conditioned*.
- ↑ Indeed, we know that the UMVUE must be unique from the previous proposition. However, in this argument, when we show that $\hat\theta^*$ is the UMVUE, we also automatically show that it is unique.