Statistics/Hypothesis Testing


Template:Nav

Introduction

In previous chapters, we have discussed two methods for Template:Colored em, namely Template:Colored em and Template:Colored em. Estimating unknown parameters is an important area in statistical inference, and in this chapter we will discuss another important area, namely Template:Colored em, which is related to Template:Colored em. Indeed, the concepts of Template:Colored em and Template:Colored em are closely related, as we will demonstrate.

Basic concepts and terminologies

Before discussing how to Template:Colored em hypothesis testing, and Template:Colored em the "goodness" of a hypothesis test, let us first introduce some basic concepts and terminologies related to hypothesis testing. Template:Colored definition There are two terms that classify hypotheses: Template:Colored definition Sometimes, it is not immediately clear whether a hypothesis is simple or composite. To understand the classification of hypotheses more clearly, let us consider the following example. Template:Colored example In hypothesis tests, we consider two hypotheses: Template:Colored definition Template:Colored remark A general form of $H_0$ and $H_1$ is $H_0\colon \theta \in \Theta_0$ and $H_1\colon \theta \in \Theta_1$, where $\Theta_1 = \Theta_0^c$, which is the Template:Colored em of $\Theta_0$ (with respect to $\Theta$), i.e., $\Theta_0^c = \Theta \setminus \Theta_0$ ($\Theta$ is the parameter space, containing all possible values of $\theta$). The reason for choosing the complement of $\Theta_0$ in $H_1$ is that $H_1$ is the complementary hypothesis to $H_0$, as suggested in the above definition. Template:Colored remark Template:Colored example We have mentioned that exactly one of $H_0$ and $H_1$ is assumed to be true. To make a decision, we need to Template:Colored em which hypothesis should be regarded as true. Of course, as one may expect, this decision is not perfect, and some errors will be involved in our decision. So, we cannot say we "prove" that a particular hypothesis is true (that is, we cannot be Template:Colored em that a particular hypothesis is true). Despite this, we may "regard" (or "accept") a particular hypothesis as true (but Template:Colored em prove it to be true) when we have Template:Colored em that lead us to make this decision (ideally, with small errors [1]). Template:Colored remark Now, we are faced with two questions. First, what evidence should we consider? Second, what is meant by "sufficient"? For the first question, a natural answer is that we should consider the observed Template:Colored em, right? This is because we are making a hypothesis about the population, and the samples are taken from, and thus closely related to, the population, which should help us make the decision.

To answer the second question, we need the concepts in Template:Colored em. In particular, in hypothesis testing, we will construct a so-called Template:Colored em or Template:Colored em to help us determine Template:Colored em we should reject the Template:Colored em hypothesis (i.e., regard $H_0$ as false), and hence (naturally) regard $H_1$ as true ("accept" $H_1$) (we have assumed that exactly one of $H_0$ and $H_1$ is true, so when we regard one of them as false, we should regard the other as true). In particular, when we do Template:Colored em reject $H_0$, we will act as if $H_0$ is true, or accept $H_0$ as true (and thus should also reject $H_1$, since exactly one of $H_0$ and $H_1$ is true).

Let us formally define the terms related to hypothesis testing in the following. Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored remark Typically, we use a Template:Colored em (a statistic for conducting a hypothesis test) to specify the rejection region. For instance, if the random sample is $X_1,\dots,X_n$ and the test statistic is $\bar{X}$, the rejection region may be, say, $R = \{\mathbf{x} : \bar{x} < 2\}$ (where $x_1,\dots,x_n$ and $\bar{x}$ are the observed values of $X_1,\dots,X_n$ and $\bar{X}$ respectively). Through this, we can directly construct a hypothesis test: when $\mathbf{x} \in R$, we reject $H_0$ and accept $H_1$. Otherwise, if $\mathbf{x} \in R^c$, we accept $H_0$. So, in general, to specify the rule in a hypothesis test, we just need a Template:Colored em. After that, we will apply the test to testing $H_0$ against $H_1$. There are some terminologies related to hypothesis tests constructed in this way: Template:Colored definition Template:Colored remark Template:Colored example As we have mentioned, the decisions made by a hypothesis test are not perfect, and errors occur. Indeed, when we think carefully, there are actually Template:Colored em of errors, as follows: Template:Colored definition We can illustrate these two types of errors more clearly using the following table.

Type I and II errors
                  | Accept $H_0$     | Reject $H_0$
$H_0$ is true     | Correct decision | Type I error
$H_0$ is false    | Type II error    | Correct decision

We can express $H_0\colon \theta \in \Theta_0$ and $H_1\colon \theta \in \Theta_0^c$. Also, assume the rejection region is $R = R(\mathbf{X})$ (i.e., the rejection region with "$\mathbf{x}$" replaced by "$\mathbf{X}$"). In general, when "$R$" is put together with "$\mathbf{X}$", we assume $R = R(\mathbf{X})$.

Then we have some notations and expressions for Template:Colored em of making type I and II errors (let $X_1,\dots,X_n$ be a random sample and $\mathbf{X} = (X_1,\dots,X_n)$):

  • The probability of making a type I error, denoted by $\alpha(\theta)$, is $\mathbb{P}_\theta(\mathbf{X} \in R)$ if $\theta \in \Theta_0$.
  • The probability of making a type II error, denoted by $\beta(\theta)$, is $\mathbb{P}_\theta(\mathbf{X} \in R^c) = 1 - \mathbb{P}_\theta(\mathbf{X} \in R)$ if $\theta \in \Theta_0^c$.
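To make these error probabilities concrete, the following is a minimal numerical sketch. The model, sample size, hypotheses, and rejection region are assumptions chosen purely for illustration (a random sample of size $n = 25$ from $N(\theta, 1)$, $H_0\colon \theta \ge 2$ versus $H_1\colon \theta < 2$, and the rejection region $R = \{\mathbf{x} : \bar{x} < 2\}$ from the earlier illustration); they are not taken from the omitted examples in this chapter.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25,
# H_0: theta >= 2 versus H_1: theta < 2, rejection region R = {x : x-bar < 2}.
n = 25
cutoff = 2.0

def prob_reject(theta):
    """P_theta(X-bar < cutoff), i.e. the probability that the sample falls in R.
    Under the assumed model, X-bar ~ N(theta, 1/n)."""
    return norm.cdf(cutoff, loc=theta, scale=1 / np.sqrt(n))

def type_I_error(theta):
    """alpha(theta) = P_theta(X in R), defined for theta in Theta_0 = [2, infinity)."""
    assert theta >= 2.0
    return prob_reject(theta)

def type_II_error(theta):
    """beta(theta) = P_theta(X in R^c) = 1 - P_theta(X in R),
    defined for theta in Theta_0^c = (-infinity, 2)."""
    assert theta < 2.0
    return 1 - prob_reject(theta)

print(type_I_error(2.0))    # 0.5: at the boundary of Theta_0, we reject half the time
print(type_I_error(2.5))    # small: the true theta is well inside Theta_0
print(type_II_error(1.5))   # small: the true theta is far from Theta_0
print(type_II_error(1.99))  # nearly 0.5: a theta just below 2 is hard to detect
```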

Template:Colored remark Notice that we have a common expression in both $\alpha(\theta)$ and $\beta(\theta)$, which is "$\mathbb{P}_\theta((X_1,\dots,X_n) \in R)$". Indeed, we can also write this expression as $$\mathbb{P}_\theta\big((X_1,\dots,X_n) \in R\big) = \begin{cases} \alpha(\theta), & \theta \in \Theta_0; \\ 1-\beta(\theta), & \theta \in \Theta_0^c. \end{cases}$$ Through this, we can observe that this expression contains all the information about the probabilities of making errors, given a hypothesis test with rejection region $R$. Hence, we will give a special name to it: Template:Colored definition Template:Colored remark Template:Colored example Template:Colored example Ideally, we want to make both $\alpha(\theta)$ and $\beta(\theta)$ arbitrarily small. But this is generally impossible. To understand this, we can consider the following extreme examples:

  • Set the rejection region $R$ to be the sample space $S$, the set of all possible observations $\mathbf{x}$ of the random sample. Then $\pi(\theta) = 1$ for each $\theta \in \Theta$. From this, of course we have $\beta(\theta) = 0$, which is nice. But the serious problem is that $\alpha(\theta) = 1$, due to the mindless rejection.
  • Another extreme is setting the rejection region $R$ to be the empty set $\varnothing$. Then $\pi(\theta) = 0$ for each $\theta \in \Theta$. From this, we have $\alpha(\theta) = 0$, which is nice. But, again, the serious problem is that $\beta(\theta) = 1$, due to the mindless acceptance.

We can observe that to make $\alpha(\theta)$ (resp. $\beta(\theta)$) very small, it is inevitable that $\beta(\theta)$ (resp. $\alpha(\theta)$) will increase as a consequence, due to accepting (resp. rejecting) "too much". As a result, we can only try to minimize the probability of making one type of error while holding the probability of making the other type of error Template:Colored em.
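The tradeoff can also be seen numerically by varying the cutoff of the rejection region in the illustrative normal setting sketched above (again, the model $N(\theta, 1)$, $n = 25$, and $H_0\colon \theta \ge 2$ are assumptions for illustration only): shrinking $R$ lowers the type I error probability but raises the type II error probability.

```python
import numpy as np
from scipy.stats import norm

# Same assumed setup as the earlier sketch: X-bar ~ N(theta, 1/n) with n = 25,
# H_0: theta >= 2, H_1: theta < 2, rejection region R_c = {x : x-bar < c}.
n = 25
sd = 1 / np.sqrt(n)

for c in [2.0, 1.9, 1.8, 1.7]:
    alpha_at_boundary = norm.cdf(c, loc=2.0, scale=sd)   # worst-case type I error probability (at theta = 2)
    beta_at_1_8 = 1 - norm.cdf(c, loc=1.8, scale=sd)     # type II error probability at the alternative theta = 1.8
    print(f"cutoff {c}: alpha(2) = {alpha_at_boundary:.4f}, beta(1.8) = {beta_at_1_8:.4f}")
```

As the cutoff decreases (so $R$ shrinks), $\alpha(2)$ decreases while $\beta(1.8)$ increases, which is exactly the tradeoff described above.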

Now, we are interested in knowing Template:Colored em type of error should be controlled. To motivate the choice, we can again consider the analogy of the legal principle of presumption of innocence. In this case, a type I error means convicting an innocent person, and a type II error means acquitting a guilty person. Then, as suggested by Blackstone's ratio, a type I error is more serious and important than a type II error. This motivates us to control the probability of a type I error, i.e., $\alpha(\theta)$, at a specified small value $\alpha^*$, so that we can control the probability of making this more serious error. After that, we consider the tests that "control the type I error probability at this level", and the one with the smallest $\beta(\theta)$ is the "best" one (in the sense of the probability of making errors).

To describe "control the type I error probability at this level" in a more precise way, let us define the following term. Template:Colored definition Template:Colored remark So, using this definition, controlling the type I error probability at a particular level α means that the size of the test should not exceed α, i.e., supθΘ0π(θ)α (in some other places, such test is called a Template:Colored em. Template:Colored example For now, we have focused on using Template:Colored em to conduct hypothesis tests. But this is not the only way. Alternatively, we can make use of p-value. Template:Colored definition Template:Colored remark The following theorem allows us to use p-value for hypothesis testing. Template:Colored theorem

Proof. (Partial) We can prove the "if" and "only if" directions at once. Let us first consider case 1 in the definition of the p-value. By definition, the p-value is $\sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T(\mathbf{x})\big)$ and $\alpha = \sup_{\theta\in\Theta_0}\pi(\theta) = \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T^*(\mathbf{x})\big)$ (define $T^*(\mathbf{x})$ such that $T(\mathbf{X}) \ge T^*(\mathbf{x}) \iff (X_1,\dots,X_n) \in R$). Then, we have $$\begin{aligned} \text{p-value} \le \alpha &\iff \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T(\mathbf{x})\big) \le \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T^*(\mathbf{x})\big)\\ &\iff T(\mathbf{x}) \ge T^*(\mathbf{x}) && \text{(by some omitted arguments and the monotonicity of the cdf)}\\ &\iff (x_1,\dots,x_n) \in \{(y_1,\dots,y_n): T(y_1,\dots,y_n) \ge T^*(\mathbf{x})\} && \text{($x_1,\dots,x_n$ are realizations of $X_1,\dots,X_n$ respectively)}\\ &\iff (x_1,\dots,x_n) \in R && \text{(defined above)}\\ &\iff H_0 \text{ is rejected at significance level } \alpha && \text{(the test with power function $\pi(\theta)$ is a size $\alpha$ test).} \end{aligned}$$ For the other cases, the idea is similar (just the directions of the inequalities for $T$ are different).

Template:Colored remark Template:Colored example
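To tie the notions of size and p-value together, here is a minimal sketch under assumed specifics (an $N(\theta, 1)$ sample with $n = 25$, $H_0\colon \theta \le 2$ versus $H_1\colon \theta > 2$, and a test that rejects for large values of $\bar{X}$, i.e., the "case 1" direction); none of these specifics come from the omitted examples above.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25,
# H_0: theta <= 2 versus H_1: theta > 2, rejecting for large values of X-bar.
n, theta0, alpha = 25, 2.0, 0.05
sd = 1 / np.sqrt(n)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.4, scale=1.0, size=n)   # pretend data (true theta = 2.4)
xbar = x.mean()

# Size-alpha rejection region {x : x-bar >= c}: choose c so that
# sup_{theta <= 2} P_theta(X-bar >= c) = P_{theta = 2}(X-bar >= c) = alpha.
c = theta0 + norm.ppf(1 - alpha) * sd
reject_by_region = xbar >= c

# p-value: the worst-case (over Theta_0) probability of a test statistic at
# least as extreme as the observed one; the supremum is attained at theta0.
p_value = 1 - norm.cdf(xbar, loc=theta0, scale=sd)
reject_by_p_value = p_value <= alpha

print(f"x-bar = {xbar:.3f}, cutoff c = {c:.3f}, p-value = {p_value:.4f}")
print(reject_by_region, reject_by_p_value)   # the two decisions always agree
```

The two printed decisions coincide for every data set, in line with the theorem proved above: rejecting when the p-value is at most $\alpha$ is the same as rejecting with the size-$\alpha$ test.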

Evaluating a hypothesis test

After discussing some basic concepts and terminologies, let us now study some ways to evaluate the goodness of a hypothesis test. As we have previously mentioned, we want the probabilities of making type I and type II errors to be small, but it is generally impossible to make both probabilities arbitrarily small. Hence, we have suggested controlling the type I error, using the size of a test, and the "best" test should be the one with the smallest probability of making a type II error, after controlling the type I error.

These ideas lead us to the following definitions. Template:Colored definition Using this definition, instead of saying the "best" test (the test with the smallest type II error probability), we can say "a test with the most power", or in other words, the "most powerful test". Template:Colored definition Template:Colored remark

Constructing a hypothesis test

There are many ways of constructing a hypothesis test, but of course not all of them are good (i.e., "powerful"). In the following, we will provide some common approaches to constructing hypothesis tests. In particular, the following lemma is very useful for constructing an MP test with size $\alpha$.

Neyman-Pearson lemma

Template:Colored lemma

Proof. Let us first consider the case where the underlying distribution is continuous. Template:Color, the "size" requirement for being a UMP test is satisfied immediately. So, it suffices to show that $\varphi$ satisfies the "UMP" requirement.

Notice that in this case, "$\Theta_1$" is simply $\{\theta_1\}$. So, for every test $\psi$ with rejection region $R^*$ and $\pi_\psi(\theta_0) \le \alpha$, we will proceed to show that $\pi_\varphi(\theta_1) \ge \pi_\psi(\theta_1)$.

Since $$\begin{aligned} \pi_\varphi(\theta_1) - \pi_\psi(\theta_1) &= \mathbb{P}_{\theta_1}\big((X_1,\dots,X_n) \in R\big) - \mathbb{P}_{\theta_1}\big((X_1,\dots,X_n) \in R^*\big)\\ &= \int\cdots\int_{R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\\ &= \int\cdots\int_{R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \left(\int\cdots\int_{R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\right)\\ &= \int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\\ &\ge \frac{1}{k}\int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \frac{1}{k}\int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 && \left(\text{in } R,\ \mathcal{L}(\theta_1;\mathbf{x}) \ge \tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x});\ \text{in } R^c,\ \mathcal{L}(\theta_1;\mathbf{x}) < \tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x}) \implies -\mathcal{L}(\theta_1;\mathbf{x}) > -\tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x})\right)\\ &= \frac{1}{k}\int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 + \frac{1}{k}\int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \left(\frac{1}{k}\int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 + \frac{1}{k}\int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1\right)\\ &= \frac{1}{k}\int\cdots\int_{R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \frac{1}{k}\int\cdots\int_{R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1\\ &= \frac{1}{k}\Big(\underbrace{\mathbb{P}_{\theta_0}\big((X_1,\dots,X_n)\in R\big)}_{=\,\alpha} - \underbrace{\mathbb{P}_{\theta_0}\big((X_1,\dots,X_n)\in R^*\big)}_{\le\,\alpha}\Big)\\ &\ge \frac{1}{k}(\alpha - \alpha) = 0, \end{aligned}$$ we have $\pi_\varphi(\theta_1) \ge \pi_\psi(\theta_1)$, as desired.

For the case where the underlying distribution is discrete, the proof is very similar (just replace the integrals with sums), and hence omitted.
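To see the lemma in action, here is a minimal numerical sketch under an assumed model (an $N(\theta, 1)$ sample with $n = 20$, and the simple hypotheses $\theta_0 = 0$ versus $\theta_1 = 1$; these specifics, and the helper names below, are for illustration only). In this model the likelihood ratio $\mathcal{L}(\theta_1;\mathbf{x})/\mathcal{L}(\theta_0;\mathbf{x})$ is an increasing function of $\bar{x}$, so the Neyman-Pearson rejection region reduces to $\{\mathbf{x} : \bar{x} \ge c\}$ with $c$ chosen to make the size equal to $\alpha$.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1), testing the simple
# hypotheses H_0: theta = 0 against H_1: theta = 1 at size alpha.
n, theta0, theta1, alpha = 20, 0.0, 1.0, 0.05
sd = 1 / np.sqrt(n)                     # standard deviation of X-bar

def likelihood_ratio(x):
    """L(theta1; x) / L(theta0; x) for the assumed normal model."""
    return np.exp(np.sum(norm.logpdf(x, loc=theta1) - norm.logpdf(x, loc=theta0)))

# Since the likelihood ratio is increasing in x-bar, the Neyman-Pearson region
# {x : L(theta1; x) >= (1/k) L(theta0; x)} is of the form {x : x-bar >= c},
# and c = theta0 + z_{1-alpha} * sd makes the size exactly alpha.
c = theta0 + norm.ppf(1 - alpha) * sd

size = 1 - norm.cdf(c, loc=theta0, scale=sd)    # P_{theta0}(X-bar >= c) = alpha
power = 1 - norm.cdf(c, loc=theta1, scale=sd)   # P_{theta1}(X-bar >= c); maximal among size-alpha tests by the lemma

print(f"cutoff c = {c:.3f}, size = {size:.3f}, power at theta1 = {power:.3f}")

# Applying the test to one simulated data set:
rng = np.random.default_rng(1)
x = rng.normal(loc=theta1, scale=1.0, size=n)
print("reject H0:", x.mean() >= c, "(likelihood ratio =", round(likelihood_ratio(x), 2), ")")
```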

Template:Colored remark Even though the hypotheses involved in the Neyman-Pearson lemma are simple, under some conditions we can use the lemma to construct a UMP test for testing a Template:Colored em null hypothesis against a Template:Colored em alternative hypothesis. The details are as follows: for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta > \theta_0$,

  1. Find an MP test $\varphi$ with size $\alpha$ for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta = \theta_1$ ($> \theta_0$) using the Neyman-Pearson lemma, where $\theta_1$ is an arbitrary value such that $\theta_1 > \theta_0$.
  2. Template:Colored em, then the test $\varphi$ has the greatest power for Template:Colored em $\theta \in \Theta_1 = \{\vartheta : \vartheta > \theta_0\}$. So, the test $\varphi$ is a UMP test with size $\alpha$ for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta > \theta_0$.
  3. Template:Colored em, then it means that the size of the test $\varphi$ is still $\alpha$, even if the null hypothesis is changed to $H_0\colon \theta \le \theta_0$. So, after changing $H_0\colon \theta = \theta_0$ to $H_0\colon \theta \le \theta_0$ and not changing $H_1$ (also adjusting the parameter space) for the test $\varphi$, the test $\varphi$ still satisfies the "MP" requirement (because $H_1$ is not changed, the result in step 2 still applies), and the test $\varphi$ also satisfies the "size" requirement (because of changing $H_0$ in this way). Hence, the test $\varphi$ is a UMP test with size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta > \theta_0$.

For testing $H_0\colon \theta \ge \theta_0$ vs. $H_1\colon \theta < \theta_0$, the steps are similar. But in general, there is no UMP test for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta \ne \theta_0$.

Of course, when the condition in step 3 holds but the one in step 2 does not, the test $\varphi$ in step 1 is a UMP test with size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta = \theta_1$, where $\theta_1$ is a constant (which is larger than $\theta_0$, or else $H_1$ and $H_0$ would not be disjoint). However, the hypotheses are generally not in this form. Template:Colored example Template:Colored remark Now, let us consider another example where the underlying distribution is discrete. Template:Colored example
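The omitted discrete example is not reproduced here, but the following sketch shows the kind of issue that arises in the discrete case, under assumed specifics ($X \sim \mathrm{Binomial}(n, \theta)$ with $n = 20$, $H_0\colon \theta \le 0.3$ versus $H_1\colon \theta > 0.3$, rejecting for large $X$): since the test statistic takes only finitely many values, a non-randomized cutoff usually cannot attain size exactly $\alpha$, so one commonly takes the smallest rejection region whose size does not exceed $\alpha$.

```python
from scipy.stats import binom

# Assumed illustration: X ~ Binomial(n, theta), H_0: theta <= 0.3 vs H_1: theta > 0.3,
# rejection region {X >= c}.  The size is sup_{theta <= 0.3} P_theta(X >= c),
# attained at theta = 0.3 because P_theta(X >= c) is increasing in theta.
n, theta0, alpha = 20, 0.3, 0.05

for c in range(n + 1):
    size = 1 - binom.cdf(c - 1, n, theta0)   # P_{theta0}(X >= c)
    if size <= alpha:
        break                                # smallest c whose size does not exceed alpha

print(f"smallest cutoff with size <= alpha: c = {c}")
print(f"attained size = {1 - binom.cdf(c - 1, n, theta0):.4f} (below alpha = {alpha})")
print(f"power at theta = 0.5: {1 - binom.cdf(c - 1, n, 0.5):.4f}")
```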

Likelihood-ratio test

Previously, we have suggested using the Neyman-Pearson lemma to construct an MP test for testing a simple null hypothesis against a simple alternative hypothesis. However, when the hypotheses are composite, we may not be able to use the Neyman-Pearson lemma. So, in the following, we will give a general method for constructing tests for any hypotheses, not limited to simple hypotheses. But we should notice that the tests constructed in this way are not necessarily UMP tests. Template:Colored definition Template:Colored remark
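As a rough illustrative sketch of a likelihood-ratio test, assume an $N(\theta, 1)$ sample (variance known) and the two-sided hypotheses $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta \ne \theta_0$; these specifics are assumptions chosen for illustration. One common form of the likelihood-ratio statistic is $\lambda(\mathbf{x}) = \sup_{\theta\in\Theta_0} \mathcal{L}(\theta;\mathbf{x}) \big/ \sup_{\theta\in\Theta} \mathcal{L}(\theta;\mathbf{x})$, computed by maximizing the likelihood over each parameter set, with $H_0$ rejected for small $\lambda$; in this particular model, $-2\log\lambda = n(\bar{x}-\theta_0)^2$, which has an exact $\chi^2_1$ distribution under $H_0$.

```python
import numpy as np
from scipy.stats import norm, chi2

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1), H_0: theta = theta0
# versus H_1: theta != theta0, using the likelihood-ratio statistic
# lambda(x) = sup_{theta in Theta_0} L(theta; x) / sup_{theta in Theta} L(theta; x).
n, theta0, alpha = 30, 0.0, 0.05

rng = np.random.default_rng(2)
x = rng.normal(loc=0.4, scale=1.0, size=n)   # pretend data (true theta = 0.4)

def log_likelihood(theta):
    return np.sum(norm.logpdf(x, loc=theta, scale=1.0))

# Numerator: Theta_0 = {theta0}; denominator: the maximizer over Theta is x-bar.
log_lambda = log_likelihood(theta0) - log_likelihood(x.mean())

# In this normal, known-variance case, -2 log lambda = n * (x-bar - theta0)^2,
# which is exactly chi-square(1) distributed under H_0.
test_stat = -2 * log_lambda
critical_value = chi2.ppf(1 - alpha, df=1)

print(f"-2 log lambda = {test_stat:.3f}, chi-square(1) critical value = {critical_value:.3f}")
print("reject H0:", test_stat >= critical_value)
```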

Relationship between hypothesis testing and confidence intervals

We have mentioned that there are similarities between hypothesis testing and confidence intervals. In this section, we will introduce a theorem suggesting how to construct a hypothesis test from a confidence interval (or in general, confidence Template:Colored em), and vice versa. Template:Colored theorem

Proof. For the first part, since $R(\theta_0)$ is the rejection region of the size $\alpha$ test, we have $\mathbb{P}_{\theta_0}\big(\mathbf{X} \in R(\theta_0)\big) = \alpha$. Hence, the coverage probability of the random set $C(\mathbf{X})$ is $\mathbb{P}_{\theta_0}\big(\theta_0 \in C(\mathbf{X})\big) = \mathbb{P}_{\theta_0}\big(\mathbf{X} \in R(\theta_0)^c\big) = 1 - \alpha$, which means that the random set $C(\mathbf{X})$ is a $1-\alpha$ confidence set for $\theta_0$.

For the second part, by assumption, we have $\mathbb{P}_\theta\big(\theta \in C^*(\mathbf{X})\big) = 1 - \alpha$. So, the size of the test with rejection region $R(\theta_0)$ is $\mathbb{P}_{\theta=\theta_0}\big(\mathbf{X} \in R(\theta_0)\big) = \mathbb{P}_{\theta=\theta_0}\big(\theta_0 \notin C^*(\mathbf{X})\big) = 1 - (1-\alpha) = \alpha$.
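Here is a small sketch of this duality under assumed specifics (an $N(\theta, 1)$ sample with $n = 25$, $\alpha = 0.05$, and the usual two-sided Z-procedures; the theorem above is stated abstractly, so these choices are for illustration only): inverting the size-$\alpha$ test that rejects $H_0\colon \theta = \theta_0$ when $|\bar{x} - \theta_0| \ge z_{\alpha/2}/\sqrt{n}$ yields the familiar $1-\alpha$ confidence interval $\bar{x} \pm z_{\alpha/2}/\sqrt{n}$, and, conversely, rejecting whenever $\theta_0$ falls outside that interval gives a size-$\alpha$ test.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25, alpha = 0.05.
n, alpha = 25, 0.05
z = norm.ppf(1 - alpha / 2)
sd = 1 / np.sqrt(n)

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=1.0, size=n)
xbar = x.mean()

def reject(theta0):
    """Size-alpha test of H_0: theta = theta0: reject when |x-bar - theta0| >= z * sd."""
    return abs(xbar - theta0) >= z * sd

# Confidence set obtained by inverting the test: C(x) = {theta0 : x not in R(theta0)},
# which in this setting is the familiar interval x-bar +/- z * sd.
ci = (xbar - z * sd, xbar + z * sd)
print(f"{1 - alpha:.0%} CI from inverting the test: ({ci[0]:.3f}, {ci[1]:.3f})")

# The two views agree: theta0 is rejected exactly when it lies outside the interval.
for theta0 in [0.5, 0.9, 1.0, 1.5]:
    outside = not (ci[0] < theta0 < ci[1])
    print(theta0, reject(theta0), outside)   # the last two columns always match
```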

Template:Colored remark Template:Nav Template:BookCat

  1. ↑ Thus, a natural measure of the "goodness" of a hypothesis test is the "size" of its errors. We will discuss this later in this chapter.