Statistics/Hypothesis Testing


Template:Nav

Introduction

In previous chapters, we have discussed two methods for Template:Colored em, namely Template:Colored em and Template:Colored em. Estimating unknown parameters is an important area in statistical inference, and in this chapter we will discuss another important area, namely Template:Colored em, which is related to Template:Colored em. Indeed, the concepts of Template:Colored em and Template:Colored em are closely related, as we will demonstrate.

Basic concepts and terminologies

Before discussing how to Template:Colored em hypothesis testing, and Template:Colored em the "goodness" of a hypothesis test, let us first introduce some basic concepts and terminologies related to hypothesis testing. Template:Colored definition There are two terms that classify hypotheses: Template:Colored definition Sometimes, it is not immediately clear whether a hypothesis is simple or composite. To understand the classification of hypotheses more clearly, let us consider the following example. Template:Colored example In hypothesis tests, we consider two hypotheses: Template:Colored definition Template:Colored remark A general form of $H_0$ and $H_1$ is $H_0\colon \theta \in \Theta_0$ and $H_1\colon \theta \in \Theta_1$, where $\Theta_1 = \Theta_0^c$, which is the Template:Colored em of $\Theta_0$ (with respect to $\Theta$), i.e., $\Theta_0^c = \Theta \setminus \Theta_0$ ($\Theta$ is the parameter space, containing all possible values of $\theta$). The reason for choosing the complement of $\Theta_0$ in $H_1$ is that $H_1$ is the complementary hypothesis to $H_0$, as suggested in the above definition. Template:Colored remark Template:Colored example We have mentioned that exactly one of $H_0$ and $H_1$ is assumed to be true. To make a decision, we need to Template:Colored em which hypothesis should be regarded as true. Of course, as one may expect, this decision is not perfect, and some errors will be involved in our decision. So, we cannot say we "prove" that a particular hypothesis is true (that is, we cannot be Template:Colored em that a particular hypothesis is true). Despite this, we may "regard" (or "accept") a particular hypothesis as true (but Template:Colored em prove it to be true) when we have Template:Colored em that lead us to make this decision (ideally, with small errors [1]). Template:Colored remark Now, we are faced with two questions. First, what evidence should we consider? Second, what is meant by "sufficient"? For the first question, a natural answer is that we should consider the observed Template:Colored em, right? This is because we are making a hypothesis about the population, and the samples are taken from, and thus closely related to, the population, which should help us make the decision.

To answer the second question, we need the concepts in Template:Colored em. In particular, in hypothesis testing, we will construct a so-called Template:Colored em or Template:Colored em to help us determine Template:Colored em we should reject the Template:Colored em hypothesis (i.e., regard $H_0$ as false), and hence (naturally) regard $H_1$ as true ("accept" $H_1$) (we have assumed that exactly one of $H_0$ and $H_1$ is true, so when we regard one of them as false, we should regard the other as true). In particular, when we do Template:Colored em reject $H_0$, we will act as if $H_0$ is true, or accept $H_0$ as true (and thus should also reject $H_1$, since exactly one of $H_0$ and $H_1$ is true).

Let us formally define the terms related to hypothesis testing in the following. Template:Colored definition Template:Colored remark Template:Colored definition Template:Colored remark Typically, we use a Template:Colored em (a statistic for conducting a hypothesis test) to specify the rejection region. For instance, if the random sample is $X_1,\dots,X_n$ and the test statistic is $\bar{X}$, the rejection region may be, say, $R = \{\mathbf{x} : \bar{x} < 2\}$ (where $x_1,\dots,x_n$ and $\bar{x}$ are the observed values of $X_1,\dots,X_n$ and $\bar{X}$ respectively). Through this, we can directly construct a hypothesis test: when $\mathbf{x} \in R$, we reject $H_0$ and accept $H_1$. Otherwise, if $\mathbf{x} \in R^c$, we accept $H_0$. So, in general, to specify the rule in a hypothesis test, we just need a Template:Colored em. After that, we will apply the test to testing $H_0$ against $H_1$. There are some terminologies related to hypothesis tests constructed in this way: Template:Colored definition Template:Colored remark Template:Colored example As we have mentioned, the decisions made by a hypothesis test are not perfect, and errors occur. Indeed, when we think carefully, there are actually Template:Colored em of errors, as follows: Template:Colored definition We can illustrate these two types of errors more clearly using the following table.

Type I and II errors
                  | Accept $H_0$     | Reject $H_0$
$H_0$ is true     | Correct decision | Type I error
$H_0$ is false    | Type II error    | Correct decision

We can express $H_0\colon \theta \in \Theta_0$ and $H_1\colon \theta \in \Theta_0^c$. Also, assume the rejection region is $R = R(\mathbf{X})$ (i.e., the rejection region with "$\mathbf{x}$" replaced by "$\mathbf{X}$"). In general, when "$R$" is put together with "$\mathbf{X}$", we assume $R = R(\mathbf{X})$.

Then we have some notations and expressions for Template:Colored em of making type I and II errors (let $X_1,\dots,X_n$ be a random sample and $\mathbf{X} = (X_1,\dots,X_n)$):

  • The probability of making a type I error, denoted by $\alpha(\theta)$, is $\mathbb{P}_\theta(\mathbf{X} \in R)$ if $\theta \in \Theta_0$.
  • The probability of making a type II error, denoted by $\beta(\theta)$, is $\mathbb{P}_\theta(\mathbf{X} \in R^c) = 1 - \mathbb{P}_\theta(\mathbf{X} \in R)$ if $\theta \in \Theta_0^c$.
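To make these error probabilities concrete, the following is a minimal numerical sketch. The model, sample size, hypotheses, and rejection region are assumptions chosen purely for illustration (a random sample of size $n = 25$ from $N(\theta, 1)$, $H_0\colon \theta \ge 2$ versus $H_1\colon \theta < 2$, and the rejection region $R = \{\mathbf{x} : \bar{x} < 2\}$ from the earlier illustration); they are not taken from the omitted examples in this chapter.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25,
# H_0: theta >= 2 versus H_1: theta < 2, rejection region R = {x : x-bar < 2}.
n = 25
cutoff = 2.0

def prob_reject(theta):
    """P_theta(X-bar < cutoff), i.e. the probability that the sample falls in R.
    Under the assumed model, X-bar ~ N(theta, 1/n)."""
    return norm.cdf(cutoff, loc=theta, scale=1 / np.sqrt(n))

def type_I_error(theta):
    """alpha(theta) = P_theta(X in R), defined for theta in Theta_0 = [2, infinity)."""
    assert theta >= 2.0
    return prob_reject(theta)

def type_II_error(theta):
    """beta(theta) = P_theta(X in R^c) = 1 - P_theta(X in R),
    defined for theta in Theta_0^c = (-infinity, 2)."""
    assert theta < 2.0
    return 1 - prob_reject(theta)

print(type_I_error(2.0))    # 0.5: at the boundary of Theta_0, we reject half the time
print(type_I_error(2.5))    # small: the true theta is well inside Theta_0
print(type_II_error(1.5))   # small: the true theta is far from Theta_0
print(type_II_error(1.99))  # nearly 0.5: a theta just below 2 is hard to detect
```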

Template:Colored remark Notice that we have a common expression in both $\alpha(\theta)$ and $\beta(\theta)$, which is "$\mathbb{P}_\theta((X_1,\dots,X_n) \in R)$". Indeed, we can also write this expression as $$\mathbb{P}_\theta\big((X_1,\dots,X_n) \in R\big) = \begin{cases} \alpha(\theta), & \theta \in \Theta_0; \\ 1-\beta(\theta), & \theta \in \Theta_0^c. \end{cases}$$ Through this, we can observe that this expression contains all the information about the probabilities of making errors, given a hypothesis test with rejection region $R$. Hence, we will give a special name to it: Template:Colored definition Template:Colored remark Template:Colored example Template:Colored example Ideally, we want to make both $\alpha(\theta)$ and $\beta(\theta)$ arbitrarily small. But this is generally impossible. To understand this, we can consider the following extreme examples:

  • Set the rejection region $R$ to be the sample space $S$, the set of all possible observations $\mathbf{x}$ of the random sample. Then $\pi(\theta) = 1$ for each $\theta \in \Theta$. From this, of course we have $\beta(\theta) = 0$, which is nice. But the serious problem is that $\alpha(\theta) = 1$, due to the mindless rejection.
  • Another extreme is setting the rejection region $R$ to be the empty set $\varnothing$. Then $\pi(\theta) = 0$ for each $\theta \in \Theta$. From this, we have $\alpha(\theta) = 0$, which is nice. But, again, the serious problem is that $\beta(\theta) = 1$, due to the mindless acceptance.

We can observe that to make $\alpha(\theta)$ (resp. $\beta(\theta)$) very small, it is inevitable that $\beta(\theta)$ (resp. $\alpha(\theta)$) will increase as a consequence, due to accepting (resp. rejecting) "too much". As a result, we can only try to minimize the probability of making one type of error while holding the probability of making the other type of error Template:Colored em.
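The tradeoff can also be seen numerically by varying the cutoff of the rejection region in the illustrative normal setting sketched above (again, the model $N(\theta, 1)$, $n = 25$, and $H_0\colon \theta \ge 2$ are assumptions for illustration only): shrinking $R$ lowers the type I error probability but raises the type II error probability.

```python
import numpy as np
from scipy.stats import norm

# Same assumed setup as the earlier sketch: X-bar ~ N(theta, 1/n) with n = 25,
# H_0: theta >= 2, H_1: theta < 2, rejection region R_c = {x : x-bar < c}.
n = 25
sd = 1 / np.sqrt(n)

for c in [2.0, 1.9, 1.8, 1.7]:
    alpha_at_boundary = norm.cdf(c, loc=2.0, scale=sd)   # worst-case type I error probability (at theta = 2)
    beta_at_1_8 = 1 - norm.cdf(c, loc=1.8, scale=sd)     # type II error probability at the alternative theta = 1.8
    print(f"cutoff {c}: alpha(2) = {alpha_at_boundary:.4f}, beta(1.8) = {beta_at_1_8:.4f}")
```

As the cutoff decreases (so $R$ shrinks), $\alpha(2)$ decreases while $\beta(1.8)$ increases, which is exactly the tradeoff described above.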

Now, we are interested in knowing Template:Colored em type of error should be controlled. To motivate the choice, we can again consider the analogy of the legal principle of presumption of innocence. In this case, a type I error means convicting an innocent person, and a type II error means acquitting a guilty person. Then, as suggested by Blackstone's ratio, a type I error is more serious and important than a type II error. This motivates us to control the probability of a type I error, i.e., $\alpha(\theta)$, at a specified small value $\alpha^*$, so that we can control the probability of making this more serious error. After that, we consider the tests that "control the type I error probability at this level", and the one with the smallest $\beta(\theta)$ is the "best" one (in the sense of the probability of making errors).

To describe "control the type I error probability at this level" in a more precise way, let us define the following term. Template:Colored definition Template:Colored remark So, using this definition, controlling the type I error probability at a particular level α means that the size of the test should not exceed α, i.e., supθΘ0π(θ)α (in some other places, such test is called a Template:Colored em. Template:Colored example For now, we have focused on using Template:Colored em to conduct hypothesis tests. But this is not the only way. Alternatively, we can make use of p-value. Template:Colored definition Template:Colored remark The following theorem allows us to use p-value for hypothesis testing. Template:Colored theorem

Proof. (Partial) We can prove the "if" and "only if" directions at once. Let us first consider case 1 in the definition of the p-value. By definition, the p-value is $\sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T(\mathbf{x})\big)$ and $\alpha = \sup_{\theta\in\Theta_0}\pi(\theta) = \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T^*(\mathbf{x})\big)$ (define $T^*(\mathbf{x})$ such that $T(\mathbf{X}) \ge T^*(\mathbf{x}) \iff (X_1,\dots,X_n) \in R$). Then, we have $$\begin{aligned} \text{p-value} \le \alpha &\iff \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T(\mathbf{x})\big) \le \sup_{\theta\in\Theta_0}\mathbb{P}_\theta\big(T(\mathbf{X}) \ge T^*(\mathbf{x})\big)\\ &\iff T(\mathbf{x}) \ge T^*(\mathbf{x}) && \text{(by some omitted arguments and the monotonicity of the cdf)}\\ &\iff (x_1,\dots,x_n) \in \{(y_1,\dots,y_n): T(y_1,\dots,y_n) \ge T^*(\mathbf{x})\} && \text{($x_1,\dots,x_n$ are realizations of $X_1,\dots,X_n$ respectively)}\\ &\iff (x_1,\dots,x_n) \in R && \text{(defined above)}\\ &\iff H_0 \text{ is rejected at significance level } \alpha && \text{(the test with power function $\pi(\theta)$ is a size $\alpha$ test).} \end{aligned}$$ For the other cases, the idea is similar (just the directions of the inequalities for $T$ are different).

Template:Colored remark Template:Colored example
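To tie the notions of size and p-value together, here is a minimal sketch under assumed specifics (an $N(\theta, 1)$ sample with $n = 25$, $H_0\colon \theta \le 2$ versus $H_1\colon \theta > 2$, and a test that rejects for large values of $\bar{X}$, i.e., the "case 1" direction); none of these specifics come from the omitted examples above.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25,
# H_0: theta <= 2 versus H_1: theta > 2, rejecting for large values of X-bar.
n, theta0, alpha = 25, 2.0, 0.05
sd = 1 / np.sqrt(n)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.4, scale=1.0, size=n)   # pretend data (true theta = 2.4)
xbar = x.mean()

# Size-alpha rejection region {x : x-bar >= c}: choose c so that
# sup_{theta <= 2} P_theta(X-bar >= c) = P_{theta = 2}(X-bar >= c) = alpha.
c = theta0 + norm.ppf(1 - alpha) * sd
reject_by_region = xbar >= c

# p-value: the worst-case (over Theta_0) probability of a test statistic at
# least as extreme as the observed one; the supremum is attained at theta0.
p_value = 1 - norm.cdf(xbar, loc=theta0, scale=sd)
reject_by_p_value = p_value <= alpha

print(f"x-bar = {xbar:.3f}, cutoff c = {c:.3f}, p-value = {p_value:.4f}")
print(reject_by_region, reject_by_p_value)   # the two decisions always agree
```

The two printed decisions coincide for every data set, in line with the theorem proved above: rejecting when the p-value is at most $\alpha$ is the same as rejecting with the size-$\alpha$ test.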

Evaluating a hypothesis test

After discussing some basic concepts and terminologies, let us now study some ways to evaluate the goodness of a hypothesis test. As we have previously mentioned, we want the probabilities of making type I and type II errors to be small, but it is generally impossible to make both probabilities arbitrarily small. Hence, we have suggested controlling the type I error, using the size of a test, and the "best" test should be the one with the smallest probability of making a type II error, after controlling the type I error.

These ideas lead us to the following definitions. Template:Colored definition Using this definition, instead of saying the "best" test (the test with the smallest type II error probability), we can say "a test with the most power", or in other words, the "most powerful test". Template:Colored definition Template:Colored remark

Constructing a hypothesis test

There are many ways of constructing a hypothesis test, but of course not all of them are good (i.e., "powerful"). In the following, we will provide some common approaches to constructing hypothesis tests. In particular, the following lemma is very useful for constructing an MP test with size $\alpha$.

Neyman-Pearson lemma

Template:Colored lemma

Proof. Let us first consider the case where the underlying distribution is continuous. Template:Color, the "size" requirement for being a UMP test is satisfied immediately. So, it suffices to show that $\varphi$ satisfies the "UMP" requirement.

Notice that in this case, "$\Theta_1$" is simply $\{\theta_1\}$. So, for every test $\psi$ with rejection region $R^*$ and $\pi_\psi(\theta_0) \le \alpha$, we will proceed to show that $\pi_\varphi(\theta_1) \ge \pi_\psi(\theta_1)$.

Since $$\begin{aligned} \pi_\varphi(\theta_1) - \pi_\psi(\theta_1) &= \mathbb{P}_{\theta_1}\big((X_1,\dots,X_n) \in R\big) - \mathbb{P}_{\theta_1}\big((X_1,\dots,X_n) \in R^*\big)\\ &= \int\cdots\int_{R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\\ &= \int\cdots\int_{R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \left(\int\cdots\int_{R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\right)\\ &= \int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1 - \int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_1;\mathbf{x})\,dx_n\cdots dx_1\\ &\ge \frac{1}{k}\int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \frac{1}{k}\int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 && \left(\text{in } R,\ \mathcal{L}(\theta_1;\mathbf{x}) \ge \tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x});\ \text{in } R^c,\ \mathcal{L}(\theta_1;\mathbf{x}) < \tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x}) \implies -\mathcal{L}(\theta_1;\mathbf{x}) > -\tfrac{1}{k}\mathcal{L}(\theta_0;\mathbf{x})\right)\\ &= \frac{1}{k}\int\cdots\int_{R\setminus R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 + \frac{1}{k}\int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \left(\frac{1}{k}\int\cdots\int_{R^*\setminus R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 + \frac{1}{k}\int\cdots\int_{R\cap R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1\right)\\ &= \frac{1}{k}\int\cdots\int_{R} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1 - \frac{1}{k}\int\cdots\int_{R^*} \mathcal{L}(\theta_0;\mathbf{x})\,dx_n\cdots dx_1\\ &= \frac{1}{k}\Big(\underbrace{\mathbb{P}_{\theta_0}\big((X_1,\dots,X_n)\in R\big)}_{=\,\alpha} - \underbrace{\mathbb{P}_{\theta_0}\big((X_1,\dots,X_n)\in R^*\big)}_{\le\,\alpha}\Big)\\ &\ge \frac{1}{k}(\alpha - \alpha) = 0, \end{aligned}$$ we have $\pi_\varphi(\theta_1) \ge \pi_\psi(\theta_1)$, as desired.

For the case where the underlying distribution is discrete, the proof is very similar (just replace the integrals with sums), and hence omitted.
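To see the lemma in action, here is a minimal numerical sketch under an assumed model (an $N(\theta, 1)$ sample with $n = 20$, and the simple hypotheses $\theta_0 = 0$ versus $\theta_1 = 1$; these specifics, and the helper names below, are for illustration only). In this model the likelihood ratio $\mathcal{L}(\theta_1;\mathbf{x})/\mathcal{L}(\theta_0;\mathbf{x})$ is an increasing function of $\bar{x}$, so the Neyman-Pearson rejection region reduces to $\{\mathbf{x} : \bar{x} \ge c\}$ with $c$ chosen to make the size equal to $\alpha$.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1), testing the simple
# hypotheses H_0: theta = 0 against H_1: theta = 1 at size alpha.
n, theta0, theta1, alpha = 20, 0.0, 1.0, 0.05
sd = 1 / np.sqrt(n)                     # standard deviation of X-bar

def likelihood_ratio(x):
    """L(theta1; x) / L(theta0; x) for the assumed normal model."""
    return np.exp(np.sum(norm.logpdf(x, loc=theta1) - norm.logpdf(x, loc=theta0)))

# Since the likelihood ratio is increasing in x-bar, the Neyman-Pearson region
# {x : L(theta1; x) >= (1/k) L(theta0; x)} is of the form {x : x-bar >= c},
# and c = theta0 + z_{1-alpha} * sd makes the size exactly alpha.
c = theta0 + norm.ppf(1 - alpha) * sd

size = 1 - norm.cdf(c, loc=theta0, scale=sd)    # P_{theta0}(X-bar >= c) = alpha
power = 1 - norm.cdf(c, loc=theta1, scale=sd)   # P_{theta1}(X-bar >= c); maximal among size-alpha tests by the lemma

print(f"cutoff c = {c:.3f}, size = {size:.3f}, power at theta1 = {power:.3f}")

# Applying the test to one simulated data set:
rng = np.random.default_rng(1)
x = rng.normal(loc=theta1, scale=1.0, size=n)
print("reject H0:", x.mean() >= c, "(likelihood ratio =", round(likelihood_ratio(x), 2), ")")
```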

Template:Colored remark Even though the hypotheses involved in the Neyman-Pearson lemma are simple, under some conditions we can use the lemma to construct a UMP test for testing a Template:Colored em null hypothesis against a Template:Colored em alternative hypothesis. The details are as follows: for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta > \theta_0$,

  1. Find an MP test $\varphi$ with size $\alpha$ for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta = \theta_1$ ($> \theta_0$) using the Neyman-Pearson lemma, where $\theta_1$ is an arbitrary value such that $\theta_1 > \theta_0$.
  2. Template:Colored em, then the test $\varphi$ has the greatest power for Template:Colored em $\theta \in \Theta_1 = \{\vartheta : \vartheta > \theta_0\}$. So, the test $\varphi$ is a UMP test with size $\alpha$ for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta > \theta_0$.
  3. Template:Colored em, then it means that the size of the test $\varphi$ is still $\alpha$, even if the null hypothesis is changed to $H_0\colon \theta \le \theta_0$. So, after changing $H_0\colon \theta = \theta_0$ to $H_0\colon \theta \le \theta_0$ and not changing $H_1$ (also adjusting the parameter space) for the test $\varphi$, the test $\varphi$ still satisfies the "MP" requirement (because $H_1$ is not changed, the result in step 2 still applies), and the test $\varphi$ also satisfies the "size" requirement (because of changing $H_0$ in this way). Hence, the test $\varphi$ is a UMP test with size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta > \theta_0$.

For testing $H_0\colon \theta \ge \theta_0$ vs. $H_1\colon \theta < \theta_0$, the steps are similar. But in general, there is no UMP test for testing $H_0\colon \theta = \theta_0$ vs. $H_1\colon \theta \ne \theta_0$.

Of course, when the condition in step 3 holds but the one in step 2 does not, the test $\varphi$ in step 1 is a UMP test with size $\alpha$ for testing $H_0\colon \theta \le \theta_0$ vs. $H_1\colon \theta = \theta_1$, where $\theta_1$ is a constant (which is larger than $\theta_0$, or else $H_1$ and $H_0$ would not be disjoint). However, the hypotheses are generally not in this form. Template:Colored example Template:Colored remark Now, let us consider another example where the underlying distribution is discrete. Template:Colored example
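The omitted discrete example is not reproduced here, but the following sketch shows the kind of issue that arises in the discrete case, under assumed specifics ($X \sim \mathrm{Binomial}(n, \theta)$ with $n = 20$, $H_0\colon \theta \le 0.3$ versus $H_1\colon \theta > 0.3$, rejecting for large $X$): since the test statistic takes only finitely many values, a non-randomized cutoff usually cannot attain size exactly $\alpha$, so one commonly takes the smallest rejection region whose size does not exceed $\alpha$.

```python
from scipy.stats import binom

# Assumed illustration: X ~ Binomial(n, theta), H_0: theta <= 0.3 vs H_1: theta > 0.3,
# rejection region {X >= c}.  The size is sup_{theta <= 0.3} P_theta(X >= c),
# attained at theta = 0.3 because P_theta(X >= c) is increasing in theta.
n, theta0, alpha = 20, 0.3, 0.05

for c in range(n + 1):
    size = 1 - binom.cdf(c - 1, n, theta0)   # P_{theta0}(X >= c)
    if size <= alpha:
        break                                # smallest c whose size does not exceed alpha

print(f"smallest cutoff with size <= alpha: c = {c}")
print(f"attained size = {1 - binom.cdf(c - 1, n, theta0):.4f} (below alpha = {alpha})")
print(f"power at theta = 0.5: {1 - binom.cdf(c - 1, n, 0.5):.4f}")
```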

Likelihood-ratio test

Previously, we have suggested using the Neyman-Pearson lemma to construct an MP test for testing a simple null hypothesis against a simple alternative hypothesis. However, when the hypotheses are composite, we may not be able to use the Neyman-Pearson lemma. So, in the following, we will give a general method for constructing tests for any hypotheses, not limited to simple hypotheses. But we should notice that the tests constructed in this way are not necessarily UMP tests. Template:Colored definition Template:Colored remark
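As a rough illustrative sketch of a likelihood-ratio test, assume an $N(\theta, 1)$ sample (variance known) and the two-sided hypotheses $H_0\colon \theta = \theta_0$ versus $H_1\colon \theta \ne \theta_0$; these specifics are assumptions chosen for illustration. One common form of the likelihood-ratio statistic is $\lambda(\mathbf{x}) = \sup_{\theta\in\Theta_0} \mathcal{L}(\theta;\mathbf{x}) \big/ \sup_{\theta\in\Theta} \mathcal{L}(\theta;\mathbf{x})$, computed by maximizing the likelihood over each parameter set, with $H_0$ rejected for small $\lambda$; in this particular model, $-2\log\lambda = n(\bar{x}-\theta_0)^2$, which has an exact $\chi^2_1$ distribution under $H_0$.

```python
import numpy as np
from scipy.stats import norm, chi2

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1), H_0: theta = theta0
# versus H_1: theta != theta0, using the likelihood-ratio statistic
# lambda(x) = sup_{theta in Theta_0} L(theta; x) / sup_{theta in Theta} L(theta; x).
n, theta0, alpha = 30, 0.0, 0.05

rng = np.random.default_rng(2)
x = rng.normal(loc=0.4, scale=1.0, size=n)   # pretend data (true theta = 0.4)

def log_likelihood(theta):
    return np.sum(norm.logpdf(x, loc=theta, scale=1.0))

# Numerator: Theta_0 = {theta0}; denominator: the maximizer over Theta is x-bar.
log_lambda = log_likelihood(theta0) - log_likelihood(x.mean())

# In this normal, known-variance case, -2 log lambda = n * (x-bar - theta0)^2,
# which is exactly chi-square(1) distributed under H_0.
test_stat = -2 * log_lambda
critical_value = chi2.ppf(1 - alpha, df=1)

print(f"-2 log lambda = {test_stat:.3f}, chi-square(1) critical value = {critical_value:.3f}")
print("reject H0:", test_stat >= critical_value)
```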

Relationship between hypothesis testing and confidence intervals

We have mentioned that there are similarities between hypothesis testing and confidence intervals. In this section, we will introduce a theorem suggesting how to construct a hypothesis test from a confidence interval (or in general, confidence Template:Colored em), and vice versa. Template:Colored theorem

Proof. For the first part, since $R(\theta_0)$ is the rejection region of the size $\alpha$ test, we have $\mathbb{P}_{\theta_0}\big(\mathbf{X} \in R(\theta_0)\big) = \alpha$. Hence, the coverage probability of the random set $C(\mathbf{X})$ is $\mathbb{P}_{\theta_0}\big(\theta_0 \in C(\mathbf{X})\big) = \mathbb{P}_{\theta_0}\big(\mathbf{X} \in R(\theta_0)^c\big) = 1 - \alpha$, which means that the random set $C(\mathbf{X})$ is a $1-\alpha$ confidence set for $\theta_0$.

For the second part, by assumption, we have $\mathbb{P}_\theta\big(\theta \in C^*(\mathbf{X})\big) = 1 - \alpha$. So, the size of the test with rejection region $R(\theta_0)$ is $\mathbb{P}_{\theta=\theta_0}\big(\mathbf{X} \in R(\theta_0)\big) = \mathbb{P}_{\theta=\theta_0}\big(\theta_0 \notin C^*(\mathbf{X})\big) = 1 - (1-\alpha) = \alpha$.
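Here is a small sketch of this duality under assumed specifics (an $N(\theta, 1)$ sample with $n = 25$, $\alpha = 0.05$, and the usual two-sided Z-procedures; the theorem above is stated abstractly, so these choices are for illustration only): inverting the size-$\alpha$ test that rejects $H_0\colon \theta = \theta_0$ when $|\bar{x} - \theta_0| \ge z_{\alpha/2}/\sqrt{n}$ yields the familiar $1-\alpha$ confidence interval $\bar{x} \pm z_{\alpha/2}/\sqrt{n}$, and, conversely, rejecting whenever $\theta_0$ falls outside that interval gives a size-$\alpha$ test.

```python
import numpy as np
from scipy.stats import norm

# Assumed illustration: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25, alpha = 0.05.
n, alpha = 25, 0.05
z = norm.ppf(1 - alpha / 2)
sd = 1 / np.sqrt(n)

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=1.0, size=n)
xbar = x.mean()

def reject(theta0):
    """Size-alpha test of H_0: theta = theta0: reject when |x-bar - theta0| >= z * sd."""
    return abs(xbar - theta0) >= z * sd

# Confidence set obtained by inverting the test: C(x) = {theta0 : x not in R(theta0)},
# which in this setting is the familiar interval x-bar +/- z * sd.
ci = (xbar - z * sd, xbar + z * sd)
print(f"{1 - alpha:.0%} CI from inverting the test: ({ci[0]:.3f}, {ci[1]:.3f})")

# The two views agree: theta0 is rejected exactly when it lies outside the interval.
for theta0 in [0.5, 0.9, 1.0, 1.5]:
    outside = not (ci[0] < theta0 < ci[1])
    print(theta0, reject(theta0), outside)   # the last two columns always match
```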

Template:Colored remark Template:Nav Template:BookCat

  1. ↑ Thus, a natural measure of the "goodness" of a hypothesis test is the "size" of its errors. We will discuss this later in this chapter.