Statistics/Hypothesis Testing
Latest revision as of 15:09, 9 December 2022
Introduction
In previous chapters, we have discussed two methods for estimating unknown parameters, namely point estimation and interval estimation. Estimating unknown parameters is an important area in statistical inference, and in this chapter we will discuss another important area, namely hypothesis testing, which is related to interval estimation. Indeed, the concepts of hypothesis testing and confidence intervals are closely related, as we will demonstrate.
Basic concepts and terminologies
Before discussing how to conduct hypothesis testing, and how to evaluate the "goodness" of a hypothesis test, let us first introduce some basic concepts and terminologies. A hypothesis is a statement about the unknown parameter $\theta$ of the population distribution. There are two terms that classify hypotheses: a simple hypothesis specifies the population distribution completely (e.g. $\theta = \theta_0$ for a given value $\theta_0$), while a composite hypothesis does not. Sometimes, it is not immediately clear whether a hypothesis is simple or composite.

In hypothesis tests, we consider two hypotheses: the null hypothesis $H_0$ and the alternative hypothesis $H_1$, of which exactly one is assumed to be true. A general form of $H_0$ and $H_1$ is $H_0: \theta \in \Theta_0$ and $H_1: \theta \in \Theta_0^c$, where $\Theta_0^c = \Theta \setminus \Theta_0$ is the complement of $\Theta_0$ (with respect to the parameter space $\Theta$, which contains all possible values of $\theta$). The reason for choosing the complement of $\Theta_0$ in $H_1$ is that $H_1$ is the complementary hypothesis to $H_0$.

We have mentioned that exactly one of $H_0$ and $H_1$ is assumed to be true. To make a decision, we need to choose which hypothesis should be regarded as true. Of course, as one may expect, this decision is not perfect, and some errors will be involved in it. So, we cannot say that we "prove" a particular hypothesis to be true (that is, we cannot be certain that it is true). Despite this, we may "regard" (or "accept") a particular hypothesis as true (but not prove it to be true) when we have sufficient evidence leading us to this decision (ideally, with small errors [1]).

Now, we are faced with two questions. First, what evidence should we consider? Second, what is meant by "sufficient"? For the first question, a natural answer is that we should consider the observed random sample. This is because we are making a hypothesis about the population, and the samples are taken from, and thus closely related to, the population, so they should help us make the decision.

To answer the second question, we need the concepts of hypothesis testing itself. In particular, we will construct a so-called rejection region or critical region to help us determine when we should reject the null hypothesis $H_0$ (i.e., regard it as false), and hence (naturally) regard $H_1$ as true ("accept" $H_1$): since exactly one of $H_0$ and $H_1$ is true, when we regard one of them as false, we should regard the other as true. Likewise, when we do not reject $H_0$, we will act as if $H_0$ is true, i.e., accept it (and thus also reject $H_1$).
Let us formally define the terms related to hypothesis testing. Typically, we use a test statistic (a statistic for conducting a hypothesis test) to specify the rejection region. For instance, if the random sample is $X_1, \dots, X_n$ and the test statistic is $T = T(X_1, \dots, X_n)$, the rejection region may be, say, $\{(x_1, \dots, x_n) : t \ge c\}$ (where $t$ is the observed value of $T$ and $c$ is some constant). Through this, we can directly construct a hypothesis test: when $t \ge c$, we reject $H_0$ and accept $H_1$; otherwise, if $t < c$, we accept $H_0$. So, in general, to specify the rule in a hypothesis test, we just need a rejection region. After that, we will apply the test to testing $H_0$ against $H_1$. As we have mentioned, the decisions made by a hypothesis test are not perfect, and errors occur. Indeed, when we think carefully, there are actually two types of errors. We can illustrate these two types of errors more clearly using the following table.
| | Accept $H_0$ | Reject $H_0$ |
|---|---|---|
| $H_0$ is true | Correct decision | Type I error |
| $H_0$ is false | Type II error | Correct decision |
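To make the mechanics of "test statistic plus rejection region" concrete, here is a minimal numerical sketch. The setup is our own illustration, not from the original text: $X_1,\dots,X_n$ from a normal population with known variance $1$, testing $H_0: \mu = 0$ against $H_1: \mu > 0$ with the statistic $Z = \sqrt{n}\,\bar{X}$ and rejection region $\{z \ge 1.645\}$.

```python
from math import sqrt

def z_test(sample, c=1.645):
    """One-sided z-test sketch: reject H0 when the observed statistic z
    falls in the rejection region R = [c, infinity)."""
    n = len(sample)
    xbar = sum(sample) / n
    z = sqrt(n) * xbar          # observed value of the test statistic
    return z, z >= c            # (observed statistic, reject H0?)

sample = [0.2, -0.1, 0.4, 0.3, 0.1]   # hypothetical data
z, reject = z_test(sample)
# Here z is about 0.40, which is below the cutoff, so H0 is not rejected.
```

The cutoff $1.645$ is the standard normal $0.95$-quantile, so this rule has size approximately $0.05$ under the stated assumptions.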
A type I error is thus rejecting $H_0$ when $H_0$ is true, and a type II error is accepting $H_0$ when $H_0$ is false. Then, we have some notations and expressions for the probabilities of making type I and type II errors (let $X_1, \dots, X_n$ be a random sample and $R$ be the rejection region):
- The probability of making a type I error, denoted by $\alpha$, is $\alpha = \mathbb{P}((X_1, \dots, X_n) \in R; \theta)$ if $\theta \in \Theta_0$.
- The probability of making a type II error, denoted by $\beta$, is $\beta = \mathbb{P}((X_1, \dots, X_n) \notin R; \theta)$ if $\theta \in \Theta_0^c$.
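As a numerical sketch of these two expressions (our own toy setup, not from the original text): take a single observation $X \sim N(\theta, 1)$, test the simple hypotheses $H_0: \theta = 0$ against $H_1: \theta = 1$, and use the rejection region $R = [c, \infty)$.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

c = 1.645                      # cutoff of the rejection region R = [c, inf)
alpha = 1.0 - norm_cdf(c)      # P(X in R; theta = 0): type I error probability
beta = norm_cdf(c - 1.0)       # P(X not in R; theta = 1): type II error probability
# alpha is about 0.05, while beta is about 0.74.
```

Note that $\beta$ here is large: controlling $\alpha$ at $0.05$ with this rejection region leaves a substantial chance of missing the alternative, foreshadowing the trade-off discussed next.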
Notice that a common expression appears in both $\alpha$ and $\beta$, namely $\mathbb{P}((X_1, \dots, X_n) \in R; \theta)$ (since $\beta = 1 - \mathbb{P}((X_1, \dots, X_n) \in R; \theta)$ for $\theta \in \Theta_0^c$). Viewed as a function of $\theta$, this expression contains all the information about the probabilities of making errors for a hypothesis test with rejection region $R$. Hence, we give it a special name: the power function. Ideally, we want to make both $\alpha$ and $\beta$ arbitrarily small. But this is generally impossible. To understand this, we can consider the following extreme examples:
- Set the rejection region to be the set of all possible observations of the random sample. Then, $H_0$ is rejected for every possible observation. From this, of course we have $\beta = 0$, which is nice. But the serious problem is that $\alpha = 1$, due to the mindless rejection.
- Another extreme is setting the rejection region to be the empty set $\varnothing$. Then, $H_0$ is accepted for every possible observation. From this, we have $\alpha = 0$, which is nice. But, again, the serious problem is that $\beta = 1$, due to the mindless acceptance.
We can observe that to make $\alpha$ (resp. $\beta$) very small, it is inevitable that $\beta$ (resp. $\alpha$) will increase as a consequence, due to accepting (resp. rejecting) "too much". As a result, we can only try to minimize the probability of making one type of error while holding the probability of making the other type fixed.
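The trade-off can be seen numerically by sweeping the cutoff of the rejection region. This continues our own toy setup (one observation $X \sim N(\theta, 1)$, $H_0: \theta = 0$, $H_1: \theta = 1$, rejection region $[c, \infty)$): enlarging the rejection region (lowering $c$) raises $\alpha$ and lowers $\beta$, and vice versa.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

cutoffs = [0.0, 0.5, 1.0, 1.5, 2.0]
alphas = [1.0 - norm_cdf(c) for c in cutoffs]  # type I error probabilities
betas = [norm_cdf(c - 1.0) for c in cutoffs]   # type II error probabilities

# As c grows, alpha decreases monotonically while beta increases:
# the two error probabilities cannot be driven to zero simultaneously.
```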
Now, we are interested in which type of error should be controlled. To motivate the choice, we can again consider the analogy of the legal principle of presumption of innocence. In this analogy, a type I error means convicting an innocent person, and a type II error means acquitting a guilty person. Then, as suggested by Blackstone's ratio, the type I error is more serious than the type II error. This motivates us to control the probability of a type I error at a specified small value $\alpha$, so that we can control the probability of making this more serious error. After that, we consider the tests that control the type I error probability at this level, and the one with the smallest type II error probability is the "best" one (in the sense of error probabilities).
To describe "controlling the type I error probability at this level" in a more precise way, let us define the size of a test: the size is the supremum of the power function over the null parameter space, i.e., $\sup_{\theta \in \Theta_0} \mathbb{P}((X_1, \dots, X_n) \in R; \theta)$. Using this definition, controlling the type I error probability at a particular level $\alpha$ means that the size of the test should not exceed $\alpha$ (in some other places, such a test is called a level $\alpha$ test). For now, we have focused on using rejection regions to conduct hypothesis tests. But this is not the only way. Alternatively, we can make use of the $p$-value. The following theorem allows us to use the $p$-value for hypothesis testing.
Proof. (Partial) We can prove the "if" and "only if" directions at once. Let us first consider case 1 in the definition of the $p$-value, where the rejection region has the form $\{t \ge c\}$ for a test statistic $T$. By definition, the $p$-value is $p = \mathbb{P}(T \ge t; H_0)$, where $t$ is the observed value of $T$, and the size $\alpha$ test rejects $H_0$ if and only if $t \ge c_\alpha$ (define $c_\alpha$ such that $\mathbb{P}(T \ge c_\alpha; H_0) = \alpha$). Then, we have $p \le \alpha \iff \mathbb{P}(T \ge t; H_0) \le \mathbb{P}(T \ge c_\alpha; H_0) \iff t \ge c_\alpha \iff H_0$ is rejected. For the other cases, the idea is similar (just the directions of the inequalities for $T$ are different).
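The equivalence in the theorem can be checked numerically. In this sketch (our own toy setup, not from the original text), the test statistic is $T = Z \sim N(0,1)$ under $H_0$, the size is $\alpha = 0.05$, and the cutoff is $c_\alpha \approx 1.6449$: the conditions "$p$-value $\le \alpha$" and "$t \ge c_\alpha$" select exactly the same observed values $t$.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

alpha, c_alpha = 0.05, 1.6449  # size and the matching normal quantile

def p_value(t):
    """p-value for case 1: P(T >= t) under H0, T standard normal."""
    return 1.0 - norm_cdf(t)

# The two rejection criteria agree at every tested observed value t,
# including points just below and just above the cutoff.
agree = all((p_value(t) <= alpha) == (t >= c_alpha)
            for t in [-1.0, 0.0, 1.0, 1.6, 1.7, 2.5])
```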
Evaluating a hypothesis test
After discussing some basic concepts and terminologies, let us now study some ways to evaluate the goodness of a hypothesis test. As we have previously mentioned, we want the probabilities of making type I and type II errors to be small, but it is generally impossible to make both arbitrarily small. Hence, we have suggested controlling the type I error through the size of a test, and the "best" test should be the one with the smallest probability of making a type II error, after controlling the type I error.
These ideas lead us to the definition of the power of a test. Using this definition, instead of saying the "best" test (the test with the smallest type II error probability), we can say "the test with the most power", or in other words, the "most powerful (MP) test".
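As a numerical sketch of a power function (our own toy example, not from the original text): for the test "reject $H_0$ when $X \ge 1.645$" with $X \sim N(\theta, 1)$, the power function is $\pi(\theta) = \mathbb{P}(X \ge 1.645; \theta) = 1 - \Phi(1.645 - \theta)$.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(theta, c=1.645):
    """Power function: probability of rejecting H0 when the true
    parameter is theta, for the rejection region [c, infinity)."""
    return 1.0 - norm_cdf(c - theta)

thetas = [0.0, 0.5, 1.0, 2.0, 3.0]
powers = [power(t) for t in thetas]
# The power increases in theta: it equals the size (about 0.05) at the
# null value theta = 0, and approaches 1 deep inside the alternative.
```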
Constructing a hypothesis test
There are many ways of constructing a hypothesis test, but of course not all of them are good (i.e., "powerful"). In the following, we will provide some common approaches for constructing hypothesis tests. In particular, the following lemma is very useful for constructing an MP test with size $\alpha$.
Neyman-Pearson lemma
Proof. Let us first consider the case where the underlying distribution is continuous. Since the rejection region $R^*$ in the lemma is chosen so that $\mathbb{P}(\mathbf{X} \in R^*; \theta_0) = \alpha$, the "size" requirement for being an MP test is satisfied immediately. So, it suffices to show that the test satisfies the "most powerful" requirement.
Notice that in this case, the power is simply $\mathbb{P}(\mathbf{X} \in R^*; \theta_1)$. So, for every test with rejection region $R$ and size at most $\alpha$, we will proceed to show that $\mathbb{P}(\mathbf{X} \in R^*; \theta_1) \ge \mathbb{P}(\mathbf{X} \in R; \theta_1)$.
Since $f(\mathbf{x}; \theta_1) \ge k f(\mathbf{x}; \theta_0)$ for every $\mathbf{x} \in R^*$ and $f(\mathbf{x}; \theta_1) \le k f(\mathbf{x}; \theta_0)$ for every $\mathbf{x} \notin R^*$, we have
$$\mathbb{P}(\mathbf{X} \in R^*; \theta_1) - \mathbb{P}(\mathbf{X} \in R; \theta_1) = \int_{R^* \setminus R} f(\mathbf{x}; \theta_1)\,d\mathbf{x} - \int_{R \setminus R^*} f(\mathbf{x}; \theta_1)\,d\mathbf{x} \ge k \int_{R^* \setminus R} f(\mathbf{x}; \theta_0)\,d\mathbf{x} - k \int_{R \setminus R^*} f(\mathbf{x}; \theta_0)\,d\mathbf{x} = k \big( \mathbb{P}(\mathbf{X} \in R^*; \theta_0) - \mathbb{P}(\mathbf{X} \in R; \theta_0) \big) \ge k(\alpha - \alpha) = 0,$$
as desired.
For the case where the underlying distribution is discrete, the proof is very similar (just replace the integrals with sums), and hence omitted.
Even though the hypotheses involved in the Neyman-Pearson lemma are simple, under some conditions we can use the lemma to construct a UMP test for testing a composite null hypothesis against a composite alternative hypothesis. The details are as follows. For testing $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$:
- Find an MP test with size $\alpha$ for testing $H_0': \theta = \theta_0$ against $H_1': \theta = \theta_1$ using the Neyman-Pearson lemma, where $\theta_1$ is an arbitrary value such that $\theta_1 > \theta_0$.
- If the rejection region of the test does not depend on the particular value of $\theta_1$, then the test has the greatest power for every $\theta_1 > \theta_0$. So, the test is a UMP test with size $\alpha$ for testing $H_0': \theta = \theta_0$ against $H_1: \theta > \theta_0$.
- If the power function of the test attains its maximum over $\theta \le \theta_0$ at $\theta = \theta_0$, then the size of the test is still $\alpha$ even if the null hypothesis is changed to $H_0: \theta \le \theta_0$. So, after changing $H_0'$ to $H_0$ while not changing $H_1$ (and adjusting the parameter space accordingly), the test still satisfies the "most powerful" requirement (since $H_1$ is unchanged, the result in step 2 still applies), and it also satisfies the "size" requirement (because of changing $H_0'$ in this way). Hence, the test is a UMP test with size $\alpha$ for testing $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.
For testing $H_0: \theta \ge \theta_0$ against $H_1: \theta < \theta_0$, the steps are similar. But in general, there is no UMP test for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$.
Of course, when the condition in step 3 holds but the one in step 2 does not, the test in step 1 is still a UMP test with size $\alpha$ for testing $H_0: \theta \le \theta_0$ against $H_1: \theta = \theta_1$, where $\theta_1$ is a constant (larger than $\theta_0$, or else $H_0$ and $H_1$ would not be disjoint). However, hypotheses are generally not in this form. Now, let us consider another example where the underlying distribution is discrete.
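The steps above can be sketched with a standard worked example of our own (not from the original text): a random sample $X_1, \dots, X_n$ from $N(\theta, 1)$, testing $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.

```latex
% Step 1: Neyman-Pearson likelihood ratio for theta_0 versus theta_1 > theta_0.
\[
\frac{\prod_{i=1}^n f(x_i;\theta_1)}{\prod_{i=1}^n f(x_i;\theta_0)}
= \exp\!\Big( n(\theta_1-\theta_0)\bar{x}
  - \tfrac{n}{2}\big(\theta_1^2-\theta_0^2\big) \Big) \ge k
\iff \bar{x} \ge c
\qquad (\text{since } \theta_1 > \theta_0).
\]
% Step 2: the rejection region {xbar >= c} does not depend on the
% particular theta_1 > theta_0, so the test is UMP against H1.
% Step 3: choosing c so that P(Xbar >= c; theta_0) = alpha gives
\[
\sup_{\theta \le \theta_0} \mathbb{P}(\bar{X} \ge c;\theta)
= \mathbb{P}(\bar{X} \ge c;\theta_0) = \alpha,
\]
% since this probability is increasing in theta; hence the test is a
% UMP test with size alpha for H_0: theta <= theta_0 vs H_1: theta > theta_0.
```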
Likelihood-ratio test
Previously, we have suggested using the Neyman-Pearson lemma to construct MP tests for testing a simple null hypothesis against a simple alternative hypothesis. However, when the hypotheses are composite, we may not be able to use the Neyman-Pearson lemma. So, in the following, we will give a general method for constructing tests for any hypotheses, not limited to simple ones: the likelihood-ratio test. But we should notice that the tests constructed in this way are not necessarily UMP tests.
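As a numerical sketch of a likelihood-ratio statistic (our own toy setup, not from the original text): for $X_1, \dots, X_n \sim N(\theta, 1)$ and $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$, the ratio is $\lambda = \sup_{\Theta_0} L / \sup_{\Theta} L = L(\theta_0)/L(\bar{x})$, since the unrestricted MLE is $\bar{x}$; algebraically, $-2\log\lambda = n(\bar{x}-\theta_0)^2$, which the code verifies.

```python
from math import exp, log, pi

def log_likelihood(theta, sample):
    """Log-likelihood of N(theta, 1) for the given sample."""
    return sum(-0.5 * log(2 * pi) - 0.5 * (x - theta) ** 2 for x in sample)

theta0 = 0.0
sample = [0.8, -0.3, 1.1, 0.4, 0.5]   # hypothetical data
n = len(sample)
xbar = sum(sample) / n                # unrestricted MLE of theta

# Numerator: maximum over Theta_0 (only theta0); denominator:
# unrestricted maximum, attained at the MLE xbar.
lam = exp(log_likelihood(theta0, sample) - log_likelihood(xbar, sample))
lrt_stat = -2.0 * log(lam)            # reject H0 when lambda is small,
                                      # i.e. when this statistic is large
```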
Relationship between hypothesis testing and confidence intervals
We have mentioned that there are similarities between hypothesis testing and confidence intervals. In this section, we will introduce a theorem suggesting how to construct a hypothesis test from a confidence interval (or, in general, a confidence set), and vice versa.
Proof. For the first part, since $R(\theta_0)$ is the rejection region of the size $\alpha$ test of $H_0: \theta = \theta_0$, we have $\mathbb{P}(\mathbf{X} \in R(\theta_0); \theta_0) = \alpha$ for each $\theta_0$. Hence, the coverage probability of the random set $C(\mathbf{X}) = \{\theta : \mathbf{X} \notin R(\theta)\}$ is $\mathbb{P}(\theta \in C(\mathbf{X}); \theta) = \mathbb{P}(\mathbf{X} \notin R(\theta); \theta) = 1 - \alpha$, which means that the random set is a $100(1-\alpha)\%$ confidence set for $\theta$.
For the second part, by assumption, we have $\mathbb{P}(\theta \in C(\mathbf{X}); \theta) = 1 - \alpha$ for each $\theta$. So, the size of the test with rejection region $R(\theta_0) = \{\mathbf{x} : \theta_0 \notin C(\mathbf{x})\}$ is $\mathbb{P}(\theta_0 \notin C(\mathbf{X}); \theta_0) = 1 - \mathbb{P}(\theta_0 \in C(\mathbf{X}); \theta_0) = \alpha$.
- ↑ Thus, a natural measure of the "goodness" of a hypothesis test is the size of its errors. We will discuss this later in this chapter.