Introduction to Hypothesis Testing

Point estimation and confidence intervals are useful statistical inference procedures. Another frequently used type of inference concerns tests of hypotheses. We are interested in a random variable $X$ with density $f(x; \theta)$, where $\theta \in \Omega$. We would like to know whether $\theta \in \omega_0$ or $\theta \in \omega_1$, where $\omega_0 \cup \omega_1 = \Omega$.
Hypothesis testing: $H_0: \theta \in \omega_0$ versus $H_1: \theta \in \omega_1$. The hypothesis $H_0$ is referred to as the null hypothesis, while $H_1$ is referred to as the alternative hypothesis. Often the null hypothesis represents no change or no difference from the past, while the alternative hypothesis represents change or difference. The decision rule to take $H_0$ or $H_1$ is based on a sample $X_1, X_2, \ldots, X_n$ from the distribution of $X$. We may make mistakes.
Table 1: $2 \times 2$ Decision Table for a Test of Hypothesis

Decision     | $H_0$ is True    | $H_1$ is True
Reject $H_0$ | Type I Error     | Correct Decision
Accept $H_0$ | Correct Decision | Type II Error

Type I Error = $P(\text{Reject } H_0 \mid H_0 \text{ is true})$
Type II Error = $P(\text{Accept } H_0 \mid H_1 \text{ is true})$
An Example

Let $\mu_1$ be the mean midterm score for UMBC male students and $\mu_2$ be the mean score for female students. Possible hypothesis tests:

$H_0: \mu_1 - \mu_2 \le 0$ vs. $H_1: \mu_1 - \mu_2 > 0$
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$
$H_0: \mu_1 \le 75$ vs. $H_1: \mu_1 > 75$
Some Definitions

Critical region $C$: reject $H_0$ (accept $H_1$) if $(X_1, \ldots, X_n) \in C$; accept $H_0$ (reject $H_1$) if $(X_1, \ldots, X_n) \in C^c$. A Type I error occurs if $H_0$ is rejected when it is true, while a Type II error occurs if $H_0$ is accepted when $H_1$ is true. The goal is to select, from all possible critical regions, one that minimizes the probabilities of both errors. In general, this is impossible. For example, if $C = \emptyset$, the Type I error probability is 0 but the Type II error probability is 1. Often we consider the Type I error to be the worse of the two, so we select a critical region that bounds the Type I error probability and minimizes the Type II error probability.
We say a critical region $C$ is of size $\alpha$ (significance level) if
$$\alpha = \max_{\theta \in \omega_0} P_\theta[(X_1, \ldots, X_n) \in C].$$
Over all critical regions of size $\alpha$, we prefer those with lower probabilities of Type II error. For $\theta \in \omega_1$, we want to maximize
$$1 - P_\theta[\text{Type II error}] = P_\theta[(X_1, \ldots, X_n) \in C].$$
The probability on the right-hand side is called the power of the test at $\theta$. The power function is defined as
$$\gamma_C(\theta) = P_\theta[(X_1, \ldots, X_n) \in C] = P(\text{accept } H_1 \mid H_1 \text{ is true}).$$
Testing for a Binomial Proportion of Success

Let $X$ be a Bernoulli random variable with probability of success $p$. Suppose we want to test $H_0: p = p_0$ vs. $H_1: p < p_0$. Let $X_1, \ldots, X_n$ be a random sample from the distribution of $X$ and let $S = \sum_{i=1}^n X_i$. An intuitive decision rule (critical region) is: reject $H_0$ in favor of $H_1$ if $S \le k$, where $k$ is such that $\alpha = P_{H_0}[S \le k]$. Since $S$ is binomial, we may find the $k$ which (approximately) solves this equation. For example, if $n = 20$, $p_0 = 0.7$, and $\alpha = 0.15$, then $S \sim \mathrm{bin}(20, 0.7)$ and $k \doteq 11$.
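As a sketch (not part of the original notes), the critical value $k$ can be found numerically by searching for the largest $k$ whose null tail probability does not exceed $\alpha$; the helper names below are our own:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(S <= k) for S ~ Binomial(n, p), computed from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def critical_value(n, p0, alpha):
    """Largest k with P_{H0}(S <= k) <= alpha (exact size may be below alpha)."""
    k = -1
    while binom_cdf(k + 1, n, p0) <= alpha:
        k += 1
    return k

k = critical_value(20, 0.7, 0.15)   # k = 11; exact size P(S <= 11) is about 0.11
```

Because $S$ is discrete, the equation $\alpha = P_{H_0}[S \le k]$ usually cannot be solved exactly, which is why the notes write $k \doteq 11$: the achieved size is the largest attainable value not exceeding 0.15.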
The power function is $\gamma(p) = P_p[S \le k]$ for $p < p_0$. See Figure 5.5.1 for a plot of $\gamma(p)$. Note that the function is decreasing in $p$: the power to detect the alternative $p = 0.2$ is higher than the power to detect $p = 0.6$.

Simple hypothesis: completely specifies the underlying distribution, e.g., $H_0: p = p_0$. Composite hypothesis: composed of more than one simple hypothesis, e.g., $H_1: p < p_0$.
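The decreasing shape of $\gamma(p)$ can be checked directly with a short sketch (our own code, not from the notes), using the $n = 20$, $k = 11$ test above:

```python
from math import comb

def power(p, n=20, k=11):
    """gamma(p) = P_p(S <= k): power of the test 'reject H0 if S <= k'."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# gamma is decreasing in p, so alternatives far below p0 = 0.7 are
# detected almost surely, while alternatives near p0 are hard to detect.
```

For instance, `power(0.2)` is essentially 1, while `power(0.6)` is only around 0.4, matching the behavior described for Figure 5.5.1.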
Large Sample Tests for the Mean

The test in the last example was based on the exact distribution of its test statistic, i.e., the binomial distribution. Often we cannot get the distribution of the test statistic in closed form; instead we use the central limit theorem. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$. We want to test
$$H_0: \mu = \mu_0 \text{ vs. } H_1: \mu > \mu_0,$$
where $\mu_0$ is specified. For example, $\mu_0$ might be the mean level on a standardized test of students who have been taught by a standard method of teaching. Let $X_1, \ldots, X_n$ be a random sample from the distribution of $X$.
Because $\bar{X} \xrightarrow{P} \mu$, an intuitive decision rule is: reject $H_0$ in favor of $H_1$ if $\bar{X}$ is much larger than $\mu_0$. By the central limit theorem,
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \xrightarrow{D} Z \sim N(0, 1).$$
Using this, we obtain a test with approximate size $\alpha$: reject $H_0$ in favor of $H_1$ if
$$\frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge z_\alpha.$$
The test is intuitive: to reject $H_0$, $\bar{X}$ must exceed $\mu_0$ by at least $z_\alpha S/\sqrt{n}$.
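The large-sample decision rule can be sketched in a few lines (our own helper, not from the notes), using the standard library's `statistics.NormalDist` for the upper-$\alpha$ critical point $z_\alpha$:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test(sample, mu0, alpha=0.05):
    """Approximate size-alpha test of H0: mu = mu0 vs H1: mu > mu0.

    Returns the standardized statistic and the reject/accept decision.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha point of N(0, 1)
    return z, z >= z_alpha
```

Here `stdev` is the sample standard deviation $S$, so the statistic matches $(\bar{X} - \mu_0)/(S/\sqrt{n})$ above; the size is only approximate because it relies on the central limit theorem.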
The power function is approximated by
$$\gamma(\mu) = P_\mu\!\left(\bar{X} \ge \mu_0 + z_\alpha \sigma/\sqrt{n}\right) = 1 - \Phi\!\left(z_\alpha + \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma}\right) = \Phi\!\left(-z_\alpha - \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma}\right),$$
which is an increasing function of $\mu$.
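This power formula is easy to evaluate with the standard library; the sketch below (our own code, with illustrative parameter values) also confirms that $\gamma(\mu_0) = \alpha$ and that $\gamma$ is increasing in $\mu$:

```python
from math import sqrt
from statistics import NormalDist

def power(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """gamma(mu) = Phi(-z_alpha - sqrt(n) * (mu0 - mu) / sigma)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(-z_alpha + sqrt(n) * (mu - mu0) / sigma)

# At mu = mu0 the power equals the size alpha; it rises toward 1 as mu grows.
```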
Tests for $\mu$ under Normality

Let $X \sim N(\mu, \sigma^2)$. Consider
$$H_0: \mu = \mu_0 \text{ vs. } H_1: \mu > \mu_0.$$
Under $H_0$, the test statistic $T = (\bar{X} - \mu_0)/(S/\sqrt{n})$ has a $t$-distribution with $n - 1$ degrees of freedom. The decision rule is: reject $H_0$ in favor of $H_1$ if
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge t_{\alpha, n-1}.$$
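A minimal sketch of this exact $t$-test (our own code; it assumes `scipy` is available for the $t$ critical value, since the standard library has no $t$ quantile function):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t   # assumed available for t quantiles

def t_test(sample, mu0, alpha=0.05):
    """Exact size-alpha test of H0: mu = mu0 vs H1: mu > mu0 under normality."""
    n = len(sample)
    T = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return T, T >= t.ppf(1 - alpha, df=n - 1)   # critical point t_{alpha, n-1}
```

Unlike the large-sample $z$-test, the size here is exact for every $n$, provided the underlying distribution really is normal.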
p-value

The p-value is the observed tail probability of a statistic being at least as extreme as the particular observed value when $H_0$ is true. If $Y = u(X_1, \ldots, X_n)$ is the statistic to be used in a test of $H_0$ and the critical region is of the form $u(x_1, \ldots, x_n) \le c$, then an observed value $u(x_1, \ldots, x_n) = d$ gives p-value $= P(Y \le d; H_0)$. If the p-value is small, we reject the null hypothesis.
Let $X_1, \ldots, X_{25}$ be a random sample from $N(\mu, 2^2)$. To test $H_0: \mu = 77$ vs. $H_1: \mu < 77$, say we observe the 25 values and find $\bar{x} = 76.1$. We know that $Z = (\bar{X} - 77)/\sqrt{4/25}$ is $N(0, 1)$ provided that $\mu = 77$. Since the observed statistic is $z = (76.1 - 77)/0.4 = -2.25$, the p-value of the test is $\Phi(-2.25) = 1 - \Phi(2.25) = 1 - 0.9878 = 0.0122$. Accordingly, if we use a significance level of $\alpha = 0.05$, we reject $H_0$ and accept $H_1: \mu < 77$.
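The arithmetic of this example can be reproduced with the standard library's `statistics.NormalDist` (a sketch of the worked example, not new material):

```python
from math import sqrt
from statistics import NormalDist

# Worked example: n = 25, sigma = 2, H0: mu = 77 vs H1: mu < 77, xbar = 76.1
z = (76.1 - 77) / (2 / sqrt(25))   # standardized statistic, = -2.25
p_value = NormalDist().cdf(z)      # lower-tail probability Phi(-2.25) ~ 0.0122
reject = p_value < 0.05            # reject H0 at significance level 0.05
```

Reporting the p-value is more informative than reporting only the accept/reject decision, since any reader can compare it against their own choice of $\alpha$.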