The Neyman-Pearson lemma


In practical hypothesis-testing situations there are typically many possible tests with significance level α for a given null hypothesis versus an alternative. This raises two important questions: (1) How do we decide on the test statistic? (2) How do we know that we have selected the best rejection region?

Definition 7.2.1 Suppose that W is the test statistic and RR is the rejection region for a test of hypothesis concerning the value of a parameter θ. Then the power of the test is the probability that the test rejects H0 when the alternative is true. That is,

π = Power(θ) = P(W in RR when the parameter value is an alternative θ).

If H0: θ = θ0 and Ha: θ ≠ θ0, then the power of the test at some θ = θ1 ≠ θ0 is Power(θ1) = P(reject H0 | θ = θ1). But β(θ1) = P(accept H0 | θ = θ1). Therefore, Power(θ1) = 1 − β(θ1).
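As a concrete illustration of the power calculation (the numbers here are made up, not from the text), a minimal Python sketch for an upper-tail z-test of H0: µ = µ0 versus Ha: µ > µ0 with known σ:

```python
from math import sqrt
from statistics import NormalDist

def ztest_power(mu0, mu1, sigma, n, alpha):
    """Power of the upper-tail z-test: P(reject H0 | mu = mu1).

    H0 is rejected when Z = (Xbar - mu0)/(sigma/sqrt(n)) > z_alpha,
    i.e. when Xbar exceeds mu0 + z_alpha * se.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
    se = sigma / sqrt(n)                        # standard error of Xbar
    # Probability of landing in the rejection region under the alternative mean mu1.
    return 1 - NormalDist(mu=mu1, sigma=se).cdf(mu0 + z_alpha * se)

power = ztest_power(mu0=50, mu1=52, sigma=5, n=25, alpha=0.05)
beta = 1 - power   # type II error probability at mu = mu1
```

Moving the alternative µ1 further from µ0 (or increasing n) raises the power, which matches the definition Power(θ1) = 1 − β(θ1).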

Example 7.2.1 Let X1, …, Xn be a random sample from a Poisson distribution with parameter λ, that is, with pmf f(x) = e^(−λ) λ^x / x!. Then the hypothesis H0: λ = 1 uniquely specifies the distribution, because f(x) = e^(−1)/x!, and hence is a simple hypothesis. The hypothesis Ha: λ > 1 is composite, because f(x) is not uniquely determined.

Definition 7.2.2 A test at a given level α of a simple hypothesis H0 versus a simple alternative Ha that has the largest power among all tests with probability of type I error no larger than α is called the most powerful test.

Theorem 7.2.1 (Neyman-Pearson lemma) Suppose that one wants to test a simple hypothesis H0: θ = θ0 versus the simple alternative hypothesis Ha: θ = θ1 based on a random sample X1, …, Xn from a distribution with parameter θ. Let L(θ) ≡ L(θ; x1, …, xn) > 0 denote the likelihood of the sample when the value of the parameter is θ. If there exist a positive constant K and a subset C of the sample space Rⁿ (the Euclidean n-space) such that

1. L(θ0)/L(θ1) ≤ K for (x1, …, xn) in C,
2. L(θ0)/L(θ1) ≥ K for (x1, …, xn) in C′ (the complement of C), and
3. P((X1, …, Xn) in C; θ0) = α,

then the test with critical region C will be the most powerful test for H0 versus Ha. We call α the size of the test and C the best critical region of size α.

Example 7.2.2 Let X1, …, Xn denote an independent random sample from a population with a Poisson distribution with mean λ. Derive the most powerful test for testing H0: λ = 2 versus Ha: λ = 1/2.

Solution Recall that the pmf of a Poisson variable is p(x) = e^(−λ) λ^x / x! for λ > 0, x = 0, 1, 2, …, and 0 otherwise. Thus, the likelihood function is

L(λ) = λ^(Σxᵢ) e^(−nλ) / ∏ᵢ xᵢ!.

Example 7.2.2 Solution (cont.) For λ = 2,

L(θ0) = L(λ = 2) = 2^(Σxᵢ) e^(−2n) / ∏ xᵢ!,

and for λ = 1/2,

L(θ1) = L(λ = 1/2) = (1/2)^(Σxᵢ) e^(−n/2) / ∏ xᵢ!.

Thus,

L(θ0)/L(θ1) = 2^(Σxᵢ) e^(−2n) / [(1/2)^(Σxᵢ) e^(−n/2)] = 4^(Σxᵢ) e^(−3n/2) < K.

Taking natural logarithms, (Σxᵢ) ln 4 − 3n/2 < ln K. Solving for Σxᵢ and letting K′ = [ln K + (3n/2)] / ln 4, we will reject H0 whenever Σxᵢ < K′.

Procedure for applying the Neyman-Pearson lemma
1. Determine the likelihood functions under both the null and alternative hypotheses.
2. Set the ratio of the two likelihood functions to be less than a constant K.
3. Simplify the inequality in step 2 to obtain a rejection region.
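The rejection rule just derived can be checked numerically. This sketch (illustrative sample values only) computes the likelihood ratio L(λ=2)/L(λ=1/2) for a Poisson sample and applies the equivalent Σxᵢ < K′ rule:

```python
from math import exp, log

def poisson_lr(xs, lam0=2.0, lam1=0.5):
    """Likelihood ratio L(lam0)/L(lam1) for a Poisson sample; the x_i! terms cancel."""
    s, n = sum(xs), len(xs)
    # (lam0/lam1)^(sum x) * exp(-n*(lam0 - lam1)) = 4^s * e^(-3n/2) here.
    return (lam0 / lam1) ** s * exp(-n * (lam0 - lam1))

def reject_h0(xs, K):
    """NP test: reject H0 (lam=2) for Ha (lam=1/2) when the ratio < K,
    equivalently when sum(xs) < K' = (ln K + 1.5 n) / ln 4."""
    n = len(xs)
    K_prime = (log(K) + 1.5 * n) / log(4)
    return sum(xs) < K_prime

sample_small = [0, 1, 0, 1, 0]   # small counts favour lam = 1/2
sample_large = [3, 2, 4, 2, 3]   # large counts favour lam = 2
```

The ratio is increasing in Σxᵢ, so thresholding the ratio at K is the same as thresholding Σxᵢ at K′.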

Example 7.2.3 Suppose X1, …, Xn is a random sample from a normal distribution with a known mean µ and an unknown variance σ². Find the most powerful α-level test for testing H0: σ² = σ0² versus Ha: σ² = σ1² (σ1² > σ0²). Show that this test is equivalent to the χ²-test. Is the test uniformly most powerful for Ha: σ² > σ0²?

Solution We test H0: σ² = σ0² versus Ha: σ² = σ1² (σ1² > σ0²). We have

L(σ0²) = (1/(√(2π) σ0))ⁿ exp{−Σ(xᵢ − µ)²/(2σ0²)},

and similarly

L(σ1²) = (1/(√(2π) σ1))ⁿ exp{−Σ(xᵢ − µ)²/(2σ1²)}.

Therefore, the most powerful test is: reject H0 if

L(σ0²)/L(σ1²) = (σ1/σ0)ⁿ exp{−[(σ1² − σ0²)/(2σ1²σ0²)] Σ(xᵢ − µ)²} ≤ K

for some K.

Example 7.2.3 Solution (cont.) Taking natural logarithms, we have

n ln(σ1/σ0) − [(σ1² − σ0²)/(2σ1²σ0²)] Σ(xᵢ − µ)² ≤ ln K,

or

Σ(xᵢ − µ)² ≥ [n ln(σ1/σ0) − ln K] · 2σ1²σ0²/(σ1² − σ0²) = C.

To find the rejection region for a fixed value of α, write the region as

Σ(xᵢ − µ)²/σ0² ≥ C/σ0² = C′.

Note that Σ(xᵢ − µ)²/σ0² has a χ²-distribution with n degrees of freedom under H0. Because the same rejection region would be used for any σ1² > σ0² (it does not depend upon the specific value of σ1² in the alternative), the test is uniformly most powerful for Ha: σ² > σ0².

Likelihood ratio tests

In this section, we shall study a general procedure that is applicable when one or both of H0 and Ha are composite. We assume that the pdf or pmf of the random variable X is f(x; θ), where θ denotes one or more unknown parameters. Let Θ represent the total parameter space, that is, the set of all possible values of the parameter θ given by either H0 or Ha, and let Θ0 be the subset of Θ specified by H0.

Definition The likelihood ratio λ is the ratio

λ = max over θ in Θ0 of L(θ; x1, …, xn) / max over θ in Θ of L(θ; x1, …, xn) = L0*/L*.

We note that 0 ≤ λ ≤ 1. Because λ is the ratio of nonnegative functions, λ ≥ 0. Because Θ0 is a subset of Θ, we know that max over Θ0 of L(θ) ≤ max over Θ of L(θ); hence λ ≤ 1.

Likelihood ratio tests (LRTs) To test H0: θ in Θ0 versus Ha: θ in Θa,

λ = max over θ in Θ0 of L(θ; x1, …, xn) / max over θ in Θ of L(θ; x1, …, xn) = L0*/L*

will be used as the test statistic. The rejection region for the likelihood ratio test is given by: reject H0 if λ ≤ K, where K is selected such that the test has the given significance level α.

Example Let X1, …, Xn be a random sample from an N(µ, σ²). Assume that σ² is known. We wish to test, at level α, H0: µ = µ0 versus Ha: µ ≠ µ0. Find an appropriate likelihood ratio test.

Solution We have seen that there is no uniformly most powerful test for H0: µ = µ0 versus Ha: µ ≠ µ0. The likelihood function is

L(µ) = (1/(√(2π)σ))ⁿ exp{−Σ(xᵢ − µ)²/(2σ²)}.

Here, Θ0 = {µ0} and Θa = R − {µ0}.

Solution (cont.) Hence,

L0* = (1/(√(2π)σ))ⁿ exp{−Σ(xᵢ − µ0)²/(2σ²)}.

Similarly,

L* = max over −∞ < µ < ∞ of (1/(√(2π)σ))ⁿ exp{−Σ(xᵢ − µ)²/(2σ²)}.

Because the only unknown parameter in the parameter space Θ is µ, −∞ < µ < ∞, the maximum of the likelihood function is achieved when µ equals its maximum likelihood estimator, that is, µ̂ = x̄. Therefore, with a simple calculation we have

λ = exp{−Σ(xᵢ − µ0)²/(2σ²)} / exp{−Σ(xᵢ − x̄)²/(2σ²)} = e^(−n(x̄ − µ0)²/(2σ²)).

Solution (cont.) Thus, the likelihood ratio test has the rejection region: reject H0 if λ ≤ K, which is equivalent to

(n/(2σ²))(x̄ − µ0)² ≥ −ln K  ⟺  (x̄ − µ0)²/(σ²/n) ≥ −2 ln K  ⟺  |x̄ − µ0|/(σ/√n) ≥ √(−2 ln K) = c1.

We now compute c1. Under H0, (X̄ − µ0)/(σ/√n) ~ N(0, 1). Hence, the LRT for the given hypothesis is: reject H0 if |x̄ − µ0|/(σ/√n) > z(α/2). Thus, in this case, the likelihood ratio test is equivalent to the z-test.
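The identity λ = e^(−n(x̄ − µ0)²/(2σ²)), and hence the equivalence between the λ-threshold and a z-threshold, can be verified numerically. A sketch with made-up data:

```python
from math import exp, log, pi, sqrt

def normal_loglik(xs, mu, sigma):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample."""
    n = len(xs)
    return -n / 2 * log(2 * pi * sigma**2) - sum((x - mu) ** 2 for x in xs) / (2 * sigma**2)

def lrt_lambda(xs, mu0, sigma):
    """Likelihood ratio L(mu0) / L(mu_hat), where mu_hat is the sample mean."""
    xbar = sum(xs) / len(xs)
    return exp(normal_loglik(xs, mu0, sigma) - normal_loglik(xs, xbar, sigma))

xs = [4.8, 5.6, 5.1, 4.9, 5.7, 5.3]   # illustrative data
mu0, sigma = 5.0, 0.5
lam = lrt_lambda(xs, mu0, sigma)

# Closed form from the derivation: lambda = exp(-n (xbar - mu0)^2 / (2 sigma^2)),
# so lambda <= K  <=>  |z| >= sqrt(-2 ln K), with z = (xbar - mu0) / (sigma / sqrt(n)).
n = len(xs)
xbar = sum(xs) / n
lam_closed = exp(-n * (xbar - mu0) ** 2 / (2 * sigma**2))
z = (xbar - mu0) / (sigma / sqrt(n))
```

Here −2 ln λ equals z² exactly, which is the algebra behind the c1 step above.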

Procedure for the likelihood ratio test (LRT)
1. Find the largest value of the likelihood L(θ) for any θ in Θ0, by finding the maximum likelihood estimate within Θ0 and substituting it back into the likelihood function.
2. Find the largest value of the likelihood L(θ) for any θ in Θ, by finding the maximum likelihood estimate within Θ and substituting it back into the likelihood function.
3. Form the ratio λ = λ(x1, …, xn) = L(θ̂ in Θ0)/L(θ̂ in Θ).
4. Determine a K so that the test has the desired probability of type I error, α.
5. Reject H0 if λ ≤ K.

Example 7.3.2 Machine 1 produces 5% defectives. Machine 2 produces 10% defectives. Ten items produced by one of the machines are sampled randomly; let X be the number of defectives and let θ be the true proportion of defectives. Test H0: θ = 0.05 versus Ha: θ = 0.10. Use α = 0.05.

Solution We need to test H0: θ = 0.05 versus Ha: θ = 0.10. Let

L(θ) = C(10, x) (0.05)^x (0.95)^(10−x), if θ = 0.05,
L(θ) = C(10, x) (0.10)^x (0.90)^(10−x), if θ = 0.10.

Example 7.3.2 Solution (cont.) And

L1 = L(0.05) = C(10, x) (0.05)^x (0.95)^(10−x) and L2 = L(0.10) = C(10, x) (0.10)^x (0.90)^(10−x).

Thus, we have

L1/L2 = (0.05)^x (0.95)^(10−x) / [(0.10)^x (0.90)^(10−x)] = (1/2)^x (19/18)^(10−x).

The likelihood ratio is λ = L1 / max(L1, L2).

Note that if max(L1, L2) = L1, then λ = 1. Because we want to reject for small values of λ, we need max(L1, L2) = L2, and we reject H0 if L1/L2 < K, or equivalently L2/L1 > K′ (note that L2/L1 = 2^x (18/19)^(10−x)). That is, reject H0 if

2^x (18/19)^(10−x) = (18/19)^10 (19/9)^x > K′  ⟺  (19/9)^x > K″.

Because (19/9)^x is increasing in x, this says: reject H0 if X > C, where C is chosen so that P(X > C | H0: θ = 0.05) ≤ α. Using the binomial tables, P(X > 1 | θ = 0.05) = 0.0861 and P(X > 2 | θ = 0.05) = 0.0115 ≤ 0.05. Hence, reject H0 if X > 2.

P.S. The likelihood ratio tests do not always produce a test statistic with a known probability distribution.
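The binomial-table step can be reproduced directly; this sketch computes the upper-tail probabilities P(X > c | θ = 0.05) for n = 10 to locate the cutoff:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(n, c, p):
    """P(X > c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, k, p) for k in range(c + 1, n + 1))

n, p0 = 10, 0.05
tails = {c: upper_tail(n, c, p0) for c in range(0, 4)}
# tails[1] ~ 0.0861 exceeds 0.05, while tails[2] ~ 0.0115 does not,
# so the smallest cutoff with P(X > c | H0) <= 0.05 is c = 2: reject when X > 2.
```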

Take a break.

Hypotheses for a single parameter

Definition Corresponding to an observed value of a test statistic, the p-value (or attained significance level) is the lowest level of significance at which the null hypothesis would have been rejected.

p-value: the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. [from wiki]

Steps to find the p-value
1. Let TS be the test statistic.
2. Compute the value of TS using the sample X1, …, Xn; say it is a.
3. The p-value is given by

p-value = P(TS < a | H0), for a lower-tail test;
P(TS > a | H0), for an upper-tail test;
P(|TS| > |a| | H0), for a two-tail test.

The p-value depends on the alternative hypothesis!

Example To test H0: µ = 0 versus Ha: µ ≠ 0, suppose that the test statistic Z results in a computed value of 1.58. Then the p-value = P(|Z| > 1.58) = 2 × 0.0571 = 0.1142. That is, we must tolerate a type I error of 0.1142 (or higher) in order to reject H0. Also, if Ha: µ > 0, then the p-value would be P(Z > 1.58) = 0.0571. In this case we must have an α of 0.0571 (or higher) in order to reject H0.
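The three cases reduce to standard-normal tail probabilities when the test statistic is a z. A sketch reproducing the example's numbers with Python's statistics.NormalDist:

```python
from statistics import NormalDist

def z_p_value(z_obs, tail):
    """p-value for an observed z statistic; tail is 'lower', 'upper', or 'two'."""
    Phi = NormalDist().cdf
    if tail == "lower":
        return Phi(z_obs)                    # P(Z < z_obs)
    if tail == "upper":
        return 1 - Phi(z_obs)                # P(Z > z_obs)
    return 2 * (1 - Phi(abs(z_obs)))         # P(|Z| > |z_obs|)

p_two = z_p_value(1.58, "two")       # two-tailed, ~0.1142
p_upper = z_p_value(1.58, "upper")   # upper-tailed, ~0.0571
```

The two-tailed p-value is exactly twice the upper-tailed one, as in the example.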

Reporting test results as p-values
1. Choose the maximum value of α that you are willing to tolerate.
2. If the p-value of the test is less than that maximum α, reject H0.

Example 7.4.2 The management of a local health club claims that its members lose on average 15 pounds or more within the first 3 months after joining the club. To check this claim, a consumer agency took a random sample of 45 members of this health club and found that they lost an average of 13.8 pounds within the first 3 months of membership, with a sample standard deviation of 4.2 pounds. (a) Find the p-value for this test. (b) Based on the p-value in (a), would you reject the null hypothesis at α = 0.01?

Example 7.4.2 Solution (a) Let µ be the true mean weight loss in pounds within the first 3 months of membership in this club. Then we have to test the hypothesis H0: µ = 15 versus Ha: µ < 15. Here n = 45, x̄ = 13.8, and s = 4.2. Because n = 45 > 30, we can use the normal approximation. Hence, the test statistic is

z = (13.8 − 15)/(4.2/√45) = −1.92,

and p-value = P(Z < −1.92) = 0.0274. Thus, we can use an α as small as 0.0274 and still reject H0. (b) No, because 0.0274 > 0.01.

Steps in any hypothesis testing problem
1. State the alternative hypothesis, Ha (what is believed to be true).
2. State the null hypothesis, H0 (what is doubted to be true).
3. Decide on a level of significance α.
4. Choose an appropriate TS and compute the observed test statistic.
5. Using the distribution of TS and α, determine the rejection region(s) (RR).
6. Conclusion: If the observed test statistic falls in the RR, reject H0 and conclude that, based on the sample information, we are (1 − α)100% confident that Ha is true. Otherwise, conclude that there is not sufficient evidence to reject H0. In all applied problems, interpret the meaning of your decision.
7. State any assumptions you made in testing the given hypothesis.
8. Compute the p-value from the null distribution of the test statistic and interpret it.
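The computation in Example 7.4.2(a) takes only a few lines:

```python
from math import sqrt
from statistics import NormalDist

# Example 7.4.2(a): H0: mu = 15 vs Ha: mu < 15, with n = 45, xbar = 13.8, s = 4.2.
n, xbar, mu0, s = 45, 13.8, 15.0, 4.2
z = (xbar - mu0) / (s / sqrt(n))    # observed test statistic, ~ -1.92
p_value = NormalDist().cdf(z)       # lower-tail p-value, ~ 0.028
# Since p_value > 0.01, H0 is not rejected at alpha = 0.01, matching part (b).
```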

Summary of hypothesis tests for µ

Large sample (n ≥ 30). To test H0: µ = µ0 versus
Ha: µ > µ0 (upper-tail test), µ < µ0 (lower-tail test), µ ≠ µ0 (two-tailed test).
Test statistic: Z = (X̄ − µ0)/(σ/√n). Replace σ by S if σ is unknown.
Rejection region: z > z(α) (upper-tail RR), z < −z(α) (lower-tail RR), |z| > z(α/2) (two-tail RR).
Assumption: n ≥ 30.

Small sample (n < 30). To test H0: µ = µ0 versus
Ha: µ > µ0 (upper-tail test), µ < µ0 (lower-tail test), µ ≠ µ0 (two-tailed test).
Test statistic: T = (X̄ − µ0)/(S/√n).
Rejection region: t > t(α, n−1) (upper-tail RR), t < −t(α, n−1) (lower-tail RR), |t| > t(α/2, n−1) (two-tail RR).
Assumption: the random sample comes from a normal population.

Decision: Reject H0 if the observed test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, keep H0: there is not enough evidence to conclude that Ha is true for the given α, and more experiments may be needed.

Example In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is thought that people travel on average at least 75 mph. To check this claim, radar measurements of the speeds (in mph) were obtained for 10 vehicles traveling on this stretch of the interstate highway. Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this stretch of highway is at least 75 mph? Test the appropriate hypothesis using α = 0.01. Draw a box plot and a normal plot for these data, and comment.

[Box plot and normal plot of the speed data.]

Example Solution We need to test H0: µ = 75 versus Ha: µ > 75. For this sample, the sample mean is x̄ = 74.8 mph. The observed test statistic is t = (x̄ − 75)/(s/√10). From the t-table, t(0.01, 9) = 2.821, so the rejection region is {t > 2.821}. Because x̄ = 74.8 < 75, the observed t is negative and does not fall in the rejection region, so we do not reject the null hypothesis at α = 0.01.

Example Solution (cont.) The box plot suggests that there are no outliers present. However, the normal plot indicates that the normality assumption for this data set is not justified. Hence, it may be more appropriate to do a nonparametric test.

Example A machine is considered to be unsatisfactory if it produces more than 8% defectives. It is suspected that the machine is unsatisfactory. A random sample of 120 items produced by the machine contains 14 defectives. Does the sample evidence support the claim that the machine is unsatisfactory? Use α = 0.01.

Example Solution Let Y be the number of observed defectives. This follows a binomial distribution. However, because np0 and nq0 are greater than 5, we can use a normal approximation to the binomial to test the hypothesis. So we need to test H0: p = 0.08 versus Ha: p > 0.08. Let the point estimate of p be p̂ = Y/n = 14/120 = 0.117, the sample proportion. Then the value of the TS is

z = (p̂ − p0)/√(p0 q0 / n) = (0.117 − 0.08)/√((0.08)(0.92)/120) = 1.49.

For α = 0.01, z(0.01) = 2.33. Hence, the rejection region is {z > 2.33}. Decision: Because 1.49 is not greater than 2.33, we do not reject H0. We conclude that the evidence does not support the claim that the machine is unsatisfactory.
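A sketch of the large-sample proportion test applied to the defectives example (using the exact fraction 14/120 rather than the rounded 0.117, which shifts z slightly):

```python
from math import sqrt

def prop_ztest(successes, n, p0):
    """Large-sample z statistic for H0: p = p0, using the null variance p0*q0/n."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Defectives example: 14 defectives in 120 items, H0: p = 0.08 vs Ha: p > 0.08.
z = prop_ztest(14, 120, 0.08)
# z ~ 1.48 with the exact p_hat (~1.49 when p_hat is rounded to 0.117);
# either way z < z_0.01 = 2.33, so H0 is not rejected at alpha = 0.01.
```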

Summary of hypothesis test for the proportion p

To test H0: p = p0 versus
Ha: p > p0 (upper-tail test), p < p0 (lower-tail test), p ≠ p0 (two-tail test).
Test statistic: Z = (p̂ − p0)/σ(p̂), where σ(p̂) = √(p0(1 − p0)/n).
Rejection region: z > z(α) (upper-tail RR), z < −z(α) (lower-tail RR), |z| > z(α/2) (two-tail RR).

Summary of hypothesis test for the proportion p (cont.)
Assumption: n is large. A good rule of thumb is to use the normal approximation to the binomial distribution only when np0 and n(1 − p0) are both greater than 5.
Decision: Reject H0 if the observed test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for the given α; more data are needed.

Summary of hypothesis test for the variance σ²

To test H0: σ² = σ0² versus
Ha: σ² > σ0² (upper-tail test), σ² < σ0² (lower-tail test), σ² ≠ σ0² (two-tailed test).
Test statistic: χ² = (n − 1)S²/σ0², where S² is the sample variance. Observed value of the test statistic: (n − 1)s²/σ0².
Rejection region: χ² > χ²(α, n−1) (upper-tail RR), χ² < χ²(1−α, n−1) (lower-tail RR), χ² > χ²(α/2, n−1) or χ² < χ²(1−α/2, n−1) (two-tail RR).

Summary of hypothesis test for the variance σ² (cont.)
Assumption: the sample comes from a normal population.
Decision: Reject H0 if the observed test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for the given α; more data are needed.

Example A physician claims that the variance in cholesterol levels of adult men in a certain laboratory is at most 100. A random sample of 25 adult males from this laboratory produced a sample standard deviation of cholesterol levels of 12. Test the physician's claim at the 5% level of significance.

Solution We test H0: σ² = 100 versus Ha: σ² < 100. For α = 0.05 and 24 degrees of freedom, the rejection region is RR = {χ² < χ²(0.95, 24)} = {χ² < 13.848}. The observed value of the TS is

χ² = (24)(144)/100 = 34.56.

Because the value of the test statistic does not fall in the rejection region, we cannot reject H0 at the 5% level of significance.

Testing of hypotheses for two samples

Independent samples Two random samples are drawn independently of each other from two populations, and the sample information is obtained. We are interested in testing hypotheses about the difference of the true means. Let X11, …, X1n1 be a random sample from population 1 with mean µ1 and variance σ1², and X21, …, X2n2 be a random sample from population 2 with mean µ2 and variance σ2². Let X̄i, i = 1, 2, represent the respective sample means and Si², i = 1, 2, the respective sample variances. In testing hypotheses about µ1 and µ2, there are three cases:
(i) σ1² and σ2² are known;
(ii) σ1² and σ2² are unknown, and n1 ≥ 30 and n2 ≥ 30;
(iii) σ1² and σ2² are unknown, and n1 < 30 and n2 < 30, with (a) σ1² = σ2² or (b) σ1² ≠ σ2².
Case (iii) is the most common and the most complicated in computation.
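The χ² statistic in the cholesterol example is a one-liner; the critical value χ²(0.95, 24) = 13.848 is taken from a table here (assumed, not computed):

```python
# Chi-square test for H0: sigma^2 = 100 vs Ha: sigma^2 < 100, n = 25, s = 12.
n, s, sigma0_sq = 25, 12, 100.0
chi_sq = (n - 1) * s**2 / sigma0_sq   # observed statistic, (24)(144)/100
crit_lower = 13.848                   # chi^2_{0.95, 24}, lower-tail table value
reject = chi_sq < crit_lower          # lower-tail rejection region
# chi_sq = 34.56, which is not below 13.848, so H0 is not rejected.
```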

Hypothesis test for µ1 − µ2 for large samples (n1 and n2 ≥ 30)

To test H0: µ1 − µ2 = D0 versus
Ha: µ1 − µ2 > D0 (upper-tailed test), µ1 − µ2 < D0 (lower-tailed test), µ1 − µ2 ≠ D0 (two-tailed test).
The test statistic is

Z = (X̄1 − X̄2 − D0)/√(σ1²/n1 + σ2²/n2).

Replace σi² by Si², i = 1, 2, if the σi² are not known.

Hypothesis test for µ1 − µ2 for large samples (cont.) The rejection region is z > z(α) (upper-tailed RR), z < −z(α) (lower-tailed RR), |z| > z(α/2) (two-tailed RR), where z is the observed test statistic given by

z = (x̄1 − x̄2 − D0)/√(σ1²/n1 + σ2²/n2).

Assumption: the samples are independent and n1 and n2 ≥ 30.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for the given α; more experiments are needed.

Example In a salary study of faculty at a certain university, sample salaries of 50 male assistant professors and 50 female assistant professors yielded the following basic statistics.

                               Sample mean salary    Sample standard deviation
Male assistant professors      $36,…                 …
Female assistant professors    $34,…                 …

Test the hypothesis that the mean salary of male assistant professors is higher than the mean salary of female assistant professors at this university. Use α = 0.05.

Example Solution Let µ1 be the mean salary for male assistant professors and µ2 the mean salary for female assistant professors at this university. We test H0: µ1 − µ2 = 0 versus Ha: µ1 − µ2 > 0. The test statistic is

z = (x̄1 − x̄2 − D0)/√(σ1²/n1 + σ2²/n2).

The rejection region for α = 0.05 is {z > 1.645}.

Example Solution (cont.) Because the observed value of z exceeds 1.645, we reject the null hypothesis at α = 0.05. We conclude that the mean salary of male assistant professors at this university is higher than that of female assistant professors at α = 0.05. Note that even though σ1² and σ2² are unknown, because n1 ≥ 30 and n2 ≥ 30 we could replace σ1² and σ2² by the respective sample variances. We are assuming that the salaries of males and females are sampled independently of each other.

Comparison of two population means, small-sample case (pooled t-test); assume the variances are equal

To test H0: µ1 − µ2 = D0 versus
Ha: µ1 − µ2 > D0 (upper-tailed test), µ1 − µ2 < D0 (lower-tailed test), µ1 − µ2 ≠ D0 (two-tailed test).
The test statistic is

T = (X̄1 − X̄2 − D0)/(Sp √(1/n1 + 1/n2)),

where the pooled sample variance is

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2).

Comparison of two population means, small-sample case (pooled t-test) (cont.) Then the rejection region is t > t(α) (upper-tailed RR), t < −t(α) (lower-tailed RR), |t| > t(α/2) (two-tailed RR), where t is the observed test statistic and t(α) is based on (n1 + n2 − 2) degrees of freedom, such that P(T > t(α)) = α.

Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for the given α.
Assumption: The samples are independent and come from normal populations with means µ1 and µ2 and unknown but equal variances, that is, σ1² = σ2².

Now we shall consider the case where σ1² and σ2² are unknown and cannot be assumed to be equal. In such cases the following test is often used. For the hypotheses H0: µ1 − µ2 = D0 versus Ha: µ1 − µ2 > D0 (or < D0, or ≠ D0), define the test statistic

Tν = (x̄1 − x̄2 − D0)/√(S1²/n1 + S2²/n2),

where Tν has an approximate t-distribution with ν degrees of freedom, and

ν = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ].

The value of ν will not necessarily be an integer. In that case, we round it down to the nearest integer. This method of hypothesis testing with unequal variances is called the Smith-Satterthwaite procedure.

Example 7.5.2 The intelligence quotients (IQs) of 17 students from one area of a city showed a sample mean of 106 with a sample standard deviation of 10, whereas the IQs of 14 students from another area chosen independently showed a sample mean of 109 with a standard deviation of 7. Is there a significant difference between the IQs of the two groups at α = 0.02? Assume that the population variances are equal.

Example 7.5.2 Solution We test H0: µ1 − µ2 = 0 versus Ha: µ1 − µ2 ≠ 0. Here n1 = 17, x̄1 = 106, and s1 = 10. Also, n2 = 14, x̄2 = 109, and s2 = 7. We have

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) = [(16)(10)² + (13)(7)²]/29 = 77.14.

The test statistic is

T = (X̄1 − X̄2 − D0)/(Sp √(1/n1 + 1/n2)) = (106 − 109)/√(77.14 (1/17 + 1/14)) = −0.946.
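A sketch reproducing Example 7.5.2's pooled computation:

```python
from math import sqrt

def pooled_t(n1, xbar1, s1, n2, xbar2, s2, d0=0.0):
    """Pooled-variance two-sample t statistic (assumes equal population variances).

    Returns (t, sp_sq), where sp_sq is the pooled sample variance.
    """
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (xbar1 - xbar2 - d0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, sp_sq

# IQ example: n1=17, xbar1=106, s1=10; n2=14, xbar2=109, s2=7.
t, sp_sq = pooled_t(17, 106, 10, 14, 109, 7)
# sp_sq ~ 77.14 and t ~ -0.946; |t| < t_{0.01,29} = 2.462, so H0 is not rejected.
```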

Example 7.5.2 Solution (cont.) For α = 0.02, t(0.01, 29) = 2.462. Hence, the rejection region is t < −2.462 or t > 2.462. Because the observed value of the test statistic, T = −0.946, does not fall in the rejection region, there is not enough evidence to conclude that the mean IQs are different for the two groups. Here we assume that the two samples are independent and are taken from normal populations.

Example 7.5.3 Infrequent or suspended menstruation can be a symptom of serious metabolic disorders in women. In a study to compare the effect of jogging and running on the number of menses, two independent subgroups were chosen from a large group of women who were similar in physical activity (aside from running), height, occupation, distribution of ages, and type of birth control method being used. The first group consisted of a random sample of 26 women joggers who jogged slow and easy 5 to 30 miles per week, and the second group consisted of a random sample of 26 women runners who ran more than 30 miles per week combined with long-distance, slow-speed walking. The following summary statistics were obtained (E. Dale, D. H. Gerlach, and A. L. Wilhite, "Menstrual Dysfunction in Distance Runners," Obstet. Gynecol. 54, 47-53, 1979).

Joggers: x̄1 = 10.1, S1 = 2.1. Runners: x̄2 = 9.1, S2 = 2.4.

Example 7.5.3 (cont.) Using α = 0.05, (a) test for differences in mean number of menses for each group assuming equality of population variances, and (b) test for differences in mean number of menses for each group assuming inequality of population variances.

Solution We need to test H0: µ1 − µ2 = 0 versus Ha: µ1 − µ2 ≠ 0. Here n1 = 26, x̄1 = 10.1, s1 = 2.1; n2 = 26, x̄2 = 9.1, s2 = 2.4.

(a) Under the assumption σ1² = σ2², we have

Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) = [(25)(2.1)² + (25)(2.4)²]/50 = 5.085.

Example 7.5.3 Solution (cont.) The test statistic is

T = (X̄1 − X̄2 − D0)/(Sp √(1/n1 + 1/n2)) = (10.1 − 9.1)/√(5.085 (1/26 + 1/26)) = 1.599.

For α = 0.05, t(0.025, 50) = 2.009. Hence, the rejection region is t < −2.009 or t > 2.009. Because T = 1.599 does not fall in the rejection region, we do not reject the null hypothesis. At α = 0.05, there is not enough evidence to conclude that the population mean numbers of menses for the joggers and runners are different.

Example 7.5.3 Solution (cont.) (b) Under the assumption σ1² ≠ σ2², we have

ν = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]
  = ((2.1)²/26 + (2.4)²/26)² / [ ((2.1)²/26)²/25 + ((2.4)²/26)²/25 ] = 49.1.

Hence, we have ν = 49 degrees of freedom. Because this value is large, the rejection region is still approximately t < −2.01 and t > 2.01, and the conclusion is the same as that of part (a). In both parts (a) and (b), we assumed that the samples are independent and came from two normal populations.

Hypothesis test for (p1 − p2) for large samples (ni pi > 5 and ni(1 − pi) > 5, for i = 1, 2)

Assume the binomial distribution is approximated by the normal distribution. To test H0: p1 − p2 = D0 versus Ha: p1 − p2 > D0 (upper-tailed test), p1 − p2 < D0 (lower-tailed test), p1 − p2 ≠ D0 (two-tailed test) at significance level α, the test statistic is

Z = (p̂1 − p̂2 − D0)/√(p̂1 q̂1/n1 + p̂2 q̂2/n2),

where z is the observed value of Z.
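The Smith-Satterthwaite degrees-of-freedom formula applied to the jogger/runner data:

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Smith-Satterthwaite approximate degrees of freedom, rounded down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    v = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return int(v)   # round down to the nearest integer

# Jogger/runner example: s1 = 2.1, s2 = 2.4, n1 = n2 = 26.
df = satterthwaite_df(2.1, 26, 2.4, 26)
# df = 49, close to the pooled test's n1 + n2 - 2 = 50 because the
# two sample variances are similar.
```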

Hypothesis test for (p1 − p2) for large samples (cont.) The rejection region is z > z(α) (upper-tailed RR), z < −z(α) (lower-tailed RR), |z| > z(α/2) (two-tailed RR).

Assumption: The samples are independent, and ni pi > 5 and ni(1 − pi) > 5 for i = 1, 2.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for the given α; more experiments are needed.

Example Because of the impact of the global economy on a high-wage country such as the United States, it is claimed that the domestic content in manufacturing industries fell between 1977 and 1997. A survey of 36 randomly picked U.S. companies gave the proportion of domestic content in total manufacturing in 1977 as 0.37 and in 1997 as 0.36. At the 1% level of significance, test the claim that the domestic content really fell during the period 1977-1997.

Example Solution Let p1 be the domestic content in 1977 and p2 the domestic content in 1997. Given n1 = n2 = 36, p̂1 = 0.37 and p̂2 = 0.36. We need to test H0: p1 − p2 = 0 versus Ha: p1 − p2 > 0. The test statistic is

Z = (p̂1 − p̂2 − D0)/√(p̂1 q̂1/n1 + p̂2 q̂2/n2) = (0.37 − 0.36)/√((0.37)(0.63)/36 + (0.36)(0.64)/36) = 0.088.

Example Solution (cont.) For α = 0.01, z(0.01) = 2.33. Hence, the rejection region is z > 2.33. Because the observed value of the test statistic does not fall in the rejection region, at α = 0.01 there is not enough evidence to conclude that the domestic content in manufacturing industries fell between 1977 and 1997.
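A sketch of the two-proportion z statistic applied to the domestic-content example:

```python
from math import sqrt

def two_prop_z(p1_hat, n1, p2_hat, n2, d0=0.0):
    """Large-sample z statistic for H0: p1 - p2 = d0, with unpooled variance."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d0) / se

# Domestic-content example: p1_hat = 0.37, p2_hat = 0.36, n1 = n2 = 36.
z = two_prop_z(0.37, 36, 0.36, 36)
# z ~ 0.088, far below z_0.01 = 2.33, so H0 is not rejected.
```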

We have already seen in Chapter 4 that

F = (S1²/σ1²)/(S2²/σ2²)

follows the F-distribution with ν1 = n1 − 1 numerator and ν2 = n2 − 1 denominator degrees of freedom. Under the assumption H0: σ1² = σ2², we have F = S1²/S2², which has an F-distribution with (ν1, ν2) degrees of freedom.

Testing for the equality of variances To test H0: σ1² = σ2² versus Ha: σ1² > σ2² (upper-tailed test), σ1² < σ2² (lower-tailed test), σ1² ≠ σ2² (two-tailed test) at significance level α, the test statistic is F = S1²/S2². The rejection region is f > F(α)(ν1, ν2) (upper-tailed RR), f < F(1−α)(ν1, ν2) (lower-tailed RR), f > F(α/2)(ν1, ν2) or f < F(1−α/2)(ν1, ν2) (two-tailed RR).

Testing for the equality of variances (cont.) Here f is the observed test statistic, given by f = s1²/s2².

Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, keep H0, because there is not enough evidence to conclude that Ha is true for the given α; more experiments are needed.
Assumption: (i) The two random samples are independent. (ii) Both populations are normal.
In order to find F(1−α)(ν1, ν2), we use the identity F(1−α)(ν1, ν2) = 1/F(α)(ν2, ν1).

Example Consider two independent random samples X1, …, Xn1 from an N(µ1, σ1²) distribution and Y1, …, Yn2 from an N(µ2, σ2²) distribution. Test H0: σ1² = σ2² versus Ha: σ1² ≠ σ2² for the following basic statistics: n1 = 25, x̄1 = 410, s1² = 95, and n2 = 16, x̄2 = 390, s2² = 300. Use α = 0.20.

Solution We test H0: σ1² = σ2² versus Ha: σ1² ≠ σ2². This is a two-tailed test. Here the degrees of freedom are ν1 = 24 and ν2 = 15. The test statistic is

F = s1²/s2² = 95/300 = 0.317.

Example Solution (cont.) From the F-table, F(0.10)(24, 15) = 1.90 and F(0.90)(24, 15) = 1/F(0.10)(15, 24) = 0.56. Hence, the rejection region is F > 1.90 or F < 0.56. Because the observed value of the test statistic, 0.317, is less than 0.56, we reject the null hypothesis. There is evidence that the population variances are not equal.

Take a break.
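A sketch of the two-tailed F test for the example above; the critical values are F-table entries (assumed here, not computed):

```python
# F test for equality of variances: n1 = 25, s1^2 = 95; n2 = 16, s2^2 = 300.
s1_sq, s2_sq = 95.0, 300.0
f = s1_sq / s2_sq                 # observed F with (24, 15) degrees of freedom

# Two-tailed test at alpha = 0.20: table values (assumed) F_0.10(24,15) = 1.90
# and F_0.90(24,15) = 1/F_0.10(15,24) with F_0.10(15,24) ~ 1.78.
upper = 1.90
lower = 1 / 1.78                  # ~ 0.56
reject = f > upper or f < lower
# f ~ 0.317 < 0.56, so H0 is rejected: the variances appear unequal.
```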

Dependent samples Two samples are dependent when each data point in one sample can be coupled in some natural, nonrandom fashion with a data point in the second sample. The pairing may be the result of the individual observations in the two samples (1) representing before and after a program, (2) sharing the same characteristics, (3) being matched by location, (4) being matched by time, (5) being control and experimental, and so forth.

Dependent samples Let (X1i, X2i), for i = 1, 2, …, n, be a random sample; X1i and X2j (i ≠ j) are independent. To test the significance of the difference between two population means when the samples are dependent, we first calculate for each pair of scores the difference D i = X1i − X2i, i = 1, 2, …, n, between the two scores. Let µD = E(Di). Because the pairs of observations form a random sample, D1, …, Dn are i.i.d. random variables. If d1, …, dn are the observed values of D1, …, Dn, then we define

d̄ = (1/n) Σ dᵢ and s_d² = (1/(n − 1)) Σ (dᵢ − d̄)² = [Σ dᵢ² − (Σ dᵢ)²/n]/(n − 1).

Testing for a matched-pairs experiment To test H0: µD = d0 versus Ha: µD > d0 (upper-tailed test), µD < d0 (lower-tailed test), µD ≠ d0 (two-tailed test), the test statistic is

T = (D̄ − d0)/(S_D/√n),

which approximately follows a Student t-distribution with (n − 1) degrees of freedom. The rejection region is t > t(α, n−1) (upper-tailed RR), t < −t(α, n−1) (lower-tailed RR), |t| > t(α/2, n−1) (two-tailed RR), where t is the observed test statistic.

Testing for a matched-pairs experiment (cont.)
Assumption: The differences are approximately normally distributed.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence. Otherwise, do not reject H0, because there is not enough evidence to conclude that Ha is true for the given α; more data are needed.

38 Testing of hypotheses for two samples Example A new diet and exercise program has been advertised as a remarkable way to reduce blood glucose levels in diabetic patients. Ten randomly selected diabetic patients are put on the program, and their before and after blood glucose readings after 1 month are recorded in a table. Do the data provide sufficient evidence to support the claim that the new program reduces blood glucose level in diabetic patients? Use α = 0.05. Testing of hypotheses for two samples Example Solution We need to test the hypothesis H 0 : µ D = 0 vs. H a : µ D < 0. First we calculate the difference for each pair, Diff. = after − before. From the table, the mean of the differences is d̄ = −71.9 and the standard deviation is s d = 56.2. The test statistic is T = (d̄ − d 0)/(s d/√n) = −71.9/(56.2/√10) ≈ −4.05.

39 Testing of hypotheses for two samples Example Solution (cont.) From the t-table, t 0.05,9 = 1.833. Because the observed value t ≈ −4.05 < −t 0.05,9 = −1.833, we reject the null hypothesis and conclude that the sample evidence suggests that the new diet and exercise program is effective. Testing of hypotheses for two samples Why must we take paired differences and then calculate the mean and standard deviation of the differences? Why can't we just take the means of each sample, as we did for independent samples? The reason: Var(D̄) need not be equal to Var(X̄ 1 − X̄ 2). Assume that E(X ji) = µ j and Var(X ji) = σ j², for j = 1, 2, and Cov(X 1i, X 2i) = ρσ 1 σ 2, where ρ denotes the assumed common correlation coefficient of the pair (X 1i, X 2i) for i = 1, 2,..., n. Because the values D i, i = 1, 2,..., n, are i.i.d., µ D = E(D i) = E(X 1i) − E(X 2i) = µ 1 − µ 2 and σ D² = Var(D i) = Var(X 1i) + Var(X 2i) − 2Cov(X 1i, X 2i) = σ 1² + σ 2² − 2ρσ 1 σ 2.

40 Testing of hypotheses for two samples From these calculations, E(D̄) = µ D = µ 1 − µ 2 and σ² D̄ = Var(D̄) = σ D²/n = (1/n)(σ 1² + σ 2² − 2ρσ 1 σ 2). Now, if the samples were independent with n 1 = n 2 = n, then E(X̄ 1 − X̄ 2) = µ 1 − µ 2 and σ² (X̄ 1 − X̄ 2) = (1/n)(σ 1² + σ 2²). Hence, if ρ > 0, then σ² D̄ < σ² (X̄ 1 − X̄ 2). Chi-Square Tests for Count Data Suppose that we have outcomes of a multinomial experiment that consists of k mutually exclusive and exhaustive events A 1,..., A k. Let P(A i) = p i, i = 1, 2,..., k, with Σ_{i=1}^{k} p i = 1. Let the experiment be repeated n times, and let X i (i = 1, 2,..., k) represent the number of times the event A i occurs; then (X 1,..., X k) have a multinomial distribution with parameters n, p 1,..., p k. Let Q = Σ_{i=1}^{k} (X i − np i)²/(np i). It can be shown that for large n, the random variable Q is approximately χ²-distributed with (k−1) degrees of freedom. It is usual to demand np i ≥ 5 (i = 1, 2,..., k) for the approximation to be valid, although the approximation generally works well if for only a few values of i (~20%), np i ≥ 1 and the rest (~80%) satisfy the condition np i ≥ 5. (Karl Pearson 1900)
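Before moving on, the claim that σ² D̄ < σ² (X̄ 1 − X̄ 2) when ρ > 0 can be checked numerically. A minimal sketch with illustrative (made-up) parameter values:

```python
def var_dbar(n, sigma1, sigma2, rho):
    """Var(D-bar) = (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2)/n, paired design."""
    return (sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2) / n

def var_indep(n, sigma1, sigma2):
    """Var(X1-bar - X2-bar) = (sigma1^2 + sigma2^2)/n, independent samples of size n."""
    return (sigma1 ** 2 + sigma2 ** 2) / n

# Positive within-pair correlation shrinks the variance of the comparison.
print(round(var_dbar(10, 3.0, 4.0, 0.6), 2))  # prints 1.06
print(round(var_indep(10, 3.0, 4.0), 2))      # prints 2.5
```

This is why a paired design can detect a mean difference that an independent-samples design of the same size would miss.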

41 Chi-Square Tests for Count Data Example A plant geneticist grows 200 progeny from a cross that is hypothesized to result in a 3:1 phenotypic ratio of red-flowered to white-flowered plants. Suppose the cross produces 170 red- to 30 white-flowered plants. Calculate the value of Q for this experiment. Chi-Square Tests for Count Data Example Solution Here n = 200 and k = 2. Let i = 1 represent red-flowered and i = 2 represent white-flowered plants. Then X 1 = 170 and X 2 = 30. Here, H 0 : The flower color population ratio is not different from 3:1, and the alternative is H a : The flower color population sampled has a flower color ratio that is not 3 red : 1 white. Under the null hypothesis, the expected frequencies are np 1 = (200)(3/4) = 150 and np 2 = (200)(1/4) = 50. Hence, Q = Σ_{i=1}^{k} (X i − np i)²/(np i) = (170 − 150)²/150 + (30 − 50)²/50 = 2.67 + 8 = 10.67. Often X i is called the observed frequency and np i is called the expected frequency. This example gives a measure of how close our observed frequencies come to the expected frequencies and is referred to as a measure of goodness of fit. Smaller values of Q indicate a better fit.
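The computation in this example takes only a few lines of Python; a sketch, using the counts and the 3:1 ratio from the example above:

```python
# Observed counts and 3:1 expected frequencies from the flower-cross example.
observed = [170, 30]                   # red, white
expected = [200 * 3 / 4, 200 * 1 / 4]  # 150, 50 under H0
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q, 2))  # prints 10.67
```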

42 The Goodness-of-Fit Test Summary Let an experiment have k mutually exclusive and exhaustive outcomes A 1, A 2,..., A k. We would like to test the null hypothesis that all the p i = P(A i), i = 1, 2,..., k, are equal to known numbers p i0, i = 1, 2,..., k. That is, we test H 0 : p 1 = p 10,..., p k = p k0 vs. H a : At least one of the probabilities is different from the hypothesized value. The test is always a one-sided upper tail test. Let O i be the observed frequency, E i = np i0 be the expected frequency (frequency under the null hypothesis), and k be the number of classes. The test statistic is Q = Σ_{i=1}^{k} (O i − E i)²/E i. The test statistic Q has an approximate chi-square distribution with (k−1) degrees of freedom. The rejection region is Q ≥ χ² α,k−1. Assumption: E i ≥ 5; exact methods are available otherwise. Computing the power of this test is difficult. The Goodness-of-Fit Test Summary This test implies that if the observed data are very close to the expected data, we have a very good fit and we accept the null hypothesis. That is, for small Q values, we accept H 0.

43 The Goodness-of-Fit Test Example A die is rolled 60 times and the face values are recorded in a frequency table (up face vs. frequency). Is the die balanced? Test using α = 0.05. The Goodness-of-Fit Test Example Solution If the die is balanced, we must have p 1 = p 2 = ... = p 6 = 1/6, where p i = P(face value on the die is i), i = 1, 2,..., 6. This is the discrete uniform distribution. Hence, we test H 0 : p 1 = p 2 = ... = p 6 = 1/6 vs. H a : At least one of the probabilities is different from the hypothesized value of 1/6. Here E 1 = np 1 = (60)(1/6) = 10,..., E 6 = 10. We summarize the calculation in a table of face value, observed frequency O i, and expected value E i.

44 The Goodness-of-Fit Test Example Solution (cont.) The test statistic is given by Q = Σ_{i=1}^{6} (O i − E i)²/E i = 6. From the chi-square table with 5 d.f., χ² 0.05,5 = 11.07. Because the value of the test statistic does not fall in the rejection region (6 < 11.07), we do not reject H 0. Therefore, we conclude that the die is balanced. Take a break.
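The die test can be sketched as follows. The roll counts below are hypothetical (the slide's frequency table did not survive transcription); with 60 rolls the expected count is 60·(1/6) = 10 per face.

```python
# Hypothetical counts for faces 1..6 in 60 rolls (illustration only).
observed = [8, 11, 5, 12, 15, 9]
expected = [sum(observed) / 6] * 6  # 10 per face under H0
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q, 2))  # prints 6.0
```

The observed Q is then compared with the chi-square critical value with 5 degrees of freedom; for these illustrative counts Q = 6.0 < χ² 0.05,5 = 11.07, so H 0 would not be rejected.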

45 Contingency Table: Test for Independence One of the uses of the χ² statistic is in contingency (dependence) testing, where n randomly selected items are classified according to two different criteria, such as when data are classified on the basis of two factors (a row factor and a column factor), where the row factor has r levels and the column factor has c levels. Our interest is to test for independence of the two methods of classification of observed events. For example, we might classify a sample of students by sex and by their grade in a statistics course in order to test the hypothesis that grade depends on sex. More generally, the problem is to investigate a dependency between two classification criteria. The obtained data are displayed in an r × c table, where n ij represents the number of data values under row i and column j, with row totals n 1.,..., n r., column totals n .1,..., n .c, and grand total N. Contingency Table: Test for Independence Here N = Σ_{j=1}^{c} n .j = Σ_{i=1}^{r} n i. = Σ_{i=1}^{r} Σ_{j=1}^{c} n ij. We wish to test the hypothesis that the two factors are independent.

46 Test for the Independence of Two Factors To test H 0 : The factors are independent vs. H a : The factors are dependent, the test statistic is Q = Σ_{i=1}^{r} Σ_{j=1}^{c} (O ij − E ij)²/E ij, where O ij = n ij and E ij = n i. n .j / N. Under the null hypothesis the test statistic Q has an approximate chi-square distribution with (r−1)(c−1) degrees of freedom. Hence, the rejection region is Q > χ² α,(r−1)(c−1). Assumption: E ij ≥ 5. Test for the Independence of Two Factors Example The following table gives a classification according to religious affiliation (A, B, C, D, None) and marital status (single, with spouse) for 500 randomly selected individuals; the row totals are 116 (single) and 384 (with spouse), and the column totals are 211, 80, 56, 98, and 55. For α = 0.01, test the null hypothesis that marital status and religious affiliation are independent.
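The expected counts E ij = n i. n .j / N and the Q statistic can be sketched as below. This is a minimal illustration; the 2 × 2 table at the end is made up for demonstration.

```python
def expected_counts(table):
    """E_ij = (row total)(column total)/N for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def independence_q(table):
    """Pearson's Q = sum over cells of (n_ij - E_ij)^2 / E_ij."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(table, exp)
               for o, e in zip(row_o, row_e))

# Hypothetical 2x2 table; compare Q with chi-square, (r-1)(c-1) = 1 d.f.
print(independence_q([[30, 20], [20, 30]]))  # prints 4.0
```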

47 Test for the Independence of Two Factors Example Solution We need to test the hypothesis H 0 : Marital status and religious affiliation are independent vs. H a : Marital status and religious affiliation are dependent. Here, c = 5 and r = 2. For α = 0.01 and (c−1)(r−1) = 4 degrees of freedom, we have χ² 0.01,4 = 13.277. Hence, the rejection region is Q > 13.277. We have E ij = n i. n .j / N. Thus, E 11 = (116)(211)/500 = 48.95; E 12 = (116)(80)/500 = 18.56; E 13 = (116)(56)/500 = 12.99; E 14 = (116)(98)/500 = 22.74; E 15 = (116)(55)/500 = 12.76; E 21 = (384)(211)/500 = 162.05; E 22 = (384)(80)/500 = 61.44; E 23 = (384)(56)/500 = 43.01; E 24 = (384)(98)/500 = 75.26; E 25 = (384)(55)/500 = 42.24. Test for the Independence of Two Factors Example Solution (cont.) The value of the test statistic is Q = Σ_{i=1}^{r} Σ_{j=1}^{c} (O ij − E ij)²/E ij. Because the observed value of Q does not fall in the rejection region (Q < 13.277), we do not reject the null hypothesis at α = 0.01. Therefore, based on the observed data, marital status and religious affiliation are independent.
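The expected counts in this solution depend only on the marginal totals, so they can be verified with a quick stdlib check. One assumption here: the first column total is read as 211, which is what makes the five column totals sum to N = 500 and matches E 11 = 48.95.

```python
# Marginal totals from the example: rows = (single, with spouse),
# columns = religious affiliations A, B, C, D, None; N = 500.
rows, cols, n = [116, 384], [211, 80, 56, 98, 55], 500
expected = [[r * c / n for c in cols] for r in rows]
print([round(e, 2) for e in expected[0]])  # prints [48.95, 18.56, 12.99, 22.74, 12.76]
```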

48 Testing to Identify the Probability Distribution In hypothesis testing problems we often assume that the form of the population distribution is known. For example, in a χ²-test for variance, we assume that the population is normal. The goodness-of-fit tests examine the validity of such an assumption if we have a large enough sample. This is another application of the chi-square statistic used for goodness-of-fit tests. Goodness-of-Fit Test Procedures for Probability Distributions Let X 1,..., X n be a sample from a population with cdf F(x), which may depend on a set of unknown parameters θ. We wish to test H 0 : F(x) = F 0 (x), where F 0 (x) is completely specified. 1. Divide the range of values of the random variable X into K nonoverlapping intervals I 1, I 2,..., I K. Let O j be the number of sample values that fall in the interval I j (j = 1, 2,..., K). 2. Assuming the distribution of X to be F 0 (x), find P(X ∈ I j). Let P(X ∈ I j) = π j and let E j = nπ j be the expected frequency. 3. Compute the test statistic Q given by Q = Σ_{j=1}^{K} (O j − E j)²/E j. The test statistic Q has an approximate χ²-distribution with (K−1) degrees of freedom. 4. Reject H 0 if Q > χ² α,K−1. 5. Assumptions: E j ≥ 5, j = 1, 2,..., K.

49 Goodness-of-Fit Test Procedures for Probability Distributions If the null hypothesis does not specify F 0 (x) completely, that is, if F 0 (x) contains some unknown parameters θ 1, θ 2,..., θ p, we estimate these parameters by the method of maximum likelihood. Using these estimated values we specify F 0 (x) completely; denote the estimated cdf by F̂ 0 (x). Let π̂ i = P{X ∈ I i | F̂ 0 (x)} and Ê i = n π̂ i. The test statistic is Q = Σ_{i=1}^{K} (O i − Ê i)²/Ê i. The statistic Q has an approximate chi-square distribution with (K−1−p) degrees of freedom. We reject H 0 if Q > χ² α,K−1−p. Goodness-of-Fit Test Procedures for Probability Distributions Example The grades of students in a class of 200 are given in a frequency table (grade range vs. number of students). Test the hypothesis that the grades are normally distributed with a mean of 75 and a standard deviation of 8. Use α = 0.05.

50 Goodness-of-Fit Test Procedures for Probability Distributions Example Solution We have O 1 = 12, O 2 = 36, O 3 = 90, O 4 = 44, O 5 = 18. We now compute π i (i = 1, 2,..., 5) using the continuity correction factor: π 1 = P{X ≤ 59.5 | H 0} = P{Z ≤ (59.5 − 75)/8} = 0.026, π 2 = 0.2189, π 3 = 0.4722, π 4 = 0.2476, π 5 = 0.0351, and E 1 = 5.24, E 2 = 43.78, E 3 = 94.44, E 4 = 49.52, E 5 = 7.02. The test statistic results in Q = Σ_{i=1}^{5} (O i − E i)²/E i ≈ 28.1. Q has a chi-square distribution with (5−1) = 4 degrees of freedom. The critical value is χ² 0.05,4 = 9.488, hence the rejection region is Q > 9.488. Because the observed value Q ≈ 28.1 > 9.488, we reject H 0 at α = 0.05. Thus, we conclude that the population is not normal.
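The whole calculation can be reproduced with the standard library. Two caveats: the upper class boundaries 69.5, 79.5, and 89.5 are inferred from the printed interval probabilities (only 59.5 appears explicitly), and the exact normal cdf via math.erf gives slightly different expected counts than z-table rounding, so the resulting Q is near 28 rather than matching the table-based value digit for digit.

```python
import math

def norm_cdf(x, mu=75.0, sigma=8.0):
    """Normal cdf, computed with the standard library's error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Observed class counts and continuity-corrected cutpoints (boundaries
# 69.5/79.5/89.5 are inferred, as noted above).
observed = [12, 36, 90, 44, 18]
cuts = [-math.inf, 59.5, 69.5, 79.5, 89.5, math.inf]
n = sum(observed)
probs = [norm_cdf(cuts[j + 1]) - norm_cdf(cuts[j]) for j in range(5)]
expected = [n * p for p in probs]
q = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(q, 1))  # well above the chi-square critical value with 4 d.f.
```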


Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

Chapter 4. Probability and Probability Distributions

Chapter 4. Probability and Probability Distributions Chapter 4. robability and robability Distributions Importance of Knowing robability To know whether a sample is not identical to the population from which it was selected, it is necessary to assess the

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

6.4 Normal Distribution

6.4 Normal Distribution Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

More information

Non-Inferiority Tests for Two Means using Differences

Non-Inferiority Tests for Two Means using Differences Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Exact Confidence Intervals

Exact Confidence Intervals Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter

More information

Chapter 7 Review. Confidence Intervals. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 7 Review. Confidence Intervals. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Chapter 7 Review Confidence Intervals MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) Suppose that you wish to obtain a confidence interval for

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

CHI-SQUARE: TESTING FOR GOODNESS OF FIT

CHI-SQUARE: TESTING FOR GOODNESS OF FIT CHI-SQUARE: TESTING FOR GOODNESS OF FIT In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity

More information

Review #2. Statistics

Review #2. Statistics Review #2 Statistics Find the mean of the given probability distribution. 1) x P(x) 0 0.19 1 0.37 2 0.16 3 0.26 4 0.02 A) 1.64 B) 1.45 C) 1.55 D) 1.74 2) The number of golf balls ordered by customers of

More information

Section 12 Part 2. Chi-square test

Section 12 Part 2. Chi-square test Section 12 Part 2 Chi-square test McNemar s Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of

More information

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i )

For a partition B 1,..., B n, where B i B j = for i. A = (A B 1 ) (A B 2 ),..., (A B n ) and thus. P (A) = P (A B i ) = P (A B i )P (B i ) Probability Review 15.075 Cynthia Rudin A probability space, defined by Kolmogorov (1903-1987) consists of: A set of outcomes S, e.g., for the roll of a die, S = {1, 2, 3, 4, 5, 6}, 1 1 2 1 6 for the roll

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Two-sample hypothesis testing, II 9.07 3/16/2004

Two-sample hypothesis testing, II 9.07 3/16/2004 Two-sample hypothesis testing, II 9.07 3/16/004 Small sample tests for the difference between two independent means For two-sample tests of the difference in mean, things get a little confusing, here,

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck!

Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Psychology 60 Fall 2013 Practice Exam Actual Exam: Next Monday. Good luck! Name: 1. The basic idea behind hypothesis testing: A. is important only if you want to compare two populations. B. depends on

More information

Math 461 Fall 2006 Test 2 Solutions

Math 461 Fall 2006 Test 2 Solutions Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two

More information

Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

More information

Name: Date: Use the following to answer questions 3-4:

Name: Date: Use the following to answer questions 3-4: Name: Date: 1. Determine whether each of the following statements is true or false. A) The margin of error for a 95% confidence interval for the mean increases as the sample size increases. B) The margin

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Hypothesis Testing --- One Mean

Hypothesis Testing --- One Mean Hypothesis Testing --- One Mean A hypothesis is simply a statement that something is true. Typically, there are two hypotheses in a hypothesis test: the null, and the alternative. Null Hypothesis The hypothesis

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

More information

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013

Statistics I for QBIC. Contents and Objectives. Chapters 1 7. Revised: August 2013 Statistics I for QBIC Text Book: Biostatistics, 10 th edition, by Daniel & Cross Contents and Objectives Chapters 1 7 Revised: August 2013 Chapter 1: Nature of Statistics (sections 1.1-1.6) Objectives

More information

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles...

1.1 Introduction, and Review of Probability Theory... 3. 1.1.1 Random Variable, Range, Types of Random Variables... 3. 1.1.2 CDF, PDF, Quantiles... MATH4427 Notebook 1 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 1 MATH4427 Notebook 1 3 1.1 Introduction, and Review of Probability

More information

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8. Random variables Remark on Notations 1. When X is a number chosen uniformly from a data set, What I call P(X = k) is called Freq[k, X] in the courseware. 2. When X is a random variable, what I call F ()

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%

In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4% Hypothesis Testing for a Proportion Example: We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home In the

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information