Hypothesis testing. Null hypothesis H 0 and alternative hypothesis H 1.

Transcription

1 Hypothesis testing Null hypothesis H 0 and alternative hypothesis H 1.

2 Hypothesis testing Null hypothesis H 0 and alternative hypothesis H 1. Simple and compound hypotheses. Simple : the probabilistic model is specified completely. Compound : the probabilistic model is not specified completely (generally it will contain parameters to be estimated).

3 Hypothesis testing Null hypothesis H 0 and alternative hypothesis H 1. Simple and compound hypotheses. Simple : the probabilistic model is specified completely. Compound : the probabilistic model is not specified completely (generally it will contain parameters to be estimated). Example 1: we want to test whether data are compatible with the assumption that their true mean is µ 0. Then it could be set as: H 0 : X 1,..., X n N(µ 0, σ0 2 ) and independent. [simple if σ 2 known]; H 1 : X 1,..., X n N(µ, σ 2 0 ) and independent, with µ µ 0. [compound]

4 Rejection region Example 2: we have two groups, and we wish to test whether they can be considered as samples from the same population, or from two populations with different means. General assumption: X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0. Both are compound, but H 0 is simpler than H 1. How does a test work? We select a rejection region C: if data fall in C, we reject H 0 (and accept H 1 ); if data do not fall in C, we accept (do not reject) H 0.

5 Errors of first and second species Error of first species: rejecting H 0 if H 0 is true; Error of second species: accepting H 0 if H 1 is true. A smaller rejection region C decreases error of first species, but increases those of second species; a larger C vice versa.

6 Errors of first and second species Error of first species: rejecting H 0 if H 0 is true; Error of second species: accepting H 0 if H 1 is true. A smaller rejection region C decreases error of first species, but increases those of second species; a larger C vice versa. Solution? We choose a level (the risk we take of errors of 1st species) α [often 5%] and select a region s.t. P(X C H 0 ) = α (probability of error of 1st species).

7 Errors of first and second species Error of first species: rejecting H 0 if H 0 is true; Error of second species: accepting H 0 if H 1 is true. A smaller rejection region C decreases error of first species, but increases those of second species; a larger C vice versa. Solution? We choose a level (the risk we take of errors of 1st species) α [often 5%] and select a region s.t. P(X C H 0 ) = α (probability of error of 1st species). A test will have a certain power (the probability of rejecting H 0 when indeed H 1 is true).

8 Errors of first and second species Error of first species: rejecting H 0 if H 0 is true; Error of second species: accepting H 0 if H 1 is true. A smaller rejection region C decreases error of first species, but increases those of second species; a larger C vice versa. Solution? We choose a level (the risk we take of errors of 1st species) α [often 5%] and select a region s.t. P(X C H 0 ) = α (probability of error of 1st species). A test will have a certain power (the probability of rejecting H 0 when indeed H 1 is true). Generally, we are not able to compute exactly the power of a test (it depends on exact parameter value), but theory selects the most powerful test within a certain level.

9 One-sample test on the mean Example 1: (with σ 2 unknown). H 0 : X 1,..., X n N(µ 0, σ 2 ) and independent, where σ 2 > 0. H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ µ 0, σ 2 > 0.

10 One-sample test on the mean Example 1: (with σ 2 unknown). H 0 : X 1,..., X n N(µ 0, σ 2 ) and independent, where σ 2 > 0. H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ µ 0, σ 2 > 0. The test quantity used is T = X µ 0 S/ n where S 2 is the sample variance. It is natural (and can be justified rigorously) to reject H 0 when T is far away from 0.

11 One-sample test on the mean Example 1: (with σ 2 unknown). H 0 : X 1,..., X n N(µ 0, σ 2 ) and independent, where σ 2 > 0. H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ µ 0, σ 2 > 0. The test quantity used is T = X µ 0 S/ n where S 2 is the sample variance. It is natural (and can be justified rigorously) to reject H 0 when T is far away from 0. How far is far?

12 One-sample test on the mean Example 1: (with σ 2 unknown). H 0 : X 1,..., X n N(µ 0, σ 2 ) and independent, where σ 2 > 0. H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ µ 0, σ 2 > 0. The test quantity used is T = X µ 0 S/ n where S 2 is the sample variance. It is natural (and can be justified rigorously) to reject H 0 when T is far away from 0. How far is far? Under H 0, T follows a t(n 1) distribution. Then we find t α s.t. P( t(n 1) > t α ) = α. Reject H 0 if T > t α, accept it otherwise.

13 Computations It is expected that mean body temperature is 37.0 C. Temperatures were obtained rom 25 individuals: Are these compatible with µ = 37.0?

14 Computations It is expected that mean body temperature is 37.0 C. Temperatures were obtained rom 25 individuals: Are these compatible with µ = 37.0? X = , S = 0.374, T =

15 Computations It is expected that mean body temperature is 37.0 C. Temperatures were obtained rom 25 individuals: Are these compatible with µ = 37.0? p-value = X = , S = 0.374, T =

16 Computations It is expected that mean body temperature is 37.0 C. Temperatures were obtained rom 25 individuals: Are these compatible with µ = 37.0? X = , S = 0.374, T = p-value = (this means that there is about 56% chance of getting a higher (in absolute value) value of T if H 0 is true). [Many statistical programs (e.g. R) return the p-value; the user will know how it compares with α she/he has chosen] Here we accept µ = 37.0 at any reasonable level of α.

17 One-sample test on the mean. II If [unilateral alternative] H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ > µ 0, σ 2 > 0, then the rejection region is for T positive and large. Hence we find t α s.t. P(t(n 1) > t α) = α. Reject H 0 if T > t α, accept it otherwise. Vice versa if the alternative hypothesis is µ < µ 0.

18 One-sample test on the mean. II If [unilateral alternative] H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ > µ 0, σ 2 > 0, then the rejection region is for T positive and large. Hence we find t α s.t. P(t(n 1) > t α) = α. Reject H 0 if T > t α, accept it otherwise. Vice versa if the alternative hypothesis is µ < µ 0. In the example, if H 1 had been µ < 37, p-value = P(t(24) < ) = 0.281

19 One-sample test on the mean. II If [unilateral alternative] H 1 : X 1,..., X n N(µ, σ 2 ) and independent, where µ > µ 0, σ 2 > 0, then the rejection region is for T positive and large. Hence we find t α s.t. P(t(n 1) > t α) = α. Reject H 0 if T > t α, accept it otherwise. Vice versa if the alternative hypothesis is µ < µ 0. In the example, if H 1 had been µ < 37, p-value = P(t(24) < ) = Observations: unilateral alternatives make it easier rejecting the null hypothesis (hence they are seldom used). In practice, border-line results suggest further research.

20 Test on equality of the means Independent samples (e.g. 2 groups with different treatments) X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0.

21 Test on equality of the means Independent samples (e.g. 2 groups with different treatments) X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0. Under H 0 Y i X j N(0, 2σ 2 ) for all i = 1... m, j = 1... n. Hence we test whether the true mean of all these variables is 0.

22 Test on equality of the means Independent samples (e.g. 2 groups with different treatments) X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0. Under H 0 Y i X j N(0, 2σ 2 ) for all i = 1... m, j = 1... n. Hence we test whether the true mean of all these variables is 0. Estimate of σ 2 : SX 2,Y = 1 ( (n 1)S 2 n + m 2 X + (m 1)SY 2 ) weighted mean of S 2 X (variance estimated on X i) and S 2 Y.

23 Test on equality of the means Independent samples (e.g. 2 groups with different treatments) X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0. Under H 0 Y i X j N(0, 2σ 2 ) for all i = 1... m, j = 1... n. Hence we test whether the true mean of all these variables is 0. Estimate of σ 2 : SX 2,Y = 1 ( (n 1)S 2 n + m 2 X + (m 1)SY 2 ) weighted mean of SX 2 (variance estimated on X i) and SY 2. Ȳ X Finally, t-test on T =. s X,Y 1 n + 1 m

24 Test on equality of the means Independent samples (e.g. 2 groups with different treatments) X 1,..., X n N(µ X, σ 2 ), Y 1,..., Y m N(µ Y, σ 2 ), and independent. H 0 : µ X = µ Y, σ 2 > 0. H 1 : µ X µ Y, σ 2 > 0. Under H 0 Y i X j N(0, 2σ 2 ) for all i = 1... m, j = 1... n. Hence we test whether the true mean of all these variables is 0. Estimate of σ 2 : SX 2,Y = 1 ( (n 1)S 2 n + m 2 X + (m 1)SY 2 ) weighted mean of SX 2 (variance estimated on X i) and SY 2. Ȳ X Finally, t-test on T =. s X,Y 1 n + 1 m Assumptions: normality, independence, equality of variances (should be checked [sometimes variable transformations help]).

25 Test of equality of means with independent samples: computations Annual flow of the river Nile at Ashwan has been measured We ask whether the mean of the former 50 years is the same as the mean of the latter 50 years.

26 Test of equality of means with independent samples: computations Annual flow of the river Nile at Ashwan has been measured We ask whether the mean of the former 50 years is the same as the mean of the latter 50 years. Treat each year as an independent sample, though assumption questionable

27 Test of equality of means with independent samples: computations Annual flow of the river Nile at Ashwan has been measured We ask whether the mean of the former 50 years is the same as the mean of the latter 50 years. X (mean of years 1 50) = , Ȳ (mean of years ) = SX 2 = 37, 140, S Y 2 = 12, 106. S X 2,Y = 24, 623.

28 Test of equality of means with independent samples: computations Annual flow of the river Nile at Ashwan has been measured We ask whether the mean of the former 50 years is the same as the mean of the latter 50 years. X (mean of years 1 50) = , Ȳ (mean of years ) = SX 2 = 37, 140, S Y 2 = 12, 106. S X 2,Y = 24, 623. Hence T = , p-value = Reject µ X = µ Y.

29 Test of equality of means with independent samples: computations Annual flow of the river Nile at Ashwan has been measured We ask whether the mean of the former 50 years is the same as the mean of the latter 50 years. X (mean of years 1 50) = , Ȳ (mean of years ) = S 2 X = 37, 140, S 2 Y = 12, 106. S 2 X,Y = 24, 623. Hence T = , p-value = Reject µ X = µ Y. One may question equality of variances... A correction of the t-test works without equal variances: t = , df = , p value =

30 Test on equality of the means. II Paired samples (e.g. same individuals before/after treatment) General assumption Y i X i N(µ, σ 2 ), i = 1... n. No assumption on X i and Y i, but only on their differences (the effect of treatment). H 0 : µ = 0, σ 2 > 0. H 1 : µ 0, σ 2 > 0. This is simply a test that the true mean of W = Y X is 0. It will be easier rejecting H 0 because generally s Y X is much smaller than 2s X,Y. Basic assumption: the effect of treatment is additive (does not depend on the original value of X i ).

31 Test on binomial Observations: n trials with k successes. Are these compatible with probability of success p 0? H 0 : X Bin(n, p 0 ) H 1 : X Bin(n, p) with p p 0. Simplest method: under H 0 by central limit theorem Z = X np 0 np0 (1 p 0 ) N(0, 1) for n large [rule np 0, n(1 p 0 ) > 5]. Set α find z α s.t. P( N(0, 1) > z α ) = α. If Z > z α, reject H 0 ; otherwise accept it. Exact method: make computations with binomial distribution. Note. All previous cases could be studied through confidence intervals. Some authors prefer using only confidence intervals.

32 Chi-square test General chi-square test: We have k types of events that can occur in each trial, with a priori probabilities for them p 0 1,..., p 0 k with p p 0 k = 1. After n trials, we observe n 1 events of type 1, n 2 of 2,..., n k of k, with n n k = n. Are data compatible with expectations? (k = 2 is binomial) Classical test: chi-square: Set E i = np 0 i (expected number of events of type i under H 0 ), X 2 = k (n i E i ) 2 i=1 E i χ 2 (k 1) for n large. Find c α s.t. P(χ 2 (k 1) > c α ) = α. If X 2 > c α, reject H 0 ; accept it otherwise.

33 Chi-square for data fit to a distribution The values p1 0,..., p0 k can be those arising from some distribution. Often the distribution will contain parameters to be estimated (e.g. λ of Poisson). One can use the chi-square: if m parameters are estimated, X 2 χ 2 (k m 1) (of course m < k 1). Example: data (Von Bartkiewicz, 1898) on Prussians soldiers kicked to death by horses: i (deaths) n i (number of corps/years) Total 200

34 Chi-square example (continued) Estimate λ with the sample mean ˆλ = ( )/200 = Compute E i. Join the classes 3 (rule of thumb: E i 5) to obtain: i n i Ê i Total 200 Compute X lower than relevant quantiles of χ 2 (2).

35 Chi-square test of independence A classical use of chi-square is when we observe two qualitative variables. H 0 : variables are independent; H 1 : they are not independent. k levels of X and l levels of Y (if k = l = 2, a 2 2 contingency table). Data are n ij (# of observations with X = i (i = 1... k) and Y = j (j = 1... l). H 0 : P(X = i, Y = j) = p i q j for all i and j p i, i = 1... k 1, q j, j = 1... l 1 to be estimated from data. H 1 : P(X = i, Y = j) p i q j. Examples and computations to be briefly presented in next lecture.