6. Duality between confidence intervals and statistical tests


1 6. Duality between confidence intervals and statistical tests
Suppose we carry out the following test at a significance level of $100\alpha\%$: $H_0: \mu = \mu_0$ against $H_A: \mu \neq \mu_0$. Then we reject $H_0$ if and only if $\mu_0$ does not belong to the $100(1-\alpha)\%$ confidence interval for $\mu$.

2 Intuition of the duality between tests and confidence intervals
Intuition: if $\mu_0$ belongs to the appropriate confidence interval, then it is a credible value for the population mean and hence we do not reject $H_0$. Significance level + confidence level = 100%.
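
The equivalence can also be checked algebraically. The following is a sketch for the large-sample case only, assuming the two-sided test does not reject when the standardised distance between $\bar{X}$ and $\mu_0$ is at most the critical value $t_{\infty,\alpha/2}$ (here $\bar{X}$, $s$ and $n$ denote the sample mean, sample standard deviation and sample size, as in the examples that follow):

```latex
\begin{align*}
\text{do not reject } H_0
  &\iff \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} \le t_{\infty,\alpha/2} \\
  &\iff |\bar{X} - \mu_0| \le t_{\infty,\alpha/2}\,\frac{s}{\sqrt{n}} \\
  &\iff \bar{X} - t_{\infty,\alpha/2}\,\frac{s}{\sqrt{n}}
        \le \mu_0 \le
        \bar{X} + t_{\infty,\alpha/2}\,\frac{s}{\sqrt{n}}.
\end{align*}
```

The last line says exactly that $\mu_0$ lies in the $100(1-\alpha)\%$ confidence interval for $\mu$.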

3 Example 6.1 (see Example 5.2)
A sample of 100 Irish people was measured. Their mean height was 168cm and the sample standard deviation was 12cm. Calculate a 95% confidence interval for the mean height of all Irish people. On the basis of this confidence interval, should we reject the hypothesis that the mean height of Irish people is 170cm? What is the significance level of such a test?

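4 Solution to Example 6.1
The sample is large and $100(1-\alpha) = 95 \Rightarrow \alpha = 0.05$. The formula for the 95% confidence interval is $\bar{X} \pm \frac{t_{\infty,\alpha/2}\, s}{\sqrt{n}} = 168 \pm \frac{1.96 \times 12}{10} = 168 \pm 2.35 = [165.65, 170.35]$. Since $\mu_0 = 170$ belongs to the 95% confidence interval for $\mu$, we do not reject $H_0$. The significance level of this test is 5%.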
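
The calculation in Example 6.1 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names `mean_ci_large` and `reject_by_duality` are illustrative and not part of the course material, and the normal quantile stands in for $t_{\infty,\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_large(xbar, s, n, conf=0.95):
    """Large-sample confidence interval for a population mean."""
    alpha = 1 - conf
    # Normal quantile, i.e. t with infinitely many degrees of freedom
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


def reject_by_duality(mu0, interval):
    """Reject H0: mu = mu0 exactly when mu0 lies outside the interval."""
    lower, upper = interval
    return not (lower <= mu0 <= upper)


# Summary statistics from Example 6.1
ci = mean_ci_large(xbar=168, s=12, n=100, conf=0.95)
reject = reject_by_duality(170, ci)
print(ci)      # roughly (165.65, 170.35)
print(reject)  # False: 170 lies inside the interval
```

Changing `conf` to 0.99 gives the interval that corresponds to a test at the 1% level.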

5 Applications of the concept of duality
The duality between confidence intervals and tests can easily be extended to tests for the difference of two means for 2 dependent samples. Let $D$ denote the difference between a pair of observations. We reject the null hypothesis $H_0: \mu_D = 0$ at a significance level of $100\alpha\%$ if and only if 0 does not belong to the $100(1-\alpha)\%$ confidence interval for the mean difference.

6 Example 6.2 (see Example 5.4)
A drug for reducing blood pressure is tested on a group of 7 patients. The maximum resting blood pressure of these patients was measured before and after treatment.
[Table: Patient | Blood pressure before | Blood pressure after. The values for the 7 patients are missing from this transcription.]
Calculate 95% and 99% confidence intervals for the mean fall in blood pressure. Can it be said at a significance level of a) 5%, b) 1% that the drug changes the mean blood pressure?

7 Solution of Example 6.2
It should be noted that the hypothesis in the question states that $\mu_D \neq 0$. This cannot be the null hypothesis. The null hypothesis is the complement of this hypothesis, i.e. $\mu_D = 0$. First we calculate the fall in blood pressure for each patient.
[Table: Patient | Before ($X$) | After ($Y$) | Difference ($X - Y$). The values are missing from this transcription.]

8 Solution of Example 6.2
Now we calculate the confidence intervals for the mean difference. The sample of differences is small, hence the appropriate formulae are as follows. 95% confidence interval: $\bar{D} \pm \frac{s_D\, t_{n-1,0.025}}{\sqrt{n}}$. 99% confidence interval: $\bar{D} \pm \frac{s_D\, t_{n-1,0.005}}{\sqrt{n}}$.

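9 Solution of Example 6.2
We calculate the mean and the standard deviation of the differences: $\bar{D} = \frac{1}{7}\sum_{i=1}^{7} d_i = 10$, $s_D^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{D})^2}{n-1} = \frac{(10-10)^2 + (10-10)^2 + \dots + (20-10)^2}{6} = 100$, so $s_D = \sqrt{100} = 10$.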

10 Solution of Example 6.2
We now calculate the 95% confidence interval: $\bar{D} \pm \frac{s_D\, t_{n-1,0.025}}{\sqrt{n}} = 10 \pm \frac{10\, t_{6,0.025}}{\sqrt{7}} = 10 \pm \frac{10 \times 2.447}{\sqrt{7}} = 10 \pm 9.25 = [0.75, 19.25]$. Since 0 does not belong to the 95% confidence interval for $\mu_D$, we reject $H_0$ at a significance level of 5%.

11 Solution of Example 6.2
We now calculate the 99% confidence interval: $\bar{D} \pm \frac{s_D\, t_{n-1,0.005}}{\sqrt{n}} = 10 \pm \frac{10\, t_{6,0.005}}{\sqrt{7}} = 10 \pm \frac{10 \times 3.707}{\sqrt{7}} = 10 \pm 14.01 = [-4.01, 24.01]$. Since 0 belongs to the 99% confidence interval for $\mu_D$, we do not reject $H_0$ at a significance level of 1%. Hence, we have evidence that the drug affects blood pressure, although this evidence is not very strong.
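
Both intervals of Example 6.2 can be checked from the summary statistics $\bar{D} = 10$, $s_D = 10$, $n = 7$. The sketch below uses only the Python standard library, which has no t-distribution, so the critical values $t_{6,0.025} = 2.447$ and $t_{6,0.005} = 3.707$ are hard-coded from a standard t table; the names `T_CRIT_DF6` and `paired_ci` are illustrative assumptions.

```python
from math import sqrt

# Two-sided critical values t_{6, alpha/2}, copied from a standard t table.
# They are valid only for n - 1 = 6 degrees of freedom.
T_CRIT_DF6 = {0.05: 2.447, 0.01: 3.707}


def paired_ci(dbar, s_d, n, alpha):
    """Confidence interval for the mean difference of paired observations."""
    t_crit = T_CRIT_DF6[alpha]
    margin = t_crit * s_d / sqrt(n)
    return dbar - margin, dbar + margin


intervals = {}
decisions = {}
for alpha in (0.05, 0.01):
    lower, upper = paired_ci(dbar=10, s_d=10, n=7, alpha=alpha)
    intervals[alpha] = (lower, upper)
    # Duality: reject H0: mu_D = 0 when 0 lies outside the interval
    decisions[alpha] = not (lower <= 0 <= upper)
    print(alpha, round(lower, 2), round(upper, 2), decisions[alpha])
```

The loop makes the trade-off visible: the wider 99% interval contains 0, so the same data that lead to rejection at the 5% level do not lead to rejection at the 1% level.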

12 6.1 Confidence intervals for the difference between population means based on 2 independent samples
6.1.1 For 2 large samples
When both samples are large, the $100(1-\alpha)\%$ confidence interval for the difference between 2 population means is $\bar{X} - \bar{Y} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\bar{X} - \bar{Y})$.

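13 Confidence intervals for the difference between population means based on 2 independent samples
In particular, the 95% confidence interval is $\bar{X} - \bar{Y} \pm t_{\infty,0.025}\,\mathrm{S.E.}(\bar{X} - \bar{Y}) = \bar{X} - \bar{Y} \pm 1.96\,\mathrm{S.E.}(\bar{X} - \bar{Y})$ and the 99% confidence interval is $\bar{X} - \bar{Y} \pm t_{\infty,0.005}\,\mathrm{S.E.}(\bar{X} - \bar{Y}) = \bar{X} - \bar{Y} \pm 2.576\,\mathrm{S.E.}(\bar{X} - \bar{Y})$, where $\bar{X}$ and $\bar{Y}$ are the sample means and $\mathrm{S.E.}(\bar{X} - \bar{Y})$ is the standard error of using $\bar{X} - \bar{Y}$ to estimate $\mu_X - \mu_Y$.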

14 Standard error of the difference between the sample means
When both samples are large, the approximation for the standard error of the difference between the sample means is given by $\mathrm{S.E.}(\bar{X} - \bar{Y}) \approx \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}$, where $m$ and $n$ are the sample sizes.

15 Example 6.3 (see Example 5.5)
The average height of 100 Dutch men is 176cm and their standard deviation is 12cm. The average height of 50 Japanese men is 169cm and their standard deviation is 10cm. Calculate a 95% confidence interval for the mean difference in height between Dutch and Japanese men. Can it be stated at a significance level of 5% that the mean heights of these nationalities differ?

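16 Solution to Example 6.3
First we estimate the standard error of the difference between the sample means: $\mathrm{S.E.}(\bar{X} - \bar{Y}) \approx \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}} = \sqrt{\frac{12^2}{100} + \frac{10^2}{50}} = \sqrt{3.44} \approx 1.855$.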

17 Solution to Example 6.3
The 95% confidence interval for the difference between population means is $\bar{X} - \bar{Y} \pm t_{\infty,0.025}\,\mathrm{S.E.}(\bar{X} - \bar{Y}) = (176 - 169) \pm 1.96 \times 1.855 = 7 \pm 3.64 = [3.36, 10.64]$.

18 Solution to Example 6.3
We have $H_0: \mu_X - \mu_Y = 0$ against $H_A: \mu_X - \mu_Y \neq 0$. Since 0 does not belong to this confidence interval, we reject $H_0$ at a significance level of 5%. We have evidence that the mean heights differ.
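
The large-sample interval for the height comparison (100 Dutch men: mean 176cm, standard deviation 12cm; 50 Japanese men: mean 169cm, standard deviation 10cm) can be reproduced as follows. This is a minimal standard-library sketch; the function name `diff_means_ci_large` is an illustrative assumption, and the normal quantile is used for $t_{\infty,\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_large(xbar, s_x, m, ybar, s_y, n, conf=0.95):
    """Large-sample CI for mu_X - mu_Y from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Standard error of the difference between the sample means
    se = sqrt(s_x**2 / m + s_y**2 / n)
    diff = xbar - ybar
    return diff - z * se, diff + z * se


lower, upper = diff_means_ci_large(xbar=176, s_x=12, m=100,
                                   ybar=169, s_y=10, n=50)
reject_h0 = not (lower <= 0 <= upper)
print(round(lower, 2), round(upper, 2), reject_h0)
```

Because the whole interval lies above 0, the duality argument rejects $H_0: \mu_X - \mu_Y = 0$ at the 5% level.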

19 6.1.2 For small samples
If at least one of the samples is small, then we use the following approximation for $\mathrm{S.E.}(\bar{X} - \bar{Y})$: $\mathrm{S.E.}(\bar{X} - \bar{Y}) \approx \sqrt{s_p^2\left(\frac{1}{m} + \frac{1}{n}\right)}$, where $s_p^2$ is the pooled variance $s_p^2 = \frac{(m-1)s_X^2 + (n-1)s_Y^2}{m+n-2}$. This formula assumes that the population variances are equal. The formula for the confidence interval assumes that if a sample is small, then its observations come from a normal distribution.

20 Formulae for confidence intervals for the difference between two population means (small samples)
The $100(1-\alpha)\%$ confidence interval for the difference between 2 population means is $(\bar{X} - \bar{Y}) \pm t_{m+n-2,\alpha/2}\,\mathrm{S.E.}(\bar{X} - \bar{Y})$ (if $m + n - 2 > 30$, we may use the approximation $t_{m+n-2,\alpha/2} \approx t_{\infty,\alpha/2}$). In particular, the 95% confidence interval is $(\bar{X} - \bar{Y}) \pm t_{m+n-2,0.025}\,\mathrm{S.E.}(\bar{X} - \bar{Y})$ and the 99% confidence interval is $(\bar{X} - \bar{Y}) \pm t_{m+n-2,0.005}\,\mathrm{S.E.}(\bar{X} - \bar{Y})$.
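
The pooled variance and the resulting standard error are easy to get wrong by hand, so a small sketch may help. The function names and the input values below are hypothetical, chosen only to exercise the formulae; they are not data from the course.

```python
from math import sqrt


def pooled_variance(s_x, m, s_y, n):
    """Pooled variance s_p^2, assuming equal population variances."""
    return ((m - 1) * s_x**2 + (n - 1) * s_y**2) / (m + n - 2)


def pooled_standard_error(s_x, m, s_y, n):
    """Standard error of Xbar - Ybar based on the pooled variance."""
    return sqrt(pooled_variance(s_x, m, s_y, n) * (1 / m + 1 / n))


# Hypothetical summary statistics (illustration only)
sp2 = pooled_variance(s_x=3.0, m=4, s_y=1.0, n=8)
se = pooled_standard_error(s_x=3.0, m=4, s_y=1.0, n=8)
degrees_of_freedom = 4 + 8 - 2
print(sp2, se, degrees_of_freedom)
```

The pooled variance is a weighted average of the two sample variances with weights $m-1$ and $n-1$, so the larger sample has more influence. The interval is then $(\bar{X} - \bar{Y})$ plus or minus the t critical value with $m+n-2$ degrees of freedom times `se`.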

21 Example 6.3 (see Example 5.6)
The table below gives statistics regarding the weights of Americans and Japanese.
[Table: rows size, mean, std. dev.; columns Americans, Japanese. The values are missing from this transcription.]
Calculate a 99% confidence interval for the mean difference in weight between Americans and Japanese. Can it be stated at a significance level of 1% that these mean weights differ?

22 Solution to Example 6.3
First we calculate the pooled variance $s_p^2 = \frac{(m-1)s_X^2 + (n-1)s_Y^2}{m+n-2}$. Now we calculate the standard error $\mathrm{S.E.}(\bar{X} - \bar{Y}) \approx \sqrt{s_p^2\left(\frac{1}{m} + \frac{1}{n}\right)}$.

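23 Solution to Example 6.3
Now we calculate the 99% confidence interval: $(\bar{X} - \bar{Y}) \pm t_{m+n-2,0.005}\,\mathrm{S.E.}(\bar{X} - \bar{Y}) = (86 - 72) \pm t_{23,0.005}\,\mathrm{S.E.}(\bar{X} - \bar{Y}) = 14 \pm 2.807\,\mathrm{S.E.}(\bar{X} - \bar{Y}) \approx 14 \pm 12.9 = [1.1, 26.9]$.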

24 Solution to Example 6.3
We have $H_0: \mu_X - \mu_Y = 0$ against $H_A: \mu_X - \mu_Y \neq 0$, where $\mu_X$ and $\mu_Y$ are the mean weights of the populations of Americans and Japanese, respectively. Since 0 does not belong to the 99% confidence interval, we reject $H_0$ at a significance level of 1%. We have strong evidence that the mean weights of Americans and Japanese differ.

25 Solution to Example 6.3
It should be noted that these calculations assume that weight has a normal distribution. Since weight has a slightly skewed distribution, the calculation of the confidence interval is not totally accurate. The evidence that the mean weights differ is thus not as strong as stated.

26 6.2 Tests for a population proportion
Suppose we have a large sample. We wish to test between the following two hypotheses: $H_0: p = p_0$ against $H_A: p \neq p_0$, where $p$ is the proportion of a population showing a given trait. The obvious estimator of this proportion is the sample proportion $\hat{p} = \frac{X}{n}$, where $X$ is the number of individuals exhibiting the given trait in a sample of $n$ individuals.

27 Testing procedure
The test statistic for this test is $Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})}$, where $\mathrm{S.E.}(\hat{p})$ is the standard error of the sample proportion. The p-value for the test is $2P(Z > |t|)$, where $t$ is the realisation of the test statistic. At a significance level of $\alpha$, we reject $H_0: p = p_0$ if $|t| > t_{\infty,\alpha/2}$.

28 Standard error of the sample proportion
The standard error of this estimate is $\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$. Under the null hypothesis $p = p_0$, $\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1-p_0)}{n}}$. This formula for the standard error is used in the standard testing procedure.

29 Confidence intervals for a population proportion
A $100(1-\alpha)\%$ confidence interval for a population proportion is given by $\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p})$. When calculating a confidence interval, the standard error is approximated using the sample proportion, i.e. $\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.

30 Particular confidence intervals for a population proportion
In particular, a 95% confidence interval for a population proportion is given by $\hat{p} \pm t_{\infty,0.025}\,\mathrm{S.E.}(\hat{p}) = \hat{p} \pm 1.96\,\mathrm{S.E.}(\hat{p})$. A 99% confidence interval for a population proportion is given by $\hat{p} \pm t_{\infty,0.005}\,\mathrm{S.E.}(\hat{p}) = \hat{p} \pm 2.576\,\mathrm{S.E.}(\hat{p})$.

31 Approximate duality between confidence intervals for population proportions and tests
It should be noted that when we are testing the hypothesis $H_0: p = p_0$, the duality between confidence intervals and hypothesis tests is only approximate, because different formulae are used to calculate the standard error of $\hat{p}$: in the standard testing procedure $p = p_0$ is used to calculate the standard error, whereas $p \approx \hat{p}$ is used in the calculation of the confidence interval. For example, when using a 95% confidence interval to test the hypothesis $H_0: p = p_0$, the significance level would be approximately 5%.

32 Example 6.4
183 of 300 randomly picked Irish households have a PC. i) Test the null hypothesis that 55% of all Irish households have a PC. ii) Construct 95% and 99% confidence intervals for the proportion of Irish households owning a PC.

33 Solution to Example 6.4
i) The sample proportion is $\hat{p} = \frac{183}{300} = 0.61$. The test statistic is $Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})}$.

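34 Solution to Example 6.4
The standard error of the sample proportion is $\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1-p_0)}{n}} = \sqrt{\frac{0.55 \times 0.45}{300}} \approx 0.0287$. The realisation of the test statistic is $t = \frac{0.61 - 0.55}{0.0287} \approx 2.09$.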

35 Conclusion
Since $|t| > t_{\infty,0.025} = 1.96$ and $|t| < t_{\infty,0.005} = 2.576$, we reject $H_0$ at a significance level of 5%, but not at a significance level of 1%. We have evidence that the percentage is higher than 55%.
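
The test in part i) can be sketched in a few lines of standard-library Python. The function name `proportion_z_test` is an illustrative assumption, and the count of 183 households is inferred from $\hat{p} = 0.61$ and $n = 300$ rather than quoted from a data table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0):
    """Two-sided large-sample test of H0: p = p0."""
    p_hat = x / n
    # Standard error computed under the null hypothesis p = p0
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 183 = 0.61 * 300 (count inferred from the sample proportion)
z, p_value = proportion_z_test(x=183, n=300, p0=0.55)
print(round(z, 2), round(p_value, 3))
print(abs(z) > 1.96, abs(z) > 2.576)  # reject at 5%? reject at 1%?
```

The p-value falls between 0.01 and 0.05, which matches the conclusion: reject at the 5% level but not at the 1% level.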

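36 Calculation of confidence interval
ii) The approximation of the standard error is $\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.61 \times 0.39}{300}} \approx 0.0282$.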

37 Solution to Example 6.4
The 95% confidence interval for the proportion of all Irish households owning a PC is $0.61 \pm 1.96 \times 0.0282 \approx 0.61 \pm 0.055 = [0.555, 0.665]$. Since 0.55 does not belong to this confidence interval, based on the confidence interval we reject the null hypothesis that $p = 0.55$ at a significance level of approximately 5%.

38 Solution to Example 6.4
The 99% confidence interval for the proportion of all Irish households owning a PC is $0.61 \pm 2.576 \times 0.0282 \approx 0.61 \pm 0.073 = [0.537, 0.683]$. Since 0.55 belongs to this confidence interval, we do not reject the null hypothesis that $p = 0.55$ at a significance level of approximately 1%. Hence, we reject $H_0: p = 0.55$ at the 5% level, but not at the 1% level. We have evidence that the proportion of Irish households owning PCs is greater than 55%, but this evidence is not strong.
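
The two intervals for the proportion of households owning a PC can be reproduced from $\hat{p} = 0.61$ and $n = 300$. This is a minimal standard-library sketch; the function name `proportion_ci` is an illustrative assumption. Note that, unlike the test, the interval uses $\hat{p}$ in the standard error, which is why the duality is only approximate here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, conf):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Standard error estimated from the sample proportion, not from p0
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


ci95 = proportion_ci(p_hat=0.61, n=300, conf=0.95)
ci99 = proportion_ci(p_hat=0.61, n=300, conf=0.99)
p0 = 0.55
print([round(v, 3) for v in ci95], ci95[0] <= p0 <= ci95[1])
print([round(v, 3) for v in ci99], ci99[0] <= p0 <= ci99[1])
```

The value 0.55 falls just outside the 95% interval and inside the 99% interval, in agreement with the test based on the standard error under $H_0$.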

39 6.3 Testing for a difference between two population proportions
Suppose we have two large samples. We wish to test between the following two hypotheses: $H_0: p_1 = p_2$ (i.e. $p_1 - p_2 = 0$) against $H_A: p_1 \neq p_2$, where $p_i$ is the proportion of population $i$ showing a given trait. We will only consider such tests using duality, i.e. based on confidence intervals.

40 Confidence intervals for the difference between two population proportions
A $100(1-\alpha)\%$ confidence interval for the difference between two population proportions is given by $(\hat{p}_1 - \hat{p}_2) \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$, where $\hat{p}_i$ is the sample proportion for the sample from population $i$. In particular, a 95% confidence interval for the difference between two population proportions is given by $(\hat{p}_1 - \hat{p}_2) \pm t_{\infty,0.025}\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = (\hat{p}_1 - \hat{p}_2) \pm 1.96\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$. A 99% confidence interval for the difference between two population proportions is given by $(\hat{p}_1 - \hat{p}_2) \pm t_{\infty,0.005}\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = (\hat{p}_1 - \hat{p}_2) \pm 2.576\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$.

41 Standard error of the difference between sample proportions
The standard error for the difference between two sample proportions is estimated by $\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, where $n_i$ is the size of the sample from population $i$. Again, the duality between confidence intervals and hypothesis tests is only approximate.

42 Example 6.5
A long-term study showed that 64 of 400 smokers contracted lung cancer, while 18 of 900 non-smokers contracted the disease. Calculate a 95% confidence interval for the difference between the proportions of smokers and non-smokers contracting the disease.

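43 Solution to Example 6.5
The sample proportions are $\hat{p}_1 = \frac{64}{400} = 0.16$ and $\hat{p}_2 = \frac{18}{900} = 0.02$. The estimate of the standard error for the difference between the sample proportions is $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\frac{0.16 \times 0.84}{400} + \frac{0.02 \times 0.98}{900}} \approx 0.0189$.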

44 Solution to Example 6.5
A 95% confidence interval for the difference between the population proportions is given by $(\hat{p}_1 - \hat{p}_2) \pm 1.96\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = (0.16 - 0.02) \pm 1.96 \times 0.0189 \approx 0.14 \pm 0.037 = [0.103, 0.177]$. Since 0 does not belong to this 95% confidence interval, we reject the null hypothesis at a significance level of approximately 5%. Hence, we have evidence that the population proportions are not equal. From the data it can be seen that the proportion of smokers contracting lung cancer is higher than the proportion of non-smokers.
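
The interval for the smoking study (64 of 400 smokers, 18 of 900 non-smokers) can be checked with the sketch below. It uses only the Python standard library; the function name `diff_proportions_ci` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Estimated standard error of the difference between sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


lower, upper = diff_proportions_ci(x1=64, n1=400, x2=18, n2=900)
reject_h0 = not (lower <= 0 <= upper)
print(round(lower, 3), round(upper, 3), reject_h0)
```

Since the whole interval lies above 0, the approximate duality leads to rejecting $H_0: p_1 = p_2$ at a significance level of roughly 5%.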