Confidence Intervals

A confidence interval is an interval whose purpose is to estimate a parameter (a number that could, in theory, be calculated from the population if measurements were available for the whole population). A confidence interval has three elements. First, there is the interval itself, something like (123, 456). Second is the confidence level, something like 95%. Third, there is the parameter being estimated, something like the population mean, µ, or the population proportion, p. To have a meaningful statement, you need all three elements: "(123, 456) is a 95% confidence interval for µ."

Formulas:

General formula for confidence intervals: estimate ± margin of error

The critical value $z^*$ is 1.645 for 90% confidence, 1.96 for 95% confidence, and 2.576 for 99% confidence.

CI for a population mean (σ is known, and n > 30 or the variable is normally distributed in the population):
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ (TI-83: STAT TESTS 7:ZInterval)

CI for a population mean (σ is unknown, and n > 30 or the variable is normally distributed in the population):
$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, with n - 1 degrees of freedom (TI-83: STAT TESTS 8:TInterval)

CI for a population proportion (when $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$):
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, where $\hat{p} = \frac{x}{n}$ (TI-83: STAT TESTS A:1-PropZInt)

Minimum required sample size for a desired margin of error m and confidence level:
When it is a mean problem: $n = \left(\frac{z^* \sigma}{m}\right)^2$
When it is a proportion problem: $n = \hat{p}(1 - \hat{p})\left(\frac{z^*}{m}\right)^2$
If you don't know $\hat{p}$, use $\hat{p} = \frac{1}{2}$ (conservative approach).
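The formulas above can be turned into code directly. The Python sketch below is one way to do it, and the function names are illustrative rather than taken from any library. It computes $z^*$ from the standard normal curve with the standard library's `statistics.NormalDist` (Python 3.8+) and then applies the interval and sample-size formulas. The standard library has no t distribution, so $t^*$ has to be passed in from a table. Sample sizes are rounded up, and a small guard absorbs floating-point noise.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* leaving the given central area under the standard normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_interval(xbar, spread, n, critical):
    """xbar ± critical * spread / sqrt(n).

    Pass sigma with z* for a z-interval, or s with t* (df = n - 1) for a t-interval.
    """
    margin = critical * spread / math.sqrt(n)
    return (xbar - margin, xbar + margin)


def proportion_interval(x, n, critical):
    """p_hat ± critical * sqrt(p_hat * (1 - p_hat) / n), with p_hat = x / n."""
    p_hat = x / n
    margin = critical * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)


def _round_up(value):
    # Strip tiny floating-point noise (e.g. 684.0000000000001) before taking the
    # ceiling, so an exact whole number is not pushed up to the next integer.
    return math.ceil(round(value, 6))


def n_for_mean(sigma, margin, critical):
    """Smallest n with critical * sigma / sqrt(n) <= margin."""
    return _round_up((critical * sigma / margin) ** 2)


def n_for_proportion(margin, critical, p_hat=0.5):
    """Smallest n with critical * sqrt(p_hat (1 - p_hat) / n) <= margin; p_hat = 0.5 is conservative."""
    return _round_up(p_hat * (1 - p_hat) * (critical / margin) ** 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

For instance, `n_for_proportion(0.03, 2)` returns 1112 and `n_for_proportion(0.03, 2, 0.19)` returns 684, which agrees with Example 1 below.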
Examples:

1. You wish to estimate, with 95% confidence, the proportion of computers that need repairs or have problems by the time the product is three years old. Your estimate must be accurate within 3% of the true proportion.

a. If no preliminary estimate is available, find the minimum sample size required.

If no preliminary estimate is available, use the conservative choice: $\hat{p} = 0.5$. The margin of error is m = 3% = 0.03. Rounding $z^*$ for 95% confidence to 2:
$n = \hat{p}(1 - \hat{p})\left(\frac{z^*}{m}\right)^2 = 0.5(1 - 0.5)\left(\frac{2}{0.03}\right)^2 = 1111.11$
Thus we need to sample at least 1112 computers. (Remember: ALWAYS round up!) With the more precise $z^* = 1.96$, the same calculation gives 1067.11, so 1068 computers (compare Example 5c).

b. Now suppose a prior study involving fewer than 100 computers found that 19% of these computers needed repairs or had problems by the time the product was three years old. Find the minimum sample size needed.

Now $\hat{p} = 0.19$:
$n = 0.19(1 - 0.19)\left(\frac{2}{0.03}\right)^2 = 684$
This is a whole number, so the minimum sample size we need is 684.

2. A college administrator would like to determine how much time students spend on homework assignments during a typical week. A questionnaire is sent to a sample of n = 100 students, and their responses indicate a mean of 7.4 hours per week and a standard deviation of 3 hours.

(a) What is the point estimate of the mean amount of homework for the entire student population (i.e., what is the point estimate for µ, the unknown population mean)?

The point estimate for the population mean is the sample mean. In this case it is 7.4 hours.

(b) Now make an interval estimate of the population mean so that you are 95% confident that the true mean is in your interval (i.e., compute the 95% confidence interval).

Conditions: Random sample? We don't really know. n > 30, so by the CLT we can assume that the sampling distribution of the sample means is approximately normal. $\bar{x}$ = 7.4 hours and s = 3 hours. The population s.d. is unknown (we only know the sample s.d.), so we need to use the t-interval.
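As an alternative to the calculator, a short Python sketch can reproduce the t-intervals for parts (b), (c), and (e) of this example. The $t^*$ values are typed in by hand from a t table for df = 99 and df = 49. That is an assumption of this sketch, because Python's standard library has no t quantile function. The helper name `t_interval` is hypothetical.

```python
import math


def t_interval(xbar, s, n, t_star):
    """xbar ± t* * s / sqrt(n); t_star comes from a t table with df = n - 1."""
    margin = t_star * s / math.sqrt(n)
    return (xbar - margin, xbar + margin)


# Example 2: xbar = 7.4 hours, s = 3 hours.  label: (n, t*)
cases = {
    "(b) 95%, n = 100": (100, 1.984),  # df = 99
    "(c) 99%, n = 100": (100, 2.626),  # df = 99
    "(e) 95%, n = 50": (50, 2.010),    # df = 49
}
intervals = {label: t_interval(7.4, 3, n, t_star) for label, (n, t_star) in cases.items()}

for label, (lo, hi) in intervals.items():
    print(f"{label}: ({lo:.1f}, {hi:.1f}), width {hi - lo:.2f}")
```

The printed width grows from (b) to (c), because of the higher confidence, and from (b) to (e), because of the smaller sample.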
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ ($t^* = 1.984$ for df = 99) or the calculator (8:TInterval), the 95% confidence interval is (6.8, 8.0). That means we are 95% confident that the mean time ALL students spend on homework assignments during a typical week is between 6.8 hours and 8.0 hours.

(c) Now compute the 99% confidence interval.

Repeating part (b) with $t^* = 2.626$, we get (6.6, 8.2). That means we are 99% confident that the mean time ALL students spend on homework assignments during a typical week is between 6.6 hours and 8.2 hours.

(d) Compare your answers to (b) and (c). Which confidence interval is wider, and why? How is the width of the confidence interval related to the percentage/degree of confidence?

The 99% confidence interval is wider. A higher confidence level requires a larger critical value ($t^* = 2.626$ instead of 1.984), which widens the interval: to be more sure that the interval contains the true parameter, you must include a wider range of plausible values.

(e) Now compute the 95% confidence interval again, but assume that n = 50.

Since n is still larger than 30, we can use the t-interval again ($t^* = 2.010$ for df = 49). The 95% confidence interval with n = 50 is (6.5, 8.3).

(f) Compare your answers to (b) and (e). Which confidence interval is wider, and why? How is the width of the confidence interval related to the size of the sample?

The sample size of 100 gives a narrower confidence interval than the sample of size 50. The larger your sample, the more sure you can be that the responses truly reflect the population. This indicates that, for a given confidence level, the larger your sample size, the narrower your confidence interval. However, the relationship is not linear: doubling the sample size does not halve the interval. Because the width is roughly proportional to $1/\sqrt{n}$, you must quadruple the sample size to halve the width.

3. In Roosevelt National Forest, the rangers took random samples of live aspen trees and measured the base circumference of each tree. Assume that the circumferences of the trees are normally distributed.

a. The first sample had 30 trees with a mean circumference of 15.71 inches and a standard deviation of 4.63 inches. Find a 95% confidence interval for the mean circumference of aspen trees from these data.
Conditions: random sample (checked), σ is unknown, n = 30, and the circumferences are normally distributed, so we can use the t-interval. $\bar{x}$ = 15.71, s = 4.63, n = 30.
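The repeated-sampling meaning of a 95% t-interval can be checked by simulation: if we draw many samples of size 30, about 95% of the resulting intervals should capture the true mean. This Python sketch assumes a hypothetical normal population whose mean and standard deviation are borrowed from part (a). It uses $t^* = 2.045$ (df = 29) and fixes a random seed so the run is repeatable. The function name `coverage` and the number of repetitions are illustrative choices.

```python
import math
import random


def coverage(mu, sigma, n, t_star, reps, seed=1):
    """Fraction of simulated t-intervals xbar ± t* s / sqrt(n) that contain mu.

    Each sample of size n is drawn from a normal population with mean mu and sd sigma.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample sd
        margin = t_star * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


# Hypothetical population: normal, mean 15.71 in, sd 4.63 in (values taken from part (a)).
print(coverage(mu=15.71, sigma=4.63, n=30, t_star=2.045, reps=10_000))
```

The printed fraction should be close to 0.95. A different seed gives a slightly different value.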
Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ ($t^* = 2.045$) or the calculator (8:TInterval), the 95% t-interval is (13.98, 17.44). This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 13.98 inches and 17.44 inches. That is, based on this sample, if we could measure the circumference of ALL of the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 13.98 inches and 17.44 inches. It also means that if we were to take many, many samples of 30 live aspen trees and calculate a 95% confidence interval for each sample, about 95% of those intervals would contain the real, actual mean circumference and about 5% would miss it. But, of course, we don't know which 5% would miss it.

b. The next sample had 100 trees with a mean of 15.58 inches. Again find a 95% confidence interval for the mean circumference of aspen trees from these data.

Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can use the t-interval. Taking the standard deviation to be s = 4.63 again: $\bar{x}$ = 15.58, s = 4.63, n = 100.

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ ($t^* = 1.984$) or the calculator (8:TInterval), the 95% t-interval is (14.66, 16.50). This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 14.66 inches and 16.50 inches. That is, based on this sample, if we could measure the circumference of ALL the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 14.66 inches and 16.50 inches.

c. The last sample had 300 trees with a mean of 15.59 inches. Find a 95% confidence interval from these data.

Conditions: σ is unknown, n > 30, and the circumferences are normally distributed, so we can use the t-interval. $\bar{x}$ = 15.59, s = 4.63, n = 300.

Using $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$ ($t^* = 1.968$) or the calculator (8:TInterval), the 95% t-interval is (15.06, 16.12). This means that we are 95% confident that the mean circumference of ALL live aspen trees in Roosevelt National Forest is between 15.06 inches and 16.12 inches. That is, based on this sample, if we could measure the circumference of ALL the live aspen trees there, then we are 95% confident that the mean of all the measurements would be between 15.06 inches and 16.12 inches.

d. Find the length of each interval of parts (a), (b), and (c). Comment on how these lengths change as the sample size increases.

The length of the CI with n = 30 is 17.44 - 13.98 = 3.46.
The length of the CI with n = 100 is 16.50 - 14.66 = 1.84.
The length of the CI with n = 300 is 16.12 - 15.06 = 1.06.
The length of the interval gets smaller as the sample size increases.

4. In an article exploring blood serum levels of vitamins and lung cancer risks (The New England Journal of Medicine), the mean serum level of vitamin E in the control group was 11.9 mg/liter. There were 196 patients in the control group. (These patients were free of all cancer, except possibly skin cancer, in the subsequent 8 years.) Assume that the standard deviation is σ = 4.30 mg/liter.

a. Find a 95% confidence interval for the mean serum level of vitamin E in all persons similar to the control group.

Conditions: Random sample? We don't really know, but let's assume they picked the subjects randomly. σ is known, so we can use the z-interval. $\bar{x}$ = 11.9, σ = 4.30, n = 196.

Using either $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ ($z^* = 1.96$) or the calculator (7:ZInterval), the 95% z-interval is (11.3, 12.5). This means that we are 95% confident that the mean serum level of vitamin E in ALL persons similar to the control group is between 11.3 mg/liter and 12.5 mg/liter. That is, based on this sample, if we could measure the serum level of vitamin E in ALL such persons (free of all cancer, except possibly skin cancer, in the subsequent 8 years), then we are 95% confident that the mean of all the measurements would be between 11.3 mg/liter and 12.5 mg/liter.

b. If you wanted to estimate the mean serum level of vitamin E with 90% confidence and a margin of error of no more than 0.25 mg/liter, how large a sample would you need?
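A short Python sketch can check both parts of Example 4, with `statistics.NormalDist` supplying $z^*$ in place of a table. Because it uses the unrounded $z^*$ (about 1.6449 rather than 1.645), the unrounded sample size comes out near 800.4 instead of 800.55. The rounded-up answer is still 801.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 11.9, 4.30, 196  # Example 4 control group; sigma treated as known

# (a) 95% z-interval: xbar ± z* sigma / sqrt(n)
z95 = NormalDist().inv_cdf(0.975)
margin = z95 * sigma / math.sqrt(n)
print(f"95% CI: ({xbar - margin:.1f}, {xbar + margin:.1f})")

# (b) smallest n with z* sigma / sqrt(n) <= 0.25 at 90% confidence; always round up
z90 = NormalDist().inv_cdf(0.95)
n_needed = math.ceil((z90 * sigma / 0.25) ** 2)
print(f"minimum sample size: {n_needed}")
```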
For the minimum sample size, we can use the formula $n = \left(\frac{z^* \sigma}{m}\right)^2$.
$n = \left(\frac{1.645 \cdot 4.30}{0.25}\right)^2 = 800.55$
Thus, we would need at least 801 cancer-free patients in our sample.

5. Suppose that in a state with a large number of voters, 56 out of 100 randomly surveyed voters favored Proposition 1. This is just a small sample of all the voters. Do you think Proposition 1 passed? YES, but I am not very sure; I would like more information.

a. Give a range of plausible values for the proportion of all voters who favored Proposition 1. (That is, find a 95% confidence interval.)

Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p). In our sample, 56 out of 100 favored the proposition, that is, $\hat{p}$ = 56/100 = 0.56 = 56%. x = 56, n = 100, $\hat{p}$ = 0.56.

Checking conditions for the CI: random sample, $n\hat{p}$ = 56 > 10 and $n(1 - \hat{p}) = 100(1 - 0.56) = 44 > 10$. The conditions are satisfied. We use $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. Using this formula (with $z^* = 1.96$), or the A:1-PropZInt menu on the calculator, we get (0.463, 0.657). That is, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is between 46.3% and 65.7%. Other samples of 100 voters would yield other 95% confidence intervals. Most of these confidence intervals (about 95% of them) would capture p, but a few of them (about 5%) would not.

b. The 95% confidence interval we just computed is rather wide and does not pinpoint p to any great extent. (In fact, we cannot even tell whether a majority voted for Proposition 1.) Our next example shows that we can obtain a narrower confidence interval by taking a larger sample. Suppose that in a state with a large number of voters, 560 out of 1000 randomly surveyed voters favored Proposition 1. Give a range of plausible values for the proportion of all voters who favored Proposition 1.

Our goal is to estimate the proportion of ALL voters who favored Proposition 1 (p). In our sample, 560 out of 1000 favored the proposition, that is, $\hat{p}$ = 560/1000 = 0.56 = 56%. x = 560, n = 1000, $\hat{p}$ = 0.56.
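A Python sketch can reproduce the 1-PropZInt calculation for parts (a) and (b). The helper name `one_prop_z_interval` is hypothetical. It applies $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ with $z^* = 1.96$. It also raises an error when the success or failure count is below 10, mirroring the conditions used in these notes.

```python
import math


def one_prop_z_interval(x, n, z_star=1.96):
    """p_hat ± z* sqrt(p_hat (1 - p_hat) / n), with p_hat = x / n."""
    # Large-sample conditions from these notes: n*p_hat >= 10 and n*(1 - p_hat) >= 10.
    # Since n*p_hat = x, these are x >= 10 and n - x >= 10.
    if x < 10 or n - x < 10:
        raise ValueError("too few successes or failures for the normal approximation")
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)


for x, n in [(56, 100), (560, 1000)]:
    lo, hi = one_prop_z_interval(x, n)
    print(f"{x}/{n}: ({lo:.3f}, {hi:.3f}), lower endpoint above 0.5: {lo > 0.5}")
```

Only the 1000-voter interval has a lower endpoint above 0.5.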
Checking conditions for the CI: random sample, $n\hat{p}$ = 560 > 10 and $n(1 - \hat{p}) = 1000(1 - 0.56) = 440 > 10$. The conditions are satisfied. We use $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. Using this formula (with $z^* = 1.96$), or the A:1-PropZInt menu on the calculator, we get (0.529, 0.591). That is, based on the results from our sample of size 1000, we are 95% confident that the proportion of ALL voters who favored Proposition 1 is between 52.9% and 59.1%.

Notice that the sample size of 1000 gives a much narrower confidence interval than the sample size of 100. In fact, with the larger sample, we can be quite confident (95% confident, anyway) that a majority of the voters favored Proposition 1, since the lower endpoint of the sample's 95% confidence interval, 0.529, is greater than one-half. Bear in mind, however, that the larger sample may be more costly and time-consuming than the smaller one. Now, how confident are you that Proposition 1 passed or failed? I'd bet a small amount of money that I am right.

c. Forget the previous parts now. Assume that you haven't taken any samples yet. What sample size do you need if you want the margin of error to be at most 3% with 95% confidence, but you have no estimate of p?

Because you don't have an estimate of p, use $\hat{p} = 0.5$. We want the margin of error to be at most 3%, that is, m = 0.03.
$n = \hat{p}(1 - \hat{p})\left(\frac{z^*}{m}\right)^2 = 0.5(1 - 0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.11$
Thus, to get a margin of error of at most 3%, we need at least 1068 voters in our sample.

d. Now let's assume you did a pilot sample, in which 56 out of 100 voters said they favor Proposition 1. What sample size do you need now if you want the margin of error to be at most 3% with 95% confidence?

Now we have an estimate of p from the pilot study, so we use $\hat{p} = 0.56$. We want the margin of error to be at most 3%, that is, m = 0.03.
$n = 0.56(1 - 0.56)\left(\frac{1.96}{0.03}\right)^2 = 1051.74$
Thus, to get a margin of error of at most 3%, we need at least 1052 voters in our sample.

6. Sometimes a 95% confidence interval is not enough.
For example, in testing new medical drugs or procedures, a 99% confidence interval may be required before the new drug or procedure is approved for general use. For instance, a new drug for migraines might induce insomnia (difficulty falling asleep) in some patients. If this side effect happens in too many patients, the drug might not be approved. More precisely, if it could happen in more than 5% of all patients, it won't be approved. In a random sample of 632 migraine patients who took the new pill, 19 experienced insomnia. Based on this sample result, what would be your recommendation: should the new drug be approved or not?

We want to estimate the proportion of ALL migraine patients who would experience insomnia. The sample proportion is $\hat{p}$ = 19/632 ≈ 0.030 = 3%. We want to calculate the 99% confidence interval based on this sample result. Let's check the conditions first: random sample, $n\hat{p}$ = 19 > 10 and $n(1 - \hat{p})$ = 613 > 10. The conditions are satisfied. We use $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. Using this formula (with $z^* = 2.576$), or the A:1-PropZInt menu on the calculator, we get (0.0126, 0.0476). Thus, based on this sample result, we are 99% confident that if we could test every migraine patient who would take this pill, the proportion of them who would experience insomnia would be between about 1.26% and 4.76%. Since even the upper endpoint, 4.76%, is below the 5% limit, we can recommend approval of the new drug.

7. The Gallup Poll survey organization conducted telephone interviews with a randomly selected national sample of 1,003 adults, 18 years and older, on Mar. 3-5, 2003. In the survey they found that 281 adults said that the nation's energy situation is very serious. Find 95% and 99% confidence intervals for the unknown proportion of Americans who felt that the nation's energy situation is very serious.

This is a proportion problem: $\hat{p} = \frac{x}{n} = \frac{281}{1003}$.
Conditions: random sample (checked), $n\hat{p} = 1003 \cdot \frac{281}{1003} = 281 > 10$ and $n(1 - \hat{p}) = 1003\left(1 - \frac{281}{1003}\right) = 722 > 10$.

95% confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ ($z^* = 1.96$), or using the calculator: STAT TESTS A:1-PropZInt, x = 281, n = 1003, C-Level: 0.95.
The 95% confidence interval is (0.252, 0.308). We are 95% confident that the proportion of ALL adults in the U.S. who feel that the nation's energy situation is very serious is somewhere between 25.2% and 30.8%. That is, if we could ask EVERY adult in the U.S. what they think about the nation's energy situation, we are 95% confident that 25.2%-30.8% of them would say that the energy situation is very serious.
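A few lines of Python can reproduce the 99% interval and approval rule of Example 6, along with both intervals of Example 7 (the 99% one is worked next). This is a sketch. The helper `prop_interval` is a hypothetical name, and it uses $z^* = 2.576$ for 99% confidence.

```python
import math


def prop_interval(x, n, z_star):
    """p_hat ± z* sqrt(p_hat (1 - p_hat) / n), with p_hat = x / n."""
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)


# Example 6: 19 of 632 patients reported insomnia; approve only if the rate is below 5%
lo6, hi6 = prop_interval(19, 632, 2.576)
print(f"insomnia 99% CI: ({lo6:.4f}, {hi6:.4f}); whole interval below 0.05: {hi6 < 0.05}")

# Example 7: 281 of 1003 adults said the energy situation is very serious
energy = {level: prop_interval(281, 1003, z) for level, z in [("95%", 1.96), ("99%", 2.576)]}
for level, (lo, hi) in energy.items():
    print(f"energy {level} CI: ({lo:.3f}, {hi:.3f})")
```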
99% confidence interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ ($z^* = 2.576$), or using the calculator: STAT TESTS A:1-PropZInt, x = 281, n = 1003, C-Level: 0.99.
The 99% confidence interval is (0.244, 0.317). We are 99% confident that the proportion of ALL adults in the U.S. who feel that the nation's energy situation is very serious is somewhere between 24.4% and 31.7%. That is, if we could ask EVERY adult in the U.S. what they think about the nation's energy situation, we are 99% confident that 24.4%-31.7% of them would say that the energy situation is very serious. Again, as it should be, the 99% confidence interval is wider.

8. The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of body temperature, along with the gender of each individual and his or her heart rate. MINITAB provides the following information:

Descriptive Statistics

Variable     N     Mean   Median   TrMean   StDev   SE Mean
TEMP       130   98.249   98.300   98.253   0.733     0.064

Variable      Min       Max       Q1       Q3
TEMP       96.300   100.800   97.800   98.700

Based on these results, construct and interpret a 95% confidence interval for the mean body temperature. According to these results, is the usual assumed normal body temperature of 98.6 degrees Fahrenheit within the 95% confidence interval for the mean?

This is a mean problem. Conditions: random sample: we don't know; there is no information about that. n > 30. Since we don't know σ, the population standard deviation, we need to use the t-interval. The sample mean is 98.249 and the sample standard deviation is 0.733 (both are provided above). Use $t^* = 1.979$ (df = 129).
The 95% confidence interval: $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 98.249 \pm 1.979 \cdot \frac{0.733}{\sqrt{130}} = (98.122, 98.376)$
Or using the calculator: STAT TESTS 8:TInterval: highlight Stats, and enter 98.249 for $\bar{x}$, 0.733 for Sx, and 130 for n.
We are 95% confident that the mean body temperature for ALL people is between 98.122 and 98.376 degrees Fahrenheit. The usual assumed normal body temperature of 98.6 degrees Fahrenheit is not within the 95% confidence interval for the mean.
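The interval can also be computed in Python from the MINITAB summary statistics. The sketch below types in $t^* = 1.979$ as the t-table value for df = 129. That is an assumption, since Python's standard library has no t quantile function. The sketch also recomputes the reported SE Mean, $s/\sqrt{n}$.

```python
import math

# Example 8 summary statistics from the MINITAB output
n, xbar, s = 130, 98.249, 0.733
t_star = 1.979  # t-table value for df = 129 at 95% confidence, typed in by hand

se = s / math.sqrt(n)  # standard error of the mean, should match "SE Mean" = 0.064
lo, hi = xbar - t_star * se, xbar + t_star * se

print(f"SE Mean = {se:.3f}")
print(f"95% CI: ({lo:.3f}, {hi:.3f}); contains 98.6: {lo <= 98.6 <= hi}")
```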