Confidence Intervals

Recall: Inferential statistics are used to make predictions and decisions about a population based on information from a sample. The two major applications of inferential statistics involve the use of sample data to (1) estimate the value of a population parameter, and (2) test some claim (or hypothesis) about a population.

In this chapter, we introduce methods for estimating the values of some important population parameters. We also present methods for determining the sample sizes necessary to estimate those parameters.

7.1 Finding Critical Z Values

The unknown population parameter that we are interested in estimating is called the target parameter.

Identifying the Target Parameter

Some helpful key words are provided below to determine our target parameter:

Parameter    Key Words or Phrases                      Type of Data
$\mu$        Mean; Average                             Quantitative
$p$          Proportion; Percentage; Fraction; Rate    Qualitative

Estimating a Population Mean Using a Confidence Interval

Recall: A point estimator of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used to estimate the population parameter.

For all populations, the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, meaning that the distribution of sample means tends to center about the value of the population mean. For many populations, the distribution of sample means $\bar{x}$ tends to be more consistent (with less variation) than the distributions of other sample statistics.

We have used point estimators before to estimate target parameters; however, we cannot assign any level of certainty to those point estimators. To remove this drawback, we can use what is called an interval estimator.

An interval estimator (or confidence interval) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

The confidence coefficient is the relative frequency with which the interval estimator encloses the population parameter when the estimator is used repeatedly a very large number of times.

The diagram below shows the coverage of 8 confidence intervals (CIs). The vertical line shows the location of the parameter; all the intervals capture the parameter except CI 2. If the confidence level were 95% for each of these intervals, we would expect that, in the long run, only 5% of such intervals fail to capture the parameter (as CI 2 has done).

[Figure: eight horizontal confidence intervals, CI 1 through CI 8, drawn against a vertical line marking the parameter; only CI 2 misses the line.]

An important relationship: The confidence level has a complementary relationship with something called the significance level. The symbol for the significance level is alpha: $\alpha$. The relationship between the confidence level and the significance level is expressed by the following equations:

Significance Level = 100% - Confidence Level

For example, if the confidence level is 90%, the significance level is 100% - 90% = 10%.

Confidence Level = 100% - Significance Level

For example, if the significance level is 5%, the confidence level is 100% - 5% = 95%.

The most common choices for the confidence level are given below with the corresponding significance levels: 90% ($\alpha$ = 10%), 95% ($\alpha$ = 5%), or 99% ($\alpha$ = 1%).

A little notation: The value $z_{\alpha}$ is defined as the value of the standard normal random variable Z such that the area to its right is $\alpha$.

[Figure: standard normal curve with the right tail beyond $z_{\alpha}$ shaded; $\alpha$ = area in this tail.]

Our goal for this section is to be able to estimate the true value of the population parameter $\mu$ (the population mean). What we will need is sample data. This should include the sample mean, the sample size, and the sample (or population) standard deviation. We will also need a confidence level and a z-chart.

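Things Needed to Create a Confidence Interval for the Population Mean

Sample Mean    Sample Size    Standard Deviation    Confidence Level        Z-table Value
$\bar{x}$      $n$            $\sigma$              $(1 - \alpha)100\%$     $z_{\alpha/2}$

The logic of a confidence interval can be understood by considering the following ideas. First recall that under the empirical rule approximately 95% of the data will fall within 2 standard deviations of the mean. Also, recall that the CLT tells us that for samples of size $n$ drawn from the population, $\bar{x} \sim N(\mu, \sigma^2/n)$, at least approximately when $n$ is large. (Note: $\bar{x} \sim N(\mu, \sigma^2/n)$ means the sample mean is normally distributed with mean $\mu$ and variance $\sigma^2/n$.)

It should make sense that the interval

$(\mu - 2\sigma_{\bar{x}},\ \mu + 2\sigma_{\bar{x}})$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$,

would capture about 95% of all the possible sample means.

Now consider the drawing below:

[Figure: normal curve of sample means with the two tails shaded and the critical value $z_{\alpha/2}$ marked at the edge of the right tail.]

The z score separating the right-tail region is commonly denoted by $z_{\alpha/2}$ and is referred to as a critical value because it is on the borderline separating sample mean values that are likely to occur from those that are unlikely to occur.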

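Sample means have a relatively small chance (with probability denoted by $\alpha$) of falling in one of the red tails of the figure. Denoting the area of each shaded tail by $\alpha/2$, we see that there is a total probability of $\alpha$ that a sample mean will fall in either of the two red tails. By the rule of complements (from probability), there is a probability of $1 - \alpha$ that a sample mean will fall within the inner region of the figure below:

[Figure: standard normal curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each red tail.]

The critical value $z_{\alpha/2}$ is the positive z value that is at the vertical boundary separating an area of $\alpha/2$ in the right tail of the standard normal distribution. (The value of $-z_{\alpha/2}$ is at the vertical boundary for the area of $\alpha/2$ in the left tail.) The subscript $\alpha/2$ is simply a reminder that the z score separates an area of $\alpha/2$ in the right tail of the standard normal distribution.

We can see from the drawing that

$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.

Now substitute $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ to get

$P\left(-z_{\alpha/2} < \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$.

Next, we may solve the compound inequality for $\mu$. Multiplying all three sides by $\sigma/\sqrt{n}$ gives

$P\left(-z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$.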

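Multiply all three sides of the inequality by negative one (which reverses the inequality signs):

$P\left(z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} > \mu - \bar{x} > -z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$.

Add $\bar{x}$ to all three sides and write the inequality in the proper order:

$P\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$.

Now we must drop the probability notation because $\mu$ is not a random variable but is instead an unknown constant. (It is either in the interval or it isn't; there can't be any notion of probability here if the value of $\mu$ does not vary randomly.) Finally, we can say that we are $(1 - \alpha)100\%$ confident that

$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.

The $(1 - \alpha)100\%$ Confidence Interval for the Mean

$\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$

*A note about notation: $(\bar{x} - E,\ \bar{x} + E)$ is often written as $\bar{x} - E < \mu < \bar{x} + E$.

The part of the interval above given by $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the margin of error, E.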
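The reading of the confidence coefficient as a long-run relative frequency can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size, and the function name coverage_rate are all assumptions made up for this example, and the population is assumed to be normal with a known standard deviation.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, conf_level, trials, seed=0):
    """Fraction of simulated z-intervals that capture the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so the run is repeatable
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z * sigma / n ** 0.5             # E = z_{alpha/2} * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its sample mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Count the interval as a "hit" if it contains the true mean.
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population (not from the notes): mu = 50, sigma = 10, n = 25.
rate = coverage_rate(mu=50, sigma=10, n=25, conf_level=0.95, trials=4000)
print(f"Share of intervals capturing mu: {rate:.3f}")
```

With a 95% confidence level the printed share should land close to 0.95, though any single run will differ slightly from that value.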

The margin of error is the maximum likely difference observed between the sample mean, $\bar{x}$, and the population mean, $\mu$, and is denoted by E.

The formula above has only one quantity which we will not be given directly: $z_{\alpha/2}$.

Here are the steps to finding $z_{\alpha/2}$ (when a t-table can't be used):
1. Identify the C-level
2. Find (C-level)/2
3. Go to the Z-table and, in the body of the table (which lists areas between 0 and z), look up the number found in step 2
4. $z_{\alpha/2}$ = the bold numbers on the side and top of the table

Example 100: Find $z_{\alpha/2}$ for a 95% confidence interval.

Solution: The confidence level is 0.95. Dividing that in two gives us 0.4750. Looking up 0.4750 in our z-table gives us our answer: $z_{\alpha/2}$ = 1.96.
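When software is available, the table lookup can be cross-checked numerically. The sketch below uses Python's standard-library NormalDist; the function names z_critical and table_lookup_area are made up for this illustration, and the confidence level is assumed to be entered as a decimal.

```python
from statistics import NormalDist


def table_lookup_area(conf_level):
    """Area between 0 and z_{alpha/2}: the number looked up in the table body."""
    return conf_level / 2


def z_critical(conf_level):
    """Return z_{alpha/2} for a confidence level given as a decimal (e.g. 0.95)."""
    alpha = 1 - conf_level
    # inv_cdf takes the area to the LEFT of z, which is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


print(table_lookup_area(0.95))      # 0.475
print(round(z_critical(0.95), 2))   # 1.96
```

The software value carries more decimal places than the table, so results computed from it can differ slightly in the last digit from hand calculations.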

*** Note: Using the t-table provided with the formula card on my web site is much easier than the above method. The t-table also provides an extra decimal place, so it is recommended to use the t-table whenever possible.

Example 101: Find $z_{\alpha/2}$ for a 90% CI.

Example 102: Find $z_{\alpha/2}$ for a 98% CI.

Example 103: Find $z_{\alpha/2}$ for a 99% CI.

Now that we can find $z_{\alpha/2}$, it will be easy to create our confidence intervals.

7.2 Large-Sample Confidence Intervals for a Population Mean

Before we create a confidence interval to estimate the mean, we should look at the requirements for constructing these intervals:
1. The sample is a simple random sample. (All samples of the same size have an equal chance of being selected.)
2. The value of the population standard deviation $\sigma$ is known.
3. Either or both of these conditions are satisfied: the population is normally distributed or $n > 30$.

Steps to Create a Confidence Interval
1. List all given sample data from the problem, including the sample size and C-level
2. Find $z_{\alpha/2}$
3. Calculate the margin of error, $E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
4. Calculate $(\bar{x} - E,\ \bar{x} + E)$
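The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for working the problems by hand: the function name z_interval and the sample values (mean 72, known standard deviation 8, sample size 64) are hypothetical and do not come from any example in these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf_level):
    """Return (E, lower, upper) for a z-based confidence interval for the mean."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)        # step 2: z_{alpha/2}
    margin = z * sigma / sqrt(n)                   # step 3: E = z_{alpha/2} * sigma / sqrt(n)
    return margin, x_bar - margin, x_bar + margin  # step 4: (x_bar - E, x_bar + E)


# Step 1: hypothetical sample data (not from the notes).
E, lower, upper = z_interval(x_bar=72.0, sigma=8.0, n=64, conf_level=0.95)
print(f"E = {E:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Calling the function again with a higher confidence level or a smaller sample size is a quick way to see how each choice changes the interval width.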

Example 104: In sociology, a social network is defined as the people you make frequent contact with. The personal network size for each adult in a sample of 2,819 adults was calculated. The sample had a mean personal network size of 14.6 with a known population standard deviation of 9.8.
a. Give a point estimate for the mean personal network size of all adults.
b. Form a 95% confidence interval for the mean personal network size of all adults.
c. Give the practical interpretation of the interval created in part b.
d. Give the conditions required for the interval to be valid. (Answer: The sample must be random and should be large, $n \ge 30$.) !!!Important!!!

Example 105: A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. Find the margin of error E and the 95% confidence interval for $\mu$. Does the interval contradict the claim that the average body temperature of healthy adults is 98.6 degrees?

Conclusion: There are some relationships that should be understood.
-- If confidence goes up, so does the interval width
-- If confidence goes down, so does the interval width
-- If sample size goes up, interval width goes down

7.3 Determining the Sample Size

Sample Size for Estimating the Mean

Suppose we want to collect sample data with the objective of estimating some population mean $\mu$. The question is how many sample items must be obtained?

By solving the margin of error formula for $n$, we can arrive at the following sample size formula:

$n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2$

where:
$z_{\alpha/2}$ = critical z score based on the desired confidence level
$E$ = desired margin of error
$\sigma$ = population standard deviation

When finding the sample size, $n$, if the use of the formula above does not result in a whole number, always increase the value of $n$ to the next larger whole number.

When solving a problem, you may see a phrase like "We want to estimate the mean within" some stated amount. That word "within" is a key word indicating the margin of error.

Example 106: Nielsen Media Research wants to estimate the mean time that full-time college students spend watching TV each weekday. Find the sample size necessary to estimate that mean with a 15 minute margin of error. Assume that 96% confidence is desired, and assume that the population standard deviation is 112.2 minutes.
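The sample size formula and its round-up rule can be sketched as follows. The function name sample_size_for_mean and the inputs (standard deviation 15, margin of error 2, 95% confidence) are hypothetical values chosen for illustration, not taken from the examples in these notes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, conf_level):
    """Smallest whole n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    raw = (z * sigma / margin) ** 2           # n = (z_{alpha/2} * sigma / E)^2
    return ceil(raw)                          # always round UP to the next whole number


# Hypothetical inputs: sigma = 15, desired margin of error E = 2, 95% confidence.
print(sample_size_for_mean(sigma=15, margin=2, conf_level=0.95))
```

Because the software critical value has more decimal places than a table value, a result that falls very close to a whole number could round differently than a hand calculation; in class, the table value should be treated as the reference.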

Example 106.5: In a paper titled "The Role of Deliberate Practice in the Acquisition of Expert Performance," researchers estimated that it takes approximately 10,000 hours of deliberate practice to become an expert at something. If we want to estimate the average time that it would take to become an expert guitarist within 200 hours, how large should our sample of expert guitarists be? Assume the standard deviation is 850 hours, and that we want a 95% confidence level.

7.4 Finding Critical T Values

Estimating a Population Mean: Small sample size (or $\sigma$ unknown) and normally distributed.

This section presents methods for finding a confidence interval estimate of a population mean when the population standard deviation is not known. With $\sigma$ unknown, we will use the Student t distribution, assuming that certain requirements are satisfied.

Recall: The CLT says if $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$ for any sample size $n$, no matter how small. However, if X is not normal we need a sufficiently large sample size to assume normality.

When the population standard deviation is unknown, we use the sample standard deviation (S) as a substitute, but for small sample sizes S may not be a very good substitute.

Our goal for this section is to be able to estimate the true value of the population parameter $\mu$ (the population mean) when:
1. $\sigma$ is unknown
2. The population is normally distributed or $n \ge 30$

*In class, we will use a simpler method for choosing between t and Z: if $n \ge 30$ use Z, otherwise use t. In most cases, when the distribution is assumed to be normal, statisticians use the t-distribution, but for in-class problem solving without software the z distribution has some advantages. Thus in class we will use Z when our sample size is at least 30.

As before, we will need certain information from the sample to form our confidence interval:

Things Needed to Create a Confidence Interval for the Population Mean

Sample Mean    Sample Size    Standard Deviation    Confidence Level        T-table Value
$\bar{x}$      $n$            $S$                   $(1 - \alpha)100\%$     $t_{\alpha/2}$

Note: We are no longer using a z value for our confidence interval; instead we will need a value from a related distribution*, the Student's t-distribution. The t-distribution has a shape like that of the standard normal distribution (Z-distribution), but it is a little heavier in the tails and consequently a little lower at its center. (Actually, the specific shape of the t-distribution is determined by its degrees of freedom, $df = n - 1$.)

Important Properties of the Student t Distribution
1. The Student t distribution is different for different sample sizes (see the figures below).
2. The Student t distribution has the same general symmetric bell shape as the standard normal distribution, but it reflects the greater variability (with wider distributions) that is expected with small samples.

3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z = 0).
4. The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has $\sigma = 1$).
5. As the sample size gets larger, the Student t distribution gets closer to the normal distribution.

[Figure: Student t curve with the critical value $t_{0.025,4}$ marked.]

* Z and t are related: by taking a standard normal random variable (Z) and dividing it by the square root of an independent chi-squared random variable (V) that has itself been divided by its degrees of freedom (v), we get a random variable that has a Student's t-distribution, $T = \dfrac{Z}{\sqrt{V/v}}$.

To get the needed critical value $t_{\alpha/2}$, we need to:
1. Get the degrees of freedom, $df = n - 1$
2. Find $\alpha = 1 - \text{CI level}$
3. Look up the degrees of freedom and $\alpha/2$ on the t-table from our site (or use any other t-table).

Example 107: Find $t_{\alpha/2}$ for a 90% confidence interval with a sample size of $n$ = 24.

Example 108: Find $t_{\alpha/2}$ for a 99% confidence interval with a sample size of $n$ = 29.

Example 109: Find $t_0$ such that $P(t \ge t_0) = 0.025$ when $n$ = 28.

Now that we can find $t_{\alpha/2}$, it is time to learn how to create our confidence interval to estimate the mean:

7.5 Small-Sample Confidence Intervals for a Population Mean

Steps to Create a Confidence Interval
1. List all given sample data from the problem, including the sample size and C-level
2. Find $t_{\alpha/2}$
3. Calculate the margin of error, $E = t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
4. Calculate $(\bar{x} - E,\ \bar{x} + E)$

Example 110: In a 2011 report by the CDF (Children's Defense Fund), it was reported that a random sample of 29 black males with only a high school degree earned on average $25,418. The standard deviation is estimated to be $5,500. Use the sample data and a 95% confidence level to find the margin of error E and the confidence interval for $\mu$. A 95% confidence interval was constructed for white males, and it was found that the true mean income for white males with only a high school degree was between $33,215 and $37,399. Comparing these two intervals, can we conclude there is a significant difference between incomes for the two groups?
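For readers who want to check a t-table value without special software, the critical value can be approximated with the Python standard library alone. The sketch below is one possible approach, not the method used in class: it integrates the Student t density numerically (Simpson's rule) and searches for the critical value by bisection. All function names are made up for this illustration, and the search assumes the critical value is below 100, which holds for typical classroom problems but not for every combination of very small df and very high confidence. The inputs at the bottom are the ones given in Example 110.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coeff = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=4000):
    """Area to the right of t (t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half of the total area lies to the right of 0.
    return 0.5 - total * h / 3


def t_critical(conf_level, df):
    """Find t_{alpha/2} by bisection so that the right-tail area is alpha/2."""
    target = (1 - conf_level) / 2
    lo, hi = 0.0, 100.0   # assumption: the critical value is below 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > target:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid      # tail area too small: move left
    return (lo + hi) / 2


def t_interval(x_bar, s, n, conf_level):
    """Return (E, lower, upper) using df = n - 1 and E = t_{alpha/2} * s / sqrt(n)."""
    t = t_critical(conf_level, n - 1)
    margin = t * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin


# Inputs from Example 110: n = 29, x_bar = 25418, s = 5500, 95% confidence.
E, lower, upper = t_interval(x_bar=25418, s=5500, n=29, conf_level=0.95)
print(f"t = {t_critical(0.95, 28):.3f}")
print(f"E = {E:.0f}, interval = ({lower:.0f}, {upper:.0f})")
```

The printed interval can then be set beside the interval reported for the second group to see whether the two overlap.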

Example 111: Because cardiac deaths appear to increase after heavy snow falls, an experiment was designed to determine the cardiac demands of manually shoveling snow. Ten subjects cleared tracts of snow, and their maximum heart rates were recorded. Their average maximum heart rate was 175 with a standard deviation of 15. Assuming maximum heart rates are normally distributed, find the 95% confidence interval estimate of the population mean for those people who shovel snow manually.

Example 112: Flesch ease of reading scores for 12 different pages randomly selected from J.K. Rowling's Harry Potter and the Sorcerer's Stone were calculated. Find the 95% interval estimate of $\mu$, the true mean Flesch ease of reading score for Harry Potter and the Sorcerer's Stone. (The distribution of the 12 pages' scores appears to be bell-shaped with $\bar{x}$ = 80.75 and s = 4.68.)

Formal Method for Choosing Between z and t:

Method            Conditions
Z-distribution    $\sigma$ known & normally distributed, or $\sigma$ known & $n > 30$
t-distribution    $\sigma$ not known & normally distributed, or $\sigma$ not known & $n > 30$
Nonparametric     Population is not normally distributed and $n \le 30$
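The decision table can also be written as a small function, which makes the logic easy to test against the examples. This is only a sketch of the table as reconstructed here; the function names choose_method and classroom_rule and their parameter names are made up for this illustration. The second function encodes the simplified in-class rule (Z when n is at least 30, t otherwise).

```python
def choose_method(sigma_known, population_normal, n):
    """Formal rule: return 'Z', 't', or 'nonparametric'."""
    if not population_normal and n <= 30:
        # Neither normality nor a large sample: the table points to
        # nonparametric methods.
        return "nonparametric"
    # From here on, the population is normal or n > 30.
    return "Z" if sigma_known else "t"


def classroom_rule(n):
    """Simplified in-class rule: Z when n >= 30, t otherwise."""
    return "Z" if n >= 30 else "t"


# Example 111: ten subjects, normal population assumed, sigma not known.
print(choose_method(sigma_known=False, population_normal=True, n=10))
print(classroom_rule(10))
```

Note that the two rules can disagree for large samples with an unknown standard deviation: the formal table gives t, while the classroom rule gives Z.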

*Note: for classroom purposes, we will use Z when $n \ge 30$ and t otherwise.

7.6 Confidence Intervals for a Population Proportion

Large-Sample Confidence Interval for a Population Proportion

In many real world scenarios, we would like to estimate a population proportion. If we look at $n$ randomly selected subjects and $x$ of them have some trait we are interested in, we can form a sample proportion from the data:

$\hat{p} = \dfrac{x}{n}$, where $x$ = the number of subjects having the trait we are interested in.

This proportion is a sample proportion since it is only based on $n$ subjects from some larger population. We can use this $\hat{p} = \dfrac{x}{n}$ to estimate the population proportion.

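Since for each sample of size $n$ drawn, a different number ($x$) of subjects will have the desired trait, the probabilities associated with each possible value of $\hat{p} = \dfrac{x}{n}$ will be equal to the probabilities associated with each possible value of $x$. X has a binomial distribution; we can approximate this distribution using the standard normal (Z) distribution when $n$ is large (as long as $n$ is large enough that $\hat{p} \pm 3\sigma_{\hat{p}}$ will fit inside of $(0, 1)$).

Remember that X is binomial with mean $\mu = np$ and variance $\sigma^2 = npq$, where $q = 1 - p$. Then $\hat{p} = \dfrac{x}{n}$ will have the following mean and standard deviation:

$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$

To understand why these values are as they are, look at the following properties of expectation:

$E(aX) = aE(X)$ and $Var(aX) = a^2\,Var(X)$

Since $E(X) = np$, $E\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{np}{n} = p$.

Since $Var(X) = npq$, $Var\left(\dfrac{X}{n}\right) = \dfrac{1}{n^2}Var(X) = \dfrac{npq}{n^2} = \dfrac{pq}{n}$.

As before, we would like to have more than just a good point estimator of the population proportion, so in this section, we will learn how to form an interval estimate of the true population proportion.

From above, we can recall that the mean and standard deviation of $\hat{p} = \dfrac{x}{n}$ are:

$E(\hat{p}) = E\left(\dfrac{x}{n}\right) = p$, and $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$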

Now using the assumption that $n$ is large, we can approximate the sampling distribution of $\hat{p} = \dfrac{x}{n}$ by the normal distribution. Our interval to estimate the true population proportion will have a similar structure to the interval used to estimate the mean:

(Point Estimate) $\pm$ (Number of Standard Deviations)(Standard Error)

$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

*Note: we are approximating the standard error of $\hat{p}$ as $\sigma_{\hat{p}} \approx \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.

**Also, we should check that $n$ is large enough that $\hat{p} \pm 3\sigma_{\hat{p}}$ will fit inside of $(0, 1)$ before using the above method, or alternatively we can check to see if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

Steps to Creating a Confidence Interval for a Population Proportion:
1. Gather sample data: $x$ (or $\hat{p}$), $n$, and C-level [Calculate $\hat{p} = \dfrac{x}{n}$ and $(1 - \hat{p}) = \hat{q}$]
2. Find $z_{\alpha/2}$
3. Calculate the margin of error, $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
4. Finally, form $(\hat{p} - E,\ \hat{p} + E)$

Example 113: A nationwide poll of nearly 1,500 people conducted by the syndicated cable television show Dateline: USA found that 70 percent of those surveyed believe there is intelligent life outside of Earth in the universe, perhaps even in our own Milky Way Galaxy. What proportion of the entire population agrees, at the 95% confidence level?
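The proportion steps can be sketched in code as well. The function name proportion_interval is made up for this illustration. The inputs are those of Example 113, with one labeled assumption: the poll says "nearly 1,500" people, and the sketch takes n = 1500 exactly.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, conf_level):
    """Return (E, lower, upper, large_enough) for a large-sample proportion CI."""
    q_hat = 1 - p_hat
    # Large-sample check: n * p_hat >= 15 and n * q_hat >= 15.
    large_enough = n * p_hat >= 15 and n * q_hat >= 15
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * q_hat / n)                 # E = z * sqrt(p_hat*q_hat/n)
    return margin, p_hat - margin, p_hat + margin, large_enough


# Example 113 inputs, taking "nearly 1,500" as n = 1500 (an assumption).
E, lower, upper, ok = proportion_interval(p_hat=0.70, n=1500, conf_level=0.95)
print(f"E = {E:.3f}, interval = ({lower:.3f}, {upper:.3f}), large sample: {ok}")
```

If the true sample size were slightly below 1,500, the margin of error would be slightly larger, but the difference would be small at this sample size.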

Example 113.5: In many sports, eligibility for minor leagues is determined by age at the start of the calendar year (Jan 1). Journalist Malcolm Gladwell wrote about the effects of this eligibility rule on Canadian hockey in his 2008 book Outliers. The issue is that people born in the early months of the year end up being older when they are finally able to participate in minor leagues than those born in later months. For example, a child born on January 2nd who is eligible to play when he/she turns ten will be 10 years and 364 days old when he/she starts playing, compared to a child born December 31st who will only be 10 years and 1 day old when he/she is eligible to play. Being older is an advantage in sports, so these older kids naturally perform better and stand out more to coaches and scouts.

Steven Levitt and Stephen J. Dubner, authors of the bestselling Freakonomics series of books, noted this trend in international soccer in a 2006 New York Times column. FIFA introduced a Jan. 1 cutoff date in 1997. Of the 410 players in the 2006 World Cup born after 1979 (thus affected by the Jan. 1 cutoff date), the percentage who were born in January, February and March was 32.4%. Use the 2006 World Cup data and a 99% confidence level to form an interval estimate for the true proportion of FIFA players born after 1979 that have a birthday in the first three months of the year. Assuming that for the general population the birth rate for the months January, February, and March is approximately 25%, does it seem the proportion of FIFA stars born in these three months is significantly higher than the expected 25% rate?

[Photo caption: Messi, in his Barca uniform above, born in late June, probably did not benefit from his date of birth, but the cry baby in the white Real Madrid uniform was born in early Feb and probably did benefit from his lucky birth month; just another reason why Messi is the better player.]

Example 114: Butt-dialing 911 is a growing problem. In New York City, a 2012 report stated that 40% of calls made to 911 were dialed in error from cell phones. The report looked at a sample of 743,000 calls handled by NYC's 911 operators. Construct a 98% confidence interval estimate of the true proportion of NYC 911 calls that are made in error. Before this report the mayor of NYC claimed that more than 45% of the calls made to 911 were due to butt-dialing. Did this report contradict the mayor's claim?

Example 115: When Mendel conducted his famous genetics experiments with peas, one sample of offspring consisted of 428 green peas and 152 yellow peas. Mendel expected that 25% of the offspring peas would be yellow. Find a 95% confidence interval estimate for the true proportion of yellow peas. Do the results contradict Mendel's theory?

Note: Watch out for problems where $p$ is very close to 0 or 1; in those cases, $n$ would have to be very large for the sampling distribution of $\hat{p}$ to be well approximated by the normal curve. If $p$ is close to 0 or 1, Wilson's adjustment for estimating $p$ yields better results:

$\tilde{p} \pm z_{\alpha/2}\sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 4}}$, where $\tilde{p} = \dfrac{x + 2}{n + 4}$

Example 116: Suppose in a particular year the percentage of firms declaring bankruptcy that had shown profits the previous year is .002. If 100 firms are sampled and one had declared bankruptcy, what is the 95% CI on the proportion of profitable firms that will tank the next year?

Solution:

$\tilde{p} = \dfrac{x + 2}{n + 4} = \dfrac{1 + 2}{100 + 4} \approx .0288$

$\tilde{p} \pm z_{\alpha/2}\sqrt{\dfrac{\tilde{p}(1 - \tilde{p})}{n + 4}} = .0288 \pm 1.96\sqrt{\dfrac{.0288(1 - .0288)}{100 + 4}} = .0288 \pm .032$

Determining Sample Size for the Estimation of Proportion

If we want to estimate the population proportion with a certain margin of error and a specified confidence level, we will use the following formula to determine the needed sample size:

$n = \dfrac{(z_{\alpha/2})^2\,pq}{E^2}$

where $p$ and $q$ can be estimated from previously known sample data or can be conservatively estimated to be 0.50 each. The formula above was derived by solving the margin of error formula for $n$.
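Both formulas on this page can be sketched in code. The function names below are made up for this illustration. The first call repeats the arithmetic of Example 116 (x = 1, n = 100, 95% confidence); the second uses a hypothetical planning question (margin of error 0.03 at 95% confidence with the conservative p = q = 0.5). Rounding the sample size up to the next whole number is an assumption carried over from the rule given earlier for the mean.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(conf_level):
    """Return z_{alpha/2} for a confidence level given as a decimal."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)


def wilson_adjusted_interval(x, n, conf_level):
    """Adjusted estimate p_tilde = (x + 2)/(n + 4) and its margin of error."""
    p_tilde = (x + 2) / (n + 4)
    margin = z_critical(conf_level) * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, margin


def sample_size_for_proportion(margin, conf_level, p=0.5):
    """n = z^2 * p * q / E^2, rounded up; p = 0.5 is the conservative default."""
    z = z_critical(conf_level)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Example 116 inputs: one bankruptcy among 100 sampled firms, 95% confidence.
p_tilde, E = wilson_adjusted_interval(x=1, n=100, conf_level=0.95)
print(f"p_tilde = {p_tilde:.4f}, E = {E:.3f}")

# Hypothetical planning question: margin of error 0.03 at 95% confidence.
print(sample_size_for_proportion(margin=0.03, conf_level=0.95))
```

Passing a prior estimate for p (for example, p=0.3) instead of the default gives a smaller required sample size, since pq is largest when p = q = 0.5.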