Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Dr. Nancy Pfenning (University of Pittsburgh)
AMATYC, November 2008

Preliminary Activities

1. I would like to produce an interval estimate for the mean years of education (starting with first grade) of Math/Stats teachers at two-year colleges. I record years of education for each teacher, calculate the sample mean and standard deviation, and use those to set up a 95% confidence interval for the population mean: $\bar{x} \pm 2\,s/\sqrt{n}$. (The sample is large enough that the t multiplier for 95% confidence is approximately 2.) What do I claim about this interval? I claim to be 95% confident that the mean years of education of all Math/Stats teachers at two-year colleges is in the interval.

2. A set of 6 numbered cards represents how many phone numbers each person in a family of 6 knows by heart. I would like to produce an interval estimate for the mean number known by all 6, based on information from 4 of the 6 family members who happen to be available. Assume the population standard deviation is 3. We pick 4 cards at random and set up the interval $\bar{x} \pm 2(3/\sqrt{4}) = \bar{x} \pm 3$. What do I claim about this interval? I claim to be 95% confident that the unknown population mean is in that interval.

3. A set of 16 numbered cards represents the number of fillings for a group of 16 students. Suppose a dentist visiting the school has time to examine 2 students. Based on their mean number of fillings, he is to test the hypothesis that the mean for the whole group is 4 against the alternative that it is less than 4. The test is to be carried out at the 0.05 level. We'll assume the standard deviation for all 16 students is 3. Two cards are picked at random from those 16, and we calculate their mean $\bar{x}$. The test statistic is $\frac{\bar{x} - 4}{3/\sqrt{2}}$. Once we find the test statistic is not less than $-1.645$, we do not reject the null hypothesis at the 0.05 level. In fact, the mean of all 16 cards is 4, so the null hypothesis is true. In the long run, how often will it be rejected? I claim it will be rejected 5% of the time.
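The arithmetic behind the interval in activity 1 can be sketched in a few lines of Python. This is only an illustration: the function name `approx_95_ci` and the nine sample values are hypothetical and are not data from the talk.

```python
import math
import statistics

def approx_95_ci(values, multiplier=2.0):
    """Return (low, high) for the interval  mean ± multiplier * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)            # sample SD (n - 1 denominator)
    margin = multiplier * s / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical years of education for 9 teachers (illustrative values only).
years = [18, 19, 20, 20, 21, 22, 22, 23, 24]
low, high = approx_95_ci(years)
print(f"approximate 95% CI: ({low:.2f}, {high:.2f})")
```

With a sample this small the t multiplier would be somewhat larger than 2; the default of 2 is kept only to mirror the formula in the text, which assumes a large sample.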
Introduction

Our key inference results in introductory statistics (confidence intervals and hypothesis tests about means or proportions) are constructed from what we claim to know about the center, spread, and shape of the sampling distributions. In fact, such claims must be made cautiously, because if certain criteria concerning the background of our data are not met, the center, spread, and shape of the sampling distributions are affected. We can be specific about what aspects of our interval or test are undermined, depending on how the three key conditions are violated. Furthermore, we can carry out activities with students to give them a concrete feel for how improper sampling design can affect three specific aspects of our inference claims. Because of limited time, we'll focus only on quantitative variables.

Claims

The key results obtained are summarized in the following claims about the sampling distribution of sample mean: If the population for a quantitative variable has mean $\mu$ and standard deviation $\sigma$, then (under the right circumstances) the distribution of sample mean for a sample of size $n$ has

- center: mean $\mu$
- spread: standard deviation $\sigma/\sqrt{n}$
- shape: approximately normal

These pave the way for us to make the following inference claims: If sampled values of a quantitative variable for a sample of size $n$ have mean $\bar{x}$, and the variable's standard deviation is known to be $\sigma$, then

- a 95% confidence interval for $\mu$ is $\bar{x} \pm 2\,\sigma/\sqrt{n}$.
- To test the hypothesis $H_0: \mu = \mu_0$, we standardize $\bar{x}$ to $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ and decide whether to reject $H_0$ based on the probability of $z$ this low, high, or different from 0, depending on the form of the alternative hypothesis.

Caveats

Students and, admittedly, we teachers are inclined to take this information and run with it, although we do occasionally mention the following caveats:

- The sample must be unbiased.
- Selections must be independent; the population should be at least 10 times the sample size.
- The sample size must be large enough.

However, to avoid the discomfort of assigning students problems that reach a dead end, we tend to give them carte blanche: assume the sample is random, and large enough to guarantee normality through the Central Limit Theorem, and that our selections are independent. Often, because these assurances start to become repetitive, we let students assume that, by default, all the necessary conditions are in place. Thus, in the interest of guaranteeing them situations where they can plug into a handy formula with impunity, we deprive them of two valuable lessons:

1. There are direct consequences that link each caveat with one of the claims about center, spread, and shape, respectively. Thus, there are meaningful connections between what is learned about producing and summarizing data, and about sampling distributions, and what is applied to perform inference. These connections are threads that tie together all the most important material in an introductory statistics course.

2. In many real-life situations, the conditions needed to justify the claims are not met.

Unjustified claims about center

In general, bias can arise in two ways, each of which can undermine our claim that the mean of the distribution of sample means equals the population mean.

1. Bias arising from the sampling design: the sample does not truly represent the larger population of interest. Do our teachers truly represent the entire population of math/stat teachers at two-year colleges?

2. Bias arising from the study design: the sampled values are assessed improperly, resulting in biased summaries. Note that this type of bias can arise even if the sampled individuals are truly representative of the larger population. Is asking teachers to report their years of education out loud a good way to obtain completely accurate data values?

Are we really 95% confident that the mean years of education for the entire population of math/stat teachers at two-year colleges falls in the interval we produced earlier?

Unjustified claims about spread

When we present students with the concept of random variables, we often mention rules for means and variances: the mean of the sum of two random variables equals the sum of the means, and the variance of the sum equals the sum of the variances. But the latter only holds if the two variables are independent. If there is dependence, then $\sigma^2_{X_1+X_2}$ can be substantially greater or less than $\sigma^2_{X_1} + \sigma^2_{X_2}$. The additivity of variances lets us assert that the variance of $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ is

$$\sigma^2_{\bar{X}} = \frac{1}{n^2}(\sigma^2 + \cdots + \sigma^2) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$

and that the standard deviation of $\bar{X}$ is the square root of this, $\sigma/\sqrt{n}$.
However, these claims break down if the $X_i$ are dependent, a circumstance that can arise in real-life sampling situations, including when we sample without replacement from a population that is not sufficiently large relative to the sample size. Thus, if we fail to satisfy the requirement that the population be at least 10 times the sample size, the claimed standard deviation $\sigma/\sqrt{n}$ in our confidence interval formula is incorrect. Are we really 95% confident that the interval $\bar{x} \pm 2(3/\sqrt{4}) = \bar{x} \pm 3$ based on a sample of 4 cards contains the mean for all 6 cards?
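One way to see how far the claimed $\sigma/\sqrt{n}$ can be from the truth is to enumerate every possible sample. The sketch below is an illustration, not part of the original handout. It assumes a six-card population 1, 3, 5, 6, 8, 10 (mean 5.5, standard deviation about 3, the same cards used in the card activity in this document), and the helper name `sd_of_sample_mean` is hypothetical. It compares the exact standard deviation of the sample mean, for samples of 4 drawn without replacement, with the claimed value and with the standard finite population correction $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def sd_of_sample_mean(population, n):
    """Exact SD of the sample mean over all equally likely samples of size n
    drawn WITHOUT replacement (every combination is enumerated)."""
    sample_means = [mean(c) for c in combinations(population, n)]
    return pstdev(sample_means)

cards = [1, 3, 5, 6, 8, 10]      # assumed six-card population
n = 4
N = len(cards)
sigma = pstdev(cards)            # population SD (N denominator)

claimed = sigma / math.sqrt(n)                        # assumes independent draws
corrected = claimed * math.sqrt((N - n) / (N - 1))    # finite population correction
exact = sd_of_sample_mean(cards, n)

print(f"claimed   sigma/sqrt(n): {claimed:.3f}")
print(f"corrected (FPC):         {corrected:.3f}")
print(f"exact by enumeration:    {exact:.3f}")
```

Under these assumptions the enumerated value agrees with the corrected formula and falls well below the claimed one, which is the sense in which the interval $\bar{x} \pm 3$ is wider than it needs to be.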

Unjustified claims about shape

Thanks to the Central Limit Theorem, we can say that the distribution of sample mean for random samples of size $n$ is approximately normal if the underlying distribution itself is approximately normal, or if the sample size is large enough to offset non-normality in the underlying distribution. If the sample is too small and the shape of the sampling distribution is not normal, then the multiplier 2 in our confidence interval is incorrect, because it is based on a normal probability distribution. Likewise, our P-value in a hypothesis test would be incorrect. Earlier, we picked 2 cards at random from 16 cards whose mean was 4 and standard deviation was 3, and tested the true null hypothesis $H_0: \mu = 4$ against the alternative $H_a: \mu < 4$ at the 0.05 level. Can we really claim that $H_0$ will be rejected 5% of the time in the long run?

Insights through activities

1. A biased sample mean: Our activity today had to do with teachers' years of education. You can survey students about another variable that may produce biased responses: ask them to report out loud how many classes they missed the week before. Note that bias can occur here on two fronts. First, the students who are missing class that day are not represented in the survey! Second, students may not feel comfortable telling their instructor how many classes they missed. In both cases, the bias leads to our sample mean providing an underestimate of the true population mean number of classes missed.

Alternatively, you can ask students to pick a number at random (off the top of their heads) from 1 to 20. We use the sample mean number picked, $\bar{x}$, and the standard deviation of the numbers 1 to 20 (5.77) to construct a 95% confidence interval for the unknown population mean, $\bar{x} \pm 2(5.77/\sqrt{n})$. Presumably we are 95% confident that the population mean falls in this interval. However, such selections (which are actually haphazard, not random) tend to be biased towards higher numbers. Our sample mean is a biased estimator for $\mu$, and our confidence interval is wrong.

2. An incorrect standard deviation: We sample 4 cards without replacement from a population with mean 5.5, standard deviation approximately 3. In the first case, we will sample from a population of 40 cards (10 times the sample size). In the second case, we sample from 6 (the population is only 50% larger than the sample). In the third case, the population is the same size as the sample. To give our activity context, we can say the values represent how many phone numbers people have memorized.

(a) Sample 4 cards from 40: an ordinary deck minus all the face cards, so it consists of 4 each of the numbers 1 through 10.
(b) Sample 4 cards from 6: the numbers 1, 3, 5, 6, 8, 10.
(c) Sample 4 cards from 4: the numbers 2, 3, 8, 9.

In each case, we sample 4 cards without replacement at least 20 times from the given population, recording the sample mean value each time. Assuming selections are independent, the distribution of the sample mean has a mean of 5.5 and a standard deviation of $3/\sqrt{4} = 1.5$. We calculate the mean and standard deviation of each group of sample means. In all three cases, the mean of sample means will in fact be approximately 5.5. In the first case, the standard deviation should conform well to the claimed value, 1.5. In the second case, the standard deviation will be less. In the third case, it is of course zero!

If we sample 4 individuals without replacement from a population of 6 individuals whose standard deviation is 3, and use their sample mean to set up a confidence interval for the population mean, the interval will be wider than it should be (margin of error $2(3/\sqrt{4}) = 3$ instead of about $2(0.8) = 1.6$). We would claim that we are only 95% confident that the population mean falls in that interval, but because these sample means don't stray as far from 5.5, our probability of capturing 5.5 in our confidence interval will be considerably higher than 95%. In the extreme, if we sample 4 from 4 cards, our 95% confidence interval actually has a 100% success rate, because our sample mean is always 5.5 and there is no margin of error at all.

As far as a hypothesis test is concerned, we would calculate $z$ to be $\frac{\bar{x} - 5.5}{3/\sqrt{4}}$, whereas the actual denominator for 4 cards sampled without replacement from 6 is less than $3/\sqrt{4}$. Our calculated $z$ is less extreme than it should be, making us vulnerable to Type II Errors, failing to reject a false null hypothesis.

3. A non-normal sampling distribution: A right-skewed distribution of numbers is created from a deck of cards: use all 4 of each of the numbers 1 and 2, and only 1 each of the numbers 3 through 10. (This distribution models the number of fillings in a group of people where most have just one or two, but a few have more.) The population mean is 4 and the standard deviation is 3.
Each student picks 2 cards at random from these 16 cards and reports his or her sample mean. Each uses the sample mean to test $H_0: \mu = 4$ vs. $H_a: \mu < 4$ at the 0.05 level. The long-run probability of rejecting, theoretically, is 0.05. However, the lowest possible sample mean is 1, which standardizes to $\frac{1 - 4}{3/\sqrt{2}} = -1.41$. A sample mean low enough to produce a P-value of 0.05 or less must fall below (to the left of) $-1.645$, but this is impossible. We can take hundreds of such samples, and never manage to reject $H_0$ at the 5% level. In general, our P-value based on $z$ probabilities is incorrect if we are standardizing a sample mean or sample proportion for a small sample taken from a skewed population.

Summarizing claims and caveats in confidence intervals

A 95% confidence interval for $\mu$ is constructed as $\bar{x} \pm 2\,\sigma/\sqrt{n}$.

- If there is bias arising from a poor sampling or study design, then the correct interval is not centered at $\bar{x}$.
- If there is dependence in our sampled individuals (such as when we sample without replacement from a relatively small population), then the standard deviation $\sigma/\sqrt{n}$ is incorrect.
- If the sample size is not large enough to offset non-normality in the shape of the population, then the multiplier 2 is incorrect.

[Note 1: The same problems arise if we standardize with $s$ and base our confidence interval on a t distribution. Note 2: Another source of sampling dependence would be if the individuals are aware of each other's responses, and tend to adjust their own responses accordingly. Yet another is the use of a two-sample procedure for paired data. In particular, the claimed formula for the standard deviation of the difference between sample means is undermined if the variables $X_1$ and $X_2$ are dependent via a paired design.]
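The shape caveat can also be checked by brute force. The following sketch is offered as an illustration under stated assumptions rather than as part of the original activity. It builds the 16-card deck described in activity 3 (four each of 1 and 2, one each of 3 through 10), enumerates all 120 equally likely two-card samples, and counts how many give a $z$ statistic below the one-sided 5% cutoff. The variable names and the constant `cutoff = -1.645` are choices made here.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Right-skewed deck from activity 3: four 1s, four 2s, one each of 3..10.
deck = [1] * 4 + [2] * 4 + list(range(3, 11))
mu0 = mean(deck)          # population mean
sigma = pstdev(deck)      # population SD (N denominator)
n = 2
cutoff = -1.645           # one-sided z cutoff for the 0.05 level

# z statistic for every possible pair of cards drawn without replacement,
# standardized with sigma / sqrt(n) exactly as in the text.
z_values = [(mean(pair) - mu0) / (sigma / math.sqrt(n))
            for pair in combinations(deck, n)]

rejections = sum(z < cutoff for z in z_values)
print(f"samples enumerated: {len(z_values)}")
print(f"smallest z:         {min(z_values):.2f}")
print(f"rejection rate:     {rejections / len(z_values):.3f}")
```

Because the smallest attainable $z$ stays above the cutoff, no sample can lead to rejection, so the actual rejection rate for this deck is zero rather than the nominal 5%.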