+ Statistical Methods in Practice
STA/MTH 3379

+ Dr. A. B. W. Manage
Associate Professor of Statistics
Department of Mathematics & Statistics
Sam Houston State University

Discovering Statistics, 2nd Edition
Daniel T. Larose
Chapter 7: Sampling Distributions
Lecture PowerPoint Slides

+ Chapter 7 Overview

7.1 Introduction to Sampling Distributions
7.2 Central Limit Theorem for Means
7.3 Central Limit Theorem for Proportions

+ The Big Picture

Where we are coming from and where we are headed

In Chapters 1-4, we learned ways to describe data sets using numbers, tables, and graphs.
In Chapters 5-6, we learned the tools of probability and probability distributions that allow us to quantify uncertainty.
In Chapter 7, we will discover that seemingly random statistics have predictable behaviors. The special type of distribution we use to describe these behaviors is called the sampling distribution. We will also learn about the most important result in statistical inference, the Central Limit Theorem.
The sampling distributions we learn in this chapter form the basis for the statistical inference we will perform in the rest of the book.
+ 7.1: Introduction to Sampling Distributions

Objectives:
Explain the sampling distribution of the sample mean.
Describe the sampling distribution of the sample mean when the population is normal.
Find probabilities and percentiles for the sample mean when the population is normal.

Sample Mean

In this chapter, we will develop methods that will allow us to quantify the behavior of statistics like the sample mean.

The sampling distribution of the sample mean $\bar{x}$ for a given sample size $n$ consists of the collection of the means of all possible samples of size $n$ from the population.

Consider a population of $N = 5$ individuals with times of 10, 20, 5, 30, and 15 minutes. The population mean is

$\mu = \frac{\sum x}{N} = \frac{10 + 20 + 5 + 30 + 15}{5} = 16$ minutes

If we calculate the mean time for every possible sample of three individuals, we get the sampling distribution below. For example, the first sample gives

$\bar{x}_1 = \frac{\sum x}{n} = \frac{10 + 20 + 5}{3} \approx 11.67$ minutes

[Figure: sampling distribution of the sample mean for samples of size 3]

Sample Mean

When working with sampling distributions, it is important to know the mean and standard deviation.

The mean of the sampling distribution of the sample mean is the value of the population mean $\mu$. That is, $\mu_{\bar{x}} = \mu$.

The standard deviation of the sampling distribution of the sample mean is called the standard error of the mean. It is equal to $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation.

Note: because the denominator of the standard error formula is $\sqrt{n}$, the larger the sample size, the tighter the resulting sampling distribution. Larger sample sizes lead to smaller variability, which results in more precise estimation.

According to CanEquity Mortgage company, the mean age of mortgage applicants in the City of Toronto is 37 years old. Assume that the standard deviation is 6 years. Find the mean and standard deviation for the sampling distribution of the sample mean for the following sample sizes: (a) 4, (b) 100, (c) 225.

For every sample size, $\mu_{\bar{x}} = \mu = 37$.
(a) $n = 4$. Then $\sigma_{\bar{x}} = 6 / \sqrt{4} = 3$.
(b) $n = 100$. Then $\sigma_{\bar{x}} = 6 / \sqrt{100} = 0.6$.
(c) $n = 225$. Then $\sigma_{\bar{x}} = 6 / \sqrt{225} = 0.4$.
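The two 7.1 examples can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes samples are drawn without replacement and that order does not matter, and the names `population`, `sample_means`, and `standard_error` are illustrative only.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean

# Population from the five-individual example (times in minutes).
population = [10, 20, 5, 30, 15]
population_mean = fmean(population)

# Assumption: every possible sample of size 3, order ignored, no repeats.
sample_means = [fmean(sample) for sample in combinations(population, 3)]
mean_of_sample_means = fmean(sample_means)

print(f"population mean      = {population_mean}")
print(f"number of samples    = {len(sample_means)}")
print(f"mean of sample means = {mean_of_sample_means:.2f}")


def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


# Mortgage example: sigma = 6 years, sample sizes 4, 100, and 225.
standard_errors = {n: standard_error(6, n) for n in (4, 100, 225)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.1f}")
```

The mean of all the sample means matches the population mean, which illustrates $\mu_{\bar{x}} = \mu$, and the standard errors shrink as $n$ grows.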
Sample Mean for a Normal Population

Two important facts should be noted about sample means that are collected from a normal population.

For a normal population, the sampling distribution of the sample mean $\bar{x}$ is distributed as normal ($\mu$, $\sigma / \sqrt{n}$), where $\mu$ is the population mean and $\sigma$ is the population standard deviation.

When the sampling distribution of the sample mean is normal, we may standardize to produce the standard normal random variable:

$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Probabilities and Percentiles Using a Sampling Distribution

Since we know the sampling distribution of the sample mean is normal when the population is normally distributed, we can use the techniques of Section 6.5 to answer questions about the means of samples taken from normal populations.

Suppose the quiz scores for a certain instructor are normal ($\mu = 70$, $\sigma = 10$).
Find the probability that a randomly chosen student's score will be above 80.
Find the probability that a sample of 25 quiz scores will have a mean score greater than 80.

Probabilities and Percentiles Using a Sampling Distribution

Suppose the quiz scores for a certain instructor are normal ($\mu = 70$, $\sigma = 10$).
What two symmetric values contain the middle 90% of all sample means between them? Assume a class size of 25.

The middle 90% will fall between the 5th percentile and the 95th percentile. These percentiles correspond to $Z = -1.645$ and $Z = 1.645$. With $n = 25$, the standard error is $\sigma_{\bar{x}} = 10 / \sqrt{25} = 2$, so:

$70 - 1.645(2) = 66.71$
$70 + 1.645(2) = 73.29$

+ 7.2: Central Limit Theorem for Means

Objectives:
Use normal probability plots to assess normality.
Describe the sampling distribution of sample means for skewed and symmetric populations as the sample size increases.
Apply the Central Limit Theorem for Means to solve probability questions about the sample mean.
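Returning to the quiz-score examples from Section 7.1, the probabilities and percentiles can be computed without a Z table. This is a minimal sketch that assumes Python 3.8 or later, using `NormalDist` from the standard library; the variable names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 10, 25

# Distribution of one student's score and of the mean of n scores.
single_score = NormalDist(mu, sigma)
sample_mean = NormalDist(mu, sigma / sqrt(n))  # standard error = 10 / 5 = 2

# P(X > 80) for one randomly chosen student.
p_single = 1 - single_score.cdf(80)

# P(x-bar > 80) for a sample of 25 scores: 80 is five standard errors above 70.
p_mean = 1 - sample_mean.cdf(80)

# Middle 90% of sample means: 5th and 95th percentiles.
lower = sample_mean.inv_cdf(0.05)
upper = sample_mean.inv_cdf(0.95)

print(f"P(X > 80)     = {p_single:.4f}")
print(f"P(x-bar > 80) = {p_mean:.2e}")
print(f"middle 90% of sample means: {lower:.2f} to {upper:.2f}")
```

A single score above 80 is fairly common, while a class mean above 80 is extremely unlikely, because the sampling distribution of the mean is much tighter than the population.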
Normal Probability Plots

Much of our analysis requires that the sample data come from a population that is normally distributed. We can use histograms, dotplots, and stem-and-leaf displays to assess normality. But a more precise tool is the normal probability plot of the estimated cumulative normal probabilities against the corresponding data values.

If the points in the normal probability plot either cluster around a straight line or nearly all fall within the curved bounds, then it is likely that the data set is normal. Systematic deviations off the straight line are evidence against the claim that the data set is normal.

Sampling Distribution of x-bar for Skewed Populations

The sampling distribution of sample means for a normal population is also normal. What if the population is not normal?

Regardless of the population, the sampling distribution of the sample mean becomes approximately normal as the sample size gets larger.

Central Limit Theorem for Means

Given a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ becomes approximately normal ($\mu$, $\sigma / \sqrt{n}$) as the sample size gets larger, regardless of the shape of the population.

Rule of Thumb: We consider $n \geq 30$ as large enough to apply the Central Limit Theorem for Means for any population.

Central Limit Theorem for Means

If the Population is Normal
The sampling distribution of sample means is normal.

If the Population is Non-Normal or Unknown and the Sample Size is At Least 30
The sampling distribution of the sample mean is approximately normal.

If the Population is Non-Normal or Unknown and the Sample Size is Less Than 30
We have insufficient information to conclude that the sampling distribution of the sample mean is either normal or approximately normal.
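The Central Limit Theorem for Means can be illustrated with a small simulation. The sketch below is an illustration rather than anything from the slides: the exponential population, the seed, and the number of replications are all assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size from the rule of thumb
replications = 5000      # number of simulated samples (arbitrary choice)

# Assumed population: exponential with rate 1, which is strongly right-skewed
# and has mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

simulated_mean = fmean(sample_means)
simulated_se = stdev(sample_means)
theoretical_se = sigma / sqrt(n)

# Share of sample means within 1.645 standard errors of mu.
# For an exactly normal sampling distribution this would be 0.90.
within = sum(
    abs(m - mu) <= 1.645 * theoretical_se for m in sample_means
) / replications

print(f"mean of sample means  = {simulated_mean:.3f} (theory: {mu})")
print(f"sd of sample means    = {simulated_se:.3f} (theory: {theoretical_se:.3f})")
print(f"share within 1.645 SE = {within:.3f}")
```

Even though the population is far from normal, the simulated sample means should center near $\mu$, spread out by roughly $\sigma / \sqrt{n}$, and place close to 90% of their values within 1.645 standard errors of $\mu$.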
+ 7.3: Central Limit Theorem for Proportions

Objectives:
Explain the sampling distribution of the sample proportion.
Apply the Central Limit Theorem for Proportions to solve probability questions about the sample proportion.

Sample Proportion

The sample mean is not the only statistic that can have a sampling distribution. Every statistic has a sampling distribution. One of the most important is the sampling distribution of the sample proportion.

Suppose each individual in a population either has or does not have a particular characteristic. If we take a sample of size $n$ from the population, the sample proportion $\hat{p}$ (read "p-hat") is:

$\hat{p} = \frac{X}{n}$

where $X$ represents the number of individuals in the sample that have the particular characteristic.

The sampling distribution of the sample proportion $\hat{p}$ for a given sample size $n$ consists of the collection of the sample proportions of all possible samples of size $n$ from the population.

Sample Proportion

The mean of the sampling distribution of the sample proportion is the value of the population proportion $p$. This may be denoted as

$\mu_{\hat{p}} = p$

The standard deviation of the sampling distribution of the sample proportion is called the standard error of the proportion and is found by

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

where $p$ is the population proportion and $n$ is the sample size.

The sampling distribution of the sample proportion may be considered approximately normal only if both $np \geq 5$ and $n(1 - p) \geq 5$.

The minimum sample size required to produce approximate normality is the larger of either $n_1 = 5/p$ or $n_2 = 5/(1 - p)$.

Sample Proportion

The National Institutes of Health reported that color blindness linked to the X chromosome afflicts 8% of men. Suppose we take a random sample of 100 men and let $p$ denote the proportion of men in the population who have color blindness linked to the X chromosome. Find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$.

$\mu_{\hat{p}} = p = 0.08$

$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{100}} = \sqrt{0.000736} \approx 0.02713$
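The color-blindness calculation, together with the normality conditions, can be sketched in a few lines. The function names below are hypothetical, and rounding the minimum sample size up to a whole number is an assumption added here; the slides state only the ratios $5/p$ and $5/(1 - p)$.

```python
from math import ceil, sqrt


def proportion_standard_error(p, n):
    """Standard error of the sample proportion, sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


def is_approximately_normal(p, n):
    """Check the conditions np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


def minimum_sample_size(p):
    """Larger of 5/p and 5/(1 - p), rounded up (the rounding is an assumption)."""
    return ceil(max(5 / p, 5 / (1 - p)))


# Color-blindness example: p = 0.08, n = 100.
p, n = 0.08, 100
mean_p_hat = p
se_p_hat = proportion_standard_error(p, n)
normal_ok = is_approximately_normal(p, n)
n_min = minimum_sample_size(p)

print(f"mean of p-hat           = {mean_p_hat}")
print(f"standard error of p-hat = {se_p_hat:.5f}")
print(f"conditions met          = {normal_ok}")
print(f"minimum sample size     = {n_min}")
```

With $p = 0.08$, the condition $np \geq 5$ is the binding one, so a sample of 100 men comfortably satisfies both requirements.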
Applying the Central Limit Theorem for Proportions

Central Limit Theorem for Proportions
The sampling distribution of the sample proportion $\hat{p}$ follows an approximately normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$ when both $np \geq 5$ and $n(1 - p) \geq 5$.

When the sampling distribution of the sample proportion is approximately normal, we can standardize to produce the standard normal Z:

$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$

The Texas Workforce Commission reported that the state unemployment rate in March 2007 was 4.3%. Let $p = 0.043$ represent the population proportion of unemployed workers in Texas.
Find the probability that a sample of 117 Texas workers will have a proportion unemployed greater than 9%.

Since $117(0.043) > 5$ and $117(0.957) > 5$, we can apply the Central Limit Theorem for Proportions.

$Z = \frac{0.09 - 0.043}{\sqrt{\frac{0.043(1 - 0.043)}{117}}} \approx 2.51$

$P(Z > 2.51) = 1 - 0.9940 = 0.0060$

The Texas Workforce Commission reported that the state unemployment rate in March 2007 was 4.3%. Let $p = 0.043$ represent the population proportion of unemployed workers in Texas.
Find the 99th percentile of sample proportions for $n = 117$.

The Z-value associated with 0.9901 is 2.33.

$\hat{p} = 2.33(0.01875) + 0.043 \approx 0.0867$
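Both Texas unemployment answers can be reproduced in software. This is a minimal sketch assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative. Because it uses unrounded Z-values rather than the table values 2.51 and 2.33, its results may differ slightly from the slide answers in the last digit or two.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.043, 117

# The Central Limit Theorem for Proportions needs np >= 5 and n(1 - p) >= 5.
if not (n * p >= 5 and n * (1 - p) >= 5):
    raise ValueError("normal approximation conditions are not met")

se = sqrt(p * (1 - p) / n)    # standard error of p-hat, about 0.01875
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probability that the sample proportion exceeds 9%.
z = (0.09 - p) / se
p_greater = 1 - standard_normal.cdf(z)

# 99th percentile of the sample proportions.
z_99 = standard_normal.inv_cdf(0.99)
percentile_99 = p + z_99 * se

print(f"standard error  = {se:.5f}")
print(f"Z               = {z:.2f}")
print(f"P(p-hat > 0.09) = {p_greater:.4f}")
print(f"99th percentile = {percentile_99:.4f}")
```

The same approach works for any percentile: convert the percentile to a Z-value with `inv_cdf`, then compute $p + Z \sigma_{\hat{p}}$.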