This week: Chapter 9 (will do 9.6 to 9.8 later, with Chap. 11) Understanding Sampling Distributions: Statistics as Random Variables

ANNOUNCEMENTS:
Shandong Min will give the lecture on Friday.
See website for different office hours Fri, Mon, Tues.
New use of clickers: to test for understanding. I will give many more clicker questions, and randomly choose five to count for credit each week.
Homework from today and Friday is due Monday, Nov 8. Homework to be assigned Monday is not due.
Midterm in one week. You are allowed two sheets of notes.

HOMEWORK: Due Mon 11/8, Chapter 9: #15, 25, 37, 44

Chapters 9 to 13: Statistical Inference
See picture drawn on board.
Five situations we will cover for the rest of this quarter. For each parameter we will:
Learn how to find a confidence interval for its true value
Test hypotheses about its true value

EXAMPLES OF EACH OF THE 5 SITUATIONS

One proportion: Binomial situation with n and p.
Question: What proportion of households watched Dancing with the Stars the week of Oct 18? Get a confidence interval.
Population parameter: p = proportion of the population of all US households that watched it.
Nielsen ratings measure n = 25,000 households. X = number in sample who watched the show = 3,075.
$\hat{p} = \frac{X}{n} = \frac{3{,}075}{25{,}000} = .123$ = proportion of sample who watched. This is called p-hat.

Difference in two proportions: Compare two population proportions using independent samples of size $n_1$ and $n_2$.
Question: What is the difference in the proportion of smokers who would quit if wearing a nicotine patch versus placebo? Get a confidence interval for the population difference. Test to see if it is statistically significantly different from 0.
Population parameter: $p_1 - p_2$ = population difference in proportions who would quit if everyone were to use each type of patch (nicotine minus placebo).
Difference in the proportions in the sample who did quit: $\hat{p}_1 - \hat{p}_2 = .46 - .20 = .26$. This is read as p-one-hat minus p-two-hat.
Note that the parameter and statistic can range from -1 to +1.

One mean: Population mean for a quantitative variable.
Question: An airline would like to know the average weight of checked luggage per passenger, for fuel calculations. Get a confidence interval for the population mean. There is no logical value to test, so we would not do a test.
Population parameter: $\mu$ = mean weight of the luggage for the population of all passengers who check luggage.
Collect a sample of n observations. For instance, suppose they sample 100 passengers and find the mean is 30 pounds.
$\bar{x} = 30$ = the mean for the sample of 100 passengers. Remember this is read as x-bar.

Mean for paired differences: Population mean for the difference in two quantitative measurements in a matched pairs situation.
Question: How much different on average would IQ be after listening to Mozart compared to after sitting in silence?
Population parameter: $\mu_d$ = population mean for the difference in IQ if everyone in the population were to listen to Mozart versus silence.
For the experiment done with n = 36 UCI students, the mean difference was 9 IQ points.
$\bar{d} = 9$ = the mean difference for the sample of 36 students. Read as d-bar.

Difference in two means: Comparing two population means when independent samples of size $n_1$ and $n_2$ are available.
Question: What is the difference in mean IQ of 4-year-old children for the population of mothers who smoked during pregnancy and the population who did not? Get a confidence interval for the difference. Test to see if the difference is statistically significantly different from 0.
Population parameter: $\mu_1 - \mu_2$ = difference in the means for the two populations.
Based on a study done at Cornell, the difference in means for two samples was 9 IQ points.
$\bar{x}_1 - \bar{x}_2 = 9$ = difference in the means for the two samples. Read as x-bar-one minus x-bar-two.

GOAL: Estimate and test parameters based on statistics. Get confidence intervals and do hypothesis tests.

SOME LOGICAL NOTES:
1. Assuming the sample is representative of the population, the sample statistic should represent the population parameter fairly well. (Better for larger samples.)
2. But the sample statistic will have some error associated with it, i.e. it won't equal the parameter exactly. Recall the margin of error from Chapter 3!
3. If repeated samples are taken from the same population and the sample statistic is computed each time, these sample statistics will vary but in a predictable way, i.e. they will have a distribution. It is a pdf for the statistic. It is called a sampling distribution for the statistic.
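Logical note 3 can be tried out on a computer. The following is only a rough sketch: it assumes a true proportion of 0.123 (in practice p is unknown) and uses samples of size 500 rather than 25,000 so that it runs quickly. The function name simulate_sample_proportions and all of its parameters are made up for this illustration.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Take num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each sampled unit has the trait with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Assumed values for illustration only; the true p is never known in practice.
p_hats = simulate_sample_proportions(p=0.123, n=500, num_samples=2000)

print("smallest sample proportion:", min(p_hats))
print("largest sample proportion: ", max(p_hats))
print("mean of sample proportions:", round(statistics.mean(p_hats), 4))
print("s.d. of sample proportions:", round(statistics.stdev(p_hats), 4))
```

The sample proportions differ from sample to sample, but they should cluster around the assumed p. That spread of values is what the sampling distribution describes.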

RATIONALE AND DEFINITION FOR SAMPLING DISTRIBUTIONS

Rationale:
Remember that a random variable is a number associated with the outcome of a random circumstance, which can change each time the random circumstance occurs.
When a sample is taken from a population the resulting numbers are the outcome of a random circumstance.

Dancing with the Stars example: A random circumstance is taking a random sample of 25,000 households with TVs. The resulting number is the proportion of those households that were watching Dancing with the Stars that week = .123 (or 12.3%).
For each different sample of 25,000 households that week, we would have had a different sample proportion (sample statistic) watching the show.
Therefore, a sample statistic is a random variable.
Therefore, a sample statistic has a pdf associated with it.
The pdf of a sample statistic can be used to find the probability that the sample statistic will fall into specified intervals when a new sample is taken.

Definition: The pdf of a sample statistic is called the sampling distribution for that statistic.

Example: The random variable is $\hat{p}$ = sample proportion = sample statistic. The pdf of $\hat{p}$ will be defined next. It is the distribution of possible sample proportions in this scenario.
We already know the pdf for X = number of households out of 25,000 that are watching the show. It is binomial with n = 25,000 and p = true proportion of households in US that watched.

Familiar example: Suppose 48% (p = 0.48) of a population supports a candidate. In a poll of 1000 randomly selected people, what do we expect to get for the sample proportion $\hat{p}$ who support the candidate in the poll?
In the last few lectures, we looked at the pdf for X = the number who support the candidate. X was binomial, and also X was approximately normal with mean = 480 and s.d. = 15.8.
Now let's look at the pdf for the proportion who do: $\hat{p} = X/n$, where X is a binomial random variable.
We have seen a picture of possible values of X. Divide all values by n to get the picture for possible $\hat{p}$.
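The step from X to the sample proportion can be checked with a few lines of arithmetic. This is a small sketch using the poll numbers from the notes (n = 1000, p = 0.48); the variable names are chosen here for illustration.

```python
import math

n = 1000   # poll size
p = 0.48   # true proportion supporting the candidate

# Binomial mean and standard deviation for X = number who support.
mean_x = n * p
sd_x = math.sqrt(n * p * (1 - p))

# Dividing every possible value of X by n rescales both the mean and the s.d. by n.
mean_p_hat = mean_x / n
sd_p_hat = sd_x / n

print("X:     mean =", round(mean_x, 1), " s.d. =", round(sd_x, 1))
print("p-hat: mean =", round(mean_p_hat, 2), " s.d. =", round(sd_p_hat, 4))
```

Only the scale of the horizontal axis changes; the probabilities attached to each value stay the same.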

[Figure, left: PDF for X = number of successes. Plot of possible number who support candidate, with probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for number of successes X (420 to 540). y-axis: probability for each possible value of X (0 to 0.025).]
[Figure, right: PDF for $\hat{p}$ = proportion of successes. Plot of possible proportion who support candidate, with probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for proportion of successes p-hat (.42 to .54). y-axis: probability for each possible value of p-hat (0 to 0.025).]

What's different and what's the same about these two pictures?
Everything is the same except the values on the x-axis!
On the left, values are numbers 0, 1, 2, ... to 1000.
On the right, values are proportions 0, 1/1000, 2/1000, ... to 1.

Recall the normal approximation for the binomial:
For a binomial random variable X with parameters n and p (with np and n(1 - p) at least 5), X is approximately a normal random variable with:
mean $\mu = np$
standard deviation $\sigma = \sqrt{np(1-p)}$

NOW: Divide everything by n to get a similar result for $\hat{p} = X/n$.
$\hat{p}$ is approximately a normal random variable with:
mean $\mu = p$
standard deviation $\sigma = \text{s.d.}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
So, we can find probabilities that $\hat{p}$ will be in specific intervals if we know n and p.

The Sampling Distribution for a Sample Proportion $\hat{p}$
1. The physical situation: binomial.
Actual population with fixed proportion w/trait or opinion (e.g. polls, TV ratings, etc.)
OR
A repeatable situation with fixed probability of a certain outcome (e.g. birth is a boy, probability of heart attack if one takes aspirin)
2. The experiment:
Random sample of n from the population, $\hat{p}$ = proportion w/trait
OR
Repeat situation n times, $\hat{p}$ = proportion with specified outcome
3. Sample size requirement: In either case, must have np and n(1 - p) at least 5, prefer at least 10.
Assuming the above conditions are met, the distribution of possible values of $\hat{p}$ is approximately normal with:
mean $\mu = p$
standard deviation $\sigma = \sqrt{\frac{p(1-p)}{n}}$
The resulting normal distribution is called the sampling distribution of $\hat{p}$.

Notation: $\text{s.d.}(\hat{p})$ = standard deviation of $\hat{p}$ = $\sqrt{\frac{p(1-p)}{n}}$
But suppose p is unknown (which it will be if we are estimating it!). Then instead we approximate the s.d. using
$\text{s.e.}(\hat{p})$ = standard error of $\hat{p}$ = $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ = estimate of the standard deviation of $\hat{p}$
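The two formulas differ only in whether the true p or the sample value is plugged in. A minimal sketch follows; the function names sd_p_hat, se_p_hat and sample_size_ok are invented here for illustration and are not from the textbook.

```python
import math


def sd_p_hat(p, n):
    """Standard deviation of p-hat, usable only when the true p is known."""
    return math.sqrt(p * (1 - p) / n)


def se_p_hat(p_hat, n):
    """Standard error of p-hat: the same formula with p-hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def sample_size_ok(p, n, minimum=10):
    """Check that np and n(1 - p) both reach the preferred minimum of 10."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Poll example: p = 0.48 is treated as known.
print("s.d. for poll:      ", round(sd_p_hat(0.48, 1000), 4))

# TV ratings example: p is unknown, so plug in the sample value p-hat = .123.
print("s.e. for TV ratings:", round(se_p_hat(0.123, 25_000), 4))

print("poll sample size OK?", sample_size_ok(0.48, 1000))
```

With n = 25,000 the standard error is much smaller than in the poll of 1000, which matches the idea that larger samples estimate p more precisely.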

This result is also called the normal curve approximation rule for sample proportions.

For the poll example: Poll of n = 1000 people, where the true population proportion p = 0.48.
The distribution of possible values of $\hat{p}$ is approximately normal with
mean $\mu = p = 0.48$ and s.d. $\sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.48(1-.48)}{1000}} = 0.0158$

[Figure: Actual (tiny rectangles). Plot of possible proportion who support candidate, with probabilities; Binomial, n = 1000, p = 0.48. x-axis: values for proportion of successes p-hat (.42 to .54). y-axis: probability for each possible value of p-hat (0 to 0.025).]
[Figure: Normal approximation (smooth). Approximate distribution of p-hat; Normal, Mean = 0.48, StDev = 0.0158. x-axis: possible values of p-hat (0.42 to 0.54). y-axis: density (0 to 25).]

For example, to find the probability that $\hat{p}$ is greater than 0.50:
Could add up areas of rectangles from .501, .502, ..., up to 1, but that would be too much work! Instead use the normal approximation:
$P(\hat{p} > 0.50) \approx P\left(z > \frac{0.50 - .48}{.0158}\right) = P(z > 1.266) = .103$

Going back to the Big Picture
The sampling distribution for $\hat{p}$ describes the distribution of possibilities for it if we were to take millions of samples of size n and compute $\hat{p}$ each time. It tells us what ranges we can expect $\hat{p}$ to fall in, and with what probability.
To find the sampling distribution, we would need to know the true value of the parameter p.
In practice, we don't know the true value of the parameter p. In fact the whole point of statistical inference is to estimate the parameter, or test for possible values of it.
BUT, the standard deviation (or standard error) of the sampling distribution tells us how far the sample statistic is likely to fall from the parameter p, even if we don't know what that value of p is.
For example, in our poll of n = 1000, we know that the standard deviation of $\hat{p}$ is about .0158 (or .016).
So, (from the Empirical Rule) we know that for approximately 68% of all samples $\hat{p}$ will be within one standard deviation = .016 of the true parameter p.
We can use that to estimate p!
For instance, if $\hat{p}$ is 0.45, we can be 68% certain that the true p is somewhere in the range of 0.45 ± .016, or between 0.434 and 0.466.
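The probability calculation and the 68% range above can be reproduced as follows. This is a sketch only: it uses Python's standard-library NormalDist for the normal curve, and it also adds up the exact binomial rectangles for X = 501 to 1000 (tedious by hand, quick for a computer) to show how close the approximation comes. The variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

n = 1000
p = 0.48
sd = math.sqrt(p * (1 - p) / n)   # about 0.0158

# Normal approximation: P(p-hat > 0.50) = P(z > (0.50 - p) / sd).
z = (0.50 - p) / sd
prob_normal = 1 - NormalDist().cdf(z)

# Exact binomial: add up the rectangles for X = 501, 502, ..., 1000.
prob_exact = math.fsum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(501, n + 1)
)

print("z =", round(z, 3))
print("normal approximation:", round(prob_normal, 3))
print("exact binomial sum:  ", round(prob_exact, 3))

# 68% range from the Empirical Rule, using the rounded s.d. of .016.
p_hat = 0.45
margin = 0.016
low, high = p_hat - margin, p_hat + margin
print("68% range for p:", round(low, 3), "to", round(high, 3))
```

The two probabilities will not agree exactly, because the smooth curve only approximates the rectangles, but they should be close for a sample this large.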

PREPARING FOR THE REST OF CHAPTER 9
For all 5 situations we are considering, the sampling distribution of the sample statistic:
Is approximately normal
Has mean = the corresponding population parameter
Has standard deviation that involves the population parameter(s) and thus can't be known without it (them)
Has standard error that doesn't involve the population parameters and is used to estimate the standard deviation
Has standard deviation (and standard error) that get smaller as the sample size(s) get larger
Summary table on pages 382-383 will help you with these!

New Example
In 2005, according to the Census Bureau, 67% of all children in the United States were living with 2 parents. (Includes step-parents and adoptive parents, but not foster parents.)
In our class, there are about 180 of you who participate in clicker questions. Are you a representative sample for this question? If so, what should we expect the class proportion to be?
n = 180, p = .67
The sampling distribution of $\hat{p}$ is approximately normal with
mean = .67 and standard deviation = $\sqrt{\frac{(.67)(.33)}{180}} = .035$

Clicker question (not for credit, answers anonymous)
In 2005, were you living with 2 parents? (Step-parents and adoptive parents count, but foster parents do not.)
A. Yes
B. No

[Figure: Sampling distribution of p-hat for n = 180, p = .67; Normal, Mean = 0.67, StDev = 0.035. x-axis: possible values of p-hat (0.565 to 0.775). y-axis: density (0 to 12).]
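For the class example, the standard deviation and the ranges marked on the horizontal axis of the plot (mean plus or minus 1, 2 and 3 standard deviations) can be computed as below. This is a sketch: the 95% and 99.7% labels come from the Empirical Rule, and the dictionary name ranges is made up for this illustration.

```python
import math

n = 180    # students answering the clicker question
p = 0.67   # Census Bureau proportion for 2005

sd = math.sqrt(p * (1 - p) / n)
print("s.d. of p-hat:", round(sd, 3))

# Empirical Rule ranges: about 68%, 95% and 99.7% of samples fall within
# 1, 2 and 3 standard deviations of the mean.
ranges = {}
for k, coverage in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    ranges[k] = (p - k * sd, p + k * sd)
    low, high = ranges[k]
    print(f"about {coverage} of samples: {low:.3f} to {high:.3f}")
```

If the class really behaves like a random sample for this question, a class proportion far outside the widest range would be surprising.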