Determining the sample size

One of the most common questions any statistician gets asked is "How large a sample size do I need?" Researchers are often surprised to find out that the answer depends on a number of factors, and that they have to give the statistician some information before they can get an answer! As with all our examples so far, the answers are essentially different depending on whether the study is a survey designed to find out the proportion of something, or is designed to estimate a population mean. We consider these cases separately.
Sample size to estimate a proportion

Example: A professor in UNC's Sociology department is trying to determine the proportion of UNC students who support gay marriage. She asks, "How large a sample size do I need?"

To answer a question like this we need to ask the researcher certain questions, like:
1. How accurately do you need the answer?
2. What level of confidence do you intend to use?
3. (Possibly) What is your current estimate of the proportion of UNC students who support gay marriage?
Possible answers might be:
1. We need a margin of error less than 2.5%. Typical surveys have margins of error ranging from less than 1% to something of the order of 4%; we can choose any margin of error we like but need to specify it.
2. 95% confidence intervals are typical but not in any way mandatory; we could do 90%, 99% or something else entirely. For this example, we assume 95%.
3. May be guided by past surveys or general knowledge of public opinion. Let's suppose the answer is 30%.
Calculation of sample size: We already know that the margin of error is 1.96 times the standard error and that the standard error is \(\sqrt{\hat p(1-\hat p)/n}\). In general the formula is

\[ ME = z \sqrt{\frac{\hat p(1-\hat p)}{n}} \qquad (*) \]

where
ME is the desired margin of error,
z is the z-score, e.g. 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, 2.58 for a 99% confidence interval (see Table 8.2, page 369),
\(\hat p\) is our prior judgment of the correct value of p,
n is the sample size (to be found).
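So in this case we set ME equal to 0.025, z = 1.96 and \(\hat p = 0.3\), and the formula (*) becomes

\[ 0.025 = 1.96 \sqrt{\frac{0.3 \times 0.7}{n}} \]

or

\[ \frac{0.3 \times 0.7}{n} = \left(\frac{0.025}{1.96}\right)^2 = 0.0001627 \]

which translates into

\[ n = \frac{0.3 \times 0.7}{0.0001627} = 1291. \]

So we would need a sample of about 1300 students.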
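The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_size_for_proportion` and its argument names are made up for illustration and do not come from the textbook or any library.

```python
import math

def sample_size_for_proportion(margin_of_error, z, p_guess):
    """Solve ME = z * sqrt(p(1-p)/n) for n; returns the unrounded value."""
    return p_guess * (1.0 - p_guess) * (z / margin_of_error) ** 2

# Values from the example: ME = 0.025, z = 1.96, prior guess p = 0.3.
n_exact = sample_size_for_proportion(0.025, 1.96, 0.3)

print(round(n_exact, 1))   # unrounded solution of the equation
print(math.ceil(n_exact))  # smallest whole sample size meeting the target
```

Rounding up rather than to the nearest integer is a cautious choice made here; either way the answer is about 1300 students.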
We could clearly try varying any of the elements of this. For example, maybe the researcher would be satisfied with a 90% confidence interval, for which z = 1.645. In this case (*) becomes

\[ 0.025 = 1.645 \sqrt{\frac{0.3 \times 0.7}{n}} \]

for which we can quickly find n = 909. If we are willing to accept a lower confidence level, we can get away with a smaller sample size.
A different type of variation is "What if we have no initial estimate of \(\hat p\)?" In this case, the convention is to assume \(\hat p = 0.5\). The reason is that the standard error formula, \(\sqrt{\hat p(1-\hat p)/n}\), is largest when \(\hat p = 0.5\), so this is a conservative assumption that allows for \(\hat p\) being unknown a priori. If we repeat the calculation with \(\hat p = 0.5\) (but returning to z = 1.96), we find

\[ 0.025 = 1.96 \sqrt{\frac{0.5 \times 0.5}{n}} \]

which results in n = 1537. The cost of \(\hat p\) being unknown is an increase in the sample size, though if \(\hat p\) were known and already quite close to 0.5 (as occurs in many election predictions where the result is close), this would not be too important a feature.
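To see how the required sample size reacts to the confidence level and to the prior guess, one can tabulate it over a few values. The sketch below is illustrative only: the helper name and the grid of guesses are assumptions, not part of the course material.

```python
def sample_size_for_proportion(margin_of_error, z, p_guess):
    """Unrounded n solving ME = z * sqrt(p(1-p)/n)."""
    return p_guess * (1.0 - p_guess) * (z / margin_of_error) ** 2

# Lower confidence (z = 1.645) with the same guess p = 0.3.
n_90 = sample_size_for_proportion(0.025, 1.645, 0.3)
print(f"90% confidence, p = 0.3: n is about {n_90:.0f}")

# 95% confidence (z = 1.96) over a grid of guesses; the maximum is at p = 0.5.
sizes = {}
for p_guess in (0.1, 0.3, 0.5, 0.7, 0.9):
    sizes[p_guess] = sample_size_for_proportion(0.025, 1.96, p_guess)
    print(f"95% confidence, p = {p_guess:.1f}: n is about {sizes[p_guess]:.0f}")
```

The table is symmetric about 0.5 because p(1 - p) takes the same value at p and 1 - p, which is why guessing 0.5 is the safe choice when nothing is known.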
Sample size to estimate a population mean

The issues are similar if we are designing a survey or an experiment to estimate a population mean. In this case, the formula is

\[ ME = t \frac{s}{\sqrt{n}} \qquad (*) \]

where
ME is the desired margin of error,
t is the t-score that we use to calculate the confidence interval, which depends on both the degrees of freedom and the desired confidence level,
s is the standard deviation,
n is the sample size we want to find.
There is a complication here because the sample size affects t as well as \(\sqrt{n}\). However, when \(n \ge 30\), the value of t is quite close to the value of z that we would get if we ignored the distinction between the normal and t distributions, so often we do ignore that distinction and just use the z value, e.g. 1.96 for a 95% confidence interval.

The second complication is the need to specify s. In practice, s will be the sample standard deviation, computed after the sample is taken, so we can't possibly know it in advance. At the planning stage, s is therefore typically a guess, based either on past experience or on rough estimates of what sort of variability we would expect.
Example. We would like to estimate the mean teacher's salary in the Chapel Hill school district, with 99% confidence, to an accuracy within $2,000. In this case we have literally no idea what s would be. But if you refer back to problem 2.120 on page 87 (this was part of HW3), there we deduced that among four possible values that were given, the likeliest was $6,000. So in the absence of anything better, let's use that as our guess for s. In this case the 99% confidence interval translates to a z or t of 2.58. Therefore (*) becomes

\[ 2000 = 2.58 \times \frac{6000}{\sqrt{n}} \]

which solves to

\[ n = \left(\frac{2.58 \times 6000}{2000}\right)^2 = 59.9 \]

or 60 to the nearest whole number.
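The same calculation can be sketched in Python. The function name `sample_size_for_mean` is hypothetical, and the code uses the z approximation to t, as the example does.

```python
import math

def sample_size_for_mean(margin_of_error, z, s_guess):
    """Solve ME = z * s / sqrt(n) for n; returns the unrounded value."""
    return (z * s_guess / margin_of_error) ** 2

# Values from the example: ME = $2,000, z = 2.58 (99%), guessed s = $6,000.
n_exact = sample_size_for_mean(2000, 2.58, 6000)

print(round(n_exact, 1))   # unrounded solution of the equation
print(math.ceil(n_exact))  # whole number of teachers to sample
```

Because n depends on the square of s, a guess for s that is off by a factor of 2 changes the required sample size by a factor of 4, so the quality of the guess matters a great deal.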
Other ideas (no need to study in detail, but please read briefly)
1. Small sample estimation (pages 391-393): Idea of adding 2 to both the number of successes and the number of failures in the sample. This has been found to make the \(\sqrt{\hat p(1-\hat p)/n}\) formula work quite well even when n is small.
2. Bootstrapping (pages 395-397): Idea of generating new samples by resampling from current data. Actually, I have used this in some of the simulations I showed you in this course, though I didn't call it that at the time.
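For readers who want to see the two ideas in action, here is a rough sketch. It is not the textbook's code: the function names, the data values and the choice of a percentile-style bootstrap interval are all assumptions made for illustration.

```python
import math
import random

def plus_four_interval(successes, n, z=1.96):
    """Interval for a proportion after adding 2 successes and 2 failures."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    margin = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj - margin, p_adj + margin

def bootstrap_mean_interval(data, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    # Resample with replacement, same size as the data, and record each mean.
    means = sorted(
        sum(rng.choices(data, k=len(data))) / len(data)
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical data, invented purely for illustration.
prop_low, prop_high = plus_four_interval(successes=2, n=10)
print(prop_low, prop_high)

values = [41, 44, 47, 50, 52, 55, 58, 61, 66, 73]
boot_low, boot_high = bootstrap_mean_interval(values)
print(boot_low, boot_high)
```

The seed is fixed only so that repeated runs give the same bootstrap interval; a different seed would give slightly different endpoints.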
Some Worked Examples

8.95. A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for New Zealand. They want a 95% confidence interval to have a margin of error of 0.04.
(a) Find the necessary sample size if they expect to find results similar to those in the United States.
(b) Suppose instead they used the conservative formula based on \(\hat p = 0.5\). What is now the required sample size?
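Solution: (a) The general formula is

\[ ME = z \sqrt{\frac{\hat p(1-\hat p)}{n}} \]

which also translates to

\[ n = \frac{\hat p(1-\hat p) z^2}{ME^2}. \]

With ME = 0.04, \(\hat p = 0.2\), z = 1.96 we get

\[ n = \frac{0.2 \times 0.8 \times 1.96 \times 1.96}{0.04 \times 0.04} = 384.2. \]

(b) With ME = 0.04, \(\hat p = 0.5\), z = 1.96 we get

\[ n = \frac{0.5 \times 0.5 \times 1.96 \times 1.96}{0.04 \times 0.04} = 600.25. \]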
The sample size is 384 for (a) and 600 for (b), showing the advantage in using the estimated \(\hat p\) (0.2) so long as we feel confident that this is roughly the right guess. Note that the choice z = 1.96 arises because this is the z value appropriate for a 95% confidence interval. If we were asked for a 99% confidence interval, for example, we would use z = 2.58.
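Instead of looking z up in a table, it can be computed from the confidence level. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are hypothetical. Because it uses the unrounded z, the unrounded sample sizes differ slightly in the decimals from the hand calculation, but they round to the same 384 and 600.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value, e.g. 0.95 -> the 97.5th percentile of N(0, 1)."""
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)

def sample_size_for_proportion(margin_of_error, z, p_guess):
    """Unrounded n from n = p(1-p) z^2 / ME^2."""
    return p_guess * (1.0 - p_guess) * z ** 2 / margin_of_error ** 2

z95 = z_for_confidence(0.95)
z99 = z_for_confidence(0.99)
print(round(z95, 2), round(z99, 2))

n_a = sample_size_for_proportion(0.04, z95, 0.2)   # part (a)
n_b = sample_size_for_proportion(0.04, z95, 0.5)   # part (b)
n_99 = sample_size_for_proportion(0.04, z99, 0.2)  # part (a) at 99% confidence
print(round(n_a), round(n_b), round(n_99))
```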
8.97. A tax assessor wants to assess the mean property tax bill for all homeowners in Madison, Wisconsin. A survey ten years ago got a sample mean and standard deviation of $1400 and $1000.
(a) How many tax records should be sampled for a 95% confidence interval to have a margin of error of $100?
(b) In reality, the standard deviation is now $1500. Using the sample size you used in (a), would the margin of error for a 95% confidence interval be less than $100, equal to $100, or greater than $100?
(c) (Adapted.) Under (b), what is the true probability that the sample mean falls within $100 of the population mean?
Solution: (a) The formula \(ME = t s/\sqrt{n}\) translates to

\[ n = \left(\frac{s t}{ME}\right)^2. \]

With s = 1000, t = 1.96, ME = 100, we get n = 384.

(b) Since ME is proportional to s, if s increases from 1000 to 1500, then ME increases in the same proportion (to 150), so the margin of error is greater than $100.

(c) We have

\[ t = \frac{\bar x - \mu}{s/\sqrt{n}} \]

so with \(\bar x - \mu = \pm 100\), s = 1500, n = 384 we get t = ±1.31. In this case with df = 383, the t distribution is almost the same as the normal distribution, so we look this up in the standard normal table: the probability of getting a z score between -1.31 and +1.31 is .9049 - .0951 = .8098, i.e. the nominal 95% confidence interval in reality has about an 81% chance of including the true value.
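Parts (b) and (c) can be reproduced without a table. This is a sketch using the normal approximation to the t distribution, as in the solution; the variable names are illustrative. Since z is not rounded to 1.31 before the lookup, the probability differs from .8098 in the third decimal place but still rounds to 0.81.

```python
import math
from statistics import NormalDist

n = 384           # sample size chosen in part (a)
s_true = 1500.0   # actual standard deviation, from part (b)
z = 1.96          # 95% confidence
target = 100.0    # margin of error the survey was designed for

standard_error = s_true / math.sqrt(n)

# Part (b): margin of error actually achieved with n = 384.
actual_margin = z * standard_error

# Part (c): chance that the sample mean lands within $100 of the true mean.
z_reached = target / standard_error
std_normal = NormalDist()
coverage = std_normal.cdf(z_reached) - std_normal.cdf(-z_reached)

print(round(actual_margin))  # about 150
print(round(z_reached, 2))   # about 1.31
print(round(coverage, 2))    # about 0.81
```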