Determining the sample size




One of the most common questions any statistician gets asked is "How large a sample size do I need?" Researchers are often surprised to find out that the answer depends on a number of factors, and that they have to give the statistician some information before they can get an answer! As with all our examples so far, the answers are essentially different depending on whether the study is a survey designed to find out the proportion of something, or is designed to find a sample mean. We consider these cases separately.

Sample size to estimate a proportion

Example: A professor in UNC's Sociology department is trying to determine the proportion of UNC students who support gay marriage. She asks, "How large a sample size do I need?" To answer a question like this we need to ask the researcher certain questions, like:

1. How accurately do you need the answer?
2. What level of confidence do you intend to use?
3. (Possibly) What is your current estimate of the proportion of UNC students who support gay marriage?

Possible answers might be:

1. We need a margin of error less than 2.5%. Typical surveys have margins of error ranging from less than 1% to something of the order of 4%; we can choose any margin of error we like but need to specify it.
2. 95% confidence intervals are typical but not in any way mandatory; we could do 90%, 99% or something else entirely. For this example, we assume 95%.
3. May be guided by past surveys or general knowledge of public opinion. Let's suppose the answer is 30%.

Calculation of sample size: We already know that, for a 95% confidence interval, the margin of error is 1.96 times the standard error, and that the standard error is \(\sqrt{\hat{p}(1-\hat{p})/n}\). In general the formula is

\[ ME = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \qquad (*) \]

where ME is the desired margin of error; \(z\) is the z-score, e.g. 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, 2.58 for a 99% confidence interval (see Table 8.2, page 369); \(\hat{p}\) is our prior judgment of the correct value of \(p\); and \(n\) is the sample size (to be found).
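The z-scores quoted above can also be obtained from the standard normal quantile function instead of a table. The following is a minimal sketch, assuming a two-sided interval; the helper name `z_for_confidence` is made up for this illustration and is not from the course text.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z-score for a confidence level such as 0.95.

    Hypothetical helper: it leaves (1 - level) / 2 in each tail of the
    standard normal distribution and returns the matching quantile.
    """
    tail = (1.0 - level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

Rounded to the usual number of places, these values agree with the 1.645, 1.96 and 2.58 used in the formula.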

So in this case we set ME equal to 0.025, \(z = 1.96\) and \(\hat{p} = 0.3\), and (*) becomes

\[ 0.025 = 1.96\sqrt{\frac{0.3 \times 0.7}{n}} \]

or

\[ \frac{0.3 \times 0.7}{n} = \left(\frac{0.025}{1.96}\right)^2 = 0.0001627, \]

which translates into

\[ n = \frac{0.3 \times 0.7}{0.0001627} = 1291. \]

So we would need a sample of about 1300 students.

We could clearly try varying any of the elements of this. For example, maybe the researcher would be satisfied with a 90% confidence interval, for which \(z = 1.645\). In this case (*) becomes

\[ 0.025 = 1.645\sqrt{\frac{0.3 \times 0.7}{n}} \]

for which we can quickly find \(n = 909\). If we are willing to accept a lower confidence level, we can get away with a smaller sample size.
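The two calculations above follow the same pattern: solve the margin-of-error formula for the sample size. A short sketch of that rearrangement is given here; the function name `sample_size_proportion` is an assumption of this example, and the result is left unrounded so that the rounding choice stays visible.

```python
def sample_size_proportion(me, p_hat, z):
    """Solve ME = z * sqrt(p_hat * (1 - p_hat) / n) for n.

    Hypothetical helper; returns the unrounded value of n.
    """
    return p_hat * (1.0 - p_hat) * z ** 2 / me ** 2


# 95% confidence (z = 1.96) and 90% confidence (z = 1.645),
# both with a 2.5% margin of error and a prior guess of 0.3.
n_95 = sample_size_proportion(0.025, 0.3, 1.96)
n_90 = sample_size_proportion(0.025, 0.3, 1.645)

print(round(n_95), round(n_90))
```

Rounding to the nearest whole number reproduces the figures in the text. A cautious analyst might round up instead, so that the margin of error is guaranteed not to exceed the target.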

A different type of variation is: "What if we have no initial estimate of \(\hat{p}\)?" In this case, the convention is to assume \(\hat{p} = 0.5\). The reason is that the standard error formula, \(\sqrt{\hat{p}(1-\hat{p})/n}\), is largest when \(\hat{p} = 0.5\), so this is a conservative assumption that allows for \(\hat{p}\) being unknown a priori. If we repeat the calculation with \(\hat{p} = 0.5\) (but returning to \(z = 1.96\)), we find

\[ 0.025 = 1.96\sqrt{\frac{0.5 \times 0.5}{n}} \]

which results in \(n = 1537\). The cost of \(\hat{p}\) being unknown is an increase in the sample size, though if \(\hat{p}\) were known and already quite close to 0.5 (as occurs in many election predictions where the result is close), this would not be too important a feature.
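The claim that 0.5 is the worst case can be checked numerically. The sketch below scans a grid of candidate proportions (the grid and the helper names are choices made for this illustration) and then repeats the sample size calculation with the conservative value.

```python
def variance_term(p_hat):
    """The p_hat * (1 - p_hat) factor inside the standard error."""
    return p_hat * (1.0 - p_hat)


def sample_size_proportion(me, p_hat, z):
    """Hypothetical helper: solve ME = z * sqrt(p_hat * (1 - p_hat) / n) for n."""
    return variance_term(p_hat) * z ** 2 / me ** 2


# Candidate proportions 0.01, 0.02, ..., 0.99 (an arbitrary illustrative grid).
grid = [i / 100 for i in range(1, 100)]
worst_case = max(grid, key=variance_term)

n_conservative = sample_size_proportion(0.025, worst_case, 1.96)
n_with_guess = sample_size_proportion(0.025, 0.3, 1.96)

print(worst_case, round(n_conservative), round(n_with_guess))
```

The grid search is only a check; the same conclusion follows from the fact that \(\hat{p}(1-\hat{p})\) is a parabola with its maximum at 0.5.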

Sample size to estimate a population mean

The issues are similar if we are designing a survey or an experiment to estimate a population mean. In this case, the formula is

\[ ME = t\,\frac{s}{\sqrt{n}} \qquad (*) \]

where ME is the desired margin of error; \(t\) is the t-score that we use to calculate the confidence interval, which depends on both the degrees of freedom and the desired confidence level; \(s\) is the standard deviation; and \(n\) is the sample size we want to find.

There is a complication here because the sample size affects \(t\) as well as \(\sqrt{n}\). However, when \(n \geq 30\), the value of \(t\) is quite close to the value of \(z\) that we would get if we ignored the distinction between the normal and t distributions, so often we do ignore that distinction and just use the \(z\) value, e.g. 1.96 for a 95% confidence interval.

The second complication is the need to specify \(s\). In practice, \(s\) will be the sample standard deviation, computed after the sample is taken, so we can't possibly know it in advance. Instead, the \(s\) used for planning is typically a guess, based either on past experience or on rough estimates of what sort of variability we would expect.

Example. We would like to estimate the mean teacher's salary in the Chapel Hill school district, with 99% confidence, to an accuracy within $2,000. In this case we have literally no idea what \(s\) would be. But if you refer back to problem 2.120 on page 87 (this was part of HW3), there we deduced that among four possible values that were given, the likeliest was $6,000. So in the absence of anything better, let's use that as our guess for \(s\). In this case the 99% confidence interval translates to a \(z\) or \(t\) of 2.58. Therefore (*) becomes

\[ 2000 = 2.58 \times \frac{6000}{\sqrt{n}} \]

which solves to

\[ n = \left(\frac{2.58 \times 6000}{2000}\right)^2 = 59.9 \]

or 60 to the nearest whole number.
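The salary example can be reproduced with a few lines of code. This is a sketch under the same simplification as the text, namely using the normal value 2.58 in place of the t-score; the function name `sample_size_mean` is invented for this illustration.

```python
import math


def sample_size_mean(me, s, z):
    """Solve ME = z * s / sqrt(n) for n (hypothetical helper).

    Uses a z-score in place of the t-score, as the text does when n is
    expected to be at least about 30.
    """
    return (z * s / me) ** 2


n_raw = sample_size_mean(me=2000, s=6000, z=2.58)
n_needed = math.ceil(n_raw)  # round up so the margin of error is not exceeded

print(f"unrounded n = {n_raw:.1f}, sample size = {n_needed}")
```

Because the guess for the standard deviation enters squared, doubling it would quadruple the required sample size, which is why a poor guess for \(s\) matters so much.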

Other ideas (no need to study in detail, but please read briefly)

1. Small sample estimation (pages 391-393): Idea of adding 2 to both the number of successes and the number of failures in the sample. This has been found to make the \(\sqrt{\hat{p}(1-\hat{p})/n}\) formula work quite well even when \(n\) is small.
2. Bootstrapping (pages 395-397): Idea of generating new samples by resampling from current data. Actually, I have used this in some of the simulations I showed you in this course, though I didn't call it that at the time.
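Both ideas are easy to sketch in code. The example below is illustrative only: the function names, the simple percentile rule used for the bootstrap interval, and the `salaries` data are all assumptions made for this sketch, not material from the textbook pages cited above.

```python
import math
import random
import statistics


def plus_four_interval(successes, n, z=1.96):
    """Interval after adding 2 successes and 2 failures to the sample."""
    p_tilde = (successes + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1.0 - p_tilde) / (n + 4))
    return p_tilde - z * se, p_tilde + z * se


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Means of samples drawn with replacement from the observed data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]


def percentile_interval(values, level=0.95):
    """Crude percentile interval: cut off (1 - level) / 2 in each tail."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


# Hypothetical data (salaries in thousands of dollars), invented for this sketch.
salaries = [41, 45, 47, 52, 55, 58, 60, 63, 70, 84]

means = bootstrap_means(salaries)
boot_low, boot_high = percentile_interval(means)
pf_low, pf_high = plus_four_interval(successes=0, n=10)

print(f"bootstrap interval for the mean: ({boot_low:.1f}, {boot_high:.1f})")
print(f"plus-four interval with 0 successes in 10: ({pf_low:.3f}, {pf_high:.3f})")
```

With 0 successes in 10 trials, the ordinary formula would give a standard error of zero; the adjusted version still produces an interval of nonzero width, which is the point of the correction.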

Some Worked Examples

8.95. A survey estimated that 20% of all Americans aged 16 to 20 drove under the influence of drugs or alcohol. A similar survey is planned for New Zealand. They want a 95% confidence interval to have a margin of error of 0.04.

(a) Find the necessary sample size if they expect to find results similar to those in the United States.
(b) Suppose instead they used the conservative formula based on \(\hat{p} = 0.5\). What is now the required sample size?

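Solution: (a) The general formula is

\[ ME = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

which also translates to

\[ n = \frac{\hat{p}(1-\hat{p})\,z^2}{ME^2}. \]

With ME = 0.04, \(\hat{p} = 0.2\), \(z = 1.96\) we get

\[ n = \frac{0.2 \times 0.8 \times 1.96 \times 1.96}{0.04 \times 0.04} = 384.2. \]

(b) With ME = 0.04, \(\hat{p} = 0.5\), \(z = 1.96\) we get

\[ n = \frac{0.5 \times 0.5 \times 1.96 \times 1.96}{0.04 \times 0.04} = 600.25. \]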

The sample size is 384 for (a) and 600 for (b), showing the advantage in using the estimated \(\hat{p}\) (0.2) so long as we feel confident that this is roughly the right guess. Note that the choice \(z = 1.96\) arises because this is the z value appropriate for a 95% confidence interval. If we were asked for a 99% confidence interval, for example, we would use \(z = 2.58\).

8.97. A tax assessor wants to assess the mean property tax bill for all homeowners in Madison, Wisconsin. A survey ten years ago got a sample mean and standard deviation of $1400 and $1000.

(a) How many tax records should be sampled for a 95% confidence interval to have a margin of error of $100?
(b) In reality, the standard deviation is now $1500. Using the sample size you used in (a), would the margin of error for a 95% confidence interval be less than $100, equal to $100, or greater than $100?
(c) (Adapted.) Under (b), what is the true probability that the sample mean falls within $100 of the population mean?

Solution: (a) The formula \(ME = t\,s/\sqrt{n}\) translates to

\[ n = \left(\frac{s\,t}{ME}\right)^2. \]

With \(s = 1000\), \(t = 1.96\), ME = 100, we get \(n = 384\).

(b) Since ME is proportional to \(s\), if \(s\) increases from 1000 to 1500, then ME increases in the same proportion (to 150), so it is greater than $100.

(c) We have

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

so with \(\bar{x} - \mu = \pm 100\), \(s = 1500\), \(n = 384\) we get \(t = \pm 1.31\). In this case, with df = 383, the t distribution is almost the same as the normal distribution, so we look this up in the standard normal table: the probability of getting a z score between \(-1.31\) and \(+1.31\) is \(0.9049 - 0.0951 = 0.8098\), i.e. the nominal 95% confidence interval in reality has about an 81% chance of including the true value.
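Parts (b) and (c) can be checked numerically. The sketch below uses the normal approximation, as the solution does, and the variable names are chosen for this illustration. It works with the unrounded z value, so the coverage it computes differs slightly from the table lookup at 1.31 but is still close to 81%.

```python
import math
from statistics import NormalDist

n = 384            # sample size chosen in part (a)
s_true = 1500.0    # actual standard deviation, from part (b)
target = 100.0     # margin of error the survey was designed for

# Part (b): the margin of error actually achieved by a 95% interval.
actual_me = 1.96 * s_true / math.sqrt(n)

# Part (c): chance that the sample mean lands within the target distance
# of the population mean, using the standard normal distribution.
z = target / (s_true / math.sqrt(n))
standard_normal = NormalDist()
coverage = standard_normal.cdf(z) - standard_normal.cdf(-z)

print(f"actual margin of error = {actual_me:.1f}")
print(f"z = {z:.2f}, coverage = {coverage:.3f}")
```

The same three lines of arithmetic can be rerun with other values of the true standard deviation to see how quickly the real coverage drops when the planning guess for \(s\) is too small.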