Sampling (cont'd) and Confidence Intervals. Lecture 9, 8 March 2006. R. Ryznar


Sampling (cont'd) and Confidence Intervals. 11.220, Lecture 9, 8 March 2006. R. Ryznar

Census Surveys
Decennial Census: every household (over 11 million) gets the short form, and 17% (about 1/6) get the long form. It misses approximately 0.12% of the population overall (including about 2.78% of the black population). Why do it?
Current Population Survey: 60,000 households are interviewed every month.
The American Community Survey: contacts 3 million households (including some from every county) and will replace the long form in 2010.

Gallup Polls
Many do not believe a survey of 1,500 to 2,000 respondents can represent the views of all Americans.

Estimation
Parameter: a number that describes the population. We don't know its value.
Statistic: a number that describes a sample. It can change from sample to sample. If we take lots of samples, the statistic follows a predictable pattern.

Sampling Variability

Law of Large Numbers
As the number of trials increases, the average outcome approaches the mean of the population (i.e., the expected outcome) and the standard deviation of the average outcome approaches zero.
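The Law of Large Numbers can be illustrated with a small simulation. This is only a sketch: the fair six-sided die (expected value 3.5), the function name `running_means`, and the seed are assumptions chosen for illustration and are not part of the lecture.

```python
import random

def running_means(num_trials, seed=0):
    """Roll a fair die num_trials times; return the average after each roll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, num_trials + 1):
        total += rng.randint(1, 6)  # one roll, values 1..6 inclusive
        means.append(total / i)     # average of the first i rolls
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The running average should drift toward the expected value 3.5.
    print(n, round(means[n - 1], 3))
```

With more rolls, the printed averages should settle near 3.5, and they wander less and less from one checkpoint to the next.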

To reduce bias, take a simple random sample (SRS). To reduce variability, take larger samples. The margin of error is about sampling variability. We say, "The president's approval rating is 40%, plus or minus 3 percentage points." We are 95% confident that the true population proportion is between 37% and 43%.

Central Limit Theorem
The distribution of an average tends to be Normal, even when the distribution from which the average is computed is decidedly non-normal.
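A rough way to see the Central Limit Theorem at work is to average draws from a strongly skewed population. The sketch below assumes an exponential population with mean 1 and standard deviation 1; that choice, the sample size of 50, and the name `sample_means` are illustrative assumptions, not taken from the lecture.

```python
import random
import statistics

def sample_means(sample_size, num_samples, seed=0):
    """Return the means of num_samples samples drawn from a skewed population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

n = 50
xbars = sample_means(sample_size=n, num_samples=5_000)

center = statistics.fmean(xbars)   # should be near the population mean, 1.0
spread = statistics.stdev(xbars)   # should be near sigma / sqrt(n) = 1 / sqrt(50)
se = 1.0 / n ** 0.5
# For a roughly normal distribution, about 95% of sample means fall
# within 2 standard errors of the population mean.
within_2se = sum(abs(x - 1.0) <= 2 * se for x in xbars) / len(xbars)

print(round(center, 3), round(spread, 3), round(within_2se, 3))
```

Even though individual exponential draws are far from normal, the share of sample means within two standard errors should come out close to 95%.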

A quick method for a 95% confidence interval around a sample proportion is $\hat{p} \pm 1/\sqrt{n}$.

Margin of Error and Sample Size
$1/\sqrt{n} = 1/\sqrt{1600} = 1/40 = 0.025$, or 2.5% (about 3%)
$1/\sqrt{n} = 1/\sqrt{2527} = 1/50.27 = 0.020$, or 2.0%
$1/\sqrt{n} = 1/\sqrt{100} = 1/10 = 0.1$, or 10%
The size of the population has little influence on the behavior of statistics from random samples. The population size does not matter as long as it is at least 100 times larger than the sample.
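The quick margin of error is easy to tabulate. The sketch below uses a hypothetical helper name, `quick_margin_of_error`. The quick method is conservative because $p(1-p)$ is never larger than 1/4, so $2\sqrt{p(1-p)/n} \le 1/\sqrt{n}$.

```python
import math

def quick_margin_of_error(n):
    """Approximate 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (100, 1600, 2527):
    # Print the sample size and the margin as a percentage.
    print(n, f"{quick_margin_of_error(n):.1%}")
```

Note that the population size does not appear anywhere in the calculation; only the sample size n matters.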


Estimating a Population Proportion
We take a survey (SRS) to estimate the percentage of overweight children aged 6 to 11 years in the general population.
$\hat{p} = \frac{\text{count of successes in the sample}}{n} = \frac{408}{2673} = 15.3\%$

Sampling Distribution of a Sample Proportion
If the sample is large enough, the sampling distribution of $\hat{p}$ is approximately normal. The mean of the sampling distribution is $p$. The standard deviation of the sampling distribution is $\sqrt{\frac{p(1-p)}{n}}$.

The standard deviation estimated from our sample is:
$\sqrt{\frac{p(1-p)}{n}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{.153(.847)}{2673}} = .006963$

The 95% Confidence Interval around our estimate is:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$.153 \pm 1.96(.00696) = .153 \pm .0136$, that is, (13.9%, 16.7%)

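Using the rounded value $z_{\alpha/2} \approx 2$ in place of 1.96, the 95% Confidence Interval around our estimate is:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$.153 \pm 2(.00696) = .153 \pm .0139$, that is, (13.9%, 16.7%)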

What if you wanted a 99% Confidence Interval around our estimate?
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$.153 \pm 2.58(.00696) = .153 \pm .018$, that is, (13.5%, 17.1%)
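The proportion intervals above can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical function name, `proportion_ci`. It keeps $\hat{p}$ unrounded, so an endpoint may differ in the last digit from the slide values, which start from the rounded .153.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    margin = z * se
    return p_hat - margin, p_hat + margin

# Overweight-children survey from the lecture: 408 successes out of 2673.
for label, z in (("95%", 1.96), ("95% (z = 2)", 2.0), ("99%", 2.58)):
    low, high = proportion_ci(408, 2673, z)
    print(label, f"{low:.1%} to {high:.1%}")
```

Raising the confidence level only changes the multiplier z, so the 99% interval is wider than the 95% interval around the same estimate.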

Sampling distribution of a sample mean
Choose an SRS of size n from a population in which individuals have mean µ and standard deviation σ. Let $\bar{x}$ be the mean of the sample. Then:
The sampling distribution of $\bar{x}$ is approximately normal when the sample size n is large.
The mean of the sampling distribution is equal to µ.
The standard deviation (standard error of the estimate) of the sampling distribution is $\sigma_{\bar{x}} = \text{s.e.} = \sigma/\sqrt{n}$

Confidence Interval for a Population Mean (µ)
When n is large (> 30), the sample standard deviation s is close to σ and can be used to estimate it. Confidence interval for a population mean:
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \approx \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$

Suppose a program director wants to estimate the average length of time (in months) clients remain in a rehab clinic program. She takes a random sample of 100 clients' records and uses the sample's mean, $\bar{x}$, to estimate µ, the population mean. We start by calculating the mean and the sample standard deviation. Assume that:
$\sum x = 465$ and $\sum (x - \bar{x})^2 = 2{,}387$

Then,
$\bar{x} = \frac{\sum x}{n} = \frac{465}{100} = 4.65$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{2{,}387}{99} = 24.11$ and $s = 4.9$
Since we have a large sample (n = 100) we can substitute s for σ. A 95% confidence interval for the mean number of months spent in the program is
$\bar{x} \pm 2\frac{s}{\sqrt{n}} = 4.65 \pm 2\frac{4.9}{\sqrt{100}} = 4.65 \pm 2\frac{4.9}{10} = 4.65 \pm .98$
Confidence Interval = (3.67, 5.63)
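The rehab-program arithmetic can be checked from the two sums alone. The sketch below assumes a hypothetical helper, `mean_ci_from_sums`, and uses z = 2 as the slides do. It does not round s to 4.9 first, so the endpoints agree with the slide values only to about two decimals.

```python
import math

def mean_ci_from_sums(sum_x, sum_sq_dev, n, z=2.0):
    """Large-sample CI for a mean, given sum(x) and sum((x - mean)^2)."""
    mean = sum_x / n
    s = math.sqrt(sum_sq_dev / (n - 1))  # sample standard deviation
    margin = z * s / math.sqrt(n)        # z times the standard error
    return mean, s, (mean - margin, mean + margin)

mean, s, (low, high) = mean_ci_from_sums(sum_x=465, sum_sq_dev=2387, n=100)
print(round(mean, 2), round(s, 2), round(low, 2), round(high, 2))
```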

Small sample estimates of µ
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is based on (n - 1) degrees of freedom.
Assumption: A random sample is selected from a population with a relative frequency distribution that is approximately normal.

Food prices have been going up rapidly. To periodically assess the increase in prices, you purchase the same items at twenty different grocery stores. The mean and standard deviation of the costs at the twenty supermarkets are:
$\bar{x} = \$26.84$ and $s = \$2.63$
If we assume that the distribution of costs for the grocery basket at all supermarkets is approximately normal, we can use the t-statistic to form the confidence interval. For a confidence level of 95%, we need the tabulated value of t with df = n - 1 = 19. From the t table we see that $t_{\alpha/2} = t_{.025} = 2.093$.
$\bar{x} \pm t_{.025}\frac{s}{\sqrt{n}} = 26.84 \pm 2.093\frac{2.63}{\sqrt{20}} = 26.84 \pm 1.23 = (25.61, 28.07)$
Thus, we are reasonably confident (95%) that the interval from $25.61 to $28.07 contains the true mean cost µ of the grocery basket. This is because if we were to employ our interval estimator on repeated occasions, 95% of the intervals constructed would contain µ.
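The grocery-basket interval can be verified as follows. In this sketch the critical value is passed in from the t table (2.093 for 19 degrees of freedom, as quoted above) rather than computed, and the function name `t_interval` is a hypothetical choice for illustration.

```python
import math

def t_interval(mean, s, n, t_crit):
    """Small-sample CI for a mean; t_crit is the tabulated t value for n - 1 df."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

# Twenty stores, mean $26.84, standard deviation $2.63, t(.025, df = 19) = 2.093.
low, high = t_interval(mean=26.84, s=2.63, n=20, t_crit=2.093)
print(f"${low:.2f} to ${high:.2f}")
```

The only difference from the large-sample formula is the multiplier: the tabulated t value replaces z and is somewhat larger, which widens the interval to reflect the extra uncertainty in s when n is small.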

Determining Sample Size
How can the appropriate sample size be determined? First determine how reliable you want the estimate to be.
Example: Consider the rehab program example where we estimated the mean length of time clients stayed in the program. A sample of 100 clients' records produced an estimate, $\bar{x}$, that was within .98 month of µ with probability equal to .95. What if we wanted to estimate the true mean to within .5 month with a probability equal to .95? How large a sample would be required?

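For the sample size n = 100, we found an approximate 95% confidence interval to be
$\bar{x} \pm 2\sigma_{\bar{x}} = 4.65 \pm .98$
If we now want our estimator to be within .5 month of µ, we must have
$2\sigma_{\bar{x}} = .5$, or $\frac{2\sigma}{\sqrt{n}} = .5$
Substituting s = 4.9 for σ:
$\frac{2(4.9)}{\sqrt{n}} = .5$
$\sqrt{n} = \frac{2(4.9)}{.5} = 19.6$
$n = (19.6)^2 = 384.16 \approx 384$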

You would have to sample approximately 384 clients' records in order to estimate the mean length of stay in the program, µ, to within .5 month with probability equal to .95.
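The sample-size calculation generalizes to any target margin. The sketch below assumes a hypothetical helper, `sample_size_for_mean`, and uses z = 2 as in the slides. The slides round 384.16 to 384; rounding up instead gives 385, which is the smallest whole sample size whose margin does not exceed the target.

```python
import math

def sample_size_for_mean(sigma, margin, z=2.0):
    """Solve z * sigma / sqrt(n) = margin for n (not yet rounded)."""
    return (z * sigma / margin) ** 2

n_exact = sample_size_for_mean(sigma=4.9, margin=0.5)
n_up = math.ceil(n_exact)  # round up so the margin is at most the target
print(round(n_exact, 2), n_up)
```

Because n grows with the square of 1/margin, roughly halving the margin (from .98 to .5) nearly quadruples the required sample, from 100 to about 384.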

Understanding Degrees of Freedom
Statisticians use the term "degrees of freedom" to describe the number of values in the final calculation of a statistic that are free to vary. Consider, for example, the statistic $s^2$. To calculate the $s^2$ of a random sample, we must first calculate the mean of that sample and then compute the sum of the several squared deviations from that mean. While there will be n such squared deviations, only (n - 1) of them are, in fact, free to assume any value whatsoever. This is because the final squared deviation from the mean must include the one value of X such that the sum of all the Xs divided by n will equal the obtained mean of the sample. All of the other (n - 1) squared deviations from the mean can, theoretically, have any values whatsoever. For these reasons, the statistic $s^2$ is said to have only (n - 1) degrees of freedom.
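A small numeric check may make the degrees-of-freedom argument concrete. The four data values below are made up for illustration. The point is that deviations from the sample mean always sum to zero, so once n - 1 of them are known the last one is forced.

```python
import statistics

data = [3.0, 7.0, 8.0, 12.0]  # hypothetical sample, n = 4
mean = statistics.fmean(data)
deviations = [x - mean for x in data]

# Deviations from the sample mean sum to zero, so the last deviation
# is determined by the other n - 1.
implied_last = -sum(deviations[:-1])
print(deviations[-1], implied_last)

# Sample variance divides the sum of squared deviations by n - 1.
s2 = sum(d * d for d in deviations) / (len(data) - 1)
print(s2, statistics.variance(data))
```

The two printed pairs should match: the last deviation equals the value implied by the others, and the hand-computed $s^2$ agrees with `statistics.variance`, which also divides by n - 1.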