Point and Interval Estimates




Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods:

1. Point estimate: Summarize the sample by a single number that is an estimate of the population parameter;
2. Interval estimate: A range of values within which, we believe, the true parameter lies with high probability.

Example. Suppose I wanted to estimate the mean height of all female students at UNC. I took a sample in this class and the sample mean was $\bar{x} = 65.5$ (inches). So the obvious thing to do is to take that as an estimate for the population mean. But I didn't have to use the sample mean. I could have taken the sample median (65) or the sample mode (63). It makes sense to ask which is better.

What properties make a good point estimator?

1. It's desirable that the sampling distribution be centered around the true population parameter. An estimator with this property is called unbiased.
2. It's desirable that our chosen estimator have a small standard error in comparison with other estimators we might have chosen.

The sample mean is exactly unbiased (whereas the sample median may not be), and also, if the true population is normal, the sample mean has a smaller standard error than the sample median. Both of these would indicate that the sample mean is preferable to the sample median as an estimator of the population mean. However, there are other properties that could nevertheless make the median preferable (e.g. it's more resistant to outliers).
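The claim about standard errors can be checked with a small simulation. What follows is only a sketch: the population standard deviation (3 inches), the sample size (25), the number of repetitions and the random seed are all assumptions chosen for illustration, not values from the class data.

```python
import random
import statistics

def simulate_standard_errors(n=25, reps=4000, mu=65.5, sigma=3.0, seed=1):
    """Estimate the standard errors of the sample mean and sample median
    by drawing many samples from a normal population (assumed parameters)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of an estimator across repeated samples
    # is its (simulated) standard error.
    return statistics.stdev(means), statistics.stdev(medians)

se_mean, se_median = simulate_standard_errors()
print(f"SE of sample mean:   {se_mean:.3f}")
print(f"SE of sample median: {se_median:.3f}")
```

Under these assumptions the simulated standard error of the mean should come out close to sigma divided by the square root of n, and the median's should be noticeably larger.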

In the case of a binomial proportion, the obvious point estimator is the sample proportion. For example, consider our example about President Obama's popularity rating (class posted 03/05/09, Chapter 6 material). In this example, 68% of respondents gave Obama a positive rating after he had been in office for one month (the answer could be different if we repeated the poll now). The most natural interpretation of this is that 68% or 0.68 is a statistic which serves as an estimator of the true but unknown proportion of people who would have approved of Obama if the whole population had been surveyed. It seems obvious that we would use the sample proportion as an estimator of the population proportion, but we don't have to.

Now let's turn to interval estimates. The simplest way to introduce this is through an example.

Example. In a college of 25,000 students, the administration would like to know for what proportion of students both parents had completed college. A sample of 350 students was drawn at random and in that sample, 276 of the students said that both their parents had completed college. The sample proportion is $276/350 = 0.789$ (or 78.9%), so by the same logic as in the last example, it makes sense to use that number also as an estimator of the population proportion (in this case the population is all 25,000 students at this college). But how accurate is that?

The idea: Consider the interval

Sample Proportion ± 1.96 × Standard Error (*)

Why would this work?

Sample Proportion ± 1.96 × Standard Error (*)

First let's calculate $\Pr\{-1.96 < X < 1.96\}$ when X is a standard normal random variable (mean 0, standard deviation 1). From the normal table, for z = 1.96 we have left-tail probability 0.9750. For z = −1.96 we have left-tail probability 0.0250. The difference is 0.975 − 0.025 = 0.95. But given that the sample proportion has an approximately normal distribution, this means that the probability that the sample proportion lies within 1.96 standard errors of its true mean (the population proportion) is also 0.95. Or in other words, the probability that the interval (*) includes the true proportion is 0.95. This is what we mean by saying that the interval we calculated is a 95% confidence interval.

A side comment. Earlier in the course, we said that there is a 95% chance that a normal random variable lies within two standard deviations of the mean (this is part of the empirical rule first discussed in Chapter 2; although at that time we didn't use the words "normal distribution", that's actually what the empirical rule refers to). So why have we now replaced the number 2 with 1.96? Actually, 1.96 is more accurate. If we repeat the above normal probability calculations with z = ±2 instead of z = ±1.96, the probability becomes .9772 − .0228 = .9544. That's still quite close to 95%, and 1.96 is quite close to 2, so in practice, it doesn't make much difference whether we use 2 or 1.96 standard deviations. But at this stage of the course, we're trying to be more precise about things than we were earlier on, hence the change.
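The two normal-table lookups above can be reproduced in a few lines. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `central_probability` is made up for illustration.

```python
from statistics import NormalDist

def central_probability(z):
    """Pr{-z < X < z} for a standard normal random variable X."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Left-tail probability at +z minus left-tail probability at -z.
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1.96, 2.0):
    print(f"z = {z}: Pr(-z < X < z) = {central_probability(z):.4f}")
```

The computed value for z = 2 may differ from the table-based figure in the last digit, since the table entries are themselves rounded to four decimal places.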

Confidence interval for a population proportion

To construct a confidence interval to estimate the proportion p of a population that has a particular characteristic (e.g. supporters of President Obama):

Step 1: Take a sample of size n and calculate $\hat{p}$ (pronounced p-hat) as the sample proportion of people who have that characteristic (e.g. saying they support Obama in a survey).

Step 2: Calculate the standard error $SE = \sqrt{\hat{p}(1-\hat{p})/n}$. Note: The formula should really be $\sqrt{p(1-p)/n}$, but we don't know p, so we use $\hat{p}$ instead.

Step 3: The 95% confidence interval is $(\hat{p} - 1.96\,SE,\ \hat{p} + 1.96\,SE)$.

Example from text. In one question of the GSS in 2000, 1154 people were asked whether they would be willing to pay higher prices to support the environment. 518 said yes. Find a 95% confidence interval for p, the true proportion in the whole population who would be willing to pay higher prices to support the environment.

Step 1: $\hat{p} = 518/1154 = 0.449$ to 3 decimal places.

Step 2: The standard error is $\sqrt{0.449 \times 0.551 / 1154} = 0.0146$.

Step 3: $1.96 \times 0.0146 = 0.029$ to 3 decimal places. The 95% confidence interval is (0.420, 0.478).

In practice, we wouldn't usually express this to three decimal places and would simply say that we believe the true proportion of people who support the proposition (i.e. who would be willing to pay higher prices to protect the environment) is between 42% and 48%.
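The three steps can be wrapped in a short function and checked against the GSS numbers. This is a sketch only; the function name `proportion_ci` and its return layout are illustrative choices, not anything from the textbook.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Return (p_hat, standard error, (lower, upper)) for a proportion."""
    p_hat = successes / n                  # Step 1: sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)     # Step 2: standard error, using p_hat for p
    margin = z * se                        # Step 3: margin of error
    return p_hat, se, (p_hat - margin, p_hat + margin)

# GSS example: 518 of 1154 respondents said yes.
p_hat, se, (lower, upper) = proportion_ci(518, 1154)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The code keeps full precision for $\hat{p}$ rather than rounding it to 0.449 first; in this example the interval rounded to three decimal places comes out the same either way.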

A side comment. In another GSS survey people were asked whether they would support legislation to force industry to adopt more environment-friendly policies. This time close to 80% answered yes. Yet it seems likely that tighter regulation on industry will result in higher prices for consumers (this will certainly be true if you ask the industry representatives). This is another case where the wording of a question arguably influences the answer to a much greater extent than standard error calculations indicate.

Sample size condition

For these calculations to be valid (standard error formula including $\hat{p}$ in place of p, normal approximation for the distribution of $\hat{p}$) we require the sample size to be reasonably large. In practice, it is sufficient that $n\hat{p} \ge 15$ and $n(1-\hat{p}) \ge 15$.
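Since $n\hat{p}$ is just the number of "yes" answers and $n(1-\hat{p})$ the number of "no" answers, the condition is easy to check directly. A minimal sketch (the function name `large_enough` is hypothetical):

```python
def large_enough(successes, n, threshold=15):
    """Check the sample size condition: at least `threshold` successes
    and at least `threshold` failures in the sample."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

print(large_enough(518, 1154))  # GSS example: 518 yes, 636 no
print(large_enough(276, 350))   # college example: 276 yes, 74 no
```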

Confidence intervals with other confidence coefficients

So far we have worked with 95% confidence intervals, signifying that there is supposed to be a 95% probability that the interval includes the true population parameter. However, there's nothing special here about the probability 95%; we could equally well work with 90%, or 99%, or any other probability we care to specify.

If we want a 99% confidence interval, we make the same calculation but replace 1.96 by 2.58.

If we want a 90% confidence interval, we make the same calculation but replace 1.96 by 1.645.

See Table 7.2, page 328.
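These multipliers come from the standard normal distribution: for a confidence coefficient c, we want the z-value that leaves probability (1 − c)/2 in each tail. As a sketch, they can be recovered with `NormalDist.inv_cdf` from Python's standard library (the helper name `critical_value` is an illustrative choice):

```python
from statistics import NormalDist

def critical_value(confidence):
    """z such that Pr{-z < X < z} = confidence for a standard normal X."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

To three decimal places the 99% value is 2.576, which is the 2.58 quoted above after rounding.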

Example. In an earlier example, we considered a sample of 350 students from a college (with a total student population of 25,000) and asked for what fraction of students it was true that both their parents had been to college. In that case, the sample proportion was .789. Suppose we wanted a 90% confidence interval. The standard error is $\sqrt{.789(1-.789)/350} = .0218$; multiplying by 1.645, the margin of error is 0.036 to three decimal places. Thus the 90% confidence interval is (.753, .825).

Interpretation of a confidence interval

Continue the previous example: Does the answer mean there is a 90% chance that the true proportion is between .753 and .825?

Strictly speaking, such a statement doesn't make sense: we're talking about a finite population of parents; either they went to college or they didn't; what does it mean to talk of a 90% chance?

What the 90% confidence statement really means is that in many repetitions of the procedure, the interval will cover the true value 90% of the time.

Example: Let's suppose the true proportion is 80%. I ran 10 simulation experiments where I generated a binomial random variable with n = 350, p = 0.8. The results were: 272 285 266 289 287 278 269 277 285 271.

I then constructed a 90% confidence interval for each of the ten hypothetical samples. The results were:

X     Lower Bound   Upper Bound
272   0.741         0.814
285   0.780         0.848
266   0.722         0.798
289   0.792         0.859
287   0.786         0.854
278   0.759         0.830
269   0.731         0.806
277   0.756         0.827
285   0.780         0.848
271   0.738         0.811

The interval is slightly different each time, and in fact, in 9 out of the 10 cases the interval covered the true value 0.8.
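The table can be rebuilt from the ten simulated counts. The sketch below applies the same 90% interval formula (with the multiplier 1.645) to each count and tallies how many intervals contain 0.8; the variable and function names are illustrative.

```python
from math import sqrt

counts = [272, 285, 266, 289, 287, 278, 269, 277, 285, 271]
n, true_p, z = 350, 0.8, 1.645  # sample size, assumed true proportion, 90% multiplier

def interval(x, n, z):
    """Confidence interval for a proportion from x successes out of n."""
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

covered = 0
for x in counts:
    lower, upper = interval(x, n, z)
    hit = lower <= true_p <= upper
    covered += hit
    print(f"{x}  {lower:.3f}  {upper:.3f}  {'covers' if hit else 'misses'}")

print(f"{covered} of {len(counts)} intervals cover {true_p}")
```

The one interval that misses is the one for X = 266, whose upper bound falls just below 0.8.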

Conclusion: The interval is random. The confidence coefficient (in this case, 90%) represents the long-run probability that the interval would cover the true value in many repetitions of the sampling procedure.
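Ten repetitions is too few to see the long-run behaviour, so here is a sketch that repeats the procedure a couple of thousand times. The number of repetitions and the seed are arbitrary assumptions, and each binomial count is generated as a sum of independent yes/no draws using only the standard library.

```python
import random
from math import sqrt

def coverage(n=350, p=0.8, z=1.645, reps=2000, seed=2009):
    """Fraction of simulated 90% intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One binomial(n, p) draw: count successes in n independent trials.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"Proportion of intervals covering the true value: {rate:.3f}")
```

The result should land near 0.90, though not exactly on it: there is simulation noise, and the interval itself relies on a normal approximation.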