Sampling Distribution for a Proportion

Start with a population (adult Americans) and a binary variable (whether they believe in God). The key parameter is the population proportion p. In this case let us suppose 82% of Americans believe in God.

Take a sample of 400 Americans and ask whether they believe in God. The key statistic is the sample proportion $\hat{p}$ ("pee-hat"): the number of yes answers divided by the total (400), that is, the proportion of the sample who believe in God.

Each sample has a different $\hat{p}$. If we consider all possible samples, we can make a histogram of those values: the sampling distribution of the random variable $\hat{P}$.

The sampling distribution of $\hat{P}$ has
1. A mean of p
2. A standard deviation (standard error) of $\sigma_{\hat{P}} = \sqrt{p(1-p)/n}$
3. A normal distribution if n is big enough.

Wait, but parameters are supposed to be Greek letters and statistics are supposed to be Roman, right? p ought to be some cool Greek letter like π (in some advanced textbooks it's θ), but somehow people don't follow that convention here. Notice, though, that the statistic has a decoration (the hat), just like the sample mean (the bar over x).

Of course the mean is p. That is just saying the average sample would have 82% answering yes.

The standard deviation of a sampling distribution (i.e. the standard deviation of a statistic) is always called the standard error.

The Fine Print

There were 3 assumptions underlying the last slide. None is exactly true in real situations, but Rules of Thumb say when they are close enough.

(a) SRS: The sample is assumed to be a Simple Random Sample.

The formula for the standard deviation assumes sampling with replacement, so that successive individuals sampled are independent. This is close enough if the population is much larger than the sample:

(b) Independence/Large Population Assumption: The population size is at least 20 times the sample size.

As n gets larger the distribution gets more normal, but it happens faster if p is close to .5. We can use the normal distribution to model the sample proportion if

(c) Normality Assumption/Rule of 15: the numbers np and $n(1-p)$ are both at least 15.

We will learn a number of rules of thumb for when we can take assumptions as being met.
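The mean, the standard error, and the two rules of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the function name `proportion_sampling_summary` and the dictionary keys are hypothetical, and the thresholds (20 times the sample size, at least 15 expected yes and no answers) are simply the ones stated on the slide.

```python
from math import sqrt

def proportion_sampling_summary(p, n, population_size=None):
    """Mean, standard error, and rule-of-thumb checks for a sample proportion.

    p: population proportion, n: sample size.
    population_size is optional; when it is unknown, only the minimum
    population required by the large population rule is reported.
    """
    expected_yes = n * p          # average number of yes answers in a sample
    expected_no = n * (1 - p)     # average number of no answers in a sample
    min_population = 20 * n       # large population rule from the slide
    return {
        "mean": p,
        "std_error": sqrt(p * (1 - p) / n),
        "min_population": min_population,
        "large_population_ok": (
            None if population_size is None else population_size >= min_population
        ),
        "expected_yes": expected_yes,
        "expected_no": expected_no,
        "rule_of_15_ok": expected_yes >= 15 and expected_no >= 15,
    }

# The running example: p = 0.82 and a sample of n = 400.
summary = proportion_sampling_summary(p=0.82, n=400)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Leaving `population_size` out mirrors the usual situation where the exact population size is unknown and you only need to state how big it has to be.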

Populations are generally big, so the large population assumption is almost always obviously met. I will expect you to be able to say, if the sample size is, say, 200, that the population needs to be at least 4,000. In the rare situations where the large population rule is not met, there is a slightly more complicated formula for the standard deviation that works fine, so failure of this assumption is a pretty mild problem.

The numbers np and $n(1-p)$ are the average or expected number of yes and no answers in the sample. If p is close to 1/2, this means n just has to be a little more than 30. If p is close to 0 or close to 1, n needs to be quite large.

An Example

82% of adult Americans believe in God. Take a SRS of 400 adult Americans and ask if they believe. What are the mean and standard deviation of the proportion in your sample who do? What is the chance that less than 80% in your sample will believe in God? Between 80 and 90%?

It says simple random sample, so SRS assumption: Met.

The mean is p = .82.

Check the large population assumption: there need to be at least 400 × 20 = 8000 adult Americans. Obviously true, so Met.

$\sigma_{\hat{P}} = \sqrt{p(1-p)/n} = \sqrt{.82 \times .18/400} = 0.0192$.

Check normality assumption/rule of 15: np = 400 × .82 = 328 ≥ 15 and $n(1-p)$ = 400 × .18 = 72 ≥ 15, so $\hat{P}$ is normal. Met.

Checking the large population assumption was typical. I want to see that you know how big the population needs to be. Usually you do not know exactly how big the population is, but it is generally obvious that it is big enough.

Notice the numbers np and $n(1-p)$ were the average number of yes and no answers you would expect in your sample.

More Example

82% of adult Americans believe. Take a SRS of 400 adult Americans and ask if they believe. What are the mean and standard deviation of the proportion in your sample who do? What is the chance that less than 80% in your sample will believe? Between 80 and 90%?

To find the probability that it is less than 80%, since $\hat{P}$ is normal:

$P(\hat{P} < .8)$ = normdist(.8, .82, sqrt(.82*.18/400), 1) = 14.9%

Between 80% and 90%:

$P(.8 < \hat{P} < .9)$ = normdist(.9, .82, sqrt(.82*.18/400), 1) - normdist(.8, .82, sqrt(.82*.18/400), 1) = 85.1%

Notice I put the formula for the s.d. into normdist, not just the rounded result. Normdist calculations are extremely sensitive to the standard deviation, and you can be quite far off if you round it off too much. So enter the formula directly into Excel and do not round in the middle of this calculation.

The Example - The Big Picture

So we saw that if we take many samples of n = 400 from a population with proportion of successes p = .82 and compute the sample proportion $\hat{p}$ for each one, these values of $\hat{P}$ will have a normal distribution with mean µ = .82 and standard deviation σ = .019.

Another Example

You know the answer to 75% of the questions your philosophy professor might ask. View the 50 questions on the test as a simple random sample of all questions s/he might ask. Find the mean and s.d. of the proportion you will get right. What is your chance of getting over 90%? Between 80 and 90? Between 70 and 80?

SRS: Met.

Check large population assumption: need at least 20 × 50 = 1000 questions s/he could ask. Lots of questions out there, seems reasonable. Met.

Check rule of 15: np = 50 × .75 = 37.5 ≥ 15, but $n(1-p)$ = 50 × .25 = 12.5, which is < 15. Cannot assume $\hat{P}$ is normal. Not Met. Continue with the calculation, but treat the results with skepticism.

Compute mean and s.d. The mean is p = .75.

$\sigma_{\hat{P}} = \sqrt{p(1-p)/n} = \sqrt{.75 \times .25/50} = 0.0612$.
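The normdist calculations for the sample of 400 Americans can also be reproduced outside Excel. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for Excel's cumulative normdist; the variable names are illustrative only. It also repeats the first probability with the s.d. rounded to 0.02, to show how much rounding in the middle can move the answer.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.82, 400
std_error = sqrt(p * (1 - p) / n)           # unrounded s.d., about 0.0192

# Normal model for the sample proportion (mean p, s.d. std_error).
p_hat = NormalDist(mu=p, sigma=std_error)

# cdf(x) plays the role of normdist(x, mean, sd, 1) in the slides.
below_80 = p_hat.cdf(0.80)
between_80_90 = p_hat.cdf(0.90) - p_hat.cdf(0.80)

# Same first calculation, but with the s.d. rounded to two decimals.
below_80_rounded = NormalDist(mu=p, sigma=0.02).cdf(0.80)

print(f"P(p-hat < 0.80)                  = {below_80:.3f}")
print(f"P(0.80 < p-hat < 0.90)           = {between_80_90:.3f}")
print(f"P(p-hat < 0.80), s.d. set to 0.02 = {below_80_rounded:.3f}")
```

With the rounded s.d. the first probability shifts by roughly a percentage point, which is the sensitivity the slide warns about.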

More Other Example

You get 75% of questions right, and the test has 50 questions. Find the mean and s.d. of $\hat{P}$. Chance over 90%? Between 80 and 90? Between 70 and 80?

The distribution of $\hat{P}$ is roughly normal with mean .75 and $\sigma_{\hat{P}} = \sqrt{.75 \times .25/50} = .0612$.

$P(\hat{P} > .9)$ = 1 - normdist(.9, .75, sqrt(.75*.25/50), 1) = .715%

$P(.8 < \hat{P} < .9)$ = normdist(.9, .75, sqrt(.75*.25/50), 1) - normdist(.8, .75, sqrt(.75*.25/50), 1) = 20.0%

$P(.7 < \hat{P} < .8)$ = normdist(.8, .75, sqrt(.75*.25/50), 1) - normdist(.7, .75, sqrt(.75*.25/50), 1) = 58.6%

Sampling Distribution

In general, consider a population and a variable. Take a simple random sample and compute some statistic like the mean or a proportion. Each time you do this you get a different answer, so it is a random variable! If you consider the value of this statistic for every possible sample, you get a distribution, the sampling distribution. We want to know its mean, its standard deviation, and the shape of its histogram.

The population distribution is the values of the variable in the population (the 82% of all adult Americans who believe).

The data distribution is the values of the variable in one particular sample (maybe 320 yes answers in a sample of 400).

The sampling distribution is the different values of the statistic (like $\hat{P}$) in different samples.

One of the trickiest points in the class is the fact that we are taking the statistic as a random variable. This means that, in a sense, our population has become the set of all possible samples out of the original population. If you can keep these levels straight (the original population, one particular sample, and the population of all samples), you will own this course. If you can't, be patient: your brain takes time to stretch, but it gets there.
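The idea of a sampling distribution, one value of the statistic per sample, can be illustrated with a small simulation. This is a sketch under stated assumptions, not something from the lecture: the helper `simulate_p_hats` is hypothetical, each respondent is drawn independently with probability p of answering yes (which the large population assumption is meant to justify), and 5000 samples stand in for "all possible samples".

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each respondent says yes independently with probability p.
        yes_count = sum(rng.random() < p for _ in range(n))
        p_hats.append(yes_count / n)
    return p_hats

p, n = 0.82, 400
p_hats = simulate_p_hats(p, n, num_samples=5000)

print(f"mean of simulated p-hats: {mean(p_hats):.4f} (theory: {p})")
print(f"s.d. of simulated p-hats: {pstdev(p_hats):.4f} "
      f"(theory: {sqrt(p * (1 - p) / n):.4f})")
```

Each entry of `p_hats` corresponds to one data distribution (one sample of 400), while the whole list approximates the sampling distribution, so its mean and s.d. should land close to p and the standard error formula.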

Lecture 16 Key Points

After watching this lecture you should be able to:

Know what we mean by the sampling distribution of $\hat{P}$, and what it represents

Calculate the mean and standard deviation of $\hat{P}$

Check the Independence/Large Population assumption and know what it tells you (that the s.d. formula is correct)

Check the Normality/Rule of 15 assumption and know what it tells you (that you can use normdist)

Calculate probabilities of $\hat{P}$ using normdist.