Math C067 Sampling Distributions




Math C067 Sampling Distributions: Sample Mean and Sample Proportion
Richard Beigel
Some time between April 16, 2007 and April 16, 2007

Examples of Sampling

A pollster may try to estimate the proportion of voters in favor of a particular presidential candidate by polling a random sample of voters.

A statistician may want to estimate the average starting income of new college graduates by finding the mean of a random sample of new college graduates.

Numbers

Population size: N
Sample size: n (usually much smaller than N)

Sampling with Replacement

Choose 1 person, choose a 2nd person (who could be the same as the first), choose a 3rd person (who could be the same as the first and/or second), ..., choose an nth person.

Sampling without Replacement

Choose n different people.

Rules of thumb

Our formulas will be correct for sampling with replacement.
Our formulas will be approximately correct for sampling without replacement provided that the population size N is large compared with the sample size n.
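The two sampling schemes can be tried out on a toy population. The sketch below is only an illustration: the helper names, the population of ten numbered people, and the fixed seed are assumptions made for the example and are not part of the slides.

```python
import random


def sample_with_replacement(population, n, rng):
    """Draw n people; the same person may be chosen more than once."""
    return rng.choices(population, k=n)


def sample_without_replacement(population, n, rng):
    """Draw n different people."""
    return rng.sample(population, n)


rng = random.Random(0)            # fixed seed so the run is repeatable
population = list(range(1, 11))   # toy population, N = 10 (illustrative only)

with_repl = sample_with_replacement(population, 5, rng)
without_repl = sample_without_replacement(population, 5, rng)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

With such a tiny population, repeats in the first sample are common. When N is much larger than n, repeats become rare and the two schemes behave almost identically, which is the idea behind the rule of thumb.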

Sample Mean

Suppose that we are trying to estimate the mean of a random variable that is based on a large population, e.g., we want to estimate average income.

Random variable whose mean we wish to estimate: \(X\)
Expected value (mean) of \(X\): \(\mu\)
Standard deviation of \(X\): \(\sigma\)

We define a random variable \(\bar{X}\) called the Sample Mean via the following random process:

Evaluate \(X\) at \(n\) random points, i.e., choose \(n\) random people and find out how much each of them earns.
Find the average of those \(n\) values.

Formulas

The expected value of \(\bar{X}\) is the same as the expected value of \(X\), i.e.,
\[ \mu_{\bar{X}} = \mu \]
(correct even if we sample without replacement).

The variance of \(\bar{X}\) is smaller than the variance of \(X\) by a factor of \(n\) (the sample size), i.e.,
\[ \sigma_{\bar{X}}^2 = \frac{1}{n}\,\sigma^2 \]

The standard deviation of \(\bar{X}\) is smaller than the standard deviation of \(X\) by a factor of \(\sqrt{n}\) (the square root of the sample size), i.e.,
\[ \sigma_{\bar{X}} = \frac{1}{\sqrt{n}}\,\sigma \]
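The formulas for the Sample Mean can be checked with a short derivation. The sketch below assumes sampling with replacement, so that the \(n\) observations \(X_1, \dots, X_n\) (a notation introduced here, not used in the slides) are independent and each has mean \(\mu\) and variance \(\sigma^2\).

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\mu_{\bar{X}} = E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\cdot n\mu = \mu \\
\sigma_{\bar{X}}^2 = \operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^2}\cdot n\sigma^2 = \frac{1}{n}\,\sigma^2 \\
\sigma_{\bar{X}} &= \sqrt{\frac{1}{n}\,\sigma^2} = \frac{1}{\sqrt{n}}\,\sigma
\end{align*}
```

The expectation step uses only linearity, which is why the formula for the mean holds with or without replacement. The variance step uses independence, which is why it is exact only with replacement.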

Example: Test scores (sampling with replacement)

One million students take a standardized test. Their average score is 500. The standard deviation of their scores is 100. A statistician chooses 400 students at random with replacement and computes the mean \(\bar{X}\) of their test scores. What is the expected value of \(\bar{X}\)? What is the standard deviation of \(\bar{X}\)?

\[ \mu_{\bar{X}} = \mu = 500 \]
\[ \sigma_{\bar{X}} = \frac{1}{\sqrt{n}}\,\sigma = \frac{1}{\sqrt{400}} \cdot 100 = \frac{1}{20} \cdot 100 = 5 \]

Note: these formulas do not depend in any way on the population size N.

Example: Test scores (sampling without replacement)

One million students take a standardized test. Their average score is 500. The standard deviation of their scores is 100. A statistician chooses a group of 400 different students at random and computes the mean \(\bar{X}\) of their test scores. What is the expected value of \(\bar{X}\)? What is the standard deviation of \(\bar{X}\)?

\[ \mu_{\bar{X}} = \mu = 500 \]
This formula works with or without replacement.

Because the population size is large, we can use the same formula we used with replacement to estimate the standard deviation.
\[ \sigma_{\bar{X}} \approx \frac{1}{\sqrt{n}}\,\sigma = \frac{1}{\sqrt{400}} \cdot 100 = \frac{1}{20} \cdot 100 = 5 \]

Example: Average income

10 million people work in Metropolis. Their average income is $100,000. The standard deviation of their incomes is $2,500. A statistician chooses 100 different Metropolis workers at random and computes their average income \(I\). Was the sample chosen with replacement or without replacement? What is the expected value of \(I\)? What is the standard deviation of \(I\)? What is the probability that \(I\) is between $99,500 and $100,500?

The workers are all different, so the sample was chosen without replacement. Because the population is large, the with-replacement formula still gives a good estimate.

\(I\) is just a sample mean, so we can call it \(\bar{X}\). (But the problem statement can call it anything. It's up to you to recognize that \(I\) is the sample mean.)

\[ \mu_I = \mu_{\bar{X}} = \mu = 100000 \]
\[ \sigma_I = \sigma_{\bar{X}} \approx \frac{1}{\sqrt{n}}\,\sigma = \frac{1}{\sqrt{100}} \cdot 2500 = \frac{1}{10} \cdot 2500 = 250 \]

The Sample Mean is approximately normally distributed, so we can use the methods of chapter 6 to estimate \(\Pr[99500 \le \bar{X} \le 100500]\). We are looking at 2 standard deviations on each side of the mean (500/250 = 2), so
\[ \Pr[99500 \le \bar{X} \le 100500] \approx 0.4772 + 0.4772 = 0.9544 \]

Note: You can approximate a Sample Mean by a Normal Distribution provided that \(n \ge 30\).
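The arithmetic in these examples can be reproduced in a few lines. This is a minimal sketch that uses the standard normal CDF (computed from math.erf) in place of the printed table; the function names are made up for this illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_mean_sd(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def prob_sample_mean_between(mu, sigma, n, lo, hi):
    """Normal approximation to Pr[lo <= sample mean <= hi]."""
    sd = sample_mean_sd(sigma, n)
    return normal_cdf((hi - mu) / sd) - normal_cdf((lo - mu) / sd)


# Test scores: sigma = 100, n = 400
print(sample_mean_sd(100, 400))

# Average income: mu = 100000, sigma = 2500, n = 100
print(sample_mean_sd(2500, 100))
p = prob_sample_mean_between(100_000, 2500, 100, 99_500, 100_500)
print(round(p, 4))
```

The computed probability is about 0.9545; the table-based value 0.9544 differs only because each table entry is rounded to four places before doubling.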

Sample Proportion

Suppose that a fraction (proportion) \(p\) of a population favors candidate C for president.

We take a random sample consisting of \(n\) people.

Let \(X\) be a random variable that denotes the number of people in the sample who favor candidate C. Then the distribution of \(X\) is \(B(n, p)\).

Let \(\hat{P}\) be a random variable that denotes the fraction (proportion) of people in the sample that favor candidate C. Then
\[ \hat{P} = \frac{1}{n}\,X \]
\(\hat{P}\) is called the Sample Proportion.

\[ E(X) = np, \qquad E(\hat{P}) = p \]
\[ \operatorname{Var}(X) = np(1-p), \qquad \operatorname{Var}(\hat{P}) = \frac{p(1-p)}{n} \]
\[ \sigma_X = \sqrt{np(1-p)}, \qquad \sigma_{\hat{P}} = \sqrt{\frac{p(1-p)}{n}} \]

Example: An Election

Suppose that 55 percent of the voters favor Candidate C for president. If a random sample of 1000 voters is chosen, estimate the probability that between 52 and 58 percent of the voters in the sample will favor Candidate C.

To answer this question we can work with \(B(n, p)\) and estimate \(\Pr[520 \le X \le 580]\), or we can work directly with the Sample Proportion and estimate \(\Pr[0.52 \le \hat{P} \le 0.58]\). Both random variables \(X\) and \(\hat{P}\) are approximately normally distributed.

Solve this on your own and then compare your answer to the answers on the next page.
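The Sample Proportion formulas follow from the binomial formulas by scaling. The short derivation below uses only \(\hat{P} = \frac{1}{n}X\) and the standard facts \(E(X) = np\), \(\operatorname{Var}(X) = np(1-p)\) for \(X \sim B(n, p)\).

```latex
\begin{align*}
E(\hat{P}) &= E\!\left(\tfrac{1}{n}X\right) = \tfrac{1}{n}\,E(X) = \tfrac{1}{n}\cdot np = p \\
\operatorname{Var}(\hat{P}) &= \operatorname{Var}\!\left(\tfrac{1}{n}X\right)
  = \tfrac{1}{n^2}\operatorname{Var}(X) = \tfrac{1}{n^2}\cdot np(1-p) = \frac{p(1-p)}{n} \\
\sigma_{\hat{P}} &= \sqrt{\operatorname{Var}(\hat{P})} = \sqrt{\frac{p(1-p)}{n}} = \frac{1}{n}\,\sigma_X
\end{align*}
```

Dividing by \(n\) rescales both the mean and the standard deviation by the same factor, so a question about \(X\) and the matching question about \(\hat{P}\) involve the same number of standard deviations.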

Solved Example: An Election

Suppose that 55 percent of the voters favor Candidate C for president. If a random sample of 1000 voters is chosen, estimate the probability that between 52 and 58 percent of the voters in the sample will favor Candidate C.

To answer this question we can work with \(B(n, p)\) and estimate \(\Pr[520 \le X \le 580]\), or we can work directly with the Sample Proportion and estimate \(\Pr[0.52 \le \hat{P} \le 0.58]\). Both random variables \(X\) and \(\hat{P}\) are approximately normally distributed.

Solution using Binomial Distribution

\[ E[X] = np = 1000 \cdot 0.55 = 550 \]
\[ \operatorname{Var}[X] = npq = np(1-p) = 1000 \cdot 0.55 \cdot 0.45 = 247.5 \]
\[ \sigma_X = \sqrt{\operatorname{Var}[X]} = \sqrt{247.5} \approx 15.73 \]

Because \(X\) takes only whole-number values, we may widen the interval by 0.5 on each side (the continuity correction) before applying the normal approximation:
\[ \Pr[520 \le X \le 580] = \Pr[519.5 \le X \le 580.5] = \Pr[519.5 \le X \le 550] + \Pr[550 < X \le 580.5] \]

The first interval consists of \(\frac{550 - 519.5}{15.73} \approx 1.94\) standard deviations. The second interval consists of \(\frac{580.5 - 550}{15.73} \approx 1.94\) standard deviations. Therefore
\[ \Pr[520 \le X \le 580] \approx 0.4738 + 0.4738 = 0.9476 \]

Did you get the correct answer? If not, go back and try again.

Solution using Sample Proportion

\[ E[\hat{P}] = p = 0.55 \]
\[ \operatorname{Var}[\hat{P}] = \frac{pq}{n} = \frac{p(1-p)}{n} = \frac{0.55 \cdot 0.45}{1000} = 0.0002475 \]
\[ \sigma_{\hat{P}} = \sqrt{\operatorname{Var}[\hat{P}]} = \sqrt{0.0002475} \approx 0.01573 \]
\[ \Pr[0.52 \le \hat{P} \le 0.58] = \Pr[0.52 \le \hat{P} \le 0.55] + \Pr[0.55 < \hat{P} \le 0.58] \]

The first interval consists of \(\frac{0.55 - 0.52}{0.01573} \approx 1.91\) standard deviations. The second interval consists of \(\frac{0.58 - 0.55}{0.01573} \approx 1.91\) standard deviations. Therefore
\[ \Pr[0.52 \le \hat{P} \le 0.58] \approx 0.4719 + 0.4719 = 0.9438 \]

(The answer we got by using the Binomial Distribution was more accurate because we expanded the interval by 0.5 in both directions.)

Did you get the correct answer? If not, go back and try again.
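The claim that widening the interval by 0.5 gives the more accurate answer can be checked against the exact binomial probability. The following is a rough sketch, not part of the original solution: it sums the exact probabilities (computed through logarithms to avoid overflow) and compares them with the normal approximation with and without the continuity correction. All function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_pmf(k, n, p):
    """Exact Pr[X = k] for X ~ B(n, p), computed via logs for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def exact_binomial_prob(n, p, lo, hi):
    """Exact Pr[lo <= X <= hi] by summing the probability mass function."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


def normal_approx_prob(n, p, lo, hi, continuity=True):
    """Normal approximation to Pr[lo <= X <= hi], optionally widened by 0.5."""
    mu = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    shift = 0.5 if continuity else 0.0
    return normal_cdf((hi + shift - mu) / sd) - normal_cdf((lo - shift - mu) / sd)


n, p = 1000, 0.55
exact = exact_binomial_prob(n, p, 520, 580)
with_cc = normal_approx_prob(n, p, 520, 580, continuity=True)
without_cc = normal_approx_prob(n, p, 520, 580, continuity=False)

print("exact binomial:          ", exact)
print("normal, with correction: ", with_cc)
print("normal, no correction:   ", without_cc)
```

The uncorrected normal value corresponds to the Sample Proportion solution, since dividing every quantity by n does not change the number of standard deviations.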

Example: 1936 Presidential Election

Suppose that 61% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that a majority of your sample favors Candidate R?

Try this yourself before reading on.

Solved Example: 1936 Presidential Election

Suppose that 61% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that a majority of your sample favors Candidate R?

Solution using Binomial Distribution

Let \(X\) denote the number of voters in your sample who favor Candidate R. The distribution of \(X\) is \(B(3000, 0.61)\). We wish to estimate \(\Pr[X > 1500]\), which is the same as \(\Pr[X \ge 1501]\).

\[ E[X] = np = 3000 \cdot 0.61 = 1830 \]
\[ \operatorname{Var}[X] = npq = np(1-p) = 3000 \cdot 0.61 \cdot 0.39 = 713.7 \]
\[ \sigma_X = \sqrt{\operatorname{Var}[X]} = \sqrt{713.7} \approx 26.72 \]
\[ \Pr[X \ge 1501] = \Pr[X \ge 1500.5] = \Pr[1500.5 \le X] = \Pr[1500.5 \le X \le 1830] + \Pr[1830 < X] \]

The first interval consists of approximately \(\frac{1830 - 1500.5}{26.72} \approx 12.33\) standard deviations. The second interval consists of half of the distribution. Therefore,
\[ \Pr[X \ge 1501] \approx 0.5 + 0.5 = 1.0 \]

Solution using Sample Proportion

\[ E[\hat{P}] = p = 0.61 \]
\[ \operatorname{Var}[\hat{P}] = \frac{pq}{n} = \frac{p(1-p)}{n} = \frac{0.61 \cdot 0.39}{3000} = 0.0000793 \]
\[ \sigma_{\hat{P}} = \sqrt{\operatorname{Var}[\hat{P}]} = \sqrt{0.0000793} \approx 0.008905 \]
\[ \Pr[\hat{P} > 0.5] = \Pr[0.5 < \hat{P}] = \Pr[0.5 < \hat{P} \le 0.61] + \Pr[0.61 < \hat{P}] \]

The first interval consists of approximately \(\frac{0.61 - 0.5}{0.008905} \approx 12.35\) standard deviations. The second interval consists of half of the distribution. Therefore,
\[ \Pr[\hat{P} > 0.5] \approx 0.5 + 0.5 = 1.0 \]

Gallup's Gambit

In 1935, George Gallup came up with a clever marketing strategy for his opinion polls. Gallup bet that he would correctly predict the results of the 1936 Presidential Election. He promised to refund all subscription fees for the entire year if his prediction was incorrect.

Gallup did predict the Election results correctly based on a sample of approximately 3000 voters. The Literary Digest incorrectly predicted the result based on a sample of 10,000,000 voters.

How could Gallup predict correctly based on such a small sample? Because the actual election wasn't really close, even his small sample provided a comfort level of 12 standard deviations.

How could the Literary Digest predict incorrectly based on such a large sample? Their sample was not representative, but consisted mainly of people who had a car or a telephone.

Lesson: Randomness is much more important than sample size. (Accuracy in sampling is also important, since some people lie to pollsters. There is a science to designing questionnaires.)

Some recent elections have been decided by less than a 1% margin. It is interesting to look at how Gallup's poll might have turned out, had the 1936 Election not been a landslide.

Example: What if the Election had been close?

Suppose that 50.5% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that a majority of your sample favors Candidate R?

Try this one yourself before reading the solution.

Solved Example: What if the Election had been close?

Suppose that 50.5% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that a majority of your sample favors Candidate R?

Solution using Binomial Distribution

Let \(X\) denote the number of voters in your sample who favor Candidate R. The distribution of \(X\) is \(B(3000, 0.505)\). We wish to estimate \(\Pr[X > 1500]\), which is the same as \(\Pr[X \ge 1501]\).

\[ E[X] = np = 3000 \cdot 0.505 = 1515 \]
\[ \operatorname{Var}[X] = npq = np(1-p) = 3000 \cdot 0.505 \cdot 0.495 = 749.925 \]
\[ \sigma_X = \sqrt{\operatorname{Var}[X]} = \sqrt{749.925} \approx 27.38 \]
\[ \Pr[X \ge 1501] = \Pr[X \ge 1500.5] = \Pr[1500.5 \le X] = \Pr[1500.5 \le X \le 1515] + \Pr[1515 < X] \]

The first interval consists of approximately \(\frac{1515 - 1500.5}{27.38} \approx 0.53\) standard deviations. The second interval consists of half of the distribution. Therefore,
\[ \Pr[X \ge 1501] \approx 0.2019 + 0.5 = 0.7019 \]

Thus Gallup's chance of predicting the Election correctly would have been about 70%, assuming he used a representative sample. If his sample was skewed (like the Literary Digest's), then his chance could have been much smaller.

Solution using Sample Proportion

\[ E[\hat{P}] = p = 0.505 \]
\[ \operatorname{Var}[\hat{P}] = \frac{pq}{n} = \frac{p(1-p)}{n} = \frac{0.505 \cdot 0.495}{3000} = 0.000083325 \]
\[ \sigma_{\hat{P}} = \sqrt{\operatorname{Var}[\hat{P}]} = \sqrt{0.000083325} \approx 0.009128 \]
\[ \Pr[\hat{P} > 0.5] = \Pr[0.5 < \hat{P}] = \Pr[0.5 < \hat{P} \le 0.505] + \Pr[0.505 < \hat{P}] \]

The first interval consists of approximately \(\frac{0.505 - 0.5}{0.009128} \approx 0.55\) standard deviations. The second interval consists of half of the distribution. Therefore,
\[ \Pr[\hat{P} > 0.5] \approx 0.2088 + 0.5 = 0.7088 \]

By this calculation Gallup's chance of predicting the Election correctly would have been about 71%, again assuming he used a representative sample.
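The majority calculation has the same shape for any true proportion and any sample size, so it can be wrapped in one small function. The sketch below is an illustration only: it uses the normal approximation with the continuity correction from the binomial solution, and the function name and the extra sample sizes tried at the end are assumptions added here.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_sample_majority(p, n):
    """Normal approximation to Pr[X > n/2] for X ~ B(n, p)."""
    mu = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    # The smallest majority is n//2 + 1 voters; the continuity
    # correction moves the cutoff down by 0.5.
    cutoff = n // 2 + 0.5
    return 1.0 - normal_cdf((cutoff - mu) / sd)


# Landslide (61%) versus close election (50.5%), both with n = 3000
print(prob_sample_majority(0.61, 3000))
print(prob_sample_majority(0.505, 3000))

# Hypothetical larger samples for the close election
for n in (3000, 12000, 30000):
    print(n, round(prob_sample_majority(0.505, n), 4))
```

For the close election the chance of a correct call grows with the sample size, because the 0.5% lead is worth more standard deviations as n increases.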

Example: How good was Gallup's Sample?

Only 54% of Gallup's sample favored Roosevelt.

Suppose that 61% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that 54% or less of your sample favors Candidate R?

Try this yourself before reading on.

Solved Example: How good was Gallup's Sample?

Suppose that 61% of the voters favor Candidate R for President. You poll a random sample consisting of 3000 voters. What is the probability that 54% or less of your sample favors Candidate R?

Solution using Binomial Distribution

Let \(X\) denote the number of voters in your sample who favor Candidate R. The distribution of \(X\) is \(B(3000, 0.61)\). 54% of 3000 is 1620. We wish to estimate \(\Pr[X \le 1620]\).

\[ E[X] = np = 3000 \cdot 0.61 = 1830 \]
\[ \operatorname{Var}[X] = npq = np(1-p) = 3000 \cdot 0.61 \cdot 0.39 = 713.7 \]
\[ \sigma_X = \sqrt{\operatorname{Var}[X]} = \sqrt{713.7} \approx 26.72 \]
\[ \Pr[X \le 1620] = \Pr[X \le 1620.5] = \Pr[X \le 1830] - \Pr[1620.5 < X \le 1830] \]

The first interval consists of half of the distribution. The second interval consists of \(\frac{1830 - 1620.5}{26.72} \approx 7.84\) standard deviations. Therefore,
\[ \Pr[X \le 1620] \approx 0.5 - 0.5 = 0.0 \]

Solution using Sample Proportion

\[ E[\hat{P}] = p = 0.61 \]
\[ \operatorname{Var}[\hat{P}] = \frac{pq}{n} = \frac{p(1-p)}{n} = \frac{0.61 \cdot 0.39}{3000} = 0.0000793 \]
\[ \sigma_{\hat{P}} = \sqrt{\operatorname{Var}[\hat{P}]} = \sqrt{0.0000793} \approx 0.008905 \]
\[ \Pr[\hat{P} \le 0.54] = \Pr[\hat{P} \le 0.61] - \Pr[0.54 < \hat{P} \le 0.61] \]

The first interval consists of half of the distribution. The second interval consists of approximately \(\frac{0.61 - 0.54}{0.008905} \approx 7.86\) standard deviations. Therefore,
\[ \Pr[\hat{P} \le 0.54] \approx 0.5 - 0.5 = 0.0 \]

This suggests that Gallup's sample was not close to uniform, i.e., not representative of the population. Perhaps Gallup just got lucky. They say that Fortune favors the bold.
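The table-based answer of 0.0 is a rounded value; the tail probability is tiny but not literally zero. As a rough sketch, it can be estimated with the complementary error function, which stays accurate far out in the tail where subtracting from 0.5 would lose all precision. The function names are made up for this illustration, and so far from the mean the normal approximation indicates only the order of magnitude.

```python
import math


def normal_lower_tail(z):
    """Pr[Z <= z] for a standard normal Z, accurate for very negative z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def prob_at_most(k, n, p):
    """Normal approximation (with continuity correction) to Pr[X <= k]."""
    mu = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    z = (k + 0.5 - mu) / sd
    return z, normal_lower_tail(z)


z, tail = prob_at_most(1620, 3000, 0.61)
print("standard deviations below the mean:", round(-z, 2))
print("approximate tail probability:", tail)
```

A probability this small supports the conclusion that a truly random sample of 3000 would almost never show only 54% support when the true figure is 61%.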

Gallup vs. the Tout

If the 1936 Election had been a close one (decided by only a 1% margin), there is a substantial probability (30% or more) that Gallup would have predicted it wrong. Was Gallup taking a big risk? Perhaps Gallup would have taken a larger sample if it looked like the election was going to be close. Or, perhaps Gallup had nothing to lose.

Suppose that you are a gambler. A tout is a person who sells predictions, usually written on something called a tout sheet. For example, the tout might predict whether the Phillies will beat the Cardinals tonight. The tout charges $1 for his prediction but he always offers a money-back guarantee.

Key facts about tout sheets:

1. Tout sheets are worthless. Why? In reality, the tout will tell one customer that Philadelphia will win and tell another customer that Philadelphia will lose. So half of the predictions are guaranteed to be correct. (Not like Gallup Polls.)

2. The tout never loses. When he predicts correctly, he earns $1. When he predicts incorrectly, he earns $0. (Like Gallup, although Gallup also extended his offer to previous customers.)

3. Like most something-for-nothing schemes, selling tout sheets is illegal. (Not like Gallup.)

4. Touts thrive on repeat business. People who win are likely to return to the tout for another prediction, which is usually more expensive than the first. (Like Gallup, although I don't think he jacked up his rates.)

5. A tout's customers are known in gambling circles as suckers. (Like Gallup's customers, if they were fooled by the gambit.)