Chapter 20 & 23 - Chance Error for Sampling


PART VI: SAMPLING
Dr. Joseph Brennan (Math 148, BU)

Sampling and Accuracy
In Chapter 19 we discussed probability sampling and the major biases to avoid. In Chapters 20 and 23 we discuss the expected accuracy of a sample. The sample is only part of the population, so the percentage composition of the sample usually differs by some amount from the percentage composition of the whole population.
For the remainder of Part VI (Chapters 20, 21, and 23) we restrict ourselves to simple samples. These samples are drawn with or without replacement, and every individual in the population has the same chance of being chosen. Recall that a simple random sample is drawn without replacement.

Variable Type
In Chapter 20 our samples consist of qualitative variables. These samples should be modeled as a box of 1's and 0's, where we discuss the number of successful draws and the percentage of successful draws.
In Chapter 23 our samples consist of quantitative variables. These samples are modeled with box models whose tickets show the actual observed values. We discuss the sum of the draws and the average of the draws.

Variable Type
In Chapters 20 and 23 we consider the composition of our box to be known. The standard deviation of the box can be calculated.
In Chapter 21 we consider the composition of our box to be unknown. The standard deviation of the box must be estimated.

Chance Error - Chapter 20
As we are in Chapter 20, our box is known and our variables are qualitative. All of our boxes have 0 and 1 tickets.
In Part V we defined expected value and standard error in absolute terms. In Chapter 20 we explore these topics in relative terms. In mathematics, the term relative refers to a ratio or percent of the whole. Therefore, this chapter deals with the expected value and standard error in terms of percentage points.

Chance Error - Chapter 20
For a simple random sample, the expected value for the sample percentage equals the population percentage. We would be foolish to think that the expected value is the guaranteed value. As in all predictions and measurements, an error exists.
Chance Error: The chance error of a sample is the difference between the sample percentage and the expected value (the population percentage).
The likely size of the chance error is given by the standard error, often written SE.

Example (Sampling for Gender)
A health study is to be based on a representative cross section of 6,678 Americans aged 18 to 79. This population is known to have 46% men and 54% women. Suppose we sampled 100 individuals from the population and recorded each response as M for male or F for female. This sample has 51 men and 49 women. The chance error in the percentage of men for this sample is 51% - 46% = 5 percentage points.

Example (Sampling for Gender)
A total of 250 samples of 100 individuals each were taken from the population of 6,678.
[Figure: histogram of the percentage of men in each of the 250 samples]
The expected value for each sample is 46% male, while the actual chance error fluctuates between -12 and +12 percentage points. Notice the clustering about the expected value of 46% (17 samples actually met the expected value).

Example (Sampling for Gender)
In a second round of sampling, a total of 250 samples of 400 individuals each were taken. Notice that the spread of the histogram decreases. The chance error in the second round fluctuates between -7 and +8 percentage points.

Example (Sampling for Gender)
In our sampling experiment, when we quadrupled the sample size, the range of the chance-error fluctuations decreased by less than 50% (from 24 to 15 percentage points). It is intuitive that increasing the sample size will decrease the error (ultimately we would have no error if the sample were the entire population), but can we anticipate the change in accuracy?
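The sampling experiment can be imitated in a few lines. This is only a minimal sketch, not the procedure used to produce the histograms in the slides: the seed, the helper name `sample_percentages`, and the rounding of 46% of 6,678 to a whole number of men are assumptions made for illustration.

```python
import random
import statistics

def sample_percentages(population, sample_size, num_samples, rng):
    """Percentage of 1's in each of num_samples simple random samples."""
    return [100 * sum(rng.sample(population, sample_size)) / sample_size
            for _ in range(num_samples)]

# 0-1 box for the population: a man is ticketed 1, a woman 0.
num_men = round(0.46 * 6678)  # assumption: 46% rounded to whole people
population = [1] * num_men + [0] * (6678 - num_men)

rng = random.Random(148)  # arbitrary seed, used only for reproducibility
pct_100 = sample_percentages(population, 100, 250, rng)
pct_400 = sample_percentages(population, 400, 250, rng)

# The SD of the 250 sample percentages measures their spread.
spread_100 = statistics.pstdev(pct_100)
spread_400 = statistics.pstdev(pct_400)
print(f"n = 100: spread of about {spread_100:.1f} percentage points")
print(f"n = 400: spread of about {spread_400:.1f} percentage points")
```

Different seeds give slightly different numbers, but the spread for samples of 400 should come out at roughly half the spread for samples of 100.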

Counting Box Model - Chapter 20
In classifying and counting, our box model consists of tickets labeled 1 or 0. The outcomes to be counted are labeled 1.
The Expected Value is the average of the tickets in our box. The Standard Deviation is the spread of the tickets in our box. In the special case of counting:
$\sigma = \sqrt{(\text{fraction of 1's}) \times (\text{fraction of 0's})}$
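As a quick check of the shortcut, the sketch below compares it with the SD computed directly from the tickets. The box of 46 ones and 54 zeros is a hypothetical stand-in for the gender example, and `sd_zero_one_box` is a name chosen here for illustration.

```python
import math
import statistics

def sd_zero_one_box(fraction_of_ones):
    """Shortcut SD for a box that contains only 0's and 1's."""
    return math.sqrt(fraction_of_ones * (1 - fraction_of_ones))

# Hypothetical box: 46 tickets marked 1 and 54 tickets marked 0.
box = [1] * 46 + [0] * 54

direct = statistics.pstdev(box)    # SD computed from the tickets themselves
shortcut = sd_zero_one_box(0.46)   # SD from the shortcut formula
print(f"direct = {direct:.4f}, shortcut = {shortcut:.4f}")
```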

Standard Error - Chapter 20
With multiple random draws from a 0-1 box, the expected value for the percentage of 1's in the sample equals the percentage of 1's in the box (population).
Let $\mu$ be the average of the box and $\sigma$ be the standard deviation of the box. For $n$ draws from the box, the following is true for the sum of outcomes:
The expected value for a sample number is $n\mu$.
The standard error for a sample number is $\sqrt{n}\,\sigma$.
The second part of the above rule is called the Square Root Law.

Example (Sampling for Gender)
The spread of the sample percentage decreases by a factor of 2 when the sample size is increased by a factor of 4, because $\sqrt{4} = 2$ (the Square Root Law).
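The factor of 2 can be checked numerically. The following sketch assumes draws with replacement from a 0-1 box with 46% ones; the function name `se_percentage` is chosen here for illustration.

```python
import math

def se_percentage(fraction_of_ones, n):
    """SE for the sample percentage of 1's, for n draws with replacement."""
    sd_box = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    se_number = math.sqrt(n) * sd_box   # Square Root Law
    return se_number / n * 100          # convert the count to a percentage

se_100 = se_percentage(0.46, 100)
se_400 = se_percentage(0.46, 400)
print(f"n = 100: SE = {se_100:.2f} percentage points")
print(f"n = 400: SE = {se_400:.2f} percentage points")
print(f"ratio = {se_100 / se_400:.1f}")
```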

Standard Error for Percentage - Chapter 20
The Standard Error for Sample Number is an absolute measurement of the spread of the chance errors. The term sample number is due to the fact that we are classifying and counting successes (a 0-1 box model). The expected value of the sum counts the expected number of successes.
The Standard Error for Percentage is a relative measurement of the spread of the chance errors. To compute this value, first compute the Standard Error for the corresponding number, then convert to a percent. The formula:
$\text{SE for percentage} = \frac{\text{SE for number}}{\text{size of sample}} \times 100\%$

Example (Rolling a Die)
In Part V we rolled a die and set up a 0-1 box model which counted the occurrences of 3 or 4 when the die is rolled. The SD of this box is about 0.47. Suppose we were to roll a die 100, 10,000, and 1,000,000 times:

| Number of rolls | SE for number | SE for percentage |
|---|---|---|
| 100 | $\sqrt{100} \times 0.47 = 4.7$ | $4.7 / 100 = 0.047 = 4.7\%$ |
| 10,000 | $\sqrt{10{,}000} \times 0.47 = 47$ | $47 / 10{,}000 = 0.0047 = 0.47\%$ |
| 1,000,000 | $\sqrt{1{,}000{,}000} \times 0.47 = 470$ | $470 / 1{,}000{,}000 = 0.00047 = 0.047\%$ |
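The table can be reproduced with a short sketch. It assumes the 0-1 box has one third 1's (faces 3 and 4) and two thirds 0's; the function names are chosen here for illustration, and the unrounded SD is used, so the last digits differ slightly from the table.

```python
import math

# SD of the 0-1 box: a 3 or a 4 is ticketed 1, the other four faces 0.
SD_BOX = math.sqrt((1 / 3) * (2 / 3))  # about 0.47

def se_for_number(n, sd_box=SD_BOX):
    """SE for the number of 1's in n draws (Square Root Law)."""
    return math.sqrt(n) * sd_box

def se_for_percentage(n, sd_box=SD_BOX):
    """SE for the percentage of 1's in n draws."""
    return se_for_number(n, sd_box) / n * 100

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} rolls: SE for number = {se_for_number(n):6.1f}, "
          f"SE for percentage = {se_for_percentage(n):.3f}%")
```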

General Trends - Chapter 20
The SE for Sample Number increases like the square root of the sample size.
The SE for Sample Percentage decreases like one over the square root of the sample size.

SE with the Normal Curve
When drawing at random from a 0-1 box, the percentage of 1's among the draws is likely to be around the Expected Value for the percentage of 1's among the draws, give or take the SE for the percentage of 1's among the draws.
The language above implies a good normal approximation.

Normal Curve for Standard Error for Percentages
Suppose 1,000 draws are made with replacement from a 0-1 box whose mean is 0.2 and whose standard error for percentages is 0.01. The values 0.19 and 0.21 are exactly 1 SE for percentages away from the expected value of 0.2. We can use a normal curve with mean 0.200 and standard deviation 0.01 to find that about 68% of the area lies between 0.19 and 0.21.
68% of what?
FALSE: About 68% of tickets in the box are in the range 0.19 to 0.21.
TRUE: There is about a 68% chance for the average of the 1,000 draws (the fraction of 1's among them) to be in the range 0.19 to 0.21.

Example (Telephone Survey)
A telephone company has 100,000 customers and plans to take a simple random sample of 400 of them for market research. According to Census data, 20% of the company's subscribers earn over $50,000 annually. The percentage of persons in the sample with incomes over $50,000 annually will be around ____, give or take ____ or so.
Solution: As the sampling technique is simple random, begin by constructing a box model for classifying and counting. We let people making over $50,000 be ticketed as 1 and all others be ticketed as 0.
The first blank will be filled with the expected value of the sample percentage.
The second blank will be filled with the SE for the sample percentage.

Example (Telephone Survey)
The expected value for the box is the average of the tickets in the box. Note that 20% of the box consists of 1's and 80% of the box consists of 0's.
$\mu = 0.2 \times 1 + 0.8 \times 0 = 0.2$
As there will be 400 draws from the box, the expected value for the sum is
$400 \times 0.2 = 80$
The expected value of the sample percentage is
$\frac{80}{400} = 0.2 = 20\%$

Example (Telephone Survey)
The standard deviation is found using our computational trick:
$\sigma = \sqrt{0.2 \times 0.8} = 0.4$
The SE for the sum is found using the square root law:
$\text{SE} = \sqrt{400} \times 0.4 = 8$
The SE for the sample percentage is
$\frac{8}{400} = 0.02 = 2\%$

Example (Telephone Survey)
The percentage of high earners in the sample will be around 20%, give or take 2 percentage points or so.
The number of people in the sample making above $50,000 annually will be around 80, give or take 8 or so.
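The arithmetic of the telephone survey can be collected in one place. This is a minimal sketch that treats the 400 draws as if they were made with replacement, as the slides do; the variable names are chosen here for illustration.

```python
import math

n = 400                        # sample size
fraction_high_earners = 0.20   # fraction of 1's in the box

# Average and SD of the 0-1 box.
box_average = fraction_high_earners * 1 + (1 - fraction_high_earners) * 0
box_sd = math.sqrt(fraction_high_earners * (1 - fraction_high_earners))

# Expected value and SE for the number of high earners in the sample.
ev_number = n * box_average
se_number = math.sqrt(n) * box_sd

# The same quantities expressed as percentages of the sample.
ev_percent = ev_number / n * 100
se_percent = se_number / n * 100

print(f"number:     around {ev_number:.0f}, give or take {se_number:.0f} or so")
print(f"percentage: around {ev_percent:.0f}%, give or take {se_percent:.0f} points or so")
```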

Example (Telephone Survey)
a) Estimate the chance that between 18% and 22% of the persons in the sample are high earners.
b) Estimate the chance that more than 25% of the persons in the sample are high earners.
Solution: The EV for the sample percentage is 20% and the SE is 2 percentage points, so the z-scores are
$z_{18\%} = \frac{18 - 20}{2} = -1, \quad z_{22\%} = \frac{22 - 20}{2} = 1, \quad z_{25\%} = \frac{25 - 20}{2} = 2.5$
These z-scores are associated with specific percentiles of the normal curve:
$z_{18\%} \to 16, \quad z_{22\%} \to 84, \quad z_{25\%} \to 99.4$
a) There is an 84 - 16 = 68% chance that between 18% and 22% of those sampled are high earners.
b) There is a 100 - 99.4 = 0.6% chance that more than 25% of those sampled are high earners.
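The same two chances can be estimated without a percentile table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and, like the slides, applies no continuity correction; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Normal curve for the sample percentage: EV 20%, SE 2 percentage points.
sample_pct = NormalDist(mu=20, sigma=2)

# a) chance that the sample percentage falls between 18% and 22%
chance_between = sample_pct.cdf(22) - sample_pct.cdf(18)

# b) chance that the sample percentage is above 25%
chance_above = 1 - sample_pct.cdf(25)

print(f"a) about {chance_between:.1%}")
print(f"b) about {chance_above:.1%}")
```

The results agree with the table-based answers of 68% and 0.6% up to rounding.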

The Correction Factor - Chapter 20
Notice that none of our formulas requires knowledge of the population size.
When the sample is only a small part of the population, the number of individuals in the population has almost no influence on the accuracy of the sample percentage. It is the absolute size of the sample (the number of individuals in the sample) which matters, not the size relative to the population.

The Correction Factor - Chapter 20
The square root law is exact when draws are made with replacement. When the draws are made without replacement, the formula gives a good approximation, provided the number of objects in the box is large relative to the number of draws.
When drawing without replacement, to get the exact SE you have to multiply by the correction factor:
$\sqrt{\frac{\text{number of objects in box} - \text{number of draws}}{\text{number of objects in box} - 1}}$
When the number of tickets in the box is large relative to the number of draws, the correction factor is nearly one.
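To see how close to one the correction factor can be, here is a minimal sketch that applies it to the telephone survey (100,000 customers, 400 draws, box SD 0.4). The function name `correction_factor` is chosen here for illustration.

```python
import math

def correction_factor(box_size, num_draws):
    """Factor that converts the with-replacement SE to the without-replacement SE."""
    return math.sqrt((box_size - num_draws) / (box_size - 1))

cf = correction_factor(100_000, 400)

se_with_replacement = math.sqrt(400) * 0.4          # Square Root Law
se_without_replacement = cf * se_with_replacement   # exact SE for the number

print(f"correction factor = {cf:.4f}")
print(f"SE for number: {se_with_replacement:.2f} with replacement, "
      f"{se_without_replacement:.2f} without")
```

The correction changes the SE by about 0.2%, which is why it is usually ignored for samples this small relative to the population.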

The Correction Factor - Chapter 20
When the population is large relative to the sample, the correction factor is nearly 1 and can be ignored. In this case, the absolute size of the sample determines, through the SE, the accuracy of the sample.

Examples
In 2004 there were 1.5 million eligible voters in New Mexico and 15 million eligible voters in Texas. A sample of 2,500 people, taken without replacement, from each state would have essentially the same accuracy (if the sample was simple and random), because the sample is tiny relative to either population.
Suppose you took a drop of liquid from a bottle for chemical analysis. If the liquid was well mixed, the chemical composition of the drop should reflect the composition of the whole bottle, and it shouldn't matter whether the bottle was a test tube or a gallon jug.

Examples
We refer to the standard deviation of the box and the standard error of the draws. The standard deviations of the boxes are all equal:
$\sigma = \sqrt{(\text{fraction of 1's}) \times (\text{fraction of 0's})} = \sqrt{0.5 \times 0.5} = 0.5$
The Standard Error for $n$ draws from the box is $\sqrt{n} \times 0.5$. This holds true for draws made with replacement. For draws without replacement, multiply by the correction factor (cf):
$\text{SE} = \text{cf} \times \sqrt{n} \times 0.5$
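The New Mexico and Texas comparison can be made concrete. The sketch below assumes a 50-50 box (SD 0.5) for both states and a sample of 2,500 drawn without replacement; the function name `se_percentage_without_replacement` is chosen here for illustration.

```python
import math

def se_percentage_without_replacement(box_size, num_draws, sd_box):
    """SE for the sample percentage, including the correction factor."""
    cf = math.sqrt((box_size - num_draws) / (box_size - 1))
    se_number = cf * math.sqrt(num_draws) * sd_box
    return se_number / num_draws * 100

se_new_mexico = se_percentage_without_replacement(1_500_000, 2_500, 0.5)
se_texas = se_percentage_without_replacement(15_000_000, 2_500, 0.5)

print(f"New Mexico: SE = {se_new_mexico:.4f} percentage points")
print(f"Texas:      SE = {se_texas:.4f} percentage points")
```

Both standard errors come out very close to 1 percentage point, even though Texas has ten times as many eligible voters.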

Chance Error - Chapter 23
Let's turn our attention to quantitative variables; the box has tickets showing the observed values of the quantitative variable.
The box is still known; the standard deviation of the tickets in the box can be calculated.

Chance Error - Chapter 23
Assume a box has tickets for the quantitative random variable X.
The expected value of the average of the draws equals the average $\mu$ of the box.
The standard error of the average of the draws is a relative measurement of the spread of the chance errors. To compute this value, first compute the standard error for the sum, then convert to an average. The formula:
$\text{SE for average} = \frac{\text{SE for sum}}{\text{number of draws}}$

Example (Rolling a Die)
In Part V we rolled a die and set up a box model with one ticket for each face, 1 through 6. The SD of this box is about 1.71.
Warning: Not an SRS, as die rolls are done with replacement!
Suppose we were to roll a die 100, 10,000, and 1,000,000 times:

| Number of rolls | SE for sum | SE for average |
|---|---|---|
| 100 | $\sqrt{100} \times 1.71 = 17.1$ | $17.1 / 100 = 0.171$ |
| 10,000 | $\sqrt{10{,}000} \times 1.71 = 171$ | $171 / 10{,}000 = 0.0171$ |
| 1,000,000 | $\sqrt{1{,}000{,}000} \times 1.71 = 1710$ | $1710 / 1{,}000{,}000 = 0.00171$ |
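The SD of 1.71 and the entries of the table can be checked with a short sketch. It assumes the box holds one ticket for each face of the die; the function names are chosen here for illustration.

```python
import math
import statistics

die_box = [1, 2, 3, 4, 5, 6]          # one ticket per face
sd_box = statistics.pstdev(die_box)   # about 1.71

def se_for_sum(n):
    """SE for the sum of n rolls (Square Root Law)."""
    return math.sqrt(n) * sd_box

def se_for_average(n):
    """SE for the average of n rolls."""
    return se_for_sum(n) / n

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} rolls: SE for sum = {se_for_sum(n):7.1f}, "
          f"SE for average = {se_for_average(n):.5f}")
```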

Example (SE for Sum and Average)
The average $\mu$ of the box is found as the weighted average of its tickets (1, 2, 4, and 5, in fractions 1/6, 2/6, 1/6, and 2/6):
$\mu = 1 \cdot \frac{1}{6} + 2 \cdot \frac{2}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{2}{6} = \frac{19}{6} \approx 3.2$
The standard deviation $\sigma$ of the box is calculated as
$\sigma = \sqrt{(1 - 3.2)^2 \cdot \frac{1}{6} + (2 - 3.2)^2 \cdot \frac{2}{6} + (4 - 3.2)^2 \cdot \frac{1}{6} + (5 - 3.2)^2 \cdot \frac{2}{6}} \approx 1.6$

Example (SE for Sum and Average)
The expected value of the sum resulting from drawing with replacement from the box 250 times is:
$n\mu = 250 \times 3.2 = 800$
The expected value of the average is
$\frac{n\mu}{n} = \mu = 3.2$
The standard error of the sum is
$\sqrt{n}\,\sigma = \sqrt{250} \times 1.6 \approx 25$
The standard error of the average is
$\frac{\sqrt{n}\,\sigma}{n} = \frac{25}{250} = 0.1$
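The whole example can be recomputed without intermediate rounding. The sketch below assumes the box is the six tickets 1, 2, 2, 4, 5, 5, which matches the weights in the weighted average; the actual box in the slides may be any box with the same proportions.

```python
import math
import statistics

# Assumed box: tickets 1, 2, 4, 5 in fractions 1/6, 2/6, 1/6, 2/6.
box = [1, 2, 2, 4, 5, 5]
n = 250  # number of draws, made with replacement

mu = statistics.mean(box)        # average of the box, 19/6
sigma = statistics.pstdev(box)   # SD of the box

ev_sum = n * mu
ev_average = mu
se_sum = math.sqrt(n) * sigma
se_average = se_sum / n

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"sum:     EV = {ev_sum:.1f}, SE = {se_sum:.1f}")
print(f"average: EV = {ev_average:.3f}, SE = {se_average:.3f}")
```

Because the slides round $\mu$ to 3.2 and $\sigma$ to 1.6 before multiplying, their expected sum of 800 is slightly above the unrounded value of about 792; the standard errors agree with the slides to the precision shown there.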

Trends
The expected value for the average of the draws equals the average of the box.
As the number of draws goes up, the standard error for the sum of the draws increases while the standard error for the average of the draws decreases.

Normal Curve for Standard Error for Averages
Suppose 1,000 draws are made with replacement from a box whose average ticket value is 200. The standard error for averages is found to be 10. The values 190 and 210 are exactly 1 SE for averages away from the expected value of 200. We can use a normal curve with mean 200 and standard deviation 10 to find that about 68% of the area lies between 190 and 210.
68% of what?
FALSE: About 68% of tickets in the box are in the range 190 to 210.
TRUE: There is about a 68% chance for the average of the 1,000 draws to be in the range 190 to 210.
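The difference between the FALSE and TRUE readings can be seen in a small simulation. Since the slide does not specify its box, this sketch substitutes a hypothetical one (a die, with 100 draws per average); the seed and the 2,000 repetitions are also arbitrary choices made for illustration.

```python
import random
import statistics

die_box = [1, 2, 3, 4, 5, 6]   # hypothetical box, not the one in the slide
num_draws = 100

box_average = statistics.mean(die_box)
se_average = statistics.pstdev(die_box) / num_draws ** 0.5
low, high = box_average - se_average, box_average + se_average

# FALSE reading: fraction of tickets in the box within 1 SE of the average.
tickets_in_range = sum(low <= t <= high for t in die_box) / len(die_box)

# TRUE reading: fraction of sample averages within 1 SE of the average.
rng = random.Random(23)  # arbitrary seed, used only for reproducibility
averages = [statistics.mean(rng.choices(die_box, k=num_draws))
            for _ in range(2000)]
averages_in_range = sum(low <= a <= high for a in averages) / len(averages)

print(f"range: {low:.2f} to {high:.2f}")
print(f"tickets in range:  {tickets_in_range:.0%}")
print(f"averages in range: {averages_in_range:.0%}")
```

No ticket in this box lies within 1 SE of the box average, yet roughly two thirds of the simulated averages do, which is what the 68% statement describes.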