Stat 20: Intro to Probability and Statistics Lecture 18: Simple Random Sampling Tessa L. Childers-Day UC Berkeley 24 July 2014
By the end of this lecture...
You will be able to:
- Draw box models for real-world scenarios
- Find EVs and SEs for percentages (as opposed to sums)
- Explain the difference between drawing with and without replacement (and how that affects EVs, SEs, and use of the normal curve)
Recap: Box Models
Box models are useful in analyzing games of chance:
- Draw a box model to describe a process (sums or classify/count)
- Draw randomly, with replacement, from the box
- Calculate EV for sum and SE for sum
- Use normal curve to calculate probabilities of certain outcomes. Why?
Example: Roulette
You are playing roulette, a game in which a ball is equally likely to land on any of 38 spaces: 18 red, 18 black, and 2 green. What does the box look like?
Now, I want to know the chance of having the ball land on a red space 20 or more times in 35 plays.
- Draw a box model to describe the game (sums or classify/count)
- Draw randomly, with replacement, from the box
- Calculate EV for sum and SE for sum
- Use normal curve to calculate probability
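The roulette box can also be written out and drawn from directly. The following is a minimal simulation sketch, not part of the lecture: the seed and the number of repetitions are arbitrary choices, and the result is only an estimate of the chance that the steps above approximate with the normal curve.

```python
import random

# The box: one ticket per space on the wheel, 1 = red, 0 = not red.
box = [1] * 18 + [0] * 18 + [0] * 2   # 18 red, 18 black, 2 green

def count_reds(n_plays, rng):
    """Draw n_plays tickets at random, with replacement, and count the 1's."""
    return sum(rng.choice(box) for _ in range(n_plays))

rng = random.Random(20)   # fixed seed (arbitrary) so the run is repeatable
n_repeats = 20_000        # number of simulated 35-play sessions (arbitrary)

# Fraction of simulated sessions with 20 or more reds in 35 plays.
hits = sum(count_reds(35, rng) >= 20 for _ in range(n_repeats))
estimated_chance = hits / n_repeats
print(f"Estimated chance of 20 or more reds in 35 plays: {estimated_chance:.3f}")
```

Because each play is a draw with replacement, the same ticket can come up again, which matches how the wheel behaves from one spin to the next.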
Other Box Models
Box models can also be useful in analyzing the ways that other chance processes work:
- Surveys, studies, experiments, etc.
- What is our box?
- How do we draw from it?
- How is this different from the other box models we've seen?
Example: Loan Debt
You are interested in estimating the percentage of students at Cal who have taken out student loans. Here, and in general:
- Draw a box model to describe the population
- Draw randomly, without replacement, from the box
- Calculate EV for percent and SE for percent
- Use normal curve to calculate probability
Chance Variability
- Say that the population is divided evenly among class ranks. Will my sample reflect this? Why or why not?
- If I take one sample, put every ticket back, and draw another sample, will they match? Is this a problem?
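One way to explore these questions is to simulate them. The sketch below is illustrative only: the four class-rank names, the population of 1,000 students per rank, the sample size of 100, and the seed are all assumptions made for the example, not values from the lecture.

```python
import random
from collections import Counter

# Hypothetical class ranks and population (assumptions for illustration).
ranks = ["freshman", "sophomore", "junior", "senior"]
population = [rank for rank in ranks for _ in range(1000)]  # 25% in each rank

rng = random.Random(7)  # arbitrary seed so the run is repeatable

def sample_percentages(sample_size):
    """Draw a simple random sample (without replacement); return % in each rank."""
    sample = rng.sample(population, sample_size)
    counts = Counter(sample)
    return {rank: 100 * counts[rank] / sample_size for rank in ranks}

# Two separate samples from the same, fully restored population.
first = sample_percentages(100)
second = sample_percentages(100)
for rank in ranks:
    print(f"{rank:>9}: population 25%, sample 1 {first[rank]:.0f}%, sample 2 {second[rank]:.0f}%")
```

Running this with different seeds gives a feel for how far a sample percentage typically strays from the population's 25%.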
The Expected Value
Recall that when drawing with replacement,
$$\text{EV for sum} = (\text{\# of draws}) \times (\text{average of box}).$$
But our box is made of only 0's and 1's, so
$$\text{EV for \# of 1's} = (\text{\# of draws}) \times (\text{proportion of 1's}).$$
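The Expected Value (cont.)
A percentage is
$$\text{percentage} = \frac{\text{\# of something}}{\text{total \#}} \times 100\%,$$
so
$$\begin{aligned}
\text{EV for percent of 1's} &= \frac{\text{EV for \# of 1's}}{\text{\# of draws}} \times 100\% \\
&= \frac{(\text{\# of draws}) \times (\text{proportion of 1's})}{\text{\# of draws}} \times 100\% \\
&= (\text{proportion of 1's}) \times 100\% \\
&= \text{percentage of 1's}
\end{aligned}$$
This is when we draw with replacement.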
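The cancellation in this derivation can be checked numerically. This is a small sketch with hypothetical function names (`ev_count`, `ev_percent`), using the roulette box (18 red tickets out of 38) as the example.

```python
def ev_count(n_draws, proportion_of_ones):
    """EV for the number of 1's: (# of draws) x (proportion of 1's)."""
    return n_draws * proportion_of_ones

def ev_percent(n_draws, proportion_of_ones):
    """EV for the percent of 1's: (EV for # of 1's) / (# of draws) x 100%."""
    return ev_count(n_draws, proportion_of_ones) / n_draws * 100

p_red = 18 / 38  # proportion of 1's in the roulette box

print(f"EV for # of reds in 35 plays:     {ev_count(35, p_red):.2f}")
print(f"EV for % of reds in 35 plays:     {ev_percent(35, p_red):.2f}%")
print(f"EV for % of reds in 3,500 plays:  {ev_percent(3500, p_red):.2f}%")
```

The EV for the count grows with the number of draws, while the EV for the percent stays at the percentage of 1's in the box no matter how many draws are made.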
The Expected Value (cont.)
When drawing without replacement,
$$\text{EV for percent of 1's} = \text{percentage of 1's}.$$
Intuitively, it makes sense that we expect to see a representative number of 1's, since this is a good sampling method.
The Standard Error
Recall that when drawing with replacement,
$$\text{SE for sum} = \sqrt{\text{\# of draws}} \times (\text{SD of box}).$$
But our box is made of only 0's and 1's, so
$$\text{SE for \# of 1's} = \sqrt{\text{\# of draws}} \times (1 - 0) \times \sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}.$$
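The Standard Error (cont.)
A percentage is
$$\text{percentage} = \frac{\text{\# of something}}{\text{total \#}} \times 100\%,$$
so
$$\begin{aligned}
\text{SE for percent of 1's} &= \frac{\text{SE for \# of 1's}}{\text{\# of draws}} \times 100\% \\
&= \frac{\sqrt{\text{\# of draws}} \times \sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\text{\# of draws}} \times 100\% \\
&= \frac{\sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\sqrt{\text{\# of draws}}} \times 100\%
\end{aligned}$$
This is when we draw with replacement.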
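The same formulas can be written as a few short functions. This is a minimal sketch with hypothetical names (`sd_of_zero_one_box`, `se_count`, `se_percent`), again using the roulette box as the example; it covers drawing with replacement only.

```python
import math

def sd_of_zero_one_box(proportion_of_ones):
    """SD of a 0-1 box: (1 - 0) x sqrt(proportion of 1's x proportion of 0's)."""
    return math.sqrt(proportion_of_ones * (1 - proportion_of_ones))

def se_count(n_draws, proportion_of_ones):
    """SE for the number of 1's: sqrt(# of draws) x SD of box."""
    return math.sqrt(n_draws) * sd_of_zero_one_box(proportion_of_ones)

def se_percent(n_draws, proportion_of_ones):
    """SE for the percent of 1's: (SE for # of 1's) / (# of draws) x 100%."""
    return se_count(n_draws, proportion_of_ones) / n_draws * 100

p_red = 18 / 38  # proportion of 1's in the roulette box
print(f"SE for # of reds in 35 plays: {se_count(35, p_red):.2f}")
print(f"SE for % of reds in 35 plays: {se_percent(35, p_red):.2f}%")
```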
The Standard Error (cont.)
$$\text{SE for percent of 1's} = \frac{\sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\sqrt{\text{\# of draws}}} \times 100\%$$
We can see that multiplying the number of draws by some factor:
- multiplies the SE for the sum by the square root of that factor
- divides the SE for % by the square root of that factor
As with our previous SEs, this tells us about how far off the result is likely to be from the EV.
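A quick numerical check of the square-root behavior: the sketch below quadruples the number of draws twice and prints both SEs. The box (half 1's, half 0's) and the numbers of draws are hypothetical values chosen to keep the arithmetic clean.

```python
import math

def se_count(n_draws, proportion_of_ones):
    """SE for the number of 1's when drawing with replacement."""
    return math.sqrt(n_draws) * math.sqrt(proportion_of_ones * (1 - proportion_of_ones))

def se_percent(n_draws, proportion_of_ones):
    """SE for the percent of 1's when drawing with replacement."""
    return se_count(n_draws, proportion_of_ones) / n_draws * 100

p1 = 0.5  # hypothetical box: half 1's, half 0's
for n_draws in (100, 400, 1600):  # each step multiplies the draws by 4
    print(f"{n_draws:>5} draws: SE for count = {se_count(n_draws, p1):5.1f}, "
          f"SE for % = {se_percent(n_draws, p1):4.2f}%")
```

Each fourfold increase in draws doubles the SE for the count and halves the SE for the percent.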
The Standard Error (cont.)
When drawing without replacement,
$$\text{SE for percent of 1's, without replacement} = (\text{correction factor}) \times (\text{SE for percent of 1's, with replacement}),$$
where
$$\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}.$$
Intuitively, it makes sense that we must relate the sample size and the population size.
The Correction Factor
$$\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}$$
What happens if our sample is small compared to the population? What if it is large?
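To get a feel for these questions, the correction factor can be evaluated for a few cases. This is a minimal sketch; the function name `correction_factor` and the particular population and sample sizes are illustrative choices.

```python
import math

def correction_factor(population_size, sample_size):
    """sqrt((population size - sample size) / (population size - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

sample_size = 1000
for population_size in (1_000, 2_000, 10_000, 100_000):
    factor = correction_factor(population_size, sample_size)
    print(f"population {population_size:>7,}, sample {sample_size:,}: "
          f"correction factor = {factor:.3f}")
```

When the sample is a small fraction of the population the factor is close to 1, so drawing without replacement behaves almost like drawing with replacement. When the sample is the whole population the factor is 0, since there is no chance variability left.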
The Correction Factor (cont.)
[Figure: correction factor (vertical axis, 0.0 to 1.0) plotted against sample size (horizontal axis, 0 to 4000), with one curve for each population size: 1,000; 2,000; 3,000; 4,000; 10,000; 100,000.]
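Comparison
Sum of draws from a box, with replacement:
$$\text{EV for sum} = (\text{\# of draws}) \times (\text{avg. of box})$$
$$\text{SE for sum} = \sqrt{\text{\# of draws}} \times (\text{SD of box})$$
Percent of 1's from a 0-1 box, with replacement:
$$\text{EV for \%} = \text{percent of 1's in the box}$$
$$\text{SE for \%} = \frac{\sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\sqrt{\text{\# of draws}}} \times 100\%$$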
Comparison (cont.)
Percent of 1's from a 0-1 box, with replacement:
$$\text{EV for \%} = \text{percent of 1's in the box}$$
$$\text{SE for \%} = \frac{\sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\sqrt{\text{\# of draws}}} \times 100\%$$
Percent of 1's from a 0-1 box, without replacement:
$$\text{EV for \%} = \text{percent of 1's in the box}$$
$$\text{SE for \%} = (\text{correction factor}) \times \frac{\sqrt{(\text{proportion of 1's}) \times (\text{proportion of 0's})}}{\sqrt{\text{\# of draws}}} \times 100\%$$
Examples
Say we want to find the chance of having a roulette ball land on a red space 20 or more times in 35 plays.
- Draw a box, indicate number and kind of tickets, number of draws, kind of draws
- Calculate EV for sum
- Calculate SE for sum
- Use normal curve
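The four steps can be carried out numerically as follows. This is a sketch, not the lecture's worked solution: it uses Python's `statistics.NormalDist` for the normal curve, and it reports the result both with the endpoint at 20 and with a continuity correction at 19.5. Whether to apply the continuity correction is an assumption here, since the slide does not say.

```python
import math
from statistics import NormalDist

# Box: 18 tickets marked 1 (red), 20 marked 0 (black or green).
n_plays = 35          # draws, at random, with replacement
p_red = 18 / 38       # proportion of 1's in the box

ev_sum = n_plays * p_red                      # EV for the number of reds
sd_box = math.sqrt(p_red * (1 - p_red))       # SD of a 0-1 box
se_sum = math.sqrt(n_plays) * sd_box          # SE for the number of reds

standard_normal = NormalDist()                # mean 0, SD 1

def chance_at_or_above(value):
    """Area under the normal curve to the right of value, in standard units."""
    z = (value - ev_sum) / se_sum
    return 1 - standard_normal.cdf(z)

chance_plain = chance_at_or_above(20)         # endpoint taken at 20
chance_continuity = chance_at_or_above(19.5)  # endpoint moved to 19.5 (assumption)

print(f"EV for sum = {ev_sum:.2f}, SE for sum = {se_sum:.2f}")
print(f"Chance of 20 or more reds (endpoint 20):   {chance_plain:.3f}")
print(f"Chance of 20 or more reds (endpoint 19.5): {chance_continuity:.3f}")
```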
Examples (cont.)
Say we want to find the chance of having a roulette ball land on a red space 57% or more of the time, in 35 plays.
- Draw a box, indicate number and kind of tickets, number of draws, kind of draws
- Calculate EV for percent
- Calculate SE for percent
- Use normal curve
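The same calculation can be done on the percent scale. The sketch below is an illustration under the same box as before (18 ones, 20 zeros, 35 draws with replacement), with no continuity correction applied; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

n_plays = 35          # draws, at random, with replacement
p_red = 18 / 38       # proportion of 1's in the box

ev_percent = p_red * 100                                        # EV for % of reds
se_percent = math.sqrt(p_red * (1 - p_red) / n_plays) * 100     # SE for % of reds

# Convert 57% to standard units and take the area to the right.
z = (57 - ev_percent) / se_percent
chance = 1 - NormalDist().cdf(z)

print(f"EV for % = {ev_percent:.2f}%, SE for % = {se_percent:.2f}%")
print(f"z = {z:.2f}, chance of 57% or more reds = {chance:.3f}")
```

Note that 57% of 35 plays is just under 20 reds, so this describes essentially the same event as "20 or more reds", only measured in percent rather than as a count.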
Examples (cont.)
Assume that there are 30,000 students at UC Berkeley, and that 65% of them have some student loan debt. We want to find the chance of having 67% or more of those sampled have student loan debt, in a sample of size 1,000.
- Draw a box, indicate number and kind of tickets, number of draws, kind of draws
- Calculate EV for percent
- Calculate SE for percent
- Use normal curve
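Here the draws are made without replacement, so the correction factor enters. The following sketch works through the steps with the numbers stated in the example (population 30,000, 65% with debt, sample of 1,000); the variable names are hypothetical and no continuity correction is applied.

```python
import math
from statistics import NormalDist

population_size = 30_000
sample_size = 1_000            # draws, at random, WITHOUT replacement
p_debt = 0.65                  # proportion of 1's (students with loan debt) in the box

ev_percent = p_debt * 100      # EV for % is the percentage of 1's in the box

# SE for % as if drawing with replacement, then shrink by the correction factor.
se_with_replacement = math.sqrt(p_debt * (1 - p_debt) / sample_size) * 100
correction = math.sqrt((population_size - sample_size) / (population_size - 1))
se_percent = correction * se_with_replacement

# Convert 67% to standard units and take the area to the right.
z = (67 - ev_percent) / se_percent
chance = 1 - NormalDist().cdf(z)

print(f"EV for % = {ev_percent:.1f}%")
print(f"SE with replacement = {se_with_replacement:.3f}%, correction factor = {correction:.4f}")
print(f"SE without replacement = {se_percent:.3f}%")
print(f"z = {z:.2f}, chance of 67% or more = {chance:.3f}")
```

The correction factor is close to 1 here because the sample is only about 3% of the population.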
Examples (cont.)
I'm interested in looking at student loan debt at other colleges, besides UC Berkeley. So, I expand my survey to include UCLA and Pomona College (a small, liberal arts university). In both places, I will take a sample of 2% of the students, in order to estimate the percentage of students with loan debt. Other things being equal:
1. The accuracy to be expected at UCLA is about the same as the accuracy to be expected at Pomona.
2. The accuracy to be expected at UCLA is quite a bit higher than at Pomona.
3. The accuracy to be expected at UCLA is quite a bit lower than at Pomona.
Explain.
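One way to check your reasoning is to plug in numbers. The sketch below is purely illustrative: the enrollments (40,000 and 1,600) are round hypothetical figures, not official counts, and the 65% debt rate is borrowed from the Berkeley example as a stand-in for "other things being equal".

```python
import math

def se_percent_without_replacement(population_size, sample_size, proportion_of_ones):
    """Correction factor x SE for percent (with replacement)."""
    se_with = math.sqrt(proportion_of_ones * (1 - proportion_of_ones) / sample_size) * 100
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return correction * se_with

p_debt = 0.65  # hypothetical: same debt rate at both schools

# Hypothetical enrollments (round numbers for illustration only).
schools = {"large school (UCLA-like)": 40_000, "small school (Pomona-like)": 1_600}

results = {}
for name, population_size in schools.items():
    sample_size = round(0.02 * population_size)   # a 2% sample
    results[name] = se_percent_without_replacement(population_size, sample_size, p_debt)
    print(f"{name}: sample of {sample_size}, SE for % = {results[name]:.2f}%")
```

With the same sampling fraction, the correction factors are nearly identical, so the comparison is driven by the absolute size of each sample.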
Important Takeaways
- Box models can be used for real-world problems, not just gambling
- Box = population; draws from the box without replacement = sample (SRS)
- EV and SE for percent of 1's drawn from a 0-1 box
- Correction factor for drawing without replacement
- The probability histogram still follows the normal curve, approximately
- We can use the correction factor, combined with the normal curve, to approximate probabilities from the probability histogram
Next time: What if we don't know what's in the box?