Stat 20: Intro to Probability and Statistics
Lecture 16: More Box Models
Tessa L. Childers-Day
UC Berkeley
22 July 2014
By the end of this lecture...
You will be able to:
- Determine what we expect the sum of draws from a box to be, and how far off we will likely be
- Quickly calculate the SD of a list with only two kinds of numbers
- Easily calculate probabilities for sums of draws
- Use a box model to address more kinds of problems, e.g. counting the number of 6s shown in a series of throws
Recap: Box Models
- Box models are useful in analyzing games of chance
- Draw a box
- Indicate the number and kind of tickets
- Indicate the number and kind of draws
- Indicate what is done with each ticket
- We examined the minimum and maximum of the sum of draws
Example 1: Box Model
- Have a box with three tickets: a 1, a 2, and a 3
- Draw 5 times, with replacement
- Add together the values seen on each ticket
- What is the sum of the draws?
- How much does each draw contribute to the sum?
- What can we reasonably expect the sum to be?
The Expected Value
The Expected Value (EV) for the sum of the draws from the box is
\[ \text{EV for the sum} = (\text{number of draws}) \times (\text{average of the box}) \]
The sum of draws from a box (with replacement) should be somewhere around the expected value.
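As a quick check of the EV formula, here is a minimal Python sketch for the box from Example 1 (tickets 1, 2, 3; 5 draws with replacement). The function name `ev_of_sum` is an illustrative choice, not something from the lecture.

```python
from statistics import mean

def ev_of_sum(box, n_draws):
    """Expected value of the sum of n_draws draws, with replacement, from box."""
    return n_draws * mean(box)

box = [1, 2, 3]                # tickets from Example 1
print(ev_of_sum(box, 5))       # 5 draws times the box average of 2
```

Each draw contributes about the box average to the sum, which is why the number of draws simply multiplies it.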
Example 2: Rolling Dice
You are playing a dice game. It costs $1 per play. You roll a die, and if it shows an even number, you win $3. If it shows an odd number, you win nothing. About how much do you expect to win or lose in 50 plays?
1. Draw a box model, indicating the number and kind of tickets
2. Indicate the number and kind of draws
3. Indicate what is done with each ticket
4. Answer the question above
Example 3: Coin Flipping
You are playing a coin flipping game. It costs $1 to play. You flip two coins, and if at least one head is showing, you win $2. Otherwise, you win nothing.
True or False, and Explain: If you play 30 times, you will definitely win $10.
Chance Error
- Variation around the expected value is due to chance error
\[ \text{chance error} = \text{\# observed} - \text{\# expected} \]
- If I actually win $5, what is my chance error? What if I lose $15?
- How big is my chance error likely to be?
Standard Error
The Standard Error (SE) for the sum of the draws from the box is
\[ \text{SE for the sum} = \sqrt{\text{number of draws}} \times (\text{SD of the box}) \]
The sum of draws from a box (with replacement) should be somewhere around the expected value, give or take a SE.
Example 4: Two Boxes
What kind of variability do we expect from the sum of 5 draws, with replacement, from a box with:
1. A single 1 and a single 3?
2. A single 1 and a single 10?
Calculate the EV and SE of both of these situations.
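One way to work Example 4 is sketched below. It assumes the box SD is the population SD (divide by the number of tickets), which is what `statistics.pstdev` computes; the helper name `ev_and_se` is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def ev_and_se(box, n_draws):
    """Return (EV, SE) for the sum of n_draws draws, with replacement, from box."""
    ev = n_draws * mean(box)
    se = sqrt(n_draws) * pstdev(box)   # pstdev divides by the number of tickets
    return ev, se

for box in ([1, 3], [1, 10]):
    ev, se = ev_and_se(box, 5)
    print(box, "EV =", ev, "SE =", round(se, 2))
```

The second box has tickets that are much further apart, so its SD, and therefore the SE of the sum, comes out several times larger.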
SD Shortcut
We are obviously calculating a lot of SDs. The usual steps:
1. Find the average
2. Find the deviations from the average
3. Square the deviations from the average
4. Average the squared deviations
5. Take the square root
SD Shortcut (cont.)
If there are only two types of tickets in the box (or only two types of numbers in the list):
1. Call the larger number the "big #" and the smaller number the "small #"
2. Call the fraction of larger numbers "b.f." and the fraction of smaller numbers "s.f."
3. Calculate
\[ \text{SD} = (\text{big \#} - \text{small \#}) \times \sqrt{\text{b.f.} \times \text{s.f.}} \]
SD Shortcut (cont.)
Let's look at a box with three 2s and two 1s.
\[ \text{avg} = 1.6 \]
\[ \text{SD} = \sqrt{\frac{(2-1.6)^2 + (2-1.6)^2 + (2-1.6)^2 + (1-1.6)^2 + (1-1.6)^2}{5}} = \sqrt{\frac{(0.4)^2 + (0.4)^2 + (0.4)^2 + (-0.6)^2 + (-0.6)^2}{5}} \approx 0.49 \]
\[ \text{SD} = (2-1) \times \sqrt{\frac{3}{5} \times \frac{2}{5}} \approx 0.49 \]
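The agreement between the five-step calculation and the shortcut can be checked with a short Python sketch. The function names `sd_long_way` and `sd_shortcut` are illustrative only.

```python
from math import sqrt, isclose

def sd_long_way(values):
    """SD via the five steps: average, deviations, square, average, root."""
    avg = sum(values) / len(values)
    squared_devs = [(v - avg) ** 2 for v in values]
    return sqrt(sum(squared_devs) / len(values))

def sd_shortcut(values):
    """SD via (big # - small #) * sqrt(b.f. * s.f.); needs exactly two kinds."""
    kinds = set(values)
    if len(kinds) != 2:
        raise ValueError("shortcut needs exactly two kinds of numbers")
    big, small = max(kinds), min(kinds)
    big_frac = values.count(big) / len(values)
    small_frac = values.count(small) / len(values)
    return (big - small) * sqrt(big_frac * small_frac)

box = [2, 2, 2, 1, 1]          # three 2s and two 1s
print(round(sd_long_way(box), 2), round(sd_shortcut(box), 2))
```

The shortcut refuses lists with more than two kinds of numbers, since the formula does not apply there.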
Lists vs. Chance Processes
List of numbers (tickets in a box), all values are known
- mean = average = sum of values, divided by number of values; the typical size of an entry/ticket
- SD = standard deviation = square root of the average of the squared deviations from the mean; the typical size of the deviation from the mean in a single entry
The typical entry in a list is around average, give or take a standard deviation or so.
Lists vs. Chance Processes (cont.)
Chance process (draws from a box), values are unknown
- EV for sum of draws with replacement = (number of draws) times (average of box); typical size of the sum of draws with replacement
- SE for sum of draws with replacement = standard error = (square root of number of draws) times (SD of box); typical size of deviation from EV in a single sum of draws
The sum of draws with replacement is around expected value, give or take a standard error or so.
Example 5: Drawing from a Box
50 draws are taken, with replacement, from a box with 1 each of the following: 1, 2, 3, 6, 8
1. Calculate the expected value and standard error for the sum of the draws.
2. The sum of the draws will be around ____, give or take ____ or so.
3. Someone actually makes 50 draws with replacement. You are asked to guess what the sum is. Do you think your guess is off by about 2, 12, or 20?
4. You are told that 175 is the sum. Fill in the following:
   (a) expected value = ____
   (b) observed value = ____
   (c) chance error = ____
   (d) standard error = ____
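The quantities in Example 5 can be computed directly from the EV and SE formulas. The sketch below is one way to do it, again assuming the box SD is the population SD (`statistics.pstdev`); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3, 6, 8]
n_draws = 50

ev = n_draws * mean(box)            # number of draws times box average
se = sqrt(n_draws) * pstdev(box)    # sqrt(number of draws) times box SD

observed = 175                      # the sum reported in part 4
chance_error = observed - ev        # observed minus expected

print("expected value:", ev)
print("standard error:", round(se, 1))
print("chance error:", chance_error)
```

Comparing the size of the chance error with the SE shows whether the observed sum is a typical distance from the EV.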
Interesting Question:
- What is the chance that, using the box above (1 each of: 1, 2, 3, 6, 8), the sum of 1000 draws is between 3900 and 4100?
- Could you find a similar probability for a much smaller number of draws?
- Recalling the frequency definition of probability, could you find this probability?
Interesting Question: (cont.)
- Draw 1000 tickets with replacement, calculate the sum of the tickets
- Do this 10 times, record the proportion of sums (out of 10) that are between 3900 and 4100
- Do this 100 times, record the proportion of sums (out of 100) that are between 3900 and 4100
[Figure: Relative Proportion of Observed Sums of 1000 Draws Between 3900 and 4100. x-axis: Number of Observed Sums (0 to 10000); y-axis: Relative Proportion Between 3900 and 4100 (0.75 to 0.90)]
Interesting Question: (cont.)
- Draw 1000 tickets with replacement, calculate the sum of the tickets
- Do this 200 times, record the proportion of sums (out of 200) that are between 3900 and 4100
- Keep repeating, recording the proportion of sums so far that are between 3900 and 4100
[Figure: Relative Proportion of Observed Sums of 1000 Draws Between 3900 and 4100. x-axis: Number of Observed Sums (0 to 10000); y-axis: Relative Proportion Between 3900 and 4100 (0.75 to 0.90)]
Interesting Question: (cont.)
- Draw 1000 tickets with replacement, calculate the sum of the tickets
- Do this 10,000 times, make a histogram of the sums
- The histogram looks normal
[Figure: Histogram of 10,000 Observed Sums of 1,000 Draws From The Box. x-axis: Sum of 1,000 Draws (3700 to 4300); y-axis: Density (0.000 to 0.004)]
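The repeated-sampling experiment described above can be sketched in Python with the standard `random` module. This is not the code used to produce the lecture's plots; the function name, the fixed seed, and the choice to count the endpoints 3900 and 4100 as "between" are all assumptions made for illustration.

```python
import random

BOX = [1, 2, 3, 6, 8]

def proportion_between(n_sums, n_draws=1000, low=3900, high=4100, seed=0):
    """Fraction of n_sums simulated sums that land in [low, high]."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    hits = 0
    for _ in range(n_sums):
        total = sum(rng.choices(BOX, k=n_draws))   # draws with replacement
        if low <= total <= high:
            hits += 1
    return hits / n_sums

for n_sums in (10, 100, 200, 2000):
    print(n_sums, proportion_between(n_sums))
```

With only a few sums the proportion jumps around; as the number of sums grows it settles down, which is the frequency definition of probability at work.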
Interesting Question: (cont.)
- We can use the approximate normality of this curve to calculate the chance that the sum of 1,000 draws is between 3900 and 4100.
\[ z = \frac{\text{value of sum} - \text{expected value of sum}}{\text{standard error of sum}} \]
- Use the normal table to find the chance that the sum of 1000 draws from the box (1, 2, 3, 6, 8) is between 3900 and 4100.
- In general, the normal curve can be used to calculate probabilities for sums of random draws with replacement from a box, when the number of draws is large
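A sketch of the normal-curve calculation follows. It converts both endpoints to standard units and uses `statistics.NormalDist` in place of the normal table; no continuity correction is applied, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 2, 3, 6, 8]
n_draws = 1000

ev = n_draws * mean(box)               # expected value of the sum
se = sqrt(n_draws) * pstdev(box)       # standard error of the sum

# Convert both endpoints to standard units.
z_low = (3900 - ev) / se
z_high = (4100 - ev) / se

standard_normal = NormalDist()         # mean 0, SD 1
chance = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print("z range:", round(z_low, 2), "to", round(z_high, 2))
print("approximate chance:", round(chance, 2))
```

The area under the standard normal curve between the two z values plays the role of the table lookup.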
Example 6: Using the Normal Curve
A fair die is thrown 200 times.
1. Calculate the expected value and standard error for the sum of the throws
2. The sum of the throws will be around ____, give or take ____ or so.
3. Find the probability that the sum of the throws is greater than 647.
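Example 6 asks for a one-sided chance, so here is a hedged sketch that builds a normal curve with "new avg = EV" and "new SD = SE" directly, rather than converting to standard units by hand. It assumes no continuity correction, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

die = [1, 2, 3, 4, 5, 6]       # box for one throw of a fair die
n_throws = 200

ev = n_throws * mean(die)
se = sqrt(n_throws) * pstdev(die)

# Treat the sum as approximately normal with mean EV and SD SE.
sum_curve = NormalDist(mu=ev, sigma=se)
chance_above_647 = 1 - sum_curve.cdf(647)

print("EV:", ev, "SE:", round(se, 1))
print("chance the sum exceeds 647:", round(chance_above_647, 3))
```

Since 647 is more than two SEs below the EV, almost all of the curve lies above it.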
Example 7: Counting the Evens
A fair die is thrown 600 times.
1. The sum of the throws will be around 2100, give or take 42 or so.
2. The sum of evens thrown will be around ____, give or take ____ or so.
3. The number of evens thrown will be around ____, give or take ____ or so.
Example 7: Counting the Evens (cont.)
Let's simplify, and assume a fair die is thrown 5 times. I roll 6, 2, 4, 5, 3.
- If I am making a sum of throws, I will add: 6 + 2 + 4 + 5 + 3 = 20
- If I am making a sum of evens thrown, I will add: 6 + 2 + 4 + 0 + 0 = 12
- If I am counting the number of evens thrown, I will add: 1 + 1 + 1 + 0 + 0 = 3
Each process can be written as a sum!
Example 7: Counting the Evens (cont.)
Strategy for adding only certain things or counting number of things:
1. Make the box describing the basic chance process
2. Formulate your desired quantity as a sum
3. Change the value of the tickets (but not the number of tickets!) to add as appropriate
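The strategy can be sketched in Python using the five throws 6, 2, 4, 5, 3 from the simplified example. Each quantity becomes a plain sum once every face is relabelled with the value it should contribute; the three function names are hypothetical.

```python
rolls = [6, 2, 4, 5, 3]          # the five throws from the simplified example

def ticket_for_sum(face):
    return face                              # every face adds itself

def ticket_for_sum_of_evens(face):
    return face if face % 2 == 0 else 0      # evens add themselves, odds add 0

def ticket_for_count_of_evens(face):
    return 1 if face % 2 == 0 else 0         # evens add 1, odds add 0

sum_of_throws = sum(ticket_for_sum(r) for r in rolls)
sum_of_evens = sum(ticket_for_sum_of_evens(r) for r in rolls)
number_of_evens = sum(ticket_for_count_of_evens(r) for r in rolls)

print(sum_of_throws, sum_of_evens, number_of_evens)   # 20 12 3

# Relabelling the whole box keeps six tickets and changes only their values.
box = [1, 2, 3, 4, 5, 6]
print([ticket_for_sum_of_evens(t) for t in box])      # [0, 2, 0, 4, 0, 6]
print([ticket_for_count_of_evens(t) for t in box])    # [0, 1, 0, 1, 0, 1]
```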
Example 7: Counting the Evens (cont.)
A fair die is thrown 600 times. Box is (1, 2, 3, 4, 5, 6). Have 600 throws, with replacement
1. Sum of throws is like sum of draws from box above, usual EV and SE formulas apply
2. Sum of evens: each even drawn adds itself to the sum. Each odd drawn adds 0 to the sum. Box becomes (0, 2, 0, 4, 0, 6). Sum of evens is like sum of draws from this box, usual EV and SE formulas apply
3. Number of evens: each even drawn adds 1 to the count (sum). Each odd drawn adds 0 to the count (sum). Box becomes (0, 1, 0, 1, 0, 1). Number of evens is like sum of draws from this box, usual EV and SE formulas apply.
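Applying the usual EV and SE formulas to the three boxes might look like the sketch below. The first line of output can be compared with the "around 2100, give or take 42 or so" stated for the sum of throws. The dictionary layout and the `results` name are illustrative, and the box SD is taken to be the population SD.

```python
from math import sqrt
from statistics import mean, pstdev

n_throws = 600
boxes = {
    "sum of throws":   [1, 2, 3, 4, 5, 6],
    "sum of evens":    [0, 2, 0, 4, 0, 6],
    "number of evens": [0, 1, 0, 1, 0, 1],
}

results = {}
for name, box in boxes.items():
    ev = n_throws * mean(box)              # number of draws times box average
    se = sqrt(n_throws) * pstdev(box)      # sqrt(number of draws) times box SD
    results[name] = (ev, se)
    print(f"{name}: around {ev:.0f}, give or take {se:.0f} or so")
```

Only the ticket values differ between the three calculations; the number of tickets and the number of draws stay the same.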
In a 0-1 Box
- The 1s represent the event(s) that we wish to count, the 0s represent the event(s) that we do not wish to count
- The EV for the sum is
\[ \text{EV for the sum} = (\text{number of draws}) \times (\text{average of box}) \]
  But what is the average of the box?
- The SE for the sum is
\[ \text{SE for the sum} = \sqrt{\text{number of draws}} \times (\text{SD of box}) \]
  But what is the SD of the box?
- The normal curve can be used to calculate chances for sums of draws: new avg = EV for sum, new SD = SE for sum
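The two questions on this slide can be explored with a small sketch. In a 0-1 box the average equals the fraction of 1s, and the two-number SD shortcut reduces to the square root of (fraction of 1s) times (fraction of 0s). The example counts 6s; the figure of 180 throws is a hypothetical number chosen only for illustration.

```python
from math import sqrt, isclose
from statistics import mean, pstdev

box = [0, 0, 0, 0, 0, 1]        # 1 marks a six, 0 marks any other face
frac_ones = box.count(1) / len(box)

avg = frac_ones                              # average of a 0-1 box
sd = sqrt(frac_ones * (1 - frac_ones))       # shortcut with big # = 1, small # = 0

# Cross-check against the general definitions of average and SD.
assert isclose(avg, mean(box))
assert isclose(sd, pstdev(box))

n_throws = 180                  # hypothetical number of throws
ev = n_throws * avg
se = sqrt(n_throws) * sd
print("number of 6s: around", round(ev, 2), "give or take", round(se, 2))
```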
Important Takeaways
- The EV is the sum that we expect to see
- The SE is the amount we expect a particular sum to be off from the EV, due to chance error
- There is a shortcut for calculating the SD of a list with only two kinds of numbers
- The normal curve can be used to calculate chances for sums of draws: new avg = EV for sum, new SD = SE for sum
- Changing the box helps address a lot of problems: either to add only certain kinds of tickets, or to count the number of a certain event
Next time: Why use the normal curve to calculate probabilities?