Chapter 17: expected value and standard error for the sum of the draws from a box

Context
When we do this 10,000 times...
Expected value and standard error

Expected value
  Expected value for sum of the draws, method 1
  Expected value for sum of the draws, method 2
  Formula for expected value of sum of the draws

Standard error
  Standard error for the sum of the draws
  Computing the SE for the sum of the draws
  Example
  Example (cont'd)
  Example (cont'd)
  Short-cut

Normal approximation
  Use normal approximation
  Example
  Example (cont'd)
  Example (cont'd)

Classifying and counting
  Replace tickets by 0s and 1s

Context

We'll look at the sum of the draws from a box.
Example: count the number of heads in 100 coin tosses.
Maybe one time the number is 54, the next time it is 48, the third time it is 47.
The observed value varies!
Observed value = expected value + chance error
See computer simulation, where I repeated this 10,000 times.

When we do this 10,000 times...

[Figure: histogram of the number of heads in 100 coin tosses, repeated 10,000 times. Horizontal axis: nr of heads (about 30 to 60). Vertical axis: density (0.00 to 0.08).]

Expected value and standard error

Note that the number of heads is a random variable, with a distribution!
What are the center and the spread of this distribution?
The center is called the expected value.
The spread is called the standard error.
The standard error gives the likely size of the chance error.
We can use a similar model to analyze election polls, and will look into that later.
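
The computer simulation mentioned on the Context slide is not included in the handout. A minimal sketch of such a simulation might look like the following; the function names, the fixed seed, and the use of Python's standard library are assumptions made for illustration, not the lecturer's own code.

```python
import random
import statistics


def count_heads(n_tosses, rng):
    """Count the heads in n_tosses tosses of a fair coin."""
    return sum(1 for _ in range(n_tosses) if rng.random() < 0.5)


def simulate_head_counts(n_tosses=100, n_repeats=10_000, seed=0):
    """Repeat the experiment n_repeats times and return every observed count."""
    rng = random.Random(seed)  # fixed seed (an assumption) so reruns agree
    return [count_heads(n_tosses, rng) for _ in range(n_repeats)]


if __name__ == "__main__":
    counts = simulate_head_counts()
    # Center and spread of the observed values.
    print("average of the observed counts:", statistics.mean(counts))
    print("SD of the observed counts:", statistics.pstdev(counts))
    print("smallest and largest count:", min(counts), max(counts))
```

The average of the observed counts should land close to 50, and almost all counts should fall in the 30 to 60 range shown on the histogram's horizontal axis.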

Expected value

Expected value for sum of the draws, method 1

We look at the sum of 100 draws from a box with the tickets 0, 1, 1, 6.
Observed value = expected value + chance error
What is the expected value of the sum of the draws?
Method 1:
  How many 0's do we expect in our draws? About 25.
  How many 1's do we expect in our draws? About 50.
  How many 6's do we expect in our draws? About 25.
  So what do we expect for the sum of the draws? About
  $(25 \times 0) + (50 \times 1) + (25 \times 6) = 0 + 50 + 150 = 200$

Expected value for sum of the draws, method 2

Method 2:
  The average of the box is $\frac{0 + 1 + 1 + 6}{4} = \frac{8}{4} = 2$.
  So after each draw, we expect the sum of the draws to increase by about 2.
  So the sum of the draws is expected to be $100 \times 2 = 200$.
General formula for the expected value for the sum of the draws, made at random with replacement:
  $(\text{number of draws}) \times (\text{average of the box})$

Formula for expected value of sum of the draws

General formula for the expected value for the sum of the draws, made at random with replacement:
  $(\text{number of draws}) \times (\text{average of the box})$
Does the formula make sense?
What happens if the number of draws is doubled? Then the expected value of the sum of the draws doubles.
What happens if the average of the box is doubled? Then the expected value of the sum of the draws doubles.
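
The two methods for the expected value can be checked against each other with a short sketch. The function names are hypothetical, and exact fractions are used only to avoid rounding; this is an illustration of the slides' arithmetic, not part of the original material.

```python
from collections import Counter
from fractions import Fraction


def expected_sum_by_counts(box, n_draws):
    """Method 1: expected count of each ticket value, times that value."""
    total = Fraction(0)
    for value, copies in Counter(box).items():
        # A value on `copies` of the tickets is expected in that share of the draws.
        expected_count = Fraction(n_draws * copies, len(box))
        total += expected_count * value
    return total


def expected_sum_by_average(box, n_draws):
    """Method 2: (number of draws) x (average of the box)."""
    return n_draws * Fraction(sum(box), len(box))


if __name__ == "__main__":
    box = [0, 1, 1, 6]
    print(expected_sum_by_counts(box, 100))   # method 1
    print(expected_sum_by_average(box, 100))  # method 2
```

Both functions return 200 for 100 draws from the box 0, 1, 1, 6, and doubling the number of draws doubles the result, as the slides argue.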

Standard error

Standard error for the sum of the draws

We look at the sum of draws from a box.
Observed value = expected value + chance error
How big is the chance error?
The chance error is likely to be similar in size to the standard error (SE) for the sum of the draws.
If the SE for the sum of the draws is large, then we have large chance errors, and the observed values are widely spread around the expected value.
If the SE for the sum of the draws is small, then we have small chance errors, and the observed values are tightly clustered around the expected value.
Observed values are rarely more than 2 or 3 SEs away from the expected value.

Computing the SE for the sum of the draws

$\text{SE for the sum of the draws} = \sqrt{\text{number of draws}} \times (\text{SD of the box})$
This is called the square root law, because it involves the square root of the number of draws.
Does the formula make sense?
What happens if the number of draws is doubled? Then the SE of the sum of the draws is multiplied by a factor $\sqrt{2}$ (about 1.41).
This matches what we learned about the law of large numbers: the chance error grows, but only slowly.
What happens if we double the SD of the box? Then the SE of the sum of the draws doubles.

Example

We look at the sum of 25 draws from a box with tickets 0, 2, 3, 4, 6.
Fill in the blanks: the sum of the draws is around ...(a), give or take ...(b) or so.
(a) should be the expected value of the sum of the draws:
  $(\text{number of draws}) \times (\text{average of the box}) = 25 \times \frac{0 + 2 + 3 + 4 + 6}{5} = 25 \times 3 = 75$
(b) should be the SE for the sum of the draws. This is given by the square root law:
  $\sqrt{\text{number of draws}} \times (\text{SD of the box})$
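
The "does the formula make sense?" checks for the square root law can also be run numerically. This is a rough sketch; `se_of_sum` is a hypothetical helper name, and `statistics.pstdev` is used for the SD of the box because it divides by the number of tickets.

```python
import math
import statistics


def se_of_sum(box, n_draws):
    """Square root law: sqrt(number of draws) x (SD of the box)."""
    return math.sqrt(n_draws) * statistics.pstdev(box)


if __name__ == "__main__":
    box = [0, 2, 3, 4, 6]
    doubled_box = [2 * ticket for ticket in box]  # doubles the SD of the box

    # Doubling the number of draws multiplies the SE by sqrt(2), not by 2.
    print(se_of_sum(box, 50) / se_of_sum(box, 25))
    # Doubling the SD of the box doubles the SE.
    print(se_of_sum(doubled_box, 25) / se_of_sum(box, 25))
```

The first ratio equals the square root of 2 and the second equals 2, up to floating-point rounding.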

Example (cont'd)

We need to compute the SE for the sum of the draws:
  $\sqrt{\text{number of draws}} \times (\text{SD of the box})$
What is the SD of the box 0, 2, 3, 4, 6?
  Step 1: compute the average of the box: 3 (see part a)
  Step 2: compute the deviations from the average: -3, -1, 0, 1, 3
  Step 3: compute the r.m.s. size of the deviations:
    $\sqrt{\frac{(-3)^2 + (-1)^2 + 0^2 + 1^2 + 3^2}{5}} = \sqrt{\frac{20}{5}} = \sqrt{4} = 2$
So the SD of the box is 2.
The SE for the sum of the draws is: $\sqrt{25} \times 2 = 5 \times 2 = 10$.

Example (cont'd)

We look at the sum of 25 draws from a box with tickets 0, 2, 3, 4, 6.
Fill in the blanks: the sum of the draws is around ...(a), give or take ...(b) or so.
(a) should be the expected value of the sum of the draws: 75
(b) should be the SE for the sum of the draws: 10
So the sum of the draws is around 75, give or take 10 or so.

Short-cut

Suppose the box only contains two kinds of tickets: some tickets with a big number and some tickets with a small number.
Then there is a shortcut to compute the SD of the box!
  $\text{SD of the box} = (\text{big number} - \text{small number}) \times \sqrt{(\text{fraction of big numbers}) \times (\text{fraction of small numbers})}$
Example: box with tickets 7, 7, 7, -2, -2
  Big number = 7. Fraction of big numbers = 3/5.
  Small number = -2. Fraction of small numbers = 2/5.
  SD of the box = $(7 - (-2)) \times \sqrt{(3/5) \times (2/5)} = 9 \times \sqrt{(3/5) \times (2/5)}$
Use a calculator to compute this.
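
In place of a calculator, the three-step SD computation and the short-cut can be sketched in code. The function names are hypothetical; the sketch simply follows the steps on the slides and then checks that the short-cut agrees with the direct computation for the box 7, 7, 7, -2, -2.

```python
import math


def sd_of_box(box):
    """SD of the box as the r.m.s. size of the deviations from the average."""
    average = sum(box) / len(box)                      # step 1
    deviations = [ticket - average for ticket in box]  # step 2
    return math.sqrt(sum(d * d for d in deviations) / len(box))  # step 3


def sd_shortcut(big, small, fraction_big):
    """Short-cut for a box holding only two different numbers."""
    fraction_small = 1 - fraction_big
    return (big - small) * math.sqrt(fraction_big * fraction_small)


if __name__ == "__main__":
    print(sd_of_box([0, 2, 3, 4, 6]))                  # SD of the example box
    print(math.sqrt(25) * sd_of_box([0, 2, 3, 4, 6]))  # SE for the sum of 25 draws
    print(sd_shortcut(7, -2, 3 / 5))                   # short-cut
    print(sd_of_box([7, 7, 7, -2, -2]))                # direct computation
```

For the two-number box, both routes give an SD of about 4.41.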

Normal approximation

Use normal approximation

If the number of draws is large, we can use the normal approximation to estimate chances.
We should use a new average and a new SD:
  New average = expected value for the sum of the draws
  New SD = SE for the sum of the draws
So the new standard units tell us how many SEs a number is away from the expected value.

Example

Consider the sum of 25 draws from the box with tickets 0, 2, 3, 4, 6.
See computer simulation, where I repeated this 1000 times.

Example (cont'd)

[Figure: histogram of the sum of the draws, repeated 1000 times. Horizontal axis: sum of the draws (40 to 110). Vertical axis: density (0.00 to 0.04).]
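
The simulation behind this histogram is not part of the handout. One possible sketch of it is below; the function name and the fixed seed are assumptions, and `random.Random.choices` is used to draw at random with replacement.

```python
import random
import statistics


def simulate_sums(box, n_draws=25, n_repeats=1000, seed=0):
    """Draw n_draws tickets with replacement, n_repeats times; return the sums."""
    rng = random.Random(seed)  # fixed seed (an assumption) so reruns agree
    return [sum(rng.choices(box, k=n_draws)) for _ in range(n_repeats)]


if __name__ == "__main__":
    sums = simulate_sums([0, 2, 3, 4, 6])
    # The center should be near the expected value and the spread near the SE.
    print("average of the observed sums:", statistics.mean(sums))
    print("SD of the observed sums:", statistics.pstdev(sums))
```

The average and SD of the simulated sums should come out close to the expected value of 75 and the SE of 10 computed earlier, though not exactly equal, since 1000 repetitions still leave some chance error.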

Example (cont'd)

About what percentage of observed values should be between 50 and 100?
We use the normal approximation:
  New average: expected value for the sum of the draws = 75
  New SD: SE for the sum of the draws = 10
Note that these numbers match the graph on the previous slide.
Then use the normal approximation as before. See overhead.

Classifying and counting

Replace tickets by 0s and 1s

See overhead for example.
Suppose you draw from a box, and want to count the number of a certain ticket (or tickets).
Then:
  put a 0 on the tickets that you don't want to count
  put a 1 on the tickets that you do want to count
Using the new box:
  The count is like the sum of the draws from the new box.
  We can compute the expected value and SE as before.
  We can also use the normal curve to approximate probabilities as before.
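
The overhead examples are not reproduced in the handout. As a stand-in, the sketch below works the "between 50 and 100" question with the normal curve and then builds a 0-1 box for the coin-toss count from the Context slide. All function names are hypothetical, and no continuity correction is applied, which is an assumption about how the overhead handles it.

```python
import math
from statistics import NormalDist, pstdev


def ev_and_se_of_sum(box, n_draws):
    """Expected value and SE for the sum of n_draws draws with replacement."""
    expected_value = n_draws * sum(box) / len(box)
    standard_error = math.sqrt(n_draws) * pstdev(box)
    return expected_value, standard_error


def chance_between(low, high, expected_value, standard_error):
    """Normal approximation: area under the curve between low and high."""
    curve = NormalDist(mu=expected_value, sigma=standard_error)
    return curve.cdf(high) - curve.cdf(low)


def zero_one_box(box, wanted):
    """Put a 1 on tickets we want to count and a 0 on all the others."""
    return [1 if ticket in wanted else 0 for ticket in box]


if __name__ == "__main__":
    ev, se = ev_and_se_of_sum([0, 2, 3, 4, 6], 25)
    # 50 and 100 are each 2.5 SEs away from the expected value of 75.
    print("chance of a sum between 50 and 100:", chance_between(50, 100, ev, se))

    # Counting heads in 100 tosses: heads becomes 1, tails becomes 0.
    coin_box = zero_one_box(["heads", "tails"], {"heads"})
    print("EV and SE for the number of heads:", ev_and_se_of_sum(coin_box, 100))
```

With these inputs the area between 50 and 100 is roughly 99%, and the number of heads in 100 tosses has expected value 50 and SE 5.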