University of California, Berkeley, Statistics 134: Concepts of Probability
Michael Lugo, Spring 2011
Exam 2 solutions

1. A fair twenty-sided die has its faces labeled $1, 2, 3, \ldots, 20$. The die is rolled sixteen times. Find:

(a) [5] The expected value of the sum of the numbers rolled.

Solution: the expectation of each roll is $(1 + 2 + \cdots + 20)/20 = 10.5$, so the expectation of the sum is $10.5 \times 16 = 168$.

Average score: 4.7. Median score: 5. Most people got this.

(b) [5] The variance of the sum of the numbers rolled.

Solution: the variance of each roll is $(20^2 - 1)/12 = 33.25$, and the rolls are independent, so the variance of the sum is $33.25 \times 16 = 532$.

Average score: 3.8. Median score: 4. The most common error here was to find the variance of a single roll and not multiply by 16.

(c) [5] The approximate probability that the sum of the numbers rolled is at least 180.

Solution: from the central limit theorem, the probability is approximately $1 - \Phi\big((180 - 168)/\sqrt{532}\big) \approx 1 - \Phi(0.52) \approx 1 - 0.70 = 0.30$.

Average score: 3.9. Median score: 5. Note that the denominator is the standard deviation, which is the square root of the variance calculated in part (b).

(d) [5] The expected number of faces which fail to appear at least once.

Solution: let $X$ be the number of faces which fail to appear at least once. Then $X = I_1 + \cdots + I_{20}$, where $I_n$ is the indicator of the event that face $n$ never appears. $E(I_n) = (1 - 1/20)^{16}$, so the answer is $20(1 - 1/20)^{16} \approx 8.80$.

Average score: 2.8. Median score: 3. Among people who tried this and didn't get the right answer, a lot gave the answer $16(1 - 1/20)^{16}$. There are twenty faces, and therefore you're adding up twenty indicators.

(e) [10] The variance of the number of faces which fail to appear at least once.

Solution: $X^2 = \sum_{k=1}^{20} I_k^2 + \sum_{j \ne k} I_j I_k$. Since $I_k^2 = I_k$, symmetry and linearity give $E(X^2) = 20\,E(I_1) + (20 \cdot 19)\,E(I_1 I_2)$. We already computed $E(I_1) = (0.95)^{16}$. Similarly, $E(I_1 I_2) = (0.9)^{16}$, the probability that neither face 1 nor face 2 ever appears. Thus $E(X^2) = 20(0.95)^{16} + 380(0.9)^{16} \approx 79.22$ and $\mathrm{Var}(X) = E(X^2) - E(X)^2 \approx 79.217 - (8.8025)^2 \approx 1.73$.

Average score: 2.8. Median score: 2.
You have to use the method of indicators here. In particular the distribution is not binomial. If face 1 fails to appear in sixteen rolls, then that makes face 2 slightly more likely to appear at some point. In terms of covariance, $\mathrm{Cov}(I_1, I_2) = E(I_1 I_2) - E(I_1)E(I_2)$ is negative, so the variance is smaller than that of a binomial with $n = 20$, $p = (0.95)^{16}$.

Problem 1 statistics: out of 30 points, average 18, median 19, SD 6.3.
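The arithmetic in Problem 1 can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and part (c) uses the plain normal approximation with no continuity correction, matching the solution above.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

FACES = 20  # faces on the die
ROLLS = 16  # number of rolls

# (a), (b): mean and variance of one roll, scaled by the number of rolls.
roll_mean = Fraction(sum(range(1, FACES + 1)), FACES)
roll_var = Fraction(FACES**2 - 1, 12)
sum_mean = ROLLS * roll_mean
sum_var = ROLLS * roll_var

# (c): normal approximation to P(sum >= 180), no continuity correction.
z = (180 - float(sum_mean)) / sqrt(float(sum_var))
p_at_least_180 = 1 - NormalDist().cdf(z)

# (d), (e): I_n indicates that face n never appears in any roll.
e_i = Fraction(FACES - 1, FACES) ** ROLLS    # E(I_1)
e_ij = Fraction(FACES - 2, FACES) ** ROLLS   # E(I_1 I_2)
e_x = FACES * e_i
e_x2 = FACES * e_i + FACES * (FACES - 1) * e_ij
var_x = e_x2 - e_x**2

# Variance the count would have if the indicators were independent.
binomial_var = FACES * e_i * (1 - e_i)

print(sum_mean, sum_var)
print(round(p_at_least_180, 2))
print(round(float(e_x), 2), round(float(var_x), 2), round(float(binomial_var), 2))
```

Using `Fraction` keeps parts (d) and (e) exact until the final rounding, which avoids the rounding trap of subtracting two rounded numbers that are close together.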
2. Phone calls arrive at a telephone exchange at an average rate of two per minute. The calls may be modeled as a Poisson process. Let $T$ be the time (in minutes after noon) at which the second call to arrive after noon arrives.

(a) [5] Find the PDF of $T$.

Solution: $T$ is gamma(2, 2) and has PDF $f(t) = 4te^{-2t}$ for all positive $t$.

Average score: 2.9. Median score: 3. You just have to recognize that this is a distribution we've already talked about. Note that a Poisson process is not the same thing as a Poisson distribution.

(b) [5] Find $E(T^2)$.

Solution: integrating,

$$E(T^2) = \int_0^\infty t^2 f(t)\,dt = \int_0^\infty 4t^3 e^{-2t}\,dt.$$

Changing variables with $u = 2t$ gives $\frac{1}{4}\int_0^\infty u^3 e^{-u}\,du$. The integral is $\Gamma(4) = 3! = 6$, so we get $3/2$.

Alternatively, $E(T^2) = E(T)^2 + \mathrm{Var}(T)$. We know $E(T) = r/\lambda$ and $\mathrm{Var}(T) = r/\lambda^2$ for gamma random variables; here $E(T) = 1$ and $\mathrm{Var}(T) = 1/2$, summing to $3/2$.

Average score: 2.2. Median score: 2. This is just calculus.

(c) [5] Find $P(0 < T < 1)$.

Solution: integrating again,

$$P(0 < T < 1) = \int_0^1 4te^{-2t}\,dt.$$

Integrating by parts with $u = 4t$, $dv = e^{-2t}\,dt$, so that $du = 4\,dt$ and $v = -\frac{1}{2}e^{-2t}$, we find the indefinite integral

$$\int 4te^{-2t}\,dt = -2te^{-2t} + 2\int e^{-2t}\,dt = -(1 + 2t)e^{-2t}.$$

Evaluating at 0 and 1 gives $P(0 < T < 1) = -(1 + 2)e^{-2} + (1 + 0)e^{0} = 1 - 3e^{-2}$.

Alternatively, if the second call arrives between time 0 and time 1, that means that at least two calls arrive by time 1. But the number of calls arriving between time 0 and time 1, which we'll call $N$, follows a Poisson distribution with mean 2. Then $P(N = k) = e^{-2} 2^k / k!$. In particular

$$P(0 < T < 1) = P(N \ge 2) = 1 - P(N = 0) - P(N = 1) = 1 - e^{-2} - 2e^{-2} = 1 - 3e^{-2}.$$

Average score: 2.7. Median score: 3. This is just calculus.
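Parts (a) through (c) can be checked numerically. The sketch below is one way to do it with a plain midpoint rule; the helper names (`pdf_T`, `midpoint_integral`, `poisson_pmf`) and the truncation of the infinite integral at $t = 40$ are assumptions made for illustration, not part of the exam solution.

```python
from math import exp, factorial

RATE = 2.0  # calls per minute


def pdf_T(t):
    """Gamma(2, 2) density of the arrival time of the second call."""
    return RATE**2 * t * exp(-RATE * t) if t > 0 else 0.0


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


# (a): the density should integrate to 1 (tail beyond t = 40 is negligible).
total_mass = midpoint_integral(pdf_T, 0.0, 40.0)

# (b): E(T^2), to be compared with 3/2.
second_moment = midpoint_integral(lambda t: t**2 * pdf_T(t), 0.0, 40.0)

# (c): the two routes to P(0 < T < 1), to be compared with 1 - 3e^{-2}.
p_by_integral = midpoint_integral(pdf_T, 0.0, 1.0)
p_by_poisson = 1 - poisson_pmf(0, RATE) - poisson_pmf(1, RATE)
p_closed_form = 1 - 3 * exp(-2)

print(total_mass, second_moment)
print(p_by_integral, p_by_poisson, p_closed_form)
```

The agreement between `p_by_integral` and `p_by_poisson` mirrors the two solutions to part (c): one works with the gamma density of $T$, the other with the Poisson count of calls in the first minute.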
(d) [10] Let $N(a, b)$ denote the number of calls arriving in the time interval $(a, b)$. Find $P(N(0, 2) = 3 \text{ and } N(1, 3) = 4 \mid N(0, 3) = 5)$.

Solution: We can rewrite this using the definition of conditional probability to get

$$\frac{P(N(0, 2) = 3,\ N(1, 3) = 4,\ N(0, 3) = 5)}{P(N(0, 3) = 5)}.$$

Five calls in $(0, 3)$ with three of them in $(0, 2)$ leaves two in $(2, 3)$; four in $(1, 3)$ leaves one in $(0, 1)$, and hence two in $(1, 2)$. So the expression can be rewritten as

$$\frac{P(N(0, 1) = 1,\ N(1, 2) = 2,\ N(2, 3) = 2)}{P(N(0, 3) = 5)}.$$

Now the three events in the numerator are independent, and so this is

$$\frac{P(N(0, 1) = 1)\,P(N(1, 2) = 2)\,P(N(2, 3) = 2)}{P(N(0, 3) = 5)}.$$

But $N(0, 1)$, $N(1, 2)$, $N(2, 3)$ are all Poisson with mean 2, and $N(0, 3)$ is Poisson with mean 6. Thus this is

$$\frac{e^{-2} 2^1/1! \cdot e^{-2} 2^2/2! \cdot e^{-2} 2^2/2!}{e^{-6} 6^5/5!}.$$

The $e$'s cancel, leaving

$$\frac{2^1 \cdot 2^2 \cdot 2^2 \cdot 5!}{1!\,2!\,2!\,6^5} = \frac{3840}{31104} = \frac{10}{81}.$$

Average score: 3.5. Median score: 3.

Problem 2 statistics: out of 25 points, average 11.4, median 11, SD 6.9.

3. Alice and Bob have children until they have had at least one boy and at least one girl, and then stop. The probability that a child is a boy is $p$.

(a) [5] What is the probability that they have a total of $k$ children, for each integer $k \ge 2$?

Solution: Write $q = 1 - p$ and decompose based on whether the first child is a boy or a girl. The probability that they have $k$ children and the first child is a boy is the probability that they have $k - 1$ boys followed by a girl, which is $p^{k-1} q$; similarly the probability they have $k$ children and the first child is a girl is $q^{k-1} p$. The answer is the sum of these, $p^{k-1} q + q^{k-1} p$.

Average score 4.2, median score 5. Common errors included writing only one of the two terms.

(b) [10] Let $p = 1/2$, so boys and girls are equally likely. What are the mean and variance of the number of children that they have?

Solution: The probability of having $k$ children in this case is just $(1/2)^{k-1}$, for $k \ge 2$. This is just a geometric with success probability $p = 1/2$, shifted by 1. (You can see this probabilistically: the geometric itself is the waiting time for a child of opposite sex to the first child.) Such a geometric has mean $1/p = 2$ and variance $q/p^2 = 2$, so the number of children has mean 3 and variance 2.

Alternatively, find $\sum_{k \ge 2} k/2^{k-1}$ to get the mean. Note that this is not a geometric series! It is

$$\frac{2}{2} + \frac{3}{2^2} + \frac{4}{2^3} + \cdots$$

and we can rewrite this as

$$\left(\frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots\right) + \left(\frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots\right) + \left(\frac{1}{2^2} + \frac{1}{2^3} + \cdots\right) + \left(\frac{1}{2^3} + \frac{1}{2^4} + \cdots\right) + \cdots$$

Each term is a geometric series, so this is $1 + 1 + 1/2 + 1/4 + 1/8 + \cdots$; this is now a geometric series except for the first term. Its sum is 3.

To get the variance one needs $\sum_{k \ge 2} k^2/2^{k-1}$. Unfortunately this sum is hard to do by tricks like the one above; the best way to do it explicitly is by generating functions, if you know about those from a discrete math class. That sum is 11. The variance is therefore $11 - 3^2 = 2$.

Average score: 5.4. Median score: 6. A lot more people got the mean than the variance; if you don't recognize this as a geometric, doing the sum to get $E(X^2)$ is difficult.

Problem 3 statistics: out of 15 points, average 9.6, median 10, SD 3.8.

4. Let $U$ have uniform $(0, 1)$ distribution. Let $V = U^4$. Find:

(a) [5] the CDF of $V = U^4$.

Solution: $P(V \le v) = P(U^4 \le v) = P(U \le v^{1/4}) = v^{1/4}$.

(b) [5] the PDF of $V$.

Solution: differentiate the CDF from part (a) to get $f(v) = (1/4)v^{-3/4}$.

(a): average 2.7, median 3. (b): average 3.2, median 4. A lot of people did this in the other order, using the change-of-variables formula to get (b) and then integrating to get (a). Some got the CDF and PDF confused and wrote down the right PDF as the answer to (a), and then differentiated that to get (b). And a discouragingly large number of people just wrote down the CDF or PDF for a uniform random variable.

Technically these formulas are only true on the range $[0, 1]$. The CDF is 0 to the left of 0 and 1 to the right of 1; the PDF is 0 outside this range.

(c) [5] the expected value of $V$.

Solution: integrate the PDF from part (b):

$$\int_0^1 v f(v)\,dv = \int_0^1 \frac{1}{4} v^{1/4}\,dv = \frac{1}{4} \cdot \frac{v^{5/4}}{5/4}\,\bigg|_0^1$$

and evaluation gives $1/5$.
(c): average 2.8, median 3. Note that the integral is from 0 to 1, not from 0 to $\infty$ or $-\infty$ to $\infty$. We've written that $E(V) = \int_{-\infty}^{\infty} v f(v)\,dv$ before, but you have to remember where $f$ is zero.

(d) [5] Let $V_1, V_2, V_3$ be three independent random variables with the same distribution as $V$. What is the density of $V_{(2)}$, the second order statistic of $V_1, V_2, V_3$?
Solution: by formula from the text, for $0 < v < 1$ the density is

$$3\binom{2}{1} \cdot \frac{1}{4}v^{-3/4} \cdot (v^{1/4})^1 (1 - v^{1/4})^1 = \frac{3}{2}\left(v^{-1/2} - v^{-1/4}\right).$$

Average 2.7, median 3. Note that the $V_i$ are NOT uniformly distributed, so the order statistics are not beta distributed.

(e) [5] Find the expected value of $V_{(2)}$.

Solution: integrate $v$ times the density from part (d) to get

$$\int_0^1 \frac{3}{2}\, v\left(v^{-1/2} - v^{-1/4}\right) dv = \frac{3}{2}\int_0^1 \left(v^{1/2} - v^{3/4}\right) dv = \frac{3}{2}\left(\frac{1}{3/2} - \frac{1}{7/4}\right) = \frac{1}{7}.$$

(f) [5] Find the median of $V_{(2)}$. (Hint: if you write an integral, you're doing it wrong.)

Solution: $V_1, V_2, V_3$ are obtained from uniform random variables $U_1, U_2, U_3$ by taking fourth powers. Taking fourth powers preserves order on $[0, 1]$, so $V_{(2)} = U_{(2)}^4$. In particular, the median of $V_{(2)}$ is the fourth power of the median of $U_{(2)}$. By symmetry the median of $U_{(2)}$, the second largest of three random points in the unit interval, is $1/2$, and so the median of $V_{(2)}$ is $1/16$.
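The answers to Problem 4 can be sanity-checked by simulation. The following is a rough Monte Carlo sketch, not part of the exam solution: the seed, the sample size, and the helper names `sample_v` and `cdf_v2` are arbitrary choices, and `cdf_v2` uses the standard fact that the middle of three independent draws is at most $v$ when at least two of them are, giving $3F^2 - 2F^3$ with $F = v^{1/4}$.

```python
import random
from statistics import mean, median

random.seed(134)  # arbitrary seed, only so that runs are repeatable
N = 200_000       # number of simulated draws (an assumption, not tuned)


def sample_v():
    """One draw of V = U^4 with U uniform on (0, 1)."""
    return random.random() ** 4


# Draws of V itself, and of the middle value of three independent copies.
v_samples = [sample_v() for _ in range(N)]
v2_samples = [sorted(sample_v() for _ in range(3))[1] for _ in range(N)]


def cdf_v2(v):
    """CDF of V_(2): P(at least two of the three V_i are <= v)."""
    F = v ** 0.25  # CDF of a single V from part (a)
    return 3 * F**2 - 2 * F**3


print(mean(v_samples))     # compare with E(V) = 1/5
print(mean(v2_samples))    # compare with E(V_(2)) = 1/7
print(median(v2_samples))  # compare with the median 1/16
print(cdf_v2(1 / 16))      # should be very close to 1/2
```

The last line checks part (f) without any integral: plugging $v = 1/16$ into the CDF gives $F = 1/2$, and $3(1/2)^2 - 2(1/2)^3 = 1/2$.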