2. Discrete Random Variables and Expectation

In tossing two dice we are often interested in the sum of the dice rather than their separate values. The sample space in tossing two dice consists of 36 events of equal probability, given by the ordered pairs of numbers $\{(1,1), (1,2), \dots, (6,6)\}$. If the quantity we are interested in is the sum of the two dice, then we are interested in 11 events (of unequal probability), one for each possible sum $2, 3, \dots, 12$. Any such function from the sample space to the real numbers is called a random variable.

MAT-72306 RandAl, Spring 2015, 22-Jan-15

2.1. Random Variables and Expectation

Definition 2.1: A random variable (RV) $X$ on a sample space $\Omega$ is a real-valued function on $\Omega$; that is, $X \colon \Omega \to \mathbb{R}$. A discrete random variable is a RV that takes on only a finite or countably infinite number of values.

For a discrete RV $X$ and a real value $a$, the event "$X = a$" includes all the basic events of the sample space in which $X$ assumes the value $a$. That is, "$X = a$" represents the set $\{s \in \Omega : X(s) = a\}$.
We denote the probability of that event by
$\Pr(X = a) = \sum_{s \in \Omega : X(s) = a} \Pr(s)$.
If $X$ is the RV representing the sum of the two dice, the event $X = 4$ corresponds to the set of basic events $\{(1,3), (2,2), (3,1)\}$. Hence $\Pr(X = 4) = 3/36 = 1/12$.

Definition 2.2: Two RVs $X$ and $Y$ are independent if and only if
$\Pr((X = x) \cap (Y = y)) = \Pr(X = x) \cdot \Pr(Y = y)$
for all values $x$ and $y$. Similarly, RVs $X_1, X_2, \dots, X_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$ and any values $x_i$, $i \in I$,
$\Pr\bigl(\bigcap_{i \in I} (X_i = x_i)\bigr) = \prod_{i \in I} \Pr(X_i = x_i)$.
Definition 2.3: The expectation of a discrete RV $X$, denoted by $\mathrm{E}[X]$, is given by
$\mathrm{E}[X] = \sum_i i \Pr(X = i)$,
where the summation is over all values $i$ in the range of $X$. The expectation is finite if $\sum_i |i| \Pr(X = i)$ converges; otherwise, it is unbounded.

E.g., the expectation of the RV $X$ representing the sum of two dice is
$\mathrm{E}[X] = \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \dots + \frac{1}{36} \cdot 12 = 7$.

As an example of where the expectation of a discrete RV is unbounded, consider a RV $X$ that takes on the value $2^i$ with probability $1/2^i$ for $i = 1, 2, \dots$. The expected value of $X$ is
$\mathrm{E}[X] = \sum_{i=1}^{\infty} \frac{1}{2^i} \cdot 2^i = \sum_{i=1}^{\infty} 1 = \infty$,
which expresses that $\mathrm{E}[X]$ is unbounded.
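The two-dice expectation above can be checked with a short exact calculation (a sketch following Definition 2.3; the variable names are illustrative):

```python
from fractions import Fraction

# Build the distribution of the sum of two fair dice and compute
# E[X] = sum over values x of x * Pr(X = x), exactly.
probs = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        s = d1 + d2
        probs[s] = probs.get(s, Fraction(0)) + Fraction(1, 36)

expectation = sum(x * p for x, p in probs.items())
print(len(probs), expectation)  # 11 possible sums; expectation 7
```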
2.1.1. Linearity of Expectations

By this property, the expectation of the sum of RVs is equal to the sum of their expectations.

Theorem 2.1 [Linearity of Expectations]: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations,
$\mathrm{E}\bigl[\sum_{i=1}^{n} X_i\bigr] = \sum_{i=1}^{n} \mathrm{E}[X_i]$.

Proof: We prove the statement for two random variables $X$ and $Y$ (the general case follows by induction). The summations that follow are understood to be over the ranges of the corresponding RVs:
$\mathrm{E}[X + Y] = \sum_i \sum_j (i + j) \Pr((X = i) \cap (Y = j))$
$= \sum_i \sum_j i \Pr((X = i) \cap (Y = j)) + \sum_i \sum_j j \Pr((X = i) \cap (Y = j))$
$= \sum_i i \sum_j \Pr((X = i) \cap (Y = j)) + \sum_j j \sum_i \Pr((X = i) \cap (Y = j))$
$= \sum_i i \Pr(X = i) + \sum_j j \Pr(Y = j) = \mathrm{E}[X] + \mathrm{E}[Y]$.
The first equality follows from Definition 1.2. The penultimate equality uses Theorem 1.6, the law of total probability.
Let us now compute the expected sum of two standard dice. Let $X = X_1 + X_2$, where $X_i$ represents the outcome of die $i$ for $i = 1, 2$. Then
$\mathrm{E}[X_i] = \sum_{j=1}^{6} j \cdot \frac{1}{6} = \frac{7}{2}$.
Applying the linearity of expectations, we have $\mathrm{E}[X] = \mathrm{E}[X_1] + \mathrm{E}[X_2] = 7$.

Linearity of expectations holds for any collection of RVs, even if they are not independent.

Lemma 2.2: For any constant $c$ and discrete RV $X$, $\mathrm{E}[cX] = c\,\mathrm{E}[X]$.

Proof: The lemma is obvious for $c = 0$. For $c \neq 0$,
$\mathrm{E}[cX] = \sum_j j \Pr(cX = j) = c \sum_j \frac{j}{c} \Pr\bigl(X = \frac{j}{c}\bigr) = c \sum_k k \Pr(X = k) = c\,\mathrm{E}[X]$.
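Since linearity requires no independence, a quick exact check with a deliberately dependent pair may be instructive (a sketch; the choice $Y = 7 - X_1$ is hypothetical, made so that $Y$ is fully determined by $X_1$):

```python
from fractions import Fraction

# X1 is one fair die; Y = 7 - X1 is completely determined by X1,
# yet E[X1 + Y] = E[X1] + E[Y] still holds.
p = Fraction(1, 6)
E_X1 = sum(v * p for v in range(1, 7))               # 7/2
E_Y = sum((7 - v) * p for v in range(1, 7))          # 7/2
E_sum = sum((v + (7 - v)) * p for v in range(1, 7))  # X1 + Y = 7 always

print(E_X1, E_Y, E_sum)
```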
2.1.2. Jensen's Inequality

Let us choose the length $X$ of a side of a square uniformly at random from the integers in the range $[1, 99]$. What is the expected value of the area? We can write this as $\mathrm{E}[X^2]$. It is tempting to think of this as being equal to $(\mathrm{E}[X])^2$, but a simple calculation shows that this is not correct. In fact,
$(\mathrm{E}[X])^2 = 50^2 = 2500$,
whereas
$\mathrm{E}[X^2] = \sum_{i=1}^{99} \frac{i^2}{99} = \frac{9950}{3} \approx 3316.7 > 2500$.

More generally, $\mathrm{E}[X^2] \geq (\mathrm{E}[X])^2$. Consider $Y = (X - \mathrm{E}[X])^2$. The RV $Y$ is nonnegative and hence its expectation must also be nonnegative:
$0 \leq \mathrm{E}[Y] = \mathrm{E}[(X - \mathrm{E}[X])^2]$
$= \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2]$
$= \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2$
$= \mathrm{E}[X^2] - (\mathrm{E}[X])^2$.
To obtain the penultimate line, use the linearity of expectations. To obtain the last line, use Lemma 2.2 to simplify $\mathrm{E}[2X\,\mathrm{E}[X]] = 2\,\mathrm{E}[X]\,\mathrm{E}[X]$.
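The square-area calculation above can be reproduced exactly (a small sketch using exact rational arithmetic):

```python
from fractions import Fraction

# X uniform on the integers {1, ..., 99}: compare (E[X])^2 with E[X^2].
p = Fraction(1, 99)
E_X = sum(v * p for v in range(1, 100))        # 50
E_X2 = sum(v * v * p for v in range(1, 100))   # 9950/3

print(E_X ** 2, E_X2)  # 2500 vs 9950/3 (about 3316.7)
```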
The fact that $\mathrm{E}[X^2] \geq (\mathrm{E}[X])^2$ is an example of Jensen's inequality. Jensen's inequality shows that, for any convex function $f$, we have $\mathrm{E}[f(X)] \geq f(\mathrm{E}[X])$.

Definition 2.4: A function $f$ is said to be convex if, for any $x_1, x_2$ and $0 \leq \lambda \leq 1$,
$f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)$.

Lemma 2.3: If $f$ is a twice differentiable function, then $f$ is convex if and only if $f''(x) \geq 0$.
Theorem 2.4 [Jensen's Inequality]: If $f$ is a convex function, then $\mathrm{E}[f(X)] \geq f(\mathrm{E}[X])$.

Proof: We prove the theorem assuming that $f$ has a Taylor expansion. Let $\mu = \mathrm{E}[X]$. By Taylor's theorem there is a value $c$ such that
$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(c)(x - \mu)^2}{2} \geq f(\mu) + f'(\mu)(x - \mu)$,
since $f''(c) \geq 0$ by convexity. Taking expectations and applying linearity of expectations and Lemma 2.2 yields:
$\mathrm{E}[f(X)] \geq \mathrm{E}[f(\mu) + f'(\mu)(X - \mu)] = f(\mu) + f'(\mu)(\mathrm{E}[X] - \mu) = f(\mu) = f(\mathrm{E}[X])$.

2.2. The Bernoulli and Binomial Random Variables

We run an experiment that succeeds with probability $p$ and fails with probability $1 - p$. Let $Y$ be a RV such that $Y = 1$ if the experiment succeeds, and $Y = 0$ otherwise. The variable $Y$ is called a Bernoulli or an indicator random variable. Note that, for a Bernoulli RV,
$\mathrm{E}[Y] = 1 \cdot p + 0 \cdot (1 - p) = p = \Pr(Y = 1)$.
If we, e.g., flip a fair coin and consider heads a success, then the expected value of the corresponding indicator RV is 1/2. Consider a sequence of $n$ independent coin flips. What is the distribution of the number of heads in the entire sequence? More generally, consider a sequence of $n$ independent experiments, each of which succeeds with probability $p$. If we let $X$ represent the number of successes in the $n$ experiments, then $X$ has a binomial distribution.

Definition 2.5: A binomial RV $X$ with parameters $n$ and $p$, denoted by $B(n, p)$, is defined by the following probability distribution on $j = 0, 1, 2, \dots, n$:
$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}$.
I.e., the binomial RV (BRV) equals $j$ when there are exactly $j$ successes and $n - j$ failures in $n$ independent experiments, each of which is successful with probability $p$.

Definition 2.5 ensures that the BRV is a valid probability function (Definition 1.2): by the binomial theorem,
$\sum_{j=0}^{n} \binom{n}{j} p^j (1 - p)^{n - j} = (p + (1 - p))^n = 1$.
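The binomial probabilities and their normalization can be verified exactly (a sketch; the parameters $n = 10$, $p = 3/10$ are arbitrary illustrative choices):

```python
from math import comb
from fractions import Fraction

# Binomial pmf Pr(X = j) = C(n, j) p^j (1-p)^(n-j), computed exactly;
# by the binomial theorem the probabilities sum to 1.
def binomial_pmf(n, p):
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

pmf = binomial_pmf(10, Fraction(3, 10))
print(sum(pmf))  # 1
```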
We want to gather data about the packets going through a router, e.g., to know the approximate fraction of packets from a certain source or of a certain type. We store a random subset or sample of the packets for later analysis. If each packet is stored with probability $p$ and $n$ packets go through the router each day, the number $X$ of sampled packets each day is a BRV with parameters $n$ and $p$. To know how much memory is necessary for such a sample, we determine the expectation of $X$.

If $X$ is a BRV with parameters $n$ and $p$, then $X$ is the number of successes in $n$ trials, where each trial is successful with probability $p$. Define a set of indicator RVs $X_1, \dots, X_n$, where $X_i = 1$ if the $i$th trial is successful and 0 otherwise. Clearly, $\mathrm{E}[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$, and so, by the linearity of expectations,
$\mathrm{E}[X] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np$.
2.3. Conditional Expectation

Definition 2.6:
$\mathrm{E}[Y \mid Z = z] = \sum_y y \Pr(Y = y \mid Z = z)$,
where the summation is over all $y$ in the range of $Y$.

The conditional expectation of a RV is, like $\mathrm{E}[Y]$, a weighted sum of the values it assumes. Now each value is weighted by the conditional probability that the variable assumes that value.

Suppose that we independently roll two standard six-sided dice. Let $X_1$ be the number that shows on the first die, $X_2$ the number on the second die, and $X$ the sum of the numbers on the two dice. Then
$\mathrm{E}[X \mid X_1 = 2] = \sum_x x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$.
As another example, consider $\mathrm{E}[X_1 \mid X = 5]$:
$\mathrm{E}[X_1 \mid X = 5] = \sum_x x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{\Pr((X_1 = x) \cap (X = 5))}{\Pr(X = 5)} = \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}$.

Lemma 2.5: For any RVs $Y$ and $Z$,
$\mathrm{E}[Y] = \sum_z \Pr(Z = z)\, \mathrm{E}[Y \mid Z = z]$,
where the sum is over all values $z$ in the range of $Z$ and all of the expectations exist.

Proof:
$\sum_z \Pr(Z = z)\, \mathrm{E}[Y \mid Z = z] = \sum_z \Pr(Z = z) \sum_y y \Pr(Y = y \mid Z = z)$
$= \sum_y \sum_z y \Pr(Y = y \mid Z = z) \Pr(Z = z)$
$= \sum_y \sum_z y \Pr((Y = y) \cap (Z = z))$
$= \sum_y y \Pr(Y = y) = \mathrm{E}[Y]$.
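The value $\mathrm{E}[X_1 \mid X = 5] = 5/2$ can be confirmed by brute-force enumeration (a sketch over the 36 equally likely outcomes):

```python
from fractions import Fraction

# Restrict the 36 equally likely rolls to those with sum 5, then
# average the first die over the conditioned outcomes.
rolls = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
given_sum_5 = [(d1, d2) for d1, d2 in rolls if d1 + d2 == 5]

E_cond = Fraction(sum(d1 for d1, _ in given_sum_5), len(given_sum_5))
print(E_cond)  # 5/2
```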
The linearity of expectations also extends to conditional expectations.

Lemma 2.6: For any finite collection of discrete RVs $X_1, X_2, \dots, X_n$ with finite expectations and for any RV $Y$,
$\mathrm{E}\bigl[\sum_{i=1}^{n} X_i \mid Y = y\bigr] = \sum_{i=1}^{n} \mathrm{E}[X_i \mid Y = y]$.

Confusingly, the term "conditional expectation" is also used to refer to the following RV.

Definition 2.7: The expression $\mathrm{E}[Y \mid Z]$ is a RV $f(Z)$ that takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$.

Note that $\mathrm{E}[Y \mid Z]$ is not a real value; it is actually a function of the RV $Z$. Hence $\mathrm{E}[Y \mid Z]$ is itself a function from the sample space to the real numbers and can therefore be thought of as a RV.
In the previous example of rolling two dice,
$\mathrm{E}[X \mid X_1] = \sum_{x = X_1 + 1}^{X_1 + 6} x \cdot \frac{1}{6} = X_1 + \frac{7}{2}$.
We see that $\mathrm{E}[X \mid X_1]$ is a RV whose value depends on $X_1$.

If $\mathrm{E}[Y \mid Z]$ is a RV, then it makes sense to consider its expectation $\mathrm{E}[\mathrm{E}[Y \mid Z]]$. We found that $\mathrm{E}[X \mid X_1] = X_1 + 7/2$. Thus,
$\mathrm{E}[\mathrm{E}[X \mid X_1]] = \mathrm{E}\bigl[X_1 + \frac{7}{2}\bigr] = \frac{7}{2} + \frac{7}{2} = 7 = \mathrm{E}[X]$.

More generally,

Theorem 2.7: $\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \mid Z]]$.

Proof: From Definition 2.7 we have $\mathrm{E}[Y \mid Z] = f(Z)$, where $f(Z)$ takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$. Hence
$\mathrm{E}[\mathrm{E}[Y \mid Z]] = \sum_z \mathrm{E}[Y \mid Z = z] \Pr(Z = z)$.
The right-hand side equals $\mathrm{E}[Y]$ by Lemma 2.5.
Consider a program that includes one call to a process $S$. Assume that each call to process $S$ recursively spawns new copies of the process $S$, where the number of new copies is a BRV with parameters $n$ and $p$. We assume that these random variables are independent for each call to $S$. What is the expected number of copies of the process $S$ generated by the program?

To analyze this recursive spawning process, we use generations. The initial process is in generation 0. Otherwise, we say that a process is in generation $i$ if it was spawned by another process in generation $i - 1$. Let $Y_i$ denote the number of processes in generation $i$. Since we know that $Y_0 = 1$, the number of processes in generation 1 has a binomial distribution. Thus,
$\mathrm{E}[Y_1] = np$.
Similarly, suppose we knew that the number of processes in generation $i - 1$ was $y_{i-1}$, so $Y_{i-1} = y_{i-1}$. Then
$\mathrm{E}[Y_i \mid Y_{i-1} = y_{i-1}] = np\,y_{i-1}$.
Applying Theorem 2.7, we can compute the expected size of the $i$th generation inductively. We have
$\mathrm{E}[Y_i] = \mathrm{E}[\mathrm{E}[Y_i \mid Y_{i-1}]] = \mathrm{E}[np\,Y_{i-1}] = np\,\mathrm{E}[Y_{i-1}]$.
By induction on $i$, and using the fact that $Y_0 = 1$, we then obtain
$\mathrm{E}[Y_i] = (np)^i$.

The expected total number of copies of process $S$ generated by the program is given by
$\mathrm{E}\bigl[\sum_{i \geq 0} Y_i\bigr] = \sum_{i \geq 0} \mathrm{E}[Y_i] = \sum_{i \geq 0} (np)^i$.
If $np \geq 1$ then the expectation is unbounded; if $np < 1$, the expectation is $1/(1 - np)$. The expected number of processes generated by the program is bounded if and only if the expected number of processes spawned by each process is less than 1. This is a simple example of a branching process, a probabilistic paradigm extensively studied in probability theory.
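The subcritical case $np < 1$ can be illustrated with a small Monte Carlo sketch (the parameters and the Binomial-by-counting sampler are illustrative choices, not part of the text):

```python
import random

# Each process spawns a Binomial(n, p) number of children; with
# np < 1 the expected total number of processes is 1 / (1 - np).
def total_processes(n, p, rng):
    total, generation = 1, 1
    while generation > 0:
        # one Binomial(n, p) draw per process in the current generation
        children = sum(
            sum(rng.random() < p for _ in range(n)) for _ in range(generation)
        )
        total += children
        generation = children
    return total

rng = random.Random(1)
n, p = 3, 0.2  # np = 0.6, so the expected total is 1 / 0.4 = 2.5
trials = 20000
avg = sum(total_processes(n, p, rng) for _ in range(trials)) / trials
print(round(avg, 2))  # close to 2.5
```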
2.4. The Geometric Distribution

Let us flip a coin until it lands on heads. What is the distribution of the number of flips? This is an example of a geometric distribution. It arises when we perform a sequence of independent trials until the first success, where each trial succeeds with probability $p$.

Definition 2.8: A geometric RV $X$ with parameter $p$ is given by the following probability distribution on $n = 1, 2, \dots$:
$\Pr(X = n) = (1 - p)^{n - 1} p$.

Geometric RVs are said to be memoryless because the probability that you will reach your first success $n$ trials from now is independent of the number of failures you have experienced. Informally, one can ignore past failures: they do not change the distribution of the number of future trials until the first success. Formally, we have the following

Lemma 2.8: For a geometric RV $X$ with parameter $p$ and for $n > 0$,
$\Pr(X = n + k \mid X > k) = \Pr(X = n)$.
When a RV takes values in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \dots\}$, there is an alternative formula for calculating its expectation.

Lemma 2.9: Let $X$ be a discrete RV that takes on only nonnegative integer values. Then
$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \geq i)$.

Proof:
$\sum_{i=1}^{\infty} \Pr(X \geq i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X]$.
The interchange of the order of summation is justified because all terms are nonnegative.

For a geometric RV $X$ with parameter $p$,
$\Pr(X \geq i) = \sum_{n=i}^{\infty} (1 - p)^{n - 1} p = (1 - p)^{i - 1}$.
Hence
$\mathrm{E}[X] = \sum_{i=1}^{\infty} (1 - p)^{i - 1} = \frac{1}{1 - (1 - p)} = \frac{1}{p}$.
Thus, for a fair coin where $p = 1/2$, on average it takes two flips to see the first heads.
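Both routes to $\mathrm{E}[X] = 1/p$ can be checked numerically (a sketch with exact fractions; the infinite series are truncated at a cutoff large enough that the remaining tail is negligible):

```python
from fractions import Fraction

# E[X] for a geometric RV with p = 1/2, two ways:
# via Lemma 2.9 (summing tail probabilities Pr(X >= i)) and via the pmf.
p = Fraction(1, 2)
cutoff = 120

tail_sum = sum((1 - p) ** (i - 1) for i in range(1, cutoff))         # sum of Pr(X >= i)
pmf_sum = sum(i * (1 - p) ** (i - 1) * p for i in range(1, cutoff))  # sum of i * Pr(X = i)

print(float(tail_sum), float(pmf_sum))  # both approximately 1/p = 2.0
```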
We can also find the expectation of a geometric RV $X$ with parameter $p$ using conditional expectations and the memoryless property of geometric RVs. Recall that $X$ corresponds to the number of flips until the first heads, given that each flip is heads with probability $p$. Let $Y = 0$ if the first flip is tails and $Y = 1$ if the first flip is heads. By the identity from Lemma 2.5,
$\mathrm{E}[X] = \Pr(Y = 0)\,\mathrm{E}[X \mid Y = 0] + \Pr(Y = 1)\,\mathrm{E}[X \mid Y = 1] = (1 - p)\,\mathrm{E}[X \mid Y = 0] + p\,\mathrm{E}[X \mid Y = 1]$.

If $Y = 1$ then $X = 1$, so $\mathrm{E}[X \mid Y = 1] = 1$. If $Y = 0$, then $X > 1$. In this case, let the number of remaining flips (after the first flip, until the first heads) be $Z$. Then, by the linearity of expectations,
$\mathrm{E}[X \mid Y = 0] = \mathrm{E}[Z + 1] = \mathrm{E}[Z] + 1$.
By the memoryless property of geometric RVs, $Z$ is also a geometric RV with parameter $p$. Hence $\mathrm{E}[Z] = \mathrm{E}[X]$, since they both have the same distribution. We therefore have
$\mathrm{E}[X] = (1 - p)(\mathrm{E}[X] + 1) + p = (1 - p)\,\mathrm{E}[X] + 1$,
which yields $\mathrm{E}[X] = 1/p$.
2.4.1. Example: Coupon Collector's Problem

Each box of cereal contains one of $n$ different coupons. Once you obtain one of every type of coupon, you can send in for a prize. The coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, and you do not collaborate with others to collect coupons. How many boxes of cereal must you buy before you obtain at least one of every type of coupon?

Let $X$ be the number of boxes bought until at least one of every type of coupon is obtained. If $X_i$ is the number of boxes bought while you had exactly $i - 1$ different coupons, then clearly
$X = \sum_{i=1}^{n} X_i$.
The advantage of breaking $X$ into a sum of random variables $X_i$, $i = 1, \dots, n$, is that each $X_i$ is a geometric RV. When exactly $i - 1$ coupons have been found, the probability of obtaining a new coupon is
$p_i = 1 - \frac{i - 1}{n}$.
Hence, $X_i$ is a geometric RV with parameter $p_i$:
$\mathrm{E}[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}$.
Using the linearity of expectations, we have that
$\mathrm{E}[X] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}$.

The summation $H(n) = \sum_{i=1}^{n} \frac{1}{i}$ is known as the harmonic number.

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} \frac{1}{i}$ satisfies $H(n) = \ln n + \Theta(1)$.

Thus, for the coupon collector's problem, the expected number of random coupons required to obtain all $n$ coupons is $n \ln n + \Theta(n)$.
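The formula $\mathrm{E}[X] = n H(n)$ can be compared against simulation (a sketch; $n = 20$ and the trial count are illustrative choices):

```python
import random

# Buy boxes until all n coupon types are seen; compare the empirical
# mean with the exact value n * H(n).
def boxes_needed(n, rng):
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # coupon chosen uniformly at random
        boxes += 1
    return boxes

n = 20
H_n = sum(1 / i for i in range(1, n + 1))
rng = random.Random(7)
trials = 5000
avg = sum(boxes_needed(n, rng) for _ in range(trials)) / trials

print(round(avg, 1), round(n * H_n, 1))  # empirical mean vs. n * H(n) = 72.0
```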
Given the first and second moments, one can compute the variance and standard deviation of the RV. Intuitively, the variance and standard deviation offer a measure of how far the RV is likely to be from its expectation.

Definition 3.2: The variance of a RV $X$ is
$\mathrm{Var}[X] = \mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$.
The standard deviation of a RV $X$ is
$\sigma[X] = \sqrt{\mathrm{Var}[X]}$.

The two forms of the variance in the definition are equivalent, as is easily seen by using the linearity of expectations. Keeping in mind that $\mathrm{E}[X]$ is a constant, we have
$\mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2] = \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2 = \mathrm{E}[X^2] - (\mathrm{E}[X])^2$.
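The equivalence of the two forms can be checked exactly on a concrete distribution (a sketch using the sum of two fair dice, for which $\mathrm{E}[X] = 7$):

```python
from fractions import Fraction

# Compute Var[X] both as E[(X - E[X])^2] and as E[X^2] - (E[X])^2
# for the sum X of two fair dice; the two forms must agree.
probs = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        probs[d1 + d2] = probs.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

E_X = sum(x * p for x, p in probs.items())
var_centered = sum((x - E_X) ** 2 * p for x, p in probs.items())
var_moments = sum(x * x * p for x, p in probs.items()) - E_X ** 2

print(var_centered, var_moments)  # 35/6 35/6
```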