EECS 70 Discrete Mathematics and Probability Theory
Spring 2014 Anant Sahai Note 13

Introduction

At this point, we have seen enough examples that it is worth taking stock of our model of probability and many of the key definitions. We are going to formalize some tools to deal with combinations of events.

Probability Recap

The most basic thing is the sample space $\Omega$, representing all the distinct possibilities of what the random experiment could yield. Doing the random experiment results in exactly one outcome $\omega \in \Omega$ being selected by nature. (And nature here is non-adversarial to us: it is not out to cause us trouble on purpose.)

$\Omega$ itself might have some internal structure to it. The most common case is that $\Omega$ consists of tuples (lists), and $\Omega$ itself can be viewed as a Cartesian product $\Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$, where sometimes the individual $\Omega_i$ can be thought of as sub-experiments that are all done simultaneously as a part of the larger experiment.

However, we are interested not just in individual outcomes, but in sets of possible outcomes. Sets of outcomes are called events, and formally, these must be subsets of $\Omega$. The null subset is an allowable event, as is the entire set $\Omega$ itself.

Probability is a function on events. It obeys certain natural properties, which are sometimes called the axioms of probability.

If $A$ is an event (subset of $\Omega$), $\Pr[A] \ge 0$. This property is called non-negativity.

If $A$ is an event (subset of $\Omega$), $\Pr[A] \le 1$, with $\Pr[\Omega] = 1$. This property is called normalization.

If $A$ and $B$ are events, and the events are disjoint (i.e., $A \cap B = \emptyset$), then $\Pr[A \cup B] = \Pr[A] + \Pr[B]$. This property is called additivity. By induction, it can easily be extended to what is called finite additivity: if $A_i$ for $i = 1, 2, \ldots, n$ are events that are all disjoint (i.e., for all $i \ne j$, $A_i \cap A_j = \emptyset$), then
\[ \Pr\Big[\bigcup_{i=1}^{n} A_i\Big] = \sum_{i=1}^{n} \Pr[A_i]. \]
This is complemented, for technical reasons, by a second additivity axiom that deals with countably infinite collections of events: if $A_i$ for $i = 1, 2, \ldots$ are events that are all disjoint (i.e., for all $i \ne j$, $A_i \cap A_j = \emptyset$), then
\[ \Pr\Big[\bigcup_{i=1}^{\infty} A_i\Big] = \sum_{i=1}^{\infty} \Pr[A_i]. \]
For the purposes of EECS 70, these two forms of additivity can just be viewed together. The only important thing is that additivity requires you to be able to list all the events in question in order, one at a time. This will only become an important restriction later when we consider continuous probability.

These properties alone give rise to various useful properties that are largely inherited from set theory, and we will talk about them in the next section.
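The recap above can be made concrete on a small finite sample space. The following is a minimal sketch, assuming a uniform probability on two rolls of a fair die; the names `omega`, `pr`, `first_is_six`, and `sum_is_three` are illustrative choices, not notation from these notes.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a fair die, built as a Cartesian product.
omega = set(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a subset of omega), assuming uniform outcomes."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))

# Two example events (hypothetical names, chosen for illustration).
first_is_six = {w for w in omega if w[0] == 6}
sum_is_three = {w for w in omega if w[0] + w[1] == 3}

# Non-negativity and normalization on the null event and on omega itself.
assert pr(set()) == 0 and pr(omega) == 1

# Additivity: the two events are disjoint, so their probabilities add.
assert first_is_six.isdisjoint(sum_is_three)
assert pr(first_is_six | sum_is_three) == pr(first_is_six) + pr(sum_is_three)
```

Using `Fraction` keeps every probability exact, so the equalities are checked without floating-point tolerance.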
Nontrivial combinations of events

In most applications of probability in EECS, we are interested in things like $\Pr[\bigcup_{i=1}^{n} A_i]$ and $\Pr[\bigcap_{i=1}^{n} A_i]$, where the $A_i$ are simple events (i.e., we know, or can easily compute, the $\Pr[A_i]$). The intersection $\bigcap_i A_i$ corresponds to the logical AND of the events $A_i$, while the union $\bigcup_i A_i$ corresponds to their logical OR. As an example, if $A_i$ denotes the event that a failure of type $i$ happens in a certain system, then $\bigcup_i A_i$ is the event that the system fails.

In general, computing the probabilities of such combinations can be very difficult. In this section, we discuss some situations where it can be done. Let's start with independent events, for which intersections are quite simple to compute.

Independent Events

Definition 13.1 (independence): Two events $A$, $B$ in the same probability space are independent if $\Pr[A \cap B] = \Pr[A] \times \Pr[B]$.

One intuition behind this definition is the following. Suppose that $\Pr[B] > 0$. Then we have
\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[A] \times \Pr[B]}{\Pr[B]} = \Pr[A]. \]
Thus independence has the natural meaning that the probability of $A$ is not affected by whether or not $B$ occurs. (By a symmetrical argument, we also have $\Pr[B \mid A] = \Pr[B]$ provided $\Pr[A] > 0$.) For events $A$, $B$ such that $\Pr[B] > 0$, the condition $\Pr[A \mid B] = \Pr[A]$ is actually equivalent to the definition of independence.

A deeper intuition is that independence is the way to capture the essence (as far as inference goes) of the relationship that two completely unrelated subexperiments have to each other. Knowing something about one tells you nothing about the other.

In fact, several of our previously mentioned random experiments consist of independent events. For example, if we flip a coin twice, the event of obtaining heads in the first trial is independent of the event of obtaining heads in the second trial. The same applies for two rolls of a die; the outcomes of each trial are independent.

The above definition generalizes to any finite set of events:

Definition 13.2 (mutual independence): Events $A_1, \ldots, A_n$ are mutually independent if for every subset $I \subseteq \{1, \ldots, n\}$,
\[ \Pr\Big[\bigcap_{i \in I} A_i\Big] = \prod_{i \in I} \Pr[A_i]. \]
Note that we need this property to hold for every subset $I$.

For mutually independent events $A_1, \ldots, A_n$, it is not hard to check from the definition of conditional probability that, for any $1 \le i \le n$ and any subset $I \subseteq \{1, \ldots, n\} \setminus \{i\}$, we have
\[ \Pr\Big[A_i \,\Big|\, \bigcap_{j \in I} A_j\Big] = \Pr[A_i]. \]

Note that the independence of every pair of events (so-called pairwise independence) does not necessarily imply mutual independence. For example, it is possible to construct three events $A$, $B$, $C$ such that each pair is independent but the triple $A$, $B$, $C$ is not mutually independent.
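The gap between pairwise and mutual independence can be checked mechanically on a small sample space. The sketch below is one possible brute-force check, assuming a uniform probability on two fair coin tosses; the helper names `pairwise_independent` and `mutually_independent` are hypothetical and not from these notes. It tests the product condition of the mutual-independence definition on every subset of size at least 2 (subsets of size 0 and 1 satisfy it trivially).

```python
from fractions import Fraction
from itertools import combinations, product

# Two tosses of a fair coin; all four outcomes are assumed equally likely.
omega = set(product("HT", repeat=2))

def pr(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

def pairwise_independent(events):
    """True if every pair of events satisfies Pr[A and B] = Pr[A] * Pr[B]."""
    return all(pr(a & b) == pr(a) * pr(b) for a, b in combinations(events, 2))

def mutually_independent(events):
    """True if every sub-collection of two or more events factorizes."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            expected = Fraction(1)
            for e in subset:
                expected *= pr(e)
            if pr(set.intersection(*subset)) != expected:
                return False
    return True

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[0] == w[1]}  # both tosses agree

print(pairwise_independent([A, B, C]))  # True
print(mutually_independent([A, B, C]))  # False
```

The failure comes from the triple: the intersection of all three events is the single outcome HH with probability 1/4, while the product of the three probabilities is 1/8.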
Pairwise Independence Example

Suppose you toss a fair coin twice and let $A$ be the event that the first flip is heads (H) and $B$ be the event that the second flip is heads. Now let $C$ be the event that both flips are the same (i.e., both H or both T). Of course $A$ and $B$ are independent. What is more interesting is that so are $A$ and $C$: given that the first toss came up H, there is still an even chance that the second flip is the same as the first. Another way of saying this is that $\Pr[A \cap C] = \Pr[A]\Pr[C] = 1/4$, since $A \cap C$ is the event that the first flip is H and the second is also H. By the same reasoning $B$ and $C$ are also independent.

The fact that $A$ should be independent of $C$ is not intuitively obvious at first glance. This is the power of the definition of independence. It tells us something nonobvious.

On the other hand, $A$, $B$ and $C$ are not mutually independent. For example, if we are given that $A$ and $B$ occurred, then the probability that $C$ occurs is 1, rather than $\Pr[C] = 1/2$. So even though $A$, $B$ and $C$ are not mutually independent, every pair of them is independent. In other words, $A$, $B$ and $C$ are pairwise independent but not mutually independent.

Intersections of events

Computing intersections of independent events is easy; it follows from the definition. We simply multiply the probabilities of each event. How do we compute intersections for events which may not be independent? From the definition of conditional probability, we immediately have the following product rule (sometimes also called the chain rule) for computing the probability of an intersection of events.

Theorem 13.1: [Product Rule] For any events $A$, $B$, we have
\[ \Pr[A \cap B] = \Pr[A] \Pr[B \mid A]. \]
More generally, for any events $A_1, \ldots, A_n$,
\[ \Pr\Big[\bigcap_{i=1}^{n} A_i\Big] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \times \cdots \times \Pr\Big[A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\Big]. \]

Proof: The first assertion follows directly from the definition of $\Pr[B \mid A]$ (and is in fact a special case of the second assertion with $n = 2$).

To prove the second assertion, we will use induction on $n$ (the number of events). The base case is $n = 1$, and corresponds to the statement that $\Pr[A] = \Pr[A]$, which is trivially true.
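For the inductive step, let $n > 1$ and assume (the inductive hypothesis) that
\[ \Pr\Big[\bigcap_{i=1}^{n-1} A_i\Big] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Big[A_{n-1} \,\Big|\, \bigcap_{i=1}^{n-2} A_i\Big]. \]
Now we can apply the definition of conditional probability to the two events $A_n$ and $\bigcap_{i=1}^{n-1} A_i$ to deduce that
\begin{align*}
\Pr\Big[\bigcap_{i=1}^{n} A_i\Big] &= \Pr\Big[A_n \cap \Big(\bigcap_{i=1}^{n-1} A_i\Big)\Big] = \Pr\Big[A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\Big] \times \Pr\Big[\bigcap_{i=1}^{n-1} A_i\Big] \\
&= \Pr\Big[A_n \,\Big|\, \bigcap_{i=1}^{n-1} A_i\Big] \times \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Big[A_{n-1} \,\Big|\, \bigcap_{i=1}^{n-2} A_i\Big],
\end{align*}
where in the last line we have used the inductive hypothesis. This completes the proof by induction.

The product rule is particularly useful when we can view our sample space as a sequence of choices. The next few examples illustrate this point.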
Examples

Coin tosses. Toss a fair coin three times. Let $A$ be the event that all three tosses are heads. Then $A = A_1 \cap A_2 \cap A_3$, where $A_i$ is the event that the $i$th toss comes up heads. We have
\begin{align*}
\Pr[A] &= \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \\
&= \Pr[A_1] \times \Pr[A_2] \times \Pr[A_3] \\
&= \tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{8}.
\end{align*}
The second line here follows from the fact that the tosses are mutually independent. Of course, we already know that $\Pr[A] = \frac{1}{8}$ from our definition of the probability space in an earlier lecture note. Another way of looking at this calculation is that it justifies our definition of the probability space, and shows that it was consistent with assuming that the coin flips are mutually independent.

If the coin is biased with heads probability $p$, we get, again using independence,
\[ \Pr[A] = \Pr[A_1] \times \Pr[A_2] \times \Pr[A_3] = p^3. \]
And more generally, the probability of any sequence of $n$ tosses containing $r$ heads and $n - r$ tails is $p^r (1-p)^{n-r}$. This is in fact the reason we defined the probability space this way in the previous lecture note: we defined the sample point probabilities so that the coin tosses would behave independently.

Monty Hall. Recall the Monty Hall problem from an earlier lecture: there are three doors and the probability that the prize is behind any given door is $\frac{1}{3}$. There are goats behind the other two doors. The contestant picks a door randomly, and the host opens one of the other two doors, revealing a goat. How do we calculate intersections in this setting? For example, what is the probability that the contestant chooses door 1, the prize is behind door 2, and the host chooses door 3?

Let $A_1$ be the event that the contestant chooses door 1, let $A_2$ be the event that the prize is behind door 2, and let $A_3$ be the event that the host chooses door 3. We would like to compute $\Pr[A_1 \cap A_2 \cap A_3]$. By the product rule:
\[ \Pr[A_1 \cap A_2 \cap A_3] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2]. \]
The probability of $A_1$ is $\frac{1}{3}$, since the contestant is choosing the door at random. The probability of $A_2$ given $A_1$ is still $\frac{1}{3}$ since they are independent.
The probability of the host choosing door 3 given events $A_1$ and $A_2$ is 1; the host cannot choose door 1, since the contestant has already chosen it, and the host cannot choose door 2, since the host must reveal a goat (and not the prize). Therefore,
\[ \Pr[A_1 \cap A_2 \cap A_3] = \tfrac{1}{3} \times \tfrac{1}{3} \times 1 = \tfrac{1}{9}. \]
Observe that we did need conditional probability in this setting; had we simply multiplied the probabilities of each event, we would have obtained $\frac{1}{27}$, since the probability of $A_3$ is also $\frac{1}{3}$ (can you figure out why?).

What if we changed the situation, and instead asked for the probability that the contestant chooses door 1, the prize is behind door 1, and the host chooses door 2? We can use the same technique as above, but our final answer will be different. This is left as an exercise.
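The Monty Hall intersection computed above can be double-checked by enumerating the whole sample space. This is only a sketch under stated assumptions: the contestant's pick and the prize location are independent and uniform (as in the notes), and when the contestant happens to pick the prize door, the host is assumed to open one of the two remaining doors with equal probability (the notes do not spell out this tie-breaking rule). The names `prob`, `pr`, `a1`, `a2`, `a3` are illustrative.

```python
from fractions import Fraction

doors = (1, 2, 3)

# Build the sample space by the product rule:
# Pr[(c, p, h)] = Pr[contestant = c] * Pr[prize = p | c] * Pr[host = h | c, p].
prob = {}
for c in doors:
    for p in doors:
        # The host may open neither the contestant's door nor the prize door.
        options = [h for h in doors if h != c and h != p]
        for h in options:
            # Assumption: the host picks uniformly among the allowed doors.
            prob[(c, p, h)] = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(options))

def pr(predicate):
    """Total probability of all sample points satisfying the predicate."""
    return sum((q for w, q in prob.items() if predicate(w)), Fraction(0))

def a1(w):  # contestant chooses door 1
    return w[0] == 1

def a2(w):  # prize is behind door 2
    return w[1] == 2

def a3(w):  # host opens door 3
    return w[2] == 3

joint = pr(lambda w: a1(w) and a2(w) and a3(w))
naive = pr(a1) * pr(a2) * pr(a3)

print(joint)  # 1/9
print(naive)  # 1/27
```

The mismatch between `joint` and `naive` reflects the fact that $A_3$ is not independent of $A_1$ and $A_2$, even though each of the three events has probability 1/3 on its own in this model.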
Another useful exercise is to use conditional probability to analyze the case when the two goats have different genders and Monty is committed to always reveal the location of the female goat. (This could be the door that you yourself had chosen.) In this case, what happens when Monty reveals that the female goat is behind one of the other two doors? What is the conditional probability of winning for switching vs. not switching?

Poker Hands. Let's use the product rule to compute the probability of a flush in a different way. This is equal to $4 \times \Pr[A]$, where $A$ is the event that we get a Hearts flush. Intuitively, this should be clear since there are 4 suits; we'll see why this is formally true in the next section. We can write $A = \bigcap_{i=1}^{5} A_i$, where $A_i$ is the event that the $i$th card we pick is a Heart. So we have
\[ \Pr[A] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Big[A_5 \,\Big|\, \bigcap_{i=1}^{4} A_i\Big]. \]
Clearly $\Pr[A_1] = \frac{13}{52} = \frac{1}{4}$. What about $\Pr[A_2 \mid A_1]$? Well, since we are conditioning on $A_1$ (the first card is a Heart), there are only 51 remaining possibilities for the second card, 12 of which are Hearts. So $\Pr[A_2 \mid A_1] = \frac{12}{51}$. Similarly, $\Pr[A_3 \mid A_1 \cap A_2] = \frac{11}{50}$, and so on. So we get
\[ 4 \times \Pr[A] = 4 \times \tfrac{13}{52} \times \tfrac{12}{51} \times \tfrac{11}{50} \times \tfrac{10}{49} \times \tfrac{9}{48}, \]
which is exactly the same fraction we computed in the previous lecture note.

So now we have two methods of computing probabilities in many of our sample spaces. It is useful to keep these different methods around, both as a check on your answers and because in some cases one of the methods is easier to use than the other.

Unions of events

You are in Las Vegas, and you spy a new game with the following rules. You pick a number between 1 and 6. Then three dice are thrown. You win if and only if your number comes up on at least one of the dice.

The casino claims that your odds of winning are 50%, using the following argument. Let $A$ be the event that you win. We can write $A = A_1 \cup A_2 \cup A_3$, where $A_i$ is the event that your number comes up on die $i$. Clearly $\Pr[A_i] = \frac{1}{6}$ for each $i$. Therefore,
\[ \Pr[A] = \Pr[A_1 \cup A_2 \cup A_3] = \Pr[A_1] + \Pr[A_2] + \Pr[A_3] = 3 \times \tfrac{1}{6} = \tfrac{1}{2}. \]
Is this calculation correct?
Well, suppose instead that the casino rolled six dice, and again you win iff your number comes up at least once. Then the analogous calculation would say that you win with probability $6 \times \frac{1}{6} = 1$, i.e., certainly! The situation becomes even more ridiculous when the number of dice gets bigger than 6.

The problem is that the events $A_i$ are not disjoint: i.e., there are some sample points that lie in more than one of the $A_i$. (We could get really lucky and our number could come up on two of the dice, or all three.) So if we add up the $\Pr[A_i]$ we are counting some sample points more than once.

Fortunately, there is a formula for this, known as the Principle of Inclusion/Exclusion:

Theorem 13.2: [Inclusion/Exclusion] For events $A_1, \ldots, A_n$ in some probability space, we have
\[ \Pr\Big[\bigcup_{i=1}^{n} A_i\Big] = \sum_{i=1}^{n} \Pr[A_i] - \sum_{\{i,j\}} \Pr[A_i \cap A_j] + \sum_{\{i,j,k\}} \Pr[A_i \cap A_j \cap A_k] - \cdots \pm \Pr\Big[\bigcap_{i=1}^{n} A_i\Big]. \]
[In the above summations, $\{i, j\}$ denotes all unordered pairs with $i \ne j$, $\{i, j, k\}$ denotes all unordered triples of distinct elements, and so on.]

I.e., to compute $\Pr[\bigcup_i A_i]$, we start by summing the event probabilities $\Pr[A_i]$, then we subtract the probabilities of all pairwise intersections, then we add back in the probabilities of all three-way intersections, and so on.

We won't prove this formula here; but you might like to verify it for the special case $n = 3$ by drawing a Venn diagram and checking that every sample point in $A_1 \cup A_2 \cup A_3$ is counted exactly once by the formula. You might also like to (hint: do this) prove the formula for general $n$ by induction (in similar fashion to the proof of the Product Rule above).

Taking the formula on faith, what is the probability we get lucky in the new game in Vegas?
\[ \Pr[A_1 \cup A_2 \cup A_3] = \Pr[A_1] + \Pr[A_2] + \Pr[A_3] - \Pr[A_1 \cap A_2] - \Pr[A_1 \cap A_3] - \Pr[A_2 \cap A_3] + \Pr[A_1 \cap A_2 \cap A_3]. \]
Now the nice thing here is that the events $A_i$ are mutually independent (the outcome of any die does not depend on that of the others), so $\Pr[A_i \cap A_j] = \Pr[A_i]\Pr[A_j] = (\frac{1}{6})^2 = \frac{1}{36}$, and similarly $\Pr[A_1 \cap A_2 \cap A_3] = (\frac{1}{6})^3 = \frac{1}{216}$. So we get
\[ \Pr[A_1 \cup A_2 \cup A_3] = \Big(3 \times \tfrac{1}{6}\Big) - \Big(3 \times \tfrac{1}{36}\Big) + \tfrac{1}{216} = \tfrac{91}{216} \approx 0.42. \]
So your odds are quite a bit worse than the casino is claiming!

When $n$ is large (i.e., we are interested in the union of many events), the Inclusion/Exclusion formula is essentially useless because it involves computing the probability of the intersection of every non-empty subset of the events: and there are $2^n - 1$ of these! Sometimes we can just look at the first few terms of it and forget the rest: note that successive terms actually give us an overestimate and then an underestimate of the answer, and these estimates both get better as we go along.

However, in many situations we can get a long way by just looking at the first term:

1. Disjoint events. If the events $A_i$ are all disjoint (i.e., no pair of them contain a common sample point; such events are also called mutually exclusive), then
\[ \Pr\Big[\bigcup_{i=1}^{n} A_i\Big] = \sum_{i=1}^{n} \Pr[A_i]. \]
[Note that we have already used this fact several times in our examples, e.g., in claiming that the probability of a flush is four times the probability of a Hearts flush; clearly flushes in different suits are disjoint events.]

2. Union bound. Always, it is the case that
\[ \Pr\Big[\bigcup_{i=1}^{n} A_i\Big] \le \sum_{i=1}^{n} \Pr[A_i]. \]
This merely says that adding up the $\Pr[A_i]$ can only overestimate the probability of the union. Crude as it may seem, in the next lecture note we'll see how to use the union bound effectively in a core EECS example.
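The Vegas dice game gives a convenient way to compare the exact union probability, the Inclusion/Exclusion sum, and the union bound. The following is a minimal sketch by brute-force enumeration, assuming three fair dice and a uniform sample space of 216 outcomes; `my_number` is a hypothetical pick (by symmetry any number from 1 to 6 gives the same result), and `inclusion_exclusion` is an illustrative helper, not something defined in these notes.

```python
from fractions import Fraction
from itertools import combinations, product

# All outcomes of three fair dice, assumed equally likely.
omega = set(product(range(1, 7), repeat=3))
my_number = 1  # hypothetical pick; any value in 1..6 behaves the same

def pr(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

# events[i] is the event that my_number comes up on die i.
events = [{w for w in omega if w[i] == my_number} for i in range(3)]

def inclusion_exclusion(events):
    """Alternating sum over all non-empty sub-collections of the events."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(events, k):
            total += sign * pr(set.intersection(*subset))
    return total

exact = pr(set.union(*events))            # direct count of winning outcomes
by_formula = inclusion_exclusion(events)  # Inclusion/Exclusion sum
union_bound = sum(pr(e) for e in events)  # the casino's (over)estimate

print(exact, by_formula, union_bound)  # 91/216 91/216 1/2
```

The loop in `inclusion_exclusion` visits every non-empty sub-collection, so its cost grows like $2^n - 1$ intersections, which is the reason the formula becomes impractical for large $n$ while the union bound needs only $n$ terms.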