3 Introduction to Probability

Given a fair coin, what can we expect the frequency of tails to be in a sequence of 10 coin tosses? Tossing a coin is an example of a chance experiment, namely a process which results in one and only one outcome from a set of mutually exclusive outcomes, where the outcome cannot be predicted with certainty. A chance experiment can be real or conceptual. Other examples of a chance experiment are: throwing a fair die 10 times and recording the number of times a prime number (namely 2, 3 or 5) is obtained, selecting 5 students at random and recording whether they are male or female, or randomly drawing a sample of voters from the U.S. population.

3.1 SAMPLE SPACES AND EVENTS

The most basic outcomes of a chance experiment are called elementary outcomes or sample points. Any theory involves idealizations, and our first idealization concerns the elementary outcomes of an experiment. For example, when a coin is tossed, it does not necessarily fall head (H) or tail (T), for it can stand on its edge or roll away. Still we agree that H and T are the only elementary outcomes. The sample space is the set of all elementary outcomes of a chance experiment. An outcome that can be decomposed into a set of elementary outcomes is called an event.

The simplest kind of sample spaces are those that are finite, that is, consist of only a finite number of points. If the number of points is small, these spaces are easy to visualize.

Example 3.1 Consider the chance experiment of tossing 3 coins or, equivalently, tossing the same coin 3 times. The sample space of this experiment is easily constructed by noticing that the first coin toss has two possible outcomes, H and T. Given the result of the first coin toss, the second also has H and T as possible outcomes, and likewise for the third. The outcome tree of this experiment and its sample points are listed in Table 4. Taken together, these sample points comprise the sample space. The event "at least 2 heads" consists of the sample points

HHH, HHT, HTH, THH. □
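Before turning to Table 4, note that this enumeration can also be done mechanically. The following minimal sketch (an added illustration, not part of the original text; it assumes a standard Python environment) builds all $2^3$ outcomes with itertools.product and extracts the event "at least 2 heads".

```python
from itertools import product

# Build the sample space of tossing a coin 3 times: all 2**3 = 8 outcomes.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]
print(sample_space)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# The event "at least 2 heads" is the subset of outcomes with 2 or more H's.
at_least_2_heads = [e for e in sample_space if e.count("H") >= 2]
print(at_least_2_heads)  # ['HHH', 'HHT', 'HTH', 'THH']
```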
Table 4 Outcome tree and sample space of the chance experiment of tossing 3 coins.

    First toss   Second toss   Third toss   Sample point
    H            H             H            HHH
    H            H             T            HHT
    H            T             H            HTH
    H            T             T            HTT
    T            H             H            THH
    T            H             T            THT
    T            T             H            TTH
    T            T             T            TTT

Many important sample spaces are not finite. Some of them contain countably many points, and some may even contain uncountably many points.

Example 3.2 Consider the chance experiment of tossing a coin until a head turns up. The points of this sample space are

H, TH, TTH, TTTH, TTTTH, ...

This sample space contains countably many points. □

Example 3.3 Consider the chance experiment of picking a real number from the interval $(0, 1)$. This sample space contains uncountably many points. □
3.2 RELATIONS AMONG EVENTS

Let $S$ be a sample space, $e$ an elementary outcome and $E$ an event, that is, a set of elementary outcomes. Because the notions of elementary outcome and event are the same as those of point and point set in set theory, standard concepts and results from set theory also apply to probability theory. Thus, $\emptyset$ denotes the impossible event, that is, the event that contains no sample point. Given an event $E$, $E^c$ denotes the complement of $E$, that is, the event consisting of all points of $S$ that are not contained in $E$. Clearly, $S^c = \emptyset$ and $\emptyset^c = S$.

Given two events $A$ and $B$, we say that $A$ is contained in $B$, written $A \subseteq B$, if all points in $A$ are also in $B$. In the language of probability, we say that "$B$ occurs whenever $A$ occurs". Clearly, for any event $E$, we have $\emptyset \subseteq E$ and $E \subseteq S$. We say that $A$ and $B$ are equal, written $A = B$, if $A \subseteq B$ and $B \subseteq A$. We say that $A$ is strictly contained in $B$, written $A \subset B$, if $A \subseteq B$ but $A$ is not equal to $B$.

Given two events $A$ and $B$, the event $A \cup B$ (called the union of $A$ and $B$) corresponds to the occurrence of either $A$ or $B$, that is, it consists of all sample points that are either in $A$ or in $B$, or in both. Clearly,
$$A \cup B = B \cup A, \qquad A \subseteq (A \cup B), \qquad B \subseteq (A \cup B).$$
Given any event $E$, we also have
$$E \cup E^c = S, \qquad E \cup S = S, \qquad E \cup \emptyset = E. \tag{3.1}$$

Given two events $A$ and $B$, the event $A \cap B$ (called the intersection of $A$ and $B$) corresponds to the occurrence of both $A$ and $B$, that is, it consists of all sample points that are in both $A$ and $B$. When $A \cap B = \emptyset$, we say that the events $A$ and $B$ are mutually exclusive, that is, they cannot occur at once. Clearly,
$$A \cap B = B \cap A, \qquad (A \cap B) \subseteq A, \qquad (A \cap B) \subseteq B.$$
Further,
$$(A \cap B) \subseteq (A \cup B).$$
Given any event $E$, we also have
$$E \cap E^c = \emptyset, \qquad E \cap S = E, \qquad E \cap \emptyset = \emptyset. \tag{3.2}$$

In fact, the relationship between (3.1) and (3.2) is a special case of the following results, known as de Morgan's laws. Given two events $A$ and $B$,
$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c.$$
De Morgan's laws show that complementation, union and intersection are not independent operations.

Given two events $A$ and $B$, the event $E = A - B$ (called the difference of $A$ and $B$) corresponds to all sample points in $A$ that are not in $B$. Clearly, $A - B = A \cap B^c$. Notice that $A - B$ and $B - A$ are different events, that $(A - B) \cap (B - A) = \emptyset$ and that $(A \cap B) \cup (A - B) = A$. All of these relations among events are conveniently visualized using Venn diagrams.
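These set relations translate directly into code. The sketch below (an added illustration, not from the original text) uses Python's built-in set type to check de Morgan's laws and the difference identity on the 3-coin sample space.

```python
# Sample space of 3 coin tosses and two events, represented as Python sets.
S = {"HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"}
A = {e for e in S if e.count("H") >= 2}   # "at least 2 heads"
B = {e for e in S if e[0] == "H"}         # "H on the first toss"

complement = lambda E: S - E  # E^c consists of the points of S not in E

# De Morgan's laws: (A ∩ B)^c = A^c ∪ B^c and (A ∪ B)^c = A^c ∩ B^c.
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

# The difference A - B equals A ∩ B^c.
assert A - B == A & complement(B)
```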
3.3 PROBABILITIES IN SIMPLE SAMPLE SPACES

Probabilities are just numbers assigned to events. These numbers have the same nature as lengths, areas and volumes in geometry. How are probability numbers assigned? In the experiment of tossing a fair coin, where $S = \{H, T\}$, we do not hesitate to assign probability 1/2 to each of the two elementary outcomes H and T. From the theoretical point of view this is merely a convention, which can however be justified on the basis of actually tossing a fair coin a large number of times. In this case, the probability 1/2 assigned to the event "H occurred" can be interpreted as the limiting relative frequency of heads in the experiment of tossing a fair coin $n$ times as $n \to \infty$. The view of probabilities as the limit of relative frequencies is called the frequentist interpretation of probabilities. This is not the only interpretation, however. Another important one is the subjectivist interpretation, where probabilities are essentially viewed as representing degrees of belief about the likelihood of an event.

A sample space consisting of a finite number of points, where each point is equally probable, that is, receives the same probability, is called simple.

Example 3.4 The sample space corresponding to the chance experiment of tossing a fair coin 3 times is a simple sample space where each sample point receives the same probability 1/8. □

Given a simple sample space $S$, the probability of an event $E \subseteq S$ is
$$\Pr(E) = \frac{\text{number of sample points in } E}{\text{total number of sample points}}.$$
Several important properties of probabilities follow immediately from this definition: (i) $0 \le \Pr(E) \le 1$; (ii) $\Pr(S) = 1$; (iii) $\Pr(\emptyset) = 0$. These three properties hold for general sample spaces as well. Other properties are easy to understand using Venn diagrams. If $A \subseteq B$, then
$$\Pr(A) \le \Pr(B).$$
If $E = A \cup B$, then
$$\Pr(E) = \text{sum of the probabilities of all sample points in } A \cup B = \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B).$$
Clearly, $\Pr(E) = \Pr(A) + \Pr(B)$ if and only if $\Pr(A \cap B) = 0$, which in a simple sample space holds exactly when $A$ and $B$ are mutually exclusive events. For the complement $E^c$ of $E$, since $E \cup E^c = S$ and $E \cap E^c = \emptyset$, we have $\Pr(E) + \Pr(E^c) = \Pr(S) = 1$ and so
$$\Pr(E^c) = 1 - \Pr(E).$$
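For a simple sample space this formula is just a ratio of counts. The hypothetical helper below (an added sketch, not from the original text) makes this explicit and anticipates the computations of Example 3.5, which follows.

```python
from fractions import Fraction

def pr(event, sample_space):
    """Probability of an event in a simple (equally likely) sample space."""
    return Fraction(len(event), len(sample_space))

S = {"HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT"}
A = {e for e in S if e.count("H") >= 2}  # "at least 2 heads"
print(pr(A, S))      # 1/2
print(pr(S - A, S))  # complement rule: 1 - Pr(A) = 1/2
```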
Example 3.5 Consider the simple sample space corresponding to the experiment of tossing a fair coin 3 times. The event "at least 2 heads" corresponds to the set of elementary outcomes
$$A = \{HHH, HHT, HTH, THH\}.$$
Therefore, its probability is $\Pr(A) = 4/8 = 1/2$. The event "at least 1 tail" corresponds to the set of elementary outcomes
$$B = \{HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$
Because $B$ is the complement of the event "no tails", its probability is
$$\Pr(B) = 1 - \Pr(HHH) = 7/8.$$
The intersection of $A$ and $B$ is the event
$$A \cap B = \{HHT, HTH, THH\},$$
whose probability is equal to 3/8. The probability of the union of $A$ and $B$ is therefore equal to
$$\Pr(A) + \Pr(B) - \Pr(A \cap B) = \frac{1}{2} + \frac{7}{8} - \frac{3}{8} = 1,$$
which ought not be surprising since $A \cup B = S$ in this case. □

3.4 COUNTING RULES

Calculation of probabilities for simple sample spaces is facilitated by the systematic use of a few counting rules.

3.4.1 MULTIPLICATION RULE

The experiment of tossing a fair coin twice has 4 possible outcomes: HH, HT, TH and TT. This is an example of a chance experiment with the following characteristics:

1. The experiment is performed in 2 parts.
2. The first part has $n$ possible outcomes, say $x_1, \ldots, x_n$. Regardless of which of these outcomes occurred, the second part has $m$ possible outcomes, say $y_1, \ldots, y_m$.

Each point of the sample space $S$ is therefore a pair $e = (x_i, y_j)$, where $i = 1, \ldots, n$ and $j = 1, \ldots, m$, and $S$ consists of the $mn$ pairs
$$\begin{array}{cccc} (x_1, y_1) & (x_1, y_2) & \cdots & (x_1, y_m) \\ (x_2, y_1) & (x_2, y_2) & \cdots & (x_2, y_m) \\ \vdots & & & \vdots \\ (x_n, y_1) & (x_n, y_2) & \cdots & (x_n, y_m). \end{array}$$

The generalization to the case of an experiment with more than 2 parts is straightforward. Consider an experiment that is performed in $k$ parts ($k \ge 2$), where the $h$th part of the experiment has $n_h$ possible outcomes ($h = 1, \ldots, k$) and each of the outcomes in any part of the experiment can occur regardless of which specific outcome occurred in any of the other parts. Then each sample point in $S$ is a $k$-tuple $e = (u_1, \ldots, u_k)$, where $u_h$ is one of the $n_h$ possible outcomes of the $h$th part of the experiment. The total number of sample points in $S$ is therefore equal to $n_1 n_2 \cdots n_k$.
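A direct way to see the rule at work is to generate the tuples explicitly. The sketch below (added for illustration, with part sizes borrowed from the stereo-system counts of Example 3.6 below) builds the Cartesian product of the parts and checks that its size equals the product $n_1 n_2 \cdots n_k$.

```python
from itertools import product
from math import prod

# A 3-part experiment with part sizes 10, 5 and 3 (cf. Example 3.6 below).
parts = [range(10), range(5), range(3)]

outcomes = list(product(*parts))                     # all k-tuples (u_1, ..., u_k)
assert len(outcomes) == prod(len(p) for p in parts)  # 10 * 5 * 3 = 150
print(len(outcomes))                                 # 150
```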
Example 3.6 Suppose one can choose between 10 speaker types, 5 receivers and 3 CD players. The number of different stereo systems that can be put together this way is $10 \cdot 5 \cdot 3 = 150$. □

The next two subsections provide important examples of application of the multiplication rule.

3.4.2 SAMPLING WITH REPLACEMENT

Consider a chance experiment which consists of $k$ repetitions of the same basic experiment or trial. If each trial has the same number $n$ of possible outcomes, then the total number of sample points in $S$ is equal to $n^k$.

Example 3.7 Consider tossing a coin 4 times. The total number of outcomes is $2^4 = 16$. □

Example 3.8 Consider a box containing 10 balls numbered $1, 2, \ldots, 10$. Suppose that we repeat 5 times the basic experiment of selecting one ball at random, recording its number and then putting the ball back in the box. Since the number of possible outcomes in each trial is equal to 10, the total number of possible outcomes of the experiment is equal to $10^5 = 100{,}000$. This experiment is an example of sampling with replacement from a finite population. □

3.4.3 SAMPLING WITHOUT REPLACEMENT

Sampling without replacement corresponds to successive random draws, without replacement, of a single population unit. In the example of drawing balls from a box (Example 3.8), after a ball is selected, it is left out of the box.

Example 3.9 Consider a deck of 52 cards. If we select 3 cards in succession, then there are 52 possible outcomes at the first selection, 51 at the second, and 50 at the third. This is an example of sampling without replacement from a finite population. The total number of possible outcomes is therefore $52 \cdot 51 \cdot 50 = 132{,}600$. □

If $k$ elements have to be selected from a set of $n$ elements, then the total number of possible outcomes is
$$P_{n,k} = n(n-1)(n-2) \cdots (n-k+1),$$
called the number of permutations of $n$ elements taken $k$ at a time. If $k = n$, then the number of possible outcomes is the number of permutations of all $n$ elements,
$$P_{n,n} = n(n-1)(n-2) \cdots 2 \cdot 1,$$
called $n$ factorial and denoted by $n!$. By convention $0! = 1$. Thus
$$P_{n,k} = \frac{n(n-1)(n-2) \cdots (n-k+1)(n-k) \cdots 2 \cdot 1}{(n-k)(n-k-1) \cdots 2 \cdot 1} = \frac{n!}{(n-k)!}.$$
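These counts are available in the Python standard library (math.perm requires Python 3.8+). The short sketch below, an added illustration, verifies the formula $P_{n,k} = n!/(n-k)!$ by brute-force enumeration on a small case and reproduces the count of Example 3.9.

```python
from itertools import permutations
from math import factorial, perm

n, k = 6, 3
# P(n, k) by the formula n! / (n - k)! ...
assert perm(n, k) == factorial(n) // factorial(n - k)
# ... agrees with explicitly listing all ordered selections.
assert perm(n, k) == len(list(permutations(range(n), k)))

print(perm(52, 3))  # 132600, as in Example 3.9
```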
Example 3.10 Given a group of $k$ people ($2 \le k \le 365$), what is the probability that at least 2 people in the group have the same birthday? To simplify the problem, assume that birthdays are unrelated (there are no twins) and that each of the 365 days of the year is equally likely to be the birthday of any person. The sample space $S$ then consists of $365^k$ possible outcomes. The number of outcomes in $S$ for which all $k$ birthdays are different is $P_{365,k}$. Therefore, if $E$ denotes the event "all $k$ people have different birthdays", then
$$\Pr(E) = \frac{P_{365,k}}{365^k}.$$
Because the event "at least 2 people have the same birthday" is just the complement of $E$, we get
$$\Pr(E^c) = 1 - \frac{P_{365,k}}{365^k}.$$
We denote this probability by $p(k)$. The table below summarizes the value of $p(k)$ for different values of $k$:

     k     p(k)
     5     .027
    10     .117
    20     .411
    40     .891
    60     .994

Notice that, in a class of 100 people, the event that at least 2 people have the same birthday is almost certain. □
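The entries of this table are easy to reproduce. The following sketch (added here as an illustration) computes $p(k) = 1 - P_{365,k}/365^k$ directly.

```python
from math import perm

def p(k):
    """Probability that at least 2 of k people share a birthday."""
    return 1 - perm(365, k) / 365**k

for k in (5, 10, 20, 40, 60):
    print(k, round(p(k), 3))  # .027, .117, .411, .891, .994
```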
3.4.4 COMBINATIONS

As a motivation, consider the following example.

Example 3.11 Consider combining 4 elements a, b, c and d, taken 2 at a time. The total number of possible outcomes is equal to the number of permutations of 4 objects taken 2 at a time, namely
$$P_{4,2} = 4 \cdot 3 = 12.$$
If the order of the elements of each pair is irrelevant, the table below shows that 6 different combinations are obtained:

    12 permutations          6 combinations
    a,b   a,c   a,d          {a,b}   {a,c}   {a,d}
    b,a   b,c   b,d          {b,c}   {b,d}   {c,d}
    c,a   c,b   c,d
    d,a   d,b   d,c

□

Let $C_{n,k}$ denote the number of different combinations of $n$ objects taken $k$ at a time. To determine $C_{n,k}$, notice that the list of $P_{n,k}$ permutations may be constructed as follows. First select a particular combination of $k$ objects. Then notice that this particular combination can produce $k!$ permutations. Hence
$$P_{n,k} = C_{n,k} \, k!,$$
from which we get
$$C_{n,k} = \frac{P_{n,k}}{k!} = \frac{n!}{(n-k)! \, k!}.$$
The number $C_{n,k}$ is also called a binomial coefficient and denoted
$$C_{n,k} = \binom{n}{k}.$$
Clearly
$$\binom{n}{k} = \frac{n!}{(n-k)! \, k!} = \binom{n}{n-k}.$$

Example 3.12 In Example 3.11, $n = 4$, $k = 2$ and so $C_{4,2} = 12/2 = 6$. □

Example 3.13 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "straight flush" is
$$p = \Pr(\text{straight flush}) = \frac{\text{no. of different straight flushes}}{\text{no. of different hands}}.$$
The number of different hands is equal to
$$\binom{52}{5} = \frac{52!}{5! \, 47!} = 2{,}598{,}960.$$
Because there are 10 straight flushes for each suit, the total number of straight flushes is $10 \cdot 4 = 40$. Therefore, the desired probability is
$$p = \frac{40}{2{,}598{,}960} = .000015.$$
Not a high one! □

When a set contains only elements of 2 distinct types, a binomial coefficient may be used to represent the number of different arrangements of all the elements in the set.

Example 3.14 Suppose that $k$ red balls and $n - k$ green balls are to be arranged in a row. Since the red balls occupy $k$ positions, the number of different arrangements of the $n$ balls corresponds to the number $C_{n,k}$ of combinations of $n$ objects taken $k$ at a time. □

Example 3.15 Given a hand of 5 cards, randomly drawn from a deck of 52, the probability of a "poker" (four of a kind) is
$$p = \Pr(\text{poker}) = \frac{\text{no. of different pokers}}{\text{no. of different hands}},$$
where the denominator is the same as in Example 3.13. To compute the numerator, notice that 13 types of poker are possible: A, K, Q, ..., 2, and that, once the 4 cards of the chosen type are in the hand, the fifth card can be any of the remaining $52 - 4 = 48$ cards. Therefore, the number of possible pokers in a hand of 5 cards is $13 \cdot 48 = 624$ and so
$$p = \frac{624}{2{,}598{,}960} = .00024,$$
which is higher than the probability of a straight flush. □
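Both card probabilities can be checked with math.comb (Python 3.8+). The sketch below is an added illustration of the counts used in Examples 3.13 and 3.15.

```python
from math import comb

hands = comb(52, 5)    # 2,598,960 different 5-card hands
print(40 / hands)      # straight flush: 10 per suit * 4 suits, about .000015
print(13 * 48 / hands) # poker: 13 ranks * 48 choices of 5th card, about .00024
```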
3.5 CONDITIONAL PROBABILITIES

Suppose that we have a sample space $S$ where probabilities have been assigned to all events. If we know that the event $B \subset S$ occurred, then it seems intuitively obvious that this ought to modify our assignment of probabilities to any other event $A \subset S$, because the only sample points in $A$ that are now possible are the ones that are also contained in $B$. This new probability assigned to $A$ is called the conditional probability of the event $A$ given that the event $B$ has occurred, or simply the conditional probability of $A$ given $B$, and is denoted by $\Pr(A \mid B)$.

Example 3.16 Consider again the experiment of tossing a fair coin 3 times. Let $A$ = "at least one T" and $B$ = "H in the first trial". Clearly
$$\Pr(B) = 1/2, \qquad \Pr(A) = 7/8, \qquad \Pr(A \cap B) = 3/8.$$
If we know that $B$ occurred, then the relevant sample space becomes
$$S' = \{HHH, HHT, HTH, HTT\}.$$
Therefore
$$\Pr(A \mid B) = \frac{3}{4} = \frac{3/8}{1/2} = \frac{\Pr(A \cap B)}{\Pr(B)}.$$
Notice that $\Pr(A \mid B) < \Pr(A)$ in this case. □

Definition 3.1 If $A$ and $B$ are any two events, then the conditional probability of $A$ given $B$ is
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$
if $\Pr(B) > 0$, and $\Pr(A \mid B) = 0$ otherwise. □

The conditional probability of $B$ given $A$ is similarly defined as
$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)},$$
provided that $\Pr(A) > 0$.

The frequentist interpretation of conditional probabilities is as follows. If a chance experiment is repeated a large number of times, then the proportion of trials on which the event $B$ occurs is approximately equal to $\Pr(B)$, whereas the proportion of trials in which both $A$ and $B$ occur is approximately equal to $\Pr(A \cap B)$. Therefore, among those trials in which $B$ occurs, the proportion in which $A$ also occurs is approximately equal to $\Pr(A \cap B)/\Pr(B)$.

Definition 3.1 may be re-expressed as
$$\Pr(A \cap B) = \Pr(A \mid B) \Pr(B). \tag{3.3}$$
This result, called the multiplication law, provides a convenient way of finding $\Pr(A \cap B)$ whenever $\Pr(A \mid B)$ and $\Pr(B)$ are easy to find.

Example 3.17 Consider a hand of 2 cards randomly drawn in succession from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is an ace". Then $\Pr(B) = 4/52$ and $\Pr(A \mid B) = 4/51$. Hence
$$\Pr(A \cap B) = \Pr(\text{ace and then king}) = \Pr(A \mid B) \Pr(B) = \frac{4}{51} \cdot \frac{4}{52} = .0060. \quad \Box$$
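The multiplication law in Example 3.17 can be confirmed by enumerating all ordered two-card draws. The sketch below is an added illustration using a simplified deck encoding of my own choosing.

```python
from itertools import permutations

# Encode the deck as 52 (rank, suit) pairs; ranks 1 = ace, ..., 13 = king.
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]

# All ordered draws of 2 distinct cards (sampling without replacement).
draws = list(permutations(deck, 2))  # 52 * 51 = 2652 ordered pairs
favorable = [d for d in draws if d[0][0] == 1 and d[1][0] == 13]

print(len(favorable) / len(draws))   # 16/2652 = 0.00603...
print((4 / 51) * (4 / 52))           # same, by the multiplication law
```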
We now consider a useful application of the multiplication law (3.3). Notice that
$$A = (A \cap B) \cup (A \cap B^c),$$
where $A \cap B$ and $A \cap B^c$ are disjoint events because $B$ and its complement $B^c$ are disjoint. Hence
$$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B^c),$$
where, by the multiplication law,
$$\Pr(A \cap B) = \Pr(A \mid B) \Pr(B)$$
and
$$\Pr(A \cap B^c) = \Pr(A \mid B^c) \Pr(B^c).$$
Therefore
$$\Pr(A) = \Pr(A \mid B) \Pr(B) + \Pr(A \mid B^c) \Pr(B^c), \tag{3.4}$$
which is sometimes called the law of total probability.

Example 3.18 Consider a hand of 2 cards randomly drawn from a deck of 52. Let $A$ = "second card is a king" and $B$ = "first card is a king". We have $\Pr(B) = 4/52$, $\Pr(B^c) = 48/52$ and
$$\Pr(A \mid B) = 3/51, \qquad \Pr(A \mid B^c) = 4/51.$$
Hence, by the law of total probability,
$$\Pr(A) = \frac{3}{51} \cdot \frac{4}{52} + \frac{4}{51} \cdot \frac{48}{52} = \frac{4}{52}.$$
Thus $\Pr(A)$ and $\Pr(B)$ are the same. □
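Again the result can be verified exhaustively. The added sketch below reuses the deck encoding from the previous snippet and confirms, with exact fractions, that $\Pr(A) = 4/52$.

```python
from fractions import Fraction
from itertools import permutations

deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]
draws = list(permutations(deck, 2))  # ordered draws without replacement

# A = "second card is a king" (rank 13), regardless of the first card.
a_count = sum(1 for d in draws if d[1][0] == 13)
print(Fraction(a_count, len(draws)))  # 1/13, i.e. 4/52

# Law of total probability with B = "first card is a king".
print(Fraction(3, 51) * Fraction(4, 52) + Fraction(4, 51) * Fraction(48, 52))  # 1/13
```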
3.6 STATISTICAL INDEPENDENCE

Let $A$ and $B$ be two events with non-zero probability. If knowing that $B$ occurred gives no information about whether or not $A$ occurred, then the probability assigned to $A$ should not be modified by the knowledge that $B$ occurred, that is, $\Pr(A \mid B) = \Pr(A)$. Hence, by the multiplication law,
$$\Pr(A \cap B) = \Pr(A) \Pr(B).$$
We take this as our formal definition of statistical independence.

Definition 3.2 Two events $A$ and $B$ are said to be statistically independent if
$$\Pr(A \cap B) = \Pr(A) \Pr(B). \quad \Box$$

Notice that this definition of independence is symmetric in $A$ and $B$, and also covers the case when $\Pr(A) = 0$ or $\Pr(B) = 0$. It is easy to show that if $A$ and $B$ are independent, then $A$ and $B^c$, as well as $A^c$ and $B^c$, are independent. It is clear from Definition 3.2 that mutually exclusive events with positive probability cannot be independent. The concept of statistical independence is different from other concepts of independence (logical, mathematical, political, etc.). When there is no ambiguity, the term independence will be taken to mean statistical independence.

Example 3.19 The sample space associated with the experiment of tossing a fair coin twice is a simple sample space consisting of $2 \cdot 2 = 4$ points. Define the events $A$ = "H in the first toss" and $B$ = "T in the second toss". Because $A \cap B = \{HT\}$, we have
$$\Pr(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \Pr(A) \Pr(B).$$
This result seems fairly intuitive, because the occurrence of H in the first coin toss has no relation to, and no influence on, the occurrence of T in the second coin toss, and vice versa. □

It is natural to assume that events that are physically unrelated (such as successive coin tosses) are also statistically independent. However, physically related events may also satisfy the definition of statistical independence.

Example 3.20 Consider the chance experiment consisting of throwing a fair die. The sample space of this experiment is the simple sample space
$$\{1, 2, 3, 4, 5, 6\}.$$
Let $A$ = "an even number is obtained" and $B$ = "the number 1, 2, 3 or 4 is obtained". It is easy to verify that $\Pr(A) = 1/2$ and $\Pr(B) = 2/3$. Further,
$$\Pr(A \cap B) = \Pr(\text{2 or 4}) = 1/3 = \Pr(A) \Pr(B).$$
Hence, $A$ and $B$ are independent even though their occurrence depends on the same roll of a die. □
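The check in Example 3.20 is mechanical. The added sketch below uses exact fractions so the defining equality is tested without rounding.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}     # "an even number is obtained"
B = {1, 2, 3, 4}  # "the number 1, 2, 3 or 4 is obtained"

pr = lambda E: Fraction(len(E), len(S))
assert pr(A & B) == pr(A) * pr(B)  # 1/3 == 1/2 * 2/3: independent
```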
3.7 BAYES LAW

Suppose that you want to determine whether a coin is fair (F) or unfair (U). You have no information on the coin, and so you are willing to believe that F and U are equally likely, that is,
$$\Pr(F) = \Pr(U) = 1/2.$$
If the coin is fair, then
$$\Pr(H \mid F) = 1/2.$$
Further suppose that you know that, if the coin is unfair, then H is more likely than T, say
$$\Pr(H \mid U) = .9.$$

Assume that tossing the coin once gives you H. What is now the probability that the coin is fair? This is called the posterior probability of F given H, and is denoted by $\Pr(F \mid H)$. Intuitively, the occurrence of H (the most likely event if the coin is unfair) should modify your initial beliefs, leading you to view the event that the coin is fair as less likely than initially thought, whereas the occurrence of T should lead you to view the event that the coin is fair as more likely than initially thought.

One way of computing the posterior probabilities $\Pr(F \mid H)$ and $\Pr(F \mid T)$ is to draw the outcome tree for this problem:

    F:  H  Pr(H ∩ F) = .25
        T  Pr(T ∩ F) = .25
    U:  H  Pr(H ∩ U) = .45
        T  Pr(T ∩ U) = .05

It is then clear that the events U and F are mutually exclusive and that the event H is the union of the two disjoint events $H \cap F$ and $H \cap U$. Hence
$$\Pr(H) = \Pr(H \cap F) + \Pr(H \cap U) = .25 + .45 = .70.$$
Therefore
$$\Pr(F \mid H) = \frac{\Pr(H \cap F)}{\Pr(H)} = \frac{.25}{.70} = .357,$$
which is indeed less than the original assignment of probability to F, namely $\Pr(F) = 1/2$. By a similar argument we have
$$\Pr(F \mid T) = \frac{\Pr(T \cap F)}{\Pr(T)} = \frac{.25}{.30} = .833.$$

We can also compute the posterior probability $\Pr(F \mid H)$ without the need of a tree diagram, by using the fact that, by the multiplication law,
$$\Pr(H \cap F) = \Pr(H \mid F) \Pr(F)$$
and, by the law of total probability,
$$\Pr(H) = \Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U).$$
Hence
$$\Pr(F \mid H) = \frac{\Pr(H \mid F) \Pr(F)}{\Pr(H \mid F) \Pr(F) + \Pr(H \mid U) \Pr(U)}. \tag{3.5}$$
This formula is known as Bayes law. For $\Pr(F \mid T)$, Bayes law gives
$$\Pr(F \mid T) = \frac{\Pr(T \mid F) \Pr(F)}{\Pr(T \mid F) \Pr(F) + \Pr(T \mid U) \Pr(U)},$$
where $\Pr(T \mid F) = 1 - \Pr(H \mid F)$ and $\Pr(T \mid U) = 1 - \Pr(H \mid U)$.

Notice that we can regard $\Pr(F)$ as our prior information about whether the coin is fair. Bayes law then gives us a way of updating this information in the light of the new information contained in the fact that H was obtained.
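A compact numerical check of both posteriors, added here as an illustration of equation (3.5) with a hypothetical helper function:

```python
def posterior_fair(pr_f, pr_h_given_f, pr_h_given_u):
    """Bayes law (3.5): Pr(F | H) from the prior Pr(F) and the two likelihoods."""
    pr_u = 1 - pr_f
    return pr_h_given_f * pr_f / (pr_h_given_f * pr_f + pr_h_given_u * pr_u)

print(posterior_fair(0.5, 0.5, 0.9))          # Pr(F | H) = 0.357...
print(posterior_fair(0.5, 1 - 0.5, 1 - 0.9))  # Pr(F | T) = 0.833..., using T likelihoods
```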