Elements of probability theory



2 Elements of probability theory

Probability theory provides mathematical models for random phenomena, that is, phenomena which under repeated observations yield different outcomes that cannot be predicted with certainty.

2.1 SAMPLE SPACES

A situation whose outcomes occur randomly is called an experiment. The set of all possible outcomes of an experiment is called the sample space corresponding to the experiment, and is denoted by Ω. A generic element of Ω is called a sample point, or simply a point, and is denoted by ω ∈ Ω.

Example 2.1 A coin is tossed twice and the sequence of heads (H) and tails (T) is recorded. The possible outcomes of this experiment are HH, HT, TH and TT. Hence, the sample space corresponding to this experiment consists of the four points

    Ω = {HH, HT, TH, TT}.  □

A sample space Ω is called finite if it is empty or contains a finite number of points; otherwise it is called infinite. A sample space is called countable if its points can be indexed by the set of positive integers. A sample space that is finite or countable is called discrete.

Example 2.2 A coin is tossed until H is recorded. The sample space corresponding to this experiment is

    Ω = {H, TH, TTH, TTTH, TTTTH, ...}.

Thus Ω contains countably many points. □

Not all sample spaces are discrete. For example, the sample space consisting of all positive real numbers is not discrete, and neither is the sample space consisting of all real numbers in the interval [0, 1].
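As a small illustration (a Python sketch added here, not part of the original notes), the discrete sample spaces of Examples 2.1 and 2.2 can be enumerated directly; for the countable space we can of course only list a finite initial segment:

```python
from itertools import product

# Sample space of Example 2.1: all ordered sequences of H and T of length 2.
omega = {''.join(seq) for seq in product('HT', repeat=2)}
print(sorted(omega))   # ['HH', 'HT', 'TH', 'TT']

# Example 2.2's sample space is countable; list its first few points.
omega2_prefix = ['T' * k + 'H' for k in range(5)]
print(omega2_prefix)   # ['H', 'TH', 'TTH', 'TTTH', 'TTTTH']
```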

2.2 RELATIONS AMONG EVENTS

A subset of points in a sample space Ω is called an event in Ω. An event occurs if and only if one of its points occurs. Viewed as an event, Ω is called the sure event. In general, events will be defined by certain conditions on the points that compose them. Because events are just subsets of points in Ω, concepts and results from point set theory apply to events. In particular, if A and B are events in Ω, A implies B, written A ⊆ B, if and only if all points in A also belong to B. The events A and B are identical, written A = B, if and only if A ⊆ B and B ⊆ A, that is, A and B contain exactly the same points. Other usual operations and relations between sets are listed below:

(union) A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B},
(intersection) A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B},
(complement) A^c = {ω ∈ Ω : ω ∉ A},
(impossible event) ∅ = Ω^c,
(difference) A − B = A ∩ B^c,
(symmetric difference) A △ B = (A − B) ∪ (B − A).

Instead of A ∩ B we also write AB. Operations and relations between sets are easy to visualize using Venn diagrams. If A ∩ B = ∅, then A and B are called disjoint or mutually exclusive events. Intersections and unions of a countable collection A_1, A_2, A_3, ... of events are denoted by ⋂_{i=1}^∞ A_i and ⋃_{i=1}^∞ A_i, respectively. If 𝒜 is an arbitrary family of subsets of Ω, we write

    ⋃_{A ∈ 𝒜} A = {ω ∈ Ω : ω ∈ A for some A ∈ 𝒜}

and

    ⋂_{A ∈ 𝒜} A = {ω ∈ Ω : ω ∈ A for all A ∈ 𝒜}.

Some basic relationships between events are:

    A ∪ ∅ = A,  A ∪ Ω = Ω,  A ∪ A = A,  Ω ∪ ∅ = Ω,
    A ∩ ∅ = ∅,  A ∩ Ω = A,  A ∩ A = A,  Ω ∩ ∅ = ∅,

and

(commutative laws) A ∪ B = B ∪ A, A ∩ B = B ∩ A,
(distributive laws) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
(associative laws) (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C),
(De Morgan's laws) (A ∪ B)^c = A^c ∩ B^c, (A ∩ B)^c = A^c ∪ B^c.

De Morgan's laws show that complementation, union and intersection are not independent operations. The commutative, distributive and associative laws and De Morgan's laws can easily be extended to a countable collection of events A_1, A_2, A_3, ....

The characteristic function (cf) of an event A ⊆ Ω is the function 1_A(·) defined for all ω ∈ Ω by the relation

    1_A(ω) = 1, if ω ∈ A,
    1_A(ω) = 0, otherwise.

We also write 1(ω ∈ A) or, more compactly, 1(A). There is a one-to-one correspondence between sets and their cf's, and all properties of sets and set operations can be expressed in terms of cf's. For example, if C = A^c then 1_C = 1 − 1_A; if C = A ∪ B then 1_C = max(1_A, 1_B); and if C = A ∩ B then 1_C = 1_A 1_B.

2.3 PROBABILITY

How can we attach probabilities to events? The first and easiest case is an experiment with a finite sample space Ω consisting of N points. Suppose that, because of the nature of the experiment (e.g. tossing a fair coin), all points in Ω are equiprobable, that is, equally likely, and let A be some event in Ω. We define the probability of A, written P(A), as the ratio

    P(A) = N(A)/N,  (2.1)

where N(A) denotes the number of points in A. For any A ⊆ Ω, we have

    0 ≤ P(A) ≤ 1,  P(Ω) = N(Ω)/N = 1,  P(∅) = N(∅)/N = 0.

Further, if A and B are disjoint events in Ω, then

    P(A ∪ B) = N(A ∪ B)/N = N(A)/N + N(B)/N = P(A) + P(B).
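Formula (2.1) and the additivity of probabilities for disjoint events can be checked mechanically. In the sketch below (Python, added for illustration and not part of the original notes), events are represented as sets of sample points of a two-dice experiment, and probabilities are computed by counting:

```python
from fractions import Fraction
from itertools import product

# Equiprobable sample space: ordered pairs of fair die rolls, N = 36.
omega = set(product(range(1, 7), repeat=2))
N = len(omega)

def P(event):
    """P(A) = N(A)/N on an equiprobable finite sample space, formula (2.1)."""
    return Fraction(len(event), N)

A = {w for w in omega if w[0] + w[1] == 7}    # sum of the dice is 7
B = {w for w in omega if w[0] + w[1] == 11}   # sum of the dice is 11

print(P(A))                      # 1/6
# A and B are disjoint, so their probabilities add:
assert A & B == set()
assert P(A | B) == P(A) + P(B)
```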

Despite its simplicity, formula (2.1) can lead to non-trivial calculations. In order to use it in a given problem, we need to determine: (i) the number N of all equiprobable outcomes, and (ii) the number N(A) of all those outcomes leading to the occurrence of A.

A second case is when a basic experiment can be repeated under exactly the same conditions any number n of times. We call this situation the case of independent trials under identical conditions. In this case, we can give a precise meaning to the concept of probability. In each trial a particular event A may or may not occur. Let n(A) be the number of trials in which A occurs. The relative frequency of the event A in the given series of n trials is defined as

    f_n(A) = n(A)/n.

It is an empirical fact that the relative frequencies f_n(A) observed for different series of trials are virtually the same for large n, clustering about a constant value P(A), called the probability of A. Roughly speaking, the probability of A equals the fraction of trials leading to the occurrence of A in a large series of trials.

2.4 COMBINATORIAL RESULTS

Whenever equal probabilities are assigned to the elements of a finite sample space, the computation of probabilities of events reduces to counting the points comprising the events.

Theorem 2.1 Given n elements a_1, ..., a_n and m elements b_1, ..., b_m, there are exactly nm distinct ordered pairs (a_i, b_j) containing one element of each kind.

Thus, if one experiment has n possible outcomes and another experiment has m possible outcomes, there are nm possible outcomes for the two experiments. More generally we have:

Theorem 2.2 Given n_1 elements a_1, ..., a_{n_1}, n_2 elements b_1, ..., b_{n_2}, etc., up to n_r elements x_1, ..., x_{n_r}, there are n_1 n_2 ⋯ n_r distinct ordered r-tuples (a_{i_1}, b_{i_2}, ..., x_{i_r}) containing one element of each kind.

Thus, if there are r experiments, where the first has n_1 possible outcomes, the second n_2, ..., and the rth n_r possible outcomes, there are a total of n_1 n_2 ⋯ n_r possible outcomes for the r experiments.

A permutation is an ordered arrangement of objects. An ordered sample of size r is a permutation of r objects obtained from a set of n elements. Two possible ways of obtaining samples are sampling with replacement and sampling without replacement. Notice that only samples of size r ≤ n without replacement are possible.
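The counting rules above can be verified by brute-force enumeration. The following sketch (Python, illustrative and not from the notes) lists all ordered samples of size r from n elements, with and without replacement, and checks the counts against Theorem 2.3:

```python
from itertools import product, permutations
from math import factorial

n, r = 5, 3
items = range(n)

# Ordered samples with replacement: n^r of them.
with_repl = list(product(items, repeat=r))
assert len(with_repl) == n ** r                                # 125

# Ordered samples without replacement: n(n-1)...(n-r+1) = n!/(n-r)!.
without_repl = list(permutations(items, r))
assert len(without_repl) == factorial(n) // factorial(n - r)   # 60
```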

Theorem 2.3 Given a set of n elements and sample size r, there are n^r different ordered samples with replacement, and

    n(n−1)(n−2) ⋯ (n−r+1) = n!/(n−r)!

different ordered samples without replacement.

Theorem 2.3 implies that the number of permutations, or orderings, of n elements is equal to n!.

A combination is a set of elements without repetitions and without regard to ordering. For example, {a, b} and {b, a} are different permutations but only one combination. Thus, the number of combinations is the number of unordered samples of a given size drawn without replacement from a finite set of objects.

Theorem 2.4 The number of possible combinations of n objects taken r at a time (r ≤ n) is equal to

    C^n_r = n!/(r!(n−r)!) = (n choose r).

Proof. Since the number of ordered samples is equal to the number of unordered samples times the number of ways to order each sample, we have

    n!/(n−r)! = C^n_r · r!,

from which C^n_r = n!/(r!(n−r)!). □

The number C^n_r is called a binomial coefficient, since it occurs in the binomial expansion

    (a + b)^n = ∑_{r=0}^n (n choose r) a^{n−r} b^r.

More generally, simple induction gives the following:

Theorem 2.5 Given a set of n elements, let n_1, ..., n_k be positive integers such that ∑_{i=1}^k n_i = n. Then there are

    (n choose n_1, n_2, ..., n_k) = n!/(n_1! n_2! ⋯ n_k!)  (2.2)

ways of partitioning the set into k unordered samples without replacement of sizes n_1, ..., n_k, respectively. The numbers (2.2) are called multinomial coefficients.
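Theorems 2.4 and 2.5 are again easy to check numerically. A sketch (Python, not part of the original notes; the particular numbers are our own examples):

```python
from math import comb, factorial
from itertools import combinations

n, r = 6, 2
# Theorem 2.4: the number of unordered samples without replacement
# equals n!/(r!(n-r)!), i.e. the binomial coefficient.
assert len(list(combinations(range(n), r))) == comb(n, r) == 15

# Binomial expansion at a = b = 1: the coefficients sum to 2^n.
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n

# Multinomial coefficient (2.2) for the partition sizes 3, 2, 1 of n = 6.
multinomial = factorial(6) // (factorial(3) * factorial(2) * factorial(1))
print(multinomial)   # 60
```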

2.5 FINITE PROBABILITY SPACES

The definition of probability in terms of equiprobable events is circular. On the other hand, defining probabilities as limits of relative frequencies in independent trials under identical conditions is far too restrictive. To avoid these problems we shall now present a purely axiomatic treatment of probabilities.

Definition 2.1 A sample space Ω is called a finite probability space if Ω is finite and, for every event A ⊆ Ω, there is defined a real number P(A), called the probability of the event A, such that:

A.1: P(A) ≥ 0;
A.2: P(Ω) = 1;
A.3: if A_1 and A_2 are mutually exclusive events in Ω, then P(A_1 ∪ A_2) = P(A_1) + P(A_2). □

It follows from Definition 2.1 that, for any subsets A and B of Ω,

    0 ≤ P(A) ≤ 1.  (2.3)

Further,

    P(A^c) = 1 − P(A),  (2.4)
    P(∅) = 0,  (2.5)
    A ⊆ B ⇒ P(A) ≤ P(B),  (2.6)
    P(A ∪ B) = P(A) + P(B) − P(AB)  (Covering theorem).  (2.7)

This implies the following upper bound on P(A ∪ B):

    P(A ∪ B) ≤ P(A) + P(B),

with equality if and only if A and B are disjoint. Notice that B = AB ∪ A^c B, where AB and A^c B are mutually exclusive events. Hence P(B) = P(AB) + P(A^c B), and therefore P(B) − P(AB) = P(A^c B). Substituting in (2.7) gives

    P(A ∪ B) = P(A) + P(A^c B)  (Addition law).  (2.8)

Also notice that, by De Morgan's laws and the Covering theorem,

    1 − P(AB) = P((AB)^c) = P(A^c ∪ B^c) ≤ P(A^c) + P(B^c).

This implies

    P(AB) ≥ 1 − P(A^c) − P(B^c)  (Bonferroni inequality),  (2.9)

with equality if and only if A^c and B^c are disjoint.

More generally, if A_1, ..., A_n is a finite collection of events in Ω, then A_1, A_1^c A_2, A_1^c A_2^c A_3, ..., A_1^c A_2^c ⋯ A_{n−1}^c A_n form a partition of ⋃_{i=1}^n A_i, and so

    P(⋃_{i=1}^n A_i) = P(A_1) + P(A_1^c A_2) + ⋯ + P(A_1^c A_2^c ⋯ A_{n−1}^c A_n).

This result generalizes the Addition law (2.8). Since A_1^c A_2^c ⋯ A_{n−1}^c A_n ⊆ A_n for all n ≥ 1, we also have

    P(⋃_{i=1}^n A_i) ≤ ∑_{i=1}^n P(A_i).

This result generalizes the Covering theorem (2.7). Finally, the generalization of the Bonferroni inequality (2.9) is

    P(⋂_{i=1}^n A_i) ≥ 1 − ∑_{i=1}^n P(A_i^c).

2.6 MEASURABLE SPACES AND MEASURES

For infinite sample spaces, some modifications of the axioms A.1–A.3 and some additional concepts of set theory are required. The reason is that some subsets of an infinite sample space may be so irregular that it is not possible to assign a probability to them.

A set whose elements are sets in Ω will be called a class of sets in Ω. When a set operation performed on sets in a class 𝒜 gives as a result sets which also belong to 𝒜, we say that 𝒜 is closed under the given operation.

Definition 2.2 A nonempty class 𝒜 of subsets of Ω is called a field or an algebra on Ω if it contains Ω and is closed under complementation and finite unions, that is,

    Ω ∈ 𝒜;
    A ∈ 𝒜 ⇒ A^c ∈ 𝒜;  (2.10)
    A_i ∈ 𝒜, i = 1, ..., n ⇒ ⋃_{i=1}^n A_i ∈ 𝒜.  (2.11)

By De Morgan's laws, (2.10) and (2.11) together imply

    A_i ∈ 𝒜, i = 1, ..., n ⇒ ⋂_{i=1}^n A_i ∈ 𝒜. □

Thus, all standard set operations (union, intersection and complementation) can be performed any finite number of times on the elements of a field 𝒜 without obtaining a set not in 𝒜.

Definition 2.3 A field 𝒜 on Ω is a σ-field or a σ-algebra if it is closed under countable unions, that is, if

    A_i ∈ 𝒜, i = 1, 2, ... ⇒ ⋃_{i=1}^∞ A_i ∈ 𝒜.  (2.12)

By De Morgan's laws, (2.10) and (2.12) together imply

    A_i ∈ 𝒜, i = 1, 2, ... ⇒ ⋂_{i=1}^∞ A_i ∈ 𝒜.

Thus, all standard set operations can be performed any countable number of times on the elements of a σ-field 𝒜 without obtaining a set not in 𝒜.

If 𝒜 is a class of subsets of Ω, the smallest field (σ-field) containing 𝒜 is called the field (σ-field) generated by 𝒜. It can be verified that the field (σ-field) generated by 𝒜 is equal to the intersection of all fields (σ-fields) containing 𝒜.

If 𝒜 is a σ-field on Ω, the pair (Ω, 𝒜) is called a measurable space. A subset A of Ω is said to be measurable if A ∈ 𝒜. Given a space Ω, it is generally possible to define many σ-fields on Ω. To distinguish between them, the members of a given σ-field 𝒜 on Ω will be called 𝒜-measurable sets.

Example 2.3 An important σ-field on the real line ℝ is that generated by the class of all bounded semi-closed intervals of the form (a, b], −∞ < a < b < ∞. This σ-field is called the Borel field on ℝ and is denoted by ℬ. Its elements are called the Borel sets. Since ℬ is a σ-field, repeated finite and countable set-theoretic operations on its elements never lead outside ℬ. The measurable space (ℝ, ℬ) is called the Borel line. Notice that ℬ would equivalently be generated by all the open half-lines of ℝ, all the open intervals of ℝ, or all the closed intervals of ℝ. □

A set function is a function defined on a class of sets.

Definition 2.4 A measure μ on a measurable space (Ω, 𝒜) is a nonnegative set function μ defined for all sets of 𝒜 and satisfying:

M.1: μ(∅) = 0;
M.2: (Countable additivity) if {A_i} is any countable sequence of disjoint 𝒜-measurable sets, then

    μ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i). □
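On a finite Ω every field is automatically a σ-field, so the field generated by a class of subsets can be computed by brute force: repeatedly close the class under complementation and pairwise union until nothing new appears. A sketch (Python; the function name and example are ours, added for illustration):

```python
# Field generated by a class of subsets of a *finite* Omega, computed by
# closing the class under complementation and pairwise union (Definition 2.2).
def generated_field(omega, sets):
    field = {frozenset(s) for s in sets} | {frozenset(omega)}
    changed = True
    while changed:
        changed = False
        for A in list(field):
            for B in list(field):
                # candidate new sets: the complement of A and the union A ∪ B
                for C in (frozenset(omega) - A, A | B):
                    if C not in field:
                        field.add(C)
                        changed = True
    return field

omega = {1, 2, 3, 4}
F = generated_field(omega, [{1, 2}])
# Smallest field containing {1,2} is {∅, {1,2}, {3,4}, Ω}.
assert F == {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}
```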

Clearly, countable additivity implies finite additivity, that is, if A_1, ..., A_n is a finite collection of disjoint measurable sets, then

    μ(⋃_{i=1}^n A_i) = ∑_{i=1}^n μ(A_i).

Example 2.4 Let f be a nonnegative function of the points of a set Ω, and let the σ-field 𝒜 consist of all countable subsets of Ω. A measure μ on (Ω, 𝒜) is then defined by

    μ(∅) = 0,  μ({ω_1, ..., ω_n}) = ∑_{i=1}^n f(ω_i).

If f ≡ 1, then μ is called counting measure. □

It is easy to verify that if μ is a measure on (Ω, 𝒜), then it is monotone, that is, μ(A) ≤ μ(B) whenever A, B ∈ 𝒜 and A ⊂ B.

Definition 2.5 A measure μ on (Ω, 𝒜) is called finite if μ(Ω) < ∞. It is called σ-finite if there exists a sequence {A_i} of sets in 𝒜 such that ⋃_{i=1}^∞ A_i = Ω and μ(A_i) < ∞, i = 1, 2, .... □

Example 2.5 An important σ-finite measure is the one defined on the Borel line (ℝ, ℬ) by μ((a, b]) = b − a, the length of the interval (a, b]. Such a measure is called Lebesgue measure. It is easy to verify that every countable set is a Borel set of Lebesgue measure zero. □

Definition 2.6 If μ is a measure on (Ω, 𝒜), the triple (Ω, 𝒜, μ) is called a measure space. □

A measure space (Ω, 𝒜, μ) is called complete if it contains all subsets of sets of measure zero, that is, if A ∈ 𝒜, B ⊂ A, and μ(A) = 0, then B ∈ 𝒜. It can be shown that each measure space can be completed by the addition of subsets of sets of measure zero.

If μ is a σ-finite measure defined on a field 𝒜 and F(𝒜) is the σ-field generated by 𝒜, then it can be shown that there exists a unique measure μ* on (Ω, F(𝒜)) such that μ*(A) = μ(A) for all A ∈ 𝒜. Further, μ* is also σ-finite. Such a measure is called the extension of μ.

Definition 2.7 A measure space (Ω, 𝒜, P) is a probability space if P is a finite measure with P(Ω) = 1. □

2.7 PROBABILITY SPACES

From Definition 2.7, a probability space is a triple (Ω, 𝒜, P), where Ω is the sample space associated with an experiment, 𝒜 is a σ-field on Ω, and the probability measure P is a real-valued function defined for all sets in 𝒜 and satisfying:

P.1: P(A) ≥ 0 for all A ∈ 𝒜;
P.2: P(Ω) = 1;
P.3: (Countable additivity) if {A_i} is a countable sequence of disjoint sets in 𝒜, then

    P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

If (Ω, 𝒜, P) is a probability space, then the sets in 𝒜 are interpreted as the possible events associated with an experiment. For any A ∈ 𝒜, the real number P(A) is called the probability of the event A. A support of P is any set A ∈ 𝒜 for which P(A) = 1.

If Ω is a finite sample space and 𝒜 is the set of all the events in Ω (the collection of all subsets of Ω), then properties P.1–P.3 are equivalent to A.1–A.3 that define a finite probability space. As a consequence of properties P.1–P.3, relationships (2.3)–(2.9) hold for any A, B ∈ 𝒜. Further, their generalizations hold for any finite collection of events in 𝒜. Notice that the Covering theorem (2.7) can be shown to hold for any countable collection of events in 𝒜. Further, if {A_i} is a countable collection of events in 𝒜 such that A_1 ⊆ A_2 ⊆ ⋯, then

    P(⋃_{i=1}^∞ A_i) = lim_{n→∞} P(A_n).

2.8 CONDITIONAL PROBABILITY

Let (Ω, 𝒜, P) be a probability space and let B ∈ 𝒜 be an event such that P(B) > 0. If we know that B occurred, then the relevant sample space becomes B rather than Ω. This justifies defining the conditional probability of A given B as

    P(A|B) = P(AB)/P(B)  (2.13)

if P(B) > 0, and P(A|B) = 0 if P(B) = 0. It is easy to verify that the function P(·|B) defined on 𝒜 is a probability measure on (Ω, 𝒜), that is, it satisfies P.1–P.3. We call P(·|B) the conditional probability measure given B. Notice that (2.13) can equivalently be written as

    P(AB) = P(A|B) P(B).

This result, called the Multiplication law, provides a convenient way of finding P(AB) whenever P(A|B) and P(B) are easy to find. The Multiplication law can be generalized to a finite collection of events A_1, ..., A_n in 𝒜:

    P(A_1 ⋯ A_n) = P(A_n | A_{n−1} ⋯ A_1) P(A_{n−1} ⋯ A_1)
                 = P(A_n | A_{n−1} ⋯ A_1) P(A_{n−1} | A_{n−2} ⋯ A_1) P(A_{n−2} ⋯ A_1),

and so on. Thus

    P(A_1 ⋯ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_2 A_1) ⋯ P(A_n | A_{n−1} ⋯ A_1).

Now consider a countable collection {B_i} of disjoint events in 𝒜 such that P(B_i) > 0 for every i and ⋃_{i=1}^∞ B_i = Ω. Clearly,

    P(⋃_{i=1}^∞ B_i) = ∑_{i=1}^∞ P(B_i) = 1.

For any A ∈ 𝒜,

    P(A) = P(A ∩ (⋃_i B_i)) + P(A ∩ (⋃_i B_i)^c) = P(A ∩ (⋃_i B_i)),

where we used the fact that

    P((⋃_i B_i)^c) = 1 − P(⋃_i B_i) = 0.

Thus, by the distributive law,

    P(A) = P(⋃_{i=1}^∞ A B_i) = ∑_{i=1}^∞ P(A B_i),

since A B_i ∩ A B_j = ∅ for all i ≠ j. Therefore

    P(A) = ∑_{i=1}^∞ P(A|B_i) P(B_i),

which is called the Law of total probability.

Now let A ∈ 𝒜 be such that P(A) > 0, and consider computing the conditional probability P(B_j|A) given knowledge of {P(A|B_i)} and {P(B_i)}. By the definition of conditional probability and the Multiplication law,

    P(B_j|A) = P(B_j A)/P(A) = P(A|B_j) P(B_j)/P(A)

for any fixed j = 1, 2, .... Therefore, by the Law of total probability,

    P(B_j|A) = P(A|B_j) P(B_j) / ∑_{i=1}^∞ P(A|B_i) P(B_i),

which is called Bayes' rule.
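The definitions of this section can be exercised numerically. The sketch below (Python, added for illustration; the two-urn scenario is a hypothetical example of ours, not from the notes) checks the Multiplication law on two dice, and then applies the Law of total probability and Bayes' rule:

```python
from fractions import Fraction
from itertools import product

# Conditional probability on two fair dice.
omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(omega))

def cond(A, B):
    """P(A|B) = P(AB)/P(B) if P(B) > 0, and 0 otherwise, definition (2.13)."""
    return P(A & B) / P(B) if P(B) > 0 else Fraction(0)

A = {w for w in omega if w[0] + w[1] == 7}
B = {w for w in omega if w[0] == 3}
assert cond(A, B) == Fraction(1, 6)        # given first die = 3, need second = 4
assert P(A & B) == cond(A, B) * P(B)       # Multiplication law

# Hypothetical urns: urn 1 holds 3 red and 1 black ball, urn 2 holds 1 red and
# 3 black. B_1, B_2 partition Omega (urn chosen at random); A = "red is drawn".
P_B = {1: Fraction(1, 2), 2: Fraction(1, 2)}
P_A_given_B = {1: Fraction(3, 4), 2: Fraction(1, 4)}

P_A = sum(P_A_given_B[i] * P_B[i] for i in P_B)   # Law of total probability
bayes = P_A_given_B[1] * P_B[1] / P_A             # Bayes' rule: P(B_1|A)
print(P_A, bayes)   # 1/2 3/4
```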

2.9 INDEPENDENCE

Let A, B ∈ 𝒜 be two events with non-zero probability. If knowing that B occurred gives no information about whether or not A occurred, then the probability assigned to A should not be modified by the knowledge that B occurred. Hence

    P(A|B) = P(A),

and so

    P(AB) = P(A) P(B).  (2.14)

Two events A, B ∈ 𝒜 are said to be (pairwise) independent if (2.14) holds. Notice that this definition of independence is symmetric in A and B, and that it also covers the case when P(A) = 0 or P(B) = 0. It is easy to show that if A and B are independent, then A and B^c, as well as A^c and B^c, are independent.

Three events A, B, C ∈ 𝒜 are said to be (mutually) independent if they are pairwise independent and

    P(ABC) = P(A) P(B) P(C).

This extra condition is necessary, for pairwise independence does not ensure that, for example, P((AB)C) = P(AB) P(C). It is easy to verify that if A, B and C are independent events, then A ∪ B and C are independent, and A ∩ B and C are independent.

More generally, a family 𝒞 of events is (mutually) independent if, for every finite collection A_1, ..., A_n of events in 𝒞,

    P(⋂_{i=1}^n A_i) = ∏_{i=1}^n P(A_i).  (2.15)
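A standard illustration of why the triple-product condition is needed (a Python sketch of ours, not from the notes): on two fair dice, the events "first die even", "second die even" and "sum even" are pairwise independent but not mutually independent.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] % 2 == 0}             # first die even
B = {w for w in omega if w[1] % 2 == 0}             # second die even
C = {w for w in omega if (w[0] + w[1]) % 2 == 0}    # sum even

# A, B, C are pairwise independent, each pair satisfying (2.14) ...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ... but they are not mutually independent: the triple product fails,
# since A ∩ B forces the sum to be even, so A ∩ B ∩ C = A ∩ B.
assert P(A & B & C) != P(A) * P(B) * P(C)
```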