NOTES ON PROBABILITY Greg Lawler Last Updated: March 21, 2016

Transcription

1 NOTES ON PROBBILITY Greg Lawler Last Updated: March 21, 2016 Overview This is a itroductio to the mathematical foudatios of probability theory. It is iteded as a supplemet or follow-up to a graduate course i real aalysis. The first two sectios assume the kowledge of measure spaces, measurable fuctios, Lebesgue itegral, ad otios of covergece of fuctios; the third assumes Fubii s Theorem; the fifth assumes kowledge of Fourier trasform of ice (Schwartz) fuctios o R; ad Sectio 6 uses the Rado-Nikodym Theorem. The mathematical foudatios of probability theory are exactly the same as those of Lebesgue itegratio. However, probability adds much ituitio ad leads to differet developmets of the area. These otes are oly iteded to be a brief itroductio this might be cosidered what every graduate studet should kow about the theory of probability. Probability uses some differet termiology tha that of Lebesgue itegratio i R. These otes will itroduce the termiology ad will also relate these ideas to those that would be ecoutered i a elemetary (by which we will mea pre-measure theory) course i probability or statistics. Graduate studets ecouterig probabilty for the first time might wat to also read a udergraduate book i probability. 1 Probability spaces Defiitio probability space is a measure space with total measure oe. The stadard otatio is (Ω, F, P) where: Ω is a set (sometimes called a sample space i elemetary probability). Elemets of Ω are deoted ω ad are sometimes called outcomes. F is a σ-algebra (or σ-field, we will use these terms syoymously) of subsets of Ω. Sets i F are called evets. P is a fuctio from F to [0, 1] with P(Ω) = 1 ad such that if E 1, E 2,... F are disjoit, P = P[E j ]. We say probability of E for P(E). E j discrete probability space is a probability space such that Ω is fiite or coutably ifiite. I this case we usually choose F to be all the subsets of Ω (this ca be writte F = 2 Ω ), ad the probability measure P is give by a fuctio p : Ω [0, 1] with ω Ω p(ω) = 1. The, P(E) = ω E p(ω). We will cosider aother importat example here, the probability space associated to a ifiite umber of flips of a coi. Let Ω = {ω = (ω 1, ω 2,...) : ω j = 0 or 1}. 1

2 We thik of 0 as tails ad 1 as heads. For each positive iteger, let Ω = {(ω 1, ω 2,..., ω ) : ω j = 0 or 1}. Each Ω is a fiite set with 2 elemets. We ca cosider Ω as a probability space with σ-algebra 2 Ω ad probability P iduced by p (ω) = 2, ω Ω. Let F be the σ-algebra o Ω cosistig of all evets that deped oly o the first flips. More formally, we defie F to be the collectio of all subsets of Ω such that there is a E 2 Ω with = {(ω 1, ω 2,...) : (ω 1, ω 2,..., ω ) E}. (1) Note that F is a fiite σ-algebra (cotaiig 2 2 subsets) ad If is of the form (1), we let F 1 F 2 F 3. P() = P (E). Oe ca easily check that this defiitio is cosistet. This gives a fuctio P o F 0 := F j. Recall that a algebra of subsets of Ω is a collectio of subsets cotaiig Ω, closed uder complemetatio, ad closed uder fiite uios. To show closure uder fiite uios, it suffices to show that if E 1, E 2 F 0, the E 1 E 2 F 0. Propositio 1.1. F 0 is a algebra but ot a σ-algebra. Proof. Ω F 0 sice Ω F 1. Suppose E F 0. The E F for some, ad hece E c F ad E c F. lso, suppose E 1, E 2 F. The there exists j, k with E 1 F j, E 2 F k. Let = max{j, k}. The sice the σ-algebras are icreasig, E 1, E 2 F ad hece E 1 E 2 F. Therefore, E 1 E 2 F 0 ad F 0 is a algebra. To see that F 0 is ot a σ-algebra cosider the sigleto set E = {(1, 1, 1,...)}. E is ot i F 0 but E ca be writte as a coutable itersectio of evets i F 0, E = {(ω 1, ω 2,...) : ω 1 = ω 2 = = ω j = 1}. Propositio 1.2. The fuctio P is a (coutably additive) measure o F 0, i.e., it satisfies P[ ] = 0, ad if E 1, E 2,... F 0 are disjoit with =1E F 0, the [ ] P E = P(E ). =1 =1 2

3 Proof. P[ ] = 0 is immediate. lso it is easy to see that P is fiitely additive, i.e., if E 1, E 2,..., E F 0 are disjoit the P = P(E j ). E j (To see this, ote that there must be a N such that E 1,..., E F N ad the we ca use the additivity of P N.) Showig coutable subadditivity is harder. I fact, the followig stroger fact holds: suppose E 1, E 2,... F 0 are disjoit ad E = =1E F 0. The there is a N such that E j = for j > N. Oce we establish this, coutable additivity follows from fiite additivity. To establish this, we cosider Ω as the topological space {0, 1} {0, 1} with the product topology where we have give each {0, 1} the discrete topology (all four subsets are ope). The product topology is the smallest topology such that all the sets i F 0 are ope. Note also that all sets i F 0 are closed sice they are complemets of sets i F 0. It follows from Tychooff s Theorem that Ω is a compact topological space uder this topology. Suppose E 1, E 2,... are as above with E = =1E F 0. The E 1, E 2,... is a ope cover of the closed (ad hece compact) set E. Therefore there is a fiite subcover, E 1, E 2,..., E N. Sice the E 1, E 2,... are disjoit, this must imply that E j = for j > N. Let F be the smallest σ-algebra cotaiig F 0. The the Carathéodory Extesio Theorem tells us that P ca be exteded uiquely to a complete measure space (P, F, P) where F F. The astute reader will ote that the costructio we just did is exactly the same as the costructio of Lebesgue measure o [0, 1]. Here we deote a real umber x [0, 1] by its dyadic expasio x =.ω 1ω 2ω 3 = (There is a slight uisace with the fact that = , but this ca be hadled.) The σ-algebra F above correspods to the Borel subsets of [0, 1] ad the completio F correspods to the Lebesgue measurable sets. If the most complicated probability space we were iterested were the space above, the we could just use Lebesgue measure o [0, 1]. I fact, for almost all importat applicatios of probability, oe could choose the measure space to be [0, 1] with Lebesgue measure (see Exercise 3). However, this choice is ot always the most coveiet or atural. ω j 2 j. Cotiuig o the last remark, oe geerally does ot care what probability space oe is workig o. What oe observes are radom variables which are discussed i the ext sectio. 3

4 2 Radom Variables ad Expectatio Defiitio radom variable X is a measurable fuctio from a probability space (Ω, F, P) to the reals 1, i.e., it is a fuctio X : Ω (, ) such that for every Borel set B, Here we use the shorthad otatio X 1 (B) = {X B} F. {X B} = {ω Ω : X(ω) B}. If X is a radom variable, the for every Borel subset B of R, X 1 (B) F. We ca defie a fuctio o Borel sets by µ X (B) = P{X B} = P[X 1 (B)]. This fuctio is i fact a measure, ad (R, B, µ X ) is a probability space. The measure µ X is called the distributio of the radom variable. If µ X gives measure oe to a coutable set of reals, the X is called a discrete radom variable. If µ X gives zero measure to every sigleto set, ad hece to every coutable set, X is called a cotiuous radom variable. Every radom variable ca be writte as a sum of a discrete radom variable ad a cotiuous radom variable. ll radom variables defied o a discrete probability space are discrete. The distributio µ X is ofte give i terms of the distributio fuctio 2 defied by Note that F = F X satisfies the followig: lim x F (x) = 0. lim x F (x) = 1. F is a odecreasig fuctio. F is right cotiuous, i.e., for every x, F X (x) = P{X x} = µ X (, x]. F (x+) := lim ɛ 0 F (x + ɛ) = F (x). Coversely, ay F satisfyig the coditios above is the distributio fuctio of a radom variable. The distributio ca be obtaied from the distributio fuctio by settig µ X (, x] = F X (x), ad extedig uiquely to the Borel sets. For some cotiuous radom variables X, there is a fuctio f = f X : R [0, ) such that P{a X b} = b a f(x) dx. Such a fuctio, if it exists, is called the desity 3 of the radom variable. If the desity exists, the F (x) = x f(t) dt. 1 or, more geerally, to ay topological space 2 ofte called cumulative distributio fuctio (cdf) i elemetary courses 3 More precisely, it is the desity or Rado-Nikoydm derivative with respect to Lebesgue measure. I elemetary courses, the term probability desity fuctio (pdf) is ofte used. 4

5 If f is cotiuous at t, the the fudametal theorem of calculus implies that desity f satisfies f(x) = F (x). f(x) dx = 1. Coversely, ay oegative fuctio that itegrates to oe is the desity of a radom variable. Example If E is a evet, the idicator fuctio of E is the radom variable { 1, ω E, 1 E (ω) = 0, ω E. (The correspodig fuctio i aalysis is ofte called the characteristic fuctio ad deoted χ E. Probabilists ever use the term characteristic fuctio for the idicator fuctio because the term characteristic fuctio has aother meaig. The term idicator fuctio has o ambiguity.) Example Let (Ω, F, P) be the probability space for ifiite tossigs of a coi as i the previous sectio. Let X (ω 1, ω 2,...) = ω = { 1, if th flip heads, 0, if th flip tails. (2) S = X X = # heads o first flips. (3) The X 1, X 2,..., ad S 1, S 2,... are discrete radom variables. If F deotes the σ-algebra of evets that deped oly o the first flips, the S is also a radom variable o the probability space (Ω, F, P). However, S +1 is ot a radom variable o (Ω, F, P). Example Let µ be ay probability measure o (R, B). Cosider the trivial radom variable X = x, defied o the probability space (R, B, µ). The X is a radom variable ad µ X = µ. Hece every probability measure o R is the distributio of a radom variable. Example radom variable X has a ormal distributio with mea µ ad variace σ 2 if it has desity f(x) = 1 2πσ 2 e (x µ)2 /2σ 2, < x <. If µ = 0 ad σ 2 = 1, X is said to have a stadard ormal distributio. The distributio fuctio of the stadard ormal is ofte deoted Φ, Example If X is a radom variable ad Φ(x) = x 1 2π e t2 /2 dt. g : (R, B) R is a Borel measurable fuctio, the Y = g(x) is also a radom variable. Example Recall that the Cator fuctio is a cotiuous fuctio F : [0, 1] [0, 1] with F (0) = 0, F (1) = 1 ad such that F (x) = 0 for all x [0, 1]\ where deotes the middle thirds Cator set. Exted F to R by settig F (x) = 0 for x 0 ad F (x) = 1 for x 1. The F is a distributio fuctio. radom variable with this distributio fuctio is cotiuous, sice F is cotiuous. However, such a radom variable has o desity. 5

6 The term radom variable is a little misleadig but it stadard. It is perhaps easier to thik of a radom variable as a radom umber. For example, i the case of coi-flips we get a ifiite sequece of radom umbers correspodig to the results of the flips. Defiitio If X is a oegative radom variable, the expectatio of X, deoted E(X), is E(X) = X dp, where the itegral is the Lebesgue itegral. If X is a radom variable with E( X ) <, the we also defie the expectatio by E(X) = X dp. If X takes o positive ad egative values, ad E( X ) =, the expectatio is ot defied. Other terms used for expectatio are expected value ad mea (the term expectatio value ca be foud i the physics literature but it is ot good Eglish ad sometimes refers to itegrals with respect to measures that are ot probability measures). The letter µ is ofte used for mea. If X is a discrete radom variable takig o the values a 1, a 2, a 3,... we have the stadard formula from elemetary courses: provided the sum is absolutely coverget. E(X) = Lemma 2.1. Suppose X is a radom variable with distributio µ X. The, E(X) = x dµ X. a j P{X = a j }, (4) (Either side exists if ad oly if the other side exists.) Proof. First assume that X 0. If, k are positive itegers let { k, = ω : k 1 X(ω) < k }. For every <, cosider the discrete radom variable X takig values i {k/ : k Z}, R The, Hece But, X = k=1 k 1 k,. X 1 X X. E[X ] 1 E[X] E[X ]. [ E X 1 ] = k=1 k 1 P( k,), 6

7 ad By summig we get k E[X ] = P( k,), k=1 k 1 µ X[ k 1, k ) x dµ X k [ k 1, k ) µ X[ k 1, k ). [ E X 1 ] x dµ X E[X ]. [0, ) By lettig go to ifiity we get the result. The geeral case ca be doe by writig X = X + X. We omit the details. I particular, the expectatio of a radom variable depeds oly o its distributio ad ot o the probability space o which it is defied. If X has a desity f, the the measure µ X is the same as f(x) dx so we ca write (as i elemetary courses) E[X] = where agai the expectatio exists if ad oly if x f(x) dx, x f(x) dx <. Sice expectatio is defied usig the Lebesgue itegral, all results about the Lebesgue itegral (e.g., liearity, Mootoe Covergece Theorem, Fatou s Lemma, Domiated Covergece Theorem) also hold for expectatio. Exercise 2.2. Use Hölder s iequality to show that if X is a radom variable ad q 1, the E[ X q ] (E[X]) q. For which X does equality hold? Example Cosider the coi flippig radom variables (2) ad (3). Clearly Hece lso, E[X ] = = 1 2. E[S ] = E[X X ] = E[X 1 ] + + E[X ] = 2. E[X 1 + X X 1 ] = E[X 1 ] + E[X 1 ] + + E[X 1 ] = 2. Note that i the above example, X X ad X X 1 do ot have the same distributio, yet the liearity rule for expectatio show that they have the same expectatio. very powerful (although trivial!) tool i probability is the liearity of the expectatio eve for radom variables that are correlated. If X 1, X 2,... are oegative we ca eve take coutable sums, [ ] E X = E[X ]. =1 This ca be derived from the rule for fiite sums ad the mootoe covergece theorem by cosiderig the icreasig sequece of radom variables Y = X j. =1 7

8 Lemma 2.3. Suppose X is a radom variable with distributio µ X, ad g is a Borel measurable fuctio. The, E[g(X)] = g(x) dµ X. (Either side exists if ad oly if the other side exists.) Proof. ssume first that g is a oegative fuctio. The there exists a icreasig sequece of oegative simple fuctios with g approachig g. Note that g (X) is the a sequece of oegative simple radom variables approachig g(x). For the simple fuctios g it is immediate from the defiitio that E[g (X)] = g (x) dµ X. ad hece by the mootoe covergece theorem, E[g(X)] = R R R g(x) dµ X. The geeral case ca be doe by writig g = g + g ad g(x) = g + (X) g (X). If X has a desity f, the we get the followig formula foud i elemetary courses E[g(X)] = g(x) f(x) dx. Defiitio The variace of a radom variable X is defied by provided this expecatio exists. Note that Var(X) = E[(X E(X)) 2 ], Var(X) = E[(X E(X)) 2 ] = E[X 2 2XE(X) + (E(X)) 2 ] = E[X 2 ] (E(X)) 2. The variace is ofte deoted σ 2 = σ 2 (X) ad the square root of the variace, σ, is called the stadard deviatio There is a slightly differet use of the term stadard deviatio i statistics; see ay textbook o statistics for a discussio. Lemma 2.4. Let X be a radom variable. (Markov s Iequality) If a > 0, (Chebyshev s Iequality) If a > 0, P{ X a} E[ X ]. a P{ X E(X) a} Var(X) a 2. 8

9 Proof. Let X a = a1 { X a}. The X a X ad hece ap{ X a} = E[X a ] E[X]. This gives Markov s iequality. For Chebyshev s iequality, we apply Markov s iequality to the radom variable [X E(X)] 2 to get P{[X E(X)] 2 a 2 } E([X E(X)]2 ) a 2. Exercise 2.5 (Geeralized Chebyshev Iequality). Let f : [0, ) [0, ) be a odecreasig Borel fuctio ad let X be a oegative radom variable. The for all a > 0, 3 Idepedece P{X a} E[f(X)]. f(a) Throughout this sectio we will assume that there is a probability space (Ω,, P). ll σ-algebras will be sub-σ-algebras of, ad all radom variables will be defied o this space. Defiitio Two evets, B are called idepedet if P( B) = P()P(B). collectio of evets { α } is called idepedet if for every distict α1,..., α, P( α1 α ) = P( α1 ) P( α ). collectio of evets { α } is called pairwise idepedet if for each distict α1, α2, P( α1 α2 ) = P( α1 )P( α2 ). Clearly idepedece implies pairwise idepedece but the coverse is false. easy example ca be see by rollig two dice ad lettig 1 = { sum of rolls is 7 }, 2 = { first roll is 1 }, 3 = { secod roll is 6 }. The P( 1 ) = P( 2 ) = P( 3 ) = 1 6, P( 1 2 ) = P( 1 3 ) = P( 2 3 ) = 1 36, but P( ) = 1/36 P( 1 )P( 2 )P( 3 ). Defiitio fiite collectio of σ-algebras F 1,..., F is called idepedet if for ay 1 F 1,..., F, P( 1 ) = P( 1 ) P( ). ifiite collectio of σ-algebras is called idepedet if every fiite subcollectio is idepedet. Defiitio If X is a radom variable ad F is a σ-algebra, we say that X is F-measurable if X 1 (B) F for every Borel set B, i.e., if X is a radom variable o the probability space (Ω, F, P). We defie F(X) to be the smallest σ-algebra F for which X is F-measurable. 9

10 I fact, F(X) = {X 1 (B) : B Borel }. (Clearly F(X) cotais the right had side, ad we ca check that the right had side is a σ-algebra.) If {X α } is a collectio of radom variables, the F({X α }) is the smallest σ-algebra F such that each X α is F-measurable. Probabilists thik of a σ-algebra as iformatio. Roughly speakig, we have the iformatio i the σ-algebra F if for every evet E F we kow whether or ot the evet E has occured. radom variable X is F-measurable if we ca determie the value of X by kowig whether or ot E has occured for every F-measurable evet E. For example suppose we roll two dice ad X is the sum of the two dice. To determie the value of X it suffices to kow which of the followig evets occured: E j,1 = {first roll is j}, j = 1,... 6, E j,2 = {secod roll is j}, j = 1, More geerally, if X is ay radom variable, we ca determie X by kowig which of the evets E a := {X < a} have occured. Exercise 3.1. Suppose F 1 F 2 is a icreasig sequece of σ-algebras ad The F 0 is a algebra. F 0 = F. =1 Exercise 3.2. Suppose X is a radom variable ad g is a Borel measurable fuctio. Let Y = g(x). The F(Y ) F(X). The iclusio ca be a strict iclusio. The followig is a very useful way to check idepedece. Lemma 3.3. Suppose F 0 ad G 0 are two algebras of evets that are idepedet, i.e., P( B) = P() P(B) for F 0, B G 0. The F := σ(f 0 ) ad G := σ(g 0 ) are idepedet σ-algebras. Proof. Let B G 0 ad defie the measure µ B () = P( B), µ B () = P() P(B). Note that µ B, µ B are fiite measures ad they agree o F 0. Hece (usig Cartheodory extesio) they must agree o F. Hece P( B) = P() P(B) for F, B G 0. Now take F ad cosider the measures ν (B) = P( B), ν (B) = P() P(B). If X 1,..., X are radom variables, we ca cosider them as a radom vector (X 1,..., X ) ad hece as a radom variable takig values i R. The joit distributio of the radom variables is the measure µ = µ X1,...,X o Borel sets i R give by µ(b) = P{(X 1,..., X ) B}. Each radom variable X 1,..., X also has a distributio o R, µ Xi ; these are called the margial distributios. The joit distributio fuctio F = F X1,...,X is defied by F (t 1,..., t ) = P{X 1 t 1,..., X t } = µ[ (, t 1 ] (, t ] ]. 10

11 Defiitio Radom variables X 1, X 2,..., X are said to be idepedet if ay of these (equivalet) coditios hold: For all Borel sets B 1,..., B, P{X 1 B 1,..., X B } = P{X 1 B 1 } P{X 2 B 2 } P{X B }. The σ-algebras F(X 1 ),..., F(X ) are idepedet. µ = µ X1 µ X2 µ X. For all real t 1,..., t, F (t 1,..., t ) = F X1 (t 1 ) F X2 (t 2 ) F X (t ). The radom variables X 1,..., X have a joit desity (joit probability desity fuctio) f(x 1,..., x ) if for all Borel sets B R, P{(X 1,..., X ) B} = f(x 1,..., x ) dx 1 dx 2 dx. Radom variables with a joit desity are idepedet if ad oly if where f Xj (x j ) deotes the desity of X j. B f(x 1,..., x ) = f X1 (x 1 ) f X (x ), ifiite collectio of radom variables is called idepedet if each fiite subcollectio is idepedet. The followig follows immediately from the defiitio ad Exercise 3.2. Propositio 3.4. If X 1,..., X are idepedet radom variables ad g 1,..., g are Borel measurable fuctios, the g 1 (X 1 ),..., g (X ) are idepedet. Propositio 3.5. If X, Y are idepedet radom variables with E X <, E Y <, the E[ XY ] < ad E[XY ] = E[X] E[Y ]. (5) Proof. First cosider simple radom variables S = m a i 1 i, T = i=1 b j 1 Bj, where S is F(X) measurable ad T is F(Y ) measurable. The each i is idepedet of each B j ad (5) ca be derived immediately usig P( i B j ) = P( i ) P(B j ). Suppose X, Y are oegative radom variables. The there exists simple radom variables S, measurable with respect to F(X), that icrease to X ad simple radom variables T, measurable with respect to F(Y ), icreasig to Y. Note that S T icreases to XY. Hece by the mootoe covergece theorem, E[XY ] = lim E[S T ] = lim E[S ] E[T ] = E[X] E[Y ]. Fially for geeral X, Y, write X = X + X, Y + Y. Note that X +, X are idepedet of Y +, Y (Propostio 3.4) ad hece the result follows from liearity of the itegral. 11

12 The multiplicatio rule for expectatio ca also be viewed as a applicatio of Fubii s Theorem. Let µ, ν deote the distributios of X ad Y. If they are idepedet, the the distributio of (X, Y ) is µ ν. Hece [ ] E[XY ] = xy d(µ ν) = xy dµ(x) dν(y) = E[X]y dν(y) = E[X] E[Y ]. R 2 Radom variables X, Y that satisfy E[XY ] = E[X]E[Y ] are called orthogoal. The last propositio shows that idepedet itegrable radom variables are orthogoal. Exercise 3.6. Give a example of orthogoal radom variables that are ot idepedet. Propositio 3.7. Suppose X 1,..., X are radom variables with fiite variace. If X 1,..., X are pairwise orthogoal, the Var[X X ] = Var[X 1 ] + + Var[X ]. Proof. Without loss of geerality we may assume that all the radom variables have zero mea (otherwise, we subtract the mea without chagig ay of the variaces). The Var = E ( X j ) 2 X j = = But by orthogoality, if i j, E[X i X j ] = E[X i ] E[X j ] = 0 E[Xj 2 ] + i j Var(X j ) + i j E[X i X j ] E[X i X j ]. This propositio is most ofte used whe the radom variables are idepedet. It is ot true that Var(X + Y ) = Var(X) + Var(Y ) for all radom variables. Two extreme cases are Var(X + X) = Var(2X) = 4Var(X) ad Var(X X) = 0. The last propositio is really just a geeralizatio of the Pythagorea Theorem. atural framework for discussig radom variables with zero mea ad fiite variace is the Hilbert space L 2 with the ier product X, Y = E[XY ]. I this case, X, Y are orthogoal iff X, Y = 0. Exercise 3.8. Let (Ω, F, P) be the uit iterval with Lebesgue measure o the Borel subsets. Follow this outlie to show that oe ca fid idepedet radom variables X 1, X 2, X 3,... defied o (Ω, F, P), each ormal mea zero, variace 1. Let use write x [0, 1] i its dyadic expasio x =.ω 1 ω 2 ω 3, ω j {0, 1}. as i Sectio 1. (i) Show that the radom variable X(x) = x is a uiform radom variable o [0, 1], i.e., has desity f(x) = 1 [0,1]. (ii) Use a diagoalizatio process to defie U 1, U 2,..., idepedet radom variables each with a uiform distributio o [0, 1]. (iii) Show that X j = Φ 1 (U j ) has a ormal distributio with mea zero, variace oe. Here Φ is the ormal distributio fuctio. 12

13 4 Sums of idepedet radom variables Defiitio sequece of radom variables Y coverges to Y i probability if for every ɛ > 0, lim P{ Y Y > ɛ} = 0. sequece of radom variables Y coverges to Y almost surely (a.s.) or with probability oe (w.p.1) if there is a evet E with P (E) = 1 ad such that for ω E, Y (ω) Y (ω). I other words, i probability is the same as i measure ad almost surely is the aalogue of almost everywhere. The term almost surely is ot very accurate (somethig that happes with probability oe happes surely!) so may people prefer to say with probability oe. Exercise 4.1. Suppose Y 1, Y 2,... is a sequece of radom variables with E[Y ] µ ad Var[Y ] 0. Show that Y µ i probability. Give a example to show that it is ot ecessarily true that Y µ w.p.1. Much of the focus of classical probability is o sums of idepedet radom variables. Let (Ω,, P) be a probability space ad assume that X 1, X 2,... are idepedet radom variables o this probability space. We say that the radom variables are idetically distributed if they all have the same distributio, ad we write i.i.d. for idepedet ad idetically distributed. If X 1, X 2,... are idepedet (or, more geerally, orthogoal) radom variables each with mea µ ad variace σ 2, the [ ] X1 + + X E = 1 [E(X 1) + + E(X )] = µ, [ ] X1 + + X Var = 1 2 [Var(X 1) + + Var(X )] = σ2. The equality for the expectatios does ot require the radom variables to be orthogoal, but the expressio for the variace requires it. The mea of the weighted average is the same as the mea of oe of the radom variables, but the stadard deviatio of the weighted average is σ/ which teds to zero. Theorem 4.2 (Weak Law of Large Numbers). If X 1, X 2,... are idepedet radom variables such that E[X ] = µ ad Var[X ] σ 2 for each, the i probability. X X µ Proof. We have We ow use Exercise 4.1. [ ] [ ] X1 + + X X1 + + X E = µ, Var σ

14 The Strog Law of Large Numbers (SLLN) is a statemet about almost sure covergece of the weighted sums. We will prove a versio of the strog law here. We start with a easy, but very useful lemma. Recall that the limsup of evets is defied by lim sup = =1 m= Probabilists ofte write { i.o.} for lim sup ; here i.o. stads for ifiitely ofte. Lemma 4.3 (Borel-Catelli Lemma). Suppose 1, 2,... is a sequece of evets. (i) If P( ) <, the P{ i.o.} = P(lim sup ) = 0. m. (ii) If P( ) =, ad the 1, 2,... are idepedet, the P{ i.o.} = P(lim sup ) = 1. The coclusio i (ii) is ot ecessarily true for depedet evets. For example, if is a evet of probability 1/2, ad 1 = 2 = =, the lim sup = which has probability 1/2. Proof. (i) If P( ) <, P(lim sup ) = lim P ( m= ) lim m= P( m ) = 0. (ii) ssume P( ) = ad 1, 2,... are idepedet. We will show that [ ] P[(lim sup ) c ] = P = 0. To show this it suffices to show that for each ( P m= c m =1 m= ) = 0, c m ad for this we eed to show for each, lim P M ( M m= c m ) = 0. But by idepedece, ( M P ) c m = m= m= { M [1 P( m )] exp } M P( m ) 0. =m 14

15 Theorem 4.4 (Strog Law of Large Numbers). Let X 1, X 2,... be idepedet radom variables each with mea µ. Suppose there exists a M < such that E[X 4 ] M for each (this holds, for example, whe the sequece is i.i.d. ad E[X 4 1 ] < ). The w.p.1, X 1 + X X µ. Proof. Without loss of geerality we may assume that µ = 0 for otherwise we ca cosider the radom variables X 1 µ, X 2 µ,.... Our first goal is to show that To see this, we first ote that if i {j, k, l}, E[(X 1 + X X ) 4 ] 3M 2. (6) E[X i X j X k X l ] = E[X i ] E[X j X k X l ] = 0. (7) Whe we expad (X 1 + X X ) 4 ad the use the sum rule for expectatio we get 4 terms. We get terms of the form E[Xi 4] ad 3( 1) terms of the form E[X2 i X2 j ] with i j. There are also terms of the form E[X i Xj 3], E[X ix j Xk 2] ad E[X ix j X k X l ] for distict i, j, k, l; all of these expectatios are zero by (7) so we will ot bother to cout the exact umber. We kow that E[Xi 4 ] M. lso (see Exercise 2.2), if i j, E[X 2 i X 2 j ] E[X 2 i ] E[X 2 j ] ( E[X 4 i ] E[X 4 j ] ) 1/2 M. Sice the sum cosists of at most 3 2 terms each bouded by M, we get (6). Let be the evet { X X = 1 } = { X 1 + X X 7/8 }. 1/8 By the geeralized Chebyshev iequality (Exercise 2.5), P( ) E[(X X ) 4 ] ( 7/8 ) 4 3M 3/2. I particular, P( ) <. Hece by the Borel-Catelli Lemma, P(lim sup ) = 0. Suppose for some ω, (X 1 (ω) + + X (ω))/ does ot coverge to zero. The there is a ɛ > 0 such that X 1 (ω) + X 2 (ω) + + X (ω) ɛ, for ifiitely may values of. This implies that ω lim sup. Hece the set of ω at which we do ot have covergece has probability zero. The strog law of large umbers holds i greater geerality tha uder the coditios give here. I fact, if X 1, X 2,... is ay i.i.d. sequece of radom variables with E(X 1) = µ, the the strog law holds X X µ, w.p.1. We will ot prove this here. However, as the ext exercise shows, the strog law does ot hold for all idepedet sequeces of radom variables with the same mea. 15

16 Exercise 4.5. Let X 1, X 2,... be idepedet radom variables. Suppose that Show that that E[X ] = 1 for each ad that w.p.1, P{X = 2 } = 1 P{X = 0} = 2. X 1 + X X 0. Verify that the hypotheses of Theorem 4.4 do ot hold i this case. Exercise 4.6. a. Let X be a radom variable. Show that E[ X ] < if ad oly if P{ X } <. =1 b. Let X 1, X 2,... be i.i.d. radom variables. Show that if ad oly if E[ X 1 ] <. P{ X i.o.} = 0 We ow establish a result kow as the Kolmogorov Zero-Oe Law. ssume X 1, X 2,... are idepedet. Let F be the σ-algebra geerated by X 1,..., X, ad let G be the σ-algebra geerated by X +1, X +2,.... Note that F ad G are idepedet σ-algebras, ad F 1 F 2, G 1 G 2. Let F be the σ-algebra geerated by X 1, X 2,..., i.e., the smallest σ-algebra cotaiig the algebra The tail σ-algebra T is defied by F 0 := T = F. =1 G. (The itersectio of σ-algebras is a σ-algebra, so T is a σ-algebra.) example of a evet i T is { } X X m lim = 0. m m =1 It is easy to see that this evet is i G for each sice { } { X X m lim = 0 = m m lim m X X m m } = 0. We write for symmetric differece, B = ( \ B) (B \ ). The ext lemma says roughly that F 0 is dese i F. Lemma 4.7. Suppose F 0 is a algebra of evets ad F = σ(f 0 ). If F, the for every ɛ > 0 there is a 0 F 0 with P( 0 ) < ɛ. 16

17 Proof. Let B be the collectio of all evets such that for every ɛ > 0 there is a 0 F 0 such that P( 0 ) < ɛ. Trivially, F 0 B. We will show that B is a σ-algebra ad this will imply that F B. Obviously, Ω B. Suppose B ad let ɛ > 0. Fid 0 F 0 such that P( 0 ) < ɛ. The c 0 F 0 ad P( c c 0) = P( 0 ) < ɛ. Hece c B. Suppose 1, 2,... B ad let = j. Let ɛ > 0, ad fid such that P P[] ɛ 2. For j = 1,...,, let j,0 F 0 such that j P[ j j,0 ] ɛ 2 j 1. Let 0 = 1,0,0 ad ote that 0 j j,0 \ ad hece P( 0 ) < ɛ ad B. Theorem 4.8 (Kolmogorov Zero-Oe Law). If T the P() = 0 or P() = 1. Proof. Let T ad let ɛ > 0. Fid a set 0 F 0 with P( 0 ) < ɛ. The 0 F for some. Sice T G ad F is idepedet of G, ad 0 are idepedet. Therefore P( 0 ) = P()P( 0 ). Note that P( 0 ) P() P( 0 ) P() ɛ. Lettig ɛ 0, we get P() = P()P(). This implies P() = 0 or P() = 1. s a example, suppose X 1, X 2,... are idepedet radom variables. The the probability of the evet { } X 1 (ω) + + X (ω) ω : lim = 0 is zero or oe. 5 Cetral Limit Theorem I this sectio, we will use some kowledge of the Fourier trasform that we review here. If g : R R, the Fourier trasform of g is defied by ĝ(y) = e ixy g(x) dx. We will call g a Schwartz fuctio it is C ad all of its derivatives decay at ± faster tha every polyomial. If g is Schwartz, the ĝ is Schwartz. (Roughly speakig, the Fourier trasform seds smooth fuctios to bouded fuctios ad bouded fuctios to smooth fuctios. Very smooth ad very bouded fuctios are set to very smooth ad very bouded fuctios.) The iversio formula g(x) = 1 e ixy ĝ(y) dy (8) 2π j, 17

18 holds for Schwartz fuctios. straightforward computatio shows that We also eed f(x) ĝ(x) dx = ê x2 /2 = 2πe y2 /2. = = [ f(x) [ g(y) ˆf(y) g(y) dy. g(y) e ixy dy ] f(x) e ixy dx This is valid provided that the iterchage of itegrals ca be justified; i particular, it holds for Schwartz f, g. The characteristic fuctio is a versio of the Fourier trasform. Defiitio The characteristic fuctio of a a radom variable X is the fuctio φ = φ X : R C defied by φ(t) = E[e ixt ]. Properties of Characteristic Fuctios ] dy φ(0) = 1 ad for all t φ(t) = E[e ixt ] E [ e ixt ] = 1. φ is a cotiuous fuctio of t. To see this ote that the collectio of radom variables Y s = e isx are domiated by the radom variable 1 which has fiite expectatio. Hece by the domiated covergece theorem, [ lim φ(s) = lim s t s t E[eisX ] = E lim eisx] = φ(t). s t I fact, it is uiformly cotiuous (Exercise 5.1). If Y = ax + b where a, b are costats, the φ Y (t) = E[e i(ax+b)t ] = e ibt E[e ix(at) ] = e ibt φ X (at). The fuctio M X (t) = e tx is ofte called the momet geeratig fuctio of X. Ulike the characteristic fuctio, the momet geeratig fuctio does ot always exist for all values of t. Whe it does exist, however, we ca use the formal relatio φ(t) = M(it). If a radom variable X has a desity f the φ X (t) = e itx f(x) dx = ˆf( t). If two radom variables have the same characteristic fuctio, the they have the same distributio. This fact is ot obvious this will follow from Propositio 5.6 which gives a iversio formula for the characteristic fuctio similar to (8). If X 1,..., X are idepedet radom variables, the φ X1+ +X (t) = E[e it(x1+ +X) ] = E[e itx1 ] E[e itx2 ] E[e tx ] = φ X1 (t) φ X2 (t) φ X (t). 18

19 If X is a ormal radom variable, mea zero, variace 1, the by completig the square i the expoetial oe ca compute φ(t) = e ixt 1 e x2 /2 dx = e t2 /2. 2π If Y is ormal mea µ, variace σ 2, the Y has the same distributio as σx + µ, hece φ Y (t) = φ σx+µ (t) = e iµt φ X (σt) = e iµt e σ2 t 2 /2. Exercise 5.1. Show that φ is a uiformly cotiuous fuctio of t. Propositio 5.2. Suppose X is a radom variable with characteristic fuctio φ. Suppose E[ X ] <. The φ is cotiuously differetiable ad φ (0) = i E(X). Proof. Note that for real θ e iθ 1 θ. This ca be checked geometrically or by otig that θ e iθ 1 = ie is ds We write the differece quotiet 0 θ 0 ie is ds = θ. φ(t + δ) φ(t) δ = e i(t+δ)x e itx δ dµ(x). Note that e i(t+δ)x e itx δ δx x. δ Sayig E[ X ] < is equivalet to sayig that x is itegrable with respect to the measure µ. By the domiated covergece theorem, e i(t+δ)x e itx lim dµ(x) = δ 0 δ = [ lim δ 0 e i(t+δ)x e itx ] dµ(x). δ ixe itx dµ(x). Hece, ad, i particular, φ (t) = ixe itx dµ(x), φ (0) = i E[X]. The followig propositio ca be proved i the same way. Propositio 5.3. Suppose X is a radom variable with characteristic fuctio φ. Suppose E[ X k ] < for some positive iteger k. The φ has k cotiuous derivatives ad φ (j) (t) = i j E[X j ], j = 0, 1..., k. (9) 19

20 Formally, (9) ca be derived by writig E[e ixt ] = E (ix) j t j = j! j=0 j=0 i j E(X j ) t j, j! differetiatig both sides, ad evaluatig at t = 0. Propositio 5.3 tells us that the existece of E[ X k ] allows oe to justify this rigorously up to the kth derivative. Propositio 5.4. Let X 1, X 2,... be idepedet, idetically distributed radom variables with mea µ ad variace σ 2 ad characteristic fuctio φ. Let Z = X X µ, σ2 ad let φ be the characteristic fuctio of Z. The for each t, lim φ (t) = e t2 /2. Recall that e t2 /2 is the characteristic fuctio of a ormal mea zero, variace 1 radom variable. Proof. Without loss of geerality we will assume µ = 0, σ 2 = 1 for otherwise we ca cosider Y j = (X j µ)/σ. By Propositio 5.3, we have φ(t) = 1 + φ (0) t φ (0) t 2 + ɛ t t 2 = 1 t2 2 + ɛ t t 2, where ɛ t 0 as t 0. Note that φ (t) = [ ( )] t φ, ad hece for fixed t, [ ( )] t lim φ (t) = lim φ = lim [1 t2 2 + δ ], where δ 0 as. Oce is sufficietly large that δ (t 2 /2) < we ca take logarithms ad expad i a Taylor series, log [1 t2 2 + δ ] = t2 2 + ρ, where ρ 0. (Sice φ ca take o complex values, we eed to take a complex logarithm with log 1 = 0, but there is o problem provided that δ (t 2 /2) <.) Therefore lim log φ (t) = t2 2. Theorem 5.5 (Cetral Limit Theorem). Let X 1, X 2,... be idepedet, idetically distributed radom variables with mea µ ad fiite variace. If < a < b <, the { lim P a X } 1 + X X µ b σ 1 b = e x2 /2 dx. a 2π 20

21 Proof. Let φ be the characteristic fuctio of 1/2 σ 1 (X X µ). We have already show that for each t, φ (t) e t2 /2. It suffices to show that if µ is ay sequece of distributios such that their characteristic fuctios φ coverge poitwise to e t2 /2 the for every a < b, where lim µ [a, b] = Φ(b) Φ(a), Φ(x) = x 1 2π e y2 /2 dy. Fix a, b ad let ɛ > 0. We ca fid a C fuctio g = g a,b,ɛ such that We will show that Sice for ay distributio µ, 0 g(x) 1, < x <, g(x) = 1, a x b, g(x) = 0, lim g(x) dµ (x) = µ[a, b] x / [a ɛ, b + ɛ]. 1 g(x) e x2 /2 dx. (10) 2π g(x) dµ(x) µ[a ɛ, b + ɛ], we the get the theorem by lettig ɛ 0. Sice g is a C fuctio with compact support, it is a Schwartz fuctio. Let ĝ(y) = e ixy g(x) dx, be the Fourier trasform. The, usig the iversio formula (8), [ 1 ] g(x) dµ (x) = e ixy ĝ(y) dy dµ (x) 2π = 1 [ ] e ixy dµ (x) ĝ(y) dy 2π = 1 φ (y)ĝ(y) dy. 2π Sice φ ĝ ĝ ad ĝ is L 1, we ca use the domiated covergece theorem to coclude lim g(x) dµ (x) = 1 e y2 /2ĝ(y) dy. 2π Recallig that ê x2 /2 = 2πe y2 /2, ad that ˆfg = fĝ, we get (10). The ext propositio establishes the uiqueess of characteristic fuctios by givig a iversio formula. Note that i order to specify a distributio µ, it suffices to give µ[a, b] at all a, b with µ{a, b} = 0. 21

22 Propositio 5.6. Let X be a radom variable with distributio µ, distributio fuctio F, ad characteristic fuctio φ. The for every a < b such that F is cotiuous at a ad b, 1 T µ[a, b] = F (b) F (a) = lim T 2π T e iya e iyb φ(y) dy. (11) iy Note that it is temptig to write the right had side of (11) as 1 2π e iya e iyb φ(y) dy, (12) iy but this itegrad is ot ecessarily itegrable o (, ). We ca write (12) if we iterpret this as the improper Riema itegral ad ot as the Lebesgue itegral o (, ). Proof. We first fix T <. Note that e iya e iyb iy φ(y) (b a), (13) ad hece the itegral o the right-had side of (11) is well defied. The boud (13) allows us to use Fubii s Theorem: 1 T e iya e iyb φ(y) dy = 1 T e iya e iyb [ ] e iyx dµ(x) dy 2π T iy 2π T iy = 1 [ ] T e iy(x a) e iy(x b) dy dµ(x) 2π iy It is easy to check that for ay real c, T T T e icx c T ix dx = 2 si x 0 x dx. We will eed the fact that There are two relevat cases. T si x lim T 0 x dx = π 2. If x a ad x b have the same sig (i.e., x < a or x > b), If x a > 0 ad x b < 0, T e iy(x a) e iy(x b) lim dy = 0. T T iy T e iy(x a) e iy(x b) T si x lim dy = 4 lim dx = 2π. T T iy T 0 x 22

23 We do ot eed to worry about what happes at x = a or x = b sice µ gives zero measure to these poits. Sice the fuctio T e iy(x a) e iy(x b) g T (a, b, x) = dy iy is uiformly bouded i T, a, b, x, we ca use domiated covergece theorem to coclude that [ ] T e iy(x a) e iy(x b) lim dy dµ(x) = 2π 1 (a,b) dµ(x) = 2πµ[a, b]. T T iy T Therefore, lim T 1 T e iya e iyb φ(y) dy = µ[a, b]. 2π T iy Exercise 5.7. radom variable X has a Poisso distributio with parameter λ > 0 if X takes o oly oegative iteger values ad P{X = k} = e λ λk, k = 0, 1, 2,.... k! I this problem, suppose that X, Y are idepedet Poisso radom variables o a probability space with parameters λ X, λ Y, respectively. (i) Fid E[X], Var[X], E[Y ], Var[Y ]. (ii) Show that X + Y has a Poisso distributio with parameter λ X + λ Y by computig directly P{X + Y = k}. (iii) Fid the characteristic fuctio for X, Y ad use this to fid a alterative deriviatio of (ii). (iv) Suppose that for each iteger > λ we have idepedet radom variables Z 1,, Z 2,,..., Z, with Fid P{Z j, = 1} = 1 P{Z j, = 0} = λ. lim P{Z 1, + Z 2, + + Z, = k}. Exercise 5.8. radom variable X is ifiitely divisible if for each positive iteger, we ca fid i.i.d. radom variables X 1,..., X such that X X have the same distributio as X. (i) Show that Poisso radom variables ad ormal radom variables are ifiitely divisible. (ii) Suppose that X has a uiform distributio o [0, 1]. Show that X is ot ifiitely divisible. (iii) Fid all ifiitely divisible distributios that take o oly a fiite umber of values. 6 Coditioal Expectatio 6.1 Defiitio ad properties Suppose X is a radom variable with E[ X ] <. We ca thik of E[X] as the best guess for the radom variable give o iformatio about the outcome of the radom experimet which produces the umber X. Coditioal expectatio deals with the case where oe has some, but ot all, iformatio about a evet. Cosider the example of flippig cois, i.e., the probability space Ω = {(ω 1, ω 2,...) : ω j = 0 or 1}. 23

24 Let F be the σ-algebra of all evets that deped o the first flips of the cois. Itutitively we thik of the σ-algebra F as the iformatio cotaied i viewig the first flips. Let X (ω) = ω be the idicator fuctio of the evet the th flip is heads. Let S = X X be the total umber of heads i the first flips. Clearly E[S ] = /2. Suppose we are iterested i S 3 but we get to see the flip of the first coi. The our best guess for S 3 depeds o the value of the first flip. Usig the otatio of elemetary probability courses we ca write E[S 3 X 1 = 1] = 2, E[S 3 X 1 = 0] = 1. More formally we ca write E[S 3 F 1 ] for the radom variable that is measurable with respect to the σ-algebra F 1 that takes o the value 2 o the evet {X 1 = 1} ad 1 o the evet {X 1 = 0}. Let (Ω, F, P) be a probability space ad G F aother σ-algebra. Let X be a itegrable radom variable. Let us first cosider the case whe G is a fiite σ-algebra. I this case there exist disjoit, oempty evets 1, 2,..., k that geerate G i the sese that G cosists precisely of those sets obtaied from uios of the evets 1,..., k. If P( j ) > 0, we defie E[X G] o j to be the average of X o j, E[X G] = j X dp = E[X1 j ] P( j ) P( j ), o j. (14) If P( j ) = 0, this defiitio does ot make sese, ad we just let E[X G] take o a arbitrary value, say c j, o j. Note that E[X G] is a G-measurable radom variable ad that its defiitio is uique except perhaps o a set of total probability zero. Propositio 6.1. If G is a fiite σ-algebra ad E[X G] is defied as i (14), the for every evet G, E[X G] dp = X dp. Proof. Let G be geerated by 1,..., k ad assume that = 1 l. The E[X G] dp = = = = l E[X G] dp j l P( j ) j X dp P( j ) l j X dp X dp. This propositio gives us a clue as how to defie E[X G] for ifiite G. Propositio 6.2. Suppose X is a itegrable radom variable o (Ω, F, P), ad G F is a σ-algebra. The there exists a G-measurable radom variable Y such that if G, Y dp = X dp. (15) Moreover, if Ỹ is aother G-measurable radom variable satisfyig (15), the Ỹ = Y a.s. 24

25 Proof. The uiqueess is immediate sice if Y, Ỹ are G-measurable radom variables satisfyig (15), the (Y Ỹ ) dp = 0. for all G ad hece Y Ỹ = 0 almost surely. To show existece cosider the probability space (Ω, G, P). Cosider the (siged) measure o (Ω, G), ν() = X dp. Note that ν << P. Hece by the Rado-Nikodym theorem there is a G-measurable radom variable Y with ν() = Y dp. Usig the propositio, we ca defie the coditioal expectatio E[X G] to be the radom variable Y i the propositio. It is characterized by the properties E[X G] is G-measurable. For every G, E[X G] dp = X dp. (16) The radom variable E[X G] is oly defied up to a set of probability zero. Exercise 6.3. This exercise gives a alterative proof of the existece of E[X G] usig Hilbert spaces (but ot usig Rado-Nikodym theorem). Let H deote the real Hilbert space of all square-itegrable radom variables o (Ω, F, P). 1. Show that the space of G-measurable, square-itegrable radom varaibles is a closed subspace of H. 2. For square-itegrable X, defie E(X G) to be the Hilbert space projectio oto the space of G- measurable radom variables. Show that E(X G) satisfies (16). 3. Show that E(X G) exists for itegrable X. (Hit: if X 0, we ca approximate X from below by simple radom variables X. Defie For geeral X, write X = X + X.) Properties of Coditioal Expectatio E(X G) = lim E(X G). E[E[X G]] = E[X]. Proof. This is a special case of (16) where = Ω. If a, b R, E[a X + b Y G] = a E[X G] + b E[Y G]. (This equality ad others below are really equalities up a evet of probability zero.) 25

26 Proof. Note that a E[X G] + b E[Y G] is G-measurable. lso, if G, (a E[X G] + b E[Y G]) dp = a E[X G] dp + b E[Y G] dp P = a X dp + b Y dp = (ax + by ) dp. The result follows by uiqueess of the coditioal expectatio. If X is G-measurable, the E[X G] = X. If G is idepedet of X, the E[X G] = E[X]. (G is idepedet of X if G is idepedet of the σ-algebra geerated by X. Equivaletly, they are idepedet if for every G, 1 is idepedet of X.) Proof. The costat radom variable E[X] is certaily G-measurable. lso, if G, E[X] dp = P() E[X] = E[1 ] E[X] = E[1 X] = X dp. If H G F are σ-algebras, the E[X H] = E[E[X G] H]. (17) Proof. Clearly, E[E[X G] H] is H-measurable. If H, the G, ad hece E[E[X G] H] dp = E[X G] dp = X dp. If Y is G-measurable, the E[XY G] = Y E[X G]. (18) Proof. Clearly Y E[X G] is G-measurable. We eed to show that for every G, Y E[X G] dp = Y X dp. (19) We will first cosider the case where X, Y 0. Note that this implies that E[X G] 0 a.s., sice E[X G] dp = X dp 0 for every G. Fid simple G-measurable radom variables 0 Y 1 Y 2 such that Y Y. The Y X coverges mootoically to Y X ad Y E[X G] coverges mootoically to Y E[X G]. If G ad Z = c j 1 Bj, B j G, 26

27 is a G-measurable simple radom variable, Z E[X G] dp = = = = = c j 1 Bj E[X G] dp c j E[X G] dp B j c j X dp B j [ c j 1 Bj ]X dp ZX dp. Hece (19) holds for simple oegative Y ad oegative X, ad hece by the mootoe covergece theorem it holds for all oegative Y ad oegative X. For geeral Y, X, we write Y = Y + Y, X = X + X ad use liearity of expectatio ad coditioal expectatio. If X, Y are radom variables ad X is itegrable we write E[X Y ] for E[X σ(y )], Here σ(y ) deotes the σ-algebra geerated by Y. Ituitively, E[X Y ] is the best guess of X give the value of Y. Note that E[X Y ] ca be writte as φ(y ) for some fuctio φ, i.e., for each possible value of Y there is a value of E[X Y ]. Elemetary texts ofte write this as E[X Y = y] to idicate that for each value of y there is a value of E[X Y ]. Similarly, we ca defie E[X Y 1,... Y ]. Example Cosider the coi-tossig radom variables discussed earlier i the sectio. What is E[X 1 S ]? Symmetry tells us that E[X 1 S ] = E[X 2 S ] = = E[X S ]. lso, liearity gives Hece, E[X 1 S ] + E[X 2 S ] + + E[X S ] = E[X X S ] = E[S S ] = S. E[X S ] = S. Note that E[X 1 S ] is ot the same thig at E[X 1 X 1,..., X ] = E[X 1 F ] which would be just X 1. The first coditioig is give oly the umber of heads o the first flips while the secod coditioig is give all the outcomes of the flips. Example Suppose X, Y are two radom variables with joit desity fuctio f(x, y). probability space to be R 2, let F be the Borel sets, ad let P be the measure P() = f(x, y) dx dy. Let us take our The the radom variables X(x, y) = x ad Y (x, y) = y have the joit desity f(x, y). The E[Y X] is a radom variable with desity f(x, y) f(y X) =. f(x, z) dz 27

28 This formula is ot valid whe the deomiator is zero. However, the set of x such that f(x, z) dz = 0, is a ull set uder the measure P ad hece the desity f(y X) is well defied almost surely. This desity is ofte called the coditioal desity of Y give X. 6.2 Martigales martigale is a model of a fair game. Suppose we have a probability space (Ω, F, P) ad a icreasig sequece of σ-algebras F 0 F 1 Such a sequece is called a filtratio, ad we thik of F as represetig the amout of kowledge we have at time. The iclusio of the σ-algebras imply that we do ot lose ay kowledge that we have gaied. Defiitio martigale (with respect to the filtratio {F }) is a sequece of itegrable radom variables M such that M is F -measurable for each ad for all < m, We ote that to prove (20), it suffices to show that for all, E[M m F ] = M. (20) E[M +1 F ] = M. This follows from successive applicatios of (17). If the filtratio is ot specified explicitly, the oe assumes that oe uses the filtratio geerated by M, i.e., F is the σ-algebra geerated by M 1,..., M. Examples We list some examples. We leave it as exercises to show that they are martigales. If X 1, X 2,... are idepedet, mea-zero radom variables ad S = X X, the S is a martigale with respect to the filtratio geerated by X 1, X 2,.... If X 1, X 2,... are as above, E[X 2 j ] = σ2 j < for each j, the M = S 2 is a martigale with respect to the filtratio geerated by X 1, X 2,.... Suppose X 1, X 2,... are as i the first example ad F is the filtratio geerated by X 1, X 2,.... We choose F 0 to be the trivial σ-algebra. For each, let B be a bouded radom variable that is measurable with respect to F 1. We thik of B as beig the bet o the game X ; we ca see the results of X 1,..., X 1 before choosig a bet but oe caot see X. The total fortue by time is give by W 0 = 0 ad W = B j X j. The W is a martigale. σ 2 j 28

29 Suppose X 1, X 2,... are idepedet coi-flippig radom variables, i.e., with distributio P{X j = 1} = P{X j = 1} = 1 2. particular case of the last example is where B 0 = 1 ad B = 2 if X 1 = X 2 = = X 1 = 1 ad B = 0 otherwise. I other words, oe keeps doublig oe s bet util oe wis. Note that P{W = 1} = 1 2, P{W = 1 2 } = 2. (21) This bettig strategy is sometimes called the martigale bettig strategy. We wat to prove a versio of the optioal stoppig theorem which is a mathematical way of sayig that oe caot beat a fair game. The last example above shows that we eed to be careful if I play a fair game ad keep doublig my bet, I will evetually wi ad come out ahead (provided that I always have eough moey to put dow the bet whe I lose!) Defiitio stoppig time with respect to a filtratio {F } is a radom variable takig values i {0, 1, 2...} { } such that for each, the evet {T } F. I other words, we oly eed the iformatio at time to kow whether we have stopped at time. Examples of stoppig times are: Costat radom variables are stoppig times. If M is a martigale with respect to {F } ad V R is a Borel set, the is a stoppig time. T = mi{j : M j V } If T 1, T 2 are stoppig times, the so are T 1 T 2 ad T 1 T 2. I particular, if T is a stoppig time the so are T = T. Note that T 0 T 1... ad T = lim T. Propositio 6.4. If M is a martigale ad T is a stoppig time with respect to {F }, the Y = M T is a martigale with respect to {F }. I particular, E[Y ] = E[Y 0 ] = E[M 0 ]. Proof. Note that 1 Y = M 1{T } + M j 1{T = j}. From this it is easy to see that Y is F -measurable ad E[ Y ] <. lso ote that the evet 1{T + 1} is the complemet of the evet 1{T } ad hece is F -measurable. Therefore, (18) implies Therefore, usig liearity, Examples E(M +1 1{T + 1} F ) = 1{T + 1} E(M +1 F ) = 1{T + 1} M. j=0 E(Y +1 F ) = 1{T + 1} M + M j 1{T = j} j=0 1 = 1{T } M + M j 1{T = j} = Y. j=0 29

30 Let X 1, X 2,... be idepedet, each with P{X j = 1} = P{X j = 1} = 1/2. Let M = 1+X 1 + +X, N a iteger greater tha 1, ad T = mi{j : M = 0 or M = N}. T is the first time that a radom walker startig at 1 reaches 0 or N. The M T is a martigale. I particular, 1 = E[M 0 ] = E[M T ]. It is easy to check that w.p.1 T <. Sice M T is a sequece of bouded radom variables approachig M T, the domiated covergece theorem implies Note that ad hece we have computed a probability, E[M T ] = 1. E[M T ] = N P{M T = N}, P{M T = N} = 1 N. Let X 1, X 2,... be as above ad let S = X X. We have oted that M = S 2 is a martigale. Let N be a positive iteger ad The E(M T ) is a martigale ad hece With probability oe, We would like to say that T = mi{ : S = N}. 0 = E[M 0 ] = E[M T ] = E[S 2 T ( T ) 2 ]. lim M T = M T. E[M T ] = lim E[M T ] = 0. (22) Ideed, we ca justify this usig the domiated covergece. It is ot difficult to show (Exercise 6.5) that E[T ] <, ad hece M T is domiated by the itegrable radom variable N 2 +T. This justifies (22). Note that this implies that E[T ] = N 2. Cosider the martigale bettig strategy as above ad let T = mi{ : X = 1} = mi{x : W = 1}. The W T is a martigale, ad hece E[W T ] = E[W 0 ] = 0. This could be checked easily usig (21). Note that P{T < } = 1 ad lim T W T = W T = 1. Hece, 1 = E[W T ] lim T E[W T ] = 0. I this case there is o itegrable radom variable Y such that W T N Y for all N. 30