Fundamentals of Probability, Hyunseung Kang

Definitions

Let $\Omega$ be a set that contains all possible outcomes of an experiment. A subset of $\Omega$ is an event, while a single point $\omega \in \Omega$ is a sample point. Next, we define a sigma field, which is a collection of events that is closed under complements and closed under countable unions.

Definition: A sigma field $\mathcal{F}$ of a set $\Omega$ is a collection of subsets of $\Omega$ where (i) $\Omega \in \mathcal{F}$, (ii) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, and (iii) $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ for any $A_1, A_2, \ldots \in \mathcal{F}$.

$\mathcal{F}$ does not have to be unique. For example, for any set $\Omega$, $\{\emptyset, \Omega\}$ and $2^{\Omega}$ are both sigma fields. Also, the sigma field generated by a collection $\mathcal{A}$, denoted $\sigma(\mathcal{A})$, is the smallest sigma field which contains $\mathcal{A}$. To be the smallest sigma field means that $\sigma(\mathcal{A})$ is the intersection of all sigma fields that contain $\mathcal{A}$. For example, let $\mathcal{A}$ be the collection of open sets in $(0,1]$. Each element of $\sigma(\mathcal{A})$ is a Borel set and the field is aptly named the Borel sigma field; it is also known as the smallest field that contains all the open sets in $(0,1]$. As another example, consider $\sigma(X)$ where $X$ is a random variable (to be defined later). $\sigma(X)$ is the smallest sigma field $\mathcal{F}$ s.t. $X$ is $\mathcal{F}$-measurable; intuitively speaking, we call it the information of $X$.

Finally, we define a measure which assigns probability to each element in the field we constructed.

Definition: A function $P : \mathcal{F} \to [0,1]$ is a probability measure iff (i) $P(\Omega) = 1$ and (ii) $P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ whenever $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint.

There are numerous properties we can derive using these definitions. We highlight the important ones.

Property (continuity from below): If $A_n \in \mathcal{F}$ and $A_n \uparrow A$, then $P(A_n) \uparrow P(A)$.

Property (continuity from above): If $A_n \in \mathcal{F}$ and $A_n \downarrow A$, then $P(A_n) \downarrow P(A)$.

Next, we define random variables (and naturally their expectations) by first introducing simple random variables.

Definition: A function $X = \sum_{i=1}^{n} x_i 1_{A_i}$ is a simple random variable if the $A_i$ are a disjoint partition of $\Omega$. A random variable is a function $X : \Omega \to \mathbb{R}$ where $\{\omega : X(\omega) \le x\} \in \mathcal{F}$ for all $x$.

Definition: The expected value of a simple random variable is $E[X] = \sum_{i=1}^{n} x_i P(A_i)$. The expected value of a general random variable is $E[X] = \int_{\Omega} X \, dP$. We approximate this integral as a limit of the expectations of bounded r.v.s, each of which is a limit of simple random variables.

There are a couple of properties of random variables that come up repeatedly. The most important ones are uniform boundedness/integrability of random variables and independence.

Definition: A sequence of random variables $X_n$ is uniformly bounded when $|X_n| \le M$ for all $n$, for some $M < \infty$.

Definition: A sequence of random variables $X_n$ is uniformly integrable if $\lim_{M \to \infty} \sup_n E\left(|X_n| 1_{|X_n| > M}\right) = 0$.

Definition: $X$ and $Y$ are independent iff $P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$ for all Borel sets $A$ and $B$.
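
As a small illustration of $\sigma(\mathcal{A})$ (not from the original notes): on a finite $\Omega$, the generated sigma field can be computed by brute-force closure under complements and unions. The helper name below is hypothetical.

```python
def generated_sigma_field(omega, collection):
    """Close a collection of subsets of a finite omega under complement
    and union (countable = finite here), yielding sigma(collection)."""
    sets = {frozenset(), frozenset(omega)} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            comp = frozenset(omega) - a          # closure under complement
            if comp not in sets:
                sets.add(comp); changed = True
            for b in current:                    # closure under union
                if (a | b) not in sets:
                    sets.add(a | b); changed = True
    return sets

F = generated_sigma_field({1, 2, 3, 4}, [{1}, {2}])
print(len(F))                  # 8: generated by the partition {1}, {2}, {3,4}
print(sorted(map(sorted, F)))
```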

Properties

An important concept to remember is whether a sequence of events occurs infinitely often. The famous theorems associated with this are the Borel-Cantelli Lemmas.

Definition: A sequence of events $A_n$ occurs infinitely often (i.o.) on the event $\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} \bigcup_{m \ge n} A_m = \limsup_n A_n$.

Theorem (First Borel-Cantelli Lemma): If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

Proof: Define $B_n = \bigcup_{m \ge n} A_m$. Then $B_n \downarrow \{A_n \text{ i.o.}\}$. Therefore, $P(A_n \text{ i.o.}) = \lim_n P(B_n) \le \lim_n \sum_{m \ge n} P(A_m) = 0$.

Theorem (Second Borel-Cantelli Lemma): If the $A_n$ are independent and $\sum_n P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$.

Proof: $\{A_n \text{ i.o.}\}^c = \bigcup_n \bigcap_{m \ge n} A_m^c$. Thus, we only have to show $P(\bigcap_{m \ge n} A_m^c) = 0$ for each $n$. Using $1 - x \le e^{-x}$, $P(\bigcap_{m=n}^{N} A_m^c) = \prod_{m=n}^{N} (1 - P(A_m)) \le \exp\left(-\sum_{m=n}^{N} P(A_m)\right) \to 0$ as $N \to \infty$.

In addition to i.o., we can establish inequalities for random variables.

Theorem (Markov's Inequality): If $X \ge 0$, then $P(X \ge \lambda) \le E[X]/\lambda$. When $X$ is not non-negative, we can use $|X|$.

Proof: Note that $\lambda 1_{X \ge \lambda} \le X$. Take expectations on both sides to complete the proof.

Theorem (Chebyshev's Inequality): For any random variable $X$ and $\lambda > 0$, $P(|X - E[X]| \ge \lambda) \le \mathrm{Var}(X)/\lambda^2$.

Proof: Note that $\lambda 1_{|X - E[X]| \ge \lambda} \le |X - E[X]|$. Square both sides and take expectations to complete the proof.

Theorem (Cauchy-Schwarz Inequality): $(E[XY])^2 \le E[X^2] \, E[Y^2]$.

Proof: We know $ab \le (a^2 + b^2)/2$. Let $a = X/\sqrt{E[X^2]}$ and $b = Y/\sqrt{E[Y^2]}$. Then $E[ab] \le 1$. Take expectations on both sides and multiply through by $\sqrt{E[X^2] E[Y^2]}$ to get the desired inequality.

Theorem (Holder's Inequality): If $1/p + 1/q = 1$ with $p, q > 1$, then $E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}$.

Proof: From Young's Inequality, $ab \le a^p/p + b^q/q$. Let $a = |X|/(E|X|^p)^{1/p}$ and $b = |Y|/(E|Y|^q)^{1/q}$, and do the same trick as above.

Theorem (Kolmogorov Maximal Inequality): If $X_1, \ldots, X_n$ are indep. with $E[X_i] = 0$ and $E[X_i^2] < \infty$, then with $S_k = X_1 + \cdots + X_k$, $P\left(\max_{1 \le k \le n} |S_k| \ge \lambda\right) \le \mathrm{Var}(S_n)/\lambda^2$.

Proof: Define the stopping time $T = \min\{k : |S_k| \ge \lambda\}$. Then $\lambda^2 1_{T \le n} \le S_T^2 1_{T \le n}$. Taking expectations on both sides, $\lambda^2 P(T \le n) \le E(S_T^2 1_{T \le n}) \le E(S_n^2)$, where the key step $E(S_n^2 1_{T = k}) \ge E(S_k^2 1_{T = k})$ holds because the stopping time $T$ (and $S_k$) is independent of the future increments, so the cross term vanishes. Rearranging the terms, we get $P(\max_{1 \le k \le n} |S_k| \ge \lambda) \le E(S_n^2)/\lambda^2$.
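
A Monte Carlo sketch of the first Borel-Cantelli lemma (not from the notes), with the hypothetical choice of independent events $A_n$ with $P(A_n) = 1/n^2$, so $\sum_n P(A_n) < \infty$ and occurrences should stop early on every path:

```python
import numpy as np

rng = np.random.default_rng(0)
N, paths = 10_000, 1_000
n = np.arange(1, N + 1)
occurs = rng.random((paths, N)) < 1.0 / n**2       # indicators of A_1, ..., A_N

counts = occurs.sum(axis=1)                        # how many A_n occur per path
last = N - np.argmax(occurs[:, ::-1], axis=1)      # 1-indexed last occurrence
print("mean # of A_n occurring:", counts.mean())   # ~ sum 1/n^2 = pi^2/6 = 1.64
print("median last occurrence:", int(np.median(last)))  # small: finitely many occur
```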

*Theorem (Kolmogorov Zero-One Law): If the $X_n$ are indep. and $A \in \mathcal{T}$, then $P(A) = 0$ or $1$.

Note: $\mathcal{T} = \bigcap_n \sigma(X_n, X_{n+1}, \ldots)$ stands for the tail field, whose elements are tail events. Intuitively speaking, these are events determined by all the values of $X_1, X_2, \ldots$ but not determined by any finite subset of them (equivalently, probabilistically independent of any finite subset). Examples of tail events are $\{\sum_n X_n \text{ converges}\}$, $\{\limsup_n X_n \le c\}$, or $\{\lim_n S_n/n \text{ exists}\}$. Generally, it is hard to determine whether a given tail event has probability 0 or 1.

Proof: Since $P(A)$ must be 0 or 1, it suffices to prove $P(A) = P(A)^2$. First, let $A \in \mathcal{T} \subset \sigma(X_1, X_2, \ldots)$. Then, by the approximation of sets (below), there are $B_n \in \sigma(X_1, \ldots, X_n)$ s.t. $P(A \triangle B_n) \to 0$. Now, restrict attention to a fixed $n$. Since $A \in \sigma(X_{n+1}, X_{n+2}, \ldots)$ and the two sigma fields are built from non-overlapping, independent coordinates, $A$ and $B_n$ are independent: $P(A \cap B_n) = P(A) P(B_n)$. As $n \to \infty$, $P(A \cap B_n) \to P(A)$ and $P(B_n) \to P(A)$, which imply $P(A) = P(A)^2$, so $P(A) \in \{0, 1\}$.

Theorem (Berry-Esseen Theorem): If the $X_i$ are i.i.d. with $E[X_i] = 0$, $E[X_i^2] = \sigma^2$, and $\rho = E|X_i|^3 < \infty$, then $\sup_x \left| P\left(\frac{S_n}{\sigma\sqrt{n}} \le x\right) - \Phi(x) \right| \le \frac{C\rho}{\sigma^3 \sqrt{n}}$ for a universal constant $C$.

Theorem (Berry-Esseen Smoothing Inequality): For any two random variables $X, Y$ with cdfs $F, G$ and chfs $\phi, \psi$, where $G$ has a bounded derivative, there are universal constants $c_1, c_2$ s.t. for every $T > 0$, $\sup_x |F(x) - G(x)| \le c_1 \int_{-T}^{T} \left| \frac{\phi(t) - \psi(t)}{t} \right| dt + c_2 \frac{\sup_x |G'(x)|}{T}$.

We state other properties, mostly having to do with inequalities and expectation.

Theorem: If $X$ is a bounded r.v., then $E|X| < \infty$.

Theorem: If $X \le Y$, then $E[X] \le E[Y]$ (monotonicity of the integral).

Theorem: $|E[X]| \le E|X|$.

Theorem (Jensen's Inequality): If $\phi$ is a proper, convex function and $X$ is a r.v., then $\phi(E[X]) \le E[\phi(X)]$.

Theorem (Hoeffding's Inequality, Limited): If $E[X] = 0$ and $a \le X \le b$, then $E[e^{tX}] \le e^{t^2 (b-a)^2 / 8}$.

Theorem (Tail Bound for Normal(0,1)): If $Z \sim N(0,1)$ and $x > 0$, then $P(Z \ge x) \le \frac{1}{x\sqrt{2\pi}} e^{-x^2/2}$.

Theorem: For any sequence of events $A_n$ with $P(A_n) = 1$ for every $n$, $\bigcap_n A_n$ occurs a.s.

Theorem (Approximation of Sets/Approximation Lemma): If $\mathcal{F}_n$ is an increasing sequence of $\sigma$-fields and $A \in \sigma(\bigcup_n \mathcal{F}_n)$, then there are $B_n \in \mathcal{F}_n$ s.t. $P(A \triangle B_n) \to 0$, where $A \triangle B = (A \setminus B) \cup (B \setminus A)$ (aka the stuff not shared by the two sets $A$ and $B$).

Theorem (Kronecker's Lemma): For $0 < b_n \uparrow \infty$, if $\sum_n x_n / b_n$ converges, then $\frac{1}{b_n} \sum_{k=1}^{n} x_k \to 0$.

Proof: Let $s_n = \sum_{k=1}^{n} x_k / b_k$. Summation by parts gives $\frac{1}{b_n} \sum_{k=1}^{n} x_k = s_n - \frac{1}{b_n} \sum_{k=1}^{n-1} (b_{k+1} - b_k) s_k$. Note that the weighted average on the right tends to $\lim_n s_n$ as $n \to \infty$, since $s_n$ converges by our hypothesis. Taking limits on both sides, you get the result.

Theorem (Stirling's Formula): $n! \sim \sqrt{2\pi n} \, (n/e)^n$.
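
A quick numerical check of Stirling's formula (not from the notes); the ratio $n! / (\sqrt{2\pi n}\,(n/e)^n)$ should decrease toward 1:

```python
import math

for n in (1, 5, 10, 50, 100):
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n) / stirling)
# 1.084..., 1.016..., 1.008..., 1.0016..., 1.0008... -> 1
```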

Theorem (Le Cam's Theorem): If $X_i \sim \text{Bernoulli}(p_i)$ are indep., $S_n = \sum_{i=1}^{n} X_i$, and $\lambda = \sum_{i=1}^{n} p_i$, then $\sum_{k=0}^{\infty} \left| P(S_n = k) - \frac{e^{-\lambda} \lambda^k}{k!} \right| \le 2 \sum_{i=1}^{n} p_i^2$. Note: This is saying that the sum of Bernoullis roughly has a Poisson distribution.

Convergence of Random Variables

First, we define the three different types of convergence encountered here.

Definition (convergence in dist.): A sequence of random variables $X_n$ (each possibly living on its own probability space) with cdfs $F_n$ converges in distribution to $X$ with cdf $F$, written $X_n \Rightarrow X$, if $F_n(x) \to F(x)$ at all continuity points $x$ of $F$.

Definition (convergence in prob.): $X_n$ converges in probability to $X$, written $X_n \to_p X$, if $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon > 0$.

Definition (convergence a.s.): $X_n$ converges almost surely to $X$ if $P(\lim_n X_n = X) = 1$.

Next, there are properties along with theorems related to these types of convergence.

Proposition: $X_n \to X$ a.s. implies $X_n \to_p X$.

Proof: Let $A_n(\epsilon) = \bigcup_{m \ge n} \{|X_m - X| > \epsilon\}$. Note that $A_n(\epsilon) \downarrow A(\epsilon)$ and $P(A(\epsilon)) = 0$ by a.s. convergence. Thus, $P(|X_n - X| > \epsilon) \le P(A_n(\epsilon)) \to 0$.

Proposition: If $X_n \to_p X$, then $X_n \Rightarrow X$.

Proposition: If $X_n \Rightarrow c$ for a constant $c$, then $X_n \to_p c$.

Proposition: Suppose $X_n \to_p X$. These are equivalent statements: (i) $\{X_n\}$ is uniformly integrable, (ii) $X_n \to X$ in $L^1$, and (iii) $E|X_n| \to E|X| < \infty$.

Proposition: If $X_n \to_p X$, then there is a subsequence $X_{n_k}$ s.t. $X_{n_k} \to X$ a.s.

Proof: Since $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon$, by defn. of limits, for each $k$ we can find $n_k$ s.t. $P(|X_{n_k} - X| > 1/k) \le 2^{-k}$. Since $\sum_k 2^{-k} < \infty$, by the first B.C. lemma, $P(|X_{n_k} - X| > 1/k \text{ i.o.}) = 0$, so $X_{n_k} \to X$ a.s., as desired.

Proposition: If every subsequence of $X_n$ has a further subsequence converging a.s. to $X$, then $X_n \to_p X$.

Proposition: If $X_n \to_p X$ and $f$ is continuous, then $f(X_n) \to_p f(X)$.

Theorem (Bounded Convergence Theorem): If $X_n$ is a sequence of uniformly bounded random variables defined on the same probability space and $X_n \to_p X$, then $E[X_n] \to E[X]$.

Proof: $|E[X_n] - E[X]| \le E|X_n - X| \le \epsilon + 2M P(|X_n - X| > \epsilon)$, where $M$ is the uniform bound. Let $n \to \infty$ and then $\epsilon \to 0$; this is the defn. of the limit.
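
A sketch (not from the notes) of the classic "moving blocks" example separating the modes of convergence above: $X_n$ is the indicator of a shrinking subinterval of $[0,1)$, so $X_n \to_p 0$, yet every $\omega$ sees $X_n = 1$ infinitely often, so $X_n \not\to 0$ a.s.; one $X_n$ per block gives an a.s.-convergent subsequence.

```python
import numpy as np

def block_interval(n):
    """Return (start, length) of the interval where X_n = 1 on [0, 1).
    Block k consists of k consecutive indices, each an interval of length 1/k."""
    k, first = 1, 1
    while first + k <= n:        # advance to the block containing index n
        first += k
        k += 1
    offset = n - first
    return offset / k, 1.0 / k

rng = np.random.default_rng(8)
u = rng.random(10_000)           # omega drawn uniformly from [0, 1)
for n in (10, 100, 1000, 5000):
    start, length = block_interval(n)
    xn = (u >= start) & (u < start + length)
    print(n, xn.mean())          # P(X_n = 1) = 1/k -> 0 even though each
                                 # omega is hit once per block, i.e. i.o.
```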

Theorem (Fatou's Lemma): If $X_n \ge 0$, then $E[\liminf_n X_n] \le \liminf_n E[X_n]$.

Proof: Fix a bounded r.v. $Z$ with $0 \le Z \le \liminf_n X_n$ and define $Z_n = \min(Z, X_n)$. Note that the $Z_n$ are uniformly bounded and $Z_n \to Z$. Then, by BCT, we get $E[Z] = \lim_n E[Z_n] \le \liminf_n E[X_n]$. Finally, take the sup over all such $Z$ and we end up with $E[\liminf_n X_n] \le \liminf_n E[X_n]$. Note that Fatou's Lemma is often used to show that the expectation of a random variable is bounded.

Theorem (Dominated Convergence Theorem): If $|X_n| \le Y$, $E[Y] < \infty$, and $X_n \to X$ (a.s. or in prob.), then $E[X_n] \to E[X]$.

Proof: Let $U_n = Y - X_n$ and $V_n = Y + X_n$, where both $U_n \ge 0$ and $V_n \ge 0$. By Fatou's Lemma applied to each, we get $\limsup_n E[X_n] \le E[X]$ and $\liminf_n E[X_n] \ge E[X]$. Combining them together, we get our result.

Theorem (Monotone Convergence Theorem): If $0 \le X_n \uparrow X$ (a.s. or in prob.), then $E[X_n] \uparrow E[X]$.

Proof: Using Fatou, we get $E[X] \le \liminf_n E[X_n]$. Also, since $X_n \le X$, taking expectations and then limsup on both sides, we get $\limsup_n E[X_n] \le E[X]$. Thus, we get our desired result.

There are several results on convergence of random series. Here, we state what's relevant for this section.

*Theorem (Kolmogorov One-Series Theorem): If the $X_n$ are independent with $E[X_n] = 0$ and $\sum_n \mathrm{Var}(X_n) < \infty$, then $\sum_n X_n$ converges a.s.

Proof: Use the Kolmogorov Maximal Inequality to show the partial sums are a.s. Cauchy.

Theorem (Kolmogorov Condensation Test): Suppose the $X_n$ are indep. and uniformly bounded. If $\sum_n X_n$ converges (a.s. or in prob.), then $\sum_n E[X_n]$ and $\sum_n \mathrm{Var}(X_n)$ converge.

Now, there are two very important types of laws of large numbers, the strong law and the weak law. The weak law is basically convergence in probability while the strong law is convergence a.s. From here on out, $S_n = X_1 + \cdots + X_n$.

Theorem (Weak Law of Large Numbers): If the $X_i$ are independent with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) \le C < \infty$, then $S_n/n \to_p \mu$.

Proof: Using Chebyshev, $P(|S_n/n - \mu| > \epsilon) \le C/(n\epsilon^2)$. Take the limit on both sides to obtain the result.

There are several flavors of the strong law of large numbers, each relaxing assumptions from the previous ones.

Theorem (Strong Law of Large Numbers, v1): If the $X_i$ are i.i.d. and bounded by $M$, with $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Pick the quadratic subsequence $n = k^2$. Then, we know from Chebyshev and the first BC lemma that $P(|S_{k^2}/k^2 - \mu| > \epsilon) \le C/(k^2 \epsilon^2)$ is summable, so $S_{k^2}/k^2 \to \mu$ w.p. 1. Now, for $k^2 \le n < (k+1)^2$, boundedness gives $|S_n/n - S_{k^2}/k^2| \le M \frac{(k+1)^2 - k^2}{k^2} \to 0$. Since the subsequence converges w.p. 1, $S_n/n \to \mu$ w.p. 1.
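
A simulation sketch of the weak law (not from the notes), using Exponential(1) draws (so $\mu = \sigma^2 = 1$) as an arbitrary example; the empirical exceedance frequencies shrink, consistent with Chebyshev's $1/(n\epsilon^2)$ bound:

```python
import numpy as np

rng = np.random.default_rng(1)
eps, paths = 0.1, 1_000
for n in (10, 100, 1_000, 10_000):
    means = rng.exponential(1.0, size=(paths, n)).mean(axis=1)
    freq = np.mean(np.abs(means - 1.0) > eps)
    print(f"n={n}: P(|S_n/n - mu| > {eps}) ~ {freq:.3f},"
          f" Chebyshev bound {1 / (n * eps**2):.3f}")
```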

Theorem (Strong Law of Large Numbers, v2): If the $X_i$ are i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$, then $S_n/n \to \mu$ a.s.

Proof: By the Kolmogorov One-Series Theorem, $\sum_k \mathrm{Var}((X_k - \mu)/k) = \sum_k \sigma^2/k^2 < \infty$, so $\sum_k (X_k - \mu)/k$ converges a.s. By Kronecker, this implies $\frac{1}{n} \sum_{k=1}^{n} (X_k - \mu) \to 0$ a.s.

Theorem (Strong Law of Large Numbers, v3): If the $X_i$ are i.i.d. with $E|X_i| < \infty$ and $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Truncate: let $Y_k = X_k 1_{|X_k| \le k}$ and split $S_n/n$ into the truncation error and the truncated average. For the first part, $\sum_k P(X_k \ne Y_k) = \sum_k P(|X_1| > k) \le E|X_1| < \infty$, so by the first BC lemma $X_k \ne Y_k$ only finitely often, and the error vanishes a.s. For the second part, a standard computation gives $\sum_k \mathrm{Var}(Y_k)/k^2 \le 4 E|X_1| < \infty$, so by Kolmogorov's One-Series Theorem $\sum_k (Y_k - E[Y_k])/k$ converges a.s. By Kronecker's Lemma, this implies $\frac{1}{n} \sum_{k=1}^{n} (Y_k - E[Y_k]) \to 0$ a.s.; since $E[Y_k] \to \mu$, we are done.

*Theorem (Strong Law of Large Numbers, v4): If the $X_i$ are pairwise independent and identically distributed with $E|X_i| < \infty$ and $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Requires Etemadi's argument and considerably more work.

Distribution Theory

We first define a characteristic function and state some properties associated with it. Note that for any r.v., the characteristic function always exists.

Definition: For any random variable $X$, the characteristic function $\phi_X$ of $X$ is defined as $\phi_X(t) = E[e^{itX}]$.

Properties: If $\phi$ is the characteristic function of $X$, then the following statements are true:
(i) $\phi(0) = 1$ and $|\phi(t)| \le 1$
(ii) $\phi$ is uniformly continuous (and hence, continuous)
(iii) If $|\phi(t_0)| = 1$ for some $t_0 \ne 0$, then $X$ is concentrated on a lattice, i.e., there is an atom in some place
(iv) $\phi$ is real iff $X$ and $-X$ are equal in distribution (aka $X$ is symmetric)
(v) If $\int |\phi(t)| \, dt < \infty$, we have a (bounded, continuous) pdf
(vi) If $E|X|^n < \infty$, then $\phi$ is $n$ times continuously differentiable with $\phi^{(n)}(0) = i^n E[X^n]$
(vii) If $\phi$ has $n$ derivatives at zero, then $X$ has moments up to order $n$ if $n$ is even and up to $n-1$ if $n$ is odd
(viii) (Levy's Uniqueness Theorem) If $X$ and $Y$ have the same chfs, then $X$ and $Y$ have the same distribution
(ix) (Levy's Inversion Formula) $P(a < X \le b) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it} \phi(t) \, dt$ for continuity points $a < b$; thus, the chf determines the cumulative distribution of $X$.

Theorem (Polya's Theorem): If $\phi$ is (i) symmetric with $\phi(0) = 1$, (ii) convex for $t > 0$, (iii) uniformly continuous, and (iv) non-increasing to 0 as $t \to \infty$, then $\phi$ is a characteristic function, said to be of Polya type.

In addition to characteristic functions, we define the concept of tightness and how it can be used to prove various properties of convergence for characteristic functions and their cdfs.

Definition: A sequence of cdfs $F_n$ is tight if $\lim_{M \to \infty} \sup_n \left[1 - F_n(M) + F_n(-M)\right] = 0$.

Definition: A sequence of r.v.s $X_n$ is tight if $\lim_{M \to \infty} \sup_n P(|X_n| > M) = 0$.
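
A Monte Carlo sketch of the chf definition (not from the notes) for $X \sim N(0,1)$, whose chf is known to be $e^{-t^2/2}$; the near-zero imaginary parts illustrate property (iv) for a symmetric distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)
for t in (0.0, 0.5, 1.0, 2.0):
    est = np.exp(1j * t * x).mean()     # Monte Carlo estimate of E[exp(itX)]
    print(f"t={t}: {est.real:+.4f}{est.imag:+.4f}i"
          f"  vs exact {np.exp(-t**2 / 2):.4f}")
```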

Theorem (Helly Selection Theorem): For a sequence of cdfs $F_n$, there is a subsequence $F_{n_k}$ and a right-continuous non-decreasing function $F$ s.t. $F_{n_k}(x) \to F(x)$ at all continuity points of $F$. (Note: Steele states $F$ is a distribution function only if $\{F_n\}$ is tight.)

Proof: By compactness of $[0,1]$, any sequence in $[0,1]$ has a convergent subsequence. Let $q_1, q_2, \ldots$ be an enumeration of the rationals. Then $F_n(q_1)$ has a subsequence that converges to some $G(q_1)$. Move on to $q_2$: the previous subsequence has a further subsequence along which $F_n(q_2) \to G(q_2)$. Repeating this diagonal argument through all the rationals, we end up with a single subsequence $F_{n_k}$ s.t. $F_{n_k}(q) \to G(q)$ for every rational $q$. Next, since each $F_{n_k}$ is a non-decreasing function, $G$ must also be non-decreasing on the rationals. Define $F(x) = \inf\{G(q) : q > x, \, q \text{ rational}\}$, which is right-continuous and non-decreasing. Finally, if $x$ is a continuity point of $F$, we can pick rationals $q_1 < x < q_2$ so that $F(q_2) - F(q_1) < \epsilon$. Since $F_{n_k}(q_1) \le F_{n_k}(x) \le F_{n_k}(q_2)$, we get $F_{n_k}(x) \to F(x)$, as desired.

Theorem (Tail Bound): For any random variable $X$ with chf $\phi$ and any $u > 0$, $P(|X| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt$.

Proof: By Fubini, $\frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt = E\left[\frac{1}{u} \int_{-u}^{u} (1 - e^{itX}) \, dt\right] = 2 E\left[1 - \frac{\sin(uX)}{uX}\right] \ge 2 E\left[\left(1 - \frac{1}{|uX|}\right) 1_{|uX| \ge 2}\right] \ge P(|X| \ge 2/u)$. Rearrange the inequalities and we're done.

Theorem (Tightness Lemma): If a sequence of chfs $\phi_n$ satisfies $\phi_n(t) \to \phi(t)$ for all $t$ and $\phi$ is continuous at zero, then the cdfs associated with the $\phi_n$ are tight. (Note: $\phi$ does not have to be a chf.)

Proof: (Let $F_n$ denote the cdf for each chf $\phi_n$, with r.v. $X_n$.) By the Tail Bound, $P(|X_n| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi_n(t)) \, dt$. Taking $\limsup_n$ on both sides and using bounded convergence in the integral, $\limsup_n P(|X_n| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt$. We can control the right-hand side to be a very small number by taking $u$ small, because $\phi(t) \to \phi(0) = 1$ as $t \to 0$, where we use that $\phi$ is continuous at zero. Thus, the $F_n$ are tight.

Theorem (Levy Continuity Theorem): Given a sequence of chfs $\phi_n$ and corresponding cdfs $F_n$, if $\phi_n(t) \to \phi(t)$ for all $t$ and $\phi$ is continuous at zero, then (i) $\phi$ is a chf and (ii) $F_n \Rightarrow F$, where $F$ is the cdf corresponding to $\phi$.

Proof: By the Tightness Lemma, $\{F_n\}$ must be tight. By Helly, some subsequence $F_{n_k} \Rightarrow F$ where $F$ is a genuine cdf, and since the chfs converge, $\phi$ has to be the chf of $F$. Now, by contradiction, suppose $F_n$ does not converge to $F$: then there is a subsequence $F_{m_j}$ and a continuity point $x$ of $F$ with $F_{m_j}(x) \not\to F(x)$. But, by Helly and tightness, we can choose a further subsequence s.t. $F_{m_{j_l}} \Rightarrow G$ for some cdf $G$. Since the chfs converge to $\phi$, $G$ has chf $\phi$, and by uniqueness $G = F$. Hence $F_{m_{j_l}}(x) \to F(x)$, which is a contradiction.
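
A numerical check of the Tail Bound above (not from the notes) for $X \sim N(0,1)$, where both sides are computable since $\phi(t) = e^{-t^2/2}$:

```python
import numpy as np
from math import erfc, sqrt

for u in (0.5, 1.0, 2.0):
    t = np.linspace(-u, u, 100_001)
    bound = (1 - np.exp(-t**2 / 2)).mean() * 2   # (1/u) * integral over [-u, u]
    tail = erfc((2 / u) / sqrt(2))               # P(|Z| >= 2/u), exactly
    print(f"u={u}: P(|Z| >= {2 / u:.1f}) = {tail:.4f} <= {bound:.4f}")
```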

With these criteria, we can now prove the central limit theorem. There are many flavors of this as well, each relaxing the assumptions of the other.

Theorem (Central Limit Theorem, v1): If the $X_i$ are i.i.d. with $E[X_i] = 0$ and $E[X_i^2] = \sigma^2 < \infty$, then $S_n/(\sigma\sqrt{n}) \Rightarrow N(0,1)$.

Proof: We need three key ingredients:
1. From the Taylor series of $e^{ix}$, $\phi(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2)$. Take $t/(\sigma\sqrt{n})$ and we get $\phi(t/(\sigma\sqrt{n})) = 1 - \frac{t^2}{2n} + o(1/n)$.
2. By induction, we get $\left| \prod_{k=1}^{n} a_k - \prod_{k=1}^{n} b_k \right| \le \sum_{k=1}^{n} |a_k - b_k|$ where $|a_k| \le 1$ and $|b_k| \le 1$ (lemma 1, pg 358).
3. From the definition of the natural number $e$, $(1 + z_n/n)^n \to e^z$ whenever $z_n \to z$.

Since $S_n/(\sigma\sqrt{n})$ is represented by the characteristic function $(\phi(t/(\sigma\sqrt{n})))^n$, using (1) and (2) we get $\left| (\phi(t/(\sigma\sqrt{n})))^n - (1 - \frac{t^2}{2n})^n \right| \le n \left| \phi(t/(\sigma\sqrt{n})) - (1 - \frac{t^2}{2n}) \right|$, and by DCT applied to the Taylor remainder, the right-hand side goes to 0. By (3), $(1 - \frac{t^2}{2n})^n \to e^{-t^2/2}$. Since this is the chf of a standard normal, by the Levy Continuity Theorem, we're done.

Theorem (Central Limit Theorem, v2, Lindeberg-Feller): For each $n$, we have $X_{n,1}, \ldots, X_{n,n}$, independent within each row, where each row of the array may live on a different probability space. If $E[X_{n,k}] = 0$, $\sum_{k=1}^{n} E[X_{n,k}^2] \to \sigma^2$, and for every $\epsilon > 0$, $\sum_{k=1}^{n} E[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}] \to 0$, then $\sum_{k=1}^{n} X_{n,k} \Rightarrow N(0, \sigma^2)$. Note: The tail condition is called the Lindeberg Condition (LC).

Proof: WLOG, let $\sigma^2 = 1$, and write $\sigma_{n,k}^2 = E[X_{n,k}^2]$ and $\phi_{n,k}$ for the chf of $X_{n,k}$. Then, using (2) from v1, $\left| \prod_k \phi_{n,k}(t) - e^{-t^2/2} \right| \le \left| \prod_k \phi_{n,k}(t) - \prod_k \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) \right| + \left| \prod_k \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) - e^{-t^2/2} \right|$. The first absolute value is bounded, using (1) and (2) from v1, by $\sum_k E\left[\min(|t X_{n,k}|^3, 2|t X_{n,k}|^2)\right] \le |t|^3 \epsilon \sum_k \sigma_{n,k}^2 + 2 t^2 \sum_k E\left[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}\right]$, where the second sum is taken care of by the LC condition. Again, using the same trick where we Taylor expand $e^{-x}$ around zero, the second absolute value is bounded by $\sum_k \left| e^{-t^2 \sigma_{n,k}^2/2} - \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) \right| \le t^4 \sum_k \sigma_{n,k}^4 \le t^4 \left(\max_k \sigma_{n,k}^2\right) \sum_k \sigma_{n,k}^2$. To bound $\max_k \sigma_{n,k}^2$, note that $\sigma_{n,k}^2 \le \epsilon^2 + E[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}]$, so LC forces $\max_k \sigma_{n,k}^2 \to 0$. Since the first and second summands are bounded by quantities that vanish (after sending $\epsilon \to 0$), we arrive at our result.

Finally, we have a pretty cool theorem showing that two random variables have identical distributions.

Theorem: If $X_n \Rightarrow X$ and $X_n \Rightarrow Y$, then $X$ and $Y$ are equal in distribution.
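
A simulation sketch of CLT v1 (not from the notes), using centered Uniform(-1,1) summands ($\sigma^2 = 1/3$, an arbitrary choice); the empirical cdf of the standardized sum is compared with $\Phi$:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, paths, sigma = 500, 10_000, math.sqrt(1 / 3)
z = rng.uniform(-1, 1, size=(paths, n)).sum(axis=1) / (sigma * math.sqrt(n))

def norm_cdf(x):
    """Standard normal cdf Phi via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x}: empirical {np.mean(z <= x):.4f}  vs Phi {norm_cdf(x):.4f}")
```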

Martingales

First, note that we often work in $L^2$ (aka a Hilbert space), where all Cauchy sequences converge. Closely related to Hilbert space theory is conditional expectation.

Definition: If $E|X| < \infty$ and $\mathcal{G} \subset \mathcal{F}$ is a sigma field, then $Y$ is a conditional expectation of $X$ given $\mathcal{G}$, written $Y = E[X \mid \mathcal{G}]$, if (a) $Y$ is $\mathcal{G}$-measurable and (b) $\int_A Y \, dP = \int_A X \, dP$ for all $A \in \mathcal{G}$. Written another way, $E[(X - Y) 1_A] = 0$ for all $A \in \mathcal{G}$.

Properties: Take the notation above. Then, the following are true:
(i) (Existence) $E[X \mid \mathcal{G}]$ exists (via orthogonal projection in $L^2$, or Radon-Nikodym in general)
(ii) (Uniqueness) If $Y$ above exists, then $Y$ is a.s. unique.
(iii) (Linearity of Conditional Expectation) If $Y_i$ is the conditional expectation of $X_i$ given $\mathcal{G}$, then $a Y_1 + b Y_2$ is the conditional expectation of $a X_1 + b X_2$ given $\mathcal{G}$.
(iv) (Non-negativity) If $X \ge 0$, then $E[X \mid \mathcal{G}]$ is non-negative a.s.
(v) (Contraction) $E\left|E[X \mid \mathcal{G}]\right| \le E|X|$
(vi) (Tower Property) Given $\mathcal{G}_1 \subset \mathcal{G}_2$, $E\big[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1\big] = E\big[E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2\big] = E[X \mid \mathcal{G}_1]$.

We highlight a convergence result for monotone sequences under conditional expectation.

Theorem (Monotone Convergence of Conditional Expectation): If $X_n \ge 0$, $X_n \uparrow X$ w.p. 1, and $E|X| < \infty$, then $E[X_n \mid \mathcal{G}] \uparrow E[X \mid \mathcal{G}]$ w.p. 1.

Proof: Since $E[X_n \mid \mathcal{G}] \le E[X_{n+1} \mid \mathcal{G}]$ and conditional expectation is a positive operator on the monotone sequence, there is a $\mathcal{G}$-measurable $Y$ s.t. $E[X_n \mid \mathcal{G}] \uparrow Y$ w.p. 1 (note $Y$ can be infinite). Next, using MCT and the defining property (b), for any $A \in \mathcal{G}$ we get $\int_A E[X_n \mid \mathcal{G}] \, dP = \int_A X_n \, dP \to \int_A X \, dP$. Finally, using MCT again on the left side, $\int_A Y \, dP = \int_A X \, dP$. Since conditional expectations are a.s. unique, $Y = E[X \mid \mathcal{G}]$ and we're done.

Now, onto the definition of a martingale.

Definition: Suppose we have a filtration $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ and $X_n$ adapted to $\mathcal{F}_n$ with $E|X_n| < \infty$. Then $X_n$ is a martingale iff $E[X_{n+1} \mid \mathcal{F}_n] = X_n$.

Definition: Suppose we have a filtration $\mathcal{F}_n$ and a random variable $Z$ where $E|Z| < \infty$. Then $X_n = E[Z \mid \mathcal{F}_n]$ is a martingale (called Levy's Martingale).

Definition: Suppose we have a filtration $\mathcal{F}_n$ and $X_n$ adapted with $E|X_n| < \infty$. Then $X_n$ is a submartingale iff $E[X_{n+1} \mid \mathcal{F}_n] \ge X_n$. Note that any convex function of a MG $X_n$ is a submg; if $X_n$ is a submg, any convex and monotonically non-decreasing function of it is a submg.

Now, there are a lot of ways in which martingales can converge. We highlight a couple along with the proofs.

Theorem: If $X_n$ is a martingale with $E[X_n^2] < \infty$, then the increments are orthogonal: for any $m \le n$, $E[(X_n - X_m) X_m] = 0$, and hence $E[X_n^2] = E[X_m^2] + E[(X_n - X_m)^2]$.

Proof: WLOG, $m < n$. Then $E[(X_n - X_m) X_m] = E\big[E[(X_n - X_m) X_m \mid \mathcal{F}_m]\big] = E\big[X_m \, E[X_n - X_m \mid \mathcal{F}_m]\big] = 0$.

Theorem (Doob Maximal Inequality): For any martingale $X_n$ in $L^2$ and $\lambda > 0$, $P\left(\max_{1 \le k \le n} |X_k| \ge \lambda\right) \le E[X_n^2]/\lambda^2$.

Proof: Let $T = \min\{k : |X_k| \ge \lambda\}$. Then $\lambda^2 1_{T \le n} \le X_T^2 1_{T \le n}$. Taking expectations on both sides, we get $\lambda^2 P(T \le n) \le E(X_T^2 1_{T \le n}) \le E(X_n^2)$, where the second inequality works because, on $\{T = k\}$, $E[X_n^2 \mid \mathcal{F}_k] \ge (E[X_n \mid \mathcal{F}_k])^2 = X_k^2$ by Jensen. Therefore, $P(\max_{1 \le k \le n} |X_k| \ge \lambda) \le E[X_n^2]/\lambda^2$, as desired. Note: an analogous bound with $E|X_n|/\lambda$ also works if the MG is in $L^1$.
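
An empirical sketch of Doob's maximal inequality (not from the notes) for the simple symmetric random walk, a martingale with $E[S_n^2] = n$; the parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n, paths, lam = 200, 50_000, 30.0
s = rng.choice([-1.0, 1.0], size=(paths, n)).cumsum(axis=1)
lhs = np.mean(np.abs(s).max(axis=1) >= lam)   # P(max_k |S_k| >= lam)
rhs = (s[:, -1] ** 2).mean() / lam**2         # E[S_n^2] / lam^2 ~ 200/900
print(f"{lhs:.4f} <= {rhs:.4f}")
```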

Theorem ($L^2$ Convergence): If $X_n$ is a martingale bounded in $L^2$, i.e. $\sup_n E[X_n^2] < \infty$, then $X_n$ converges in $L^2$ and a.s.

Proof: First, some facts about expectation and martingales. For any $m \le n$, $E[X_n^2] = E[X_m^2] + \sum_{k=m}^{n-1} E[(X_{k+1} - X_k)^2]$, thanks to the orthogonality theorem above. Therefore, $\sum_{k=1}^{\infty} E[(X_{k+1} - X_k)^2] \le \sup_n E[X_n^2] < \infty$. Notice that the tail of the right-hand sum shrinks as $m$ grows (thanks to the bound on the sum); in particular, $E[(X_n - X_m)^2] \to 0$ as $m, n \to \infty$, so the sequence is Cauchy in $L^2$ and converges in $L^2$. Second, to prove that $X_n$ converges a.s., we just have to show that the sequence is a.s. Cauchy, i.e., $P(\sup_{j,k \ge m} |X_j - X_k| > \epsilon) \to 0$ as $m \to \infty$. Note that $\sup_{j,k \ge m} |X_j - X_k| \le 2 \sup_{k \ge m} |X_k - X_m|$. Applying Doob to the martingale $(X_k - X_m)_{k \ge m}$, we get $P(\max_{m \le k \le n} |X_k - X_m| \ge \epsilon/2) \le \frac{4}{\epsilon^2} E[(X_n - X_m)^2]$. Taking $n \to \infty$, we end up with $P(\sup_{k \ge m} |X_k - X_m| \ge \epsilon/2) \le \frac{4}{\epsilon^2} \sum_{k \ge m} E[(X_{k+1} - X_k)^2]$. Using the first fact, the RHS goes to 0.

Theorem (Doob Maximal Inequality for Submartingales): For any non-negative submg $X_n$ and any $\lambda > 0$, $\lambda P\left(\max_{1 \le k \le n} X_k \ge \lambda\right) \le E\left[X_n 1_{\max_k X_k \ge \lambda}\right] \le E[X_n]$.

Proof: Define $T = \min\{k : X_k \ge \lambda\}$. Then $\lambda 1_{T \le n} \le X_T 1_{T \le n}$. Taking expectations on both sides, $\lambda P(T \le n) \le E(X_T 1_{T \le n}) \le E(X_n 1_{T \le n})$, where the second inequality uses the submg property $X_k \le E[X_n \mid \mathcal{F}_k]$ on $\{T = k\}$. Using the same reasoning as before, we get the above inequality.

Theorem ($L^1$ Convergence): If $X_n$ is a submg bounded in $L^1$, i.e. $\sup_n E|X_n| < \infty$, then $X_n$ converges a.s. to a finite limit.

Proof: Fix rationals $a < b$ and let $U_n(a,b)$ be the number of up-crossings of $[a,b]$ by time $n$, so $U_n(a,b) \uparrow U(a,b)$. By MCT and the Up-Crossing Inequality below, $E[U(a,b)] = \lim_n E[U_n(a,b)] \le \sup_n \frac{E[(X_n - a)^+]}{b - a} < \infty$, and thus $U(a,b) < \infty$ a.s. Hence, a.s. there are no rationals $a < b$ with $\liminf_n X_n < a < b < \limsup_n X_n$, so $\lim_n X_n$ exists a.s. (possibly infinite). Using Fatou again, $E\left|\lim_n X_n\right| \le \liminf_n E|X_n| < \infty$. Therefore, the limit is finite a.s.

*Theorem ($L^p$ Maximal Inequality): If $X_n$ is a non-negative martingale and $M_n = \max_{1 \le k \le n} X_k$, then $E[M_n^p] \le \left(\frac{p}{p-1}\right)^p E[X_n^p]$ for $p > 1$.

Proof: First, looking at the Doob proof, we know $\lambda P(M_n \ge \lambda) \le E[X_n 1_{M_n \ge \lambda}]$. Using the fact that $E[M_n^p] = \int_0^{\infty} p \lambda^{p-1} P(M_n \ge \lambda) \, d\lambda$ for any non-negative r.v., we get $E[M_n^p] \le \int_0^{\infty} p \lambda^{p-2} E[X_n 1_{M_n \ge \lambda}] \, d\lambda = \frac{p}{p-1} E[X_n M_n^{p-1}]$. Then, applying Holder's Inequality on the RHS with exponents $p$ and $p/(p-1)$, $E[M_n^p] \le \frac{p}{p-1} (E[X_n^p])^{1/p} (E[M_n^p])^{(p-1)/p}$. Divide through and take the $p$-th power and we're done.

Theorem (Up-Crossing Inequality): If $X_n$ is a submg, then $E[U_n(a,b)] \le \frac{E[(X_n - a)^+]}{b - a}$, where $U_n(a,b)$ is the number of times the process up-crosses from below $a$ to above $b$ by time $n$. Note: The idea is to show that the MG goes between $[a, b]$ only finitely many times, which forces convergence.

Proof: In Durrett. Very technical, so it is omitted here.

Theorem (Doob Stopping Time): If $X_n$ is a MG and $T$ is a stopping time, then $X_{T \wedge n}$ is a martingale.
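
A simulation sketch of the Doob Stopping Time theorem (not from the notes): stopping the symmetric random walk at the hypothetical $T = \min\{k : |S_k| \ge 10\}$ keeps the stopped process at mean zero:

```python
import numpy as np

rng = np.random.default_rng(6)
n, paths = 500, 20_000
s = rng.choice([-1.0, 1.0], size=(paths, n)).cumsum(axis=1)
hit = np.abs(s) >= 10
T = np.where(hit.any(axis=1), hit.argmax(axis=1), n - 1)  # index of first hit
for m in (10, 100, 499):
    stopped = s[np.arange(paths), np.minimum(T, m)]       # S_{T ^ m}
    print(f"E[S_(T^{m})] ~ {stopped.mean():+.4f}")        # all near 0
```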

Theorem (Wald's Identity): If the $X_i$ are i.i.d. with $E|X_1| < \infty$, and $T$ is a stopping time with $E[T] < \infty$, then $E[S_T] = E[T] \, E[X_1]$.

Theorem (Optional Sampling Theorem): If $X_n$ is a MG and $T$ is a bounded stopping time, then $E[X_T] = E[X_1]$.
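
A simulation sketch of Wald's Identity (not from the notes), with hypothetical choices $X_i \sim$ Exponential(1) (so $E[X_1] = 1$) and $T = \min\{n : S_n \ge 5\}$; both sides of the identity are estimated:

```python
import numpy as np

rng = np.random.default_rng(7)
paths, horizon = 20_000, 200                 # horizon is far beyond typical T
x = rng.exponential(1.0, size=(paths, horizon))
s = x.cumsum(axis=1)
T = (s >= 5).argmax(axis=1) + 1              # first passage time (1-indexed)
S_T = s[np.arange(paths), T - 1]
print("E[S_T]        ~", S_T.mean())         # both near 6 (overshoot included)
print("E[T] * E[X_1] ~", T.mean() * 1.0)
```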