Fundamentals of Probability, Hyunseung Kang

Definitions

Let $\Omega$ be a set that contains all possible outcomes of an experiment. A subset of $\Omega$ is an event, while a single point $\omega \in \Omega$ is a sample point. Next, we define a sigma field, which is a collection of events that is closed under complements and closed under countable unions.

Definition: A sigma field $\mathcal{F}$ of a set $\Omega$ is a collection of subsets of $\Omega$ where (i) $\Omega \in \mathcal{F}$, (ii) if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, and (iii) $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ for any $A_1, A_2, \ldots \in \mathcal{F}$.

$\mathcal{F}$ does not have to be unique. For example, for any set $\Omega$, $\{\emptyset, \Omega\}$ and $2^{\Omega}$ are both sigma fields. Also, the sigma field generated by a collection $\mathcal{A}$, denoted $\sigma(\mathcal{A})$, is the smallest sigma field which contains $\mathcal{A}$. To be the smallest sigma field means that $\sigma(\mathcal{A})$ is the intersection of all sigma fields that contain $\mathcal{A}$. For example, let $\mathcal{A}$ be the collection of open sets in $(0,1]$. Each element of $\sigma(\mathcal{A})$ is a Borel set and the field is aptly named the Borel sigma field; it is also known as the smallest field that contains all the open sets in $(0,1]$. As another example, consider $\sigma(X)$ where $X$ is a random variable (to be defined later). $\sigma(X)$ is the smallest sigma field $\mathcal{F}$ s.t. $X$ is $\mathcal{F}$-measurable; intuitively speaking, we call it the information of $X$.

Finally, we define a measure which assigns probability to each element in the field we constructed.

Definition: A function $P : \mathcal{F} \to [0,1]$ is a probability measure iff (i) $P(\Omega) = 1$ and (ii) $P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ whenever $A_1, A_2, \ldots \in \mathcal{F}$ are disjoint.

There are numerous properties we can derive using these definitions. We highlight the important ones.

Property (continuity from below): If $A_n \in \mathcal{F}$ and $A_n \uparrow A$, then $P(A_n) \uparrow P(A)$.

Property (continuity from above): If $A_n \in \mathcal{F}$ and $A_n \downarrow A$, then $P(A_n) \downarrow P(A)$.

Next, we define random variables (and naturally their expectations) by first introducing simple random variables.

Definition: A function $X = \sum_{i=1}^{n} x_i 1_{A_i}$ is a simple random variable if the $A_i$ are a disjoint partition of $\Omega$. A random variable is a function $X : \Omega \to \mathbb{R}$ where $\{\omega : X(\omega) \le x\} \in \mathcal{F}$ for all $x$.

Definition: The expected value of a simple random variable is $E[X] = \sum_{i=1}^{n} x_i P(A_i)$. The expected value of a general random variable is $E[X] = \int_{\Omega} X \, dP$. We approximate this integral as a limit of the expectations of bounded r.v.s, each of which is a limit of simple random variables.

There are a couple of properties of random variables that come up repeatedly. The most important ones are uniform boundedness/integrability of random variables and independence.

Definition: A sequence of random variables $X_n$ is uniformly bounded when $|X_n| \le M$ for all $n$, for some $M < \infty$.

Definition: A sequence of random variables $X_n$ is uniformly integrable if $\lim_{M \to \infty} \sup_n E\left(|X_n| 1_{|X_n| > M}\right) = 0$.

Definition: $X$ and $Y$ are independent iff $P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$ for all Borel sets $A$ and $B$.
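
As a small illustration of $\sigma(\mathcal{A})$ (not from the original notes): on a finite $\Omega$, the generated sigma field can be computed by brute-force closure under complements and unions. The helper name below is hypothetical.

```python
def generated_sigma_field(omega, collection):
    """Close a collection of subsets of a finite omega under complement
    and union (countable = finite here), yielding sigma(collection)."""
    sets = {frozenset(), frozenset(omega)} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            comp = frozenset(omega) - a          # closure under complement
            if comp not in sets:
                sets.add(comp); changed = True
            for b in current:                    # closure under union
                if (a | b) not in sets:
                    sets.add(a | b); changed = True
    return sets

F = generated_sigma_field({1, 2, 3, 4}, [{1}, {2}])
print(len(F))                  # 8: generated by the partition {1}, {2}, {3,4}
print(sorted(map(sorted, F)))
```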

Properties

An important concept to remember is whether a sequence of events occurs infinitely often. The famous theorems associated with this are the Borel-Cantelli Lemmas.

Definition: A sequence of events $A_n$ occurs infinitely often (i.o.) on the event $\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} \bigcup_{m \ge n} A_m = \limsup_n A_n$.

Theorem (First Borel-Cantelli Lemma): If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

Proof: Define $B_n = \bigcup_{m \ge n} A_m$. Then $B_n \downarrow \{A_n \text{ i.o.}\}$. Therefore, $P(A_n \text{ i.o.}) = \lim_n P(B_n) \le \lim_n \sum_{m \ge n} P(A_m) = 0$.

Theorem (Second Borel-Cantelli Lemma): If the $A_n$ are independent and $\sum_n P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$.

Proof: $\{A_n \text{ i.o.}\}^c = \bigcup_n \bigcap_{m \ge n} A_m^c$. Thus, we only have to show $P(\bigcap_{m \ge n} A_m^c) = 0$ for each $n$. Using $1 - x \le e^{-x}$, $P(\bigcap_{m=n}^{N} A_m^c) = \prod_{m=n}^{N} (1 - P(A_m)) \le \exp\left(-\sum_{m=n}^{N} P(A_m)\right) \to 0$ as $N \to \infty$.

In addition to i.o., we can establish inequalities for random variables.

Theorem (Markov's Inequality): If $X \ge 0$, then $P(X \ge \lambda) \le E[X]/\lambda$. When $X$ is not non-negative, we can use $|X|$.

Proof: Note that $\lambda 1_{X \ge \lambda} \le X$. Take expectations on both sides to complete the proof.

Theorem (Chebyshev's Inequality): For any random variable $X$ and $\lambda > 0$, $P(|X - E[X]| \ge \lambda) \le \mathrm{Var}(X)/\lambda^2$.

Proof: Note that $\lambda 1_{|X - E[X]| \ge \lambda} \le |X - E[X]|$. Square both sides and take expectations to complete the proof.

Theorem (Cauchy-Schwarz Inequality): $(E[XY])^2 \le E[X^2] \, E[Y^2]$.

Proof: We know $ab \le (a^2 + b^2)/2$. Let $a = X/\sqrt{E[X^2]}$ and $b = Y/\sqrt{E[Y^2]}$. Then $E[ab] \le 1$. Take expectations on both sides and multiply through by $\sqrt{E[X^2] E[Y^2]}$ to get the desired inequality.

Theorem (Holder's Inequality): If $1/p + 1/q = 1$ with $p, q > 1$, then $E|XY| \le (E|X|^p)^{1/p} (E|Y|^q)^{1/q}$.

Proof: From Young's Inequality, $ab \le a^p/p + b^q/q$. Let $a = |X|/(E|X|^p)^{1/p}$ and $b = |Y|/(E|Y|^q)^{1/q}$, and do the same trick as above.

Theorem (Kolmogorov Maximal Inequality): If $X_1, \ldots, X_n$ are indep. with $E[X_i] = 0$ and $E[X_i^2] < \infty$, then with $S_k = X_1 + \cdots + X_k$, $P\left(\max_{1 \le k \le n} |S_k| \ge \lambda\right) \le \mathrm{Var}(S_n)/\lambda^2$.

Proof: Define the stopping time $T = \min\{k : |S_k| \ge \lambda\}$. Then $\lambda^2 1_{T \le n} \le S_T^2 1_{T \le n}$. Taking expectations on both sides, $\lambda^2 P(T \le n) \le E(S_T^2 1_{T \le n}) \le E(S_n^2)$, where the key step $E(S_n^2 1_{T = k}) \ge E(S_k^2 1_{T = k})$ holds because the stopping time $T$ (and $S_k$) is independent of the future increments, so the cross term vanishes. Rearranging the terms, we get $P(\max_{1 \le k \le n} |S_k| \ge \lambda) \le E(S_n^2)/\lambda^2$.
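
A Monte Carlo sketch of the first Borel-Cantelli lemma (not from the notes), with the hypothetical choice of independent events $A_n$ with $P(A_n) = 1/n^2$, so $\sum_n P(A_n) < \infty$ and occurrences should stop early on every path:

```python
import numpy as np

rng = np.random.default_rng(0)
N, paths = 10_000, 1_000
n = np.arange(1, N + 1)
occurs = rng.random((paths, N)) < 1.0 / n**2       # indicators of A_1, ..., A_N

counts = occurs.sum(axis=1)                        # how many A_n occur per path
last = N - np.argmax(occurs[:, ::-1], axis=1)      # 1-indexed last occurrence
print("mean # of A_n occurring:", counts.mean())   # ~ sum 1/n^2 = pi^2/6 = 1.64
print("median last occurrence:", int(np.median(last)))  # small: finitely many occur
```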

*Theorem (Kolmogorov Zero-One Law): If the $X_n$ are indep. and $A \in \mathcal{T}$, then $P(A) = 0$ or $1$.

Note: $\mathcal{T} = \bigcap_n \sigma(X_n, X_{n+1}, \ldots)$ stands for the tail field, whose elements are tail events. Intuitively speaking, these are events determined by all the values of $X_1, X_2, \ldots$ but not determined by any finite subset of them (equivalently, probabilistically independent of any finite subset). Examples of tail events are $\{\sum_n X_n \text{ converges}\}$, $\{\limsup_n X_n \le c\}$, or $\{\lim_n S_n/n \text{ exists}\}$. Generally, it is hard to determine whether a given tail event has probability 0 or 1.

Proof: Since $P(A)$ must be 0 or 1, it suffices to prove $P(A) = P(A)^2$. First, let $A \in \mathcal{T} \subset \sigma(X_1, X_2, \ldots)$. Then, by the approximation of sets (below), there are $B_n \in \sigma(X_1, \ldots, X_n)$ s.t. $P(A \triangle B_n) \to 0$. Now, restrict attention to a fixed $n$. Since $A \in \sigma(X_{n+1}, X_{n+2}, \ldots)$ and the two sigma fields are built from non-overlapping, independent coordinates, $A$ and $B_n$ are independent: $P(A \cap B_n) = P(A) P(B_n)$. As $n \to \infty$, $P(A \cap B_n) \to P(A)$ and $P(B_n) \to P(A)$, which imply $P(A) = P(A)^2$, so $P(A) \in \{0, 1\}$.

Theorem (Berry-Esseen Theorem): If the $X_i$ are i.i.d. with $E[X_i] = 0$, $E[X_i^2] = \sigma^2$, and $\rho = E|X_i|^3 < \infty$, then $\sup_x \left| P\left(\frac{S_n}{\sigma\sqrt{n}} \le x\right) - \Phi(x) \right| \le \frac{C\rho}{\sigma^3 \sqrt{n}}$ for a universal constant $C$.

Theorem (Berry-Esseen Smoothing Inequality): For any two random variables $X, Y$ with cdfs $F, G$ and chfs $\phi, \psi$, where $G$ has a bounded derivative, there are universal constants $c_1, c_2$ s.t. for every $T > 0$, $\sup_x |F(x) - G(x)| \le c_1 \int_{-T}^{T} \left| \frac{\phi(t) - \psi(t)}{t} \right| dt + c_2 \frac{\sup_x |G'(x)|}{T}$.

We state other properties, mostly having to do with inequalities and expectation.

Theorem: If $X$ is a bounded r.v., then $E|X| < \infty$.

Theorem: If $X \le Y$, then $E[X] \le E[Y]$ (monotonicity of the integral).

Theorem: $|E[X]| \le E|X|$.

Theorem (Jensen's Inequality): If $\phi$ is a proper, convex function and $X$ is a r.v., then $\phi(E[X]) \le E[\phi(X)]$.

Theorem (Hoeffding's Inequality, Limited): If $E[X] = 0$ and $a \le X \le b$, then $E[e^{tX}] \le e^{t^2 (b-a)^2 / 8}$.

Theorem (Tail Bound for Normal(0,1)): If $Z \sim N(0,1)$ and $x > 0$, then $P(Z \ge x) \le \frac{1}{x\sqrt{2\pi}} e^{-x^2/2}$.

Theorem: For any sequence of events $A_n$ with $P(A_n) = 1$ for every $n$, $\bigcap_n A_n$ occurs a.s.

Theorem (Approximation of Sets/Approximation Lemma): If $\mathcal{F}_n$ is an increasing sequence of $\sigma$-fields and $A \in \sigma(\bigcup_n \mathcal{F}_n)$, then there are $B_n \in \mathcal{F}_n$ s.t. $P(A \triangle B_n) \to 0$, where $A \triangle B = (A \setminus B) \cup (B \setminus A)$ (aka the stuff not shared by the two sets $A$ and $B$).

Theorem (Kronecker's Lemma): For $0 < b_n \uparrow \infty$, if $\sum_n x_n / b_n$ converges, then $\frac{1}{b_n} \sum_{k=1}^{n} x_k \to 0$.

Proof: Let $s_n = \sum_{k=1}^{n} x_k / b_k$. Summation by parts gives $\frac{1}{b_n} \sum_{k=1}^{n} x_k = s_n - \frac{1}{b_n} \sum_{k=1}^{n-1} (b_{k+1} - b_k) s_k$. Note that the weighted average on the right tends to $\lim_n s_n$ as $n \to \infty$, since $s_n$ converges by our hypothesis. Taking limits on both sides, you get the result.

Theorem (Stirling's Formula): $n! \sim \sqrt{2\pi n} \, (n/e)^n$.
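
A quick numerical check of Stirling's formula (not from the notes); the ratio $n! / (\sqrt{2\pi n}\,(n/e)^n)$ should decrease toward 1:

```python
import math

for n in (1, 5, 10, 50, 100):
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, math.factorial(n) / stirling)
# 1.084..., 1.016..., 1.008..., 1.0016..., 1.0008... -> 1
```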

Theorem (Le Cam's Theorem): If $X_i \sim \text{Bernoulli}(p_i)$ are indep., $S_n = \sum_{i=1}^{n} X_i$, and $\lambda = \sum_{i=1}^{n} p_i$, then $\sum_{k=0}^{\infty} \left| P(S_n = k) - \frac{e^{-\lambda} \lambda^k}{k!} \right| \le 2 \sum_{i=1}^{n} p_i^2$. Note: This is saying that the sum of Bernoullis roughly has a Poisson distribution.

Convergence of Random Variables

First, we define the three different types of convergence encountered here.

Definition (convergence in dist.): A sequence of random variables $X_n$ (each possibly living on its own probability space) with cdfs $F_n$ converges in distribution to $X$ with cdf $F$, written $X_n \Rightarrow X$, if $F_n(x) \to F(x)$ at all continuity points $x$ of $F$.

Definition (convergence in prob.): $X_n$ converges in probability to $X$, written $X_n \to_p X$, if $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon > 0$.

Definition (convergence a.s.): $X_n$ converges almost surely to $X$ if $P(\lim_n X_n = X) = 1$.

Next, there are properties along with theorems related to these types of convergence.

Proposition: $X_n \to X$ a.s. implies $X_n \to_p X$.

Proof: Let $A_n(\epsilon) = \bigcup_{m \ge n} \{|X_m - X| > \epsilon\}$. Note that $A_n(\epsilon) \downarrow A(\epsilon)$ and $P(A(\epsilon)) = 0$ by a.s. convergence. Thus, $P(|X_n - X| > \epsilon) \le P(A_n(\epsilon)) \to 0$.

Proposition: If $X_n \to_p X$, then $X_n \Rightarrow X$.

Proposition: If $X_n \Rightarrow c$ for a constant $c$, then $X_n \to_p c$.

Proposition: Suppose $X_n \to_p X$. These are equivalent statements: (i) $\{X_n\}$ is uniformly integrable, (ii) $X_n \to X$ in $L^1$, and (iii) $E|X_n| \to E|X| < \infty$.

Proposition: If $X_n \to_p X$, then there is a subsequence $X_{n_k}$ s.t. $X_{n_k} \to X$ a.s.

Proof: Since $P(|X_n - X| > \epsilon) \to 0$ for every $\epsilon$, by defn. of limits, for each $k$ we can find $n_k$ s.t. $P(|X_{n_k} - X| > 1/k) \le 2^{-k}$. Since $\sum_k 2^{-k} < \infty$, by the first B.C. lemma, $P(|X_{n_k} - X| > 1/k \text{ i.o.}) = 0$, so $X_{n_k} \to X$ a.s., as desired.

Proposition: If every subsequence of $X_n$ has a further subsequence converging a.s. to $X$, then $X_n \to_p X$.

Proposition: If $X_n \to_p X$ and $f$ is continuous, then $f(X_n) \to_p f(X)$.

Theorem (Bounded Convergence Theorem): If $X_n$ is a sequence of uniformly bounded random variables defined on the same probability space and $X_n \to_p X$, then $E[X_n] \to E[X]$.

Proof: $|E[X_n] - E[X]| \le E|X_n - X| \le \epsilon + 2M P(|X_n - X| > \epsilon)$, where $M$ is the uniform bound. Let $n \to \infty$ and then $\epsilon \to 0$; this is the defn. of the limit.
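
A sketch (not from the notes) of the classic "moving blocks" example separating the modes of convergence above: $X_n$ is the indicator of a shrinking subinterval of $[0,1)$, so $X_n \to_p 0$, yet every $\omega$ sees $X_n = 1$ infinitely often, so $X_n \not\to 0$ a.s.; one $X_n$ per block gives an a.s.-convergent subsequence.

```python
import numpy as np

def block_interval(n):
    """Return (start, length) of the interval where X_n = 1 on [0, 1).
    Block k consists of k consecutive indices, each an interval of length 1/k."""
    k, first = 1, 1
    while first + k <= n:        # advance to the block containing index n
        first += k
        k += 1
    offset = n - first
    return offset / k, 1.0 / k

rng = np.random.default_rng(8)
u = rng.random(10_000)           # omega drawn uniformly from [0, 1)
for n in (10, 100, 1000, 5000):
    start, length = block_interval(n)
    xn = (u >= start) & (u < start + length)
    print(n, xn.mean())          # P(X_n = 1) = 1/k -> 0 even though each
                                 # omega is hit once per block, i.e. i.o.
```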

Theorem (Fatou's Lemma): If $X_n \ge 0$, then $E[\liminf_n X_n] \le \liminf_n E[X_n]$.

Proof: Fix a bounded r.v. $Z$ with $0 \le Z \le \liminf_n X_n$ and define $Z_n = \min(Z, X_n)$. Note that the $Z_n$ are uniformly bounded and $Z_n \to Z$. Then, by BCT, we get $E[Z] = \lim_n E[Z_n] \le \liminf_n E[X_n]$. Finally, take the sup over all such $Z$ and we end up with $E[\liminf_n X_n] \le \liminf_n E[X_n]$. Note that Fatou's Lemma is often used to show that the expectation of a random variable is bounded.

Theorem (Dominated Convergence Theorem): If $|X_n| \le Y$, $E[Y] < \infty$, and $X_n \to X$ (a.s. or in prob.), then $E[X_n] \to E[X]$.

Proof: Let $U_n = Y - X_n$ and $V_n = Y + X_n$, where both $U_n \ge 0$ and $V_n \ge 0$. By Fatou's Lemma applied to each, we get $\limsup_n E[X_n] \le E[X]$ and $\liminf_n E[X_n] \ge E[X]$. Combining them together, we get our result.

Theorem (Monotone Convergence Theorem): If $0 \le X_n \uparrow X$ (a.s. or in prob.), then $E[X_n] \uparrow E[X]$.

Proof: Using Fatou, we get $E[X] \le \liminf_n E[X_n]$. Also, since $X_n \le X$, taking expectations and then limsup on both sides, we get $\limsup_n E[X_n] \le E[X]$. Thus, we get our desired result.

There are several results on convergence of random series. Here, we state what's relevant for this section.

*Theorem (Kolmogorov One-Series Theorem): If the $X_n$ are independent with $E[X_n] = 0$ and $\sum_n \mathrm{Var}(X_n) < \infty$, then $\sum_n X_n$ converges a.s.

Proof: Use the Kolmogorov Maximal Inequality to show the partial sums are a.s. Cauchy.

Theorem (Kolmogorov Condensation Test): Suppose the $X_n$ are indep. and uniformly bounded. If $\sum_n X_n$ converges (a.s. or in prob.), then $\sum_n E[X_n]$ and $\sum_n \mathrm{Var}(X_n)$ converge.

Now, there are two very important types of laws of large numbers, the strong law and the weak law. The weak law is basically convergence in probability while the strong law is convergence a.s. From here on out, $S_n = X_1 + \cdots + X_n$.

Theorem (Weak Law of Large Numbers): If the $X_i$ are independent with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) \le C < \infty$, then $S_n/n \to_p \mu$.

Proof: Using Chebyshev, $P(|S_n/n - \mu| > \epsilon) \le C/(n\epsilon^2)$. Take the limit on both sides to obtain the result.

There are several flavors of the strong law of large numbers, each relaxing assumptions from the previous ones.

Theorem (Strong Law of Large Numbers, v1): If the $X_i$ are i.i.d. and bounded by $M$, with $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Pick the quadratic subsequence $n = k^2$. Then, we know from Chebyshev and the first BC lemma that $P(|S_{k^2}/k^2 - \mu| > \epsilon) \le C/(k^2 \epsilon^2)$ is summable, so $S_{k^2}/k^2 \to \mu$ w.p. 1. Now, for $k^2 \le n < (k+1)^2$, boundedness gives $|S_n/n - S_{k^2}/k^2| \le M \frac{(k+1)^2 - k^2}{k^2} \to 0$. Since the subsequence converges w.p. 1, $S_n/n \to \mu$ w.p. 1.
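
A simulation sketch of the weak law (not from the notes), using Exponential(1) draws (so $\mu = \sigma^2 = 1$) as an arbitrary example; the empirical exceedance frequencies shrink, consistent with Chebyshev's $1/(n\epsilon^2)$ bound:

```python
import numpy as np

rng = np.random.default_rng(1)
eps, paths = 0.1, 1_000
for n in (10, 100, 1_000, 10_000):
    means = rng.exponential(1.0, size=(paths, n)).mean(axis=1)
    freq = np.mean(np.abs(means - 1.0) > eps)
    print(f"n={n}: P(|S_n/n - mu| > {eps}) ~ {freq:.3f},"
          f" Chebyshev bound {1 / (n * eps**2):.3f}")
```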

Theorem (Strong Law of Large Numbers, v2): If the $X_i$ are i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$, then $S_n/n \to \mu$ a.s.

Proof: By the Kolmogorov One-Series Theorem, $\sum_k \mathrm{Var}((X_k - \mu)/k) = \sum_k \sigma^2/k^2 < \infty$, so $\sum_k (X_k - \mu)/k$ converges a.s. By Kronecker, this implies $\frac{1}{n} \sum_{k=1}^{n} (X_k - \mu) \to 0$ a.s.

Theorem (Strong Law of Large Numbers, v3): If the $X_i$ are i.i.d. with $E|X_i| < \infty$ and $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Truncate: let $Y_k = X_k 1_{|X_k| \le k}$ and split $S_n/n$ into the truncation error and the truncated average. For the first part, $\sum_k P(X_k \ne Y_k) = \sum_k P(|X_1| > k) \le E|X_1| < \infty$, so by the first BC lemma $X_k \ne Y_k$ only finitely often, and the error vanishes a.s. For the second part, a standard computation gives $\sum_k \mathrm{Var}(Y_k)/k^2 \le 4 E|X_1| < \infty$, so by Kolmogorov's One-Series Theorem $\sum_k (Y_k - E[Y_k])/k$ converges a.s. By Kronecker's Lemma, this implies $\frac{1}{n} \sum_{k=1}^{n} (Y_k - E[Y_k]) \to 0$ a.s.; since $E[Y_k] \to \mu$, we are done.

*Theorem (Strong Law of Large Numbers, v4): If the $X_i$ are pairwise independent and identically distributed with $E|X_i| < \infty$ and $E[X_i] = \mu$, then $S_n/n \to \mu$ a.s.

Proof: Requires Etemadi's argument and considerably more work.

Distribution Theory

We first define a characteristic function and state some properties associated with it. Note that for any r.v., the characteristic function always exists.

Definition: For any random variable $X$, the characteristic function $\phi_X$ of $X$ is defined as $\phi_X(t) = E[e^{itX}]$.

Properties: If $\phi$ is the characteristic function of $X$, then the following statements are true:
(i) $\phi(0) = 1$ and $|\phi(t)| \le 1$
(ii) $\phi$ is uniformly continuous (and hence, continuous)
(iii) If $|\phi(t_0)| = 1$ for some $t_0 \ne 0$, then $X$ is concentrated on a lattice, i.e., there is an atom in some place
(iv) $\phi$ is real iff $X$ and $-X$ are equal in distribution (aka $X$ is symmetric)
(v) If $\int |\phi(t)| \, dt < \infty$, we have a (bounded, continuous) pdf
(vi) If $E|X|^n < \infty$, then $\phi$ is $n$ times continuously differentiable with $\phi^{(n)}(0) = i^n E[X^n]$
(vii) If $\phi$ has $n$ derivatives at zero, then $X$ has moments up to order $n$ if $n$ is even and up to $n-1$ if $n$ is odd
(viii) (Levy's Uniqueness Theorem) If $X$ and $Y$ have the same chfs, then $X$ and $Y$ have the same distribution
(ix) (Levy's Inversion Formula) $P(a < X \le b) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it} \phi(t) \, dt$ for continuity points $a < b$; thus, the chf determines the cumulative distribution of $X$.

Theorem (Polya's Theorem): If $\phi$ is (i) symmetric with $\phi(0) = 1$, (ii) convex for $t > 0$, (iii) uniformly continuous, and (iv) non-increasing to 0 as $t \to \infty$, then $\phi$ is a characteristic function, said to be of Polya type.

In addition to characteristic functions, we define the concept of tightness and how it can be used to prove various properties of convergence for characteristic functions and their cdfs.

Definition: A sequence of cdfs $F_n$ is tight if $\lim_{M \to \infty} \sup_n \left[1 - F_n(M) + F_n(-M)\right] = 0$.

Definition: A sequence of r.v.s $X_n$ is tight if $\lim_{M \to \infty} \sup_n P(|X_n| > M) = 0$.
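
A Monte Carlo sketch of the chf definition (not from the notes) for $X \sim N(0,1)$, whose chf is known to be $e^{-t^2/2}$; the near-zero imaginary parts illustrate property (iv) for a symmetric distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)
for t in (0.0, 0.5, 1.0, 2.0):
    est = np.exp(1j * t * x).mean()     # Monte Carlo estimate of E[exp(itX)]
    print(f"t={t}: {est.real:+.4f}{est.imag:+.4f}i"
          f"  vs exact {np.exp(-t**2 / 2):.4f}")
```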

Theorem (Helly Selection Theorem): For a sequence of cdfs $F_n$, there is a subsequence $F_{n_k}$ and a right-continuous non-decreasing function $F$ s.t. $F_{n_k}(x) \to F(x)$ at all continuity points of $F$. (Note: Steele states $F$ is a distribution function only if $\{F_n\}$ is tight.)

Proof: By compactness of $[0,1]$, any sequence in $[0,1]$ has a convergent subsequence. Let $q_1, q_2, \ldots$ be an enumeration of the rationals. Then $F_n(q_1)$ has a subsequence that converges to some $G(q_1)$. Move on to $q_2$: the previous subsequence has a further subsequence along which $F_n(q_2) \to G(q_2)$. Repeating this diagonal argument through all the rationals, we end up with a single subsequence $F_{n_k}$ s.t. $F_{n_k}(q) \to G(q)$ for every rational $q$. Next, since each $F_{n_k}$ is a non-decreasing function, $G$ must also be non-decreasing on the rationals. Define $F(x) = \inf\{G(q) : q > x, \, q \text{ rational}\}$, which is right-continuous and non-decreasing. Finally, if $x$ is a continuity point of $F$, we can pick rationals $q_1 < x < q_2$ so that $F(q_2) - F(q_1) < \epsilon$. Since $F_{n_k}(q_1) \le F_{n_k}(x) \le F_{n_k}(q_2)$, we get $F_{n_k}(x) \to F(x)$, as desired.

Theorem (Tail Bound): For any random variable $X$ with chf $\phi$ and any $u > 0$, $P(|X| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt$.

Proof: By Fubini, $\frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt = E\left[\frac{1}{u} \int_{-u}^{u} (1 - e^{itX}) \, dt\right] = 2 E\left[1 - \frac{\sin(uX)}{uX}\right] \ge 2 E\left[\left(1 - \frac{1}{|uX|}\right) 1_{|uX| \ge 2}\right] \ge P(|X| \ge 2/u)$. Rearrange the inequalities and we're done.

Theorem (Tightness Lemma): If a sequence of chfs $\phi_n$ satisfies $\phi_n(t) \to \phi(t)$ for all $t$ and $\phi$ is continuous at zero, then the cdfs associated with the $\phi_n$ are tight. (Note: $\phi$ does not have to be a chf.)

Proof: (Let $F_n$ denote the cdf for each chf $\phi_n$, with r.v. $X_n$.) By the Tail Bound, $P(|X_n| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi_n(t)) \, dt$. Taking $\limsup_n$ on both sides and using bounded convergence in the integral, $\limsup_n P(|X_n| \ge 2/u) \le \frac{1}{u} \int_{-u}^{u} (1 - \phi(t)) \, dt$. We can control the right-hand side to be a very small number by taking $u$ small, because $\phi(t) \to \phi(0) = 1$ as $t \to 0$, where we use that $\phi$ is continuous at zero. Thus, the $F_n$ are tight.

Theorem (Levy Continuity Theorem): Given a sequence of chfs $\phi_n$ and corresponding cdfs $F_n$, if $\phi_n(t) \to \phi(t)$ for all $t$ and $\phi$ is continuous at zero, then (i) $\phi$ is a chf and (ii) $F_n \Rightarrow F$, where $F$ is the cdf corresponding to $\phi$.

Proof: By the Tightness Lemma, $\{F_n\}$ must be tight. By Helly, some subsequence $F_{n_k} \Rightarrow F$ where $F$ is a genuine cdf, and since the chfs converge, $\phi$ has to be the chf of $F$. Now, by contradiction, suppose $F_n$ does not converge to $F$: then there is a subsequence $F_{m_j}$ and a continuity point $x$ of $F$ with $F_{m_j}(x) \not\to F(x)$. But, by Helly and tightness, we can choose a further subsequence s.t. $F_{m_{j_l}} \Rightarrow G$ for some cdf $G$. Since the chfs converge to $\phi$, $G$ has chf $\phi$, and by uniqueness $G = F$. Hence $F_{m_{j_l}}(x) \to F(x)$, which is a contradiction.
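
A numerical check of the Tail Bound above (not from the notes) for $X \sim N(0,1)$, where both sides are computable since $\phi(t) = e^{-t^2/2}$:

```python
import numpy as np
from math import erfc, sqrt

for u in (0.5, 1.0, 2.0):
    t = np.linspace(-u, u, 100_001)
    bound = (1 - np.exp(-t**2 / 2)).mean() * 2   # (1/u) * integral over [-u, u]
    tail = erfc((2 / u) / sqrt(2))               # P(|Z| >= 2/u), exactly
    print(f"u={u}: P(|Z| >= {2 / u:.1f}) = {tail:.4f} <= {bound:.4f}")
```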

With these criteria, we can now prove the central limit theorem. There are many flavors of this as well, each relaxing the assumptions of the other.

Theorem (Central Limit Theorem, v1): If the $X_i$ are i.i.d. with $E[X_i] = 0$ and $E[X_i^2] = \sigma^2 < \infty$, then $S_n/(\sigma\sqrt{n}) \Rightarrow N(0,1)$.

Proof: We need three key ingredients:
1. From the Taylor series of $e^{ix}$, $\phi(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2)$. Take $t/(\sigma\sqrt{n})$ and we get $\phi(t/(\sigma\sqrt{n})) = 1 - \frac{t^2}{2n} + o(1/n)$.
2. By induction, we get $\left| \prod_{k=1}^{n} a_k - \prod_{k=1}^{n} b_k \right| \le \sum_{k=1}^{n} |a_k - b_k|$ where $|a_k| \le 1$ and $|b_k| \le 1$ (lemma 1, pg 358).
3. From the definition of the natural number $e$, $(1 + z_n/n)^n \to e^z$ whenever $z_n \to z$.

Since $S_n/(\sigma\sqrt{n})$ is represented by the characteristic function $(\phi(t/(\sigma\sqrt{n})))^n$, using (1) and (2) we get $\left| (\phi(t/(\sigma\sqrt{n})))^n - (1 - \frac{t^2}{2n})^n \right| \le n \left| \phi(t/(\sigma\sqrt{n})) - (1 - \frac{t^2}{2n}) \right|$, and by DCT applied to the Taylor remainder, the right-hand side goes to 0. By (3), $(1 - \frac{t^2}{2n})^n \to e^{-t^2/2}$. Since this is the chf of a standard normal, by the Levy Continuity Theorem, we're done.

Theorem (Central Limit Theorem, v2, Lindeberg-Feller): For each $n$, we have $X_{n,1}, \ldots, X_{n,n}$, independent within each row, where each row of the array may live on a different probability space. If $E[X_{n,k}] = 0$, $\sum_{k=1}^{n} E[X_{n,k}^2] \to \sigma^2$, and for every $\epsilon > 0$, $\sum_{k=1}^{n} E[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}] \to 0$, then $\sum_{k=1}^{n} X_{n,k} \Rightarrow N(0, \sigma^2)$. Note: The tail condition is called the Lindeberg Condition (LC).

Proof: WLOG, let $\sigma^2 = 1$, and write $\sigma_{n,k}^2 = E[X_{n,k}^2]$ and $\phi_{n,k}$ for the chf of $X_{n,k}$. Then, using (2) from v1, $\left| \prod_k \phi_{n,k}(t) - e^{-t^2/2} \right| \le \left| \prod_k \phi_{n,k}(t) - \prod_k \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) \right| + \left| \prod_k \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) - e^{-t^2/2} \right|$. The first absolute value is bounded, using (1) and (2) from v1, by $\sum_k E\left[\min(|t X_{n,k}|^3, 2|t X_{n,k}|^2)\right] \le |t|^3 \epsilon \sum_k \sigma_{n,k}^2 + 2 t^2 \sum_k E\left[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}\right]$, where the second sum is taken care of by the LC condition. Again, using the same trick where we Taylor expand $e^{-x}$ around zero, the second absolute value is bounded by $\sum_k \left| e^{-t^2 \sigma_{n,k}^2/2} - \left(1 - \frac{t^2 \sigma_{n,k}^2}{2}\right) \right| \le t^4 \sum_k \sigma_{n,k}^4 \le t^4 \left(\max_k \sigma_{n,k}^2\right) \sum_k \sigma_{n,k}^2$. To bound $\max_k \sigma_{n,k}^2$, note that $\sigma_{n,k}^2 \le \epsilon^2 + E[X_{n,k}^2 1_{|X_{n,k}| > \epsilon}]$, so LC forces $\max_k \sigma_{n,k}^2 \to 0$. Since the first and second summands are bounded by quantities that vanish (after sending $\epsilon \to 0$), we arrive at our result.

Finally, we have a pretty cool theorem showing that two random variables have identical distributions.

Theorem: If $X_n \Rightarrow X$ and $X_n \Rightarrow Y$, then $X$ and $Y$ are equal in distribution.
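
A simulation sketch of CLT v1 (not from the notes), using centered Uniform(-1,1) summands ($\sigma^2 = 1/3$, an arbitrary choice); the empirical cdf of the standardized sum is compared with $\Phi$:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n, paths, sigma = 500, 10_000, math.sqrt(1 / 3)
z = rng.uniform(-1, 1, size=(paths, n)).sum(axis=1) / (sigma * math.sqrt(n))

def norm_cdf(x):
    """Standard normal cdf Phi via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x}: empirical {np.mean(z <= x):.4f}  vs Phi {norm_cdf(x):.4f}")
```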

Martingales

First, note that we often work in $L^2$ (aka a Hilbert space), where all Cauchy sequences converge. Closely related to Hilbert space theory is conditional expectation.

Definition: If $E|X| < \infty$ and $\mathcal{G} \subset \mathcal{F}$ is a sigma field, then $Y$ is a conditional expectation of $X$ given $\mathcal{G}$, written $Y = E[X \mid \mathcal{G}]$, if (a) $Y$ is $\mathcal{G}$-measurable and (b) $\int_A Y \, dP = \int_A X \, dP$ for all $A \in \mathcal{G}$. Written another way, $E[(X - Y) 1_A] = 0$ for all $A \in \mathcal{G}$.

Properties: Take the notation above. Then, the following are true:
(i) (Existence) $E[X \mid \mathcal{G}]$ exists (via orthogonal projection in $L^2$, or Radon-Nikodym in general)
(ii) (Uniqueness) If $Y$ above exists, then $Y$ is a.s. unique.
(iii) (Linearity of Conditional Expectation) If $Y_i$ is the conditional expectation of $X_i$ given $\mathcal{G}$, then $a Y_1 + b Y_2$ is the conditional expectation of $a X_1 + b X_2$ given $\mathcal{G}$.
(iv) (Non-negativity) If $X \ge 0$, then $E[X \mid \mathcal{G}]$ is non-negative a.s.
(v) (Contraction) $E\left|E[X \mid \mathcal{G}]\right| \le E|X|$
(vi) (Tower Property) Given $\mathcal{G}_1 \subset \mathcal{G}_2$, $E\big[E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1\big] = E\big[E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2\big] = E[X \mid \mathcal{G}_1]$.

We highlight a convergence result for monotone sequences under conditional expectation.

Theorem (Monotone Convergence of Conditional Expectation): If $X_n \ge 0$, $X_n \uparrow X$ w.p. 1, and $E|X| < \infty$, then $E[X_n \mid \mathcal{G}] \uparrow E[X \mid \mathcal{G}]$ w.p. 1.

Proof: Since $E[X_n \mid \mathcal{G}] \le E[X_{n+1} \mid \mathcal{G}]$ and conditional expectation is a positive operator on the monotone sequence, there is a $\mathcal{G}$-measurable $Y$ s.t. $E[X_n \mid \mathcal{G}] \uparrow Y$ w.p. 1 (note $Y$ can be infinite). Next, using MCT and the defining property (b), for any $A \in \mathcal{G}$ we get $\int_A E[X_n \mid \mathcal{G}] \, dP = \int_A X_n \, dP \to \int_A X \, dP$. Finally, using MCT again on the left side, $\int_A Y \, dP = \int_A X \, dP$. Since conditional expectations are a.s. unique, $Y = E[X \mid \mathcal{G}]$ and we're done.

Now, onto the definition of a martingale.

Definition: Suppose we have a filtration $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ and $X_n$ adapted to $\mathcal{F}_n$ with $E|X_n| < \infty$. Then $X_n$ is a martingale iff $E[X_{n+1} \mid \mathcal{F}_n] = X_n$.

Definition: Suppose we have a filtration $\mathcal{F}_n$ and a random variable $Z$ where $E|Z| < \infty$. Then $X_n = E[Z \mid \mathcal{F}_n]$ is a martingale (called Levy's Martingale).

Definition: Suppose we have a filtration $\mathcal{F}_n$ and $X_n$ adapted with $E|X_n| < \infty$. Then $X_n$ is a submartingale iff $E[X_{n+1} \mid \mathcal{F}_n] \ge X_n$. Note that any convex function of a MG $X_n$ is a submg; if $X_n$ is a submg, any convex and monotonically non-decreasing function of it is a submg.

Now, there are a lot of ways in which martingales can converge. We highlight a couple along with the proofs.

Theorem: If $X_n$ is a martingale with $E[X_n^2] < \infty$, then the increments are orthogonal: for any $m \le n$, $E[(X_n - X_m) X_m] = 0$, and hence $E[X_n^2] = E[X_m^2] + E[(X_n - X_m)^2]$.

Proof: WLOG, $m < n$. Then $E[(X_n - X_m) X_m] = E\big[E[(X_n - X_m) X_m \mid \mathcal{F}_m]\big] = E\big[X_m \, E[X_n - X_m \mid \mathcal{F}_m]\big] = 0$.

Theorem (Doob Maximal Inequality): For any martingale $X_n$ in $L^2$ and $\lambda > 0$, $P\left(\max_{1 \le k \le n} |X_k| \ge \lambda\right) \le E[X_n^2]/\lambda^2$.

Proof: Let $T = \min\{k : |X_k| \ge \lambda\}$. Then $\lambda^2 1_{T \le n} \le X_T^2 1_{T \le n}$. Taking expectations on both sides, we get $\lambda^2 P(T \le n) \le E(X_T^2 1_{T \le n}) \le E(X_n^2)$, where the second inequality works because, on $\{T = k\}$, $E[X_n^2 \mid \mathcal{F}_k] \ge (E[X_n \mid \mathcal{F}_k])^2 = X_k^2$ by Jensen. Therefore, $P(\max_{1 \le k \le n} |X_k| \ge \lambda) \le E[X_n^2]/\lambda^2$, as desired. Note: an analogous bound with $E|X_n|/\lambda$ also works if the MG is in $L^1$.
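
An empirical sketch of Doob's maximal inequality (not from the notes) for the simple symmetric random walk, a martingale with $E[S_n^2] = n$; the parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n, paths, lam = 200, 50_000, 30.0
s = rng.choice([-1.0, 1.0], size=(paths, n)).cumsum(axis=1)
lhs = np.mean(np.abs(s).max(axis=1) >= lam)   # P(max_k |S_k| >= lam)
rhs = (s[:, -1] ** 2).mean() / lam**2         # E[S_n^2] / lam^2 ~ 200/900
print(f"{lhs:.4f} <= {rhs:.4f}")
```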

Theorem ($L^2$ Convergence): If $X_n$ is a martingale bounded in $L^2$, i.e. $\sup_n E[X_n^2] < \infty$, then $X_n$ converges in $L^2$ and a.s.

Proof: First, some facts about expectation and martingales. For any $m \le n$, $E[X_n^2] = E[X_m^2] + \sum_{k=m}^{n-1} E[(X_{k+1} - X_k)^2]$, thanks to the orthogonality theorem above. Therefore, $\sum_{k=1}^{\infty} E[(X_{k+1} - X_k)^2] \le \sup_n E[X_n^2] < \infty$. Notice that the tail of the right-hand sum shrinks as $m$ grows (thanks to the bound on the sum); in particular, $E[(X_n - X_m)^2] \to 0$ as $m, n \to \infty$, so the sequence is Cauchy in $L^2$ and converges in $L^2$. Second, to prove that $X_n$ converges a.s., we just have to show that the sequence is a.s. Cauchy, i.e., $P(\sup_{j,k \ge m} |X_j - X_k| > \epsilon) \to 0$ as $m \to \infty$. Note that $\sup_{j,k \ge m} |X_j - X_k| \le 2 \sup_{k \ge m} |X_k - X_m|$. Applying Doob to the martingale $(X_k - X_m)_{k \ge m}$, we get $P(\max_{m \le k \le n} |X_k - X_m| \ge \epsilon/2) \le \frac{4}{\epsilon^2} E[(X_n - X_m)^2]$. Taking $n \to \infty$, we end up with $P(\sup_{k \ge m} |X_k - X_m| \ge \epsilon/2) \le \frac{4}{\epsilon^2} \sum_{k \ge m} E[(X_{k+1} - X_k)^2]$. Using the first fact, the RHS goes to 0.

Theorem (Doob Maximal Inequality for Submartingales): For any non-negative submg $X_n$ and any $\lambda > 0$, $\lambda P\left(\max_{1 \le k \le n} X_k \ge \lambda\right) \le E\left[X_n 1_{\max_k X_k \ge \lambda}\right] \le E[X_n]$.

Proof: Define $T = \min\{k : X_k \ge \lambda\}$. Then $\lambda 1_{T \le n} \le X_T 1_{T \le n}$. Taking expectations on both sides, $\lambda P(T \le n) \le E(X_T 1_{T \le n}) \le E(X_n 1_{T \le n})$, where the second inequality uses the submg property $X_k \le E[X_n \mid \mathcal{F}_k]$ on $\{T = k\}$. Using the same reasoning as before, we get the above inequality.

Theorem ($L^1$ Convergence): If $X_n$ is a submg bounded in $L^1$, i.e. $\sup_n E|X_n| < \infty$, then $X_n$ converges a.s. to a finite limit.

Proof: Fix rationals $a < b$ and let $U_n(a,b)$ be the number of up-crossings of $[a,b]$ by time $n$, so $U_n(a,b) \uparrow U(a,b)$. By MCT and the Up-Crossing Inequality below, $E[U(a,b)] = \lim_n E[U_n(a,b)] \le \sup_n \frac{E[(X_n - a)^+]}{b - a} < \infty$, and thus $U(a,b) < \infty$ a.s. Hence, a.s. there are no rationals $a < b$ with $\liminf_n X_n < a < b < \limsup_n X_n$, so $\lim_n X_n$ exists a.s. (possibly infinite). Using Fatou again, $E\left|\lim_n X_n\right| \le \liminf_n E|X_n| < \infty$. Therefore, the limit is finite a.s.

*Theorem ($L^p$ Maximal Inequality): If $X_n$ is a non-negative martingale and $M_n = \max_{1 \le k \le n} X_k$, then $E[M_n^p] \le \left(\frac{p}{p-1}\right)^p E[X_n^p]$ for $p > 1$.

Proof: First, looking at the Doob proof, we know $\lambda P(M_n \ge \lambda) \le E[X_n 1_{M_n \ge \lambda}]$. Using the fact that $E[M_n^p] = \int_0^{\infty} p \lambda^{p-1} P(M_n \ge \lambda) \, d\lambda$ for any non-negative r.v., we get $E[M_n^p] \le \int_0^{\infty} p \lambda^{p-2} E[X_n 1_{M_n \ge \lambda}] \, d\lambda = \frac{p}{p-1} E[X_n M_n^{p-1}]$. Then, applying Holder's Inequality on the RHS with exponents $p$ and $p/(p-1)$, $E[M_n^p] \le \frac{p}{p-1} (E[X_n^p])^{1/p} (E[M_n^p])^{(p-1)/p}$. Divide through and take the $p$-th power and we're done.

Theorem (Up-Crossing Inequality): If $X_n$ is a submg, then $E[U_n(a,b)] \le \frac{E[(X_n - a)^+]}{b - a}$, where $U_n(a,b)$ is the number of times the process up-crosses from below $a$ to above $b$ by time $n$. Note: The idea is to show that the MG goes between $[a, b]$ only finitely many times, which forces convergence.

Proof: In Durrett. Very technical, so it is omitted here.

Theorem (Doob Stopping Time): If $X_n$ is a MG and $T$ is a stopping time, then $X_{T \wedge n}$ is a martingale.
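
A simulation sketch of the Doob Stopping Time theorem (not from the notes): stopping the symmetric random walk at the hypothetical $T = \min\{k : |S_k| \ge 10\}$ keeps the stopped process at mean zero:

```python
import numpy as np

rng = np.random.default_rng(6)
n, paths = 500, 20_000
s = rng.choice([-1.0, 1.0], size=(paths, n)).cumsum(axis=1)
hit = np.abs(s) >= 10
T = np.where(hit.any(axis=1), hit.argmax(axis=1), n - 1)  # index of first hit
for m in (10, 100, 499):
    stopped = s[np.arange(paths), np.minimum(T, m)]       # S_{T ^ m}
    print(f"E[S_(T^{m})] ~ {stopped.mean():+.4f}")        # all near 0
```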

Theorem (Wald's Identity): If the $X_i$ are i.i.d. with $E|X_1| < \infty$, and $T$ is a stopping time with $E[T] < \infty$, then $E[S_T] = E[T] \, E[X_1]$.

Theorem (Optional Sampling Theorem): If $X_n$ is a MG and $T$ is a bounded stopping time, then $E[X_T] = E[X_1]$.
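
A simulation sketch of Wald's Identity (not from the notes), with hypothetical choices $X_i \sim$ Exponential(1) (so $E[X_1] = 1$) and $T = \min\{n : S_n \ge 5\}$; both sides of the identity are estimated:

```python
import numpy as np

rng = np.random.default_rng(7)
paths, horizon = 20_000, 200                 # horizon is far beyond typical T
x = rng.exponential(1.0, size=(paths, horizon))
s = x.cumsum(axis=1)
T = (s >= 5).argmax(axis=1) + 1              # first passage time (1-indexed)
S_T = s[np.arange(paths), T - 1]
print("E[S_T]        ~", S_T.mean())         # both near 6 (overshoot included)
print("E[T] * E[X_1] ~", T.mean() * 1.0)
```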