Probability Theory
Florian Herzog, 2013
"A random variable is neither random nor variable." (Gian-Carlo Rota, M.I.T.)





Probability space

A probability space W is a triple W = {Ω, F, P}:
- Ω is its sample space,
- F its σ-algebra of events,
- P its probability measure.

Remarks:
(1) The sample space Ω is the set of all possible samples or elementary events ω: Ω = {ω | ω ∈ Ω}.
(2) The σ-algebra F is the set of all considered events A, i.e., subsets of Ω: F = {A | A ⊆ Ω, A ∈ F}.
(3) The probability measure P assigns a probability P(A) to every event A ∈ F: P : F → [0, 1].

Stochastic Systems, 2013
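For a finite sample space, the triple (Ω, F, P) can be written down explicitly. Below is a minimal illustrative sketch in Python (the names and helper are my own, not from the slides): two coin tosses with F = 2^Ω and the uniform measure.

```python
from itertools import chain, combinations

# Sample space: two tosses of a fair coin
omega = {"HH", "HT", "TH", "TT"}

# sigma-algebra: here the power set 2^Omega (every subset is an event)
def power_set(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = power_set(omega)

# Probability measure: P(A) = sum of elementary probabilities p(w) for w in A
p = {w: 0.25 for w in omega}
def P(event):
    return sum(p[w] for w in event)

# Basic requirements: P maps F into [0, 1] and P(Omega) = 1
assert all(0.0 <= P(A) <= 1.0 for A in F)
assert P(frozenset(omega)) == 1.0
```

The same pattern works for any finite Ω; only the elementary probabilities p change.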

Sample space

The sample space Ω is sometimes called the universe of all samples or possible outcomes ω.

Example 1. Sample spaces
- Toss of a coin (with head and tail): Ω = {H, T}.
- Two tosses of a coin: Ω = {HH, HT, TH, TT}.
- A cubic die: Ω = {ω_1, ω_2, ω_3, ω_4, ω_5, ω_6}.
- The positive integers: Ω = {1, 2, 3, ...}.
- The reals: Ω = {ω | ω ∈ R}.

Note that the ωs are a mathematical construct and have per se no real or scientific meaning. The ωs in the die example refer to the number of dots observed when the die is thrown.

Event

An event A is a subset of Ω. If the outcome ω of the experiment is in the subset A, then the event A is said to have occurred. The set of all subsets of the sample space is denoted by 2^Ω.

Example 2. Events
- Head in the coin toss: A = {H}.
- Odd number in the roll of a die: A = {ω_1, ω_3, ω_5}.
- An integer smaller than 5: A = {1, 2, 3, 4}, where Ω = {1, 2, 3, ...}.
- A real number between 0 and 1: A = [0, 1], where Ω = {ω | ω ∈ R}.

We denote the complementary event of A by A^c = Ω\A. When it is possible to determine whether an event A has occurred or not, we must also be able to determine whether A^c has occurred or not.
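Events and complements are plain set operations, so they can be illustrated directly (a small sketch with names of my own choosing):

```python
# Roll of a die, outcomes labelled 1..6
omega = {1, 2, 3, 4, 5, 6}

# Event: an odd number shows up
A = {1, 3, 5}

# Complementary event A^c = Omega \ A
A_c = omega - A

# If we can decide whether A occurred, we can decide whether A^c occurred
outcome = 4
occurred = outcome in A
assert (outcome in A_c) == (not occurred)

# A and A^c partition the sample space
assert A | A_c == omega and A & A_c == set()
```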

Probability Measure I

Definition 1. Probability measure
A probability measure P on the countable sample space Ω is a set function P : F → [0, 1] satisfying the following conditions:
- P(Ω) = 1.
- P({ω_i}) = p_i.
- If A_1, A_2, A_3, ... ∈ F are mutually disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Probability

The story so far:
- Sample space: Ω = {ω_1, ..., ω_n}, finite!
- Events: F = 2^Ω, all subsets of Ω.
- Probability: P(ω_i) = p_i, P(A ⊆ Ω) = Σ_{ω_i ∈ A} p_i.

Probability axioms of Kolmogorov (1933) for elementary probability:
- P(Ω) = 1.
- If A ⊆ Ω then P(A) ≥ 0.
- If A_1, A_2, A_3, ... ⊆ Ω are mutually disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Uncountable sample spaces

The most important uncountable sample space for engineering is R, resp. R^n. Consider the example Ω = [0, 1], where every ω is equally likely. Obviously, P({ω}) = 0. Intuitively, P([0, a]) = a; the basic concept is length!

Question: Does every subset of [0, 1] have a determinable length?
Answer: No! (e.g., Vitali sets, Banach-Tarski paradox)
Question: Is this of importance in practice?
Answer: No!
Question: Does it matter for the underlying theory?
Answer: A lot!

Fundamental mathematical tools

Not every subset of [0, 1] has a determinable length, so we collect the ones with a determinable length in F. Such a mathematical construct, which has additional, desirable properties, is called a σ-algebra.

Definition 2. σ-algebra
A collection F of subsets of Ω is called a σ-algebra on Ω if
- Ω ∈ F and ∅ ∈ F (∅ denotes the empty set),
- if A ∈ F then Ω\A = A^c ∈ F: the complementary subset of A is also in F,
- for all A_i ∈ F: ∪_{i=1}^∞ A_i ∈ F.

The intuition behind it: collect all events of interest in the σ-algebra F and make sure that performing countably many elementary set operations (∪, ∩, c) on elements of F yields again an element of F (closedness). The pair (Ω, F) is called a measurable space.
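On a finite Ω the three defining properties can be checked mechanically. A sketch (the helper is my own; events are stored as frozensets):

```python
def is_sigma_algebra(F, omega):
    """Check the sigma-algebra axioms for a finite collection F of frozensets on omega."""
    F = set(F)
    omega = frozenset(omega)
    # Omega and the empty set belong to F
    if omega not in F or frozenset() not in F:
        return False
    # Closed under complement
    if any(omega - A not in F for A in F):
        return False
    # Closed under union (finite suffices here, since F itself is finite)
    return all(A | B in F for A in F for B in F)

omega = {1, 2, 3, 4}
F1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}
assert is_sigma_algebra(F1, omega)

# Dropping a complement breaks the axioms
assert not is_sigma_algebra(F1 - {frozenset({3, 4})}, omega)
```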

Example of σ-algebra

Example 3. σ-algebras of two coin tosses
Ω = {HH, HT, TH, TT} = {ω_1, ω_2, ω_3, ω_4}
F_min = {∅, Ω} = {∅, {ω_1, ω_2, ω_3, ω_4}}.
F_1 = {∅, {ω_1, ω_2}, {ω_3, ω_4}, {ω_1, ω_2, ω_3, ω_4}}.
F_max = {∅, {ω_1}, {ω_2}, {ω_3}, {ω_4}, {ω_1, ω_2}, {ω_1, ω_3}, {ω_1, ω_4}, {ω_2, ω_3}, {ω_2, ω_4}, {ω_3, ω_4}, {ω_1, ω_2, ω_3}, {ω_1, ω_2, ω_4}, {ω_1, ω_3, ω_4}, {ω_2, ω_3, ω_4}, Ω}.

Generated σ-algebras

The concept of generated σ-algebras is important in probability theory.

Definition 3. σ(C): σ-algebra generated by a class C of subsets
Let C be a class of subsets of Ω. The σ-algebra generated by C, denoted by σ(C), is the smallest σ-algebra F which includes all elements of C, i.e., C ⊆ F.

Once we have identified the different events we can measure in an experiment (collected in a class A), we simply work with the σ-algebra generated by A and have avoided all the measure-theoretic technicalities.
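For a finite Ω, σ(C) can be computed by repeatedly closing C under complement and union until nothing new appears (intersections follow by De Morgan). An illustrative sketch, not an efficient algorithm:

```python
def generated_sigma_algebra(C, omega):
    """Smallest sigma-algebra on a finite omega containing all sets in C."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(A) for A in C}
    while True:
        # Close under complement and pairwise union; iterate to a fixed point
        new = {omega - A for A in F} | {A | B for A in F for B in F}
        if new <= F:
            return F
        F |= new

omega = {1, 2, 3, 4}
F = generated_sigma_algebra([{1}], omega)
# sigma({1}) = {empty set, {1}, {2, 3, 4}, omega}
assert F == {frozenset(), frozenset({1}), frozenset({2, 3, 4}), frozenset(omega)}
```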

Borel σ-algebra

The Borel σ-algebra includes all subsets of R which are of interest in practical applications (scientific or engineering).

Definition 4. Borel σ-algebra B(R)
The Borel σ-algebra B(R) is the smallest σ-algebra containing all open intervals in R. The sets in B(R) are called Borel sets. The extension to the multi-dimensional case, B(R^n), is straightforward.

Examples of Borel sets:
(−∞, a), (b, ∞), (−∞, a) ∪ (b, ∞),
[a, b] = ((−∞, a) ∪ (b, ∞))^c,
(−∞, a] = ∪_{n=1}^∞ [a − n, a] and [b, ∞) = ∪_{n=1}^∞ [b, b + n],
(a, b] = (−∞, b] ∩ (a, ∞),
{a} = ∩_{n=1}^∞ (a − 1/n, a + 1/n),
{a_1, ..., a_n} = ∪_{k=1}^n {a_k}.

Measure

Definition 5. Measure
Let F be a σ-algebra on Ω, so that (Ω, F) is a measurable space. The map µ : F → [0, ∞] is called a measure on (Ω, F) if µ is countably additive. The measure µ is countably additive (or σ-additive) if µ(∅) = 0 and for every sequence of disjoint sets (F_i : i ∈ N) in F with F = ∪_{i ∈ N} F_i we have
µ(F) = Σ_{i ∈ N} µ(F_i).
If µ is countably additive, it is also additive, meaning that for every F, G ∈ F we have
µ(F ∪ G) = µ(F) + µ(G) if and only if F ∩ G = ∅.
The triple (Ω, F, µ) is called a measure space.

Lebesgue Measure

The measure of length on the straight line is known as the Lebesgue measure.

Definition 6. Lebesgue measure on B(R)
The Lebesgue measure on B(R), denoted by λ, is defined as the measure on (R, B(R)) which assigns to each interval its length as measure.

Examples:
- Lebesgue measure of one point: λ({a}) = 0.
- Lebesgue measure of countably many points: λ(A) = Σ_{i=1}^∞ λ({a_i}) = 0.
- The Lebesgue measure of a set containing uncountably many points can be zero, positive and finite, or infinite.

Probability Measure

Definition 7. Probability measure
A probability measure P on the sample space Ω with σ-algebra F is a set function P : F → [0, 1] satisfying the following conditions:
- P(Ω) = 1.
- If A ∈ F then P(A) ≥ 0.
- If A_1, A_2, A_3, ... ∈ F are mutually disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
The triple (Ω, F, P) is called a probability space.

F-measurable functions

Definition 8. F-measurable function
The function f : Ω → R defined on (Ω, F, P) is called F-measurable if
f^{-1}(B) = {ω ∈ Ω : f(ω) ∈ B} ∈ F for all B ∈ B(R),
i.e., the inverse image f^{-1} maps all Borel sets B ⊂ R to F.

Sometimes it is easier to work with the following equivalent condition:
∀ y ∈ R : {ω ∈ Ω : f(ω) ≤ y} ∈ F.

This means that once we know the (random) value f(ω), we know which of the events in F have happened.
- F = {∅, Ω}: only constant functions are measurable.
- F = 2^Ω: all functions are measurable.

F-measurable functions

[Diagram: the random variable X : Ω → R maps the sample space Ω, with events A_1, ..., A_4 and F = σ(A_1, ..., A_n), to the real line; the inverse image X^{-1} maps Borel sets B ∈ B, e.g. (−∞, a], back to events in F. X: random variable, B: Borel σ-algebra.]

F-measurable functions: Example

Roll of a die: Ω = {ω_1, ω_2, ω_3, ω_4, ω_5, ω_6}. We only know whether an even or an odd number has shown up:
F = {∅, {ω_1, ω_3, ω_5}, {ω_2, ω_4, ω_6}, Ω} = σ({ω_1, ω_3, ω_5}).

Consider the following random variable:
f(ω) = 1, if ω = ω_1, ω_2, ω_3;
f(ω) = −1, if ω = ω_4, ω_5, ω_6.

Check measurability with the condition ∀ y ∈ R : {ω ∈ Ω : f(ω) ≤ y} ∈ F:
{ω ∈ Ω : f(ω) ≤ 0} = {ω_4, ω_5, ω_6} ∉ F,
so f is not F-measurable.
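The die example can be checked mechanically: for each value y, the sublevel set {ω : f(ω) ≤ y} must lie in F. A sketch (helper names are my own) reusing the slide's σ-algebra:

```python
omega = [1, 2, 3, 4, 5, 6]   # stand-ins for omega_1..omega_6
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
F = {frozenset(), odd, even, frozenset(omega)}

def f(w):
    # The slide's random variable: +1 on omega_1..omega_3, -1 on omega_4..omega_6
    return 1 if w in (1, 2, 3) else -1

def is_measurable(func, F, omega):
    # For finitely many function values it suffices to check y at those values
    for y in sorted({func(w) for w in omega}):
        if frozenset(w for w in omega if func(w) <= y) not in F:
            return False
    return True

assert not is_measurable(f, F, omega)   # as on the slide

# The parity indicator (+1 odd, -1 even) IS measurable w.r.t. F
assert is_measurable(lambda w: 1 if w % 2 else -1, F, omega)
```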

Lebesgue integral I

Definition 9. Lebesgue integral (simple and nonnegative functions)
Let (Ω, F) be a measurable space, µ : F → [0, ∞] a measure, and f : Ω → R an F-measurable function.
If f is a simple function, i.e., f(x) = c_i for all x ∈ A_i with c_i ∈ R, then
∫_Ω f dµ = Σ_{i=1}^n c_i µ(A_i).
If f is nonnegative, we can always construct a sequence of simple functions f_n with f_n(x) ≤ f_{n+1}(x) which converges to f: lim_{n→∞} f_n(x) = f(x). With this sequence, the Lebesgue integral is defined by
∫_Ω f dµ = lim_{n→∞} ∫_Ω f_n dµ.

Lebesgue integral II

Definition 10. Lebesgue integral (general case)
Let (Ω, F) be a measurable space, µ : F → [0, ∞] a measure, and f : Ω → R an F-measurable function. For an arbitrary measurable f, we write f = f⁺ − f⁻ with f⁺(x) = max(f(x), 0) and f⁻(x) = max(−f(x), 0), and then define
∫_Ω f dµ = ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ.
The integral above may be finite or infinite. It is not defined if ∫_Ω f⁺ dµ and ∫_Ω f⁻ dµ are both infinite.
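For a nonnegative f on [0, 1] with the Lebesgue measure, the standard approximating simple functions f_n = floor(2^n f)/2^n (capped at n) make the limit construction concrete. A numerical sketch of my own, where a uniform grid is used only to evaluate the sums (so this is an illustration, not a true measure-theoretic computation):

```python
def lebesgue_integral(f, n, grid=10**5):
    """Approximate the Lebesgue integral of a nonnegative f over [0, 1]
    using the simple function f_n = floor(2^n f)/2^n, capped at n.
    The grid is only used to evaluate f_n (illustration, not exact)."""
    total = 0.0
    for k in range(grid):
        x = (k + 0.5) / grid
        v = min(f(x), n)
        total += (int(v * 2**n) / 2**n) / grid   # value of f_n times cell length
    return total

# f(x) = x^2 on [0, 1]: the integrals of the simple functions increase towards 1/3
approx = [lebesgue_integral(lambda x: x * x, n) for n in (2, 4, 8)]
assert approx[0] <= approx[1] <= approx[2] <= 1 / 3 + 1e-9
assert abs(approx[2] - 1 / 3) < 1e-2
```

Since f_n ≤ f_{n+1} pointwise, the approximations increase monotonically, exactly as in the definition.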

Riemann vs. Lebesgue

The most important concept is that the Lebesgue integral is, like the Riemann integral, the limit of approximating sums: the Riemann integral partitions the domain (x-axis), whereas the Lebesgue integral partitions the range of f.

[Figure: for Ω = R, Riemann sums over vertical slices of the x-axis vs. Lebesgue sums over level sets of f.]

Riemann vs. Lebesgue integral

Theorem 1. Riemann-Lebesgue integral equivalence
Let f be a bounded function which is continuous on [x_1, x_2] except at a countable number of points. Then both the Riemann integral and the Lebesgue integral with the Lebesgue measure µ exist and are the same:
∫_{x_1}^{x_2} f(x) dx = ∫_{[x_1, x_2]} f dµ.
There are more functions which are Lebesgue integrable than Riemann integrable.

Popular example for Riemann vs. Lebesgue

Consider the function
f(x) = 0, if x ∈ Q; 1, if x ∈ R\Q.
The Riemann integral ∫_0^1 f(x) dx does not exist, since the lower and upper sums do not converge to the same value (every subinterval contains both rationals and irrationals). However, the Lebesgue integral
∫_{[0,1]} f dλ = 1
does exist, since f is the indicator function of R\Q and λ([0, 1] ∩ Q) = 0.

Random Variable

Definition 11. Random variable
A real-valued random variable X is an F-measurable function defined on a probability space (Ω, F, P), mapping its sample space Ω into the real line R: X : Ω → R. Since X is F-measurable, we have X^{-1} : B → F.

Distribution function

Definition 12. Distribution function
The distribution function of a random variable X, defined on a probability space (Ω, F, P), is defined by
F(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}).
From this, the probability measure of the half-open sets in R is
P(a < X ≤ b) = P({ω : a < X(ω) ≤ b}) = F(b) − F(a).
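For a discrete random variable, both formulas reduce to finite sums and can be tabulated exactly. A sketch for a fair die (X = number of dots; exact arithmetic with Fractions is my choice, not from the slides):

```python
from fractions import Fraction

# Fair die: X takes values 1..6, each with probability 1/6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for k, p in pmf.items() if k <= x)

def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)

assert F(6) == 1
assert prob_interval(2, 5) == Fraction(1, 2)   # X in {3, 4, 5}
```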

Density function

Closely related to the distribution function is the density function. Let f : R → R be a nonnegative function satisfying ∫_R f dλ = 1. The function f is called a density function (with respect to the Lebesgue measure), and the associated probability measure for a random variable X, defined on (Ω, F, P), is
P({ω : X(ω) ∈ A}) = ∫_A f dλ for all A ∈ B(R).

Important Densities I

Poisson density or probability mass function (λ > 0):
f(x) = (λ^x / x!) e^{−λ}, x = 0, 1, 2, ....
Univariate normal density:
f(x) = 1/(√(2π) σ) e^{−(1/2)((x−µ)/σ)²}.
The normal variable is abbreviated as N(µ, σ).
Multivariate normal density (x, µ ∈ R^n; Σ ∈ R^{n×n}):
f(x) = 1/√((2π)^n det(Σ)) e^{−(1/2)(x−µ)^T Σ^{−1}(x−µ)}.
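Both formulas can be sanity-checked numerically: the Poisson mass function should sum to 1 and the normal density should integrate to 1. A sketch using only the standard library (the helper names are my own):

```python
import math

def poisson_pmf(x, lam):
    return lam**x / math.factorial(x) * math.exp(-lam)

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# Poisson mass sums to ~1 (the truncated tail is negligible for lambda = 3)
assert abs(sum(poisson_pmf(k, 3.0) for k in range(60)) - 1.0) < 1e-12

# Normal density integrates to ~1 (midpoint rule over +-8 sigma)
n, a, b = 20000, -8.0, 8.0
h = (b - a) / n
integral = sum(normal_pdf(a + (i + 0.5) * h, 0.0, 1.0) for i in range(n)) * h
assert abs(integral - 1.0) < 1e-6
```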

Important Densities II

Univariate student-t density with ν degrees of freedom (x, µ ∈ R; σ ∈ R):
f(x) = Γ((ν+1)/2) / (Γ(ν/2) √(πν) σ) · (1 + (1/ν)(x−µ)²/σ²)^{−(ν+1)/2}.
Multivariate student-t density with ν degrees of freedom (x, µ ∈ R^n; Σ ∈ R^{n×n}):
f(x) = Γ((ν+n)/2) / (Γ(ν/2) √((πν)^n det(Σ))) · (1 + (1/ν)(x−µ)^T Σ^{−1}(x−µ))^{−(ν+n)/2}.

Important Densities III

The chi-square distribution with degree-of-freedom (dof) n has the density
f(x) = e^{−x/2} x^{n/2 − 1} / (2^{n/2} Γ(n/2)),
which is abbreviated as Z ∼ χ²(n) and where Γ denotes the gamma function. A chi-square distributed random variable Y is created by
Y = Σ_{i=1}^n X_i²,
where the X_i are independent standard normally distributed random variables N(0, 1).
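The construction Y = Σ X_i² can be illustrated by simulation: a χ²(n) variable has mean n and variance 2n. A seeded sketch of my own (the tolerances are generous to keep the check robust):

```python
import random
random.seed(0)

def chi2_sample(n):
    """One chi-square(n) draw as a sum of n squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))

n, draws = 5, 20000
ys = [chi2_sample(n) for _ in range(draws)]
mean = sum(ys) / draws
var = sum((y - mean) ** 2 for y in ys) / draws

assert abs(mean - n) < 0.2         # E[Y] = n
assert abs(var - 2 * n) < 1.0      # var(Y) = 2n
```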

Important Densities IV

A standard student-t distributed random variable Y with ν degrees of freedom is generated by
Y = X / √(Z/ν),
where X ∼ N(0, 1) and Z ∼ χ²(ν). Another important density is that of the Laplace distribution:
p(x) = 1/(2σ) e^{−|x−µ|/σ},
with mean µ and diffusion σ. The variance of this distribution is 2σ².

Expectation & Variance

Definition 13. Expectation of a random variable
The expectation of a random variable X, defined on a probability space (Ω, F, P), is defined by
E[X] = ∫_Ω X dP = ∫_R x f dλ (the second equality when X has density f).
With this definition at hand, it does not matter what the sample space Ω is. The calculations for the two familiar cases of a finite Ω and of Ω ⊆ R with continuous random variables remain the same.

Definition 14. Variance of a random variable
The variance of a random variable X, defined on a probability space (Ω, F, P), is defined by
var(X) = E[(X − E[X])²] = ∫_Ω (X − E[X])² dP = E[X²] − E[X]².
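On a finite Ω both integrals reduce to weighted sums, and the identity var(X) = E[X²] − E[X]² can be checked exactly. A sketch using Fractions (my choice) to avoid rounding:

```python
from fractions import Fraction

# Fair die: P({w_i}) = 1/6, X(w_i) = i
P = {i: Fraction(1, 6) for i in range(1, 7)}
X = {i: i for i in range(1, 7)}

E = sum(P[w] * X[w] for w in P)                 # E[X] = integral of X dP
E2 = sum(P[w] * X[w] ** 2 for w in P)           # E[X^2]
var = sum(P[w] * (X[w] - E) ** 2 for w in P)    # E[(X - E[X])^2]

assert E == Fraction(7, 2)
assert var == E2 - E**2 == Fraction(35, 12)
```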

Normally distributed random variables

The shorthand notation X ∼ N(µ, σ²) for normally distributed random variables with parameters µ and σ is often found in the literature. The following properties are useful when dealing with normally distributed random variables:
- If X ∼ N(µ, σ²) and Y = aX + b, then Y ∼ N(aµ + b, a²σ²).
- If X_1 ∼ N(µ_1, σ_1²) and X_2 ∼ N(µ_2, σ_2²), then X_1 + X_2 ∼ N(µ_1 + µ_2, σ_1² + σ_2²) (if X_1 and X_2 are independent).
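The first closure property can be illustrated by simulation: if X ∼ N(µ, σ²), then aX + b has mean aµ + b and variance a²σ². A seeded sketch of my own:

```python
import random
random.seed(1)

mu, sigma, a, b = 2.0, 3.0, -1.5, 4.0
xs = [random.gauss(mu, sigma) for _ in range(50000)]
ys = [a * x + b for x in xs]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)

assert abs(mean - (a * mu + b)) < 0.1       # a*mu + b = 1.0
assert abs(var - a**2 * sigma**2) < 0.5     # a^2 * sigma^2 = 20.25
```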

Conditional Expectation I

From elementary probability theory (Bayes' rule):
P(A|B) = P(A ∩ B)/P(B) = P(B|A) P(A)/P(B), P(B) > 0.

[Venn diagram: events A and B with intersection A ∩ B inside Ω.]

E(X|B) = E(X 1_B)/P(B), P(B) > 0.

Conditional Expectation II

General case: (Ω, F, P).

Definition 15. Conditional expectation
Let X be a random variable defined on the probability space (Ω, F, P) with E[|X|] < ∞. Furthermore, let G be a sub-σ-algebra of F (G ⊆ F). Then there exists a random variable Y with the following properties:
1. Y is G-measurable.
2. E[|Y|] < ∞.
3. For all sets G ∈ G we have
∫_G Y dP = ∫_G X dP.
The random variable Y = E[X|G] is called the conditional expectation.

Conditional Expectation: Example

Y = E[X|G] is a piecewise constant approximation of X: on each cell of the partition generating G, Y takes the average value of X over that cell.

[Figure: X(ω) and the piecewise constant Y = E[X|G] over the cells G_1, ..., G_5.]

For the trivial σ-algebra {∅, Ω}:
Y = E[X|{∅, Ω}] = ∫_Ω X dP = E[X].

Conditional Expectation: Properties

- E(E(X|F)) = E(X).
- If X is F-measurable, then E(X|F) = X.
- Linearity: E(αX_1 + βX_2|F) = αE(X_1|F) + βE(X_2|F).
- Positivity: If X ≥ 0 almost surely, then E(X|F) ≥ 0.
- Tower property: If G is a sub-σ-algebra of F, then E(E(X|F)|G) = E(X|G).
- Taking out what is known: If Z is G-measurable, then E(ZX|G) = Z E(X|G).
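On a finite Ω with G generated by a partition, E[X|G] is simply the average of X over the cell containing ω; the defining property ∫_G Y dP = ∫_G X dP and the identity E(E(X|G)) = E(X) can then be verified directly. A sketch with a uniform P and a partition of my own choosing:

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
X = {w: w * w for w in omega}            # some random variable on omega

# G generated by the partition {odd, even}
partition = [(1, 3, 5), (2, 4, 6)]

def cond_exp(X, partition):
    """E[X|G] as a function on omega: the P-weighted average of X over each cell."""
    Y = {}
    for cell in partition:
        pc = sum(P[w] for w in cell)
        avg = sum(P[w] * X[w] for w in cell) / pc
        for w in cell:
            Y[w] = avg
    return Y

Y = cond_exp(X, partition)

# Defining property: the integrals over each cell G agree
for cell in partition:
    assert sum(P[w] * Y[w] for w in cell) == sum(P[w] * X[w] for w in cell)

# E(E(X|G)) = E(X)
assert sum(P[w] * Y[w] for w in omega) == sum(P[w] * X[w] for w in omega)
```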

Summary

- σ-algebra: collection of the events of interest, closed under elementary set operations.
- Borel σ-algebra: all the events of practical importance in R.
- Lebesgue measure: defined as the length of an interval.
- Density: transforms the Lebesgue measure into a probability measure.
- Measurable function: the σ-algebra of the probability space is rich enough.
- Random variable X: a measurable function X : Ω → R.
- Expectation, variance.
- Conditional expectation: a piecewise constant approximation of the underlying random variable.