Probability Review Lecture Notes

Brief Review of Basic Probability

I assume you know basic probability. Chapters 1-3 are a review. I will assume you have read and understood Chapters 1-3. Here is a very quick review.

1.1 Random Variables

A random variable is a map $X$ from a set $\Omega$ (equipped with a probability $P$) to $\mathbb{R}$. We write
$$P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})$$
and we write $X \sim P$ to mean that $X$ has distribution $P$. The cumulative distribution function (cdf) of $X$ is
$$F_X(x) = F(x) = P(X \leq x).$$
If $X$ is discrete, its probability mass function (pmf) is
$$p_X(x) = p(x) = P(X = x).$$
If $X$ is continuous, then its probability density function (pdf) satisfies
$$P(X \in A) = \int_A p_X(x)\,dx = \int_A p(x)\,dx$$
and $p_X(x) = p(x) = F'(x)$.

The following are all equivalent: $X \sim P$, $X \sim F$, $X \sim p$.

Suppose that $X \sim P$ and $Y \sim Q$. We say that $X$ and $Y$ have the same distribution if $P(X \in A) = Q(Y \in A)$ for all $A$. In that case we say that $X$ and $Y$ are equal in distribution and we write $X \overset{d}{=} Y$.

Lemma 1 $X \overset{d}{=} Y$ if and only if $F_X(t) = F_Y(t)$ for all $t$.
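The relationship between the pdf and the cdf can be checked numerically. The following is a minimal sketch, added here for illustration only (it is not part of the original notes); it assumes SciPy is available, and the standard normal is an arbitrary choice of $P$.

```python
# Minimal sketch (illustrative addition): check that P(X in A) equals the integral
# of the pdf over A for a continuous X, using the standard normal as the example P.
from scipy import stats
from scipy.integrate import quad

a, b = -1.0, 2.0                                        # the event A = [a, b]
prob_from_cdf = stats.norm.cdf(b) - stats.norm.cdf(a)   # F(b) - F(a)
prob_from_pdf, _ = quad(stats.norm.pdf, a, b)           # integral of p(x) over A

print(prob_from_cdf, prob_from_pdf)                     # the two should agree closely
```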
1.2 Expected Values

The mean or expected value of $g(X)$ is
$$E(g(X)) = \int g(x)\,dF(x) = \int g(x)\,dP(x) = \begin{cases} \int g(x)\,p(x)\,dx & \text{if } X \text{ is continuous} \\ \sum_j g(x_j)\,p(x_j) & \text{if } X \text{ is discrete.} \end{cases}$$

Recall that:

1. $E\left(\sum_{j=1}^k c_j g_j(X)\right) = \sum_{j=1}^k c_j E(g_j(X))$.

2. If $X_1, \ldots, X_n$ are independent then
$$E\left(\prod_{i=1}^n X_i\right) = \prod_i E(X_i).$$

3. We often write $\mu = E(X)$.

4. $\sigma^2 = \mathrm{Var}(X) = E((X - \mu)^2)$ is the variance.

5. $\mathrm{Var}(X) = E(X^2) - \mu^2$.

6. If $X_1, \ldots, X_n$ are independent then
$$\mathrm{Var}\left(\sum_{i=1}^n a_i X_i\right) = \sum_i a_i^2 \,\mathrm{Var}(X_i).$$

7. The covariance is
$$\mathrm{Cov}(X, Y) = E((X - \mu_X)(Y - \mu_Y)) = E(XY) - \mu_X \mu_Y$$
and the correlation is $\rho(X, Y) = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y)$. Recall that $-1 \leq \rho(X, Y) \leq 1$.

The conditional expectation of $Y$ given $X$ is the random variable $E(Y \mid X)$ whose value, when $X = x$, is
$$E(Y \mid X = x) = \int y\, p(y \mid x)\,dy$$
where $p(y \mid x) = p(x, y)/p_X(x)$.
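Two of the facts above (the variance rule for independent sums, and the covariance formula) can be verified quickly by Monte Carlo. This is an illustrative addition, not part of the notes; the choice of distributions and the NumPy setup are assumptions.

```python
# Monte Carlo sketch (illustrative): check the variance and covariance identities
# for independent X1 ~ N(0, 1) and X2 ~ Exponential(mean 2).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.normal(0.0, 1.0, n)            # Var(X1) = 1
x2 = rng.exponential(2.0, n)            # Var(X2) = 4
a1, a2 = 3.0, -2.0

# Var(a1*X1 + a2*X2) should be close to a1^2*Var(X1) + a2^2*Var(X2) = 9 + 16 = 25
print(np.var(a1 * x1 + a2 * x2), a1**2 * 1 + a2**2 * 4)

# Cov(X1, X2) = E(X1*X2) - E(X1)E(X2) should be close to 0 under independence
print(np.mean(x1 * x2) - np.mean(x1) * np.mean(x2))
```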
The Law of Total Expectation or Law of Iterated Expectation:
$$E(Y) = E\left[E(Y \mid X)\right] = \int E(Y \mid X = x)\, p_X(x)\,dx.$$
The Law of Total Variance is
$$\mathrm{Var}(Y) = \mathrm{Var}\left[E(Y \mid X)\right] + E\left[\mathrm{Var}(Y \mid X)\right].$$

The moment generating function (mgf) is
$$M_X(t) = E\left(e^{tX}\right).$$
If $M_X(t) = M_Y(t)$ for all $t$ in an interval around 0 then $X \overset{d}{=} Y$.

Exercise: show that $M_X^{(n)}(t)\big|_{t=0} = E(X^n)$.

1.3 Transformations

Let $Y = g(X)$. Then
$$F_Y(y) = \mathbb{P}(Y \leq y) = \mathbb{P}(g(X) \leq y) = \int_{A_y} p_X(x)\,dx$$
where $A_y = \{x : g(x) \leq y\}$. Then $p_Y(y) = F_Y'(y)$.

If $g$ is monotonic, then
$$p_Y(y) = p_X(h(y)) \left|\frac{dh(y)}{dy}\right|$$
where $h = g^{-1}$.

Example 2 Let $p_X(x) = e^{-x}$ for $x > 0$. Hence $F_X(x) = 1 - e^{-x}$. Let $Y = g(X) = \log X$. Then
$$F_Y(y) = P(Y \leq y) = P(\log(X) \leq y) = P(X \leq e^y) = F_X(e^y) = 1 - e^{-e^y}$$
and $p_Y(y) = e^y e^{-e^y}$ for $y \in \mathbb{R}$.

Example 3 Practice problem. Let $X$ be uniform on $(-1, 2)$ and let $Y = X^2$. Find the density of $Y$.
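Example 2 can also be checked by simulation. Here is a rough sketch, added for illustration only (it assumes NumPy; the grid of test points is arbitrary): draw exponentials, take logs, and compare a histogram density estimate with the closed form $p_Y(y) = e^y e^{-e^y}$.

```python
# Simulation sketch (illustrative) for Example 2: Y = log(X) with X ~ Exponential(1)
# should have density p_Y(y) = exp(y) * exp(-exp(y)).
import numpy as np

rng = np.random.default_rng(0)
y = np.log(rng.exponential(1.0, 1_000_000))

# Compare a histogram estimate of the density with the formula at a few points.
pts = np.array([-2.0, -1.0, 0.0, 1.0])
hist, edges = np.histogram(y, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for t in pts:
    est = hist[np.argmin(np.abs(centers - t))]   # histogram height nearest to t
    print(t, est, np.exp(t) * np.exp(-np.exp(t)))
```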
Let $Z = g(X, Y)$. For example, $Z = X + Y$ or $Z = X/Y$. Then we find the pdf of $Z$ as follows:

1. For each $z$, find the set $A_z = \{(x, y) : g(x, y) \leq z\}$.

2. Find the CDF
$$F_Z(z) = P(Z \leq z) = P(g(X, Y) \leq z) = P(\{(x, y) : g(x, y) \leq z\}) = \int\!\!\int_{A_z} p_{X,Y}(x, y)\,dx\,dy.$$

3. The pdf is $p_Z(z) = F_Z'(z)$.

Example 4 Practice problem. Let $(X, Y)$ be uniform on the unit square. Let $Z = X/Y$. Find the density of $Z$.

1.4 Independence

$X$ and $Y$ are independent if and only if
$$\mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B)$$
for all $A$ and $B$.

Theorem 5 Let $(X, Y)$ be a bivariate random vector with $p_{X,Y}(x, y)$. $X$ and $Y$ are independent iff $p_{X,Y}(x, y) = p_X(x)\, p_Y(y)$.

$X_1, \ldots, X_n$ are independent if and only if
$$\mathbb{P}(X_1 \in A_1, \ldots, X_n \in A_n) = \prod_{i=1}^n \mathbb{P}(X_i \in A_i).$$
Thus, $p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^n p_{X_i}(x_i)$.

If $X_1, \ldots, X_n$ are independent and identically distributed we say they are iid (or that they are a random sample) and we write
$$X_1, \ldots, X_n \sim P \quad \text{or} \quad X_1, \ldots, X_n \sim F \quad \text{or} \quad X_1, \ldots, X_n \sim p.$$

1.5 Important Distributions

$X \sim N(\mu, \sigma^2)$ if
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$
If $X \in \mathbb{R}^d$ then $X \sim N(\mu, \Sigma)$ if
$$p(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1}(x - \mu)\right).$$
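The multivariate normal density formula can be evaluated directly and compared against a library implementation. This is an illustrative addition (not part of the notes), assuming NumPy and SciPy; the particular $\mu$, $\Sigma$, and evaluation point are arbitrary.

```python
# Sketch (illustrative addition): evaluate the N(mu, Sigma) density formula above
# by hand and compare with scipy's multivariate_normal implementation.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

d = len(mu)
diff = x - mu
quad_form = diff @ np.linalg.inv(Sigma) @ diff
by_formula = np.exp(-0.5 * quad_form) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

print(by_formula, multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should match
```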
$X \sim \chi^2_p$ if $X = \sum_{j=1}^p Z_j^2$ where $Z_1, \ldots, Z_p \sim N(0, 1)$ iid.

$X \sim \mathrm{Bernoulli}(\theta)$ if $\mathbb{P}(X = 1) = \theta$ and $\mathbb{P}(X = 0) = 1 - \theta$, and hence
$$p(x) = \theta^x (1 - \theta)^{1-x}, \quad x = 0, 1.$$

$X \sim \mathrm{Binomial}(n, \theta)$ if
$$p(x) = \mathbb{P}(X = x) = \binom{n}{x}\theta^x (1 - \theta)^{n-x}, \quad x \in \{0, \ldots, n\}.$$

$X \sim \mathrm{Uniform}(0, \theta)$ if $p(x) = I(0 \leq x \leq \theta)/\theta$.

1.6 Sample Mean and Variance

The sample mean is
$$\bar{X} = \frac{1}{n}\sum_i X_i,$$
and the sample variance is
$$S^2 = \frac{1}{n-1}\sum_i (X_i - \bar{X})^2.$$
Let $X_1, \ldots, X_n$ be iid with $\mu = E(X_i)$ and $\sigma^2 = \mathrm{Var}(X_i)$. Then
$$E(\bar{X}) = \mu, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad E(S^2) = \sigma^2.$$

Theorem 6 If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ then

(a) $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

(b) $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

(c) $\bar{X}$ and $S^2$ are independent

1.7 Delta Method

If $X \sim N(\mu, \sigma^2)$, $Y = g(X)$ and $\sigma^2$ is small, then
$$Y \approx N\left(g(\mu), \sigma^2 (g'(\mu))^2\right).$$
To see this, note that
$$Y = g(X) = g(\mu) + (X - \mu)g'(\mu) + \frac{(X - \mu)^2}{2} g''(\xi)$$
for some $\xi$. Now $E((X - \mu)^2) = \sigma^2$, which we are assuming is small, and so
$$Y = g(X) \approx g(\mu) + (X - \mu)g'(\mu).$$
Hence,
$$E(Y) \approx g(\mu), \qquad \mathrm{Var}(Y) \approx (g'(\mu))^2 \sigma^2.$$
Thus,
$$g(X) \approx N\left(g(\mu), (g'(\mu))^2 \sigma^2\right).$$
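The delta method approximation is easy to check by simulation. A small sketch follows, added for illustration only (it assumes NumPy; $g(x) = e^x$ and the values of $\mu$, $\sigma$ are arbitrary choices).

```python
# Monte Carlo sketch (illustrative) of the delta method: for X ~ N(mu, sigma^2) with
# small sigma and g(x) = exp(x), compare the simulated mean and variance of g(X)
# with the approximations g(mu) and (g'(mu))^2 * sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.05
x = rng.normal(mu, sigma, 1_000_000)
y = np.exp(x)                           # g(x) = exp(x), so g'(mu) = exp(mu)

print(np.mean(y), np.exp(mu))                       # E(Y)   vs  g(mu)
print(np.var(y), (np.exp(mu) ** 2) * sigma ** 2)    # Var(Y) vs (g'(mu))^2 sigma^2
```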
Appendix: Common Distributions

Discrete

Uniform
$X \sim U(1, \ldots, N)$. $X$ takes values $x = 1, 2, \ldots, N$ and $P(X = x) = 1/N$.
$$E(X) = \sum_x x\, P(X = x) = \sum_x \frac{x}{N} = \frac{1}{N}\cdot\frac{N(N+1)}{2} = \frac{N+1}{2}$$
$$E(X^2) = \sum_x x^2\, P(X = x) = \sum_x \frac{x^2}{N} = \frac{1}{N}\cdot\frac{N(N+1)(2N+1)}{6} = \frac{(N+1)(2N+1)}{6}$$

Binomial
$X \sim \mathrm{Bin}(n, p)$. $X$ takes values $x = 0, 1, \ldots, n$ and
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}.$$

Hypergeometric
$X \sim \mathrm{Hypergeometric}(N, M, K)$.
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}.$$

Geometric
$X \sim \mathrm{Geom}(p)$.
$$P(X = x) = (1-p)^{x-1} p, \quad x = 1, 2, \ldots$$
$$E(X) = \sum_x x(1-p)^{x-1} p = p \sum_x \frac{d}{dp}\left(-(1-p)^x\right) = p\cdot\frac{1}{p^2} = \frac{1}{p}.$$

Poisson
$X \sim \mathrm{Poisson}(\lambda)$.
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
$E(X) = \mathrm{Var}(X) = \lambda$.
$$M_X(t) = \sum_{x=0}^{\infty} e^{tx}\,\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$
$E(X) = \lambda e^t e^{\lambda(e^t - 1)}\big|_{t=0} = \lambda$.

Use the mgf to show: if $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ are independent, then $Y = X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.

Multinomial Distribution
The multivariate version of a Binomial is called a Multinomial. Consider drawing a ball from an urn which has balls with $k$ different colors labeled color 1, color 2, ..., color $k$. Let $p = (p_1, p_2, \ldots, p_k)$ where $\sum_j p_j = 1$ and $p_j$ is the probability of drawing color $j$. Draw $n$ balls from the urn (independently and with replacement) and let $X = (X_1, X_2, \ldots, X_k)$ be the count of the number of balls of each color drawn. We say that $X$ has a Multinomial$(n, p)$ distribution. The pmf is
$$p(x) = \binom{n}{x_1, \ldots, x_k} p_1^{x_1} \cdots p_k^{x_k}.$$

Continuous Distributions

Normal
$X \sim N(\mu, \sigma^2)$,
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}, \quad x \in \mathbb{R},$$
mgf $M_X(t) = \exp\{\mu t + \sigma^2 t^2/2\}$.
$E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.

e.g., If $Z \sim N(0, 1)$ and $X = \mu + \sigma Z$, then $X \sim N(\mu, \sigma^2)$. Show this...

Proof.
$$M_X(t) = E\left(e^{tX}\right) = E\left(e^{t(\mu + \sigma Z)}\right) = e^{t\mu} E\left(e^{t\sigma Z}\right) = e^{t\mu} M_Z(t\sigma) = e^{t\mu} e^{(t\sigma)^2/2} = e^{t\mu + t^2\sigma^2/2},$$
which is the mgf of a $N(\mu, \sigma^2)$.
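As a quick sanity check of the location-scale fact just proved, one can compare draws of $\mu + \sigma Z$ with draws from $N(\mu, \sigma^2)$ directly. The following simulation sketch is an illustrative addition (it assumes NumPy; the quantile comparison and the values of $\mu$, $\sigma$ are arbitrary choices).

```python
# Simulation sketch (illustrative): mu + sigma*Z with Z ~ N(0, 1) should match draws
# from N(mu, sigma^2) in distribution; compare a few sample quantiles.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0
x1 = mu + sigma * rng.normal(size=1_000_000)
x2 = rng.normal(mu, sigma, size=1_000_000)

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(x1, qs))
print(np.quantile(x2, qs))              # the two quantile vectors should be close
```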
Alternative proof:
$$F_X(x) = P(X \leq x) = P(\mu + \sigma Z \leq x) = P\left(Z \leq \frac{x-\mu}{\sigma}\right) = F_Z\left(\frac{x-\mu}{\sigma}\right)$$
$$p_X(x) = F_X'(x) = \frac{1}{\sigma}\, p_Z\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\}\frac{1}{\sigma} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\},$$
which is the pdf of a $N(\mu, \sigma^2)$.

Gamma
$X \sim \Gamma(\alpha, \beta)$.
$$p_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0, \qquad \Gamma(\alpha) = \frac{1}{\beta^\alpha}\int_0^\infty x^{\alpha-1} e^{-x/\beta}\,dx.$$
Important statistical distribution: $\chi^2_p = \Gamma\left(\frac{p}{2}, 2\right)$.
$\chi^2_p = \sum_{i=1}^p X_i^2$, where $X_1, \ldots, X_p \sim N(0, 1)$, iid.

Exponential
$X \sim \exp(\beta)$.
$$p_X(x) = \frac{1}{\beta}\, e^{-x/\beta}, \quad x > 0.$$
$\exp(\beta) = \Gamma(1, \beta)$.

e.g., Used to model the waiting times of a Poisson process. Suppose $N$ is the number of phone calls in an hour and $N \sim \mathrm{Poisson}(\lambda)$. Let $T$ be the time between consecutive phone calls; then $T \sim \exp(1/\lambda)$ and $E(T) = 1/\lambda$.

If $X_1, \ldots, X_n$ are iid $\exp(\beta)$, then $\sum_i X_i \sim \Gamma(n, \beta)$.

Memoryless Property: If $X \sim \exp(\beta)$, then
$$P(X > t + s \mid X > t) = P(X > s).$$
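The memoryless property can also be seen by simulation. A short sketch follows, added for illustration only (it assumes NumPy; the values of $\beta$, $t$, $s$ are arbitrary).

```python
# Simulation sketch (illustrative) of the memoryless property: for X ~ exp(beta),
# P(X > t + s | X > t) should match P(X > s).
import numpy as np

rng = np.random.default_rng(0)
beta, t, s = 2.0, 1.0, 0.5
x = rng.exponential(beta, 1_000_000)

lhs = np.mean(x[x > t] > t + s)         # estimate of P(X > t+s | X > t)
rhs = np.mean(x > s)                    # estimate of P(X > s)
print(lhs, rhs, np.exp(-s / beta))      # both should be close to exp(-s/beta)
```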
Multivariate Normal Distribution

Let $Y \in \mathbb{R}^d$. Then $Y \sim N(\mu, \Sigma)$ if
$$p(y) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(y - \mu)^T \Sigma^{-1}(y - \mu)\right).$$
Then $E(Y) = \mu$ and $\mathrm{cov}(Y) = \Sigma$. The moment generating function is
$$M(t) = \exp\left(\mu^T t + \frac{t^T \Sigma t}{2}\right).$$

Theorem 7
(a) If $Y \sim N(\mu, \Sigma)$, then $E(Y) = \mu$, $\mathrm{cov}(Y) = \Sigma$.
(b) If $Y \sim N(\mu, \Sigma)$ and $c$ is a scalar, then $cY \sim N(c\mu, c^2\Sigma)$.
(c) Let $Y \sim N(\mu, \Sigma)$. If $A$ is $p \times n$ and $b$ is $p \times 1$, then $AY + b \sim N(A\mu + b, A\Sigma A^T)$.

Theorem 8 Suppose that $Y \sim N(\mu, \Sigma)$. Let
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
where $Y_1$ and $\mu_1$ are $p \times 1$, and $\Sigma_{11}$ is $p \times p$.
(a) $Y_1 \sim N_p(\mu_1, \Sigma_{11})$, $Y_2 \sim N_{n-p}(\mu_2, \Sigma_{22})$.
(b) $Y_1$ and $Y_2$ are independent if and only if $\Sigma_{12} = 0$.
(c) If $\Sigma_{22} > 0$, then the conditional distribution of $Y_1$ given $Y_2$ is
$$Y_1 \mid Y_2 \sim N_p\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(Y_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$

Lemma 9 Let $Y \sim N(\mu, \sigma^2 I)$, where $Y^T = (Y_1, \ldots, Y_n)$, $\mu^T = (\mu_1, \ldots, \mu_n)$ and $\sigma^2 > 0$ is a scalar. Then the $Y_i$ are independent, $Y_i \sim N(\mu_i, \sigma^2)$ and
$$\frac{\|Y\|^2}{\sigma^2} = \frac{Y^T Y}{\sigma^2} \sim \chi^2_n\left(\frac{\mu^T \mu}{\sigma^2}\right).$$

Theorem 10 Let $Y \sim N(\mu, \Sigma)$. Then:
(a) $Y^T \Sigma^{-1} Y \sim \chi^2_n(\mu^T \Sigma^{-1} \mu)$.
(b) $(Y - \mu)^T \Sigma^{-1}(Y - \mu) \sim \chi^2_n(0)$.
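Theorem 8(c) can be checked empirically for a bivariate normal by conditioning on draws where $Y_2$ falls near a fixed value. The sketch below is an illustrative addition (it assumes NumPy; the particular $\mu$, $\Sigma$, conditioning value, and tolerance are arbitrary choices).

```python
# Simulation sketch (illustrative) of Theorem 8(c): for a bivariate normal, the mean
# and variance of Y1 among draws with Y2 near y2 should match the conditional formulas
# mu1 + Sigma12/Sigma22*(y2 - mu2) and Sigma11 - Sigma12^2/Sigma22.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Y = rng.multivariate_normal(mu, Sigma, size=2_000_000)

y2 = 0.5
mask = np.abs(Y[:, 1] - y2) < 0.01      # draws with Y2 approximately equal to y2
cond_mean = Y[mask, 0].mean()
cond_var = Y[mask, 0].var()

print(cond_mean, mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y2 - mu[1]))
print(cond_var, Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1])
```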