SF2940: Probability theory Lecture 8: Multivariate Normal Distribution




Timo Koski, Matematisk statistik, 24.09.2015

Learning outcomes

- Random vectors, mean vector, covariance matrix, rules of transformation
- Multivariate normal r.v., moment generating function, characteristic function, rules of transformation
- Density of a multivariate normal r.v.
- Joint PDF of bivariate normal r.v.'s
- Conditional distributions in a multivariate normal distribution

PART 1: Mean vector, Covariance matrix, MGF, Characteristic function

Vector Notation: Random Vector

A random vector $X$ is a column vector
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = (X_1, X_2, \ldots, X_n)^T$$
Each $X_i$ is a random variable.

Sample Value of a Random Vector

A column vector
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)^T$$
We can think of $x_i$ as an outcome of $X_i$.

Joint CDF, Joint PDF

The joint CDF (= cumulative distribution function) of a continuous random vector $X$ is
$$F_X(x) = F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P(X \leq x) = P(X_1 \leq x_1, \ldots, X_n \leq x_n)$$
The joint probability density function (PDF) is
$$f_X(x) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$$

Mean Vector

$$\mu_X = E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},$$
a column vector of the means (= expectations) of the components of $X$.

Matrix, Scalar Product

If $X^T$ is the transposed column vector (= a row vector), then $XX^T$ is an $n \times n$ matrix, and
$$X^T X = \sum_{i=1}^{n} X_i^2$$
is a scalar product, a real-valued r.v.

Covariance Matrix of a Random Vector

The covariance matrix is
$$C_X := E\left[(X - \mu_X)(X - \mu_X)^T\right],$$
where the element $(i,j)$ is the covariance of $X_i$ and $X_j$:
$$C_X(i,j) = E[(X_i - \mu_i)(X_j - \mu_j)]$$
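As a concrete illustration (a minimal numpy sketch, not part of the slides; the sample size and `C_true` are arbitrary choices), the sample covariance of simulated data approximates the expectation defining $C_X$:

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[2.0, 0.6],
                   [0.6, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C_true, size=100_000)
mu_hat = X.mean(axis=0)            # estimate of the mean vector mu_X
D = X - mu_hat                     # centered samples, rows are (x - mu_hat)^T
C_hat = D.T @ D / (len(X) - 1)     # sample covariance: averages (x-mu)(x-mu)^T
print(np.round(C_hat, 2))          # close to C_true
```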

Remarks on Covariance

$X$ and $Y$ independent $\Rightarrow$ $\mathrm{Cov}(X,Y) = 0$. The converse implication is not true in general, as shown in the next example.

Let $X \sim N(0,1)$ and set $Y = X^2$. Then $Y$ is clearly functionally dependent on $X$. But we have
$$\mathrm{Cov}(X,Y) = E[XY] - E[X]E[Y] = E[X^3] - 0 \cdot E[Y] = E[X^3] = 0.$$
The last equality holds since $g(x) = x^3\phi(x)$ satisfies $g(-x) = -g(x)$, where $\phi$ is the standard normal density. Hence
$$E[X^3] = \int_{-\infty}^{+\infty} g(x)\,dx = 0,$$
cf. the sequel, too.
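A quick Monte Carlo check of this example (a sketch assuming numpy; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)   # X ~ N(0,1)
y = x**2                             # Y = X^2, functionally dependent on X
print(np.cov(x, y)[0, 1])            # near 0, matching Cov(X,Y) = E[X^3] = 0
```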

A Quadratic Form

We see that
$$x^T C_X x = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j\, C_X(i,j) = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j\, E[(X_i - \mu_i)(X_j - \mu_j)]$$
$$= E\left[\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j (X_i - \mu_i)(X_j - \mu_j)\right] \qquad (*)$$

Properties of a Covariance Matrix

A covariance matrix is nonnegative definite, i.e., for all $x$ we have
$$x^T C_X x \geq 0.$$
Hence $\det C_X \geq 0$. The covariance matrix is symmetric: $C_X = C_X^T$.

Properties of a Covariance Matrix

The covariance matrix is symmetric, $C_X = C_X^T$, since
$$C_X(i,j) = E[(X_i - \mu_i)(X_j - \mu_j)] = E[(X_j - \mu_j)(X_i - \mu_i)] = C_X(j,i)$$

Properties of a Covariance Matrix

A covariance matrix is positive definite, i.e., $x^T C_X x > 0$ for all $x \neq 0$, iff
$$\det C_X > 0$$
(i.e., iff $C_X$ is invertible).

Properties of a Covariance Matrix

Proposition: $x^T C_X x \geq 0$.

Pf: By $(*)$ above,
$$x^T C_X x = x^T E\left[(X - \mu_X)(X - \mu_X)^T\right] x = E\left[x^T (X - \mu_X)(X - \mu_X)^T x\right] = E\left[x^T w\, w^T x\right],$$
where we have set $w = X - \mu_X$. Then by linear algebra $x^T w = w^T x = \sum_{i=1}^{n} w_i x_i$. Hence
$$E\left[x^T w w^T x\right] = E\left[\left(\sum_{i=1}^{n} w_i x_i\right)^2\right] \geq 0.$$

Properties of a Covariance Matrix

In terms of the entries $c_{i,j}$ of a covariance matrix $C = (c_{i,j})_{i,j=1}^{n}$, there are the following necessary properties:

1. $c_{i,j} = c_{j,i}$ (symmetry).
2. $c_{i,i} = \mathrm{Var}(X_i) = \sigma_i^2 \geq 0$ (the elements on the main diagonal are the variances, and thus all elements on the main diagonal are nonnegative).
3. $c_{i,j}^2 \leq c_{i,i}\, c_{j,j}$ (Cauchy-Schwarz inequality).

Coefficient of Correlation

The coefficient of correlation $\rho$ of $X$ and $Y$ is defined as
$$\rho := \rho_{X,Y} := \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}},$$
where $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$. This is normalized: for random variables $X$ and $Y$,
$$-1 \leq \rho_{X,Y} \leq 1.$$
$\mathrm{Cov}(X,Y) = \rho_{X,Y} = 0$ does not always mean that $X, Y$ are independent.

Special Case: Covariance Matrix of a Bivariate Vector

$X = (X_1, X_2)^T$.
$$C_X = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\rho$ is the coefficient of correlation of $X_1$ and $X_2$, and $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$. $C_X$ is invertible iff $\rho^2 \neq 1$; for proof we note that
$$\det C_X = \sigma_1^2 \sigma_2^2 (1 - \rho^2)$$

Special Case: Covariance Matrix of a Bivariate Vector

If $\rho^2 \neq 1$, the inverse of
$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$
exists and
$$\Lambda^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$
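The closed-form inverse is easy to sanity-check numerically; a minimal sketch with example values for $\sigma_1$, $\sigma_2$, $\rho$:

```python
import numpy as np

s1, s2, rho = 1.5, 0.7, 0.6
L = np.array([[s1**2,     rho*s1*s2],
              [rho*s1*s2, s2**2   ]])
L_inv = np.array([[ s2**2,     -rho*s1*s2],
                  [-rho*s1*s2,  s1**2   ]]) / (s1**2 * s2**2 * (1 - rho**2))
print(np.allclose(L_inv, np.linalg.inv(L)))   # True: matches the general inverse
```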

Y = BX + b

Proposition: $X$ is a random vector with mean vector $\mu_X$ and covariance matrix $C_X$, and $B$ is an $m \times n$ matrix. If $Y = BX + b$, then
$$E[Y] = B\mu_X + b, \qquad C_Y = B C_X B^T.$$
Pf: For simplicity of writing, take $b = \mu = 0$. Then
$$C_Y = E[YY^T] = E[BX(BX)^T] = E[BXX^TB^T] = B\,E[XX^T]\,B^T = B C_X B^T.$$
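A simulation sketch of this rule (numpy assumed; $\mu$, $C$, $B$ and $b$ below are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0, 0.5])
C = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])
B = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])        # 2 x 3, so Y is 2-dimensional
b = np.array([3.0, 4.0])
X = rng.multivariate_normal(mu, C, size=200_000)
Y = X @ B.T + b                          # Y = BX + b, row by row
print(np.allclose(Y.mean(axis=0), B @ mu + b, atol=0.05))           # E Y = B mu + b
print(np.allclose(np.cov(Y, rowvar=False), B @ C @ B.T, atol=0.1))  # C_Y = B C_X B^T
```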

Moment Generating and Characteristic Functions

Definition: The moment generating function of $X$ is defined as
$$\psi_X(t) \stackrel{\text{def}}{=} E\,e^{t^T X} = E\,e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}$$

Definition: The characteristic function of $X$ is defined as
$$\varphi_X(t) \stackrel{\text{def}}{=} E\,e^{i t^T X} = E\,e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}$$

Special case: take $t_2 = t_3 = \ldots = t_n = 0$; then $\varphi_X(t) = \varphi_{X_1}(t_1)$.

PART 2: Def. I of a multivariate normal distribution

We first recall some of the properties of the univariate normal distribution.

Normal (Gaussian) One-dimensional RVs

$X$ is a normal random variable if
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},$$
where $\mu$ is real and $\sigma > 0$. Notation: $X \sim N(\mu, \sigma^2)$.

Properties: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Normal (Gaussian) One-dimensional RVs 0.8 0.6 0.8 f X (x) 0.4 0.2 0 2 0 2 4 6 x f X (x) 0.6 0.4 0.2 0 2 0 2 4 6 x µ = 2, σ = 1/2, (b) µ = 2, σ = 2 (a) Timo Koski Matematisk statistik 24.09.2015 24 / 1

Central Moments of Normal (Gaussian) One-dimensional RVs

Let $X \sim N(0, \sigma^2)$. Then
$$E[X^n] = \begin{cases} 0 & n \text{ odd} \\ \dfrac{(2k)!}{2^k k!}\,\sigma^{2k} & n = 2k,\ k = 0, 1, 2, \ldots \end{cases}$$
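A short check of the moment formula, assuming scipy is available (scipy.stats.norm.moment evaluates $E[X^n]$):

```python
from math import factorial
from scipy.stats import norm

sigma = 1.3
for k in range(1, 5):
    exact = factorial(2*k) / (2**k * factorial(k)) * sigma**(2*k)
    numeric = norm(loc=0, scale=sigma).moment(2*k)
    print(2*k, exact, numeric)   # the two values agree for each even order
```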

Linear Transformation

If $X \sim N(\mu_X, \sigma_X^2)$, then $Y = aX + b$ is $N(a\mu_X + b, a^2\sigma_X^2)$. Thus
$$Z = \frac{X - \mu_X}{\sigma_X} \sim N(0,1)$$
and
$$P(X \leq x) = P\left(\frac{X - \mu_X}{\sigma_X} \leq \frac{x - \mu_X}{\sigma_X}\right),$$
or
$$F_X(x) = P\left(Z \leq \frac{x - \mu_X}{\sigma_X}\right) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right)$$
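In code, standardization amounts to the following (a sketch assuming scipy; the parameter values are arbitrary):

```python
from scipy.stats import norm

mu, sigma, x = 2.0, 0.5, 2.8
lhs = norm(loc=mu, scale=sigma).cdf(x)   # P(X <= x) for X ~ N(mu, sigma^2)
rhs = norm.cdf((x - mu) / sigma)         # Phi at the standardized point
print(lhs, rhs)                          # identical values
```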

Normal (Gaussian) One-dimensional RVs

If $X \sim N(\mu, \sigma^2)$, then the moment generating function is
$$\psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2},$$
and the characteristic function is
$$\varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2},$$
as found in previous lectures.

Multivariate Normal, Def. I

Definition: An $n \times 1$ random vector $X$ has a normal distribution iff for every $n \times 1$ vector $a$ the one-dimensional random vector $a^T X$ has a normal distribution.

We write $X \sim N(\mu, \Lambda)$, where $\mu$ is the mean vector and $\Lambda$ is the covariance matrix.

Consequences of Def. I (1)

An $n \times 1$ vector $X \sim N(\mu, \Lambda)$ iff the one-dimensional random vector $a^T X$ has a normal distribution for every $n$-vector $a$. Now we know that (take $B = a^T$ in the preceding)
$$E\left[a^T X\right] = a^T\mu, \qquad \mathrm{Var}\left[a^T X\right] = a^T \Lambda a$$

Consequences of Def. I (2)

Hence, if $Y = a^T X$, then $Y \sim N\left(a^T\mu,\, a^T\Lambda a\right)$ and the moment generating function of $Y$ is
$$\psi_Y(t) = E\left[e^{tY}\right] = e^{t a^T\mu + \frac{1}{2}t^2 a^T\Lambda a}.$$
Therefore
$$\psi_X(a) = E\,e^{a^T X} = \psi_Y(1) = e^{a^T\mu + \frac{1}{2}a^T\Lambda a}.$$

Consequences of Def. I (3)

Hence we have shown that if $X \sim N(\mu, \Lambda)$, then
$$\psi_X(t) = E\,e^{t^T X} = e^{t^T\mu + \frac{1}{2}t^T\Lambda t}$$
is the moment generating function of $X$.

Consequences of Def. I (4)

In the same way we can find that
$$\varphi_X(t) = E\,e^{i t^T X} = e^{i t^T\mu - \frac{1}{2}t^T\Lambda t}$$
is the characteristic function of $X \sim N(\mu, \Lambda)$.

Consequences of Def. I (5)

Let $\Lambda$ be a diagonal covariance matrix with the $\lambda_i^2$ on the main diagonal, i.e.,
$$\Lambda = \begin{pmatrix} \lambda_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3^2 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^2 \end{pmatrix}$$

Proposition: If $X \sim N(\mu, \Lambda)$, then $X_1, X_2, \ldots, X_n$ are independent normal variables.

Consequences of Def. I (6)

Pf: Since $\Lambda$ is diagonal, the quadratic form becomes a single sum of squares:
$$\varphi_X(t) = e^{i t^T\mu - \frac{1}{2}t^T\Lambda t} = e^{i\sum_{i=1}^{n}\mu_i t_i - \frac{1}{2}\sum_{i=1}^{n}\lambda_i^2 t_i^2} = e^{i\mu_1 t_1 - \frac{1}{2}\lambda_1^2 t_1^2}\, e^{i\mu_2 t_2 - \frac{1}{2}\lambda_2^2 t_2^2} \cdots e^{i\mu_n t_n - \frac{1}{2}\lambda_n^2 t_n^2},$$
which is the product of the characteristic functions of $X_i \sim N(\mu_i, \lambda_i^2)$; the components are thus seen to be independent $N(\mu_i, \lambda_i^2)$.

Kac's Theorem: Thm 8.1.3 in LN

Theorem: Let $X = (X_1, X_2, \ldots, X_n)^T$. The components $X_1, X_2, \ldots, X_n$ are independent if and only if
$$\varphi_X(s) = E\left[e^{i s^T X}\right] = \prod_{i=1}^{n} \varphi_{X_i}(s_i),$$
where $\varphi_{X_i}(s_i)$ is the characteristic function of $X_i$.

Further Properties of the Multivariate Normal

Let $X \sim N(\mu, \Lambda)$.

Every component $X_k$ is one-dimensional normal. To prove this we take
$$a = (0, 0, \ldots, \underbrace{1}_{\text{position } k}, 0, \ldots, 0)^T$$
and the conclusion follows by Def. I.

$X_1 + X_2 + \cdots + X_n$ is one-dimensional normal. Note: the terms in the sum need not be independent.

Properties of the Multivariate Normal

Let $X \sim N(\mu, \Lambda)$. Every marginal distribution of $k$ variables ($1 \leq k < n$) is normal. To prove this we consider any $k$ variables $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$, take $a$ such that $a_j = 0$ for $j \notin \{i_1, \ldots, i_k\}$, and then apply Def. I.

Properties of the Multivariate Normal

Proposition: If $X \sim N(\mu, \Lambda)$ and $Y = BX + b$, then
$$Y \sim N\left(B\mu + b,\, B\Lambda B^T\right).$$
Pf:
$$\psi_Y(s) = E\left[e^{s^T Y}\right] = E\left[e^{s^T(b + BX)}\right] = e^{s^T b}\, E\left[e^{s^T BX}\right] = e^{s^T b}\, E\left[e^{(B^T s)^T X}\right] = e^{s^T b}\, \psi_X\left(B^T s\right).$$

Properties of the Multivariate Normal

Since $X \sim N(\mu, \Lambda)$,
$$\psi_X\left(B^T s\right) = e^{(B^T s)^T\mu + \frac{1}{2}(B^T s)^T\Lambda(B^T s)}.$$
Here
$$(B^T s)^T\mu = s^T B\mu, \qquad (B^T s)^T\Lambda(B^T s) = s^T B\Lambda B^T s,$$
so
$$e^{(B^T s)^T\mu + \frac{1}{2}(B^T s)^T\Lambda(B^T s)} = e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s}$$

Properties of the Multivariate Normal

$$\psi_X\left(B^T s\right) = e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s}.$$
Hence
$$\psi_Y(s) = e^{s^T b}\, \psi_X\left(B^T s\right) = e^{s^T b}\, e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s} = e^{s^T(b + B\mu) + \frac{1}{2}s^T B\Lambda B^T s},$$
which proves the claim as asserted.

PART 3: Multivariate normal, Def. II: characteristic function; Def. III: density

Multivariate Normal, Def. II: Characteristic Function

Definition: A random vector $X$ with mean vector $\mu$ and covariance matrix $\Lambda$ is $N(\mu, \Lambda)$ if its characteristic function is
$$\varphi_X(t) = E\,e^{i t^T X} = e^{i t^T\mu - \frac{1}{2}t^T\Lambda t}.$$

Multivariate Normal, Def. II Implies Def. I

We need to show that the one-dimensional random vector $Y = a^T X$ has a normal distribution.
$$\varphi_Y(t) = E\left[e^{itY}\right] = E\left[e^{it\sum_{i=1}^{n} a_i X_i}\right] = E\left[e^{it a^T X}\right] = \varphi_X(ta) = e^{it a^T\mu - \frac{1}{2}t^2 a^T\Lambda a},$$
and this is the characteristic function of $N\left(a^T\mu,\, a^T\Lambda a\right)$.

Multivariate Normal, Def. III: Joint PDF

Definition: A random vector $X$ with mean vector $\mu$ and an invertible covariance matrix $\Lambda$ is $N(\mu, \Lambda)$ if the density is
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(x-\mu)^T\Lambda^{-1}(x-\mu)}$$
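A sketch comparing this formula with scipy.stats.multivariate_normal as an independent reference ($\mu$, $\Lambda$ and $x$ are example values):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Lam = np.array([[2.0, 0.8],
                [0.8, 1.0]])
x = np.array([0.5, 0.5])
d = x - mu
n = len(mu)
f = np.exp(-0.5 * d @ np.linalg.inv(Lam) @ d) \
    / ((2*np.pi)**(n/2) * np.sqrt(np.linalg.det(Lam)))   # Def. III density
print(f, multivariate_normal(mean=mu, cov=Lam).pdf(x))    # the two agree
```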

Multivariate Normal

It can be checked by a computation (complete the square) that
$$e^{i t^T\mu - \frac{1}{2}t^T\Lambda t} = \int_{\mathbb{R}^n} e^{i t^T x}\, \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(x-\mu)^T\Lambda^{-1}(x-\mu)}\, dx.$$
Hence Def. III implies the property in Def. II. The three definitions are equivalent in the case where the inverse of the covariance matrix exists.

PART 4: Bivariate normal with density

Multivariate Normal: The Bivariate Case

As soon as $\rho^2 \neq 1$, the matrix
$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$
is invertible, and the inverse is
$$\Lambda^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$

Multivariate Normal: The Bivariate Case

If $\rho^2 \neq 1$ and $X = (X_1, X_2)^T$, then
$$f_X(x) = \frac{1}{2\pi\sqrt{\det\Lambda}}\, e^{-\frac{1}{2}(x-\mu_X)^T\Lambda^{-1}(x-\mu_X)} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1, x_2)}$$

Multivariate Normal: the bivariate case where Q(x 1,x 2 ) = [ (x1 ) 1 (1 ρ 2 ) µ 2 1 2ρ(x ( ) ] 1 µ 1 )(x 2 µ 2 ) x2 µ 2 2 + σ 1 σ 2 σ 1 For this, invert the matrix Λ and expand the quadratic form! σ 2 Timo Koski Matematisk statistik 24.09.2015 49 / 1

[Figure: bivariate normal density surface for $\rho = 0$.]

[Figure: bivariate normal density surface for $\rho = 0.9$.]

[Figure: bivariate normal density surface for $\rho = -0.9$.]

Conditional Densities for the Bivariate Normal

Complete the square in the exponent to write
$$f_{X,Y}(x,y) = f_X(x)\, f_{Y|X}(y),$$
where
$$f_X(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x-\mu_1)^2}, \qquad f_{Y|X}(y) = \frac{1}{\tilde{\sigma}_2\sqrt{2\pi}}\, e^{-\frac{1}{2\tilde{\sigma}_2^2}(y-\mu_2(x))^2},$$
with
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1), \qquad \tilde{\sigma}_2 = \sigma_2\sqrt{1-\rho^2}$$
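These conditional formulas can be checked by simulation, conditioning on a thin slice $|X - x_0| < \varepsilon$ (a sketch; parameters, slice width, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m1, m2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.7
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
X, Y = rng.multivariate_normal([m1, m2], cov, size=2_000_000).T
x0 = 0.5
slice_Y = Y[np.abs(X - x0) < 0.01]                 # Y-values with X near x0
print(slice_Y.mean(), m2 + rho*(s2/s1)*(x0 - m1))  # ~ conditional mean
print(slice_Y.var(),  s2**2 * (1 - rho**2))        # ~ conditional variance
```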

Bivariate Normal Properties

$E(X) = \mu_1$. Given $X = x$, $Y$ is Gaussian. The conditional mean of $Y$ given $X = x$ is
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1) = E(Y \mid X = x)$$
and the conditional variance of $Y$ given $X = x$ is
$$\mathrm{Var}(Y \mid X = x) = \sigma_2^2\left(1-\rho^2\right)$$

Bivariate Normal Properties

Conditional mean of $Y$ given $X = x$:
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1) = E(Y \mid X = x)$$
Conditional variance of $Y$ given $X = x$:
$$\mathrm{Var}(Y \mid X = x) = \sigma_2^2\left(1-\rho^2\right)$$
Check Section 3.7.3 and Exercise 3.8.4.6. From this it is seen that the conditional mean of $Y$ given the variable $X$ in a bivariate normal distribution is also the best LINEAR predictor of $Y$ based on $X$, and the conditional variance is the variance of the estimation error.

Marginal PDFs

Proof of Conditional PDF

Consider
$$\frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x,y) + \frac{1}{2\sigma_1^2}(x-\mu_1)^2}$$

Proof of Conditional PDF

Set
$$-\frac{1}{2}Q(x,y) + \frac{1}{2\sigma_1^2}(x-\mu_1)^2 = -\frac{1}{2}H(x,y),$$

Proof of Conditional PDF

where
$$H(x,y) = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right] - \left(\frac{x-\mu_1}{\sigma_1}\right)^2$$

Proof of Conditional PDF

$$H(x,y) = \frac{\rho^2(x-\mu_1)^2}{(1-\rho^2)\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2(1-\rho^2)} + \frac{(y-\mu_2)^2}{\sigma_2^2(1-\rho^2)}$$

Proof of Conditional PDF

$$H(x,y) = \frac{\left(y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}$$

Conditional PDF

$$f_{X,Y}(x,y) = f_X(x)\, \frac{1}{\sqrt{1-\rho^2}\,\sigma_2\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{\left(y-\mu_2-\rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}}$$
This establishes the bivariate normal properties claimed above.

Bivariate Normal Properties: ρ

Proposition: If $(X,Y)$ is bivariate normal, then $\rho = \rho_{X,Y}$.

Proof:
$$E[(X-\mu_1)(Y-\mu_2)] = E\big(E[(X-\mu_1)(Y-\mu_2) \mid X]\big) = E\big((X-\mu_1)\,E[Y-\mu_2 \mid X]\big)$$

Bivariate Normal Properties: ρ

$$= E\big((X-\mu_1)\,[E(Y \mid X) - \mu_2]\big) = E\left((X-\mu_1)\left[\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X-\mu_1) - \mu_2\right]\right) = \rho\frac{\sigma_2}{\sigma_1}\,E\big((X-\mu_1)(X-\mu_1)\big)$$

Bivariate Normal Properties: ρ

$$= \rho\frac{\sigma_2}{\sigma_1}\,E\left[(X-\mu_1)^2\right] = \rho\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \rho\,\sigma_2\sigma_1$$

Bivariate Normal Properties: ρ

In other words, we have checked that
$$\rho = \frac{E[(X-\mu_1)(Y-\mu_2)]}{\sigma_2\sigma_1}.$$
Moreover, $\rho = 0$ and bivariate normality $\Rightarrow$ $X, Y$ are independent.

PART 5: Generating a multivariate normal variable

Standard Normal Vector: Definition

$Z \sim N(0, I)$ is a standard normal vector, where $I$ is the $n \times n$ identity matrix.
$$f_Z(z) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(I)}}\, e^{-\frac{1}{2}(z-0)^T I^{-1}(z-0)} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}z^T z}$$

Distribution of X = AZ + b

If $X = AZ + b$, where $Z$ is standard Gaussian, then
$$X \sim N\left(b, AA^T\right)$$
(follows by a rule in the preceding).

Multivariate Normal: The Bivariate Case

If
$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
then $\Lambda = AA^T$, where
$$A = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}.$$
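This factor gives a direct way to generate bivariate normal samples as $X = AZ + \mu$; a minimal numpy sketch (the parameter values are arbitrary; for positive diagonal entries this $A$ coincides with numpy's Cholesky factor of $\Lambda$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
s1, s2, rho = 1.0, 0.5, -0.3
A = np.array([[s1,     0.0                   ],
              [rho*s2, s2*np.sqrt(1 - rho**2)]])
Lam = np.array([[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]])
print(np.allclose(A @ A.T, Lam))       # A A^T = Lambda
Z = rng.standard_normal((100_000, 2))  # standard normal vectors, one per row
X = Z @ A.T + mu                       # X = A Z + mu, row by row
print(np.round(np.cov(X, rowvar=False), 2))   # close to Lambda
```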

Standard Normal Vector X N(µ X,Λ), and A is such that Λ = AA T (An invertible matrix A with this property exists always, if Λ is positive definite (we need the symmetry of Λ, too.) Then Z = A 1 (X µ X ) is a standard Gaussian vector. Proof: We give the first idea of his proof, a rule of transformation. Timo Koski Matematisk statistik 24.09.2015 71 / 1

Rule of Transformation

If $X$ has density $f_X(x)$ and $Y = AX + b$ with $A$ invertible, then
$$f_Y(y) = \frac{1}{|\det A|}\, f_X\left(A^{-1}(y-b)\right).$$
Note that if $\Lambda = AA^T$, then
$$\det\Lambda = \det A \cdot \det A^T = \det A \cdot \det A = (\det A)^2,$$
so that $|\det A| = \sqrt{\det\Lambda}$.

Diagonalizable Matrices

An $n \times n$ matrix $A$ is orthogonally diagonalizable if there is an orthogonal matrix $P$ (i.e., $P^T P = PP^T = I$) such that
$$P^T A P = \Lambda,$$
where $\Lambda$ is a diagonal matrix.

Diagonalizable Matrices

Theorem: If $A$ is an $n \times n$ matrix, then the following are equivalent:
(i) $A$ is orthogonally diagonalizable.
(ii) $A$ has an orthonormal set of eigenvectors.
(iii) $A$ is symmetric.

Since covariance matrices are symmetric, we have by the theorem above that all covariance matrices are orthogonally diagonalizable.

Diagonalizable Matrices

Theorem: If $A$ is a symmetric matrix, then
(i) the eigenvalues of $A$ are all real numbers;
(ii) eigenvectors from different eigenspaces are orthogonal.

In particular, all eigenvalues of a covariance matrix are real.

Diagonalizable Matrices

Hence we have for any covariance matrix the spectral decomposition
$$C = \sum_{i=1}^{n} \lambda_i\, e_i e_i^T, \qquad (1)$$
where $C e_i = \lambda_i e_i$. Since $C$ is nonnegative definite and its eigenvectors are orthonormal,
$$0 \leq e_i^T C e_i = \lambda_i\, e_i^T e_i = \lambda_i,$$
and thus the eigenvalues of a covariance matrix are nonnegative.

Diagonalizable Matrices

Let now $P$ be an orthogonal matrix such that $P^T C_X P = \Lambda$, and let $X \sim N(0, C_X)$, i.e., $C_X$ is a covariance matrix and $\Lambda$ is diagonal (with the eigenvalues of $C_X$ on the main diagonal). Then if $Y = P^T X$, we have that $Y \sim N(0, \Lambda)$. In other words, $Y$ is a Gaussian vector with independent components. This method of producing independent Gaussians has several important applications, one of which is principal component analysis; a numerical sketch follows below.
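A numerical sketch of this decorrelation step (numpy assumed; $C_X$ below is an example covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
C = np.array([[3.0, 1.0],
              [1.0, 2.0]])
eigvals, P = np.linalg.eigh(C)        # P orthogonal, P^T C P = diag(eigvals)
X = rng.multivariate_normal([0.0, 0.0], C, size=200_000)
Y = X @ P                             # rows y^T = x^T P, i.e. y = P^T x
print(np.round(np.cov(Y, rowvar=False), 2))   # approximately diag(eigvals)
print(np.round(eigvals, 2))
```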
