
SF2940: Probability theory. Lecture 8: Multivariate Normal Distribution. Timo Koski, 24.09.2014

Learning outcomes:
- Random vectors, mean vector, covariance matrix, rules of transformation
- Multivariate normal R.V., moment generating function, characteristic function, rules of transformation
- Density of a multivariate normal R.V.
- Joint PDF of bivariate normal R.V.s
- Conditional distributions in a multivariate normal distribution

PART 1: Mean vector, Covariance matrix, MGF, Characteristic function

Vector Notation: Random Vector. A random vector X is a column vector
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = (X_1, X_2, \ldots, X_n)^T.$$
Each $X_i$ is a random variable.

Sample Value of a Random Vector. A column vector
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)^T.$$
We can think of $x_i$ as an outcome of $X_i$.

Joint CDF, Joint PDF. The joint CDF (= cumulative distribution function) of a continuous random vector X is
$$F_X(x) = F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P(X \le x) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$
The joint probability density function (PDF) is
$$f_X(x) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1,\ldots,x_n).$$

Mean Vector.
$$\mu_X = E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},$$
a column vector of the means (= expectations) of the components of X.

Matrix, Scalar Product. If $X^T$ is the transposed column vector (= a row vector), then $XX^T$ is an $n \times n$ matrix, and
$$X^T X = \sum_{i=1}^n X_i^2$$
is a scalar product, a real-valued R.V.

Covariance Matrix of a Random Vector. The covariance matrix is
$$C_X := E\left[(X - \mu_X)(X - \mu_X)^T\right],$$
where the element $(i,j)$ is the covariance of $X_i$ and $X_j$:
$$C_X(i,j) = E[(X_i - \mu_i)(X_j - \mu_j)].$$
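
A minimal numerical sketch (assuming NumPy is available; the parameter values and variable names are illustrative) of estimating a mean vector and covariance matrix from a sample of random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10000 samples of a 3-dimensional random vector with a known covariance.
true_mu = np.array([1.0, -2.0, 0.5])
true_C = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, -0.4],
                   [0.0, -0.4, 0.5]])
X = rng.multivariate_normal(true_mu, true_C, size=10_000)   # shape (10000, 3)

mu_hat = X.mean(axis=0)            # estimate of E[X]
C_hat = np.cov(X, rowvar=False)    # estimate of E[(X - mu)(X - mu)^T]; entry (i,j) estimates Cov(X_i, X_j)

print(mu_hat)   # close to true_mu
print(C_hat)    # close to true_C
```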

A Quadratic Form. We see that
$$x^T C_X x = \sum_{i=1}^n \sum_{j=1}^n x_i x_j\, C_X(i,j) = \sum_{i=1}^n \sum_{j=1}^n x_i x_j\, E[(X_i - \mu_i)(X_j - \mu_j)] = E\left[\sum_{i=1}^n \sum_{j=1}^n x_i x_j (X_i - \mu_i)(X_j - \mu_j)\right]. \qquad (*)$$

Properties of a Covariance Matrix. A covariance matrix is nonnegative definite, i.e., for all x we have
$$x^T C_X x \ge 0.$$
Hence $\det C_X \ge 0$. The covariance matrix is symmetric: $C_X = C_X^T$.

Properties of a Covariance Matrix. The covariance matrix is symmetric, $C_X = C_X^T$, since
$$C_X(i,j) = E[(X_i - \mu_i)(X_j - \mu_j)] = E[(X_j - \mu_j)(X_i - \mu_i)] = C_X(j,i).$$

Properties of a Covariance Matrix. A covariance matrix is positive definite, i.e.,
$$x^T C_X x > 0 \quad \text{for all } x \ne 0,$$
if and only if $\det C_X > 0$ (i.e., $C_X$ is invertible).

Properties of a Covariance Matrix. Proposition: $x^T C_X x \ge 0$.
Pf: By $(*)$ above,
$$x^T C_X x = x^T E\left[(X - \mu_X)(X - \mu_X)^T\right] x = E\left[x^T (X - \mu_X)(X - \mu_X)^T x\right] = E\left[x^T w\, w^T x\right],$$
where we have set $w = X - \mu_X$. Then by linear algebra $x^T w = w^T x = \sum_{i=1}^n w_i x_i$. Hence
$$E\left[x^T w w^T x\right] = E\left[\left(\sum_{i=1}^n w_i x_i\right)^2\right] \ge 0.$$

Properties of a Covariance Matrix. In terms of the entries $c_{i,j}$ of a covariance matrix $C = (c_{i,j})_{i,j=1}^{n}$, there are the following necessary properties.
1. $c_{i,j} = c_{j,i}$ (symmetry).
2. $c_{i,i} = \mathrm{Var}(X_i) = \sigma_i^2 \ge 0$ (the elements on the main diagonal are the variances, and thus all elements on the main diagonal are nonnegative).
3. $c_{i,j}^2 \le c_{i,i}\, c_{j,j}$ (Cauchy-Schwarz inequality).

Coefficient of Correlation. The coefficient of correlation $\rho$ of X and Y is defined as
$$\rho := \rho_{X,Y} := \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}},$$
where $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$. This is normalized: for any random variables X and Y, $-1 \le \rho_{X,Y} \le 1$. Note that $\mathrm{Cov}(X,Y) = 0$, i.e. $\rho_{X,Y} = 0$, does not always mean that X and Y are independent.
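
A small numerical illustration (assuming NumPy; the choice $X \sim N(0,1)$, $Y = X^2$ is an illustrative example) of the last point, that zero correlation does not imply independence:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
y = x**2          # y is a deterministic function of x, hence strongly dependent on x

# Cov(X, Y) = E[X^3] = 0 for X ~ N(0, 1), so the sample correlation is
# (approximately) zero even though X and Y are far from independent.
rho = np.corrcoef(x, y)[0, 1]
print(rho)        # close to 0
```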

Special Case: Covariance Matrix of a Bivariate Vector. For $X = (X_1, X_2)^T$,
$$C_X = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\rho$ is the coefficient of correlation of $X_1$ and $X_2$, and $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$. $C_X$ is invertible iff $\rho^2 \ne 1$; for a proof we note that
$$\det C_X = \sigma_1^2 \sigma_2^2 (1 - \rho^2).$$

Special case: Covariance Matrix of A Bivariate Vector if ρ 2 = 1, the inverse exists and Λ 1 = ( σ 2 Λ = 1 ρσ 1 σ 2 ρσ 1 σ 2 σ2 2 ( 1 σ1 2σ2 2 (1 ρ2 ) ), σ 2 2 ρσ 1 σ 2 ρσ 1 σ 2 σ 2 1 ), Timo Koski () Mathematisk statistik 24.09.2014 18 / 75

Y = BX + b. Proposition: Let X be a random vector with mean vector $\mu_X$ and covariance matrix $C_X$, and let B be an $m \times n$ matrix. If $Y = BX + b$, then
$$E[Y] = B\mu_X + b, \qquad C_Y = B C_X B^T.$$
Pf: For simplicity of writing, take $b = \mu_X = 0$. Then
$$C_Y = E[YY^T] = E[BX(BX)^T] = E[BXX^TB^T] = B\,E[XX^T]\,B^T = B C_X B^T.$$
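
A minimal Monte Carlo sketch (assuming NumPy; the matrices B, b and $C_X$ are illustrative choices) checking the rule $E[Y] = B\mu_X + b$ and $C_Y = B C_X B^T$:

```python
import numpy as np

rng = np.random.default_rng(2)

mu_X = np.array([1.0, 0.0, -1.0])
C_X = np.array([[1.0, 0.5, 0.2],
                [0.5, 2.0, 0.0],
                [0.2, 0.0, 0.5]])
B = np.array([[1.0, -1.0, 0.0],
              [2.0,  0.5, 1.0]])     # a 2 x 3 matrix
b = np.array([3.0, -2.0])

X = rng.multivariate_normal(mu_X, C_X, size=200_000)
Y = X @ B.T + b                       # each row is y = B x + b

print(Y.mean(axis=0))                 # close to B mu_X + b
print(B @ mu_X + b)
print(np.cov(Y, rowvar=False))        # close to B C_X B^T
print(B @ C_X @ B.T)
```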

Moment Generating and Characteristic Functions. Definition: The moment generating function of X is defined as
$$\psi_X(t) \stackrel{\text{def}}{=} E\,e^{t^T X} = E\,e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}.$$
Definition: The characteristic function of X is defined as
$$\varphi_X(t) \stackrel{\text{def}}{=} E\,e^{i t^T X} = E\,e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}.$$
Special case: take $t = (t_1, 0, \ldots, 0)^T$; then $\varphi_X(t) = \varphi_{X_1}(t_1)$, the characteristic function of the marginal $X_1$.

PART 2: Def. I of a multivariate normal distribution. We first recall some of the properties of the univariate normal distribution.

Normal (Gaussian) One-dimensional R.V.s. X is a normal random variable if
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},$$
where $\mu$ is real and $\sigma > 0$. Notation: $X \sim N(\mu, \sigma^2)$. Properties: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Normal (Gaussian) One-dimensional R.V.s. [Figure: plots of the density $f_X(x)$ for $\mu = 2$, $\sigma = 1/2$ and for $\mu = 2$, $\sigma = 2$.]

Linear Transformation. If $X \sim N(\mu_X, \sigma_X^2)$, then $Y = aX + b$ is $N(a\mu_X + b,\, a^2\sigma_X^2)$. Thus
$$Z = \frac{X - \mu_X}{\sigma_X} \sim N(0,1)$$
and
$$P(X \le x) = P\left(\frac{X - \mu_X}{\sigma_X} \le \frac{x - \mu_X}{\sigma_X}\right),$$
or
$$F_X(x) = P\left(Z \le \frac{x - \mu_X}{\sigma_X}\right) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right).$$

Normal (Gaussian) One-dimensional R.V.s. If $X \sim N(\mu, \sigma^2)$, then the moment generating function is
$$\psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2},$$
and the characteristic function is
$$\varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2},$$
as found in previous lectures.

Multivariate Normal, Def. I. Definition: An $n \times 1$ random vector X has a (multivariate) normal distribution iff for every $n \times 1$ vector a the one-dimensional random variable $a^T X$ has a normal distribution. We write $X \sim N(\mu, \Lambda)$, where $\mu$ is the mean vector and $\Lambda$ is the covariance matrix.

Consequences of Def. I (1). An $n \times 1$ vector $X \sim N(\mu, \Lambda)$ iff the one-dimensional random variable $a^T X$ has a normal distribution for every n-vector a. Now we know that (take $B = a^T$ in the preceding)
$$E[a^T X] = a^T \mu, \qquad \mathrm{Var}[a^T X] = a^T \Lambda a.$$

Consequences of Def. I (2). Hence, if $Y = a^T X$, then $Y \sim N(a^T\mu,\, a^T\Lambda a)$ and the moment generating function of Y is
$$\psi_Y(t) = E\left[e^{tY}\right] = e^{t a^T\mu + \frac{1}{2}t^2 a^T\Lambda a}.$$
Therefore
$$\psi_X(a) = E\,e^{a^T X} = \psi_Y(1) = e^{a^T\mu + \frac{1}{2}a^T\Lambda a}.$$

Consequences of Def. I (3). Hence we have shown that if $X \sim N(\mu, \Lambda)$, then
$$\psi_X(t) = E\,e^{t^T X} = e^{t^T\mu + \frac{1}{2}t^T\Lambda t}$$
is the moment generating function of X.

Consequences of Def. I (4). In the same way we can find that
$$\varphi_X(t) = E\,e^{it^T X} = e^{it^T\mu - \frac{1}{2}t^T\Lambda t}$$
is the characteristic function of $X \sim N(\mu, \Lambda)$.

Consequences of Def. I (5). Let $\Lambda$ be a diagonal covariance matrix with the $\lambda_i^2$ on the main diagonal, i.e.,
$$\Lambda = \begin{pmatrix} \lambda_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3^2 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^2 \end{pmatrix}.$$
Proposition: If $X \sim N(\mu, \Lambda)$, then $X_1, X_2, \ldots, X_n$ are independent normal variables.

Consequences of Def. I (6). Pf: Since $\Lambda$ is diagonal, the quadratic form becomes a single sum of squares:
$$\varphi_X(t) = e^{it^T\mu - \frac{1}{2}t^T\Lambda t} = e^{i\sum_{i=1}^n \mu_i t_i - \frac{1}{2}\sum_{i=1}^n \lambda_i^2 t_i^2} = e^{i\mu_1 t_1 - \frac{1}{2}\lambda_1^2 t_1^2}\, e^{i\mu_2 t_2 - \frac{1}{2}\lambda_2^2 t_2^2} \cdots e^{i\mu_n t_n - \frac{1}{2}\lambda_n^2 t_n^2},$$
which is the product of the characteristic functions of $X_i \sim N(\mu_i, \lambda_i^2)$; the $X_i$ are thus seen to be independent $N(\mu_i, \lambda_i^2)$ (cf. Kac's theorem on the next slide).

Kac's theorem (Thm 8.1.3 in LN). Theorem: Let $X = (X_1, X_2, \ldots, X_n)$. The components $X_1, X_2, \ldots, X_n$ are independent if and only if
$$\varphi_X(s) = E\left[e^{is^T X}\right] = \prod_{i=1}^n \varphi_{X_i}(s_i),$$
where $\varphi_{X_i}(s_i)$ is the characteristic function of $X_i$.

Further Properties of the Multivariate Normal. Let $X \sim N(\mu, \Lambda)$.
Every component $X_k$ is one-dimensional normal. To prove this we take $a = (0, 0, \ldots, 1, \ldots, 0)^T$ with the 1 in position k, and the conclusion follows by Def. I.
$X_1 + X_2 + \cdots + X_n$ is one-dimensional normal (take $a = (1, 1, \ldots, 1)^T$). Note: the terms in the sum need not be independent.

Properties of the Multivariate Normal. Let $X \sim N(\mu, \Lambda)$. Every marginal distribution of k variables ($1 \le k < n$) is normal. To prove this we consider any k variables $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$, take a such that $a_j = 0$ for $j \notin \{i_1, \ldots, i_k\}$, and then apply Def. I.

Properties of the Multivariate Normal. Proposition: Let $X \sim N(\mu, \Lambda)$ and $Y = BX + b$. Then $Y \sim N(B\mu + b,\, B\Lambda B^T)$.
Pf:
$$\psi_Y(s) = E\left[e^{s^T Y}\right] = E\left[e^{s^T(b + BX)}\right] = e^{s^T b}\, E\left[e^{s^T BX}\right] = e^{s^T b}\, E\left[e^{(B^T s)^T X}\right] = e^{s^T b}\, \psi_X(B^T s).$$

Properties of the Multivariate Normal. Since $X \sim N(\mu, \Lambda)$,
$$\psi_X(B^T s) = e^{(B^T s)^T\mu + \frac{1}{2}(B^T s)^T\Lambda(B^T s)}.$$
Now $(B^T s)^T\mu = s^T B\mu$ and $(B^T s)^T\Lambda(B^T s) = s^T B\Lambda B^T s$, so
$$e^{(B^T s)^T\mu + \frac{1}{2}(B^T s)^T\Lambda(B^T s)} = e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s}.$$

Properties of the Multivariate Normal. Thus $\psi_X(B^T s) = e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s}$, and
$$\psi_Y(s) = e^{s^T b}\,\psi_X(B^T s) = e^{s^T b}\, e^{s^T B\mu + \frac{1}{2}s^T B\Lambda B^T s} = e^{s^T(b + B\mu) + \frac{1}{2}s^T B\Lambda B^T s},$$
which proves the claim as asserted.

PART 3: Multivariate normal, Def. II: characteristic function, Def. III: density

Multivariate Normal, Def. II: Characteristic Function. Definition: A random vector X with mean vector $\mu$ and covariance matrix $\Lambda$ is $N(\mu, \Lambda)$ if its characteristic function is
$$\varphi_X(t) = E\,e^{it^T X} = e^{it^T\mu - \frac{1}{2}t^T\Lambda t}.$$

Multivariate Normal, Def. II implies Def. I. We need to show that the one-dimensional random variable $Y = a^T X$ has a normal distribution:
$$\varphi_Y(t) = E\left[e^{itY}\right] = E\left[e^{it\sum_{i=1}^n a_i X_i}\right] = E\left[e^{it a^T X}\right] = \varphi_X(ta) = e^{it a^T\mu - \frac{1}{2}t^2 a^T\Lambda a},$$
and this is the characteristic function of $N(a^T\mu,\, a^T\Lambda a)$.

Multivariate Normal, Def. III: Joint PDF. Definition: A random vector X with mean vector $\mu$ and an invertible covariance matrix $\Lambda$ is $N(\mu, \Lambda)$ if the density is
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(x-\mu)^T\Lambda^{-1}(x-\mu)}.$$
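
A small sketch (assuming NumPy and SciPy are available; the values of $\mu$, $\Lambda$ and x are illustrative) evaluating this density both directly from the formula above and via scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Lam = np.array([[2.0, 0.6],
                [0.6, 1.0]])          # an invertible covariance matrix
x = np.array([0.5, 0.0])

# Density from the formula in Def. III.
n = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Lam, diff)   # (x - mu)^T Lam^{-1} (x - mu)
pdf_manual = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Lam))

# The same density via SciPy.
pdf_scipy = multivariate_normal(mean=mu, cov=Lam).pdf(x)

print(pdf_manual, pdf_scipy)               # the two values agree
```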

Multivariate Normal. It can be checked by a computation (complete the square) that
$$e^{it^T\mu - \frac{1}{2}t^T\Lambda t} = \int_{\mathbb{R}^n} e^{it^T x}\, \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(x-\mu)^T\Lambda^{-1}(x-\mu)}\, dx.$$
Hence Def. III implies the property in Def. II. The three definitions are equivalent in the case where the inverse of the covariance matrix exists.

PART 4: Bivariate normal with density

Multivariate Normal: the Bivariate Case. As soon as $\rho^2 \ne 1$, the matrix
$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$
is invertible, and the inverse is
$$\Lambda^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$

Multivariate Normal: the Bivariate Case. If $\rho^2 \ne 1$ and $X = (X_1, X_2)^T$, then
$$f_X(x) = \frac{1}{2\pi\sqrt{\det\Lambda}}\, e^{-\frac{1}{2}(x-\mu_X)^T\Lambda^{-1}(x-\mu_X)} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1,x_2)},$$

Multivariate Normal: the Bivariate Case. where
$$Q(x_1,x_2) = \frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right].$$
For this, invert the matrix $\Lambda$ and expand the quadratic form!

ρ = 0. [Figure: surface plot of the bivariate normal joint density with ρ = 0.]

ρ = 0.9. [Figure: surface plot of the bivariate normal joint density with ρ = 0.9.]

ρ = 0.9. [Figure: surface plot of the bivariate normal joint density with ρ = 0.9.]

Conditional Densities for the Bivariate Normal. Complete the square in the exponent to write
$$f_{X,Y}(x,y) = f_X(x)\, f_{Y|X}(y),$$
where
$$f_X(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x-\mu_1)^2}, \qquad f_{Y|X}(y) = \frac{1}{\tilde{\sigma}_2\sqrt{2\pi}}\, e^{-\frac{1}{2\tilde{\sigma}_2^2}(y-\mu_2(x))^2},$$
with
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1), \qquad \tilde{\sigma}_2 = \sigma_2\sqrt{1-\rho^2}.$$

Bivariate Normal Properties. $E(X) = \mu_1$. Given $X = x$, Y is Gaussian. The conditional mean of Y given $X = x$ is
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1) = E(Y \mid X = x),$$
and the conditional variance of Y given $X = x$ is
$$\mathrm{Var}(Y \mid X = x) = \sigma_2^2(1-\rho^2).$$

Bivariate Normal Properties. Conditional mean of Y given $X = x$:
$$\mu_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1) = E(Y \mid X = x).$$
Conditional variance of Y given $X = x$:
$$\mathrm{Var}(Y \mid X = x) = \sigma_2^2(1-\rho^2).$$
Check Section 3.7.3 and Exercise 3.8.4.6. From this it is seen that the conditional mean of Y given X in a bivariate normal distribution is also the best LINEAR predictor of Y based on X, and the conditional variance is the variance of the estimation error.
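
A minimal sketch in plain Python (the parameter values are illustrative) of the conditional mean and conditional variance formulas above:

```python
# Parameters of a bivariate normal (X, Y) (illustrative values).
mu1, mu2 = 1.0, -2.0    # means of X and Y
s1, s2 = 2.0, 0.5       # standard deviations of X and Y
rho = 0.8               # coefficient of correlation

def conditional_Y_given_X(x):
    """Mean and variance of Y | X = x in a bivariate normal distribution."""
    mean = mu2 + rho * (s2 / s1) * (x - mu1)   # mu_2(x), the best linear predictor of Y
    var = s2**2 * (1 - rho**2)                 # variance of the estimation error
    return mean, var

print(conditional_Y_given_X(2.0))   # E(Y | X = 2), Var(Y | X = 2)
```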

Marginal PDFs

Proof of Conditional PDF. Consider
$$\frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x,y) + \frac{1}{2\sigma_1^2}(x-\mu_1)^2}.$$

Proof of Conditional PDF. Write
$$-\frac{1}{2}Q(x,y) + \frac{1}{2\sigma_1^2}(x-\mu_1)^2 = -\frac{1}{2}H(x,y),$$

Proof of Conditional PDF. where
$$H(x,y) = \frac{1}{1-\rho^2}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right] - \left(\frac{x-\mu_1}{\sigma_1}\right)^2.$$

Proof of Conditional PDF. Hence
$$H(x,y) = \frac{\rho^2(x-\mu_1)^2}{\sigma_1^2(1-\rho^2)} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2(1-\rho^2)} + \frac{(y-\mu_2)^2}{\sigma_2^2(1-\rho^2)}.$$

Proof of Conditional PDF. Completing the square,
$$H(x,y) = \frac{\left(y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}.$$

Conditional PDF. Therefore
$$\frac{f_{X,Y}(x,y)}{f_X(x)} = \frac{1}{\sigma_2\sqrt{1-\rho^2}\,\sqrt{2\pi}}\, e^{-\frac{1}{2}\,\frac{\left(y-\mu_2-\rho\frac{\sigma_2}{\sigma_1}(x-\mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}}.$$
This establishes the bivariate normal properties claimed above.

Bivariate Normal Properties: ρ. Proposition: If (X,Y) is bivariate normal, then the parameter $\rho$ equals the coefficient of correlation, $\rho = \rho_{X,Y}$.
Proof:
$$E[(X-\mu_1)(Y-\mu_2)] = E\big(E[(X-\mu_1)(Y-\mu_2) \mid X]\big) = E\big((X-\mu_1)\,E[Y-\mu_2 \mid X]\big)$$

Bivariate Normal Properties: ρ. Continuing,
$$= E\big((X-\mu_1)\,[E(Y \mid X) - \mu_2]\big) = E\left((X-\mu_1)\left[\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X-\mu_1) - \mu_2\right]\right) = \rho\frac{\sigma_2}{\sigma_1}\, E\big((X-\mu_1)(X-\mu_1)\big)$$

Bivariate Normal Properties: ρ. Thus
$$= \rho\frac{\sigma_2}{\sigma_1}\, E\big[(X-\mu_1)^2\big] = \rho\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \rho\,\sigma_1\sigma_2.$$

Bivariate Normal Properties: ρ. In other words we have checked that
$$\rho = \frac{E[(X-\mu_1)(Y-\mu_2)]}{\sigma_1\sigma_2}.$$
For a bivariate normal, $\rho = 0$ implies that X and Y are independent.

PART 5: Generating a multivariate normal variable

Standard Normal Vector: Definition. $Z \sim N(0, I)$ is a standard normal vector, where I is the $n \times n$ identity matrix. Its density is
$$f_Z(z) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(I)}}\, e^{-\frac{1}{2}(z-0)^T I^{-1}(z-0)} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}z^T z}.$$

Distribution of X = AZ + b. If $X = AZ + b$ and Z is standard Gaussian, then $X \sim N(b,\, AA^T)$ (this follows by a rule in the preceding).

Multivariate Normal: the Bivariate Case. If
$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
then $\Lambda = AA^T$, where
$$A = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}.$$
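
A numerical check (assuming NumPy; the values of $\sigma_1$, $\sigma_2$, $\rho$ are illustrative) that this A satisfies $AA^T = \Lambda$. This A is the lower-triangular Cholesky factor of $\Lambda$, and it can also be used to generate bivariate normal samples as $X = AZ + \mu$ with Z standard normal:

```python
import numpy as np

s1, s2, rho = 1.5, 0.7, -0.3            # illustrative values, |rho| < 1

Lam = np.array([[s1**2,     rho*s1*s2],
                [rho*s1*s2, s2**2    ]])

A = np.array([[s1,     0.0                   ],
              [rho*s2, s2*np.sqrt(1 - rho**2)]])

print(A @ A.T)                  # equals Lam
print(np.linalg.cholesky(Lam))  # lower-triangular Cholesky factor of Lam; the same matrix as A

# Generate bivariate normal samples as X = A Z + mu, Z standard normal.
rng = np.random.default_rng(3)
mu = np.array([0.0, 1.0])
Z = rng.standard_normal((100_000, 2))
X = Z @ A.T + mu
print(np.cov(X, rowvar=False))  # close to Lam
```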

Standard Normal Vector. Let $X \sim N(\mu_X, \Lambda)$, and let A be such that $\Lambda = AA^T$. (An invertible matrix A with this property always exists if $\Lambda$ is positive definite; we need the symmetry of $\Lambda$, too.) Then
$$Z = A^{-1}(X - \mu_X)$$
is a standard Gaussian vector. Proof: We give the first ingredient of this proof, a rule of transformation.

Rule of Transformation. If X has density $f_X(x)$, $Y = AX + b$, and A is invertible, then
$$f_Y(y) = \frac{1}{|\det A|}\, f_X\big(A^{-1}(y-b)\big).$$
Note that if $\Lambda = AA^T$, then
$$\det\Lambda = \det A \cdot \det A^T = \det A \cdot \det A = (\det A)^2,$$
so that $|\det A| = \sqrt{\det\Lambda}$.

Johann Carl Friedrich Gauss (30 April 1777 - 23 February 1855)

Diagonalizable Matrices. An $n \times n$ matrix A is orthogonally diagonalizable if there is an orthogonal matrix P (i.e., $P^T P = PP^T = I$) such that
$$P^T A P = \Lambda,$$
where $\Lambda$ is a diagonal matrix.

Diagonalizable Matrices. Theorem: If A is an $n \times n$ matrix, then the following are equivalent:
(i) A is orthogonally diagonalizable.
(ii) A has an orthonormal set of eigenvectors.
(iii) A is symmetric.
Since covariance matrices are symmetric, we have by the theorem above that all covariance matrices are orthogonally diagonalizable.

Diagonalizable Matrices. Theorem: If A is a symmetric matrix, then
(i) the eigenvalues of A are all real numbers;
(ii) eigenvectors from different eigenspaces are orthogonal.
In particular, all eigenvalues of a covariance matrix are real.

Diagonalizable Matrices. Hence we have for any covariance matrix the spectral decomposition
$$C = \sum_{i=1}^n \lambda_i\, e_i e_i^T, \qquad (1)$$
where $Ce_i = \lambda_i e_i$. Since C is nonnegative definite and its eigenvectors are orthonormal,
$$0 \le e_i^T C e_i = \lambda_i\, e_i^T e_i = \lambda_i,$$
and thus the eigenvalues of a covariance matrix are nonnegative.

Diagonalizable Matrices. Let now P be an orthogonal matrix such that $P^T C_X P = \Lambda$ and $X \sim N(0, C_X)$, i.e., $C_X$ is a covariance matrix and $\Lambda$ is diagonal (with the eigenvalues of $C_X$ on the main diagonal). Then if $Y = P^T X$, we have that $Y \sim N(0, \Lambda)$. In other words, Y is a Gaussian vector with independent components. This method of producing independent Gaussians has several important applications; one of these is principal component analysis.
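
A short sketch (assuming NumPy; the covariance matrix $C_X$ is an illustrative example) of this decorrelation step, $Y = P^T X$ with P obtained from the orthogonal diagonalization of $C_X$:

```python
import numpy as np

rng = np.random.default_rng(4)

C_X = np.array([[3.0, 1.2, 0.5],
                [1.2, 2.0, 0.3],
                [0.5, 0.3, 1.0]])        # a symmetric, positive definite covariance matrix

# Spectral decomposition C_X = P Lam P^T: eigh handles symmetric matrices,
# the columns of P are the orthonormal eigenvectors e_i.
eigvals, P = np.linalg.eigh(C_X)

X = rng.multivariate_normal(np.zeros(3), C_X, size=200_000)
Y = X @ P                                 # each row is y = P^T x

# Y is Gaussian with a (nearly) diagonal sample covariance, i.e. independent components,
# and the diagonal entries are the eigenvalues of C_X.
print(np.round(np.cov(Y, rowvar=False), 3))
print(eigvals)
```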