Lecture Notes 3 Random Vectors. Specifying a Random Vector. Mean and Covariance Matrix. Coloring and Whitening. Gaussian Random Vectors

EE 278B: Random Vectors, Lecture Notes 3

Specifying a Random Vector

Let X_1, X_2, ..., X_n be random variables defined on the same probability space. We define a random vector (RV) as the column vector

X = [X_1 X_2 ⋯ X_n]^T

X is completely specified by its joint cdf: for x = (x_1, x_2, ..., x_n),

F_X(x) = P{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n},  x ∈ R^n

If X is continuous, i.e., F_X(x) is a continuous function of x, then X can be specified by its joint pdf:

f_X(x) = f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n),  x ∈ R^n

If X is discrete, then it can be specified by its joint pmf:

p_X(x) = p_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n),  x ∈ X^n
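For a concrete feel for these definitions, here is a minimal NumPy sketch that specifies a small discrete random vector by its joint pmf and evaluates its joint cdf by summation; the probabilities are arbitrary illustrative values, not from the notes:

```python
import numpy as np

# Hypothetical joint pmf of a discrete random vector X = (X1, X2),
# where X1 and X2 each take values in {0, 1, 2}. p[i, j] = P{X1 = i, X2 = j}.
p = np.array([[0.10, 0.05, 0.05],
              [0.15, 0.20, 0.05],
              [0.10, 0.10, 0.20]])
assert np.isclose(p.sum(), 1.0)  # a valid pmf sums to 1

def joint_cdf(x1, x2):
    """F_X(x) = P{X1 <= x1, X2 <= x2}, obtained by summing the pmf."""
    return p[:x1 + 1, :x2 + 1].sum()

print(joint_cdf(1, 1))  # P{X1 <= 1, X2 <= 1} = 0.10 + 0.05 + 0.15 + 0.20 = 0.50
```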

A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of {X_1, ..., X_n}; e.g., for X = [X_1 X_2 X_3]^T the marginals are

f_{X_1}(x_1), f_{X_2}(x_2), f_{X_3}(x_3)
f_{X_1,X_2}(x_1, x_2), f_{X_1,X_3}(x_1, x_3), f_{X_2,X_3}(x_2, x_3)

The marginals can be obtained from the joint in the usual way. For the previous example,

F_{X_1}(x_1) = lim_{x_2, x_3 → ∞} F_X(x_1, x_2, x_3)

f_{X_1,X_2}(x_1, x_2) = ∫ f_{X_1,X_2,X_3}(x_1, x_2, x_3) dx_3

Conditional cdfs (pdfs, pmfs) can also be defined in the usual way. E.g., the conditional pdf of X_{k+1}^n = (X_{k+1}, ..., X_n) given X^k = (X_1, ..., X_k) is

f_{X_{k+1}^n | X^k}(x_{k+1}^n | x^k) = f_X(x_1, x_2, ..., x_n) / f_{X^k}(x_1, x_2, ..., x_k) = f_X(x) / f_{X^k}(x^k)

Chain rule: We can write

f_X(x) = f_{X_1}(x_1) f_{X_2|X_1}(x_2 | x_1) f_{X_3|X_1,X_2}(x_3 | x_1, x_2) ⋯ f_{X_n|X^{n-1}}(x_n | x^{n-1})

Proof: By induction. The chain rule holds for n = 2 by the definition of the conditional pdf. Now suppose it is true for n - 1. Then

f_X(x) = f_{X^{n-1}}(x^{n-1}) f_{X_n|X^{n-1}}(x_n | x^{n-1})
       = f_{X_1}(x_1) f_{X_2|X_1}(x_2 | x_1) ⋯ f_{X_{n-1}|X^{n-2}}(x_{n-1} | x^{n-2}) f_{X_n|X^{n-1}}(x_n | x^{n-1}),

which completes the proof
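The following sketch illustrates marginalization and the chain rule numerically for a small discrete random vector; the joint pmf is randomly generated purely for illustration:

```python
import numpy as np

# Hypothetical joint pmf of (X1, X2, X3), each variable taking values in {0, 1}.
# Entry p[i, j, k] = P{X1 = i, X2 = j, X3 = k}.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                       # normalize so the pmf sums to 1

# Marginals: sum the joint over the unwanted coordinates.
p_x1 = p.sum(axis=(1, 2))          # p_{X1}(x1)
p_x1x2 = p.sum(axis=2)             # p_{X1,X2}(x1, x2)

# Chain rule check: p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x1, x2)
p_x2_given_x1 = p_x1x2 / p_x1[:, None]
p_x3_given_x1x2 = p / p_x1x2[:, :, None]
reconstructed = p_x1[:, None, None] * p_x2_given_x1[:, :, None] * p_x3_given_x1x2
print(np.allclose(reconstructed, p))   # True
```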

Independence and Conditional Independence

Independence is defined in the usual way; e.g., X_1, X_2, ..., X_n are independent if

f_X(x) = ∏_{i=1}^n f_{X_i}(x_i)  for all (x_1, ..., x_n)

Important special case, i.i.d. r.v.s: X_1, X_2, ..., X_n are said to be independent, identically distributed (i.i.d.) if they are independent and have the same marginals

Example: if we flip a coin n times independently, we generate i.i.d. Bern(p) r.v.s X_1, X_2, ..., X_n

R.v.s X_1 and X_3 are said to be conditionally independent given X_2 if

f_{X_1,X_3|X_2}(x_1, x_3 | x_2) = f_{X_1|X_2}(x_1 | x_2) f_{X_3|X_2}(x_3 | x_2)  for all (x_1, x_2, x_3)

Conditional independence neither implies nor is implied by independence: X_1 and X_3 independent given X_2 does not mean that X_1 and X_3 are independent (or vice versa)

Example: Coin with random bias. Given a coin with random bias P ~ f_P(p), flip it n times independently to generate the r.v.s X_1, X_2, ..., X_n, where X_i = 1 if the i-th flip is heads and 0 otherwise

X_1, X_2, ..., X_n are not independent

However, X_1, X_2, ..., X_n are conditionally independent given P; in fact, they are i.i.d. Bern(p) given P = p (see the simulation sketch below)

Example: Additive noise channel. Consider an additive noise channel with signal X, noise Z, and observation Y = X + Z, where X and Z are independent

Although X and Z are independent, they are not in general conditionally independent given Y
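A short Monte Carlo sketch of the random-bias coin example, assuming (purely for illustration) that the bias P is Uniform(0, 1): marginally the flips are positively correlated, but restricted to a narrow range of P values they look independent:

```python
import numpy as np

# Monte Carlo sketch of the random-bias coin example. We assume, purely for
# illustration, that the bias P is Uniform(0, 1); the lecture leaves f_P general.
rng = np.random.default_rng(1)
trials = 200_000

P = rng.uniform(0.0, 1.0, size=trials)               # random bias, one per experiment
X1 = (rng.uniform(size=trials) < P).astype(float)    # first flip
X2 = (rng.uniform(size=trials) < P).astype(float)    # second flip

# Marginally, X1 and X2 are positively correlated (so not independent) ...
print("Cov(X1, X2)           ~", np.cov(X1, X2)[0, 1])        # ~ 1/12 ~ 0.083

# ... but conditioned on P (here: restricted to a narrow slice of P values),
# the flips look like independent Bern(p) r.v.s.
mask = np.abs(P - 0.3) < 0.01
print("Cov(X1, X2 | P ~ 0.3) ~", np.cov(X1[mask], X2[mask])[0, 1])  # ~ 0
```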

Mean and Covariance Matrix

The mean of the random vector X is defined as

E(X) = [E(X_1) E(X_2) ⋯ E(X_n)]^T

Denote the covariance between X_i and X_j, Cov(X_i, X_j), by σ_ij (so the variance of X_i is denoted by σ_ii, Var(X_i), or σ_{X_i}^2)

The covariance matrix of X is defined as

Σ_X = [σ_11 σ_12 ⋯ σ_1n; σ_21 σ_22 ⋯ σ_2n; ⋯ ; σ_n1 σ_n2 ⋯ σ_nn]

For n = 2, we can use the definition of the correlation coefficient to obtain

Σ_X = [σ_11 σ_12; σ_21 σ_22] = [σ_{X_1}^2  ρ_{X_1,X_2} σ_{X_1} σ_{X_2};  ρ_{X_1,X_2} σ_{X_1} σ_{X_2}  σ_{X_2}^2]

Properties of the Covariance Matrix Σ_X

Σ_X is real and symmetric (since σ_ij = σ_ji)

Σ_X is positive semidefinite, i.e., the quadratic form a^T Σ_X a ≥ 0 for every real vector a

Equivalently, all the eigenvalues of Σ_X are nonnegative, and also all its principal minors are nonnegative

To show that Σ_X is positive semidefinite we write

Σ_X = E[(X - E(X))(X - E(X))^T],

i.e., as the expectation of an outer product. Thus

a^T Σ_X a = a^T E[(X - E(X))(X - E(X))^T] a
          = E[a^T (X - E(X))(X - E(X))^T a]
          = E[(a^T (X - E(X)))^2] ≥ 0
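A small NumPy sketch that estimates a covariance matrix from arbitrary illustrative sample data and checks the properties above: symmetry, nonnegative eigenvalues, and a nonnegative quadratic form:

```python
import numpy as np

# Sketch: estimate a covariance matrix from samples and check its properties.
# The data here are arbitrary illustrative samples, not from the lecture.
rng = np.random.default_rng(2)
samples = rng.random((10_000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 1.0]])

Sigma = np.cov(samples, rowvar=False)          # sample covariance matrix

print(np.allclose(Sigma, Sigma.T))             # real and symmetric: True
print(np.all(np.linalg.eigvalsh(Sigma) >= 0))  # all eigenvalues nonnegative: True

# Quadratic form a^T Sigma a is nonnegative for any real vector a
a = rng.standard_normal(3)
print(a @ Sigma @ a >= 0)                      # True
```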

Which of the Following Can Be a Covariance Matrix?

1. 0 0 0 0   2. 0 0 2 2   3. 0 2 0 3   4.   5. 2   6. 3 2 3 2 4 6 3 6 9

Coloring and Whitening

Square root of a covariance matrix: Let Σ be a covariance matrix. Then there exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2} (Σ^{1/2})^T. The matrix Σ^{1/2} is called the square root of Σ

Coloring: Let X be a white RV, i.e., a RV with zero mean and Σ_X = aI. Assume without loss of generality that a = 1

Let Σ be a covariance matrix; then the RV Y = Σ^{1/2} X has covariance matrix Σ (why?)

Hence we can generate a RV with any prescribed covariance matrix from a white RV

Whitening: Given a zero mean RV Y with nonsingular covariance matrix Σ, the RV X = Σ^{-1/2} Y is white

Hence, we can generate a white RV from any RV with nonsingular covariance matrix

Coloring and whitening have applications in simulations, detection, and estimation
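A symmetric matrix can serve as a covariance matrix if and only if it is positive semidefinite, which is easy to check numerically. The test matrices below are examples chosen here for illustration, not necessarily the quiz candidates above:

```python
import numpy as np

# A symmetric matrix is a valid covariance matrix iff it is positive semidefinite.
def is_valid_covariance(S, tol=1e-10):
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):                      # must be symmetric
        return False
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))  # eigenvalues >= 0

print(is_valid_covariance([[1, 2, 3],
                           [2, 4, 6],
                           [3, 6, 9]]))   # True: rank-1 outer product, PSD
print(is_valid_covariance([[1, 2],
                           [2, 1]]))      # False: det = -3 < 0, one negative eigenvalue
```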

Finding the Square Root of Σ

For convenience, we assume throughout that Σ is nonsingular

Since Σ is symmetric, it has n real eigenvalues λ_1, λ_2, ..., λ_n and n corresponding orthogonal eigenvectors u_1, u_2, ..., u_n

Further, since Σ is positive definite, the eigenvalues are all positive

Thus, we have

Σ u_i = λ_i u_i,  λ_i > 0,  i = 1, 2, ..., n
u_i^T u_j = 0  for every i ≠ j

Without loss of generality, assume that the u_i are unit vectors

The first set of equations can be rewritten in matrix form as

Σ U = U Λ,

where U = [u_1 u_2 ⋯ u_n] and Λ is a diagonal matrix with diagonal elements λ_i

Note that U is a unitary matrix (U^T U = U U^T = I), hence Σ = U Λ U^T and the square root of Σ is

Σ^{1/2} = U Λ^{1/2},

where Λ^{1/2} is a diagonal matrix with diagonal elements λ_i^{1/2}

The inverse of the square root is straightforward to find as Σ^{-1/2} = Λ^{-1/2} U^T

Example: Let

Σ = [2 1; 1 3]

To find the eigenvalues of Σ, we find the roots of the polynomial equation

det(Σ - λI) = λ^2 - 5λ + 5 = 0,

which gives λ_1 = 3.62, λ_2 = 1.38

To find the eigenvectors, consider

[2 1; 1 3][u_1; u_2] = 3.62 [u_1; u_2]

and u_1^2 + u_2^2 = 1, which yields

u_1 = [0.53; 0.85]

Similarly, we can find the second eigenvector

u_2 = [-0.85; 0.53]

Hence,

Σ^{1/2} = U Λ^{1/2} = [0.53 -0.85; 0.85 0.53][√3.62 0; 0 √1.38] = [1.01 -1.00; 1.62 0.62]

The inverse of the square root is

Σ^{-1/2} = Λ^{-1/2} U^T = [1/√3.62 0; 0 1/√1.38][0.53 0.85; -0.85 0.53] = [0.28 0.45; -0.72 0.45]

Geometric interpretation: To generate a RV Y with covariance matrix Σ from a white RV X, we use the transformation Y = U Λ^{1/2} X

Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X; we then rotate Z using U to obtain Y = U Z

Cholesky Decomposition

Σ has many square roots: if Σ^{1/2} is a square root, then for any unitary matrix V, Σ^{1/2} V is also a square root, since Σ^{1/2} V V^T (Σ^{1/2})^T = Σ

The Cholesky decomposition is an efficient algorithm for computing a lower triangular square root, which can be used to perform coloring causally (sequentially)

For n = 3, we want to find a lower triangular matrix (square root) A such that

Σ = [σ_11 σ_12 σ_13; σ_21 σ_22 σ_23; σ_31 σ_32 σ_33]
  = [a_11 0 0; a_21 a_22 0; a_31 a_32 a_33][a_11 a_21 a_31; 0 a_22 a_32; 0 0 a_33]

The elements of A are computed in a raster scan manner:

a_11: σ_11 = a_11^2  ⇒  a_11 = √σ_11
a_21: σ_21 = a_21 a_11  ⇒  a_21 = σ_21/a_11
a_22: σ_22 = a_21^2 + a_22^2  ⇒  a_22 = √(σ_22 - a_21^2)
a_31: σ_31 = a_11 a_31  ⇒  a_31 = σ_31/a_11
a_32: σ_32 = a_21 a_31 + a_22 a_32  ⇒  a_32 = (σ_32 - a_21 a_31)/a_22
a_33: σ_33 = a_31^2 + a_32^2 + a_33^2  ⇒  a_33 = √(σ_33 - a_31^2 - a_32^2)

The inverse of a lower triangular square root is also lower triangular

Coloring and whitening summary:

Coloring:  X with Σ_X = I  →  multiply by Σ^{1/2}  →  Y with Σ_Y = Σ
Whitening: Y with Σ_Y = Σ  →  multiply by Σ^{-1/2}  →  X with Σ_X = I

The lower triangular square root and its inverse can be efficiently computed using the Cholesky decomposition

Gaussian Random Vectors

A random vector X = (X_1, ..., X_n) is a Gaussian random vector (GRV) (or X_1, X_2, ..., X_n are jointly Gaussian r.v.s) if the joint pdf is of the form

f_X(x) = (2π)^{-n/2} |Σ|^{-1/2} exp( -(1/2)(x - µ)^T Σ^{-1} (x - µ) ),

where µ is the mean and Σ is the covariance matrix of X, and Σ > 0, i.e., Σ is positive definite

Verify that this joint pdf is the same as the n = 2 case from Lecture Notes 2

Notation: X ~ N(µ, Σ) denotes a GRV with the given mean and covariance matrix

Since Σ is positive definite, Σ^{-1} is positive definite. Thus if x - µ ≠ 0,

(x - µ)^T Σ^{-1} (x - µ) > 0,

which means that the contours of equal pdf are ellipsoids

The GRV X ~ N(0, aI), where I is the identity matrix and a > 0, is called white; its contours of equal joint pdf are spheres centered at the origin
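A sketch tying these ideas together for the worked example Σ = [2 1; 1 3]: it builds the eigendecomposition square root and a Cholesky square root, then uses the latter to color a white Gaussian RV and to whiten it back. Sample size and seed are arbitrary choices:

```python
import numpy as np

# Two square roots of the example covariance Sigma = [[2,1],[1,3]],
# followed by coloring and whitening of a Gaussian random vector.
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

# Eigendecomposition square root: Sigma^{1/2} = U Lambda^{1/2}
lam, U = np.linalg.eigh(Sigma)            # eigenvalues ~ [1.38, 3.62]
sqrt_eig = U @ np.diag(np.sqrt(lam))
print(np.allclose(sqrt_eig @ sqrt_eig.T, Sigma))   # True

# Cholesky (lower triangular) square root
L = np.linalg.cholesky(Sigma)
print(np.allclose(L @ L.T, Sigma))                 # True

# Coloring: start from a white Gaussian X (zero mean, identity covariance)
rng = np.random.default_rng(3)
X = rng.standard_normal((2, 100_000))
Y = L @ X                                          # Y should have covariance Sigma
print(np.round(np.cov(Y), 2))                      # ~ [[2, 1], [1, 3]]

# Whitening: apply the inverse square root to recover a white RV
X_back = np.linalg.inv(L) @ Y
print(np.round(np.cov(X_back), 2))                 # ~ identity
```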

Properties of GRVs

Property 1: For a GRV, uncorrelation implies independence

This can be verified by substituting σ_ij = 0 for all i ≠ j in the joint pdf. Then Σ becomes diagonal and so does Σ^{-1}, and the joint pdf reduces to the product of the marginals X_i ~ N(µ_i, σ_ii)

For the white GRV X ~ N(0, aI), the r.v.s are i.i.d. N(0, a)

Property 2: A linear transformation of a GRV yields a GRV, i.e., given any m × n matrix A, where m ≤ n and A has full rank m,

Y = AX ~ N(Aµ, AΣA^T)

Example: Let

X ~ N( 0, [2 1; 1 3] )

Find the joint pdf of

Y = [1 1; 1 0] X

Solution: From Property 2, we conclude that

Y ~ N( 0, [1 1; 1 0][2 1; 1 3][1 1; 1 0]^T ) = N( 0, [7 3; 3 2] )

Before we prove Property 2, let us show that

E(Y) = Aµ  and  Σ_Y = AΣA^T

These results follow from linearity of expectation. First, the expectation:

E(Y) = E(AX) = A E(X) = Aµ

Next consider the covariance matrix:

Σ_Y = E[(Y - E(Y))(Y - E(Y))^T]
    = E[(AX - Aµ)(AX - Aµ)^T]
    = A E[(X - µ)(X - µ)^T] A^T
    = AΣA^T

Of course this is not sufficient to show that Y is a GRV; we must also show that the joint pdf has the right form

We do so using the characteristic function for a random vector
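A quick numerical check of this example: compute AΣA^T directly and confirm the covariance of Y = AX by simulation (sample size and seed are arbitrary):

```python
import numpy as np

# Check of the Property 2 example: Y = A X with X ~ N(0, Sigma).
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

print(A @ Sigma @ A.T)          # [[7. 3.]
                                #  [3. 2.]]

# Monte Carlo confirmation that Y = A X has this covariance
rng = np.random.default_rng(4)
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=200_000).T
Y = A @ X
print(np.round(np.cov(Y), 2))   # ~ [[7, 3], [3, 2]]
```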

Definition: If X ~ f_X(x), the characteristic function of X is

Φ_X(ω) = E( e^{iω^T X} ),

where ω is an n-dimensional real valued vector and i = √-1

Thus

Φ_X(ω) = ∫ ⋯ ∫ f_X(x) e^{iω^T x} dx

This is the inverse of the multi-dimensional Fourier transform of f_X(x), which implies that there is a one-to-one correspondence between Φ_X(ω) and f_X(x). The joint pdf can be found by taking the Fourier transform of Φ_X(ω), i.e.,

f_X(x) = 1/(2π)^n ∫ ⋯ ∫ Φ_X(ω) e^{-iω^T x} dω

Example: The characteristic function for X ~ N(µ, σ^2) is

Φ_X(ω) = e^{-(1/2)ω^2 σ^2 + iµω},

and for a GRV X ~ N(µ, Σ),

Φ_X(ω) = e^{-(1/2)ω^T Σω + iω^T µ}

Now let's go back to proving Property 2

Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the characteristic function of Y is

Φ_Y(ω) = E( e^{iω^T Y} )
        = E( e^{iω^T AX} )
        = Φ_X(A^T ω)
        = e^{-(1/2)(A^T ω)^T Σ (A^T ω) + iω^T Aµ}
        = e^{-(1/2)ω^T (AΣA^T)ω + iω^T Aµ}

Thus Y = AX ~ N(Aµ, AΣA^T)

An equivalent definition of a GRV: X is a GRV iff for every real vector a ≠ 0, the r.v. Y = a^T X is Gaussian (see HW for proof)

Whitening transforms a GRV to a white GRV; conversely, coloring transforms a white GRV to a GRV with a prescribed covariance matrix
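A Monte Carlo sketch comparing an empirical estimate of the characteristic function E[e^{iω^T X}] of a GRV with the closed form above; the mean, covariance, and ω below are arbitrary illustrative choices:

```python
import numpy as np

# Compare a Monte Carlo estimate of the characteristic function
# E[exp(i w^T X)] with the closed form exp(-0.5 w^T Sigma w + i w^T mu).
rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
w = np.array([0.3, -0.2])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
mc_estimate = np.mean(np.exp(1j * X @ w))
closed_form = np.exp(-0.5 * w @ Sigma @ w + 1j * w @ mu)

print(mc_estimate)    # both should be close, roughly (0.70 + 0.59j)
print(closed_form)
```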

Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV, then for any subset {i_1, i_2, ..., i_k} ⊆ {1, 2, ..., n} of indexes, the RV

Y = [X_{i_1} X_{i_2} ⋯ X_{i_k}]^T

is a GRV

To show this we use Property 2. For example, let n = 3 and Y = [X_1; X_3]. We can express Y as a linear transformation of X:

Y = [1 0 0; 0 0 1][X_1; X_2; X_3] = [X_1; X_3]

Therefore

Y ~ N( [µ_1; µ_3], [σ_11 σ_13; σ_31 σ_33] )

As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are jointly Gaussian

Property 4: Conditionals of a GRV are Gaussian; more specifically, if

X = [X_1; X_2] ~ N( µ, [Σ_11 Σ_12; Σ_21 Σ_22] ),

where X_1 is a k-dim RV and X_2 is an (n - k)-dim RV, then

X_2 | {X_1 = x_1} ~ N( Σ_21 Σ_11^{-1}(x_1 - µ_1) + µ_2,  Σ_22 - Σ_21 Σ_11^{-1} Σ_12 )

Compare this to the case of n = 2 and k = 1:

X_2 | {X_1 = x_1} ~ N( (σ_12/σ_11)(x_1 - µ_1) + µ_2,  σ_22 - σ_12^2/σ_11 )

Example: Let

X = [X_1; X_2; X_3] ~ N( [1; 2; 2], [1 2 1; 2 5 2; 1 2 9] )

From Property 4, it follows that

E([X_2; X_3] | X_1 = x) = [2; 1](x - 1) + [2; 2] = [2x; x + 1]

Σ_{(X_2,X_3) | X_1 = x} = [5 2; 2 9] - [2; 1][2 1] = [1 0; 0 8]

The proof of Property 4 follows from Properties 1 and 2 and the orthogonality principle (HW exercise)
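A numerical check of this example using the general formula from Property 4; the conditioning value x is an arbitrary choice:

```python
import numpy as np

# Check of the Property 4 example: conditional of (X2, X3) given X1 = x
# for X ~ N(mu, Sigma) with the example's mean and covariance.
mu = np.array([1.0, 2.0, 2.0])
Sigma = np.array([[1.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 9.0]])

# Partition: "1" = X1 (scalar), "2" = (X2, X3)
S11 = Sigma[0, 0]
S21 = Sigma[1:, 0]
S22 = Sigma[1:, 1:]

x = 3.0  # an arbitrary conditioning value
cond_mean = S21 / S11 * (x - mu[0]) + mu[1:]
cond_cov = S22 - np.outer(S21, S21) / S11

print(cond_mean)   # [2x, x+1] = [6. 4.]
print(cond_cov)    # [[1. 0.]
                   #  [0. 8.]]
```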