Lecture Notes 3 (EE 278B): Random Vectors. Specifying a Random Vector. Mean and Covariance Matrix. Coloring and Whitening. Gaussian Random Vectors
Specifying a Random Vector

Let X_1, X_2, ..., X_n be random variables defined on the same probability space. We define a random vector (RV) as

X = [ X_1 ]
    [ X_2 ]
    [  ⋮  ]
    [ X_n ]

X is completely specified by its joint cdf: for x = (x_1, x_2, ..., x_n),

F_X(x) = P{X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n},   x ∈ R^n

If X is continuous, i.e., F_X(x) is a continuous function of x, then X can be specified by its joint pdf:

f_X(x) = f_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n),   x ∈ R^n

If X is discrete, then it can be specified by its joint pmf:

p_X(x) = p_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n),   x ∈ 𝒳^n
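As a quick illustration of the joint cdf definition, here is a minimal Python/NumPy sketch that estimates F_X(x) = P{X_1 ≤ x_1, ..., X_n ≤ x_n} at a point by Monte Carlo. The mean, covariance, and evaluation point are arbitrary illustrative choices, not taken from the notes.

```python
import numpy as np

# Hypothetical example: a 3-dimensional Gaussian random vector.
# The mean, covariance, and evaluation point below are illustrative choices.
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 1.0]])

samples = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are realizations of X
x = np.array([0.5, 1.0, 0.0])                               # point at which to evaluate F_X

# F_X(x) = P{X_1 <= x_1, X_2 <= x_2, X_3 <= x_3}, estimated by the empirical frequency
F_hat = np.mean(np.all(samples <= x, axis=1))
print(f"Estimated F_X({x}) ~= {F_hat:.4f}")
```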
A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of {X_1, ..., X_n}; e.g., for X = [X_1, X_2, X_3]^T the marginals are

f_{X_1}(x_1), f_{X_2}(x_2), f_{X_3}(x_3)
f_{X_1,X_2}(x_1, x_2), f_{X_1,X_3}(x_1, x_3), f_{X_2,X_3}(x_2, x_3)

The marginals can be obtained from the joint in the usual way. For the previous example,

F_{X_1}(x_1) = lim_{x_2, x_3 → ∞} F_X(x_1, x_2, x_3)

f_{X_1,X_2}(x_1, x_2) = ∫_{-∞}^{∞} f_{X_1,X_2,X_3}(x_1, x_2, x_3) dx_3

Conditional cdfs (pdfs, pmfs) can also be defined in the usual way. E.g., the conditional pdf of X_{k+1}^n = (X_{k+1}, ..., X_n) given X^k = (X_1, ..., X_k) is

f_{X_{k+1}^n | X^k}(x_{k+1}^n | x^k) = f_X(x_1, x_2, ..., x_n) / f_{X^k}(x_1, x_2, ..., x_k) = f_X(x) / f_{X^k}(x^k)

Chain rule: We can write

f_X(x) = f_{X_1}(x_1) f_{X_2|X_1}(x_2|x_1) f_{X_3|X_1,X_2}(x_3|x_1, x_2) ··· f_{X_n|X^{n-1}}(x_n|x^{n-1})

Proof: By induction. The chain rule holds for n = 2 by definition of conditional pdf. Now suppose it is true for n − 1. Then

f_X(x) = f_{X^{n-1}}(x^{n-1}) f_{X_n|X^{n-1}}(x_n|x^{n-1})
       = f_{X_1}(x_1) f_{X_2|X_1}(x_2|x_1) ··· f_{X_{n-1}|X^{n-2}}(x_{n-1}|x^{n-2}) f_{X_n|X^{n-1}}(x_n|x^{n-1}),

which completes the proof.
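To make the marginalization and conditioning steps concrete, the following sketch works with a discrete joint pmf over three variables and obtains marginals by summing out coordinates, the discrete analogue of the integral above. The joint pmf is an arbitrary illustrative choice, not from the notes.

```python
import numpy as np

# Hypothetical joint pmf p_X(x1, x2, x3) on a 2 x 3 x 2 alphabet, stored as an array.
p_joint = np.random.default_rng(1).random((2, 3, 2))
p_joint /= p_joint.sum()          # normalize so it is a valid pmf

# Marginal pmf of X1: sum out x2 and x3 (discrete analogue of integrating them out)
p_x1 = p_joint.sum(axis=(1, 2))

# Pairwise marginal of (X1, X2): sum out x3 only
p_x1x2 = p_joint.sum(axis=2)

# Conditional pmf of (X2, X3) given X1 = 0, via p(x2, x3 | x1) = p(x1, x2, x3) / p(x1)
p_x2x3_given_x1 = p_joint[0] / p_x1[0]

print(p_x1, p_x1x2.shape, p_x2x3_given_x1.sum())  # last value should be 1.0
```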
Independence and Conditional Independence

Independence is defined in the usual way; e.g., X_1, X_2, ..., X_n are independent if

f_X(x) = ∏_{i=1}^n f_{X_i}(x_i)   for all (x_1, ..., x_n)

Important special case, i.i.d. r.v.s: X_1, X_2, ..., X_n are said to be independent, identically distributed (i.i.d.) if they are independent and have the same marginals.

Example: if we flip a coin n times independently, we generate i.i.d. Bern(p) r.v.s X_1, X_2, ..., X_n.

R.v.s X_1 and X_3 are said to be conditionally independent given X_2 if

f_{X_1,X_3|X_2}(x_1, x_3 | x_2) = f_{X_1|X_2}(x_1|x_2) f_{X_3|X_2}(x_3|x_2)   for all (x_1, x_2, x_3)

Conditional independence neither implies nor is implied by independence; X_1 and X_3 independent given X_2 does not mean that X_1 and X_3 are independent (or vice versa).

Example: Coin with random bias. Given a coin with random bias P ~ f_P(p), flip it n times independently to generate the r.v.s X_1, X_2, ..., X_n, where X_i = 1 if the i-th flip is heads and X_i = 0 otherwise. X_1, X_2, ..., X_n are not independent. However, X_1, X_2, ..., X_n are conditionally independent given P; in fact, they are i.i.d. Bern(p) for every P = p. (A simulation of this example appears below.)

Example: Additive noise channel. Consider an additive noise channel with signal X, noise Z, and observation Y = X + Z, where X and Z are independent. Although X and Z are independent, they are not in general conditionally independent given Y.
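Here is a minimal simulation of the coin-with-random-bias example. The bias distribution and sample sizes are arbitrary illustrative choices: unconditionally the flips are positively correlated, while for flips whose bias is (approximately) fixed the correlation vanishes, consistent with conditional independence given P.

```python
import numpy as np

rng = np.random.default_rng(2)
trials, n = 100_000, 2                    # many experiments, two flips each

# Random bias P ~ Uniform(0, 1) (an illustrative choice for f_P), then flips given P = p
p = rng.uniform(0.0, 1.0, size=trials)
flips = (rng.uniform(size=(trials, n)) < p[:, None]).astype(float)

# Unconditional covariance of X_1 and X_2: positive, so the flips are NOT independent
print("Cov(X1, X2)         ~=", np.cov(flips[:, 0], flips[:, 1])[0, 1])

# Conditioned on P near a fixed value, the covariance is ~0 (conditional independence)
mask = np.abs(p - 0.7) < 0.01
print("Cov(X1, X2 | P~0.7) ~=", np.cov(flips[mask, 0], flips[mask, 1])[0, 1])
```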
Mean and Covariance Matrix

The mean of the random vector X is defined as

E(X) = [E(X_1) E(X_2) ··· E(X_n)]^T

Denote the covariance between X_i and X_j, Cov(X_i, X_j), by σ_ij (so the variance of X_i is denoted by σ_ii, Var(X_i), or σ_{X_i}^2).

The covariance matrix of X is defined as

Σ_X = [ σ_11  σ_12  ···  σ_1n ]
      [ σ_21  σ_22  ···  σ_2n ]
      [  ⋮     ⋮          ⋮   ]
      [ σ_n1  σ_n2  ···  σ_nn ]

For n = 2, we can use the definition of the correlation coefficient to obtain

Σ_X = [ σ_11  σ_12 ]  =  [ σ_{X_1}^2                     ρ_{X_1,X_2} σ_{X_1} σ_{X_2} ]
      [ σ_21  σ_22 ]     [ ρ_{X_1,X_2} σ_{X_1} σ_{X_2}   σ_{X_2}^2                   ]

Properties of the Covariance Matrix Σ_X

Σ_X is real and symmetric (since σ_ij = σ_ji).

Σ_X is positive semidefinite, i.e., the quadratic form a^T Σ_X a ≥ 0 for every real vector a. Equivalently, all the eigenvalues of Σ_X are nonnegative, and also all leading principal minors are nonnegative.

To show that Σ_X is positive semidefinite we write

Σ_X = E[(X − E(X))(X − E(X))^T],

i.e., as the expectation of an outer product. Thus

a^T Σ_X a = a^T E[(X − E(X))(X − E(X))^T] a = E[a^T (X − E(X))(X − E(X))^T a] = E[(a^T (X − E(X)))^2] ≥ 0
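The outer-product identity above translates directly into code. The sketch below (illustrative data, NumPy assumed) estimates Σ_X from samples as the average outer product of centered vectors and checks positive semidefiniteness through the eigenvalues and a quadratic form.

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative data: 50,000 samples of a 3-dimensional random vector (not from the notes)
X = rng.standard_normal((50_000, 3)) @ np.array([[1.0, 0.0, 0.0],
                                                 [0.5, 1.0, 0.0],
                                                 [0.2, 0.3, 1.0]]).T

# Sample covariance: average of the outer products of the centered samples
Xc = X - X.mean(axis=0)
Sigma_hat = (Xc.T @ Xc) / (len(X) - 1)

# Positive semidefiniteness: all eigenvalues nonnegative (up to numerical error) ...
print("eigenvalues:", np.linalg.eigvalsh(Sigma_hat))

# ... equivalently, a^T Sigma a >= 0 for any real vector a
a = rng.standard_normal(3)
print("a^T Sigma a =", a @ Sigma_hat @ a)
```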
Which of the Following Can Be a Covariance Matrix?

Coloring and Whitening

Square root of a covariance matrix: Let Σ be a covariance matrix. Then there exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2} (Σ^{1/2})^T. The matrix Σ^{1/2} is called the square root of Σ.

Coloring: Let X be a white RV, i.e., one with zero mean and Σ_X = aI; assume without loss of generality that a = 1, so Σ_X = I. Let Σ be a covariance matrix; then the RV Y = Σ^{1/2} X has covariance matrix Σ (why?). Hence we can generate a RV with any prescribed covariance from a white RV.

Whitening: Given a zero-mean RV Y with nonsingular covariance matrix Σ, the RV X = Σ^{-1/2} Y is white. Hence we can generate a white RV from any RV with nonsingular covariance matrix.

Coloring and whitening have applications in simulation, detection, and estimation.
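The following sketch (NumPy; the target covariance is an arbitrary illustrative choice) colors white samples with a square root Σ^{1/2} and then whitens them back with Σ^{-1/2}, checking the resulting sample covariances.

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])          # prescribed covariance (illustrative)

# One valid square root: Sigma = S S^T with S from the symmetric eigendecomposition
lam, U = np.linalg.eigh(Sigma)
S = U @ np.diag(np.sqrt(lam))           # Sigma^{1/2}

# Coloring: start from white X (zero mean, identity covariance), form Y = S X
X = rng.standard_normal((2, 100_000))
Y = S @ X
print("cov(Y) ~=\n", np.cov(Y))              # should be close to Sigma

# Whitening: apply Sigma^{-1/2} = S^{-1} to recover (approximately) white samples
X_white = np.linalg.solve(S, Y)
print("cov(X_white) ~=\n", np.cov(X_white))  # should be close to the identity
```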
Finding the Square Root of Σ

For convenience, we assume throughout that Σ is nonsingular.

Since Σ is symmetric, it has n real eigenvalues λ_1, λ_2, ..., λ_n and n corresponding orthogonal eigenvectors u_1, u_2, ..., u_n. Further, since Σ is positive definite, the eigenvalues are all positive. Thus we have

Σ u_i = λ_i u_i,  λ_i > 0,  i = 1, 2, ..., n
u_i^T u_j = 0   for every i ≠ j

Without loss of generality assume that the u_i are unit vectors. The first set of equations can be rewritten in the matrix form

Σ U = U Λ,

where U = [u_1 u_2 ··· u_n] and Λ is a diagonal matrix with diagonal elements λ_i.

Note that U is a unitary matrix (U^T U = U U^T = I), hence

Σ = U Λ U^T,

and the square root of Σ is Σ^{1/2} = U Λ^{1/2}, where Λ^{1/2} is a diagonal matrix with diagonal elements λ_i^{1/2}. The inverse of the square root is straightforward to find as Σ^{-1/2} = Λ^{-1/2} U^T.

Example: Let

Σ = [ 2  1 ]
    [ 1  3 ]

To find the eigenvalues of Σ, we find the roots of the polynomial equation

det(Σ − λI) = λ^2 − 5λ + 5 = 0,

which gives λ_1 = 3.62, λ_2 = 1.38. To find the eigenvectors, consider

[ 2  1 ] [ u_11 ]  =  3.62 [ u_11 ]
[ 1  3 ] [ u_21 ]          [ u_21 ]

and u_11^2 + u_21^2 = 1, which yields

u_1 = [ 0.53 ]
      [ 0.85 ]

Similarly, we can find the second eigenvector

u_2 = [  0.85 ]
      [ −0.53 ]

Hence,

Σ^{1/2} = U Λ^{1/2} = [ 0.53   0.85 ] [ √3.62    0   ]  ≈  [ 1.01   1.00 ]
                      [ 0.85  −0.53 ] [   0    √1.38 ]     [ 1.62  −0.62 ]

The inverse of the square root is

Σ^{-1/2} = Λ^{-1/2} U^T = [ 1/√3.62     0    ] [ 0.53   0.85 ]  ≈  [ 0.28   0.45 ]
                          [    0     1/√1.38 ] [ 0.85  −0.53 ]     [ 0.72  −0.45 ]

Geometric interpretation: To generate a RV Y with covariance matrix Σ from a white RV X, we use the transformation Y = U Λ^{1/2} X. Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X; we then rotate Z using U to obtain Y = U Z.
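A short NumPy check of this example follows; it is a sketch, and np.linalg.eigh orders the eigenvalues in ascending order, so the columns of U (and hence the particular square root) may differ from the ordering used above, while still satisfying S S^T = Σ.

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

# Symmetric eigendecomposition: Sigma = U diag(lam) U^T
lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
print("eigenvalues:", lam)              # ~[1.38, 3.62]

# Square root Sigma^{1/2} = U Lambda^{1/2} and its inverse Lambda^{-1/2} U^T
S = U @ np.diag(np.sqrt(lam))
S_inv = np.diag(1.0 / np.sqrt(lam)) @ U.T

print("S S^T =\n", S @ S.T)             # recovers Sigma
print("S_inv S =\n", S_inv @ S)         # identity
```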
Cholesky Decomposition

Σ has many square roots: if Σ^{1/2} is a square root, then for any unitary matrix V, Σ^{1/2} V is also a square root, since Σ^{1/2} V V^T (Σ^{1/2})^T = Σ.

The Cholesky decomposition is an efficient algorithm for computing a lower triangular square root, which can be used to perform coloring causally (sequentially).

For n = 3, we want to find a lower triangular matrix (square root) A such that

Σ = [ σ_11  σ_12  σ_13 ]     [ a_11    0      0   ] [ a_11  a_21  a_31 ]
    [ σ_21  σ_22  σ_23 ]  =  [ a_21  a_22     0   ] [   0   a_22  a_32 ]
    [ σ_31  σ_32  σ_33 ]     [ a_31  a_32   a_33  ] [   0     0   a_33 ]

The elements of A are computed in a raster scan manner:

a_11:  σ_11 = a_11^2                      ⇒  a_11 = √σ_11
a_21:  σ_21 = a_21 a_11                   ⇒  a_21 = σ_21 / a_11
a_22:  σ_22 = a_21^2 + a_22^2             ⇒  a_22 = √(σ_22 − a_21^2)
a_31:  σ_31 = a_11 a_31                   ⇒  a_31 = σ_31 / a_11
a_32:  σ_32 = a_21 a_31 + a_22 a_32       ⇒  a_32 = (σ_32 − a_21 a_31) / a_22
a_33:  σ_33 = a_31^2 + a_32^2 + a_33^2    ⇒  a_33 = √(σ_33 − a_31^2 − a_32^2)

The inverse of a lower triangular square root is also lower triangular.

Coloring and whitening summary:

Coloring:   X with Σ_X = I  →  multiply by Σ^{1/2}   →  Y with Σ_Y = Σ
Whitening:  Y with Σ_Y = Σ  →  multiply by Σ^{-1/2}  →  X with Σ_X = I

A lower triangular square root and its inverse can be computed efficiently using the Cholesky decomposition.
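A direct transcription of this raster-scan computation for n = 3 follows as a sketch; the test matrix is an arbitrary positive definite choice, and the result is compared against NumPy's built-in Cholesky factorization.

```python
import numpy as np

def cholesky3(Sigma):
    """Lower triangular A with Sigma = A A^T, computed in raster-scan order (n = 3)."""
    A = np.zeros((3, 3))
    A[0, 0] = np.sqrt(Sigma[0, 0])
    A[1, 0] = Sigma[1, 0] / A[0, 0]
    A[1, 1] = np.sqrt(Sigma[1, 1] - A[1, 0]**2)
    A[2, 0] = Sigma[2, 0] / A[0, 0]
    A[2, 1] = (Sigma[2, 1] - A[1, 0] * A[2, 0]) / A[1, 1]
    A[2, 2] = np.sqrt(Sigma[2, 2] - A[2, 0]**2 - A[2, 1]**2)
    return A

# Illustrative positive definite covariance matrix (not from the notes)
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

A = cholesky3(Sigma)
print(np.allclose(A @ A.T, Sigma))                  # True
print(np.allclose(A, np.linalg.cholesky(Sigma)))    # matches the library routine
```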
Gaussian Random Vectors

A random vector X = (X_1, ..., X_n) is a Gaussian random vector (GRV) (or X_1, X_2, ..., X_n are jointly Gaussian r.v.s) if the joint pdf is of the form

f_X(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp( −(1/2)(x − µ)^T Σ^{-1} (x − µ) ),

where µ is the mean and Σ is the covariance matrix of X, and Σ > 0, i.e., Σ is positive definite.

Verify that this joint pdf is the same as the n = 2 case from Lecture Notes 2.

Notation: X ~ N(µ, Σ) denotes a GRV with the given mean and covariance matrix.

Since Σ is positive definite, Σ^{-1} is positive definite. Thus if x − µ ≠ 0, then (x − µ)^T Σ^{-1} (x − µ) > 0, which means that the contours of equal pdf are ellipsoids.

The GRV X ~ N(0, aI), where I is the identity matrix and a > 0, is called white; its contours of equal joint pdf are spheres centered at the origin.
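A minimal sketch of evaluating this joint pdf directly from the formula (the mean, covariance, and evaluation point are illustrative choices), cross-checked against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal  # used only as a cross-check

def grv_pdf(x, mu, Sigma):
    """Joint pdf of N(mu, Sigma) evaluated at x, straight from the formula."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Illustrative parameters (not from the notes)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
x = np.array([0.5, 0.0])

print(grv_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```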
Properties of GRVs

Property 1: For a GRV, uncorrelation implies independence.

This can be verified by substituting σ_ij = 0 for all i ≠ j in the joint pdf. Then Σ becomes diagonal and so does Σ^{-1}, and the joint pdf reduces to the product of the marginals X_i ~ N(µ_i, σ_ii).

For the white GRV X ~ N(0, aI), the r.v.s are i.i.d. N(0, a).

Property 2: A linear transformation of a GRV yields a GRV, i.e., given any m × n matrix A, where m ≤ n and A has full rank m,

Y = AX ~ N(Aµ, AΣA^T)

Example: Let X ~ N(0, Σ_X) be a 2-dimensional GRV and let A be a full-rank 2 × 2 matrix. Find the joint pdf of Y = AX.

Solution: From Property 2, we conclude that Y ~ N(0, A Σ_X A^T).

Before we prove Property 2, let us show that E(Y) = Aµ and Σ_Y = AΣA^T. These results follow from linearity of expectation. First, the expectation:

E(Y) = E(AX) = A E(X) = Aµ

Next consider the covariance matrix:

Σ_Y = E[(Y − E(Y))(Y − E(Y))^T] = E[(AX − Aµ)(AX − Aµ)^T] = A E[(X − µ)(X − µ)^T] A^T = AΣA^T

Of course this is not sufficient to show that Y is a GRV; we must also show that the joint pdf has the right form. We do so using the characteristic function for a random vector.
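The mean and covariance identities above can be checked empirically. This sketch (with arbitrary illustrative A, µ, and Σ, not the specific numbers from the slide example) draws Gaussian samples, applies a linear map, and compares the sample mean and covariance of Y = AX to Aµ and AΣA^T.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative parameters (not from the notes)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])        # m x n with m = 2 <= n = 3, full rank

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T

print("sample mean of Y ~=", Y.mean(axis=0), " vs  A mu =", A @ mu)
print("sample cov  of Y ~=\n", np.cov(Y.T), "\n vs  A Sigma A^T =\n", A @ Sigma @ A.T)
```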
Definition: If X ~ f_X(x), the characteristic function of X is

Φ_X(ω) = E(e^{iω^T X}),

where ω is an n-dimensional real-valued vector and i = √−1. Thus

Φ_X(ω) = ∫ ··· ∫ f_X(x) e^{iω^T x} dx

This is the inverse of the multi-dimensional Fourier transform of f_X(x), which implies that there is a one-to-one correspondence between Φ_X(ω) and f_X(x). The joint pdf can be found by taking the Fourier transform of Φ_X(ω), i.e.,

f_X(x) = (1 / (2π)^n) ∫ ··· ∫ Φ_X(ω) e^{−iω^T x} dω

Example: The characteristic function for X ~ N(µ, σ^2) is

Φ_X(ω) = e^{−(1/2)ω^2 σ^2 + iµω},

and for a GRV X ~ N(µ, Σ),

Φ_X(ω) = e^{−(1/2)ω^T Σω + iω^T µ}

Now let's go back to proving Property 2. Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the characteristic function of Y is

Φ_Y(ω) = E(e^{iω^T Y}) = E(e^{iω^T AX}) = Φ_X(A^T ω) = e^{−(1/2)(A^T ω)^T Σ (A^T ω) + iω^T Aµ} = e^{−(1/2)ω^T (AΣA^T) ω + iω^T Aµ}

Thus Y = AX ~ N(Aµ, AΣA^T).

An equivalent definition of a GRV: X is a GRV iff for any real vector a ≠ 0, the r.v. Y = a^T X is Gaussian (see HW for proof).

Whitening transforms a GRV into a white GRV; conversely, coloring transforms a white GRV into a GRV with a prescribed covariance matrix.
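To see the Gaussian characteristic function at work, this sketch (with illustrative µ, Σ, and ω) estimates E[e^{iω^T X}] by Monte Carlo and compares it with the closed form exp(−(1/2)ω^T Σω + iω^T µ).

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative parameters (not from the notes)
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
omega = np.array([0.7, -1.1])

X = rng.multivariate_normal(mu, Sigma, size=500_000)

phi_mc = np.mean(np.exp(1j * X @ omega))                         # E[exp(i w^T X)]
phi_cf = np.exp(-0.5 * omega @ Sigma @ omega + 1j * omega @ mu)  # closed form

print("Monte Carlo :", phi_mc)
print("Closed form :", phi_cf)
```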
Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV then for any subset {i_1, i_2, ..., i_k} ⊆ {1, 2, ..., n} of indexes, the RV

Y = [X_{i_1}, X_{i_2}, ..., X_{i_k}]^T

is a GRV.

To show this we use Property 2. For example, let n = 3 and Y = [X_1, X_3]^T. We can express Y as a linear transformation of X:

Y = [ 1  0  0 ] [ X_1 ]     [ X_1 ]
    [ 0  0  1 ] [ X_2 ]  =  [ X_3 ]
                [ X_3 ]

Therefore

Y ~ N( [ µ_1 ] , [ σ_11  σ_13 ] )
       [ µ_3 ]   [ σ_31  σ_33 ]

As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are jointly Gaussian.

Property 4: Conditionals of a GRV are Gaussian. More specifically, if

X = [ X_1 ]  ~  N( µ , [ Σ_11  Σ_12 ] ),
    [ X_2 ]            [ Σ_21  Σ_22 ]

where X_1 is a k-dim RV and X_2 is an (n − k)-dim RV, then

X_2 | {X_1 = x} ~ N( Σ_21 Σ_11^{-1} (x − µ_1) + µ_2 ,  Σ_22 − Σ_21 Σ_11^{-1} Σ_12 )

Compare this to the case of n = 2 and k = 1:

X_2 | {X_1 = x} ~ N( (σ_12 / σ_11)(x − µ_1) + µ_2 ,  σ_22 − σ_12^2 / σ_11 )

Example: Take n = 3, let [X_1, X_2, X_3]^T be a GRV with given mean µ and covariance matrix Σ, and condition [X_2, X_3]^T on X_1 = x, i.e., apply Property 4 with k = 1.
From Property 4, it follows that

E( [X_2, X_3]^T | X_1 = x ) = Σ_21 Σ_11^{-1} (x − µ_1) + µ_2

Σ_{[X_2,X_3]^T | X_1 = x} = Σ_22 − Σ_21 Σ_11^{-1} Σ_12,

so the conditional mean is an affine function of x, while the conditional covariance does not depend on x (see the numerical check below).

The proof of Property 4 follows from Properties 1 and 2 and the orthogonality principle (HW exercise).
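The conditional formulas of Property 4 can be checked numerically. This sketch (with an illustrative 3-dimensional µ and Σ, not the slide's numbers) computes the conditional mean and covariance of [X_2, X_3] given X_1 = x from the formula and compares them with estimates from samples whose first coordinate falls near x.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative GRV parameters (not the specific numbers from the slide)
mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[1.0, 0.8, 0.5],
                  [0.8, 2.0, 0.6],
                  [0.5, 0.6, 1.5]])
x = 1.5                                   # conditioning value for X_1

# Partition: X_1 = X_1 (k = 1), X_2 = (X_2, X_3)
S11, S12 = Sigma[0, 0], Sigma[0:1, 1:]
S21, S22 = Sigma[1:, 0:1], Sigma[1:, 1:]

cond_mean = (S21 / S11).ravel() * (x - mu[0]) + mu[1:]
cond_cov = S22 - S21 @ S12 / S11

# Monte Carlo check: keep samples whose first coordinate is close to x
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
near = X[np.abs(X[:, 0] - x) < 0.02]

print("formula mean:", cond_mean, " empirical:", near[:, 1:].mean(axis=0))
print("formula cov:\n", cond_cov, "\n empirical:\n", np.cov(near[:, 1:].T))
```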