3. Multivariate Normal Distribution

Valentine Miller
7 years ago

The MVN distribution is a generalization of the univariate normal distribution, which has the density function (p.d.f.)

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad -\infty < x < \infty,$$

where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance. In $p$ dimensions the density becomes

$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right\} \tag{3.1}$$

Within the mean vector $\mu$ there are $p$ (independent) parameters and within the symmetric covariance matrix $\Sigma$ there are $\tfrac{1}{2}p(p+1)$ independent parameters [$\tfrac{1}{2}p(p+3)$ independent parameters in total]. We use the notation

$$x \sim N_p(\mu, \Sigma) \tag{3.2}$$

to denote a RV $x$ having the $p$-variate MVN distribution with

$$E(x) = \mu, \qquad \mathrm{Cov}(x) = \Sigma.$$

Note that MVN distributions are entirely characterized by the first and second moments of the distribution.

3.1 Basic properties

If $x$ $(p \times 1)$ is MVN with mean $\mu$ and covariance matrix $\Sigma$, then:

Any linear combination of $x$ is MVN. Let $y = Ax + c$ with $A$ $(q \times p)$ and $c$ $(q \times 1)$; then $y \sim N_q(\mu_y, \Sigma_y)$, where $\mu_y = A\mu + c$ and $\Sigma_y = A\Sigma A^T$.

Any subset of variables in $x$ has a MVN distribution.

If a set of variables is uncorrelated, then they are independently distributed. In particular

i) if $\sigma_{ij} = 0$ then $x_i, x_j$ are independent.

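ii) if $x$ is MVN with covariance matrix $\Sigma$, then $Ax$ and $Bx$ are independent if and only if

$$\mathrm{Cov}(Ax, Bx) = A\Sigma B^T = 0 \tag{3.3}$$

Conditional distributions are MVN.

Result

For the MVN distribution, variables are uncorrelated $\Leftrightarrow$ variables are independent.

Proof

Let $x$ $(p \times 1)$ be partitioned as $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, where $x_1$ has $p_1$ components and $x_2$ has $p_2$ components, with mean vector $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

i) Independent $\Rightarrow$ uncorrelated (always holds).

Suppose $x_1, x_2$ are independent. Then $f(x_1, x_2) = h(x_1)\,g(x_2)$ is a factorization of the multivariate p.d.f., and

$$\Sigma_{12} = \mathrm{Cov}(x_1, x_2) = E\left[(x_1 - \mu_1)(x_2 - \mu_2)^T\right]$$

factorizes into the product of $E[(x_1 - \mu_1)]$ and $E[(x_2 - \mu_2)^T]$, which are both zero since $E(x_1) = \mu_1$ and $E(x_2) = \mu_2$. Hence $\Sigma_{12} = 0$.

ii) Uncorrelated $\Rightarrow$ independent (for MVN).

This result depends on factorizing the p.d.f. (3.1) when $\Sigma_{12} = 0$. In this case $\Sigma$ is block diagonal, so $(x - \mu)^T \Sigma^{-1} (x - \mu)$ has the partitioned form

$$\begin{pmatrix} (x_1 - \mu_1)^T & (x_2 - \mu_2)^T \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} = (x_1 - \mu_1)^T \Sigma_{11}^{-1} (x_1 - \mu_1) + (x_2 - \mu_2)^T \Sigma_{22}^{-1} (x_2 - \mu_2)$$

so that $\exp\{-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\}$ factorizes into the product of

$$\exp\left\{-\tfrac{1}{2}(x_1 - \mu_1)^T \Sigma_{11}^{-1} (x_1 - \mu_1)\right\} \quad\text{and}\quad \exp\left\{-\tfrac{1}{2}(x_2 - \mu_2)^T \Sigma_{22}^{-1} (x_2 - \mu_2)\right\}.$$

The normalizing constant also factorizes, since $|\Sigma| = |\Sigma_{11}|\,|\Sigma_{22}|$. Therefore the p.d.f. can be written as

$$f(x) = g(x_1)\,h(x_2),$$

proving that $x_1$ and $x_2$ are independent.

3.2 Conditional distribution

Let $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ be a partitioned MVN random $p$-vector, where $X_1$ has $p_1$ components and $X_2$ has $p_2$ components, with mean and covariance matrix

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

The conditional distribution of $X_2$ given $X_1 = x_1$ is MVN with

$$E(X_2 \mid X_1 = x_1) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1) \tag{3.4a}$$

$$\mathrm{Cov}(X_2 \mid X_1 = x_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} \tag{3.4b}$$

Note: the notation $X_1$ to denote the r.v. and $x_1$ to denote a specific constant value (realization of $X_1$) will be very useful here.

Proof of 3.4a

Define a transformation from $(X_1, X_2)$ to new variables $X_1$ and $X_2' = X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1$. This is achieved by the linear transformation

$$\begin{pmatrix} X_1 \\ X_2' \end{pmatrix} = \begin{pmatrix} I & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \tag{3.5a}$$

$$= AX \text{, say.} \tag{3.5b}$$

This linear relationship shows that $X_1, X_2'$ are jointly MVN (by the first property of the MVN stated above). We now show that $X_2'$ and $X_1$ are independent by proving that $X_1$ and $X_2'$ are uncorrelated.

Approach 1: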

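$$\mathrm{Cov}(X_2', X_1) = \mathrm{Cov}(X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1,\; X_1) = \mathrm{Cov}(X_2, X_1) - \Sigma_{21}\Sigma_{11}^{-1}\,\mathrm{Cov}(X_1, X_1) = \Sigma_{21} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{11} = 0.$$

Approach 2: Write the matrix $A$ of (3.5b) as $A = \begin{pmatrix} B \\ C \end{pmatrix}$, where $B = \begin{pmatrix} I & 0 \end{pmatrix}$ and $C = \begin{pmatrix} -\Sigma_{21}\Sigma_{11}^{-1} & I \end{pmatrix}$, so that $X_1 = BX$ and $X_2' = CX$. Then by (3.3)

$$\mathrm{Cov}(X_1, X_2') = \mathrm{Cov}(BX, CX) = B\Sigma C^T = \begin{pmatrix} I & 0 \end{pmatrix} \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \begin{pmatrix} -\Sigma_{11}^{-1}\Sigma_{12} \\ I \end{pmatrix} = -\Sigma_{11}\Sigma_{11}^{-1}\Sigma_{12} + \Sigma_{12} = 0.$$

Since $X_1$ and $X_2'$ are MVN and uncorrelated they are independent. Thus

$$E[X_2' \mid X_1 = x_1] = E[X_2'] = E[X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1] = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1.$$

Now, as $X_2' = X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1$ and $X_1 = x_1$ is given, we have

$$E(X_2 \mid X_1 = x_1) = E[X_2' \mid X_1 = x_1] + \Sigma_{21}\Sigma_{11}^{-1}x_1 = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1 + \Sigma_{21}\Sigma_{11}^{-1}x_1 = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$$

as required.

Proof of 3.4b

Because $X_2'$ is independent of $X_1$,

$$\mathrm{Cov}(X_2' \mid X_1 = x_1) = \mathrm{Cov}(X_2').$$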

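The left hand side is

$$\text{LHS} = \mathrm{Cov}(X_2' \mid X_1 = x_1) = \mathrm{Cov}(X_2 - \Sigma_{21}\Sigma_{11}^{-1}x_1 \mid X_1 = x_1) = \mathrm{Cov}(X_2 \mid X_1 = x_1),$$

since $\Sigma_{21}\Sigma_{11}^{-1}x_1$ is a constant once $X_1 = x_1$ is given. The right hand side is

$$\text{RHS} = \mathrm{Cov}(X_2') = \mathrm{Cov}(X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12},$$

following from the general expansion

$$\mathrm{Cov}(X_2 - DX_1) = \mathrm{Cov}(X_2, X_2) - D\,\mathrm{Cov}(X_1, X_2) - \mathrm{Cov}(X_2, X_1)\,D^T + D\,\mathrm{Cov}(X_1, X_1)\,D^T$$

with $D = \Sigma_{21}\Sigma_{11}^{-1}$: the second and third terms each equal $\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, and so does the fourth. Therefore

$$\mathrm{Cov}(X_2 \mid X_1 = x_1) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$

as required.

Example

Let $x = (X_1, X_2, X_3)^T$ have a MVN distribution with mean $\mu = (\mu_1, \mu_2, \mu_3)^T$ and covariance matrix $\Sigma = [\ldots]$. Show that the conditional distribution of $(X_1, X_2)$ given $X_3 = x_3$ is also MVN with mean

$$\begin{pmatrix} \mu_1 + [\ldots](x_3 - \mu_3) \\ \mu_2 + [\ldots](x_3 - \mu_3) \end{pmatrix}$$

and covariance matrix $[\ldots]$.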

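Solution

Let $Y_1 = (X_1, X_2)^T$ and $Y_2 = (X_3)$; then

$$E\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} \mu_{Y_1} \\ \mu_3 \end{pmatrix}, \qquad \mathrm{Cov}\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $\mu_{Y_1} = (\mu_1, \mu_2)^T$, $\Sigma_{11}$ is the $2 \times 2$ covariance matrix of $(X_1, X_2)$, $\Sigma_{12} = \Sigma_{21}^T = (\sigma_{13}, \sigma_{23})^T$ and $\Sigma_{22} = \sigma_{33}$ (numerical values $[\ldots]$). Hence, applying (3.4a) and (3.4b) with the roles of the two blocks interchanged,

$$E[Y_1 \mid Y_2 = x_3] = \mu_{Y_1} + \Sigma_{12}\Sigma_{22}^{-1}(x_3 - \mu_3) = \begin{pmatrix} \mu_1 + (\sigma_{13}/\sigma_{33})(x_3 - \mu_3) \\ \mu_2 + (\sigma_{23}/\sigma_{33})(x_3 - \mu_3) \end{pmatrix}$$

and

$$\mathrm{Cov}[Y_1 \mid Y_2 = x_3] = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = [\ldots].$$

Maximum-likelihood estimation

Let $X^T = (x_1, \ldots, x_n)$ contain an independent random sample of size $n$ from $N_p(\mu, \Sigma)$. The maximum likelihood estimates (MLEs) of $\mu, \Sigma$ are the sample mean and covariance matrix (with divisor $n$):

$$\hat{\mu} = \bar{x} \tag{3.6a}$$

$$\hat{\Sigma} = S \tag{3.6b}$$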
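The estimates (3.6a) and (3.6b) can be computed directly from a data matrix. The following is a minimal NumPy sketch and is not part of the original notes: the function names `mvn_mle` and `neg2_log_lik`, the simulated parameters and the seed are all illustrative assumptions. It forms $\bar{x}$ and $S$ with divisor $n$, then evaluates $-2\log L$ from the density (3.1) at the estimates and at the parameters used for the simulation.

```python
import numpy as np

def mvn_mle(X):
    """MLEs (3.6a)-(3.6b): sample mean and covariance matrix with divisor n.

    X is an (n x p) data matrix whose rows are the observations x_r.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centred = X - mu_hat
    S = centred.T @ centred / n          # divisor n, not n - 1
    return mu_hat, S

def neg2_log_lik(X, mu, Sigma):
    """-2 log L for an i.i.d. N_p(mu, Sigma) sample, built from the density (3.1)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - mu
    # Sum over r of (x_r - mu)^T Sigma^{-1} (x_r - mu), without forming Sigma^{-1}
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(Sigma)
    return n * p * np.log(2 * np.pi) + n * logdet + quad

# Illustrative (assumed) parameters, not taken from the notes
rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array([[2.0, 0.6, 0.0],
                       [0.6, 1.0, 0.3],
                       [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)

mu_hat, S = mvn_mle(X)
print("mu_hat =", mu_hat)
print("S =\n", S)
print("-2 log L at the MLE        :", neg2_log_lik(X, mu_hat, S))
print("-2 log L at true parameters:", neg2_log_lik(X, mu_true, Sigma_true))
```

Because the MLE minimizes $-2\log L$, the first printed value should never exceed the second, whatever sample is drawn.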

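The likelihood function is a function of the parameters $\mu, \Sigma$ given the data $X$:

$$L(\mu, \Sigma \mid X) = \prod_{r=1}^{n} f(x_r \mid \mu, \Sigma) \tag{3.7}$$

The RHS is evaluated by substituting the individual data vectors $\{x_1, \ldots, x_n\}$ in turn into the p.d.f. of $N_p(\mu, \Sigma)$ and taking the product:

$$\prod_{r=1}^{n} f(x_r \mid \mu, \Sigma) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2} \exp\left\{-\tfrac{1}{2}\sum_{r=1}^{n} (x_r - \mu)^T \Sigma^{-1} (x_r - \mu)\right\}.$$

Maximizing $L$ is equivalent to minimizing the function $l = -2\log L$, referred to below as the "log likelihood" function:

$$l(\mu, \Sigma) = -2\log L = -2\sum_{r=1}^{n} \log f(x_r \mid \mu, \Sigma) = K + n\log|\Sigma| + \sum_{r=1}^{n} (x_r - \mu)^T \Sigma^{-1} (x_r - \mu) \tag{3.8}$$

where $K$ is a constant independent of $\mu, \Sigma$.

Result 3.3

$$l(\mu, \Sigma) = n\left\{\log|\Sigma| + \mathrm{tr}\left[\Sigma^{-1}\left(S + dd^T\right)\right]\right\} \tag{3.9}$$

up to an additive constant, where $d = \bar{x} - \mu$.

Proof

Noting that $x_r - \mu = (x_r - \bar{x}) + d$, the final term in the likelihood expression (3.8) becomes

$$\sum_{r=1}^{n} (x_r - \mu)^T \Sigma^{-1} (x_r - \mu) = \sum_{r=1}^{n} (x_r - \bar{x})^T \Sigma^{-1} (x_r - \bar{x}) + n\,d^T \Sigma^{-1} d = n\,\mathrm{tr}\left(\Sigma^{-1}S\right) + n\,d^T \Sigma^{-1} d = n\,\mathrm{tr}\left[\Sigma^{-1}\left(S + dd^T\right)\right],$$

proving the expression (3.9). Here we have used $(x_r - \bar{x})^T \Sigma^{-1} (x_r - \bar{x}) = \mathrm{tr}\left[\Sigma^{-1}(x_r - \bar{x})(x_r - \bar{x})^T\right]$ together with $S = \frac{1}{n}\sum_{r}(x_r - \bar{x})(x_r - \bar{x})^T$. Note that the cross-product terms have vanished because $\sum_{r=1}^{n} x_r = n\bar{x}$ and therefore

$$\sum_{r=1}^{n} d^T \Sigma^{-1} (x_r - \bar{x}) = d^T \Sigma^{-1} \sum_{r=1}^{n} (x_r - \bar{x}) = 0 = \sum_{r=1}^{n} (x_r - \bar{x})^T \Sigma^{-1} d.$$

In (3.9) the dependence on $\mu$ is entirely through $d$. Now assume that $\Sigma$ is positive definite (p.d.); then so is $\Sigma^{-1}$, as $\Sigma^{-1} = V\Lambda^{-1}V^T$ where $\Sigma = V\Lambda V^T$ is the eigenanalysis of $\Sigma$. Thus $\forall\, d \neq 0$ we have $d^T \Sigma^{-1} d > 0$. Hence $l(\mu, \Sigma)$ is minimized with respect to $\mu$ for fixed $\Sigma$ when $d = 0$, i.e.

$$\hat{\mu} = \bar{x}.$$

Final part of proof: to minimize the log likelihood $l(\hat{\mu}, \Sigma)$ w.r.t. $\Sigma$, let

$$l(\hat{\mu}, \Sigma) = n\left\{\log|\Sigma| + \mathrm{tr}\left(\Sigma^{-1}S\right)\right\} = \phi(\Sigma) \tag{3.10}$$

We show that

$$\phi(\Sigma) - \phi(S) = n\left\{\log|\Sigma| - \log|S| + \mathrm{tr}\left(\Sigma^{-1}S\right) - p\right\} = n\left\{\mathrm{tr}\left(\Sigma^{-1}S\right) - \log\left|\Sigma^{-1}S\right| - p\right\} \geq 0 \tag{3.11}$$

Lemma 1

$\Sigma^{-1}S$ is positive semi-definite (proved elsewhere). Therefore the eigenvalues of $\Sigma^{-1}S$ are non-negative, and strictly positive provided $S$ is non-singular.

Lemma 2

For any set of positive numbers, $A \geq \log G + 1$, where $A$ and $G$ are the arithmetic and geometric means respectively.

Proof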

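For all $x$ we have $e^x \geq 1 + x$ (simple exercise). Consider a set of $n$ strictly positive numbers $\{y_i\}$. Setting $x = \log y_i$ gives

$$y_i \geq 1 + \log y_i \;\Rightarrow\; \sum_{i} y_i \geq n + \sum_{i} \log y_i \;\Rightarrow\; A \geq 1 + \log\Big(\prod_{i} y_i\Big)^{1/n} = 1 + \log G$$

as required.

Recall that for any $(n \times n)$ matrix $A$ we have $\mathrm{tr}(A) = \sum_{i} \lambda_i$, the sum of the eigenvalues, and $|A| = \prod_{i} \lambda_i$, the product of the eigenvalues. Let $\lambda_i$ $(i = 1, \ldots, p)$ be the positive eigenvalues of $\Sigma^{-1}S$, with arithmetic mean $A$ and geometric mean $G$, and substitute in (3.11):

$$\log\left|\Sigma^{-1}S\right| = \log\prod_{i} \lambda_i = p\log G, \qquad \mathrm{tr}\left(\Sigma^{-1}S\right) = \sum_{i} \lambda_i = pA.$$

Hence

$$\phi(\Sigma) - \phi(S) = np\,\{A - \log G - 1\} \geq 0.$$

This proves that the MLEs are as stated in (3.6).

3.3 Sampling distribution of $\bar{x}$ and $S$

The Wishart distribution (Definition)

If $M$ $(p \times p)$ can be written $M = X^T X$, where $X$ $(m \times p)$ is a data matrix from $N_p(0, \Sigma)$, then $M$ is said to have a Wishart distribution with scale matrix $\Sigma$ and degrees of freedom $m$. We write

$$M \sim W_p(\Sigma, m) \tag{3.12}$$

When $\Sigma = I_p$ the distribution is said to be in standard form.

Note: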

The Wishart distribution is the multivariate generalization of the chi-square ($\chi^2$) distribution.

Additive property of matrices with a Wishart distribution

Let $M_1$, $M_2$ be matrices having the Wishart distributions

$$M_1 \sim W_p(\Sigma, m_1), \qquad M_2 \sim W_p(\Sigma, m_2)$$

independently; then

$$M_1 + M_2 \sim W_p(\Sigma, m_1 + m_2).$$

This property follows from the definition of the Wishart distribution because data matrices are additive, in the sense that if

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$$

is a combined data matrix consisting of $m_1 + m_2$ rows, then

$$X^T X = X_1^T X_1 + X_2^T X_2$$

is the matrix (known as the "Gram matrix") formed from the combined data matrix $X$.

Case of $p = 1$

When $p = 1$ we know, from the definition of $\chi^2_r$ as the distribution of the sum of squares of $r$ independent $N(0, 1)$ variates, that

$$M = \sum_{i=1}^{m} x_i^2 \sim \sigma^2 \chi^2_m$$

so that

$$W_1(\sigma^2, m) \equiv \sigma^2 \chi^2_m.$$

Sampling distributions

Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ from $N_p(\mu, \Sigma)$. Then

1. The sample mean $\bar{x}$ has the normal distribution $\bar{x} \sim N_p\left(\mu, \tfrac{1}{n}\Sigma\right)$.

2. The (scaled) sample covariance matrix has the Wishart distribution $(n - 1)S_u \sim W_p(\Sigma, n - 1)$, where $S_u$ denotes the sample covariance matrix with divisor $n - 1$.

3. The distributions of $\bar{x}$ and $S_u$ are independent.
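The three sampling results above can be checked by simulation. The sketch below is an illustration only and not part of the original notes: the parameter values, sample size, number of replicates and seed are assumptions. It relies on one extra fact, that a $W_p(\Sigma, m)$ matrix has expectation $m\Sigma$ (which follows from $M = X^T X$ with $m$ independent rows). The correlation it reports is only a weak check on result 3, since zero correlation does not by itself establish independence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (assumed) settings
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 10, 20000

# reps independent samples, each of size n: array of shape (reps, n, p)
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))

xbar = X.mean(axis=1)                        # (reps, p) sample means
centred = X - xbar[:, None, :]
# M = (n - 1) S_u = sum_r (x_r - xbar)(x_r - xbar)^T, one matrix per replicate
M = np.einsum("rni,rnj->rij", centred, centred)

cov_xbar = np.cov(xbar, rowvar=False)        # result 1: close to Sigma / n
mean_M = M.mean(axis=0)                      # result 2: close to (n - 1) * Sigma
corr = np.corrcoef(xbar[:, 0], M[:, 0, 0])[0, 1]   # result 3: close to 0

print("Cov(xbar) estimate:\n", cov_xbar, "\nSigma / n:\n", Sigma / n)
print("mean of (n-1) S_u:\n", mean_M, "\n(n-1) Sigma:\n", (n - 1) * Sigma)
print("corr(xbar_1, M_11):", corr)
```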

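3.4 Estimators for special circumstances

3.4.1 $\mu$ proportional to a given vector

Sometimes $\mu$ is known to be proportional to a given vector, so $\mu = k\mu_0$ with $\mu_0$ being a known vector. For example, if $x$ represents a sample of repeated measurements then $\mu = k\mathbf{1}$, where $\mathbf{1} = (1, 1, \ldots, 1)^T$ is the $p$-vector of 1's. We find the MLE of $k$ for this situation.

Suppose $\Sigma$ is known and $\mu = k\mu_0$. Let $d_0 = \bar{x} - k\mu_0$. The log likelihood is

$$l(k) = -2\log L = n\left\{\log|\Sigma| + \mathrm{tr}\left[\Sigma^{-1}\left(S + d_0 d_0^T\right)\right]\right\}$$

$$= n\left\{\log|\Sigma| + \mathrm{tr}\left(\Sigma^{-1}S\right) + (\bar{x} - k\mu_0)^T \Sigma^{-1} (\bar{x} - k\mu_0)\right\}$$

$$= n\left\{\bar{x}^T \Sigma^{-1} \bar{x} - 2k\,\mu_0^T \Sigma^{-1} \bar{x} + k^2 \mu_0^T \Sigma^{-1} \mu_0\right\} + \text{constant terms independent of } k.$$

Set $\dfrac{dl}{dk} = 0$ to minimize $l(k)$ w.r.t. $k$:

$$-2\mu_0^T \Sigma^{-1} \bar{x} + 2k\,\mu_0^T \Sigma^{-1} \mu_0 = 0,$$

from which

$$\hat{k} = \frac{\mu_0^T \Sigma^{-1} \bar{x}}{\mu_0^T \Sigma^{-1} \mu_0} \tag{3.13}$$

Properties

We now show that $\hat{k}$ is an unbiased estimator of $k$ and determine the variance of $\hat{k}$. In (3.13) $\hat{k}$ takes the form $\frac{1}{\alpha}c^T\bar{x}$ with $c^T = \mu_0^T \Sigma^{-1}$ and $\alpha = \mu_0^T \Sigma^{-1} \mu_0$, so

$$E[\hat{k}] = \frac{1}{\alpha}c^T E[\bar{x}] = \frac{k}{\alpha}c^T \mu_0 = \frac{k}{\alpha}\mu_0^T \Sigma^{-1} \mu_0,$$

since $E[\bar{x}] = k\mu_0$. Hence

$$E[\hat{k}] = k \tag{3.14}$$

showing that $\hat{k}$ is an unbiased estimator.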
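A small numerical illustration of (3.13) and of the unbiasedness result (3.14) is sketched below. It is not from the original notes: the function name `k_hat_mean`, the choice $\mu_0 = \mathbf{1}$, the value of $k$, the covariance matrix and the seed are all assumptions made for the example. The estimator is averaged over many simulated samples, and the average should sit close to the value of $k$ used to generate the data.

```python
import numpy as np

def k_hat_mean(X, mu0, Sigma):
    """Estimate k in mu = k * mu0 via (3.13), treating Sigma as known."""
    xbar = np.asarray(X, dtype=float).mean(axis=0)
    w = np.linalg.solve(Sigma, mu0)       # Sigma^{-1} mu0 (Sigma is symmetric)
    return (w @ xbar) / (w @ mu0)

# Illustrative (assumed) setting: repeated measurements, mu0 = vector of ones
rng = np.random.default_rng(2)
mu0 = np.ones(3)
k_true = 2.5
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
n, reps = 20, 2000

estimates = np.array([
    k_hat_mean(rng.multivariate_normal(k_true * mu0, Sigma, size=n), mu0, Sigma)
    for _ in range(reps)
])
print("k used for simulation :", k_true)
print("average of k_hat      :", estimates.mean())
```

Solving the linear system $\Sigma w = \mu_0$ avoids forming $\Sigma^{-1}$ explicitly, which is a numerical convenience rather than anything required by the derivation.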

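Note that $\mathrm{Var}[\bar{x}] = \frac{1}{n}\Sigma$ and therefore that $\mathrm{Var}[c^T\bar{x}] = \frac{1}{n}c^T \Sigma c$. With $c^T = \mu_0^T \Sigma^{-1}$ and $\alpha = \mu_0^T \Sigma^{-1} \mu_0$ we have

$$\mathrm{Var}[\hat{k}] = \frac{1}{n\alpha^2}c^T \Sigma c = \frac{1}{n\alpha^2}\mu_0^T \Sigma^{-1} \mu_0 = \frac{1}{n\,\mu_0^T \Sigma^{-1} \mu_0} \tag{3.15}$$

3.4.2 Linear restriction on $\mu$

We determine an estimator for $\mu$ to satisfy a linear restriction

$$A\mu = b$$

where $A$ $(m \times p)$ and $b$ $(m \times 1)$ are given constants and $\Sigma$ is assumed to be known. We write the restriction in vector form $g(\mu) = 0$, with $g(\mu) = A\mu - b$, and form the Lagrangean

$$L(\mu, \lambda) = l(\mu) + 2n\lambda^T g(\mu)$$

where $\lambda^T = (\lambda_1, \ldots, \lambda_m)$ is a vector of Lagrange multipliers (the factor $2n$ is inserted just for convenience). Ignoring constant terms involving $\Sigma$,

$$L(\mu, \lambda) = l(\mu) + 2n\lambda^T (A\mu - b) = n\left\{(\bar{x} - \mu)^T \Sigma^{-1} (\bar{x} - \mu) + 2\lambda^T (A\mu - b)\right\}.$$

Set $\dfrac{d}{d\mu}L(\mu, \lambda) = 0$, using results from the example sheet:

$$-2\Sigma^{-1}(\bar{x} - \mu) + 2A^T\lambda = 0 \;\Rightarrow\; \bar{x} - \mu = \Sigma A^T \lambda \tag{3.16}$$

We use the constraint $A\mu = b$ to evaluate the Lagrange multipliers $\lambda$. Premultiply (3.16) by $A$:

$$A\bar{x} - b = A\Sigma A^T \lambda \;\Rightarrow\; \lambda = \left(A\Sigma A^T\right)^{-1}(A\bar{x} - b).$$

Substitute into (3.16):

$$\hat{\mu} = \bar{x} - \Sigma A^T \left(A\Sigma A^T\right)^{-1}(A\bar{x} - b) \tag{3.17}$$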
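The restricted estimator (3.17) amounts to one linear solve. The sketch below is an illustration, not part of the original notes: the function names `restricted_mean` and `quad_form`, and the numerical values of $\bar{x}$, $\Sigma$, $A$ and $b$, are assumptions. It also assumes $A$ has full row rank, so that $A\Sigma A^T$ is invertible. The example checks that the estimate satisfies the restriction and compares the quadratic form $(\bar{x} - \mu)^T \Sigma^{-1} (\bar{x} - \mu)$ at $\hat{\mu}$ with its value at another point that also satisfies the restriction.

```python
import numpy as np

def restricted_mean(xbar, Sigma, A, b):
    """Estimator (3.17) of mu subject to A mu = b, with Sigma known.

    Assumes A has full row rank so that A Sigma A^T is invertible.
    """
    SAt = Sigma @ A.T                                # Sigma A^T
    lam = np.linalg.solve(A @ SAt, A @ xbar - b)     # Lagrange multipliers
    return xbar - SAt @ lam

def quad_form(xbar, mu, Sigma):
    """(xbar - mu)^T Sigma^{-1} (xbar - mu), the part of l(mu) that depends on mu."""
    d = xbar - mu
    return d @ np.linalg.solve(Sigma, d)

# Illustrative (assumed) values
xbar = np.array([1.2, -0.4, 0.9])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 1.0, 1.0]])    # restriction: the components of mu sum to b
b = np.array([1.0])

mu_hat = restricted_mean(xbar, Sigma, A, b)
other = mu_hat + 0.3 * np.array([1.0, -1.0, 0.0])   # also satisfies A mu = b

print("mu_hat            :", mu_hat)
print("A @ mu_hat        :", A @ mu_hat)
print("objective, mu_hat :", quad_form(xbar, mu_hat, Sigma))
print("objective, other  :", quad_form(xbar, other, Sigma))
```

If $\bar{x}$ already satisfies the restriction, the correction term vanishes and the function returns $\bar{x}$ unchanged.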

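3.4.3 Covariance matrix $\Sigma$ proportional to a given matrix

We consider estimating $k$ when $\Sigma = k\Sigma_0$, where $\Sigma_0$ is a given constant matrix. When $d = (\hat{\mu} - \bar{x}) = 0$, the likelihood (3.8) takes the form, plus constant terms (not involving $k$),

$$l(k) = n\left\{\log|k\Sigma_0| + \mathrm{tr}\left(k^{-1}\Sigma_0^{-1}S\right)\right\} = n\left\{p\log k + \frac{1}{k}\,\mathrm{tr}\left(\Sigma_0^{-1}S\right)\right\} + \text{constant terms},$$

using $|k\Sigma_0| = k^p|\Sigma_0|$. Then

$$\frac{dl}{dk} = 0 \;\Rightarrow\; \frac{p}{k} - \frac{1}{k^2}\,\mathrm{tr}\left(\Sigma_0^{-1}S\right) = 0.$$

Hence

$$\hat{k} = \frac{\mathrm{tr}\left(\Sigma_0^{-1}S\right)}{p} \tag{3.18}$$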
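The estimator (3.18) is equally short to compute. The sketch below is illustrative only and not part of the original notes: the function name `k_hat_cov`, the matrix $\Sigma_0$, the value of $k$ and the seed are assumptions. $S$ is formed with divisor $n$, as in (3.6b).

```python
import numpy as np

def k_hat_cov(X, Sigma0):
    """Estimate k in Sigma = k * Sigma0 via (3.18): tr(Sigma0^{-1} S) / p."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centred = X - X.mean(axis=0)
    S = centred.T @ centred / n                  # divisor n, as in (3.6b)
    return np.trace(np.linalg.solve(Sigma0, S)) / p

# Illustrative (assumed) setting
rng = np.random.default_rng(3)
Sigma0 = np.array([[1.0, 0.3],
                   [0.3, 2.0]])
k_true = 3.0
X = rng.multivariate_normal(np.zeros(2), k_true * Sigma0, size=2000)

k_est = k_hat_cov(X, Sigma0)
print("k used for simulation:", k_true)
print("k_hat                :", k_est)
```

Two quick sanity checks follow from the formula: taking $\Sigma_0 = S$ gives $\hat{k} = 1$, and rescaling the data by a factor $a$ multiplies $\hat{k}$ by $a^2$.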
