Lecture 1. Random vectors and multivariate normal distribution

Milton Chandler
7 years ago
1.1 Moments of random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \dots, X_p$, written $X = (X_1, \dots, X_p)'$. The mean or expectation of $X$ is defined by the vector of expectations,

$$\mu \equiv E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},$$

which exists if $E|X_i| < \infty$ for all $i = 1, \dots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ be a random vector of size $q$. For any non-random matrices $A$ ($m \times p$), $B$ ($m \times q$), $C$ ($1 \times n$), and $D$ ($m \times n$),

$$E(AX + BY) = A\,E(X) + B\,E(Y), \qquad E(AXC + D) = A\,E(X)\,C + D.$$

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \dots, p$, the variance-covariance matrix (or just covariance matrix) of $X$ is

$$\Sigma \equiv \mathrm{Cov}(X) = E[(X - EX)(X - EX)'].$$

The covariance matrix of $X$ is a $p \times p$ square, symmetric matrix. In particular, $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \Sigma_{ji}$.

Some properties:

1. $\mathrm{Cov}(X) = E(XX') - E(X)E(X)'$.
2. If $c = c_{(p \times 1)}$ is a constant, $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.
3. If $A$ ($m \times p$) is a constant, $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)\,A'$.

Lemma 2. The $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.

1.2 Multivariate normal distribution - nonsingular case

Recall that the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has density

$$f(x) = (2\pi\sigma^2)^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)\,\sigma^{-2}\,(x - \mu)\right].$$

Similarly, the multivariate normal distribution for the special case of nonsingular covariance matrix $\Sigma$ is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma$ ($p \times p$) $> 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has probability density function

$$f(x) = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right], \qquad (1)$$

for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If $X \sim N_p(\mu, \Sigma)$ for $\Sigma > 0$, then

1. $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$,
2. $X \overset{L}{=} \Sigma^{1/2} Y + \mu$ where $Y \sim N_p(0, I_p)$,
3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,
4. for any fixed $v \in \mathbb{R}^p$, $v'X$ is univariate normal,
5. $U = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Example 1 (Bivariate normal).

Geometry of multivariate normal

The multivariate normal distribution has location parameter $\mu$ and shape parameter $\Sigma > 0$. In particular, let us look at the contour of equal density

$$E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)'\Sigma^{-1}(x - \mu) = c^2\}.$$

Moreover, consider the spectral decomposition $\Sigma = U\Lambda U'$, where $U = [u_1, \dots, u_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p > 0$. The set $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ of length proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider a bivariate normal distribution $N_2(0, \Sigma)$ with

$$\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \cos(\pi/4) & \sin(\pi/4) \\ -\sin(\pi/4) & \cos(\pi/4) \end{pmatrix}.$$

The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45 degree line, the other along the -45 degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \dots$.

Figure 1: Bivariate normal density and its contours. Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play a similar role.
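The geometry of the bivariate example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `contour_points` and the choice $c = 1.5$ are illustrative and not part of the notes. It computes the spectral decomposition of $\Sigma$, maps the unit circle to the contour $E_c$, and confirms that every point on it satisfies $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$.

```python
import numpy as np

Sigma = np.array([[2.0, 1.0], [1.0, 2.0]])
mu = np.zeros(2)

# eigh returns eigenvalues in ascending order; reverse so lambda_1 >= lambda_2.
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]


def contour_points(c, n=200):
    """Points on E_c, obtained as x = mu + U Lambda^{1/2} (c z) with |z| = 1."""
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.stack([np.cos(theta), np.sin(theta)])   # 2 x n unit circle
    scaled = c * np.sqrt(lam)[:, None] * circle         # semi-axes c * sqrt(lambda_i)
    return (mu[:, None] + U @ scaled).T                 # n x 2


c = 1.5
pts = contour_points(c)

# Squared Mahalanobis distance of each contour point from mu.
maha = np.einsum("ni,ij,nj->n", pts - mu, np.linalg.inv(Sigma), pts - mu)

print("eigenvalues:", lam)
print("first principal axis:", U[:, 0])
print("max |maha - c^2|:", np.abs(maha - c**2).max())
```

The eigenvector signs returned by `eigh` are arbitrary, so the first principal axis may point along either direction of the 45 degree line.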

1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as

$$\varphi_X(t) = E(e^{i t'X}), \quad \text{for } t \in \mathbb{R}^p.$$

Note that the characteristic function is $\mathbb{C}$-valued, and always exists. We collect some important facts.

1. $\varphi_X(t) = \varphi_Y(t)$ for all $t$ if and only if $X \overset{L}{=} Y$.
2. If $X$ and $Y$ are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$.
3. $X_n \Rightarrow X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér-Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t'X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of the (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t'X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a 1-dimensional subspace is normal, with the convention that a degenerate distribution $\delta_c$ is a normal distribution with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ is nonsingular.
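To illustrate that Definition 2 allows a singular covariance matrix, here is a small simulation sketch, assuming NumPy. The construction $X = \mu + aZ$ with scalar $Z \sim N(0, 1)$, the particular vectors, the seed, and the sample size are all illustrative choices. The covariance $aa'$ has rank 1, the projection onto a direction orthogonal to $a$ is the degenerate $N(c, 0)$, and a generic projection has variance $t'\Sigma t$.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

mu = np.array([1.0, -1.0])
a = np.array([1.0, 2.0])         # X = mu + a * Z with scalar Z ~ N(0, 1)
Sigma = np.outer(a, a)           # Cov(X) = a a', a rank-1 (singular) matrix

z = rng.standard_normal(200_000)
X = mu + z[:, None] * a          # each row is one draw of X

t_degenerate = np.array([2.0, -1.0])   # orthogonal to a, so t' Sigma t = 0
t_generic = np.array([1.0, 1.0])       # t' Sigma t = 9

proj_degenerate = X @ t_degenerate     # constant: the degenerate normal N(3, 0)
proj_generic = X @ t_generic           # approximately N(0, 9)

print("rank of Sigma:", np.linalg.matrix_rank(Sigma))
print("std of degenerate projection:", proj_degenerate.std())
print("variance of generic projection:", proj_generic.var(),
      "vs t' Sigma t =", t_generic @ Sigma @ t_generic)
```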

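Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,

$$\varphi(t) = \exp\left[i t'\mu - \tfrac{1}{2} t'\Sigma t\right].$$

If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for a non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in a statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A$ ($q \times p$) and $b$ ($q \times 1$), then $Y \sim N_q(A\mu + b, A\Sigma A')$.

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be the partition of $X$ whose dimensions are $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N_p\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right).$$

Proposition 7. The normal random vectors $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0$.

Proposition 8. The conditional distribution of $X_1$ given $X_2 = x_2$ is

$$N_r\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$

Proof. Consider new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$,

$$X^* = \begin{bmatrix} X_1^* \\ X_2^* \end{bmatrix} = AX, \qquad A = \begin{bmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{bmatrix}.$$

By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are uncorrelated, hence independent by Proposition 7. The result follows by writing $X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2$, so that the distribution (law) of $X_1$ given $X_2 = x_2$ is

$$L(X_1 \mid X_2 = x_2) = L(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = L(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2),$$

which is a MVN of dimension $r$. Since $X_1^*$ is independent of $X_2$, with $E(X_1^*) = \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2$ and $\mathrm{Cov}(X_1^*) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, the mean and covariance are as stated.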
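The formulas of Proposition 8 and the decorrelation step in its proof can be sketched in a few lines, assuming NumPy. The function name `conditional_mvn` and the numerical values of $\mu$, $\Sigma$, and $x_2$ are hypothetical choices made only for illustration.

```python
import numpy as np


def conditional_mvn(mu, Sigma, r, x2):
    """Mean and covariance of X_1 | X_2 = x2, where X_1 is the first r coordinates."""
    mu1, mu2 = mu[:r], mu[r:]
    S11, S21, S22 = Sigma[:r, :r], Sigma[r:, :r], Sigma[r:, r:]
    # K = Sigma_12 Sigma_22^{-1}, obtained by solving a linear system
    # (valid because Sigma_22 is symmetric) instead of forming an inverse.
    K = np.linalg.solve(S22, S21).T
    return mu1 + K @ (x2 - mu2), S11 - K @ S21


# Illustrative (assumed) parameters with r = 1 and s = 2.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 1.0],
                  [1.0, 1.0, 2.0]])
r, s = 1, 2
x2 = np.array([3.0, 2.0])

cond_mean, cond_cov = conditional_mvn(mu, Sigma, r, x2)

# The transformation A from the proof: Cov(AX) = A Sigma A' is block diagonal.
K = np.linalg.solve(Sigma[r:, r:], Sigma[r:, :r]).T
A = np.block([[np.eye(r), -K],
              [np.zeros((s, r)), np.eye(s)]])
cov_star = A @ Sigma @ A.T

print("conditional mean:", cond_mean)
print("conditional covariance:", cond_cov)
print("off-diagonal block of Cov(X*):", cov_star[:r, r:])
```

The top-left block of `cov_star` coincides with the conditional covariance, which is the content of the last step of the proof.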

1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \dots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then

$$n^{-1/2} \sum_{j=1}^{n} (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,$$

or equivalently,

$$n^{1/2}(\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,$$

where $\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$.

The delta-method can be used to obtain asymptotic normality of $h(\bar{X}_n)$ for a function $h : \mathbb{R}^p \to \mathbb{R}$. In particular, denote by $\nabla h(x)$ the gradient of $h$ at $x$. Using the first two terms of the Taylor series,

$$h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))'(\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).$$

Then Slutsky's theorem gives the result,

$$\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) = (\nabla h(\mu))'\sqrt{n}\,(\bar{X}_n - \mu) + O_p\left(\sqrt{n}\,(\bar{X}_n - \mu)'(\bar{X}_n - \mu)\right) \Rightarrow (\nabla h(\mu))' N_p(0, \Sigma) = N\left(0, (\nabla h(\mu))'\Sigma(\nabla h(\mu))\right) \quad \text{as } n \to \infty.$$

1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form

$$Y = X'AX = \sum_{i=1}^{p}\sum_{j=1}^{p} X_i a_{ij} X_j,$$

where $A$ is a $p \times p$ symmetric matrix. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.

Example 2. A special case: If $X \sim N_p(0, I_p)$ and $A = I_p$,

$$Y = X'AX = X'X = \sum_{i=1}^{p} X_i^2 \sim \chi^2(p).$$

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.
2. If $A$ is symmetric, then $A = \Gamma'\Lambda\Gamma$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.
3. If $A$ is symmetric idempotent,
(a) its eigenvalues are either 0 or 1,

(b) $\mathrm{rank}(A) = \#\{\text{non-zero eigenvalues}\} = \mathrm{trace}(A)$.

Theorem 9. Let $X \sim N_p(0, \sigma^2 I)$ and $A$ be a $p \times p$ symmetric matrix. Then

$$Y = \frac{X'AX}{\sigma^2} \sim \chi^2(m)$$

if and only if $A$ is idempotent of rank $m < p$.

Corollary 10. Let $X \sim N_p(0, \Sigma)$ and $A$ be a $p \times p$ symmetric matrix. Then $Y = X'AX \sim \chi^2(m)$ if and only if either i) $A\Sigma$ is idempotent of rank $m$ or ii) $\Sigma A$ is idempotent of rank $m$.

Example 3. If $X \sim N_p(\mu, \Sigma)$ then $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.

Theorem 11. Let $X \sim N_p(0, I)$, let $A$ be a $p \times p$ symmetric matrix, and let $B$ be a $k \times p$ matrix. If $BA = 0$, then $BX$ and $X'AX$ are independent.

Example 4. Let $X_i \sim N(\mu, \sigma^2)$ i.i.d. The sample mean $\bar{X}_n$ and the sample variance $S_n^2 = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$ are independent. Moreover,

$$(n - 1)\frac{S_n^2}{\sigma^2} \sim \chi^2(n - 1).$$

Theorem 12. Let $X \sim N_p(0, I)$. Suppose $A$ and $B$ are $p \times p$ symmetric matrices. If $BA = 0$, then $X'AX$ and $X'BX$ are independent.

Corollary 13. Let $X \sim N_p(0, \Sigma)$ and $A$ be a $p \times p$ symmetric matrix.

1. For $B$ ($k \times p$), $BX$ and $X'AX$ are independent if $B\Sigma A = 0$;
2. For symmetric $B$, $X'AX$ and $X'BX$ are independent if $B\Sigma A = 0$.

Example 5. The residual sum of squares in the standard linear regression has a scaled chi-squared distribution and is independent of the coefficient estimates.

Next lecture is on the distribution of the sample covariance matrix.
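Example 4 can be tied back to Theorems 9 and 11 with a short numerical sketch, assuming NumPy. The centering matrix $C = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}'$, the values of $n$, $\mu$, $\sigma$, the seed, and the number of replications are illustrative assumptions. Since $C\mathbf{1} = 0$, the quadratic form $X'CX$ is unchanged when $\mu\mathbf{1}$ is subtracted from $X$, so the zero-mean theorems apply to $(X - \mu\mathbf{1})/\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the sketch is reproducible

n, mu, sigma = 5, 2.0, 3.0
ones = np.ones(n)
C = np.eye(n) - np.outer(ones, ones) / n   # x' C x = sum_i (x_i - xbar)^2
B = ones[None, :] / n                      # 1 x n matrix with B x = xbar

# Each row is one i.i.d. sample X_1, ..., X_n from N(mu, sigma^2).
samples = mu + sigma * rng.standard_normal((100_000, n))

xbar = (samples @ B.T).ravel()
# Quadratic form X' C X / sigma^2, which equals (n - 1) S_n^2 / sigma^2.
q = np.einsum("ki,ij,kj->k", samples, C, samples) / sigma**2

print("C idempotent:", np.allclose(C @ C, C))
print("rank(C) = trace(C) =", np.trace(C))
print("B C = 0:", np.allclose(B @ C, 0.0))
print("mean and variance of q:", q.mean(), q.var())   # chi^2(n-1): n-1 and 2(n-1)
print("corr(xbar, q):", np.corrcoef(xbar, q)[0, 1])
```

A near-zero correlation is only a necessary consequence of independence, so the last line is a sanity check rather than a proof.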
