Lecture: Intro/refresher in Matrix Algebra
Bruce Walsh lecture notes, SISG - Mixed Model Course, version 8 June

Topics
- Definitions, dimensionality, addition, subtraction
- Matrix multiplication
- Inverses, solving systems of equations
- Quadratic products and covariances
- The multivariate normal distribution
- Ordinary least squares
- Vector/matrix calculus (taking derivatives)

Matrix/linear algebra
Matrix algebra is a compact way of treating the algebra of systems of linear equations. Most common statistical methods can be written in matrix form:
  y = X\beta + e is the general linear model, with OLS solution \hat{\beta} = (X^T X)^{-1} X^T y
  y = X\beta + Za + e is the general mixed model

Matrices: an array of elements.
Vectors: a matrix with either one row or one column, usually written in bold lowercase, e.g., a, b, c. For example, a is a 3 x 1 column vector and b is a 1 x 4 row vector.
Dimensionality of a matrix: r x c (rows x columns); think of Railroad Car.
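To make these definitions concrete, here is a minimal R sketch (the numeric entries and the small X and y are purely illustrative, not from the notes; solve() and %*% are previewed here and explained later in these slides):

  # Build a 3 x 1 column vector and a 1 x 4 row vector (illustrative entries)
  a <- matrix(c(1, 3, 2), nrow = 3, ncol = 1)    # column vector, 3 x 1
  b <- matrix(c(0, 5, 2, 1), nrow = 1, ncol = 4) # row vector, 1 x 4
  dim(a)   # 3 1: dimensionality is r x c (rows x columns)
  dim(b)   # 1 4

  # OLS solution beta-hat = (X^T X)^{-1} X^T y for an illustrative design matrix
  X <- cbind(1, c(2, 4, 3, 1))   # intercept column plus one predictor (4 obs)
  y <- c(3, 7, 6, 2)
  beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y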
General Matrices
Usually written in bold uppercase, e.g., A, C, D. For example, C below is a 3 x 3 square matrix, while D is a 3 x 2 matrix. Dimensionality of a matrix: r x c (rows x columns); think of Railroad Car. A matrix is defined by a list of its elements: B has ij-th element B_ij, the element in row i and column j.

Partitioned Matrices
It will often prove useful to divide (or partition) the elements of a matrix into a matrix whose elements are themselves matrices. For example, a 3 x 3 matrix C can be partitioned as
  C = ( a  b )
      ( d  B )
where a is a 1 x 1 scalar, b a 1 x 2 row vector, d a 2 x 1 column vector, and B a 2 x 2 matrix. One useful partition is to write the matrix as either a row vector of column vectors or a column vector of row vectors.

Addition and Subtraction of Matrices
If two matrices have the same dimension (same number of rows and columns), then matrix addition and subtraction simply follow by adding (or subtracting) on an element-by-element basis:
  Matrix addition:    (A + B)_ij = A_ij + B_ij
  Matrix subtraction: (A - B)_ij = A_ij - B_ij
Example: for two 2 x 2 matrices A and B, the sum C = A + B and the difference D = A - B are formed element by element (see the R sketch below).
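A minimal R sketch of partitioning and element-wise addition/subtraction; all matrix entries here are illustrative choices, not the slide's own numbers:

  # A 3 x 3 matrix, built column by column (illustrative entries)
  C <- matrix(c(3, 2, 1,
                1, 5, 1,
                2, 4, 2), nrow = 3)
  # Partition C into a (1 x 1), b (1 x 2), d (2 x 1), and B (2 x 2)
  a <- C[1, 1, drop = FALSE]
  b <- C[1, 2:3, drop = FALSE]   # drop = FALSE keeps the matrix (vector) shape
  d <- C[2:3, 1, drop = FALSE]
  B <- C[2:3, 2:3]

  # Addition and subtraction are element by element (dimensions must match)
  A1 <- matrix(c(1, 3, 2, 0), 2, 2)
  B1 <- matrix(c(3, 1, 0, 2), 2, 2)
  A1 + B1   # (A + B)_ij = A_ij + B_ij
  A1 - B1   # (A - B)_ij = A_ij - B_ij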
Towards Matrix Multiplication: dot products
The dot (or inner) product of two vectors a and b (both of length n) is defined as
  a . b = \sum_{i=1}^n a_i b_i
Example: for a = (1, 2, 3, 4)^T and b = (4, 5, 7, 9),
  a . b = 1*4 + 2*5 + 3*7 + 4*9 = 71

The least-squares solution for the linear model
  y = \mu + \beta_1 z_1 + ... + \beta_n z_n
yields the following system of equations for the \beta_i:
  \sigma(y, z_1) = \beta_1 \sigma^2(z_1)    + \beta_2 \sigma(z_1, z_2) + ... + \beta_n \sigma(z_1, z_n)
  \sigma(y, z_2) = \beta_1 \sigma(z_1, z_2) + \beta_2 \sigma^2(z_2)    + ... + \beta_n \sigma(z_2, z_n)
  ...
  \sigma(y, z_n) = \beta_1 \sigma(z_1, z_n) + \beta_2 \sigma(z_2, z_n) + ... + \beta_n \sigma^2(z_n)
This can be more compactly written in matrix form as
  ( \sigma^2(z_1)     \sigma(z_1,z_2)  ...  \sigma(z_1,z_n) ) ( \beta_1 )   ( \sigma(y,z_1) )
  ( \sigma(z_1,z_2)   \sigma^2(z_2)    ...  \sigma(z_2,z_n) ) ( \beta_2 ) = ( \sigma(y,z_2) )
  (      ...               ...         ...       ...        ) (   ...   )   (      ...      )
  ( \sigma(z_1,z_n)   \sigma(z_2,z_n)  ...  \sigma^2(z_n)   ) ( \beta_n )   ( \sigma(y,z_n) )
with the coefficient matrix playing the role of X^T X, the vector of slopes the role of \beta, and the right-hand side the role of X^T y, so that \beta = (X^T X)^{-1} X^T y. Matrices are compact ways to write systems of equations.

Matrix Multiplication
The order in which matrices are multiplied affects the matrix product; in general, AB \neq BA. For the product of two matrices to exist, the matrices must conform: for AB, the number of columns of A must equal the number of rows of B. The matrix C = AB has the same number of rows as A and the same number of columns as B:
  C_(r x c) = A_(r x k) B_(k x c)
The ij-th element of C is given by
  C_ij = \sum_{l=1}^k A_il B_lj
that is, the elements in the i-th row of A times the corresponding elements in the j-th column of B.
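The dot-product example, checked in R; sum(a * b) and the 1 x 1 matrix product t(a) %*% b give the same value:

  a <- c(1, 2, 3, 4)
  b <- c(4, 5, 7, 9)
  sum(a * b)    # 1*4 + 2*5 + 3*7 + 4*9 = 71
  t(a) %*% b    # the same inner product, returned as a 1 x 1 matrix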
Outer indices give the dimensions of the resulting matrix, with r rows (from A) and c columns (from B):
  C_(r x c) = A_(r x k) B_(k x c)
Inner indices must match: columns of A = rows of B.

Example: Suppose A is 3 x 5, B is 5 x 9, C is 9 x 6, and D is 6 x 3. Is the product ABCD defined? If so, what is its dimensionality? Yes, it is defined, as the inner indices match, and the result is a 3 x 3 matrix (3 rows, 3 columns).

Example:
  AB = ( a b ) ( e f ) = ( ae + bg  af + bh )
       ( c d ) ( g h )   ( ce + dg  cf + dh )
Likewise,
  BA = ( e f ) ( a b ) = ( ea + fc  eb + fd )
       ( g h ) ( c d )   ( ga + hc  gb + hd )
The ORDER of multiplication matters! Indeed, consider C_(3x5) D_(5x5), which gives a 3 x 5 matrix, versus D_(5x5) C_(3x5), which is not defined.

More formally, consider the product L = MN. Express the matrix M as a column vector of row vectors,
  M = ( m_1 )
      ( m_2 )   where m_i = ( M_i1  M_i2  ...  M_ic )
      ( ... )
      ( m_r )
Likewise, express N as a row vector of column vectors,
  N = ( n_1  n_2  ...  n_b )   where n_j = ( N_1j, N_2j, ..., N_cj )^T
The ij-th element of L is the inner product of M's row i with N's column j:
  L = ( m_1 . n_1   m_1 . n_2   ...   m_1 . n_b )
      ( m_2 . n_1   m_2 . n_2   ...   m_2 . n_b )
      (    ...          ...     ...       ...   )
      ( m_r . n_1   m_r . n_2   ...   m_r . n_b )

The Transpose of a Matrix
The transpose of a matrix exchanges its rows and columns: (A^T)_ij = A_ji.
Useful identities: (AB)^T = B^T A^T and (ABC)^T = C^T B^T A^T.
Inner product: a^T b = a^T_(1 x n) b_(n x 1). The indices match (the matrices conform), and the dimension of the resulting product is 1 x 1 (i.e., a scalar):
  a^T b = ( a_1 ... a_n ) ( b_1 ) = \sum_{i=1}^n a_i b_i
                          ( ... )
                          ( b_n )
Note that b^T a = (b^T a)^T = a^T b.
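A sketch in R checking non-commutativity, conformability, and the transpose identity, using random matrices with the dimensions from the example above (the 2 x 2 entries are illustrative):

  # Order matters: AB != BA in general
  A <- matrix(c(1, 3, 2, 4), 2, 2)
  B <- matrix(c(0, 1, 1, 0), 2, 2)
  A %*% B
  B %*% A                                # differs from A %*% B

  # Dimensions from the slide: A (3x5), B (5x9), C (9x6), D (6x3)
  A <- matrix(rnorm(3 * 5), 3, 5); B <- matrix(rnorm(5 * 9), 5, 9)
  C <- matrix(rnorm(9 * 6), 9, 6); D <- matrix(rnorm(6 * 3), 6, 3)
  dim(A %*% B %*% C %*% D)               # 3 3: inner indices match
  all.equal(t(A %*% B), t(B) %*% t(A))   # TRUE: (AB)^T = B^T A^T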
Outer product: a b^T = a_(n x 1) b^T_(1 x n). The resulting product is an n x n matrix.

The Identity Matrix, I
The identity matrix serves the role of the number 1 in matrix multiplication: AI = A, IA = A.
I is a square diagonal matrix, with all diagonal elements being one and all off-diagonal elements zero:
  I_ij = 1 for i = j, 0 otherwise
  I_(3x3) = ( 1 0 0 )
            ( 0 1 0 )
            ( 0 0 1 )

Solving equations
The identity matrix I serves the same role as 1 in scalar algebra, e.g., a*1 = 1*a = a, with AI = IA = A.
The inverse matrix A^{-1} (IF it exists) is defined by A A^{-1} = I and A^{-1} A = I. It serves the same role as scalar division:
  To solve ax = c, multiply both sides by (1/a) to give (1/a)*ax = (1/a)*c, or (1/a)*a*x = 1*x = x; hence x = (1/a)c.
  To solve Ax = c, multiply by the inverse: A^{-1} A x = I x = x = A^{-1} c.

The Inverse Matrix, A^{-1}
For a square matrix A, define its inverse A^{-1} as the matrix satisfying A^{-1} A = A A^{-1} = I. For
  A = ( a b ),   A^{-1} = 1/(ad - bc) (  d  -b )
      ( c d )                         ( -c   a )
If the quantity ad - bc (the determinant) is zero, the inverse does not exist.
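A sketch comparing the 2 x 2 inverse formula to R's solve(), then using the inverse to solve Ax = c; the entries of A and c are illustrative:

  A <- matrix(c(1, 3, 2, 4), 2, 2)              # built by columns: a=1, c=3, b=2, d=4
  detA <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1] # ad - bc; nonzero, so the inverse exists
  Ainv <- matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), 2, 2) / detA
  all.equal(Ainv, solve(A))                     # TRUE: matches the formula above
  A %*% Ainv                                    # the identity matrix I (up to rounding)
  diag(2)                                       # I_(2x2)
  c.vec <- c(5, 6)
  solve(A) %*% c.vec                            # x = A^{-1} c solves Ax = c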
If det(A) is not zero, A^{-1} exists and A is said to be non-singular. If det(A) = 0, A is singular, and no unique inverse exists (generalized inverses do). Generalized inverses, and their uses in solving systems of equations, are discussed in Appendix 3 of Lynch & Walsh. A^- is the typical notation to denote the G-inverse of a matrix. When a G-inverse is used, provided the system is consistent, some of the variables have a family of solutions (e.g., x_1 is determined uniquely, but x_2 + x_3 = 6 admits a family of values for x_2 and x_3).

Useful identities: (A^T)^{-1} = (A^{-1})^T and (AB)^{-1} = B^{-1} A^{-1}.
For a diagonal matrix D, the determinant det(D) = |D| is the product of the diagonal elements. Also, the determinant of any square matrix A, det(A), is simply the product of the eigenvalues \lambda of A, which satisfy
  Ae = \lambda e
where e is the eigenvector associated with \lambda. If A is n x n, the solutions for \lambda are the roots of an n-degree polynomial. If any of the roots of that equation are zero, A^{-1} is not defined; further, for some linear combination b, we have Ab = 0.

Example: solve the OLS for \beta = (\beta_1, \beta_2)^T in y = \alpha + \beta_1 z_1 + \beta_2 z_2 + e.
  \beta = V^{-1} c,   where   V = ( \sigma^2(z_1)    \sigma(z_1,z_2) )   and   c = ( \sigma(y,z_1) )
                                  ( \sigma(z_1,z_2)  \sigma^2(z_2)   )             ( \sigma(y,z_2) )
It is more compact to use \sigma(z_1,z_2) = \rho_{12} \sigma(z_1) \sigma(z_2), giving
  V^{-1} = 1 / [ \sigma^2(z_1) \sigma^2(z_2) (1 - \rho_{12}^2) ] (  \sigma^2(z_2)     -\sigma(z_1,z_2) )
                                                                 ( -\sigma(z_1,z_2)    \sigma^2(z_1)  )
so that
  \beta_1 = [ \sigma(y,z_1)/\sigma^2(z_1) - \rho_{12} \sigma(y,z_2)/(\sigma(z_1)\sigma(z_2)) ] / (1 - \rho_{12}^2)
  \beta_2 = [ \sigma(y,z_2)/\sigma^2(z_2) - \rho_{12} \sigma(y,z_1)/(\sigma(z_1)\sigma(z_2)) ] / (1 - \rho_{12}^2)
If \rho_{12} = 0, these reduce to the two univariate slopes,
  \beta_1 = \sigma(y,z_1) / \sigma^2(z_1)   and   \beta_2 = \sigma(y,z_2) / \sigma^2(z_2)
Likewise, if \rho_{12} = 1, this reduces to a univariate regression.
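A sketch of the two-predictor OLS example in R: the closed-form slopes written with \rho_{12} agree with \beta = V^{-1} c. All variances and covariances here are illustrative values:

  # Illustrative second moments
  s2z1 <- 4; s2z2 <- 9                 # sigma^2(z1), sigma^2(z2)
  rho  <- 0.5                          # correlation between z1 and z2
  cz   <- rho * sqrt(s2z1 * s2z2)      # sigma(z1, z2)
  cy1  <- 3; cy2 <- 2                  # sigma(y, z1), sigma(y, z2)

  V <- matrix(c(s2z1, cz, cz, s2z2), 2, 2)
  cvec <- c(cy1, cy2)
  beta <- solve(V) %*% cvec            # beta = V^{-1} c

  # Closed form using rho
  b1 <- (cy1 / s2z1 - rho * cy2 / sqrt(s2z1 * s2z2)) / (1 - rho^2)
  b2 <- (cy2 / s2z2 - rho * cy1 / sqrt(s2z1 * s2z2)) / (1 - rho^2)
  all.equal(as.vector(beta), c(b1, b2))   # TRUE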
Variance-Covariance matrix
A very important square matrix is the variance-covariance matrix V associated with a vector x of random variables: V_ij = Cov(x_i, x_j), so that the i-th diagonal element of V is the variance of x_i, and the off-diagonal elements are covariances. V is a symmetric, square matrix.

Quadratic and Bilinear Forms
Quadratic product: for A_(n x n) and x_(n x 1),
  x^T A x = \sum_{i=1}^n \sum_{j=1}^n a_ij x_i x_j
which is a scalar (1 x 1).
Bilinear form (a generalization of the quadratic product): for A_(m x n), a_(n x 1), and b_(m x 1), their bilinear form is b^T_(1 x m) A_(m x n) a_(n x 1):
  b^T A a = \sum_{i=1}^m \sum_{j=1}^n A_ij b_i a_j
Note that b^T A a = a^T A^T b.

The trace
The trace, tr(A) or trace(A), of a square matrix A is simply the sum of its diagonal elements. The importance of the trace is that it equals the sum of the eigenvalues of A, tr(A) = \sum \lambda_i. For a covariance matrix V, tr(V) measures the total amount of variation in the variables, and \lambda_i / tr(V) is the fraction of the total variation in x contained in the linear combination e_i^T x, where e_i, the i-th principal component of V, is also the i-th eigenvector of V (V e_i = \lambda_i e_i).

Covariance Matrices for Transformed Variables
What is the variance of the linear combination c_1 x_1 + c_2 x_2 + ... + c_n x_n? (Note this is a scalar.)
  \sigma^2(c^T x) = \sigma^2( \sum_{i=1}^n c_i x_i ) = \sigma( \sum_{i=1}^n c_i x_i, \sum_{j=1}^n c_j x_j )
                  = \sum_{i=1}^n \sum_{j=1}^n \sigma(c_i x_i, c_j x_j) = \sum_{i=1}^n \sum_{j=1}^n c_i c_j \sigma(x_i, x_j)
                  = c^T V c
Likewise, the covariance between two linear combinations can be expressed as a bilinear form:
  \sigma(a^T x, b^T x) = a^T V b
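A quick R sketch of the quadratic form, the bilinear form, and the trace/eigenvalue identity, with an illustrative symmetric positive-definite V:

  V <- matrix(c(4, 1, 1, 3), 2, 2)   # illustrative covariance matrix
  cvec <- c(2, -1)
  bvec <- c(1, 1)
  t(cvec) %*% V %*% cvec             # quadratic form c^T V c = Var(c^T x)
  t(cvec) %*% V %*% bvec             # bilinear form c^T V b = Cov(c^T x, b^T x)
  sum(diag(V))                       # trace tr(V) = 7
  sum(eigen(V)$values)               # also 7: tr(V) equals the sum of the eigenvalues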
Example: Suppose the variances of x_1, x_2, and x_3 are 10, 20, and 30. x_1 and x_2 have a covariance of -5, x_1 and x_3 a covariance of 10, while x_2 and x_3 are uncorrelated. What are the variances of the new variables y_1 = x_1 - 2x_2 + 5x_3 and y_2 = 6x_2 - 4x_3, and their covariance?
  Var(y_1) = Var(c_1^T x) = c_1^T Var(x) c_1 = 960
  Var(y_2) = Var(c_2^T x) = c_2^T Var(x) c_2 = 1200
  Cov(y_1, y_2) = Cov(c_1^T x, c_2^T x) = c_1^T Var(x) c_2 = -910

Positive-definite matrix
A matrix V is positive-definite if, for all vectors c containing at least one non-zero element, c^T V c > 0. A non-negative definite matrix satisfies c^T V c >= 0. Any covariance matrix is (at least) non-negative definite, as Var(c^T x) = c^T V c >= 0. Any nonsingular covariance matrix is positive-definite: nonsingular means det(V) > 0, or equivalently, all eigenvalues of V are positive, \lambda_i > 0.

Now suppose we transform one vector of random variables into another vector of random variables. Transform x into
  (i)  y_(k x 1) = A_(k x n) x_(n x 1)
  (ii) z_(m x 1) = B_(m x n) x_(n x 1)
The covariance between the elements of these two transformed vectors of the original x is a k x m covariance matrix, A V B^T. For example, the covariance between y_i and y_j is given by the ij-th element of A V A^T; likewise, the covariance between y_i and z_j is given by the ij-th element of A V B^T.

The Multivariate Normal Distribution (MVN)
Consider the pdf for n independent normal random variables, the i-th of which has mean \mu_i and variance \sigma_i^2:
  p(x) = \prod_{i=1}^n (2\pi)^{-1/2} \sigma_i^{-1} \exp( -(x_i - \mu_i)^2 / (2 \sigma_i^2) )
       = (2\pi)^{-n/2} ( \prod_{i=1}^n \sigma_i )^{-1} \exp( - \sum_{i=1}^n (x_i - \mu_i)^2 / (2 \sigma_i^2) )
This can be expressed more compactly in matrix form.
Define the covariance matrix V for the vector x of the n normal random variables by
  V = diag( \sigma_1^2, \sigma_2^2, ..., \sigma_n^2 )
and define the mean vector \mu by \mu = ( \mu_1, \mu_2, ..., \mu_n )^T. Then
  |V| = \prod_{i=1}^n \sigma_i^2   and   \sum_{i=1}^n (x_i - \mu_i)^2 / \sigma_i^2 = (x - \mu)^T V^{-1} (x - \mu)
Hence, in matrix form the MVN pdf becomes
  p(x) = (2\pi)^{-n/2} |V|^{-1/2} \exp[ -(1/2) (x - \mu)^T V^{-1} (x - \mu) ]
Notice that this holds for any mean vector \mu and any symmetric positive-definite matrix V, as then |V| > 0.

The multivariate normal
Just as a univariate normal is defined by its mean and spread, a multivariate normal is defined by its mean vector \mu (also called the centroid), which determines location, and its variance-covariance matrix V, which determines the spread (geometry) about \mu.
(Figures: MVN contour plots centered at \mu, for x_1, x_2 with equal variances, positively correlated; equal variances, uncorrelated; equal variances, negatively correlated; and Var(x_1) < Var(x_2), uncorrelated.)
A positive tilt of the contours corresponds to positive correlations, a negative tilt to negative correlations, and no tilt to uncorrelated variables. The eigenstructure (the eigenvectors and their corresponding eigenvalues) determines the geometry of V.
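A minimal R sketch (with an illustrative 2 x 2 V) of the eigenstructure that the next slide describes, plus a direct evaluation of the MVN density from the matrix form of the pdf, using only base R:

  # Illustrative covariance matrix: equal variances, positive correlation
  V  <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
  mu <- c(0, 0)

  # Eigenstructure: eigen() returns the axes of the contour ellipses
  ev <- eigen(V)
  ev$values    # lambda_1 >= lambda_2: the variances along the two axes
  ev$vectors   # unit-length eigenvectors e_1, e_2: the axis directions

  # Evaluate the MVN density at a point x directly from the matrix formula
  x <- c(0.5, -0.2)
  n <- length(x)
  quad <- as.numeric( t(x - mu) %*% solve(V) %*% (x - mu) )
  (2 * pi)^(-n / 2) * det(V)^(-1 / 2) * exp(-quad / 2)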
Eigenstructure of V
(Figure: contour ellipses about \mu, with principal axes \lambda_1 e_1 and \lambda_2 e_2.) The direction of the largest axis of variation is given by the unit-length vector e_1, the 1st eigenvector of V. The next largest axis, orthogonal to (at 90 degrees from) e_1, is given by e_2, the 2nd eigenvector.

Properties of the MVN - I
1) If x is MVN, any subset of the variables in x is also MVN.
2) If x is MVN, any linear combination of the elements of x is also MVN. If x ~ MVN(\mu, V), then
  for y = x + a,                         y is MVN_n(\mu + a, V)
  for y = a^T x = \sum_{i=1}^n a_i x_i,  y is N(a^T \mu, a^T V a)
  for y = A x,                           y is MVN_m(A \mu, A V A^T)

Properties of the MVN - II
3) Conditional distributions are also MVN. Partition x into two components: x_1 (an m-dimensional column vector) and x_2 (an (n - m)-dimensional column vector), with
  x = ( x_1 ),   \mu = ( \mu_1 ),   V = ( V_{x1,x1}    V_{x1,x2} )
      ( x_2 )          ( \mu_2 )        ( V_{x1,x2}^T  V_{x2,x2} )
Then x_1 | x_2 is MVN with m-dimensional mean vector
  \mu_{x1|x2} = \mu_1 + V_{x1,x2} V_{x2,x2}^{-1} (x_2 - \mu_2)
and m x m covariance matrix
  V_{x1|x2} = V_{x1,x1} - V_{x1,x2} V_{x2,x2}^{-1} V_{x1,x2}^T

Properties of the MVN - III
4) If x is MVN, the regression of any subset of x on another subset is linear and homoscedastic:
  x_1 = \mu_{x1|x2} + e = \mu_1 + V_{x1,x2} V_{x2,x2}^{-1} (x_2 - \mu_2) + e
where e is MVN with mean vector 0 and variance-covariance matrix V_{x1|x2}.
The regression
  x_1 = \mu_1 + V_{x1,x2} V_{x2,x2}^{-1} (x_2 - \mu_2) + e
is linear because it is a linear function of x_2. It is homoscedastic because the variance-covariance matrix for e,
  V_{x1|x2} = V_{x1,x1} - V_{x1,x2} V_{x2,x2}^{-1} V_{x1,x2}^T
does not depend on the value of the x_2's: all of these matrices are constants, and hence the same for any value of x_2.

Example: Regression of Offspring value on Parental values
Assume the vector of offspring value and the values of both its parents is MVN. Then, from the correlations among (outbred) relatives,
  ( z_o )           ( ( \mu_o )             (   1     h^2/2   h^2/2 ) )
  ( z_s )  ~  MVN(  ( ( \mu_s ), \sigma_z^2 ( h^2/2     1       0   ) )
  ( z_d )           ( ( \mu_d )             ( h^2/2     0       1   ) )
Let x_1 = ( z_o ) and x_2 = ( z_s, z_d )^T, so that
  V_{x1,x1} = \sigma_z^2,   V_{x1,x2} = (h^2 \sigma_z^2 / 2) ( 1  1 ),   V_{x2,x2} = \sigma_z^2 ( 1  0 )
                                                                                               ( 0  1 )
Hence,
  z_o = \mu_o + V_{x1,x2} V_{x2,x2}^{-1} (x_2 - \mu_2) + e
      = \mu_o + (h^2 \sigma_z^2 / 2) ( 1  1 ) (\sigma_z^2)^{-1} ( 1  0 ) ( z_s - \mu_s ) + e
                                                                ( 0  1 ) ( z_d - \mu_d )
      = \mu_o + (h^2/2)(z_s - \mu_s) + (h^2/2)(z_d - \mu_d) + e
where e is normal with mean zero and variance
  \sigma_e^2 = V_{x1,x1} - V_{x1,x2} V_{x2,x2}^{-1} V_{x1,x2}^T = \sigma_z^2 (1 - h^4/2)

Hence, the regression of offspring trait value given the trait values of its parents is
  z_o = \mu_o + (h^2/2)(z_s - \mu_s) + (h^2/2)(z_d - \mu_d) + e
where the residual e is normal with mean zero and Var(e) = \sigma_z^2 (1 - h^4/2). Similar logic gives the regression of offspring breeding value on parental breeding values as
  A_o = \mu_o + (A_s - \mu_s)/2 + (A_d - \mu_d)/2 + e = A_s/2 + A_d/2 + e
where the residual e is normal with mean zero and Var(e) = \sigma_A^2 / 2.
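A sketch of the parent-offspring regression computed numerically in R from the partitioned covariance matrix; h2 and s2z are illustrative values, and the output should match the h^2/2 coefficients and the residual variance \sigma_z^2 (1 - h^4/2) derived above:

  h2  <- 0.4    # illustrative heritability
  s2z <- 100    # illustrative phenotypic variance sigma_z^2

  # Covariance matrix of (z_o, z_s, z_d), from the correlations among outbred relatives
  V   <- s2z * matrix(c(1,    h2/2, h2/2,
                        h2/2, 1,    0,
                        h2/2, 0,    1), 3, 3)
  V11 <- V[1, 1, drop = FALSE]       # Var(z_o)
  V12 <- V[1, 2:3, drop = FALSE]     # Cov of z_o with (z_s, z_d)
  V22 <- V[2:3, 2:3]                 # Var of (z_s, z_d)

  V12 %*% solve(V22)                   # regression coefficients: both h^2/2 = 0.2
  V11 - V12 %*% solve(V22) %*% t(V12)  # residual variance: s2z * (1 - h2^2/2) = 92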
Ordinary least squares
The OLS solution \hat{\beta} = (X^T X)^{-1} X^T y comes from minimizing the residual sum of squares, which requires taking derivatives of vector/matrix expressions. Hence, we need to discuss vector/matrix derivatives.

The gradient, the derivative of a vector-valued function

Some common derivatives
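As a sketch of the standard results these slides draw on (textbook identities, consistent with the OLS solution quoted earlier; f, a, and A here are generic):

  \nabla_x f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T
  \qquad \text{(the gradient of a scalar function } f(x) \text{ of a vector argument)}

  \frac{\partial\, a^T x}{\partial x} = a,
  \qquad
  \frac{\partial\, x^T A x}{\partial x} = (A + A^T)\, x = 2 A x \ \text{ for symmetric } A

For OLS, minimize the residual sum of squares,
  \mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta) = y^T y - 2\, \beta^T X^T y + \beta^T X^T X \beta
whose gradient gives the normal equations:
  \frac{\partial\, \mathrm{RSS}}{\partial \beta} = -2 X^T y + 2 X^T X \beta = 0
  \quad \Longrightarrow \quad \hat{\beta} = (X^T X)^{-1} X^T y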
Simple matrix commands in R
R is case-sensitive!
  t(A)      = transpose of A
  A %*% B   = matrix product AB (the %*% operator is matrix multiplication)
  solve(A)  = compute the inverse of A
  det(A)    = returns the determinant of A
  eigen(A)  = returns the eigenvalues and eigenvectors of A
  x <-      = assigns the variable x whatever is to the right of the arrow, e.g., x <- 35

Example 1: Compute Var(y_1), Var(y_2), and Cov(y_1, y_2), where y_i = c_i^T x, as the quadratic and bilinear forms c_1^T V c_1, c_2^T V c_2, and c_1^T V c_2 (see the R session sketched below).

Example 2: Solve the equation Ax = c. First, input the matrix A; in R, c() concatenates elements into a vector, and we build the matrix by columns. Next, input the vector c. Finally, compute x = A^{-1} c (see the R session sketched below).
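A sketch of the R session for both examples. V, c_1, c_2, and the answers (960, 1200, -910) come from the earlier variance example; the particular A and c used for Example 2 are illustrative stand-ins:

  # Example 1: Var(y1), Var(y2), and Cov(y1, y2) for y_i = c_i^T x
  V  <- matrix(c(10, -5, 10,
                 -5, 20,  0,
                 10,  0, 30), nrow = 3)   # built by columns; symmetric
  c1 <- c(1, -2, 5)
  c2 <- c(0, 6, -4)
  t(c1) %*% V %*% c1      # Var(y1)     = 960
  t(c2) %*% V %*% c2      # Var(y2)     = 1200
  t(c1) %*% V %*% c2      # Cov(y1, y2) = -910

  # Equivalently, stack c1 and c2 as the rows of A; then A V A^T gives all three
  A <- rbind(c1, c2)
  A %*% V %*% t(A)        # 2 x 2: variances on the diagonal, covariance off it

  # Example 2: solve Ax = c (illustrative entries)
  A2 <- matrix(c(3, 1, 2,
                 1, 5, 2,
                 2, 4, 9), nrow = 3)      # input by columns using c()
  cvec <- c(1, 2, 3)
  det(A2)                 # nonzero, so A2 is non-singular and the inverse exists
  solve(A2) %*% cvec      # x = A^{-1} c
  solve(A2, cvec)         # the same solution, without forming the inverse explicitly
  eigen(A2)               # eigenvalues and eigenvectors of A2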
Additional references
- Lynch & Walsh, Chapter 8 (intro to matrices)
- In notes package: Appendix 4 (Matrix geometry) and Appendix 5 (Matrix derivatives)
- Matrix Calculations in R