In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010.

Matrix Algebra

Hervé Abdi and Lynne J. Williams

1 Introduction

Sylvester developed the modern concept of matrices in the 19th century. For him a matrix was an array of numbers. Sylvester worked with systems of linear equations, and matrices provided a convenient way of working with their coefficients, so matrix algebra grew out of the need to generalize the usual operations on numbers to matrices. Nowadays, matrix algebra is used in all branches of mathematics and the sciences and constitutes the basis of most statistical procedures.

2 Matrices: Definition

A matrix is a set of numbers arranged in a table. For example, Toto, Marius, and Olivette are looking at their possessions, and they are counting how many balls, cars, coins, and novels they each possess. Toto has 2 balls, 5 cars, 10 coins, and 20 novels. Marius has 1, 2, 3,

Hervé Abdi, The University of Texas at Dallas. Lynne J. Williams, The University of Toronto Scarborough. Address correspondence to: Hervé Abdi, Program in Cognition and Neurosciences, MS: Gr.4.1, The University of Texas at Dallas, Richardson, TX 75083-0688, USA. E-mail: herve@utdallas.edu, http://www.utd.edu/~herve

and 4, and Olivette has 6, 1, 3, and 10. These data can be displayed in a table where each row represents a person and each column a possession:

           balls   cars   coins   novels
Toto          2      5      10      20
Marius        1      2       3       4
Olivette      6      1       3      10

We can also say that these data are described by the matrix denoted A equal to:

A = \begin{bmatrix} 2 & 5 & 10 & 20 \\ 1 & 2 & 3 & 4 \\ 6 & 1 & 3 & 10 \end{bmatrix}.   (1)

Matrices are denoted by boldface uppercase letters. To identify a specific element of a matrix, we use its row and column numbers. For example, the cell defined by Row 3 and Column 1 contains the value 6. We write that a_{3,1} = 6. With this notation, elements of a matrix are denoted with the same letter as the matrix but written in lowercase italic. The first subscript always gives the row number of the element (i.e., 3) and the second subscript always gives its column number (i.e., 1).

A generic element of a matrix is identified with indices such as i and j. So, a_{i,j} is the element at the i-th row and j-th column of A. The total number of rows and columns is denoted with the same letters as the indices but in uppercase. The matrix A has I rows (here I = 3) and J columns (here J = 4), and it is made of I × J elements a_{i,j} (here 3 × 4 = 12). We often use the term dimensions to refer to the number of rows and columns, so A has dimensions I by J.
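As an illustration (not part of the original chapter), the toy matrix above can be reproduced with NumPy. The variable names mirror the notation of the text; note that NumPy indexes rows and columns from 0, so the element written a_{3,1} in the text is A[2, 0] in code.

import numpy as np

# Possessions matrix A: rows = Toto, Marius, Olivette;
# columns = balls, cars, coins, novels.
A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])

I, J = A.shape        # dimensions of A
print(I, J)           # 3 4
print(A[2, 0])        # 6, i.e. a_{3,1} in the 1-based notation of the text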

As a shortcut, a matrix can be represented by its generic element written in brackets. So, A with I rows and J columns is denoted:

A = [a_{i,j}] = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,j} & \cdots & a_{1,J} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,j} & \cdots & a_{2,J} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i,1} & a_{i,2} & \cdots & a_{i,j} & \cdots & a_{i,J} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{I,1} & a_{I,2} & \cdots & a_{I,j} & \cdots & a_{I,J} \end{bmatrix}.   (2)

For either convenience or clarity, we can also indicate the number of rows and columns as subscripts below the matrix name:

\underset{I \times J}{A} = [a_{i,j}].   (3)

2.1 Vectors

A matrix with one column is called a column vector or simply a vector. Vectors are denoted with bold lowercase letters. For example, the first column of matrix A (of Equation 1) is a column vector which stores the number of balls of Toto, Marius, and Olivette. We can call it b (for balls), and so:

b = \begin{bmatrix} 2 \\ 1 \\ 6 \end{bmatrix}.   (4)

Vectors are the building blocks of matrices. For example, A (of Equation 1) is made of four column vectors which represent the number of balls, cars, coins, and novels, respectively.

2.2 Norm of a vector

We can associate to a vector a quantity, related to its variance and standard deviation, called the norm or length. The norm of a vector is the square root of the sum of squares of its elements; it is denoted by putting the name of the vector between a set of double bars (‖ ‖).

For example, for

x = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix},   (5)

we find

‖x‖ = \sqrt{2^2 + 1^2 + 2^2} = \sqrt{4 + 1 + 4} = \sqrt{9} = 3.   (6)

2.3 Normalization of a vector

A vector is normalized when its norm is equal to one. To normalize a vector, we divide each of its elements by its norm. For example, vector x from Equation 5 is transformed into the normalized x as

x \leftarrow \frac{x}{\|x\|} = \begin{bmatrix} 2/3 \\ 1/3 \\ 2/3 \end{bmatrix}.   (7)

3 Operations for matrices

3.1 Transposition

If we exchange the roles of the rows and the columns of a matrix we transpose it. This operation is called the transposition, and the new matrix is called a transposed matrix. The matrix A transposed is denoted A^T. For example:

if A = \begin{bmatrix} 2 & 5 & 10 & 20 \\ 1 & 2 & 3 & 4 \\ 6 & 1 & 3 & 10 \end{bmatrix} (dimensions 3 × 4), then A^T = \begin{bmatrix} 2 & 1 & 6 \\ 5 & 2 & 1 \\ 10 & 3 & 3 \\ 20 & 4 & 10 \end{bmatrix} (dimensions 4 × 3).   (8)
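As an added illustration, the norm, the normalization of a vector, and the transposition can all be checked with NumPy; np.linalg.norm computes the square root of the sum of squares used above.

import numpy as np

x = np.array([2.0, 1.0, 2.0])
norm_x = np.linalg.norm(x)        # sqrt(4 + 1 + 4) = 3.0
x_normalized = x / norm_x         # [0.667, 0.333, 0.667], a vector with norm 1

A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])
print(A.T)                        # the 4 x 3 transposed matrix of Equation 8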

3.2 Addition (sum) of matrices

When two matrices have the same dimensions, we compute their sum by adding the corresponding elements. For example, with

A = \begin{bmatrix} 2 & 5 & 10 & 20 \\ 1 & 2 & 3 & 4 \\ 6 & 1 & 3 & 10 \end{bmatrix} and B = \begin{bmatrix} 3 & 4 & 5 & 6 \\ 2 & 4 & 6 & 8 \\ 1 & 2 & 3 & 5 \end{bmatrix},   (9)

we find

A + B = \begin{bmatrix} 2+3 & 5+4 & 10+5 & 20+6 \\ 1+2 & 2+4 & 3+6 & 4+8 \\ 6+1 & 1+2 & 3+3 & 10+5 \end{bmatrix} = \begin{bmatrix} 5 & 9 & 15 & 26 \\ 3 & 6 & 9 & 12 \\ 7 & 3 & 6 & 15 \end{bmatrix}.   (10)

In general,

A + B = [a_{i,j} + b_{i,j}] = \begin{bmatrix} a_{1,1}+b_{1,1} & \cdots & a_{1,J}+b_{1,J} \\ \vdots & & \vdots \\ a_{i,1}+b_{i,1} & \cdots & a_{i,J}+b_{i,J} \\ \vdots & & \vdots \\ a_{I,1}+b_{I,1} & \cdots & a_{I,J}+b_{I,J} \end{bmatrix}.   (11)

Matrix addition behaves very much like usual addition. Specifically, matrix addition is commutative (i.e., A + B = B + A) and associative [i.e., A + (B + C) = (A + B) + C].

3.3 Multiplication of a matrix by a scalar

In order to differentiate matrices from the usual numbers, we call the latter scalar numbers or simply scalars. To multiply a matrix by a scalar, multiply each element of the matrix by this scalar. For example:

10 × B = 10 × \begin{bmatrix} 3 & 4 & 5 & 6 \\ 2 & 4 & 6 & 8 \\ 1 & 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} 30 & 40 & 50 & 60 \\ 20 & 40 & 60 & 80 \\ 10 & 20 & 30 & 50 \end{bmatrix}.   (12)

3.4 Multiplication: Product or products?

There are several ways of generalizing the concept of product to matrices. We will look at the most frequently used of these matrix products. Each of these products behaves like the product between scalars when the matrices have dimensions 1 × 1.

3.5 Hadamard product

When generalizing the product to matrices, the first approach is to multiply the corresponding elements of the two matrices that we want to multiply. This is called the Hadamard product, denoted by ∘. The Hadamard product exists only for matrices with the same dimensions. Formally, it is defined as:

A ∘ B = [a_{i,j} × b_{i,j}] = \begin{bmatrix} a_{1,1}b_{1,1} & \cdots & a_{1,J}b_{1,J} \\ \vdots & & \vdots \\ a_{i,1}b_{i,1} & \cdots & a_{i,J}b_{i,J} \\ \vdots & & \vdots \\ a_{I,1}b_{I,1} & \cdots & a_{I,J}b_{I,J} \end{bmatrix}.   (13)

For example, with

A = \begin{bmatrix} 2 & 5 & 10 & 20 \\ 1 & 2 & 3 & 4 \\ 6 & 1 & 3 & 10 \end{bmatrix} and B = \begin{bmatrix} 3 & 4 & 5 & 6 \\ 2 & 4 & 6 & 8 \\ 1 & 2 & 3 & 5 \end{bmatrix},   (14)

we get:

A ∘ B = \begin{bmatrix} 2×3 & 5×4 & 10×5 & 20×6 \\ 1×2 & 2×4 & 3×6 & 4×8 \\ 6×1 & 1×2 & 3×3 & 10×5 \end{bmatrix} = \begin{bmatrix} 6 & 20 & 50 & 120 \\ 2 & 8 & 18 & 32 \\ 6 & 2 & 9 & 50 \end{bmatrix}.   (15)
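The three element-wise operations just described (addition, multiplication by a scalar, and the Hadamard product) correspond directly to the operators + and * in NumPy. The short sketch below is an added illustration and simply re-computes Equations 10, 12, and 15.

import numpy as np

A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])
B = np.array([[3, 4, 5, 6],
              [2, 4, 6, 8],
              [1, 2, 3, 5]])

print(A + B)     # matrix addition, Equation 10
print(10 * B)    # multiplication by a scalar, Equation 12
print(A * B)     # Hadamard (element-wise) product, Equation 15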

3.6 Standard (a.k.a. Cayley) product

The Hadamard product is straightforward but, unfortunately, it is not the matrix product most often used. That product is called the standard or Cayley product, or simply the product (i.e., when the name of the product is not specified, this is the standard product). Its definition comes from the original use of matrices to solve equations. Its definition looks surprising at first because it is defined only when the number of columns of the first matrix is equal to the number of rows of the second matrix. When two matrices can be multiplied together they are called conformable. The product has the number of rows of the first matrix and the number of columns of the second matrix. So, A with I rows and J columns can be multiplied by B with J rows and K columns to give C with I rows and K columns. A convenient way of checking that two matrices are conformable is to write the dimensions of the matrices as subscripts. For example:

\underset{I \times J}{A} \times \underset{J \times K}{B} = \underset{I \times K}{C},   (16)

or even:

{}_{I}A_{J}\,{}_{J}B_{K} = {}_{I}C_{K}.   (17)

An element c_{i,k} of the matrix C is computed as:

c_{i,k} = \sum_{j=1}^{J} a_{i,j} b_{j,k}.   (18)

So, c_{i,k} is the sum of J terms, each term being the product of the corresponding element of the i-th row of A with the k-th column of B.

For example, let:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.   (19)

The product of these matrices is denoted C = A × B = AB (the × sign can be omitted when the context is clear). To compute c_{2,1} we

add 3 terms: (1) the product of the first element of the second row of A (i.e., 4) with the first element of the first column of B (i.e., 1); (2) the product of the second element of the second row of A (i.e., 5) with the second element of the first column of B (i.e., 3); and (3) the product of the third element of the second row of A (i.e., 6) with the third element of the first column of B (i.e., 5). Formally, the term c_{2,1} is obtained as

c_{2,1} = \sum_{j=1}^{J=3} a_{2,j} b_{j,1} = (a_{2,1} × b_{1,1}) + (a_{2,2} × b_{2,1}) + (a_{2,3} × b_{3,1}) = (4 × 1) + (5 × 3) + (6 × 5) = 49.   (20)

Matrix C is obtained as:

AB = C = [c_{i,k}] = \left[\sum_{j=1}^{J=3} a_{i,j} b_{j,k}\right] = \begin{bmatrix} 1×1 + 2×3 + 3×5 & 1×2 + 2×4 + 3×6 \\ 4×1 + 5×3 + 6×5 & 4×2 + 5×4 + 6×6 \end{bmatrix} = \begin{bmatrix} 22 & 28 \\ 49 & 64 \end{bmatrix}.   (21)

3.6.1 Properties of the product

Like the product between scalars, the product between matrices is associative, and distributive relative to addition. Specifically, for any set of three conformable matrices A, B and C:

(AB)C = A(BC) = ABC   (associativity)   (22)

A(B + C) = AB + AC   (distributivity).   (23)
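In NumPy the standard (Cayley) product is written with the @ operator, not with *, which is reserved for the Hadamard product. The added sketch below reproduces Equation 21 and checks associativity and distributivity; the matrix D is introduced here only to have a third conformable matrix for the check.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])           # 3 x 2

C = A @ B                        # standard product, Equation 21
print(C)                         # [[22 28]
                                 #  [49 64]]
print(C[1, 0])                   # 49, the element c_{2,1} of Equation 20

D = np.array([[1, 0],
              [0, 2],
              [1, 1]])           # hypothetical matrix, same shape as B
print(np.allclose((A @ B) @ C, A @ (B @ C)))     # True: associativity
print(np.allclose(A @ (B + D), A @ B + A @ D))   # True: distributivity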

The matrix products AB and BA do not always exist, but when they do, these products are not, in general, commutative:

AB ≠ BA.   (24)

For example, with

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} and B = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}   (25)

we get:

AB = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.   (26)

But

BA = \begin{bmatrix} 2 & 2 \\ -1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ -2 & -4 \end{bmatrix}.   (27)

Incidentally, we can combine transposition and product and get the following equation:

(AB)^T = B^T A^T.   (28)
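Both properties are easy to verify numerically. The following added sketch uses the matrices of Equation 25 to show that AB and BA differ and that (BA)^T equals A^T B^T.

import numpy as np

A = np.array([[1, 2],
              [1, 2]])
B = np.array([[ 2,  2],
              [-1, -1]])

print(A @ B)                                  # the zero matrix of Equation 26
print(B @ A)                                  # the non-zero matrix of Equation 27
print(np.allclose((B @ A).T, A.T @ B.T))      # True: (BA)^T = A^T B^T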

3.7 Exotic product: Kronecker

Another product is the Kronecker product, also called the direct, tensor, or Zehfuss product. It is denoted ⊗ and is defined for all matrices. Specifically, with two matrices A = [a_{i,j}] (with dimensions I by J) and B (with dimensions K by L), the Kronecker product gives a matrix C (with dimensions (I × K) by (J × L)) defined as:

A ⊗ B = \begin{bmatrix} a_{1,1}B & \cdots & a_{1,J}B \\ \vdots & & \vdots \\ a_{i,1}B & \cdots & a_{i,J}B \\ \vdots & & \vdots \\ a_{I,1}B & \cdots & a_{I,J}B \end{bmatrix}.   (29)

For example, with

A = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} and B = \begin{bmatrix} 6 & 7 \\ 8 & 9 \end{bmatrix},   (30)

we get:

A ⊗ B = \begin{bmatrix} 1×6 & 1×7 & 2×6 & 2×7 & 3×6 & 3×7 \\ 1×8 & 1×9 & 2×8 & 2×9 & 3×8 & 3×9 \end{bmatrix} = \begin{bmatrix} 6 & 7 & 12 & 14 & 18 & 21 \\ 8 & 9 & 16 & 18 & 24 & 27 \end{bmatrix}.   (31)

The Kronecker product is used to write design matrices. It is an essential tool for the derivation of expected values and sampling distributions.
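NumPy implements the Kronecker product as np.kron; the short sketch below is an added illustration that reproduces Equation 31.

import numpy as np

A = np.array([[1, 2, 3]])        # 1 x 3
B = np.array([[6, 7],
              [8, 9]])           # 2 x 2

print(np.kron(A, B))             # the 2 x 6 matrix of Equation 31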

4 Special matrices

Certain special matrices have specific names.

4.1 Square and rectangular matrices

A matrix with the same number of rows and columns is a square matrix. By contrast, a matrix with different numbers of rows and columns is a rectangular matrix. So:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \\ 7 & 8 & 10 \end{bmatrix}   (32)

is a square matrix, but

B = \begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 7 & 8 \end{bmatrix}   (33)

is a rectangular matrix.

4.2 Symmetric matrix

A square matrix A with a_{i,j} = a_{j,i} is symmetric. So:

\begin{bmatrix} 10 & 2 & 3 \\ 2 & 20 & 5 \\ 3 & 5 & 30 \end{bmatrix}   (34)

is symmetric, but

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 20 & 5 \\ 7 & 8 & 30 \end{bmatrix}   (35)

is not. Note that for a symmetric matrix:

A = A^T.   (36)

A common mistake is to assume that the standard product of two symmetric matrices is commutative. This is not true, as shown by the following example, with:

A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} and B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.   (37)

We get

AB = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}, but BA = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}.   (38)

Note, however, that combining Equations 28 and 36 gives, for symmetric matrices A and B, the following equation:

AB = (BA)^T.   (39)

4.3 Diagonal matrix

A square matrix is diagonal when all its elements, except the ones on the diagonal, are zero. Formally, a matrix is diagonal if a_{i,j} = 0 when i ≠ j. So:

\begin{bmatrix} 10 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 30 \end{bmatrix} is diagonal.   (40)

Because only the diagonal elements matter for a diagonal matrix, we just need to specify them. This is done with the following notation:

diag{[a_{1,1}, ..., a_{i,i}, ..., a_{I,I}]} = diag{[a_{i,i}]}.   (41)

For example, the previous matrix can be rewritten as:

\begin{bmatrix} 10 & 0 & 0 \\ 0 & 20 & 0 \\ 0 & 0 & 30 \end{bmatrix} = diag{[10, 20, 30]}.   (42)

The operator diag can also be used to isolate the diagonal of any square matrix. For example, with:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix},   (43)

we get:

diag{A} = diag\left\{\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\right\} = \begin{bmatrix} 1 \\ 5 \\ 9 \end{bmatrix}.   (44)

Note, incidentally, that:

diag{diag{A}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 9 \end{bmatrix}.   (45)

4.4 Multiplication by a diagonal matrix

Diagonal matrices are often used to multiply by a scalar all the elements of a given row or column. Specifically, when we pre-multiply a matrix by a diagonal matrix, the elements of each row of the second matrix are multiplied by the corresponding diagonal element. Likewise, when we post-multiply a matrix by a diagonal matrix, the elements of each column of the first matrix are multiplied by the corresponding diagonal element. For example, with:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix},  B = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix},  C = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix},   (46)

we get

BA = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 4 & 6 \\ 20 & 25 & 30 \end{bmatrix}   (47)

and

AC = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 8 & 18 \\ 8 & 20 & 36 \end{bmatrix}   (48)

and also

BAC = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix} = \begin{bmatrix} 4 & 16 & 36 \\ 40 & 100 & 180 \end{bmatrix}.   (49)
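In NumPy, np.diag plays both roles of the diag operator: applied to a vector it builds a diagonal matrix, applied to a square matrix it extracts the diagonal. The sketch below is an added illustration that re-computes the row and column scalings of Equations 47 to 49.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.diag([2, 5])              # 2 x 2 diagonal matrix
C = np.diag([2, 4, 6])           # 3 x 3 diagonal matrix

print(B @ A)                     # rows of A multiplied by 2 and 5 (Equation 47)
print(A @ C)                     # columns of A multiplied by 2, 4, 6 (Equation 48)
print(B @ A @ C)                 # both at once (Equation 49)
print(np.diag(np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])))    # extracts the diagonal [1 5 9]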

4.5 Identity matrix

A diagonal matrix whose diagonal elements are all equal to 1 is called an identity matrix and is denoted I. If we need to specify its dimensions, we use subscripts such as

I = \underset{3 \times 3}{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} (this is a 3 × 3 identity matrix).   (50)

The identity matrix is the neutral element for the standard product. So:

IA = AI = A   (51)

for any matrix A conformable with I. For example:

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \\ 7 & 8 & 10 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \\ 7 & 8 & 10 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 5 \\ 7 & 8 & 10 \end{bmatrix}.   (52)

4.6 Matrix full of ones

A matrix whose elements are all equal to 1 is denoted 1 or, when we need to specify its dimensions, by \underset{I \times J}{1}. These matrices are neutral elements for the Hadamard product. So:

A ∘ 1 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} ∘ \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}   (53)

= \begin{bmatrix} 1×1 & 2×1 & 3×1 \\ 4×1 & 5×1 & 6×1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}.   (54)

The 1 matrices can also be used to compute sums of rows or columns:

\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = (1 × 1) + (2 × 1) + (3 × 1) = 1 + 2 + 3 = 6,   (55)

or also

\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 7 & 9 \end{bmatrix}.   (56)
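A short added illustration: np.eye and np.ones build the identity matrix and the matrix full of ones, and a product with a vector of ones sums rows or columns as in Equations 55 and 56.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

I3 = np.eye(3)                       # 3 x 3 identity matrix
print(np.allclose(A @ I3, A))        # True: AI = A

print(A @ np.ones((3, 1)))           # row sums: [[6], [15]]
print(np.ones((1, 2)) @ A)           # column sums: [[5, 7, 9]], Equation 56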

4.7 Matrix full of zeros

A matrix whose elements are all equal to 0 is called the null or zero matrix. It is denoted by 0 or, when we need to specify its dimensions, by \underset{I \times J}{0}. Null matrices are neutral elements for addition:

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + 0 = \begin{bmatrix} 1+0 & 2+0 \\ 3+0 & 4+0 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.   (57)

They are also null elements for the Hadamard product:

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} ∘ 0 = \begin{bmatrix} 1×0 & 2×0 \\ 3×0 & 4×0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0   (58)

and for the standard product:

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} × 0 = \begin{bmatrix} 1×0 + 2×0 & 1×0 + 2×0 \\ 3×0 + 4×0 & 3×0 + 4×0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.   (59)

4.8 Triangular matrix

A matrix is lower triangular when a_{i,j} = 0 for i < j. A matrix is upper triangular when a_{i,j} = 0 for i > j. For example:

A = \begin{bmatrix} 10 & 0 & 0 \\ 2 & 20 & 0 \\ 3 & 5 & 30 \end{bmatrix} is lower triangular,   (60)

and

B = \begin{bmatrix} 10 & 2 & 3 \\ 0 & 20 & 5 \\ 0 & 0 & 30 \end{bmatrix} is upper triangular.   (61)

4.9 Cross-product matrix

A cross-product matrix is obtained by multiplying a matrix by its transpose. Therefore a cross-product matrix is square and symmetric. For example, the matrix:

A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 4 \end{bmatrix}   (62)

pre-multiplied by its transpose gives the cross-product matrix:

A^T = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 4 \end{bmatrix}   (63)

A^T A = \begin{bmatrix} 1×1 + 2×2 + 3×3 & 1×2 + 2×4 + 3×4 \\ 2×1 + 4×2 + 4×3 & 2×2 + 4×4 + 4×4 \end{bmatrix} = \begin{bmatrix} 14 & 22 \\ 22 & 36 \end{bmatrix}.   (64)

4.9.1 A particular case of cross-product matrix: variance/covariance

A particular case of cross-product matrices are correlation or covariance matrices. A variance/covariance matrix is obtained from a data matrix in three steps: (1) subtract the mean of each column from each element of this column (this is called centering); (2) compute the cross-product matrix of the centered matrix; and (3) divide each element of the cross-product matrix by the number of rows of the data matrix. For example, if we take the I = 3 by J = 2 matrix A:

A = \begin{bmatrix} 2 & 1 \\ 5 & 10 \\ 8 & 10 \end{bmatrix},   (65)

we obtain the means of each column as:

m = \frac{1}{I}\,1^T A = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 5 & 10 \\ 8 & 10 \end{bmatrix} = \begin{bmatrix} 5 & 7 \end{bmatrix}.   (66)

To center the matrix we subtract the mean of each column from all its elements. This centered matrix gives the deviations from each

element to the mean of its column. Centering is performed as:

D = A − 1m = \begin{bmatrix} 2 & 1 \\ 5 & 10 \\ 8 & 10 \end{bmatrix} − \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} 5 & 7 \end{bmatrix}   (67)

= \begin{bmatrix} 2−5 & 1−7 \\ 5−5 & 10−7 \\ 8−5 & 10−7 \end{bmatrix} = \begin{bmatrix} −3 & −6 \\ 0 & 3 \\ 3 & 3 \end{bmatrix}.   (68)

We note S the variance/covariance matrix derived from A; it is computed as:

S = \frac{1}{I} D^T D = \frac{1}{3}\begin{bmatrix} −3 & 0 & 3 \\ −6 & 3 & 3 \end{bmatrix}\begin{bmatrix} −3 & −6 \\ 0 & 3 \\ 3 & 3 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 18 & 27 \\ 27 & 54 \end{bmatrix} = \begin{bmatrix} 6 & 9 \\ 9 & 18 \end{bmatrix}.   (69)

(Variances are on the diagonal, covariances are off the diagonal.)
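The three steps used to obtain S (centering, cross-product, division by the number of rows) translate directly into NumPy, as sketched below for illustration; the snippet reproduces Equation 69. Note that np.cov divides by I − 1 by default, so bias=True is needed to match the divisor I used in the text.

import numpy as np

A = np.array([[2.0,  1.0],
              [5.0, 10.0],
              [8.0, 10.0]])
I = A.shape[0]

D = A - A.mean(axis=0)       # centered data matrix (Equation 68)
S = (D.T @ D) / I            # variance/covariance matrix (Equation 69)
print(S)                     # [[ 6.  9.]
                             #  [ 9. 18.]]

# Same result with the built-in covariance function (divisor I, not I - 1):
print(np.cov(A, rowvar=False, bias=True))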

5 The inverse of a square matrix

An operation similar to division exists, but only for (some) square matrices. This operation uses the notion of an inverse operation and defines the inverse of a matrix. The inverse is defined by analogy with the scalar number case, for which division actually corresponds to multiplication by the inverse, namely:

\frac{a}{b} = a × b^{−1}  with  b × b^{−1} = 1.   (70)

The inverse of a square matrix A is denoted A^{−1}. It has the following property:

A A^{−1} = A^{−1} A = I.   (71)

The definition of the inverse of a matrix is simple, but its computation is complicated and is best left to computers. For example, for:

A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},   (72)

the inverse is:

A^{−1} = \begin{bmatrix} 1 & −2 & −1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.   (73)

Not all square matrices have an inverse. The inverse of a matrix does not exist if the rows (and the columns) of this matrix are linearly dependent. For example,

A = \begin{bmatrix} 3 & 4 & 2 \\ 1 & 0 & 2 \\ 2 & 3 & 1 \end{bmatrix}   (74)

does not have an inverse, since its second column is a linear combination of the two other columns:

\begin{bmatrix} 4 \\ 0 \\ 3 \end{bmatrix} = 2 × \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} − \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix} − \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}.   (75)

A matrix without an inverse is singular. When A^{−1} exists, it is unique. Inverse matrices are used for solving linear equations and least squares problems in multiple regression analysis or analysis of variance.

5.1 Inverse of a diagonal matrix

The inverse of a diagonal matrix (with non-zero diagonal elements) is easy to compute: the inverse of

A = diag{a_{i,i}}   (76)

is the diagonal matrix

A^{−1} = diag{a_{i,i}}^{−1} = diag{1/a_{i,i}}.   (77)

For example,

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix} and \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}   (78)

are the inverse of each other.
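As an added illustration, np.linalg.inv computes the inverse of a full rank square matrix; for a singular matrix such as the one of Equation 74, the rank deficiency can be checked with np.linalg.matrix_rank.

import numpy as np

A = np.array([[1, 2, 1],
              [0, 1, 0],
              [0, 0, 1]])
A_inv = np.linalg.inv(A)
print(A_inv)                                 # the matrix of Equation 73
print(np.allclose(A @ A_inv, np.eye(3)))     # True: A A^{-1} = I

B = np.array([[3, 4, 2],
              [1, 0, 2],
              [2, 3, 1]])                    # singular matrix of Equation 74
print(np.linalg.det(B))                      # approximately 0
print(np.linalg.matrix_rank(B))              # 2: rank-deficient, so B has no inverse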

6 The big tool: eigendecomposition

So far, matrix operations have been very similar to operations with numbers. The next notion is specific to matrices: the idea of decomposing a matrix into simpler matrices. A lot of the power of matrices follows from this. A first decomposition is called the eigendecomposition; it applies only to square matrices. The generalization of the eigendecomposition to rectangular matrices is called the singular value decomposition.

Eigenvectors and eigenvalues are numbers and vectors associated with square matrices; together they constitute the eigendecomposition. Even though the eigendecomposition does not exist for all square matrices, it has a particularly simple expression for a class of matrices often used in multivariate analysis, such as correlation, covariance, or cross-product matrices. The eigendecomposition of these matrices is important in statistics because it is used to find the maximum (or minimum) of functions involving these matrices. For example, principal component analysis is obtained from the eigendecomposition of a covariance or correlation matrix and gives the least squares estimate of the original data matrix.

6.1 Notations and definition

An eigenvector of matrix A is a vector u that satisfies the following equation:

Au = λu,   (79)

where λ is a scalar called the eigenvalue associated with the eigenvector. When rewritten, Equation 79 becomes:

(A − λI)u = 0.   (80)

Therefore u is an eigenvector of A if the multiplication of u by A changes the length of u but not its orientation. For example,

A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}   (81)

has for eigenvectors:

u_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix} with eigenvalue λ_1 = 4   (82)

and

u_2 = \begin{bmatrix} −1 \\ 1 \end{bmatrix} with eigenvalue λ_2 = −1.   (83)

When u_1 and u_2 are multiplied by A, only their length changes. That is,

Au_1 = λ_1 u_1 = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 12 \\ 8 \end{bmatrix} = 4\begin{bmatrix} 3 \\ 2 \end{bmatrix}   (84)

and

Au_2 = λ_2 u_2 = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} −1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ −1 \end{bmatrix} = −1\begin{bmatrix} −1 \\ 1 \end{bmatrix}.   (85)

This is illustrated in Figure 1. For convenience, eigenvectors are generally normalized such that:

u^T u = 1.   (86)

[Figure 1: Two eigenvectors of a matrix.]

For the previous example, normalizing the eigenvectors gives:

u_1 = \begin{bmatrix} 0.8321 \\ 0.5547 \end{bmatrix} and u_2 = \begin{bmatrix} −0.7071 \\ 0.7071 \end{bmatrix}.   (87)

We can check that:

\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 0.8321 \\ 0.5547 \end{bmatrix} = \begin{bmatrix} 3.3284 \\ 2.2188 \end{bmatrix} = 4\begin{bmatrix} 0.8321 \\ 0.5547 \end{bmatrix}   (88)

and

\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} −0.7071 \\ 0.7071 \end{bmatrix} = \begin{bmatrix} 0.7071 \\ −0.7071 \end{bmatrix} = −1\begin{bmatrix} −0.7071 \\ 0.7071 \end{bmatrix}.   (89)

6.2 Eigenvector and eigenvalue matrices

Traditionally, we store the eigenvectors of A as the columns of a matrix denoted U. Eigenvalues are stored in a diagonal matrix (denoted Λ). Therefore, Equation 79 becomes:

AU = UΛ.   (90)

For example, with A (from Equation 81), we have

\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 3 & −1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 3 & −1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & −1 \end{bmatrix}.   (91)
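np.linalg.eig returns the eigenvalues and the matrix U of normalized eigenvectors, so the relations Au = λu and AU = UΛ of this section can be checked directly; the snippet below is an added illustration using the matrix of Equation 81.

import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
eigenvalues, U = np.linalg.eig(A)
print(eigenvalues)                       # 4 and -1 (order not guaranteed)
print(U)                                 # columns are the normalized eigenvectors

Lambda = np.diag(eigenvalues)
print(np.allclose(A @ U, U @ Lambda))    # True: AU = U Lambda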

6.3 Reconstitution of a matrix

The eigendecomposition can also be used to build back a matrix from its eigenvectors and eigenvalues. This is shown by rewriting Equation 90 as

A = UΛU^{−1}.   (92)

For example, because

U^{−1} = \begin{bmatrix} 0.2 & 0.2 \\ −0.4 & 0.6 \end{bmatrix},

we obtain:

A = UΛU^{−1} = \begin{bmatrix} 3 & −1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & −1 \end{bmatrix}\begin{bmatrix} 0.2 & 0.2 \\ −0.4 & 0.6 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}.   (93)

6.4 Digression: an infinity of eigenvectors for one eigenvalue

It is only through a slight abuse of language that we talk about the eigenvector associated with one eigenvalue. Any scalar multiple of an eigenvector is an eigenvector, so for each eigenvalue there is an infinite number of eigenvectors, all proportional to each other. For example,

\begin{bmatrix} −1 \\ 1 \end{bmatrix}   (94)

is an eigenvector of the matrix A:

A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}.   (95)

Therefore:

2 × \begin{bmatrix} −1 \\ 1 \end{bmatrix} = \begin{bmatrix} −2 \\ 2 \end{bmatrix}   (96)

is also an eigenvector of A:

\begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} −2 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ −2 \end{bmatrix} = −1 × \begin{bmatrix} −2 \\ 2 \end{bmatrix}.   (97)

6.5 Positive (semi-)definite matrices

Some matrices, such as \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, do not have an eigendecomposition (they cannot be diagonalized). Fortunately, the matrices used most often in statistics belong to a category called positive semi-definite, for which the eigendecomposition always exists and has a particularly convenient form. A matrix is positive semi-definite when it can be obtained as the product of a matrix by its transpose. This implies that a positive semi-definite matrix is always symmetric. So, formally, the matrix A is positive semi-definite if it can be obtained as:

A = XX^T   (98)

for a certain matrix X. Positive semi-definite matrices include correlation, covariance, and cross-product matrices.

The eigenvalues of a positive semi-definite matrix are always positive or null. Its eigenvectors are composed of real values and are pairwise orthogonal when their eigenvalues are different. This implies the following equality:

U^{−1} = U^T.   (99)

We can, therefore, express the positive semi-definite matrix A as:

A = UΛU^T,   (100)

where U is the matrix of normalized eigenvectors (so that U^T U = I).

For example, the matrix

A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}   (101)

can be decomposed as:

A = UΛU^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix},   (102)

with

U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix} and Λ = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}.   (103)

6.5.1 Diagonalization

When a matrix is positive semi-definite we can rewrite Equation 100 as

A = UΛU^T  ⟺  Λ = U^T AU.   (104)

This shows that we can transform A into a diagonal matrix. Therefore the eigendecomposition of a positive semi-definite matrix is often called its diagonalization.

6.5.2 Another definition for positive semi-definite matrices

A matrix A is positive semi-definite if for any non-zero vector x we have:

x^T Ax ≥ 0   ∀x.   (105)

When all the eigenvalues of a matrix are strictly positive, the matrix is positive definite. In that case, Equation 105 becomes:

x^T Ax > 0   ∀x.   (106)
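Because a positive semi-definite matrix has orthonormal eigenvectors, np.linalg.eigh (the eigendecomposition routine for symmetric matrices) returns a U with U^T U = I, and the diagonalization of Equation 104 can be verified numerically. The sketch below is an added illustration using the matrix of Equation 101.

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
eigenvalues, U = np.linalg.eigh(A)                        # eigh: symmetric matrices
print(eigenvalues)                                        # [2. 4.], sorted ascending
print(np.allclose(U.T @ U, np.eye(2)))                    # True: orthonormal eigenvectors
print(np.allclose(U.T @ A @ U, np.diag(eigenvalues)))     # True: U^T A U = Lambda
print(np.allclose(U @ np.diag(eigenvalues) @ U.T, A))     # True: A = U Lambda U^T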

6.6 Trace, determinant, and rank

The eigenvalues of a matrix are closely related to three important numbers associated with a square matrix: the trace, the determinant, and the rank.

6.6.1 Trace

The trace of A, denoted trace{A}, is the sum of its diagonal elements. For example, with:

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}   (107)

we obtain:

trace{A} = 1 + 5 + 9 = 15.   (108)

The trace of a matrix is also equal to the sum of its eigenvalues:

trace{A} = \sum_l λ_l = trace{Λ},   (109)

with Λ being the matrix of the eigenvalues of A. For the previous example, we have:

Λ = diag{16.1168, −1.1168, 0}.   (110)

We can verify that:

trace{A} = \sum_l λ_l = 16.1168 + (−1.1168) + 0 = 15.   (111)

6.6.2 Determinant

The determinant is important for finding the solution of systems of linear equations (i.e., the determinant determines the existence of a solution). The determinant of a matrix is equal to the product of its

eigenvalues. If det{A} is the determinant of A:

det{A} = \prod_l λ_l,  with λ_l being the l-th eigenvalue of A.   (112)

For example, the determinant of A from Equation 107 is equal to:

det{A} = 16.1168 × (−1.1168) × 0 = 0.   (113)

6.6.3 Rank

Finally, the rank of a matrix is the number of non-zero eigenvalues of the matrix. For our example:

rank{A} = 2.   (114)

The rank of a matrix gives the dimensionality of the Euclidean space which can be used to represent this matrix. Matrices whose rank is equal to their dimensions are full rank and they are invertible. When the rank of a matrix is smaller than its dimensions, the matrix is not invertible and is called rank-deficient, singular, or multicollinear. For example, matrix A from Equation 107 is a 3 × 3 square matrix, its rank is equal to 2, and therefore it is rank-deficient and does not have an inverse.
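These three numbers are available directly in NumPy (np.trace, np.linalg.det, np.linalg.matrix_rank); the added sketch below checks them against the eigenvalues for the matrix of Equation 107.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

eigenvalues = np.linalg.eigvals(A)
print(np.trace(A), eigenvalues.sum())     # both approximately 15: trace = sum of eigenvalues
print(np.linalg.det(A))                   # approximately 0: product of the eigenvalues
print(np.linalg.matrix_rank(A))           # 2: rank-deficient, hence not invertible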

ABDI & WILLIAMS 7 In addition to Equation 5 we impose the constraints that F T F = Q T X T XQ (6) is a diagonal matrix (i.e., F is an orthogonal matrix) and that Q T Q = I (7) (i.e., Q is an orthonormal matrix). The solution is obtained by using Lagrangian multipliers where the constraint from Equation 7 is expressed as the multiplication with a diagonal matrix of Lagrangian multipliers denoted Λ in order to give the following expression Λ (Q T Q I) (8) This amounts to defining the following equation L = trace {F T F Λ (Q T Q I)} = trace {Q T X T XQ Λ (Q T Q I)}. (9) The values of Q which give the maximum values of L, are found by first computing the derivative of L relative to Q: and setting this derivative to zero: L Q = XT XQ ΛQ, (0) X T XQ ΛQ = 0 X T XQ = ΛQ. () Because Λ is diagonal, this is an eigendecomposition problem, and Λ is the matrix of eigenvalues of the positive semi-definite matrix X T X ordered from the largest to the smallest and Q is the matrix of eigenvectors of X T X. Finally, the factor matrix is F = XQ. () The variance of the factors scores is equal to the eigenvalues: F T F = Q T X T XQ = Λ. (3)

7 A tool for rectangular matrices: the singular value decomposition

The singular value decomposition (SVD) generalizes the eigendecomposition to rectangular matrices. Whereas the eigendecomposition decomposes a matrix into two simple matrices, the SVD decomposes a rectangular matrix into three simple matrices: two orthogonal matrices and one diagonal matrix. The SVD uses the eigendecomposition of a positive semi-definite matrix to derive a similar decomposition for rectangular matrices.

7.1 Definitions and notations

The SVD decomposes matrix A as:

A = PΔQ^T,   (124)

where P is the matrix of (normalized) eigenvectors of the matrix AA^T (i.e., P^T P = I); the columns of P are called the left singular vectors of A. Q is the matrix of (normalized) eigenvectors of the matrix A^T A (i.e., Q^T Q = I); the columns of Q are called the right singular vectors of A. Δ is the diagonal matrix of the singular values, Δ = Λ^{1/2}, with Λ being the diagonal matrix of the eigenvalues of AA^T and of A^T A (these eigenvalues are the same).

The SVD is derived from the eigendecomposition of a positive semi-definite matrix. This is shown by considering the eigendecomposition of the two positive semi-definite matrices that can be obtained from A:

namely AA^T and A^T A. If we express these matrices in terms of the SVD of A, we find:

AA^T = PΔQ^T QΔP^T = PΔ^2 P^T = PΛP^T,   (125)

and

A^T A = QΔP^T PΔQ^T = QΔ^2 Q^T = QΛQ^T.   (126)

This shows that Δ is the square root of Λ, that P stores the eigenvectors of AA^T, and that Q stores the eigenvectors of A^T A.

For example, the matrix:

A = \begin{bmatrix} 1.1547 & 1.1547 \\ 1.0774 & 0.0774 \\ 0.0774 & 1.0774 \end{bmatrix}   (127)

can be expressed as:

A = PΔQ^T = \begin{bmatrix} 0.8165 & 0 \\ 0.4082 & 0.7071 \\ 0.4082 & −0.7071 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0.7071 & 0.7071 \\ 0.7071 & −0.7071 \end{bmatrix} = \begin{bmatrix} 1.1547 & 1.1547 \\ 1.0774 & 0.0774 \\ 0.0774 & 1.0774 \end{bmatrix}.   (128)
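np.linalg.svd returns the three matrices of the decomposition (with the singular values as a vector); the added sketch below applies it to the matrix of Equation 127 and checks that A = PΔQ^T and that the squared singular values are the eigenvalues of A^T A.

import numpy as np

A = np.array([[1.1547, 1.1547],
              [1.0774, 0.0774],
              [0.0774, 1.0774]])

P, delta, Qt = np.linalg.svd(A, full_matrices=False)
print(delta)                                   # approximately [2., 1.], the singular values
print(np.allclose(P @ np.diag(delta) @ Qt, A)) # True: A = P Delta Q^T
print(np.allclose(delta**2,
                  np.linalg.eigvalsh(A.T @ A)[::-1]))   # True: Delta^2 = eigenvalues of A^T A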

We can check that:

AA^T = PΛP^T = \begin{bmatrix} 0.8165 & 0 \\ 0.4082 & 0.7071 \\ 0.4082 & −0.7071 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0.8165 & 0.4082 & 0.4082 \\ 0 & 0.7071 & −0.7071 \end{bmatrix} = \begin{bmatrix} 2.6667 & 1.3333 & 1.3333 \\ 1.3333 & 1.1667 & 0.1667 \\ 1.3333 & 0.1667 & 1.1667 \end{bmatrix}   (129)

and that:

A^T A = QΛQ^T = \begin{bmatrix} 0.7071 & 0.7071 \\ 0.7071 & −0.7071 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0.7071 & 0.7071 \\ 0.7071 & −0.7071 \end{bmatrix} = \begin{bmatrix} 2.5 & 1.5 \\ 1.5 & 2.5 \end{bmatrix}.   (130)

7.2 Generalized or pseudo-inverse

The inverse of a matrix is defined only for full rank square matrices. The generalization of the inverse for other matrices is called the generalized inverse, pseudo-inverse, or Moore-Penrose inverse, and is denoted by A^+. The pseudo-inverse of A is the unique matrix that satisfies the following four constraints:

AA^+A = A   (i)
A^+AA^+ = A^+   (ii)
(AA^+)^T = AA^+   (symmetry 1)   (iii)
(A^+A)^T = A^+A   (symmetry 2)   (iv).   (131)

For example, with

A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}   (132)

we find that the pseudo-inverse is equal to

A^+ = \begin{bmatrix} 0.1 & 0.1 \\ 0.2 & 0.2 \end{bmatrix}.   (133)

This example shows that the product of a matrix and its pseudo-inverse does not always give the identity matrix:

AA^+ = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 0.1 & 0.1 \\ 0.2 & 0.2 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix}.   (134)

ABDI & WILLIAMS 3 This example shows that the product of a matrix and its pseudoinverse does not always gives the identity matrix: AA + =.5.5.5 0.50 [ ] = [0.3750.5.5.5 0.50 0.3750 ]. (34) 7.3 Pseudo-inverse and singular value decomposition The svd is the building block for the Moore-Penrose pseudo-inverse. Because any matrix A with svd equal to P Q T has for pseudoinverse: A + = Q P T. (35) For the preceding example we obtain: A + 0.707 0.707 = [ 0.707 0.707 ] 0 [ 0 ] [ 0.865 0.408 0.408 0 0.707 0.707 ] 0.887 0.6443 0.3557 = [ 0.887 0.3557 0.6443 ]. (36) Pseudo-inverse matrices are used to solve multiple regression and analysis of variance problems. Related entries Analysis of variance and covariance, canonical correlation, correspondence analysis, confirmatory factor analysis, discriminant analysis, general linear, latent variable, Mauchly test, multiple regression, principal component analysis, sphericity, structural equation modelling

Related entries

Analysis of variance and covariance, canonical correlation, correspondence analysis, confirmatory factor analysis, discriminant analysis, general linear model, latent variable, Mauchly test, multiple regression, principal component analysis, sphericity, structural equation modelling.

Further readings

1. Abdi, H. (2007a). Eigendecomposition: eigenvalues and eigenvectors. In N.J. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage. pp. 304-308.
2. Abdi, H. (2007b). Singular value decomposition (SVD) and generalized singular value decomposition (GSVD). In N.J. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage. pp. 907-912.
3. Basilevsky, A. (1983). Applied Matrix Algebra in the Statistical Sciences. New York: North-Holland.
4. Graybill, F.A. (1969). Matrices with Applications in Statistics. New York: Wadsworth.
5. Healy, M.J.R. (1986). Matrices for Statistics. Oxford: Oxford University Press.
6. Searle, S.R. (1982). Matrix Algebra Useful for Statistics. New York: Wiley.