Matrix Multiplication

Matrix multiplication is an operation with properties quite different from its scalar counterpart. To begin with, order matters in matrix multiplication. That is, the matrix product AB need not be the same as the matrix product BA. Indeed, the matrix product AB might be well-defined, while the product BA might not exist.

Definition (Conformability for Matrix Multiplication). A_{p \times q} and B_{r \times s} are conformable for matrix multiplication as AB if and only if q = r.

Definition (Matrix Multiplication). Let A_{p \times q} = \{a_{ij}\} and B_{q \times s} = \{b_{ij}\}. Then C_{p \times s} = AB = \{c_{ik}\}, where

c_{ik} = \sum_{j=1}^{q} a_{ij} b_{jk}    (1)

Example (The Row by Column Method). The meaning of the formal definition of matrix multiplication might not be obvious at first glance. Indeed, there are several ways of thinking about matrix multiplication.

The first way, which I call the row by column approach, works as follows. Visualize A_{p \times q} as a set of p row vectors and B_{q \times s} as a set of s column vectors. Then if C = AB, element c_{ik} of C is the scalar product (i.e., the sum of cross products) of the ith row of A with the kth column of B.

For example, let

A = \begin{bmatrix} 2 & 4 & 6 \\ 5 & 7 & 1 \\ 2 & 3 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 \\ 0 & 2 \\ 5 & 1 \end{bmatrix}.

Then

C = AB = \begin{bmatrix} 38 & 16 \\ 25 & 20 \\ 33 & 13 \end{bmatrix}.
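As a check on the arithmetic, here is a minimal numpy sketch of the same product:

```python
import numpy as np

# The matrices from the example above.
A = np.array([[2, 4, 6],
              [5, 7, 1],
              [2, 3, 5]])
B = np.array([[4, 1],
              [0, 2],
              [5, 1]])

# Row-by-column matrix product: C[i, k] = sum over j of A[i, j] * B[j, k].
C = A @ B
print(C)
# [[38 16]
#  [25 20]
#  [33 13]]
```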

The following are some key properties of matrix multiplication:

1) Associativity.

(AB)C = A(BC)    (2)

2) Not generally commutative. That is, often AB ≠ BA.

3) Distributive over addition and subtraction.

C(A + B) = CA + CB    (3)

4) Assuming conformability, the identity matrix I functions like the number 1, that is,

AI = IA = A    (4)

5) AB = 0 does not necessarily imply that either A = 0 or B = 0.

Several of the above results are surprising, and can produce negative transfer for beginning students as they attempt to reduce matrix algebra expressions.

Example (A Null Matrix Product). The following example shows that one can, indeed, obtain a null matrix as the product of two non-null matrices. Let

a' = \begin{bmatrix} 6 & 2 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 8 & -12 & -12 \\ -12 & 40 & -4 \\ -12 & -4 & 40 \end{bmatrix}.

Then a'B = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}.
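Because the sign pattern in B is easy to mistype, a numerical check is worthwhile; a minimal numpy sketch:

```python
import numpy as np

a = np.array([6, 2, 2])          # the row vector a'
B = np.array([[  8, -12, -12],
              [-12,  40,  -4],
              [-12,  -4,  40]])

# A null product from two non-null factors.
print(a @ B)   # [0 0 0]
```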

Definition (Pre-multiplication and Post-multiplication). When we talk about the product of matrices A and B, it is important to remember that AB and BA are usually not the same. Consequently, it is common to use the terms pre-multiplication and post-multiplication. When we say "A is post-multiplied by B," or "B is pre-multiplied by A," we are referring to the product AB. When we say "B is post-multiplied by A," or "A is pre-multiplied by B," we are referring to the product BA.

Matrix Transposition

Transposing a matrix is an operation which plays a very important role in multivariate statistical theory. The operation, in essence, switches the rows and columns of a matrix.

Definition (Matrix Transposition). Let A_{p \times q} = \{a_{ij}\}. Then the transpose of A, denoted A' (or A^T), is defined as

B = A'_{q \times p} = \{b_{ij}\} = \{a_{ji}\}    (5)

Example (Matrix Transposition). Let

A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 5 \end{bmatrix}. \quad Then \quad A' = \begin{bmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 5 \end{bmatrix}.

Properties of Matrix Transposition.

(A')' = A
(cA)' = cA'
(A + B)' = A' + B'
(AB)' = B'A'

A square matrix A is symmetric if and only if A = A'.
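The reversal rule (AB)' = B'A' is the one most often misremembered; a minimal numpy sketch verifying it on arbitrary conformable matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 4))
B = rng.integers(-5, 5, size=(4, 2))

# (AB)' equals B'A' -- note the reversed order of the factors.
print(np.array_equal((A @ B).T, B.T @ A.T))   # True
```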

Partitioning of Matrices

In many theoretical discussions of matrices, it will be useful to conceive of a matrix as being composed of sub-matrices. When we do this, we will partition the matrix symbolically by breaking it down into its components. The components can be either matrices or scalars.

Example. In simple multiple regression, where there is one criterion variable y and p predictor variables in the vector x, it is common to refer to the correlation matrix of the entire set of variables using partitioned notation. So we can write

R = \begin{bmatrix} 1 & r'_{yx} \\ r_{xy} & R_{xx} \end{bmatrix}    (6)

Order of a Partitioned Form

We will refer to the order of the partitioned form as the number of rows and columns in the partitioning, which is distinct from the number of rows and columns in the matrix being represented. For example, suppose there were p = 5 predictor variables in the example of Equation (6). Then the matrix R is a 6 × 6 matrix, but the example shows a 2 × 2 partitioned form.

When matrices are partitioned properly, it is understood that pieces that appear to the left or right of other pieces have the same number of rows, and pieces that appear above or below other pieces have the same number of columns. So, in the above example, R_{xx}, appearing to the right of the p × 1 column vector r_{xy}, must have p rows, and since it appears below the 1 × p row vector r'_{yx}, it must have p columns. Hence, it must be a p × p matrix.

Linear Combinations of Matrix Rows and Columns

We have already discussed the row by column conceptualization of matrix multiplication. However, there are some other ways of conceptualizing matrix multiplication that are particularly useful in the field of multivariate statistics. To begin with, we need to enhance our understanding of the way matrix multiplication and transposition work with partitioned matrices.

Definition (Multiplication and Transposition of Partitioned Matrices).

1. To transpose a partitioned matrix, treat the submatrices in the partition as though they were elements of a matrix and transpose the arrangement, but also transpose each submatrix. The transpose of a p × q partitioned form will be a q × p partitioned form.

2. To multiply partitioned matrices, treat the submatrices as though they were elements of a matrix. The product of p × q and q × r partitioned forms is a p × r partitioned form.

Some examples will illustrate the above definition.

Example (Transposing a Partitioned Matrix). Suppose A is partitioned as

A = \begin{bmatrix} C & D \\ E & F \\ G & H \end{bmatrix}. \quad Then \quad A' = \begin{bmatrix} C' & E' & G' \\ D' & F' & H' \end{bmatrix}.

Example (Product of Two Partitioned Matrices). Suppose A = \begin{bmatrix} X & Y \end{bmatrix} and B = \begin{bmatrix} G \\ H \end{bmatrix}. Then (assuming conformability) AB = XG + YH.
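A minimal numpy sketch of the partitioned product, with block shapes chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(-3, 3, size=(2, 2))   # left block of A
Y = rng.integers(-3, 3, size=(2, 3))   # right block of A
G = rng.integers(-3, 3, size=(2, 4))   # top block of B
H = rng.integers(-3, 3, size=(3, 4))   # bottom block of B

A = np.hstack([X, Y])                  # A = [X  Y]
B = np.vstack([G, H])                  # B = [G; H]

# The full product equals the sum of the block products.
print(np.array_equal(A @ B, X @ G + Y @ H))   # True
```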

Example (Linearly Combining Columns of a Matrix). Consider an N × p matrix X, containing the scores of N persons on p variables. One can conceptualize the matrix as a set of p column vectors. In partitioned matrix form, we can represent X as

X = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_p \end{bmatrix}

Now suppose one were to post-multiply X by a p × 1 vector b. The product is an N × 1 column vector:

y = Xb = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_p \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_p \end{bmatrix} = b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_p x_p
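A minimal numpy sketch of the column-combination view, using the difference-score data from the next example:

```python
import numpy as np

X = np.array([[80, 70],
              [77, 79],
              [64, 64]])
b = np.array([1, -1])

# Post-multiplying by b forms a weighted sum of the columns of X.
y1 = X @ b
y2 = b[0] * X[:, 0] + b[1] * X[:, 1]
print(np.array_equal(y1, y2))   # True
print(y1)                       # [10 -2  0]
```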

Example (Computing Difference Scores). Suppose the matrix X consists of a set of scores on two variables, and you wish to compute the difference scores on the variables.

y = Xb = \begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} +1 \\ -1 \end{bmatrix} = \begin{bmatrix} 10 \\ -2 \\ 0 \end{bmatrix}

Example (Computing Course Grades).

\begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 73\tfrac{1}{3} \\ 78\tfrac{1}{3} \\ 64 \end{bmatrix}

Example (Linearly Combining Rows of a Matrix). Suppose we view the p × q matrix X as being composed of p row vectors. If we pre-multiply X by a 1 × p row vector b', the elements of b' are linear weights applied to the rows of X.

Sets of Linear Combinations

There is, of course, no need to restrict oneself to a single linear combination of the rows and columns of a matrix. To create more than one linear combination, simply add columns (or rows) to the post-multiplying (or pre-multiplying) matrix!

\begin{bmatrix} 80 & 70 \\ 77 & 79 \\ 64 & 64 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 150 & 10 \\ 156 & -2 \\ 128 & 0 \end{bmatrix}

Example (Extracting a Column from a Matrix).

\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}

Definition (Selection Vector). The selection vector s_[i] is a vector with all elements zero except the ith element, which is 1. To extract the ith column of X, post-multiply by s_[i]; to extract the ith row of X, pre-multiply by s'_[i].

\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 5 \end{bmatrix}

Example (Exchanging Columns of a Matrix).

\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 5 & 2 \\ 6 & 3 \end{bmatrix}

Example (Rescaling Rows or Columns).

\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 12 \\ 4 & 15 \\ 6 & 18 \end{bmatrix}

Example (Using Two Selection Vectors to Extract a Matrix Element).

\begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 4
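A minimal numpy sketch replaying the selection, exchange, and rescaling examples above:

```python
import numpy as np

X = np.array([[1, 4],
              [2, 5],
              [3, 6]])

s2 = np.array([0, 1, 0])        # selection vector s_[2] of length 3
print(s2 @ X)                   # row 2 of X: [2 5]
print(X @ np.array([0, 1]))     # column 2 of X: [4 5 6]

# Two selection vectors pick out a single element.
print(np.array([1, 0, 0]) @ X @ np.array([0, 1]))   # element x_12 = 4

# Exchanging and rescaling columns via post-multiplication.
print(X @ np.array([[0, 1], [1, 0]]))   # columns swapped
print(X @ np.diag([2, 3]))              # columns rescaled by 2 and 3
```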

Matrix Algebra of Some Sample Statistics

Converting to Deviation Scores

Suppose x is an N × 1 vector of scores for N people on a single variable. We wish to transform the scores in x to deviation score form. (In general, we will find this a source of considerable convenience.) To accomplish the deviation score transformation, the arithmetic mean X̄ must be subtracted from each score in x.

Let 1 be an N × 1 vector of ones. Then

\sum_{i=1}^{N} X_i = 1'x, \quad and \quad \bar{X} = (1/N) \sum_{i=1}^{N} X_i = (1/N) 1'x

To transform x to deviation score form, we need to subtract X̄ from every element of x. That is, we need

x* = x - 1\bar{X}
   = x - 11'x/N
   = x - (11'/N)x
   = Ix - (11'/N)x
   = Ix - Px
   = (I - P)x
   = Qx

Example.

\begin{bmatrix} 2/3 & -1/3 & -1/3 \\ -1/3 & 2/3 & -1/3 \\ -1/3 & -1/3 & 2/3 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix}

Note that the ith row of Q gives you a linear combination of the N scores for computing the ith deviation score.
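Constructing Q = I - 11'/N directly makes the example easy to check, and previews the idempotency result discussed next; a minimal numpy sketch:

```python
import numpy as np

N = 3
x = np.array([4.0, 2.0, 0.0])

P = np.ones((N, N)) / N        # P = 11'/N, every element equal to 1/N
Q = np.eye(N) - P              # Q = I - P, the deviation-score operator

print(Q @ x)                   # [ 2.  0. -2.]  (x minus its mean)
print(np.allclose(Q @ Q, Q))   # True: Q is idempotent
print(np.allclose(Q, Q.T))     # True: Q is symmetric
```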

Properties of the Q Operator

Definition (Idempotent Matrix). A matrix C is idempotent if CC = C^2 = C.

Theorem. If C is idempotent and I is a conformable identity matrix, then I - C is also idempotent.

Proof. To prove the result, we need merely show that (I - C)^2 = I - C. This is straightforward:

(I - C)^2 = (I - C)(I - C)
          = I^2 - CI - IC + C^2
          = I - C - C + C
          = I - C

Class Exercise. Prove that if a matrix A is symmetric, so is AA.

Class Exercise. From the preceding, prove that if a matrix A is symmetric, then for any scalar c, the matrix cA is symmetric.

Class Exercise. Prove that if matrices A and B are both symmetric and of the same order, then A + B and A - B must be symmetric.

Recall that P = 11'/N is an N × N symmetric matrix with each element equal to 1/N. P is also idempotent. (See handout.) It then follows that Q = I - P is also symmetric and idempotent. (Why?)

The Sample Variance

If x* has scores in deviation score form, then

S_X^2 = [1/(N - 1)] x*'x*

If the scores in x are not in deviation score form, we may use the Q operator to convert them to deviation score form first. Hence, in general,

S_X^2 = [1/(N - 1)] (Qx)'(Qx)
      = [1/(N - 1)] x'Q'Qx
      = [1/(N - 1)] x'QQx
      = [1/(N - 1)] x'Qx

The Sample Covariance

Do you understand each step below? Remember that Q = Q' = QQ = Q'Q = QQ'.

S_XY = [1/(N - 1)] x*'y*
     = [1/(N - 1)] (Qx)'(Qy)
     = [1/(N - 1)] x'Q'Qy
     = [1/(N - 1)] x'Qy
     = [1/(N - 1)] x'y*
     = [1/(N - 1)] x*'y
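The chain of identities can be confirmed against numpy's built-in covariance; a minimal sketch with invented data:

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0, 7.0])
y = np.array([3.0, 1.0, 5.0, 6.0])
N = len(x)

Q = np.eye(N) - np.ones((N, N)) / N

s_xy = x @ Q @ y / (N - 1)                     # x'Qy / (N-1)
print(np.isclose(s_xy, np.cov(x, y)[0, 1]))    # True
```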

Notational Conventions

In what follows, we will generally assume, unless explicitly stated otherwise, that our data matrices have been transformed to deviation score form. (The Q operator discussed above will accomplish this simultaneously for the case of scores of N subjects on several, say p, variates.) For example, consider a data matrix X_{N \times p}, whose p columns are the scores of N subjects on p different variables. If the columns of X are in raw score form, the matrix QX will have p columns of deviation scores. Why?

We shall concentrate on results for the case where X is in column variate form, i.e., X is an N × p matrix. Equivalent results may be developed for row variate form, i.e., p × N data matrices which have the N scores on p variables arranged in p rows. The choice between row and column variate representations is arbitrary, and varies across books and articles.

The Variance-Covariance Matrix

S_XX = [1/(N - 1)] X'QX

If we assume X is in deviation score form, then

S_XX = [1/(N - 1)] X'X

(Note: Some authors call S_XX a "covariance matrix." Why would they do this?)
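As in the scalar case, the formula can be checked against numpy; a minimal sketch with an invented data matrix:

```python
import numpy as np

# Four subjects, three variables, raw scores.
X = np.array([[2.0, 5.0, 1.0],
              [4.0, 3.0, 2.0],
              [6.0, 8.0, 3.0],
              [8.0, 2.0, 6.0]])
N = X.shape[0]

Q = np.eye(N) - np.ones((N, N)) / N
S = X.T @ Q @ X / (N - 1)           # S_XX = X'QX / (N-1)

print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```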

Diagonal Matrices

Diagonal matrices have special properties, and we have some special notations associated with them. We use the notation diag(X) to signify a diagonal matrix with diagonal entries equal to the diagonal elements of X. We also use power notation with diagonal matrices, in the following sense: let D be a diagonal matrix. Then D^c is a diagonal matrix composed of the entries of D raised to the cth power.

Correlation Matrix

For p variables in the data matrix X, the correlation matrix R_XX is a p × p symmetric matrix with typical element r_ij equal to the correlation between variables i and j. Of course, the diagonal elements of this matrix represent the correlation of a variable with itself, and are all equal to 1.

Letting D = diag(S_XX), we have

R_XX = D^{-1/2} S_XX D^{-1/2}
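A minimal numpy sketch of the rescaling, using an invented covariance matrix:

```python
import numpy as np

# A small covariance matrix (invented for illustration).
S = np.array([[4.0, 2.0],
              [2.0, 9.0]])

D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))   # D^{-1/2}
R = D_inv_sqrt @ S @ D_inv_sqrt                    # R = D^{-1/2} S D^{-1/2}

print(R)   # unit diagonal; off-diagonal correlation = 2/(2*3) = 1/3
```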

(Cross-) Covariance Matrix

Assume X and Y are in deviation score form. Then

S_XY = [1/(N - 1)] X'Y

Variance-Covariance of Linear Combinations

Theorem (Linear Combinations of Deviation Scores). Given X, a data matrix in column variate deviation score form, any linear composite y = Xb will also be in deviation score form.

Theorem (Variance-Covariance of Linear Combinations).

a) If X has variance-covariance matrix S_XX, then the linear combination y = Xb has variance b'S_XX b.

b) The set of linear combinations Y = XB has variance-covariance matrix S_YY = B'S_XX B.

c) Two sets of linear combinations W = XB and M = YC have covariance matrix S_WM = B'S_XY C.
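A minimal numpy sketch of parts (a) and (b), reusing the Q-operator construction with an invented data matrix:

```python
import numpy as np

X = np.array([[2.0, 5.0],
              [4.0, 3.0],
              [6.0, 8.0],
              [8.0, 2.0]])
N = X.shape[0]
Q = np.eye(N) - np.ones((N, N)) / N

S_xx = X.T @ Q @ X / (N - 1)

# (a) Variance of a single linear combination y = Xb.
b = np.array([1.0, -1.0])
y = X @ b
print(np.isclose(b @ S_xx @ b, np.var(y, ddof=1)))   # True

# (b) Variance-covariance matrix of the set Y = XB.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])
Y = X @ B
print(np.allclose(B.T @ S_xx @ B, np.cov(Y, rowvar=False)))   # True
```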

Trace of a Square Matrix

Definition (Trace of a Square Matrix). The trace of a p × p square matrix A is

Tr(A) = \sum_{i=1}^{p} a_{ii}

Properties of the Trace

1. Tr(A + B) = Tr(A) + Tr(B)
2. Tr(A) = Tr(A')
3. Tr(cA) = c Tr(A)
4. Tr(A'B) = \sum_i \sum_j a_{ij} b_{ij}
5. Tr(E'E) = \sum_i \sum_j e_{ij}^2
6. The cyclic permutation rule: Tr(ABC) = Tr(CAB) = Tr(BCA)
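Properties 4 through 6 are easy to spot-check numerically; a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# Property 4: Tr(A'B) is the sum of elementwise products.
print(np.isclose(np.trace(A.T @ B), np.sum(A * B)))     # True

# Property 5: Tr(E'E) is the sum of squared elements.
print(np.isclose(np.trace(A.T @ A), np.sum(A ** 2)))    # True

# Property 6: the cyclic permutation rule.
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
print(np.isclose(t1, t2) and np.isclose(t2, t3))        # True
```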