
Matrix Differentiation (and some other stuff)

Randal J. Barnes
Department of Civil Engineering, University of Minnesota
Minneapolis, Minnesota, USA

1 Introduction

Throughout this presentation I have chosen to use a symbolic matrix notation. This choice was not made lightly. I am a strong advocate of index notation, when appropriate. For example, index notation greatly simplifies the presentation and manipulation of differential geometry. As a rule of thumb, if your work is going to primarily involve differentiation with respect to the spatial coordinates, then index notation is almost surely the appropriate choice. In the present case, however, I will be manipulating large systems of equations in which the matrix calculus is relatively simple while the matrix algebra and matrix arithmetic are messy and more involved. Thus, I have chosen to use symbolic notation.

2 Notation and Nomenclature

Definition 1 Let a_{ij} \in \mathbb{R}, i = 1, 2, ..., m, j = 1, 2, ..., n. Then the ordered rectangular array

    A = \begin{bmatrix}
        a_{11} & a_{12} & \cdots & a_{1n} \\
        a_{21} & a_{22} & \cdots & a_{2n} \\
        \vdots & \vdots &        & \vdots \\
        a_{m1} & a_{m2} & \cdots & a_{mn}
        \end{bmatrix}                                                    (1)

is said to be a real matrix of dimension m × n.

When writing a matrix I will occasionally write down its typical element as well as its dimension. Thus,

    A = [a_{ij}],  i = 1, 2, ..., m;  j = 1, 2, ..., n,                  (2)

denotes a matrix with m rows and n columns, whose typical element is a_{ij}. Note, the first subscript locates the row in which the typical element lies while the second subscript locates the column. For example, a_{jk} denotes the element lying in the jth row and kth column of the matrix A.

Definition 2 A vector is a matrix with only one column. Thus, all vectors are inherently column vectors.

Convention 1 Multi-column matrices are denoted by boldface uppercase letters: for example, A, B, X. Vectors (single-column matrices) are denoted by boldfaced lowercase letters: for example, a, b, x. I will attempt to use letters from the beginning of the alphabet to designate known matrices, and letters from the end of the alphabet for unknown or variable matrices.
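For readers who like to check notation against code, here is a tiny NumPy illustration of Definition 1 and the row-then-column subscript convention. This sketch is my addition, not part of the original notes; the matrix entries are chosen so each value names its own subscripts.

```python
import numpy as np

# A 2 x 3 real matrix: the first subscript locates the row,
# the second subscript locates the column.
A = np.array([[11.0, 12.0, 13.0],
              [21.0, 22.0, 23.0]])

m, n = A.shape     # dimensions: m = 2 rows, n = 3 columns
print(A[0, 2])     # a_13 -> 13.0 (NumPy indices are zero-based)
print(A[1, 0])     # a_21 -> 21.0
```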

Convention 2 When it is useful to explicitly attach the matrix dimensions to the symbolic notation, I will use an underscript. For example, \underset{m \times n}{A} indicates a known, multi-column matrix with m rows and n columns.

A superscript T denotes the matrix transpose operation; for example, A^T denotes the transpose of A. Similarly, if A has an inverse it will be denoted by A^{-1}. The determinant of A will be denoted by either |A| or det(A). Similarly, the rank of a matrix A is denoted by rank(A). An identity matrix will be denoted by I, and 0 will denote a null matrix.

3 Matrix Multiplication

Definition 3 Let A be m × n, and B be n × p, and let the product AB be

    C = AB                                                               (3)

then C is an m × p matrix, with element (i, j) given by

    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}                                (4)

for all i = 1, 2, ..., m, j = 1, 2, ..., p.

Proposition 1 Let A be m × n, and x be n × 1, then the typical element of the product

    z = Ax                                                               (5)

is given by

    z_i = \sum_{k=1}^{n} a_{ik} x_k                                      (6)

for all i = 1, 2, ..., m. Similarly, let y be m × 1, then the typical element of the product

    z^T = y^T A                                                          (7)

is given by

    z_i = \sum_{k=1}^{m} a_{ki} y_k                                      (8)

for all i = 1, 2, ..., n. Finally, the scalar resulting from the product

    \alpha = y^T A x                                                     (9)

is given by

    \alpha = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk} y_j x_k                (10)

Proof: These are merely direct applications of Definition 3.
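The summation formulas above are easy to verify numerically. The following NumPy sketch is my addition, not part of the original notes; the sizes and random seed are arbitrary. It spells out Definition 3 and Proposition 1 with explicit loops and compares against NumPy's built-in products.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 3, 4, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Definition 3: c_ij = sum_k a_ik b_kj, spelled out element by element.
C = np.array([[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(p)]
              for i in range(m)])
assert np.allclose(C, A @ B)

# Proposition 1: z_i = sum_k a_ik x_k ...
z = [sum(A[i, k] * x[k] for k in range(n)) for i in range(m)]
assert np.allclose(z, A @ x)

# ... and alpha = sum_j sum_k a_jk y_j x_k, equation (10).
alpha = sum(A[j, k] * y[j] * x[k] for j in range(m) for k in range(n))
assert np.isclose(alpha, y @ A @ x)
print("Definition 3 and Proposition 1 check out.")
```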

Proposition 2 Let A be m × n, and B be n × p, and let the product AB be

    C = AB                                                               (11)

then

    C^T = B^T A^T                                                        (12)

Proof: The typical element of C is given by

    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}                                (13)

By definition, the typical element of C^T, say d_{ij}, is given by

    d_{ij} = c_{ji} = \sum_{k=1}^{n} a_{jk} b_{ki}                       (14)

Hence,

    C^T = B^T A^T                                                        (15)

Proposition 3 Let A and B be n × n and invertible matrices. Let the product AB be given by

    C = AB                                                               (16)

then

    C^{-1} = B^{-1} A^{-1}                                               (17)

Proof:

    C B^{-1} A^{-1} = A B B^{-1} A^{-1} = I                              (18)
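Both propositions can be spot-checked numerically. A minimal NumPy sketch, my addition rather than the notes' own material; the random square matrices used here are invertible with probability one, which the inverse check implicitly assumes.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = A @ B

# Proposition 2: (AB)^T = B^T A^T
assert np.allclose(C.T, B.T @ A.T)

# Proposition 3: (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(C), np.linalg.inv(B) @ np.linalg.inv(A))
print("Propositions 2 and 3 check out.")
```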

4 Partitioned Matrices

Frequently, I will find it convenient to deal with partitioned matrices [1]. Such a representation, and the manipulation of this representation, are two of the relative advantages of the symbolic matrix notation.

[1] Much of the material in this section is extracted directly from Dhrymes (1978, Section 2.7).

Definition 4 Let A be m × n and write

    A = \begin{bmatrix} B & C \\ D & E \end{bmatrix}                     (19)

where B is m_1 × n_1, E is m_2 × n_2, C is m_1 × n_2, D is m_2 × n_1, m_1 + m_2 = m, and n_1 + n_2 = n. The above is said to be a partition of the matrix A.

Proposition 4 Let A be a square, nonsingular matrix of order m. Partition A as

    A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}  (20)

so that A_{11} is a nonsingular matrix of order m_1, A_{22} is a nonsingular matrix of order m_2, and m_1 + m_2 = m. Then

    A^{-1} = \begin{bmatrix}
        \left( A_{11} - A_{12} A_{22}^{-1} A_{21} \right)^{-1} &
        -A_{11}^{-1} A_{12} \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)^{-1} \\
        -A_{22}^{-1} A_{21} \left( A_{11} - A_{12} A_{22}^{-1} A_{21} \right)^{-1} &
        \left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)^{-1}
        \end{bmatrix}                                                    (21)

Proof: Direct multiplication of the proposed A^{-1} and A yields

    A^{-1} A = I                                                         (22)
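Equation (21) is easy to get wrong when transcribing, so a numerical check is worthwhile. The NumPy sketch below is my addition; the shift by 5I is just a convenient way to keep A and the required blocks nonsingular for this illustration, and the block sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m1, m2 = 3, 2
A = rng.standard_normal((m1 + m2, m1 + m2)) + 5 * np.eye(m1 + m2)
A11, A12 = A[:m1, :m1], A[:m1, m1:]
A21, A22 = A[m1:, :m1], A[m1:, m1:]

inv = np.linalg.inv
S1 = inv(A11 - A12 @ inv(A22) @ A21)   # top-left block of A^{-1}
S2 = inv(A22 - A21 @ inv(A11) @ A12)   # bottom-right block of A^{-1}

# Assemble equation (21) block by block and compare with a direct inverse.
Ainv_blocks = np.block([[S1,                   -inv(A11) @ A12 @ S2],
                        [-inv(A22) @ A21 @ S1, S2]])
assert np.allclose(Ainv_blocks, inv(A))
print("Proposition 4 (partitioned inverse) checks out.")
```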

5 Matrix Differentiation

In the following discussion I will differentiate matrix quantities with respect to the elements of the referenced matrices. Although no new concept is required to carry out such operations, the element-by-element calculations involve cumbersome manipulations and, thus, it is useful to derive the necessary results and have them readily available [2].

[2] Much of the material in this section is extracted directly from Dhrymes (1978, Section 4.3). The interested reader is directed to this worthy reference to find additional results.

Convention 3 Let

    y = \psi(x),                                                         (23)

where y is an m-element vector, and x is an n-element vector. The symbol

    \frac{\partial y}{\partial x} = \begin{bmatrix}
        \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
        \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
        \vdots & \vdots & & \vdots \\
        \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
        \end{bmatrix}                                                    (24)

will denote the m × n matrix of first-order partial derivatives of the transformation from x to y. Such a matrix is called the Jacobian matrix of the transformation \psi().

Notice that if x is actually a scalar in Convention 3 then the resulting Jacobian matrix is an m × 1 matrix; that is, a single column (a vector). On the other hand, if y is actually a scalar in Convention 3 then the resulting Jacobian matrix is a 1 × n matrix; that is, a single row (the transpose of a vector).

Proposition 5 Let

    y = Ax                                                               (25)

where y is m × 1, x is n × 1, A is m × n, and A does not depend on x, then

    \frac{\partial y}{\partial x} = A                                    (26)

Proof: Since the ith element of y is given by

    y_i = \sum_{k=1}^{n} a_{ik} x_k                                      (27)

it follows that, for all i = 1, 2, ..., m, j = 1, 2, ..., n,

    \frac{\partial y_i}{\partial x_j} = a_{ij}                           (28)

Hence

    \frac{\partial y}{\partial x} = A                                    (29)

Proposition 6 Let

    y = Ax                                                               (30)

where y is m × 1, x is n × 1, A is m × n, and A does not depend on x, as in Proposition 5. Suppose that x is a function of the vector z, while A is independent of z. Then

    \frac{\partial y}{\partial z} = A \frac{\partial x}{\partial z}      (31)

Proof: Since the ith element of y is given by

    y_i = \sum_{k=1}^{n} a_{ik} x_k                                      (32)

for all i = 1, 2, ..., m, it follows that

    \frac{\partial y_i}{\partial z_j} = \sum_{k=1}^{n} a_{ik} \frac{\partial x_k}{\partial z_j}    (33)

but the right-hand side of the above is simply element (i, j) of A \frac{\partial x}{\partial z}. Hence

    \frac{\partial y}{\partial z} = \frac{\partial y}{\partial x} \frac{\partial x}{\partial z} = A \frac{\partial x}{\partial z}    (34)
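Propositions 5 and 6 can be confirmed with central finite differences. In the sketch below, which is my addition, the choice x(z) = Mz is purely illustrative; M, the sizes, and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, q = 3, 4, 2
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
h = 1e-6

# Proposition 5: the Jacobian of y = Ax with respect to x is A itself.
# Column j of the Jacobian is recovered by a central difference in x_j.
J = np.empty((m, n))
for j in range(n):
    e = np.zeros(n); e[j] = h
    J[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * h)
assert np.allclose(J, A, atol=1e-6)

# Proposition 6: with x(z) = Mz, dy/dz = A dx/dz = A M.
M = rng.standard_normal((n, q))
z = rng.standard_normal(q)
Jz = np.empty((m, q))
for j in range(q):
    e = np.zeros(q); e[j] = h
    Jz[:, j] = (A @ (M @ (z + e)) - A @ (M @ (z - e))) / (2 * h)
assert np.allclose(Jz, A @ M, atol=1e-6)
print("Propositions 5 and 6 check out.")
```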

Proposition 7 Let the scalar α be defined by

    \alpha = y^T A x                                                     (35)

where y is m × 1, x is n × 1, A is m × n, and A is independent of x and y, then

    \frac{\partial \alpha}{\partial x} = y^T A                           (36)

and

    \frac{\partial \alpha}{\partial y} = x^T A^T                         (37)

Proof: Define

    w^T = y^T A                                                          (38)

and note that

    \alpha = w^T x                                                       (39)

Hence, by Proposition 5 we have that

    \frac{\partial \alpha}{\partial x} = w^T = y^T A                     (40)

which is the first result. Since α is a scalar, we can write

    \alpha = \alpha^T = x^T A^T y                                        (41)

and applying Proposition 5 as before we obtain

    \frac{\partial \alpha}{\partial y} = x^T A^T                         (42)

Proposition 8 For the special case in which the scalar α is given by the quadratic form

    \alpha = x^T A x                                                     (43)

where x is n × 1, A is n × n, and A does not depend on x, then

    \frac{\partial \alpha}{\partial x} = x^T \left( A + A^T \right)      (44)

Proof: By definition

    \alpha = \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij} x_i x_j                (45)

Differentiating with respect to the kth element of x we have

    \frac{\partial \alpha}{\partial x_k} = \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} a_{ik} x_i    (46)

for all k = 1, 2, ..., n, and consequently,

    \frac{\partial \alpha}{\partial x} = x^T A^T + x^T A = x^T \left( A^T + A \right)    (47)
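A finite-difference check of Propositions 7 and 8, again my own sketch with arbitrary sizes; B below plays the role of the square matrix A in Proposition 8. Central differences are exact for these linear and quadratic forms up to rounding, so tight tolerances suffice.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)
h = 1e-6

# Proposition 7: d(y^T A x)/dx = y^T A, checked component by component.
grad_x = np.empty(n)
for k in range(n):
    e = np.zeros(n); e[k] = h
    grad_x[k] = (y @ A @ (x + e) - y @ A @ (x - e)) / (2 * h)
assert np.allclose(grad_x, y @ A, atol=1e-5)

# Proposition 8: d(x^T B x)/dx = x^T (B + B^T) for square B.
B = rng.standard_normal((n, n))
grad_q = np.empty(n)
for k in range(n):
    e = np.zeros(n); e[k] = h
    grad_q[k] = ((x + e) @ B @ (x + e) - (x - e) @ B @ (x - e)) / (2 * h)
assert np.allclose(grad_q, x @ (B + B.T), atol=1e-5)
print("Propositions 7 and 8 check out.")
```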

Proposition 9 For the special case where A is a symmetric matrix and

    \alpha = x^T A x                                                     (48)

where x is n × 1, A is n × n, and A does not depend on x, then

    \frac{\partial \alpha}{\partial x} = 2 x^T A                         (49)

Proof: This is an obvious application of Proposition 8.

Proposition 10 Let the scalar α be defined by

    \alpha = y^T x                                                       (50)

where y is n × 1, x is n × 1, and both y and x are functions of the vector z. Then

    \frac{\partial \alpha}{\partial z} = x^T \frac{\partial y}{\partial z} + y^T \frac{\partial x}{\partial z}    (51)

Proof: We have

    \alpha = \sum_{j=1}^{n} x_j y_j                                      (52)

Differentiating with respect to the kth element of z we have

    \frac{\partial \alpha}{\partial z_k} = \sum_{j=1}^{n} \left( x_j \frac{\partial y_j}{\partial z_k} + y_j \frac{\partial x_j}{\partial z_k} \right)    (53)

for all k = 1, 2, ..., n, and consequently,

    \frac{\partial \alpha}{\partial z} = \frac{\partial \alpha}{\partial y} \frac{\partial y}{\partial z} + \frac{\partial \alpha}{\partial x} \frac{\partial x}{\partial z} = x^T \frac{\partial y}{\partial z} + y^T \frac{\partial x}{\partial z}    (54)

Proposition 11 Let the scalar α be defined by

    \alpha = x^T x                                                       (55)

where x is n × 1, and x is a function of the vector z. Then

    \frac{\partial \alpha}{\partial z} = 2 x^T \frac{\partial x}{\partial z}    (56)

Proof: This is an obvious application of Proposition 10.
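To exercise Propositions 10 and 11 numerically we need concrete functions x(z) and y(z); the linear choices x(z) = Mz and y(z) = Nz below are mine, purely for illustration, as are the sizes and seed.

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 3, 5
M = rng.standard_normal((n, q))   # so dx/dz = M
N = rng.standard_normal((n, q))   # so dy/dz = N
z = rng.standard_normal(q)
x, y = M @ z, N @ z
h = 1e-6

# Proposition 10: d(y^T x)/dz = x^T dy/dz + y^T dx/dz = x^T N + y^T M.
grad_fd = np.empty(q)
for k in range(q):
    e = np.zeros(q); e[k] = h
    grad_fd[k] = ((N @ (z + e)) @ (M @ (z + e)) -
                  (N @ (z - e)) @ (M @ (z - e))) / (2 * h)
assert np.allclose(grad_fd, x @ N + y @ M, atol=1e-5)

# Proposition 11 is the special case y = x: d(x^T x)/dz = 2 x^T dx/dz.
grad_fd2 = np.empty(q)
for k in range(q):
    e = np.zeros(q); e[k] = h
    xp, xm = M @ (z + e), M @ (z - e)
    grad_fd2[k] = (xp @ xp - xm @ xm) / (2 * h)
assert np.allclose(grad_fd2, 2 * x @ M, atol=1e-5)
print("Propositions 10 and 11 check out.")
```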

Proposition 12 Let the scalar α be defined by

    \alpha = y^T A x                                                     (57)

where y is m × 1, x is n × 1, A is m × n, and both y and x are functions of the vector z, while A does not depend on z. Then

    \frac{\partial \alpha}{\partial z} = x^T A^T \frac{\partial y}{\partial z} + y^T A \frac{\partial x}{\partial z}    (58)

Proof: Define

    w^T = y^T A                                                          (59)

and note that

    \alpha = w^T x                                                       (60)

Applying Proposition 10 we have

    \frac{\partial \alpha}{\partial z} = x^T \frac{\partial w}{\partial z} + w^T \frac{\partial x}{\partial z}    (61)

Substituting back in for w we arrive at

    \frac{\partial \alpha}{\partial z} = x^T A^T \frac{\partial y}{\partial z} + y^T A \frac{\partial x}{\partial z}    (62)

Proposition 13 Let the scalar α be defined by the quadratic form

    \alpha = x^T A x                                                     (63)

where x is n × 1, A is n × n, and x is a function of the vector z, while A does not depend on z. Then

    \frac{\partial \alpha}{\partial z} = x^T \left( A + A^T \right) \frac{\partial x}{\partial z}    (64)

Proof: This is an obvious application of Proposition 12.

Proposition 14 For the special case where A is a symmetric matrix and

    \alpha = x^T A x                                                     (65)

where x is n × 1, A is n × n, and x is a function of the vector z, while A does not depend on z, then

    \frac{\partial \alpha}{\partial z} = 2 x^T A \frac{\partial x}{\partial z}    (66)

Proof: This is an obvious application of Proposition 13.

Definition 5 Let A be an m × n matrix whose elements are functions of the scalar parameter α. Then the derivative of the matrix A with respect to the scalar parameter α is the m × n matrix of element-by-element derivatives:

    \frac{\partial A}{\partial \alpha} = \begin{bmatrix}
        \frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \cdots & \frac{\partial a_{1n}}{\partial \alpha} \\
        \frac{\partial a_{21}}{\partial \alpha} & \frac{\partial a_{22}}{\partial \alpha} & \cdots & \frac{\partial a_{2n}}{\partial \alpha} \\
        \vdots & \vdots & & \vdots \\
        \frac{\partial a_{m1}}{\partial \alpha} & \frac{\partial a_{m2}}{\partial \alpha} & \cdots & \frac{\partial a_{mn}}{\partial \alpha}
        \end{bmatrix}                                                    (67)
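Returning to Propositions 12 and 13 for a moment, both can be verified by finite differences in the same fashion as before. The sketch is my own; the linear choices x(z) = Mz and y(z) = Nz are illustrative, not part of the propositions.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, q = 3, 4, 2
A = rng.standard_normal((m, n))
M = rng.standard_normal((n, q))   # dx/dz = M
N = rng.standard_normal((m, q))   # dy/dz = N
z = rng.standard_normal(q)
x, y = M @ z, N @ z
h = 1e-6

# Proposition 12: d(y^T A x)/dz = x^T A^T dy/dz + y^T A dx/dz.
grad_fd = np.empty(q)
for k in range(q):
    e = np.zeros(q); e[k] = h
    grad_fd[k] = ((N @ (z + e)) @ A @ (M @ (z + e)) -
                  (N @ (z - e)) @ A @ (M @ (z - e))) / (2 * h)
assert np.allclose(grad_fd, x @ A.T @ N + y @ A @ M, atol=1e-5)

# Proposition 13: d(x^T B x)/dz = x^T (B + B^T) dx/dz for square B.
B = rng.standard_normal((n, n))
grad_q = np.empty(q)
for k in range(q):
    e = np.zeros(q); e[k] = h
    xp, xm = M @ (z + e), M @ (z - e)
    grad_q[k] = (xp @ B @ xp - xm @ B @ xm) / (2 * h)
assert np.allclose(grad_q, x @ (B + B.T) @ M, atol=1e-5)
print("Propositions 12 and 13 check out.")
```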

Proposition 15 Let A be a nonsingular, m × m matrix whose elements are functions of the scalar parameter α. Then

    \frac{\partial A^{-1}}{\partial \alpha} = -A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}    (68)

Proof: Start with the definition of the inverse

    A^{-1} A = I                                                         (69)

and differentiate, yielding

    \frac{\partial A^{-1}}{\partial \alpha} A + A^{-1} \frac{\partial A}{\partial \alpha} = 0    (70)

Rearranging the terms yields

    \frac{\partial A^{-1}}{\partial \alpha} = -A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}    (71)

6 References

Dhrymes, Phoebus J., 1978, Mathematics for Econometrics, Springer-Verlag, New York, 136 pp.

Golub, Gene H., and Charles F. Van Loan, 1983, Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, 476 pp.

Graybill, Franklin A., 1983, Matrices with Applications in Statistics, 2nd Edition, Wadsworth International Group, Belmont, California, 461 pp.
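As a closing numerical check of Proposition 15, here is one more sketch of mine; the one-parameter family A(α) = A0 + αB is an arbitrary illustrative choice, with the shift by 5I again used only to keep A0 comfortably nonsingular.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 4
A0 = rng.standard_normal((m, m)) + 5 * np.eye(m)
B = rng.standard_normal((m, m))            # plays the role of dA/dalpha
alpha, h = 0.3, 1e-6

# Central difference of alpha -> A(alpha)^{-1} versus equation (68).
A = A0 + alpha * B
dinv_fd = (np.linalg.inv(A0 + (alpha + h) * B) -
           np.linalg.inv(A0 + (alpha - h) * B)) / (2 * h)
dinv_formula = -np.linalg.inv(A) @ B @ np.linalg.inv(A)
assert np.allclose(dinv_fd, dinv_formula, atol=1e-5)
print("Proposition 15 checks out.")
```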