The Matrix Cookbook [http://matrixcookbook.com]. Kaare Brandt Petersen, Michael Syskind Pedersen. Version: November 15, 2012




Introduction

What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.

Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large number of sources. These sources include similar but shorter notes found on the internet and appendices in books; see the references for a full list.

Errors: Very likely there are errors, typos, and mistakes, for which we apologize and would be grateful to receive corrections at cookbook@2302.dk.

It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing, and the version will be apparent from the date in the header.

Suggestions: Your suggestions for additional content or elaboration of some topics are most welcome at cookbook@2302.dk.

Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.

Acknowledgements: We would like to thank the following for contributions and suggestions: Bill Baxter, Brian Templeton, Christian Rishøj, Christian Schröppel, Dan Boley, Douglas L. Theobald, Esben Hoegh-Rasmussen, Evripidis Karseras, Georg Martius, Glynne Casteel, Jan Larsen, Jun Bin Gao, Jürgen Struckmeier, Kamil Dedecius, Karim T. Abou-Moustafa, Korbinian Strimmer, Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loic Thibaut, Markus Froeb, Michael Hubatka, Miguel Barão, Ole Winther, Pavel Sakov, Stephan Hattinger, Troels Pedersen, Vasile Sima, Vincent Rabaud, Zhaoshui He. We would also like to thank The Oticon Foundation for funding our PhD studies.

Contents

1 Basics
  1.1 Trace
  1.2 Determinant
  1.3 The Special Case 2x2

2 Derivatives
  2.1 Derivatives of a Determinant
  2.2 Derivatives of an Inverse
  2.3 Derivatives of Eigenvalues
  2.4 Derivatives of Matrices, Vectors and Scalar Forms
  2.5 Derivatives of Traces
  2.6 Derivatives of vector norms
  2.7 Derivatives of matrix norms
  2.8 Derivatives of Structured Matrices

3 Inverses
  3.1 Basic
  3.2 Exact Relations
  3.3 Implication on Inverses
  3.4 Approximations
  3.5 Generalized Inverse
  3.6 Pseudo Inverse

4 Complex Matrices
  4.1 Complex Derivatives
  4.2 Higher order and non-linear derivatives
  4.3 Inverse of complex sum

5 Solutions and Decompositions
  5.1 Solutions to linear equations
  5.2 Eigenvalues and Eigenvectors
  5.3 Singular Value Decomposition
  5.4 Triangular Decomposition
  5.5 LU decomposition
  5.6 LDM decomposition
  5.7 LDL decompositions

6 Statistics and Probability
  6.1 Definition of Moments
  6.2 Expectation of Linear Combinations
  6.3 Weighted Scalar Variable

7 Multivariate Distributions
  7.1 Cauchy
  7.2 Dirichlet
  7.3 Normal
  7.4 Normal-Inverse Gamma
  7.5 Gaussian
  7.6 Multinomial
  7.7 Student's t
  7.8 Wishart
  7.9 Wishart, Inverse

8 Gaussians
  8.1 Basics
  8.2 Moments
  8.3 Miscellaneous
  8.4 Mixture of Gaussians

9 Special Matrices
  9.1 Block matrices
  9.2 Discrete Fourier Transform Matrix, The
  9.3 Hermitian Matrices and skew-Hermitian
  9.4 Idempotent Matrices
  9.5 Orthogonal matrices
  9.6 Positive Definite and Semi-definite Matrices
  9.7 Single-entry Matrix, The
  9.8 Symmetric, Skew-symmetric/Antisymmetric
  9.9 Toeplitz Matrices
  9.10 Transition matrices
  9.11 Units, Permutation and Shift
  9.12 Vandermonde Matrices

10 Functions and Operators
  10.1 Functions and Series
  10.2 Kronecker and Vec Operator
  10.3 Vector Norms
  10.4 Matrix Norms
  10.5 Rank
  10.6 Integral Involving Dirac Delta Functions
  10.7 Miscellaneous

A One-dimensional Results
  A.1 Gaussian
  A.2 One Dimensional Mixture of Gaussians

B Proofs and Details
  B.1 Misc Proofs

Notation and Nomenclature

A            Matrix
A_ij         Matrix indexed for some purpose
A_i          Matrix indexed for some purpose
A^{ij}       Matrix indexed for some purpose
A^n          Matrix indexed for some purpose, or the n-th power of a square matrix
A^{-1}       The inverse matrix of the matrix A
A^+          The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}      The square root of a matrix (if unique), not elementwise
(A)_ij       The (i,j)-th entry of the matrix A
A_ij         The (i,j)-th entry of the matrix A
[A]_ij       The ij-submatrix, i.e. A with the i-th row and j-th column deleted
a            Vector (column vector)
a_i          Vector indexed for some purpose
a_i          The i-th element of the vector a
a            Scalar
ℜz           Real part of a scalar
ℜz           Real part of a vector
ℜZ           Real part of a matrix
ℑz           Imaginary part of a scalar
ℑz           Imaginary part of a vector
ℑZ           Imaginary part of a matrix
det(A)       Determinant of A
Tr(A)        Trace of the matrix A
diag(A)      Diagonal matrix of the matrix A, i.e. (diag(A))_ij = δ_ij A_ij
eig(A)       Eigenvalues of the matrix A
vec(A)       The vector-version of the matrix A (see Sec. 10.2.2)
sup          Supremum of a set
||A||        Matrix norm (subscript if any denotes what norm)
A^T          Transposed matrix
A^{-T}       The inverse of the transposed and vice versa, A^{-T} = (A^{-1})^T = (A^T)^{-1}
A^*          Complex conjugated matrix
A^H          Transposed and complex conjugated matrix (Hermitian)
A ∘ B        Hadamard (elementwise) product
A ⊗ B        Kronecker product
0            The null matrix. Zero in all entries.
I            The identity matrix
J^{ij}       The single-entry matrix, 1 at (i,j) and zero elsewhere
Σ            A positive definite matrix
Λ            A diagonal matrix

1 Basics

(AB)^{-1} = B^{-1} A^{-1}   (1)
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}   (2)
(A^T)^{-1} = (A^{-1})^T   (3)
(A + B)^T = A^T + B^T   (4)
(AB)^T = B^T A^T   (5)
(ABC...)^T = ...C^T B^T A^T   (6)
(A^H)^{-1} = (A^{-1})^H   (7)
(A + B)^H = A^H + B^H   (8)
(AB)^H = B^H A^H   (9)
(ABC...)^H = ...C^H B^H A^H   (10)

1.1 Trace

Tr(A) = Σ_i A_ii   (11)
Tr(A) = Σ_i λ_i,  λ_i = eig(A)   (12)
Tr(A) = Tr(A^T)   (13)
Tr(AB) = Tr(BA)   (14)
Tr(A + B) = Tr(A) + Tr(B)   (15)
Tr(ABC) = Tr(BCA) = Tr(CAB)   (16)
a^T a = Tr(a a^T)   (17)

1.2 Determinant

Let A be an n×n matrix.

det(A) = Π_i λ_i,  λ_i = eig(A)   (18)
det(cA) = c^n det(A),  if A ∈ R^{n×n}   (19)
det(A^T) = det(A)   (20)
det(AB) = det(A) det(B)   (21)
det(A^{-1}) = 1/det(A)   (22)
det(A^n) = det(A)^n   (23)
det(I + u v^T) = 1 + u^T v   (24)

For n = 2:
det(I + A) = 1 + det(A) + Tr(A)   (25)

For n = 3:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 - (1/2)Tr(A^2)   (26)

For n = 4:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 - (1/2)Tr(A^2) + (1/6)Tr(A)^3 - (1/2)Tr(A)Tr(A^2) + (1/3)Tr(A^3)   (27)

For small ε, the following approximation holds:
det(I + εA) ≅ 1 + det(A) + ε Tr(A) + (1/2)ε^2 Tr(A)^2 - (1/2)ε^2 Tr(A^2)   (28)
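Many of the identities above are easy to verify numerically. The following is a minimal NumPy sketch (matrix size, seed and tolerances are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
u, v = rng.standard_normal(4), rng.standard_normal(4)

# (14): the trace is invariant under cyclic permutation
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# (12): the trace is the sum of the eigenvalues
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)
# (22): det(A^{-1}) = 1/det(A)
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))
# (24): det(I + u v^T) = 1 + u^T v
assert np.isclose(np.linalg.det(np.eye(4) + np.outer(u, v)), 1 + u @ v)
```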

1.3 The Special Case 2x2

Consider the matrix

A = [ A_11  A_12
      A_21  A_22 ]

Determinant and trace:

det(A) = A_11 A_22 - A_12 A_21   (29)
Tr(A) = A_11 + A_22   (30)

Eigenvalues:

λ^2 - λ Tr(A) + det(A) = 0
λ_1 = (Tr(A) + sqrt(Tr(A)^2 - 4 det(A)))/2
λ_2 = (Tr(A) - sqrt(Tr(A)^2 - 4 det(A)))/2
λ_1 + λ_2 = Tr(A)
λ_1 λ_2 = det(A)

Eigenvectors:

v_1 ∝ [ A_12, λ_1 - A_11 ]^T
v_2 ∝ [ A_12, λ_2 - A_11 ]^T

Inverse:

A^{-1} = (1/det(A)) [  A_22  -A_12
                      -A_21   A_11 ]   (31)
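The 2x2 relations lend themselves to a direct numeric check; a minimal NumPy sketch with an arbitrary example matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
tr, det = np.trace(A), np.linalg.det(A)

# Eigenvalues as the roots of lambda^2 - lambda*Tr(A) + det(A) = 0
disc = np.sqrt(tr ** 2 - 4 * det)
lam = np.array([(tr + disc) / 2, (tr - disc) / 2])
assert np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(A).real))

# (31): swap the diagonal, negate the off-diagonal, divide by det(A)
Ainv = np.array([[A[1, 1], -A[0, 1]],
                 [-A[1, 0], A[0, 0]]]) / det
assert np.allclose(Ainv, np.linalg.inv(A))
```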

2 Derivatives

This section covers differentiation of a number of expressions with respect to a matrix X. Note that it is always assumed that X has no special structure, i.e. that the elements of X are independent (e.g. not symmetric, Toeplitz, positive definite). See Section 2.8 for differentiation of structured matrices. The basic assumption can be written as

∂X_kl/∂X_ij = δ_ik δ_lj   (32)

that is, for the vector forms,

[∂x/∂y]_i = ∂x_i/∂y,  [∂x/∂y]_i = ∂x/∂y_i,  [∂x/∂y]_ij = ∂x_i/∂y_j

The following rules are general and very useful when deriving the differential of an expression ([19]):

∂A = 0  (A is a constant)   (33)
∂(αX) = α ∂X   (34)
∂(X + Y) = ∂X + ∂Y   (35)
∂(Tr(X)) = Tr(∂X)   (36)
∂(XY) = (∂X)Y + X(∂Y)   (37)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)   (38)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)   (39)
∂(X^{-1}) = -X^{-1}(∂X)X^{-1}   (40)
∂(det(X)) = Tr(adj(X) ∂X)   (41)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)   (42)
∂(ln(det(X))) = Tr(X^{-1} ∂X)   (43)
∂X^T = (∂X)^T   (44)
∂X^H = (∂X)^H   (45)

2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr[Y^{-1} ∂Y/∂x]   (46)

Σ_k ∂det(X)/∂X_ik X_jk = δ_ij det(X)   (47)

∂^2 det(Y)/∂x^2 = det(Y) [ Tr[Y^{-1} ∂(∂Y/∂x)/∂x] + Tr[Y^{-1} ∂Y/∂x] Tr[Y^{-1} ∂Y/∂x] - Tr[(Y^{-1} ∂Y/∂x)(Y^{-1} ∂Y/∂x)] ]   (48)

2.1.2 Linear forms

∂det(X)/∂X = det(X)(X^{-1})^T   (49)

Σ_k ∂det(X)/∂X_ik X_jk = δ_ij det(X)   (50)

∂det(AXB)/∂X = det(AXB)(X^{-1})^T = det(AXB)(X^T)^{-1}   (51)

2.1.3 Square forms

If X is square and invertible, then

∂det(X^T A X)/∂X = 2 det(X^T A X) X^{-T}   (52)

If X is not square but A is symmetric, then

∂det(X^T A X)/∂X = 2 det(X^T A X) A X (X^T A X)^{-1}   (53)

If X is not square and A is not symmetric, then

∂det(X^T A X)/∂X = det(X^T A X) (A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1})   (54)

2.1.4 Other nonlinear forms

Some special cases are (see [9, 7]):

∂ln det(X^T X)/∂X = 2(X^+)^T   (55)
∂ln det(X^T X)/∂X^+ = -2X^T   (56)
∂ln |det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}   (57)
∂det(X^k)/∂X = k det(X^k) X^{-T}   (58)
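Derivative identities like (49) can be checked against central finite differences; a minimal NumPy sketch (seed, step size and tolerance are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
eps = 1e-6

# (49): d det(X)/dX = det(X) (X^{-1})^T, checked entrywise
analytic = np.linalg.det(X) * np.linalg.inv(X).T
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(X)
        E[i, j] = eps
        numeric[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)
assert np.allclose(analytic, numeric, atol=1e-5)
```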

2.2 Derivatives of an Inverse

From [7] we have the basic identity

∂Y^{-1}/∂x = -Y^{-1} (∂Y/∂x) Y^{-1}   (59)

from which it follows

∂(X^{-1})_kl/∂X_ij = -(X^{-1})_ki (X^{-1})_jl   (60)
∂(a^T X^{-1} b)/∂X = -X^{-T} a b^T X^{-T}   (61)
∂det(X^{-1})/∂X = -det(X^{-1})(X^{-1})^T   (62)
∂Tr(A X^{-1} B)/∂X = -(X^{-1} B A X^{-1})^T   (63)
∂Tr((X + A)^{-1})/∂X = -((X + A)^{-1}(X + A)^{-1})^T   (64)

From [3] we have the following result: let A be an n×n invertible square matrix, let W be the inverse of A, and let J(A) be an n×n-variate function, differentiable with respect to A. Then the partial differentials of J with respect to A and W satisfy

∂J/∂A = -A^{-T} (∂J/∂W) A^{-T}

2.3 Derivatives of Eigenvalues

∂/∂X Σ eig(X) = ∂Tr(X)/∂X = I   (65)
∂/∂X Π eig(X) = ∂det(X)/∂X = det(X) X^{-T}   (66)

If A is real and symmetric, and λ_i and v_i are distinct eigenvalues and eigenvectors of A (see the eigenvalue decomposition in Sec. 5.2) with v_i^T v_i = 1, then [33]

∂λ_i = v_i^T ∂(A) v_i   (67)
∂v_i = (λ_i I - A)^+ ∂(A) v_i   (68)

2.4 Derivatives of Matrices, Vectors and Scalar Forms

2.4.1 First Order

∂x^T a/∂x = ∂a^T x/∂x = a   (69)
∂a^T X b/∂X = a b^T   (70)
∂a^T X^T b/∂X = b a^T   (71)
∂a^T X a/∂X = ∂a^T X^T a/∂X = a a^T   (72)
∂X/∂X_ij = J^{ij}   (73)
∂(XA)_ij/∂X_mn = δ_im (A)_nj = (J^{mn} A)_ij   (74)
∂(X^T A)_ij/∂X_mn = δ_in (A)_mj = (J^{nm} A)_ij   (75)

2.4.2 Second Order

∂/∂X_ij Σ_klmn X_kl X_mn = 2 Σ_kl X_kl   (76)
∂b^T X^T X c/∂X = X(b c^T + c b^T)   (77)
∂(Bx + b)^T C (Dx + d)/∂x = B^T C (Dx + d) + D^T C^T (Bx + b)   (78)
∂(X^T B X)_kl/∂X_ij = δ_lj (X^T B)_ki + δ_kj (BX)_il   (79)
∂(X^T B X)/∂X_ij = X^T B J^{ij} + J^{ji} B X,  (J^{ij})_kl = δ_ik δ_jl   (80)

See Sec. 9.7 for useful properties of the single-entry matrix J^{ij}.

∂x^T B x/∂x = (B + B^T) x   (81)
∂b^T X^T D X c/∂X = D^T X b c^T + D X c b^T   (82)
∂(Xb + c)^T D (Xb + c)/∂X = (D + D^T)(Xb + c) b^T   (83)

Assume W is symmetric, then

∂(x - As)^T W (x - As)/∂s = -2 A^T W (x - As)   (84)
∂(x - s)^T W (x - s)/∂x = 2 W (x - s)   (85)
∂(x - s)^T W (x - s)/∂s = -2 W (x - s)   (86)
∂(x - As)^T W (x - As)/∂x = 2 W (x - As)   (87)
∂(x - As)^T W (x - As)/∂A = -2 W (x - As) s^T   (88)

As a case with complex values the following holds:

∂(a - x^H b)^2/∂x = -2 b (a - x^H b)^*   (89)

This formula is also known from the LMS algorithm [14].
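The ubiquitous identity (81) admits the same kind of finite-difference check; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
x = rng.standard_normal(5)
eps = 1e-6

# (81): d(x^T B x)/dx = (B + B^T) x
analytic = (B + B.T) @ x
f = lambda x: x @ B @ x
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(5)])
assert np.allclose(analytic, numeric, atol=1e-5)
```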

2.4.3 Higher-order and non-linear

∂(X^n)_kl/∂X_ij = Σ_{r=0}^{n-1} (X^r J^{ij} X^{n-1-r})_kl   (90)

∂(a^T X^n b)/∂X = Σ_{r=0}^{n-1} (X^r)^T a b^T (X^{n-1-r})^T   (91)

∂(a^T (X^n)^T X^n b)/∂X = Σ_{r=0}^{n-1} [ X^{n-1-r} a b^T (X^n)^T X^r + (X^r)^T X^n a b^T (X^{n-1-r})^T ]   (92)

See B.1.3 for a proof.

Assume s and r are functions of x, i.e. s = s(x), r = r(x), and that A is a constant, then

∂(s^T A r)/∂x = [∂s/∂x]^T A r + [∂r/∂x]^T A^T s   (93)

∂/∂x ((Ax)^T (Ax))/((Bx)^T (Bx)) = ∂/∂x (x^T A^T A x)/(x^T B^T B x)   (94)
= 2 (A^T A x)/(x^T B^T B x) - 2 (x^T A^T A x)(B^T B x)/(x^T B^T B x)^2   (95)

2.4.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian of

f = x^T A x + b^T x   (96)
∇_x f = ∂f/∂x = (A + A^T) x + b   (97)
∂^2 f/∂x ∂x^T = A + A^T   (98)

2.5 Derivatives of Traces

Assume F(X) to be a differentiable function of each of the elements of X. It then holds that

∂Tr(F(X))/∂X = f(X)^T

where f(·) is the scalar derivative of F(·).

2.5.1 First Order

∂Tr(X)/∂X = I   (99)
∂Tr(XA)/∂X = A^T   (100)
∂Tr(AXB)/∂X = A^T B^T   (101)
∂Tr(AX^T B)/∂X = BA   (102)
∂Tr(X^T A)/∂X = A   (103)
∂Tr(AX^T)/∂X = A   (104)
∂Tr(A ⊗ X)/∂X = Tr(A) I   (105)

2.5.2 Second Order

See [7].

∂Tr(X^2)/∂X = 2X^T   (106)
∂Tr(X^2 B)/∂X = (XB + BX)^T   (107)
∂Tr(X^T B X)/∂X = BX + B^T X   (108)
∂Tr(B X X^T)/∂X = BX + B^T X   (109)
∂Tr(X X^T B)/∂X = BX + B^T X   (110)
∂Tr(X B X^T)/∂X = X B^T + X B   (111)
∂Tr(B X^T X)/∂X = X B^T + X B   (112)
∂Tr(X^T X B)/∂X = X B^T + X B   (113)
∂Tr(A X B X)/∂X = A^T X^T B^T + B^T X^T A^T   (114)
∂Tr(X^T X)/∂X = ∂Tr(X X^T)/∂X = 2X   (115)
∂Tr(B^T X^T C X B)/∂X = C^T X B B^T + C X B B^T   (116)
∂Tr[X^T B X C]/∂X = B X C + B^T X C^T   (117)
∂Tr(A X B X^T C)/∂X = A^T C^T X B^T + C A X B   (118)
∂Tr[(A X B + C)(A X B + C)^T]/∂X = 2 A^T (A X B + C) B^T   (119)

2.5.3 Higher Order

∂Tr(X ⊗ X)/∂X = ∂(Tr(X)Tr(X))/∂X = 2 Tr(X) I   (120)
∂Tr(X^k)/∂X = k (X^{k-1})^T   (121)
∂Tr(A X^k)/∂X = Σ_{r=0}^{k-1} (X^r A X^{k-r-1})^T   (122)
∂Tr[B^T X^T C X X^T C X B]/∂X = C X X^T C X B B^T + C^T X B B^T X^T C^T X + C X B B^T X^T C X + C^T X X^T C^T X B B^T   (123)
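As a representative of the second-order trace derivatives, (108) can be verified numerically; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))
eps = 1e-6

# (108): dTr(X^T B X)/dX = B X + B^T X
f = lambda X: np.trace(X.T @ B @ X)
analytic = B @ X + B.T @ X
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(3):
        E = np.zeros_like(X)
        E[i, j] = eps
        numeric[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
assert np.allclose(analytic, numeric, atol=1e-5)
```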

2.5.4 Other

∂Tr(A X^{-1} B)/∂X = -(X^{-1} B A X^{-1})^T = -X^{-T} A^T B^T X^{-T}   (124)

Assume B and C to be symmetric, then

∂Tr[(X^T C X)^{-1} A]/∂X = -(C X (X^T C X)^{-1})(A + A^T)(X^T C X)^{-1}   (125)
∂Tr[(X^T C X)^{-1} (X^T B X)]/∂X = -2 C X (X^T C X)^{-1} X^T B X (X^T C X)^{-1} + 2 B X (X^T C X)^{-1}   (126)
∂Tr[(A + X^T C X)^{-1} (X^T B X)]/∂X = -2 C X (A + X^T C X)^{-1} X^T B X (A + X^T C X)^{-1} + 2 B X (A + X^T C X)^{-1}   (127)

See [7].

∂Tr(sin(X))/∂X = cos(X)^T   (128)

2.6 Derivatives of vector norms

2.6.1 Two-norm

∂||x - a||_2/∂x = (x - a)/||x - a||_2   (129)

∂/∂x (x - a)/||x - a||_2 = I/||x - a||_2 - (x - a)(x - a)^T/||x - a||_2^3   (130)

∂||x||_2^2/∂x = ∂(x^T x)/∂x = 2x   (131)
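A quick central-difference check of the two-norm derivative (129); a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
x, a = rng.standard_normal(5), rng.standard_normal(5)
eps = 1e-6

# (129): d||x - a||_2 / dx = (x - a)/||x - a||_2
analytic = (x - a) / np.linalg.norm(x - a)
numeric = np.array([(np.linalg.norm(x + eps * e - a) -
                     np.linalg.norm(x - eps * e - a)) / (2 * eps)
                    for e in np.eye(5)])
assert np.allclose(analytic, numeric, atol=1e-5)
```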

2.7 Derivatives of matrix norms

For more on matrix norms, see Sec. 10.4.

2.7.1 Frobenius norm

∂||X||_F^2/∂X = ∂Tr(X X^H)/∂X = 2X   (132)

See (248). Note that this is also a special case of the result in equation (119).

2.8 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. is symmetric, Toeplitz, etc. In that case the derivatives of the previous sections do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):

df/dA_ij = Σ_kl (∂f/∂A_kl)(∂A_kl/∂A_ij) = Tr[ (∂f/∂A)^T ∂A/∂A_ij ]   (133)

The matrix differentiated with respect to itself is in this document referred to as the structure matrix of A and is defined simply by

∂A/∂A_ij = S^{ij}   (134)

If A has no special structure we have simply S^{ij} = J^{ij}, that is, the structure matrix is simply the single-entry matrix. Many structures have a representation in single-entry matrices; see Sec. 9.7.6 for more examples of structure matrices.

2.8.1 The Chain Rule

Sometimes the objective is to find the derivative of a matrix which is a function of another matrix. Let U = f(X); the goal is to find the derivative of the function g(U) with respect to X:

∂g(U)/∂X = ∂g(f(X))/∂X   (135)

The chain rule can then be written the following way:

∂g(U)/∂x_ij = Σ_{k=1}^{M} Σ_{l=1}^{N} (∂g(U)/∂u_kl)(∂u_kl/∂x_ij)   (136)

Using matrix notation, this can be written as:

∂g(U)/∂X_ij = Tr[ (∂g(U)/∂U)^T ∂U/∂X_ij ]   (137)

2.8.2 Symmetric

If A is symmetric, then S^{ij} = J^{ij} + J^{ji} - J^{ij} J^{ij} and therefore

df/dA = [∂f/∂A] + [∂f/∂A]^T - diag[∂f/∂A]   (138)

That is, e.g. ([5]):

∂Tr(AX)/∂X = A + A^T - (A ∘ I),  see (142)   (139)
∂det(X)/∂X = det(X)(2X^{-1} - (X^{-1} ∘ I))   (140)
∂ln det(X)/∂X = 2X^{-1} - (X^{-1} ∘ I)   (141)

2.8.3 Diagonal

If X is diagonal, then ([19]):

∂Tr(AX)/∂X = A ∘ I   (142)
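The symmetric case (139) can be checked numerically by perturbing the two matched off-diagonal entries together, which is exactly what the structure matrix S^{ij} encodes; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))
X = X + X.T          # a symmetric evaluation point
eps = 1e-6

# (139): for symmetric X, dTr(AX)/dX = A + A^T - A o I
analytic = A + A.T - np.diag(np.diag(A))
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(i + 1):
        E = np.zeros_like(X)
        E[i, j] = E[j, i] = eps   # keep the perturbation symmetric
        d = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * eps)
        numeric[i, j] = numeric[j, i] = d
assert np.allclose(analytic, numeric, atol=1e-5)
```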

2.8.4 Toeplitz

Like symmetric and diagonal matrices, Toeplitz matrices have a special structure which should be taken into account when taking the derivative with respect to a matrix with Toeplitz structure:

∂Tr(AT)/∂T = ∂Tr(TA)/∂T

= [ Tr(A)            Tr([A^T]_n1)     ...              A_n1
    Tr([A^T]_1n)     Tr(A)            ...              ...
    ...              ...              ...              Tr([A^T]_n1)
    A_1n             ...              Tr([A^T]_1n)     Tr(A)        ]

≡ α(A)   (143)

As can be seen, the derivative α(A) also has a Toeplitz structure: each value on the main diagonal is the sum of the diagonal values of A, and the values on the diagonals next to the main diagonal equal the sum of the corresponding off-diagonal of A^T (here [·]_ij denotes the submatrix with the i-th row and j-th column deleted, applied repeatedly for diagonals further out; the corners are the single entries A_n1 and A_1n). This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix also is symmetric, the same derivative yields

∂Tr(AT)/∂T = ∂Tr(TA)/∂T = α(A) + α(A)^T - α(A) ∘ I   (144)

3 Inverses

3.1 Basic

3.1.1 Definition

The inverse A^{-1} of a matrix A ∈ C^{n×n} is defined such that

A A^{-1} = A^{-1} A = I   (145)

where I is the n×n identity matrix. If A^{-1} exists, A is said to be nonsingular. Otherwise, A is said to be singular (see e.g. [1]).

3.1.2 Cofactors and Adjoint

The submatrix of a matrix A, denoted by [A]_ij, is the (n-1)×(n-1) matrix obtained by deleting the i-th row and the j-th column of A. The (i,j) cofactor of a matrix is defined as

cof(A, i, j) = (-1)^{i+j} det([A]_ij)   (146)

The matrix of cofactors can be created from the cofactors:

cof(A) = [ cof(A,1,1)  ...  cof(A,1,n)
           ...   cof(A,i,j)   ...
           cof(A,n,1)  ...  cof(A,n,n) ]   (147)

The adjoint matrix is the transpose of the cofactor matrix:

adj(A) = (cof(A))^T   (148)

3.1.3 Determinant

The determinant of a matrix A ∈ C^{n×n} is defined as (see [1])

det(A) = Σ_{j=1}^{n} (-1)^{j+1} A_1j det([A]_1j)   (149)
       = Σ_{j=1}^{n} A_1j cof(A, 1, j)   (150)

3.1.4 Construction

The inverse matrix can be constructed, using the adjoint matrix, by

A^{-1} = (1/det(A)) adj(A)   (151)

For the case of 2x2 matrices, see Section 1.3.
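The cofactor construction (146)-(151) translates directly into code. A minimal NumPy sketch, for illustration only (computing n^2 determinants is far more expensive than a factorization-based inverse):

```python
import numpy as np

def cof(A, i, j):
    """(146): the (i,j) cofactor, (-1)^(i+j) times the minor det([A]_ij)."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def inv_via_adjoint(A):
    """(151): A^{-1} = adj(A)/det(A), with adj(A) = cof(A)^T from (148)."""
    n = A.shape[0]
    C = np.array([[cof(A, i, j) for j in range(n)] for i in range(n)])
    return C.T / np.linalg.det(A)

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.allclose(inv_via_adjoint(A), np.linalg.inv(A))
```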

3.1.5 Condition number

The condition number c(A) of a matrix is the ratio between the largest and the smallest singular value of the matrix (see Section 5.3 on singular values):

c(A) = d_+/d_-   (152)

The condition number can be used to measure how singular a matrix is: if the condition number is large, the matrix is nearly singular. The condition number can also be estimated from the matrix norms. Here

c(A) = ||A|| ||A^{-1}||   (153)

where ||·|| is a norm such as e.g. the 1-norm, the 2-norm, the ∞-norm or the Frobenius norm (see Sec. 10.4 for more on matrix norms).

The 2-norm of A equals sqrt(max(eig(A^H A))) ([12, p. 57]). For a symmetric matrix, this reduces to ||A||_2 = max(|eig(A)|) ([12, p. 394]). If the matrix is symmetric and positive definite, ||A||_2 = max(eig(A)). The condition number based on the 2-norm thus reduces to

||A||_2 ||A^{-1}||_2 = max(eig(A)) max(eig(A^{-1})) = max(eig(A))/min(eig(A))   (154)

3.2 Exact Relations

3.2.1 Basic

(AB)^{-1} = B^{-1} A^{-1}   (155)

3.2.2 The Woodbury identity

The Woodbury identity comes in many variants. The latter of the two below can be found in [1]:

(A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}   (156)
(A + U B V)^{-1} = A^{-1} - A^{-1} U (B^{-1} + V A^{-1} U)^{-1} V A^{-1}   (157)

If P, R are positive definite, then (see [30])

(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}   (158)

3.2.3 The Kailath Variant

(A + BC)^{-1} = A^{-1} - A^{-1} B (I + C A^{-1} B)^{-1} C A^{-1}   (159)

See [4, page 153].

3.2.4 Sherman-Morrison

(A + b c^T)^{-1} = A^{-1} - (A^{-1} b c^T A^{-1})/(1 + c^T A^{-1} b)   (160)
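Sherman-Morrison (160) is easily confirmed numerically; a minimal NumPy sketch (the shift 5I merely keeps A safely invertible):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)
b, c = rng.standard_normal(5), rng.standard_normal(5)

# (160): (A + b c^T)^{-1} = A^{-1} - A^{-1} b c^T A^{-1} / (1 + c^T A^{-1} b)
Ainv = np.linalg.inv(A)
sm = Ainv - (Ainv @ np.outer(b, c) @ Ainv) / (1 + c @ Ainv @ b)
assert np.allclose(sm, np.linalg.inv(A + np.outer(b, c)))
```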

3.2.5 The Searle Set of Identities

The following set of identities can be found in [5, page 151]:

(I + A^{-1})^{-1} = A (A + I)^{-1}   (161)
(A + B B^T)^{-1} B = A^{-1} B (I + B^T A^{-1} B)^{-1}   (162)
(A^{-1} + B^{-1})^{-1} = A (A + B)^{-1} B = B (A + B)^{-1} A   (163)
A - A (A + B)^{-1} A = B - B (A + B)^{-1} B   (164)
A^{-1} + B^{-1} = A^{-1} (A + B) B^{-1}   (165)
(I + AB)^{-1} = I - A (I + BA)^{-1} B   (166)
(I + AB)^{-1} A = A (I + BA)^{-1}   (167)

3.2.6 Rank-1 update of inverse of inner product

Denote A = (X^T X)^{-1} and let X be extended to include a new column vector at the end, X~ = [X v]. Then [34]

(X~^T X~)^{-1} = [ A + (A X^T v v^T X A^T)/γ    -(A X^T v)/γ
                   -(v^T X A^T)/γ               1/γ          ]

where γ = v^T v - v^T X A X^T v.

3.2.7 Rank-1 update of Moore-Penrose Inverse

The following is a rank-1 update for the Moore-Penrose pseudo-inverse of real valued matrices; a proof can be found in [18]. The matrix G is defined below:

(A + c d^T)^+ = A^+ + G   (168)

Using the notation

β = 1 + d^T A^+ c   (169)
v = A^+ c   (170)
n = (A^+)^T d   (171)
w = (I - A A^+) c   (172)
m = (I - A^+ A)^T d   (173)

the solution is given as six different cases, depending on the entities ||w||, ||m||, and β. Please note that for any (column) vector v it holds that v^+ = v^T (v^T v)^{-1} = v^T/||v||^2. The solution is:

Case 1 of 6: If ||w|| ≠ 0 and ||m|| ≠ 0. Then

G = -v w^+ - (m^+)^T n^T + β (m^+)^T w^+   (174)
  = -(1/||w||^2) v w^T - (1/||m||^2) m n^T + (β/(||m||^2 ||w||^2)) m w^T   (175)

Case 2 of 6: If ||w|| = 0 and ||m|| ≠ 0 and β = 0. Then

G = -v v^+ A^+ - (m^+)^T n^T   (176)
  = -(1/||v||^2) v v^T A^+ - (1/||m||^2) m n^T   (177)

Case 3 of 6: If ||w|| = 0 and β ≠ 0. Then

G = (1/β) m v^T A^+ - (β/(||v||^2 ||m||^2 + |β|^2)) ((||v||^2/β) m + v) ((||m||^2/β) (A^+)^T v + n)^T   (178)

Case 4 of 6: If ||w|| ≠ 0 and ||m|| = 0 and β = 0. Then

G = -A^+ n n^+ - v w^+   (179)
  = -(1/||n||^2) A^+ n n^T - (1/||w||^2) v w^T   (180)

Case 5 of 6: If ||m|| = 0 and β ≠ 0. Then

G = (1/β) A^+ n w^T - (β/(||n||^2 ||w||^2 + |β|^2)) ((||w||^2/β) A^+ n + v) ((||n||^2/β) w + n)^T   (181)

Case 6 of 6: If ||w|| = 0 and ||m|| = 0 and β = 0. Then

G = -v v^+ A^+ - A^+ n n^+ + v^+ A^+ n v n^+   (182)
  = -(1/||v||^2) v v^T A^+ - (1/||n||^2) A^+ n n^T + ((v^T A^+ n)/(||v||^2 ||n||^2)) v n^T   (183)

3.3 Implication on Inverses

If  (A + B)^{-1} = A^{-1} + B^{-1}  then  A B^{-1} A = B A^{-1} B   (184)

See [5].

3.3.1 A PosDef identity

Assume P, R to be positive definite and invertible, then

(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}   (185)

See [30].

3.4 Approximations

The following identity is known as the Neumann series of a matrix, which holds when |λ_i| < 1 for all eigenvalues λ_i:

(I - A)^{-1} = Σ_{n=0}^{∞} A^n   (186)

which is equivalent to

(I + A)^{-1} = Σ_{n=0}^{∞} (-1)^n A^n   (187)

When |λ_i| < 1 for all eigenvalues λ_i, it holds that A^n → 0 for n → ∞, and the following approximations hold:

(I - A)^{-1} ≅ I + A + A^2   (188)
(I + A)^{-1} ≅ I - A + A^2   (189)
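The truncated Neumann series (188) can be tried out directly, scaling A so the eigenvalue condition holds; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
A *= 0.1 / np.linalg.norm(A, 2)   # force ||A||_2 = 0.1, so all |eig| < 1

# (186)/(188): (I - A)^{-1} = sum_n A^n, truncated at second order
exact = np.linalg.inv(np.eye(4) - A)
approx = np.eye(4) + A + A @ A
assert np.allclose(exact, approx, atol=1e-2)   # error is O(||A||^3)
```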

The following approximation holds when A is large and symmetric:

A - A (I + A)^{-1} A ≅ I - A^{-1}   (190)

If σ^2 is small compared to Q and M, then

(Q + σ^2 M)^{-1} ≅ Q^{-1} - σ^2 Q^{-1} M Q^{-1}   (191)

Proof:

(Q + σ^2 M)^{-1}   (192)
= (Q Q^{-1} Q + σ^2 M Q^{-1} Q)^{-1}   (193)
= ((I + σ^2 M Q^{-1}) Q)^{-1}   (194)
= Q^{-1} (I + σ^2 M Q^{-1})^{-1}   (195)

This can be rewritten using the Taylor expansion:

Q^{-1} (I + σ^2 M Q^{-1})^{-1}   (196)
= Q^{-1} (I - σ^2 M Q^{-1} + (σ^2 M Q^{-1})^2 - ...) ≅ Q^{-1} - σ^2 Q^{-1} M Q^{-1}   (197)
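A numeric check of the perturbation expansion (191); a minimal NumPy sketch (Q is shifted to keep it well conditioned, and sigma^2 is an arbitrary small value):

```python
import numpy as np

rng = np.random.default_rng(8)
Q = rng.standard_normal((4, 4)) + 4 * np.eye(4)
M = rng.standard_normal((4, 4))
sigma2 = 1e-3

# (191): (Q + sigma^2 M)^{-1} ~ Q^{-1} - sigma^2 Q^{-1} M Q^{-1}
Qinv = np.linalg.inv(Q)
approx = Qinv - sigma2 * Qinv @ M @ Qinv
exact = np.linalg.inv(Q + sigma2 * M)
assert np.allclose(exact, approx, atol=1e-4)   # residual is O(sigma^4)
```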

3.5 Generalized Inverse

3.5.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^- such that (see [6])

A A^- A = A   (198)

The matrix A^- is not unique.

3.6 Pseudo Inverse

3.6.1 Definition

The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils

I.   A A^+ A = A
II.  A^+ A A^+ = A^+
III. A A^+ is symmetric
IV.  A^+ A is symmetric

The matrix A^+ is unique and always exists. Note that in the case of complex matrices, the symmetry conditions are substituted by conditions of being Hermitian.

3.6.2 Properties

Assume A^+ to be the pseudo-inverse of A; then (see [3] for some of them):

(A^+)^+ = A   (199)
(A^T)^+ = (A^+)^T   (200)
(A^H)^+ = (A^+)^H   (201)
(A^*)^+ = (A^+)^*   (202)
(A^+ A) A^H = A^H   (203)
(A^+ A) A^T ≠ A^T   (204)
(cA)^+ = (1/c) A^+   (205)
A^+ = (A^T A)^+ A^T   (206)
A^+ = A^T (A A^T)^+   (207)
(A^T A)^+ = A^+ (A^T)^+   (208)
(A A^T)^+ = (A^T)^+ A^+   (209)
A^+ = (A^H A)^+ A^H   (210)
A^+ = A^H (A A^H)^+   (211)
(A^H A)^+ = A^+ (A^H)^+   (212)
(A A^H)^+ = (A^H)^+ A^+   (213)
(AB)^+ = (A^+ A B)^+ (A B B^+)^+   (214)
f(A^H A) - f(0) I = A^+ [f(A A^H) - f(0) I] A   (215)
f(A A^H) - f(0) I = A [f(A^H A) - f(0) I] A^+   (216)

where A ∈ C^{n×m}.

Assume A to have full rank. Then

(A A^+)(A A^+) = A A^+   (217)
(A^+ A)(A^+ A) = A^+ A   (218)
Tr(A A^+) = rank(A A^+)   (see [6])   (219)
Tr(A^+ A) = rank(A^+ A)   (see [6])   (220)

For two matrices it holds that

(AB)^+ = (A^+ A B)^+ (A B B^+)^+   (221)
(A ⊗ B)^+ = A^+ ⊗ B^+   (222)

3.6.3 Construction

Assume that A has full rank. Then

A n×n  square  rank(A) = n  =>  A^+ = A^{-1}
A n×m  broad   rank(A) = n  =>  A^+ = A^T (A A^T)^{-1}
A n×m  tall    rank(A) = m  =>  A^+ = (A^T A)^{-1} A^T

The so-called broad version is also known as the right inverse, and the tall version as the left inverse.
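numpy.linalg.pinv satisfies the four defining conditions I-IV, and in the tall full-rank case it coincides with the left inverse from the table above; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3))          # tall, full column rank (almost surely)
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)        # I
assert np.allclose(Ap @ A @ Ap, Ap)      # II
assert np.allclose(A @ Ap, (A @ Ap).T)   # III: A A^+ symmetric
assert np.allclose(Ap @ A, (Ap @ A).T)   # IV: A^+ A symmetric

# Tall case: A^+ = (A^T A)^{-1} A^T, the left inverse
assert np.allclose(Ap, np.linalg.inv(A.T @ A) @ A.T)
assert np.allclose(Ap @ A, np.eye(3))
```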

Assume A does not have full rank, i.e. A is n×m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = U D V^T by

A^+ = V_r D_r^{-1} U_r^T   (223)

where U_r, D_r, and V_r are the matrices with the degenerate rows and columns deleted. A different way is this: there always exist two matrices C (n×r) and D (r×m) of rank r such that A = CD. Using these matrices, it holds that

A^+ = D^T (D D^T)^{-1} (C^T C)^{-1} C^T   (224)

See [3].
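Both rank-deficient constructions can be reproduced directly; a minimal NumPy sketch (the rank threshold 1e-10 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(10)
C = rng.standard_normal((5, 2))
D = rng.standard_normal((2, 4))
A = C @ D                                 # a 5x4 matrix of rank 2

# (223): A^+ = V_r D_r^{-1} U_r^T, dropping the zero singular values
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ap = Vt[:r].T @ np.diag(1 / s[:r]) @ U[:, :r].T
assert np.allclose(Ap, np.linalg.pinv(A))

# (224): the full-rank factorization A = CD gives the same matrix
Ap2 = D.T @ np.linalg.inv(D @ D.T) @ np.linalg.inv(C.T @ C) @ C.T
assert np.allclose(Ap2, Ap)
```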

4 Complex Matrices

The complex scalar product r = pq can be written as

[ ℜr ]   [ ℜp  -ℑp ] [ ℜq ]
[ ℑr ] = [ ℑp   ℜp ] [ ℑq ]   (225)
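A one-line check of (225) with arbitrary numbers (minimal NumPy sketch):

```python
import numpy as np

p, q = 2 + 3j, -1 + 0.5j
r = p * q

# (225): multiplication by p acts on [Re q, Im q] as a 2x2 real matrix
P = np.array([[p.real, -p.imag],
              [p.imag,  p.real]])
assert np.allclose(P @ [q.real, q.imag], [r.real, r.imag])
```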

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied ([7]):

df(z)/dz = ∂ℜ(f(z))/∂ℜz + i ∂ℑ(f(z))/∂ℜz   (226)

and

df(z)/dz = -i ∂ℜ(f(z))/∂ℑz + ∂ℑ(f(z))/∂ℑz   (227)

or in a more compact form:

∂f(z)/∂ℑz = i ∂f(z)/∂ℜz   (228)

A complex function that satisfies the Cauchy-Riemann equations for points in a region R is said to be analytic in this region R. In general, expressions involving a complex conjugate or conjugate transpose do not satisfy the Cauchy-Riemann equations. In order to avoid this problem, a more generalized definition of complex derivative is used ([4], [6]):

Generalized Complex Derivative:

df(z)/dz = (1/2)(∂f(z)/∂ℜz - i ∂f(z)/∂ℑz)   (229)

Conjugate Complex Derivative:

df(z)/dz^* = (1/2)(∂f(z)/∂ℜz + i ∂f(z)/∂ℑz)   (230)

The Generalized Complex Derivative equals the normal derivative when f is an analytic function. For a non-analytic function such as f(z) = z^*, the derivative equals zero. The Conjugate Complex Derivative equals zero when f is an analytic function. The Conjugate Complex Derivative has e.g. been used by [1] when deriving a complex gradient. Notice:

df(z)/dz ≠ ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz   (231)

Complex Gradient Vector: If f is a real function of a complex vector z, then the complex gradient vector is given by ([14, p. 798])

∇f(z) = 2 df(z)/dz^* = ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz   (232)

Complex Gradient Matrix: If f is a real function of a complex matrix Z, then the complex gradient matrix is given by ([2])

∇f(Z) = 2 df(Z)/dZ^* = ∂f(Z)/∂ℜZ + i ∂f(Z)/∂ℑZ   (233)

These expressions can be used for gradient descent algorithms.

4.1.1 The Chain Rule for complex numbers

The chain rule is a little more complicated when the function of a complex u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied ([7]):

∂g(u)/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂u^*)(∂u^*/∂x)
         = (∂g/∂u)(∂u/∂x) + (∂g^*/∂u)^* (∂u^*/∂x)   (234)

Notice, if the function is analytic, the second term reduces to zero, and the expression reduces to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:

∂g(U)/∂X = Tr[(∂g(U)/∂U)^T ∂U/∂X] + Tr[(∂g(U)/∂U^*)^T ∂U^*/∂X]   (235)

4.1.2 Complex Derivatives of Traces

If the derivatives involve complex numbers, the conjugate transpose is often involved. The most useful way to show a complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:

∂Tr(X^*)/∂ℜX = ∂Tr(X^H)/∂ℜX = I   (236)
i ∂Tr(X^*)/∂ℑX = i ∂Tr(X^H)/∂ℑX = I   (237)

Since the two results have the same sign, the conjugate complex derivative (230) should be used.

∂Tr(X)/∂ℜX = ∂Tr(X^T)/∂ℜX = I   (238)
i ∂Tr(X)/∂ℑX = i ∂Tr(X^T)/∂ℑX = -I   (239)

Here, the two results have different signs, and the generalized complex derivative (229) should be used. Hereby, it can be seen that (100) holds even if X is a complex number.

∂Tr(A X^H)/∂ℜX = A   (240)
i ∂Tr(A X^H)/∂ℑX = A   (241)
∂Tr(A X^*)/∂ℜX = A^T   (242)
i ∂Tr(A X^*)/∂ℑX = A^T   (243)
∂Tr(X X^H)/∂ℜX = ∂Tr(X^H X)/∂ℜX = 2 ℜX   (244)
i ∂Tr(X X^H)/∂ℑX = i ∂Tr(X^H X)/∂ℑX = i 2 ℑX   (245)

By inserting (244) and (245) in (229) and (230), it can be seen that

∂Tr(X X^H)/∂X = X^*   (246)
∂Tr(X X^H)/∂X^* = X   (247)

Since the function Tr(X X^H) is a real function of the complex matrix X, the complex gradient matrix (233) is given by

∇Tr(X X^H) = 2 ∂Tr(X X^H)/∂X^* = 2X   (248)

4.1.3 Complex Derivative Involving Determinants

Here, a calculation example is provided. The objective is to find the derivative of det(X^H A X) with respect to X ∈ C^{m×n}. The derivative is found with respect to the real part and the imaginary part of X, by use of (242) and (237). det(X^H A X) can be calculated as (see App. B.1.4 for details)

∂det(X^H A X)/∂X = (1/2)(∂det(X^H A X)/∂ℜX - i ∂det(X^H A X)/∂ℑX)
                 = det(X^H A X) ((X^H A X)^{-1} X^H A)^T   (249)

and the complex conjugate derivative yields

∂det(X^H A X)/∂X^* = (1/2)(∂det(X^H A X)/∂ℜX + i ∂det(X^H A X)/∂ℑX)
                   = det(X^H A X) A X (X^H A X)^{-1}   (250)
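The complex gradient result (248) can be verified by differencing with respect to the real and imaginary parts separately, exactly as in the definitions above; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(11)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
eps = 1e-6

f = lambda X: np.trace(X @ X.conj().T).real
# (248): grad Tr(X X^H) = df/d(Re X) + i df/d(Im X) = 2X
numeric = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3))
        E[i, j] = eps
        d_re = (f(X + E) - f(X - E)) / (2 * eps)
        d_im = (f(X + 1j * E) - f(X - 1j * E)) / (2 * eps)
        numeric[i, j] = d_re + 1j * d_im
assert np.allclose(numeric, 2 * X, atol=1e-5)
```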

4.2 Higher order and non-linear derivatives

∂/∂x ((Ax)^H (Ax))/((Bx)^H (Bx)) = ∂/∂x (x^H A^H A x)/(x^H B^H B x)   (251)
= 2 (A^H A x)/(x^H B^H B x) - 2 (x^H A^H A x)(B^H B x)/(x^H B^H B x)^2   (252)