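MATRICES, PARTIAL DERIVATIVES AND THE CHAIN RULE

STEFAN GESCHKE

1. The dot-product in higher dimensions

The definition of the dot-product can be easily extended to dimensions $> 3$.

Definition 1.1. If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are vectors in $\mathbb{R}^n$, then the dot-product $x \cdot y$ is defined by
\[
x \cdot y = (x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + \dots + x_n y_n .
\]
Note that the dot-product of two vectors is a real number.

Example 1.2. We compute the dot-product of two vectors in $\mathbb{R}^4$:
\[
(1, 2, 3, 4) \cdot (1, 2, 0, -1) = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 0 + 4 \cdot (-1) = 1 + 4 + 0 - 4 = 1 .
\]

The dot-product in dimension $n$ behaves as well as in dimension 3.

Theorem 1.3. Let $x, y, z \in \mathbb{R}^n$ and let $\lambda \in \mathbb{R}$. Then the following hold:
(1) $x \cdot y = y \cdot x$
(2) $x \cdot (y + z) = x \cdot y + x \cdot z$
(3) $(x + y) \cdot z = x \cdot z + y \cdot z$
(4) $(\lambda x) \cdot y = x \cdot (\lambda y) = \lambda (x \cdot y)$
(5) $0 \cdot x = x \cdot 0 = 0$

2. Matrices

Definition 2.1. Let $m$ and $n$ be natural numbers (positive integers). An $m$-by-$n$ matrix is an array
\[
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}
\]
of real numbers with $m$ rows and $n$ columns. Each entry has two indices, the first denoting the row and the second the column. The matrix above is often denoted by $(a_{ij})_{1 \le i \le m,\, 1 \le j \le n}$, or just $(a_{ij})$ if $m$ and $n$ are clear from the context.

If $A = (a_{ij})_{1 \le i \le m,\, 1 \le j \le n}$ and $B = (b_{ij})_{1 \le i \le m,\, 1 \le j \le n}$ are matrices of the same format, then their sum $A + B$ is defined componentwise, i.e.,
\[
A + B = (a_{ij} + b_{ij})_{1 \le i \le m,\, 1 \le j \le n} .
\]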
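The two definitions so far translate almost word for word into code. The following is a minimal sketch, assuming vectors are stored as Python lists and matrices as lists of rows; the helper names `dot` and `mat_add` are illustrative and not part of these notes. It recomputes the dot-product $(1, 2, 3, 4) \cdot (1, 2, 0, -1)$ of Example 1.2 and adds two small matrices chosen only for illustration.

```python
def dot(x, y):
    """Dot-product of two vectors of equal length (Definition 1.1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def mat_add(A, B):
    """Componentwise sum of two matrices of the same format (Definition 2.1).

    Matrices are represented as lists of rows; this is an assumption of
    the sketch, not a convention from the notes.
    """
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same format")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# The vectors of Example 1.2.
example_dot = dot([1, 2, 3, 4], [1, 2, 0, -1])

# Two illustrative 2-by-2 matrices (not taken from the notes).
example_sum = mat_add([[1, 2], [3, 4]], [[10, 20], [30, 40]])

print(example_dot)  # 1
print(example_sum)  # [[11, 22], [33, 44]]
```

Both helpers reject inputs of mismatched format, mirroring the requirement that the dot-product and the sum are only defined for operands of the same shape.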
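Example 2.2. (a) The vector $(1, 2, 3)$ is a 1-by-3 matrix. The array
\[
\begin{pmatrix} 1.5 & 2 \\ \pi & 4 \\ 5 & e \end{pmatrix}
\]
is a 3-by-2 matrix.

(b)
\[
\begin{pmatrix} 1.5 & 2 \\ \pi & 4 \\ 5 & e \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & e \end{pmatrix}
=
\begin{pmatrix} 1 + 1.5 & 2 + 0 \\ \pi + 0 & 4 + 1 \\ 5 + 1 & e + e \end{pmatrix}
=
\begin{pmatrix} 2.5 & 2 \\ \pi & 5 \\ 6 & 2e \end{pmatrix}
\]

Using the dot-product, we can define products of matrices of suitable formats.

Definition 2.3. Let $l, m, n$ be natural numbers. If $A = (a_{ij})_{1 \le i \le l,\, 1 \le j \le m}$ and $B = (b_{jk})_{1 \le j \le m,\, 1 \le k \le n}$ are matrices, then the product $A \cdot B$ is defined to be the $l$-by-$n$ matrix $C = (c_{ik})_{1 \le i \le l,\, 1 \le k \le n}$ where
\[
c_{ik} = \sum_{j=1}^{m} a_{ij} b_{jk} = a_{i1} b_{1k} + \dots + a_{im} b_{mk} .
\]
In other words, $A \cdot B$ is the matrix whose entry in the $i$-th row and $k$-th column is the dot-product of the $i$-th row of $A$ and the $k$-th column of $B$.

Note that the product $A \cdot B$ can only be formed if $A$ has as many columns as $B$ has rows. Moreover, if $A$ is a 1-by-$n$ matrix $\begin{pmatrix} a_1 & a_2 & \dots & a_n \end{pmatrix}$ and $B$ is an $n$-by-1 matrix
\[
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
\]
then $A \cdot B$ is simply the dot-product $(a_1, \dots, a_n) \cdot (b_1, \dots, b_n)$.

Example 2.4. (a)
\[
\begin{pmatrix} e & \pi \\ 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}
=
\begin{pmatrix} e \cdot 0 + \pi \cdot 2 & e \cdot 1 + \pi \cdot 1 \\ 0 \cdot 0 + 1 \cdot 2 & 0 \cdot 1 + 1 \cdot 1 \end{pmatrix}
=
\begin{pmatrix} 2\pi & e + \pi \\ 2 & 1 \end{pmatrix}
\]

(b)
\[
\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ 2e & 2\pi + 1 \end{pmatrix}
\]
Together with (a), this shows that matrix multiplication is not commutative.

(c)
\[
\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} e + 2 & \pi + 3 \end{pmatrix}
\]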
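(d)
\[
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} e + 2 & \pi + 3 \\ e & \pi \\ 1 & 1 \end{pmatrix}
\]

Theorem 2.5. Let $A$ and $B$ be $l$-by-$m$ matrices and let $C$ and $D$ be $m$-by-$n$ matrices. Then
\[
A \cdot (C + D) = A \cdot C + A \cdot D
\quad \text{and} \quad
(A + B) \cdot C = A \cdot C + B \cdot C .
\]
It follows that
\[
(A + B) \cdot (C + D) = A \cdot C + A \cdot D + B \cdot C + B \cdot D .
\]

3. Derivatives

Definition 3.1. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function and let $f_1, \dots, f_m : \mathbb{R}^n \to \mathbb{R}$ be its component functions (coordinate functions). We assume that all the partial derivatives $\frac{\partial f_i}{\partial x_j}$, $1 \le i \le m$, $1 \le j \le n$, exist and are continuous, at least on some open region $U \subseteq \mathbb{R}^n$. Then for each $(a_1, \dots, a_n) \in U$ we define the derivative of $f$ at $(a_1, \dots, a_n)$ to be the $m$-by-$n$ matrix
\[
Df(a_1, \dots, a_n) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_1}{\partial x_2}(a_1, \dots, a_n) & \dots & \frac{\partial f_1}{\partial x_n}(a_1, \dots, a_n) \\
\frac{\partial f_2}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_2}{\partial x_2}(a_1, \dots, a_n) & \dots & \frac{\partial f_2}{\partial x_n}(a_1, \dots, a_n) \\
\vdots & \vdots & & \vdots \\
\frac{\partial f_m}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_m}{\partial x_2}(a_1, \dots, a_n) & \dots & \frac{\partial f_m}{\partial x_n}(a_1, \dots, a_n)
\end{pmatrix} .
\]

Example 3.2. (a) Let $f : \mathbb{R} \to \mathbb{R}^2$ be defined by $f(t) = (\cos t, \sin t)$. Then for each $a \in \mathbb{R}$ we have
\[
Df(a) =
\begin{pmatrix} \frac{\partial \cos t}{\partial t}(a) \\ \frac{\partial \sin t}{\partial t}(a) \end{pmatrix}
=
\begin{pmatrix} -\sin a \\ \cos a \end{pmatrix} .
\]
Note that this differs in notation from the previously defined $f'(a) = (-\sin a, \cos a)$. For the chain rule that we will discuss below, it is however important to pay attention to the fact that $Df(a)$ is a 2-by-1 matrix, i.e., a vector written vertically, as opposed to a 1-by-2 matrix, i.e., a vector written horizontally.

(b) Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined by $f(x, y, z) = x^2 + y^2 + z^2$. Then for all $(a, b, c) \in \mathbb{R}^3$,
\[
Df(a, b, c) =
\begin{pmatrix} \frac{\partial f}{\partial x}(a, b, c) & \frac{\partial f}{\partial y}(a, b, c) & \frac{\partial f}{\partial z}(a, b, c) \end{pmatrix}
=
\begin{pmatrix} 2a & 2b & 2c \end{pmatrix} .
\]

(c) Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by
\[
f(x, y) =
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} x + 2y \\ y \end{pmatrix} .
\]
The component functions are $f_1(x, y) = x + 2y$ and $f_2(x, y) = y$. Now for all $(a, b) \in \mathbb{R}^2$,
\[
Df(a, b) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x}(a, b) & \frac{\partial f_1}{\partial y}(a, b) \\
\frac{\partial f_2}{\partial x}(a, b) & \frac{\partial f_2}{\partial y}(a, b)
\end{pmatrix}
=
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} .
\]
What do you observe?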
(d) In the previous examples we considered functions of the form $f(x_1, \dots, x_n)$ and computed the derivative at a point $(a_1, \dots, a_n)$. This was to point out the distinction between the variables $x_1, \dots, x_n$ with respect to which we take partial derivatives and the points at which we compute the derivative. In the future we will not be as careful; see the following example.

Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $f(r, \varphi, z) = (r \cos \varphi, r \sin \varphi, z)$. Note that $f$ computes the cartesian coordinates of a point from its cylindrical coordinates. We have
\[
Df(r, \varphi, z) =
\begin{pmatrix}
\cos \varphi & -r \sin \varphi & 0 \\
\sin \varphi & r \cos \varphi & 0 \\
0 & 0 & 1
\end{pmatrix} .
\]

Theorem 3.3. Let $f, g : \mathbb{R}^n \to \mathbb{R}^m$ be functions and assume that all the relevant derivatives exist. Then the following hold:
(1) If $f$ is constant, then $Df = 0$, where $0$ denotes the $m$-by-$n$ matrix whose entries are all the real number 0.
(2) $D(f + g) = Df + Dg$, where the first $+$ denotes the sum of two functions and the second $+$ denotes the sum of two matrices.
(3) If $f$ is of the form
\[
f(x_1, \dots, x_n) = A \cdot \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
for some $m$-by-$n$ matrix $A$, then for all $(x_1, \dots, x_n) \in \mathbb{R}^n$,
\[
Df(x_1, \dots, x_n) = A .
\]
See Example 3.2 (c).

4. The chain rule in higher dimensions

Definition 4.1. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^l$ be functions. Then their composition $g \circ f : \mathbb{R}^n \to \mathbb{R}^l$ is defined by
\[
(g \circ f)(x_1, \dots, x_n) = g(f(x_1, \dots, x_n)) .
\]
Note that this is a reasonable definition because the range of $f$ is contained in $\mathbb{R}^m$ and $g$ is defined on $\mathbb{R}^m$.

Example 4.2. (a) $h(t) = \sin^2 t$ is the composition $g \circ f$ of the functions $g(x) = x^2$ and $f(t) = \sin t$. Note that $(f \circ g)(x) = \sin(x^2)$.

(b) Let $f(t) = (\cos t, \sin t)$ and let $g(x, y) = x^2 + y^2$. Then for all $t \in \mathbb{R}$,
\[
(g \circ f)(t) = \cos^2 t + \sin^2 t = 1 .
\]

There is a close connection between matrix multiplication and composition of functions.
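This connection can be tried out numerically for functions given by matrices. The following is a minimal sketch, assuming matrices are stored as lists of rows; the helpers `matmul` and `apply` and the matrices `A` and `B` are illustrative choices, not taken from these notes. It applies one matrix after the other to a vector and compares the result with applying the product matrix once.

```python
def matmul(A, B):
    """Product of an l-by-m matrix A and an m-by-n matrix B (lists of rows)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("A must have as many columns as B has rows")
    # Entry (i, k) is the dot-product of row i of A and column k of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def apply(M, x):
    """Multiply the matrix M by the vector x, read as a column vector."""
    if any(len(row) != len(x) for row in M):
        raise ValueError("M must have as many columns as x has entries")
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Illustrative matrices: A is 3-by-2, so x -> A x maps R^2 to R^3;
# B is 2-by-3, so y -> B y maps R^3 to R^2.
A = [[1, 2], [0, 1], [3, 0]]
B = [[1, 0, 2], [0, 1, 1]]
x = [5, -1]

composed = apply(B, apply(A, x))      # first apply A, then B
via_product = apply(matmul(B, A), x)  # apply the single matrix B * A

print(composed)
print(via_product)
```

Both computations give the same vector for this input, which is what one would expect from the associativity of matrix multiplication.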
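Theorem 4.3. If $f : \mathbb{R}^l \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^n$ are functions such that there is an $m$-by-$l$ matrix $A$ and an $n$-by-$m$ matrix $B$ such that
\[
f(x_1, \dots, x_l) = A \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_l \end{pmatrix}
\quad \text{and} \quad
g(y_1, \dots, y_m) = B \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},
\]
then
\[
(g \circ f)(x_1, \dots, x_l) = (B \cdot A) \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_l \end{pmatrix} .
\]

This theorem is just a special case of the fact that matrix multiplication satisfies the associative law: if $A$, $B$ and $C$ are matrices of suitable formats, then
\[
(A \cdot B) \cdot C = A \cdot (B \cdot C) .
\]
More precisely, if $f$, $g$, $A$ and $B$ are as in the theorem above, then
\[
(g \circ f)(x_1, \dots, x_l)
= B \cdot \left( A \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_l \end{pmatrix} \right)
= (B \cdot A) \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_l \end{pmatrix} .
\]

Theorem 4.4 (Chain Rule). Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^l$ and assume that all the relevant partial derivatives exist and are continuous. Then for all $(a_1, \dots, a_n) \in \mathbb{R}^n$,
\[
D(g \circ f)(a_1, \dots, a_n) = Dg(f(a_1, \dots, a_n)) \cdot Df(a_1, \dots, a_n) .
\]
Note that for functions from $\mathbb{R}$ to $\mathbb{R}$ this is just the usual 1-dimensional chain rule.

Example 4.5. (a) Let $f$ and $g$ be as in Example 4.2 (b). Since $g \circ f$ is constant,
\[
D(g \circ f)(t) = (g \circ f)'(t) = 0 .
\]
On the other hand,
\[
D(g \circ f)(t) = Dg(f(t)) \cdot Df(t) = Dg(\cos t, \sin t) \cdot Df(t)
= \begin{pmatrix} 2 \cos t & 2 \sin t \end{pmatrix} \cdot \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}
= -2 \cos t \sin t + 2 \sin t \cos t = 0 .
\]

(b) Let $f(x, y, z) = (x^2 + y + z,\; x + y^2 + 3z)$ and $g(u, v) = (u + v,\; u - v)$. Then
\[
Dg(u, v) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\]
and therefore
\[
D(g \circ f)(x, y, z)
= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \cdot \begin{pmatrix} 2x & 1 & 1 \\ 1 & 2y & 3 \end{pmatrix}
= \begin{pmatrix} 2x + 1 & 1 + 2y & 4 \\ 2x - 1 & 1 - 2y & -2 \end{pmatrix} .
\]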
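The chain rule can be sanity-checked numerically. The sketch below assumes the functions $f(x, y, z) = (x^2 + y + z,\; x + y^2 + 3z)$ and $g(u, v) = (u + v,\; u - v)$ of Example 4.5 (b); the helper names `matmul` and `numerical_jacobian`, the step size `h` and the test point are illustrative choices, not part of these notes. It compares the product $Dg(f(p)) \cdot Df(p)$ with a central-difference approximation of $D(g \circ f)(p)$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def f(x, y, z):
    return (x**2 + y + z, x + y**2 + 3 * z)


def g(u, v):
    return (u + v, u - v)


def Df(x, y, z):
    """Derivative of f, computed by hand from the partial derivatives."""
    return [[2 * x, 1, 1], [1, 2 * y, 3]]


def Dg(u, v):
    """Derivative of g; constant because g is given by a matrix."""
    return [[1, 1], [1, -1]]


def numerical_jacobian(func, point, h=1e-6):
    """Central-difference approximation of the derivative matrix of func."""
    columns = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        columns.append([(p - m) / (2 * h)
                        for p, m in zip(func(*plus), func(*minus))])
    # columns[j][i] approximates the partial derivative of component i
    # with respect to variable j, so transpose to get rows per component.
    return [list(row) for row in zip(*columns)]


point = (1.0, 2.0, 3.0)  # arbitrary test point
chain = matmul(Dg(*f(*point)), Df(*point))
approx = numerical_jacobian(lambda x, y, z: g(*f(x, y, z)), point)

print(chain)
print(approx)
```

The two printed matrices should agree up to the rounding error of the finite-difference approximation. Such a check does not prove the chain rule, but it is a quick way to catch sign or index mistakes in a hand-computed derivative.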
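Definition 4.6. If $f : \mathbb{R}^n \to \mathbb{R}$, then
\[
\left( \frac{\partial f}{\partial x_1}(x_1, \dots, x_n), \dots, \frac{\partial f}{\partial x_n}(x_1, \dots, x_n) \right)
\]
is called the gradient of $f$ at $(x_1, \dots, x_n)$ and denoted by $\nabla f(x_1, \dots, x_n)$. Note that $\nabla f(x_1, \dots, x_n)$ is the vector with the same entries as the 1-by-$n$ matrix $Df(x_1, \dots, x_n)$.

Example 4.7. Let $f(x, y, z) = x^2 + 2y - z^3$. Then $\nabla f(x, y, z) = (2x, 2, -3z^2)$.

Corollary 4.8. If $f : \mathbb{R} \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}$, then the chain rule reduces to
\[
D(g \circ f)(t) = \nabla g(f(t)) \cdot f'(t)
= \frac{\partial g}{\partial x_1}(f(t)) \, f_1'(t) + \dots + \frac{\partial g}{\partial x_m}(f(t)) \, f_m'(t),
\]
where $f_1, \dots, f_m$ are the component functions of $f$. See Example 4.5 (a).

Example 4.9. A typical application of this corollary is the following: $f : \mathbb{R} \to \mathbb{R}^3$ describes the position of an airplane at time $t$, for instance
\[
f(t) = (100 \cos t, 100 \sin t, t) .
\]
The function $g : \mathbb{R}^3 \to \mathbb{R}$ describes the temperature at a point $(x, y, z)$, for instance
\[
g(x, y, z) = 70 - z .
\]
The gradient of $g$ at $(x, y, z)$ is $\nabla g(x, y, z) = (0, 0, -1)$. The derivative of $f$ at $t$ is $f'(t) = (-100 \sin t, 100 \cos t, 1)$. Now
\[
D(g \circ f)(t) = (g \circ f)'(t) = \nabla g(100 \cos t, 100 \sin t, t) \cdot f'(t)
= (0, 0, -1) \cdot (-100 \sin t, 100 \cos t, 1) = -1 .
\]
The reason this is so simple in this particular case is that $g(x, y, z)$ only depends on $z$. Here it is actually easier to first compute the composition and then its derivative: we have $(g \circ f)(t) = 70 - t$.

If $g$ is more complicated, the chain rule actually helps. Suppose now that
\[
g(x, y, z) = 70 + \frac{x^2}{200} - z .
\]
Then $\nabla g(x, y, z) = \left( \frac{x}{100}, 0, -1 \right)$. Hence
\[
D(g \circ f)(t) = (g \circ f)'(t) = \nabla g(100 \cos t, 100 \sin t, t) \cdot f'(t)
= (\cos t, 0, -1) \cdot (-100 \sin t, 100 \cos t, 1) = -100 \cos t \sin t - 1 .
\]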
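The last computation can be checked numerically. The sketch below assumes the flight path $f(t) = (100 \cos t, 100 \sin t, t)$ and the temperature $g(x, y, z) = 70 + x^2/200 - z$ of Example 4.9; the function names, the step size `h` and the sample time `t` are illustrative choices, not part of these notes. It evaluates $\nabla g(f(t)) \cdot f'(t)$ and compares it with the closed form $-100 \cos t \sin t - 1$ and with a finite-difference derivative of $g \circ f$.

```python
import math


def f(t):
    """Position of the airplane at time t."""
    return (100 * math.cos(t), 100 * math.sin(t), t)


def f_prime(t):
    """Componentwise derivative of f."""
    return (-100 * math.sin(t), 100 * math.cos(t), 1.0)


def g(x, y, z):
    """Temperature at the point (x, y, z)."""
    return 70 + x**2 / 200 - z


def grad_g(x, y, z):
    """Gradient of g, computed by hand."""
    return (x / 100, 0.0, -1.0)


def chain_rule_derivative(t):
    """Derivative of g(f(t)) as the dot-product of grad g(f(t)) and f'(t)."""
    return sum(gi * fi for gi, fi in zip(grad_g(*f(t)), f_prime(t)))


def finite_difference(t, h=1e-6):
    """Central-difference approximation of the derivative of g(f(t))."""
    return (g(*f(t + h)) - g(*f(t - h))) / (2 * h)


t = 0.7  # arbitrary sample time
closed_form = -100 * math.cos(t) * math.sin(t) - 1

print(chain_rule_derivative(t))
print(closed_form)
print(finite_difference(t))
```

The first two values should agree up to floating-point rounding, and the third up to the error of the finite-difference approximation.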