DERIVATIVES AS MATRICES; CHAIN RULE



Similar documents
Section 12.6: Directional Derivatives and the Gradient Vector

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

88 CHAPTER 2. VECTOR FUNCTIONS. . First, we need to compute T (s). a By definition, r (s) T (s) = 1 a sin s a. sin s a, cos s a

Scalar Valued Functions of Several Variables; the Gradient Vector

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA

Practice with Proofs

( 1) = 9 = 3 We would like to make the length 6. The only vectors in the same direction as v are those

Fundamental Theorems of Vector Calculus

Differentiation of vectors

Linear Algebra Notes for Marsden and Tromba Vector Calculus

160 CHAPTER 4. VECTOR SPACES

1 3 4 = 8i + 20j 13k. x + w. y + w

Question 2: How do you solve a matrix equation using the matrix inverse?

A QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS

L 2 : x = s + 1, y = s, z = 4s Suppose that C has coordinates (x, y, z). Then from the vector equality AC = BD, one has

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Microeconomic Theory: Basic Math Concepts

TOPIC 4: DERIVATIVES

Linear and quadratic Taylor polynomials for functions of several variables.

Class Meeting # 1: Introduction to PDEs

Determine If An Equation Represents a Function

Objectives. Materials

To give it a definition, an implicit function of x and y is simply any relationship that takes the form:

2.2 Derivative as a Function

1.2 Solving a System of Linear Equations

Methods for Finding Bases

Solutions for Review Problems

Systems of Linear Equations

Linear Algebra Notes

DIFFERENTIABILITY OF COMPLEX FUNCTIONS. Contents

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors

the points are called control points approximating curve

Recall that two vectors in are perpendicular or orthogonal provided that their dot

x 2 + y 2 = 1 y 1 = x 2 + 2x y = x 2 + 2x + 1

Multi-variable Calculus and Optimization

NOTES ON LINEAR TRANSFORMATIONS

Section 9.5: Equations of Lines and Planes

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

5.1 Derivatives and Graphs

8 Square matrices continued: Determinants

LECTURE 1: DIFFERENTIAL FORMS forms on R n

CURVE FITTING LEAST SQUARES APPROXIMATION

Math 21a Curl and Divergence Spring, Define the operator (pronounced del ) by. = i

Row Echelon Form and Reduced Row Echelon Form

Inner product. Definition of inner product

1.5 Equations of Lines and Planes in 3-D

x = + x 2 + x

H.Calculating Normal Vectors

1 Inner Products and Norms on Real Vector Spaces

2x + y = 3. Since the second equation is precisely the same as the first equation, it is enough to find x and y satisfying the system

Second Order Linear Nonhomogeneous Differential Equations; Method of Undetermined Coefficients. y + p(t) y + q(t) y = g(t), g(t) 0.

Math 241, Exam 1 Information.

3 Contour integrals and Cauchy s Theorem

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

Solutions to Math 51 First Exam January 29, 2015

5.5. Solving linear systems by the elimination method

Similarity and Diagonalization. Similar Matrices

Average rate of change of y = f(x) with respect to x as x changes from a to a + h:

1.7 Graphs of Functions

MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

Beginner s Matlab Tutorial

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.

Notes on Orthogonal and Symmetric Matrices MENU, Winter 2013

Lecture 1: Systems of Linear Equations

Solving Systems of Linear Equations

Limits and Continuity

α = u v. In other words, Orthogonal Projection

Section 13.5 Equations of Lines and Planes

2.2. Instantaneous Velocity

Mechanics 1: Conservation of Energy and Momentum

Section 1.1 Linear Equations: Slope and Equations of Lines

MATH2210 Notebook 1 Fall Semester 2016/ MATH2210 Notebook Solving Systems of Linear Equations... 3

SOLUTIONS TO HOMEWORK ASSIGNMENT #4, MATH 253

Rolle s Theorem. q( x) = 1

Orthogonal Projections

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

This makes sense. t /t 2 dt = 1. t t 2 + 1dt = 2 du = 1 3 u3/2 u=5

Linear Algebra I. Ronald van Luijk, 2012

1.5 SOLUTION SETS OF LINEAR SYSTEMS

8.2. Solution by Inverse Matrix Method. Introduction. Prerequisites. Learning Outcomes

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

15. Symmetric polynomials

Economics 121b: Intermediate Microeconomics Problem Set 2 1/20/10

LS.6 Solution Matrices

Matrix Algebra. Some Basic Matrix Laws. Before reading the text or the following notes glance at the following list of basic matrix algebra laws.

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.

5 Systems of Equations

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.

Chapter 7 - Roots, Radicals, and Complex Numbers

University of Lille I PC first year list of exercises n 7. Review

THE DIMENSION OF A VECTOR SPACE

Nonlinear Algebraic Equations. Lectures INF2320 p. 1/88

Week 2: Exponential Functions

LEARNING OBJECTIVES FOR THIS CHAPTER

Math 120 Final Exam Practice Problems, Form: A

Lectures notes on orthogonal matrices (with exercises) Linear Algebra II - Spring 2004 by D. Klain

Lecture L3 - Vectors, Matrices and Coordinate Transformations

Introduction to Matrix Algebra

Transcription:

DERIVATIVES AS MATRICES; CHAIN RULE 1. Derivatives of Real-valued Functions Let s first consider functions f : R 2 R. Recall that if the partial derivatives of f exist at the point (x 0, y 0 ), then we can write down the candidate for the tangent plane: (1.1) z z 0 = f x (x 0, y 0 )(x x 0 ) + f y (x 0, y 0 )(y y 0 ). We defined f to be differentiable if this plane really is a tangent plane to the graph of f in the sense that the tangent line to every smooth curve in the graph through the point (x 0, y 0, z 0 ) lies in this plane. As mentioned in class, one can show that if the partial derivatives are continuous, then the function is differentiable. However, while we talked about differentiable functions, we never defined the actual derivative of a function from R 2 to R. Definition 1.1. If f is differentiable at (x 0, y 0 ), then we define the derivative to be f (x 0, y 0 ) = [ f x (x 0, y 0 ) f y (x 0, y 0 ) ]. (If we view this row matrix at a vector, then it is the gradient f(x 0, y 0 ).) Similarly, for a differentiable function f : R n R, the derivative is the row matrix with n entries consisting of the n partial derivatives. E.g., for a function of three variables, we have f (x 0.y 0, z 0 ) = [ f x (x 0, y 0, z 0 ) f y (x 0, y 0, z 0 ) f z (x 0, y 0, z 0 ) ] 1.2. Example. Let f(x, y) = x 2 y 3, and let (x 0, y 0 ) = (2, 1). Then f x = 2xy3 and f y = 3x2 y 2, so f f x (2, 1) = 4 and y (2, 1) = 12. Thus f (2, 1) = [ 4 12 ]. The matrix f (2, 1) defines a linear transformation: L(x, y) = [ 4 12 ] x = 4x + 12y. y Observe that the tangent line may be written as z 4 = 4(x 2) + 12(y 1) z 4 = [ 4 12 ] x 2. y 1 As in this example, for an arbitrary differentiable function f : R 2 R, the tangent plane (Equation 1.1) to the graph at (x 0, y 0, z 0 ) may be written (1.3) z z 0 = f x x0 (x 0, y 0 ) y y 0 1

2 DERIVATIVES AS MATRICES; CHAIN RULE If we write then Equation 1.3 may be rewritten as [ x x = and x y] 0 = [ x0 y 0 ] (1.4) z z 0 = f (x 0, y 0 )(x x 0 ). Note the resemblance between this expression for the tangent plane to the graph of a function f : R 2 R 2 and the familiar expression for the tangent line to the graph of a real-valued function of one variable. 2. Functions for which the domain and range are both higher-dimensional. A function f : R n R m consists of m-real-valued functions on R n : f(x 1,..., x n ) = (f 1 (x 1,..., x n ),..., f m (x 1,..., x n )). The real-valued functions f 1,..., f m are called the component functions. 2.1. Example. Define a function f : R 2 R 3 by f(x, y) = (x 2 y, 3xy, 5x + 4y)). Then the component functions are f 1 (x, y) = x 2 y, f 2 (x, y) = 3xy, and f 3 (x, y) = 5x + 4y. Definition 2.1. We say a function f : R n R m is differentiable if each of the component functions is differentiable. In that case, the derivative at a point (a 1,..., a n ) in R n is given by the m n matrix whose ith row is the derivative of the ith component function. 2.2. Example. Define f as in Example 2.1. Let (a 1, a 2 ) = (1, 1). Then f 1 (x, y) = [ 2xy x 2] so f 1(1, 1) = [ 2 1 ]. Similarly and Thus f 2(1, 1) = [ 3 3 ] f 3(1, 1) = [ 5 4 ] f (1, 1) = 2 1 3 3. 5 4 2.3. Exercise. Find the derivatives (as matrices) of each of the following functions at the indicated point: (1) f(x, y, z) = (e 3x y z, x 2 y 2 ) at the point (1, 1, 2) (2) f(x, y) = ((x + y) 3, xsin(y)) at (2, 0). (3) f(t) = (t 2, t 3, t 4 ) at t 0 = 1. (If you view f as a parametrized curve, you have the familiar notion of the tangent vector at the point with parameter t 0 = 1. How does this compare to the derivative that you just computed?)

DERIVATIVES AS MATRICES; CHAIN RULE 3 Again consider an arbitrary differentiable function f : R n R m. Denote elements of R n by column vectors and elements of R m by columns Let be a chosen point in R n and set x 1 x =. x n u 1 u =. u m a 1 x 0 =. a n u 0 = f(a 1,..., a n ) (viewed as a column matrix). Then the analog of the tangent plane, or best linear approximation to the graph at (a 1,..., a n ), is given by: (2.4) u u 0 = f (x 0 )(x x 0 ) 2.5. Example. In Example 2.1, f(1, 1) = (1, 3, 9), and the best linear approximation at (1, 1) is given as follows: u u 0 = 2 1 3 3 (x x 0 ) 5 4 I.e., (2.6) u 1 1 u 2 3 = 2 1 2(x 1) + 1(y 1) 3 3 x 1 = 3(x 1) + 3(y 1) y 1 u 3 9 5 4 5(x 1) + 4(y 1) where the first equality is the expression for the tangent plane, and the second equality is obtained by multiplying matrices. If we compare the matrices on the left and right sides, the first row tells us that u 1 = 2(x 1) + 1(y 1). Note that this is the equation of the tangent plane to the function f 1 (x, y) = x 2 y at (1, 2). Similarly, the second and third rows give the tangent planes to the second and third component functions of f. 2.7. Exercise. Write down the linear approximation to the graph of each of the functions in Exercise 2.3 at the indicated point. (Your answers should be expressed similarly to Equation 2.6.)

4 DERIVATIVES AS MATRICES; CHAIN RULE 2.8. Remark. The most common mistake students make in writing down a derivative matrix is to switch the rows and columns. Note that the rows of the derivative matrix are the derivatives (gradients) of the component functions. The columns correspond to the independent variable; e.g., in the first column we see the partials of the component functions with respect to the first variable. A good way to keep this straight is to view the derivative matrix as the matrix of a linear transformation L (the one that gives us the linear approximation). Since the original function f has domain R n, we want L to have domain R n and range R m as well. That means we need to be able to multiply the derivative matrix by a column with n entries. So the derivative matrix must have n columns. Similarly, for the range to be R m, it must have m rows. In the examples above, we wrote down the derivative (as a matrix) by first computing the partials. Conversely, if you are given the derivative matrix, then you can read off the partial derivatives of all the component functions. 2.9. Example. Suppose f : R 3 R 2 is differentiable at x 0 = (x 0, y 0, z 0 ) and that it s derivative at x 0 is f 2 5 1 (x 0, y 0, z 0 ) = 3 1 4 Writing (u, v) = f(x, y, z) = (f 1 (x, y, z), f 2 (x, y, z), then at the point x 0, we can read off from the matrix that x f 1(x 0, y 0 z 0 ) = 2. (This is also written as x = 2.) Similarly at x 0, we have y = y f 1 = 5 and etc. z = z f 1 = 1 v x = x f 2 = 3 3. The Chain Rule Recall the chain rule for real-valued functions of one variable: 1. Familiar chain rule. If g : R R is differentiable at x 0 and f : R R is differentiable at y 0 = g(x 0 ), then f g is differentiable at x 0 and (f g) (x 0 ) = f (y 0 )g (x 0 ). Using matrices, the chain rule in higher dimensions looks identical to the familiar chain rule. As before, we use bold face letters like x to denote points (or vectors) in R n. 1. Chain rule in higher dimensions. If g : R n R m is differentiable at x 0 and f : R m R p is differentiable at y 0 = g(x 0 ), then f g is differentiable at x 0 and (f g) (x 0 ) = f (y 0 )g (x 0 ).

3.1. Example. Define g : R 2 R 3 by and define f : R 3 R by DERIVATIVES AS MATRICES; CHAIN RULE 5 g(s, t) = (s 2 t, s + 2t 2, st) f(x, y, z) = e 2x y+z. Let s compute the derivative of f g at (s 0, t 0 ) = (1, 1, ). Note that g(1, 1) = (1, 3, 1). The chain rule says that We compute: so Similarly Thus (f g) (1, 1) = f (1, 3, 1)g (1, 1). g (s, t) = 2st s2 1 4t t s g (1, 1, ) = 2 1 1 4 1 1 f (1, 3, 1) = [ 2 1 1 ] (f g) (1, 1) = [ 2 1 1 ] 2 1 1 4 = [ 4 1 ] 1 1 Writing w = f g(s, t), we can read off from this derivative matrix that at (1, 1), w s = 4 and w t = 1. Let s compare this computation with the version of the chain rule given in Section 15.5 of Stewart. Writing w = f(x, y, z) and (x, y, z) = g(s, t), the chain rule in Stewart says that At (s, t) = (1, 1) and (x, y, z) = (1, 3, 1), we get (3.2) w s = w x x s + w y y s + w x z s. w s = 2(2) + ( 1)(1) + (1)(1) = 4. This agrees with what we obtained using the matrices. Notice that the computation in Equation 3.2 is equivalent to multiplying the first (and only) row of the matrix f (1, 3, 1) by the first column of the matrix g (1, 1). Similarly, the formula in Stewart for w t corresponds to multiplying the row of f (1, 3, 1) by the second column of g (1, 1). The matrix multiplication gives us both partials at once. (If you are only interested in one of the partials, then the formula in Stewart is a bit faster. If you want all the partials, the matrix method is more convenient.)

6 DERIVATIVES AS MATRICES; CHAIN RULE 3.3. Example. Let g be as in the previous example, and let f : R 3 R 2 be given by f(x, y, z) = (e 2x y+z, xyz). Then f 2 1 1 (1, 3, 1) = 3 1 3 so (f g) 2 1 1 (1, 1) = 2 1 1 4 4 1 = 3 1 3 10 10 1 1 Writing (u, v) = f(x, y, z) = (f g)(s, t), we can read off all the partials at (s, t) = (1, 1): s = 4 t = 1 v s = 10 v t = 10 3.4. Exercise. Let f : R 2 R 2 be given by f(x, y) = (e x y 2, (x + 1)y 3 ) and let g : R R 2 be given by g(t) = (sin(t), e t ). First use the matrix version of the Chain Rule to find (f g) (0). Then compute (f g) (0) by the method in Section 15.5 and compare your answers. 3.5. Exercise. This exercise refers to problem 26 in Section 15.5. (same in old edition). (1) Express the given functions as Y = f(u, v, w) and (u, v, w) = g(r, s, t). (2) Write down the derivative matrices for f and g at the indicated point (i.e., (r, s, t) = (1, 0, 1) and (u, v, w) = g(1, 0, 1), which you can compute). (3) Use the matrix version of the Chain Rule to compute the derivative (f g) (1, 0, 1). (4) Read off from your matrix in (3) the partial derivatives that are requested in problem 26.