SIMPLE MULTIVARIATE OPTIMIZATION


1. DEFINITION OF LOCAL MAXIMA AND LOCAL MINIMA

1.1. Functions of two variables. Let f(x_1, x_2) be defined on a region D in R^2 containing the point (a, b). Then

a: f(a, b) is a local maximum value of f if f(a, b) ≥ f(x_1, x_2) for all domain points (x_1, x_2) in an open disk centered at (a, b).

b: f(a, b) is a local minimum value of f if f(a, b) ≤ f(x_1, x_2) for all domain points (x_1, x_2) in an open disk centered at (a, b).

1.2. Functions of n variables.

a: Consider a real valued function f with domain D in R^n. Then f is said to have a local minimum at a point x^0 ∈ D if there exists a real number δ > 0 such that

f(x) ≥ f(x^0) for all x ∈ D satisfying |x - x^0| < δ    (1)

b: Consider a real valued function f with domain D in R^n. Then f is said to have a local maximum at a point x^0 ∈ D if there exists a real number δ > 0 such that

f(x) ≤ f(x^0) for all x ∈ D satisfying |x - x^0| < δ    (2)

c: A real valued function f with domain D in R^n is said to have a local extremum at a point x^0 ∈ D if either equation 1 or equation 2 holds.

d: The function f has a global maximum at x^0 ∈ D if equation 2 holds for all x ∈ D, and similarly for a global minimum.

2. DEFINITION OF THE GRADIENT AND HESSIAN OF A FUNCTION OF n VARIABLES

2.1. Gradient of f. The gradient of a function of n variables f(x_1, x_2, ..., x_n) is defined as follows.

∇f(x) = (∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n)^T    (3)
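
To make the definition concrete, here is a minimal sketch (not part of the original notes) that computes the gradient symbolically with the sympy library; the example function e^{-(x_1^2 + x_2^2)} is the one graphed in figure 1 below.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.exp(-(x1**2 + x2**2))           # the function shown in figure 1

# gradient: the vector of first partial derivatives, equation (3)
grad_f = [sp.diff(f, v) for v in (x1, x2)]
print(grad_f)    # [-2*x1*exp(-x1**2 - x2**2), -2*x2*exp(-x1**2 - x2**2)]

# both partials vanish at the origin, the local maximum visible in figure 1
print([g.subs({x1: 0, x2: 0}) for g in grad_f])    # [0, 0]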

2.2. Hessian matrix of f. The Hessian matrix of a function of n variables f(x_1, x_2, ..., x_n) is as follows.

∇^2 f(x) = [∂^2f/∂x_i ∂x_j],  i, j = 1, 2, ..., n

           [ ∂^2f/∂x_1∂x_1   ∂^2f/∂x_1∂x_2   ...   ∂^2f/∂x_1∂x_n ]
         = [ ∂^2f/∂x_2∂x_1   ∂^2f/∂x_2∂x_2   ...   ∂^2f/∂x_2∂x_n ]
           [      ...              ...        ...        ...      ]
           [ ∂^2f/∂x_n∂x_1   ∂^2f/∂x_n∂x_2   ...   ∂^2f/∂x_n∂x_n ]    (4)

3. NECESSARY CONDITIONS FOR EXTREME POINTS

3.1. First derivative test for local extreme values.

Theorem 1. Let x^0 be an interior point of a domain D in R^n and assume that f is twice continuously differentiable on D. It is necessary for a local minimum of f at x^0 that

∇f(x^0) = 0    (5)

This implies that

∂f/∂x_1 (x^0) = 0
∂f/∂x_2 (x^0) = 0
...
∂f/∂x_n (x^0) = 0    (6)

Theorem 2. Let x^0 be an interior point of a domain D in R^n and assume that f is twice continuously differentiable on D. It is necessary for a local maximum of f at x^0 that

∇f(x^0) = 0    (7)

3.2. Graphical illustration. Figure 1 shows the relative maximum of a function. The plane that is tangent to the surface at the point where the gradient is equal to zero is horizontal, that is, parallel to the (x_1, x_2) plane.

3.3. Critical points and saddle points.

3.3.1. Critical point. An interior point of the domain of a function f(x_1, x_2, ..., x_n) where all first partial derivatives are zero, or where one or more of the first partials does not exist, is a critical point of f.

3.3.2. Saddle point. A critical point that is not a local extremum is called a saddle point. We can say that a differentiable function f(x_1, x_2, ..., x_n) has a saddle point at a critical point x* if we can partition the vector x* into two subvectors (x_1*, x_2*), where x_1 ∈ X_1 ⊂ R^q and x_2 ∈ X_2 ⊂ R^p (n = p + q), with the following property

f(x_1, x_2*) ≤ f(x_1*, x_2*) ≤ f(x_1*, x_2)    (8)

for all x_1 ∈ X_1 and x_2 ∈ X_2.

FIGURE 1. Local maximum of the function f(x_1, x_2) = e^{-(x_1^2 + x_2^2)}

The idea is that a saddle point attains a maximum in one direction and a minimum in the other direction. Figure 2 shows a function with a saddle point. The function reaches a maximum along the x_2 axis and a minimum along the x_1 axis. A contour plot for the function f(x_1, x_2) = x_1^2 - x_2^2 is contained in figure 3.

FIGURE 2. Saddle point of the function f(x_1, x_2) = x_1^2 - x_2^2

FIGURE 3. Contour plot for the function f(x_1, x_2) = x_1^2 - x_2^2

3.3.3. Example problems. Find all the critical points for each of the following functions (a sketch of one symbolic approach follows the list).

1: y = 16 +1x x 1 x
2: y = 5 +18x x 1 x
3: y = 1 +76x 4 x 1 x
4: y = +4x x 1 + x x
5: y = 3 +x x 1 +x x
6: y = 3x.4 1 x. 6 x
7: y = x.3 1 x.5 5 6 x
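
The sketch referred to above: one possible way (an illustration, not the method of the notes) to locate critical points symbolically with sympy, using the quadratic worked out later as Example 2; the names f, x1, x2 are arbitrary.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 16*x1 + 12*x2 + x1**2 + x2**2      # Example 2 from Section 4.2

# first-order (necessary) conditions: both partial derivatives equal zero
foc = [sp.Eq(sp.diff(f, v), 0) for v in (x1, x2)]

# solve the system of first-order conditions for the critical points
critical_points = sp.solve(foc, [x1, x2], dict=True)
print(critical_points)                 # [{x1: -8, x2: -6}]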

4. SECOND DERIVATIVE TEST FOR LOCAL EXTREME VALUES

4.1. A theorem on second derivatives of a function and local extreme values.

Theorem 3. Suppose that f(x_1, x_2) and its first and second partial derivatives are continuous throughout a disk centered at (a, b) and that ∂f/∂x_1 (a, b) = ∂f/∂x_2 (a, b) = 0. Then

a: f has a local maximum at (a, b) if

∂^2f/∂x_1^2 < 0  and  (∂^2f/∂x_1^2)(∂^2f/∂x_2^2) - [∂^2f/∂x_1∂x_2]^2 > 0 at (a, b).

We can also write this as f_11 < 0 and f_11 f_22 - f_12^2 > 0 at (a, b).

b: f has a local minimum at (a, b) if

∂^2f/∂x_1^2 > 0  and  (∂^2f/∂x_1^2)(∂^2f/∂x_2^2) - [∂^2f/∂x_1∂x_2]^2 > 0 at (a, b).

We can also write this as f_11 > 0 and f_11 f_22 - f_12^2 > 0 at (a, b).

c: f has a saddle point at (a, b) if

(∂^2f/∂x_1^2)(∂^2f/∂x_2^2) - [∂^2f/∂x_1∂x_2]^2 < 0 at (a, b).

We can also write this as f_11 f_22 - f_12^2 < 0 at (a, b).

d: The test is inconclusive at (a, b) if (∂^2f/∂x_1^2)(∂^2f/∂x_2^2) - [∂^2f/∂x_1∂x_2]^2 = 0 at (a, b). In this case we must find some other way to determine the behavior of f at (a, b).

The expression (∂^2f/∂x_1^2)(∂^2f/∂x_2^2) - [∂^2f/∂x_1∂x_2]^2 is called the discriminant of f. It is sometimes easier to remember it by writing it in determinant form

f_11 f_22 - f_12^2 = | f_11  f_12 |
                     | f_21  f_22 |    (9)

The theorem says that if the discriminant is positive at the point (a, b), then the surface curves the same way in all directions: downwards if f_11 < 0, giving rise to a local maximum, and upwards if f_11 > 0, giving a local minimum. On the other hand, if the discriminant is negative at (a, b), then the surface curves up in some directions and down in others, and we have a saddle point.

4.2. Example problems.

4.2.1. Example 1. y = -x_1^2 - x_2^2. Taking the first partial derivatives we obtain

∂y/∂x_1 = -2x_1 = 0    (10a)
∂y/∂x_2 = -2x_2 = 0    (10b)
x_1 = 0,  x_2 = 0    (10c)

Thus we have a critical point at (0, 0). Taking the second derivatives we obtain

∂^2y/∂x_1^2 = -2,  ∂^2y/∂x_2^2 = -2,  ∂^2y/∂x_1∂x_2 = 0,  ∂^2y/∂x_2∂x_1 = 0

The discriminant is

f_11 f_22 - f_12 f_21 = (-2)(-2) - (0)(0) = 4.

Thus this function has a maximum at the point (0, 0) because f_11 < 0 and the discriminant is positive. The graph of the function is given in figure 4. The level curves are given in figure 5. The tangent plane is given in figure 6.
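
The classification logic of Theorem 3 is mechanical enough to capture in a few lines of code. The helper name classify_critical_point below is an assumption for illustration; the numbers are the second partials of Example 1.

def classify_critical_point(f11, f22, f12):
    """Second derivative test (Theorem 3) for a function of two variables."""
    discriminant = f11 * f22 - f12**2
    if discriminant > 0:
        return "local maximum" if f11 < 0 else "local minimum"
    if discriminant < 0:
        return "saddle point"
    return "test inconclusive"

# Example 1: y = -x1**2 - x2**2 at the critical point (0, 0)
print(classify_critical_point(f11=-2.0, f22=-2.0, f12=0.0))   # local maximum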

FIGURE 4. Graph of the function f(x_1, x_2) = -x_1^2 - x_2^2

FIGURE 5. Level curves of the function f(x_1, x_2) = -x_1^2 - x_2^2
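
Figures such as 5 (and the later contour plots in figures 8, 11, 15 and 19) are standard level-curve plots; the following is a minimal sketch of how one might reproduce the level curves of Example 1 with numpy and matplotlib (the plotting choices are assumptions, the original figures were generated elsewhere).

import numpy as np
import matplotlib.pyplot as plt

# grid over the (x1, x2) plane
x1, x2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
y = -x1**2 - x2**2                      # the function of Example 1

# level curves (compare figure 5); the circles shrink toward the maximum at (0, 0)
cs = plt.contour(x1, x2, y, levels=10)
plt.clabel(cs, inline=True, fontsize=8)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Level curves of y = -x1^2 - x2^2")
plt.show()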

FIGURE 6. Tangent plane to the function f(x_1, x_2) = -x_1^2 - x_2^2

4.2.2. Example 2. y = 16x_1 + 12x_2 + x_1^2 + x_2^2. Taking the first partial derivatives we obtain

∂y/∂x_1 = 16 + 2x_1 = 0    (11a)
∂y/∂x_2 = 12 + 2x_2 = 0    (11b)

These equations are easy to solve because the first equation only depends on x_1 and the second equation only depends on x_2. Solving equation 11a, we obtain

16 + 2x_1 = 0
2x_1 = -16
x_1 = -8

Solving equation 11b, we obtain

12 + 2x_2 = 0
2x_2 = -12
x_2 = -6

Thus we have a critical point at (-8, -6). Taking the second derivatives we obtain

∂^2y/∂x_1^2 = 2,  ∂^2y/∂x_2^2 = 2,  ∂^2y/∂x_1∂x_2 = 0,  ∂^2y/∂x_2∂x_1 = 0

The discriminant is

f_11 f_22 - f_12 f_21 = (2)(2) - (0)(0) = 4.

Thus this function has a minimum at the point (-8, -6) because f_11 > 0 and the discriminant is positive. The graph of the function is given in figure 7. The level curves are given in figure 8.

FIGURE 7. Graph of the function f(x_1, x_2) = 16x_1 + 12x_2 + x_1^2 + x_2^2

FIGURE 8. Level curves of the function f(x_1, x_2) = 16x_1 + 12x_2 + x_1^2 + x_2^2
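
The second-order check for Example 2 can also be done symbolically; a minimal sketch using sympy's hessian helper (a convenience assumed here, not used in the original notes):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = 16*x1 + 12*x2 + x1**2 + x2**2

H = sp.hessian(y, (x1, x2))            # matrix of second partials, equation (4)
print(H)                               # Matrix([[2, 0], [0, 2]])

discriminant = H.det()                 # f11*f22 - f12**2
print(discriminant, H[0, 0])           # 4 2  -> positive discriminant, f11 > 0: local minimum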

The tangent plane is given in figure 9.

FIGURE 9. Plane tangent to the function f(x_1, x_2) = 16x_1 + 12x_2 + x_1^2 + x_2^2

4.2.3. Example 3. y = x_1 x_2. Taking the first partial derivatives we obtain

∂y/∂x_1 = x_2 = 0    (12a)
∂y/∂x_2 = x_1 = 0    (12b)

Thus this function has a critical point at (0, 0). Taking the second derivatives we obtain

∂^2y/∂x_1^2 = 0,  ∂^2y/∂x_2^2 = 0,  ∂^2y/∂x_1∂x_2 = 1,  ∂^2y/∂x_2∂x_1 = 1

The discriminant is

f_11 f_22 - f_12 f_21 = (0)(0) - (1)(1) = -1.

The function has a saddle point at (0, 0) because the discriminant is negative at this critical point. The graph of the function is given in figure 10. The level curves are given in figure 11. The tangent plane is shown from two different angles in figures 12 and 13.
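
A quick numerical way to see the saddle behaviour of y = x_1 x_2 at (0, 0) is to step away from the origin along the two diagonals; in this short sketch (plain Python, step sizes chosen arbitrarily) the function rises in one direction and falls in the other.

def f(x1, x2):
    return x1 * x2          # Example 3

for t in (0.1, 0.5, 1.0):
    # along the direction x1 = x2 the function rises above f(0, 0) = 0,
    # along the direction x1 = -x2 it falls below it
    print(t, f(t, t), f(t, -t))
# 0.1 0.010000000000000002 -0.010000000000000002
# 0.5 0.25 -0.25
# 1.0 1.0 -1.0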

FIGURE 10. Graph of the function f(x_1, x_2) = x_1 x_2

FIGURE 11. Level sets of the function f(x_1, x_2) = x_1 x_2

4.2.4. Example 4. y = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2. Taking the first partial derivatives we obtain

FIGURE 12. Saddle point of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

FIGURE 13. Saddle point of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

∂y/∂x_1 = 2 - 2x_1 + 4x_2 = 0    (13a)
∂y/∂x_2 = 4 + 4x_1 - 4x_2 = 0    (13b)

Write this system of equations as

2 - 2x_1 + 4x_2 = 0    (14a)
4 + 4x_1 - 4x_2 = 0    (14b)
-2x_1 + 4x_2 = -2    (14c)
4x_1 - 4x_2 = -4    (14d)

Multiply equation 14c by 2 and add it to equation 14d. First the multiplication

2(-2x_1 + 4x_2) = 2(-2)
-4x_1 + 8x_2 = -4

Then the addition

-4x_1 + 8x_2 = -4
 4x_1 - 4x_2 = -4
        4x_2 = -8    (15)

Now multiply the result in equation 15 by 1/4 to obtain the system

4x_1 - 4x_2 = -4
x_2 = -2    (16)

Now multiply the second equation in 16 by 4 and add it to the first equation as follows

       4x_2 = -8
4x_1 - 4x_2 = -4
       4x_1 = -12    (17)

Multiplying the result in equation 17 by 1/4 we obtain the system

x_1 = -3
x_2 = -2    (18)

Thus we have a critical point at (-3, -2). We obtain the second derivatives by differentiating the system in equation 13

∂^2y/∂x_1^2 = -2,  ∂^2y/∂x_2^2 = -4,  ∂^2y/∂x_1∂x_2 = 4,  ∂^2y/∂x_2∂x_1 = 4

The discriminant is

f_11 f_22 - f_12 f_21 = (-2)(-4) - (4)(4) = 8 - 16 = -8.

The discriminant is negative and so the function has a saddle point.
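
The elimination above is Gaussian elimination on a 2 x 2 linear system, so the critical point can be cross-checked with a linear solver; a minimal sketch with numpy (an illustration, not part of the original notes):

import numpy as np

# the system 14c-14d:  -2*x1 + 4*x2 = -2,   4*x1 - 4*x2 = -4
A = np.array([[-2.0,  4.0],
              [ 4.0, -4.0]])
b = np.array([-2.0, -4.0])

x1, x2 = np.linalg.solve(A, b)
print(x1, x2)        # -3.0 -2.0  -> the critical point (-3, -2)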

The graph of the function is given in figure 14. The level curves are given in figure 15. The tangent plane is shown from two different angles in figures 16 and 17.

FIGURE 14. Graph of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

4.2.5. Example 5. y = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}. Taking the first partial derivatives we obtain

∂y/∂x_1 = -x_2 e^{-(x_1^2 + x_2^2)/2} + x_1^2 x_2 e^{-(x_1^2 + x_2^2)/2} = x_2 (x_1^2 - 1) e^{-(x_1^2 + x_2^2)/2}

∂y/∂x_2 = -x_1 e^{-(x_1^2 + x_2^2)/2} + x_1 x_2^2 e^{-(x_1^2 + x_2^2)/2} = x_1 (x_2^2 - 1) e^{-(x_1^2 + x_2^2)/2}

We set both of the equations equal to zero and solve for critical values of x_1 and x_2. Because

FIGURE 15. Level sets of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

FIGURE 16. Plane through saddle point of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

e^{-(x_1^2 + x_2^2)/2} > 0 for all (x_1, x_2), the equations can be zero if and only if

x_2 (x_1^2 - 1) = 0
x_1 (x_2^2 - 1) = 0

FIGURE 17. Plane through saddle point of the function f(x_1, x_2) = 2x_1 + 4x_2 - x_1^2 + 4x_1x_2 - 2x_2^2

Thus we have critical values at (0, 0), (1, 1), (1, -1), (-1, 1) and (-1, -1). A graph of the function is contained in figure 18.

FIGURE 18. Graph of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

A set of level curves is contained in figure 19.
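
The five critical points can be confirmed by checking that both first partials vanish there; a minimal sketch with sympy (variable names are assumptions, not from the notes):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = -x1 * x2 * sp.exp(-(x1**2 + x2**2) / 2)
grad = [sp.simplify(sp.diff(y, v)) for v in (x1, x2)]

# check that the gradient vanishes at the five claimed critical points
for p in [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]:
    values = [g.subs({x1: p[0], x2: p[1]}) for g in grad]
    print(p, values)          # each line ends with [0, 0]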

FIGURE 19. Level sets of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

We compute the second partial derivatives. The second partial derivative of f with respect to x_1 is

∂^2f/∂x_1^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2

The second partial derivative of f with respect to x_2 is given by

∂^2f/∂x_2^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1 x_2^3

The cross partial derivative of f with respect to x_1 and x_2 is

∂^2f/∂x_1∂x_2 = -e^{-(x_1^2 + x_2^2)/2} + e^{-(x_1^2 + x_2^2)/2} x_1^2 + e^{-(x_1^2 + x_2^2)/2} x_2^2 - e^{-(x_1^2 + x_2^2)/2} x_1^2 x_2^2

The discriminant is f_11 f_22 - f_12^2. We can write this out by making the substitutions above. Computing the discriminant we obtain

Discriminant = [3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2][3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1 x_2^3]
               - [-e^{-(x_1^2 + x_2^2)/2} + e^{-(x_1^2 + x_2^2)/2} x_1^2 + e^{-(x_1^2 + x_2^2)/2} x_2^2 - e^{-(x_1^2 + x_2^2)/2} x_1^2 x_2^2]^2

Multiplying out the first term we obtain

term 1 = e^{-(x_1^2 + x_2^2)} x_1^2 (-3 + x_1^2) x_2^2 (-3 + x_2^2)
       = e^{-(x_1^2 + x_2^2)} (9 x_1^2 x_2^2 - 3 x_1^4 x_2^2 - 3 x_1^2 x_2^4 + x_1^4 x_2^4)

Multiplying out the second term we obtain

term 2 = e^{-(x_1^2 + x_2^2)} (-1 + x_1^2)^2 (-1 + x_2^2)^2
       = e^{-(x_1^2 + x_2^2)} (1 - 2x_1^2 + x_1^4 - 2x_2^2 + 4x_1^2 x_2^2 - 2x_1^4 x_2^2 + x_2^4 - 2x_1^2 x_2^4 + x_1^4 x_2^4)

Subtracting term 2 from term 1 we obtain

Discriminant = e^{-(x_1^2 + x_2^2)} [(9 x_1^2 x_2^2 - 3 x_1^4 x_2^2 - 3 x_1^2 x_2^4 + x_1^4 x_2^4) - (1 - 2x_1^2 + x_1^4 - 2x_2^2 + 4x_1^2 x_2^2 - 2x_1^4 x_2^2 + x_2^4 - 2x_1^2 x_2^4 + x_1^4 x_2^4)]
             = e^{-(x_1^2 + x_2^2)} (-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4)

Because e^{-(x_1^2 + x_2^2)} > 0 for all (x_1, x_2), we need only consider the term in parentheses to check the sign of the discriminant. Consider each of the cases in turn.

a: Consider the critical point (0, 0). In this case the relevant term is given by

(-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4) = -1

This is negative, so this is a saddle point. We can see this in figure 20.

b: Consider the critical point (1, -1). In this case the relevant term is given by

(-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4) = 4

This is positive. The second derivative of f with respect to x_1 evaluated at (1, -1) is given by

∂^2f/∂x_1^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2
            = e^{-(x_1^2 + x_2^2)/2} (3x_1 x_2 - x_1^3 x_2)
            = e^{-1} (-3 + 1) = -2e^{-1} < 0.

Given that f_11 is negative, this point is a local maximum. We can see this in figure 21.

c: Consider the critical point (-1, 1). In this case the relevant term is given by

(-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4) = 4

This is positive. The second derivative of f with respect to x_1 evaluated at (-1, 1) is given by

FIGURE 20. Saddle point of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

∂^2f/∂x_1^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2
            = e^{-(x_1^2 + x_2^2)/2} (3x_1 x_2 - x_1^3 x_2)
            = e^{-1} (-3 + 1) = -2e^{-1} < 0.

Given that f_11 is negative, this point is a local maximum. We can see this in figure 22.

d: Consider the critical point (1, 1). In this case the relevant term is given by

(-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4) = 4

This is positive. The second derivative of f with respect to x_1 evaluated at (1, 1) is given by

∂^2f/∂x_1^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2
            = e^{-(x_1^2 + x_2^2)/2} (3x_1 x_2 - x_1^3 x_2)
            = e^{-1} (3 - 1) = 2e^{-1} > 0.

Given that the f_11 derivative is positive, this point is a local minimum. We can see this in figure 23.

FIGURE 21. Local maximum of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

FIGURE 22. Local maximum of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

e: Consider the critical point (-1, -1). In this case the relevant term is given by

(-1 + 5x_1^2 x_2^2 - x_1^4 x_2^2 - x_1^2 x_2^4 + 2x_1^2 + 2x_2^2 - x_1^4 - x_2^4) = 4

FIGURE 23. Local minimum of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}

This is positive. The second derivative of f with respect to x_1 evaluated at (-1, -1) is given by

∂^2f/∂x_1^2 = 3 e^{-(x_1^2 + x_2^2)/2} x_1 x_2 - e^{-(x_1^2 + x_2^2)/2} x_1^3 x_2
            = e^{-(x_1^2 + x_2^2)/2} (3x_1 x_2 - x_1^3 x_2)
            = e^{-1} (3 - 1) = 2e^{-1} > 0.

Given that the f_11 derivative is positive, this point is a local minimum. We can see this in figure 24.

FIGURE 24. Local minimum of the function f(x_1, x_2) = -x_1 x_2 e^{-(x_1^2 + x_2^2)/2}
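
The case-by-case classification above can be cross-checked by evaluating the discriminant and f_11 at each critical point; a sketch with sympy (illustrative only, not part of the original notes):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
y = -x1 * x2 * sp.exp(-(x1**2 + x2**2) / 2)

H = sp.hessian(y, (x1, x2))
for p in [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]:
    Hp = H.subs({x1: p[0], x2: p[1]})
    disc = sp.simplify(Hp.det())          # discriminant f11*f22 - f12**2
    f11 = sp.simplify(Hp[0, 0])
    print(p, disc, f11)
# (0, 0): disc = -1 < 0, a saddle point
# (1, 1) and (-1, -1): disc = 4*exp(-2) > 0 and f11 = 2*exp(-1) > 0, local minima
# (1, -1) and (-1, 1): disc = 4*exp(-2) > 0 and f11 = -2*exp(-1) < 0, local maxima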

4.3. In-class problems.

a: y = x 1 + x x 7 x
b: y = x x + + x +1
c: y = 16 +1x x 1 + x x
d: y = 4 x x 1 x 4
e: y = 1x1.4 x. x

5. TAYLOR SERIES

5.1. Definition of a Taylor series. Let f be a function with derivatives of all orders throughout some interval containing a as an interior point. The Taylor series generated by f at x = a is

Σ_{k=0}^{∞} [f^(k)(a)/k!] (x - a)^k = f(a) + f'(a)(x - a) + [f''(a)/2!](x - a)^2 + [f'''(a)/3!](x - a)^3 + ... + [f^(n)(a)/n!](x - a)^n + ...    (19)

5.2. Definition of a Taylor polynomial. The linearization of a differentiable function f at a point a is the polynomial

P_1(x) = f(a) + f'(a)(x - a)    (20)

For a given a, f(a) and f'(a) will be constants and we get a linear equation in x. This can be extended to higher order terms as follows. Let f be a function with derivatives of order k for k = 1, 2, ..., N in some interval containing a as an interior point. Then for any integer n from 0 through N, the Taylor polynomial of order n generated by f at x = a is the polynomial

P_n(x) = f(a) + f'(a)(x - a) + [f''(a)/2!](x - a)^2 + ... + [f^(k)(a)/k!](x - a)^k + ... + [f^(n)(a)/n!](x - a)^n    (21)

That is, the Taylor polynomial of order n is the Taylor series truncated with the term containing the nth derivative of f. This allows us to approximate a function with polynomials of higher orders.

5.3. Taylor's theorem.

Theorem 4. If a function f and its first n derivatives f', f'', ..., f^(n) are continuous on [a, b] or on [b, a], and f^(n) is differentiable on (a, b) or (b, a), then there exists a number c between a and b such that

f(b) = f(a) + f'(a)(b - a) + [f''(a)/2!](b - a)^2 + ... + [f^(n)(a)/n!](b - a)^n + [f^(n+1)(c)/(n + 1)!](b - a)^{n+1}    (22)

What this says is that we can approximate the function f at the point b by an nth order polynomial defined at the point a, with an error term defined by [f^(n+1)(c)/(n + 1)!](b - a)^{n+1}. In using Taylor's theorem we usually think of a as being fixed and treat b as an independent variable. If we replace b with x we obtain the following corollary.

Corollary 1. If a function f has derivatives of all orders in an open interval I containing a, then for each positive integer n and for each x in I,

f(x) = f(a) + f'(a)(x - a) + [f''(a)/2!](x - a)^2 + ... + [f^(n)(a)/n!](x - a)^n + R_n(x),    (23)

where

R_n(x) = [f^(n+1)(c)/(n + 1)!](x - a)^{n+1}

for some c between a and x. If R_n(x) → 0 as n → ∞ for all x in I, then we say that the Taylor series generated by f at x = a converges to f on I.
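
Taylor polynomials of increasing order are easy to generate and compare numerically; a minimal sketch with sympy (the choice of e^x expanded about a = 0 is an arbitrary illustration, not an example from the notes):

import sympy as sp

x = sp.symbols('x')
f = sp.exp(x)
a = 0

def taylor_polynomial(f, x, a, n):
    """P_n(x) = sum_{k=0}^{n} f^(k)(a)/k! * (x - a)**k, equation (21)."""
    return sum(sp.diff(f, x, k).subs(x, a) / sp.factorial(k) * (x - a)**k
               for k in range(n + 1))

for n in (1, 2, 4):
    P = sp.expand(taylor_polynomial(f, x, a, n))
    print(n, P, float(P.subs(x, 1)), float(f.subs(x, 1)))
# the polynomial values at x = 1 approach e = 2.71828... as the order n grows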

6. PROOF OF THE SECOND DERIVATIVE TEST FOR LOCAL EXTREME VALUES

Let f(x_1, x_2) have continuous partial derivatives in the open region R containing a point P(a, b) where ∂f/∂x_1 = ∂f/∂x_2 = 0. Let h and k be increments small enough to put the point S(a + h, b + k) and the line segment joining it to P inside R. We parameterize the line segment PS as

x_1 = a + th,  x_2 = b + tk,  0 ≤ t ≤ 1.    (24)

Define F(t) as f(a + th, b + tk). In this way F tracks the movement of the function f along the line segment from P to S. Now differentiate F with respect to t using the chain rule to obtain

F'(t) = (∂f/∂x_1)(dx_1/dt) + (∂f/∂x_2)(dx_2/dt) = f_1 h + f_2 k.    (25)

Because f_1 and f_2 are differentiable, F' is a differentiable function of t and we have

F''(t) = (∂F'/∂x_1)(dx_1/dt) + (∂F'/∂x_2)(dx_2/dt)
       = [∂(f_1 h + f_2 k)/∂x_1] h + [∂(f_1 h + f_2 k)/∂x_2] k
       = h^2 f_11 + 2hk f_12 + k^2 f_22,    (26)

where we use the fact that f_12 = f_21. By the definition of F, which is defined over the interval 0 ≤ t ≤ 1, and the fact that f is continuous over the open region R, it is clear that F and F' are continuous on [0, 1] and that F' is differentiable on (0, 1). We can then apply Taylor's formula (equation 22) with n = 1, a = 0 and b = 1. This will give

F(1) = F(0) + F'(0)(1 - 0) + [F''(c)/2](1 - 0)^2    (27)

Simplifying we obtain

F(1) = F(0) + F'(0)(1 - 0) + [F''(c)/2](1 - 0)^2
     = F(0) + F'(0) + F''(c)/2    (28)

for some c between 0 and 1. Now write equation 28 in terms of f. Since t = 1 in equation 28, we obtain f at the point (a + h, b + k), or

f(a + h, b + k) = f(a, b) + h f_1(a, b) + k f_2(a, b) + (1/2)(h^2 f_11 + 2hk f_12 + k^2 f_22)|_(a+ch, b+ck)    (29)

Because ∂f/∂x_1 = ∂f/∂x_2 = 0 at (a, b), equation 29 reduces to

f(a + h, b + k) - f(a, b) = (1/2)(h^2 f_11 + 2hk f_12 + k^2 f_22)|_(a+ch, b+ck)    (30)

The presence of an extremum of f at the point (a, b) is determined by the sign of the left hand side of equation 30, that is, of f(a + h, b + k) - f(a, b). If [f(a + h, b + k) - f(a, b)] < 0 then the point is a maximum. If

[f(a + h, b + k) - f(a, b)] > 0, the point is a minimum. So if the left hand side (lhs) of equation 30 is less than 0, we have a maximum, and if the left hand side of equation 30 is greater than 0, we have a minimum. Specifically,

lhs < 0  ⇒  f is at a maximum
lhs > 0  ⇒  f is at a minimum

By equation 30 this is the same as the sign of

Q(c) = (h^2 f_11 + 2hk f_12 + k^2 f_22)|_(a+ch, b+ck)    (31)

If Q(0) ≠ 0, the sign of Q(c) will be the same as the sign of Q(0) for sufficiently small values of h and k. We can predict the sign of

Q(0) = h^2 f_11(a, b) + 2hk f_12(a, b) + k^2 f_22(a, b)    (32)

from the signs of f_11 and f_11 f_22 - f_12^2 at the point (a, b). Multiply both sides of equation 32 by f_11 and rearrange to obtain

f_11 Q(0) = f_11 (h^2 f_11 + 2hk f_12 + k^2 f_22)
          = f_11^2 h^2 + 2hk f_11 f_12 + k^2 f_11 f_22
          = f_11^2 h^2 + 2hk f_11 f_12 + k^2 f_12^2 - k^2 f_12^2 + k^2 f_11 f_22
          = (h f_11 + k f_12)^2 - k^2 f_12^2 + k^2 f_11 f_22
          = (h f_11 + k f_12)^2 + (f_11 f_22 - f_12^2) k^2    (33)

Now from equation 33 we can see that

a: If f_11 < 0 and f_11 f_22 - f_12^2 > 0 at (a, b), then Q(0) < 0 for all sufficiently small values of h and k. This is clear because the right hand side of equation 33 will be positive if the discriminant is positive. With the right hand side positive, the left hand side must be positive, which implies that with f_11 < 0, Q(0) must be negative. Thus f has a local maximum at (a, b) because f evaluated at points close to (a, b) is less than f at (a, b).

b: If f_11 > 0 and f_11 f_22 - f_12^2 > 0 at (a, b), then Q(0) > 0 for all sufficiently small values of h and k. This is clear because the right hand side of equation 33 will be positive if the discriminant is positive. With the right hand side positive, the left hand side must be positive, which implies that with f_11 > 0, Q(0) must also be positive. Thus f has a local minimum at (a, b) because f evaluated at points close to (a, b) is greater than f at (a, b).

c: If f_11 f_22 - f_12^2 < 0 at (a, b), there are combinations of arbitrarily small nonzero values of h and k for which Q(0) > 0 and other values for which Q(0) < 0. Arbitrarily close to the point P(a, b, f(a, b)) on the surface y = f(x_1, x_2) there are points above P and points below P, so f has a saddle point at (a, b).

d: If f_11 f_22 - f_12^2 = 0, another test is needed. The possibility that Q(0) can be zero prevents us from drawing conclusions about the sign of Q(c).
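
The sign argument in equation 33 can be made concrete by evaluating Q(0) at a specific saddle point; a short sketch in plain Python using the second partials of Example 3 (y = x_1 x_2) at the origin:

def Q0(h, k, f11, f12, f22):
    """Q(0) = h**2*f11 + 2*h*k*f12 + k**2*f22, equation (32)."""
    return h**2 * f11 + 2 * h * k * f12 + k**2 * f22

# Example 3 at (0, 0): f11 = 0, f22 = 0, f12 = 1, so the discriminant is -1 < 0
print(Q0(0.1,  0.1, f11=0.0, f12=1.0, f22=0.0))   #  0.020000000000000004 > 0
print(Q0(0.1, -0.1, f11=0.0, f12=1.0, f22=0.0))   # -0.020000000000000004 < 0
# arbitrarily small steps (h, k) give both signs, so (0, 0) is a saddle point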