Lecture 5 Principal Minors and the Hessian




Eivind Eriksen, BI Norwegian School of Management, Department of Economics. October 1, 2010.

Principal minors

Let $A$ be a symmetric $n \times n$ matrix. We know that we can determine the definiteness of $A$ by computing its eigenvalues. Another method is to use the principal minors.

Definition. A minor of $A$ of order $k$ is principal if it is obtained by deleting $n - k$ rows and the $n - k$ columns with the same numbers. The leading principal minor of $A$ of order $k$ is the minor of order $k$ obtained by deleting the last $n - k$ rows and columns. For instance, in a principal minor where you have deleted rows 1 and 3, you should also delete columns 1 and 3.

Notation. We write $D_k$ for the leading principal minor of order $k$. There are $\binom{n}{k}$ principal minors of order $k$, and we write $\Delta_k$ for any of the principal minors of order $k$.
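
As a concrete illustration (my own sketch, not part of the original slides), the following Python snippet enumerates the principal minors of order $k$ by keeping matching row and column index sets; the test matrix is the one from the definiteness example later in this lecture.

    import numpy as np
    from itertools import combinations

    def principal_minors(A, k):
        # All principal minors of order k: keep the same k row and column indices.
        n = A.shape[0]
        return [np.linalg.det(A[np.ix_(idx, idx)])
                for idx in combinations(range(n), k)]

    def leading_principal_minor(A, k):
        # D_k: determinant of the top-left k x k submatrix.
        return np.linalg.det(A[:k, :k])

    A = np.array([[1.0, 4.0, 6.0], [4.0, 2.0, 1.0], [6.0, 1.0, 6.0]])
    print([leading_principal_minor(A, k) for k in (1, 2, 3)])  # [1.0, -14.0, -109.0]
    print(principal_minors(A, 2))                              # [-14.0, -30.0, 11.0]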

Two by two symmetric matrices

Let $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ be a symmetric $2 \times 2$ matrix. Then the leading principal minors are $D_1 = a$ and $D_2 = ac - b^2$. If we want to find all the principal minors, these are given by $\Delta_1 = a$ and $\Delta_1 = c$ (of order one) and $\Delta_2 = ac - b^2$ (of order two).

Let us compute what it means that the leading principal minors are positive for $2 \times 2$ matrices: Let $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ be a symmetric $2 \times 2$ matrix. Show that if $D_1 = a > 0$ and $D_2 = ac - b^2 > 0$, then $A$ is positive definite.

Leading principal minors: An example

If $D_1 = a > 0$ and $D_2 = ac - b^2 > 0$, then $c > 0$ also, since $ac > b^2 \ge 0$. The characteristic equation of $A$ is
$$\lambda^2 - (a + c)\lambda + (ac - b^2) = 0,$$
and it has two real solutions (since $A$ is symmetric) given by
$$\lambda = \frac{a + c}{2} \pm \frac{\sqrt{(a + c)^2 - 4(ac - b^2)}}{2}.$$
Both solutions are positive, since $a + c > \sqrt{(a + c)^2 - 4(ac - b^2)}$. This means that $A$ is positive definite.
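
A quick numerical spot-check of this claim (my own sketch; the random sampling is not from the lecture): draw symmetric $2 \times 2$ matrices, keep those with $D_1 > 0$ and $D_2 > 0$, and confirm that both eigenvalues come out positive.

    import numpy as np

    rng = np.random.default_rng(1)
    for _ in range(100):
        a, b, c = rng.uniform(-3, 3, size=3)
        if a > 0 and a * c - b**2 > 0:                 # D_1 > 0 and D_2 > 0
            A = np.array([[a, b], [b, c]])
            assert np.all(np.linalg.eigvalsh(A) > 0)   # positive definite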

Definiteness and principal minors

Theorem. Let $A$ be a symmetric $n \times n$ matrix. Then we have:
1. $A$ is positive definite $\iff$ $D_k > 0$ for all leading principal minors
2. $A$ is negative definite $\iff$ $(-1)^k D_k > 0$ for all leading principal minors
3. $A$ is positive semidefinite $\iff$ $\Delta_k \ge 0$ for all principal minors
4. $A$ is negative semidefinite $\iff$ $(-1)^k \Delta_k \ge 0$ for all principal minors

In the first two cases, it is enough to check the inequality for all the leading principal minors (i.e. for $1 \le k \le n$). In the last two cases, we must check all principal minors (i.e. for each $k$ with $1 \le k \le n$ and for each of the $\binom{n}{k}$ principal minors of order $k$).

Definiteness: An example

Determine the definiteness of the symmetric $3 \times 3$ matrix
$$A = \begin{pmatrix} 1 & 4 & 6 \\ 4 & 2 & 1 \\ 6 & 1 & 6 \end{pmatrix}.$$
One may try to compute the eigenvalues of $A$. However, the characteristic equation is
$$\det(A - \lambda I) = (1 - \lambda)(\lambda^2 - 8\lambda + 11) - 4(18 - 4\lambda) + 6(6\lambda - 8) = 0.$$
This equation (of degree three with no obvious factorization) seems difficult to solve!

Definiteness: An example (continued)

Let us instead try to use the leading principal minors. They are:
$$D_1 = 1, \quad D_2 = \begin{vmatrix} 1 & 4 \\ 4 & 2 \end{vmatrix} = -14, \quad D_3 = \begin{vmatrix} 1 & 4 & 6 \\ 4 & 2 & 1 \\ 6 & 1 & 6 \end{vmatrix} = -109.$$
Let us compare with the criteria in the theorem:
- Positive definite: $D_1 > 0$, $D_2 > 0$, $D_3 > 0$
- Negative definite: $D_1 < 0$, $D_2 > 0$, $D_3 < 0$
- Positive semidefinite: $\Delta_1 \ge 0$, $\Delta_2 \ge 0$, $\Delta_3 \ge 0$ for all principal minors
- Negative semidefinite: $\Delta_1 \le 0$, $\Delta_2 \ge 0$, $\Delta_3 \le 0$ for all principal minors

The leading principal minors we have computed do not fit any of these criteria. We can therefore conclude that $A$ is indefinite.

Definiteness: Another example

Determine the definiteness of the quadratic form
$$Q(x_1, x_2, x_3) = 3x_1^2 + 6x_1x_3 + x_2^2 - 4x_2x_3 + 8x_3^2.$$
The symmetric matrix associated with the quadratic form $Q$ is
$$A = \begin{pmatrix} 3 & 0 & 3 \\ 0 & 1 & -2 \\ 3 & -2 & 8 \end{pmatrix}.$$
Since the leading principal minors are positive, $Q$ is positive definite:
$$D_1 = 3, \quad D_2 = \begin{vmatrix} 3 & 0 \\ 0 & 1 \end{vmatrix} = 3, \quad D_3 = \begin{vmatrix} 3 & 0 & 3 \\ 0 & 1 & -2 \\ 3 & -2 & 8 \end{vmatrix} = 3.$$
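
The theorem turns directly into a brute-force test. Here is a hedged Python sketch (the function names are mine, not from the lecture) that applies the leading-minor tests first and falls back to checking every principal minor for the semidefinite cases; it reproduces the conclusions of both examples.

    import numpy as np
    from itertools import combinations

    def classify(A, tol=1e-12):
        n = A.shape[0]
        D = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
        if all(d > tol for d in D):
            return "positive definite"
        if all((-1) ** k * d > tol for k, d in enumerate(D, start=1)):
            return "negative definite"
        # Semidefinite cases need every principal minor of every order.
        pm = {k: [np.linalg.det(A[np.ix_(c, c)]) for c in combinations(range(n), k)]
              for k in range(1, n + 1)}
        if all(m >= -tol for ms in pm.values() for m in ms):
            return "positive semidefinite"
        if all((-1) ** k * m >= -tol for k, ms in pm.items() for m in ms):
            return "negative semidefinite"
        return "indefinite"

    A = np.array([[1.0, 4.0, 6.0], [4.0, 2.0, 1.0], [6.0, 1.0, 6.0]])
    B = np.array([[3.0, 0.0, 3.0], [0.0, 1.0, -2.0], [3.0, -2.0, 8.0]])
    print(classify(A), classify(B))  # indefinite, positive definite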

Optimization of functions of several variables

The first part of this course was centered around matrices and linear algebra. The next part will be centered around optimization problems for functions $f(x_1, x_2, \dots, x_n)$ of several variables.

C² functions. Let $f(x_1, x_2, \dots, x_n)$ be a function in $n$ variables. We say that $f$ is $C^2$ if all second order partial derivatives of $f$ exist and are continuous. All functions that we meet in this course are $C^2$ functions, so we shall not check this in each case.

Notation. We use the notation $\mathbf{x} = (x_1, x_2, \dots, x_n)$. Hence we write $f(\mathbf{x})$ in place of $f(x_1, x_2, \dots, x_n)$.

Stationary points of functions in several variables

The partial derivatives of $f(\mathbf{x})$ are written $f'_1, f'_2, \dots, f'_n$. The stationary points of $f$ are the solutions of the first order conditions:

Definition. Let $f(\mathbf{x})$ be a function in $n$ variables. We say that $\mathbf{x}$ is a stationary point of $f$ if
$$f'_1(\mathbf{x}) = f'_2(\mathbf{x}) = \dots = f'_n(\mathbf{x}) = 0.$$

Let us look at an example: Let $f(x_1, x_2) = x_1^2 - x_2^2 - x_1x_2$. Find the stationary points of $f$.

Stationary points: An example

We compute the partial derivatives of $f(x_1, x_2) = x_1^2 - x_2^2 - x_1x_2$ to be
$$f'_1(x_1, x_2) = 2x_1 - x_2, \quad f'_2(x_1, x_2) = -2x_2 - x_1.$$
Hence the stationary points are the solutions of the following linear system:
$$2x_1 - x_2 = 0$$
$$-x_1 - 2x_2 = 0$$
Since $\det \begin{pmatrix} 2 & -1 \\ -1 & -2 \end{pmatrix} = -5 \ne 0$, the only stationary point is $\mathbf{x} = (0, 0)$.

Given a stationary point of $f(\mathbf{x})$, how do we determine its type? Is it a local minimum, a local maximum or perhaps a saddle point?

The Hessian matrix

Let $f(\mathbf{x})$ be a function in $n$ variables. The Hessian matrix of $f$ is the matrix consisting of all the second order partial derivatives of $f$:

Definition. The Hessian matrix of $f$ at the point $\mathbf{x}$ is the $n \times n$ matrix
$$f''(\mathbf{x}) = \begin{pmatrix} f''_{11}(\mathbf{x}) & f''_{12}(\mathbf{x}) & \dots & f''_{1n}(\mathbf{x}) \\ f''_{21}(\mathbf{x}) & f''_{22}(\mathbf{x}) & \dots & f''_{2n}(\mathbf{x}) \\ \vdots & \vdots & \ddots & \vdots \\ f''_{n1}(\mathbf{x}) & f''_{n2}(\mathbf{x}) & \dots & f''_{nn}(\mathbf{x}) \end{pmatrix}.$$
Notice that each entry in the Hessian matrix is a second order partial derivative, and therefore itself a function of $\mathbf{x}$.

The Hessian matrix: An example

Compute the Hessian matrix of the quadratic form $f(x_1, x_2) = x_1^2 - x_2^2 - x_1x_2$.

We computed the first order partial derivatives above, and found that
$$f'_1(x_1, x_2) = 2x_1 - x_2, \quad f'_2(x_1, x_2) = -2x_2 - x_1.$$
Hence we get
$$f''_{11}(x_1, x_2) = 2, \quad f''_{12}(x_1, x_2) = -1, \quad f''_{21}(x_1, x_2) = -1, \quad f''_{22}(x_1, x_2) = -2.$$

The Hessian matrix: An example (continued)

The Hessian matrix is therefore given by
$$f''(\mathbf{x}) = \begin{pmatrix} 2 & -1 \\ -1 & -2 \end{pmatrix}.$$
The following fact is useful to notice, as it will simplify our computations in the future:

Proposition. If $f(\mathbf{x})$ is a $C^2$ function, then the Hessian matrix is symmetric.

The proof of this fact is quite technical, and we will skip it in the lecture.
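
As a cross-check (my own sketch using the sympy library, not part of the lecture), the gradient and Hessian can be computed symbolically; sympy's `hessian` returns the matrix of second order partials, and the symmetry stated in the proposition is visible directly.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1**2 - x2**2 - x1*x2

    grad = [sp.diff(f, v) for v in (x1, x2)]  # [2*x1 - x2, -x1 - 2*x2]
    H = sp.hessian(f, (x1, x2))               # Matrix([[2, -1], [-1, -2]])
    print(grad, H, H == H.T)                  # symmetric, as the proposition states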

Convexity and the Hessian

Definition. A subset $S \subseteq \mathbf{R}^n$ is open if it is without boundary. Subsets defined by inequalities with $<$ or $>$ are usually open. Also, $\mathbf{R}^n$ is open by definition.

The set $S = \{(x_1, x_2) : x_1 > 0, x_2 > 0\} \subseteq \mathbf{R}^2$ is an open set. Its boundary consists of the positive $x_1$- and $x_2$-axes, and these are not part of $S$.

Theorem. Let $f(\mathbf{x})$ be a $C^2$ function in $n$ variables defined on an open convex set $S$. Then we have:
1. $f''(\mathbf{x})$ is positive semidefinite for all $\mathbf{x} \in S$ $\iff$ $f$ is convex in $S$
2. $f''(\mathbf{x})$ is negative semidefinite for all $\mathbf{x} \in S$ $\iff$ $f$ is concave in $S$
3. $f''(\mathbf{x})$ is positive definite for all $\mathbf{x} \in S$ $\implies$ $f$ is strictly convex in $S$
4. $f''(\mathbf{x})$ is negative definite for all $\mathbf{x} \in S$ $\implies$ $f$ is strictly concave in $S$

Convexity: An example

Is $f(x_1, x_2) = x_1^2 - x_2^2 - x_1x_2$ convex or concave? We computed the Hessian of this function earlier. It is given by
$$f''(\mathbf{x}) = \begin{pmatrix} 2 & -1 \\ -1 & -2 \end{pmatrix}.$$
Since the leading principal minors are $D_1 = 2$ and $D_2 = -5$, the Hessian is neither positive semidefinite nor negative semidefinite. This means that $f$ is neither convex nor concave. Notice that since $f$ is a quadratic form, we could also have used the symmetric matrix of the quadratic form to conclude this.
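
Since this Hessian is constant, one eigenvalue computation settles the question; a minimal numerical check (my own, not from the slides):

    import numpy as np

    H = np.array([[2.0, -1.0], [-1.0, -2.0]])
    print(np.linalg.eigvalsh(H))  # approx [-2.236, 2.236]: mixed signs, so indefinite
    print(np.linalg.det(H))       # D_2 = -5, consistent with the minor test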

Convexity: Another example

Is $f(x_1, x_2) = 2x_1 - x_2 - x_1^2 + x_1x_2 - x_2^2$ convex or concave? We compute the Hessian of $f$. We have that $f'_1 = 2 - 2x_1 + x_2$ and $f'_2 = -1 + x_1 - 2x_2$. This gives the Hessian
$$f''(\mathbf{x}) = \begin{pmatrix} -2 & 1 \\ 1 & -2 \end{pmatrix}.$$
Since the leading principal minors are $D_1 = -2$ and $D_2 = 3$, the Hessian is negative definite. This means that $f$ is strictly concave. Notice that since $f$ is a sum of a quadratic and a linear form, we could also have used earlier results about quadratic forms and properties of convex functions to conclude this.

Convexity: An example of degree three

Is $f(x_1, x_2, x_3) = x_1 + x_2^2 + x_3^3 - x_1x_2 - 3x_3$ convex or concave? We compute the Hessian of $f$. We have that $f'_1 = 1 - x_2$, $f'_2 = 2x_2 - x_1$ and $f'_3 = 3x_3^2 - 3$. This gives the Hessian matrix
$$f''(\mathbf{x}) = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 6x_3 \end{pmatrix}.$$
Since the leading principal minors are $D_1 = 0$, $D_2 = -1$ and $D_3 = -6x_3$, the Hessian is indefinite for all $\mathbf{x}$. This means that $f$ is neither convex nor concave.
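
Because the Hessian now depends on $x_3$, a numerical check has to sample several points; the sketch below (mine, not from the lecture) shows mixed-sign eigenvalues at every sample, matching the conclusion that the Hessian is indefinite for all $\mathbf{x}$.

    import numpy as np

    def hessian_f(x3):
        # Hessian of f(x1, x2, x3) = x1 + x2^2 + x3^3 - x1*x2 - 3*x3
        return np.array([[0.0, -1.0, 0.0],
                         [-1.0, 2.0, 0.0],
                         [0.0, 0.0, 6.0 * x3]])

    for x3 in (-1.0, 0.0, 1.0):
        # Each sample has at least one negative and one positive eigenvalue.
        print(x3, np.linalg.eigvalsh(hessian_f(x3)))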

Extremal points

Definition. Let $f(\mathbf{x})$ be a function in $n$ variables defined on a set $S \subseteq \mathbf{R}^n$. A point $\mathbf{x}^* \in S$ is called a global maximum point for $f$ if
$$f(\mathbf{x}^*) \ge f(\mathbf{x}) \text{ for all } \mathbf{x} \in S,$$
and a global minimum point for $f$ if
$$f(\mathbf{x}^*) \le f(\mathbf{x}) \text{ for all } \mathbf{x} \in S.$$
If $\mathbf{x}^*$ is a maximum or minimum point for $f$, then we call $f(\mathbf{x}^*)$ the maximum or minimum value.

How do we find global maximum and minimum points, or global extremal points, for a given function $f(\mathbf{x})$?

First order conditions

Proposition. If $\mathbf{x}^*$ is a global maximum or minimum point of $f$ that is not on the boundary of $S$, then $\mathbf{x}^*$ is a stationary point.

This means that the stationary points are candidates for being global extremal points. On the other hand, we have the following important result:

Theorem. Suppose that $f(\mathbf{x})$ is a function of $n$ variables defined on a convex set $S \subseteq \mathbf{R}^n$. Then we have:
1. If $f$ is convex, then all stationary points are global minimum points.
2. If $f$ is concave, then all stationary points are global maximum points.

Global extremal points: An example

Show that the function $f(x_1, x_2) = 2x_1 - x_2 - x_1^2 + x_1x_2 - x_2^2$ has a global maximum.

Earlier, we found that this function is (strictly) concave, so all stationary points are global maximum points. We must find the stationary points. We computed the first order partial derivatives of $f$ earlier, and use them to write down the first order conditions:
$$f'_1 = 2 - 2x_1 + x_2 = 0, \quad f'_2 = -1 + x_1 - 2x_2 = 0.$$

Global extremal points: An example (continued)

This leads to the linear system
$$2x_1 - x_2 = 2$$
$$x_1 - 2x_2 = 1$$
We solve this linear system, and find that $(x_1, x_2) = (1, 0)$ is the unique stationary point of $f$. This means that $(x_1, x_2) = (1, 0)$ is a global maximum point for $f$.

Find all extremal points of the function $f$ given by
$$f(x, y, z) = x^2 + 2y^2 + 3z^2 + 2xy + 2xz + 3.$$
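
The stationary point can be confirmed by solving the first order conditions as a linear system (a minimal sketch, mine rather than the lecture's):

    import numpy as np

    # First order conditions of f(x1, x2) = 2x1 - x2 - x1^2 + x1*x2 - x2^2,
    # rearranged as 2x1 - x2 = 2 and x1 - 2x2 = 1.
    A = np.array([[2.0, -1.0], [1.0, -2.0]])
    b = np.array([2.0, 1.0])
    print(np.linalg.solve(A, b))  # [1. 0.]: the global maximum point (1, 0)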

Global extremal points: Another example

We first find the partial derivatives of $f$:
$$f'_x = 2x + 2y + 2z, \quad f'_y = 4y + 2x, \quad f'_z = 6z + 2x.$$
This gives the Hessian matrix
$$f''(\mathbf{x}) = \begin{pmatrix} 2 & 2 & 2 \\ 2 & 4 & 0 \\ 2 & 0 & 6 \end{pmatrix}.$$
We see that the leading principal minors are $D_1 = 2$, $D_2 = 4$ and $D_3 = 8$, so $f$ is a (strictly) convex function and all stationary points are global minimum points. We therefore compute the stationary points by solving the first order conditions $f'_x = f'_y = f'_z = 0$.

Global extremal points: Another example (continued)

The first order conditions give the linear system
$$2x + 2y + 2z = 0$$
$$2x + 4y = 0$$
$$2x + 6z = 0$$
We must solve this linear system. Since the $3 \times 3$ determinant of the coefficient matrix is $8 \ne 0$, the system has only one solution, $\mathbf{x} = (0, 0, 0)$. This means that $(x, y, z) = (0, 0, 0)$ is a global minimum point for $f$.

Find all extremal points of the function
$$f(x_1, x_2, x_3) = x_1^4 + x_2^4 + x_3^4 + x_1^2 + x_2^2 + x_3^2.$$
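
For this quadratic-plus-constant $f$, the coefficient matrix of the first order conditions is exactly the Hessian, so one matrix serves both checks; a short sketch (my own, not from the lecture):

    import numpy as np

    H = np.array([[2.0, 2.0, 2.0],
                  [2.0, 4.0, 0.0],
                  [2.0, 0.0, 6.0]])
    print([np.linalg.det(H[:k, :k]) for k in (1, 2, 3)])  # [2.0, 4.0, 8.0]: positive definite
    print(np.linalg.solve(H, np.zeros(3)))                # [0. 0. 0.]: the global minimum point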

Global extremal points: More examples

The function $f(x_1, x_2, x_3) = x_1^4 + x_2^4 + x_3^4 + x_1^2 + x_2^2 + x_3^2$ has one global minimum point, $(0, 0, 0)$. See the lecture notes for details.

So far, all examples were functions defined on all of $\mathbf{R}^n$. Let us look at one example where the function is defined on a smaller subset: Show that $S = \{(x_1, x_2) : x_1 > 0, x_2 > 0\}$ is a convex set, and that the function $f(x_1, x_2) = x_1^{1/3} x_2^{1/2}$ defined on $S$ is a concave function. See the lecture notes for details.
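
Assuming the Cobb-Douglas form $f(x_1, x_2) = x_1^{1/3} x_2^{1/2}$ given above (the exact exponents are reconstructed from a damaged formula), concavity on $S$ can be spot-checked by sampling the Hessian at random points of $S$; a sketch of my own, not from the lecture notes:

    import numpy as np

    def hessian(x, y):
        # Hessian of the assumed f(x, y) = x**(1/3) * y**(1/2) on x, y > 0
        fxx = (1/3) * (1/3 - 1) * x**(1/3 - 2) * y**(1/2)
        fyy = (1/2) * (1/2 - 1) * x**(1/3) * y**(1/2 - 2)
        fxy = (1/3) * (1/2) * x**(1/3 - 1) * y**(1/2 - 1)
        return np.array([[fxx, fxy], [fxy, fyy]])

    rng = np.random.default_rng(0)
    for x, y in rng.uniform(0.1, 10.0, size=(5, 2)):
        # Both eigenvalues are <= 0 at every sampled point: negative semidefinite.
        assert np.all(np.linalg.eigvalsh(hessian(x, y)) <= 0)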