MATH2640 Introduction to Optimisation

3. Quadratic forms and Eigenvalues, Economic applications, Constrained Optimisation, Lagrange Multipliers, Bordered Hessians: summary

3(A) Quadratic forms and eigenvalues

(i) Eigenvalues are the solutions, $\lambda$, of the matrix equation

$$Ax = \lambda x. \qquad (3.1)$$

In general, an $n \times n$ matrix has $n$ eigenvalues, though there may be fewer than $n$ in special cases. The vectors $x$ satisfying this equation are called the eigenvectors. For each eigenvalue $\lambda$ there is a corresponding eigenvector $x$.

(ii) $2 \times 2$ case. The quadratic form associated with the symmetric matrix $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is

$$Q(x, y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + cy^2.$$

To find the eigenvalues, we write equation (3.1) as $(A - \lambda I)x = 0$, where $I$ is the unit (identity) matrix. Then for this to have nontrivial solutions,

$$\begin{vmatrix} a - \lambda & b \\ b & c - \lambda \end{vmatrix} = 0, \quad \text{so} \quad \lambda^2 - (a + c)\lambda + ac - b^2 = 0$$

is the quadratic equation whose solutions are the two eigenvalues of the matrix $A$. These eigenvalues are always real for a symmetric matrix.

(iii) The same methods work for an $n \times n$ symmetric matrix $A$. Quadratic forms are always associated with symmetric matrices, where $A_{ij} = A_{ji}$, and symmetric matrices always have real eigenvalues. If there are $n$ distinct eigenvalues then the eigenvectors of a symmetric matrix are always orthogonal to each other, $x_i \cdot x_j = 0$ for any $i \neq j$, with $i$ and $j$ lying between 1 and $n$.

(iv) Rules for classifying quadratic forms using the eigenvalue method.

If all eigenvalues are $> 0$, $Q$ is positive definite (PD).
If all eigenvalues are $< 0$, $Q$ is negative definite (ND).
If some eigenvalues are $> 0$ and some are $< 0$, $Q$ is indefinite (I).
If all eigenvalues are $\geq 0$, $Q$ is positive semi-definite (PSD).
If all eigenvalues are $\leq 0$, $Q$ is negative semi-definite (NSD).

(v) Normalising eigenvectors. To find the eigenvector corresponding to a particular eigenvalue, solve $(A - \lambda I)x = 0$ for that value of $\lambda$. You can only solve these equations up to an arbitrary constant $k$.

Example: $\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$ has eigenvalues $\lambda = 2$ and $\lambda = 4$. To find the eigenvector corresponding to $\lambda = 2$, solve

$$x_1 + x_2 = 0, \qquad x_1 + x_2 = 0,$$

giving $(k, -k)$ as the eigenvector. To normalise this, choose $k$ so that $(k, -k)$ has unit length, $2k^2 = 1$, so the normalised eigenvector is $(1/\sqrt{2}, -1/\sqrt{2})$. The normalised eigenvector corresponding to $\lambda = 4$ is $(1/\sqrt{2}, 1/\sqrt{2})$.

(vi) Normal forms. When the normalised eigenvectors corresponding to the $n$ eigenvalues $\lambda^{(1)}, \lambda^{(2)}, \dots, \lambda^{(n)}$ are found, $(y_1^{(1)}, y_2^{(1)}, \dots, y_n^{(1)})$; $(y_1^{(2)}, y_2^{(2)}, \dots, y_n^{(2)})$, etc., then the quadratic form $Q$ can be written as a sum of squares,

$$Q = \lambda^{(1)} \left( x_1 y_1^{(1)} + x_2 y_2^{(1)} + \cdots \right)^2 + \lambda^{(2)} \left( x_1 y_1^{(2)} + x_2 y_2^{(2)} + \cdots \right)^2 + \cdots$$

which is said to be the normal form of the quadratic form $Q$. For the above example,

$$Q = 3x_1^2 + 2x_1 x_2 + 3x_2^2 = 2\left( \frac{x_1}{\sqrt{2}} - \frac{x_2}{\sqrt{2}} \right)^2 + 4\left( \frac{x_1}{\sqrt{2}} + \frac{x_2}{\sqrt{2}} \right)^2.$$

Check this by multiplying it out. It is now obvious that positive eigenvalues make $Q$ positive definite.
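The eigenvalue method of 3(A) can be tried out numerically. The following is a minimal Python sketch, not part of the original notes: the function names `eigen_2x2_symmetric`, `classify`, `q_direct` and `q_normal`, and the tolerance `tol`, are illustrative choices. It solves the quadratic $\lambda^2 - (a + c)\lambda + ac - b^2 = 0$, applies the rules of (iv), and compares the two expressions for $Q$ in the worked example.

```python
import math


def eigen_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] from lambda^2 - (a + c)*lambda + (a*c - b^2) = 0."""
    trace, det = a + c, a * c - b * b
    # The discriminant equals (a - c)^2 + 4*b^2, so it is never negative:
    # this is why a symmetric 2 x 2 matrix always has real eigenvalues.
    root = math.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2


def classify(eigenvalues, tol=1e-12):
    """Apply the eigenvalue rules of 3(A)(iv); tol is an assumed rounding tolerance."""
    pos = any(v > tol for v in eigenvalues)
    neg = any(v < -tol for v in eigenvalues)
    zero = any(abs(v) <= tol for v in eigenvalues)
    if pos and neg:
        return "indefinite"
    if pos:
        return "positive semi-definite" if zero else "positive definite"
    if neg:
        return "negative semi-definite" if zero else "negative definite"
    return "zero form (both PSD and NSD)"


def q_direct(x1, x2):
    """Q = 3*x1^2 + 2*x1*x2 + 3*x2^2, the worked example."""
    return 3 * x1**2 + 2 * x1 * x2 + 3 * x2**2


def q_normal(x1, x2):
    """The same Q written in normal form using the normalised eigenvectors."""
    s = math.sqrt(2)
    return 2 * ((x1 - x2) / s) ** 2 + 4 * ((x1 + x2) / s) ** 2
```

Calling `eigen_2x2_symmetric(3, 1, 3)` should reproduce the eigenvalues 2 and 4 of the example, and `q_direct` and `q_normal` should agree at any point up to rounding.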

3(B) Unconstrained Optimisation in Economics

(i) Production functions. A firm produces $Q$ items per year, which sell at a price $p$ per item. The Revenue is $R = pQ$. $Q$ depends on quantities $x_1, x_2$, etc., which are inputs such as the number of employees, amount spent on equipment, and so on. The vector $(x_1, x_2, \dots, x_n)$ is called the input bundle vector. The Cost function measures how much a firm spends on production, and this will also be a function of $x_1, x_2$ and so on. The profit is

$$\Pi = R - C = pQ(x_1, x_2, \dots, x_n) - C(x_1, x_2, \dots, x_n).$$

To maximise the profit, we find the maximum of $\Pi$ as a function of the inputs $x_1, \dots, x_n$, leading to

$$p \frac{\partial Q}{\partial x_i} = \frac{\partial C}{\partial x_i}, \quad i = 1, \dots, n.$$

Problems where a firm produces several different products can also lead to optimisation problems.

(ii) The discriminating monopolist. A monopolist produces two products at rates $Q_1$ and $Q_2$. The price obtained is not constant, but reduces if the monopolist floods the market, so

$$p_1 = a_1 - b_1 Q_1, \quad \text{and} \quad p_2 = a_2 - b_2 Q_2$$

are the prices for the two products. The Cost function is taken as $C = c_1 Q_1 + c_2 Q_2$, being simply proportional to the quantity produced. The profit is

$$\Pi = R - C = p_1 Q_1 + p_2 Q_2 - c_1 Q_1 - c_2 Q_2 = (a_1 - c_1) Q_1 + (a_2 - c_2) Q_2 - b_1 Q_1^2 - b_2 Q_2^2,$$

so there is a best choice of $Q_1$ and $Q_2$ to maximise profit, when

$$\Pi_{Q_1} = (a_1 - c_1) - 2 b_1 Q_1 = 0, \quad \text{and} \quad \Pi_{Q_2} = (a_2 - c_2) - 2 b_2 Q_2 = 0.$$

Inspection of the Hessian shows that $\Pi$ has a maximum here provided $b_1$ and $b_2$ are positive. We then also require $a_1 > c_1$ and $a_2 > c_2$, so that the profit-maximising quantities $Q_1 = (a_1 - c_1)/2b_1$ and $Q_2 = (a_2 - c_2)/2b_2$ are positive. The monopolist then has a profit maximising strategy allowing discrimination between the amounts of the two products to be produced. The same approach can be used when the cost function or the pricing is a more complicated function of the $Q_i$.

(iii) The Cobb-Douglas production function. A commonly used model in microeconomics is the Cobb-Douglas production function

$$Q(x_1, x_2) = x_1^a x_2^b, \quad \text{where } a > 0, \ b > 0.$$

The input bundle here is then just a two-dimensional vector, and we assume here that the price $p$ is independent of $Q$, unlike in the monopolist problem. If the Cost is linear in the input bundle, $C = w_1 x_1 + w_2 x_2$, then the Profit is

$$\Pi = p x_1^a x_2^b - w_1 x_1 - w_2 x_2$$

and the conditions for a stationary point are

$$a p x_1^{a-1} x_2^b = w_1, \qquad b p x_1^a x_2^{b-1} = w_2,$$

giving positive critical values $x_1^* = apQ^*/w_1$ and $x_2^* = bpQ^*/w_2$, where $Q^*$ is the production function at the critical value. We want to know when this stationary value is a maximum, so we must examine the Hessian

$$H = \begin{pmatrix} a(a-1) p x_1^{a-2} x_2^b & ab p x_1^{a-1} x_2^{b-1} \\ ab p x_1^{a-1} x_2^{b-1} & b(b-1) p x_1^a x_2^{b-2} \end{pmatrix}$$

which is negative definite provided

$$LPM_1 = a(a-1) p x_1^{a-2} x_2^b < 0 \quad \text{and} \quad LPM_2 = ab(1 - a - b) p^2 x_1^{2a-2} x_2^{2b-2} > 0,$$

which requires $0 < a < 1$ and $0 < b < 1$ and $a + b < 1$. If these conditions are satisfied, then the Cobb-Douglas production function gives a profit maximising strategy. If they are not satisfied, the profit increases indefinitely as $x_1$ and $x_2$ get very large. This is possible economic behaviour: if there is no profit maximising strategy, all it means is that the bigger the firm the more profit it makes, which is of course entirely possible.
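As a numerical illustration of the Cobb-Douglas analysis, here is a minimal Python sketch. It is an illustration under stated assumptions rather than part of the notes: the function names and the sample parameter values are hypothetical, and the closed form for the stationary point (obtained by dividing the two first-order conditions, which gives $x_2/x_1 = b w_1 / a w_2$) assumes $a + b \neq 1$.

```python
def cobb_douglas_stationary_point(a, b, p, w1, w2):
    """Solve a*p*x1^(a-1)*x2^b = w1 and b*p*x1^a*x2^(b-1) = w2 for (x1, x2).

    Assumes a + b != 1, otherwise the exponent below is undefined.
    """
    k = (b * w1) / (a * w2)  # ratio x2 / x1, from dividing the two conditions
    x1 = (w1 / (a * p * k**b)) ** (1.0 / (a + b - 1.0))
    return x1, k * x1


def hessian_lpms(a, b, p, x1, x2):
    """Leading principal minors LPM_1 and LPM_2 of the Hessian of the profit."""
    h11 = a * (a - 1) * p * x1 ** (a - 2) * x2**b
    h12 = a * b * p * x1 ** (a - 1) * x2 ** (b - 1)
    h22 = b * (b - 1) * p * x1**a * x2 ** (b - 2)
    return h11, h11 * h22 - h12 * h12


def is_profit_maximum(a, b, p, x1, x2):
    """Negative definite Hessian: LPM_1 < 0 and LPM_2 > 0."""
    lpm1, lpm2 = hessian_lpms(a, b, p, x1, x2)
    return lpm1 < 0 and lpm2 > 0


# Hypothetical sample values: a = b = 1/4, p = 4, w1 = w2 = 1.
x1, x2 = cobb_douglas_stationary_point(0.25, 0.25, 4.0, 1.0, 1.0)
profit = 4.0 * x1**0.25 * x2**0.25 - x1 - x2
```

With these sample values $a + b < 1$, so the stationary point should pass the negative definiteness test; choosing $a + b > 1$ instead should make `is_profit_maximum` return `False` at the stationary point.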

3(C) Constrained Optimisation: Lagrange multipliers

Normally, economic activity is constrained, e.g. expenditure is constrained by available income. The constraints are applied to the input bundle, and may take many different forms. Often constraints are in the form of inequalities, e.g. we can spend up to so much on equipment but no more. However, it is often simpler to solve problems with equality constraints, e.g. we are allowed to spend 1000 pounds on equipment, and we will spend it all, so the equipment expenditure $x_1 = 1000$ is an equality constraint. In general, we are trying to maximise the objective function $f(x_1, x_2, \dots, x_n)$ subject to the $k$ inequality constraints

$$g_i(x_1, x_2, \dots, x_n) \leq b_i, \quad 1 \leq i \leq k,$$

and subject to the $m$ equality constraints

$$h_i(x_1, x_2, \dots, x_n) = c_i, \quad 1 \leq i \leq m.$$

(i) Single Equality constraint. Maximise $f(x)$ subject to $h(x) = c$. At a stationary point of this problem, the constraint surface must touch the level surfaces of $f$. To see this, note that if $dx$ is a small displacement from the critical point $x^*$, then if the small displacement lies in the constraint surface, $dx \cdot \nabla h = 0$. But if this displacement is at a critical point, $f$ must not change as we move from $x^*$ to $x^* + dx$, so we must have $dx \cdot \nabla f = 0$ also. Since this has to be true for all small displacements on the constraint surface, $\nabla h$ is parallel to $\nabla f$ at $x^*$. When two vectors are parallel, there is a constant $\lambda$ such that

$$\nabla f = \lambda \nabla h, \quad \text{or} \quad \nabla (f - \lambda h) = 0.$$

The constant $\lambda$ is called the Lagrange multiplier. $L = f - \lambda h$ is called the Lagrangian. To find the stationary (critical) points of $f(x)$ subject to $h(x) - c = 0$, we solve the $n + 1$ equations

$$\frac{\partial f}{\partial x_i} - \lambda \frac{\partial h}{\partial x_i} = 0, \quad i = 1, \dots, n, \qquad h(x) = c,$$

for the $n + 1$ unknowns $x_1, x_2, \dots, x_n, \lambda$. As usual, we must be very careful to get all the solutions of these $n + 1$ equations, and not to miss some (or include bogus solutions).

Example: Maximise $f(x, y) = xy$ subject to $h(x, y) = x + 4y - 16 = 0$.

Solution: The Lagrangian is $L = f - \lambda h = xy - \lambda(x + 4y - 16)$. The equations we need are

$$L_x = 0, \ \text{or} \ y - \lambda = 0 \quad (A)$$
$$L_y = 0, \ \text{or} \ x - 4\lambda = 0 \quad (B)$$
$$x + 4y = 16 \quad (C)$$

The only critical point is $x = 8$, $y = 2$ and $\lambda = 2$. At this point, $f = 16$, and this is the maximum value of $f$ subject to this constraint. Strictly speaking, we don't yet know it's a maximum, because we have not examined the Hessian (see the section on Bordered Hessians below). However, if we look at other values of $x$ and $y$ that satisfy the constraint, such as $x = 0$, $y = 4$, we see they give lower values of $f$, which strongly suggests that the critical point is a maximum.

(ii) The non-degenerate constraint qualification, NDCQ. The above algorithm for finding stationary points normally works well, but the argument breaks down if $\nabla h = 0$ at the critical point. In this case, the constraint surface doesn't have a uniquely defined normal, so we can't say that $\nabla f$ must be parallel to $\nabla h$. We should therefore check that at each critical point we find, the constraint qualification, usually called the non-degenerate constraint qualification NDCQ, holds, i.e. $\nabla h \neq 0$ at each critical point.

What happens if it doesn't hold, i.e. $\nabla h = 0$ at the critical point? Usually the Lagrange equations can't be solved.

Example: Maximise $f = x$ subject to $h = x^3 + y^2 = 0$.

Solution: $L = x - \lambda(x^3 + y^2)$, so $L_x = 0$ and $L_y = 0$ give

$$1 = 3\lambda x^2 \quad (A); \qquad 2\lambda y = 0 \quad (B); \qquad x^3 = -y^2 \quad (C).$$

From (A), $\lambda \neq 0$, so from (B), $y = 0$. Then from (C), $x = 0$. $x = y = 0$ is actually the maximising point, as we can see from the constraint (C): $y^2$ must be $\geq 0$, so $x \leq 0$, so $f = 0$ is the largest value of $f$. However, $x = y = 0$ is not a well-defined solution of (A), (B) and (C) because if $x = 0$ equation (A) reads $1 = 0$! The problem is that $\nabla h = (3x^2, 2y)$ happens to be zero at the critical point $(0, 0)$, so the NDCQ is not satisfied at this critical point. If the NDCQ is not satisfied at the critical point, we have to use a different method to find the maximum, other than Lagrange's method. In this simple problem, we could just eliminate $x$ using the constraint, so $x = (-y^2)^{1/3}$, and sketch $f = (-y^2)^{1/3}$ as a function of $y$ to find the maximum. In general, however, the problem is much more difficult if the NDCQ is not satisfied.

(iii) Several equality constraints. If we have $m$ equality constraints, $h_i(x) - c_i = 0$, $1 \leq i \leq m$, then we need $m$ Lagrange multipliers, and the Lagrangian $L$ is

$$L = f - \lambda_1 h_1 - \lambda_2 h_2 - \cdots - \lambda_m h_m.$$

Then the $n + m$ equations to be solved are

$$\frac{\partial L}{\partial x_i} = 0, \quad i = 1, \dots, n, \qquad h_i(x) = c_i, \quad i = 1, \dots, m,$$

for the $n + m$ unknowns $x_1, \dots, x_n$ and $\lambda_1, \dots, \lambda_m$. If any constraint is simple, it may well be easier to use that constraint to eliminate a variable before finding the stationary points.

Example: Maximise $f = x + y + z$ subject to the constraints $x^2 + y^2 = 1$ and $z = 2$.

Solution: Just using the general Lagrange method mechanically,

$$L = x + y + z - \lambda_1 (x^2 + y^2 - 1) - \lambda_2 (z - 2),$$

and the five equations we need to solve are

$$1 - 2\lambda_1 x = 0, \quad 1 - 2\lambda_1 y = 0, \quad 1 - \lambda_2 = 0, \quad x^2 + y^2 = 1, \quad z = 2.$$

It is easily seen that the two solutions are

$$x = 1/\sqrt{2}, \ y = 1/\sqrt{2}, \ z = 2, \ \lambda_1 = 1/\sqrt{2}, \ \lambda_2 = 1$$

and

$$x = -1/\sqrt{2}, \ y = -1/\sqrt{2}, \ z = 2, \ \lambda_1 = -1/\sqrt{2}, \ \lambda_2 = 1.$$

However, it is much simpler to use $z = 2$ to eliminate $z$ and just maximise $f = x + y + 2$ subject to $x^2 + y^2 = 1$, which only requires three equations for three unknowns, and gives the same answer more quickly.

If there are several equality constraints, the non-degenerate constraint qualification, NDCQ, is that the $m$ rows of the Jacobian derivative matrix,

$$\begin{pmatrix} \dfrac{\partial h_1}{\partial x_1} & \cdots & \dfrac{\partial h_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial h_m}{\partial x_1} & \cdots & \dfrac{\partial h_m}{\partial x_n} \end{pmatrix},$$

are linearly independent, that is, that none of the rows can be expressed as a linear combination of the other rows. If this is the case, we say the Jacobian derivative matrix is of rank $m$.

3(D) Constrained Optimisation: Second order conditions

The Lagrange multiplier method gives the first order conditions that determine the stationary points, and we can sometimes tell whether the stationary point is a maximum or a minimum without examining the Hessian matrix. In more complicated examples, we need a method to determine whether a stationary point is a local maximum or a local minimum, or neither.

Note that having the constraint completely changes the nature of the quadratic expression. For example $Q(x, y) = x^2 - y^2$ is indefinite, but if we apply the constraint $y = 0$ then $Q(x, y) = Q(x, 0) = x^2$, which is positive definite and has a local minimum at $x = y = 0$. We must therefore replace our Leading Principal Minor test with something else.

The Lagrangian is $L = f - \lambda_1 h_1 - \lambda_2 h_2 - \cdots - \lambda_m h_m$, with $h_1 = c_1, \dots, h_m = c_m$ being the constraints. Suppose we have a stationary point at $x^*$ with Lagrange multipliers $\lambda_i^*$. Using the Taylor series,

$$L(x^* + dx) = L(x^*) + dx \cdot \nabla L + \tfrac{1}{2} dx^T H \, dx + \text{higher order terms}.$$

The term $dx \cdot \nabla L = 0$ at a stationary point, because at the stationary points $\nabla L = 0$. So the stationary point is a minimum if $dx^T H \, dx > 0$ and a maximum if $dx^T H \, dx < 0$. Here $H$ is the Hessian based on the second derivatives of the Lagrangian, so the $i, j$ component is

$$L_{x_i x_j} = \frac{\partial^2 L}{\partial x_i \partial x_j} = \frac{\partial^2}{\partial x_i \partial x_j} \left( f - \lambda_1 h_1 - \lambda_2 h_2 - \cdots - \lambda_m h_m \right).$$

Now the allowed displacements $dx$ are restricted by the constraints to satisfy $dx \cdot \nabla h_i = 0$ for every constraint $i = 1, \dots, m$. Replacing $dx$ by $x$, and $\nabla h_i$ by its value $u_i$ at the stationary point, the problem becomes: classify the quadratic form

$$Q = x^T H x, \quad \text{subject to} \quad u_i \cdot x = 0, \quad i = 1, \dots, m.$$

(i) $2 \times 2$ with one linear constraint. Note that $Q(x, y) = x^2 - y^2$ is indefinite, but if we apply the constraint $y = 0$, i.e. $u = (0, 1)$, then $Q = x^2$ which is positive definite. So a constraint can change the classification of $Q$ from indefinite to definite. In general,

$$Q(x, y) = ax^2 + 2bxy + cy^2, \quad \text{with} \quad ux + vy = 0.$$

Eliminate $y$ by setting $y = -ux/v$ to get

$$Q(x) = \frac{x^2}{v^2} \left( av^2 - 2buv + cu^2 \right)$$

which is positive definite (minimum) if $(av^2 - 2buv + cu^2) > 0$, and negative definite (maximum) if $(av^2 - 2buv + cu^2) < 0$.

(ii) Bordered Hessians. If we have an $n$ variable Lagrangian with $m$ constraints, then the Bordered Hessian is an $(n + m) \times (n + m)$ matrix constructed as follows:

$$H_B = \begin{pmatrix} 0 & B \\ B^T & H \end{pmatrix}$$

where the $B$ part has $m$ rows and $n$ columns, with the rows consisting of the constraint vectors $u_i$, $i = 1, \dots, m$, and the $H$ part is the Hessian based on the Lagrangian. In the case where $m = 2$, $n = 3$ and the two constraints are

$$u_1 x + u_2 y + u_3 z = 0, \qquad v_1 x + v_2 y + v_3 z = 0,$$

$(u_1, u_2, u_3)$ being $\nabla h_1$ and $(v_1, v_2, v_3)$ being $\nabla h_2$, both evaluated at the stationary point, the $5 \times 5$ Bordered Hessian matrix is

$$H_B = \begin{pmatrix} 0 & 0 & u_1 & u_2 & u_3 \\ 0 & 0 & v_1 & v_2 & v_3 \\ u_1 & v_1 & L_{xx} & L_{xy} & L_{xz} \\ u_2 & v_2 & L_{xy} & L_{yy} & L_{yz} \\ u_3 & v_3 & L_{xz} & L_{yz} & L_{zz} \end{pmatrix}$$

where $L_{xx} = f_{xx} - \lambda_1 h_{1xx} - \lambda_2 h_{2xx}$, and the other components of the Hessian are defined similarly.

(iii) Rules for classification using Bordered Hessians. We need to classify

$$Q = x^T H x, \quad \text{subject to} \quad u_i \cdot x = 0, \quad i = 1, \dots, m.$$

Construct the bordered Hessian as defined above, and evaluate the leading principal minors. We only need the last $(n - m)$ LPMs, starting with $LPM_{2m+1}$ and going up to (and including) $LPM_{n+m}$. $LPM_{n+m}$ is just the determinant of the whole $H_B$ matrix.

(A) $Q$ is positive definite (PD) if $\mathrm{sgn}(LPM_{n+m}) = \mathrm{sgn}(\det H_B) = (-1)^m$ and the successive LPMs have the same sign. Here sgn means "the sign of", so e.g. $\mathrm{sgn}(-2) = -1$. This case corresponds to a local minimum.

(B) $Q$ is negative definite (ND) if $\mathrm{sgn}(LPM_{n+m}) = \mathrm{sgn}(\det H_B) = (-1)^n$ and the successive LPMs from $LPM_{2m+1}$ to $LPM_{n+m}$ alternate in sign. This case corresponds to a local maximum.

(C) If both (A) and (B) are violated by non-zero LPMs, then $Q$ is indefinite.

The semi-definite conditions are very involved, and we don't go into them here.
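Rules (A) to (C) are mechanical enough to code up. The following is a minimal Python sketch using exact rational arithmetic; the function names `det`, `bordered_hessian` and `classify_constrained` are illustrative, and returning `"inconclusive"` when one of the relevant LPMs vanishes is a choice made here, since the notes do not treat the semi-definite cases. It is applied to the example of 3(C)(i), $f = xy$ subject to $x + 4y = 16$, where $\nabla h = (1, 4)$ and the Hessian of the Lagrangian has $L_{xx} = L_{yy} = 0$, $L_{xy} = 1$.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]  # a row swap flips the sign
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return sign * result


def bordered_hessian(B, H):
    """Assemble H_B = [[0, B], [B^T, H]] with B of size m x n and H of size n x n."""
    m, n = len(B), len(H)
    top = [[0] * m + list(B[i]) for i in range(m)]
    bottom = [[B[i][j] for i in range(m)] + list(H[j]) for j in range(n)]
    return top + bottom


def classify_constrained(B, H):
    """Apply rules (A), (B), (C) to the LPMs numbered 2m+1, ..., n+m."""
    m, n = len(B), len(H)
    hb = bordered_hessian(B, H)
    lpms = [det([row[:k] for row in hb[:k]]) for k in range(2 * m + 1, n + m + 1)]
    if any(v == 0 for v in lpms):
        return "inconclusive"  # semi-definite cases are not covered by the rules
    signs = [1 if v > 0 else -1 for v in lpms]
    if all(s == (-1) ** m for s in signs):
        return "positive definite"  # rule (A): local minimum
    # Rule (B): LPM_{n+m} has sign (-1)^n and the signs alternate below it.
    if all(s == (-1) ** (n - i) for i, s in enumerate(reversed(signs))):
        return "negative definite"  # local maximum
    return "indefinite"  # rule (C)


# f = xy subject to x + 4y = 16: one constraint, two variables.
result = classify_constrained([[1, 4]], [[0, 1], [1, 0]])
```

For this example the single relevant minor is $LPM_3 = \det H_B$, and the classification should come out as negative definite, confirming that the critical point $x = 8$, $y = 2$ is a constrained maximum.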

Example: If $n = 2$ and $m = 1$, then $n - m = 1$ and we only need examine $LPM_3$. Suppose this is $LPM_3 = -1$. Then this has the same sign as $(-1)^m$, and so the constrained $Q$ is positive definite. Check: if $Q = x^2 - y^2$ and the constraint is $y = 0$, so $u = (0, 1)$, the bordered Hessian is

$$H_B = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix}$$

which has $LPM_3 = \det H_B = -1$, and $Q(x, 0) = x^2$ which is indeed positive definite.

Example: $Q = x^2 - y^2 - z^2$ with constraint $x = 0$, so now $u = (1, 0, 0)$. Now $n = 3$ and $m = 1$, so we only need examine $LPM_4$ and $LPM_3$.

$$H_B = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$

$LPM_4 = -1$, and $LPM_3 = 1$. These alternate in sign, so there is a possibility that the constrained quadratic is negative definite. To test this against criterion (B), note $LPM_4 = -1$, which has the same sign as $(-1)^n$, and so the constrained $Q$ is negative definite. Check: if $Q = x^2 - y^2 - z^2$ and the constraint is $x = 0$, then $Q = -y^2 - z^2$ which is clearly negative definite.

(iv) Evaluating determinants. Evaluating $2 \times 2$ and $3 \times 3$ determinants is straightforward, but evaluating larger determinants can be difficult. The definition is

$$\det A = \sum (-1)^h A_{1 r_1} A_{2 r_2} \cdots A_{n r_n}$$

where the summation is made over all permutations $r_1 r_2 r_3 \dots r_n$. A permutation is just the numbers 1 to $n$ written in a different order, e.g. [1 3 2] is a permutation on the first 3 integers and corresponds to $r_1 = 1$, $r_2 = 3$ and $r_3 = 2$. Permutations are either odd or even. If it takes an odd number of interchanges to restore the permutation to its natural order, the permutation is odd. If it takes an even number, the permutation is even. In the formula for the determinant, $h = 1$ if the permutation is odd, $h = 2$ if it is even. So [2 3 4 1] is an odd permutation, as we must first swap 2 and 1, then 2 and 3, and then 3 and 4, three swaps altogether. Since three is an odd number of swaps, the permutation is odd.

Other useful facts for evaluating determinants are:

(i) Swapping any two rows or any two columns reverses the sign of the determinant.

(ii) Adding any multiple of another row to a row leaves the determinant unchanged. Similarly, adding any multiple of a column to a different column leaves it unchanged. Using this trick, it is usually simple to generate zero elements, which makes evaluating the determinant easier.

APPENDIX

How were the tests for the constrained quadratic forms derived? This appendix can be omitted on first reading, as the most important issue is what the tests are and how to apply them. However, there is a way to understand where these tests come from in terms of elementary row and column operations (ERCOs). These also explain where the leading principal minor (LPM) tests for the unconstrained quadratic forms come from.

An elementary row and column operation, ERCO, acts on a symmetric matrix and removes a pair of elements $a_{ij}$ and $a_{ji}$ from the matrix by subtracting appropriate multiples of a row and a column. To illustrate, consider the $4 \times 4$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}.$$

We want to remove the elements $a_{24}$ from row 2, column 4 and row 4, column 2 using the elements $a_{23}$ and $a_{32}$. Because the matrix is symmetric it is the same element in both places. To remove $a_{24}$ from row 2, column 4, multiply column 3 by $a_{24}/a_{23}$ and subtract it from column 4. This gives

$$\hat{A} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \tfrac{a_{24} a_{13}}{a_{23}} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \tfrac{a_{24} a_{33}}{a_{23}} \\ a_{14} & a_{24} & a_{34} & a_{44} - \tfrac{a_{24} a_{34}}{a_{23}} \end{pmatrix}.$$

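This process can be written as a matrix multiplication $\hat{A} U = A$, or

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \tfrac{a_{24} a_{13}}{a_{23}} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \tfrac{a_{24} a_{33}}{a_{23}} \\ a_{14} & a_{24} & a_{34} & a_{44} - \tfrac{a_{24} a_{34}}{a_{23}} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \tfrac{a_{24}}{a_{23}} \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}.$$

The matrix $U$ on the right of the product is a unit upper triangular matrix, because it has zeros below the leading diagonal and ones along the leading diagonal. Multiply this out yourself to check it really works! To remove the $a_{24}$ from row 4, column 2, multiply row 3 by $a_{24}/a_{23}$ and subtract it from row 4. This is equivalent to the matrix multiplication $U^T \tilde{A} = \hat{A}$, or

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \tfrac{a_{24}}{a_{23}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \tfrac{a_{24} a_{13}}{a_{23}} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \tfrac{a_{24} a_{33}}{a_{23}} \\ a_{14} - \tfrac{a_{24} a_{13}}{a_{23}} & 0 & a_{34} - \tfrac{a_{24} a_{33}}{a_{23}} & \tilde{a}_{44} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \tfrac{a_{24} a_{13}}{a_{23}} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \tfrac{a_{24} a_{33}}{a_{23}} \\ a_{14} & a_{24} & a_{34} & a_{44} - \tfrac{a_{24} a_{34}}{a_{23}} \end{pmatrix},$$

where $\tilde{a}_{44} = a_{44} - 2 a_{24} a_{34}/a_{23} + a_{24}^2 a_{33}/a_{23}^2$. So the whole ERCO process is equivalent to writing $A = U^T \tilde{A} U$. The important points are that $U$ is a unit upper triangular matrix, so $U^T$ is a unit lower triangular matrix, and $\tilde{A}$ has zeros where the elements $a_{24}$ were in $A$.

There are two key properties of ERCO operations.

(i) The product of two unit upper triangular matrices is another unit upper triangular matrix. This means that if we do two successive ERCO operations, we end up with

$$U_2^T U_1^T \tilde{A} U_1 U_2 = V^T \tilde{A} V$$

for some unit upper triangular matrix $V$. In fact, we can go on removing pairs of elements as often as we want, and still end up with $V^T \tilde{A} V$.

(ii) Removing a multiple of a row from another row, or a multiple of a column from another column, leaves the determinant of the matrix unchanged. This is a standard result from the theory of determinants. Indeed, it also leaves all the leading principal minors, as defined in 2D(ii), unchanged. This result can easily be verified for a $3 \times 3$ determinant, and the extension to the $n \times n$ case is straightforward. It follows that an ERCO operation leaves the leading principal minors unchanged, i.e. the LPMs of $A$ and $\tilde{A}$ are all the same.

Now we apply this ERCO process to our bordered Hessian. We take the case $m = 2$, $n = 4$ so there are two constraints and four variables,

$$u_1 x_1 + u_2 x_2 + u_3 x_3 + u_4 x_4 = 0, \qquad v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4 = 0,$$

so that our bordered Hessian is

$$A = \begin{pmatrix} 0 & 0 & u_1 & u_2 & u_3 & u_4 \\ 0 & 0 & v_1 & v_2 & v_3 & v_4 \\ u_1 & v_1 & a_{11} & a_{12} & a_{13} & a_{14} \\ u_2 & v_2 & a_{12} & a_{22} & a_{23} & a_{24} \\ u_3 & v_3 & a_{13} & a_{23} & a_{33} & a_{34} \\ u_4 & v_4 & a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}.$$

First we remove $u_2$, $u_3$ and $u_4$ by using the ERCO operations based on $u_1$ to write $A = U^T \tilde{A} U$, giving

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & u_2/u_1 & 1 & 0 & 0 \\ 0 & 0 & u_3/u_1 & 0 & 1 & 0 \\ 0 & 0 & u_4/u_1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & v_1 & v_2 & v_3 & v_4 \\ u_1 & v_1 & a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & v_2 & a_{12} & a_{22} & a_{23} & a_{24} \\ 0 & v_3 & a_{13} & a_{23} & a_{33} & a_{34} \\ 0 & v_4 & a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & u_2/u_1 & u_3/u_1 & u_4/u_1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

Here, and in the matrices $B$, $C$ and $D$ that follow, the same symbols $v_j$ and $a_{ij}$ are reused for the entries after each operation, although their values are changed by it. If $u_1$ happened to be zero, we would need to reorder the variables so that a non-zero element is brought into the $u_1$ position. Since the constraint has to be non-trivial this must be possible.

Note that the $U$ matrix applied to the vector $(0, 0, x_1, x_2, x_3, x_4)^T$ gives

$$Ux = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & u_2/u_1 & u_3/u_1 & u_4/u_1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ x_1 + u_2 x_2/u_1 + u_3 x_3/u_1 + u_4 x_4/u_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \tilde{x}.$$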

When we apply the first row of $\tilde{A}$ to $\tilde{x}$ we get zero because of the constraint, and the same is true of $\tilde{x}^T$ applied to the first column of $\tilde{A}$.

Now we use the $v_2$ element to remove $v_1$, $v_3$ and $v_4$. If $m > 2$ we would continue until every row and column of the border just has one non-zero element in it. If $v_2$ happened to be zero, we would again reorder the variables from $x_2$ to $x_n$ to bring a non-zero element into the $v_2$ position. If the only non-zero element in row 2 of $\tilde{A}$ is $v_1$, then the second constraint is linearly dependent on the first constraint, so it's not really a second constraint. In this case the NDCQ won't hold (see 3C(iii)). The matrix $A$ now has the form $U^T B U = A$ with $B$ of the form

$$B = \begin{pmatrix} 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ u_1 & 0 & a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & v_2 & a_{12} & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{13} & a_{23} & a_{33} & a_{34} \\ 0 & 0 & a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}.$$

At this stage we can check that, just as applying the first row of $\tilde{A}$ to $\tilde{x}$ gave zero because of the constraint, so applying the second row of $B$ to $Ux$ also gives zero.

The next step is to remove all the off-diagonal elements of the type $a_{ij}$ using $a_{ij}/a_{ii}$ multiples of the $i$th row and column. This reduces the matrix to the form $U^T C U = A$ with $C$ of the form

$$C = \begin{pmatrix} 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ u_1 & 0 & a_{11} & 0 & 0 & 0 \\ 0 & v_2 & 0 & a_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{44} \end{pmatrix}.$$

Now we get the matrix into its final form by removing the $a_{11}$ and $a_{22}$ elements using $u_1$ and $v_2$ respectively. In general, this means that $a_{11}$ to $a_{mm}$ are removed. Then we swap columns 1 and 3, and columns 2 and 4. Every time we make such a swap we change the sign of the determinant and the principal minors, so this process multiplies the determinants and minors by a factor $(-1)^m$. After this swap the matrix now has the form $U^T D U = A$ with

$$D = \begin{pmatrix} u_1 & 0 & 0 & 0 & 0 & 0 \\ 0 & v_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{44} \end{pmatrix}.$$

The determinant of $D$ is simply $LPM_6 = u_1^2 v_2^2 a_{33} a_{44}$. The next lowest principal minor is $LPM_5 = u_1^2 v_2^2 a_{33}$.

Now the $U$ matrix applied to the vector $x = (0, 0, x_1, x_2, x_3, x_4)^T$ gives $y = Ux$. Because of the swap of rows 1 and 3, and 2 and 4, the third and fourth elements of $y$ are zero, and the first element is $x_1 + x_2 u_2/u_1 + x_3 u_3/u_1 + x_4 u_4/u_1$, which when multiplied by $D_{11}$ gives zero from the constraint equation. Similarly, $D_{22}$ times the second element of $y$ is zero, so that

$$y^T D y = x^T A x = a_{33} y_3^2 + a_{44} y_4^2,$$

where $y_3$ and $y_4$ denote the elements of $y$ that multiply $a_{33}$ and $a_{44}$. The elements $y_3$ and $y_4$ are linear combinations of $x_1, \dots, x_4$. The quadratic form $x^T A x$ is therefore positive definite if $a_{33}$ and $a_{44}$ are positive. This is the case if $LPM_5$ and $LPM_6$ of $D$ are both positive. However, $LPM_6$ and $LPM_5$ are the LPMs of $D$, which are $(-1)^m$ times the LPMs of $A$. So the test on the original matrix $A$ to be positive definite is that $LPM_5$ and $LPM_6$, the LPMs of $A$, both have the same sign as $(-1)^m$.

Similarly, if $a_{33}$ and $a_{44}$ are both negative in $D$, then the quadratic form is negative definite. This requires the LPMs of $D$ from $LPM_{2m+1}$ onwards to have alternating signs, with $LPM_{2m+1} < 0$. This means that $LPM_{n+m}$, the determinant of $D$, must have sign $(-1)^{n+m}$, and so $LPM_{n+m}$, the determinant of $A$, must have sign $(-1)^{n+m} (-1)^m = (-1)^n$. So the condition for negative definiteness is that the LPMs of $A$ must alternate in sign, and the last one, the determinant of $A$, must have the same sign as $(-1)^n$. These are just the rules in 3D(iii). The quadratic form is semi-definite but not definite if one of $a_{33}$ or $a_{44}$ is zero.

The same argument holds if there are no constraints, i.e. $m = 0$. Then we skip the elimination of the $u_i$ and $v_i$, and obtain the results in 2D(ii) directly.

CAJ /11/005
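The permutation definition of the determinant in 3D(iv) and the single ERCO step from the start of the appendix can both be checked on a small example. The sketch below is a minimal Python illustration, not part of the notes: the function names (`parity`, `det`, `lpms`, `erco`) and the sample matrix `A` are hypothetical, and the permutation sum is only practical for small matrices because it has $n!$ terms. The `erco` function removes the pair $a_{ij}$, $a_{ji}$ by subtracting a multiple of column $k$ from column $j$ and then the same multiple of row $k$ from row $j$, assuming $a_{ik} \neq 0$ and $k < j$.

```python
from fractions import Fraction
from itertools import permutations


def parity(perm):
    """Number of interchanges used to restore perm (0-based) to natural order."""
    p, swaps = list(perm), 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # puts the value j into position j
            swaps += 1
    return swaps


def det(a):
    """Determinant from the definition: sum over permutations of signed products."""
    total = 0
    for perm in permutations(range(len(a))):
        term = (-1) ** parity(perm)
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total


def lpms(a):
    """All leading principal minors LPM_1, ..., LPM_n."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def erco(a, i, j, k):
    """Remove a[i][j] and a[j][i] using column k and row k (0-based, a[i][k] != 0)."""
    t = Fraction(a[i][j]) / Fraction(a[i][k])
    b = [[Fraction(v) for v in row] for row in a]
    for row in b:  # column operation: column j minus t times column k
        row[j] -= t * row[k]
    b[j] = [x - t * y for x, y in zip(b[j], b[k])]  # row j minus t times row k
    return b


# Hypothetical symmetric 4 x 4 matrix; remove a_24 using a_23 as in the appendix.
A = [[2, 1, 0, 1], [1, 3, 1, 2], [0, 1, 4, 1], [1, 2, 1, 5]]
A_tilde = erco(A, 1, 3, 2)
```

Comparing `lpms(A)` with `lpms(A_tilde)` should show that the leading principal minors are unchanged by the operation, while `A_tilde` has zeros in the two positions formerly occupied by $a_{24}$.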