Outline. Optimization scheme Linear search methods Gradient descent Conjugate gradient Newton method Quasi-Newton methods


1 Outline
1 Optimization without constraints : Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints : Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods (Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints)
3 Conclusion


3 Optimization scheme General numerical optimization scheme
min_{x ∈ R^d} f(x)
1 Init : choose x_0, an initial guess of a minimum x*.
2 Recursion : until a convergence criterion is satisfied at x_n,
at x_n, determine a search direction d_n ∈ R^d,
linear search : find x_{n+1} along the half-line x_n + t d_n, t ∈ R_+ ; amounts to minimizing φ(t) = f(x_n + t d_n) in t > 0.
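The two-step scheme above can be sketched in code. This is a minimal illustration, not from the slides: the names `minimize` and `backtracking` are hypothetical, steepest descent stands in for the direction choice, and a crude Armijo-style step-halving stands in for the linear search.

```python
import numpy as np

def minimize(f, grad, x0, direction, line_search, tol=1e-8, max_iter=5000):
    """Generic scheme: at x_n pick a direction d_n, then a step t > 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence criterion on the gradient
            break
        d = direction(x, g)              # search direction d_n
        t = line_search(f, x, d, g)      # approximate min of phi(t) = f(x_n + t d_n)
        x = x + t * d
    return x

def backtracking(f, x, d, g, t=1.0, beta=0.5, c=0.1):
    """Halve t until f decreases by at least a fraction of the linear model."""
    while t > 1e-12 and f(x + t * d) > f(x) + c * t * (g @ d):
        t *= beta
    return t

# Steepest descent (d_n = -grad f) on a simple quadratic with minimum (1, -2)
f = lambda x: (x[0] - 1)**2 + 4 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 8 * (x[1] + 2)])
x_min = minimize(f, grad, [0.0, 0.0], lambda x, g: -g, backtracking)
```

Any direction rule (conjugate gradient, Newton, quasi-Newton) plugs into the same loop by swapping the `direction` callback.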

4 Optimization scheme Remarks : It is important to notice that f can't be plotted (d large). An optimization scheme is short-sighted, i.e. it only has access to local knowledge of f. The information available determines the method to use : order 0 : only f(x_n) available, 1st order : ∇f(x_n) also known, 2nd order : ∇²f(x_n) known (or estimated). Stop criteria bear on ‖∇f(x_n)‖, on the relative norm of the last step ‖x_{n+1} − x_n‖/‖x_n‖, etc. The linear search may simply be an approximate minimization.

5 Linear search methods Linear search We look for a local minimum of φ : min_{t>0} φ(t) = f(x_n + t d_n). Equivalently, and to simplify notations, we can assume that f : R → R and look for a minimum of f. There exist many methods, according to the assumptions on f : convex, unimodal, C^1, C^2, etc.

6 Linear search methods Newton-Raphson method Assumes f is C^2. Principle : approximate f by its second order expansion around x_n,
f(x) = f(x_n) + f′(x_n)(x − x_n) + ½ f″(x_n)(x − x_n)² + o((x − x_n)²)
take as x_{n+1} the min of the quadratic approx. of f :
x_{n+1} = x_n − f′(x_n) / f″(x_n)

7 Linear search methods Equivalently, amounts to finding a zero of f′(x) :
f′(x) = f′(x_n) + f″(x_n)(x − x_n) + o(x − x_n)
x_{n+1} = x_n − f′(x_n) / f″(x_n)
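The Newton-Raphson recursion can be sketched as a zero-finder for f′. The function name and the test problem below are illustrative, not from the slides.

```python
def newton_raphson(fp, fpp, x, tol=1e-10, max_iter=100):
    """Find a zero of f' (a stationary point of f) by Newton-Raphson:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    for _ in range(max_iter):
        step = fp(x) / fpp(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(t) = t^4 - 2 t^2 + t, so f'(t) = 4t^3 - 4t + 1 and f''(t) = 12t^2 - 4
root = newton_raphson(lambda t: 4*t**3 - 4*t + 1,
                      lambda t: 12*t**2 - 4, x=1.0)
```

Started at x = 1, the iteration settles on a stationary point of f near 0.84.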

8 Linear search methods Secant method Assumes f is only C^1. Principle : same as the Newton-Raphson method, but f″(x_n) is approximated by
(f′(x_n) − f′(x_{n−1})) / (x_n − x_{n−1})
this yields
x_{n+1} = x_n − (x_n − x_{n−1}) / (f′(x_n) − f′(x_{n−1})) · f′(x_n)
Standard way to find the zero of a function when its derivative is unknown.
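A sketch of the secant recursion, using the same illustrative stationary-point problem as for Newton-Raphson but without f″ (the name `secant` is mine):

```python
def secant(fp, x_prev, x, tol=1e-12, max_iter=100):
    """Zero of f' when f'' is unknown: replace f''(x_n) by the finite
    difference (f'(x_n) - f'(x_{n-1})) / (x_n - x_{n-1})."""
    for _ in range(max_iter):
        f_prev, f_cur = fp(x_prev), fp(x)
        if f_cur == f_prev:              # flat secant: cannot divide
            break
        x_prev, x = x, x - (x - x_prev) / (f_cur - f_prev) * f_cur
        if abs(x - x_prev) < tol:
            break
    return x

# Same f'(t) = 4t^3 - 4t + 1 as before, now with two starting points
root = secant(lambda t: 4*t**3 - 4*t + 1, x_prev=1.0, x=0.9)
```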

9 Linear search methods Wolfe's method Assumes f is only C^1. Principle : approximate linear search; proposes an x_{n+1} that sufficiently decreases f(x_{n+1}) w.r.t. f(x_n), and that also sufficiently increases the slope f′(x_{n+1}) w.r.t. f′(x_n). The search of x_{n+1} is done by dichotomy in an interval [a = x_n, b].

10 Linear search methods Let 0 < m_1 < 1/2 < m_2 < 1 be two parameters. The point x is acceptable as x_{n+1} iff
f(x) ≤ f(x_0) + m_1 (x − x_0) f′(x_0)
f′(x) ≥ m_2 f′(x_0)
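The two acceptance tests can be written directly as predicates. This sketch (the name `wolfe_acceptable` and the sample function are mine) checks a candidate point in the 1D setting of the slide:

```python
def wolfe_acceptable(f, fp, x0, x, m1=0.1, m2=0.7):
    """Wolfe's two conditions for accepting x as the next iterate:
    sufficient decrease of f, and sufficient increase of the slope f'."""
    armijo    = f(x) <= f(x0) + m1 * (x - x0) * fp(x0)   # f decreased enough
    curvature = fp(x) >= m2 * fp(x0)                      # slope raised enough
    return armijo and curvature

# f(t) = (t - 2)^2, descending from t = 0 where f'(0) = -4
f  = lambda t: (t - 2.0)**2
fp = lambda t: 2.0 * (t - 2.0)
```

With these parameters, x = 1.5 passes both tests, x = 0.1 fails the curvature test (step too small), and x = 4.0 fails the decrease test (step too large), which is exactly what the dichotomy exploits.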

11 Gradient descent Gradient descent (steepest descent) Back to the general case : f : R^d → R. Principle : performs the linear search along the steepest descent direction d_n = −∇f(x_n). The optimal step t* minimizes φ(t) = f(x_n + t d_n), so
φ′(t*) = ∇f(x_n + t* d_n)^T d_n = 0
the descent stops when the new gradient ∇f(x_{n+1}) becomes orthogonal to the current descent direction d_n = −∇f(x_n).

12 Gradient descent Properties : + Easy to implement, requires only first order information on f. - Slow convergence. Performs poorly even on simple functions like quadratic forms! In practice, a fast suboptimal descent step is preferred.
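On a quadratic form the optimal step has a closed form, which makes the slow zig-zag behaviour easy to reproduce. A minimal sketch (names and the test matrix are mine, not from the slides):

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, n_iter=100):
    """Steepest descent on f(x) = 0.5 x^T A x + b^T x with the exact step
    t* = g^T g / (g^T A g); successive gradients are orthogonal."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = A @ x + b                    # gradient of f
        if np.linalg.norm(g) < 1e-12:
            break
        t = (g @ g) / (g @ (A @ g))      # minimizes phi(t) = f(x - t g)
        x = x - t * g
    return x

A = np.array([[10.0, 0.0], [0.0, 1.0]])  # ill-conditioned: slow zig-zag
b = np.array([-10.0, -1.0])
x = steepest_descent_quadratic(A, b, [0.0, 0.0])  # exact minimum is (1, 1)
```

Even here many iterations are needed; the worse the conditioning of A, the slower the convergence, which motivates the conjugate gradient correction below.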

13 Gradient descent Example : on Rosenbrock's banana f(x) = 100(x_2 − x_1²)² + (1 − x_1)²

14 Conjugate gradient Conjugate gradient Overview : A first order method, simple variation of the gradient descent, designed to perform well on quadratic forms. Idea = tilt the next search direction to better aim at the minimum of the quadratic form.

15 Conjugate gradient We assume A is a positive symmetric matrix and
f(x) = ½ x^T A x + b^T x and ∇f(x) = Ax + b
Principle :
start at x_0, d_0 = −∇f(x_0) ≜ −g_0,
at x_n, instead of d_n = −∇f(x_n) = −g_n, look for the minimum of f in the affine space W_{n+1} = x_0 + sp{d_0, d_1, ..., d_{n−1}, g_n}
Lemma x_{n+1} is the min of f in W_{n+1} ⇔ g_{n+1} ≜ ∇f(x_{n+1}) ⊥ W_{n+1}


17 Conjugate gradient Lemma If x_n is the minimum of f in W_n, then from x_n, direction d_n points to the minimum x_{n+1} in W_{n+1} iff (d_n)^T A d_i = 0 for 0 ≤ i ≤ n−1. The direction d_n is said to be conjugate to all the previous d_i. Proof :
x_{n+1} = x_n + t d_n
g_{n+1} ≜ ∇f(x_{n+1}) = A x_{n+1} + b = g_n + t A d_n
From the previous lemma g_{n+1} ⊥ W_{n+1} and g_n ⊥ W_n, so
(g_{n+1})^T g_n = ‖g_n‖² + t (d_n)^T A g_n = 0 ⇒ t ≠ 0
(g_{n+1})^T d_i = (g_n)^T d_i + t (d_n)^T A d_i = 0 for 0 ≤ i ≤ n−1


19 Conjugate gradient Question : how to find direction d_n, conjugate to all previous d_i ? Notice g_{i+1} − g_i = A(x_{i+1} − x_i) ∝ A d_i, so
(d_n)^T A d_i = 0 ⇔ (d_n)^T g_{i+1} = (d_n)^T g_i = cst
Since the g_i form an orthogonal family, one has
d_n ∝ Σ_{i=0}^{n} g_i / ‖g_i‖², i.e. d_n = −g_n + c_n d_{n−1}
Answer : steepest slope, slightly corrected by the previous descent direction.

20 Conjugate gradient Example : on Rosenbrock s banana

21 Conjugate gradient Expressions of the correction coefficient c_n :
c_n = ‖g_n‖² / ‖g_{n−1}‖² (Fletcher & Reeves, 1964)
c_n = (g_n − g_{n−1})^T g_n / ‖g_{n−1}‖² (Polak & Ribière, 1971)
c_n = (g_n)^T A d_{n−1} / ‖d_{n−1}‖²_A
Properties : converges in d steps for a quadratic form f : R^d → R, at the same complexity as the gradient method! Works well on non quadratic forms if the Hessian doesn't change much between x_n and x_{n+1}. Caution : d_n may not be a descent direction... In this case, reset to −g_n.
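A sketch of the method with the Polak-Ribière coefficient and the reset safeguard (function names are mine; on the quadratic test problem the exact step is available in closed form, so convergence takes d = 2 steps):

```python
import numpy as np

def conjugate_gradient(grad, x, line_search, n_iter=100, tol=1e-8):
    """Nonlinear conjugate gradient with the Polak-Ribiere coefficient
    c_n = (g_n - g_{n-1})^T g_n / ||g_{n-1}||^2 and d_n = -g_n + c_n d_{n-1}."""
    g = grad(x)
    d = -g
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        t = line_search(x, d)
        x = x + t * d
        g_new = grad(x)
        c = (g_new - g) @ g_new / (g @ g)   # Polak & Ribiere (1971)
        d = -g_new + c * d
        if d @ g_new >= 0:                  # not a descent direction: reset
            d = -g_new
        g = g_new
    return x

# Quadratic f(x) = 0.5 x^T A x + b^T x: the exact step along d is closed-form
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
grad = lambda x: A @ x + b
exact_step = lambda x, d: -(grad(x) @ d) / (d @ (A @ d))
x = conjugate_gradient(grad, np.zeros(2), exact_step)
```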


23 Newton method Newton method Principle : replace f by its second order approximation at x_n
φ(x) = f(x_n) + ∇f(x_n)^T (x − x_n) + ½ (x − x_n)^T ∇²f(x_n) (x − x_n)
take as x_{n+1} the min of φ(x) :
∇φ(x) = ∇f(x_n) + ∇²f(x_n)(x − x_n)
which amounts to solving the linear system
∇²f(x_n) (x_{n+1} − x_n) = −∇f(x_n)

24 Newton method Example : on Rosenbrock s banana

25 Newton method Comments : + faster convergence (1 step for quadratic functions!), but expensive : requires second order information on f; yields a stationary point of f : one still has to check that it is a minimum. In practice, try d_n = −[∇²f(x_n)]^{-1} ∇f(x_n) as descent direction, and perform a linear search. - no guarantee that d_n is an admissible descent direction... - no guarantee that x_{n+1} is a better point than x_n. ∇²f(x_n) may be singular, or badly conditioned... the Levenberg-Marquardt regularization suggests to solve
[∇²f(x_n) + µI] d_n = −∇f(x_n)
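The regularized iteration can be sketched as follows, on Rosenbrock's banana from the classic start (−1.2, 1). This is an illustrative implementation, not the slides' code: µ is increased until the regularized Hessian is positive definite, and the step is crudely damped so that f never increases.

```python
import numpy as np

def newton_lm(f, grad, hess, x, mu=1e-3, n_iter=200, tol=1e-8):
    """Newton iteration with Levenberg-Marquardt regularization:
    solve [hess(x) + mu I] d = -grad(x), with mu raised until the
    system is positive definite, and a step-halving damping on f."""
    x = np.asarray(x, dtype=float)
    I = np.eye(len(x))
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H, m = hess(x), mu
        while True:                       # raise mu until H + mu I is PD
            try:
                np.linalg.cholesky(H + m * I)
                break
            except np.linalg.LinAlgError:
                m *= 10.0
        d = np.linalg.solve(H + m * I, -g)
        t = 1.0
        while t > 1e-12 and f(x + t * d) > f(x):   # crude damping
            t *= 0.5
        x = x + t * d
    return x

# Rosenbrock's banana f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2, minimum at (1, 1)
f    = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0], 200.0]])
x = newton_lm(f, grad, hess, [-1.2, 1.0])
```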

26 Quasi-Newton methods Quasi-Newton methods Principle : take advantage of the efficiency of the Newton method when the Hessian ∇²f(x) is unavailable! Idea : approximate [∇²f(x_n)]^{-1} by a matrix K_n in
x_{n+1} = x_n − [∇²f(x_n)]^{-1} ∇f(x_n)
More precisely, explore direction d_n = −K_n ∇f(x_n) from x_n.

27 Quasi-Newton methods Quasi-Newton equation Consider the second order Taylor expansion of f at x_n
f(x) = f(x_n) + ∇f(x_n)^T (x − x_n) + ½ (x − x_n)^T ∇²f(x_n)(x − x_n) + o(‖x − x_n‖²)
∇f(x) = ∇f(x_n) + ∇²f(x_n)(x − x_n) + o(‖x − x_n‖)
The estimate K_n of the inverse Hessian must satisfy the quasi-Newton equation (QNE)
x_{n+1} − x_n = K_{n+1} [∇f(x_{n+1}) − ∇f(x_n)]
Notice that this should be K_n... but K_n is used to find x_{n+1}, so we impose that the relation be satisfied at the next step.


29 Quasi-Newton methods All quasi-Newton methods recursively build the K_n by
K_{n+1} = K_n + C_n
where the correction C_n is adjusted to satisfy the QNE. Notations :
u_n = x_n − x_{n−1}
v_n = g_n − g_{n−1}
QNE : u_{n+1} = K_{n+1} v_{n+1}

30 Quasi-Newton methods Correction C_n of rank 1
K_{n+1} = K_n + w_n (w_n)^T / ((w_n)^T v_{n+1}) where w_n = u_{n+1} − K_n v_{n+1}
If initialized with K_0 = I, K_n converges in d steps to the true A^{-1} for a quadratic form.
DFP (Davidon, Fletcher, Powell) correction of rank 2
K_{n+1} = K_n + u_{n+1} (u_{n+1})^T / ((u_{n+1})^T v_{n+1}) − K_n v_{n+1} (v_{n+1})^T K_n / ((v_{n+1})^T K_n v_{n+1})
Converges in d steps to the true A^{-1} for a quadratic form. Descent directions are conjugate w.r.t. A. Coincides with the conjugate gradient method.


32 Quasi-Newton methods BFGS (Broyden, Fletcher, Goldfarb, Shanno, 1970), correction of rank 3
K_{n+1} = K_n − [u_{n+1} (v_{n+1})^T K_n + K_n v_{n+1} (u_{n+1})^T] / ((u_{n+1})^T v_{n+1}) + (1 + (v_{n+1})^T K_n v_{n+1} / ((u_{n+1})^T v_{n+1})) · u_{n+1} (u_{n+1})^T / ((u_{n+1})^T v_{n+1})
Considered as the best quasi-Newton method. In practice, one should check that −K_n g_n is a descent direction, i.e. (g_n)^T K_n g_n > 0, otherwise reinitialize with K_n = I.
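A sketch of the BFGS update in the favourable quadratic case, where the exact line search is available in closed form (the function name is mine): starting from K_0 = I, after d steps the iterate reaches the minimizer and K_d recovers A^{-1}.

```python
import numpy as np

def bfgs_quadratic(A, b, x0):
    """BFGS on the quadratic f(x) = 0.5 x^T A x + b^T x with exact line
    search; initialized with K_0 = I, K_d equals A^{-1} after d steps."""
    x = np.asarray(x0, dtype=float)
    d_dim = len(x)
    K = np.eye(d_dim)                    # K_0 = I
    g = A @ x + b
    for _ in range(d_dim):
        d = -K @ g                       # search direction d_n = -K_n g_n
        t = -(g @ d) / (d @ (A @ d))     # exact step along d
        x_new = x + t * d
        g_new = A @ x_new + b
        u = x_new - x                    # u_{n+1}
        v = g_new - g                    # v_{n+1}
        uv = u @ v
        # BFGS update of the inverse-Hessian estimate (all K's are the old K)
        K = (K - (np.outer(u, v @ K) + np.outer(K @ v, u)) / uv
               + (1 + (v @ K @ v) / uv) * np.outer(u, u) / uv)
        x, g = x_new, g_new
    return x, K

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x, K = bfgs_quadratic(A, b, np.zeros(2))
```

On general f, the same update is used with u, v computed from an approximate (e.g. Wolfe) line search, plus the descent-direction check of the slide.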


34 Quasi-Newton methods Example : on Rosenbrock s banana

35 Outline
1 Optimization without constraints : Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints : Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods (Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints)
3 Conclusion

36 Lagrange Joseph Louis, count of Lagrange (Giuseppe Ludovico di Lagrangia), Italian mathematician: born in Turin (1736), founder of the Academy of Turin (1758), called by Euler to the Academy of Berlin, director of the French Academy of Sciences (1788), survived the French revolution (b.c.w. Condorcet,...), resting at the Pantheon (1813).

37 Lagrange Among his contributions : the calculus of variations, the Taylor-Lagrange formula, the least action principle in mechanics, some results on the three-body problem (the Lagrange points)... and the notion of Lagrangian!

38 Equality constraints Equality constraints
min_x f(x) s.t. θ_j(x) = 0, 1 ≤ j ≤ m
D : θ(x) = [θ_1(x), ..., θ_m(x)]^T = 0 defines a manifold of dimension d − m in R^d
∇θ_j(x_0)^T (x − x_0) = 0 : tangent hyperplane to θ_j(x) = 0 at point x_0
∇θ(x_0)^T (x − x_0) = 0 : tangent space to D at x_0

39 Equality constraints Definition In domain D = {x ∈ R^d : θ(x) = 0}, the point x_0 is regular iff the gradients ∇θ_j(x_0) of the m constraints are linearly independent. Lemma If x_0 is regular, every (unit) direction d in the tangent space is admissible, i.e. can be obtained as the limit of (x_n − x_0)/‖x_n − x_0‖, with lim_n x_n = x_0 and x_n ∈ D.

40 Equality constraints Theorem Let x* be a regular point of D; if x* is a local extremum of f in D, then there exists a unique vector λ* ∈ R^m of Lagrange multipliers such that
∇f(x*) + Σ_{j=1}^m λ_j* ∇θ_j(x*) = 0

41 Equality constraints Proof : project ∇f(x*) on sp{∇θ_1(x*), ..., ∇θ_m(x*)} :
∇f(x*) = −Σ_{j=1}^m λ_j* ∇θ_j(x*) + u
u belongs to the tangent space to D at x*; progressing along −u decreases f and doesn't change θ (to first order), so u = 0 at an extremum.

42 Equality constraints Application To solve min_x f(x) s.t. θ(x) = 0 :
1 build the Lagrangian L(x, λ) = f(x) + Σ_j λ_j θ_j(x)
2 find a stationary point (x*, λ*) of the Lagrangian, i.e. a zero of ∇L(x, λ) :
∇_x L(x, λ) = ∇f(x) + Σ_j λ_j ∇θ_j(x)
∇_λ L(x, λ) = θ(x)
i.e. d + m (non-linear) equations, with d + m unknowns.

43 Equality constraints Example Problem : find the radius x_1 and the height x_2 of a cooking pan in order to minimize its surface, s.t. the capacity of the pan is 1 litre.
f(x) = πx_1² + 2πx_1x_2
θ(x) = πx_1²x_2 − 1

44 Equality constraints

45 Equality constraints Solution : Lagrangian L(x, λ) = f(x) + λθ(x)
∂L(x, λ)/∂x_1 = 2πx_1 + 2πx_2 + 2πλx_1x_2 = 0
∂L(x, λ)/∂x_2 = 2πx_1 + λπx_1² = 0
We obtain x_1* = x_2* = −2/λ. Finally, θ(x*) = 0 gives the value of λ to plug : λ* = −2π^{1/3}, so x_1* = x_2* = π^{−1/3}.
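The claimed optimum is easy to check numerically: at (x*, λ*) the gradient of the Lagrangian must vanish and the volume constraint must hold. The sign conventions below follow my reconstruction of the slide (λ* negative with L = f + λθ); a minimal verification sketch:

```python
import numpy as np

# Cooking-pan problem: minimize surface f = pi x1^2 + 2 pi x1 x2
# subject to volume theta = pi x1^2 x2 - 1 = 0.
x1 = x2 = np.pi ** (-1.0 / 3.0)      # claimed optimum x1* = x2* = pi^(-1/3)
lam = -2.0 * np.pi ** (1.0 / 3.0)    # claimed multiplier lambda*

# Gradient of the Lagrangian L = f + lam * theta must vanish at (x*, lam*)
dL_dx1 = 2 * np.pi * x1 + 2 * np.pi * x2 + lam * 2 * np.pi * x1 * x2
dL_dx2 = 2 * np.pi * x1 + lam * np.pi * x1 ** 2
volume = np.pi * x1 ** 2 * x2 - 1    # theta(x*) should be 0
```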

46 Equality constraints Another interpretation Consider the unconstrained problem, where λ is fixed :
min_x L(x, λ) = f(x) + λθ(x)
f and L have the same local minima in D = {x : θ(x) = 0}. Let x*(λ) be a local minimum of L(x, λ) in R^d. If x*(λ) ∈ D, then it is also a local min of f. So one just has to adjust λ to get this property.

47 Equality constraints Second order conditions Theorem Let (x*, λ*) be a stationary point of L(x, λ), and consider the Hessian of the Lagrangian
∇²_x L(x*, λ*) = ∇²f(x*) + Σ_{j=1}^m λ_j* ∇²θ_j(x*)
NC : x* is a local min of f on D ⇒ ∇²_x L(x*, λ*) is a positive quadratic form on the tangent space at x*, i.e. the kernel of matrix ∇θ(x*)^T.
SC : ∇²_x L(x*, λ*) is strictly positive on the tangent space ⇒ x* is a local min of f on D.

48 Equality constraints

49 Inequality constraints Inequality constraints
min_x f(x) s.t. θ_j(x) ≤ 0, 1 ≤ j ≤ m
D : θ(x) = [θ_1(x), ..., θ_m(x)]^T ≤ 0 defines a volume in R^d limited by m manifolds of dimension d − 1. At point x, constraint θ_j is active iff θ_j(x) = 0. A(x) = {j : θ_j(x) = 0} = active set at x. One could have simultaneously equality and inequality constraints (not done here for the sake of clarity). Equality constraints are always active.
∩_{j ∈ A(x_0)} {x : ∇θ_j(x_0)^T (x − x_0) = 0} defines the tangent space to D at x_0.

50 Inequality constraints Admissible directions Let x_0 ∈ D; we look for directions d ∈ R^d that keep us inside domain D : x_0 + ε d ∈ D. Definition Direction d is admissible from x_0 iff there exists (x_n)_{n>0} in D such that lim_n x_n = x_0 and
lim_n (x_n − x_0)/‖x_n − x_0‖ = d/‖d‖

51 Inequality constraints Admissible directions at x_0 form a cone C(x_0). This cone is not necessarily convex... C(x_0) can be determined from the ∇θ_j(x_0) of the active constraints.

52 Inequality constraints Theorem If x_0 is a regular point, i.e. the gradients of the active constraints at x_0 are linearly independent, then C(x_0) is the convex cone given by
C(x_0) = {u ∈ R^d : ∇θ_j(x_0)^T u ≤ 0, j ∈ A(x_0)}
Interpretation : an admissible displacement must not increase the value of θ_j for an already active constraint; it can only decrease it or leave it unchanged.

53 Inequality constraints Dual cone and Farkas lemma For v_1, ..., v_J ∈ R^d, consider the cone C = {u : u^T v_1 ≤ 0, ..., u^T v_J ≤ 0}. Farkas-Minkowski lemma Let g ∈ R^d; one has the equivalence of
∀u ∈ C, g^T u ≤ 0, i.e. C is included in the half-space {u : g^T u ≤ 0},
g belongs to the dual cone C* = {w : ∀u ∈ C, w^T u ≤ 0},
g = Σ_{j=1}^J α_j v_j where α_j ≥ 0 for all j.

54 Inequality constraints 1st order optimality conditions Theorem (Karush-Kuhn-Tucker conditions) Let x* be a regular point of domain D. If x* is a local minimum of f in D, there exists a unique set of generalized Lagrange multipliers λ_j*, for j ∈ A(x*), such that
∇f(x*) + Σ_{j ∈ A(x*)} λ_j* ∇θ_j(x*) = 0 and λ_j* ≥ 0, j ∈ A(x*)
Remarks : similar to the case of equality constraints : here only active constraints are considered. The positivity condition is new : it translates the fact that one side of the manifold is permitted.


56 Inequality constraints Proof : take any admissible direction d ∈ C(x*) = {u : u^T ∇θ_j(x*) ≤ 0, j ∈ A(x*)}; progressing along d doesn't decrease f : [∇f(x*)]^T d ≥ 0. This means that g = −∇f(x*) belongs to the dual cone C(x*)*, so by Farkas lemma
−∇f(x*) = Σ_{j ∈ A(x*)} λ_j* ∇θ_j(x*) and λ_j* ≥ 0

57 Inequality constraints Corollary The Karush-Kuhn-Tucker conditions are equivalent to
∇f(x*) + Σ_{j=1}^m λ_j* ∇θ_j(x*) = 0 and λ_j* ≥ 0, 1 ≤ j ≤ m
with the extra complementarity condition
Σ_{j=1}^m λ_j* θ_j(x*) = 0
This entails λ_j* = 0 for an inactive constraint θ_j at x*. To be usable, requires to know/guess the set of active constraints at the optimum. A(x*) known, this leaves a set of non-linear equations + positivity constraints.

58 Inequality constraints Example Problem : minimize the distance from point P to the red segment
min_x f(x) = (x_1 − 1)² + (x_2 − 2)²
subject to
θ_1(x) = x_1 − x_2 − 1 = 0
θ_2(x) = x_1 + x_2 − 2 ≤ 0
θ_3(x) = −x_1 ≤ 0
θ_4(x) = −x_2 ≤ 0

59 Inequality constraints Objective : cancel the gradient of the Lagrangian L(x, λ)
∂L(x, λ)/∂x_1 = 2(x_1 − 1) + λ_1 + λ_2 − λ_3 = 0
∂L(x, λ)/∂x_2 = 2(x_2 − 2) − λ_1 + λ_2 − λ_4 = 0
equality : x_1 − x_2 − 1 = 0
inequalities : λ_2(x_1 + x_2 − 2) = 0, λ_2 ≥ 0 ; λ_3 x_1 = 0, λ_3 ≥ 0 ; λ_4 x_2 = 0, λ_4 ≥ 0
1st guess : A(x*) = {1}, i.e. only θ_1 active at the optimum. Complementarity ⇒ λ_2 = λ_3 = λ_4 = 0. This yields x* = (2, 1), which violates θ_2(x) ≤ 0.

60 Inequality constraints Objective : cancel the gradient of the Lagrangian L(x, λ)
∂L(x, λ)/∂x_1 = 2(x_1 − 1) + λ_1 + λ_2 − λ_3 = 0
∂L(x, λ)/∂x_2 = 2(x_2 − 2) − λ_1 + λ_2 − λ_4 = 0
equality : x_1 − x_2 − 1 = 0
inequalities : λ_2(x_1 + x_2 − 2) = 0, λ_2 ≥ 0 ; λ_3 x_1 = 0, λ_3 ≥ 0 ; λ_4 x_2 = 0, λ_4 ≥ 0
2nd guess : A(x*) = {1, 2}, i.e. θ_2 is added to the active set. Complementarity ⇒ λ_3 = λ_4 = 0. This yields x* = (3/2, 1/2), which belongs to D.
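With the active set guessed, the remaining stationarity system is linear in the multipliers and can be solved directly. A sketch of the check for the second guess, under my reconstruction of the constraint signs (θ_1 = x_1 − x_2 − 1 = 0, θ_2 = x_1 + x_2 − 2 ≤ 0):

```python
import numpy as np

# KKT check at the claimed optimum x* = (3/2, 1/2) with active set {1, 2}
# (theta1 equality, theta2 active inequality; lambda3 = lambda4 = 0).
x = np.array([1.5, 0.5])
# Stationarity reduces to a 2x2 linear system in (lambda1, lambda2):
#   2(x1 - 1) + lambda1 + lambda2 = 0
#   2(x2 - 2) - lambda1 + lambda2 = 0
M = np.array([[1.0, 1.0], [-1.0, 1.0]])
rhs = -np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])
lam1, lam2 = np.linalg.solve(M, rhs)
feasible = (abs(x[0] - x[1] - 1) < 1e-12          # theta1 = 0
            and x[0] + x[1] - 2 <= 1e-12          # theta2 <= 0
            and x[0] >= 0 and x[1] >= 0)          # theta3, theta4
```

The inequality multiplier λ_2 comes out positive (the KKT sign condition), while λ_1, attached to the equality constraint, is free to be negative.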

61 Dual problem - Resolution by duality Dual problem - Resolution by duality [For simplicity we consider the case of inequality constraints.] Idea : under some conditions, a stationary point (x*, λ*) of the Lagrangian, i.e.
∇L(x*, λ*) = ∇f(x*) + Σ_i λ_i* ∇θ_i(x*) = 0
corresponds to a saddle point of the Lagrangian, i.e.
inf_x L(x, λ*) = L(x*, λ*) = sup_λ L(x*, λ)
So the resolution amounts to finding such saddle points, and then extracting x*.

62 Dual problem - Resolution by duality Saddle points Definition (x*, λ*) is a saddle point of L in D_x × D_λ iff
sup_{λ ∈ D_λ} L(x*, λ) = L(x*, λ*) = inf_{x ∈ D_x} L(x, λ*)

63 Dual problem - Resolution by duality Lemma If (x*, λ*) is a saddle point of L in D_x × D_λ, then
sup_{λ ∈ D_λ} inf_{x ∈ D_x} L(x, λ) = L(x*, λ*) = inf_{x ∈ D_x} sup_{λ ∈ D_λ} L(x, λ)
Proof : one always has sup_λ inf_x L(x, λ) ≤ inf_x sup_λ L(x, λ); the difference is called the duality gap, generally > 0. From the def. of a saddle point, one has then
inf_x [sup_λ L(x, λ)] ≤ sup_λ L(x*, λ) = L(x*, λ*) = inf_x L(x, λ*) ≤ sup_λ [inf_x L(x, λ)]
Morality : one can look for λ* first, and then for x*...


67 Dual problem - Resolution by duality Saddle points of the Lagrangian Theorem If (x*, λ*) is a saddle point of the Lagrangian L in R^d × R^m_+, then x* is a solution of the primal problem (P)
(P) min_x f(x) s.t. θ_i(x) ≤ 0, 1 ≤ i ≤ m
Proof : from L(x*, λ) ≤ L(x*, λ*), ∀λ ∈ D_λ = R^m_+ :
f(x*) + Σ_i λ_i θ_i(x*) ≤ f(x*) + Σ_i λ_i* θ_i(x*)
Σ_i (λ_i − λ_i*) θ_i(x*) ≤ 0
whence θ_i(x*) ≤ 0 by λ_i → +∞ : x* satisfies the constraints.


69 Dual problem - Resolution by duality Moreover, Σ_i λ_i* θ_i(x*) ≥ 0, by λ_i = 0, and so Σ_i λ_i* θ_i(x*) = 0 (complementarity condition). From L(x*, λ*) ≤ L(x, λ*), ∀x ∈ R^d :
f(x*) + Σ_i λ_i* θ_i(x*) ≤ f(x) + Σ_i λ_i* θ_i(x)
so for all admissible x, i.e. such that θ_i(x) ≤ 0, 1 ≤ i ≤ m :
f(x*) ≤ f(x)
Summary : saddle points of the Lagrangian, when they exist, give solutions to the optimization problem. But they don't always exist...


72 Dual problem - Resolution by duality Existence of saddle points Theorem If f and the constraints θ_i are convex functions of x in R^d, and if x* = arg min_x f(x) in {x : θ_i(x) ≤ 0, 1 ≤ i ≤ m} is regular, then x* corresponds to a saddle point (x*, λ*) of the Lagrangian. Proof : from Kuhn-Tucker, derive the saddle point property :
L(x*, λ) = f(x*) + Σ_i λ_i θ_i(x*) ≤ f(x*) = f(x*) + Σ_i λ_i* θ_i(x*) = L(x*, λ*)
using admissibility of x*, positivity of the λ_i and complementarity.
L(x, λ*) = f(x) + Σ_i λ_i* θ_i(x) is a convex function of x. From the stationarity of L, one has ∇_x L(x*, λ*) = ∇f(x*) + Σ_i λ_i* ∇θ_i(x*) = 0, sufficient to show that x* is a minimum of the convex function L(x, λ*).


75 Dual problem - Resolution by duality Dual problem Summary : provided the Lagrangian has saddle points, solutions to
(P) min_x f(x) s.t. θ_i(x) ≤ 0, 1 ≤ i ≤ m
are the 1st argument of a saddle point (x*, λ*) of the Lagrangian L(x, λ). If λ* were known, this amounts to solving an unconstrained problem :
x* = arg min_x L(x, λ*)
How to find such a λ* ? One has L(x*, λ*) = max_{λ ∈ R^m_+} min_x L(x, λ), so λ* should be a solution of the dual problem (D)
(D) max_λ g(λ) s.t. λ ∈ R^m_+, where g(λ) = min_x L(x, λ)


78 Dual problem - Resolution by duality Under some conditions, it is equivalent to solve (P) or (D) : Theorem If the θ_i are continuous over R^d, if for all λ ∈ R^m_+, x*(λ) = arg min_x L(x, λ) is unique, and if x*(λ) is a continuous function of λ, then λ* solves (D) ⇒ x*(λ*) solves (P). If (P) has at least one solution x*, f and the θ_i are convex and x* is regular, then (D) has at least one solution λ*. Remark : (D) is still an optimization problem under constraints, but the constraints λ ∈ R^m_+ are much simpler to handle!


80 Dual problem - Resolution by duality Example Minimize a quadratic function under a quadratic constraint in R :
min_x (x − x_0)² s.t. (x − x_1)² − d ≤ 0
with d > 0, x_1 > x_0. Convex, regular case... unique saddle point of the Lagrangian.

81 Dual problem - Resolution by duality L(x, λ) = (x − x_0)² + λ[(x − x_1)² − d]
Compute g(λ) = min_{x ∈ R} L(x, λ) :
∂L(x, λ)/∂x = 2(x − x_0) + 2λ(x − x_1) = 0 ⇒ x*(λ) = (x_0 + λx_1)/(1 + λ)
g(λ) = (x_1 − x_0)² λ/(1 + λ) − λd
Solve (D) : max_{λ ≥ 0} g(λ) :
g′(λ) = (x_1 − x_0)²/(1 + λ)² − d = 0
λ* = (x_1 − x_0)/√d − 1 if ≥ 0, otherwise λ* = 0 (constraint is inactive)
When λ* > 0, x*(λ*) = x_1 − √d; otherwise, for λ* = 0, x*(λ*) = x_0.
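The whole resolution by duality fits in a few lines for this example, since both the dual maximizer and the primal recovery are closed-form. A sketch (the function name is mine), covering the active and inactive cases:

```python
import numpy as np

def solve_by_duality(x0, x1, d):
    """Resolution by duality of min (x - x0)^2 s.t. (x - x1)^2 - d <= 0,
    assuming x1 > x0 and d > 0: maximize g(lambda) over lambda >= 0, then
    recover x*(lambda*) = (x0 + lambda* x1) / (1 + lambda*)."""
    lam = max((x1 - x0) / np.sqrt(d) - 1.0, 0.0)  # solution of g'(lambda) = 0
    return lam, (x0 + lam * x1) / (1.0 + lam)

lam, x = solve_by_duality(x0=1.0, x1=3.0, d=1.0)        # active constraint
lam0, x_free = solve_by_duality(x0=1.0, x1=1.5, d=1.0)  # inactive constraint
```

The first call reproduces the active case x* = x_1 − √d, the second the inactive case x* = x_0 with λ* = 0.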


84 Dual problem - Resolution by duality Plot of the Lagrangian Case where λ* > 0, i.e. x_1 − x_0 > √d (here x_0 = 1, x_1 = 3, d = 1, λ* = 1)

85 Dual problem - Resolution by duality Plot of the Lagrangian Case where λ* = 0, i.e. x_1 − x_0 ≤ √d (here x_0 = 1, x_1 = 1.5, d = 1)

86 Numerical methods Numerical methods Same principle as for the unconstrained case, with 2 extra difficulties : constraints limit the choice of admissible directions, and progressing along an admissible direction may meet the boundary of D.

87 Numerical methods Penalty functions Also called Lagrangian relaxation. Exterior points method : for equality constraints θ(x) = 0. Principle = penalize non-admissible solutions. Let ψ(x) ≥ 0 with ψ(x) = 0 exactly on D, for example
ψ(x) = ‖θ(x)‖²
Consider the unconstrained problem
min_x F(x) = f(x) + c_k ψ(x), c_k > 0
and let c_k go to +∞.

88 Numerical methods Interior points method : better suited to inequalities θ(x) ≤ 0. Principle = completely forbid non-admissible solutions, penalize those that get close to the boundaries of D. Let ψ(x) ≥ 0 with ψ(x) → +∞ when θ_j(x) → 0, for example
ψ(x) = −Σ_j 1/θ_j(x)
then same as the exterior points method : min_x F(x) = f(x) + c_k ψ(x)...
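The exterior points method can be sketched on a one-dimensional example where the penalized minimizer is known exactly. All names here are mine; the inner unconstrained solves use plain gradient descent under the assumption that a step size shrinking with c_k keeps the iteration stable.

```python
def penalty_minimize(grad_f, grad_psi, x, cs=(1, 10, 100, 1000, 10000),
                     lr_base=0.1, inner=2000):
    """Exterior-point sketch: minimize F(x) = f(x) + c_k psi(x) by gradient
    descent for an increasing sequence c_k, warm-starting each solve."""
    for c in cs:
        lr = lr_base / (1.0 + c)          # smaller steps as F gets stiffer
        for _ in range(inner):
            x = x - lr * (grad_f(x) + c * grad_psi(x))
    return x

# min f(x) = x^2  s.t.  theta(x) = x - 1 = 0, with psi(x) = theta(x)^2;
# the penalized minimizer is c/(1+c), approaching the solution x* = 1.
x = penalty_minimize(lambda x: 2 * x, lambda x: 2 * (x - 1), x=0.0)
```

Each finite c_k gives a slightly infeasible minimizer (here c/(1+c) < 1), which is why c_k must be driven to +∞.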

89 Numerical methods Projected gradient: equality constraints Principle : project ∇f(x_n) on the tangent space to the constraints. Lemma Let C = [C_1, ..., C_m] ∈ R^{d×m} be the matrix formed by m linearly independent (column) vectors C_j of R^d. In R^d, the projection on sp{C_1, ..., C_m} is given by
π_C(x) = Px with P = C(C^T C)^{-1} C^T
Proof : this amounts to solving the quadratic problem min_{α ∈ R^m} ‖x − Cα‖².
Remark : the projection on sp{C_1, ..., C_m}^⊥ = {x : C^T x = 0} is given by the matrix Q = I − P.

Numerical methods - Projected gradient: equality constraints

Affine equality constraints. Replace min_x f(x) s.t. θ(x) = C^t x − c = 0 by min_x F(x) with F(x) = f[π_D(x)].
These two functions coincide on D = {x : θ(x) = 0} = {x : x = π_D(x)}.

Numerical methods - Projected gradient: equality constraints

For x0 ∈ D, one has π_D(x) = x0 + Q(x − x0), so:
F(x) = f[x0 + Q(x − x0)]
∇F(x) = Q ∇f[x0 + Q(x − x0)]
∇²F(x) = Q ∇²f[x0 + Q(x − x0)] Q
∇F(x_n) is the projection of ∇f(x_n) on the subspace {x : C^t x = 0}.
Iterations starting with x0 ∈ D stay in D.
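A sketch of the resulting projected gradient iteration on a hypothetical instance (min ||x − a||² under one affine constraint; the fixed step 0.25 is an illustrative choice in place of a linear search):

```python
import numpy as np

# Projected gradient for min ||x - a||^2  s.t.  C^t x = c  (affine equality).
C = np.array([[1.0], [1.0], [1.0]])     # single constraint x1 + x2 + x3 = 3
c = np.array([3.0])
a = np.array([1.0, 2.0, 6.0])

P = C @ np.linalg.inv(C.T @ C) @ C.T
Q = np.eye(3) - P                       # projection onto directions {d : C^t d = 0}

x = np.array([1.0, 1.0, 1.0])           # feasible starting point: C^t x = c
for _ in range(100):
    grad = 2.0 * (x - a)                # gradient of f
    x = x - 0.25 * (Q @ grad)           # step along the projected gradient

print(np.allclose(C.T @ x, c))          # True: iterates stay in D
print(np.round(x, 6))                   # the constrained minimum, (-1, 0, 4)
```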

Numerical methods - Projected gradient: equality constraints

Non-linear equality constraints: project ∇f(x_n) on the tangent space sp{∇θ1(x_n), ..., ∇θm(x_n)}^⊥, then project x_{n+1} back onto D.
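A sketch of this two-projection scheme on a hypothetical instance, the unit circle ||x||² = 1, where projecting back onto D is just a renormalization (a fixed step replaces the linear search):

```python
import numpy as np

# Projected gradient for a nonlinear equality constraint:
#   min ||x - a||^2   s.t.   theta(x) = ||x||^2 - 1 = 0.
# The tangent space at x is the orthogonal complement of grad theta(x) = 2x.
a = np.array([3.0, 4.0])
x = np.array([1.0, 0.0])                 # feasible starting point on the circle

for _ in range(200):
    g = 2.0 * (x - a)                    # gradient of f
    n = x / np.linalg.norm(x)            # unit normal to the constraint at x
    d = g - (g @ n) * n                  # project the gradient on the tangent space
    x = x - 0.1 * d                      # gradient step along the tangent
    x = x / np.linalg.norm(x)            # project x_{n+1} back onto D

print(np.round(x, 6))                    # the nearest point to a on the circle, a/||a||
```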

Numerical methods - Projected gradient: inequality constraints

Similar to the case of equality constraints, but only the active constraints are considered.
Some constraints may become active or inactive during the linear search...
Stop when the Kuhn-Tucker conditions are met.

Outline

1 Optimization without constraints
Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints
Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods; Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints
3 Conclusion

Conclusion

When facing a constrained optimization problem... one reflex: build the Lagrangian!
L(x, λ) = f(x) + Σ_j λ_j θ_j(x)
Solve ∇L(x, λ) = 0 to find a candidate optimum (x*, λ*).
Check the positivity of its Hessian ∇²_x L(x*, λ*) to determine whether x* is a min, a max or a saddle point of f.
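A worked instance of this recipe (a hypothetical example, not from the slides): for a quadratic f with one affine constraint, ∇L(x, λ) = 0 is a linear system.

```python
import numpy as np

# Recipe applied to:  min x1^2 + x2^2   s.t.   theta(x) = x1 + x2 - 1 = 0.
# L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1), and grad L = 0 reads:
#   2*x1       + lam = 0
#         2*x2 + lam = 0
#   x1 + x2          = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # -> 0.5 0.5 -1.0

# Hessian of L in x is 2*I, positive definite, so (0.5, 0.5) is a minimum.
```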


More information

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0, Name: Solutions to Practice Final. Consider the line r(t) = 3 + t, t, 6. (a) Find symmetric equations for this line. (b) Find the point where the first line r(t) intersects the surface z = x + y. (a) We

More information

1 3 4 = 8i + 20j 13k. x + w. y + w

1 3 4 = 8i + 20j 13k. x + w. y + w ) Find the point of intersection of the lines x = t +, y = 3t + 4, z = 4t + 5, and x = 6s + 3, y = 5s +, z = 4s + 9, and then find the plane containing these two lines. Solution. Solve the system of equations

More information

November 16, 2015. Interpolation, Extrapolation & Polynomial Approximation

November 16, 2015. Interpolation, Extrapolation & Polynomial Approximation Interpolation, Extrapolation & Polynomial Approximation November 16, 2015 Introduction In many cases we know the values of a function f (x) at a set of points x 1, x 2,..., x N, but we don t have the analytic

More information

Zeros of a Polynomial Function

Zeros of a Polynomial Function Zeros of a Polynomial Function An important consequence of the Factor Theorem is that finding the zeros of a polynomial is really the same thing as factoring it into linear factors. In this section we

More information

with functions, expressions and equations which follow in units 3 and 4.

with functions, expressions and equations which follow in units 3 and 4. Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 7: Conditionally Positive Definite Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH 590 Chapter

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information