Outline. Optimization scheme Linear search methods Gradient descent Conjugate gradient Newton method Quasi-Newton methods


1 Outline
1 Optimization without constraints : Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints : Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods (Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints)
3 Conclusion


3 Optimization scheme General numerical optimization scheme
min_{x ∈ R^d} f(x)
1 Init : choose x_0, an initial guess of a minimum x*.
2 Recursion : until a convergence criterion is satisfied at x_n,
at x_n, determine a search direction d_n ∈ R^d,
linear search : find x_{n+1} along the half-line x_n + t d_n, t ∈ R_+ ; amounts to minimizing φ(t) = f(x_n + t d_n) in t > 0.
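The two-step scheme above can be sketched in code. This is a minimal illustration, not from the slides: the names `minimize` and `backtracking` are hypothetical, steepest descent stands in for the direction choice, and a crude Armijo-style step-halving stands in for the linear search.

```python
import numpy as np

def minimize(f, grad, x0, direction, line_search, tol=1e-8, max_iter=5000):
    """Generic scheme: at x_n pick a direction d_n, then a step t > 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # convergence criterion on the gradient
            break
        d = direction(x, g)              # search direction d_n
        t = line_search(f, x, d, g)      # approximate min of phi(t) = f(x_n + t d_n)
        x = x + t * d
    return x

def backtracking(f, x, d, g, t=1.0, beta=0.5, c=0.1):
    """Halve t until f decreases by at least a fraction of the linear model."""
    while t > 1e-12 and f(x + t * d) > f(x) + c * t * (g @ d):
        t *= beta
    return t

# Steepest descent (d_n = -grad f) on a simple quadratic with minimum (1, -2)
f = lambda x: (x[0] - 1)**2 + 4 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 8 * (x[1] + 2)])
x_min = minimize(f, grad, [0.0, 0.0], lambda x, g: -g, backtracking)
```

Any direction rule (conjugate gradient, Newton, quasi-Newton) plugs into the same loop by swapping the `direction` callback.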

4 Optimization scheme Remarks : It is important to notice that f can't be plotted (d large). An optimization scheme is short-sighted, i.e. it only has access to local knowledge of f. The information available determines the method to use : order 0 : only f(x_n) available, 1st order : ∇f(x_n) also known, 2nd order : ∇²f(x_n) known (or estimated). Stop criteria bear on ‖∇f(x_n)‖, on the relative norm of the last step ‖x_{n+1} − x_n‖/‖x_n‖, etc. The linear search may simply be an approximate minimization.

5 Linear search methods Linear search We look for a local minimum of φ : min_{t>0} φ(t) = f(x_n + t d_n). Equivalently, and to simplify notations, we can assume that f : R → R and look for a minimum of f. There exist many methods, according to the assumptions on f : convex, unimodal, C^1, C^2, etc.

6 Linear search methods Newton-Raphson method Assumes f is C^2. Principle : approximate f by its second order expansion around x_n,
f(x) = f(x_n) + f′(x_n)(x − x_n) + ½ f″(x_n)(x − x_n)² + o((x − x_n)²)
take as x_{n+1} the min of the quadratic approx. of f :
x_{n+1} = x_n − f′(x_n) / f″(x_n)

7 Linear search methods Equivalently, amounts to finding a zero of f′(x) :
f′(x) = f′(x_n) + f″(x_n)(x − x_n) + o(x − x_n)
x_{n+1} = x_n − f′(x_n) / f″(x_n)
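The Newton-Raphson recursion can be sketched as a zero-finder for f′. The function name and the test problem below are illustrative, not from the slides.

```python
def newton_raphson(fp, fpp, x, tol=1e-10, max_iter=100):
    """Find a zero of f' (a stationary point of f) by Newton-Raphson:
    x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    for _ in range(max_iter):
        step = fp(x) / fpp(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(t) = t^4 - 2 t^2 + t, so f'(t) = 4t^3 - 4t + 1 and f''(t) = 12t^2 - 4
root = newton_raphson(lambda t: 4*t**3 - 4*t + 1,
                      lambda t: 12*t**2 - 4, x=1.0)
```

Started at x = 1, the iteration settles on a stationary point of f near 0.84.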

8 Linear search methods Secant method Assumes f is only C^1. Principle : same as the Newton-Raphson method, but f″(x_n) is approximated by
(f′(x_n) − f′(x_{n−1})) / (x_n − x_{n−1})
this yields
x_{n+1} = x_n − (x_n − x_{n−1}) / (f′(x_n) − f′(x_{n−1})) · f′(x_n)
Standard way to find the zero of a function when its derivative is unknown.
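A sketch of the secant recursion, using the same illustrative stationary-point problem as for Newton-Raphson but without f″ (the name `secant` is mine):

```python
def secant(fp, x_prev, x, tol=1e-12, max_iter=100):
    """Zero of f' when f'' is unknown: replace f''(x_n) by the finite
    difference (f'(x_n) - f'(x_{n-1})) / (x_n - x_{n-1})."""
    for _ in range(max_iter):
        f_prev, f_cur = fp(x_prev), fp(x)
        if f_cur == f_prev:              # flat secant: cannot divide
            break
        x_prev, x = x, x - (x - x_prev) / (f_cur - f_prev) * f_cur
        if abs(x - x_prev) < tol:
            break
    return x

# Same f'(t) = 4t^3 - 4t + 1 as before, now with two starting points
root = secant(lambda t: 4*t**3 - 4*t + 1, x_prev=1.0, x=0.9)
```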

9 Linear search methods Wolfe's method Assumes f is only C^1. Principle : approximate linear search; proposes an x_{n+1} that sufficiently decreases f(x_{n+1}) w.r.t. f(x_n), and that also sufficiently increases the slope f′(x_{n+1}) w.r.t. f′(x_n). The search of x_{n+1} is done by dichotomy in an interval [a = x_n, b].

10 Linear search methods Let 0 < m_1 < 1/2 < m_2 < 1 be two parameters. The point x is acceptable as x_{n+1} iff
f(x) ≤ f(x_0) + m_1 (x − x_0) f′(x_0)
f′(x) ≥ m_2 f′(x_0)
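The two acceptance tests can be written directly as predicates. This sketch (the name `wolfe_acceptable` and the sample function are mine) checks a candidate point in the 1D setting of the slide:

```python
def wolfe_acceptable(f, fp, x0, x, m1=0.1, m2=0.7):
    """Wolfe's two conditions for accepting x as the next iterate:
    sufficient decrease of f, and sufficient increase of the slope f'."""
    armijo    = f(x) <= f(x0) + m1 * (x - x0) * fp(x0)   # f decreased enough
    curvature = fp(x) >= m2 * fp(x0)                      # slope raised enough
    return armijo and curvature

# f(t) = (t - 2)^2, descending from t = 0 where f'(0) = -4
f  = lambda t: (t - 2.0)**2
fp = lambda t: 2.0 * (t - 2.0)
```

With these parameters, x = 1.5 passes both tests, x = 0.1 fails the curvature test (step too small), and x = 4.0 fails the decrease test (step too large), which is exactly what the dichotomy exploits.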

11 Gradient descent Gradient descent (steepest descent) Back to the general case : f : R^d → R. Principle : performs the linear search along the steepest descent direction d_n = −∇f(x_n). The optimal step t* minimizes φ(t) = f(x_n + t d_n), so
φ′(t*) = ∇f(x_n + t* d_n)^T d_n = 0
the descent stops when the new gradient ∇f(x_{n+1}) becomes orthogonal to the current descent direction d_n = −∇f(x_n).

12 Gradient descent Properties : + Easy to implement, requires only first order information on f. - Slow convergence. Performs poorly even on simple functions like quadratic forms! In practice, a fast suboptimal descent step is preferred.
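On a quadratic form the optimal step has a closed form, which makes the slow zig-zag behaviour easy to reproduce. A minimal sketch (names and the test matrix are mine, not from the slides):

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, n_iter=100):
    """Steepest descent on f(x) = 0.5 x^T A x + b^T x with the exact step
    t* = g^T g / (g^T A g); successive gradients are orthogonal."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = A @ x + b                    # gradient of f
        if np.linalg.norm(g) < 1e-12:
            break
        t = (g @ g) / (g @ (A @ g))      # minimizes phi(t) = f(x - t g)
        x = x - t * g
    return x

A = np.array([[10.0, 0.0], [0.0, 1.0]])  # ill-conditioned: slow zig-zag
b = np.array([-10.0, -1.0])
x = steepest_descent_quadratic(A, b, [0.0, 0.0])  # exact minimum is (1, 1)
```

Even here many iterations are needed; the worse the conditioning of A, the slower the convergence, which motivates the conjugate gradient correction below.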

13 Gradient descent Example : on Rosenbrock's banana f(x) = 100(x_2 − x_1²)² + (1 − x_1)²

14 Conjugate gradient Conjugate gradient Overview : A first order method, simple variation of the gradient descent, designed to perform well on quadratic forms. Idea = tilt the next search direction to better aim at the minimum of the quadratic form.

15 Conjugate gradient We assume A is a positive symmetric matrix and
f(x) = ½ x^T A x + b^T x and ∇f(x) = Ax + b
Principle :
start at x_0, d_0 = −∇f(x_0) ≜ −g_0,
at x_n, instead of d_n = −∇f(x_n) = −g_n, look for the minimum of f in the affine space W_{n+1} = x_0 + sp{d_0, d_1, ..., d_{n−1}, g_n}
Lemma x_{n+1} is the min of f in W_{n+1} ⇔ g_{n+1} ≜ ∇f(x_{n+1}) ⊥ W_{n+1}


17 Conjugate gradient Lemma If x_n is the minimum of f in W_n, then from x_n, direction d_n points to the minimum x_{n+1} in W_{n+1} iff (d_n)^T A d_i = 0 for 0 ≤ i ≤ n−1. The direction d_n is said to be conjugate to all the previous d_i. Proof :
x_{n+1} = x_n + t d_n
g_{n+1} ≜ ∇f(x_{n+1}) = A x_{n+1} + b = g_n + t A d_n
From the previous lemma g_{n+1} ⊥ W_{n+1} and g_n ⊥ W_n, so
(g_{n+1})^T g_n = ‖g_n‖² + t (d_n)^T A g_n = 0 ⇒ t ≠ 0
(g_{n+1})^T d_i = (g_n)^T d_i + t (d_n)^T A d_i = 0 for 0 ≤ i ≤ n−1


19 Conjugate gradient Question : how to find direction d_n, conjugate to all previous d_i ? Notice g_{i+1} − g_i = A(x_{i+1} − x_i) ∝ A d_i, so
(d_n)^T A d_i = 0 ⇔ (d_n)^T g_{i+1} = (d_n)^T g_i = cst
Since the g_i form an orthogonal family, one has
d_n ∝ Σ_{i=0}^{n} g_i / ‖g_i‖², i.e. d_n = −g_n + c_n d_{n−1}
Answer : steepest slope, slightly corrected by the previous descent direction.

20 Conjugate gradient Example : on Rosenbrock s banana

21 Conjugate gradient Expressions of the correction coefficient c_n :
c_n = ‖g_n‖² / ‖g_{n−1}‖² (Fletcher & Reeves, 1964)
c_n = (g_n − g_{n−1})^T g_n / ‖g_{n−1}‖² (Polak & Ribière, 1971)
c_n = (g_n)^T A d_{n−1} / ‖d_{n−1}‖²_A
Properties : converges in d steps for a quadratic form f : R^d → R, at the same complexity as the gradient method! Works well on non quadratic forms if the Hessian doesn't change much between x_n and x_{n+1}. Caution : d_n may not be a descent direction... In this case, reset to −g_n.
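A sketch of the method with the Polak-Ribière coefficient and the reset safeguard (function names are mine; on the quadratic test problem the exact step is available in closed form, so convergence takes d = 2 steps):

```python
import numpy as np

def conjugate_gradient(grad, x, line_search, n_iter=100, tol=1e-8):
    """Nonlinear conjugate gradient with the Polak-Ribiere coefficient
    c_n = (g_n - g_{n-1})^T g_n / ||g_{n-1}||^2 and d_n = -g_n + c_n d_{n-1}."""
    g = grad(x)
    d = -g
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        t = line_search(x, d)
        x = x + t * d
        g_new = grad(x)
        c = (g_new - g) @ g_new / (g @ g)   # Polak & Ribiere (1971)
        d = -g_new + c * d
        if d @ g_new >= 0:                  # not a descent direction: reset
            d = -g_new
        g = g_new
    return x

# Quadratic f(x) = 0.5 x^T A x + b^T x: the exact step along d is closed-form
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
grad = lambda x: A @ x + b
exact_step = lambda x, d: -(grad(x) @ d) / (d @ (A @ d))
x = conjugate_gradient(grad, np.zeros(2), exact_step)
```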


23 Newton method Newton method Principle : replace f by its second order approximation at x_n
φ(x) = f(x_n) + ∇f(x_n)^T (x − x_n) + ½ (x − x_n)^T ∇²f(x_n) (x − x_n)
take as x_{n+1} the min of φ(x) :
∇φ(x) = ∇f(x_n) + ∇²f(x_n)(x − x_n)
which amounts to solving the linear system
∇²f(x_n) (x_{n+1} − x_n) = −∇f(x_n)

24 Newton method Example : on Rosenbrock s banana

25 Newton method Comments : + faster convergence (1 step for quadratic functions!), but expensive : requires second order information on f; yields a stationary point of f : one still has to check that it is a minimum. In practice, try d_n = −[∇²f(x_n)]^{-1} ∇f(x_n) as descent direction, and perform a linear search. - no guarantee that d_n is an admissible descent direction... - no guarantee that x_{n+1} is a better point than x_n. ∇²f(x_n) may be singular, or badly conditioned... the Levenberg-Marquardt regularization suggests to solve
[∇²f(x_n) + µI] d_n = −∇f(x_n)
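The regularized iteration can be sketched as follows, on Rosenbrock's banana from the classic start (−1.2, 1). This is an illustrative implementation, not the slides' code: µ is increased until the regularized Hessian is positive definite, and the step is crudely damped so that f never increases.

```python
import numpy as np

def newton_lm(f, grad, hess, x, mu=1e-3, n_iter=200, tol=1e-8):
    """Newton iteration with Levenberg-Marquardt regularization:
    solve [hess(x) + mu I] d = -grad(x), with mu raised until the
    system is positive definite, and a step-halving damping on f."""
    x = np.asarray(x, dtype=float)
    I = np.eye(len(x))
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H, m = hess(x), mu
        while True:                       # raise mu until H + mu I is PD
            try:
                np.linalg.cholesky(H + m * I)
                break
            except np.linalg.LinAlgError:
                m *= 10.0
        d = np.linalg.solve(H + m * I, -g)
        t = 1.0
        while t > 1e-12 and f(x + t * d) > f(x):   # crude damping
            t *= 0.5
        x = x + t * d
    return x

# Rosenbrock's banana f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2, minimum at (1, 1)
f    = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                           200 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                           [-400 * x[0], 200.0]])
x = newton_lm(f, grad, hess, [-1.2, 1.0])
```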

26 Quasi-Newton methods Quasi-Newton methods Principle : take advantage of the efficiency of the Newton method when the Hessian ∇²f(x) is unavailable! Idea : approximate [∇²f(x_n)]^{-1} by a matrix K_n in
x_{n+1} = x_n − [∇²f(x_n)]^{-1} ∇f(x_n)
More precisely, explore direction d_n = −K_n ∇f(x_n) from x_n.

27 Quasi-Newton methods Quasi-Newton equation Consider the second order Taylor expansion of f at x_n
f(x) = f(x_n) + ∇f(x_n)^T (x − x_n) + ½ (x − x_n)^T ∇²f(x_n)(x − x_n) + o(‖x − x_n‖²)
∇f(x) = ∇f(x_n) + ∇²f(x_n)(x − x_n) + o(‖x − x_n‖)
The estimate K_n of the inverse Hessian must satisfy the quasi-Newton equation (QNE)
x_{n+1} − x_n = K_{n+1} [∇f(x_{n+1}) − ∇f(x_n)]
Notice that this should be K_n... but K_n is used to find x_{n+1}, so we impose that the relation be satisfied at the next step.


29 Quasi-Newton methods All quasi-Newton methods recursively build the K_n by
K_{n+1} = K_n + C_n
where the correction C_n is adjusted to satisfy the QNE. Notations :
u_n = x_n − x_{n−1}
v_n = g_n − g_{n−1}
QNE : u_{n+1} = K_{n+1} v_{n+1}

30 Quasi-Newton methods Correction C_n of rank 1
K_{n+1} = K_n + w_n (w_n)^T / ((w_n)^T v_{n+1}) where w_n = u_{n+1} − K_n v_{n+1}
If initialized with K_0 = I, K_n converges in d steps to the true A^{-1} for a quadratic form.
DFP (Davidon, Fletcher, Powell) correction of rank 2
K_{n+1} = K_n + u_{n+1} (u_{n+1})^T / ((u_{n+1})^T v_{n+1}) − K_n v_{n+1} (v_{n+1})^T K_n / ((v_{n+1})^T K_n v_{n+1})
Converges in d steps to the true A^{-1} for a quadratic form. Descent directions are conjugate w.r.t. A. Coincides with the conjugate gradient method.


32 Quasi-Newton methods BFGS (Broyden, Fletcher, Goldfarb, Shanno, 1970), correction of rank 3
K_{n+1} = K_n − [u_{n+1} (v_{n+1})^T K_n + K_n v_{n+1} (u_{n+1})^T] / ((u_{n+1})^T v_{n+1}) + (1 + (v_{n+1})^T K_n v_{n+1} / ((u_{n+1})^T v_{n+1})) · u_{n+1} (u_{n+1})^T / ((u_{n+1})^T v_{n+1})
Considered as the best quasi-Newton method. In practice, one should check that −K_n g_n is a descent direction, i.e. (g_n)^T K_n g_n > 0, otherwise reinitialize with K_n = I.
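A sketch of the BFGS update in the favourable quadratic case, where the exact line search is available in closed form (the function name is mine): starting from K_0 = I, after d steps the iterate reaches the minimizer and K_d recovers A^{-1}.

```python
import numpy as np

def bfgs_quadratic(A, b, x0):
    """BFGS on the quadratic f(x) = 0.5 x^T A x + b^T x with exact line
    search; initialized with K_0 = I, K_d equals A^{-1} after d steps."""
    x = np.asarray(x0, dtype=float)
    d_dim = len(x)
    K = np.eye(d_dim)                    # K_0 = I
    g = A @ x + b
    for _ in range(d_dim):
        d = -K @ g                       # search direction d_n = -K_n g_n
        t = -(g @ d) / (d @ (A @ d))     # exact step along d
        x_new = x + t * d
        g_new = A @ x_new + b
        u = x_new - x                    # u_{n+1}
        v = g_new - g                    # v_{n+1}
        uv = u @ v
        # BFGS update of the inverse-Hessian estimate (all K's are the old K)
        K = (K - (np.outer(u, v @ K) + np.outer(K @ v, u)) / uv
               + (1 + (v @ K @ v) / uv) * np.outer(u, u) / uv)
        x, g = x_new, g_new
    return x, K

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x, K = bfgs_quadratic(A, b, np.zeros(2))
```

On general f, the same update is used with u, v computed from an approximate (e.g. Wolfe) line search, plus the descent-direction check of the slide.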


34 Quasi-Newton methods Example : on Rosenbrock s banana

35 Outline
1 Optimization without constraints : Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints : Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods (Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints)
3 Conclusion

36 Lagrange Joseph Louis, count of Lagrange (Giuseppe Ludovico di Lagrangia), Italian mathematician: born in Turin (1736), founder of the Academy of Turin (1758), called by Euler to the Academy of Berlin, director of the French Academy of Sciences (1788), survived the French revolution (b.c.w. Condorcet,...), resting at the Pantheon (1813).

37 Lagrange Among his contributions : the calculus of variations, the Taylor-Lagrange formula, the least action principle in mechanics, some results on the three-body problem (the Lagrange points)... and the notion of Lagrangian!

38 Equality constraints Equality constraints
min_x f(x) s.t. θ_j(x) = 0, 1 ≤ j ≤ m
D : θ(x) = [θ_1(x), ..., θ_m(x)]^T = 0 defines a manifold of dimension d − m in R^d
∇θ_j(x_0)^T (x − x_0) = 0 : tangent hyperplane to θ_j(x) = 0 at point x_0
∇θ(x_0)^T (x − x_0) = 0 : tangent space to D at x_0

39 Equality constraints Definition In domain D = {x ∈ R^d : θ(x) = 0}, the point x_0 is regular iff the gradients ∇θ_j(x_0) of the m constraints are linearly independent. Lemma If x_0 is regular, every (unit) direction d in the tangent space is admissible, i.e. can be obtained as the limit of (x_n − x_0)/‖x_n − x_0‖, with lim_n x_n = x_0 and x_n ∈ D.

40 Equality constraints Theorem Let x* be a regular point of D; if x* is a local extremum of f in D, then there exists a unique vector λ* ∈ R^m of Lagrange multipliers such that
∇f(x*) + Σ_{j=1}^m λ_j* ∇θ_j(x*) = 0

41 Equality constraints Proof : project ∇f(x*) on sp{∇θ_1(x*), ..., ∇θ_m(x*)} :
∇f(x*) = −Σ_{j=1}^m λ_j* ∇θ_j(x*) + u
u belongs to the tangent space to D at x*; progressing along −u decreases f and doesn't change θ (to first order), so u = 0 at an extremum.

42 Equality constraints Application To solve min_x f(x) s.t. θ(x) = 0 :
1 build the Lagrangian L(x, λ) = f(x) + Σ_j λ_j θ_j(x)
2 find a stationary point (x*, λ*) of the Lagrangian, i.e. a zero of ∇L(x, λ) :
∇_x L(x, λ) = ∇f(x) + Σ_j λ_j ∇θ_j(x)
∇_λ L(x, λ) = θ(x)
i.e. d + m (non-linear) equations, with d + m unknowns.

43 Equality constraints Example Problem : find the radius x_1 and the height x_2 of a cooking pan in order to minimize its surface, s.t. the capacity of the pan is 1 litre.
f(x) = πx_1² + 2πx_1x_2
θ(x) = πx_1²x_2 − 1

44 Equality constraints

45 Equality constraints Solution : Lagrangian L(x, λ) = f(x) + λθ(x)
∂L(x, λ)/∂x_1 = 2πx_1 + 2πx_2 + 2πλx_1x_2 = 0
∂L(x, λ)/∂x_2 = 2πx_1 + λπx_1² = 0
We obtain x_1* = x_2* = −2/λ. Finally, θ(x*) = 0 gives the value of λ to plug : λ* = −2π^{1/3}, so x_1* = x_2* = π^{−1/3}.
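The claimed optimum is easy to check numerically: at (x*, λ*) the gradient of the Lagrangian must vanish and the volume constraint must hold. The sign conventions below follow my reconstruction of the slide (λ* negative with L = f + λθ); a minimal verification sketch:

```python
import numpy as np

# Cooking-pan problem: minimize surface f = pi x1^2 + 2 pi x1 x2
# subject to volume theta = pi x1^2 x2 - 1 = 0.
x1 = x2 = np.pi ** (-1.0 / 3.0)      # claimed optimum x1* = x2* = pi^(-1/3)
lam = -2.0 * np.pi ** (1.0 / 3.0)    # claimed multiplier lambda*

# Gradient of the Lagrangian L = f + lam * theta must vanish at (x*, lam*)
dL_dx1 = 2 * np.pi * x1 + 2 * np.pi * x2 + lam * 2 * np.pi * x1 * x2
dL_dx2 = 2 * np.pi * x1 + lam * np.pi * x1 ** 2
volume = np.pi * x1 ** 2 * x2 - 1    # theta(x*) should be 0
```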

46 Equality constraints Another interpretation Consider the unconstrained problem, where λ is fixed :
min_x L(x, λ) = f(x) + λθ(x)
f and L have the same local minima in D = {x : θ(x) = 0}. Let x*(λ) be a local minimum of L(x, λ) in R^d. If x*(λ) ∈ D, then it is also a local min of f. So one just has to adjust λ to get this property.

47 Equality constraints Second order conditions Theorem Let (x*, λ*) be a stationary point of L(x, λ), and consider the Hessian of the Lagrangian
∇²_x L(x*, λ*) = ∇²f(x*) + Σ_{j=1}^m λ_j* ∇²θ_j(x*)
NC : x* is a local min of f on D ⇒ ∇²_x L(x*, λ*) is a positive quadratic form on the tangent space at x*, i.e. the kernel of matrix ∇θ(x*)^T.
SC : ∇²_x L(x*, λ*) is strictly positive on the tangent space ⇒ x* is a local min of f on D.

48 Equality constraints

49 Inequality constraints Inequality constraints
min_x f(x) s.t. θ_j(x) ≤ 0, 1 ≤ j ≤ m
D : θ(x) = [θ_1(x), ..., θ_m(x)]^T ≤ 0 defines a volume in R^d limited by m manifolds of dimension d − 1. At point x, constraint θ_j is active iff θ_j(x) = 0. A(x) = {j : θ_j(x) = 0} = active set at x. One could have simultaneously equality and inequality constraints (not done here for the sake of clarity). Equality constraints are always active.
∩_{j ∈ A(x_0)} {x : ∇θ_j(x_0)^T (x − x_0) = 0} defines the tangent space to D at x_0.

50 Inequality constraints Admissible directions Let x_0 ∈ D; we look for directions d ∈ R^d that keep us inside domain D : x_0 + ε d ∈ D. Definition Direction d is admissible from x_0 iff there exists (x_n)_{n>0} in D such that lim_n x_n = x_0 and
lim_n (x_n − x_0)/‖x_n − x_0‖ = d/‖d‖

51 Inequality constraints Admissible directions at x_0 form a cone C(x_0). This cone is not necessarily convex... C(x_0) can be determined from the ∇θ_j(x_0) of the active constraints.

52 Inequality constraints Theorem If x_0 is a regular point, i.e. the gradients of the active constraints at x_0 are linearly independent, then C(x_0) is the convex cone given by
C(x_0) = {u ∈ R^d : ∇θ_j(x_0)^T u ≤ 0, j ∈ A(x_0)}
Interpretation : an admissible displacement must not increase the value of θ_j for an already active constraint; it can only decrease it or leave it unchanged.

53 Inequality constraints Dual cone and Farkas lemma For v_1, ..., v_J ∈ R^d, consider the cone C = {u : u^T v_1 ≤ 0, ..., u^T v_J ≤ 0}. Farkas-Minkowski lemma Let g ∈ R^d; one has the equivalence of
∀u ∈ C, g^T u ≤ 0, i.e. C is included in the half-space {u : g^T u ≤ 0},
g belongs to the dual cone C* = {w : ∀u ∈ C, w^T u ≤ 0},
g = Σ_{j=1}^J α_j v_j where α_j ≥ 0 for all j.

54 Inequality constraints 1st order optimality conditions Theorem (Karush-Kuhn-Tucker conditions) Let x* be a regular point of domain D. If x* is a local minimum of f in D, there exists a unique set of generalized Lagrange multipliers λ_j*, for j ∈ A(x*), such that
∇f(x*) + Σ_{j ∈ A(x*)} λ_j* ∇θ_j(x*) = 0 and λ_j* ≥ 0, j ∈ A(x*)
Remarks : similar to the case of equality constraints : here only active constraints are considered. The positivity condition is new : it translates the fact that one side of the manifold is permitted.


56 Inequality constraints Proof : take any admissible direction d ∈ C(x*) = {u : u^T ∇θ_j(x*) ≤ 0, j ∈ A(x*)}; progressing along d doesn't decrease f : [∇f(x*)]^T d ≥ 0. This means that g = −∇f(x*) belongs to the dual cone C(x*)*, so by Farkas lemma
−∇f(x*) = Σ_{j ∈ A(x*)} λ_j* ∇θ_j(x*) and λ_j* ≥ 0

57 Inequality constraints Corollary The Karush-Kuhn-Tucker conditions are equivalent to
∇f(x*) + Σ_{j=1}^m λ_j* ∇θ_j(x*) = 0 and λ_j* ≥ 0, 1 ≤ j ≤ m
with the extra complementarity condition
Σ_{j=1}^m λ_j* θ_j(x*) = 0
This entails λ_j* = 0 for an inactive constraint θ_j at x*. To be usable, requires to know/guess the set of active constraints at the optimum. A(x*) known, this leaves a set of non-linear equations + positivity constraints.

58 Inequality constraints Example Problem : minimize the distance from point P to the red segment
min_x f(x) = (x_1 − 1)² + (x_2 − 2)²
subject to
θ_1(x) = x_1 − x_2 − 1 = 0
θ_2(x) = x_1 + x_2 − 2 ≤ 0
θ_3(x) = −x_1 ≤ 0
θ_4(x) = −x_2 ≤ 0

59 Inequality constraints Objective : cancel the gradient of the Lagrangian L(x, λ)
∂L(x, λ)/∂x_1 = 2(x_1 − 1) + λ_1 + λ_2 − λ_3 = 0
∂L(x, λ)/∂x_2 = 2(x_2 − 2) − λ_1 + λ_2 − λ_4 = 0
equality : x_1 − x_2 − 1 = 0
inequalities : λ_2(x_1 + x_2 − 2) = 0, λ_2 ≥ 0 ; λ_3 x_1 = 0, λ_3 ≥ 0 ; λ_4 x_2 = 0, λ_4 ≥ 0
1st guess : A(x*) = {1}, i.e. only θ_1 active at the optimum. Complementarity ⇒ λ_2 = λ_3 = λ_4 = 0. This yields x* = (2, 1), which violates θ_2(x) ≤ 0.

60 Inequality constraints Objective : cancel the gradient of the Lagrangian L(x, λ)
∂L(x, λ)/∂x_1 = 2(x_1 − 1) + λ_1 + λ_2 − λ_3 = 0
∂L(x, λ)/∂x_2 = 2(x_2 − 2) − λ_1 + λ_2 − λ_4 = 0
equality : x_1 − x_2 − 1 = 0
inequalities : λ_2(x_1 + x_2 − 2) = 0, λ_2 ≥ 0 ; λ_3 x_1 = 0, λ_3 ≥ 0 ; λ_4 x_2 = 0, λ_4 ≥ 0
2nd guess : A(x*) = {1, 2}, i.e. θ_2 is added to the active set. Complementarity ⇒ λ_3 = λ_4 = 0. This yields x* = (3/2, 1/2), which belongs to D.
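With the active set guessed, the remaining stationarity system is linear in the multipliers and can be solved directly. A sketch of the check for the second guess, under my reconstruction of the constraint signs (θ_1 = x_1 − x_2 − 1 = 0, θ_2 = x_1 + x_2 − 2 ≤ 0):

```python
import numpy as np

# KKT check at the claimed optimum x* = (3/2, 1/2) with active set {1, 2}
# (theta1 equality, theta2 active inequality; lambda3 = lambda4 = 0).
x = np.array([1.5, 0.5])
# Stationarity reduces to a 2x2 linear system in (lambda1, lambda2):
#   2(x1 - 1) + lambda1 + lambda2 = 0
#   2(x2 - 2) - lambda1 + lambda2 = 0
M = np.array([[1.0, 1.0], [-1.0, 1.0]])
rhs = -np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])
lam1, lam2 = np.linalg.solve(M, rhs)
feasible = (abs(x[0] - x[1] - 1) < 1e-12          # theta1 = 0
            and x[0] + x[1] - 2 <= 1e-12          # theta2 <= 0
            and x[0] >= 0 and x[1] >= 0)          # theta3, theta4
```

The inequality multiplier λ_2 comes out positive (the KKT sign condition), while λ_1, attached to the equality constraint, is free to be negative.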

61 Dual problem - Resolution by duality Dual problem - Resolution by duality [For simplicity we consider the case of inequality constraints.] Idea : under some conditions, a stationary point (x*, λ*) of the Lagrangian, i.e.
∇L(x*, λ*) = ∇f(x*) + Σ_i λ_i* ∇θ_i(x*) = 0
corresponds to a saddle point of the Lagrangian, i.e.
inf_x L(x, λ*) = L(x*, λ*) = sup_λ L(x*, λ)
So the resolution amounts to finding such saddle points, and then extracting x*.

62 Dual problem - Resolution by duality Saddle points Definition (x*, λ*) is a saddle point of L in D_x × D_λ iff
sup_{λ ∈ D_λ} L(x*, λ) = L(x*, λ*) = inf_{x ∈ D_x} L(x, λ*)

63 Dual problem - Resolution by duality Lemma If (x*, λ*) is a saddle point of L in D_x × D_λ, then
sup_{λ ∈ D_λ} inf_{x ∈ D_x} L(x, λ) = L(x*, λ*) = inf_{x ∈ D_x} sup_{λ ∈ D_λ} L(x, λ)
Proof : one always has sup_λ inf_x L(x, λ) ≤ inf_x sup_λ L(x, λ); the difference is called the duality gap, generally > 0. From the def. of a saddle point, one has then
inf_x [sup_λ L(x, λ)] ≤ sup_λ L(x*, λ) = L(x*, λ*) = inf_x L(x, λ*) ≤ sup_λ [inf_x L(x, λ)]
Morality : one can look for λ* first, and then for x*...


67 Dual problem - Resolution by duality Saddle points of the Lagrangian Theorem If (x*, λ*) is a saddle point of the Lagrangian L in R^d × R^m_+, then x* is a solution of the primal problem (P)
(P) min_x f(x) s.t. θ_i(x) ≤ 0, 1 ≤ i ≤ m
Proof : from L(x*, λ) ≤ L(x*, λ*), ∀λ ∈ D_λ = R^m_+ :
f(x*) + Σ_i λ_i θ_i(x*) ≤ f(x*) + Σ_i λ_i* θ_i(x*)
Σ_i (λ_i − λ_i*) θ_i(x*) ≤ 0
whence θ_i(x*) ≤ 0 by λ_i → +∞ : x* satisfies the constraints.


69 Dual problem - Resolution by duality Moreover, Σ_i λ_i* θ_i(x*) ≥ 0, by λ_i = 0, and so Σ_i λ_i* θ_i(x*) = 0 (complementarity condition). From L(x*, λ*) ≤ L(x, λ*), ∀x ∈ R^d :
f(x*) + Σ_i λ_i* θ_i(x*) ≤ f(x) + Σ_i λ_i* θ_i(x)
so for all admissible x, i.e. such that θ_i(x) ≤ 0, 1 ≤ i ≤ m :
f(x*) ≤ f(x)
Summary : saddle points of the Lagrangian, when they exist, give solutions to the optimization problem. But they don't always exist...


72 Dual problem - Resolution by duality Existence of saddle points Theorem If f and the constraints θ_i are convex functions of x in R^d, and if x* = arg min_x f(x) in {x : θ_i(x) ≤ 0, 1 ≤ i ≤ m} is regular, then x* corresponds to a saddle point (x*, λ*) of the Lagrangian. Proof : from Kuhn-Tucker, derive the saddle point property :
L(x*, λ) = f(x*) + Σ_i λ_i θ_i(x*) ≤ f(x*) = f(x*) + Σ_i λ_i* θ_i(x*) = L(x*, λ*)
using admissibility of x*, positivity of the λ_i and complementarity.
L(x, λ*) = f(x) + Σ_i λ_i* θ_i(x) is a convex function of x. From the stationarity of L, one has ∇_x L(x*, λ*) = ∇f(x*) + Σ_i λ_i* ∇θ_i(x*) = 0, sufficient to show that x* is a minimum of the convex function L(x, λ*).


75 Dual problem - Resolution by duality Dual problem Summary : provided the Lagrangian has saddle points, solutions to
(P) min_x f(x) s.t. θ_i(x) ≤ 0, 1 ≤ i ≤ m
are the 1st argument of a saddle point (x*, λ*) of the Lagrangian L(x, λ). If λ* were known, this amounts to solving an unconstrained problem :
x* = arg min_x L(x, λ*)
How to find such a λ* ? One has L(x*, λ*) = max_{λ ∈ R^m_+} min_x L(x, λ), so λ* should be a solution of the dual problem (D)
(D) max_λ g(λ) s.t. λ ∈ R^m_+, where g(λ) = min_x L(x, λ)


78 Dual problem - Resolution by duality Under some conditions, it is equivalent to solve (P) or (D) : Theorem If the θ_i are continuous over R^d, if for all λ ∈ R^m_+, x*(λ) = arg min_x L(x, λ) is unique, and if x*(λ) is a continuous function of λ, then λ* solves (D) ⇒ x*(λ*) solves (P). If (P) has at least one solution x*, f and the θ_i are convex and x* is regular, then (D) has at least one solution λ*. Remark : (D) is still an optimization problem under constraints, but the constraints λ ∈ R^m_+ are much simpler to handle!


80 Dual problem - Resolution by duality Example Minimize a quadratic function under a quadratic constraint in R :
min_x (x − x_0)² s.t. (x − x_1)² − d ≤ 0
with d > 0, x_1 > x_0. Convex, regular case... unique saddle point of the Lagrangian.

81 Dual problem - Resolution by duality L(x, λ) = (x − x_0)² + λ[(x − x_1)² − d]
Compute g(λ) = min_{x ∈ R} L(x, λ) :
∂L(x, λ)/∂x = 2(x − x_0) + 2λ(x − x_1) = 0 ⇒ x*(λ) = (x_0 + λx_1)/(1 + λ)
g(λ) = (x_1 − x_0)² λ/(1 + λ) − λd
Solve (D) : max_{λ ≥ 0} g(λ) :
g′(λ) = (x_1 − x_0)²/(1 + λ)² − d = 0
λ* = (x_1 − x_0)/√d − 1 if ≥ 0, otherwise λ* = 0 (constraint is inactive)
When λ* > 0, x*(λ*) = x_1 − √d; otherwise, for λ* = 0, x*(λ*) = x_0.
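The whole resolution by duality fits in a few lines for this example, since both the dual maximizer and the primal recovery are closed-form. A sketch (the function name is mine), covering the active and inactive cases:

```python
import numpy as np

def solve_by_duality(x0, x1, d):
    """Resolution by duality of min (x - x0)^2 s.t. (x - x1)^2 - d <= 0,
    assuming x1 > x0 and d > 0: maximize g(lambda) over lambda >= 0, then
    recover x*(lambda*) = (x0 + lambda* x1) / (1 + lambda*)."""
    lam = max((x1 - x0) / np.sqrt(d) - 1.0, 0.0)  # solution of g'(lambda) = 0
    return lam, (x0 + lam * x1) / (1.0 + lam)

lam, x = solve_by_duality(x0=1.0, x1=3.0, d=1.0)        # active constraint
lam0, x_free = solve_by_duality(x0=1.0, x1=1.5, d=1.0)  # inactive constraint
```

The first call reproduces the active case x* = x_1 − √d, the second the inactive case x* = x_0 with λ* = 0.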


84 Dual problem - Resolution by duality Plot of the Lagrangian Case where λ* > 0, i.e. x_1 − x_0 > √d (here x_0 = 1, x_1 = 3, d = 1, λ* = 1)

85 Dual problem - Resolution by duality Plot of the Lagrangian Case where λ* = 0, i.e. x_1 − x_0 ≤ √d (here x_0 = 1, x_1 = 1.5, d = 1)

86 Numerical methods Numerical methods Same principle as for the unconstrained case, with 2 extra difficulties : constraints limit the choice of admissible directions, and progressing along an admissible direction may meet the boundary of D.

87 Numerical methods Penalty functions Also called Lagrangian relaxation. Exterior points method : for equality constraints θ(x) = 0. Principle = penalize non-admissible solutions. Let ψ(x) ≥ 0 with ψ(x) = 0 exactly on D, for example
ψ(x) = ‖θ(x)‖²
Consider the unconstrained problem
min_x F(x) = f(x) + c_k ψ(x), c_k > 0
and let c_k go to +∞.

88 Numerical methods Interior points method : better suited to inequalities θ(x) ≤ 0. Principle = completely forbid non-admissible solutions, penalize those that get close to the boundaries of D. Let ψ(x) ≥ 0 with ψ(x) → +∞ when θ_j(x) → 0, for example
ψ(x) = −Σ_j 1/θ_j(x)
then same as the exterior points method : min_x F(x) = f(x) + c_k ψ(x)...
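The exterior points method can be sketched on a one-dimensional example where the penalized minimizer is known exactly. All names here are mine; the inner unconstrained solves use plain gradient descent under the assumption that a step size shrinking with c_k keeps the iteration stable.

```python
def penalty_minimize(grad_f, grad_psi, x, cs=(1, 10, 100, 1000, 10000),
                     lr_base=0.1, inner=2000):
    """Exterior-point sketch: minimize F(x) = f(x) + c_k psi(x) by gradient
    descent for an increasing sequence c_k, warm-starting each solve."""
    for c in cs:
        lr = lr_base / (1.0 + c)          # smaller steps as F gets stiffer
        for _ in range(inner):
            x = x - lr * (grad_f(x) + c * grad_psi(x))
    return x

# min f(x) = x^2  s.t.  theta(x) = x - 1 = 0, with psi(x) = theta(x)^2;
# the penalized minimizer is c/(1+c), approaching the solution x* = 1.
x = penalty_minimize(lambda x: 2 * x, lambda x: 2 * (x - 1), x=0.0)
```

Each finite c_k gives a slightly infeasible minimizer (here c/(1+c) < 1), which is why c_k must be driven to +∞.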

89 Numerical methods Projected gradient: equality constraints Principle : project ∇f(x_n) on the tangent space to the constraints. Lemma Let C = [C_1, ..., C_m] ∈ R^{d×m} be the matrix formed by m linearly independent (column) vectors C_j of R^d. In R^d, the projection on sp{C_1, ..., C_m} is given by
π_C(x) = Px with P = C(C^T C)^{-1} C^T
Proof : this amounts to solving the quadratic problem min_{α ∈ R^m} ‖x − Cα‖².
Remark : the projection on sp{C_1, ..., C_m}^⊥ = {x : C^T x = 0} is given by the matrix Q = I − P.

Numerical methods - Projected gradient: equality constraints

Affine equality constraints. Replace min_x f(x) s.t. θ(x) = C^t x − c = 0 by min_x F(x) with F(x) = f[π_D(x)].
These two functions coincide on D = {x : θ(x) = 0} = {x : x = π_D(x)}.

Numerical methods - Projected gradient: equality constraints

For x0 ∈ D, one has π_D(x) = x0 + Q(x − x0), so:
F(x) = f[x0 + Q(x − x0)]
∇F(x) = Q ∇f[x0 + Q(x − x0)]
∇²F(x) = Q ∇²f[x0 + Q(x − x0)] Q
∇F(x_n) is the projection of ∇f(x_n) on the subspace {x : C^t x = 0}.
Iterations starting with x0 ∈ D stay in D.
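A sketch of the resulting projected gradient iteration on a hypothetical instance (min ||x − a||² under one affine constraint; the fixed step 0.25 is an illustrative choice in place of a linear search):

```python
import numpy as np

# Projected gradient for min ||x - a||^2  s.t.  C^t x = c  (affine equality).
C = np.array([[1.0], [1.0], [1.0]])     # single constraint x1 + x2 + x3 = 3
c = np.array([3.0])
a = np.array([1.0, 2.0, 6.0])

P = C @ np.linalg.inv(C.T @ C) @ C.T
Q = np.eye(3) - P                       # projection onto directions {d : C^t d = 0}

x = np.array([1.0, 1.0, 1.0])           # feasible starting point: C^t x = c
for _ in range(100):
    grad = 2.0 * (x - a)                # gradient of f
    x = x - 0.25 * (Q @ grad)           # step along the projected gradient

print(np.allclose(C.T @ x, c))          # True: iterates stay in D
print(np.round(x, 6))                   # the constrained minimum, (-1, 0, 4)
```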

Numerical methods - Projected gradient: equality constraints

Non-linear equality constraints: project ∇f(x_n) on the tangent space sp{∇θ1(x_n), ..., ∇θm(x_n)}^⊥, then project x_{n+1} back onto D.
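A sketch of this two-projection scheme on a hypothetical instance, the unit circle ||x||² = 1, where projecting back onto D is just a renormalization (a fixed step replaces the linear search):

```python
import numpy as np

# Projected gradient for a nonlinear equality constraint:
#   min ||x - a||^2   s.t.   theta(x) = ||x||^2 - 1 = 0.
# The tangent space at x is the orthogonal complement of grad theta(x) = 2x.
a = np.array([3.0, 4.0])
x = np.array([1.0, 0.0])                 # feasible starting point on the circle

for _ in range(200):
    g = 2.0 * (x - a)                    # gradient of f
    n = x / np.linalg.norm(x)            # unit normal to the constraint at x
    d = g - (g @ n) * n                  # project the gradient on the tangent space
    x = x - 0.1 * d                      # gradient step along the tangent
    x = x / np.linalg.norm(x)            # project x_{n+1} back onto D

print(np.round(x, 6))                    # the nearest point to a on the circle, a/||a||
```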

Numerical methods - Projected gradient: inequality constraints

Similar to the case of equality constraints, but only the active constraints are considered.
Some constraints may become active or inactive during the linear search...
Stop when the Kuhn-Tucker conditions are met.

Outline

1 Optimization without constraints
Optimization scheme; Linear search methods; Gradient descent; Conjugate gradient; Newton method; Quasi-Newton methods
2 Optimization under constraints
Lagrange; Equality constraints; Inequality constraints; Dual problem - Resolution by duality; Numerical methods; Penalty functions; Projected gradient: equality constraints; Projected gradient: inequality constraints
3 Conclusion

Conclusion

When facing a constrained optimization problem... one reflex: build the Lagrangian!
L(x, λ) = f(x) + Σ_j λ_j θ_j(x)
Solve ∇L(x, λ) = 0 to find a candidate optimum (x*, λ*).
Check the positivity of its Hessian ∇²_x L(x*, λ*) to determine whether x* is a min, a max or a saddle point of f.
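A worked instance of this recipe (a hypothetical example, not from the slides): for a quadratic f with one affine constraint, ∇L(x, λ) = 0 is a linear system.

```python
import numpy as np

# Recipe applied to:  min x1^2 + x2^2   s.t.   theta(x) = x1 + x2 - 1 = 0.
# L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1), and grad L = 0 reads:
#   2*x1       + lam = 0
#         2*x2 + lam = 0
#   x1 + x2          = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # -> 0.5 0.5 -1.0

# Hessian of L in x is 2*I, positive definite, so (0.5, 0.5) is a minimum.
```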


More information

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0, Name: Solutions to Practice Final. Consider the line r(t) = 3 + t, t, 6. (a) Find symmetric equations for this line. (b) Find the point where the first line r(t) intersects the surface z = x + y. (a) We

More information

1 3 4 = 8i + 20j 13k. x + w. y + w

1 3 4 = 8i + 20j 13k. x + w. y + w ) Find the point of intersection of the lines x = t +, y = 3t + 4, z = 4t + 5, and x = 6s + 3, y = 5s +, z = 4s + 9, and then find the plane containing these two lines. Solution. Solve the system of equations

More information

November 16, 2015. Interpolation, Extrapolation & Polynomial Approximation

November 16, 2015. Interpolation, Extrapolation & Polynomial Approximation Interpolation, Extrapolation & Polynomial Approximation November 16, 2015 Introduction In many cases we know the values of a function f (x) at a set of points x 1, x 2,..., x N, but we don t have the analytic

More information

Zeros of a Polynomial Function

Zeros of a Polynomial Function Zeros of a Polynomial Function An important consequence of the Factor Theorem is that finding the zeros of a polynomial is really the same thing as factoring it into linear factors. In this section we

More information

with functions, expressions and equations which follow in units 3 and 4.

with functions, expressions and equations which follow in units 3 and 4. Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

MATH 590: Meshfree Methods

MATH 590: Meshfree Methods MATH 590: Meshfree Methods Chapter 7: Conditionally Positive Definite Functions Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu MATH 590 Chapter

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information