Nonlinear Optimization: Algorithms 3: Interior-point methods




Nonlinear Optimization: Algorithms 3: Interior-point methods
INSEAD, Spring 2006
Jean-Philippe Vert, Ecole des Mines de Paris (Jean-Philippe.Vert@mines.org)

Outline
- Inequality constrained minimization
- Logarithmic barrier function and central path
- Barrier method
- Feasibility and phase I methods

Inequality constrained minimization

Setting

We consider the problem:

    minimize   f(x)
    subject to g_i(x) ≤ 0, i = 1,...,m,
               Ax = b,

where f and the g_i are assumed to be convex and twice continuously differentiable, and A is a p × n matrix of rank p < n (i.e., fewer equality constraints than variables, and independent equality constraints). We assume the optimal value f* is finite and attained at x*.

Strong duality hypothesis

We finally assume the problem is strictly feasible, i.e., there exists x with g_i(x) < 0, i = 1,...,m, and Ax = b. This means that Slater's constraint qualification holds, so strong duality holds and the dual optimum is attained: there exist λ* ∈ R^p and µ* ∈ R^m which together with x* satisfy the KKT conditions:

    ∇f(x*) + Σ_{i=1}^m µ*_i ∇g_i(x*) + Aᵀλ* = 0,
    Ax* = b,
    g_i(x*) ≤ 0, i = 1,...,m,
    µ* ≥ 0,
    µ*_i g_i(x*) = 0, i = 1,...,m.
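As a sanity check, the KKT conditions can be verified numerically on a toy problem (my own example, not from the slides): minimize f(x) = x² subject to g(x) = 1 − x ≤ 0, whose solution is x* = 1 with multiplier µ* = 2.

```python
# Toy example (not from the slides): minimize f(x) = x^2 subject to
# g(x) = 1 - x <= 0.  The optimum is x* = 1 with multiplier mu* = 2,
# since stationarity requires f'(x*) + mu* g'(x*) = 2x* - mu* = 0.

def kkt_residuals(x, mu):
    """Return the four KKT residuals for this toy problem."""
    grad_f = 2.0 * x          # f'(x)
    g = 1.0 - x               # inequality constraint value
    grad_g = -1.0             # g'(x)
    return {
        "stationarity": grad_f + mu * grad_g,  # should be 0
        "primal_feasibility": max(g, 0.0),     # violation of g(x) <= 0
        "dual_feasibility": max(-mu, 0.0),     # violation of mu >= 0
        "complementarity": mu * g,             # mu * g(x) = 0
    }

res = kkt_residuals(x=1.0, mu=2.0)
print(res)  # all four residuals are 0 at the KKT point
```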

Examples

Many problems satisfy these conditions, e.g., LP, QP, QCQP, and entropy maximization with linear inequality constraints:

    minimize   Σ_{i=1}^n x_i log x_i
    subject to Fx ≤ g,
               Ax = b.

Examples (cont.)

To obtain differentiability of the objective and constraints we might reformulate the problem. For example,

    minimize   max_{i=1,...,m} (a_iᵀx + b_i)

with nondifferentiable objective is equivalent to the LP:

    minimize   t
    subject to a_iᵀx + b_i ≤ t, i = 1,...,m,
               Ax = b.

Overview

Interior-point methods solve the problem (or the KKT conditions) by applying Newton's method to a sequence of equality-constrained problems. They form another level in the hierarchy of convex optimization algorithms:
- Linear equality constrained quadratic problems (LCQP) are the simplest: they reduce to a set of linear equations that can be solved analytically.
- Newton's method reduces linear equality constrained convex optimization problems (LCCP) with twice differentiable objective to a sequence of LCQPs.
- Interior-point methods reduce a problem with linear equality and inequality constraints to a sequence of LCCPs.

Logarithmic barrier function and central path

Problem reformulation

Our goal is to approximately formulate the inequality constrained problem as an equality constrained problem to which Newton's method can be applied. To this end we first make the inequality constraints implicit in the objective:

    minimize   f(x) + Σ_{i=1}^m I_(g_i(x))
    subject to Ax = b,

where I_ : R → R ∪ {+∞} is the indicator function for nonpositive reals:

    I_(u) = 0   if u ≤ 0,
    I_(u) = +∞  if u > 0.

Logarithmic barrier

The basic idea of the barrier method is to approximate the indicator function I_ by the convex and differentiable function

    Î_(u) = −(1/t) log(−u),  u < 0,

where t > 0 is a parameter that sets the accuracy of the approximation.
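A quick numerical illustration (my own, not from the slides): for a fixed strictly feasible u < 0, the barrier value −(1/t) log(−u) shrinks toward I_(u) = 0 as t grows, while for any fixed t it blows up as u → 0⁻, mimicking the +∞ branch of the indicator.

```python
import math

def barrier(u, t):
    """Log-barrier approximation -(1/t)*log(-u) of the indicator of u <= 0.

    Only defined on the strictly feasible region u < 0.
    """
    return -(1.0 / t) * math.log(-u)

# For a fixed strictly feasible point, the approximation tends to 0 as t grows:
for t in (1, 10, 100, 1000):
    print(t, barrier(-0.5, t))

# For fixed t, the barrier grows without bound as u approaches 0 from below:
print(barrier(-1e-9, 1))
```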

Problem reformulation

Substituting Î_ for I_ in the optimization problem gives the approximation:

    minimize   f(x) − Σ_{i=1}^m (1/t) log(−g_i(x))
    subject to Ax = b.

The objective function of this problem is convex and twice differentiable, so Newton's method can be used to solve it. Of course this problem is just an approximation of the original problem. We will see that the quality of the approximation increases as t increases.

Logarithmic barrier function

The function

    φ(x) = −Σ_{i=1}^m log(−g_i(x))

is called the logarithmic barrier or log barrier for the original optimization problem. Its domain is the set of points that satisfy all inequality constraints strictly, and it grows without bound if g_i(x) → 0⁻ for any i. Its gradient and Hessian are given by:

    ∇φ(x) = Σ_{i=1}^m (1/(−g_i(x))) ∇g_i(x),
    ∇²φ(x) = Σ_{i=1}^m (1/g_i(x)²) ∇g_i(x) ∇g_i(x)ᵀ + Σ_{i=1}^m (1/(−g_i(x))) ∇²g_i(x).

Central path

Our approximate problem is therefore (equivalent to) the following problem:

    minimize   t f(x) + φ(x)
    subject to Ax = b.

We assume for now that this problem can be solved via Newton's method, and in particular that it has a unique solution x*(t) for each t > 0. The central path is the set of solutions:

    {x*(t) : t > 0}.

Characterization of the central path

A point x*(t) is on the central path if and only if:
- it is strictly feasible, i.e., it satisfies

      Ax*(t) = b,  g_i(x*(t)) < 0, i = 1,...,m;

- there exists a λ̂ ∈ R^p such that

      0 = t ∇f(x*(t)) + ∇φ(x*(t)) + Aᵀλ̂
        = t ∇f(x*(t)) + Σ_{i=1}^m (1/(−g_i(x*(t)))) ∇g_i(x*(t)) + Aᵀλ̂.

Example: LP central path

The log barrier for the LP

    minimize   cᵀx
    subject to Ax ≤ b

is given by

    φ(x) = −Σ_{i=1}^m log(b_i − a_iᵀx),

where a_i is the ith row of A. Its derivatives are:

    ∇φ(x) = Σ_{i=1}^m (1/(b_i − a_iᵀx)) a_i,
    ∇²φ(x) = Σ_{i=1}^m (1/(b_i − a_iᵀx)²) a_i a_iᵀ.

Example (cont.)

The derivatives can be rewritten more compactly:

    ∇φ(x) = Aᵀd,  ∇²φ(x) = Aᵀ diag(d)² A,

where d ∈ R^m is defined by d_i = 1/(b_i − a_iᵀx). The centrality condition for x*(t) is:

    tc + Aᵀd = 0,

so at each point on the central path, ∇φ(x) is parallel to −c.
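These compact formulas are easy to check numerically. A minimal numpy sketch (the helper name is mine) computing ∇φ(x) = Aᵀd and ∇²φ(x) = Aᵀ diag(d)² A, and comparing the gradient against central finite differences:

```python
import numpy as np

def lp_barrier_grad_hess(A, b, x):
    """Gradient and Hessian of phi(x) = -sum_i log(b_i - a_i^T x), compact form."""
    d = 1.0 / (b - A @ x)            # requires strict feasibility: b - Ax > 0
    grad = A.T @ d                   # A^T d
    hess = A.T @ np.diag(d ** 2) @ A # A^T diag(d)^2 A
    return grad, hess

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = np.ones(5)
x = np.zeros(3)                      # strictly feasible: b - Ax = 1 > 0

grad, hess = lp_barrier_grad_hess(A, b, x)

# Finite-difference check of the gradient:
phi = lambda z: -np.sum(np.log(b - A @ z))
eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.max(np.abs(grad - fd)))     # agreement up to finite-difference error
```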

Dual points on central path

Remember that x = x*(t) if it is feasible and there exists a λ̂ such that

    t ∇f(x) + Σ_{i=1}^m (1/(−g_i(x))) ∇g_i(x) + Aᵀλ̂ = 0,  Ax = b.

Let us now define:

    µ*_i(t) = −1/(t g_i(x*(t))), i = 1,...,m,  λ*(t) = λ̂/t.

We claim that the pair (µ*(t), λ*(t)) is dual feasible.

Dual points on central path (cont.)

Indeed:
- µ*(t) > 0 because g_i(x*(t)) < 0;
- x*(t) minimizes the Lagrangian

      L(x, λ*(t), µ*(t)) = f(x) + Σ_{i=1}^m µ*_i(t) g_i(x) + λ*(t)ᵀ(Ax − b).

Therefore the dual function q(µ*(t), λ*(t)) is finite and:

    q(µ*(t), λ*(t)) = L(x*(t), λ*(t), µ*(t)) = f(x*(t)) − m/t.

Convergence of the central path

From the equation

    q(µ*(t), λ*(t)) = f(x*(t)) − m/t

we deduce that the duality gap associated with x*(t) and the dual feasible pair (µ*(t), λ*(t)) is simply m/t. As an important consequence we have:

    f(x*(t)) − f* ≤ m/t.

This confirms the intuition that f(x*(t)) → f* as t → ∞.
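To make the m/t gap concrete, here is a toy one-variable example (my own, not from the slides): minimize f(x) = x subject to −x ≤ 0, so f* = 0. The centering objective t x − log x is minimized where t − 1/x = 0, i.e. x*(t) = 1/t, and with m = 1 constraint the suboptimality f(x*(t)) − f* equals m/t exactly.

```python
# Toy example (not from the slides): minimize f(x) = x subject to
# g(x) = -x <= 0, so f* = 0.  Centering minimizes t*x - log(x), whose
# stationarity condition t - 1/x = 0 gives x*(t) = 1/t in closed form.

def central_path_point(t):
    """Closed-form central-path point for the toy problem (t > 0)."""
    return 1.0 / t

for t in (1, 10, 100, 1000):
    x = central_path_point(t)
    print(t, x, x - 0.0)   # suboptimality f(x*(t)) - f* = m/t = 1/t
```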

Interpretation via KKT conditions

We can rewrite the conditions for x to be on the central path as the existence of λ, µ such that:
1. Primal constraints: g_i(x) ≤ 0, Ax = b.
2. Dual constraints: µ ≥ 0.
3. Approximate complementary slackness: −µ_i g_i(x) = 1/t.
4. The gradient of the Lagrangian with respect to x vanishes:

      ∇f(x) + Σ_{i=1}^m µ_i ∇g_i(x) + Aᵀλ = 0.

The only difference with the KKT conditions is that 0 is replaced by 1/t in condition 3. For large t, a point on the central path almost satisfies the KKT conditions.

The barrier method

Motivations

We have seen that the point x*(t) is m/t-suboptimal. In order to solve the optimization problem with a guaranteed accuracy ɛ > 0, it therefore suffices to take t = m/ɛ and solve the equality constrained problem

    minimize   (m/ɛ) f(x) + φ(x)
    subject to Ax = b

by Newton's method. However, this only works for small problems, good starting points and moderate accuracy. It is rarely, if ever, used.

Barrier method

given strictly feasible x, t := t⁽⁰⁾ > 0, µ > 1, tolerance ɛ > 0.
repeat:
1. Centering step: compute x*(t) by minimizing tf + φ subject to Ax = b.
2. Update: x := x*(t).
3. Stopping criterion: quit if m/t < ɛ.
4. Increase t: t := µt.
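The loop above can be sketched in code. Below is a minimal numpy sketch for an LP with inequality constraints only (minimize cᵀx subject to Ax ≤ b), so that the centering step is plain unconstrained damped Newton; the function name, default parameters and tolerances are my own choices, not from the slides.

```python
import numpy as np

def barrier_method_lp(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method sketch for: minimize c^T x subject to Ax <= b.

    x0 must be strictly feasible (A @ x0 < b).  With no equality
    constraints, centering is damped Newton on t*c^T x + phi(x),
    where phi(x) = -sum_i log(b_i - a_i^T x).
    """
    x, t, m = np.asarray(x0, dtype=float), t0, len(b)
    while True:
        # 1. Centering step: Newton's method, warm-started at the current x.
        for _ in range(100):
            d = 1.0 / (b - A @ x)                 # slacks stay > 0 by feasibility
            grad = t * c + A.T @ d
            hess = A.T @ np.diag(d ** 2) @ A
            dx = np.linalg.solve(hess, -grad)
            lam2 = -grad @ dx                     # squared Newton decrement
            if lam2 / 2.0 <= 1e-10:
                break
            s = 1.0
            while np.any(A @ (x + s * dx) >= b):  # backtrack to stay feasible
                s *= 0.5
            f = lambda z: t * (c @ z) - np.sum(np.log(b - A @ z))
            while f(x + s * dx) > f(x) - 0.25 * s * lam2:  # Armijo backtracking
                s *= 0.5
            x = x + s * dx                        # 2. update
        if m / t < eps:                           # 3. stopping criterion
            return x
        t *= mu                                   # 4. increase t

# Usage: minimize x1 + x2 subject to x >= 0 (optimum at the origin).
x_opt = barrier_method_lp(np.array([1.0, 1.0]), -np.eye(2),
                          np.zeros(2), np.array([1.0, 1.0]))
print(x_opt)  # both coordinates close to 0
```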

Barrier method: centering

Centering is usually done with Newton's method, starting at the current x. Inexact centering is possible, since the goal is only to obtain a sequence of points x⁽ᵏ⁾ that converges to an optimal point. In practice, however, the cost of computing an extremely accurate minimizer of tf + φ, as compared to the cost of computing a good minimizer, is only marginal.

Barrier method: choice of µ

The choice of µ involves a trade-off:
- For small µ, the initial point of each Newton process is good and few Newton iterations are required; however, many outer loops (updates of t) are required.
- For large µ, many Newton steps are required after each update of t, since the initial point is probably not very good; however, few outer loops are required.

In practice µ = 10–20 works well.

Barrier method: choice of t⁽⁰⁾

The choice of t⁽⁰⁾ involves a simple trade-off:
- if t⁽⁰⁾ is chosen too large, the first outer iteration will require too many Newton iterations;
- if t⁽⁰⁾ is chosen too small, the algorithm will require extra outer iterations.

Several heuristics exist for this choice.

Example: LP in inequality form

- m = 100 inequalities, n = 50 variables
- start with x on the central path (t⁽⁰⁾ = 1, duality gap 100), terminate when t = 10⁸ (gap 10⁻⁶)
- centering uses Newton's method with backtracking
- total number of Newton iterations not very sensitive for µ > 10

Example: a family of standard LPs

    minimize   cᵀx
    subject to Ax = b, x ≥ 0,

for A ∈ R^{m×2m}. Tests for m = 10,...,1000 show that the number of iterations grows very slowly as m ranges over a 100:1 ratio.

Feasibility and phase I methods

The barrier method requires a strictly feasible starting point x⁽⁰⁾:

    g_i(x⁽⁰⁾) < 0, i = 1,...,m,  Ax⁽⁰⁾ = b.

When such a point is not known, the barrier method is preceded by a preliminary stage, called phase I, in which a strictly feasible point is computed.

Basic phase I method

Consider the problem:

    minimize   s
    subject to g_i(x) ≤ s, i = 1,...,m,
               Ax = b.

- This problem is always strictly feasible (choose any x with Ax = b, and s large enough).
- Apply the barrier method to this phase I optimization problem.
- If (x̄, s̄) is feasible with s̄ < 0, then x̄ is strictly feasible for the initial problem.
- If the optimal value of the phase I problem is positive, then the original problem is infeasible.
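A small numpy sketch (the names are my own) of the first bullet, for LP inequality constraints g_i(x) = a_iᵀx − b_i: any x becomes part of a strictly feasible phase I point (x, s) as soon as s exceeds max_i g_i(x). (Equality constraints Ax = b, if present, are kept as-is, so x should already satisfy them.)

```python
import numpy as np

def phase_one_start(A, b, x):
    """Strictly feasible starting point (x, s) for the phase I problem
    with LP inequality constraints g_i(x) = a_i^T x - b_i <= s."""
    g = A @ x - b                 # constraint values g_i(x)
    s = np.max(g) + 1.0           # then g_i(x) < s for every i
    return x, s

# Hypothetical constraint data for illustration:
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([-1.0, -1.0, 5.0])
x0, s0 = phase_one_start(A, b, np.zeros(2))
print(s0, np.all(A @ x0 - b < s0))  # (x0, s0) is strictly feasible for phase I
```

Minimizing s from this starting point then answers the feasibility question: an optimal value below 0 yields a strictly feasible point for the original problem.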

Primal-dual interior-point methods

A variant of the barrier method, more efficient when high accuracy is needed:
- update primal and dual variables at each iteration: no distinction between inner and outer iterations
- often exhibit superlinear asymptotic convergence
- search directions can be interpreted as Newton directions for modified KKT conditions
- can start at infeasible points
- cost per iteration same as the barrier method