Summer Course on Convex Optimization
Fifth Lecture: Interior-Point Methods (1)
Michel Baes, K.U.Leuven
Bharath Rangarajan, U.Minnesota
Interior-Point Methods: the rebirth of an old idea

Suppose that $f$ is convex and $g_1, \dots, g_m$ are concave:

$\min f(x)$ s.t. $g_i(x) \ge 0$, $1 \le i \le m$, $x \in X$.

We want to solve this by Newton's method, but constraints are difficult to handle with it. Idea: put them in the objective,

$\min f(x) + \mu \sum_{i=1}^m \varphi(g_i(x))$,

where $\varphi$ is convex, nonincreasing, and $\varphi(t) \to +\infty$ as $t \to 0^+$. Then solve it for various $\mu > 0$ (a sketch follows below). $\Phi(x) := \sum_{i=1}^m \varphi(g_i(x))$ is a barrier: $\Phi$ is convex, and $\Phi(x) \to +\infty$ as $x$ approaches the boundary of $X$.

Problems: tiny convergence zone, numerical problems when $\mu \to 0$.
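A minimal numerical sketch of this classical barrier scheme, on a toy one-dimensional problem of my own choosing (not from the slides): minimize $f(x) = x^2$ subject to $g(x) = x - 1 \ge 0$, with $\varphi(t) = -\ln t$; the safeguarded Newton inner loop and the halving of $\mu$ are illustrative choices.

```python
# Barrier scheme on F_mu(x) = x^2 - mu*ln(x - 1), domain x > 1.
def dF(x, mu):   return 2*x - mu/(x - 1)        # F_mu'(x)
def d2F(x, mu):  return 2 + mu/(x - 1)**2       # F_mu''(x) > 0, so F_mu is convex

x, mu = 2.0, 1.0
while mu > 1e-8:
    for _ in range(50):                 # inner Newton loop on F_mu
        step = dF(x, mu) / d2F(x, mu)
        while x - step <= 1:            # crude safeguard: stay in the domain x > 1
            step /= 2
        x -= step
        if abs(step) < 1e-12:
            break
    mu *= 0.5                           # shrink the barrier weight
print(x)   # -> approaches the constrained minimizer x* = 1 as mu -> 0
```

Note how the Hessian term $\mu/(x-1)^2$ becomes ill-conditioned as $\mu \to 0$ and the minimizer hugs the boundary: exactly the numerical problems mentioned above.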
Interior-Point Methods: the rebirth of an old idea

1969-1984: reign of Augmented Lagrangian methods. In the language of yesterday's lecture, these methods work with a family $F$ of penalty functions parametrized by $\mu > 0$.

1984: Narendra Karmarkar creates a new polynomial-time algorithm for Linear Programming. People realized it fits Fiacco and McCormick's framework.
The blooming of Interior-Point Methods

1988: Yurii Nesterov and Arkadii Nemirovski generalize interior-point methods to Convex Optimization. Nonlinearity is not an issue anymore. Largest nonlinear optimization problem solved: $10^9$ variables, $3 \cdot 10^8$ constraints (Gondzio, 2006).

1992: Yurii Nesterov and Mike Todd define efficient algorithms for Semidefinite Optimization. A new way of modelling appears, with applications in Mechanics, Control, Finance, Structural Design, ... (Boyd, Vandenberghe).
Something odd in Black-Box methods for convex programming

How do Black-Box methods deal with convexity?
- First, you realize that your problem is convex (or even strongly convex); thus, you investigate its global properties.
- Then you hide your problem in a mysterious black box.
- You only interact with it through an oracle that gives you local information: if $x$ is the current point, it returns $f(x)$, and/or $\nabla f(x)$, and/or $\nabla^2 f(x)$, ...

They act as if they didn't know the problem is convex.
By the way, how do you check convexity?

- Directly from the definition. Try this one: for $x > 0$,
  $f(x) := \max\{\exp(\|x\|_2^2),\ \lambda_{\max}(\sum_{i=1}^n x_i A_i)\} - \ln(x_1) + 5x_n^4$
  (the $A_i$ being symmetric matrices).
- By using the structure of the function. You know several simple convex functions: $t^2$, $\exp(t)$, ..., and several operations that preserve convexity: max, +, ...

And after all this work, you give this beautiful structure to a Black-Box method that doesn't care about it! Interior-point methods, by contrast, explicitly use this structure to construct a barrier for the feasible set (see below).
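Convexity can also be sanity-checked numerically. The sketch below is my own illustration, with the arbitrary choice $A_i = e_i e_i^\top$ (so that $\lambda_{\max}(\sum_i x_i A_i) = \max_i x_i$): it samples random segments and tests midpoint convexity. Such a test can only refute convexity, never prove it; the structural calculus above is what proves it.

```python
import numpy as np

# Midpoint-convexity sampling test: can refute convexity, never prove it.
def f(x):
    # A_i = e_i e_i^T here, so lambda_max(sum_i x_i A_i) = max_i x_i.
    return max(np.exp(x @ x), x.max()) - np.log(x[0]) + 5 * x[-1]**4

def looks_convex(f, dim, n_trials=10_000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        u = rng.uniform(0.1, 2.0, dim)      # stay in the domain x > 0
        v = rng.uniform(0.1, 2.0, dim)
        if f((u + v) / 2) > (f(u) + f(v)) / 2 + 1e-9:
            return False                    # found a counterexample
    return True

print(looks_convex(f, dim=3))               # True: no counterexample found
```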
Newton's Method under scrutiny

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We use $\|\cdot\|$ for the Euclidean norm and its induced matrix norm.

Theorem 1 (Kantorovich). Suppose that the function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- there exists $M > 0$ such that $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M \|x - y\|$ for all $x, y$,
- $\nabla^2 f(x^*) \succeq l I \succ 0$.
Then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and

$\|x_{k+1} - x^*\| \le \dfrac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}$.
Newton's Method under scrutiny: Kantorovich's proof

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We have (using $\nabla f(x^*) = 0$):

$x_{k+1} - x^* = x_k - x^* - \nabla^2 f(x_k)^{-1} \nabla f(x_k) = \nabla^2 f(x_k)^{-1} \int_0^1 [\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))](x_k - x^*)\, dt$.

Hence, with $r_k := \|x_k - x^*\|$,

$r_{k+1} \le \|\nabla^2 f(x_k)^{-1}\| \int_0^1 \|\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))\|\, dt \cdot r_k \le \|\nabla^2 f(x_k)^{-1}\| \dfrac{M}{2} r_k^2 \le \dfrac{M r_k^2}{2(l - M r_k)}$,

because $\nabla^2 f(x_k) \succeq \nabla^2 f(x^*) - M r_k I_n \succeq (l - M r_k) I_n$.

Note that $r_{k+1} < r_k$ when $r_k < 2l/(3M)$, because $r_{k+1} \le \dfrac{M r_k^2}{2(l - M r_k)} < r_k$.
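A quick numerical illustration of the quadratic error recursion, on a toy instance of my own (not from the lecture): for $f(x) = \sum_i (e^{x_i} - x_i)$ we have $\nabla f(x) = e^x - 1$, $\nabla^2 f(x) = \mathrm{diag}(e^x)$, and $x^* = 0$.

```python
import numpy as np

# Pure Newton steps on f(x) = sum_i (exp(x_i) - x_i); minimizer x* = 0.
x = np.full(3, 1.0)
for k in range(6):
    grad = np.exp(x) - 1.0
    hess = np.diag(np.exp(x))
    x = x - np.linalg.solve(hess, grad)   # Newton step
    print(k, np.linalg.norm(x))           # the error roughly squares each step
```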
Kantorovich's result is very strange

The iterates of Newton's Method are affine invariant.

Proof: Consider an invertible $A$, $\varphi(y) := f(Ay)$, and $y_0 = A^{-1} x_0$. Note that

$\langle \nabla \varphi(y_0), h\rangle = \lim_{t \to 0} \dfrac{f(Ay_0 + tAh) - f(Ay_0)}{t} = \langle \nabla f(Ay_0), Ah\rangle$,

so $\nabla \varphi(y_0) = A^\top \nabla f(Ay_0)$. Similarly, $\nabla^2 \varphi(y_0) = A^\top \nabla^2 f(Ay_0) A$. Hence

$y_1 = y_0 - \nabla^2 \varphi(y_0)^{-1} \nabla \varphi(y_0) = A^{-1}\big(x_0 - \nabla^2 f(x_0)^{-1} \nabla f(x_0)\big) = A^{-1} x_1$.

However, the convergence zone $\|x_0 - x^*\| < 2l/(3M)$ is not affine invariant!
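The invariance can be checked numerically; a small sketch of my own, reusing the toy $f$ from the previous example, comparing one Newton step on $f$ with one on $\varphi(y) = f(Ay)$:

```python
import numpy as np

# One Newton step on f and on phi(y) = f(Ay) give iterates related by A.
grad = lambda x: np.exp(x) - 1.0            # f(x) = sum_i (exp(x_i) - x_i)
hess = lambda x: np.diag(np.exp(x))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # some invertible matrix

x0 = np.array([1.0, -0.5, 0.3])
x1 = x0 - np.linalg.solve(hess(x0), grad(x0))     # Newton step for f

y0 = np.linalg.solve(A, x0)                       # y0 = A^{-1} x0
g = A.T @ grad(A @ y0)                            # grad phi(y0) = A^T grad f(A y0)
H = A.T @ hess(A @ y0) @ A                        # hess phi(y0) = A^T hess f(A y0) A
y1 = y0 - np.linalg.solve(H, g)                   # Newton step for phi

print(np.allclose(A @ y1, x1))                    # True: y1 = A^{-1} x1
```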
What's wrong with the assumptions?

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We use $\|\cdot\|$ for the Euclidean norm and its induced matrix norm.

Theorem 1 (Kantorovich). Suppose that the function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- there exists $M > 0$ such that $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M \|x - y\|$ for all $x, y$,
- $\nabla^2 f(x^*) \succeq l I \succ 0$.
Then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and

$\|x_{k+1} - x^*\| \le \dfrac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}$.

Both the Lipschitz condition on $\nabla^2 f$ and the convergence zone are stated in the Euclidean norm, which is not affine invariant.
Nesterov and Nemirovski's solution

Instead of using the Euclidean norm, use a local norm: $\|u\|_x := \langle \nabla^2 f(x) u, u\rangle^{1/2}$.

This norm is affine invariant. Let $\varphi(y) := f(Ay)$, $x = Ay$, and $v = A^{-1} u$. We have

$\langle \nabla^2 \varphi(y) v, v\rangle = \langle (A^\top \nabla^2 f(x) A)(A^{-1}u), A^{-1}u\rangle = \langle \nabla^2 f(x) u, u\rangle$.

The property $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M\|x - y\|$ for all $x, y$ should then be replaced by:

$|\nabla^3 f(x)[h, h, y - x]| \le M \|h\|_x^2 \|y - x\|_x$ for all $x, y, h$.
Self-concordance: one of the two big properties

There exists $M > 0$ for which:
- $|\nabla^3 f(x)[h, h, y - x]| \le M \|h\|_x^2 \|y - x\|_x$ for all $x, y, h$;
- equivalently, $|\nabla^3 f(x)[h, h, h]| \le M \|h\|_x^3$ for all $x, h$;
- after rescaling, one can take $M = 2$: $|\nabla^3 f(x)[h, h, h]| \le 2 \|h\|_x^3$ for all $x, h$.

Such functions are called self-concordant. Examples (check it as an exercise):
- $-\ln(t)$ (domain: $\mathbb{R}_{++}$)
- $-\ln\det(X)$ (domain: $\mathbb{S}^n_{++}$)
- $-\ln(t^2 - \|x\|_2^2)$ (domain: the ice-cream cone)
- $-\ln(t) - \ln(\ln(t) - x)$ (domain: $\{(t, x) : t \ge \exp(x)\}$)
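For the first example, the exercise is a one-line computation: $f(t) = -\ln t$ gives $f''(t) = 1/t^2$ and $f'''(t) = -2/t^3$, so $|f'''(t)| = 2 f''(t)^{3/2}$ exactly. A quick numerical confirmation:

```python
import numpy as np

# f(t) = -ln(t): |f'''(t)| = 2/t^3 equals 2*(f''(t))^{3/2} = 2*(1/t^2)^{3/2}.
t = np.linspace(0.1, 10.0, 100)
print(np.allclose(2.0 / t**3, 2.0 * (1.0 / t**2)**1.5))   # True: M = 2 is tight
```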
Self-concordant functions: the right thing for Newton's method

These functions have MANY properties, among which: for every $x \in \mathrm{dom}\, f$, the ball $\{y : \|y - x\|_x < 1\}$ is contained in $\mathrm{dom}\, f$ (interesting for Karmarkar's method).

Proof: Let $x \in \mathrm{dom}(f)$, $h \in \mathbb{R}^n$, and

$\varphi(t) := \dfrac{1}{\|h\|_{x+th}} = \langle \nabla^2 f(x + th) h, h\rangle^{-1/2}$.

Then $\varphi(t) \to 0^+$ when $x + th$ approaches the boundary of $\mathrm{dom}(f)$, because the Hessian blows up there. As long as $\varphi(t) > 0$, $x + th \in \mathrm{dom}(f)$. Now

$\varphi'(t) = -\dfrac{\nabla^3 f(x + th)[h, h, h]}{2 \langle \nabla^2 f(x + th) h, h\rangle^{3/2}}$,

so $|\varphi'(t)| \le 1$ by self-concordance, and $\varphi(t) > 0$ for $-\varphi(0) < t < \varphi(0)$, i.e. $x \pm \varphi(0) h \in \mathrm{dom}(f)$, i.e. $x \pm h/\|h\|_x \in \mathrm{dom}(f)$.
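A numerical illustration of my own for $f(x) = -\sum_i \ln(x_i)$ on $\mathbb{R}^n_{++}$, where $\|h\|_x^2 = \sum_i h_i^2/x_i^2$: every point of the open unit ball $\{y : \|y - x\|_x < 1\}$ has positive coordinates.

```python
import numpy as np

# Dikin ellipsoid of f(x) = -sum_i ln(x_i): ||y - x||_x < 1 implies
# |y_i - x_i| < x_i for every i, hence y stays in the domain x > 0.
rng = np.random.default_rng(2)
x = rng.uniform(0.5, 3.0, 4)
H = np.diag(1.0 / x**2)                        # Hessian of the barrier at x
for _ in range(10_000):
    u = rng.standard_normal(4)
    y = x + 0.999 * u / np.sqrt(u @ H @ u)     # a point with ||y - x||_x = 0.999
    assert (y > 0).all()                       # ... and it is always feasible
print("all sampled points lie inside the domain")
```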
Self-concordant functions: the right thing for Newton's method

These functions have MANY properties, among which:
- If $\|\nabla f(x)\|_x^* := \langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x)\rangle^{1/2} \le \dfrac{3 - \sqrt{5}}{2}$, then $x$ is in the quadratic convergence zone (an automatic test, no $x^*$ needed).
- The following damped method ALWAYS converges (see the sketch after this slide):

$x_{k+1} = x_k - \dfrac{\nabla^2 f(x_k)^{-1} \nabla f(x_k)}{1 + \|\nabla f(x_k)\|_{x_k}^*}$

Exercise: the dual norm $\|h\|_x^*$ is $\langle \nabla^2 f(x)^{-1} h, h\rangle^{1/2}$.
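A sketch of the damped scheme on a toy self-concordant function of my choosing, $f(x) = \langle c, x\rangle - \sum_i \ln(x_i)$, whose minimizer is $x_i^* = 1/c_i$; note the iterates never leave the domain, without any line search.

```python
import numpy as np

# Damped Newton on f(x) = <c,x> - sum_i ln(x_i)  (self-concordant on x > 0).
c = np.array([1.0, 2.0, 4.0])
x = np.full(3, 5.0)                     # any starting point with x > 0
for k in range(30):
    g = c - 1.0 / x                     # gradient
    Hinv_g = x**2 * g                   # the Hessian is diag(1/x^2)
    lam = np.sqrt(g @ Hinv_g)           # lambda = ||grad f(x)||_x^*
    x = x - Hinv_g / (1.0 + lam)        # damped step keeps x > 0
print(x, 1.0 / c)                       # -> converges to the minimizer 1/c
```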
Finally! Interior-point methods

Main idea: formulate your problem in its conic form, and use as barrier for your cone a self-concordant function $f$:

$\min \langle c, x\rangle$ s.t. $Ax = b$, $x \in K$ becomes $\min \langle c, x\rangle + \mu f(x)$ s.t. $Ax = b$.

The set of minimizers $x(\mu)$ is called the primal central path, and $x(\mu) \to x^*$ when $\mu \to 0$. For instance, for $\min\{cx - \mu \ln x : x > 0\}$ with $c > 0$, we get $x(\mu) = \mu/c \to 0$, the solution of $\min\{cx : x \ge 0\}$.

But wait: is $\langle c, x\rangle + \mu f(x)$ a self-concordant function? Yes! Adding a linear term changes neither the Hessian nor the third derivative.
How to decrease µ?

Main goal: we want to decrease it linearly: $\mu^+ = (1 - \theta)\mu$.

Main idea: use our knowledge of the quadratic convergence zone. Current point: $x(\mu)$. Target: $x(\mu^+)$. We have $c + \mu \nabla f(x(\mu)) = 0$, and we want

$\|c + (1 - \theta)\mu \nabla f(x(\mu))\|_{x(\mu)}^* < \dfrac{3 - \sqrt{5}}{2}$, i.e. $\theta \mu \|\nabla f(x(\mu))\|_{x(\mu)}^* < \dfrac{3 - \sqrt{5}}{2}$,

hence we would like to have a bound for $\|\nabla f(x)\|_x^*$.

Note: this bound is responsible for the complexity. The smaller it is, the bigger the decrease $\theta$ can be.
The two crucial properties of barriers

- Self-concordance: $|\nabla^3 f(x)[h, h, h]| \le 2 \|h\|_x^3$ for all $x \in \mathrm{dom}\, f$ and all $h$.
- Bound for $\|\nabla f(x)\|_x^*$: $\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x)\rangle \le \nu$ for all $x \in \mathrm{dom}\, f$.

These functions are called $\nu$-self-concordant barriers. The theoretical complexity of the best IPMs is $O(\sqrt{\nu}\, \ln(C/\epsilon))$ Newton iterations (in practice, even better: $O(\ln(\nu/\epsilon))$).
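For $f(x) = -\sum_i \ln(x_i)$ the second quantity can be computed in closed form: $\nabla f(x) = -1/x$ and $\nabla^2 f(x)^{-1} = \mathrm{diag}(x^2)$, so $\langle \nabla^2 f(x)^{-1}\nabla f(x), \nabla f(x)\rangle = n$ at every $x$, and the bound $\nu = n$ is tight. A one-line check:

```python
import numpy as np

# <Hess^{-1} grad, grad> for f(x) = -sum_i ln(x_i) equals n at every x > 0.
x = np.random.default_rng(3).uniform(0.1, 5.0, 7)
g = -1.0 / x
print(np.dot(x**2 * g, g))   # -> 7.0, i.e. nu = n, independently of x
```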
An interior-point algorithm

Algorithm 1. Let $\epsilon > 0$, $\mu_0 > 0$ and $x_0$ feasible such that $\|\nabla f(x_0) + c/\mu_0\|_{x_0}^* \le \dfrac{3 - \sqrt{5}}{2}$.
Let $\theta := 1/(1.5 + 14.3\sqrt{\nu})$ and $k := 0$.
While $2.58\, \mu_k \nu \ge \epsilon$:
1. $\mu_{k+1} := \mu_k (1 - \theta)$
2. $x_{k+1} := x_k - \nabla^2 f(x_k)^{-1} \big(\nabla f(x_k) + c/\mu_{k+1}\big)$
3. Increment $k$.

Complexity upper bound: $(1.03 + 14.3\sqrt{\nu})\, \ln(2.58\, \mu_0 \nu / \epsilon)$ iterations.
Proof of the constants: PhD thesis of François Glineur.
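A compact sketch of Algorithm 1 on a toy instance of my own: an LP over the simplex with the log-barrier $f(x) = -\sum_i \ln(x_i)$, $\nu = n$. The equality constraint is handled by an explicit multiplier (the Hessian is diagonal), and the starting point $x_0 = \mathbf{1}/n$, $\mu_0 = 1$ is assumed close enough to the central path.

```python
import numpy as np

# Short-step barrier method for  min <c,x>  s.t.  sum(x) = 1, x >= 0,
# with f(x) = -sum_i ln(x_i), nu = n, and the constants of Algorithm 1.
n = 5
c = np.random.default_rng(0).uniform(1.0, 2.0, n)
nu, eps = n, 1e-6

x, mu = np.full(n, 1.0 / n), 1.0                # analytic center of the simplex
theta = 1.0 / (1.5 + 14.3 * np.sqrt(nu))
while 2.58 * mu * nu >= eps:
    mu *= 1.0 - theta
    g = c / mu - 1.0 / x                        # gradient of <c,x>/mu + f(x)
    Hinv = x**2                                 # inverse Hessian (diagonal)
    w = -(Hinv @ g) / Hinv.sum()                # multiplier enforcing sum(dx) = 0
    x = x - Hinv * (g + w)                      # one Newton step per mu-update
print(c @ x - c.min())                          # final optimality gap is O(eps)
```

With $\nu = 5$ this takes a few hundred cheap Newton steps; the aggressive $\mu$-updates mentioned on the "And in practice?" slide below are what make IPMs much faster in practice.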
How do you construct self-concordant barriers?

1- Basic barriers:

Domain | Barrier | Complexity parameter $\nu$
$\mathbb{R}_+$ | $-\ln(t)$ | 1
$\mathbb{S}^n_+$ | $-\ln\det(X)$ | $n$
$\mathrm{epi}\, \|\cdot\|_2$ | $-\ln(t^2 - \|x\|_2^2)$ | 2
$\mathrm{epi}\, \exp$ | $-\ln(t) - \ln(\ln(t) - x)$ | 2

2- Combining barriers:
- Let $f_1$ be a barrier for $K_1$ with parameter $\nu_1$, and $f_2$ be a barrier for $K_2$ with parameter $\nu_2$. Then $f_1 + f_2$ is a barrier for $K_1 \cap K_2$ with parameter $\nu_1 + \nu_2$; for instance, $-\sum_{i=1}^n \ln(x_i)$ is an $n$-barrier for $\mathbb{R}^n_+$.
- Let $f$ be a barrier for $K$ with parameter $\nu$. Then the conjugate $f^*(s) := \sup_{x \in \mathbb{R}^n} \{\langle s, x\rangle - f(x)\}$ is a barrier for $-K^*$ with parameter $\nu$ (so $s \mapsto f^*(-s)$ is a barrier for the dual cone $K^*$).
How do you construct self-concordant barriers?

1- Basic barriers: same table as on the previous slide.

2- Combining barriers:
- Let $f$ be a barrier for $K$ with parameter $\nu$. The restriction of $f$ to an affine subspace $S$ is a barrier for $S \cap K$ with parameter $\nu$.
What about primal-dual problems? Everything is the same:

Primal: $\min \langle c, x\rangle$ s.t. $Ax = b$, $x \in K$.
Dual: $\max \langle b, y\rangle$ s.t. $A^\top y + s = c$, $s \in K^*$.

Combining both: $\min \langle c, x\rangle - \langle b, y\rangle$ s.t. $Ax = b$, $A^\top y + s = c$, $x \in K$, $s \in K^*$.

(What's the optimal value? Zero, by strong duality; indeed, for feasible points, $\langle c, x\rangle - \langle b, y\rangle = \langle A^\top y + s, x\rangle - \langle y, Ax\rangle = \langle s, x\rangle \ge 0$.)

Hence the barrier problem: $\min \langle s, x\rangle + \mu(f(x) + f^*(s))$ s.t. $Ax = b$, $A^\top y + s = c$.
Strangely enough, primal-dual IPMs work very well

All IPM optimization software packages (SeDuMi, MOSEK, ...) are primal-dual. Efficient IPMs can solve:
- Linear problems,
- Second-order cone problems (ice-cream cone), in particular quadratic problems,
- Semidefinite problems,
- and (sometimes) geometric problems, i.e. involving posynomials (see Thursday's lecture),
because of the properties of their self-concordant barriers.
And in practice? Many speed-ups and tricks are used:
- for computing the starting point (and dealing with infeasible starting points),
- for solving the Newton system (reduction of variables),
- for updating $\mu$: decrease $\mu$ much faster than in the theory, then do several steps targeting the central path.

THOUSANDS of research papers deal with these questions.
Some references

[1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer, 2003.
[2] Y. Nesterov and A. Nemirovski, Interior-Point Polynomial Algorithms in Convex Programming, SIAM, 1994.
[3] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001.