Summer Course on Convex Optimization
Fifth Lecture: Interior-Point Methods (1)
Michel Baes, K.U.Leuven
Bharath Rangarajan, U.Minnesota
Interior-Point Methods: the rebirth of an old idea

Suppose that $f$ is convex and $g_1, \dots, g_m$ are concave:

$\min f(x)$ s.t. $g_i(x) \ge 0$, $1 \le i \le m$, $x \in X$.

We want to solve this by Newton's method, but constraints are difficult to handle with it. Idea: put them in the objective,

$\min f(x) + \mu \sum_{i=1}^m \varphi(g_i(x))$,

where $\varphi$ is convex, nonincreasing, and $\varphi(t) \to +\infty$ as $t \to 0^+$. Then solve it for various $\mu > 0$ (a sketch follows below). $\Phi(x) := \sum_{i=1}^m \varphi(g_i(x))$ is a barrier: $\Phi$ is convex, and $\Phi(x) \to +\infty$ as $x$ approaches the boundary of $X$.

Problems: tiny convergence zone, numerical problems when $\mu \to 0$.
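A minimal numerical sketch of this classical barrier scheme, on a toy one-dimensional problem of my own choosing (not from the slides): minimize $f(x) = x^2$ subject to $g(x) = x - 1 \ge 0$, with $\varphi(t) = -\ln t$; the safeguarded Newton inner loop and the halving of $\mu$ are illustrative choices.

```python
# Barrier scheme on F_mu(x) = x^2 - mu*ln(x - 1), domain x > 1.
def dF(x, mu):   return 2*x - mu/(x - 1)        # F_mu'(x)
def d2F(x, mu):  return 2 + mu/(x - 1)**2       # F_mu''(x) > 0, so F_mu is convex

x, mu = 2.0, 1.0
while mu > 1e-8:
    for _ in range(50):                 # inner Newton loop on F_mu
        step = dF(x, mu) / d2F(x, mu)
        while x - step <= 1:            # crude safeguard: stay in the domain x > 1
            step /= 2
        x -= step
        if abs(step) < 1e-12:
            break
    mu *= 0.5                           # shrink the barrier weight
print(x)   # -> approaches the constrained minimizer x* = 1 as mu -> 0
```

Note how the Hessian term $\mu/(x-1)^2$ becomes ill-conditioned as $\mu \to 0$ and the minimizer hugs the boundary: exactly the numerical problems mentioned above.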
Interior-Point Methods: the rebirth of an old idea

1969-1984: reign of Augmented Lagrangian methods. In the language of yesterday's lecture, these methods work with a family $F$ of penalty functions parametrized by $\mu > 0$.

1984: Narendra Karmarkar creates a new polynomial-time algorithm for Linear Programming. People realized it fits Fiacco and McCormick's framework.
The blooming of Interior-Point Methods

1988: Yurii Nesterov and Arkadii Nemirovski generalize interior-point methods to Convex Optimization. Nonlinearity is not an issue anymore. Largest nonlinear optimization problem solved: $10^9$ variables, $3 \cdot 10^8$ constraints (Gondzio, 2006).

1992: Yurii Nesterov and Mike Todd define efficient algorithms for Semidefinite Optimization. A new way of modelling appears, with applications in Mechanics, Control, Finance, Structural Design, ... (Boyd, Vandenberghe).
Something odd in Black-Box methods for convex programming

How do Black-Box methods deal with convexity?
- First, you realize that your problem is convex (or even strongly convex); thus, you investigate its global properties.
- Then you hide your problem in a mysterious black box.
- You only interact with it through an oracle that gives you local information: if $x$ is the current point, it returns $f(x)$, and/or $\nabla f(x)$, and/or $\nabla^2 f(x)$, ...

They act as if they didn't know the problem is convex.
By the way, how do you check convexity?

- Directly from the definition. Try this one: for $x > 0$,
  $f(x) := \max\{\exp(\|x\|_2^2),\ \lambda_{\max}(\sum_{i=1}^n x_i A_i)\} - \ln(x_1) + 5x_n^4$
  (the $A_i$ being symmetric matrices).
- By using the structure of the function. You know several simple convex functions: $t^2$, $\exp(t)$, ..., and several operations that preserve convexity: max, +, ...

And after all this work, you give this beautiful structure to a Black-Box method that doesn't care about it! Interior-point methods, by contrast, explicitly use this structure to construct a barrier for the feasible set (see below).
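Convexity can also be sanity-checked numerically. The sketch below is my own illustration, with the arbitrary choice $A_i = e_i e_i^\top$ (so that $\lambda_{\max}(\sum_i x_i A_i) = \max_i x_i$): it samples random segments and tests midpoint convexity. Such a test can only refute convexity, never prove it; the structural calculus above is what proves it.

```python
import numpy as np

# Midpoint-convexity sampling test: can refute convexity, never prove it.
def f(x):
    # A_i = e_i e_i^T here, so lambda_max(sum_i x_i A_i) = max_i x_i.
    return max(np.exp(x @ x), x.max()) - np.log(x[0]) + 5 * x[-1]**4

def looks_convex(f, dim, n_trials=10_000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        u = rng.uniform(0.1, 2.0, dim)      # stay in the domain x > 0
        v = rng.uniform(0.1, 2.0, dim)
        if f((u + v) / 2) > (f(u) + f(v)) / 2 + 1e-9:
            return False                    # found a counterexample
    return True

print(looks_convex(f, dim=3))               # True: no counterexample found
```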
Newton's Method under scrutiny

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We use $\|\cdot\|$ for the Euclidean norm and its induced matrix norm.

Theorem 1 (Kantorovich). Suppose that the function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- there exists $M > 0$ such that $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M \|x - y\|$ for all $x, y$,
- $\nabla^2 f(x^*) \succeq l I \succ 0$.
Then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and

$\|x_{k+1} - x^*\| \le \dfrac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}$.
Newton's Method under scrutiny: Kantorovich's proof

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We have (using $\nabla f(x^*) = 0$):

$x_{k+1} - x^* = x_k - x^* - \nabla^2 f(x_k)^{-1} \nabla f(x_k) = \nabla^2 f(x_k)^{-1} \int_0^1 [\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))](x_k - x^*)\, dt$.

Hence, with $r_k := \|x_k - x^*\|$,

$r_{k+1} \le \|\nabla^2 f(x_k)^{-1}\| \int_0^1 \|\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))\|\, dt \cdot r_k \le \|\nabla^2 f(x_k)^{-1}\| \dfrac{M}{2} r_k^2 \le \dfrac{M r_k^2}{2(l - M r_k)}$,

because $\nabla^2 f(x_k) \succeq \nabla^2 f(x^*) - M r_k I_n \succeq (l - M r_k) I_n$.

Note that $r_{k+1} < r_k$ when $r_k < 2l/(3M)$, because $r_{k+1} \le \dfrac{M r_k^2}{2(l - M r_k)} < r_k$.
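A quick numerical illustration of the quadratic error recursion, on a toy instance of my own (not from the lecture): for $f(x) = \sum_i (e^{x_i} - x_i)$ we have $\nabla f(x) = e^x - 1$, $\nabla^2 f(x) = \mathrm{diag}(e^x)$, and $x^* = 0$.

```python
import numpy as np

# Pure Newton steps on f(x) = sum_i (exp(x_i) - x_i); minimizer x* = 0.
x = np.full(3, 1.0)
for k in range(6):
    grad = np.exp(x) - 1.0
    hess = np.diag(np.exp(x))
    x = x - np.linalg.solve(hess, grad)   # Newton step
    print(k, np.linalg.norm(x))           # the error roughly squares each step
```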
Kantorovich's result is very strange

The iterates of Newton's Method are affine invariant.

Proof: Consider an invertible $A$, $\varphi(y) := f(Ay)$, and $y_0 = A^{-1} x_0$. Note that

$\langle \nabla \varphi(y_0), h\rangle = \lim_{t \to 0} \dfrac{f(Ay_0 + tAh) - f(Ay_0)}{t} = \langle \nabla f(Ay_0), Ah\rangle$,

so $\nabla \varphi(y_0) = A^\top \nabla f(Ay_0)$. Similarly, $\nabla^2 \varphi(y_0) = A^\top \nabla^2 f(Ay_0) A$. Hence

$y_1 = y_0 - \nabla^2 \varphi(y_0)^{-1} \nabla \varphi(y_0) = A^{-1}\big(x_0 - \nabla^2 f(x_0)^{-1} \nabla f(x_0)\big) = A^{-1} x_1$.

However, the convergence zone $\|x_0 - x^*\| < 2l/(3M)$ is not affine invariant!
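The invariance can be checked numerically; a small sketch of my own, reusing the toy $f$ from the previous example, comparing one Newton step on $f$ with one on $\varphi(y) = f(Ay)$:

```python
import numpy as np

# One Newton step on f and on phi(y) = f(Ay) give iterates related by A.
grad = lambda x: np.exp(x) - 1.0            # f(x) = sum_i (exp(x_i) - x_i)
hess = lambda x: np.diag(np.exp(x))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # some invertible matrix

x0 = np.array([1.0, -0.5, 0.3])
x1 = x0 - np.linalg.solve(hess(x0), grad(x0))     # Newton step for f

y0 = np.linalg.solve(A, x0)                       # y0 = A^{-1} x0
g = A.T @ grad(A @ y0)                            # grad phi(y0) = A^T grad f(A y0)
H = A.T @ hess(A @ y0) @ A                        # hess phi(y0) = A^T hess f(A y0) A
y1 = y0 - np.linalg.solve(H, g)                   # Newton step for phi

print(np.allclose(A @ y1, x1))                    # True: y1 = A^{-1} x1
```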
What's wrong with the assumptions?

$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$

We use $\|\cdot\|$ for the Euclidean norm and its induced matrix norm.

Theorem 1 (Kantorovich). Suppose that the function $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- there exists $M > 0$ such that $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M \|x - y\|$ for all $x, y$,
- $\nabla^2 f(x^*) \succeq l I \succ 0$.
Then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and

$\|x_{k+1} - x^*\| \le \dfrac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}$.

Both the Lipschitz condition on $\nabla^2 f$ and the convergence zone are stated in the Euclidean norm, which is not affine invariant.
Nesterov and Nemirovski's solution

Instead of using the Euclidean norm, use a local norm: $\|u\|_x := \langle \nabla^2 f(x) u, u\rangle^{1/2}$.

This norm is affine invariant. Let $\varphi(y) := f(Ay)$, $x = Ay$, and $v = A^{-1} u$. We have

$\langle \nabla^2 \varphi(y) v, v\rangle = \langle (A^\top \nabla^2 f(x) A)(A^{-1}u), A^{-1}u\rangle = \langle \nabla^2 f(x) u, u\rangle$.

The property $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le M\|x - y\|$ for all $x, y$ should then be replaced by:

$|\nabla^3 f(x)[h, h, y - x]| \le M \|h\|_x^2 \|y - x\|_x$ for all $x, y, h$.
Self-concordance: one of the two big properties

There exists $M > 0$ for which:
- $|\nabla^3 f(x)[h, h, y - x]| \le M \|h\|_x^2 \|y - x\|_x$ for all $x, y, h$;
- equivalently, $|\nabla^3 f(x)[h, h, h]| \le M \|h\|_x^3$ for all $x, h$;
- after rescaling, one can take $M = 2$: $|\nabla^3 f(x)[h, h, h]| \le 2 \|h\|_x^3$ for all $x, h$.

Such functions are called self-concordant. Examples (check it as an exercise):
- $-\ln(t)$ (domain: $\mathbb{R}_{++}$)
- $-\ln\det(X)$ (domain: $\mathbb{S}^n_{++}$)
- $-\ln(t^2 - \|x\|_2^2)$ (domain: the ice-cream cone)
- $-\ln(t) - \ln(\ln(t) - x)$ (domain: $\{(t, x) : t \ge \exp(x)\}$)
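For the first example, the exercise is a one-line computation: $f(t) = -\ln t$ gives $f''(t) = 1/t^2$ and $f'''(t) = -2/t^3$, so $|f'''(t)| = 2 f''(t)^{3/2}$ exactly. A quick numerical confirmation:

```python
import numpy as np

# f(t) = -ln(t): |f'''(t)| = 2/t^3 equals 2*(f''(t))^{3/2} = 2*(1/t^2)^{3/2}.
t = np.linspace(0.1, 10.0, 100)
print(np.allclose(2.0 / t**3, 2.0 * (1.0 / t**2)**1.5))   # True: M = 2 is tight
```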
Self-concordant functions: the right thing for Newton's method

These functions have MANY properties, among which: for every $x \in \mathrm{dom}\, f$, the ball $\{y : \|y - x\|_x < 1\}$ is contained in $\mathrm{dom}\, f$ (interesting for Karmarkar's method).

Proof: Let $x \in \mathrm{dom}(f)$, $h \in \mathbb{R}^n$, and

$\varphi(t) := \dfrac{1}{\|h\|_{x+th}} = \langle \nabla^2 f(x + th) h, h\rangle^{-1/2}$.

Then $\varphi(t) \to 0^+$ when $x + th$ approaches the boundary of $\mathrm{dom}(f)$, because the Hessian blows up there. As long as $\varphi(t) > 0$, $x + th \in \mathrm{dom}(f)$. Now

$\varphi'(t) = -\dfrac{\nabla^3 f(x + th)[h, h, h]}{2 \langle \nabla^2 f(x + th) h, h\rangle^{3/2}}$,

so $|\varphi'(t)| \le 1$ by self-concordance, and $\varphi(t) > 0$ for $-\varphi(0) < t < \varphi(0)$, i.e. $x \pm \varphi(0) h \in \mathrm{dom}(f)$, i.e. $x \pm h/\|h\|_x \in \mathrm{dom}(f)$.
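A numerical illustration of my own for $f(x) = -\sum_i \ln(x_i)$ on $\mathbb{R}^n_{++}$, where $\|h\|_x^2 = \sum_i h_i^2/x_i^2$: every point of the open unit ball $\{y : \|y - x\|_x < 1\}$ has positive coordinates.

```python
import numpy as np

# Dikin ellipsoid of f(x) = -sum_i ln(x_i): ||y - x||_x < 1 implies
# |y_i - x_i| < x_i for every i, hence y stays in the domain x > 0.
rng = np.random.default_rng(2)
x = rng.uniform(0.5, 3.0, 4)
H = np.diag(1.0 / x**2)                        # Hessian of the barrier at x
for _ in range(10_000):
    u = rng.standard_normal(4)
    y = x + 0.999 * u / np.sqrt(u @ H @ u)     # a point with ||y - x||_x = 0.999
    assert (y > 0).all()                       # ... and it is always feasible
print("all sampled points lie inside the domain")
```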
Self-concordant functions: the right thing for Newton's method

These functions have MANY properties, among which:
- If $\|\nabla f(x)\|_x^* := \langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x)\rangle^{1/2} \le \dfrac{3 - \sqrt{5}}{2}$, then $x$ is in the quadratic convergence zone (an automatic test, no $x^*$ needed).
- The following damped method ALWAYS converges (see the sketch after this slide):

$x_{k+1} = x_k - \dfrac{\nabla^2 f(x_k)^{-1} \nabla f(x_k)}{1 + \|\nabla f(x_k)\|_{x_k}^*}$

Exercise: the dual norm $\|h\|_x^*$ is $\langle \nabla^2 f(x)^{-1} h, h\rangle^{1/2}$.
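A sketch of the damped scheme on a toy self-concordant function of my choosing, $f(x) = \langle c, x\rangle - \sum_i \ln(x_i)$, whose minimizer is $x_i^* = 1/c_i$; note the iterates never leave the domain, without any line search.

```python
import numpy as np

# Damped Newton on f(x) = <c,x> - sum_i ln(x_i)  (self-concordant on x > 0).
c = np.array([1.0, 2.0, 4.0])
x = np.full(3, 5.0)                     # any starting point with x > 0
for k in range(30):
    g = c - 1.0 / x                     # gradient
    Hinv_g = x**2 * g                   # the Hessian is diag(1/x^2)
    lam = np.sqrt(g @ Hinv_g)           # lambda = ||grad f(x)||_x^*
    x = x - Hinv_g / (1.0 + lam)        # damped step keeps x > 0
print(x, 1.0 / c)                       # -> converges to the minimizer 1/c
```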
Finally! Interior-point methods

Main idea: formulate your problem in its conic form, and use as barrier for your cone a self-concordant function $f$:

$\min \langle c, x\rangle$ s.t. $Ax = b$, $x \in K$ becomes $\min \langle c, x\rangle + \mu f(x)$ s.t. $Ax = b$.

The set of minimizers $x(\mu)$ is called the primal central path, and $x(\mu) \to x^*$ when $\mu \to 0$. For instance, for $\min\{cx - \mu \ln x : x > 0\}$ with $c > 0$, we get $x(\mu) = \mu/c \to 0$, the solution of $\min\{cx : x \ge 0\}$.

But wait: is $\langle c, x\rangle + \mu f(x)$ a self-concordant function? Yes! Adding a linear term changes neither the Hessian nor the third derivative.
How to decrease µ?

Main goal: we want to decrease it linearly: $\mu^+ = (1 - \theta)\mu$.

Main idea: use our knowledge of the quadratic convergence zone. Current point: $x(\mu)$. Target: $x(\mu^+)$. We have $c + \mu \nabla f(x(\mu)) = 0$, and we want

$\|c + (1 - \theta)\mu \nabla f(x(\mu))\|_{x(\mu)}^* < \dfrac{3 - \sqrt{5}}{2}$, i.e. $\theta \mu \|\nabla f(x(\mu))\|_{x(\mu)}^* < \dfrac{3 - \sqrt{5}}{2}$,

hence we would like to have a bound for $\|\nabla f(x)\|_x^*$.

Note: this bound is responsible for the complexity. The smaller it is, the bigger the decrease $\theta$ can be.
The two crucial properties of barriers

- Self-concordance: $|\nabla^3 f(x)[h, h, h]| \le 2 \|h\|_x^3$ for all $x \in \mathrm{dom}\, f$ and all $h$.
- Bound for $\|\nabla f(x)\|_x^*$: $\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x)\rangle \le \nu$ for all $x \in \mathrm{dom}\, f$.

These functions are called $\nu$-self-concordant barriers. The theoretical complexity of the best IPMs is $O(\sqrt{\nu}\, \ln(C/\epsilon))$ Newton iterations (in practice, even better: $O(\ln(\nu/\epsilon))$).
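For $f(x) = -\sum_i \ln(x_i)$ the second quantity can be computed in closed form: $\nabla f(x) = -1/x$ and $\nabla^2 f(x)^{-1} = \mathrm{diag}(x^2)$, so $\langle \nabla^2 f(x)^{-1}\nabla f(x), \nabla f(x)\rangle = n$ at every $x$, and the bound $\nu = n$ is tight. A one-line check:

```python
import numpy as np

# <Hess^{-1} grad, grad> for f(x) = -sum_i ln(x_i) equals n at every x > 0.
x = np.random.default_rng(3).uniform(0.1, 5.0, 7)
g = -1.0 / x
print(np.dot(x**2 * g, g))   # -> 7.0, i.e. nu = n, independently of x
```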
An interior-point algorithm

Algorithm 1. Let $\epsilon > 0$, $\mu_0 > 0$ and $x_0$ feasible such that $\|\nabla f(x_0) + c/\mu_0\|_{x_0}^* \le \dfrac{3 - \sqrt{5}}{2}$.
Let $\theta := 1/(1.5 + 14.3\sqrt{\nu})$ and $k := 0$.
While $2.58\, \mu_k \nu \ge \epsilon$:
1. $\mu_{k+1} := \mu_k (1 - \theta)$
2. $x_{k+1} := x_k - \nabla^2 f(x_k)^{-1} \big(\nabla f(x_k) + c/\mu_{k+1}\big)$
3. Increment $k$.

Complexity upper bound: $(1.03 + 14.3\sqrt{\nu})\, \ln(2.58\, \mu_0 \nu / \epsilon)$ iterations.
Proof of the constants: PhD thesis of François Glineur.
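A compact sketch of Algorithm 1 on a toy instance of my own: an LP over the simplex with the log-barrier $f(x) = -\sum_i \ln(x_i)$, $\nu = n$. The equality constraint is handled by an explicit multiplier (the Hessian is diagonal), and the starting point $x_0 = \mathbf{1}/n$, $\mu_0 = 1$ is assumed close enough to the central path.

```python
import numpy as np

# Short-step barrier method for  min <c,x>  s.t.  sum(x) = 1, x >= 0,
# with f(x) = -sum_i ln(x_i), nu = n, and the constants of Algorithm 1.
n = 5
c = np.random.default_rng(0).uniform(1.0, 2.0, n)
nu, eps = n, 1e-6

x, mu = np.full(n, 1.0 / n), 1.0                # analytic center of the simplex
theta = 1.0 / (1.5 + 14.3 * np.sqrt(nu))
while 2.58 * mu * nu >= eps:
    mu *= 1.0 - theta
    g = c / mu - 1.0 / x                        # gradient of <c,x>/mu + f(x)
    Hinv = x**2                                 # inverse Hessian (diagonal)
    w = -(Hinv @ g) / Hinv.sum()                # multiplier enforcing sum(dx) = 0
    x = x - Hinv * (g + w)                      # one Newton step per mu-update
print(c @ x - c.min())                          # final optimality gap is O(eps)
```

With $\nu = 5$ this takes a few hundred cheap Newton steps; the aggressive $\mu$-updates mentioned on the "And in practice?" slide below are what make IPMs much faster in practice.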
How do you construct self-concordant barriers?

1- Basic barriers:

Domain | Barrier | Complexity parameter $\nu$
$\mathbb{R}_+$ | $-\ln(t)$ | 1
$\mathbb{S}^n_+$ | $-\ln\det(X)$ | $n$
$\mathrm{epi}\, \|\cdot\|_2$ | $-\ln(t^2 - \|x\|_2^2)$ | 2
$\mathrm{epi}\, \exp$ | $-\ln(t) - \ln(\ln(t) - x)$ | 2

2- Combining barriers:
- Let $f_1$ be a barrier for $K_1$ with parameter $\nu_1$, and $f_2$ be a barrier for $K_2$ with parameter $\nu_2$. Then $f_1 + f_2$ is a barrier for $K_1 \cap K_2$ with parameter $\nu_1 + \nu_2$; for instance, $-\sum_{i=1}^n \ln(x_i)$ is an $n$-barrier for $\mathbb{R}^n_+$.
- Let $f$ be a barrier for $K$ with parameter $\nu$. Then the conjugate $f^*(s) := \sup_{x \in \mathbb{R}^n} \{\langle s, x\rangle - f(x)\}$ is a barrier for $-K^*$ with parameter $\nu$ (so $s \mapsto f^*(-s)$ is a barrier for the dual cone $K^*$).
How do you construct self-concordant barriers?

1- Basic barriers: same table as on the previous slide.

2- Combining barriers:
- Let $f$ be a barrier for $K$ with parameter $\nu$. The restriction of $f$ to an affine subspace $S$ is a barrier for $S \cap K$ with parameter $\nu$.
What about primal-dual problems? Everything is the same:

Primal: $\min \langle c, x\rangle$ s.t. $Ax = b$, $x \in K$.
Dual: $\max \langle b, y\rangle$ s.t. $A^\top y + s = c$, $s \in K^*$.

Combining both: $\min \langle c, x\rangle - \langle b, y\rangle$ s.t. $Ax = b$, $A^\top y + s = c$, $x \in K$, $s \in K^*$.

(What's the optimal value? Zero, by strong duality; indeed, for feasible points, $\langle c, x\rangle - \langle b, y\rangle = \langle A^\top y + s, x\rangle - \langle y, Ax\rangle = \langle s, x\rangle \ge 0$.)

Hence the barrier problem: $\min \langle s, x\rangle + \mu(f(x) + f^*(s))$ s.t. $Ax = b$, $A^\top y + s = c$.
Strangely enough, primal-dual IPMs work very well

All IPM optimization software packages (SeDuMi, MOSEK, ...) are primal-dual. Efficient IPMs can solve:
- Linear problems,
- Second-order cone problems (ice-cream cone), in particular quadratic problems,
- Semidefinite problems,
- and (sometimes) geometric problems, i.e. involving posynomials (see Thursday's lecture),
because of the properties of their self-concordant barriers.
And in practice? Many speed-ups and tricks are used:
- for computing the starting point (and dealing with infeasible starting points),
- for solving the Newton system (reduction of variables),
- for updating $\mu$: decrease $\mu$ much faster than in the theory, then do several steps targeting the central path.

THOUSANDS of research papers deal with these questions.
Some references

[1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer, 2003.
[2] Y. Nesterov and A. Nemirovski, Interior-Point Polynomial Algorithms in Convex Programming, SIAM, 1994.
[3] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001.