Interior-Point Methods for Linear Optimization


Kees Roos
Faculty of Information Technology and Systems (ITS), Optimization Group
TU Delft, Mekelweg 4, P.O. Box 5031, 2628 CD Delft, The Netherlands
URL: roos
E-mail: C.Roos@its.tudelft.nl
December 22

1 Introduction

Everyone with some background in mathematics knows how to solve a system of linear equalities, since this is the basic subject of Linear Algebra. In many practical problems, however, inequalities also play a role. For example, a budget usually may not exceed some specified amount. In such situations one may end up with a system of linear relations that contains not only equalities but also inequalities. Solving such a system requires methods and theory that go beyond standard mathematical knowledge. Nevertheless the topic has a rich history and is tightly related to the important topic of Linear Optimization, where the object is to find the optimal (minimal or maximal) value of a linear function subject to linear constraints on the variables; the constraints may be either equality or inequality constraints. From both a theoretical and a computational point of view the two topics are equivalent. In this chapter we describe the ideas underlying a new class of solution methods that has become known as Interior-Point Methods (IPMs).

2 Short history of Linear Optimization

Linear Optimization (LO) is one of the most widely applied mathematical techniques. (The field of Linear Optimization was given the name Linear Programming in the past; the origin of this name goes back to the Dutch Nobel prize winner Koopmans, see Dantzig [7]. Nowadays the word "programming" usually refers to the activity of writing computer programs, and its use instead of the more natural word "optimization" gives rise to confusion.) The last ten years gave rise to revolutionary developments both in computer technology and in algorithms for LO. As a consequence, LO-problems that ten years ago required a computational time of, say, one year can now be solved within minutes. The achieved acceleration is due partly to advances in computer technology and for a significant part also to the new IPMs for LO.

Until very recently, the standard method for solving LO-problems was the Simplex Method, first proposed by Dantzig [6] in 1947. Since then, it has been routinely used to solve problems in business, logistics, economics and engineering. In an effort to explain the remarkable efficiency of the Simplex Method using the theory of complexity, one has tried to prove that the computational effort to solve an LO-problem via the Simplex Method is polynomially bounded in terms of the size of a problem instance. This question is still unsettled today, but it stimulated two important proposals of new algorithms for LO.

The first polynomial method is the so-called Ellipsoid Method, due to Khachiyan in 1979 [21]. It is based on the ellipsoid technique for nonlinear optimization of Shor [33]. With this technique, Khachiyan proved that LO belongs to the class of polynomially solvable problems. Although this result had a great theoretical impact, it failed to deliver on its promises in actual computational efficiency. The second proposal was made in 1984 by Karmarkar [20]. Karmarkar's algorithm is also polynomial, with a better complexity bound than Khachiyan's, but it has the further advantage of being highly efficient in practice. After an initial controversy it has been established that for very large, sparse problems, subsequent variants of Karmarkar's method often outperform the Simplex Method.

Though the field of LO was considered more or less mature by then, after Karmarkar's paper it suddenly surfaced as one of the most active areas of research in optimization. In the years following Karmarkar's paper more than 1300 papers were published on the subject. (A recent annotated bibliography was given by Roos and Terlaky [30]; a valuable source of information is the World Wide Web interior-point archive.) Originally the aim of the research was to get a better understanding of the so-called Projective Method of Karmarkar. Soon it became apparent that this method was related to classical methods like the Affine Scaling Method of Dikin [8, 9, 10], the Logarithmic Barrier Method of Frisch [11, 12] and the Center Method of Huard [17], and that the last two methods, when tuned properly, could also be proved to be polynomial. Moreover, it turned out that the IPM approach to LO has a natural generalization to the related field of convex nonlinear optimization, which resulted in a new stream of research and an excellent monograph of Nesterov and Nemirovski [28]. This monograph opened the way into other new subfields of optimization. Two major new subfields are Semidefinite Optimization and Second-Order Cone Optimization, with important applications in system theory, discrete optimization, and many other areas. For a survey of these developments the reader may consult Vandenberghe and Boyd [5], and the forthcoming book of Ben-Tal and Nemirovski [3].

This chapter offers a brief but rather complete introduction to IPMs for LO. Most of the material is treated in more detail in the book [31]. We deal with the duality theory for LO, putting much emphasis on the central path, which is at the heart of all current methods. We present a polynomial method for solving an LO-problem, including an elegant way to initialize the method. Although the presented method has the best theoretical iteration bound known up to now, in practice it is outperformed by so-called Barrier Methods, a topic that we do not cover in this chapter. For this and other topics not covered the reader is referred to the aforementioned book or to the book of Wright [37]. We mention one other topic that we do not touch upon: Infeasible-Start Methods. These methods have drawn a lot of attention in the literature; they deal with the situation where no feasible starting point is available. See, e.g., Potra [29], Bonnans and Potra [4], and Wright [35, 36, 37]. We want to point out that the embedding method described extensively in Sections 3-6 provides a much more elegant and reliable solution to this problem: any given LO-problem can be embedded in a self-dual problem for which a feasible interior starting point is known. It has been implemented by the brothers Andersen in their code MOSEK [1].

We have strived to make the treatment of the subject self-contained. It is hoped that this chapter answers the needs of professors who want to teach their students the basic principles of IPMs, of users of LO who want to understand what is behind the new IPM solvers in commercial codes (CPLEX, OSL, XPRESS, MOSEK, etc.) and how to interpret results from those codes, and of other users who want to exploit the new algorithms as part of a more general software toolbox in optimization.
3 Reduction to a self-dual LO-problem

3.1 Canonical LO-problem and its dual

We say that a linear optimization problem is in canonical form if it is written in the following way:
$$(P)\qquad \min\left\{c^Tx : Ax \ge b,\ x \ge 0\right\}, \qquad (1)$$
where the matrix A is of size $m \times n$, the vectors c and x are in $\mathbb{R}^n$ and b is in $\mathbb{R}^m$. Note that all the constraints in (P) are inequality constraints and the variables are nonnegative. Each LO-problem can be transformed into an equivalent canonical problem (for this we refer to any textbook on LO; [31] is also a suitable reference). Given the above canonical problem (P), we consider a second problem, denoted by (D) and called the dual problem of (P), given by
$$(D)\qquad \max\left\{b^Ty : A^Ty \le c,\ y \ge 0\right\}. \qquad (2)$$
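To make the primal-dual pair concrete, the following small sketch sets up a canonical instance and checks a feasible pair of points. The data A, b, c and the points x, y are illustrative choices of ours, not taken from the text.

```python
import numpy as np

# Illustrative canonical-form data:
#   (P)  min c^T x  s.t.  Ax >= b, x >= 0
#   (D)  max b^T y  s.t.  A^T y <= c, y >= 0
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([3.0, 4.0])
c = np.array([5.0, 4.0])

x = np.array([1.0, 1.0])                          # candidate primal point
y = np.array([1.0, 1.0])                          # candidate dual point

assert np.all(A @ x >= b) and np.all(x >= 0)      # x is feasible for (P)
assert np.all(A.T @ y <= c) and np.all(y >= 0)    # y is feasible for (D)

print("c^T x =", c @ x)   # 9.0, an upper bound on the common optimal value
print("b^T y =", b @ y)   # 7.0, a lower bound (weak duality, explained below)
```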

The two problems (P) and (D) share the matrix A and the vectors b and c in their description, but the roles of b and c have been interchanged: the objective vector c of (P) is the right-hand side vector of (D), and, similarly, the right-hand side vector b of (P) is the objective vector of (D). Moreover, the constraint matrix in (D) is the transposed matrix $A^T$, where A is the constraint matrix in (P). In both problems the variables are nonnegative. The problems differ in that (P) is a minimization problem whereas (D) is a maximization problem, and, moreover, the inequality symbols in the constraints have opposite directions.

Now let x be feasible for (P) and y for (D). Then $x \ge 0$, $y \ge 0$, $Ax \ge b$ and $A^Ty \le c$. As a consequence we have
$$b^Ty \le (Ax)^Ty = x^T\left(A^Ty\right) \le c^Tx. \qquad (3)$$
Hence, any y that is feasible for (D) provides a lower bound $b^Ty$ for the value of $c^Tx$, whenever x is feasible for (P). Conversely, any x that is feasible for (P) provides an upper bound $c^Tx$ for the value of $b^Ty$, whenever y is feasible for (D). This phenomenon is known as the weak duality property. An immediate consequence is that if $c^Tx = b^Ty$ for a feasible pair (x, y), then x is optimal for (P) and y is optimal for (D). The (nonnegative) difference $c^Tx - b^Ty$ between the primal objective value at a primal feasible x and the dual objective value at a dual feasible y is called the duality gap of the pair (x, y). We just established that x is optimal for (P) and y is optimal for (D) if the duality gap vanishes. The most important result in LO states that the converse is also true: if x is an optimal solution of (P) and y is an optimal solution of (D), then the duality gap vanishes at the pair (x, y). This result is known as the strong duality property. A proof of this result will be given in Section 4.4.

3.2 Reduction to an inequality system

The strong duality property implies that whenever both (P) and (D) have an optimal solution, the system
$$Ax \ge b,\quad x \ge 0,\qquad A^Ty \le c,\quad y \ge 0,\qquad b^Ty - c^Tx \ge 0 \qquad (4)$$
has a solution, and any such solution provides optimal solutions for (P) and (D). Fixing $\kappa = 1$, the following inequality system is equivalent to (4), as can easily be verified:
$$\begin{pmatrix} 0_{m\times m} & A & -b \\ -A^T & 0_{n\times n} & c \\ b^T & -c^T & 0 \end{pmatrix}\begin{pmatrix} y \\ x \\ \kappa\end{pmatrix} \ge \begin{pmatrix} 0_m \\ 0_n \\ 0\end{pmatrix}, \qquad x \ge 0,\ y \ge 0,\ \kappa \ge 0. \qquad (5)$$
Here $0_k$ denotes the zero vector of length k and $0_{k\times l}$ the zero matrix of size $k \times l$. Since the right-hand side in (5) is the zero vector, this system is homogeneous: whenever $(y, x, \kappa)$ solves the system, then $\lambda(y, x, \kappa)$ also solves the system, for any positive λ. The new variable κ is called the homogenizing variable. Now, given any solution $(y, x, \kappa)$ of (5) with $\kappa > 0$, the pair $(x/\kappa, y/\kappa)$, together with $\kappa = 1$, yields a solution of (4). This makes clear that, in fact, the two systems are completely equivalent unless every solution of (5) has $\kappa = 0$. But if $\kappa = 0$ for every solution of (5), it is obvious that the first system cannot have a solution. Evidently, we can work with the second system without loss of information about the solution set of the first system. Hence, defining the matrix M and the vector z by
$$M := \begin{pmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0\end{pmatrix}, \qquad z := \begin{pmatrix} y \\ x \\ \kappa\end{pmatrix},$$
where we omit the size indices of the zero blocks, we have reduced the problem of solving (P) and (D) to finding a solution of the inequality system
$$Mz \ge 0, \quad z \ge 0. \qquad (6)$$
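The matrix M is easy to assemble from the data (A, b, c). A minimal sketch, reusing the illustrative instance above (the function name build_M is ours):

```python
import numpy as np

def build_M(A, b, c):
    """Skew-symmetric matrix of system (6), acting on z = (y, x, kappa)."""
    m, n = A.shape
    return np.block([
        [np.zeros((m, m)),  A,                 -b.reshape(m, 1)],
        [-A.T,              np.zeros((n, n)),   c.reshape(n, 1)],
        [b.reshape(1, m),  -c.reshape(1, n),    np.zeros((1, 1))],
    ])

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([3.0, 4.0])
c = np.array([5.0, 4.0])

M = build_M(A, b, c)
assert np.allclose(M, -M.T)    # M^T = -M, the skew-symmetry used throughout
# A solution of Mz >= 0, z >= 0 with kappa = z[-1] > 0 gives optimal solutions
# y = z[:m]/kappa for (D) and x = z[m:m+n]/kappa for (P).
```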

If this system has a nonzero solution with $\kappa > 0$, then it gives optimal solutions for (P) and (D); otherwise these problems have no optimal solutions. Thus our task has been reduced to finding a solution of (6) with $\kappa > 0$, or to proving that such a solution does not exist. In the sequel we will deal with this problem. In doing so, we will strongly use the fact that the matrix M is skew-symmetric, i.e., $M^T = -M$.

3.3 Interior-point condition

The method we are going to use for solving (6) is an IPM, and for this we need the system to satisfy the interior-point condition (IPC). We say that a linear system of equalities and inequalities satisfies the IPC if there exists a feasible solution that satisfies all inequality constraints strictly. Unfortunately the system (6) does not satisfy the IPC. For if $z = (y, x, \kappa)$ satisfies all inequality constraints strictly, then $\kappa > 0$, and hence $x/\kappa$ is feasible for (P) and $y/\kappa$ is feasible for (D). But then $(c^Tx - b^Ty)/\kappa \ge 0$, by weak duality. On the other hand, the last constraint in (6) requires $b^Ty - c^Tx \ge 0$. It follows that $b^Ty - c^Tx = 0$, and hence no feasible solution of (6) satisfies the last inequality in (6) strictly.

To overcome this shortcoming of the system (6), we increase the dimension by adding one more nonnegative variable ϑ to the vector z, and by extending M with one extra column and row, according to
$$M := \begin{pmatrix} M & r \\ -r^T & 0\end{pmatrix}, \qquad z := \begin{pmatrix} z \\ \vartheta\end{pmatrix},$$
where $r = e - Me$, with e denoting the all-one vector of length $m + n + 1$ (for ease of notation we keep using the names M and z for the extended matrix and vector). Letting q be the vector of length $m + n + 2$ given by
$$q := \begin{pmatrix} 0_{m+n+1} \\ m + n + 2 \end{pmatrix},$$
we consider the system
$$Mz \ge -q, \quad z \ge 0. \qquad (7)$$
First we observe that this system satisfies the IPC. The all-one vector does the work: taking z = e and ϑ = 1, we have
$$Mz + q = \begin{pmatrix} M & r \\ -r^T & 0\end{pmatrix}\begin{pmatrix} e \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ m + n + 2\end{pmatrix} = \begin{pmatrix} Me + r \\ -r^Te + m + n + 2\end{pmatrix} = \begin{pmatrix} e \\ 1\end{pmatrix}, \qquad (8)$$
since $Me + r = e$ and $-r^Te + m + n + 2 = -(e - Me)^Te + m + n + 2 = 1$. Here we used $e^TMe = 0$, which holds since M is skew-symmetric.

The usefulness of the new system stems not only from the fact that it satisfies the IPC. Another crucial property is that there is a correspondence between the solutions of (6) and the solutions of (7) with ϑ = 0. To see this it is useful to write (7) in terms of the old z and ϑ:
$$\begin{pmatrix} M & r \\ -r^T & 0\end{pmatrix}\begin{pmatrix} z \\ \vartheta\end{pmatrix} + \begin{pmatrix} 0 \\ m + n + 2\end{pmatrix} \ge 0, \quad z \ge 0, \quad \vartheta \ge 0.$$
Obviously, if $(\tilde z, 0)$ satisfies (7), this implies $M\tilde z \ge 0$ and $\tilde z \ge 0$, and hence $\tilde z$ satisfies (6). On the other hand, if $\tilde z$ satisfies (6), then $M\tilde z \ge 0$ and $\tilde z \ge 0$, and $(\tilde z, 0)$ satisfies (7) if and only if $r^T\tilde z \le m + n + 2$. If $r^T\tilde z \le 0$ this certainly holds. Otherwise, if $r^T\tilde z > 0$, a small enough positive multiple of $\tilde z$ will certainly satisfy it. Since a positive multiple preserves signs, this is sufficient for our goal.
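The embedding step can be mirrored directly in code. A sketch (with our own helper name embed) that extends M as above and verifies that the all-one vector is strictly feasible with slack e, as in (8):

```python
import numpy as np

def embed(M):
    """Extended matrix and vector of system (7): returns (M_ext, q)."""
    k = M.shape[0]                     # k = m + n + 1
    e = np.ones(k)
    r = e - M @ e
    M_ext = np.block([[M,                r.reshape(k, 1)],
                      [-r.reshape(1, k), np.zeros((1, 1))]])
    q = np.zeros(k + 1)
    q[-1] = k + 1                      # last entry equals m + n + 2
    return M_ext, q

# Any skew-symmetric M works; here we rebuild the one from the previous sketch.
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([3.0, 4.0]); c = np.array([5.0, 4.0])
m, n = A.shape
M = np.block([[np.zeros((m, m)), A, -b[:, None]],
              [-A.T, np.zeros((n, n)), c[:, None]],
              [b[None, :], -c[None, :], np.zeros((1, 1))]])

M_ext, q = embed(M)
e = np.ones(M_ext.shape[0])
slack = M_ext @ e + q                  # slack at (z, theta) = (e, 1)
assert np.allclose(slack, e)           # the interior-point condition holds, with slack e
```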

3.4 Embedding into a self-dual LO-problem

In the previous section we reduced the problem of solving (P) and (D) to finding a solution of (7) with ϑ = 0 and κ > 0, or proving that such a solution does not exist. For that purpose we consider the LO-problem
$$(SP)\qquad \min\left\{q^Tz : Mz \ge -q,\ z \ge 0\right\}. \qquad (9)$$
Since $q^Tz = (m + n + 2)\vartheta \ge 0$, the optimal value is nonnegative. On the other hand, since $q \ge 0$, the zero vector (z = 0) is feasible and yields zero as objective value. We conclude that the optimal solutions of (9) are precisely the vectors z satisfying (7) with ϑ = 0. Thus our task becomes to find an optimal solution of (9) with κ > 0, or to establish that such an optimal solution does not exist.

An interesting feature of the LO-problem (9) is that it is self-dual. Its dual problem is
$$\max\left\{-q^Tu : M^Tu \le q,\ u \ge 0\right\};$$
since M is skew-symmetric, $M^Tu \le q$ is equivalent to $-Mu \le q$, or $Mu \ge -q$, and maximizing $-q^Tu$ is equivalent to minimizing $q^Tu$. Thus the dual problem is the same problem as (9).

The rest of the chapter is devoted to our main task, namely to find an optimal solution of (9) with $z_{m+n+1} = \kappa > 0$, or to establish that such a solution does not exist. To simplify notation, from now on we let n denote the order of the matrix M; in terms of the original problem dimensions this new n equals $m + n + 2$. Furthermore, we introduce index sets B and N according to
$$B := \{i : z_i > 0 \text{ for some optimal } z\}, \qquad N := \{i : s_i(z) > 0 \text{ for some optimal } z\}.$$
So B contains all indices i for which an optimal solution z with positive $z_i$ exists. We also write $z_i \in B$ if $i \in B$. Note that we certainly have $\vartheta \notin B$, since every optimal solution has ϑ = 0. The main question we have to answer is whether $\kappa \in B$ holds or not: if $\kappa \in B$ then there exists an optimal solution z with κ > 0, in which case (P) and (D) have optimal solutions, and otherwise not.

The next lemma implies that the sets B and N are disjoint. In this lemma, and further on, we use some new notation. First, we associate to any nonnegative vector z its slack vector s(z) as follows:
$$s(z) := Mz + q.$$
Then z is a feasible vector (for (SP)) if and only if $s(z) \ge 0$. Furthermore, to any vector $u \in \mathbb{R}^k$ we associate the diagonal matrix U whose diagonal entries are the elements of u, in the same order. Finally, if also $v \in \mathbb{R}^k$, then Uv will shortly be denoted as uv. Thus uv is the vector whose entries are obtained by multiplying u and v componentwise.

Lemma 3.1 Let $z^1$ and $z^2$ be feasible for (SP). Then $z^1$ and $z^2$ are optimal solutions of (SP) if and only if $z^1s(z^2) = z^2s(z^1) = 0$.

Proof: Recall that, since M is skew-symmetric,
$$u^TMu = 0 \quad \text{for all } u \in \mathbb{R}^n. \qquad (10)$$
Using this we may write
$$q^Tz = z^T(s(z) - Mz) = z^Ts(z) - z^TMz = z^Ts(z). \qquad (11)$$
As a consequence, a feasible z is optimal if and only if $s(z) \ge 0$ and $z^Ts(z) = 0$. Since, by (10),
$$\left(z^1 - z^2\right)^TM\left(z^1 - z^2\right) = 0,$$
we have
$$\left(z^1 - z^2\right)^T\left(s(z^1) - s(z^2)\right) = 0.$$

Expanding the product on the left and rearranging terms we get
$$(z^1)^Ts(z^2) + (z^2)^Ts(z^1) = (z^1)^Ts(z^1) + (z^2)^Ts(z^2).$$
Now $z^1$ is optimal if and only if $(z^1)^Ts(z^1) = 0$, by (11), and similarly for $z^2$. Hence, since $z^1$, $z^2$, $s(z^1)$ and $s(z^2)$ are all nonnegative, $z^1$ and $z^2$ are optimal if and only if
$$(z^1)^Ts(z^2) + (z^2)^Ts(z^1) = 0,$$
which is equivalent to
$$z^1s(z^2) = z^2s(z^1) = 0.$$
This proves the lemma.

Later, in Section 4.3, we prove that every index belongs to either B or N. Thus B and N form a partition of the whole index set. This partition is the so-called optimal partition of (SP).

4 The central path

4.1 Definition of the central path

Recall from (8) that s(e) = e, where e (as always) denotes the all-one vector of appropriate length (in this case, n). As a consequence, we have a vector z such that $z_is_i(z) = 1$ for $1 \le i \le n$, which, using our shorthand notation, can also be expressed as
$$zs(z) = e. \qquad (12)$$
Now we come to a very fundamental notion, both from a theoretical and an algorithmic point of view, namely the central path of the LO-problem at hand. The underlying theoretical property is that for every positive number µ there exists a unique nonnegative vector z such that
$$zs(z) = \mu e, \quad z \ge 0, \quad s(z) \ge 0. \qquad (13)$$
If µ = 1, the existence of such a vector is guaranteed by (12). Also note that if we put µ = 0 in (13), then the solutions are just the optimal solutions of (SP), and then the system may have multiple solutions. Therefore, the following lemma is of interest.

Lemma 4.1 If µ > 0, then there exists at most one nonnegative vector z such that (13) holds.

Proof: Let $z^1$ and $z^2$ be two nonnegative vectors satisfying (13), and let $s^1 = s(z^1)$ and $s^2 = s(z^2)$. Since µ > 0, the vectors $z^1$, $z^2$, $s^1$, $s^2$ are all positive. Define $\Delta z := z^2 - z^1$ and, similarly, $\Delta s := s^2 - s^1$. Then one may easily verify that
$$M\Delta z = \Delta s, \qquad (14)$$
$$z^1\Delta s + s^1\Delta z + \Delta z\Delta s = 0. \qquad (15)$$
Using that M is skew-symmetric, (14) implies that $\Delta z^T\Delta s = 0$, or, equivalently,
$$e^T(\Delta z\Delta s) = 0. \qquad (16)$$
Rewriting (15) gives
$$(z^1 + \Delta z)\Delta s + s^1\Delta z = 0.$$
Since $z^1 + \Delta z = z^2 > 0$ and $s^1 > 0$, this implies that no two corresponding entries of ∆z and ∆s have the same sign. So it follows that
$$\Delta z\Delta s \le 0. \qquad (17)$$
Combining (16) and (17), we obtain $\Delta z\Delta s = 0$. Hence, for each i, either $(\Delta z)_i = 0$ or $(\Delta s)_i = 0$. Using (15) once more, we conclude that $(\Delta z)_i = 0$ and $(\Delta s)_i = 0$ for each i. Hence $\Delta z = \Delta s = 0$, whence $z^1 = z^2$ and $s^1 = s^2$. This proves the lemma.

To prove the existence of a solution to (13) requires more effort; we postpone this to the next section. For the moment, let us take the existence of a solution to (13) for granted and denote it by z(µ). We call it the µ-center of (SP). The set $\{z(\mu) : \mu > 0\}$ of all µ-centers represents a parametric curve in the feasible region of (SP) and is called the central path of (SP). Note that
$$q^Tz(\mu) = s(z(\mu))^Tz(\mu) = n\mu. \qquad (18)$$
This shows that along the central path, when µ approaches zero, the objective value $q^Tz(\mu)$ monotonically decreases to zero, at a linear rate. In the next section we will use another important fact, namely
$$e^Tz + e^Ts(z) = n + q^Tz. \qquad (19)$$
This identity follows by applying the orthogonality property (10) to the vector $e - z$, which gives $(z - e)^T(s(z) - s(e)) = 0$; since $s(e) = e$, $e^Te = n$ and $z^Ts(z) = q^Tz$, the above relation follows.

4.2 Existence of the central path

In this section we give an algorithmic proof of the existence of a solution to (13). Starting at z = e we construct the µ-center for any µ > 0. This is done by using the so-called Newton direction as a search direction. The results in this section will also be used later when dealing with a polynomial-time method for solving (SP).

4.2.1 Newton direction

Assume that z is a positive feasible solution of (SP) such that its slack vector $s = s(z)$ is also positive, and let ∆z denote a displacement in the z-space. Our aim is to find ∆z such that $z + \Delta z$ is the µ-center. We denote $z^+ := z + \Delta z$, and the new slack vector as $s^+$:
$$s^+ := s(z^+) = M(z + \Delta z) + q = s + M\Delta z.$$
Thus, the displacement ∆s in the s-space is simply given by $\Delta s = s^+ - s = M\Delta z$. Observe that ∆z and ∆s are orthogonal, by (10):
$$(\Delta z)^T\Delta s = (\Delta z)^TM\Delta z = 0. \qquad (20)$$
We want $z^+$ to be the µ-center, which means $(z + \Delta z)(s + \Delta s) = \mu e$, or
$$zs + z\Delta s + s\Delta z + \Delta z\Delta s = \mu e.$$
The last equation is nonlinear in (∆z, ∆s). Applying Newton's method, we omit the nonlinear term ∆z∆s, leaving us with the following linear system in the unknown vectors ∆z and ∆s:
$$M\Delta z - \Delta s = 0, \qquad z\Delta s + s\Delta z = \mu e - zs. \qquad (21)$$
This system has a unique solution, as may easily be verified by using that M is skew-symmetric and z > 0, s > 0. The solution ∆z is called the Newton direction. Since we omitted the quadratic term ∆z∆s in our calculation of the Newton direction, $z + \Delta z$ will in general not be the µ-center, but hopefully it will be a good approximation.
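Before analyzing the error, here is a sketch of how the Newton direction can be computed in practice. Eliminating ∆s = M∆z from (21) leaves a single linear system; the helper name is ours.

```python
import numpy as np

def newton_direction(M, q, z, mu):
    """Solve system (21): M dz = ds and z*ds + s*dz = mu*e - z*s (componentwise)."""
    s = M @ z + q
    n = len(z)
    # Substituting ds = M dz turns the second equation into (Z M + S) dz = mu*e - z*s,
    # with Z = diag(z), S = diag(s); this matrix is nonsingular for skew-symmetric M
    # and z, s > 0, as stated in the text.
    dz = np.linalg.solve(np.diag(z) @ M + np.diag(s), mu * np.ones(n) - z * s)
    ds = M @ dz
    return dz, ds
```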

In fact, after the Newton step one has
$$z^+s(z^+) = (z + \Delta z)(s + \Delta s) = zs + z\Delta s + s\Delta z + \Delta z\Delta s = \mu e + \Delta z\Delta s. \qquad (22)$$
Comparing this with our target, namely $z^+s(z^+) = \mu e$, we see that the error is precisely the quadratic term ∆z∆s. From (22) and (20) we deduce
$$(z^+)^Ts(z^+) = \mu\,e^Te = \mu n, \qquad (23)$$
showing that after the Newton step the duality gap already has its desired value.

4.2.2 Proximity measure

To measure the quality of the approximation we introduce a proximity measure δ(z, µ) that vanishes if z = z(µ) and is positive otherwise. It is convenient to introduce the vector
$$v := \sqrt{\frac{zs(z)}{\mu}}, \qquad (24)$$
where all operations are componentwise. The proximity measure δ(z, µ) is then defined by
$$\delta(z, \mu) := \tfrac{1}{2}\left\|v - v^{-1}\right\|. \qquad (25)$$
Note that z = z(µ) if and only if v = e, and then δ(z, µ) = 0; otherwise δ(z, µ) > 0.

We proceed with some lemmas. The first lemma is a simple technical result on the componentwise product of two arbitrary orthogonal vectors. In this lemma $\|\cdot\|$ denotes the Euclidean norm (or 2-norm) and $\|\cdot\|_\infty$ the Chebyshev norm (or infinity norm).

Lemma 4.2 If u and v are orthogonal in $\mathbb{R}^n$, then
$$\|uv\|_\infty \le \tfrac{1}{4}\|u + v\|^2, \qquad \|uv\| \le \tfrac{1}{4}\sqrt 2\,\|u + v\|^2.$$

Proof: We may write
$$uv = \tfrac{1}{4}\left((u + v)^2 - (u - v)^2\right). \qquad (26)$$
From this we derive the componentwise inequality
$$-\tfrac{1}{4}(u - v)^2 \le uv \le \tfrac{1}{4}(u + v)^2.$$
This implies
$$-\tfrac{1}{4}\|u - v\|^2\,e \le uv \le \tfrac{1}{4}\|u + v\|^2\,e.$$
Since u and v are orthogonal, the vectors $u - v$ and $u + v$ have the same 2-norm, and hence the first inequality of the lemma follows. For the second inequality we derive from (26) that
$$\|uv\|^2 = e^T(uv)^2 = \tfrac{1}{16}\,e^T\left((u + v)^2 - (u - v)^2\right)^2 \le \tfrac{1}{16}\,e^T\left((u + v)^4 + (u - v)^4\right).$$
Since $e^Tz^4 \le \|z\|^4$ for any $z \in \mathbb{R}^n$, we obtain
$$\|uv\|^2 \le \tfrac{1}{16}\left(\|u + v\|^4 + \|u - v\|^4\right).$$
Using again that $\|u - v\| = \|u + v\|$, the second inequality follows.

The next lemma estimates the error term in terms of the proximity measure.

Lemma 4.3 If δ := δ(z, µ), then $\|\Delta z\Delta s\|_\infty \le \mu\delta^2$ and $\|\Delta z\Delta s\| \le \mu\delta^2\sqrt 2$.

Proof: Componentwise division of the second equation of (21) by $\sqrt\mu\,v = \sqrt{zs}$ yields
$$\sqrt{\frac{z}{s}}\,\Delta s + \sqrt{\frac{s}{z}}\,\Delta z = \sqrt\mu\left(v^{-1} - v\right).$$
The terms on the left represent orthogonal vectors whose componentwise product is ∆z∆s. Applying Lemma 4.2 to these vectors, and using that $\|v^{-1} - v\| = 2\delta$, the result immediately follows.
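The proximity measure (25) is straightforward to evaluate numerically. A small sketch (helper name ours):

```python
import numpy as np

def proximity(z, s, mu):
    """delta(z, mu) = (1/2) * || v - v^{-1} ||  with  v = sqrt(z*s/mu), as in (24)-(25)."""
    v = np.sqrt(z * s / mu)
    return 0.5 * np.linalg.norm(v - 1.0 / v)
```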

4.2.3 Quadratic convergence of the Newton process

We are now ready for the main result on the Newton direction.

Lemma 4.4 If δ := δ(z, µ) < 1, then the Newton step is strictly feasible, i.e., $z^+ > 0$ and $s^+ > 0$. Moreover,
$$\delta(z^+, \mu) \le \frac{\delta^2}{\sqrt{2(1 - \delta^2)}}.$$

Proof: Let $0 \le \alpha \le 1$, $z^\alpha = z + \alpha\Delta z$ and $s^\alpha = s + \alpha\Delta s$. Using (21), we then have
$$z^\alpha s^\alpha = (z + \alpha\Delta z)(s + \alpha\Delta s) = zs + \alpha(z\Delta s + s\Delta z) + \alpha^2\Delta z\Delta s = zs + \alpha(\mu e - zs) + \alpha^2\Delta z\Delta s = (1 - \alpha)zs + \alpha(\mu e + \alpha\Delta z\Delta s).$$
By Lemma 4.3, $\mu e + \alpha\Delta z\Delta s \ge \mu(1 - \alpha\delta^2)e > 0$. Hence, since $(1 - \alpha)zs \ge 0$, we have $z^\alpha s^\alpha > 0$ for all $\alpha \in [0, 1]$. Therefore, the components of $z^\alpha$ and $s^\alpha$ cannot vanish for any $\alpha \in [0, 1]$. Hence, since z > 0 and s > 0, by continuity $z^\alpha$ and $s^\alpha$ must be positive for every such α, in particular for α = 1. This proves the first statement of the lemma.

Now let us turn to the second statement. Let $\delta^+ := \delta(z^+, \mu)$ and $v^+ := \sqrt{z^+s^+/\mu}$. Then, by definition,
$$2\delta^+ = \left\|(v^+)^{-1} - v^+\right\| = \left\|(v^+)^{-1}\left(e - (v^+)^2\right)\right\|.$$
Recall from (22) that $z^+s^+ = \mu e + \Delta z\Delta s$; in other words, $\mu(v^+)^2 = \mu e + \Delta z\Delta s$. Substitution gives
$$2\delta^+ = \left\|(v^+)^{-1}\,\frac{\Delta z\Delta s}{\mu}\right\| \le \frac{1}{\sqrt{1 - \delta^2}}\,\frac{\|\Delta z\Delta s\|}{\mu} \le \frac{\delta^2\sqrt 2}{\sqrt{1 - \delta^2}};$$
in the last two inequalities we used Lemma 4.3 twice: the infinity-norm bound gives $(v^+)^2 \ge (1 - \delta^2)e$, and the 2-norm bound gives $\|\Delta z\Delta s\| \le \mu\delta^2\sqrt 2$. Thus the proof is complete.

Lemma 4.4 implies that when $\delta \le 1/\sqrt 2$, then after a Newton step the proximity to the µ-center satisfies $\delta(z^+, \mu) \le \delta^2$. In other words, Newton's method is quadratically convergent.

4.2.4 Existence proof

Now suppose that we know $z := z(\mu^0)$ for some $\mu^0 > 0$. Note that this is true for $\mu^0 = 1$, since s(e) = e. Let µ > 0 be such that $\delta(z, \mu) \le 1/\sqrt 2$. Since $zs(z) = \mu^0 e$, this holds if and only if
$$\frac{1}{2}\left\|\left(\sqrt{\frac{\mu^0}{\mu}} - \sqrt{\frac{\mu}{\mu^0}}\right)e\right\| \le \frac{1}{\sqrt 2}.$$
Using $\|e\| = \sqrt n$, one may easily verify that this certainly holds if
$$\left(1 + \sqrt{\tfrac{2}{n}}\right)^{-1}\mu^0 \le \mu \le \left(1 + \sqrt{\tfrac{2}{n}}\right)\mu^0. \qquad (27)$$
Now, starting the Newton process at $z^0 = z$, with this µ fixed, we can generate an infinite sequence $z^0, z^1, \ldots, z^k, \ldots$ such that
$$\delta\left(z^k, \mu\right) \le \left(\tfrac{1}{2}\right)^{2^{k-1}}.$$
Hence $\lim_{k\to\infty}\delta\left(z^k, \mu\right) = 0$.

The generated sequence has at least one accumulation point $z^*$, since the iterates $z^1, \ldots, z^k, \ldots$ lie in the compact set
$$\left\{z : e^Tz + e^Ts(z) = n(1 + \mu),\ z \ge 0,\ s(z) \ge 0\right\},$$
due to (19) and (23). Since $\delta(z^*, \mu) = 0$, we obtain $z^*s(z^*) = \mu e$. This proves that the µ-center exists if µ satisfies (27) with $\mu^0 = 1$, i.e., if
$$\left(1 + \sqrt{\tfrac{2}{n}}\right)^{-1} \le \mu \le 1 + \sqrt{\tfrac{2}{n}}.$$
By redefining $\mu^0$ as one of the endpoints of this interval we can repeat the above procedure, and extend the interval on which the µ-center is known to exist to
$$\left(1 + \sqrt{\tfrac{2}{n}}\right)^{-2} \le \mu \le \left(1 + \sqrt{\tfrac{2}{n}}\right)^{2},$$
and so on. After applying the procedure k times, the interval on which the µ-center certainly exists is given by
$$\left(1 + \sqrt{\tfrac{2}{n}}\right)^{-k} \le \mu \le \left(1 + \sqrt{\tfrac{2}{n}}\right)^{k}.$$
For arbitrary µ > 0 we therefore have to apply the above procedure at most
$$\frac{|\log\mu|}{\log\left(1 + \sqrt{2/n}\right)}$$
times to obtain the µ-center. This completes the proof of the existence of the central path.

4.3 Existence of a strictly complementary solution

Now that we have proven the existence of the central path, we can use it as a guide to the optimal set, by letting the parameter µ approach zero. As we show in this section, in this way we obtain an optimal solution z such that $z + s(z) > 0$. Recall that optimality of z means that $zs(z) = 0$, which is usually expressed by saying that z and s(z) are complementary vectors. If moreover $z + s(z) > 0$, we say that z and s(z) are strictly complementary vectors. Then for every index i, either $z_i > 0$ or $s_i(z) > 0$. This implies that the index sets B and N introduced in Section 3.4 form a partition of the index set, the so-called optimal partition of (SP).

It is convenient to introduce some more notation. If z is a nonnegative vector, we define its support, denoted by σ(z), as the set of coordinate indices i for which $z_i > 0$. Note that if $s = s(z)$, then $\sigma(z) \cap \sigma(s) = \emptyset$ if and only if z is an optimal solution. Furthermore, z is a strictly complementary optimal solution if and only if it is optimal and $\sigma(z) \cup \sigma(s) = \{1, 2, \ldots, n\}$.

Theorem 4.5 There exists an optimal solution $z^*$ such that $z^* + s(z^*) > 0$.

Proof: Let $\{\mu_k\}_{k=1}^\infty$ be a positive sequence such that $\mu_k \to 0$ as $k \to \infty$, and let $s(\mu_k) := s(z(\mu_k))$. Due to (19) the set $\{(z(\mu_k), s(\mu_k))\}$ lies in a compact set, and hence it contains a subsequence converging to a point $(z^*, s^*)$, with $s^* = s(z^*)$. Since $z(\mu_k)^Ts(\mu_k) = n\mu_k \to 0$, we have $(z^*)^Ts^* = 0$. Hence, from (11), $q^Tz^* = 0$, so $z^*$ is an optimal solution. We claim that $(z^*, s^*)$ is a strictly complementary pair. To prove this, we apply the orthogonality property (10) to the points $z^*$ and $z(\mu_k)$, which gives $(z(\mu_k) - z^*)^T(s(\mu_k) - s^*) = 0$. Rearranging the terms, and using $z(\mu_k)^Ts(\mu_k) = n\mu_k$ and $(z^*)^Ts^* = 0$, we arrive at
$$\sum_{j\in\sigma(z^*)} z_j^*\, s_j(\mu_k) + \sum_{j\in\sigma(s^*)} z_j(\mu_k)\, s_j^* = n\mu_k.$$

Dividing both sides by $\mu_k$ and recalling that $z_j(\mu_k)s_j(\mu_k) = \mu_k$, we obtain
$$\sum_{j\in\sigma(z^*)} \frac{z_j^*}{z_j(\mu_k)} + \sum_{j\in\sigma(s^*)} \frac{s_j^*}{s_j(\mu_k)} = n.$$
Letting $k \to \infty$, the first sum on the left becomes equal to the number of nonzero coordinates of $z^*$. Similarly, the second sum becomes equal to the number of nonzero coordinates of $s^*$. The sum of these numbers being n, we conclude that the optimal pair $(z^*, s^*)$ is strictly complementary.

By using a similar proof technique it can be shown that the limit of z(µ) exists as µ goes to zero. In other words, the central path converges. The limit point is (of course) a uniquely determined optimal solution of (SP), which can further be characterized as the so-called analytic center of the set of optimal solutions [31].

By Theorem 4.5 there exists a strictly complementary solution z of (SP). Having such a solution, the classes B and N simply follow from
$$B = \{i : z_i > 0\}, \qquad N = \{i : s_i(z) > 0\}.$$
Let us finally mention that Theorem 4.5 is equivalent to an old result of Goldman and Tucker, which states that every feasible linear system of equalities and inequalities has a strictly feasible solution [13]. We want to make explicit that Theorem 4.5 also implies the strong duality theorem for LO. We devote a separate section to this.

4.4 Proof of the strong duality theorem

The strong duality theorem for LO states that if both (P) and (D) are feasible, then both problems have optimal solutions and their optimal objective values coincide. As we will show, this result is tightly related to the fact (proved below) that (P) and (D) have optimal solutions if and only if $\kappa \in B$. This goes as follows. In Section 3.2 we established that (P) and (D) have optimal solutions with zero duality gap if and only if there exists a solution of (SP) with κ > 0. This happens exactly when $\kappa \in B$. For the proof of the strong duality theorem it remains to show that if $\kappa \notin B$, then neither (P) nor (D) has an optimal solution.

Thus we consider the case where $\kappa \notin B$. Then, by Theorem 4.5, we may conclude that $\kappa \in N$, and hence there exists a solution $(\tilde x, \tilde y, \tilde\kappa)$ to (5) with positive slack value for the third constraint. In other words, $\tilde x$ and $\tilde y$ satisfy
$$\tilde x \ge 0, \quad \tilde y \ge 0, \quad A\tilde x \ge 0, \quad A^T\tilde y \le 0, \quad b^T\tilde y - c^T\tilde x > 0.$$
By the last inequality we cannot have both $b^T\tilde y \le 0$ and $c^T\tilde x \ge 0$. Hence, either $b^T\tilde y > 0$ or $c^T\tilde x < 0$.

If $b^T\tilde y > 0$, then (P) is infeasible, since otherwise there exists $x \ge 0$ such that $Ax \ge b$, and we would have $0 < b^T\tilde y \le (Ax)^T\tilde y = x^TA^T\tilde y \le 0$, a contradiction. Moreover, if (D) is not infeasible, then there exists $y \ge 0$ such that $A^Ty \le c$. Then $y + \alpha\tilde y$ is feasible for (D) for any α > 0, and the dual objective value $b^T(y + \alpha\tilde y) = b^Ty + \alpha b^T\tilde y$ goes to infinity as α grows. This shows that if $b^T\tilde y > 0$, then (P) is infeasible and (D) is either infeasible or unbounded.

The case $c^T\tilde x < 0$ can be handled in the same way. Then (D) is infeasible, since otherwise there exists $y \ge 0$ such that $A^Ty \le c$, which would imply the contradiction $0 > c^T\tilde x \ge (A^Ty)^T\tilde x = y^TA\tilde x \ge 0$. If (P) is not infeasible, then we have $x \ge 0$ such that $Ax \ge b$, and $x + \alpha\tilde x$ is feasible for (P) for any α > 0. As α grows, the objective value $c^T(x + \alpha\tilde x) = c^Tx + \alpha c^T\tilde x$ goes to minus infinity, showing that (P) is unbounded in that case. We conclude that if $c^T\tilde x < 0$, then (D) is infeasible and (P) is either infeasible or unbounded.

This completes the proof of the strong duality theorem for LO.

5 A polynomial solution method for LO

After all the theoretical results of the previous sections we are now ready to present an algorithm that finds a strictly complementary solution of (SP) in polynomial time.

5.1 Newton-step algorithm

The idea of the algorithm is quite simple. Starting at z = e, we choose µ < 1 such that
$$\delta(z, \mu) \le \tau := \frac{1}{\sqrt 2}, \qquad (28)$$
and perform a Newton step targeting z(µ). After the step the new iterate z satisfies $\delta(z, \mu) \le \tau^2 = \tfrac{1}{2}$, by Lemma 4.4. Then we decrease µ such that (28) holds again for the new values of z and µ, and repeat the procedure. Note that after each Newton step we have $q^Tz = n\mu$, by (23). Thus, if µ approaches zero, then z approaches the set of optimal solutions. Formally the algorithm can be stated as follows.

Full-Newton step algorithm

Input: an accuracy parameter ε > 0; an update parameter θ, 0 < θ < 1.

begin
    z := e; µ := 1;
    while nµ ≥ ε do
    begin
        µ := (1 − θ)µ;
        z := z + ∆z;
    end
end

Here ∆z is the Newton direction (21) at the current iterate z for the updated value of µ. Note that the reduction of the barrier parameter µ is realized by multiplication with the factor 1 − θ. In the next section we discuss how an appropriate value of the update parameter θ can be obtained, so that during the course of the algorithm the iterates are kept within the region where Newton's method is quadratically convergent.
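Putting the pieces together, the following is a minimal sketch of the full-Newton-step algorithm. It assumes the embedded data (M, q) with known interior point e (as constructed in Section 3.3), reuses the elimination of ∆s from the Newton system (21), and the function name and the default value of ε are ours.

```python
import numpy as np

def full_newton_step_method(M, q, eps=1e-8):
    """Full-Newton-step method for (SP): min q^T z s.t. Mz >= -q, z >= 0,
    started at the known interior point z = e with mu = 1."""
    n = M.shape[0]
    z = np.ones(n)
    mu = 1.0
    theta = 1.0 / (2.0 * np.sqrt(n))           # update parameter used in the analysis below
    while n * mu >= eps:
        mu *= 1.0 - theta                       # barrier update
        s = M @ z + q
        rhs = mu * np.ones(n) - z * s
        dz = np.linalg.solve(np.diag(z) @ M + np.diag(s), rhs)   # Newton system (21)
        z = z + dz                              # full Newton step
    return z, mu
```

With θ = 1/(2√n) each iteration shrinks nµ by a fixed factor, which is exactly what the complexity analysis in the next section quantifies.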

5.2 Complexity analysis

At the start of the algorithm we have µ = 1 and z = z(1) = e, whence $q^Tz = n$ and δ(z, µ) = 0. In each iteration µ is first reduced by the factor 1 − θ and then the Newton step towards the new µ-center is made. Clearly the reduction of µ affects the value of the proximity measure. This effect is fully described by the following lemma.

Lemma 5.1 Let z > 0 and µ > 0 be such that $s = s(z) > 0$ and $q^Tz = n\mu$. Moreover, let δ := δ(z, µ) and $\mu^+ = (1 - \theta)\mu$. Then
$$\delta(z, \mu^+)^2 = (1 - \theta)\,\delta^2 + \frac{\theta^2 n}{4(1 - \theta)}.$$

Proof: Let $\delta^+ := \delta(z, \mu^+)$ and $v = \sqrt{zs/\mu}$, as defined in (24). Then, by definition,
$$4(\delta^+)^2 = \left\|\sqrt{1 - \theta}\,v^{-1} - \frac{v}{\sqrt{1 - \theta}}\right\|^2 = \frac{1}{1 - \theta}\left\|(1 - \theta)\left(v^{-1} - v\right) - \theta v\right\|^2.$$
From $z^Ts = n\mu$ it follows that $\|v\|^2 = n$. Hence v is orthogonal to $v^{-1} - v$:
$$v^T\left(v^{-1} - v\right) = n - \|v\|^2 = 0.$$
Therefore,
$$4(\delta^+)^2 = (1 - \theta)\left\|v^{-1} - v\right\|^2 + \frac{\theta^2\|v\|^2}{1 - \theta}.$$
Finally, since $\|v^{-1} - v\| = 2\delta$ and $\|v\|^2 = n$, the result follows.

Now suppose that
$$q^Tz = n\mu \quad\text{and}\quad \delta(z, \mu) \le \tfrac{1}{2} \qquad (29)$$
hold at the start of some iteration; this is certainly true in the first iteration. We claim that these properties are maintained during the course of the algorithm if $\theta = 1/(2\sqrt n)$. Indeed, when the barrier parameter is updated to $\mu^+ = (1 - \theta)\mu$, the lemma gives
$$\delta(z, \mu^+)^2 \le \frac{1 - \theta}{4} + \frac{\theta^2 n}{4(1 - \theta)}.$$
Due to the choice of θ, we get
$$\delta(z, \mu^+)^2 \le \frac{1 - \theta}{4} + \frac{1}{16(1 - \theta)} < \frac{1}{2},$$
so that $\delta(z, \mu^+) < 1/\sqrt 2$. Therefore, by Lemma 4.4, after the Newton step we obtain a new iterate z such that $\delta(z, \mu^+) \le \tfrac{1}{2}$. Also, by (23), $q^Tz = n\mu^+$ after the step. This proves the claim.

How many iterations are needed by the algorithm? The answer is provided by the following lemma.

Lemma 5.2 After at most
$$\frac{1}{\theta}\log\frac{n}{\varepsilon}$$
iterations we have $n\mu \le \varepsilon$.

Proof: Initially the objective value is n, and in each iteration it is reduced by the factor 1 − θ. Hence, after k iterations the objective value, given by $q^Tz = n\mu$, is smaller than or equal to ε if
$$(1 - \theta)^k\,n \le \varepsilon.$$
Taking logarithms, this becomes
$$k\log(1 - \theta) + \log n \le \log\varepsilon.$$
Since $-\log(1 - \theta) \ge \theta$, this certainly holds if
$$k\theta \ge \log n - \log\varepsilon = \log\frac{n}{\varepsilon}.$$
This implies the lemma.

The above results are summarized in the next theorem.

Theorem 5.3 If $\theta = 1/(2\sqrt n)$, then the algorithm with full Newton steps requires at most
$$2\sqrt n\,\log\frac{n}{\varepsilon}$$
iterations. The output is a feasible z > 0 such that $q^Tz = n\mu \le \varepsilon$ and $\delta(z, \mu) \le \tfrac{1}{2}$.
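As a quick sanity check of the bound in Theorem 5.3, with illustrative numbers of our own choosing: for n = 100 and ε = 10⁻⁶ one gets 2√n = 20 and log(n/ε) = log 10⁸ ≈ 18.4, so roughly 369 full Newton steps suffice.

```python
import numpy as np

n, eps = 100, 1e-6                       # illustrative sizes
bound = 2 * np.sqrt(n) * np.log(n / eps)
print(int(np.ceil(bound)))               # 369 iterations guaranteed by Theorem 5.3
```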

This theorem shows that we can obtain an ε-solution z of our self-dual model with ε as small as desired. But note that z will always be an interior solution, so z > 0 and $s = s(z) > 0$. A crucial question for us is whether the variable $\kappa = z_{n-1}$ is positive or zero in the limit as µ goes to zero. In practice, for small enough ε it is usually not a serious problem to decide which of the two cases occurs. But the theory can help us in an elegant way, as we explain in the next section, where we show that the optimal partition can be found in polynomial time. This requires some further analysis of the central path.

Before proceeding, we want to agree upon the following. Because of (29), the iterates move in a narrow neighborhood around the central path. In the next sections we will assume that the iterates are exactly on the central path, because this simplifies the analysis significantly. At the cost of some additional technicalities in the analysis, similar results can be obtained for the iterates of the practical algorithm. For the technical details the reader is referred to [31].

5.3 Finding the optimal partition

Let $S^*$ denote the set of optimal solutions of (SP). If $z \in S^*$, then, for each index i, either $z_i = 0$ or $s_i(z) = 0$, due to the complementarity property. On the other hand, given i, due to the existence of strictly complementary solutions, there exists an optimal z such that $z_i + s_i(z) > 0$. Hence the number
$$\sigma := \min_i\ \max_{z\in S^*}\ \left(z_i + s_i(z)\right)$$
is positive. One also has $\sigma \le n$, due to (19). We call σ the condition number of (SP). The calculation of this number is more cumbersome than solving (SP) itself. For our purpose, however, it is sufficient to know the following lower bound:
$$\sigma \ge \frac{1}{\prod_{j=1}^{n}\|M_j\|}, \qquad (30)$$
where $M_j$ denotes the j-th column of M (Theorem I.33 in [31]). For this bound we need to assume that the entries of M and q are integral. In the sequel, in all bounds where the condition number occurs, it can safely be replaced by this easily computable lower bound.

Lemma 5.4 For any positive µ one has
$$z_i(\mu) \ge \frac{\sigma}{n} \quad (i \in B), \qquad z_i(\mu) \le \frac{n\mu}{\sigma} \quad (i \in N).$$

Proof: Let $i \in N$ and let $\bar z \in S^*$ be such that $\bar s_i := s_i(\bar z)$ is maximal, and write $\bar s := s(\bar z)$. Then the definition of the condition number σ implies that $\bar s_i \ge \sigma > 0$. Using (10) once more, we get
$$(z(\mu) - \bar z)^T(s(\mu) - \bar s) = 0.$$
Since $q^T\bar z = \bar z^T\bar s = 0$, it follows that
$$z(\mu)^T\bar s + s(\mu)^T\bar z = n\mu.$$
This implies $z_i(\mu)\bar s_i \le z(\mu)^T\bar s \le n\mu$. Dividing by $\bar s_i$ and using $\bar s_i \ge \sigma$, we obtain the second inequality of the lemma:
$$z_i(\mu) \le \frac{n\mu}{\bar s_i} \le \frac{n\mu}{\sigma}.$$
If $i \in B$, then it can be derived in a similar way that
$$s_i(\mu) \le \frac{n\mu}{\sigma}.$$
Since $z_i(\mu)s_i(\mu) = \mu$, this implies the first inequality.

Lemma 5.4 gives a complete separation between the variables in B and the variables in N, provided that µ is so small that
$$\frac{\sigma}{n} > \frac{n\mu}{\sigma}, \qquad \text{or, equivalently,} \qquad n\mu < \frac{\sigma^2}{n}.$$
Thus, applying the algorithm with
$$\varepsilon = \frac{\sigma^2}{n},$$
we obtain a solution z which reveals the optimal partition according to
$$B = \{i : z_i > s_i(z)\}, \qquad N = \{i : z_i < s_i(z)\}.$$
By substituting this value of ε in Theorem 5.3, the required number of iterations follows:
$$2\sqrt n\,\log\frac{n^2}{\sigma^2}.$$
After this number of iterations we know the optimal partition, and hence whether κ belongs to B or not. In other words, we then know whether the original problem has an optimal solution or not. If it has no optimal solution we are done; otherwise we are usually interested in actually finding one. If an ε-solution suffices, we can simply proceed with the algorithm until such a solution is obtained. An alternative approach is to use the rounding procedure described in the next section, which yields an exact solution.

5.4 Rounding to an exact solution

Knowing the optimal partition, we aim at finding a strictly complementary solution of (SP). We first establish that the set B in the optimal partition cannot be empty. For if B were empty, then z = 0 would be a strictly complementary solution, which would imply that s(0) is positive. Since $s(0) = q$, and q has zero entries, this cannot happen. Therefore, the set B is not empty.

Knowing this, we describe a rounding procedure that can be applied to any z generated by the algorithm to yield a vector $\bar z$ such that $\bar z$ and its slack vector $\bar s = s(\bar z)$ are complementary (in the sense that $\bar z_N = \bar s_B = 0$). In general, however, these $\bar z$ and $\bar s$ are not necessarily nonnegative. But, as we will see, after sufficiently many additional iterations of the algorithm the separation between the small and the large variables is strong enough to yield a strictly complementary solution. All of this can be done in polynomial time.

In the sequel we refer to the variables $z_i$, $i \in B$, and $s_i$, $i \in N$, as the large variables, and to the remaining variables as the small variables. Recall that the large variables converge to positive values as µ approaches zero, and the small variables converge to zero.

Partitioning the matrix M and the vectors z and s according to the optimal partition, the relation $s = Mz + q$ can be rewritten as
$$\begin{pmatrix} s_B \\ s_N \end{pmatrix} = \begin{pmatrix} M_{BB} & M_{BN} \\ M_{NB} & M_{NN} \end{pmatrix}\begin{pmatrix} z_B \\ z_N \end{pmatrix} + \begin{pmatrix} q_B \\ q_N \end{pmatrix}. \qquad (31)$$
Now let $z^*$ be a strictly complementary solution. Then $q_B^Tz_B^* = 0$ and $z_B^* > 0$, which implies $q_B = 0$. Using this we derive from (31) that
$$s_B = M_{BB}z_B + M_{BN}z_N. \qquad (32)$$
Now let z be generated by the algorithm. We consider the system of equations in the unknown vector ξ given by
$$M_{BB}\,\xi = s_B - M_{BN}z_N. \qquad (33)$$
Note that $\xi = z_B$ is a solution of (33), but its entries are large variables. We can easily see that (33) has more solutions. This follows by using the strictly complementary solution $z^*$ once more: since $z_N^* = 0$ and $s_B(z^*) = 0$, it follows from (32) that $M_{BB}z_B^* = 0$. Since $z_B^* > 0$, we conclude that the matrix $M_{BB}$ must be singular, and hence (33) has multiple solutions.

Now let ξ be any solution of (33) and consider the vector $\bar z$ defined by
$$\bar z_B = z_B - \xi, \qquad \bar z_N = 0.$$
Define $\bar s = s(\bar z)$. Since $\bar z_N = 0$ and $q_B = 0$, we have
$$\bar s_B = M_{BB}\bar z_B = M_{BB}(z_B - \xi) = 0.$$
Therefore, $\bar z_N = \bar s_B = 0$, showing that the vectors $\bar z$ and $\bar s$ are complementary. It will be clear, however, that $\bar z$ and $\bar s$ are not necessarily nonnegative, let alone strictly complementary. This only holds if
$$\bar z_B = z_B - \xi > 0 \qquad (34)$$
and
$$\bar s_N = M_{NB}\bar z_B + M_{NN}\bar z_N + q_N = M_{NB}(z_B - \xi) + q_N = s_N - M_{NN}z_N - M_{NB}\xi > 0. \qquad (35)$$
Note that if we run the algorithm long enough, $z_B$ and $s_N$ converge to positive vectors, whereas $z_N$ and $s_B$ (and hence also ξ, if we take for instance a least-norm solution of (33)) converge to zero. This makes it plausible that if ε is taken small enough in the algorithm, then we obtain a solution ξ of (33) that satisfies (34) and (35), hence giving rise to a strictly complementary solution $\bar z$ of (SP).

Omitting further details, we conclude this section by stating the main complexity result: if we execute $7\sqrt n\,L$ iterations and apply the above rounding procedure to the point z then generated by the algorithm, we obtain a strictly complementary solution $\bar z$ of (SP). Here L denotes the binary input size of the problem (SP).
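A sketch of the rounding step, with the partition read off the final iterate as above. Choosing the minimum-norm least-squares solution of (33) is our own choice (the text only requires some solution of (33)), and the helper name is ours.

```python
import numpy as np

def round_to_exact(M, q, z):
    """Rounding procedure of Section 5.4: return zbar with zbar_N = s(zbar)_B = 0."""
    s = M @ z + q
    idx = np.arange(len(z))
    B = idx[z > s]                        # partition read off a sufficiently accurate iterate
    N = idx[z < s]
    # Solve M_BB xi = s_B - M_BN z_N; the matrix is singular, so take the
    # minimum-norm least-squares solution, a deliberately "small" solution of (33).
    rhs = s[B] - M[np.ix_(B, N)] @ z[N]
    xi, *_ = np.linalg.lstsq(M[np.ix_(B, B)], rhs, rcond=None)
    zbar = np.zeros_like(z)
    zbar[B] = z[B] - xi
    return zbar                           # strictly complementary once z is accurate enough
```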

6 Other interior-point methods

In the preceding sections we have shown that the LO-problem can be solved in polynomial time. If the problem is infeasible or unbounded, this is detected by the method; otherwise it generates an exact (strictly complementary) solution. The number of iterations required by the algorithm will in practice be about the same as predicted by the theory, which means that the method is much slower than the Simplex Method. There are several ways to make the method more efficient and competitive with the Simplex Method. We briefly discuss some of them; for a more extensive discussion we refer to [31] and to some relevant references given below. We restrict ourselves to strategies that may decrease the number of iterations. The computational effort per iteration, needed to compute the Newton step, is another issue that is critical for the performance of the algorithm, but we do not discuss it here.

6.1 Adaptive-update methods

The method we considered generates iterates in a narrow neighborhood of the central path, since after each Newton step we have $\delta(z, \mu) \le \tfrac{1}{2}$. In the theoretical analysis this property is maintained by taking small updates of the barrier parameter ($\theta = 1/(2\sqrt n)$). In practice, Newton's method behaves much better than predicted by Lemma 4.4. As a consequence, after the Newton step δ(z, µ) will usually be much smaller than $\tfrac{1}{2}$. This means that we can take larger values of θ without losing the above property. The adaptive-update method seeks to take θ as large as possible such that after each Newton step we still have $\delta(z, \mu) \le \tfrac{1}{2}$. This strategy accelerates the method significantly and does not deteriorate the theoretical iteration bound ([19, 26, 27]).

6.2 Large-update methods

A popular strategy is to use much larger values of θ, say θ = 0.5 or even larger. In this case, after the barrier update, δ(z, µ) will be much larger than 1, and Lemma 4.4 can no longer be used to measure the performance of the Newton process.

Even worse, the (full) Newton step may no longer be feasible. The remedy is to take a damped Newton step $z := z + \alpha\Delta z$, with damping factor α; this factor can be chosen such that the new iterate is feasible and at the same time the proximity decreases sufficiently to obtain a provably polynomial method. In practice this approach yields very efficient methods ([2, 14, 15, 16, 18, 22, 23, 32, 34]).

6.3 Predictor-corrector method

This is the most popular method. We describe a simple variant. It is based on a very greedy strategy that uses the Newton step targeting the zero vector: in the definition of the Newton step one takes µ = 0. The resulting direction is called the affine-scaling direction. To stay feasible, this step is damped again, but with a greedy damping factor. As a result, after such a step we may end up far from the central path. To restore the proximity, a so-called centering step is taken. This is a (damped) Newton step with $\mu = q^Tz/n$, where z is the current iterate. This method has not only turned out to be extremely efficient, but also has the nice property that asymptotically the objective value converges quadratically to zero ([24, 25, 29, 38]).

References

[1] E.D. Andersen and K.D. Andersen. The MOSEK interior-point optimizer for Linear Programming: an implementation of the homogeneous algorithm. In S. Zhang, H. Frenk, C. Roos, and T. Terlaky, editors, High Performance Optimization Techniques. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[2] K.M. Anstreicher and R.A. Bosch. Long steps in an $O(n^3L)$ algorithm for linear programming. Mathematical Programming, 54.
[3] A. Ben-Tal and A.S. Nemirovskii. Lectures on Modern Convex Optimization: Analysis, Algorithms, Engineering Applications. SIAM Publications, SIAM, Philadelphia, USA.
[4] J.F. Bonnans and F.A. Potra. On the convergence of the iteration sequence of infeasible path following algorithms for linear complementarity problems. Mathematics of Operations Research, 22(2).
[5] S.E. Boyd and L. Vandenberghe. Semidefinite programming. SIAM Review, 38(1):49-96.
[6] G.B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey.
[7] G.B. Dantzig. Linear programming. In J.K. Lenstra, A.H.G. Rinnooy Kan, and A. Schrijver, editors, History of Mathematical Programming: A Collection of Personal Reminiscences. CWI, North Holland, The Netherlands.
[8] I.I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademii Nauk SSSR, 174. Translated in: Soviet Mathematics Doklady, 8.
[9] I.I. Dikin. On the convergence of an iterative process. Upravlyaemye Sistemi, 12:54-60. (In Russian.)
[10] I.I. Dikin. Letter to the editor. Mathematical Programming, 41.
[11] K.R. Frisch. La resolution des problemes de programme lineaire par la methode du potentiel logarithmique. Cahiers du Seminaire d'Econometrie, 4:7-20.
[12] R. Frisch. The logarithmic potential method for solving linear programming problems. Memorandum, University Institute of Economics, Oslo.
[13] A.J. Goldman and A.W. Tucker. Theory of linear programming. In H.W. Kuhn and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of Mathematical Studies, No. 38. Princeton University Press, Princeton, New Jersey.
[14] C.C. Gonzaga. Large step path-following methods for linear programming, Part I: Barrier function method. SIAM Journal on Optimization, 1.
[15] C.C. Gonzaga. Large step path-following methods for linear programming, Part II: Potential reduction method. SIAM Journal on Optimization, 1.
[16] D. den Hertog, C. Roos, and J.-Ph. Vial. A complexity reduction for the long-step path-following algorithm for linear programming. SIAM Journal on Optimization, 2:71-87, 1992.

[17] P. Huard. Resolution of mathematical programming with nonlinear constraints by the method of centers. In J. Abadie, editor, Nonlinear Programming. North Holland, Amsterdam, The Netherlands.
[18] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Long-step primal-dual target-following algorithms for linear programming. Mathematical Methods of Operations Research, 44:11-30.
[19] F. Jarre and M.A. Saunders. An adaptive primal-dual method for linear programming. COAL Newsletter, 19:7-16, August.
[20] N.K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4.
[21] L.G. Khachiyan. A polynomial algorithm in linear programming. Doklady Akademiia Nauk SSSR, 244. Translated into English in Soviet Mathematics Doklady, 20.
[22] M. Kojima, Y. Kurita, and S. Mizuno. Large-step interior point algorithms for linear complementarity problems. SIAM Journal on Optimization, 3(2).
[23] M. Kojima, N. Megiddo, and S. Mizuno. Theoretical convergence of large-step primal-dual interior point algorithms for linear programming. Mathematical Programming, 59:1-22.
[24] I.J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra's predictor-corrector interior point method for linear programming. SIAM Journal on Optimization, 2.
[25] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4).
[26] S. Mizuno and M.J. Todd. An $O(n^3L)$ adaptive path following algorithm for a linear complementarity problem. Mathematical Programming, 52.
[27] S. Mizuno, M.J. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research, 18.
[28] Y.E. Nesterov and A.S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms. SIAM Publications, SIAM, Philadelphia, USA.
[29] F.A. Potra. A quadratically convergent predictor-corrector method for solving linear programs from infeasible starting points. Mathematical Programming, 67(3).
[30] C. Roos and T. Terlaky. Advances in linear optimization. In M. Dell'Amico, F. Maffioli, and S. Martello, editors, Annotated Bibliography in Combinatorial Optimization, chapter 7. John Wiley & Sons, New York, USA.
[31] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization: An Interior-Point Approach. John Wiley & Sons, Chichester, UK.
[32] C. Roos and J.-Ph. Vial. Long steps with the logarithmic penalty barrier function in linear programming. In J. Gabszewicz, J.F. Richard, and L. Wolsey, editors, Economic Decision Making: Games, Economics and Optimization, dedicated to J.H. Dreze. Elsevier Science Publisher B.V., Amsterdam, The Netherlands.
[33] N.Z. Shor. Quadratic optimization problems. Soviet Journal of Computer and System Sciences, 25:1-11.
[34] M.J. Todd and Y. Ye. A lower bound on the number of iterations of long-step and polynomial interior-point linear programming algorithms. Annals of Operations Research, 62.
[35] S.J. Wright. An infeasible-interior-point algorithm for linear complementarity problems. Mathematical Programming, 67(1):29-52.
[36] S.J. Wright. A path-following infeasible-interior-point algorithm for linear complementarity problems. Optimization Methods and Software, 2:79-106.
[37] S.J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia.
[38] Y. Zhang and D. Zhang. On polynomiality of the Mehrotra-type predictor-corrector interior point algorithms. Mathematical Programming, 68, 1995.


More information

The Trip Scheduling Problem

The Trip Scheduling Problem The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

Some stability results of parameter identification in a jump diffusion model

Some stability results of parameter identification in a jump diffusion model Some stability results of parameter identification in a jump diffusion model D. Düvelmeyer Technische Universität Chemnitz, Fakultät für Mathematik, 09107 Chemnitz, Germany Abstract In this paper we discuss

More information

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that

More information

Interior-Point Algorithms for Quadratic Programming

Interior-Point Algorithms for Quadratic Programming Interior-Point Algorithms for Quadratic Programming Thomas Reslow Krüth Kongens Lyngby 2008 IMM-M.Sc-2008-19 Technical University of Denmark Informatics and Mathematical Modelling Building 321, DK-2800

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Notes on Determinant

Notes on Determinant ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without

More information

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen (für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained

More information

Definition 11.1. Given a graph G on n vertices, we define the following quantities:

Definition 11.1. Given a graph G on n vertices, we define the following quantities: Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define

More information

Numerical methods for American options

Numerical methods for American options Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment

More information

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1. MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization 7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

More information

On Minimal Valid Inequalities for Mixed Integer Conic Programs

On Minimal Valid Inequalities for Mixed Integer Conic Programs On Minimal Valid Inequalities for Mixed Integer Conic Programs Fatma Kılınç Karzan June 27, 2013 Abstract We study mixed integer conic sets involving a general regular (closed, convex, full dimensional,

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

Linear Programming Notes V Problem Transformations

Linear Programming Notes V Problem Transformations Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material

More information

1 if 1 x 0 1 if 0 x 1

1 if 1 x 0 1 if 0 x 1 Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

More information

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.

Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom. Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.com This paper contains a collection of 31 theorems, lemmas,

More information

Separation Properties for Locally Convex Cones

Separation Properties for Locally Convex Cones Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam

More information

1 Portfolio mean and variance

1 Portfolio mean and variance Copyright c 2005 by Karl Sigman Portfolio mean and variance Here we study the performance of a one-period investment X 0 > 0 (dollars) shared among several different assets. Our criterion for measuring

More information

Completely Positive Cone and its Dual

Completely Positive Cone and its Dual On the Computational Complexity of Membership Problems for the Completely Positive Cone and its Dual Peter J.C. Dickinson Luuk Gijben July 3, 2012 Abstract Copositive programming has become a useful tool

More information

Solving polynomial least squares problems via semidefinite programming relaxations

Solving polynomial least squares problems via semidefinite programming relaxations Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

ALMOST COMMON PRIORS 1. INTRODUCTION

ALMOST COMMON PRIORS 1. INTRODUCTION ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type

More information

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued. Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.

More information

The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

More information

The Characteristic Polynomial

The Characteristic Polynomial Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

More information

Revised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m)

Revised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m) Chapter 23 Squares Modulo p Revised Version of Chapter 23 We learned long ago how to solve linear congruences ax c (mod m) (see Chapter 8). It s now time to take the plunge and move on to quadratic equations.

More information

A characterization of trace zero symmetric nonnegative 5x5 matrices

A characterization of trace zero symmetric nonnegative 5x5 matrices A characterization of trace zero symmetric nonnegative 5x5 matrices Oren Spector June 1, 009 Abstract The problem of determining necessary and sufficient conditions for a set of real numbers to be the

More information

Linear Programming in Matrix Form

Linear Programming in Matrix Form Linear Programming in Matrix Form Appendix B We first introduce matrix concepts in linear programming by developing a variation of the simplex method called the revised simplex method. This algorithm,

More information

Transportation Polytopes: a Twenty year Update

Transportation Polytopes: a Twenty year Update Transportation Polytopes: a Twenty year Update Jesús Antonio De Loera University of California, Davis Based on various papers joint with R. Hemmecke, E.Kim, F. Liu, U. Rothblum, F. Santos, S. Onn, R. Yoshida,

More information

State of Stress at Point

State of Stress at Point State of Stress at Point Einstein Notation The basic idea of Einstein notation is that a covector and a vector can form a scalar: This is typically written as an explicit sum: According to this convention,

More information

NP-Hardness Results Related to PPAD

NP-Hardness Results Related to PPAD NP-Hardness Results Related to PPAD Chuangyin Dang Dept. of Manufacturing Engineering & Engineering Management City University of Hong Kong Kowloon, Hong Kong SAR, China E-Mail: mecdang@cityu.edu.hk Yinyu

More information

Metric Spaces. Chapter 7. 7.1. Metrics

Metric Spaces. Chapter 7. 7.1. Metrics Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some

More information

No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results

More information

Stochastic Inventory Control

Stochastic Inventory Control Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

More information

An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines

An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines This is the Pre-Published Version. An improved on-line algorithm for scheduling on two unrestrictive parallel batch processing machines Q.Q. Nong, T.C.E. Cheng, C.T. Ng Department of Mathematics, Ocean

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

4.1 Learning algorithms for neural networks

4.1 Learning algorithms for neural networks 4 Perceptron Learning 4.1 Learning algorithms for neural networks In the two preceding chapters we discussed two closely related models, McCulloch Pitts units and perceptrons, but the question of how to

More information

Systems of Linear Equations

Systems of Linear Equations Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Section 4.4 Inner Product Spaces

Section 4.4 Inner Product Spaces Section 4.4 Inner Product Spaces In our discussion of vector spaces the specific nature of F as a field, other than the fact that it is a field, has played virtually no role. In this section we no longer

More information

We shall turn our attention to solving linear systems of equations. Ax = b

We shall turn our attention to solving linear systems of equations. Ax = b 59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Lecture 13 Linear quadratic Lyapunov theory

Lecture 13 Linear quadratic Lyapunov theory EE363 Winter 28-9 Lecture 13 Linear quadratic Lyapunov theory the Lyapunov equation Lyapunov stability conditions the Lyapunov operator and integral evaluating quadratic integrals analysis of ARE discrete-time

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method 578 CHAPTER 1 NUMERICAL METHODS 1. ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS As a numerical technique, Gaussian elimination is rather unusual because it is direct. That is, a solution is obtained after

More information

How To Prove The Dirichlet Unit Theorem

How To Prove The Dirichlet Unit Theorem Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

THE SCHEDULING OF MAINTENANCE SERVICE

THE SCHEDULING OF MAINTENANCE SERVICE THE SCHEDULING OF MAINTENANCE SERVICE Shoshana Anily Celia A. Glass Refael Hassin Abstract We study a discrete problem of scheduling activities of several types under the constraint that at most a single

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

An example of a computable

An example of a computable An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept

More information

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}. Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian

More information

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication)

Schooling, Political Participation, and the Economy. (Online Supplementary Appendix: Not for Publication) Schooling, Political Participation, and the Economy Online Supplementary Appendix: Not for Publication) Filipe R. Campante Davin Chor July 200 Abstract In this online appendix, we present the proofs for

More information

Integrating Benders decomposition within Constraint Programming

Integrating Benders decomposition within Constraint Programming Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP

More information

LECTURE 15: AMERICAN OPTIONS

LECTURE 15: AMERICAN OPTIONS LECTURE 15: AMERICAN OPTIONS 1. Introduction All of the options that we have considered thus far have been of the European variety: exercise is permitted only at the termination of the contract. These

More information

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm. Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

More information

Several Views of Support Vector Machines

Several Views of Support Vector Machines Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

More information

Fuzzy Differential Systems and the New Concept of Stability

Fuzzy Differential Systems and the New Concept of Stability Nonlinear Dynamics and Systems Theory, 1(2) (2001) 111 119 Fuzzy Differential Systems and the New Concept of Stability V. Lakshmikantham 1 and S. Leela 2 1 Department of Mathematical Sciences, Florida

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

A simple criterion on degree sequences of graphs

A simple criterion on degree sequences of graphs Discrete Applied Mathematics 156 (2008) 3513 3517 Contents lists available at ScienceDirect Discrete Applied Mathematics journal homepage: www.elsevier.com/locate/dam Note A simple criterion on degree

More information

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.

An Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt. An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

Ideal Class Group and Units

Ideal Class Group and Units Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals

More information