CHAPTER 9

Integer Programming

An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values:

(9.1)
$$\max\; c^Tx \quad\text{s.t.}\quad Ax \le b \ \text{ and }\ x \text{ integral.}$$

Integrality restrictions occur in many situations. For example, the products in a linear production model (cf. p. 81) might be indivisible goods that can only be produced in integer multiples of one unit. Many problems in operations research and combinatorial optimization can be formulated as ILPs. As integer programming is NP-hard (see Section 8.3), every NP-problem can in principle be formulated as an ILP. In fact, such problems usually admit many different ILP formulations. Finding a particularly suited one is often a decisive step towards the solution of a problem.

9.1. Formulating an Integer Program

In this section we present a number of (typical) examples of problems with their corresponding ILP formulations.

Graph Coloring. Let us start with the combinatorial problem of coloring the nodes of a graph $G = (V, E)$ so that no two adjacent nodes receive the same color and as few colors as possible are used (cf. Section 8.1). This problem occurs in many applications. For example, the nodes may represent jobs that can each be executed in one unit of time. An edge joining two nodes may indicate that the corresponding jobs cannot be executed in parallel (perhaps because they use common resources). In this interpretation, the graph $G$ is the conflict graph of the given set of jobs. The minimum number of colors needed to color its nodes equals the number of time units necessary to execute all jobs.

Formulating the node coloring problem as an ILP, we assume $V = \{1, \dots, n\}$ and that we have $n$ colors at our disposal. We introduce binary variables $y_k$, $k = 1, \dots, n$, to indicate whether color $k$ is used ($y_k = 1$) or not ($y_k = 0$). Furthermore, we introduce variables $x_{ik}$ to indicate whether node $i$ receives color $k$.
The resulting ILP is

(9.2)
$$\begin{array}{rlll}
\min & \sum_{k=1}^{n} y_k &&\\
\text{s.t.} & (1)\quad \sum_{k=1}^{n} x_{ik} = 1, & i = 1,\dots,n\\
& (2)\quad x_{ik} - y_k \le 0, & i, k = 1,\dots,n\\
& (3)\quad x_{ik} + x_{jk} \le 1, & \{i,j\} \in E,\ k = 1,\dots,n\\
& (4)\quad 0 \le x_{ik},\, y_k \le 1 &\\
& (5)\quad x_{ik},\, y_k \in \mathbb{Z}.
\end{array}$$

The constraints (4) and (5) ensure that the $x_{ik}$ and $y_k$ are binary variables. The constraints (1)–(3) guarantee (in this order) that each node is colored, node $i$ receives color $k$ only if color $k$ is used at all, and any two adjacent nodes have different colors.

EX. 9.1. Show: If the integrality constraint (5) is removed, the resulting linear program has optimum value equal to 1.
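Formulations like (9.2) translate directly into a modeling language. The following sketch builds (9.2) with the PuLP library (an assumption on tooling; any MILP modeler with binary variables works the same way):

```python
# A sketch of formulation (9.2); assumes the PuLP MILP modeler is installed.
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

def coloring_ilp(n, edges):
    """Build the node coloring ILP (9.2) for a graph on nodes 1..n."""
    prob = LpProblem("node_coloring", LpMinimize)
    y = {k: LpVariable(f"y_{k}", cat=LpBinary) for k in range(1, n + 1)}
    x = {(i, k): LpVariable(f"x_{i}_{k}", cat=LpBinary)
         for i in range(1, n + 1) for k in range(1, n + 1)}
    prob += lpSum(y.values())                                 # objective
    for i in range(1, n + 1):                                 # (1)
        prob += lpSum(x[i, k] for k in range(1, n + 1)) == 1
        for k in range(1, n + 1):                             # (2)
            prob += x[i, k] - y[k] <= 0
    for i, j in edges:                                        # (3)
        for k in range(1, n + 1):
            prob += x[i, k] + x[j, k] <= 1
    return prob

# e.g. coloring_ilp(3, [(1, 2), (2, 3)]).solve() finds that 2 colors suffice
```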
The Traveling Salesman Problem (TSP). This is one of the best-known combinatorial optimization problems: There are $n$ towns and a salesman, located in town 1, who is to visit each of the other $n-1$ towns exactly once and then return home. The tour (traveling salesman tour) has to be chosen so that the total distance traveled is minimized.

To model this problem, consider the so-called complete graph $K_n$, i.e., the graph $K_n = (V, E)$ with $n = |V|$ pairwise adjacent nodes. With respect to a given cost (distance) function $c : E \to \mathbb{R}$ we then seek to find a Hamilton circuit $C \subseteq E$, i.e., a circuit including every node, of minimal cost.

An ILP formulation can be obtained as follows. We introduce binary variables $x_{ik}$ ($i, k = 1, \dots, n$) to indicate whether node $i$ is the $k$-th node visited. In addition, we introduce variables $y_e$ ($e \in E$) to record whether edge $e$ is traversed:

(9.3)
$$\begin{array}{rll}
\min & \sum_{e \in E} c_e y_e &\\
\text{s.t.} & x_{11} = 1 &\\
& \sum_{k=1}^{n} x_{ik} = 1, & i = 1,\dots,n\\
& \sum_{i=1}^{n} x_{ik} = 1, & k = 1,\dots,n\\
& \sum_{e \in E} y_e = n &\\
& x_{i,k-1} + x_{jk} - y_e \le 1, & e = \{i,j\},\ k \ge 2\\
& x_{in} + x_{11} - y_e \le 1, & e = \{i,1\}\\
& 0 \le x_{ik},\, y_e \le 1 &\\
& x_{ik},\, y_e \in \mathbb{Z}.
\end{array}$$

EX. 9.2. Show that each feasible solution of (9.3) corresponds to a Hamilton circuit and conversely.

In computational practice, other TSP formulations have proved more efficient. To derive an alternative formulation, consider first the following simple program with edge variables $y_e$, $e \in E$:

(9.4)
$$\begin{array}{rll}
\min & c^T y &\\
\text{s.t.} & y(\delta(i)) = 2, & i = 1,\dots,n\\
& 0 \le y \le 1,\ \ y \text{ integral.}
\end{array}$$

(Recall our shorthand notation $y(\delta(i)) = \sum_{e \ni i} y_e$ for the sum of all $y$-values on edges incident with node $i$.)

ILP (9.4) does not describe our problem correctly: We still must rule out solutions corresponding to disjoint circuits that cover all nodes. We achieve this by adding more inequalities, so-called subtour elimination constraints. To simplify the notation, we write for $y \in \mathbb{R}^E$ and two disjoint subsets $S, T \subseteq V$

$$y(S : T) = \sum_{e = \{i,j\},\, i \in S,\, j \in T} y_e.$$

The subtour elimination constraints $y(S : \bar{S}) \ge 2$ make sure that there will be at least two edges in the solution that lead from a proper nonempty subset $S \subset V$ to its complement $\bar{S} = V \setminus S$. So the corresponding tour is connected. A correct ILP formulation is thus given by

(9.5)
$$\begin{array}{rll}
\min & c^T y &\\
\text{s.t.} & y(\delta(i)) = 2, & i = 1,\dots,n\\
& y(S : \bar{S}) \ge 2, & \emptyset \ne S \subset V\\
& 0 \le y \le 1,\ \ y \text{ integral.}
\end{array}$$

Note the contrast to our first formulation (9.3): ILP (9.5) has exponentially many constraints, one for each proper subset $S \subset V$. If $n = 30$, there are more than $2^{30}$ constraints. Yet, the way to solve (9.5) in practice is to add even more constraints! This approach of adding so-called cutting planes is presented in Sections 9.2 and 9.3 below.

REMARK. The mere fact that (9.5) has exponentially many constraints does not prevent us from solving it (without the integrality constraints) efficiently (cf. Section 10.6.2).

Maximum Clique. This is another well-studied combinatorial problem, which we will use as a case study for integer programming techniques later. Consider again the complete graph $K_n = (V, E)$ on $n$ nodes. This time, there are weights $c \in \mathbb{R}^V$ and $d \in \mathbb{R}^E$ given on both the vertices and the edges. We look for a set $C \subseteq V$ that maximizes the total weight of vertices and induced edges:

(9.6)
$$\max_{C \subseteq V}\ c(C) + d(E(C)).$$

As $K_n = (V, E)$ is the complete graph, each $C \subseteq V$ is a clique (set of pairwise adjacent nodes). Therefore, we call (9.6) the maximum weighted clique problem.
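The need for the subtour elimination constraints is easy to check computationally: an integral solution of (9.4) is a union of disjoint circuits, and a connectivity test reveals whether it is a single Hamilton circuit. A small sketch (plain Python; names are ours):

```python
def violated_subtour(n, tour_edges):
    """Given the edges {i,j} of an integral solution of (9.4) (every node
    has degree 2), return a set S violating y(S : S-bar) >= 2, or None if
    the edge set is a single Hamilton circuit."""
    adj = {i: [] for i in range(1, n + 1)}
    for i, j in tour_edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {1}, [1]              # trace the circuit through node 1
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return None if len(seen) == n else seen   # S = seen gives a violated cut

# two disjoint triangles on 6 nodes: S = {1, 2, 3} is returned
print(violated_subtour(6, [(1,2), (2,3), (1,3), (4,5), (5,6), (4,6)]))
```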
EX. 9.3. Given a graph $G = (V, E')$ with $E' \subseteq E$, choose $c \equiv 1$ and

$$d_e = \begin{cases} 0 & e \in E'\\ -n & \text{otherwise.} \end{cases}$$

Show: With these parameters (for $K_n = (V, E)$), (9.6) reduces to the problem of finding a clique $C$ in $G$ of maximum cardinality.

Problem (9.6) admits a rather straightforward ILP formulation:

(9.7)
$$\begin{array}{rll}
\max & c^T x + d^T y &\\
\text{s.t.} & y_e - x_i \le 0, & e \in E,\ i \in e\\
& x_i + x_j - y_e \le 1, & e = \{i,j\} \in E\\
& 0 \le x, y \le 1,\ \ x, y \text{ integer.}
\end{array}$$

A vector $(x, y)$ with all components $x_i, y_e \in \{0,1\}$ that satisfies the constraints of (9.7) is the so-called (vertex-edge) incidence vector of the clique

$$C = \{i \in V \mid x_i = 1\}.$$

In other words, $x \in \mathbb{R}^V$ is the incidence vector of $C$ and $y \in \mathbb{R}^E$ is the incidence vector of $E(C)$.

REMARK. The reader may have noticed that all ILPs we have formulated so far are binary programs, i.e., the variables are restricted to take values in $\{0, 1\}$ only. This is not by pure accident. The majority of integer optimization problems can be cast in this setting. But of course, there are also others (e.g., the integer linear production model mentioned in the introduction to this chapter).
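As a small consistency check (a sketch in plain Python), one can verify that the binary solutions of (9.7) are exactly the clique incidence vectors:

```python
from itertools import combinations

def incidence_vector(C, n):
    """Vertex-edge incidence vector (x, y) of the clique C in K_n."""
    x = {i: int(i in C) for i in range(1, n + 1)}
    y = {(i, j): x[i] * x[j]
         for i, j in combinations(range(1, n + 1), 2)}
    return x, y

def feasible_97(x, y):
    """Do binary vectors (x, y) satisfy the constraints of (9.7)?"""
    return all(y[i, j] <= x[i] and y[i, j] <= x[j]      # y_e <= x_i, x_j
               and x[i] + x[j] - y[i, j] <= 1           # x_i + x_j - y_e <= 1
               for i, j in y)

n = 4  # every subset C of {1,...,n} yields a feasible incidence vector
assert all(feasible_97(*incidence_vector(set(C), n))
           for r in range(n + 1)
           for C in combinations(range(1, n + 1), r))
```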
9.2. CUTTING PLANES I 185 PROPOSITION 9.1. Let P R n be a rational polyhedron. Then also P I is a rational polyhedron. In case P I, its recession cone equals that of P. Proof. The claim is trivial if P is bounded (as P then contains only finitely many integer points and the result follows by virtue of the discussion in Section 3.6). By the Weyl-Minkowski Theorem 3.2, a rational polyhedron generally decomposes into P = conv V + cone W with finite sets of rational vectors V Q n and W Q n. By scaling, if necessary, we may assume that W Z n. Denote by V and W the matrices whose columns are the vectors in V and W respectively. Thus each x P can be written as x = Vλ + Wµ where λ µ 0 and 1 T λ = 1 Let µ be the integral part of µ 0 (obtained by rounding down each component O i 0 to the next integer O i ). Splitting µ into its integral part µ and its non-integral part µ = µ µ yields with µ 0 integral and x P, where x = Vλ + Wµ + W µ = x + W µ P = {Vλ + Wµ λ 0 1 T λ = 1 0 µ 1} Because W Z n, x is integral if and only if x is integral. Hence P Z n = P Z n + {Wz z 0 integral} Taking convex hulls on both sides, we find (cf. Ex. 9.5) P I = conv P Z n + cone W Since P is bounded, P Z n is finite. So the claim follows as before. EX. 9.5. Show: conv V + W = conv V + conv W for all V W R n. We next want to derive a system of inequalities describing P I. There is no loss of generality when we assume P to be described by a system Ax b with A and b integral. The idea now is to derive new inequalities that are valid for P I (but not necessarily for P) and to add these to the system Ax b. Such inequalities are called cutting planes as they cut off parts of P that are guaranteed to contain no integral points. Consider an inequality c T x V that is valid for P. If c Z n but V Z, then each integral x P Z n obviously satisfies the stronger inequality c T x V. Recall from Corollary 2.6 that valid inequalities for P can be derived from the system Ax b by taking nonnegative linear combinations. We therefore consider inequalities of the form (9.11) y T A x y T b y 0
If $y^T A \in \mathbb{Z}^n$, then every $x \in P \cap \mathbb{Z}^n$ (and hence every $x \in P_I$) satisfies

(9.12)
$$(y^T A)\, x \le \lfloor y^T b \rfloor.$$

We say that (9.12) arises from (9.11) by rounding (if $y^T A \in \mathbb{Z}^n$). In particular, we regain the original inequalities $Ax \le b$ by taking as $y$ all unit vectors. We conclude

$$P_I \subseteq P' = \{x \in \mathbb{R}^n \mid (y^T A)x \le \lfloor y^T b \rfloor \text{ for all } y \ge 0 \text{ with } y^T A \in \mathbb{Z}^n\} \subseteq P.$$

Searching for inequalities of type (9.12) with $y^T A \in \mathbb{Z}^n$, we may restrict ourselves to $0 \le y \le 1$. Indeed, each $y \ge 0$ splits into its integral part $z = \lfloor y \rfloor \ge 0$ and non-integral part $y' = y - z$. The inequality (9.12) is then implied by the two inequalities

(9.13)
$$(z^T A)\, x \le z^T b \in \mathbb{Z} \qquad\text{and}\qquad (y'^T A)\, x \le \lfloor y'^T b \rfloor.$$

(Recall that we assume $A$ and $b$ to be integral.) The first inequality in (9.13) is implied by $Ax \le b$. To describe $P'$, it thus suffices to augment the system $Ax \le b$ by all inequalities of the type (9.12) with $0 \le y \le 1$, which describes

(9.14)
$$P' = \{x \in \mathbb{R}^n \mid (y^T A)x \le \lfloor y^T b \rfloor,\ 0 \le y \le 1,\ y^T A \in \mathbb{Z}^n\}$$

by a finite number of inequalities (see Ex. 9.6) and thus exhibits $P'$ as a polyhedron.

EX. 9.6. Show: There are only finitely many vectors $y^T A \in \mathbb{Z}^n$ with $0 \le y \le 1$.

EX. 9.7. Show: $P \subseteq Q$ implies $P' \subseteq Q'$. (In particular, $P'$ depends only on $P$ and not on the particular system $Ax \le b$ describing $P$.)
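Numerically, a rounding cut (9.12) is produced as follows (a sketch assuming numpy; $A$ and $b$ integral):

```python
import numpy as np

def gomory_chvatal_cut(A, b, y):
    """Return (c, delta) with c^T x <= delta valid for P_I, derived from
    y >= 0 with y^T A integral by rounding as in (9.12)."""
    c = y @ A
    assert np.all(y >= 0) and np.allclose(c, np.round(c))
    return np.round(c).astype(int), int(np.floor(y @ b))

# P = {x >= 0 : 2x_1 + 2x_2 <= 3}: with y = (1/2, 0, 0) we obtain the cut
# x_1 + x_2 <= 1, which cuts off the fractional vertex (3/2, 0).
A = np.array([[2, 2], [-1, 0], [0, -1]])
b = np.array([3, 0, 0])
print(gomory_chvatal_cut(A, b, np.array([0.5, 0.0, 0.0])))
```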
Iterating the above construction, we obtain the so-called Gomory sequence

(9.15)
$$P \supseteq P' \supseteq P'' \supseteq \cdots \supseteq P^{(k)} \supseteq \cdots \supseteq P_I.$$

Remarkably (cf. Gomory [34], and also Chvátal [9]), Gomory sequences are always finite:

THEOREM 9.1. The Gomory sequence is finite in the sense that $P^{(t)} = P_I$ holds for some $t \in \mathbb{N}$.

Before giving the proof, let us examine in geometric terms what it means to pass from $P$ to $P'$. Consider an inequality $(y^T A)x \le \lfloor y^T b \rfloor$ with $y \ge 0$ and $y^T A \in \mathbb{Z}^n$. Assume that the components of $y^T A$ have greatest common divisor $d = 1$ (otherwise replace $y$ by $d^{-1}y$). Then the equation

$$(y^T A)\, x = \lfloor y^T b \rfloor$$

admits an integral solution $\bar{x} \in \mathbb{Z}^n$ (cf. Ex. 9.8). Hence passing from $P$ to $P'$ amounts to shifting all supporting hyperplanes $H$ of $P$ towards $P_I$ until they touch $\mathbb{Z}^n$ in some point $\bar{x}$ (not necessarily in $P_I$).

FIGURE 9.1. Moving a cutting plane towards $P_I$.

EX. 9.8. Show: An equation $c^T x = \gamma$ with $c \in \mathbb{Z}^n$, $\gamma \in \mathbb{Z}$ admits an integer solution if and only if the greatest common divisor of the components of $c$ divides $\gamma$. (Hint: Section 2.3.)

The crucial step in proving Theorem 9.1 is the observation that the Gomory sequence (9.15) induces Gomory sequences on all faces of $P$ simultaneously. More precisely, assume $F \subseteq P$ is a proper face. From Section 3.6, we know that $F = P \cap H$ holds for some rational hyperplane

$$H = \{x \in \mathbb{R}^n \mid (y^T A)x = y^T b\}$$

with $y \in \mathbb{Q}^m_+$ (and hence $y^T A \in \mathbb{Q}^n$ and $y^T b \in \mathbb{Q}$).

LEMMA 9.1. $F = P \cap H$ implies $F' = P' \cap H$.

Proof. From Ex. 9.7 we conclude $F' \subseteq P'$. Since, furthermore, $F' \subseteq F \subseteq H$ holds, we conclude $F' \subseteq P' \cap H$. To prove the converse inclusion, note that $F$ is the solution set of

$$Ax \le b, \qquad (y^T A)x = y^T b.$$

Scaling $y$ if necessary, we may assume that $y^T A$ and $y^T b$ are integral. By definition, $F'$ is described by the inequalities

(9.16)
$$(w^T A + \lambda\, y^T A)\, x \le \lfloor w^T b + \lambda\, y^T b \rfloor$$

with $w \ge 0$, $\lambda \in \mathbb{R}$ (not sign-restricted) and $w^T A + \lambda\, y^T A \in \mathbb{Z}^n$. We show that each inequality (9.16) is also valid for $P' \cap H$ (and hence $P' \cap H \subseteq F'$).

If $\lambda < 0$, observe that for $x \in H$ (and hence for $x \in P' \cap H$) the inequality (9.16) remains unchanged if we increase $\lambda$ by an integer $k \in \mathbb{N}$:
since $x$ satisfies $y^T Ax = y^T b \in \mathbb{Z}$, both the left and right hand side will increase by $k\, y^T b$ if $\lambda$ is increased to $\lambda + k$. Hence we can assume $\lambda \ge 0$ without loss of generality. If $\lambda \ge 0$, however, (9.16) is easily recognized as an inequality of type (9.12). (Take $\bar{y} = w + \lambda y \ge 0$.) So the inequality is valid for $P'$ and hence for $P' \cap H$. □

We are now prepared for the

Proof of Theorem 9.1. In case $P = \{x \in \mathbb{R}^n \mid Ax = b\}$ is an affine subspace, the claim follows from Corollary 2.2 (cf. Ex. 9.9). In general, $P$ is presented in the form

(9.17)
$$Ax = b, \qquad A'x \le b'$$

with $n - d$ equalities $A_i x = b_i$ and $s \ge 0$ facet inducing (i.e., irredundant) inequalities $A'_j x \le b'_j$.

CASE 1: $P_I = \emptyset$. Let us argue by induction on $s \ge 0$. If $s = 0$, $P$ is an affine subspace and the claim is true. If $s \ge 1$, we remove the last inequality $A'_s x \le b'_s$ in (9.17) and let $Q \subseteq \mathbb{R}^n$ be the corresponding polyhedron. By induction, we then have $Q^{(t)} = Q_I$ for some $t \in \mathbb{N}$. Now $P_I = \emptyset$ implies

$$Q_I \cap \{x \in \mathbb{R}^n \mid A'_s x \le b'_s\} = \emptyset.$$

Since $P^{(t)} \subseteq Q^{(t)}$ and (trivially) $P^{(t)} \subseteq \{x \in \mathbb{R}^n \mid A'_s x \le b'_s\}$, we conclude that $P^{(t)} = \emptyset$ holds, too.

CASE 2: $P_I \ne \emptyset$. We proceed now by induction on $\dim P$. If $\dim P = 0$, $P = \{p\}$ is an affine subspace and the claim is true. In general, since $P_I$ is a polyhedron, we can represent it as

$$Ax = b, \qquad Cx \le d$$

with $C$ and $d$ integral. We show that each inequality $c^T x \le \delta$ of the system $Cx \le d$ will eventually become valid for some $P^{(t)}$, $t \in \mathbb{N}$ (which establishes the claim immediately). So fix an inequality $c^T x \le \delta$. Since $P$ and $P_I$ (and hence all $P^{(t)}$) have identical recession cones by Proposition 9.1, the values

$$\gamma_t = \max\{c^T x \mid x \in P^{(t)}\}$$

are finite for each $t \in \mathbb{N}$. The sequence $(\gamma_t)$ is decreasing. Indeed, from the definition of the Gomory sequence we conclude that $\gamma_{t+1} \le \lfloor \gamma_t \rfloor$. Hence the sequence $(\lfloor \gamma_t \rfloor)$ reaches its limit $\gamma := \lim_t \lfloor \gamma_t \rfloor$ in finitely many steps. If $\gamma \le \delta$, there is nothing left to prove. Suppose therefore $\delta < \gamma = \gamma_t$ (note that $\gamma_t = \lfloor \gamma_t \rfloor = \gamma$ once the sequence has stabilized) and consider the face

$$F := \{x \in P^{(t)} \mid c^T x = \gamma\}.$$
Then $F_I$ must be empty since every $x \in F_I \subseteq P_I$ satisfies $c^T x \le \delta < \gamma$. If $c^T \in \operatorname{row} A$, then $c^T x$ would be constant on $P \supseteq P^{(t)} \supseteq P_I$, so $\delta < \gamma$ is impossible. Hence $c^T \notin \operatorname{row} A$, i.e., $\dim F < \dim P$. By induction, we conclude from Lemma 9.1

$$F^{(k)} = P^{(t+k)} \cap \{x \in \mathbb{R}^n \mid c^T x = \gamma\} = \emptyset$$

for some finite $k$. Hence $\gamma_{t+k} < \gamma$, a contradiction. □

EX. 9.9. Assume $P = \{x \in \mathbb{R}^n \mid Ax = b\}$. Show that either $P = P_I$ or $P' = P_I = \emptyset$. (Hint: Corollary 2.2 and Proposition 9.1.)

EX. 9.10 (Matching Polytopes). Let $G = (V, E)$ be a graph with an even number of nodes. A perfect matching in $G$ is a set of pairwise disjoint edges covering all nodes. Perfect matchings in $G$ are in one-to-one correspondence with integral (and hence binary) vectors $x \in \mathbb{R}^E$ satisfying the constraints

(1) $x(\delta(i)) = 1$, $i \in V$;
(2) $0 \le x \le 1$.

Let $P \subseteq \mathbb{R}^E$ be the polytope described by these constraints. The associated polytope $P_I$ is called the matching polytope of $G$. Thus $P_I$ is the convex hull of (incidence vectors of) perfect matchings in $G$. (For example, if $G$ consists of two disjoint triangles, we have $\mathbb{R}^E \cong \mathbb{R}^6$, $P = \{(\tfrac12, \dots, \tfrac12)\}$ and $P_I = \emptyset$.)

To construct the Gomory polytope $P'$, consider some $S \subseteq V$. When we add the constraints (1) for $i \in S$, every edge $e = \{i,j\}$ with $i, j \in S$ occurs twice. So the resulting equation is

(1') $x(\delta(S)) + 2x(E(S)) = |S|$.

(Recall that $E(S) \subseteq E$ is the set of edges induced by $S$; $\delta(S)$ is the set of edges joining $S$ and $V \setminus S$.) On the other hand, (2) implies

(2') $x(\delta(S)) \ge 0$.

From (1') and (2') we conclude that $x(E(S)) \le \tfrac12 |S|$ is valid for $P$. Hence, for $S \subseteq V$,

(3) $x(E(S)) \le \lfloor \tfrac12 |S| \rfloor$

is valid for $P'$. It can be shown (cf. [12]) that the inequalities (1)–(3) describe $P_I$. So $P' = P_I$ and the Gomory sequence has length 1.

Gomory's Cutting Plane Method. Theorem 9.1 tells us that at least in principle integer programs can be solved by repeated application of linear programming. Conceptually, Gomory's method works as follows. Start with the integer linear program

(9.18)
$$\max\; c^T x \quad\text{s.t.}\quad Ax \le b,\ x \text{ integral}$$

and solve its LP-relaxation, which is obtained by dropping the integrality constraint:

(9.19)
$$\max\; c^T x \quad\text{s.t.}\quad Ax \le b.$$

So $c^T x$ is maximized over $P = \{x \in \mathbb{R}^n \mid Ax \le b\}$. If the optimal solution is integral, the problem is solved. Otherwise, determine $P'$ and maximize $c^T x$ over $P'$, etc.
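Schematically, the method is the following loop (a sketch; it assumes scipy for the LP solves and a problem-specific separation routine find_cut, which is where all the real work hides):

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane_method(c, A, b, find_cut, rounds=1000):
    """max c^T x over Ax <= b plus successively added cutting planes.
    find_cut(x) returns (a, beta) with a^T x > beta valid for P_I, or None."""
    for _ in range(rounds):
        res = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None), method="highs")
        x = res.x
        if np.allclose(x, np.round(x)):          # integral optimum: done
            return np.round(x)
        cut = find_cut(x)                        # the separation problem
        if cut is None:
            return x                             # stuck: fractional optimum
        a, beta = cut
        A = np.vstack([A, a])                    # add the cutting plane
        b = np.append(b, beta)
```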
Unfortunately, this approach is hopelessly inefficient. In practice, if the optimum $x^*$ of (9.19) is non-integral, one tries to find cutting planes (i.e., valid inequalities for $P_I$ that cut off a part of $P$ containing $x^*$) right away in order to add these to the system $Ax \le b$, and then solves the new system, etc. This procedure is generally known as the cutting plane method for integer linear programs.

Of particular interest in this context are cutting planes that are best possible in the sense that they cut as much as possible off $P$. Ideally, one would like to add inequalities that define facets of $P_I$. Numerous classes of such facet defining cutting planes for various types of problems have been published in the literature. In Section 9.3, we discuss some techniques for deriving such cutting planes.

9.3. Cutting Planes II

The cutting plane method has been successfully applied to many types of problems. The most extensively studied problem in this context is the traveling salesman problem (see, e.g., [12] for a detailed exposition). Here, we will take the max clique problem from Section 9.1 as our guiding example, trying to indicate some general techniques for deriving cutting planes. Moreover, we take the opportunity to explain how even more general (seemingly nonlinear) integer programs can be formulated as ILPs.

The following unconstrained quadratic boolean (i.e., binary) problem was studied in Padberg [64] with respect to a symmetric matrix $Q = (q_{ij}) \in \mathbb{R}^{n \times n}$:

(9.20)
$$\max\ \sum_{i,j=1}^{n} q_{ij}\, x_i x_j, \qquad x_i \in \{0,1\}.$$

As $x_i x_i = x_i$ holds for a binary variable $x_i$, the essential nonlinear terms in the objective function are the terms $q_{ij} x_i x_j$, $i \ne j$. These may be linearized with the help of new variables $y_{ij} = x_i x_j$. Since $x_i x_j = x_j x_i$, it suffices to introduce just $n(n-1)/2$ new variables $y_e$, one for each edge $e = \{i,j\} \in E$ in the complete graph $K_n = (V, E)$ with $V = \{1, \dots, n\}$. The salient point is the fact that the nonlinear equation $y_e = x_i x_j$ is equivalent with the three linear inequalities

$$y_e \le x_i, \qquad y_e \le x_j \qquad\text{and}\qquad x_i + x_j - y_e \le 1$$

if $x_i$, $x_j$ and $y_e$ are binary variables.
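The equivalence is a four-line truth-table check:

```python
# for binary x_i, x_j the three inequalities force y_e = x_i * x_j:
for xi in (0, 1):
    for xj in (0, 1):
        feasible = [y for y in (0, 1)
                    if y <= xi and y <= xj and xi + xj - y <= 1]
        assert feasible == [xi * xj]
```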
With $c_i = q_{ii}$ and $d_e = q_{ij} + q_{ji}$ for $e = \{i,j\} \in E$, problem (9.20) can thus be written as an integer linear program:

(9.21)
$$\begin{array}{rll}
\max & \sum_{i=1}^{n} c_i x_i + \sum_{e \in E} d_e y_e &\\
\text{s.t.} & y_e - x_i \le 0, & e \in E,\ i \in e\\
& x_i + x_j - y_e \le 1, & e = \{i,j\} \in E\\
& 0 \le x_i, y_e \le 1,\ \ x_i, y_e \text{ integer.}
\end{array}$$

Note that (9.21) is precisely our ILP formulation (9.7) of the weighted max clique problem. Let $P \subseteq \mathbb{R}^{V \cup E}$ be the polytope defined by the inequality constraints of (9.21). As we have seen in Section 9.1, $P_I$ is then the convex hull of the (vertex-edge) incidence vectors $(x, y) \in \mathbb{R}^{V \cup E}$ of cliques (subsets) $C \subseteq V$.

The polytope $P \subseteq \mathbb{R}^{V \cup E}$ is easily seen to have full dimension $n + \binom{n}{2}$ (because, e.g., $x = (\tfrac12, \dots, \tfrac12)$ and $y = (\tfrac13, \dots, \tfrac13)$ yields an interior point $(x, y)$ of $P$). Even $P_I$ is full-dimensional (see Ex. 9.11).

EX. 9.11. Show: $\mathbb{R}^{V \cup E}$ is the affine hull of the incidence vectors of the cliques of sizes 0, 1 and 2.

What cutting planes can we construct for $P_I$? By inspection, we find that for any three vertices $i, j, k \in V$ and corresponding edges $e, f, g \in E$, the following triangle inequality

(9.22)
$$x_i + x_j + x_k - y_e - y_f - y_g \le 1$$

holds for any clique incidence vector $(x, y) \in \mathbb{R}^{V \cup E}$.

EX. 9.12. Show: (9.22) can also be derived from the inequalities describing $P$ by rounding.

This idea can be generalized. To this end, we extend our general shorthand notation and write for $(x, y) \in \mathbb{R}^{V \cup E}$ and $S \subseteq V$:

$$x(S) = \sum_{i \in S} x_i \qquad\text{and}\qquad y(S) = \sum_{e \in E(S)} y_e.$$

For example, (9.22) now simply becomes: $x(S) - y(S) \le 1$ for $|S| = 3$. For every $S \subseteq V$ and integer $\alpha \in \mathbb{N}$, consider the following clique inequality

(9.23)
$$\alpha\, x(S) - y(S) \le \frac{\alpha(\alpha+1)}{2}.$$

PROPOSITION 9.2. Each clique inequality is valid for $P_I$.
Proof. Let $(x, y) \in \mathbb{R}^{V \cup E}$ be the incidence vector of some clique $C \subseteq V$. We must show that $(x, y)$ satisfies (9.23) for each $S \subseteq V$ and $\alpha \in \mathbb{N}$. Let $s = |S \cap C|$. Then $x(S) = s$ and $y(S) = s(s-1)/2$. Hence

$$\frac{\alpha(\alpha+1)}{2} - \alpha\, x(S) + y(S) = \frac{\alpha(\alpha+1) - 2\alpha s + s(s-1)}{2} = \frac{(\alpha-s)(\alpha-s+1)}{2},$$

which is nonnegative since both $\alpha$ and $s$ are integers. □

A further class of inequalities can be derived similarly. For any two disjoint subsets $S, T \subseteq V$, the associated cut inequality is

(9.24)
$$x(S) + y(S) + y(T) - y(S : T) \ge 0.$$

(Recall from Section 9.1 that $y(S : T)$ denotes the sum of all $y$-values on edges joining $S$ and $T$.)

PROPOSITION 9.3. Each cut inequality is valid for $P_I$.

Proof. Assume that $(x, y) \in \mathbb{R}^{V \cup E}$ is the clique incidence vector of $C \subseteq V$. With $s = |C \cap S|$ and $t = |C \cap T|$, we then find

$$x(S) + y(S) + y(T) - y(S : T) = s + \frac{s(s-1)}{2} + \frac{t(t-1)}{2} - st = \frac{(s-t)(s-t+1)}{2} \ge 0. \qquad\Box$$

Multiplying a valid inequality with a variable $x_i \ge 0$, we obtain a new (nonlinear!) inequality. We can linearize it by introducing new variables as explained at the beginning of this section. Alternatively, we may simply use linear (lower or upper) bounds for the nonlinear terms, thus weakening the resulting inequality. For example, multiplying a clique inequality (9.23) by $2x_i$, $i \in S$, yields

$$2\alpha \sum_{j \in S} x_i x_j - 2 x_i\, y(S) \le \alpha(\alpha+1)\, x_i.$$

Because of $x_i\, y(S) \le y(S)$, $x_i^2 = x_i$ and $x_i x_j = y_e$ for $e = \{i,j\} \in E$, the following so-called $i$-clique inequality

(9.25)
$$2\alpha\, y(i : S \setminus \{i\}) - 2y(S) - \alpha(\alpha-1)\, x_i \le 0$$

must be valid for $P_I$. (This may also be verified directly.)
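The three validity proofs boil down to integer identities in $s = |S \cap C|$, $t = |T \cap C|$ and $\alpha$; an exhaustive check over small values (a sanity test, not a proof):

```python
for s in range(30):                  # s = |S ∩ C|
    for a in range(1, 30):           # a = alpha
        # clique inequality (9.23): a*x(S) - y(S) <= a(a+1)/2
        assert a * s - s * (s - 1) // 2 <= a * (a + 1) // 2
        # i-clique inequality (9.25) with i in C: y(i : S\{i}) = s - 1
        if s >= 1:
            assert 2 * a * (s - 1) - s * (s - 1) - a * (a - 1) <= 0
    for t in range(30):              # cut inequality (9.24)
        assert s + s * (s - 1) // 2 + t * (t - 1) // 2 - s * t >= 0
```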
9.4. BRANCH AND BOUND 193 i.e., H is a hyperplane in R V E. This is not too hard to do. (In the special case S = V and % = 1, it follows readily from Ex. 9.11). The cutting plane method suffers from a difficulty we have not mentioned so far. Suppose we try to solve an integer linear program, starting with its LP-relaxation and repeatedly adding cutting planes. In each step, we then face the problem of finding a suitable cutting plane that cuts off the current non-integral optimum. This problem is generally difficult. E.g., for the max clique problem one can show that it is N P-hard to check whether a given x y R V E satisfies all clique inequalities and, if not, find a violated one to cut off x y. Moreover, one usually has only a limited number of different classes (types) of cutting planes to work with. In the max clique problem, for example, we could end up with a solution x y that satisfies all clique, i-clique and cut inequalities and yet is non-integral. The original system and these three classes of cutting planes namely describe P I by no means completely. The situation in practice, however, is often not so bad. Quite efficient heuristics can be designed that frequently succeed to find cutting planes of a special type. Macambira and de Souza [57], for example, solve max clique instances of up to 50 nodes with the above clique and cut inequalities and some more sophisticated generalizations thereof. Furthermore, even when a given problem is not solved completely by cutting planes, the computation was not futile: Typically, the (non-integral) optimum obtained after having added hundreds of cutting planes provides a rather tight estimate of the true integer optimum. Such estimates are extremely valuable in a branch and bound method for solving ILPs as discussed in Section 9.4 below. For example, the combination of cutting planes and a branch and bound procedure has solved instances of the TSP with several thousand nodes to optimality (cf. [12]). 9.4. Branch and Bound Any linear maximization program (ILP) with binary variables x 1 x n can in principle be solved by complete enumeration: Check all 2 n possible solutions for feasibility and compare their objective values. To do this in a systematic fashion, one constructs an associated tree of subproblems as follows. Fixing, say the first variable x 1, to either x 1 = 0 or x 1 = 1, we generate two subproblems ILP x 1 = 0 and ILP x 1 = 1. These two subproblems are said to be obtained from (ILP) by branching on x 1. Clearly, an optimal solution of (ILP) can be inferred by solving the two subproblems. Repeating the above branching step, we can build a binary tree whose nodes correspond to subproblems obtained by fixing some variables to be 0 or 1. (The term binary refers here to the fact that each node in the tree has exactly two lower neighbors.) The resulting tree may look as indicated in Figure 9.2 below.
FIGURE 9.2. A branch and bound tree with root (ILP), children ILP($x_1 = 0$) and ILP($x_1 = 1$), and the subproblems ILP($x_1 = 0, x_3 = 0$) and ILP($x_1 = 0, x_3 = 1$) obtained by branching on $x_3$.

Having constructed the complete tree, we could solve (ILP) bottom up and inspect the $2^n$ leaves of the tree, which correspond to trivial (all variables fixed) problems. In contrast to this solution by complete enumeration, branch and bound aims at building only a small part of the tree, leaving most of the lower part unexplored. This approach is suggested by the following two obvious facts:

• If we can solve a particular subproblem, say ILP($x_1 = 0, x_3 = 1$), directly (e.g., by cutting planes), there is no need to inspect the subproblems in the branch below ILP($x_1 = 0, x_3 = 1$) in the tree.

• If we obtain an upper bound $U(x_1 = 0, x_3 = 1)$ for the subproblem ILP($x_1 = 0, x_3 = 1$) that is less than the objective value of some known feasible solution of the original (ILP), then ILP($x_1 = 0, x_3 = 1$) offers no optimal solution.

Only if neither of these circumstances occurs do we have to explore the subtree rooted at ILP($x_1 = 0, x_3 = 1$) for possible optimal solutions. We do this by branching at ILP($x_1 = 0, x_3 = 1$) and creating two new subproblems in the search tree. An efficient branch and bound procedure tries to avoid such branching steps as much as possible. To this end, one needs efficient algorithms that produce

(1) good feasible solutions of the original (ILP);
(2) tight upper bounds for the subproblems.

There is a trade-off between the quality of the feasible solutions and upper bounds on the one hand and the size of the search tree we have to build on the other. As a rule of thumb, good solutions should be almost optimal and bounds should differ from the true optimum by less than 10%.

Algorithms for computing good feasible solutions usually depend very much on the particular problem at hand. So there is little to say in general. Quite often, however, simple and fast heuristic procedures for almost optimal solutions can be found. Such algorithms, also called heuristics for short, are known for many problem types. They have no guarantee of success, but work well in practice.

REMARK [LOCAL SEARCH]. In the max clique problem, the following simple local search often yields surprisingly good solutions: We start with some $C \subseteq V$ and check whether the removal of some node $i \in C$ or the addition of some node $j \notin C$ yields an improvement. If so, we add (delete) the corresponding node and continue this way until no such improvement is possible (in which case we stop with the current local optimum $C \subseteq V$). This procedure may be repeated with different initial solutions $C \subseteq V$.
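In code (plain Python; weight is the objective $C \mapsto c(C) + d(E(C))$ and is assumed given):

```python
import random

def local_search_clique(n, weight, restarts=25):
    """Repeated local search for (9.6): flip single nodes until no flip helps."""
    best, best_val = set(), weight(set())
    for _ in range(restarts):
        C = {i for i in range(1, n + 1) if random.random() < 0.5}
        improved = True
        while improved:
            improved = False
            for i in range(1, n + 1):
                D = C ^ {i}              # add i if i not in C, else remove it
                if weight(D) > weight(C):
                    C, improved = D, True
        if weight(C) > best_val:         # keep the best local optimum found
            best, best_val = set(C), weight(C)
    return best, best_val
```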
Computing good upper bounds is usually more difficult. Often, one just solves the corresponding LP-relaxations. If these are too weak, one can try to improve them by adding cutting planes as outlined in Section 9.3. An alternative is to obtain upper bounds from Lagrangian relaxation (see Section 9.5 below).

Search and Branching Strategies. For the practical execution of a branch and bound algorithm, one needs to specify how one should proceed. Suppose, for example, that we are in a situation as indicated in Figure 9.2, i.e., that we have branched from (ILP) on variable $x_1$ and from ILP($x_1 = 0$) on variable $x_3$. We then face the question which subproblem to consider next, either ILP($x_1 = 1$) or one of the subproblems of ILP($x_1 = 0$). There are two possible (extremal) strategies: We either always go to one of the lowest (most restricted) subproblems or to one of the highest (least restricted) subproblems. The latter strategy, choosing ILP($x_1 = 1$) in our case, is called breadth first search, while the former strategy is referred to as depth first search, as it moves down the search tree as fast as possible.

A second question concerns the way of branching itself. If LP-relaxation or cutting planes are used for computing upper bounds, we obtain a fractional optimum $x^*$ each time we try to solve a subproblem. A commonly used branching rule then branches on the most fractional $x^*_i$. In the case of $(0,1)$-variables, this rule branches on the variable $x_i$ for which $x^*_i$ is closest to $\tfrac12$. In concrete applications, we have perhaps an idea about the relevance of the variables. We may then alternatively decide to branch on the most relevant variable $x_i$. Advanced software packages for integer programming allow the user to specify the branching process and support various upper bounding techniques.

REMARK. The branch and bound approach can easily be extended to general integer problems. Instead of fixing a variable $x_i$ to either 0 or 1, we may restrict it to $x_i \le \alpha_i$ or $x_i \ge \alpha_i + 1$ for suitable $\alpha_i \in \mathbb{Z}$. Indeed, the general idea is to partition a given subproblem into a number of (possibly more than just two) subproblems of similar type.
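A compact depth first variant with branching on the most fractional variable might look as follows (a sketch assuming scipy; real solvers add heuristics, cutting planes and smarter node selection):

```python
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b):
    """max c^T x s.t. Ax <= b, x in {0,1}^n."""
    n, best_val, best_x = len(c), -np.inf, None
    stack = [{}]                                    # partial fixings i -> 0/1
    while stack:
        fixed = stack.pop()                         # depth first search
        bounds = [(fixed.get(i, 0), fixed.get(i, 1)) for i in range(n)]
        res = linprog(-c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
        if not res.success:
            continue                                # subproblem infeasible
        x, upper = res.x, -res.fun
        if upper <= best_val:
            continue                                # bound: prune this subtree
        frac = np.abs(x - np.round(x))              # distance to integrality
        if frac.max() < 1e-9:
            best_val, best_x = upper, np.round(x)   # new incumbent
            continue
        i = int(np.argmax(frac))                    # most fractional variable
        stack += [{**fixed, i: 0}, {**fixed, i: 1}] # branch on x_i
    return best_val, best_x
```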
9.5. Lagrangian Relaxation

In Section 5.1, Lagrangian relaxation was introduced as a means for calculating upper bounds for optimization problems. Thereby, one relaxes (dualizes) some (in)equality constraints by adding them to the objective function using Lagrangian multipliers $y \ge 0$ (in case of inequality constraints) to obtain an upper bound $L(y)$. The crucial question is which constraints to dualize. The more constraints are dualized, the weaker the bound becomes. On the other hand, dualizing more constraints facilitates the computation of $L(y)$. There is a trade-off between the quality of the bounds we obtain and the effort necessary for their computation. Generally, one would dualize only the difficult constraints, i.e., those that are difficult to deal with directly (see Section 9.5.2 for an example).

Held and Karp [39] were the first to apply the idea of Lagrangian relaxation to integer linear programs. Assume that we are given an integer program as

(9.26)
$$\max\ \{c^T x \mid Ax \le b,\ Bx \le d,\ x \in \mathbb{Z}^n\}$$

for given integral matrices $A, B$ and vectors $b, c, d$, and let $z_{IP}$ be the optimum value of (9.26). Dualizing the constraints $Ax - b \le 0$ with multipliers $u \ge 0$ yields the upper bound

(9.27)
$$L(u) = \max\ \{c^T x - u^T(Ax - b) \mid Bx \le d,\ x \in \mathbb{Z}^n\} = u^T b + \max\ \{(c^T - u^T A)x \mid Bx \le d,\ x \in \mathbb{Z}^n\}$$

and thus the Lagrangian dual problem

(9.28)
$$z_D = \min_{u \ge 0} L(u).$$

EX. 9.13. Show that $L(u)$ is an upper bound on $z_{IP}$ for every $u \ge 0$.

It is instructive to compare (9.28) with the linear programming relaxation

(9.29)
$$z_{LP} = \max\ \{c^T x \mid Ax \le b,\ Bx \le d\},$$

which we obtain by dropping the integrality constraint $x \in \mathbb{Z}^n$. We find that Lagrangian relaxation approximates the true optimum $z_{IP}$ at least as well:

THEOREM 9.2. $z_{IP} \le z_D \le z_{LP}$.

Proof. The first inequality is clear (cf. Ex. 9.13). The second one follows from the fact that the Lagrangian dual of a linear program equals the linear programming dual. Formally, we may derive the second inequality by applying linear programming duality twice:

$$\begin{aligned}
z_D = \min_{u \ge 0} L(u) &= \min_{u \ge 0}\Big[u^T b + \max_x\{(c^T - u^T A)x \mid Bx \le d,\ x \in \mathbb{Z}^n\}\Big]\\
&\le \min_{u \ge 0}\Big[u^T b + \max_x\{(c^T - u^T A)x \mid Bx \le d\}\Big]\\
&= \min_{u \ge 0}\Big[u^T b + \min_v\{d^T v \mid v^T B = c^T - u^T A,\ v \ge 0\}\Big]\\
&= \min_{u, v}\{u^T b + v^T d \mid u^T A + v^T B = c^T,\ u \ge 0,\ v \ge 0\}\\
&= \max_x\{c^T x \mid Ax \le b,\ Bx \le d\} = z_{LP}. \qquad\Box
\end{aligned}$$
REMARK. As the proof of Theorem 9.2 shows, $z_D = z_{LP}$ holds if and only if the integrality constraint $x \in \mathbb{Z}^n$ is redundant in the Lagrangian dual problem defining $z_D$. In this case, the Lagrangian dual is said to have the integrality property (cf. Geoffrion [29]).

It turns out that solving the Lagrangian dual problem amounts to minimizing a piecewise linear function of a certain type. We say that a function $f : \mathbb{R}^n \to \mathbb{R}$ is piecewise linear convex if $f$ is obtained as the maximum of a finite number of affine functions $f_i : \mathbb{R}^n \to \mathbb{R}$ (cf. Figure 9.3 below). (General convex functions are discussed in Chapter 10.)

FIGURE 9.3. A piecewise linear convex function $f(u) = \max\{f_i(u) \mid 1 \le i \le k\}$.

PROPOSITION 9.4. Let $U$ be the set of vectors $u \ge 0$ such that

(9.30)
$$L(u) = u^T b + \max\ \{(c^T - u^T A)x \mid Bx \le d,\ x \in \mathbb{Z}^n\} < \infty.$$

Then $L$ is a piecewise linear convex function on $U$.

Proof. For fixed $u \ge 0$, the maximum in (9.30) is obtained by maximizing a linear function $f(x) = (c^T - u^T A)x$ over

$$P_I = \operatorname{conv}\{x \mid Bx \le d,\ x \in \mathbb{Z}^n\} = \operatorname{conv} V + \operatorname{cone} E,$$

say, with finite sets $V \subseteq \mathbb{Z}^n$ and $E \subseteq \mathbb{Z}^n$ (cf. Proposition 9.1). If $L(u) < \infty$, the maximum in (9.30) is attained at some $v \in V$ (Why?). Hence

$$L(u) = u^T b + \max\ \{(c^T - u^T A)v \mid v \in V\},$$

exhibiting the restriction of $L$ to $U$ as the maximum of the finitely many affine functions

$$\ell_i(u) = u^T(b - Av_i) + c^T v_i, \qquad v_i \in V. \qquad\Box$$
9.5.1. Solving the Lagrangian Dual. After these structural investigations, let us address the problem of computing (at least approximately) the best possible upper bound $L(u)$ and solving the Lagrangian dual

$$z_D = \min_{u \ge 0} L(u).$$

To this end, we assume that we can evaluate (i.e., efficiently solve) for any given $u \ge 0$:

(9.31)
$$L(u) = \max\ \{c^T x - u^T(Ax - b) \mid Bx \le d,\ x \in \mathbb{Z}^n\}.$$

REMARK. In practice this means that the constraints we dualize ($Ax \le b$) have to be chosen appropriately so that the resulting $L(u)$ is easy to evaluate (otherwise we obviously cannot expect to solve the problem $\min L(u)$).

Suppose $x^* \in \mathbb{Z}^n$ is an optimal solution of (9.31). We then seek some $u' \ge 0$ such that $L(u') \le L(u)$. Since $x^*$ is a feasible solution of the maximization problem in (9.31), $L(u') \le L(u)$ implies

(9.32)
$$c^T x^* - u'^T(Ax^* - b) \le L(u') \le L(u) = c^T x^* - u^T(Ax^* - b)$$

and hence

$$(u' - u)^T(Ax^* - b) \ge 0.$$

The Subgradient Method. The preceding argument suggests to try a vector $u' = u + \Delta u$ with

$$\Delta u = \varepsilon\,(Ax^* - b)$$

for some small step size $\varepsilon > 0$. Of course, we also want to have $u' = u + \Delta u \ge 0$. So we simply replace any negative component by 0, i.e., we project the resulting vector $u'$ onto the set $\mathbb{R}^m_+$ of feasible multipliers and obtain

(9.33)
$$u' = \max\{0,\ u + \varepsilon(Ax^* - b)\} \qquad \text{(componentwise)}.$$

REMARK. This procedure appears intuitively reasonable: As our step size $\varepsilon$ is small, a negative component can only occur if $u_i \approx 0$ and $A_i x^* < b_i$. This means that we do not need to enforce the constraint $A_i x \le b_i$ by assigning a large penalty (Lagrangian multiplier) to it. Consequently, we try $u'_i = 0$.

The above procedure is the subgradient method (cf. also Section 5.2.3) for solving the Lagrangian dual: We start with some $u^0 \ge 0$ and compute a sequence $u^1, u^2, \dots$ by iterating the above step with step sizes $\varepsilon_1, \varepsilon_2, \dots$. The appropriate choice of the step size $\varepsilon_i$ is a delicate problem both in theory and in practice. A basic result states that convergence takes place (in the sense of Theorem 11.6) provided

$$\lim_{i \to \infty} \varepsilon_i = 0 \qquad\text{and}\qquad \sum_{i=0}^{\infty} \varepsilon_i = \infty.$$
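A sketch of the resulting iteration (9.33); the routine evaluate_L(u) is assumed to return $L(u)$ together with an optimal $x^*$ of (9.31):

```python
import numpy as np

def subgradient_method(evaluate_L, A, b, u0, steps=200):
    """Approximate z_D = min_{u >= 0} L(u) by projected subgradient steps."""
    u, best = np.asarray(u0, dtype=float), np.inf
    for t in range(steps):
        value, x_star = evaluate_L(u)
        best = min(best, value)                 # best upper bound so far
        eps = 1.0 / (t + 1)                     # eps_t -> 0, sum eps_t = inf
        u = np.maximum(0.0, u + eps * (A @ x_star - b))   # step (9.33)
    return best, u
```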
9.5.2. Max Clique Revisited. How could Lagrangian relaxation be applied to the max clique problem? The first (and most crucial) step is to establish an appropriate ILP formulation of the max clique problem. This formulation should be such that dualizing a suitable subset of constraints yields upper bounds that are reasonably tight and efficiently computable. A bit of experimenting reveals our original formulation (9.7) resp. (9.21) to be inadequate. Below, we shall derive an alternative formulation that turns out to work better.

We start by passing from the underlying complete graph $K_n = (V, E)$ to the complete directed graph $D_n = (V, A)$, replacing each edge $e = \{i,j\} \in E$ by two oppositely directed arcs $(i,j) \in A$ and $(j,i) \in A$. To avoid confusion with the notation, we will always indicate whether a pair $i, j$ is considered as an ordered or unordered pair and write $(i,j) \in A$ or $\{i,j\} \in E$, resp. With each arc $(i,j) \in A$, we associate a binary variable $y_{ij}$. The original edge weights $d_e$, $e \in E$, are equally replaced by arc weights $q_{ij} = q_{ji} = d_e/2$ ($e = \{i,j\} \in E$). The original ILP formulation (9.7) can now be equivalently replaced by

(9.34)
$$\begin{array}{rll}
\max & c^T x + q^T y &\\
\text{s.t.} & (1)\quad x_i + x_j - \tfrac12(y_{ij} + y_{ji}) \le 1, & \{i,j\} \in E\\
& (2)\quad y_{ij} - y_{ji} = 0, & \{i,j\} \in E\\
& (3)\quad y_{ij} - x_i \le 0, & (i,j) \in A\\
& (4)\quad x \in \{0,1\}^V,\ y \in \{0,1\}^A.
\end{array}$$

REMARK. (9.34) is a directed version of (9.7). The cliques (subsets) $C \subseteq V$ are now in one-to-one correspondence with the feasible solutions of (9.34), namely the vertex-arc incidence vectors $(x, y) \in \{0,1\}^{V \cup A}$, defined by $x_i = 1$ if $i \in C$ and $y_{ij} = 1$ if $i, j \in C$.

The directed version (9.34) offers the following advantage over the formulation (9.7): After dualizing constraints (1) and (2) in (9.34), the remaining constraints (3) and (4) imply no dependence between different nodes $i$ and $j$ (i.e., $y_{ij} = 1$ implies $x_i = 1$ but not $x_j = 1$). The resulting Lagrangian relaxation can therefore be solved quite easily (cf. Ex. 9.14).

EX. 9.14. Using Lagrangian multipliers $u \in \mathbb{R}^E_+$ for dualizing constraints (1) and unrestricted multipliers $v \in \mathbb{R}^E$ for dualizing the equality constraints (2) in (9.34), one obtains

$$L(u, v) = \max\ c^T x + q^T y + \sum_{\{i,j\} \in E} u_{ij}\big(1 - x_i - x_j + \tfrac12(y_{ij} + y_{ji})\big) + \sum_{\{i,j\} \in E} v_{ij}(y_{ij} - y_{ji})$$

subject to (3), (4) from (9.34). So for given $u \in \mathbb{R}^E_+$ and $v \in \mathbb{R}^E$, computing $L(u, v)$ amounts to solving a problem of the following type (with suitable $\bar{c} \in \mathbb{R}^V$ and $\bar{q} \in \mathbb{R}^A$):

$$\max\ \bar{c}^T x + \bar{q}^T y \quad\text{subject to (3), (4) from (9.34).}$$

Show: A problem of the latter type is easy to solve because the constraints (3), (4) imply no dependence between different nodes $i$ and $j$. (Hint: For $i \in V$, let $P_i = \{j \in V \mid \bar{q}_{ij} > 0\}$. Set $x_i = 1$ if $\bar{c}_i + \sum_{j \in P_i} \bar{q}_{ij} > 0$.)
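The hint of Ex. 9.14 translates into a few lines: the relaxed problem decomposes into one trivial subproblem per node (a sketch; $\bar{c}, \bar{q}$ are the modified weights, written cbar, qbar):

```python
def evaluate_decomposed(n, cbar, qbar):
    """max cbar^T x + qbar^T y s.t. y_ij <= x_i, x and y binary
    (constraints (3), (4) of (9.34)); nodes 1..n, qbar[(i, j)] per arc."""
    value, x, y = 0.0, {}, {}
    for i in range(1, n + 1):
        P_i = [j for j in range(1, n + 1) if j != i and qbar[i, j] > 0]
        gain = cbar[i] + sum(qbar[i, j] for j in P_i)
        x[i] = 1 if gain > 0 else 0          # open node i iff profitable
        for j in range(1, n + 1):
            if j != i:                       # take only profitable arcs
                y[i, j] = x[i] if qbar[i, j] > 0 else 0
        value += max(gain, 0.0)
    return value, x, y
```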
Unfortunately, the Lagrangian bounds we obtain from the dualization of the constraints (1) and (2) in (9.34) are too weak to be useful in practice. To derive tighter bounds, we want to add more constraints to (9.34) while keeping the enlarged system still efficiently solvable after dualizing constraints (1) and (2). It turns out that one can add directed versions (cf. below) of the clique inequalities (9.23) and the $i$-clique inequalities (9.25) for $S = V$ without complicating things too much. The resulting formulation of the max clique problem is

(9.35)
$$\begin{array}{rll}
\max & c^T x + q^T y &\\
\text{s.t.} & (1)\quad x_i + x_j - \tfrac12(y_{ij} + y_{ji}) \le 1, & \{i,j\} \in E\\
& (2)\quad y_{ij} - y_{ji} = 0, & \{i,j\} \in E\\
& (3)\quad y_{ij} - x_i \le 0, & (i,j) \in A\\
& (4)\quad 2\alpha\, x(V) - y(A) \le \alpha(\alpha+1), & \alpha = 1,\dots,n\\
& (5)\quad 2\alpha\, y(\delta^+(i)) - y(A) - \alpha(\alpha-1)\, x_i \le 0, & \alpha = 1,\dots,n,\ i \in V\\
& (6)\quad x \in \{0,1\}^V,\ y \in \{0,1\}^A,
\end{array}$$

where, in constraints (4) and (5), we used the straightforward extension of our general shorthand notation:

$$y(A) = \sum_{(i,j) \in A} y_{ij} \qquad\text{and}\qquad y(\delta^+(i)) = \sum_{j \ne i} y_{ij}.$$

Constraints (4) and (5) are directed versions of the original clique and $i$-clique inequalities (9.23) and (9.25).

EX. 9.15. Show that every incidence vector $(x, y) \in \mathbb{R}^{V \cup A}$ of a set (clique) $C \subseteq V$ satisfies the constraints in (9.35). (Hint: Section 9.3.)

To dualize constraints (1) and (2) in (9.35), we introduce Lagrangian multipliers $u \in \mathbb{R}^E_+$ for the inequality constraints (1) and unrestricted multipliers $v \in \mathbb{R}^E$ for the equality constraints (2). So we obtain for $L(u, v)$ the expression

$$\max\ c^T x + q^T y + \sum_{\{i,j\} \in E} u_{ij}\big(1 - x_i - x_j + \tfrac12(y_{ij} + y_{ji})\big) + \sum_{\{i,j\} \in E} v_{ij}(y_{ij} - y_{ji})$$

subject to (3)–(6) from (9.35). Given $u \in \mathbb{R}^E_+$ and $v \in \mathbb{R}^E$, the computation of $L(u, v)$ amounts to solving a problem of the following type (for suitable $\bar{c} \in \mathbb{R}^V$ and $\bar{q} \in \mathbb{R}^A$):

(9.36)
$$\max\ \bar{c}^T x + \bar{q}^T y \quad\text{subject to (3)–(6) from (9.35).}$$

The integer linear program (9.36) appears to be more difficult, but can still be solved quickly. For $p = 0, \dots, n$, we determine the best solution satisfying $x(V) = p$ as follows: For $p = 0$, set $x = y = 0$. Given $p \ge 1$, we choose for each $i \in V$ the $p-1$ most profitable arcs in $\delta^+(i)$, i.e., those with the highest $\bar{q}$-values. Suppose their $\bar{q}$-values sum up to $\bar{q}_i$ for $i \in V$. We then let $x_i = 1$ for the $p$ largest values of $\bar{c}_i + \bar{q}_i$. If $x_i = 1$, we let $y_{ij} = 1$ for the $p-1$ most profitable arcs in $\delta^+(i)$.
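In code, the enumeration over $p$ might look as follows (a sketch in the notation of evaluate_decomposed above):

```python
def solve_936(n, cbar, qbar):
    """Solve (9.36) by enumerating p = x(V); see Lemma 9.2 below."""
    best = (0.0, {i: 0 for i in range(1, n + 1)}, {})      # p = 0: x = y = 0
    for p in range(1, n + 1):
        # the p - 1 most profitable outgoing arcs of each node ...
        out = {i: sorted(((qbar[i, j], j) for j in range(1, n + 1) if j != i),
                         reverse=True)[:p - 1]
               for i in range(1, n + 1)}
        q_i = {i: sum(w for w, _ in out[i]) for i in out}
        # ... and the p nodes with the largest value cbar_i + qbar_i
        chosen = sorted(range(1, n + 1),
                        key=lambda i: cbar[i] + q_i[i], reverse=True)[:p]
        value = sum(cbar[i] + q_i[i] for i in chosen)
        if value > best[0]:
            x = {i: int(i in chosen) for i in range(1, n + 1)}
            y = {(i, j): 1 for i in chosen for _, j in out[i]}
            best = (value, x, y)
    return best
```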
The optimal solution is then the best we found for $p = 0, \dots, n$. This follows from

LEMMA 9.2. Let $(x, y) \in \{0,1\}^{V \cup A}$. Then $(x, y)$ is a feasible solution of (9.36) if and only if there exists some $p \in \{0, \dots, n\}$ such that

(i) $x(V) = p$ and
(ii) $y(\delta^+(i)) = \begin{cases} p - 1 & \text{if } x_i = 1\\ 0 & \text{if } x_i = 0 \end{cases}$ $\quad (i \in V)$.

Proof. Assume first that $(x, y)$ satisfies (i) and (ii). Then $(x, y)$ satisfies the constraints (3) and (6) of (9.35). Constraint (4) reduces to

(4') $\quad 2\alpha p - p(p-1) \le \alpha(\alpha+1)$,

which holds for all $\alpha, p \in \mathbb{Z}$ since $(\alpha - p)^2 + (\alpha - p) \ge 0$. Constraint (5) is certainly satisfied if $x_i = 0$ (due to (ii)). For $x_i = 1$, constraint (5) becomes

$$2\alpha(p-1) - p(p-1) \le \alpha(\alpha-1),$$

which is (4') again.

Conversely, assume that $(x, y)$ is feasible for (9.36) and let $p = x(V) = \sum_{i \in V} x_i$. Consider the constraints (5) of (9.36) for those $i$ with $x_i = 1$. Adding the corresponding inequalities for any $\alpha$, we find

$$2\alpha\, y(A) - p\, y(A) - \alpha(\alpha-1)\, p \le 0.$$

Taking $\alpha = p$, we conclude $y(A) \le p(p-1)$. On the other hand, letting $\alpha = p$ in (4), we have $2p^2 - y(A) \le p(p+1)$ and hence $y(A) \ge p(p-1)$, which proves $y(A) = p(p-1)$. Substituting the latter equality into (5) (with $\alpha = p$) and dividing by $p$, we deduce for $i \in V$ with $x_i = 1$:

$$2\, y(\delta^+(i)) \le (p-1) + (p-1)\, x_i = 2(p-1).$$

In view of constraint (3) in (9.35), we thus have the inequalities

$$y(\delta^+(i)) \le \begin{cases} p - 1 & \text{if } x_i = 1\\ 0 & \text{if } x_i = 0. \end{cases}$$

Since $y(A) = p(p-1)$, actually equality must hold. □

EX. 9.16. The Lagrangian bounds $L(u, v)$ we obtain when solving (9.36) as explained above are generally better than the bound produced by the LP-relaxation of (9.36). Consider, for example, the complete directed graph $D_4 = (V, A)$ with $c = 0 \in \mathbb{R}^V$ and symmetric arc weights $q_{ij} = q_{ji}$ as indicated in Figure 9.4 below. An optimum integral solution of (9.36) can be obtained as follows: Choose any set $C \subseteq V$ with $|C| = 3$. Set $x_i = 1$ if $i \in C$. Furthermore, for each $i \in C$ choose two arcs in $\delta^+(i)$ with weight $q_{ij} = 1$. Set $y_{ij} = 1$ on these two arcs. This solution guarantees an objective function value $q^T y = 6$ (so the duality gap is zero).
In contrast, the LP-relaxation of (9.36) is solved by $x_1 = x_4 = 1$, $x_2 = x_3 = \tfrac23$, $y_{12} = y_{13} = y_{42} = y_{43} = 1$ and $y_{21} = y_{23} = y_{24} = y_{31} = y_{32} = y_{34} = \tfrac23$ with an objective value of 8. So Lagrangian relaxation (in this example) provides strictly better bounds than LP-relaxation. In other words, problem formulation (9.36) does not have the integrality property (cf. p. 197).

FIGURE 9.4. The complete directed graph $D_4$. All arcs have weight 1 except the two arcs $(1,4)$ and $(4,1)$ of weight $-100$.
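With the sketch solve_936 from above, the integral optimum of Ex. 9.16 is confirmed directly:

```python
n = 4
cbar = {i: 0.0 for i in range(1, n + 1)}
qbar = {(i, j): 1.0 for i in range(1, n + 1)
        for j in range(1, n + 1) if i != j}
qbar[1, 4] = qbar[4, 1] = -100.0        # the two heavy arcs of Figure 9.4
value, x, y = solve_936(n, cbar, qbar)
print(value)   # 6.0: the Lagrangian bound matches the integer optimum,
               # while the LP relaxation only gives the weaker bound 8
```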
Our Lagrangian relaxation of the max clique problem makes use of cutting planes by adding them to the constraints. This approach works well as long as we can deal with these additional constraints directly. If we wanted to add other cutting planes (say triangle inequalities), solving (9.36) with these additional constraints would become a lot more difficult. An alternative procedure would add such constraints and dualize them immediately. The resulting Lagrangian bound may then again be computed by solving a problem of type (9.36) (with a modified objective function). This approach has proved rather promising in practice (cf. [43]).

9.6. Dualizing the Binary Constraints

As we have seen, Lagrangian relaxation is a technique to get rid of difficult inequality or equality constraints by dualizing them. Can we do something similar with the binary constraints? The answer is yes, and the reason is simple: A binary constraint $x_i \in \{0,1\}$ can be equivalently written as an equality constraint $x_i^2 - x_i = 0$, which we dualize as usual. Note, however, that dualizing the quadratic equation $x_i^2 - x_i = 0$ necessarily results in a quadratic term in the Lagrangian function. We illustrate this approach in the case of the maximum clique problem or, equivalently, the unconstrained quadratic binary optimization problem from Section 9.3 (see Lemaréchal and Oustry [52] for other examples and more details of this technique in general).

Let $Q \in \mathbb{R}^{n \times n}$ be a symmetric matrix and reconsider the unconstrained quadratic boolean problem

(9.37)
$$\max\ \{x^T Qx \mid x \in \{0,1\}^n\}.$$

Dualizing the constraints $x_i^2 - x_i = 0$ with Lagrangian multipliers $u_i \in \mathbb{R}$, we obtain the Lagrangian bound

(9.38)
$$L(u) = \max_{x \in \mathbb{R}^n}\ x^T Qx + \sum_i u_i (x_i^2 - x_i).$$

Letting $U \in \mathbb{R}^{n \times n}$ denote the diagonal matrix with diagonal $u \in \mathbb{R}^n$, we can write

(9.39)
$$L(u) = \max_x\ x^T (Q + U)x - u^T x.$$

Evaluating $L(u)$ amounts to solving the unconstrained quadratic optimization problem (9.39). Ex. 9.17 shows how to accomplish this.

EX. 9.17. For fixed $u \in \mathbb{R}^n$, consider the function $f(x) = x^T(Q+U)x - u^T x$. Show: If $\bar{x}^T(Q+U)\bar{x} > 0$ holds for some $\bar{x} \in \mathbb{R}^n$, then $f$ has no finite maximum. Assume that $x^T(Q+U)x \le 0$ always holds (i.e., $Q+U$ is negative semidefinite). Show: $x$ is optimal for $f$ if and only if $\nabla f(x) = 2x^T(Q+U) - u^T = 0^T$. (Hint: Section 10.3.)

So $f$ has a finite maximum if and only if $Q+U$ is negative semidefinite and $\nabla f(x) = 0^T$ has a solution. The maximum is attained in each $x \in \mathbb{R}^n$ satisfying $2(Q+U)x = u$, which implies

$$L(u) = \max_x f(x) = \tfrac12 x^T u - u^T x = -\tfrac12 u^T x.$$

The Lagrangian dual $\min_u L(u)$ is called the semidefinite relaxation of the primal (9.37), as it can be reformulated as follows (with $u \in \mathbb{R}^n$, $r \in \mathbb{R}$):

$$\begin{aligned}
\min_u L(u) &= \min_{r,u}\ \{r \mid L(u) \le r\}\\
&= \min_{r,u}\ \{r \mid x^T(Q+U)x - u^T x \le r \ \text{ for all } x \in \mathbb{R}^n\}\\
&= \min_{r,u}\ \Big\{r \ \Big|\ (1, x^T)\, S(r,u) \binom{1}{x} \le 0 \ \text{ for all } x \in \mathbb{R}^n\Big\}\\
&= \min_{r,u}\ \{r \mid S(r,u) \text{ is negative semidefinite}\},
\end{aligned}$$

where

$$S(r,u) = \begin{bmatrix} -r & -\tfrac12 u^T\\ -\tfrac12 u & Q+U \end{bmatrix}.$$

Only the last step needs further explanation, which is given in Ex. 9.18 below.

EX. 9.18. Show for any symmetric $S \in \mathbb{R}^{(n+1) \times (n+1)}$:

$$(1, x^T)\, S \binom{1}{x} \le 0 \ \text{ for all } x \in \mathbb{R}^n \quad\Longleftrightarrow\quad z^T S z \le 0 \ \text{ for all } z \in \mathbb{R}^{n+1}.$$
Our reformulation of the Lagrangian dual via

(9.40)
$$\min_u L(u) = \min_{r,u}\ r \quad\text{s.t.}\quad S(r,u) = \begin{bmatrix} -r & -\tfrac12 u^T\\ -\tfrac12 u & Q+U \end{bmatrix} \ \text{negative semidefinite}$$

is a special case of a semidefinite program (optimizing a linear objective under linear and semidefinite constraints, see also Section 12.6).

REMARK. To understand how (and why) problem (9.40) can be solved at least approximately, consider the following cutting plane approach: We first replace the condition of semidefiniteness for $S = S(r,u)$ by a finite number of linear inequalities

(9.41)
$$a^T S a \le 0, \qquad a \in \mathcal{A},$$

for some finite set $\mathcal{A} \subseteq \mathbb{R}^{n+1}$. Note that, for each fixed $a \in \mathcal{A}$, the inequality $a^T S a \le 0$ is a linear inequality with variables $r$ and $u$. We then minimize $r$ subject to constraints (9.41). If the solution provides us with $r$ and $u$ such that $S(r,u)$ is negative semidefinite, we have found a solution. Otherwise, if $a^T S a > 0$ holds for some $a \in \mathbb{R}^{n+1}$, we add $a$ to $\mathcal{A}$ (i.e., we add a violated inequality) and solve the modified problem, etc. (Note that we can check whether $S = S(r,u)$ is negative semidefinite with the Diagonalization algorithm from Section 2.1. This also provides us with a suitable vector $a$ in case $S$ is not negative semidefinite.)

The theoretical aspects of this approach will be discussed in the context of the ellipsoid method in Section 10.6. In practice, analogues of the interior point method for linear programs (cf. Chapter 6) solve semidefinite programs more efficiently.

We want to emphasize that the approach of dualizing the binary constraints in a general integer program

$$\max\; c^T x \quad\text{s.t.}\quad Ax \le b,\ x \in \{0,1\}^n$$

is limited. If we dualize only the binary constraints $x_i^2 - x_i = 0$ using Lagrangian multipliers $u_i \in \mathbb{R}$, the Lagrangian function becomes

$$L(u) = \max\ x^T U x + (c - u)^T x \quad\text{s.t.}\quad Ax \le b.$$

In contrast to (9.38), this is a quadratic optimization problem with inequality constraints, which is in general difficult (NP-hard, cf. Section 8.3).
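Numerically, Ex. 9.17 gives a direct way to evaluate $L(u)$ in (9.39) (a sketch assuming numpy):

```python
import numpy as np

def evaluate_L(Q, u):
    """L(u) = max_x x^T (Q+U) x - u^T x, cf. (9.39) and Ex. 9.17."""
    M = Q + np.diag(u)
    if np.linalg.eigvalsh(M).max() > 1e-9:
        return np.inf                    # Q + U not negative semidefinite
    x = np.linalg.lstsq(2 * M, u, rcond=None)[0]
    if not np.allclose(2 * M @ x, u, atol=1e-7):
        return np.inf                    # stationarity equation unsolvable
    return -0.5 * u @ x                  # L(u) = -u^T x / 2
```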