Non-Lecture J: Linear Programming



The greatest flood has the soonest ebb; the sorest tempest the most sudden calm; the hottest love the coldest end; and from the deepest desire oftentimes ensues the deadliest hate.
Socrates

Th' extremes of glory and of shame,
Like east and west, become the same.
Samuel Butler, Hudibras Part II, Canto I (c. 1670)

Extremes meet, and there is no better example than the haughtiness of humility.
Ralph Waldo Emerson, "Greatness", in Letters and Social Aims (1876)

J Linear Programming

The maximum flow/minimum cut problem is a special case of a very general class of problems called linear programming. Many other optimization problems fall into this class, including minimum spanning trees and shortest paths, as well as several common problems in scheduling, logistics, and economics. Linear programming was used implicitly by Fourier in the early 1800s, but it was first formalized and applied to problems in economics in the 1930s by Leonid Kantorovich. Kantorovich's work was hidden behind the Iron Curtain (where it was largely ignored) and therefore unknown in the West. Linear programming was rediscovered and applied to shipping problems in the early 1940s by Tjalling Koopmans. The first complete algorithm to solve linear programming problems, called the simplex method, was published by George Dantzig in 1947. Koopmans first proposed the name "linear programming" in a discussion with Dantzig in 1948. Kantorovich and Koopmans shared the 1975 Nobel Prize in Economics "for their contributions to the theory of optimum allocation of resources". Dantzig did not; his work was apparently too pure. Koopmans wrote to Kantorovich suggesting that they refuse the prize in protest of Dantzig's exclusion, but Kantorovich saw the prize as a vindication of his use of mathematics in economics, which had been written off as "a means for apologists of capitalism".

A linear programming problem asks for a vector x ∈ ℝ^d that maximizes (or equivalently, minimizes) a given linear function, among all vectors x that satisfy a given set of linear inequalities.
The general form of a linear programming problem is the following:

    maximize    Σ_j c_j x_j
    subject to  Σ_j a_ij x_j ≤ b_i    for each i = 1 .. p
                Σ_j a_ij x_j = b_i    for each i = p+1 .. p+q
                Σ_j a_ij x_j ≥ b_i    for each i = p+q+1 .. n

Here, the input consists of a matrix A = (a_ij) ∈ ℝ^(n×d), a column vector b ∈ ℝ^n, and a row vector c ∈ ℝ^d. Each coordinate of the vector x is called a variable. Each of the linear inequalities is called
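To make the general form concrete, here is a small sketch (my own toy instance, not from these notes) that feeds such a problem to an off-the-shelf solver. SciPy's `linprog` minimizes, so we negate c to maximize; the instance and all names below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: maximize 3*x1 + 2*x2
# subject to x1 + x2 <= 4 and x1 <= 2, with x1, x2 >= 0.
c = np.array([3.0, 2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 0.0]])
b_ub = np.array([4.0, 2.0])

# linprog minimizes, so pass -c to maximize c.x.
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal vertex (2, 2) with value 10
```

The optimal point is a vertex of the feasible region, a fact the geometric discussion below makes precise.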

a constraint. The function x ↦ c · x is called the objective function. I will always use d to denote the number of variables, also known as the dimension of the problem. The number of constraints is usually denoted n.

A linear programming problem is said to be in canonical form¹ if it has the following structure:

    maximize    Σ_j c_j x_j
    subject to  Σ_j a_ij x_j ≤ b_i    for each i = 1 .. n
                x_j ≥ 0               for each j = 1 .. d

We can express this canonical form more compactly as follows. For two vectors x = (x_1, x_2, ..., x_d) and y = (y_1, y_2, ..., y_d), the expression x ≥ y means that x_i ≥ y_i for every index i.

    max  c · x
    s.t. Ax ≤ b
         x ≥ 0

Any linear programming problem can be converted into canonical form as follows:

- For each variable x_j, add the equality constraint x_j = x_j⁺ − x_j⁻ and the inequalities x_j⁺ ≥ 0 and x_j⁻ ≥ 0.
- Replace any equality constraint Σ_j a_ij x_j = b_i with two inequality constraints Σ_j a_ij x_j ≤ b_i and Σ_j a_ij x_j ≥ b_i.
- Replace any lower bound Σ_j a_ij x_j ≥ b_i with the equivalent upper bound Σ_j −a_ij x_j ≤ −b_i.

This conversion potentially triples the number of variables and doubles the number of constraints; fortunately, it is almost never necessary in practice. Another convenient formulation is slack form², in which the only inequalities are of the form x_j ≥ 0:

    max  c · x
    s.t. Ax = b
         x ≥ 0

It's fairly easy to convert any linear programming problem into slack form. This form will be especially useful in describing the simplex algorithm.

J.1 The Geometry of Linear Programming

A point x ∈ ℝ^d is feasible with respect to some linear programming problem if it satisfies all the linear constraints. The set of all feasible points is called the feasible region for that linear program.

¹Confusingly, some authors call this standard form.
²Confusingly, some authors call this standard form.
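The conversion rules above are mechanical enough to script. The sketch below (my own tiny example; the instance and variable names are invented) converts an LP with a free variable and an equality constraint into canonical form, then solves it.

```python
import numpy as np
from scipy.optimize import linprog

# Original LP: maximize x2  subject to  x1 + 2*x2 = 4,  x1 >= 0, x2 free.
# Canonical variables (x1, x2p, x2m) with x2 = x2p - x2m, all >= 0.
c = np.array([0.0, 1.0, -1.0])
row = np.array([1.0, 2.0, -2.0])      # x1 + 2*(x2p - x2m)
A_ub = np.vstack([row, -row])         # equality -> two <= constraints
b_ub = np.array([4.0, -4.0])

res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(-res.fun, res.x[1] - res.x[2])  # optimum 2, attained at x2 = 2
```

Note that (x2p, x2m) is not unique (only their difference matters), which is one reason the conversion is rarely performed in practice.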

The feasible region has a particularly nice geometric structure that lends some useful intuition to later linear programming algorithms. Any linear equation in d variables defines a hyperplane in ℝ^d; think of a line when d = 2, or a plane when d = 3. This hyperplane divides ℝ^d into two halfspaces; each halfspace is the set of points that satisfy some linear inequality. Thus, the set of feasible points is the intersection of several hyperplanes (one for each equality constraint) and halfspaces (one for each inequality constraint). The intersection of a finite number of hyperplanes and halfspaces is called a polyhedron. It's not hard to verify that any halfspace, and therefore any polyhedron, is convex: if a polyhedron contains two points x and y, then it contains the entire line segment xy.

[Figure: A two-dimensional polyhedron (white) defined by 10 linear inequalities.]

By rotating ℝ^d so that the objective function points downward, we can express any linear programming problem in the following geometric form: Find the lowest point in a given polyhedron.

With this geometry in hand, we can easily picture two pathological cases where a given linear programming problem has no solution. The first possibility is that there are no feasible points; in this case the problem is called infeasible. For example, the following LP problem is infeasible:

    maximize    x − y
    subject to  2x + y ≤ 1
                x + y ≥ 2
                x, y ≥ 0

[Figure: An infeasible linear programming problem; arrows indicate the constraints.]

The second possibility is that there are feasible points at which the objective function is arbitrarily large; in this case, we call the problem unbounded. The same polyhedron could be unbounded for some objective functions but not others, or it could be unbounded for every objective function.
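Both pathological cases can be observed numerically. The sketch below uses SciPy, whose `linprog` reports status 2 for infeasible and status 3 for unbounded problems; the infeasible instance is the one from the text, and the unbounded instance is my own.

```python
from scipy.optimize import linprog

# The infeasible LP from the text: maximize x - y
# subject to 2x + y <= 1 and x + y >= 2 (i.e. -x - y <= -2), x, y >= 0.
infeasible = linprog([-1.0, 1.0],
                     A_ub=[[2.0, 1.0], [-1.0, -1.0]],
                     b_ub=[1.0, -2.0],
                     bounds=[(0, None), (0, None)])

# An unbounded LP: maximize x + y with only x - y <= 1 cutting the quadrant.
unbounded = linprog([-1.0, -1.0],
                    A_ub=[[1.0, -1.0]],
                    b_ub=[1.0],
                    bounds=[(0, None), (0, None)])

print(infeasible.status, unbounded.status)   # 2 (infeasible), 3 (unbounded)
```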

[Figure: A two-dimensional polyhedron (white) that is unbounded downward but bounded upward.]

J.2 Example 1: Shortest Paths

We can compute the length of the shortest path from s to t in a weighted directed graph by solving the following very simple linear programming problem.

    maximize    d_t
    subject to  d_s = 0
                d_v − d_u ≤ ℓ_{u→v}    for every edge u→v

Here, ℓ_{u→v} is the length of the edge u→v. Each variable d_v represents a tentative shortest-path distance from s to v. The constraints mirror the requirement that every edge in the graph must be relaxed. These relaxation constraints imply that in any feasible solution, d_v is at most the shortest path distance from s to v. Thus, somewhat counterintuitively, we are correctly maximizing the objective function to compute the shortest path!

In the optimal solution, the objective function d_t is the actual shortest-path distance from s to t, but for any vertex v that is not on the shortest path from s to t, d_v may be an underestimate of the true distance from s to v. However, we can obtain the true distances from s to every other vertex by modifying the objective function:

    maximize    Σ_v d_v
    subject to  d_s = 0
                d_v − d_u ≤ ℓ_{u→v}    for every edge u→v

There is another formulation of shortest paths as an LP minimization problem using indicator variables.

    minimize    Σ_{u→v} ℓ_{u→v} · x_{u→v}
    subject to  Σ_w x_{s→w} − Σ_u x_{u→s} = 1
                Σ_u x_{u→t} − Σ_w x_{t→w} = 1
                Σ_u x_{u→v} − Σ_w x_{v→w} = 0    for every vertex v ≠ s, t
                x_{u→v} ≥ 0                      for every edge u→v

Intuitively, x_{u→v} equals 1 if u→v is in the shortest path from s to t, and equals 0 otherwise. The constraints merely state that the path should start at s, end at t, and either pass through or avoid every other vertex v. Any path from s to t (in particular, the shortest path) clearly implies a
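Here is a sketch of the first shortest-path LP on a toy graph (the graph and its edge lengths are mine, not from the notes): one relaxation constraint per edge, d_s pinned to 0, and d_t maximized.

```python
from scipy.optimize import linprog

lengths = {('s', 'a'): 2.0, ('s', 'b'): 5.0,
           ('a', 'b'): 1.0, ('a', 't'): 5.0, ('b', 't'): 1.0}
verts = ['s', 'a', 'b', 't']
idx = {v: i for i, v in enumerate(verts)}

# One relaxation constraint d_v - d_u <= l(u->v) per edge.
A_ub, b_ub = [], []
for (u, v), length in lengths.items():
    row = [0.0] * len(verts)
    row[idx[v]], row[idx[u]] = 1.0, -1.0
    A_ub.append(row)
    b_ub.append(length)

c = [0.0] * len(verts)
c[idx['t']] = -1.0                                   # maximize d_t
A_eq = [[1.0 if v == 's' else 0.0 for v in verts]]   # d_s = 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
              bounds=[(None, None)] * len(verts))
print(-res.fun)   # 4.0: the shortest s-t path is s->a->b->t
```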

feasible point for this linear program, but there are other feasible solutions with non-integral values that do not represent paths. Nevertheless, there is an optimal solution in which every x_e is either 0 or 1 and the edges e with x_e = 1 comprise the shortest path. Moreover, in any optimal solution, the objective function gives the shortest path distance, even if not every x_e is an integer!

J.3 Example 2: Maximum Flows and Minimum Cuts

Recall that the input to the maximum (s, t)-flow problem consists of a weighted directed graph G = (V, E), two special vertices s and t, and a function assigning a non-negative capacity c_e to each edge e. Our task is to choose the flow f_e across each edge e, as follows:

    maximize    Σ_w f_{s→w} − Σ_u f_{u→s}
    subject to  Σ_w f_{v→w} − Σ_u f_{u→v} = 0    for every vertex v ≠ s, t
                f_{u→v} ≤ c_{u→v}                for every edge u→v
                f_{u→v} ≥ 0                      for every edge u→v

Similarly, the minimum cut problem can be formulated using indicator variables, similarly to the shortest path problem. We have a variable S_v for each vertex v, indicating whether v ∈ S or v ∈ T, and a variable X_{u→v} for each edge u→v, indicating whether u ∈ S and v ∈ T, where (S, T) is some (s, t)-cut.³

    minimize    Σ_{u→v} c_{u→v} · X_{u→v}
    subject to  X_{u→v} + S_v − S_u ≥ 0    for every edge u→v
                X_{u→v} ≥ 0                for every edge u→v
                S_s = 1
                S_t = 0

Like the minimization LP for shortest paths, there can be optimal solutions that assign fractional values to the variables. Nevertheless, the minimum value for the objective function is the cost of the minimum cut, and there is an optimal solution for which every variable is either 0 or 1, representing an actual minimum cut. No, this is not obvious; in particular, my claim is not a proof!

J.4 Linear Programming Duality

Each of these pairs of linear programming problems is related by a transformation called duality. For any linear programming problem Π, there is a corresponding dual linear program ⊓ that can be obtained by a mechanical translation, essentially by swapping the constraints and the variables. The translation is simplest when the LP is in canonical form:

    Primal (Π)          Dual (⊓)
    max  c · x          min  y · b
    s.t. Ax ≤ b         s.t. yA ≥ c
         x ≥ 0               y ≥ 0

³These two linear programs are not quite syntactic duals; I've added two redundant variables S_s and S_t to the min-cut program to increase readability.
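The maximum-flow LP can be run directly on a toy network (the instance is mine, not from the notes): conservation becomes an equality constraint at each internal vertex, and the capacity and non-negativity constraints become variable bounds.

```python
from scipy.optimize import linprog

edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
cap = [3.0, 2.0, 1.0, 2.0, 3.0]

# Objective: maximize net flow out of s (linprog minimizes, so negate).
c = [(-1.0 if u == 's' else 0.0) + (1.0 if v == 's' else 0.0)
     for (u, v) in edges]

# Conservation at every vertex other than s and t.
internal = ['a', 'b']
A_eq = [[(1.0 if v == w else 0.0) - (1.0 if u == w else 0.0)
         for (u, v) in edges]
        for w in internal]

res = linprog(c, A_eq=A_eq, b_eq=[0.0] * len(internal),
              bounds=list(zip([0.0] * len(edges), cap)))
print(-res.fun)   # 5.0: the max flow, matching the cut {a->t, b->t}
```

That the optimum equals the capacity of a minimum cut is exactly the duality relationship discussed next.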

We can also write the dual linear program in exactly the same canonical form as the primal, by swapping the coefficient vector c and the objective vector b, negating both vectors, and replacing the constraint matrix A with its negative transpose.⁴

    Primal (Π)          Dual (⊓)
    max  c · x          max  −b · y
    s.t. Ax ≤ b         s.t. −A^⊤ y ≤ −c
         x ≥ 0               y ≥ 0

Written in this form, it should be immediately clear that duality is an involution: the dual of the dual linear program ⊓ is identical to the primal linear program Π. The choice of which LP to call the primal and which to call the dual is totally arbitrary.⁵

The Fundamental Theorem of Linear Programming. A linear program Π has an optimal solution x* if and only if the dual linear program ⊓ has an optimal solution y* where c · x* = y*Ax* = y* · b.

The weak form of this theorem is trivial to prove.

Weak Duality Theorem. If x is a feasible solution for a canonical linear program Π and y is a feasible solution for its dual ⊓, then c · x ≤ yAx ≤ y · b.

Proof: Because x is feasible for Π, we have Ax ≤ b. Since y is non-negative, we can multiply both sides of the inequality by y to obtain yAx ≤ y · b. Conversely, y is feasible for ⊓ and x is non-negative, so yAx ≥ c · x.

It immediately follows that if c · x = y · b, then x and y are optimal solutions to their respective linear programs. This is in fact a fairly common way to prove that we have the optimal value for a linear program.

J.5 Duality Example

Before I prove the stronger duality theorem, let me first provide some intuition about where this duality thing comes from in the first place.⁶ Consider the following linear programming problem:

    maximize    4x_1 + x_2 + 3x_3
    subject to  x_1 + 4x_2 ≤ 1
                3x_1 − x_2 + x_3 ≤ 3
                x_1, x_2, x_3 ≥ 0

Let σ denote the optimum objective value for this LP. The feasible solution x = (1, 0, 0) gives us a lower bound σ ≥ 4. A different feasible solution x = (0, 0, 3) gives us a better lower bound σ ≥ 9.

⁴For the notational purists: In these formulations, x and b are column vectors, and y and c are row vectors. This is a somewhat nonstandard choice. Yes, that means the dot in c · x is redundant. Sue me.
⁵For historical reasons, maximization LPs tend to be called "primal" and minimization LPs tend to be called "dual", which is really really stupid, since the only difference is a sign change.
⁶This example is taken from Robert Vanderbei's excellent textbook Linear Programming: Foundations and Extensions [Springer, 2001], but the idea appears earlier in Jens Clausen's 1997 paper "Teaching Duality in Linear Programming: The Multiplier Approach".
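The bounding game can be checked with a few lines of arithmetic. This sketch replays the two lower bounds from the text and adds one upper bound from a multiplier vector y of my own choosing; nothing here requires a solver.

```python
import numpy as np

A = np.array([[1.0, 4.0, 0.0],
              [3.0, -1.0, 1.0]])
b = np.array([1.0, 3.0])
c = np.array([4.0, 1.0, 3.0])

# Feasible points give lower bounds on the optimum sigma.
for x in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 3.0])):
    assert (A @ x <= b + 1e-9).all() and (x >= 0).all()
    print('lower bound:', c @ x)          # 4.0, then 9.0

# Any y >= 0 with y.A >= c (coordinate-wise) gives an upper bound y.b.
y = np.array([1.0, 3.0])                  # my choice; any admissible y works
assert (y >= 0).all() and (y @ A >= c).all()
print('upper bound:', y @ b)              # sigma <= 10.0
```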

We could play this game all day, finding different feasible solutions and getting ever larger lower bounds. How do we know when we're done? Is there a way to prove an upper bound on σ? In fact, there is. Let's multiply each of the constraints in our LP by a new non-negative scalar value y_i:

    maximize    4x_1 + x_2 + 3x_3
    subject to  y_1 (x_1 + 4x_2) ≤ y_1
                y_2 (3x_1 − x_2 + x_3) ≤ 3y_2
                x_1, x_2, x_3 ≥ 0

Because each y_i is non-negative, we do not reverse any of the inequalities. Any feasible solution (x_1, x_2, x_3) must satisfy both of these inequalities, so it must also satisfy their sum:

    (y_1 + 3y_2) x_1 + (4y_1 − y_2) x_2 + y_2 x_3 ≤ y_1 + 3y_2.

Now suppose that the coefficient of each variable x_j on the left side of this inequality is at least the corresponding coefficient of the objective function:

    y_1 + 3y_2 ≥ 4,    4y_1 − y_2 ≥ 1,    y_2 ≥ 3.

This assumption lets us derive an upper bound on the objective value of any feasible solution:

    4x_1 + x_2 + 3x_3 ≤ (y_1 + 3y_2) x_1 + (4y_1 − y_2) x_2 + y_2 x_3 ≤ y_1 + 3y_2.

We have just proved that σ ≤ y_1 + 3y_2. Now it's natural to ask how tight we can make this upper bound. How small can we make the expression y_1 + 3y_2 without violating any of the inequalities we used to prove the upper bound? This is just another linear programming problem!

    minimize    y_1 + 3y_2
    subject to  y_1 + 3y_2 ≥ 4
                4y_1 − y_2 ≥ 1
                y_2 ≥ 3
                y_1, y_2 ≥ 0

This is precisely the dual of our original linear program!

J.6 Strong Duality

The Fundamental Theorem can be rephrased in the following form:

Strong Duality Theorem. If x* is an optimal solution for a canonical linear program Π, then there is an optimal solution y* for its dual ⊓, such that c · x* = y*Ax* = y* · b.

Proof (Sketch): I'll prove the theorem only for non-degenerate linear programs, in which (a) the optimal solution (if one exists) is a unique vertex of the feasible region, and (b) at most d constraint planes pass through any point. These non-degeneracy assumptions are relatively easy to enforce in practice and can be removed from the proof at the expense of some technical detail. I will also prove the theorem only for the case n ≥ d; the argument for under-constrained LPs is similar (if not simpler).
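Before the proof, the theorem can at least be observed numerically on the example LP from the previous section: solving the primal and the dual (a sketch with SciPy; `linprog` minimizes, hence the sign flips) yields the same optimum value, as strong duality promises.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 4.0, 0.0],
              [3.0, -1.0, 1.0]])
b = np.array([1.0, 3.0])
c = np.array([4.0, 1.0, 3.0])

# Primal: max c.x  s.t. Ax <= b, x >= 0.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 3)
# Dual: min y.b  s.t. yA >= c, y >= 0  (rewritten as -A^T y <= -c).
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)   # both print 10.0, the common optimum
```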

Let x* be the optimal solution for the linear program Π; non-degeneracy implies that this solution is unique, and that exactly d of the n linear constraints are satisfied with equality. Without loss of generality (by permuting the rows of A), we can assume that these are the first d constraints. So let A′ be the d × d matrix containing the first d rows of A, and let A″ denote the other n − d rows. Similarly, partition b into its first d coordinates b′ and everything else b″. Thus, we have partitioned the inequality Ax* ≤ b into a system of equations A′x* = b′ and a system of strict inequalities A″x* < b″.

Now let y* = (y′, y″), where y′ = cA′⁻¹ and y″ = 0. We easily verify that y* · b = c · x*:

    y* · b = y′ · b′ = (cA′⁻¹) b′ = c (A′⁻¹ b′) = c · x*.

Similarly, it's trivial to verify that y*A ≥ c:

    y*A = y′A′ = c.

Once we prove that y* is non-negative, and therefore feasible, the Weak Duality Theorem implies the result. Clearly y″ ≥ 0. As we will see below, the inequality y′ ≥ 0 follows from the fact that x* is optimal; we had to use that fact somewhere! This is the hardest part of the proof.

The key insight is to give a geometric interpretation to the vector y′ = cA′⁻¹. Each row of the linear system A′x* = b′ describes a hyperplane a_i · x = b_i in ℝ^d. The vector a_i is normal to this hyperplane and points out of the feasible region. The vectors a_1, ..., a_d are linearly independent (by non-degeneracy) and thus describe a coordinate frame for the vector space ℝ^d. The definition of y′ can be rewritten as follows:

    c = y′A′ = Σ_{i=1}^{d} y′_i a_i.

We are expressing the objective vector c as a linear combination of the constraint normals a_1, ..., a_d.

Now consider any vertex z of the feasible region that is adjacent to x*. The vector z − x* is normal to all but one of the vectors a_i. Thus, for some index i, we have

    A′(z − x*) = (0, ..., ν, ..., 0),

where the constant ν is in the ith coordinate. The vector z − x* points into the feasible region, so ν ≤ 0. It follows that

    c · (z − x*) = y′A′(z − x*) = ν y′_i.

The optimality of x* implies that c · x* ≥ c · z, so we must have y′_i ≥ 0. We're done!

© Copyright 2006 Jeff Erickson. Released under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License (http://creativecommons.org/licenses/by-nc-sa/2.5/)