ISyE 6661: Topics Covered

1. Optimization fundamentals: 1.5 lectures
2. LP Geometry (Chpt. 2): 5 lectures
3. The Simplex Method (Chpt. 3): 4 lectures
4. LP Duality (Chpt. 4): 4 lectures
5. Sensitivity Analysis (Chpt. 5): 3 lectures
6. Large-scale LP (Chpt. 6): 1.5 lectures
7. Computational complexity and the Ellipsoid method (Chpt. 8): 2 lectures
8. Interior Point Algorithms (Chpt. 9): 5 lectures

1. Fundamentals of Optimization
The generic optimization problem: $(P): \min\{f(x) : x \in X\}$.
Weierstrass Theorem: If $f$ is continuous and $X$ is nonempty and compact, then problem $(P)$ has an optimal solution.
If $f$ is a convex function and $X$ is a convex set, then $(P)$ is a convex program.
Theorem: If $x^*$ is a local optimal solution of the convex program $(P)$, then it is also a global optimal solution.

2. Linear Programming Geometry
LP in standard form: $(P): \min\{c^T x : Ax = b,\ x \ge 0\}$.
LP involves minimizing a linear function over the polyhedral set $X = \{x : Ax = b,\ x \ge 0\}$.
Basic building blocks of a polyhedral set: extreme points and extreme rays.
Theorem (algebraic characterization of extreme points): A vector $x$ is an extreme point of $X$ iff it is a Basic Feasible Solution, i.e., there exists a partitioning $A = [B\ N]$ (with $B$ square and nonsingular) such that $x_B = B^{-1} b \ge 0$ and $x_N = 0$.
Theorem (algebraic characterization of extreme rays): A vector $d \ne 0$ is an extreme ray of $X$ iff it is a Non-negative Basic Direction, i.e., there exists a partitioning $A = [B\ N]$ (with $B$ square and nonsingular) s.t.
$d = \alpha \begin{bmatrix} -B^{-1} A_j \\ e_j \end{bmatrix} \ge 0$ for some column $A_j \in N$ and $\alpha > 0$.
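
The algebraic characterization of extreme points suggests a brute-force way to list them for a small standard-form polyhedron: try every choice of $m$ columns as a candidate basis. The following is a minimal Python sketch, assuming NumPy is available; the function name `basic_feasible_solutions` and the example data are illustrative and not part of the course material.

```python
from itertools import combinations

import numpy as np


def basic_feasible_solutions(A, b, tol=1e-9):
    """Enumerate (basis, x) pairs with B nonsingular and x_B = B^{-1} b >= 0."""
    m, n = A.shape
    found = []
    for basis in combinations(range(n), m):
        B = A[:, list(basis)]
        if abs(np.linalg.det(B)) < tol:
            continue  # columns are linearly dependent: not a basis
        x_B = np.linalg.solve(B, b)
        if np.all(x_B >= -tol):  # primal feasibility of the basic solution
            x = np.zeros(n)
            x[list(basis)] = x_B
            found.append((basis, x))
    return found


# Illustrative data: x1 + x2 + x3 = 4, x1 + x4 = 3, x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
for basis, x in basic_feasible_solutions(A, b):
    print(basis, x)
```

For this data the sketch finds four basic feasible solutions, matching the four vertices of the region $x_1 + x_2 \le 4$, $x_1 \le 3$, $x \ge 0$ in the $(x_1, x_2)$ plane. Enumeration is exponential in general, so this is only useful for building intuition.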

2. Linear Programming Geometry (contd.)
The Representation Theorem: Let $x^1, \dots, x^k$ and $d^1, \dots, d^l$ be the extreme points and extreme rays of $X$, respectively. Then
$X = \left\{ x : x = \sum_{i=1}^{k} \lambda_i x^i + \sum_{j=1}^{l} \mu_j d^j,\ \sum_{i=1}^{k} \lambda_i = 1,\ \lambda_i \ge 0\ \forall i,\ \mu_j \ge 0\ \forall j \right\}$.
To prove the above result, we used:
The Separation Theorem: Let $S$ be a non-empty closed convex set, and $x^* \notin S$. Then there exists a vector $c$ s.t. $c^T x^* < c^T x$ for all $x \in S$.
Theorem (Cor. of Rep. Thm.):
(a) An LP $\min\{c^T x : x \in X\}$ has an optimal solution iff $c^T d^j \ge 0$ for all extreme rays $d^j$, $j = 1, \dots, l$.
(b) Extreme point optimality: If an LP has an optimal solution then there exists an extreme point that is optimal.

3. The Simplex Method
Basic idea: Move from one extreme point (bfs) to another while improving the objective.
Given a bfs $x^k$ with basis $B$, move along one of the Basic Directions $d^j$ ($j \in N$):
$d^j = \begin{bmatrix} -B^{-1} A_j \\ e_j \end{bmatrix}$.
If $x^k$ is non-degenerate then $d^j$ is a feasible direction, i.e., it allows a positive step.
If $c^T d^j < 0$ then $d^j$ is an improving direction. Note $c^T d^j = c_j - c_B^T B^{-1} A_j = \bar{c}_j$ (the reduced cost).
If no improving direction exists, i.e. $\bar{c}_j \ge 0$ for all $j \in N$, the current solution is optimal. Stop.
Choose an improving basic direction $d^j$ with $j \in N$, and move to $x^{k+1} \leftarrow x^k + \alpha d^j$, where $\alpha \ge 0$ is the largest step such that $x^{k+1} \ge 0$.
If $d^j \ge 0$ then $\alpha = +\infty$, implying that the problem is unbounded. Stop.

3. The Simplex Method (contd.)
Theorem: $x^{k+1}$ is a bfs adjacent to $x^k$, with basis $\hat{B} = B + \{A_j\} - \{A_l\}$, where $l$ is some basic variable that becomes nonbasic.
Degeneracy, i.e., when a basic variable has a value of zero, is a problem. If $x^k$ is degenerate, $\alpha$ could be zero, i.e., the basis changes from $B$ to $\hat{B}$ but $x^{k+1} = x^k$, which can cause Stalling or Cycling.
This can be dealt with by properly choosing $j$ and $l$ (e.g. the Lexicographic rule).
Theorem: The Simplex method (with proper pivot rules) solves LP in a finite number of iterations.
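
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: a starting feasible basis is assumed to be given, Bland's smallest-index rule is swapped in for the lexicographic rule as the anti-cycling device, and each iteration re-solves with the basis matrix instead of updating a factorization. The function name `simplex` and the example data are hypothetical.

```python
import numpy as np


def simplex(A, b, c, basis, tol=1e-9, max_iter=1000):
    """Simplex sketch for min c^T x s.t. Ax = b, x >= 0 from a feasible basis."""
    m, n = A.shape
    basis = list(basis)
    for _ in range(max_iter):
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)            # current basic variables
        y = np.linalg.solve(B.T, c[basis])     # y^T = c_B^T B^{-1}
        reduced = c - A.T @ y                  # reduced costs
        # Bland's rule: smallest-index nonbasic j with negative reduced cost.
        entering = next((j for j in range(n)
                         if j not in basis and reduced[j] < -tol), None)
        if entering is None:
            x = np.zeros(n)
            x[basis] = x_B
            return "optimal", x, basis
        u = np.linalg.solve(B, A[:, entering])  # basic part of d^j is -u
        if np.all(u <= tol):
            return "unbounded", None, basis     # d^j >= 0, so alpha = +inf
        # Ratio test; ties are broken by the smallest variable index.
        ratios = [(float(x_B[i] / u[i]), basis[i], i)
                  for i in range(m) if u[i] > tol]
        _, _, leave_pos = min(ratios)
        basis[leave_pos] = entering
    raise RuntimeError("iteration limit reached")


# Illustrative data: min -3*x1 - 2*x2 s.t. x1 + x2 + x3 = 4, x1 + x4 = 3.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
status, x, final_basis = simplex(A, b, c, basis=[2, 3])
print(status, x, c @ x)
```

Starting from the slack basis, the sketch pivots twice and stops at $x = (3, 1, 0, 0)$ with objective value $-11$.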

3. The Simplex Method (contd.)
Revised Simplex and Tableau implementations.
Initializing the Simplex method:
Two-phase Simplex
Big-M method

4. Duality
Standard form Primal-dual LP pairs:
$v_P = \min\{c^T x : Ax = b,\ x \ge 0\}$ and $v_D = \max\{b^T y : A^T y \le c\}$.
Recipe for writing the dual problem for general LPs.
Weak Duality Theorem: $v_D \le v_P$.
Proof of WD: By construction of the dual problem.
Strong Duality Theorem: If either problem has a finite optimal value then $v_D = v_P$.
Proof 1 of SD: From the Simplex Method ($c_B^T B^{-1}$ are the optimal dual variables).
Proof 2 of SD: From the theorems of alternatives (Farkas' Lemma).
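
Weak duality for the standard form pair can be written out in one line. For any primal feasible $x$ (so $Ax = b$, $x \ge 0$) and any dual feasible $y$ (so $A^T y \le c$):

```latex
b^T y \;=\; (Ax)^T y \;=\; x^T (A^T y) \;\le\; x^T c \;=\; c^T x .
```

The inequality uses $x \ge 0$ together with $A^T y \le c$. Taking the maximum over $y$ and the minimum over $x$ gives $v_D \le v_P$.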

4. Duality (Contd.)
Farkas' Lemma: Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then exactly one of the following two systems (a or b) is feasible:
(a) $Ax = b,\ x \ge 0$
(b) $A^T y \ge 0,\ b^T y < 0$.
Proof: Use the Separating Hyperplane theorem.
See different forms of Farkas' Lemma.
From Duality to Polyhedral theory:
An immediate proof of Farkas' Lemma.
A simple proof of the Representation Thm.
Converse to Rep. Thm.: The convex hull of a finite number of points is a polytope.

4. Duality (Contd.)
LP Optimality Conditions (Cor. of SD): A pair $(x^*, y^*)$ is primal-dual optimal iff
$Ax^* = b,\ x^* \ge 0$ (Primal Feasibility)
$A^T y^* \le c$ (Dual Feasibility)
$x_j^* (c_j - A_j^T y^*) = 0\ \forall j$ (Complementary Slackness).
Relation between non-degeneracy and uniqueness amongst primal and dual optimal solutions.
The Dual Simplex Algorithm: A basis $B$ is primal feasible (PF) if $B^{-1} b \ge 0$ and dual feasible (DF) if $c^T - c_B^T B^{-1} A \ge 0$.
Start with a basis that is DF but not PF.
Select a basic variable with negative value ($< 0$) to leave the basis (move towards PF).
Select an entering variable to maintain DF.
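
The three optimality conditions are easy to check numerically for a candidate pair. The sketch below is a hedged illustration in Python with NumPy; the function name `is_primal_dual_optimal`, the tolerance, and the example data are assumptions made for illustration.

```python
import numpy as np


def is_primal_dual_optimal(A, b, c, x, y, tol=1e-9):
    """Check primal feasibility, dual feasibility and complementary slackness."""
    primal_feasible = np.allclose(A @ x, b, atol=tol) and np.all(x >= -tol)
    slack = c - A.T @ y                      # dual slacks c_j - A_j^T y
    dual_feasible = np.all(slack >= -tol)
    complementary = np.all(np.abs(x * slack) <= tol)
    return bool(primal_feasible and dual_feasible and complementary)


# Illustrative data: min -3*x1 - 2*x2 s.t. x1 + x2 + x3 = 4, x1 + x4 = 3.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
y = np.array([-2.0, -1.0])
print(is_primal_dual_optimal(A, b, c, np.array([3.0, 1.0, 0.0, 0.0]), y))
print(is_primal_dual_optimal(A, b, c, np.array([3.0, 0.0, 1.0, 0.0]), y))
```

The first pair satisfies all three conditions. The second $x$ is primal feasible but violates complementary slackness with this $y$, so the check fails.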

4. Duality (Contd.)
Dual Simplex is not analogous to applying Primal Simplex to the Dual problem.
When to use Dual Simplex over Primal Simplex?
Generalized Duality: The dual of $v_P = \min\{c^T x : Ax \ge b,\ x \in X\}$ is $v_D = \max\{L(y) : y \ge 0\}$, where $L(y) := \min\{c^T x + y^T (b - Ax) : x \in X\}$.
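
As a worked check of the generalized dual, take the special case $X = \{x : x \ge 0\}$ (a choice made here for illustration). The inner minimization can then be done in closed form:

```latex
L(y) \;=\; b^T y + \min_{x \ge 0} \,(c - A^T y)^T x
      \;=\;
      \begin{cases}
        b^T y   & \text{if } A^T y \le c, \\
        -\infty & \text{otherwise,}
      \end{cases}
\qquad\Longrightarrow\qquad
v_D \;=\; \max\{\, b^T y : A^T y \le c,\ y \ge 0 \,\}.
```

This recovers the usual LP dual of $\min\{c^T x : Ax \ge b,\ x \ge 0\}$, which suggests the generalized construction is consistent with the recipe for general LPs.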

5. Sensitivity Analysis
Consider the LP $z^* = \min\{c^T x : Ax = b,\ x \ge 0\}$.
An instance of the LP is given by the data $(n, m, c, A, b)$.
If the optimal solution $x^*$ is non-degenerate then the $i$-th dual variable represents $y_i^* = \partial z^* / \partial b_i$, $i = 1, \dots, m$.
Local Sensitivity Analysis:
(a) How do the optimal solution $x^*$ and the optimal value $z^*$ behave under small perturbations of the problem data $(n, m, c, A, b)$?
(b) How to efficiently recover the new optimal solution and optimal value after the perturbation?
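
The derivative interpretation of the dual variables follows from a short argument. Let $B$ be the optimal basis. Non-degeneracy means $B^{-1} b > 0$, so $B^{-1} b' \ge 0$ for every $b'$ close enough to $b$, while the reduced costs do not depend on $b$ at all. Hence $B$ stays optimal near $b$ and

```latex
z^*(b') \;=\; c_B^T B^{-1} b' \;=\; (y^*)^T b'
\qquad\Longrightarrow\qquad
\frac{\partial z^*}{\partial b_i} \;=\; y_i^*, \quad i = 1, \dots, m .
```

Here $y^* = (c_B^T B^{-1})^T$ is the dual solution associated with $B$, as in Proof 1 of strong duality.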

5. Sensitivity Analysis (contd.)
Adding a new variable: Current basis remains PF. So check DF (reduced cost of the new variable) and use Primal Simplex to optimize if needed.
Adding a new constraint: Current basis remains DF. Check PF, and use Dual Simplex to optimize if needed.
Perturbing $b \to b + \delta d$: Current basis remains DF and PF over a computable range of $\delta$. Outside this range, we have DF but not PF, so use Dual Simplex to optimize.
Perturbing $c \to c + \delta d$: Current basis remains DF and PF over a computable range of $\delta$. Outside this range, we have PF but not DF, so use Primal Simplex to optimize.
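
For the right-hand-side perturbation, the "computable range of $\delta$" comes from requiring $B^{-1}(b + \delta d) \ge 0$, since dual feasibility does not depend on $b$. A minimal Python sketch follows, assuming the given basis is already primal feasible; the function name `rhs_range` and the example data are illustrative.

```python
import math

import numpy as np


def rhs_range(A, b, basis, d, tol=1e-12):
    """Interval [lo, hi] of delta with B^{-1}(b + delta*d) >= 0."""
    B = A[:, list(basis)]
    x_B = np.linalg.solve(B, b)   # current basic solution, assumed >= 0
    g = np.linalg.solve(B, d)     # rate of change of x_B with respect to delta
    lo, hi = -math.inf, math.inf
    for xi, gi in zip(x_B, g):
        if gi > tol:              # x_i + delta*g_i >= 0  =>  delta >= -x_i/g_i
            lo = max(lo, -xi / gi)
        elif gi < -tol:           # x_i + delta*g_i >= 0  =>  delta <= -x_i/g_i
            hi = min(hi, -xi / gi)
    return float(lo), float(hi)


# Illustrative data: x1 + x2 + x3 = 4, x1 + x4 = 3, optimal basis {x1, x2}.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
print(rhs_range(A, b, basis=[0, 1], d=np.array([0.0, 1.0])))
```

With $d = e_2$ the basic solution is $x_1 = 3 + \delta$, $x_2 = 1 - \delta$, so the basis stays primal feasible for $\delta \in [-3, 1]$. The range for $c \to c + \delta d$ can be computed the same way from the reduced costs.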

5. Sensitivity Analysis (contd.)
Perturbing $A_j \to A_j + \delta d$ where $j \in N$: Current basis remains DF and PF over a computable range of $\delta$. Outside this range, we have PF but not DF, so use Primal Simplex to optimize.
Perturbing $A_j \to A_j + \delta d$ where $j \in B$: Current basis remains DF and PF over a computable range of $\delta$. Outside this range, both PF and DF may be affected.
Global behavior of value functions:
(a) $F(b) = \min\{c^T x : Ax = b,\ x \ge 0\}$ is a convex function of $b$, and the dual solution $y^*$ is a subgradient of $F(b)$ at $b$.
(b) $G(c) = \min\{c^T x : Ax = b,\ x \ge 0\}$ is a concave function of $c$, and $x^*$ is a subgradient (in the concave sense, i.e. a supergradient) of $G(c)$ at $c$.

6. Large-Scale LP
Column Generation:
The Cutting Stock Problem
Dantzig-Wolfe decomposition.
Row Generation:
Benders decomposition.

7. Computational Complexity of LP
A problem (class) is easy if there exists an algorithm whose computational effort to solve any instance of the problem is bounded by some polynomial in the size of that instance (i.e. if there exists a polynomial time algorithm for the problem).
Is LP easy?
The Simplex method may require an exponential number (in the number of variables) of iterations! Klee-Minty (1972).
Yudin and Nemirovskii (1977) developed the Ellipsoid method and showed that general convex programs are easy, and Khachian (1979) used it to show that LP is indeed easy.

7. The Ellipsoid Method for LP
The Ellipsoid method answers the following question: Is $X = \{x \in \mathbb{R}^n : Ax \ge b\} = \emptyset$?
Assume: if $X \ne \emptyset$ then $0 < v \le \mathrm{vol}(X) \le V$.
We have a Separation Oracle $S(x, X)$ which returns 0 if $x \in X$, otherwise it returns a vector $a \ne 0$ such that $a^T y > a^T x$ for all $y \in X$.
0. Find an ellipsoid $E_0(x^0) \supseteq X$. Set $k = 0$.
1. If $S(x^k, X) = 0$, stop: $X \ne \emptyset$. If $\mathrm{vol}(E_k(x^k)) < v$, stop: $X = \emptyset$.
2. If $S(x^k, X) = a_k$, then $X \subseteq H_k := \{x : a_k^T x \ge a_k^T x^k\}$. Find $E_{k+1}(x^{k+1})$ such that $E_{k+1}(x^{k+1}) \supseteq E_k(x^k) \cap H_k \supseteq X$ and $\mathrm{vol}(E_{k+1}(x^{k+1})) / \mathrm{vol}(E_k(x^k)) < e^{-1/(2(n+1))}$.
3. Set $k \leftarrow k + 1$ and go to step 1.
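
Steps 0 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a robust implementation: the ellipsoid is stored as $E = \{x : (x - z)^T D^{-1} (x - z) \le 1\}$, the standard central-cut update formulas are used for step 2, the initial ellipsoid is a ball of user-supplied radius centred at the origin, the separation oracle simply returns the first violated row of $A$, and $n \ge 2$ is assumed. The names `ellipsoid_feasibility`, `radius` and `min_volume` are hypothetical.

```python
import math

import numpy as np


def ellipsoid_feasibility(A, b, radius, min_volume, max_iter=10000, tol=1e-9):
    """Return a point with A x >= b (up to tol), or None if none is found."""
    n = A.shape[1]
    z = np.zeros(n)                      # centre x^k of the current ellipsoid
    D = radius ** 2 * np.eye(n)          # E_0 is a ball assumed to contain X
    unit_ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    for _ in range(max_iter):
        violated = np.flatnonzero(A @ z < b - tol)
        if violated.size == 0:
            return z                     # step 1: centre is feasible
        volume = unit_ball * math.sqrt(max(np.linalg.det(D), 0.0))
        if volume < min_volume:
            return None                  # step 1: ellipsoid too small, X empty
        a = A[violated[0]]               # oracle: X lies in {x : a^T x >= a^T z}
        Da = D @ a
        scale = math.sqrt(a @ Da)
        # Step 2: smallest ellipsoid containing the half-ellipsoid kept by the cut.
        z = z + Da / ((n + 1) * scale)
        D = (n ** 2 / (n ** 2 - 1.0)) * (
            D - (2.0 / (n + 1)) * np.outer(Da, Da) / scale ** 2)
    return None


# Illustrative data: triangle x1 >= 1, x2 >= 1, x1 + x2 <= 3.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, -3.0])
print(ellipsoid_feasibility(A, b, radius=10.0, min_volume=1e-6))
```

For a polyhedron given by an explicit list of rows the oracle is trivial, but the same loop works with any routine that returns a violated inequality, which is the point of the separation-based view.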

7. The Ellipsoid Method for LP (contd.)
The numbers $v$ and $V$ depend on $n$ and $U$ (the largest number in the data $(A, b)$).
Theorem: The Ellipsoid method answers the question: Is $X = \{x \in \mathbb{R}^n : Ax \ge b\} = \emptyset$? in $O(n^6 \log(nU))$ iterations.

7. The Ellipsoid Method for LP (contd.)
Easily modified for optimization of a linear function over polyhedra. Polynomial complexity is preserved.
Note that the complexity does not depend on the number of constraints in $X$.
Equivalence of Separation and Optimization: The description of $X$ may involve an exponential number of constraints. However, as long as we have a polynomial time Separation Oracle, the Ellipsoid algorithm guarantees that optimization of a linear function over $X$ is still polynomial time!

8. Interior Point Methods
$\min\{c^T x : x \in X\}$
Basic idea: Given $x^k \in \mathrm{int}(X)$, find a direction $d^k$ and a step size $\alpha_k$ s.t. $x^k + \alpha_k d^k =: x^{k+1} \in \mathrm{int}(X)$ and $c^T x^{k+1} < c^T x^k$. Continue until some termination criterion is met.
The algorithms differ w.r.t. the choice of $d^k$, $\alpha_k$ and the termination criterion.
May need some preprocessing to guarantee that an optimal solution exists.
The algorithms are convergent: $\lim_{k \to \infty} x^k = x^*$. A good criterion for finite termination is needed.

8. Interior Point Methods: The Affine Scaling Method
Basic idea: Given $x^k \in \mathrm{int}(X)$, construct an Ellipsoid $E_k(x^k) \subset \mathrm{int}(X)$. Choose $x^{k+1} = \mathrm{argmin}\{c^T x : x \in E_k\}$.
Based on the fact that the minimizer of a linear form over an Ellipsoid can be found analytically.
Not proven to be polynomial time.
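
For the standard form polyhedron $X = \{x : Ax = b,\ x \ge 0\}$, one common choice of ellipsoid is $E_k = \{x : Ax = b,\ \|X_k^{-1}(x - x^k)\| \le \beta\}$ with $X_k = \mathrm{diag}(x^k)$ and $0 < \beta < 1$, for which the minimizer has a closed form. The Python sketch below implements one such step; this specific ellipsoid, the value of $\beta$, and the function name `affine_scaling_step` are assumptions made for illustration and may differ from the variant presented in class.

```python
import numpy as np


def affine_scaling_step(A, c, x, beta=0.5):
    """Minimize c^T x over {Ax = b, ||X^{-1}(x' - x)|| <= beta} from interior x."""
    x2 = x ** 2                                        # diagonal of X^2
    y = np.linalg.solve((A * x2) @ A.T, A @ (x2 * c))  # dual estimate
    r = c - A.T @ y                                    # reduced-cost estimate
    norm = np.linalg.norm(x * r)
    if norm < 1e-12:
        return x                                       # no improving direction left
    return x - beta * (x2 * r) / norm


# Illustrative data: min -3*x1 - 2*x2 s.t. x1 + x2 + x3 = 4, x1 + x4 = 3.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([-3.0, -2.0, 0.0, 0.0])
x = np.array([1.0, 1.0, 2.0, 2.0])                     # strictly interior start
for _ in range(100):
    x = affine_scaling_step(A, c, x)
print(x, c @ x)
```

Each step keeps $Ax = b$ (because $A X_k^2 r = 0$), keeps $x > 0$ (because $\beta < 1$), and strictly decreases the objective whenever $X_k r \ne 0$. On this small example the iterates approach the vertex $(3, 1, 0, 0)$.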

8. Interior Point Methods: The Primal path following (Barrier) method
We want to solve $P: \min\{c^T x : Ax = b,\ x \ge 0\}$.
Use a penalty function to prevent iterates from approaching the boundary of the polyhedron.
Reduce the penalty as the iterates approach an optimal solution (on the boundary).
Given $\mu > 0$, the barrier problem is
$P(\mu): \min\{f_\mu(x) := c^T x - \mu \sum_{j=1}^{n} \log(x_j) : Ax = b\}$.

8. Interior Point Methods: The Barrier method
For any $\mu > 0$ the function $f_\mu(x)$ is strictly convex $\Rightarrow$ the problem $P(\mu)$ has a unique optimal solution $x(\mu)$.
For any $\mu > 0$, $x(\mu) \in \mathrm{int}(X)$, where $X = \{x : Ax = b,\ x \ge 0\}$.
For $\mu = +\infty$, $x(\mu)$ is the analytic center of $X$.
As $\mu \to 0$, $x(\mu) \to x^*$.
The set of solutions $\{x(\mu) : \mu \in (0, \infty)\}$ is known as the Central Path.
How to find $x(\mu)$ (at least approximately)?

Aside: NLP Optimality Conditions
$NLP: \min\{f(x) : Ax = b,\ x \ge 0\}$
$LP(x^*): \min\{\nabla f(x^*)^T x : Ax = b,\ x \ge 0\}$
Theorem: If $x^*$ is an optimal solution of NLP then $x^*$ is an optimal solution of $LP(x^*)$.
Theorem: If $f$ is convex, then $x^*$ is an optimal solution of NLP iff $x^*$ is an optimal solution of $LP(x^*)$.
Theorem: If $x^*$ is an optimal solution of NLP then $x^*$ solves (together with some $y$ and $s$) the KKT system
$Ax^* = b,\ x^* \ge 0$
$A^T y + s = \nabla f(x^*),\ s \ge 0$
$x_j^* s_j = 0,\ j = 1, \dots, n$.
Theorem: If $f$ is convex, then $x^*$ is an optimal solution of NLP iff $x^*$ solves the KKT system above.

8. Interior Point Methods: The Barrier method (contd.)
$x(\mu)$ is a solution of the KKT system for the Barrier problem:
$Ax = b,\ x > 0$
$A^T y + s = c,\ s > 0$
$x_j s_j = \mu,\ j = 1, \dots, n$.
The system is nonlinear $\Rightarrow$ difficult to solve.
We are content with $\beta$-approximate solutions ($0 < \beta < 1$):
$Ax = b,\ x > 0$
$A^T y + s = c,\ s > 0$
$\sum_{j=1}^{n} \left( \frac{x_j s_j}{\mu} - 1 \right)^2 \le \beta^2$.
For fixed $\beta$, $\lim_{\mu \to 0} x_\beta(\mu) = \lim_{\mu \to 0} x(\mu) = x^*$.

8. Interior Point Methods: The Barrier method (contd.)
Let $\beta = 1/2$. Start with some $\mu_k > 0$ and a $\beta$-approximation $x^k$ of $x(\mu_k)$.
Linearize the KKT system around $x^k$ and solve it to get the new solution $x^{k+1}$.
It can be shown that $x^{k+1}$ is a $\beta$-approximation of $x(\mu_{k+1})$ with $\mu_{k+1} = \left(1 - \frac{1}{2 + 4\sqrt{n}}\right) \mu_k$.
Continue until the duality gap $(x^k)^T s^k \le \epsilon$.
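
The loop above can be sketched in Python by taking one full Newton step on the linearized KKT system per reduction of $\mu$. This is a hedged sketch, not necessarily the update used in class: it linearizes in all of $(x, y, s)$ (a primal-dual step), it assumes the starting triple is strictly feasible and close to the central path, and it does not verify the $\beta$-approximation property along the way. The function name `path_following` and the starting point are hypothetical.

```python
import numpy as np


def path_following(A, x, y, s, eps=1e-6, max_iter=1000):
    """Short-step path following from a strictly feasible, well-centred (x, y, s)."""
    n = len(x)
    mu = x @ s / n
    shrink = 1.0 - 1.0 / (2.0 + 4.0 * np.sqrt(n))   # mu_{k+1} = shrink * mu_k
    for k in range(max_iter):
        if x @ s <= eps:                             # duality gap small enough
            return x, y, s, k
        mu *= shrink
        # Linearized KKT system: A dx = 0, A^T dy + ds = 0, S dx + X ds = mu*e - XSe.
        d = x / s
        rhs = mu / s - x                             # S^{-1}(mu*e - XSe)
        dy = np.linalg.solve((A * d) @ A.T, -A @ rhs)
        ds = -A.T @ dy
        dx = rhs - d * ds
        x, y, s = x + dx, y + dy, s + ds
    raise RuntimeError("iteration limit reached")


# Illustrative data: min -3*x1 - 2*x2 s.t. x1 + x2 + x3 = 4, x1 + x4 = 3.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
# A point on the central path for mu = 1.875 (x_j * s_j = 1.875 for every j).
x0 = np.array([1.5, 1.875, 0.625, 1.5])
y0 = np.array([-3.0, -1.25])
s0 = np.array([1.25, 1.0, 3.0, 1.25])
x, y, s, iters = path_following(A, x0, y0, s0)
print(x, x @ s, iters)
```

Because the step keeps $A\,dx = 0$ and $A^T dy + ds = 0$, the directions satisfy $dx^T ds = 0$, so the duality gap after each full step equals $n \mu_{k+1}$ and shrinks by the same factor as $\mu$.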

8. Interior Point Methods: The Barrier Method (contd.)
Theorem: The barrier algorithm reduces the duality gap from $\epsilon_0$ to $\epsilon$ in $O\left(\sqrt{n} \log \frac{\epsilon_0}{\epsilon}\right)$ iterations.
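
The iteration count can be read off from the update of $\mu$. The following is an informal derivation, assuming the duality gap stays proportional to $\mu_k$ (exactly $n \mu$ on the central path). Write $\theta = 1/(2 + 4\sqrt{n})$, so that $\mu_k = (1 - \theta)^k \mu_0$:

```latex
(1 - \theta)^k \, \epsilon_0 \;\le\; \epsilon
\quad\Longleftarrow\quad
k \;\ge\; \frac{1}{\theta} \ln\frac{\epsilon_0}{\epsilon}
  \;=\; (2 + 4\sqrt{n}) \, \ln\frac{\epsilon_0}{\epsilon}
  \;=\; O\!\left(\sqrt{n}\, \log\frac{\epsilon_0}{\epsilon}\right).
```

The implication uses $\ln(1 - \theta) \le -\theta$.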

Not covered: Network Flow Problems
A very important class of problems.
Constraint matrix has a very special structure, called a Network matrix.
Specialized Simplex-type algorithm is strongly polynomial time.
E.g. Transportation and Assignment Problems.

What's next? Optimization Courses in SP 04
ISyE 6662: Optimization II. Ph.D. level class on Integer Programming and Network Flows. Offered by Prof. Ergun.
ISyE 8871: Integer Programming. Advanced Ph.D. level class on Integer Programming. Offered by Prof. Nemhauser.
ISyE 6663: Optimization III. Nonlinear programming theory for Ph.D. students. Offered by Prof. Nemirovskii.
ISyE 8813: Advanced Ph.D. class on Interior Point Methods. Offered by Prof. Nemirovskii.
ISyE 6669: Deterministic optimization (MS level).
ISyE 6673: Financial optimization models (MS level). Offered by Prof. Sokol.
ISyE 6679: Computational Methods in Optimization. Offered by Prof. Barnes.