EE 227A: Convex Optimization and Applications                January 31, 2012
Lecture 5: Conic Optimization: Overview
Lecturer: Laurent El Ghaoui
Reading assignment: Chapter 4 of BV. Sections 3.1-3.6 of WTB.

5.1 Linear Programming

5.1.1 Definition and standard forms

Definition. A linear optimization problem (or linear program, LP) is one of the standard form

$$\min_x \; f_0(x) ~:~ f_i(x) \le 0, \;\; i = 1, \ldots, m, \qquad Ax = b,$$

where every function $f_0, f_1, \ldots, f_m$ is affine and $Ax = b$ collects $p$ affine equality constraints. Thus, the feasible set of an LP is a polyhedron.

Standard forms. Linear optimization problems admit several standard forms. One is derived from the general standard form:

$$\min_x \; c^T x + d ~:~ Ax = b, \quad Cx \le h,$$

where the inequalities are understood componentwise.¹ The constant term $d$ in the objective function is, of course, immaterial. Another standard form, used in several off-the-shelf algorithms, is

$$\min_x \; c^T x ~:~ Ax = b, \quad x \ge 0. \tag{5.1}$$

We can always transform the above problem into the previous standard form, and vice versa.

¹ Notice that the convention for componentwise inequalities differs from the one adopted in BV. I will reserve the symbols $\preceq$ and $\succeq$ for negative and positive semi-definiteness of symmetric matrices.

5.1.2 Examples

Piece-wise linear minimization. A piece-wise linear function is the point-wise maximum of affine functions, and has the form

$$f(x) = \max_{1 \le i \le m} \, (a_i^T x + b_i),$$

for appropriate vectors $a_i$ and scalars $b_i$, $i = 1, \ldots, m$. The (unconstrained) problem of minimizing the piece-wise linear function above is not an LP. However, its epigraph form

$$\min_{x,t} \; t ~:~ t \ge a_i^T x + b_i, \;\; i = 1, \ldots, m,$$

is an LP.
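To make the epigraph reformulation concrete, here is a minimal sketch (not part of the original notes) that minimizes a randomly generated piece-wise linear function by solving the epigraph LP above. The cvxpy modeling package and the synthetic data are assumptions of this example.

```python
# Epigraph LP for piece-wise linear minimization:
#     min_{x,t} t  :  t >= a_i^T x + b_i,  i = 1, ..., m.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 5, 3                       # number of affine pieces, dimension of x
A = rng.standard_normal((m, n))   # rows are the vectors a_i
b = rng.standard_normal(m)        # the scalars b_i

x = cp.Variable(n)
t = cp.Variable()

prob = cp.Problem(cp.Minimize(t), [A @ x + b <= t])
prob.solve()

print("optimal value of the epigraph LP:", prob.value)
print("max_i (a_i^T x + b_i) at the optimizer:", np.max(A @ x.value + b))
```

The two printed numbers should agree up to solver tolerance, confirming that the auxiliary variable $t$ tracks the point-wise maximum at the optimum.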

[Figure 5.1: A linear classifier, with a total of four errors on the training set. The sum of the lengths of the dotted lines (which correspond to classification errors) is the value of the loss function at optimum.]

$\ell_1$-norm regression. A related example involves the minimization of the $\ell_1$-norm of a vector that depends affinely on the variable. This arises in regression problems, such as image reconstruction. Using a notation similar to the previous example, the problem has the form

$$\min_x \; \sum_{i=1}^m |a_i^T x + b_i|.$$

The problem is not an LP, but we can introduce slack variables and re-write the above in the equivalent LP format:

$$\min_{x,v} \; \sum_{i=1}^m v_i ~:~ -v_i \le a_i^T x + b_i \le v_i, \;\; i = 1, \ldots, m.$$
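The slack-variable reformulation can be checked numerically. The sketch below is illustrative only: the synthetic data and the cvxpy package are assumptions, not part of the notes.

```python
# l1-norm regression as an LP:  min_{x,v} sum(v) : -v <= Ax + b <= v.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n = 30, 5
A = rng.standard_normal((m, n))   # rows are the a_i^T
b = rng.standard_normal(m)        # the b_i

x = cp.Variable(n)
v = cp.Variable(m)
lp = cp.Problem(cp.Minimize(cp.sum(v)),
                [A @ x + b <= v, A @ x + b >= -v])
lp.solve()

# Cross-check against minimizing ||Ax + b||_1 directly.
x2 = cp.Variable(n)
direct = cp.Problem(cp.Minimize(cp.norm(A @ x2 + b, 1)))
direct.solve()

print("LP value:", lp.value, " direct l1 value:", direct.value)
```

Both values should coincide, since at the optimum each slack $v_i$ equals $|a_i^T x + b_i|$.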

Linear binary classification. Consider a two-class classification problem as shown in Figure 5.1. Given $m$ data points $x_i \in \mathbf{R}^n$, each of which is associated with a label $y_i \in \{-1, 1\}$, the problem is to find a hyperplane that separates, as much as possible, the two classes. The two classes are separable by a hyperplane $H(w, b) = \{x : w^T x + b = 0\}$, where $w \in \mathbf{R}^n$, $w \ne 0$, and $b \in \mathbf{R}$, if and only if $w^T x_i + b \ge 0$ for $y_i = +1$, and $w^T x_i + b \le 0$ for $y_i = -1$. Thus, the conditions on $(w, b)$

$$y_i (w^T x_i + b) \ge 0, \;\; i = 1, \ldots, m \tag{5.2}$$

ensure that the data set is separable by a linear classifier. In this case, the parameters $(w, b)$ allow us to predict the label associated with a new point $x$, via $y = \mathrm{sign}(w^T x + b)$. The feasibility problem of finding $(w, b)$ that satisfy the above separability constraints is an LP. If the data set is strictly separable (every condition in (5.2) holds strictly), then we can re-scale the constraints and transform them into

$$y_i (w^T x_i + b) \ge 1, \;\; i = 1, \ldots, m.$$

In practice, the two classes may not be linearly separable. In this case, we would like to minimize, by proper choice of the hyperplane parameters, the total number of classification errors. Our objective function has the form

$$\sum_{i=1}^m \psi\big(y_i (w^T x_i + b)\big),$$

where $\psi(t) = 1$ if $t < 0$, and $0$ otherwise. Unfortunately, the above objective function is not convex, and hard to minimize. We can replace it by an upper bound, which is called the hinge function, $h(t) = (1 - t)_+ = \max(0, 1 - t)$. Our problem becomes one of minimizing a piece-wise linear loss function:

$$\min_{w,b} \; \sum_{i=1}^m \big(1 - y_i (w^T x_i + b)\big)_+.$$

In the above form, the problem is not yet an LP. We may introduce slack variables to obtain the LP form:

$$\min_{w,b,v} \; \sum_{i=1}^m v_i ~:~ v \ge 0, \quad y_i (w^T x_i + b) \ge 1 - v_i, \;\; i = 1, \ldots, m.$$

The above can be seen as a variant of the separability conditions (5.2), where we allow for infeasibilities and seek to minimize their sum. The value of the loss function at optimum can be read from Figure 5.1: it is the sum of the lengths of the dotted lines, from the data points that are wrongly classified, to the hyperplane.
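The hinge-loss LP above translates almost line by line into a modeling language. The following is a hedged sketch with synthetic, possibly non-separable data; cvxpy is an assumed dependency, not part of the notes.

```python
# Hinge-loss linear classification as an LP:
#     min_{w,b,v} sum(v) : v >= 0,  y_i (w^T x_i + b) >= 1 - v_i.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
m, n = 40, 2
X = rng.standard_normal((m, n))                      # data points x_i as rows
y = np.sign(X @ np.array([1.0, -2.0]) + 0.3 * rng.standard_normal(m))

w = cp.Variable(n)
b = cp.Variable()
v = cp.Variable(m)

constraints = [v >= 0,
               cp.multiply(y, X @ w + b) >= 1 - v]   # y_i (w^T x_i + b) >= 1 - v_i
prob = cp.Problem(cp.Minimize(cp.sum(v)), constraints)
prob.solve()

print("total hinge loss at optimum:", prob.value)
print("training error rate:", np.mean(np.sign(X @ w.value + b.value) != y))
```

A zero optimal value means the rescaled separability constraints can be met exactly; a positive value is the minimized sum of infeasibilities.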

Network flow. Consider a network (directed graph) having $m$ nodes connected by $n$ directed arcs (ordered pairs $(i, j)$). We assume there is at most one arc from node $i$ to node $j$, and no self-loops. We define the arc-node incidence matrix $A \in \mathbf{R}^{m \times n}$ to be the matrix with coefficients $A_{ij} = 1$ if arc $j$ starts at node $i$, $-1$ if it ends there, and $0$ otherwise. Note that the column sums of $A$ are zero: $\mathbf{1}^T A = 0$. A flow (of traffic, information, charge) is represented by a vector $x \in \mathbf{R}^n$, and the total flow leaving node $i$ is then $(Ax)_i = \sum_{j=1}^n A_{ij} x_j$. The minimum cost network flow problem has the LP form

$$\min_x \; c^T x ~:~ Ax = b, \quad l \le x \le u,$$

where $c_i$ is the cost of flow through arc $i$, the vectors $l, u$ provide lower and upper bounds on $x$, and $b \in \mathbf{R}^m$ is an external supply vector. This vector may have positive or negative components, as it represents supply and demand. We assume that $\mathbf{1}^T b = 0$, so that the total supply equals the total demand. The constraint $Ax = b$ represents the balance equations of the network.

A more specific example is the max flow problem, where we seek to maximize the flow between node 1 (the source) and node $m$ (the sink). It bears the form

$$\max_{x,t} \; t ~:~ Ax = t e, \quad l \le x \le u,$$

with $e = (1, 0, \ldots, 0, -1)$.

LP relaxation of boolean problems. A boolean optimization problem is one where the variables are constrained to be boolean. An example of a boolean problem is the so-called boolean LP

$$p^* = \min_x \; c^T x ~:~ Ax \le b, \quad x \in \{0, 1\}^n.$$

Such problems are non-convex, and usually hard to solve. The LP relaxation takes the form

$$p^*_{\mathrm{LP}} := \min_x \; c^T x ~:~ Ax \le b, \quad 0 \le x \le 1.$$

The relaxation provides a lower bound on the original problem: $p^*_{\mathrm{LP}} \le p^*$. Hence, its optimal points may not be feasible (not boolean). Even though a solution of the LP relaxation may not necessarily be boolean, we can often interpret it as a fractional solution to the original problem. For example, in a graph coloring problem, the LP relaxation colors the nodes of the graph not with a single color, but with many.

Boolean problems are not always hard to solve. Indeed, in some cases, one can show that the LP relaxation provides an exact solution to the boolean problem, as optimal points turn out to be boolean. A few examples in this category involve network flow problems with boolean variables:

- The weighted bipartite matching problem is to match $N$ people to $N$ tasks, in a one-to-one fashion. The cost of matching person $i$ to task $j$ is $A_{ij}$. The problem reads

  $$\min_x \; \sum_{i,j=1}^N A_{ij} x_{ij} ~:~ x_{ij} \in \{0, 1\}, \quad \sum_{i=1}^N x_{ij} = 1 \;\;\forall j \;\text{(one person for each task)}, \quad \sum_{j=1}^N x_{ij} = 1 \;\;\forall i \;\text{(one task for each person)}.$$

- The shortest path problem has the form

  $$\min_x \; \mathbf{1}^T x ~:~ Ax = (1, 0, \ldots, 0, -1), \quad x \in \{0, 1\}^n,$$

  where $A$ stands for the incidence matrix of the network, and the arcs with $x_i = 1$ form a shortest forward path between nodes 1 and $m$. As before, the LP relaxation in this case is exact, in the sense that its solution is boolean. (The LP relaxation problem can be solved very efficiently with specialized algorithms.)
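The exactness claim can be illustrated numerically for the weighted bipartite matching problem. The sketch below is not from the notes: the cost matrix is made up, cvxpy is an assumed modeling package, and the integrality of the solver output is only checked up to numerical tolerance.

```python
# LP relaxation of weighted bipartite matching: with the boolean constraint
# x_ij in {0,1} relaxed to 0 <= x_ij <= 1, the optimal point is (generically)
# a permutation matrix, i.e. the relaxation is exact.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
N = 4
A = rng.random((N, N))            # A[i, j]: cost of matching person i to task j

X = cp.Variable((N, N))
constraints = [X >= 0, X <= 1,
               cp.sum(X, axis=0) == 1,    # one person for each task
               cp.sum(X, axis=1) == 1]    # one task for each person
prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(A, X))), constraints)
prob.solve()

print(np.round(X.value, 3))       # (close to) a 0/1 permutation matrix
print("relaxed optimal cost:", prob.value)
```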

5.2 Overview of conic optimization

Conic optimization models. The linear optimization model can be written in standard form as

$$\min_x \; c^T x ~:~ Ax = b, \quad x \ge 0,$$

where we express the feasible set as the intersection of an affine subspace $\{x : Ax = b\}$ with the non-negative orthant, $\mathbf{R}^n_+$. One can think of the linear equality constraints, and the objective, as the part of the problem that involves the data $(A, b, c)$, while the sign constraints describe its structure.

With the advent of provably polynomial-time methods for linear optimization in the late 70s, researchers tried to generalize the linear optimization model in a way that retained the nice complexity properties of the linear model. Early attempts at generalizing the above model focused on allowing the linear map $Ax$ to be nonlinear. Unfortunately, as soon as we introduce non-linearities in the equality constraints, the model becomes non-convex and potentially intractable numerically. Thus, modifying the linear equality constraints is probably not the right way to go. Instead, one can try to modify the structural constraint $x \in \mathbf{R}^n_+$. If one replaces the non-negative orthant with another set $K$, then we obtain a generalization of linear optimization. Clearly, we need $K$ to be a convex set, and we can further assume it to be a cone (if not, we can always introduce a new variable and a new equality constraint in order to satisfy this condition). Hence the motivation for the so-called conic optimization model:

$$\min_x \; c^T x ~:~ Ax = b, \quad x \in K, \tag{5.3}$$

where $K$ is a given convex cone. The issue then becomes one of finding those convex cones $K$ for which one can adapt the efficient methods invented for linear optimization to the conic problem above. A nice theory due to Nesterov and Nemirovski, developed in the late 80s, allows one to identify a rich class of cones for which the corresponding conic optimization problem is numerically tractable. We refer to this class as tractable conic optimization.

Tractable conic optimization. The cones that are allowed in tractable conic optimization are of three basic types, and include any combination (as detailed below) of these three basic types. The three basic cones are:

- The non-negative orthant, $\mathbf{R}^n_+$. (Hence, conic optimization includes linear optimization as a special case.)
- The second-order cone, $\mathcal{Q}^n := \{(x, t) \in \mathbf{R}^{n+1} : t \ge \|x\|_2\}$.
- The semi-definite cone, $\mathcal{S}^n_+ = \{X = X^T \succeq 0\}$.
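As a small, hedged illustration of the conic model (5.3) with $K$ a second-order cone, the sketch below casts a least-norm problem in conic form. The synthetic data and the cvxpy package are assumptions of the example.

```python
# A least-norm problem in conic form:  min_{x,t} t  :  (Ax - b, t) in Q^m,
# i.e. ||Ax - b||_2 <= t. Introducing the auxiliary vector y = Ax - b turns
# this into an instance of (5.3) with K a second-order cone.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
t = cp.Variable()

prob = cp.Problem(cp.Minimize(t), [cp.SOC(t, A @ x - b)])   # second-order cone constraint
prob.solve()

print("conic optimal value:", prob.value)
print("||Ax - b||_2 at the optimizer:", np.linalg.norm(A @ x.value - b))
```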

A variation on the second-order cone, which is useful in applications, involves the rotated second-order cone

$$\mathcal{Q}^n_{\mathrm{rot}} := \{(x, y, z) \in \mathbf{R}^{n+2} : 2yz \ge \|x\|_2^2, \; y \ge 0, \; z \ge 0\}.$$

We can easily convert the rotated second-order cone into the ordinary second-order cone representation, since the constraints $2yz \ge \|x\|_2^2$, $y \ge 0$, $z \ge 0$ are equivalent to

$$\left\| \begin{pmatrix} \sqrt{2}\, x \\ y - z \end{pmatrix} \right\|_2 \le y + z.$$

We can build all sorts of cones that are admissible for the tractable conic optimization model, using combinations of these cones. For example, in a specific instance of the problem, we might have constraints of the form

$$x_1 \ge 0, \qquad x_3 \ge \sqrt{x_1^2 + x_2^2}, \qquad \begin{pmatrix} x_2 & x_4 \\ x_4 & x_5 \end{pmatrix} \succeq 0.$$

The above set of constraints involves the non-negative orthant (first constraint), the second-order cone (second constraint), and the semi-definite cone (third constraint). We can always introduce new variables and equality constraints to make sure that the cone $K$ is a direct product of the form $K_1 \times \cdots \times K_m$, where each $K_i$ is a cone of one of the three basic types above. In the example above, since the variable $x_2$ appears in two of the cones, we add a new variable $x_6$ and the equality constraint $x_2 = x_6$. With that constraint, the constraints above can be written as $x = (x_1, \ldots, x_6) \in K$, where $K$ is the direct product $\mathbf{R}_+ \times \mathcal{Q}^2 \times \mathcal{S}^2_+$. (A small numerical sketch of this construction is given after the exercises below.)

Note that the three basic cones are nested, in the sense that we can interpret the non-negative orthant as the projection of a direct product of second-order cones on a subspace (think of imposing $x = 0$ in the definition of $\mathcal{Q}^n$). Likewise, a projection of the semi-definite cone on a specific subspace gives the second-order cone, since

$$t \ge \|x\|_2 \quad \Longleftrightarrow \quad \begin{pmatrix} t & x_1 & \cdots & x_n \\ x_1 & t & & 0 \\ \vdots & & \ddots & \\ x_n & 0 & & t \end{pmatrix} \succeq 0. \tag{5.4}$$

(The proof of this equivalence hinges on the Schur complement lemma; see BV, pages 650-651.)

Exercises

1. Prove that any linear program can be expressed in the standard form (5.1). Justify your steps carefully.

2. Prove the result (5.4).
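Finally, here is the hedged numerical sketch of the three-cone example mentioned above (the constraints $x_1 \ge 0$, $x_3 \ge \sqrt{x_1^2 + x_2^2}$, and the $2 \times 2$ semi-definite constraint). cvxpy is an assumed modeling tool; an extra symmetric matrix variable and equality constraints play the role of $x_6$ and the direct-product bookkeeping described in the text, and the objective is arbitrary, chosen only to make the problem complete.

```python
# One constraint per basic cone, tied together by equality constraints,
# mirroring the direct product K = R_+ x Q^2 x S^2_+ discussed in the notes.
import cvxpy as cp

x = cp.Variable(5)                          # x[0..4] stand for x1, ..., x5
Z = cp.Variable((2, 2), symmetric=True)     # plays the role of the PSD block

constraints = [
    x[0] >= 0,                              # non-negative orthant
    cp.norm(x[0:2], 2) <= x[2],             # second-order cone: ||(x1, x2)|| <= x3
    Z >> 0,                                 # semi-definite cone
    Z[0, 0] == x[1],                        # coupling constraints in the spirit of x6 = x2
    Z[0, 1] == x[3],
    Z[1, 1] == x[4],
]
prob = cp.Problem(cp.Minimize(cp.sum(x)), constraints)   # arbitrary illustrative objective
prob.solve()

print("status:", prob.status, " optimal value:", prob.value)
```

The optimal value here is 0, attained at the origin; the point of the sketch is only the constraint structure, not the particular optimum.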