Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725




Last time: duality in linear programs

Given $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $G \in \mathbb{R}^{r \times n}$, $h \in \mathbb{R}^r$:

Primal LP:
\[
\min_{x \in \mathbb{R}^n} \; c^T x
\quad \text{subject to} \quad Ax = b, \;\; Gx \leq h
\]

Dual LP:
\[
\max_{u \in \mathbb{R}^m, \, v \in \mathbb{R}^r} \; -b^T u - h^T v
\quad \text{subject to} \quad -A^T u - G^T v = c, \;\; v \geq 0
\]

Explanation: for any $u$ and $v \geq 0$, and any primal feasible $x$,
\[
u^T (Ax - b) + v^T (Gx - h) \leq 0,
\quad \text{i.e.,} \quad
(-A^T u - G^T v)^T x \geq -b^T u - h^T v
\]
So if $c = -A^T u - G^T v$, we get a bound on the primal optimal value.

Explanation #2: for any $u$ and $v \geq 0$, and any primal feasible $x$,
\[
c^T x \geq c^T x + u^T (Ax - b) + v^T (Gx - h) := L(x, u, v)
\]
So if $C$ denotes the primal feasible set and $f^\star$ the primal optimal value, then for any $u$ and $v \geq 0$,
\[
f^\star \geq \min_{x \in C} L(x, u, v) \geq \min_{x} L(x, u, v) := g(u, v)
\]
In other words, $g(u, v)$ is a lower bound on $f^\star$ for any $u$ and $v \geq 0$. Note that
\[
g(u, v) =
\begin{cases}
-b^T u - h^T v & \text{if } c = -A^T u - G^T v \\
-\infty & \text{otherwise}
\end{cases}
\]
This second explanation reproduces the same dual, but it is completely general and applies to arbitrary optimization problems (even nonconvex ones).
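As a sanity check, this bound is easy to verify numerically. Below is a minimal numpy sketch with hypothetical data (a two-variable LP): we pick $(u, v)$ satisfying $v \geq 0$ and $c = -A^T u - G^T v$, and confirm that $-b^T u - h^T v$ lower-bounds $c^T x$ at a feasible $x$.

```python
import numpy as np

# Tiny LP in the slide's form: min c^T x  s.t.  Ax = b, Gx <= h
# (hypothetical data, chosen so a dual feasible point is easy to write down)
c = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])   # x1 + x2 = 1
G = -np.eye(2);             h = np.zeros(2)       # x >= 0

# Dual feasible (u, v): v >= 0 and c = -A^T u - G^T v
u = np.array([-1.0])
v = np.zeros(2)
assert np.allclose(-A.T @ u - G.T @ v, c)

# Weak duality: every primal feasible x satisfies c^T x >= -b^T u - h^T v
x = np.array([0.3, 0.7])            # Ax = b and x >= 0, so feasible
bound = -b @ u - h @ v              # dual objective = 1.0
print(float(c @ x), float(bound))   # 1.0 1.0 -- the bound happens to be tight here
```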

Outline

Today:
- Lagrange dual function
- Lagrange dual problem
- Weak and strong duality
- Examples
- Preview of duality uses

Lagrangian

Consider the general minimization problem
\[
\min_{x} \; f(x)
\quad \text{subject to} \quad
h_i(x) \leq 0, \; i = 1, \dots, m, \quad
\ell_j(x) = 0, \; j = 1, \dots, r
\]
It need not be convex, but of course we will pay special attention to the convex case.

We define the Lagrangian as
\[
L(x, u, v) = f(x) + \sum_{i=1}^{m} u_i h_i(x) + \sum_{j=1}^{r} v_j \ell_j(x)
\]
with new variables $u \in \mathbb{R}^m$, $v \in \mathbb{R}^r$, where $u \geq 0$ (implicitly, we define $L(x, u, v) = -\infty$ for $u < 0$).

Important property: for any $u \geq 0$ and $v$,
\[
f(x) \geq L(x, u, v) \quad \text{at each feasible } x
\]
Why? For feasible $x$,
\[
L(x, u, v) = f(x) + \underbrace{\sum_{i=1}^{m} u_i h_i(x)}_{\leq 0} + \underbrace{\sum_{j=1}^{r} v_j \ell_j(x)}_{= 0} \leq f(x)
\]

[Figure, from B&V page 217: the solid line is $f$; the dashed line is $h$, hence the feasible set is $[-0.46, 0.46]$. Each dotted line shows $L(x, u, v)$ for a different choice of $u \geq 0$ and $v$.]
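This property is easy to test numerically. A small sketch with a hypothetical one-dimensional instance ($f(x) = x^2$, one inequality constraint $h(x) = 1 - x \leq 0$), checking $f(x) \geq L(x, u)$ over a grid of feasible points:

```python
import numpy as np

# Hypothetical 1-D instance: f(x) = x^2, inequality h(x) = 1 - x <= 0
f = lambda x: x**2
h = lambda x: 1 - x

L = lambda x, u: f(x) + u * h(x)     # Lagrangian (no equality constraints here)

xs = np.linspace(1.0, 3.0, 50)       # feasible points: h(x) <= 0
for u in [0.0, 0.5, 2.0]:            # any u >= 0
    assert np.all(L(xs, u) <= f(xs)) # u * h(x) <= 0 on the feasible set
print("f(x) >= L(x, u) at every feasible x, for each u >= 0 tried")
```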

Lagrange dual function

Let $C$ denote the primal feasible set and $f^\star$ the primal optimal value. Minimizing $L(x, u, v)$ over all $x$ gives a lower bound:
\[
f^\star \geq \min_{x \in C} L(x, u, v) \geq \min_{x} L(x, u, v) := g(u, v)
\]
We call $g(u, v)$ the Lagrange dual function, and it gives a lower bound on $f^\star$ for any $u \geq 0$ and $v$, called dual feasible $u, v$.

[Figure, from B&V page 217: the dashed horizontal line is $f^\star$; the dual variable $\lambda$ is (our $u$); the solid line shows $g(\lambda)$. Neither the objective nor the constraint function is convex, but the dual function is concave.]

Example: quadratic program

Consider the quadratic program
\[
\min_{x \in \mathbb{R}^n} \; \frac{1}{2} x^T Q x + c^T x
\quad \text{subject to} \quad Ax = b, \; x \geq 0
\]
where $Q \succ 0$. Lagrangian:
\[
L(x, u, v) = \frac{1}{2} x^T Q x + c^T x - u^T x + v^T (Ax - b)
\]
Lagrange dual function:
\[
g(u, v) = \min_{x \in \mathbb{R}^n} L(x, u, v)
= -\frac{1}{2} (c - u + A^T v)^T Q^{-1} (c - u + A^T v) - b^T v
\]
For any $u \geq 0$ and any $v$, this is a lower bound on the primal optimal value $f^\star$.
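The closed-form dual function can be checked against a direct minimization of the Lagrangian. A numpy sketch with randomly generated (hypothetical) problem data: the unconstrained minimizer of $L$ solves $Qx = -(c - u + A^T v)$, and plugging it back in should reproduce the formula; we also confirm $g(u, v) \leq f(x)$ at a feasible $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 1
M = rng.standard_normal((n, n)); Q = M @ M.T + n * np.eye(n)   # Q > 0
c = rng.standard_normal(n)
A = np.ones((m, n)); b = np.array([1.0])    # hypothetical equality: sum(x) = 1

def g(u, v):
    w = c - u + A.T @ v
    return -0.5 * w @ np.linalg.solve(Q, w) - b @ v

def L(x, u, v):
    return 0.5 * x @ Q @ x + c @ x - u @ x + v @ (A @ x - b)

u = np.abs(rng.standard_normal(n)); v = rng.standard_normal(m)

# 1) the formula really is min_x L: the minimizer solves Qx = -(c - u + A^T v)
xstar = np.linalg.solve(Q, -(c - u + A.T @ v))
assert np.isclose(L(xstar, u, v), g(u, v))

# 2) weak duality: g(u, v) <= f(x) at any primal feasible x
x_feas = np.array([0.2, 0.3, 0.5])          # A x = b and x >= 0
f_val = 0.5 * x_feas @ Q @ x_feas + c @ x_feas
assert g(u, v) <= f_val + 1e-12
print("closed-form g matches min_x L, and g(u, v) <= f(x) at a feasible x")
```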

Same problem, but now $Q \succeq 0$ (possibly singular):
\[
\min_{x \in \mathbb{R}^n} \; \frac{1}{2} x^T Q x + c^T x
\quad \text{subject to} \quad Ax = b, \; x \geq 0
\]
Lagrangian:
\[
L(x, u, v) = \frac{1}{2} x^T Q x + c^T x - u^T x + v^T (Ax - b)
\]
Lagrange dual function:
\[
g(u, v) =
\begin{cases}
-\frac{1}{2} (c - u + A^T v)^T Q^{+} (c - u + A^T v) - b^T v & \text{if } c - u + A^T v \perp \mathrm{null}(Q) \\
-\infty & \text{otherwise}
\end{cases}
\]
where $Q^{+}$ denotes the generalized inverse of $Q$. For any $u \geq 0$, $v$, and $c - u + A^T v \perp \mathrm{null}(Q)$, $g(u, v)$ is a nontrivial lower bound on $f^\star$.
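When $Q$ is singular, numpy's pseudoinverse plays the role of $Q^+$. A small sketch with a hypothetical rank-deficient $Q$, checking the orthogonality condition and evaluating the finite branch of $g$:

```python
import numpy as np

# Hypothetical rank-1 PSD matrix in R^{2x2}
Q = np.array([[1.0, 0.0], [0.0, 0.0]])   # null(Q) = span{e2}
c = np.array([1.0, 0.0])
A = np.array([[0.0, 1.0]]); b = np.array([0.0])

Qp = np.linalg.pinv(Q)                   # generalized inverse Q^+

u = np.array([0.5, 0.0]); v = np.array([0.0])
w = c - u + A.T @ v                      # = [0.5, 0.0]

# Condition for a finite dual value: w must be orthogonal to null(Q)
null_vec = np.array([0.0, 1.0])
assert np.isclose(w @ null_vec, 0.0)

g = -0.5 * w @ Qp @ w - b @ v            # finite lower bound on f*
print(float(g))                          # -0.125
```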

Example: quadratic program in 2D

We choose $f(x)$ to be quadratic in 2 variables, subject to $x \geq 0$. The dual function $g(u)$ is also quadratic in 2 variables, also subject to $u \geq 0$.

[Figure: surface plots of the primal objective $f(x_1, x_2)$ and the dual function $g(u_1, u_2)$.]

The dual function $g(u)$ provides a bound on $f^\star$ for every $u \geq 0$. The largest bound this gives us turns out to be exactly $f^\star$... coincidence? More on this later, via the KKT conditions.

Lagrange dual problem

Given the primal problem
\[
\min_{x} \; f(x)
\quad \text{subject to} \quad
h_i(x) \leq 0, \; i = 1, \dots, m, \quad
\ell_j(x) = 0, \; j = 1, \dots, r
\]
our constructed dual function $g(u, v)$ satisfies $f^\star \geq g(u, v)$ for all $u \geq 0$ and $v$. Hence the best lower bound is obtained by maximizing $g(u, v)$ over all dual feasible $u, v$, yielding the Lagrange dual problem:
\[
\max_{u, v} \; g(u, v) \quad \text{subject to} \quad u \geq 0
\]
Key property, called weak duality: if the dual optimal value is $g^\star$, then
\[
f^\star \geq g^\star
\]
Note that this always holds (even if the primal problem is nonconvex).

Another key property: the dual problem is a convex optimization problem (as written, it is a concave maximization problem). Again, this is always true (even when the primal problem is not convex).

By definition:
\[
g(u, v) = \min_{x} \left\{ f(x) + \sum_{i=1}^{m} u_i h_i(x) + \sum_{j=1}^{r} v_j \ell_j(x) \right\}
= -\underbrace{\max_{x} \left\{ -f(x) - \sum_{i=1}^{m} u_i h_i(x) - \sum_{j=1}^{r} v_j \ell_j(x) \right\}}_{\text{pointwise maximum of convex functions in } (u, v)}
\]
I.e., $g$ is concave in $(u, v)$, and $u \geq 0$ is a convex constraint, hence the dual problem is a concave maximization problem.
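Concavity of $g$ can be probed numerically even when $f$ is nonconvex. A sketch with a hypothetical nonconvex objective, approximating the minimization over $x$ by a grid (so $g$ is exactly a pointwise minimum of functions affine in $u$) and running a midpoint-concavity check:

```python
import numpy as np

# Hypothetical nonconvex instance: f(x) = cos(3x) + x^2/10, h(x) = x - 1 <= 0
xs = np.linspace(-4, 4, 2001)           # grid stand-in for the min over x
f = np.cos(3 * xs) + xs**2 / 10
h = xs - 1.0

def g(u):
    return np.min(f + u * h)            # min over the grid of L(x, u)

us = np.linspace(0, 5, 100)
for a, b in zip(us[:-2], us[2:]):       # midpoint-concavity check
    mid = 0.5 * (a + b)
    assert g(mid) >= 0.5 * g(a) + 0.5 * g(b) - 1e-10
print("g passed a midpoint-concavity check on [0, 5]")
```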

Example: nonconvex quartic minimization

Define $f(x) = x^4 - 50 x^2 + 100 x$ (nonconvex), and minimize it subject to the constraint $x \geq -4.5$.

[Figure: left, the primal objective $f(x)$ for $x \in [-10, 10]$; right, the dual function $g(v)$ for $v \in [0, 100]$.]

The dual function $g$ can be derived explicitly, via the closed-form expression for the roots of a cubic equation.

The form of $g$ is quite complicated:
\[
g(u) = \min_{i = 1, 2, 3} \; \left\{ F_i^4(u) - 50 F_i^2(u) + 100 F_i(u) \right\}
\]
where for $i = 1, 2, 3$,
\[
F_i(u) = \frac{a_i}{12 \cdot 2^{1/3}} \left( 432 (100 - u) + \left( 432^2 (100 - u)^2 - 4 \cdot 1200^3 \right)^{1/2} \right)^{1/3}
- \frac{100 \cdot 2^{1/3} \, \bar{a}_i}{\left( 432 (100 - u) + \left( 432^2 (100 - u)^2 - 4 \cdot 1200^3 \right)^{1/2} \right)^{1/3}}
\]
and $a_1 = 1$, $a_2 = (-1 + i\sqrt{3})/2$, $a_3 = (-1 - i\sqrt{3})/2$.

Without the context of duality it would be difficult to tell whether or not $g$ is concave... but we know it must be!
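Rather than the closed-form radicals, the same dual function can be evaluated numerically: with $h(x) = -x - 4.5 \leq 0$, stationarity of the Lagrangian in $x$ is the cubic $4x^3 - 100x + (100 - u) = 0$, which `np.roots` solves directly. A sketch, also checking weak duality against a grid estimate of $f^\star$:

```python
import numpy as np

# g(u) = min_x { x^4 - 50 x^2 + 100 x + u(-4.5 - x) } for the slide's problem
def g(u):
    roots = np.roots([4.0, 0.0, -100.0, 100.0 - u])   # 4x^3 - 100x + (100 - u)
    real = roots[np.abs(roots.imag) < 1e-9].real
    L = real**4 - 50 * real**2 + 100 * real + u * (-4.5 - real)
    return L.min()

# f* by a fine grid over the feasible set x >= -4.5
xs = np.linspace(-4.5, 10, 200001)
fstar = (xs**4 - 50 * xs**2 + 100 * xs).min()

for u in [0.0, 20.0, 60.0, 100.0]:      # any u >= 0 gives a lower bound
    assert g(u) <= fstar + 1e-6
print(round(fstar, 2))                  # -1052.44 (the boundary x = -4.5 is optimal)
```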

Strong duality

Recall that we always have $f^\star \geq g^\star$ (weak duality). On the other hand, in some problems we have observed that actually
\[
f^\star = g^\star
\]
which is called strong duality.

Slater's condition: if the primal is a convex problem (i.e., $f$ and $h_1, \dots, h_m$ are convex and $\ell_1, \dots, \ell_r$ are affine), and there exists at least one strictly feasible $x \in \mathbb{R}^n$, meaning
\[
h_1(x) < 0, \dots, h_m(x) < 0 \quad \text{and} \quad \ell_1(x) = 0, \dots, \ell_r(x) = 0
\]
then strong duality holds.

This is a pretty weak condition. (Further refinement: we only need strict inequalities over the functions $h_i$ that are not affine.)

LPs: back to where we started

For linear programs:
- It is easy to check that the dual of the dual LP is the primal LP.
- Refined version of Slater's condition: strong duality holds for an LP if it is feasible.
- Applying the same logic to its dual LP: strong duality holds if the dual is feasible.

Hence strong duality holds for LPs, except when both the primal and the dual are infeasible. In other words, we nearly always have strong duality for LPs.

Example: support vector machine dual

Given $y \in \{-1, 1\}^n$ and $X \in \mathbb{R}^{n \times p}$ with rows $x_1, \dots, x_n$, recall the support vector machine problem:
\[
\min_{\beta, \beta_0, \xi} \; \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
\xi_i \geq 0, \quad y_i (x_i^T \beta + \beta_0) \geq 1 - \xi_i, \quad i = 1, \dots, n
\]
Introducing dual variables $v, w \geq 0$, we form the Lagrangian:
\[
L(\beta, \beta_0, \xi, v, w) = \frac{1}{2} \|\beta\|_2^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} v_i \xi_i + \sum_{i=1}^{n} w_i \left( 1 - \xi_i - y_i (x_i^T \beta + \beta_0) \right)
\]

Minimizing over $\beta, \beta_0, \xi$ gives the Lagrange dual function:
\[
g(v, w) =
\begin{cases}
-\frac{1}{2} w^T \tilde{X} \tilde{X}^T w + 1^T w & \text{if } w = C 1 - v, \; w^T y = 0 \\
-\infty & \text{otherwise}
\end{cases}
\]
where $\tilde{X} = \mathrm{diag}(y) X$. Thus the SVM dual problem, after eliminating the slack variable $v$, becomes
\[
\max_{w} \; -\frac{1}{2} w^T \tilde{X} \tilde{X}^T w + 1^T w
\quad \text{subject to} \quad 0 \leq w \leq C 1, \; w^T y = 0
\]
Check: Slater's condition is satisfied, so we have strong duality. Further, from the study of SVMs, you might recall that at optimality
\[
\beta = \tilde{X}^T w
\]
This is not a coincidence, as we'll see later via the KKT conditions.
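Weak duality for the SVM can be checked on toy data: any dual feasible $w$ (i.e., $0 \leq w \leq C1$, $w^T y = 0$) gives a lower bound on the primal objective at any feasible $(\beta, \beta_0, \xi)$. A numpy sketch with a hypothetical four-point dataset:

```python
import numpy as np

# Hypothetical tiny SVM instance: n = 4 points in R^2
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0
Xt = y[:, None] * X                      # X-tilde = diag(y) X

# Dual feasible w: 0 <= w <= C 1 and w^T y = 0
w = np.array([0.5, 0.25, 0.25, 0.5])
assert np.all(w >= 0) and np.all(w <= C) and np.isclose(w @ y, 0.0)
dual = -0.5 * w @ (Xt @ Xt.T) @ w + w.sum()

# A primal feasible point: xi_i = max(0, 1 - y_i (x_i^T beta + beta0))
beta = np.array([0.5, 0.5]); beta0 = 0.0
xi = np.maximum(0.0, 1 - y * (X @ beta + beta0))
primal = 0.5 * beta @ beta + C * xi.sum()

assert dual <= primal + 1e-12            # weak duality
print(float(dual), float(primal))
```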

Duality gap

Given primal feasible $x$ and dual feasible $u, v$, the quantity
\[
f(x) - g(u, v)
\]
is called the duality gap between $x$ and $u, v$. Note that
\[
f(x) - f^\star \leq f(x) - g(u, v)
\]
so if the duality gap is zero, then $x$ is primal optimal (and, similarly, $u, v$ are dual optimal).

From an algorithmic viewpoint, this provides a stopping criterion: if $f(x) - g(u, v) \leq \epsilon$, then we are guaranteed that $f(x) - f^\star \leq \epsilon$. This is very useful, especially in conjunction with iterative methods... more dual uses in coming lectures.

Dual norms

Let $\|x\|$ be a norm, e.g.,
- the $\ell_p$ norm: $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$, for $p \geq 1$
- the trace norm: $\|X\|_{\mathrm{tr}} = \sum_{i=1}^{r} \sigma_i(X)$

We define its dual norm $\|x\|_*$ as
\[
\|x\|_* = \max_{\|z\| \leq 1} z^T x
\]
This gives us the inequality $|z^T x| \leq \|z\| \|x\|_*$, like the Cauchy-Schwarz inequality. Back to our examples:
- $\ell_p$ norm dual: $(\|x\|_p)_* = \|x\|_q$, where $1/p + 1/q = 1$
- trace norm dual: $(\|X\|_{\mathrm{tr}})_* = \|X\|_{\mathrm{op}} = \sigma_{\max}(X)$

Dual norm of the dual norm: it turns out that $\|x\|_{**} = \|x\|$... we'll see connections to duality (including this one) in coming lectures.
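The $\ell_p$/$\ell_q$ duality can be verified numerically: the maximizer of $z^T x$ over $\|z\|_p \leq 1$ has a closed form and achieves $\|x\|_q$, and random $z$ obey the resulting (Hölder) inequality. A sketch with hypothetical data (here $p = 3$, $q = 3/2$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
p, q = 3.0, 1.5                          # 1/p + 1/q = 1

# Closed-form maximizer of z^T x over ||z||_p <= 1: z_i ~ sign(x_i) |x_i|^(q-1)
z = np.sign(x) * np.abs(x) ** (q - 1)
z /= np.linalg.norm(z, p)                # now ||z||_p = 1
assert np.isclose(z @ x, np.linalg.norm(x, q))   # attains the dual norm ||x||_q

# Holder's inequality z^T x <= ||z||_p ||x||_q for random z
for _ in range(100):
    z = rng.standard_normal(5)
    assert abs(z @ x) <= np.linalg.norm(z, p) * np.linalg.norm(x, q) + 1e-12
print("dual-norm maximizer and Holder's inequality verified for p = 3")
```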

References

- S. Boyd and L. Vandenberghe (2004), Convex Optimization, Chapter 5
- R. T. Rockafellar (1970), Convex Analysis, Chapters 28-30