Proximal mapping via network optimization




L. Vandenberghe, EE236C (Spring 2013-14)

Proximal mapping via network optimization

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping

Introduction

this lecture: a network flow algorithm for evaluating the prox-operator of

    h(x) = ∑_{i>j} A_ij |x_i − x_j| = ∑_{i,j=1}^n A_ij max{0, x_i − x_j}

- coefficients A_ij = A_ji are nonnegative
- the associated undirected graph has n nodes, with edges (i,j) when A_ij > 0
- applications in image processing and machine learning
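As a quick sanity check, the two expressions for h can be evaluated directly. This is an illustrative sketch with made-up weights; the function name `h` and the example values are ours, not from the lecture:

```python
import numpy as np

def h(x, A):
    """Evaluate h(x) = sum_{i>j} A_ij |x_i - x_j|
    = sum_{i,j} A_ij max(0, x_i - x_j), for symmetric nonnegative A."""
    diff = np.subtract.outer(x, x)          # diff[i, j] = x_i - x_j
    return np.sum(A * np.maximum(0.0, diff))

# symmetric nonnegative weights on a 3-node graph (illustrative values)
A = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 0.5],
              [2.0, 0.5, 0.0]])
x = np.array([3.0, 1.0, 2.0])
print(h(x, A))   # prints: 4.5
```

The max-form sums over all ordered pairs, but since A is symmetric only the positive difference of each pair contributes, which recovers the absolute-value form over i > j.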

Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping

Minimum cut problem

find a subset I ⊆ {1,2,...,n} that minimizes

    C(I) = ∑_{i∈I, j∉I} A_ij + ∑_{i∈I} b_i

- A ∈ S^n with A_ij = A_ji ≥ 0; no sign restrictions on b ∈ R^n
- the cost can be expressed as

    C(I) = x^T A(1 − x) + b^T x

  where x is the incidence vector of I: x_k = 1 if k ∈ I, x_k = 0 if k ∉ I

graph interpretation

- optimal two-way partition of the n nodes of an undirected graph
- the first term in C(I) is the cost of the cut (edges removed by the partitioning)

Discrete optimization formulations

binary quadratic minimization

    minimize    −x^T A x + (A1 + b)^T x
    subject to  x ∈ {0,1}^n

- the cost function is equal to C(I) if x is the incidence vector of I

binary piecewise-linear minimization

    minimize    ∑_{i>j} A_ij |x_i − x_j| + b^T x
    subject to  x ∈ {0,1}^n

- the cost function is equal to C(I) if x is the incidence vector of I
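The agreement of the cut cost with both discrete formulations can be verified by brute force on a small random instance; the instance data below is illustrative, not from the lecture:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
# random symmetric nonnegative A with zero diagonal, arbitrary b
M = rng.integers(0, 4, size=(n, n)).astype(float)
A = np.triu(M, 1) + np.triu(M, 1).T
b = rng.standard_normal(n)
one = np.ones(n)

cut_vals, quad_vals, pwl_vals = [], [], []
for bits in itertools.product([0.0, 1.0], repeat=n):
    x = np.array(bits)
    cut_vals.append(x @ A @ (one - x) + b @ x)             # C(I)
    quad_vals.append(-x @ A @ x + (A @ one + b) @ x)       # quadratic form
    diff = np.abs(np.subtract.outer(x, x))
    pwl_vals.append(np.sum(np.triu(A, 1) * diff) + b @ x)  # piecewise-linear form
print(np.allclose(cut_vals, quad_vals), np.allclose(cut_vals, pwl_vals))
# prints: True True
```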

Convex relaxation

relaxation: replace x ∈ {0,1}^n with 0 ≤ x ≤ 1 (componentwise)

    minimize    ∑_{i>j} A_ij |x_i − x_j| + b^T x
    subject to  0 ≤ x ≤ 1

we will use LP duality to show that the relaxation is exact

- the relaxation has an optimal solution x ∈ {0,1}^n
- if x ∉ {0,1}^n is optimal for the relaxation, then rounding x as

    x̂_i = 1 if x_i > 1/2,    x̂_i = 0 if x_i ≤ 1/2

  gives an integer optimal solution x̂ ∈ {0,1}^n
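Exactness of the relaxation can be observed numerically. The sketch below solves the relaxation with `scipy.optimize.linprog`, modeling each |x_i − x_j| with an auxiliary variable t_ij (a standard epigraph reformulation, used here instead of the matrix-variable LP of the next slide), rounds the solution, and compares with the brute-force binary optimum. The random instance is illustrative:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

n = 4
rng = np.random.default_rng(1)
M = rng.integers(0, 4, size=(n, n)).astype(float)
A = np.triu(M, 1) + np.triu(M, 1).T    # symmetric nonnegative weights
b = rng.standard_normal(n)

pairs = [(i, j) for i in range(n) for j in range(i)]   # pairs with i > j
m = len(pairs)
# variables: [x (n entries), t (m entries)]; minimize sum A_ij t_ij + b^T x
c_lp = np.concatenate([b, np.array([A[i, j] for i, j in pairs])])
# constraints t_ij >= x_i - x_j and t_ij >= x_j - x_i
A_ub = np.zeros((2 * m, n + m))
for k, (i, j) in enumerate(pairs):
    A_ub[2 * k, i], A_ub[2 * k, j], A_ub[2 * k, n + k] = 1, -1, -1
    A_ub[2 * k + 1, i], A_ub[2 * k + 1, j], A_ub[2 * k + 1, n + k] = -1, 1, -1
bounds = [(0, 1)] * n + [(0, None)] * m
res = linprog(c_lp, A_ub=A_ub, b_ub=np.zeros(2 * m), bounds=bounds)
x_rounded = (res.x[:n] > 0.5).astype(float)

def C(x):  # cut cost for binary x
    return x @ A @ (np.ones(n) - x) + b @ x

best = min(C(np.array(v)) for v in itertools.product([0.0, 1.0], repeat=n))
print(res.fun, C(x_rounded), best)   # all three values coincide
```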

Linear program formulation

relaxed problem as an LP: introduce a matrix variable Y ∈ R^{n×n}

    minimize    tr(AY) + b^T x
    subject to  Y ≥ x1^T − 1x^T,  Y ≥ 0
                0 ≤ x ≤ 1

(inequalities are componentwise)

- the inequalities on Y are equivalent to Y_ij ≥ max{0, x_i − x_j}, i,j = 1,...,n
- at the optimum, Y_ij = max{0, x_i − x_j} because A_ij ≥ 0; therefore

    tr(AY) + b^T x = ∑_{i>j} A_ij |x_i − x_j| + b^T x

Dual problem

dual linear program (variables U ∈ R^{n×n}, v, w ∈ R^n)

    maximize    −1^T w
    subject to  0 ≤ U ≤ A
                (U − U^T)1 − v + w + b = 0
                v ≥ 0,  w ≥ 0

complementary slackness conditions: if x, Y, U, v, w are optimal,

    (Y − x1^T + 1x^T) ∘ U = 0,    Y ∘ (A − U) = 0
    x ∘ v = 0,    (1 − x) ∘ w = 0

where ∘ denotes the Hadamard (componentwise) matrix product

Exactness of relaxation

let x be optimal for the relaxation; round x to x̂ ∈ {0,1}^n as

    x̂_i = 1 if x_i > 1/2,    x̂_i = 0 if x_i ≤ 1/2

- by complementary slackness, if x̂_i = 1 and x̂_j = 0, then x_i > x_j; hence

    U_ij = A_ij,  U_ji = 0,  v_i = 0,  w_j = 0

- this implies that x̂^T A(1 − x̂) + b^T x̂ is equal to the lower bound from the relaxation:

    x̂^T A(1 − x̂) + b^T x̂
        = x̂^T (U − U^T)(1 − x̂) − x̂^T v − (1 − x̂)^T w + b^T x̂
        = x̂^T ((U − U^T)1 − v + w + b) − x̂^T (U − U^T) x̂ − 1^T w
        = −1^T w

Network flow interpretation of dual LP

change of variables (with b_{+,k} = max{b_k, 0}, b_{−,k} = max{−b_k, 0})

    Z = U − U^T,    z_s = b_− − w,    z_t = b_+ − v

reformulated dual problem

    maximize    1^T z_s − 1^T b_−
    subject to  Z1 − z_s + z_t = 0
                −A ≤ Z ≤ A,  z_s ≤ b_−,  z_t ≤ b_+

- Z_ij = −Z_ji is the flow from node i to node j
- z_s is the vector of flows from an added source node to nodes 1,...,n
- z_t is the vector of flows from nodes 1,...,n to an added sink node
- maximize the flow 1^T z_s from source to sink, subject to capacity constraints
- the exactness of the relaxation is known as the max-flow min-cut theorem
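This interpretation can be tested directly: build the augmented network (source capacities b_−, sink capacities b_+, internal capacities A_ij), solve max-flow with `scipy.sparse.csgraph.maximum_flow`, and check that the max-flow value minus 1^T b_− equals the brute-force minimum cut cost. A small sketch on illustrative integer data (scipy's max-flow routine requires integer capacities):

```python
import itertools
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

n = 4
rng = np.random.default_rng(2)
M = rng.integers(0, 4, size=(n, n))
A = np.triu(M, 1) + np.triu(M, 1).T            # integer symmetric capacities
b = rng.integers(-3, 4, size=n)
b_plus, b_minus = np.maximum(b, 0), np.maximum(-b, 0)

# nodes 0..n-1 are the original graph; node n is the source, n+1 the sink
cap = np.zeros((n + 2, n + 2), dtype=np.int32)
cap[:n, :n] = A                                # internal edges, capacity A_ij
cap[n, :n] = b_minus                           # source -> i, capacity max(-b_i, 0)
cap[:n, n + 1] = b_plus                        # i -> sink,  capacity max(b_i, 0)
flow = maximum_flow(csr_matrix(cap), n, n + 1).flow_value

def C(x):                                      # cut cost for binary x
    return x @ A @ (np.ones(n, dtype=int) - x) + b @ x

best = min(C(np.array(v)) for v in itertools.product([0, 1], repeat=n))
print(flow - b_minus.sum(), best)              # equal, by max-flow min-cut
```

The constant shift −1^T b_− appears because the dual objective is 1^T z_s − 1^T b_−, while the max-flow solver reports only 1^T z_s.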

Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping

Parametric min-cut problem

min-cut problem with b = α1 − c, where α is a scalar parameter:

    minimize    ∑_{i,j=1}^n A_ij max{0, y_i − y_j} + (α1 − c)^T y
    subject to  y ∈ {0,1}^n

equivalent LP:

    minimize    tr(AY) + (α1 − c)^T y
    subject to  Y ≥ y1^T − 1y^T,  Y ≥ 0,  0 ≤ y ≤ 1

and its dual:

    maximize    −1^T w
    subject to  0 ≤ U ≤ A,  (U − U^T)1 − v + w + α1 = c,  v ≥ 0,  w ≥ 0

primal variables y ∈ R^n, Y ∈ R^{n×n}; dual variables U ∈ R^{n×n}, v, w ∈ R^n

Optimal value function

    p*(α) = inf_{y ∈ {0,1}^n} ∑_{i,j=1}^n A_ij max{0, y_i − y_j} + (α1 − c)^T y

immediate properties

- p*(α) is piecewise-linear and concave
- p*(α) = nα − 1^T c for α ≤ min_k c_k, with optimal solution y = 1
- p*(α) = 0 for α ≥ max_k c_k, with optimal solution y = 0

less obvious: p*(α) has at most n+1 linear segments
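These properties are easy to check by brute force for small n. The sketch below evaluates p*(α) on a grid for an illustrative random instance and confirms concavity and the two boundary segments:

```python
import itertools
import numpy as np

n = 4
rng = np.random.default_rng(3)
M = rng.integers(0, 4, size=(n, n)).astype(float)
A = np.triu(M, 1) + np.triu(M, 1).T        # symmetric nonnegative weights
c = rng.standard_normal(n)
one = np.ones(n)

def p_star(alpha):
    """Brute-force evaluation of p*(alpha) over all y in {0,1}^n."""
    return min(y @ A @ (one - y) + (alpha * one - c) @ y
               for y in (np.array(bits)
                         for bits in itertools.product([0.0, 1.0], repeat=n)))

alphas = np.linspace(c.min() - 1, c.max() + 1, 201)
vals = np.array([p_star(a) for a in alphas])
print(vals[0], vals[-1])   # boundary segments: n*alpha - 1^T c, and 0
```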

Monotonicity

parametric min-cut problem

    minimize    f_α(y) = y^T A(1 − y) + (α1 − c)^T y
    subject to  y ∈ {0,1}^n

monotonicity of solutions

- if y_β is optimal for α = β and y_γ is optimal for α = γ > β, then

    y_γ ≤ y_β

  (y_γ is zero in all positions where y_β is zero)
- this implies that the optimal value function p*(α) has at most n+1 segments
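The monotonicity property can be observed on a small instance by brute-force minimization of f_α over {0,1}^n for increasing α. The data below is illustrative; integer weights and half-integer grid points keep the arithmetic exact, so argmin returns a true optimum:

```python
import itertools
import numpy as np

n = 4
rng = np.random.default_rng(4)
M = rng.integers(0, 4, size=(n, n))
A = (np.triu(M, 1) + np.triu(M, 1).T).astype(float)
c = rng.integers(-3, 4, size=n).astype(float)
one = np.ones(n)
Y = [np.array(bits, dtype=float)
     for bits in itertools.product([0, 1], repeat=n)]

def argmin_y(alpha):
    """A minimizer of f_alpha over {0,1}^n."""
    vals = [y @ A @ (one - y) + (alpha * one - c) @ y for y in Y]
    return Y[int(np.argmin(vals))]

sols = [argmin_y(a) for a in np.arange(c.min() - 1, c.max() + 1.01, 0.5)]
print([int(y.sum()) for y in sols])   # number of ones never increases
```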

proof of monotonicity property

denote the componentwise maximum and minimum of y_β and y_γ as

    y_min = min{y_β, y_γ},    y_max = max{y_β, y_γ}

(submodularity) it is readily verified that

    y_max^T A(1 − y_max) + y_min^T A(1 − y_min) ≤ y_β^T A(1 − y_β) + y_γ^T A(1 − y_γ)

from submodularity and optimality of y_β, y_γ:

    f_β(y_β) + f_β(y_γ)
        = f_β(y_β) + f_γ(y_γ) − (γ − β) 1^T y_γ
        ≤ f_β(y_max) + f_γ(y_min) − (γ − β) 1^T y_γ
        = f_β(y_max) + f_β(y_min) − (γ − β) 1^T (y_γ − y_min)
        ≤ f_β(y_β) + f_β(y_γ) − (γ − β) 1^T (y_γ − y_min)

(the first inequality uses optimality of y_β and y_γ; the second uses
submodularity and y_max + y_min = y_β + y_γ, which cancels the linear terms)

therefore (γ − β) 1^T (y_γ − y_min) ≤ 0; since γ > β and y_γ ≥ y_min, this
implies y_γ = y_min

Example

[figure: a small example with weight matrix A and vector c, and a plot of the
piecewise-linear concave optimal value function p*(α); the matrix entries and
plot data are not recoverable from the transcription]

Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- proximal mapping via parametric flow maximization

Proximal mapping

piecewise-linear convex function

    h(x) = ∑_{i>j} A_ij |x_i − x_j| = ∑_{i,j=1}^n A_ij max{0, x_i − x_j}

proximal mapping: x = prox_h(c) is the solution of

    minimize    h(x) + (1/2) ||x − c||_2^2

- equivalent to a quadratic program
- efficiently computed from the solution of a parametric minimum cut problem

Quadratic program formulation

    minimize    tr(AY) + (1/2) ||x − c||_2^2
    subject to  Y ≥ x1^T − 1x^T,  Y ≥ 0

at the optimum, x = prox_h(c) and Y_ij = max{0, x_i − x_j}

optimality conditions: there exists a U ∈ R^{n×n} that satisfies

    0 ≤ U ≤ A,    x + (U − U^T)1 = c

and the complementary slackness conditions

    (Y − x1^T + 1x^T) ∘ U = 0,    Y ∘ (A − U) = 0

in particular, if x_i > x_j, then U_ij = A_ij and U_ji = 0

Relation with parametric min-cut problem

parametric min-cut problem

    minimize    f_α(y) = ∑_{i,j=1}^n A_ij max{0, y_i − y_j} + (α1 − c)^T y
    subject to  y ∈ {0,1}^n

parametric min-cut solution from proximal mapping: if x = prox_h(c), then a
solution of the parametric min-cut problem is

    y_i = 1 if x_i ≥ α,    y_i = 0 if x_i < α

proximal mapping from parametric min-cut solution:

    x_i = sup{α | the parametric min-cut problem has a solution with y_i = 1}

(follows from monotonicity of the min-cut solution)
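For small n, this relation gives a workable (if inefficient) way to compute the prox: bisect on α for each coordinate, solving each min-cut by brute force. The sketch below does this on an illustrative random instance and verifies the result by checking, via convexity, that no small perturbation lowers the prox objective (the helper names `cut_solution` and `prox_h` are ours):

```python
import itertools
import numpy as np

n = 4
rng = np.random.default_rng(5)
M = rng.integers(0, 3, size=(n, n)).astype(float)
A = np.triu(M, 1) + np.triu(M, 1).T       # symmetric nonnegative weights
c = 2.0 * rng.standard_normal(n)
one = np.ones(n)
Y = [np.array(bits, dtype=float)
     for bits in itertools.product([0, 1], repeat=n)]

def cut_solution(alpha):
    """A minimizer of f_alpha over {0,1}^n (largest one on ties)."""
    return min(Y, key=lambda y: (y @ A @ (one - y) + (alpha * one - c) @ y,
                                 -y.sum()))

def prox_h():
    """x_i = sup{alpha : a min-cut solution has y_i = 1}, by bisection."""
    x = np.empty(n)
    for i in range(n):
        lo, hi = c.min() - 1.0, c.max() + 1.0
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if cut_solution(mid)[i] == 1.0 else (lo, mid)
        x[i] = lo
    return x

def objective(x):
    return (np.sum(np.triu(A, 1) * np.abs(np.subtract.outer(x, x)))
            + 0.5 * np.sum((x - c) ** 2))

x = prox_h()
# by convexity, x is the prox iff no small perturbation improves the objective
perturbed = [objective(x + 1e-3 * d) for d in rng.standard_normal((200, n))]
print(np.round(x, 4), objective(x) <= min(perturbed) + 1e-8)
```

Bisection is valid because the maximal min-cut solution is monotone in α, so each indicator y_i switches from 1 to 0 exactly once.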

proof: let x = prox_h(c) and U the optimal multiplier from the quadratic
program formulation; define

    w = (x − α1)_+,    v = (α1 − x)_+

U, v, w are dual feasible for the parametric LP; therefore

    f_α(ŷ) ≥ −1^T (x − α1)_+    for all ŷ ∈ {0,1}^n

equality holds for the y defined on the previous page:

    f_α(y) = y^T (U − U^T)(1 − y) + (α1 − c)^T y
           = y^T ((U − U^T)1 + x − c) − y^T (U − U^T) y − (x − α1)^T y
           = −1^T (x − α1)_+

- the first line follows from U_ij = A_ij, U_ji = 0 if y_i = 1, y_j = 0
- the last line follows from the definition of U and the construction of y

Example

[figure: the optimal value function p*(α) for the example above, with its
breakpoints marked; prox_h(c) = (·, ·, ·, 3, 2), where the first three entries
are not recoverable from the transcription]

prox_h(c)_k is the value of α at the breakpoint where y_k switches from 1 to 0

Summary

- the proximal mapping of h(x) = ∑_{i>j} A_ij |x_i − x_j| can be computed by
  solving a parametric min-cut/max-flow problem
- very efficiently solved by algorithms from network optimization
- complexity O(mn log(n²/m)) for general graphs (m is the number of edges)
- faster algorithms exist for graphs with special structure

References

- D. Goldfarb and W. Yin, Parametric maximum flow algorithms for fast total
  variation minimization, SIAM Journal on Scientific Computing (2009)
- A. Chambolle and J. Darbon, On total variation minimization and surface
  evolution using parametric maximum flows, International Journal of Computer
  Vision (2009)
- J. Mairal, R. Jenatton, G. Obozinski, F. Bach, Network flow algorithms for
  structured sparsity, arxiv.org/abs/1008.5209 (2010)