L. Vandenberghe, EE236C (Spring 2013-14)

Proximal Mapping via Network Optimization

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping
Introduction

this lecture: a network flow algorithm for evaluating the prox-operator of

    h(x) = \sum_{i>j} A_{ij} |x_i - x_j| = \sum_{i,j=1}^n A_{ij} \max\{0, x_i - x_j\}

- the coefficients A_{ij} = A_{ji} are nonnegative
- the associated undirected graph has n nodes, with an edge (i,j) when A_{ij} > 0
- applications in image processing and machine learning
Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping
Minimum cut problem

find a subset I \subseteq \{1, 2, \ldots, n\} that minimizes

    C(I) = \sum_{i \in I, j \notin I} A_{ij} + \sum_{i \in I} b_i

- A \in \mathbf{S}^n with A_{ij} = A_{ji} \ge 0; no sign restrictions on b \in \mathbf{R}^n
- the cost can be expressed as

    C(I) = x^T A (\mathbf{1} - x) + b^T x

  where x is the incidence vector of I: x_k = 1 if k \in I, x_k = 0 if k \notin I

graph interpretation

- optimal two-way partition of the n nodes of an undirected graph
- the first term in C(I) is the cost of the cut (the edges removed by the partitioning)
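The identity C(I) = x^T A(1 - x) + b^T x is easy to confirm by brute force. A minimal sketch on hypothetical 3-node data (A, b, and the graph below are made up for illustration):

```python
from itertools import product

# hypothetical small example: symmetric nonnegative A, arbitrary-sign b
A = [[0, 3, 0],
     [3, 0, 2],
     [0, 2, 0]]
b = [1, -4, 2]
n = 3

def cut_cost(I):
    """C(I) = sum_{i in I, j not in I} A_ij + sum_{i in I} b_i."""
    return (sum(A[i][j] for i in I for j in range(n) if j not in I)
            + sum(b[i] for i in I))

def quad_cost(x):
    """x^T A (1 - x) + b^T x, with x the incidence vector of I."""
    return (sum(A[i][j] * x[i] * (1 - x[j]) for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))

# the two expressions agree on every subset I of {0, ..., n-1}
for x in product([0, 1], repeat=n):
    I = {i for i in range(n) if x[i] == 1}
    assert cut_cost(I) == quad_cost(x)
```

Enumerating all 2^n subsets like this is of course only viable for tiny n; the point of the lecture is that the minimization can instead be done by a max-flow computation.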
Discrete optimization formulations

binary quadratic maximization (minimizing the concave quadratic below, i.e., maximizing x^T A x - (A\mathbf{1} + b)^T x)

    minimize    -x^T A x + (A\mathbf{1} + b)^T x
    subject to  x \in \{0,1\}^n

the cost function is equal to C(I) if x is the incidence vector of I

binary piecewise-linear minimization

    minimize    \sum_{i>j} A_{ij} |x_i - x_j| + b^T x
    subject to  x \in \{0,1\}^n

the cost function is equal to C(I) if x is the incidence vector of I
Convex relaxation

relaxation: replace x \in \{0,1\}^n with 0 \le x \le \mathbf{1} (componentwise)

    minimize    \sum_{i>j} A_{ij} |x_i - x_j| + b^T x
    subject to  0 \le x \le \mathbf{1}

we will use LP duality to show that the relaxation is exact

- the relaxation has an optimal solution x \in \{0,1\}^n
- if x \in [0,1]^n is optimal for the relaxation, then rounding x as

      \hat{x}_i = 1 if x_i > 1/2,    \hat{x}_i = 0 if x_i \le 1/2

  gives an integer optimal solution \hat{x} \in \{0,1\}^n
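The exactness claim can be sanity-checked numerically without LP duality: for this objective, F(x) equals the average of F over the level sets of x (a layer-cake identity, since max{0, x_i - x_j} = \int_0^1 max{0, 1[x_i > t] - 1[x_j > t]} dt), so some 0/1 threshold vector always does at least as well as any fractional x. A brute-force illustration on hypothetical random data (this is a numerical check, not the duality proof used on the following slides):

```python
import random

# hypothetical random instance: symmetric nonnegative A, arbitrary-sign b
random.seed(0)
n = 4
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i):
        A[i][j] = A[j][i] = random.randint(0, 3)
b = [random.randint(-3, 3) for _ in range(n)]

def F(x):
    """relaxed objective: sum_ij A_ij max(0, x_i - x_j) + b^T x."""
    return (sum(A[i][j] * max(0.0, x[i] - x[j]) for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)))

for trial in range(100):
    x = [random.random() for _ in range(n)]
    # threshold x at each of its own levels (plus 0) to get 0/1 candidates;
    # the layer-cake identity makes F(x) a weighted average of their values
    best = min(F([1.0 if xi > t else 0.0 for xi in x]) for t in [0.0] + x)
    assert best <= F(x) + 1e-9
```

This is exactly why the rounding rule on this slide cannot increase the cost.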
Linear program formulation

relaxed problem as an LP: introduce a matrix variable Y \in \mathbf{R}^{n \times n}

    minimize    tr(AY) + b^T x
    subject to  Y \ge x\mathbf{1}^T - \mathbf{1}x^T
                Y \ge 0,    0 \le x \le \mathbf{1}    (componentwise)

- the inequalities on Y are equivalent to Y_{ij} \ge \max\{0, x_i - x_j\}, i, j = 1, \ldots, n
- at the optimum, Y_{ij} = \max\{0, x_i - x_j\} because A_{ij} \ge 0; therefore

    tr(AY) + b^T x = \sum_{i>j} A_{ij} |x_i - x_j| + b^T x
Dual problem

dual linear program (variables U \in \mathbf{R}^{n \times n}, v, w \in \mathbf{R}^n)

    maximize    -\mathbf{1}^T w
    subject to  0 \le U \le A
                (U - U^T)\mathbf{1} - v + w + b = 0
                v \ge 0,    w \ge 0

complementary slackness conditions: if x, Y, U, v, w are optimal,

    (Y - x\mathbf{1}^T + \mathbf{1}x^T) \circ U = 0,    Y \circ (A - U) = 0
    x \circ v = 0,    (\mathbf{1} - x) \circ w = 0

(\circ denotes the Hadamard (componentwise) matrix product)
Exactness of relaxation

let x be optimal for the relaxation; round x to \hat{x} \in \{0,1\}^n as

    \hat{x}_i = 1 if x_i > 1/2,    \hat{x}_i = 0 if x_i \le 1/2

- by complementary slackness, if \hat{x}_i = 1 and \hat{x}_j = 0, then x_i > x_j; hence

    U_{ij} = A_{ij},    U_{ji} = 0,    v_i = 0,    w_j = 0

- this implies that \hat{x}^T A(\mathbf{1} - \hat{x}) + b^T \hat{x} is equal to the lower bound from the relaxation:

    \hat{x}^T A(\mathbf{1} - \hat{x}) + b^T \hat{x}
        = \hat{x}^T (U - U^T)(\mathbf{1} - \hat{x}) - \hat{x}^T v - (\mathbf{1} - \hat{x})^T w + b^T \hat{x}
        = \hat{x}^T ((U - U^T)\mathbf{1} - v + w + b) - \hat{x}^T (U - U^T)\hat{x} - \mathbf{1}^T w
        = -\mathbf{1}^T w
Network flow interpretation of dual LP

change of variables (with b_{+,k} = \max\{b_k, 0\}, b_{-,k} = \max\{-b_k, 0\}):

    Z = U - U^T,    z_s = b_- - w,    z_t = b_+ - v

reformulated dual problem

    maximize    \mathbf{1}^T z_s - \mathbf{1}^T b_-
    subject to  Z\mathbf{1} - z_s + z_t = 0
                -A \le Z \le A
                0 \le z_s \le b_-,    0 \le z_t \le b_+

- Z_{ij} = -Z_{ji} is the flow from node i to node j
- z_s is the vector of flows from an added source node to nodes 1, \ldots, n
- z_t is the vector of flows from nodes 1, \ldots, n to an added sink node
- we maximize the flow \mathbf{1}^T z_s from source to sink, subject to capacity constraints
- exactness of the relaxation is known as the max-flow min-cut theorem
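The interpretation can be made concrete: attach a source feeding node i with capacity b_{-,i} and an arc from node i to the sink with capacity b_{+,i}, keep capacity A_ij between nodes, and the duality above says max-flow = min_I C(I) + 1^T b_-. A minimal sketch with a textbook Edmonds-Karp max-flow on hypothetical 3-node data (a real implementation would use an off-the-shelf max-flow routine):

```python
from collections import deque
from itertools import product

# hypothetical instance
A = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
b = [2, -4, 1]
n = 3
S, T = n, n + 1  # added source and sink nodes

cap = [[0] * (n + 2) for _ in range(n + 2)]
for i in range(n):
    for j in range(n):
        cap[i][j] = A[i][j]
    cap[S][i] = max(-b[i], 0)   # source -> i with capacity b_{-,i}
    cap[i][T] = max(b[i], 0)    # i -> sink with capacity b_{+,i}

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    cap = [row[:] for row in cap]  # residual capacities (copy)
    total = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v in range(len(cap)):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total
        path, v = [], t                       # recover the path, find bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:                     # push flow, update residuals
            cap[u][v] -= push
            cap[v][u] += push
        total += push

def C(I):
    return (sum(A[i][j] for i in I for j in range(n) if j not in I)
            + sum(b[i] for i in I))

min_cut = min(C({i for i in range(n) if x[i]}) for x in product([0, 1], repeat=n))
# max-flow min-cut: flow value = min_I C(I) + 1^T b_-
assert max_flow(cap, S, T) == min_cut + sum(max(-bi, 0) for bi in b)
```

The constant 1^T b_- appears because the dual objective is 1^T z_s - 1^T b_-: the flow value and the cut value differ by exactly the total source capacity.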
Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- application to proximal mapping
Parametric min-cut problem

min-cut problem with b = \alpha\mathbf{1} - c, where \alpha is a scalar parameter:

    minimize    \sum_{i,j=1}^n A_{ij} \max\{0, y_i - y_j\} + (\alpha\mathbf{1} - c)^T y
    subject to  y \in \{0,1\}^n

equivalent LP:

    minimize    tr(AY) + (\alpha\mathbf{1} - c)^T y
    subject to  Y \ge y\mathbf{1}^T - \mathbf{1}y^T
                Y \ge 0,    0 \le y \le \mathbf{1}

and its dual:

    maximize    -\mathbf{1}^T w
    subject to  0 \le U \le A
                (U - U^T)\mathbf{1} - v + w + \alpha\mathbf{1} = c
                v \ge 0,    w \ge 0

primal variables y \in \mathbf{R}^n, Y \in \mathbf{R}^{n \times n}; dual variables U \in \mathbf{R}^{n \times n}, v, w \in \mathbf{R}^n
Optimal value function

    p^\star(\alpha) = \inf_{y \in \{0,1\}^n} \Big( \sum_{i,j=1}^n A_{ij} \max\{0, y_i - y_j\} + (\alpha\mathbf{1} - c)^T y \Big)

immediate properties

- p^\star(\alpha) is piecewise-linear and concave
- p^\star(\alpha) = n\alpha - \mathbf{1}^T c for \alpha \le \min_k c_k, with optimal solution y = \mathbf{1}
- p^\star(\alpha) = 0 for \alpha \ge \max_k c_k, with optimal solution y = 0

less obvious: p^\star(\alpha) has at most n+1 linear segments
Monotonicity

parametric min-cut problem

    minimize    f_\alpha(y) = y^T A(\mathbf{1} - y) + (\alpha\mathbf{1} - c)^T y
    subject to  y \in \{0,1\}^n

monotonicity of solutions

- if y_\beta is optimal for \alpha = \beta and y_\gamma is optimal for \alpha = \gamma > \beta, then y_\gamma \le y_\beta
- y_\gamma is zero in all positions where y_\beta is zero
- this implies that the optimal value function p^\star(\alpha) has at most n+1 segments
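The monotonicity property can be observed directly by brute force on a small instance. The sketch below (hypothetical 4-node data; exponential enumeration, for illustration only) checks that every minimizer at a larger alpha is componentwise below every minimizer at a smaller alpha, so the slope 1^T y of p*(alpha) only decreases:

```python
from itertools import product

# hypothetical instance with integer data (so float comparisons are exact
# on a grid of alphas that are multiples of 1/8)
A = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 2],
     [0, 1, 2, 0]]
c = [4, 1, 3, 2]
n = 4

def f(y, alpha):
    return (sum(A[i][j] * max(0, y[i] - y[j]) for i in range(n) for j in range(n))
            + sum((alpha - c[i]) * y[i] for i in range(n)))

def minimizers(alpha):
    vals = {y: f(y, alpha) for y in product([0, 1], repeat=n)}
    best = min(vals.values())
    return [y for y, v in vals.items() if v == best]

alphas = [k / 8 for k in range(-40, 60)]        # grid from -5 to 7.375
opts = [minimizers(a) for a in alphas]

# nested solutions: y_gamma <= y_beta whenever gamma > beta
for earlier, later in zip(opts, opts[1:]):
    for yb in earlier:
        for yg in later:
            assert all(g <= bb for g, bb in zip(yg, yb))

# hence the slope of p*(alpha) decreases: at most n+1 linear segments
slopes = [max(sum(y) for y in o) for o in opts]
assert slopes == sorted(slopes, reverse=True) and len(set(slopes)) <= n + 1
```

The slopes take values in {n, n-1, ..., 0} and never increase, which is exactly the "at most n+1 segments" claim.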
proof of monotonicity property

denote the componentwise maximum and minimum of y_\beta and y_\gamma by

    y_{min} = \min\{y_\beta, y_\gamma\},    y_{max} = \max\{y_\beta, y_\gamma\}

(submodularity) it is readily verified that

    y_{max}^T A(\mathbf{1} - y_{max}) + y_{min}^T A(\mathbf{1} - y_{min}) \le y_\beta^T A(\mathbf{1} - y_\beta) + y_\gamma^T A(\mathbf{1} - y_\gamma)

from submodularity and the optimality of y_\beta, y_\gamma:

    f_\beta(y_\beta) + f_\beta(y_\gamma)
        = f_\beta(y_\beta) + f_\gamma(y_\gamma) - (\gamma - \beta)\mathbf{1}^T y_\gamma
        \le f_\beta(y_{max}) + f_\gamma(y_{min}) - (\gamma - \beta)\mathbf{1}^T y_\gamma        (optimality)
        = f_\beta(y_{max}) + f_\beta(y_{min}) - (\gamma - \beta)\mathbf{1}^T (y_\gamma - y_{min})
        \le f_\beta(y_\beta) + f_\beta(y_\gamma) - (\gamma - \beta)\mathbf{1}^T (y_\gamma - y_{min})    (submodularity)

hence (\gamma - \beta)\mathbf{1}^T (y_\gamma - y_{min}) \le 0; therefore \gamma > \beta implies y_\gamma = y_{min}
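The submodularity inequality used above holds for any symmetric nonnegative A; an exhaustive check on a hypothetical 4-node weight matrix:

```python
from itertools import product

# cut(y) = y^T A (1 - y); A symmetric with nonnegative (hypothetical) weights
A = [[0, 4, 1, 0],
     [4, 0, 2, 3],
     [1, 2, 0, 5],
     [0, 3, 5, 0]]
n = 4

def cut(y):
    return sum(A[i][j] * y[i] * (1 - y[j]) for i in range(n) for j in range(n))

# cut(y_max) + cut(y_min) <= cut(y1) + cut(y2) for all pairs of 0/1 vectors
for y1 in product([0, 1], repeat=n):
    for y2 in product([0, 1], repeat=n):
        ymax = tuple(max(a, b) for a, b in zip(y1, y2))
        ymin = tuple(min(a, b) for a, b in zip(y1, y2))
        assert cut(ymax) + cut(ymin) <= cut(y1) + cut(y2)
```

Per edge, the inequality reduces to checking the four possible (y1_i, y1_j, y2_i, y2_j) patterns, which is why it is "readily verified".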
Example A = 3 2 2 3 c = 2 6 4 2 3 4 5 4 2 8 6 4 2 2 α p (α) Proximal mapping via network optimization 4
Outline

- minimum cut and maximum flow problems
- parametric minimum cut problem
- proximal mapping via parametric flow maximization
Proximal mapping

piecewise-linear convex function

    h(x) = \sum_{i>j} A_{ij} |x_i - x_j| = \sum_{i,j=1}^n A_{ij} \max\{0, x_i - x_j\}

proximal mapping: x = prox_h(c) is the solution of

    minimize    h(x) + (1/2) \|x - c\|_2^2

- equivalent to a quadratic program
- efficiently computed from the solution of a parametric minimum cut problem
Quadratic program formulation

    minimize    tr(AY) + (1/2) \|x - c\|_2^2
    subject to  Y \ge x\mathbf{1}^T - \mathbf{1}x^T,    Y \ge 0

at the optimum, x = prox_h(c) and Y_{ij} = \max\{0, x_i - x_j\}

optimality conditions: there exists a U \in \mathbf{R}^{n \times n} that satisfies

    0 \le U \le A,    x + (U - U^T)\mathbf{1} = c

and the complementary slackness conditions

    (Y - x\mathbf{1}^T + \mathbf{1}x^T) \circ U = 0,    Y \circ (A - U) = 0

in particular, if x_i > x_j, then U_{ij} = A_{ij} and U_{ji} = 0
Relation with parametric min-cut problem

parametric min-cut problem

    minimize    f_\alpha(y) = \sum_{i,j=1}^n A_{ij} \max\{0, y_i - y_j\} + (\alpha\mathbf{1} - c)^T y
    subject to  y \in \{0,1\}^n

parametric min-cut solution from proximal mapping: if x = prox_h(c), then a solution of the parametric min-cut problem is

    y_i = 1 if x_i \ge \alpha,    y_i = 0 if x_i < \alpha

proximal mapping from parametric min-cut solution:

    x_i = \sup\{\alpha \mid the parametric min-cut problem has a solution with y_i = 1\}

(this follows from the monotonicity of the min-cut solution)
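The recovery formula can be sketched by brute force: scan alpha over a grid, record for each node the largest alpha at which some minimizer still has y_i = 1, and check that the resulting x minimizes the prox objective. The instance below is a hypothetical 4-node chain (so h is a weighted 1D total variation); a real implementation would solve all alphas at once with a parametric max-flow algorithm instead of a grid:

```python
import random
from itertools import product

# hypothetical chain graph: edges (0,1), (1,2), (2,3) with weights 1, 2, 1
A = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 1],
     [0, 0, 1, 0]]
c = [0, 4, 2, 3]
n = 4

def f(y, alpha):
    cut = sum(A[i][j] * max(0, y[i] - y[j]) for i in range(n) for j in range(n))
    return cut + sum((alpha - c[i]) * y[i] for i in range(n))

# grid fine enough to contain the (rational) breakpoints of this instance
alphas = [k / 24 for k in range(-24, 24 * 6)]
x = [min(alphas)] * n
for alpha in alphas:
    vals = {y: f(y, alpha) for y in product([0, 1], repeat=n)}
    best = min(vals.values())
    for y, v in vals.items():
        if v <= best + 1e-9:               # include tied minimizers
            for i in range(n):
                if y[i] == 1:
                    x[i] = max(x[i], alpha)  # last alpha where y_i can be 1

def phi(z):
    """prox objective h(z) + (1/2) ||z - c||^2."""
    h = sum(A[i][j] * max(0, z[i] - z[j]) for i in range(n) for j in range(n))
    return h + 0.5 * sum((z[i] - c[i]) ** 2 for i in range(n))

# x should minimize phi: no random perturbation does better
random.seed(0)
for _ in range(300):
    z = [xi + random.uniform(-1, 1) for xi in x]
    assert phi(z) >= phi(x) - 1e-9
```

For this data the recovered prox is x = (1, 8/3, 8/3, 8/3): nodes 1-3 merge into one cluster, which is typical total-variation behavior.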
proof: let x = prox_h(c) and let U be the optimal multiplier from the quadratic program formulation; define

    w = (x - \alpha\mathbf{1})_+,    v = (\alpha\mathbf{1} - x)_+

- U, v, w are dual feasible for the parametric LP; therefore

    f_\alpha(\hat{y}) \ge -\mathbf{1}^T (x - \alpha\mathbf{1})_+    for all \hat{y} \in \{0,1\}^n

- equality holds for the y defined on the previous page:

    f_\alpha(y) = y^T (U - U^T)(\mathbf{1} - y) + (\alpha\mathbf{1} - c)^T y
                = y^T ((U - U^T)\mathbf{1} + x - c) - y^T (U - U^T) y - (x - \alpha\mathbf{1})^T y
                = -\mathbf{1}^T (x - \alpha\mathbf{1})_+

- the first line follows from U_{ij} = A_{ij}, U_{ji} = 0 if y_i = 1, y_j = 0
- the last line follows from the definition of U and the construction of y
Example

[figure: the graph of p^\star(\alpha) for the five-node example, with its breakpoints marked]

    prox_h(c) = (·, ·, ·, 3, 2)

prox_h(c)_k is the value of \alpha at the breakpoint where y_k switches from 1 to 0
Summary

- the proximal mapping of h(x) = \sum_{i>j} A_{ij} |x_i - x_j| can be computed by solving a parametric min-cut/max-flow problem
- very efficiently solved by algorithms from network optimization
- complexity O(mn \log(n^2/m)) for general graphs (m is the number of edges)
- faster algorithms exist for graphs with special structure
References

- D. Goldfarb and W. Yin, Parametric maximum flow algorithms for fast total variation minimization, SIAM Journal on Scientific Computing (2009)
- A. Chambolle and J. Darbon, On total variation minimization and surface evolution using parametric maximum flows, International Journal of Computer Vision (2009)
- J. Mairal, R. Jenatton, G. Obozinski, F. Bach, Network flow algorithms for structured sparsity, arXiv:1008.5209 (2010)