10. Proximal point method

Similar documents

2.3 Convex Constrained Optimization Problems

Duality in General Programs. Ryan Tibshirani Convex Optimization /36-725

GI01/M055 Supervised Learning Proximal Methods

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Proximal mapping via network optimization

Adaptive Online Gradient Descent

Nonlinear Optimization: Algorithms 3: Interior-point methods

Downloaded 08/02/12 to Redistribution subject to SIAM license or copyright; see

A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION

Error Bound for Classes of Polynomial Systems and its Applications: A Variational Analysis Approach

First Order Algorithms in Variational Image Processing

Big Data - Lecture 1 Optimization reminders

Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.

Convex analysis and profit/cost/support functions

Online Convex Optimization

Date: April 12, Contents

Nonlinear Programming Methods.S2 Quadratic Programming

Several Views of Support Vector Machines

1 Norms and Vector Spaces

Distributed Machine Learning and Big Data

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

(Quasi-)Newton methods

Table 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.

24. The Branch and Bound Method

The Heat Equation. Lectures INF2320 p. 1/88

Convex Programming Tools for Disjunctive Programs

1 if 1 x 0 1 if 0 x 1

Variational approach to restore point-like and curve-like singularities in imaging

MATH 425, PRACTICE FINAL EXAM SOLUTIONS.

Online Learning. 1 Online Learning and Mistake Bounds for Finite Hypothesis Classes

A QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Metric Spaces. Chapter 1

Quasi-static evolution and congested transport

3. INNER PRODUCT SPACES

Mathematical Methods of Engineering Analysis

Mathematical finance and linear programming (optimization)

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

constraint. Let us penalize ourselves for making the constraint too big. We end up with a

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Practice with Proofs

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

Duality of linear conic problems

A Simple Introduction to Support Vector Machines

Høgskolen i Narvik Sivilingeniørutdanningen STE6237 ELEMENTMETODER. Oppgaver

1 Formulating The Low Degree Testing Problem

Linear Threshold Units

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)

Solutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR

BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

Optimal shift scheduling with a global service level constraint

STORM: Stochastic Optimization Using Random Models Katya Scheinberg Lehigh University. (Joint work with R. Chen and M. Menickelly)

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

BANACH AND HILBERT SPACE REVIEW

Inner product. Definition of inner product

Practical Guide to the Simplex Method of Linear Programming

Cost Minimization and the Cost Function

Integrating Benders decomposition within Constraint Programming

Lecture 7: Finding Lyapunov Functions 1

The Goldberg Rao Algorithm for the Maximum Flow Problem

Basics of Polynomial Theory

Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

Pattern Analysis. Logistic Regression. 12. Mai Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Solutions for Review Problems

GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1

Stationarity Results for Generating Set Search for Linearly Constrained Optimization

The p-norm generalization of the LMS algorithm for adaptive filtering

No: Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

Properties of BMO functions whose reciprocals are also BMO

Dynamic Network Energy Management via Proximal Message Passing

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Critical points of once continuously differentiable functions are important because they are the only points that can be local maxima or minima.

Chapter 3 Nonlinear Model Predictive Control

Massive Data Classification via Unconstrained Support Vector Machines

Lecture 6: Logistic Regression

A Distributed Line Search for Network Optimization

Increasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.

Support Vector Machines

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

Linear Programming Notes V Problem Transformations

Lecture 13 Linear quadratic Lyapunov theory

Metric Spaces. Chapter Metrics

1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Figure 2.1: Center of mass of four points.

Notes on metric spaces

A PRIORI ESTIMATES FOR SEMISTABLE SOLUTIONS OF SEMILINEAR ELLIPTIC EQUATIONS. In memory of Rou-Huai Wang

Rolle s Theorem. q( x) = 1

Constrained optimization.

15. Symmetric polynomials

Interior Point Methods and Linear Programming

TOPIC 4: DERIVATIVES

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.

Transcription:

L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1

Proximal point method a conceptual algorithm for minimizing a closed convex function f: x k) = prox tk fx k 1) ) = argmin u fu)+ 1 ) u x k 1) 2 2 2t k can be viewed as proximal gradient method page 6-2) with gx) = 0 of interest if prox evaluations are much easier than minimizing f directly a practical algorithm if inexact prox evaluations are used step size t k > 0 affects number of iterations, cost of prox evaluations basis of the augmented Lagrangian method Proximal point method 10-2

Convergence assumptions f is closed and convex hence, prox tf x) is uniquely defined for all x) optimal value f is finite and attained at x result fx k) ) f x 0) x 2 2 for k 1 2 k t i i=1 implies convergence if i t i rate is 1/k if t i is fixed or variable but bounded away from zero t i is arbitrary; however cost of prox evaluations will depend on t i Proximal point method 10-3

proof: apply analysis of proximal gradient method lect. 6) with gx) = 0 since g is zero, inequality 1) on page 6-12 holds for any t > 0 from page 6-14, fx i) ) is nonincreasing and t i fx i) ) f ) 1 2 ) x i) x 2 2 x i 1) x 2 2 combine inequalities for i = 1 to i = k to get k t i ) i=1 ) fx k) ) f ) k i=1 t i fx i) ) f ) 1 2 x0) x 2 2 Proximal point method 10-4

Accelerated proximal point algorithms FISTA take gx) = 0 on p.7-8): choose x 0) = x 1) and for k 1 x k) = prox tk f ) x k 1) 1 θ k 1 +θ k x k 1) x k 2) ) θ k 1 Nesterov s 2nd method p. 7-23): choose x 0) = v 0) and for k 1 v k) = prox tk /θ k )fv k 1) ), x k) = 1 θ k )x k 1) +θ k v k) possible choices of parameters fixed steps: t k = t and θ k = 2/k +1) variable steps: choose any t k > 0, θ 1 = 1, and for k > 1, solve θ k from 1 θ k )t k θ 2 k = t k 1 θ 2 k 1 Proximal point method 10-5

Convergence assumptions f is closed and convex hence, prox tf x) is uniquely defined for all x) optimal value f is finite and attained at x result fx k) ) f 2 x 0) x 2 2 2 ) t 1 + k 2 for k 1 ti i=2 implies convergence if i ti rate is 1/k 2 if t i is fixed or variable but bounded away from zero Proximal point method 10-6

proof: follows from analysis in lecture 7 with gx) = 0 since g is zero, first inequalities on p. 7-15 and p. 7-25 hold for any t > 0 therefore the conclusion on p. 7-16 and p. 7-26 holds: fx k) ) f θ2 k 2t k x 0) x 2 2 for fixed step size t k = t, θ k = 2/k +1), θ 2 k 2t k = 2 k +1) 2 t for variable step size, we proved on page 7-19 that θ 2 k 2t k 2 2 t 1 + k ti ) 2 i=2 Proximal point method 10-7

Outline proximal point method augmented Lagrangian method Moreau-Yosida smoothing

Standard problem form minimize fx) + gax) f : R n R and g : R m R are closed convex functions; A R m n equivalent formulation with auxiliary variable y: minimize fx) + gy) subject to Ax = y examples g is indicator function of {b}: minimize fx) subject to Ax = b g is indicator function of C: minimize fx) subject to Ax C gy) = y b : minimize fx)+ Ax b Proximal point method 10-8

Dual problem Lagrangian of reformulated problem) dual problem Lx,y,z) = fx)+gy)+z T Ax y) maximize inf x,y Lx,y,z) = f A T z) g z) optimality conditions: x, y, z are optimal if x, y are feasible: x domf, y domg, and Ax = y x and y minimize Lx,y,z): A T z fx) and z gy) augmented Lagrangian method: proximal point method applied to dual Proximal point method 10-9

Proximal mapping of dual function proximal mapping of hz) = f A T z)+g z) is defined as prox th z) = argmin u f A T u)+g u)+ 1 ) 2t u z 2 2 dual expression: prox th z) = z +taˆx ŷ) where ˆx,ŷ) = argmin x,y fx)+gy)+z T Ax y)+ t ) 2 Ax y 2 2 ˆx, ŷ minimize augmented Lagrangian Lagrangian + quadratic penalty) Proximal point method 10-10

proof write augmented Lagrangian minimization as minimize over x, y, w) fx)+gy)+ t 2 w 2 2 subject to Ax y +z/t = w optimality conditions u is multiplier for equality): Ax y + 1 t z = w, AT u fx), u gy), tw = u eliminating x, y, w gives u = z +tax y) and 0 A f A T u)+ g u)+ 1 t u z) this is the optimality condition for problem in definiton of u = prox th z) Proximal point method 10-11

choose initial z 0) and repeat: Augmented Lagrangian method 1. minimize augmented Lagrangian ˆx,ŷ) = argmin x,y fx)+gy)+ t k 2 Ax y +1/t k )z k 1) 2 2 ) 2. dual update z k) = z k 1) +t k Aˆx ŷ) also known as method of multipliers, Bregman iteration this is the proximal point method applied to the dual problem as variants, can apply the fast proximal point methods to the dual usually implemented with inexact minimization in step 1 Proximal point method 10-12

Examples minimize fx) + gax) equality constraints g is indicator of {b}): ˆx = argmin x z := z +taˆx b) fx)+z T Ax+ t ) 2 Ax b 2 2 set constraint g indicator of convex set C): ˆx = argmin fx)+ t2 ) dax+z/t)2 z := z +taˆx PAˆx+z/t)) Pu) is projection of u on C, du) = u Pu) 2 is Euclidean distance Proximal point method 10-13

Outline proximal point method augmented Lagrangian method Moreau-Yosida smoothing

Moreau-Yosida smoothing Moreau-Yosida regularization Moreau envelope) of closed convex f is f t) x) = inf u fu)+ 1 ) 2t u x 2 2 = f prox tf x) ) + 1 2t with t > 0) proxtf x) x 2 2 immediate properties f t) is convex infimum over u of a convex function of x, u) domain of f t) is R n recall that prox tf x) is defined for all x) Proximal point method 10-14

Examples indicator function: smoothed f is squared Euclidean distance fx) = I C x), f t) x) = 1 2t dx)2 1-norm: smoothed function is Huber penalty fx) = x 1, f t) x) = n φ t x k ) k=1 φ t z) = { z 2 /2t) z t z t/2 z t φtz) t/2 z t/2 Proximal point method 10-15

Conjugate of Moreau envelope f t) x) = inf u fu)+ 1 ) 2t u x 2 2 f t) is infimal convolution of fu) and v 2 2/2t) see page 8-14): f t) x) = inf fu)+ 1 ) u+v=x 2t v 2 2 from page 8-14, conjugate is sum of conjugates of fu) and v 2 2/2t): f t) ) y) = f y)+ t 2 y 2 2 hence, conjugate is strongly convex with parameter t Proximal point method 10-16

Gradient of Moreau envelope f t) x) = sup y x T y f y) t ) 2 y 2 2 maximizer in definition is unique and satisfies x ty f y) y fx ty) maximizing y is the gradient of f t) : from pages 6-7 and 9-4, f t) x) = 1 t x proxtf x) ) = prox 1/t)f x/t) gradient f t) is Lipschitz continuous with constant 1/t follows from nonexpansiveness of prox; see page 6-9) Proximal point method 10-17

Interpretation of proximal point algorithm apply gradient method to minimize Moreau envelope minimize f t) x) = inf u fu)+ 1 ) 2t u x 2 2 this is an exact smooth reformulation of problem of minimizing fx): solution x is minimizer of f f t) is differentiable with Lipschitz continuous gradient L = 1/t) gradient update: with fixed t k = 1/L = t x k) = x k 1) t f t) x k 1) ) = prox tf x k 1) )... the proximal point update with constant step size t k = t Proximal point method 10-18

Interpretation of augmented Lagrangian algorithm ˆx, ŷ) = argmin x,y z := z +taˆx ŷ) fx)+gy)+ t ) 2 Ax y +1/t)z 2 2 with fixed t, dual update is gradient step applied to smoothed dual if we eliminate y, primal step can be interpreted as smoothing g: ˆx = argmin x fx)+g1/t) Ax+1/t)z) ) example: minimize fx)+ Ax b 1 ˆx = argmin x fx)+φ1/t Ax b+1/t)z) ) with φ 1/t the Huber penalty applied componentwise page 10-15) Proximal point method 10-19

References proximal point algorithm and fast proximal point algorithm O. Güler, On the convergence of the proximal point algorithm for convex minimization, SIAM J. Control and Optimization 1991) O. Güler, New proximal point algorithms for convex minimization, SIOPT 1992) O. Güler, Augmented Lagrangian algorithm for linear programming, JOTA 1992) augmented Lagrangian algorithm D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods 1982) Proximal point method 10-20