10. Proximal point method


 Edward Phelps
 1 years ago
 Views:
Transcription
1 L. Vandenberghe EE236C Spring ) 10. Proximal point method proximal point method augmented Lagrangian method MoreauYosida smoothing 101
2 Proximal point method a conceptual algorithm for minimizing a closed convex function f: x k) = prox tk fx k 1) ) = argmin u fu)+ 1 ) u x k 1) 2 2 2t k can be viewed as proximal gradient method page 62) with gx) = 0 of interest if prox evaluations are much easier than minimizing f directly a practical algorithm if inexact prox evaluations are used step size t k > 0 affects number of iterations, cost of prox evaluations basis of the augmented Lagrangian method Proximal point method 102
3 Convergence assumptions f is closed and convex hence, prox tf x) is uniquely defined for all x) optimal value f is finite and attained at x result fx k) ) f x 0) x 2 2 for k 1 2 k t i i=1 implies convergence if i t i rate is 1/k if t i is fixed or variable but bounded away from zero t i is arbitrary; however cost of prox evaluations will depend on t i Proximal point method 103
4 proof: apply analysis of proximal gradient method lect. 6) with gx) = 0 since g is zero, inequality 1) on page 612 holds for any t > 0 from page 614, fx i) ) is nonincreasing and t i fx i) ) f ) 1 2 ) x i) x 2 2 x i 1) x 2 2 combine inequalities for i = 1 to i = k to get k t i ) i=1 ) fx k) ) f ) k i=1 t i fx i) ) f ) 1 2 x0) x 2 2 Proximal point method 104
5 Accelerated proximal point algorithms FISTA take gx) = 0 on p.78): choose x 0) = x 1) and for k 1 x k) = prox tk f ) x k 1) 1 θ k 1 +θ k x k 1) x k 2) ) θ k 1 Nesterov s 2nd method p. 723): choose x 0) = v 0) and for k 1 v k) = prox tk /θ k )fv k 1) ), x k) = 1 θ k )x k 1) +θ k v k) possible choices of parameters fixed steps: t k = t and θ k = 2/k +1) variable steps: choose any t k > 0, θ 1 = 1, and for k > 1, solve θ k from 1 θ k )t k θ 2 k = t k 1 θ 2 k 1 Proximal point method 105
6 Convergence assumptions f is closed and convex hence, prox tf x) is uniquely defined for all x) optimal value f is finite and attained at x result fx k) ) f 2 x 0) x ) t 1 + k 2 for k 1 ti i=2 implies convergence if i ti rate is 1/k 2 if t i is fixed or variable but bounded away from zero Proximal point method 106
7 proof: follows from analysis in lecture 7 with gx) = 0 since g is zero, first inequalities on p and p hold for any t > 0 therefore the conclusion on p and p holds: fx k) ) f θ2 k 2t k x 0) x 2 2 for fixed step size t k = t, θ k = 2/k +1), θ 2 k 2t k = 2 k +1) 2 t for variable step size, we proved on page 719 that θ 2 k 2t k 2 2 t 1 + k ti ) 2 i=2 Proximal point method 107
8 Outline proximal point method augmented Lagrangian method MoreauYosida smoothing
9 Standard problem form minimize fx) + gax) f : R n R and g : R m R are closed convex functions; A R m n equivalent formulation with auxiliary variable y: minimize fx) + gy) subject to Ax = y examples g is indicator function of {b}: minimize fx) subject to Ax = b g is indicator function of C: minimize fx) subject to Ax C gy) = y b : minimize fx)+ Ax b Proximal point method 108
10 Dual problem Lagrangian of reformulated problem) dual problem Lx,y,z) = fx)+gy)+z T Ax y) maximize inf x,y Lx,y,z) = f A T z) g z) optimality conditions: x, y, z are optimal if x, y are feasible: x domf, y domg, and Ax = y x and y minimize Lx,y,z): A T z fx) and z gy) augmented Lagrangian method: proximal point method applied to dual Proximal point method 109
11 Proximal mapping of dual function proximal mapping of hz) = f A T z)+g z) is defined as prox th z) = argmin u f A T u)+g u)+ 1 ) 2t u z 2 2 dual expression: prox th z) = z +taˆx ŷ) where ˆx,ŷ) = argmin x,y fx)+gy)+z T Ax y)+ t ) 2 Ax y 2 2 ˆx, ŷ minimize augmented Lagrangian Lagrangian + quadratic penalty) Proximal point method 1010
12 proof write augmented Lagrangian minimization as minimize over x, y, w) fx)+gy)+ t 2 w 2 2 subject to Ax y +z/t = w optimality conditions u is multiplier for equality): Ax y + 1 t z = w, AT u fx), u gy), tw = u eliminating x, y, w gives u = z +tax y) and 0 A f A T u)+ g u)+ 1 t u z) this is the optimality condition for problem in definiton of u = prox th z) Proximal point method 1011
13 choose initial z 0) and repeat: Augmented Lagrangian method 1. minimize augmented Lagrangian ˆx,ŷ) = argmin x,y fx)+gy)+ t k 2 Ax y +1/t k )z k 1) 2 2 ) 2. dual update z k) = z k 1) +t k Aˆx ŷ) also known as method of multipliers, Bregman iteration this is the proximal point method applied to the dual problem as variants, can apply the fast proximal point methods to the dual usually implemented with inexact minimization in step 1 Proximal point method 1012
14 Examples minimize fx) + gax) equality constraints g is indicator of {b}): ˆx = argmin x z := z +taˆx b) fx)+z T Ax+ t ) 2 Ax b 2 2 set constraint g indicator of convex set C): ˆx = argmin fx)+ t2 ) dax+z/t)2 z := z +taˆx PAˆx+z/t)) Pu) is projection of u on C, du) = u Pu) 2 is Euclidean distance Proximal point method 1013
15 Outline proximal point method augmented Lagrangian method MoreauYosida smoothing
16 MoreauYosida smoothing MoreauYosida regularization Moreau envelope) of closed convex f is f t) x) = inf u fu)+ 1 ) 2t u x 2 2 = f prox tf x) ) + 1 2t with t > 0) proxtf x) x 2 2 immediate properties f t) is convex infimum over u of a convex function of x, u) domain of f t) is R n recall that prox tf x) is defined for all x) Proximal point method 1014
17 Examples indicator function: smoothed f is squared Euclidean distance fx) = I C x), f t) x) = 1 2t dx)2 1norm: smoothed function is Huber penalty fx) = x 1, f t) x) = n φ t x k ) k=1 φ t z) = { z 2 /2t) z t z t/2 z t φtz) t/2 z t/2 Proximal point method 1015
18 Conjugate of Moreau envelope f t) x) = inf u fu)+ 1 ) 2t u x 2 2 f t) is infimal convolution of fu) and v 2 2/2t) see page 814): f t) x) = inf fu)+ 1 ) u+v=x 2t v 2 2 from page 814, conjugate is sum of conjugates of fu) and v 2 2/2t): f t) ) y) = f y)+ t 2 y 2 2 hence, conjugate is strongly convex with parameter t Proximal point method 1016
19 Gradient of Moreau envelope f t) x) = sup y x T y f y) t ) 2 y 2 2 maximizer in definition is unique and satisfies x ty f y) y fx ty) maximizing y is the gradient of f t) : from pages 67 and 94, f t) x) = 1 t x proxtf x) ) = prox 1/t)f x/t) gradient f t) is Lipschitz continuous with constant 1/t follows from nonexpansiveness of prox; see page 69) Proximal point method 1017
20 Interpretation of proximal point algorithm apply gradient method to minimize Moreau envelope minimize f t) x) = inf u fu)+ 1 ) 2t u x 2 2 this is an exact smooth reformulation of problem of minimizing fx): solution x is minimizer of f f t) is differentiable with Lipschitz continuous gradient L = 1/t) gradient update: with fixed t k = 1/L = t x k) = x k 1) t f t) x k 1) ) = prox tf x k 1) )... the proximal point update with constant step size t k = t Proximal point method 1018
21 Interpretation of augmented Lagrangian algorithm ˆx, ŷ) = argmin x,y z := z +taˆx ŷ) fx)+gy)+ t ) 2 Ax y +1/t)z 2 2 with fixed t, dual update is gradient step applied to smoothed dual if we eliminate y, primal step can be interpreted as smoothing g: ˆx = argmin x fx)+g1/t) Ax+1/t)z) ) example: minimize fx)+ Ax b 1 ˆx = argmin x fx)+φ1/t Ax b+1/t)z) ) with φ 1/t the Huber penalty applied componentwise page 1015) Proximal point method 1019
22 References proximal point algorithm and fast proximal point algorithm O. Güler, On the convergence of the proximal point algorithm for convex minimization, SIAM J. Control and Optimization 1991) O. Güler, New proximal point algorithms for convex minimization, SIOPT 1992) O. Güler, Augmented Lagrangian algorithm for linear programming, JOTA 1992) augmented Lagrangian algorithm D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods 1982) Proximal point method 1020
11 Projected Newtontype Methods in Machine Learning
11 Projected Newtontype Methods in Machine Learning Mark Schmidt University of British Columbia Vancouver, BC, V6T 1Z4 Dongmin Kim University of Texas at Austin Austin, Texas 78712 schmidtmarkw@gmail.com
More informationSparse Prediction with the ksupport Norm
Sparse Prediction with the Support Norm Andreas Argyriou École Centrale Paris argyrioua@ecp.fr Rina Foygel Department of Statistics, Stanford University rinafb@stanford.edu Nathan Srebro Toyota Technological
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1 ActiveSet Newton Algorithm for Overcomplete NonNegative Representations of Audio Tuomas Virtanen, Member,
More informationChapter 5. Banach Spaces
9 Chapter 5 Banach Spaces Many linear equations may be formulated in terms of a suitable linear operator acting on a Banach space. In this chapter, we study Banach spaces and linear operators acting on
More informationMichael Ulbrich. Nonsmooth Newtonlike Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces
Michael Ulbrich Nonsmooth Newtonlike Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces Technische Universität München Fakultät für Mathematik June 21, revised
More informationA Tutorial on Support Vector Machines for Pattern Recognition
c,, 1 43 () Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Tutorial on Support Vector Machines for Pattern Recognition CHRISTOPHER J.C. BURGES Bell Laboratories, Lucent Technologies
More informationOptimization with SparsityInducing Penalties. Contents
Foundations and Trends R in Machine Learning Vol. 4, No. 1 (2011) 1 106 c 2012 F. Bach, R. Jenatton, J. Mairal and G. Obozinski DOI: 10.1561/2200000015 Optimization with SparsityInducing Penalties By
More informationFixed Point Theorems For SetValued Maps
Fixed Point Theorems For SetValued Maps Bachelor s thesis in functional analysis Institute for Analysis and Scientific Computing Vienna University of Technology Andreas Widder, July 2009. Preface In this
More informationSupport Vector Machines
CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)
More informationFigure 2.1: Center of mass of four points.
Chapter 2 Bézier curves are named after their inventor, Dr. Pierre Bézier. Bézier was an engineer with the Renault car company and set out in the early 196 s to develop a curve formulation which would
More informationCritical points of once continuously differentiable functions are important because they are the only points that can be local maxima or minima.
Lecture 0: Convexity and Optimization We say that if f is a once continuously differentiable function on an interval I, and x is a point in the interior of I that x is a critical point of f if f (x) =
More informationA Modern Course on Curves and Surfaces. Richard S. Palais
A Modern Course on Curves and Surfaces Richard S. Palais Contents Lecture 1. Introduction 1 Lecture 2. What is Geometry 4 Lecture 3. Geometry of InnerProduct Spaces 7 Lecture 4. Linear Maps and the Euclidean
More informationIterative Methods for Optimization
Iterative Methods for Optimization C.T. Kelley North Carolina State University Raleigh, North Carolina Society for Industrial and Applied Mathematics Philadelphia Contents Preface xiii How to Get the Software
More informationProbability in High Dimension
Ramon van Handel Probability in High Dimension ORF 570 Lecture Notes Princeton University This version: June 30, 2014 Preface These lecture notes were written for the course ORF 570: Probability in High
More informationON EMLIKE ALGORITHMS FOR MINIMUM DISTANCE ESTIMATION. P.P.B. Eggermont and V.N. LaRiccia University of Delaware
March 1998 ON EMLIKE ALGORITHMS FOR MINIMUM DISTANCE ESTIMATION PPB Eggermont and VN LaRiccia University of Delaware Abstract We study minimum distance estimation problems related to maximum likelihood
More informationControllability and Observability of Partial Differential Equations: Some results and open problems
Controllability and Observability of Partial Differential Equations: Some results and open problems Enrique ZUAZUA Departamento de Matemáticas Universidad Autónoma 2849 Madrid. Spain. enrique.zuazua@uam.es
More informationA UNIQUENESS RESULT FOR THE CONTINUITY EQUATION IN TWO DIMENSIONS. Dedicated to Constantine Dafermos on the occasion of his 70 th birthday
A UNIQUENESS RESULT FOR THE CONTINUITY EQUATION IN TWO DIMENSIONS GIOVANNI ALBERTI, STEFANO BIANCHINI, AND GIANLUCA CRIPPA Dedicated to Constantine Dafermos on the occasion of his 7 th birthday Abstract.
More informationNew Support Vector Algorithms
LETTER Communicated by John Platt New Support Vector Algorithms Bernhard Schölkopf Alex J. Smola GMD FIRST, 12489 Berlin, Germany, and Department of Engineering, Australian National University, Canberra
More informationClustering with Bregman Divergences
Journal of Machine Learning Research 6 (2005) 1705 1749 Submitted 10/03; Revised 2/05; Published 10/05 Clustering with Bregman Divergences Arindam Banerjee Srujana Merugu Department of Electrical and Computer
More informationLinear Control Systems
Chapter 3 Linear Control Systems Topics : 1. Controllability 2. Observability 3. Linear Feedback 4. Realization Theory Copyright c Claudiu C. Remsing, 26. All rights reserved. 7 C.C. Remsing 71 Intuitively,
More informationRate adaptation, Congestion Control and Fairness: A Tutorial. JEANYVES LE BOUDEC Ecole Polytechnique Fédérale de Lausanne (EPFL)
Rate adaptation, Congestion Control and Fairness: A Tutorial JEANYVES LE BOUDEC Ecole Polytechnique Fédérale de Lausanne (EPFL) September 12, 2014 2 Thanks to all who contributed to reviews and debugging
More informationA Study on SMOtype Decomposition Methods for Support Vector Machines
1 A Study on SMOtype Decomposition Methods for Support Vector Machines PaiHsuen Chen, RongEn Fan, and ChihJen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan cjlin@csie.ntu.edu.tw
More informationThe Backpropagation Algorithm
7 The Backpropagation Algorithm 7. Learning as gradient descent We saw in the last chapter that multilayered networks are capable of computing a wider range of Boolean functions than networks with a single
More informationProperties of BMO functions whose reciprocals are also BMO
Properties of BMO functions whose reciprocals are also BMO R. L. Johnson and C. J. Neugebauer The main result says that a nonnegative BMOfunction w, whose reciprocal is also in BMO, belongs to p> A p,and
More informationGeneralized compact knapsacks, cyclic lattices, and efficient oneway functions
Generalized compact knapsacks, cyclic lattices, and efficient oneway functions Daniele Micciancio University of California, San Diego 9500 Gilman Drive La Jolla, CA 920930404, USA daniele@cs.ucsd.edu
More informationAn Introduction to Mathematical Optimal Control Theory Version 0.2
An Introduction to Mathematical Optimal Control Theory Version.2 By Lawrence C. Evans Department of Mathematics University of California, Berkeley Chapter 1: Introduction Chapter 2: Controllability, bangbang
More informationDiagonal preconditioning for first order primaldual algorithms in convex optimization
Diagonal preconditioning for first order primaldual algorithms in conve optimization Thomas Pock Institute for Computer Graphics and Vision Graz University of Technology pock@icg.tugraz.at Antonin Chambolle
More informationA Leastsquares Approach to Direct Importance Estimation
Journal of Machine Learning Research 0 (2009) 39445 Submitted 3/08; Revised 4/09; Published 7/09 A Leastsquares Approach to Direct Importance Estimation Takafumi Kanamori Department of Computer Science
More informationSecondorder conditions in C 1,1 constrained vector optimization
Ivan Ginchev, Angelo Guerreggio, Matteo Rocca Secondorder conditions in C 1,1 constrained vector optimization 2004/17 UNIVERSITÀ DELL'INSUBRIA FACOLTÀ DI ECONOMIA http://eco.uninsubria.it In questi quaderni
More informationGroup Sparsity in Nonnegative Matrix Factorization
Group Sparsity in Nonnegative Matrix Factorization Jingu Kim Renato D. C. Monteiro Haesun Park Abstract A recent challenge in data analysis for science and engineering is that data are often represented
More information