Convex Optimization: Modeling and Algorithms


1 Convex Optimization: Modeling and Algorithms Lieven Vandenberghe Electrical Engineering Department, UC Los Angeles Tutorial lectures, 21st Machine Learning Summer School Kyoto, August 29-30, 2012

2 Introduction Convex optimization MLSS 2012 mathematical optimization linear and convex optimization recent history 1

3 Mathematical optimization
minimize f_0(x_1,...,x_n)
subject to f_1(x_1,...,x_n) ≤ 0, ..., f_m(x_1,...,x_n) ≤ 0
a mathematical model of a decision, design, or estimation problem
finding a global solution is generally intractable
even simple-looking nonlinear optimization problems can be very hard Introduction 2

4 The famous exception: linear programming
minimize c_1 x_1 + ··· + c_n x_n
subject to a_11 x_1 + ··· + a_1n x_n ≤ b_1
           ...
           a_m1 x_1 + ··· + a_mn x_n ≤ b_m
widely used since Dantzig introduced the simplex algorithm in 1948
since 1950s, many applications in operations research, network optimization, finance, engineering, combinatorial optimization, ...
extensive theory (optimality conditions, sensitivity analysis, ...)
there exist very efficient algorithms for solving linear programs Introduction 3

5 Convex optimization problem
minimize f_0(x)
subject to f_i(x) ≤ 0, i = 1,...,m
objective and constraint functions are convex: for 0 ≤ θ ≤ 1
    f_i(θx + (1−θ)y) ≤ θ f_i(x) + (1−θ) f_i(y)
can be solved globally, with similar (polynomial-time) complexity as LPs
surprisingly many problems can be solved via convex optimization
provides tractable heuristics and relaxations for non-convex problems Introduction 4

6 History 1940s: linear programming minimize c T x subject to a T i x b i, i = 1,...,m 1950s: quadratic programming 1960s: geometric programming 1990s: semidefinite programming, second-order cone programming, quadratically constrained quadratic programming, robust optimization, sum-of-squares programming,... Introduction 5

7 New applications since 1990 linear matrix inequality techniques in control support vector machine training via quadratic programming semidefinite programming relaxations in combinatorial optimization circuit design via geometric programming l 1 -norm optimization for sparse signal reconstruction applications in structural optimization, statistics, signal processing, communications, image processing, computer vision, quantum information theory, finance, power distribution,... Introduction 6

8 Advances in convex optimization algorithms
interior-point methods
    1984 (Karmarkar): first practical polynomial-time algorithm for LP
    efficient implementations for large-scale LPs
    around 1990 (Nesterov & Nemirovski): polynomial-time interior-point methods for nonlinear convex programming
    since 1990: extensions and high-quality software packages
first-order algorithms
    fast gradient methods, based on Nesterov's methods from the 1980s
    extend to certain nondifferentiable or constrained problems
    multiplier methods for large-scale and distributed optimization Introduction 7

9 Overview 1. Basic theory and convex modeling convex sets and functions common problem classes and applications 2. Interior-point methods for conic optimization conic optimization barrier methods symmetric primal-dual methods 3. First-order methods (proximal) gradient algorithms dual techniques and multiplier methods

10 Convex sets and functions Convex optimization MLSS 2012 convex sets convex functions operations that preserve convexity

11 Convex set
contains the line segment between any two points in the set:
    x_1, x_2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx_1 + (1−θ)x_2 ∈ C
(figure: one convex set and two non-convex sets)
Convex sets and functions 8

12 Basic examples
affine set: solution set of linear equations Ax = b
halfspace: solution of one linear inequality a^T x ≤ b (a ≠ 0)
polyhedron: solution of finitely many linear inequalities Ax ⪯ b
ellipsoid: solution of a positive definite quadratic inequality (x − x_c)^T A (x − x_c) ≤ 1 (A positive definite)
norm ball: solution of ‖x‖ ≤ R (for any norm)
positive semidefinite cone: S^n_+ = {X ∈ S^n | X ⪰ 0}
the intersection of any number of convex sets is convex Convex sets and functions 9

13 Example of intersection property
C = {x ∈ R^n : |p(t)| ≤ 1 for |t| ≤ π/3}, where p(t) = x_1 cos t + x_2 cos 2t + ··· + x_n cos nt
(figures: the trigonometric polynomial p(t) and the set C in the (x_1, x_2)-plane)
C is the intersection of infinitely many halfspaces, hence convex Convex sets and functions 10

14 Convex function
domain dom f is a convex set and Jensen's inequality holds: for all x, y ∈ dom f, 0 ≤ θ ≤ 1
    f(θx + (1−θ)y) ≤ θ f(x) + (1−θ) f(y)
(figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph)
f is concave if −f is convex Convex sets and functions 11

15 Examples
linear and affine functions are convex and concave
exp x, −log x, x log x are convex
x^α is convex for x > 0 and α ≥ 1 or α ≤ 0; |x|^α is convex for α ≥ 1
norms are convex
quadratic-over-linear function x^T x / t is convex in (x, t) for t > 0
geometric mean (x_1 x_2 ··· x_n)^{1/n} is concave for x ⪰ 0
log det X is concave on the set of positive definite matrices
log(e^{x_1} + ··· + e^{x_n}) is convex Convex sets and functions 12

16 Epigraph and sublevel set
epigraph: epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}
a function is convex if and only if its epigraph is a convex set
sublevel sets: C_α = {x ∈ dom f | f(x) ≤ α}
the sublevel sets of a convex function are convex (the converse is false) Convex sets and functions 13

17 Differentiable convex functions
differentiable f is convex if and only if dom f is convex and
    f(y) ≥ f(x) + ∇f(x)^T (y − x) for all x, y ∈ dom f
(figure: the first-order approximation f(x) + ∇f(x)^T (y − x) is a global underestimator)
twice differentiable f is convex if and only if dom f is convex and ∇²f(x) ⪰ 0 for all x ∈ dom f Convex sets and functions 14

18 Establishing convexity of a function 1. verify definition 2. for twice differentiable functions, show 2 f(x) 0 3. show that f is obtained from simple convex functions by operations that preserve convexity nonnegative weighted sum composition with affine function pointwise maximum and supremum minimization composition perspective Convex sets and functions 15

19 Positive weighted sum & composition with affine function
nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f_1 + f_2 is convex if f_1, f_2 are convex (extends to infinite sums, integrals)
composition with affine function: f(Ax + b) is convex if f is convex
examples
    logarithmic barrier for linear inequalities: f(x) = −∑_{i=1}^m log(b_i − a_i^T x)
    (any) norm of an affine function: f(x) = ‖Ax + b‖ Convex sets and functions 16

20 Pointwise maximum
f(x) = max{f_1(x), ..., f_m(x)} is convex if f_1, ..., f_m are convex
example: sum of the r largest components of x ∈ R^n
    f(x) = x_[1] + x_[2] + ··· + x_[r] is convex (x_[i] is the ith largest component of x)
proof: f(x) = max{x_{i_1} + x_{i_2} + ··· + x_{i_r} | 1 ≤ i_1 < i_2 < ··· < i_r ≤ n} Convex sets and functions 17

21 Pointwise supremum
g(x) = sup_{y ∈ A} f(x, y) is convex if f(x, y) is convex in x for each y ∈ A
examples
    maximum eigenvalue of a symmetric matrix: λ_max(X) = sup_{‖y‖_2 = 1} y^T X y
    support function of a set C: S_C(x) = sup_{y ∈ C} y^T x Convex sets and functions 18

22 Minimization
h(x) = inf_{y ∈ C} f(x, y) is convex if f(x, y) is convex in (x, y) and C is a convex set
examples
    distance to a convex set C: h(x) = inf_{y ∈ C} ‖x − y‖
    optimal value of a linear program as a function of the righthand side,
        h(x) = inf_{y: Ay ⪯ x} c^T y,
    follows by taking f(x, y) = c^T y, dom f = {(x, y) | Ay ⪯ x} Convex sets and functions 19

23 Composition
composition of g : R^n → R and h : R → R: f(x) = h(g(x)) is convex if
    g convex, h convex and nondecreasing
    g concave, h convex and nonincreasing
(if we assign h(x) = ∞ for x ∉ dom h)
examples
    exp g(x) is convex if g is convex
    1/g(x) is convex if g is concave and positive Convex sets and functions 20

24 Vector composition
composition of g : R^n → R^k and h : R^k → R: f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x)) is convex if
    g_i convex, h convex and nondecreasing in each argument
    g_i concave, h convex and nonincreasing in each argument
(if we assign h(x) = ∞ for x ∉ dom h)
example: log ∑_{i=1}^m exp g_i(x) is convex if the g_i are convex Convex sets and functions 21

25 Perspective
the perspective of a function f : R^n → R is the function g : R^n × R → R,
    g(x, t) = t f(x/t)
g is convex if f is convex, on dom g = {(x, t) | x/t ∈ dom f, t > 0}
examples
    perspective of f(x) = x^T x is the quadratic-over-linear function g(x, t) = x^T x / t
    perspective of the negative logarithm f(x) = −log x is the relative entropy g(x, t) = t log t − t log x Convex sets and functions 22

26 Conjugate function
the conjugate of a function f is
    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))
(figure: f*(y) is the maximum gap between the linear function xy and f(x))
f* is convex (even if f is not) Convex sets and functions 23

27 Examples
convex quadratic function (Q ≻ 0):
    f(x) = (1/2) x^T Q x,    f*(y) = (1/2) y^T Q^{-1} y
negative entropy:
    f(x) = ∑_{i=1}^n x_i log x_i,    f*(y) = ∑_{i=1}^n e^{y_i − 1}
norm:
    f(x) = ‖x‖,    f*(y) = 0 if ‖y‖_* ≤ 1, +∞ otherwise
indicator function (C convex):
    f(x) = I_C(x) = 0 if x ∈ C, +∞ otherwise,    f*(y) = sup_{x ∈ C} y^T x Convex sets and functions 24

28 Convex optimization problems Convex optimization MLSS 2012 linear programming quadratic programming geometric programming second-order cone programming semidefinite programming

29 Convex optimization problem minimize f 0 (x) subject to f i (x) 0, i = 1,...,m Ax = b f 0, f 1,..., f m are convex functions feasible set is convex locally optimal points are globally optimal tractable, in theory and practice Convex optimization problems 25

30 Linear program (LP)
minimize c^T x + d
subject to Gx ⪯ h
           Ax = b
the inequality ⪯ is componentwise vector inequality
convex problem with affine objective and constraint functions
feasible set is a polyhedron
(figure: the polyhedron P, the cost vector c, and the optimal point x*) Convex optimization problems 26

31 Piecewise-linear minimization
minimize f(x) = max_{i=1,...,m} (a_i^T x + b_i)
(figure: f is the pointwise maximum of the affine functions a_i^T x + b_i)
equivalent linear program, an LP with variables x, t ∈ R:
    minimize t
    subject to a_i^T x + b_i ≤ t, i = 1,...,m Convex optimization problems 27
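The reformulation above is easy to state in CVXPY (one of the modeling packages mentioned later in the lectures); the sketch below uses random placeholder data for a_i, b_i and checks the epigraph LP against the direct formulation.

import cvxpy as cp
import numpy as np

m, n = 20, 5
A = np.random.randn(m, n)      # rows are a_i^T (placeholder data)
b = np.random.randn(m)

x = cp.Variable(n)
t = cp.Variable()
# epigraph LP: minimize t subject to a_i^T x + b_i <= t
lp = cp.Problem(cp.Minimize(t), [A @ x + b <= t])
lp.solve()

# the direct piecewise-linear formulation gives the same optimal value
direct = cp.Problem(cp.Minimize(cp.max(A @ x + b)))
direct.solve()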

32 ℓ1-norm and ℓ∞-norm minimization
ℓ1-norm approximation and equivalent LP (‖y‖_1 = ∑_k |y_k|):
    minimize ‖Ax − b‖_1
is equivalent to
    minimize ∑_i y_i
    subject to −y ⪯ Ax − b ⪯ y
ℓ∞-norm approximation (‖y‖_∞ = max_k |y_k|):
    minimize ‖Ax − b‖_∞
is equivalent to
    minimize y
    subject to −y·1 ⪯ Ax − b ⪯ y·1
(1 is the vector of ones) Convex optimization problems 28

33 example: histograms of residuals Ax − b for
    x_ls = argmin ‖Ax − b‖_2,    x_ℓ1 = argmin ‖Ax − b‖_1
(figures: histograms of (Ax_ls − b)_k and (Ax_ℓ1 − b)_k)
the 1-norm distribution is wider, with a high peak at zero Convex optimization problems 29

34 Robust regression
(figure: 42 points t_i, y_i (circles), including two outliers; the function f(t) = α + βt fitted using the 2-norm (dashed) and the 1-norm) Convex optimization problems 30

35 Linear discrimination
given a set of points {x_1, ..., x_N} with binary labels s_i ∈ {−1, 1}
find a hyperplane a^T x + b = 0 that strictly separates the two classes:
    a^T x_i + b > 0 if s_i = 1
    a^T x_i + b < 0 if s_i = −1
homogeneous in a, b, hence equivalent to the linear inequalities (in a, b)
    s_i (a^T x_i + b) ≥ 1, i = 1,...,N Convex optimization problems 31

36 Approximate linear separation of non-separable sets
minimize ∑_{i=1}^N max{0, 1 − s_i(a^T x_i + b)}
a piecewise-linear minimization problem in a, b; equivalent to an LP
can be interpreted as a heuristic for minimizing #misclassified points Convex optimization problems 32

37 Quadratic program (QP)
minimize (1/2) x^T P x + q^T x + r
subject to Gx ⪯ h
P ∈ S^n_+, so the objective is convex quadratic
minimize a convex quadratic function over a polyhedron
(figure: level sets of f_0 over the polyhedron P, with optimal point x*) Convex optimization problems 33

38 Linear program with random cost minimize c T x subject to Gx h c is random vector with mean c and covariance Σ hence, c T x is random variable with mean c T x and variance x T Σx expected cost-variance trade-off minimize Ec T x+γvar(c T x) = c T x+γx T Σx subject to Gx h γ > 0 is risk aversion parameter Convex optimization problems 34

39 Robust linear discrimination
    H_1 = {z | a^T z + b = 1},    H_{−1} = {z | a^T z + b = −1}
distance between the hyperplanes is 2/‖a‖_2
to separate two sets of points by maximum margin, minimize ‖a‖_2^2 = a^T a subject to
    s_i (a^T x_i + b) ≥ 1, i = 1,...,N
a quadratic program in a, b Convex optimization problems 35

40 Support vector classifier
minimize γ ‖a‖_2^2 + ∑_{i=1}^N max{0, 1 − s_i(a^T x_i + b)}
(figures: separating hyperplanes for γ = 0 and γ = 10)
equivalent to a quadratic program Convex optimization problems 36
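A hedged CVXPY sketch of this support vector classifier; the data X, s and the value of γ below are placeholders, not the lecture's example.

import cvxpy as cp
import numpy as np

N, n = 100, 2
X = np.random.randn(N, n)                  # points x_i as rows (placeholder)
s = np.sign(np.random.randn(N))            # labels +-1 (placeholder)
gamma = 1.0

a = cp.Variable(n)
b = cp.Variable()
hinge = cp.sum(cp.pos(1 - cp.multiply(s, X @ a + b)))   # sum_i max{0, 1 - s_i(a^T x_i + b)}
prob = cp.Problem(cp.Minimize(gamma * cp.sum_squares(a) + hinge))
prob.solve()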

41 Kernel formulation
minimize f(Xa) + ‖a‖_2^2
variables a ∈ R^n; X ∈ R^{N×n} with N ≤ n and rank N
change of variables y = Xa, a = X^T (XX^T)^{-1} y
(a is the minimum-norm solution of Xa = y)
gives a convex problem with N variables y:
    minimize f(y) + y^T Q^{-1} y
Q = XX^T is the kernel matrix Convex optimization problems 37

42 Total variation signal reconstruction
minimize ‖x̂ − x_cor‖_2^2 + γ φ(x̂)
x_cor = x + v is a corrupted version of an unknown signal x, with noise v
variable x̂ (reconstructed signal) is an estimate of x
φ : R^n → R is a quadratic or total variation smoothing penalty:
    φ_quad(x̂) = ∑_{i=1}^{n−1} (x̂_{i+1} − x̂_i)^2,    φ_tv(x̂) = ∑_{i=1}^{n−1} |x̂_{i+1} − x̂_i| Convex optimization problems 38
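A short CVXPY sketch of the total variation version of this problem; the piecewise-constant signal standing in for the unknown x is a synthetic assumption for illustration only.

import cvxpy as cp
import numpy as np

n = 200
x_true = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])   # piecewise-constant signal (assumed)
x_cor = x_true + 0.1 * np.random.randn(n)                      # corrupted observation
gamma = 1.0

xhat = cp.Variable(n)
phi_tv = cp.norm(cp.diff(xhat), 1)            # total variation penalty
# phi_quad = cp.sum_squares(cp.diff(xhat))    # quadratic smoothing alternative
prob = cp.Problem(cp.Minimize(cp.sum_squares(xhat - x_cor) + gamma * phi_tv))
prob.solve()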

43 example: x_cor, and reconstruction with quadratic and total variation smoothing
(figures: the corrupted signal and the two reconstructions)
quadratic smoothing smooths out both the noise and the sharp transitions in the signal
total variation smoothing preserves the sharp transitions in the signal Convex optimization problems 39

44 Geometric programming
posynomial function:
    f(x) = ∑_{k=1}^K c_k x_1^{a_{1k}} x_2^{a_{2k}} ··· x_n^{a_{nk}},    dom f = R^n_{++}
with c_k > 0
geometric program (GP), with f_i posynomial:
    minimize f_0(x)
    subject to f_i(x) ≤ 1, i = 1,...,m Convex optimization problems 40

45 Geometric program in convex form
change variables to y_i = log x_i, and take the logarithm of cost and constraints
geometric program in convex form:
    minimize log ∑_{k=1}^K exp(a_{0k}^T y + b_{0k})
    subject to log ∑_{k=1}^K exp(a_{ik}^T y + b_{ik}) ≤ 0, i = 1,...,m
with b_{ik} = log c_{ik} Convex optimization problems 41

46 Second-order cone program (SOCP)
minimize f^T x
subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1,...,m
‖·‖_2 is the Euclidean norm ‖y‖_2 = sqrt(y_1^2 + ··· + y_n^2)
constraints are nonlinear, nondifferentiable, convex
constraints are inequalities w.r.t. the second-order cone:
    {y | sqrt(y_1^2 + ··· + y_{p−1}^2) ≤ y_p}
(figure: the second-order cone in R^3) Convex optimization problems 42

47 Robust linear program (stochastic)
minimize c^T x
subject to prob(a_i^T x ≤ b_i) ≥ η, i = 1,...,m
a_i random and normally distributed with mean ā_i, covariance Σ_i
we require that x satisfies each constraint with probability exceeding η
(figures: feasible sets for η = 10%, η = 50%, η = 90%) Convex optimization problems 43

48 SOCP formulation
the chance constraint prob(a_i^T x ≤ b_i) ≥ η is equivalent to the constraint
    ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 ≤ b_i
Φ is the (unit) normal cumulative distribution function
(figure: Φ(t) and the quantile Φ^{-1}(η))
the robust LP is a second-order cone program for η ≥ 0.5 Convex optimization problems 44

49 Robust linear program (deterministic)
minimize c^T x
subject to a_i^T x ≤ b_i for all a_i ∈ E_i, i = 1,...,m
a_i uncertain but bounded by the ellipsoid E_i = {ā_i + P_i u | ‖u‖_2 ≤ 1}
we require that x satisfies each constraint for all possible a_i
SOCP formulation:
    minimize c^T x
    subject to ā_i^T x + ‖P_i^T x‖_2 ≤ b_i, i = 1,...,m
follows from sup_{‖u‖_2 ≤ 1} (ā_i + P_i u)^T x = ā_i^T x + ‖P_i^T x‖_2 Convex optimization problems 45
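The SOCP formulation above translates directly into CVXPY; the ellipsoid data ā_i, P_i, the right-hand sides, and the norm bound on x below are made-up placeholders that keep the toy instance bounded.

import cvxpy as cp
import numpy as np

m, n = 5, 3
a_bar = np.random.randn(m, n)                        # nominal coefficient vectors (placeholder)
P = [0.1 * np.random.randn(n, n) for _ in range(m)]  # ellipsoid shapes (placeholder)
b = np.ones(m)
c = np.random.randn(n)

x = cp.Variable(n)
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
cons.append(cp.norm(x, 2) <= 10)                     # keep the toy instance bounded
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()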

50 Examples of second-order cone constraints
convex quadratic constraint (A = LL^T positive definite):
    x^T A x + 2 b^T x + c ≤ 0    ⟺    ‖L^T x + L^{-1} b‖_2 ≤ (b^T A^{-1} b − c)^{1/2}
extends to positive semidefinite singular A
hyperbolic constraint:
    x^T x ≤ yz, y, z ≥ 0    ⟺    ‖(2x, y − z)‖_2 ≤ y + z, y, z ≥ 0 Convex optimization problems 46

51 Examples of SOC-representable constraints
positive powers:
    x^{1.5} ≤ t, x ≥ 0    ⟺    ∃z : x^2 ≤ tz, z^2 ≤ x, x, z ≥ 0
    the two hyperbolic constraints can be converted to SOC constraints
    extends to powers x^p for rational p ≥ 1
negative powers:
    x^{−3} ≤ t, x > 0    ⟺    ∃z : 1 ≤ xz, z^2 ≤ tx, x, z ≥ 0
    the two hyperbolic constraints on the r.h.s. can be converted to SOC constraints
    extends to powers x^p for rational p < 0 Convex optimization problems 47

52 Semidefinite program (SDP)
minimize c^T x
subject to x_1 A_1 + x_2 A_2 + ··· + x_n A_n ⪯ B
A_1, A_2, ..., A_n, B are symmetric matrices
the inequality X ⪯ Y means Y − X is positive semidefinite, i.e.,
    z^T (Y − X) z = ∑_{i,j} (Y_ij − X_ij) z_i z_j ≥ 0 for all z
includes many nonlinear constraints as special cases Convex optimization problems 48

53 Geometry
(figure: the set of (x, y, z) with [[x, y], [y, z]] ⪰ 0, a nonpolyhedral convex cone)
the feasible set of a semidefinite program is the intersection of the positive semidefinite cone in high dimension with planes Convex optimization problems 49

54 Examples
A(x) = A_0 + x_1 A_1 + ··· + x_m A_m (A_i ∈ S^n)
eigenvalue minimization (and equivalent SDP):
    minimize λ_max(A(x))    ⟺    minimize t subject to A(x) ⪯ tI
matrix-fractional function:
    minimize b^T A(x)^{-1} b subject to A(x) ⪰ 0
    ⟺    minimize t subject to [[A(x), b], [b^T, t]] ⪰ 0 Convex optimization problems 50

55 Matrix norm minimization
A(x) = A_0 + x_1 A_1 + x_2 A_2 + ··· + x_n A_n (A_i ∈ R^{p×q})
matrix norm approximation (‖X‖_2 = max_k σ_k(X)):
    minimize ‖A(x)‖_2    ⟺    minimize t subject to [[tI, A(x)], [A(x)^T, tI]] ⪰ 0
nuclear norm approximation (‖X‖_* = ∑_k σ_k(X)):
    minimize ‖A(x)‖_*    ⟺    minimize (tr U + tr V)/2 subject to [[U, A(x)], [A(x)^T, V]] ⪰ 0 Convex optimization problems 51

56 Semidefinite relaxation
semidefinite programming is often used
    to find good bounds for nonconvex polynomial problems, via relaxation
    as a heuristic for good suboptimal points
example: Boolean least-squares
    minimize ‖Ax − b‖_2^2
    subject to x_i^2 = 1, i = 1,...,n
a basic problem in digital communications
could check all 2^n possible values of x ∈ {−1, 1}^n ...
an NP-hard problem, and very hard in general Convex optimization problems 52

57 Lifting
Boolean least-squares problem:
    minimize x^T A^T A x − 2 b^T A x + b^T b
    subject to x_i^2 = 1, i = 1,...,n
reformulation: introduce a new variable Y = xx^T
    minimize tr(A^T A Y) − 2 b^T A x + b^T b
    subject to Y = xx^T, diag(Y) = 1
cost function and second constraint are linear (in the variables Y, x)
first constraint is nonlinear and nonconvex ... still a very hard problem Convex optimization problems 53

58 Relaxation
replace Y = xx^T with the weaker constraint Y ⪰ xx^T to obtain the relaxation
    minimize tr(A^T A Y) − 2 b^T A x + b^T b
    subject to Y ⪰ xx^T, diag(Y) = 1
convex; can be solved as a semidefinite program:
    Y ⪰ xx^T    ⟺    [[Y, x], [x^T, 1]] ⪰ 0
optimal value gives a lower bound for the Boolean LS problem
if Y = xx^T at the optimum, we have solved the exact problem
otherwise, can use randomized rounding: generate z from N(x, Y − xx^T) and take x = sign(z) Convex optimization problems 54
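A minimal CVXPY sketch of this relaxation and of the randomized rounding step; A and b are random placeholders and the problem size is kept tiny.

import cvxpy as cp
import numpy as np

n = 10
A = np.random.randn(2 * n, n)
b = np.random.randn(2 * n)

# lifted variable Z = [[Y, x], [x^T, 1]], constrained PSD (i.e. Y >= xx^T)
Z = cp.Variable((n + 1, n + 1), PSD=True)
Y, x = Z[:n, :n], Z[:n, n]
obj = cp.trace(A.T @ A @ Y) - 2 * (A.T @ b) @ x + b @ b
prob = cp.Problem(cp.Minimize(obj), [cp.diag(Z) == 1])
prob.solve()                                   # optimal value lower-bounds the Boolean LS value

# randomized rounding: sample z ~ N(x, Y - xx^T) and take sign(z)
xv, Yv = x.value, Y.value
cov = Yv - np.outer(xv, xv) + 1e-9 * np.eye(n)
cands = [np.sign(np.random.multivariate_normal(xv, cov)) for _ in range(100)]
best = min(np.linalg.norm(A @ z - b) ** 2 for z in cands)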

59 Example
(figure: histogram of ‖Ax − b‖_2 / (SDP bound) for 1000 randomized solutions from the SDP relaxation, with the SDP bound and the LS solution marked)
n = 100: the feasible set has 2^100 points Convex optimization problems 55

60 Overview 1. Basic theory and convex modeling convex sets and functions common problem classes and applications 2. Interior-point methods for conic optimization conic optimization barrier methods symmetric primal-dual methods 3. First-order methods (proximal) gradient algorithms dual techniques and multiplier methods

61 Conic optimization Convex optimization MLSS 2012 definitions and examples modeling duality

62 Generalized (conic) inequalities
conic inequality: a constraint x ∈ K with K a convex cone in R^m
we require that K is a proper cone:
    closed
    pointed: does not contain a line (equivalently, K ∩ (−K) = {0})
    with nonempty interior: int K ≠ ∅ (equivalently, K + (−K) = R^m)
notation
    x ⪰_K y  ⟺  x − y ∈ K,    x ≻_K y  ⟺  x − y ∈ int K
the subscript in ⪰_K is omitted if K is clear from the context Conic optimization 56

63 Cone linear program
minimize c^T x
subject to Ax ⪯_K b
if K is the nonnegative orthant, this is a (regular) linear program
widely used in recent literature on convex optimization
    modeling: a small number of primitive cones is sufficient to express most convex constraints that arise in practice
    algorithms: a convenient problem format when extending interior-point algorithms for linear programming to convex optimization Conic optimization 57

64 Norm cone
K = {(x, y) ∈ R^{m−1} × R | ‖x‖ ≤ y}
(figure: the norm cone in R^3)
for the Euclidean norm this is the second-order cone (notation: Q^m) Conic optimization 58

65 Second-order cone program
minimize c^T x
subject to ‖B_{k0} x + d_{k0}‖_2 ≤ B_{k1} x + d_{k1}, k = 1,...,r
cone LP formulation: express the constraints as Ax ⪯_K b with
    K = Q^{m_1} × ··· × Q^{m_r}
    A = −[B_{10}; B_{11}; ...; B_{r0}; B_{r1}],    b = [d_{10}; d_{11}; ...; d_{r0}; d_{r1}]
(assuming B_{k0}, d_{k0} have m_k − 1 rows) Conic optimization 59

66 Vector notation for symmetric matrices
vectorized symmetric matrix: for U ∈ S^p
    vec(U) = √2 (U_{11}/√2, U_{21}, ..., U_{p1}, U_{22}/√2, U_{32}, ..., U_{p2}, ..., U_{pp}/√2)
inverse operation: for u = (u_1, u_2, ..., u_n) ∈ R^n with n = p(p+1)/2
    mat(u) = (1/√2) [[√2 u_1, u_2, ..., u_p], [u_2, √2 u_{p+1}, ..., u_{2p−1}], ..., [u_p, u_{2p−1}, ..., √2 u_{p(p+1)/2}]]
the coefficients √2 are added so that standard inner products are preserved:
    tr(UV) = vec(U)^T vec(V),    u^T v = tr(mat(u) mat(v)) Conic optimization 60
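A small numpy sketch of one possible implementation of these maps (the entry ordering below is one choice; the √2 scaling is what matters), verifying the two inner-product identities.

import numpy as np

def vec(U):
    # columnwise lower triangle, off-diagonal entries scaled by sqrt(2)
    p = U.shape[0]
    out = []
    for j in range(p):
        out.append(U[j, j])
        out.extend(np.sqrt(2) * U[j + 1:, j])
    return np.array(out)

def mat(u, p):
    # inverse operation: rebuild the symmetric matrix
    X = np.zeros((p, p))
    k = 0
    for j in range(p):
        X[j, j] = u[k]
        k += 1
        r = p - j - 1
        X[j + 1:, j] = u[k:k + r] / np.sqrt(2)
        k += r
    return X + X.T - np.diag(np.diag(X))

p = 4
U = np.random.randn(p, p); U = U + U.T
V = np.random.randn(p, p); V = V + V.T
assert np.isclose(np.trace(U @ V), vec(U) @ vec(V))      # tr(UV) = vec(U)^T vec(V)
assert np.isclose(vec(U) @ vec(V), np.trace(mat(vec(U), p) @ mat(vec(V), p)))
assert np.allclose(mat(vec(U), p), U)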

67 Positive semidefinite cone
S^p = {vec(X) | X ∈ S^p_+} = {x ∈ R^{p(p+1)/2} | mat(x) ⪰ 0}
example:
    S^2 = {(x, y, z) | [[x, y/√2], [y/√2, z]] ⪰ 0}
(figure: this cone in R^3) Conic optimization 61

68 Semidefinite program
minimize c^T x
subject to x_1 A_{11} + x_2 A_{12} + ··· + x_n A_{1n} ⪯ B_1
           ...
           x_1 A_{r1} + x_2 A_{r2} + ··· + x_n A_{rn} ⪯ B_r
r linear matrix inequalities of order p_1, ..., p_r
cone LP formulation: express the constraints as Ax ⪯_K b with
    K = S^{p_1} × S^{p_2} × ··· × S^{p_r}
    A = [vec(A_{11}), vec(A_{12}), ..., vec(A_{1n}); ...; vec(A_{r1}), vec(A_{r2}), ..., vec(A_{rn})],    b = (vec(B_1), vec(B_2), ..., vec(B_r)) Conic optimization 62

69 Exponential cone
the epigraph of the perspective of exp x is a non-proper cone
    K = {(x, y, z) ∈ R^3 | y e^{x/y} ≤ z, y > 0}
the exponential cone is K_exp = cl K = K ∪ {(x, 0, z) | x ≤ 0, z ≥ 0}
(figure: the exponential cone) Conic optimization 63

70 Geometric program
minimize c^T x
subject to log ∑_{k=1}^{n_i} exp(a_{ik}^T x + b_{ik}) ≤ 0, i = 1,...,r
cone LP formulation:
    minimize c^T x
    subject to (a_{ik}^T x + b_{ik}, 1, z_{ik}) ∈ K_exp, k = 1,...,n_i, i = 1,...,r
               ∑_{k=1}^{n_i} z_{ik} ≤ 1, i = 1,...,r Conic optimization 64

71 Power cone
definition: for α = (α_1, α_2, ..., α_m) > 0 with ∑_{i=1}^m α_i = 1,
    K_α = {(x, y) ∈ R^m_+ × R | |y| ≤ x_1^{α_1} ··· x_m^{α_m}}
(figures: examples for m = 2 with α = (1/2, 1/2), α = (2/3, 1/3), α = (3/4, 1/4)) Conic optimization 65

72 Outline definition and examples modeling duality

73 Modeling software modeling packages for convex optimization CVX, YALMIP (MATLAB) CVXPY, CVXMOD (Python) assist the user in formulating convex problems, by automating two tasks: verifying convexity from convex calculus rules transforming problem in input format required by standard solvers related packages general-purpose optimization modeling: AMPL, GAMS Conic optimization 66

74 CVX example
minimize ‖Ax − b‖_1
subject to 0 ≤ x_k ≤ 1, k = 1,...,n
MATLAB code
    cvx_begin
        variable x(3);
        minimize(norm(A*x - b, 1))
        subject to
            x >= 0;
            x <= 1;
    cvx_end
between cvx_begin and cvx_end, x is a CVX variable
after execution, x is a MATLAB variable with the optimal solution Conic optimization 67
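For reference, a CVXPY version of the same model (CVXPY is one of the Python packages listed on the previous slide); A and b below are placeholder arrays.

import cvxpy as cp
import numpy as np

A = np.random.randn(10, 3)        # placeholder data
b = np.random.randn(10)

x = cp.Variable(3)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 1)), [x >= 0, x <= 1])
prob.solve()
print(x.value)                    # optimal solution, analogous to the CVX variable x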

75 Modeling and conic optimization
convex modeling systems (CVX, YALMIP, CVXPY, CVXMOD, ...) convert problems stated in standard mathematical notation to cone LPs
    in principle, any convex problem can be represented as a cone LP
    in practice, a small set of primitive cones is used (R^n_+, Q^p, S^p)
    choice of cones is limited by available algorithms and solvers (see later)
modeling systems implement a set of rules for expressing constraints f(x) ≤ t as conic inequalities for the implemented cones Conic optimization 68

76 Examples of second-order cone representable functions
convex quadratic: f(x) = x^T P x + q^T x + r (P ⪰ 0)
quadratic-over-linear function: f(x, y) = x^T x / y with dom f = R^n × R_+ (assume 0/0 = 0)
convex powers with rational exponent:
    f(x) = |x|^α,    f(x) = x^β for x > 0, +∞ for x ≤ 0
    for rational α ≥ 1 and β ≤ 0
p-norm: f(x) = ‖x‖_p for rational p ≥ 1 Conic optimization 69

77 Examples of SD cone representable functions
matrix-fractional function: f(X, y) = y^T X^{-1} y with dom f = {(X, y) ∈ S^n_+ × R^n | y ∈ R(X)}
maximum eigenvalue of a symmetric matrix
maximum singular value f(X) = ‖X‖_2 = σ_1(X):
    ‖X‖_2 ≤ t    ⟺    [[tI, X], [X^T, tI]] ⪰ 0
nuclear norm f(X) = ‖X‖_* = ∑_i σ_i(X):
    ‖X‖_* ≤ t    ⟺    ∃ U, V : [[U, X], [X^T, V]] ⪰ 0, (1/2)(tr U + tr V) ≤ t Conic optimization 70

78 Functions representable with exponential and power cone
exponential cone
    exponential and logarithm
    entropy f(x) = x log x
power cone
    increasing power of absolute value: f(x) = |x|^p with p ≥ 1
    decreasing power: f(x) = x^q with q ≤ 0 and domain R_{++}
    p-norm: f(x) = ‖x‖_p with p ≥ 1 Conic optimization 71

79 Outline definition and examples modeling duality

80 Linear programming duality
primal and dual LP
    (P)  minimize c^T x subject to Ax ⪯ b
    (D)  maximize −b^T z subject to A^T z + c = 0, z ⪰ 0
primal optimal value is p* (+∞ if infeasible, −∞ if unbounded below)
dual optimal value is d* (−∞ if infeasible, +∞ if unbounded above)
duality theorem
    weak duality: p* ≥ d*, with no exception
    strong duality: p* = d* if primal or dual is feasible
    if p* = d* is finite, then primal and dual optima are attained Conic optimization 72

81 Dual cone
definition: K* = {y | x^T y ≥ 0 for all x ∈ K}
K* is a proper cone if K is a proper cone
dual inequality: x ⪰* y means x ⪰_{K*} y for generic proper cone K
note: the dual cone depends on the choice of inner product:
    H^{-1} K* is the dual cone for the inner product ⟨x, y⟩ = x^T H y Conic optimization 73

82 Examples
R^p_+, Q^p, S^p are self-dual: K = K*
dual of a norm cone is the norm cone of the dual norm
dual of the exponential cone:
    K*_exp = {(u, v, w) ∈ R_− × R × R_+ | −u log(−u/w) + u − v ≤ 0}
    (with 0 log(0/w) = 0 if w ≥ 0)
dual of the power cone:
    K*_α = {(u, v) ∈ R^m_+ × R | |v| ≤ (u_1/α_1)^{α_1} ··· (u_m/α_m)^{α_m}} Conic optimization 74

83 Primal and dual cone LP
primal problem (optimal value p*):
    minimize c^T x
    subject to Ax ⪯ b
dual problem (optimal value d*):
    maximize −b^T z
    subject to A^T z + c = 0, z ⪰* 0
weak duality: p* ≥ d* (without exception) Conic optimization 75

84 Strong duality p = d if primal or dual is strictly feasible slightly weaker than LP duality (which only requires feasibility) can have d < p with finite p and d other implications of strict feasibility if primal is strictly feasible, then dual optimum is attained (if d is finite) if dual is strictly feasible, then primal optimum is attained (if p is finite) Conic optimization 76

85 Optimality conditions
primal:  minimize c^T x subject to Ax + s = b, s ⪰ 0
dual:    maximize −b^T z subject to A^T z + c = 0, z ⪰* 0
optimality conditions:
    [0; s] = [0, A^T; −A, 0] [x; z] + [c; b]
    s ⪰ 0, z ⪰* 0, z^T s = 0
duality gap: the inner product of (x, z) and (0, s) gives
    z^T s = c^T x + b^T z Conic optimization 77

86 Barrier methods Convex optimization MLSS 2012 barrier method for linear programming normal barriers barrier method for conic optimization

87 History
1960s: Sequentially Unconstrained Minimization Technique (SUMT)
solves the nonlinear convex optimization problem
    minimize f_0(x)
    subject to f_i(x) ≤ 0, i = 1,...,m
via a sequence of unconstrained minimization problems
    minimize t f_0(x) − ∑_{i=1}^m log(−f_i(x))
1980s: LP barrier methods with polynomial worst-case complexity
1990s: barrier methods for non-polyhedral cone LPs Barrier methods 78

88 Logarithmic barrier function for linear inequalities
barrier for the nonnegative orthant R^m_+:  φ(s) = −∑_{i=1}^m log s_i
barrier for the inequalities Ax ⪯ b:
    ψ(x) = φ(b − Ax) = −∑_{i=1}^m log(b_i − a_i^T x)
convex, ψ(x) → ∞ at the boundary of dom ψ = {x | Ax ≺ b}
gradient and Hessian:
    ∇ψ(x) = −A^T ∇φ(s),    ∇²ψ(x) = A^T ∇²φ(s) A
with s = b − Ax and
    ∇φ(s) = −(1/s_1, ..., 1/s_m),    ∇²φ(s) = diag(1/s_1^2, ..., 1/s_m^2) Barrier methods 79
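A numpy sketch of ψ and its derivatives as given above, on placeholder data.

import numpy as np

def log_barrier(A, b, x):
    # psi(x) = -sum_i log(b_i - a_i^T x), with its gradient and Hessian
    s = b - A @ x                              # slacks, must be strictly positive
    psi = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                     # = -A^T grad(phi)(s)
    hess = A.T @ ((1.0 / s ** 2)[:, None] * A)
    return psi, grad, hess

A = np.random.randn(20, 5)                     # placeholder inequality data Ax <= b
b = np.ones(20)                                # x = 0 is strictly feasible
psi, g, H = log_barrier(A, b, np.zeros(5))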

89 Central path for linear program
minimize c^T x
subject to Ax ⪯ b
central path: minimizers x*(t) of
    f_t(x) = t c^T x + φ(b − Ax)
where t is a positive parameter
(figure: the central path x*(t) inside the polyhedron, approaching the optimum x*)
optimality conditions: x = x*(t) satisfies
    ∇f_t(x) = t c − A^T ∇φ(s) = 0,    s = b − Ax Barrier methods 80

90 Central path and duality
dual feasible point on the central path
for x = x*(t) and s = b − Ax,
    z*(t) = −(1/t) ∇φ(s) = (1/(t s_1), 1/(t s_2), ..., 1/(t s_m))
z = z*(t) is strictly dual feasible: c + A^T z = 0 and z > 0
can be corrected to account for inexact centering of x ≈ x*(t)
duality gap between x = x*(t) and z = z*(t) is
    c^T x + b^T z = s^T z = m/t
gives a bound on suboptimality: c^T x*(t) − p* ≤ m/t Barrier methods 81

91 Barrier method
starting with t > 0, strictly feasible x
    make one or more Newton steps to (approximately) minimize f_t:
        x⁺ = x − α ∇²f_t(x)^{-1} ∇f_t(x)
    step size α is fixed or from line search
    increase t and repeat until c^T x − p* ≤ ε
complexity: with proper initialization, step size, and update scheme for t,
    #Newton steps = O(√m log(1/ε))
the result follows from the convergence analysis of Newton's method for f_t Barrier methods 82

92 Outline barrier method for linear programming normal barriers barrier method for conic optimization

93 Normal barrier for proper cone
φ is a θ-normal barrier for the proper cone K if
    it is a barrier: smooth, convex, domain int K, blows up at the boundary of K
    logarithmically homogeneous with parameter θ:
        φ(tx) = φ(x) − θ log t,    x ∈ int K, t > 0
    self-concordant: the restriction g(α) = φ(x + αv) to any line satisfies
        g'''(α) ≤ 2 g''(α)^{3/2}
(Nesterov and Nemirovski, 1994) Barrier methods 83

94 Examples
nonnegative orthant: K = R^m_+
    φ(x) = −∑_{i=1}^m log x_i    (θ = m)
second-order cone: K = Q^p = {(x, y) ∈ R^{p−1} × R | ‖x‖_2 ≤ y}
    φ(x, y) = −log(y^2 − x^T x)    (θ = 2)
semidefinite cone: K = S^m = {x ∈ R^{m(m+1)/2} | mat(x) ⪰ 0}
    φ(x) = −log det mat(x)    (θ = m) Barrier methods 84

95
exponential cone: K_exp = cl{(x, y, z) ∈ R^3 | y e^{x/y} ≤ z, y > 0}
    φ(x, y, z) = −log(y log(z/y) − x) − log z − log y    (θ = 3)
power cone: K = {(x_1, x_2, y) ∈ R_+ × R_+ × R | |y| ≤ x_1^{α_1} x_2^{α_2}}
    φ(x, y) = −log(x_1^{2α_1} x_2^{2α_2} − y^2) − log x_1 − log x_2    (θ = 4) Barrier methods 85

96 Central path conic LP (with inequality with respect to proper cone K) minimize c T x subject to Ax b barrier for the feasible set where φ is a θ-normal barrier for K φ(b Ax) central path: set of minimizers x (t) (with t > 0) of f t (x) = tc T x+φ(b Ax) Barrier methods 86

97 Newton step
centering problem:
    minimize f_t(x) = t c^T x + φ(b − Ax)
Newton step at x:
    Δx = −∇²f_t(x)^{-1} ∇f_t(x)
Newton decrement:
    λ_t(x) = (Δx^T ∇²f_t(x) Δx)^{1/2} = (−∇f_t(x)^T Δx)^{1/2}
useful as a measure of proximity of x to x*(t) Barrier methods 87

98 Damped Newton method
minimize f_t(x) = t c^T x + φ(b − Ax)
algorithm (with parameters ε ∈ (0, 1/2), η ∈ (0, 1/4])
select a starting point x ∈ dom f_t; repeat:
    1. compute the Newton step Δx and Newton decrement λ_t(x)
    2. if λ_t(x)^2 ≤ ε, return x
    3. otherwise, set x := x + α Δx with
        α = 1/(1 + λ_t(x)) if λ_t(x) ≥ η,    α = 1 if λ_t(x) < η
the stopping criterion λ_t(x)^2 ≤ ε implies f_t(x) − inf f_t(x) ≤ ε
alternatively, can use a backtracking line search Barrier methods 88
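A numpy sketch of this damped Newton method applied to the LP centering problem f_t(x) = t c^T x − ∑_i log(b_i − a_i^T x); the data and parameter values are placeholders.

import numpy as np

def centering_step(A, b, c, t, x, eps=1e-8, eta=0.25, max_iter=50):
    # damped Newton method for f_t(x) = t*c^T x - sum_i log(b_i - a_i^T x)
    for _ in range(max_iter):
        s = b - A @ x
        grad = t * c + A.T @ (1.0 / s)
        hess = A.T @ ((1.0 / s ** 2)[:, None] * A)
        dx = -np.linalg.solve(hess, grad)                # Newton step
        lam = np.sqrt(-grad @ dx)                        # Newton decrement
        if lam ** 2 <= eps:
            break
        alpha = 1.0 if lam < eta else 1.0 / (1.0 + lam)  # damped step size
        x = x + alpha * dx
    return x

m, n = 50, 10
A = np.random.randn(m, n); b = np.ones(m); c = np.random.randn(n)
x_center = centering_step(A, b, c, t=1.0, x=np.zeros(n))  # approximates x*(t)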

99 Convergence results for damped Newton method
damped Newton phase: f_t decreases by at least a positive constant γ
    f_t(x⁺) ≤ f_t(x) − γ if λ_t(x) ≥ η,    where γ = η − log(1 + η)
quadratic convergence phase: λ_t rapidly decreases to zero
    2 λ_t(x⁺) ≤ (2 λ_t(x))^2 if λ_t(x) < η
    (implies λ_t(x⁺) ≤ 2η^2 < η)
conclusion: the number of Newton iterations is bounded by
    (f_t(x^(0)) − inf f_t(x)) / γ + log_2 log_2(1/ε) Barrier methods 89

100 Outline barrier method for linear programming normal barriers barrier method for conic optimization

101 Central path and duality
    x*(t) = argmin (t c^T x + φ(b − Ax))
duality point on the central path: x*(t) defines a strictly dual feasible z*(t)
    z*(t) = −(1/t) ∇φ(s),    s = b − Ax*(t)
duality gap: the gap between x = x*(t) and z = z*(t) is
    c^T x + b^T z = s^T z = θ/t,    so c^T x − p* ≤ θ/t
extension near the central path (for λ_t(x) < 1):
    c^T x − p* ≤ (1 + λ_t(x)/√θ) θ/t
(results follow from properties of normal barriers) Barrier methods 90

102 Short-step barrier method
algorithm (parameters ε ∈ (0, 1), β = 1/8)
    select initial x and t with λ_t(x) ≤ β
    repeat until 2θ/t ≤ ε:
        t := (1 + β/√θ) t,    x := x − ∇²f_t(x)^{-1} ∇f_t(x)
properties
    increases t slowly so x stays in the region of quadratic convergence (λ_t(x) ≤ β)
    iteration complexity: #iterations = O(√θ log(θ/(ε t_0)))
    best known worst-case complexity; same as for linear programming Barrier methods 91

103 Predictor-corrector methods short-step barrier methods stay in narrow neighborhood of central path (defined by limit on λ t ) make small, fixed increases t + = µt as a result, quite slow in practice predictor-corrector method select new t using a linear approximation to central path ( predictor ) re-center with new t ( corrector ) allows faster and adaptive increases in t; similar worst-case complexity Barrier methods 92

104 Primal-dual methods Convex optimization MLSS 2012 primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation

105 Primal-dual interior-point methods similarities with barrier method follow the same central path same linear algebra cost per iteration differences more robust and faster (typically less than 50 iterations) primal and dual iterates updated at each iteration symmetric treatment of primal and dual iterates can start at infeasible points include heuristics for adaptive choice of central path parameter t often have superlinear asymptotic convergence Primal-dual methods 93

106 Primal-dual central path for linear programming
primal:  minimize c^T x subject to Ax + s = b, s ⪰ 0
dual:    maximize −b^T z subject to A^T z + c = 0, z ⪰ 0
optimality conditions (s ∘ z is the component-wise vector product):
    Ax + s = b,    A^T z + c = 0,    (s, z) ⪰ 0,    s ∘ z = 0
primal-dual parametrization of the central path:
    Ax + s = b,    A^T z + c = 0,    (s, z) ⪰ 0,    s ∘ z = µ1
the solution is x = x*(t), z = z*(t) for t = 1/µ
µ = (s^T z)/m for x, z on the central path Primal-dual methods 94

107 Primal-dual search direction
current iterates x̂, ŝ > 0, ẑ > 0, updated as
    x̂ := x̂ + α Δx,    ŝ := ŝ + α Δs,    ẑ := ẑ + α Δz
the primal and dual steps Δx, Δs, Δz are defined by
    A(x̂ + Δx) + ŝ + Δs = b,    A^T(ẑ + Δz) + c = 0
    ẑ ∘ Δs + ŝ ∘ Δz = σ µ̂ 1 − ŝ ∘ ẑ
where µ̂ = (ŝ^T ẑ)/m and σ ∈ [0, 1]
the last equation is a linearization of (ŝ + Δs) ∘ (ẑ + Δz) = σ µ̂ 1
targets a point on the central path with µ = σ µ̂, i.e., with gap σ (ŝ^T ẑ)
different methods use different strategies for selecting σ
α ∈ (0, 1] is selected so that ŝ > 0, ẑ > 0 Primal-dual methods 95

108 Linear algebra complexity
at each iteration solve an equation
    [A, I, 0; 0, 0, A^T; 0, diag(ẑ), diag(ŝ)] [Δx; Δs; Δz] = [b − Ax̂ − ŝ; −c − A^T ẑ; σ µ̂ 1 − ŝ ∘ ẑ]
after eliminating Δs, Δz this reduces to an equation
    A^T D A Δx = r,    with D = diag(ẑ_1/ŝ_1, ..., ẑ_m/ŝ_m)
a similar equation as in the simple barrier method (with different D, r) Primal-dual methods 96

109 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation

110 Symmetric cones
symmetric primal-dual solvers for cone LPs are limited to symmetric cones
    second-order cone
    positive semidefinite cone
    direct products of these primitive symmetric cones (such as R^p_+)
definition: cone of squares x = y^2 = y ∘ y for a product ∘ that satisfies
1. bilinearity (x ∘ y is linear in x for fixed y and vice-versa)
2. x ∘ y = y ∘ x
3. x^2 ∘ (y ∘ x) = (x^2 ∘ y) ∘ x
4. x^T (y ∘ z) = (x ∘ y)^T z
not necessarily associative Primal-dual methods 97

111 Vector product and identity element
nonnegative orthant: component-wise product x ∘ y = diag(x) y
    identity element is e = 1 = (1, 1, ..., 1)
positive semidefinite cone: symmetrized matrix product
    x ∘ y = (1/2) vec(XY + YX) with X = mat(x), Y = mat(y)
    identity element is e = vec(I)
second-order cone: the product of x = (x_0, x_1) and y = (y_0, y_1) is
    x ∘ y = (1/√2) (x^T y, x_0 y_1 + y_0 x_1)
    identity element is e = (√2, 0, ..., 0) Primal-dual methods 98

112 Classification symmetric cones are studied in the theory of Euclidean Jordan algebras all possible symmetric cones have been characterized list of symmetric cones the second-order cone the positive semidefinite cone of Hermitian matrices with real, complex, or quaternion entries 3 3 positive semidefinite matrices with octonion entries Cartesian products of these primitive symmetric cones (such as R p +) practical implication can focus on Q p, S p and study these cones using elementary linear algebra Primal-dual methods 99

113 Spectral decomposition
with each symmetric cone/product we associate a spectral decomposition
    x = ∑_{i=1}^θ λ_i q_i,    with ∑_{i=1}^θ q_i = e and q_i ∘ q_j = q_i if i = j, 0 if i ≠ j
semidefinite cone (K = S^p): eigenvalue decomposition of mat(x)
    θ = p,    mat(x) = ∑_{i=1}^p λ_i v_i v_i^T,    q_i = vec(v_i v_i^T)
second-order cone (K = Q^p)
    θ = 2,    λ_i = (x_0 ± ‖x_1‖_2)/√2,    q_i = (1/√2) (1, ±x_1/‖x_1‖_2),    i = 1, 2 Primal-dual methods 100

114 Applications
nonnegativity
    x ⪰ 0  ⟺  λ_1, ..., λ_θ ≥ 0,    x ≻ 0  ⟺  λ_1, ..., λ_θ > 0
powers (in particular, inverse and square root)
    x^α = ∑_i λ_i^α q_i
log-det barrier
    φ(x) = −log det x = −∑_{i=1}^θ log λ_i
a θ-normal barrier, with gradient ∇φ(x) = −x^{-1} Primal-dual methods 101

115 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation

116 Symmetric parametrization of central path
centering problem:
    minimize t c^T x + φ(b − Ax)
optimality conditions (using ∇φ(s) = −s^{-1}):
    Ax + s = b,    A^T z + c = 0,    (s, z) ≻ 0,    z = (1/t) s^{-1}
equivalent symmetric form (with µ = 1/t):
    Ax + s = b,    A^T z + c = 0,    (s, z) ≻ 0,    s ∘ z = µ e Primal-dual methods 102

117 Scaling with Hessian
the linear transformation with H = ∇²φ(u) has several important properties
    preserves conic inequalities: s ≻ 0  ⟺  Hs ≻ 0
    if s is invertible, then Hs is invertible and (Hs)^{-1} = H^{-1} s^{-1}
    preserves the central path: s ∘ z = µe  ⟺  (Hs) ∘ (H^{-1} z) = µe
example (K = S^p): the transformation w = ∇²φ(u) s is a congruence
    W = U^{-1} S U^{-1},    W = mat(w), S = mat(s), U = mat(u) Primal-dual methods 103

118 Primal-dual search direction
the steps Δx, Δs, Δz at the current iterates x̂, ŝ, ẑ are defined by
    A(x̂ + Δx) + ŝ + Δs = b,    A^T(ẑ + Δz) + c = 0
    (Hŝ) ∘ (H^{-1} Δz) + (H^{-1} ẑ) ∘ (H Δs) = σ µ̂ e − (Hŝ) ∘ (H^{-1} ẑ)
where µ̂ = (ŝ^T ẑ)/θ, σ ∈ [0, 1], and H = ∇²φ(u)
the last equation is a linearization of (H(ŝ + Δs)) ∘ (H^{-1}(ẑ + Δz)) = σ µ̂ e
different algorithms use different choices of σ, H
Nesterov-Todd scaling: choose H = ∇²φ(u) such that Hŝ = H^{-1} ẑ Primal-dual methods 104

119 Outline primal-dual algorithms for linear programming symmetric cones primal-dual algorithms for conic optimization implementation

120 Software implementations general-purpose software for nonlinear convex optimization several high-quality packages (MOSEK, Sedumi, SDPT3, SDPA,... ) exploit sparsity to achieve scalability customized implementations can exploit non-sparse types of problem structure often orders of magnitude faster than general-purpose solvers Primal-dual methods 105

121 Example: ℓ1-regularized least-squares
A is m × n (with m ≪ n) and dense
    minimize ‖Ax − b‖_2^2 + ‖x‖_1
quadratic program formulation:
    minimize ‖Ax − b‖_2^2 + 1^T u
    subject to −u ⪯ x ⪯ u
the coefficient of the Newton system in an interior-point method is
    [A^T A, 0; 0, 0] + [D_1 + D_2, D_2 − D_1; D_2 − D_1, D_1 + D_2]    (D_1, D_2 positive diagonal)
expensive for large n: cost is O(n^3) Primal-dual methods 106

122 customized implementation
can reduce the Newton equation to the solution of a system
    (A D_1 A^T + I) Δu = r
cost per iteration is O(m^2 n)
comparison (seconds on a 2.83 GHz Core 2 Quad machine): table of timings for the custom and general-purpose solvers over a range of sizes m, n
custom solver is CVXOPT; general-purpose solver is MOSEK Primal-dual methods 107

123 Overview 1. Basic theory and convex modeling convex sets and functions common problem classes and applications 2. Interior-point methods for conic optimization conic optimization barrier methods symmetric primal-dual methods 3. First-order methods (proximal) gradient algorithms dual techniques and multiplier methods

124 Gradient methods Convex optimization MLSS 2012 gradient and subgradient method proximal gradient method fast proximal gradient methods 108

125 Classical gradient method
to minimize a convex differentiable function f: choose x^(0) and repeat
    x^(k) = x^(k−1) − t_k ∇f(x^(k−1)),    k = 1, 2, ...
step size t_k is constant or from line search
advantages
    every iteration is inexpensive
    does not require second derivatives
disadvantages
    often very slow; very sensitive to scaling
    does not handle nondifferentiable functions Gradient methods 109

126 Quadratic example
f(x) = (1/2)(x_1^2 + γ x_2^2)    (γ > 1)
with exact line search and starting point x^(0) = (γ, 1)
    ‖x^(k) − x*‖_2 / ‖x^(0) − x*‖_2 = ((γ − 1)/(γ + 1))^k
(figure: the zig-zagging iterates on the level curves of f) Gradient methods 110

127 Nondifferentiable example
f(x) = sqrt(x_1^2 + γ x_2^2) for |x_2| ≤ x_1,    f(x) = (x_1 + γ |x_2|)/sqrt(1 + γ) for |x_2| > x_1
with exact line search and x^(0) = (γ, 1), the iterates converge to a non-optimal point
(figure: the zig-zagging iterates in the (x_1, x_2)-plane) Gradient methods 111

128 First-order methods address one or both disadvantages of the gradient method methods for nondifferentiable or constrained problems smoothing methods subgradient method proximal gradient method methods with improved convergence variable metric methods conjugate gradient method accelerated proximal gradient method we will discuss subgradient and proximal gradient methods Gradient methods 112

129 Subgradient
g is a subgradient of a convex function f at x if
    f(y) ≥ f(x) + g^T (y − x)    for all y ∈ dom f
(figure: affine lower bounds f(x_1) + g_1^T (x − x_1), f(x_2) + g_2^T (x − x_2), f(x_2) + g_3^T (x − x_2) supporting the graph)
generalizes the basic inequality for convex differentiable f:
    f(y) ≥ f(x) + ∇f(x)^T (y − x)    for all y ∈ dom f Gradient methods 113

130 Subdifferential
the set of all subgradients of f at x is called the subdifferential ∂f(x)
absolute value f(x) = |x|:
    ∂f(x) = {sign(x)} for x ≠ 0,    ∂f(0) = [−1, 1]
Euclidean norm f(x) = ‖x‖_2:
    ∂f(x) = {x/‖x‖_2} if x ≠ 0,    ∂f(x) = {g | ‖g‖_2 ≤ 1} if x = 0 Gradient methods 114

131 Subgradient calculus
weak calculus: rules for finding one subgradient
    sufficient for most algorithms for nondifferentiable convex optimization
    if one can evaluate f(x), one can usually compute a subgradient
    much easier than finding the entire subdifferential
subdifferentiability
    convex f is subdifferentiable on dom f, except possibly at the boundary
    example of a non-subdifferentiable function: f(x) = −√x at x = 0 Gradient methods 115

132 Examples of calculus rules
nonnegative combination: f = α_1 f_1 + α_2 f_2 with α_1, α_2 ≥ 0
    g = α_1 g_1 + α_2 g_2,    g_1 ∈ ∂f_1(x), g_2 ∈ ∂f_2(x)
composition with affine transformation: f(x) = h(Ax + b)
    g = A^T g̃,    g̃ ∈ ∂h(Ax + b)
pointwise maximum f(x) = max{f_1(x), ..., f_m(x)}
    g ∈ ∂f_i(x) where f_i(x) = max_k f_k(x)
conjugate f*(x) = sup_y (x^T y − f(y)): take any maximizing y Gradient methods 116

133 Subgradient method
to minimize a nondifferentiable convex function f: choose x^(0) and repeat
    x^(k) = x^(k−1) − t_k g^(k−1),    k = 1, 2, ...
g^(k−1) is any subgradient of f at x^(k−1)
step size rules
    fixed step size: t_k constant
    fixed step length: t_k ‖g^(k−1)‖_2 constant (i.e., ‖x^(k) − x^(k−1)‖_2 constant)
    diminishing: t_k → 0, ∑_{k=1}^∞ t_k = ∞ Gradient methods 117
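A numpy sketch of the subgradient method with a diminishing step size, applied to the 1-norm minimization problem discussed two slides below; the data and the step-size constant are placeholder choices.

import numpy as np

A = np.random.randn(500, 100)
b = np.random.randn(500)

x = np.zeros(100)
f_best = np.inf
for k in range(1, 3001):
    g = A.T @ np.sign(A @ x - b)        # a subgradient of ||Ax - b||_1 at x
    t_k = 0.01 / np.sqrt(k)             # diminishing step size
    x = x - t_k * g
    f_best = min(f_best, np.linalg.norm(A @ x - b, 1))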

134 Some convergence results
assumption: f is convex and Lipschitz continuous with constant G > 0:
    |f(x) − f(y)| ≤ G ‖x − y‖_2    for all x, y
results
    fixed step size t_k = t: converges to approximately G^2 t / 2-suboptimal
    fixed length t_k ‖g^(k−1)‖_2 = s: converges to approximately G s / 2-suboptimal
    diminishing (∑_k t_k = ∞, t_k → 0): converges
rate of convergence is 1/√k with proper choice of the step size sequence Gradient methods 118

135 Example: 1-norm minimization
    minimize ‖Ax − b‖_1    (b ∈ R^500)
a subgradient is given by A^T sign(Ax − b)
(figures: (f^(k)_best − f*)/f* versus k for fixed step lengths s = 0.1, 0.01 and for diminishing step sizes t_k = 0.01/√k, t_k = 0.01/k) Gradient methods 119

136 Outline gradient and subgradient method proximal gradient method fast proximal gradient methods

137 Proximal operator
the proximal operator (prox-operator) of a convex function h is
    prox_h(x) = argmin_u ( h(u) + (1/2) ‖u − x‖_2^2 )
h(x) = 0:  prox_h(x) = x
h(x) = I_C(x) (indicator function of C):  prox_h is projection on C
    prox_h(x) = argmin_{u ∈ C} ‖u − x‖_2^2 = P_C(x)
h(x) = ‖x‖_1:  prox_h is the soft-threshold (shrinkage) operation
    prox_h(x)_i = x_i − 1 if x_i ≥ 1,  0 if |x_i| ≤ 1,  x_i + 1 if x_i ≤ −1 Gradient methods 120

138 Proximal gradient method
unconstrained problem with cost function split in two components
    minimize f(x) = g(x) + h(x)
g convex, differentiable, with dom g = R^n
h convex, possibly nondifferentiable, with inexpensive prox-operator
proximal gradient algorithm:
    x^(k) = prox_{t_k h} ( x^(k−1) − t_k ∇g(x^(k−1)) )
t_k > 0 is the step size, constant or determined by line search Gradient methods 121
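A numpy sketch of the proximal gradient iteration for g(x) = (1/2)‖Ax − b‖_2^2 and h(x) = λ‖x‖_1, i.e. the iterative soft-thresholding setting of the slide after next; A, b and λ are placeholders.

import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

A = np.random.randn(100, 30)                 # placeholder data
b = np.random.randn(100)
lam = 0.1
t = 1.0 / np.linalg.norm(A, 2) ** 2          # fixed step size 1/L, L = ||A||_2^2

x = np.zeros(30)
for _ in range(500):
    grad = A.T @ (A @ x - b)                     # gradient of g
    x = soft_threshold(x - t * grad, t * lam)    # prox of (t*lam)*||.||_1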

139 Examples
minimize g(x) + h(x)
gradient method: h(x) = 0, i.e., minimize g(x)
    x⁺ = x − t ∇g(x)
gradient projection method: h(x) = I_C(x), i.e., minimize g(x) over C
    x⁺ = P_C(x − t ∇g(x))
(figure: a gradient step followed by projection onto C) Gradient methods 122

140 iterative soft-thresholding: h(x) = ‖x‖_1
    x⁺ = prox_{th}(x − t ∇g(x))
where
    prox_{th}(u)_i = u_i − t if u_i ≥ t,  0 if −t ≤ u_i ≤ t,  u_i + t if u_i ≤ −t
(figure: the soft-thresholding function prox_{th}(u)_i versus u_i) Gradient methods 123

141 Properties of proximal operator
    prox_h(x) = argmin_u ( h(u) + (1/2) ‖u − x‖_2^2 )
assume h is closed and convex (i.e., convex with closed epigraph)
    prox_h(x) is uniquely defined for all x
    prox_h is nonexpansive:
        ‖prox_h(x) − prox_h(y)‖_2 ≤ ‖x − y‖_2
    Moreau decomposition:
        x = prox_h(x) + prox_{h*}(x) Gradient methods 124

142 Moreau-Yosida regularization
    h_(t)(x) = inf_u ( h(u) + (1/(2t)) ‖u − x‖_2^2 )    (with t > 0)
h_(t) is convex (infimum over u of a convex function of (x, u))
domain of h_(t) is R^n (the minimizing u = prox_{th}(x) is defined for all x)
h_(t) is differentiable with gradient
    ∇h_(t)(x) = (1/t) (x − prox_{th}(x))
the gradient is Lipschitz continuous with constant 1/t
can interpret prox_{th}(x) as a gradient step x − t ∇h_(t)(x) Gradient methods 125

143 Examples
indicator function (of a closed convex set C): squared Euclidean distance
    h(x) = I_C(x),    h_(t)(x) = (1/(2t)) dist(x)^2
1-norm: Huber penalty
    h(x) = ‖x‖_1,    h_(t)(x) = ∑_{k=1}^n φ_t(x_k)
    φ_t(z) = z^2/(2t) for |z| ≤ t,  |z| − t/2 for |z| ≥ t
(figure: the Huber penalty φ_t(z)) Gradient methods 126

144 Examples of inexpensive prox-operators projection on simple sets hyperplanes and halfspaces rectangles {x l x u} probability simplex {x 1 T x = 1,x 0} norm ball for many norms (Euclidean, 1-norm,... ) nonnegative orthant, second-order cone, positive semidefinite cone Gradient methods 127

145
Euclidean norm: h(x) = ‖x‖_2
    prox_{th}(x) = (1 − t/‖x‖_2) x if ‖x‖_2 ≥ t,    prox_{th}(x) = 0 otherwise
logarithmic barrier: h(x) = −∑_{i=1}^n log x_i
    prox_{th}(x)_i = (x_i + sqrt(x_i^2 + 4t))/2,    i = 1,...,n
Euclidean distance: d(x) = inf_{y ∈ C} ‖x − y‖_2 (C closed convex)
    prox_{td}(x) = θ P_C(x) + (1 − θ) x,    θ = t / max{d(x), t}
generalizes the soft-thresholding operator Gradient methods 128

146 Prox-operator of conjugate
    prox_{th*}(x) = x − t prox_{h/t}(x/t)
follows from the Moreau decomposition
of interest when the prox-operator of h is inexpensive
example: norms
    h(x) = ‖x‖,    h*(y) = I_C(y) where C is the unit ball for the dual norm
    prox_{th*} is projection on C
    formula useful for the prox-operator of ‖·‖ if projection on C is inexpensive Gradient methods 129

147 Support function
many convex functions can be expressed as support functions, with C closed and convex:
    h(x) = S_C(x) = sup_{y ∈ C} x^T y
conjugate is the indicator function of C:  h*(y) = I_C(y)
hence, can compute prox_{th} via projection on C
example: h(x) is the sum of the largest r components of x
    h(x) = x_[1] + ··· + x_[r] = S_C(x),    C = {y | 0 ⪯ y ⪯ 1, 1^T y = r} Gradient methods 130

148 Convergence of proximal gradient method
    minimize f(x) = g(x) + h(x)
assumptions
    ∇g is Lipschitz continuous with constant L > 0:
        ‖∇g(x) − ∇g(y)‖_2 ≤ L ‖x − y‖_2    for all x, y
    optimal value f* is finite and attained at x* (not necessarily unique)
result: with fixed step size t_k = 1/L
    f(x^(k)) − f* ≤ (L/(2k)) ‖x^(0) − x*‖_2^2
compare with the 1/√k rate of the subgradient method
can be extended to include line searches Gradient methods 131

149 Outline gradient and subgradient method proximal gradient method fast proximal gradient methods

150 Fast (proximal) gradient methods Nesterov (1983, 1988, 2005): three gradient projection methods with 1/k 2 convergence rate Beck & Teboulle (2008): FISTA, a proximal gradient version of Nesterov s 1983 method Nesterov (2004 book), Tseng (2008): overview and unified analysis of fast gradient methods several recent variations and extensions this lecture: FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) Gradient methods 132

151 FISTA
unconstrained problem with composite objective
    minimize f(x) = g(x) + h(x)
g convex differentiable with dom g = R^n
h convex with inexpensive prox-operator
algorithm: choose any x^(0) = x^(−1); for k ≥ 1, repeat the steps
    y = x^(k−1) + ((k − 2)/(k + 1)) (x^(k−1) − x^(k−2))
    x^(k) = prox_{t_k h}(y − t_k ∇g(y)) Gradient methods 133
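A numpy sketch of FISTA as stated above, for the same composite objective g(x) = (1/2)‖Ax − b‖_2^2, h(x) = λ‖x‖_1 used in the earlier proximal gradient sketch (placeholder data).

import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

A = np.random.randn(100, 30)
b = np.random.randn(100)
lam = 0.1
t = 1.0 / np.linalg.norm(A, 2) ** 2          # fixed step size 1/L

x_prev = x = np.zeros(30)
for k in range(1, 501):
    y = x + (k - 2.0) / (k + 1.0) * (x - x_prev)      # extrapolation step
    x_prev = x
    x = soft_threshold(y - t * (A.T @ (A @ y - b)), t * lam)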

152 Interpretation
first two iterations (k = 1, 2) are proximal gradient steps at x^(k−1)
next iterations are proximal gradient steps at extrapolated points y:
    x^(k) = prox_{t_k h}(y − t_k ∇g(y))
(figure: y lies on the line through x^(k−2) and x^(k−1), extrapolated beyond x^(k−1))
the sequence x^(k) remains feasible (in dom h); y may be outside dom h Gradient methods 134

153 Convergence of FISTA
    minimize f(x) = g(x) + h(x)
assumptions
    dom g = R^n and ∇g is Lipschitz continuous with constant L > 0
    h is closed (implies prox_{th}(u) exists and is unique for all u)
    optimal value f* is finite and attained at x* (not necessarily unique)
result: with fixed step size t_k = 1/L
    f(x^(k)) − f* ≤ (2L/(k + 1)^2) ‖x^(0) − x*‖_2^2
compare with the 1/k convergence rate for the gradient method
can be extended to include line searches Gradient methods 135

154 Example
    minimize log ∑_{i=1}^m exp(a_i^T x + b_i)
randomly generated data with m = 2000, n = 1000, same fixed step size
(figures: (f(x^(k)) − f*)/f* versus k for the gradient method and FISTA)
FISTA is not a descent method Gradient methods 136

155 Dual methods Convex optimization MLSS 2012 Lagrange duality dual decomposition dual proximal gradient method multiplier methods

156 Dual function
convex problem (with linear constraints for simplicity):
    minimize f(x)
    subject to Gx ⪯ h, Ax = b
Lagrangian:
    L(x, λ, ν) = f(x) + λ^T (Gx − h) + ν^T (Ax − b)
dual function:
    g(λ, ν) = inf_x L(x, λ, ν) = −f*(−G^T λ − A^T ν) − h^T λ − b^T ν
where f*(y) = sup_x (y^T x − f(x)) is the conjugate of f Dual methods 137

157 Dual problem maximize g(λ, ν) subject to λ 0 a convex optimization problem in λ, ν duality theorem (p is primal optimal value, d is dual optimal value) weak duality: p d (without exception) strong duality: p = d if a constraint qualification holds (for example, primal problem is feasible and dom f open) Dual methods 138

158 Norm approximation
minimize ‖Ax − b‖
reformulated problem:
    minimize ‖y‖
    subject to y = Ax − b
dual function:
    g(ν) = inf_{x,y} ( ‖y‖ + ν^T y − ν^T Ax + b^T ν )
         = b^T ν if A^T ν = 0 and ‖ν‖_* ≤ 1,  −∞ otherwise
dual problem:
    maximize b^T z
    subject to A^T z = 0, ‖z‖_* ≤ 1 Dual methods 139

159 Karush-Kuhn-Tucker optimality conditions
if strong duality holds, then x, λ, ν are optimal if and only if
1. x is primal feasible: x ∈ dom f, Gx ⪯ h, Ax = b
2. λ ⪰ 0
3. complementary slackness holds: λ^T (h − Gx) = 0
4. x minimizes L(x, λ, ν) = f(x) + λ^T (Gx − h) + ν^T (Ax − b)
for differentiable f, condition 4 can be expressed as
    ∇f(x) + G^T λ + A^T ν = 0 Dual methods 140

160 Outline Lagrange dual dual decomposition dual proximal gradient method multiplier methods

161 Dual methods
primal problem:
    minimize f(x)
    subject to Gx ⪯ h, Ax = b
dual problem:
    maximize −h^T λ − b^T ν − f*(−G^T λ − A^T ν)
    subject to λ ⪰ 0
possible advantages of solving the dual when using first-order methods
    dual problem is unconstrained or has simple constraints
    dual is differentiable
    dual (almost) decomposes into smaller problems Dual methods 141

162 (Sub-)gradients of conjugate function
    f*(y) = sup_x ( y^T x − f(x) )
subgradient: x is a subgradient of f* at y if it maximizes y^T x − f(x)
    if the maximizing x is unique, then f* is differentiable at y
    this is the case, for example, if f is strictly convex
strongly convex function: f is strongly convex with modulus µ > 0 if
    f(x) − (µ/2) x^T x is convex
this implies that ∇f* is Lipschitz continuous with parameter 1/µ Dual methods 142

163 Dual gradient method
primal problem with equality constraints, and its dual:
    minimize f(x)
    subject to Ax = b
dual ascent: use the (sub-)gradient method to minimize
    −g(ν) = b^T ν + f*(−A^T ν) = sup_x ( (b − Ax)^T ν − f(x) )
algorithm:
    x = argmin_x̂ ( f(x̂) + ν^T (Ax̂ − b) )
    ν⁺ = ν + t (Ax − b)
of interest if the calculation of x is inexpensive (for example, separable) Dual methods 143
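A numpy sketch of this dual ascent scheme for the toy problem minimize (1/2)‖x‖_2^2 subject to Ax = b, where the inner minimization over x has the closed form x = −A^T ν; the data and the step size are placeholder choices.

import numpy as np

A = np.random.randn(10, 30)
b = np.random.randn(10)

nu = np.zeros(10)
t = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, with L the Lipschitz constant of the dual gradient
for _ in range(2000):
    x = -A.T @ nu                     # x = argmin_x L(x, nu) for f(x) = (1/2)||x||_2^2
    nu = nu + t * (A @ x - b)         # gradient step on the dual variable

print(np.linalg.norm(A @ x - b))      # primal residual, shrinks toward zero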


More information

Interior Point Methods and Linear Programming

Interior Point Methods and Linear Programming Interior Point Methods and Linear Programming Robert Robere University of Toronto December 13, 2012 Abstract The linear programming problem is usually solved through the use of one of two algorithms: either

More information

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA)

SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI. (B.Sc.(Hons.), BUAA) SMOOTHING APPROXIMATIONS FOR TWO CLASSES OF CONVEX EIGENVALUE OPTIMIZATION PROBLEMS YU QI (B.Sc.(Hons.), BUAA) A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE DEPARTMENT OF MATHEMATICS NATIONAL

More information

8. Linear least-squares

8. Linear least-squares 8. Linear least-squares EE13 (Fall 211-12) definition examples and applications solution of a least-squares problem, normal equations 8-1 Definition overdetermined linear equations if b range(a), cannot

More information

Advanced Lecture on Mathematical Science and Information Science I. Optimization in Finance

Advanced Lecture on Mathematical Science and Information Science I. Optimization in Finance Advanced Lecture on Mathematical Science and Information Science I Optimization in Finance Reha H. Tütüncü Visiting Associate Professor Dept. of Mathematical and Computing Sciences Tokyo Institute of Technology

More information

Interior-Point Algorithms for Quadratic Programming

Interior-Point Algorithms for Quadratic Programming Interior-Point Algorithms for Quadratic Programming Thomas Reslow Krüth Kongens Lyngby 2008 IMM-M.Sc-2008-19 Technical University of Denmark Informatics and Mathematical Modelling Building 321, DK-2800

More information

Date: April 12, 2001. Contents

Date: April 12, 2001. Contents 2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........

More information

Error Bound for Classes of Polynomial Systems and its Applications: A Variational Analysis Approach

Error Bound for Classes of Polynomial Systems and its Applications: A Variational Analysis Approach Outline Error Bound for Classes of Polynomial Systems and its Applications: A Variational Analysis Approach The University of New South Wales SPOM 2013 Joint work with V. Jeyakumar, B.S. Mordukhovich and

More information

4.6 Linear Programming duality

4.6 Linear Programming duality 4.6 Linear Programming duality To any minimization (maximization) LP we can associate a closely related maximization (minimization) LP. Different spaces and objective functions but in general same optimal

More information

Mathematics (MAT) MAT 061 Basic Euclidean Geometry 3 Hours. MAT 051 Pre-Algebra 4 Hours

Mathematics (MAT) MAT 061 Basic Euclidean Geometry 3 Hours. MAT 051 Pre-Algebra 4 Hours MAT 051 Pre-Algebra Mathematics (MAT) MAT 051 is designed as a review of the basic operations of arithmetic and an introduction to algebra. The student must earn a grade of C or in order to enroll in MAT

More information

Solutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR

Solutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR Solutions Of Some Non-Linear Programming Problems A PROJECT REPORT submitted by BIJAN KUMAR PATEL for the partial fulfilment for the award of the degree of Master of Science in Mathematics under the supervision

More information

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

More information

Solving polynomial least squares problems via semidefinite programming relaxations

Solving polynomial least squares problems via semidefinite programming relaxations Solving polynomial least squares problems via semidefinite programming relaxations Sunyoung Kim and Masakazu Kojima August 2007, revised in November, 2007 Abstract. A polynomial optimization problem whose

More information

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d). 1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

CHAPTER 9. Integer Programming

CHAPTER 9. Integer Programming CHAPTER 9 Integer Programming An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values: (9.1) max c T x s t Ax b and x integral

More information

Two-Stage Stochastic Linear Programs

Two-Stage Stochastic Linear Programs Two-Stage Stochastic Linear Programs Operations Research Anthony Papavasiliou 1 / 27 Two-Stage Stochastic Linear Programs 1 Short Reviews Probability Spaces and Random Variables Convex Analysis 2 Deterministic

More information

On Minimal Valid Inequalities for Mixed Integer Conic Programs

On Minimal Valid Inequalities for Mixed Integer Conic Programs On Minimal Valid Inequalities for Mixed Integer Conic Programs Fatma Kılınç Karzan June 27, 2013 Abstract We study mixed integer conic sets involving a general regular (closed, convex, full dimensional,

More information

What is Linear Programming?

What is Linear Programming? Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to

More information

Optimization Methods in Finance

Optimization Methods in Finance Optimization Methods in Finance Gerard Cornuejols Reha Tütüncü Carnegie Mellon University, Pittsburgh, PA 15213 USA January 2006 2 Foreword Optimization models play an increasingly important role in financial

More information

Lecture 2: August 29. Linear Programming (part I)

Lecture 2: August 29. Linear Programming (part I) 10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Support Vector Machine (SVM)

Support Vector Machine (SVM) Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Lecture 2: The SVM classifier

Lecture 2: The SVM classifier Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of

More information

3. Linear Programming and Polyhedral Combinatorics

3. Linear Programming and Polyhedral Combinatorics Massachusetts Institute of Technology Handout 6 18.433: Combinatorial Optimization February 20th, 2009 Michel X. Goemans 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the

More information

Additional Exercises for Convex Optimization

Additional Exercises for Convex Optimization Additional Exercises for Convex Optimization Stephen Boyd Lieven Vandenberghe February 11, 2016 This is a collection of additional exercises, meant to supplement those found in the book Convex Optimization,

More information

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS

THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS THE FUNDAMENTAL THEOREM OF ALGEBRA VIA PROPER MAPS KEITH CONRAD 1. Introduction The Fundamental Theorem of Algebra says every nonconstant polynomial with complex coefficients can be factored into linear

More information

INTERIOR POINT POLYNOMIAL TIME METHODS IN CONVEX PROGRAMMING

INTERIOR POINT POLYNOMIAL TIME METHODS IN CONVEX PROGRAMMING GEORGIA INSTITUTE OF TECHNOLOGY SCHOOL OF INDUSTRIAL AND SYSTEMS ENGINEERING LECTURE NOTES INTERIOR POINT POLYNOMIAL TIME METHODS IN CONVEX PROGRAMMING ISYE 8813 Arkadi Nemirovski On sabbatical leave from

More information

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year.

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year. This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra

More information

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.

PUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5. PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include

More information

LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method

LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method Introduction to dual linear program Given a constraint matrix A, right

More information

First Order Algorithms in Variational Image Processing

First Order Algorithms in Variational Image Processing First Order Algorithms in Variational Image Processing M. Burger, A. Sawatzky, and G. Steidl today Introduction Variational methods in imaging are nowadays developing towards a quite universal and flexible

More information

1 if 1 x 0 1 if 0 x 1

1 if 1 x 0 1 if 0 x 1 Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or

More information

1 Solving LPs: The Simplex Algorithm of George Dantzig

1 Solving LPs: The Simplex Algorithm of George Dantzig Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

Real-Time Embedded Convex Optimization

Real-Time Embedded Convex Optimization Real-Time Embedded Convex Optimization Stephen Boyd joint work with Michael Grant, Jacob Mattingley, Yang Wang Electrical Engineering Department, Stanford University Spencer Schantz Lecture, Lehigh University,

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

GI01/M055 Supervised Learning Proximal Methods

GI01/M055 Supervised Learning Proximal Methods GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS

AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS AN INTRODUCTION TO NUMERICAL METHODS AND ANALYSIS Revised Edition James Epperson Mathematical Reviews BICENTENNIAL 0, 1 8 0 7 z ewiley wu 2007 r71 BICENTENNIAL WILEY-INTERSCIENCE A John Wiley & Sons, Inc.,

More information

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a

More information

Section 1.1. Introduction to R n

Section 1.1. Introduction to R n The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Inner Product Spaces and Orthogonality

Inner Product Spaces and Orthogonality Inner Product Spaces and Orthogonality week 3-4 Fall 2006 Dot product of R n The inner product or dot product of R n is a function, defined by u, v a b + a 2 b 2 + + a n b n for u a, a 2,, a n T, v b,

More information

Massive Data Classification via Unconstrained Support Vector Machines

Massive Data Classification via Unconstrained Support Vector Machines Massive Data Classification via Unconstrained Support Vector Machines Olvi L. Mangasarian and Michael E. Thompson Computer Sciences Department University of Wisconsin 1210 West Dayton Street Madison, WI

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: FILTER METHODS AND MERIT FUNCTIONS

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: FILTER METHODS AND MERIT FUNCTIONS INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: FILTER METHODS AND MERIT FUNCTIONS HANDE Y. BENSON, DAVID F. SHANNO, AND ROBERT J. VANDERBEI Operations Research and Financial Engineering Princeton

More information

14. Nonlinear least-squares

14. Nonlinear least-squares 14 Nonlinear least-squares EE103 (Fall 2011-12) definition Newton s method Gauss-Newton method 14-1 Nonlinear least-squares minimize r i (x) 2 = r(x) 2 r i is a nonlinear function of the n-vector of variables

More information

Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Linear Programming Notes V Problem Transformations

Linear Programming Notes V Problem Transformations Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material

More information

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.

Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued. Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.

More information

A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

More information

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm. Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

More information

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.

Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}. Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian

More information

Support Vector Machines Explained

Support Vector Machines Explained March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

More information

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the

More information

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α

More information

Discrete Optimization

Discrete Optimization Discrete Optimization [Chen, Batson, Dang: Applied integer Programming] Chapter 3 and 4.1-4.3 by Johan Högdahl and Victoria Svedberg Seminar 2, 2015-03-31 Todays presentation Chapter 3 Transforms using

More information

Lecture 6: Logistic Regression

Lecture 6: Logistic Regression Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

More information

Linear Algebra I. Ronald van Luijk, 2012

Linear Algebra I. Ronald van Luijk, 2012 Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.

More information

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8

Bindel, Spring 2012 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Feb 8 Spaces and bases Week 3: Wednesday, Feb 8 I have two favorite vector spaces 1 : R n and the space P d of polynomials of degree at most d. For R n, we have a canonical basis: R n = span{e 1, e 2,..., e

More information

Notes on Symmetric Matrices

Notes on Symmetric Matrices CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

More information

Linear Programming I

Linear Programming I Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins

More information

Practical Guide to the Simplex Method of Linear Programming

Practical Guide to the Simplex Method of Linear Programming Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)

More information

Online Convex Optimization

Online Convex Optimization E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting

More information

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.

Linear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that

More information