Introduction to Convex Optimization for Machine Learning


1 Introduction to Convex Optimization for Machine Learning. John Duchi, University of California, Berkeley. Practical Machine Learning, Fall 2009.

2 Outline: What is Optimization; Convex Sets; Convex Functions; Convex Optimization Problems; Lagrange Duality; Optimization Algorithms; Take Home Messages.

3 What is Optimization (and why do we care?)

4-5 What is Optimization? Finding the minimizer of a function subject to constraints: $\min_x f_0(x)$ subject to $f_i(x) \le 0$, $i = 1,\dots,k$, and $h_j(x) = 0$, $j = 1,\dots,l$. Example: stock market. Minimize the variance of the return subject to getting at least \$50.

6 What is Optimization: why do we care? Optimization is at the heart of many (most practical?) machine learning algorithms. Linear regression: $\min_w \|Xw - y\|^2$. Classification (logistic regression or SVM): $\min_w \sum_{i=1}^n \log\left(1 + \exp(-y_i x_i^T w)\right)$, or $\min_{w,\xi} \|w\|^2 + C\sum_{i=1}^n \xi_i$ s.t. $\xi_i \ge 1 - y_i x_i^T w$, $\xi_i \ge 0$.
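As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the logistic-regression objective above and its gradient; the function name and random data are hypothetical.

```python
import numpy as np

def logistic_loss_and_grad(w, X, y):
    """Loss sum_i log(1 + exp(-y_i x_i^T w)) and its gradient, labels y in {-1, +1}."""
    margins = y * (X @ w)                        # y_i x_i^T w
    loss = np.sum(np.logaddexp(0.0, -margins))   # numerically stable log(1 + e^{-m})
    grad = -X.T @ (y / (1.0 + np.exp(margins)))  # -sum_i sigma(-m_i) y_i x_i
    return loss, grad

# Tiny usage example with random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(rng.normal(size=100))
print(logistic_loss_and_grad(np.zeros(5), X, y)[0])  # equals 100 * log(2) at w = 0
```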

7 We still care (What is Optimization). Maximum likelihood estimation: $\max_\theta \sum_{i=1}^n \log p_\theta(x_i)$. Collaborative filtering: $\min_w \sum_{i \prec j} \log\left(1 + \exp(w^T x_i - w^T x_j)\right)$. k-means: $\min_{\mu_1,\dots,\mu_k} J(\mu) = \sum_{j=1}^k \sum_{i \in C_j} \|x_i - \mu_j\|^2$. And more (graphical models, feature selection, active learning, control).

8-9 What is Optimization: but generally speaking, we're screwed. Local (non-global) minima of $f_0$; all kinds of constraints (even restricting to continuous functions), e.g. $h(x) = \sin(2\pi x) = 0$. So: go for convex problems!

10 Convex Sets

11 Convex Sets: Definition. A set $C \subseteq \mathbb{R}^n$ is convex if for $x, y \in C$ and any $\alpha \in [0,1]$, $\alpha x + (1-\alpha)y \in C$.

12-14 Examples (Convex Sets). All of $\mathbb{R}^n$ (obvious). The non-negative orthant $\mathbb{R}^n_+$: let $x \succeq 0$, $y \succeq 0$; clearly $\alpha x + (1-\alpha)y \succeq 0$. Norm balls: let $\|x\| \le 1$, $\|y\| \le 1$; then $\|\alpha x + (1-\alpha)y\| \le \|\alpha x\| + \|(1-\alpha)y\| = \alpha\|x\| + (1-\alpha)\|y\| \le 1$.

15 Examples (Convex Sets). Affine subspaces: if $Ax = b$ and $Ay = b$, then $A(\alpha x + (1-\alpha)y) = \alpha Ax + (1-\alpha)Ay = \alpha b + (1-\alpha)b = b$.

16 More examples (Convex Sets). Arbitrary intersections of convex sets: let $C_i$ be convex for $i \in I$ and $C = \bigcap_i C_i$; then for $x \in C$, $y \in C$, $\alpha x + (1-\alpha)y \in C_i$ for all $i \in I$, so $\alpha x + (1-\alpha)y \in C$.

17 More examples (Convex Sets). PSD matrices, a.k.a. the positive semidefinite cone $S^n_+ \subset \mathbb{R}^{n \times n}$. $A \in S^n_+$ means $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$. For $A, B \in S^n_+$, $x^T(\alpha A + (1-\alpha)B)x = \alpha x^T A x + (1-\alpha)x^T B x \ge 0$. On right: $S^2_+ = \left\{ \begin{bmatrix} x & z \\ z & y \end{bmatrix} : x \ge 0,\ y \ge 0,\ xy \ge z^2 \right\}$ [figure: boundary surface of $S^2_+$].

18 Convex Functions

19 Convex Functions: Definition. A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for $x, y \in \operatorname{dom} f$ and any $\alpha \in [0,1]$, $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha)f(y)$. [Figure: the chord $\alpha f(x) + (1-\alpha)f(y)$ lies above the graph of $f$.]

20 Convex Functions: first order convexity conditions. Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable. Then $f$ is convex if and only if for all $x, y \in \operatorname{dom} f$, $f(y) \ge f(x) + \nabla f(x)^T(y - x)$. [Figure: the tangent line $f(x) + \nabla f(x)^T(y - x)$ at $(x, f(x))$ lies below $f$.]

21 Convex Functions: actually, more general than that. Definition: the subgradient set, or subdifferential set, $\partial f(x)$ of $f$ at $x$ is $\partial f(x) = \{ g : f(y) \ge f(x) + g^T(y - x) \text{ for all } y \}$. Theorem: $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if it has a non-empty subdifferential set everywhere. [Figure: a supporting line $f(x) + g^T(y - x)$ at $(x, f(x))$.]

22 Convex Functions: second order convexity conditions. Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is twice differentiable. Then $f$ is convex if and only if for all $x \in \operatorname{dom} f$, $\nabla^2 f(x) \succeq 0$.

23 Convex Functions: convex sets and convex functions. Definition: the epigraph of a function $f$ is the set of points $\operatorname{epi} f = \{(x, t) : f(x) \le t\}$. $\operatorname{epi} f$ is convex if and only if $f$ is convex. Sublevel sets $\{x : f(x) \le a\}$ are convex for convex $f$.

24-25 Examples (Convex Functions). Linear/affine functions: $f(x) = b^T x + c$. Quadratic functions: $f(x) = \frac{1}{2}x^T A x + b^T x + c$ for $A \succeq 0$. For regression: $\frac{1}{2}\|Xw - y\|^2 = \frac{1}{2}w^T X^T X w - y^T X w + \frac{1}{2}y^T y$.

26-28 More examples (Convex Functions). Norms (like $\ell_1$ or $\ell_2$ for regularization): $\|\alpha x + (1-\alpha)y\| \le \|\alpha x\| + \|(1-\alpha)y\| = \alpha\|x\| + (1-\alpha)\|y\|$. Composition with an affine function, $f(Ax + b)$: $f(A(\alpha x + (1-\alpha)y) + b) = f(\alpha(Ax + b) + (1-\alpha)(Ay + b)) \le \alpha f(Ax + b) + (1-\alpha)f(Ay + b)$. Log-sum-exp (via $\nabla^2 f(x)$ PSD): $f(x) = \log\left(\sum_{i=1}^n \exp(x_i)\right)$.

29 Important examples in Machine Learning (Convex Functions). SVM loss: $f(w) = \left[1 - y_i x_i^T w\right]_+$. Binary logistic loss: $f(w) = \log\left(1 + \exp(-y_i x_i^T w)\right)$. [Figure: the hinge $[1 - x]_+$ and the logistic curve $\log(1 + e^{-x})$.]

30 Convex Optimization Problems: Definition. An optimization problem is convex if its objective is a convex function, the inequality constraint functions $f_i$ are convex, and the equality constraints $h_j$ are affine: $\min_x f_0(x)$ (convex function) s.t. $f_i(x) \le 0$ (convex sets), $h_j(x) = 0$ (affine).

31 It's nice to be convex (Convex Optimization Problems). Theorem: if $\hat{x}$ is a local minimizer of a convex optimization problem, it is a global minimizer.

32 Even more reasons to be convex (Convex Optimization Problems). Theorem: $\nabla f(x) = 0$ if and only if $x$ is a global minimizer of $f$. Proof: if $\nabla f(x) = 0$, then for all $y$, $f(y) \ge f(x) + \nabla f(x)^T(y - x) = f(x)$. If $\nabla f(x) \ne 0$, there is a direction of descent.

33 LET'S TAKE A BREAK

34 Lagrange Duality

35 Goals of Lagrange Duality: get a certificate for optimality of a problem; remove constraints; reformulate the problem.

36 Constructing the dual (Lagrange Duality). Start with the optimization problem: $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1,\dots,k$, $h_j(x) = 0$, $j = 1,\dots,l$. Form the Lagrangian using Lagrange multipliers $\lambda_i \ge 0$, $\nu_j \in \mathbb{R}$: $L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{j=1}^l \nu_j h_j(x)$. Form the dual function: $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$.

37 Remarks (Lagrange Duality). The original problem is equivalent to $\min_x \left[ \sup_{\lambda \ge 0, \nu} L(x, \lambda, \nu) \right]$. The dual problem switches the min and the max: $\max_{\lambda \ge 0, \nu} \left[ \inf_x L(x, \lambda, \nu) \right]$.

38 One Great Property of the Dual (Lagrange Duality). Lemma (Weak Duality): if $\lambda \ge 0$, then $g(\lambda, \nu) \le f_0(x^\star)$. Proof: $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) \le L(x^\star, \lambda, \nu) = f_0(x^\star) + \sum_{i=1}^k \lambda_i f_i(x^\star) + \sum_{j=1}^l \nu_j h_j(x^\star) \le f_0(x^\star)$.

39 The Greatest Property of the Dual (Lagrange Duality). Theorem: for reasonable¹ convex problems, $\sup_{\lambda \ge 0, \nu} g(\lambda, \nu) = f_0(x^\star)$. ¹There are conditions, called constraint qualifications, under which this is true.

40 Geometric Look (Lagrange Duality). Minimize $\frac{1}{2}(x - c_1)^2$ subject to $x^2 \le c_2$. [Left figure, as a function of $x$: true function (blue), constraint (green), $L(x, \lambda)$ for different $\lambda$ (dotted). Right figure, as a function of $\lambda$: dual function $g(\lambda)$ (black), primal optimal value (dotted blue).]

41-43 Intuition (Lagrange Duality). Can interpret duality as linear approximation. Define $I_-(a) = \infty$ if $a > 0$ and $0$ otherwise, and $I_0(a) = \infty$ unless $a = 0$, in which case it is $0$. Rewrite the problem as $\min_x f_0(x) + \sum_{i=1}^k I_-(f_i(x)) + \sum_{j=1}^l I_0(h_j(x))$. Replace $I_-(f_i(x))$ with $\lambda_i f_i(x)$, a measure of displeasure when $\lambda_i \ge 0$ and $f_i(x) > 0$; $\nu_j h_j(x)$ lower bounds $I_0(h_j(x))$: $\min_x f_0(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{j=1}^l \nu_j h_j(x)$.

44 Example: linearly constrained least squares (Lagrange Duality). $\min_x \frac{1}{2}\|Ax - b\|^2$ s.t. $Bx = d$. Form the Lagrangian: $L(x, \nu) = \frac{1}{2}\|Ax - b\|^2 + \nu^T(Bx - d)$. Take the infimum: $\nabla_x L(x, \nu) = A^T A x - A^T b + B^T \nu$, so $x = (A^T A)^{-1}(A^T b - B^T \nu)$ — a simple unconstrained quadratic problem! $\inf_x L(x, \nu) = \frac{1}{2}\left\|A(A^T A)^{-1}(A^T b - B^T \nu) - b\right\|^2 + \nu^T B (A^T A)^{-1}(A^T b - B^T \nu) - d^T \nu$.
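A small NumPy sketch of this derivation (not from the slides; random data, assuming $A$ has full column rank and $B$ full row rank): maximizing $g(\nu)$ amounts to choosing $\nu$ so that $Bx(\nu) = d$, after which $x$ is recovered in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5)); b = rng.normal(size=20)
B = rng.normal(size=(2, 5));  d = rng.normal(size=2)

AtA, Atb = A.T @ A, A.T @ b
# x(nu) = (A^T A)^{-1}(A^T b - B^T nu); stationarity of g(nu) <=> B x(nu) = d
x_unc = np.linalg.solve(AtA, Atb)                  # unconstrained least-squares solution
nu = np.linalg.solve(B @ np.linalg.solve(AtA, B.T), B @ x_unc - d)
x = np.linalg.solve(AtA, Atb - B.T @ nu)

print(np.linalg.norm(B @ x - d))                   # ~1e-15: equality constraint holds
```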

45 Example: quadratically constrained least squares (Lagrange Duality). $\min_x \frac{1}{2}\|Ax - b\|^2$ s.t. $\frac{1}{2}\|x\|^2 \le c$. Form the Lagrangian ($\lambda \ge 0$): $L(x, \lambda) = \frac{1}{2}\|Ax - b\|^2 + \frac{\lambda}{2}(\|x\|^2 - 2c)$. Take the infimum: $\nabla_x L(x, \lambda) = A^T A x - A^T b + \lambda x$, so $x = (A^T A + \lambda I)^{-1}A^T b$ and $\inf_x L(x, \lambda) = \frac{1}{2}\left\|A(A^T A + \lambda I)^{-1}A^T b - b\right\|^2 + \frac{\lambda}{2}\left\|(A^T A + \lambda I)^{-1}A^T b\right\|^2 - \lambda c$. A one-variable dual problem! $g(\lambda) = -\frac{1}{2}b^T A(A^T A + \lambda I)^{-1}A^T b - \lambda c + \frac{1}{2}\|b\|^2$.
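Again as an illustration (not from the slides; the data and grid are hypothetical), the one-variable dual can be maximized by a crude grid search and $x$ recovered from the maximizing $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 5)); b = rng.normal(size=30); c = 0.01

def x_of(lam):                                   # minimizer of L(x, lambda) for fixed lambda
    return np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)

def g(lam):                                      # dual function g(lambda) = inf_x L(x, lambda)
    x = x_of(lam)
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * (0.5 * np.sum(x ** 2) - c)

grid = np.linspace(0.0, 200.0, 8001)             # crude grid search over the scalar dual
lam = grid[np.argmax([g(l) for l in grid])]
x = x_of(lam)
print(lam, 0.5 * np.sum(x ** 2), c)              # lambda > 0 and the constraint is (nearly) active
```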

46-47 Uses of the Dual (Lagrange Duality). Main use: certificate of optimality (a.k.a. duality gap). If we have a feasible $x$ and know the dual value $g(\lambda, \nu)$, then $g(\lambda, \nu) \le f_0(x^\star) \le f_0(x)$, so $f_0(x) - f_0(x^\star) \le f_0(x) - g(\lambda, \nu)$. Also used in more advanced primal-dual algorithms (we won't talk about these).

48 Optimization Algorithms

49 Gradient Descent (Optimization Algorithms: Gradient Methods). The simplest algorithm in the world (almost). Goal: $\min_x f(x)$. Just iterate $x_{t+1} = x_t - \eta_t \nabla f(x_t)$, where $\eta_t$ is a stepsize.
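A minimal NumPy sketch of this iteration (not from the slides), applied to the least-squares objective $f(w) = \frac{1}{2}\|Xw - y\|^2$ from the regression example; the random data and the fixed stepsize $1/L$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)); y = rng.normal(size=50)

f = lambda w: 0.5 * np.sum((X @ w - y) ** 2)
grad = lambda w: X.T @ (X @ w - y)

w = np.zeros(3)
eta = 1.0 / np.linalg.norm(X, 2) ** 2      # stepsize 1/L, L = largest eigenvalue of X^T X
for t in range(200):
    w = w - eta * grad(w)                  # x_{t+1} = x_t - eta_t * grad f(x_t)

print(f(w) - f(np.linalg.lstsq(X, y, rcond=None)[0]))  # ~0: matches the closed-form solution
```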

50 Single Step Illustration (Gradient Methods). [Figure: one gradient step from $(x_t, f(x_t))$ along $-\eta\nabla f(x_t)$, shown against the graph of $f$.]

51 Full Gradient Descent (Gradient Methods). $f(x) = \log\left(\exp(x_1 + 3x_2 - 0.1) + \exp(x_1 - 3x_2 - 0.1) + \exp(-x_1 - 0.1)\right)$. [Figure: gradient-descent iterates on the level curves of $f$.]

52-53 Stepsize Selection (Gradient Methods). How do I choose a stepsize? Idea 1: exact line search, $\eta_t = \arg\min_\eta f(x - \eta\nabla f(x))$. Too expensive to be practical. Idea 2: backtracking (Armijo) line search. Let $\alpha \in (0, \frac{1}{2})$, $\beta \in (0, 1)$. Multiply $\eta \leftarrow \beta\eta$ until $f(x - \eta\nabla f(x)) \le f(x) - \alpha\eta\|\nabla f(x)\|^2$. Works well in practice.
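A sketch of this backtracking rule as a reusable helper (hypothetical function name, not from the slides), applied to a small smooth convex test function.

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha=0.25, beta=0.5, eta0=1.0):
    """One gradient step with Armijo backtracking: shrink eta until
    f(x - eta*g) <= f(x) - alpha*eta*||g||^2 holds."""
    g = grad_f(x)
    eta = eta0
    while f(x - eta * g) > f(x) - alpha * eta * np.dot(g, g):
        eta *= beta
    return x - eta * g, eta

# Usage on a simple smooth convex function (second derivative 1 - sin(x1) >= 0).
f = lambda x: 0.5 * np.dot(x, x) + np.sin(x[0])
grad_f = lambda x: x + np.array([np.cos(x[0]), 0.0])
x = np.array([3.0, -2.0])
for _ in range(50):
    x, eta = backtracking_step(f, grad_f, x)
print(x, np.linalg.norm(grad_f(x)))        # gradient norm is (near) zero at the end
```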

54 Illustration of Armijo/Backtracking Line Search (Gradient Methods). [Figure: $f(x - \eta\nabla f(x))$ and the line $f(x) - \eta\alpha\|\nabla f(x)\|^2$ plotted as functions of the stepsize $\eta$.] Clearly there is a region where $f(x - \eta\nabla f(x))$ lies below the line $f(x) - \alpha\eta\|\nabla f(x)\|^2$.

55 Newton's method (Gradient Methods). Idea: use a second-order approximation to the function, $f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2}\Delta x^T \nabla^2 f(x)\Delta x$. Choose $\Delta x$ to minimize the above: $\Delta x = -\left[\nabla^2 f(x)\right]^{-1}\nabla f(x)$. This is a descent direction: $\nabla f(x)^T \Delta x = -\nabla f(x)^T \left[\nabla^2 f(x)\right]^{-1}\nabla f(x) < 0$.
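A sketch of the Newton iteration (not from the slides) on the binary logistic loss, whose gradient and Hessian are $-\sum_i \sigma(-y_i x_i^T w)\, y_i x_i$ and $X^T \operatorname{diag}(\sigma_i(1-\sigma_i)) X$; the random data and the tiny ridge term added for numerical safety are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=200))  # noisy labels in {-1,+1}

def grad_hess(w):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))             # sigma(-y_i x_i^T w)
    grad = -X.T @ (y * s)                             # gradient of sum_i log(1 + exp(-y_i x_i^T w))
    hess = X.T @ (X * (s * (1.0 - s))[:, None]) + 1e-8 * np.eye(4)  # tiny ridge for safety
    return grad, hess

w = np.zeros(4)
for _ in range(10):
    g, H = grad_hess(w)
    w = w - np.linalg.solve(H, g)                     # Newton step: dx = -[hessian]^{-1} gradient
print(np.linalg.norm(grad_hess(w)[0]))                # near zero after a handful of iterations
```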

56 Newton step picture (Gradient Methods). [Figure: $\hat{f}$ is the second-order approximation, $f$ is the true function; the step moves from $(x, f(x))$ to $(x + \Delta x, f(x + \Delta x))$.]

57 Convergence of gradient descent and Newton's method (Gradient Methods). Strongly convex case ($\nabla^2 f(x) \succeq mI$): linear convergence, i.e. for some $\gamma \in (0,1)$, $f(x_t) - f(x^\star) \le \gamma^t$, so $t \ge \frac{1}{\gamma}\log\frac{1}{\varepsilon}$ gives $f(x_t) - f(x^\star) \le \varepsilon$. Smooth case ($\|\nabla f(x) - \nabla f(y)\| \le C\|x - y\|$): $f(x_t) - f(x^\star) \le K/t$. Newton's method is often faster, especially when $f$ has long valleys.

58-59 What about constraints? (Gradient Methods). Linear constraints $Ax = b$ are easy. For example, in Newton's method (assume $Ax = b$): $\min_{\Delta x} \nabla f(x)^T\Delta x + \frac{1}{2}\Delta x^T\nabla^2 f(x)\Delta x$ s.t. $A\Delta x = 0$. The solution $\Delta x$ satisfies $A(x + \Delta x) = Ax + A\Delta x = b$. Inequality constraints are a bit tougher: $\min_{\Delta x} \nabla f(x)^T\Delta x + \frac{1}{2}\Delta x^T\nabla^2 f(x)\Delta x$ s.t. $f_i(x + \Delta x) \le 0$ is just as hard as the original.

60 Logarithmic Barrier Methods (Gradient Methods). Goal: convert $\min_x f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1,\dots,k$, into $\min_x f_0(x) + \sum_{i=1}^k I_-(f_i(x))$. Approximate $I_-(u) \approx -t\log(-u)$ for small $t$: $\min_x f_0(x) - t\sum_{i=1}^k \log(-f_i(x))$.
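A sketch of this barrier idea (not from the slides) for a small random LP $\min_x c^T x$ s.t. $Ax \le b$: minimize the barrier objective by gradient descent with backtracking while shrinking $t$; all problem data here are hypothetical, and $x = 0$ is strictly feasible by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(40, 2)); b = np.ones(40); c = np.array([1.0, 0.5])

def phi(x, t):                                      # barrier objective f_0(x) - t * sum log(-f_i(x))
    slack = b - A @ x
    return np.inf if np.any(slack <= 0) else c @ x - t * np.sum(np.log(slack))

x = np.zeros(2)
for t in [1.0, 0.1, 0.01]:                          # shrink t so the barrier approaches the indicator
    for _ in range(500):                            # plain gradient descent with backtracking
        g = c + t * A.T @ (1.0 / (b - A @ x))
        eta = 1.0
        while phi(x - eta * g, t) > phi(x, t) - 0.25 * eta * (g @ g):
            eta *= 0.5                              # backtracking also keeps the iterate strictly feasible
        x = x - eta * g
print(x, (A @ x <= b).all())                        # approximate LP solution, still feasible
```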

61 The barrier function (Gradient Methods). [Figure: $I_-(u)$ is the dotted line; the others are $-t\log(-u)$ for various $t$.]

62 Illustration (Gradient Methods). Minimizing $c^T x$ subject to $Ax \le b$. [Figure: the feasible polyhedron, the objective direction $c$, and the barrier minimizers for $t = 1$ and $t = 5$.]

63 Subgradient Descent (Optimization Algorithms: Subgradient Methods). Really, the simplest algorithm in the world. Goal: $\min_x f(x)$. Just iterate $x_{t+1} = x_t - \eta_t g_t$, where $\eta_t$ is a stepsize and $g_t \in \partial f(x_t)$.
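A sketch (not from the slides) of subgradient descent on the non-differentiable objective $\frac{1}{2}\|Ax - b\|^2 + \|x\|_1$, using the subgradient $A^T(Ax - b) + \operatorname{sign}(x)$ and a diminishing stepsize; since subgradient steps need not decrease $f$, the best iterate is tracked. The data and stepsize constant are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(40, 10)); b = rng.normal(size=40)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + np.sum(np.abs(x))

x = np.zeros(10)
best_x, best_f = x.copy(), f(x)
for t in range(1, 2001):
    g = A.T @ (A @ x - b) + np.sign(x)        # a subgradient of f at x (sign(0) = 0 is valid)
    x = x - (0.1 / np.sqrt(t)) * g            # diminishing stepsize eta_t = 0.1 / sqrt(t)
    if f(x) < best_f:                         # subgradient methods are not descent methods,
        best_x, best_f = x.copy(), f(x)       # so keep track of the best iterate seen
print(best_f)
```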

64 Why subgradient descent? (Subgradient Methods). Lots of non-differentiable convex functions are used in machine learning: $f(x) = \left[1 - a^T x\right]_+$, $f(x) = \|x\|_1$, $f(X) = \sum_{r=1}^k \sigma_r(X)$, where $\sigma_r$ is the $r$th singular value of $X$. Easy to analyze. Do not even need a true subgradient: just need $\mathbb{E}[g_t] \in \partial f(x_t)$.

65-66 Proof of convergence for subgradient descent (Subgradient Methods). Idea: bound $\|x_{t+1} - x^\star\|$ using the subgradient inequality. Assume that $\|g_t\| \le G$. $\|x_{t+1} - x^\star\|^2 = \|x_t - \eta g_t - x^\star\|^2 = \|x_t - x^\star\|^2 - 2\eta g_t^T(x_t - x^\star) + \eta^2\|g_t\|^2$. Recall that $f(x^\star) \ge f(x_t) + g_t^T(x^\star - x_t)$, i.e. $-g_t^T(x_t - x^\star) \le f(x^\star) - f(x_t)$, so $\|x_{t+1} - x^\star\|^2 \le \|x_t - x^\star\|^2 + 2\eta\left[f(x^\star) - f(x_t)\right] + \eta^2 G^2$. Then $f(x_t) - f(x^\star) \le \frac{\|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2}{2\eta} + \frac{\eta}{2}G^2$.

67-68 Almost done (Subgradient Methods). Sum from $t = 1$ to $T$: $\sum_{t=1}^T \left[f(x_t) - f(x^\star)\right] \le \frac{1}{2\eta}\sum_{t=1}^T \left[\|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2\right] + \frac{T\eta}{2}G^2 = \frac{1}{2\eta}\|x_1 - x^\star\|^2 - \frac{1}{2\eta}\|x_{T+1} - x^\star\|^2 + \frac{T\eta}{2}G^2$. Now let $D = \|x_1 - x^\star\|$ and keep track of the minimum along the run: $f(x_{\text{best}}) - f(x^\star) \le \frac{D^2}{2\eta T} + \frac{\eta}{2}G^2$. Set $\eta = \frac{D}{G\sqrt{T}}$ and $f(x_{\text{best}}) - f(x^\star) \le \frac{DG}{\sqrt{T}}$.

69 Extension: projected subgradient descent (Subgradient Methods). Now we have a convex constraint set $X$. Goal: $\min_{x \in X} f(x)$. Idea: do subgradient steps and project $x_t$ back onto $X$ at every iteration: $x_{t+1} = \Pi_X(x_t - \eta g_t)$. [Figure: a step leaving $X$ and its projection $\Pi_X(x_t)$ back onto $X$.] The proof goes through because $\|x^\star - \Pi_X(x_t)\| \le \|x^\star - x_t\|$ if $x^\star \in X$.

70 Projected subgradient example (Subgradient Methods). $\min_x \frac{1}{2}\|Ax - b\|^2$ s.t. $\|x\|_1 \le 1$. [Figures on slides 70-77: successive iterates of projected subgradient descent on the $\ell_1$ ball.]
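A sketch of this example with random data (not from the slides): projected subgradient descent, where the Euclidean projection onto the $\ell_1$ ball uses a standard sort-based routine.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} via sorting."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 8)); b = rng.normal(size=30)

x = np.zeros(8)
for t in range(1, 1001):
    g = A.T @ (A @ x - b)                            # (sub)gradient of (1/2)||Ax - b||^2
    x = project_l1_ball(x - (0.05 / np.sqrt(t)) * g) # step, then project back onto the l1 ball
print(np.sum(np.abs(x)))                             # <= 1: the iterate stays feasible
```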


78-80 Convergence results for (projected) subgradient methods. Any decreasing, non-summable stepsize ($\eta_t \to 0$, $\sum_{t=1}^\infty \eta_t = \infty$) gives $f(x_{\text{avg}(t)}) - f(x^\star) \to 0$. A slightly less brain-dead analysis than earlier shows that with $\eta_t \propto 1/\sqrt{t}$, $f(x_{\text{avg}(t)}) - f(x^\star) \le \frac{C}{\sqrt{t}}$. The same convergence holds when $g_t$ is random, i.e. $\mathbb{E}[g_t] \in \partial f(x_t)$. Example: $f(w) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^n \left[1 - y_i x_i^T w\right]_+$ — just pick a random training example.
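A sketch (not from the slides) of this stochastic variant for the SVM objective: pick a random example $i$ and use $g_t = w + Cn\,\partial\left[1 - y_i x_i^T w\right]_+$, so that $\mathbb{E}[g_t] \in \partial f(w)$; the data, constants, and stepsize are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, C = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n))   # labels in {-1, +1}

w = np.zeros(d)
for t in range(1, 5001):
    i = rng.integers(n)                                          # random training example
    hinge_active = 1.0 - y[i] * (X[i] @ w) > 0.0
    g = w + (C * n * (-y[i]) * X[i] if hinge_active else 0.0)    # E[g] lies in the subdifferential of f
    w = w - (0.1 / np.sqrt(t)) * g                               # diminishing stepsize

obj = 0.5 * np.sum(w ** 2) + C * np.sum(np.maximum(0.0, 1.0 - y * (X @ w)))
print(obj)
```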

81 Recap (Take Home Messages). Defined convex sets and functions. Saw why we want optimization problems to be convex (solvable). Sketched some of Lagrange duality. First-order methods are easy and (often) work well.

82 Take Home Messages. Many useful problems can be formulated as convex optimization problems. If it is not convex and not an eigenvalue problem, you are out of luck. If it is convex, you are golden.
