Introduction to Convex Optimization for Machine Learning


1 Introduction to Convex Optimization for Machine Learning
John Duchi, University of California, Berkeley
Practical Machine Learning, Fall 2009

2 Outline
What is Optimization; Convex Sets; Convex Functions; Convex Optimization Problems; Lagrange Duality; Optimization Algorithms; Take Home Messages

3 What is Optimization (and why do we care?)

4 What is Optimization?
Finding the minimizer of a function subject to constraints:
    minimize_x f_0(x)
    s.t. f_i(x) ≤ 0, i = 1,...,k
         h_j(x) = 0, j = 1,...,l

5 What is Optimization?
Finding the minimizer of a function subject to constraints:
    minimize_x f_0(x)
    s.t. f_i(x) ≤ 0, i = 1,...,k
         h_j(x) = 0, j = 1,...,l
Example: Stock market. Minimize variance of return subject to getting at least $50.

6 What is Optimization: Why do we care?
Optimization is at the heart of many (most practical?) machine learning algorithms.
Linear regression:
    minimize_w ‖Xw - y‖^2
Classification (logistic regression or SVM):
    minimize_w ∑_{i=1}^n log(1 + exp(-y_i x_i^T w))
or
    minimize_w ‖w‖^2 + C ∑_{i=1}^n ξ_i   s.t. ξ_i ≥ 1 - y_i x_i^T w, ξ_i ≥ 0.

7 What is Optimization: We still care...
Maximum likelihood estimation:
    maximize_θ ∑_{i=1}^n log p_θ(x_i)
Collaborative filtering:
    minimize_w ∑_{i ≺ j} log(1 + exp(w^T x_i - w^T x_j))
k-means:
    minimize_{µ_1,...,µ_k} J(µ) = ∑_{j=1}^k ∑_{i ∈ C_j} ‖x_i - µ_j‖^2
And more (graphical models, feature selection, active learning, control).

8 What is Optimization: But generally speaking...
We're screwed.
- Local (non-global) minima of f_0
- All kinds of constraints (even restricting to continuous functions): h(x) = sin(2πx) = 0

9 What is Optimization: But generally speaking...
We're screwed.
- Local (non-global) minima of f_0
- All kinds of constraints (even restricting to continuous functions): h(x) = sin(2πx) = 0
Go for convex problems!

10 Convex Sets

11 Convex Sets: Definition
A set C ⊆ R^n is convex if for x, y ∈ C and any α ∈ [0, 1],
    αx + (1 - α)y ∈ C.

12 Convex Sets: Examples
- All of R^n (obvious)

13 Convex Sets: Examples
- All of R^n (obvious)
- Non-negative orthant R^n_+: let x ≥ 0, y ≥ 0; clearly αx + (1 - α)y ≥ 0.

14 Convex Sets: Examples
- All of R^n (obvious)
- Non-negative orthant R^n_+: let x ≥ 0, y ≥ 0; clearly αx + (1 - α)y ≥ 0.
- Norm balls: let ‖x‖ ≤ 1, ‖y‖ ≤ 1; then
    ‖αx + (1 - α)y‖ ≤ ‖αx‖ + ‖(1 - α)y‖ = α‖x‖ + (1 - α)‖y‖ ≤ 1.

15 Convex Sets: More examples
- Affine subspaces: if Ax = b and Ay = b, then
    A(αx + (1 - α)y) = αAx + (1 - α)Ay = αb + (1 - α)b = b.

16 Convex Sets: More examples
- Arbitrary intersections of convex sets: let C_i be convex for i ∈ I and C = ∩_i C_i; then for x ∈ C, y ∈ C,
    αx + (1 - α)y ∈ C_i for all i ∈ I,
so αx + (1 - α)y ∈ C.

17 Convex Sets: More examples
- PSD matrices, a.k.a. the positive semidefinite cone S^n_+ ⊆ R^{n×n}. A ∈ S^n_+ means x^T A x ≥ 0 for all x ∈ R^n. For A, B ∈ S^n_+,
    x^T (αA + (1 - α)B) x = α x^T A x + (1 - α) x^T B x ≥ 0.
(Figure, right: S^2_+ = { [x z; z y] ⪰ 0 } = { x, y, z : x ≥ 0, y ≥ 0, xy ≥ z^2 }.)

18 Convex Functions

19 Convex Functions: Definition
A function f : R^n → R is convex if for x, y ∈ dom f and any α ∈ [0, 1],
    f(αx + (1 - α)y) ≤ αf(x) + (1 - α)f(y).
(Figure: the chord αf(x) + (1 - α)f(y) lies above the graph of f between x and y.)

20 Convex Functions: First order convexity conditions
Theorem. Suppose f : R^n → R is differentiable. Then f is convex if and only if for all x, y ∈ dom f,
    f(y) ≥ f(x) + ∇f(x)^T (y - x).
(Figure: the tangent line f(x) + ∇f(x)^T (y - x) through (x, f(x)) underestimates f.)

21 Convex Functions: Actually, more general than that
Definition. The subgradient set, or subdifferential set, ∂f(x) of f at x is
    ∂f(x) = { g : f(y) ≥ f(x) + g^T (y - x) for all y }.
Theorem. f : R^n → R is convex if and only if it has a non-empty subdifferential set everywhere.
(Figure: a supporting line f(x) + g^T (y - x) through (x, f(x)).)

22 Convex Functions: Second order convexity conditions
Theorem. Suppose f : R^n → R is twice differentiable. Then f is convex if and only if for all x ∈ dom f,
    ∇^2 f(x) ⪰ 0.

23 Convex Functions: Convex sets and convex functions
Definition. The epigraph of a function f is the set of points
    epi f = { (x, t) : f(x) ≤ t }.
epi f is convex if and only if f is convex.
Sublevel sets { x : f(x) ≤ a } are convex for convex f.

24 Convex Functions: Examples
- Linear/affine functions: f(x) = b^T x + c.

25 Convex Functions: Examples
- Linear/affine functions: f(x) = b^T x + c.
- Quadratic functions: f(x) = (1/2) x^T A x + b^T x + c for A ⪰ 0. For regression:
    (1/2)‖Xw - y‖^2 = (1/2) w^T X^T X w - y^T X w + (1/2) y^T y.

26 Convex Functions: More examples
- Norms (like l_1 or l_2 for regularization):
    ‖αx + (1 - α)y‖ ≤ ‖αx‖ + ‖(1 - α)y‖ = α‖x‖ + (1 - α)‖y‖.

27 Convex Functions: More examples
- Norms (like l_1 or l_2 for regularization):
    ‖αx + (1 - α)y‖ ≤ ‖αx‖ + ‖(1 - α)y‖ = α‖x‖ + (1 - α)‖y‖.
- Composition with an affine function, f(Ax + b):
    f(A(αx + (1 - α)y) + b) = f(α(Ax + b) + (1 - α)(Ay + b)) ≤ αf(Ax + b) + (1 - α)f(Ay + b).

28 Convex Functions: More examples
- Norms (like l_1 or l_2 for regularization):
    ‖αx + (1 - α)y‖ ≤ ‖αx‖ + ‖(1 - α)y‖ = α‖x‖ + (1 - α)‖y‖.
- Composition with an affine function, f(Ax + b):
    f(A(αx + (1 - α)y) + b) = f(α(Ax + b) + (1 - α)(Ay + b)) ≤ αf(Ax + b) + (1 - α)f(Ay + b).
- Log-sum-exp (via ∇^2 f(x) PSD):
    f(x) = log( ∑_{i=1}^n exp(x_i) ).
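
Not part of the slides, but a practical aside: evaluated naively, log-sum-exp overflows for large x_i. A minimal NumPy sketch of the standard max-subtraction trick (the function name is mine):

    import numpy as np

    def log_sum_exp(x):
        # Stable evaluation of f(x) = log(sum_i exp(x_i)):
        # log sum_i exp(x_i) = m + log sum_i exp(x_i - m) for any m, so take m = max(x).
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    # A naive np.log(np.sum(np.exp(x))) would overflow here.
    print(log_sum_exp(np.array([1000.0, 1000.5])))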

29 Convex Functions: Important examples in Machine Learning
- SVM loss: f(w) = [1 - y_i x_i^T w]_+
- Binary logistic loss: f(w) = log(1 + exp(-y_i x_i^T w))
(Figure: the hinge loss [1 - x]_+ and the logistic loss log(1 + e^{-x}) plotted against the margin.)
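
As an illustration (not from the slides; function names and the sum-over-examples convention are mine), both losses, plus a valid subgradient for the non-differentiable hinge, take only a few lines of NumPy:

    import numpy as np

    def hinge_loss(w, X, y):
        # SVM loss summed over examples: sum_i [1 - y_i x_i^T w]_+
        margins = 1 - y * (X @ w)
        return np.maximum(margins, 0.0).sum()

    def hinge_subgradient(w, X, y):
        # One valid subgradient: -y_i x_i for every example whose margin constraint is violated.
        active = (1 - y * (X @ w)) > 0
        return -(X[active] * y[active][:, None]).sum(axis=0)

    def logistic_loss(w, X, y):
        # Binary logistic loss summed over examples: sum_i log(1 + exp(-y_i x_i^T w))
        return np.log1p(np.exp(-y * (X @ w))).sum()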

30 Convex Optimization Problems
Definition. An optimization problem is convex if its objective f_0 is a convex function, the inequality constraint functions f_i are convex, and the equality constraint functions h_j are affine:
    minimize_x f_0(x)      (convex function)
    s.t. f_i(x) ≤ 0        (convex sets)
         h_j(x) = 0        (affine)

31 Convex Optimization Problems: It's nice to be convex
Theorem. If x̂ is a local minimizer of a convex optimization problem, it is a global minimizer.

32 Convex Optimization Problems: Even more reasons to be convex
Theorem. ∇f(x) = 0 if and only if x is a global minimizer of f.
Proof.
- If ∇f(x) = 0: for any y, f(y) ≥ f(x) + ∇f(x)^T (y - x) = f(x).
- If ∇f(x) ≠ 0: there is a direction of descent.

33 LET'S TAKE A BREAK

34 Lagrange Duality

35 Lagrange Duality: Goals of Lagrange Duality
- Get a certificate for optimality of a problem
- Remove constraints
- Reformulate the problem

36 Lagrange Duality: Constructing the dual
Start with the optimization problem:
    minimize_x f_0(x)
    s.t. f_i(x) ≤ 0, i = 1,...,k
         h_j(x) = 0, j = 1,...,l
Form the Lagrangian using Lagrange multipliers λ_i ≥ 0, ν_j ∈ R:
    L(x, λ, ν) = f_0(x) + ∑_{i=1}^k λ_i f_i(x) + ∑_{j=1}^l ν_j h_j(x)
Form the dual function:
    g(λ, ν) = inf_x L(x, λ, ν)

37 Lagrange Duality: Remarks
The original problem is equivalent to
    minimize_x [ sup_{λ ≥ 0, ν} L(x, λ, ν) ].
The dual problem is switching the min and the max:
    maximize_{λ ≥ 0, ν} [ inf_x L(x, λ, ν) ].

38 Lagrange Duality: One Great Property of the Dual
Lemma (Weak Duality). If λ ≥ 0, then g(λ, ν) ≤ f_0(x*).
Proof. We have
    g(λ, ν) = inf_x L(x, λ, ν) ≤ L(x*, λ, ν)
            = f_0(x*) + ∑_{i=1}^k λ_i f_i(x*) + ∑_{j=1}^l ν_j h_j(x*) ≤ f_0(x*).

39 Lagrange Duality: The Greatest Property of the Dual
Theorem. For reasonable¹ convex problems,
    sup_{λ ≥ 0, ν} g(λ, ν) = f_0(x*).
¹ There are conditions, called constraint qualifications, under which this is true.

40 Lagrange Duality: Geometric Look
Minimize (1/2)(x - c_1)^2 subject to x^2 ≤ c.
(Figures: left, the true function (blue), the constraint (green), and L(x, λ) for different λ (dotted); right, the dual function g(λ) (black) and the primal optimal value (dotted blue).)

41 Lagrange Duality: Intuition
Can interpret duality as linear approximation.

42 Lagrange Duality: Intuition
Can interpret duality as linear approximation. Define I_-(a) = ∞ if a > 0 and 0 otherwise, and I_0(a) = ∞ unless a = 0, in which case I_0(a) = 0. Rewrite the problem as
    minimize_x f_0(x) + ∑_{i=1}^k I_-(f_i(x)) + ∑_{j=1}^l I_0(h_j(x)).

43 Lagrange Duality: Intuition
Can interpret duality as linear approximation. Define I_-(a) = ∞ if a > 0 and 0 otherwise, and I_0(a) = ∞ unless a = 0, in which case I_0(a) = 0. Rewrite the problem as
    minimize_x f_0(x) + ∑_{i=1}^k I_-(f_i(x)) + ∑_{j=1}^l I_0(h_j(x)).
Replace I_-(f_i(x)) with λ_i f_i(x): a measure of displeasure when λ_i ≥ 0 and f_i(x) > 0. Similarly, ν_j h_j(x) lower bounds I_0(h_j(x)):
    minimize_x f_0(x) + ∑_{i=1}^k λ_i f_i(x) + ∑_{j=1}^l ν_j h_j(x).

44 Lagrange Duality, Example: Linearly constrained least squares
    minimize_x (1/2)‖Ax - b‖^2   s.t. Bx = d.
Form the Lagrangian:
    L(x, ν) = (1/2)‖Ax - b‖^2 + ν^T (Bx - d)
Take the infimum:
    ∇_x L(x, ν) = A^T A x - A^T b + B^T ν,   so   x = (A^T A)^{-1} (A^T b - B^T ν)
    inf_x L(x, ν) = (1/2)‖A(A^T A)^{-1}(A^T b - B^T ν) - b‖^2 + ν^T B (A^T A)^{-1}(A^T b - B^T ν) - d^T ν
A simple unconstrained quadratic problem!
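
A small numerical sketch of this example (my own code, not from the slides): stationarity gives x(ν), and forcing Bx(ν) = d yields a tiny linear system for the dual optimum ν, from which the primal solution is recovered.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5)); b = rng.normal(size=20)
    B = rng.normal(size=(2, 5));  d = rng.normal(size=2)

    # Stationarity: x(nu) = (A^T A)^{-1}(A^T b - B^T nu).
    # Enforcing B x(nu) = d gives a linear system in nu (the dual optimum).
    AtA, Atb = A.T @ A, A.T @ b
    M = B @ np.linalg.solve(AtA, B.T)
    nu = np.linalg.solve(M, B @ np.linalg.solve(AtA, Atb) - d)
    x = np.linalg.solve(AtA, Atb - B.T @ nu)

    print(np.allclose(B @ x, d))  # primal feasibility holds at the recovered point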

45 Lagrange Duality, Example: Quadratically constrained least squares
    minimize_x (1/2)‖Ax - b‖^2   s.t. (1/2)‖x‖^2 ≤ c.
Form the Lagrangian (λ ≥ 0):
    L(x, λ) = (1/2)‖Ax - b‖^2 + (λ/2)(‖x‖^2 - 2c)
Take the infimum:
    ∇_x L(x, λ) = A^T A x - A^T b + λx,   so   x = (A^T A + λI)^{-1} A^T b
    inf_x L(x, λ) = (1/2)‖A(A^T A + λI)^{-1} A^T b - b‖^2 + (λ/2)‖(A^T A + λI)^{-1} A^T b‖^2 - λc
A one-variable dual problem!
    g(λ) = -(1/2) b^T A (A^T A + λI)^{-1} A^T b - λc + (1/2)‖b‖^2.
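
Since g(λ) is a concave function of a single variable, it can be maximized with any scalar routine. A hedged sketch, assuming SciPy is available; the search interval and the problem data are arbitrary choices of mine.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 5)); b = rng.normal(size=30); c = 0.5

    def g(lam):
        # g(lambda) = -1/2 b^T A (A^T A + lambda I)^{-1} A^T b - lambda c + 1/2 ||b||^2
        x = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
        return -0.5 * b @ (A @ x) - lam * c + 0.5 * b @ b

    # Maximize the concave dual over lambda >= 0 (the finite upper bound is an assumption).
    res = minimize_scalar(lambda lam: -g(lam), bounds=(0.0, 1e3), method="bounded")
    lam_star = res.x
    x_star = np.linalg.solve(A.T @ A + lam_star * np.eye(A.shape[1]), A.T @ b)
    print(lam_star, 0.5 * x_star @ x_star)  # complementary slackness: lam_star*(||x*||^2/2 - c) ≈ 0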

46 Lagrange Duality: Uses of the Dual
Main use: certificate of optimality (a.k.a. duality gap). If we have a feasible x and know the dual g(λ, ν), then
    g(λ, ν) ≤ f_0(x*) ≤ f_0(x),
so
    f_0(x) - f_0(x*) ≤ f_0(x) - g(λ, ν).

47 Lagrange Duality: Uses of the Dual
Main use: certificate of optimality (a.k.a. duality gap). If we have a feasible x and know the dual g(λ, ν), then
    g(λ, ν) ≤ f_0(x*) ≤ f_0(x),
so
    f_0(x) - f_0(x*) ≤ f_0(x) - g(λ, ν).
Also used in more advanced primal-dual algorithms (we won't talk about these).

48 Optimization Algorithms

49 Gradient Methods: Gradient Descent
The simplest algorithm in the world (almost). Goal:
    minimize_x f(x)
Just iterate
    x_{t+1} = x_t - η_t ∇f(x_t)
where η_t is the stepsize.
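
A minimal sketch of the iteration in NumPy (not from the slides; a fixed stepsize is used for simplicity, whereas the next slides discuss how to choose it):

    import numpy as np

    def gradient_descent(grad, x0, eta=0.1, iters=100):
        # Iterate x_{t+1} = x_t - eta * grad(x_t) with a fixed stepsize.
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            x = x - eta * grad(x)
        return x

    # Example: minimize f(x) = 1/2 ||Ax - b||^2, whose gradient is A^T(Ax - b).
    A = np.array([[2.0, 0.0], [0.0, 1.0]]); b = np.array([1.0, -1.0])
    print(gradient_descent(lambda x: A.T @ (A @ x - b), np.zeros(2)))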

50 Gradient Methods: Single Step Illustration
(Figure: the graph of f(x), the current point (x_t, f(x_t)), and the linear model f(x_t) - η∇f(x_t)^T (x - x_t) along the step.)

51 Gradient Methods: Full Gradient Descent
(Figure: gradient descent iterates on f(x) = log(exp(x_1 + 3x_2 - 0.1) + exp(x_1 - 3x_2 - 0.1) + exp(-x_1 - 0.1)).)

52 Gradient Methods: Stepsize Selection
How do I choose a stepsize?
Idea 1: exact line search,
    η_t = argmin_η f(x - η∇f(x)).
Too expensive to be practical.

53 Gradient Methods: Stepsize Selection
How do I choose a stepsize?
Idea 1: exact line search,
    η_t = argmin_η f(x - η∇f(x)).
Too expensive to be practical.
Idea 2: backtracking (Armijo) line search. Let α ∈ (0, 1/2), β ∈ (0, 1). Multiply η by β (η ← βη) until
    f(x - η∇f(x)) ≤ f(x) - αη‖∇f(x)‖^2.
Works well in practice.
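
A sketch of one backtracking step under the α, β ranges above (function name and default values are mine, not from the slides):

    import numpy as np

    def backtracking_step(f, grad, x, alpha=0.3, beta=0.8):
        # Armijo/backtracking line search: shrink eta by beta until the sufficient
        # decrease condition f(x - eta*g) <= f(x) - alpha*eta*||g||^2 holds.
        g = grad(x)
        eta = 1.0
        while f(x - eta * g) > f(x) - alpha * eta * (g @ g):
            eta *= beta
        return x - eta * g

In a full solver this step would replace the fixed-stepsize update in the gradient descent loop sketched earlier.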

54 Gradient Methods: Illustration of Armijo/Backtracking Line Search
(Figure: f(x - η∇f(x)) and the line f(x) - αη‖∇f(x)‖^2 as functions of the stepsize η.)
Clearly there is a region where f(x - η∇f(x)) lies below the line f(x) - αη‖∇f(x)‖^2.

55 Gradient Methods: Newton's method
Idea: use a second-order approximation to the function,
    f(x + Δx) ≈ f(x) + ∇f(x)^T Δx + (1/2) Δx^T ∇^2 f(x) Δx.
Choose Δx to minimize the above:
    Δx = -[∇^2 f(x)]^{-1} ∇f(x)
This is a descent direction:
    ∇f(x)^T Δx = -∇f(x)^T [∇^2 f(x)]^{-1} ∇f(x) < 0.
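
A bare-bones Newton iteration as a sketch (my own example function; a practical implementation would add a line search on the Newton direction and a stopping test):

    import numpy as np

    def newton_method(grad, hess, x0, iters=20):
        # Newton iteration: x <- x + dx with dx = -[Hessian]^{-1} gradient.
        x = np.asarray(x0, dtype=float)
        for _ in range(iters):
            dx = -np.linalg.solve(hess(x), grad(x))
            x = x + dx
        return x

    # Example: f(x) = log(e^x + e^{-x}) + x^2/2 in one variable,
    # with gradient tanh(x) + x and Hessian sech(x)^2 + 1; the minimizer is x = 0.
    f_grad = lambda x: np.array([np.tanh(x[0]) + x[0]])
    f_hess = lambda x: np.array([[1.0 / np.cosh(x[0])**2 + 1.0]])
    print(newton_method(f_grad, f_hess, np.array([3.0])))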

56 Gradient Methods: Newton step picture
(Figure: f̂ is the second-order approximation, f is the true function; the step moves from (x, f(x)) to (x + Δx, f(x + Δx)).)

57 Gradient Methods: Convergence of gradient descent and Newton's method
Strongly convex case (∇^2 f(x) ⪰ mI): linear convergence. For some γ ∈ (0, 1),
    f(x_t) - f(x*) ≤ γ^t,
so t ≥ (1/γ) log(1/ε) iterations suffice for f(x_t) - f(x*) ≤ ε.
Smooth case (‖∇f(x) - ∇f(y)‖ ≤ C‖x - y‖):
    f(x_t) - f(x*) ≤ K/t^2.
Newton's method is often faster, especially when f has long valleys.

58 Gradient Methods: What about constraints?
Linear constraints Ax = b are easy. For example, in Newton's method (assume Ax = b):
    minimize_Δx ∇f(x)^T Δx + (1/2) Δx^T ∇^2 f(x) Δx   s.t. AΔx = 0.
The solution Δx satisfies A(x + Δx) = Ax + AΔx = b.

59 Gradient Methods: What about constraints?
Linear constraints Ax = b are easy. For example, in Newton's method (assume Ax = b):
    minimize_Δx ∇f(x)^T Δx + (1/2) Δx^T ∇^2 f(x) Δx   s.t. AΔx = 0.
The solution Δx satisfies A(x + Δx) = Ax + AΔx = b.
Inequality constraints are a bit tougher:
    minimize_Δx ∇f(x)^T Δx + (1/2) Δx^T ∇^2 f(x) Δx   s.t. f_i(x + Δx) ≤ 0
is just as hard as the original problem.

60 Gradient Methods: Logarithmic Barrier Methods
Goal: convert
    minimize_x f_0(x)   s.t. f_i(x) ≤ 0, i = 1,...,k
to
    minimize_x f_0(x) + ∑_{i=1}^k I_-(f_i(x)).
Approximate I_-(u) ≈ -t log(-u) for small t:
    minimize_x f_0(x) - t ∑_{i=1}^k log(-f_i(x)).
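
A sketch of the barrier objective and its gradient for the linear-programming illustration two slides below (my own code; in a full barrier method one would minimize this, e.g. with Newton's method, for a sequence of decreasing t to sharpen the approximation):

    import numpy as np

    def barrier_objective(x, c, A, b, t):
        # phi_t(x) = c^T x - t * sum_i log(b_i - a_i^T x), the log-barrier
        # approximation of: minimize c^T x subject to Ax <= b.
        slack = b - A @ x
        if np.any(slack <= 0):
            return np.inf          # outside the feasible region the barrier is +infinity
        return c @ x - t * np.log(slack).sum()

    def barrier_gradient(x, c, A, b, t):
        # grad phi_t(x) = c + t * A^T (1 / (b - Ax)), elementwise in the slack.
        return c + t * A.T @ (1.0 / (b - A @ x))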

61 Gradient Methods: The barrier function
(Figure: u ↦ I_-(u) is the dotted line; the other curves are -t log(-u) for various t.)

62 Gradient Methods: Illustration
Minimizing c^T x subject to Ax ≤ b.
(Figure: the feasible region, the cost direction c, and the barrier minimizers for t = 1 and t = 5.)

63 Subgradient Methods: Subgradient Descent
Really, the simplest algorithm in the world. Goal:
    minimize_x f(x)
Just iterate
    x_{t+1} = x_t - η_t g_t
where η_t is a stepsize and g_t ∈ ∂f(x_t).
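
The iteration itself, as a hedged NumPy sketch (names and defaults are mine) using the decreasing 1/√t stepsize analyzed later, and tracking the best iterate since subgradient steps need not decrease f monotonically:

    import numpy as np

    def subgradient_descent(f, subgrad, x0, T=1000, eta0=1.0):
        # x_{t+1} = x_t - eta_t * g_t with g_t in the subdifferential of f at x_t.
        x = np.asarray(x0, dtype=float)
        x_best, f_best = x.copy(), f(x)
        for t in range(1, T + 1):
            x = x - (eta0 / np.sqrt(t)) * subgrad(x)
            if f(x) < f_best:
                x_best, f_best = x.copy(), f(x)
        return x_best

    # Example: f(x) = ||x - 1||_1, whose subdifferential contains sign(x - 1).
    print(subgradient_descent(lambda x: np.abs(x - 1.0).sum(),
                              lambda x: np.sign(x - 1.0), np.zeros(3)))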

64 Subgradient Methods: Why subgradient descent?
- Lots of non-differentiable convex functions are used in machine learning:
    f(x) = [1 - a^T x]_+,   f(x) = ‖x‖_1,   f(x) = ∑_{r=1}^k σ_r(X),
  where σ_r is the r-th singular value of X.
- Easy to analyze.
- Do not even need a true subgradient: it is enough that E[g_t] ∈ ∂f(x_t).

65 Subgradient Methods: Proof of convergence for subgradient descent
Idea: bound ‖x_{t+1} - x*‖ using the subgradient inequality. Assume that ‖g_t‖ ≤ G.
    ‖x_{t+1} - x*‖^2 = ‖x_t - ηg_t - x*‖^2
                     = ‖x_t - x*‖^2 - 2η g_t^T (x_t - x*) + η^2 ‖g_t‖^2

66 Subgradient Methods: Proof of convergence for subgradient descent
Idea: bound ‖x_{t+1} - x*‖ using the subgradient inequality. Assume that ‖g_t‖ ≤ G.
    ‖x_{t+1} - x*‖^2 = ‖x_t - ηg_t - x*‖^2 = ‖x_t - x*‖^2 - 2η g_t^T (x_t - x*) + η^2 ‖g_t‖^2
Recall that
    f(x*) ≥ f(x_t) + g_t^T (x* - x_t),   i.e.   -g_t^T (x_t - x*) ≤ f(x*) - f(x_t),
so
    ‖x_{t+1} - x*‖^2 ≤ ‖x_t - x*‖^2 + 2η [f(x*) - f(x_t)] + η^2 G^2.
Then
    f(x_t) - f(x*) ≤ (‖x_t - x*‖^2 - ‖x_{t+1} - x*‖^2) / (2η) + (η/2) G^2.

67 Subgradient Methods: Almost done...
Sum from t = 1 to T:
    ∑_{t=1}^T [f(x_t) - f(x*)] ≤ (1/2η) ∑_{t=1}^T [‖x_t - x*‖^2 - ‖x_{t+1} - x*‖^2] + (Tη/2) G^2
                               = (1/2η) ‖x_1 - x*‖^2 - (1/2η) ‖x_{T+1} - x*‖^2 + (Tη/2) G^2

68 Subgradient Methods: Almost done...
Sum from t = 1 to T:
    ∑_{t=1}^T [f(x_t) - f(x*)] ≤ (1/2η) ∑_{t=1}^T [‖x_t - x*‖^2 - ‖x_{t+1} - x*‖^2] + (Tη/2) G^2
                               = (1/2η) ‖x_1 - x*‖^2 - (1/2η) ‖x_{T+1} - x*‖^2 + (Tη/2) G^2
Now let D = ‖x_1 - x*‖, and keep track of the minimum along the run:
    f(x_best) - f(x*) ≤ D^2/(2ηT) + (η/2) G^2.
Set η = D/(G√T), and
    f(x_best) - f(x*) ≤ DG/√T.

69 Subgradient Methods: Extension, projected subgradient descent
Now we have a convex constraint set X. Goal:
    minimize_{x ∈ X} f(x)
Idea: do subgradient steps, and project x_t back onto X at every iteration:
    x_{t+1} = Π_X(x_t - η g_t).
The proof carries over because ‖x* - Π_X(x_t)‖ ≤ ‖x* - x_t‖ for x* ∈ X.
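
A sketch of the projected method for the example on the following slides (my own code, not from the slides; the l1-ball projection uses the standard sort-and-threshold construction, and the stepsize constants are arbitrary):

    import numpy as np

    def project_l1_ball(v, radius=1.0):
        # Euclidean projection onto {x : ||x||_1 <= radius}.
        if np.abs(v).sum() <= radius:
            return v.copy()
        u = np.sort(np.abs(v))[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
        theta = (css[rho] - radius) / (rho + 1.0)
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

    def projected_subgradient(subgrad, project, x0, T=500, eta0=0.5):
        # x_{t+1} = Pi_X(x_t - eta_t g_t): subgradient step, then project back onto X.
        x = project(np.asarray(x0, dtype=float))
        for t in range(1, T + 1):
            x = project(x - (eta0 / np.sqrt(t)) * subgrad(x))
        return x

    # The example on the following slides: minimize 1/2 ||Ax - b||^2 subject to ||x||_1 <= 1.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(10, 4)); b = rng.normal(size=10)
    x = projected_subgradient(lambda x: A.T @ (A @ x - b), project_l1_ball, np.zeros(4))
    print(x, np.abs(x).sum())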

70-77 Subgradient Methods: Projected subgradient example
    minimize_x (1/2)‖Ax - b‖^2   s.t. ‖x‖_1 ≤ 1
(Slides 70 through 77 show successive projected subgradient iterates on this problem.)

78 Subgradient Methods: Convergence results for (projected) subgradient methods
Any decreasing, non-summable stepsize (η_t → 0, ∑_{t=1}^∞ η_t = ∞) gives
    f(x_avg(t)) - f(x*) → 0.

79 Subgradient Methods: Convergence results for (projected) subgradient methods
Any decreasing, non-summable stepsize (η_t → 0, ∑_{t=1}^∞ η_t = ∞) gives
    f(x_avg(t)) - f(x*) → 0.
A slightly less brain-dead analysis than the earlier one shows that with η_t ∝ 1/√t,
    f(x_avg(t)) - f(x*) ≤ C/√t.

80 Subgradient Methods: Convergence results for (projected) subgradient methods
Any decreasing, non-summable stepsize (η_t → 0, ∑_{t=1}^∞ η_t = ∞) gives
    f(x_avg(t)) - f(x*) → 0.
A slightly less brain-dead analysis than the earlier one shows that with η_t ∝ 1/√t,
    f(x_avg(t)) - f(x*) ≤ C/√t.
The same convergence holds when g_t is random, i.e. E[g_t] ∈ ∂f(x_t). Example:
    f(w) = (1/2)‖w‖^2 + C ∑_{i=1}^n [1 - y_i x_i^T w]_+
Just pick a random training example.
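
A hedged sketch of that stochastic variant for this SVM objective (my own code and defaults; sampling one example per step and scaling its hinge term by n keeps the expected update in the subdifferential of the full objective):

    import numpy as np

    def stochastic_subgradient_svm(X, y, C=1.0, T=10000, eta0=0.1, seed=0):
        # f(w) = 1/2 ||w||^2 + C * sum_i [1 - y_i x_i^T w]_+ .
        # Each step: g = w + C * n * (per-example hinge subgradient), so E[g] is a
        # subgradient of f at w; then take a decreasing 1/sqrt(t) step.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for t in range(1, T + 1):
            i = rng.integers(n)
            g = w.copy()
            if 1 - y[i] * (X[i] @ w) > 0:
                g -= C * n * y[i] * X[i]
            w -= (eta0 / np.sqrt(t)) * g
        return w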

81 Take Home Messages: Recap
- Defined convex sets and functions
- Saw why we want optimization problems to be convex (solvable)
- Sketched some of Lagrange duality
- First order methods are easy and (often) work well

82 Take Home Messages
- Many useful problems can be formulated as convex optimization problems
- If it is not convex and not an eigenvalue problem, you are out of luck
- If it is convex, you are golden
