# Introduction to Convex Optimization for Machine Learning


John Duchi, University of California, Berkeley
Practical Machine Learning, Fall 2009
Duchi (UC Berkeley), Convex Optimization for Machine Learning, Fall 2009, 53 slides

## Outline

- What is Optimization
- Convex Sets
- Convex Functions
- Convex Optimization Problems
- Lagrange Duality
- Optimization Algorithms
- Take Home Messages

## What is Optimization (and why do we care?)

Optimization means finding the minimizer of a function subject to constraints:

$$
\begin{aligned}
\underset{x}{\text{minimize}} \quad & f_0(x) \\
\text{s.t.} \quad & f_i(x) \leq 0, \quad i \in \{1, \ldots, k\} \\
& h_j(x) = 0, \quad j \in \{1, \ldots, l\}
\end{aligned}
$$

Example: the stock market. Minimize the variance of the return subject to getting a return of at least \$50.

Why do we care? Optimization is at the heart of many (most practical?) machine learning algorithms.

Linear regression:
$$\underset{w}{\text{minimize}} \;\; \|Xw - y\|^2$$

Classification (logistic regression or SVM):
$$\underset{w}{\text{minimize}} \;\; \sum_{i=1}^n \log\left(1 + \exp(-y_i x_i^T w)\right)$$
or
$$\underset{w, \xi}{\text{minimize}} \;\; \|w\|^2 + C \sum_{i=1}^n \xi_i \quad \text{s.t.} \;\; \xi_i \geq 1 - y_i x_i^T w, \;\; \xi_i \geq 0.$$

We still care:

Maximum likelihood estimation:
$$\underset{\theta}{\text{maximize}} \;\; \sum_{i=1}^n \log p_\theta(x_i)$$

Collaborative filtering:
$$\underset{w}{\text{minimize}} \;\; \sum_{i \prec j} \log\left(1 + \exp(w^T x_i - w^T x_j)\right)$$

k-means:
$$\underset{\mu_1, \ldots, \mu_k}{\text{minimize}} \;\; J(\mu) = \sum_{j=1}^k \sum_{i \in C_j} \|x_i - \mu_j\|^2$$

And more: graphical models, feature selection, active learning, control.

But generally speaking, we're screwed:

- Local (non-global) minima of $f_0$
- All kinds of constraints (even restricting to continuous functions), e.g. $h(x) = \sin(2\pi x) = 0$

So: go for convex problems!

## Convex Sets

Definition. A set $C \subseteq \mathbb{R}^n$ is convex if for $x, y \in C$ and any $\alpha \in [0, 1]$,
$$\alpha x + (1 - \alpha) y \in C.$$

Examples:

- All of $\mathbb{R}^n$ (obvious).
- The non-negative orthant $\mathbb{R}^n_+$: let $x \geq 0$, $y \geq 0$; clearly $\alpha x + (1 - \alpha) y \geq 0$.
- Norm balls: let $\|x\| \leq 1$, $\|y\| \leq 1$; then
$$\|\alpha x + (1 - \alpha) y\| \leq \|\alpha x\| + \|(1 - \alpha) y\| = \alpha \|x\| + (1 - \alpha) \|y\| \leq 1.$$

- Affine subspaces: if $Ax = b$ and $Ay = b$, then
$$A(\alpha x + (1 - \alpha) y) = \alpha A x + (1 - \alpha) A y = \alpha b + (1 - \alpha) b = b.$$

- Arbitrary intersections of convex sets: let $C_i$ be convex for $i \in I$ and $C = \bigcap_i C_i$. Then for $x \in C$, $y \in C$,
$$\alpha x + (1 - \alpha) y \in C_i \;\; \text{for all } i \in I,$$
so $\alpha x + (1 - \alpha) y \in C$.

- PSD matrices, a.k.a. the positive semidefinite cone $S^n_+ \subseteq \mathbb{R}^{n \times n}$. $A \in S^n_+$ means $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$. For $A, B \in S^n_+$,
$$x^T \left( \alpha A + (1 - \alpha) B \right) x = \alpha x^T A x + (1 - \alpha) x^T B x \geq 0.$$
For example,
$$S^2_+ = \left\{ \begin{bmatrix} x & z \\ z & y \end{bmatrix} : x \geq 0, \; y \geq 0, \; xy \geq z^2 \right\}.$$

## Convex Functions

Definition. A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for $x, y \in \operatorname{dom} f$ and any $\alpha \in [0, 1]$,
$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y).$$
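The defining inequality can be checked numerically on sample points. A minimal sketch (not from the slides), using the hypothetical test functions $f(x) = x^2$ (convex) and $f(x) = x^3$ (not convex on an interval containing negatives):

```python
import random

# Sample the convexity inequality
#   f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
# at random points; a single violation certifies non-convexity.
def is_convex_on_samples(f, trials=1000, lo=-10.0, hi=10.0, tol=1e-9):
    random.seed(0)
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        a = random.random()
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + tol:
            return False
    return True

print(is_convex_on_samples(lambda x: x**2))  # quadratic: no violation found
print(is_convex_on_samples(lambda x: x**3))  # cubic: violated for negative x
```

Passing the sampled test is of course only evidence, not a proof, of convexity; failing it is a proof of non-convexity.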

First-order convexity conditions. Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable. Then $f$ is convex if and only if for all $x, y \in \operatorname{dom} f$,
$$f(y) \geq f(x) + \nabla f(x)^T (y - x).$$

Actually, this is more general than differentiability. Definition: the subgradient set, or subdifferential set, $\partial f(x)$ of $f$ at $x$ is
$$\partial f(x) = \left\{ g : f(y) \geq f(x) + g^T (y - x) \text{ for all } y \right\}.$$
Theorem: $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if it has a non-empty subdifferential set everywhere.

Second-order convexity conditions. Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is twice differentiable. Then $f$ is convex if and only if for all $x \in \operatorname{dom} f$,
$$\nabla^2 f(x) \succeq 0.$$

Convex sets and convex functions. Definition: the epigraph of a function $f$ is the set of points
$$\operatorname{epi} f = \{(x, t) : f(x) \leq t\}.$$
$\operatorname{epi} f$ is convex if and only if $f$ is convex. Sublevel sets $\{x : f(x) \leq a\}$ are convex for convex $f$.

Examples:

- Linear/affine functions: $f(x) = b^T x + c$.
- Quadratic functions: $f(x) = \frac{1}{2} x^T A x + b^T x + c$ for $A \succeq 0$. For regression:
$$\frac{1}{2} \|Xw - y\|^2 = \frac{1}{2} w^T X^T X w - y^T X w + \frac{1}{2} y^T y.$$

More examples:

- Norms (like $\ell_1$ or $\ell_2$ for regularization):
$$\|\alpha x + (1 - \alpha) y\| \leq \|\alpha x\| + \|(1 - \alpha) y\| = \alpha \|x\| + (1 - \alpha) \|y\|.$$
- Composition with an affine function, $f(Ax + b)$:
$$f(A(\alpha x + (1 - \alpha) y) + b) = f(\alpha (Ax + b) + (1 - \alpha)(Ay + b)) \leq \alpha f(Ax + b) + (1 - \alpha) f(Ay + b)$$
- Log-sum-exp (via $\nabla^2 f(x)$ PSD):
$$f(x) = \log\left( \sum_{i=1}^n \exp(x_i) \right)$$

Important examples in machine learning:

- SVM (hinge) loss: $f(w) = \left[ 1 - y_i x_i^T w \right]_+$
- Binary logistic loss: $f(w) = \log\left( 1 + \exp(-y_i x_i^T w) \right)$

(Figure: the scalar losses $[1 - x]_+$ and $\log(1 + e^{-x})$.)

## Convex Optimization Problems

Definition. An optimization problem is convex if its objective $f_0$ is a convex function, the inequality constraint functions $f_i$ are convex, and the equality constraint functions $h_j$ are affine:

$$
\begin{aligned}
\underset{x}{\text{minimize}} \quad & f_0(x) && \text{(convex function)} \\
\text{s.t.} \quad & f_i(x) \leq 0 && \text{(convex sets)} \\
& h_j(x) = 0 && \text{(affine)}
\end{aligned}
$$

It's nice to be convex. Theorem: if $\hat{x}$ is a local minimizer of a convex optimization problem, it is a global minimizer.

Even more reasons to be convex. Theorem: $\nabla f(x) = 0$ if and only if $x$ is a global minimizer of $f$.

Proof sketch: if $\nabla f(x) = 0$, then for any $y$, $f(y) \geq f(x) + \nabla f(x)^T (y - x) = f(x)$. If $\nabla f(x) \neq 0$, there is a direction of descent, so $x$ is not a minimizer.


## Lagrange Duality

Goals of Lagrange duality:

- Get a certificate of optimality for a problem
- Remove constraints
- Reformulate the problem

Constructing the dual. Start with the optimization problem:

$$
\begin{aligned}
\underset{x}{\text{minimize}} \quad & f_0(x) \\
\text{s.t.} \quad & f_i(x) \leq 0, \quad i \in \{1, \ldots, k\} \\
& h_j(x) = 0, \quad j \in \{1, \ldots, l\}
\end{aligned}
$$

Form the Lagrangian using Lagrange multipliers $\lambda_i \geq 0$, $\nu_j \in \mathbb{R}$:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{j=1}^l \nu_j h_j(x)$$

Form the dual function:
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) = \inf_x \left[ f_0(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{j=1}^l \nu_j h_j(x) \right]$$

Remarks. The original problem is equivalent to
$$\underset{x}{\text{minimize}} \;\; \sup_{\lambda \geq 0, \nu} L(x, \lambda, \nu).$$
The dual problem switches the min and the max:
$$\underset{\lambda \geq 0, \nu}{\text{maximize}} \;\; \inf_x L(x, \lambda, \nu).$$

One great property of the dual. Lemma (weak duality): if $\lambda \geq 0$, then $g(\lambda, \nu) \leq f_0(x^\star)$.

Proof: we have
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) \leq L(x^\star, \lambda, \nu) = f_0(x^\star) + \sum_{i=1}^k \lambda_i f_i(x^\star) + \sum_{j=1}^l \nu_j h_j(x^\star) \leq f_0(x^\star).$$

The greatest property of the dual. Theorem: for reasonable¹ convex problems,
$$\sup_{\lambda \geq 0, \nu} g(\lambda, \nu) = f_0(x^\star).$$

¹ There are conditions, called constraint qualifications, under which this holds.

Geometric look: minimize $\frac{1}{2}(x - 1)^2$ subject to $x^2 \leq c$. (Figures: the true function in blue, the constraint in green, and $L(x, \lambda)$ for different $\lambda$ dotted; the dual function $g(\lambda)$ in black, with the primal optimal value as a dotted blue line.)

Intuition. We can interpret duality as a linear approximation. Define $I_-(a) = \infty$ if $a > 0$ and $0$ otherwise, and $I_0(a) = \infty$ unless $a = 0$. Rewrite the problem as
$$\underset{x}{\text{minimize}} \;\; f_0(x) + \sum_{i=1}^k I_-(f_i(x)) + \sum_{j=1}^l I_0(h_j(x))$$

Replace $I_-(f_i(x))$ with $\lambda_i f_i(x)$: a measure of displeasure when $\lambda_i \geq 0$ and $f_i(x) > 0$. Similarly, $\nu_j h_j(x)$ lower bounds $I_0(h_j(x))$:
$$\underset{x}{\text{minimize}} \;\; f_0(x) + \sum_{i=1}^k \lambda_i f_i(x) + \sum_{j=1}^l \nu_j h_j(x)$$

Example: linearly constrained least squares.
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2} \|Ax - b\|^2 \quad \text{s.t.} \;\; Bx = d.$$

Form the Lagrangian:
$$L(x, \nu) = \frac{1}{2} \|Ax - b\|^2 + \nu^T (Bx - d)$$

Take the infimum:
$$\nabla_x L(x, \nu) = A^T A x - A^T b + B^T \nu, \qquad x^\star = (A^T A)^{-1} (A^T b - B^T \nu)$$

$$\inf_x L(x, \nu) = \frac{1}{2} \left\| A (A^T A)^{-1} (A^T b - B^T \nu) - b \right\|^2 + \nu^T B (A^T A)^{-1} (A^T b - B^T \nu) - d^T \nu$$

A simple unconstrained quadratic problem!
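This derivation can be checked numerically: the minimizer of the Lagrangian is $x^\star(\nu) = (A^T A)^{-1}(A^T b - B^T \nu)$, and the optimal $\nu$ is the one making $x^\star(\nu)$ primal feasible. A sketch with hypothetical random data (not from the slides):

```python
import numpy as np

# minimize (1/2)||Ax - b||^2  s.t.  Bx = d, solved via the dual:
# enforce B x*(nu) = d where x*(nu) = (A^T A)^{-1}(A^T b - B^T nu).
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))
b = rng.normal(size=10)
B = rng.normal(size=(2, 4))
d = rng.normal(size=2)

AtA_inv = np.linalg.inv(A.T @ A)
# Feasibility B x*(nu) = d is a linear system in nu:
M = B @ AtA_inv @ B.T
nu = np.linalg.solve(M, B @ AtA_inv @ (A.T @ b) - d)
x = AtA_inv @ (A.T @ b - B.T @ nu)

print(np.allclose(B @ x, d))  # primal feasibility holds
```

Feasibility of $x^\star(\nu)$ plus stationarity of the Lagrangian gives the KKT conditions, so this $x$ solves the constrained problem.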

Example: quadratically constrained least squares.
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2} \|Ax - b\|^2 \quad \text{s.t.} \;\; \frac{1}{2} \|x\|^2 \leq c.$$

Form the Lagrangian ($\lambda \geq 0$):
$$L(x, \lambda) = \frac{1}{2} \|Ax - b\|^2 + \frac{\lambda}{2} \left( \|x\|^2 - 2c \right)$$

Take the infimum:
$$\nabla_x L(x, \lambda) = A^T A x - A^T b + \lambda x, \qquad x^\star = (A^T A + \lambda I)^{-1} A^T b$$

$$\inf_x L(x, \lambda) = \frac{1}{2} \left\| A (A^T A + \lambda I)^{-1} A^T b - b \right\|^2 + \frac{\lambda}{2} \left\| (A^T A + \lambda I)^{-1} A^T b \right\|^2 - \lambda c$$

A one-variable dual problem!
$$g(\lambda) = -\frac{1}{2} b^T A (A^T A + \lambda I)^{-1} A^T b - \lambda c + \frac{1}{2} \|b\|^2.$$
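The one-variable dual is easy to evaluate, and weak duality is easy to observe: $g(\lambda)$ lower bounds $f_0$ at any feasible point. A sketch with hypothetical random data (not from the slides):

```python
import numpy as np

# g(lam) = -(1/2) b^T A (A^T A + lam I)^{-1} A^T b - lam*c + (1/2)||b||^2
# for: minimize (1/2)||Ax - b||^2  s.t.  (1/2)||x||^2 <= c.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
c = 0.5

def g(lam):
    M = np.linalg.inv(A.T @ A + lam * np.eye(3))
    return -0.5 * b @ A @ M @ A.T @ b - lam * c + 0.5 * b @ b

def f0(x):
    return 0.5 * np.sum((A @ x - b) ** 2)

# Weak duality check: x = 0 is feasible ((1/2)||0||^2 <= c),
# so g(lam) <= f0(0) for every lam >= 0.
x_feas = np.zeros(3)
print(all(g(lam) <= f0(x_feas) + 1e-9 for lam in [0.0, 0.1, 1.0, 10.0]))
```

Maximizing $g$ over the single scalar $\lambda \geq 0$ (e.g. by bisection on $g'(\lambda)$) then recovers the constrained solution via $x^\star = (A^T A + \lambda^\star I)^{-1} A^T b$.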

Uses of the dual. The main use is as a certificate of optimality (a.k.a. the duality gap). If we have a feasible $x$ and know the dual value $g(\lambda, \nu)$, then
$$g(\lambda, \nu) \leq f_0(x^\star) \leq f_0(x) \quad \Rightarrow \quad f_0(x) - f_0(x^\star) \leq f_0(x) - g(\lambda, \nu).$$
The dual is also used in more advanced primal-dual algorithms (we won't talk about these).

## Optimization Algorithms

### Gradient Methods

Gradient descent: the simplest algorithm in the world (almost). Goal:
$$\underset{x}{\text{minimize}} \;\; f(x)$$
Just iterate
$$x_{t+1} = x_t - \eta_t \nabla f(x_t)$$
where $\eta_t$ is a stepsize.
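The iteration above is a few lines of code. A minimal sketch on the hypothetical quadratic $f(x) = \frac{1}{2}\|x - \text{target}\|^2$, whose gradient is $x - \text{target}$ (this example is not from the slides):

```python
# Plain gradient descent with a constant stepsize eta.
def gradient_descent(grad, x0, eta=0.1, steps=200):
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # x_{t+1} = x_t - eta * grad f(x_t)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

target = [3.0, -1.0]
x_min = gradient_descent(lambda x: [xi - ti for xi, ti in zip(x, target)],
                         x0=[0.0, 0.0])
print([round(v, 4) for v in x_min])  # → [3.0, -1.0]
```

On this strongly convex quadratic the error contracts by a factor $1 - \eta$ per step, an instance of the linear convergence discussed later.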

(Single-step illustration: at the point $(x_t, f(x_t))$, the step follows the linear approximation $f(x_t) - \eta \nabla f(x_t)^T (x - x_t)$.)

Full gradient descent on
$$f(x) = \log\left( e^{x_1 + 3x_2 - 0.1} + e^{x_1 - 3x_2 - 0.1} + e^{-x_1 - 0.1} \right)$$

Stepsize selection: how do I choose a stepsize?

Idea 1: exact line search,
$$\eta_t = \operatorname*{argmin}_{\eta} f\left( x - \eta \nabla f(x) \right).$$
Too expensive to be practical.

Idea 2: backtracking (Armijo) line search. Let $\alpha \in (0, \frac{1}{2})$, $\beta \in (0, 1)$. Multiply $\eta \leftarrow \beta \eta$ until
$$f\left( x - \eta \nabla f(x) \right) \leq f(x) - \alpha \eta \|\nabla f(x)\|^2.$$
Works well in practice.
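The backtracking loop above can be sketched in one dimension, where $\|\nabla f(x)\|^2 = f'(x)^2$. A hypothetical example (not from the slides) on $f(x) = x^4$ with $f'(x) = 4x^3$:

```python
# One backtracking (Armijo) gradient step in 1-D.
def backtracking_step(f, grad, x, eta0=1.0, alpha=0.3, beta=0.5):
    g = grad(x)
    eta = eta0
    # Shrink eta until the sufficient-decrease condition holds:
    # f(x - eta*g) <= f(x) - alpha * eta * g^2
    while f(x - eta * g) > f(x) - alpha * eta * g * g:
        eta *= beta
    return x - eta * g, eta

f = lambda x: x**4
x_new, eta = backtracking_step(f, lambda x: 4 * x**3, x=1.0)
print(x_new, eta)  # → 0.5 0.125
```

Starting from $\eta_0 = 1$, the loop rejects $\eta \in \{1, 0.5, 0.25\}$ (each overshoots the sufficient-decrease line) and accepts $\eta = 0.125$, which is the behavior the illustration on the next slide depicts.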

(Illustration of Armijo/backtracking line search: plotting $f(x - \eta \nabla f(x))$ and the line $f(x) - \alpha \eta \|\nabla f(x)\|^2$ as functions of the stepsize $\eta$, there is clearly a region where the curve lies below the line.)

Newton's method. Idea: use a second-order approximation to the function:
$$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x) \Delta x$$
Choose $\Delta x$ to minimize the approximation:
$$\Delta x = -\left[ \nabla^2 f(x) \right]^{-1} \nabla f(x)$$
This is a descent direction:
$$\nabla f(x)^T \Delta x = -\nabla f(x)^T \left[ \nabla^2 f(x) \right]^{-1} \nabla f(x) < 0.$$
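In one dimension the Newton step reduces to $\Delta x = -f'(x)/f''(x)$. A sketch on the hypothetical strongly convex function $f(x) = x^2 + e^x$ (not from the slides):

```python
import math

# 1-D Newton's method for minimization: x <- x - f'(x)/f''(x).
def newton_minimize(grad, hess, x0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

# f(x) = x^2 + exp(x): f'(x) = 2x + exp(x), f''(x) = 2 + exp(x) > 0.
x_star = newton_minimize(lambda x: 2 * x + math.exp(x),
                         lambda x: 2 + math.exp(x), x0=0.0)
print(abs(2 * x_star + math.exp(x_star)) < 1e-10)  # stationarity reached
```

A handful of iterations drive the gradient to machine precision, reflecting Newton's local quadratic convergence.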

(Newton step picture: $\hat{f}$ is the second-order approximation at $(x, f(x))$, $f$ is the true function; the step moves to $(x + \Delta x, f(x + \Delta x))$, the minimizer of $\hat{f}$.)

Convergence of gradient descent and Newton's method:

- Strongly convex case ($\nabla^2 f(x) \succeq mI$): linear convergence. For some $\gamma \in (0, 1)$,
$$f(x_t) - f(x^\star) \leq \gamma^t, \qquad \text{so} \qquad t \geq \frac{1}{1 - \gamma} \log \frac{1}{\varepsilon} \;\Rightarrow\; f(x_t) - f(x^\star) \leq \varepsilon.$$
- Smooth case ($\|\nabla f(x) - \nabla f(y)\| \leq C \|x - y\|$): sublinear convergence,
$$f(x_t) - f(x^\star) \leq \frac{K}{t}.$$
- Newton's method is often faster, especially when $f$ has long valleys.

What about constraints? Linear constraints $Ax = b$ are easy. For example, in Newton's method (assuming $Ax = b$ already holds):
$$\underset{\Delta x}{\text{minimize}} \;\; \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x) \Delta x \quad \text{s.t.} \;\; A \Delta x = 0.$$
The solution $\Delta x$ satisfies $A(x + \Delta x) = Ax + A \Delta x = b$.

Inequality constraints are a bit tougher:
$$\underset{\Delta x}{\text{minimize}} \;\; \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x) \Delta x \quad \text{s.t.} \;\; f_i(x + \Delta x) \leq 0$$
is just as hard as the original problem.

Logarithmic barrier methods. Goal: convert
$$\underset{x}{\text{minimize}} \;\; f_0(x) \quad \text{s.t.} \;\; f_i(x) \leq 0, \;\; i \in \{1, \ldots, k\}$$
to
$$\underset{x}{\text{minimize}} \;\; f_0(x) + \sum_{i=1}^k I_-(f_i(x))$$
Approximate $I_-(u) \approx -t \log(-u)$ for small $t$:
$$\underset{x}{\text{minimize}} \;\; f_0(x) - t \sum_{i=1}^k \log\left( -f_i(x) \right)$$
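A tiny sketch of this idea on a hypothetical 1-D problem (not from the slides): minimize $f_0(x) = x$ subject to $x \geq 1$, i.e. $f_1(x) = 1 - x \leq 0$. The barrier objective is $\phi_t(x) = x - t \log(x - 1)$, whose exact minimizer is $x = 1 + t$, approaching the true solution $x^\star = 1$ as $t \to 0$:

```python
# Minimize the barrier objective phi_t(x) = x - t*log(x - 1)
# by gradient descent, staying strictly inside the feasible region.
def minimize_barrier(t, x=2.0, eta=0.01, steps=5000):
    for _ in range(steps):
        g = 1.0 - t / (x - 1.0)   # d/dx [x - t*log(x - 1)]
        x = x - eta * g
        x = max(x, 1.0 + 1e-12)   # keep x strictly feasible
    return x

for t in [1.0, 0.1, 0.01]:
    print(round(minimize_barrier(t), 3))  # 2.0, then 1.1, then 1.01
```

In practice one solves a sequence of such problems with decreasing $t$, warm-starting each from the previous solution (the path of minimizers is the central path).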

(The barrier function: $I_-(u)$ is the dotted line; the other curves are $-t \log(-u)$ for various $t$.)

(Illustration: minimizing $c^T x$ subject to $Ax \leq b$, with barrier solutions shown for $t = 1$ and $t = 5$.)

### Subgradient Methods

Subgradient descent: really, the simplest algorithm in the world. Goal:
$$\underset{x}{\text{minimize}} \;\; f(x)$$
Just iterate
$$x_{t+1} = x_t - \eta_t g_t$$
where $\eta_t$ is a stepsize and $g_t \in \partial f(x_t)$.
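The iteration looks just like gradient descent, but with any subgradient in place of the gradient. A sketch on the hypothetical non-differentiable function $f(x) = |x - 3|$ (not from the slides), whose subgradient is $\operatorname{sign}(x - 3)$ away from the kink:

```python
# Subgradient descent on f(x) = |x - 3| with stepsize eta_t = 1/sqrt(t).
# Since subgradient methods are not descent methods, we track the
# best iterate seen so far.
def subgradient_descent(x0, steps=10000):
    x = x0
    best = x0
    for t in range(1, steps + 1):
        g = 0.0 if x == 3.0 else (1.0 if x > 3.0 else -1.0)
        x = x - (1.0 / t**0.5) * g
        if abs(x - 3.0) < abs(best - 3.0):
            best = x
    return best

print(abs(subgradient_descent(0.0) - 3.0) < 0.05)
```

The iterates overshoot and oscillate around the minimizer with amplitude on the order of the current stepsize, which is why tracking the best (or averaged) iterate matters in the analysis below.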

Why subgradient descent? Lots of non-differentiable convex functions are used in machine learning:
$$f(x) = \left[ 1 - a^T x \right]_+, \qquad f(x) = \|x\|_1, \qquad f(x) = \sum_{r=1}^k \sigma_r(X),$$
where $\sigma_r(X)$ is the $r$th singular value of $X$. Subgradient descent is easy to analyze, and we do not even need a true subgradient: it is enough that $\mathbb{E}\, g_t \in \partial f(x_t)$.

Proof of convergence for subgradient descent. Idea: bound $\|x_{t+1} - x^\star\|$ using the subgradient inequality. Assume $\|g_t\| \leq G$. Then
$$\|x_{t+1} - x^\star\|^2 = \|x_t - \eta g_t - x^\star\|^2 = \|x_t - x^\star\|^2 - 2 \eta g_t^T (x_t - x^\star) + \eta^2 \|g_t\|^2$$
Recall that
$$f(x^\star) \geq f(x_t) + g_t^T (x^\star - x_t), \quad \text{i.e.} \quad -g_t^T (x_t - x^\star) \leq f(x^\star) - f(x_t),$$
so
$$\|x_{t+1} - x^\star\|^2 \leq \|x_t - x^\star\|^2 + 2 \eta \left[ f(x^\star) - f(x_t) \right] + \eta^2 G^2.$$
Rearranging,
$$f(x_t) - f(x^\star) \leq \frac{\|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2}{2\eta} + \frac{\eta}{2} G^2.$$

Almost done: sum from $t = 1$ to $T$:
$$\sum_{t=1}^T \left[ f(x_t) - f(x^\star) \right] \leq \frac{1}{2\eta} \sum_{t=1}^T \left[ \|x_t - x^\star\|^2 - \|x_{t+1} - x^\star\|^2 \right] + \frac{T\eta}{2} G^2 = \frac{1}{2\eta} \|x_1 - x^\star\|^2 - \frac{1}{2\eta} \|x_{T+1} - x^\star\|^2 + \frac{T\eta}{2} G^2$$
Now let $D = \|x_1 - x^\star\|$, and keep track of the minimum along the run:
$$f(x_{\text{best}}) - f(x^\star) \leq \frac{D^2}{2 \eta T} + \frac{\eta}{2} G^2.$$
Setting $\eta = \frac{D}{G \sqrt{T}}$ gives
$$f(x_{\text{best}}) - f(x^\star) \leq \frac{DG}{\sqrt{T}}.$$

Extension: projected subgradient descent. Now we have a convex constraint set $X$. Goal:
$$\underset{x \in X}{\text{minimize}} \;\; f(x)$$
Idea: do subgradient steps, projecting $x_t$ back onto $X$ at every iteration:
$$x_{t+1} = \Pi_X (x_t - \eta g_t)$$
The proof still goes through because projection onto a convex set moves us no farther from any point of the set: $\|x^\star - \Pi_X(x_t)\| \leq \|x^\star - x_t\|$ for $x^\star \in X$.
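When $X$ is simple, the projection is cheap. A sketch on a hypothetical problem (not from the slides) where $X = [-1, 1]$ and the projection is just clipping: minimize $|x - 3|$ subject to $x \in [-1, 1]$, whose solution is $x^\star = 1$:

```python
# Projected subgradient descent: step, then project back onto X.
def project(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))  # projection onto the interval [lo, hi]

def projected_subgradient(x0, steps=2000):
    x = x0
    for t in range(1, steps + 1):
        g = 1.0 if x > 3.0 else -1.0          # subgradient of |x - 3|
        x = project(x - (1.0 / t**0.5) * g)   # x_{t+1} = Pi_X(x_t - eta*g)
    return x

print(projected_subgradient(-1.0))  # → 1.0
```

The slides' own example projects onto the $\ell_1$ ball $\{\|x\|_1 \leq 1\}$, which requires a slightly more involved (sorting-based) projection, but the algorithm is otherwise identical.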

Projected subgradient example:
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2} \|Ax - b\|^2 \quad \text{s.t.} \;\; \|x\|_1 \leq 1$$
(The slides step through successive iterations of the method on this problem.)

Convergence results for (projected) subgradient methods. Any decreasing, non-summable stepsize sequence ($\eta_t \to 0$, $\sum_{t=1}^\infty \eta_t = \infty$) gives
$$f\left( x_{\text{avg}(t)} \right) - f(x^\star) \to 0.$$
A slightly less brain-dead analysis than the earlier one shows that with $\eta_t \propto 1/\sqrt{t}$,
$$f\left( x_{\text{avg}(t)} \right) - f(x^\star) \leq \frac{C}{\sqrt{t}}.$$
The same convergence holds when $g_t$ is random, i.e. $\mathbb{E}\, g_t \in \partial f(x_t)$. Example:
$$f(w) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \left[ 1 - y_i x_i^T w \right]_+$$
Just pick a random training example at each step.

## Take Home Messages

Recap:

- Defined convex sets and functions
- Saw why we want optimization problems to be convex (solvable)
- Sketched some of Lagrange duality
- First-order methods are easy and (often) work well

- Many useful problems can be formulated as convex optimization problems
- If it is not convex and not an eigenvalue problem, you are out of luck
- If it is convex, you are golden


### 3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions

3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions

### Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

### Inner Product Spaces

Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

### By W.E. Diewert. July, Linear programming problems are important for a number of reasons:

APPLIED ECONOMICS By W.E. Diewert. July, 3. Chapter : Linear Programming. Introduction The theory of linear programming provides a good introduction to the study of constrained maximization (and minimization)

### Lecture 2: August 29. Linear Programming (part I)

10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.

### ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization

ME128 Computer-ided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### Some representability and duality results for convex mixed-integer programs.

Some representability and duality results for convex mixed-integer programs. Santanu S. Dey Joint work with Diego Morán and Juan Pablo Vielma December 17, 2012. Introduction About Motivation Mixed integer

### Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur

Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:

### Derivative Free Optimization

Department of Mathematics Derivative Free Optimization M.J.D. Powell LiTH-MAT-R--2014/02--SE Department of Mathematics Linköping University S-581 83 Linköping, Sweden. Three lectures 1 on Derivative Free

### 1 Polyhedra and Linear Programming

CS 598CSC: Combinatorial Optimization Lecture date: January 21, 2009 Instructor: Chandra Chekuri Scribe: Sungjin Im 1 Polyhedra and Linear Programming In this lecture, we will cover some basic material

### Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

### 1 Sufficient statistics

1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

### Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

### Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach

MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical

### Diagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions

Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential

### Definition of a Linear Program

Definition of a Linear Program Definition: A function f(x 1, x,..., x n ) of x 1, x,..., x n is a linear function if and only if for some set of constants c 1, c,..., c n, f(x 1, x,..., x n ) = c 1 x 1

### Dual Methods for Total Variation-Based Image Restoration

Dual Methods for Total Variation-Based Image Restoration Jamylle Carter Institute for Mathematics and its Applications University of Minnesota, Twin Cities Ph.D. (Mathematics), UCLA, 2001 Advisor: Tony

### Notes on Symmetric Matrices

CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

Elementary Gradient-Based Parameter Estimation Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA Department of Music, Stanford University, Stanford, California 94305 USA Abstract

The Hadamard Product Elizabeth Million April 12, 2007 1 Introduction and Basic Results As inexperienced mathematicians we may have once thought that the natural definition for matrix multiplication would

### Lecture 7 - Linear Programming

COMPSCI 530: Design and Analysis of Algorithms DATE: 09/17/2013 Lecturer: Debmalya Panigrahi Lecture 7 - Linear Programming Scribe: Hieu Bui 1 Overview In this lecture, we cover basics of linear programming,

### Support Vector Machines Explained

March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

### SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the

### CONSTRAINED NONLINEAR PROGRAMMING

149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

### CHAPTER 9. Integer Programming

CHAPTER 9 Integer Programming An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values: (9.1) max c T x s t Ax b and x integral

### Convex Optimization. Lieven Vandenberghe University of California, Los Angeles

Convex Optimization Lieven Vandenberghe University of California, Los Angeles Tutorial lectures, Machine Learning Summer School University of Cambridge, September 3-4, 2009 Sources: Boyd & Vandenberghe,

### An Introduction to Machine Learning

An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Chapter 4. Duality. 4.1 A Graphical Example

Chapter 4 Duality Given any linear program, there is another related linear program called the dual. In this chapter, we will develop an understanding of the dual linear program. This understanding translates

### 4.6 Linear Programming duality

4.6 Linear Programming duality To any minimization (maximization) LP we can associate a closely related maximization (minimization) LP. Different spaces and objective functions but in general same optimal

### Absolute Value Programming

Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences

### Metric Spaces. Chapter 1

Chapter 1 Metric Spaces Many of the arguments you have seen in several variable calculus are almost identical to the corresponding arguments in one variable calculus, especially arguments concerning convergence

### Cost Minimization and the Cost Function

Cost Minimization and the Cost Function Juan Manuel Puerta October 5, 2009 So far we focused on profit maximization, we could look at a different problem, that is the cost minimization problem. This is

Algorithms for large-scale convex optimization DTU 2010 3. Proximal gradient method introduction proximal mapping proximal gradient method convergence analysis accelerated proximal gradient method forward-backward

### The equivalence of logistic regression and maximum entropy models

The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.win-vector.com/blog/20/09/the-simplerderivation-of-logistic-regression/

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Lecture 7: Finding Lyapunov Functions 1

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.243j (Fall 2003): DYNAMICS OF NONLINEAR SYSTEMS by A. Megretski Lecture 7: Finding Lyapunov Functions 1

### GenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1

(R) User Manual Environmental Energy Technologies Division Berkeley, CA 94720 http://simulationresearch.lbl.gov Michael Wetter MWetter@lbl.gov February 20, 2009 Notice: This work was supported by the U.S.

### A FIRST COURSE IN OPTIMIZATION THEORY

A FIRST COURSE IN OPTIMIZATION THEORY RANGARAJAN K. SUNDARAM New York University CAMBRIDGE UNIVERSITY PRESS Contents Preface Acknowledgements page xiii xvii 1 Mathematical Preliminaries 1 1.1 Notation

### RANDOM INTERVAL HOMEOMORPHISMS. MICHA L MISIUREWICZ Indiana University Purdue University Indianapolis

RANDOM INTERVAL HOMEOMORPHISMS MICHA L MISIUREWICZ Indiana University Purdue University Indianapolis This is a joint work with Lluís Alsedà Motivation: A talk by Yulij Ilyashenko. Two interval maps, applied

### Høgskolen i Narvik Sivilingeniørutdanningen STE6237 ELEMENTMETODER. Oppgaver

Høgskolen i Narvik Sivilingeniørutdanningen STE637 ELEMENTMETODER Oppgaver Klasse: 4.ID, 4.IT Ekstern Professor: Gregory A. Chechkin e-mail: chechkin@mech.math.msu.su Narvik 6 PART I Task. Consider two-point

### Walrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.

Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian

### Introduction to Machine Learning

Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

### No: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results

### Lecture 5: Conic Optimization: Overview

EE 227A: Conve Optimization and Applications January 31, 2012 Lecture 5: Conic Optimization: Overview Lecturer: Laurent El Ghaoui Reading assignment: Chapter 4 of BV. Sections 3.1-3.6 of WTB. 5.1 Linear

### MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.

MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α

### 1. Let X and Y be normed spaces and let T B(X, Y ).

Uppsala Universitet Matematiska Institutionen Andreas Strömbergsson Prov i matematik Funktionalanalys Kurs: NVP, Frist. 2005-03-14 Skrivtid: 9 11.30 Tillåtna hjälpmedel: Manuella skrivdon, Kreyszigs bok

### Chapter 1. Metric Spaces. Metric Spaces. Examples. Normed linear spaces

Chapter 1. Metric Spaces Metric Spaces MA222 David Preiss d.preiss@warwick.ac.uk Warwick University, Spring 2008/2009 Definitions. A metric on a set M is a function d : M M R such that for all x, y, z

### Online Convex Optimization

E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting

### Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

### Statistical Machine Learning from Data

Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

### Constrained Least Squares

Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,