Introduction to Convex Optimization for Machine Learning
1 Introduction to Convex Optimization for Machine Learning. John Duchi, University of California, Berkeley. Practical Machine Learning, Fall 2009. Duchi (UC Berkeley), Convex Optimization for Machine Learning, Fall 2009, 53 slides.
2 Outline: What is Optimization; Convex Sets; Convex Functions; Convex Optimization Problems; Lagrange Duality; Optimization Algorithms; Take Home Messages.
3 What is Optimization (and why do we care?)
4-5 What is Optimization? Finding the minimizer of a function subject to constraints: minimize_x f_0(x) s.t. f_i(x) ≤ 0, i = 1,...,k, and h_j(x) = 0, j = 1,...,l. Example: the stock market. Minimize variance of return subject to getting at least $50.
6 Why do we care? Optimization is at the heart of many (most practical?) machine learning algorithms. Linear regression: minimize_w ||Xw − y||^2. Classification (logistic regression or SVM): minimize_w sum_{i=1}^n log(1 + exp(−y_i x_i^T w)), or minimize_w ||w||^2 + C sum_{i=1}^n ξ_i s.t. ξ_i ≥ 1 − y_i x_i^T w, ξ_i ≥ 0.
7 We still care... Maximum likelihood estimation: maximize_θ sum_{i=1}^n log p_θ(x_i). Collaborative filtering: minimize_w, summed over preference pairs (i, j), log(1 + exp(w^T x_i − w^T x_j)). k-means: minimize_{μ_1,...,μ_k} J(μ) = sum_{j=1}^k sum_{i ∈ C_j} ||x_i − μ_j||^2. And more (graphical models, feature selection, active learning, control).
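To make these objectives concrete, here is a minimal pure-Python sketch (the toy data and names are invented for illustration, not from the slides) that evaluates the least-squares and logistic objectives above and checks the closed-form one-dimensional least-squares minimizer:

```python
import math

# Toy one-dimensional data (invented for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]          # regression targets
labels = [1.0, 1.0, -1.0]     # classification labels in {-1, +1}

def squared_loss(w):
    """Least-squares objective ||Xw - y||^2 for scalar features."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

def logistic_loss(w):
    """Logistic objective: sum_i log(1 + exp(-y_i * x_i * w))."""
    return sum(math.log(1.0 + math.exp(-l * x * w)) for x, l in zip(xs, labels))

# Closed-form minimizer of the 1-D least-squares problem: w* = <x, y> / <x, x>.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(round(w_star, 3))                                    # -> 2.036
print(squared_loss(w_star) <= squared_loss(w_star + 0.1))  # -> True
```

The logistic objective has no closed-form minimizer; it is exactly the kind of problem the gradient methods later in the deck are for.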
8-9 But generally speaking... we're screwed: local (non-global) minima of f_0, and all kinds of constraints (even restricting to continuous functions), e.g. h(x) = sin(2πx) = 0. So: go for convex problems!
10 Convex Sets
11 Definition. A set C ⊆ R^n is convex if for any x, y ∈ C and any α ∈ [0, 1], αx + (1 − α)y ∈ C.
12-14 Examples. All of R^n (obvious). The nonnegative orthant R^n_+: let x ≥ 0, y ≥ 0; clearly αx + (1 − α)y ≥ 0. Norm balls: let ||x|| ≤ 1, ||y|| ≤ 1; then ||αx + (1 − α)y|| ≤ ||αx|| + ||(1 − α)y|| = α||x|| + (1 − α)||y|| ≤ 1.
15 Examples. Affine subspaces {x : Ax = b}: if Ax = b and Ay = b, then A(αx + (1 − α)y) = αAx + (1 − α)Ay = αb + (1 − α)b = b.
16 More examples. Arbitrary intersections of convex sets: let C_i be convex for i ∈ I and C = ∩_i C_i; then x, y ∈ C implies αx + (1 − α)y ∈ C_i for all i ∈ I, so αx + (1 − α)y ∈ C.
17 More examples. PSD matrices, a.k.a. the positive semidefinite cone S^n_+ ⊆ R^{n×n}. A ∈ S^n_+ means x^T A x ≥ 0 for all x ∈ R^n. For A, B ∈ S^n_+: x^T (αA + (1 − α)B) x = α x^T A x + (1 − α) x^T B x ≥ 0. Pictured: S^2_+ = { [x z; z y] : x ≥ 0, y ≥ 0, xy ≥ z^2 }.
18 Convex Functions
19 Definition. A function f : R^n → R is convex if for x, y ∈ dom f and any α ∈ [0, 1], f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).
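The defining inequality can be spot-checked numerically; a sketch (the choice f(x) = x^4 is an arbitrary convex example, not from the slides):

```python
import random

# Spot-check the convexity inequality
# f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) for f(x) = x^4 (convex).
def f(x):
    return x ** 4

random.seed(0)
ok = True
for _ in range(1000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    a = random.random()
    # small tolerance for floating-point rounding
    ok = ok and f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-9
print(ok)  # -> True
```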
20 First-order convexity conditions. Theorem. Suppose f : R^n → R is differentiable. Then f is convex if and only if for all x, y ∈ dom f, f(y) ≥ f(x) + ∇f(x)^T (y − x).
21 Actually, more general than that. Definition. The subgradient set, or subdifferential set, ∂f(x) of f at x is ∂f(x) = { g : f(y) ≥ f(x) + g^T (y − x) for all y }. Theorem. f : R^n → R is convex if and only if it has a nonempty subdifferential set everywhere.
22 Second-order convexity conditions. Theorem. Suppose f : R^n → R is twice differentiable. Then f is convex if and only if for all x ∈ dom f, ∇²f(x) ⪰ 0.
23 Convex sets and convex functions. Definition. The epigraph of a function f is the set of points epi f = {(x, t) : f(x) ≤ t}. epi f is convex if and only if f is convex. Sublevel sets {x : f(x) ≤ a} are convex for convex f.
24-25 Examples. Linear/affine functions: f(x) = b^T x + c. Quadratic functions: f(x) = (1/2) x^T A x + b^T x + c for A ⪰ 0. For regression: (1/2)||Xw − y||^2 = (1/2) w^T X^T X w − y^T X w + (1/2) y^T y.
26-28 More examples. Norms (like l_1 or l_2, for regularization): ||αx + (1 − α)y|| ≤ ||αx|| + ||(1 − α)y|| = α||x|| + (1 − α)||y||. Composition with an affine function, f(Ax + b): f(A(αx + (1 − α)y) + b) = f(α(Ax + b) + (1 − α)(Ay + b)) ≤ αf(Ax + b) + (1 − α)f(Ay + b). Log-sum-exp (via ∇²f(x) PSD): f(x) = log( sum_{i=1}^n exp(x_i) ).
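A sketch of log-sum-exp in code (the max-shifted form is standard numerical practice, an addition not from the slides). The Hessian of log-sum-exp is diag(p) − p p^T with p = softmax(x), and v^T ∇²f(x) v = E_p[v²] − (E_p[v])², a variance, which is why it is PSD; both facts are spot-checked:

```python
import math
import random

def logsumexp(x):
    """Stable log(sum_i exp(x_i)): shift by max(x) so no exp overflows."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

# Spot-check PSD-ness of the Hessian diag(p) - p p^T, p = softmax(x):
# v^T H v = E_p[v^2] - (E_p[v])^2 >= 0 for every v.
random.seed(1)
x = [random.uniform(-3.0, 3.0) for _ in range(5)]
p = [math.exp(xi - logsumexp(x)) for xi in x]
psd_ok = True
for _ in range(200):
    v = [random.uniform(-1.0, 1.0) for _ in range(5)]
    ev2 = sum(pi * vi * vi for pi, vi in zip(p, v))
    ev = sum(pi * vi for pi, vi in zip(p, v))
    psd_ok = psd_ok and ev2 - ev * ev >= -1e-12
print(logsumexp([1000.0, 1000.0]) > 1000.0)  # -> True (no overflow)
print(psd_ok)                                # -> True
```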
29 Important examples in machine learning. SVM loss: f(w) = [1 − y_i x_i^T w]_+ (the hinge [1 − x]_+). Binary logistic loss: f(w) = log(1 + exp(−y_i x_i^T w)).
30 Convex Optimization Problems. Definition. An optimization problem is convex if its objective is a convex function, the inequality constraint functions f_i are convex, and the equality constraints h_j are affine: minimize_x f_0(x) (convex function) s.t. f_i(x) ≤ 0 (convex sublevel sets) and h_j(x) = 0 (affine).
31 It's nice to be convex. Theorem. If x̂ is a local minimizer of a convex optimization problem, it is a global minimizer.
32 Even more reasons to be convex. Theorem. ∇f(x) = 0 if and only if x is a global minimizer of f. Proof. If ∇f(x) = 0, then f(y) ≥ f(x) + ∇f(x)^T (y − x) = f(x) for all y. If ∇f(x) ≠ 0, there is a direction of descent.
33 LET'S TAKE A BREAK
34 Lagrange Duality
35 Goals of Lagrange duality: get a certificate of optimality for a problem; remove constraints; reformulate the problem.
36 Constructing the dual. Start with the optimization problem: minimize_x f_0(x) s.t. f_i(x) ≤ 0, i = 1,...,k, and h_j(x) = 0, j = 1,...,l. Form the Lagrangian using Lagrange multipliers λ_i ≥ 0, ν_j ∈ R: L(x, λ, ν) = f_0(x) + sum_{i=1}^k λ_i f_i(x) + sum_{j=1}^l ν_j h_j(x). Form the dual function: g(λ, ν) = inf_x L(x, λ, ν).
37 Remarks. The original problem is equivalent to minimize_x [ sup_{λ ≥ 0, ν} L(x, λ, ν) ]. The dual problem switches the min and the max: maximize_{λ ≥ 0, ν} [ inf_x L(x, λ, ν) ].
38 One great property of the dual. Lemma (weak duality). If λ ≥ 0, then g(λ, ν) ≤ f_0(x*). Proof. g(λ, ν) = inf_x L(x, λ, ν) ≤ L(x*, λ, ν) = f_0(x*) + sum_{i=1}^k λ_i f_i(x*) + sum_{j=1}^l ν_j h_j(x*) ≤ f_0(x*).
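A numeric illustration on a hypothetical one-dimensional problem (invented, not from the slides): minimize x² subject to x ≥ 1, written as f_1(x) = 1 − x ≤ 0. The Lagrangian x² + λ(1 − x) is minimized at x = λ/2, giving g(λ) = λ − λ²/4, while the primal optimum is f_0(x*) = 1 at x* = 1:

```python
# Weak duality check: g(lam) = lam - lam^2/4 never exceeds f0(x*) = 1,
# and (here, since strong duality holds) it attains 1 at lam = 2.
def g(lam):
    return lam - lam ** 2 / 4.0

f0_star = 1.0
lams = [k / 10.0 for k in range(101)]  # grid over lam in [0, 10]
print(all(g(lam) <= f0_star + 1e-12 for lam in lams))  # -> True (weak duality)
print(max(g(lam) for lam in lams))                     # -> 1.0 (at lam = 2)
```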
39 The greatest property of the dual. Theorem. For reasonable¹ convex problems, sup_{λ ≥ 0, ν} g(λ, ν) = f_0(x*). ¹There are conditions, called constraint qualifications, under which this is true.
40 Geometric look. Minimize (1/2)(x − 1)² subject to a quadratic constraint x² ≤ c. Plots: the true function (blue), the constraint (green), and L(x, λ) for different λ (dotted); the dual function g(λ) (black) and the primal optimal value (dotted blue).
41-43 Intuition. Can interpret duality as linear approximation. Define I_−(a) = ∞ if a > 0, 0 otherwise; and I_0(a) = ∞ unless a = 0, in which case I_0(a) = 0. Rewrite the problem as minimize_x f_0(x) + sum_{i=1}^k I_−(f_i(x)) + sum_{j=1}^l I_0(h_j(x)). Replace I_−(f_i(x)) with λ_i f_i(x): a measure of displeasure when λ_i ≥ 0 and f_i(x) > 0. Likewise ν_j h_j(x) lower-bounds I_0(h_j(x)): minimize_x f_0(x) + sum_{i=1}^k λ_i f_i(x) + sum_{j=1}^l ν_j h_j(x).
44 Example: linearly constrained least squares. minimize_x (1/2)||Ax − b||² s.t. Bx = d. Form the Lagrangian: L(x, ν) = (1/2)||Ax − b||² + ν^T (Bx − d). Take the infimum: ∇_x L(x, ν) = A^T A x − A^T b + B^T ν, so x = (A^T A)^{−1}(A^T b − B^T ν). A simple unconstrained quadratic problem! inf_x L(x, ν) = (1/2)||A(A^T A)^{−1}(A^T b − B^T ν) − b||² + ν^T B((A^T A)^{−1}(A^T b − B^T ν)) − d^T ν.
45 Example: quadratically constrained least squares. minimize_x (1/2)||Ax − b||² s.t. (1/2)||x||² ≤ c. Form the Lagrangian (λ ≥ 0): L(x, λ) = (1/2)||Ax − b||² + (1/2)λ(||x||² − 2c). Take the infimum: ∇_x L(x, λ) = A^T A x − A^T b + λx, so x = (A^T A + λI)^{−1} A^T b, and inf_x L(x, λ) = (1/2)||A(A^T A + λI)^{−1} A^T b − b||² + (λ/2)||(A^T A + λI)^{−1} A^T b||² − λc. A one-variable dual problem! g(λ) = −(1/2) b^T A(A^T A + λI)^{−1} A^T b − λc + (1/2)||b||².
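A sketch of this dual with concrete invented numbers, specializing to A = I, b = (3, 4), c = 1/2, where the formula simplifies to g(λ) = (1/2)||b||² λ/(1 + λ) − λc. The dual optimum λ* = 4 recovers the primal optimum (1/2)(||b|| − 1)² = 8:

```python
# Quadratically constrained least squares, A = I, b = (3, 4), c = 0.5.
b = (3.0, 4.0)
c = 0.5
bb = sum(bi * bi for bi in b)  # ||b||^2 = 25

def g(lam):
    """Dual function for A = I: (1/2)||b||^2 * lam/(1+lam) - lam*c."""
    return 0.5 * bb * lam / (1.0 + lam) - lam * c

def L(x, lam):
    """Lagrangian (1/2)||x - b||^2 + (lam/2)(||x||^2 - 2c) for A = I."""
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b)) + \
        0.5 * lam * (sum(xi * xi for xi in x) - 2.0 * c)

# g(lam) equals L evaluated at its minimizer x(lam) = b / (1 + lam):
lam = 4.0
x_lam = tuple(bi / (1.0 + lam) for bi in b)   # (0.6, 0.8)
print(abs(g(lam) - L(x_lam, lam)) < 1e-9)     # -> True
print(g(4.0))                                  # -> 8.0, the primal optimum
```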
46-47 Uses of the dual. Main use: certificate of optimality (a.k.a. the duality gap). If we have a feasible x and know the dual g(λ, ν), then g(λ, ν) ≤ f_0(x*) ≤ f_0(x), so f_0(x) − f_0(x*) ≤ f_0(x) − g(λ, ν): the gap f_0(x) − g(λ, ν) bounds how suboptimal x is. Also used in more advanced primal-dual algorithms (we won't talk about these).
48 Optimization Algorithms
49 Gradient descent. The simplest algorithm in the world (almost). Goal: minimize_x f(x). Just iterate x_{t+1} = x_t − η_t ∇f(x_t), where η_t is a stepsize.
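The iteration, as a minimal sketch on the invented toy function f(x) = (x − 3)², where a constant stepsize is enough:

```python
# Gradient descent x_{t+1} = x_t - eta * f'(x_t) on f(x) = (x - 3)^2.
def grad(x):
    return 2.0 * (x - 3.0)

x, eta = 0.0, 0.1
for _ in range(100):
    x = x - eta * grad(x)
print(round(x, 6))  # -> 3.0, the minimizer
```

Each step multiplies the error x − 3 by (1 − 2η) = 0.8, so the iterates converge linearly, exactly the strongly convex rate quoted later in the deck.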
50 Single-step illustration. [Figure: f(x), the point (x_t, f(x_t)), and the linear model f(x_t) − η∇f(x_t)^T (x − x_t).]
51 Full gradient descent on f(x) = log(exp(x_1 + 3x_2 − 0.1) + exp(x_1 − 3x_2 − 0.1) + exp(−x_1 − 0.1)).
52-53 Stepsize selection. How do I choose a stepsize? Idea 1: exact line search, η_t = argmin_η f(x − η∇f(x)). Too expensive to be practical. Idea 2: backtracking (Armijo) line search. Let α ∈ (0, 1/2) and β ∈ (0, 1); multiply η ← βη until f(x − η∇f(x)) ≤ f(x) − αη||∇f(x)||². Works well in practice.
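A sketch of the Armijo rule in code. The toy function f(x) = x⁴ and the particular values α = 0.3, β = 0.5 are invented for illustration; the slide only requires α ∈ (0, 1/2), β ∈ (0, 1):

```python
# Backtracking (Armijo) line search for gradient descent on f(x) = x^4.
def f(x):
    return x ** 4

def grad(x):
    return 4.0 * x ** 3

def backtracking_step(x, alpha=0.3, beta=0.5, eta0=1.0):
    """Shrink eta by beta until f(x - eta*g) <= f(x) - alpha*eta*g^2."""
    eta, g = eta0, grad(x)
    while f(x - eta * g) > f(x) - alpha * eta * g * g:
        eta *= beta
    return x - eta * g

x = 1.0
for _ in range(100):
    x = backtracking_step(x)
print(abs(x) < 0.1)  # -> True: iterates approach x* = 0 (slowly: f is flat there)
```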
54 Illustration of Armijo/backtracking line search. [Figure: f(x − η∇f(x)) and the line f(x) − ηα||∇f(x)||² as functions of the stepsize η; there is clearly a region where f(x − η∇f(x)) lies below the line.]
55 Newton's method. Idea: use a second-order approximation to the function: f(x + Δx) ≈ f(x) + ∇f(x)^T Δx + (1/2) Δx^T ∇²f(x) Δx. Choose Δx to minimize the above: Δx = −[∇²f(x)]^{−1} ∇f(x). This is a descent direction: ∇f(x)^T Δx = −∇f(x)^T [∇²f(x)]^{−1} ∇f(x) < 0.
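A one-dimensional sketch, where the Newton step reduces to x ← x − f′(x)/f″(x); the strictly convex function f(x) = x² + eˣ is an arbitrary invented example:

```python
import math

# Newton's method on f(x) = x^2 + e^x: f'(x) = 2x + e^x, f''(x) = 2 + e^x.
def fprime(x):
    return 2.0 * x + math.exp(x)

def fsecond(x):
    return 2.0 + math.exp(x)

x = 0.0
for _ in range(20):
    x = x - fprime(x) / fsecond(x)   # Newton step: -[f'']^{-1} f'
print(abs(fprime(x)) < 1e-10)  # -> True: gradient ~ 0, a global minimizer
```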
56 Newton step picture. [Figure: f̂, the second-order approximation, versus the true function f, with the points (x, f(x)) and (x + Δx, f(x + Δx)).]
57 Convergence of gradient descent and Newton's method. Strongly convex case (∇²f(x) ⪰ mI): linear convergence, f(x_t) − f(x*) ≤ γ^t for some γ ∈ (0, 1), so t = O(log(1/ε)) iterations give f(x_t) − f(x*) ≤ ε. Smooth case (||∇f(x) − ∇f(y)|| ≤ C||x − y||): f(x_t) − f(x*) ≤ K/t. Newton's method is often faster, especially when f has long valleys.
58-59 What about constraints? Linear constraints Ax = b are easy. For example, in Newton's method (assume Ax = b): minimize_{Δx} ∇f(x)^T Δx + (1/2) Δx^T ∇²f(x) Δx s.t. AΔx = 0. The solution satisfies A(x + Δx) = Ax + AΔx = b. Inequality constraints are a bit tougher: minimize_{Δx} ∇f(x)^T Δx + (1/2) Δx^T ∇²f(x) Δx s.t. f_i(x + Δx) ≤ 0 is just as hard as the original problem.
60 Logarithmic barrier methods. Goal: convert minimize_x f_0(x) s.t. f_i(x) ≤ 0, i = 1,...,k, into minimize_x f_0(x) + sum_{i=1}^k I_−(f_i(x)). Approximate I_−(u) ≈ −t log(−u) for small t: minimize_x f_0(x) − t sum_{i=1}^k log(−f_i(x)).
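A sketch on a hypothetical one-dimensional instance (invented for illustration): minimize x subject to x ≥ 1, i.e. f_1(x) = 1 − x ≤ 0. The barrier problem min_x x − t log(x − 1) has closed-form minimizer x(t) = 1 + t, which approaches the true solution x* = 1 as t shrinks:

```python
import math

# Barrier objective for: minimize x subject to 1 - x <= 0.
def barrier_obj(x, t):
    return x - t * math.log(x - 1.0)

for t in (1.0, 0.1, 0.01):
    x_t = 1.0 + t  # closed-form minimizer of the barrier problem
    # sanity check: x_t beats nearby strictly feasible points
    assert barrier_obj(x_t, t) <= barrier_obj(x_t + 0.05, t)
    assert barrier_obj(x_t, t) <= barrier_obj(x_t - 0.05 * t, t)
    print(round(x_t, 4))  # -> 2.0, then 1.1, then 1.01
```

Decreasing t traces out the central path toward the constrained optimum, which is what the t = 1 versus t = 5 pictures on the next slides show.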
61 The barrier function. [Figure: I_−(u) is the dotted line; the others are −t log(−u) for various t.]
62 Illustration: minimizing c^T x subject to Ax ≤ b. [Figure: barrier solutions for t = 1 and t = 5.]
63 Subgradient descent. Really, the simplest algorithm in the world. Goal: minimize_x f(x). Just iterate x_{t+1} = x_t − η_t g_t, where η_t is a stepsize and g_t ∈ ∂f(x_t).
64 Why subgradient descent? Lots of non-differentiable convex functions are used in machine learning: f(x) = [1 − a^T x]_+, f(x) = ||x||_1, f(x) = sum_{r=1}^k σ_r(X), where σ_r is the r-th singular value of X. Easy to analyze. Do not even need a true subgradient: just need E[g_t] ∈ ∂f(x_t).
65-66 Proof of convergence for subgradient descent. Idea: bound ||x_{t+1} − x*|| using the subgradient inequality. Assume ||g_t|| ≤ G. Then ||x_{t+1} − x*||² = ||x_t − ηg_t − x*||² = ||x_t − x*||² − 2η g_t^T (x_t − x*) + η²||g_t||². Recall that f(x*) ≥ f(x_t) + g_t^T (x* − x_t), i.e. −g_t^T (x_t − x*) ≤ f(x*) − f(x_t), so ||x_{t+1} − x*||² ≤ ||x_t − x*||² + 2η [f(x*) − f(x_t)] + η²G². Rearranging, f(x_t) − f(x*) ≤ (||x_t − x*||² − ||x_{t+1} − x*||²) / (2η) + (η/2) G².
67-68 Almost done... Sum from t = 1 to T: sum_{t=1}^T [f(x_t) − f(x*)] ≤ (1/(2η)) sum_{t=1}^T [||x_t − x*||² − ||x_{t+1} − x*||²] + (Tη/2) G² = (1/(2η))||x_1 − x*||² − (1/(2η))||x_{T+1} − x*||² + (Tη/2) G². Now let D = ||x_1 − x*|| and keep track of the minimum along the run: f(x_best) − f(x*) ≤ D²/(2ηT) + (η/2) G². Set η = D/(G√T) to get f(x_best) − f(x*) ≤ DG/√T.
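The DG/√T bound can be checked on a sketch; f(x) = |x| is an invented example with G = 1, x* = 0, and starting point x_1 = D:

```python
# Subgradient descent on f(x) = |x| with the stepsize eta = D/(G*sqrt(T)).
def subgrad(x):
    """A subgradient of |x| (0 is a valid subgradient at x = 0)."""
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

T, D, G = 100, 1.0, 1.0
eta = D / (G * T ** 0.5)
x, f_best = D, abs(D)
for _ in range(T):
    x = x - eta * subgrad(x)
    f_best = min(f_best, abs(x))
print(f_best <= D * G / T ** 0.5 + 1e-12)  # -> True: matches the DG/sqrt(T) bound
```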
69 Extension: projected subgradient descent. Now there is a convex constraint set X. Goal: minimize_{x ∈ X} f(x). Idea: do subgradient steps, projecting x_t back into X at every iteration: x_{t+1} = Π_X(x_t − ηg_t). Key fact for the proof: ||x − Π_X(x_t)|| ≤ ||x − x_t|| for x ∈ X.
70-77 Projected subgradient example: minimize_x (1/2)||Ax − b||² s.t. ||x||_1 ≤ 1. [A sequence of figures shows the iterates being projected back onto the l_1 ball as they approach the solution.]
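The slides' example projects onto the l_1 ball; as a simpler sketch (a swapped-in constraint set, not the slides' example), here is projected gradient descent onto the Euclidean unit ball, whose projection has the closed form Π(x) = x / max(1, ||x||). The invented instance is minimize (1/2)||x − b||² over ||x|| ≤ 1 with b = (3, 4), whose solution is x* = b/||b|| = (0.6, 0.8):

```python
# Projected gradient descent onto the Euclidean unit ball.
def project_l2_ball(x):
    """Pi(x) = x / max(1, ||x||): closest point of the unit ball."""
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / max(1.0, norm) for xi in x]

b = [3.0, 4.0]
x = [0.0, 0.0]
eta = 0.1
for _ in range(200):
    grad = [xi - bi for xi, bi in zip(x, b)]   # gradient of (1/2)||x - b||^2
    x = project_l2_ball([xi - eta * gi for xi, gi in zip(x, grad)])
print([round(xi, 3) for xi in x])  # -> [0.6, 0.8]
```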
78-80 Convergence results for (projected) subgradient methods. Any decreasing, non-summable stepsize (η_t → 0, sum_{t=1}^∞ η_t = ∞) gives f(x_avg(t)) − f(x*) → 0. A slightly less brain-dead analysis than the earlier one shows that with η_t ∝ 1/√t, f(x_avg(t)) − f(x*) ≤ C/√t. The same convergence holds when g_t is random, i.e. E[g_t] ∈ ∂f(x_t). Example: f(w) = (1/2)||w||² + C sum_{i=1}^n [1 − y_i x_i^T w]_+; just pick a random training example.
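A sketch of that stochastic version on the slide's SVM objective. The one-dimensional toy data and the η_t = 1/t stepsize are invented for illustration; the sampled subgradient has expectation in ∂f(w), which is all the theorem needs:

```python
import random

# Stochastic subgradient for f(w) = (1/2)w^2 + C * sum_i [1 - y_i x_i w]_+
# in one dimension: sample an example i, use g = w + C*n*(-y_i x_i) when
# its hinge term is active (so E[g] is a subgradient of the full objective).
random.seed(0)
data = [(2.0, 1.0), (1.5, 1.0), (-1.0, -1.0), (-2.5, -1.0)]  # (x_i, y_i)
C, n = 0.1, len(data)
w = 0.0
for t in range(1, 2001):
    x_i, y_i = random.choice(data)
    g = w                                # subgradient of (1/2)w^2
    if 1.0 - y_i * x_i * w > 0.0:        # hinge term active for this example
        g += C * n * (-y_i * x_i)
    w -= g / t                           # decreasing stepsize eta_t = 1/t
print(w > 0.0)  # -> True: a separating direction (positive x labeled +1)
```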
81 Recap. Defined convex sets and functions. Saw why we want optimization problems to be convex (solvable). Sketched some of Lagrange duality. First-order methods are easy and (often) work well.
82 Take Home Messages. Many useful problems can be formulated as convex optimization problems. If it is not convex and not an eigenvalue problem, you are out of luck. If it is convex, you are golden.
More informationNonlinear Programming Methods.S2 Quadratic Programming
Nonlinear Programming Methods.S2 Quadratic Programming Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard A linearly constrained optimization problem with a quadratic objective
More information3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions
3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions logconcave and logconvex functions
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationBy W.E. Diewert. July, Linear programming problems are important for a number of reasons:
APPLIED ECONOMICS By W.E. Diewert. July, 3. Chapter : Linear Programming. Introduction The theory of linear programming provides a good introduction to the study of constrained maximization (and minimization)
More informationLecture 2: August 29. Linear Programming (part I)
10725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.
More informationME128 ComputerAided Mechanical Design Course Notes Introduction to Design Optimization
ME128 Computerided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation
More information(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationSome representability and duality results for convex mixedinteger programs.
Some representability and duality results for convex mixedinteger programs. Santanu S. Dey Joint work with Diego Morán and Juan Pablo Vielma December 17, 2012. Introduction About Motivation Mixed integer
More informationProbabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur
Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:
More informationDerivative Free Optimization
Department of Mathematics Derivative Free Optimization M.J.D. Powell LiTHMATR2014/02SE Department of Mathematics Linköping University S581 83 Linköping, Sweden. Three lectures 1 on Derivative Free
More information1 Polyhedra and Linear Programming
CS 598CSC: Combinatorial Optimization Lecture date: January 21, 2009 Instructor: Chandra Chekuri Scribe: Sungjin Im 1 Polyhedra and Linear Programming In this lecture, we will cover some basic material
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 23, Probability and Statistics, March 2015. Due:March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 3, Probability and Statistics, March 05. Due:March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More information1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
More informationLecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method
Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming
More informationRecovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branchandbound approach
MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branchandbound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical
More informationDiagonalisation. Chapter 3. Introduction. Eigenvalues and eigenvectors. Reading. Definitions
Chapter 3 Diagonalisation Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation fo symmetric matrices Reading As in the previous chapter, there is no specific essential
More informationDefinition of a Linear Program
Definition of a Linear Program Definition: A function f(x 1, x,..., x n ) of x 1, x,..., x n is a linear function if and only if for some set of constants c 1, c,..., c n, f(x 1, x,..., x n ) = c 1 x 1
More informationDual Methods for Total VariationBased Image Restoration
Dual Methods for Total VariationBased Image Restoration Jamylle Carter Institute for Mathematics and its Applications University of Minnesota, Twin Cities Ph.D. (Mathematics), UCLA, 2001 Advisor: Tony
More informationNotes on Symmetric Matrices
CPSC 536N: Randomized Algorithms 201112 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.
More informationElementary GradientBased Parameter Estimation
Elementary GradientBased Parameter Estimation Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA Department of Music, Stanford University, Stanford, California 94305 USA Abstract
More informationThe Hadamard Product
The Hadamard Product Elizabeth Million April 12, 2007 1 Introduction and Basic Results As inexperienced mathematicians we may have once thought that the natural definition for matrix multiplication would
More informationLecture 7  Linear Programming
COMPSCI 530: Design and Analysis of Algorithms DATE: 09/17/2013 Lecturer: Debmalya Panigrahi Lecture 7  Linear Programming Scribe: Hieu Bui 1 Overview In this lecture, we cover basics of linear programming,
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationSECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA
SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA This handout presents the second derivative test for a local extrema of a Lagrange multiplier problem. The Section 1 presents a geometric motivation for the
More informationCONSTRAINED NONLINEAR PROGRAMMING
149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach
More informationCHAPTER 9. Integer Programming
CHAPTER 9 Integer Programming An integer linear program (ILP) is, by definition, a linear program with the additional constraint that all variables take integer values: (9.1) max c T x s t Ax b and x integral
More informationConvex Optimization. Lieven Vandenberghe University of California, Los Angeles
Convex Optimization Lieven Vandenberghe University of California, Los Angeles Tutorial lectures, Machine Learning Summer School University of Cambridge, September 34, 2009 Sources: Boyd & Vandenberghe,
More informationAdaptive Bound Optimization for Online Convex Optimization
Adaptive Bound Optimization for Online Convex Optimization H. Brendan McMahan Google, Inc. mcmahan@google.com Matthew Streeter Google, Inc. mstreeter@google.com Abstract We introduce a new online convex
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationChapter 4. Duality. 4.1 A Graphical Example
Chapter 4 Duality Given any linear program, there is another related linear program called the dual. In this chapter, we will develop an understanding of the dual linear program. This understanding translates
More information4.6 Linear Programming duality
4.6 Linear Programming duality To any minimization (maximization) LP we can associate a closely related maximization (minimization) LP. Different spaces and objective functions but in general same optimal
More informationAbsolute Value Programming
Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences
More informationMetric Spaces. Chapter 1
Chapter 1 Metric Spaces Many of the arguments you have seen in several variable calculus are almost identical to the corresponding arguments in one variable calculus, especially arguments concerning convergence
More informationCost Minimization and the Cost Function
Cost Minimization and the Cost Function Juan Manuel Puerta October 5, 2009 So far we focused on profit maximization, we could look at a different problem, that is the cost minimization problem. This is
More information3. Proximal gradient method
Algorithms for largescale convex optimization DTU 2010 3. Proximal gradient method introduction proximal mapping proximal gradient method convergence analysis accelerated proximal gradient method forwardbackward
More informationThe equivalence of logistic regression and maximum entropy models
The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.winvector.com/blog/20/09/thesimplerderivationoflogisticregression/
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationLecture 7: Finding Lyapunov Functions 1
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.243j (Fall 2003): DYNAMICS OF NONLINEAR SYSTEMS by A. Megretski Lecture 7: Finding Lyapunov Functions 1
More informationGenOpt (R) Generic Optimization Program User Manual Version 3.0.0β1
(R) User Manual Environmental Energy Technologies Division Berkeley, CA 94720 http://simulationresearch.lbl.gov Michael Wetter MWetter@lbl.gov February 20, 2009 Notice: This work was supported by the U.S.
More informationA FIRST COURSE IN OPTIMIZATION THEORY
A FIRST COURSE IN OPTIMIZATION THEORY RANGARAJAN K. SUNDARAM New York University CAMBRIDGE UNIVERSITY PRESS Contents Preface Acknowledgements page xiii xvii 1 Mathematical Preliminaries 1 1.1 Notation
More informationRANDOM INTERVAL HOMEOMORPHISMS. MICHA L MISIUREWICZ Indiana University Purdue University Indianapolis
RANDOM INTERVAL HOMEOMORPHISMS MICHA L MISIUREWICZ Indiana University Purdue University Indianapolis This is a joint work with Lluís Alsedà Motivation: A talk by Yulij Ilyashenko. Two interval maps, applied
More informationHøgskolen i Narvik Sivilingeniørutdanningen STE6237 ELEMENTMETODER. Oppgaver
Høgskolen i Narvik Sivilingeniørutdanningen STE637 ELEMENTMETODER Oppgaver Klasse: 4.ID, 4.IT Ekstern Professor: Gregory A. Chechkin email: chechkin@mech.math.msu.su Narvik 6 PART I Task. Consider twopoint
More informationWalrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.
Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian
More informationIntroduction to Machine Learning
Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:
More informationNo: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics
No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results
More informationLecture 5: Conic Optimization: Overview
EE 227A: Conve Optimization and Applications January 31, 2012 Lecture 5: Conic Optimization: Overview Lecturer: Laurent El Ghaoui Reading assignment: Chapter 4 of BV. Sections 3.13.6 of WTB. 5.1 Linear
More informationMATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α
More information1. Let X and Y be normed spaces and let T B(X, Y ).
Uppsala Universitet Matematiska Institutionen Andreas Strömbergsson Prov i matematik Funktionalanalys Kurs: NVP, Frist. 20050314 Skrivtid: 9 11.30 Tillåtna hjälpmedel: Manuella skrivdon, Kreyszigs bok
More informationChapter 1. Metric Spaces. Metric Spaces. Examples. Normed linear spaces
Chapter 1. Metric Spaces Metric Spaces MA222 David Preiss d.preiss@warwick.ac.uk Warwick University, Spring 2008/2009 Definitions. A metric on a set M is a function d : M M R such that for all x, y, z
More informationOnline Convex Optimization
E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting
More informationStochastic Inventory Control
Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationConstrained Least Squares
Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,
More informationAM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 7 February 19th, 2014
AM 22: Advanced Optimization Spring 204 Prof Yaron Singer Lecture 7 February 9th, 204 Overview In our previous lecture we saw the application of the strong duality theorem to game theory, and then saw
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Going For Large Scale Going For Large Scale 1
More information