Lecture 9: Support Vector Machines

1 Advanced Topics in Machine Learning: COMPGI13 Gatsby Unit, CSML, UCL March 15, 2016

2 Overview
The representer theorem
Review of convex optimization
Support vector classification: the C-SVM and ν-SVM

3 Representer theorem

4 Learning problem: setting
Given a set of paired observations $(x_1, y_1), \ldots, (x_n, y_n)$ (regression or classification).
Find the function $f^*$ in the RKHS $\mathcal{H}$ which satisfies
$$f^* = \arg\min_{f \in \mathcal{H}} J(f), \qquad (1)$$
where
$$J(f) = L_y\left(f(x_1), \ldots, f(x_n)\right) + \Omega\left(\|f\|_{\mathcal{H}}^2\right),$$
$\Omega$ is non-decreasing, $y$ is the vector of $y_i$, and the loss $L$ depends on the $x_i$ only via $f(x_i)$.
Classification: $L_y\left(f(x_1), \ldots, f(x_n)\right) = \sum_{i=1}^n I\left(y_i f(x_i) \le 0\right)$
Regression: $L_y\left(f(x_1), \ldots, f(x_n)\right) = \sum_{i=1}^n \left(y_i - f(x_i)\right)^2$

5 Representer theorem
The representer theorem: a solution to
$$\min_{f \in \mathcal{H}} \left[ L_y\left(f(x_1), \ldots, f(x_n)\right) + \Omega\left(\|f\|_{\mathcal{H}}^2\right) \right]$$
takes the form
$$f^* = \sum_{i=1}^n \alpha_i k(x_i, \cdot).$$
If $\Omega$ is strictly increasing, the solution must have this form.

6 Representer theorem: proof
Proof: Denote by $f_s$ the projection of $f$ onto the subspace
$$\mathrm{span}\left\{ k(x_i, \cdot) : 1 \le i \le n \right\}, \qquad (2)$$
such that
$$f = f_s + f_\perp,$$
where $f_s = \sum_{i=1}^n \alpha_i k(x_i, \cdot)$.
Regularizer:
$$\|f\|_{\mathcal{H}}^2 = \|f_s\|_{\mathcal{H}}^2 + \|f_\perp\|_{\mathcal{H}}^2 \ge \|f_s\|_{\mathcal{H}}^2,$$
so
$$\Omega\left(\|f\|_{\mathcal{H}}^2\right) \ge \Omega\left(\|f_s\|_{\mathcal{H}}^2\right),$$
and this term is minimized for $f = f_s$.

7 Representer theorem: proof
Proof (cont.): Individual terms $f(x_i)$ in the loss:
$$f(x_i) = \langle f, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s + f_\perp, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s, k(x_i, \cdot) \rangle_{\mathcal{H}},$$
so
$$L_y\left(f(x_1), \ldots, f(x_n)\right) = L_y\left(f_s(x_1), \ldots, f_s(x_n)\right).$$
Hence:
The loss $L(\ldots)$ only depends on the component of $f$ in the data subspace.
The regularizer $\Omega(\ldots)$ is minimized when $f = f_s$.
If $\Omega$ is non-decreasing, then $\|f_\perp\|_{\mathcal{H}} = 0$ is a minimum. If $\Omega$ is strictly increasing, the minimum is unique.
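
A standard concrete illustration of the representer theorem is kernel ridge regression: with the squared loss and $\Omega(t) = \lambda n t$, the RKHS minimizer is exactly a kernel expansion $\sum_i \alpha_i k(x_i, \cdot)$, and its coefficients solve a linear system. The sketch below is illustrative only; the Gaussian kernel, its bandwidth, the regularization constant and the synthetic data are my own choices, not taken from the lecture.

```python
# Kernel ridge regression: the representer theorem in action.
# With squared loss and Omega(t) = lambda * n * t, the RKHS minimizer is
# f(x) = sum_i alpha_i k(x_i, x), with alpha = (K + lambda * n * I)^{-1} y.
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                     # illustrative inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)      # noisy targets

lam = 1e-2                                               # illustrative regularizer
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * len(y) * np.eye(len(y)), y)  # representer coefficients

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = gaussian_kernel(X_test, X) @ alpha              # f(x) = sum_i alpha_i k(x_i, x)
print(f_test)
```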

8 Short overview of convex optimization

9 Why we need optimization: SVM idea Classify two clouds of points, where there exists a hyperplane which linearly separates one cloud from the other without error.

11 Why we need optimization: SVM idea
Classify two clouds of points, where there exists a hyperplane which linearly separates one cloud from the other without error.
The smallest distance from each class to the separating hyperplane $w^\top x + b$ is called the margin.

12 Why we need optimization: SVM idea
This problem can be expressed as follows:
$$\max_{w,b} (\text{margin}) = \max_{w,b} \frac{2}{\|w\|} \quad \text{or} \quad \min_{w,b} \|w\|^2 \qquad (3)$$
subject to
$$\begin{cases} w^\top x_i + b \ge 1 & i : y_i = +1, \\ w^\top x_i + b \le -1 & i : y_i = -1. \end{cases} \qquad (4)$$
This is a convex optimization problem.

13 Convex set
(Figure from Boyd and Vandenberghe: the leftmost set is convex, the remaining two are not.)
Every point in the set can be seen from any other point in the set, along a straight line that never leaves the set.
Definition: $C$ is convex if for all $x_1, x_2 \in C$ and any $0 \le \theta \le 1$ we have
$$\theta x_1 + (1 - \theta) x_2 \in C,$$
i.e. every point on the line between $x_1$ and $x_2$ lies in $C$.

14 Convex function: no local optima
(Figure from Boyd and Vandenberghe.)
Definition (Convex function): A function $f$ is convex if its domain $\mathrm{dom}\, f$ is a convex set and if for all $x, y \in \mathrm{dom}\, f$ and any $0 \le \theta \le 1$,
$$f\left(\theta x + (1 - \theta) y\right) \le \theta f(x) + (1 - \theta) f(y).$$
The function is strictly convex if the inequality is strict for $x \ne y$.
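
A quick numerical sanity check of this definition, using the squared norm $f(x) = \|x\|^2$ that appears later in the SVM objective; the random draws and the tolerance are arbitrary illustrative choices.

```python
# Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
# for f(x) = ||x||^2 on random points and random theta in [0, 1].
import numpy as np

f = lambda x: np.dot(x, x)
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    theta = rng.uniform()
    assert f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y) + 1e-12
print("convexity inequality held on 1000 random draws")
```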

15 Optimization and the Lagrangian
(Generic) optimization problem on $x \in \mathbb{R}^n$:
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & h_i(x) = 0, \quad i = 1, \ldots, p. \end{aligned} \qquad (5)$$
$p^*$ is the optimal value of (5), and $\mathcal{D}$ is assumed nonempty, where
$$\mathcal{D} := \bigcap_{i=0}^m \mathrm{dom}\, f_i \cap \bigcap_{i=1}^p \mathrm{dom}\, h_i$$
($\mathrm{dom}\, f_i$ = subset of $\mathbb{R}^n$ where $f_i$ is defined).
Ideally we would want an unconstrained problem:
$$\text{minimize} \quad f_0(x) + \sum_{i=1}^m I_-\left(f_i(x)\right) + \sum_{i=1}^p I_0\left(h_i(x)\right),$$
where
$$I_-(u) = \begin{cases} 0 & u \le 0 \\ \infty & u > 0 \end{cases}$$
and $I_0(u)$ is the indicator of $0$.
Why is this hard to solve?

17 Lower bound interpretation of the Lagrangian
The Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is a lower bound on the original problem:
$$L(x, \lambda, \nu) := f_0(x) + \sum_{i=1}^m \underbrace{\lambda_i f_i(x)}_{\text{replaces } I_-(f_i(x))} + \sum_{i=1}^p \underbrace{\nu_i h_i(x)}_{\text{replaces } I_0(h_i(x))},$$
and has domain $\mathrm{dom}\, L := \mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$.
The vectors $\lambda$ and $\nu$ are called Lagrange multipliers or dual variables.
To ensure a lower bound, we require $\lambda \succeq 0$.

18 Lagrange dual: lower bound on optimum $p^*$
The Lagrange dual function: minimize the Lagrangian over $x$,
$$g(\lambda, \nu) := \inf_{x \in \mathcal{D}} L(x, \lambda, \nu). \qquad (6)$$
A dual feasible pair $(\lambda, \nu)$ is a pair for which $\lambda \succeq 0$ and $(\lambda, \nu) \in \mathrm{dom}\, g$.
We will show (next slide): for any $\lambda \succeq 0$ and $\nu$,
$$g(\lambda, \nu) \le f_0(x) \quad \text{wherever} \quad f_i(x) \le 0, \; h_i(x) = 0$$
(including at $f_0(x^*) = p^*$).

20 Lagrange dual: lower bound on optimum $p^*$
Simplest example: minimize over $x$ the function
$$L(x, \lambda) = f_0(x) + \lambda f_1(x).$$
(Figure modified from Boyd and Vandenberghe.)
Reminders:
$f_0$ is the function to be minimized.
$f_1 \le 0$ is the inequality constraint.
$\lambda \ge 0$ is the Lagrange multiplier.
$p^*$ is the minimum of $f_0$ in the constraint set.

23 Lagrange dual: lower bound on optimum $p^*$
When $\lambda \succeq 0$, then for all $\nu$ we have
$$g(\lambda, \nu) \le p^*. \qquad (7)$$
A dual feasible pair $(\lambda, \nu)$ is a pair for which $\lambda \succeq 0$ and $(\lambda, \nu) \in \mathrm{dom}\, g$.
(Figure from Boyd and Vandenberghe.)
Reminders:
$g(\lambda, \nu) := \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$
$\lambda \ge 0$ is the Lagrange multiplier
$p^*$ is the minimum of $f_0$ in the constraint set

24 Lagrange dual is a lower bound on $p^*$ (proof)
We now give a formal proof that the Lagrange dual function $g(\lambda, \nu)$ lower bounds $p^*$.
Proof: Assume $x$ is feasible, i.e. $f_i(x) \le 0$, $h_i(x) = 0$, $x \in \mathcal{D}$, and that $\lambda \succeq 0$. Then
$$\sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \le 0.$$
Thus
$$g(\lambda, \nu) := \inf_{x' \in \mathcal{D}} \left( f_0(x') + \sum_{i=1}^m \lambda_i f_i(x') + \sum_{i=1}^p \nu_i h_i(x') \right) \le f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \le f_0(x).$$
This holds for every feasible $x$, hence the lower bound holds.
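
A tiny numerical illustration of this bound, on a toy problem of my own choosing (not from the lecture): minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, for which $L(x, \lambda) = x^2 + \lambda(1 - x)$, $g(\lambda) = \lambda - \lambda^2/4$ in closed form, and $p^* = 1$.

```python
# Weak duality on a toy problem: g(lambda) <= p* for every lambda >= 0.
import numpy as np

p_star = 1.0                                   # minimum of x^2 over {x >= 1}
g = lambda lam: lam - lam**2 / 4.0             # dual function, derived in closed form

lams = np.linspace(0.0, 10.0, 101)
assert np.all(g(lams) <= p_star + 1e-12)       # lower bound holds on the whole grid
print("max of g on the grid:", g(lams).max())  # attained at lambda = 2, where g = p*
```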

27 Best lower bound: maximize the dual
Best lower bound $g(\lambda, \nu)$ on the optimal solution $p^*$ of (5): the Lagrange dual problem is
$$\text{maximize} \quad g(\lambda, \nu) \quad \text{subject to} \quad \lambda \succeq 0. \qquad (8)$$
Dual feasible: $(\lambda, \nu)$ with $\lambda \succeq 0$ and $g(\lambda, \nu) > -\infty$.
Dual optimal: solutions $(\lambda^*, \nu^*)$ maximizing the dual, $d^*$ is the optimal value (the dual is always easy to maximize: next slide).
Weak duality always holds: $d^* \le p^*$.
...but what is the point of finding the best (largest) lower bound on a minimization problem?

29 Best lower bound: maximize the dual
Best lower bound $g(\lambda, \nu)$ on the optimal solution $p^*$ of (5): the Lagrange dual problem is
$$\text{maximize} \quad g(\lambda, \nu) \quad \text{subject to} \quad \lambda \succeq 0. \qquad (9)$$
Dual feasible: $(\lambda, \nu)$ with $\lambda \succeq 0$ and $g(\lambda, \nu) > -\infty$.
Dual optimal: solutions $(\lambda^*, \nu^*)$ to the dual problem, $d^*$ is the optimal value (the dual is always easy to maximize: next slide).
Weak duality always holds: $d^* \le p^*$.
Strong duality does not always hold (conditions given later): $d^* = p^*$.
If strong duality holds: solve the easy (concave) dual problem to find $p^*$.

30 Maximizing the dual is always easy
The Lagrange dual function: minimize the Lagrangian (lower bound),
$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu).$$
The dual function is a pointwise infimum of affine functions of $(\lambda, \nu)$, hence concave in $(\lambda, \nu)$, with convex constraint set $\lambda \succeq 0$.
Example: one inequality constraint, $L(x, \lambda) = f_0(x) + \lambda f_1(x)$, and assume there are only four possible values for $x$. Each line represents a different $x$.
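
A small numerical illustration of this point, using four made-up values of $(f_0(x), f_1(x))$: the resulting $g(\lambda)$ is a pointwise minimum of affine functions, and a midpoint-concavity check passes on a grid of $\lambda$ values.

```python
# g(lambda) = min over a few fixed x of the affine functions f0(x) + lambda*f1(x)
# is concave in lambda, whatever the (here arbitrary) values of f0 and f1.
import numpy as np

f0 = np.array([1.0, 2.0, 0.5, 3.0])     # f0(x) for four fixed candidate x
f1 = np.array([0.5, -1.0, 2.0, -0.2])   # f1(x) for the same x

g = lambda lam: np.min(f0 + lam * f1)   # pointwise minimum of affine functions

lams = np.linspace(0.0, 5.0, 201)
vals = np.array([g(l) for l in lams])
mids = np.array([g((a + b) / 2) for a, b in zip(lams[:-1], lams[1:])])
assert np.all(mids + 1e-12 >= (vals[:-1] + vals[1:]) / 2)   # midpoint concavity on the grid
print("g is concave on the grid; g(0) =", g(0.0), ", g(5) =", g(5.0))
```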

31 How do we know if strong duality holds?
Conditions under which strong duality holds are called constraint qualifications (they are sufficient, but not necessary).
(Probably the) best known sufficient condition: strong duality holds if the primal problem is convex, i.e. of the form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & Ax = b \end{aligned} \qquad (10)$$
for convex $f_0, \ldots, f_m$, and Slater's condition holds: there exists some strictly feasible point (1) $\tilde{x} \in \mathrm{relint}(\mathcal{D})$ such that
$$f_i(\tilde{x}) < 0, \quad i = 1, \ldots, m, \qquad A\tilde{x} = b.$$
(1) We denote by $\mathrm{relint}(\mathcal{D})$ the relative interior of the set $\mathcal{D}$. This looks like the interior of the set, but is non-empty even when the set is a subspace of a larger space.

33 How do we know if strong duality holds?
Conditions under which strong duality holds are called constraint qualifications (they are sufficient, but not necessary).
(Probably the) best known sufficient condition: strong duality holds if the primal problem is convex, i.e. of the form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & Ax = b \end{aligned}$$
for convex $f_0, \ldots, f_m$, and Slater's condition holds. For the case of affine $f_i$, Slater's condition is trivial (the inequality constraints are no longer required to be strict, so it reduces to the original inequality constraints):
$$f_i(\tilde{x}) \le 0, \quad i = 1, \ldots, m, \qquad A\tilde{x} = b.$$

34 A consequence of strong duality...
Assume the primal optimum is equal to the dual optimum. What are the consequences?
Let $x^*$ be a solution of the original problem (minimum of $f_0$ under constraints), and $(\lambda^*, \nu^*)$ solutions to the dual. Then
$$\begin{aligned}
f_0(x^*) &= g(\lambda^*, \nu^*) & \text{(assumed)} \\
&= \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) + \sum_{i=1}^p \nu_i^* h_i(x) \right) & \text{(definition of } g \text{)} \\
&\le f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i(x^*) & \text{(definition of inf)} \\
&\le f_0(x^*). & (*)
\end{aligned}$$
$(*)$: $(x^*, \lambda^*, \nu^*)$ satisfies $\lambda^* \succeq 0$, $f_i(x^*) \le 0$, and $h_i(x^*) = 0$.

36 ...is complementary slackness
From the previous slide,
$$\lambda_i^* f_i(x^*) = 0, \qquad (11)$$
which is the condition of complementary slackness. This means
$$\lambda_i^* > 0 \implies f_i(x^*) = 0, \qquad f_i(x^*) < 0 \implies \lambda_i^* = 0.$$
From the $\lambda_i^*$, read off which inequality constraints are strict.

37 KKT conditions for global optimum
Assume the functions $f_i$, $h_i$ are differentiable and that strong duality holds. Since $x^*$ minimizes $L(x, \lambda^*, \nu^*)$, the derivative at $x^*$ is zero:
$$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0.$$
KKT conditions (definition): we are at a global optimum, $(x, \lambda, \nu) = (x^*, \lambda^*, \nu^*)$, when (a) strong duality holds, and (b)
$$\begin{aligned}
f_i(x) &\le 0, \quad i = 1, \ldots, m \\
h_i(x) &= 0, \quad i = 1, \ldots, p \\
\lambda_i &\ge 0, \quad i = 1, \ldots, m \\
\lambda_i f_i(x) &= 0, \quad i = 1, \ldots, m \\
\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) &= 0.
\end{aligned}$$

39 KKT conditions for global optimum
In summary: if the primal problem is convex and the constraint functions satisfy Slater's condition, then strong duality holds. If in addition the functions $f_i$, $h_i$ are differentiable, then the KKT conditions are necessary and sufficient for optimality.
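
A sketch checking the KKT conditions at the known optimum of a small convex problem of my own choosing (not from the lecture): minimize $x_1^2 + x_2^2$ subject to $f_1(x) = 1 - x_1 - x_2 \le 0$, whose solution is $x^* = (1/2, 1/2)$ with $\lambda^* = 1$.

```python
# Verify stationarity, feasibility, dual feasibility and complementary
# slackness at the optimum of: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0.
import numpy as np

x_star = np.array([0.5, 0.5])
lam_star = 1.0

grad_f0 = 2 * x_star                  # gradient of f0(x) = x1^2 + x2^2
grad_f1 = np.array([-1.0, -1.0])      # gradient of f1(x) = 1 - x1 - x2
f1 = 1.0 - x_star.sum()

assert np.allclose(grad_f0 + lam_star * grad_f1, 0.0)   # stationarity
assert f1 <= 1e-12                                      # primal feasibility
assert lam_star >= 0.0                                  # dual feasibility
assert abs(lam_star * f1) <= 1e-12                      # complementary slackness
print("KKT conditions hold at x* =", x_star, ", lambda* =", lam_star)
```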

40 Support vector classification

41 Linearly separable points
Classify two clouds of points, where there exists a hyperplane which linearly separates one cloud from the other without error.
The smallest distance from each class to the separating hyperplane $w^\top x + b$ is called the margin.

42 Maximum margin classifier, linearly separable case
This problem can be expressed as follows:
$$\max_{w,b} (\text{margin}) = \max_{w,b} \frac{2}{\|w\|} \qquad (12)$$
subject to
$$\begin{cases} \min_i \left( w^\top x_i + b \right) = 1 & i : y_i = +1, \\ \max_i \left( w^\top x_i + b \right) = -1 & i : y_i = -1. \end{cases} \qquad (13)$$
The resulting classifier is
$$y = \mathrm{sign}(w^\top x + b).$$
We can rewrite to obtain
$$\max_{w,b} \frac{1}{\|w\|} \quad \text{or} \quad \min_{w,b} \|w\|^2$$
subject to
$$y_i (w^\top x_i + b) \ge 1. \qquad (14)$$

44 Maximum margin classifier: with errors allowed
Allow "errors": points within the margin, or even on the wrong side of the decision boundary. Ideally:
$$\min_{w,b} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n I[y_i (w^\top x_i + b) < 0] \right),$$
where $C$ controls the tradeoff between maximum margin and loss.
Replace with a convex upper bound:
$$\min_{w,b} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \theta\left( y_i (w^\top x_i + b) \right) \right),$$
with the hinge loss
$$\theta(\alpha) = (1 - \alpha)_+ = \begin{cases} 1 - \alpha & 1 - \alpha > 0 \\ 0 & \text{otherwise.} \end{cases}$$

46 Hinge loss
Hinge loss:
$$\theta(\alpha) = (1 - \alpha)_+ = \begin{cases} 1 - \alpha & 1 - \alpha > 0 \\ 0 & \text{otherwise.} \end{cases}$$
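
A one-line implementation of the hinge loss, evaluated at a few illustrative margin values.

```python
# theta(alpha) = (1 - alpha)_+ = max(0, 1 - alpha), applied elementwise
# to the margins alpha_i = y_i (w^T x_i + b).
import numpy as np

def hinge_loss(margins):
    return np.maximum(0.0, 1.0 - margins)

print(hinge_loss(np.array([-0.5, 0.0, 0.5, 1.0, 2.0])))  # [1.5 1.  0.5 0.  0. ]
```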

47 Support vector classification
Substituting in the hinge loss, we get
$$\min_{w,b} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \theta\left( y_i (w^\top x_i + b) \right) \right).$$
How do you implement the hinge loss with simple inequality constraints (for optimization)?
$$\min_{w,b,\xi} \left( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i \right) \qquad (15)$$
subject to (2)
$$\xi_i \ge 0, \qquad y_i (w^\top x_i + b) \ge 1 - \xi_i.$$
(2) Either $y_i (w^\top x_i + b) \ge 1$ and $\xi_i = 0$ as before, or $y_i (w^\top x_i + b) < 1$, and then $\xi_i > 0$ takes the value satisfying $y_i (w^\top x_i + b) = 1 - \xi_i$.
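
A minimal sketch of the slack-variable formulation (15), treated as a generic constrained problem and handed to SciPy's SLSQP solver. The synthetic two-cluster data and $C = 1$ are illustrative choices; in practice one would use a dedicated QP or SMO solver rather than a general-purpose method.

```python
# Soft-margin SVM primal with slack variables:
#   min over (w, b, xi) of 0.5*||w||^2 + C*sum(xi)
#   s.t. y_i (w^T x_i + b) >= 1 - xi_i  and  xi_i >= 0.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, C = 40, 2, 1.0
X = np.vstack([rng.normal([2, 2], 1.0, (n // 2, d)),
               rng.normal([-2, -2], 1.0, (n // 2, d))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

def unpack(z):
    return z[:d], z[d], z[d + 1:]                 # w, b, xi

def objective(z):
    w, b, xi = unpack(z)
    return 0.5 * w @ w + C * xi.sum()

constraints = [
    {'type': 'ineq', 'fun': lambda z: y * (X @ unpack(z)[0] + unpack(z)[1]) - 1 + unpack(z)[2]},
    {'type': 'ineq', 'fun': lambda z: unpack(z)[2]},   # xi >= 0
]

res = minimize(objective, np.zeros(d + 1 + n), method='SLSQP',
               constraints=constraints, options={'maxiter': 500})
w, b, xi = unpack(res.x)
print("w =", w, " b =", b, " primal value =", res.fun)
```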

49 Support vector classification

50 Does strong duality hold?
1. Is the optimization problem convex w.r.t. the variables $(w, b, \xi)$?
$$\text{minimize} \quad f_0(w, b, \xi) := \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i$$
$$\text{subject to} \quad f_i(w, b, \xi) := 1 - \xi_i - y_i (w^\top x_i + b) \le 0, \qquad -\xi_i \le 0, \qquad i = 1, \ldots, n$$
($Ax = b$ absent). Each of $f_0, f_1, \ldots, f_n$ is convex.
2. Does Slater's condition hold? Trivially, since the inequality constraints are affine, and there always exists some $\xi_i \ge 0$ with $y_i (w^\top x_i + b) \ge 1 - \xi_i$.
Thus strong duality holds; the problem is differentiable, hence the KKT conditions hold at the global optimum.

53 Support vector classification: Lagrangian
The Lagrangian:
$$L(w, b, \xi, \alpha, \lambda) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \left( 1 - y_i (w^\top x_i + b) - \xi_i \right) + \sum_{i=1}^n \lambda_i (-\xi_i),$$
with dual variable constraints $\alpha_i \ge 0$, $\lambda_i \ge 0$.
Minimize w.r.t. the primal variables $w$, $b$, and $\xi$.

54 Support vector classification: Lagrangian
Derivative w.r.t. $w$:
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^n \alpha_i y_i x_i = 0 \quad \Longrightarrow \quad w = \sum_{i=1}^n \alpha_i y_i x_i. \qquad (16)$$
Derivative w.r.t. $b$:
$$\frac{\partial L}{\partial b} = -\sum_{i=1}^n y_i \alpha_i = 0. \qquad (17)$$
Derivative w.r.t. $\xi_i$:
$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \lambda_i = 0 \quad \Longrightarrow \quad \alpha_i = C - \lambda_i. \qquad (18)$$
Noting that $\lambda_i \ge 0$, this gives $\alpha_i \le C$.

55 Support vector classification: Lagrangian
Now use complementary slackness:
Non-margin SVs, $\alpha_i = C \ne 0$:
1. We immediately have $1 - \xi_i = y_i (w^\top x_i + b)$.
2. Also, from the condition $\alpha_i = C - \lambda_i$, we have $\lambda_i = 0$ (hence we can have $\xi_i > 0$).
Margin SVs, $0 < \alpha_i < C$:
1. We again have $1 - \xi_i = y_i (w^\top x_i + b)$.
2. This time, from $\alpha_i = C - \lambda_i$, we have $\lambda_i \ne 0$, hence $\xi_i = 0$.
Non-SVs, $\alpha_i = 0$:
1. This time we can have $y_i (w^\top x_i + b) > 1 - \xi_i$.
2. From $\alpha_i = C - \lambda_i$, we have $\lambda_i \ne 0$, hence $\xi_i = 0$.

59 The support vectors
We observe:
1. The solution is sparse: points which are neither on the margin nor margin errors have $\alpha_i = 0$.
2. The support vectors: only those points on the margin, or which are margin errors, contribute.
3. The influence of the non-margin SVs is bounded, since their weight cannot exceed $C$.

60 Support vector classification: dual function
Thus, our goal is to maximize the dual,
$$\begin{aligned}
g(\alpha, \lambda) &= \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \alpha_i \left( 1 - y_i (w^\top x_i + b) - \xi_i \right) + \sum_{i=1}^n \lambda_i (-\xi_i) \\
&= -\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j + \sum_{i=1}^n (C - \alpha_i - \lambda_i) \xi_i - b \underbrace{\sum_{i=1}^n \alpha_i y_i}_{0} + \sum_{i=1}^n \alpha_i \\
&= \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j,
\end{aligned}$$
where we substituted $w = \sum_i \alpha_i y_i x_i$ and used (17) and (18).

61 Support vector classification: dual function
Maximize the dual,
$$g(\alpha) = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j,$$
subject to the constraints
$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^n y_i \alpha_i = 0.$$
This is a quadratic program.
Offset $b$: for the margin SVs, we have $1 = y_i (w^\top x_i + b)$. Obtain $b$ from any of these, or take an average.
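
A sketch of this dual quadratic program for a linear kernel, again using SciPy's SLSQP as a stand-in for a proper QP or SMO solver; the synthetic data and $C = 1$ are illustrative choices. It recovers $w$ from (16) and $b$ from the margin SVs, and shows that most $\alpha_i$ are zero.

```python
# C-SVM dual with a linear kernel:
#   max g(alpha) = sum(alpha) - 0.5 * alpha^T Q alpha,  Q_ij = y_i y_j x_i^T x_j,
#   s.t. 0 <= alpha_i <= C and sum_i y_i alpha_i = 0.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, C = 40, 1.0
X = np.vstack([rng.normal([2, 2], 1.0, (n // 2, 2)),
               rng.normal([-2, -2], 1.0, (n // 2, 2))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

Yx = y[:, None] * X
Q = Yx @ Yx.T                                        # Q_ij = y_i y_j x_i^T x_j

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),  # minimize -g(alpha)
               np.zeros(n), method='SLSQP',
               bounds=[(0.0, C)] * n,
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}],
               options={'maxiter': 500})
alpha = res.x

w = (alpha * y) @ X                                  # w = sum_i alpha_i y_i x_i, eq. (16)
margin_sv = (alpha > 1e-6) & (alpha < C - 1e-6)      # 0 < alpha_i < C
b = np.mean(y[margin_sv] - X[margin_sv] @ w)         # from 1 = y_i (w^T x_i + b)
print("support vectors:", int((alpha > 1e-6).sum()), "of", n)
print("dual value:", -res.fun, " w =", w, " b =", b)
```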

62 Support vector classification: kernel version
(Figure 7.9 from Schölkopf and Smola, 2002: a toy problem, separating circles from disks, solved using ν-SV classification, with parameter values of ν ranging from 0.1 (top left) to 0.8 (bottom right). The larger we make ν, the more points are allowed to lie inside the margin, depicted by dotted lines. Results are shown for a Gaussian kernel, $k(x, x') = \exp(-\|x - x'\|^2)$.)

63 Support vector classification: kernel version
Maximum margin classifier in an RKHS: write the hinge loss formulation
$$\min_{w \in \mathcal{H}} \left( \frac{1}{2} \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^n \theta\left( y_i \langle w, k(x_i, \cdot) \rangle_{\mathcal{H}} \right) \right)$$
for the RKHS $\mathcal{H}$ with kernel $k(x, \cdot)$.
Use the result of the representer theorem,
$$w(\cdot) = \sum_{i=1}^n \beta_i k(x_i, \cdot).$$
Maximizing the margin is equivalent to minimizing $\|w\|_{\mathcal{H}}^2$: for many RKHSs this is a smoothness constraint (e.g. Gaussian kernel).

64 Support vector classification: kernel version
Substituting, and introducing the $\xi_i$ variables, we get
$$\min_{\beta, \xi} \left( \frac{1}{2} \beta^\top K \beta + C \sum_{i=1}^n \xi_i \right),$$
where the matrix $K$ has $(i,j)$th entry $K_{ij} = k(x_i, x_j)$, subject to
$$\xi_i \ge 0, \qquad y_i \sum_{j=1}^n \beta_j k(x_i, x_j) \ge 1 - \xi_i.$$
This is convex in $(\beta, \xi)$ since $K$ is positive definite: hence strong duality holds. Dual:
$$g(\alpha) = \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j k(x_i, x_j),$$
subject to $0 \le \alpha_i \le C$, where
$$w(\cdot) = \sum_{i=1}^n y_i \alpha_i k(x_i, \cdot). \qquad (19)$$
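
A sketch of the kernelized dual with a Gaussian kernel. Since $b$ has been dropped here, only the box constraint $0 \le \alpha_i \le C$ remains, and plain projected gradient ascent on the concave dual is enough; the ring-versus-cluster data, kernel bandwidth, step size and iteration count are illustrative choices (real implementations use SMO-style solvers).

```python
# Kernel SVM dual (no offset b): maximize sum(alpha) - 0.5*alpha^T Q alpha
# over the box 0 <= alpha_i <= C, where Q_ij = y_i y_j k(x_i, x_j).
import numpy as np

rng = np.random.default_rng(1)
n, C = 60, 1.0
theta = rng.uniform(0, 2 * np.pi, n // 2)
outer = np.column_stack([3 * np.cos(theta), 3 * np.sin(theta)]) \
        + 0.2 * rng.standard_normal((n // 2, 2))
inner = 0.7 * rng.standard_normal((n // 2, 2))
X = np.vstack([inner, outer])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

def k(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

Q = (y[:, None] * y[None, :]) * k(X, X)

alpha = np.zeros(n)
eta = 1.0 / np.linalg.eigvalsh(Q).max()           # safe step size for the concave quadratic
for _ in range(2000):
    grad = 1.0 - Q @ alpha                        # gradient of g(alpha)
    alpha = np.clip(alpha + eta * grad, 0.0, C)   # ascent step, then project onto the box

def f(x_new):
    """Decision function f(x) = sum_i y_i alpha_i k(x_i, x), cf. (19)."""
    return (y * alpha) @ k(X, x_new[None, :])[:, 0]

preds = np.sign([f(x) for x in X])
print("training accuracy:", (preds == y).mean(), " #SVs:", int((alpha > 1e-6).sum()))
```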

66 Support vector classification: the ν-SVM
Another kind of SVM, the ν-SVM: $C$ is hard to interpret, so modify the formulation to get a more intuitive parameter $\nu$. Again, we drop $b$ for simplicity. Solve
$$\min_{w, \rho, \xi} \left( \frac{1}{2} \|w\|^2 - \nu \rho + \frac{1}{n} \sum_{i=1}^n \xi_i \right)$$
subject to
$$\rho \ge 0, \qquad \xi_i \ge 0, \qquad y_i w^\top x_i \ge \rho - \xi_i,$$
where we now directly adjust the margin width $\rho$.

67 The ν-SVM: Lagrangian
$$L = \frac{1}{2} \|w\|_{\mathcal{H}}^2 + \frac{1}{n} \sum_{i=1}^n \xi_i - \nu \rho + \sum_{i=1}^n \alpha_i \left( \rho - y_i w^\top x_i - \xi_i \right) + \sum_{i=1}^n \beta_i (-\xi_i) + \gamma (-\rho),$$
for dual variables $\alpha_i \ge 0$, $\beta_i \ge 0$, and $\gamma \ge 0$.
Differentiating and setting to zero for each of the primal variables $w$, $\xi$, $\rho$:
$$w = \sum_{i=1}^n \alpha_i y_i x_i,$$
$$\alpha_i + \beta_i = \frac{1}{n}, \qquad (20)$$
$$\sum_{i=1}^n \alpha_i - \gamma = \nu. \qquad (21)$$
From $\beta_i \ge 0$, equation (20) implies $0 \le \alpha_i \le \frac{1}{n}$.

69 Complementary slackness (1)
Complementary slackness conditions: assume $\rho > 0$ at the global solution, hence $\gamma = 0$, and
$$\sum_{i=1}^n \alpha_i = \nu. \qquad (22)$$
Case of $\xi_i > 0$: complementary slackness states $\beta_i = 0$, hence from (20) we have $\alpha_i = \frac{1}{n}$. Denote this set as $N(\alpha)$. Then
$$\frac{|N(\alpha)|}{n} = \sum_{i \in N(\alpha)} \frac{1}{n} = \sum_{i \in N(\alpha)} \alpha_i \le \sum_{i=1}^n \alpha_i = \nu,$$
so
$$\frac{|N(\alpha)|}{n} \le \nu,$$
and $\nu$ is an upper bound on the fraction of non-margin SVs (margin errors).

71 Complementary slackness (2)
Case of $\xi_i = 0$: here $\beta_i > 0$, and so $\alpha_i < \frac{1}{n}$. Denote by $M(\alpha)$ the set of points with $\frac{1}{n} > \alpha_i > 0$. Then from (22),
$$\nu = \sum_{i=1}^n \alpha_i = \sum_{i \in N(\alpha)} \frac{1}{n} + \sum_{i \in M(\alpha)} \alpha_i \le \sum_{i \in M(\alpha) \cup N(\alpha)} \frac{1}{n},$$
thus
$$\frac{|N(\alpha)| + |M(\alpha)|}{n} \ge \nu,$$
and $\nu$ is a lower bound on the fraction of support vectors with non-zero weight (both on the margin, and margin errors).

72 Dual for the ν-SVM
Substituting into the Lagrangian, we get
$$\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j - \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j + \sum_{i=1}^n \left( \frac{1}{n} - \alpha_i - \beta_i \right) \xi_i + \rho \left( \sum_{i=1}^n \alpha_i - \nu - \gamma \right)$$
$$= -\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j.$$
Maximize:
$$g(\alpha) = -\frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^\top x_j,$$
subject to
$$\sum_{i=1}^n \alpha_i \ge \nu, \qquad 0 \le \alpha_i \le \frac{1}{n}.$$
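
A sketch solving this ν-SVM dual with SciPy's SLSQP and printing the two bounds derived on the previous slides: the fraction of margin errors should be at most $\nu$, and the fraction of support vectors at least $\nu$. The overlapping two-cluster data, $\nu = 0.3$ and the tolerances are illustrative choices, and the counts are sensitive to solver precision.

```python
# nu-SVM dual (b dropped): maximize -0.5 * alpha^T Q alpha
#   s.t. sum(alpha) >= nu and 0 <= alpha_i <= 1/n.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, nu = 40, 0.3
X = np.vstack([rng.normal([1.5, 1.5], 1.0, (n // 2, 2)),
               rng.normal([-1.5, -1.5], 1.0, (n // 2, 2))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])
Yx = y[:, None] * X
Q = Yx @ Yx.T                                        # linear kernel: Q_ij = y_i y_j x_i^T x_j

res = minimize(lambda a: 0.5 * a @ Q @ a,            # minimize -g(alpha)
               np.full(n, nu / n), method='SLSQP',   # feasible starting point
               bounds=[(0.0, 1.0 / n)] * n,
               constraints=[{'type': 'ineq', 'fun': lambda a: a.sum() - nu}],
               options={'maxiter': 500})
alpha, tol = res.x, 1e-6

frac_margin_errors = (alpha > 1.0 / n - tol).mean()  # points with alpha_i = 1/n
frac_svs = (alpha > tol).mean()                      # points with alpha_i > 0
print(frac_margin_errors, "<=", nu, "<=", frac_svs)
```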

74 Questions?
