Lecture 9: Support Vector Machines
- Avice George
1 Advanced Topics in Machine Learning: COMPGI13 Gatsby Unit, CSML, UCL March 15, 2016
2 Overview The representer theorem; review of convex optimization; support vector classification: the C-SVM and ν-SVM.
3 Representer theorem
4 Learning problem: setting Given a set of paired observations $(x_1, y_1), \ldots, (x_n, y_n)$ (regression or classification). Find the function $f$ in the RKHS $\mathcal{H}$ which satisfies
$$f^{*} = \arg\min_{f \in \mathcal{H}} J(f), \qquad (1)$$
where
$$J(f) = L_y\big(f(x_1), \ldots, f(x_n)\big) + \Omega\big(\|f\|_{\mathcal{H}}^{2}\big),$$
$\Omega$ is non-decreasing, $y$ is the vector of the $y_i$, and the loss $L$ depends on $x_i$ only via $f(x_i)$.
Classification: $L_y\big(f(x_1), \ldots, f(x_n)\big) = \sum_{i=1}^{n} I\big(y_i f(x_i) \le 0\big)$.
Regression: $L_y\big(f(x_1), \ldots, f(x_n)\big) = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^{2}$.
5 Representer theorem The representer theorem: a solution to
$$\min_{f \in \mathcal{H}} \Big[ L_y\big(f(x_1), \ldots, f(x_n)\big) + \Omega\big(\|f\|_{\mathcal{H}}^{2}\big) \Big]$$
takes the form
$$f^{*} = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot).$$
If $\Omega$ is strictly increasing, the solution must have this form.
6 Representer theorem: proof Proof: Denote by $f_s$ the projection of $f$ onto the subspace
$$\mathrm{span}\{ k(x_i, \cdot) : 1 \le i \le n \}, \qquad (2)$$
such that $f = f_s + f_\perp$, where $f_s = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot)$. Regularizer:
$$\|f\|_{\mathcal{H}}^{2} = \|f_s\|_{\mathcal{H}}^{2} + \|f_\perp\|_{\mathcal{H}}^{2} \ge \|f_s\|_{\mathcal{H}}^{2},$$
then $\Omega\big(\|f\|_{\mathcal{H}}^{2}\big) \ge \Omega\big(\|f_s\|_{\mathcal{H}}^{2}\big)$, so this term is minimized for $f = f_s$.
7 Representer theorem: proof Proof (cont.): Individual terms $f(x_i)$ in the loss:
$$f(x_i) = \langle f, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s + f_\perp, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_s, k(x_i, \cdot) \rangle_{\mathcal{H}},$$
so $L_y\big(f(x_1), \ldots, f(x_n)\big) = L_y\big(f_s(x_1), \ldots, f_s(x_n)\big)$. Hence: the loss $L(\ldots)$ only depends on the component of $f$ in the data subspace, and the regularizer $\Omega(\ldots)$ is minimized when $f = f_s$. If $\Omega$ is non-decreasing, then $\|f_\perp\|_{\mathcal{H}} = 0$ is a minimum. If $\Omega$ is strictly increasing, the minimum is unique.
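As a concrete instance of the theorem (my own sketch, not from the slides: a Gaussian kernel, regularizer $\Omega(t) = \lambda t$, squared loss, and made-up data), the problem reduces to solving a finite $n \times n$ linear system for the coefficients $\alpha$ of $f^{*} = \sum_i \alpha_i k(x_i, \cdot)$, i.e. kernel ridge regression:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))            # synthetic inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 1e-3
K = gaussian_kernel(X, X)
# Squared loss + lam * ||f||_H^2: the minimizer satisfies (K + n*lam*I) alpha = y
alpha = np.linalg.solve(K + len(y) * lam * np.eye(len(y)), y)

f_train = K @ alpha                             # f*(x_i) = sum_j alpha_j k(x_j, x_i)
print(float(np.abs(f_train - y).max()))         # small training residual
```

The infinite-dimensional search over $\mathcal{H}$ has become a 30-dimensional linear solve, which is the practical content of the theorem.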
8 Short overview of convex optimization
11 Why we need optimization: SVM idea Classify two clouds of points, where there exists a hyperplane which linearly separates one cloud from the other without error. The smallest distance from each class to the separating hyperplane $w^{\top} x + b = 0$ is called the margin.
12 Why we need optimization: SVM idea This problem can be expressed as follows:
$$\max_{w,b} (\text{margin}) = \max_{w,b} \frac{2}{\|w\|} \quad \text{or} \quad \min_{w,b} \|w\|^{2} \qquad (3)$$
subject to
$$\begin{cases} w^{\top} x_i + b \ge 1 & i : y_i = +1, \\ w^{\top} x_i + b \le -1 & i : y_i = -1. \end{cases} \qquad (4)$$
This is a convex optimization problem.
13 Convex set (Figure from Boyd and Vandenberghe.) The leftmost set is convex; the remaining two are not. Every point in the set can be seen from any other point in the set, along a straight line that never leaves the set. Definition: $C$ is convex if for all $x_1, x_2 \in C$ and any $0 \le \theta \le 1$ we have $\theta x_1 + (1 - \theta) x_2 \in C$, i.e. every point on the line between $x_1$ and $x_2$ lies in $C$.
14 Convex function: no local optima (Figure from Boyd and Vandenberghe.) Definition (Convex function): A function $f$ is convex if its domain $\mathrm{dom} f$ is a convex set and if for all $x, y \in \mathrm{dom} f$ and any $0 \le \theta \le 1$,
$$f\big(\theta x + (1 - \theta) y\big) \le \theta f(x) + (1 - \theta) f(y).$$
The function is strictly convex if the inequality is strict for $x \ne y$.
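A quick numerical illustration (my own, not from the slides): for the convex function $f(x) = x^2$, the defining inequality can be checked on random samples, since $\theta f(x) + (1-\theta) f(y) - f(\theta x + (1-\theta) y) = \theta(1-\theta)(x-y)^2 \ge 0$:

```python
import numpy as np

f = lambda x: x ** 2                            # a convex function
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 1000)
y = rng.uniform(-5, 5, 1000)
theta = rng.uniform(0, 1, 1000)

# gap = theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y) >= 0 for convex f
gap = theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)
print(bool(gap.min() >= -1e-9))                 # → True
```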
15 Optimization and the Lagrangian (Generic) optimization problem on $x \in \mathbb{R}^n$:
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0 \quad i = 1, \ldots, m \\ & h_i(x) = 0 \quad i = 1, \ldots, p. \end{aligned} \qquad (5)$$
$p^{*}$ is the optimal value of (5); $\mathcal{D}$ is assumed nonempty, where
$$\mathcal{D} := \bigcap_{i=0}^{m} \mathrm{dom} f_i \cap \bigcap_{i=1}^{p} \mathrm{dom} h_i$$
($\mathrm{dom} f_i$ = subset of $\mathbb{R}^n$ where $f_i$ is defined). Ideally we would want an unconstrained problem:
$$\text{minimize} \quad f_0(x) + \sum_{i=1}^{m} I_{-}\big(f_i(x)\big) + \sum_{i=1}^{p} I_0\big(h_i(x)\big),$$
where $I_{-}(u) = 0$ for $u \le 0$, $I_{-}(u) = \infty$ for $u > 0$, and $I_0(u)$ is the indicator of 0. Why is this hard to solve?
17 Lower bound interpretation of Lagrangian The Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is a lower bound on the original problem:
$$L(x, \lambda, \nu) := f_0(x) + \underbrace{\sum_{i=1}^{m} \lambda_i f_i(x)}_{\le\, I_{-}(f_i(x))} + \underbrace{\sum_{i=1}^{p} \nu_i h_i(x)}_{\le\, I_0(h_i(x))},$$
and has domain $\mathrm{dom} L := \mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. The vectors $\lambda$ and $\nu$ are called Lagrange multipliers or dual variables. To ensure a lower bound, we require $\lambda \succeq 0$.
18 Lagrange dual: lower bound on optimum $p^{*}$ The Lagrange dual function: minimize the Lagrangian. When $\lambda \succeq 0$ and $f_i(x) \le 0$, the Lagrange dual function is
$$g(\lambda, \nu) := \inf_{x \in \mathcal{D}} L(x, \lambda, \nu). \qquad (6)$$
A dual feasible pair $(\lambda, \nu)$ is a pair for which $\lambda \succeq 0$ and $(\lambda, \nu) \in \mathrm{dom}\, g$. We will show (next slide): for any $\lambda \succeq 0$ and $\nu$, $g(\lambda, \nu) \le f_0(x)$ wherever $f_i(x) \le 0$ and $h_i(x) = 0$ (including at $f_0(x^{*}) = p^{*}$).
20 Lagrange dual: lower bound on optimum $p^{*}$ Simplest example: minimize over $x$ the function $L(x, \lambda) = f_0(x) + \lambda f_1(x)$. (Figure modified from Boyd and Vandenberghe.) Reminders: $f_0$ is the function to be minimized; $f_1 \le 0$ is the inequality constraint; $\lambda \ge 0$ is the Lagrange multiplier; $p^{*}$ is the minimum of $f_0$ in the constraint set.
23 Lagrange dual: lower bound on optimum $p^{*}$ When $\lambda \succeq 0$, then for all $\nu$ we have
$$g(\lambda, \nu) \le p^{*}. \qquad (7)$$
A dual feasible pair $(\lambda, \nu)$ is a pair for which $\lambda \succeq 0$ and $(\lambda, \nu) \in \mathrm{dom}\, g$. (Figure from Boyd and Vandenberghe.) Reminders: $g(\lambda, \nu) := \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$; $\lambda \ge 0$ is the Lagrange multiplier; $p^{*}$ is the minimum of $f_0$ in the constraint set.
24 Lagrange dual is lower bound on $p^{*}$ (proof) We now give a formal proof that the Lagrange dual function $g(\lambda, \nu)$ lower bounds $p^{*}$. Proof: Assume $x$ is feasible, i.e. $f_i(x) \le 0$, $h_i(x) = 0$, $x \in \mathcal{D}$, $\lambda \succeq 0$. Then
$$\sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \le 0.$$
Thus
$$g(\lambda, \nu) := \inf_{x' \in \mathcal{D}} \left( f_0(x') + \sum_{i=1}^{m} \lambda_i f_i(x') + \sum_{i=1}^{p} \nu_i h_i(x') \right) \le f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \le f_0(x).$$
This holds for every feasible $x$, hence the lower bound holds.
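A worked example may help here (my own toy problem, not from the slides): minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, so $p^{*} = 1$ at $x = 1$. The dual $g(\lambda) = \inf_x [x^2 + \lambda(1 - x)] = \lambda - \lambda^2/4$ sits below $p^{*}$ for every $\lambda \ge 0$, which is exactly the weak-duality bound:

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0, so p* = 1 (at x = 1).
# Dual function: g(lam) = inf_x [x^2 + lam*(1 - x)], attained at x = lam/2,
# giving g(lam) = lam - lam**2 / 4.
lam = np.linspace(0.0, 10.0, 1001)
g = lam - lam ** 2 / 4.0
p_star = 1.0

print(bool(np.all(g <= p_star + 1e-9)))   # weak duality: every g(lam) <= p*
print(float(g.max()))                     # best bound, attained near lam = 2
```

Here the best lower bound actually equals $p^{*}$, i.e. strong duality holds for this problem (it is convex and Slater's condition applies, as discussed later).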
29 Best lower bound: maximize the dual Best lower bound $g(\lambda, \nu)$ on the optimal solution $p^{*}$ of (5): the Lagrange dual problem is
$$\text{maximize} \quad g(\lambda, \nu) \quad \text{subject to} \quad \lambda \succeq 0. \qquad (8)$$
Dual feasible: $(\lambda, \nu)$ with $\lambda \succeq 0$ and $g(\lambda, \nu) > -\infty$. Dual optimal: solutions $(\lambda^{*}, \nu^{*})$ to the dual problem; $d^{*}$ is the optimal value (the dual is always easy to maximize: next slide). Weak duality always holds: $d^{*} \le p^{*}$. Strong duality does not always hold (conditions given later): $d^{*} = p^{*}$. If strong duality holds: solve the easy (concave) dual problem to find $p^{*}$.
30 Maximizing the dual is always easy The Lagrange dual function: minimize the Lagrangian (lower bound)
$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu).$$
The dual function is a pointwise infimum of affine functions of $(\lambda, \nu)$, hence concave in $(\lambda, \nu)$, with convex constraint set $\lambda \succeq 0$. Example: one inequality constraint, $L(x, \lambda) = f_0(x) + \lambda f_1(x)$, and assume there are only four possible values for $x$. Each line represents a different $x$.
31 How do we know if strong duality holds? Conditions under which strong duality holds are called constraint qualifications (they are sufficient, but not necessary). (Probably) the best known sufficient condition: strong duality holds if the primal problem is convex, i.e. of the form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0 \quad i = 1, \ldots, m \\ & Ax = b \end{aligned} \qquad (10)$$
for convex $f_0, \ldots, f_m$, and Slater's condition holds: there exists some strictly feasible point¹ $\tilde{x} \in \mathrm{relint}(\mathcal{D})$ such that
$$f_i(\tilde{x}) < 0 \quad i = 1, \ldots, m, \qquad A\tilde{x} = b.$$
¹We denote by $\mathrm{relint}(\mathcal{D})$ the relative interior of the set $\mathcal{D}$. This looks like the interior of the set, but is non-empty even when the set is a subspace of a larger space.
33 How do we know if strong duality holds? Strong duality holds if the primal problem is convex, i.e. of the form
$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 0 \quad i = 1, \ldots, m \\ & Ax = b \end{aligned}$$
for convex $f_0, \ldots, f_m$. Slater's condition for the case of affine $f_i$ is trivial (the inequality constraints are no longer strict, and reduce to the original inequality constraints):
$$f_i(\tilde{x}) \le 0 \quad i = 1, \ldots, m, \qquad A\tilde{x} = b.$$
34 A consequence of strong duality... Assume the primal optimal value is equal to the dual optimal value. What are the consequences? $x^{*}$ is the solution of the original problem (minimum of $f_0$ under constraints), $(\lambda^{*}, \nu^{*})$ the solutions to the dual.
$$\begin{aligned} f_0(x^{*}) &= g(\lambda^{*}, \nu^{*}) && \text{(assumed)} \\ &= \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^{m} \lambda_i^{*} f_i(x) + \sum_{i=1}^{p} \nu_i^{*} h_i(x) \right) && \text{(definition of } g\text{)} \\ &\le f_0(x^{*}) + \sum_{i=1}^{m} \lambda_i^{*} f_i(x^{*}) + \sum_{i=1}^{p} \nu_i^{*} h_i(x^{*}) && \text{(definition of inf)} \\ &\le f_0(x^{*}). && (4) \end{aligned}$$
(4): $(x^{*}, \lambda^{*}, \nu^{*})$ satisfies $\lambda^{*} \succeq 0$, $f_i(x^{*}) \le 0$, and $h_i(x^{*}) = 0$.
36 ...is complementary slackness From the previous slide,
$$\lambda_i^{*} f_i(x^{*}) = 0, \qquad (11)$$
which is the condition of complementary slackness. This means
$$\lambda_i^{*} > 0 \implies f_i(x^{*}) = 0, \qquad f_i(x^{*}) < 0 \implies \lambda_i^{*} = 0.$$
From $\lambda_i^{*}$, we can read off which inequality constraints are active.
37 KKT conditions for global optimum Assume the functions $f_i$, $h_i$ are differentiable and strong duality holds. Since $x^{*}$ minimizes $L(x, \lambda^{*}, \nu^{*})$, the derivative at $x^{*}$ is zero:
$$\nabla f_0(x^{*}) + \sum_{i=1}^{m} \lambda_i^{*} \nabla f_i(x^{*}) + \sum_{i=1}^{p} \nu_i^{*} \nabla h_i(x^{*}) = 0.$$
KKT conditions (definition): we are at the global optimum, $(x, \lambda, \nu) = (x^{*}, \lambda^{*}, \nu^{*})$, when (a) strong duality holds, and (b)
$$\begin{aligned} \nabla f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla f_i(x) + \sum_{i=1}^{p} \nu_i \nabla h_i(x) &= 0, \\ f_i(x) \le 0, \quad i &= 1, \ldots, m, \\ h_i(x) = 0, \quad i &= 1, \ldots, p, \\ \lambda_i \ge 0, \quad i &= 1, \ldots, m, \\ \lambda_i f_i(x) = 0, \quad i &= 1, \ldots, m. \end{aligned}$$
39 KKT conditions for global optimum In summary: if the primal problem is convex and the constraint functions satisfy Slater's condition, then strong duality holds. If in addition the functions $f_i$, $h_i$ are differentiable, then the KKT conditions are necessary and sufficient for optimality.
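The KKT conditions can be checked by hand on a small example (mine, not the lecture's): for minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, the candidate $x^{*} = 1$, $\lambda^{*} = 2$ satisfies all of them:

```python
# Problem: minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# Candidate optimum: x* = 1 with multiplier lam* = 2.
x_star, lam_star = 1.0, 2.0

stationarity = 2 * x_star + lam_star * (-1.0)   # f0'(x*) + lam* f1'(x*)
primal_feas = 1.0 - x_star                      # f1(x*), must be <= 0
comp_slack = lam_star * (1.0 - x_star)          # lam* f1(x*), must be 0

print(stationarity, primal_feas, comp_slack)    # → 0.0 0.0 0.0
```

Since the problem is convex and differentiable, satisfying these conditions certifies that $x^{*} = 1$ is the global optimum.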
40 Support vector classification
41 Linearly separable points Classify two clouds of points, where there exists a hyperplane which linearly separates one cloud from the other without error. The smallest distance from each class to the separating hyperplane $w^{\top} x + b = 0$ is called the margin.
42 Maximum margin classifier, linearly separable case This problem can be expressed as follows:
$$\max_{w,b} (\text{margin}) = \max_{w,b} \frac{2}{\|w\|} \quad \text{or} \quad \min_{w,b} \|w\|^{2} \qquad (12)$$
subject to
$$\begin{cases} \min_i \big( w^{\top} x_i + b \big) = 1 & i : y_i = +1, \\ \max_i \big( w^{\top} x_i + b \big) = -1 & i : y_i = -1. \end{cases} \qquad (13)$$
The resulting classifier is $y = \mathrm{sign}(w^{\top} x + b)$. We can rewrite to obtain
$$\min_{w,b} \|w\|^{2} \quad \text{subject to} \quad y_i \big( w^{\top} x_i + b \big) \ge 1. \qquad (14)$$
44 Maximum margin classifier: with errors allowed Allow "errors": points within the margin, or even on the wrong side of the decision boundary. Ideally:
$$\min_{w,b} \left( \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} I\big[ y_i (w^{\top} x_i + b) < 0 \big] \right),$$
where $C$ controls the tradeoff between maximum margin and loss. Replace with a convex upper bound:
$$\min_{w,b} \left( \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \theta\big( y_i (w^{\top} x_i + b) \big) \right),$$
with hinge loss
$$\theta(\alpha) = (1 - \alpha)_{+} = \begin{cases} 1 - \alpha & 1 - \alpha > 0, \\ 0 & \text{otherwise.} \end{cases}$$
46 Hinge loss Hinge loss:
$$\theta(\alpha) = (1 - \alpha)_{+} = \begin{cases} 1 - \alpha & 1 - \alpha > 0, \\ 0 & \text{otherwise.} \end{cases}$$
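The claim that the hinge loss is a convex upper bound on the 0-1 loss is easy to verify numerically (a small sketch of mine, with an arbitrary grid of margin values):

```python
import numpy as np

margin = np.linspace(-3.0, 3.0, 601)            # values of alpha = y*(w.x + b)
hinge = np.maximum(0.0, 1.0 - margin)           # theta(alpha) = (1 - alpha)_+
zero_one = (margin < 0).astype(float)           # I[y*(w.x + b) < 0]

print(bool(np.all(hinge >= zero_one)))          # → True (upper bound everywhere)
```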
47 Support vector classification Substituting in the hinge loss, we get
$$\min_{w,b} \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \theta\big( y_i (w^{\top} x_i + b) \big).$$
How do you implement the hinge loss (with simple inequality constraints, for optimization)?
$$\min_{w,b,\xi} \left( \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \xi_i \right) \qquad (15)$$
subject to²
$$\xi_i \ge 0, \qquad y_i \big( w^{\top} x_i + b \big) \ge 1 - \xi_i.$$
²Either $y_i (w^{\top} x_i + b) \ge 1$ and $\xi_i = 0$ as before, or $y_i (w^{\top} x_i + b) < 1$, and then $\xi_i > 0$ takes the value satisfying $y_i (w^{\top} x_i + b) = 1 - \xi_i$.
49 Support vector classification
50 Does strong duality hold? 1. Is the optimization problem convex w.r.t. the variables $(w, b, \xi)$?
$$\begin{aligned} \text{minimize} \quad & f_0(w, b, \xi) := \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \xi_i \\ \text{subject to} \quad & f_i(w, b, \xi) := 1 - \xi_i - y_i \big( w^{\top} x_i + b \big) \le 0, \quad -\xi_i \le 0, \quad i = 1, \ldots, n \end{aligned}$$
($Ax = b$ absent). Each of $f_0, f_1, \ldots, f_n$ is convex. 2. Does Slater's condition hold? Trivially, since the inequality constraints are affine, and there always exist some $\xi_i \ge 0$ with $y_i (w^{\top} x_i + b) \ge 1 - \xi_i$. Thus strong duality holds; the problem is differentiable, hence the KKT conditions hold at the global optimum.
53 Support vector classification: Lagrangian The Lagrangian:
$$L(w, b, \xi, \alpha, \lambda) = \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \Big( 1 - y_i \big( w^{\top} x_i + b \big) - \xi_i \Big) + \sum_{i=1}^{n} \lambda_i (-\xi_i)$$
with dual variable constraints $\alpha_i \ge 0$, $\lambda_i \ge 0$. Minimize w.r.t. the primal variables $w$, $b$, and $\xi$.
54 Support vector classification: Lagrangian Derivative w.r.t. $w$:
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \implies w = \sum_{i=1}^{n} \alpha_i y_i x_i. \qquad (16)$$
Derivative w.r.t. $b$:
$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} y_i \alpha_i = 0. \qquad (17)$$
Derivative w.r.t. $\xi_i$:
$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \lambda_i = 0 \implies \alpha_i = C - \lambda_i. \qquad (18)$$
Noting that $\lambda_i \ge 0$, we get $\alpha_i \le C$.
55 Support vector classification: Lagrangian Now use complementary slackness:
Non-margin SVs: $\alpha_i = C \ne 0$: 1. We immediately have $1 - \xi_i = y_i (w^{\top} x_i + b)$. 2. Also, from the condition $\alpha_i = C - \lambda_i$, we have $\lambda_i = 0$ (hence we can have $\xi_i > 0$).
Margin SVs: $0 < \alpha_i < C$: 1. We again have $1 - \xi_i = y_i (w^{\top} x_i + b)$. 2. This time, from $\alpha_i = C - \lambda_i$, we have $\lambda_i \ne 0$, hence $\xi_i = 0$.
Non-SVs: $\alpha_i = 0$: 1. This time we can have $y_i (w^{\top} x_i + b) > 1 - \xi_i$. 2. From $\alpha_i = C - \lambda_i$, we have $\lambda_i \ne 0$, hence $\xi_i = 0$.
59 The support vectors We observe: 1. The solution is sparse: points which are neither on the margin nor margin errors have $\alpha_i = 0$. 2. The support vectors: only those points on the margin, or which are margin errors, contribute. 3. The influence of the non-margin SVs is bounded, since their weight cannot exceed $C$.
60 Support vector classification: dual function Thus, our goal is to maximize the dual,
$$\begin{aligned} g(\alpha, \lambda) &= \frac{1}{2} \|w\|^{2} + C \sum_{i=1}^{n} \xi_i + \sum_{i=1}^{n} \alpha_i \Big( 1 - y_i \big( w^{\top} x_i + b \big) - \xi_i \Big) + \sum_{i=1}^{n} \lambda_i (-\xi_i) \\ &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j + \sum_{i=1}^{n} \alpha_i - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j - b \underbrace{\sum_{i=1}^{n} \alpha_i y_i}_{0} + \sum_{i=1}^{n} (C - \alpha_i - \lambda_i) \xi_i \\ &= \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j. \end{aligned}$$
61 Support vector classification: dual function Maximize the dual,
$$g(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{\top} x_j,$$
subject to the constraints
$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} y_i \alpha_i = 0.$$
This is a quadratic program. Offset $b$: for the margin SVs, we have $1 = y_i (w^{\top} x_i + b)$. Obtain $b$ from any of these, or take an average.
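To make the dual concrete, here is a rough numerical sketch (my own, not the lecture's code): synthetic separable clouds, the offset $b$ dropped for simplicity (as the slides later do for the ν-SVM, so the constraint $\sum_i y_i \alpha_i = 0$ disappears), and plain projected gradient ascent on $g(\alpha)$ in place of a proper QP solver:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic separable clouds, labels +1 / -1 (made-up data)
X = np.vstack([rng.normal([2.0, 2.0], 0.3, size=(20, 2)),
               rng.normal([-2.0, -2.0], 0.3, size=(20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

C = 1.0
Q = (y[:, None] * y[None, :]) * (X @ X.T)       # Q_ij = y_i y_j x_i.x_j

alpha = np.zeros(len(y))
for _ in range(5000):
    grad = 1.0 - Q @ alpha                      # gradient of g(alpha)
    alpha = np.clip(alpha + 1e-3 * grad, 0.0, C)  # ascent step, project onto [0, C]

w = (alpha * y) @ X                             # recover w = sum_i alpha_i y_i x_i
accuracy = float((np.sign(X @ w) == y).mean())
print(accuracy)                                 # → 1.0
```

In practice one would use a dedicated QP or SMO-style solver, which also handles the equality constraint needed when $b$ is kept.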
62 Support vector classification: kernel version Figure 7.9 (taken from Schölkopf and Smola, 2002): toy problem (task: separate circles from disks) solved using $\nu$-SV classification, with parameter values $\nu$ ranging from 0.1 (top left) to 0.8 (bottom right). The larger we make $\nu$, the more points are allowed to lie inside the margin (depicted by dotted lines). Results are shown for a Gaussian kernel, $k(x, x') = \exp(-\|x - x'\|^{2})$.
63 Support vector classification: kernel version Maximum margin classifier in an RKHS: write the hinge loss formulation
$$\min_{w} \left( \frac{1}{2} \|w\|_{\mathcal{H}}^{2} + C \sum_{i=1}^{n} \theta\big( y_i \langle w, k(x_i, \cdot) \rangle_{\mathcal{H}} \big) \right)$$
for the RKHS $\mathcal{H}$ with kernel $k(x, \cdot)$. Use the result of the representer theorem,
$$w(\cdot) = \sum_{i=1}^{n} \beta_i k(x_i, \cdot).$$
Maximizing the margin is equivalent to minimizing $\|w\|_{\mathcal{H}}^{2}$: for many RKHSs this is a smoothness constraint (e.g. Gaussian kernel).
64 Support vector classification: kernel version Substituting and introducing the $\xi_i$ variables, we get
$$\min_{\beta, \xi} \left( \frac{1}{2} \beta^{\top} K \beta + C \sum_{i=1}^{n} \xi_i \right)$$
where the matrix $K$ has $(i,j)$th entry $K_{ij} = k(x_i, x_j)$, subject to
$$\xi_i \ge 0, \qquad y_i \sum_{j=1}^{n} \beta_j k(x_i, x_j) \ge 1 - \xi_i.$$
This is convex in $\beta, \xi$ since $K$ is positive definite: hence strong duality holds. Dual:
$$g(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j), \qquad (19)$$
subject to $0 \le \alpha_i \le C$, where $w(\cdot) = \sum_{i=1}^{n} y_i \alpha_i k(x_i, \cdot)$.
66 Support vector classification: the ν-SVM Another kind of SVM: the $\nu$-SVM. It is hard to interpret $C$; modify the formulation to get a more intuitive parameter $\nu$. Again, we drop $b$ for simplicity. Solve
$$\min_{w, \rho, \xi} \left( \frac{1}{2} \|w\|^{2} - \nu\rho + \frac{1}{n} \sum_{i=1}^{n} \xi_i \right)$$
subject to
$$\rho \ge 0, \qquad \xi_i \ge 0, \qquad y_i w^{\top} x_i \ge \rho - \xi_i;$$
we now directly adjust the margin width $\rho$.
67 The ν-svm: Lagrangian 1 2 w 2 H+ 1 n ξ i νρ+ α i ρ y i w x i ξ i )+ β i ξ i )+γ ρ) for dual variables α i 0, β i 0, and γ 0. Differentiating and setting to zero for each of the primal variables w, ξ, ρ, w = α i y i x i From β i 0, equation 20) implies α i + β i = 1 n 20) ν = α i γ 21) 0 α i n 1.
68 The ν-svm: Lagrangian 1 2 w 2 H+ 1 n ξ i νρ+ α i ρ y i w x i ξ i )+ β i ξ i )+γ ρ) for dual variables α i 0, β i 0, and γ 0. Differentiating and setting to zero for each of the primal variables w, ξ, ρ, w = α i y i x i From β i 0, equation 20) implies α i + β i = 1 n 20) ν = α i γ 21) 0 α i n 1.
69 Complementary slackness (1) Complementary slackness conditions: assume $\rho > 0$ at the global solution, hence $\gamma = 0$, and
$$\sum_{i=1}^{n} \alpha_i = \nu. \qquad (22)$$
Case of $\xi_i > 0$: complementary slackness states $\beta_i = 0$, hence from (20) we have $\alpha_i = \frac{1}{n}$. Denote this set as $N(\alpha)$. Then
$$\frac{|N(\alpha)|}{n} = \sum_{i \in N(\alpha)} \frac{1}{n} = \sum_{i \in N(\alpha)} \alpha_i \le \sum_{i=1}^{n} \alpha_i = \nu,$$
so $\frac{|N(\alpha)|}{n} \le \nu$, and $\nu$ is an upper bound on the fraction of non-margin SVs.
71 Complementary slackness (2) Case of $\xi_i = 0$: $\beta_i > 0$ and so $\alpha_i < \frac{1}{n}$. Denote by $M(\alpha)$ the set of points with $0 < \alpha_i < \frac{1}{n}$. Then from (22),
$$\nu = \sum_{i=1}^{n} \alpha_i = \sum_{i \in N(\alpha)} \frac{1}{n} + \sum_{i \in M(\alpha)} \alpha_i \le \sum_{i \in M(\alpha) \cup N(\alpha)} \frac{1}{n},$$
thus $\frac{|N(\alpha)| + |M(\alpha)|}{n} \ge \nu$, and $\nu$ is a lower bound on the fraction of support vectors with non-zero weight (both on the margin, and margin errors).
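These two bounds can be checked numerically. The sketch below (my own construction, not the lecture's) maximizes the no-offset ν-SVM dual by projected gradient ascent, projecting onto the feasible set $\{0 \le \alpha_i \le 1/n,\ \sum_i \alpha_i = \nu\}$ by bisection, and then compares $\nu$ with the fractions of margin errors ($\alpha_i = 1/n$) and of support vectors ($\alpha_i > 0$):

```python
import numpy as np

def project(v, ub, total):
    # Euclidean projection of v onto {0 <= a <= ub, sum(a) = total},
    # by bisection on the shift t in a = clip(v - t, 0, ub)
    lo, hi = v.min() - ub, v.max()
    for _ in range(100):
        t = 0.5 * (lo + hi)
        if np.clip(v - t, 0.0, ub).sum() > total:
            lo = t
        else:
            hi = t
    return np.clip(v - 0.5 * (lo + hi), 0.0, ub)

rng = np.random.default_rng(0)
# Overlapping synthetic clouds, so some margin errors exist (made-up data)
X = np.vstack([rng.normal([1.0, 1.0], 1.0, size=(40, 2)),
               rng.normal([-1.0, -1.0], 1.0, size=(40, 2))])
y = np.array([1.0] * 40 + [-1.0] * 40)
n, nu = len(y), 0.3

Q = (y[:, None] * y[None, :]) * (X @ X.T)
alpha = project(np.full(n, nu / n), 1.0 / n, nu)
for _ in range(5000):
    alpha = project(alpha - 1e-3 * (Q @ alpha), 1.0 / n, nu)  # ascent on -a'Qa/2

tol = 1e-8
margin_err_frac = float((alpha > 1.0 / n - tol).mean())  # points with alpha_i = 1/n
sv_frac = float((alpha > tol).mean())                    # points with alpha_i > 0
print(margin_err_frac <= nu <= sv_frac)                  # → True
```

The sandwich holds by the arithmetic of the two slides above: each $\alpha_i \le 1/n$ and $\sum_i \alpha_i = \nu$ force the margin-error fraction below $\nu$ and the SV fraction above it.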
72 Dual for ν-svm Substituting into the Lagrangian, we get 1 α i α j y i y j xi x j + 1 ξ i ρν 2 n = 1 2 j=1 + Maximize: subject to α i ρ j=1 α i ξ i α i α j y i y j xi x j gα) = 1 2 j=1 1 n α i j=1 ) ξ i ρ α i α j y i y j xi x j, α i ν 0 α i 1 n. α i α j y i y j xi x j ) α i ν
73 Dual for ν-svm Substituting into the Lagrangian, we get 1 α i α j y i y j xi x j + 1 ξ i ρν 2 n = 1 2 j=1 + Maximize: subject to α i ρ j=1 α i ξ i α i α j y i y j xi x j gα) = 1 2 j=1 1 n α i j=1 ) ξ i ρ α i α j y i y j xi x j, α i ν 0 α i 1 n. α i α j y i y j xi x j ) α i ν
74 Questions?
More informationIncreasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.
1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.
More informationA NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION
1 A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION Dimitri Bertsekas M.I.T. FEBRUARY 2003 2 OUTLINE Convexity issues in optimization Historical remarks Our treatment of the subject Three unifying lines of
More information3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology Handout 6 18.433: Combinatorial Optimization February 20th, 2009 Michel X. Goemans 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the
More informationMathematical finance and linear programming (optimization)
Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may
More informationSupport Vector Machines for Classification and Regression
UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer
More informationLecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
More information160 CHAPTER 4. VECTOR SPACES
160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results
More informationLinear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.
1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationLecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationMATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set.
MATH 304 Linear Algebra Lecture 9: Subspaces of vector spaces (continued). Span. Spanning set. Vector space A vector space is a set V equipped with two operations, addition V V (x,y) x + y V and scalar
More informationFurther Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1
Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing
More informationSolutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR
Solutions Of Some Non-Linear Programming Problems A PROJECT REPORT submitted by BIJAN KUMAR PATEL for the partial fulfilment for the award of the degree of Master of Science in Mathematics under the supervision
More informationLecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method
Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty
More informationLecture 7: Finding Lyapunov Functions 1
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.243j (Fall 2003): DYNAMICS OF NONLINEAR SYSTEMS by A. Megretski Lecture 7: Finding Lyapunov Functions 1
More information26 Linear Programming
The greatest flood has the soonest ebb; the sorest tempest the most sudden calm; the hottest love the coldest end; and from the deepest desire oftentimes ensues the deadliest hate. Th extremes of glory
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationIn this section, we will consider techniques for solving problems of this type.
Constrained optimisation roblems in economics typically involve maximising some quantity, such as utility or profit, subject to a constraint for example income. We shall therefore need techniques for solving
More information15 Kuhn -Tucker conditions
5 Kuhn -Tucker conditions Consider a version of the consumer problem in which quasilinear utility x 2 + 4 x 2 is maximised subject to x +x 2 =. Mechanically applying the Lagrange multiplier/common slopes
More informationSome representability and duality results for convex mixed-integer programs.
Some representability and duality results for convex mixed-integer programs. Santanu S. Dey Joint work with Diego Morán and Juan Pablo Vielma December 17, 2012. Introduction About Motivation Mixed integer
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More informationconstraint. Let us penalize ourselves for making the constraint too big. We end up with a
Chapter 4 Constrained Optimization 4.1 Equality Constraints (Lagrangians) Suppose we have a problem: Maximize 5, (x 1, 2) 2, 2(x 2, 1) 2 subject to x 1 +4x 2 =3 If we ignore the constraint, we get the
More informationMatrix Representations of Linear Transformations and Changes of Coordinates
Matrix Representations of Linear Transformations and Changes of Coordinates 01 Subspaces and Bases 011 Definitions A subspace V of R n is a subset of R n that contains the zero element and is closed under
More informationLECTURE 5: DUALITY AND SENSITIVITY ANALYSIS. 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method
LECTURE 5: DUALITY AND SENSITIVITY ANALYSIS 1. Dual linear program 2. Duality theory 3. Sensitivity analysis 4. Dual simplex method Introduction to dual linear program Given a constraint matrix A, right
More informationCost Minimization and the Cost Function
Cost Minimization and the Cost Function Juan Manuel Puerta October 5, 2009 So far we focused on profit maximization, we could look at a different problem, that is the cost minimization problem. This is
More informationChapter 4. Duality. 4.1 A Graphical Example
Chapter 4 Duality Given any linear program, there is another related linear program called the dual. In this chapter, we will develop an understanding of the dual linear program. This understanding translates
More informationSummer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.
Summer course on Convex Optimization Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.Minnesota Interior-Point Methods: the rebirth of an old idea Suppose that f is
More informationRecovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach
MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical
More informationby the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that
More information24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no
More informationSo let us begin our quest to find the holy grail of real analysis.
1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationDuality in Linear Programming
Duality in Linear Programming 4 In the preceding chapter on sensitivity analysis, we saw that the shadow-price interpretation of the optimal simplex multipliers is a very useful concept. First, these shadow
More informationUniversity of Lille I PC first year list of exercises n 7. Review
University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients
More informationTable 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationLinear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
More informationORIENTATIONS. Contents
ORIENTATIONS Contents 1. Generators for H n R n, R n p 1 1. Generators for H n R n, R n p We ended last time by constructing explicit generators for H n D n, S n 1 by using an explicit n-simplex which
More informationNo: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics
No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results
More informationInternational Doctoral School Algorithmic Decision Theory: MCDA and MOO
International Doctoral School Algorithmic Decision Theory: MCDA and MOO Lecture 2: Multiobjective Linear Programming Department of Engineering Science, The University of Auckland, New Zealand Laboratoire
More information3. Convex functions. basic properties and examples. operations that preserve convexity. the conjugate function. quasiconvex functions
3. Convex functions Convex Optimization Boyd & Vandenberghe basic properties and examples operations that preserve convexity the conjugate function quasiconvex functions log-concave and log-convex functions
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationEquilibrium computation: Part 1
Equilibrium computation: Part 1 Nicola Gatti 1 Troels Bjerre Sorensen 2 1 Politecnico di Milano, Italy 2 Duke University, USA Nicola Gatti and Troels Bjerre Sørensen ( Politecnico di Milano, Italy, Equilibrium
More informationThe Envelope Theorem 1
John Nachbar Washington University April 2, 2015 1 Introduction. The Envelope Theorem 1 The Envelope theorem is a corollary of the Karush-Kuhn-Tucker theorem (KKT) that characterizes changes in the value
More informationA FIRST COURSE IN OPTIMIZATION THEORY
A FIRST COURSE IN OPTIMIZATION THEORY RANGARAJAN K. SUNDARAM New York University CAMBRIDGE UNIVERSITY PRESS Contents Preface Acknowledgements page xiii xvii 1 Mathematical Preliminaries 1 1.1 Notation
More informationSolutions to Math 51 First Exam January 29, 2015
Solutions to Math 5 First Exam January 29, 25. ( points) (a) Complete the following sentence: A set of vectors {v,..., v k } is defined to be linearly dependent if (2 points) there exist c,... c k R, not
More informationFóra Gyula Krisztián. Predictive analysis of financial time series
Eötvös Loránd University Faculty of Science Fóra Gyula Krisztián Predictive analysis of financial time series BSc Thesis Supervisor: Lukács András Department of Computer Science Budapest, June 2014 Acknowledgements
More informationOn the Interaction and Competition among Internet Service Providers
On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers
More informationLarge-Scale Sparsified Manifold Regularization
Large-Scale Sparsified Manifold Regularization Ivor W. Tsang James T. Kwok Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong
More informationAdvanced Microeconomics
Advanced Microeconomics Ordinal preference theory Harald Wiese University of Leipzig Harald Wiese (University of Leipzig) Advanced Microeconomics 1 / 68 Part A. Basic decision and preference theory 1 Decisions
More informationMINIMIZATION OF ENTROPY FUNCTIONALS UNDER MOMENT CONSTRAINTS. denote the family of probability density functions g on X satisfying
MINIMIZATION OF ENTROPY FUNCTIONALS UNDER MOMENT CONSTRAINTS I. Csiszár (Budapest) Given a σ-finite measure space (X, X, µ) and a d-tuple ϕ = (ϕ 1,..., ϕ d ) of measurable functions on X, for a = (a 1,...,
More informationChapter 20. Vector Spaces and Bases
Chapter 20. Vector Spaces and Bases In this course, we have proceeded step-by-step through low-dimensional Linear Algebra. We have looked at lines, planes, hyperplanes, and have seen that there is no limit
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More information5.1 Bipartite Matching
CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson
More information4.5 Linear Dependence and Linear Independence
4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then
More informationA Lagrangian-DNN Relaxation: a Fast Method for Computing Tight Lower Bounds for a Class of Quadratic Optimization Problems
A Lagrangian-DNN Relaxation: a Fast Method for Computing Tight Lower Bounds for a Class of Quadratic Optimization Problems Sunyoung Kim, Masakazu Kojima and Kim-Chuan Toh October 2013 Abstract. We propose
More informationIEOR 4404 Homework #2 Intro OR: Deterministic Models February 14, 2011 Prof. Jay Sethuraman Page 1 of 5. Homework #2
IEOR 4404 Homework # Intro OR: Deterministic Models February 14, 011 Prof. Jay Sethuraman Page 1 of 5 Homework #.1 (a) What is the optimal solution of this problem? Let us consider that x 1, x and x 3
More informationLinear Algebra Notes
Linear Algebra Notes Chapter 19 KERNEL AND IMAGE OF A MATRIX Take an n m matrix a 11 a 12 a 1m a 21 a 22 a 2m a n1 a n2 a nm and think of it as a function A : R m R n The kernel of A is defined as Note
More informationA Study on SMO-type Decomposition Methods for Support Vector Machines
1 A Study on SMO-type Decomposition Methods for Support Vector Machines Pai-Hsuen Chen, Rong-En Fan, and Chih-Jen Lin Department of Computer Science, National Taiwan University, Taipei 106, Taiwan cjlin@csie.ntu.edu.tw
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationMetric Spaces. Chapter 7. 7.1. Metrics
Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some
More information88 CHAPTER 2. VECTOR FUNCTIONS. . First, we need to compute T (s). a By definition, r (s) T (s) = 1 a sin s a. sin s a, cos s a
88 CHAPTER. VECTOR FUNCTIONS.4 Curvature.4.1 Definitions and Examples The notion of curvature measures how sharply a curve bends. We would expect the curvature to be 0 for a straight line, to be very small
More informationConvex Programming Tools for Disjunctive Programs
Convex Programming Tools for Disjunctive Programs João Soares, Departamento de Matemática, Universidade de Coimbra, Portugal Abstract A Disjunctive Program (DP) is a mathematical program whose feasible
More information