PDEEC Machine Learning 2016/17
1 PDEEC Machine Learning 2016/17
Lecture - Introduction to Support Vector Machines
Jaime S. Cardoso (jaime.cardoso@inesctec.pt)
INESC TEC and Faculdade de Engenharia, Universidade do Porto
Nov. 21
2 Background - Convex Hull
- Given a set of points {x_i}, we define the convex hull to be the set of all points x given by x = Σ_i α_i x_i, subject to α_i ≥ 0 and Σ_i α_i = 1.
- Given two sets of points {x_i} and {y_i}, if their convex hulls intersect, the two sets of points cannot be linearly separable.
- Given two sets of points {x_i} and {y_i}, if they are linearly separable, their convex hulls do not intersect.
[Figure: two linearly separable point sets x_1..x_4 and y_1..y_5 with disjoint convex hulls]
3 Background - Reduced Convex Hulls
- Given a set of points {x_i}, we define the reduced convex hull to be the set of all points x given by x = Σ_i α_i x_i, subject to α_i ≥ 0, Σ_i α_i = 1 and α_i ≤ α, with 1/N ≤ α ≤ 1.
Figure: from [4]
4 Background - Distance of the origin to a plane
- Plane defined by the equation y(x) = w^T x + b, with y(x_A) = 0 if x_A belongs to the plane.
- w is orthogonal to every vector lying on the plane: w^T (x_A − x_B) = 0, with x_A and x_B on the plane.
- The distance of the plane to the origin is the projection of any point of the plane onto the direction of w: |⟨x, w⟩|/‖w‖ = |w^T x|/‖w‖ = |b|/‖w‖.
Figure: from [1]
5 Background - Distance of a point P to a plane (I)
1st proof:
- By translation, make P the origin of a new coordinate system: x′ = x − P.
- The equation of the plane w^T x + b = 0 becomes w^T (x′ + P) + b = 0, i.e. w^T x′ + (w^T P + b) = 0.
- The distance of the new origin to the plane is therefore |w^T P + b|/‖w‖ = |y(P)|/‖w‖.
2nd proof:
- Decompose P = P_0 + r w/‖w‖, with P_0 the orthogonal projection of P onto the plane.
- Then w^T P + b = (w^T P_0 + b) + r w^T w/‖w‖ = 0 + r‖w‖, so r = (w^T P + b)/‖w‖ = y(P)/‖w‖.
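The formula above can be checked numerically. A minimal NumPy sketch (the plane and the point are made up for illustration):

```python
import numpy as np

# Signed distance of a point P to the plane w^T x + b = 0:
# r = y(P) / ||w||, with y(P) = w^T P + b.
def plane_distance(w, b, P):
    w = np.asarray(w, dtype=float)
    return (w @ P + b) / np.linalg.norm(w)

# Plane x1 + x2 - 1 = 0 and the point P = (1, 1):
w, b = np.array([1.0, 1.0]), -1.0
P = np.array([1.0, 1.0])
r = plane_distance(w, b, P)    # y(P) = 1, ||w|| = sqrt(2)

# Cross-check against the 2nd proof: P - r * w/||w|| must lie on the plane.
foot = P - r * w / np.linalg.norm(w)
print(round(r, 6))             # 0.707107
print(round(w @ foot + b, 6))  # 0.0
```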
6 Background - Distance of a point P to a plane (II)
3rd proof:
- The problem of finding the minimum of ‖x − P‖ is equivalent to finding the minimum of ‖x − P‖² = (x − P)^T (x − P).
- Using Lagrange multipliers, the problem of finding the minimum of (x − P)^T (x − P) subject to w^T x + b = 0 is equivalent to finding the stationary point of L = (x − P)^T (x − P) + λ(w^T x + b) for the variables x and λ:
  ∂L/∂x = 2(x − P) + λw = 0
  ∂L/∂λ = w^T x + b = 0
- Multiplying the 1st equation by w^T and using the 2nd for w^T x = −b, we obtain λ = 2(w^T P + b)/(w^T w).
- Using this value of λ in the 1st equation we get x − P = −(w^T P + b) w / (w^T w). The norm of this vector comes as ‖x − P‖ = |w^T P + b|/‖w‖ = |y(P)|/‖w‖.
- Algebraic distance of a point P to a plane: y(P)/‖w‖.
7 Background - Lagrange Multipliers (I): maximizing f(x) subject to g_i(x) = 0
- Consider the problem of finding the maximum of a function f(x), x ∈ R^d, subject to a set of constraints g_i(x) = 0, i = 1..m.
- There exists a λ = (λ_1, λ_2, …, λ_m)^T such that, at the maximum x_0,
  ∇f(x_0) + λ_1 ∇g_1(x_0) + λ_2 ∇g_2(x_0) + … + λ_m ∇g_m(x_0) = 0
Figure: from [1]
- λ = (λ_1, λ_2, …, λ_m) are called the Lagrange multipliers.
- Introducing the Lagrangian function defined by L(x, λ) = f(x) + Σ_i λ_i g_i(x), we see that
  ∂L/∂λ_i = g_i(x)
  ∂L/∂x = ∇f(x) + λ_1 ∇g_1(x) + … + λ_m ∇g_m(x)
- Just find the stationary point of L(x, λ) with respect to both x and λ.
8 Background - Lagrange Multipliers (II): maximizing f(x) subject to g_i(x) ≥ 0
Two kinds of solutions:
- the solution lies in g(x) > 0, in which case the constraint is not active (λ = 0);
- the solution lies on g(x) = 0, in which case the constraint is active (λ ≠ 0). NOTE that now λ > 0.
Figure: from [1]
9 Background - Lagrange Multipliers (III): Karush-Kuhn-Tucker (KKT) conditions
(a generalization of the method of Lagrange multipliers to inequality constraints)
The solution to the following nonlinear optimization problem
  argmax_x f(x)  s.t.  g_i(x) ≥ 0, h_j(x) = 0
is obtained by optimizing
  L(x, λ, ν) = f(x) + Σ_i λ_i g_i(x) + Σ_j ν_j h_j(x)
with respect to x and λ, subject to
  g_i(x) ≥ 0
  λ_i ≥ 0
  λ_i g_i(x) = 0
  h_j(x) = 0
10 Background - Lagrange Multipliers (IV): Karush-Kuhn-Tucker (KKT) conditions
(a generalization of the method of Lagrange multipliers to inequality constraints)
The solution to the following nonlinear optimization problem
  argmin_x f(x)  s.t.  g_i(x) ≥ 0, h_j(x) = 0
is obtained by optimizing
  L(x, λ, ν) = f(x) − Σ_i λ_i g_i(x) − Σ_j ν_j h_j(x)
with respect to x and λ, subject to
  g_i(x) ≥ 0
  λ_i ≥ 0
  λ_i g_i(x) = 0
  h_j(x) = 0
11 Background - Lagrange Multipliers (V): Constraints
  f_1(θ) = θ_1 + 2θ…, f_2(θ) = θ_1 − θ…, f_3(θ) = θ…  (constraint definitions truncated in the transcription)
- a constraint is called inactive if the corresponding Lagrange multiplier is zero;
- a constraint f_i is called active if the corresponding Lagrange multiplier is nonzero (in which case f_i(θ*) = 0).
12 Background - Min-Max Duality
Primal: min_x max_y F(x, y)
Dual: max_y min_x F(x, y)
Weak duality: max_y min_x F(x, y) ≤ min_x max_y F(x, y)
13 Background - Lagrangian Duality (I)
  minimize_x J(x)  s.t.  g_i(x) ≥ 0
  L(x, λ) = J(x) − Σ_i λ_i g_i(x)
- Since λ_i ≥ 0 and g_i(x) ≥ 0, then J(x) = max_{λ≥0} L(x, λ).
- The original problem is equivalent to min_x max_{λ≥0} L(x, λ).
14 Background - Lagrangian Duality (II): Convex Programming
- Assume that J(x) is convex and that the g_i are convex. Then
  min_x max_{λ≥0} L(x, λ) = max_{λ≥0} min_x L(x, λ)
15 Background - Lagrangian Duality (III): Wolfe Dual Representation
A convex programming problem is equivalent to
  max_{λ≥0} L(x, λ)  subject to  ∂L(x, λ)/∂x = 0
16 Support vector machines - Linearly separable classes (I)
- The training set comprises N input vectors x_1, …, x_N, with corresponding target values y_1, …, y_N, where y_i ∈ {−1, 1}.
- New data points x will be classified according to the sign of the model y(x) = w^T x + b.
- Margin: the smallest distance between the decision boundary and any of the samples.
- In SVMs the decision boundary is chosen to be the one for which the margin is maximized.
(a) An example of a linearly separable two-class problem with multiple possible linear classifiers. (b) SVM option: maximize the margin (from [2])
17 Support vector machines - Linearly separable classes (II)
- The distance of a (correctly classified) training point x_n to the decision surface is given by y_n y(x_n)/‖w‖ = y_n(w^T x_n + b)/‖w‖.
- Thus, we want
  argmax_{w,b} { (1/‖w‖) min_n [ y_n(w^T x_n + b) ] }
- Note that the solution to this problem is not completely well-defined: if (w, b) is a solution to this optimization problem, so is any solution of the form (cw, cb), where c is a positive constant.
- We can impose an additional constraint to completely define the problem. The constraint can be chosen to simplify the formulation.
18 Support vector machines - Linearly separable classes (III)
- Set y_n(w^T x_n + b) = 1 for the point closest to the surface. In this case all points should satisfy y_n(w^T x_n + b) ≥ 1, n = 1, …, N.
- We obtain the canonical representation:
  argmax_{w,b} 1/‖w‖  subject to  y_n(w^T x_n + b) ≥ 1, n = 1, …, N
equivalently:
  argmin_{w,b} ½‖w‖²  subject to  y_n(w^T x_n + b) ≥ 1, n = 1, …, N
- This is the primal formulation for the linearly separable case.
19 Support vector machines - Linearly separable classes (IV)
Using the Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions, we obtain the dual formulation:
  L(w, b, λ) = ½‖w‖² − Σ_n λ_n { y_n(w^T x_n + b) − 1 }
  ∂L/∂w = 0 ⇒ w − Σ_n λ_n y_n x_n = 0
  ∂L/∂b = 0 ⇒ Σ_n λ_n y_n = 0
  λ_i ≥ 0, i = 1, …, N
  λ_i ( y_i(w^T x_i + b) − 1 ) = 0, i = 1, …, N
  y_i(w^T x_i + b) − 1 ≥ 0, i = 1, …, N
20 Support vector machines - Linearly separable classes (V)
After a bit of algebra we end up with the equivalent optimization task, called the dual representation:
  argmax_λ Σ_{n=1}^N λ_n − ½ Σ_{i,j} λ_i λ_j y_i y_j x_i^T x_j
  subject to: Σ_n λ_n y_n = 0, λ_i ≥ 0, i = 1, …, N
- Note that now we have N variables, instead of the initial D variables (D is the dimension of the data space).
- Note that in the optimization function the training data only show up in inner products.
21 Support vector machines - Linearly separable classes (VI): classifying new data points
After finding the optimal λ, we can classify new data points by evaluating the sign of
  y(x) = x^T w + b = Σ_n λ_n y_n x^T x_n + b   (1)
- Any training point with λ_n = 0 plays no role in making predictions.
- Linear regression estimates a set of parameters, and then the training points are not used for making predictions.
- k-nearest neighbours uses ALL training points to make predictions.
- The SVM is something in between: it uses only the points with λ_n ≠ 0 (the support vectors) to make predictions: a sparse kernel machine.
22 Support vector machines - Linearly separable classes (VII): classifying new data points
- Note that w does not need to be computed explicitly... but we need b.
- Note that any support vector satisfies (from the KKT conditions, or from the very formulation of SVMs) y_m y(x_m) = 1, i.e. y_m(w^T x_m + b) = 1. Using (1) we get y_m ( Σ_n λ_n y_n x_m^T x_n + b ) = 1, where x_m is any support vector, so
  b = y_m − Σ_n λ_n y_n x_m^T x_n
- It is numerically more stable to average over ALL support vectors:
  b = (1/N_S) Σ_{m∈S} ( y_m − Σ_{n∈S} λ_n y_n x_m^T x_n )
with m and n running only over the support vectors.
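As a sanity check of these formulas, here is a two-point toy problem in NumPy. The multipliers λ_1 = λ_2 = 0.5 were derived by hand for this specific example (not by a solver), so the snippet only illustrates how w, b and the predictions follow from the λ's:

```python
import numpy as np

# Toy linearly separable set: x1 = (0,0) with y = -1, x2 = (2,0) with y = +1.
# The max-margin solution is w = (1,0), b = -1; the dual multipliers for
# this illustration are lambda_1 = lambda_2 = 0.5 (hand-computed).
X = np.array([[0.0, 0.0], [2.0, 0.0]])
y = np.array([-1.0, 1.0])
lam = np.array([0.5, 0.5])

# w = sum_n lambda_n y_n x_n  (never needed explicitly for prediction)
w = (lam * y) @ X

# b averaged over all support vectors (both points here):
b = np.mean([y[m] - sum(lam[n] * y[n] * X[m] @ X[n] for n in range(2))
             for m in range(2)])

def predict(x):
    return np.sign(sum(lam[n] * y[n] * np.asarray(x) @ X[n] for n in range(2)) + b)

print(w, b)             # [1. 0.] -1.0
print(predict([3, 1]))  # 1.0
```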
23 Support vector machines - Linearly separable classes (VIII): a geometric interpretation
- find the convex hull of both sets of points (the convex hull is the smallest convex set containing the points);
- find the two closest points in the two convex hulls;
- construct the plane that bisects these two points.
Figure: from [3]
24 Support vector machines - Linearly separable classes (IX): a geometric interpretation
The closest points in the convex hulls can be found by solving the following quadratic problem:
  argmin_α ½‖c − d‖²
  s.t.  c = Σ_{x_i∈C_1} α_i x_i,  d = Σ_{x_i∈C_2} α_i x_i
        Σ_{x_i∈C_1} α_i = 1,  Σ_{x_i∈C_2} α_i = 1
        α_i ≥ 0, i = 1, …, N
It can be shown to be equivalent to the maximum margin approach.
25 Support vector machines - Non-linearly separable classes (I)
- When in possession of a non-linearly separable dataset, a simple linear separator is bound to misclassify some points. But the question we have to ask ourselves is whether the non-linearly separable data reflects some intrinsic property of the problem (in which case a more complex classifier, allowing more general boundaries between classes, may be more appropriate) or whether it can be interpreted as the result of noisy points (measurement errors, uncertainty in class membership, etc.), in which case keeping the linear separator and accepting some errors is more natural.
- We will keep the linear boundary for now and accept errors in the training data.
26 Support vector machines - Non-linearly separable classes (II)
- allow some of the training points to be misclassified;
- penalize the number of misclassifications;
- introduce slack variables ξ_n ≥ 0, n = 1, …, N:
  ξ_n = 0 if x_n is on the correct side of the margin
  ξ_n = |y_n − y(x_n)| otherwise
Figure: from [1]
27 Support vector machines - Non-linearly separable classes (III)
- Replace the exact constraints with y_n y(x_n) ≥ 1 − ξ_n, i.e. y_n(w^T x_n + b) ≥ 1 − ξ_n.
- The goal could now be to minimize ½‖w‖² + C Σ_{n=1}^N sign(ξ_n).
- The constraint parameter C controls the trade-off between the dual objectives of maximizing the margin of separation and minimizing the misclassification error.
- For an error to occur, the corresponding ξ_n must exceed unity, so Σ_{n=1}^N sign(ξ_n) is an upper bound on the number of training errors.
- Optimization of the above is difficult since it involves the discontinuous function sign(); we optimize instead a closely related cost function:
  ½‖w‖² + C Σ_{n=1}^N ξ_n
28 Support vector machines Non-linearly separable classes (III) Figure: from [1] 28 / 56
29 Support vector machines - Non-linearly separable classes (IV)
  L(w, b, ξ, μ, λ) = ½‖w‖² + C Σ_n ξ_n − Σ_n λ_n ( y_n y(x_n) − 1 + ξ_n ) − Σ_n μ_n ξ_n
KKT conditions:
  λ_n ≥ 0
  y_n y(x_n) − 1 + ξ_n ≥ 0
  λ_n ( y_n y(x_n) − 1 + ξ_n ) = 0
  μ_n ≥ 0
  ξ_n ≥ 0
  μ_n ξ_n = 0
30 Support vector machines - Non-linearly separable classes (Va)
  ∂L/∂w = 0 ⇒ w = Σ_n λ_n y_n x_n
  ∂L/∂b = 0 ⇒ Σ_n λ_n y_n = 0
  ∂L/∂ξ_n = 0 ⇒ λ_n = C − μ_n
Dual Lagrangian form:
  argmax_λ L(λ) = Σ_{n=1}^N λ_n − ½ Σ_{n=1}^N Σ_{m=1}^N λ_n λ_m y_n y_m x_m^T x_n
  subject to: 0 ≤ λ_n ≤ C, Σ_{n=1}^N λ_n y_n = 0
- This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound C on λ_n.
- A QP solver can be used to find the λ_n.
31 Support vector machines - Non-linearly separable classes (Vb): the SMO algorithm
Consider solving the unconstrained optimization problem max_α W(α_1, …, α_m):
- EM (to be discussed)
- gradient ascent
- coordinate ascent
32 Support vector machines - Non-linearly separable classes (Vc): coordinate ascent
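A minimal sketch of coordinate ascent on a made-up concave quadratic (unrelated to the SVM dual; it only shows the update pattern). Each step maximizes W over one variable with the other held fixed, using the closed-form 1-D maximizers:

```python
# Toy concave objective: W(a1, a2) = -a1^2 - 2*a2^2 + 2*a1 + 4*a2 + a1*a2.
# Its unique maximum is at (a1, a2) = (12/7, 10/7).
def W(a1, a2):
    return -a1**2 - 2*a2**2 + 2*a1 + 4*a2 + a1*a2

a1, a2 = 0.0, 0.0
for _ in range(100):
    a1 = (2 + a2) / 2   # argmax over a1 with a2 fixed (set dW/da1 = 0)
    a2 = (4 + a1) / 4   # argmax over a2 with a1 fixed (set dW/da2 = 0)

print(round(a1, 4), round(a2, 4))   # 1.7143 1.4286  (= 12/7, 10/7)
```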
33 Support vector machines - Non-linearly separable classes (Vc): sequential minimal optimization
Constrained optimization:
  argmax_λ L(λ) = Σ_{n=1}^N λ_n − ½ Σ_{n=1}^N Σ_{m=1}^N λ_n λ_m y_n y_m x_m^T x_n
  subject to: 0 ≤ λ_n ≤ C, Σ_{n=1}^N λ_n y_n = 0
Question: can we do coordinate ascent along one direction at a time (i.e., hold all λ_{[−i]} fixed, and update λ_i)? No: the equality constraint Σ_n λ_n y_n = 0 determines λ_i from the remaining multipliers, so at least two multipliers must be updated jointly.
34 Support vector machines - Non-linearly separable classes (Vd): the SMO algorithm
Repeat till convergence:
- Select some pair λ_i and λ_j to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
- Re-optimize L(λ) with respect to λ_i and λ_j, while holding all the other λ_k (k ≠ i, j) fixed.
35 Support vector machines - Non-linearly separable classes (VI)
Predictions:
  y(x) = Σ_{n=1}^N λ_n y_n x^T x_n + b
  b = (1/N_M) Σ_{n∈M} ( y_n − Σ_{m∈S} λ_m y_m x_m^T x_n )
where M denotes the set of indices of data having 0 < λ_n < C and S denotes the set of indices of data having λ_n > 0 (the set of support vectors).
36 Support vector machines - Non-linearly separable classes (VII): a geometric interpretation
- find the reduced convex hull of both sets of points;
- find the two closest points in the two reduced convex hulls;
- construct the plane that bisects these two points.
(a) An example of two nonlinearly separable datasets (from [3]). (b) Reduced Convex Hull approach (from [3])
37 Support vector machines - Non-linearly separable classes (VIII): a geometric interpretation
The closest points in the reduced convex hulls can be found by solving the following quadratic problem:
  argmin_α ½‖c − d‖²
  s.t.  c = Σ_{x_i∈C_1} α_i x_i,  d = Σ_{x_i∈C_2} α_i x_i
        Σ_{x_i∈C_1} α_i = 1,  Σ_{x_i∈C_2} α_i = 1
        0 ≤ α_i ≤ D, i = 1, …, N
It can be shown to be equivalent to the maximum margin approach.
38 Support vector machines - Non-linearly separable classes (IX)
- Show that, for a linearly separable dataset, the solution of the formulation with the ξ's may be different from the solution of the formulation without the ξ's.
- What is the solution of the formulation without the ξ's for a non-linearly separable dataset?
39 Support vector machines - Non-linearly separable classes (X)
Dual Lagrangian form:
  L(λ) = Σ_{n=1}^N λ_n − ½ Σ_{n=1}^N Σ_{m=1}^N λ_n λ_m y_n y_m x_m^T x_n
  subject to: 0 ≤ λ_n ≤ C, Σ_{n=1}^N λ_n y_n = 0
  b = (1/N_M) Σ_{m∈M} ( y_m − Σ_{s∈S} λ_s y_s x_s^T x_m )
Predictions:
  y(x) = Σ_{n=1}^N λ_n y_n x^T x_n + b
Notes:
- N unknowns: the λ's;
- the input vector x enters only in the form of scalar products;
- the dimension of the data has almost no impact on the complexity.
40 Support vector machines - The design of nonlinear classifiers
The design of a nonlinear classifier for the original data space x ∈ R^d can be accomplished with the design of a linear classifier in a suitable space x̂ ∈ R^D; usually D > d, but that is not always necessary.
Examples:
1. The search for a boundary x_1² + x_2² = const in R² can be carried out as the search for a linear boundary in the space defined by the variables r = √(x_1² + x_2²) and θ = arctan(x_2/x_1). In fact it could be simplified to a search in R defined just by the variable r.
2. The search for a boundary w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1² + w_5 x_2² + w_6 = 0 in R² can be carried out as the search for a linear boundary in the space defined by the variables x̂_1 = x_1, x̂_2 = x_2, x̂_3 = x_1 x_2, x̂_4 = x_1², x̂_5 = x_2².
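A concrete sketch of such an explicit feature map (a pure-quadratic special case of example 2, chosen for illustration): the map Φ(x) = (x_1², √2·x_1 x_2, x_2²) reproduces the homogeneous polynomial kernel (xᵀz)², anticipating the kernel trick of the next slides.

```python
import numpy as np

# Explicit feature map for a purely quadratic boundary in R^2:
# Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) satisfies
# Phi(x) . Phi(z) = (x^T z)^2, the homogeneous polynomial kernel.
def phi(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))   # 1.0
print((x @ z) ** 2)      # 1.0
```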
41 Support vector machines - The design of nonlinear classifiers
- To design a nonlinear SVM we just define a new space Φ(x) where our problem becomes linear and apply the linear SVM in the new space Φ(x) (the linearly separable version or the non-linearly separable version).
- The changes in our formulations are minimal: just replace x_i^T x_j by Φ(x_i)^T Φ(x_j). Define k(x_i, x_j) = Φ(x_i)^T Φ(x_j). The nonlinear SVM becomes:
  L(λ) = Σ_{n=1}^N λ_n − ½ Σ_{n=1}^N Σ_{m=1}^N λ_n λ_m y_n y_m k(x_m, x_n)
  subject to: 0 ≤ λ_n ≤ C, Σ_{n=1}^N λ_n y_n = 0
  b = (1/N_M) Σ_{m∈M} ( y_m − Σ_{s∈S} λ_s y_s k(x_s, x_m) )
  Predictions: y(x) = Σ_{n=1}^N λ_n y_n k(x, x_n) + b
- The new space Φ(x) should capture the nonlinearity intrinsic to the data (and not the noise). In the simplest case Φ(x) = x and k(x_i, x_j) = x_i^T x_j.
42 Kernel methods - The design of nonlinear classifiers
It would be nice to be able to compute k(x_i, x_j) = Φ(x_i)^T Φ(x_j) without needing to use the transformed space Φ(x).
Mercer's theorem:
- Let x ∈ R^d and a mapping x → Φ(x) ∈ H, where H is a Euclidean space (a linear space equipped with an inner product). Then Φ(x_1)^T Φ(x_2) = k(x_1, x_2), where k(x_1, x_2) is a symmetric function satisfying
  ∫∫ k(x_1, x_2) g(x_1) g(x_2) dx_1 dx_2 ≥ 0
for any g(x), x ∈ R^d, such that ∫ g²(x) dx < ∞.
- Note that Mercer's theorem does not tell us how to find this space.
- A necessary and sufficient condition for a function k(x, x′) to be a valid kernel is that the Gram matrix K, whose elements are given by k(x_n, x_m), should be a positive semidefinite matrix for all possible choices of the set {x_n}.
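The Gram-matrix condition can be probed empirically. A sketch with a Gaussian kernel on random points (γ = 0.5 and the point set are arbitrary choices):

```python
import numpy as np

# Empirical check of Mercer's condition: the Gram matrix of a Gaussian
# kernel on any point set should be symmetric positive semidefinite.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
gamma = 0.5

# K_nm = exp(-gamma * ||x_n - x_m||^2), built by broadcasting.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)   # True: all eigenvalues (numerically) >= 0
```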
43 Kernel methods - SVM examples (figures)
44 Kernel methods - SVM examples (figures)
45 Kernel methods - The design of nonlinear classifiers
Many models can be reformulated in terms of a dual representation in which the kernel function arises naturally:
- linear regression
- kernel PCA
- nearest neighbour
- etc.
46 Beyond basic Linear Regression - Dual Variables in Regression
  (X^T X + λI) w = X^T y ⇒ w = λ^{-1} X^T (y − Xw) = X^T α
showing that w can be written as a linear combination of the training points, w = Σ_{n=1}^N α_n x_n, with α = λ^{-1}(y − Xw). The elements of α are called the dual variables.
  α = λ^{-1}(y − Xw) ⇒ λα = y − XX^T α ⇒ (XX^T + λI)α = y ⇒ α = (G + λI)^{-1} y
with G = XX^T or, component-wise, G_ij = ⟨x_i, x_j⟩. The resulting prediction function is given by
  g(x) = ⟨w, x⟩ = ⟨Σ_{n=1}^N α_n x_n, x⟩ = Σ_n α_n ⟨x_n, x⟩
47 Kernel methods - An example of a kernel algorithm
Idea: classify points x in the feature space according to which of the two class means is closer.
  c_+ = (1/N_+) Σ_{i: y_i = +1} x_i
  c_− = (1/N_−) Σ_{i: y_i = −1} x_i
Compute the sign of the dot product between w = c_+ − c_− and x − c, where c is the midpoint between the class means.
48 Kernel methods - An example of a kernel algorithm
  f(x) = sgn( (1/N_+) Σ_{i: y_i = +1} ⟨x, x_i⟩ − (1/N_−) Σ_{i: y_i = −1} ⟨x, x_i⟩ + b )
       = sgn( (1/N_+) Σ_{i: y_i = +1} k(x, x_i) − (1/N_−) Σ_{i: y_i = −1} k(x, x_i) + b )
where
  b = ½ ( (1/N_−²) Σ_{(i,j): y_i = y_j = −1} k(x_i, x_j) − (1/N_+²) Σ_{(i,j): y_i = y_j = +1} k(x_i, x_j) )
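A direct transcription of the f(x) and b formulas above into code, using the linear kernel and a tiny made-up dataset:

```python
import numpy as np

# Nearest-class-mean classifier written entirely with kernel evaluations.
def k(a, b):
    return a @ b   # linear kernel, for brevity

Xp = np.array([[2.0, 2.0], [3.0, 2.0]])       # class +1 points
Xn = np.array([[-2.0, -1.0], [-1.0, -2.0]])   # class -1 points
Np, Nn = len(Xp), len(Xn)

# Offset b, computed only from pairwise kernel values within each class.
b = 0.5 * (sum(k(a, c) for a in Xn for c in Xn) / Nn**2
           - sum(k(a, c) for a in Xp for c in Xp) / Np**2)

def f(x):
    return np.sign(sum(k(x, xi) for xi in Xp) / Np
                   - sum(k(x, xi) for xi in Xn) / Nn + b)

print(f(np.array([2.0, 1.0])))    # 1.0
print(f(np.array([-2.0, 0.0])))   # -1.0
```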
49 Kernel methods - Typical kernels
- Any algorithm that only depends on dot products can benefit from the kernel trick.
- Think of the kernel as a nonlinear similarity measure.
- Examples of common kernels:
  Linear: k(x_1, x_2) = x_1^T x_2
  Polynomial: k(x_1, x_2) = (x_1^T x_2 + c)^d
  Gaussian: k(x_1, x_2) = exp(−γ ‖x_1 − x_2‖²)
- Kernels are also known as covariance functions.
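The three kernels listed above, written as plain functions (the default hyperparameter values c, d and γ are arbitrary choices):

```python
import numpy as np

def linear(x1, x2):
    return x1 @ x2

def polynomial(x1, x2, c=1.0, d=3):
    return (x1 @ x2 + c) ** d

def gaussian(x1, x2, gamma=0.5):
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear(x, z))      # 0.0
print(polynomial(x, z))  # 1.0
print(gaussian(x, x))    # 1.0  (a point is maximally similar to itself)
```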
50 Kernel methods - Constructing kernels
Given valid kernels k_1(x, x′) and k_2(x, x′), the following new kernels will also be valid:
  k(x, x′) = c k_1(x, x′), c ∈ R_+
  k(x, x′) = f(x) k_1(x, x′) f(x′), where f(·) is any function
  k(x, x′) = k_1(x, x′) + k_2(x, x′)
  k(x, x′) = k_1(x, x′) k_2(x, x′)
  k(x, x′) = exp(k_1(x, x′))
51 Regression Support Vector Machine
  min_{w,b} C Σ_n E_ε( y_n − f(x_n) ) + (λ/2) ‖w‖²
  E_ε(z) = 0 if |z| < ε; |z| − ε otherwise
(a) ε-insensitive error function. (b) RVM result. Figure: from [1]
52 Multiclass Support Vector Machine - Reduction techniques
Conventional approaches:
- One-against-All: K two-class problems
- Pairwise: K(K−1)/2 two-class problems
- Decision-Tree-Based: DAG (Directed Acyclic Graph)
- Error-Correcting Output Codes
53 Multiclass Support Vector Machine - Reduction techniques
Conventional approaches:
- apply binary classifier 1 to the test example and get prediction F_1 (0/1)
- apply binary classifier 2 to the test example and get prediction F_2 (0/1)
- ...
- apply binary classifier M to the test example and get prediction F_M (0/1)
- use all M classifications to get the final multiclass classification in 1..K
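One common way to implement the final combination step for one-against-all is to take the class whose binary classifier gives the largest soft score (the score values below are invented for illustration):

```python
# Soft scores from K = 3 one-against-all classifiers for one test example;
# classifier k scores "class k vs rest". Values made up for illustration.
scores = {1: -0.3, 2: 1.7, 3: 0.4}

# Final multiclass decision: the class with the largest score.
predicted = max(scores, key=scores.get)
print(predicted)   # 2
```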
54 Multiclass Support Vector Machine - Reduction techniques
A lesser-known approach solves multiclass problems via a single binary classifier. The reduction is obtained by embedding the original problem in a higher-dimensional space consisting of the original features, as well as one or more other dimensions determined by fixed vectors, termed here extension features. This embedding is implemented by replicating the training set points so that a copy of the original point is concatenated with each of the extension feature vectors. The binary labels of the replicated points are set to maintain a particular structure in the extended space. This construction results in an instance of an artificial binary problem, which is fed to a binary learning algorithm that outputs a single soft binary classifier. To classify a new point, the point is replicated and extended similarly, and the resulting replicas are fed to the soft binary classifier, which generates a number of signals, one for each replica. The class is determined as a function of these signals.
55 Multiclass Support Vector Machine - All-at-Once Support Vector Machines
For a K-class problem we assign x to class i when the decision functions satisfy
  w_i^T x + b_i > w_k^T x + b_k, for all k ≠ i, k = 1, …, K
The L1 soft-margin support vector machine can be obtained by minimizing
  min_{w,b,ξ} ½ Σ_{k=1}^K ‖w_k‖² + C Σ_{n=1}^N Σ_{k≠y_n} ξ_{n,k}
subject to
  (w_{y_n} − w_k)^T x_n + b_{y_n} − b_k ≥ 1 − ξ_{n,k}, for k ≠ y_n, k = 1, …, K, n = 1, …, N
56 References
[1] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[2] Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier, Academic Press.
[3] Kristin P. Bennett and Colin Campbell, "Support Vector Machines: Hype or Hallelujah?", SIGKDD Explorations, vol. 2, 2000.
[4] Michael E. Mavroforakis and Sergios Theodoridis, "A Geometric Approach to Support Vector Machine (SVM) Classification", IEEE Transactions on Neural Networks, 2006.
More informationProximal mapping via network optimization
L. Vandenberghe EE236C (Spring 23-4) Proximal mapping via network optimization minimum cut and maximum flow problems parametric minimum cut problem application to proximal mapping Introduction this lecture:
More information(Quasi-)Newton methods
(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationGaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany carl@tuebingen.mpg.de WWW home page: http://www.tuebingen.mpg.de/ carl
More informationSUPPORT vector machine (SVM) formulation of pattern
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 17, NO. 3, MAY 2006 671 A Geometric Approach to Support Vector Machine (SVM) Classification Michael E. Mavroforakis Sergios Theodoridis, Senior Member, IEEE Abstract
More informationOne-sided Support Vector Regression for Multiclass Cost-sensitive Classification
One-sided Support Vector Regression for Multiclass Cost-sensitive Classification Han-Hsing Tu r96139@csie.ntu.edu.tw Hsuan-Tien Lin htlin@csie.ntu.edu.tw Department of Computer Science and Information
More informationData clustering optimization with visualization
Page 1 Data clustering optimization with visualization Fabien Guillaume MASTER THESIS IN SOFTWARE ENGINEERING DEPARTMENT OF INFORMATICS UNIVERSITY OF BERGEN NORWAY DEPARTMENT OF COMPUTER ENGINEERING BERGEN
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationLinear Programming for Optimization. Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc.
1. Introduction Linear Programming for Optimization Mark A. Schulze, Ph.D. Perceptive Scientific Instruments, Inc. 1.1 Definition Linear programming is the name of a branch of applied mathematics that
More informationVirtual Landmarks for the Internet
Virtual Landmarks for the Internet Liying Tang Mark Crovella Boston University Computer Science Internet Distance Matters! Useful for configuring Content delivery networks Peer to peer applications Multiuser
More informationDate: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
More information2.3 Convex Constrained Optimization Problems
42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions
More informationTHE SVM APPROACH FOR BOX JENKINS MODELS
REVSTAT Statistical Journal Volume 7, Number 1, April 2009, 23 36 THE SVM APPROACH FOR BOX JENKINS MODELS Authors: Saeid Amiri Dep. of Energy and Technology, Swedish Univ. of Agriculture Sciences, P.O.Box
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationAIMS Big data. AIMS Big data. Outline. Outline. Lecture 5: Structured-output learning January 7, 2015 Andrea Vedaldi
AMS Big data AMS Big data Lecture 5: Structured-output learning January 7, 5 Andrea Vedaldi. Discriminative learning. Discriminative learning 3. Hashing and kernel maps 4. Learning representations 5. Structured-output
More informationOnline learning of multi-class Support Vector Machines
IT 12 061 Examensarbete 30 hp November 2012 Online learning of multi-class Support Vector Machines Xuan Tuan Trinh Institutionen för informationsteknologi Department of Information Technology Abstract
More informationIntroduction to the Finite Element Method (FEM)
Introduction to the Finite Element Method (FEM) ecture First and Second Order One Dimensional Shape Functions Dr. J. Dean Discretisation Consider the temperature distribution along the one-dimensional
More informationLABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationThe Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationA Tutorial on Support Vector Machines for Pattern Recognition
c,, 1 43 () Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Tutorial on Support Vector Machines for Pattern Recognition CHRISTOPHER J.C. BURGES Bell Laboratories, Lucent Technologies
More information1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.
Introduction Linear Programming Neil Laws TT 00 A general optimization problem is of the form: choose x to maximise f(x) subject to x S where x = (x,..., x n ) T, f : R n R is the objective function, S
More informationMachine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationDecompose Error Rate into components, some of which can be measured on unlabeled data
Bias-Variance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data Bias-Variance Decomposition for Regression Bias-Variance Decomposition for Classification Bias-Variance
More informationPractical Guide to the Simplex Method of Linear Programming
Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear
More informationLecture 2: Homogeneous Coordinates, Lines and Conics
Lecture 2: Homogeneous Coordinates, Lines and Conics 1 Homogeneous Coordinates In Lecture 1 we derived the camera equations λx = P X, (1) where x = (x 1, x 2, 1), X = (X 1, X 2, X 3, 1) and P is a 3 4
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More informationSolutions Of Some Non-Linear Programming Problems BIJAN KUMAR PATEL. Master of Science in Mathematics. Prof. ANIL KUMAR
Solutions Of Some Non-Linear Programming Problems A PROJECT REPORT submitted by BIJAN KUMAR PATEL for the partial fulfilment for the award of the degree of Master of Science in Mathematics under the supervision
More informationA Lagrangian-DNN Relaxation: a Fast Method for Computing Tight Lower Bounds for a Class of Quadratic Optimization Problems
A Lagrangian-DNN Relaxation: a Fast Method for Computing Tight Lower Bounds for a Class of Quadratic Optimization Problems Sunyoung Kim, Masakazu Kojima and Kim-Chuan Toh October 2013 Abstract. We propose
More informationLINEAR EQUATIONS IN TWO VARIABLES
66 MATHEMATICS CHAPTER 4 LINEAR EQUATIONS IN TWO VARIABLES The principal use of the Analytic Art is to bring Mathematical Problems to Equations and to exhibit those Equations in the most simple terms that
More informationFactorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationArrangements And Duality
Arrangements And Duality 3.1 Introduction 3 Point configurations are tbe most basic structure we study in computational geometry. But what about configurations of more complicated shapes? For example,
More informationClassification of high resolution satellite images
Thesis for the degree of Master of Science in Engineering Physics Classification of high resolution satellite images Anders Karlsson Laboratoire de Systèmes d Information Géographique Ecole Polytéchnique
More informationA NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION
1 A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION Dimitri Bertsekas M.I.T. FEBRUARY 2003 2 OUTLINE Convexity issues in optimization Historical remarks Our treatment of the subject Three unifying lines of
More informationMathematical finance and linear programming (optimization)
Mathematical finance and linear programming (optimization) Geir Dahl September 15, 2009 1 Introduction The purpose of this short note is to explain how linear programming (LP) (=linear optimization) may
More informationLecture 2: August 29. Linear Programming (part I)
10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.
More informationFóra Gyula Krisztián. Predictive analysis of financial time series
Eötvös Loránd University Faculty of Science Fóra Gyula Krisztián Predictive analysis of financial time series BSc Thesis Supervisor: Lukács András Department of Computer Science Budapest, June 2014 Acknowledgements
More informationLargest Fixed-Aspect, Axis-Aligned Rectangle
Largest Fixed-Aspect, Axis-Aligned Rectangle David Eberly Geometric Tools, LLC http://www.geometrictools.com/ Copyright c 1998-2016. All Rights Reserved. Created: February 21, 2004 Last Modified: February
More informationLecture 6: Logistic Regression
Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationLinear Programming. March 14, 2014
Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1
More informationSOLUTIONS. f x = 6x 2 6xy 24x, f y = 3x 2 6y. To find the critical points, we solve
SOLUTIONS Problem. Find the critical points of the function f(x, y = 2x 3 3x 2 y 2x 2 3y 2 and determine their type i.e. local min/local max/saddle point. Are there any global min/max? Partial derivatives
More informationNonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
More informationMassive Data Classification via Unconstrained Support Vector Machines
Massive Data Classification via Unconstrained Support Vector Machines Olvi L. Mangasarian and Michael E. Thompson Computer Sciences Department University of Wisconsin 1210 West Dayton Street Madison, WI
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationAn Overview Of Software For Convex Optimization. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.
An Overview Of Software For Convex Optimization Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu In fact, the great watershed in optimization isn t between linearity
More informationCost Minimization and the Cost Function
Cost Minimization and the Cost Function Juan Manuel Puerta October 5, 2009 So far we focused on profit maximization, we could look at a different problem, that is the cost minimization problem. This is
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 5. Inner Products and Norms The norm of a vector is a measure of its size. Besides the familiar Euclidean norm based on the dot product, there are a number
More informationPrincipal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k
More informationData visualization and dimensionality reduction using kernel maps with a reference point
Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESAT-SCD/SISTA Kasteelpark Arenberg 1 B-31 Leuven (Heverlee), Belgium Tel: 32/16/32 18
More informationLeast-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More informationRecovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach
MASTER S THESIS Recovery of primal solutions from dual subgradient methods for mixed binary linear programming; a branch-and-bound approach PAULINE ALDENVIK MIRJAM SCHIERSCHER Department of Mathematical
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationTable 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.
Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More information