PDEEC Machine Learning 2016/17


1 PDEEC Machine Learning 2016/17
Lecture - Introduction to Support Vector Machines
Jaime S. Cardoso, jaime.cardoso@inesctec.pt
INESC TEC and Faculdade de Engenharia, Universidade do Porto
Nov. 21, 2016

2 Background - Convex Hull
Given a set of points $\{x_i\}$, we define the convex hull to be the set of all points $x$ given by $x = \sum_i \alpha_i x_i$, subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$.
Given two sets of points $\{x_i\}$ and $\{y_i\}$: if their convex hulls intersect, the two sets of points cannot be linearly separable; if they are linearly separable, their convex hulls do not intersect.
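
As a small illustration (not from the slides), the following Python sketch draws random points from the convex hull of a set by sampling coefficients $\alpha_i \ge 0$ with $\sum_i \alpha_i = 1$; the data and the Dirichlet-based sampling are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 2-D point set whose convex hull we want to sample from.
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [0.5, 0.5]])

# Draw convex-combination weights: alpha_i >= 0 and sum_i alpha_i = 1.
alpha = rng.dirichlet(np.ones(len(X)), size=5)

# Each row of P is a point x = sum_i alpha_i x_i inside the convex hull.
P = alpha @ X
print(P)
```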

3 Background - Reduced Convex Hulls
Given a set of points $\{x_i\}$, we define the reduced convex hull to be the set of all points $x$ given by $x = \sum_i \alpha_i x_i$, subject to $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$ and $\alpha_i \le \alpha$, with $\frac{1}{N} \le \alpha \le 1$.
Figure: from [4]

4 Background - Distance of the origin to a plane
The plane is defined by the equation $y(x) = w^T x + b$, with $y(x_A) = 0$ if $x_A$ belongs to the plane.
$w$ is orthogonal to every vector lying on the plane: $w^T(x_A - x_B) = 0$, with $x_A$ and $x_B$ on the plane.
The distance of the plane to the origin is the projection of any point of the plane onto the direction of $w$: $\frac{\langle x, w \rangle}{\|w\|} = \frac{w^T x}{\|w\|} = \frac{-b}{\|w\|}$ (in absolute value, $|b|/\|w\|$).
Figure: from [1]

5 Background - Distance of a point P to a plane (I)
1st proof: by translation, make $P$ the origin of a new system of coordinates, $x' = x - P$. The equation of the plane, $w^T x + b = 0$, becomes $w^T(x' + P) + b = 0 \Leftrightarrow w^T x' + (w^T P + b) = 0$. The distance of the new origin to the plane is therefore $\frac{|w^T P + b|}{\|w\|} = \frac{|y(P)|}{\|w\|}$.
2nd proof: write $P = P_\perp + P_{\parallel} = P_\perp + r \frac{w}{\|w\|}$, with $P_\perp$ on the plane. Then $w^T P + b = w^T P_\perp + b + r \frac{w^T w}{\|w\|} = 0 + r\|w\|$, so $r = \frac{w^T P + b}{\|w\|} = \frac{y(P)}{\|w\|}$.

6 Background - Distance of a point P to a plane (II)
3rd proof: the problem of finding the minimum of $\|x - P\|$ is equivalent to finding the minimum of $\|x - P\|^2 = (x - P)^T(x - P)$. Using Lagrange multipliers, minimizing $(x - P)^T(x - P)$ subject to $w^T x + b = 0$ is equivalent to finding the stationary point of $L = (x - P)^T(x - P) + \lambda(w^T x + b)$ with respect to the variables $x$ and $\lambda$:
$\frac{\partial L}{\partial x} = 2(x - P) + \lambda w = 0$
$\frac{\partial L}{\partial \lambda} = w^T x + b = 0$
Multiplying the 1st equation by $w^T$ and using the 2nd to substitute $w^T x = -b$, we obtain $\lambda = \frac{2(w^T P + b)}{w^T w}$. Using this value of $\lambda$ in the 1st equation we get $x - P = -\frac{(w^T P + b)\, w}{w^T w}$, whose norm is $\|x - P\| = \frac{|w^T P + b|}{\|w\|} = \frac{|y(P)|}{\|w\|}$.
Algebraic distance of a point $P$ to a plane: $y(P)/\|w\|$.
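
A quick numerical check of the formula $y(P)/\|w\|$ (a sketch with made-up numbers):

```python
import numpy as np

# Plane y(x) = w^T x + b = 0 and a point P (values chosen arbitrarily).
w = np.array([3.0, 4.0])
b = -5.0
P = np.array([2.0, 1.0])

# Signed (algebraic) distance y(P)/||w|| derived on the slide.
signed_dist = (w @ P + b) / np.linalg.norm(w)

# Cross-check: the foot of the perpendicular lies on the plane.
foot = P - signed_dist * w / np.linalg.norm(w)
print(signed_dist, w @ foot + b)   # second value should be ~0
```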

7 Background - Lagrange Multipliers (I): maximizing $f(x)$ subject to $g_i(x) = 0$
Consider the problem of finding the maximum of a function $f(x)$, $x \in \mathbb{R}^d$, subject to a set of constraints $g_i(x) = 0$, $i = 1, \dots, m$. There exists a $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T$ such that at the maximum $x_0$
$\nabla f(x_0) = \lambda_1 \nabla g_1(x_0) + \lambda_2 \nabla g_2(x_0) + \dots + \lambda_m \nabla g_m(x_0)$
$\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)$ are called the Lagrange multipliers. Introducing the Lagrangian function defined by $L(x, \lambda) = f(x) + \sum_i \lambda_i g_i(x)$, we see that $\partial L/\partial \lambda_i = g_i(x)$ and $\nabla_x L = \nabla f(x) + \lambda_1 \nabla g_1(x) + \dots + \lambda_m \nabla g_m(x)$. Just find the stationary point of $L(x, \lambda)$ with respect to both $x$ and $\lambda$.
Figure: from [1]
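
A worked equality-constrained example (my own toy problem, solved with sympy), which simply finds the stationary point of $L(x, \lambda)$:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)

# Maximize f(x) = 1 - x1^2 - x2^2 subject to g(x) = x1 + x2 - 1 = 0.
f = 1 - x1**2 - x2**2
g = x1 + x2 - 1

# Lagrangian L(x, lambda) = f(x) + lambda * g(x); set all partials to zero.
L = f + lam * g
stationary = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(stationary)   # expect x1 = x2 = 1/2, lam = 1
```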

8 Background - Lagrange Multipliers (II): maximizing $f(x)$ subject to $g_i(x) \ge 0$
Two kinds of solutions: the solution lies in the region $g(x) > 0$, in which case the constraint is not active ($\lambda = 0$); or the solution lies on $g(x) = 0$, in which case the constraint is active ($\lambda \neq 0$); note that now $\lambda > 0$.
Figure: from [1]

9 Background - Lagrange Multipliers (III): Karush-Kuhn-Tucker (KKT) conditions
(a generalization of the method of Lagrange multipliers to inequality constraints)
The solution to the following nonlinear optimization problem
$\operatorname{argmax}_x f(x)$ s.t. $g_i(x) \ge 0$, $h_j(x) = 0$
is obtained by optimizing
$L(x, \lambda_i, \nu_j) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$
with respect to $x$ and $\lambda$, subject to
$g_i(x) \ge 0$, $\lambda_i \ge 0$, $\lambda_i g_i(x) = 0$, $h_j(x) = 0$.

10 Background - Lagrange Multipliers (IV): Karush-Kuhn-Tucker (KKT) conditions
(a generalization of the method of Lagrange multipliers to inequality constraints)
The solution to the following nonlinear optimization problem
$\operatorname{argmin}_x f(x)$ s.t. $g_i(x) \ge 0$, $h_j(x) = 0$
is obtained by optimizing
$L(x, \lambda_i, \nu_j) = f(x) - \sum_i \lambda_i g_i(x) - \sum_j \nu_j h_j(x)$
with respect to $x$ and $\lambda$, subject to
$g_i(x) \ge 0$, $\lambda_i \ge 0$, $\lambda_i g_i(x) = 0$, $h_j(x) = 0$.
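
For intuition, an inequality-constrained minimization of this kind can be handed to a general solver. A sketch using scipy's SLSQP on a hypothetical toy objective (not yet the SVM problem):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2
# subject to g(x) = 1 - x1 - x2 >= 0 (an inequality constraint).
def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

cons = ({'type': 'ineq', 'fun': lambda x: 1.0 - x[0] - x[1]},)

res = minimize(f, x0=np.zeros(2), method='SLSQP', constraints=cons)
print(res.x)   # the constraint is active here: x1 + x2 = 1 at the optimum
```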

11 Background - Lagrange Multipliers (V): Constraints
Example with three constraints $f_1(\theta)$, $f_2(\theta)$, $f_3(\theta)$ on $\theta = (\theta_1, \theta_2)$.
A constraint is called inactive if the corresponding Lagrange multiplier is zero; a constraint $f_i$ is called active if the corresponding Lagrange multiplier is nonzero (for which $f_i(\theta^*) = 0$).

12 Background - Min-Max Duality
Primal: $\min_x \max_y F(x, y)$
Dual: $\max_y \min_x F(x, y)$
In general, $\max_y \min_x F(x, y) \le \min_x \max_y F(x, y)$.

13 Background - Lagrangian Duality (I)
$\min_x J(x)$ s.t. $g_i(x) \ge 0$
$L(x, \lambda_i) = J(x) - \sum_i \lambda_i g_i(x)$
Since $\lambda_i \ge 0$ and $g_i(x) \ge 0$, then $J(x) = \max_{\lambda \ge 0} L(x, \lambda_i)$; the original problem is equivalent to $\min_x \max_{\lambda \ge 0} L(x, \lambda)$.

14 Background - Lagrangian Duality (II): Convex Programming
Assume that $J(x)$ is convex and that the $g_i$ are convex. Then
$\min_x \max_{\lambda \ge 0} L(x, \lambda) = \max_{\lambda \ge 0} \min_x L(x, \lambda)$.

15 Background - Lagrangian Duality (III): Wolfe Dual Representation
A convex programming problem is equivalent to
$\max_{\lambda \ge 0} L(x, \lambda)$ subject to $\frac{\partial}{\partial x} L(x, \lambda) = 0$.

16 Support vector machines - Linearly separable classes (I)
The training set comprises $N$ input vectors $x_1, \dots, x_N$, with corresponding target values $y_1, \dots, y_N$, where $y_i \in \{-1, 1\}$. New data points $x$ will be classified according to the sign of the model $y(x) = w^T x + b$.
Margin: smallest distance between the decision boundary and any of the samples. In SVMs the decision boundary is chosen to be the one for which the margin is maximized.
(a) An example of a linearly separable two-class problem with multiple possible linear classifiers. (b) SVM option: maximize the margin (from [2]).

17 Support vector machines - Linearly separable classes (II)
The distance of a (correctly classified) training point $x_n$ to the decision surface is given by $\frac{y_n y(x_n)}{\|w\|} = \frac{y_n(w^T x_n + b)}{\|w\|}$.
Thus, we want $\operatorname{argmax}_{w,b} \left\{ \frac{1}{\|w\|} \min_n \left[ y_n (w^T x_n + b) \right] \right\}$.
Note that the solution to this problem is not completely well defined: if $(w, b)$ is a solution to this optimization problem, so is any solution of the form $(cw, cb)$, where $c > 0$ is a constant. We can impose an additional constraint to completely define the problem; the constraint can be chosen to simplify the formulation.

18 Support vector machines - Linearly separable classes (III)
Set $y_n(w^T x_n + b) = 1$ for the point closest to the surface. In this case all points satisfy $y_n(w^T x_n + b) \ge 1$, $n = 1, \dots, N$. We obtain the canonical representation:
$\operatorname{argmax}_{w,b} \frac{1}{\|w\|}$ subject to $y_n(w^T x_n + b) \ge 1$, $n = 1, \dots, N$
or, equivalently,
$\operatorname{argmin}_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_n(w^T x_n + b) \ge 1$, $n = 1, \dots, N$.
This is the primal formulation for the linearly separable case.
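
A minimal sketch of this primal hard-margin problem, assuming the cvxpy package is available; the toy data and variable names are my own:

```python
import cvxpy as cp
import numpy as np

# Tiny linearly separable toy set: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# argmin_{w,b} (1/2)||w||^2  s.t.  y_n (w^T x_n + b) >= 1 for all n.
constraints = [cp.multiply(y, X @ w + b) >= 1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print(w.value, b.value)
```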

19 Support vector machines - Linearly separable classes (IV)
Using the Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions, we obtain the dual formulation. The Lagrangian is
$L(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_n \lambda_n \{y_n(w^T x_n + b) - 1\}$
$\partial L/\partial w = 0 \Rightarrow w - \sum_n \lambda_n y_n x_n = 0$
$\partial L/\partial b = 0 \Rightarrow \sum_n \lambda_n y_n = 0$
$\lambda_i \ge 0$, $i = 1, \dots, N$
$\lambda_i (y_i(w^T x_i + b) - 1) = 0$, $i = 1, \dots, N$
$y_i(w^T x_i + b) - 1 \ge 0$, $i = 1, \dots, N$

20 Support vector machines - Linearly separable classes (V)
After a bit of algebra we end up with the equivalent optimization task, called the dual representation:
$\operatorname{argmax}_\lambda \sum_{n=1}^N \lambda_n - \frac{1}{2}\sum_{i,j} \lambda_i \lambda_j y_i y_j x_i^T x_j$
subject to $\sum_n \lambda_n y_n = 0$ and $\lambda_i \ge 0$, $i = 1, \dots, N$.
Note that we now have $N$ variables, instead of the initial $D$ variables ($D$ is the dimension of the data space). Note also that in the objective function the training data only show up in inner products.
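
The dual can be handed to any QP solver. A sketch with scipy for the hard-margin case (only $\lambda_n \ge 0$ and $\sum_n \lambda_n y_n = 0$); toy data and names are mine:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Q_ij = y_i y_j x_i^T x_j; dual objective = sum(lam) - 0.5 lam^T Q lam.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(lam):
    return -(lam.sum() - 0.5 * lam @ Q @ lam)

cons = ({'type': 'eq', 'fun': lambda lam: lam @ y},)
bounds = [(0, None)] * N

res = minimize(neg_dual, np.zeros(N), method='SLSQP',
               bounds=bounds, constraints=cons)
lam = res.x
print(lam)   # nonzero entries correspond to the support vectors
```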

21 Support vector machines - Linearly separable classes (VI): classifying new data points
After finding the optimal $\lambda$, we can classify new data points by evaluating the sign of
$y(x) = x^T w + b = \sum_n \lambda_n y_n x^T x_n + b$   (1)
Any training point with $\lambda_n = 0$ plays no role in making predictions. Linear regression estimates a set of parameters and then the training points are not used for making predictions; k-nearest neighbours uses ALL training points to make predictions; SVMs are something in between: they use only the points with $\lambda_n \neq 0$ (the support vectors) to make predictions, i.e. they are sparse kernel machines.

22 Support vector machines - Linearly separable classes (VII): classifying new data points
Note that $w$ does not need to be computed explicitly... but we do need $b$. Any support vector satisfies (from the KKT conditions or from the very formulation of SVMs) $y_m y(x_m) = 1 \Leftrightarrow y_m(w^T x_m + b) = 1$. Using (1) we get $y_m\left(\sum_n \lambda_n y_n x_m^T x_n + b\right) = 1$, where $x_m$ is any support vector, hence $b = y_m - \sum_n \lambda_n y_n x_m^T x_n$.
It is numerically more stable to average over ALL support vectors:
$b = \frac{1}{N_S}\sum_{m \in S}\left(y_m - \sum_{n \in S}\lambda_n y_n x_m^T x_n\right)$, with $m$ and $n$ running only over the support vectors.
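
Continuing the previous sketch, $w$ and the averaged $b$ can be recovered from the optimal $\lambda$; this is again only a sketch, with a small tolerance (my choice) to decide which $\lambda_n$ count as support vectors:

```python
import numpy as np

def recover_w_b(lam, X, y, tol=1e-6):
    """Recover w and the averaged bias b from the optimal dual variables."""
    sv = lam > tol                       # indices of the support vectors
    w = (lam * y) @ X                    # w = sum_n lambda_n y_n x_n
    # b = (1/N_S) sum_{m in S} ( y_m - sum_{n in S} lambda_n y_n x_m^T x_n )
    K = X[sv] @ X[sv].T
    b = np.mean(y[sv] - K @ (lam[sv] * y[sv]))
    return w, b

# Hypothetical usage, with lam, X, y from the dual solution above:
#   w, b = recover_w_b(lam, X, y)
#   predictions = np.sign(X_new @ w + b)
```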

23 Support vector machines - Linearly separable classes (VIII): a geometric interpretation
Find the convex hull of both sets of points (the convex hull is the smallest convex set containing the points); find the two closest points in the two convex hulls; construct the plane that bisects these two points.
Figure: from [3]

24 Support vector machines - Linearly separable classes (IX): a geometric interpretation
The closest points in the convex hulls can be found by solving the following quadratic problem:
$\operatorname{argmin}_\alpha \frac{1}{2}\|c - d\|^2$ s.t. $c = \sum_{x_i \in C_1} \alpha_i x_i$, $d = \sum_{x_i \in C_2} \alpha_i x_i$, $\sum_{x_i \in C_1} \alpha_i = 1$, $\sum_{x_i \in C_2} \alpha_i = 1$, $\alpha_i \ge 0$, $i = 1, \dots, N$.
It can be shown to be equivalent to the maximum margin approach.

25 Support vector machines - Non-linearly separable classes (I)
When in possession of a dataset that is not linearly separable, a simple linear separator is bound to misclassify some points. The question we have to ask ourselves is whether the non-linearly separable data portrays some intrinsic property of the problem (in which case a more complex classifier, allowing more general boundaries between classes, may be more appropriate) or whether it can be interpreted as the result of noisy points (measurement errors, uncertainty in class membership, etc.), in which case keeping the linear separator and accepting some errors is more natural.
We will keep the linear boundary for now and accept errors in the training data.

26 Support vector machines - Non-linearly separable classes (II)
Allow some of the training points to be misclassified and penalize the number of misclassifications. Introduce slack variables $\xi_n \ge 0$, $n = 1, \dots, N$:
$\xi_n = 0$ if $x_n$ is on the correct side of the margin, and $\xi_n = |y_n - y(x_n)|$ otherwise.
Figure: from [1]

27 Support vector machines - Non-linearly separable classes (III)
Replace the exact constraints with $y_n y(x_n) \ge 1 - \xi_n \Leftrightarrow y_n(w^T x_n + b) \ge 1 - \xi_n$.
The goal could now be to minimize $\frac{1}{2}\|w\|^2 + C\sum_{n=1}^N \operatorname{sign}(\xi_n)$. The parameter $C$ controls the trade-off between the dual objectives of maximizing the margin of separation and minimizing the misclassification error. For an error to occur, the corresponding $\xi_n$ must exceed unity, so $\sum_{n=1}^N \operatorname{sign}(\xi_n)$ is an upper bound on the number of training errors.
Optimization of the above is difficult since it involves the discontinuous function sign(); we optimize instead the closely related cost function $\frac{1}{2}\|w\|^2 + C\sum_{n=1}^N \xi_n$.

28 Support vector machines - Non-linearly separable classes (III)
Figure: from [1]

29 Support vector machines - Non-linearly separable classes (IV)
$L(w, b, \xi, \mu, \lambda) = \frac{1}{2}\|w\|^2 + C\sum_n \xi_n - \sum_n \lambda_n (y_n y(x_n) - 1 + \xi_n) - \sum_n \mu_n \xi_n$
KKT conditions:
$\lambda_n \ge 0$, $y_n y(x_n) - 1 + \xi_n \ge 0$, $\lambda_n (y_n y(x_n) - 1 + \xi_n) = 0$, $\mu_n \ge 0$, $\xi_n \ge 0$, $\mu_n \xi_n = 0$.

30 Support vector machines - Non-linearly separable classes (Va)
$\partial L/\partial w = 0 \Rightarrow w = \sum_n \lambda_n y_n x_n$
$\partial L/\partial b = 0 \Rightarrow \sum_n \lambda_n y_n = 0$
$\partial L/\partial \xi_n = 0 \Rightarrow \lambda_n = C - \mu_n$
Dual Lagrangian form:
$\operatorname{argmax}_\lambda L(\lambda) = \sum_{n=1}^N \lambda_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m x_m^T x_n$
subject to $0 \le \lambda_n \le C$ and $\sum_{n=1}^N \lambda_n y_n = 0$.
This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on $\lambda_n$. A QP solver can be used to find $\lambda_n$.
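
In practice one rarely codes this QP by hand; scikit-learn's SVC solves exactly this box-constrained dual. A sketch, assuming scikit-learn is installed (data is synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian blobs: not perfectly separable.
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# C is the upper bound on the dual variables lambda_n (0 <= lambda_n <= C).
clf = SVC(kernel='linear', C=1.0).fit(X, y)

print(clf.n_support_)        # number of support vectors per class
print(clf.dual_coef_)        # lambda_n * y_n for the support vectors
print(clf.intercept_)        # the bias b
```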

31 Support vector machines - Non-linearly separable classes (Vb): the SMO algorithm
Consider solving the unconstrained optimization problem $\max_\alpha W(\alpha_1, \dots, \alpha_m)$. Possible approaches: EM (to be discussed), gradient ascent, coordinate ascent.

32 Support vector machines - Non-linearly separable classes (Vc): coordinate ascent
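
A toy illustration of coordinate ascent (my own example, unconstrained): maximize a concave quadratic by exactly optimizing one coordinate at a time while holding the others fixed.

```python
import numpy as np

# Maximize W(a) = -a^T A a + b^T a (concave since A is positive definite).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])

a = np.zeros(2)
for it in range(20):
    for i in range(len(a)):
        # Exact maximization along coordinate i with the others fixed:
        # dW/da_i = -2 A_ii a_i - 2 sum_{j!=i} A_ij a_j + b_i = 0.
        rest = A[i] @ a - A[i, i] * a[i]
        a[i] = (b[i] - 2.0 * rest) / (2.0 * A[i, i])

print(a)   # converges to the maximizer 0.5 * A^{-1} b
```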

33 Support vector machines - Non-linearly separable classes (Vc): sequential minimal optimization
Constrained optimization:
$\operatorname{argmax}_\lambda L(\lambda) = \sum_{n=1}^N \lambda_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m x_m^T x_n$
subject to $0 \le \lambda_n \le C$ and $\sum_{n=1}^N \lambda_n y_n = 0$.
Question: can we do coordinate ascent along one direction at a time (i.e., hold all $\lambda_j$, $j \neq i$, fixed and update $\lambda_i$)?

34 Support vector machines - Non-linearly separable classes (Vd): the SMO algorithm
Repeat till convergence:
select some pair $\lambda_i$ and $\lambda_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum);
re-optimize $L(\lambda)$ with respect to $\lambda_i$ and $\lambda_j$, while holding all the other $\lambda_k$'s ($k \neq i, j$) fixed.

35 Support vector machines - Non-linearly separable classes (VI)
Predictions: $y(x) = \sum_{n=1}^N \lambda_n y_n x^T x_n + b$, with
$b = \frac{1}{N_M}\sum_{n \in M}\left(y_n - \sum_{m \in S}\lambda_m y_m x_m^T x_n\right)$,
where $M$ denotes the set of indices of data having $0 < \lambda_n < C$ and $S$ denotes the set of indices of data having $\lambda_n \neq 0$ (the set of support vectors).

36 Support vector machines - Non-linearly separable classes (VII): a geometric interpretation
Find the reduced convex hull of both sets of points (the convex hull is the smallest convex set containing the points); find the two closest points in the two reduced convex hulls; construct the plane that bisects these two points.
(a) An example of two non-linearly separable datasets (from [3]). (b) Reduced convex hull approach (from [3]).

37 Support vector machines - Non-linearly separable classes (VIII): a geometric interpretation
The closest points in the reduced convex hulls can be found by solving the following quadratic problem:
$\operatorname{argmin}_\alpha \frac{1}{2}\|c - d\|^2$ s.t. $c = \sum_{x_i \in C_1} \alpha_i x_i$, $d = \sum_{x_i \in C_2} \alpha_i x_i$, $\sum_{x_i \in C_1} \alpha_i = 1$, $\sum_{x_i \in C_2} \alpha_i = 1$, $0 \le \alpha_i \le D$, $i = 1, \dots, N$.
It can be shown to be equivalent to the maximum margin approach.

38 Support vector machines - Non-linearly separable classes (IX)
Show that, for a linearly separable dataset, the solution for the formulation with the $\xi$'s may be different from the solution for the formulation without the $\xi$'s.
What is the solution for the formulation without the $\xi$'s for a non-linearly separable dataset?

39 Support vector machines - Non-linearly separable classes (X)
Dual Lagrangian form: $L(\lambda) = \sum_{n=1}^N \lambda_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m x_m^T x_n$, subject to $0 \le \lambda_n \le C$ and $\sum_{n=1}^N \lambda_n y_n = 0$.
$b = \frac{1}{N_M}\sum_{m \in M}\left(y_m - \sum_{s \in S}\lambda_s y_s x_s^T x_m\right)$
Predictions: $y(x) = \sum_{n=1}^N \lambda_n y_n x^T x_n + b$
Notes: there are $N$ unknowns (the $\lambda$'s); the input vectors $x$ enter only in the form of scalar products; the dimension of the data has almost no impact on the complexity.

40 Support vector machines - The design of nonlinear classifiers
The design of a nonlinear classifier for the original data space $x \in \mathbb{R}^d$ can be accomplished with the design of a linear classifier in a suitable space $\hat{x} \in \mathbb{R}^D$; usually $D > d$, but that is not always necessary.
Examples:
1. the search for a boundary $x_1^2 + x_2^2 = \text{const}$ in $\mathbb{R}^2$ can be carried out as the search for a linear boundary in the space defined by the variables $r = \sqrt{x_1^2 + x_2^2}$ and $\theta = \arctan(x_2/x_1)$; in fact it could be simplified to the search in $\mathbb{R}$ defined just by the variable $r$;
2. the search for a boundary $w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2 + w_6 = 0$ in $\mathbb{R}^2$ can be carried out as the search for a linear boundary in the space defined by the variables $\hat{x}_1 = x_1$, $\hat{x}_2 = x_2$, $\hat{x}_3 = x_1 x_2$, $\hat{x}_4 = x_1^2$, $\hat{x}_5 = x_2^2$.

41 Support vector machines - The design of nonlinear classifiers
To design a nonlinear SVM we just define a new space $\Phi(x)$ where our problem becomes linear and apply the linear SVM in the new space $\Phi(x)$ (the linearly separable version or the non-linearly separable version). The changes in our formulations are minimal: just replace $x_i^T x_j$ by $\Phi(x_i)^T \Phi(x_j)$. Define $k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$. The nonlinear SVM becomes:
$L(\lambda) = \sum_{n=1}^N \lambda_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \lambda_n \lambda_m y_n y_m k(x_m, x_n)$, subject to $0 \le \lambda_n \le C$ and $\sum_{n=1}^N \lambda_n y_n = 0$
$b = \frac{1}{N_M}\sum_{m \in M}\left(y_m - \sum_{s \in S}\lambda_s y_s k(x_s, x_m)\right)$
Predictions: $y(x) = \sum_{n=1}^N \lambda_n y_n k(x, x_n) + b$
The new space $\Phi(x)$ should capture the nonlinearity intrinsic to the data (and not the noise). In the simplest case $\Phi(x) = x$ and $k(x_i, x_j) = x_i^T x_j$.
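
A sketch of the kernelised SVM in scikit-learn; kernel='rbf' corresponds to the Gaussian kernel discussed later, and kernel='precomputed' lets you pass the Gram matrix $k(x_i, x_j)$ directly. The dataset and parameters below are arbitrary choices of mine.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# A radially separable problem: class depends on the distance to the origin.
X = rng.normal(size=(200, 2))
y = np.where((X ** 2).sum(axis=1) > 1.0, 1, -1)

# Gaussian (RBF) kernel SVM; gamma plays the role of the kernel width.
clf = SVC(kernel='rbf', C=1.0, gamma=1.0).fit(X, y)
print(clf.score(X, y))

# Equivalent route with a precomputed Gram matrix K_ij = k(x_i, x_j).
K = np.exp(-1.0 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
clf2 = SVC(kernel='precomputed', C=1.0).fit(K, y)
print(clf2.score(K, y))
```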

42 Kernel methods - The design of nonlinear classifiers
It would be nice to be able to compute $k(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ without needing to work explicitly in the transformed space $\Phi(x)$.
Mercer's theorem: let $x \in \mathbb{R}^d$ and a mapping $\Phi$, $x \mapsto \Phi(x) \in H$, where $H$ is a Euclidean space (a linear space equipped with an inner product). Then $\Phi(x_1)^T \Phi(x_2) = k(x_1, x_2)$, where $k(x_1, x_2)$ is a symmetric function satisfying $\int\!\!\int k(x_1, x_2)\, g(x_1)\, g(x_2)\, dx_1\, dx_2 \ge 0$ for any $g(x)$, $x \in \mathbb{R}^d$, such that $\int g^2(x)\, dx < \infty$.
Note that Mercer's theorem does not tell us how to find this space. A necessary and sufficient condition for a function $k(x, x')$ to be a valid kernel is that the Gram matrix $K$, whose elements are given by $k(x_n, x_m)$, should be a positive semidefinite matrix for all possible choices of the set $\{x_n\}$.
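
A quick numerical sanity check (a sketch, not a proof): build the Gram matrix of a candidate kernel on random points and inspect its eigenvalues; for a valid kernel they should all be non-negative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))

def gaussian_gram(X, gamma=0.5):
    # K_nm = exp(-gamma * ||x_n - x_m||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

K = gaussian_gram(X)
eigvals = np.linalg.eigvalsh(K)   # symmetric matrix, so eigvalsh is appropriate
print(eigvals.min())              # ~>= 0 (up to numerical precision)
```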

43 Kernel methods - SVM examples

44 Kernel methods - SVM examples

45 Kernel methods - The design of nonlinear classifiers
Many models can be reformulated in terms of a dual representation in which the kernel function arises naturally: linear regression, kernel PCA, nearest neighbour, etc.

46 Beyond basic Linear Regression - Dual variables in regression
$(X^T X + \lambda I) w = X^T y \Rightarrow w = \lambda^{-1} X^T (y - Xw) = X^T \alpha$, showing that $w$ can be written as a linear combination of the training points, $w = \sum_{n=1}^N \alpha_n x_n$, with $\alpha = \lambda^{-1}(y - Xw)$. The elements of $\alpha$ are called the dual variables.
$\alpha = \lambda^{-1}(y - Xw) \Rightarrow \lambda \alpha = y - X X^T \alpha \Rightarrow (X X^T + \lambda I)\alpha = y \Rightarrow \alpha = (G + \lambda I)^{-1} y$, with $G = X X^T$ or, component-wise, $G_{ij} = \langle x_i, x_j \rangle$.
The resulting prediction function is given by $g(x) = \langle w, x \rangle = \left\langle \sum_{n=1}^N \alpha_n x_n, x \right\rangle = \sum_n \alpha_n \langle x_n, x \rangle$.
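
A sketch of this dual view of ridge regression in numpy: $\alpha = (G + \lambda I)^{-1} y$ with $G = XX^T$, and predictions use only inner products with the training points. The data and the value of $\lambda$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 0.1

# Dual solution: alpha = (G + lambda I)^{-1} y, with G = X X^T.
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(len(y)), y)

# Prediction g(x) = sum_n alpha_n <x_n, x>; here for a new point x_new.
x_new = rng.normal(size=3)
g = alpha @ (X @ x_new)

# Cross-check against the primal solution w = (X^T X + lambda I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(g, w @ x_new)   # the two predictions should coincide
```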

47 Kernel methods - An example of a kernel algorithm
Idea: classify points $x$ in the feature space according to which of the two class means is closer.
$c_+ = \frac{1}{N_+}\sum_{i: y_i = +1} x_i$, $c_- = \frac{1}{N_-}\sum_{i: y_i = -1} x_i$
Compute the sign of the dot product between $w = c_+ - c_-$ and $x - c$, where $c$ is the midpoint between the class means.
Figure: from [?]

48 Kernel methods - An example of a kernel algorithm
$f(x) = \operatorname{sgn}\left(\frac{1}{N_+}\sum_{i: y_i = +1}\langle x, x_i\rangle - \frac{1}{N_-}\sum_{i: y_i = -1}\langle x, x_i\rangle + b\right) = \operatorname{sgn}\left(\frac{1}{N_+}\sum_{i: y_i = +1} k(x, x_i) - \frac{1}{N_-}\sum_{i: y_i = -1} k(x, x_i) + b\right)$
where $b = \frac{1}{2}\left(\frac{1}{N_-^2}\sum_{(i,j): y_i = y_j = -1} k(x_i, x_j) - \frac{1}{N_+^2}\sum_{(i,j): y_i = y_j = +1} k(x_i, x_j)\right)$.
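
A direct numpy implementation of this rule (a sketch of my own; with the default linear kernel $k(x, x') = \langle x, x'\rangle$ it reduces to comparing distances to the two class means):

```python
import numpy as np

def kernel_mean_classifier(X, y, x_new, k=lambda a, b: a @ b.T):
    """Classify x_new by which class mean is closer in the kernel feature space."""
    Xp, Xm = X[y == 1], X[y == -1]
    Np, Nm = len(Xp), len(Xm)
    # Offset b = 1/2 ( 1/Nm^2 sum k(x-, x-) - 1/Np^2 sum k(x+, x+) )
    b = 0.5 * (k(Xm, Xm).sum() / Nm**2 - k(Xp, Xp).sum() / Np**2)
    f = k(x_new[None, :], Xp).sum() / Np - k(x_new[None, :], Xm).sum() / Nm + b
    return np.sign(f)

# Toy usage with two well separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
print(kernel_mean_classifier(X, y, np.array([1.5, 2.0])))   # expected: 1.0
```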

49 Kernel methods - Typical kernels
Any algorithm that only depends on dot products can benefit from the kernel trick. Think of the kernel as a nonlinear similarity measure. Examples of common kernels:
linear: $k(x_1, x_2) = x_1^T x_2$
polynomial: $k(x_1, x_2) = (x_1^T x_2 + c)^d$
Gaussian: $k(x_1, x_2) = \exp(-\gamma \|x_1 - x_2\|^2)$
Kernels are also known as covariance functions.
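
The three kernels written out as plain functions (a sketch, vectorised over rows so each returns a full Gram matrix):

```python
import numpy as np

def linear_kernel(X1, X2):
    return X1 @ X2.T

def polynomial_kernel(X1, X2, c=1.0, d=3):
    return (X1 @ X2.T + c) ** d

def gaussian_kernel(X1, X2, gamma=0.5):
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

# Example: Gram matrices on a small random sample.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
print(linear_kernel(A, B).shape, polynomial_kernel(A, B).shape, gaussian_kernel(A, B).shape)
```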

50 Kernel methods - Constructing kernels
Given valid kernels $k_1(x, x')$ and $k_2(x, x')$, the following new kernels will also be valid:
$k(x, x') = c\, k_1(x, x')$, $c \in \mathbb{R}^+$
$k(x, x') = f(x)\, k_1(x, x')\, f(x')$, where $f(\cdot)$ is any function
$k(x, x') = k_1(x, x') + k_2(x, x')$
$k(x, x') = k_1(x, x')\, k_2(x, x')$
$k(x, x') = \exp(k_1(x, x'))$

51 Regression Support Vector Machine
$\min_{w,b}\; C\sum_n E_\epsilon(y_n - f(x_n)) + \frac{\lambda}{2}\|w\|^2$, with the $\epsilon$-insensitive error function
$E_\epsilon(z) = 0$ if $|z| < \epsilon$, and $E_\epsilon(z) = |z| - \epsilon$ otherwise.
(a) $\epsilon$-insensitive error function. (b) RVM result. Figure: from [1]
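
A sketch of $\epsilon$-insensitive support vector regression with scikit-learn on sinusoidal toy data of my own; epsilon is the width of the insensitive tube and C the error penalty.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

# epsilon-insensitive loss: errors smaller than epsilon are not penalised.
reg = SVR(kernel='rbf', C=10.0, epsilon=0.1).fit(X, y)

print(len(reg.support_))          # points outside the epsilon-tube become SVs
print(reg.predict(X[:5]))
```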

52 Multiclass Support Vector Machine - Reduction techniques
Conventional approaches:
One-against-all: $K$ two-class problems
Pairwise: $K(K-1)/2$ two-class problems
Decision-tree-based
DAG (Directed Acyclic Graph)
Error-correcting output codes

53 Multiclass Support Vector Machine - Reduction techniques
Conventional approaches:
apply binary classifier 1 to the test example and get prediction $F_1$ (0/1);
apply binary classifier 2 to the test example and get prediction $F_2$ (0/1);
...
apply binary classifier $M$ to the test example and get prediction $F_M$ (0/1);
use all $M$ classifications to get the final multiclass classification in $1..K$.
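
A minimal one-against-all sketch: train $K$ binary SVMs, one per class, and pick the class whose classifier gives the largest decision value. The scikit-learn dataset and estimator are used only for convenience and are my own choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One binary SVM per class: class k vs. the rest.
models = [SVC(kernel='linear', C=1.0).fit(X, np.where(y == k, 1, -1)) for k in classes]

# Final decision: the class whose binary classifier is most confident.
scores = np.column_stack([m.decision_function(X) for m in models])
y_pred = classes[np.argmax(scores, axis=1)]
print((y_pred == y).mean())
```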

54 Multiclass Support Vector Machine - Reduction techniques
A lesser-known approach solves multiclass problems via a single binary classifier. The reduction is obtained by embedding the original problem in a higher-dimensional space consisting of the original features, as well as one or more other dimensions determined by fixed vectors, termed here extension features. This embedding is implemented by replicating the training set points so that a copy of the original point is concatenated with each of the extension feature vectors. The binary labels of the replicated points are set to maintain a particular structure in the extended space. This construction results in an instance of an artificial binary problem, which is fed to a binary learning algorithm that outputs a single soft binary classifier. To classify a new point, the point is replicated and extended similarly and the resulting replicas are fed to the soft binary classifier, which generates a number of signals, one for each replica. The class is determined as a function of these signals.

55 Multiclass Support Vector Machine - All-at-Once Support Vector Machines
For a $K$-class problem we define the decision rule for class $i$ by $w_i^T x + b_i > w_k^T x + b_k$, for $k \neq i$, $k = 1, \dots, K$.
The L1 soft-margin support vector machine can be obtained by minimizing
$\min_{w,b,\xi} \frac{1}{2}\sum_{k=1}^K \|w_k\|^2 + C\sum_{n=1}^N \sum_{k=1, k \neq y_n}^K \xi_{n,k}$
subject to $(w_{y_n} - w_k)^T x_n + b_{y_n} - b_k \ge 1 - \xi_{n,k}$, for $k \neq y_n$, $k = 1, \dots, K$, $n = 1, \dots, N$.

56 References
[1] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[2] Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, Elsevier, Academic Press.
[3] Kristin P. Bennett and Colin Campbell, Support Vector Machines: Hype or Hallelujah?, SIGKDD Explorations, vol. 2, 2000.
[4] Michael E. Mavroforakis and Sergios Theodoridis, A Geometric Approach to Support Vector Machine (SVM) Classification, IEEE Transactions on Neural Networks, 2006.
