Introduction to Machine Learning NPFL 054


Barbora Hladká, Martin Holub
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
NPFL054, 2016, Lecture 7

Outline
- Evaluation of binary classifiers (continued): ROC curve
- Support Vector Machines (SVM)

Support Vector Machines

Basic idea of SVM for binary classification tasks: find a hyperplane that separates the two classes in the feature space. If that is not possible, either allow some training errors, or enrich the feature space so that a separating hyperplane can be found.

Support Vector Machines

Three key ideas:
- Maximizing the margin
- Duality of the optimization task
- Kernels

Key concepts needed:
- Hyperplane
- Dot product
- Quadratic programming

Hyperplane

A hyperplane of an m-dimensional space is a subspace of dimension m - 1.

Mathematical definition: Θ_0 + Θ^T x = 0, where Θ = (Θ_1, ..., Θ_m).
- If m = 2, a hyperplane is a line.
- Θ is a normal vector of the hyperplane.
- If x satisfies the equation, then it lies on the hyperplane.
- If Θ_0 + Θ^T x > 0 (or < 0), then x lies on one (or the other) side of the hyperplane.

Hyperplane (figure)

Hyperplane: separating hyperplane (figure)

Hyperplane: point-hyperplane distance

The distance of x to the hyperplane Θ_0 + Θ^T x = 0 is

ρ(x) = |Θ_0 + Θ^T x| / ‖Θ‖
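A tiny numeric check of this formula (toy numbers of my own, assuming NumPy is available); the sign of Θ_0 + Θ^T x additionally gives the side of the hyperplane, as on the earlier slide.

```python
# Toy hyperplane x1 + x2 - 1 = 0 in R^2, i.e. Theta_0 = -1, Theta = (1, 1).
import numpy as np

theta_0, theta = -1.0, np.array([1.0, 1.0])
for x in (np.array([2.0, 2.0]), np.array([0.0, 0.0]), np.array([0.5, 0.5])):
    val = theta_0 + theta @ x
    print(x, np.sign(val), abs(val) / np.linalg.norm(theta))
# (2,2) lies on the positive side at distance 3/sqrt(2); (0,0) on the negative
# side at distance 1/sqrt(2); (0.5,0.5) lies on the hyperplane itself.
```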

Dot product

For x ∈ R^m:
- length (2-norm) of x: ‖x‖ = sqrt(Σ_{i=1}^m x_i^2)
- dot product of two vectors x_1, x_2 ∈ R^m: x_1 · x_2 = Σ_{i=1}^m x_{1i} x_{2i}
- x_1 · x_2 = ‖x_1‖ · ‖x_2‖ · cos α, where α is the angle between the vectors
- geometric interpretation of x_1 · x_2: the length of the projection of x_1 onto the unit vector x_2 (‖x_2‖ = 1)
- x · x = ‖x‖^2
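A short sketch of these identities with NumPy (toy vectors of my own):

```python
import numpy as np

x1, x2 = np.array([3.0, 4.0]), np.array([1.0, 0.0])   # note ||x2|| = 1
print(np.linalg.norm(x1))                  # 2-norm of x1: 5.0
print(x1 @ x1, np.linalg.norm(x1) ** 2)    # x.x = ||x||^2: 25.0, 25.0
# with ||x2|| = 1, x1.x2 is the length of the projection of x1 onto x2:
print(x1 @ x2)                             # 3.0, the first coordinate of x1
```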

Dot product: projection onto a unit vector x_2, ‖x_2‖ = 1 (figure)

Quadratic programming

Quadratic programming is the problem of optimizing a quadratic function of several variables subject to linear constraints on these variables.
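As a minimal illustration (a toy problem of my own, assuming SciPy is available), here is a quadratic objective minimized subject to one linear constraint; the SVM training problems below have exactly this shape:

```python
# minimize (1/2) x^T P x + q^T x  subject to  x1 + x2 >= 1
import numpy as np
from scipy.optimize import minimize

P = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
q = np.array([-1.0, -1.0])

res = minimize(lambda x: 0.5 * x @ P @ x + q @ x, x0=np.zeros(2),
               constraints=[{'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}],
               method='SLSQP')
print(res.x)
```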

Support Vector Machines: binary classification task, Y = {+1, -1}

The hypothesis h has the form h(x) = sgn(Θ_0 + Θ^T x).

Support Vector Machines: binary classification task, Y = {+1, -1}

Outline:
1. Large margin classifier (linearly separable data)
2. Soft margin classifier (data that is not linearly separable)
3. Kernels (non-linear class boundaries)

Support Vector Machines: binary classification task, Y = {+1, -1}

A data set Data = {⟨x_i, y_i⟩ : x_i ∈ X, y_i ∈ {-1, +1}} is linearly separable if there exists a hyperplane such that all instances from Data are classified correctly.

Support Vector Machines: binary classification task, Y = {+1, -1}

Assume a hyperplane g: Θ_0 + Θ^T x = 0.

Margin of x w.r.t. g is the distance of x to g:

ρ_g(x) = |Θ_0 + Θ^T x| / ‖Θ‖

Functional margin of ⟨x, y⟩ ∈ Data w.r.t. g:

ρ̂_g(x, y) = y(Θ_0 + Θ^T x)

Its sign answers: is x classified correctly or not? A large functional margin represents a correct and confident classification.

Geometric margin of ⟨x, y⟩ ∈ Data w.r.t. g, i.e. the functional margin scaled by 1/‖Θ‖:

ρ_g(x, y) = ρ̂_g(x, y) / ‖Θ‖
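A quick sketch of these margins on toy numbers (my own, assuming NumPy):

```python
import numpy as np

theta_0, theta = -1.0, np.array([1.0, 1.0])   # hyperplane x1 + x2 - 1 = 0
x, y = np.array([2.0, 2.0]), 1                # a correctly classified +1 point

functional = y * (theta_0 + theta @ x)           # y(Theta_0 + Theta^T x) = 3.0
geometric = functional / np.linalg.norm(theta)   # 3/sqrt(2), the distance to g
print(functional, geometric)
```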

Geometric margin of x (figure)

Support Vector Machines: binary classification task, Y = {+1, -1}

Geometric margin of Data w.r.t. g:

ρ_g(Data) = min_{⟨x,y⟩ ∈ Data} ρ_g(x, y)

Support Vector Machines: binary classification task, Y = {+1, -1}

Functional margin of Data w.r.t. g:

ρ̂_g(Data) = min_{⟨x,y⟩ ∈ Data} ρ̂_g(x, y)

The functional margin of a training set is the functional margin of its closest points.

Large Margin Classifier: training data is linearly separable

We look for the hyperplane g* with the largest geometric margin:

g* = argmax_g ρ_g(Data)

Large Margin Classifier: training data is linearly separable

Θ_0 + Θ^T x = 0 and kΘ_0 + (kΘ)^T x = 0 define the same hyperplane for any k > 0, i.e. distances to the separator are unchanged:

y_i(Θ_0 + Θ^T x_i) / ‖Θ‖ = y_i(kΘ_0 + (kΘ)^T x_i) / ‖kΘ‖

Thus we can rescale Θ so that the functional margin ρ̂_g(Data) = 1. Then

g* = argmax_g ρ_g(Data) = argmax_g 1/‖Θ‖

Large Margin Classifier: training data is linearly separable (figure)

Large Margin Classifier: training data is linearly separable

Goal: orient the separating hyperplane to be as far as possible from the closest instances of both classes:

Θ* = argmax_Θ 1/‖Θ‖

Support vectors are the instances touching the margins.

Large Margin Classifier: training data is linearly separable (figure)

Large Margin Classifier: training data is linearly separable

Θ* = argmax_Θ 1/‖Θ‖ = argmin_Θ (1/2)‖Θ‖^2

Large Margin Classifier: training data is linearly separable

Primal problem: minimize

(1/2)‖Θ‖^2

subject to the constraints

y_i(Θ_0 + Θ^T x_i) ≥ 1, i = 1, ..., n

Properties:
1. Convex optimization
2. Unique solution for linearly separable training data
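A minimal sketch of this primal problem on a tiny separable toy set, solved with SciPy's general-purpose SLSQP solver (a stand-in for a real QP solver; the data and variable names are mine, not from the slides):

```python
# Variable packing: w = [theta_0, theta_1, theta_2].
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, 0.5],         # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-0.5, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def objective(w):
    theta = w[1:]
    return 0.5 * theta @ theta                  # (1/2)||Theta||^2

# y_i (theta_0 + theta^T x_i) - 1 >= 0 for every training example
constraints = [{'type': 'ineq',
                'fun': lambda w, xi=xi, yi=yi: yi * (w[0] + w[1:] @ xi) - 1}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints, method='SLSQP')
theta_0, theta = res.x[0], res.x[1:]
print('theta_0 =', theta_0, 'theta =', theta)
print('functional margins:', y * (theta_0 + X @ theta))   # all >= 1
```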

Large Margin Classifier: training data is linearly separable

For each training example ⟨x_i, y_i⟩, introduce a Lagrange multiplier α_i ≥ 0, and let α = (α_1, ..., α_n). The primal Lagrangian L(Θ, Θ_0, α) is given by

L(Θ, Θ_0, α) = (1/2)‖Θ‖^2 - Σ_i α_i (y_i(Θ_0 + Θ^T x_i) - 1)    (1)

subject to the constraints α_i [y_i(Θ_0 + Θ^T x_i) - 1] = 0, i = 1, ..., n.

We wish to find the Θ and Θ_0 that minimize L and the α that maximizes it.

Large Margin Classifier: training data is linearly separable

1. Minimize L w.r.t. Θ: differentiate L w.r.t. Θ and set ∂L/∂Θ = 0. This gives

Θ = Σ_{i=1}^n α_i y_i x_i    (2)

2. Minimize L w.r.t. Θ_0: differentiate L w.r.t. Θ_0 and set ∂L/∂Θ_0 = 0. This gives

Σ_{i=1}^n α_i y_i = 0    (3)

Large Margin Classifier: training data is linearly separable

3. Substitute (2) into the primal form (1). Then

L(Θ, Θ_0, α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j

subject to α_i ≥ 0, i = 1, ..., n, and Σ_i α_i y_i = 0.

Large Margin Classifier: training data is linearly separable

4. Solve the dual problem, i.e. maximize the quadratic function

L(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j

subject to α_i ≥ 0, i = 1, ..., n, and Σ_i α_i y_i = 0. This is quadratic programming.

5. Get α*.
6. Then Θ* = Σ_{i=1}^n α_i* y_i x_i.
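The same can be sketched from the dual side (again with SciPy's SLSQP as a stand-in for a QP solver; data and names are mine): maximize L(α) by minimizing -L(α), then recover Θ* via equation (2).

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
Q = (y[:, None] * X) @ (y[:, None] * X).T    # Q_ij = y_i y_j x_i^T x_j

def neg_dual(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(n), method='SLSQP',
               bounds=[(0, None)] * n,                    # alpha_i >= 0
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}])  # sum a_i y_i = 0
alpha = res.x
theta = (alpha * y) @ X                      # equation (2)
sv = alpha > 1e-6                            # support vectors have alpha_i > 0
theta_0 = np.mean(y[sv] - X[sv] @ theta)     # from y_i(theta_0 + theta^T x_i) = 1
print('alpha =', alpha.round(3), 'theta =', theta, 'theta_0 =', theta_0)
```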

Large Margin Classifier: training data is linearly separable

Θ* is the solution to the primal problem and α* is the solution to the dual problem. Θ* and α* satisfy the Karush-Kuhn-Tucker conditions, one of which is the so-called Karush-Kuhn-Tucker dual complementarity:

α_i (1 - y_i(Θ_0 + Θ^T x_i)) = 0

- y_i(Θ_0 + Θ^T x_i) > 1 (x_i is not a support vector) implies α_i = 0
- α_i > 0 implies y_i(Θ_0 + Θ^T x_i) = 1 (x_i is a support vector)

I.e., finding Θ* is equivalent to finding the support vectors and their weights.

Large Margin Classifier: training data is linearly separable

Prediction for a new instance x:

h(x) = sgn(Σ_{i=1}^n α_i y_i (x_i · x) + Θ_0)

- x_i · x measures the similarity between x and support vector x_i: a support vector that is more similar to x contributes more to the classification.
- A support vector that is more important, i.e. has a larger α_i, contributes more to the classification.
- If y_i is positive, then the contribution is positive, otherwise negative.
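Continuing the dual sketch above (reusing alpha, y, X and theta_0 computed there), the prediction rule is a few lines:

```python
import numpy as np

def h(x_new):
    # sgn( sum_i alpha_i y_i (x_i . x_new) + theta_0 )
    return np.sign((alpha * y) @ (X @ x_new) + theta_0)

print(h(np.array([1.5, 2.0])), h(np.array([-1.0, -2.0])))   # 1.0, -1.0
```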

Soft Margin Classifier: training data is not linearly separable

In a real problem it is unlikely that a line will exactly separate the data; even if a curved decision boundary is possible, exactly separating the data is probably not desirable. If the data has noise and outliers, a smooth decision boundary that ignores a few data points is better than one that loops around the outliers.

Thus: minimize ‖Θ‖^2 AND the number of training mistakes.

Soft Margin Classifier: training data is not linearly separable (figure)

Soft Margin Classifier: training data is not linearly separable

Introduce slack variables ξ_i ≥ 0:
- ξ_i = 0 if x_i is correctly classified;
- otherwise ξ_i is the distance of x_i to "its supporting hyperplane":
  - 0 < ξ_i ≤ 1/‖Θ‖: margin violation
  - ξ_i > 1/‖Θ‖: misclassification

Soft Margin Classifier: training data is not linearly separable

Primal problem: minimize

(1/2)‖Θ‖^2 + C Σ_{i=1}^n ξ_i

subject to the constraints

y_i(Θ_0 + Θ^T x_i) ≥ 1 - ξ_i, i = 1, ..., n

C ≥ 0 is a trade-off parameter:
- small C: large margin
- large C: narrow margin

Soft Margin Classifier: training data is not linearly separable

Do quadratic programming as for the Large Margin Classifier. Prediction for a new instance x:

h(x) = sgn(Σ_{i=1}^n α_i y_i (x_i · x) + Θ_0)
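A minimal sketch of the C trade-off using scikit-learn's SVC (assumed available; toy data of my own with a planted outlier):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([1] * 50 + [-1] * 50)
X[0] = [-2.0, -2.0]                 # plant an outlier in the +1 class

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f'C={C}: support vectors = {clf.n_support_.sum()}, '
          f'train accuracy = {clf.score(X, y):.2f}')
```

Small C tolerates the outlier and keeps a wide margin; large C tries hard to classify every training point, narrowing the margin.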

Support Vector Machines: non-linear boundary

What if the examples are separated by a nonlinear region? (figure)

Support Vector Machines: non-linear boundary

Recall polynomial regression: an extension of linear regression where the relationship between the features and the target value is modelled as a d-th order polynomial.

Simple regression: y = Θ_0 + Θ_1 x_1
Polynomial regression: y = Θ_0 + Θ_1 x_1 + Θ_2 x_1^2 + ... + Θ_d x_1^d

This defines a feature mapping φ(x_1) = [x_1, x_1^2, ..., x_1^d]. It is still a linear model, now with the features A_1, A_1^2, ..., A_1^d.
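A small sketch of this feature mapping with NumPy (toy data of my own): polynomial regression is ordinary least squares on the mapped features.

```python
import numpy as np

def phi(x, d):
    # phi(x1) = [x1, x1^2, ..., x1^d], applied elementwise to a sample vector
    return np.column_stack([x ** k for k in range(1, d + 1)])

rng = np.random.RandomState(0)
x = np.linspace(-2, 2, 40)
y = 1.0 + 0.5 * x - 2.0 * x ** 2 + 0.1 * rng.randn(40)   # quadratic + noise

F = np.column_stack([np.ones_like(x), phi(x, d=2)])      # intercept + features
theta, *_ = np.linalg.lstsq(F, y, rcond=None)
print(theta.round(2))   # approximately [1.0, 0.5, -2.0]
```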

Support Vector Machines: Kernels

Idea: apply the Large/Soft margin classifier not to the original features but to the features obtained by a feature mapping φ:

φ(x): R^m → F

The Large/Soft margin classifier uses the dot product x_i · x_j. Now replace it with φ(x_i) · φ(x_j).

Support Vector Machines: Kernels

However, computing φ could be expensive.

Kernel trick: there is no need to know what φ is and what the feature space is, i.e. no need to explicitly map the data to the feature space. Define a kernel function K: R^m × R^m → R and replace the dot product x_i · x_j with the kernel function K(x_i, x_j):

L(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j)
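A quick numeric check of the trick on a standard textbook example (not from the slides): for x, z in R^2, the kernel K(x, z) = (x · z)^2 equals the dot product of the explicit mapping φ(x) = (x_1^2, √2 x_1 x_2, x_2^2), so the kernel computes a dot product in a 3-dimensional feature space without ever forming φ.

```python
import numpy as np

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z) ** 2)      # kernel value computed in R^2: 1.0
print(phi(x) @ phi(z))   # same value via the explicit feature map: 1.0
```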

Support Vector Machines: Kernels (figure)

Source: /machine-learning-dir/notes-dir/ker1/ker1-l.html

Support Vector Machines: Kernels

Common kernel functions:
- Linear: K(x_i, x_j) = x_i^T x_j
- Polynomial: K(x_i, x_j) = (γ x_i^T x_j + c)^d
- Radial basis function: K(x_i, x_j) = exp(-γ ‖x_i - x_j‖^2)
- Sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + c), where tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
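A minimal sketch of the four kernels in NumPy (γ, c, d are free parameters; the defaults below are my own choices):

```python
import numpy as np

def linear(xi, xj):
    return xi @ xj

def polynomial(xi, xj, gamma=1.0, c=1.0, d=3):
    return (gamma * (xi @ xj) + c) ** d

def rbf(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid(xi, xj, gamma=0.1, c=0.0):
    return np.tanh(gamma * (xi @ xj) + c)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for K in (linear, polynomial, rbf, sigmoid):
    print(K.__name__, K(xi, xj))
```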

Support Vector Machines: multiclass classification tasks

One-versus-one: train (K choose 2) binary SVM classifiers, one for each pair of classes. Classify x using each of the (K choose 2) classifiers; the instance is assigned to the class that is most frequently predicted in the pairwise classifications.
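A minimal one-versus-one sketch with scikit-learn (assumed available); SVC itself already uses this scheme internally for multiclass data, and the explicit wrapper makes the K(K-1)/2 pairwise classifiers visible:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                     # K = 3 classes
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ovo.estimators_))                           # K(K-1)/2 = 3 classifiers
print(ovo.predict(X[:5]))
```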

Support Vector Machines: multiclass classification tasks

One-versus-all: train K binary SVM classifiers. Each of them classifies the k-th class (+1) against all the others (-1) and is characterized by the hypothesis parameters Θ_k, k = 1, ..., K. The instance x is assigned to the class

k* = argmax_k Θ_k^T x
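A minimal one-versus-all sketch with scikit-learn's LinearSVC (assumed available), whose default multiclass strategy is one-vs-rest; decision_function returns the per-class scores Θ_k^T x, and their argmax is the prediction:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
ova = LinearSVC(max_iter=10000).fit(X, y)   # one-vs-rest by default
scores = ova.decision_function(X[:5])       # shape (5, K)
print(np.argmax(scores, axis=1))            # same classes as ova.predict(X[:5])
```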

Evaluation of binary classifiers: sensitivity vs. specificity

Confusion matrix:

True class \ Predicted class | Positive            | Negative
Positive                     | True Positive (TP)  | False Negative (FN)
Negative                     | False Positive (FP) | True Negative (TN)

Measures:
- Precision = TP / (TP + FP)
- Recall (sensitivity) = TP / (TP + FN)
- Specificity = TN / (TN + FP)
- Accuracy = (TP + TN) / (TP + FP + TN + FN)
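A small sketch computing the four measures from the counts in the confusion matrix (toy counts of my own):

```python
def binary_metrics(tp, fn, fp, tn):
    return {
        'precision':   tp / (tp + fp),
        'recall':      tp / (tp + fn),        # a.k.a. sensitivity
        'specificity': tn / (tn + fp),
        'accuracy':    (tp + tn) / (tp + fp + tn + fn),
    }

print(binary_metrics(tp=40, fn=10, fp=5, tn=45))
# precision ~0.889, recall 0.80, specificity 0.90, accuracy 0.85
```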

Evaluation of binary classifiers: sensitivity vs. specificity (figures)
- Perfect classifier
- Reality
- 100% sensitive classifier
- 100% specific classifier
- Sensitivity vs. specificity

Evaluation of binary classifiers: ROC curve (figure)
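A minimal ROC sketch with scikit-learn (assumed available): the curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate) as the decision threshold of the classifier varies.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = SVC(kernel='rbf').fit(X_tr, y_tr).decision_function(X_te)
fpr, tpr, thresholds = roc_curve(y_te, scores)   # one point per threshold
print('AUC =', roc_auc_score(y_te, scores))
```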
