Introduction to Machine Learning NPFL 054


1 Introduction to Machine Learning NPFL 054 Barbora Hladká Martin Holub Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics NPFL054, 2016 Hladká & Holub Lecture 7, page 1/51
Outline
Evaluation of binary classifiers (continued): ROC curve
Support Vector Machines (SVM)
Support Vector Machines: basic idea of SVM for binary classification tasks
We find a plane that separates the two classes in the feature space. If that is not possible, we either allow some training errors or enrich the feature space so that a separating plane can be found.
Support Vector Machines
Three key ideas: maximizing the margin, the dual optimization task, kernels.
Key concepts needed: hyperplane, dot product, quadratic programming.
Hyperplane
A hyperplane of an $m$-dimensional space is an affine subspace of dimension $m - 1$.
Mathematical definition: $\Theta_0 + \Theta^T x = 0$, where $\Theta = \langle \Theta_1, \dots, \Theta_m \rangle$.
If $m = 2$, a hyperplane is a line.
$\Theta$ is a normal vector of the hyperplane.
If $x$ satisfies the equation, then it lies on the hyperplane.
If $\Theta_0 + \Theta^T x \neq 0$, then $x$ lies to one side of the hyperplane; the sign of the expression determines which side.
Hyperplane
[Figure: separating hyperplane]
Hyperplane: point-hyperplane distance
The distance of $x$ to the hyperplane $\Theta_0 + \Theta^T x = 0$ is
$\rho(x) = \frac{|\Theta_0 + \Theta^T x|}{\|\Theta\|}$
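As a minimal sketch of the distance formula, the following Python function evaluates $|\Theta_0 + \Theta^T x| / \|\Theta\|$ for vectors given as plain lists. The function name `hyperplane_distance` and the example line and point are illustrative assumptions, not part of the lecture.

```python
import math

def hyperplane_distance(theta0, theta, x):
    """Distance of point x to the hyperplane theta0 + theta . x = 0."""
    signed = theta0 + sum(t * xi for t, xi in zip(theta, x))
    norm = math.sqrt(sum(t * t for t in theta))
    return abs(signed) / norm

# Hypothetical example: the line -5 + 3*x1 + 4*x2 = 0 and the point (3, 4)
print(hyperplane_distance(-5.0, [3.0, 4.0], [3.0, 4.0]))  # 4.0
```

Dropping the absolute value would give a signed distance whose sign tells on which side of the hyperplane the point lies.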
Dot product
For $x \in \mathbb{R}^m$, the length of $x$ (its 2-norm) is $\|x\| = \sqrt{\sum_{i=1}^{m} x_i^2}$.
The dot product of two vectors $x_1, x_2 \in \mathbb{R}^m$ is $x_1 \cdot x_2 = \sum_{i=1}^{m} x_{1i} x_{2i}$.
$x_1 \cdot x_2 = \|x_1\| \, \|x_2\| \cos\alpha$, where $\alpha$ is the angle between $x_1$ and $x_2$.
Geometric interpretation of $x_1 \cdot x_2$: the length of the projection of $x_1$ onto the unit vector $x_2$ ($\|x_2\| = 1$).
$x \cdot x = \|x\|^2$
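The identities above can be checked numerically with a few small helpers. This is only a sketch; the helper names (`dot`, `norm`, `cos_angle`, `projection_length`) are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """2-norm (length) of a vector, using x . x = ||x||^2."""
    return math.sqrt(dot(u, u))

def cos_angle(u, v):
    """Cosine of the angle between u and v."""
    return dot(u, v) / (norm(u) * norm(v))

def projection_length(u, v):
    """Signed length of the projection of u onto the direction of v."""
    return dot(u, v) / norm(v)

# Projecting (3, 4) onto the unit vector (1, 0) keeps the first coordinate.
print(projection_length([3.0, 4.0], [1.0, 0.0]))
```

When `v` is already a unit vector, `projection_length(u, v)` reduces to the plain dot product, which matches the geometric interpretation on the slide.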
Dot product
[Figure: the dot product as the projection of $x_1$ onto the unit vector $x_2$, $\|x_2\| = 1$]
Quadratic programming
Quadratic programming is the problem of optimizing a quadratic function of several variables subject to linear constraints on these variables.
Support Vector Machines: binary classification task, $Y = \{+1, -1\}$
The hypothesis $h$ has the form $h(x) = \mathrm{sgn}(\Theta_0 + \Theta^T x)$.
Support Vector Machines: binary classification task, $Y = \{+1, -1\}$
Outline
1. Large margin classifier (linearly separable data)
2. Soft margin classifier (data that is not linearly separable)
3. Kernels (nonlinear class boundaries)
Support Vector Machines: binary classification task, $Y = \{+1, -1\}$
A data set $Data = \{\langle x_i, y_i \rangle : x_i \in X, y_i \in \{-1, +1\}\}$ is linearly separable if there exists a hyperplane such that all instances from $Data$ are classified correctly.
Support Vector Machines: binary classification task, $Y = \{+1, -1\}$
Assume a hyperplane $g$: $\Theta_0 + \Theta^T x = 0$.
The margin of $x$ w.r.t. $g$ is the distance of $x$ to $g$: $\rho_g(x) = \frac{|\Theta_0 + \Theta^T x|}{\|\Theta\|}$.
The functional margin of $\langle x, y \rangle \in Data$ w.r.t. $g$ is $\hat{\rho}_g(x, y) = y(\Theta_0 + \Theta^T x)$. Its sign tells whether $x$ is classified correctly; a large functional margin represents a correct and confident classification.
The geometric margin of $\langle x, y \rangle \in Data$ w.r.t. $g$ is $\rho_g(x, y) = \hat{\rho}_g(x, y) / \|\Theta\|$, i.e. the functional margin divided by $\|\Theta\|$.
[Figure: geometric margin of $x$]
The geometric margin of $Data$ w.r.t. $g$ is $\rho_g(Data) = \min_{\langle x, y \rangle \in Data} \rho_g(x, y)$.
The functional margin of $Data$ w.r.t. $g$ is $\hat{\rho}_g(Data) = \min_{\langle x, y \rangle \in Data} \hat{\rho}_g(x, y)$.
The functional margin of the training set is the functional margin of the points closest to the hyperplane.
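The margin definitions can be sketched in a few lines of Python. The function names and the toy data set below are hypothetical and serve only to illustrate the formulas; the margin of a data set is taken as the minimum over its instances.

```python
import math

def functional_margin(theta0, theta, x, y):
    """y * (theta0 + theta . x): positive iff x is classified correctly."""
    return y * (theta0 + sum(t * xi for t, xi in zip(theta, x)))

def geometric_margin(theta0, theta, x, y):
    """Functional margin divided by the norm of theta."""
    norm = math.sqrt(sum(t * t for t in theta))
    return functional_margin(theta0, theta, x, y) / norm

def data_margin(margin_fn, theta0, theta, data):
    """Margin of a data set: the smallest margin over all instances."""
    return min(margin_fn(theta0, theta, x, y) for x, y in data)

# Hypothetical toy data: two positive instances and one negative instance
data = [([2.0, 2.0], +1), ([3.0, 1.0], +1), ([0.0, 0.0], -1)]
theta0, theta = -2.0, [1.0, 1.0]

print(data_margin(functional_margin, theta0, theta, data))
print(data_margin(geometric_margin, theta0, theta, data))
```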
Large Margin Classifier (training data is linearly separable)
We look for the hyperplane $g^*$ such that $g^* = \arg\max_g \rho_g(Data)$, where $\rho_g(Data)$ is the geometric margin of the data w.r.t. $g$.
Large Margin Classifier (training data is linearly separable)
$\Theta_0 + \Theta^T x = 0$ and $k\Theta_0 + (k\Theta)^T x = 0$ with $k > 0$ define the same hyperplane, i.e. distances to the separator are unchanged:
$\frac{y_i(\Theta_0 + \Theta^T x_i)}{\|\Theta\|} = \frac{y_i(k\Theta_0 + (k\Theta)^T x_i)}{\|k\Theta\|}$
Thus, we can choose $\Theta$ so that the functional margin of the data is $\hat{\rho}_g(Data) = 1$. Then
$g^* = \arg\max_g \rho_g(Data) = \arg\max_g \frac{1}{\|\Theta\|}$
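A small sketch of this rescaling argument: given any separating hyperplane, multiply its parameters by a positive constant so that the smallest functional margin becomes 1; the geometric margin is then $1/\|\Theta\|$. The function name `canonical_scale` and the toy data are assumptions for illustration.

```python
import math

def canonical_scale(theta0, theta, data):
    """Rescale (theta0, theta) by k > 0 so the smallest functional margin is 1."""
    smallest = min(
        y * (theta0 + sum(t * xi for t, xi in zip(theta, x))) for x, y in data
    )
    if smallest <= 0:
        raise ValueError("the hyperplane does not separate the data")
    k = 1.0 / smallest
    return k * theta0, [k * t for t in theta]

# Hypothetical toy data and a separating hyperplane with functional margin 2
data = [([2.0, 2.0], +1), ([3.0, 1.0], +1), ([0.0, 0.0], -1)]
theta0, theta = canonical_scale(-2.0, [1.0, 1.0], data)

# After rescaling, the geometric margin equals 1 / ||theta||.
print(theta0, theta, 1.0 / math.sqrt(sum(t * t for t in theta)))
```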
Large Margin Classifier (training data is linearly separable)
Goal: orient the separating hyperplane so that it is as far as possible from the closest instances of both classes.
$\Theta^* = \arg\max_{\Theta} \frac{1}{\|\Theta\|}$
Support vectors are the instances touching the margins.
Large Margin Classifier (training data is linearly separable)
$\Theta^* = \arg\max_{\Theta} \frac{1}{\|\Theta\|} = \arg\min_{\Theta} \frac{1}{2}\|\Theta\|^2$
Large Margin Classifier (training data is linearly separable)
Primal problem
Minimize $\frac{1}{2}\|\Theta\|^2$
subject to the constraints $y_i(\Theta_0 + \Theta^T x_i) \ge 1$, $i = 1, \dots, n$
Properties
1. Convex optimization
2. Unique solution for linearly separable training data
Large Margin Classifier (training data is linearly separable)
For each training example $\langle x_i, y_i \rangle$, introduce a Lagrange multiplier $\alpha_i \ge 0$. Let $\alpha = \langle \alpha_1, \dots, \alpha_n \rangle$.
The primal Lagrangian $L(\Theta, \Theta_0, \alpha)$ is given by
$L(\Theta, \Theta_0, \alpha) = \frac{1}{2}\|\Theta\|^2 - \sum_{i} \alpha_i \left( y_i(\Theta_0 + \Theta^T x_i) - 1 \right)$   (1)
subject to the constraints $\alpha_i \left[ y_i(\Theta_0 + \Theta^T x_i) - 1 \right] = 0$, $i = 1, \dots, n$.
We wish to find the $\Theta$ and $\Theta_0$ that minimize $L$ and the $\alpha$ that maximizes it.
Large Margin Classifier (training data is linearly separable)
1. Minimize $L$ w.r.t. $\Theta$: differentiate $L$ w.r.t. $\Theta$ and set $\frac{\partial L}{\partial \Theta} = 0$. This gives
$\Theta = \sum_{i=1}^{n} \alpha_i y_i x_i$   (2)
2. Minimize $L$ w.r.t. $\Theta_0$: differentiate $L$ w.r.t. $\Theta_0$ and set $\frac{\partial L}{\partial \Theta_0} = 0$. This gives
$\sum_{i=1}^{n} \alpha_i y_i = 0$   (3)
Large Margin Classifier (training data is linearly separable)
3. Substitute (2) into the primal form (1) and use (3). Then
$L(\Theta, \Theta_0, \alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$
subject to $\alpha_i \ge 0$, $i = 1, \dots, n$, and $\sum_{i} \alpha_i y_i = 0$.
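The substitution in step 3 can be written out term by term. The following is a worked derivation using only equations (1), (2) and (3); the shorthand $S$ is introduced here for brevity and is not part of the lecture.

```latex
\begin{align*}
S &:= \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \\
\tfrac{1}{2}\|\Theta\|^2
  &= \tfrac{1}{2} \Big(\sum_i \alpha_i y_i x_i\Big)^{T} \Big(\sum_j \alpha_j y_j x_j\Big)
   = \tfrac{1}{2} S
  && \text{by (2)} \\
\sum_i \alpha_i \big( y_i(\Theta_0 + \Theta^T x_i) - 1 \big)
  &= \Theta_0 \sum_i \alpha_i y_i + \sum_i \alpha_i y_i \, \Theta^T x_i - \sum_i \alpha_i \\
  &= 0 + S - \sum_i \alpha_i
  && \text{by (3) and (2)} \\
L(\Theta, \Theta_0, \alpha)
  &= \tfrac{1}{2} S - \Big( S - \sum_i \alpha_i \Big)
   = \sum_i \alpha_i - \tfrac{1}{2} S
\end{align*}
```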
Large Margin Classifier (training data is linearly separable)
4. Solve the dual problem, i.e. maximize a quadratic function using quadratic programming:
$L(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$
subject to $\alpha_i \ge 0$, $i = 1, \dots, n$, and $\sum_{i} \alpha_i y_i = 0$.
5. Get $\alpha^*$.
6. Then $\Theta^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i$.
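For intuition, the dual can be solved by hand in the smallest possible case: one positive instance $x_+$ and one negative instance $x_-$. The constraint $\sum_i \alpha_i y_i = 0$ forces both multipliers to a common value $a$, so the dual reduces to $L(a) = 2a - \frac{1}{2} a^2 \|x_+ - x_-\|^2$, which is maximized at $a = 2 / \|x_+ - x_-\|^2$. The sketch below implements only this special case; it is not a general quadratic programming solver, and the function name `two_point_svm` is hypothetical.

```python
def two_point_svm(x_pos, x_neg):
    """Closed-form hard-margin SVM for one positive and one negative instance."""
    diff = [a - b for a, b in zip(x_pos, x_neg)]
    dist_sq = sum(d * d for d in diff)
    alpha = 2.0 / dist_sq                 # maximizer of 2a - 0.5 * a^2 * dist_sq
    theta = [alpha * d for d in diff]     # equation (2) with alpha_1 = alpha_2 = alpha
    # Both instances are support vectors, so y * (theta0 + theta . x) = 1 at x_pos.
    theta0 = 1.0 - sum(t * a for t, a in zip(theta, x_pos))
    return alpha, theta, theta0

alpha, theta, theta0 = two_point_svm([2.0, 2.0], [0.0, 0.0])
print(alpha, theta, theta0)  # 0.25 [0.5, 0.5] -1.0
```

In this example the resulting hyperplane is the perpendicular bisector of the segment joining the two instances, as one would expect from maximizing the margin.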
Large Margin Classifier (training data is linearly separable)
$\Theta^*$ is the solution to the primal problem; $\alpha^*$ is the solution to the dual problem.
$\Theta^*$ and $\alpha^*$ satisfy the Karush-Kuhn-Tucker conditions, one of which is the so-called Karush-Kuhn-Tucker dual complementarity condition:
$\alpha_i^* \left( 1 - y_i(\Theta_0^* + \Theta^{*T} x_i) \right) = 0$
$y_i(\Theta_0^* + \Theta^{*T} x_i) > 1$ ($x_i$ is not a support vector) $\Rightarrow \alpha_i^* = 0$
$\alpha_i^* > 0 \Rightarrow y_i(\Theta_0^* + \Theta^{*T} x_i) = 1$ ($x_i$ is a support vector)
I.e., finding $\Theta^*$ is equivalent to finding the support vectors and their weights.
Large Margin Classifier (training data is linearly separable)
Prediction for a new instance $x$:
$h(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i \, x_i \cdot x + \Theta_0^* \right)$
The dot product $x_i \cdot x$ measures the similarity between $x$ and the support vector $x_i$: a support vector that is more similar contributes more to the classification.
A support vector that is more important, i.e. has a larger $\alpha_i^*$, contributes more to the classification.
If $y_i$ is positive, then the contribution is positive, otherwise negative.
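The prediction rule can be sketched directly from the formula: only instances with $\alpha_i > 0$ (the support vectors) need to be stored. The names `predict` and `sgn`, the tuple layout, and the convention that a score of exactly zero maps to $+1$ are assumptions made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgn(z):
    # Assumed convention: a score of exactly 0 is mapped to +1.
    return 1 if z >= 0 else -1

def predict(x, support_vectors, theta0):
    """support_vectors: list of (alpha_i, y_i, x_i) triples with alpha_i > 0."""
    score = theta0 + sum(a * y * dot(xi, x) for a, y, xi in support_vectors)
    return sgn(score)

# Hypothetical model with two support vectors
svs = [(0.25, +1, [2.0, 2.0]), (0.25, -1, [0.0, 0.0])]
print(predict([3.0, 3.0], svs, -1.0))  # 1
print(predict([0.0, 1.0], svs, -1.0))  # -1
```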
Soft Margin Classifier (training data is not linearly separable)
In a real problem it is unlikely that a line will exactly separate the data, even if a curved decision boundary could. Separating the data exactly is probably not desirable either: if the data has noise and outliers, a smooth decision boundary that ignores a few data points is better than one that loops around the outliers.
Thus we minimize $\|\Theta\|^2$ and the number of training mistakes.
Soft Margin Classifier (training data is not linearly separable)
Introduce slack variables $\xi_i \ge 0$:
$\xi_i = 0$ if $x_i$ is correctly classified (on or outside its margin);
otherwise $\xi_i$ is the distance of $x_i$ to "its supporting hyperplane":
$0 < \xi_i \le 1/\|\Theta\|$: margin violation
$\xi_i > 1/\|\Theta\|$: misclassification
Soft Margin Classifier (training data is not linearly separable)
Primal problem
Minimize $\frac{1}{2}\|\Theta\|^2 + C \sum_{i=1}^{n} \xi_i$
subject to the constraints $y_i(\Theta_0 + \Theta^T x_i) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, n$
$C \ge 0$ is a trade-off parameter:
small $C$: large margin
large $C$: narrow margin
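To see how $C$ trades margin width against training errors, the objective can be evaluated for a fixed hyperplane. For fixed $\Theta_0$ and $\Theta$, the smallest slack that satisfies the constraint is $\xi_i = \max(0, 1 - y_i(\Theta_0 + \Theta^T x_i))$, which is measured in functional-margin units. The sketch below assumes this choice of slack; the function name and toy data are hypothetical.

```python
def soft_margin_objective(theta0, theta, data, C):
    """Return (objective value, slacks) of the soft-margin primal for a fixed hyperplane."""
    slacks = [
        max(0.0, 1.0 - y * (theta0 + sum(t * xi for t, xi in zip(theta, x))))
        for x, y in data
    ]
    objective = 0.5 * sum(t * t for t in theta) + C * sum(slacks)
    return objective, slacks

# Hypothetical data: the third instance lies inside the margin.
data = [([2.0, 2.0], +1), ([0.0, 0.0], -1), ([1.0, 1.5], +1)]
print(soft_margin_objective(-1.0, [0.5, 0.5], data, C=2.0))
```

With a larger `C` the same margin violation costs more, so an optimizer would prefer a narrower margin that violates fewer constraints.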
Soft Margin Classifier (training data is not linearly separable)
Use quadratic programming as for the Large Margin Classifier.
Prediction for a new instance $x$:
$h(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i \, x_i \cdot x + \Theta_0^* \right)$
Support Vector Machines: nonlinear boundary
Case: the examples are separated by a nonlinear region.
[Figure: examples separated by a nonlinear region]
Support Vector Machines: nonlinear boundary
Recall polynomial regression: it is an extension of linear regression where the relationship between the features and the target value is modelled as a $d$-th order polynomial.
Simple regression: $y = \Theta_0 + \Theta_1 x_1$
Polynomial regression: $y = \Theta_0 + \Theta_1 x_1 + \Theta_2 x_1^2 + \dots + \Theta_d x_1^d$
This defines a feature mapping $\phi(x_1) = [x_1, x_1^2, \dots, x_1^d]$.
It is still a linear model, with features $A_1, A_1^2, \dots, A_1^d$.
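A short sketch of this feature mapping: the polynomial model is evaluated as an ordinary linear model over the mapped features. The function names `phi` and `predict_poly` are assumptions for illustration.

```python
def phi(x1, d):
    """Map a single feature value to [x1, x1^2, ..., x1^d]."""
    return [x1 ** k for k in range(1, d + 1)]

def predict_poly(theta0, theta, x1):
    """Linear model over the mapped features; len(theta) is the degree d."""
    features = phi(x1, len(theta))
    return theta0 + sum(t * f for t, f in zip(theta, features))

print(phi(2.0, 3))                                 # [2.0, 4.0, 8.0]
print(predict_poly(1.0, [1.0, 1.0, 1.0], 2.0))     # 15.0
```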
Support Vector Machines: kernels
Idea: apply the Large/Soft margin classifier not to the original features but to the features obtained by a feature mapping $\phi$, $\phi(x): \mathbb{R}^m \to F$.
The Large/Soft margin classifier uses the dot product $x_i \cdot x_j$. Now replace it with $\phi(x_i) \cdot \phi(x_j)$.
Support Vector Machines: kernels
However, finding $\phi$ could be expensive.
Kernel trick
There is no need to know what $\phi$ is or what the feature space is, i.e. no need to explicitly map the data to the feature space.
Define a kernel function $K: \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$.
Replace the dot product $x_i \cdot x_j$ with the kernel function $K(x_i, x_j)$:
$L(\alpha) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
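The kernel trick can be verified on a small case. For two-dimensional inputs, the kernel $K(x, z) = (x \cdot z)^2$ equals the dot product of the explicit features $\phi(x) = [x_1^2, \sqrt{2}\, x_1 x_2, x_2^2]$. This particular kernel and mapping are a standard textbook example chosen here for illustration; they do not appear in the lecture.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi2(x):
    """Explicit degree-2 feature map for a 2-dimensional input."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]

def k_poly2(x, z):
    """Same quantity computed in the original space, without mapping."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
print(k_poly2(x, z), dot(phi2(x), phi2(z)))  # equal up to rounding
```

The kernel needs one dot product in the original space, whereas the explicit route first builds the mapped vectors; for higher degrees and dimensions the difference in cost grows quickly.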
Support Vector Machines: kernels
[Figure] Source: 8008/machinelearningdir/notesdir/ker1/ker1l.html
Support Vector Machines: kernels
Common kernel functions
Linear: $K(x_i, x_j) = x_i^T x_j$
Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + c)^d$
Radial basis function: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + c)$, where $\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
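The four kernels translate directly into code. In this sketch the default values of `gamma`, `c` and `d` are arbitrary assumptions; in practice they are hyperparameters to be tuned.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, gamma=1.0, c=1.0, d=2):
    return (gamma * dot(x, z) + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    return math.tanh(gamma * dot(x, z) + c)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```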
Support Vector Machines: multiclass classification tasks
One-to-one
Train $\binom{K}{2}$ binary SVM classifiers, one for each pair of classes.
Classify $x$ using each of the $\binom{K}{2}$ classifiers. The instance is assigned to the class that is assigned most frequently in the pairwise classification.
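The voting scheme can be sketched as follows. The pairwise classifiers are represented here by plain callables, which is an assumption made to keep the example self-contained; how ties between classes are broken is not specified in the lecture, and this sketch simply returns the first class that reaches the highest vote count.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)] is a callable that returns a or b for instance x."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical pairwise classifiers for three classes (K = 3, so 3 classifiers)
pairwise = {
    ("a", "b"): lambda x: "a",
    ("a", "c"): lambda x: "c",
    ("b", "c"): lambda x: "c",
}
print(one_vs_one_predict([0.0], ["a", "b", "c"], pairwise))  # c
```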
Support Vector Machines: multiclass classification tasks
One-to-all
Train $K$ binary SVM classifiers. Each of them separates the $k$-th class ($+1$) from all the others ($-1$) and is characterized by the hypothesis parameters $\Theta_k$, $k = 1, \dots, K$.
The instance $x$ is assigned to the class $k^* = \arg\max_k \Theta_k^T x$.
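A minimal sketch of the one-to-all decision rule: compute $\Theta_k^T x$ for every class and return the class with the largest score. Following the slide, no separate intercept is used; the dictionary layout and the example parameters are assumptions.

```python
def one_vs_all_predict(x, thetas):
    """thetas: dict mapping each class label k to its parameter vector Theta_k."""
    def score(k):
        return sum(t * xi for t, xi in zip(thetas[k], x))
    return max(thetas, key=score)

# Hypothetical parameters for three classes
thetas = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, -1.0]}
print(one_vs_all_predict([0.2, 0.9], thetas))  # b
```

An intercept can be included without changing the code by appending a constant feature 1 to every `x` and the intercept to every parameter vector.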
Evaluation of binary classifiers: sensitivity vs. specificity
Confusion matrix (rows: true class, columns: predicted class)
True class   Predicted Positive    Predicted Negative
Positive     True Positive (TP)    False Negative (FN)
Negative     False Positive (FP)   True Negative (TN)
Measures
Measure              Formula
Precision            TP/(TP+FP)
Recall/Sensitivity   TP/(TP+FN)
Specificity          TN/(TN+FP)
Accuracy             (TP+TN)/(TP+FP+TN+FN)
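The four measures follow directly from the confusion matrix counts. The sketch below assumes that every denominator is non-zero; the function name and the example counts are hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the measures from the confusion matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # also called sensitivity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts
print(binary_metrics(tp=40, fn=10, fp=20, tn=30))
```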
Evaluation of binary classifiers: sensitivity vs. specificity
[Figures: perfect classifier; reality; 100% sensitive classifier; 100% specific classifier; sensitivity vs. specificity]
Evaluation of binary classifiers: ROC curve
[Figure: ROC curve]
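An ROC curve plots sensitivity against 1 - specificity as the decision threshold on the classifier score varies. The following sketch computes the curve's points from raw scores; it assumes labels in $\{+1, -1\}$, at least one instance of each class, and the rule "predict positive if the score is at least the threshold". The function name and the example scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, from strictest to loosest threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y != 1)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores and true labels
print(roc_points([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1]))
```

A perfect classifier passes through the point (0, 1), while a classifier that guesses at random stays close to the diagonal.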
INF5390  Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF539013 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a nonempty
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
ISSN: 22296956(ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, JULY 2012, VOLUME: 02, ISSUE: 04 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES V. Dheepa 1 and R. Dhanapal 2 1 Research
More informationLeastSquares Intersection of Lines
LeastSquares Intersection of Lines Johannes Traa  UIUC 2013 This writeup derives the leastsquares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
More informationBaum s Algorithm Learns Intersections of Halfspaces with respect to LogConcave Distributions
Baum s Algorithm Learns Intersections of Halfspaces with respect to LogConcave Distributions Adam R. Klivans 1, Philip M. Long 2, and Alex K. Tang 3 1 UTAustin, klivans@cs.utexas.edu 2 Google, plong@google.com
More informationconstraint. Let us penalize ourselves for making the constraint too big. We end up with a
Chapter 4 Constrained Optimization 4.1 Equality Constraints (Lagrangians) Suppose we have a problem: Maximize 5, (x 1, 2) 2, 2(x 2, 1) 2 subject to x 1 +4x 2 =3 If we ignore the constraint, we get the
More informationSimilarity and Diagonalization. Similar Matrices
MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that
More informationWalrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.
Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP  Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationby the matrix A results in a vector which is a reflection of the given
Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the yaxis We observe that
More informationFrom Maxent to Machine Learning and Back
From Maxent to Machine Learning and Back T. Sears ANU March 2007 T. Sears (ANU) From Maxent to Machine Learning and Back Maxent 2007 1 / 36 50 Years Ago... The principles and mathematical methods of statistical
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationDate: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
More informationIntroduction to Statistical Machine Learning
CHAPTER Introduction to Statistical Machine Learning We start with a gentle introduction to statistical machine learning. Readers familiar with machine learning may wish to skip directly to Section 2,
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 15  ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.standrews.ac.uk twk@standrews.ac.uk Tom Kelsey ID505917AUC
More informationOrthogonal Projections
Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors
More informationLinear Codes. In the V[n,q] setting, the terms word and vector are interchangeable.
Linear Codes Linear Codes In the V[n,q] setting, an important class of codes are the linear codes, these codes are the ones whose code words form a subvector space of V[n,q]. If the subspace of V[n,q]
More informationDERIVATIVES AS MATRICES; CHAIN RULE
DERIVATIVES AS MATRICES; CHAIN RULE 1. Derivatives of Realvalued Functions Let s first consider functions f : R 2 R. Recall that if the partial derivatives of f exist at the point (x 0, y 0 ), then we
More informationWe call this set an ndimensional parallelogram (with one vertex 0). We also refer to the vectors x 1,..., x n as the edges of P.
Volumes of parallelograms 1 Chapter 8 Volumes of parallelograms In the present short chapter we are going to discuss the elementary geometrical objects which we call parallelograms. These are going to
More informationLinear Programming Notes V Problem Transformations
Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material
More information2.1: MATRIX OPERATIONS
.: MATRIX OPERATIONS What are diagonal entries and the main diagonal of a matrix? What is a diagonal matrix? When are matrices equal? Scalar Multiplication 45 Matrix Addition Theorem (pg 0) Let A, B, and
More informationSummary: Transformations. Lecture 14 Parameter Estimation Readings T&V Sec 5.15.3. Parameter Estimation: Fitting Geometric Models
Summary: Transformations Lecture 14 Parameter Estimation eadings T&V Sec 5.15.3 Euclidean similarity affine projective Parameter Estimation We will talk about estimating parameters of 1) Geometric models
More informationThe Phase Plane. Phase portraits; type and stability classifications of equilibrium solutions of systems of differential equations
The Phase Plane Phase portraits; type and stability classifications of equilibrium solutions of systems of differential equations Phase Portraits of Linear Systems Consider a systems of linear differential
More informationData Clustering. Dec 2nd, 2013 Kyrylo Bessonov
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms kmeans Hierarchical Main
More informationA NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION
1 A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION Dimitri Bertsekas M.I.T. FEBRUARY 2003 2 OUTLINE Convexity issues in optimization Historical remarks Our treatment of the subject Three unifying lines of
More information