Support Vector Machine. Tutorial. (and Statistical Learning Theory)


 Irene Shepherd
 3 years ago
 Views:
Transcription
1 Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA.
2 1 Support Vector Machines: history SVMs introduced in COLT92 by Boser, Guyon & Vapnik. Became rather popular since. Theoretically well motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis) since the 60s. Empirically good performance: successful applications in many fields(bioinformatics,tet,imagerecognition,...)
3 2 Support Vector Machines: history II Centralized website: Several tetbooks, e.g. An introduction to Support Vector Machines by Cristianini and ShaweTaylor is one. A large and diverse community work on them: from machine learning, optimization, statistics, neural networks, functional analysis, etc.
4 3 Support Vector Machines: basics [Boser, Guyon, Vapnik 92],[Cortes & Vapnik 95] margin margin Nice properties: conve, theoretically motivated, nonlinear with kernels..
5 4 Preliminaries: Machine learning is about learning structure from data. Although the class of algorithms called SVM s can do more, in this talk we focus on pattern recognition. So we want to learn the mapping: X Y,where Xis some object and y Yis a class label. Let s take the simplest case: 2class classification. So: R n, y {±1}.
6 5 Eample: Suppose we have 50 photographs of elephants and 50 photos of tigers. vs. We digitize them into piel images, so we have R n where n =10, 000. Now, given a new (different) photograph we want to answer the question: is it an elephant or a tiger? [we assume it is one or the other.]
7 6 Training sets and prediction models input/output sets X, Y training set ( 1,y 1 ),...,( m,y m ) generalization : given a previously seen X, find a suitable y Y. i.e., want to learn a classifier: y = f(, α),whereα are the parameters of the function. For eample, if we are choosing our model from the set of hyperplanes in R n,thenwehave: f(, {w, b}) =sign(w + b).
8 7 Empirical Risk and the true Risk We can try to learn f(, α) by choosing a function that performs well on training data: R emp (α) = 1 m m l(f( i,α),y i ) = Training Error i=1 where l is the zeroone loss function, l(y, ŷ) =1,ify ŷ,and0 otherwise. R emp is called the empirical risk. By doing this we are trying to minimize the overall risk: R(α) = l(f(, α), y)dp (, y) = Test Error where P(,y) is the (unknown) joint distribution function of and y.
9 8 Choosing the set of functions What about f(, α) allowing all functions from X to {±1}? Training set ( 1,y 1 ),...,( m,y m ) X {±1} Test set 1,..., m X, such that the two sets do not intersect. For any f there eists f : 1. f ( i )=f( i ) for all i 2. f ( j ) f( j ) for all j Based on the training data alone, there is no means of choosing which function is better. On the test set however they give different results. So generalization is not guaranteed. = a restriction must be placed on the functions that we allow.
10 9 Empirical Risk and the true Risk Vapnik & Chervonenkis showed that an upper bound on the true risk can be given by the empirical risk + an additional term: h(log( 2m h R(α) R emp (α)+ +1) log( η 4 ) m where h is the VC dimension of the set of functions parameterized by α. The VC dimension of a set of functions is a measure of their capacity or compleity. If you can describe a lot of different phenomena with a set of functions then the value of h is large. [VC dim = the maimum number of points that can be separated in all possible ways by that set of functions.]
11 10 VC dimension: The VC dimension of a set of functions is the maimum number of points that can be separated in all possible ways by that set of functions. For hyperplanes in R n, the VC dimension can be shown to be n +1.
12 11 VC dimension and capacity of functions Simplification of bound: Test Error Training Error + Compleity of set of Models Actually, a lot of bounds of this form have been proved (different measures of capacity). The compleity function is often called a regularizer. If you take a high capacity set of functions (eplain a lot) you get low training error. But you might overfit. If you take a very simple set of models, you have low compleity, but won t get low training error.
13 12 Capacity of a set of functions (classification) [Images taken from a talk by B. Schoelkopf.]
14 13 Capacity of a set of functions (regression) y sine curve fit hyperplane fit true function
15 14 Controlling the risk: model compleity Bound on the risk Confidence interval Empirical risk (training error) h 1 h* h n S1 S* S n
16 15 Capacity of hyperplanes Vapnik & Chervonenkis also showed the following: Consider hyperplanes (w ) =0where w is normalized w.r.t a set of points X such that: min i w i =1. The set of decision functions f w () =sign(w ) defined on X such that w A has a VC dimension satisfying h R 2 A 2. where R is the radius of the smallest sphere around the origin containing X. = minimize w 2 and have low capacity = minimizing w 2 equivalent to obtaining a large margin classifier
17 <w, > + b > 0 <w, > + b < 0 w { <w, > + b = 0}
18 { <w, > + b = 1} y i = 1 w { <w, > + b = +1} 2 1, y i = +1 Note: <w, 1 > + b = +1 <w, 2 > + b = 1 => <w, ( 1 2 )> = 2 => < w, w ( 1 2 ) > 2 = w { <w, > + b = 0}
19 16 Linear Support Vector Machines (at last!) So, we would like to find the function which minimizes an objective like: Training Error + Compleity term We write that as: 1 m m l(f( i,α),y i ) + Compleity term i=1 For now we will choose the set of hyperplanes (we will etend this later), so f() =(w )+b: 1 m m l(w i + b, y i ) + w 2 i=1 subject to min i w i =1.
20 17 Linear Support Vector Machines II That function before was a little difficult to minimize because of the step function in l(y, ŷ) (either 1 or 0). Let s assume we can separate the data perfectly. Then we can optimize the following: Minimize w 2, subject to: (w i + b) 1, if y i =1 (w i + b) 1, if y i = 1 The last two constraints can be compacted to: y i (w i + b) 1 This is a quadratic program.
21 18 SVMs : nonseparable case To deal with the nonseparable case, one can rewrite the problem as: Minimize: subject to: w 2 + C m i=1 ξ i y i (w i + b) 1 ξ i, ξ i 0 This is just the same as the original objective: 1 m m l(w i + b, y i ) + w 2 i=1 ecept l is no longer the zeroone loss, but is called the hingeloss : l(y, ŷ) =ma(0, 1 yŷ). This is still a quadratic program!
22 margin ξ i
23 19 Support Vector Machines  Primal Decision function: Primal formulation: f() =w + b min P (w, b)= 1 2 w 2 + C H 1 [ y i f( i )] } {{ } i } {{ } maimize margin minimize training error Ideally H 1 would count the number of errors, approimate with: H 1(z) Hinge Loss H 1 (z) =ma(0, 1 z) 0 z
24 20 SVMs : nonlinear case Linear classifiers aren t comple enough sometimes. SVM solution: Map data into a richer feature space including nonlinear features, then construct a hyperplane in that space so all other equations are the same! Formally, preprocess the data with: Φ() and then learn the map from Φ() to y: f() =w Φ()+b.
25 21 SVMs : polynomial mapping Φ:R 2 R 3 ( 1, 2 ) (z 1,z 2,z 3 ):=( 2 1, (2) 1 2, 2 2) 1 2 z 1 z 3 z 2
26 22 SVMs : nonlinear case II For eample MNIST handwriting recognition. 60,000 training eamples, test eamples, Linear SVM has around 8.5% test error. Polynomial SVM has around 1% test error
27 23 SVMs : full MNIST results Classifier Test Error linear 8.4% 3nearestneighbor 2.4% RBFSVM 1.4 % Tangent distance 1.1 % LeNet 1.1 % Boosted LeNet 0.7 % Translation invariant SVM 0.56 % Choosing a good mapping Φ( ) (encoding prior knowledge + getting right compleity of function class) for your problem improves results.
28 24 SVMs : the kernel trick Problem: the dimensionality of Φ() can be very large, making w hard to represent eplicitly in memory, and hard for the QP to solve. The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for SVMs as a special case): m w = α i Φ( i ) for some variables α. Instead of optimizing w directly we can thus optimize α. The decision rule is now: f() = i=1 m α i Φ( i ) Φ()+b i=1 We call K( i,)=φ( i ) Φ() the kernel function.
29 25 Support Vector Machines  kernel trick II We can rewrite all the SVM equations we saw before, but with the w = m i=1 α iφ( i ) equation: Decision function: f() = i α i Φ( i ) Φ()+b = i α i K( i, )+b Dual formulation: min P (w, b)= 1 m 2 α i Φ( i ) 2 + C H 1 [ y i f( i )] i=1 i } {{ } } {{ } maimize margin minimize training error
30 26 Support Vector Machines  Dual But people normally write it like this: Dual formulation: min α D(α) = 1 2 α i α j Φ( i ) Φ( j ) i,j i y i α i s.t. Èi α i=0 0 y i α i C Dual Decision function: f() = i α i K( i, )+b Kernel function K(, ) is used to make (implicit) nonlinear feature map, e.g. Polynomial kernel: K(, )=( +1) d. RBF kernel: K(, )=ep( γ 2 ).
31 27 PolynomialSVMs The kernel K(, )=( ) d gives the same result as the eplicit mapping + dot product that we described before: Φ:R 2 R 3 ( 1, 2 ) (z 1,z 2,z 3 ):=( 2 1, (2) 1 2, 2 2) Φ(( 1, 2 ) Φ(( 1, 2)=( 2 1, (2) 1 2, 2 2) ( 2 1, (2) 1 2, 2 2) is the same as: = K(, )=( ) 2 =(( 1, 2 ) ( 1, 2)) 2 =( ) 2 = Interestingly, if d is large the kernel is still only requires n multiplications to compute, whereas the eplicit representation may not fit in memory!
32 28 RBFSVMs The RBF kernel K(, )=ep( γ 2 ) is one of the most popular kernel functions. It adds a bump around each data point: m f() = α i ep( γ i 2 )+b i=1 Φ.. ' Φ() Φ(') Using this one can get stateoftheart results.
33 29 SVMs : more results There is much more in the field of SVMs/ kernel machines than we could cover here, including: Regression, clustering, semisupervised learning and other domains. Lots of other kernels, e.g. string kernels to handle tet. Lots of research in modifications, e.g. to improve generalization ability, or tailoring to a particular task. Lots of research in speeding up training. Please see tet books such as the ones by Cristianini & ShaweTaylor or by Schoelkopf and Smola.
34 30 SVMs : software Lots of SVM software: LibSVM (C++) SVMLight (C) As well as complete machine learning toolboes that include SVMs: Torch (C++) Spider (Matlab) Weka (Java) All available through
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationA Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Largemargin linear
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept HardMargin SVM SoftMargin SVM Dual Problems of HardMargin
More informationNotes on Support Vector Machines
Notes on Support Vector Machines Fernando Mira da Silva Fernando.Silva@inesc.pt Neural Network Group I N E S C November 1998 Abstract This report describes an empirical study of Support Vector Machines
More informationComparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network?  Perceptron learners  Multilayer networks What is a Support
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationLecture 6: Logistic Regression
Lecture 6: CS 19410, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,
More informationRegression Using Support Vector Machines: Basic Foundations
Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering
More informationSupport Vector Machines for Classification and Regression
UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer
More informationA Tutorial on Support Vector Machines for Pattern Recognition
c,, 1 43 () Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Tutorial on Support Vector Machines for Pattern Recognition CHRISTOPHER J.C. BURGES Bell Laboratories, Lucent Technologies
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationLearning to Process Natural Language in Big Data Environment
CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning  Present and Future Talk Outline Overview
More informationIntroduction to Machine Learning
Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationLecture 5: Conic Optimization: Overview
EE 227A: Conve Optimization and Applications January 31, 2012 Lecture 5: Conic Optimization: Overview Lecturer: Laurent El Ghaoui Reading assignment: Chapter 4 of BV. Sections 3.13.6 of WTB. 5.1 Linear
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationSupport Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
More informationScalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.Ing. Morris Riedel et al. Research Group Leader,
More informationSupport Vector Machines
Support Vector Machines Here we approach the twoclass classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationSupport Vector Pruning with SortedVotes for LargeScale Datasets
Support Vector Pruning with SortedVotes for LargeScale Datasets Frerk Saxen, Konrad Doll and Ulrich Brunsmann University of Applied Sciences Aschaffenburg, Germany Email: {Frerk.Saxen, Konrad.Doll, Ulrich.Brunsmann}@hab.de
More informationA User s Guide to Support Vector Machines
A User s Guide to Support Vector Machines Asa BenHur Department of Computer Science Colorado State University Jason Weston NEC Labs America Princeton, NJ 08540 USA Abstract The Support Vector Machine
More informationOnline Classification on a Budget
Online Classification on a Budget Koby Crammer Computer Sci. & Eng. Hebrew University Jerusalem 91904, Israel kobics@cs.huji.ac.il Jaz Kandola Royal Holloway, University of London Egham, UK jaz@cs.rhul.ac.uk
More informationLearning Using Privileged Information: Similarity Control and Knowledge Transfer
Journal of Machine Learning Research 16 (2015) 20232049 Submitted 7/15; Published 9/15 In memory of Alexey Chervonenkis Learning Using Privileged Information: Similarity Control and Knowledge Transfer
More informationChoosing Multiple Parameters for Support Vector Machines
Machine Learning, 46, 131 159, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Choosing Multiple Parameters for Support Vector Machines OLIVIER CHAPELLE LIP6, Paris, France olivier.chapelle@lip6.fr
More informationData Mining  Evaluation of Classifiers
Data Mining  Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationIntroduction to Machine Learning NPFL 054
Introduction to Machine Learning NPFL 054 http://ufal.mff.cuni.cz/course/npfl054 Barbora Hladká hladka@ufal.mff.cuni.cz Martin Holub holub@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationNeural Networks. CAP5610 Machine Learning Instructor: GuoJun Qi
Neural Networks CAP5610 Machine Learning Instructor: GuoJun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector
More informationClass #6: Nonlinear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Nonlinear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Nonlinear classification Linear Support Vector Machines
More informationFast Kernel Classifiers with Online and Active Learning
Journal of Machine Learning Research 6 (2005) 1579 1619 Submitted 3/05; Published 9/05 Fast Kernel Classifiers with Online and Active Learning Antoine Bordes NEC Laboratories America 4 Independence Way
More informationOnline learning of multiclass Support Vector Machines
IT 12 061 Examensarbete 30 hp November 2012 Online learning of multiclass Support Vector Machines Xuan Tuan Trinh Institutionen för informationsteknologi Department of Information Technology Abstract
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationData visualization and dimensionality reduction using kernel maps with a reference point
Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESATSCD/SISTA Kasteelpark Arenberg 1 B31 Leuven (Heverlee), Belgium Tel: 32/16/32 18
More informationMachine learning for algo trading
Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with
More informationMaximum Margin Clustering
Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin
More informationMachine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
More informationfml The SHOGUN Machine Learning Toolbox (and its python interface)
fml The SHOGUN Machine Learning Toolbox (and its python interface) Sören Sonnenburg 1,2, Gunnar Rätsch 2,Sebastian Henschel 2,Christian Widmer 2,Jonas Behr 2,Alexander Zien 2,Fabio de Bona 2,Alexander
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationUsing artificial intelligence for data reduction in mechanical engineering
Using artificial intelligence for data reduction in mechanical engineering L. Mdlazi 1, C.J. Stander 1, P.S. Heyns 1, T. Marwala 2 1 Dynamic Systems Group Department of Mechanical and Aeronautical Engineering,
More informationSupport Vector Machines
CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)
More informationINTRUSION DETECTION USING THE SUPPORT VECTOR MACHINE ENHANCED WITH A FEATUREWEIGHT KERNEL
INTRUSION DETECTION USING THE SUPPORT VECTOR MACHINE ENHANCED WITH A FEATUREWEIGHT KERNEL A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRALab,
More informationTraining Invariant Support Vector Machines
Machine Learning, 46, 161 190, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Training Invariant Support Vector Machines DENNIS DECOSTE decoste@aig.jpl.nasa.gov Jet Propulsion
More informationSupportVector Networks
Machine Learning, 20, 273297 (1995) 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. SupportVector Networks CORINNA CORTES VLADIMIR VAPNIK AT&T Bell Labs., Holmdel, NJ 07733,
More informationMACHINE LEARNING. Introduction. Alessandro Moschitti
MACHINE LEARNING Introduction Alessandro Moschitti Department of Computer Science and Information Engineering University of Trento Email: moschitti@disi.unitn.it Course Schedule Lectures Tuesday, 14:0016:00
More informationL1 vs. L2 Regularization and feature selection.
L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1
More informationMethods to Solve Quadratic Equations
Methods to Solve Quadratic Equations We have been learning how to factor epressions. Now we will apply factoring to another skill you must learn solving quadratic equations. a b c 0 is a seconddegree
More informationA Survey of Kernel Clustering Methods
A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering
More informationNetwork Intrusion Detection using Semi Supervised Support Vector Machine
Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT
More informationSpam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
More informationAdapting Codes and Embeddings for Polychotomies
Adapting Codes and Embeddings for Polychotomies Gunnar Rätsch, Alexander J. Smola RSISE, CSL, Machine Learning Group The Australian National University Canberra, 2 ACT, Australia Gunnar.Raetsch, Alex.Smola
More informationChapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data
Chapter 7 Diagnosis and Prognosis of Breast Cancer using Histopathological Data In the previous chapter, a method for classification of mammograms using wavelet analysis and adaptive neurofuzzy inference
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationA Support Vector Method for Multivariate Performance Measures
Thorsten Joachims Cornell University, Dept. of Computer Science, 4153 Upson Hall, Ithaca, NY 14853 USA tj@cs.cornell.edu Abstract This paper presents a Support Vector Method for optimizing multivariate
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationMoving Least Squares Approximation
Chapter 7 Moving Least Squares Approimation An alternative to radial basis function interpolation and approimation is the socalled moving least squares method. As we will see below, in this method the
More informationAn Introduction to Statistical Machine Learning  Overview 
An Introduction to Statistical Machine Learning  Overview  Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny, Switzerland
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationBreaking SVM Complexity with CrossTraining
Breaking SVM Complexity with CrossTraining Gökhan H. Bakır Max Planck Institute for Biological Cybernetics, Tübingen, Germany gb@tuebingen.mpg.de Léon Bottou NEC Labs America Princeton NJ, USA leon@bottou.org
More informationCHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationClassifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical microclustering algorithm ClusteringBased SVM (CBSVM) Experimental
More informationPolynomial and Rational Functions
Chapter Section.1 Quadratic Functions Polnomial and Rational Functions Objective: In this lesson ou learned how to sketch and analze graphs of quadratic functions. Course Number Instructor Date Important
More informationLarge Margin DAGs for Multiclass Classification
S.A. Solla, T.K. Leen and K.R. Müller (eds.), 57 55, MIT Press (000) Large Margin DAGs for Multiclass Classification John C. Platt Microsoft Research Microsoft Way Redmond, WA 9805 jplatt@microsoft.com
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM 10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationChapter 5.1 Systems of linear inequalities in two variables.
Chapter 5. Systems of linear inequalities in two variables. In this section, we will learn how to graph linear inequalities in two variables and then apply this procedure to practical application problems.
More informationIntroduction to Machine Learning. Rob Schapire Princeton University. schapire
Introduction to Machine Learning Rob Schapire Princeton University www.cs.princeton.edu/ schapire Machine Learning studies how to automatically learn to make accurate predictions based on past observations
More informationContentBased Recommendation
ContentBased Recommendation Contentbased? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items Userbased CF Searches
More informationChristian Kaul, Johannes Knabe, Tobias Lang, Volker Sperschneider. Filtering Spam Email with Support Vector Machines PICS
Christian Kaul, Johannes Knabe, Tobias Lang, Volker Sperschneider Filtering Spam Email with Support Vector Machines PICS Publications of the Institute of Cognitive Science Volume 82004 ISSN: 16105389
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationA fast multiclass SVM learning method for huge databases
www.ijcsi.org 544 A fast multiclass SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and TalebAhmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
More informationA Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression
A Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression José Guajardo, Jaime Miranda, and Richard Weber, Department of Industrial Engineering, University of Chile Abstract
More informationCHAPTER 13. Definite Integrals. Since integration can be used in a practical sense in many applications it is often
7 CHAPTER Definite Integrals Since integration can be used in a practical sense in many applications it is often useful to have integrals evaluated for different values of the variable of integration.
More informationEarly defect identification of semiconductor processes using machine learning
STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew
More informationA Neural Support Vector Network Architecture with Adaptive Kernels. 1 Introduction. 2 Support Vector Machines and Motivations
A Neural Support Vector Network Architecture with Adaptive Kernels Pascal Vincent & Yoshua Bengio Département d informatique et recherche opérationnelle Université de Montréal C.P. 6128 Succ. CentreVille,
More informationCalculus and Vectors. Day Lesson Title Math Learning Goals Expectations 1, 2, 3,
Unit Applying Properties of Derivatives Calculus and Vectors Lesson Outline Day Lesson Title Math Learning Goals Epectations 1,,, The Second Derivative (Sample Lessons Included) 4 Curve Sketching from
More informationChapter 13. A User s Guide to Support Vector Machines. Asa BenHur and Jason Weston. Abstract. 1. Introduction
Chapter 13 A User s Guide to Support Vector Machines Asa BenHur and Jason Weston Abstract The Support Vector Machine (SVM) is a widely used classifier in bioinformatics. Obtaining the best results with
More informationServer Load Prediction
Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model
10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.
Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationCarrier Relevance Study for Indoor Localization Using GSM
1 Carrier Relevance Study for Indoor Localization Using GSM Iness Ahriz, Yacine Oussar, Bruce Denby, Senior Member, IEEE, Gérard Dreyfus, Senior Member, IEEE Abstract A study is made of subsets of relevant
More informationA Linear Programming Approach to Novelty Detection
A Linear Programming Approach to Novelty Detection Colin Campbell Dept. of Engineering Mathematics, Bristol University, Bristol Bristol, BS8 1 TR, United Kingdon C. Campbell@bris.ac.uk Kristin P. Bennett
More informationNeural Networks and Support Vector Machines
INF5390  Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF539013 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationClassification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
More informationIntroduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics
More informationMaking Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationA Random Sampling Technique for Training Support Vector Machines (For PrimalForm MaximalMargin Classifiers)
A Random Sampling Technique for Training Support Vector Machines (For PrimalForm MaximalMargin Classifiers) Jose Balcázar 1, Yang Dai 2, and Osamu Watanabe 2 1 Dept. Llenguatges i Sistemes Informatics,
More informationSection 37. Marginal Analysis in Business and Economics. Marginal Cost, Revenue, and Profit. 202 Chapter 3 The Derivative
202 Chapter 3 The Derivative Section 37 Marginal Analysis in Business and Economics Marginal Cost, Revenue, and Profit Application Marginal Average Cost, Revenue, and Profit Marginal Cost, Revenue, and
More informationNovelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, JeanLuc Buessler, JeanPhilippe Urban Université de HauteAlsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
More informationMassive Data Classification via Unconstrained Support Vector Machines
Massive Data Classification via Unconstrained Support Vector Machines Olvi L. Mangasarian and Michael E. Thompson Computer Sciences Department University of Wisconsin 1210 West Dayton Street Madison, WI
More informationDUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
More information