# Support Vector Machine. Tutorial. (and Statistical Learning Theory)

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA.

2 1 Support Vector Machines: history SVMs introduced in COLT-92 by Boser, Guyon & Vapnik. Became rather popular since. Theoretically well motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis) since the 60s. Empirically good performance: successful applications in many fields(bioinformatics,tet,imagerecognition,...)

3 2 Support Vector Machines: history II Centralized website: Several tetbooks, e.g. An introduction to Support Vector Machines by Cristianini and Shawe-Taylor is one. A large and diverse community work on them: from machine learning, optimization, statistics, neural networks, functional analysis, etc.

4 3 Support Vector Machines: basics [Boser, Guyon, Vapnik 92],[Cortes & Vapnik 95] margin margin Nice properties: conve, theoretically motivated, nonlinear with kernels..

5 4 Preliminaries: Machine learning is about learning structure from data. Although the class of algorithms called SVM s can do more, in this talk we focus on pattern recognition. So we want to learn the mapping: X Y,where Xis some object and y Yis a class label. Let s take the simplest case: 2-class classification. So: R n, y {±1}.

6 5 Eample: Suppose we have 50 photographs of elephants and 50 photos of tigers. vs. We digitize them into piel images, so we have R n where n =10, 000. Now, given a new (different) photograph we want to answer the question: is it an elephant or a tiger? [we assume it is one or the other.]

7 6 Training sets and prediction models input/output sets X, Y training set ( 1,y 1 ),...,( m,y m ) generalization : given a previously seen X, find a suitable y Y. i.e., want to learn a classifier: y = f(, α),whereα are the parameters of the function. For eample, if we are choosing our model from the set of hyperplanes in R n,thenwehave: f(, {w, b}) =sign(w + b).

8 7 Empirical Risk and the true Risk We can try to learn f(, α) by choosing a function that performs well on training data: R emp (α) = 1 m m l(f( i,α),y i ) = Training Error i=1 where l is the zero-one loss function, l(y, ŷ) =1,ify ŷ,and0 otherwise. R emp is called the empirical risk. By doing this we are trying to minimize the overall risk: R(α) = l(f(, α), y)dp (, y) = Test Error where P(,y) is the (unknown) joint distribution function of and y.

9 8 Choosing the set of functions What about f(, α) allowing all functions from X to {±1}? Training set ( 1,y 1 ),...,( m,y m ) X {±1} Test set 1,..., m X, such that the two sets do not intersect. For any f there eists f : 1. f ( i )=f( i ) for all i 2. f ( j ) f( j ) for all j Based on the training data alone, there is no means of choosing which function is better. On the test set however they give different results. So generalization is not guaranteed. = a restriction must be placed on the functions that we allow.

10 9 Empirical Risk and the true Risk Vapnik & Chervonenkis showed that an upper bound on the true risk can be given by the empirical risk + an additional term: h(log( 2m h R(α) R emp (α)+ +1) log( η 4 ) m where h is the VC dimension of the set of functions parameterized by α. The VC dimension of a set of functions is a measure of their capacity or compleity. If you can describe a lot of different phenomena with a set of functions then the value of h is large. [VC dim = the maimum number of points that can be separated in all possible ways by that set of functions.]

11 10 VC dimension: The VC dimension of a set of functions is the maimum number of points that can be separated in all possible ways by that set of functions. For hyperplanes in R n, the VC dimension can be shown to be n +1.

12 11 VC dimension and capacity of functions Simplification of bound: Test Error Training Error + Compleity of set of Models Actually, a lot of bounds of this form have been proved (different measures of capacity). The compleity function is often called a regularizer. If you take a high capacity set of functions (eplain a lot) you get low training error. But you might overfit. If you take a very simple set of models, you have low compleity, but won t get low training error.

13 12 Capacity of a set of functions (classification) [Images taken from a talk by B. Schoelkopf.]

14 13 Capacity of a set of functions (regression) y sine curve fit hyperplane fit true function

15 14 Controlling the risk: model compleity Bound on the risk Confidence interval Empirical risk (training error) h 1 h* h n S1 S* S n

16 15 Capacity of hyperplanes Vapnik & Chervonenkis also showed the following: Consider hyperplanes (w ) =0where w is normalized w.r.t a set of points X such that: min i w i =1. The set of decision functions f w () =sign(w ) defined on X such that w A has a VC dimension satisfying h R 2 A 2. where R is the radius of the smallest sphere around the origin containing X. = minimize w 2 and have low capacity = minimizing w 2 equivalent to obtaining a large margin classifier

17 <w, > + b > 0 <w, > + b < 0 w { <w, > + b = 0}

18 { <w, > + b = 1} y i = 1 w { <w, > + b = +1} 2 1, y i = +1 Note: <w, 1 > + b = +1 <w, 2 > + b = 1 => <w, ( 1 2 )> = 2 => < w, w ( 1 2 ) > 2 = w { <w, > + b = 0}

19 16 Linear Support Vector Machines (at last!) So, we would like to find the function which minimizes an objective like: Training Error + Compleity term We write that as: 1 m m l(f( i,α),y i ) + Compleity term i=1 For now we will choose the set of hyperplanes (we will etend this later), so f() =(w )+b: 1 m m l(w i + b, y i ) + w 2 i=1 subject to min i w i =1.

20 17 Linear Support Vector Machines II That function before was a little difficult to minimize because of the step function in l(y, ŷ) (either 1 or 0). Let s assume we can separate the data perfectly. Then we can optimize the following: Minimize w 2, subject to: (w i + b) 1, if y i =1 (w i + b) 1, if y i = 1 The last two constraints can be compacted to: y i (w i + b) 1 This is a quadratic program.

21 18 SVMs : non-separable case To deal with the non-separable case, one can rewrite the problem as: Minimize: subject to: w 2 + C m i=1 ξ i y i (w i + b) 1 ξ i, ξ i 0 This is just the same as the original objective: 1 m m l(w i + b, y i ) + w 2 i=1 ecept l is no longer the zero-one loss, but is called the hinge-loss : l(y, ŷ) =ma(0, 1 yŷ). This is still a quadratic program!

22 margin ξ i

23 19 Support Vector Machines - Primal Decision function: Primal formulation: f() =w + b min P (w, b)= 1 2 w 2 + C H 1 [ y i f( i )] } {{ } i } {{ } maimize margin minimize training error Ideally H 1 would count the number of errors, approimate with: H 1(z) Hinge Loss H 1 (z) =ma(0, 1 z) 0 z

24 20 SVMs : non-linear case Linear classifiers aren t comple enough sometimes. SVM solution: Map data into a richer feature space including nonlinear features, then construct a hyperplane in that space so all other equations are the same! Formally, preprocess the data with: Φ() and then learn the map from Φ() to y: f() =w Φ()+b.

25 21 SVMs : polynomial mapping Φ:R 2 R 3 ( 1, 2 ) (z 1,z 2,z 3 ):=( 2 1, (2) 1 2, 2 2) 1 2 z 1 z 3 z 2

26 22 SVMs : non-linear case II For eample MNIST hand-writing recognition. 60,000 training eamples, test eamples, Linear SVM has around 8.5% test error. Polynomial SVM has around 1% test error

27 23 SVMs : full MNIST results Classifier Test Error linear 8.4% 3-nearest-neighbor 2.4% RBF-SVM 1.4 % Tangent distance 1.1 % LeNet 1.1 % Boosted LeNet 0.7 % Translation invariant SVM 0.56 % Choosing a good mapping Φ( ) (encoding prior knowledge + getting right compleity of function class) for your problem improves results.

28 24 SVMs : the kernel trick Problem: the dimensionality of Φ() can be very large, making w hard to represent eplicitly in memory, and hard for the QP to solve. The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for SVMs as a special case): m w = α i Φ( i ) for some variables α. Instead of optimizing w directly we can thus optimize α. The decision rule is now: f() = i=1 m α i Φ( i ) Φ()+b i=1 We call K( i,)=φ( i ) Φ() the kernel function.

29 25 Support Vector Machines - kernel trick II We can rewrite all the SVM equations we saw before, but with the w = m i=1 α iφ( i ) equation: Decision function: f() = i α i Φ( i ) Φ()+b = i α i K( i, )+b Dual formulation: min P (w, b)= 1 m 2 α i Φ( i ) 2 + C H 1 [ y i f( i )] i=1 i } {{ } } {{ } maimize margin minimize training error

30 26 Support Vector Machines - Dual But people normally write it like this: Dual formulation: min α D(α) = 1 2 α i α j Φ( i ) Φ( j ) i,j i y i α i s.t. Èi α i=0 0 y i α i C Dual Decision function: f() = i α i K( i, )+b Kernel function K(, ) is used to make (implicit) nonlinear feature map, e.g. Polynomial kernel: K(, )=( +1) d. RBF kernel: K(, )=ep( γ 2 ).

31 27 Polynomial-SVMs The kernel K(, )=( ) d gives the same result as the eplicit mapping + dot product that we described before: Φ:R 2 R 3 ( 1, 2 ) (z 1,z 2,z 3 ):=( 2 1, (2) 1 2, 2 2) Φ(( 1, 2 ) Φ(( 1, 2)=( 2 1, (2) 1 2, 2 2) ( 2 1, (2) 1 2, 2 2) is the same as: = K(, )=( ) 2 =(( 1, 2 ) ( 1, 2)) 2 =( ) 2 = Interestingly, if d is large the kernel is still only requires n multiplications to compute, whereas the eplicit representation may not fit in memory!

32 28 RBF-SVMs The RBF kernel K(, )=ep( γ 2 ) is one of the most popular kernel functions. It adds a bump around each data point: m f() = α i ep( γ i 2 )+b i=1 Φ.. ' Φ() Φ(') Using this one can get state-of-the-art results.

33 29 SVMs : more results There is much more in the field of SVMs/ kernel machines than we could cover here, including: Regression, clustering, semi-supervised learning and other domains. Lots of other kernels, e.g. string kernels to handle tet. Lots of research in modifications, e.g. to improve generalization ability, or tailoring to a particular task. Lots of research in speeding up training. Please see tet books such as the ones by Cristianini & Shawe-Taylor or by Schoelkopf and Smola.

34 30 SVMs : software Lots of SVM software: LibSVM (C++) SVMLight (C) As well as complete machine learning toolboes that include SVMs: Torch (C++) Spider (Matlab) Weka (Java) All available through

### Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

### Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

### Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

### A Simple Introduction to Support Vector Machines

A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear

### Support Vector Machines Explained

March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),

### Support Vector Machine (SVM)

Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Notes on Support Vector Machines

Notes on Support Vector Machines Fernando Mira da Silva Fernando.Silva@inesc.pt Neural Network Group I N E S C November 1998 Abstract This report describes an empirical study of Support Vector Machines

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### Artificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence

Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network? - Perceptron learners - Multi-layer networks What is a Support

### Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

### Lecture 6: Logistic Regression

Lecture 6: CS 194-10, Fall 2011 Laurent El Ghaoui EECS Department UC Berkeley September 13, 2011 Outline Outline Classification task Data : X = [x 1,..., x m]: a n m matrix of data points in R n. y { 1,

### Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering

### Support Vector Machines for Classification and Regression

UNIVERSITY OF SOUTHAMPTON Support Vector Machines for Classification and Regression by Steve R. Gunn Technical Report Faculty of Engineering, Science and Mathematics School of Electronics and Computer

### A Tutorial on Support Vector Machines for Pattern Recognition

c,, 1 43 () Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. A Tutorial on Support Vector Machines for Pattern Recognition CHRISTOPHER J.C. BURGES Bell Laboratories, Lucent Technologies

### Several Views of Support Vector Machines

Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min

### Learning to Process Natural Language in Big Data Environment

CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview

### Introduction to Machine Learning

Introduction to Machine Learning Felix Brockherde 12 Kristof Schütt 1 1 Technische Universität Berlin 2 Max Planck Institute of Microstructure Physics IPAM Tutorial 2013 Felix Brockherde, Kristof Schütt

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Lecture 5: Conic Optimization: Overview

EE 227A: Conve Optimization and Applications January 31, 2012 Lecture 5: Conic Optimization: Overview Lecturer: Laurent El Ghaoui Reading assignment: Chapter 4 of BV. Sections 3.1-3.6 of WTB. 5.1 Linear

### Big Data Analytics CSCI 4030

High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

### Support Vector Machines

Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric

### Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

### Support Vector Machines

Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two

### Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S

Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard

### Support Vector Pruning with SortedVotes for Large-Scale Datasets

Support Vector Pruning with SortedVotes for Large-Scale Datasets Frerk Saxen, Konrad Doll and Ulrich Brunsmann University of Applied Sciences Aschaffenburg, Germany Email: {Frerk.Saxen, Konrad.Doll, Ulrich.Brunsmann}@h-ab.de

### A User s Guide to Support Vector Machines

A User s Guide to Support Vector Machines Asa Ben-Hur Department of Computer Science Colorado State University Jason Weston NEC Labs America Princeton, NJ 08540 USA Abstract The Support Vector Machine

### Online Classification on a Budget

Online Classification on a Budget Koby Crammer Computer Sci. & Eng. Hebrew University Jerusalem 91904, Israel kobics@cs.huji.ac.il Jaz Kandola Royal Holloway, University of London Egham, UK jaz@cs.rhul.ac.uk

### Learning Using Privileged Information: Similarity Control and Knowledge Transfer

Journal of Machine Learning Research 16 (2015) 2023-2049 Submitted 7/15; Published 9/15 In memory of Alexey Chervonenkis Learning Using Privileged Information: Similarity Control and Knowledge Transfer

### Choosing Multiple Parameters for Support Vector Machines

Machine Learning, 46, 131 159, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Choosing Multiple Parameters for Support Vector Machines OLIVIER CHAPELLE LIP6, Paris, France olivier.chapelle@lip6.fr

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

### Introduction to Machine Learning NPFL 054

Introduction to Machine Learning NPFL 054 http://ufal.mff.cuni.cz/course/npfl054 Barbora Hladká hladka@ufal.mff.cuni.cz Martin Holub holub@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and

### An Introduction to Machine Learning

An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

### Neural Networks. CAP5610 Machine Learning Instructor: Guo-Jun Qi

Neural Networks CAP5610 Machine Learning Instructor: Guo-Jun Qi Recap: linear classifier Logistic regression Maximizing the posterior distribution of class Y conditional on the input vector X Support vector

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Fast Kernel Classifiers with Online and Active Learning

Journal of Machine Learning Research 6 (2005) 1579 1619 Submitted 3/05; Published 9/05 Fast Kernel Classifiers with Online and Active Learning Antoine Bordes NEC Laboratories America 4 Independence Way

### Online learning of multi-class Support Vector Machines

IT 12 061 Examensarbete 30 hp November 2012 Online learning of multi-class Support Vector Machines Xuan Tuan Trinh Institutionen för informationsteknologi Department of Information Technology Abstract

### Introduction to Online Learning Theory

Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

### Data visualization and dimensionality reduction using kernel maps with a reference point

Data visualization and dimensionality reduction using kernel maps with a reference point Johan Suykens K.U. Leuven, ESAT-SCD/SISTA Kasteelpark Arenberg 1 B-31 Leuven (Heverlee), Belgium Tel: 32/16/32 18

### Machine learning for algo trading

Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

### Maximum Margin Clustering

Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin

### Machine Learning in Spam Filtering

Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.

### fml The SHOGUN Machine Learning Toolbox (and its python interface)

fml The SHOGUN Machine Learning Toolbox (and its python interface) Sören Sonnenburg 1,2, Gunnar Rätsch 2,Sebastian Henschel 2,Christian Widmer 2,Jonas Behr 2,Alexander Zien 2,Fabio de Bona 2,Alexander

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Using artificial intelligence for data reduction in mechanical engineering

Using artificial intelligence for data reduction in mechanical engineering L. Mdlazi 1, C.J. Stander 1, P.S. Heyns 1, T. Marwala 2 1 Dynamic Systems Group Department of Mechanical and Aeronautical Engineering,

### Support Vector Machines

CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best)

### INTRUSION DETECTION USING THE SUPPORT VECTOR MACHINE ENHANCED WITH A FEATURE-WEIGHT KERNEL

INTRUSION DETECTION USING THE SUPPORT VECTOR MACHINE ENHANCED WITH A FEATURE-WEIGHT KERNEL A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements

### CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

### Training Invariant Support Vector Machines

Machine Learning, 46, 161 190, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Training Invariant Support Vector Machines DENNIS DECOSTE decoste@aig.jpl.nasa.gov Jet Propulsion

### Support-Vector Networks

Machine Learning, 20, 273-297 (1995) 1995 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. Support-Vector Networks CORINNA CORTES VLADIMIR VAPNIK AT&T Bell Labs., Holmdel, NJ 07733,

### MACHINE LEARNING. Introduction. Alessandro Moschitti

MACHINE LEARNING Introduction Alessandro Moschitti Department of Computer Science and Information Engineering University of Trento Email: moschitti@disi.unitn.it Course Schedule Lectures Tuesday, 14:00-16:00

### L1 vs. L2 Regularization and feature selection.

L1 vs. L2 Regularization and feature selection. Paper by Andrew Ng (2004) Presentation by Afshin Rostami Main Topics Covering Numbers Definition Convergence Bounds L1 regularized logistic regression L1

### Methods to Solve Quadratic Equations

Methods to Solve Quadratic Equations We have been learning how to factor epressions. Now we will apply factoring to another skill you must learn solving quadratic equations. a b c 0 is a second-degree

### A Survey of Kernel Clustering Methods

A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering

### Network Intrusion Detection using Semi Supervised Support Vector Machine

Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT

### Spam detection with data mining method:

Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

### Adapting Codes and Embeddings for Polychotomies

Adapting Codes and Embeddings for Polychotomies Gunnar Rätsch, Alexander J. Smola RSISE, CSL, Machine Learning Group The Australian National University Canberra, 2 ACT, Australia Gunnar.Raetsch, Alex.Smola

### Chapter 7. Diagnosis and Prognosis of Breast Cancer using Histopathological Data

Chapter 7 Diagnosis and Prognosis of Breast Cancer using Histopathological Data In the previous chapter, a method for classification of mammograms using wavelet analysis and adaptive neuro-fuzzy inference

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### A Support Vector Method for Multivariate Performance Measures

Thorsten Joachims Cornell University, Dept. of Computer Science, 4153 Upson Hall, Ithaca, NY 14853 USA tj@cs.cornell.edu Abstract This paper presents a Support Vector Method for optimizing multivariate

### Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

### Moving Least Squares Approximation

Chapter 7 Moving Least Squares Approimation An alternative to radial basis function interpolation and approimation is the so-called moving least squares method. As we will see below, in this method the

### An Introduction to Statistical Machine Learning - Overview -

An Introduction to Statistical Machine Learning - Overview - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny, Switzerland

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Breaking SVM Complexity with Cross-Training

Breaking SVM Complexity with Cross-Training Gökhan H. Bakır Max Planck Institute for Biological Cybernetics, Tübingen, Germany gb@tuebingen.mpg.de Léon Bottou NEC Labs America Princeton NJ, USA leon@bottou.org

### CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

### Polynomial and Rational Functions

Chapter Section.1 Quadratic Functions Polnomial and Rational Functions Objective: In this lesson ou learned how to sketch and analze graphs of quadratic functions. Course Number Instructor Date Important

### Large Margin DAGs for Multiclass Classification

S.A. Solla, T.K. Leen and K.-R. Müller (eds.), 57 55, MIT Press (000) Large Margin DAGs for Multiclass Classification John C. Platt Microsoft Research Microsoft Way Redmond, WA 9805 jplatt@microsoft.com

### BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

### Chapter 5.1 Systems of linear inequalities in two variables.

Chapter 5. Systems of linear inequalities in two variables. In this section, we will learn how to graph linear inequalities in two variables and then apply this procedure to practical application problems.

### Introduction to Machine Learning. Rob Schapire Princeton University. schapire

Introduction to Machine Learning Rob Schapire Princeton University www.cs.princeton.edu/ schapire Machine Learning studies how to automatically learn to make accurate predictions based on past observations

### Content-Based Recommendation

Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

### Christian Kaul, Johannes Knabe, Tobias Lang, Volker Sperschneider. Filtering Spam E-mail with Support Vector Machines PICS

Christian Kaul, Johannes Knabe, Tobias Lang, Volker Sperschneider Filtering Spam E-mail with Support Vector Machines PICS Publications of the Institute of Cognitive Science Volume 8-2004 ISSN: 1610-5389

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### A fast multi-class SVM learning method for huge databases

www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,

### A Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression

A Hybrid Forecasting Methodology using Feature Selection and Support Vector Regression José Guajardo, Jaime Miranda, and Richard Weber, Department of Industrial Engineering, University of Chile Abstract

### CHAPTER 13. Definite Integrals. Since integration can be used in a practical sense in many applications it is often

7 CHAPTER Definite Integrals Since integration can be used in a practical sense in many applications it is often useful to have integrals evaluated for different values of the variable of integration.

### Early defect identification of semiconductor processes using machine learning

STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew

### A Neural Support Vector Network Architecture with Adaptive Kernels. 1 Introduction. 2 Support Vector Machines and Motivations

A Neural Support Vector Network Architecture with Adaptive Kernels Pascal Vincent & Yoshua Bengio Département d informatique et recherche opérationnelle Université de Montréal C.P. 6128 Succ. Centre-Ville,

### Calculus and Vectors. Day Lesson Title Math Learning Goals Expectations 1, 2, 3,

Unit Applying Properties of Derivatives Calculus and Vectors Lesson Outline Day Lesson Title Math Learning Goals Epectations 1,,, The Second Derivative (Sample Lessons Included) 4 Curve Sketching from

### Chapter 13. A User s Guide to Support Vector Machines. Asa Ben-Hur and Jason Weston. Abstract. 1. Introduction

Chapter 13 A User s Guide to Support Vector Machines Asa Ben-Hur and Jason Weston Abstract The Support Vector Machine (SVM) is a widely used classifier in bioinformatics. Obtaining the best results with

Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that

### Classification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model

10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

### Carrier Relevance Study for Indoor Localization Using GSM

1 Carrier Relevance Study for Indoor Localization Using GSM Iness Ahriz, Yacine Oussar, Bruce Denby, Senior Member, IEEE, Gérard Dreyfus, Senior Member, IEEE Abstract A study is made of subsets of relevant

### A Linear Programming Approach to Novelty Detection

A Linear Programming Approach to Novelty Detection Colin Campbell Dept. of Engineering Mathematics, Bristol University, Bristol Bristol, BS8 1 TR, United Kingdon C. Campbell@bris.ac.uk Kristin P. Bennett

### Neural Networks and Support Vector Machines

INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

### Classification algorithm in Data mining: An Overview

Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department

### Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### A Random Sampling Technique for Training Support Vector Machines (For Primal-Form Maximal-Margin Classifiers)

A Random Sampling Technique for Training Support Vector Machines (For Primal-Form Maximal-Margin Classifiers) Jose Balcázar 1, Yang Dai 2, and Osamu Watanabe 2 1 Dept. Llenguatges i Sistemes Informatics,

### Section 3-7. Marginal Analysis in Business and Economics. Marginal Cost, Revenue, and Profit. 202 Chapter 3 The Derivative

202 Chapter 3 The Derivative Section 3-7 Marginal Analysis in Business and Economics Marginal Cost, Revenue, and Profit Application Marginal Average Cost, Revenue, and Profit Marginal Cost, Revenue, and

### Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,