Kernel Methods in Machine Learning
- Elfrieda Holmes
- 7 years ago
Transcription
1 Kernel Methods in Machine Learning
Perceptron. Geometric Margins. Support Vector Machines (SVMs).
Maria-Florina Balcan, 03/23/2015
2 Quick Recap about Perceptron and Margins
3 The Online Learning Model
- Examples arrive sequentially.
- We need to make a prediction; afterwards we observe the outcome.
For i = 1, 2, ...: in phase i, the online algorithm receives example $x_i$, outputs the prediction $h(x_i)$, and then observes the true label $c^*(x_i)$.
Mistake bound model:
- Analysis-wise, make no distributional assumptions.
- Goal: minimize the number of mistakes.
4 Perceptron Algorithm in Online Model
WLOG homogeneous linear separators [$w_0 = 0$]. Instance space $X = \mathbb{R}^n$.
- Set $t = 1$, start with the all-zero vector $w_1$.
- Given example $x$, predict + iff $w_t \cdot x \ge 0$.
- On a mistake, update as follows:
  - Mistake on positive: $w_{t+1} \leftarrow w_t + x$
  - Mistake on negative: $w_{t+1} \leftarrow w_t - x$
Note 1: $w_t$ is a weighted sum of the incorrectly classified examples, $w_t = a_{i_1} x_{i_1} + \dots + a_{i_k} x_{i_k}$, so $w_t \cdot x = a_{i_1} x_{i_1} \cdot x + \dots + a_{i_k} x_{i_k} \cdot x$.
Note 2: The number of mistakes ever made depends only on the geometric margin of the examples seen, no matter how long the sequence is or how high the dimension n is!
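The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the function name `perceptron`, the encoding of labels as +1 and -1, and the four-point toy stream are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def perceptron(examples: Sequence[Tuple[Vector, int]]) -> Tuple[List[float], int]:
    """One online pass over (x, label) pairs with labels in {+1, -1}.

    Returns the final weight vector and the number of mistakes made.
    """
    w = [0.0] * len(examples[0][0])  # start with the all-zero vector
    mistakes = 0
    for x, label in examples:
        predicted = 1 if dot(w, x) >= 0 else -1  # predict + iff w . x >= 0
        if predicted != label:
            mistakes += 1
            # Mistake on positive: w <- w + x; mistake on negative: w <- w - x.
            w = [wi + label * xi for wi, xi in zip(w, x)]
    return w, mistakes


# Hypothetical toy stream, chosen only for illustration.
stream = [((1.0, 1.0), 1), ((-1.0, -1.0), -1), ((2.0, 0.5), 1), ((-0.5, -2.0), -1)]
w, mistakes = perceptron(stream)
print(w, mistakes)
```

Because every update adds or subtracts a misclassified example, the returned `w` is exactly the weighted sum of mistakes described in Note 1.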
5 Geometric Margin
Definition: The margin of example $x$ w.r.t. a linear separator $w$ is the distance from $x$ to the plane $w \cdot x = 0$.
If $\|w\| = 1$, the margin of $x$ w.r.t. $w$ is $|x \cdot w|$.
[Figure: margins of examples $x_1$ and $x_2$ relative to the separator $w$.]
6 Geometric Margin
Definition: The margin of example $x$ w.r.t. a linear separator $w$ is the distance from $x$ to the plane $w \cdot x = 0$.
Definition: The margin $\gamma_w$ of a set of examples $S$ w.r.t. a linear separator $w$ is the smallest margin over points $x \in S$.
Definition: The margin $\gamma$ of a set of examples $S$ is the maximum $\gamma_w$ over all linear separators $w$.
[Figure: positive and negative points separated by $w$ with margin $\gamma$ on each side.]
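As a small worked example of the first two definitions, the sketch below computes the margin of single points and of a set with respect to a fixed separator. The function names and the sample points are assumptions for illustration; the distance-based definition ignores labels, so the sketch implicitly assumes $w$ classifies every point correctly.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def margin_of_example(w: Vector, x: Vector) -> float:
    """Distance from x to the hyperplane w . x = 0 (w need not be unit length)."""
    return abs(dot(w, x)) / math.sqrt(dot(w, w))


def margin_of_set(w: Vector, points: Sequence[Vector]) -> float:
    """Smallest margin over all points in the set (gamma_w on the slide)."""
    return min(margin_of_example(w, x) for x in points)


# Hypothetical separator and points.
w = (3.0, 4.0)
points = [(1.0, 1.0), (2.0, -1.0), (-2.0, -2.0)]
gamma_w = margin_of_set(w, points)
print(gamma_w)
```

The third definition (the maximum of $\gamma_w$ over all separators) is an optimization problem and is not attempted here.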
7 Perceptron: Mistake Bound
Theorem: If the data is linearly separable by margin $\gamma$ and the points lie inside a ball of radius $R$, then Perceptron makes at most $(R/\gamma)^2$ mistakes, no matter how long the sequence is or how high the dimension n is!
Margin: the amount of wiggle room available for a solution.
(Normalized margin: multiplying all points by 100, or dividing all points by 100, doesn't change the number of mistakes; the algorithm is invariant to scaling.)
[Figure: points inside a ball of radius $R$, separated by $w^*$.]
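The bound can be checked numerically on a toy dataset. The sketch below cycles Perceptron through the data until a pass makes no mistakes, then compares the total mistake count with $(R/\gamma)^2$. The dataset, the reference separator `w_star`, and the helper names are assumptions for illustration. The margin is measured against `w_star`, which need not be the best separator, so the computed bound is valid but possibly loose.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def count_mistakes(examples, max_passes=100):
    """Cycle Perceptron through the data; return the total number of mistakes."""
    w = [0.0] * len(examples[0][0])
    total = 0
    for _ in range(max_passes):
        mistakes_this_pass = 0
        for x, label in examples:
            predicted = 1 if dot(w, x) >= 0 else -1
            if predicted != label:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes_this_pass += 1
        total += mistakes_this_pass
        if mistakes_this_pass == 0:  # consistent separator found
            break
    return total


# Hypothetical linearly separable data with labels in {+1, -1}.
examples = [((1.0, 1.0), 1), ((-1.0, -1.0), -1), ((2.0, 0.5), 1), ((-0.5, -2.0), -1)]
w_star = (1.0, 1.0)  # a separator assumed for this example

# Margin of the set w.r.t. w_star, and radius of the smallest origin-centered ball.
gamma = min(label * dot(w_star, x) for x, label in examples) / math.sqrt(dot(w_star, w_star))
radius = max(math.sqrt(dot(x, x)) for x, _ in examples)
bound = (radius / gamma) ** 2

total_mistakes = count_mistakes(examples)
print(total_mistakes, bound)
```

Scaling every point by the same constant scales both `radius` and `gamma`, so `bound` is unchanged, which matches the remark about the normalized margin.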
8 Perceptron Extensions
- Can use it to find a consistent separator for a given set $S$ that is linearly separable by margin $\gamma$ (by cycling through the data).
- Can convert the mistake bound guarantee into a distributional guarantee too (for the case where the $x_i$'s come from a fixed distribution).
- Can be adapted to the case where there is no perfect separator, as long as the so-called hinge loss (i.e., the total distance needed to move the points to classify them correctly with a large margin) is small.
- Can be kernelized to handle non-linear decision boundaries!
9 Perceptron: Mistake Bound
Theorem: If the data is linearly separable by margin $\gamma$ and the points lie inside a ball of radius $R$, then Perceptron makes at most $(R/\gamma)^2$ mistakes.
Implies that large margin classifiers have smaller complexity!
10 Complexity of Large Margin Linear Separators
- We know that in $\mathbb{R}^n$ we can shatter n+1 points with linear separators, but not n+2 points (the VC-dimension of linear separators is n+1).
- What if we require that the points be linearly separated by margin $\gamma$?
- We can have at most $(R/\gamma)^2$ points inside a ball of radius $R$ that can be shattered at margin $\gamma$ (meaning that every labeling is achievable by a separator of margin $\gamma$).
- So, large margin classifiers have smaller complexity! There are fewer classifiers to worry about that will look good over the sample but bad over all. Less prone to overfitting!
- Nice implications for the usual distributional learning setting.
11 Margin: Important Theme in ML
Both sample complexity and algorithmic implications.
Sample/mistake bound complexity:
- If the margin is large, the number of mistakes Perceptron makes is small (independent of the dimension of the space)!
- If the margin $\gamma$ is large and the algorithm produces a large margin classifier, then the amount of data needed depends only on $R/\gamma$ [Bartlett & Shawe-Taylor 99].
- This suggests searching for a large margin classifier.
Algorithmic implications: Perceptron, Kernels, SVMs.
[Figure: separator $w$ with margin $\gamma$ on each side.]
12 So far, talked about margins in the context of (nearly) linearly separable datasets
13 What if Not Linearly Separable
Problem: data is not linearly separable in the most natural feature representation.
Example: [Figure: two images to be told apart.] No good linear separator in pixel representation.
Solutions:
- Learn a more complex class of functions (e.g., decision trees, neural networks, boosting).
- Use a Kernel (a neat solution that attracted a lot of attention).
- Use a Deep Network.
- Combine Kernels and Deep Networks.
14 Overview of Kernel Methods
What is a Kernel?
A kernel $K$ is a legal definition of a dot product: i.e., there exists an implicit mapping $\Phi$ s.t. $K(x, z) = \Phi(x) \cdot \Phi(z)$.
E.g., $K(x, y) = (x \cdot y + 1)^d$, where $\Phi$ maps the n-dimensional space to a roughly $n^d$-dimensional space.
Why do Kernels matter?
- Many algorithms interact with data only via dot products.
- So, if we replace $x \cdot z$ with $K(x, z)$, they act implicitly as if the data was in the higher-dimensional $\Phi$-space.
- If the data is linearly separable by a large margin in the $\Phi$-space, then we get good sample complexity. [Or other regularity properties for controlling the capacity.]
15 Kernels
Definition: $K(\cdot, \cdot)$ is a kernel if it can be viewed as a legal definition of an inner product:
$\exists\, \phi: X \to \mathbb{R}^N$ s.t. $K(x, z) = \phi(x) \cdot \phi(z)$
The range of $\phi$ is called the $\Phi$-space. N can be very large. But think of $\phi$ as implicit, not explicit!
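16 Example
For n = 2, d = 2, the kernel $K(x, z) = (x \cdot z)^d$ corresponds to
$(x_1, x_2) \to \Phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$
[Figure: points in the original space (axes $x_1$, $x_2$) and their images in the $\Phi$-space (axes $z_1$, $z_3$).]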
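17 Example
$\phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \to \Phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$
$\phi(x) \cdot \phi(z) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2) \cdot (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2) = (x_1 z_1 + x_2 z_2)^2 = (x \cdot z)^2 = K(x, z)$
[Figure: points in the original space (axes $x_1$, $x_2$) and their images in the $\Phi$-space (axes $z_1$, $z_3$).]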
19 Example
Note: the feature space might not be unique.
$\phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(x_1, x_2) \to \Phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$
$\phi(x) \cdot \phi(z) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2) \cdot (z_1^2, z_2^2, \sqrt{2}\, z_1 z_2) = (x_1 z_1 + x_2 z_2)^2 = (x \cdot z)^2 = K(x, z)$
$\phi: \mathbb{R}^2 \to \mathbb{R}^4$, $(x_1, x_2) \to \Phi(x) = (x_1^2, x_2^2, x_1 x_2, x_2 x_1)$
$\phi(x) \cdot \phi(z) = (x_1^2, x_2^2, x_1 x_2, x_2 x_1) \cdot (z_1^2, z_2^2, z_1 z_2, z_2 z_1) = (x \cdot z)^2 = K(x, z)$
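Both feature maps can be checked numerically against the kernel. The sketch below is an illustration only: the function names `phi3` and `phi4` and the test points are assumptions, and the three-dimensional map is taken with the $\sqrt{2}$ factor on the cross term, which is what makes the identity $\phi(x) \cdot \phi(z) = (x \cdot z)^2$ hold.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without any feature expansion."""
    return dot(x, z) ** 2


def phi3(x):
    """Explicit map R^2 -> R^3: (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)


def phi4(x):
    """Alternative explicit map R^2 -> R^4: (x1^2, x2^2, x1 x2, x2 x1)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, x1 * x2, x2 * x1)


# Hypothetical test points.
x, z = (1.0, 2.0), (3.0, -1.0)
k = quadratic_kernel(x, z)
via_phi3 = dot(phi3(x), phi3(z))
via_phi4 = dot(phi4(x), phi4(z))
print(k, via_phi3, via_phi4)
```

All three values agree up to floating-point rounding, which illustrates that two different feature spaces can realize the same kernel.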
20 Avoid explicitly expanding the features
The feature space can grow really large, really quickly. It is crucial to think of $\phi$ as implicit, not explicit!
Polynomial kernel of degree d: $k(x, z) = (x \cdot z)^d = \phi(x) \cdot \phi(z)$
The features are the degree-d monomials, e.g. $x_1^d$, $x_1 x_2 \cdots x_d$, $x_1^2 x_2 \cdots x_{d-1}$.
The total number of such features is $\binom{d+n-1}{d} = \frac{(d+n-1)!}{d!\,(n-1)!}$.
For d = 6, n = 100, there are 1.6 billion terms.
Evaluating $k(x, z) = (x \cdot z)^d$ directly takes only O(n) computation!
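The feature count quoted on this slide can be reproduced with the standard library. The sketch below is illustrative; the function names are assumptions, and `math.comb` requires Python 3.8 or later.

```python
import math


def num_degree_d_monomials(n: int, d: int) -> int:
    """Number of degree-d monomials in n variables: C(d + n - 1, d)."""
    return math.comb(d + n - 1, d)


def poly_kernel(x, z, d: int) -> float:
    """k(x, z) = (x . z)^d, using a single O(n) dot product."""
    return sum(a * b for a, b in zip(x, z)) ** d


count = num_degree_d_monomials(n=100, d=6)
print(count)  # about 1.6 billion explicit features
print(poly_kernel((1.0, 2.0), (3.0, 4.0), 2))
```

The contrast is the point of the slide: the explicit feature vector would have `count` entries, while `poly_kernel` touches each of the n input coordinates once.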
21 Kernelizing a learning algorithm
If all computations involving instances are in terms of inner products, then:
- Conceptually, we work in a very high-dimensional space and the algorithm's performance depends only on linear separability in that extended space.
- Computationally, we only need to modify the algorithm by replacing each $x \cdot z$ with $K(x, z)$.
Examples of kernelizable algorithms:
- classification: Perceptron, SVM.
- regression: linear, ridge regression.
- clustering: k-means.
22 Kernelizing the Perceptron Algorithm
- Set $t = 1$, start with the all-zero vector $w_1$.
- Given example $x$, predict + iff $w_t \cdot x \ge 0$.
- On a mistake, update as follows:
  - Mistake on positive: $w_{t+1} \leftarrow w_t + x$
  - Mistake on negative: $w_{t+1} \leftarrow w_t - x$
Easy to kernelize, since $w_t$ is a weighted sum of the incorrectly classified examples: $w_t = a_{i_1} x_{i_1} + \dots + a_{i_k} x_{i_k}$.
Replace $w_t \cdot x = a_{i_1} x_{i_1} \cdot x + \dots + a_{i_k} x_{i_k} \cdot x$ with $a_{i_1} K(x_{i_1}, x) + \dots + a_{i_k} K(x_{i_k}, x)$.
Note: we need to store all the mistakes so far.
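23 Kernelizing the Perceptron Algorithm
- Given $x$, predict + iff $a_{i_1} K(x_{i_1}, x) + \dots + a_{i_{t-1}} K(x_{i_{t-1}}, x) \ge 0$, where each $K(x_{i_j}, x) = \phi(x_{i_j}) \cdot \phi(x)$.
- On the t-th mistake, update as follows:
  - Mistake on positive: set $a_{i_t} \leftarrow 1$; store $x_{i_t}$.
  - Mistake on negative: set $a_{i_t} \leftarrow -1$; store $x_{i_t}$.
Perceptron: $w_t = a_{i_1} x_{i_1} + \dots + a_{i_k} x_{i_k}$, so $w_t \cdot x = a_{i_1} x_{i_1} \cdot x + \dots + a_{i_k} x_{i_k} \cdot x$ becomes $a_{i_1} K(x_{i_1}, x) + \dots + a_{i_k} K(x_{i_k}, x)$.
Exact same behavior/prediction rule as if we mapped the data into the $\phi$-space and ran Perceptron there! We do this implicitly, so there are computational savings!
[Figure: separator $w$ in the $\Phi$-space.]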
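A minimal sketch of the kernelized rule follows. It stores each mistake as a pair $(a_i, x_i)$ and scores a new point by a weighted sum of kernel values, so the weight vector in the $\phi$-space is never formed. The function names, the XOR-style dataset, the choice of $K(x, z) = (x \cdot z)^2$, and the cap on the number of passes are assumptions for the example.

```python
def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def score(support, kernel, x):
    """a_1 K(x_1, x) + ... + a_k K(x_k, x) over the stored mistakes."""
    return sum(a * kernel(xi, x) for a, xi in support)


def predict(support, kernel, x):
    return 1 if score(support, kernel, x) >= 0 else -1


def kernel_perceptron(examples, kernel, max_passes=10):
    """Cycle through (x, label) pairs; labels are +1 or -1.

    Returns the stored mistakes as a list of (a_i, x_i) pairs.
    """
    support = []
    for _ in range(max_passes):
        made_mistake = False
        for x, label in examples:
            if predict(support, kernel, x) != label:
                # Mistake on positive: a = +1; mistake on negative: a = -1.
                support.append((label, x))
                made_mistake = True
        if not made_mistake:
            break
    return support


# XOR-style data: not linearly separable in the original two coordinates.
data = [((1.0, 1.0), 1), ((-1.0, -1.0), 1), ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]
support = kernel_perceptron(data, quadratic_kernel)
predictions = [predict(support, quadratic_kernel, x) for x, _ in data]
print(support, predictions)
```

Prediction cost grows with the number of stored mistakes, which is the price of never writing down the weight vector explicitly.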
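24 Generalize Well if Good Margin
- If the data is linearly separable by a margin in the $\phi$-space, then we get a small mistake bound.
- If the margin is $\gamma$ in the $\phi$-space, then Perceptron makes at most $(R/\gamma)^2$ mistakes.
[Figure: points in the $\Phi$-space inside a ball of radius $R$, separated by $w^*$.]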
25 Kernels: More Examples
- Linear: $K(x, z) = x \cdot z$
- Polynomial: $K(x, z) = (x \cdot z)^d$ or $K(x, z) = (1 + x \cdot z)^d$
- Gaussian: $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
- Laplace kernel: $K(x, z) = \exp\left(-\frac{\|x - z\|}{2\sigma^2}\right)$
- Kernels for non-vectorial data, e.g., measuring similarity between sequences.
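The vector kernels listed on this slide are each a one-liner. The sketch below is illustrative: the function names and default parameters are assumptions, and the Laplace kernel follows the slide's parameterization with $2\sigma^2$ in the denominator, which differs from some other references.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(x, z):
    """Squared Euclidean distance ||x - z||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, d=2, offset=0.0):
    """(x . z)^d when offset is 0, (1 + x . z)^d when offset is 1."""
    return (offset + dot(x, z)) ** d


def gaussian_kernel(x, z, sigma=1.0):
    return math.exp(-sq_dist(x, z) / (2 * sigma ** 2))


def laplace_kernel(x, z, sigma=1.0):
    # Parameterized as on the slide: exp(-||x - z|| / (2 sigma^2)).
    return math.exp(-math.sqrt(sq_dist(x, z)) / (2 * sigma ** 2))


x, z = (0.0, 0.0), (3.0, 4.0)  # hypothetical points at distance 5
print(linear_kernel(x, z), polynomial_kernel(x, z, d=3, offset=1.0))
print(gaussian_kernel(x, z, sigma=5.0), laplace_kernel(x, z, sigma=5.0))
```

Any of these functions can be passed wherever a kernelized algorithm expects a `kernel(x, z)` callable.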
26 Properties of Kernels
Theorem (Mercer): $K$ is a kernel if and only if:
- $K$ is symmetric.
- For any set of training points $x_1, x_2, \dots, x_m$ and for any $a_1, a_2, \dots, a_m \in \mathbb{R}$, we have $\sum_{i,j} a_i a_j K(x_i, x_j) \ge 0$, i.e., $a^T K a \ge 0$.
I.e., $K = (K(x_i, x_j))_{i,j=1,\dots,m}$ is positive semi-definite.
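The condition on this slide can be spot-checked on a finite set of points. The sketch below builds the Gram matrix, checks symmetry, and evaluates $a^T K a$ for random coefficient vectors. It is a necessary-condition test only: passing it does not prove that a function is a kernel, while a single negative value proves that it is not. All names, the sample points, and the tolerance are assumptions for the example.

```python
import math
import random


def gaussian_kernel(x, z, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def negative_sq_distance(x, z):
    """Symmetric, but not a kernel: used here as a counterexample."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))


def gram_matrix(kernel, points):
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, a):
    """a^T K a = sum_{i,j} a_i a_j K[i][j]."""
    m = len(a)
    return sum(a[i] * a[j] * K[i][j] for i in range(m) for j in range(m))


def passes_spot_check(kernel, points, trials=200, tol=1e-9, seed=0):
    """Symmetry plus a^T K a >= 0 on random vectors a (necessary, not sufficient)."""
    K = gram_matrix(kernel, points)
    m = len(points)
    if not all(math.isclose(K[i][j], K[j][i]) for i in range(m) for j in range(m)):
        return False
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        if quadratic_form(K, a) < -tol:
            return False
    return True


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical sample
print(passes_spot_check(gaussian_kernel, points))
print(quadratic_form(gram_matrix(negative_sq_distance, points), [1.0, 1.0, 1.0]))
```

For the counterexample, the coefficient vector $a = (1, 1, 1)$ already gives a negative quadratic form, so the function violates the positive semi-definiteness condition.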
27 Kernel Methods
- Offer great modularity.
- No need to change the underlying learning algorithm to accommodate a particular choice of kernel function.
- Also, we can substitute a different algorithm while maintaining the same kernel.
28 Kernel, Closure Properties
Easily create new kernels using basic ones!
Fact: If $K_1(\cdot, \cdot)$ and $K_2(\cdot, \cdot)$ are kernels and $c_1 \ge 0$, $c_2 \ge 0$, then $K(x, z) = c_1 K_1(x, z) + c_2 K_2(x, z)$ is a kernel.
Key idea: concatenate the $\phi$-spaces.
$\phi(x) = (\sqrt{c_1}\, \phi_1(x), \sqrt{c_2}\, \phi_2(x))$
$\phi(x) \cdot \phi(z) = c_1\, \phi_1(x) \cdot \phi_1(z) + c_2\, \phi_2(x) \cdot \phi_2(z) = c_1 K_1(x, z) + c_2 K_2(x, z)$
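29 Kernel, Closure Properties
Easily create new kernels using basic ones!
Fact: If $K_1(\cdot, \cdot)$ and $K_2(\cdot, \cdot)$ are kernels, then $K(x, z) = K_1(x, z)\, K_2(x, z)$ is a kernel.
Key idea: $\phi(x) = \left(\phi_{1,i}(x)\, \phi_{2,j}(x)\right)_{i \in \{1,\dots,n\},\, j \in \{1,\dots,m\}}$
$\phi(x) \cdot \phi(z) = \sum_{i,j} \phi_{1,i}(x)\, \phi_{2,j}(x)\, \phi_{1,i}(z)\, \phi_{2,j}(z)$
$= \sum_i \phi_{1,i}(x)\, \phi_{1,i}(z) \left( \sum_j \phi_{2,j}(x)\, \phi_{2,j}(z) \right)$
$= \sum_i \phi_{1,i}(x)\, \phi_{1,i}(z)\, K_2(x, z)$
$= K_1(x, z)\, K_2(x, z)$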
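Both closure properties can be verified numerically when the two base kernels have explicit feature maps. The sketch below takes $K_1$ to be the linear kernel and $K_2$ the quadratic kernel $(x \cdot z)^2$ on $\mathbb{R}^2$; these choices, the function names, the constants, and the test points are assumptions for illustration. The sum uses the concatenated, $\sqrt{c}$-scaled features and the product uses all pairwise products of features.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi1(x):
    """Feature map of the linear kernel K1(x, z) = x . z."""
    return tuple(x)


def phi2(x):
    """Feature map of K2(x, z) = (x . z)^2 on R^2."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)


def k1(x, z):
    return dot(x, z)


def k2(x, z):
    return dot(x, z) ** 2


def phi_sum(x, c1, c2):
    """Concatenate the scaled feature spaces: (sqrt(c1) phi1, sqrt(c2) phi2)."""
    return tuple(math.sqrt(c1) * f for f in phi1(x)) + tuple(math.sqrt(c2) * f for f in phi2(x))


def phi_product(x):
    """All pairwise products phi1_i(x) * phi2_j(x)."""
    return tuple(f1 * f2 for f1 in phi1(x) for f2 in phi2(x))


x, z = (1.0, 2.0), (2.0, 1.0)  # hypothetical points
c1, c2 = 2.0, 3.0              # hypothetical non-negative weights

sum_direct = c1 * k1(x, z) + c2 * k2(x, z)
sum_via_features = dot(phi_sum(x, c1, c2), phi_sum(z, c1, c2))
product_direct = k1(x, z) * k2(x, z)
product_via_features = dot(phi_product(x), phi_product(z))
print(sum_direct, sum_via_features, product_direct, product_via_features)
```

The product feature space has $n \cdot m$ coordinates (here 2 times 3), which is another reason to evaluate the kernel directly instead of expanding features.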
30 Kernels, Discussion
If all computations involving instances are in terms of inner products, then:
- Conceptually, we work in a very high-dimensional space and the algorithm's performance depends only on linear separability in that extended space.
- Computationally, we only need to modify the algorithm by replacing each $x \cdot z$ with $K(x, z)$.
Lots of Machine Learning algorithms are kernelizable:
- classification: Perceptron, SVM.
- regression: linear regression.
- clustering: k-means.
31 Kernels, Discussion
If all computations involving instances are in terms of inner products, then:
- Conceptually, we work in a very high-dimensional space and the algorithm's performance depends only on linear separability in that extended space.
- Computationally, we only need to modify the algorithm by replacing each $x \cdot z$ with $K(x, z)$.
How to choose a kernel:
- Kernels often encode domain knowledge (e.g., string kernels).
- Use cross-validation to choose the parameters, e.g., $\sigma$ for the Gaussian kernel $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$.
- Learn a good kernel; e.g., [Lanckriet-Cristianini-Bartlett-El Ghaoui-Jordan 04].
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationComparison of machine learning methods for intelligent tutoring systems
Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI-80101 Joensuu
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationVisualization of General Defined Space Data
International Journal of Computer Graphics & Animation (IJCGA) Vol.3, No.4, October 013 Visualization of General Defined Space Data John R Rankin La Trobe University, Australia Abstract A new algorithm
More informationMachine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu
Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel
More informationIFT3395/6390. Machine Learning from linear regression to Neural Networks. Machine Learning. Training Set. t (3.5, -2,..., 127, 0,...
IFT3395/6390 Historical perspective: back to 1957 (Prof. Pascal Vincent) (Rosenblatt, Perceptron ) Machine Learning from linear regression to Neural Networks Computer Science Artificial Intelligence Symbolic
More informationThe Set Covering Machine
Journal of Machine Learning Research 3 (2002) 723-746 Submitted 12/01; Published 12/02 The Set Covering Machine Mario Marchand School of Information Technology and Engineering University of Ottawa Ottawa,
More informationMassive Data Classification via Unconstrained Support Vector Machines
Massive Data Classification via Unconstrained Support Vector Machines Olvi L. Mangasarian and Michael E. Thompson Computer Sciences Department University of Wisconsin 1210 West Dayton Street Madison, WI
More information1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
More information10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html
More informationA Neural Support Vector Network Architecture with Adaptive Kernels. 1 Introduction. 2 Support Vector Machines and Motivations
A Neural Support Vector Network Architecture with Adaptive Kernels Pascal Vincent & Yoshua Bengio Département d informatique et recherche opérationnelle Université de Montréal C.P. 6128 Succ. Centre-Ville,
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More information1 Sets and Set Notation.
LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More information