Class #6: Non-linear classification. ML4Bio 2012, February 17th, 2012. Quaid Morris

1 Class #6: Non-linear classification. ML4Bio 2012, February 17th, 2012. Quaid Morris

2 Module #: Title of Module 2

3 Review Overview Linear separability Non-linear classification Linear Support Vector Machines (SVMs) Picking the best linear decision boundary Non-linear SVMs Decision Trees Ensemble methods: Bagging and Boosting 3

4 Objective functions. E(X, Θ) is a function of the data (X) and the parameters (Θ). Objective functions measure the fit between a model and the data, e.g.: the K-means objective function is the sum of the squared Euclidean distances from each data point to its closest mean (or "centroid"); the parameters are the means. When fitting a statistical model, we use: E(X, Θ) = log P(X | Θ) = Σ_i log P(x_i | Θ)
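
As a concrete illustration of the K-means objective described above, here is a minimal sketch (assuming numpy is available; the function name is ours, not from the slides):

```python
import numpy as np

def kmeans_objective(X, centroids):
    """Sum of squared Euclidean distances from each point to its closest centroid.
    X is (n_points, n_dims), centroids is (k, n_dims)."""
    # squared distance from every point to every centroid: shape (n_points, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# toy example with two obvious clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
print(kmeans_objective(X, centroids=np.array([[0.0, 0.1], [5.0, 5.0]])))
```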

5 Examples of objective functions. Clustering: Mixture of Gaussians: log likelihood of the data; Affinity propagation: sum of similarities with exemplars. Regression: Linear: sum of squared errors (aka residuals), aka log likelihood under Gaussian noise. Classification: Logistic regression: sum of log probabilities of the class under the logistic function; Fisher's: ratio of σ² within class vs. σ² between class

6 Overfitting [figure: error (i.e. model fit) vs. # of parameters or complexity, showing the generalization (test set) error and the training set error]. More complex models better fit the training data, but after some point, more complex models generalize worse to new data.

7 Dealing with overfitting by regularization. Basic idea: add a term to the objective function that penalizes the # of parameters or model complexity. Now: E(X, Θ) = datafit(X, Θ) - λ complexity(Θ). The hyperparameter λ controls the strength of regularization; it could have a natural value, or it may need to be set by cross-validation. Increases in model complexity need to be balanced by improved model fit (each unit of added complexity costs λ units of data fit).

8 Examples of regularization (N is the # of params, M is the # of datapoints). Clustering: Bayes IC: LL - 0.5 N log M; Akaike IC: LL - N. Regression: L1 (aka LASSO): LL - λ Σ_i |b_i|; L2 (aka Ridge): LL - λ Σ_i b_i^2; Elastic net: LL - λ_1 Σ_i |b_i| - λ_2 Σ_i b_i^2. Classification: Logistic regression: same as linear regression
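
A minimal sketch of the regularized regression objectives listed above (our own helper, assuming numpy; LL stands in for the data log likelihood):

```python
import numpy as np

def penalized_objective(log_lik, b, lam1=0.0, lam2=0.0):
    """Data fit minus complexity penalty.
    lam1 > 0, lam2 = 0 -> L1 / LASSO; lam1 = 0, lam2 > 0 -> L2 / ridge;
    both > 0 -> elastic net."""
    return log_lik - lam1 * np.abs(b).sum() - lam2 * (b ** 2).sum()

b = np.array([0.5, -2.0, 0.0, 3.0])
print(penalized_objective(log_lik=-10.0, b=b, lam1=1.0))   # LASSO
print(penalized_objective(log_lik=-10.0, b=b, lam2=1.0))   # ridge
```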

9 Linear classification. Can define the decision boundary with a vector of feature weights, b, and a threshold, t: the boundary is b_1 x_1 + b_2 x_2 = x^T b = t, and f(x) = x^T b is the discriminant function (or value). If x^T b > t, predict + and otherwise predict -; can trade off false positives versus false negatives by changing t
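
A small sketch of this thresholded linear discriminant (assuming numpy; the data and weights are made up for illustration):

```python
import numpy as np

def linear_classify(X, b, t):
    """Predict + (True) when the discriminant value x^T b exceeds the threshold t."""
    return X @ b > t

X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])
b = np.array([1.0, 1.0])               # feature weights
print(linear_classify(X, b, t=2.0))    # raising t trades false positives for false negatives
```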

10 Linear separability [figure: linearly separable vs. not linearly separable point sets]. Two classes are linearly separable if you can perfectly classify them with a linear decision boundary. Why is this important? Linear classifiers can only draw linear decision boundaries.

11 Non-linear classification. Non-linear classifiers have non-linear, and possibly discontinuous, decision boundaries

12 Non-linear decision boundaries. Set x_3 = (x_1 - m_1)^2 + (x_2 - m_2)^2 [figure: circular boundary x_3 = 1 centred at m]. Decision rule: predict + if x_3 < 1

13 Non-linear classification: Idea #1. Fit a linear classifier to non-linear functions of the input features. E.g., transform features (quadratic): {1, x_1, x_2} -> {1, x_1, x_2, x_1 x_2, x_1^2, x_2^2}. Now we can fit arbitrary quadratic decision boundaries by fitting a linear classifier to the transformed features. This is feasible for quadratic transforms, but what about other transforms (e.g. a power of 10)?
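
A sketch of Idea #1 using scikit-learn (an assumed dependency, not named in the slides): expand the features quadratically, then fit an ordinary linear classifier on the expanded features. The synthetic circular-boundary data is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# points inside a circle of radius 1 around (1, 1) are positive; not linearly separable
rng = np.random.default_rng(0)
X = rng.uniform(-1, 3, size=(200, 2))
y = (((X - 1.0) ** 2).sum(axis=1) < 1.0).astype(int)

# {1, x1, x2} -> {1, x1, x2, x1^2, x1*x2, x2^2}, then fit a *linear* classifier
quad = PolynomialFeatures(degree=2, include_bias=True)
clf = LogisticRegression(max_iter=1000).fit(quad.fit_transform(X), y)
print("training accuracy:", clf.score(quad.transform(X), y))
```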

14 Support Vector Machines Vladimir Vapnik 14

15 Picking the best decision boundary [figure: several candidate decision boundaries]. What's the best decision boundary to choose when more than one separates the data? That is, which one is most likely to generalize well to new datapoints?

16 LDA & Fisher solution (generative). Best decision boundary (in 2-d): parallel to the direction of greatest variance. This is the Bayes optimal classifier (if the mean and covariance are correct).

17 Unregularized logistic regression (discriminative). All solutions are equally good because each one perfectly discriminates the classes. (Note: this changes if there's regularization.)

18 Linear SVM solution: the maximum margin boundary [figure: margin on either side of the boundary]. Vladimir Vapnik

19 SVM decision boundaries [figure: maximum margin boundary and its margin]. Here, the maximum margin boundary is specified by three points, called the support vectors. Vladimir Vapnik

20 SVM objective function (where the targets are y_i = -1 or y_i = 1, x_i are the input features, and b are the feature weights). Primal objective function: E(b) = -C Σ_i ξ_i (data fit) - b^T b/2 (L2 regularizer), where y_i (b^T x_i) >= 1 - ξ_i for all i. ξ_i >= 0 is the degree of mis-classification under the hinge loss for datapoint i, and C > 0 is the hyperparameter balancing regularization and data fit.
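
A minimal sketch of the slacks and the primal objective as written on this slide (assuming numpy; the helper name, toy data, and weight vector are ours):

```python
import numpy as np

def slacks_and_objective(X, y, b, C=1.0):
    """Slack variables xi_i = max(0, 1 - y_i * b^T x_i) (the hinge loss), and the
    primal objective written as data fit minus the L2 regularizer, as on the slide."""
    xi = np.maximum(0.0, 1.0 - y * (X @ b))
    return xi, -C * xi.sum() - 0.5 * (b @ b)

X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1])
xi, E = slacks_and_objective(X, y, b=np.array([0.4, 0.4]))
print(xi, E)   # xi > 0 only for points inside the margin or misclassified
```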

21 Visualization of ξ_i [figure: a misclassified point and its slack ξ_i]. If the data are linearly separable, all values of ξ_i = 0. Vladimir Vapnik

22 Another SVM objective function. Dual: E(α) = Σ_i α_i - Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j / 2, under the constraints that Σ_i α_i y_i = 0 and 0 <= α_i <= C. Depends only on dot products! Then we can define the weights b = Σ_i α_i y_i x_i and the discriminant function b^T x = Σ_i α_i y_i x_i^T x. Data points i such that α_i > 0 are the support vectors

23 Dot products of transformed vectors as kernel functions. Let x = (x_1, x_2) and φ_2(x) = (√2 x_1 x_2, x_1^2, x_2^2). Here φ_2(x) [or φ_p(x)] is a vector-valued function that returns all possible powers of two [or p] of x_1 and x_2. Then φ_2(x)^T φ_2(z) = (x^T z)^2 = K(x, z), the kernel function, and in general φ_p(x)^T φ_p(z) = (x^T z)^p. Every kernel function K(x, z) that satisfies some simple conditions corresponds to a dot product of some transformation φ(x) (aka "projection function")
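
A quick numerical check of the identity on this slide, φ_2(x)^T φ_2(z) = (x^T z)^2, assuming numpy:

```python
import numpy as np

def phi2(v):
    """phi_2(x) = (sqrt(2) x1 x2, x1^2, x2^2) for a 2-d input, as on the slide."""
    return np.array([np.sqrt(2) * v[0] * v[1], v[0] ** 2, v[1] ** 2])

rng = np.random.default_rng(1)
x, z = rng.normal(size=2), rng.normal(size=2)
print(phi2(x) @ phi2(z))   # dot product in the transformed space ...
print((x @ z) ** 2)        # ... equals the kernel K(x, z) = (x^T z)^2
```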

24 Non-linear SVM objective function. Objective: E(α) = Σ_i α_i - Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)/2, under the constraints that Σ_i α_i y_i = 0 and 0 <= α_i <= C. Discriminant function: f(x) = Σ_i α_i y_i K(x_i, x), where α_i y_i plays the role of the feature weight w_i

25 Kernelization (FYI). Often, instead of explicitly writing out the non-linear feature set, one simply calculates a kernel function of the two input vectors, i.e., K(x_i, x_j) = φ(x_i)^T φ(x_j). Here's why: any kernel function K(x_i, x_j) that satisfies certain conditions, e.g., K(x_i, x_j) is a symmetric positive semi-definite function (which implies that the matrix K, where K_ij = K(x_i, x_j), is a symmetric positive semi-definite matrix), corresponds to a dot product K(x_i, x_j) = φ(x_i)^T φ(x_j) for some projection function φ(x). Often it's easy to think of defining a kernel function that captures some notion of similarity. Non-linear SVMs use the discriminant function f(x; w) = Σ_i w_i K(x_i, x)

26 Some popular kernel functions. K(x_i, x_j) = (x_i^T x_j + 1)^P: inhomogeneous polynomial kernel of degree P. K(x_i, x_j) = (x_i^T x_j)^P: homogeneous polynomial kernel. K(x_i, x_j) = exp{-(x_i - x_j)^T (x_i - x_j) / s^2}: Gaussian radial basis function kernel. K(x_i, x_j) = Pearson(x_i, x_j) + 1: "bioinformatics kernel". Also, you can make a kernel out of an interaction network!
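
A minimal sketch of these kernel functions (assuming numpy; the example vectors are made up):

```python
import numpy as np

def poly_kernel(xi, xj, P=2, inhomogeneous=True):
    """(xi^T xj + 1)^P, or (xi^T xj)^P if inhomogeneous=False."""
    return (xi @ xj + (1.0 if inhomogeneous else 0.0)) ** P

def rbf_kernel(xi, xj, s=1.0):
    """Gaussian RBF kernel exp{-(xi - xj)^T (xi - xj) / s^2}."""
    d = xi - xj
    return np.exp(-(d @ d) / s ** 2)

def pearson_kernel(xi, xj):
    """'Bioinformatics kernel': Pearson correlation plus 1 (values lie in [0, 2])."""
    return np.corrcoef(xi, xj)[0, 1] + 1.0

xi, xj = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.5, 2.0])
print(poly_kernel(xi, xj), rbf_kernel(xi, xj, s=2.0), pearson_kernel(xi, xj))
```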

27 Example: radial basis function kernels [figure: support vectors in the input space]. Φ(x) = {N(x; x_1, σ^2 I), N(x; x_2, σ^2 I), ..., N(x; x_N, σ^2 I)}, where N(x; m, Σ) is the probability density of x under a multivariate Gaussian with mean m and covariance Σ

28 SVM summary. A sparse linear classifier that uses as features kernel functions that measure similarity to each data point in the training set.

29 Neural Networks (aka multilayer perceptrons) [figure: inputs x_1, x_2 feeding hidden units h_1 ... h_4]. f(x, θ) = Σ_k v_k h_k(x), where each hidden unit is h_k(x) = σ(Σ_i w_ik x_i); often σ(x) = 1 / (1 + e^-x). v and W are the parameters. Notes: fitting the hidden units has become a lot easier using deep belief networks; one could also fit the means of RBFs for the hidden units
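
A minimal forward pass for the network described above (assuming numpy; the random weights simply illustrate the shapes, they are not fitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W, v):
    """f(x) = sum_k v_k * h_k(x) with h_k(x) = sigma(sum_i W[i, k] * x_i).
    W is (n_inputs, n_hidden), v is (n_hidden,)."""
    h = sigmoid(x @ W)   # hidden-unit activations
    return v @ h         # weighted sum of hidden units

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))   # 2 inputs, 4 hidden units (as in the figure)
v = rng.normal(size=4)
print(mlp_forward(np.array([0.5, -1.0]), W, v))
```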

30 K-Nearest Neighbour [figure: decision boundaries for K = 15 and K = 1]. Classify x in the same way that a majority of the K nearest neighbours to x are labeled. Can also use the # of nearest neighbours that are positive as a discriminant value. Smaller values of K lead to more complex classifiers; you can set K using (nested) cross-validation. Images from Elements of Statistical Learning, Hastie, Tibshirani, Friedman (available online)
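
A sketch of the K-NN discriminant value mentioned above, the number of positive labels among the K nearest neighbours (assuming numpy; the synthetic training data is made up):

```python
import numpy as np

def knn_discriminant(x, X_train, y_train, K=15):
    """Count how many of the K nearest neighbours of x are labelled positive;
    predict + when this count exceeds K / 2."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:K]
    return int((y_train[nearest] == 1).sum())

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train.sum(axis=1) > 0).astype(int)
votes = knn_discriminant(np.array([0.5, 0.5]), X_train, y_train, K=15)
print(votes, "positive votes ->", "+" if votes > 15 / 2 else "-")
```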

31 Decision trees. For each leaf: 1) choose a dimension to split along, and choose the best split using a split criterion (see below); 2) split along that dimension, creating two new leaves; return to (1) until the leaves are pure. To reduce overfitting, it is advisable to prune the tree by removing leaves with small numbers of examples. Split criteria: C4.5 (Quinlan 1993): expected decrease in (binary) entropy; CART (Breiman et al. 1984): expected decrease in Gini impurity (1 - sum of squared class probabilities)
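
A minimal sketch of the two split criteria named above, binary entropy (C4.5) and Gini impurity (CART), assuming numpy:

```python
import numpy as np

def entropy(labels):
    """Binary entropy of a leaf's class distribution (C4.5 split criterion)."""
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def gini(labels):
    """Gini impurity: 1 - sum of squared class probabilities (CART split criterion)."""
    p = np.bincount(labels, minlength=2) / len(labels)
    return 1.0 - (p ** 2).sum()

leaf = np.array([1, 1, 1, 0, 0, 1])   # labels of the examples reaching a leaf
print(entropy(leaf), gini(leaf))      # both are 0 for a pure leaf
```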

32 Decision trees [figure: first split, x_2 > a?, yes/no, partitioning the input space at x_2 = a]

33 Decision trees [figure: after x_2 > a?, the next split is x_2 > b?, yes/no, etc.]

34 Decision trees [figure: the tree x_2 > a?, then x_2 > b?, etc., and the corresponding axis-aligned partition of the input space]

35 Decision tree summary. Decision trees learn recursive splits of the data along individual features that partition the input space into homogeneous groups of data points with the same labels.

36 Ensemble classification. Combining multiple classifiers by training them separately and then averaging their predictions is a good way to avoid overfitting. Bagging (Breiman 1996): resample the training set, train separate classifiers on each sample, and have the classifiers vote for the classification. Boosting (e.g. AdaBoost, Freund and Schapire 1997): iteratively reweight training sets based on the errors of a weighted average of classifiers: train a classifier ("weak learner") to minimize the weighted error on the training set; weight the new classifier according to its prediction error and reweight the training set according to the prediction errors; repeat. Boosting minimizes the exponential loss on the training set over a convex set of functions

37 Bagging [figure: from the training set (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), draw bootstrap samples such as (x_1, y_1), (x_1, y_1), (x_3, y_3), ..., (x_{N-1}, y_{N-1}) and (x_1, y_1), (x_2, y_2), (x_4, y_4), ..., (x_N, y_N), and train a separate classifier f_1(x), f_2(x), ... on each (M samples in total)]. bagged(x) = Σ_j f_j(x)/M
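
A minimal bagging sketch following the diagram above, using small decision trees as the base classifiers (scikit-learn is an assumed dependency, and the synthetic data is made up for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag(X, y, M=25, seed=0):
    """Train M classifiers, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(M):
        idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
        models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
    return models

def bagged_predict(models, X):
    """Average the individual predictions: bagged(x) = sum_j f_j(x) / M."""
    return np.mean([m.predict(X) for m in models], axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
print(bagged_predict(bag(X, y), X[:5]))
```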

38 Boosting [figure: from the training set (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), train a classifier f_t(x), assess its performance, set a weight w_t, resample the hard (incorrectly classified) cases, train the next classifier, and repeat. Legend: (x_i, y_i) correct; (x_i, y_i) incorrect]. boost(x) = Σ_j w_j f_j(x)

39 Ensemble learning summary Bagging: classifiers trained separately (in parallel) on different bootstrapped samples of the training set, and make equally weighted votes Boosting: classifiers trained sequentially, weighted by performance, on samples of the training set that focus on hard examples. Final classification is based on weighted votes. 39

40 Random Forests (Breiman 2001). Construct M bootstrapped samples of the training set (of size N). For each sample, build a DT using CART (no pruning), but split optimally on randomly chosen features; the random feature choice reduces correlation among the trees, and this is a good thing. Since the bootstrap resamples the training set with replacement, it leaves out, on average, (1 - 1/N)^N x 100% of the examples (~ 100/e % ≈ 36.8%); you can use these out-of-bag samples to estimate performance. Bag the predictions (i.e. average them). You can assess the importance of features by evaluating the performance of trees containing those features
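
A short sketch of these ingredients, bootstrapping, random feature choice, and out-of-bag performance, using scikit-learn (an assumed dependency; the synthetic dataset stands in for a real training set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # M bootstrapped samples / trees
    max_features="sqrt",  # split on a random subset of features at each node
    oob_score=True,       # estimate performance from the out-of-bag samples
    random_state=0,
).fit(X, y)

print("out-of-bag accuracy:", rf.oob_score_)
print("feature importances:", rf.feature_importances_)
```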

41 Advantages of Decision Trees. Relatively easy to combine continuous and discrete (ordinal or categorical) features within the same classifier; this is harder to do for other methods. Random Forests can be trained in parallel (unlike boosted decision trees), and relatively quickly. Random Forests tend to do quite well in empirical tests (but see the No Free Lunch theorem)

42 Networks as kernels. Can use matrix representations of graphs to generate kernels. One popular graph-based kernel is the diffusion kernel: K = (λI + L)^{-1}, where L = D - W, D is a diagonal matrix holding the row sums, and W is the matrix representation of the graph. GeneMANIA label propagation: f = (λI + L)^{-1} y
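
A minimal sketch of this graph kernel and label propagation, assuming numpy and assuming the kernel is the regularized Laplacian (λI + L)^{-1} as reconstructed above; the small adjacency matrix and labels are made up for illustration:

```python
import numpy as np

# adjacency matrix W of a small undirected graph (illustrative example)
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))   # diagonal matrix of row sums
L = D - W                    # graph Laplacian
lam = 1.0

K = np.linalg.inv(lam * np.eye(len(W)) + L)   # kernel K = (lambda*I + L)^{-1}

y = np.array([1.0, 0.0, 0.0, -1.0])   # +1 / -1 labels, 0 for unlabelled nodes
f = K @ y                              # label-propagation scores f = (lambda*I + L)^{-1} y
print(f)
```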
