Class #6: Nonlinear classification. ML4Bio 2012, February 17th, 2012. Quaid Morris


3 Overview: Review; Linear separability; Nonlinear classification; Linear Support Vector Machines (SVMs); Picking the best linear decision boundary; Nonlinear SVMs; Decision Trees; Ensemble methods: Bagging and Boosting
4 Objective functions. E(X, Θ) is a function of the data (X) and the parameters (Θ). It measures the fit between a model and the data. E.g., the K-means objective function is the sum of the squared Euclidean distance from each data point to the closest mean (or "centroid"); the parameters are the means. When fitting a statistical model, we use the log likelihood: E(X, Θ) = log P(X | Θ) = Σ_i log P(x_i | Θ)
5 Examples of objective functions. Clustering: Mixture of Gaussians: log likelihood of the data; Affinity propagation: sum of similarities with exemplars. Regression: Linear: sum of squared errors (aka residuals), aka log likelihood under Gaussian noise. Classification: Logistic regression: sum of log probabilities of the class under the logistic function; Fisher's: ratio of σ² within-class vs σ² between-class
6 Overfitting. [Figure: error (i.e. model fit) vs. # of parameters or complexity; the training set error keeps decreasing while the generalization (test set) error eventually rises.] More complex models better fit the training data. But after some point, more complex models generalize worse to new data.
7 Dealing with overfitting by regularization. Basic idea: add a term to the objective function that penalizes the # of parameters or model complexity. Now: E(X, Θ) = datafit(X, Θ) − λ complexity(Θ). The hyperparameter λ controls the strength of regularization; it may have a natural value, or you may need to set it by cross-validation. Increases in model complexity need to be balanced by improved model fit (each unit of added complexity costs λ units of data fit).
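A minimal numeric sketch of this trade-off. The data-fit term (negative sum of squared errors) and the L2 complexity penalty below are illustrative choices, not prescribed by the slide:

```python
import numpy as np

def regularized_objective(X, y, b, lam):
    """Regularized objective E = datafit - lam * complexity.
    Illustrative choice: data fit is the negative sum of squared
    errors, complexity is the squared L2 norm of the weights."""
    datafit = -np.sum((X @ b - y) ** 2)
    complexity = np.sum(b ** 2)
    return datafit - lam * complexity

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
b = np.array([1.0, 2.0])
# Perfect fit: datafit = 0, so the objective is -lam * (1 + 4).
print(regularized_objective(X, y, b, lam=0.1))  # -0.5
```

Raising λ makes the same weights cost more, which is exactly the "each unit of added complexity costs λ units of data fit" balance described above.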
8 Examples of regularization (N is # of parameters, M is # of datapoints). Clustering: Bayes IC (BIC): LL − 0.5 N log M; Akaike IC (AIC): LL − N. Regression: L1 (aka LASSO): LL − λ Σ_i |b_i|; L2 (aka Ridge): LL − λ Σ_i b_i²; Elastic net: LL − λ_1 Σ_i |b_i| − λ_2 Σ_i b_i². Classification: Logistic regression: same as linear regression.
9 Linear classification. Decision boundary: b_1 x_1 + b_2 x_2 = x^T b = t. f(x) = x^T b is the discriminant function (or value). We can define the decision boundary with a vector of feature weights, b, and a threshold, t: if x^T b > t, predict +, and otherwise predict −. We can trade off false positives versus false negatives by changing t.
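The decision rule above can be written directly as code; the weights and threshold below are made up for illustration:

```python
import numpy as np

def predict(x, b, t):
    """Predict +1 if the discriminant value x^T b exceeds the
    threshold t, else -1. Raising t trades false positives for
    false negatives."""
    return 1 if x @ b > t else -1

b = np.array([2.0, -1.0])
print(predict(np.array([3.0, 1.0]), b, t=4.0))  # 1  (discriminant 5 > 4)
print(predict(np.array([1.0, 1.0]), b, t=4.0))  # -1 (discriminant 1 <= 4)
```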
10 Linear separability. [Figure: linearly separable vs. not linearly separable point sets in the (x_1, x_2) plane.] Two classes are linearly separable if you can perfectly classify them with a linear decision boundary. Why is this important? Linear classifiers can only draw linear decision boundaries.
11 Nonlinear classification. Nonlinear classifiers have nonlinear, and possibly discontinuous, decision boundaries.
12 Nonlinear decision boundaries. Set x_3 = (x_1 − m_1)² + (x_2 − m_2)². [Figure: the circle x_3 = 1 around m in the (x_1, x_2) plane.] Decision rule: predict + if x_3 < 1, i.e., if x lies inside the circle.
13 Nonlinear classification: Idea #1. Fit a linear classifier to nonlinear functions of the input features. E.g., transform features (quadratic): {1, x_1, x_2} → {1, x_1, x_2, x_1 x_2, x_1², x_2²}. Now we can fit arbitrary quadratic decision boundaries by fitting a linear classifier to the transformed features. This is feasible for quadratic transforms, but what about other transforms (e.g., powers of 10)?
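A sketch of the quadratic feature map from the slide, showing that a circular boundary becomes linear in the transformed space (the circle and weights are invented for illustration):

```python
import numpy as np

def quadratic_features(x1, x2):
    """Map {1, x1, x2} to the quadratic feature set from the slide:
    {1, x1, x2, x1*x2, x1^2, x2^2}."""
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# The circular rule 1 - (x1-1)^2 - (x2-1)^2 > 0 expands to
# -1 + 2*x1 + 2*x2 + 0*x1*x2 - x1^2 - x2^2 > 0, which is linear
# in the transformed features:
w = np.array([-1.0, 2.0, 2.0, 0.0, -1.0, -1.0])
print(w @ quadratic_features(1.0, 1.0) > 0)  # True: inside the circle
print(w @ quadratic_features(3.0, 3.0) > 0)  # False: outside
```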
14 Support Vector Machines (Vladimir Vapnik)
15 Picking the best decision boundary. What's the best decision boundary to choose when there is more than one? That is, which one is most likely to generalize well to new datapoints?
16 LDA & Fisher solution (generative). Best decision boundary (in 2D): parallel to the direction of greatest variance. This is the Bayes optimal classifier (if the mean and covariance are correct).
17 Unregularized logistic regression (discriminative). All solutions are equally good because each one perfectly discriminates the classes. (Note: this changes if there's regularization.)
18 Linear SVM solution: the maximum margin boundary. (Vladimir Vapnik)
19 SVM decision boundaries. Here, the maximum margin boundary is specified by three points, called the support vectors.
20 SVM objective function (where the targets are y_i = +1 or y_i = −1, x_i are the input features, and b are the feature weights). Primal objective function: E(b) = C Σ_i ξ_i + b^T b / 2, subject to y_i (b^T x_i) ≥ 1 − ξ_i for all i. Here ξ_i ≥ 0 is the degree of misclassification under the hinge loss for datapoint i, C Σ_i ξ_i is the data-fit term, b^T b / 2 is the L2 regularizer, and C > 0 is the hyperparameter balancing regularization and data fit.
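A minimal sketch of evaluating this primal objective, with the slacks ξ_i computed as the hinge loss max(0, 1 − y_i b^T x_i); the data and weights are invented for illustration:

```python
import numpy as np

def svm_primal_objective(b, X, y, C):
    """Soft-margin SVM primal objective:
    E(b) = C * sum_i xi_i + b^T b / 2, where the slack
    xi_i = max(0, 1 - y_i * (b^T x_i)) is the hinge loss."""
    margins = y * (X @ b)
    xi = np.maximum(0.0, 1.0 - margins)  # slack for each datapoint
    return C * xi.sum() + 0.5 * (b @ b)

X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])
b = np.array([1.0, 0.0])
# Margins: 2, 2, 0.5 -> slacks 0, 0, 0.5; E = 1*0.5 + 0.5 = 1.0
print(svm_primal_objective(b, X, y, C=1.0))  # 1.0
```

Only the third point (inside the margin) contributes slack; the first two are classified with margin at least 1, so their ξ_i = 0 as the next slide illustrates.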
21 Visualization of ξ_i. [Figure: a misclassified point lies a distance ξ_i beyond the margin.] If the classes are linearly separable, all values of ξ_i = 0. (Vladimir Vapnik)
22 Another SVM objective function. Dual: E(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j, under the constraints that Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C. It depends only on dot products! Then we can define the weights b = Σ_i α_i y_i x_i and the discriminant function b^T x = Σ_i α_i y_i x_i^T x. Datapoints i such that α_i > 0 are the support vectors.
23 Dot products of transformed vectors as kernel functions. Let x = (x_1, x_2) and φ_2(x) = (√2 x_1 x_2, x_1², x_2²). Here φ_2(x) [or φ_p(x)] is a vector-valued function that returns all possible degree-two [or degree-p] products of x_1 and x_2. Then φ_2(x)^T φ_2(z) = (x^T z)² = K(x, z), a kernel function, and in general φ_p(x)^T φ_p(z) = (x^T z)^p. Every kernel function K(x, z) that satisfies some simple conditions corresponds to a dot product of some transformation φ(x) (aka "projection function").
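The identity φ_2(x)^T φ_2(z) = (x^T z)² can be checked numerically (the vectors x and z are arbitrary illustrative values):

```python
import numpy as np

def phi2(x):
    """Degree-2 feature map from the slide for x = (x1, x2):
    phi2(x) = (sqrt(2)*x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([np.sqrt(2) * x1 * x2, x1 ** 2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])
# The dot product of the transformed vectors equals the kernel (x^T z)^2.
print(phi2(x) @ phi2(z))  # 25.0
print((x @ z) ** 2)       # 25.0
```

The kernel evaluation never forms φ explicitly, which is the point: the transformed space can be high- (or infinite-) dimensional while K(x, z) stays cheap.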
24 Nonlinear SVM objective function. Objective: E(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), under the constraints that Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C. Discriminant function: f(x) = Σ_i α_i y_i K(x_i, x), where α_i y_i acts as the feature weight.
25 Kernelization (FYI). Often, instead of explicitly writing out the nonlinear feature set, one simply calculates a kernel function of the two input vectors, i.e., K(x_i, x_j) = φ(x_i)^T φ(x_j). Here's why: any kernel function K(x_i, x_j) that satisfies certain conditions, e.g., that K(x_i, x_j) is a symmetric positive semidefinite function (which implies that the matrix K, where K_ij = K(x_i, x_j), is a symmetric positive semidefinite matrix), corresponds to a dot product K(x_i, x_j) = φ(x_i)^T φ(x_j) for some projection function φ(x). Often it's easy to define a kernel function that captures some notion of similarity. Nonlinear SVMs use the discriminant function f(x; w) = Σ_i w_i K(x_i, x).
26 Some popular kernel functions. K(x_i, x_j) = (x_i^T x_j + 1)^P: inhomogeneous polynomial kernel of degree P. K(x_i, x_j) = (x_i^T x_j)^P: homogeneous polynomial kernel. K(x_i, x_j) = exp{−(x_i − x_j)^T (x_i − x_j) / s²}: Gaussian radial basis function kernel. K(x_i, x_j) = Pearson(x_i, x_j) + 1: "bioinformatics" kernel. Also, one can make a kernel out of an interaction network!
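The Gaussian RBF kernel above is a one-liner; the test points and bandwidth s below are arbitrary illustrative values:

```python
import numpy as np

def rbf_kernel(xi, xj, s):
    """Gaussian radial basis function kernel from the slide:
    K(xi, xj) = exp(-(xi - xj)^T (xi - xj) / s^2)."""
    d = xi - xj
    return np.exp(-(d @ d) / s ** 2)

xi = np.array([0.0, 0.0])
xj = np.array([1.0, 1.0])
print(rbf_kernel(xi, xi, s=1.0))  # 1.0: identical points are maximally similar
print(rbf_kernel(xi, xj, s=1.0))  # exp(-2), decaying with squared distance
```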
27 Example: radial basis function kernels. [Figure: support vectors in the (x_1, x_2) plane.] Φ(x) = (N(x; x_1, σ²I), N(x; x_2, σ²I), ..., N(x; x_N, σ²I)), where N(x; m, Σ) is the probability density of x under a multivariate Gaussian with mean m and covariance Σ.
28 SVM summary. A sparse linear classifier that uses, as features, kernel functions that measure similarity to each data point in the training set.
29 Neural Networks (aka multi-layer perceptrons). f(x, θ) = Σ_k v_k h_k(x), where each "hidden unit" is h_k(x) = σ(Σ_i w_ik x_i). Often σ(x) = 1 / (1 + e^{−x}). v and W are the parameters. Notes: fitting the hidden units has become a lot easier using deep belief networks; one could also fit the means of RBFs for the hidden units.
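A minimal forward pass for this one-hidden-layer network; the input, weights, and output weights are chosen purely for illustration (zero hidden weights make every sigmoid output 0.5, so the result is easy to check by hand):

```python
import numpy as np

def mlp_forward(x, W, v):
    """One-hidden-layer network from the slide:
    f(x) = sum_k v_k * h_k(x), with h_k(x) = sigmoid(sum_i W[i,k] * x_i)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sigmoid(x @ W)  # hidden unit activations
    return v @ h

x = np.array([1.0, -1.0])
W = np.zeros((2, 3))         # zero weights -> every h_k = sigmoid(0) = 0.5
v = np.array([1.0, 1.0, 2.0])
print(mlp_forward(x, W, v))  # 2.0 = (1 + 1 + 2) * 0.5
```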
30 K-Nearest Neighbour. Classify x in the same way that a majority of the K nearest neighbours to x are labeled. Can also use the # of nearest neighbours that are positive as a discriminant value. Smaller values of K lead to more complex classifiers; you can set K using (nested) cross-validation. [Images showing K = 15 and K = 1 decision boundaries, from The Elements of Statistical Learning, Hastie, Tibshirani, Friedman (available online).]
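A sketch of K-nearest-neighbour majority voting with ±1 labels (the tiny training set is invented; the sign of the mean neighbour label gives the vote, and the mean itself can serve as the discriminant value mentioned above):

```python
import numpy as np

def knn_predict(x, X_train, y_train, k):
    """Classify x by majority vote among the k nearest training
    points (Euclidean distance). Labels are +1 / -1, so the sign
    of the mean neighbour label is the majority vote."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.sign(y_train[nearest].mean()))

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([1, 1, -1])
print(knn_predict(np.array([0.0, 0.5]), X_train, y_train, k=3))  # 1
print(knn_predict(np.array([5.0, 5.0]), X_train, y_train, k=1))  # -1
```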
31 Decision trees. For each leaf: (1) choose a dimension to split along, and choose the best split using a split criterion (see below); (2) split along that dimension, creating two new leaves; return to (1) until the leaves are pure. To reduce overfitting, it is advisable to prune the tree by removing leaves with small numbers of examples. Split criteria: C4.5 (Quinlan 1993): expected decrease in (binary) entropy; CART (Breiman et al. 1984): expected decrease in Gini impurity (1 − sum of squared class probabilities).
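The CART criterion's ingredient, Gini impurity, is straightforward to compute; the label lists below are illustrative:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity from the slide: 1 - sum of squared class
    probabilities. Zero for a pure leaf, maximal when classes
    are evenly mixed."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([1, 1, 1, 1]))  # 0.0  (pure leaf)
print(gini_impurity([1, 1, 0, 0]))  # 0.5  (maximally mixed, 2 classes)
```

A split is scored by the expected decrease in this impurity, i.e., the parent's impurity minus the size-weighted average impurity of the two children.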
32 Decision trees. [Figures (slides 32–34): the (x_1, x_2) plane is recursively partitioned. First split: x_2 > a? (yes/no); within one branch, a further split: x_2 > b? (yes/no); etc.]
35 Decision tree summary. Decision trees learn recursive splits of the data along individual features that partition the input space into homogeneous groups of data points with the same labels.
36 Ensemble classification. Combining multiple classifiers, by training them separately and then averaging their predictions, is a good way to avoid overfitting. Bagging (Breiman 1996): resample the training set, train separate classifiers on each sample, and have the classifiers vote for the classification. Boosting (e.g., AdaBoost, Freund and Schapire 1997): iteratively reweight the training set based on the errors of a weighted average of classifiers: train a classifier ("weak learner") to minimize the weighted error on the training set; weight the new classifier according to its prediction error; reweight the training set according to the prediction error; repeat. Boosting minimizes the exponential loss on the training set over a convex set of functions.
37 Bagging. [Figure: from the training set (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), draw M bootstrap samples, e.g., (x_1, y_1), (x_1, y_1), (x_3, y_3), ..., and train a separate classifier f_j(x) on each.] The bagged prediction is bagged(x) = Σ_j f_j(x) / M.
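The averaging step can be sketched as follows; the three threshold-rule "classifiers" stand in for models trained on different bootstrap samples and are invented for illustration:

```python
import numpy as np

def bag_predictions(x, classifiers):
    """Bagged prediction from the slide: the average of the
    individual classifiers' outputs, bagged(x) = sum_j f_j(x) / M."""
    return sum(f(x) for f in classifiers) / len(classifiers)

# Hypothetical weak classifiers: simple threshold rules on one feature,
# standing in for trees trained on different bootstrap samples.
classifiers = [
    lambda x: 1.0 if x[0] > 0 else -1.0,
    lambda x: 1.0 if x[0] > 1 else -1.0,
    lambda x: 1.0 if x[0] > -1 else -1.0,
]
print(bag_predictions(np.array([0.5]), classifiers))  # 1/3: two vote +1, one -1
```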
38 Boosting. [Figure: train a classifier f_1(x) on the training set; assess its performance; resample the hard (incorrectly classified) cases and set a weight w_t; train the next classifier f_t(x); assess performance again, and repeat. Legend: (x_i, y_i) correct / incorrect.] The boosted prediction is boost(x) = Σ_j w_j f_j(x).
39 Ensemble learning summary. Bagging: classifiers are trained separately (in parallel) on different bootstrapped samples of the training set, and make equally weighted votes. Boosting: classifiers are trained sequentially, weighted by performance, on samples of the training set that focus on hard examples; the final classification is based on weighted votes.
40 Random Forests (Breiman 2001). Construct M bootstrapped samples of the training set (of size N). For each sample, build a decision tree using CART (no pruning), but split optimally on randomly chosen features; the random feature choice reduces correlation among the trees, which is a good thing. Since the bootstrap resamples the training set with replacement, it leaves out, on average, (1 − 1/N)^N × 100% (≈ 100/e % ≈ 36.8%) of the examples; these "out-of-bag" samples can be used to estimate performance. Bag the predictions (i.e., average them). The importance of features can be assessed by evaluating the performance of trees containing those features.
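The out-of-bag fraction (1 − 1/N)^N converges quickly to 1/e, which can be checked numerically:

```python
import numpy as np

# Fraction of training examples left out of a bootstrap sample of
# size N drawn with replacement: (1 - 1/N)^N, approaching 1/e ~ 36.8%.
for N in [10, 100, 10000]:
    print(N, (1 - 1 / N) ** N)

print(np.exp(-1))  # the limiting out-of-bag fraction, ~0.3679
```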
41 Advantages of decision trees. It is relatively easy to combine continuous and discrete (ordinal or categorical) features within the same classifier; this is harder to do for other methods. Random Forests can be trained in parallel (unlike boosted decision trees), and relatively quickly. Random Forests tend to do quite well in empirical tests (but see the No Free Lunch theorem).
42 Networks as kernels. We can use matrix representations of graphs to generate kernels. One popular graph-based kernel is the diffusion kernel: K = (λI + L)^{−1}, where L = D − W, D is a diagonal matrix containing the row sums of W, and W is the matrix representation of the graph. GeneMANIA label propagation: f = (λI + L)^{−1} y.
A General Greedy Approximation Algorithm with Applications Tong Zhang IBM T.J. Watson esearch Center Yorktown Heights NY 10598 tzhang@watson.ibm.com Abstract Greedy approximation algorithms have been frequently
More informationCLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
More informationClassifiers & Classification
Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationIntroduction to Machine Learning
Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:
More informationKnowledge Engineering and Expert Systems
Knowledge Engineering and Expert Systems Lecture Notes on Machine Learning Matteo Mattecci matteucci@elet.polimi.it Department of Electronics and Information Politecnico di Milano Lecture Notes on Machine
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationMachine learning for cancer informatics and drug discovery
Machine learning for cancer informatics and drug discovery JeanPhilippe.Vert@mines.org Mines ParisTech / Institut Curie / INSERM U900 ENS Paris, October 13, 2010 Team s goal Develop new mathematical/computational
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar (modified for I211) Tan,Steinbach, Kumar
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation
Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1/2/2009 1 Classification: Definition Given a collection of records (training
More informationMachine Learning!!!!!Srihari. Kernel Methods! Sargur Srihari!
Kernel Methods! Sargur Srihari! 1 Topics in Kernel Methods! 1. Kernel Methods vs Linear Models/Neural Networks! 2. Stored Sample Methods! 3. Kernel Functions! 4. Dual Representations! 5. Constructing Kernels!
More informationPrototype Methods: KMeans
Prototype Methods: KMeans Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Prototype Methods Essentially modelfree. Not useful for understanding the nature of the
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationCS570 Data Mining Classification: Ensemble Methods
CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of HanKamberPei, Tan et al., and Li Xiong Günay (Emory) Classification:
More informationClassification Lecture 1: Basics, Methods
Classification Lecture 1: Basics, Methods Jing Gao SUNY Buffalo 1 Outline Basics Problem, goal, evaluation Methods Nearest Neighbor Decision Tree Naïve Bayes Rulebased Classification Logistic Regression
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19  Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19  Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.standrews.ac.uk twk@standrews.ac.uk Tom Kelsey ID505919B &
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationIntroduction to Fuzzy Logic Control
Introduction to Fuzzy Logic Control 1 Outline General Definition Applications Operations Rules Fuzzy Logic Toolbox FIS Editor Tipping Problem: Fuzzy Approach Defining Inputs & Outputs Defining MFs Defining
More information