Class #6: Nonlinear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris


 Meredith Butler
 2 years ago
 Views:
Transcription
1 Class #6: Nonlinear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1
2 Module #: Title of Module 2
3 Review Overview Linear separability Nonlinear classification Linear Support Vector Machines (SVMs) Picking the best linear decision boundary Nonlinear SVMs Decision Trees Ensemble methods: Bagging and Boosting 3
4 Objective functions E(X, Θ)  function of the data (X) and the parameters (Θ) They measure data fit between a model and the data, e.g.: The Kmeans objective function is the sum of the squared Euclidean distance from each data point to the closest mean (or centroid ), parameters are the mean When fitting a statistical model, we use: E(X, Θ) = log P(X Θ) = Σ i log P(Xi Θ) 4
5 Examples of objective functions Clustering: Cuse Mix of Gaussians: log likelihood of data Affinity propagation: sum of similarities with exemplars Regression: Linear: Sum of squared errors (aka residuals), aka log likelihood under Gaussian noise Classification: Logistic regression: Sum of log probability of class under logistic function Fisher s: ratio of σ 2 within class vs σ 2 between class 5
6 Overfitting 1 mod del fit) Err ror (i.e. Generalization (test set) error Training set error # of parameters or complexity More complex models better fit the training data. But after some point, more complex models generalize worse to new data. 6
7 Dealing with overfitting by regularization Basic idea: add a term to the objective function that penalizes # of parameters or model complexity, Now: E(X, Θ) = datafit(x, Θ) λ complexity(θ) Hyperparameter parameter λ controls the strength of regularization could have a natural value, or need to set this by crossvalidation, Increases in model complexity need to be balanced by improved model fit. (each unit of added complexity costs λ units of data fit) 7
8 Examples of regularization Clustering: (N is # params, a M is # datapoints) apo Bayes IC: LL 0.5 N log M Akaike IC: LL N Regression: L1 (aka LASSO): LL λ Σ i b i L2 (aka Ridge): LL λ Σ i b i 2 Elastic net: LL λ 1 Σ i b i  λ 2 Σ i b i 2 Classification: Logistic regression: Same as linear regression 8
9 Decision boundary: b +b = T 1 x 1 2 x 2 x b = t x 1 f(x) = x T b discriminant function (or value) Linear classification b Can define decision boundary with a vector of feature weights, b, and a threshold, t, if x T b > t, predict + and otherwise predict , can trade off false positives versus false negatives by changing t x 2
10 Linear separability Linearly separable Not linearly separable x 1 x 1 x 2 x 2 Two classes are linearly separable, if you can perfectly classify them with a linear decision boundary. Why is this important? Linear classifiers can only draw linear decision i boundaries.
11 Nonlinear classification x 1 x 2 Nonlinear classifiers have nonlinear, and possibly discontinous decision boundaries 11
12 Nonlinear decision boundaries Set x 3 = (x 1 m 1 ) 2 +(x 2 m 2 ) 2 x 3 = 1 x 1 m Decision rule: Predict if x 3 < 1 x 2 12
13 Nonlinear classification: Idea #1 Fit a linear classifier to nonlinear functions of the input features. E.g.: Transform features (quadratic): {1, x 1, x 2 } {1, x 1, x 2, x 1 x 2, x 12, x 22 } Now can fit arbitrary quadratic decision boundaries by fitting a linear classifier to transformed features. This is feasible for quadratic transforms, but what about other transforms ( power of 10 )? 13
14 Support Vector Machines Vladimir Vapnik 14
15 Picking the best decision boundary Decision boundaries x 1 What s the best decision boundary to choose when one? That is, what is the one most likely to generalize well to new datapoints? x 2
16 LDA & Fisher sol n (generative) Best decision boundary y( (in 2d): parallel to the direction of greatest variance. x 1 This is the Bayes optimal classifier (if mean and covariance are correct). x 2
17 Unregularized logistic regression (discriminative) x 1 All solutions are equally good because each one perfectly discriminates the classes. (note: this changes if there s regularization) x 2 17
18 Margin Linear SVM solution Maximum margin boundary. x 1 x 2 Vladimir Vapnik 18
19 Margin SVM decision boundaries Here, the maximum margin boundary is specified by three points, called the support vectors. x 1 Support vectors x 2 Vladimir Vapnik 19
20 SVM objective function (Where targets: y i = 1 or y i = 1, x i input features, b are feature weights) Primal objective function: E(b) = C Σ i ξ i b T b/2 where y i (b T x i ) > 1ξ i for all i d a t a L 2 r ξ i >= 0 is degree eof misclassification under g hinge loss f for datapoint i, C>0 is the i hyperparameter eter balancing regularization and data t fit. 20
21 Visualization of ξ i Misclassified point ξ i If linearly separable, all values of ξ i = 0 x 1 Vladimir Vapnik x 2 21
22 Another SVM objective function Dual : E(α) = Σ i α i  Σ i Σ j α i α j y i y j x it x j /2 Under the constraints that: Σ i α i y i = 0 and 0 α i C Depends only on dot products!!! Then can define weights b = Σ i α i y i x i and discriminant function: b T x = Σ i α i y i x it x Data i such that α i > 0 are support vectors 22
23 Dot products of transformed vector as kernel functions Let x = (x () 2 2 1, x 2 ) and 2 (x) = ( 2x 1 x 2, x 12, x 22 ) Here 2 (x) [or p (x)] is a vector valued function that returns all possible powers of two [or p] of x 1 and x 2 Then 2 (x) T 2 (z) = (x T z) 2 = K(x,z) kernel function And in general, p (x) T p (z) = (x T z) p Every kernel function K(x,z) that satisfies some simple conditions corresponds to a dot product of some transformation (x) (aka projection function ) 23
24 Nonlinear SVM objective function Objective: E(α) = Σ i α i  Σ i Σ j α i α j y i y j K(x i,x j )/2 Under the constraints that: Σ i α i y i = 0 and 0 α i C Discriminant function: f(x) = Σ i α i y i K(x i,x) w i feature weight 24
25 Kernelization (FYI) Often instead of explicitly writing out the nonlinear feature set, one simply calculates l a kernel function of the two input vectors, i.e., K(x i, x j ) = φ(x i ) T φ(x j ) Here s why: any kernel function K(x i, x j ) that satisfies certain conditions, e.g., K(x i, x j ) is a symmetric positive semidefinite function (which implies that the matrix K, where K ij = K(x i, x j ) is a symmetric positive semidefinite matrix), corresponds to a dot product K(x i, x j ) = φ(x i ) T φ(x j ) for some projection function φ(x). Often it s easy to think of defining a kernel function that captures some notation of similarity Nonlinear SVMs use the discriminant function f(x; w) =Σ i w i K(x i,x)
26 Some popular kernel functions K(x i, x j ) = (x it x j + 1) P Inhomogeneous Polynomial kernel of degree P K(x i, x j ) = (x it x j ) P Homogeneous Polynomial kernel K(x i, x j ) = exp{(x i x j ) T (x i x j )/ s 2 } Gaussian radial basis function kernel K(x i, x j ) = Pearson(x i,x j ) + 1 Bioinformatics kernel A Also, can make a kernel out of an interaction network!
27 Example: radial basis function kernels support vectors x 1 x 2 Φ(x) = {N(x; x 1, σ 2 I), N(x; x 2, σ 2 I),, N(x; x 2, σ 2 I)} Where N(x; m, Σ) is the probability density of x under a multivariate Gaussian with mean m, and covariance Σ
28 SVM summary A sparse linear classifier that uses as features kernel functions that measure similarity to each data point in the training. 28
29 Neural Networks (aka multilayer perceptrons) h 4 f(x,θ) = Σ k v k h k (x) Where hidden unit : h k (x) = σ(σ i w ik x i ) x 1 Often: σ(x) = 1 / (1 + e x ) h 1 v and W are parameters x 2 h 2 h 3 Notes: Fitting the hidden units has become a lot easier using deep belief networks Could also fit means of RBF for hidden units
30 KNearest Neighbour K = 15 Classify x in the same way that a majority of the K nearest neighbours to x are labeled. Can also used # of nearest neighbors that are positive as a discriminant value. K = 1 Smaller values of K lead to more complex classifiers, you can set K using (nested) crossvalidation. Images from Elements of Stat Learning, Hastie, Tibshirani, Efron (available online)
31 x 1 Decision trees For each leaf: 1) Choose a dimension to split along, and choose best split using a split criterion (see below) 2) Split along dimension, creating two new leaves, return to (1) until leaves are pure. To reduce overfitting, it is advisable to prune the tree by removing leaves with small numbers of examples x 2 Split criterion: C4.5 (Quinlan 1993): Expected decrease in (binary) entropy CART (Breiman et al 1984): Expected decrease in Gini impurity (1sum of squared class probabilities)
32 Decision trees x 2 > a? yes no x 1 x 2 a
33 Decision trees x 2 > a? yes no x 1 x 2 > b? yes no etc b x 2 a
34 Decision trees x 2 > a? yes no x 1 x 2 > b? yes no etc b x 2 a
35 Decision tree summary Decision trees learn a recursive splits of the data along individual features that partition the input space into homogeneous groups of data points with the same labels. l 35
36 Ensemble classification Combining multiple classifiers together by training them separately and then averaging their predictions is a good way to avoid overfitting. Bagging (Breiman 1996): Resample training set, train separate classifiers on each sample, have the classifiers vote for the classification Boosting (e.g. Adaboost, Freund and Schapire 1997): Iteratively reweight training sets based on errors of a weighted average of classifiers: Train classifier ( weak learner ) to minimize weighted error on training set Weight new classifier according to prediction error, reweight training i set according to prediction error Repeat Minimizes exponential loss on training set over a convex set of functions
37 (x 1, y 1 ), (x 2, y 2 ),, (x N, y N ) Bootstrap samples Bagging Train separate classifiers (x 1, y 1 ), (x 1, y 1 ), (x 3, y 3 ),, (x N1, y N1 ) f 1 (x) (x 1, y 1 ), (x 2, y 2 ), (x 4, y 4 ),, (x N, y N ) f 2 (x) etc (M samples in total) bagged(x) = Σ j f j (x)/m 37
38 Boosting (x 1, y 1 ), (x 2, y 2 ),, (x N, y N ) (x 2, y 2 ), (x 2, y 2 ),, (x N1 1, y N1 ) Assess performance f 1 (x) Train classifier Resample hard cases, Set w t f t (x) Train classifier Assess performance (x 1, y 1 ), (x 2, y 2 ),, (x N, y N ) (x 2, y 2 ), (x 2, y 2 ),, (x N1, y N1 ) Legend (x i, y i )  correct (x i, y i )  incorrect boost(x) = Σ j w j f j (x) 38
39 Ensemble learning summary Bagging: classifiers trained separately (in parallel) on different bootstrapped samples of the training set, and make equally weighted votes Boosting: classifiers trained sequentially, weighted by performance, on samples of the training set that focus on hard examples. Final classification is based on weighted votes. 39
40 Random Forests (Breiman 2001) Construct M bootstrapped samples of the training set (of size N) For each sample, build DT using CART (no pruning), but split optimally on randomly chosen features random feature choice reduces correlation among trees, this is a good thing. Since bootstrap resamples training set with replacement, leaves out, on average, (11/N) N x 100% of the examples (~100/e% = 36.7%), can use these outofbag samples to estimate performance Bag predictions (i.e. average them) Can assess importance of features by evaluating performance of trees containing those features
41 Advantages of Decision Trees Relatively easy to combine continuous and discrete (ordinal or categorical) features within the same classifier, this is harder to do for other methods Random Forests can be trained in parallel (unlike boosted decision trees), and relatively quickly. Random Forests tend to do quite well in empirical tests (but see No Free Lunch theorem)
42 Networks as kernels Can use matrix representations of graphs to generate kernels. One popular graphbased kernel is the diffusion kernel: K =(λi L) 1 Where L = D W, D is a diagonal matrix with the row sums, and W is the matrix representation of the graph. GeneMANIA label propagation: f = (λi L) 1 y
A Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Largemargin linear
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model
10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multiclass classification.
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 20092010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationIntroduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011
Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationSupport Vector Machine (SVM)
Support Vector Machine (SVM) CE725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept HardMargin SVM SoftMargin SVM Dual Problems of HardMargin
More informationSupport Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationArtificial Neural Networks and Support Vector Machines. CS 486/686: Introduction to Artificial Intelligence
Artificial Neural Networks and Support Vector Machines CS 486/686: Introduction to Artificial Intelligence 1 Outline What is a Neural Network?  Perceptron learners  Multilayer networks What is a Support
More informationCLASSIFICATION AND CLUSTERING. Anveshi Charuvaka
CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training
More informationAnalysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationData Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a
More informationC19 Machine Learning
C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
More informationSupport Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
More informationClassifiers & Classification
Classifiers & Classification Forsyth & Ponce Computer Vision A Modern Approach chapter 22 Pattern Classification Duda, Hart and Stork School of Computer Science & Statistics Trinity College Dublin Dublin
More informationTrees and Random Forests
Trees and Random Forests Adele Cutler Professor, Mathematics and Statistics Utah State University This research is partially supported by NIH 1R15AG03739201 Cache Valley, Utah Utah State University Leo
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationAn Introduction to Data Mining
An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
More informationDecompose Error Rate into components, some of which can be measured on unlabeled data
BiasVariance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data BiasVariance Decomposition for Regression BiasVariance Decomposition for Classification BiasVariance
More informationLinear Models for Classification
Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci
More informationNew Work Item for ISO 35345 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 35345 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationIntroduction to Machine Learning
Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More information1. Bag of visual words model: recognizing object categories
1. Bag of visual words model: recognizing object categories 1 1 1 Problem: Image Classification Given: positive training images containing an object class, and negative training images that don t Classify:
More informationAn Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationSeveral Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
More informationData Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme
More informationCS570 Data Mining Classification: Ensemble Methods
CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of HanKamberPei, Tan et al., and Li Xiong Günay (Emory) Classification:
More informationAcknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19  Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19  Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.standrews.ac.uk twk@standrews.ac.uk Tom Kelsey ID505919B &
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationChapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida  1 
Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida  1  Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationThe Operational Value of Social Media Information. Social Media and Customer Interaction
The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio MorenoGarcia
More informationL25: Ensemble learning
L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo GutierrezOsuna
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data
More information6 Classification and Regression Trees, 7 Bagging, and Boosting
hs24 v.2004/01/03 Prn:23/02/2005; 14:41 F:hs24011.tex; VTEX/ES p. 1 1 Handbook of Statistics, Vol. 24 ISSN: 01697161 2005 Elsevier B.V. All rights reserved. DOI 10.1016/S01697161(04)240111 1 6 Classification
More informationLinear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
More informationIntroduction to Machine Learning. Rob Schapire Princeton University. schapire
Introduction to Machine Learning Rob Schapire Princeton University www.cs.princeton.edu/ schapire Machine Learning studies how to automatically learn to make accurate predictions based on past observations
More informationSupport Vector Machines
Support Vector Machines Here we approach the twoclass classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
More informationData Mining Classification: Alternative Techniques. InstanceBased Classifiers. Lecture Notes for Chapter 5. Introduction to Data Mining
Data Mining Classification: Alternative Techniques InstanceBased Classifiers Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar Set of Stored Cases Atr1... AtrN Class A B
More informationFinal Exam, Spring 2007
10701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: Email address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:
More informationPCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker
PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More informationData Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationClassification Techniques for Remote Sensing
Classification Techniques for Remote Sensing Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara saksoy@cs.bilkent.edu.tr http://www.cs.bilkent.edu.tr/ saksoy/courses/cs551
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationGerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More informationStatistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees
Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and treebased classification techniques.
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationNeural Networks and Support Vector Machines
INF5390  Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF539013 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 20150305
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 20150305 Roman Kern (KTI, TU Graz) Ensemble Methods 20150305 1 / 38 Outline 1 Introduction 2 Classification
More informationNeural Networks & Boosting
Neural Networks & Boosting Bob Stine Dept of Statistics, School University of Pennsylvania Questions How is logistic regression different from OLS? Logistic mean function for probabilities Larger weight
More informationIntroduction to Machine Learning NPFL 054
Introduction to Machine Learning NPFL 054 http://ufal.mff.cuni.cz/course/npfl054 Barbora Hladká hladka@ufal.mff.cuni.cz Martin Holub holub@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and
More informationLocal features and matching. Image classification & object localization
Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to
More informationA Survey of Kernel Clustering Methods
A Survey of Kernel Clustering Methods Maurizio Filippone, Francesco Camastra, Francesco Masulli and Stefano Rovetta Presented by: Kedar Grama Outline Unsupervised Learning and Clustering Types of clustering
More informationGaussian Classifiers CS498
Gaussian Classifiers CS498 Today s lecture The Gaussian Gaussian classifiers A slightly more sophisticated classifier Nearest Neighbors We can classify with nearest neighbors x m 1 m 2 Decision boundary
More informationCheng Soon Ong & Christfried Webers. Canberra February June 2016
c Cheng Soon Ong & Christfried Webers Research Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 31 c Part I
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationLecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
More informationBreaking SVM Complexity with CrossTraining
Breaking SVM Complexity with CrossTraining Gökhan H. Bakır Max Planck Institute for Biological Cybernetics, Tübingen, Germany gb@tuebingen.mpg.de Léon Bottou NEC Labs America Princeton NJ, USA leon@bottou.org
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationTan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status
Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of
More informationMachine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov kt@ut.ee Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
More informationIntroduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones
Introduction to machine learning and pattern recognition Lecture 1 Coryn BailerJones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationMachine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
More informationEnsemble Approach for the Classification of Imbalanced Data
Ensemble Approach for the Classification of Imbalanced Data Vladimir Nikulin 1, Geoffrey J. McLachlan 1, and Shu Kay Ng 2 1 Department of Mathematics, University of Queensland v.nikulin@uq.edu.au, gjm@maths.uq.edu.au
More informationIntroduction to Convex Optimization for Machine Learning
Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning
More informationPa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classiﬁca6on
Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classiﬁca6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output
More informationLABEL PROPAGATION ON GRAPHS. SEMISUPERVISED LEARNING. Changsheng Liu 10302014
LABEL PROPAGATION ON GRAPHS. SEMISUPERVISED LEARNING Changsheng Liu 10302014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationIntroduction to Neural Networks : Revision Lectures
Introduction to Neural Networks : Revision Lectures John A. Bullinaria, 2004 1. Module Aims and Learning Outcomes 2. Biological and Artificial Neural Networks 3. Training Methods for Multi Layer Perceptrons
More informationNotes on Support Vector Machines
Notes on Support Vector Machines Fernando Mira da Silva Fernando.Silva@inesc.pt Neural Network Group I N E S C November 1998 Abstract This report describes an empirical study of Support Vector Machines
More informationComparison of machine learning methods for intelligent tutoring systems
Comparison of machine learning methods for intelligent tutoring systems Wilhelmiina Hämäläinen 1 and Mikko Vinni 1 Department of Computer Science, University of Joensuu, P.O. Box 111, FI80101 Joensuu
More informationSemiSupervised Support Vector Machines and Application to Spam Filtering
SemiSupervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationData Mining for Knowledge Management. Classification
1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh
More informationData Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
More informationMicrosoft Azure Machine learning Algorithms
Microsoft Azure Machine learning Algorithms Tomaž KAŠTRUN @tomaz_tsql Tomaz.kastrun@gmail.com http://tomaztsql.wordpress.com Our Sponsors Speaker info https://tomaztsql.wordpress.com Agenda Focus on explanation
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationREVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
More informationMonotonicity Hints. Abstract
Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. AbuMostafa EE and CS Deptartments California Institute of Technology
More information