# L25: Ensemble learning

 To view this video please enable JavaScript, and consider upgrading to a web browser that supports HTML5 video
Save this PDF as:

Size: px
Start display at page:

## Transcription

1 L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 1

2 Introduction What is ensemble learning? Ensemble learning refers to a collection of methods that learn a target function by training a number of individual learners and combining their predictions Why ensemble learning? Accuracy: a more reliable mapping can be obtained by combining the output of multiple experts Efficiency: a complex problem can be decomposed into multiple subproblems that are easier to understand and solve (divide-and-conquer approach) There is not a single model that works for all PR problems! To solve really hard problems, we ll have to use several different representations... It is time to stop arguing over which type of patternclassification technique is best... Instead we should work at a higher level of organization and discover how to build managerial systems to exploit the different virtues and evade the different limitations of each of these ways of comparing things. [Minsky, 1991] When to use ensemble learning? When you can build component classifiers that are more accurate than chance and, more importantly, that are independent from each other CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 2

3 Why do ensembles work? Because uncorrelated errors of individual classifiers can be eliminated through averaging Assume a binary classification problem for which you can train individual classifiers with an error rate of 0.3 Assume that you build an ensemble by combining the prediction of 21 such classifiers with a majority vote What is the probability of error for the ensemble? In order for the ensemble to misclassify an example, 11 or more classifiers have to be in error, or a probability of The histogram below shows the distribution of the number of classifiers that are in error in the ensemble machine From [Dietterich, 1997] [Dietterich, 1998] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 3

4 The target function may not be implementable with single classifiers, but may be approximated by ensemble averaging Assume that you want to build a diagonal decision boundary with decision trees The decision boundaries constructed by these machines are hyperplanes parallel to the coordinate axes, or staircases in the example below By averaging a large number of such staircases, the diagonal decision boundary can be approximated with arbitrarily small accuracy [Dietterich, 1997] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 4

5 Methods for constructing ensembles Subsampling the training examples Multiple hypotheses are generated by training individual classifiers on different datasets obtained by resampling a common training set (Bagging, Boosting) Manipulating the input features Multiple hypotheses are generated by training individual classifiers on different representations, or different subsets of a common feature vector Manipulating the output targets The output targets for C classes are encoded with an L-bit codeword, and an individual classifier is built to predict each one of the bits in the codeword Additional auxiliary targets may be used to differentiate classifiers Modifying the learning parameters of the classifier A number of classifiers are built with different learning parameters, such as number of neighbors in a k Nearest Neighbor rule, initial weights in an MLP, etc. CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 5

6 Structure of ensemble classifiers Parallel All the individual classifiers are invoked independently, and their results are fused with a combination rule (e.g., average, weighted voting) or a metaclassifier (e.g., stacked generalization) The majority of ensemble approaches in the literature fall under this category Cascading or Hierarchical Classifiers are invoked in a sequential or tree-structured fashion For the purpose of efficiency, inaccurate but fast methods are invoked first (maybe using a small subset of the features), and computationally more intensive but accurate methods are left for the latter stages E x p e rt 1 E x p e rt 2 E x p e rt K C o m b in e r E x p e rt 1 E x p e rt 2 E x p e rt 3 E x p e rt 1 E x p e rt 2 A E x p e rt 2 B [Jain, 2000] E x p e rt 3 A E x p e rt 3 B E x p e rt 3 C E x p e rt 3 D CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 6

7 Combination strategies Static combiners The combiner decision rule is independent of the feature vector. Static approaches can be broadly divided into non-trainable and trainable Non-trainable: The voting is performed independently of the performance of each individual classifier Various combiners may be used, depending on the type of output produced by the classifier, including Voting: used when each classifier produces a single class label. In this case, each classifier votes for a particular class, and the class with the majority vote on the ensemble wins Averaging: used when each classifier produces a confidence estimate (e.g., a posterior). In this case, the winner is the class with the highest average posterior across the ensemble Borda counts: used when each classifier produces a rank. The Borda count of a class is the number of classes ranked below it [Ho et al., 1994] Trainable: The combiner undergoes a separate training phase to improve the performance of the ensemble. Two noteworthy approaches are Weighted averaging: the output of each classifier is weighted by a measure of its own performance, e.g., prediction accuracy on a separate validation set Stacked generalization: the output of the ensemble serves as a feature vector to a meta-classifier CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 7

8 Adaptive combiners The combiner is a function that depends on the input feature vector Thus, the ensemble implements a function that is local to each region in feature space This divide-and-conquer approach leads to modular ensembles where relatively simple classifiers specialize in different parts of I/O space In contrast with static-combiner ensembles, the individual experts here do not need to perform well for all inputs, only in their region of expertise Representative examples of this approach are Mixture of Experts (ME) and Hierarchical ME [Jacobs et al., 1991; Jordan and Jacobs, 1994] E x p e rt 1 y 1 F e a tu re E x p e rt 2 y 2 M e ta O u tp u t v e c to r E x p e rt s ig n a l E x p e rt K y k [Gosh, 2002] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 8

9 Stacked generalization In stacked generalization, the output of the ensemble serves as an input to a second-level expert [Wolpert, 1992] Training of this modular ensemble can be performed as follows From a dataset X with N examples, leave out one test example, and train each of the level-0 experts on the remaining N 1 examples Generate a prediction for the test example. The output pattern y = [y 1, y 2,, y k ] across the level-0 experts, along with the target t for the test example, becomes a training example for the level-1 expert Repeat this process in a leave-one-out fashion. This yields a training set Y with N examples, which is used to train the level-1 expert separately To make full use of the training data, re-train all the level-0 experts one more time using all N examples in X E x p e rt 0 1 y 1 y 2 F e a tu re v e c to r (X ) E x p e rt 0 2 E x p e rt 1 O u tp u t s ig n a l (T ) E x p e rt 0 K [Bishop, 1995] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 9 y K

10 Mixture of experts ME is the classical adaptive ensemble method A gating network is used to partition feature space into different regions, with one expert in the ensemble being responsible for generating the correct output within that region [Jacobs et al., 1991] The experts in the ensemble and the gating network are trained simultaneously, which can be efficiently performed with EM ME can be extended to a multi-level hierarchical structure, where each component is itself a ME. In this case, a linear network can be used for the terminal classifiers without compromising the modeling capabilities of the machine y 1 E x p e rt 1 g 1 F e a tu re v e c to r E x p e rt 2 g 2 y 2 O u tp u t s ig n a l y k E x p e rt K g k G a tin g N e tw o rk [Bishop, 1995] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 10

11 Subsampling the training set Bagging [Breiman, 1996] Bagging (for bootstrap aggregation) creates an ensemble by training individual classifiers on bootstrap samples of the training set As a result of the sampling-with-replacement procedure, each classifier is trained on the average of 63.2% of the training examples For a dataset with N examples, each example has a probability of 1 1 1/N N of being selected at least once in the N samples. For N, this number converges to (1 1/e) or [Bauer and Kohavi, 1999] Bagging traditionally uses component classifiers of the same type (e.g., decision trees), and a simple combiner consisting of a majority vote across the ensemble [Bauer and Kohavi, 1999l; Duda et al., 2001] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 11

12 Bagging (continued) The perturbation in the training set due to the bootstrap resampling causes different hypotheses to be built, particularly if the classifier is unstable A classifier is said to be unstable if a small change in the training data (e.g., order of presentation of examples) can lead to a radically different hypothesis. This is the case of decision trees and, arguably, neural networks Bagging can be expected to improve accuracy if the induced classifiers are uncorrelated In some cases, such as k Nearest Neighbors, bagging has been shown to degrade performance as compared to individual classifiers as a result of an effectively smaller training set A related approach to bagging is cross-validated committees, in which the component classifiers are built on different partitions of the training set obtained through k-fold cross-validation CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 12

13 Boosting [Schapire, 1990; Freund and Schapire, 1996] Boosting takes a different resampling approach than bagging, which maintains a constant probability of 1/N for selecting each example In boosting, this probability is adapted over time based on performance The component classifiers are built sequentially, and examples that are mislabeled by previous components are chosen more often than those that are correctly classified Boosting is based on the concept of a weak learner, an algorithm that performs slightly better than chance (e.g., 50% classification rate on binary tasks) Schapire has shown that a weak learner can be converted into a strong learner by changing the distribution of training examples Boosting can also be used with classifiers that are highly accurate, but the benefits in this case will be very small A number of variants of boosting are available in the literature. We focus on the most popular form, known as AdaBoost (for Adaptive Boosting), which allows the designer to continue adding components until an arbitrarily small error rate is obtained on the training set NOTE: boosting is also known as arcing (for adaptive resampling and combining), a term that was coined by Breiman CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 13

14 AdaBoost AdaBoost operates as follows At iteration n, boosting provides the weak learner with a distribution D n over the training set, where D n (i) represents the probability of selecting the i-th example The initial distribution is uniform: D 1 (i) = 1/N. Thus, all examples are equally likely to be selected for the first component The weak learner subsamples the training set according to D n and generates a trained model or hypothesis H n The error rate of H n is measured with respect to the distribution D n A new distribution D n+1 is produced by decreasing the probability of those examples that were correctly classified, and increasing the probability of the misclassified examples The process is repeated T times, and a final hypothesis is obtained by weighting the votes of individual hypotheses {h 1, h 2,, h T } according to their performance Note The strength of AdaBoost derives from the adaptive re-sampling of examples, not from the final weighted combination To prove this point Breiman developed a variant of AdaBoost, known as arc-x4, in which the ensemble voting is unweighted [Breiman, 1996]: his results show that AdaBoost (referred to as arc-fs ) and arc-x4 have similar performance [Bauer and Kohavi, 1999] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 14

15 The bias and variance decomposition The effectiveness of Bagging and Boosting has been explained in terms of the bias-variance decomposition of classification error The expected error of a learning algorithm can be decomposed into A bias term that measures how closely the average classifier produced by the learning algorithm matches the target function A variance term that measures how much the learning algorithm s predictions fluctuate for different training sets (of the same size) An intrinsic target noise, which is the minimum error that can be achieved: that of the Bayes optimal classifier Following this line of reasoning, Breiman has suggested that both Bagging and Boosting reduce errors by reducing the variance term Along the same lines, Freund and Schapire have argued that Boosting also attempts to reduce the bias term since it focuses on misclassified samples Work by Bauer and Kohavi, however, seems to indicate that Bagging can also reduce the bias term [Opitz and Maclin, 1999] CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna 15

### Model Combination. 24 Novembre 2009

Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy

### Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

### Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -

Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### REVIEW OF ENSEMBLE CLASSIFICATION

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

### L13: cross-validation

Resampling methods Cross validation Bootstrap L13: cross-validation Bias and variance estimation with the Bootstrap Three-way data partitioning CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU

### A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

### Increasing Classification Accuracy. Data Mining: Bagging and Boosting. Bagging 1. Bagging 2. Bagging. Boosting Meta-learning (stacking)

Data Mining: Bagging and Boosting Increasing Classification Accuracy Andrew Kusiak 2139 Seamans Center Iowa City, Iowa 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel: 319-335

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### Solving Regression Problems Using Competitive Ensemble Models

Solving Regression Problems Using Competitive Ensemble Models Yakov Frayman, Bernard F. Rolfe, and Geoffrey I. Webb School of Information Technology Deakin University Geelong, VIC, Australia {yfraym,brolfe,webb}@deakin.edu.au

### Ensemble Data Mining Methods

Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

### On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

### A Learning Algorithm For Neural Network Ensembles

A Learning Algorithm For Neural Network Ensembles H. D. Navone, P. M. Granitto, P. F. Verdes and H. A. Ceccatto Instituto de Física Rosario (CONICET-UNR) Blvd. 27 de Febrero 210 Bis, 2000 Rosario. República

### Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

### Building Ensembles of Neural Networks with Class-switching

Building Ensembles of Neural Networks with Class-switching Gonzalo Martínez-Muñoz, Aitor Sánchez-Martínez, Daniel Hernández-Lobato and Alberto Suárez Universidad Autónoma de Madrid, Avenida Francisco Tomás

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Decompose Error Rate into components, some of which can be measured on unlabeled data

Bias-Variance Theory Decompose Error Rate into components, some of which can be measured on unlabeled data Bias-Variance Decomposition for Regression Bias-Variance Decomposition for Classification Bias-Variance

### Data Mining Classification: Alternative Techniques. Instance-Based Classifiers. Lecture Notes for Chapter 5. Introduction to Data Mining

Data Mining Classification: Alternative Techniques Instance-Based Classifiers Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar Set of Stored Cases Atr1... AtrN Class A B

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

### Boosting. riedmiller@informatik.uni-freiburg.de

. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

### Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### L15: statistical clustering

Similarity measures Criterion functions Cluster validity Flat clustering algorithms k-means ISODATA L15: statistical clustering Hierarchical clustering algorithms Divisive Agglomerative CSCE 666 Pattern

### Increasing the Accuracy of Predictive Algorithms: A Review of Ensembles of Classifiers

1906 Category: Software & Systems Design Increasing the Accuracy of Predictive Algorithms: A Review of Ensembles of Classifiers Sotiris Kotsiantis University of Patras, Greece & University of Peloponnese,

### CS570 Data Mining Classification: Ensemble Methods

CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong Günay (Emory) Classification:

### Classification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model

10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1

### Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Supervised Learning with Unsupervised Output Separation

Supervised Learning with Unsupervised Output Separation Nathalie Japkowicz School of Information Technology and Engineering University of Ottawa 150 Louis Pasteur, P.O. Box 450 Stn. A Ottawa, Ontario,

### Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

### Lecture 13: Validation

Lecture 3: Validation g Motivation g The Holdout g Re-sampling techniques g Three-way data splits Motivation g Validation techniques are motivated by two fundamental problems in pattern recognition: model

### Classification and Regression Trees

Classification and Regression Trees Bob Stine Dept of Statistics, School University of Pennsylvania Trees Familiar metaphor Biology Decision tree Medical diagnosis Org chart Properties Recursive, partitioning

### Ensembles of Similarity-Based Models.

Ensembles of Similarity-Based Models. Włodzisław Duch and Karol Grudziński Department of Computer Methods, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. E-mails: {duch,kagru}@phys.uni.torun.pl

### Ensembles of data-reduction-based classifiers for distributed learning from very large data sets

Ensembles of data-reduction-based classifiers for distributed learning from very large data sets R. A. Mollineda, J. M. Sotoca, J. S. Sánchez Dept. de Lenguajes y Sistemas Informáticos Universidad Jaume

### Chapter 12 Bagging and Random Forests

Chapter 12 Bagging and Random Forests Xiaogang Su Department of Statistics and Actuarial Science University of Central Florida - 1 - Outline A brief introduction to the bootstrap Bagging: basic concepts

### CPSC 340: Machine Learning and Data Mining. K-Means Clustering Fall 2015

CPSC 340: Machine Learning and Data Mining K-Means Clustering Fall 2015 Admin Assignment 1 solutions posted after class. Tutorials for Assignment 2 on Monday. Random Forests Random forests are one of the

### Metalearning for Dynamic Integration in Ensemble Methods

Metalearning for Dynamic Integration in Ensemble Methods Fábio Pinto 12 July 2013 Faculdade de Engenharia da Universidade do Porto Ph.D. in Informatics Engineering Supervisor: Doutor Carlos Soares Co-supervisor:

### Ensemble of Classifiers Based on Association Rule Mining

Ensemble of Classifiers Based on Association Rule Mining Divya Ramani, Dept. of Computer Engineering, LDRP, KSV, Gandhinagar, Gujarat, 9426786960. Harshita Kanani, Assistant Professor, Dept. of Computer

### Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 czhao@cs.toronto.edu Abstract Spam identification is crucial

### A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication

2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

### Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme

### Cross-Validation. Synonyms Rotation estimation

Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

### A Dynamic Integration Algorithm with Ensemble of Classifiers

1 A Dynamic Integration Algorithm with Ensemble of Classifiers Seppo Puuronen 1, Vagan Terziyan 2, Alexey Tsymbal 2 1 University of Jyvaskyla, P.O.Box 35, FIN-40351 Jyvaskyla, Finland sepi@jytko.jyu.fi

### An Introduction to Statistical Machine Learning - Overview -

An Introduction to Statistical Machine Learning - Overview - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny, Switzerland

### ENSEMBLE METHODS FOR CLASSIFIERS

Chapter 45 ENSEMBLE METHODS FOR CLASSIFIERS Lior Rokach Department of Industrial Engineering Tel-Aviv University liorr@eng.tau.ac.il Abstract Keywords: The idea of ensemble methodology is to build a predictive

### Ensemble Methods for Noise Elimination in Classification Problems

Ensemble Methods for Noise Elimination in Classification Problems Sofie Verbaeten and Anneleen Van Assche Department of Computer Science Katholieke Universiteit Leuven Celestijnenlaan 200 A, B-3001 Heverlee,

### Combining Multiple Models Across Algorithms and Samples for Improved Results

Combining Multiple Models Across Algorithms and Samples for Improved Results Haleh Vafaie, PhD. Federal Data Corporation Bethesda, MD Hvafaie@feddata.com Dean Abbott Abbott Consulting San Diego, CA dean@abbottconsulting.com

### Active Learning with Boosting for Spam Detection

Active Learning with Boosting for Spam Detection Nikhila Arkalgud Last update: March 22, 2008 Active Learning with Boosting for Spam Detection Last update: March 22, 2008 1 / 38 Outline 1 Spam Filters

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### Introduction To Ensemble Learning

Educational Series Introduction To Ensemble Learning Dr. Oliver Steinki, CFA, FRM Ziad Mohammad July 2015 What Is Ensemble Learning? In broad terms, ensemble learning is a procedure where multiple learner

### Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

### Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next

### LCs for Binary Classification

Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it

### Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

### Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

### Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

### On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of

### Combining Classifiers with Meta Decision Trees

Machine Learning, 50, 223 249, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Combining Classifiers with Meta Decision Trees LJUPČO TODOROVSKI Ljupco.Todorovski@ijs.si SAŠO DŽEROSKI

### Random Forest Based Imbalanced Data Cleaning and Classification

Random Forest Based Imbalanced Data Cleaning and Classification Jie Gu Software School of Tsinghua University, China Abstract. The given task of PAKDD 2007 data mining competition is a typical problem

### Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets

Case Study Report: Building and analyzing SVM ensembles with Bagging and AdaBoost on big data sets Ricardo Ramos Guerra Jörg Stork Master in Automation and IT Faculty of Computer Science and Engineering

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2. Tid Refund Marital Status

Data Mining Classification: Basic Concepts, Decision Trees, and Evaluation Lecture tes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Classification: Definition Given a collection of

### New Ensemble Combination Scheme

New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

### CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

### Resampling for Face Recognition

Resampling for Face Recognition Xiaoguang Lu and Anil K. Jain Dept. of Computer Science & Engineering, Michigan State University East Lansing, MI 48824 {lvxiaogu,jain}@cse.msu.edu Abstract. A number of

### An Introduction to Ensemble Learning in Credit Risk Modelling

An Introduction to Ensemble Learning in Credit Risk Modelling October 15, 2014 Han Sheng Sun, BMO Zi Jin, Wells Fargo Disclaimer The opinions expressed in this presentation and on the following slides

### Local features and matching. Image classification & object localization

Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to

### 6 Classification and Regression Trees, 7 Bagging, and Boosting

hs24 v.2004/01/03 Prn:23/02/2005; 14:41 F:hs24011.tex; VTEX/ES p. 1 1 Handbook of Statistics, Vol. 24 ISSN: 0169-7161 2005 Elsevier B.V. All rights reserved. DOI 10.1016/S0169-7161(04)24011-1 1 6 Classification

### An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce

An Ensemble Method for Large Scale Machine Learning with Hadoop MapReduce by Xuan Liu Thesis submitted to the Faculty of Graduate and Postdoctoral Studies In partial fulfillment of the requirements For

### Training Methods for Adaptive Boosting of Neural Networks for Character Recognition

Submission to NIPS*97, Category: Algorithms & Architectures, Preferred: Oral Training Methods for Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Dept. IRO Université de Montréal

### Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

### Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### Evaluation and Credibility. How much should we believe in what was learned?

Evaluation and Credibility How much should we believe in what was learned? Outline Introduction Classification with Train, Test, and Validation sets Handling Unbalanced Data; Parameter Tuning Cross-validation

### Machine Learning (CS 567)

Machine Learning (CS 567) Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol Han (cheolhan@usc.edu)

### Robotics 2 Clustering & EM. Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard

Robotics 2 Clustering & EM Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Maren Bennewitz, Wolfram Burgard 1 Clustering (1) Common technique for statistical data analysis to detect structure (machine learning,

### When Efficient Model Averaging Out-Performs Boosting and Bagging

When Efficient Model Averaging Out-Performs Boosting and Bagging Ian Davidson 1 and Wei Fan 2 1 Department of Computer Science, University at Albany - State University of New York, Albany, NY 12222. Email:

### Using Machine Learning Techniques to Improve Precipitation Forecasting

Using Machine Learning Techniques to Improve Precipitation Forecasting Joshua Coblenz Abstract This paper studies the effect of machine learning techniques on precipitation forecasting. Twelve features

### Introduction to Predictive Models Book Chapters 1, 2 and 5. Carlos M. Carvalho The University of Texas McCombs School of Business

Introduction to Predictive Models Book Chapters 1, 2 and 5. Carlos M. Carvalho The University of Texas McCombs School of Business 1 1. Introduction 2. Measuring Accuracy 3. Out-of Sample Predictions 4.

### PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker

PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342

### Combining SVM classifiers for email anti-spam filtering

Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and

### A HYBRID STACKING ENSEMBLE FRAMEWORK FOR EMPLOYMENT PREDICTION PROBLEMS

ISSN: 0975 3273 & E-ISSN: 0975 9085, Volume 3, Issue 1, 2011, PP-25-30 Available online at http://www.bioinfo.in/contents.php?id=33 A HYBRID STACKING ENSEMBLE FRAMEWORK FOR EMPLOYMENT PREDICTION PROBLEMS

### BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

### Ensemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008

Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles

### Car Insurance. Havránek, Pokorný, Tomášek

Car Insurance Havránek, Pokorný, Tomášek Outline Data overview Horizontal approach + Decision tree/forests Vertical (column) approach + Neural networks SVM Data overview Customers Viewed policies Bought

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Introduction to Machine Learning. Rob Schapire Princeton University. schapire

Introduction to Machine Learning Rob Schapire Princeton University www.cs.princeton.edu/ schapire Machine Learning studies how to automatically learn to make accurate predictions based on past observations

### Leveraging Ensemble Models in SAS Enterprise Miner

ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

### How Boosting the Margin Can Also Boost Classifier Complexity

Lev Reyzin lev.reyzin@yale.edu Yale University, Department of Computer Science, 51 Prospect Street, New Haven, CT 652, USA Robert E. Schapire schapire@cs.princeton.edu Princeton University, Department