# Ensemble Data Mining Methods

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models. BACKGROUND A supervised machine learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. In a classification learning task, each output is one or more classes to which the input belongs. The goal of classification learning is to develop a model that separates the data into the different classes, with the aim of classifying new examples in the future. For example, a credit card company may develop a model that separates people who defaulted on their

2 credit cards from those who did not based on other known information such as annual income. The goal would be to predict whether a new credit card applicant is likely to default on his credit card and thereby decide whether to approve or deny this applicant a new card. In a regression learning task, each output is a continuous value to be predicted (e.g., the average balance that a credit card holder carries over to the next month). Many traditional machine learning algorithms generate a single model (e.g., a decision tree or neural network). Ensemble learning methods instead generate multiple models. Given a new example, the ensemble passes it to each of its multiple base models, obtains their predictions, and then combines them in some appropriate manner (e.g., averaging or voting). As mentioned earlier, it is important to have base models that are competent but also complementary. To further motivate this point, consider Figure 1. This figure depicts a classification problem in which the goal is to separate the points marked with plus signs from points marked with minus signs. None of the three individual linear classifiers (marked A, B, and C) is able to separate the two classes of points. However, a majority vote over all three linear classifiers yields the piecewiselinear classifier shown as a thick line. This classifier is able to separate the two classes perfectly. For example, the plusses at the top of the figure are correctly classified by A and B, but are misclassified by C. The majority vote over these correctly classifies these points as plusses. This happens because A and B are very different from C. If our ensemble instead consisted of three copies of C, then all three classifiers would misclassify the plusses at the top of the figure, and so would a majority vote over these classifiers.

3 Figure 1: An ensemble of linear classifiers. Each line---a, B, and C---is a linear classifier. The boldface line is the ensemble that classifies new examples by returning the majority vote of A, B, and C. MAIN THRUST OF THE CHAPTER We now discuss the key elements of an ensemble learning method and ensemble model and, in the process, discuss several ensemble methods that have been developed. Ensemble Methods The example shown in figure 1 is an artificial example. We cannot normally expect to obtain base models that misclassify examples in completely separate parts of the input space and ensembles that classify all the examples correctly. However, there are many algorithms that attempt to generate a set of base models that make errors that are as uncorrelated as possible. Methods such as Bagging (Breiman, 1994) and Boosting

4 (Freund and Schapire, 1996) promote diversity by presenting each base model with a different subset of training examples or different weight distributions over the examples. For example, in figure 1, if the plusses in the top part of the figure were temporarily removed from the training set, then a linear classifier learning algorithm trained on the remaining examples would probably yield a classifier similar to C. On the other hand, removing the plusses in the bottom part of the figure would probably yield classifier B or something similar. In this way, running the same learning algorithm on different subsets of training example can yield very different classifiers which can be combined to yield an effective ensemble. Input Decimation Ensembles (IDE) (Oza and Tumer, 2001, Tumer and Oza, 2003) and Stochastic Attribute Selection Committees (SASC) (Zheng and Webb, 1998) instead promote diversity by training each base model with the same training examples but different subsets of the input features. SASC trains each base model with a random subset of input features. IDE selects, for each class, a subset of features that has the highest correlation with the presence of that class. Each feature subset is used to train one base model. However, in both SASC and IDE, all the training patterns are used with equal weight to train all the base models. So far we have distinguished ensemble methods by the way they train their base models. We can also distinguish methods by the way they combine their base models predictions. Majority or plurality voting is frequently used for classification problems and is used in Bagging. If the classifiers provide probability values, simple averaging is commonly used and is very effective (Tumer and Ghosh, 1996). Weighted averaging has also been used and different methods for weighting the base models have been examined. Two particularly interesting methods for weighted averaging include Mixtures of Experts

5 (Jordan and Jacobs, 1994) and Merz s use of Principal Components Analysis (PCA) to combine models (Merz, 1999). In Mixtures of Experts, the weights in the weighted average combining are determined by a gating network, which is a model that takes the same inputs that the base models take, and returns a weight on each of the base models. The higher the weight for a base model, the more that base model is trusted to provide the correct answer. These weights are determined during training by how well the base models perform on the training examples. The gating network essentially keeps track of how well each base model performs in each part of the input space. The hope is that each model learns to specialize in different input regimes and is weighted highly when the input falls into its specialty. Merz s method uses PCA to lower the weights of base models that perform well overall but are redundant and therefore effectively give too much weight to one model. For example, in our figure, if there were instead two copies of A and one copy of B in an ensemble of three models, we may prefer to lower the weights of the two copies of A since, essentially, A is being given too much weight. Here, the two copies of A would always outvote B, thereby rendering B useless. Merz s method also increases the weight on base models that do not perform as well overall but perform well in parts of the input space where the other models perform poorly. In this way, a base model s unique contributions are rewarded. When designing an ensemble learning method, in addition to choosing the method by which to bring about diversity in the base models and choosing the combining method, one has to choose the type of base model and base model learning algorithm to use. The combining method may restrict the types of base models that can be used. For example, to use average combining in a classification problem, one must have base

6 models that can yield probability estimates. This precludes the use of linear discriminant analysis or support vector machines, which cannot return probabilities. The vast majority of ensemble methods use only one base model learning algorithm but use the methods described earlier to bring about diversity in the base models. There has been surprisingly little work (e.g., (Merz 1999)) on creating ensembles with many different types of base models. Two of the most popular ensemble learning algorithms are Bagging and Boosting, which we briefly explain next. Bagging Bootstrap Aggregating (Bagging) generates multiple bootstrap training sets from the original training set (using sampling with replacement) and uses each of them to generate a classifier for inclusion in the ensemble. The algorithms for bagging and sampling with replacement are given in figure 2 below. In these algorithms, T is the original training set of N examples, M is the number of base models to be learned, L b is the base model learning algorithm, the h i s are the base models, random_integer(a,b) is a function that returns each of the integers from a to b with equal probability, and I(A) is the indicator function that returns 1 if A is true and 0 otherwise. Bagging(T, M) For each m =1,2,K, M T m = Sample_With_Replacement(T, T ) h m = L b (T m ) Return h fin (x) = argmax y Y I(h m (x) = y). M m=1

7 Sample_With_Replacement(T,N) S = {} For i =1,2,K,N r = random_integer(1,n) Add T[r] to S. Return S. Figure 2: Batch Bagging Algorithm and Sampling with Replacement To create a bootstrap training set from an original training set of size N, we perform N Multinomial trials, where in each trial, we draw one of the N examples. Each example has probability 1/N of being drawn in each trial. The second algorithm shown in figure 2 does exactly this---n times, the algorithm chooses a number r from 1 to N and adds the rth training example to the bootstrap training set S. Clearly, some of the original training examples will not be selected for inclusion in the bootstrap training set and others will be chosen one time or more. In bagging, we create M such bootstrap training sets and then generate classifiers using each of them. Bagging returns a function h(x) that classifies new examples by returning the class y that gets the maximum number of votes from the base models h 1,h 2,,h M. In bagging, the M bootstrap training sets that are created are likely to have some differences. If these differences are enough to induce noticeable differences among the M base models while leaving their performances reasonably good, then the ensemble will probably perform better than the base models individually. (Breiman, 1994) demonstrates that bagged ensembles tend to improve upon their base models more if the base model learning algorithms are unstable---differences in their training sets tend to induce significant differences in the models. He notes that

8 decision trees are unstable, which explains why bagged decision trees often outperform individual decision trees; however, decision stumps (decision trees with only one variable test) are stable, which explains why bagging with decision stumps tends not to improve upon individual decision stumps. Boosting AdaBoost({(x 1,y 1 ),(x 2,y 2 ),K,(x N, y N )},L b, M) Initialize D 1 n { }. For each m =1,2,K, M : ( ) =1/N for all n 1,2,K,N h m = L b ({(x 1,y 1 ),(x 2, y 2 ),K,(x N,y N )},D m ). ε m = D m ( n). n:h m ( x n ) y n If ε m 1/2 then, set M = m 1 and abort this loop. Update distribution D m : 1 2( 1 ε D m +1 ( n) = D m ( n) m ) if h m( x n ) = y n 1 otherwise. 2ε m M Return h fin (x) = argmax y Y I(h m (x) = y)log 1 ε m. m=1 We now explain the AdaBoost algorithm because it is the most frequently used ε m among all boosting algorithms. AdaBoost generates a sequence of base models with different weight distributions over the training set. The AdaBoost algorithm is shown in Figure 3. Its inputs are a set of N training examples, a base model learning algorithm L b, and the number M of base models that we wish to combine. AdaBoost was originally designed for two-class classification problems; therefore, for this explanation we will

9 assume that there are two possible classes. However, AdaBoost is regularly used with a larger number of classes. The first step in AdaBoost is to construct an initial distribution of weights D 1 over the training set. This distribution assigns equal weight to all N training examples. We now enter the loop in the algorithm. To construct the first base model, we call L b with distribution D 1 over the training set. i After getting back a model h 1, we calculate its error ε 1 on the training set itself, which is just the sum of the weights of the training examples that h 1 misclassifies. We require that ε 1 < 1/2 (this is the weak learning assumption---the error should be less than what we would achieve through randomly guessing the class ii ). If this condition is not satisfied, then we stop and return the ensemble consisting of the previously-generated base models. If this condition is satisfied, then we calculate a new distribution D 2 over the training examples as follows. Examples that were correctly classified by h 1 have their weights multiplied by 1/(2(1-ε 1 )). Examples that were misclassified by h 1 have their weights multiplied by 1/(2ε 1 ). Note that, because of our condition ε 1 < 1/2, correctly classified examples have their weights reduced and misclassified examples have their weights increased. Specifically, examples that h 1 misclassified have their total weight increased to 1/2 under D 2 and examples that h 1 correctly classified have their total weight reduced to 1/2 under D 2. We then go into the next iteration of the loop to construct base model h 2 using the training set and the new distribution D 2. The point is that the next base model will be generated by a weak learner (i.e., the base model will have error less than 1/2); therefore, at least some of the examples misclassified by the previous base model will have to be correctly classified by the current base model. In this way, boosting forces subsequent base models to correct the mistakes made by earlier models. We construct M base models in this fashion. The

10 ensemble returned by AdaBoost is a function that takes a new example as input and returns the class that gets the maximum weighted vote over the M base models, where each base model's weight is log((1-ε m )/ε m ), which is proportional to the base model's accuracy on the weighted training set presented to it. AdaBoost has performed very well in practice and is one of the few theoreticallymotivated algorithms that has turned into a practical algorithm. However, AdaBoost can perform poorly when the training data is noisy (Dietterich, 2000), i.e., the inputs or outputs have been randomly contaminated. Noisy examples are normally difficult to learn. Because of this, the weights assigned to noisy examples often become much higher than for the other examples, often causing boosting to focus too much on those noisy examples at the expense of the remaining data. Some work has been done to mitigate the effect of noisy examples on boosting (Oza, 2003, 2004, Ratsch, et. al., 2001). FUTURE TRENDS The fields of machine learning and data mining are increasingly moving away from working on small datasets in the form of flat files that are presumed to describe a single process. The fields are changing their focus toward the types of data increasingly being encountered today: very large datasets, possibly distributed over different locations, describing operations with multiple regimes of operation, time-series data, online applications (the data is not a time series but nevertheless arrives continually and must be processed as it arrives), partially-labeled data, and documents. Research in ensemble methods is beginning to explore these new types of data. For example, ensemble learning traditionally has required access to the entire dataset at once, i.e., it performs batch

11 learning. However, this is clearly impractical for very large datasets that cannot be loaded into memory all at once. (Oza and Russell, 2001; Oza, 2001) apply ensemble learning to such large datasets. In particular, this work develops online bagging and boosting, i.e., they learn in an online manner. That is, whereas standard bagging and boosting require at least one scan of the dataset for every base model created, online bagging and online boosting require only one scan of the dataset regardless of the number of base models. Additionally, as new data arrives, the ensembles can be updated without reviewing any past data. However, because of their limited access to the data, these online algorithms do not perform as well as their standard counterparts. Other work has also been done to apply ensemble methods to other types of data such as time series data (Weigend, et. al, 1995). However, most of this work is experimental. Theoretical frameworks that can guide us in the development of new ensemble learning algorithms specifically for modern datasets have yet to be developed. CONCLUSIONS Ensemble Methods began about ten years ago as a separate area within machine learning and were motivated by the idea of wanting to leverage the power of multiple models and not just trust one model built on a small training set. Significant theoretical and experimental developments have occurred over the past ten years and have led to several methods, especially bagging and boosting, being used to solve many real problems. However, ensemble methods also appear to be applicable to current and upcoming problems of distributed data mining and online applications. Therefore, practitioners in data mining should stay tuned for further developments in the vibrant area

12 of ensemble methods. An excellent way to do this is to follow the series of workshops called the International Workshop on Multiple Classifier Systems. This series balance between theory, algorithms, and applications of ensemble methods gives a comprehensive idea of the work being done in the field. REFERENCES Breiman, L. (1994). Bagging Predictors, Technical Report 421, Department of Statistics, University of California, Berkeley. Dietterich, T. (2000). An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization, Machine Learning 40, Freund, Y. & Schapire, R. (1996). Experiments with a new Boosting Algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, pp Morgan Kaufmann. Jordan, M.I. & Jacobs, R.A. (1994). Hierarchical Mixture of Experts and the EM Algorithm. Neural Computation, 6, Merz, C.J. (1999). A Principal Component Approach to Combining Regression Estimates. Machine Learning, 36, Oza, N.C. (2001). Online Ensemble Learning, Ph.D. thesis, University of California, Berkeley. Oza, N.C. (2003). Boosting with Averaged Weight Vectors. In T. Windeatt and F. Roli (Eds.), Proceedings of the Fourth International Workshop on Multiple Classifier Systems, 15-24, Berlin.

13 Oza, N.C. (2004). AveBoost2: Boosting with Noisy Data. In F. Roli, J. Kittler, and T. Windeatt (Eds.), Proceedings of the Fifth International Workshop on Multiple Classifier Systems, 31-40, Berlin. Oza, N.C. & Russell, S. (2001). Experimental Comparisons of Online and Batch Versions of Bagging and Boosting, The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California. Oza, N.C. & Tumer, K. (2001). Input Decimation Ensembles: Decorrelation through Dimensionality Reduction. In Second International Workshop on Multiple Classifier Systems. Springer-Verlag. Berlin. Ratsch, G., Onoda, T., & Muller, K.R. (2001). Soft Margins for AdaBoost. Machine Learning, 42, Tumer, K. & Ghosh, J. (1996). Error Correlation and Error Reduction in Ensemble Classifiers. Connection Science, Special Issue on Combining Artificial Neural Networks: Ensemble Approaches, 8(3 & 4), Tumer, K. & Oza, N.C. (2003). Input Decimated Ensembles, Pattern Analysis and Applications, 6(1): Weigend A.S., Mangeas, M., & Srivastava, A.N. (1995). Nonlinear Gated Experts for Time-Series---Discovering Regimes and Avoiding Overfitting, International Journal of Neural Systems, 6(4), Zheng, Z. & Webb, G. (1998). Stochastic Attribute Selection Committees. In Proceedings of the 11 th Australian Joint Conference on Artificial Intelligence (AI 98), pp

14 TERMS AND THEIR DEFINITIONS: Batch Learning: Learning using an algorithm that views the entire dataset at once and can access any part of the dataset at any time and as many times as desired. Ensemble: A function that returns a combination of the predictions of multiple machine learning models. Decision Tree: A model consisting of nodes that contain tests on a single attribute and branches representing the different outcomes of the test. A prediction is generated for a new example by performing the test described at the root node and then proceeding along the branch that corresponds to the outcome of the test. If the branch ends in a prediction, then that prediction is returned. If the branch ends in a node, then the test at that node is performed and the appropriate branch selected. This continues until a prediction is found and returned. Machine Learning: The branch of Artificial Intelligence devoted to enabling computers to learn. Neural Network: A nonlinear model derived through analogy with the human brain. It consists of a collection of elements that linearly combine their inputs and pass the result through a nonlinear transfer function. Online Learning: Learning using an algorithm that only examines the dataset once in order. This paradigm is often used in situations when data arrives continually in a stream and predictions must be obtainable at any time. Principal Components Analysis (PCA): Given a dataset, PCA determines the axes of maximum variance. For example, if the dataset was shaped like an egg, then the long axis of the egg would be the first principal component because the variance is greatest in this

15 direction. All subsequent principal components are found to be orthogonal to all previous components. ii If L b cannot take a weighted training set, then one can call it with a training set generated by sampling with replacement from the original training set according to the distribution D m. ii This requirement is perhaps too strict when there are more than two classes. There is a multi-class version of AdaBoost (Freund and Schapire, 1997) that does not have this requirement. However, the AdaBoost algorithm presented here is often used even when there are more than two classes if the base model learning algorithm is strong enough to satisfy the requirement.

### A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

### Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### REVIEW OF ENSEMBLE CLASSIFICATION

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### L25: Ensemble learning

L25: Ensemble learning Introduction Methods for constructing ensembles Combination strategies Stacked generalization Mixtures of experts Bagging Boosting CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

### On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

### Model Combination. 24 Novembre 2009

Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy

### CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

### BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

### Chapter 11 Boosting. Xiaogang Su Department of Statistics University of Central Florida - 1 -

Chapter 11 Boosting Xiaogang Su Department of Statistics University of Central Florida - 1 - Perturb and Combine (P&C) Methods have been devised to take advantage of the instability of trees to create

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP

TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

### Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Roulette Sampling for Cost-Sensitive Learning

Roulette Sampling for Cost-Sensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca

### Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

### Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

### Boosting. riedmiller@informatik.uni-freiburg.de

. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

### Online Bagging and Boosting

Abstract Bagging and boosting are two of the ost well-known enseble learning ethods due to their theoretical perforance guarantees and strong experiental results. However, these algoriths have been used

### Getting Even More Out of Ensemble Selection

Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise

### Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

### Supervised Learning with Unsupervised Output Separation

Supervised Learning with Unsupervised Output Separation Nathalie Japkowicz School of Information Technology and Engineering University of Ottawa 150 Louis Pasteur, P.O. Box 450 Stn. A Ottawa, Ontario,

### Operations Research and Knowledge Modeling in Data Mining

Operations Research and Knowledge Modeling in Data Mining Masato KODA Graduate School of Systems and Information Engineering University of Tsukuba, Tsukuba Science City, Japan 305-8573 koda@sk.tsukuba.ac.jp

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### Ensemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008

Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles

### Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms

Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya

### A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication

2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management

### Solving Regression Problems Using Competitive Ensemble Models

Solving Regression Problems Using Competitive Ensemble Models Yakov Frayman, Bernard F. Rolfe, and Geoffrey I. Webb School of Information Technology Deakin University Geelong, VIC, Australia {yfraym,brolfe,webb}@deakin.edu.au

### Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### New Ensemble Combination Scheme

New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

### Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

### Reducing multiclass to binary by coupling probability estimates

Reducing multiclass to inary y coupling proaility estimates Bianca Zadrozny Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92093-0114 zadrozny@cs.ucsd.edu

### Ensembles and PMML in KNIME

Ensembles and PMML in KNIME Alexander Fillbrunn 1, Iris Adä 1, Thomas R. Gabriel 2 and Michael R. Berthold 1,2 1 Department of Computer and Information Science Universität Konstanz Konstanz, Germany First.Last@Uni-Konstanz.De

### Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel

Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision

### CS570 Data Mining Classification: Ensemble Methods

CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong Günay (Emory) Classification:

### Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

### HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

### Active Learning with Boosting for Spam Detection

Active Learning with Boosting for Spam Detection Nikhila Arkalgud Last update: March 22, 2008 Active Learning with Boosting for Spam Detection Last update: March 22, 2008 1 / 38 Outline 1 Spam Filters

### Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble

Predictive Data Mining in Very Large Data Sets: A Demonstration and Comparison Under Model Ensemble Dr. Hongwei Patrick Yang Educational Policy Studies & Evaluation College of Education University of Kentucky

### Building Ensembles of Neural Networks with Class-switching

Building Ensembles of Neural Networks with Class-switching Gonzalo Martínez-Muñoz, Aitor Sánchez-Martínez, Daniel Hernández-Lobato and Alberto Suárez Universidad Autónoma de Madrid, Avenida Francisco Tomás

### A Hybrid Approach to Learn with Imbalanced Classes using Evolutionary Algorithms

Proceedings of the International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2009 30 June, 1 3 July 2009. A Hybrid Approach to Learn with Imbalanced Classes using

### How Boosting the Margin Can Also Boost Classifier Complexity

Lev Reyzin lev.reyzin@yale.edu Yale University, Department of Computer Science, 51 Prospect Street, New Haven, CT 652, USA Robert E. Schapire schapire@cs.princeton.edu Princeton University, Department

### Leveraging Ensemble Models in SAS Enterprise Miner

ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

### Inductive Learning in Less Than One Sequential Data Scan

Inductive Learning in Less Than One Sequential Data Scan Wei Fan, Haixun Wang, and Philip S. Yu IBM T.J.Watson Research Hawthorne, NY 10532 {weifan,haixun,psyu}@us.ibm.com Shaw-Hwa Lo Statistics Department,

### SVM Ensemble Model for Investment Prediction

19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Ensembles of data-reduction-based classifiers for distributed learning from very large data sets

Ensembles of data-reduction-based classifiers for distributed learning from very large data sets R. A. Mollineda, J. M. Sotoca, J. S. Sánchez Dept. de Lenguajes y Sistemas Informáticos Universidad Jaume

### Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm

Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm Dalton Lunga and Tshilidzi Marwala University of the Witwatersrand School of Electrical and Information Engineering

### Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

### Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

### An Experimental Study on Rotation Forest Ensembles

An Experimental Study on Rotation Forest Ensembles Ludmila I. Kuncheva 1 and Juan J. Rodríguez 2 1 School of Electronics and Computer Science, University of Wales, Bangor, UK l.i.kuncheva@bangor.ac.uk

### An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO

### Ensemble of Classifiers Based on Association Rule Mining

Ensemble of Classifiers Based on Association Rule Mining Divya Ramani, Dept. of Computer Engineering, LDRP, KSV, Gandhinagar, Gujarat, 9426786960. Harshita Kanani, Assistant Professor, Dept. of Computer

### Introduction To Ensemble Learning

Educational Series Introduction To Ensemble Learning Dr. Oliver Steinki, CFA, FRM Ziad Mohammad July 2015 What Is Ensemble Learning? In broad terms, ensemble learning is a procedure where multiple learner

### Data Analytics and Business Intelligence (8696/8697)

http: // togaware. com Copyright 2014, Graham.Williams@togaware.com 1/36 Data Analytics and Business Intelligence (8696/8697) Ensemble Decision Trees Graham.Williams@togaware.com Data Scientist Australian

### Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

### II. RELATED WORK. Sentiment Mining

Sentiment Mining Using Ensemble Classification Models Matthew Whitehead and Larry Yaeger Indiana University School of Informatics 901 E. 10th St. Bloomington, IN 47408 {mewhiteh, larryy}@indiana.edu Abstract

### Chapter 12 Bagging and Random Forests

Chapter 12 Bagging and Random Forests Xiaogang Su Department of Statistics and Actuarial Science University of Central Florida - 1 - Outline A brief introduction to the bootstrap Bagging: basic concepts

### The Predictive Data Mining Revolution in Scorecards:

January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms

### Gerry Hobbs, Department of Statistics, West Virginia University

Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit

### Using Random Forest to Learn Imbalanced Data

Using Random Forest to Learn Imbalanced Data Chao Chen, chenchao@stat.berkeley.edu Department of Statistics,UC Berkeley Andy Liaw, andy liaw@merck.com Biometrics Research,Merck Research Labs Leo Breiman,

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, crossvalidation, and the bootstrap are common techniques for

### An Experimental Study on Ensemble of Decision Tree Classifiers

An Experimental Study on Ensemble of Decision Tree Classifiers G. Sujatha 1, Dr. K. Usha Rani 2 1 Assistant Professor, Dept. of Master of Computer Applications Rao & Naidu Engineering College, Ongole 2

### The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner

Paper 3361-2015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Studying Auto Insurance Data

Studying Auto Insurance Data Ashutosh Nandeshwar February 23, 2010 1 Introduction To study auto insurance data using traditional and non-traditional tools, I downloaded a well-studied data from http://www.statsci.org/data/general/motorins.

### On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of

### 6 Classification and Regression Trees, 7 Bagging, and Boosting

hs24 v.2004/01/03 Prn:23/02/2005; 14:41 F:hs24011.tex; VTEX/ES p. 1 1 Handbook of Statistics, Vol. 24 ISSN: 0169-7161 2005 Elsevier B.V. All rights reserved. DOI 10.1016/S0169-7161(04)24011-1 1 6 Classification

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### Data mining techniques: decision trees

Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39

### MHI3000 Big Data Analytics for Health Care Final Project Report

MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given

### Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### Predicting Student Performance by Using Data Mining Methods for Classification

BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

### Training Methods for Adaptive Boosting of Neural Networks for Character Recognition

Submission to NIPS*97, Category: Algorithms & Architectures, Preferred: Oral Training Methods for Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Dept. IRO Université de Montréal

### Support Vector Machines with Clustering for Training with Very Large Datasets

Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano

### Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

### Neural Networks and Support Vector Machines

INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

### Ensemble Methods. Adapted from slides by Todd Holloway h8p://abeau<fulwww.com/2007/11/23/ ensemble- machine- learning- tutorial/

Ensemble Methods Adapted from slides by Todd Holloway h8p://abeau

### Increasing the Accuracy of Predictive Algorithms: A Review of Ensembles of Classifiers

1906 Category: Software & Systems Design Increasing the Accuracy of Predictive Algorithms: A Review of Ensembles of Classifiers Sotiris Kotsiantis University of Patras, Greece & University of Peloponnese,

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### E-commerce Transaction Anomaly Classification

E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce

### Combining SVM classifiers for email anti-spam filtering

Combining SVM classifiers for email anti-spam filtering Ángela Blanco Manuel Martín-Merino Abstract Spam, also known as Unsolicited Commercial Email (UCE) is becoming a nightmare for Internet users and

### Internet Traffic Prediction by W-Boost: Classification and Regression

Internet Traffic Prediction by W-Boost: Classification and Regression Hanghang Tong 1, Chongrong Li 2, Jingrui He 1, and Yang Chen 1 1 Department of Automation, Tsinghua University, Beijing 100084, China

### Spam detection with data mining method:

Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,

### Quantifying Seasonal Variation in Cloud Cover with Predictive Models

Quantifying Seasonal Variation in Cloud Cover with Predictive Models Ashok N. Srivastava, Ph.D. ashok@email.arc.nasa.gov Deputy Area Lead, Discovery and Systems Health Group Leader, Intelligent Data Understanding

### Classification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model

10 10 Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1

### C19 Machine Learning

C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks