CS570 Data Mining Classification: Ensemble Methods




CS570 Data Mining Classification: Ensemble Methods. Cengiz Günay, Dept. of Math & CS, Emory University, Fall 2013. Some slides courtesy of Han, Kamber & Pei; Tan et al.; and Li Xiong. Günay (Emory), Classification: Ensemble Methods, Fall 2013, 1/6.

Today: Due tonight at midnight: Homework #2 (frequent itemsets). Assigned today: Homework #3 (classification). Today's menu: Classification: Ensemble Methods.

Ensemble Methods Given a data set, generate multiple models and combine the results Bagging Random Forests Boosting PAC learning significance

General Idea

Why does it work? Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume the classifiers are independent. The majority vote is wrong only when 13 or more of the 25 base classifiers are wrong, so the probability that the ensemble makes a wrong prediction is

$$\sum_{i=13}^{25} \binom{25}{i}\, \varepsilon^{i} (1-\varepsilon)^{25-i} = 0.06$$
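The binomial sum above can be checked directly (a quick sketch; the variable names are mine):

```python
from math import comb

# Probability that a majority (>= 13 of 25) of independent base
# classifiers, each with error rate eps = 0.35, err simultaneously.
eps = 0.35
p_ensemble_error = sum(
    comb(25, i) * eps**i * (1 - eps) ** (25 - i) for i in range(13, 26)
)
print(round(p_ensemble_error, 3))  # 0.06
```

The ensemble's 6% error rate is far below the 35% of each base classifier, which is the whole point: independence lets the majority vote average individual mistakes away.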

Types of Ensemble Methods. Ensembles can be obtained by manipulating:
1 Training set: Bagging; Boosting
2 Input features: Random forests; Multi-objective evolutionary algorithms; Forward/backward elimination?
3 Class labels: Multi-classes; Active learning
4 Learning algorithm: ANNs; Decision trees
Günay (Emory) Classification: Ensemble Methods Fall 2013 3 / 6

Bagging: Create a data set by sampling data points with replacement. Build a model on that data set. Repeat to generate more data sets and models. Predict by combining votes: majority vote for classification, average for numeric prediction (regression).
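The procedure above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the one-feature threshold-stump base learner and all function names are my own, chosen only so the example is self-contained.

```python
import random
from collections import Counter

def stump_fit(xs, ys):
    """Hypothetical weak base learner: a one-feature threshold stump.
    Tries every observed threshold and keeps the one with fewest errors."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x > t else -sign for x in xs]
            errs = sum(p != y for p, y in zip(preds, ys))
            if best is None or errs < best[0]:
                best = (errs, t, sign)
    _, t, sign = best
    return lambda x: sign if x > t else -sign

def bagging_fit(xs, ys, n_models=25, seed=0):
    """Draw bootstrap samples (with replacement) and fit one model per sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(stump_fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    votes = Counter(m(x) for m in models)  # majority vote for classification
    return votes.most_common(1)[0][0]
```

For regression, `bagging_predict` would average the model outputs instead of taking the majority vote.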

Bagging: sampling with replacement. [Table omitted: three bootstrap rounds drawn with replacement from an original data set of 10 records; some records repeat within a round while others are left out.] Build a classifier on each bootstrap sample. Each record has probability 1 − (1 − 1/n)^n of being selected into a given bootstrap sample, which approaches 1 − 1/e ≈ 0.632 for large n.
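The selection probability is easy to verify numerically (a small sketch; the function name is mine):

```python
# Probability that a given record appears at least once in a bootstrap
# sample of size n drawn with replacement from n records.
def p_selected(n):
    return 1 - (1 - 1 / n) ** n

print(round(p_selected(10), 3))      # 0.651
print(round(p_selected(10_000), 3))  # 0.632, approaching 1 - 1/e
```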

Bagging advantages: less overfitting; helps when the base classifier is unstable (has high variance). Disadvantages: not useful when the base classifier is stable and has large bias.

PAC learning: a model that defines learning with a given accuracy and confidence using polynomial sample complexity. References: L. Valiant, "A theory of the learnable." http://web.mit.edu/6.35/www/valiant8.pdf ; D. Haussler, "Overview of the Probably Approximately Correct (PAC) Learning Framework." http://www.cs.iastate.edu/~honavar/pac.pdf

Boosting: Use weak learners and combine them to form a strong learner, in the PAC-learning sense. Learn using a weak learner. Boost accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on the hard examples. Predict using a weighted combination of the weak learners, where each weight is determined by that learner's accuracy.

Boosting: an iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records. Initially, all N records are assigned equal weights. Unlike bagging, the weights may change at the end of each boosting round.

Boosting: Records that are wrongly classified have their weights increased; records that are classified correctly have their weights decreased. [Table omitted: the original data and three boosting rounds over 10 records, showing hard examples reappearing in later rounds.] An example that is hard to classify gets its weight increased, so it is more likely to be chosen again in subsequent rounds.

Boosting advantages: focuses on samples that are hard to classify. In AdaBoost, sample weights can be used in two ways: (1) as sampling probabilities, or (2) directly by the classifier, which values heavier samples more. AdaBoost calculates a classifier importance rather than taking a plain vote, using exponential weight-update rules. But boosting is susceptible to overfitting. Günay (Emory) Classification: Ensemble Methods Fall 2013 5 / 6

Example: AdaBoost. Base classifiers: C_1, C_2, ..., C_T. Error rate of classifier C_i:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\!\left(C_i(x_j) \neq y_j\right)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$
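A quick numeric check of the importance formula (the ε = 0.35 value is borrowed from the earlier slide; the function name is mine):

```python
from math import log

# AdaBoost classifier importance alpha_i for a given weighted error rate.
def alpha(eps):
    return 0.5 * log((1 - eps) / eps)

print(round(alpha(0.35), 3))  # 0.31
print(alpha(0.5))             # 0.0: a coin-flip classifier gets zero weight
```

Note how α grows as ε shrinks and flips sign for ε > 0.5, so accurate classifiers dominate the final weighted vote.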

Example: AdaBoost. Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where Z_j is the normalization factor. If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/N and the resampling procedure is repeated. Final classification:

$$C^{*}(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\!\left(C_j(x) = y\right)$$
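The update rules above fit in a short sketch. This is my own minimal illustration with labels in {−1, +1} and a hypothetical weighted threshold-stump base learner, not the lecture's implementation:

```python
from math import log, exp

def weighted_stump(xs, ys, w):
    """Hypothetical base learner: threshold stump minimizing weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            eps = sum(wj for xj, yj, wj in zip(xs, ys, w)
                      if (sign if xj > t else -sign) != yj)
            if best is None or eps < best[0]:
                best = (eps, t, sign)
    eps, t, sign = best
    return eps, (lambda x: sign if x > t else -sign)

def adaboost_fit(xs, ys, rounds=10):
    n = len(xs)
    w = [1 / n] * n                     # initially all records weighted equally
    ensemble = []                       # list of (alpha_j, C_j) pairs
    for _ in range(rounds):
        eps, clf = weighted_stump(xs, ys, w)
        if eps >= 0.5:                  # error above 50%: revert to 1/N
            w = [1 / n] * n
            continue
        eps = max(eps, 1e-10)           # guard against log/zero division
        a = 0.5 * log((1 - eps) / eps)  # classifier importance alpha_j
        # exponential weight update, then normalize by Z_j
        w = [wi * exp(-a if clf(xi) == yi else a)
             for wi, xi, yi in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((a, clf))
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * c(x) for a, c in ensemble)  # alpha-weighted vote
    return 1 if score > 0 else -1
```

The key contrast with bagging is visible in the loop: each round sees a reweighted training set, and each classifier's vote is scaled by its importance α rather than counted equally.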

Illustrating AdaBoost: [figures omitted; the slides showed the initial equal weights assigned to each training data point and how successive boosting rounds re-weight them. (C) Vipin Kumar, Parallel Issues in Data Mining.]

Random Forests: Sample a data set with replacement. Select m variables at random from the p available variables. Build a tree. Repeat to create more trees, and combine their results. Reference: Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Chapter 15.
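The distinctive step, relative to plain bagging, is the random feature subset considered at each split. A small sketch of that step (the function name is mine; the m = ⌊log₂ p⌋ + 1 default matches the formula on the next slide):

```python
import random
from math import log2, floor

# Random-subspace step of a random forest: at each tree split,
# only m of the p features are considered as split candidates.
def random_feature_subset(p, seed=None):
    m = floor(log2(p)) + 1          # common default for m
    rng = random.Random(seed)
    return sorted(rng.sample(range(p), m))

print(floor(log2(100)) + 1)  # 7 features considered per split when p = 100
```

Restricting each split to a random subset decorrelates the trees, which is what lets averaging their votes reduce variance beyond what bagging alone achieves.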

Random Forests: applies only to decision trees. Advantages: lowers generalization error; uses randomization in tree construction, considering #features = ⌊log₂ d⌋ + 1 at each split; accuracy equivalent to AdaBoost, but faster. See the table in Tan et al., p. 29, for a comparison of ensemble methods.