CS570 Data Mining Classification: Ensemble Methods

Warren Hensley
8 years ago

1 CS570 Data Mining Classification: Ensemble Methods Cengiz Günay Dept. Math & CS, Emory University Fall 2013 Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong Günay (Emory) Classification: Ensemble Methods Fall / 6

2 Today
Due today midnight: Homework #2 (Frequent itemsets)
Given today: Homework #3 (Classification)
Today's menu: Classification: Ensemble Methods

3 Ensemble Methods
Given a data set, generate multiple models and combine the results.
Topics: Bagging; Random Forests; Boosting; PAC learning significance.

4 General Idea

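5 Why does it work?
Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume the classifiers are independent.
Probability that the ensemble classifier makes a wrong prediction (under majority voting, at least 13 of the 25 base classifiers must be wrong):
$$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1 - \varepsilon)^{25 - i} = 0.06$$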
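The sum on this slide can be checked numerically. The following is a minimal sketch, assuming majority voting over independent base classifiers; the function name `ensemble_error` is made up for illustration.

```python
from math import comb

def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent base classifiers is wrong."""
    majority = n_classifiers // 2 + 1  # 13 when there are 25 classifiers
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

p = ensemble_error(25, 0.35)
print(f"ensemble error: {p:.4f}")  # the slide reports roughly 0.06
```

With ε = 0.5 the same function returns 0.5, which illustrates why the base classifiers need to be better than random guessing for the ensemble to help.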

9 Types of Ensemble Methods
Can be obtained by manipulating:
1. Training set: bagging, boosting
2. Input features: random forests, multi-objective evolutionary algorithms, forward/backward elimination?
3. Class labels: multi-classes, active learning
4. Learning algorithm: ANNs, decision trees

10 Bagging
1. Create a data set by sampling data points with replacement.
2. Create a model based on the data set.
3. Generate more data sets and models.
4. Predict by combining votes: majority vote for classification, average for numeric prediction.
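The bagging steps can be sketched in a few lines. This is an illustrative sketch only: the base learner `train_stump` (a threshold on a single numeric feature) and the toy data are assumptions, not part of the slides.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) records with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical base learner: best threshold on a 1-D feature.
    Predicts `left` for x <= t and `right` otherwise."""
    best = None
    for t in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging(data, rounds, seed=0):
    """Train one model per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap(data, rng)) for _ in range(rounds)]

def predict(models, x):
    """Classification: majority vote over the models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Made-up toy data: label is 1 when x > 0.5.
data = [(x / 10, int(x > 5)) for x in range(1, 11)]
models = bagging(data, rounds=11)
print([predict(models, x) for x in (0.2, 0.9)])
```

For numeric prediction, `predict` would average the model outputs instead of counting votes.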

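11 Bagging
Sampling with replacement.
[Table: Original Data and the bootstrap samples for Bagging (Round 1), Bagging (Round 2), Bagging (Round 3); values not recoverable.]
Build a classifier on each bootstrap sample.
Each record has probability 1 - (1 - 1/n)^n of being selected into a given bootstrap sample (about 0.632 for large n).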
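The chance that a given record appears in a bootstrap sample of size n is 1 - (1 - 1/n)^n, since (1 - 1/n)^n is the chance that it is missed in all n draws. The sketch below evaluates this for a few n and compares it with one simulated bootstrap sample; the helper name `inclusion_probability` is made up for illustration.

```python
import random

def inclusion_probability(n: int) -> float:
    """Chance that a fixed record appears at least once in n draws with replacement."""
    return 1 - (1 - 1 / n) ** n

for n in (10, 100, 1000):
    print(n, round(inclusion_probability(n), 4))

# Empirical check: fraction of distinct records in one bootstrap sample.
rng = random.Random(0)
n = 1000
sample = [rng.randrange(n) for _ in range(n)]
fraction_seen = len(set(sample)) / n
print(fraction_seen)
```

As n grows the probability approaches 1 - 1/e, about 0.632, so roughly 63% of the original records show up in each bootstrap sample.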

12 Bagging
Advantages: less overfitting; helps when the classifier is unstable (has high variance).
Disadvantages: not useful when the classifier is stable and has large bias.

13 PAC learning
A model that defines learning with a given accuracy and confidence using polynomial sample complexity.
References:
L. Valiant. A theory of the learnable. http://web.mit.edu/6.35/www/valiant8.pdf
D. Haussler. Overview of the Probably Approximately Correct (PAC) Learning Framework.
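As one standard illustration of polynomial sample complexity (not taken from the slides): for a finite hypothesis class H and a learner that always returns a hypothesis consistent with the training data, the following number of examples m is sufficient to reach error at most ε (accuracy) with probability at least 1 - δ (confidence).

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```

The bound grows only polynomially in 1/ε, 1/δ, and ln|H|, which is the sense in which the sample complexity is polynomial.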

14 Boosting
Use weak learners and combine them to form a strong learner, in the PAC learning sense.
1. Learn using a weak learner.
2. Boost the accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on the hard examples.
3. Predict using a weighted combination of the weak learners; each learner's weight is determined by its accuracy.

15 Boosting
An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records.
Initially, all N records are assigned equal weights.
Unlike bagging, the weights may change at the end of each boosting round.

16 Boosting
Records that are wrongly classified will have their weights increased.
Records that are classified correctly will have their weights decreased.
[Table: Original Data and the samples for Boosting (Round 1), Boosting (Round 2), Boosting (Round 3); values not recoverable.]
An example that is hard to classify has its weight increased, so it is more likely to be chosen again in subsequent rounds.

17 Boosting
Advantages: focuses on samples that are hard to classify.
Sample weights can be used as:
1. Sampling probability
2. Input to the classifier, so that it values those samples more
AdaBoost: calculates classifier importance instead of voting; uses exponential weight update rules.
But: susceptible to overfitting.

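18 Example: AdaBoost
Base classifiers: C_1, C_2, ..., C_T
Error rate:
$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$
Importance of a classifier:
$$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$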
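To get a feel for the importance formula, the short sketch below evaluates α for a few error rates. The function name `importance` and the chosen error rates are illustrative only.

```python
from math import log

def importance(eps: float) -> float:
    """alpha = 0.5 * ln((1 - eps) / eps), for 0 < eps < 1."""
    return 0.5 * log((1 - eps) / eps)

for eps in (0.1, 0.3, 0.5, 0.7):
    print(eps, round(importance(eps), 3))
```

A classifier with a low error rate gets a large positive α, one at ε = 0.5 gets α = 0, and one that is worse than chance gets a negative α.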

19 Example: AdaBoost
Weight update:
$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$
where Z_j is the normalization factor.
If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/n and the resampling procedure is repeated.
Classification:
$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$
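The AdaBoost formulas can be put together in a short sketch. Several things here are assumptions rather than part of the slides: the weak learner `train_stump` (a threshold on one numeric feature), the -1/+1 label encoding, and the toy data. The sketch also reweights the records directly instead of resampling them, and it keeps the weights normalized to sum to 1, so the weighted error is computed as the sum of the weights of the misclassified records, without the 1/N factor.

```python
import math

def train_stump(xs, ys, w):
    """Hypothetical weak learner: threshold with the lowest weighted error.
    Predicts `sign` for x <= t and `-sign` otherwise; labels are -1/+1."""
    best = None
    for t in sorted(set(xs)):
        for sign in (-1, 1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x <= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x: sign if x <= t else -sign)

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1 / n] * n                      # equal initial weights
    ensemble = []                        # list of (alpha_j, C_j)
    for _ in range(rounds):
        eps, clf = train_stump(xs, ys, w)
        if eps > 0.5:                    # error above 50%: revert weights to 1/n
            w = [1 / n] * n
            continue
        eps = max(eps, 1e-10)            # assumption: guard against eps == 0
        alpha = 0.5 * math.log((1 - eps) / eps)
        # exp(-alpha) for correctly classified records, exp(+alpha) otherwise
        w = [wi * math.exp(-alpha if clf(x) == y else alpha)
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)                       # normalization factor Z_j
        w = [wi / z for wi in w]
        ensemble.append((alpha, clf))
    return ensemble

def classify(ensemble, x, labels=(-1, 1)):
    """C*(x) = argmax_y sum_j alpha_j * [C_j(x) == y]."""
    return max(labels, key=lambda y: sum(a for a, c in ensemble if c(x) == y))

# Made-up toy data that no single threshold can separate.
xs = list(range(1, 11))
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([classify(ensemble, x) for x in xs])
```

On this toy data a single stump misclassifies three records, while the weighted combination of three stumps separates the two classes.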

20 Illustrating AdaBoost
[Figure: data points for training, with the initial weights for each data point. (C) Vipin Kumar, Parallel Issues in Data Mining, V]

21 Illustrating AdaBoost
[Figure not recoverable. (C) Vipin Kumar, Parallel Issues in Data Mining, V]

22 Random Forests
1. Sample a data set with replacement.
2. Select m variables at random from p variables.
3. Create a tree.
4. Similarly create more trees.
5. Combine the results.
Reference: Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Chapter 15.

23 Random Forests
Advantages:
Only for decision trees
Lowers generalization error
Uses randomization in tree construction: #features = $\log_2 d + 1$
Equivalent accuracy to AdaBoost, but faster
See table in Tan et al p. 29 for comparison of ensemble methods.
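The random forest steps from the two slides above can be sketched as follows. This is a rough sketch under stated assumptions: each "tree" is a depth-1 stump (`train_stump`, a stand-in for a full decision tree), the integer part of log2(d) is used in the feature-count rule, and the toy data is made up. Standard random forests redraw the m candidate features at every split; with depth-1 trees that coincides with drawing them once per tree.

```python
import math
import random
from collections import Counter

def n_split_features(d: int) -> int:
    """Slide's rule #features = log2(d) + 1 (integer part taken: an assumption)."""
    return int(math.log2(d)) + 1

def train_stump(rows, labels, features):
    """Depth-1 tree restricted to `features` (stand-in for a full tree)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda r: l_lab if r[f] <= t else r_lab

def random_forest(rows, labels, n_trees, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    m = n_split_features(p)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        feats = rng.sample(range(p), m)              # m of the p variables
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Combine the results by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Made-up toy data: only the first feature determines the label.
rows = [(i, i % 2, i % 3, i % 5) for i in range(20)]
labels = [int(i >= 10) for i in range(20)]
forest = random_forest(rows, labels, n_trees=51)
print(predict(forest, (2, 0, 0, 0)), predict(forest, (17, 1, 1, 1)))
```

Trees that happen not to receive the informative feature vote more or less arbitrarily, but the majority vote over many trees is still dominated by the trees that do see it.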
