Chapter 12: Bagging and Random Forests
Xiaogang Su
Department of Statistics and Actuarial Science, University of Central Florida
Outline
- A brief introduction to the bootstrap
- Bagging: basic concepts (Breiman, 1996)
- Random forests
Bootstrap: Motivation
- Problem: What is the average house price? From the population F, draw a sample L = (x1, …, xn) and compute the average u.
- Question: How reliable is u? What is its standard error? What is a confidence interval for it?
- Now a harder estimation problem: What is the IQR of the house prices, with a confidence interval?
- The bootstrap answer: repeat B times:
  o Generate a sample Lb of size n from L by sampling with replacement.
  o Compute IQRb for Lb.
- We end up with bootstrap values (IQR1, …, IQRb, …, IQRB). Use these values to compute all the quantities of interest (e.g., standard deviation, confidence intervals).
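The recipe above can be sketched in a few lines of Python (an illustrative stand-in, since the chapter's software examples use R; the house prices below are made-up numbers):

```python
import random
import statistics

def iqr(xs):
    """Interquartile range: Q3 - Q1."""
    q = statistics.quantiles(xs, n=4)
    return q[2] - q[0]

def bootstrap(data, stat, B=2000, seed=0):
    """Return B bootstrap replicates of the statistic `stat`."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(B):
        resample = [rng.choice(data) for _ in range(n)]  # sample n points with replacement
        reps.append(stat(resample))
    return reps

# made-up house prices (in $1000s)
prices = [210, 340, 198, 455, 275, 390, 310, 265, 505, 330,
          285, 240, 415, 360, 295, 225, 380, 445, 255, 320]
reps = sorted(bootstrap(prices, iqr))
se = statistics.stdev(reps)      # bootstrap standard error of the IQR
ci = (reps[50], reps[-51])       # 95% percentile interval for B = 2000
print(f"IQR = {iqr(prices):.1f}, SE = {se:.1f}, 95% CI = {ci}")
```

The same loop works for any statistic: replace `iqr` with `statistics.mean`, a median, or anything else computable from a resample.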
Bootstrap: An Example on Estimating the Mean
Bootstrap: An Overview
- Introduced by Bradley Efron in 1979. Named from the phrase "to pull oneself up by one's bootstraps," which is widely believed to come from The Adventures of Baron Munchausen.
- Popularized in the 1980s by the introduction of computers into statistical practice. It has a strong mathematical foundation.
- It is best known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.
- Bootstrap distribution: the bootstrap does not replace or add to the original data. We use the bootstrap distribution as a way to estimate the variation in a statistic computed from the original data.
- Sampling distribution vs. bootstrap distribution:
  o Sampling: multiple samples from the population -> sampling distribution.
  o Bootstrapping: one original sample -> B bootstrap samples -> bootstrap distribution.
Bagging
- Introduced by Breiman (1996). Bagging stands for "bootstrap aggregating."
- An ensemble method: a method of combining multiple predictors.
- The main idea: given a training set D = {(yi, xi) : i = 1, 2, …, n}, we want to make a prediction at a point x.
  o Sample B data sets, each consisting of n observations randomly selected from D with replacement. Then {D1, D2, …, DB} form B quasi-replicated training sets.
  o Train a machine or model on each Db, b = 1, …, B, to obtain a sequence of B predictions.
  o The final aggregated predictor is obtained by averaging (regression) or majority voting (classification).
Bagging: the Detailed Algorithm
- Construct B bootstrap replicates of L (e.g., B = 200): L1, …, LB.
- Apply the learning algorithm to each replicate Lb to obtain a hypothesis or predictor hb.
- Let Tb = L \ Lb be the data points that do not appear in Lb (the out-of-bag points). Compute the predicted value hb(x) for each x in Tb.
- For each data point x, we now have the observed response y and several predictions y^1(x), …, y^K(x), where K is the number of replicates for which x was out of bag. Compute the average prediction h^(x).
- Estimate bias and variance:
  o Bias ~ y - h^(x)
  o Variance ~ (1/K) * sum_{k=1}^{K} { y^k(x) - h^(x) }^2
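A minimal Python sketch of this algorithm, including the out-of-bag bookkeeping. This is illustrative only: the base learner here is a 1-nearest-neighbour regressor rather than a tree, chosen because it fits in one line, and the data are synthetic.

```python
import random
import statistics

def one_nn(train, x):
    """Base learner: 1-nearest-neighbour regression (a stand-in for a tree)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagging_oob(data, B=200, seed=0):
    """Bag the base learner and return the out-of-bag estimate of the MSE."""
    rng = random.Random(seed)
    n = len(data)
    oob_preds = [[] for _ in range(n)]
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap replicate L_b
        boot = [data[i] for i in idx]
        inbag = set(idx)
        for i in range(n):
            if i not in inbag:                       # point i is out of bag for L_b
                oob_preds[i].append(one_nn(boot, data[i][0]))
    # average prediction h^(x_i) over the replicates where i was out of bag
    sq_err = [(y - statistics.mean(p)) ** 2
              for (x, y), p in zip(data, oob_preds) if p]
    return statistics.mean(sq_err)

rng = random.Random(42)
data = [(x, x ** 2 + rng.gauss(0, 0.1)) for x in [i / 20 for i in range(20)]]
print("OOB MSE:", bagging_oob(data))
```

Note that each point is out of bag for roughly B * (1 - 1/n)^n, about B/e, of the replicates, so the OOB average is based on a substantial number of predictions.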
Variants of Re-/Sub-Sampling Methods
- Standard bagging: each of the T subsamples has size n and is created with replacement.
- Sub-bagging: create T subsamples of size α only (α < n).
- No replacement: same as bagging or sub-bagging, but using sampling without replacement.
- Overlap vs. non-overlap: should the T subsamples be allowed to overlap, or should they be disjoint subsets of the training data?
- Other related ideas: permutation; jackknife (leave-one-out); V-fold cross-validation.
- Out-of-bag estimates of the prediction/classification error.
Variance Reduction
- Bagging reduces variance (intuition): if each single classifier is unstable, that is, it has high variance, the aggregated classifier has a smaller variance than a single original classifier.
- The aggregated classifier can be thought of as an approximation to the true average f obtained by replacing the probability distribution p with its bootstrap approximation, which places mass 1/n at each point (xi, yi).
- Bagging works well for unstable learning algorithms; it can slightly degrade the performance of stable ones.
- Unstable learning algorithms: small changes in the training set result in large changes in predictions.
  o Neural networks
  o Decision trees and regression trees
  o Subset selection in linear/logistic regression
- Stable learning algorithms: k-nearest neighbors.
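A quick simulation illustrating this intuition (a Python sketch with synthetic data; the unstable base learner here is a one-split regression stump, a hypothetical stand-in for a full tree). We repeatedly draw fresh training sets, and compare the variance of a single stump's prediction at one query point with the variance of a bagged ensemble's prediction at the same point:

```python
import random
import statistics

def fit_stump(train):
    """Unstable base learner: regression stump minimising squared error."""
    best = None
    for t in sorted(set(x for x, _ in train))[1:]:
        left = [y for x, y in train if x < t]
        right = [y for x, y in train if x >= t]
        ml, mr = statistics.mean(left), statistics.mean(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def draw_sample(rng, n=30):
    """Fresh training set from the truth y = x^2 plus noise."""
    return [(x, x * x + rng.gauss(0, 0.1))
            for x in (rng.uniform(0, 1) for _ in range(n))]

def bagged_predict(train, x, B, rng):
    """Average B bootstrap-trained stumps at the point x."""
    n = len(train)
    preds = []
    for _ in range(B):
        boot = [train[rng.randrange(n)] for _ in range(n)]
        preds.append(fit_stump(boot)(x))
    return statistics.mean(preds)

rng = random.Random(1)
x0 = 0.5   # fixed query point
single = [fit_stump(draw_sample(rng))(x0) for _ in range(200)]
bagged = [bagged_predict(draw_sample(rng), x0, B=25, rng=rng)
          for _ in range(200)]
print(statistics.variance(single), statistics.variance(bagged))
```

Across the 200 replications, the bagged predictions at x0 scatter noticeably less than the single-stump predictions, which is exactly the variance-reduction effect described above.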
Variance Reduction in Regression (Breiman)
Data (xi, yi), i = 1, …, n, are independently drawn from a distribution P. Consider the ideal aggregate estimator f_ag(x) = E_P f*(x). Here x is fixed, and f* is built on a bootstrap dataset of observations (x*_i, y*_i), i = 1, …, n, sampled from P itself rather than from the current data, i.e., from the true sampling distribution. Then we have

E_P { y - f*(x) }^2 = E_P { y - f_ag(x) + f_ag(x) - f*(x) }^2
                    = E_P { y - f_ag(x) }^2 + E_P { f_ag(x) - f*(x) }^2
                    >= E_P { y - f_ag(x) }^2.

Therefore, true population aggregation never increases mean squared error for regression. The above argument does not hold for classification under 0-1 loss, because of the nonadditivity of bias and variance.
Bagging Can Hurt
- Bagging helps when a learning algorithm is good on average but unstable with respect to the training set. But if we bag a stable learning algorithm, we can actually make it worse.
- Bagging almost always helps with regression, but even with unstable learners it can hurt in classification. If we bag a poor and stable classifier, we can make it horrible.
- "The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy." (Breiman, 1996)
- There is theoretical work that quantifies the notion of stability precisely.
R Implementation: the ipred Package

ipredbagg.factor(y, X=NULL, nbagg=25,
    control=rpart.control(minsplit=2, cp=0, xval=0),
    comb=NULL, coob=FALSE, ns=length(y), keepX=TRUE, ...)
ipredbagg.numeric(y, X=NULL, nbagg=25, control=rpart.control(xval=0),
    comb=NULL, coob=FALSE, ns=length(y), keepX=TRUE, ...)
ipredbagg.Surv(y, X=NULL, nbagg=25, control=rpart.control(xval=0),
    comb=NULL, coob=FALSE, ns=dim(y)[1], keepX=TRUE, ...)
## S3 method for class 'data.frame':
bagging(formula, data, subset, na.action=na.rpart, ...)

OPTIONS:
- nbagg: an integer giving the number of bootstrap replications.
- coob: a logical indicating whether an out-of-bag estimate of the error rate (misclassification error, root mean squared error, or Brier score) should be computed. See 'predict.classbagg' for details.
- control: options that control details of the 'rpart' algorithm; see 'rpart.control'. It is wise to set 'xval = 0' in order to save computing time. Note that the default values depend on the class of 'y'.
Random Forests
- A refinement of bagged trees; quite popular.
- The term comes from "random decision forests," first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman's bagging idea with Ho's random subspace method to construct a collection of decision trees with controlled variation.
- The main idea:
  o At each tree split, a random sample of m features is drawn, and only those m features are considered for splitting. Typically m = sqrt(p) or m = log2(p), where p is the total number of features.
  o For each tree grown on a bootstrap sample, the error rate for observations left out of that bootstrap sample is monitored. This is called the out-of-bag error rate.
- Random forests try to improve on bagging by de-correlating the trees. Each tree has the same expectation.
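The feature-subsampling step, the core difference from plain bagging, can be sketched as follows (illustrative Python with synthetic data, not the randomForest package's internals):

```python
import math
import random

def best_split(X, y, features):
    """Exhaustive best squared-error split over the candidate features."""
    best = None
    for j in features:
        for t in sorted(set(row[j] for row in X))[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((yi - ml) ** 2 for yi in left)
                   + sum((yi - mr) ** 2 for yi in right))
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best

def random_forest_split(X, y, rng):
    """Random-forest style split: draw m = sqrt(p) candidate features,
    then search for the best split among those features only."""
    p = len(X[0])
    m = max(1, round(math.sqrt(p)))
    candidates = rng.sample(range(p), m)
    return best_split(X, y, candidates)

rng = random.Random(0)
X = [[rng.random() for _ in range(9)] for _ in range(100)]
y = [float(row[0] > 0.5) for row in X]   # only feature 0 matters
sse, j, t = random_forest_split(X, y, rng)
print("split on feature", j, "at threshold", round(t, 2))
```

Because each split sees a different random subset of features, trees grown this way differ more from one another than plain bagged trees, which is what de-correlates the ensemble.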
Steps in Random Forests
Out-Of-Bag Estimates and Variable Importance
A Graph Illustrating the Decision Boundary
[Figure: decision boundaries of one single tree vs. bagging/random forests]
R Implementation: the randomForest Package
- combine: Combine ensembles of trees
- getTree: Extract a single tree from a forest
- grow: Add trees to an ensemble
- importance: Extract variable importance measure
- outlier: Compute outlying measures
- partialPlot: Partial dependence plot
- plot.randomForest: Plot method for randomForest objects
- predict.randomForest: Predict method for random forest objects
- randomForest: Classification and regression with random forest
- rfImpute: Missing value imputation by randomForest
- treesize: Size of trees in an ensemble
- tuneRF: Tune randomForest for the optimal mtry parameter
- varImpPlot: Variable importance plot
- varUsed: Variables used in a random forest
R Function randomForest

## S3 method for class 'formula':
randomForest(formula, data=NULL, ..., subset, na.action=na.fail)
## Default S3 method:
randomForest(x, y=NULL, xtest=NULL, ytest=NULL, ntree=500,
    mtry=if (!is.null(y) && !is.factor(y)) max(floor(ncol(x)/3), 1)
         else floor(sqrt(ncol(x))),
    replace=TRUE, classwt=NULL, cutoff, strata,
    sampsize=if (replace) nrow(x) else ceiling(.632*nrow(x)),
    nodesize=if (!is.null(y) && !is.factor(y)) 5 else 1,
    importance=FALSE, localImp=FALSE, nPerm=1,
    proximity=FALSE, oob.prox=proximity, norm.votes=TRUE, do.trace=FALSE,
    keep.forest=!is.null(y) && is.null(xtest), corr.bias=FALSE,
    keep.inbag=FALSE, ...)

For unsupervised learning with random forests:
http://www.genetics.ucla.edu/labs/horvath/rfclustering/rfclustering.htm
Options in Function randomForest
- ntree: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
- mtry: Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p), where p is the number of variables in 'x') and regression (p/3).
- replace: Should sampling of cases be done with or without replacement?
- importance: Should the importance of predictors be assessed?
- localImp: Should a casewise importance measure be computed? (Setting this to 'TRUE' will override 'importance'.)
- nPerm: Number of times the OOB data are permuted per tree for assessing variable importance. A number larger than 1 gives a slightly more stable estimate, but is not very effective. Currently only implemented for regression.
For a detailed explanation and examples: http://oz.berkeley.edu/users/breiman/using_random_forests_v3.1.pdf
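The permutation idea behind nPerm can be sketched generically in Python (an illustration of the principle, not the package's internal code; the toy predictor below is hypothetical). The importance of feature j is the increase in error after shuffling column j, which breaks the link between x_j and y while leaving the marginal distribution of x_j intact:

```python
import random

def permutation_importance(predict, X, y, j, rng, nperm=1):
    """Mean increase in squared error after shuffling feature column j,
    in the spirit of OOB permutation importance."""
    mse = lambda Xs: sum((predict(r) - yi) ** 2 for r, yi in zip(Xs, y)) / len(y)
    base = mse(X)
    increases = []
    for _ in range(nperm):
        col = [row[j] for row in X]
        rng.shuffle(col)                     # break the link between x_j and y
        Xperm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        increases.append(mse(Xperm) - base)
    return sum(increases) / nperm            # averaged over nperm permutations

rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [row[0] for row in X]        # the response depends only on feature 0
predict = lambda row: row[0]     # toy predictor that uses only feature 0
imp0 = permutation_importance(predict, X, y, 0, rng, nperm=5)
imp1 = permutation_importance(predict, X, y, 1, rng, nperm=5)
print(imp0, imp1)   # permuting the used feature hurts; the unused one does not
```

In a random forest, the same computation is done per tree on that tree's out-of-bag cases, and the per-tree increases are averaged over the forest.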