The caret Package: A Unified Interface for Predictive Models




The caret Package: A Unified Interface for Predictive Models

Max Kuhn
Pfizer Global R&D, Nonclinical Statistics
Groton, CT
max.kuhn@pfizer.com
February 26, 2014

Motivation

Theorem (No Free Lunch): in the absence of any knowledge about the prediction problem, no model can be said to be uniformly better than any other.

Given this, it makes sense to try a variety of different models and find the one that best fits the data. R has many packages for predictive modeling (aka machine learning) (aka pattern recognition)...

Model Function Consistency

Since there are many modeling packages written by different people, there are some inconsistencies in how models are specified and how predictions are made. For example, many models have only one method of specifying the model (e.g. the formula method only). The table below shows the syntax used to get class probability estimates from several classification models:

  obj Class    Package   predict Function Syntax
  lda          MASS      predict(obj)  (no options needed)
  glm          stats     predict(obj, type = "response")
  gbm          gbm       predict(obj, type = "response", n.trees)
  mda          mda       predict(obj, type = "posterior")
  rpart        rpart     predict(obj, type = "prob")
  Weka         RWeka     predict(obj, type = "probability")
  LogitBoost   caTools   predict(obj, type = "raw", nIter)
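
To make the inconsistency concrete, here is a short illustration; ldaFit and glmFit are hypothetical fitted objects, not objects created in this deck:

## Two packages, two different ways to ask for class probabilities
ldaProbs <- predict(ldaFit)$posterior           # MASS::lda returns a list with $posterior
glmProbs <- predict(glmFit, type = "response")  # stats::glm needs type = "response"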

The caret Package

The caret package was developed to:

- create a unified interface for modeling and prediction (currently 147 different models)
- streamline model tuning using resampling
- provide a variety of helper functions and classes for day-to-day model building tasks
- increase computational efficiency using parallel processing

First commits within Pfizer: 6/2005
First version on CRAN: 10/2007
Website: http://caret.r-forge.r-project.org
JSS Paper: www.jstatsoft.org/v28/i05/paper
Book: Applied Predictive Modeling (AppliedPredictiveModeling.com)

Example Data

SGI has several data sets on their homepage for the MLC++ software at http://www.sgi.com/tech/mlc/. One data set can be used to predict telecom customer churn based on information about their account. Let's read in the data first:

> library(C50)
> data(churn)

Example Data

> str(churnTrain)
'data.frame': 3333 obs. of 20 variables:
 $ state                        : Factor w/ 51 levels "AK","AL","AR",..: 17 36 32 36 ...
 $ account_length               : int 128 107 137 84 75 118 121 147 117 141 ...
 $ area_code                    : Factor w/ 3 levels "area_code_408",..: 2 2 2 1 2 3 ...
 $ international_plan           : Factor w/ 2 levels "no","yes": 1 1 1 2 2 2 1 2 1 2 ...
 $ voice_mail_plan              : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 1 1 2 ...
 $ number_vmail_messages        : int 25 26 0 0 0 0 24 0 0 37 ...
 $ total_day_minutes            : num 265 162 243 299 167 ...
 $ total_day_calls              : int 110 123 114 71 113 98 88 79 97 84 ...
 $ total_day_charge             : num 45.1 27.5 41.4 50.9 28.3 ...
 $ total_eve_minutes            : num 197.4 195.5 121.2 61.9 148.3 ...
 $ total_eve_calls              : int 99 103 110 88 122 101 108 94 80 111 ...
 $ total_eve_charge             : num 16.78 16.62 10.3 5.26 12.61 ...
 $ total_night_minutes          : num 245 254 163 197 187 ...
 $ total_night_calls            : int 91 103 104 89 121 118 118 96 90 97 ...
 $ total_night_charge           : num 11.01 11.45 7.32 8.86 8.41 ...
 $ total_intl_minutes           : num 10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
 $ total_intl_calls             : int 3 3 5 7 3 6 7 6 4 5 ...
 $ total_intl_charge            : num 2.7 3.7 3.29 1.78 2.73 1.7 2.03 1.92 2.35 3.0 ...
 $ number_customer_service_calls: int 1 1 0 2 3 0 3 0 1 0 ...
 $ churn                        : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...

Example Data

Here is a vector of predictor names that we might need later:

> predictors <- names(churnTrain)[names(churnTrain) != "churn"]

Note that the class's leading level is "yes". Many of the functions used here model the probability of the first factor level; in other words, we want to predict churn, not retention.
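
Since the first factor level drives what is being modeled, it is worth verifying the ordering; a minimal check, with a hedged relevel() call in case "yes" were not already first:

levels(churnTrain$churn)
# [1] "yes" "no"
## If needed, make "yes" the first (modeled) level:
churnTrain$churn <- relevel(churnTrain$churn, ref = "yes")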

Data Splitting

A simple, stratified random split is used here:

> allData <- rbind(churnTrain, churnTest)
>
> set.seed(1)
> inTrainingSet <- createDataPartition(allData$churn,
+                                      p = .75, list = FALSE)
> churnTrain <- allData[ inTrainingSet,]
> churnTest  <- allData[-inTrainingSet,]
>
> ## For this presentation, we will stick with the original split

Other functions: createFolds, createMultiFolds, createResample (sketched below)
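
The other splitting functions work similarly; for example, a minimal sketch of createFolds generating stratified 10-fold cross-validation indices:

set.seed(1)
## Returns a list of row indices; returnTrain = TRUE gives the
## training-set rows for each fold instead of the held-out rows
cvFolds <- createFolds(churnTrain$churn, k = 10, returnTrain = TRUE)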

Data Pre-Processing Methods

preProcess calculates values from the training set that can then be applied to any data set (e.g. the training set, the test set, unknowns). Current methods: centering, scaling, the spatial sign transformation, PCA or ICA signal extraction, imputation, Box-Cox transformations and others.

> numerics <- c("account_length", "total_day_calls", "total_night_calls")
> ## Determine means and sd's
> procValues <- preProcess(churnTrain[, numerics],
+                          method = c("center", "scale", "YeoJohnson"))
> ## Use the predict methods to do the adjustments
> trainScaled <- predict(procValues, churnTrain[, numerics])
> testScaled  <- predict(procValues, churnTest[, numerics])

Data Pre-Processing Methods

> procValues

Call:
preProcess.default(x = churnTrain[, numerics], method = c("center", "scale", "YeoJohnson"))

Created from 3333 samples and 3 variables
Pre-processing: centered, scaled, Yeo-Johnson transformation

Lambda estimates for Yeo-Johnson transformation:
0.89, 1.17, 0.93

preProcess can also be called within other functions for each resampling iteration (more in a bit).

Boosted Trees (Original AdaBoost Algorithm)

A method to boost weak learning algorithms (e.g. single trees) into strong learning algorithms. Boosted trees try to improve the model fit over different trees by considering past fits (not unlike iteratively reweighted least squares). The basic tree boosting algorithm:

  Initialize equal weights per sample
  for j = 1 ... M iterations do
    Fit a classification tree using the sample weights (denote the model as f_j(x))
    forall the misclassified samples do
      increase the sample weight
    end
    Save a stage weight (β_j) based on the performance of the current model
  end

Boosted Trees (Original AdaBoost Algorithm)

In this formulation, the categorical response y_i is coded as either {-1, 1} and the model f_j(x) produces values in {-1, 1}. The final prediction is obtained by first predicting with all M trees, then weighting each prediction:

  f(x) = \frac{1}{M} \sum_{j=1}^{M} \beta_j f_j(x)

where f_j is the jth tree fit and β_j is the stage weight for that tree. The final class is determined by the sign of the model prediction. In English: the final prediction is a weighted average of each tree's prediction, and the weights are based on the quality of each tree.
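
As a rough illustration of the algorithm above (not the gbm implementation used later), here is a minimal AdaBoost-style sketch using rpart stumps; the outcome name, the {-1, 1} coding and M are assumptions, and no guard is included for a zero error rate:

library(rpart)

## dat: a data frame with a two-level factor outcome named churn ("yes"/"no")
boostSketch <- function(dat, M = 50) {
  n <- nrow(dat)
  w <- rep(1/n, n)                          # initialize equal weights per sample
  yNum <- ifelse(dat$churn == "yes", 1, -1) # code the response as {-1, 1}
  trees <- vector("list", M)
  beta <- numeric(M)
  for (j in 1:M) {
    fit <- rpart(churn ~ ., data = dat, weights = w,
                 control = rpart.control(maxdepth = 1))
    predNum <- ifelse(predict(fit, dat, type = "class") == "yes", 1, -1)
    miss <- predNum != yNum
    err <- sum(w[miss]) / sum(w)            # weighted error of this stage
    beta[j] <- log((1 - err) / err)         # stage weight from model performance
    w[miss] <- w[miss] * exp(beta[j])       # up-weight the misclassified samples
    trees[[j]] <- fit
  }
  ## The final class for new data is the sign of the weighted average
  ## of the individual tree predictions, as in the formula above
  list(trees = trees, beta = beta)
}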

Boosted Trees Parameters

Most implementations of boosting have three tuning parameters:

- number of iterations (i.e. trees)
- complexity of the tree (i.e. number of splits)
- learning rate (aka "shrinkage"): how quickly the algorithm adapts

Boosting functions for trees in R: gbm in the gbm package, ada in ada, and blackboost in mboost.

Using the gbm Package

The gbm function in the gbm package can be used to fit the model; predict.gbm and other functions are then used to predict and evaluate the model.

> library(gbm)
> # The gbm function does not accept factor response values so we
> # will make a copy and modify the outcome variable
> forGBM <- churnTrain
> forGBM$churn <- ifelse(forGBM$churn == "yes", 1, 0)
>
> gbmFit <- gbm(formula = churn ~ .,          # Use all predictors
+               distribution = "bernoulli",   # For classification
+               data = forGBM,
+               n.trees = 2000,               # 2000 boosting iterations
+               interaction.depth = 7,        # How many splits in each tree
+               shrinkage = 0.01,             # learning rate
+               verbose = FALSE)              # Do not print the details
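
Per the syntax table earlier, predictions from the raw gbm object then require the number of trees; a brief sketch:

## Probability-scale predictions from the gbm fit; n.trees must be supplied
gbmPredRaw <- predict(gbmFit, newdata = forGBM,
                      n.trees = 2000, type = "response")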

Tuning the gbm Model

This code assumes that we know appropriate values for the three tuning parameters. As previously discussed, one approach to model tuning is resampling: we can fit models with different values of the tuning parameters to many resampled versions of the training set and estimate performance based on the hold-out samples. From this, a profile of performance can be obtained across the different gbm models, and an optimal set of values can be chosen from the profile.

Tuning the gbm Model

  foreach resampled data set do
    Hold out samples
    foreach combination of tree depth, learning rate and number of trees do
      Fit the model on the resampled data set
      Predict the hold-outs and save the results
    end
  end
  Calculate the average ROC AUC across the hold-out sets of predictions
  Determine the tuning parameters based on the highest resampled ROC AUC

Model Tuning using train

In a basic call to train, we specify the predictor set, the outcome data and the modeling technique (e.g. boosting via the gbm package):

> gbmTune <- train(x = churnTrain[, predictors],
+                  y = churnTrain$churn,
+                  method = "gbm")
>
> # or, using the formula interface
> gbmTune <- train(churn ~ ., data = churnTrain, method = "gbm")

Note that train uses the original factor input for classification. For binary outcomes, the function models the probability of the first factor level (churn for these data); a numeric outcome would indicate that we are doing regression. One problem is that gbm spits out a lot of information during model fitting.

Passing Model Parameters Through train

We can pass options through train to gbm using the three dots (...). Let's suppress the function's logging using the gbm option verbose = FALSE:

gbmTune <- train(churn ~ ., data = churnTrain,
                 method = "gbm",
                 verbose = FALSE)

Changing the Resampling Technique

By default, train uses the bootstrap for resampling. We'll switch to 5 repeats of 10-fold cross-validation instead, changing the type of resampling via the control function:

ctrl <- trainControl(method = "repeatedcv", repeats = 5)

gbmTune <- train(churn ~ ., data = churnTrain,
                 method = "gbm",
                 verbose = FALSE,
                 trControl = ctrl)
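
Other resampling schemes are requested the same way; a few illustrative examples (the numbers are arbitrary choices):

trainControl(method = "boot", number = 50)  # 50 bootstrap resamples
trainControl(method = "cv", number = 10)    # a single round of 10-fold CV
trainControl(method = "LGOCV", p = 0.9)     # repeated training/test splits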

Using Different Performance Metrics

At this point train will:

  1. fit a sequence of different gbm models,
  2. estimate performance using resampling, and
  3. pick the gbm model with the best performance.

The default metrics for classification problems are accuracy and Cohen's kappa. Suppose we wanted to estimate sensitivity, specificity and the area under the ROC curve instead (and pick the model with the largest AUC). We need to tell train to produce class probabilities, estimate these statistics, and rank models by the ROC AUC.

Using Different Performance Metrics

The twoClassSummary function is defined in caret and calculates the sensitivity, specificity and ROC AUC. Other custom functions can be used too (see ?train and the sketch below).

ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

gbmTune <- train(churn ~ ., data = churnTrain,
                 method = "gbm",
                 metric = "ROC",
                 verbose = FALSE,
                 trControl = ctrl)
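
A custom summary function uses the same signature as twoClassSummary; as a hedged sketch (the name fiveStats is ours), one that reports both the ROC-based statistics and the default accuracy/kappa:

fiveStats <- function(data, lev = NULL, model = NULL) {
  ## 'data' holds the observed and predicted classes plus one
  ## probability column per class level
  c(twoClassSummary(data, lev, model), defaultSummary(data, lev, model))
}
## Then: trainControl(..., summaryFunction = fiveStats)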

Expanding the Search Grid

By default, train uses a minimal search grid: 3 values for each tuning parameter. We might also want to expand the scope of the gbm models to test. Let's look at tree depths from 1 to 7, boosting iterations from 100 to 1,000 and two different learning rates. We can create a data frame with these parameters and pass it to train; the column names should match the tuning parameter names. See ?train for the tunable parameters for each model type.

Expanding the Search Grid

ctrl <- trainControl(method = "repeatedcv", repeats = 5,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

grid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
                    n.trees = seq(100, 1000, by = 50),
                    shrinkage = c(0.01, 0.1))

gbmTune <- train(churn ~ ., data = churnTrain,
                 method = "gbm",
                 metric = "ROC",
                 tuneGrid = grid,
                 verbose = FALSE,
                 trControl = ctrl)

Running the Model

> grid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
+                     n.trees = seq(100, 1000, by = 50),
+                     shrinkage = c(0.01, 0.1))
>
> ctrl <- trainControl(method = "repeatedcv", repeats = 5,
+                      summaryFunction = twoClassSummary,
+                      classProbs = TRUE)
>
> set.seed(1)
> gbmTune <- train(churn ~ ., data = churnTrain,
+                  method = "gbm",
+                  metric = "ROC",
+                  tuneGrid = grid,
+                  verbose = FALSE,
+                  trControl = ctrl)

Model Tuning

train uses as many tricks as possible to reduce the number of model fits (e.g. using sub-models). For RBF SVMs, for example, it uses the kernlab function sigest to analytically estimate the RBF scale parameter. Currently, there are options for 147 models (see ?train for a list). Other features:

- allows user-defined search grids, performance metrics and selection rules
- easily integrates with any parallel processing framework that can emulate lapply (see the sketch below)
- formula and non-formula interfaces
- methods: predict, print, plot, ggplot, varImp, resamples, xyplot, densityplot, histogram, stripplot, ...
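
For example, the resampling loops can be run in parallel by registering a foreach backend before calling train; a minimal sketch using doMC (which appears in this deck's session info), with the core count as an assumption:

library(doMC)
registerDoMC(cores = 4)  # subsequent train() calls use the parallel backend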

Plotting the Results

> ggplot(gbmTune) + theme(legend.position = "top")

[Figure: resampled ROC AUC (repeated cross-validation) versus the number of boosting iterations (250 to 1000), paneled by learning rate (0.01, 0.10) and colored by maximum tree depth (1, 3, 5, 7); AUC values range from roughly 0.86 to 0.92.]

Prediction and Performance Assessment

The predict method can be used to get results for other data sets:

> gbmPred <- predict(gbmTune, churnTest)
> str(gbmPred)
 Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...
> gbmProbs <- predict(gbmTune, churnTest, type = "prob")
> str(gbmProbs)
'data.frame': 1667 obs. of 2 variables:
 $ yes: num 0.0316 0.092 0.3086 0.0307 0.022 ...
 $ no : num 0.968 0.908 0.691 0.969 0.978 ...

Prediction and Performance Assessment

> confusionMatrix(gbmPred, churnTest$churn)
Confusion Matrix and Statistics

          Reference
Prediction  yes   no
       yes  151    8
       no    73 1435

               Accuracy : 0.951
                 95% CI : (0.94, 0.961)
    No Information Rate : 0.866
    P-Value [Acc > NIR] : < 2e-16

                  Kappa : 0.762
 Mcnemar's Test P-Value : 1.15e-12

            Sensitivity : 0.6741
            Specificity : 0.9945
         Pos Pred Value : 0.9497
         Neg Pred Value : 0.9516
             Prevalence : 0.1344
         Detection Rate : 0.0906
   Detection Prevalence : 0.0954
      Balanced Accuracy : 0.8343

Test Set ROC Curve via the pROC Package

> rocCurve <- roc(response = churnTest$churn,
+                 predictor = gbmProbs[, "yes"],
+                 levels = rev(levels(churnTest$churn)))
> rocCurve

Call:
roc.default(response = churnTest$churn, predictor = gbmProbs[, "yes"], levels = rev(levels(churnTest$churn)))

Data: gbmProbs[, "yes"] in 1443 controls (churnTest$churn no) < 224 cases (churnTest$churn yes).
Area under the curve: 0.927

> # plot(rocCurve)

Test Set ROC Curve via the pROC Package

[Figure: the test set ROC curve (sensitivity versus specificity), with two probability cutoffs annotated: 0.500 at specificity 0.994, sensitivity 0.674, and 0.200 at specificity 0.962, sensitivity 0.826.]

Switching To Other Models

It is simple to move between models using train: a different method value is needed, along with any optional changes to preProc and any options to pass through via the three dots (...). If we set the seed to the same value prior to calling train, the exact same resamples are used to evaluate each model.

Switching To Other Models

Minimal changes are needed to fit a different model to the same data:

> ## Using the same seed prior to train() will ensure that
> ## the same resamples are used (even in parallel)
> set.seed(1)
> svmTune <- train(churn ~ ., data = churnTrain,
+                  ## Tell it to fit an SVM model and tune
+                  ## over Cost and the RBF parameter
+                  method = "svmRadial",
+                  ## This pre-processing will be applied to
+                  ## these data and new samples too
+                  preProc = c("center", "scale"),
+                  ## Tune over different values of cost
+                  tuneLength = 10,
+                  trControl = ctrl,
+                  metric = "ROC")

Switching To Other Models

How about:

> set.seed(1)
> fdaTune <- train(churn ~ ., data = churnTrain,
+                  ## Now try a flexible discriminant model
+                  ## using MARS basis functions
+                  method = "fda",
+                  tuneLength = 10,
+                  trControl = ctrl,
+                  metric = "ROC")
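
Because the same seed (and therefore the same resamples) was used for each fit, the models can be compared head-to-head with the resamples method listed earlier; a brief sketch:

cvValues <- resamples(list(GBM = gbmTune, SVM = svmTune, FDA = fdaTune))
summary(cvValues)  # side-by-side resampled ROC, sensitivity and specificity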

Other Functions and Classes

- nearZeroVar: a function to identify predictors that are sparse and highly unbalanced (sketched below)
- findCorrelation: a function to find the optimal set of predictors to remove in order to achieve low pair-wise correlations
- predictors: a class for determining which predictors are included in the prediction equations (e.g. rpart, earth, lars models) (currently 7 methods)
- confusionMatrix, sensitivity, specificity, posPredValue, negPredValue: classes for assessing classifier performance
- varImp: classes for assessing the aggregate effect of a predictor on the model equations (currently 28 methods)
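
A brief sketch of the first two filters, using objects defined earlier (the 0.75 cutoff is an arbitrary choice):

## Column indices of near-zero-variance predictors
nzv <- nearZeroVar(churnTrain[, predictors])
## Columns to drop so remaining pairwise correlations stay below 0.75
highCorr <- findCorrelation(cor(churnTrain[, numerics]), cutoff = 0.75)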

Other Functions and Classes

- knnreg: nearest-neighbor regression
- plsda, splsda: PLS discriminant analysis
- icr: independent component regression
- pcaNNet: nnet:::nnet with an automatic PCA pre-processing step
- bagEarth, bagFDA: bagging with MARS and FDA models
- normalize2Reference: RMA-like processing of Affy arrays using a training set
- spatialSign: a class for transforming numeric data (x' = x / ||x||)
- maxDissim: a function for maximum dissimilarity sampling
- rfe: a class/framework for recursive feature elimination (RFE) with an external resampling step
- sbf: a class/framework for applying univariate filters prior to predictive modeling, with an external resampling step
- featurePlot: a wrapper for several lattice functions (example below)
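
For instance, featurePlot can show class-conditional distributions of the numeric predictors defined earlier:

featurePlot(x = churnTrain[, numerics],
            y = churnTrain$churn,
            plot = "density")  # one density panel per predictor, grouped by class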

Thanks

- Ray DiGiacomo
- R Core
- Pfizer's Statistics leadership for providing the time and support to create R packages
- caret contributors: Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer and the R Core Team

Session Info

R Under development (unstable) (2014-02-14 r65008), x86_64-apple-darwin10.8.0

Base packages: base, datasets, graphics, grDevices, methods, parallel, splines, stats, utils

Other packages: C50 0.1.0-15, caret 6.0-24, class 7.3-9, doMC 1.3.2, e1071 1.6-1, earth 3.2-6, foreach 1.4.1, gbm 2.1, ggplot2 0.9.3.1, iterators 1.0.6, kernlab 0.9-19, knitr 1.5, lattice 0.20-24, mda 0.4-4, plotmo 1.3-2, plotrix 3.5-2, pls 2.4-3, plyr 1.8, pROC 1.6, randomForest 4.6-7, survival 2.37-7

Loaded via a namespace (and not attached): car 2.0-19, codetools 0.2-8, colorspace 1.2-4, compiler 3.1.0, dichromat 2.0-0, digest 0.6.4, evaluate 0.5.1, formatR 0.10, grid 3.1.0, gtable 0.1.2, highr 0.3, labeling 0.2, MASS 7.3-29, munsell 0.4.2, nnet 7.3-7, proto 0.3-10, RColorBrewer 1.0-5, Rcpp 0.11.0, reshape2 1.2.2, scales 0.2.3, stringr 0.6.2, tools 3.1.0

This presentation was created with knitr at 23:23 on Wednesday, Feb 26, 2014.