Gerry Hobbs, Department of Statistics, West Virginia University


Decision Trees as a Predictive Modeling Method

Abstract
Predictive modeling has become an important area of interest in tasks such as credit scoring, target marketing, medical diagnosis and fraud detection. The SAS System provides many tools that may be used to predict both continuous and categorical targets. In this presentation I will limit myself to prediction algorithms based on recursive partitioning, commonly called decision trees. Several widely used algorithms exist and are known by names such as CART, C5.0 and CHAID. Decision trees divide the data into successively smaller subgroups based on empirically derived associations between the response (target) and one or more predictor variables. In that effort, observations are sorted into bins based on the value(s) of the predictor(s). Criteria must be established for each predictor to determine which observations go in which bins so as to maximize the association with the response, and then to decide which of the predictors has the best association with the target variable in the particular subgroup being divided. The discussion will describe ways in which those decisions can be made and then ways in which the predictive algorithms, thus derived, can be validated. Simple decision trees often do not perform well in comparison with other predictive modeling methods (neural networks, regression, etc.). Their performance can be improved in a number of ways. We will discuss some methods, such as bootstrapping, that often produce better results than a single initial tree.

Introduction
When many researchers, marketers, investigators or other analysts think of prediction, they think in terms of classical (OLS) regression analysis. Indeed, regression in its many and varied guises continues to be both widely and successfully used in a large number of prediction problems.
A very different approach to prediction, called the decision tree, has become increasingly popular in recent years. Just as regression may be applied to problems in which the response (target) variable is continuous or categorical (logistic regression), decision trees may also be applied to categorical or continuous response problems. Similarly, either methodology may be applied to problems in which the candidate predictors are continuous, categorical or some mixture of the two. Fitting a decision tree is an algorithmic process that leads to a solution typically displayed in the form shown below as Figure 1. The data for Figure 1 come from an artificially created 10,000-observation data set with a binary response divided into 6,319 0s and 3,681 1s. Five hundred of the 1s were selected at random, as were 500 of the 0s, in a process sometimes known as separate sampling or, perhaps more commonly, as stratified random sampling. The resulting data set is obviously enriched in terms of the proportion of 1s (50%, compared with 36.8% in the original data).

[Figure 1: Portion of a decision tree grown on the 1,000-observation training sample. The root node (1,000 observations) is split on X4 at .65; the left child (X4 < .65) is split on X1 (X1 < 3 vs. X1 > 3) and the right child on X4 at .81.]

Decision tree displays similar to that shown in Figure 1 are available in both SAS/Enterprise Miner software and JMP software. In the very small portion of a much larger decision tree shown above, there are seven candidate predictor variables, X1 through X7, and a binary target. A small proportion (five observations) of the 1,000-observation training data set is displayed below. X1 and X2 are ordinal categorical variables, while X5 and X7 are (unordered) nominal variables. X3 and X4 are continuous, while X6 is a count variable that might be considered ordinal.

The goal of this prediction method, and indeed of prediction generally, is to use the set of predictor variables to form groups that are as homogeneous as possible with respect to the target variable. That is to say, the ideal would be to form groups based on the values of the predictor variables in such a way that within each leaf either all target values are one or all are zero. At each step in the partitioning process our goal is to maximize node purity (equivalently, to minimize within-node variability). In other words, we want each split to separate the target values into groups of zeros and groups of ones as well as it is possible to do so.

[Table: five observations from the training data, with columns X1, X2, X3, X4, X5, X6, X7 and Target.]

Three of the predictors, X1, X4 (twice) and X6 (twice), are involved in the five binary splits needed to create the six-leaf tree shown. In this model the predicted value of any current or future observation depends on just those three values. Since the target is binary, one possible goal is to estimate the probability that Target=1 for any set of predictor values. If, for instance, X1=4, X4=0.5 and X6=0, we get the predicted probability that Target=1 by following the path from the root node as follows. Because X4=0.5 is less than .65, we go from the root node to the node below and to the left. From there we go to the node below and to the right because X1=4 is greater than 3. Finally, we choose the leaf below and to the left because X6=0 is less than 1. (Terminal nodes are called leaves.) Of the 116 observations that fall into that leaf, 79 have Target=1 and 37 have Target=0. The proportion with Target=1 is, then, 79/116 = .681, and that can serve as our estimate. Of course, there are estimators of proportions other than the sample proportion, and they could equally well be used. We can also think of our prediction as the decision Target=1, since, according to our estimate, that is a more likely event than Target=0.

How we find splits
Consider now the extremely simple case where we have exactly three observations with just one predictor variable and a binary (0,1) target. Assume further that the predictor takes only the three values A, B and C displayed in the table that follows.

predictor   target
A           1
B           0
C           1

Allowing only binary splits, there are 3 ways to form 2 groups from A, B and C: A vs. B,C; B vs. A,C; and C vs. A,B. Arranging the data into the three possible 2x2 contingency tables associating the predictor and target variables, we display the associated Pearson chi-square statistics as follows.

Split        Group   target=0   target=1   Chi-square
A vs. B,C    A       0          1          0.75
             B,C     1          1
B vs. A,C    B       1          0          3.00
             A,C     0          2
C vs. A,B    C       0          1          0.75
             A,B     1          1

The largest value of the Pearson chi-square statistic, 3.00, results from placing A and C in one node and B in the other. That suggests that the groups formed as A,C vs. B are more closely associated with the target outcomes than either of the other two possibilities.

Splitting criteria other than the Pearson chi-square statistic can certainly be used. The likelihood ratio chi-square is another obvious choice, and the Gini coefficient is probably the most popular. Please note that we are not using any of these as a test statistic. Statistical significance is not an important issue at the moment. Indeed, one can argue that in pure prediction problems it is not generally an important consideration at all.

Now consider the case where the predictor is either an ordinal categorical variable or is continuous. In fact, the big distinction in splitting is whether the predictor is at least ordinal (ordinal or continuous) as opposed to nominal, because ordinal and continuous predictors are treated in the same way. Specifically, when the data are at least ordinal, splits must respect the ordinal nature of the predictor. In other words, a numeric predictor would not be divided so that 3 and 8 were in one group while 4 and 6 were in another. Again, consider a binary target, but this time let the predictor take on the ordered values A, B, C and D in the data below.

predictor   target
A           0
B           0
C           1
D           1

There are 7 possible ways to split the letters A, B, C and D into two groups, but only three of the splits respect the imposed order structure: A | BCD, AB | CD and ABC | D. The others, for instance AC | BD, place non-contiguous values in the same group. (A and C are non-contiguous because B lies between them.) Displaying the 2x2 tables and associated chi-square statistics as before, we get the following. Note that we could substitute A=1, B=2, C=3 and D=4 into this example and use it to demonstrate splitting on a continuous predictor.

Split        Group   target=0   target=1   Chi-square
A | BCD      A       1          0          1.33
             BCD     1          2
AB | CD      AB      2          0          4.00
             CD      0          2
ABC | D      ABC     2          1          1.33
             D       0          1

Clearly, the AB vs. CD split produces the largest value of the Pearson chi-square statistic, and so, at least by that single criterion, it would be selected as the chosen split. If a continuous or ordinal predictor has five distinct values, then the number of order-consistent splits is four instead of fifteen. If a nominal variable has even ten distinct values, then the number of possible binary splits is 511. The more candidate splits there are to search, the better the chance of obtaining a large chi-square value by chance alone; therefore Bonferroni and other adjustments have been suggested. Indeed, the number of possible splits can be enormous. For a categorical variable that has eight levels there are 2^7 - 1 = 127 possible binary splits and 4,139 possible splits into 2 through 8 groups.
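The split search just described can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by SAS/Enterprise Miner, JMP, CART or CHAID: it scores every candidate binary split with the Pearson chi-square statistic and keeps the largest, enumerating all 2^(k-1) - 1 splits for a nominal predictor but only the k - 1 prefix splits for an ordered one. All function names are hypothetical, and no multiplicity adjustment is applied. The last helper counts the ways to put k levels into exactly g groups (Stirling numbers of the second kind), which reproduces the 127 and 4,139 figures quoted above.

```python
from itertools import combinations
from math import comb, factorial

def pearson_chi2(table):
    """Pearson chi-square for a table of observed counts given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def split_table(data, left):
    """2x2 counts: [zeros, ones] for levels in `left`, then for all other levels."""
    counts = {True: [0, 0], False: [0, 0]}
    for level, target in data:
        counts[level in left][target] += 1
    return [counts[True], counts[False]]

def nominal_splits(levels):
    """All 2**(k-1) - 1 binary splits of unordered levels; the first level always goes left."""
    first, rest = levels[0], levels[1:]
    for r in range(len(rest)):  # r other levels join `first`; the right side stays non-empty
        for extra in combinations(rest, r):
            yield {first, *extra}

def ordinal_splits(levels):
    """The k - 1 order-consistent splits of sorted levels: a prefix vs. the rest."""
    return [set(levels[:i]) for i in range(1, len(levels))]

def best_split(data, candidate_lefts):
    """Return (chi-square, sorted left-group levels) for the largest chi-square."""
    return max((pearson_chi2(split_table(data, left)), sorted(left)) for left in candidate_lefts)

def stirling2(n, g):
    """Number of ways to partition n distinct levels into exactly g non-empty groups."""
    return sum((-1) ** j * comb(g, j) * (g - j) ** n for j in range(g + 1)) // factorial(g)

stat, left = best_split([("A", 1), ("B", 0), ("C", 1)], nominal_splits(["A", "B", "C"]))
print(left, round(stat, 2))  # ['A', 'C'] 3.0
stat, left = best_split([("A", 0), ("B", 0), ("C", 1), ("D", 1)], ordinal_splits(["A", "B", "C", "D"]))
print(left, round(stat, 2))  # ['A', 'B'] 4.0
print(stirling2(8, 2), sum(stirling2(8, g) for g in range(2, 9)))  # 127 4139
```

Fixing the first level on the left side is what keeps each nominal split from being counted twice (A vs. B,C is the same split as B,C vs. A).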
P-values may be associated with the chi-square statistics (here they all have one degree of freedom), and those p-values may be adjusted for the multiplicity of splits considered for any particular variable. Without that adjustment, variables with many levels would be favored over those with few. In our case, of course, the best split is just the one with the smallest p-value. Again we emphasize that the p-value need not be understood as a test of significance in order to use it as a splitting criterion.

In the situation where the response variable is continuous, the goal of node purity becomes one of minimizing the variability of the response within the groups formed by the split, that is, the within-group variance. We can consider the result of any possible split as an analysis of variance problem with two or more groups formed by the split. For a fixed number of groups, node purity is maximized when the error sum of squares (SS error) is minimized. Equivalently, of course, that amounts to maximizing SS groups, the F statistic or R².
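For a continuous target the same search can be sketched with ANOVA quantities. The Python below is a hypothetical illustration, not any product's implementation: it enumerates every way to put the levels of a nominal predictor into exactly g groups, keeps the grouping with the smallest SS error (equivalently, the largest F for that g), and reports its F. It uses the four-observation example discussed in the next paragraph (levels a, b, c, d with responses 1, 2, 4, 6). P-values and multiplicity adjustments are not computed here; if SciPy is available, `scipy.stats.f.sf(F, g - 1, n - g)` gives the unadjusted upper-tail p-value.

```python
def partitions(items, g):
    """Yield every partition of the list `items` into exactly g non-empty groups."""
    if g < 1 or len(items) < g:
        return
    if g == 1:
        yield [list(items)]
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest, g - 1):  # `first` forms a group on its own
        yield [[first]] + p
    for p in partitions(rest, g):  # or `first` joins one of the existing groups
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def ss_error(groups, y):
    """Within-group sum of squares of the responses y[level]."""
    total = 0.0
    for grp in groups:
        vals = [y[level] for level in grp]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

def anova_f(groups, y):
    """One-way ANOVA F statistic with g - 1 and n - g degrees of freedom."""
    n, g = len(y), len(groups)
    grand = sum(y.values()) / n
    ss_total = sum((v - grand) ** 2 for v in y.values())
    sse = ss_error(groups, y)
    if sse == 0:
        return float("inf")
    return ((ss_total - sse) / (g - 1)) / (sse / (n - g))

def best_partition(y, g):
    """For a fixed number of groups g, the grouping with the smallest SS error (largest F)."""
    best = min(partitions(sorted(y), g), key=lambda p: ss_error(p, y))
    return best, anova_f(best, y)

y = {"a": 1, "b": 2, "c": 4, "d": 6}
for g in (2, 3):
    groups, f = best_partition(y, g)
    print(g, sorted(map(sorted, groups)), round(f, 2))
# prints F = 9.8 for (a,b) vs. (c,d) and F = 14.25 for (a,b), (c), (d)
```

Because F for the two-way and three-way splits has different degrees of freedom, comparing the groupings across g requires the p-values rather than the raw F values.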
A p-value can be associated with these calculations in the usual way, and that gives us a way to compare splits of different sizes directly, i.e., two-way and three-way splits, as the following example illustrates. Suppose a categorical predictor takes on the values a, b, c, d in a data set with only four observations, and the corresponding responses are 1, 2, 4, 6. There are seven ways to split the predictor values into two groups. Placing d in one group and a, b, c in the other leads to an ANOVA F value of 4.32 (p = .173), while placing a, b in one group and c, d in the other leads to F = 9.80 (p = .089). That second choice maximizes F over all two-way splits and, because all such splits result in an F that nominally has 1 and 2 degrees of freedom, also minimizes the p-value. Therefore (a,b) vs. (c,d) is the best two-way split. Among three-way splits we must put two values in one group and one in each of the other two groups. If we put c, d together, the resulting F is 3.19 (p = .368). The best of the three-way splits results when a and b are grouped together. For that split, F = 14.25 (p = .184). Note that, while 14.25 is greater than the F (9.80) from the best two-way split, the associated p-value is larger than that of the best two-way split. That occurs because the three-way split yields an F with different degrees of freedom (2 and 1) than the two-way split (1 and 2). In a real prediction problem, of course, there would be several candidate predictors, so at any point we would have to find the best split for each of the candidates and then choose the best of the best to determine the actual splitting variable. In all of this the p-value is the common currency: smaller adjusted p-values indicate better splits. The adjustments are too complex to go into here but mainly relate to the number of possible splits considered for each candidate.

Stopping Tree Growth
In certain instances a tree can be grown until each terminal node contains only a single observation.
Each terminal node is then perfectly pure with respect to the target. Doing so would create a vastly overfitted model. That is tantamount to fitting a high-degree polynomial, a super-flexible spline function or some other overly complex model to a small data set. The problem, of course, is that while the various twists and turns in the fitted function help to fit the given data set, those random complexities are most unlikely to be replicated in any new data set from the same or a similar source. There are a couple of things we can do to avoid overfitting. The first has to do with limiting the growth of the tree in the first place. The second has to do with pruning the tree back to a simpler form after it has been fully grown. Even when using large data sets, the number of observations in some or all nodes will become small if you move far enough down the tree. With the smaller counts, the split chi-square values become proportionately smaller, and so the p-values become correspondingly larger. In addition, certain p-value adjustments are made, related to what can roughly be called multiple comparisons. Those adjustments become larger as you move down the tree. At some threshold, perhaps based on a p-value but not necessarily 0.05, we usually choose to stop growing the tree. Other considerations, such as a minimum leaf size or maximum depth, may also be involved in decisions to stop growing the tree.

There are various strategies for tree growth. One of the most popular has been labeled CART (an acronym for Classification And Regression Trees). In that and some other strategies the goal is to overfit the data, with a view toward using another data set to prune the tree back to a more parsimonious size. On the other hand, CHAID (Chi-square Automatic Interaction Detector) is an algorithm that relies on stopping the growth of trees before overfitting occurs.

Pruning the Tree
The processes we described earlier are meant to be applied to the training data and to find what has sometimes been called a maximal tree. The idea of a maximal tree is to establish a somewhat overfitted tree that can be the basis for a series of steps in which the tree may be pruned back to a simpler form. Another data set, ordinarily constructed to contain the same proportions of the binary target outcomes, is held back for the purposes of validation. As the tree is grown (when we limit ourselves to binary splits, one additional leaf at a time), we form a series of trees: first one with two leaves, then one with three leaves, then one with four leaves, and so on. Each of those trees may be thought of as a prediction model, and each of them may be applied to the validation data set. Each model in the sequence can then be assessed to see how well it fits the validation data. Any of a number of assessment criteria may be used in comparing the series of prediction models. If our prediction takes the form of a decision, say, to contact a person or to ignore them, then perhaps the most obvious choice is to assess the models according to accuracy, where accuracy is simply the proportion of observations in the validation data set that are correctly predicted.
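The selection step can be sketched as follows. This is a simplified illustration, not CART's cost-complexity pruning: it assumes the nested sequence of trees has already been grown and that each tree is available as a prediction function. The name `select_subtree`, the toy trees and the toy validation data are all hypothetical. Each tree is scored by validation accuracy and, as an assumed tie-breaking rule, the smaller tree wins a tie.

```python
def accuracy(predict, X, y):
    """Proportion of validation observations whose predicted class matches the target."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def select_subtree(sequence, X_valid, y_valid):
    """Choose from a nested sequence of trees the one with the best validation accuracy.

    `sequence` holds (n_leaves, predict) pairs, one per tree in the growth sequence;
    ties are broken in favor of the tree with fewer leaves.
    """
    scored = [(accuracy(pred, X_valid, y_valid), n_leaves) for n_leaves, pred in sequence]
    best_acc, best_leaves = max(scored, key=lambda s: (s[0], -s[1]))
    return best_leaves, best_acc

# Toy growth sequence on one input x (hypothetical trees, not from the paper).
sequence = [
    (1, lambda x: 1),                    # root only: one prediction for everyone
    (2, lambda x: int(x >= 0.5)),        # one split
    (3, lambda x: int(0.5 <= x < 0.7)),  # an extra split that overfits
]
X_valid = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
y_valid = [0, 0, 0, 1, 1, 1]
print(select_subtree(sequence, X_valid, y_valid))  # (2, 1.0)
```

Swapping `accuracy` for another assessment function (profit, lift, misclassification cost) leaves the selection logic unchanged.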
Other criteria may be, and often are, more appropriate for specific tasks. We can then choose the prediction model that best fits the validation data according to whatever assessment measure we have selected.

Improving Performance
Decision trees are useful tools for fitting noisy data. They are easy to explain to people who may not be comfortable with mathematics, and they do, in many ways, reflect the mindset in which many humans naturally approach the task of prediction. It is also no small point that they can handle missing values of the predictor variables in a direct and nearly seamless manner (a point not discussed here). Unfortunately, they often don't yield predictions that are as precise as we might prefer and, more importantly, they are often outperformed by methodologies like regression, neural networks and some other less well-known techniques. One reason they don't always predict well is that they are multivariate step functions and, as such, they are discontinuous. Because the fitted function lacks smoothness, observations that are very close together in the input space may be assigned substantially different predicted values, and the topology of the predictor space may be highly unstable. Some methods have been developed that can mitigate these problems to a large degree.

Ensemble is a description given to a general class of models in which the final predictions are averages of the predictions made by other models. Bagging and boosting are two widely used ensemble methodologies. Ensemble models can be derived for almost any set of models. Here we focus on ensembles formed from tree models.

Random Forests constitute one successful strategy that combines information from many trees. The process involves selecting several (n_t) bootstrap samples, each of N observations drawn with replacement from the original N observations. At each splitting opportunity (node) we select a subset of m << M inputs at random from among the M input variables available, and the best split is sought among only those m inputs. We grow maximal trees in the sense that there is no pruning, although growth may be limited, for instance, by specifying a minimum tree size or some threshold for the Gini statistic. If the best available split fails to meet the threshold, we cease splitting. We then repeat that process for each bootstrap sample. To predict (score) any observation we pass it through each of the trees and average the predictions over all trees. If the prediction simply consists of choosing a (categorical) outcome, then each tree casts a vote and the prediction is the winner of that vote. For continuous targets, each observation is passed through each tree to produce a numerical prediction, and the predictions from the many trees are averaged to find the final prediction of the target value. Somewhere between those two ideas, for a categorical response, you can average the predicted probabilities for an observation and use the averaged value to predict P(Target=1) for that observation. No validation data set is required when using this approach to modeling.
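The Random Forests recipe above can be made concrete with a small pure-Python sketch. It is an illustration under stated assumptions, not Breiman and Cutler's implementation or any SAS procedure: a binary 0/1 target, numeric inputs only, Gini-based binary splits, a hypothetical `min_leaf` stopping rule and toy data. Each tree is grown without pruning on a bootstrap sample of N draws with replacement, only m randomly chosen inputs are searched at each node, and the forest scores an observation either by majority vote or by averaging the trees' leaf probabilities. The `oob_error` helper scores each observation with only the trees whose bootstrap sample missed it, which is the out-of-bag idea described in the next paragraph.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 targets."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def grow(rows, y, idx, m, min_leaf, rng):
    """Grow an unpruned tree on observations idx (bootstrap duplicates allowed).

    Only m randomly chosen inputs are searched at each node; splitting stops when
    the node is pure, too small, or no split lowers the weighted Gini impurity.
    """
    labels = [y[i] for i in idx]
    node_impurity = len(idx) * gini(labels)
    leaf = {"p": sum(labels) / len(labels)}  # leaf estimate of P(Target=1)
    if node_impurity == 0.0 or len(idx) < 2 * min_leaf:
        return leaf
    best = None
    for j in rng.sample(range(len(rows[0])), m):  # random subset of m inputs
        values = sorted({rows[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i in idx if rows[i][j] < cut]
            right = [i for i in idx if rows[i][j] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = len(left) * gini([y[i] for i in left]) + len(right) * gini([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, cut, left, right)
    if best is None or best[0] >= node_impurity:
        return leaf
    _, j, cut, left, right = best
    return {"j": j, "cut": cut,
            "lo": grow(rows, y, left, m, min_leaf, rng),
            "hi": grow(rows, y, right, m, min_leaf, rng)}

def prob(tree, x):
    """Follow x from the root to a leaf and return that leaf's P(Target=1)."""
    while "p" not in tree:
        tree = tree["lo"] if x[tree["j"]] < tree["cut"] else tree["hi"]
    return tree["p"]

def random_forest(rows, y, n_trees=50, m=1, min_leaf=2, seed=0):
    """Fit n_trees trees, each on a bootstrap sample of N draws with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest, oob = [], []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        forest.append(grow(rows, y, boot, m, min_leaf, rng))
        oob.append(set(range(n)) - set(boot))  # out-of-bag observations for this tree
    return forest, oob

def predict_avg(trees, x):
    """Average the trees' predicted probabilities to estimate P(Target=1)."""
    return sum(prob(t, x) for t in trees) / len(trees)

def predict_vote(trees, x):
    """Each tree votes for its more likely class; the majority wins."""
    return Counter(int(prob(t, x) >= 0.5) for t in trees).most_common(1)[0][0]

def oob_error(forest, oob, rows, y):
    """Misclassification rate, scoring each observation only with trees that never saw it."""
    wrong = scored = 0
    for i, x in enumerate(rows):
        trees = [t for t, out in zip(forest, oob) if i in out]
        if trees:
            scored += 1
            wrong += int(int(predict_avg(trees, x) >= 0.5) != y[i])
    return wrong / scored if scored else float("nan")

rows = [(i / 20,) for i in range(20)]  # toy data: one input on [0, 0.95]
y = [int(x > 0.5) for (x,) in rows]
forest, oob = random_forest(rows, y)
print(predict_vote(forest, (0.1,)), predict_vote(forest, (0.9,)), predict_avg(forest, (0.9,)))
print(oob_error(forest, oob, rows, y), (1 - 1 / 20) ** 20, math.exp(-1))
```

With a single input, m = 1 leaves nothing to sample; with M inputs, m near the square root of M is a common default for classification, though that choice is an assumption here rather than something the paper specifies. For N = 20, (1 - 1/20)^20 is about 0.358, already close to 1/e.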
Although the proportion of observations not selected in a bootstrap sample will vary in real problems with finite samples, the probability that a given observation is left out of a bootstrap sample of size N is (1 - 1/N)^N, which approaches 1/e ≈ 37% as N grows. Those observations are usually referred to as the OOB (out-of-bag) data, and they are usually used to estimate the errors of prediction.

Contact Information: Gerry Hobbs may be reached at

SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Random Forest is a registered trademark of Leo Breiman and Adele Cutler.
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 OneWay ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationSTATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Webbased Analytics Table
More informationDecisionTree Learning
DecisionTree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: TopDown Induction of Decision Trees Numeric Values Missing Values
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationThe Predictive Data Mining Revolution in Scorecards:
January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms
More informationData quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
More informationCourse Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationData mining techniques: decision trees
Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39
More informationECLT5810 ECommerce Data Mining Technique SAS Enterprise Miner  Regression Model I. Regression Node
Enterprise Miner  Regression 1 ECLT5810 ECommerce Data Mining Technique SAS Enterprise Miner  Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous
More informationClassification/Decision Trees (II)
Classification/Decision Trees (II) Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Right Sized Trees Let the expected misclassification rate of a tree T be R (T ).
More informationAgenda. Mathias Lanner Sas Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Beslutsträd och andra prediktiva modeller
Agenda Introduktion till Prediktiva modeller Beslutsträd Beslutsträd och andra prediktiva modeller Mathias Lanner Sas Institute Pruning Regressioner Neurala Nätverk Utvärdering av modeller 2 Predictive
More informationInsurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics  analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationWhy Ensembles Win Data Mining Competitions
Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:
More informationEfficiency in Software Development Projects
Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University aneeshchinubhai@gmail.com Abstract A number of different factors are thought to influence the efficiency of the software
More informationPredictive Modeling of Titanic Survivors: a Learning Competition
SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224
More informationModeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector  The case CEMIG
Paper 34062015 Modeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector  The case CEMIG Sérgio Henrique Rodrigues Ribeiro, CEMIG; Iguatinan
More informationDescriptive Statistics
Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationIBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
More informationAutomated Statistical Modeling for Data Mining David Stephenson 1
Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires
More informationPAKDD 2006 Data Mining Competition
PAKDD 2006 Data Mining Competition Date Submitted: February 28 th, 2006 SAS Enterprise Miner, Release 4.3 Team Members Bhuvanendran, Aswin Bommi Narasimha, Sankeerth Reddy Jain, Amit Rangwala, Zenab Table
More informationTHE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS XIAO CHENG. (Under the Direction of Jeongyoun Ahn)
THE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS by XIAO CHENG (Under the Direction of Jeongyoun Ahn) ABSTRACT Big Data has been the new trend in businesses.
More informationSilvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spsssa.com
SPSSSA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spsssa.com SPSSSA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationDecision Trees for Predictive Modeling
Decision Trees for Predictive Modeling Padraic G. Neville SAS Institute Inc. 4 August 1999 What a Decision Tree Is...................2 What to Do with a Tree................... 3 Variable selection Variable
More informationUsing Control Groups to Target on Predicted Lift:
Using Control Groups to Target on Predicted Lift: Building and Assessing Uplift Models Nicholas J. Radcliffe Portrait Software The Smith Centre The Fairmile HenleyonThames Oxfordshire RG9 6AB UK Department
More informationA Comparison of Variable Selection Techniques for Credit Scoring
1 A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia Email:
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationA Property and Casualty Insurance Predictive Modeling Process in SAS
Paper 114222016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly
More informationWhat is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling
MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk 1 Aims To introduce the basic concepts of data mining
More informationChapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
More informationBenchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationStartup Companies Predictive Models Analysis. Boyan Yankov, Kaloyan Haralampiev, Petko Ruskov
Startup Companies Predictive Models Analysis Boyan Yankov, Kaloyan Haralampiev, Petko Ruskov Abstract: A quantitative research is performed to derive a model for predicting the success of Bulgarian startup
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
More informationClassification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
More informationResearch Methods & Experimental Design
Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and
More informationENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA
ENSEMBLE DECISION TREE CLASSIFIER FOR BREAST CANCER DATA D.Lavanya 1 and Dr.K.Usha Rani 2 1 Research Scholar, Department of Computer Science, Sree Padmavathi Mahila Visvavidyalayam, Tirupati, Andhra Pradesh,
More informationCollege Readiness LINKING STUDY
College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)
More informationPLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split
More informationVariable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal Bank of Scotland, Bridgeport, CT
Variable Selection in the Credit Card Industry Moez Hababou, Alec Y. Cheng, and Ray Falk, Royal ank of Scotland, ridgeport, CT ASTRACT The credit card industry is particular in its need for a wide variety
More information