# Gerry Hobbs, Department of Statistics, West Virginia University

## Transcription

Decision Trees as a Predictive Modeling Method

Gerry Hobbs, Department of Statistics, West Virginia University

### Abstract

Predictive modeling has become an important area of interest in tasks such as credit scoring, target marketing, medical diagnosis and fraud detection. The SAS System provides many tools that may be used in the prediction of both continuous and categorical targets. In this presentation I will limit myself to prediction algorithms based on recursive partitioning, commonly called decision trees. Several widely used algorithms exist and are known by names such as CART, C5.0 and CHAID. Decision trees involve splitting the data into groups by successively dividing the data into subgroups based on empirically derived associations between the response (target) and one or more predictor variables. In that effort observations are sorted into bins based on the value(s) of the predictor(s). Criteria must be established for each predictor to determine which observations go in which bins in such a way as to maximize the association with the response, and then to decide which of the predictors has the best association with the target variable in the particular subgroup being divided. The discussion will describe ways in which those decisions can be made and then ways in which the predictive algorithms, thus derived, can be validated. Simple decision trees often do not perform well in comparison with other predictive modeling methods (neural networks, regression, etc.). Their performance can be improved in a number of ways. We will discuss some methods, such as bootstrapping, that often produce improved results compared to initial results.

### Introduction

When many researchers, marketers, investigators or other analysts think of prediction they think in terms of classical (OLS) regression analysis. Indeed, regression in its many and varied guises continues to be both widely and successfully used in a large number of prediction problems.
A very different approach to prediction, called the decision tree, has become increasingly popular in recent years. Just as the regression approach may be applied to problems in which the response (target) variable is continuous or categorical (logistic regression), decision trees may also be applied to categorical or continuous response problems. In a similar way either methodology may be applied to problems in which the candidate predictors are continuous, categorical or some mixture of the two.

The process of fitting a decision tree is an algorithm that leads to a solution that is typically displayed in the form shown below as Figure 1. The data for Figure 1 come from an artificially created 10,000-observation data set that contained a binary response divided as 6,319 0s and 3,681 1s. Five hundred of the 1s were selected at random, as were 500 of the 0s, in a process sometimes known as separate sampling or, perhaps even more commonly, as stratified random sampling. The resulting data set is obviously enriched in terms of the proportion of 1s (50%) as compared to the original population.

[Figure 1: part of a decision tree grown on the 1,000-observation training sample. The root node is split on X4 at .65; the two resulting nodes are split on X1 at 3 and on X4 at .81. The observation counts for the individual nodes are not legible in this transcription.]

Decision tree displays similar to that shown in Figure 1 are available in both SAS/Enterprise Miner software and in JMP software. In the very small portion of a much larger decision tree shown above there are seven candidate predictor variables, X1 through X7, and a binary target. A small proportion, five observations, of the 1,000-observation training data set is displayed below. X1 and X2 are ordinal categorical variables while X5 and X7 are (unordered) nominal variables. X3 and X4 are continuous while X6 is a count variable that might be considered ordinal.

[Table: five sample observations with columns X1, X2, X3, X4, X5, X6, X7 and Target. The entries that survive for the five rows are: B 0 Blue; A 0 Red; C 3 Blue; B 0 Blue; C 1 Red 1.]

The goal of this prediction method and, indeed, of prediction generally, is to use the set of predictor variables to form groups that are as homogeneous as possible with respect to the target variable. That is to say that the ideal would be to form groups based on the values of the predictor variables in such a way that within each leaf either all target values are one or all target values are zero. At each step in the partitioning process our goal is to maximize node purity; minimizing within-node variability is an equivalent expression. In other words, we want each split to separate the target values into groups of zeros and groups of ones as well as it is possible to do so.

Three of the predictors, X1, X4 (twice) and X6 (twice), are involved in the five binary splits necessary to create the six-leaf tree shown. In this model the predicted value of any current or future observation depends on just those three values. Since the target is binary, one possibility is that the goal is to estimate the probability that Target=1 for any set of predictors. If, for instance, X1=4, X4=0.5 and X6=0 we would get the predicted probability that Target=1 by following the path from the root node as follows. We go from the root node to the node below and to the left because X4=0.5 is < 0.65. From there we go to the node below and to the right because X1=4 is > 3. Finally, we choose the leaf below and to the left because X6=0 is < 1. Terminal nodes are called leaves. Of the 116 observations that fall into that leaf, 79 have Target=1 and 37 have Target=0. The proportion where Target=1 is, then, 79/116 = .681 and that can serve as our estimate. Of course, there are other estimators of proportions besides the sample proportion and they could equally well be used. We can also think of our prediction as the decision Target=1, since that is the more likely event, according to our estimate, than Target=0.

### How we find splits

Consider now the extremely simple case where we have exactly three observations with just one predictor variable and a binary (0,1) target. Assume further that the predictor assumes only the three values A, B and C displayed in the table that follows.

| predictor | target |
|---|---|
| A | 1 |
| B | 0 |
| C | 1 |

Allowing only binary splits, there are 3 ways to form 2 groups from A, B and C. They are A vs. B,C; B vs. A,C; and C vs. A,B. Arranging the data into the three possible 2x2 contingency tables associating the predictor and target variables, we display the associated Pearson Chi-square statistic as follows.

| Split | Group | Target = 0 | Target = 1 | Chi-square |
|---|---|---|---|---|
| A vs. B,C | A | 0 | 1 | 0.75 |
| | B,C | 1 | 1 | |
| B vs. A,C | B | 1 | 0 | 3.00 |
| | A,C | 0 | 2 | |
| C vs. A,B | C | 0 | 1 | 0.75 |
| | A,B | 1 | 1 | |

The largest value of the Pearson Chi-square statistic, 3.00, results from placing A and C in one node and B in the other. That suggests that groups formed as A,C vs.
B are more closely associated with the target outcomes than either of the other two possibilities. Splitting criteria other than the Pearson Chi-square statistic are certainly possible to use. The likelihood ratio Chi-square is another obvious choice and the Gini coefficient is probably the most popular. Please note that we are not using any of these as a test statistic. Statistical significance is not an important issue at the moment. Indeed, one can argue that in pure prediction problems it is not generally an important consideration at all.

Now consider the case where the predictor is either an ordinal categorical variable or is continuous. In fact, the big distinction in splitting is whether the predictor is at least ordinal (ordinal or continuous) as compared to nominal, because ordinal and continuous predictors are treated in the same way. Specifically, when the data are at least ordinal, splits must respect the ordinal nature of the predictor. In other words, a numeric predictor would not be divided so that 3 and 8 were in one group while 4 and 6 were in another. Again, consider a binary target but this time let the predictor take on the ordered values A, B, C and D in the data below.

| predictor | target |
|---|---|
| A | 0 |
| B | 0 |
| C | 1 |
| D | 1 |

There are 7 possible ways to split the letters A, B, C and D into two groups, but only three of the splits respect the imposed order structure. They are A vs. BCD, AB vs. CD and ABC vs. D, since the others, for instance AC vs. BD, place non-contiguous values in the same group. A and C are non-contiguous because B is between them. Displaying the 2x2 tables and associated Chi-square as before we get the following. Please note that we could substitute A=1, B=2, C=3 and D=4 into this example and use it to demonstrate splitting on a continuous predictor.

| Split | Group | Target = 0 | Target = 1 | Chi-square |
|---|---|---|---|---|
| A vs. BCD | A | 1 | 0 | 1.33 |
| | BCD | 1 | 2 | |
| AB vs. CD | AB | 2 | 0 | 4.00 |
| | CD | 0 | 2 | |
| ABC vs. D | ABC | 2 | 1 | 1.33 |
| | D | 0 | 1 | |

Clearly, the AB vs. CD split produces the largest value of the Pearson Chi-square statistic and so, at least by that single criterion, it would be selected as the chosen split. If a continuous or ordinal predictor has five distinct values then the number of order-consistent splits is four instead of fifteen. If a nominal variable has even ten distinct values then the number of possible binary splits is 511. With an increased number of candidate splits to search there is a better chance of achieving a large Chi-square by chance; therefore Bonferroni and other adjustments have been suggested. Indeed it is the case that the number of possible splits can be enormous. For a categorical variable that has eight levels there are $2^{8-1} - 1 = 127$ possible binary splits and 4,139 possible splits of sizes 2 through 8.
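The split search described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular SAS or JMP product: the function names `binary_splits`, `pearson_chi2` and `best_split` are hypothetical, the target is assumed to be coded 0/1, and the data are the two toy examples from the text.

```python
from itertools import combinations


def binary_splits(levels, ordered=False):
    """Return every way to divide `levels` into two non-empty groups."""
    levels = list(levels)
    if ordered:
        # Only cut points between adjacent levels respect the ordering.
        return [(levels[:i], levels[i:]) for i in range(1, len(levels))]
    first, rest = levels[0], levels[1:]
    splits = []
    # Pin the first level to the left group so each split is listed once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = [first, *extra]
            right = [lv for lv in rest if lv not in extra]
            splits.append((left, right))
    return splits


def pearson_chi2(left, data):
    """Pearson chi-square for the 2x2 table of group (left/right) by 0/1 target."""
    counts = [[0, 0], [0, 0]]
    for level, target in data:
        row = 0 if level in left else 1
        counts[row][target] += 1
    n = len(data)
    row_tot = [sum(r) for r in counts]
    col_tot = [counts[0][j] + counts[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (counts[i][j] - expected) ** 2 / expected
    return chi2


def best_split(data, levels, ordered=False):
    """Score every candidate split and return (chi2, left, right) for the best."""
    scored = [(pearson_chi2(left, data), left, right)
              for left, right in binary_splits(levels, ordered)]
    return max(scored, key=lambda s: s[0])


# Toy data from the text: (predictor level, binary target).
nominal = [("A", 1), ("B", 0), ("C", 1)]
ordinal = [("A", 0), ("B", 0), ("C", 1), ("D", 1)]

nominal_best = best_split(nominal, "ABC")
ordinal_best = best_split(ordinal, "ABCD", ordered=True)
print(nominal_best)
print(ordinal_best)
print(len(binary_splits(range(10))), len(binary_splits(range(5), ordered=True)))
```

Enumerating the candidates explicitly also makes the growth in search size visible: the nominal enumeration yields $2^{k-1} - 1$ splits for $k$ levels, while the ordered one yields only $k - 1$.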
P-values may be associated with the Chi-square statistics (here they all have one degree of freedom) and those p-values may be adjusted for the multiplicity of splits considered for any particular variable. Without that adjustment, variables with many levels would be favored over those with few. In our case, of course, the best split is just the one with the smallest p-value. Again we emphasize that the p-value need not be understood as a test of significance in order to use it as a splitting criterion.

In the situation where the response variable is continuous, the goal of node purity is one of minimizing the variability of the response within the chosen splits, that is, the within-group variance. We can consider the result of any possible split as an analysis of variance problem with two or more groups formed by the splits. For a fixed number of splits, node purity is maximized when the SS error is minimized. Alternatively, of course, that is equivalent to maximizing SS groups, the F statistic or $R^2$.
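To make the analysis of variance view concrete, the following sketch computes SS groups, SS error, F and $R^2$ for a candidate split of a continuous response. It is an illustration under stated assumptions: `anova_split` is a hypothetical helper, and the four responses 1, 2, 4, 6 (for predictor levels a, b, c, d) are the same toy values used in the worked example that follows in the text. Converting F to a p-value would additionally require the F distribution (for example `scipy.stats.f.sf`), which is left out here to keep the sketch dependency-free.

```python
def anova_split(groups):
    """One-way ANOVA summary for a split given as a list of response lists.

    Assumes at least two groups, more observations than groups, and some
    within-group variation (otherwise F is undefined).
    """
    values = [y for g in groups for y in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_groups, df_error = k - 1, n - k
    f_stat = (ss_groups / df_groups) / (ss_error / df_error)
    return {
        "ss_groups": ss_groups,
        "ss_error": ss_error,
        "df": (df_groups, df_error),
        "F": f_stat,
        "R2": ss_groups / (ss_groups + ss_error),
    }


# Responses for levels a, b, c, d are 1, 2, 4, 6.
two_way_ab_cd = anova_split([[1, 2], [4, 6]])     # (a,b) vs (c,d)
two_way_d_abc = anova_split([[6], [1, 2, 4]])     # d vs (a,b,c)
three_way_ab = anova_split([[1, 2], [4], [6]])    # (a,b), c, d
three_way_cd = anova_split([[4, 6], [1], [2]])    # (c,d), a, b

for name, result in [("ab|cd", two_way_ab_cd), ("d|abc", two_way_d_abc),
                     ("ab|c|d", three_way_ab), ("cd|a|b", three_way_cd)]:
    print(name, result["df"], round(result["F"], 2), round(result["R2"], 3))
```

Because the total sum of squares is fixed, the split with the smallest SS error is also the one with the largest SS groups and $R^2$; F differs only through the degrees of freedom, which is why it cannot be compared directly across two-way and three-way splits.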

A p-value can be associated with these calculations in the usual way and that gives us a way to directly compare splits of different sizes, i.e., two-way splits and three-way splits, as the following example illustrates. Suppose a categorical predictor takes on the values a, b, c, d in a data set with only four observations. The corresponding responses are 1, 2, 4, 6. There are seven ways to split the predictor values into two groups. Placing d in one group and a, b, c in another leads to an ANOVA F value of 4.32 (p=.173) while placing a, b in one group and c, d in the other leads to F = 9.80 (p=.089). That second choice maximizes F over all two-way splits and, because all such splits result in an F that nominally has 1 and 2 degrees of freedom, also minimizes the p-value. Therefore (a,b)::(c,d) is the best two-way split.

Among three-way splits we must put two values in one group and one in each of the other two groups. If we put c, d together the resulting F is 3.19 (p=.368). The best of the three-way splits results when a and b are grouped together. For that split, F=14.25 (p=.184). Note that, while 14.25 is greater than the calculated F (9.80) from the best two-way split, the associated p-value is larger than that resulting from the best two-way split. That occurs because the three-way split results in an F with different degrees of freedom (2 and 1) than we had for the two-way split (1 and 2).

In a real prediction problem, of course, there would be several candidate predictors so, at any point, we would have to find the best split for each of the candidates and then choose the best of the best to determine the actual splitting variable. In all of this the p-value is the common currency: smaller adjusted p-values indicate better splits. The adjustments are too complex to go into here but mainly relate to the number of possible splits considered for each candidate.

### Stopping Tree Growth

In certain instances a tree can be grown until each terminal node contains only a single observation.
Each terminal node is then perfectly pure with respect to the target. To do that would be to create a vastly over-fitted model. That is tantamount to fitting a high-degree polynomial, super-flexible spline function or some other overly complex model to a small data set. The problem, of course, is that while the various twists and turns in the fitted function help to fit the given data set, those random complexities are most unlikely to be replicated in any new data set from the same or a similar source.

There are a couple of things that we can do in order to avoid over-fitting. The first has to do with limiting the growth of the tree in the first place. The second has to do with pruning the tree back to a simpler form after it has been fully grown. Even when using large data sets, the number of observations in some or all nodes will become small if you move far enough down the tree. With the smaller counts the split Chi-square values become proportionately smaller and so the p-values become correspondingly larger. In addition, certain p-value adjustments are made related to what roughly can be called multiple comparisons. Those adjustments become larger as you move down the tree. At some threshold, perhaps based on a p-value but not necessarily 0.05, we usually choose to stop growing the tree. Other considerations, such as establishing a minimum leaf size or maximum depth, may also be involved in decisions to stop growing the tree.

There are various strategies in tree growth. One of the most popular has been labeled CART (an acronym for Classification And Regression Trees). In that and some other strategies the goal is to over-fit the data with a view towards using another data set in order to prune the tree back to a more parsimonious size. On the other hand, CHAID (Chi-square Automatic Interaction Detector) is an algorithm that relies on stopping the growth of trees before over-fitting occurs.

### Pruning the Tree

The processes we described earlier are meant to be applied to the training data and are meant to find what has sometimes been called a maximal tree. The idea of a maximal tree is to establish a somewhat over-fitted tree that can be the basis for a series of steps in which the tree may be pruned back to a simpler form. Another data set, ordinarily constructed to contain the same proportions of the binary target outcomes, is held back for the purposes of validation. As the tree is grown, which in the case where we limit ourselves to binary splits means just one additional leaf at a time, we form a series of trees: first one with two leaves, then one with three leaves, then one with four leaves, and so on. Each of those trees may be thought of as a prediction model and each of them may be applied to the validation data set. Each model in the sequence can then be assessed to see how well it fits the validation data. Any of a number of assessment criteria may be used in the comparison of the series of prediction models. If our prediction takes the form of a decision, say, to contact a person or to ignore them, then perhaps the most obvious choice is to assess the models according to accuracy, where accuracy is simply the proportion of observations in the validation data set that are correctly predicted.
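The selection step just described can be sketched as follows. This is only an illustration: `accuracy` and `select_subtree` are hypothetical helpers, the validation targets and the decisions attributed to each subtree are made-up values, and breaking ties in favor of the smaller tree is an assumption made here in the spirit of parsimony rather than a rule stated in the text.

```python
def accuracy(y_true, y_pred):
    """Proportion of observations whose predicted class equals the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_subtree(validation_target, predictions_by_size):
    """Pick the subtree (keyed by number of leaves) with the best validation accuracy.

    Ties are resolved in favor of the tree with fewer leaves (an assumption).
    """
    scores = {size: accuracy(validation_target, preds)
              for size, preds in predictions_by_size.items()}
    best_size = min(scores, key=lambda size: (-scores[size], size))
    return best_size, scores


# Hypothetical validation targets and the decisions made by trees with 2-5 leaves.
y_valid = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    2: [1, 1, 1, 1, 0, 0, 0, 0],
    3: [1, 0, 1, 1, 0, 0, 0, 0],
    4: [1, 0, 1, 1, 0, 1, 1, 0],
    5: [1, 0, 0, 1, 0, 1, 1, 0],
}

best_size, scores = select_subtree(y_valid, predictions)
print(best_size, scores)
```

Any other assessment measure can be substituted for `accuracy` without changing the selection logic.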
Other criteria may be, and often are, more appropriate for specific tasks. We can then choose the prediction model that best fits the validation data according to whatever assessment measure we have selected.

### Improving Performance

Decision trees are useful tools for fitting noisy data. They are easy to explain to people who may not be comfortable with mathematics and they do, in many ways, reflect the mindset in which many humans naturally approach the task of prediction. It is also no small point that they can handle missing values of the predictor variables in a direct and nearly seamless manner, a point not discussed here. Unfortunately, they often don't yield predictions that are as precise as we might prefer and, more importantly, they are often outperformed by methodologies like regression, neural networks and some other less well known techniques. One reason they don't always predict well is that they are multivariate step functions and, as such, they are discontinuous. Lacking smoothness, observations that are very close together in the input space may get assigned predicted values that are substantially different, and the topology of the predictor space may be highly unstable. Some methods have been developed that can mitigate these problems to a large degree.

Ensemble is a description given to a general class of models in which the final predictions are averages of the predictions made by other models. Bagging and boosting are two widely used ensemble methodologies. Ensemble models can be derived for almost any set of models. Here we focus on ensembles formed from tree models.

Random Forests constitute one successful strategy that combines information from many trees. The process involves selecting several, $n_t$, bootstrap (with replacement) samples of N observations from the original population of N observations. At each splitting opportunity (node) we select a subset of m << M inputs at random from among the M input variables available. We grow maximal trees in the sense that there is no pruning, although growth may be limited, for instance, by specifying a minimum tree size or some threshold for the Gini statistic. If the best available split fails to meet the threshold we cease splitting. Now we repeat that process many times.

To predict (score) any observation we pass it through each of the trees and combine the predictions over all trees. If the prediction simply consists of choosing a (categorical) outcome, then each tree casts a vote and the prediction is the winner of that vote. For continuous targets each observation is passed through each tree to produce a numerical prediction and the predictions from the many trees are averaged in order to find the final prediction of the target value. In what is somewhere between those two ideas, for a categorical response, you can average the predicted probabilities for an observation and use the averaged value to predict P(Target=1) for that observation. No validation data set is required when using this approach to modeling.
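One reason a separate validation set is not needed is that every bootstrap sample leaves some observations out. A small simulation can show how large that left-out share is. The sketch below is illustrative only: `bootstrap_indices` and `oob_fraction` are hypothetical names, and the values N = 1,000 and 200 samples are arbitrary choices. It compares the simulated share with the exact probability $(1 - 1/N)^N$ that a given observation is never drawn and with its limit $1/e$.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def oob_fraction(n, n_samples, seed=0):
    """Average share of observations left out of each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        in_bag = set(bootstrap_indices(n, rng))
        total += (n - len(in_bag)) / n
    return total / n_samples


n_obs, n_samples = 1000, 200
simulated = oob_fraction(n_obs, n_samples)
exact = (1 - 1 / n_obs) ** n_obs   # chance a given row is never drawn
limit = math.exp(-1)               # limiting value as N grows

print(round(simulated, 3), round(exact, 3), round(limit, 3))
```

The rows missing from a given sample were not used to grow the corresponding tree, so they can be scored by that tree as if they were new data.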
Although the proportion of observations not selected in a bootstrap sample will vary for real problems with their finite samples, in a certain limiting sense the fraction of the observations not selected in any bootstrap sample will be 1/e ~ 37%. Those observations are usually referred to as the OOB (out-of-bag) data and they are usually used to estimate the errors of prediction.

### Contact Information

Gerry Hobbs may be reached at [address missing in source].

SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc. in the USA and in other countries. ® indicates USA registration. Random Forest is a registered trademark of Leo Breiman and Adele Cutler.

### TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP

TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions

### Data Mining Techniques Chapter 6: Decision Trees

Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................

### Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

### EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER ANALYTICS LIFECYCLE Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

### Lecture 10: Regression Trees

Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

### Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

### Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

### Data Mining Practical Machine Learning Tools and Techniques

Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner

Paper 3361-2015 The More Trees, the Better! Scaling Up Performance Using Random Forest in SAS Enterprise Miner Narmada Deve Panneerselvam, Spears School of Business, Oklahoma State University, Stillwater,

### COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

### A Property & Casualty Insurance Predictive Modeling Process in SAS

Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

### Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

### M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1. 15.7 Analytics and Data Mining 1

M15_BERE8380_12_SE_C15.7.qxd 2/21/11 3:59 PM Page 1 15.7 Analytics and Data Mining 15.7 Analytics and Data Mining 1 Section 1.5 noted that advances in computing processing during the past 40 years have

### An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century

An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO

### Data Mining for Knowledge Management. Classification

1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

### DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING

DECISION TREE ANALYSIS: PREDICTION OF SERIOUS TRAFFIC OFFENDING ABSTRACT The objective was to predict whether an offender would commit a traffic offence involving death, using decision tree analysis. Four

### CART 6.0 Feature Matrix

CART 6.0 Feature Matri Enhanced Descriptive Statistics Full summary statistics Brief summary statistics Stratified summary statistics Charts and histograms Improved User Interface New setup activity window

### Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

### 6 Classification and Regression Trees, 7 Bagging, and Boosting

hs24 v.2004/01/03 Prn:23/02/2005; 14:41 F:hs24011.tex; VTEX/ES p. 1 1 Handbook of Statistics, Vol. 24 ISSN: 0169-7161 2005 Elsevier B.V. All rights reserved. DOI 10.1016/S0169-7161(04)24011-1 1 6 Classification

### Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

### A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND

Paper D02-2009 A Comparison of Decision Tree and Logistic Regression Model Xianzhe Chen, North Dakota State University, Fargo, ND ABSTRACT This paper applies a decision tree model and logistic regression

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Decision Trees What Are They?

Decision Trees What Are They? Introduction...1 Using Decision Trees with Other Modeling Approaches...5 Why Are Decision Trees So Useful?...8 Level of Measurement... 11 Introduction Decision trees are a

### THE HYBRID CART-LOGIT MODEL IN CLASSIFICATION AND DATA MINING. Dan Steinberg and N. Scott Cardell

THE HYBID CAT-LOGIT MODEL IN CLASSIFICATION AND DATA MINING Introduction Dan Steinberg and N. Scott Cardell Most data-mining projects involve classification problems assigning objects to classes whether

### Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

### In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

### An Overview and Evaluation of Decision Tree Methodology

An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com

### Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

### Beating the MLB Moneyline

Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

### Event driven trading new studies on innovative way. of trading in Forex market. Michał Osmoła INIME live 23 February 2016

Event driven trading new studies on innovative way of trading in Forex market Michał Osmoła INIME live 23 February 2016 Forex market From Wikipedia: The foreign exchange market (Forex, FX, or currency

### Better credit models benefit us all

Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

### Prediction of Stock Performance Using Analytical Techniques

136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

### Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups

Model-Based Recursive Partitioning for Detecting Interaction Effects in Subgroups Achim Zeileis, Torsten Hothorn, Kurt Hornik http://eeecon.uibk.ac.at/~zeileis/ Overview Motivation: Trees, leaves, and

### A fast, powerful data mining workbench designed for small to midsize organizations

FACT SHEET SAS Desktop Data Mining for Midsize Business A fast, powerful data mining workbench designed for small to midsize organizations What does SAS Desktop Data Mining for Midsize Business do? Business

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms

Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Chapter 6. The stacking ensemble approach

This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques, such as voting and bagging, are also described

### Leveraging Ensemble Models in SAS Enterprise Miner

ABSTRACT Paper SAS133-2014 Leveraging Ensemble Models in SAS Enterprise Miner Miguel Maldonado, Jared Dean, Wendy Czika, and Susan Haller SAS Institute Inc. Ensemble models combine two or more models to

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Evaluating the Accuracy of a Classifier Holdout, random subsampling, cross-validation, and the bootstrap are common techniques for

### Class #6: Non-linear classification. ML4Bio 2012 February 17th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17th, 2012 Quaid Morris Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Model Combination. 24 November 2009

Model Combination 24 November 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for multistrategy

### Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern KTI, TU Graz 2015-03-05 Outline 1 Introduction 2 Classification

### A Decision Theoretic Approach to Targeted Advertising

82 UNCERTAINTY IN ARTIFICIAL INTELLIGENCE PROCEEDINGS 2000 A Decision Theoretic Approach to Targeted Advertising David Maxwell Chickering and David Heckerman Microsoft Research Redmond WA, 98052-6399 dmax@microsoft.com

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt's (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

### Smart Grid Data Analytics for Decision Support

1 Smart Grid Data Analytics for Decision Support Prakash Ranganathan, Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA Prakash.Ranganathan@engr.und.edu, 701-777-4431

### Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

### ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous

### Classification/Decision Trees (II)

Classification/Decision Trees (II) Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Right Sized Trees Let the expected misclassification rate of a tree T be R(T).

### Data Mining with R. Decision Trees and Random Forests. Hugh Murrell

Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge

### Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Chapter 12 Bagging and Random Forests

Chapter 12 Bagging and Random Forests Xiaogang Su Department of Statistics and Actuarial Science University of Central Florida - 1 - Outline A brief introduction to the bootstrap Bagging: basic concepts

### Interactive Data Mining and Design of Experiments: the JMP Partition and Custom Design Platforms

Interactive Data Mining and Design of Experiments: the JMP Partition and Custom Design Platforms Marie Gaudard, Ph. D., Philip Ramsey, Ph. D., Mia Stephens, MS North Haven Group March 2006 Table of Contents Abstract 1. Data Mining 1.1. What

### Fast Analytics on Big Data with H2O

Fast Analytics on Big Data with H2O 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in-memory predictive analytics and machine learning Pure Java,

### IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

### Agenda. Mathias Lanner SAS Institute. Predictive Modeling Applications. Predictive Modeling Training Data. Decision Trees and Other Predictive Models

Agenda Introduction to Predictive Models Decision Trees Decision Trees and Other Predictive Models Mathias Lanner SAS Institute Pruning Regressions Neural Networks Model Evaluation 2 Predictive

### Decision-Tree Learning

Decision-Tree Learning Introduction ID3 Attribute selection Entropy, Information, Information Gain Gain Ratio C4.5 Decision Trees TDIDT: Top-Down Induction of Decision Trees Numeric Values Missing Values

### Trees and Random Forests

Trees and Random Forests Adele Cutler Professor, Mathematics and Statistics Utah State University This research is partially supported by NIH 1R15AG037392-01 Cache Valley, Utah Utah State University Leo

College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA's MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

### The Predictive Data Mining Revolution in Scorecards:

January 13, 2013 StatSoft White Paper The Predictive Data Mining Revolution in Scorecards: Accurate Risk Scoring via Ensemble Models Summary Predictive modeling methods, based on machine learning algorithms

### Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

### Modeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG

Paper 3406-2015 Modeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG Sérgio Henrique Rodrigues Ribeiro, CEMIG; Iguatinan

### Course Syllabus. Purposes of Course:

Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00-8:50 pm and Sat. 12:00-2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unsupervised Identification of right

### Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA

Data Mining Using SAS Enterprise Miner Randall Matignon, Piedmont, CA An Overview of SAS Enterprise Miner The following article is in regards to Enterprise Miner v.4.3 that is available in SAS v9.1.3.

### Insurance Analytics - data analysis and predictive modeling in insurance. Pavel Kříž. Actuarial Science Seminar, MFF 4.

Insurance Analytics - data analysis and predictive modeling in insurance Pavel Kříž Actuarial Science Seminar, MFF, April 4, 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Data mining techniques: decision trees

Data mining techniques: decision trees Agenda Rule systems Building rule systems vs rule systems Quick reference

### A Property and Casualty Insurance Predictive Modeling Process in SAS

Paper 11422-2016 A Property and Casualty Insurance Predictive Modeling Process in SAS Mei Najim, Sedgwick Claim Management Services ABSTRACT Predictive analytics is an area that has been developing rapidly

### Why Ensembles Win Data Mining Competitions

Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL:

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### THE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS XIAO CHENG. (Under the Direction of Jeongyoun Ahn)

THE RISE OF THE BIG DATA: WHY SHOULD STATISTICIANS EMBRACE COLLABORATIONS WITH COMPUTER SCIENTISTS by XIAO CHENG (Under the Direction of Jeongyoun Ahn) ABSTRACT Big Data has been the new trend in businesses.

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### What is Data Mining? MS4424 Data Mining & Modelling

MS4424 Data Mining & Modelling Lecturer: Dr Iris Yeung Room No: P7509 Tel No: 2788 8566 Email: msiris@cityu.edu.hk 1 Aims To introduce the basic concepts of data mining

### CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

### Efficiency in Software Development Projects

Efficiency in Software Development Projects Aneesh Chinubhai Dharmsinh Desai University aneeshchinubhai@gmail.com Abstract A number of different factors are thought to influence the efficiency of the software

### Predictive Modeling of Titanic Survivors: a Learning Competition

SAS Analytics Day Predictive Modeling of Titanic Survivors: a Learning Competition Linda Schumacher Problem Introduction On April 15, 1912, the RMS Titanic sank resulting in the loss of 1502 out of 2224

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Successfully Implementing Predictive Analytics in Direct Marketing

Successfully Implementing Predictive Analytics in Direct Marketing John Blackwell and Tracy DeCanio, The Nature Conservancy, Arlington, VA ABSTRACT Successfully Implementing Predictive Analytics in Direct

### Data Mining Part 5. Prediction

Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

### Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

### Data Mining and Statistics for Decision Making. Wiley Series in Computational Statistics

Brochure More information from http://www.researchandmarkets.com/reports/2171080/ Data Mining and Statistics for Decision Making. Wiley Series in Computational Statistics Description: Data Mining and Statistics

### Classification: Basic Concepts, Decision Trees, and Model Evaluation. General Approach for Building Classification Model

Classification: Basic Concepts, Decision Trees, and Model Evaluation Dr. Hui Xiong Rutgers University Introduction to Data Mining 1//009 1 General Approach for Building Classification Model Tid Attrib1

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Decision Trees for Predictive Modeling

Decision Trees for Predictive Modeling Padraic G. Neville SAS Institute Inc. 4 August 1999 What a Decision Tree Is What to Do with a Tree Variable selection Variable

### A Comparison of Variable Selection Techniques for Credit Scoring

1 A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia E-mail: