Decision Trees and Other Predictive Models
Mathias Lanner, SAS Institute
Agenda: Introduction to Predictive Models, Decision Trees, Pruning, Regression, Neural Networks, Model Assessment
Predictive Modeling: The Essence of Data Mining. "Most of the big payoff [in data mining] has been in predictive modeling." (Herb Edelstein)
Predictive Modeling Applications: database marketing, financial risk management, fraud detection, process monitoring, pattern detection
Predictive Modeling, Training Data: cases 1 through 5, each with input values (numeric or categorical)
Predictive Modeling, Score Data: score data cases have the same inputs as the training data, but only the input values are known
Predictions: the model trained on the training data supplies a prediction for each score data case
Predictive Modeling Essentials: predict new cases, select useful inputs, optimize complexity
Three Prediction Types: decisions, rankings, estimates
Decision Predictions: the trained model uses input measurements to make the best decision (for example, primary, secondary, or tertiary) for each case
Ranking Predictions: the trained model uses input measurements to optimally rank each case (for example, scores of 720, 620, 580, 520, 470)
Estimate Predictions: the trained model uses input measurements to optimally estimate the target value (for example, 0.65, 0.54, 0.47, 0.33, 0.28)
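The three prediction types are not three separate models: one model's estimates can be sorted into rankings or thresholded into decisions. A minimal sketch, with invented estimates and an arbitrary 0.50 cutoff:

```python
# Invented probability estimates for five cases (not from the slides).
estimates = {"case 1": 0.65, "case 2": 0.33, "case 3": 0.54,
             "case 4": 0.47, "case 5": 0.28}

# Rankings: order the cases by estimate, highest first.
rankings = sorted(estimates, key=estimates.get, reverse=True)

# Decisions: compare each estimate to a cutoff (0.50 is an arbitrary choice).
decisions = {case: ("primary" if p >= 0.50 else "secondary")
             for case, p in estimates.items()}

# case 1 has the highest estimate, so it ranks first and gets "primary".
```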
Model Essentials (review): predict new cases by deciding, ranking, or estimating
Model Essentials: select useful inputs
Curse of Dimensionality: as the input space grows from 1-D to 2-D to 3-D and beyond, the data needed to cover it grow rapidly, so cases become sparse
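The sparsity can be shown numerically: to enclose a fixed fraction of uniformly spread data, a cubical neighborhood must stretch across most of each axis as inputs are added. A small sketch (the 10% fraction is an arbitrary choice):

```python
def edge_for_fraction(fraction, dims):
    """Edge length of a hypercube that covers `fraction` of the unit cube."""
    return fraction ** (1.0 / dims)

# Edge needed to enclose 10% of a uniformly filled unit cube:
for d in (1, 2, 3, 10):
    print(d, round(edge_for_fraction(0.10, d), 2))
# In 1-D an edge of 0.10 suffices; by 10-D the edge spans most of each axis.
```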
Input Selection: eliminate redundancy and irrelevancy among inputs
Model Essentials (review): select useful inputs by eradicating redundancies and irrelevancies
Model Essentials: optimize complexity
Fool's Gold: "My model fits the training data perfectly... I've struck it rich!"
Model Complexity: a model that is too flexible overfits the training data; one that is not flexible enough underfits
Data Splitting
Training Data Role: the training data produce a sequence of predictive models of increasing complexity
Validation Data Role: the validation data help select the best model from that sequence
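The underlying split can be sketched in a few lines (the 70/30 ratio, the 100-case count, and the seed are arbitrary choices, not from the slides):

```python
import random

random.seed(0)                    # make the partition reproducible
cases = list(range(100))          # stand-in for the modeling cases
random.shuffle(cases)

cut = int(0.7 * len(cases))       # e.g. 70% training, 30% validation
training, validation = cases[:cut], cases[cut:]

# Training data fit each model in the complexity sequence;
# validation data score those models to pick the best one.
```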
Model Essentials (review): predict new cases (decide, rank, estimate); select useful inputs (eradicate redundancies and irrelevancies); optimize complexity (tune models with validation data)
Agenda: Introduction to Predictive Models, DECISION TREES, Pruning, Regression, Neural Networks, Model Assessment
Predictive Modeling Tools. Primary: Decision Tree, Regression, Neural Network. Specialty: Dmine Regression, MBR, AutoNeural, Rule Induction, DMNeural. Multiple models: Ensemble, Two Stage
Model Essentials, Decision Trees: predict new cases with prediction rules; select useful inputs with split search; optimize complexity with pruning
Simple Prediction Illustration. Analysis goal: predict the color of a dot based on its location in a scatter plot. [Scatter plot over the unit square.]
Decision Tree Prediction Rules: the root node splits at 0.63; interior nodes split at 0.52 and 0.51; the leaf nodes carry estimates of 40%, 60%, 70%, and 55%. [Tree diagram with matching partitioned scatter plot.]
Decision Tree Prediction Rules: a new case follows the rules from the root to a leaf; here it lands in the 70% leaf, giving Estimate = 0.70 and the corresponding decision
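The tree on these slides is just nested if/else rules. A sketch under the assumption that the 0.63 root split is on the vertical coordinate and the 0.52 and 0.51 splits are on the horizontal one; the slides do not label the axes, so the pairing of thresholds with leaf estimates is illustrative:

```python
def predict(x1, x2):
    """Return the leaf estimate for a dot at (x1, x2)."""
    if x2 < 0.63:            # root node split (assumed vertical coordinate)
        if x1 < 0.52:        # interior node split
            return 0.40
        return 0.60
    if x1 < 0.51:            # interior node split
        return 0.70
    return 0.55

estimate = predict(0.30, 0.80)   # a case in the upper-left lands in the 70% leaf
decision = estimate >= 0.50      # decide by comparing the estimate to a cutoff
```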
Decision Tree Split Search: calculate the logworth of every candidate partition of an input
Decision Tree Split Search: select the partition with maximum logworth, then repeat for the other input. Here a split at 0.52 on one input yields class proportions of 53%/47% versus 42%/58% (max logworth 0.95), while a split at 0.63 on the other input yields 54%/46% versus 35%/65% (max logworth 4.92)
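Logworth is -log10 of the p-value of a chi-square test comparing the target distribution in the two branches, so larger values mean more convincing splits. A stdlib-only sketch for a binary target; this is the unadjusted statistic (SAS applies multiple-comparison adjustments, which can even make adjusted logworth negative), and the counts below are invented:

```python
import math

def logworth(a, b, c, d):
    """-log10 chi-square p-value for a 2x2 table:
    a, b = target 0/1 counts in the left branch;
    c, d = target 0/1 counts in the right branch."""
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p)

# A partition that separates the classes sharply earns a higher logworth:
print(logworth(45, 5, 10, 40))   # strong split
print(logworth(30, 20, 25, 25))  # weak split
```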
Decision Tree Split Search: create the partition rule (<0.63 versus >=0.63) from the best partition across all inputs
Decision Tree Split Search: repeat the process in each subset
Decision Tree Split Search: within a subset, select the partition with maximum logworth on one input (a split at 0.52, giving 61%/39% versus 40%/60% with logworth 5.72), then repeat for the other input (a split at 0.02, giving 38%/62% versus 55%/45% with logworth -2.01)
Decision Tree Split Search: create the second partition rule (<0.52 versus >=0.52)
Decision Tree Split Search: repeat to form the maximal tree
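The grow-until-done loop can be sketched recursively. This stand-in uses Gini impurity reduction instead of logworth as the split criterion, and the six (x1, x2, class) cases are invented, so treat it as an outline of the control flow rather than the SAS algorithm:

```python
def gini(rows):
    """Impurity of a set of (x1, x2, class) rows with a 0/1 class."""
    if not rows:
        return 0.0
    p = sum(r[2] for r in rows) / len(rows)
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    """Best (gain, input index, threshold) over both inputs, or None."""
    best = None
    for var in (0, 1):
        for t in sorted({r[var] for r in rows})[1:]:
            left = [r for r in rows if r[var] < t]
            right = [r for r in rows if r[var] >= t]
            gain = gini(rows) - (len(left) * gini(left)
                                 + len(right) * gini(right)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, var, t)
    return best

def grow(rows, min_size=2):
    """Split recursively until no split helps; leaves hold class proportions."""
    split = best_split(rows)
    if split is None or split[0] <= 0 or len(rows) < min_size:
        return sum(r[2] for r in rows) / len(rows)
    _, var, t = split
    return (var, t,
            grow([r for r in rows if r[var] < t], min_size),
            grow([r for r in rows if r[var] >= t], min_size))

data = [(0.2, 0.3, 0), (0.4, 0.2, 0), (0.7, 0.4, 1),
        (0.8, 0.7, 1), (0.3, 0.8, 1), (0.6, 0.9, 0)]
tree = grow(data)   # nested (input, threshold, left, right) tuples
```

The maximal tree grown this way typically fits the training data too closely, which is exactly why the next step in the agenda is pruning.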
Demo: Decision Tree