Decision Trees and Other Predictive Models
Mathias Lanner, SAS Institute
Agenda: Introduction to Predictive Models, Decision Trees, Pruning, Regression, Neural Networks, Model Assessment
Predictive Modeling: The Essence of Data Mining. "Most of the big payoff [in data mining] has been in predictive modeling." (Herb Edelstein)
Predictive Modeling Applications: database marketing, financial risk management, fraud detection, process monitoring, pattern detection
Predictive Modeling, Training Data: cases 1 through 5, each with input values (numeric or categorical)
Predictive Modeling, Score Data: score data cases have the same inputs as the training data, but only the input values are known
Predictions: the model trained on the training data supplies a prediction for each score data case
Predictive Modeling Essentials: predict new cases, select useful inputs, optimize complexity
Three Prediction Types: decisions, rankings, estimates
Decision Predictions: the trained model uses input measurements to make the best decision (for example, primary, secondary, or tertiary) for each case
Ranking Predictions: the trained model uses input measurements to optimally rank each case (for example, scores of 720, 620, 580, 520, 470)
Estimate Predictions: the trained model uses input measurements to optimally estimate the target value (for example, 0.65, 0.54, 0.47, 0.33, 0.28)
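The three prediction types are not three separate models: one model's estimates can be sorted into rankings or thresholded into decisions. A minimal sketch, with invented estimates and an arbitrary 0.50 cutoff:

```python
# Invented probability estimates for five cases (not from the slides).
estimates = {"case 1": 0.65, "case 2": 0.33, "case 3": 0.54,
             "case 4": 0.47, "case 5": 0.28}

# Rankings: order the cases by estimate, highest first.
rankings = sorted(estimates, key=estimates.get, reverse=True)

# Decisions: compare each estimate to a cutoff (0.50 is an arbitrary choice).
decisions = {case: ("primary" if p >= 0.50 else "secondary")
             for case, p in estimates.items()}

# case 1 has the highest estimate, so it ranks first and gets "primary".
```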
Model Essentials (review): predict new cases by deciding, ranking, or estimating
Model Essentials: select useful inputs
Curse of Dimensionality: as the input space grows from 1-D to 2-D to 3-D and beyond, the data needed to cover it grow rapidly, so cases become sparse
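The sparsity can be shown numerically: to enclose a fixed fraction of uniformly spread data, a cubical neighborhood must stretch across most of each axis as inputs are added. A small sketch (the 10% fraction is an arbitrary choice):

```python
def edge_for_fraction(fraction, dims):
    """Edge length of a hypercube that covers `fraction` of the unit cube."""
    return fraction ** (1.0 / dims)

# Edge needed to enclose 10% of a uniformly filled unit cube:
for d in (1, 2, 3, 10):
    print(d, round(edge_for_fraction(0.10, d), 2))
# In 1-D an edge of 0.10 suffices; by 10-D the edge spans most of each axis.
```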
Input Selection: eliminate redundancy and irrelevancy among inputs
Model Essentials (review): select useful inputs by eradicating redundancies and irrelevancies
Model Essentials: optimize complexity
Fool's Gold: "My model fits the training data perfectly... I've struck it rich!"
Model Complexity: a model that is too flexible overfits the training data; one that is not flexible enough underfits
Data Splitting
Training Data Role: the training data produce a sequence of predictive models of increasing complexity
Validation Data Role: the validation data help select the best model from that sequence
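The underlying split can be sketched in a few lines (the 70/30 ratio, the 100-case count, and the seed are arbitrary choices, not from the slides):

```python
import random

random.seed(0)                    # make the partition reproducible
cases = list(range(100))          # stand-in for the modeling cases
random.shuffle(cases)

cut = int(0.7 * len(cases))       # e.g. 70% training, 30% validation
training, validation = cases[:cut], cases[cut:]

# Training data fit each model in the complexity sequence;
# validation data score those models to pick the best one.
```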
Model Essentials (review): predict new cases (decide, rank, estimate); select useful inputs (eradicate redundancies and irrelevancies); optimize complexity (tune models with validation data)
Agenda: Introduction to Predictive Models, DECISION TREES, Pruning, Regression, Neural Networks, Model Assessment
Predictive Modeling Tools. Primary: Decision Tree, Regression, Neural Network. Specialty: Dmine Regression, MBR, AutoNeural, Rule Induction, DMNeural. Multiple models: Ensemble, Two Stage
Model Essentials, Decision Trees: predict new cases with prediction rules; select useful inputs with split search; optimize complexity with pruning
Simple Prediction Illustration. Analysis goal: predict the color of a dot based on its location in a scatter plot. [Scatter plot over the unit square.]
Decision Tree Prediction Rules: the root node splits at 0.63; interior nodes split at 0.52 and 0.51; the leaf nodes carry estimates of 40%, 60%, 70%, and 55%. [Tree diagram with matching partitioned scatter plot.]
Decision Tree Prediction Rules: a new case follows the rules from the root to a leaf; here it lands in the 70% leaf, giving Estimate = 0.70 and the corresponding decision
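The tree on these slides is just nested if/else rules. A sketch under the assumption that the 0.63 root split is on the vertical coordinate and the 0.52 and 0.51 splits are on the horizontal one; the slides do not label the axes, so the pairing of thresholds with leaf estimates is illustrative:

```python
def predict(x1, x2):
    """Return the leaf estimate for a dot at (x1, x2)."""
    if x2 < 0.63:            # root node split (assumed vertical coordinate)
        if x1 < 0.52:        # interior node split
            return 0.40
        return 0.60
    if x1 < 0.51:            # interior node split
        return 0.70
    return 0.55

estimate = predict(0.30, 0.80)   # a case in the upper-left lands in the 70% leaf
decision = estimate >= 0.50      # decide by comparing the estimate to a cutoff
```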
Decision Tree Split Search: calculate the logworth of every candidate partition of an input
Decision Tree Split Search: select the partition with maximum logworth, then repeat for the other input. Here a split at 0.52 on one input yields class proportions of 53%/47% versus 42%/58% (max logworth 0.95), while a split at 0.63 on the other input yields 54%/46% versus 35%/65% (max logworth 4.92)
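Logworth is -log10 of the p-value of a chi-square test comparing the target distribution in the two branches, so larger values mean more convincing splits. A stdlib-only sketch for a binary target; this is the unadjusted statistic (SAS applies multiple-comparison adjustments, which can even make adjusted logworth negative), and the counts below are invented:

```python
import math

def logworth(a, b, c, d):
    """-log10 chi-square p-value for a 2x2 table:
    a, b = target 0/1 counts in the left branch;
    c, d = target 0/1 counts in the right branch."""
    n = a + b + c + d
    # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p)

# A partition that separates the classes sharply earns a higher logworth:
print(logworth(45, 5, 10, 40))   # strong split
print(logworth(30, 20, 25, 25))  # weak split
```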
Decision Tree Split Search: create the partition rule (<0.63 versus >=0.63) from the best partition across all inputs
Decision Tree Split Search: repeat the process in each subset
Decision Tree Split Search: within a subset, select the partition with maximum logworth on one input (a split at 0.52, giving 61%/39% versus 40%/60% with logworth 5.72), then repeat for the other input (a split at 0.02, giving 38%/62% versus 55%/45% with logworth -2.01)
Decision Tree Split Search: create the second partition rule (<0.52 versus >=0.52)
Decision Tree Split Search: repeat to form the maximal tree
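The grow-until-done loop can be sketched recursively. This stand-in uses Gini impurity reduction instead of logworth as the split criterion, and the six (x1, x2, class) cases are invented, so treat it as an outline of the control flow rather than the SAS algorithm:

```python
def gini(rows):
    """Impurity of a set of (x1, x2, class) rows with a 0/1 class."""
    if not rows:
        return 0.0
    p = sum(r[2] for r in rows) / len(rows)
    return 2.0 * p * (1.0 - p)

def best_split(rows):
    """Best (gain, input index, threshold) over both inputs, or None."""
    best = None
    for var in (0, 1):
        for t in sorted({r[var] for r in rows})[1:]:
            left = [r for r in rows if r[var] < t]
            right = [r for r in rows if r[var] >= t]
            gain = gini(rows) - (len(left) * gini(left)
                                 + len(right) * gini(right)) / len(rows)
            if best is None or gain > best[0]:
                best = (gain, var, t)
    return best

def grow(rows, min_size=2):
    """Split recursively until no split helps; leaves hold class proportions."""
    split = best_split(rows)
    if split is None or split[0] <= 0 or len(rows) < min_size:
        return sum(r[2] for r in rows) / len(rows)
    _, var, t = split
    return (var, t,
            grow([r for r in rows if r[var] < t], min_size),
            grow([r for r in rows if r[var] >= t], min_size))

data = [(0.2, 0.3, 0), (0.4, 0.2, 0), (0.7, 0.4, 1),
        (0.8, 0.7, 1), (0.3, 0.8, 1), (0.6, 0.9, 0)]
tree = grow(data)   # nested (input, threshold, left, right) tuples
```

The maximal tree grown this way typically fits the training data too closely, which is exactly why the next step in the agenda is pruning.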
Demo: Decision Tree