Decision Trees and Other Predictive Models
Mathias Lanner, SAS Institute

Agenda
- Introduction to predictive models
- Decision trees
- Pruning
- Regressions
- Neural networks
- Model assessment

Predictive Modeling Applications
- Database marketing
- Financial risk management
- Fraud detection
- Process monitoring
- Pattern detection

Predictive Modeling
Training data consist of cases with input measurements, which can be numeric or categorical values, and a known target.
Predictive Modeling
Score data contain cases for which only the input values are known. The trained model is applied to the score data to produce a prediction for each case.

Predictive Modeling Essentials
- Predict new cases.
Three Prediction Types
- Decisions
- Rankings
- Estimates

Decision Predictions
A trained model uses the input measurements of each new case to make the best decision for that case (for example, a primary, secondary, or tertiary outcome).

Ranking Predictions
A trained model uses the input measurements to optimally rank the new cases (for example, scores such as 720, 620, 580, 520, 470).
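All three prediction types can be derived from one set of model scores. A minimal sketch, assuming hypothetical case scores and a 0.5 decision threshold (neither comes from the slides):

```python
# Hypothetical estimates of the primary-outcome probability per case.
estimates = {"case1": 0.65, "case2": 0.33, "case3": 0.54,
             "case4": 0.47, "case5": 0.28}

# Rankings: order the cases by score, best first.
rankings = sorted(estimates, key=estimates.get, reverse=True)

# Decisions: compare each score to a threshold (0.5 is an assumption).
decisions = {case: ("primary" if p >= 0.5 else "secondary")
             for case, p in estimates.items()}

print(rankings)            # highest estimate first
print(decisions["case1"])  # primary
```

The point is that estimates are the most general prediction type: sorting them gives rankings, and thresholding them gives decisions.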
Estimate Predictions
A trained model uses the input measurements to optimally estimate the target value for each case (for example, 0.65, 0.54, 0.47, 0.33, 0.28).

Model Essentials – Review
- Predict new cases: decide, rank, or estimate.

Model Essentials – Select Useful Inputs
The Curse of Dimensionality
As the number of input dimensions grows (1-D, 2-D, 3-D, ...), the data become increasingly sparse, and reliable models become harder to fit.
Input Selection
Select useful inputs: eradicate redundancy (inputs that duplicate information in other inputs) and irrelevancy (inputs unrelated to the target).

Model Essentials – Optimize Complexity
Fool's Gold
"My model fits the training data perfectly... I've struck it rich!" A perfect fit on the training data usually signals overfitting, not a good model.
Model Complexity
A model can be too flexible (it overfits the training data) or not flexible enough (it underfits).

Data Splitting
Partition the available data into training and validation sets.

Validation Data Role
The training data give a sequence of predictive models with increasing complexity. The validation data help select the best model from that sequence.
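Data splitting itself is a small operation. A minimal sketch, assuming a random 50/50 partition and a fixed seed (both are assumptions, not values from the slides):

```python
import random

def split_data(cases, train_fraction=0.5, seed=1234):
    """Randomly partition cases into training and validation sets."""
    rng = random.Random(seed)          # fixed seed: reproducible split
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

cases = list(range(100))
train, valid = split_data(cases)
print(len(train), len(valid))  # 50 50
```

Every case lands in exactly one partition, so validation performance is measured on cases the model never saw during training.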
Model Essentials – Summary
- Predict new cases: decide, rank, or estimate.
- Select useful inputs: eradicate redundancies and irrelevancies.
- Optimize complexity: tune models with validation data.

Predictive Modeling Tools
- Primary: Decision Tree, Regression, Neural Network
- Specialty: Dmine Regression, MBR, AutoNeural, Rule Induction, DMNeural
- Multiple Model: Ensemble, Two Stage
Model Essentials – Decision Trees
- Predict new cases: prediction rules
- Select useful inputs: split search
- Optimize complexity: pruning

Simple Prediction Illustration
Analysis goal: predict the color of a dot based on its location in a scatter plot (both coordinates range from 0 to 1).
Decision Tree Prediction Rules
A tree consists of a root node, interior nodes, and leaf nodes. Each split compares an input to a cutoff (in the illustration, <0.63 versus ≥0.63 at the root, then <0.52 versus ≥0.52 and <0.51 versus ≥0.51 at the interior nodes). Each leaf carries the proportion of primary-outcome training cases that reached it (40%, 55%, 60%, 70%).

The leaf proportions serve all three prediction types: an estimate (for example, 0.70), a ranking of the leaves, and a decision (classify a case as the primary outcome when its leaf proportion favors it).
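A tree's prediction rules are just nested comparisons. A sketch matching the cutoffs and leaf proportions shown in the illustration; which coordinate each split tests is an assumption, since the slide does not label them:

```python
def leaf_estimate(x1, x2):
    """Return the leaf proportion (estimate of the primary outcome).
    Split cutoffs 0.63, 0.52, 0.51 follow the illustrated tree; the
    assignment of x1/x2 to each split is assumed."""
    if x2 < 0.63:
        return 0.40 if x1 < 0.52 else 0.60
    return 0.70 if x1 < 0.51 else 0.55

def leaf_decision(x1, x2, threshold=0.5):
    """Turn the leaf estimate into a decision."""
    return "primary" if leaf_estimate(x1, x2) >= threshold else "secondary"

print(leaf_estimate(0.3, 0.8))  # 0.7
print(leaf_decision(0.3, 0.8))  # primary
```

Scoring a case is therefore just a walk from the root to one leaf, which is why tree predictions are cheap and easy to explain.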
Demo: Decision Trees

Model Essentials – Decision Trees
- Select useful inputs: split search
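Split search tries every candidate cutoff on every input and keeps the one that best separates the outcomes. A minimal sketch for one input, using Gini impurity as the split criterion (an assumption for illustration; tree tools can use other criteria, such as chi-square-based logworth):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 target labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Exhaustive split search on one input: try a cutoff midway between
    each pair of adjacent distinct values, keep the lowest weighted impurity."""
    pairs = sorted(zip(xs, ys))
    best_cutoff, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cutoff between equal values
        cutoff = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < cutoff]
        right = [y for x, y in pairs if x >= cutoff]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score

xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # cutoff 0.5 gives pure children, impurity 0.0
```

Repeating this search over all inputs, and recursing into each child, grows the full tree.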
Binary Targets
Each case has a binary target: target = 1 is the primary outcome, target = 0 is the secondary outcome.

Decision Assessment
Accuracy and profit count the correct decisions: true positives and true negatives. The focus is on correct decisions.

Decision Assessment (for Pessimists)
Misclassification and loss count the incorrect decisions: false positives and false negatives. The focus is on incorrect decisions.

Ranking Assessment
Concordance counts the case pairs in which rank(target = 1) > rank(target = 0). The focus is on correct ordering.
Ranking Assessment (for Pessimists)
Discordance counts the case pairs in which rank(target = 1) < rank(target = 0). The focus is on incorrect ordering.

Estimate Assessment (only Pessimistic!)
Squared error, (target − estimate)², focuses on incorrect estimation.

Predictive Modeling Assessments
- Decisions: accuracy, misclassification
- Rankings: concordance, discordance
- Estimates: squared error

Optimistic Assessment, Pessimistic Statistics
On the training data, increasing the leaf count steadily improves every assessment measure: misclassification falls (for example from 42% toward 38%), discordance falls (45% toward 35%), and squared error falls (0.26 toward 0.22).
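The three assessment ideas reduce to a few lines each. A sketch using the example targets, rankings, and estimates from the slides:

```python
def misclassification(targets, decisions):
    """Fraction of incorrect decisions."""
    return sum(t != d for t, d in zip(targets, decisions)) / len(targets)

def concordance(targets, scores):
    """Fraction of (target=1, target=0) pairs ranked in the correct order."""
    ones = [s for t, s in zip(targets, scores) if t == 1]
    zeros = [s for t, s in zip(targets, scores) if t == 0]
    pairs = [(a, b) for a in ones for b in zeros]
    return sum(a > b for a, b in pairs) / len(pairs)

def average_squared_error(targets, estimates):
    """Mean of (target - estimate)^2."""
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / len(targets)

targets = [1, 0, 0, 1, 0]
scores = [720, 520, 620, 580, 470]
estimates = [0.65, 0.33, 0.54, 0.47, 0.28]
print(concordance(targets, scores))  # 5 of 6 pairs are concordant
```

Discordance is simply `1 - concordance` when there are no tied scores.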
Unbiased Assessment
On the validation data, increasing the leaf count might worsen the assessment measures: misclassification, discordance, and squared error eventually rise again as the tree overfits.

Demo: Pruning

Model Essentials – Regressions
- Predict new cases: prediction formula
- Select useful inputs: sequential selection
- Optimize complexity: optimal sequence model
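Pruning then amounts to choosing the subtree whose validation measure is best, preferring the simplest tree when there are ties. A sketch assuming we already have a validation error per leaf count; the numbers are hypothetical and just echo the fall-then-rise pattern described above:

```python
def best_leaf_count(validation_error):
    """Pick the smallest leaf count whose validation error is minimal,
    i.e. the simplest model among the best-performing ones."""
    best_err = min(validation_error.values())
    return min(n for n, err in validation_error.items() if err == best_err)

# Hypothetical validation misclassification by leaf count: error falls,
# bottoms out, then rises again as the tree overfits.
validation_error = {1: 0.45, 2: 0.42, 3: 0.40, 4: 0.38, 5: 0.39, 6: 0.41}
print(best_leaf_count(validation_error))  # 4
```

The same "simplest model with optimal validation fit" rule reappears below for regression sequence selection.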
Linear Regression Prediction Formula
ŷ = ŵ0 + ŵ1·x1 + ŵ2·x2, where ŵ0 is the intercept estimate, ŵ1 and ŵ2 are parameter estimates, and x1, x2 are input measurements. Choose the intercept and parameter estimates to minimize the squared error function over the training data: Σi (yi − ŷi)².

Logistic Regression Prediction Formula
log(p̂ / (1 − p̂)) = ŵ0 + ŵ1·x1 + ŵ2·x2, where the left-hand side is the logit score. Choose the intercept and parameter estimates to maximize the log-likelihood function: Σ log(p̂i) over the primary-outcome training cases plus Σ log(1 − p̂i) over the secondary-outcome training cases.

Logit Link Function
The logit link maps probabilities in (0, 1) onto the whole real line, so the linear expression ŵ0 + ŵ1·x1 + ŵ2·x2 can serve as the logit score. One consequence is easy model interpretation: exp(wi) is the odds ratio for input xi, and 0.69/wi is the doubling amount, the change in xi that doubles the odds (since exp(0.69) ≈ 2).

Inverting the logit equation logit(p̂) = ŵ0 + ŵ1·x1 + ŵ2·x2 gives the logistic equation:
p̂ = 1 / (1 + e^(−logit(p̂)))

Simple Prediction Illustration – Regressions
The logistic equation produces a smooth probability surface over the unit square of the scatter plot.
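The logit/logistic pair and the interpretation quantities translate directly into code. A minimal sketch; the intercept, parameter estimates, and inputs are hypothetical:

```python
import math

def logit_score(w, x):
    """logit(p) = w0 + w1*x1 + w2*x2 + ..."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def logistic(z):
    """Invert the logit link: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

w = [-1.0, 0.8, 0.5]   # hypothetical intercept and parameter estimates
x = [1.2, 0.4]         # hypothetical input measurements
p = logistic(logit_score(w, x))

# Interpretation: exp(w1) is the odds ratio per unit change in x1, and
# log(2)/w1 (about 0.69/w1) is the doubling amount for x1.
odds_ratio = math.exp(w[1])
doubling_amount = math.log(2) / w[1]
print(p, odds_ratio, doubling_amount)
```

Fitting the weights (maximizing the log-likelihood) is left out here; the point is how a fitted model scores a case and how its parameters are read.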
Beyond the Prediction Formula
Regression modeling must also handle:
- Missing values
- Extreme or unusual values
- Non-numeric (categorical) inputs
- Nonlinearity and non-additivity

Missing Values and Regression Modeling
Viewing the data as cases (rows) by inputs (columns), a regression fit typically uses only complete cases, so even scattered missing values can discard many cases from training.

Missing Values and the Prediction Formula
Prediction formula: logit(p̂) = −2.1 + 0.072·x1 − 0.89·x2 − 1.24·x3
New case: (x1, x2, x3) = (2, missing, −1)
Predicted value: logit(p̂) = −2.1 + 0.144 − 0.89·(missing) + 1.24, which cannot be computed: a single missing input leaves the whole formula undefined.
Missing Value Causes
- Not applicable (N/A): the value does not exist for the case.
- No match: the value could not be found or linked.
- Non-disclosure: the value was withheld.

Missing Value Remedies
- Synthetic distribution: replace missing values with a representative value from the input's distribution.
- Estimation: impute the missing input from the other inputs, xi = f(x1, ..., xp).

Model Essentials – Regressions
- Select useful inputs: sequential selection.

Sequential Selection – Forward
Start with no inputs. At each step, add the input with the smallest p-value, as long as that p-value is below the entry cutoff.
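The simplest imputation remedy can be sketched in a few lines: fill missing values with the mean of the nonmissing ones, and keep a missing indicator so the fact of missingness is still visible to the model (the indicator is a common practice added here as an assumption, not something the slides prescribe):

```python
def impute_mean(values):
    """Replace None with the mean of the nonmissing values, and record a
    0/1 missing indicator per case."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator

filled, flag = impute_mean([2.0, None, 4.0, None, 6.0])
print(filled)  # [2.0, 4.0, 4.0, 4.0, 6.0]
print(flag)    # [0, 1, 0, 1, 0]
```

After imputation, every case is complete, so no case is dropped from the regression fit and the prediction formula can always be evaluated.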
Sequential Selection – Backward
Start with all inputs. At each step, remove the input with the largest p-value, as long as that p-value is above the stay cutoff.

Sequential Selection – Stepwise
Combine the two: add inputs whose p-values fall below the entry cutoff, and remove any already-selected input whose p-value rises above the stay cutoff.
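The forward variant above has a simple greedy structure. A skeleton sketch: the `score` callback stands in for the per-input p-value computation (which requires refitting the model and is omitted here), and the hypothetical p-values ignore which inputs are already selected, which a real implementation would not:

```python
def forward_selection(candidates, score, cutoff):
    """Greedy forward selection skeleton. score(selected, extra) returns a
    figure of merit for adding `extra` to the selected inputs; smaller is
    better, playing the role of a p-value checked against the entry cutoff."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda c: score(selected, c))
        if score(selected, best) >= cutoff:
            break  # no remaining input clears the entry cutoff
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-input "p-values" (fixed, for illustration only).
pvals = {"x1": 0.001, "x2": 0.02, "x3": 0.40}
chosen = forward_selection(pvals, lambda sel, c: pvals[c], cutoff=0.05)
print(chosen)  # ['x1', 'x2']
```

Backward selection inverts the loop (start full, remove the worst input while it exceeds the stay cutoff), and stepwise interleaves the two moves.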
Model Essentials – Regressions
- Optimize complexity: optimal sequence model.

Model Fit versus Complexity
Plot a model fit statistic at each step of the selection sequence, on both the training and the validation data. The training fit keeps improving with complexity; the validation fit does not.

Select Model with Optimal Validation Fit
Evaluate each sequence step (1, 2, 3, ...) and choose the simplest model with optimal validation fit.
Model Essentials – Neural Networks
- Predict new cases: prediction formula
- Select useful inputs: none (the network uses all inputs)
- Optimize complexity: stopped training

Neural Network Prediction Formula
ŷ = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3
H1 = tanh(ŵ10 + ŵ11·x1 + ŵ12·x2)
H2 = tanh(ŵ20 + ŵ21·x1 + ŵ22·x2)
H3 = tanh(ŵ30 + ŵ31·x1 + ŵ32·x2)
Here ŵ00 is the bias estimate, the remaining ŵ's are weight estimates, the Hi are hidden units, and tanh is the activation function, an S-shaped curve running from −1 to 1.
Neural Network Binary Prediction Formula
For a binary target, apply the logit link to the output:
log(p̂ / (1 − p̂)) = ŵ00 + ŵ01·H1 + ŵ02·H2 + ŵ03·H3
with the same hidden units Hi = tanh(ŵi0 + ŵi1·x1 + ŵi2·x2).

Neural Network Diagram
The inputs form the input layer, the units H1, H2, H3 form the hidden layer, and the prediction forms the output layer.

Prediction Illustration – Neural Networks
Inverting the logit gives the logistic equation p̂ = 1 / (1 + e^(−logit(p̂))), which produces a flexible, smooth probability surface over the scatter plot.
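The three-hidden-unit binary formula is a short forward pass in code. A minimal sketch; all weight estimates and inputs below are hypothetical, since training (the log-likelihood optimization) is not shown:

```python
import math

def hidden_unit(w, x):
    """H = tanh(w0 + w1*x1 + w2*x2)."""
    return math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def network_probability(output_w, hidden_w, x):
    """Binary prediction: logistic of the weighted hidden units."""
    hs = [hidden_unit(w, x) for w in hidden_w]
    logit = output_w[0] + sum(wi * h for wi, h in zip(output_w[1:], hs))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weight estimates for a 2-input, 3-hidden-unit network.
hidden_w = [[0.1, 0.5, -0.3], [-0.2, 0.4, 0.8], [0.0, -0.6, 0.2]]
output_w = [0.3, 1.0, -0.5, 0.7]
p = network_probability(output_w, hidden_w, [0.25, 0.75])
print(p)  # a probability strictly between 0 and 1
```

Structurally this is a logistic regression whose "inputs" are the learned hidden units, which is what makes the probability surface flexible.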
Assessment Types
The Model Comparison tool provides summary statistics and statistical graphics.

Summary Statistics
- Decisions: accuracy / misclassification, profit / loss, Kolmogorov-Smirnov (KS) statistic
- Rankings: ROC index (concordance), Gini coefficient
- Estimates (p̂, E(Y)): average squared error, SBC / likelihood
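Of these statistics, the KS statistic is the least self-explanatory: it is the maximum gap between the cumulative score distributions of the two outcomes. A minimal sketch with hypothetical targets and scores:

```python
def ks_statistic(targets, scores):
    """Maximum gap between the cumulative score distributions of the
    primary (target=1) and secondary (target=0) outcomes."""
    ones = [s for t, s in zip(targets, scores) if t == 1]
    zeros = [s for t, s in zip(targets, scores) if t == 0]
    best = 0.0
    for cut in sorted(set(scores)):
        f1 = sum(s <= cut for s in ones) / len(ones)    # CDF of events
        f0 = sum(s <= cut for s in zeros) / len(zeros)  # CDF of non-events
        best = max(best, abs(f1 - f0))
    return best

targets = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(ks_statistic(targets, scores))
```

A KS of 0 means the score does not separate the outcomes at all; a KS of 1 means some cutoff separates them perfectly.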
Statistical Graphics
- Sensitivity charts
- Response rate charts