Decision Trees and Other Predictive Models. Mathias Lanner, SAS Institute


Decision Trees and Other Predictive Models
Mathias Lanner, SAS Institute

Agenda
  Introduction to Predictive Models
  Decision Trees
  Pruning
  Regressions
  Neural Networks
  Model Assessment

Predictive Modeling Applications
  Database marketing
  Financial risk management
  Fraud detection
  Process monitoring
  Pattern detection

Predictive Modeling: Training Data
  [Table: training cases 1 to 5, each with inputs and a target.]
  Inputs and targets are numeric or categorical values.

Predictive Modeling: Score Data
  [Table: training data and score data, cases 1 to 5, each with inputs; predictions are attached to the score data.]
  In score data, only the input values are known.

Predictive Modeling Essentials
  Predict new cases

Predictive Modeling Essentials
  Predict new cases

Three Prediction Types
  Decisions
  Rankings
  Estimates

Decision Predictions
  [Table: cases 1 to 5 with inputs and decisions (primary, secondary, tertiary).]
  The trained model uses input measurements to make the best decision for each case.

Ranking Predictions
  [Table: cases 1 to 5 with inputs and rankings.]
  The trained model uses input measurements to optimally rank each case.

Estimate Predictions
  [Table: cases 1 to 5 with inputs and estimates.]
  The trained model uses input measurements to optimally estimate the target value.

Model Essentials: Predict Review
  Predict new cases: decide, rank, estimate

Model Essentials: Select Review
  Predict new cases

Curse of Dimensionality
  [Figure: the same data shown in 1-D, 2-D, and 3-D.]

Input Selection
  Redundancy
  Irrelevancy

Model Essentials: Select Review
  Predict new cases: decide, rank, estimate
  Eradicate redundancies and irrelevancies

Model Essentials: Optimize
  Predict new cases

Fool's Gold
  "My model fits the training data perfectly... I've struck it rich!"

Model Complexity
  Too flexible
  Not flexible enough

Data Splitting
  [Table: cases 1 to 5 with inputs, partitioned into training data and validation data.]

Training Data Role
  Training data gives a sequence of predictive models with increasing complexity.

Validation Data Role
  Validation data helps select the best model from the sequence.
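The data-splitting step can be illustrated with a minimal Python sketch. This is not the SAS Enterprise Miner partitioning procedure; the function name `split_cases`, the 30% validation fraction, and the seed are assumptions chosen for illustration.

```python
import random

def split_cases(cases, validation_fraction=0.5, seed=0):
    """Randomly partition cases into (training, validation) lists.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    rng = random.Random(seed)          # private generator, so the split is reproducible
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_valid]    # held out to compare candidate models
    training = shuffled[n_valid:]      # used to fit the sequence of models
    return training, validation

cases = list(range(1, 11))             # ten hypothetical case IDs
training, validation = split_cases(cases, validation_fraction=0.3, seed=42)
```

Every case lands in exactly one of the two sets, so a fit statistic computed on the validation cases is not influenced by the model fitting itself.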

Validation Data Role
  Validation data helps select the best model from the sequence.

Model Essentials: Optimize
  Predict new cases: decide, rank, estimate
  Eradicate redundancies and irrelevancies
  Tune models with validation data

Agenda
  Introduction to Predictive Models
  Decision Trees
  Pruning
  Regressions
  Neural Networks
  Model Assessment

Predictive Modeling Tools
  Primary: Decision Tree, Regression, Neural Network
  Specialty: Dmine Regression, MBR, AutoNeural, Rule Induction, DMNeural
  Multiple Model: Ensemble, Two Stage

Predictive Modeling Tools
  Primary: Decision Tree, Regression, Neural Network
  Specialty: Dmine Regression, MBR, AutoNeural, Rule Induction, DMNeural
  Multiple Model: Ensemble, Two Stage

Model Essentials: Decision Trees
  Predict new cases
  Prediction rules
  Split search
  Pruning

Simple Prediction Illustration
  Analysis goal: predict the color of a dot based on its location in a scatter plot.

Model Essentials: Decision Trees
  Predict new cases
  Prediction rules
  Split search
  Pruning

Decision Tree Prediction Rules
  [Figure: decision tree with a root node, interior nodes, and leaf nodes. The leaf estimates that survive in the transcription are 40%, 70%, and 55%; the split conditions and the example's Decision and Estimate values were lost in extraction.]
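How prediction rules turn a case into a decision and an estimate can be sketched in a few lines of Python. The tree below is hypothetical: the input names `x1` and `x2` and both split thresholds are invented for illustration, and only the leaf estimates 0.40, 0.70, and 0.55 echo values that appear on the slide. The 0.5 decision cutoff is also an assumption.

```python
# Hypothetical tree: thresholds and input names are illustrative assumptions.
TREE = {
    "input": "x2", "threshold": 0.63,
    "below": {
        "input": "x1", "threshold": 0.52,
        "below": {"estimate": 0.40},   # leaf node
        "above": {"estimate": 0.70},   # leaf node
    },
    "above": {"estimate": 0.55},       # leaf node
}

def predict(tree, case):
    """Follow the split rules from the root node down to a leaf node."""
    node = tree
    while "estimate" not in node:      # interior node: apply its split rule
        branch = "below" if case[node["input"]] < node["threshold"] else "above"
        node = node[branch]
    estimate = node["estimate"]        # proportion of the primary outcome in the leaf
    decision = "primary" if estimate >= 0.5 else "secondary"
    return decision, estimate

print(predict(TREE, {"x1": 0.9, "x2": 0.1}))
```

Each root-to-leaf path is one prediction rule, so the same structure yields a decision, a ranking score, and an estimate for any new case.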

Model Essentials: Decision Trees
  Predict new cases
  Prediction rules
  Split search
  Pruning

Demo: Decision Trees

Agenda
  Introduction to Predictive Models
  Decision Trees
  Pruning
  Regressions
  Neural Networks
  Model Assessment

Binary Targets
  [Table: cases 1 to 5 with inputs and target values 1, 0, 0, 1, 0.]
  target = 1: primary outcome
  target = 0: secondary outcome

Decision Assessment
  [Table: cases 1 to 5 with target values 1, 0, 0, 1, 0 and decisions primary, secondary, primary, secondary, secondary.]
  Accuracy / Profit: true positive, true negative
  Focus on correct decisions.

Decision Assessment (for Pessimists)
  Misclassification / Loss: false positive, false negative
  Focus on incorrect decisions.

Ranking Assessment
  Concordance: rank(target=1) > rank(target=0)
  Focus on correct ordering.

Ranking Assessment (for Pessimists)
  Discordance: rank(target=1) < rank(target=0)
  Focus on incorrect ordering.

Estimate Assessment (only Pessimistic!)
  Squared error: $(\text{target} - \text{estimate})^2$
  Focus on incorrect estimation.

Predictive Modeling Assessments
  Decisions: Accuracy, Misclassification
  Rankings: Concordance, Discordance
  Estimates: Squared Error

Optimistic Assessment, Pessimistic Stats
  [Figure: misclassification, discordance, and squared error plotted against leaf count on the training data.]
  Increasing leaf count improves assessment measures on training data.
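The pessimistic statistics for the three prediction types can be computed directly from their definitions. The sketch below is a plain-Python illustration, not the SAS implementation; the target pattern 1, 0, 0, 1, 0 follows the slides, while the estimates and the 0.5 decision cutoff are hypothetical.

```python
def misclassification(targets, decisions):
    """Share of cases whose decision differs from the target."""
    wrong = sum(1 for t, d in zip(targets, decisions) if t != d)
    return wrong / len(targets)

def concordance_discordance(targets, scores):
    """Compare every (target=1, target=0) pair of cases by score.

    Concordant: the target=1 case is ranked higher; discordant: lower.
    """
    primary = [s for t, s in zip(targets, scores) if t == 1]
    secondary = [s for t, s in zip(targets, scores) if t == 0]
    pairs = len(primary) * len(secondary)
    concordant = sum(1 for p in primary for s in secondary if p > s)
    discordant = sum(1 for p in primary for s in secondary if p < s)
    return concordant / pairs, discordant / pairs

def average_squared_error(targets, estimates):
    """Mean of (target - estimate)^2 over all cases."""
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / len(targets)

targets = [1, 0, 0, 1, 0]                   # target pattern used on the slides
estimates = [0.8, 0.3, 0.75, 0.7, 0.2]      # hypothetical model estimates
decisions = [1 if e >= 0.5 else 0 for e in estimates]  # assumed 0.5 cutoff

print(misclassification(targets, decisions))
print(concordance_discordance(targets, estimates))
print(average_squared_error(targets, estimates))
```

Computing the same three functions on training cases and again on validation cases is what separates the optimistic view from the unbiased one.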

Unbiased Assessment
  [Figure: misclassification, discordance, and squared error plotted against leaf count on the validation data.]
  Increasing leaf count might worsen assessment measures on validation data.

Demo: Pruning

Model Essentials: Regressions
  Predict new cases
  Prediction formula
  Sequential selection
  Optimal sequence model

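Linear Regression Prediction Formula
  $\hat{y} = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$
  Here $\hat{y}$ is the prediction estimate, $\hat{w}_0$ the intercept estimate, $\hat{w}_1$ and $\hat{w}_2$ the parameter estimates, and $x_1$ and $x_2$ the input measurements.
  Choose intercept and parameter estimates to minimize the squared error function over the training data:
  $\sum_{\text{training data}} (y_i - \hat{y}_i)^2$

Logistic Regression Prediction Formula
  $\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$ (logit scores)
  Choose intercept and parameter estimates to maximize the log-likelihood function:
  $\sum_{\text{primary outcome training cases}} \log(\hat{p}_i) + \sum_{\text{secondary outcome training cases}} \log(1 - \hat{p}_i)$

Logit Link Function
  $\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$
  The logit link function maps the probability estimate to the logit scores.

Model interpretation
  Odds ratio: $\Delta x_i = 1 \Rightarrow \text{odds} \times \exp(w_i)$
  Doubling amount: $\Delta x_i = 0.69 / w_i \Rightarrow \text{odds} \times 2$

Simple Prediction Illustration: Regressions
  Logit equation: $\text{logit}(\hat{p}) = \hat{w}_0 + \hat{w}_1 x_1 + \hat{w}_2 x_2$
  Logistic equation: $\hat{p} = \frac{1}{1 + e^{-\text{logit}(\hat{p})}}$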
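The logit and logistic equations, together with the odds-ratio and doubling-amount interpretation, can be checked numerically. The following is a minimal sketch using only the Python standard library; the weight values and inputs are hypothetical, and the function names are introduced here for illustration.

```python
import math

def logit_score(weights, inputs):
    """w0 + w1*x1 + w2*x2 + ...; weights[0] is the intercept estimate."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

def logistic(logit):
    """Invert the logit link: p = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))

def odds(p):
    return p / (1.0 - p)

def odds_ratio(w):
    """Factor by which the odds change for a unit change in the input."""
    return math.exp(w)

def doubling_amount(w):
    """Change in the input that doubles the odds: ln(2) / w, about 0.69 / w."""
    return math.log(2.0) / w

weights = [-0.5, 1.2, -0.8]            # hypothetical estimates w0, w1, w2
x = [1.0, 0.5]                         # hypothetical input measurements x1, x2

p = logistic(logit_score(weights, x))
p_shifted = logistic(logit_score(weights, [x[0] + doubling_amount(weights[1]), x[1]]))
print(p, odds(p_shifted) / odds(p))    # the ratio of the two odds is 2 up to rounding
```

Because the logit is linear in the inputs, the doubling amount does not depend on where the case starts, which is what makes the odds ratio a convenient summary of each parameter estimate.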

Beyond the Prediction Formula
  Missing values
  Extreme or unusual values
  Non-numeric inputs
  Nonlinearity and non-additivity

Missing Values and Regression Modeling
  [Table: cases by inputs, with some input values missing.]

Missing Values and the Prediction Formula
  Prediction formula: logit(p) = [coefficients lost in extraction]
  New case: ( 2, , -1 ), with the second input value missing
  Predicted value: logit(p) = [value lost in extraction]

Missing Value Causes
  N/A: not applicable
  No match
  Non-disclosure

Missing Value Remedies
  Synthetic distribution
  Estimation: $x_i = f(x_1, \ldots, x_p)$

Model Essentials: Regressions
  Predict new cases
  Prediction formula
  Sequential selection
  Optimal sequence model

Sequential Selection: Forward
  [Figure: input p-values compared against the entry cutoff.]
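A regression prediction formula cannot be evaluated when an input is missing, so some replacement is needed before scoring. The sketch below shows one common remedy, replacing a missing value with the training mean and recording a missing-value indicator. It is an assumption that this matches the remedy demonstrated in the talk; the helper names and the training values are hypothetical, and `None` stands for a missing value.

```python
def fit_means(training_cases):
    """Mean of each input over the training cases where it is observed."""
    n_inputs = len(training_cases[0])
    means = []
    for j in range(n_inputs):
        observed = [row[j] for row in training_cases if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute(case, means):
    """Fill missing inputs with training means and flag which were missing."""
    filled = [m if x is None else x for x, m in zip(case, means)]
    flags = [1 if x is None else 0 for x in case]
    return filled, flags

training = [                      # hypothetical training inputs
    [1.0, 2.0, 3.0],
    [3.0, None, 5.0],
    [2.0, 4.0, None],
]
means = fit_means(training)

new_case = [2, None, -1]          # second input missing, as in the slide's new case
filled, flags = impute(new_case, means)
print(filled, flags)
```

The means are learned from the training data only and then reused for scoring, and the indicator flags can themselves enter the model as inputs when missingness is informative.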

Sequential Selection: Forward
  [Figure: input p-values compared against the entry cutoff.]

Sequential Selection: Backward
  [Figure: input p-values compared against the stay cutoff.]

Sequential Selection: Stepwise
  [Figure: input p-values compared against the entry cutoff and the stay cutoff.]

Sequential Selection: Stepwise
  [Figure: input p-values compared against the entry cutoff and the stay cutoff.]

Model Essentials: Regressions
  Predict new cases
  Prediction formula
  Sequential selection
  Optimal sequence model

Model Fit versus Complexity
  [Figure: model fit statistic for training and validation data at each step of the selection sequence.]
  Evaluate each sequence step.

Select Model with Optimal Validation Fit
  [Figure: model fit statistic for training and validation data, with the selected step marked.]
  Choose the simplest optimal model.
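Choosing the simplest optimal model from a selection sequence amounts to scanning the validation fit statistic step by step. The sketch below assumes a fit statistic where lower is better (for example average squared error); the function name, the optional tolerance, and the validation values are hypothetical.

```python
def simplest_optimal_step(validation_fit, tolerance=0.0):
    """Index of the earliest sequence step whose validation fit is optimal.

    Assumes lower is better. Steps are ordered by increasing complexity,
    so the first step within `tolerance` of the best value is the simplest.
    """
    best = min(validation_fit)
    for step, fit in enumerate(validation_fit):
        if fit <= best + tolerance:
            return step

# Hypothetical validation average squared error at each selection step.
validation_ase = [0.25, 0.21, 0.19, 0.19, 0.20, 0.22]
chosen = simplest_optimal_step(validation_ase)
print(chosen)
```

Steps 2 and 3 tie on validation fit in this example, and the earlier, less complex step is preferred. A small positive tolerance would allow an even simpler model whose fit is nearly as good.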

Agenda
  Introduction to Predictive Models
  Decision Trees
  Pruning
  Regressions
  Neural Networks
  Model Assessment

Model Essentials: Neural Networks
  Predict new cases
  Prediction formula
  None
  Stopped training

Neural Network Prediction Formula
  $\hat{y} = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$
  $H_1 = \tanh(\hat{w}_{10} + \hat{w}_{11} x_1 + \hat{w}_{12} x_2)$
  $H_2 = \tanh(\hat{w}_{20} + \hat{w}_{21} x_1 + \hat{w}_{22} x_2)$
  $H_3 = \tanh(\hat{w}_{30} + \hat{w}_{31} x_1 + \hat{w}_{32} x_2)$
  Here $\hat{y}$ is the prediction estimate, $\hat{w}_{00}$ the bias estimate, $\hat{w}_{01}$ to $\hat{w}_{03}$ the weight estimates, $H_1$ to $H_3$ the hidden units, and $\tanh$ the activation function.

Neural Network Binary Prediction Formula
  $\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$ (logit link function)
  $H_1 = \tanh(\hat{w}_{10} + \hat{w}_{11} x_1 + \hat{w}_{12} x_2)$
  $H_2 = \tanh(\hat{w}_{20} + \hat{w}_{21} x_1 + \hat{w}_{22} x_2)$
  $H_3 = \tanh(\hat{w}_{30} + \hat{w}_{31} x_1 + \hat{w}_{32} x_2)$

Neural Network Diagram
  [Figure: network diagram with an input layer, a hidden layer ($H_1$, $H_2$, $H_3$), and a layer for $y$.]

Prediction Illustration: Neural Networks
  Logit equation: $\text{logit}(\hat{p}) = \hat{w}_{00} + \hat{w}_{01} H_1 + \hat{w}_{02} H_2 + \hat{w}_{03} H_3$
  Logistic equation: $\hat{p} = \frac{1}{1 + e^{-\text{logit}(\hat{p})}}$

Agenda
  Introduction to Predictive Models
  Decision Trees
  Pruning
  Regressions
  Neural Networks
  Model Assessment
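The binary prediction formula is a logistic regression on hidden units that are themselves tanh transforms of linear combinations of the inputs. A minimal forward-pass sketch in plain Python follows; all weight values are hypothetical, and training (including stopped training) is not shown.

```python
import math

def hidden_units(hidden_weights, inputs):
    """H_j = tanh(w_j0 + w_j1*x1 + w_j2*x2 + ...) for each hidden unit j."""
    return [
        math.tanh(w[0] + sum(wi * x for wi, x in zip(w[1:], inputs)))
        for w in hidden_weights
    ]

def predict_probability(output_weights, hidden_weights, inputs):
    """logit(p) = w00 + w01*H1 + w02*H2 + w03*H3, then invert the logit link."""
    h = hidden_units(hidden_weights, inputs)
    logit = output_weights[0] + sum(w * hj for w, hj in zip(output_weights[1:], h))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weight estimates: three hidden units, two inputs.
hidden_weights = [
    [0.1, 0.8, -0.4],    # w10, w11, w12
    [-0.3, 0.5, 0.9],    # w20, w21, w22
    [0.0, -1.1, 0.2],    # w30, w31, w32
]
output_weights = [0.2, 1.5, -0.7, 0.4]   # w00 (bias), w01, w02, w03

print(predict_probability(output_weights, hidden_weights, [0.5, -1.0]))
```

With all hidden-layer weights set to zero the hidden units are zero and the formula reduces to the logistic of the bias estimate alone, which is a quick sanity check on any implementation.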

Assessment Types
  The Model Comparison tool provides:
  Summary statistics (for example KS, C, ASE)
  Statistical graphics

Summary Statistics Summary
  Prediction type    Statistic
  Decisions          Accuracy / Misclassification; Profit / Loss; KS-statistic
  Rankings           ROC Index (concordance); Gini coefficient
  Estimates          Average squared error; SBC / Likelihood
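Three of the summary statistics listed above can be computed from first principles on a small example. This sketch is a plain-Python illustration rather than the Model Comparison tool's implementation: the ROC index is computed as the concordance with ties counted as one half, the Gini coefficient uses the standard relation Gini = 2 x ROC index - 1, and the KS statistic is the largest gap between the score distributions of the two outcome groups. The scores are hypothetical.

```python
def split_scores(targets, scores):
    primary = [s for t, s in zip(targets, scores) if t == 1]
    secondary = [s for t, s in zip(targets, scores) if t == 0]
    return primary, secondary

def roc_index(targets, scores):
    """Concordance over all (target=1, target=0) pairs, ties counted as 0.5."""
    primary, secondary = split_scores(targets, scores)
    total = 0.0
    for p in primary:
        for s in secondary:
            if p > s:
                total += 1.0
            elif p == s:
                total += 0.5
    return total / (len(primary) * len(secondary))

def gini(targets, scores):
    return 2.0 * roc_index(targets, scores) - 1.0

def ks_statistic(targets, scores):
    """Maximum distance between the empirical score CDFs of the two groups."""
    primary, secondary = split_scores(targets, scores)
    largest = 0.0
    for cut in sorted(set(scores)):
        cdf_primary = sum(1 for s in primary if s <= cut) / len(primary)
        cdf_secondary = sum(1 for s in secondary if s <= cut) / len(secondary)
        largest = max(largest, abs(cdf_primary - cdf_secondary))
    return largest

targets = [1, 0, 0, 1, 0]                 # target pattern used on the slides
scores = [0.8, 0.3, 0.75, 0.7, 0.2]       # hypothetical model scores

print(roc_index(targets, scores), gini(targets, scores), ks_statistic(targets, scores))
```

All three depend only on how the scores order the cases, so they are unchanged by any strictly increasing transformation of the scores.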

Statistical Graphics Summary
  Prediction types: Decisions, Rankings, Estimates
  Statistics: Sensitivity charts, Response rate charts


More information

Enterprise Miner - Decision tree 1

Enterprise Miner - Decision tree 1 Enterprise Miner - Decision tree 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Decision Tree I. Tree Node Setting Tree Node Defaults - define default options that you commonly use

More information

Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Identifying and Overcoming Common Data Mining Mistakes Doug Wielenga, SAS Institute Inc., Cary, NC

Identifying and Overcoming Common Data Mining Mistakes Doug Wielenga, SAS Institute Inc., Cary, NC Identifying and Overcoming Common Data Mining Mistakes Doug Wielenga, SAS Institute Inc., Cary, NC ABSTRACT Due to the large amount of data typically involved, data mining analyses can exacerbate some

More information

Question 2 Naïve Bayes (16 points)

Question 2 Naïve Bayes (16 points) Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

data on Down's syndrome

data on Down's syndrome DATA a; INFILE 'downs.dat' ; INPUT AgeL AgeU BirthOrd Cases Births ; MidAge = (AgeL + AgeU)/2 ; Rate = 1000*Cases/Births; LogRate = Log( (Cases+0.5)/Births ); LogDenom = Log(Births); age_c = MidAge - 30;

More information

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.

Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4. Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics

More information

Yiming Peng, Department of Statistics. February 12, 2013

Yiming Peng, Department of Statistics. February 12, 2013 Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop

More information

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable

More information

Smart Sell Re-quote project for an Insurance company.

Smart Sell Re-quote project for an Insurance company. SAS Analytics Day Smart Sell Re-quote project for an Insurance company. A project by Ajay Guyyala Naga Sudhir Lanka Narendra Babu Merla Kiran Reddy Samiullah Bramhanapalli Shaik Business Situation XYZ

More information

SAS ENTERPRISE MINER 5.3

SAS ENTERPRISE MINER 5.3 FACT SHEET SAS ENTERPRISE MINER 5.3 Unearthing valuable insight profitable data mining results with less time and effort What does SAS Enterprise Miner do? SAS Enterprise Miner streamlines the data mining

More information

The Big 50 Revision Guidelines for S1

The Big 50 Revision Guidelines for S1 The Big 50 Revision Guidelines for S1 If you can understand all of these you ll do very well 1. Know what is meant by a statistical model and the Modelling cycle of continuous refinement 2. Understand

More information

Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

More information

Lecture 16: Logistic regression diagnostics, splines and interactions. Sandy Eckel 19 May 2007

Lecture 16: Logistic regression diagnostics, splines and interactions. Sandy Eckel 19 May 2007 Lecture 16: Logistic regression diagnostics, splines and interactions Sandy Eckel seckel@jhsph.edu 19 May 2007 1 Logistic Regression Diagnostics Graphs to check assumptions Recall: Graphing was used to

More information

Weight of Evidence Module

Weight of Evidence Module Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 26387099, e-mail: irina.genriha@inbox.

MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 26387099, e-mail: irina.genriha@inbox. MORTGAGE LENDER PROTECTION UNDER INSURANCE ARRANGEMENTS Irina Genriha Latvian University, Tel. +371 2638799, e-mail: irina.genriha@inbox.lv Since September 28, when the crisis deepened in the first place

More information

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

More information

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

More information

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

More information

Classification and regression trees

Classification and regression trees Classification and regression trees December 9 Introduction We ve seen that local methods and splines both operate by partitioning the sample space of the regression variable(s), and then fitting separate/piecewise

More information

Neural Networks & Boosting

Neural Networks & Boosting Neural Networks & Boosting Bob Stine Dept of Statistics, School University of Pennsylvania Questions How is logistic regression different from OLS? Logistic mean function for probabilities Larger weight

More information

Key Topics What will ALL students learn? What will the most able students learn?

Key Topics What will ALL students learn? What will the most able students learn? 2013 2014 Scheme of Work Subject MATHS Year 9 Course/ Year Term 1 Key Topics What will ALL students learn? What will the most able students learn? Number Written methods of calculations Decimals Rounding

More information

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study

Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Use Data Mining Techniques to Assist Institutions in Achieving Enrollment Goals: A Case Study Tongshan Chang The University of California Office of the President CAIR Conference in Pasadena 11/13/2008

More information

Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry

Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Paper 1808-2014 Reevaluating Policy and Claims Analytics: a Case of Non-Fleet Customers In Automobile Insurance Industry Kittipong Trongsawad and Jongsawas Chongwatpol NIDA Business School, National Institute

More information

Simple Methods and Procedures Used in Forecasting

Simple Methods and Procedures Used in Forecasting Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events

More information

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier

Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing and Developing E-mail Classifier International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-1, Issue-6, January 2013 Artificial Neural Network, Decision Tree and Statistical Techniques Applied for Designing

More information

Step 1 MATHS. To achieve Step 1 in Maths students must master the following skills and competencies: Number. Shape. Algebra

Step 1 MATHS. To achieve Step 1 in Maths students must master the following skills and competencies: Number. Shape. Algebra MATHS Step 1 To achieve Step 1 in Maths students must master the following skills and competencies: Number Add and subtract positive decimal numbers Add and subtract negative numbers in context Order decimal

More information

GCSE Statistics Revision notes

GCSE Statistics Revision notes GCSE Statistics Revision notes Collecting data Sample This is when data is collected from part of the population. There are different methods for sampling Random sampling, Stratified sampling, Systematic

More information

Variable Selection and Transformation of Variables in SAS Enterprise Miner

Variable Selection and Transformation of Variables in SAS Enterprise Miner Variable Selection and Transformation of Variables in SAS Enterprise Miner Kattamuri S. Sarma, Ph.D Ecostat Research Corp., White Plains NY kssarma@worldnet.att.net kssarma@ecostat-research.com 2 Issues

More information

Semester 1 Statistics Short courses

Semester 1 Statistics Short courses Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical

More information

Data Mining: A Magic Technology for College Recruitment. Tongshan Chang, Ed.D.

Data Mining: A Magic Technology for College Recruitment. Tongshan Chang, Ed.D. Data Mining: A Magic Technology for College Recruitment Tongshan Chang, Ed.D. Principal Administrative Analyst Admissions Research and Evaluation The University of California Office of the President Tongshan.Chang@ucop.edu

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

College Tuition: Data mining and analysis

College Tuition: Data mining and analysis CS105 College Tuition: Data mining and analysis By Jeanette Chu & Khiem Tran 4/28/2010 Introduction College tuition issues are steadily increasing every year. According to the college pricing trends report

More information

The general form of the PROC GLM statement is

The general form of the PROC GLM statement is Linear Regression Analysis using PROC GLM Regression analysis is a statistical method of obtaining an equation that represents a linear relationship between two variables (simple linear regression), or

More information