



Enterprise Miner - Regression

ECLT5810 E-Commerce Data Mining Technique
SAS Enterprise Miner -- Regression Model

I. Regression Node

1. Some background:

Linear Regression: attempts to predict the value of a continuous target as a linear function of one or more independent inputs.
Logistic Regression: attempts to predict the probability that a binary or ordinal target will acquire the event of interest as a function of one or more independent inputs.

N.B.: The Regression node cannot handle a nominal target.

Suppose there are three variables: A, B and C.

Effect: a variable (or combination of variables) used to model the value / probability
Main input / effect: A, B and C
Multiplication effect / Interaction terms: e.g. A*B, A*B*C
Polynomial effect: e.g. A**2, B**3
Selection Method: method used to select effects (e.g. starting from all effects, or starting from none)
Selection Criteria: criteria used to evaluate the effects of a model on the target
Optimization Method: numerical method used to estimate the model parameters for a given set of effects
Linking Function: function used to link the response to the linear predictor, e.g. the logit links the probability of a binary target to a linear function of the inputs

Example Data: SAMPSIO.DMAGECR
Variable: GOOD_BAD (model role: target)

GOOD_BAD > Edit target profile > Assessment Information > Add matrix
- Accept Good (i.e. true positive): -1
- Accept Bad (i.e. false positive): 5
- Others: 0
Edit Decision: Minimize Loss

Data Partition: 70% Training, 30% Validation, with stratification on GOOD_BAD so that Good and Bad keep the same proportions in both data sets.

Model Options Tab
- lists details about the target variable and the regression process, and enables you to specify options for both
- Target Definition Subtab - lists the name, measurement level, and event level of the target variable
- Regression Subtab - sets the regression type, link function, and input coding
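To make the effect terminology concrete, the sketch below builds the main, interaction, and polynomial effects for one observation and combines a few of them into a linear predictor. It is a plain Python illustration, not Enterprise Miner code; the helper names (`build_effects`, `linear_predictor`) and the coefficient values are hypothetical.

```python
import math
from typing import Dict


def build_effects(a: float, b: float, c: float) -> Dict[str, float]:
    """Main, interaction and polynomial effects for one observation."""
    return {
        "A": a, "B": b, "C": c,              # main effects
        "A*B": a * b, "A*B*C": a * b * c,    # interaction (multiplication) effects
        "A**2": a ** 2, "B**3": b ** 3,      # polynomial effects
    }


def linear_predictor(effects: Dict[str, float], coefs: Dict[str, float],
                     intercept: float) -> float:
    """Intercept plus coefficient * effect, summed over the effects in the model."""
    return intercept + sum(coef * effects[name] for name, coef in coefs.items())


row = build_effects(a=2.0, b=3.0, c=0.5)
# Hypothetical model that contains only the A main effect and the A*B interaction.
eta = linear_predictor(row, coefs={"A": 1.0, "A*B": -0.5}, intercept=1.0)
linear_prediction = eta                               # linear regression: predicted value
logistic_prediction = 1.0 / (1.0 + math.exp(-eta))    # logistic regression: event probability
print(row["A*B"], row["A*B*C"], eta, logistic_prediction)
```

Linear regression reports the linear predictor directly as the predicted value, while logistic regression passes it through an inverse link (here the logistic function) to obtain a probability between 0 and 1.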

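The loss matrix defined for GOOD_BAD also fixes the cut-off implied by the "Minimize Loss" decision. Assuming the Reject row is all zeros (the "Others: 0" entries) and writing p for the predicted probability of Good, the expected loss of accepting is -1*p + 5*(1 - p) = 5 - 6p and the expected loss of rejecting is 0, so accepting is cheaper only when p > 5/6 (about 0.833). A minimal sketch of that rule; the names `LOSS`, `expected_loss` and `decide` are hypothetical.

```python
from typing import Dict, Tuple

# Loss matrix from the target profile: (decision, actual outcome) -> loss.
# The two "reject" entries are the "Others: 0" cells.
LOSS: Dict[Tuple[str, str], float] = {
    ("accept", "good"): -1.0,   # true positive: a gain of 1
    ("accept", "bad"): 5.0,     # false positive: a loss of 5
    ("reject", "good"): 0.0,
    ("reject", "bad"): 0.0,
}


def expected_loss(decision: str, p_good: float) -> float:
    """Expected loss of a decision given the predicted probability of Good."""
    return p_good * LOSS[(decision, "good")] + (1.0 - p_good) * LOSS[(decision, "bad")]


def decide(p_good: float) -> str:
    """Pick the decision with the smallest expected loss (Minimize Loss)."""
    return min(("accept", "reject"), key=lambda d: expected_loss(d, p_good))


for p in (0.80, 0.90):
    print(p, decide(p), round(expected_loss("accept", p), 3))
```

Even an applicant with an 80% predicted chance of being Good is rejected, because wrongly accepting a Bad applicant costs five times what correctly accepting a Good one earns.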
Regression Subtab

Type
- Binary or ordinal targets: logistic (default)
- Interval targets: linear (default)

Link Function (for logistic regression)
- logit (default)
- cloglog (complementary log-log)
- probit

Input Coding - converts categorical inputs into numeric design variables
- Deviation: effect coding; the reference level is coded -1 on every design variable, so each parameter measures a level's deviation from the average over all levels
- GLM: 0/1 coding that uses the highest / lowest (descending / ascending) level as the reference level

Selection Method Tab

Method
- Backward: begins with all candidate effects and removes effects one at a time
- Forward: begins with no candidate effects and adds effects one at a time
- Stepwise: begins with no candidate effects, adds effects, and may remove effects already in the model
- None: all candidate effects are included

Criteria
- AIC: Akaike's Information Criterion (smallest value is best)
- SBC: Schwarz's Bayesian Criterion (smallest value is best)
- Validation Error: smallest error for the validation data set
- Validation Misclassification: smallest misclassification rate for the validation data set
- Cross-Validation Error: smallest cross-validation error for the training data set
- Cross-Validation Misclassification: smallest cross-validation misclassification rate for the training data set
- Profit/Loss: maximizes the profit or minimizes the loss for the cases in the validation data set
- Cross-Validation Profit/Loss: maximizes the cross-validation profit or minimizes the cross-validation loss
- None: the last model produced by the effect selection method
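The three link functions listed for logistic regression differ only in how they map the linear predictor eta back to a probability. The sketch below writes out their inverses with Python's standard `math` module; the function names are illustrative, not part of any SAS interface.

```python
import math


def inverse_logit(eta: float) -> float:
    """Logit link: p = 1 / (1 + exp(-eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inverse_probit(eta: float) -> float:
    """Probit link: p = standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inverse_cloglog(eta: float) -> float:
    """Complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-2.0, 0.0, 2.0):
    print(eta,
          round(inverse_logit(eta), 3),
          round(inverse_probit(eta), 3),
          round(inverse_cloglog(eta), 3))
```

Logit and probit are both symmetric about p = 0.5 at eta = 0; cloglog is not (it gives about 0.632 at eta = 0), which is why it is sometimes preferred when the event probability approaches 0 and 1 at different rates.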

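AIC and SBC trade goodness of fit against model size. Using the usual definitions (as in the SAS LOGISTIC procedure), AIC = -2 log L + 2k and SBC = -2 log L + k ln(n), where log L is the maximized log-likelihood, k the number of estimated parameters including the intercept, and n the number of observations. A sketch that computes both for a binary target; the log-likelihood values in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Binary log-likelihood of 0/1 targets y under predicted event probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(loglik: float, n_params: int) -> float:
    """Akaike's Information Criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params


def sbc(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz's Bayesian Criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical comparison: a 3-parameter model vs a 6-parameter model on 700 rows.
small = {"aic": aic(-120.0, 3), "sbc": sbc(-120.0, 3, 700)}
large = {"aic": aic(-118.0, 6), "sbc": sbc(-118.0, 6, 700)}
print(small, large)
```

Here the larger model fits slightly better (higher log L) but loses on both criteria. Because ln(n) exceeds 2 once n is 8 or more, SBC charges more per extra parameter than AIC and tends to select smaller models.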
Selection Method: Number of Variables

Start
- number of effects to use in the first model
- the list of candidate effects can be seen in the Tools > Model Ordering window
- the first n effects in that ordering will be selected in the first model

Stop
- Forward method: maximum number of effects to appear in the final model
- Backward method: minimum number of effects to appear in the final model
- the effect selection method may terminate for other reasons before the Stop criterion is applied

Force
- forces specific effects into the final model
- set the Force number and arrange the effects in the Tools > Model Ordering window

Initialization Tab
You can set one of the following options in the Initialization tab:
- Default: do not use initial parameter estimates
- Current estimates: use the current parameter estimates from an initial run of the Regression node as starting values
- Selected data set: specify a data set that contains starting values for the parameter estimates

Advanced Tab
- sets the optimization method, iteration controls, and convergence criteria
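The Start, Stop, and Force settings can be pictured with a simplified forward selection. In the hypothetical sketch below, the first `max(start, force)` effects in the model ordering begin in the model (forced effects are never dropped, since forward selection only adds), each step greedily adds the candidate that gives the lowest score, stepping ends at `stop` effects, and the best-scoring model seen at any step is returned, which mirrors the role of the selection criterion. This criterion-driven stepping rule is swapped in for illustration only; the Regression node decides each step with its own significance-based entry rules, and `score` stands in for any smaller-is-better criterion such as AIC or SBC.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence


def forward_select(
    ordered_effects: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    start: int = 0,
    stop: Optional[int] = None,
    force: int = 0,
) -> List[str]:
    """Greedy forward selection; returns the best-scoring model seen at any step."""
    # Start / Force: the first max(start, force) effects in the ordering begin in the model.
    current = list(ordered_effects[: max(start, force)])
    # Stop: maximum number of effects allowed in the model.
    limit = len(ordered_effects) if stop is None else stop
    best_model, best_score = list(current), score(frozenset(current))
    while len(current) < limit:
        candidates = [e for e in ordered_effects if e not in current]
        if not candidates:
            break
        # Add the candidate whose inclusion gives the lowest score.
        step_score, step_effect = min(
            (score(frozenset(current + [e])), e) for e in candidates
        )
        current.append(step_effect)
        if step_score < best_score:
            best_model, best_score = list(current), step_score
    return best_model


# Hypothetical "usefulness" of each effect, purely for illustration.
GAIN = {"A": 10.0, "B": 4.0, "C": 0.5, "A*B": 3.0}


def toy_score(model: FrozenSet[str]) -> float:
    """Made-up criterion: baseline minus gains, plus a penalty of 2 per effect."""
    return 100.0 - sum(GAIN[e] for e in model) + 2.0 * len(model)


print(forward_select(["A", "B", "C", "A*B"], toy_score))
print(forward_select(["A", "B", "C", "A*B"], toy_score, stop=2))
print(forward_select(["C", "A", "B", "A*B"], toy_score, force=1))
```

With the toy score, C adds less than its penalty, so it is left out unless it is forced in by placing it first in the ordering and setting `force=1`.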

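The Advanced tab's iteration controls and convergence criterion govern an iterative fit like the one sketched below: plain Newton-Raphson for a binary logistic model (no line search or ridging), stopping when the largest parameter change falls below `tol` or after `max_iter` iterations. The function names, the default of 50 iterations (borrowed from the Newton-Raphson rows of the optimization table), and the parameter-change convergence test are assumptions for illustration; the convergence criteria SAS actually applies are documented with the NLP procedure.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(eta: float) -> float:
    """Inverse logit, arranged to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic_newton(
    X: Sequence[Sequence[float]], y: Sequence[int], max_iter: int = 50, tol: float = 1e-8
) -> Tuple[List[float], int, bool]:
    """Return (estimates, iterations used, converged?). Rows of X start with 1.0."""
    k = len(X[0])
    beta = [0.0] * k
    for iteration in range(1, max_iter + 1):
        grad = [0.0] * k                       # X'(y - mu): gradient of log L
        info = [[0.0] * k for _ in range(k)]   # X'WX with W = diag(mu * (1 - mu))
        for xi, yi in zip(X, y):
            mu = _sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(k):
                grad[j] += (yi - mu) * xi[j]
                for c in range(k):
                    info[j][c] += w * xi[j] * xi[c]
        step = _solve(info, grad)              # Newton step: (X'WX)^-1 X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, iteration, True
    return beta, max_iter, False


# Made-up 2x2 data: 1 event in 4 when x = 0, 3 events in 4 when x = 1.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [1, 0, 0, 0, 1, 1, 1, 0]
beta, iterations, converged = fit_logistic_newton(X, y)
print(beta, iterations, converged)
```

For this saturated 2x2 table the maximum-likelihood estimates are known in closed form (intercept log(1/3), slope 2 log 3), which makes it a convenient check that the iteration converged to the right place.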
Optimization Method               Max Iterations   Max Function Calls   No. of variables (n)
Conjugate Gradient                400              1000                 n > 400
Double Dogleg                     200              500                  40 < n < 400
Newton-Raphson with Line Search   50               125                  n < 40
Newton-Raphson with Ridging       50               125                  n < 40
Quasi-Newton                      200              500                  40 < n < 400
Trust-Region                      50               125                  n < 40

Note: To learn about these optimization methods, see SAS/OR Technical Report: The NLP Procedure.

Running the Regression Node

Regression Results Browser
The Regression node results help you interpret the regression analysis of your data. The browser provides a graphic display of the parameter estimates, the fit statistics, and a full listing of the regression output, log, and code.

Estimates Tab
- T-scores: the larger the absolute value, the stronger the statistical evidence that the effect is related to the target

Plot Tab
- The taller the bars where the predicted (the into variable) and the actual (the from variable) target values agree, the more useful the model
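The T-score shown in the Estimates tab is, as commonly defined, a parameter estimate divided by its standard error: its sign gives the direction of the effect and its absolute size shows how clearly the estimate differs from zero. A minimal sketch that ranks effects by |t|; the effect names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def t_scores(estimates: Dict[str, float], std_errors: Dict[str, float]) -> Dict[str, float]:
    """t = estimate / standard error, for each effect."""
    return {name: estimates[name] / std_errors[name] for name in estimates}


def rank_by_strength(t: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort effects by absolute t-score, strongest evidence first."""
    return sorted(t.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical estimates and standard errors, for illustration only.
est = {"X1": 0.60, "X2": -0.03, "X3": -0.0001}
se = {"X1": 0.08, "X2": 0.01, "X3": 0.00004}
t = t_scores(est, se)
for name, score in rank_by_strength(t):
    print(name, round(score, 2))
```

Under the usual large-sample normal approximation, |t| above about 1.96 corresponds to significance at the 5% level. Note that X3 has a tiny coefficient but a sizeable |t|: the T-score measures how reliably an effect differs from zero, not how large it is.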

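The bars in the Plot tab summarize a cross-tabulation of actual (from) against predicted (into) target levels. The sketch below builds that table from two lists and measures agreement as the share of observations on the diagonal; the function names and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def classification_table(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count observations for each (from, into) pair of target levels."""
    return Counter(zip(actual, predicted))


def agreement_rate(table: Counter) -> float:
    """Share of observations whose predicted level matches the actual level."""
    total = sum(table.values())
    agree = sum(n for (frm, into), n in table.items() if frm == into)
    return agree / total


actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good"]
table = classification_table(actual, predicted)
print(dict(table), agreement_rate(table))
```

Tall bars for the matching pairs (good/good, bad/bad) and short bars for the mismatched pairs indicate a useful model.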
Statistics Tab
- fit statistics, in alphabetical order, for the training data, validation data, and test data analyzed with the regression model
- the fit statistics show how good the trained model is under different assessment measures

To learn about these statistics, read either the LOGISTIC procedure or the REG procedure documentation in the SAS/STAT User's Guide, Version 6, Volume 2.
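Two fit statistics commonly reported for a binary target are the average squared error and the misclassification rate. The sketch below computes them from 0/1 targets and predicted event probabilities. The average squared error is computed here simply as the mean of (y - p)^2, the 0.5 cut-off for classifying an observation as an event is an assumption, and the data are made up.

```python
from typing import Sequence


def average_squared_error(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean of (y - p)^2 over all observations."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def misclassification_rate(y: Sequence[int], p: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of observations whose class at the given cut-off differs from y."""
    wrong = sum(1 for yi, pi in zip(y, p) if (1 if pi >= cutoff else 0) != yi)
    return wrong / len(y)


y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.1]
print(average_squared_error(y, p), misclassification_rate(y, p))
```

Comparing these values across the training and validation columns of the Statistics tab is a quick check for overfitting: a model that does much better on training than on validation data is unlikely to generalize.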