Cool Tools for PROC LOGISTIC




Paul D. Allison
Statistical Horizons LLC and the University of Pennsylvania
March 2013
www.statisticalhorizons.com

New Features in LOGISTIC

- ODDSRATIO statement
- EFFECTPLOT statement
- ROC comparisons
- FIRTH option

Based on Recent Book

Also covered in 2-day course

Logistic Regression Using SAS
June 6-7, Philadelphia, Temple University Center City

Other SAS courses offered by Statistical Horizons:
- Introduction to Structural Equation Modeling, Paul Allison, Instructor, April 12-13, Washington, DC
- Longitudinal Data Analysis Using SAS, Paul Allison, Instructor, April 19-20, Philadelphia
- Missing Data, Paul Allison, Instructor, May 17-18, Los Angeles
- Mediation and Moderation, Andrew Hayes & Kristopher Preacher, July 15-19, Philadelphia
- Event History & Survival Analysis, Paul Allison, Instructor, July 15-19, Philadelphia

Get more info at StatisticalHorizons.com

Example Data Set

Data: 5960 loan customers

    BAD       1=customer defaults, otherwise 0 (dependent variable)
    LOAN      Amount of the loan
    DEBTCON   1=debt consolidation, 0=home improvement
    DELINQ    Number of delinquent trade lines
    NINQ      Number of recent credit inquiries
    DEBTINC   Debt to income ratio

ODDSRATIO Statement

Good for interactions. Fit the following model:

    /* DESC makes the procedure model the probability that bad=1 */
    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq;
    RUN;

Coefficients

Analysis of Maximum Likelihood Estimates

    Parameter       DF   Estimate    Standard Error   Wald Chi-Square   Pr > ChiSq
    Intercept        1   -5.8119     0.3434           286.4367          <.0001
    loan             1   -0.00001    5.937E-6         3.6289            0.0568
    debtcon          1    0.0831     0.1599           0.2705            0.6030
    delinq           1    0.6480     0.0509           161.8830          <.0001
    ninq             1    0.3475     0.0742           21.9398           <.0001
    debtinc          1    0.0873     0.00848          105.8757          <.0001
    debtcon*ninq     1   -0.2221     0.0822           7.3083            0.0069
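As a reading aid for the table above, the Wald chi-square and the odds ratio for a one-unit change both follow from the estimate and its standard error. The symbols \hat\beta_j and \widehat{SE} are notation introduced here and do not appear on the slides. The worked numbers use the DELINQ row, and the small gap from the printed chi-square comes from rounding in the printed estimate and standard error. The odds-ratio formula applies only to variables that are not part of an interaction.

```latex
\[
\chi^2_{\text{Wald}} = \left(\frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}\right)^{2},
\qquad
\widehat{\mathrm{OR}}_j = \exp\!\big(\hat\beta_j\big)
\]
\[
\text{DELINQ: }\quad
\left(\frac{0.6480}{0.0509}\right)^{2} \approx 162,
\qquad
\exp(0.6480) \approx 1.912
\]
```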

Odds Ratios

Odds Ratio Estimates

    Effect     Point Estimate   95% Wald Confidence Limits
    LOAN       1.000            1.000    1.000
    DELINQ     1.912            1.730    2.112
    DEBTINC    1.091            1.073    1.109

Odds ratios not reported for the variables in the interaction. But now we can get them:

    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq;
      ODDSRATIO ninq / AT(debtcon=0 1);
      ODDSRATIO debtcon / AT(ninq=0 1 2 3 5 10 15);
    RUN;

Odds Ratios for Interactions

Odds Ratio Estimates and Wald Confidence Intervals

    Label                 Estimate   95% Confidence Limits
    NINQ at debtcon=0     1.484      1.303    1.690
    NINQ at debtcon=1     1.136      1.061    1.218
    debtcon at NINQ=0     1.106      0.813    1.506
    debtcon at NINQ=1     0.847      0.660    1.088
    debtcon at NINQ=2     0.649      0.495    0.851
    debtcon at NINQ=3     0.497      0.348    0.711
    debtcon at NINQ=5     0.292      0.159    0.535
    debtcon at NINQ=10    0.077      0.021    0.287
    debtcon at NINQ=15    0.020      0.003    0.157
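The pattern in the table above can be read off the model equation. The sketch below uses subscripted \beta symbols that are notation assumed here, not taken from the slides. With a product term in the model, the odds ratio for one variable depends on the value at which the other is held, which is what the AT option specifies.

```latex
\[
\operatorname{logit}(p) = \beta_0 + \dots
  + \beta_{d}\,\mathrm{debtcon} + \beta_{n}\,\mathrm{ninq}
  + \beta_{dn}\,\mathrm{debtcon}\cdot\mathrm{ninq}
\]
\[
\mathrm{OR}_{\mathrm{ninq}\,\mid\,\mathrm{debtcon}=d} = \exp(\beta_{n} + \beta_{dn}\,d),
\qquad
\mathrm{OR}_{\mathrm{debtcon}\,\mid\,\mathrm{ninq}=k} = \exp(\beta_{d} + \beta_{dn}\,k)
\]
```

One consequence is that each one-unit step in the conditioning variable multiplies the odds ratio by the same factor, exp(\beta_{dn}). The printed estimates behave this way: 1.136/1.484, 0.847/1.106, and 0.649/0.847 are all approximately 0.766.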

ODS Graphics

EFFECTPLOT Statement

The EFFECTPLOT statement offers many possibilities for plotting predicted values as a function of one or more predictors. Here's how to get a graph for one predictor, holding the others at their means (or reference category for CLASS variables).

    ODS GRAPHICS ON;
    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc;
      EFFECTPLOT FIT(X=debtinc);
    RUN;
    ODS GRAPHICS OFF;


Polynomial Functions

Things get more interesting with polynomial functions:

    ODS GRAPHICS ON;
    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc debtinc*debtinc;
      EFFECTPLOT FIT(X=debtinc);
    RUN;
    ODS GRAPHICS OFF;

The squared term is highly significant.
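To see why a squared term changes the shape of the fitted curve, it may help to write out the part of the linear predictor that involves the debt-to-income ratio. This is a generic sketch: x stands for debtinc, and \beta_1 and \beta_2 are assumed symbols for the linear and squared coefficients, whose estimates are not shown on the slides.

```latex
\[
\operatorname{logit}(p) = \dots + \beta_1 x + \beta_2 x^{2},
\qquad
\frac{\partial \operatorname{logit}(p)}{\partial x} = \beta_1 + 2\beta_2 x
\]
\[
x^{*} = -\frac{\beta_1}{2\beta_2}
\quad \text{(turning point of the log-odds, if it lies within the observed range of } x\text{)}
\]
```

Because the slope of the log-odds now depends on x, the effect of debtinc is no longer summarized by a single odds ratio, which is one reason a plot is useful here.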


EFFECTPLOT with Interactions

EFFECTPLOT is also very useful for visualizing interactions:

    ODS GRAPHICS ON;
    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq;
      EFFECTPLOT FIT(X=ninq) / AT(debtcon=0 1);
    RUN;
    ODS GRAPHICS OFF;


EFFECTPLOT with Interactions

If you want the two graphs on the same axes, use the SLICEFIT option:

    ODS GRAPHICS ON;
    PROC LOGISTIC DATA=my.credit DESC;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq;
      EFFECTPLOT SLICEFIT(X=ninq SLICEBY=debtcon=0 1);
    RUN;
    ODS GRAPHICS OFF;


ROC Curves

The Receiver Operating Characteristic curve is a way of evaluating the predictive power of a model for a binary outcome. It's a graph of sensitivity vs. 1-specificity.

- Sensitivity = probability of predicting an event, given that the individual has an event.
- Specificity = probability of predicting a non-event, given that the individual does not have an event.

Both of these depend on the cut-point for determining whether a predicted probability is evaluated as an event prediction or a non-event prediction. Here's how to get the curve:

    PROC LOGISTIC DATA=my.credit PLOTS(ONLY)=ROC;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq debtinc*debtinc;
    RUN;
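The dependence on the cut-point can be made explicit. In the sketch below, \hat p is the predicted probability of an event, y is the observed outcome, and c is the cut-point; these symbols are notation assumed here rather than taken from the slides.

```latex
\[
\mathrm{Sens}(c) = P\big(\hat p \ge c \mid y = 1\big),
\qquad
\mathrm{Spec}(c) = P\big(\hat p < c \mid y = 0\big)
\]
\[
\text{ROC curve} = \big\{\, \big(1 - \mathrm{Spec}(c),\ \mathrm{Sens}(c)\big) : 0 \le c \le 1 \,\big\}
\]
```

Lowering c classifies more cases as events, so sensitivity rises while specificity falls; the curve traces that trade-off across all cut-points.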

The area under the curve (the C statistic) is a summary measure of predictive power.

ROCCONTRAST

With the ROC and ROCCONTRAST statements, we can get confidence intervals around the C statistic and compare different curves:

    PROC LOGISTIC DATA=my.credit;
      MODEL bad = loan debtcon delinq ninq debtinc debtcon*ninq debtinc*debtinc;
      ROC 'omit debtcon*ninq' loan debtcon delinq ninq debtinc*debtinc;
      ROC 'omit debtinc*debtinc' loan delinq ninq debtinc debtcon*ninq;
      ROC 'omit both' loan debtcon delinq ninq debtinc;
      ROCCONTRAST / ESTIMATE=ALLPAIRS;
    RUN;

ROC Results 1

ROC Association Statistics

    ROC Model               Area (Mann-Whitney)   Standard Error   95% Wald Confidence Limits   Somers' D (Gini)   Gamma    Tau-a
    Model                   0.7568                0.0152           0.7270    0.7865             0.5136             0.5136   0.0842
    omit debtcon*ninq       0.7337                0.0156           0.7032    0.7643             0.4675             0.4675   0.0767
    omit debtinc*debtinc    0.7286                0.0158           0.6976    0.7596             0.4571             0.4571   0.0750
    omit both               0.7233                0.0160           0.6920    0.7546             0.4466             0.4466   0.0732
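The Somers' D (Gini) column in the table above carries the same information as the area under the curve, since one is a linear rescaling of the other. The check below uses the printed values for the full model and for the model that omits both terms; AUC is shorthand introduced here for the Area column.

```latex
\[
D = 2\,\mathrm{AUC} - 1
\]
\[
\text{Model: } 2(0.7568) - 1 = 0.5136,
\qquad
\text{omit both: } 2(0.7233) - 1 = 0.4466
\]
```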

ROC Results 2

ROC Contrast Test Results

    Contrast             DF   Chi-Square   Pr > ChiSq
    Reference = Model     3   24.4103      <.0001

ROC Contrast Estimation and Testing Results by Row

    Contrast                                    Estimate   Standard Error   95% Wald Confidence Limits   Chi-Square   Pr > ChiSq
    Model - omit debtcon*ninq                   0.0230     0.00774           0.00785     0.0382           8.8401       0.0029
    Model - omit debtinc*debtinc                0.0282     0.00866           0.0112      0.0452           10.6136      0.0011
    Model - omit both                           0.0335     0.00914           0.0155      0.0514           13.3997      0.0003
    omit debtcon*ninq - omit debtinc*debtinc    0.00519    0.00325          -0.00117     0.0116           2.5581       0.1097
    omit debtcon*ninq - omit both               0.0104     0.00233           0.00585     0.0150           19.9831      <.0001
    omit debtinc*debtinc - omit both            0.00524    0.00233           0.000666    0.00981          5.0423       0.0247

ROC Results

FIRTH Option

A solution to the problem of quasi-complete separation: failure of the ML algorithm to converge because some coefficients are infinite. Example:

    PROC LOGISTIC DATA=my.credit;
      CLASS derog / PARAM=GLM DESC;
      MODEL bad = derog;
    RUN;

DEROG is the number of derogatory reports. This code produces the following in both the log and output windows.

    WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
    WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Odds Ratio Estimates

    Effect           Point Estimate   95% Wald Confidence Limits
    DEROG 10 vs 0    <0.001           <0.001    >999.999
    DEROG 9 vs 0     <0.001           <0.001    >999.999
    DEROG 8 vs 0     <0.001           <0.001    >999.999
    DEROG 7 vs 0     <0.001           <0.001    >999.999
    DEROG 6 vs 0     0.100            0.034     0.293
    DEROG 5 vs 0     0.228            0.083     0.632
    DEROG 4 vs 0     0.056            0.021     0.150
    DEROG 3 vs 0     0.070            0.039     0.126
    DEROG 2 vs 0     0.190            0.138     0.262
    DEROG 1 vs 0     0.315            0.255     0.387

Table of BAD by DEROG (frequencies)

    BAD      DEROG: 0      1     2    3    4    5    6    7    8    9   10   Total
    0            3773    266    78   15    5    8    5    0    0    0    0    4150
    1             754    169    82   43   18    7   10    8    6    3    2    1102
    Total        4527    435   160   58   23   15   15    8    6    3    2    5252

Why does this happen? Because for values of DEROG > 6, every individual had BAD=1. Quasi-complete separation occurs when, for one or more categories of a CLASS variable, either everyone has the event or no one has the event.
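A cross-tabulation like the one above can be requested with PROC FREQ before fitting the model. This is a minimal sketch, assuming the same my.credit data set and variable names used on the earlier slides; the NOROW, NOCOL, and NOPERCENT options only suppress percentages so that zero cells are easier to spot.

```sas
/* Cross-tabulate the outcome against the CLASS predictor to look for
   categories in which every case (or no case) has the event. */
PROC FREQ DATA=my.credit;
  TABLES bad*derog / NOROW NOCOL NOPERCENT;
RUN;
```

Any column with a zero in one of the BAD rows marks a category whose coefficient cannot be estimated by ordinary maximum likelihood.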

Penalized Likelihood

An effective solution is to invoke penalized likelihood by the FIRTH option:

    PROC LOGISTIC DATA=my.credit;
      CLASS derog / PARAM=GLM DESC;
      MODEL bad = derog / FIRTH CLODDS=PL;
    RUN;

The CLODDS=PL option requests confidence intervals for the odds ratios based on the profile likelihood method.
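As background that is not stated on the slide, the penalty generally associated with Firth's method adds half the log determinant of the information matrix to the log-likelihood. In the sketch below, \ell is the ordinary log-likelihood and I(\beta) is the Fisher information matrix; both symbols are notation assumed here.

```latex
\[
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\big|\,I(\beta)\,\big|
\]
```

Maximizing \ell^{*} instead of \ell keeps the estimates finite even when a category has no events or no non-events, which is why the odds ratios for DEROG 7 through 10 can be estimated under this approach.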

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

    Effect           Unit     Estimate   95% Confidence Limits
    DEROG 10 vs 0    1.0000   0.040      <0.001    0.492
    DEROG 9 vs 0     1.0000   0.029      <0.001    0.295
    DEROG 8 vs 0     1.0000   0.015      <0.001    0.130
    DEROG 7 vs 0     1.0000   0.012      <0.001    0.094
    DEROG 6 vs 0     1.0000   0.105      0.034     0.285
    DEROG 5 vs 0     1.0000   0.227      0.084     0.625
    DEROG 4 vs 0     1.0000   0.059      0.021     0.145
    DEROG 3 vs 0     1.0000   0.071      0.038     0.125
    DEROG 2 vs 0     1.0000   0.190      0.138     0.262
    DEROG 1 vs 0     1.0000   0.314      0.256     0.387