# PROC LOGISTIC: Traps for the unwary

Peter L. Flom, Independent statistical consultant, New York, NY


## ABSTRACT

Keywords: Logistic.

## INTRODUCTION

This paper covers some gotchas in SAS® PROC LOGISTIC. A gotcha is a mistake that isn't obviously a mistake: the program runs, there may be a note or a warning, but no errors. Output appears. But it's the wrong output. This is not a primer on PROC LOGISTIC, much less on logistic regression. There are many good books on logistic regression; one such is Hosmer and Lemeshow [2000].

Each section has several subsections. First, I identify the gotcha. Then I give an example. Third, I show what evidence you have that it occurs. Fourth, I show how to fix it, in some cases referring to other resources. In some cases, I offer an explanation between the evidence and the solution.

## GOTCHA #1: CODING 0 AND 1

PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, these are coded 0 and 1, with 0 for "no" or the equivalent, and 1 for "yes" or the equivalent. In this case, we are usually interested in modeling the probability of a "yes". However, by default, SAS models the probability of a 0 (which would be a "no").

For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected, and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, coded as 1 for female and 0 for male. So:

```sas
data today;
  input disease female weight;
  /* the data lines did not survive in this transcription */
  datalines;
;;;;
run;
```

We then run PROC LOGISTIC:

```sas
proc logistic data=today;
  model disease = female;
run;
```

and get, among other output, an odds ratio estimate of 1.39 for female, while it's clear that men are much more likely to be infected.

The evidence that this is happening is one line in the output:

```
Probability modeled is disease=0
```

and several lines in the log:

```
NOTE: PROC LOGISTIC is modeling the probability that disease=0.
One way to change this to model the probability that disease=1
is to specify the response variable option EVENT='1'.
```

There are several solutions. The simplest is not the one mentioned in the log, but rather the DESCENDING option.
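A short derivation, added here as a sketch, shows how the two modeling directions relate. The symbols are notation introduced for this illustration only: $p$ is the probability that disease=1 for a given sex, and $\beta$ is the coefficient of female when disease=1 is modeled.

```latex
\operatorname{logit}(1-p) \;=\; \log\frac{1-p}{p} \;=\; -\operatorname{logit}(p)
\quad\Longrightarrow\quad
\mathrm{OR}_{\text{disease}=0} \;=\; e^{-\beta} \;=\; \frac{1}{\mathrm{OR}_{\text{disease}=1}}
```

Under this relation, the odds ratio of 1.39 reported for female when disease=0 is modeled corresponds to roughly 1/1.39 ≈ 0.72 when disease=1 is modeled. Either of the fixes that follow makes SAS report the second quantity directly.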

The DESCENDING option goes on the PROC LOGISTIC statement:

```sas
proc logistic data=today descending;
  model disease = female;
run;
```

Another method is the one mentioned in the log, which is more general:

```sas
proc logistic data=today;
  model disease(event='1') = female;
run;
```

## GOTCHA #2: EFFECT CODING AND CLASS VARIABLES

When you have a categorical independent variable with more than 2 levels, you need to define it with a CLASS statement. In PROC GLM the default coding for this is dummy coding. In PROC LOGISTIC, it's effect coding. To me, effect coding is quite unnatural.

### EXAMPLE

Continuing with the same example of modeling probability of infection, suppose you now have race/ethnicity as an IV, with 6 categories, as defined by the Census Bureau: White, Black/African American, Hispanic/Latino, American Indian/Alaskan Native, Native Hawaiian or other Pacific Islander, and Asian. We run:

```sas
proc logistic data=today2;
  class race;
  model disease(event='1') = race;
run;
```

and get parameter estimates that include (among much else) the following rows (the numeric values are missing from this transcription):

```
Parameter      DF    Estimate
Intercept
race AIAN
race AfrA
race Asian
race Lat
race NHPI
```

and OR estimates:

```
Effect                 Point Estimate
race AIAN  vs White
race AfrA  vs White
race Asian vs White
race Lat   vs White
race NHPI  vs White
```

But we know that the OR estimate should be $e^{\text{estimate}}$, and, for example, $e^{-.06} = 0.94$, not 1.

The reason is the design matrix. With the default, the design matrix looks like this:

```
race     AIAN  AfrA  Asian  Lat  NHPI
AIAN        1     0      0    0     0
AfrA        0     1      0    0     0
Asian       0     0      1    0     0
Lat         0     0      0    1     0
NHPI        0     0      0    0     1
White      -1    -1     -1   -1    -1
```

and each parameter estimates the difference between that level and the average of all the levels. On the other hand, with dummy (or reference) coding, it looks like

```
race     AIAN  AfrA  Asian  Lat  NHPI
AIAN        1     0      0    0     0
AfrA        0     1      0    0     0
Asian       0     0      1    0     0
Lat         0     0      0    1     0
NHPI        0     0      0    0     1
White       0     0      0    0     0
```

and each parameter estimates the difference between that level and the reference group.

Use the PARAM=REFERENCE option on the CLASS statement:

```sas
proc logistic data=today2;
  class race / param=ref;
  model disease(event='1') = race;
run;
```

For more information (and other possible parameterizations) see the SAS documentation for PROC LOGISTIC, in particular the section "CLASS Variable Parameterization" in DETAILS.

## GOTCHA #3: QUASI-COMPLETE AND COMPLETE SEPARATION

If you picture the data as a 2x2 crosstab, then quasi-complete separation occurs when one of the cells is 0. Complete separation occurs when one cell in each row and column is 0.

### EXAMPLE: QUASI-COMPLETE SEPARATION

An example of quasi-complete separation is:

```sas
data today7a;
  input x $ y $;
  datalines;
A C
A C
A C
A C
A C
B C
B C
B C
B C
B C
B C
B C
B C
B C
B C
B D
B D
B D
B D
B D
;;;;
run;
```

The evidence varies. Confidence intervals will be extremely wide. But sometimes there is a warning that there was complete or quasi-complete separation, sometimes a note. Sometimes, you get a warning that convergence was not attained. With the WEIGHT statement you sometimes get a note that observations with nonpositive frequencies or weights were excluded. For example, run the data step creating today7a, and then this PROC LOGISTIC step:

```sas
proc logistic data=today7a;
  class x;
  model y = x;
run;
```
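The 2x2 picture used to define separation can also be checked directly before modeling. The step below is a minimal sketch, not part of the original paper, and it assumes today7a has been created with the data step shown in this section.

```sas
/* Cross-tabulate the predictor against the response to look for empty cells. */
/* NOROW, NOCOL and NOPERCENT suppress percentages so only counts are shown.  */
proc freq data=today7a;
  tables x*y / norow nocol nopercent;
run;
```

In the resulting table the cell for x=A, y=D has a count of 0 while the other three cells are nonzero, which is the quasi-complete separation pattern. Back to the PROC LOGISTIC step on today7a: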

Running this PROC LOGISTIC step on today7a gives a warning in the log. But if you run this data step:

```sas
data today7;
  input x $ y $ weight;
  datalines;
A C 5
A D 0
B C 10
B D 5
;;;;
run;
```

and then

```sas
proc logistic data=today7;
  class x;
  model y = x;
  weight weight;
run;
```

you get a note in the log.

Sometimes, you can delete the offending variable. In the example, there was only one independent variable, but usually, there will be more, and one of them will be the problematic one. Alternatively, if there are multiple categories, it may be sensible to combine some of them. A third possibility is to leave the offending variable in the equation, and simply report the results for the other variables; these are still correct. You can then report the coefficients for the offending variables as inf. Finally, you can use exact inference with the EXACT statement.

### REFERENCES

Paul Allison's paper at SGF 2008 has a great deal of lucid explanation of this problem [Allison, 2008], although his statement that PROC LOGISTIC gives clear diagnostic messages is, as we have seen, not always true.

## GOTCHA #4: CONCORDANT AND DISCORDANT

Part of the default output from PROC LOGISTIC is a table that has entries including percent concordant and percent discordant. To me, this implies the percent that would correctly be assigned, based on the results of the logistic regression. But that is not what it is. It looks at all possible pairs of observations with different observed responses. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y; here, X and Y are the predicted value and the actual value.

### EXAMPLE

For our first example, the output looks like this (the values of the four statistics in the right-hand column are missing from this transcription):

```
Association of Predicted Probabilities and Observed Responses

Percent Concordant    25.0    Somers' D
Percent Discordant    25.0    Gamma
Percent Tied          50.0    Tau-a
Pairs                    4    c
```

It is hard to find documentation of this. I couldn't find it explained in the LOGISTIC documentation. I found a mention of concordant and discordant in the FREQ documentation, but it was not clear what X and Y were, until I searched SAS-L and found an explanation from David Cassell.

For what I was thinking of, you need the CTABLE option on the MODEL statement, which gives the proportion correctly classified, the sensitivity, the specificity, and other measures for each of a number of cutpoints of the predicted probability level. By default, it gives probability levels from 0 to 1 at intervals of .02, but if you just want a few, you can get them with PPROB=:

```sas
proc logistic data=today3;
  class race sex / param=ref;
  /* the PPROB= cutpoints inside the parentheses were lost in this transcription */
  model disease(event='1') = race sex / ctable pprob=( );
run;
```
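Returning briefly to the association table shown earlier in this section, the summary statistics can be related to the pair counts. The following is a sketch that assumes the definitions given in current SAS/STAT documentation for that table; $n_c$, $n_d$ and $t$ (concordant pairs, discordant pairs, and total pairs) are notation introduced here.

```latex
c \;=\; \frac{n_c + \tfrac{1}{2}\,(t - n_c - n_d)}{t},
\qquad
\text{Somers' } D \;=\; \frac{n_c - n_d}{t},
\qquad
\gamma \;=\; \frac{n_c - n_d}{n_c + n_d}
```

With 4 pairs, of which 25% are concordant, 25% discordant and 50% tied, we have $n_c = 1$, $n_d = 1$ and $t = 4$, so these formulas would give $c = (1 + 1)/4 = 0.5$ and Somers' $D$ = Gamma = 0. In other words, the model orders the pairs no better than chance, which is a different question from how many observations CTABLE classifies correctly.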

The CTABLE request yields a classification table with these columns (the rows of values are missing from this transcription):

```
           Correct        Incorrect                 Percentages
 Prob           Non-           Non-            Sensi-  Speci-  False  False
Level  Event   Event  Event   Event  Correct  tivity  ficity    POS    NEG
```

## GOTCHA #5: INTERACTIONS

When you add interactions to a logistic model, no odds ratios are printed.

### EXAMPLE

```sas
data today3;
  input disease race $ sex $ weight;
  /* Weights shown as ... were lost in this transcription; the disease */
  /* codes on those lines are inferred from the 0, 0, 1, 1 pattern.    */
  datalines;
0 White F ...
0 White M ...
1 White F ...
1 White M ...
0 AfrA F ...
0 AfrA M ...
1 AfrA F 20
1 AfrA M 28
0 Lat F ...
0 Lat M 90
1 Lat F 75
1 Lat M 30
0 AIAN F 25
0 AIAN M 15
1 AIAN F 10
1 AIAN M 8
0 Asian F ...
0 Asian M ...
1 Asian F 50
1 Asian M 40
0 NHPI F 10
0 NHPI M 8
1 NHPI F 5
1 NHPI M 3
;;;;
run;
```

and then

```sas
proc logistic data=today3;
  class race sex / param=ref;
  model disease(event='1') = race sex race*sex;
run;
```

There are no odds ratios printed.

### EXPLANATION

The reason no odds ratios are printed is a bit complex. Let's take a simpler case, with two independent variables, each with two levels, and, for simplicity, equal cell sizes. If the dependent variable is continuous, then we might get something like this (with DV means in each cell; the cell values are missing from this transcription):

```
          Male   Female   Mean
White
Other
Average
```

Now, we can see that White males are lower than White females, but Black males are higher than Black females. The average effect of being male is to add 5, and the average effect of being White is to subtract 10. If there were no interaction, the main effects would give

|       | Male             | Female           |
|-------|------------------|------------------|
| White | 60 − 10 + 5 = 55 | 60 − 10 − 5 = 45 |
| Other | 60 + 10 + 5 = 75 | 60 + 10 − 5 = 65 |

But if we put proportions in the table and estimate odds ratios, things are trickier. Start with proportions:

|         | Male | Female | Mean |
|---------|------|--------|------|
| White   | 0.40 | 0.60   | 0.50 |
| Black   | 0.90 | 0.50   | 0.70 |
| Average | 0.65 | 0.55   | 0.60 |

and it still works just like means. But change to odds:

|         | Male    | Female  | Mean   |
|---------|---------|---------|--------|
| White   | 2 to 3  | 3 to 2  | 1 to 1 |
| Black   | 9 to 1  | 1 to 1  | 7 to 3 |
| Average | 13 to 7 | 11 to 9 | 3 to 2 |

or, dividing through:

|         | Male | Female | Mean |
|---------|------|--------|------|
| White   | 0.67 | 1.50   | 1.00 |
| Black   | 9.00 | 1.00   | 2.33 |
| Average | 1.86 | 1.22   | 1.50 |

Notice that the average odds is not the average of the odds; e.g., for the White row, (0.67 + 1.50)/2 = 1.08, not 1.00. What would the main effects model predict?

Some people suggest using the CONTRAST statement, but I (and many others) find these confusing and prone to error for interactions. I wasn't able to find a good reference for this method, that is, one that explained clearly how to do it. But SAS-L to the rescue! We can get what we want with a two step process.

First, recode the race and sex variables as numbers:

```sas
data today4;
  set today3;
  /* Lat has no branch below, so racenum (and hence racesex) is missing for Lat rows */
  if race = 'White' then racenum = 0;
  else if race = 'AfrA' then racenum = 1;
  else if race = 'AIAN' then racenum = 2;
  else if race = 'Asian' then racenum = 3;
  else if race = 'NHPI' then racenum = 4;
  if sex = 'M' then sexnum = 0;
  else if sex = 'F' then sexnum = 1;
  racesex = 100*racenum + sexnum;
run;
```

Second, write a regular PROC LOGISTIC with the new variable racesex, defining the reference category:

```sas
proc logistic data=today4;
  class racesex (param=ref ref='100');
  model disease(event='1') = racesex;
run;
```

which yields, among other things:

```
                 Odds Ratio Estimates

                        Point          95% Wald
Effect               Estimate      Confidence Limits
racesex 0   vs 100
racesex 1   vs 100
racesex 101 vs 100
racesex 200 vs 100
racesex 201 vs 100
racesex 300 vs 100
racesex 301 vs 100
racesex 400 vs 100
racesex 401 vs 100
```

(the estimates and confidence limits are missing from this transcription), and we just need to remember the coding we used: 100 is African-American, 0 is male, so this is comparing each group to African-American males.

An alternate solution (also gotten through SAS-L) is to use a macro created by Paul Thompson, and presented at SUGI 31: www2.sas.com/proceedings/sugi31/ pdf.

## SUMMARY

PROC LOGISTIC offers many traps for the unwary. In addition to the above, one needs to be careful about Wald vs. likelihood ratio tests (SAS defaults to likelihood ratio tests for the confidence intervals and Wald tests for the parameter estimates). For more on these see Long [1997]. In addition to the gotchas presented here, there are the usual issues of model building, variable selection, model interpretation and so on. For more on these issues see Hosmer and Lemeshow [2000], Long [1997], and Allison [1999, 2008].

## REFERENCES

D. W. Hosmer and S. Lemeshow. Applied logistic regression. John Wiley & Sons, New York, 2nd edition, 2000.

Paul D. Allison. Convergence failures in logistic regression. In SAS Global Forum, number 360, 2008.

J. S. Long. Regression models for categorical and limited dependent variables. Sage, Thousand Oaks, CA, 1997.

P. D. Allison. Logistic regression using the SAS system: Theory and application. SAS Institute, Cary, NC, 1999.

## ACKNOWLEDGMENTS

## CONTACT INFORMATION

Peter L. Flom
515 West End Ave
New York, NY
Phone: (917)
Personal webpage:

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand names and product names are registered trademarks or trademarks of their respective companies.


2003 National Survey of College Graduates Nonresponse Bias Analysis 1 Michael White U.S. Census Bureau, Washington, DC 20233 Abstract The National Survey of College Graduates (NSCG) is a longitudinal survey

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### SPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout

Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Data Cleaning and Missing Data Analysis

Data Cleaning and Missing Data Analysis Dan Merson vagabond@psu.edu India McHale imm120@psu.edu April 13, 2010 Overview Introduction to SACS What do we mean by Data Cleaning and why do we do it? The SACS

### Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

### Paper 45-2010 Evaluation of methods to determine optimal cutpoints for predicting mortgage default Abstract Introduction

Paper 45-2010 Evaluation of methods to determine optimal cutpoints for predicting mortgage default Valentin Todorov, Assurant Specialty Property, Atlanta, GA Doug Thompson, Assurant Health, Milwaukee,

### The Demand for Financial Planning Services 1

The Demand for Financial Planning Services 1 Sherman D. Hanna, Ohio State University Professor, Consumer Sciences Department Ohio State University 1787 Neil Avenue Columbus, OH 43210-1290 Phone: 614-292-4584

### Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign

Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct

### Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

### Home Computers and Internet Use in the United States: August 2000

Home Computers and Internet Use in the United States: August 2000 Special Studies Issued September 2001 P23-207 Defining computer and Internet access All individuals living in a household in which the

### HIV/AIDS in the Binghamton Tri-County Region Revised June 2007

HIV/AIDS in the Binghamton Tri-County Revised June 2007 HIV is the virus that causes AIDS. You can be infected with HIV but not diagnosed with AIDS. The Centers for Disease Control (CDC) estimated that

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### For example, enter the following data in three COLUMNS in a new View window.

Statistics with Statview - 18 Paired t-test A paired t-test compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the

### Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests

Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy

### New SAS Procedures for Analysis of Sample Survey Data

New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many

### Using Names To Check Accuracy of Race and Gender Coding in NAEP

Using Names To Check Accuracy of Race and Gender Coding in NAEP Jennifer Czuprynski Kali, James Bethel, John Burke, David Morganstein, and Sharon Hirabayashi Westat Keywords: Data quality, Coding errors,

### Interactions involving Categorical Predictors

Interactions involving Categorical Predictors Today s Class: To CLASS or not to CLASS: Manual vs. program-created differences among groups Interactions of continuous and categorical predictors Interactions

### James E. Bartlett, II is Assistant Professor, Department of Business Education and Office Administration, Ball State University, Muncie, Indiana.

Organizational Research: Determining Appropriate Sample Size in Survey Research James E. Bartlett, II Joe W. Kotrlik Chadwick C. Higgins The determination of sample size is a common task for many organizational

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Projections of the Size and Composition of the U.S. Population: 2014 to 2060 Population Estimates and Projections

Projections of the Size and Composition of the U.S. Population: to Population Estimates and Projections Current Population Reports By Sandra L. Colby and Jennifer M. Ortman Issued March 15 P25-1143 INTRODUCTION

### Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL

Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations

### Categories of Data. Communication & Journalism as a Broad Field

Graduate Enrollment and Degrees in Communication/Journalism: Benchmarking Data from the Council of Graduate Schools/Graduate Record Examinations Survey Since 1986, the Council of Graduate Schools (CGS)