PROC LOGISTIC: Traps for the unwary
Peter L. Flom, Independent statistical consultant, New York, NY

ABSTRACT

Keywords: Logistic.

INTRODUCTION

This paper covers some gotchas in SAS® PROC LOGISTIC. A gotcha is a mistake that isn't obviously a mistake: the program runs, there may be a note or a warning, but no errors. Output appears. But it's the wrong output. This is not a primer on PROC LOGISTIC, much less on logistic regression. There are many good books on logistic regression; one such is Hosmer and Lemeshow [2000].

Each section has several subsections. First, I identify the gotcha. Then I give an example. Third, I show what evidence you have that it occurs. Fourth, I show how to fix it, in some cases referring to other resources. In some cases, I offer an explanation between the evidence and the solution.

GOTCHA #1 CODING 0 AND 1

PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, these are coded 0 and 1, with 0 for no or the equivalent, and 1 for yes or the equivalent. In this case, we are usually interested in modeling the probability of a yes. However, by default, SAS models the probability of a 0 (which would be a no).

For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected, and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, coded as 1 for female and 0 for male. So:

    data today;
      input disease female weight;
      datalines;
    ;;;;

We then run PROC LOGISTIC:

    proc logistic data=today;
      model disease = female;
    run;

and get, among other output, an odds ratio estimate of 1.39 for female, while it's clear that men are much more likely to be infected.

The evidence that this is happening is one line in the output:

    Probability modeled is disease=0

and several lines in the log:

    NOTE: PROC LOGISTIC is modeling the probability that disease=0.
          One way to change this to model the probability that disease=1
          is to specify the response variable option EVENT='1'.

There are several solutions. The simplest is not the one mentioned in the log, but rather the DESCENDING option:

    proc logistic data=today descending;
      model disease = female;
    run;

Another method is the one mentioned in the log, which is more general:

    proc logistic data=today;
      model disease(event='1') = female;
    run;

GOTCHA #2 EFFECT CODING AND CLASS VARIABLES

When you have a categorical independent variable with more than 2 levels, you need to define it with a CLASS statement. In PROC GLM the default coding for this is dummy coding. In PROC LOGISTIC, it's effect coding. To me, effect coding is quite unnatural.

EXAMPLE

Continuing with the same example of modeling probability of infection, suppose you now have race/ethnicity as an independent variable (IV), with 6 categories, as defined by the Census Bureau: White, Black/African American, Hispanic/Latino, American Indian/Alaskan Native, Native Hawaiian or other Pacific Islander, and Asian. You run

    proc logistic data=today2;
      class race;
      model disease(event='1') = race;
    run;

and get parameter estimates that include (among much else):

    Parameter       DF   Estimate
    Intercept
    race AIAN
    race AfrA
    race Asian
    race Lat
    race NHPI

and odds ratio (OR) estimates:

                            Point
    Effect                  Estimate
    race AIAN  vs White
    race AfrA  vs White
    race Asian vs White
    race Lat   vs White
    race NHPI  vs White

But we expect the OR estimate to be e raised to the power of the parameter estimate, and, for example, e^(-.06) = 0.94, which is not the odds ratio that SAS reports.

The explanation lies in the design matrix. With the default (effect) coding, the design matrix looks like this:

    race      Design variables
    AIAN       1    0    0    0    0
    AfrA       0    1    0    0    0
    Asian      0    0    1    0    0
    Lat        0    0    0    1    0
    NHPI       0    0    0    0    1
    White     -1   -1   -1   -1   -1

and each parameter estimates the difference between that level and the average over all levels (on the log-odds scale), so exponentiating a single estimate does not give the odds ratio against White. On the other hand, with dummy (or reference) coding, it looks like

    race      Design variables
    AIAN       1    0    0    0    0
    AfrA       0    1    0    0    0
    Asian      0    0    1    0    0
    Lat        0    0    0    1    0
    NHPI       0    0    0    0    1
    White      0    0    0    0    0

and each parameter estimates the difference between that level and the reference group.

To get reference coding, use the PARAM=REFERENCE option on the CLASS statement:

    proc logistic data=today2;
      class race / param=ref;
      model disease(event='1') = race;
    run;

For more information (and other possible parameterizations) see the SAS documentation for PROC LOGISTIC, in particular the section CLASS Variable Parameterization in DETAILS.

GOTCHA #3 QUASI-COMPLETE AND COMPLETE SEPARATION

If you picture the data as a 2x2 crosstab, then quasi-complete separation occurs when one of the cells is 0. Complete separation occurs when one cell in each row and column is 0.

EXAMPLE QUASI-COMPLETE SEPARATION

An example of quasi-complete separation is:

    data today7a;
      /* the trailing @@ lets several observations share one data line */
      input x $ y $ @@;
      datalines;
    A C  A C  A C  A C  A C
    B C  B C  B C  B C  B C  B C  B C  B C  B C  B C
    B D  B D  B D  B D  B D
    ;;;;

The evidence varies. Confidence intervals will be extremely wide. But sometimes there is a warning that there was complete or quasi-complete separation, sometimes a note. Sometimes, you get a warning that convergence was not attained. With the WEIGHT statement you sometimes get a note that observations with nonpositive frequencies or weights were excluded. For example, if you run the data step creating data today7a, and then

    proc logistic data=today7a;
      class x;
      model y = x;
    run;

you get a warning in the log. But if you run this data step:

    data today7;
      input x $ y $ weight;
      datalines;
    A C 5
    A D 0
    B C 10
    B D 5
    ;;;;

and then

    proc logistic data=today7;
      class x;
      model y = x;
      weight weight;
    run;

you get a note in the log.

Sometimes, you can delete the offending variable. In the example, there was only one independent variable, but usually there will be more, and one of them will be the problematic one. Alternatively, if there are multiple categories, it may be sensible to combine some of them. A third possibility is to leave the offending variable in the equation and simply report the results for the other variables; these are still correct. You can then report the coefficients for the offending variables as inf. Finally, you can use exact inference with the EXACT statement.

REFERENCES

Paul Allison's paper at SGF 2008 has a great deal of lucid explanation of this problem [Allison, 2008], although his statement that PROC LOGISTIC gives clear diagnostic messages is, as we have seen, not always true.

GOTCHA #4 CONCORDANT AND DISCORDANT

Part of the default output from PROC LOGISTIC is a table that has entries including percent concordant and percent discordant. To me, this implies the percent that would correctly be assigned, based on the results of the logistic regression. But that is not what it is. It looks at all pairs of observations that have different observed responses. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y. Here, X and Y are the predicted value and the actual value.

EXAMPLE

For our first example, the output looks like this:

    Association of Predicted Probabilities and Observed Responses

    Percent Concordant    25.0    Somers' D
    Percent Discordant    25.0    Gamma
    Percent Tied          50.0    Tau-a
    Pairs                    4    c

It is hard to find documentation of this. I couldn't find it explained in the LOGISTIC documentation.
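The pair counting can be sketched by hand, which may make the definition easier to see. The following is a minimal illustration, not PROC LOGISTIC's internal algorithm. It assumes that only pairs made up of one event and one non-event are compared, and it uses a made-up data set: the name pred, the variables y and phat, and the four data lines are all hypothetical. In practice the predicted probabilities could be saved with an OUTPUT statement such as output out=pred p=phat;.

```sas
/* Hypothetical data: y is the observed response (1 = event),          */
/* phat is a predicted probability of the event. Values are made up.   */
data pred;
  input y phat;
  datalines;
1 0.80
1 0.40
0 0.40
0 0.20
;
run;

/* Pair every event observation with every non-event observation.      */
/* In PROC SQL a comparison evaluates to 1 (true) or 0 (false), so the */
/* mean of a comparison is the proportion of pairs where it holds.     */
proc sql;
  select count(*)                    as pairs,
         100 * mean(e.phat > n.phat) as pct_concordant,
         100 * mean(e.phat < n.phat) as pct_discordant,
         100 * mean(e.phat = n.phat) as pct_tied
  from pred(where=(y = 1)) as e,
       pred(where=(y = 0)) as n;
quit;
```

With these made-up values there are four event/non-event pairs: in three the event has the higher predicted probability, and in one the two probabilities are equal. None of this says how many observations would be classified correctly at any particular cutpoint.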
I found a mention of concordant and discordant in the FREQ documentation, but it was not clear what X and Y were until I searched SAS-L and found an explanation from David Cassell.

For what I was thinking of, you need the CTABLE option on the MODEL statement, which gives the proportion correctly classified, the sensitivity, the specificity, and other measures for each of a number of cutpoints of the predicted probability level. By default, it gives probability levels from 0 to 1 at intervals of .02, but if you just want a few, you can request them with the PPROB= option:

    proc logistic data=today3;
      class race sex / param=ref;
      model disease(event='1') = race sex / ctable pprob=( );
    run;

which yields

                 Correct         Incorrect                  Percentages
    Prob              Non-            Non-           Sensi-  Speci-  False  False
    Level   Event    Event   Event   Event  Correct  tivity  ficity    POS    NEG

GOTCHA #5 INTERACTIONS

When you add interactions to a logistic model, no odds ratios are printed.

EXAMPLE

    data today3;
      input disease race $ sex $;
      datalines;
    0 White F
    White M
    White F
    White M
    AfrA F
    AfrA M
    AfrA F 20
    1 AfrA M 28
    0 Lat F
    Lat M 90
    1 Lat F 75
    1 Lat M 30
    0 AIAN F 25
    0 AIAN M 15
    1 AIAN F 10
    1 AIAN M 8
    0 Asian F
    Asian M
    Asian F 50
    1 Asian M 40
    0 NHPI F 10
    0 NHPI M 8
    1 NHPI F 5
    1 NHPI M 3
    ;;;;
    run;

and then

    proc logistic data=today3;
      class race sex / param=ref;
      model disease(event='1') = race sex race*sex;
    run;

There are no odds ratios printed.

EXPLANATION

The reason no odds ratios are printed is a bit complex. Let's take a simpler case, with two independent variables, each with two levels, and, for simplicity, equal cell sizes. If the dependent variable is continuous, then we might get something like this (with DV means in each cell):

               Male   Female   Mean
    White        40       60     50
    Other        90       50     70
    Average      65       55     60

Now, we can see that White males are lower than White females, but Black males are higher than Black females. The average effect of being male is to add 5, and the average effect of being White is to subtract 10. If there were no interaction, the main effects would give

              Male                Female
    White     60 - 10 + 5 = 55    60 - 10 - 5 = 45
    Other     60 + 10 + 5 = 75    60 + 10 - 5 = 65

But if we put proportions in the table and estimate odds ratios, things are trickier. Start with proportions:

               Male   Female   Mean
    White      0.40     0.60   0.50
    Black      0.90     0.50   0.70
    Average    0.65     0.55   0.60

and it still works just like means. But change to odds:

               Male       Female     Mean
    White      2 to 3     3 to 2     1 to 1
    Black      9 to 1     1 to 1     7 to 3
    Average    13 to 7    11 to 9    3 to 2

or, dividing through:

               Male   Female   Mean
    White      0.67     1.50   1.00
    Black      9.00     1.00   2.33
    Average    1.86     1.22   1.50

Notice that the average odds is not the average of the odds. For example, in the White row, (0.67 + 1.50)/2 = 1.08, not 1.00. What would the main effects model predict?

Some people suggest using the CONTRAST statement, but I (and many others) find these confusing and prone to error for interactions. I wasn't able to find a good reference for this method, that is, one that explained clearly how to do it. But SAS-L to the rescue. We can get what we want with a two-step process.

First, recode the race and sex variables as numbers:

    data today4;
      set today3;
      if race = 'White' then racenum = 0;
      else if race = 'AfrA' then racenum = 1;
      else if race = 'AIAN' then racenum = 2;
      else if race = 'Asian' then racenum = 3;
      else if race = 'NHPI' then racenum = 4;
      if sex = 'M' then sexnum = 0;
      else if sex = 'F' then sexnum = 1;
      /* hundreds digit identifies race, units digit identifies sex */
      racesex = 100*racenum + sexnum;
    run;

Second, write a regular PROC LOGISTIC with the new variable racesex, defining the reference category:

    proc logistic data=today4;
      class racesex (param=ref ref='100');
      model disease(event='1') = racesex;
    run;

which yields, among other things:

    Odds Ratio Estimates

                           Point       95% Wald
    Effect                 Estimate    Confidence Limits
    racesex 0   vs 100
    racesex 1   vs 100
    racesex 101 vs 100
    racesex 200 vs 100
    racesex 201 vs 100
    racesex 300 vs 100
    racesex 301 vs 100
    racesex 400 vs 100
    racesex 401 vs 100

and we just need to remember the coding we used: 100 is African-American, 0 is male, so this is comparing each group to African-American males.

An alternate solution (also gotten through SAS-L) is to use a macro created by Paul Thompson, and presented at SUGI 31: www2.sas.com/proceedings/sugi31/ pdf.

SUMMARY

PROC LOGISTIC offers many traps for the unwary. In addition to the above, one needs to be careful about Wald vs. likelihood ratio tests (by default, SAS reports likelihood ratio, score, and Wald tests of the global null hypothesis, but Wald tests and Wald confidence limits for the individual parameter estimates and odds ratios). For more on these see Long [1997].

In addition to the gotchas presented here, there are the usual issues of model building, variable selection, model interpretation and so on. For more on these issues see Hosmer and Lemeshow [2000], Long [1997], and Allison [1999, 2008].

REFERENCES

D. W. Hosmer and S. Lemeshow. Applied logistic regression. John Wiley & Sons, New York, 2nd edition, 2000.

P. D. Allison. Convergence failures in logistic regression. In SAS Global Forum, number 360, 2008.

J. S. Long. Regression models for categorical and limited dependent variables. Sage, Thousand Oaks, CA, 1997.

P. D. Allison. Logistic regression using the SAS system: Theory and application. SAS Institute, Cary, NC, 1999.

ACKNOWLEDGMENTS

CONTACT INFORMATION

Peter L. Flom
515 West End Ave
New York, NY
Phone: (917)
Personal webpage:

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand names and product names are registered trademarks or trademarks of their respective companies.

SUGI 29 Statistics and Data Analysis

SUGI 29 Statistics and Data Analysis Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

More information

Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc

Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc ABSTRACT Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc Logistic regression may be useful when we are trying to model a

More information

Cool Tools for PROC LOGISTIC

Cool Tools for PROC LOGISTIC Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

More information

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@

Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,

More information

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking

ln(p/(1-p)) = α +β*age35plus, where p is the probability or odds of drinking Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa

A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa ABSTRACT Predictive modeling is the technique of using historical information on a certain

More information

Statistics and Data Analysis

Statistics and Data Analysis NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models

More information

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

More information

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki R. Wooten, PhD, LISW-CP 2,3, Jordan Brittingham, MSPH 4

Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki R. Wooten, PhD, LISW-CP 2,3, Jordan Brittingham, MSPH 4 1 Paper 1680-2016 Using GENMOD to Analyze Correlated Data on Military System Beneficiaries Receiving Inpatient Behavioral Care in South Carolina Care Systems Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY

Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE

More information

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC

Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Paper AA08-2013 Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT

More information

Lecture 19: Conditional Logistic Regression

Lecture 19: Conditional Logistic Regression Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

Educational Attainment of Veterans: 2000 to 2009

Educational Attainment of Veterans: 2000 to 2009 Educational Attainment of Veterans: to 9 January 11 NCVAS National Center for Veterans Analysis and Statistics Data Source and Methods Data for this analysis come from years of the Current Population Survey

More information

Binary Logistic Regression

Binary Logistic Regression Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

More information

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents

Linda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén

More information

XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables

XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables XPost: Excel Workbooks for the Post-estimation Interpretation of Regression Models for Categorical Dependent Variables Contents Simon Cheng hscheng@indiana.edu php.indiana.edu/~hscheng/ J. Scott Long jslong@indiana.edu

More information

Chapter 39 The LOGISTIC Procedure. Chapter Table of Contents

Chapter 39 The LOGISTIC Procedure. Chapter Table of Contents Chapter 39 The LOGISTIC Procedure Chapter Table of Contents OVERVIEW...1903 GETTING STARTED...1906 SYNTAX...1910 PROCLOGISTICStatement...1910 BYStatement...1912 CLASSStatement...1913 CONTRAST Statement.....1916

More information

ABSTRACT INTRODUCTION

ABSTRACT INTRODUCTION Paper SP03-2009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel

More information

Charles Secolsky County College of Morris. Sathasivam 'Kris' Krishnan The Richard Stockton College of New Jersey

Charles Secolsky County College of Morris. Sathasivam 'Kris' Krishnan The Richard Stockton College of New Jersey Using logistic regression for validating or invalidating initial statewide cut-off scores on basic skills placement tests at the community college level Abstract Charles Secolsky County College of Morris

More information

Multiple logistic regression analysis of cigarette use among high school students

Multiple logistic regression analysis of cigarette use among high school students Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph Adwere-Boamah Alliant International University A binary logistic regression analysis was performed to predict

More information

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique

More information

Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.

Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Paper 264-26 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS

More information

U.S. Population Projections: 2012 to 2060

U.S. Population Projections: 2012 to 2060 U.S. Population Projections: 2012 to 2060 Jennifer M. Ortman Population Division Presentation for the FFC/GW Brown Bag Seminar Series on Forecasting Washington, DC February 7, 2013 2012 National Projections

More information

Addressing Racial/Ethnic Disparities in Hypertensive Health Center Patients

Addressing Racial/Ethnic Disparities in Hypertensive Health Center Patients Addressing Racial/Ethnic Disparities in Hypertensive Health Center Patients Academy Health June 11, 2011 Quyen Ngo Metzger, MD, MPH Data Branch Chief, Office of Quality and Data U.S. Department of Health

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction

Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction David Izrael, Abt Associates Sarah W. Ball, Abt Associates Sara M.A. Donahue, Abt Associates ABSTRACT We examined

More information

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC

Counting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC Paper CC 14 Counting the Ways to Count in SAS Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT This paper first takes the reader through a progression of ways to count in SAS.

More information

Is it statistically significant? The chi-square test

Is it statistically significant? The chi-square test UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical

More information

Categorical Data Analysis

Categorical Data Analysis Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

More information

School of Nursing Faculty Salary Equity Report and Action Plan

School of Nursing Faculty Salary Equity Report and Action Plan July 1, 2015 School of Nursing Faculty Salary Equity Report and Action Plan Shari L. Dworkin, Ph.D., M.S. Associate Dean for Academic Affairs Overview: In 2012, then UC President Mark Yudof charged each

More information

Workshop on Using the National Survey of Children s s Health Dataset: Practical Applications

Workshop on Using the National Survey of Children s s Health Dataset: Practical Applications Workshop on Using the National Survey of Children s s Health Dataset: Practical Applications Julian Luke Stephen Blumberg Centers for Disease Control and Prevention National Center for Health Statistics

More information

How to set the main menu of STATA to default factory settings standards

How to set the main menu of STATA to default factory settings standards University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be

More information

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry

Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing

More information

Statistics, Data Analysis & Econometrics

Statistics, Data Analysis & Econometrics Using the LOGISTIC Procedure to Model Responses to Financial Services Direct Marketing David Marsh, Senior Credit Risk Modeler, Canadian Tire Financial Services, Welland, Ontario ABSTRACT It is more important

More information

Free Trial - BIRT Analytics - IAAs

Free Trial - BIRT Analytics - IAAs Free Trial - BIRT Analytics - IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis

More information

Predicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS
