PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY


 Jade Sullivan
 2 years ago
 Views:
Transcription
1 PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY ABSTRACT Keywords: Logistic. INTRODUCTION This paper covers some gotchas in SAS R PROC LOGISTIC. A gotcha is a mistake that isn t obviously a mistake the program runs, there may be a note or a warning, but no errors. Output appears. But it s the wrong output. This is not a primer on PROC LOGISTIC, much less on logistic regression. There are many good books on logistic regression; one such is Hosmer and Lemeshow [2000]. Each section has several subsections. First, I identify the gotcha. Then I give an example. Third, I show what evidence you have that it occurs. Fourth, I show how to fix it in some cases, referring to other resources. In some cases, I offer an explanation between the evidence and the solution. GOTCHA #1 CODING 0 AND 1 PROC LOGISTIC can be used to run logistic regression on a dichotomous dependent variable. Often, these are coded 0 and 1, with 0 for no or the equivalent, and 1 for yes or the equivalent. In this case, we are usually interested in modeling the probability of a yes. However, by default, SAS models the probability of a 0 (which would be a no ). For example, we might be interested in modeling the presence of a disease, with 0 meaning the person is not infected, and 1 meaning he or she is infected. To keep it simple, I will use one independent variable: sex, code as 1 for female and 0 for male. So: d a t a t o d a y ; i n p u t d i s e a s e f emale weight ; d a t a l i n e s ; ; ; ; ; we then run PROC LOGISTIC: p roc l o g i s t i c d a t a = t o d a y ; model d i s e a s e = f emale ; and get, among other output, an odds ratio estimate of 1.39 for female, while it s clear that men are much more likely to be infected. The evidence that this is happening is one line in the output: {Probability modeled is disease=0} and several lines in the log: NOTE: PROC LOGISTIC is modeling the probability that disease=0. One way to change this to model the probability that disease=1 is to specify the response variable option EVENT= 1 There are several solutions. The simplest is not the one mentioned in the log, but rather the DESCENDING option. 1
2 p roc l o g i s t i c d a t a = t o d a y d e s c e n d i n g ; model d i s e a s e = f emale ; Another method is the one mentioned in the log, which is more general: p roc l o g i s t i c d a t a = t o d a y ; model d i s e a s e ( e v e n t = 1 ) = female ; GOTCHA #2 EFFECT CODING AND CLASS VARIABLES When you have a categorical independent variable with more than 2 levels, you need to define it with a CLASS statement. In PROC GLM the default coding for this is dummy coding. In PROC LOGISTIC, it s effect coding. To me, effect coding is quite unnatural. EXAMPLE Continuing with the same example of modeling probability of infection, suppose you now have race/ethnicity as an IV, with 6 categories, as defined by the Census Bureau: White, Black/African American, Hispanic/Latino, American Indian/Alaskan Native, Native Hawaiian or other Pacific Islander, and Asian. p roc l o g i s t i c d a t a = today2 ; c l a s s r a c e ; model d i s e a s e ( e v e n t = 1 ) = r a c e ; and get parameter estimates that include (among much else): Parameter DF Estimate Intercept race AIAN race AfrA race Asian race Lat race NHPI and OR estimates Point Effect Estimate race AIAN vs White race AfrA vs White race Asian vs White race Lat vs White race NHPI vs White but we know that the OR estimate should be e OR, and, for example, e.06 = 0.94 not 1. The design matrix. With the default, the design matrix looks like this: 2
3 race AIAN AfrA Asian Lat NHPI White and each parameter estimate estimates the difference between that level and the average of the other levels. On the other hand, with dummy (or reference) coding, it looks like race AIAN AfrA Asian Lat NHPI White and each parameter estimates the difference between that level and the reference group. Use the param = reference option on the class statement: p roc l o g i s t i c d a t a = today2 ; c l a s s r a c e / param = r e f ; model d i s e a s e ( e v e n t = 1 ) = r a c e ; For more information (and other possible parameterizations) see the SAS documentation for PROC LOGISTIC, in particular the section CLASS variable parameterization in DETAILS GOTCHA #3 QUASICOMPLETE AND COMPLETE SEPARATION If you picture the data as a 2x2 crosstab, then quasicomplete separation occurs when one of the cells is 0. Complete separation occurs when one cell in each row and column is 0. EXAMPLE QUASICOMPLETE SEPARATION An example of quasicomplete separation is: d a t a t o d a y 7 a ; i n p u t x $ y $ d a t a l i n e s ; A C A C A C A C A C B C B C B C B C B C B C B C B C B C B D B D B D B D B D ; ; ; ; B C It varies. Confidence intervals will be extremely wide. But sometimes there is a warning that there was complete or quasicomplete separation, sometimes a note. Sometimes, you get a warning that convergence was not attained. With the weight option you sometimes get a note that observations with nonpositive frequencies or weights were excluded. For example, if you run the data step creating data today7a, and then p roc l o g i s t i c d a t a = t o d a y 7 a ; c l a s s x ; model y = x ; 3
4 you get a warning in the log. But if you run this data step: d a t a today7 ; i n p u t x $ y $ weight d a t a l i n e s ; A C 5 A D 0 B C 10 B D 5 ; ; ; and then p roc l o g i s t i c d a t a = today7 ; c l a s s x ; model y = x ; w e i g ht weight ; you get a note in the log. Sometimes, you can delete the offending variable. In the example, there was only one independent variable, but usually, there will be more, and one of them will be the problematic one. Alternatively, if there are multiple categories, it may be sensible to combine some of them. A third possibility is to leave the offending variable in the equation, and simply report the results for the other variables these are still correct. You can then report the coefficients for the offending variables as inf. Finally, you can use exact inference with the EXACT option. REFERENCES Paul Allison s paper at SGF 2008 has a great deal of lucid explanation of this problem [Allison, 2008], although his statement that PROC LOGISTIC gives clear diagnostic messages is, as we have seen, not always true. GOTCHA #4 CONCORDANT AND DISCORDANT Part of the default output from PROC LOGISTIC is a table that has entries including percent concordant and percent discordant. To me, this implies the percent that would correctly be assigned, based on the results of the logistic regression. But that is not what it is. It looks at all possible pairs of observations. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y; here, X and Y are the predicted value and the actual value. EXAMPLE For our first example, the output looks like this: Association of Predicted Probabilities and Observed Responses Percent Concordant 25.0 Somers D Percent Discordant 25.0 Gamma Percent Tied 50.0 Taua Pairs 4 c It is hard to find documentation of this. I couldn t find it explained in the LOGISTIC documentation. I found a mention of concordant and discordant in the FREQ documentation, but it was not clear what X and Y were, until I searched SASL and found an explanation from David Cassell. For what I was thinking of, you need the CTABLE option on the MODEL statement, which gives the proportion correctly classified, the sensitivity, the specificity, and other measures for each of a number of cutpoints of the predicted probability level. By default, it gives probability levels from 0 to 1 at intervals of.02, but if you just want a few, you can get them: p roc l o g i s t i c d a t a = today3 ; c l a s s r a c e sex / param = r e f ; model d i s e a s e ( e v e n t = 1 ) = r a c e sex / c t a b l e pprob = ( ) ; 4
5 which yields Correct Incorrect Percentages Prob Non Non Sensi Speci False False Level Event Event Event Event Correct tivity ficity POS NEG GOTCHA #5 INTERACTIONS When you add interactions to a logistic model, no odds ratios are printed. EXAMPLE data today3; input disease race $ sex $ weight datalines; 0 White F White M White F White M AfrA F AfrA M AfrA F 20 1 AfrA M 28 0 Lat F Lat M 90 1 Lat F 75 1 Lat M 30 0 AIAN F 25 0 AIAN M 15 1 AIAN F 10 1 AIAN M 8 0 Asian F Asian M Asian F 50 1 Asian M 40 0 NHPI F 10 0 NHPI M 8 1 NHPI F 5 1 NHPI M 3 ;;;; run; and then p roc l o g i s t i c d a t a = today3 ; c l a s s r a c e sex / param = r e f ; model d i s e a s e ( e v e n t = 1 ) = r a c e sex r a c e sex ; There are no odds ratios printed. EXPLANATION The reason no odds ratios are printed is a bit complex. Let s take a simpler case, with two independent variables, each with two levels, and, for simplicity, equal cell sizes. If the dependent variable is continuous, then we might get something like this (with DV means in each cell). Male Female Mean White Other Average Now, we can see that White males are lower than White females, but Black males are higher than Black females. The average effect of being male is to add 5, and the average effect of being White is to subtract 10. If there were no interaction, the main effects would give 5
6 Male Female White = = 45 Other = = 65 But if we put proportions in the table and estimate odds ratios, things are trickier. Start with proportions: Male Female Mean White Black Average and it still works just like means. But change to odds: Male Female Mean White 2 to 3 3 t 2 1 to 1 Black 9 to 1 1 to 1 7 to 3 Average 13 to 7 11 to 9 3 to 2 or, dividing through Male Female Mean White Black Average Notice that the average odds is not the average of the odds...e.g ( )/2. What would the main effects model predict? Some people suggest using the CONTRAST statement, but I (and many others) find these confusing and prone to error for interactions. I wasn t able to find a good reference for this method that is, one that explained clearly how to do it. But SASL to the rescue We can get what we want with a two step process: First, recode the race and sex variables as numbers d a t a today4 ; s e t t oday3 ; i f r a c e = White t h e n racenum = 0 ; e l s e i f r a c e = AfrA t h e n racenum = 1 ; e l s e i f r a c e = AIAN t h e n racenum = 2 ; e l s e i f r a c e = Asian t h e n racenum = 3 ; e l s e i f r a c e = NHPI t h e n racenum = 4 ; i f sex = M t h e n sexnum = 0 ; e l s e i f sex = F t h e n sexnum = 1 ; r a c e s e x = 100 racenum + sexnum ; Second, write a regular PROC LOGISTIC with the new variable racesex, defining the reference category: p roc l o g i s t i c d a t a = today4 ; c l a s s r a c e s e x ( param = r e f r e f = 100 ) ; model d i s e a s e ( e v e n t = 1 ) = r a c e s e x ; which yields, among other things: 6
7 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits racesex 0 vs racesex 1 vs racesex 101 vs racesex 200 vs racesex 201 vs racesex 300 vs racesex 301 vs racesex 400 vs racesex 401 vs and we just need to remember the coding we used: 100 is AfricanAmerican, 0 is male, so this is comparing each group to AfricanAmerican males. An alternate solution (also gotten through SASL) is to use a macro created by Paul Thompson, and presented at SUGI 31: www2.sas.com/proceedings/sugi31/ pdf. SUMMARY PROC LOGISTIC offers many traps for the unwary. In addition to the above, one needs to be careful about Wald vs. likelihood ratio tests (SAS defaults to likelihood ratio tests for the confidence intervals and Wald tests for the parameter estimates. For more on these see Long [1997]. In addition to the gotchas presented here, there are the usual issues of model building, variable selection, model interpretation and so on. For more on these issues see: Hosmer and Lemeshow [2000], Long [1997], Allison [1999, 2008] REFERENCES D. W. Hosmer and S. Lemeshow. Applied logistic regression. John Wiley & Sons, New York, 2nd edition, Paul D. Allison. Convergence failures in logistic regression. In SAS Global Forum, number 360, J. S. Long. Regression models of categorical and limited dependent variables. Sage, Thousand Oaks, CA, P. D. Allison. Logistic regression using the SAS system: Theory and application. SAS Institute, Cary, NC,
8 ACKNOWLEDGMENTS CONTACT INFORMATION Peter L. Flom 515 West End Ave New York, NY Phone: (917) Personal webpage: SAS R and all other SAS Institute Inc., product or service names are registered trademarks ore trademarks of SAS Institute Inc., in the USA and other countries. R indicates USA registration. Other brand names and product names are registered trademarks or trademarks of their respective companies. 8
Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc
ABSTRACT Multinomial and ordinal logistic regression using PROC LOGISTIC Peter L. Flom National Development and Research Institutes, Inc Logistic regression may be useful when we are trying to model a
More informationSUGI 29 Statistics and Data Analysis
Paper 19429 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,
More informationCool Tools for PROC LOGISTIC
Cool Tools for PROC LOGISTIC Paul D. Allison Statistical Horizons LLC and the University of Pennsylvania March 2013 www.statisticalhorizons.com 1 New Features in LOGISTIC ODDSRATIO statement EFFECTPLOT
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationln(p/(1p)) = α +β*age35plus, where p is the probability or odds of drinking
Dummy Coding for Dummies Kathryn Martin, Maternal, Child and Adolescent Health Program, California Department of Public Health ABSTRACT There are a number of ways to incorporate categorical variables into
More informationData Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More information5. Ordinal regression: cumulative categories proportional odds. 6. Ordinal regression: comparison to single reference generalized logits
Lecture 23 1. Logistic regression with binary response 2. Proc Logistic and its surprises 3. quadratic model 4. HosmerLemeshow test for lack of fit 5. Ordinal regression: cumulative categories proportional
More informationModeling Lifetime Value in the Insurance Industry
Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 SigmaRestricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationA LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa
A LOGISTIC REGRESSION MODEL TO PREDICT FRESHMEN ENROLLMENTS Vijayalakshmi Sampath, Andrew Flagel, Carolina Figueroa ABSTRACT Predictive modeling is the technique of using historical information on a certain
More informationRaul CruzCano, HLTH653 Spring 2013
Multilevel ModelingLogistic Schedule 3/18/2013 = Spring Break 3/25/2013 = Longitudinal Analysis 4/1/2013 = Midterm (Exercises 15, not Longitudinal) Introduction Just as with linear regression, logistic
More informationStatistics and Data Analysis
NESUG 27 PRO LOGISTI: The Logistics ehind Interpreting ategorical Variable Effects Taylor Lewis, U.S. Office of Personnel Management, Washington, D STRT The goal of this paper is to demystify how SS models
More informationA Tutorial on Logistic Regression
A Tutorial on Logistic Regression Ying So, SAS Institute Inc., Cary, NC ABSTRACT Many procedures in SAS/STAT can be used to perform logistic regression analysis: CATMOD, GENMOD,LOGISTIC, and PROBIT. Each
More informationUsing An Ordered Logistic Regression Model with SAS Vartanian: SW 541
Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL
More informationUnit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)
Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2way tables Adds capability studying several predictors, but Limited to
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationOverview Classes. 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7)
Overview Classes 123 Logistic regression (5) 193 Building and applying logistic regression (6) 263 Generalizations of logistic regression (7) 24 Loglinear models (8) 54 1517 hrs; 5B02 Building and
More informationAbbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki R. Wooten, PhD, LISWCP 2,3, Jordan Brittingham, MSPH 4
1 Paper 16802016 Using GENMOD to Analyze Correlated Data on Military System Beneficiaries Receiving Inpatient Behavioral Care in South Carolina Care Systems Abbas S. Tavakoli, DrPH, MPH, ME 1 ; Nikki
More informationMultivariate Logistic Regression
1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation
More informationGuido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
Guido s Guide to PROC FREQ A Tutorial for Beginners Using the SAS System Joseph J. Guido, University of Rochester Medical Center, Rochester, NY ABSTRACT PROC FREQ is an essential procedure within BASE
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS OneSample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationDisplaying and comparing correlated ROC curves with the SAS System
Displaying and comparing correlated ROC curves with the SAS System Barbara Schneider, University of Vienna, Austria ABSTRACT Diagnosis is an essential part of clinical practice. Much medical research is
More informationBinary Logistic Regression
Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including
More informationImproved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC
Paper AA082013 Improved Interaction Interpretation: Application of the EFFECTPLOT statement and other useful features in PROC LOGISTIC Robert G. Downer, Grand Valley State University, Allendale, MI ABSTRACT
More informationLecture 19: Conditional Logistic Regression
Lecture 19: Conditional Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina
More informationEducational Attainment of Veterans: 2000 to 2009
Educational Attainment of Veterans: to 9 January 11 NCVAS National Center for Veterans Analysis and Statistics Data Source and Methods Data for this analysis come from years of the Current Population Survey
More informationLinda K. Muthén Bengt Muthén. Copyright 2008 Muthén & Muthén www.statmodel.com. Table Of Contents
Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén
More informationXPost: Excel Workbooks for the Postestimation Interpretation of Regression Models for Categorical Dependent Variables
XPost: Excel Workbooks for the Postestimation Interpretation of Regression Models for Categorical Dependent Variables Contents Simon Cheng hscheng@indiana.edu php.indiana.edu/~hscheng/ J. Scott Long jslong@indiana.edu
More informationChapter 39 The LOGISTIC Procedure. Chapter Table of Contents
Chapter 39 The LOGISTIC Procedure Chapter Table of Contents OVERVIEW...1903 GETTING STARTED...1906 SYNTAX...1910 PROCLOGISTICStatement...1910 BYStatement...1912 CLASSStatement...1913 CONTRAST Statement.....1916
More informationUSING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION. Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA
USING LOGISTIC REGRESSION TO PREDICT CUSTOMER RETENTION Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Logistic regression is an increasingly popular statistical technique
More informationMultiple logistic regression analysis of cigarette use among high school students
Multiple logistic regression analysis of cigarette use among high school students ABSTRACT Joseph AdwereBoamah Alliant International University A binary logistic regression analysis was performed to predict
More informationCharles Secolsky County College of Morris. Sathasivam 'Kris' Krishnan The Richard Stockton College of New Jersey
Using logistic regression for validating or invalidating initial statewide cutoff scores on basic skills placement tests at the community college level Abstract Charles Secolsky County College of Morris
More informationModel Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc.
Paper 26426 Model Fitting in PROC GENMOD Jean G. Orelien, Analytical Sciences, Inc. Abstract: There are several procedures in the SAS System for statistical modeling. Most statisticians who use the SAS
More informationU.S. Population Projections: 2012 to 2060
U.S. Population Projections: 2012 to 2060 Jennifer M. Ortman Population Division Presentation for the FFC/GW Brown Bag Seminar Series on Forecasting Washington, DC February 7, 2013 2012 National Projections
More informationABSTRACT INTRODUCTION
Paper SP032009 Illustrative Logistic Regression Examples using PROC LOGISTIC: New Features in SAS/STAT 9.2 Robert G. Downer, Grand Valley State University, Allendale, MI Patrick J. Richardson, Van Andel
More informationTwo Correlated Proportions (McNemar Test)
Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with
More informationCounting the Ways to Count in SAS. Imelda C. Go, South Carolina Department of Education, Columbia, SC
Paper CC 14 Counting the Ways to Count in SAS Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT This paper first takes the reader through a progression of ways to count in SAS.
More informationConstructing a Table of Survey Data with Percent and Confidence Intervals in every Direction
Constructing a Table of Survey Data with Percent and Confidence Intervals in every Direction David Izrael, Abt Associates Sarah W. Ball, Abt Associates Sara M.A. Donahue, Abt Associates ABSTRACT We examined
More informationAddressing Racial/Ethnic Disparities in Hypertensive Health Center Patients
Addressing Racial/Ethnic Disparities in Hypertensive Health Center Patients Academy Health June 11, 2011 Quyen Ngo Metzger, MD, MPH Data Branch Chief, Office of Quality and Data U.S. Department of Health
More informationIs it statistically significant? The chisquare test
UAS Conference Series 2013/14 Is it statistically significant? The chisquare test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chisquare? Tests whether two categorical
More informationWorkshop on Using the National Survey of Children s s Health Dataset: Practical Applications
Workshop on Using the National Survey of Children s s Health Dataset: Practical Applications Julian Luke Stephen Blumberg Centers for Disease Control and Prevention National Center for Health Statistics
More informationSchool of Nursing Faculty Salary Equity Report and Action Plan
July 1, 2015 School of Nursing Faculty Salary Equity Report and Action Plan Shari L. Dworkin, Ph.D., M.S. Associate Dean for Academic Affairs Overview: In 2012, then UC President Mark Yudof charged each
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationCategorical Data Analysis
Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods
More informationModeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry
Paper 12028 Modeling Customer Lifetime Value Using Survival Analysis An Application in the Telecommunications Industry Junxiang Lu, Ph.D. Overland Park, Kansas ABSTRACT Increasingly, companies are viewing
More informationFree Trial  BIRT Analytics  IAAs
Free Trial  BIRT Analytics  IAAs 11. Predict Customer Gender Once we log in to BIRT Analytics Free Trial we would see that we have some predefined advanced analysis ready to be used. Those saved analysis
More informationStatistics, Data Analysis & Econometrics
Using the LOGISTIC Procedure to Model Responses to Financial Services Direct Marketing David Marsh, Senior Credit Risk Modeler, Canadian Tire Financial Services, Welland, Ontario ABSTRACT It is more important
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationEXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA
EXPANDING THE EVIDENCE BASE IN OUTCOMES RESEARCH: USING LINKED ELECTRONIC MEDICAL RECORDS (EMR) AND CLAIMS DATA A CASE STUDY EXAMINING RISK FACTORS AND COSTS OF UNCONTROLLED HYPERTENSION ISPOR 2013 WORKSHOP
More informationLinear Regression in SPSS
Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number
More informationHEALTH INSURANCE COVERAGE STATUS. 20092013 American Community Survey 5Year Estimates
S2701 HEALTH INSURANCE COVERAGE STATUS 20092013 American Community Survey 5Year Estimates Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found
More informationPredicting Customer Churn in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS
Paper 11427 Predicting Customer in the Telecommunications Industry An Application of Survival Analysis Modeling Using SAS Junxiang Lu, Ph.D. Sprint Communications Company Overland Park, Kansas ABSTRACT
More informationMissing Data & How to Deal: An overview of missing data. Melissa Humphries Population Research Center
Missing Data & How to Deal: An overview of missing data Melissa Humphries Population Research Center Goals Discuss ways to evaluate and understand missing data Discuss common missing data methods Know
More informationStatistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationLogistic regression diagnostics
Logistic regression diagnostics Biometry 755 Spring 2009 Logistic regression diagnostics p. 1/28 Assessing model fit A good model is one that fits the data well, in the sense that the values predicted
More informationLOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
More informationAlex Vidras, David Tysinger. Merkle Inc.
Using PROC LOGISTIC, SAS MACROS and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry ABSTRACT
More informationHealth Care and Life Sciences
Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations Wen Zhu 1, Nancy Zeng 2, Ning Wang 2 1 K&L consulting services, Inc, Fort Washington,
More informationEmailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA
Emailing Automated Notification of Errors in a Batch SAS Program Julie Kilburn, City of Hope, Duarte, CA Rebecca Ottesen, City of Hope, Duarte, CA ABSTRACT With multiple programmers contributing to a batch
More informationExploring Race & Ethnicity Health Insurance Coverage Differences: Results from the 2011 Oregon Health Insurance Survey
Office for Oregon Health Policy and Research Exploring Race & Ethnicity Health Insurance Coverage Differences: Results from the 2011 Oregon Health Insurance Survey September 2012 Table of Contents Race
More informationPaper D10 2009. Ranking Predictors in Logistic Regression. Doug Thompson, Assurant Health, Milwaukee, WI
Paper D10 2009 Ranking Predictors in Logistic Regression Doug Thompson, Assurant Health, Milwaukee, WI ABSTRACT There is little consensus on how best to rank predictors in logistic regression. This paper
More informationABSTRACT INTRODUCTION STUDY DESCRIPTION
ABSTRACT Paper 16752014 Validating SelfReported Survey Measures Using SAS Sarah A. Lyons MS, Kimberly A. Kaphingst ScD, Melody S. Goodman PhD Washington University School of Medicine Researchers often
More informationSamesex Couples Consistency in Reports of Marital Status. Housing and Household Economic Statistics Division
Samesex Couples Consistency in Reports of Marital Status Author: Affiliation: Daphne Lofquist U.S. Census Bureau Housing and Household Economic Statistics Division Phone: 3017632416 Fax: 3014573500
More informationSimple Rules to Remember When Working with Indexes Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California
Simple Rules to Remember When Working with Indexes Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California Abstract SAS users are always interested in learning techniques related
More informationSegmentation For Insurance Payments Michael Sherlock, Transcontinental Direct, Warminster, PA
Segmentation For Insurance Payments Michael Sherlock, Transcontinental Direct, Warminster, PA ABSTRACT An online insurance agency has built a base of names that responded to different offers from various
More informationOne problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University
One problem > Multiple solutions; various ways of removing duplicates from dataset using SAS Jaya Dhillon, Louisiana State University ABSTRACT In real world, analysts seldom come across data which is in
More informationDoes Financial Sophistication Matter in Retirement Preparedness of U.S Households? Evidence from the 2010 Survey of Consumer Finances
Does Financial Sophistication Matter in Retirement Preparedness of U.S Households? Evidence from the 2010 Survey of Consumer Finances Kyoung Tae Kim, The Ohio State University 1 Sherman D. Hanna, The Ohio
More informationMultinomial Logistic Regression
Multinomial Logistic Regression Dr. Jon Starkweather and Dr. Amanda Kay Moske Multinomial logistic regression is used to predict categorical placement in or the probability of category membership on a
More informationLogistic (RLOGIST) Example #3
Logistic (RLOGIST) Example #3 SUDAAN Statements and Results Illustrated PREDMARG (predicted marginal proportion) CONDMARG (conditional marginal proportion) PRED_EFF pairwise comparison COND_EFF pairwise
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationBeginning Tutorials. PROC FREQ: It s More Than Counts Richard Severino, The Queen s Medical Center, Honolulu, HI OVERVIEW.
Paper 6925 PROC FREQ: It s More Than Counts Richard Severino, The Queen s Medical Center, Honolulu, HI ABSTRACT The FREQ procedure can be used for more than just obtaining a simple frequency distribution
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More information2003 National Survey of College Graduates Nonresponse Bias Analysis 1
2003 National Survey of College Graduates Nonresponse Bias Analysis 1 Michael White U.S. Census Bureau, Washington, DC 20233 Abstract The National Survey of College Graduates (NSCG) is a longitudinal survey
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationSPSS Resources. 1. See website (readings) for SPSS tutorial & Stats handout
Analyzing Data SPSS Resources 1. See website (readings) for SPSS tutorial & Stats handout Don t have your own copy of SPSS? 1. Use the libraries to analyze your data 2. Download a trial version of SPSS
More informationSimple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
More informationData Cleaning and Missing Data Analysis
Data Cleaning and Missing Data Analysis Dan Merson vagabond@psu.edu India McHale imm120@psu.edu April 13, 2010 Overview Introduction to SACS What do we mean by Data Cleaning and why do we do it? The SACS
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationPaper 452010 Evaluation of methods to determine optimal cutpoints for predicting mortgage default Abstract Introduction
Paper 452010 Evaluation of methods to determine optimal cutpoints for predicting mortgage default Valentin Todorov, Assurant Specialty Property, Atlanta, GA Doug Thompson, Assurant Health, Milwaukee,
More informationThe Demand for Financial Planning Services 1
The Demand for Financial Planning Services 1 Sherman D. Hanna, Ohio State University Professor, Consumer Sciences Department Ohio State University 1787 Neil Avenue Columbus, OH 432101290 Phone: 6142924584
More informationRole of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationStudents' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
More informationHome Computers and Internet Use in the United States: August 2000
Home Computers and Internet Use in the United States: August 2000 Special Studies Issued September 2001 P23207 Defining computer and Internet access All individuals living in a household in which the
More informationHIV/AIDS in the Binghamton TriCounty Region Revised June 2007
HIV/AIDS in the Binghamton TriCounty Revised June 2007 HIV is the virus that causes AIDS. You can be infected with HIV but not diagnosed with AIDS. The Centers for Disease Control (CDC) estimated that
More informationAuxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationFor example, enter the following data in three COLUMNS in a new View window.
Statistics with Statview  18 Paired ttest A paired ttest compares two groups of measurements when the data in the two groups are in some way paired between the groups (e.g., before and after on the
More informationLogistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
More informationNew SAS Procedures for Analysis of Sample Survey Data
New SAS Procedures for Analysis of Sample Survey Data Anthony An and Donna Watts, SAS Institute Inc, Cary, NC Abstract Researchers use sample surveys to obtain information on a wide variety of issues Many
More informationUsing Names To Check Accuracy of Race and Gender Coding in NAEP
Using Names To Check Accuracy of Race and Gender Coding in NAEP Jennifer Czuprynski Kali, James Bethel, John Burke, David Morganstein, and Sharon Hirabayashi Westat Keywords: Data quality, Coding errors,
More informationInteractions involving Categorical Predictors
Interactions involving Categorical Predictors Today s Class: To CLASS or not to CLASS: Manual vs. programcreated differences among groups Interactions of continuous and categorical predictors Interactions
More informationJames E. Bartlett, II is Assistant Professor, Department of Business Education and Office Administration, Ball State University, Muncie, Indiana.
Organizational Research: Determining Appropriate Sample Size in Survey Research James E. Bartlett, II Joe W. Kotrlik Chadwick C. Higgins The determination of sample size is a common task for many organizational
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics: Behavioural
More informationProjections of the Size and Composition of the U.S. Population: 2014 to 2060 Population Estimates and Projections
Projections of the Size and Composition of the U.S. Population: to Population Estimates and Projections Current Population Reports By Sandra L. Colby and Jennifer M. Ortman Issued March 15 P251143 INTRODUCTION
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA012012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationCategories of Data. Communication & Journalism as a Broad Field
Graduate Enrollment and Degrees in Communication/Journalism: Benchmarking Data from the Council of Graduate Schools/Graduate Record Examinations Survey Since 1986, the Council of Graduate Schools (CGS)
More informationAnalysis of Survey Data Using the SAS SURVEY Procedures: A Primer
Analysis of Survey Data Using the SAS SURVEY Procedures: A Primer Patricia A. Berglund, Institute for Social Research  University of Michigan Wisconsin and Illinois SAS User s Group June 25, 2014 1 Overview
More informationContacts between Police and the Public, 2008
U.S. Department of Justice Office of Justice Programs Bureau of Justice Statistics Special Report october 2011 ncj 234599 Contacts between Police and the Public, 2008 Christine Eith and Matthew R. Durose,
More information