Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541




    libname in1 'c:\';

    data first;
      set in1.extract;
      A = 1;   /* constant key, used below to merge the estimates onto every record */
    run;

    proc logistic data=first outest=DD order=data;
      model EDUC = POVDUM / maxiter=100;
      weight WEIGHT;
      output out=CC xbeta=XB p=PROB;
    run;

EDUC is a 4-level ordered variable for level of education. The categories are mutually exclusive. P=PROB gives the estimated probability of reaching particular levels of education. For each input observation, SAS creates 3 output observations, with a different probability estimate for each of 3 levels. The remaining level is an excluded category, and we can determine the likelihood of that event by subtraction.

    data F;
      set DD;
      rename POVDUM = CPOV;   /* coefficient on POVDUM from the OUTEST= data set */
      drop _type_;
      A = 1;
    run;

    data G;
      merge F CC;
      by A;
      XB_NPOV = XB - CPOV*POVDUM;   /* linear predictor with POVDUM set to 0 */
      XB_POV  = XB_NPOV + CPOV;     /* linear predictor with POVDUM set to 1 */
      PR_NPOV = exp(XB_NPOV) / (1 + exp(XB_NPOV));
      PR_POV  = exp(XB_POV)  / (1 + exp(XB_POV));
    run;

    data F;
      retain _LEVEL_;
      set G;
    run;

    proc sort data=F;
      by _LEVEL_;
    run;

    proc means data=F;
      var PROB _LEVEL_ PR_NPOV PR_POV;
      by _LEVEL_;
    run;

SAS determines probability estimates for 3 different levels, one for each of the intercept values. All we need to do is run PROC MEANS by level to obtain the probability estimates for each level. In other words, SAS creates a probability estimate for 3 of the 4 levels and gives, for each individual, the cumulative probability associated with that level. Thus person 1 (or case 1) has 3 separate observations, and a newly created variable, _LEVEL_, indicates which level each probability estimate is for. To determine the probability estimate for a particular level, we need only examine the observations where _LEVEL_ equals that level. What I've done above is compute mean values (using PROC MEANS) by _LEVEL_, which gives separate mean values for the different levels.

Level 1 is the excluded category in the analysis, so we only get probabilities for levels 2, 3 and 4. The probability estimate for level 2 gives the probability of being a college graduate or having some college or being a high school graduate. (If we had a level 1 probability estimate, it would merely tell us the probability of being a college grad or having some college or graduating from high school or dropping out of high school. In other words, its value would always be 1.) For level 3, the probability estimates indicate the probability of having some college or being a college graduate. The probability estimates for level 4 indicate the likelihood of graduating from college.

Hence, the only single-category probability we get directly is the probability of graduating from college. We can subtract the probability of graduating from college (level 4) from the probability of either graduating from college or going to college (level 3) to determine the probability of going to college without graduating. To determine the probability of graduating from high school only, we subtract the level 3 probability (graduating from college or going to college) from the level 2 probability (graduating from college, going to college or graduating from high school). To determine the probability of dropping out of high school, we subtract the level 2 probability from 1.
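The subtraction steps described above can also be applied record by record rather than to the means. The following is only a sketch under stated assumptions: it supposes that each input record carries a unique identifier, here given the hypothetical name ID (no such variable appears in the original program), and that CC holds three rows per ID with _LEVEL_ equal to 4, 3 and 2. The output variable names (P_GRAD, P_SOME, P_HS, P_DROP) are illustrative only.

```sas
/* Sketch: convert the cumulative probabilities in CC into one        */
/* probability per education category. ID is a hypothetical record    */
/* identifier; CC is assumed to hold _LEVEL_ = 4, 3, 2 for each ID.   */
proc sort data=CC out=CC_SORTED;
  by ID descending _LEVEL_;
run;

data CATPROB;
  set CC_SORTED;
  by ID;
  retain P_GRAD P_SOME P_HS PREV;
  if first.ID then PREV = 0;                        /* nothing accumulated yet   */
  if _LEVEL_ = 4 then P_GRAD = PROB - PREV;         /* college graduate          */
  else if _LEVEL_ = 3 then P_SOME = PROB - PREV;    /* some college              */
  else if _LEVEL_ = 2 then P_HS = PROB - PREV;      /* high school graduate      */
  PREV = PROB;                                      /* cumulative prob. so far   */
  if last.ID then do;
    P_DROP = 1 - PROB;                              /* excluded category         */
    output;                                         /* one row per ID            */
  end;
  keep ID P_GRAD P_SOME P_HS P_DROP;
run;
```

Within each ID the four resulting probabilities should sum to 1, which is a quick way to check that the levels were processed in the intended order.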
The reason for this difficulty in determining probability estimates is that the model is based on cumulative probabilities. Note that the bottom category is being a college graduate. You must look at the order in which SAS puts the different levels, that is, look at the ordered values in the SAS output. Here, ordered value=1 is Educ=4, ordered value=2 is Educ=3, etc. The interpretation of the intercepts is as follows:

Intercept1: log odds of being a college grad versus having some college, being a high school grad or being a high school dropout. In other words, this is the log odds of being in the lowest ordered value category relative to all other categories.

Intercept2: log odds of being a college grad or having some college relative to being a high school graduate or being a high school dropout. Or, the log odds of being in the bottom two ordered categories relative to being in the top two ordered categories.

Intercept3: log odds of being a college grad or having some college or being a high school graduate relative to being a high school dropout. Or, the log odds of being in the bottom 3 ordered categories relative to being in the top ordered category.

For a further explanation of how to use ordered logistic regression, see Categorical Data Analysis Using the SAS System, pages 217-231, by Maura E. Stokes, Charles S. Davis and Gary G. Koch, from the SAS Institute, 1995.

Results

The LOGISTIC Procedure

    Data Set: WORK.Z
    Response Variable: EDUC
    Response Levels: 4
    Number of Observations: 1884
    Weight Variable: WEIGHT
    Sum of Weights: 1884
    Link Function: Logit

Response Profile

    Ordered                   Total
      Value   EDUC   Count    Weight
          1      4     471    512.86406
          2      3     671    674.85866
          3      2     578    549.99787
          4      1     164    146.27942

Since SAS puts these values in the "wrong" order, I have reordered them with the sort command (above) and the ORDER=DATA option (also above).

Score Test for the Proportional Odds Assumption

    Chi-Square = 24.4340 with 2 DF (p=0.0001)

The chi-square test above indicates whether we can assume that the b coefficients have proportional effects on the different levels of the dependent variable. Since we reject the null hypothesis, we reject the proportional effects assumption. Thus, we could run separate logistic regression models for each level of the dependent variable.

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                 Intercept    Intercept and
    Criterion    Only         Covariates       Chi-Square for Covariates
    AIC          4828.334     4654.716         .
    SC           4844.957     4676.881         .
    -2 LOG L     4822.334     4646.716         175.618 with 1 DF (p=0.0001)
    Score        .            .                170.368 with 1 DF (p=0.0001)

The chi-square on the -2 Log L row (the difference between the two -2 Log L values) tells us whether the model is significant or not (much like the F value in OLS regression). The p value gives the level of significance.

Analysis of Maximum Likelihood Estimates

                     Parameter   Standard   Wald         Pr >         Standardized   Odds
    Variable    DF   Estimate    Error      Chi-Square   Chi-Square   Estimate       Ratio
    INTERCP1    1    -0.7468     0.0548     186.0003     0.0001       .              .
    INTERCP2    1     0.8646     0.0558     239.9376     0.0001       .              .
    INTERCP3    1     2.9219     0.0967     913.3127     0.0001       .              .
    POVDUM      1    -1.3713     0.1063     166.4219     0.0001       -0.314033      0.254

The negative coefficient on POVDUM indicates that those who grow up poor have less education than those who do not grow up poor. We determine probability estimates using these coefficient estimates. The probability estimates are given below. Intercept1 tells us the log odds of being a college grad relative to not being a college grad. Intercept2 indicates the log odds of being a college graduate or having some college relative to being a high school graduate or a high school dropout. Intercept3 indicates the log odds of being a college grad, having some college or having a high school degree relative to being a high school dropout. Each intercept applies to those who did not grow up poor (POVDUM=0).
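As a check on how the coefficient estimates turn into probabilities, the short DATA step below applies the logistic transformation to each intercept, with and without the POVDUM coefficient. It is a minimal sketch that types in the printed (rounded) estimates by hand; the data set and variable names (COEFCHECK, B_POVDUM, LEVEL, INTERCEPT, ODDS_RATIO) are illustrative and not part of the original program.

```sas
/* Sketch: reproduce PR_NPOV and PR_POV from the printed estimates.    */
/* LEVEL 4 pairs with INTERCP1, LEVEL 3 with INTERCP2, LEVEL 2 with    */
/* INTERCP3, following the response profile (ordered value 1 = EDUC 4).*/
data COEFCHECK;
  B_POVDUM = -1.3713;                                /* coefficient on POVDUM     */
  input LEVEL INTERCEPT;
  PR_NPOV = exp(INTERCEPT) / (1 + exp(INTERCEPT));   /* POVDUM = 0                */
  XB_POV  = INTERCEPT + B_POVDUM;                    /* POVDUM = 1                */
  PR_POV  = exp(XB_POV) / (1 + exp(XB_POV));
  ODDS_RATIO = exp(B_POVDUM);                        /* same value on every row   */
  datalines;
4 -0.7468
3 0.8646
2 2.9219
;
run;

proc print data=COEFCHECK noobs;
run;
```

Because the inputs are rounded to four decimals, the results should agree with the PR_NPOV and PR_POV values from PROC MEANS to about three decimal places, and ODDS_RATIO should be close to the 0.254 shown in the estimates table.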

Probability Estimates

1. LIKELIHOOD OF COLLEGE GRAD, SOME COLLEGE OR HIGH SCHOOL GRADUATION

Response Value=2

    Variable   Label                    N      Mean        Std Dev     Minimum
    PROB       Estimated Probability    1884   0.9214712   0.0514712   0.8250009
    _LEVEL_    Response Value           1884   2.0000000   0           2.0000000
    PR_NPOV                             1884   0.9489188   0           0.9489188
    PR_POV                              1884   0.8250009   0           0.8250009

2. LIKELIHOOD OF COLLEGE GRADUATION OR SOME COLLEGE

Response Value=3

    Variable   Label                    N      Mean        Std Dev     Minimum
    PROB       Estimated Probability    1884   0.6310367   0.1360967   0.3759564
    _LEVEL_    Response Value           1884   3.0000000   0           3.0000000
    PR_NPOV                             1884   0.7036118   0           0.7036118
    PR_POV                              1884   0.3759564   0           0.3759564

3. LIKELIHOOD OF COLLEGE GRADUATION

Response Value=4

    Variable   Label                    N      Mean        Std Dev     Minimum
    PROB       Estimated Probability    1884   0.2740716   0.0889561   0.1073450
    _LEVEL_    Response Value           1884   4.0000000   0           4.0000000
    PR_NPOV                             1884   0.3215084   0           0.3215084
    PR_POV                              1884   0.1073450   0           0.1073450

From these probabilities, we know that the overall likelihood of graduating from college is .274, and we can easily determine the probability of dropping out by subtracting .9215 from 1 (= .0785). The likelihood of going to college (but not graduating) = .631 - .274 = .357. The likelihood of getting a high school degree (and no more) = .921 - .631 = .290. We could also determine these probability estimates separately for those who were in poverty during childhood and those who were not, using PR_POV and PR_NPOV.
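The same subtractions can be carried out for the poor and non-poor groups. The DATA step below is a sketch that reads in the three cumulative probabilities from the PROC MEANS output by hand (the mean of PROB, PR_NPOV and PR_POV at each level) and differences them. The names GROUP, CUM2, CUM3, CUM4 and the P_ variables are illustrative assumptions, not variables from the original program.

```sas
/* Sketch: category probabilities from the cumulative probabilities.   */
/* CUM2, CUM3, CUM4 are the estimates for Response Value = 2, 3, 4.    */
data CATPROB;
  length GROUP $ 8;
  input GROUP $ CUM2 CUM3 CUM4;
  P_GRAD = CUM4;            /* college graduate                        */
  P_SOME = CUM3 - CUM4;     /* some college, no degree                 */
  P_HS   = CUM2 - CUM3;     /* high school graduate only               */
  P_DROP = 1 - CUM2;        /* high school dropout (excluded category) */
  datalines;
OVERALL 0.9214712 0.6310367 0.2740716
NONPOOR 0.9489188 0.7036118 0.3215084
POOR    0.8250009 0.3759564 0.1073450
;
run;

proc print data=CATPROB noobs;
  var GROUP P_DROP P_HS P_SOME P_GRAD;
  format P_: 6.3;
run;
```

For those who grew up poor this gives roughly .175 for dropping out, .449 for a high school degree only, .269 for some college and .107 for college graduation. For those who did not, the corresponding figures are roughly .051, .245, .382 and .322.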