Using An Ordered Logistic Regression Model with SAS
Vartanian: SW 541

libname in1 'c:\';

Data first;
  Set in1.extract;
  A=1;  * constant key, used below to merge the estimates onto every observation;

PROC LOGISTIC OUTEST=DD ORDER=DATA;
  MODEL EDUC=POVDUM / MAXITER=100;
  WEIGHT WEIGHT;
  OUTPUT OUT=CC XBETA=XB P=PROB;
  * EDUC IS A 4 LEVEL ORDERED VARIABLE FOR LEVEL OF EDUCATION;

Each of the categories is mutually exclusive. P=PROB gives the estimated probability of reaching particular levels of education. For each input observation, SAS creates 3 output observations, with a different probability estimate for each of these levels. One of the levels is an excluded category, and we can determine the probability of that event by subtraction.

Data F;
  set DD;
  Rename povdum=cpov;  * cpov holds the coefficient estimate for POVDUM;
  drop _type_;
  A=1;

Data G;
  Merge F CC;
  by A;
  Xb_npov=xb-cpov*povdum;  * linear predictor with POVDUM set to 0;
  Xb_pov=xb_npov+cpov;     * linear predictor with POVDUM set to 1;
  PR_NPOV=(EXP(XB_NPOV))/(1+EXP(XB_NPOV));
  PR_POV=(EXP(XB_POV))/(1+EXP(XB_POV));

DATA F;
  RETAIN _LEVEL_;
  SET G;
PROC SORT; BY _LEVEL_;
PROC MEANS; VAR PROB _LEVEL_ PR_NPOV PR_POV; BY _LEVEL_;
run;

SAS produces probability estimates for 3 different levels, one for each of the intercept values. All we need to do is run PROC MEANS by level to obtain the probability estimates for each level. In other words, SAS creates a probability estimate for 3 of the 4 levels for each individual. Thus for person 1 (or case 1), SAS creates 3 observations, each with a probability estimate for one level or category of education. A newly created variable named _LEVEL_ indicates which level each probability estimate is for. To determine the probability estimate for a given level, we need to examine only those observations whose _LEVEL_ matches that level. What I've done above is compute mean values (using PROC MEANS) by _LEVEL_, which gives separate mean values for the different levels.

D:\WP60\LECT2.PHD\LOGIST\ORDLOG1.WPD Page 1

Level 1 is the excluded category, so we get probabilities only for levels 2, 3 and 4. The probability estimate for level 2 is the probability of being a college graduate, having some college, or being a high school graduate. (If we had a level 1 probability estimate, it would merely tell us the probability of being a college graduate, having some college, graduating from high school, or dropping out of high school. In other words, its value would always be 1.) For level 3, the estimate is the probability of having some college or being a college graduate. For level 4, the estimate is the probability of graduating from college. Hence, the only single-category probability we get directly is the probability of graduating from college.

To find the probability of going to college without graduating, subtract the probability of graduating from college (level 4) from the probability of either graduating from college or going to college (level 3). To find the probability of graduating from high school only, subtract the level 3 probability from the level 2 probability (graduating from college, going to college, or graduating from high school). To find the probability of dropping out of high school, subtract the level 2 probability from 1.
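The subtraction steps above can be sketched in a few lines. This is a minimal Python sketch, not part of the SAS program: the function name category_probs is hypothetical, and the three inputs are the mean PROB values from the Probability Estimates output at the end of this handout.

```python
def category_probs(p_level2, p_level3, p_level4):
    """Turn the three cumulative probabilities into four category probabilities.

    p_level2 = P(high school grad, some college, or college grad)
    p_level3 = P(some college or college grad)
    p_level4 = P(college grad)
    """
    return {
        "college grad": p_level4,
        "some college": p_level3 - p_level4,
        "high school grad": p_level2 - p_level3,
        "high school dropout": 1.0 - p_level2,
    }


# Mean PROB for _LEVEL_ = 2, 3 and 4, copied from the PROC MEANS output.
overall = category_probs(0.9214712, 0.6310367, 0.2740716)

for name, p in overall.items():
    print(f"{name:20s} {p:.3f}")
```

Because each category is obtained as a difference of adjacent cumulative probabilities, the four results always sum to 1.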
The reason for this difficulty in determining probability estimates is that the model is based on cumulative probabilities. Note that the bottom category (ordered value 1) is being a college graduate. You must look at the order in which SAS puts the different levels, that is, at the ordered values in the SAS output. Here, ordered value 1 is EDUC=4, ordered value 2 is EDUC=3, etc. The intercepts are interpreted as follows:

Intercept1: log odds of being a college grad versus having some college, being a high school grad, or being a high school dropout. In other words, this is the log odds of being in the lowest ordered value category relative to all other categories.

Intercept2: log odds of being a college grad or having some college relative to being a high school graduate or a high school dropout. Or, the log odds of being in the bottom two ordered categories relative to being in the top two ordered categories.

Intercept3: log odds of being a college grad, having some college, or being a high school graduate relative to being a high school dropout. Or, the log odds of being in the bottom 3 ordered categories relative to being in the top ordered category.

For a further explanation of how to use ordered logistic regression, see Categorical Data Analysis Using the SAS System, pages 217-231, by Maura E. Stokes, Charles S. Davis and Gary G. Koch, from the SAS Institute, 1995.

Results

The LOGISTIC Procedure

Data Set: WORK.Z
Response Variable: EDUC
Response Levels: 4
Number of Observations: 1884
Weight Variable: WEIGHT
Sum of Weights: 1884
Link Function: Logit

Response Profile

Ordered                    Total
Value     EDUC    Count    Weight
1         4       471      512.86406
2         3       671      674.85866
3         2       578      549.99787
4         1       164      146.27942

Since SAS puts these values in the "wrong" order, I have reordered them with the sort command (above) and the ORDER=DATA option (also above).

Score Test for the Proportional Odds Assumption
Chi-Square = 24.4340 with 2 DF (p=0.0001)

The chi-square test above indicates whether we can assume that the b coefficients have proportional effects across the different levels of the dependent variable; the null hypothesis is that they do. Since the test is significant, we reject the null hypothesis and, with it, the proportional effects assumption. Thus, we could run separate logistic regression models for each level of the dependent variable.

Model Fitting Information and Testing Global Null Hypothesis BETA=0

             Intercept    Intercept and
Criterion    Only         Covariates       Chi-Square for Covariates
AIC          4828.334     4654.716         .
SC           4844.957     4676.881         .
-2 LOG L     4822.334     4646.716         175.618 with 1 DF (p=0.0001)
Score        .            .                170.368 with 1 DF (p=0.0001)

The chi-square on the -2 LOG L line (the drop in -2 Log L when the covariate is added, 4822.334 - 4646.716 = 175.618) tells us whether the model is significant (much like the F value in OLS regression). The p value gives the level of significance.

Analysis of Maximum Likelihood Estimates

                Parameter    Standard    Wald          Pr >          Standardized    Odds
Variable   DF   Estimate     Error       Chi-Square    Chi-Square    Estimate        Ratio
INTERCP1   1    -0.7468      0.0548      186.0003      0.0001        .               .
INTERCP2   1     0.8646      0.0558      239.9376      0.0001        .               .
INTERCP3   1     2.9219      0.0967      913.3127      0.0001        .               .
POVDUM     1    -1.3713      0.1063      166.4219      0.0001        -0.314033       0.254

This indicates that those who grow up poor have less education than those who do not grow up poor. We determine probability estimates using these coefficient estimates. The probability estimates are given below.

Intercept1 tells us the log odds of being a college grad relative to those who are not college grads. Intercept2 indicates the log odds of being a college graduate or having some college relative to those who are high school graduates or high school dropouts. Intercept3 indicates the log odds of being a college grad, having some college or having a high school degree relative to being a high school dropout.
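To make the step from coefficients to probabilities concrete, here is a minimal Python sketch (not part of the SAS program) that applies the same formula used for PR_NPOV and PR_POV, exp(xb)/(1+exp(xb)), to the estimates in the table above. The function name cumulative_prob is hypothetical. Pairing INTERCP1, INTERCP2 and INTERCP3 with _LEVEL_ 4, 3 and 2 follows the interpretation of the intercepts given in the text.

```python
import math

# Estimates copied from the Analysis of Maximum Likelihood Estimates table.
# Keys are _LEVEL_ values: 4 = college grad, 3 = some college or more,
# 2 = high school grad or more.
INTERCEPTS = {4: -0.7468, 3: 0.8646, 2: 2.9219}
B_POVDUM = -1.3713


def cumulative_prob(level, povdum):
    """Cumulative probability for one _LEVEL_, with povdum equal to 0 or 1."""
    xb = INTERCEPTS[level] + B_POVDUM * povdum
    return math.exp(xb) / (1.0 + math.exp(xb))


for level in (2, 3, 4):
    pr_npov = cumulative_prob(level, 0)
    pr_pov = cumulative_prob(level, 1)
    print(f"_LEVEL_={level}  PR_NPOV={pr_npov:.4f}  PR_POV={pr_pov:.4f}")

# The odds ratio reported for POVDUM is exp(b).
odds_ratio = math.exp(B_POVDUM)
print(f"odds ratio for POVDUM = {odds_ratio:.3f}")
```

Because the coefficients are entered to four decimal places, the results agree with the PR_NPOV and PR_POV values in the SAS output only to about four decimal places.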

Probability Estimates

1. LIKELIHOOD OF COLLEGE GRAD, SOME COLLEGE OR HIGH SCHOOL GRADUATION. Response Value=2

Variable   Label                    N       Mean         Std Dev      Minimum
PROB       Estimated Probability    1884    0.9214712    0.0514712    0.8250009
_LEVEL_    Response Value           1884    2.0000000    0            2.0000000
PR_NPOV                             1884    0.9489188    0            0.9489188
PR_POV                              1884    0.8250009    0            0.8250009

2. LIKELIHOOD OF COLLEGE GRADUATION OR SOME COLLEGE. Response Value=3

Variable   Label                    N       Mean         Std Dev      Minimum
PROB       Estimated Probability    1884    0.6310367    0.1360967    0.3759564
_LEVEL_    Response Value           1884    3.0000000    0            3.0000000
PR_NPOV                             1884    0.7036118    0            0.7036118
PR_POV                              1884    0.3759564    0            0.3759564

3. LIKELIHOOD OF COLLEGE GRADUATION. Response Value=4

Variable   Label                    N       Mean         Std Dev      Minimum
PROB       Estimated Probability    1884    0.2740716    0.0889561    0.1073450
_LEVEL_    Response Value           1884    4.0000000    0            4.0000000
PR_NPOV                             1884    0.3215084    0            0.3215084
PR_POV                              1884    0.1073450    0            0.1073450

From these probabilities, we know that the overall likelihood of graduating from college is .274, and we can easily determine the probability of dropping out by subtracting .9215 from 1 (= .0785). The likelihood of going to college (but not graduating) = .631 - .274 = .357. The likelihood of getting a high school degree = .921 - .631 = .290. We could also determine these probability estimates separately for those who are in poverty during childhood (PR_POV) and those who are not (PR_NPOV).
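As an illustration of that last point, the same subtractions can be applied to the PR_NPOV and PR_POV rows. This is a minimal Python sketch under the assumption that those rows are the cumulative probabilities for the non-poor and poor groups, as the SAS program defines them. The names CUMULATIVE, to_categories and by_group are hypothetical.

```python
# Cumulative probabilities copied from the PR_NPOV and PR_POV rows above,
# ordered as (_LEVEL_=2, _LEVEL_=3, _LEVEL_=4).
CUMULATIVE = {
    "not poor": (0.9489188, 0.7036118, 0.3215084),
    "poor": (0.8250009, 0.3759564, 0.1073450),
}

LABELS = ("high school dropout", "high school grad", "some college", "college grad")


def to_categories(p2, p3, p4):
    """Difference adjacent cumulative probabilities to get category probabilities."""
    return (1.0 - p2, p2 - p3, p3 - p4, p4)


by_group = {
    group: dict(zip(LABELS, to_categories(*cum)))
    for group, cum in CUMULATIVE.items()
}

for group, probs in by_group.items():
    print(group)
    for label, p in probs.items():
        print(f"  {label:20s} {p:.3f}")
```

Under these assumptions, the estimated probability of dropping out of high school comes to about .175 for those who grew up poor and about .051 for those who did not.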