CALCULATING MARGINAL PROBABILITIES IN PROC PROBIT
Guy Pascal, Memorial Health Alliance

Introduction

The PROBIT procedure within the SAS System provides a simple method for estimating models of discrete choice variables (i.e., dichotomous or polychotomous outcomes). The difficulty with the procedure is that the parameter estimates are hard to interpret. One way to ease this interpretation issue is to calculate marginal probabilities for each parameter estimate. The purpose of this paper is to illustrate the use of marginal probabilities in the context of PROC PROBIT. This is done using the normal, logistic, and gompertz distributions that are available in the PROBIT procedure. The derivation of the marginal probability is illustrated for each distributional assumption. The linear predictor and predicted probabilities from each model are output, and the marginal probabilities are calculated. The linear probability model (LPM) is also used to provide a baseline for comparisons across the distributions.

Derivation of Marginal Probabilities

To get the marginal probabilities, we must take the first derivative of the response probability with respect to each explanatory variable under each of the distributional assumptions. For the LPM, this is quite simple:

$$\frac{\partial}{\partial X_{ik}}\left(X_i\beta\right) = \beta_k$$

Thus, the marginal probability in the LPM is the parameter estimate from the regression analysis. No additional manipulation is necessary. The same cannot be said for the normal, logistic, and gompertz distributions. The first derivative for the normal distribution is as follows:

$$\frac{\partial}{\partial X_{ik}}\Phi(X_i\beta) = \phi(X_i\beta)\,\beta_k$$

where Φ is the cumulative distribution function (CDF) of the standard normal and φ is its probability density function (PDF). Thus the marginal probability assuming a normal distribution is the parameter estimate from PROC PROBIT multiplied by a standardization factor, φ(X_iβ). The first derivative for the logistic distribution is as follows:

$$\frac{\partial}{\partial X_{ik}}L(X_i\beta) = \frac{e^{X_i\beta}}{\left(1+e^{X_i\beta}\right)^2}\,\beta_k$$

where L is the logistic distribution and equals 1/(1+exp(-x)). We see that the marginal probability for the logistic distribution is the parameter estimate from PROC PROBIT multiplied by a standardization factor.
This factor is the probability of being a 1 multiplied by the probability of being a 0. The first derivative of the gompertz distribution is as follows:

$$\frac{\partial}{\partial X_{ik}}G(X_i\beta) = e^{X_i\beta}\exp\!\left(-e^{X_i\beta}\right)\beta_k$$

where G is the gompertz distribution and equals 1-exp(-exp(x)). Once again, we see that the marginal probability is equal to the estimated coefficient multiplied by a standardization factor.

Calculating Marginal Probabilities

There are two ways to calculate the marginal probabilities. First, the standardization factor could be evaluated at the sample means. For example, if the mean age were 35.2 years, the average level of educational attainment were 12.6 years, and the mean income level were $33,000, then the standardization factor would be calculated from these means. We will not use this tack, since no one in the sample will have these "average" characteristics. Instead, we calculate the value of the first derivative for each observation and then take the average of this standardization factor over the entire sample. This mean, when multiplied by the parameter estimates from the PROBIT model, gives the marginal probabilities.

For the normal distribution, the standardization factor (in terms of SAS coding) is as follows:

   Pdfnorm = exp(-.5*xbeta*xbeta)/sqrt(2*3.14159265);

where xbeta is the linear predictor X_iβ (the XBETA= variable of the OUTPUT statement), not a predicted probability, from the PROBIT procedure that specifies the normal distribution. The mean of Pdfnorm is the standardization factor for the normal distribution.

For the logistic distribution, the standardization factor (in terms of SAS code) is as follows:

   Problog1 = 1/(1+exp(-xbeta));
Problog0 = xp(-xbta)/(1+xp(- xbta)); Prob0X1 = Problog1*Problog0; Whr xbta is th prdictd probabilitis from th PROBIT procdur that assums a logistic distribution. Problog1 is th probability of bing a 1, Problog0 is th probability of bing a 0, and Prob0X1 is th probability of bing a 0 multiplid by th probability of bing a 1. Th man of Prob0X1 is th standardization factor for th logistic distribution. For th gomprtz distribution, th standardization factor (in trms of SAS cod) is as follows: Pdfgomp = xp(xbta)*xp(- 1*xp(xbta)); Whr xbta, onc again, is th prdictd probabilitis from th PROBIT procdur. Th man of Pdfgomb is th standardization factor for th gomprtz distribution. An Illustrativ Exampl To illustrat th calculation of th marginal probabilitis, w will look at th factors that influnc th probability of rspons to a patint satisfaction survy at Mmorial Hospital of Burlington County. Th hospital snds th satisfaction survys to all in-patints xcluding thos who xpird in th hospital, wr patints on th Mntal Halth Unit, or wr nwborns. Th survys hav th patint's hospital ID numbr printd on thm so that th survys could b linkd to hospital billing and clinical information. Patints wr considrd to b non-rspondnts if thy faild to rspond, scratchd out thr ID numbr on th survy, or faild to rspond to how likly thy wr to rcommnding th hospital frinds or family. 21,449 survys wr snt patints dischargd btwn 4/23/96 and 10/23/97 with 4,474 usabl survys rturnd. Th probability of rspons was modld using lngth of stay, gndr, th spcialty of th srvic th patint rcivd, typ of insuranc covrag, and whthr th patint gav birth. Th probability of rspons to th survy is stimatd four diffrnt ways: via th linar probability modl using rgrssion analysis and using th PROBIT procdur and sparatly spcifying th normal, logistic, and gomprtz distributions. Th calculation of th marginal probabilitis ntails to stps. First, th modl is stimatd with th prdictd probabilitis outputtd to a sparat data st. 
Second, the output values are used to calculate the standardization factor for each observation. The mean of the factor is calculated and used to standardize the parameter estimates from PROC PROBIT. The results of the four models are presented in Table 1. The SAS programming is given in the Appendix.

Three things stand out when looking at Table 1. First, the same variables are significant regardless of model specification. Second, the signs of the parameter estimates are identical. Finally, the magnitudes of the calculated marginal probabilities are similar. For example, using the LPM, patients with a length of stay greater than 5 days are 4.6 percentage points less likely to respond than patients with shorter stays. When using PROC PROBIT and specifying the normal, logistic, or gompertz distributions, we note that patients with longer stays have a 4.8, 4.8, and 4.7 percentage point lower probability of response, respectively. The other significant variables also have similar marginal probabilities. General surgery patients have a 7.1, 6.5, 6.3, and 6.1 percentage point higher probability of response for the LPM, Probit, Logit, and Gompit models, respectively. The results are similar for the remainder of the variables. None of the models finds a relationship between the likelihood of response and gender, being a cardiology, gastroenterology, or neurology patient, or having commercial insurance, CHAMPUS, NJ Blue Cross, or PPO coverage.

The marginal probabilities do not match exactly for two reasons. First, the LPM assumes a linear relationship, so we expect to see differences between the LPM estimates and the maximum likelihood estimates (MLE), which are non-linear. Second, some slight differences are expected among the MLE estimates because the distributions are slightly different. This is especially true for the gompertz, which is an asymmetric extreme-value distribution.

Conclusion

The methodology described in this paper provides a simple method for easing the interpretation of parameter estimates in PROC PROBIT. By outputting the linear predictor from the normal, logistic, or gompertz model, a simple standardization factor can convert the parameter estimates to marginal probabilities.
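The derivative formulas and the two-step conversion described in this paper can also be checked outside SAS. The following is a minimal Python sketch (standard library only; Python is our choice for illustration, not part of the original SAS workflow). It assumes the linear predictors X_iβ have already been obtained, as with the XBETA= output. The index values and the coefficient `beta_k` are hypothetical, not estimates from the survey data, and the helper names (`MODELS`, `numeric_derivative`, `standardization_factor`, `factor_at_means`, `marginal_probability`) are illustrative. The sketch (1) confirms numerically that each density is the derivative of its CDF, (2) averages the density over observations as the paper does, and (3) contrasts that with evaluating the density once at the mean index.

```python
import math
from statistics import fmean

# CDFs F(x) for the three PROC PROBIT distributions discussed in the paper.
def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gompertz_cdf(x: float) -> float:
    return 1.0 - math.exp(-math.exp(x))

# Densities f(x) = dF/dx, written the same way as the standardization factors.
def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def logistic_pdf(x: float) -> float:
    p1 = logistic_cdf(x)      # probability of being a 1
    return p1 * (1.0 - p1)    # times probability of being a 0

def gompertz_pdf(x: float) -> float:
    return math.exp(x) * math.exp(-math.exp(x))

MODELS = {
    "normal": (normal_cdf, normal_pdf),
    "logistic": (logistic_cdf, logistic_pdf),
    "gompertz": (gompertz_cdf, gompertz_pdf),
}

def numeric_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central finite difference, used only to check the closed-form densities."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def standardization_factor(pdf, xbetas) -> float:
    """Sample-average approach: f(X_i*beta) for every observation, then the mean."""
    return fmean(pdf(xb) for xb in xbetas)

def factor_at_means(pdf, xbetas) -> float:
    """At-the-means alternative: f evaluated once at the mean index.
    Because X*beta is linear in X, mean(X_i*beta) equals (mean of X)*beta."""
    return pdf(fmean(xbetas))

def marginal_probability(beta_k: float, pdf, xbetas) -> float:
    """Mean standardization factor times the estimated coefficient."""
    return standardization_factor(pdf, xbetas) * beta_k

if __name__ == "__main__":
    xbetas = [-1.6, -1.2, -0.9, -0.7, -0.3, 0.2]  # hypothetical X_i*beta values
    beta_k = -0.17                                # hypothetical coefficient
    for name, (cdf, pdf) in MODELS.items():
        worst = max(abs(pdf(x) - numeric_derivative(cdf, x)) for x in xbetas)
        print(f"{name:8s} max|f - dF/dx|={worst:.1e}  "
              f"average factor={standardization_factor(pdf, xbetas):.4f}  "
              f"at-means factor={factor_at_means(pdf, xbetas):.4f}  "
              f"marginal probability={marginal_probability(beta_k, pdf, xbetas):.4f}")
```

Because each density is nonlinear in X_iβ, the averaged factor and the at-the-means factor generally differ; they coincide only when every observation has the same index value. In a real application each distribution would, of course, supply its own coefficients and linear predictors.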
Table 1
Comparison of Marginal Probability Calculations

Variable              LPM         Probit      Logit       Gompit
DAYS > 5              -.04603**   -.04774**   -.04765**   -.04740**
MALE                   .00664      .00719      .00673      .00610
CARDIOLOGY            -.00588     -.00532     -.00609     -.00653
GASTROENTEROLOGY      -.01121     -.01005     -.01163     -.01258
GENERAL SURGERY        .07073**    .06505**    .06321**    .06123**
NEUROLOGY             -.00741     -.00682     -.00762     -.00832
OBSTETRICS            -.10311**   -.11891**   -.12399**   -.12793**
ORTHOPEDICS            .02815*     .02563*     .02577*     .02613*
PULMONARY             -.03033**   -.03270**   -.03343**   -.03420**
COMMERCIAL INSURANCE   .02989      .02735      .02717      .02751
CHARITY CARE          -.10031**   -.11445**   -.11733**   -.12047**
CHAMPUS                .02061      .01914      .01957      .01969
HMO OTHER             -.01457     -.01422     -.01392     -.01376
MEDICARE              -.02631*    -.02625*    -.02585*    -.02587*
NJ BLUE CROSS          .02387      .02263      .02229      .02207
PPO                    .01529      .01477      .01457      .01451
SELF-PAY              -.17717**   -.27440**   -.30858**   -.32983**
US HEALTHCARE          .06545**    .05813**    .05744**    .05693**
MEDICAID              -.10649**   -.13192**   -.14069**   -.14716**
DRG 371                .11682**    .13149**    .13592**    .13928**
DRG 372                .06104*     .07832**    .08169**    .08365*
DRG 373                .09009**    .10763**    .11195**    .11518**

** Significant at the 99% level
 * Significant at the 95% level
Appendix
SAS Coding for Calculating Marginal Probabilities

   libname guy 'd:\sasdata';

   data one;
      set guy.merged6;
   run;

   /* Linear probability model */
   data lpm;
      set one;
   run;

   proc reg data=lpm;
      title 'OLS Regression (Linear Probability Model)';
      title2 'dependent variable (survey) = 1 if person responded to survey';
      model survey = daysgt5 male cardio gastro gensurg neuro obstet ortho
                     pulm fcc fcf fcg fch fcm fcn fcp fcs fcu fcx
                     drg371 drg372 drg373;
   run;

   /* Normal (probit) model. Responders get the lower ordered value (0),
      whose probability PROC PROBIT models by default. */
   data probit;
      set one;
      if survey eq 1 then nsurvey = 0;
      else nsurvey = 1;
   run;

   proc probit data=probit;
      title 'PROBIT with (nsurvey) = 0 if person responded to survey';
      class nsurvey;
      model nsurvey = daysgt5 male cardio gastro gensurg neuro obstet ortho
                      pulm fcc fcf fcg fch fcm fcn fcp fcs fcu fcx
                      drg371 drg372 drg373 / converge=.00001;
      output out=probit2 xbeta=xbpr prob=probpr;
   run;

   data probit2;
      set probit2;
      pdf1norm = exp(-.5*xbpr*xbpr)/sqrt(2*3.14159265);  /* normal density at X*beta */
      probpr1 = 1-probnorm(-xbpr);
      probpr0 = probnorm(-xbpr);
      millslo = pdf1norm/(1-probnorm(-xbpr));
      millshi = pdf1norm/probnorm(-xbpr);
   run;

   proc means data=probit2;
      var pdf1norm probpr1 probpr0 millslo millshi;
   run;

   /* Logistic (logit) model */
   data logit;
      set one;
      if survey eq 1 then nsurvey = 0;
      else nsurvey = 1;
   run;

   proc probit data=logit;
      title 'LOGIT with (nsurvey) = 0 if person responded to survey';
      class nsurvey;
      model nsurvey = daysgt5 male cardio gastro gensurg neuro obstet ortho
                      pulm fcc fcf fcg fch fcm fcn fcp fcs fcu fcx
                      drg371 drg372 drg373 / d=logistic converge=.00001;
      output out=logit2 xbeta=xblog prob=problog;
   run;

   data logit2;
      set logit2;
      problo1 = 1/(1+exp(-xblog));
      problo0 = exp(-xblog)/(1+exp(-xblog));
      prob0x1 = problo1*problo0;  /* logistic density at X*beta */
      millslo = prob0x1/(1-problo0);
      millshi = prob0x1/problo0;
   run;

   proc means data=logit2;
      var problo1 problo0 prob0x1 millslo millshi;
   run;

   /* Gompertz (gompit) model */
   data gompit;
      set one;
      if survey eq 1 then nsurvey = 0;
      else nsurvey = 1;
   run;

   proc probit data=gompit;
      title 'GOMPIT with (nsurvey) = 0 if person responded to survey';
      class nsurvey;
      model nsurvey = daysgt5 male cardio gastro gensurg neuro obstet ortho
                      pulm fcc fcf fcg fch fcm fcn fcp fcs fcu fcx
                      drg371 drg372 drg373 / d=gompertz converge=.00001;
      output out=gompit2 xbeta=xbgomp prob=probgomp;
   run;

   data gompit2;
      set gompit2;
      pdfgomp = exp(xbgomp)*exp(-1*exp(xbgomp));  /* gompertz density at X*beta */
      prgomp1 = 1-exp(-1*exp(xbgomp));
      prgomp0 = exp(-1*exp(xbgomp));
   run;

   proc means data=gompit2;
      var pdfgomp prgomp1 prgomp0;
   run;
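For readers who want to check the appendix arithmetic without SAS, here is a hedged Python transcription (standard library only) of the three post-estimation DATA steps and their PROC MEANS calls. `statistics.NormalDist` stands in for the SAS PROBNORM function and the hand-coded normal density, so the exact value of π is used. The helper names (`probit_columns`, `logit_columns`, `gompit_columns`, `proc_means`) and the input values are hypothetical. For brevity the sketch feeds the same index values to all three models, whereas in practice each model supplies its own XBETA= values (xbpr, xblog, xbgomp).

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def probit_columns(xbpr: float) -> dict:
    """Variables created in DATA PROBIT2 (normal model)."""
    pdf1norm = STD_NORMAL.pdf(xbpr)   # exp(-.5*x*x)/sqrt(2*pi)
    probpr0 = STD_NORMAL.cdf(-xbpr)   # probnorm(-xbpr)
    probpr1 = 1.0 - probpr0           # 1 - probnorm(-xbpr)
    return {"pdf1norm": pdf1norm, "probpr1": probpr1, "probpr0": probpr0,
            "millslo": pdf1norm / probpr1, "millshi": pdf1norm / probpr0}

def logit_columns(xblog: float) -> dict:
    """Variables created in DATA LOGIT2 (logistic model)."""
    problo1 = 1.0 / (1.0 + math.exp(-xblog))
    problo0 = math.exp(-xblog) / (1.0 + math.exp(-xblog))
    prob0x1 = problo1 * problo0       # logistic density at X*beta
    return {"problo1": problo1, "problo0": problo0, "prob0x1": prob0x1,
            "millslo": prob0x1 / (1.0 - problo0), "millshi": prob0x1 / problo0}

def gompit_columns(xbgomp: float) -> dict:
    """Variables created in DATA GOMPIT2 (gompertz model)."""
    return {"pdfgomp": math.exp(xbgomp) * math.exp(-math.exp(xbgomp)),
            "prgomp1": 1.0 - math.exp(-math.exp(xbgomp)),
            "prgomp0": math.exp(-math.exp(xbgomp))}

def proc_means(rows: list) -> dict:
    """Column means over observations (the MEAN statistic of PROC MEANS)."""
    return {name: fmean(row[name] for row in rows) for name in rows[0]}

if __name__ == "__main__":
    xbetas = [-1.6, -0.9, -0.3, 0.2, 1.1]  # hypothetical XBETA= values
    for label, make in (("probit2", probit_columns),
                        ("logit2", logit_columns),
                        ("gompit2", gompit_columns)):
        means = proc_means([make(xb) for xb in xbetas])
        print(label, {k: round(v, 5) for k, v in means.items()})
```

The means of pdf1norm, prob0x1, and pdfgomp are the standardization factors; multiplying each by the corresponding PROC PROBIT coefficient gives the marginal probability for that variable.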