Paper 1593-014 Modelng Loss Gven Default n SAS/SA Xao Yao, he Unversty of Ednburgh Busness School, UK Jonathan Crook, he Unversty of Ednburgh Busness School, UK Galna Andreeva, he Unversty of Ednburgh Busness School, UK ABSRAC Predctng loss gven default (LGD) s playng an ncreasngly crucal role n quanttatve credt rsk modelng. In ths paper, we propose to apply mxed effects models to predct corporate bonds LGD, as well as other wdely used LGD models. he emprcal results show that mxed effects models are able to explan the unobservable heterogenety and to make better predctons compared wth lnear regresson and fractonal response regresson. All the statstcal models are performed n SAS/SA, SAS 9., usng specfcally PROC REG and PROC NLMIXED, and the model evaluaton metrcs are calculated n PROC IML. hs paper gves a detaled descrpton on how to use PROC NLMIXED to buld and estmate generalzed lnear models and mxed effects models. INRODUCION Loss gven default (LGD) measures the percentage of all exposure at the tme of default that can not be recovered. Recovery rate (RR) s defned as one mnus LGD. LGD/RR modelng attracts much less attenton compared wth the large volume of lterature on PD modelng. Wth the portfolo loss estmaton beng a major concern n modern rsk management system, ncreasng attenton s beng dedcated to LGD modelng as well as PD and LGD jont estmaton. In terms of the methodologes there are two man streams: one approach s to apply fxed effect regresson models ncludng lnear regresson and fractonal response regresson to predct LGD (Gupton and Sten, 00, Dermne et al, 006 and Bastos, 010). In emprcal studes LGD dstrbutons often present b-modal characterstcs bounded between the nterval [0, 1] based on ts defnton. Calabrese (01) proposes an nflated beta regresson model whch consdered the dependent varable as a mxture of a contnuous beta dstrbuton on (0, 1) and a dscrete Bernoull dstrbuton to model the probablty mass at the boundares 0 and 1, whch can be regarded as a specal type of generalzed lnear regresson model. A second approach s the use of the mxed effects models based on the Vascek s sngle factor framework (Vascek 1987) where LGD s assumed to be dependent on a systematc rsk factor n the predctors (Hamerle et al, 006). In the settng of a sngle factor model the unobservable systematc rsk factor works as a random effect term and the dosyncratc rsk factor s the resdual term. Consequently a sngle factor model wth the observable covarates s equvalent as a mxed effects model. However, Hamerle et al (006) only consder the tme-varyng latent factor and they do not make any further benchmarkng studes. hs project ams to fll n ths gap by applyng the random effect terms at multple levels and to conduct a comparson study of the commonly used models n lterature. We manly nvestgate to apply the mxed effects models to predct the US corporate bonds recovery rates wth the random effect terms specfed at oblgor, senorty and tme levels. We fnd that the ncluson of an oblgor-varyng random effect term effectvely explans the unobservable heterogenety and the related predctve accuraces are also much better than the others. All the models are bult up and realzed n SAS/SA and the predcton results are generated n PROC IML. he remander of ths paper s structured as follows. he next secton brefly ntroduces the data used n the emprcal study and s followed by an overvew of the models wth SAS programs provded. he emprcal results and analyss are then presented and the last secton concludes ths study. DAA AND SEUP he emprcal study s based on the US corporate bonds recovery rates nformaton from Moody s Ultmate Recovery Database (MURD). he unt of observaton n ths study s nstrument. hs database covers the recovery nformaton of more than 3000 nstruments untl date. he nstruments nclude bank loans, revolvers and corporate bonds. In ths 1
study we are only nterested n the corporate bonds and use recovery rate as the dependent varable nstead of LGD, and the fnal sample has 1413 observatons of defaulted bonds observed from 1986 to 01. here are fve dfferent senortes ncludng Junor Subordnated, Subordnated, Senor Subordnated, Senor Secured and Senor Unsecured. Each oblgor may ssue more than one nstrument wth dfferent senortes. In MURD only nstrument related nformaton s provded ncludng the debt characterstcs and recovery process nformaton. We ntegrate the accountng ratos from Compustat nto our sample by usng the common frm dentfer. US macroeconomc varables are also ncluded and extracted from the open sources onlne to capture features of the economc cycle. Here both Issue Sze and otal Asset are subjected to a log transformaton for scalng. Both accountng and macroeconomc covarates are ncorporated one year pror to default and the regresson model can be presented as follows Recovery Rate,t = Intercept + Recovery Characterstcs + Frm Characterstcs,t-1 + Macroeconomc Varables t-1 Fgure 1 presents the dstrbuton of recovery rates n our sample. We can observe the clustered observatons wth recovery rates equal to 0 and 1 from the fgure. able 1 presents the recovery rates dstrbuton by senorty and able gves the descrptons of the varables used n ths study. Fgure 1. Dstrbuton of recovery rates No. Mean Std Mn Max Junor Subordnate Bonds 8 0.168 0.634 0 1 Senor Secured Bonds 33 0.69 0.3688 0 1.198 Senor Subordnate Bonds 198 0.3150 0.3617 0 1.6978 Senor Unsecured Bonds 681 0.5100 0.3813 0 1.0499 Subordnate Bonds 174 0.317 0.3743 0 1.3691 otal 1413 0.4806 0.3915 0 1.6978 able 1. Recovery rates by senorty
Debt Characterstcs Var1: Collateral Rank Var: Percent Above Var3: Issue Sze Instruments are ranked related to each other based on the structure pror to default, takng nto consderaton collateral and nstrument type. Percentage of debt whch s contractually senor to the current nstrument. Face value of the relevant nstrument. Frm Characterstcs Var4: otal Asset Var5: EBIDA Var6: Leverage Var7: Debt Rato Var8: Book Value per Share Var9: Asset angblty Var10: Quck Rato otal assets of the oblgor Earnngs before nterest, taxes, deprecaton and amortzaton Rato of total debt and total assets Rato of current labltes and long term debt Book value of assets scaled by the total outstandng shares Rato between ntangble assets and tangble assets Sum of cash and short-term nvestment and total recevables dvded by the current labltes. Macroeconomc Varables Var11: Growth Rate Var1: -Bll Rate Var13: Aggregated Default Rates Var14: Unemployment Rate US annual GDP growth rate US three months reasury bll rate US annual ssuer-weghted corporate default rates US annual unemployment rate able. Lst of covarates MODELS AND SAS PROGRAMS hs secton gves an overvew of the models n ths study. he followng mathematcal notatons are used through ths paper. he recovery rate of nstrument s defned as y and the vector of covarates s gven as x. b 0 and β denote the ntercept term and the vector of parameters respectvely. In the followng SAS programs we name the dependent varable as RR, and the ndependent covarates are ndexed as Var1-Var14. he parameters of ntercept term and all the covarates are named as b0-b14, and the cleaned dataset s named as MyData. LINEAR REGRESSION Prevous studes show that lnear regresson models appear to be of comparable predctve accuraces as other more complcated statstcal models (Q and hao, 011; Bellott and Crook, 01) even though there s the potental rsk to make predctons out of the range between 0 and 1. he lnear regresson model s defned y e = b0 + β x + e. (1) ~ N(0, s ) 3
Program 1. Lnear regresson proc reg data=mydata outest=tran_estmate; model RR=Var1 Var Var14; qut; proc score data=mydata score=tran_estmate out=reg_output (keep=rr model1) predct type=parms; var Var1 Var Var14; Ordnary least squared estmaton method s used n PROC REG and the estmates of parameters are exported nto a new table called tran_estmate. he predcted values are computed by PROC SCORE whle we only keep the actual and predcted values named as RR and model1 n the output dataset reg_output. FRACIONAL RESPONSE REGRESSION Fractonal response regresson was frst proposed by Papke and Wooldrdge (1996) and has been wdely appled n LGD modelng (Dermne and Carvalho, 006; Bastos, 010, Bellott & Crook, 01). In ths model, the dependent varable s bounded between 0 and 1 by mposng a lnk functon such as Ey ( x) = Gb ( 0 + β x ) where G() denotes a lnk functon such as a logt or a complementary log-log transformaton functon and the quas maxmum lkelhood functon s gven as follows log L = å( y log G( b0 + β x) + (1 -y)log(1 - G( b0 + β x ))). () Program. Fractonal response regresson proc nlmxed data=mydata tech=newrap maxter=3000 maxfunc=3000 qtol=0.0001; parms b0-b14=0.0001; cov_mu=b0+b1*var1+b*var+ +b14*var14; mu=logstc(cov_mu); loglkefun=rr*log(mu)+(1-rr)*log(1-mu); model RR~general(loglkefun); predct mu out=frac_resp_output (keep=nstrument_d RR pred); Here the fractonal response regresson model s realzed n PROC NLMIXED and the log-lkelhood functon s defned as equaton () and optmzed by a Newton-Raphson method wth lne search specfed by the opton tech=newrap. Note that cov_mu denotes the term b 0 + β x and a logt form transformaton functon s selected by applyng a bult-n functon logstc defned n PROC NLMIXED. he predcted recovery rate mu s gven by usng the opton predct and exported to an output dataset named as frac_resp_output, where the predcted recovery rate s named as pred automatcally. Notce that both actual and predcted recovery rates are kept n the output dataset. INFLAED BEA REGRESSION Inflated beta regresson s proposed by Ospna and Ferrar (010) where the dependent varable s regarded as a mxture dstrbuton of a beta dstrbuton on (0, 1) and a Bernoull dstrbuton on boundares 0 and 1. he probablty densty functon s gven as ìï ïp(1 - y) f y = 0 b01(; y pymf,,, ) = ï ípy f y = 1, (3) ï(1 - p) fy ( ; m, f) fyî (0, 1) ïî where (;, ) f y mf s the beta densty functon and m and f are the mean and precson parameters. Here m can be reparameterzed by mposng a logt transformaton. he expectaton of the dependent varable s derved mmedately 4
such that Ey () = py + (1- p) m. (4) Let d = 1 f y = 0 and d = 0 f y Î (0, 1], c = 1 f y = 1 and c = 0 f y Î [0, 1), and the maxmum lkelhood functon s gven as follows å log L = d log( p(1 - y)) + c log( py) + (1 -d)(1 -c)(log(1 - p) + log G( f) - log G( mf) - log G((1 - m) f). (5) + ( mf - 1) log y + ((1 - m) f - 1) log(1 -y )) he unknown parameters ( b0, β, p, y, f ) are solved by standard optmzaton algorthms as above. More detals can be found n Ospna and Ferrar (010). Program 3. Inflated beta regresson proc nlmxed data=mydata tech=quanew maxter=3000 maxfunc=3000 qtol=0.0001; parms b0-b14=0.0001 pe=0. kesa=0.3 ph=; cov_mu=b0+b1*var1+b*var+ +b14*var14; mu=logstc(cov_mu); f RR=0 then loglkefun=log(pe)+log(1-kesa); f RR>=1 then loglkefun=log(pe)+log(kesa); f 0<RR<1 then loglkefun=log(1-pe)+lgamma(ph)-lgamma(mu*ph)-lgamma((1-mu)*ph) +(mu*ph-1)*log(rr)+((1-mu)*ph-1)*log(1-rr); predct pe*kesa+(1-pe)*mu out=inf_beta_output (keep=nstrument_d RR pred); model RR~general(loglkefun); In Program 3 we use an f-condton to defne the log-lkelhood functon of the nflated beta regresson model based on (3). he logt transformaton s gven by a logstc functon as above. Note that the predcted recovery rate s defned as equaton (4) nstead of mu n Program. Here lgamma s a bult-n functon n PROC NLMIXED meanng the log-gamma functon and a quas-newton optmzaton method (quanew) s employed to obtan the estmated parameters. he output data set s then named as Inf_beta_output. MIXED EFFECS MODEL We start wth the sngle factor framework n Hamerle et al (006) and rewrte t as follows y = b0 + β x + g1t + ge t = 1,...,, (6) N(0,1), e N(0,1) t where the random effect t s termed as a tme-varyng systematc rsk factor and e denotes the dosyncratc rsk factor. he sngle latent factor t s related to the unobservable heterogenety on economc condtons. In model (6) the cluster ntra-class recovery rate correlaton of any two dfferent nstruments and j s gven as 5
Cov( y, y ) = g Corr( y, y ) = j j 1 g1 g1 + g. (7) Model (6) s bult on the assumpton that the nstruments defaulted n the same year are dependent on a common latent economc state varable. In ths study, we defne the oblgor and senorty-varyng factor models by substtutng the tme-varyng latent factor t wth k, the k-th oblgor of the total of K oblgors and s, the s-th type of the total of S senortes. he estmaton procedure of ths model starts by dervng the condtonal probablty densty functon and then the random effect terms are ntegrated out to obtan the uncondtonal dstrbuton. Condtonng on the realzaton of, the condtonal dstrbuton of y s gven such as where 1 ( y - m) =, (8) ps s fy ( ) exp( ) m = Ey ( ) = b0 + β x + g1. s = g he uncondtonal probablty densty functon of y s gven by ntegratng out of the condtonal probablty densty functon such as + + 1 ( y - m ) f ( y) = f( y ) df ( ) = exp( ) df( ) - - ps s ò ò, (9) where F ( ) denotes the cumulatve dstrbuton functon of the standard normal varable. We are now able to defne the log lkelhood functon as log L( b, β, g, g ) = log f( y ). (10) 0 1 he estmates of parameters are generated by solvng the log lkelhood functon (10). Notce that the log lkelhood functon (10) nvolves calculatng a mult-dmensonal ntegral that s dffcult to evaluate, whch can be approxmated by a Gaussan quadrature method. Program 4. Mxed effects model %let level=oblgor_d; proc sort data=mydata; by &level; proc nlmxed data=mydata noad qponts=80 tech=quanew maxter=3000 maxfunc=3000 qtol=0.0001; parms b0-b14=0.0001 gamma1-gamma=0.4; cov_mu=b0+b1*var1+b*var+ +b14*var14; con_mu=cov_mu+gamma1*z; con_sgma=gamma**; model RR~normal(con_mu,con_sgma); random z~normal(0,1) subject=&level; predct con_mu out= mx_output_&level (keep=nstrument_d RR pred); Opton subject determnes when random effect term z s realzed whch s defned as a macro level by usng a %let statement for convenence. In ths example the oblgor_d ndcates an oblgor-varyng random effect s specfed and 6
Program 4 s correspondng to an oblgor-varyng mxed effects model. Senorty and tme-varyng models can be realzed by specfyng the macro level. Note that the whole sample data should be sorted by level to make sure the nput data set s clustered accordng to ths varable pror to callng PROC NLMIXED. Opton noad means that a nonadaptve Gaussan quadrature method s called wth the number of quadrature ponts specfed by opton qponts to approxmate the log-lkelhood functon whch s then optmzed by a quas-newton algorthm defned by opton tech. EMPIRICAL RESULS AND ANALYSIS o compare the model predctve accuraces there are three performance metrcs used ncludng the R-square (R ), root mean squared errors (RMSE) and mean absolute errors (MAE). he model ft performances are reported n able 3. All these performance metrcs are calculated n PROC IML. able 3 shows clearly that the oblgor-varyng mxed effects model gves remarkably better performances than the other methods wth R as hgh as 0.8964. When usng a tme-varyng random effect, the mxed effects model stll gves better predctons than the other fx effect regresson models. But the ncluson of a senorty-varyng random effect does not demonstrate any advantages compared wth the other regresson models. Among the fxed effect regresson models, fractonal response regresson shows slghtly better performances than lnear regresson and nflated beta regresson models. Such evdence s also consstent wth Q and hao (011). It s unexpected to see that the nflated beta regresson model does not show better predctons. Calabrese (01) studes bank loan recovery rates from he Bank of Italy and fnds that nflated beta regresson yelds better predcton accuraces than fractonal response regresson models. One possble explanaton s that although the nflated beta regresson s sutable to model the clustered samples on the boundares 0 and 1, t s not able to dscrmnate these observatons accurately from that n the nterval. Another reason probably s the beta dstrbuton may not be well ftted n our sample. able 4 shows the ntra-class covarance and correlaton of the mxed effects model based on equaton (7). Frst, t s clear to see that the oblgor level ntra-class correlaton s sgnfcantly hgher than that at the senorty or tme levels, ndcatng that the nstruments ssued by the oblgor have a sgnfcant hgh correlaton. It s also straghtforward to explan the low correlaton of the nstruments of the same senorty because they are not necessarly nfluenced by a common factor or sharng common characterstcs. Instruments that defaulted at the same year are affected by the same macroeconomc condtons so that the recovery rates correlaton s hgher than the senorty level but stll much lower than the oblgor level. Our fndng shows that t s necessary to nclude the ntra-class correlaton to explan the oblgor specfc unobservable heterogenety whch makes mxed effects models gve better predctons than the fxed effect models. Lnear mxed effects model Level R RMSE MAE Oblgor 0.8964 0.159 0.0839 Senorty 0.3383 0.3183 0.68 me 0.409 0.304 0.461 Ordnary lnear regresson - 0.3409 0.3177 0.65 Fractonal response regresson - 0.368 0.314 0.545 Inflated beta regresson - 0.3558 0.3141 0.630 able 3 Comparsons of model ft Covarance Correlaton Oblgor 0.0961 0.808 Senorty 0.0001 0.0009 me 0.018 0.114 able 4 Intra-class covarance and correlaton 7
Program 5. Performance metrcs %let output_data=mx_output_&level; proc ml; use &output_data; read all var {RR,pred} nto data; close &output_data; m=nrow(data); RR=data[,1]; RR_estmate=data[,]; start var(x); mean=x[:,]; countn=j(1,ncol(x)); do =1 to ncol(x); countn[]=sum(x[,]^=.); end; var=(x-mean)[##,]/(countn-1); return (var); fnsh; resd=rr_estmate-rr; SS_error=resd`*resd; SS_total=var(RR)#(m-1); R_square=1-SS_error/SS_total; RMSE=sqrt(resd`*resd/m); MAE=sum(abs(resd))/m; prnt R_square RMSE MAE; create output_measure var {R_square RMSE MAE}; append; qut; proc prnt data=output_measure; Program 5 llustrates how to calculate the performance metrcs of mxed effects models n PROC IML. Note that n the data set output_data the actual and predcted recovery rates are denoted as RR and pred respectvely. We read them nto PROC IML named as data. Computng R nvolves a calculaton of the varance of the actual recovery rates. However, PROC IML n SAS 9. does not provde a bult-n functon to calculate the varance for a vector or matrx. We defne a module named as var to calculate the varance * and export the performance metrcs nto a SAS data set output_measure. CONCLUSION In ths project we demonstrate that how to estmate both fxed and mxed effects models n SAS/SA. We also llustrate calculatng performance metrcs n PROC IML wth common matrx operators. hs study proposes to apply a lnear mxed effects model to predct the US corporate bonds recovery rates. he purpose of usng a mxed effects model s to better explan the unobservable heterogenety n an emprcal data set. Performances of the mxed effects models wth the random effect term specfed at oblgor, senorty and tme levels are examned. We buld up lnear regresson model n PROC REG, and realze fractonal response regresson and nflated beta regresson models n PROC NLMIXED. We provde detals on how to estmate mxed effects models n PROC NLMIXED wth a smple macro varable ncorporated for convenence. Emprcal evdence ndcates that an oblgor-varyng mxed effects model sgnfcantly outperforms the others n terms of all the performance metrcs we consdered, whch emphaszes the mportance of ncludng frm specfc random effect. We provde a new angle to model corporate bonds recovery rates and show the convenences to buld up credt rsk models n SAS/SA especally PROC NLMIXED. Further study wll be conducted based on the exstng fndngs. * he module program var s gven from Rck Wckln s SAS blog: http://blogs.sas.com/content/ml/011/04/07/computngthe-varance-of-each-column-of-a-matrx/ 8
REFERENCES Bastos, J. A. (010). Forecastng bank loans loss-gven-default, Journal of Bankng and Fnance, 34, 510-517. Bellott,. & Crook, J. (01). Loss gven default models ncorporatng macroeconomc varables for credt cards. Internatonal Journal of Forecastng, 8, 171-18. Dermne, J. & Neto De Carvalho, C. (006). Bank loan losses-gven-default: A case study. Journal of Bankng and Fnance, 30, 119-143. Gupton, G. & Sten, R. (00). LossCalc M : Model for predctng loss gven default (LGD), Moody s KMV. Hamerle, A., Knapp, M. & Wldenauer, N. (006). Modelng loss gven default: A pont n tme -approach, n he Basel II Rsk Parameters, Sprnger, Berln. Lu, Wensu. (01). Modelng Rates and Proportons n SAS-8. Access on May 16, 01 from http://blog.sna.com.cn/s/blog_a8fc8a0101ceb.html Papke, L. & Wooldrdge, J. (1996). Econometrc method for fractonal response varables wth an applcaton to the 401(K) plan partcpaton rates. Journal of Appled Econometrcs, 11, 619-63. Q, M. & hao, X. (011). Comparson of modelng methods for loss gven default, Journal of Bankng and Fnance, 35, 84-855. Ospna, R. and Ferrar, S. (010). Inflated beta dstrbutons, Stat Papers, 51, 111-16. SAS Insttute Inc (009). SAS 9. User s Gude, Cary, NC. Vascek, O. (1987). Probablty of loss on loan portfolo, KMV Corporaton. Wckln, R. Computng the varance of each column of a matrx. he DO Loop: Statstcal programmng n SAS wth an emphass on SAS/IML programs. Access on Aprl 7, 011 from http://blogs.sas.com/content/ml/011/04/07/computng-the-varance-of-each-column-of-a-matrx/ ACKNOWLEDGEMEN he author would lke to thank hs mentor, Stephane hompson, SAS Global Forum Mentorng Program, for her tremendous assstance, and other staff n SAS Insttute Inc. for ther knd support. he author would also lke to thank Rck Wckln and other SAS bloggers for ther nvaluable deas and tps on SAS programmng shared onlne. CONAC INFORMAION Your comments and questons are valued and encouraged. Contact the author at: Xao Yao he Unversty of Ednburgh Busness School 9 Buccleuch Place Ednburgh, EH8 9JS, UK Emal: yaoxao18@gmal.com SAS and all other SAS Insttute Inc. product or servce names are regstered trademarks or trademarks of SAS Insttute Inc. n the USA and other countres. ndcates USA regstraton. Other brand and product names are trademarks of ther respectve companes. 9