Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models [Evelyn Hayden and Daniel Porath, Österreichische Nationalbank and University of Applied Sciences at Mainz] Source: The Basel II Risk Parameters: Estimation, Validation, and Stress Testing, Bernd Engelmann, Robert Rauhmeer (Editors), 376 p., Springer Verlag, Berlin 2006.

I. Statistical Methods to Develop Rating Models

Evelyn Hayden and Daniel Porath
Österreichische Nationalbank [1] and University of Applied Sciences at Mainz

[1] The opinions expressed in this chapter are those of the author and do not necessarily reflect views of the Österreichische Nationalbank.

1. Introduction

The Internal Rating Based Approach (IRBA) of the New Basel Capital Accord allows banks to use their own rating models for the estimation of probabilities of default (PD) as long as the systems meet specified minimum requirements. Statistical theory offers a variety of methods for building and estimating rating models. This chapter gives an overview of these methods. The overview is focused on statistical methods and includes parametric models like linear regression analysis, discriminant analysis, binary response analysis, time-discrete panel methods, hazard models and nonparametric models like neural networks and decision trees. We also highlight the benefits and the drawbacks of the various approaches. We conclude by interpreting the models in light of the minimum requirements of the IRBA.

2. Statistical Methods for Risk Classification

In the following we define statistical models as the class of approaches which use econometric methods to classify borrowers according to their risk. Statistical rating systems primarily involve a search for explanatory variables which provide as sound and reliable a forecast of the deterioration of a borrower's situation as possible. In contrast, structural models explain the threats to a borrower based on an economic model and thus use clear causal connections instead of the mere correlation of variables.

The following sections offer an overview of parametric and nonparametric models generally considered for statistical risk assessment. Furthermore, we discuss the advantages and disadvantages of each approach. Many of the methods are described in more detail in standard econometric textbooks, like Greene (2003).

In general, a statistical model may be described as follows: As a starting point, every statistical model uses the borrower's characteristic indicators and (possibly) macroeconomic variables which were collected historically and are available for defaulting (or troubled) and non-defaulting borrowers. Let the borrower's characteristics be defined by a vector of n separate variables (also called covariates) $x = (x_1, \ldots, x_n)$ observed at time $t - L$. The state of default is indicated by a binary performance variable y observed at time t. The variable y is defined as y = 1 for a default and y = 0 for a non-default.

The sample of borrowers now includes a number of individuals or firms that defaulted in the past, while (typically) the majority did not default. Depending on the statistical application of this data, a variety of methods can be used to predict the performance. A common feature of the methods is that they estimate the correlation between the borrowers' characteristics and the state of default in the past and use this information to build a forecasting model. The forecasting model is designed to assess the creditworthiness of borrowers with unknown performance. This can be done by inputting the characteristics x into the model. The output of the model is the estimated performance. The time lag L between x and y determines the forecast horizon.

3. Regression Analysis

As a starting point we consider the classical regression model. The regression model establishes a linear relationship between the borrowers' characteristics and the default variable:

$$y_i = \beta' x_i + u_i \qquad (1)$$

Again, $y_i$ indicates whether borrower i has defaulted ($y_i = 1$) or not ($y_i = 0$). In period t, $x_i$ is a column vector of the borrower's characteristics observed in period $t - L$ and $\beta$ is a column vector of parameters which capture the impact of a change in the characteristics on the default variable. Finally, $u_i$ is the residual variable which contains the variation not captured by the characteristics $x_i$.

The standard procedure is to estimate (1) with the ordinary least squares (OLS) estimators of $\beta$, which in the following are denoted by b. The estimated result is the borrower's score $S_i$. This can be calculated by

$$S_i = E(y_i \mid x_i) = b' x_i \qquad (2)$$

Equation (2) shows that a borrower's score represents the expected value of the performance variable when his or her individual characteristics are known. The score can be calculated by inputting the values for the borrower's characteristics into the linear function given in (2).
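Equations (1) and (2) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the sample is synthetic, the single covariate (a leverage ratio) and the data-generating process are invented, and the function names `fit_ols_score` and `score` are hypothetical.

```python
import numpy as np

def fit_ols_score(X, y):
    """OLS estimate b of beta in y_i = beta' x_i + u_i.

    X must already contain a column of ones for the intercept.
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def score(X, b):
    """Score S_i = b' x_i as in equation (2)."""
    return X @ b

# Synthetic sample (assumption): default risk rises with leverage.
rng = np.random.default_rng(0)
n = 500
leverage = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), leverage])
y = (rng.uniform(size=n) < 0.02 + 0.30 * leverage).astype(float)

b = fit_ols_score(X, y)
S = score(X, b)

# Score of a new borrower with unknown performance (leverage 0.8).
new_borrower = np.array([[1.0, 0.8]])
print("coefficients:", b)
print("score of new borrower:", score(new_borrower, b)[0])
```

Nothing in the linear function keeps the score between 0 and 1, so a borrower with sufficiently extreme characteristics can receive a score outside that interval.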

Note that $S_i$ is continuous (while $y_i$ is a binary variable), hence the output of the model will generally be different from 0 or 1. In addition, the prediction can take on values larger than 1 or smaller than 0. As a consequence, the outcome of the model cannot be interpreted as a probability level. However, the score $S_i$ can be used for the purpose of comparison between different borrowers, where higher values of $S_i$ correlate with a higher default risk.

The benefits and drawbacks of models (1) and (2) are the following:

- OLS estimators are well-known and easily available.
- The forecasting model is a linear model and therefore easy to compute and to understand.
- The random variable $u_i$ is heteroscedastic (i.e. the variance of $u_i$ is not constant for all i) since

  $$\mathrm{Var}(u_i) = \mathrm{Var}(y_i) = E(y_i \mid x_i)\,[1 - E(y_i \mid x_i)] = b' x_i\,(1 - b' x_i). \qquad (3)$$

  As a consequence, the estimation of $\beta$ is inefficient and additionally, the standard errors of the estimated coefficients b are biased. An efficient way to estimate $\beta$ is to apply the Weighted Least Squares (WLS) estimator.
- WLS estimation of $\beta$ is efficient, but the estimation of the standard errors of b still remains biased. This happens due to the fact that the residuals are not normally distributed as they can only take on the values $-b' x_i$ (if the borrower does not default and y therefore equals 0) or $(1 - b' x_i)$ (if the borrower does default and y therefore equals 1). This implies that there is no reliable way to assess the significance of the coefficients b and it remains unknown whether the estimated values represent precise estimations of significant relationships or whether they are just caused by spurious correlations. Inputting characteristics which are not significant into the model can seriously harm the model's stability when used to predict borrowers' risk for new data. A way to cope with this problem is to split the sample into two parts, where one part (the training sample) is used to estimate the model and the other part (the hold-out sample) is used to validate the results. The consistency of the results of both samples is then taken as an indicator for the stability of the model.
- The absolute value of $S_i$ cannot be interpreted.

4. Discriminant Analysis

Discriminant analysis is a classification technique applied to corporate bankruptcies by Altman as early as 1968 (see Altman, 1968). Linear discriminant analysis is based on the estimation of a linear discriminant function with the task of separating individual groups (in this case of defaulting and non-defaulting borrowers) according to specific characteristics. The discriminant function is

$$S_i = \beta' x_i. \qquad (4)$$
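As an illustration of how the coefficients in (4) might be obtained, the sketch below assumes Fisher's criterion: the coefficient vector is taken proportional to the inverse pooled within-group covariance matrix times the difference of the group means, and is then scaled so that the pooled within-group variance of the score equals 1. The two covariates, the synthetic data and the function name `fisher_discriminant` are assumptions made for this example only.

```python
import numpy as np

def fisher_discriminant(X, y):
    """Linear discriminant coefficients for two groups (y == 0, y == 1).

    Returns the coefficient vector w and the pooled within-group
    covariance matrix Sw; w is scaled so that w' Sw w = 1.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    Sw = ((n0 - 1) * np.cov(X0, rowvar=False)
          + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(Sw, mean_diff)   # direction of best separation
    w = w / np.sqrt(w @ Sw @ w)          # normalisation (arbitrary scale)
    return w, Sw

# Synthetic sample (assumption): leverage ratio and EBIT margin.
rng = np.random.default_rng(0)
X_good = rng.normal([0.40, 0.10], [0.15, 0.05], size=(400, 2))
X_bad = rng.normal([0.65, 0.03], [0.15, 0.05], size=(60, 2))
X = np.vstack([X_good, X_bad])
y = np.r_[np.zeros(400, dtype=int), np.ones(60, dtype=int)]

w, Sw = fisher_discriminant(X, y)
S = X @ w   # discriminant scores, comparable across borrowers only
print("coefficients:", w)
print("mean score, non-defaulters:", S[y == 0].mean())
print("mean score, defaulters:", S[y == 1].mean())
```

With this sign convention, defaulters receive the higher average score.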

The score $S_i$ is also called the discriminant variable. The estimation of the discriminant function adheres to the following principle:

Maximization of the spread between the groups (good and bad borrowers) and minimization of the spread within individual groups.

Maximization only determines the optimal proportions among the coefficients of the vector $\beta$. Usually (but arbitrarily), coefficients are normalized by choosing the pooled within-group variance to take the value 1. As a consequence, the absolute level of $S_i$ is arbitrary as well and cannot be interpreted on a stand-alone basis. As in linear regression analysis, $S_i$ can only be used to compare the prediction for different borrowers ("higher score, higher risk").

Discriminant analysis is similar to the linear regression model given in equations (1) and (2). In fact, the proportions among the coefficients of the regression model are equal to the optimal proportion according to the discriminant analysis. The difference between the two methods is a theoretical one: Whereas in the regression model the characteristics are deterministic and the default state is the realization of a random variable, for discriminant analysis the opposite is true. Here the groups (default or non-default) are deterministic and the characteristics of the discriminant function are realizations from a random variable. For practical use this difference is virtually irrelevant.

Therefore, the benefits and drawbacks of discriminant analysis are similar to those of the regression model:

- Discriminant analysis is a widely known method with estimation algorithms that are easily available.
- Once the coefficients are estimated, the scores can be calculated in a straightforward way with a linear function.
- Since the characteristics $x_i$ are assumed to be realizations of random variables, the statistical tests for the significance of the model and the coefficients rely on the assumption of multivariate normality. This is, however, unrealistic for the variables typically used in rating models as for example financial ratios from the balance-sheet. Hence, the methods for analyzing the stability of the model and the plausibility of the coefficients are limited to a comparison between training and hold-out sample.
- The absolute value of the discriminant function cannot be interpreted in levels.

5. Logit and Probit Models

Logit and probit models are econometric techniques designed for analyzing binary dependent variables. There are two alternative theoretical foundations. The latent-variable approach assumes an unobservable (latent) variable $y^*$ which is related to the borrower's characteristics in the following way:

$$y_i^* = \beta' x_i + u_i \qquad (5)$$

Here $\beta$, $x_i$ and $u_i$ are defined as above. The variable $y_i^*$ is metrically scaled and triggers the value of the binary default variable $y_i$:

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$

This means that the default event sets in when the latent variable exceeds the threshold zero. Therefore, the probability for the occurrence of the default event equals:

$$P(y_i = 1) = P(u_i > -\beta' x_i) = 1 - F(-\beta' x_i) = F(\beta' x_i). \qquad (7)$$

Here F(.) denotes the (unknown) distribution function. The last step in (7) assumes that the distribution function has a symmetric density around zero. The choice of the distribution function F(.) depends on the distributional assumptions about the residuals ($u_i$). If a normal distribution is assumed, we are faced with the probit model:

$$F(\beta' x_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta' x_i} e^{-t^2/2}\, dt \qquad (8)$$

If instead the residuals are assumed to follow a logistic distribution, the result is the logit model:

$$F(\beta' x_i) = \frac{e^{\beta' x_i}}{1 + e^{\beta' x_i}} \qquad (9)$$

The second way to motivate logit and probit models starts from the aim of estimating default probabilities. For single borrowers, default probabilities cannot be observed directly, since only the binary default event is realized. However, for groups of borrowers the observed default frequencies can be interpreted as default probabilities. As a starting point consider the OLS estimation of the following regression:

$$p_i = b' x_i + u_i \qquad (10)$$

In (10) the index i denotes the group formed by a number of individuals, $p_i$ is the default frequency observed in group i and $x_i$ are the characteristics observed for group i. The model, however, is inadequate. To see this consider that the outcome (which is $E(y_i \mid x_i) = b' x_i$) is not bounded to values between zero and one and therefore cannot be interpreted as a probability. As it is generally implausible to assume that a probability can be calculated by a linear function, in a second step the linear expression $b' x_i$ is transformed by a nonlinear function (link function) F:

$$p_i = F(b' x_i). \qquad (11)$$

An appropriate link function transforms the values of $b' x_i$ to a scale within the interval [0,1]. This can be achieved by any distribution function. The choice of the link function determines the type of model: with a logistic link function equation (11) becomes a logit model, while with the normal distribution (11) results in the probit model.

However, when estimating (10) with OLS, the residuals will be heteroscedastic, because $\mathrm{Var}(u_i) = \mathrm{Var}(p_i) = p(x_i)\,(1 - p(x_i))$. A possible way to achieve homoscedasticity would be to compute the WLS estimators of b in (10). However, albeit possible, this is not common practice. The reason is that in order to observe default frequencies, the data has to be grouped before estimation. Grouping involves considerable practical problems like defining the size and number of the groups and the treatment of different covariates within the single groups. A better way to estimate logit and probit models, which does not require grouping, is the Maximum-Likelihood (ML) method. For a binary dependent variable the likelihood function looks like:

$$L(b) = \prod_i P(b' x_i)^{y_i}\,[1 - P(b' x_i)]^{1 - y_i}. \qquad (12)$$

For the probit model P(.) is the normal distribution function and for the logit model P(.) is the logistic distribution function. With equation (12) the estimation of the model is theoretically convincing and also easy to handle. Furthermore, the ML approach lends itself to a broad set of tests to evaluate the model and its single variables (see Hosmer and Lemeshow (2000) for a comprehensive introduction).

Usually, the choice of the link function is not theoretically driven. Users familiar with the normal distribution will opt for the probit model. Indeed, the differences in the results of both classes of models are often negligible. This is due to the fact that both distribution functions have a similar form except for the tails, which are heavier for the logit model. The logit model is easier to handle, though. First of all, the computation of the estimators is easier.
However, today computational complexity is often irrelevant as most users apply statistical software where the estimation algorithms are integrated. What is more important is the fact that the coefficients of the logit model can be more easily interpreted. To see this we transform the logit model given in (9) in the following way:

$$\frac{P_i}{1 - P_i} = e^{\beta' x_i} \qquad (13)$$

The left-hand side of (13) are the odds, i.e. the relation between the default probability and the probability of survival. Now it can be easily seen that a variation of a single variable $x_k$ of one unit has an impact of $e^{\beta_k}$ on the odds, when $\beta_k$ denotes the coefficient of the variable $x_k$. Hence, the transformed coefficients $e^{\beta}$ are called odds-ratios. They represent the multiplicative impact of a borrower's characteristic on the odds. Therefore, for the logit model, the coefficients can be interpreted in a plausible way, which is not possible for the probit model. Indeed, the most important weakness of binary models is the fact that the interpretation of the coefficients is not straightforward.

The strengths of logit and probit models can be summarized as:

- The methods are theoretically sound.
- The results generated can be interpreted directly as default probabilities.
- The significance of the model and the individual coefficients can be tested. Therefore, the stability of the model can be assessed more effectively than in the previous cases.

6. Panel Models

The methods discussed so far are all cross-sectional methods because all covariates are related to the same period. However, banks typically have at their disposal a set of covariates for more than one period for each borrower. In this case it is possible to expand the cross-sectional input data to a panel dataset. The main motivation is to enlarge the number of available observations for the estimation and therefore enhance the stability and the precision of the rating model. Additionally, panel models can integrate macroeconomic variables into the model.

Macroeconomic variables can improve the model for several reasons. First, many macroeconomic data sources are more up-to-date than the borrowers' characteristics. For example, financial ratios calculated from balance sheet information are usually updated only once a year and are often up to two years old when used for risk assessment. The oil price, instead, is available on a daily frequency. Secondly, by stressing the macroeconomic input factors, the model can be used for a form of stress-testing credit risk. However, as macroeconomic variables primarily affect the absolute value of the default probability, it is only reasonable to incorporate macroeconomic input factors into those classes of models that estimate default probabilities.
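The stress-testing idea can be illustrated with a small sketch. It assumes a pooled logit model fitted by maximum likelihood with Newton-Raphson iterations, one borrower-specific ratio and one macroeconomic variable. The data are synthetic, the variable names (`leverage`, `gdp_growth`) and the size of the shock are invented, and the rows are treated as independent observations, which is a simplification.

```python
import numpy as np

def logistic(z):
    """Logistic distribution function, the link used by the logit model."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, n_iter=25):
    """ML estimate of b in P(y_i = 1) = F(b' x_i) by Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic(X @ b)
        gradient = X.T @ (y - p)                        # score vector
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        b = b + np.linalg.solve(hessian, gradient)
    return b

# Synthetic borrower-period rows (assumption).
rng = np.random.default_rng(1)
n = 4000
leverage = rng.uniform(0.0, 1.0, n)      # borrower characteristic
gdp_growth = rng.normal(0.02, 0.02, n)   # macroeconomic variable
X = np.column_stack([np.ones(n), leverage, gdp_growth])
y = (rng.uniform(size=n)
     < logistic(X @ np.array([-4.0, 3.0, -20.0]))).astype(float)

b = fit_logit(X, y)
pd_base = logistic(X @ b)

# Stress scenario (assumption): GDP growth 3 percentage points lower.
X_stress = X.copy()
X_stress[:, 2] -= 0.03
pd_stress = logistic(X_stress @ b)

print("odds-ratios exp(b):", np.exp(b))
print("mean PD, base scenario:", pd_base.mean())
print("mean PD, stress scenario:", pd_stress.mean())
```

Because the model output is a probability, the shift in the macroeconomic input translates directly into a shift in the level of the estimated default probabilities.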
In principle, the structure of, for example, a panel logit or probit model remains the same as given in the equations of the previous section. The only difference is that now the covariates are taken from a panel of data and have to be indexed by an additional time series indicator, i.e. we observe $x_{it}$ instead of $x_i$. At first glance panel models seem similar to cross-sectional models. In fact, many developers ignore the dynamic pattern of the covariates and simply fit logit or probit models. However, logit and probit models rely on the assumption of independent observations. Generally, cross-sectional data meets this requirement, but panel data does not. The reason is that observations from the same period and observations from the same borrower are likely to be correlated. Introducing this correlation in the estimation procedure is cumbersome. For example, the fixed-effects estimator known from panel analysis for continuous dependent variables is not available for the probit model. Besides, the modified fixed-effects estimator for logit models proposed by Chamberlain (1980) excludes all non-defaulting borrowers from the analysis and therefore seems inappropriate. Finally, the random-effects estimators proposed in the literature are computationally intensive and can only be computed with specialized software. For an econometric discussion of binary panel analysis, refer to Hosmer and Lemeshow (2000).

7. Hazard Models

All methods discussed so far try to assess the riskiness of borrowers by estimating a certain type of score that indicates whether or not a borrower is likely to default within the specified forecast horizon. However, no prediction about the exact default point in time is made. Besides, these approaches do not allow the evaluation of the borrowers' risk for future time periods given they should not default within the reference time horizon. These disadvantages can be remedied by means of hazard models, which explicitly take the survival function and thus the time at which a borrower's default occurs into account.

Within this class of models, the Cox proportional hazard model (cf. Cox, 1972) is the most general regression model, as it is not based on any assumptions concerning the nature or shape of the underlying survival distribution. The model assumes that the underlying hazard rate (rather than survival time) is a function of the independent variables; no assumptions are made about the nature or shape of the hazard function. Thus, Cox's regression model is a semiparametric model. The model can be written as:

$$h_i(t \mid x_i) = h_0(t)\, e^{\beta' x_i}, \qquad (14)$$

where $h_i(t \mid x_i)$ denotes the resultant hazard, given the covariates for the respective borrower and the respective survival time t. The term $h_0(t)$ is called the baseline hazard; it is the hazard when all independent variable values are equal to zero. If the covariates are measured as deviations from their respective means, $h_0(t)$ can be interpreted as the hazard rate of the average borrower.
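To make equation (14) concrete, the sketch below evaluates the hazard for two borrowers under an assumed baseline hazard and assumed coefficients. It does not estimate anything: the yearly baseline hazard values, the coefficient vector and the covariate values are all invented for illustration. The cumulative default probabilities use the standard relation between the hazard and the survival function, $S(t \mid x) = \exp(-H_0(t)\, e^{\beta' x})$, with the cumulative baseline hazard $H_0(t)$ approximated on a yearly grid.

```python
import numpy as np

def cox_hazard(h0, x, beta):
    """Hazard h(t | x) = h0(t) * exp(beta' x) on a grid of time points."""
    return h0 * np.exp(x @ beta)

def survival(h0, x, beta, dt=1.0):
    """Survival function implied by (14) on a discrete time grid."""
    H0 = np.cumsum(h0) * dt   # cumulative baseline hazard
    return np.exp(-H0 * np.exp(x @ beta))

# Assumed inputs, not estimates.
h0 = np.array([0.010, 0.015, 0.020, 0.018, 0.012])  # years 1 to 5
beta = np.array([2.0, -1.5])
x_a = np.array([0.3, 0.1])    # covariates as deviations from their means
x_b = np.array([-0.2, 0.2])

h_a = cox_hazard(h0, x_a, beta)
h_b = cox_hazard(h0, x_b, beta)
ratio = h_a / h_b             # the same value in every year
pd_a = 1.0 - survival(h0, x_a, beta)   # cumulative PD per horizon

print("hazard ratio by year:", ratio)
print("cumulative PD of borrower A by horizon:", pd_a)
```

A borrower whose covariates all equal zero has exactly the baseline hazard, which matches the interpretation of $h_0(t)$ given above.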
While no assumptions are made about the underlying hazard function, the model equation shown above implies important assumptions. First, it specifies a multiplicative relationship between the hazard function and the log-linear function of the explanatory variables, which implies that the ratio of the hazards of two borrowers does not depend on time, i.e. the relative riskiness of the borrowers is constant, hence the name Cox proportional hazard model. Besides, the model assumes that the default point in time is a continuous random variable. However, often the borrowers' financial conditions are not observed continuously but rather at discrete points in time. What is more, the covariates are treated as if they were constant over time, while typical explanatory variables like financial ratios change with time. Although there are some advanced models to incorporate the above mentioned features, the estimation of these models becomes complex.

The strengths and weaknesses of hazard models can be summarized as follows:

- Hazard models allow for the estimation of a survival function for all borrowers from the time structure of historical defaults, which implies that default probabilities can be calculated for different time horizons.
- Estimating these models under realistic assumptions is not straightforward.

8. Neural Networks

In recent years, neural networks have been discussed extensively as an alternative to the (parametric) models discussed above. They offer a more flexible design to represent the connections between independent and dependent variables. Neural networks belong to the class of non-parametric methods. Unlike the methods discussed so far they do not estimate parameters of a well-specified model. Instead, they are inspired by the way biological nervous systems, such as the brain, process information. They typically consist of many nodes that send a certain output if they receive a specific input from the other nodes to which they are connected. Like parametric models, neural networks are trained by a training sample to classify borrowers correctly. The final network is found by adjusting the connections between the input, output and any potential intermediary nodes.

The strengths and weaknesses of neural networks can be summarized as:

- Neural networks easily model highly complex, nonlinear relationships between the input and the output variables.
- They are free from any distributional assumptions.
- These models can be quickly adapted to new information (depending on the training algorithm).
- There is no formal procedure to determine the optimum network topology for a specific problem, i.e. the number of the layers of nodes connecting the input with the output variables.
- Neural networks are black boxes, hence they are difficult to interpret.
- Calculating default probabilities is possible only to a limited extent and with considerable extra effort.

In summary, neural networks are particularly suitable when there are no expectations (based on experience or theoretical arguments) on the relationship between the input factors and the default event and the economic interpretation of the resulting models is of inferior importance.
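A very small network can illustrate what "adjusting the connections" means in practice. The sketch below assumes one layer of intermediary (hidden) nodes, tanh activations, a logistic output node and plain gradient descent on the cross-entropy loss. The topology, the learning rate, the number of steps and the synthetic data (risk is high when either input is far from zero, a relationship no linear score can capture) are all arbitrary choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, w2, b2):
    """Return hidden-node outputs and the network output in [0, 1]."""
    H = np.tanh(X @ W1 + b1)
    return H, sigmoid(H @ w2 + b2)

def cross_entropy(y, out):
    eps = 1e-12
    return -np.mean(y * np.log(out + eps) + (1 - y) * np.log(1 - out + eps))

def train(X, y, n_hidden=4, lr=0.2, n_steps=3000, seed=0):
    """Adjust the connection weights by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    for _ in range(n_steps):
        H, out = forward(X, W1, b1, w2, b2)
        d_out = (out - y) / len(y)                 # loss gradient at output node
        d_H = np.outer(d_out, w2) * (1.0 - H**2)   # back-propagated to hidden nodes
        w2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_H)
        b1 -= lr * d_H.sum(axis=0)
    return W1, b1, w2, b2

# Synthetic training sample (assumption) with a nonlinear default pattern.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)

params_start = train(X, y, n_steps=0)   # initial weights, same seed
params = train(X, y)
loss_before = cross_entropy(y, forward(X, *params_start)[1])
_, out = forward(X, *params)
loss_after = cross_entropy(y, out)
print("training loss before and after:", loss_before, loss_after)
```

The fitted weights have no direct economic interpretation, which is the black-box property mentioned in the list above.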

9. Decision Trees

A further category of non-parametric methods comprises decision trees, also called classification trees. Trees are models which consist of a set of if-then split conditions for classifying cases into two (or more) different groups. Under these methods, the base sample is subdivided into groups according to the covariates. In the case of binary classification trees, for example, each tree node is assigned (usually univariate) decision rules, which describe the sample accordingly and subdivide it into two subgroups each. New observations are processed down the tree in accordance with the decision rules' values until the end node is reached, which then represents the classification of this observation. An example is given in Figure 1.

[Figure 1. Decision Tree. Node labels recoverable from the original figure, in order: Sector; Construction; Other; Years in business; Less than 2; EBIT; Equity ratio; Less than 15%; More than 15%; Risk class 2; Risk class 3.]

One of the most striking differences from the parametric models is that all covariates are grouped and treated as categorical variables. Furthermore, whether a specific variable or category becomes relevant depends on the categories of the variables in the upper level. For example, in Figure 1 the variable "years in business" is only relevant for companies which operate in the construction sector. This kind of dependence between variables is called interaction.

The most important algorithms for building decision trees are the Classification and Regression Trees algorithms (C&RT) popularized by Breiman et al. (1984) and the CHAID algorithm (Chi-square Automatic Interaction Detector, see Kass, 1978). Both algorithms use different criteria to identify the best splits in the data and to collapse the categories which are not significantly different in outcome. The general strengths and weaknesses of trees are:

- Through categorization, nonlinear relationships between the variables and the score can be easily modelled.
- Interactions present in the data can be identified. Parametric methods can model interactions only to a limited extent (by introducing dummy variables).
- As with neural networks, decision trees are free from distributional assumptions.
- The output is easy to understand.
- Probabilities of default have to be calculated in a separate step.
- The output is (a few) risk categories and not a continuous score variable. Consequently, decision trees only calculate default probabilities for the final node in a tree, but not for individual borrowers.
- Compared to other models, trees contain fewer variables and categories. The reason is that in each node the sample is successively partitioned and therefore continuously diminishes.
- The stability of the model cannot be assessed with statistical procedures. The strategy is to work with a training sample and a hold-out sample.

In summary, trees are particularly suited when the data is characterized by a limited number of predictive variables which are known to be interactive.

10. Statistical Models and Basel II

Finally, we ask the question whether the models discussed in this chapter are in line with the IRB Approach of Basel II. Prior to the discussion, it should be mentioned that in the Basel documents, rating systems are defined in a broader sense than in this chapter. Following § 394 of the Revised Framework from June 2004 (cf. BIS, 2004) a rating system comprises all the methods, processes, controls, and data collection and IT systems that support the assessment of credit risk, the assignment of internal ratings, and the quantification of default and loss estimates. Compared to this definition, the methods discussed here provide one component, namely the assignment of internal ratings.

The minimum requirements for internal rating systems are treated in part II, section III, H of the Revised Framework. A few passages of the text concern the assignment of internal ratings, and the requirements are general.
They mainly concern the rating structure and the input data, examples being:

- a minimum of 7 rating classes of non-defaulted borrowers (§ 404)
- no undue or excessive concentrations in single rating classes (§ 403, 406)
- a meaningful differentiation of risk between the classes (§ 410)
- plausible, intuitive and current input data (§ 410, 411)
- all relevant information must be taken into account (§ 411).

The requirements do not reveal any preference for a certain method. It is indeed one of the central ideas of the IRBA that the banks are free in the choice of the method. Therefore the models discussed here are all possible candidates for the IRB Approach.

The strengths and weaknesses of the single methods concern some of the minimum requirements. For example, hazard rate or logit panel models are especially suited for stress testing (as required by § 434, 345) since they contain a time-series dimension. Methods which allow for the statistical testing of the individual input factors (e.g. the logit model) provide a straightforward way to demonstrate the plausibility of the input factors (as required by § 410). When the outcome of the model is a continuous variable, the rating classes can be defined in a more flexible way (§ 403, 404, 406).

On the other hand, none of the drawbacks of the models considered here excludes a specific method. For example, a bank may have a preference for linear regression analysis. In this case the plausibility of the input factors cannot be verified by statistical tests and as a consequence the bank will have to search for alternative ways to meet the requirements of § 410.

In summary, the minimum requirements are not intended as a guideline for the choice of a specific model. Banks should rather base their choice on their internal aims and restrictions. If necessary, those components that are only needed to satisfy the criteria of the IRBA should be added in a second step. All models discussed in this chapter allow for this.

References

Altman EI (1968), Financial Indicators, Discriminant Analysis, and the Prediction of Corporate Bankruptcy, Journal of Finance.
BIS (2004), International Convergence of Capital Measurement and Capital Standards, Basel Committee on Banking Supervision, June 2004.
Breiman L, Friedman JH, Olshen RA, Stone SJ (1984), Classification and Regression Trees, Wadsworth.
Chamberlain G (1980), Analysis of Covariance with Qualitative Data, Review of Economic Studies 47, 225-238.
Cox DR (1972), Regression Models and Life Tables (with Discussion), Journal of Royal Statistical Society, Series B.
Greene W (2003), Econometric Analysis, 5th ed., Prentice-Hall New Jersey.
Hosmer W, Lemeshow S (2000), Applied Logistic Regression, New York, Wiley.
Kass GV (1978), An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics 29 (2), pp. 119-127.