Measuring the Discrimination Quality of Suites of Scorecards: ROCs, Ginis, Bounds and Segmentation
Lyn C Thomas
Quantitative Financial Risk Management Centre, University of Southampton, UK
CSCC X, Edinburgh, August 2007
Outline
- How to measure scorecards
- Measures of discrimination
  - Divergence: difference in expectations of the weight-of-evidence function
  - Kolmogorov-Smirnov: difference in distribution functions
  - ROC curves: comparison of distribution functions / business measures
  - Gini coefficient / D-concordance statistic
- A simple bound for the ROC curve
- Relationships between the measures of discrimination
- Segmentation and measures of segmentation
  - Why segment and build different scorecards on each segment?
  - How much of the discrimination is due to the segmentation and how much to the scorecards?
- Some examples from behavioural scorecards
Measuring scorecards in credit scoring
Three different aspects of a scorecard's performance that one might want to measure:
- Discriminatory power (uses only the scorecard): how good is the system at separating the two classes, goods and bads?
  - Divergence statistic / Mahalanobis distance
  - Somers' D-concordance statistic
  - Kolmogorov-Smirnov statistic
  - ROC curve / Gini coefficient
- Calibration of probability forecasts (uses the scorecard and the population odds). Not used much until the Basel requirements, and so few tests:
  - Chi-square (Hosmer-Lemeshow) test
  - Binomial and Normal tests
- Categorical prediction errors (uses the scorecard, population odds and cut-off score). This requires the scorecard and the cut-off score, so one can implement the decisions and see how many erroneous classifications there are:
  - Error rates via tables and swap sets
  - Hypothesis tests
Divergence
Introduced by Kullback; the continuous version of the Information Value.
Let f(s|G) (f(s|B)) be the density function of the scores of the goods (bads) under a scorecard. The Divergence is then defined by

D = ∫ (f(s|G) − f(s|B)) log( f(s|G)/f(s|B) ) ds = ∫ (f(s|G) − f(s|B)) w(s) ds

where w(s) is the weight of evidence at score s. It is the expectation of the weight of evidence over the goods' score distribution minus its expectation over the bads' score distribution.
D ≥ 0, and D = 0 ⇔ f(s|G) = f(s|B); D = ∞ if there is no overlap between the scores of the goods and the bads.
In practice one can only calculate the divergence by splitting the scores into bands. If the bands i ∈ I have g_i goods and b_i bads, with Σ_i g_i = n_G and Σ_i b_i = n_B, then

Information Value = IV = Σ_{i∈I} (g_i/n_G − b_i/n_B) ln( (g_i/n_G) / (b_i/n_B) )
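The banded calculation can be sketched in a few lines of Python; this is a minimal illustration, and the band counts in the example are hypothetical.

```python
import math

def information_value(goods, bads):
    """IV over score bands: goods[i], bads[i] are the counts in band i."""
    n_g, n_b = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        p_g, p_b = g / n_g, b / n_b              # band's share of goods / of bads
        iv += (p_g - p_b) * math.log(p_g / p_b)  # share gap x weight of evidence
    return iv

# Identical band distributions give IV = 0; separation pushes IV up.
print(information_value([10, 90], [90, 10]))  # about 3.52
```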
Mahalanobis distance and its relationship with Divergence
Suppose the goods have total number n_G, mean score μ_G and variance σ_G², and the bads have total n_B, mean μ_B and variance σ_B². The pooled variance is

σ² = (n_G σ_G² + n_B σ_B²) / (n_G + n_B)

The Mahalanobis distance is D_M = |μ_G − μ_B| / σ. This is what discriminant analysis maximises.
If we assume f(s|G) and f(s|B) are normal, the Divergence reduces to

D = ½ (1/σ_G² + 1/σ_B²)(μ_G − μ_B)² + ½ (σ_G²/σ_B² + σ_B²/σ_G²) − 1

If σ_G = σ_B, then D = D_M².
[Figure: probability densities of the goods and the bads plotted against score, showing the difference between the two distributions.]
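The two statistics can be computed together; a minimal sketch taking the summary statistics (counts, means, variances) as inputs, which also checks the equal-variance claim D = D_M²:

```python
import math

def mahalanobis_and_divergence(n_g, mu_g, var_g, n_b, mu_b, var_b):
    """D_M with the pooled variance, and the Divergence D under the
    normality assumption; D equals D_M**2 when the variances are equal."""
    pooled = (n_g * var_g + n_b * var_b) / (n_g + n_b)
    d_m = abs(mu_g - mu_b) / math.sqrt(pooled)
    d = (0.5 * (1 / var_g + 1 / var_b) * (mu_g - mu_b) ** 2
         + 0.5 * (var_g / var_b + var_b / var_g) - 1)
    return d_m, d

# Equal variances, so d == d_m**2 (= 4 here).
d_m, d = mahalanobis_and_divergence(9000, 2.0, 1.0, 1000, 0.0, 1.0)
```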
Kolmogorov-Smirnov statistic
Not a difference in expectations but the maximum difference between the distribution functions F(s|G) and F(s|B):

KS = max_s |F(s|B) − F(s|G)|

[Figure: the two probability distribution functions F(s|B) and F(s|G) plotted against score, with the K-S distance the largest vertical gap between them.]
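Empirically, KS is just a scan over candidate cut-off scores; a minimal sketch with hypothetical score samples:

```python
def ks_statistic(good_scores, bad_scores):
    """Largest gap between the empirical distributions F(s|B) and F(s|G)."""
    best = 0.0
    for s in sorted(set(good_scores) | set(bad_scores)):
        f_g = sum(x <= s for x in good_scores) / len(good_scores)
        f_b = sum(x <= s for x in bad_scores) / len(bad_scores)
        best = max(best, f_b - f_g)
    return best

print(ks_statistic([4, 5, 6, 7], [1, 2, 3, 4]))  # 0.75
```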
Kolmogorov-Smirnov statistic
A problem with KS is that it describes the situation at the optimal separating score, i.e. where the marginal good-bad odds equal the overall good-bad odds. This is usually much higher than any cut-off score actually used.
There is a strong relationship between KS and sensitivity and specificity:

F(s|B) − F(s|G) = sensitivity + specificity − 1
and so KS = max_s ( F(s|B) − F(s|G) ) = max_s ( sensitivity + specificity − 1 )
ROC curve
Receiver Operating Characteristic curve. KS plots the two functions F(s|G) and F(s|B) against the score; the ROC curve plots the functions against each other. Each point corresponds to a cut-off score s:
- the vertical axis is F(s|B), the proportion of bads below that cut-off;
- the horizontal axis is F(s|G), the proportion of goods below that cut-off.
The ideal scorecard gives the path ABC through the point (0, 1); the diagonal AC is like picking scores at random. The Gini coefficient (GC) asks how close to ideal the scorecard is.
[Figure: ROC curve from A (the origin) to C at (1,1), lying between the diagonal AC and the ideal path ABC.]
ROC curve: relationship with business measures
As one goes down the curve from top right to bottom left, the cut-off score decreases, so more applicants are accepted: the volume of the portfolio increases. As the curve goes down, the number of bads accepted increases, so losses increase.
The profit at cut-off score s (with R the profit on a good, D the loss on a bad, and p_G, p_B the population proportions of goods and bads) is

R p_G (1 − F(s|G)) − D p_B (1 − F(s|B))

So the isobars of equal profit satisfy D p_B F(s|B) − R p_G F(s|G) = constant: straight lines of positive slope in the ROC plane. As one moves north-west across these lines along the curve, profit increases.
[Figure: ROC curve with arrows showing volume and losses increasing down the curve, and profit increasing towards the north-west.]
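The profit expression can be evaluated per cut-off from the empirical score distributions; a sketch in which R, D and the good proportion p_G are all assumed inputs:

```python
def expected_profit(cutoff, good_scores, bad_scores, p_g, profit_r, loss_d):
    """Expected per-applicant profit accepting scores above the cut-off:
    R * p_G * (1 - F(s|G)) - D * p_B * (1 - F(s|B))."""
    acc_g = sum(s > cutoff for s in good_scores) / len(good_scores)  # 1 - F(s|G)
    acc_b = sum(s > cutoff for s in bad_scores) / len(bad_scores)    # 1 - F(s|B)
    return profit_r * p_g * acc_g - loss_d * (1 - p_g) * acc_b

# Lowering the cut-off raises volume but lets in more bads.
print(expected_profit(4, [5, 6, 7, 8], [1, 2, 3, 4], 0.9, 1.0, 10.0))
```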
Gini coefficient
The Gini coefficient, GC, is the ratio of the area between the curve and the diagonal to the area of the triangle ABC, i.e. twice the area between the curve and the diagonal.
If GC = 1 there is perfect discrimination; GC = 0 means no discrimination.
AUROC is the area under the ROC curve, so GC = 2(AUROC − 0.5) = 2·AUROC − 1.
K-S is the greatest vertical distance from the diagonal to the curve.
[Figure: ROC curve with the area between the curve and the diagonal AC shaded.]
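The identity GC = 2·AUROC − 1 can be checked directly by building the empirical ROC points and integrating with the trapezium rule; a minimal sketch:

```python
def gini_from_scores(good_scores, bad_scores):
    """GC = 2*AUROC - 1, with AUROC from the trapezium rule over the
    empirical ROC points (F(s|G), F(s|B))."""
    pts = [(0.0, 0.0)]
    for s in sorted(set(good_scores) | set(bad_scores)):
        pts.append((sum(x <= s for x in good_scores) / len(good_scores),
                    sum(x <= s for x in bad_scores) / len(bad_scores)))
    auroc = sum((x2 - x1) * (y1 + y2) / 2
                for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return 2 * auroc - 1

print(gini_from_scores([3, 4], [1, 2]))  # perfect separation: GC = 1
```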
Lift curve and Accuracy Ratio (AR)
The lift curve looks similar to the ROC curve (it originated in marketing) but with subtle differences. It plots F(s|B), the proportion of bads rejected, against F(s), the proportion of the whole population rejected.
The ideal scorecard gives the path ABC, where B is at p_B, the proportion of bads in the population; a random scorecard is given by the diagonal AC. The curve therefore depends on the population odds, BUT
Accuracy ratio AR = (area between curve and diagonal) / (area between ideal curve and diagonal)
and AR = GC (even though the curves are different).
Somers' D-concordance statistic and its relationship with the Gini coefficient
The area under the curve can be interpreted as the probability, D_S, that if a good and a bad are chosen at random, the good has the higher score. Consider the expectation of the variable which is +1 if the good's score > the bad's score, −1 if the bad's score > the good's score, and 0 if the scores are the same:

D_S = ∫ F(s|B) f(s|G) ds − ∫ (1 − F(s|B)) f(s|G) ds
    = 2 ∫ F(s|B) f(s|G) ds − 1 = 2·AUROC − 1 = GC

This D_S is known as Somers' D-concordance statistic. A useful way of calculating GC is via the Mann-Whitney U statistic. With g goods and b bads, let S(G) (S(B)) be the sum of the rankings of the goods (bads).
Example: if the scores in increasing order are (B G B G G G), then S(G) = 2+4+5+6 = 17; S(B) = 1+3 = 4; g = 4, b = 2.
AUROC = U/gb = [S(G) − ½g(g+1)]/gb = (17 − 10)/(4·2) = 0.875
D_S = GC = 2·AUROC − 1 = 0.75
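The Mann-Whitney calculation above, reproduced as code (scores are assumed distinct, so no tie handling is included):

```python
def somers_d(good_scores, bad_scores):
    """Somers' D (= GC) via the Mann-Whitney U statistic."""
    g, b = len(good_scores), len(bad_scores)
    ordered = sorted(good_scores + bad_scores)
    s_g = sum(ordered.index(x) + 1 for x in good_scores)  # S(G): ranks of the goods
    u = s_g - g * (g + 1) / 2
    auroc = u / (g * b)
    return 2 * auroc - 1

# The (B G B G G G) example: S(G) = 17, U = 7, AUROC = 0.875, D_S = 0.75
print(somers_d([2, 4, 5, 6], [1, 3]))
```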
A very simple bound on the Gini coefficient and its powerful consequences
Assume the scorecard has monotonically increasing marginal odds (reasonable); then the ROC curve is concave.
Take any point E = (g, b) = (F(s|G), F(s|B)) on the curve, and let F = (g, g) be the point on the diagonal directly below it. Then:
Area of triangle AFE = (b − g)·g/2; area of triangle FEC = (b − g)·(1 − g)/2.
The area of the two triangles, (b − g)/2, is less than the area between the curve and the diagonal, which is GC/2. So

GC ≥ (b − g) for any point on the curve.
A very simple bound on the Gini coefficient and its powerful consequences (continued)
GC ≥ (b − g) for any point (g, b) = (F(s|G), F(s|B)) on the curve.
Take (g, b) to be at the score s which maximises F(s|B) − F(s|G): then GC ≥ KS.
Good cards satisfy the 50-10 rule (pick up 50% of the bads within the first 10% of the goods), so the curve passes through (0.1, 0.5) and the 50-10 rule gives GC ≥ 0.4.
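The bound GC ≥ KS can be checked numerically by computing both quantities from the same empirical ROC points; a small sketch with hypothetical score samples:

```python
def gini_and_ks(good_scores, bad_scores):
    """GC by the trapezium rule and KS as the largest vertical gap,
    both from the same empirical ROC points."""
    pts = [(0.0, 0.0)]
    for s in sorted(set(good_scores) | set(bad_scores)):
        pts.append((sum(x <= s for x in good_scores) / len(good_scores),
                    sum(x <= s for x in bad_scores) / len(bad_scores)))
    ks = max(b - g for g, b in pts)
    auroc = sum((x2 - x1) * (y1 + y2) / 2
                for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return 2 * auroc - 1, ks

gc, ks = gini_and_ks([520, 555, 560, 580, 600, 640], [480, 500, 515, 560])
assert gc >= ks  # the triangle bound
```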
Segmentation and discrimination
In reality a scoring system consists of not just one scorecard but a suite of scorecards built on different segments of the population. Reasons for segmentation:
- system/data constraints: new accounts vs older accounts;
- policy issues: want young people, so a different scorecard for the under-25s;
- significant interactions between variables; and others.
Usually one only calculates the discrimination measure for each scorecard separately, but it should be done for the whole system after the scorecards have been calibrated onto a common scale.
How much of the discrimination is due to the scorecards and how much to the segmentation?
Measuring the power of the segmentation in scorecards
Measure the segmentation's power by taking the segments and choosing scorecards which discriminate no better than random within each segment: give borrower j in segment i a score of s_i + εu_j, where u_j has a uniform distribution on [0,1] and s_{i+1} > s_i + ε.
Results are given for the two-segment case but extend to the k-segment case.
Assume segment 1 has g_1 goods and b_1 bads with score s_1, and segment 2 has g_2 goods and b_2 bads with score s_2, and that segment 1 has the worse odds: g_1/b_1 < g_2/b_2.
Define n_1 = g_1 + b_1; n_2 = g_2 + b_2;
p_1^b = b_1/(b_1 + b_2); p_2^b = b_2/(b_1 + b_2); p_1^g = g_1/(g_1 + g_2); p_2^g = g_2/(g_1 + g_2);
p_1^t = n_1/(n_1 + n_2); p_2^t = n_2/(n_1 + n_2).
Impact of segmentation on the Gini coefficient
AEC is the ROC curve for the segmented/random-scorecard system. E is the point (p_1^g, p_1^b), so by the same triangle result as before,

GC = p_1^b − p_1^g

Example from behavioural scorecards, where one can segment on whether the account has ever been in arrears or not:
- in arrears: 830 goods, 1760 bads;
- never in arrears: 7170 goods, 440 bads.
Segmentation-only Gini = 1760/2200 − 830/8000 = 0.8 − 0.104 = 0.696. The actual Gini was 0.88.
Even if one does not segment, this gives a view of how much of the Gini this characteristic brings to the scorecard. It is like using the (approximate) D-concordance to decide which variables to choose.
It also explains why behavioural scores have higher Ginis than application scores. In the example, suppose the Gini for the never-in-arrears segment is like that of an application score, say GC, and assume one cannot distinguish goods from bads among those in arrears. Then, provided the arrears score is low enough that the curve goes through E,

behavioural Gini = 0.696 + 0.179·GC  (where 0.179 = p_2^b p_2^g)

So an application GC of 0.45 gives a behavioural GC of 0.78.
[Figure: the ROC curve given by the segmentation only, the straight lines from A to E to C.]
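The segmentation-only Gini is just p_1^b − p_1^g; a sketch reproducing the arrears example, including the behavioural-Gini estimate:

```python
def segmentation_gini(g1, b1, g2, b2):
    """Gini from the segmentation alone (random scorecard within each
    segment); segment 1 is the lower-scoring, worse-odds segment."""
    p1_b = b1 / (b1 + b2)   # share of the bads falling in segment 1
    p1_g = g1 / (g1 + g2)   # share of the goods falling in segment 1
    return p1_b - p1_g

gc_seg = segmentation_gini(830, 1760, 7170, 440)            # about 0.696
behavioural = gc_seg + (440 / 2200) * (7170 / 8000) * 0.45  # about 0.78
```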
Impact of segmentation on the KS statistic
For the scorecard built just from the segments, it is easy to show that KS is maximised at the end of the first segment:

KS = p_1^b − p_1^g

Note that for the two-segment segmentation-only scorecard, KS = GC.
Behavioural example: in arrears 830 goods, 1760 bads; never in arrears 7170 goods, 440 bads.
Segmentation-only KS = 1760/2200 − 830/8000 = 0.8 − 0.104 = 0.696. The actual KS was 0.75.
[Figure: cumulative probabilities F(s|B) and F(s|G) against score, jumping to p_1^b and p_1^g at the end of the first segment.]
Impact of segmentation on the Divergence and Mahalanobis distance
Mahalanobis distance for the segmentation-only scorecard, where all of segment 1 get score s_1 and all of segment 2 get s_2 (ε → 0):

μ_G = p_1^g s_1 + p_2^g s_2;  μ_B = p_1^b s_1 + p_2^b s_2
σ_G² = p_1^g p_2^g (s_2 − s_1)²;  σ_B² = p_1^b p_2^b (s_2 − s_1)²
σ̃² = (n_G σ_G² + n_B σ_B²)/n = (π_G p_1^g p_2^g + π_B p_1^b p_2^b)(s_2 − s_1)²

where π_G = n_G/n and π_B = n_B/n are the population proportions of goods and bads, so

D_M(σ̃) = (p_1^b − p_1^g) / √(π_G p_1^g p_2^g + π_B p_1^b p_2^b)

If instead we take the overall sample variance σ_t² = p_1^t p_2^t (s_2 − s_1)²,

D_M(σ_t) = (p_1^b − p_1^g) / √(p_1^t p_2^t)

Example (behavioural score): in arrears 830 goods, 1760 bads; never in arrears 7170 goods, 440 bads.
p_1^b = 0.8; p_2^b = 0.2; p_1^g = 0.10375; p_2^g = 0.89625; π_B = 0.216; π_G = 0.784;
σ_B² = 0.16(s_2 − s_1)²; σ_G² = 0.093(s_2 − s_1)²; σ̃² = 0.107(s_2 − s_1)²; σ_t² = 0.189(s_2 − s_1)²
D_M(σ̃) = 2.12; D_M(σ_t) = 1.60. The actual Mahalanobis distance was 2.40.
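The segmentation-only Mahalanobis distance needs only the four counts, since the (s_2 − s_1) factors cancel; a sketch reproducing the example:

```python
import math

def segmentation_mahalanobis(g1, b1, g2, b2):
    """D_M for the segmentation-only scorecard, using the within-class
    pooled variance; the (s2 - s1) factors cancel."""
    n_g, n_b = g1 + g2, b1 + b2
    p1_g, p1_b = g1 / n_g, b1 / n_b
    var_g = p1_g * (1 - p1_g)   # sigma_G^2 / (s2 - s1)^2
    var_b = p1_b * (1 - p1_b)   # sigma_B^2 / (s2 - s1)^2
    pooled = (n_g * var_g + n_b * var_b) / (n_g + n_b)
    return (p1_b - p1_g) / math.sqrt(pooled)

print(segmentation_mahalanobis(830, 1760, 7170, 440))  # about 2.12
```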
Impact of segmentation on the Divergence and Mahalanobis distance (continued)
Divergence for the segmentation-only scorecard, where all of segment 1 get score s_1 and all of segment 2 get s_2 (ε → 0), using the normal approximation

D = ½ (1/σ_G² + 1/σ_B²)(μ_G − μ_B)² + ½ (σ_G²/σ_B² + σ_B²/σ_G²) − 1

gives

D = [ (p_1^g p_2^g + p_1^b p_2^b)(p_1^b − p_1^g)² + (p_1^g p_2^g − p_1^b p_2^b)² ] / ( 2 p_1^g p_2^g p_1^b p_2^b )

Example (behavioural score): in arrears 830 goods, 1760 bads; never in arrears 7170 goods, 440 bads.
For the segmentation just by itself D = 4.273. The actual D was 5.761 (but calculated using the equal-variance approximation).
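Evaluating this closed-form expression with the example's proportions; again the (s_2 − s_1) factors cancel, so only the four counts are needed:

```python
def segmentation_divergence(g1, b1, g2, b2):
    """Normal-approximation Divergence for the segmentation-only scorecard."""
    p1_g, p1_b = g1 / (g1 + g2), b1 / (b1 + b2)
    a = p1_g * (1 - p1_g)     # sigma_G^2 / (s2 - s1)^2
    c = p1_b * (1 - p1_b)     # sigma_B^2 / (s2 - s1)^2
    m = (p1_b - p1_g) ** 2    # (mu_G - mu_B)^2 / (s2 - s1)^2
    return ((a + c) * m + (a - c) ** 2) / (2 * a * c)

print(segmentation_divergence(830, 1760, 7170, 440))  # about 4.27
```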
Conclusions
- There are connections between the different ways of measuring the discrimination of scorecards.
- The ROC curve is the most fundamental of the measures (it does not depend on the population odds) and encompasses KS and the D-concordance statistic.
- The very simple triangle bound gives a quick indication of GC, and shows GC ≥ KS.
- The triangle bound is the actual value if one segments with only a random scorecard in each segment. This allows one to recognise how much discrimination is built into the segmentation alone, independent of the scorecards then built on it.
- Even if one does not segment, it gives the importance of the variable considered for segmentation within a full-population scorecard.