Measuring the Discrimination Quality of Suites of Scorecards: ROCs, Ginis, Bounds and Segmentation


Measuring the Discrimination Quality of Suites of Scorecards: ROCs, Ginis, Bounds and Segmentation

Lyn C Thomas, Quantitative Financial Risk Management Centre, University of Southampton, UK
CSCC X, Edinburgh, August 2007

Outline
- How to measure scorecards
- Measures of discrimination
  - Divergence: expectations of weight-of-evidence functions
  - Kolmogorov-Smirnov: difference in distribution functions
  - ROC curves: comparison of distribution functions / business measures
  - Gini coefficient / D-concordance statistic
  - Simple bound for the ROC curve
- Relationship between the measures of discrimination
- Segmenting and measures of segmentation
  - Why segment and build different scorecards on each segment
  - How much of the discrimination is due to the segmentation and how much to the scorecard?
- Some examples from behavioural scorecards

Measuring scorecards in credit scoring
Three different aspects of a scorecard's performance that one might want to measure:
- Discriminatory power (uses only the scorecard): how good is the system at separating the two classes, goods and bads?
  - Divergence statistic
  - Mahalanobis distance
  - Somers' D-concordance statistic
  - Kolmogorov-Smirnov statistic
  - ROC curve
  - Gini coefficient
- Calibration of probability forecast (uses the scorecard and the population odds): not used much until the Basel requirements, and so few tests
  - Chi-square (Hosmer-Lemeshow) test
  - Binomial and Normal tests
- Categorical prediction error (uses the scorecard, the population odds and the cut-off score): this requires the scorecard and the cut-off score, so one can implement the decisions and see how many erroneous classifications there are
  - Error rates by tables and swap sets
  - Hypothesis tests

Divergence
- Introduced by Kullback; the continuous version of the Information Value.
- Let f(s|G) and f(s|B) be the density functions of the scores of the goods (G) and the bads (B) in a scorecard. Divergence is then defined by

  Divergence = D = ∫ (f(s|G) − f(s|B)) log( f(s|G) / f(s|B) ) ds = ∫ (f(s|G) − f(s|B)) w(s) ds

  where w(s) is the weight of evidence at score s. It is like E_goods[w(s)] − E_bads[w(s)].
- D ≥ 0, and D = 0 ⇔ f(s|G) = f(s|B); D → ∞ as the scores of goods and bads cease to overlap.
- Really one can only calculate the divergence by splitting the scores into bands. If the bands i ∈ I have g_i goods and b_i bads, with Σ g_i = n_G and Σ b_i = n_B, then

  Information Value = IV = Σ_{i∈I} (g_i/n_G − b_i/n_B) ln( (g_i/n_G) / (b_i/n_B) )
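The banded Information Value formula above can be sketched directly in code. A minimal Python sketch; the band counts are made up for illustration:

```python
import math

def information_value(goods, bads):
    """IV = sum_i (g_i/n_G - b_i/n_B) * ln((g_i/n_G)/(b_i/n_B)) over score bands."""
    n_g, n_b = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        pg, pb = g / n_g, b / n_b
        woe = math.log(pg / pb)      # weight of evidence for this band
        iv += (pg - pb) * woe
    return iv

# hypothetical counts of goods and bads in four score bands (lowest to highest)
goods = [100, 300, 600, 1000]
bads  = [ 80,  60,  40,   20]
print(round(information_value(goods, bads), 3))   # 1.516
```

Note that each term (pg − pb)·woe is non-negative, since pg − pb and ln(pg/pb) always share a sign, which is why D ≥ 0.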

Mahalanobis distance and relationship with Divergence
- If the goods have total n_G, mean µ_G and variance σ_G², and the bads have total n_B, mean µ_B and variance σ_B², then assuming a common variance

  σ² = (n_G σ_G² + n_B σ_B²) / (n_G + n_B)

  the Mahalanobis distance is D_M = (µ_G − µ_B)/σ. This is what discriminant analysis maximises.
- If we assume f(s|B) and f(s|G) are normal, Divergence reduces to

  D = ½ (σ_G² − σ_B²)(1/σ_B² − 1/σ_G²) + ½ (µ_G − µ_B)² (1/σ_G² + 1/σ_B²)

- If σ_G = σ_B, then D = D_M².

[Figure: probability densities of the scores of the goods and the bads, showing the difference between the two distributions.]
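These two formulas, and the D = D_M² reduction under equal variances, can be checked with a short Python sketch (the means, variances and counts below are made-up illustrative values):

```python
import math

def mahalanobis(mu_g, mu_b, var_g, var_b, n_g, n_b):
    """D_M = (mu_G - mu_B)/sigma with sigma^2 the count-weighted pooled variance."""
    var = (n_g * var_g + n_b * var_b) / (n_g + n_b)
    return abs(mu_g - mu_b) / math.sqrt(var)

def divergence_normal(mu_g, mu_b, var_g, var_b):
    """Divergence of two normal score distributions."""
    return (0.5 * (var_g - var_b) * (1 / var_b - 1 / var_g)
            + 0.5 * (mu_g - mu_b) ** 2 * (1 / var_g + 1 / var_b))

# equal variances: divergence reduces to the square of the Mahalanobis distance
d_m = mahalanobis(1.0, 0.0, 0.25, 0.25, 100, 100)   # 1/0.5 = 2.0
d   = divergence_normal(1.0, 0.0, 0.25, 0.25)       # 4.0 = d_m**2
print(d_m, d)
```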

Kolmogorov-Smirnov statistic
- Not a difference in expectations but the maximum difference between the distribution functions F(s|G) and F(s|B):

  KS = max_s | F(s|G) − F(s|B) |

[Figure: the two probability distribution functions F(s|B) and F(s|G) plotted against score, with the K-S distance the largest vertical gap between them.]

Kolmogorov-Smirnov statistic
- The problem with KS is that it describes the situation at the optimal separating score, i.e. where the marginal good-bad odds equal the overall good-bad odds. This is usually much higher than any cut-off score.
- Strong relationship between KS and sensitivity and specificity:

  F(s|B) − F(s|G) = sensitivity + specificity − 1

  and KS = max_s ( F(s|B) − F(s|G) ) = max_s ( sensitivity + specificity − 1 )
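Computing KS from two score samples amounts to scanning the cut-offs and taking the largest gap between the two empirical distribution functions. A minimal Python sketch on made-up scores (bads tend to score lower, so the signed gap F(s|B) − F(s|G) attains the maximum):

```python
def ks_statistic(good_scores, bad_scores):
    """Largest gap F(s|B) - F(s|G) over all candidate cut-off scores s."""
    cuts = sorted(set(good_scores) | set(bad_scores))
    def cdf(sample, s):
        return sum(1 for x in sample if x <= s) / len(sample)
    return max(cdf(bad_scores, s) - cdf(good_scores, s) for s in cuts)

# hypothetical scores: goods generally higher than bads, with some overlap
goods = [4, 5, 6, 7, 8, 9]
bads  = [1, 2, 3, 5, 6, 7]
print(ks_statistic(goods, bads))   # 0.5, attained at cut-off s = 3
```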

ROC curve
- Receiver Operating Characteristic curve. KS plots the two functions F(s|G), F(s|B) against score; the ROC curve plots the functions against each other.
- Each point corresponds to a cut-off score s: the vertical axis is F(s|B), the % of bads below that cut-off; the horizontal axis is F(s|G), the % of goods below that cut-off.
- An ideal scorecard gives the path A→B→C through the point (0,1); the diagonal AC is like picking scores at random.
- The Gini coefficient (GC) asks how close to ideal the scorecard is.

[Figure: ROC curve from A to C, plotting F(s|B) against F(s|G).]

ROC curve relationship with business measures
- As one goes down the curve from top right to bottom left, the cut-off score decreases, so more are accepted: the volume of the portfolio increases.
- As the curve goes down, the number of bads accepted increases: losses increase.
- The profit at cut-off score s (R the profit on a good, D the loss on a bad) is

  R p_G (1 − F(s|G)) − D p_B (1 − F(s|B))

  so the isobars of equal profit satisfy R p_G F(s|G) + D p_B F(s|B) = constant.
- As one moves to the north-west across these isobars, profit increases.

[Figure: ROC curve annotated with volume increasing down the curve, losses increasing, and profits increasing towards the north-west.]

Gini coefficient
- The Gini coefficient, GC, is 2 × the area between the curve and the diagonal, i.e. the ratio of that area to the area of the triangle ABC.
- If GC = 1 there is perfect discrimination; GC = 0 means no discrimination.
- AUROC is the area under the ROC curve, so GC = 2(AUROC − 0.5) = 2·AUROC − 1.
- K-S is the greatest vertical distance from the diagonal to the curve.

[Figure: ROC curve with the area between the curve and the diagonal shaded.]

Lift curve and Accuracy Ratio (AR)
- The lift curve looks similar to the ROC curve (it originated in marketing), but there are subtle differences.
- It plots F(s|B), the % of bads rejected, against F(s), the % of all applicants rejected.
- An ideal scorecard gives A→B→C (B sits at p_B, the proportion of bads in the population); a random scorecard is given by the diagonal AC.
- The curve depends on the population odds, BUT the accuracy ratio, AR = (area between curve and diagonal)/(area of ABC), satisfies AR = GC (even though the curves are different).

Somers' D-concordance statistic and relationship with the Gini coefficient
- The area under the curve can be interpreted as the probability that if a good and a bad are chosen at random, the good will have the higher score.
- Consider the expectation of the variable which is 1 if the good's score > the bad's score; −1 if the bad's score > the good's score; 0 if the scores are the same:

  D_S = ∫ F(s|B) f(s|G) ds − ∫ (1 − F(s|B)) f(s|G) ds = 2 ∫ F(s|B) f(s|G) ds − 1 = 2·AUROC − 1 = GC

  This D_S is known as Somers' D-concordance statistic.
- A useful way of calculating GC is via the Mann-Whitney U. If there are g goods and b bads, let S(G) (S(B)) be the sum of the rankings of the goods (bads).
- Example: scores in increasing order (B G B G G G); S(G) = 2+4+5+6 = 17; S(B) = 4; g = 4, b = 2.

  AUROC = U/gb = [S(G) − ½g(g+1)]/gb = (17 − 10)/8 = 0.875
  D_S = GC = 2·AUROC − 1 = 2U/gb − 1 = 0.75
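The Mann-Whitney calculation above is easy to script. A minimal Python sketch that reproduces the slide's worked example (the ordering B G B G G G is a reading of the garbled example, recovered from the stated rank sums):

```python
def auroc_from_ranks(ordered_labels):
    """AUROC via Mann-Whitney U from class labels listed in increasing score
    order; assumes all scores are distinct (no ties)."""
    g = ordered_labels.count('G')
    b = ordered_labels.count('B')
    s_g = sum(rank for rank, lab in enumerate(ordered_labels, start=1) if lab == 'G')
    u = s_g - g * (g + 1) / 2        # Mann-Whitney U statistic
    return u / (g * b)

# slide example: ranks of goods are 2,4,5,6 so S(G)=17, U=7, AUROC=7/8
auroc = auroc_from_ranks(['B', 'G', 'B', 'G', 'G', 'G'])
gini = 2 * auroc - 1                 # Somers' D / Gini coefficient
print(auroc, gini)                   # 0.875 0.75
```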

A very simple bound on the Gini and its powerful consequences
- Assume the scorecard has monotonically increasing marginal odds (reasonable), so the ROC curve is concave.
- Take any point E = (g, b) = (F(s|G), F(s|B)) on the curve, and let F = (g, g) be the point on the diagonal directly below it.
- Area of triangle AFE = (b − g)·g/2; area of triangle FEC = (b − g)·(1 − g)/2.
- So the area of the two triangles, (b − g)/2, is less than the area from the curve to the diagonal, which is GC/2.
- Hence GC ≥ b − g for any point on the curve.

[Figure: concave ROC curve through E, with the triangles AFE and FEC inscribed between the curve and the diagonal AC.]

A very simple bound on the Gini and its powerful consequences
- GC ≥ b − g, where (g, b) = (F(s|G), F(s|B)) is any point on the curve.
- Take (g, b) at the score s which maximises F(s|B) − F(s|G): then GC ≥ KS.
- Good cards satisfy the 50-10 rule (pick up 50% of the bads in the first 10% of the goods); the 50-10 rule implies GC ≥ 0.4.
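The triangle bound and the GC ≥ KS consequence can be checked numerically by building the empirical ROC polygon. A minimal Python sketch on made-up score samples (the trapezoid rule gives the empirical AUROC):

```python
def roc_points(goods, bads):
    """Empirical ROC points (F(s|G), F(s|B)) over all cut-off scores."""
    cuts = sorted(set(goods) | set(bads))
    pts = [(0.0, 0.0)]
    for s in cuts:
        g = sum(1 for x in goods if x <= s) / len(goods)   # F(s|G)
        b = sum(1 for x in bads if x <= s) / len(bads)     # F(s|B)
        pts.append((g, b))
    return pts

def gini(goods, bads):
    """GC = 2*AUROC - 1, with AUROC from the trapezoid rule on the ROC polygon."""
    pts = roc_points(goods, bads)
    auroc = sum((g2 - g1) * (b1 + b2) / 2
                for (g1, b1), (g2, b2) in zip(pts, pts[1:]))
    return 2 * auroc - 1

goods = [4, 5, 6, 7, 8, 9]
bads  = [1, 2, 3, 5, 6, 7]
gc = gini(goods, bads)
# b - g at any ROC point is a lower bound; its maximum is the KS statistic
triangle_bound = max(b - g for g, b in roc_points(goods, bads))
print(gc >= triangle_bound)   # True: GC >= KS
```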

Segmentation and discrimination
- In reality a scoring system consists of not just one scorecard but a suite of scorecards built on different segments of the population.
- Reasons for segmentation:
  - System/data constraints: new accounts vs older accounts
  - Policy issues: want young people, so a different scorecard for the under-25s
  - Significant interactions between variables; and others
- Usually one only calculates the discrimination measure for each scorecard separately, but it should be done for the whole system after the scorecards have been calibrated onto a common scale.
- How much of the discrimination is due to the scorecards and how much to the segmentation?

Measuring the power of segmentation in scorecards
- Measure the segmentation power by taking the segments and choosing scorecards which discriminate no better than random within each segment.
- Give borrower j in segment i a score of s_i + ε u_j, where u_j has a uniform distribution on [0,1] and s_{i+1} > s_i + ε.
- Results are given for the two-segment case but extend to the k-segment case.
- Assume segment 1 has g_1 goods and b_1 bads and score s_1, and segment 2 has g_2 goods and b_2 bads and score s_2, and assume g_1 b_2 < g_2 b_1 (segment 1 has the higher bad odds).
- Define n_1 = g_1 + b_1; n_2 = g_2 + b_2;

  p_1^b = b_1/(b_1 + b_2); p_2^b = b_2/(b_1 + b_2); p_1^g = g_1/(g_1 + g_2); p_2^g = g_2/(g_1 + g_2);
  p_1^t = n_1/(n_1 + n_2); p_2^t = n_2/(n_1 + n_2)

Impact of segmentation on the Gini coefficient
- AEC is the ROC curve for the segmented/random scorecard. E is (p_1^g, p_1^b), so by the same triangle result as before,

  GC = p_1^b − p_1^g

- Example from behavioural scorecards, where one can segment on whether the account has ever been in arrears or not:
  - In arrears: 830 goods, 160 bads. Never in arrears: 7170 goods, 40 bads.
  - Segment Gini = 160/200 − 830/8000 = 0.8 − 0.104 = 0.696. The actual Gini was 0.88.
- Even if there is no segmentation, this gives a view of how much of the Gini this characteristic brings to the scorecard. It is like using the (approximate) D-concordance to decide which variables to choose.
- It explains why behavioural scores have higher Ginis than application scores. In the example, suppose the Gini for the never-in-arrears segment is like that for an application score, say GC, and assume one cannot distinguish goods from bads among those in arrears; then provided the arrears score is small enough that the curve goes through E,

  Behavioural Gini = 0.696 + 0.179·GC

  So an application GC of 0.45 gives a behavioural GC of 0.78.

[Figure: ROC curve given by the segmentation only.]
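The two-segment Gini formula GC = p_1^b − p_1^g can be sketched in a few lines of Python, reproducing the behavioural example from the slide:

```python
def two_segment_gini(g1, b1, g2, b2):
    """Gini of a 'segments only' scorecard (random within each segment);
    segment 1 is the lower-scoring, higher-bad-odds segment."""
    p1_b = b1 / (b1 + b2)    # share of all bads in segment 1
    p1_g = g1 / (g1 + g2)    # share of all goods in segment 1
    return p1_b - p1_g

# behavioural example: in arrears (830 goods, 160 bads),
# never in arrears (7170 goods, 40 bads)
print(round(two_segment_gini(830, 160, 7170, 40), 3))   # 0.696
```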

Impact of segmentation on the KS statistic
- For the scorecard built just from the segments, it is easy to show that KS is maximised at the end of the first segment:

  KS = p_1^b − p_1^g

- Note that for the two-segment scorecard, KS = GC.
- Behavioural example: in arrears 830 goods, 160 bads; never in arrears 7170 goods, 40 bads.
  - Segment KS = 160/200 − 830/8000 = 0.8 − 0.104 = 0.696. The actual KS was 0.75.

[Figure: cumulative probabilities F(s|B) and F(s|G) against score, jumping to p_1^b and p_1^g at the end of the first segment.]

Impact of segmentation on Divergence and Mahalanobis distance
- Mahalanobis distance for the segments-only scorecard, where all of segment 1 get score s_1 and all of segment 2 get score s_2 (ε → 0):

  µ_B = p_1^b s_1 + p_2^b s_2;  µ_G = p_1^g s_1 + p_2^g s_2
  σ_B² = p_1^b p_2^b (s_2 − s_1)²;  σ_G² = p_1^g p_2^g (s_2 − s_1)²
  σ̃² = (n_G σ_G² + n_B σ_B²)/n = ( (n_G/n) p_1^g p_2^g + (n_B/n) p_1^b p_2^b ) (s_2 − s_1)²

  D_M(σ̃) = (p_1^b p_2^g − p_1^g p_2^b) / √( (n_G/n) p_1^g p_2^g + (n_B/n) p_1^b p_2^b )

- If instead we take the overall sample variance σ² = p_1^t p_2^t (s_2 − s_1)²:

  D_M(σ) = (p_1^b p_2^g − p_1^g p_2^b) / √( p_1^t p_2^t )

- Example (behavioural score): in arrears 830 goods, 160 bads; never in arrears 7170 goods, 40 bads.

  p_1^b = 0.8; p_2^b = 0.2; p_1^g = 0.10375; p_2^g = 0.89625; p_1^t = 0.1207; p_2^t = 0.8793
  σ_B² = 0.16 (s_2 − s_1)²; σ_G² = 0.093 (s_2 − s_1)²
  σ̃ = 0.307 (s_2 − s_1); σ = 0.326 (s_2 − s_1)
  D_M(σ̃) = 2.26; D_M(σ) = 2.14

- The actual Mahalanobis distance was 2.40.
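The segments-only Mahalanobis distance can be computed from the four counts alone, since the (s_2 − s_1) factor cancels. A Python sketch reproducing the slide's numbers:

```python
import math

def segment_mahalanobis(g1, b1, g2, b2, pooled=True):
    """Mahalanobis distance of a two-segment, segments-only scorecard.
    pooled=True uses the class-weighted variance sigma-tilde; pooled=False
    uses the overall sample variance p1_t * p2_t."""
    n_g, n_b = g1 + g2, b1 + b2
    n = n_g + n_b
    p1b, p2b = b1 / n_b, b2 / n_b
    p1g, p2g = g1 / n_g, g2 / n_g
    num = p1b * p2g - p1g * p2b          # equals p1b - p1g
    if pooled:
        var = (n_g / n) * p1g * p2g + (n_b / n) * p1b * p2b
    else:
        p1t, p2t = (g1 + b1) / n, (g2 + b2) / n
        var = p1t * p2t
    return num / math.sqrt(var)

# behavioural example from the slide
print(round(segment_mahalanobis(830, 160, 7170, 40), 2))                 # 2.26
print(round(segment_mahalanobis(830, 160, 7170, 40, pooled=False), 2))   # 2.14
```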

Impact of segmentation on Divergence and Mahalanobis distance
- Divergence for the segments-only scorecard, where all of segment 1 get score s_1 and all of segment 2 get score s_2 (ε → 0):

  D = ½ (σ_G² − σ_B²)(1/σ_B² − 1/σ_G²) + ½ (µ_G − µ_B)² (1/σ_G² + 1/σ_B²)
    = [ (p_1^g p_2^g + p_1^b p_2^b)(p_1^b p_2^g − p_1^g p_2^b)² + (p_1^g p_2^g − p_1^b p_2^b)² ] / ( 2 p_1^b p_2^b p_1^g p_2^g )

- Example (behavioural score): in arrears 830 goods, 160 bads; never in arrears 7170 goods, 40 bads.
  - For the segmentation just by itself, D = 4.27.
  - The actual D = 5.761 (but computed using the equal-variance approximation).
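The closed-form divergence above also depends only on the four counts. A Python sketch checking the slide's example:

```python
def segment_divergence(g1, b1, g2, b2):
    """Divergence of a two-segment, segments-only scorecard (normal approximation)."""
    p1b, p2b = b1 / (b1 + b2), b2 / (b1 + b2)
    p1g, p2g = g1 / (g1 + g2), g2 / (g1 + g2)
    num = ((p1g * p2g + p1b * p2b) * (p1b * p2g - p1g * p2b) ** 2
           + (p1g * p2g - p1b * p2b) ** 2)
    return num / (2 * p1b * p2b * p1g * p2g)

# behavioural example: segmentation by itself carries most of the divergence
print(round(segment_divergence(830, 160, 7170, 40), 2))   # 4.27
```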

Conclusions
- There are connections between the different ways of measuring the discrimination of scorecards.
- The ROC curve is the most fundamental of the measures (it does not depend on the population odds) and includes KS and D-concordance.
- The very simple triangle bound gives a quick indication of GC, and shows GC ≥ KS.
- The triangle bound is the actual value if one only segments, with a random scorecard in each segment.
- This allows one to recognise how much discrimination is built into the segmentation alone, independent of the scorecards then built.
- Even if there is no segmentation, it gives the importance of the variable considered for segmentation in the full-population scorecard.