Predicting Customer Default Times using Survival Analysis Methods in SAS Bart Baesens Bart.Baesens@econ.kuleuven.ac.be
Overview
- The credit scoring survival analysis problem
- Statistical methods for survival analysis
- Neural networks for survival analysis
- Empirical setup and results
- Conclusions
The credit scoring survival analysis problem (1)
- Traditional credit scoring systems aim to decide on the creditworthiness of applicants using characteristics such as age, marital status, and amount on savings account
- The problem is usually tackled using classification techniques, e.g. logistic regression, neural networks, and decision trees
[Figure: example decision tree with splits on Income > $50,000, Job > 3 Years, and High Debt, and leaves labelled Good Risk or Bad Risk]
The credit scoring survival analysis problem (2)
- But: time to default is also very important:
  - to decide upon the length of the loan
  - for debt provisioning purposes
  - to decide upon an increase or decrease of the credit limit
  - to monitor a client's repayment behaviour
- Traditional classification techniques are not appropriate for this problem (censored data)
- Use survival analysis methods originating from medicine
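To make the censoring issue concrete, here is a minimal Python sketch of how loan records might be turned into survival data. The `Loan` class, its fields, and the 36-month study window are hypothetical and do not come from the slides. A loan that has not defaulted by the end of the study is not a "good" loan in the classification sense; we only know that its default time exceeds the observed time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Loan:
    # Hypothetical record layout, for illustration only.
    start_month: int               # month in which the loan was granted
    default_month: Optional[int]   # month of default, or None if none is known

def to_survival_record(loan: Loan, study_end_month: int) -> Tuple[int, bool]:
    """Return (observed time in months, event flag) for one loan.

    The event flag is True when the default was observed inside the study
    window and False when the observation is censored at the end of the study.
    """
    if loan.default_month is not None and loan.default_month <= study_end_month:
        return loan.default_month - loan.start_month, True
    return study_end_month - loan.start_month, False

loans: List[Loan] = [Loan(0, 14), Loan(6, None), Loan(10, 40)]
records = [to_survival_record(loan, study_end_month=36) for loan in loans]
print(records)
```

The third loan does default, but only after the study window closes, so it enters the data as a censored observation of 26 months.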
Basic concepts of survival analysis
- Estimate the distribution of failure times f(t)
- Two mathematically equivalent functions:
  - Survival function S(t): $S(t) = P(T \ge t) = \int_t^{\infty} f(u)\,du$, with $S(0) = 1$ and $S(\infty) = 0$
  - Hazard function h(t): $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$, with $h(t) = f(t)/S(t)$
- Censored observations
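The relations between f(t), S(t), and h(t) can be checked numerically. The sketch below assumes an exponential failure-time distribution with an arbitrary rate of 0.05 per month. Neither the distribution nor the rate comes from the slides. They are chosen only because the exponential case has a constant hazard that is easy to verify.

```python
import math

RATE = 0.05  # assumed constant default rate per month (illustrative only)

def density(t: float) -> float:
    """f(t) for an exponential failure-time distribution."""
    return RATE * math.exp(-RATE * t)

def survival(t: float) -> float:
    """Closed-form S(t) for the same distribution."""
    return math.exp(-RATE * t)

def survival_by_integration(t: float, upper: float = 400.0,
                            steps: int = 100_000) -> float:
    """Approximate the integral of f from t to 'upper' with the midpoint rule."""
    width = (upper - t) / steps
    return sum(density(t + (i + 0.5) * width) for i in range(steps)) * width

def hazard_by_limit(t: float, dt: float = 1e-6) -> float:
    """Approximate P(t <= T < t + dt | T >= t) / dt for a small dt."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)

t = 12.0
print(survival(0.0))                            # S(0)
print(survival(t), survival_by_integration(t))  # S(t) two ways
print(density(t) / survival(t), hazard_by_limit(t))  # h(t) two ways
```

Both routes to S(t) and both routes to h(t) should agree up to numerical error, and the hazard should equal the assumed rate at every t.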
Statistical techniques for survival analysis
- Kaplan-Meier analysis: $S(t) = \prod_{k:\, t_k < t} \frac{n_k - d_k}{n_k}$
- Parametric methods
- No explanatory variables
- Proportional hazards regression: $h_i(t) = h_0(t)\, e^{\beta^T x_i}$
  - Proportional hazards
  - Partial likelihood method to estimate β
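As an illustration of the Kaplan-Meier product, here is a small pure-Python sketch. It is not the SAS implementation. It assumes that n_k is the number of customers still at risk at event time t_k and d_k the number of defaults at t_k, and the toy data are invented for the example.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (t_k, S just after t_k) for every distinct event time t_k."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    s = 1.0
    for t_k in event_times:
        n_k = sum(1 for t in times if t >= t_k)                      # at risk at t_k
        d_k = sum(1 for t, e in zip(times, events) if e and t == t_k)  # defaults at t_k
        s *= (n_k - d_k) / n_k
        curve.append((t_k, s))
    return curve

def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the step function: product over all event times t_k < t."""
    s = 1.0
    for t_k, s_k in curve:
        if t_k < t:
            s = s_k
    return s

# Invented toy data: observed months and whether a default was observed.
times = [3, 5, 5, 8, 10, 12]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 6))
```

Censored customers (event flag False) never reduce the estimate directly, but they leave the risk set after their observed time, which is how the estimator uses the partial information they carry.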
Statistical techniques for survival analysis in SAS
- Use SAS/STAT: proc lifetest, proc phreg, proc genmod
- Supports Kaplan-Meier analysis, parametric survival analysis methods, and proportional hazards regression
- Variety of test statistics to test, among other things, the significance of inputs and the proportionality assumption
- Advanced procedures for partial likelihood estimation
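For readers who want to see what partial likelihood estimation optimises, the following Python sketch evaluates the Cox partial log-likelihood for a given β. It is a simplified illustration, not a description of how proc phreg works internally. It assumes no tied event times, and the data are invented.

```python
import math
from typing import Sequence

def partial_log_likelihood(beta: Sequence[float],
                           times: Sequence[float],
                           events: Sequence[bool],
                           covariates: Sequence[Sequence[float]]) -> float:
    """Cox partial log-likelihood, assuming no tied event times.

    Each observed default at time t_i contributes
    beta'x_i - log(sum of exp(beta'x_j) over customers j with t_j >= t_i).
    The baseline hazard h0(t) cancels out and never has to be specified.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored customers only appear in risk sets
        risk_sum = sum(math.exp(scores[j])
                       for j, t_j in enumerate(times) if t_j >= t_i)
        total += scores[i] - math.log(risk_sum)
    return total

# Invented toy data with a single binary covariate.
times = [2, 4, 6, 9]
events = [True, False, True, True]
covariates = [[1.0], [0.0], [1.0], [0.0]]

ll_zero = partial_log_likelihood([0.0], times, events, covariates)
ll_one = partial_log_likelihood([1.0], times, events, covariates)
print(ll_zero, ll_one)
```

With β = 0 every customer in a risk set is equally likely to default next, so the value reduces to minus the sum of the logs of the risk-set sizes. An estimation routine would search for the β that maximises this function.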
[Figure panels: Kaplan-Meier; Proportional hazards]
Neural networks for survival analysis (1)
- Drawbacks of statistical survival analysis models:
  - Functional form of the inputs remains linear or some mild extension thereof
  - Interaction and non-linear terms need to be specified by the user
  - Baseline hazard is assumed to be uniform and proportional across the entire population
Neural networks for survival analysis (2)
- Neural networks are mathematical models inspired by the functioning of the human brain
- Universal approximation property
- Strong non-linear modelling capability
- Advanced training algorithms for determining weights
- Good generalisation capability
- But: not equipped for survival analysis as standard
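Neural networks for survival analysis (3)

| Method                | Multiple NN | Single output | Monotone survival curve | Censoring | Scalable | Time-varying covariates |
|-----------------------|-------------|---------------|-------------------------|-----------|----------|-------------------------|
| Direct classification | N           | Y             | N                       | N         | N        | Y                       |
| Ohno-Machado          | Y           | Y             | N                       | Y         | Y        | N                       |
| Ravdin and Clark      | N           | Y             | N                       | Y         | Y        | N                       |
| Biganzoli et al.      | N           | Y             | Y                       | Y         | Y        | N                       |
| Liestol et al.        | N           | N             | N                       | Y         | Y        | Y                       |
| Faraggi               | N           | Y             | Y                       | Y         | Y        | Y                       |
| Street                | N           | N             | N                       | Y         | N        | Y                       |
| Mani                  | N           | N             | Y                       | Y         | N        | Y                       |
| Brown                 | N           | N             | Y                       | Y         | N        | Y                       |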
Neural networks for Survival analysis in SAS (1)
Empirical setup and results
- Data set obtained from a major Benelux financial company, consisting of 20,630 observations and 28 inputs
- Compared phreg models with Mani NNs using SAS software
- Results in terms of MSE and MAD suggest the presence of non-linearities
- Cluster hazard curves and detect customers with similar risk patterns using clustering methods in SAS Enterprise Miner
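The slides do not define the two error measures or say which quantities were compared. As a rough sketch only, the Python below assumes that MSE is the mean squared error and MAD the mean absolute deviation between predicted and observed values. The numbers are invented and are not results from the study.

```python
from typing import Sequence

def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def mad(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute deviation (assumed meaning of MAD on the slide)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

# Invented values, for illustration only.
predicted = [0.9, 0.7, 0.4]
observed = [1.0, 0.5, 0.5]
print(mse(predicted, observed), mad(predicted, observed))
```

MSE penalises large individual errors more heavily than MAD, so reporting both gives a slightly fuller picture of how two models differ.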
Conclusions
- Neural networks are powerful for predicting customer default times
- Both neural networks and statistical survival analysis methods are efficiently implemented in SAS Enterprise Miner and SAS/STAT
- Future research: implement the Faraggi method in SAS