Predicting Customer Default Times using Survival Analysis Methods in SAS




Bart Baesens
Bart.Baesens@econ.kuleuven.ac.be

Overview
- The credit scoring survival analysis problem
- Statistical methods for survival analysis
- Neural networks for survival analysis
- Empirical setup and results
- Conclusions

The credit scoring survival analysis problem (1)
- Traditional credit scoring systems aim at deciding upon the creditworthiness of applicants using characteristics such as age, marital status, amount on savings account, ...
- The problem is usually tackled using classification techniques, e.g. logistic regression, neural networks, decision trees, ...
[Figure: example decision tree with Yes/No splits on Income > $50,000, Job > 3 Years, and High Debt, ending in Good Risk and Bad Risk leaves]

The credit scoring survival analysis problem (2)
- But: time to default is also very important:
  - to decide upon the length of time of the loan
  - for debt provisioning purposes
  - to decide upon an increase or decrease of the credit limit
  - to monitor a client's repayment behaviour
- Traditional classification techniques are not appropriate for this problem (censored data)
- Use survival analysis methods originating from medicine
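To make the censoring point concrete, here is a minimal Python sketch (not SAS, and not taken from the slides) of how a loan record could be turned into a duration and an event flag. The function name, the month-based time scale, and the observation cutoff are illustrative assumptions.

```python
def to_survival_record(start_month, default_month, cutoff_month):
    """Return (duration, event) for one loan.

    event = 1: default observed before the cutoff, duration is the time to default.
    event = 0: censored, we only know the loan survived at least `duration` months.
    All argument names are hypothetical, chosen for this illustration.
    """
    if default_month is not None and default_month <= cutoff_month:
        return default_month - start_month, 1
    # No default seen within the observation window: right-censored at the cutoff.
    return cutoff_month - start_month, 0


# Three hypothetical loans observed until month 36.
records = [
    to_survival_record(0, 14, 36),     # defaulted after 14 months
    to_survival_record(10, None, 36),  # still repaying at the cutoff
    to_survival_record(5, 40, 36),     # defaults only after the cutoff
]
```

A plain good/bad classifier would have to either drop the censored loans or label them as "good"; keeping the pair (duration, event) is what lets survival methods use them.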

Basic concepts of survival analysis
- Estimate the distribution of failure times, f(t)
- Two mathematically equivalent functions:
  - Survival function S(t): $S(t) = P(T > t) = \int_t^{\infty} f(u)\,du$, with $S(0) = 1$ and $S(\infty) = 0$
  - Hazard function h(t): $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$
- Censored observations
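The relations between f(t), S(t) and h(t) can be checked numerically. The sketch below assumes a Weibull failure-time distribution, chosen only because its density and survival function have simple closed forms; the shape and scale values are arbitrary and do not come from the slides.

```python
import math


def weibull_density(t, shape, scale):
    """f(t) for a Weibull distribution (illustrative choice of distribution)."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_survival(t, shape, scale):
    """Closed-form S(t) = P(T > t)."""
    return math.exp(-((t / scale) ** shape))


def hazard(t, shape, scale):
    """h(t) = f(t) / S(t)."""
    return weibull_density(t, shape, scale) / weibull_survival(t, shape, scale)


def survival_by_integration(t, shape, scale, upper=600.0, steps=100_000):
    """Approximate S(t) as the integral of f from t to a large upper limit (trapezoid rule)."""
    width = (upper - t) / steps
    total = 0.5 * (weibull_density(t, shape, scale) + weibull_density(upper, shape, scale))
    for i in range(1, steps):
        total += weibull_density(t + i * width, shape, scale)
    return total * width


# Hypothetical parameters: time in months, shape 1.5, scale 24.
s_closed = weibull_survival(12.0, 1.5, 24.0)
s_numeric = survival_by_integration(12.0, 1.5, 24.0)
h_12 = hazard(12.0, 1.5, 24.0)
```

With a shape above 1 the hazard increases over time; a shape of exactly 1 gives the constant hazard of the exponential distribution.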

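Statistical techniques for survival analysis
- Kaplan-Meier analysis: $\hat{S}(t) = \prod_{k:\, t_k < t} \frac{n_k - d_k}{n_k}$, where $n_k$ is the number of customers still at risk at time $t_k$ and $d_k$ the number of defaults at $t_k$
- Parametric methods
- No explanatory variables
- Proportional hazards regression: $h_i(t) = h_0(t)\, e^{\beta^T x_i}$
  - Proportional hazards
  - Partial likelihood method to estimate $\beta$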
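The Kaplan-Meier product above can be written out in a few lines. This is a plain Python sketch rather than the proc lifetest implementation; it follows the strict inequality t_k < t used on the slide, and the small data set is invented for illustration.

```python
from collections import Counter


def kaplan_meier_steps(times, events):
    """Return a list of (t_k, n_k, d_k) for each distinct observed default time.

    times:  observed durations (default time or censoring time)
    events: 1 if the duration ends in a default, 0 if it is censored
    """
    defaults = Counter(t for t, e in zip(times, events) if e)
    steps = []
    for t_k in sorted(defaults):
        n_k = sum(1 for t in times if t >= t_k)  # still at risk just before t_k
        steps.append((t_k, n_k, defaults[t_k]))
    return steps


def survival_at(steps, t):
    """Kaplan-Meier estimate: product of (n_k - d_k) / n_k over all t_k < t."""
    s = 1.0
    for t_k, n_k, d_k in steps:
        if t_k < t:
            s *= (n_k - d_k) / n_k
    return s


# Invented example: six loans, two of them censored (event = 0).
times = [3, 5, 5, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1]
steps = kaplan_meier_steps(times, events)
```

Censored loans never contribute a factor of their own, but they stay in the risk set n_k until their censoring time, which is how the estimator uses the partial information they carry.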

Statistical techniques for survival analysis in SAS
- Use SAS/STAT: proc lifetest, proc phreg, proc genmod
- Supports Kaplan-Meier analysis, parametric survival analysis methods, and proportional hazards regression
- Variety of test statistics to test, among other things, the significance of inputs, the proportionality assumption, ...
- Advanced procedures for partial likelihood estimation
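For readers who want to see what partial likelihood estimation optimises, the following Python sketch evaluates the Cox partial log-likelihood for a given coefficient vector. It is not SAS code and not how proc phreg is implemented; it assumes no tied default times (with ties it reduces to the Breslow approximation), and the data are invented.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Partial log-likelihood: sum over defaults i of
    beta'x_i - log( sum over j in risk set of exp(beta'x_j) ).
    """
    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    loglik = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored observations only appear in risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(linear_predictor(covariates[j])) for j in risk_set)
        loglik += linear_predictor(covariates[i]) - math.log(denom)
    return loglik


# Invented example: three loans, one binary covariate, last loan censored.
times = [2, 4, 6]
events = [1, 1, 0]
covariates = [[1.0], [0.0], [0.0]]
ll_null = cox_partial_loglik([0.0], times, events, covariates)
ll_alt = cox_partial_loglik([math.log(2.0)], times, events, covariates)
```

The baseline hazard h0(t) cancels out of every term, which is why β can be estimated without specifying it. A numerical optimiser would maximise this function over β.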

[Figure panels: Kaplan-Meier; Proportional hazards]

Neural networks for survival analysis (1)
Drawbacks of statistical survival analysis models:
- Functional form of the inputs remains linear or some mild extension thereof
- Interaction and non-linear terms need to be specified by the user
- Baseline hazard is assumed to be uniform and proportional across the entire population

Neural networks for survival analysis (2)
- Neural networks are mathematical models inspired by the functioning of the human brain
- Universal approximation property
- High non-linear modelling capability
- Advanced training algorithms for determining weights
- Good generalisation capability
- But: not equipped for survival analysis by default

Neural networks for survival analysis (3)

Method                 Multiple NN  Single output  Monotone survival curve  Censoring  Scalable  Time-varying covariates
Direct classification  N            Y              N                        N          N         Y
Ohno-Machado           Y            Y              N                        Y          Y         N
Ravdin and Clark       N            Y              N                        Y          Y         N
Biganzoli et al.       N            Y              Y                        Y          Y         N
Liestol et al.         N            N              N                        Y          Y         Y
Faraggi                N            Y              Y                        Y          Y         Y
Street                 N            N              N                        Y          N         Y
Mani                   N            N              Y                        Y          N         Y
Brown                  N            N              Y                        Y          N         Y

Neural networks for Survival analysis in SAS (1)

Empirical setup and results
- Data set obtained from a major Benelux financial company, consisting of 20,630 observations and 28 inputs
- Compared phreg models with Mani NNs using SAS software
- Results in terms of MSE and MAD suggest the presence of non-linearities
- Cluster hazard curves and detect customers with similar risk patterns using clustering methods in SAS Enterprise Miner
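The slides do not say exactly how MSE and MAD were computed. As a rough sketch, assuming MSE means mean squared error and MAD means mean absolute deviation between predicted and observed values, the two measures could be calculated as below; the numbers are made up and are not the study's results.

```python
def mean_squared_error(predicted, observed):
    """Average of squared differences (assumed meaning of MSE)."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def mean_absolute_deviation(predicted, observed):
    """Average of absolute differences (assumed meaning of MAD)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Made-up predicted and observed default times in months.
predicted = [10.0, 14.0, 20.0]
observed = [12.0, 14.0, 17.0]
mse = mean_squared_error(predicted, observed)
mad = mean_absolute_deviation(predicted, observed)
```

MSE penalises large misses more heavily than MAD, so reporting both gives a rough picture of whether errors are spread evenly or dominated by a few cases.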

Conclusions
- Neural networks are powerful for predicting customer default times
- Both neural networks and statistical survival analysis methods are efficiently implemented in SAS Enterprise Miner and SAS/STAT
- Future research: implement the Faraggi method in SAS