Study Unit 5 Business Analytics and Credit Scoring ANL 309 Business Analytics Applications
Introduction This unit covers the process of credit scoring, the role of business analytics in credit scoring, and the methods of logistic regression and decision trees.
Constructing a Credit Scoring Model The construction of a credit or behavioural scoring model may be broadly broken down into the following phases: (1) Defining Risky Customers; (2) Data Gathering and Analysis; (3) Scorecard Generation; (4) Implementation and Credit Risk Strategy.
Defining Risky Customers First, define clearly which customers the institution would classify as risky. The definition depends on the overall risk that the institution is willing to expose itself to, coupled with its profitability expectations. This is the basis on which the entire scoring framework is built.
Data Gathering and Analysis Next, identify all the dimensions that have an effect on the customer's propensity to default. Gather all the available information related to the past credit behaviour of the customers. Perform the necessary data mining analysis to determine the significant relationships between the demographic and behavioural dimensions and the customer's propensity to default.
Scorecard Generation Scorecards can be generated using various data mining techniques, such as logistic regression, which is a parametric statistical technique, or neural networks, which are a non-parametric technique. A scorecard is a linear combination of the various attributes, with an appropriate weight assigned to each of them.
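The idea of a scorecard as a weighted linear combination can be sketched in a few lines of Python. The attribute names, weights and base points below are hypothetical values chosen for illustration; in practice the weights would come from a fitted model such as a logistic regression.

```python
# Hypothetical attribute weights (points per unit); not taken from any real scorecard.
WEIGHTS = {
    "years_at_address": 4.0,
    "num_late_payments": -25.0,
    "has_savings_account": 30.0,
}
BASE_POINTS = 500.0  # hypothetical starting score


def score(applicant):
    """Return the base points plus the weighted sum of the applicant's attributes."""
    return BASE_POINTS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


applicant = {"years_at_address": 5, "num_late_payments": 2, "has_savings_account": 1}
print(score(applicant))
```

Each attribute contributes its value multiplied by its weight, so a negative weight (here, late payments) lowers the score and a positive weight raises it.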
Implementation and Credit Risk Strategy Once every customer in the system is assigned a credit score, banks or lending institutions will re-formulate credit policies and operational strategies based on the portfolio. For example, customers with higher credit scores will enjoy better rates, tenures and faster approvals compared to the others. The institution may also decide to deny credit to customers who have very low credit scores.
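A score-based credit policy of this kind might be expressed as a simple set of cut-offs. The sketch below is only an illustration: the cut-off values and the outcome labels are assumptions, and each institution would set its own according to its risk appetite.

```python
def credit_decision(score, decline_below=450, preferred_from=650):
    """Map a credit score to a policy outcome using hypothetical cut-offs."""
    if score < decline_below:
        return "decline"                  # very low scores are denied credit
    if score >= preferred_from:
        return "approve: preferred rate"  # higher scores receive better terms
    return "approve: standard rate"


for s in (400, 500, 700):
    print(s, credit_decision(s))
```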
Credit Scoring and Business Analytics There are a number of business analytics methods that can be applied in credit scoring. Two popular methods are logistic regression and decision trees.
Model Variability Validity monitoring is conducted to ensure that the model differentiates or slopes behaviour in a way that is consistent with the business needs and expectations. When the performance or slope has degraded significantly, it indicates that the business needs are not being served and corrective measures must be taken. Validity monitoring should be viewed as the final defense mechanism because it identifies model failures after they have occurred.
Model Stability Stability assessment for new models begins three months after their first use in production. For existing models, assessment occurs on a quarterly basis. Population stability is assessed via the Population Stability Index (PSI) and a score distribution report. The PSI is calculated by comparing a benchmark score distribution with the most recent score distribution.
Population Stability The Population Stability Index (PSI) calculations are performed monthly on the Small Business Card population used to score the SL02 and NA01 models. The PSI value indicates whether the population is stable or whether there are significant shifts in the population. As such, model breakdown or data inconsistencies can be easily detected.
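The PSI comparison of a benchmark score distribution with a recent one can be sketched as follows. The slides do not give the formula, so this uses the standard definition, PSI = sum over score bands of (recent% - benchmark%) * ln(recent% / benchmark%). The band counts and the small `eps` floor (used to avoid taking the logarithm of zero) are assumptions made for illustration.

```python
import math


def psi(benchmark_counts, recent_counts, eps=1e-6):
    """Population Stability Index over matching score bands.

    Both arguments are lists of record counts per score band, in the same order.
    """
    b_total = sum(benchmark_counts)
    r_total = sum(recent_counts)
    total = 0.0
    for b, r in zip(benchmark_counts, recent_counts):
        b_pct = max(b / b_total, eps)  # floor empty bands so log() is defined
        r_pct = max(r / r_total, eps)
        total += (r_pct - b_pct) * math.log(r_pct / b_pct)
    return total


# Hypothetical counts for five score bands.
benchmark = [100, 200, 400, 200, 100]
recent = [150, 250, 350, 150, 100]
print(round(psi(benchmark, recent), 4))
```

A PSI of zero means the two distributions are identical, and larger values indicate a larger shift. A commonly quoted rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift, but the slides do not state thresholds, so these figures should be read as an assumption.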
Logistic Regression Logistic regression is similar to linear regression, except that the dependent variable is not continuous. The dependent variable is discrete or categorical, e.g. 1 = responded to an offer, 0 = did not respond to an offer; or 1 = defaulted on a loan, 0 = did not default on a loan.
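Because the outcome is 0 or 1, logistic regression models the probability of the outcome being 1 by passing a linear combination of the independent variables through the logistic function, p = 1 / (1 + exp(-z)). The sketch below shows that calculation; the intercept, coefficients and feature values are hypothetical and not fitted to any data.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Return P(y = 1) from the logistic function applied to a linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Two algebraically equivalent forms, chosen so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical model: intercept -2.0, one coefficient per feature.
p_default = predict_probability(-2.0, [0.8, -0.05], [3, 10])
print(round(p_default, 3))
```

The result always lies between 0 and 1, which is what allows it to be read as a probability of default or response.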
Logistic Regression: Assumptions The true conditional probabilities are a logistic function of the independent variables. No important variables are omitted. No extraneous variables are included. There is no measurement error in the independent variables. The observations are independent. The independent variables are not linear combinations of each other.
Decision Trees Decision trees are very popular in business analytics applications, mainly because they produce a visual model and generate rules that can be easily interpreted. The algorithm examines all possible questions that can split the data into segments that are nearly homogeneous in characteristics.
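The search over "all possible questions" can be illustrated for a single numeric variable. The sketch below tries every question of the form "is the value <= t?" and keeps the one that leaves the two resulting segments most homogeneous. Homogeneity is measured here with Gini impurity, which is one common choice and an assumption on our part; the function names and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the segment is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= t' split, or None."""
    best = None
    # Skip the largest value: splitting there would leave the right segment empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[1]:
            best = (t, impurity)
    return best


# Hypothetical data: number of late payments and whether the customer defaulted.
late_payments = [0, 1, 1, 2, 4, 5]
defaulted = [0, 0, 0, 0, 1, 1]
print(best_threshold(late_payments, defaulted))
```

A full tree algorithm would repeat this search over every variable and then recurse into each resulting segment.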
Types of Decision Trees There are many types of decision tree approaches, including C&RT, ID3, C4.5/C5 and CHAID. Their main difference is how they partition the data.
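As one example of a partitioning criterion, ID3 chooses splits by information gain, the reduction in entropy achieved by a split (C4.5 uses a normalised variant, C&RT typically uses Gini impurity, and CHAID uses chi-square tests). The sketch below computes information gain for a candidate split; the function names and the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of the class labels in a segment."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent segment minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
print(information_gain(parent, [[0, 0], [1, 1]]))  # split separates the classes
print(information_gain(parent, [[0, 1], [0, 1]]))  # split separates nothing
```

A split that produces pure segments recovers all of the parent's entropy, while a split that leaves each segment as mixed as the parent gains nothing.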
Decision Trees: Stopping Rule A decision tree algorithm will stop growing the tree when one of the following criteria is satisfied: (1) the segment contains only one record; (2) all records in the segment have identical characteristics; (3) the improvement is not substantial enough to warrant growing the tree further.
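The three stopping criteria can be written as a single check, sketched below. The representation of a record as a tuple of attribute values and the `min_improvement` threshold are assumptions made for illustration; real implementations expose their own parameters for this.

```python
def should_stop(records, best_improvement, min_improvement=0.01):
    """Return True if the segment should not be split any further.

    records: list of tuples of attribute values for the segment.
    best_improvement: impurity reduction offered by the best available split.
    """
    if len(records) <= 1:
        return True  # the segment contains only one record
    if all(r == records[0] for r in records):
        return True  # all records have identical characteristics
    return best_improvement < min_improvement  # the gain is too small to justify a split


print(should_stop([(25, "renter")], 0.30))
print(should_stop([(25, "renter"), (25, "renter")], 0.30))
print(should_stop([(25, "renter"), (40, "owner")], 0.001))
print(should_stop([(25, "renter"), (40, "owner")], 0.30))
```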
Over-fitting and Cross-validation Once the tree has grown to a certain size, depending on the stopping rule, it is also important to check the tree for over-fitting of the data. Cross-validation and test set validation may be applied.
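In k-fold cross-validation the data is divided into k parts, and each part takes a turn as the validation set while the model is trained on the rest. The sketch below only generates the index splits; the function name and the interleaved fold assignment are our own choices, and the model fitting step is left out.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Assign record i to fold i % k so the folds differ in size by at most one.
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


for train, validation in k_fold_indices(6, 3):
    print("train:", train, "validate:", validation)
```

A tree that scores much better on the training indices than on the validation indices across the folds is likely over-fitted and is a candidate for pruning.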