Mortgage Business Transformation Program Using CART-based Joint Risk Modeling
(and including a practical discussion of TreeNet)

Salford Systems Data Mining 2005, New York

Ali Moazami, Chief Credit Officer, MortgageIT Holdings, Inc.
Shaolin Li, VP, Financial Modeling Group, BlackRock Financial Management Corp.
Background

As part of an overall mortgage market of $10 trillion, US mortgage-backed securities account for $5.3 trillion of debt instruments. Since these securities include an embedded prepayment option, their accurate valuation requires projections of both prepayment and default risk. Over the years, Wall Street has developed sophisticated models to predict these risks. However, the available data has been limited to a small set of loan and borrower characteristics. With the broader availability of analytic systems, many other market players, including large mortgage banks, have recently launched concerted proprietary modeling efforts.
Vision

Several years ago, Chase Home Finance set out to use its much broader information set to develop a loan-level, prospective mortgage customer valuation system. The effort, called Project Oracle, was envisioned as a broad information-based business system aimed at transforming a multitude of functions and processes:

- Increasing the value of the franchise through a concerted focus on individual customer value
- Knowing and anticipating customers' needs
- Increasing customer loyalty through retention and cross-sell programs
- Enabling state-of-the-art database marketing which takes advantage of a segment of one
- Improving origination yields by providing incentives for the best potential customers
Loan-Level Resolution

One of Oracle's guiding principles was to leverage customer-level information to account for distributions of values rather than just group averages.

[Figure: Distribution of Serviced Loan Values. Conventional conforming FRM30 mortgages originated in 1993 with interest rates between 7% and 7.5%. Axis: serviced NPV / unpaid balance (bps), from 60 to 240, with the share of loans in each band. Annotation: average market value of servicing.]
Elements

Oracle included extensive statistical modeling of the drivers of prospective mortgage profitability:

- Comprehensive Historical Datamart
- Prepayment and Default Modeling
- Cash Flow Financial Valuation
- Response and Pull-Through Modeling
- Customer Segmentation

as well as concrete business programs to create competitive advantage in the mortgage market:

- Customer Retention & Loyalty
- Cross Selling
- Portfolio Management
- Database Marketing
- Information Brokerage
Data Elements

By using the broadest possible array of data sources, Oracle aimed to establish a complete profile of the mortgage customer or prospect:

- Historical Origination and Servicing Data
- Detailed Historical Credit Bureau Information
- Updated Collateral Values
- Economic Indicators
- Current Geo-Demographic and Census Data
- Competitive Data
- Program Response Information

Based on theoretical research, a rich array of derived fields was created to summarize time-series indicators of customer behavior and incentives and of interest rate movements.

[Diagram: data flow from source systems to data users. Box labels: PACS; LINCS; CAPS; Credit Bureau; Economic Indicators; Housing Values; Census; HP3000 QUEST; Market'g Lists; Program Response; Raw Oracle Superfile; Data Cleaning and formatting (Modcode); Oracle SuperFile; Demographic Cross-; Chase Scoring File; Modeling Sample; SuperFile Extract; Loan Valuation; Statistical Analysis; Data Users.]
Modeling Requirements

The design of the modeling approach included a description of desired features:

- Handle Complexity -> Use a Broad Set of Predictors and their Multitude of Interactions
- Interpretable -> Use Decision Trees
- Prospective -> Extend Trees to Hazard Modeling
- Allow Forecasting -> Use Time Varying Covariates
- Competing Risks -> Simultaneous Estimation of Default and Prepay
- Handle Missing Values -> Use Most Relevant Surrogates
- Handle Outliers -> Robust Non-parametric Modeling
Sample Selection

Oracle's development sample of over 600,000 loans was designed to optimize the use of information while ensuring full representation of portfolio flows. Sample selection was based on the loan-month concept, the basic observation unit of Oracle's predictive modeling:

- Exclude partial loan history to control the effect of relatively long periods where the borrower simply pays principal and interest.
- Include all loan-month records where the borrower either prepaid or defaulted to maximize information.

Actual records and their sample marks (original columns: Actual Record, G1 Sample, G2 Sample):

Loan #   Time     Status    Sample marks
123      Dec-97   Active    X
123      Jan-98   Active    X X
123      Feb-98   Active    X
123      Mar-98   Prepay    X X
456      Nov-97   Active
456      Dec-97   Active    X
456      Jan-98   Active
456      Feb-98   Prepay    X
789      Sep-97   Active    X
789      Oct-97   Active    X X
789      Nov-97   Default   X X

Annotations on the slide: More Information; More Representation; Fewer Repetitive Records.
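The loan-month selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the Oracle procedure: the slide does not say how Active months were chosen, so random thinning at a rate `active_rate` is an assumption, and the function name and record layout are hypothetical.

```python
import random

def sample_loan_months(records, active_rate=0.5, seed=0):
    """Keep every Prepay/Default loan-month; keep Active ones with probability active_rate.

    records: iterable of (loan_id, month, status) tuples (hypothetical layout).
    active_rate: assumed thinning rate for Active months, not taken from the slides.
    """
    rng = random.Random(seed)
    sample = []
    for loan_id, month, status in records:
        is_event = status in ("Prepay", "Default")
        if is_event or rng.random() < active_rate:
            sample.append((loan_id, month, status))
    return sample

# Loan-month records copied from the slide's example table.
records = [
    (123, "Dec-97", "Active"), (123, "Jan-98", "Active"),
    (123, "Feb-98", "Active"), (123, "Mar-98", "Prepay"),
    (456, "Nov-97", "Active"), (456, "Dec-97", "Active"),
    (456, "Jan-98", "Active"), (456, "Feb-98", "Prepay"),
    (789, "Sep-97", "Active"), (789, "Oct-97", "Active"),
    (789, "Nov-97", "Default"),
]
sample = sample_loan_months(records)
events_only = sample_loan_months(records, active_rate=0.0)
```

Thinning only the Active months keeps all of the rare terminal events while cutting down the long runs of near-identical records in which the borrower simply pays principal and interest.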
Predictive Modeling

Oracle implemented competing risks through a Triad of Binary Decision Trees. This innovation resulted in:

- Better classification accuracy
- More compact trees
- Vastly improved interpretability

[Diagram: loan-month attributes feed three binary trees, labeled A/P, A/D, and P/D, each with splits of the form "Attribute A < x". The trees produce the Active vs. Prepay, Active vs. Default, and Prepay vs. Default conditional probabilities, which are combined into the Active, Prepay, and Default probabilities.]
Valuation Engine and Ancillary Modeling

Project Oracle used a discounted cash flow model to turn customer behavior probabilities into a measure of loan and servicing value for each given economic forecast. Expected values were obtained by averaging these present values over a large number of simulated interest rate paths.

Furthermore, the main model was supplemented by a multitude of specialized models to further refine decisions:

- Default Reinstatement
- Cash-out Refinance
- Product Migration
- In-Market and Out-of-Market Migration
- Etc.
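The valuation step can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: the loan is treated as interest-only, `prepay_prob` is a hypothetical stand-in for the tree-based prepayment model, the default probability and recovery rate are constants, and rates follow a simple random walk rather than a calibrated term-structure model.

```python
import random

def prepay_prob(coupon, market_rate):
    """Hypothetical monthly prepayment probability that rises with the rate incentive."""
    incentive = max(0.0, coupon - market_rate)
    return min(0.10, 0.005 + 2.0 * incentive)

def value_on_path(balance, coupon, path, p_default=0.001, recovery=0.7):
    """Present value of expected cash flows along one monthly path of annual rates."""
    survival, discount, value = 1.0, 1.0, 0.0
    interest = balance * coupon / 12.0
    for rate in path:
        discount /= 1.0 + rate / 12.0
        p_pre = prepay_prob(coupon, rate)
        p_act = 1.0 - p_pre - p_default
        # Expected cash this month, given the loan was active at the start of the month.
        cash = (p_act * interest
                + p_pre * (balance + interest)
                + p_default * recovery * balance)
        value += survival * discount * cash
        survival *= p_act
    # Remaining balance is assumed to be repaid at the end of the horizon.
    return value + survival * discount * balance

def simulate_path(start_rate, months, vol, rng):
    """Random-walk rate path floored at zero (an illustrative assumption)."""
    path, rate = [], start_rate
    for _ in range(months):
        rate = max(0.0, rate + rng.gauss(0.0, vol))
        path.append(rate)
    return path

def expected_value(balance, coupon, start_rate, months=120, n_paths=500,
                   vol=0.002, seed=0):
    """Average the path-wise present values over simulated rate paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += value_on_path(balance, coupon,
                               simulate_path(start_rate, months, vol, rng))
    return total / n_paths

par_check = value_on_path(100000.0, 0.07, [0.07] * 120, p_default=0.0)
estimate = expected_value(100000.0, 0.07, 0.065)
```

With no default and a flat rate path equal to the coupon, `par_check` comes out at the loan balance, which is a quick sanity check on the discounting.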
Classification Rates

The Oracle models correctly classified three out of four actual prepayers. The default classification rates were also consistently around 90%.

FRM30 Correct Classification Rates, Development Sample (% of actuals correctly classified)

             Active   Prepay   Default
Oracle G1    51%      65%      90%
Oracle G2    58%      74%      91%

Short-Term ARM Correct Classification Rates, Development Sample (% of actuals correctly classified)

             Active   Prepay   Default
Oracle G1    51%      69%      89%
Oracle G2    57%      74%      92%
Comparative Gains Charts

Oracle was consistently excellent at predicting both prepayments and 90+ dpd mortgage delinquencies, even with stringent performance requirements.

[Figure: Prepayment Gains Chart, March-August 98. Y-axis: cumulative % of prepayers. X-axis: file depth (%), 0 to 100, sorted by decreasing predicted risk. Curves: Oracle G2, Rate Diff., Random. Labeled values: 72.1 and 54.0.]

[Figure: Default Gains Chart, March-August 98. Y-axis: cumulative % of actual defaulters. X-axis: file depth (%), 0 to 20, sorted by decreasing predicted risk. Curves: Oracle G2, Equifax TMS, Random. Labeled values: 80.8 and 70.0.]
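A gains chart of this kind plots, for each file depth, the share of all actual events found among the highest-scored loans. A minimal sketch of the calculation follows; the function name and the toy scores and outcomes are made up for illustration and are not Oracle data.

```python
def cumulative_gains(scores, events, depth_pct):
    """Percent of all actual events captured in the top depth_pct% of the file.

    scores: predicted risk per loan; events: 1 if the loan actually prepaid
    (or defaulted), else 0. The file is sorted by decreasing predicted risk.
    """
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    n_top = int(len(ranked) * depth_pct / 100)
    captured = sum(event for _, event in ranked[:n_top])
    total = sum(events)
    return 100.0 * captured / total if total else 0.0

# Toy file of ten loans (illustrative values only).
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

gains_at_20 = cumulative_gains(scores, events, 20)   # top 2 loans hold 2 of 4 events
gains_at_50 = cumulative_gains(scores, events, 50)   # top 5 loans hold 3 of 4 events
```

A random ordering would capture about depth_pct% of the events at each depth, which is the "Random" diagonal on the charts.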
Comparative Prepayment Rates

Oracle achieved an average annual prepayment rate of 44% for the top 10% of customers by predicted prepayment risk. This compares to 29% for the highest rate (differential) loans and 22% for the servicing file as a whole.

[Figure: Actual Cumulative Prepayment Rates by File Depth, March-August 98, annualized. Y-axis: cumulative annualized prepay rate (%), 0 to 60. X-axis: file depth (%), 0 to 100, sorted by decreasing predicted risk. Curves: Oracle G2, Rate Differential, Actual. Labeled values: 43.8 and 28.9.]
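The slide does not state how the six-month rates were annualized. One common convention, assumed here, compounds the survival rate over the year, in the same way a single monthly mortality (SMM) is converted to a conditional prepayment rate (CPR). The function name is hypothetical.

```python
def annualize(cumulative_rate, months):
    """Annualized rate from a cumulative rate observed over `months` months.

    Assumes the same survival rate compounds over the full year; with
    months=1 this is the usual SMM-to-CPR conversion.
    """
    return 1.0 - (1.0 - cumulative_rate) ** (12.0 / months)

# A 25% cumulative prepayment rate over six months annualizes to 43.75%.
six_month_example = annualize(0.25, 6)
```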
Selected Business Applications

Oracle's customer-level information was used to create a fundamentally new perspective for many banking functions:

Customer Relationship Management
- Pro-active streamlined refinance
- Preferred customer service in servicing
- Call center targeted cross-sell program

Portfolio Management
- Prepayment forecasting
- Hedging and managing loan and servicing portfolios
- Asset management opportunities from arbitraging loan-level versus market price differentials

Credit Risk Management
- Early Payment Default identification/loss mitigation program
- Loss severity applications
- Risk management enhancements
- Default servicing prioritization
Proactive Refinance & Streamlining

Oracle's information-based targeting proved to be particularly potent for a cost-advantaged direct-to-consumer approach to:

- Prospecting
- Customer retention
- Cross selling

The basic premise was to anticipate customer and prospect behavior in order to provide value-added propositions directly to high-potential-value customers.

Oracle's infrastructure also included a state-of-the-art prospect database as well as sophisticated targeting and response analyses.
Technical Issues in CART Application to Mortgage Prepayment and Default Modeling

Shaolin Li, Vice President, BlackRock Financial Management
Mortgage Modeling Challenges

Challenges in Loan-Level Prepayment Modeling
- Interaction detection (e.g. LTV and FICO interaction)
- Missing values and dirty data (missing income, inaccurate LTV)

Core CART Features
- Automatic interaction detection
- Handling of missing data through surrogates

[Figure: FRM 30 A/P sub-tree. Split rules shown: Payment Diff (incl closing) <= $30.39; Payment Diff (incl closing) <= $67.16; Age of loan <= 2.5 months; Payment Diff (no closing) <= $52.34; Mortgage Score <= 723.5; Avg Cum Pmt Diff <= $39.98; Current LTV <= 0.831; House Appreciation <= 6%; Local Unemployment <= 6.55. Terminal nodes are labeled Prepay or Active.]
Triad of Binary Decision Trees

Drivers of Prepayment and Default are different
- Prepayment: Rate Incentive, Cash-out Refi, Moving
- Default: FICO, Unemployment Rate, Consumer Payment Pattern

Simplicity of 2-Class Pairwise Tree: only one decision boundary requires attention
- Current vs. Prepay, Current vs. Default and Prepay vs. Default

Coupling the Pairwise Tree Probabilities to Form 3.

[Figure: FRM 30 and 15 A/D sub-tree. Split rules shown: Total Past Due/P&I <= 54.6%; Mortgage Score <= 533.5; 1+ Months Past Due; 1+ Serious Delinquency on Bureau; Mortgage Score <= 620.5; Total Past Due/P&I <= 96.6%; Total Past Due/P&I <= 117.4%. Terminal nodes are labeled Default or Active; one branch is marked (-STOP-).]
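The slides do not say which coupling rule was used. As a hedged illustration, the sketch below applies a simple closed-form rule that recovers the three class probabilities exactly when the pairwise probabilities are mutually consistent, and renormalizes otherwise. The function name, argument names, and the clamping constant `eps` are all assumptions.

```python
def couple_pairwise(p_active_vs_prepay, p_active_vs_default, p_prepay_vs_default,
                    eps=1e-9):
    """Combine three pairwise probabilities into (active, prepay, default).

    Each input is P(first class | loan is in one of the two named classes),
    e.g. p_active_vs_prepay = P(Active | Active or Prepay).
    """
    r = {
        ("A", "P"): p_active_vs_prepay,
        ("A", "D"): p_active_vs_default,
        ("P", "D"): p_prepay_vs_default,
    }
    # Clamp away from 0 and 1, then fill in the reverse direction r[j, i] = 1 - r[i, j].
    for (i, j), value in list(r.items()):
        value = min(max(value, eps), 1.0 - eps)
        r[(i, j)] = value
        r[(j, i)] = 1.0 - value
    raw = {}
    for i in "APD":
        inverse_sum = sum(1.0 / r[(i, j)] for j in "APD" if j != i)
        # With K = 3 classes, the correction term K - 2 equals 1.
        raw[i] = 1.0 / (inverse_sum - 1.0)
    total = sum(raw.values())
    return tuple(raw[c] / total for c in "APD")

# Pairwise inputs built from known class probabilities (0.90, 0.08, 0.02).
active, prepay, default = couple_pairwise(0.90 / 0.98, 0.90 / 0.92, 0.08 / 0.10)
```

When the three trees disagree with one another, no exact solution exists; the final renormalization is one simple way to force the outputs to sum to one.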
Prepayment Model Performance

[Figure: Prepayment Gains Chart, March-August 98. Y-axis: cumulative % of prepayers. X-axis: file depth (%), 0 to 100, sorted by decreasing predicted risk. Curves: Oracle G2, Rate Diff., Random. Labeled values: 72.1 and 54.0.]
Default Model Performance

[Figure: Default Gains Chart, March-August 98. Y-axis: cumulative % of actual defaulters. X-axis: file depth (%), 0 to 20, sorted by decreasing predicted risk. Curves: Oracle G2, Equifax TMS, Random. Labeled values: 80.8 and 70.0.]
Power of TreeNet: Loan Reinstatement Forecasting

- The model predicts the reinstatement likelihood of a loan that is already 90 days delinquent.
- FICO is of no help here. The interaction of payment pattern and current LTV determines loan reinstatement.
- TreeNet provides much better forecasts than the hybrid CART-Logit model.