Innovative Analytics for Traditional, Social, and Text Data. Dr. Gerald Fahner, Senior Director Analytic Science, FICO

Similar documents
SCORES OVERVIEW. TransUnion Scores

New Predictive Analytics for Measuring Consumer Capacity for Incremental Credit

FICO Score Factors Guide

Classification of Bad Accounts in Credit Card Industry

Application of SAS! Enterprise Miner in Credit Risk Analytics. Presented by Minakshi Srivastava, VP, Bank of America

The Truth About Credit Repair

Understanding Credit Scoring

DEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.

The Predictive Data Mining Revolution in Scorecards:

SAS Rule-Based Codebook Generation for Exploratory Data Analysis Ross Bettinger, Senior Analytical Consultant, Seattle, WA

How To Make A Credit Risk Model For A Bank Account

To Score or Not to Score?

Understanding Credit BY Sallie Mae and FICO

MANAGING CREDIT101 TM %*'9 [[[ EPXEREJGY SVK i

10/27/14. Consumer Credit Risk Management. Tackling the Challenges of Big Data Big Data Analytics. Andrew W. Lo

IMPROVE YOUR CREDIT SCORE. Effective Credit Management At Your Fingertips

IMPORTANCE OF CREDIT HISTORY AND SUCCESSFUL SAVING

Summary. WHITE PAPER Using Segmented Models for Better Decisions

Credit Score Secrets Revealed

Afni deploys predictive analytics to drive milliondollar financial benefits

Learning to Make Good Loans to Members with Bad Credit

CREDIT REPORTS WHAT EVERY CONSUMER SHOULD KNOW ABOUT MORTGAGE EQUITY P A R T N E R S

FICO Score Factors Guide - TransUnion

Benchmarking of different classes of models used for credit scoring

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

HOW PREDICTIVE ANALYTICS DRIVES PROFITABILITY IN ASSET FINANCE

Reason Statement Full Description Actions You Can Take or Keep this in mind

WHITEPAPER. How to Credit Score with Predictive Analytics

Predictive Analytics for Donor Management

Tom Aliff Vice President, Modeling and Analytics Martin O Connor Senior Vice President, Global Analytics

Loan Lessons. The Low-Down on Loans, Interest and Keeping Your Head Above Water. Course Objectives Learn About:

Understanding Your Credit Score

Your Credit Score What It Means to You as a Prospective Home Buyer

Why Credit is Important

Solving the Credit Puzzle. L G & W Federal Credit Union

Summary. January 2013»» white paper

Understanding Your Credit Score and How You Can Improve It

Your Money Matters! Financial Literacy Teacher Guide. Thanks to TD for helping us bring this resource to schools for free.

Lending 101 The Basics

NOTES. 12 Your Credit Score Your Credit Score 1 CONTENTS:

Risk Management of the financial Sector. Periklis Dontas Executive Director, Calculus Global Services Aybike Aker - Senior Consultant, FICO

GREENPATH FINANCIAL WELLNESS SERIES

Loan Lessons. The Low-Down on Loans, Interest and Keeping Your Head Above Water. Course Objectives Learn About:

Understanding Your Credit Score

Smarter digital banking with big data

Predicting borrowers chance of defaulting on credit loans

Maximizing Return and Minimizing Cost with the Decision Management Systems

Credit Scoring RATE

Understanding your Credit Score

How to improve your FICO Score in perilous times By Blair Ball. National Distribution of FICO Scores

Your Credit Score What It Means to You as a Prospective Home Buyer

Behavior Model to Capture Bank Charge-off Risk for Next Periods Working Paper

DNBi Risk Management. Unparalleled Data Insight to Drive Profitable Growth. Insights from Data. Relationships from Insights

TERM SHEET FOR THE UNSECURED BUSINESS LINE OF CREDIT

Business Analytics and Credit Scoring

Empowering Brokers to Identify and Combat Mortgage Fraud

PNC Financial Education FINANCIAL OPS. Banking Solutions for Military Personnel. Credit: Self study Guide

not possible or was possible at a high cost for collecting the data.

Investor Presentation Final Results 12 MONTHS ENDED 30 June 2012

The Road to Good. How to Start Building Your credit with In-House Auto Financing

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

High-Performance Scorecards. Best practices to build a winning formula every time

Credit ~ The Basics Participant s Guide

Introduction. Purpose. Student Introductions. Agenda and Ground Rules. Objectives

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Conquering Big Data Challenges Big Data is Here for Financial Services

SM4-1: Credit Card Application (version 1)

Comply and Compete: Model Management Best Practices

How do I get good credit?

Battling for Balances: Targeting the Lucrative Card-to-Card Consumer Balance-Transfer Market. An Experian Perspective

Introduction The History of Credit Scoring Why Your Credit Score is So Important

Unlocking Value from. Patanjali V, Lead Data Scientist, Tiger Analytics Anand B, Director Analytics Consulting,Tiger Analytics

Automotive Services. Tools for dealers, lenders and industry service providers that drive profitable results in today s economy

An effective approach to preventing application fraud. Experian Fraud Analytics

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

8 Ways to Avoid or Stop Foreclosure...

CREDIT BASICS About your credit score

Predictive Analytics: Turn Information into Insights

Driving Insurance World through Science Murli D. Buluswar Chief Science Officer

Assessment Schedule 2013 Accounting: Prepare financial information for an entity that operates accounting subsystems (91176)

Cool Tools for PROC LOGISTIC

$uccessful Start and the Office of Student Services Present: THE WINNING SCORE

Transcription:

Innovative Analytics for Traditional, Social, and Text Data Dr. Gerald Fahner, Senior Director Analytic Science, FICO

Hot Trends in Predictive Analytics Big Data the Fuel is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Machine Learning - the Engine Gartner is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to and often overlaps with computational statistics; a discipline that also specializes in prediction-making. Wikipedia

Domain Expert the Driver Balance Number Crunching With Expertise Big Data and Machine Learning can help you to Predict consumer behavior more accurately To unlock this value requires domain expertise to build Comprehensible models for justifiable decisions Big datadriven learning Domain expertise Deeper Insights - Stronger Predictions Better Decisions

Case Studies Informing Origination Risk Score Development by Machine Learning Improving Insurance Fraud Model With Social Network Variables Evaluating Predictive Power of Text Data for a Peer Lending Network

Machine Learning Models Can Beat Simple Models by Substantial Margins % Tree Ensemble Model % Bads Rejected 5% Logistic Regression Home Equity Loan portfolio At % Goods rejected: Logreg rejects 63% Bads Tree Ensemble Model rejects 85% Bads 5% % % Goods Rejected

Anatomy of a Tree Ensemble Model Training Data Tree Ensemble Model Tree Prediction Function Outcomes Tree 2 Combine predictions from 2 trees Score Scored!? New case Predictors Tree 2 Predictors Random Forest Gradient Boosting

Demonstration Problem Outcomes 5-5 - 2 - -2-2 - 2 Noisy training samples Predictive relationship from which data were generated ( ground truth ) Predictors

Stochastic Gradient Boosting st Tree Tree 5 Weighted Average -5-2 - -2-2 - 2

Stochastic Gradient Boosting 5 Trees Tree 5 Tree 5 Weighted Average -5-2 - -2-2 - 2

Stochastic Gradient Boosting 2 Trees Tree 5 Weighted Average -5 Tree 2-2 - -2-2 - 2

Direct Inspection Yields a Black Box 9 9 3 2 2 3 3 2 3 2 2 4 3 5 3 2 2 3 4 4 4 5 5 5 3 4 2 4 4 5 3 2 5 5 6 3 2 2 3 7 6 2 3 6 6 3 2 43 2 7 7 7 5 4 2 3 6 6 3 5 2 3 6 2 7 5 7 7 8 9 8 8 9 9 8 6 3 2 7 6 3 2 8 9 2 8 7 3 6 8 9 7 9 8 9 8 8 9

Useful Diagnostic Information Variable Importance Relative to most important variable Interaction Test Statistics Whether a variable interacts with other variables DEBTINC DELINQ VALUE CLAGE DEROG LOAN CLNO MORTDUE NINQ JOB YOJ REASON..49.45.4.34.25.24.23.23.2.2.7 DEBTINC VALUE CLNO CLAGE DELINQ YOJ MORTDUE LOAN DEROG JOB.29.22.4.3..8.7.7.6.6

Input/Output Simulation Create Deeps Insight Into Black Box Sweep through range of ( x x2... x P Condition values of all other predictors to data ) Black box model (already trained)? Score as function of 8 6 4 2-2 -4-6 -8-2 -.5 - -.5.5.5 2 x

Practical Pitfalls Hindering Deployment of Machine Learning Models Non-intuitive Associations Can Lead to Unjustifiable Credit Decisions Associa'on (a,er controlling for all else) Loan Applica'on: Consumers with % debt ra4o have lower risk score than consumers with 3% debt ra4o Mortgage Lending: Consumers without previous mortgage have lower risk score than consumers with previous mortgage Could lead to decision Applicant is rejected because her debt ra4o is not high enough(!?) Applicant can t get a mortgage because he doesn t have a mortgage(!?)

Pros and Cons of Machine Learning Models for Credit Scoring PROS Highly accurate fit to data Discovery of unexpected associa4ons Automated, produc4ve analysis CONS Vulnerable to data limita4ons May capture non- intui4ve associa4ons Hard to impose domain exper4se

Scorecardizer Approach: Converts Machine Learnings into Powerful Comprehensible Scorecards First train Machine Learning model Then convert learnings into (segmented) Scorecard(s) Data Tree Tree 2 Tree 5 Combine Scorecard # Current Thick Files S/c #3 Current Young S/c #4 Thin Files Domain expertise Completely data-driven discovery Domain expertise can be injected to warrant model palatability

Can We Automate Expertise? In well-defined domains (e.g. credit scoring), can codify expertise Scorecardizer hooks into a database of expert rules, which enables fully automatic construction of palatable models Manual refinement of the end product by domain experts is still possible before deployment Some examples of codified expertise Everything else equal: Score must not increase with higher Debt Ratio Score must not decrease with Applicant Age above 6 years For current accounts, a history of delinquency is bad For 2-cycle delinquents, history of mild delinquency can be good (these have shown their capacity to recover)

Results for Home Equity Loan Portfolio % Bads Rejected 5 Scorecard segmentation discovered by Scorecardizer Age of oldest credit line Value of Current Property 5-fold cross-validated AUC Tree Ensemble: incomprehensible.96 Scorecardizer: 5 comprehensible segmented scorecards.94 Single comprehensible scorecard.9 S/c # S/c #2 S/c #3 S/c #4 S/c #5 5 % Goods Rejected

Case Studies Informing Origination Risk Score Development by Machine Learning Improving Insurance Fraud Model With Social Network Variables Evaluating Predictive Power of Text Data for a Peer Lending Network

Insurance Fraud and Networks New York $279 Million in insurance fraud ring Florida $ Million in worker s compensation fraud ring 2

Predicting Fraud with Traditional and Network Data Age Income Credit Score Fraud Adam Beth Adam 2 7, 68 Beth 36 3,2 744 Carl 62 9, 83 Dave 32 4,25 73 Dave Carl Eva 49 8 72 Eva Customer-specific predictors Predict Combine Money transfer connection Similar address connection Traditional fraud score Network variables

Enhancing Data with Network Variables Age Income Credit Score Adam 2 7, 68 Beth 36 3,2 744 Carl 62 9, 83 Dave 32 4,25 73 Eva 49 8 72 Avg. Age of Connections # Money Transfers in Network per Month Fraud 23 2 45 3 2 7 37 3 55 4 Variables derived from network Graph algorithms Feature generation Feature evaluation Feature selection 22

Boosting Auto Insurance Fraud Detection with Network Information 9 8 After adding network variables +4% % Detected 7 6 5 4 Traditional variables only 3 2.5.5 2 2.5 3 3.5 4 4.5 5 % Reviewed

Variable Importance Top Variables Rank Variable Name Car Model 2 Total Paid Amount in Network 3 Policy Holder Occupation 4 Pre-accident Vehicle Value 5 Total # Payments in Network 6 # Phantom Vehicles in Network 7 ZIP Code of Policy Holder 8 Size of Network 9 Repairable Flag Type of Accident Network Variables 24

Case Studies Informing Origination Risk Score Development by Machine Learning Improving Insurance Fraud Model With Social Network Variables Evaluating Predictive Power of Text Data for a Peer Lending Network

Ubiquitous Text Data Can Be Predictive and Yield New Insights Call center records, claims, public records, collector notes, emails, blogs, social data, freeform comments, reviews, webpages, product descriptions, transcribed phone calls, news articles Is there predictive value? How can we leverage it for comprehensible models and justifiable decisions? Semantic Scorecards & Topic Analysis

Origination Risk for Peer-to-Peer Lending Network Structured origination information Associated Loan Descriptions Hi, thanks for considering my request. I m a student in Southern California. I have a great credit score. I will use this loan to pay my rent, books and tuition expenses. I ve secured a part time job. My federal loans take care of all the rest. Prospect Thanks! ID #53464: Last summer we ve extended our business for second-hand I need furniture this into loan Southern Florida. to pay We ve been growing nice there. We need $4, to do enlarge showroom. Expecting to double sales thereafter. Sincerely, D.J. I need this loan to pay off higher rate credit card debt - fixed rate at 5%, that s the only card I use. off higher rate credit card debt - fixed rate at 5%, that s the only card I use Generate traditional variables Extract keywords and topics Semantic Scorecard : Combine traditional variables and text-based features in a comprehensible model

Predictive Value of Text Data % Bads Rejected Winner: Regular and text data Runner-up: Regular data only For comparison: Text data only % Goods Rejected

Top Predictors From Automatic Variable Selection Rank Variable Name Credit Bureau Grade 2 Inquiries During Last 6 Months 3 Monthly Income 4 Months Since Last Record 5 need 6 Revolving Credit Balance 7 baby 8 Loan Purpose 9 business would Revolving Line Utilization 2 qualify 3 Length of Employment 4 sincerely 5 credit Rank Lead to Unjustifiable Decisions Traditional variables (black) Keywords indicating elevated risk (red) Keywords indicating reduced risk (green) Variable Name 6 provided 7 sales 8 job 9 open 2 stable 2 payday 22 card 23 www 24 Open Credit Lines 25 Total Credit Lines 26 shop

Gained New Insights Into Risk Topics Results Using Latent Dirichlet Allocation Reduced risk topic: Credit card consolidation debt, free, consolidating, consolidated, card, credit, revolving, paying, payoff, sooner, quicker, clear, accumulated, accrued, completely, Elevated risk topic: Business-related items business, equipment, sales, capital, store, marketing, experience, location, expand, owner, retail, advertising, partner, inventory, products, profit, shop, restaurant,...

Discussion Machine learning, text- and social network analytics can yield deeper insights and stronger predictions, by combining traditional and novel data sources. Yet to make more profitable and justifiable decisions requires careful results interpretation and robust, explainable models. Domain expertise, combined with special methods and tools supporting interpretation, are of utmost importance for harnessing the potential of big data and machine learning.