Predictive Analytics for Life Insurance: How Data and Advanced Analytics are Changing the Business of Life Insurance Seminar May 23, 2012



Predictive Analytics for Life Insurance: How Data and Advanced Analytics are Changing the Business of Life Insurance Seminar May 23, 2012 Session 2 How to Build a Risk Based Analytical Model for Life Insurance Presenter James C. Guszcza, FCAS, MAAA

Predictive Modeling From A to B
Society of Actuaries, Los Angeles
James Guszcza, PhD, FCAS, MAAA, Deloitte Consulting LLP
May 23, 2011

Analyzing Analytics
Three thoughts capture a major reason why predictive analytics is ubiquitous:
"The whole of science is a refinement of everyday thinking." (Albert Einstein)
"Decision-making is at the heart of administration." (Herbert Simon)
"Human judges are not merely worse than optimal regression equations; they are worse than almost any regression equation." (Richard Nisbett & Lee Ross, Human Inference: Strategies and Shortcomings of Social Judgment)

Deloitte Analytics Institute, 2011 Deloitte LLP

Agenda
Introduction
Why Business Analytics is Now Ubiquitous
Predictive Modeling: Levels of Discussion
  Strategy
  Methodology
  Technique

Why Business Analytics is Increasingly Ubiquitous

Is Business Analytics New?
Not really. Credit scoring is an early example of business analytics and has been around for many decades.
Predictive modeling in insurance isn't new, but its rate of adoption continues to increase.
Property-casualty insurers have been intensively using predictive models for about 15 years now.
  1990s: Personal credit used to price and underwrite personal insurance.
  2000s: Scoring models used to price and underwrite commercial and professional liability insurance.
  Today: Life insurers are adopting or preparing to adopt predictive modeling techniques.

Why Now? (Answer #1)
Technology (Moore's Law): the cost of storage and computing power has decreased exponentially.
Data: third-party data is becoming increasingly available, and companies are learning to do more with their internal data.
Software and algorithms: great analytic ideas keep coming from statistics, economics, machine learning, marketing, ...; free tools like R.

Exponential Growth of Data Availability
More and more data is gathered, stored, and available to be analyzed.
[Figure: growth of data availability over time. Source: Time Challenges Challenging Times for Future Innovation Search, http://www.dlib.org/dlib/may09/mestl/05mestl.html]

Exponential Growth of Statistical Computing Availability
The open-source R statistical computing environment has experienced exponential growth: the number of contributed packages has grown exponentially.

Why Now? (Answer #2)
Or: Clinical vs Actuarial Judgment, the Motion Picture

There's Something About Linda
To illustrate, think about this person:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Now rank these possible scenarios in order of probability:
  Linda is active in the feminist movement
  Linda is a bank teller
  Linda is a bank teller and is active in the feminist movement

There's Something About Linda
Daniel Kahneman and Amos Tversky posed precisely this question to a group of people.
They found that 87% of the people thought that "feminist bank teller" was more probable than "bank teller".
But this is logically impossible: every feminist bank teller is also a bank teller, so the combined scenario can never be the more probable one.
Mathematical fact: Prob(A & B) ≤ Prob(B)
So in particular: Prob(feminist & bank teller) ≤ Prob(bank teller)

Analytics Everywhere
Neural net models are used to predict movie box-office returns based on features of their scripts.
Decision tree models are used to help ER doctors better triage patients complaining of chest pain.
Predictive models are used to predict the price of different wine vintages based on variables about the growing season.
Predictive models help commercial insurance underwriters better select and price risks.
Predicting which non-custodial parents are at highest risk of falling into arrears on their child support.
Predicting which job candidates will successfully make it through the interviewing / recruiting process, and which candidates will subsequently be retained and perform well on the job.
Predicting which doctors are at highest risk of being sued for malpractice.
Predicting the ultimate severity of injury claims.
(Deloitte applications in green)

A Practical Conclusion
"There is no controversy in social science which shows such a large body of quantitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing over 100 investigations, predicting everything from the outcome of football games to the diagnosis of liver disease, and when you can hardly come up with half a dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion." (Paul Meehl, Causes and Effects of my Disturbing Little Book)

Predictive Analytics: Fundamental Concepts

What Business Analytics Isn't
One view of predictive modeling: we assemble a database, explore it, and fit models to it.
DATA -> MODEL
Is this a full conception of analytics?

What Business Analytics Is
No: analytics projects do not begin with data; they begin with a strategic issue or problem.
STRATEGY -> DATA -> MODEL -> DECISIONS
And they do not end with a model; they end with a model being used to make better decisions.
Therefore, the best analytics projects begin with business strategy and end with IT implementation, training, creating analytics-aided decision rules, and change management.

Predictive Analytics: Three Levels of Discussion
Strategy
  Profitable growth
  More consistent, accurate and economical underwriting
  Retaining, cross-selling to preferred policyholders
  Marketing / prospecting
  Promoting wellness
Methodology
  Model design ("actuarial science")
  Modeling process ("data science")
Technique
  Supervised and unsupervised
  GLM vs. decision trees vs. support vector machines

Methodology and Technique
How does data science need actuarial science?
  Target variable specification, preparation
  Model design
  Domain knowledge
  Variable creation, variable selection
  Model evaluation
  Knowing which ambiguities to focus on and which to ignore
How does actuarial science need data science?
  Advances in computing, modeling techniques
  Programming with data
  Exploratory data analysis, data visualization ideas
  Ideas from other fields can be applied to insurance problems

Semantics: Data Mining vs Predictive Modeling
Data Mining: exploration and discovery
  Find interesting patterns in databases
  Open-ended, cast the net wide
  Data visualization is often an important part
  Can involve brute-force machine learning algorithms
  Example: the discovery that personal credit correlates with risky insurance behavior
Predictive Modeling: apply statistical techniques in the light of the EDA / knowledge discovery phase
  Quantify and synthesize relationships found during knowledge discovery
  Example: use credit information in a traditional predictive model

At the Center of It All: Data Science
Or: The Collision between Statistics and Computation
Today, it is arguably appropriate to call certain aspects of property-casualty practice "data science". This often involves more than just knowing how to run regression models.
Data science goes beyond:
  Traditional statistics
  Business intelligence [BI]
  Information technology
[Image borrowed from Drew Conway's blog: http://www.dataists.com/2010/09/the-data-science-venn-diagram]

Predictive Analytics: Understanding the Strategy

A Great Book About Predictive Analytics

The Strategic Use of Analytics in Life Insurance
Like baseball, insurance is a numbers game. Both fields are awash in data, and a lot of money is at stake!
As with baseball, so in insurance: until fairly recently, all of this data has not been used in strategic ways.
Life insurance example: models can be built to assign probabilities of (e.g.) being declined or (e.g.) being assigned to the best underwriting class, based on readily available data.
[Figure: Underwriting Model Lift Curve (Hypothetical). Percent Declined (0% to 80%) by Model Decile (Predicted Value), deciles 1 to 10.]

The Strategic Use of Analytics in Life Insurance
Strategic application: data-driven underwriting
Models estimate the probabilities that a risk will fall into the best / worst underwriting classes.
In this hypothetical example:
  Worst 10% of risks have a >70% decline rate
  Best 10% of risks have a <5% decline rate
[Figure: Underwriting Model Lift Curve (Hypothetical). Percent Declined (0% to 80%) by Model Decile (Predicted Value), deciles 1 to 10.]

Predictive Analytics: Understanding the Methodology

Three Central Concepts of Predictive Analytics
Scoring engines
  aka "algorithmic solutions", aka "predictive models"
Lift / gain analysis
  Lift: how much better/worse than average are the observations with the best/worst scores?
  Gain: what percent of the outcomes are captured in the 10% of observations with the highest scores?
Out-of-sample tests
  Estimate how the model is likely to perform once implemented
  Unbiased estimate of the model's predictive power

Scoring Engines
Scoring engine: a formula that classifies or separates risks (or accounts, or households, ...) by:
  Expected longevity
  Churn likelihood
  Profitable vs unprofitable
  Utilization
  ...
A (non-)linear equation f() of several predictive variables.
Produces a continuous range of scores: $Y = f(X_1, X_2, \ldots, X_N)$
Common model form: $Y = g^{-1}(\alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_N X_N)$
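As a rough illustration of the common model form on the scoring engine slide, the sketch below scores a risk as an inverse link function applied to an intercept plus a weighted sum of predictors. The logit link, the function names, the two predictors, and every coefficient value are assumptions made up for illustration; none of them come from the presentation or from fitted data.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def score(x, intercept, coefs, inverse_link=inverse_logit):
    """Score one risk: inverse_link(intercept + sum of coef_i * x_i)."""
    if len(x) != len(coefs):
        raise ValueError("x and coefs must have the same length")
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return inverse_link(eta)


# Hypothetical coefficients, for illustration only (not fitted to any data).
INTERCEPT = -2.0
COEFS = [0.03, 0.8]  # hypothetical predictors: age in years, smoker flag (0/1)

applicants = [[30, 0], [45, 1], [60, 1]]
scores = [score(x, INTERCEPT, COEFS) for x in applicants]
for x, s in zip(applicants, scores):
    print(x, round(s, 3))
```

Swapping `inverse_link` for `math.exp` would give the log-link form typically used for count or amount targets; the structure of the scoring engine stays the same.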

What Powers a Scoring Engine?
Scoring engine: $Y = f(X_1, X_2, \ldots, X_N)$
The $\{X_1, X_2, \ldots, X_N\}$ are at least as important as the f().
  Domain expertise, creativity, and insight are all important.
A major part of the modeling process consists of variable creation and selection.
  It is possible to generate many hundreds of variables.
  Knowing what is important to create, how to create them, and what to do with them once they are created is the steepest part of the learning curve.
This is why predictive modeling is an art as well as a science: tacit knowledge vs textbook knowledge.

Model Evaluation: Lift Curve Analysis
Sort the data by the model score.
Break into 10 equal pieces.
  Best "decile": 10% of risks with the best scores
  Worst "decile": 10% of risks with the worst scores
Lift: average measured difference in the quantity we are trying to predict or estimate.
  Lift = segmentation power
  Lift translates into the ROI of the modeling project
  Unlike such concepts as $R^2$ or AIC, lift has operational meaning
[Figure: Underwriting Model Lift Curve (Hypothetical). Percent Declined (0% to 80%) by Model Decile (Predicted Value), deciles 1 to 10.]
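The lift curve recipe above (sort by score, break into 10 equal pieces, compare the average outcome in each piece) can be sketched in a few lines. The data here is synthetic and the function names are hypothetical; the sketch assumes a 0/1 outcome such as "declined" and that a higher score means a higher predicted decline probability.

```python
import random


def lift_table(scores, outcomes, n_bins=10):
    """Sort by score, cut into n_bins near-equal pieces, and return the
    mean outcome in each piece (lowest scores first)."""
    if len(scores) != len(outcomes) or len(scores) < n_bins:
        raise ValueError("need matching inputs with at least n_bins rows")
    pairs = sorted(zip(scores, outcomes), key=lambda p: p[0])
    n = len(pairs)
    table = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        table.append(sum(y for _, y in chunk) / len(chunk))
    return table


# Synthetic data, for illustration only: the outcome is more likely
# when the score is higher.
random.seed(0)
scores = [random.random() for _ in range(5000)]
outcomes = [1 if random.random() < 0.7 * s else 0 for s in scores]

table = lift_table(scores, outcomes)
overall = sum(outcomes) / len(outcomes)
for decile, rate in enumerate(table, start=1):
    # Lift relative to average: values above 1.0 are worse-than-average risks.
    print(decile, round(rate, 3), round(rate / overall, 2))
```

Dividing each decile's rate by the overall rate expresses lift as "how many times the average", which is one common way to read the chart; the slide's own definition (average measured difference) is equally usable.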

Data Mining and the Curse of Dimensionality
Data mining has both positive and negative connotations.
  Done well, it can lead us to new insights that result in more powerful models.
  Done naively, it can cause our models to be fooled by randomness.
The curse of dimensionality: the more variables we consider, the harder it becomes to search through the space of possible models.
As the dimension increases, so does the side length of the cube needed to cover r% of the data space. In 10 dimensions we need to cover 80% of the range of each coordinate to capture 10% of the data. (From The Elements of Statistical Learning by Hastie, Tibshirani, Friedman)

The Bias-Variance Tradeoff and the Problem of Overfitting
Predictive modeling techniques cannot distinguish true patterns from random variation in the data.
Error measured on the dataset used to fit the model is overly optimistic.
Which model would you choose?
[Figure: The Bias-Variance Tradeoff. Prediction error on train data and test data plotted against complexity (# neural net cycles), running from high bias / low variance to low bias / high variance.]
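The 80% figure in the curse of dimensionality slide can be checked with a one-line formula. Assuming the data is spread uniformly over a unit cube in p dimensions (the assumption behind the cited textbook example), a sub-cube that captures a fraction r of the data needs side length r^(1/p). The function name below is hypothetical.

```python
def edge_length(fraction, dimensions):
    """Side length of a sub-cube that holds `fraction` of data spread
    uniformly over the unit cube in `dimensions` dimensions."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if dimensions < 1:
        raise ValueError("dimensions must be at least 1")
    return fraction ** (1.0 / dimensions)


# Side length needed to capture 10% of the data as the dimension grows.
for p in (1, 2, 3, 10, 100):
    print(p, round(edge_length(0.10, p), 3))
```

For p = 10 this gives roughly 0.79, matching the slide's "80% of the range of each coordinate".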

The Bias-Variance Tradeoff and the Problem of Overfitting
Low bias: the model fit is good on the training data; the model's predicted value is close to the data's expected value.
High variance: the model might not make accurate predictions upon implementation.
Bias alone is not the name of the game.
Out-of-sample validation is a way of guarding against overfitting.
[Figure: The Bias-Variance Tradeoff. Prediction error on train data and test data plotted against complexity (# neural net cycles), running from high bias / low variance to low bias / high variance.]

Out-of-Sample Model Testing / Validation
A classic methodology: divide the data into 3 pieces (Train, Test, Val), randomly and/or using time.
Training data: used to fit models, e.g. to estimate the parameters of a Generalized Linear Model or Cox Hazard Model.
Test data: used to measure gain/lift of candidate models.
  Perform train/test steps iteratively until you have a model you are happy with.
  During this iterative phase, the validation data is held in cold storage.
Validation data: used to measure lift on the final selected model.
  The test data was implicitly used during the model selection process, so it is preferable to measure lift on a truly blind test sample.
  Validation data is often an out-of-time sample.
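The three-way split described above can be sketched as follows, both purely at random and with an out-of-time validation sample. The split proportions, the function names, and the `year` field on each record are assumptions for illustration only; the presentation does not prescribe them.

```python
import random


def random_split(records, train=0.6, test=0.2, seed=0):
    """Shuffle a copy of the records and cut it into train / test /
    validation pieces; whatever is left after train and test is validation."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def out_of_time_split(records, year_of, holdout_year, train_share=0.75, seed=0):
    """Hold out every record from holdout_year onward as validation, then
    split the earlier records at random into train and test."""
    earlier = [r for r in records if year_of(r) < holdout_year]
    validation = [r for r in records if year_of(r) >= holdout_year]
    train, test, leftover = random_split(
        earlier, train=train_share, test=1 - train_share, seed=seed)
    # Rounding can leave a record or two over; fold them into test.
    return train, test + leftover, validation


# Hypothetical policy records, for illustration only.
policies = [{"id": i, "year": 2005 + i % 6} for i in range(600)]
train, test, val = out_of_time_split(policies, lambda r: r["year"], 2010)
print(len(train), len(test), len(val))
```

The validation piece is never touched while candidate models are being compared on train and test, which is what makes it a blind sample.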

So far
Therefore we want to build a scoring model that:
  Results in a continuous range of scores
  Produces valuable segmentation power on a blind-test data sample
How do we get there?

Predictive Analytics: Understanding the Process

The High-Level Process
Data Scrubbing and Variable Creation -> Data Exploration and Modeling -> Model Analysis and Implementation
The modeling phases we will discuss fit into the Cross-Industry Standard Process for Data Mining [CRISP-DM].

Data Scrubbing and Variable Creation
Research possible data sources.
Extract/purchase data.
Check data for quality (QA).
  Messy! (still deep in the mines)
Create predictive and target variables.
  Opportunity to quantify tribal wisdom and come up with new ideas
  Can be a very big task!
  Steepest part of the learning curve

Types of Predictive Variables
Insurance data sources can be viewed in an outward-to-inward hierarchy.
Geodemographic: population density, traffic, weather, crime, education, ...
Policy characteristics: term, whole-life, ...
Policyholder / household characteristics: age, gender, marital status, number of kids, income, education
Prior history / behavior: medical history, life events, credit history, education/career choices, ...

Knowing Me, Knowing You
Consumer data is rapidly increasing in:
  Volume
  Scope
  Specificity
The nature of available data is evolving rapidly.
A 3-stage evolutionary process

Data Exploration and Variable Triage
The variable creation phase was hypothesis-driven "expansion".
This phase is "contraction": begin the process of analyzing and winnowing the space of candidate variables.
Data visualization / Exploratory Data Analysis (EDA) is a key part of this.
Use EDA to:
  Discard weak or poorly distributed variables
  Consider how to handle missing values
  Transform variables: categoricity, linearity, ...
  Investigate correlations amongst the candidate predictors
  Explore dimension reduction techniques (remember the curse of dimensionality)
  Take note of the variables' relative predictive power

Model Design
Model design is where actuarial science meets data science: translate the business problem into an algorithmic solution.
Common considerations:
  Which target variable to use?
  Which data points / types of policies to include/exclude?
  Which data sources / variable types to consider?
  What is the level of the model (policy? household for cross-sell?)
  How should the model be evaluated?
    Lift curves / gains charts
    Broken down by certain segments of business
    Compare with existing practice
  How to split data into train / test / validation samples?
  Is the data sufficiently credible?

Multivariate Modeling
Key initial step: appropriate model design.
  Translate the ultimate use of the model into an appropriate model design.
With the model design in place, the iterative modeling process begins.
  It picks up where the EDA process leaves off: the process of triaging, transforming, and interacting candidate predictors continues.
  Multiple candidate models are built and tested.
Sample techniques:
  Regression / GLM / proportional hazard models
  CART / MARS / Random Forests, neural nets, Support Vector Machines

Multivariate Modeling: In More Detail
1) Pare down the full collection of candidate predictive variables to a more manageable set.
2) Iterative modeling process:
  Build candidate models on training data.
  Evaluate on test data.
  Many things to tweak:
    Alternate modeling techniques
    Hybrid models, ensemble models
    Different target variables
    Different predictive variables
    Data filtering / sampling criteria
    Technique-specific modeling options: distributional assumptions, weights and offsets, cost-complexity tuning parameters, neural net topology, decision tree splitting method, multiple random samples, ...

Considerations
Do the signs/magnitudes of the parameters make sense? Are they statistically significant?
Is the model sensible from a business point of view?
  Readily explainable?
  Consistent with corporate philosophy?
Is the population reasonably homogeneous?
  If not, does the model behave consistently / reasonably on both groups?
  Does the predictive power hold up within sub-segments?
Continuity
  Are there small changes in input values that might produce large swings in scores?
  Make sure that end-users can't "game the system".

Model Analysis and Implementation
Perform model analytics
  Necessary for end-users to gain comfort with the model
Calibrate models
  Create a user-friendly scale (client dictates)
Implement models
  Technology implementation: deep IT skills are crucial
  Business implementation: plan use of business rules, reason messages, education
Monitor performance
  Distribution of scores over time, predictiveness, usage of model, ...
  Collect reactions and suggestions from end-users
  Collect a list of desired improvements for v2.0
  Plan a model maintenance schedule

Predictive Analytics: Understanding the Techniques

Supervised vs Unsupervised Learning
Supervised learning: estimate the expected value of Y given values of X.
  Generalized Linear Models (GLM)
  Generalized Additive Models (GAM)
  Survival Models (Cox Proportional Hazard)
  Classification and Regression Trees (CART)
  Multivariate Adaptive Regression Splines (MARS)
  Random Forests
  Support Vector Machines
  Neural Nets
Unsupervised learning: attempt to find interesting patterns amongst the X. There is no outcome ("target") variable.
  Clustering, self-organizing maps
  Correlation / Principal Components / Factor Analysis

Classification vs Regression
Classification problems: the goal is to segment the observations of interest into 2 or more distinct categories.
  Average, above-average, below-average profitability customers
  Likely fraudsters vs likely non-fraudsters
  Likely defectors vs likely non-defectors
Regression problems: the goal is to predict a continuous amount.
  Dollars of loss for a policy
  Ultimate size of claim
  Years of retention
  Days out of work

Parametric vs Non-Parametric
Parametric statistics
  Involves the specification of a probabilistic model governing the process that generated the data.
  E.g. Poisson regression: assume that the process generating the number of claims for a given policy is a Poisson process. The mean (λ) of each policy's Poisson distribution is a linear combination of various attributes.
Non-parametric statistics
  No probability model specified
  Purely algebraic
  E.g. classification trees use brute-force computing to create relatively "pure" segments of a dataset. No assumptions are made about the target variable.
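To make the "brute-force computing to create relatively pure segments" idea concrete, the sketch below searches every candidate threshold on a single predictor and keeps the one that leaves the two resulting segments purest. Gini impurity is used here as one common purity measure; the presentation does not say which measure is meant, and the function names and the tiny age / declined dataset are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Try the midpoint between each pair of consecutive distinct values and
    return (threshold, weighted impurity) for the purest two-way split.
    Returns (None, impurity of the unsplit data) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical data, for illustration only: applicant age and a 0/1 declined flag.
ages = [25, 32, 41, 47, 53, 58, 64, 70]
declined = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, impurity = best_split(ages, declined)
print(threshold, impurity)
```

A full tree repeats this search over every predictor and then recursively within each resulting segment; no distribution is assumed for the target at any point.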

Regression and its Relations
Generalized Linear Models [GLM]
  Relax the assumptions of Ordinary Least Squares [OLS] regression:
    Normality -> exponential family
    Linearity -> linearity on the link function scale (e.g. log scale)
  Logistic regression: binary outcomes
  Poisson regression: count outcomes
Multivariate Adaptive Regression Splines [MARS] and Neural Nets
  Clever automatic ways of transforming and interacting candidate input variables
  Automate the difficult job of selecting and transforming dozens or hundreds of variables
  Account for the fact that true relationships aren't necessarily linear
  Universal approximators: can model any functional form
  Downside: they are black-box models (though MARS is much more interpretable)
Classification and Regression Trees [CART]
  Tree-structured regression
  Simplification of MARS; easier to interpret

This presentation contains general information only and is based on the experiences and research of Deloitte practitioners. Deloitte is not, by means of this presentation, rendering business, financial, investment, or other professional advice or services. This presentation is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte, its affiliates, and related entities shall not be responsible for any loss sustained by any person who relies on this presentation. Copyright 2011 Deloitte Development LLC. All rights reserved. Member of Deloitte Touche Tohmatsu Limited