Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Both Worlds
Outline
- Who is EMB?
- Insurance industry predictive modeling applications
- EMBLEM, our GLM tool
- How we have used CART with EMBLEM
- Case studies
- Other areas of expected synergies
EMB Worldwide
A global network of P&C insurance consultants serving clients throughout the world
Consulting Services Offered
- Predictive Modeling
- Regulatory Support & Law Analysis
- Ratemaking & Profitability Analysis
- Expert Witness Testimony
- Underwriting & Credit Scoring
- Software Development & Software Support
- Enterprise Risk Management, Pro Forma, Business Planning
- Reserve Analysis & Opinion Letters
- Retention & Conversion Modeling
- Reinsurance Program Analysis
- New Program Development
- Competitive Analysis
State-of-the-Art Software
EMB's suite of software products covers all aspects of personal and commercial lines of insurance:
- EMBLEM: GLM software for risk, marketing, and claims analysis
- ExtrEMB: Dynamic parameterization for risk modeling
- Rate Assessor: Pricing implementation software
- ResQ Professional: Complete loss reserving tool
- Classifier: Categorization software for high-dimension variables (e.g., territory)
- PrisEMB: Reinsurance and large account pricing
- Igloo Professional: Financial simulation engine for risk modeling
- RePro: Management information analysis software for excess of loss insurance and reinsurance
EMBLEM
We use EMBLEM, a GLM tool, for our predictive modeling needs. Why?
Predictive Modeling in the Insurance Industry
Primary application:
- Estimating the cost of the product insurers sell (insurance), in two steps:
  1. Reserving = estimating the cost of outstanding insurance claims
  2. Pricing = estimating the cost of future insurance coverage
Secondary applications:
- Retention modeling = probability that a policyholder will renew
- Conversion modeling = probability that a prospective policyholder will purchase a policy
- Price optimization
- Claim fraud detection
- Marketing
Estimating the Cost of Insurance
The goal is to develop a unique rate for every risk
- Don't think in terms of good/bad risks (State Farm/Allstate vs. GEICO/Progressive)
- Rating every risk uniquely quickly exhausts the data's credibility / variability / stability
Risks are described by the predictor variables, not the target
- We need a mapping from predictor variable levels to a target value, not the other way around
  - The other way around makes it difficult to derive the impact of individual predictor variables
  - This matters because actual data often does not cover all possible combinations of potential customers
Estimating the Cost of Insurance
Highly regulated marketplace
- Restrictions
  - Predictors that can and cannot be used (e.g., credit score)
  - Rules on values for the predictors
    - Ages 65+ relativities cannot be >110% of ages 40-60
    - Maximum rate change between adjacent territories
  - Rules on predictor order and magnitude of importance
    - CA Sequential Analysis (driving record > annual mileage > years licensed)
- Regulatory approval
  - Rates need to be supported; black-box methodologies will not be accepted
Estimating the Cost of Insurance
The response variable is a continuous/discrete function
[Charts: claim severity density by claim amount; claim frequency distribution by claim count]
- A Gamma distribution is consistent with severity modeling, or even an Inverse Gaussian
- A Poisson distribution is consistent with frequency modeling
There is no single trial/outcome
- The trial is measured in terms of time
- Actual policy length varies tremendously because of mid-term changes (marital status, new car, moving)
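For a concrete picture of these distributional assumptions, here is a minimal sketch (not EMBLEM) of the frequency/severity setup described above, fit with Python's statsmodels: a Poisson GLM for claim frequency with log(exposure) as an offset to handle varying policy lengths, and a Gamma GLM with a log link for severity. The column names, base frequency, and synthetic data are assumptions for illustration only.

```python
# Frequency/severity GLM sketch (illustrative only, not EMBLEM).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
policies = pd.DataFrame({
    "driver_age": rng.integers(18, 80, n),
    "vehicle_age": rng.integers(0, 15, n),
    "exposure": rng.uniform(0.1, 1.0, n),   # policy years in force (the "trial" length)
})
policies["claim_count"] = rng.poisson(0.08 * policies["exposure"])  # assumed base frequency

X = sm.add_constant(policies[["driver_age", "vehicle_age"]])

# Poisson GLM for claim frequency, with log(exposure) as an offset so that
# varying policy lengths are handled correctly.
freq_model = sm.GLM(
    policies["claim_count"], X,
    family=sm.families.Poisson(),
    offset=np.log(policies["exposure"]),
).fit()

# Gamma GLM (log link) for claim severity, fit only on policies with claims.
claims = policies[policies["claim_count"] > 0].copy()
claims["avg_cost"] = rng.gamma(shape=2.0, scale=1500.0, size=len(claims))
sev_model = sm.GLM(
    claims["avg_cost"], sm.add_constant(claims[["driver_age", "vehicle_age"]]),
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

print(freq_model.summary())
print(sev_model.summary())
```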
Solution? EMBLEM
In 1996, EMB designed EMBLEM to give statisticians and non-statisticians pricing personal and commercial insurance access to GLMs
EMBLEM revolutionized the use of GLMs, enabling analysis that was previously either impossible or too time-consuming to be worth attempting
EMBLEM is now used by over 100 insurance companies globally:
- 18 of the top 20 personal auto writers in the UK
- 50 companies in the US, including 8 of the top 10 personal auto writers
The fastest GLM tool, able to model millions of observations in seconds, with a host of diagnostic tools:
- Graphical, practical, statistical, automated
- Stand-alone software package that can be integrated with a variety of external software, including SAS
- Microsoft Visual Basic for Applications provides ultimate flexibility
EMBLEM
GLM characteristics work to our advantage:
- The exponential family does an excellent job of describing the underlying components of insurance losses
- The model output is a set of beta parameters that can easily be converted to rate relativities (a sketch of this conversion follows this slide)
- EMBLEM is not automated
  - The user has complete control over the model structure
  - Complete diagnostic tools assist the modeler with decisions
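As a hedged illustration of the second bullet: with a log link, the fitted betas convert to multiplicative rate relativities via exp(beta). The coefficient values below are hypothetical, not EMBLEM output.

```python
# Converting log-link GLM coefficients (betas) into multiplicative rate
# relativities: exp(beta) is the factor applied relative to the base level.
# These coefficient values are hypothetical.
import numpy as np

betas = {"age_17_21": 0.62, "age_22_39": 0.18, "age_40_60": 0.0, "age_65_plus": 0.07}

relativities = {level: np.exp(b) for level, b in betas.items()}
for level, rel in relativities.items():
    print(f"{level}: {rel:.3f}")   # base level (beta = 0) maps to a relativity of 1.000
```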
Current Status in the Insurance Marketplace
In terms of estimating the cost of insurance:
- The UK has embraced predictive modeling
  - Experienced with its techniques
  - Knowledgeable about the factors that tend to be predictive
- The US is learning about predictive modeling
  - Saturation among the big players in the personal lines marketplace
    - Companies not using predictive modeling techniques are being adversely selected against
    - Now expanding the dimensionality of their databases
  - Still a fairly new concept in the commercial lines marketplace
    - Big players are using the techniques, but historical rating structures are hindering rapid expansion
Current Status in the Insurance Marketplace
Result?
- The UK is expanding into the secondary applications
  - Retention modeling
  - Conversion modeling
  - Price optimization
  - Claim fraud detection
- Because predictive modeling has been around for some time in the UK, the datasets are getting larger in terms of the number of predictors to evaluate
- Experienced US companies are beginning to evaluate the secondary applications
- Marketing is used in a manner similar to other industries
CART
How does CART fit into this?
- As we transition into the secondary applications, we move from modeling a continuous function to a binary function
  - Tree-based techniques can add value to the analysis
- Retention and conversion modeling
  - Accept/reject target variable
  - Desirable smooth surface
  - Price optimization integrates these with premium models
- Marketing and fraud detection
  - Classic tree applications
CART + EMBLEM
Using CART and EMBLEM
- The goal is to play off the strengths of each tool
CART strengths:
- Automatic separation of relevant from irrelevant predictors
- Easily rank-orders variable importance (see the sketch after this slide)
- Automatic interaction detection (requires additional work)
- Captures multiple structures within a dataset rather than a single dominant structure
- Can handle missing values and is impervious to outliers
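A minimal sketch of the variable-importance idea, using scikit-learn's CART-style decision tree as a stand-in for Salford Systems' CART; the predictors and the synthetic retention target are assumptions for illustration.

```python
# Rank-ordering variable importance on a binary retention-style target
# (scikit-learn tree as a stand-in for CART; data is synthetic).
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 20_000
X = pd.DataFrame({
    "premium_change": rng.normal(0.0, 0.1, n),   # relevant by construction
    "policy_duration": rng.integers(1, 10, n),   # relevant by construction
    "noise_1": rng.normal(size=n),               # irrelevant
    "noise_2": rng.normal(size=n),               # irrelevant
})
logit = -3.0 * X["premium_change"] + 0.2 * X["policy_duration"]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=200).fit(X, y)
for name, imp in sorted(zip(X.columns, tree.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")   # relevant predictors float to the top
```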
CART + EMBLEM
EMBLEM strengths:
- The user has control over the model structure
- Ease of communication/conceptualization: the effect of each explanatory variable is transparent
- Provides predicted response values for new data points
CART:
- Factor selection
- Interaction detection
- Model validation
EMBLEM:
- Model structure
- Incorporating time/seasonality trend effects
- Implementation of results
Speaker's Note
Both CART and EMBLEM are excellent tools that produce consistent results in similar situations
- This is not an exercise in seeing which is better
The purpose of this discussion is to show how efficiencies can be gained in the modeling process
- As datasets get larger in terms of the number of predictors, time becomes a crucial element
Case Study #1: US Dataset
Retention modeling assignment
- 97,227 observations
  - Each observation represents one trial/outcome
  - Split 50/50 between training/test datasets
- 11 predictors
  - Grand total number of levels: 147
Case Study #1
Modeling process
- Started with forward entry regression
  - Automated process
  - Used the chi-squared statistic for testing significance
  - Took about 30 minutes to run
- Significant factors (8):
  - Rating Area
  - Vehicle Category
  - Age
  - NCD
  - Driver Restriction
  - Vehicle Age
  - Change Over Last Year's Premium
  - Market Competitiveness
Forward Entry Regression
Build a model with no factors and add factors based on pre-specified criteria regarding improvement in model fit:

Model | Variables               | Deviance  | Degrees of Freedom | Chi-Squared vs. Base
Base  | Mean                    | 12,380.23 | 18,596             |
1     | Mean + Gender           | 12,377.02 | 18,594             | 20.1%
2     | Mean + Policyholder Age | 12,214.88 | 18,570             | 0.0%
3     | Mean + Rating Area      | 12,365.50 | 18,581             | 47.1%
4     | Mean + Vehicle Age      | 9,997.75  | 18,576             | 0.0%
...
17    | Mean + MTA Indicator    | 12,370.30 | 18,595             | 0.2%
18    | Mean + Time             | 12,371.45 | 18,594             | 0.1%

Add the factor that performed best on the chi-squared test (Policyholder Age).
Iterate the process with the new base model until no further factors are indicated for inclusion (a sketch of the loop follows below).
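Below is a minimal sketch of the forward entry loop described on this slide, assuming a binomial GLM fit with statsmodels and a chi-squared test on the deviance reduction. It mirrors the procedure above but is not EMBLEM's implementation, and the alpha threshold is an assumed stopping rule.

```python
# Forward entry regression sketch: starting from a mean-only model, repeatedly
# add the candidate predictor whose deviance reduction gives the smallest
# chi-squared p-value, until no candidate passes the entry threshold.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def forward_entry(y, candidates: pd.DataFrame, alpha: float = 0.01):
    selected, remaining = [], list(candidates.columns)
    base = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()
    while remaining:
        trials = []
        for col in remaining:
            X = sm.add_constant(candidates[selected + [col]])
            fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
            chi2 = base.deviance - fit.deviance      # deviance improvement
            dof = base.df_resid - fit.df_resid       # parameters added
            trials.append((stats.chi2.sf(chi2, dof), col, fit))
        p_value, best, best_fit = min(trials)
        if p_value > alpha:                          # no further factors indicated
            break
        selected.append(best)
        remaining.remove(best)
        base = best_fit                              # new base model, iterate
    return selected
```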
Case Study #1
Compared results with CART/TreeNet
- Significant factors were essentially the same
- Model predictiveness was the same (ROC = 0.7)
Interactions
- No significant interactions were found by EMBLEM or CART
Test dataset
- ROC = 0.7
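For context, this is roughly how a training/test ROC comparison like the ones quoted in the case studies can be computed; the logistic regression and synthetic data below are placeholders for the actual retention models and datasets.

```python
# Train/test ROC (AUC) evaluation sketch on a 50/50 split (placeholder model and data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Training ROC:", roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]))
print("Test ROC:    ", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```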
Case Study #2: UK Dataset
Retention modeling assignment
- 198,386 observations
  - Each observation represented one trial/outcome
  - Split 50/50 between training/test datasets
- 135 predictors
  - Grand total number of levels: approx. 3,752
Case Study #2
Forward entry regression
- Found 57 predictors to be significant
- Took a weekend to run
Comparison to CART/TreeNet
- Found 24 significant predictors
  - The top 15 by variable importance were also found by EMBLEM
  - Correlations existed with the rest of the predictors
Through the modeling process we reduced the number of predictors to 26
Case Study #2
Interactions
- We relied on indications from CART/TreeNet
- 6 interactions were identified and included in the model
EMBLEM results
- Training ROC = 0.862
- Test ROC = 0.85
Other Expected Synergies
- Variable importance
- Segmentation
- Super-profiling
Segmentation
- CART excels at identifying different segments in data
- CART may also help determine where to segment data
- Segmentation is a useful alternative to fitting many interactions
Example: in an automobile insurance renewal problem, a CART analysis showed several occurrences of a split between policyholders with just one year's duration and those with a greater duration. This suggests segmenting the data into two parts (see the sketch below):
- Policies renewing with one year's duration
- Policies renewing with more than one year's duration
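A minimal sketch of that segmentation, assuming a statsmodels binomial GLM and hypothetical columns (duration_years, renewed, premium_change, rating_area): fit a separate retention model to each duration segment instead of loading one model with many interactions.

```python
# Segment the renewal book on duration and fit a separate retention GLM to each part.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_by_segment(renewals: pd.DataFrame):
    segments = {
        "one_year_duration": renewals[renewals["duration_years"] == 1],
        "longer_duration": renewals[renewals["duration_years"] > 1],
    }
    models = {}
    for name, seg in segments.items():
        # Binomial GLM of retention within each segment; factor effects are
        # free to differ between segments, replacing many explicit interactions.
        models[name] = smf.glm(
            "renewed ~ premium_change + C(rating_area)",
            data=seg,
            family=sm.families.Binomial(),
        ).fit()
    return models
```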
Super-Profiling
After a GLM is constructed, use CART to model the residuals and see if any pattern exists
- If a pattern is discovered, go back to the model structure and incorporate the findings
- This tests whether the model structure was inadvertently over-simplified
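A minimal sketch of this residual check, assuming an already-fitted statsmodels GLM (glm_fit) and its predictor frame X; scikit-learn's regression tree stands in for CART here.

```python
# Fit a regression tree to the GLM's deviance residuals and inspect whether
# any predictor still explains structure the GLM missed.
from sklearn.tree import DecisionTreeRegressor, export_text

def profile_residuals(glm_fit, X, max_depth=3, min_samples_leaf=500):
    residuals = glm_fit.resid_deviance   # deviance residuals of the fitted GLM
    tree = DecisionTreeRegressor(
        max_depth=max_depth, min_samples_leaf=min_samples_leaf
    ).fit(X, residuals)
    # If the tree finds strong splits, the corresponding terms or interactions
    # are candidates to add back into the GLM structure.
    print(export_text(tree, feature_names=list(X.columns)))
    return tree
```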