Innovations and Value Creation in Predictive Modeling David Cummings Vice President - Research ISO Innovative Analytics 1
Innovations and Value Creation in Predictive Modeling A look back at the past decade of innovation in predictive analytics New innovations in predictive modeling in Auto and Homeowners Insurance Measuring the value of increased rate segmentation 2
The Recipe for Advanced Analytics Data Availability Computational Resources Business Knowledge Advanced Analytics Analytical Tl Talent 3
Example: Credit Scoring in Auto Insurance in the mid 90 s Data Availability Computational Resources Business Knowledge Known relationship between Credit Risk and Insurance Risk Advanced Analytics Analytical Tl Talent 4
Example: Credit Scoring in Auto Insurance in the mid 90 s Increased access to credit and insurance data Data Availability Computational Resources Business Knowledge Advanced Analytics Analytical Tl Talent 5
Example: Credit Scoring in Auto Insurance in the mid 90 s Data Availability Computational Resources Cheaper computer power and faster statistical ti ti algorithms Business Knowledge Advanced Analytics Analytical Tl Talent 6
Example: Credit Scoring in Auto Insurance in the mid 90 s Data Availability Computational Resources Business Knowledge Advanced Analytics Analytical Tl Talent Doable problem for statisticians, data miners (and some actuaries) 7
Example: Credit Scoring in Auto Insurance in the mid 90 s Increased access to credit and insurance data Data Availability Computational Resources Cheaper computer power and faster statistical ti ti algorithms Business Knowledge Known relationship between Credit Risk and Insurance Risk Advanced Analytics Analytical Tl Talent Doable problem for statisticians, data miners (and some actuaries) 8
What has the impact been? Major innovations in an historically static rate plan Increased competition Profitable growth for adopters of advanced analytics Hunger for the next innovation 9
Indication of Increased Competition Number of Companies writing Personal Auto Insurance in the US 600 500 400 300 200 1/3 of companies gone in 12 years 100 0 1980 1985 1990 1995 2000 2005 10
Indication of Increased Competition 100% Consolidation of Auto Insurance Markets 90% Marke et Share 80% 70% Top Carrier Group Top 10 Top 25 Top 50 60% 50% 1995 2000 2005 2007 Below 50 now has only 9% for remaining 280 groups 11
Innovations in Predictive Modeling: Predictions at the Address Level 12
Territorial Conundrum Territories should be big Have a sufficient volume of business to make credible estimates of the losses. 13 Territories should be small Conditions vary within territory.
View as Case Studies in Model Development Data Driven Approach Reduction in number of variables Necessary for small insurers Special circumstances in fitting models to individual auto / home owners data. Diagnostics Graphics and Maps 14
Data Versus the Conundrum ISO Data External Data Loss Cost Trend Obtained loss, exposure, policy features and address for individual policies from cooperating insurers Weather Census / Claritas Location Products Business Points 15 Distance to Coast / major body of water PPC Elevation 15
Some Environmental Features (Possibly) Related to Claims Proximity to Businesses and Attractions Workplaces, Shopping Centers, Contractors, etc. Weather / Terrain: Wind, Temperature, Snowfall, Change in Elevation Population (Traffic) Density Others : Commuting Patterns, Coastal proximity, etc. 16
Combining Environmental Variables at a Particular Address 17 Individually, the geographic variables have a predictable effect on claim rate and severity. Variables for a particular location could have a combination of positive and negative effects. ISO has built models to calculate the combined effect of all variables. Based on countrywide data Actuarially credible
Variable Selection is Multiplied by the Number of Models Frequency and Severity are modeled separately Models are at coverage / peril level Five auto coverages: BI, PD, PIP, Comp. & Coll. 10 models Nine home owners perils: Wind Fire Lightning Liability Theft / Vandalism Hail Other Water 18 models Water Weather Water Non-weather 18
In Depth for Auto Weather Component Environmental Model Loss Cost by Coverage Coverage Frequency Severity Frequency Severity Causes of Loss Frequency Traffic Generators Traffic Composition Weather Traffic Density Experience and Trend Sub Model Neural Net Weather RBF Weather Temperature Scale Clusters &Other Summaries Weather Precipitation Scale Neural Net Weather MLP Data Summary Variable Weather Summary Variables Raw Data 35 Years of Weather Data 19
Environmental Model Loss Cost = Pure Premium e Frequency = 1 e = Frequency x Severity Severity = e = Intercept = Intercept + Weather + Weather + Traffic Density + Traffic Generators + Traffic Density + Traffic Generators + Traffic Composition + Experience and Trend + Traffic Composition + Experience and Trend 20
Constructing the Components Frequency Model as Example Intercept x... x = Weather 1 1 1 1 x... x n 1 n 1 n n 1 1 2 2 x... x n 1 n 1 n n 2 2 3 3 3 3 4 4 n n = Traffic Density = Traffic Generators n 1 x n 1... n xn = Traffic Composition x... x n 1 n 1 n n 4 4 5 5 Other Classifiers = Experience & Trend 21
An Example on the Ground > 10% Difference across boundary More gradual differences Redrawn boundaries 22
Homeowners Amount Relativities by Peril Loss Cost per $1000 of Building Coverage 1.6 1.4 Fire 1.6 1.4 Lightning 1.6 1.4 Wind 1.6 1.4 Hail 1.2 1.2 1.2 1.2 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 16 1.6 16 1.6 16 1.6 16 1.6 1.4 Water Weather 1.4 Water Non Weather 1.4 Theft 1.4 Other 1.2 1.2 1.2 1.2 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 0 0 100,000 200,000 300,000 Current Relativity it Modeled d by Peril Significant variation by peril 23
Homeowners Rating Factors by Peril 24 Rating Factors that vary by peril provide lift Adds accuracy and complexity All-peril relativities iti can be derived d from peril-based relativities according to peril mix within the area Local Prediction by peril may result in varying peril loss costs at the address level Effectively produces all-peril relativities that vary at the address level
Overall Model Diagnostics Sort in order of increasing i prediction Frequency & Severity Group observations in buckets 1/100 th of record count for frequency 1/50 th of the record count for severity Calculate bucket averages Apply the GLM link function for bucket averages and predicted d value logit for frequency log for severity Plot predicted vs empirical With confidence bands 25
Overall Diagnostics - Frequency Empirical vs. Predicted Probabilities: BI (On logistic scales) -3-4 logit p ln 1 p emp pirical.logit -5-6 -7 26-8 -8-7 -6-5 -4-3 predicted.logit
Overall Diagnostics - Severity Empirical i vs. Predicted d Log (Base 10) Severities: i BI 4.6 4.4 empir rical.logsev 4.2 4.0 3.8 27 3.6 3.7 3.9 4.1 4.3 4.5 predicted.logsev
Credibility Statement e t of Principles regarding g P&C Insurance Ratemaking (adopted 1988) 28 Credibility is a measure of the predictive value that the actuary attaches to a particular body of data. Credibility is increased by making groupings more homogeneous or by increasing the size of the group analyzed. A group should be large enough to be statistically reliable. Obtaining homogeneous groupings requires refinement and partitioning i of the data. There is a point at which partitioning divides data into groups too small to provide credible patterns. Each situation requires balancing homogeneity and the volume of data. 28
Credibility Statement of Principles regarding P&C Insurance Ratemaking (adopted 1988) 29 Credibility is a measure of the predictive value that the actuary attaches to a particular body of data. Credibility is increased by making groupings more homogeneous or by increasing the size of the group analyzed. A group should be large enough to be statistically reliable. Obtaining homogeneous groupings requires refinement and partitioning i of the data. There is a point at which partitioning divides data into groups too small to provide credible patterns. Each situation requires balancing homogeneity and the volume of data. 29
Overall Diagnostics - Frequency Empirical vs. Predicted Probabilities: BI (On logistic scales) -3-4 logit p ln 1 p emp pirical.logit -5-6 -7 Credibility = Predictive Value and Statistical Reliability 30-8 -8-7 -6-5 -4-3 30 predicted.logit
Credibility Statement of Principles regarding P&C Insurance Ratemaking (adopted 1988) 31 Credibility is a measure of the predictive value that the actuary attaches to a particular body of data. Credibility is increased by making groupings more homogeneous or by increasing the size of the group analyzed. A group should be large enough to be statistically reliable. Obtaining homogeneous groupings requires refinement and partitioning i of the data. There is a point at which partitioning divides data into groups too small to provide credible patterns. Each situation requires balancing homogeneity and the volume of data. 31
Component Diagnostics Frequency Example 32 Sort observations in order of i th component C i Bucket as above and calculate C ib = Average C i in bucket b ib g i p ib = Average p i in bucket b Partial Residuals p R ln ib C ib kb 1 p ib k i Plot C ib vs R ib Expect linear relationship
Component Diagnostics Experience and Trend Logit Partial Residuals vs. Components: Comprehensive 1.0 logit.pa artial.residual 05 0.5 00 0.0-0.5 33-1.0-0.6-0.1 0.4 0.9 Exp
Component Diagnostics Traffic Composition Logit Partial Residuals vs. Components: Comprehensive 0.4 0.2 logit.pa artial.residual 00 0.0-0.2 34-0.4-0.16-0.11-0.06-0.01 0.04 0.09 0.14 0.19 TrafComp
Component Diagnostics Traffic Density Logit Partial Residuals vs. Components: Comprehensive 0.3 logit.pa artial.residual 0.1-0.1-0.3-0.5 35-0.4-0.3-0.2-0.1 0.0 0.1 0.2 0.3 TrafDen
Partial Residual Plot : Finding Transformations 36
Example of Diagnostics Collinearity and Multicollinearity Correlations Matrix: measures the correlations among each pair of variables in the models, but does not consider multicollinearity. Variance Inflation Factors (VIF): A measure of the multicollinearity among independent variables. 37
Customized Model Loss Cost = Pure Premium Frequency = e 0 1 e 1 5 1 in industry model Severity model customized similarly 38 = Frequency x Severity = + Weather 1 + Traffic Density 2 + 3 Traffic Generators + Traffic Composition 4 + Experience and Trend 5 + Other Classifiers
Predictions at the Address Level Summary Model estimates loss cost as a function of business, demographic and weather conditions associated with address. Preparing data for models based on geography is not a trivial exercise Showed fit assessment and model diagnostics Indicated how to customize the model 39
40 Measuring the Value of Rate Segmentation
Our Challenge Enhanced rate segmentation can add significant value BUT Increased segmentation has a cost How do we evaluate the value vs. cost? How do we make the case to decision makers? 41
How Some Actuaries Make the Case to Increase Segmentation We need to enhance our analytics in order to maintain our competitive pricing advantage! I don t want to lose our pricing advantage. How much will it cost to implement an enhanced pricing strategy? 42
How Some Actuaries Make the Case to Increase Segmentation It will take 100,000 000 IT man-hours costing $10 million to modify our underwriting and agency systems. That s a lot of money to spend! How much additional revenue will we bring in? 43
How Some Actuaries Make the Case to Increase Segmentation We will implement the new rate structure so that it will be revenue neutral. You want me to spend $10 million to get NO additional revenue? That doesn t make any sense! 44
How Some Actuaries Make the Case to Increase Segmentation Why doesn t he understand how important this pricing strategy t is to our business? Where can I find an actuary with some business sense? 45
What s wrong with this dialog? Focus only on implementation costs In a competitive marketplace, there is a cost to doing nothing Lost business, lost revenue, and increasing cost of remaining policies Short-term view of revenue impact Revenue Neutral applies only to average premiums on current book There can be long-term revenue impacts 46
How to make the case better Better projections of revenue and profit impacts Look beyond Revenue Neutral implementation Better consideration of marketplace dynamics Includes customer retention and competitive effects Demonstrate the value in monetary terms 47
The Discounted Cash Flow Trap Projected cash stream from investing in innovation Usual DCF or NPV comparison Assumed cash stream resulting from doing nothing More likely cash stream resulting from doing nothing Should make this comparison Source: Christensen, Kaufmann, Shih, Innovation Killers: How Financial Tools Destroy Your Capacity to Do New Things, Harvard Business Review, Jan 2008 48
Illustration Insurer writes 3 policies All policies priced in the same class Expected Loss Ratio = 50% Profit if Loss Ratio < 60% More accurate segmentation is available in the marketplace Used by competitors Places some policies at risk 49
Illustration Base Case Policy # Premium Insurer s Accurate Expected Break-Even Expected Loss Loss Loss 1 60 30 36 2 60 30 36 3 60 30 36 Total 180 90 108 Ratio to Premium Insurer s Profit 20 16 30 6 40-4 90 18 50% 60% 50% 10% 50
Illustration Year 1 Policy # Premium Insurer s Accurate Expected Break-Even Expected Loss Loss Loss 1 60 30 36 2 60 30 36 3 60 30 36 Total 180 90 108 Ratio to Premium Insurer s Profit 20 16 0 30 6 40-4 90 18 2 50% 60% 50% 10% 1% Lost Profit = 16 51
Value of Lift (VoL) Assume a competitor comes in and takes away the above average risks. Because of adverse selection, the new loss ratio will be higher than the current loss ratio. What is the value of avoiding this fate? $16 in this illustration Insurer could have spent additional $16 for segmentation and been no worse off May express the VoL as a $ per car year. $5.33 per policy 52
Value of Lift ISO Risk Analyzer Personal Auto Environmental Module Coverage Vl Value of flift Bodily Injury $4.99 Property Damage $3.63 Collision $1.61 Comprehensive $4.85 Personal Injury (PIP) $15.04 Combined $13.29 Based on holdout sample of all coverages industry data (4.5 million records) 53
Illustration Year 2 Policy # Premium Insurer s Accurate Expected Break-Even Expected Loss Loss Loss Insurer s Profit 2 70 35 42 30 12 3 70 35 42 40 2 Total 140 70 84 90 14 Ratio to Premium 50% 60% 50% 10% 54
Illustration Year 2 Policy # Premium Insurer s Accurate Expected Break-Even Expected Loss Loss Loss 2 70 35 42 3 70 35 42 Total 140 70 84 Ratio to Premium 50% 60% Insurer s Profit 30 12 0 40 2 90 14 2 50% 10% 1.4% Lost Profit = 12 55
Illustration Year 3 Policy # Premium Insurer s Accurate Expected Break-Even Expected Loss Loss Loss 3 80 40 48 Total 80 80 48 Ratio to Premium 50% 60% Insurer s Profit 40 8 40 8 50% 10% 56
Illustration Summary No Enhanced Segmentation Year Premium Profit Declining Revenue 0 180 18 Declining Profit 1 120 2 2 70 2 3 80 8 NPV 25 Calculate NPV Using 10% discount rate Proper Basis of Comparison 57
The Discounted Cash Flow Trap Projected cash stream from investing in innovation Usual DCF or NPV comparison Assumed cash stream resulting from doing nothing More likely cash stream resulting from doing nothing Should make this comparison Source: Christensen, Kaufmann, Shih, Innovation Killers: How Financial Tools Destroy Your Capacity to Do New Things, Harvard Business Review, Jan 2008 58
Alternative Scenario Enhanced Segmentation Profit excl Marginal Year Premium Marginal Costs Costs Profit 0 180 18 10 8 1 180 18 3 15 2 180 18 3 15 3 180 18 3 15 NPV 41 59 Assume premium and policies are retained Directly consider implementation costs Higher first year expenses
Comparison No Enhanced Segmentation Year Premium Profit 0 180 18 1 120 2 2 70 2 3 80 8 Enhanced Segmentation Year Premium Profit 0 180 8 1 180 15 2 180 15 3 180 15 NPV 25 NPV 41 Greater NPV for Enhanced Segmentation 60
References Glenn Meyers, Value of Lift, Actuarial Review, May 2008 David Cummings, Value of Lift AN Net Present Value Framework, Actuarial Review, Feb 2009 61
Extensions of this Approach Refined considerations of retention and conversion effects Consider different premium scenarios Projections are inherently uncertain Use stochastic simulation to project future scenarios under uncertainty Connection with Strategic Risk Management 62
Summary Predictive Modeling has had a profound impact on the insurance industry Significant innovations in progress for the next wave of advanced analytics Assessing the value of segmentation requires understanding of marketplace dynamics Profitability and market share are at risk for those who do nothing 63