Risk pricing for Australian Motor Insurance

Dr Richard Brookes, November 2012

Contents
1. Background: Scope; How many models?
2. Approach: Data; Variable filtering; GLM; Interactions; Credibility overlay
3. Model performance: Overall; Sources of value; Final tests


Scope
- Risk pricing for an Australian motor portfolio: several brands, all Australian states; 9 million policy-years of exposure
- Brief: build a state-of-the-art risk pricing model
- Coverage: all brands, all states
- This presentation concentrates on at-fault collision frequency

Modelling philosophy
Our aim when deciding how complex the model should be is to minimise prediction error on the test/holdout datasets. On the usual underfit-to-overfit curve of prediction error against model complexity (from Hastie, Tibshirani and Friedman), the existing rating models sit toward the underfit end; the project aims for the minimum of the curve.

How many models?
Fewer models mean increased model complexity and more signal per model; more models mean simpler models with less signal each.
- 2 models (frequency, size): the size model is potentially very complex, covering many different perils, and the frequency model has to incorporate all peril-specific predictors.
- 16 models (frequency and size for each of 8 perils): the models are complex and need to incorporate brand/state interactions. More signal, but does it justify the increased complexity of each model?
- 672 models: close to existing practice, and very hard to maintain and update. Each model is simple, but has low signal and unnecessary structure/parameter error.


Large universe of potential predictors
- Policy attributes: standard policy rating factors (age, gender, value); additional driver information (number of drivers, age, family group, accident flag)
- Policy-related history: policy and customer tenure; past motor claims history for each policy
- Vehicle-related data: vehicle attributes from Glass's Guide (make, model, engine size, PWR, shape, dimensions, tyre width and profile, value)
- Location-based data: 3rd-party socio-demographic segmentation

Partitioning and sampling
44M records and 3,000 potential predictors; the SAS dataset is 1,400GB uncompressed. We need to reduce this to a size that we can deal with:
- Partition and sample horizontally into learn, test and holdout sets, keeping all the 1s but only some of the 0s
- Partition and sample vertically for variable selection
- Have a strategy for time-based testing
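The "keep all the 1s but only some of the 0s" step can be sketched as follows; this is a minimal illustration (the function name and weight convention are mine, not the deck's SAS code), with the weights restoring the original claim frequency in any downstream fit.

```python
import numpy as np

def downsample_zeros(y, keep_frac, seed=0):
    """Keep every claim record (y == 1) and a random keep_frac of the
    non-claim records; return the kept row indices and a weight per
    kept row that restores the original 0/1 balance."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    ones = np.flatnonzero(y == 1)
    zeros = np.flatnonzero(y == 0)
    kept = zeros[rng.random(zeros.size) < keep_frac]
    idx = np.sort(np.concatenate([ones, kept]))
    # Each retained 0 stands in for 1/keep_frac original 0s.
    weights = np.where(y[idx] == 1, 1.0, 1.0 / keep_frac)
    return idx, weights
```

With a frequency model, the same correction can equivalently be applied as an offset of log(keep_frac) on the sampled zeros.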

Partitioning and sampling - horizontal
[Diagram: the learn set (M, 50%) keeps the 1s (260k) and a sample of 940k of the 33m 0s; the test set (T, 25%) is built the same way; the holdout set (H, 25%) is reserved for final validation.]

Time-based testing
[Diagram: data spans 2007-2011, split by year into holdout, test and train periods, with a second holdout on the most recent period ahead of implementation.]


Filtering tool of choice - TreeNet
What is TreeNet (also known as MART or GBM)?
- A gradient-based boosting algorithm using small decision trees as the base learners
- Performs both classification and regression, with various choices of loss function
- Falls into the general class of ensemble-based predictive models
Pros: gives very good out-of-the-box models, which often take some time and effort to surpass.
Cons: the maximum capacity (of the Salford implementation) is about 300 variables and 1M observations; it generally overfits; and it gives preferential treatment to high-level categorical variables.
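TreeNet is Salford Systems' proprietary implementation; scikit-learn's GradientBoostingClassifier is a close open-source stand-in for the same idea (small trees as base learners, shrinkage, importance scores for filtering). The data below is simulated purely for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
# Claim indicator driven only by the first two of ten predictors.
eta = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 2.0
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# Small trees (depth 2) as base learners, with shrinkage for regularisation.
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                 learning_rate=0.05, random_state=0)
gbm.fit(X, y)

# The importance ranking is what drives the variable filtering.
ranking = np.argsort(gbm.feature_importances_)[::-1]
```

In this simulation the two informative predictors come out on top of the ranking, which is exactly the property the filtering process relies on.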

Filtering - general process (initial list ~3,000 predictors; TreeNet ~130; variable clustering ~100; extra TreeNet shaving ~80)
- Partition the dataset (horizontally) and the variables (vertically); main rating variables are included in each variable partition, and high-level categorical variables are grouped or excluded
- Fit TreeNet models to each partition; the best variables from each partition enter the super-group
- Using TreeNet, the super-group is shaved from the bottom until performance on the test set peaks
- Proc Varclus (SAS) is used to discard correlated variables, reaching ~100 predictors
- More TreeNet shaving brings the list to ~80 predictors
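The bottom-up shaving step can be sketched as a loop: refit, drop the least important variable, and keep whichever subset scored best on the test set. Names and settings here are illustrative, not the deck's actual procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def shave(X_train, y_train, X_test, y_test, cols):
    """Repeatedly refit a boosted model, shave the weakest variable,
    and return the subset with the best test-set AUC."""
    best_auc, best_cols = -np.inf, list(cols)
    cols = list(cols)
    while cols:
        m = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                       random_state=0)
        m.fit(X_train[:, cols], y_train)
        auc = roc_auc_score(y_test, m.predict_proba(X_test[:, cols])[:, 1])
        if auc > best_auc:
            best_auc, best_cols = auc, list(cols)
        if len(cols) == 1:
            break
        cols.pop(int(np.argmin(m.feature_importances_)))  # shave from the bottom
    return best_cols, best_auc
```

In practice the deck shaves on a much larger scale (hundreds of variables), but the stopping rule is the same: stop where test-set performance peaks.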

Initial variable filtering
[Diagram: three stratified samples (A, B, C) are each partitioned into variable sets, using both functional and random variable splits; TreeNet fits to each set create refined lists, which are merged into a combined list; a further TreeNet fit is applied to the combined list.]

Filtering - variable shaving
[Chart: TreeNet gains (AUC) score, roughly 0.6 to 0.75, plotted against number of variables (0 to 210); performance peaks at the optimal shaved model size.]

Correlated variables
Problem: highly correlated variables are undesirable in GLMs, causing misleading parameters and significance, odd shapes, and longer fit times. Removing one of a pair of closely correlated variables generally has negligible impact on model performance.
Solution:
- Apply hierarchical variable clustering (Proc Varclus) to identify sets of correlated variables
- Keep one or two variables from each cluster, based on rank in the TreeNet importance lists
- Rerun TreeNet on the reduced variable list to check that performance is not materially degraded
- Produce a correlation matrix to check that no major correlations remain
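A rough open-source stand-in for the Proc Varclus step is to hierarchically cluster variables on 1 - |correlation| and keep the highest-ranked member of each cluster; the threshold and names below are illustrative, and Proc Varclus itself uses a different (principal-components-based) criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_keep(X, importance, max_dist=0.3):
    """Cluster columns of X whose absolute correlation exceeds
    1 - max_dist, then keep the most important member of each cluster."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = squareform(1.0 - corr, checks=False)  # condensed distance vector
    labels = fcluster(linkage(dist, method="average"),
                      t=max_dist, criterion="distance")
    keep = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Retain the variable ranked highest in the importance list.
        keep.append(int(members[np.argmax(np.asarray(importance)[members])]))
    return sorted(keep)
```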


Modelling - techniques
Machine learning models: non-parametric; easy to build; adequate fit with little effort; push-button to update. But performance is often not as good as a GLM, they can be over-parameterised, they give little insight (except decision trees), and recalibration can lead to large changes at the observation level. TF preferred these for variable selection and interaction searches.
Generalised Linear Models (GLMs): structure, form, insight; an equal or better model with fewer parameters; more stable over time; push-button to update, but with some care. Good models are harder to build, and can be very hard if the structure is complex (but this is very rare); they may not pick up all structural nuances. TF preferred a GLM for the main model.

Modelling - GLM fitting strategy
1. Fit a saturated model, including key interactions
2. Find the correct error structure (distribution and mean/variance relationship)
3. Simplify continuous predictors
4. Simplify categorical predictors
5. Search for more interactions
All GLMs were fitted in SAS using TF custom macros.

Mean/variance and distribution
[Chart: observations sorted with the lowest fitted mean μ_i at the left.]

Variance diagnostics

Simplifying continuous variables

Simplifying categorical variables


Extensive search for interactions
Brute-force search:
- Main-effects models are splines plus binaries
- Form a list of all pairwise interactions (6,000!)
- Fit a GLM for each interaction, using the main-effects prediction as an offset
- Order by AIC and p-value (a bit unreliable)
- Fit the candidates in the main model for final evaluation
Suspected interactions were also tested manually.
Machine-learning insights:
- A TreeNet model with two-node trees is a main-effects model; this can be used to evaluate the effect of interactions
- Fit TreeNet models to the residuals, view the model and use it to test further interactions
- RuleFit was also used


Vehicle and geographic overlays
Final model: 54 predictors and 200 parameters, including 100 state/brand interactions and 36 other interactions. Vehicle characteristics and geography are largely accounted for and integrated, but there may still be residual effects from the high-level categorical variables: make, model and zone. The GLM is therefore followed by a vehicle credibility overlay and a geographic credibility overlay.

Credibility overlays
Credibility overlays are used to model residual vehicle and geographic variation.
- Vehicle credibility model hierarchy: make, then model, then variant
- Geographic credibility model hierarchy: statistical district (SD), then statistical sub-district (SSD), then SLA
Each level in the hierarchy has a relativity factor applied, representing a portion of the actual observed relativity; the portion depends on exposure and a tuning parameter. For example, an observed relativity of 1.10 for Holden might become an adopted relativity of 1.04.

Vehicle credibility overlay - details
An adaptation of the approach suggested by Ohlsson (Scandinavian Actuarial Journal, 2008) for a log link:
y* = f_make x f_model x f_variant x y
f_{make=k} = (1 - z_k) + z_k r_k
z_k = w_k / (w_k + T)
w_k = sum over make k of the fitted values y-hat, for the Poisson model
The relativities are fitted sequentially, using the test set to choose the optimal T. The combined assigned relativities are within +/- 15%, and 95% are within +/- 10%.
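The credibility formulas on this slide transcribe directly into code (the function name is mine): z_k = w_k / (w_k + T) and f_k = (1 - z_k) + z_k r_k, so each observed relativity shrinks toward 1 when its weight w_k is small relative to the tuning parameter T.

```python
import numpy as np

def credibility_relativity(w, r, T):
    """Shrink observed relativities r toward 1 according to their
    weights w and the tuning parameter T."""
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    z = w / (w + T)                  # credibility factor in [0, 1)
    return (1.0 - z) + z * r         # adopted relativity
```

For instance, with w = 9000 and T = 1000, z = 0.9 and an observed 1.10 becomes 1.09; the earlier Holden example (1.10 observed, 1.04 adopted) corresponds to z = 0.4.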

Alternative geographic overlays
- Tree-based latitude/longitude modelling: taking the residual as the target, and latitude and longitude plus rotations as predictors, build a tree-based model. Attempted using both random forests and TreeNet; the latter performed more strongly.
- Non-parametric kernel smoothing: for a given location, form a prediction by taking the weighted average residual of the 500 nearest CCDs, with weights defined by a kernel function giving more weight to the closest CCDs. The number of CCDs was chosen empirically. Postcodes (rather than CCDs) were also trialled, giving inferior performance.
- Thin-plate spline (Whittaker) smoothing: using latitude and longitude as predictors, fit a thin-plate spline model to the average residuals by postcode. Too computationally intensive to apply at a more granular level.
- Alternative credibility setups: other types of hierarchies and variables were explored, and the most successful adopted.
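The non-parametric kernel smoother can be sketched as follows: the geographic effect at a query point is the kernel-weighted mean residual of its k nearest units (CCDs in the deck; the Gaussian kernel, k and bandwidth here are illustrative choices, not the deck's).

```python
import numpy as np

def kernel_smooth(coords, residuals, query, k=500, bandwidth=1.0):
    """Weighted average residual of the k nearest points to `query`,
    with Gaussian kernel weights so closer points count for more."""
    coords = np.asarray(coords, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    d = np.linalg.norm(coords - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    w = np.exp(-0.5 * (d[nearest] / bandwidth) ** 2)
    return float(np.average(residuals[nearest], weights=w))
```

At production scale a spatial index (k-d tree) would replace the full sort, but the estimator is the same.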


Comparison with the recalibrated base
Over 30% improvement in area under the curve on holdout data.

Comparison with the recalibrated base
50% of policies changed in risk price by at least 20% (up or down); the area of identified loss represents 11% of total claims.

Overall fit diagnostic
The model shows good discrimination between good and poor risks: the worst 2.5% of drivers have an expected claim frequency 11 times that of the best 2.5%, compared with under 8 times for the base model.
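The "worst 2.5% vs best 2.5%" figure is a tail-lift ratio on the predicted frequencies; a small helper (name and tail fraction are illustrative) makes the definition concrete.

```python
import numpy as np

def tail_lift(pred_freq, q=0.025):
    """Mean predicted frequency in the worst q tail divided by the
    mean in the best q tail."""
    p = np.sort(np.asarray(pred_freq, dtype=float))
    n = max(1, int(round(q * p.size)))
    return float(p[-n:].mean() / p[:n].mean())
```

A value of 1 means no discrimination at all; the deck's final model scores about 11 against under 8 for the base model.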


Sources of value
[Chart: improvement in gains (increase in area under the curve, 0% to 40%) relative to the recalibrated base, decomposed by source: tenure variables and claims history; other variables, including anonymised "mystery" variable groups (A, B, C); vehicle and geographic variables; interactions; vehicle and geographic credibility overlays; and the TreeNet overlay.]


Final tests
Has the multi-brand, multi-state structure cost predictive power?
- A-vs-E diagnostics by brand and state
- Comparison with a model fitted on only one brand and one state
- All satisfactory
Is there more to be gained?
- Compare the final models to the initial optimal TreeNet models: the final models are better (but not by much)
- Fit to claim frequency and size residuals by peril to get a machine-learning overlay: a little bit of extra performance
Has the separate-peril structure cost predictive power?
- Combine all models to get a single claim frequency and claim size
- Fit to the residuals (taking care with the definition of residual) with TreeNet and a GLM
- Nothing compelling