Dealing with continuous variables and geographical information in non-life insurance ratemaking
Maxime Clijsters
Introduction
Policyholder variables: vehicle type (4x4, Y/N); kilowatt of the vehicle; age; age of the vehicle; age of the permit; postal code; professional use (Y/N).
Variable types: categorical variable, continuous variable, multi-level factor.
Tariff?
Introduction
GLMs remain a very important statistical regression technique for pricing car insurance products.
GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
GAM as a complementary modelling tool.
GLM = Generalized Linear Model; GAM = Generalized Additive Model.
AGENDA
Binning continuous variables
» GAM to explore nonlinear effects
» GAM and regression trees for binning
Modelling geographical information
Binning continuous variables
GLM
» A satisfactory modelling tool
» Industry-wide standard
» Only categorical variables
GAM
» Continuous variables
» High computational cost
» No parametric functional form
Binning continuous variables GAM to explore nonlinear effects
It is often not desirable to keep the continuous effect in the tariff:
» GAM has a high computational cost (iterative method)
» GAM lacks a parametric functional form
GAMs provide insight for defining risk-homogeneous groupings of variable values.
Binning continuous variables GAM for binning
The results of the GAM serve as a starting point for binning:
» Broader categories where the risk is similar
» More categories where the risk varies a lot
Boundaries are defined by means of regression trees.
Binning continuous variables Regression tree
» Divide variables into groups based on the GAM estimate
» Find splits that minimize the overall sum of squared errors
» Grow the tree to the desired number of classes
Figure: The black nodes correspond to the regression tree used, the blue nodes are the following splits, and the light blue nodes are the subsequent splits.
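The binning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: it assumes the GAM estimate is already available on a grid of distinct values (the `ages` and `effect` numbers are made up), and it grows a one-dimensional tree best-first, always taking the split that most reduces the sum of squared errors, until the desired number of classes is reached. All function names are hypothetical.

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(ys):
    """Return (gain, index) of the split ys[:i] / ys[i:] that reduces SSE most."""
    total = sse(ys)
    best_gain, best_i = 0.0, None
    for i in range(1, len(ys)):
        gain = total - sse(ys[:i]) - sse(ys[i:])
        if gain > best_gain:
            best_gain, best_i = gain, i
    return best_gain, best_i


def tree_bins(x, y, n_classes):
    """Bin x into at most n_classes intervals based on the fitted effect y.

    Assumes the x values are distinct. Returns the class boundaries,
    placed midway between neighbouring grid points.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    segments = [(0, len(xs))]  # half-open index ranges, one per leaf
    while len(segments) < n_classes:
        best = None
        for lo, hi in segments:
            gain, i = best_split(ys[lo:hi])
            if i is not None and (best is None or gain > best[0]):
                best = (gain, lo, hi, lo + i)
        if best is None:  # no split reduces the SSE any further
            break
        _, lo, hi, cut = best
        segments.remove((lo, hi))
        segments += [(lo, cut), (cut, hi)]
        segments.sort()
    return [(xs[lo - 1] + xs[lo]) / 2 for lo, _ in segments[1:]]


# Made-up GAM estimate of an age effect (illustrative values only).
ages = list(range(18, 30))
effect = [0.9] * 3 + [0.5] * 4 + [0.1] * 5

boundaries = tree_bins(ages, effect, n_classes=3)
print(boundaries)  # [20.5, 24.5]
```

Ages where the fitted effect is flat end up in one broad class, while a changing effect produces more, narrower classes.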
Binning continuous variables Binning results
Figure: Visualization of the classes suggested by the regression tree.
AGENDA
Binning continuous variables
Geographical information modelling
» GLM without geographical information
» GAM with geographical information
» Visualizing and binning
Geographical information Introduction
Bree: latitude 51°07'08.8"N, longitude 5°38'32.5"E
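Coordinates written in degrees, minutes and seconds usually have to be converted to decimal degrees before they can be used as numeric covariates. The sketch below shows that arithmetic for the Bree example; the helper name `dms_to_decimal` is hypothetical, and the slides do not say which coordinate format was actually used in the model.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal(51, 7, 8.8, "N")
lon = dms_to_decimal(5, 38, 32.5, "E")
print(round(lat, 4), round(lon, 4))  # 51.1191 5.6424
```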
Geographical information Step 1: GLM without geographical information
Figures: predicted number of claims per district; observed number of claims per district.
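One way to make the comparison of predicted and observed claims per district concrete is to aggregate both by district and look at their ratio. The sketch below assumes each policy record carries a district label, an observed claim count and the claim count predicted by the GLM; the district names and numbers are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def district_ratios(policies):
    """Return observed / predicted claim counts per district.

    policies: iterable of (district, observed_claims, predicted_claims).
    Districts with zero predicted claims are skipped.
    """
    observed = defaultdict(float)
    predicted = defaultdict(float)
    for district, obs, pred in policies:
        observed[district] += obs
        predicted[district] += pred
    return {d: observed[d] / predicted[d] for d in observed if predicted[d] > 0}


# Invented records: (district, observed claims, GLM-predicted claims).
policies = [
    ("A", 1, 0.25), ("A", 0, 0.25), ("A", 1, 0.5),
    ("B", 0, 0.5), ("B", 1, 0.5), ("B", 0, 1.0),
]

ratios = district_ratios(policies)
print(ratios)  # {'A': 2.0, 'B': 0.5}
```

A ratio well above 1 flags a district where the model without geographical information underestimates claims, and a ratio well below 1 flags the opposite.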
Geographical information Step 2: GAM with geographical information
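The slides do not state the model specification. A common way to write the two steps, assuming a Poisson claim-frequency model with log link, exposure $e_i$ as offset, and a two-dimensional smooth $f$ of the coordinates (all of these are assumptions), is:

```latex
% Assumed specification; N_i is the claim count and e_i the exposure of policy i,
% x_{ij} are the (binned) rating factors, f is a smooth function of the coordinates.
\begin{align*}
\text{Step 1 (GLM):}\quad \log \mathbb{E}[N_i] &= \log e_i + \beta_0 + \sum_{j} \beta_j x_{ij} \\
\text{Step 2 (GAM):}\quad \log \mathbb{E}[N_i] &= \log e_i + \beta_0 + \sum_{j} \beta_j x_{ij}
  + f(\mathrm{long}_i, \mathrm{lat}_i)
\end{align*}
```

Under this reading, $f$ captures the spatial variation in claim frequency that the other rating factors leave unexplained.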
Geographical information Visualizing and binning the geographic effect
Problematic issue:
» Different classification methods can yield dissimilar classes
» Maps are very sensitive to the classification method used
» Visualization of the same data can convey different impressions
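The sensitivity to the classification method can be seen on a toy example. The sketch below compares two standard schemes, equal-interval and quantile classification, on the same made-up values; the slides do not say which methods were compared, the function names are hypothetical, and the quantile rule used here is a simple one without interpolation.

```python
import bisect


def equal_interval_breaks(values, k):
    """Breaks that cut the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Breaks that put roughly the same number of observations in each class."""
    s = sorted(values)
    n = len(s)
    return [s[(n * i) // k] for i in range(1, k)]


def classify(values, breaks):
    """Class index of each value; a value equal to a break goes to the upper class."""
    return [bisect.bisect_right(breaks, v) for v in values]


# Made-up, skewed geographic effects for eight districts.
effects = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 4.0]

by_interval = classify(effects, equal_interval_breaks(effects, 4))
by_quantile = classify(effects, quantile_breaks(effects, 4))
print(by_interval)  # [0, 0, 0, 0, 0, 0, 0, 3]
print(by_quantile)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal intervals, one outlying district dominates and the rest of the map would appear uniform; with quantiles, the same data would show four evenly populated classes.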
Conclusion
GLMs remain a very important statistical regression technique for pricing car insurance products.
GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
Care is needed when reading and interpreting choropleth maps:
» Different classification techniques produce different results.
» Classification strongly affects the visual impressions readers obtain.