Dealing with continuous variables and geographical information in non life insurance ratemaking. Maxime Clijsters


Introduction
Policyholder:
- Vehicle type (4x4 Y/N)
- Kilowatt of the vehicle
- Age
- Age of the vehicle
- Age of the permit
- Postal code
- Professional use (Y/N)
Variable types: categorical variable, continuous variable, multi-level factor
Tariff?

Introduction
- GLMs remain a very important statistical regression technique for pricing car insurance products
- GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost
- GAM as a complementary modelling tool
GLM = Generalized Linear Model
GAM = Generalized Additive Model

AGENDA
- Binning continuous variables
  » GAM to explore nonlinear effects
  » GAM and regression trees for binning
- Modelling geographical information

Binning continuous variables
GLM
- GLM is a satisfactory modelling tool
- Industry-wide standard
- Only categorical variables
GAM
- Continuous variables
- High computational cost
- No parametric functional form

Binning continuous variables: GAM to explore nonlinear effects
- Often not desirable to keep the continuous effect in the tariff
  » GAM has a high computational cost (iterative method)
  » GAM lacks a parametric functional form
- GAMs provide insight into defining risk-homogeneous groupings of variables

Binning continuous variables: GAM for binning
- Results of the GAM as a starting point for binning
  » Broader categories where the risk is similar
  » More categories when the risk varies a lot
- Defining boundaries by means of regression trees

Binning continuous variables: Regression tree
- Divide variables into groups based on GAM estimate
- Find splits that minimize overall sum of squared errors
- Grow tree with desired number of classes
Figure: The black nodes correspond to the regression tree used, the blue nodes are the following splits, and the light blue nodes are the subsequent splits.
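The regression-tree step can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the GAM estimate is already available as one fitted value per level of the continuous variable, and the function names (`sse`, `best_split`, `tree_bins`) and the example numbers are hypothetical. The tree is grown best-first: at each step the leaf whose split most reduces the overall sum of squared errors is split, until the desired number of classes is reached.

```python
def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Best single split of a leaf (xs sorted): (SSE reduction, threshold) or None."""
    best = None
    total = sse(ys)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        gain = total - sse(ys[:i]) - sse(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, (xs[i - 1] + xs[i]) / 2)
    return best


def tree_bins(xs, ys, n_classes):
    """Grow a one-variable regression tree until it has n_classes leaves.

    Returns the sorted split thresholds, i.e. the bin boundaries.
    """
    leaves = [sorted(zip(xs, ys))]
    thresholds = []
    while len(leaves) < n_classes:
        candidates = []
        for idx, leaf in enumerate(leaves):
            split = best_split([p[0] for p in leaf], [p[1] for p in leaf])
            if split is not None:
                candidates.append((split[0], split[1], idx))
        if not candidates:
            break  # no leaf can be split any further
        _, thr, idx = max(candidates)  # split with the largest SSE reduction
        leaf = leaves.pop(idx)
        leaves.insert(idx, [p for p in leaf if p[0] > thr])
        leaves.insert(idx, [p for p in leaf if p[0] <= thr])
        thresholds.append(thr)
    return sorted(thresholds)


# Hypothetical GAM estimates of the age effect (illustrative numbers only).
ages = list(range(18, 30))
gam_effect = [0.9] * 3 + [0.5] * 4 + [0.1] * 5
boundaries = tree_bins(ages, gam_effect, n_classes=3)
print(boundaries)
```

With these made-up estimates the tree puts the boundaries where the estimated effect jumps, so ages with a similar effect end up in the same class.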

Binning continuous variables: Binning results
Figure: Visualization of the classes suggested by the regression tree

AGENDA
- Binning continuous variables
- Geographical information
  » Modelling
  » GLM without geographical information
  » GAM with geographical information
  » Visualizing and binning

Geographical information: Introduction
Figure: latitude and longitude axes. Bree: 51°07'08.8"N 5°38'32.5"E
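To use such coordinates as continuous covariates, they are usually expressed in decimal degrees. The slides do not show this conversion; the helper below (`dms_to_decimal`, a hypothetical name) is only a sketch of the standard arithmetic applied to the Bree example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Bree: 51°07'08.8"N 5°38'32.5"E
latitude = dms_to_decimal(51, 7, 8.8, "N")
longitude = dms_to_decimal(5, 38, 32.5, "E")
print(round(latitude, 6), round(longitude, 6))
```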

Geographical information: Step 1: GLM without geographical information
Figure panels: Predicted number of claims per district; Observed number of claims per district
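The comparison of predicted and observed claims per district can be sketched as a simple aggregation. This assumes each policy already carries a predicted claim count from the GLM fitted without geographical information; the record layout, the function name `district_summary` and the numbers are hypothetical.

```python
from collections import defaultdict


def district_summary(records):
    """Aggregate policy-level claims per district.

    records: iterable of (district, observed_claims, predicted_claims).
    Returns {district: (observed, predicted, observed / predicted)}.
    Predicted totals are assumed to be strictly positive.
    """
    observed = defaultdict(float)
    predicted = defaultdict(float)
    for district, obs, pred in records:
        observed[district] += obs
        predicted[district] += pred
    return {
        d: (observed[d], predicted[d], observed[d] / predicted[d])
        for d in observed
    }


# Hypothetical policies: (district, observed claims, GLM prediction).
policies = [
    ("A", 2, 0.75),
    ("A", 1, 0.75),
    ("B", 0, 1.25),
    ("B", 1, 0.75),
]
summary = district_summary(policies)
for district, (obs, pred, ratio) in sorted(summary.items()):
    print(district, obs, pred, ratio)
```

A ratio well above or below 1 in a district suggests a geographic effect that the GLM without geographical information does not capture.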

Geographical information: Step 2: GAM with geographical information
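The slides do not state the model equation. A common way to write such a model, given here only as an assumed illustration, is a Poisson claim-frequency GAM with log link, an exposure offset $e_i$, the usual tariff factors $x_{ij}$, and a smooth bivariate function $f$ of the coordinates of the policyholder's postal code:

```latex
\log \mathbb{E}[N_i] = \log(e_i) + \beta_0 + \sum_{j} \beta_j x_{ij} + f(\mathrm{long}_i, \mathrm{lat}_i)
```

Here $N_i$ is the number of claims of policy $i$. Dropping the term $f(\mathrm{long}_i, \mathrm{lat}_i)$ gives back a GLM without geographical information, as in Step 1.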

Geographical information: Visualizing and binning the geographic effect
Problematic issue
- Different classification methods can yield dissimilar classes
- Maps are very sensitive to the classification method used
- Visualization of the same data can convey different impressions
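A small sketch makes the sensitivity concrete. The slides do not name the classification methods compared; equal-interval and quantile classification are used here as two common examples, and the district effects are made-up numbers.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Class breaks that cut the value range into equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + k * width for k in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Class breaks that put (roughly) the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]


def classify(values, breaks):
    """Class index per value; a value equal to a break falls in the lower class."""
    return [bisect_left(breaks, v) for v in values]


# Hypothetical, right-skewed geographic effects for eight districts.
effects = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 2.0, 8.0]

by_interval = classify(effects, equal_interval_breaks(effects, 4))
by_quantile = classify(effects, quantile_breaks(effects, 4))
print(by_interval)
print(by_quantile)
```

With equal intervals, the single extreme district occupies the top class alone and all others share the lowest class; with quantiles, the same data are spread evenly over four classes. A choropleth map coloured by either result would give a very different impression.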

Conclusion
- GLMs remain a very important statistical regression technique for pricing car insurance products.
- GAMs provide interesting insights into the underlying dependency structure, but come at a high computational cost.
- Care is needed when reading and interpreting choropleth maps.
  » Different classification techniques produce different results.
  » Classification strongly affects the visual impressions readers obtain.