Survival analysis methods in Insurance Applications in car insurance contracts




Abder OULIDI (1, 2), Jean-Marie MARION (1), Hérvé GANACHAUD (3)
1 Institut de Mathématiques Appliquées (IMA), Angers, France
2 Institut de Statistiques et d'Economie Appliquées (INSEA), Rabat, Morocco
3 Mutuelles du Mans Assurances (MMA), Le Mans, France

Context
- Solvency II directives require insurers to map and identify their own risks, and to analyse and model them.
- Car insurance is a mature market.
- Competition is expanding (bank-insurers).
- The insurable motor vehicle population is nearly stable.
- Insurers are therefore led to develop optimal models for the monitoring and management of their portfolio.

Plan
1. Introduction: definitions and notations
2. Survival models
2.1 Non-parametric models
2.2 Parametric models
2.3 Semi-parametric models
3. Application
3.1 Data set
3.2 Results
4. Conclusion and perspectives

1- INTRODUCTION: Applied fields
- Statutory mortality tables.
- Experience mortality tables.
- Insurance contracts.

1- INTRODUCTION: Definitions and notations
- $T$: survival time, from the starting point until cancellation of a contract.
- $f$: probability density function of $T$; $F$: cumulative distribution function of $T$.
- $S(t) = P(T > t)$: survival function.
- $\lambda(t)$: hazard function, defined by
$$\lambda(t) = \frac{f(t)}{S(t)} = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, P(t \le T < t + \Delta t \mid T \ge t)$$

1- INTRODUCTION: Definitions and notations
$A(t)$: cumulative hazard function, defined by
$$A(t) = \int_0^t \lambda(s)\, ds$$
so that $S(t) = \exp(-A(t))$, since $S(0) = 1$.
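The relation between the survival function and the cumulative hazard can be checked numerically. The sketch below is only an illustration: the Weibull hazard, its parameter values, and the function names are assumptions that do not come from the slides.

```python
import math

def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution: (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def cumulative_hazard(hazard, t, steps=20_000):
    """Approximate A(t) = integral from 0 to t of hazard(s) ds (midpoint rule)."""
    h = t / steps
    return sum(hazard((j + 0.5) * h) for j in range(steps)) * h

# Illustrative values only (not taken from the insurance data).
shape, scale, t = 1.5, 10.0, 7.0

A_t = cumulative_hazard(lambda s: weibull_hazard(s, shape, scale), t)
S_from_hazard = math.exp(-A_t)                      # S(t) = exp(-A(t))
S_closed_form = math.exp(-((t / scale) ** shape))   # Weibull survival function

print(S_from_hazard, S_closed_form)
```

Both numbers should agree up to the integration error, which is what the identity $S(t) = \exp(-A(t))$ predicts.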

2- SURVIVAL MODELS: Non-parametric models
- Kaplan-Meier estimator
- Peterson estimator
- Nelson estimator
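As an illustration of the first estimator in this list, here is a minimal sketch of the Kaplan-Meier product-limit estimator for right-censored durations. The function name and the toy data are hypothetical; they are not the contracts analysed in the slides.

```python
def kaplan_meier(durations, events):
    """Return [(t, S_hat(t))] at each distinct cancellation time.

    events[i] is 1 for an observed cancellation, 0 for a right-censored contract.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        cancelled = 0   # cancellations observed exactly at t
        leaving = 0     # all contracts leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            cancelled += data[i][1]
            leaving += 1
            i += 1
        if cancelled > 0:
            surv *= 1.0 - cancelled / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve

# Toy durations in years (hypothetical).
toy_durations = [2.0, 3.0, 3.0, 5.0, 8.0, 8.0]
toy_events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(toy_durations, toy_events)
for t, s in km:
    print(t, round(s, 4))
```

Censored contracts contribute to the number at risk until their censoring time but never trigger a drop in the curve.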

2- SURVIVAL MODELS: Parametric models
$(t_1, \dots, t_n)$ is a possibly right- and left-censored set of observations from a log-linear model of the form:
$$\ln t_i = \beta' z_i + \sigma \varepsilon_i$$
The distribution of the error term $\varepsilon_i$ is specified so as to yield an exponential, Weibull, log-normal, or log-logistic model.

2- SURVIVAL MODELS: Semi-parametric models
Cox model with time-fixed covariates:
$$\lambda(t \mid z) = \lambda_0(t) \exp(\beta' z)$$
- $\beta$: a vector of regression parameters
- $z$: a vector of covariate values
- $\lambda_0$: an unspecified baseline hazard function

2- SURVIVAL MODELS: Semi-parametric models
The Cox regression model is a proportional hazards model: the "hazard ratio"
$$\frac{\lambda(t \mid z_1)}{\lambda(t \mid z_2)} = \exp\big(\beta'(z_1 - z_2)\big)$$
is independent of $t$.

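2- SURVIVAL MODELS: Semi-parametric models
$t_1, \dots, t_n$ is a sample of ordered observations. In order to estimate $\beta$ we use the "partial likelihood function":
$$L(t_1, \dots, t_n; \beta) = \prod_{i=1}^{n} \frac{\exp(\beta' z_i)}{\sum_{k \in R(t_i)} \exp(\beta' z_k)}$$
where $R(t_i)$ is the set of contracts still at risk at time $t_i$.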
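The partial likelihood can be evaluated directly from its definition. The following is a minimal sketch, assuming no tied cancellation times and restricting the product to observed cancellations (the usual convention when some contracts are censored); the function name and toy data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, covariates):
    """Log of the Cox partial likelihood, assuming no tied event times.

    times[i]      : observed duration of contract i
    events[i]     : 1 if cancelled, 0 if right-censored
    covariates[i] : list of covariate values z_i
    """
    def linear_predictor(z):
        return sum(b * z_j for b, z_j in zip(beta, z))

    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored contracts only appear in risk sets
        # Risk set R(t_i): contracts whose duration is at least t_i.
        risk_sum = sum(
            math.exp(linear_predictor(covariates[k]))
            for k in range(len(times))
            if times[k] >= t_i
        )
        loglik += linear_predictor(covariates[i]) - math.log(risk_sum)
    return loglik

# Toy data (hypothetical): three cancellations, one binary covariate.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 1]
toy_z = [[0.0], [1.0], [0.0]]
print(cox_partial_loglik([0.0], toy_times, toy_events, toy_z))
```

With $\beta = 0$ every contract in a risk set has the same weight, so the value reduces to minus the sum of the logs of the risk-set sizes. In practice $\beta$ is obtained by maximising this function numerically.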

2- SURVIVAL MODELS: Semi-parametric models
How to test the proportional hazards assumption?
- Plots of the log cumulative hazard rate.
- Scaled Schoenfeld residuals.
An alternative to proportional hazards is time-varying coefficients:
$$\beta(t) = \beta + \theta\, g(t)$$
If $\theta \neq 0$, the "hazard ratio" is not constant with respect to time $t$.

2- SURVIVAL MODELS: Semi-parametric models
Alternative models:
A- Cox model with time-dependent covariates:
$$\lambda(t \mid z(t)) = \lambda_0(t) \exp(\beta' z(t))$$
The "partial likelihood function" is defined by:
$$L(t_1, \dots, t_n; \beta) = \prod_{i=1}^{n} \frac{\exp(\beta' z_i(t_i))}{\sum_{k \in R(t_i)} \exp(\beta' z_k(t_i))}$$

2- SURVIVAL MODELS
B- Non-parametric Aalen's additive regression model:
$$\lambda(t \mid Z(t)) = \beta_0(t) + \beta'(t)\, Z(t)$$
Our data, based on a sample of size $n$, consist of the triples $(T_i, \delta_i, Z_i(t))$, where $\delta_i$ is the event indicator for the $i$th contract.

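2- SURVIVAL MODELS Aalen's additive regression model
We define:
$$N(t) = \sum_{1 \le i \le n} N_i(t) \quad \text{with} \quad N_i(t) = \mathbf{1}\{T_i \le t;\ \delta_i = 1\}$$
$$Y(t) = \sum_{1 \le i \le n} Y_i(t) \quad \text{with} \quad Y_i(t) = \mathbf{1}\{T_i \ge t\} \quad \text{(observation at risk at } t^-\text{)}$$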

2- SURVIVAL MODELS Aalen's additive regression model
The additive hazard model can be written in matrix form:
$$dN(t) = Y(t)\, dB(t) + dM(t)$$
- $Y(t)$ is the matrix of the multiplicative intensity model
- $M(t)$ is a mean-zero martingale
- $B(t) = (B_k(t))_{1 \le k \le p}$ with $B_k(t) = \int_0^t \beta_k(s)\, ds$

2- SURVIVAL MODELS Aalen's additive regression model
The least squares estimator for $B(t)$ is given by:
$$\hat{B}(t) = \sum_{i:\, T_i \le t} \big[Y'(T_i)\, Y(T_i)\big]^{-1} Y'(T_i)\, 1(T_i)$$
where $1(T_i)$ is a vector with $i$th element equal to 1 if contract $i$ is cancelled at $T_i$, and 0 otherwise.
An estimate of $\beta_k(t)$ is given by the slope of the estimate $\hat{B}_k(t)$ or by using smoothing techniques.
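The least squares estimator can be sketched in a few lines. The code below is a minimal illustration, assuming time-fixed covariates, an intercept column in the design matrix, and an estimate that stops as soon as the design matrix of the contracts at risk becomes singular. The helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: too few contracts at risk")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def aalen_cumulative_coefficients(times, events, covariates):
    """Return [(T_i, B_hat(T_i))] at each cancellation time T_i."""
    n = len(times)
    p = len(covariates[0]) + 1          # intercept + covariates
    B = [0.0] * p
    path = []
    for i in sorted(range(n), key=lambda j: times[j]):
        if not events[i]:
            continue
        # Non-zero rows of Y(T_i): (1, z_j) for contracts still at risk.
        rows = [[1.0] + list(covariates[j]) for j in range(n) if times[j] >= times[i]]
        yty = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
        # Y'(T_i) 1(T_i) is the design row of the cancelled contract.
        y1 = [1.0] + list(covariates[i])
        try:
            increment = solve(yty, y1)
        except ValueError:
            break                        # estimator no longer defined
        B = [b + d for b, d in zip(B, increment)]
        path.append((times[i], B[:]))
    return path

# Toy data (hypothetical): six cancellations, one binary covariate.
toy_times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_events = [1, 1, 1, 1, 1, 1]
toy_z = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
for t, b in aalen_cumulative_coefficients(toy_times, toy_events, toy_z):
    print(t, [round(v, 4) for v in b])
```

With no covariates at all, the design matrix reduces to a column of ones and each increment is one over the number of contracts at risk, so the estimate is a cumulative sum of such terms.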

2- SURVIVAL MODELS Aalen's additive regression model
The estimator of the covariance matrix of $\hat{B}(t)$ is:
$$\mathrm{Var}\big(\hat{B}(t)\big) = \sum_{i:\, T_i \le t} \big[Y'(T_i)\, Y(T_i)\big]^{-1} Y'(T_i)\, 1(T_i)\, 1'(T_i)\, Y(T_i)\, \big[Y'(T_i)\, Y(T_i)\big]^{-1}$$
The hypothesis of no regression effect for one or more covariates is tested by:
$$(H_0):\ B_k(t) = 0$$

Dataset
Dataset from a French insurance company: 1461 car insurance contracts created between June 13th, 1974 and December 28th, 1995.
- Cancellation of a contract could only be observed after January 1st, 1996.
- If the contract was cancelled before February 7th, 2006, we use the duration between the conclusion (signing) of the contract and its cancellation; otherwise the observation is right-censored.

Dataset
Lifetime variable: lifespan of the car insurance contract (Durvie)
- If cancellation is before February 7th, 2006: Durvie = contract cancellation date - contract conclusion date
- If cancellation is after February 7th, 2006: Durvie = February 7th, 2006 - contract conclusion date (February 7th, 2006 is the fixed right-censoring date)
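The construction of the lifetime variable can be sketched as follows. The censoring date comes from the slides; the function name, the conversion of days to years with 365.25 days per year, and the example dates are assumptions made for illustration.

```python
from datetime import date

CENSORING_DATE = date(2006, 2, 7)   # fixed right-censoring date (from the slides)
DAYS_PER_YEAR = 365.25              # assumption: conversion of days to years

def durvie(conclusion_date, cancellation_date=None):
    """Return (duration in years, event indicator).

    The event indicator is 1 for a cancellation observed before the
    censoring date and 0 for a right-censored contract.
    """
    if cancellation_date is not None and cancellation_date < CENSORING_DATE:
        days = (cancellation_date - conclusion_date).days
        return days / DAYS_PER_YEAR, 1
    days = (CENSORING_DATE - conclusion_date).days
    return days / DAYS_PER_YEAR, 0

# Hypothetical contracts.
print(durvie(date(1990, 1, 1), date(2000, 1, 1)))   # cancelled: event = 1
print(durvie(date(1995, 12, 28)))                   # still active: event = 0
```

The pair (duration, event indicator) is the input expected by the survival estimators discussed in the slides.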

Dataset
Covariates:
Age of vehicle (AgeVehic)
- If AgeVehic ≤ 1: AgeVehic1
- If 1 < AgeVehic ≤ 4: AgeVehic2
- If 4 < AgeVehic ≤ 8: AgeVehic3
- If 8 < AgeVehic: AgeVehic4
Type of insurance (Formule)
- Tierce Intégrale (comprehensive, all-risks cover): Formule1
- Tierce Maxi (third-party liability + damage cover): Formule2
- Tierce Simple (third-party liability only): Formule3

Dataset
Bonus-Malus variable (BM)
- If Bonus-Malus = 0.5 (bonus of 50 %): BM1
- If 0.5 < Bonus-Malus ≤ 0.7 (30 % ≤ bonus < 50 %): BM2
- If Bonus-Malus > 0.7 (bonus < 30 %, or malus): BM3
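The class definitions for the vehicle age and the Bonus-Malus coefficient amount to simple threshold rules. The sketch below assumes that the interval bounds are closed on the right (the inequality signs are partly lost in the transcript) and that 0.5 is the smallest possible Bonus-Malus coefficient; the function names are hypothetical.

```python
def agevehic_class(age):
    """Map the age of the vehicle to its class label."""
    if age <= 1:
        return "AgeVehic1"
    if age <= 4:
        return "AgeVehic2"
    if age <= 8:
        return "AgeVehic3"
    return "AgeVehic4"

def bm_class(bonus_malus):
    """Map the Bonus-Malus coefficient to its class label.

    Assumption: 0.5 is the minimum coefficient, so '<= 0.5' selects BM1.
    """
    if bonus_malus <= 0.5:
        return "BM1"
    if bonus_malus <= 0.7:
        return "BM2"
    return "BM3"

print(agevehic_class(3), bm_class(0.64))
```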

Results
Number of contracts

             All    Censoring   Cancellation
Total        1461   537         924
BM1          569    266         303
BM2          387    142         245
BM3          505    129         376
Formule1     343    140         203
Formule2     589    222         367
Formule3     529    175         354
AgeVehic1    49     12          37
AgeVehic2    260    111         149
AgeVehic3    449    161         288
AgeVehic4    703    253         450

Results
Mean of DurVie (in years) on January 1st, 1996

             All     Censoring   Cancellation
DurVie       10.24   14.79       7.59
BM1          12.63   15.94       9.73
BM2          9.77    14.11       7.26
BM3          7.89    13.18       6.09
Formule1     9.19    12.80       6.70
Formule2     10.58   14.92       7.96
Formule3     10.52   16.22       7.71
AgeVehic1    6.20    10.62       4.77
AgeVehic2    8.20    11.78       5.53
AgeVehic3    8.84    13.17       6.43
AgeVehic4    12.16   17.34       9.25

Results
Cox model

            coef     exp(coef)   se(coef)   z       p
BM          0.419    1.520       0.0400     10.47   0.0e+000
Formule     0.181    1.199       0.0577     3.14    1.7e-003
Agevehic    -0.326   0.722       0.0522     -6.25   4.2e-010

Rsquare = 0.112
Likelihood ratio test = 174 on 3 df, p = 0
Wald test = 174 on 3 df, p = 0
Score (logrank) test = 179 on 3 df, p = 0
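The columns of the Cox output are linked by two simple relations: the hazard ratio is exp(coef) and the Wald statistic is z = coef / se(coef). The sketch below recomputes both from the reported coefficients and standard errors; the dictionary layout is an illustrative choice. Since each variable has a single coefficient, it seems that the classes were entered as numeric codes, in which case exp(coef) would be the hazard ratio for a one-class increase; this reading is an assumption, not something the slides state.

```python
import math

# (coef, se(coef)) as reported in the Cox model output of the slides.
cox_table = {
    "BM": (0.419, 0.0400),
    "Formule": (0.181, 0.0577),
    "Agevehic": (-0.326, 0.0522),
}

derived = {
    name: (math.exp(coef), coef / se)   # (hazard ratio, Wald z statistic)
    for name, (coef, se) in cox_table.items()
}

for name, (hazard_ratio, z) in derived.items():
    print(f"{name:9s} exp(coef) = {hazard_ratio:.3f}   z = {z:.2f}")
```

Small differences in the last digit compared with the reported values come from the rounding of the coefficients in the output.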

Results
[Figure: survival function versus time in years, by Bonus-Malus class (BM1, BM2, BM3)]

Results
[Figure: survival function versus time in years, by vehicle age class (Agevehic1, Agevehic2, Agevehic3, Agevehic4)]

Results
[Figure: survival function versus time in years, by type of insurance (Formule1, Formule2, Formule3)]

Results
[Figure: log-log survival function versus time in years, by Bonus-Malus class (BM1, BM2, BM3)]

Results
[Figure: log-log survival function versus time in years, by vehicle age class (Agevehic1, Agevehic2, Agevehic3, Agevehic4)]

Results
[Figure: log-log survival function versus time in years, by type of insurance (Formule1, Formule2, Formule3)]

Results
Proportional hazards test: $\beta(t) = \beta + \theta\, g(t)$
Test of $H_0$: $\theta = 0$ with $g(t) = t$

            rho       chisq   p
BM          -0.0849   6.33    1.19e-02
Formule     -0.1174   13.46   2.43e-04
Agevehic    0.0987    9.19    2.43e-03
GLOBAL      NA        24.10   2.38e-05
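The p-values of this test can be recovered from the chi-square statistics. The sketch below assumes, as is usual for this kind of test, 1 degree of freedom for each covariate and 3 degrees of freedom for the global test; the closed-form survival functions used here are valid only for those two cases, and the function names are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_3df(x):
    """P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# chisq values as reported in the slides.
chisq = {"BM": 6.33, "Formule": 13.46, "Agevehic": 9.19}
p_values = {name: chi2_sf_1df(x) for name, x in chisq.items()}
p_values["GLOBAL"] = chi2_sf_3df(24.10)

for name, p in p_values.items():
    print(f"{name:9s} p = {p:.2e}")
```

All four p-values fall below 0.05, so under these assumptions the proportional hazards hypothesis is rejected for each covariate and globally.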

Results
[Figures: Schoenfeld residuals]

Results
Additive Aalen Model

Test for non-significant effects
               Supremum-test of significance   p-value H_0: B(t)=0
(Intercept)    4.75                            0
Agevehic       6.17                            0
Formule        4.79                            0
BM             9.30                            0

Test for time invariant effects
               Kolmogorov-Smirnov test   p-value H_0: B(t)=b t
(Intercept)    0.808                     0.009
Agevehic       0.252                     0.001
Formule        0.119                     0.022
BM             0.195                     0.000

               Cramer von Mises test     p-value H_0: B(t)=b t
(Intercept)    1.240                     0.081
Agevehic       0.263                     0.003
Formule        0.112                     0.002
BM             0.206                     0.000


4- Conclusion and perspectives
- Cox models with time-varying covariates are not easy to understand or visualize.
- The Aalen model for failure time analysis allows the inclusion of time-dependent covariates as well as the variation of covariate effects over time.
- Comparison with other models ("duplication" models).
- Tests on another large dataset with new time-dependent covariates.

Some References
1. Aalen, O.O. (1989). A linear regression model for the analysis of life times. Statistics in Medicine 8, 907-925.
2. Cox, D.R. (1972). Regression models and life tables. J. R. Statist. Soc. B 34, 187-220.
3. Grambsch, P. and Therneau, T.M. (1994). Proportional hazards tests and diagnostics based on weighted residuals.
4. Therneau, T.M. and Grambsch, P. (1990). Martingale-based residuals for survival models. Biometrika 77(1), 147-160.