Survival analysis methods in Insurance Applications in car insurance contracts




Abder OULIDI (1), Jean-Marie MARION (2), Hervé GANACHAUD (3)

Abstract

In this work, we are interested in survival models and their applications to actuarial problems. We particularly study the Cox model and the Aalen model, which allow covariate effects to vary with time (time-dependent covariates); this yields more precise results on the lifespan of car insurance contracts. We are interested in the relationship between the lifespan of contracts and some predictive covariates. For example, the Bonus-Malus is a covariate influencing contracts, and we noticed that the more the Bonus-Malus increases, the more the risk of cancellation increases. We also studied time-dependent covariates (internal covariates are generated by the individuals under study, whereas external covariates are more easily observed since they are independent of the study subject), with reference to the Bonus-Malus, which the insured party keeps when changing insurance company. We compare the lifespan of car insurance contracts estimated by survival models (non-parametric, parametric and semi-parametric models) with fixed and time-dependent covariates.

Keywords: Cox model, Aalen model, survival distributions, censored data, Kaplan-Meier, lifespan of car insurance contracts.

(1) Institut de Mathématiques Appliquées, 44 Rue Rabelais, BP 88 491 ANGERS CEDEX 1, oulidi@ima.uco.fr
(2) Institut de Mathématiques Appliquées, 44 Rue Rabelais, BP 88 491 ANGERS CEDEX 1, jean-marie-marion@uco.fr
(3) Groupe MMA, DCTGP - 1 Boulevard Alexandre Oyon, 723 LE MANS Cedex 9, ganachaud@groupe-mma.fr

1. Introduction

Car insurance is a mature market with a weak growth rate. Furthermore, in this much sought-after sector, new actors (bank-insurers, large retailers) have joined the traditional actors. Confronted with strong competition, aggravated by the quasi-stability of the insurable motor vehicle population, and with the advent of the future European prudential framework, insurers are led to develop optimal models for the surveillance and management of their portfolios, among other things to build the loyalty of the most profitable customers and possibly to cancel some insurance contracts.

In this work, we use survival models and their applications to actuarial problems. We are interested in the relationship between the lifespan of contracts and some predictive covariates. We particularly study models with time-dependent covariates; this yields more precise results on the lifespan of car insurance contracts. We apply these methods of survival analysis to an actuarial dataset.

The origin of survival analysis can be traced to early work on mortality tables, which was followed and expanded by statistical research for engineering applications. There are also other fields of application: medicine, biology and economics. In this paper, we use survival analysis models in an actuarial context.

In section 2 we give a brief overview of traditional survival models (non-parametric, parametric and semi-parametric models) with censoring. Section 3 is dedicated to the Cox model with time-dependent covariates. In section 4 we discuss the Aalen model, which allows covariate effects to vary with time. In section 5, we consider a dataset from a French insurance company which contains information about car insurance contracts. We investigate the time from the conclusion until the cancellation of a car insurance contract. Several attributes of the policyholder are given. We compare survival models on this dataset. The Aalen model with time-dependent covariates yields more precise results on the lifespan of car insurance contracts.

2. Survival models

In prospective studies, the important feature is not only the outcome event but also the time to event, the survival time. An example is the survival time T from the conclusion (starting point) until the cancellation (ending event) of a contract. The distribution of T, from the starting point to the event of interest, viewed as a positive random variable, is characterized by the probability density function f or the cumulative distribution function F.

The survival function is defined by $S(t) = P(T > t)$, and the hazard function, denoted $h$, is defined by

$$h(t) = \frac{f(t)}{S(t)} = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, P\big(T \in\, ]t, t + \Delta t] \mid T > t\big).$$

The hazard function specifies the instantaneous rate of contract cancellation at time t, given that the contract survives up to t. The cumulative hazard function is defined by $H(t) = \int_0^t h(x)\,dx$. We find that $S(t) = \exp\{-H(t)\}$ since $S(0) = 1$. The functions f, F, S and h give mathematically equivalent specifications of the distribution of T.

A special source of difficulty in the analysis of survival data is the possibility that some individuals may not be observed for the full time to event. This problem is called censoring, and the associated variable is denoted by C. Censoring arises, for example, when observation is terminated before the occurrence of the event. If the cancellation of a contract is not observed, we define $(Y, D)$ with $Y = \min(T, C)$, where $D$ is the censoring indicator.

Non-parametric models

In this section we discuss the analysis of survival data without parametric assumptions about the distribution of T. Our topic is non-parametric estimation of the survival function. Assume we have a right-censored sample of survival data $(T_1, \dots, T_n)$; what we really observe is $Y_i = \min(T_i, C_i)$ and $D_i = 1_{\{T_i \le C_i\}}$, where $(C_1, \dots, C_n)$ denote the right-censoring times. Let $(Y_{(1)}, \dots, Y_{(n)})$ be the ordered sample, with $D'_1, \dots, D'_n$ the correspondingly ordered indicators. Consider $R(t)$, the number of individuals at risk just prior to t (these are cases whose duration is at least t), and $M(Y_{(i)})$, the number of cancellations at $Y_{(i)}$. Then

$$\hat S(t) = \prod_{\{i;\, Y_{(i)} < t\}} \left(1 - \frac{M(Y_{(i)})}{R(Y_{(i)})}\right)$$

is called the product-limit estimator, or Kaplan-Meier estimator, of $S(t)$; it is the most common method of estimating the survival function.

Parametric models

In parametric models, the survival time T belongs to a class of specified distributions. These distributions are described by a finite number of parameters, which are to be estimated from a data set.
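As an illustration of the product-limit (Kaplan-Meier) estimator defined in the non-parametric setting, here is a minimal Python sketch. The function name `kaplan_meier` and the toy durations are hypothetical and not taken from the paper's dataset. The sketch returns the estimate just after each distinct cancellation time, which is the value the product over $Y_{(i)} < t$ takes for any t between that time and the next one.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Product-limit estimate of S(t) from right-censored data.

    durations: observed times Y_i = min(T_i, C_i)
    events:    indicators D_i (1 = cancellation observed, 0 = right censored)
    Returns the distinct cancellation times and the estimate just after each.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                       # R(t)
        cancelled = np.sum((durations == t) & (events == 1))   # M(t)
        s *= 1.0 - cancelled / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical toy data: contract durations in years, 0 marks a censored contract.
toy_durations = [2, 3, 3, 5, 8, 8, 9]
toy_events = [1, 1, 0, 1, 1, 0, 0]
km_times, km_survival = kaplan_meier(toy_durations, toy_events)
```

Censored contracts never trigger a factor of their own, but they stay in the risk count `at_risk` until their censoring time, which is how the estimator uses the partial information they carry.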

Let $t_1, \dots, t_n$ be a sample from a known distribution $f(x, \theta)$, where $\theta$ is a parameter (vector-valued or not). What we really observe is $y_1, \dots, y_n$, a possibly right- or left-censored set of observations. Parametric models, or regression procedures, are techniques for assessing the relationship between survival times and a set of explanatory variables (or covariates). For example, the Bonus-Malus and the age of the vehicle influence the lifespan of a car insurance contract.

A characteristic of survival data is that the response cannot be negative. This suggests that a transformation of the survival time, such as a log transformation, may be necessary, or that specialized methods may be more appropriate than those that assume a normal distribution for the error term. The parametric model is of the form

$$\ln y_i = \tilde x_i \beta + \eta\, \varepsilon_i, \qquad i = 1, \dots, n,$$

where $\tilde x_i$ is the transposed vector of covariates corresponding to individual i, $\beta$ is a vector of unknown regression parameters, $\eta$ is an unknown scale parameter, and $\varepsilon_i$ is an error term. The baseline distribution of the error term can be specified as one of several possible distributions, including, but not limited to, the exponential, log-normal, log-logistic and Weibull distributions. In parametric models, we estimate the parameters $\beta$, $\eta$ and those of the distribution of $\varepsilon$. Finally, we obtain the distribution of the survival time T.

Semi-parametric models

Semi-parametric models assume a parametric form for the effects of explanatory variables on survival times and allow an unspecified form for the underlying survivor function. Among these models, the best known is the Cox regression model. The hazard function of the survival time is given by

$$h(t \mid x) = h_0(t) \exp(\tilde x \beta),$$

where $h_0$ is an unspecified baseline hazard function, $\tilde x$ is a vector of covariate values (transposed) and $\beta$ is a vector of unknown regression parameters. The effect of the covariates on survival is to act multiplicatively on some unknown baseline hazard rate. The Cox regression is a proportional hazards model. That is, with time-fixed covariates, the ratio of the hazard functions for any two individuals i and j obeys the relationship

$$\frac{h(t \mid x_i)}{h(t \mid x_j)} = \exp(\tilde x_i \beta - \tilde x_j \beta),$$

thus the hazard ratio is constant with respect to time t. Let $S_0$ be the baseline survival function associated with $h_0$; we have $S(t \mid x) = [S_0(t)]^{\exp(\tilde x \beta)}$.

In order to estimate $\beta$, we observe $(y_{(1)}, \dots, y_{(n)})$, an ordered sample, and we use the partial likelihood function

$$L(y_{(1)}, \dots, y_{(n)}; \beta) = \prod_{i=1}^{n} \left[ \frac{\exp(\tilde x_i \beta)}{\sum_{l \in R(y_{(i)})} \exp(\tilde x_l \beta)} \right]^{\delta_i},$$

where the risk set $R(y_{(i)})$ includes those contracts at risk for the event at time $y_{(i)}$, when the event was observed to occur for contract i (or at which time contract i was right censored), that is, contracts for which the cancellation has not yet occurred or which have yet to be right censored. (Notice that censoring times are excluded from the likelihood because for these observations the exponent $\delta_i = 0$.)

Finally, we use $S(t \mid x) = [S_0(t)]^{\exp(\tilde x \beta)}$ to estimate S. One may write

$$\hat S(t \mid x) = \prod_{\{j;\, y_{(j)} < t\}} \hat\nu_j^{\exp(\tilde x \hat\beta)},$$

where the $\hat\nu_j$ are solutions of the likelihood equations

$$\sum_{l \in D_j} \frac{\exp(\tilde x_l \hat\beta)}{1 - \nu_j^{\exp(\tilde x_l \hat\beta)}} = \sum_{l \in R(y_{(j)})} \exp(\tilde x_l \hat\beta), \qquad j = 1, \dots, z,$$

and z is the number of distinct lifetimes.

Remark: the $D_j$ are the lifetimes really observed in the sample.

For detecting violation of the proportional hazards assumption, some methods are recommended:
- Log cumulative hazard rate: We stratify on categorical variables. For each variable, we plot on the same graph the cumulative hazard rate curves against t on a log scale and compare them. If the curves are parallel over time, this supports the proportional hazards assumption. If they cross, this is a blatant violation.
- Scaled Schoenfeld residuals: The Schoenfeld residual is the difference between the covariate at the event time and the expected value of the covariate at this time. As an alternative to proportional hazards, Therneau and Grambsch consider time-varying coefficients $\beta(t) = \beta + \theta g(t)$ for some smooth function g. Given $g(t)$, they develop a score test for $H_0: \theta = 0$ based on a generalized least squares estimate of $\theta$. Under $H_0$, we expect to see a constant function over time. If not, the hazard ratio is not constant with respect to time t.
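To make the partial likelihood of the Cox model with time-fixed covariates concrete, the following Python sketch evaluates its logarithm for a given $\beta$. It is an illustration under stated assumptions, not the authors' implementation: the function name `cox_partial_loglik` and the three-contract toy data are hypothetical, and tied event times are not treated specially.

```python
import numpy as np

def cox_partial_loglik(beta, durations, events, X):
    """Log partial likelihood of the Cox model with time-fixed covariates.

    Each observed cancellation i contributes
        x_i beta - log( sum over contracts l still at risk of exp(x_l beta) ).
    Censored contracts (delta_i = 0) contribute only through the risk sets.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    X = np.asarray(X, dtype=float)
    eta = X @ np.asarray(beta, dtype=float)        # linear predictors x_i beta
    loglik = 0.0
    for i in np.flatnonzero(events == 1):
        in_risk_set = durations >= durations[i]    # contracts in R(y_(i))
        loglik += eta[i] - np.log(np.sum(np.exp(eta[in_risk_set])))
    return loglik

# Hypothetical toy data: three contracts, one binary covariate, last one censored.
toy_durations = [1.0, 2.0, 3.0]
toy_events = [1, 1, 0]
toy_X = [[1.0], [0.0], [1.0]]
value_at_zero = cox_partial_loglik([0.0], toy_durations, toy_events, toy_X)
```

At $\beta = 0$ every contract in a risk set is equally likely to be the one cancelled, so the value reduces to minus the sum of the logarithms of the risk-set sizes. In practice the estimate of $\beta$ is the maximizer of this function, which could be found by passing its negative to a numerical optimizer.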

When the proportional hazards assumption is violated, we can study the Cox model with time-dependent covariates and Aalen's non-parametric additive hazards model.

3. The Cox model with time-dependent covariates

The Cox model can be extended to allow time-dependent covariates. It is often the case that the values of some explanatory variables in a survival analysis change over time (for example, the Bonus-Malus variable). It seems natural to use the covariate information that varies over time in an appropriate statistical model. In this case, the Cox model with time-dependent covariates specifies that

$$h(t \mid x(t)) = h_0(t) \exp(\tilde x(t) \beta),$$

where $\tilde x(t)$ is a time-dependent vector of covariate values.

We can distinguish between internal and external time-dependent covariates:
- For an internal variable, the reason for a change depends on internal characteristics or behavior specific to the individual. For internal covariates, the usual relationship between the hazard function and the survival function no longer holds.
- In contrast, a variable is called an external variable if its values change primarily because of external characteristics of the environment that may affect several individuals simultaneously. For example, an external covariate is one that is not directly related to the cancellation of the car insurance contract.

The partial likelihood function of $\beta$ for this model is given by

$$L(y_{(1)}, \dots, y_{(n)}; \beta) = \prod_{i=1}^{n} \left[ \frac{\exp\big(\tilde x_i(y_{(i)}) \beta\big)}{\sum_{l \in R(y_{(i)})} \exp\big(\tilde x_l(y_{(i)}) \beta\big)} \right]^{\delta_i}.$$

The formula for the partial likelihood looks almost identical to the one derived for time-independent covariates. The only difference is that at time $y_{(i)}$, the values of the time-dependent covariates at time $y_{(i)}$ are used, both for the contract cancelled at that time and for the contracts in the risk set at that time. The estimates are obtained by maximizing the partial likelihood function. The major difficulty with time-dependent covariates in the Cox model is computational, because the risk sets used to form L are more complicated (we need to know the exact value of the covariates at each cancellation time for all contracts at risk).

4. Aalen's additive regression model

The proportional hazards model assumes multiplicative effects of covariates on the hazard function, while the additive risk model assumes that the hazard function associated with a set of covariates is the sum of a baseline hazard function and a regression function of the covariates.

The conditional hazard rate at time t, given $x(t)$, can be modelled by the following linear model:

$$h(t \mid x(t)) = \beta_0(t) + \tilde\beta(t)\, x(t),$$

where $\beta_0(t)$ is a baseline hazard function, $x(t)$ is a vector of covariate values and $\beta(t) = (\beta_i(t))_{1 \le i \le p}$ is a vector of unknown regression parameters. Direct estimation of $\beta(t)$ is difficult. It is much easier to estimate the cumulative regression functions $B_i(t) = \int_0^t \beta_i(s)\,ds$, where $0 \le i \le p$. The estimators of the coefficients $B_i(t)$ are based on a least-squares technique. A crude estimate of $\beta_i(t)$ is given by the slope of the estimate of $B_i(t)$. Better estimates of $\beta_i(t)$ can be obtained by using a smoothing technique.

5. Application

The dataset we are considering stems from a French insurance company and contains information about the lifespan of car insurance contracts. After eliminating some records (for example, in some contracts the variable "first date of circulation of the vehicle" is unusable), the dataset consists of 1461 car insurance contracts. All types of cancellation are observed, whether by the customer or by the insurance company. Consequently, cancellations are not homogeneous and a small deviation in the lifespan of contracts is possible. The contracts were created during the period from June 13th, 1974 to December 28th, 1995. The cancellation of a contract could only be observed after January 1st, 1996. For our analysis, the quantity of interest is the contract's lifetime. If the contract is cancelled before February 7th, 2006, we consider the duration between the conclusion and the cancellation of the contract; otherwise we consider the duration between the conclusion of the contract and February 7th, 2006 (thus we have right censoring). For each contract, several covariates are known: the age of the vehicle, the Bonus-Malus variable and the type of insurance. In this work, we present methods to estimate the lifespan of car insurance contracts (parametric, non-parametric and semi-parametric methods with time-dependent covariates).

Results

Our main goal was to estimate the survival function of car insurance contracts.
- With no prior information on the survival function, we estimated this function with the non-parametric Kaplan-Meier method.

- To introduce exogenous variables into the model, we considered parametric methods (linear regression models). The log-logistic model provided the best model for the lifespan of car insurance contracts.
- A semi-parametric model, the Cox model, was considered. This model yields easily interpreted estimates of covariate effects, but the assumption of proportional hazards is necessary to make these estimates valid. First, the proportional hazards assumption was investigated by examining graphical diagnostics. We stratified on the exogenous variables (Bonus-Malus, age of vehicle, type of insurance) and plotted on the same graph, one per variable, the cumulative hazard rate curves against t on a log scale. The Bonus-Malus curves, the age-of-vehicle curves and the type-of-insurance curves cross, and we deduce a violation of the proportional hazards assumption. Secondly, the scaled Schoenfeld residuals and the test for time-varying coefficients were used to assess the proportional hazards assumption. For each covariate, we test whether the Cox model coefficient is time-independent. The results of the test indicate that the proportional hazards assumption is not satisfied. We conclude that the Cox regression model is not an adequate model to describe these data. Some variables change over time (the Bonus-Malus variable, for example). The investigation of the Cox model with time-dependent covariates is not possible: in the dataset, the exact value of the Bonus-Malus covariate at each cancellation time is unknown for all contracts at risk.
- Finally, we discussed Aalen's additive regression model. For the jth contract, the conditional hazard rate at time t, given $x_j(t)$, can be modelled by

$$h(t \mid x_j(t)) = \beta_0(t) + \sum_{i=1}^{3} \beta_i(t)\, x_{ji}(t).$$

The column vector $B(t)$, with elements $B_i(t) = \int_0^t \beta_i(s)\,ds$, $1 \le i \le 3$ (cumulative regression functions), will be estimated. With our dataset, all coefficients are statistically significant. We then discuss the cumulative regression function plots for this dataset. For example, we note that the more the Bonus-Malus variable increases, the more the risk of cancellation increases over the entire time period. We also note that the cumulative regression coefficient plot for the type-of-insurance variable suggests an increase in the hazard rate with increasing time that remains in effect over the first 6 years.
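The least-squares estimation of the cumulative regression functions $B_i(t)$ mentioned in section 4 can be sketched in Python as follows. This is a simplified illustration and not the computation carried out on the paper's dataset: it assumes time-fixed covariates (the paper's model allows $x(t)$ to vary), the function name `aalen_cumulative_coefficients` and the toy data are hypothetical, and estimation simply stops once the design matrix of the contracts still at risk loses full rank.

```python
import numpy as np

def aalen_cumulative_coefficients(durations, events, X):
    """Least-squares estimate of the cumulative regression functions B(t).

    At each cancellation time, the increment dB solves (in the least-squares
    sense) design_at_risk @ dB = dN, where dN flags the contracts cancelled
    at that time. Column 0 of the result is the cumulative baseline B_0(t).
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    X = np.asarray(X, dtype=float)
    n = len(durations)
    design = np.column_stack([np.ones(n), X])      # intercept column for B_0
    times, increments = [], []
    for t in np.unique(durations[events == 1]):
        at_risk = durations >= t
        dN = ((durations == t) & (events == 1)).astype(float)
        dB, _, rank, _ = np.linalg.lstsq(design[at_risk], dN[at_risk], rcond=None)
        if rank < design.shape[1]:                 # too few contracts left at risk
            break
        times.append(t)
        increments.append(dB)
    return np.array(times), np.cumsum(increments, axis=0)

# Hypothetical toy data: six contracts, all cancelled, one binary covariate.
toy_durations = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
toy_X = [[0], [1], [0], [1], [0], [1]]
aalen_times, aalen_B = aalen_cumulative_coefficients(toy_durations, toy_events, toy_X)
```

Plotting each column of `aalen_B` against `aalen_times` gives the kind of cumulative regression plot discussed above: a roughly straight increasing segment indicates a constant positive effect on the hazard over that period, and a flat segment indicates no effect.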

Conclusion

This work on the lifespan of car insurance contracts is an illustration of well-known methods of survival analysis applied to a non-life insurance portfolio. The insurance company can use these estimates of the survival function with covariates to improve, for example, the profitability of car insurance contracts.

References

COX D.R. and OAKES D. (1984), Analysis of survival data, London, Chapman and Hall.
DROESBEKE J.J., FICHET B., TASSI P., éditeurs (1989), Analyse statistique des durées de vie: Modélisation et données censurées, Economica.
KALBFLEISCH J.D. and PRENTICE R.L. (1980), The statistical analysis of failure time data, New York: Wiley and Sons, Inc.
KAPLAN E.L. and MEIER P. (1958), Non parametric estimation from incomplete observations, J. Amer. Statist. Assoc. 53, pp 457-481.
LI S. (1996), Survival analysis, Marketing Research, 7(4), 17-23.
PLANCHET F. and THEROND P. (2006), Modèles de durée. Applications actuarielles, Economica.
THERNEAU T.M. and GRAMBSCH P.M. (2001), Modeling Survival Data, Springer.