Survival analysis methods in Insurance Applications in car insurance contracts

Abder OULIDI 1, Jean-Marie MARION 2, Hervé GANACHAUD 3

Abstract

In this work, we are interested in survival models and their applications to actuarial problems. We particularly study the Cox model and the Aalen model, which allow covariate effects to vary with time (time-dependent covariates); this yields more precise results on the lifespan of car insurance contracts. We are interested in the relationship between the lifespan of contracts and some predictive covariates. For example, the Bonus-Malus is a covariate influencing contracts, and we noticed that the more the Bonus-Malus increases, the more the risk of cancellation increases. We also studied time-dependent covariates (internal covariates are generated by the individuals under study; external covariates are more easily observed since they are independent of the study subject), applied to the Bonus-Malus, because the insured keeps it when changing insurance company. We compare the lifespan of car insurance contracts estimated by survival models (nonparametric, parametric and semi-parametric models with fixed and time-dependent covariates).

Keywords: Cox model, Aalen model, survival distributions, censored data, Kaplan-Meier, lifespan of car insurance contracts.

1 Institut de Mathématiques Appliquées, 44 Rue Rabelais, BP 88 491 ANGERS CEDEX 1, oulidi@ima.uco.fr
2 Institut de Mathématiques Appliquées, 44 Rue Rabelais, BP 88 491 ANGERS CEDEX 1, jean-marie-marion@uco.fr
3 Groupe MMA, DCTGP - 1 Boulevard Alexandre Oyon, 723 LE MANS Cedex 9, ganachaud@groupe-mma.fr
1. Introduction

Car insurance is a mature market with a weak growth rate. Furthermore, in this much sought-after sector, new actors (bank-insurers, large retailers) have joined the traditional ones. Confronted with strong competition, aggravated by the quasi-stability of the insurable motor vehicle population, and with the advent of the future European prudential framework, insurers are led to develop optimal models for the surveillance and management of their portfolios, in order, among other things, to build the loyalty of the most profitable customers and possibly to cancel some insurance contracts.

In this work, we use survival models and their applications to actuarial problems. We are interested in the relationship between the lifespan of contracts and some predictive covariates. We particularly study models with time-dependent covariates; this yields more precise results on the lifespan of car insurance contracts. We apply these methods of survival analysis to an actuarial dataset.

The origin of survival analysis can be traced to early work on mortality tables, which was followed and expanded by statistical research for engineering applications. There are also other fields of application: medicine, biology and economics. In our paper, we use models of survival analysis in an actuarial context.

In section 2 we give a brief overview of traditional survival models (non-parametric, parametric and semi-parametric models with censoring). Section 3 is dedicated to the Cox model with time-dependent covariates. In section 4 we discuss the Aalen model, which allows covariate effects to vary with time. In section 5, we consider a dataset from a French insurance company which contains information about car insurance contracts. We investigate the time from the conclusion until the cancellation of a car insurance contract. Several attributes of the policyholder are given. We compare survival models on this dataset. The Aalen model with time-dependent covariates yields more precise results on the lifespan of car insurance contracts.

2. Survival models

In prospective studies, the important feature is not only the outcome event, but also the time to event, the survival time. For example, the survival time T runs from the conclusion (starting point) until the cancellation (ending event) of a contract. The distribution of T from the starting point to the event of interest, viewed as a positive random variable, is characterized by the probability density function f or the cumulative distribution function F.
The survival function is defined by $S(t) = P(T > t)$ and the hazard function, denoted $h$, is defined by

$$h(t) = \frac{f(t)}{S(t)} = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\, P\big(T \in (t, t + \Delta t] \mid T > t\big).$$

The hazard function specifies the instantaneous rate of contract cancellation at time t, given that the contract survives up to t.

The cumulative hazard function is defined by $H(t) = \int_0^t h(x)\,dx$. We find that $S(t) = \exp\{-H(t)\}$ since $S(0) = 1$.

The functions f, F, S and h give mathematically equivalent specifications of the distribution of T.

A special source of difficulty in the analysis of survival data is the possibility that some individuals may not be observed for the full time to event. This problem is called censoring and the associated variable is denoted by C. Censoring arises, for example, when observation is terminated before the occurrence of the event. If the cancellation of a contract is not observed, we define $(Y, D)$ with $Y = \min(T, C)$, where $D$ is the censoring indicator.

Non parametric models

In this section we discuss the analysis of survival data without parametric assumptions about the distribution of T. Our topic is non-parametric estimation of the survival function.

Assume we have a right-censored sample of survival data $(T_1, \dots, T_n)$; in practice we observe $Y_i = \min(T_i, C_i)$ and $D_i = 1_{\{T_i \le C_i\}}$, where $(C_1, \dots, C_n)$ denote the right-censoring times. Let $(Y_{(1)}, \dots, Y_{(n)})$ be the ordered sample, with $D_{(1)}, \dots, D_{(n)}$ the correspondingly ordered indicators. Let $R(t)$ be the number of individuals at risk just prior to $t$ (these are cases whose duration time is at least $t$) and $M(Y_{(i)})$ the number of cancellations at $Y_{(i)}$. Then

$$\hat S(t) = \prod_{\{i:\, Y_{(i)} < t\}} \left(1 - \frac{M(Y_{(i)})}{R(Y_{(i)})}\right)$$

is called the product-limit estimator or Kaplan-Meier estimator of $S(t)$; it is the most common method of estimating the survival function.

Parametric models

In parametric models, the survival time T belongs to a specified class of distributions. These distributions are described by a finite number of parameters, which are to be estimated from a data set.
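As a brief illustration of the product-limit estimator defined in the non-parametric subsection above, the following Python sketch computes it by hand. The function name `kaplan_meier` and the toy durations are hypothetical and are not taken from the insurance dataset; the sketch reports the value of the estimate just after each distinct cancellation time, which is the value the formula above gives for any t slightly larger than that time.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Product-limit estimate of S just after each distinct event time.

    durations: observed times Y_i = min(T_i, C_i)
    observed:  indicators D_i (1 if the cancellation was observed, 0 if censored)
    """
    # M(t): number of observed cancellations at each distinct time t
    events = Counter(t for t, d in zip(durations, observed) if d)
    surv = 1.0
    curve = []
    for t in sorted(events):
        # R(t): contracts whose observed duration is at least t
        at_risk = sum(1 for y in durations if y >= t)
        surv *= 1.0 - events[t] / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy sample: six contracts, two of them right censored.
durations = [2, 3, 3, 5, 7, 8]
observed = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(durations, observed)
for t, s in km_curve:
    print(f"t = {t}: S = {s:.4f}")
```

Censored contracts never trigger a factor of their own; they only contribute through the at-risk counts, which is how the estimator uses the partial information they carry.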
Let $t_1, \dots, t_n$ be a sample from a known distribution $f(x, \theta)$, where $\theta$ is a parameter, vector-valued or not. In practice we observe $y_1, \dots, y_n$, a possibly right- or left-censored set of observations.

Parametric models, or regression procedures, are techniques for assessing the relationship between survival times and a set of explanatory variables (or covariates). For example, the Bonus-Malus and the age of the vehicle influence the lifespan of a car insurance contract.

A characteristic of survival data is that the response cannot be negative. This suggests that a transformation of the survival time, such as a log transformation, may be necessary, or that specialized methods may be more appropriate than those that assume a normal distribution for the error term. The parametric model is of the form

$$\ln y_i = \tilde x_i \beta + \eta\, \varepsilon_i, \qquad i = 1, \dots, n$$

where $\tilde x_i$ is the transposed vector of covariates corresponding to individual $i$, $\beta$ is a vector of unknown regression parameters, $\eta$ is an unknown scale parameter, and $\varepsilon_i$ is an error term. The error term can be specified so that the survival time follows one of several possible distributions, including, but not limited to, the exponential, log-normal, log-logistic, and Weibull distributions.

In parametric models, we estimate the parameters $\beta$, $\eta$ and those of the distribution of $\varepsilon$. Finally, we obtain the distribution of the survival time T.

Semi-parametric models

Semi-parametric models assume a parametric form for the effects of the explanatory variables on survival times and allow an unspecified form for the underlying survivor function. Among these models, the best known is the Cox regression model. The hazard function of the survival time is given by:

$$h(t \mid x) = h_0(t) \exp(\tilde x \beta)$$

where $h_0$ is an unspecified baseline hazard function, $\tilde x$ is a (transposed) vector of covariate values and $\beta$ is a vector of unknown regression parameters. The effect of the covariates on survival is to act multiplicatively on some unknown baseline hazard rate. The Cox regression is a proportional hazards model.
That is, with time-fixed covariates, the ratio of the hazard functions for any two individuals $i$ and $j$ obeys the relationship

$$\frac{h(t \mid x_i)}{h(t \mid x_j)} = \exp(\tilde x_i \beta - \tilde x_j \beta),$$

thus the hazard ratio is constant with respect to time t.

Let $S_0$ be the baseline survival function associated with $h_0$; we have $S(t \mid x) = [S_0(t)]^{\exp(\tilde x \beta)}$. In order to estimate $\beta$, we observe $(y_{(1)}, \dots, y_{(n)})$, an ordered sample, and we use the partial likelihood function

$$L(y_{(1)}, \dots, y_{(n)}; \beta) = \prod_{i=1}^{n} \left( \frac{\exp(\tilde x_i \beta)}{\sum_{l \in R(y_{(i)})} \exp(\tilde x_l \beta)} \right)^{\delta_i}$$

where the risk set $R(y_{(i)})$ includes those contracts at risk for the event at time $y_{(i)}$, when the event was observed to occur for contract $i$ (or at which time contract $i$ was right censored), that is, contracts for which the cancellation has not yet occurred or which have yet to be right censored. (Notice that censoring times are excluded from the likelihood because for these observations the exponent $\delta_i = 0$.)

Finally we use $S(t \mid x) = [S_0(t)]^{\exp(\tilde x \beta)}$ to estimate S. One may write

$$\hat S(t \mid x) = \prod_{\{j:\, y_{(j)} < t\}} \hat\nu_j^{\,\exp(\tilde x \hat\beta)}$$

where the $\hat\nu_j$ are solutions of the likelihood equations

$$\sum_{l \in \mathcal{D}_j} \frac{\exp(\tilde x_l \hat\beta)}{1 - \nu_j^{\exp(\tilde x_l \hat\beta)}} = \sum_{l \in R(y_{(j)})} \exp(\tilde x_l \hat\beta), \qquad j = 1, \dots, z$$

and $z$ is the number of distinct lifetimes.

Remark: the $\mathcal{D}_j$ are the lifetimes actually observed in the sample.

For detecting violation of the proportional hazards assumption, some methods are recommended:
- Log cumulative hazard rate: We stratify on categorical variables. For each variable, we plot on the same graph the cumulative hazard rate curves against t on a log scale and compare them. If the curves are parallel over time, this supports the proportional hazards assumption. If they cross, this is a blatant violation.
- Scaled Schoenfeld residuals: The Schoenfeld residual is the difference between the covariate at the event time and the expected value of the covariate at this time. As an alternative to proportional hazards, Therneau and Grambsch consider time-varying coefficients $\beta(t) = \beta + \theta g(t)$ for some smooth function $g$. Given $g(t)$, they develop a score test for $(H_0): \theta = 0$ based on a generalized least squares estimation of $\theta$. Under $(H_0)$, we expect the scaled residuals plotted against time to look like a constant function. If not, the hazard ratio is not constant with respect to time t.
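To make the partial likelihood of this section more concrete, here is a minimal Python sketch that evaluates its logarithm for time-fixed covariates. The function name `cox_partial_loglik` and the toy data are assumptions made for illustration only. When several cancellations share the same time, each of them is compared with the same risk set, which corresponds to Breslow's handling of ties.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood: sum over observed cancellations i of
    x_i.beta - log( sum over l in R(y_i) of exp(x_l.beta) )."""
    # Linear predictor x_l.beta for every contract
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored contracts have exponent delta_i = 0
        # Risk set: contracts still under observation at time t_i
        risk_sum = sum(math.exp(scores[l])
                       for l, t_l in enumerate(times) if t_l >= t_i)
        loglik += scores[i] - math.log(risk_sum)
    return loglik


# Hypothetical toy data: four contracts, one covariate, one censored contract.
times = [2, 3, 5, 7]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

# With beta = 0 every contract in a risk set is equally likely to cancel,
# so the log partial likelihood reduces to -(log 4 + log 2 + log 1).
ll0 = cox_partial_loglik([0.0], times, events, covariates)
print(ll0)
```

In practice the estimate of beta is obtained by maximizing this function numerically; the sketch only evaluates it at a given value.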
When the proportional hazards assumption is violated, we can study the Cox model with time-dependent covariates and Aalen's non-parametric additive hazards model.

3. The Cox model with time-dependent covariates

The Cox model can be extended to allow time-dependent covariates. It is often the case that the values of some explanatory variables in a survival analysis change over time (for example the Bonus-Malus variable). It seems natural to use the covariate information that varies over time in an appropriate statistical model. In this case, the Cox model with time-dependent covariates specifies that:

$$h(t \mid x(t)) = h_0(t) \exp(\tilde x(t) \beta)$$

where $\tilde x(t)$ is a time-dependent vector of covariate values.

We can distinguish between internal and external time-dependent covariates:
- For an internal variable, the reason for a change depends on internal characteristics or behavior specific to the individual. For internal covariates, the hazard function no longer determines the survival function through the usual relationship $S(t) = \exp\{-H(t)\}$.
- In contrast, a variable is called an external variable if its values change primarily because of external characteristics of the environment that may affect several individuals simultaneously. For example, an external covariate is one that is not directly related to the cancellation of a car insurance contract.

The partial likelihood function of $\beta$ for this model is given by

$$L(y_{(1)}, \dots, y_{(n)}; \beta) = \prod_{i=1}^{n} \left( \frac{\exp(\tilde x_i(y_{(i)}) \beta)}{\sum_{l \in R(y_{(i)})} \exp(\tilde x_l(y_{(i)}) \beta)} \right)^{\delta_i}$$

The formula for the partial likelihood looks almost identical to the one derived for time-independent covariates. The only difference is that at time $y_{(i)}$, the values of the time-dependent covariates at time $y_{(i)}$ are used, both for the contract cancelled at that time and for the contracts in the risk set at that time. The estimates are obtained by maximizing the partial likelihood function. The major difficulty with time-dependent covariates in the Cox model is computational, because the risk sets used to form L are more complicated with time-dependent covariates (we need to know the exact value of the covariates at each cancellation time for all contracts at risk).

4. Aalen's additive regression model

The proportional hazards model assumes multiplicative effects of covariates on the hazard function, while the additive risk model assumes that the hazard function associated with a set of covariates is the sum of a baseline hazard function and a regression function of the covariates.
The conditional hazard rate at time t, given $x(t)$, can be modelled by the following linear model:

$$h(t \mid x(t)) = \beta_0(t) + \tilde\beta(t)\, x(t)$$

where $\beta_0(t)$ is a baseline hazard function, $x(t)$ is a vector of covariate values and $\beta(t) = (\beta_1(t), \dots, \beta_p(t))$ is a vector of unknown regression parameters.

Direct estimation of $\beta(t)$ is difficult. It is much easier to estimate the cumulative regression functions $B_k(t) = \int_0^t \beta_k(s)\,ds$, where $k \le p$. The estimators of the coefficients $B_k(t)$ are based on a least-squares technique. A crude estimate of $\beta_k(t)$ is given by the slope of the estimate of $B_k(t)$. Better estimates of $\beta_k(t)$ can be obtained by using a smoothing technique.

5. Application

The dataset we consider stems from a French insurance company and contains information about the lifespan of car insurance contracts. After eliminating some records (for example, in some contracts the variable "first date of circulation of the vehicle" is unusable), the dataset consists of 1461 car insurance contracts. All types of cancellation are observed: cancellation by the customer or by the insurance company. Consequently, cancellations are not homogeneous and a small deviation in the lifespan of contracts is possible.

The contracts were created during the period from June 13th, 1974 to December 28th, 1995. The cancellation of a contract could only be observed after January 1st, 1996. For our analysis, the quantity of interest is the contract's lifetime. If the contract was cancelled before February 7th, 2006, we consider the duration between the conclusion and the cancellation of the contract; otherwise we consider the duration between the conclusion of the contract and February 7th, 2006 (thus we have right censoring).

For each contract several covariates are known: the age of the vehicle, the Bonus-Malus variable, and the type of insurance. In this work, we present methods to estimate the lifespan of car insurance contracts (parametric, non-parametric and semi-parametric methods, with time-dependent covariates).

Results

Our main goal was to estimate the survival function of car insurance contracts.
- With no prior information on the survival function, we estimated this function with the non-parametric Kaplan-Meier method.
- To introduce exogenous variables into the model, we considered parametric methods (linear regression models). The log-logistic model provided the best model for the lifespan of car insurance contracts.
- A semi-parametric model, the Cox model, was considered. This model yields easily interpreted estimates of covariate effects, but the assumption of proportional hazards is necessary to make these estimates valid. First, the proportional hazards assumption was investigated by examining graphical diagnostics. We stratified on the exogenous variables (Bonus-Malus, age of vehicle, type of insurance) and plotted, on one graph per variable, the cumulative hazard rate curves against t on a log scale. The Bonus-Malus curves, the age-of-vehicle curves and the type-of-insurance curves cross, and we deduce a violation of the proportional hazards assumption. Secondly, the scaled Schoenfeld residuals and the test for time-varying coefficients were used to assess the proportional hazards assumption. For each covariate, we test whether the Cox model coefficient is time-independent. The results of the test indicate that the proportional hazards assumption is not satisfied. We conclude that the Cox regression model is not an adequate model to describe these data. Some variables were changing over time (the Bonus-Malus variable, for example). The investigation of the Cox model with time-dependent covariates is not possible: in the dataset, the exact value of the Bonus-Malus covariate at each cancellation time is unknown for the contracts at risk.
- Finally, we discussed Aalen's additive regression model. For the jth contract, the conditional hazard rate at time t, given $x_j(t)$, can be modelled by:

  $$h(t \mid x_j(t)) = \beta_0(t) + \sum_{k=1}^{3} \beta_k(t)\, x_{jk}(t)$$

  The column vector $B(t)$, with elements $B_k(t) = \int_0^t \beta_k(s)\,ds$, $k = 1, \dots, 3$ (cumulative regression functions), will be estimated. With our dataset, all coefficients are statistically significant. Then we discuss cumulative regression function plots for this dataset. For example, we note that the more the Bonus-Malus variable increases, the more the risk of cancellation increases over the entire time.
  We also note that the cumulative regression coefficient plot for the type of insurance variable suggests that there is an increase in the hazard rate with increasing time that remains in effect over the first 6 years.
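The least-squares estimation of the cumulative regression functions used for the Aalen model can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the computation carried out on the insurance dataset: it uses a single time-fixed covariate instead of three, the name `aalen_cumulative_coefficients` and the toy data are hypothetical, and the 2x2 least-squares system at each cancellation time is solved in closed form.

```python
def aalen_cumulative_coefficients(times, events, x):
    """Least-squares estimates of B_0(t) and B_1(t) for the additive model
    h(t | x) = beta_0(t) + beta_1(t) * x with one time-fixed covariate.

    At each observed cancellation, the increment is (X'X)^{-1} X' I, where X
    has rows (1, x_j) for the contracts at risk and I flags the cancelled one.
    """
    order = sorted(range(len(times)), key=lambda j: times[j])
    b0 = b1 = 0.0
    path = []
    for i in order:
        if not events[i]:
            continue  # censored contracts add no increment
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        n = len(risk)
        sx = sum(x[j] for j in risk)
        sxx = sum(x[j] ** 2 for j in risk)
        det = n * sxx - sx * sx  # determinant of X'X
        if det <= 1e-12:
            break  # design no longer of full rank: stop estimating
        b0 += (sxx - sx * x[i]) / det
        b1 += (n * x[i] - sx) / det
        path.append((times[i], b0, b1))
    return path


# Hypothetical toy data: four contracts, binary covariate, last one censored.
path = aalen_cumulative_coefficients([1, 2, 3, 4], [1, 1, 1, 0], [1, 0, 1, 0])
for t, b0, b1 in path:
    print(f"t = {t}: B0 = {b0:.2f}, B1 = {b1:.2f}")
```

Plotting the second and third components of `path` against time gives the kind of cumulative regression function plot discussed above; the slope of such a curve is the crude estimate of the corresponding regression function.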
Conclusion

This work on the lifespan of car insurance contracts was an illustration of well-known methods of survival analysis applied to a non-life insurance portfolio. The insurance company can use these estimates of the survival function with covariates to improve, for example, the profitability of car insurance contracts.

References

COX D.R. and OAKES D. (1984), Analysis of survival data, London, Chapman and Hall.
DROESBEKE J.J., FICHET B., TASSI P., eds. (1989), Analyse statistique des durées de vie: Modélisation et données censurées, Economica.
KALBFLEISCH J.D. and PRENTICE R.L. (1980), The statistical analysis of failure time data, New York: Wiley and Sons, Inc.
KAPLAN E.L. and MEIER P. (1958), Nonparametric estimation from incomplete observations, J. Amer. Statist. Assoc. 53, pp 457-481.
LI S. (1996), Survival analysis, Marketing Research, 7(4), 17-23.
PLANCHET F. and THEROND P. (2006), Modèles de durée. Applications actuarielles, Economica.
THERNEAU T.M. and GRAMBSCH P.M. (2001), Modeling Survival Data, Springer.