Stochastic Claims Reserving Methods in Non-Life Insurance

Mario V. Wüthrich¹, Department of Mathematics, ETH Zürich
Michael Merz², Faculty of Economics, University of Tübingen

Version 1.1

¹ ETH Zürich, CH-8092 Zürich, Switzerland.
² University of Tübingen, D-72074 Tübingen, Germany.
Contents

1 Introduction and Notation 7
  1.1 Claims process 7
    1.1.1 Accounting principle and accident year 9
    1.1.2 Inflation 10
  1.2 Structural framework to the claims reserving problem 12
    1.2.1 Fundamental properties of the reserving process 13
    1.2.2 Known and unknown claims 15
  1.3 Outstanding loss liabilities, classical notation 16
  1.4 General Remarks 18

2 Basic Methods 21
  2.1 Chain-ladder model (distribution free model) 21
  2.2 The Bornhuetter-Ferguson method 27
  2.3 Number of IBNyR claims, Poisson model 30
    2.3.1 Poisson derivation of the chain-ladder model 34

3 Chain-ladder models 39
  3.1 Mean square error of prediction 39
  3.2 Chain-ladder method 41
    3.2.1 The Mack model 42
    3.2.2 Conditional process variance 46
    3.2.3 Estimation error for single accident years 48
    3.2.4 Conditional MSEP in the chain-ladder model for aggregated accident years 59
  3.3 Analysis of error terms 62
    3.3.1 Classical chain-ladder model 63
    3.3.2 Enhanced chain-ladder model 64
    3.3.3 Interpretation 65
    3.3.4 Chain-ladder estimator in the enhanced model 66
    3.3.5 Conditional process and prediction errors 67
    3.3.6 Chain-ladder factors and conditional estimation error 68
    3.3.7 Parameter estimation 75

4 Bayesian models 85
  4.1 Introduction to credibility claims reserving methods 85
    4.1.1 Benktander-Hovinen method 86
    4.1.2 Minimizing quadratic loss functions 89
    4.1.3 Cape-Cod Model 92
    4.1.4 A distributional example to credible claims reserving 95
  4.2 Exact Bayesian models 99
    4.2.1 Motivation 99
    4.2.2 Log-normal/Log-normal model 101
    4.2.3 Overdispersed Poisson model with gamma a priori distribution 108
    4.2.4 Exponential dispersion family with its associate conjugates 116
    4.2.5 Poisson-gamma case, revisited 125
  4.3 Bühlmann-Straub Credibility Model 126
    4.3.1 Parameter estimation 132
  4.4 Multidimensional credibility models 136
    4.4.1 Hachemeister regression model 137
    4.4.2 Other credibility models 140
  4.5 Kalman filter 142

5 Outlook 149

A Unallocated loss adjustment expenses 151
  A.1 Motivation 151
  A.2 Pure claims payments 152
  A.3 ULAE charges 153
  A.4 New York-method 153
  A.5 Example 157

B Distributions 159
  B.1 Discrete distributions 159
    B.1.1 Binomial distribution 159
    B.1.2 Poisson distribution 159
    B.1.3 Negative binomial distribution 160
  B.2 Continuous distributions 160
    B.2.1 Normal distribution 160
    B.2.2 Log-normal distribution 160
    B.2.3 Gamma distribution 161
    B.2.4 Beta distribution 162
Chapter 1

Introduction and Notation

1.1 Claims process

In this lecture we consider claims reserving for a branch of insurance called Non-Life Insurance. Sometimes, this branch is also called General Insurance (UK) or Property and Casualty Insurance (US). This branch usually contains all kinds of insurance products except life insurance products. This separation is mainly for two reasons:

1. Life insurance products are rather different from non-life insurance contracts, e.g. in the terms of a contract, the type of claims, etc. This implies that life and non-life products are modelled rather differently.

2. Moreover, in many countries, e.g. in Switzerland, there is a strict legal separation between life insurance and non-life insurance products. This means that a company for non-life insurance products is not allowed to sell life products, and on the other hand a life insurance company can, besides life products, only sell health and disability products. Every Swiss company which sells both life and non-life products has at least two legal entities.

The branch non-life insurance contains the following lines of business (LoB):

- Motor insurance (motor third party liability, motor hull)
- Property insurance (private and commercial property against fire, water, flooding, business interruption, etc.)
- Liability insurance (private and commercial liability, including directors and officers (D&O) liability insurance)
- Accident insurance (personal and collective accident, including compulsory accident insurance and workmen's compensation)
- Health insurance (private personal and collective health)
- Marine insurance (including transportation)
- Other insurance products, like aviation, travel insurance, legal protection, credit insurance, epidemic insurance, etc.

A non-life insurance policy is a contract between two parties, the insurer and the insured. It provides to the insurer a fixed amount of money (called premium), and to the insured a financial coverage against the random occurrence of well-specified events (or at least a promise that he gets a well-defined amount in case such an event happens). The right of the insured to these amounts (in case the event happens) constitutes a claim by the insured on the insurer.

The amount which the insurer is obliged to pay in respect of a claim is known as the claim amount or loss amount. The payments which make up this claim are known as claims payments, loss payments, paid claims, or paid losses.

[Figure 1.1: Typical time line of a non-life insurance claim — accident date, reporting date, claims payments, claims closing, possible reopening with further payments, and final closing.]

The history of a typical claim looks as in Figure 1.1. This means that usually the insurance company is not able to settle a claim immediately. This is mainly due to the following reasons:

1. Usually, there is a reporting delay (time-lag between claims occurrence and claims reporting to the insurer). The reporting of a claim can take several years, especially in liability insurance (e.g. asbestos or environmental pollution claims), see also Example 1.1.
2. After the reporting, it can take several years until a claim gets finally settled. In property insurance we usually have a rather fast settlement, whereas in liability or bodily injury claims it often takes a lot of time until the total degree of a claim is clear and known (and can be settled).

3. It can also happen that a closed claim needs to be reopened due to unexpected new developments, or in case a relapse happens.

1.1.1 Accounting principle and accident year

There are different premium accounting principles: (i) premium booked, (ii) premium written, (iii) premium earned. It depends on the kind of business written which principle should be chosen. W.l.o.g. we concentrate in the present manuscript on the premium earned principle:

Usually an insurance company closes its books at least once a year. Let us assume that we close our books always on December 31. How should we show a one-year contract which was written on October 1, 2006, with two premium installments paid on October 1, 2006 and April 1, 2007? We assume that

premium written 2006 = 100,
premium booked 2006 = 50 (= premium received in 2006),
pipeline premium 31.12.2006 = 50 (= premium which will be received in 2007),

which gives premium booked 2007 = 50. If we assume that the risk exposure is distributed uniformly over time (pro rata temporis), this implies that

premium earned 2006 = 25 (= premium used for exposure in 2006),
unearned premium reserve (UPR) 31.12.2006 = 75 (= premium which will be used for exposure in 2007),

which gives premium earned 2007 = 75. If the exposure is not pro rata temporis, then of course we have a different split of the premium earned into the different accounting years.

In order to have a consistent financial statement, it is now important that the accident date and the premium accounting principle are compatible via the exposure pattern. Hence all claims which have accident year 2006 have to be matched to the premium earned 2006, i.e.
the claims 2006 have to be paid by the premium earned 2006, whereas the claims with accident year later than 2006 have to be paid by the unearned premium reserve (UPR) 31.12.2006.
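The pro rata temporis split above can be sketched in a few lines; the numbers are the ones from the example, and a monthly granularity is assumed for simplicity (an exact day-count convention would give 91/365 ≈ 24.9 rather than exactly 25):

```python
# Pro rata temporis premium split for a one-year contract written on
# October 1, 2006 (example from the text, monthly granularity assumed).
premium_written_2006 = 100.0
months_on_risk_2006 = 3          # October, November, December 2006
term_months = 12                 # one-year contract

# Premium used for the exposure in 2006.
premium_earned_2006 = premium_written_2006 * months_on_risk_2006 / term_months

# Unearned premium reserve at 31.12.2006: used for the exposure in 2007.
upr_2006 = premium_written_2006 - premium_earned_2006
premium_earned_2007 = upr_2006
```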
Hence, on the one hand we have to build premium reserves for future exposures, but on the other hand we also need to build claims reserves for unsettled claims of past exposures. There are two different types of claims reserves for past exposures:

1. IBNyR reserves (incurred but not yet reported): We need to build claims reserves for claims which have occurred before 31.12.2006, but which have not been reported by the end of the year (i.e. the reporting delay lapses into the next accounting years).

2. IBNeR reserves (incurred but not enough reported): We need to build claims reserves for claims which have been reported before 31.12.2006, but which have not been settled yet, i.e. we still expect payments in the future, which need to be financed by the already earned premium.

Example 1.1 (Reporting delay)

accident   number of reported claims, non-cumulative, according to reporting delay
year       (reporting period)
             0    1    2    3    4    5    6    7    8    9   10
 0         368  191   28    8    6    5    3    1    0    0    1
 1         393  151   25    6    4    5    4    1    2    1    0
 2         517  185   29   17   11   10    8    1    0    0    1
 3         578  254   49   22   17    6    3    0    1    0    0
 4         622  206   39   16    3    7    0    1    0    0    0
 5         660  243   28   12   12    4    4    1    0    0    0
 6         666  234   53   10    8    4    6    1    0    0    0
 7         573  266   62   12    5    7    6    5    1    0    1
 8         582  281   32   27   12   13    6    2    1    0
 9         545  220   43   18   12    9    5    2    0
10         509  266   49   22   15    4    8    0
11         589  210   29   17   12    4    9
12         564  196   23   12    9    5
13         607  203   29    9    7
14         674  169   20   12
15         619  190   41
16         660  161
17         660

Table 1.1: claims development triangle for number of IBNyR cases, source [75]

1.1.2 Inflation

The following subsection on inflation follows Taylor [75]. Claims costs are often subject to inflation. Usually it is not the typical inflation, like salary or price inflation: inflation is very specific to the LoB chosen. For example, in the LoB accident, inflation is driven by medical inflation, whereas for
the LoB motor hull, inflation is driven by the technical complexity of car repairing techniques. The essential point is that claims inflation may continue beyond the occurrence date of the accident up to the point of its final payment/settlement.

If X_{t_i} denote the positive single claims payments at times t_i, expressed in money value at time t_1, then the total claim amount, in money value at time t_1, is given by

C^{(1)} = \sum_{i \ge 1} X_{t_i}.    (1.1)

If \lambda(t) denotes the index which measures the claims inflation, the actual (nominal) claim amount is

C = \sum_{i \ge 1} \frac{\lambda(t_i)}{\lambda(t_1)} X_{t_i}.    (1.2)

Whenever \lambda is an increasing function we observe that C is bigger than C^{(1)}. Of course, in practice we only observe the unindexed payments X_{t_i} \lambda(t_i)/\lambda(t_1), and in general it is difficult to estimate an index function such that we obtain indexed values X_{t_i}. Finding an index function \lambda is equivalent to defining appropriate deflators \varphi, which is a well-known concept in market-consistent actuarial valuation, see e.g. Wüthrich-Bühlmann-Furrer [91].

The basic idea behind indexed values C^{(1)} is that, if two sets of payments relate to identical circumstances except that there is a time translation in the payments, their indexed values will be the same, whereas the unindexed values are not the same: For c > 0 we assume that

X_{t_i + c} = X_{t_i}.    (1.3)

For \lambda increasing we then have

C^{(1)} = \sum_{i \ge 1} X_{t_i} = \sum_{i \ge 1} X_{t_i + c} = C^{(1)}_{+c},    (1.4)

but

C_{+c} = \sum_{i \ge 1} \frac{\lambda(t_i + c)}{\lambda(t_1)} X_{t_i + c} = \sum_{i \ge 1} \frac{\lambda(t_i + c)}{\lambda(t_1)} X_{t_i} > C,    (1.5)

whenever \lambda is an increasing function (we have assumed (1.3)). This means that the unindexed values differ by the factor \lambda(t_i + c)/\lambda(t_i). However, in practice this ratio often turns out to be of a different form, namely

\left( 1 + \psi(t_i, t_i + c) \right) \frac{\lambda(t_i + c)}{\lambda(t_i)},    (1.6)

meaning that over the time interval [t_i, t_i + c] claim costs are inflated by an additional factor 1 + \psi(t_i, t_i + c) above the natural inflation. This additional inflation is referred to as superimposed inflation and can be caused e.g.
by changes in the jurisdiction and an increased claims awareness of the insured. We will not discuss this further in the sequel.
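The difference between the indexed and the nominal claim amounts (1.1)-(1.2) can be illustrated with a small sketch; the payment dates, the amounts and the 5% index function are invented for illustration:

```python
# Hypothetical indexed payments X_{t_i} (in money value at time t_1), given
# as pairs (t_i - t_1, X_{t_i}); all numbers are invented for illustration.
payments = [(0.0, 1000.0), (1.0, 500.0), (2.0, 300.0)]

def index(t: float) -> float:
    """Assumed claims inflation index lambda(t), here 5% per year."""
    return 1.05 ** t

# Indexed total claim amount C^(1) as in (1.1), money value at time t_1.
c_indexed = sum(x for _, x in payments)

# Nominal total claim amount C as in (1.2): each payment is inflated
# from time t_1 to its actual payment date.
c_nominal = sum(index(t) / index(0.0) * x for t, x in payments)
```

For an increasing index function, c_nominal exceeds c_indexed, as stated in the text.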
1.2 Structural framework to the claims reserving problem

In this section we present a mathematical framework for claims reserving. For this purpose we follow Arjas [5]. Observe that in this subsection all actions of a claim are ordered according to their notification at the insurance company. From a statistical point of view this makes perfect sense; however, from an accounting point of view one should rather order the claims according to their occurrence/accident date. This has been done e.g. in Norberg [58, 59]. Of course, there is a one-to-one relation between the two concepts.

We assume that we have N claims within a fixed time period, with reporting dates T_1, ..., T_N (we assume that they are ordered, T_i \le T_{i+1} for all i). Fix the i-th claim. Then

T_i = T_{i,0} \le T_{i,1} \le ... \le T_{i,j} \le ... \le T_{i,N_i}

denotes the sequence of dates where some action on claim i is observed; at time T_{i,j} we have for example a payment, a new estimation of the claims adjuster, or other new information on claim i. T_{i,N_i} denotes the final settlement of the claim, and we assume that T_{i,N_i + k} = \infty for k \ge 1. We specify the events that take place at time T_{i,j} by

X_{i,j} = \begin{cases} \text{payment at time } T_{i,j} \text{ for claim } i, \\ 0, \text{ if there is no payment at time } T_{i,j}, \end{cases}    (1.7)

I_{i,j} = \begin{cases} \text{new information available at } T_{i,j} \text{ for claim } i, \\ \emptyset, \text{ if there is no new information at time } T_{i,j}. \end{cases}    (1.8)

We set X_{i,j} = 0 and I_{i,j} = \emptyset whenever T_{i,j} = \infty. With this structure we can define various interesting processes; moreover, our claims reserving problem splits into several subproblems. For every i we obtain a marked point process.

Payment process of claim i. (T_{i,j}, X_{i,j})_{j \ge 0} defines the following cumulative payment process

C_i(t) = \sum_{j: T_{i,j} \le t} X_{i,j}.    (1.9)

Moreover, C_i(t) = 0 for t < T_i. The total ultimate claim amount is given by

C_i = C_i(T_{i,N_i}) = \sum_{j \ge 0} X_{i,j}.    (1.10)

The total claims reserves for claim i at time t for the future liabilities (outstanding claim at time t) are given by

R_i(t) = C_i - C_i(t) = \sum_{j: T_{i,j} > t} X_{i,j}.    (1.11)
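The payment process (1.9)-(1.11) of a single claim can be sketched as follows; the event times and amounts are invented for illustration:

```python
# A single claim i, given as the marked point process (T_{i,j}, X_{i,j}):
# event times with the payment made at each event (0 if the event carries
# no payment, e.g. only new information). All numbers are invented.
events = [(1.0, 500.0), (1.5, 0.0), (2.0, 300.0), (3.5, 200.0)]

def cumulative_paid(t: float) -> float:
    """C_i(t) as in (1.9): all payments made up to and including time t."""
    return sum(x for time, x in events if time <= t)

# Ultimate claim amount C_i as in (1.10): all payments ever made.
ultimate = cumulative_paid(max(time for time, _ in events))

def outstanding(t: float) -> float:
    """R_i(t) as in (1.11): the payments still outstanding after time t."""
    return ultimate - cumulative_paid(t)
```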
Information process of claim i: (T_{i,j}, I_{i,j})_{j \ge 0}.

Settlement process of claim i: (T_{i,j}, I_{i,j}, X_{i,j})_{j \ge 0}.

We denote the aggregated processes of all claims i by

C(t) = \sum_{i=1}^{N} C_i(t),    R(t) = \sum_{i=1}^{N} R_i(t).    (1.12)

C(t) denotes all payments up to time t for all N claims, and R(t) denotes the outstanding claims payments (reserves) at time t for these N claims.

We now consider claims reserving as a prediction problem. Let

F_t^N = \sigma\left( \{ (T_{i,j}, I_{i,j}, X_{i,j})_{i \ge 1, j \ge 0} : T_{i,j} \le t \} \right)    (1.13)

be the information available at time t. This \sigma-field is obtained from the information available at time t from the claims settlement processes. Often there is additional exogenous information E_t at time t (change of legal practice, high inflation, job market information, etc.). Therefore we define the information which the insurance company has at time t by

F_t = \sigma\left( F_t^N \cup E_t \right).    (1.14)

Problem. Estimate the conditional distribution

\mu_t = P[ C \in \cdot \mid F_t ],    (1.15)

where C = \sum_{i=1}^{N} C_i denotes the total ultimate claim amount, with the first two moments

M_t = E[ C \mid F_t ],    (1.16)

V_t = Var( C \mid F_t ).    (1.17)

1.2.1 Fundamental properties of the reserving process

Because of

C = C(t) + R(t),    (1.18)

we have that

M_t = C(t) + E[ R(t) \mid F_t ] \stackrel{def.}{=} C(t) + m_t,    (1.19)

V_t = Var( R(t) \mid F_t ).    (1.20)
Lemma 1.2 (M_t)_t is an F_t-martingale, i.e. for t > s we have that

E[ M_t \mid F_s ] = M_s,  a.s.    (1.21)

Proof. The proof is clear (successive forecasts).

Lemma 1.3 The variance process (V_t)_t is an F_t-supermartingale, i.e. for t > s we have that

E[ V_t \mid F_s ] \le V_s,  a.s.    (1.22)

Proof. Using Jensen's inequality, for t > s we have a.s. that

E[ V_t \mid F_s ] = E[ Var( C \mid F_t ) \mid F_s ]    (1.23)
                 = E[ E[ C^2 \mid F_t ] \mid F_s ] - E[ E[ C \mid F_t ]^2 \mid F_s ]
                 \le E[ C^2 \mid F_s ] - E[ E[ C \mid F_t ] \mid F_s ]^2
                 = Var( C \mid F_s ) = V_s.

Consider u > t and define the increment from t to u by

M(t, u) = M_u - M_t.    (1.24)

Then, a.s., we have that

E[ M(t, u) M(u, \infty) \mid F_t ] = E[ M(t, u) E[ M(u, \infty) \mid F_u ] \mid F_t ]    (1.25)
                                   = E[ M(t, u) ( E[ C \mid F_u ] - M_u ) \mid F_t ] = 0.

This implies that M(t, u) and M(u, \infty) are uncorrelated, which is the well-known property that martingales have uncorrelated increments.

First approach to the claims reserving problem. Use the martingale integral representation. This leads to the innovation gains process, which determines M_t when updating F_t. Although this theory is well-understood, it is of limited practical use here:

- One has little idea about the updating process.
- One has statistically not enough data.
Second approach to the claims reserving problem. For t < u we have F_t \subset F_u. Since (M_t)_t is an F_t-martingale, we have that

E[ M(t, u) \mid F_t ] = 0  a.s.    (1.26)

We define the incremental payments between t and u by

X(t, u) = C(u) - C(t).    (1.27)

Hence we have that

M(t, u) = E[ C \mid F_u ] - E[ C \mid F_t ]
        = C(u) + E[ R(u) \mid F_u ] - \left( C(t) + E[ R(t) \mid F_t ] \right)    (1.28)
        = X(t, u) + E[ R(u) \mid F_u ] - E[ C(u) - C(t) + R(u) \mid F_t ]
        = X(t, u) - E[ X(t, u) \mid F_t ] + E[ R(u) \mid F_u ] - E[ R(u) \mid F_t ].

Henceforth we have the following two terms:

1. prediction error for payments within (t, t+1]:

X(t, t+1) - E[ X(t, t+1) \mid F_t ];    (1.29)

2. prediction error of reserves R(t+1) when updating information:

E[ R(t+1) \mid F_{t+1} ] - E[ R(t+1) \mid F_t ].    (1.30)

1.2.2 Known and unknown claims

As in Subsection 1.1.1 we distinguish IBNyR (incurred but not yet reported) claims and reported claims. The following process counts the number of reported claims:

N_t = \sum_{i \ge 1} 1_{\{T_i \le t\}}.    (1.31)

Hence we can split the ultimate claim and the reserves at time t with respect to whether we have a reported or an IBNyR claim:

R(t) = \sum_i R_i(t) 1_{\{T_i \le t\}} + \sum_i R_i(t) 1_{\{T_i > t\}},    (1.32)

where

\sum_i R_i(t) 1_{\{T_i \le t\}}: reserves for claims reported at time t,    (1.33)

\sum_i R_i(t) 1_{\{T_i > t\}}: reserves for claims IBNyR at time t.    (1.34)
Hence we define

R^{rep}(t) = E\left[ \sum_i R_i(t) 1_{\{T_i \le t\}} \,\Big|\, F_t \right] = E\left[ \sum_{i=1}^{N_t} R_i(t) \,\Big|\, F_t \right],    (1.35)

R^{IBNyR}(t) = E\left[ \sum_i R_i(t) 1_{\{T_i > t\}} \,\Big|\, F_t \right] = E\left[ \sum_{i=N_t+1}^{N} R_i(t) \,\Big|\, F_t \right],    (1.36)

where N is the total (random) number of claims. Hence we easily see that

R^{rep}(t) = \sum_{i \le N_t} E[ R_i(t) \mid F_t ],    (1.37)

R^{IBNyR}(t) = E\left[ \sum_{i=N_t+1}^{N} R_i(t) \,\Big|\, F_t \right].    (1.38)

R^{rep}(t) denotes the expected future payments at time t for reported claims. This is often called the best estimate reserves at time t for reported claims. R^{IBNyR}(t) are the expected future payments at time t for IBNyR claims, or best estimate reserves for IBNyR claims.

Conclusions. (1.37)-(1.38) show that the reserves for reported claims and the reserves for IBNyR claims are of rather different nature:

(i) The reserves for reported claims should be determined individually, i.e. on a single claims basis. Often one has quite a lot of information on reported claims (e.g. case estimates), which asks for an estimate on single claims.

(ii) The reserves for IBNyR claims cannot be decoupled, due to the fact that N is not known at time t (see (1.36)). Moreover, we have no information on a single claims basis. This shows that IBNyR reserves should be determined on a collective basis.

Unfortunately, most of the classical claims reserving methods do not distinguish reported claims from IBNyR claims, i.e. they estimate the claims reserves at the same time for both classes. In that context we have to slightly disappoint the reader, because most methods presented in this manuscript do not make this distinction either.

1.3 Outstanding loss liabilities, classical notation

In this subsection we introduce the classical claims reserving notation and terminology. In most cases outstanding loss liabilities are estimated in so-called claims development triangles, which separate claims on two time axes.
In the sequel we always denote by

i = accident year (year of occurrence),    (1.39)

j = development year (development period).    (1.40)

For illustrative purposes we assume that X_{i,j} denotes all payments in development period j for claims with accident year i, i.e. this corresponds to the incremental claims payments for claims with accident year i done in accounting year i + j. Below, we will see which other meanings X_{i,j} can have.

In a claims development triangle, accident years are usually on the vertical axis whereas development periods are on the horizontal axis (see also Table 1.1). Usually the loss development tables split into two parts: the upper triangle/trapezoid, where we have observations, and the lower triangle, where we want to estimate the outstanding payments. On the diagonals we always see the accounting years.

[Schematic claims development triangle: rows are accident years i = 0, ..., I; columns are development years j = 0, ..., J; the upper left triangle/trapezoid {i + j \le I} contains the realizations of the random variables C_{i,j}, X_{i,j} (observations), the lower right triangle {i + j > I} the values C_{i,j}, X_{i,j} to be predicted.]

Data can be shown in cumulative form or in non-cumulative (incremental) form. Incremental data are always denoted by X_{i,j}, and cumulative data are given by

C_{i,j} = \sum_{k=0}^{j} X_{i,k}.    (1.41)

The incremental data X_{i,j} may denote the incremental payments in cell (i, j), the number of reported claims with reporting delay j and accident year i, or the change of reported claim amount in cell (i, j). For cumulative data C_{i,j} we often use the terminology cumulative payments, total number of reported claims, or
claims incurred (for cumulative reported claims). C_{i,J} is often called the ultimate claim amount (load) of accident year i, or the total number of claims in year i.

X_{i,j}: incremental payments                     C_{i,j}: cumulative payments
X_{i,j}: number of reported claims with delay j   C_{i,j}: total number of reported claims
X_{i,j}: change of reported claim amount          C_{i,j}: claims incurred

Usually we have observations D_I = \{ X_{i,j};\ i + j \le I \} in the upper trapezoid, and D_I^c = \{ X_{i,j};\ i + j > I \} needs to be estimated. The payments in a single accounting year are

X_k = \sum_{i+j=k} X_{i,j},    (1.42)

i.e. the payments in the (k+1)-st diagonal. If X_{i,j} denote incremental payments, then the outstanding loss liabilities for accident year i at time j are given by

R_{i,j} = \sum_{k=j+1}^{J} X_{i,k} = C_{i,J} - C_{i,j}.    (1.43)

R_{i,j} are also called claims reserves; this is essentially the amount we have to estimate (lower triangle) so that, together with the past payments C_{i,j}, we obtain the whole claims load (ultimate claim) for accident year i.

1.4 General Remarks

If we consider loss reserving models, i.e. models which estimate the total claim amount, there are always several possibilities to do so:

- Cumulative or incremental data
- Payments or claims incurred data
- Split small and large claims
- Indexed or unindexed data
- Number of claims and claims averages
- Etc.
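The triangle notation of Section 1.3 can be made concrete in a few lines; the small incremental triangle below is invented for illustration:

```python
# Incremental triangle X_{i,j}: row i is the accident year, column j the
# development year; the numbers are invented for illustration.
X = [
    [100, 50, 10],
    [110, 60],
    [120],
]

# Cumulative data as in (1.41): C_{i,j} = X_{i,0} + ... + X_{i,j}.
C = [[sum(row[: j + 1]) for j in range(len(row))] for row in X]

# Payments in a single accounting year k as in (1.42): the sum over the
# (k+1)-st diagonal i + j = k of the triangle.
I = len(X) - 1
diagonal = [
    sum(X[i][j] for i in range(len(X)) for j in range(len(X[i])) if i + j == k)
    for k in range(I + 1)
]
```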
Usually, different methods and differently aggregated data sets lead to very different results. Only an experienced reserving actuary is able to tell which is an accurate/good estimate for the future liabilities of a specific data set. Often there are so many phenomena in the data which first need to be understood before applying a method; we cannot simply project the past to the future by applying one model.

With this in mind we describe different methods, but only practical experience will tell which method should be applied in which situation. That is, the focus of this manuscript lies on the mathematical description of stochastic models. We derive various properties of these models. The question of an appropriate model choice for a specific data set is not treated here. Indeed, this is probably one of the most difficult questions. Moreover, there is only very little literature on this topic; e.g. for the chain-ladder method certain aspects are considered in Barnett-Zehnwirth [7] and Venter [77].

Remark on claims figures. When we speak about claims development triangles (paid or incurred data), these usually contain loss adjustment expenses which can be allocated/attributed to single claims and are therefore contained in the claims figures. Such expenses are called allocated loss adjustment expenses (ALAE). These are typically expenses for external lawyers, an external expertise, etc. Internal loss adjustment expenses (income of the claims handling department, maintenance of the claims handling system, management fees, etc.) are typically not contained in the claims figures and therefore have to be estimated separately. These costs are called unallocated loss adjustment expenses (ULAE). In the appendix we describe the New York-method (paid-to-paid method), which serves to estimate ULAE. The New York-method is a rather rough method which only works well in stationary situations.
Therefore one could think of more sophisticated methods. However, since ULAE are usually rather small compared to the other claims payments, the New York-method is often sufficient in practical applications.
Chapter 2

Basic Methods

We start the general discussion on claims reserving with three standard methods:

1. Chain-ladder method
2. Bornhuetter-Ferguson method
3. Poisson model for claim counts

This short chapter serves, on the one hand, illustrative purposes, giving some ideas of how one can tackle the problem; it presents the two easiest methods (chain-ladder and Bornhuetter-Ferguson). On the other hand, one should realize that in practice these are the methods which are used most often, due to their simplicity. The chain-ladder method will be discussed in detail in Chapter 3, the Bornhuetter-Ferguson method in Chapter 4.

We assume that the last development period is given by J, i.e. X_{i,j} = 0 for j > J, and that the last observed accident year is given by I (of course we assume J \le I).

2.1 Chain-ladder model (distribution free model)

The chain-ladder model is probably the most popular loss reserving technique. We give different derivations of the chain-ladder model. In this section we give a distribution-free derivation (see Mack [49]). The conditional prediction error of the chain-ladder model will be treated in Chapter 3.

The classical actuarial literature often explains the chain-ladder method as a pure computational algorithm to estimate claims reserves. It was only much later that actuaries started to think about stochastic models which generate the chain-ladder algorithm. The first who came up with a full stochastic model for the chain-ladder method was Mack [49]. In 1993, Mack [49] published one of the most famous
articles in claims reserving, on the calculation of the standard error in the chain-ladder model.

Model Assumptions 2.1 (Chain-ladder model)

There exist development factors f_0, ..., f_{J-1} > 0 such that for all 0 \le i \le I and all 1 \le j \le J we have that

E[ C_{i,j} \mid C_{i,0}, ..., C_{i,j-1} ] = E[ C_{i,j} \mid C_{i,j-1} ] = f_{j-1} \, C_{i,j-1},    (2.1)

and different accident years i are independent.

Remarks 2.2

- We assume independence of the accident years. We will see below that this assumption is made in almost all of the methods. It means that we have already eliminated accounting year effects in the data.

- In addition we could also make stronger assumptions for the sequences C_{i,0}, C_{i,1}, ..., namely that they form Markov chains. Moreover, observe that

C_{i,j} \prod_{l=0}^{j-1} f_l^{-1}    (2.2)

forms a martingale in j \ge 0.

- The factors f_j are called development factors, chain-ladder factors or age-to-age factors. They are the central object of interest in the chain-ladder method.

Lemma 2.3 Let D_I = \{ C_{i,j};\ i + j \le I,\ 0 \le j \le J \} be the set of observations (upper trapezoid). Under Model Assumptions 2.1 we have for all I - J + 1 \le i \le I that

E[ C_{i,J} \mid D_I ] = E[ C_{i,J} \mid C_{i,I-i} ] = C_{i,I-i} \, f_{I-i} \cdots f_{J-1}.    (2.3)

Proof. This is an exercise using conditional expectations:

E[ C_{i,J} \mid C_{i,I-i} ] = E[ C_{i,J} \mid D_I ] = E[ E[ C_{i,J} \mid C_{i,J-1} ] \mid D_I ]    (2.4)
                            = f_{J-1} \, E[ C_{i,J-1} \mid D_I ].

If we iterate this procedure until we reach the diagonal i + j = I, we obtain the claim.
Lemma 2.3 gives an algorithm for estimating the expected ultimate claim, given the observations D_I. This algorithm is often called the recursive algorithm. For known chain-ladder factors f_j we estimate the expected outstanding claims liabilities of accident year i, based on D_I, by

E[ C_{i,J} \mid D_I ] - C_{i,I-i} = C_{i,I-i} \left( f_{I-i} \cdots f_{J-1} - 1 \right).    (2.5)

This corresponds to the best estimate reserves for accident year i at time I, based on the information D_I. Unfortunately, in most practical applications the chain-ladder factors are not known and need to be estimated. We define

j(i) = \min\{ J, I - i \}  and  i(j) = I - j,    (2.6)

which denote the last observed indices on the diagonal. The age-to-age factors f_{j-1} are estimated as follows:

\hat{f}_{j-1} = \frac{ \sum_{k=0}^{i(j)} C_{k,j} }{ \sum_{k=0}^{i(j)} C_{k,j-1} }.    (2.7)

Estimator 2.4 (Chain-ladder estimator) The CL estimator for E[ C_{i,j} \mid D_I ] is given by

\hat{C}_{i,j}^{CL} = \hat{E}[ C_{i,j} \mid D_I ] = C_{i,I-i} \, \hat{f}_{I-i} \cdots \hat{f}_{j-1}    (2.8)

for i + j > I. We define (see also Table 2.1)

B_k = \{ C_{i,j};\ i + j \le I,\ 0 \le j \le k \} \subset D_I.    (2.9)

In fact, we have B_J = D_I, which is the set of all observations at time I.

Lemma 2.5 Under Model Assumptions 2.1 we have that:

a) \hat{f}_j is, given B_j, an unbiased estimator for f_j, i.e. E[ \hat{f}_j \mid B_j ] = f_j,

b) \hat{f}_j is (unconditionally) unbiased for f_j, i.e. E[ \hat{f}_j ] = f_j,

c) \hat{f}_0, ..., \hat{f}_{J-1} are uncorrelated, i.e. E[ \hat{f}_0 \cdots \hat{f}_{J-1} ] = E[ \hat{f}_0 ] \cdots E[ \hat{f}_{J-1} ],

d) \hat{C}_{i,J}^{CL} is, given C_{i,I-i}, an unbiased estimator for E[ C_{i,J} \mid D_I ] = E[ C_{i,J} \mid C_{i,I-i} ], i.e. E[ \hat{C}_{i,J}^{CL} \mid C_{i,I-i} ] = E[ C_{i,J} \mid D_I ], and
[Table 2.1: The set B_3, i.e. the observations C_{i,j} of the claims development triangle of Table 1.1 with development periods j \le 3.]

e) \hat{C}_{i,J}^{CL} is (unconditionally) unbiased for E[ C_{i,J} ], i.e. E[ \hat{C}_{i,J}^{CL} ] = E[ C_{i,J} ].

At first sight, the uncorrelatedness of the \hat{f}_j is surprising, since neighboring estimators of the age-to-age factors depend on the same data.

Proof of Lemma 2.5.

a) We have

E[ \hat{f}_{j-1} \mid B_{j-1} ] = \frac{ \sum_{k=0}^{i(j)} E[ C_{k,j} \mid B_{j-1} ] }{ \sum_{k=0}^{i(j)} C_{k,j-1} } = \frac{ f_{j-1} \sum_{k=0}^{i(j)} C_{k,j-1} }{ \sum_{k=0}^{i(j)} C_{k,j-1} } = f_{j-1}.    (2.10)

This immediately implies the conditional unbiasedness.

b) Follows immediately from a).

c) For the uncorrelatedness of the estimators we have, for j < k,

E[ \hat{f}_j \hat{f}_k ] = E[ E[ \hat{f}_j \hat{f}_k \mid B_k ] ] = E[ \hat{f}_j \, E[ \hat{f}_k \mid B_k ] ] = E[ \hat{f}_j \, f_k ] = f_j \, f_k,    (2.11)

which implies the claim.

d) For the unbiasedness of the chain-ladder estimator we have

E[ \hat{C}_{i,J}^{CL} \mid C_{i,I-i} ] = E[ C_{i,I-i} \, \hat{f}_{I-i} \cdots \hat{f}_{J-1} \mid C_{i,I-i} ]
                                      = E[ C_{i,I-i} \, \hat{f}_{I-i} \cdots \hat{f}_{J-2} \, E[ \hat{f}_{J-1} \mid B_{J-1} ] \mid C_{i,I-i} ]    (2.12)
                                      = f_{J-1} \, E[ \hat{C}_{i,J-1}^{CL} \mid C_{i,I-i} ].

Iteration of this procedure leads to

E[ \hat{C}_{i,J}^{CL} \mid C_{i,I-i} ] = C_{i,I-i} \, f_{I-i} \cdots f_{J-1} = E[ C_{i,J} \mid D_I ].    (2.13)
e) Follows immediately from d).

This finishes the proof of the lemma.

Remarks 2.6

- Observe that we have proved in Lemma 2.5 that the estimators \hat{f}_j are uncorrelated. But pay attention to the fact that they are not independent. In fact, the squares of two successive estimators \hat{f}_j and \hat{f}_{j+1} are negatively correlated (see also Lemma 3.8 below). It is this negative correlation which will lead to quite some discussion about the estimation errors of our parameter estimates.

- Observe that Lemma 2.5 d) shows that we obtain unbiased estimators for the best estimate reserves E[ C_{i,J} \mid D_I ].

Let us finish this section with an example.

Example 2.7 (Chain-ladder method)
26 Chapter 2. Basic Methods 0 1 2 3 4 5 6 7 8 9 0 5 946 975 9 668 212 10 563 929 10 771 690 10 978 394 11 040 518 11 106 331 11 121 181 11 132 310 11 148 124 1 6 346 756 9 593 162 10 316 383 10 468 180 10 536 004 10 572 608 10 625 360 10 636 546 10 648 192 2 6 269 090 9 245 313 10 092 366 10 355 134 10 507 837 10 573 282 10 626 827 10 635 751 3 5 863 015 8 546 239 9 268 771 9 459 424 9 592 399 9 680 740 9 724 068 4 5 778 885 8 524 114 9 178 009 9 451 404 9 681 692 9 786 916 5 6 184 793 9 013 132 9 585 897 9 830 796 9 935 753 6 5 600 184 8 493 391 9 056 505 9 282 022 7 5 288 066 7 728 169 8 256 211 8 5 290 793 7 648 729 9 5 675 568 f b j 1.4925 1.0778 1.0229 1.0148 1.0070 1.0051 1.0011 1.0010 1.0014 Table 2.2: Observed historical cumulative payments Ci,j and estimated chain-ladder factors fj 0 1 2 3 4 5 6 7 8 9 Reserves 0 1 10 663 318 15 126 2 10 646 884 10 662 008 26 257 3 9 734 574 9 744 764 9 758 606 34 538 4 9 837 277 9 847 906 9 858 214 9 872 218 85 302 5 10 005 044 10 056 528 10 067 393 10 077 931 10 092 247 156 494 6 9 419 776 9 485 469 9 534 279 9 544 580 9 554 571 9 568 143 286 121 7 8 445 057 8 570 389 8 630 159 8 674 568 8 683 940 8 693 030 8 705 378 449 167 8 8 243 496 8 432 051 8 557 190 8 616 868 8 661 208 8 670 566 8 679 642 8 691 971 1 043 242 9 8 470 989 9 129 696 9 338 521 9 477 113 9 543 206 9 592 313 9 602 676 9 612 728 9 626 383 3 950 815 Total 6 047 061 Table 2.3: Estimated cumulative chain-ladder payments Ĉi,j CL and estimated chain-ladder reserves Ĉi,J CL Ci,I i
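The chain-ladder computation behind Tables 2.2-2.3, i.e. the age-to-age factors (2.7) and the ultimate estimates (2.8), can be sketched in code. This is a minimal illustrative helper, not from the book; the function name and the small triangle used below are hypothetical, and the triangle is given as a list of rows C_{i,0}, ..., C_{i,I-i} of cumulative claims.

```python
# Hedged sketch of Estimator 2.4: chain-ladder factors (2.7) and ultimates (2.8).

def chain_ladder(triangle):
    """triangle: list of rows [C_{i,0}, ..., C_{i,I-i}] of cumulative claims."""
    J = len(triangle[0]) - 1
    # f_hat[j] = sum_i C_{i,j+1} / sum_i C_{i,j}, summing over the accident
    # years for which both development years j and j+1 are observed.
    f_hat = []
    for j in range(J):
        num = sum(row[j + 1] for row in triangle if len(row) > j + 1)
        den = sum(row[j] for row in triangle if len(row) > j + 1)
        f_hat.append(num / den)
    # Ultimate estimate (2.8): C_{i,I-i} * f_{I-i} * ... * f_{J-1}.
    ultimates = []
    for row in triangle:
        c = row[-1]
        for j in range(len(row) - 1, J):
            c *= f_hat[j]
        ultimates.append(c)
    reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
    return f_hat, ultimates, reserves
```

Applied to the book's Table 2.2 this reproduces the factors f̂_j and the reserves of Table 2.3; the code itself carries no distributional assumptions, exactly as in the distribution-free model.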
2.2 The Bornhuetter-Ferguson method

The Bornhuetter-Ferguson method is in general a very robust method, since it is not sensitive to outliers in the observations. We will further comment on this in Chapter 4. The method goes back to Bornhuetter-Ferguson [10], who published it in 1972 in an article called "The Actuary and IBNR". The Bornhuetter-Ferguson method is usually understood as a pure algorithm to estimate reserves (this is also the way it was published in [10]). There are several possibilities to define an appropriate underlying stochastic model which motivates the BF method. Straightforward are for example the following assumptions:

Model Assumptions 2.8
Different accident years i are independent. There exist parameters µ_0, ..., µ_I > 0 and a pattern β_0, ..., β_J > 0 with β_J = 1 such that for all i ∈ {0, ..., I}, j ∈ {0, ..., J-1} and k ∈ {1, ..., J-j}

E[C_{i,0}] = µ_i β_0,
E[C_{i,j+k} | C_{i,0}, ..., C_{i,j}] = C_{i,j} + µ_i (β_{j+k} - β_j).   (2.14)

Then we have E[C_{i,j}] = µ_i β_j and, in particular, E[C_{i,J}] = µ_i. The sequence (β_j)_j denotes the claims development pattern. If the C_{i,j} are cumulative payments, then β_j is the expected (cumulative) cashflow pattern, also called payout pattern. Such a pattern is often used when one needs to build market-consistent/discounted reserves, where time values differ over time (see also Subsection 1.1.2 on inflation).

From this discussion we see that Model Assumptions 2.8 imply the following model assumptions.

Model Assumptions 2.9
Different accident years i are independent. There exist parameters µ_0, ..., µ_I > 0 and a pattern β_0, ..., β_J > 0 with β_J = 1 such that for all i ∈ {0, ..., I} and j ∈ {0, ..., J}

E[C_{i,j}] = µ_i β_j.   (2.15)
28 Chapter 2. Basic Methods Often the Bornhuetter-Ferguson method is explained with the help of Model Assumptions 2.9 see e.g. Radtke-Schmidt [63], pages 37ff.. However, with Model Assumptions 2.9 we face some difficulties: Observe that E [C i,j D I ] = E[C i,j C i,0,..., C i,i i ] 2.16 = C i,i i + E [C i,j C i,i i C i,0,..., C i,i i ]. If we have no additional assumptions, we do not exactly know, what we should do with this last term. If we would know that the incremental payment C i,j C i,i i is independent from C i,0,..., C i,i i then this would imply that E [C i,j D I ] = E[C i,j C i,0,..., C i,i i ] which also comes out of Model Assumptions 2.8. = C i,i i + 1 β I i µ i, 2.17 In both model assumptions it remains to estimate the last term in 2.16-2.17. In the Bornhuetter-Ferguson method this is done as follows Estimator 2.10 Bornhuetter-Ferguson estimator The BF estimator is given by BF Ĉ i,j = Ê [C i,j D I ] = C i,i i + 1 β I i µ i 2.18 for I J + 1 i I, where β I i is an estimate for β I i and µ i is an a priori estimate for E[C i,j ]. Comparison of Bornhuetter-Ferguson and chain-ladder estimator. From the Chain-ladder Assumptions 2.1 we have that which implies j 1 E[C i,j ] = E [E [C i,j C i,j 1 ]] = f j 1 E[C i,j 1 ] = E[C i,0 ] f k, 2.19 J 1 E[C i,j ] = E[C i,0 ] f k, 2.20 k=0 k=j k=0 J 1 E[C i,j ] = f 1 k E[C i,j ]. 2.21 If we compare this to the Bornhuetter-Ferguson model Model Assumptions 2.9 E [C i,j ] = β j µ i we find that J 1 f 1 k plays the role of β j, 2.22 k=j
since ∏_{k=j}^{J-1} f_k^{-1} describes the proportion of µ_i = E[C_{i,J}] already paid after j development periods in the chain-ladder model. Therefore the two quantities in (2.22) are often identified. This can be done under Model Assumptions 2.9, but not under Model Assumptions 2.8, since Model Assumptions 2.8 are not implied by the chain-ladder assumptions, nor vice versa. I.e. if one knows the chain-ladder factors f_j, one constructs a development pattern (β_k)_k using the identity in (2.22), and vice versa. Then the Bornhuetter-Ferguson estimator can be rewritten as follows:

Ĉ^BF_{i,J} = C_{i,I-i} + (1 - ∏_{j=I-i}^{J-1} f̂_j^{-1}) µ̂_i.   (2.23)

On the other hand, we have for the chain-ladder estimator that

Ĉ^CL_{i,J} = C_{i,I-i} ∏_{j=I-i}^{J-1} f̂_j
          = C_{i,I-i} + (∏_{j=I-i}^{J-1} f̂_j - 1) C_{i,I-i}
          = C_{i,I-i} + (1 - ∏_{j=I-i}^{J-1} f̂_j^{-1}) Ĉ^CL_{i,J}.   (2.24)

Hence the difference between the Bornhuetter-Ferguson method and the chain-ladder method is that in the Bornhuetter-Ferguson method we completely believe in our a priori estimate µ̂_i, whereas in the chain-ladder method the a priori estimate is replaced by the estimate Ĉ^CL_{i,J}, which comes entirely from the observations.

Parameter estimation.

For µ_i we need an a priori estimate µ̂_i. This is often a plan value from a strategic business plan. It is determined before one has any observations, i.e. it is a pure a priori estimate.

If one applies the Bornhuetter-Ferguson method strictly, one should also use an a priori estimate for the still-to-come factor 1 - β_{I-i}, determined independently of the observations. In most practical applications one leaves the path of the pure Bornhuetter-Ferguson method at this point and estimates the still-to-come factor from the data with the chain-ladder estimators: If f̂_k
30 Chapter 2. Basic Methods denote the chain-ladder estimators 2.7 see also 2.22, then we set β CL j = β J 1 1 1 j = J 1 k=j f =. 2.25 k f k In that case the Bornhuetter-Ferguson method and the chain-ladder method differ only in the choice of the estimator for the ultimate claim, i.e. a priori estimate vs. chain-ladder estimate see 2.23 and 2.24. Example 2.11 Bornhuetter-Ferguson method We revisit the example given in Table 2.2 see Example 2.7. a priori estimator estimator BF CL estimate bµ i β bcl I i BF dc i,j CL dc i,j reserves reserves 0 11 653 101 100.0% 11 148 124 11 148 124 1 11 367 306 99.9% 10 664 316 10 663 318 16 124 15 126 2 10 962 965 99.8% 10 662 749 10 662 008 26 998 26 257 3 10 616 762 99.6% 9 761 643 9 758 606 37 575 34 538 4 11 044 881 99.1% 9 882 350 9 872 218 95 434 85 302 5 11 480 700 98.4% 10 113 777 10 092 247 178 024 156 494 6 11 413 572 97.0% 9 623 328 9 568 143 341 305 286 121 7 11 126 527 94.8% 8 830 301 8 705 378 574 089 449 167 8 10 986 548 88.0% 8 967 375 8 691 971 1 318 646 1 043 242 9 11 618 437 59.0% 10 443 953 9 626 383 4 768 384 3 950 815 k=j Total 7 356 580 6 047 061 Table 2.4: Claims reserves from the Bornhuetter-Ferguson method and the chainladder method We already see in this example, that using different methods can lead to substantial differences in the claims reserves. 2.3 Number of IBNyR claims, Poisson model We close this chapter with the Poisson model, which is mainly used for claim counts. The remarkable thing in the Poisson model is, that it leads to the same reserves as the chain-ladder model see Lemma 2.16. It was Mack [48], Appendix A, who has first proved that the chain-ladder reserves as maximum likelihood reserves for the Poisson model. Model Assumptions 2.12 Poisson model There exist parameters µ 0,..., µ I > 0 and γ 0,..., γ J > 0 such that the incremental quantities X i,j are independent Poisson distributed with E[X i,j ] = µ i γ j, 2.26
Chapter 2. Basic Methods 31 for all i I and j J, and J j=0 γ j = 1. For the definition of the Poisson distribution we refer to the appendix, Section B.1.2. The cumulative quantity in accident year i, C i,j, is again Poisson-distributed with E[C i,j ] = µ i. 2.27 Hence, µ i is a parameter that stands for the expected number of claims in accident year i exposure, whereas γ j defines an expected reporting/cashflow pattern over the different development periods j. Moreover we have which is independent of i. E[X i,j ] E[X i,0 ] = γ j γ 0, 2.28 Lemma 2.13 The Poisson model satisfies the Model Assumptions 2.8. Proof. The independence of different accident years follows from the independence of X i,j. Moreover, we have that E [C i,0 ] = E [X i,0 ] = µ i β 0 with β 0 = γ 0 and E [C i,j+k C i,0,..., C i,j ] = C i,j + = C i,j + µ i k E [X i,j+l C i,0,..., C i,j ] 2.29 l=1 k γ j+l = C i,j + µ i β j+k β j, l=1 with β j = j l=0 γ j. This finishes the proof. To estimate the parameters µ i i and γ j j there are different methods, one possibility is to use the maximum likelihood estimators: The likelihood function for D I = {C i,j ; i + j I, j J}, the σ-algebra generated by D I is the same as the one generated by {X i,j ; i + j I, j J}, is given by L DI µ 0,..., µ I, γ 0,..., γ J = i+j I e µ iγj µ iγ j X i,j X i,j!. 2.30 We maximize this log-likelihood function by setting its I +J +2 partial derivatives w.r.t. the unknown parameters µ j and γ j equal to zero. Thus, we obtain on D I that I i J j=0 I j µ i γ j = µ i γ j = I i J j=0 X i,j = C i,i i J, 2.31 I j X i,j, 2.32
32 Chapter 2. Basic Methods for all i {0,..., I} and all j {0,..., J} under the constraint that γ j = 1. This system has a unique solution and gives us the ML estimates for µ i and γ j. Estimator 2.14 Poisson ML estimator The ML estimator in the Poisson Model 2.12 is for i + j > I given by X i,j P oi Ĉ i,j P oi = Ê[X i,j] = µ i γ j, 2.33 = Ê [C i,j D I ] = C i,i i + J P oi X i,j. 2.34 j=i i+1 Observe that P oi I i Ĉ i,j = Ci,I i + 1 γ j µ i, 2.35 hence the Poisson ML estimator has the same form as the BF Estimator 2.10. However, here we use estimates for µ i and γ j that depend on the data. Example 2.15 Poisson ML estimator We revisit the example given in Table 2.2 see Example 2.7. j=0
Chapter 2. Basic Methods 33 0 1 2 3 4 5 6 7 8 9 0 5 946 975 3 721 237 895 717 207 760 206 704 62 124 65 813 14 850 11 130 15 813 1 6 346 756 3 246 406 723 222 151 797 67 824 36 603 52 752 11 186 11 646 2 6 269 090 2 976 223 847 053 262 768 152 703 65 444 53 545 8 924 3 5 863 015 2 683 224 722 532 190 653 132 976 88 340 43 329 4 5 778 885 2 745 229 653 894 273 395 230 288 105 224 5 6 184 793 2 828 338 572 765 244 899 104 957 6 5 600 184 2 893 207 563 114 225 517 7 5 288 066 2 440 103 528 043 8 5 290 793 2 357 936 9 5 675 568 Table 2.5: Observed historical incremental payments Xi,j 0 1 2 3 4 5 6 7 8 9 bµi estimated reserves 0 11 148 124 1 15126 10 663 318 15 126 2 11133 15124 10 662 008 26 257 3 10506 10190 13842 9 758 606 34 538 4 50361 10628 10308 14004 9 872 218 85 302 5 69291 51484 10865 10538 14316 10 092 247 156 494 6 137754 65693 48810 10301 9991 13572 9 568 143 286 121 7 188846 125332 59769 44409 9372 9090 12348 8 705 378 449 167 8 594767 188555 125139 59677 44341 9358 9076 12329 8 691 972 1 043 242 9 2795422 658707 208825 138592 66093 49107 10364 10052 13655 9 626 383 3 950 815 bγj 58.96% 29.04% 6.84% 2.17% 1.44% 0.69% 0.51% 0.11% 0.10% 0.14% 6 047 061 Table 2.6: Estimated µi, γj, incremental payments Xi,j P oi and Poisson reserves
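As noted after (2.35), the Poisson ML estimator has the Bornhuetter-Ferguson form. The generic BF computation, i.e. the pattern (2.25) built from chain-ladder factors and the reserve formula (2.18), can be sketched as follows. This is a hypothetical helper of my own, not the book's code; the a priori ultimates and the factors in the usage example are illustrative, not those of Table 2.4.

```python
# Hedged sketch of (2.25) and (2.18): Bornhuetter-Ferguson reserves from a
# chain-ladder development pattern and a priori ultimate estimates.

def bf_reserves(f_hat, latest_dev, mu_prior):
    """f_hat: chain-ladder factors f_0, ..., f_{J-1};
    latest_dev: development year I-i of the latest observation, per accident year;
    mu_prior: a priori ultimate estimate mu_i, per accident year."""
    J = len(f_hat)
    beta = [1.0] * (J + 1)          # beta_J = 1, cf. (2.25)
    for j in range(J - 1, -1, -1):  # beta_j = beta_{j+1} / f_j
        beta[j] = beta[j + 1] / f_hat[j]
    # BF reserve (2.18): still-to-come factor (1 - beta_{I-i}) times mu_i.
    return [(1.0 - beta[d]) * m for d, m in zip(latest_dev, mu_prior)]
```

With f̂ = (2.0, 1.25) the pattern is β̂ = (0.4, 0.8, 1.0), so an accident year with a priori ultimate 100 observed up to development year 0 gets the BF reserve (1 - 0.4) · 100 = 60.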
Remark. The expected reserve is the same as in the chain-ladder model on cumulative data (see Lemma 2.16 below).

2.3.1 Poisson derivation of the chain-ladder model

In this subsection we show that the Poisson model (Section 2.3) leads to the chain-ladder estimate for the reserves.

Lemma 2.16 The Chain-ladder Estimator 2.4 and the ML Estimator 2.14 in the Poisson model 2.12 lead to the same reserve.

In fact, the Poisson ML model/estimate defined in Section 2.3 leads to a chain-ladder model (see formula (2.39)); moreover, the ML estimators lead to estimators for the age-to-age factors which are the same as in the distribution-free chain-ladder model.

Proof. In the Poisson model 2.12 the estimate for E[C_{i,j} | C_{i,j-1}] is given by

µ̂_i γ̂_j + C_{i,j-1}.   (2.36)

If we iterate this procedure we obtain for i > I - J

Ĉ^Poi_{i,J} = Ê[C_{i,J} | D_I] = µ̂_i Σ_{j=I-i+1}^{J} γ̂_j + C_{i,I-i}
           = µ̂_i Σ_{j=I-i+1}^{J} γ̂_j + Σ_{j=0}^{I-i} X_{i,j} = µ̂_i Σ_{j=0}^{J} γ̂_j,   (2.37)

where in the last step we have used (2.31). Using (2.31) once more we find that

Ĉ^Poi_{i,J} = Ê[C_{i,J} | D_I] = C_{i,I-i} · (Σ_{j=0}^{J} γ̂_j) / (Σ_{j=0}^{I-i} γ̂_j).   (2.38)

This last formula can be rewritten, introducing additional factors,

Ĉ^Poi_{i,J} = C_{i,I-i} · (Σ_{j=0}^{I-i+1} γ̂_j / Σ_{j=0}^{I-i} γ̂_j) · ... · (Σ_{j=0}^{J} γ̂_j / Σ_{j=0}^{J-1} γ̂_j).   (2.39)

If we use Lemma 2.17 below we see that on D_I we have that

Σ_{i=0}^{I-j} C_{i,j∧J} = (Σ_{i=0}^{I-j} µ̂_i) · Σ_{k=0}^{j∧J} γ̂_k.   (2.40)
Chapter 2. Basic Methods 35 Moreover using 2.32 I j I j I j C i,j 1 J = Ci,j J X i,j 1 {j J} = µ i j 1 J k=0 γ k. 2.41 But 2.40-2.41 immediately imply for j J that j k=0 γ I j k j 1 k=0 γ = C i,j I j k C = f j 1. 2.42 i,j 1 Hence from 2.39 we obtain Ĉ i,j P oi = C i,i i I j i+1 k=0 C I J k,j i+1 k=0 I j... C k,j i+1 I J k=0 C k,j i k=0 C k,j 1 = C i,i i f I i f J 1 = Ĉi,J CL, 2.43 which is the chain-ladder estimate 2.8. This finishes the proof of Lemma 2.16. Lemma 2.17 Under Model Assumptions 2.12 we have on D I that I j C i,j J = I j j J µ i γ k. 2.44 Proof. We proof this by induction. Using 2.31 for i = 0 we have that k=0 I J I J C 0,I J = X 0,j = µ 0 γ j, 2.45 j=0 which is the starting point of our induction j = I. Induction step j j 1 using 2.31-2.32: In the last step we use the induction assuption, then j=0 I j 1 C i,j 1 J = = = = = I j 1 I j C i,j J Ci,j J + C i,j 1 J C i,j J I j 1 X i,j 1 {j J} + C I j+1,j J I j I j j J C i,j J X i,j 1 {j J} X I j+1,j 1 {j J} + I j I j C i,j J X i,j 1 {j J} + I j j J µ i γ k γ j 1 {j J} k=0 I j j 1 J k=0 X I j+1,k µ i + µ I j+1 j 1 J k=0 k=0 γ k. 2.46 X I j+1,k
36 Chapter 2. Basic Methods Hence we find that I j 1 C i,j 1 J = = I j µ i I j 1 j 1 J µ i k=0 j 1 J k=0 γ k + µ I j+1 j 1 J k=0 γ k γ k, 2.47 which proves the claim 2.44. Corollary 2.18 Under Model Assumptions 2.12 we have for all j {0,..., J} that see also 2.25 j γ k = k=0 J 1 CL β j = Proof. From 2.38 and 2.43 we obtain for all i I J k=j 1 f k. 2.48 C i,i i J j=0 γ j I i j=0 γ j Since γ j = 1 is normalized we obtain that = Ĉi,J P oi = Ĉ i,j CL = Ci,I i f I i... f J 1. 2.49 I i 1 = γ j j=0 J 1 j=i i I i f j = γ j j=0 βcl I i 1, 2.50 which proves the claim. Remarks 2.19 Corollary 2.18 says that the chain-ladder method and the Poisson model method lead to the same cash-flow pattern β CL j and hence to the same Bornhuetter-Ferguson reserve if we use this cash-flow pattern for the estimate of β j. Henceforth, if we use the cash-flow pattern β CL j for the BF method, the BF method and the Poisson model only differ in the choice of the expected ultimate claim µ i, since with 2.35 we obtain that Ĉ i,j P oi = Ci,I i + 1 where µ i is the ML estimate given in 2.31-2.32. CL β I i µ i, 2.51
Chapter 2. Basic Methods 37 Observe that we have to solve a system of linear equations 2.31-2.32 to obtain the ML estimates µ i and γ j. This solution can easily be obtained with the help of the chain-ladder factors f j see Corollary 2.18, namely γ l = β CL l J 1 CL β l 1 = k=l 1 1 1/ f f l 1, 2.52 k and µ i = I i J j=0 X i,j / I i J j=0 γ j. 2.53 Below we will see other ML methods and GLM models where the solution of the equations is more complicated, and where one applies algorithmic methods to find numerical solutions.
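The closed-form solution (2.52)-(2.53) can be sketched in code. This is a hypothetical helper, not the book's implementation; it takes the chain-ladder factors and the cumulative triangle, and it computes µ̂_i as C_{i,I-i} / β̂^CL_{I-i}, which is equivalent to (2.53).

```python
# Hedged sketch of (2.52)-(2.53): Poisson ML parameters recovered from the
# chain-ladder factors. The gammas are increments of the CL pattern; mu_i
# grosses up the latest cumulative observation by the pattern.

def poisson_ml_from_cl(f_hat, triangle):
    """f_hat: chain-ladder factors; triangle: rows of cumulative claims C_{i,0..I-i}."""
    J = len(f_hat)
    beta = [1.0] * (J + 1)          # beta_J^CL = 1
    for j in range(J - 1, -1, -1):  # beta_j^CL = beta_{j+1}^CL / f_j, cf. (2.25)
        beta[j] = beta[j + 1] / f_hat[j]
    # gamma_l = beta_l^CL - beta_{l-1}^CL, with beta_{-1}^CL = 0, cf. (2.52)
    gamma = [beta[0]] + [beta[j] - beta[j - 1] for j in range(1, J + 1)]
    # mu_i = C_{i,I-i} / beta_{I-i}^CL, equivalent to (2.53)
    mu = [row[-1] / beta[len(row) - 1] for row in triangle]
    return gamma, mu
```

By Lemma 2.16, feeding these µ̂_i and γ̂_j into (2.35) reproduces the chain-ladder reserves.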
Chapter 3

Chain-ladder models

3.1 Mean square error of prediction

In the previous chapter we have only given an estimate for the mean/expected ultimate claim. Of course, we would also like to know how good this estimate is. To measure the quality of the estimate we consider second moments. Assume that we have a random variable X and a set of observations D, and that X̂ is a D-measurable estimator for E[X | D].

Definition 3.1 (Conditional mean square error of prediction)
The conditional mean square error of prediction of the estimator X̂ is defined by

msep_{X|D}(X̂) = E[(X̂ - X)² | D].   (3.1)

For a D-measurable estimator X̂ we have

msep_{X|D}(X̂) = Var(X | D) + (X̂ - E[X | D])².   (3.2)

The first term on the right-hand side of (3.2) is the so-called process variance (stochastic error), i.e. the variance which is within the stochastic model (pure randomness which cannot be eliminated). The second term on the right-hand side of (3.2) is the parameter/estimation error. It reflects the uncertainty in the estimation of the parameters and of the expectation, respectively. In general, this estimation error becomes smaller the more observations we have. But pay attention: in many practical situations it does not completely disappear, since we try to predict future values with the help of past information; already a slight change in the model over time causes lots of problems (this is also discussed below in Section 3.3).

For the estimation error we would like to explicitly calculate the last term in (3.2). However, this can only be done if E[X | D] is known; but of course this term is in general not known (we estimate it with the help of X̂). Therefore, the derivation of an estimate for the parameter error is more sophisticated: one is interested in the quality of the estimate X̂, and therefore one studies the possible fluctuations of X̂ around E[X | D].

Case 1. We assume that X̂ is independent of D. This is e.g. the case if we have i.i.d. experiments whose average outcome we want to estimate. In that case we have that

E[X | D] = E[X]   and   Var(X | D) = Var(X).   (3.3)

If we consider the unconditional mean square error of prediction for the estimator X̂ we obtain

msep_X(X̂) = E[msep_{X|D}(X̂)] = Var(X) + E[(X̂ - E[X])²],   (3.4)

and if X̂ is an unbiased estimator for E[X], i.e. E[X̂] = E[X], we have

msep_X(X̂) = E[msep_{X|D}(X̂)] = Var(X) + Var(X̂).   (3.5)

Hence the parameter error is estimated by the variance of X̂.

Example. Assume X and X_1, ..., X_n are i.i.d. with mean µ and variance σ² < ∞. Then we have for the estimator X̂ = Σ_{i=1}^n X_i / n that

msep_{X|D}(X̂) = σ² + (Σ_{i=1}^n X_i / n - µ)².   (3.6)

By the strong law of large numbers we know that the last term disappears a.s. for n → ∞. In order to determine this term for finite n, one would like to explicitly calculate the distance between Σ_{i=1}^n X_i / n and µ. However, since in general µ is not known, we can only give an estimate for that distance. If we calculate the unconditional mean square error of prediction we obtain

msep_X(X̂) = σ² + σ²/n.   (3.7)

Henceforth, we can say that the deviation of Σ_{i=1}^n X_i / n around µ is on average of order σ/√n. But unfortunately this does not say anything about the estimation error for a specific realization of Σ_{i=1}^n X_i / n. We will further discuss this below.

Case 2. X̂ is not independent of the observations D. We have several time series examples below where we do not have independence between different observations, e.g. in the distribution-free version of the chain-ladder method.
In all these cases the situation is even more complicated. Observe that if we calculate the unconditional mean square error of prediction we obtain

msep_X(X̂) = E[msep_{X|D}(X̂)]   (3.8)
  = E[Var(X | D)] + E[(X̂ - E[X | D])²]
  = Var(X) - Var(E[X | D]) + E[(X̂ - E[X | D])²]
  = Var(X) + E[(X̂ - E[X])²] - 2 E[(X̂ - E[X])(E[X | D] - E[X])].

If the estimator X̂ is unbiased for E[X] we obtain

msep_X(X̂) = Var(X) + Var(X̂) - 2 Cov(X̂, E[X | D]).   (3.9)

This again tells us something about the average estimation error, but nothing about the quality of the estimate X̂ for a specific realization.

3.2 Chain-ladder method

We have already described the chain-ladder method in Section 2.1. The chain-ladder method can be applied to cumulative payments, to claims incurred, etc. It is the method most commonly applied because it is very simple and, often, using appropriate estimates for the chain-ladder factors, one obtains reliable claims reserves. The main deficiencies of the chain-ladder method are:

- The homogeneity property needs to be satisfied, e.g. we should not have any trends in the development factors; otherwise we have to transform our observations.

- For estimating late development factors f_j (large j) there is only very little data available, which in practice may no longer be representative for younger accident years. E.g. assume that we have a claims development with J = 20 years and that I = 2006. Hence we estimate with today's information (accident years < 2006) what will happen with accident year 2006 in 20 years.

- For young accident years, very much weight is given to the observations, i.e. if we have an outlier on the diagonal, this outlier is projected right to the ultimate claim, which is not always appropriate. Therefore for younger accident years sometimes the Bornhuetter-Ferguson method is preferred (see also the discussion in Subsection 4.2.4).

- In long-tailed branches/lines of business the difference between the chain-ladder method on cumulative payments and on claims incurred is often very large. This is mainly due to the fact that the homogeneity property is not fulfilled. Indeed, if we have new phenomena in the data, claims incurred methods usually overestimate such effects, whereas estimates on paid data underestimate them. The reason is that the claims adjusters tend to overestimate new phenomena, which is reflected in the claims incurred figures, whereas in claims paid figures one observes new phenomena only gradually as claims are settled via payments.

There is an extensive list of references on how the chain-ladder method should be applied in practice and where future research projects could be settled. We do not further discuss this here but only give two references, [46] and [40], which refer to such questions. Moreover, we mention that there is also literature on the appropriateness of the chain-ladder method for specific data sets, see e.g. Barnett-Zehnwirth [7] and Venter [77].

3.2.1 The Mack model

We define the chain-ladder model once more, but this time we extend the definition to the second moments, so that we are also able to give an estimate for the conditional mean square error of prediction for the chain-ladder estimator. In the actuarial literature the chain-ladder method is often understood as a purely computational algorithm, which leaves open the question which probabilistic model would lead to that algorithm. It is Mack's merit [49] to have given a first answer to that question (a first decisive step towards the formulas was done by Schnieper [69]).
Model Assumptions 3.2 (Chain-ladder, Mack [49])
Different accident years i are independent. (C_{i,j})_j is a Markov chain, and there exist factors f_0, ..., f_{J-1} > 0 and variance parameters σ²_0, ..., σ²_{J-1} > 0 such that for all 0 ≤ i ≤ I and 1 ≤ j ≤ J we have that

E[C_{i,j} | C_{i,j-1}] = f_{j-1} C_{i,j-1},   (3.10)
Var(C_{i,j} | C_{i,j-1}) = σ²_{j-1} C_{i,j-1}.   (3.11)

Remark. In Mack [49] there are slightly weaker assumptions, namely the Markov chain assumption is replaced by weaker assumptions on the first two moments of (C_{i,j})_j.

We recall the results from Section 2.1 (see Lemma 2.5) and choose the following estimators for the parameters f_j and σ²_j:

f̂_j = Σ_{i=0}^{I-j-1} C_{i,j+1} / Σ_{i=0}^{I-j-1} C_{i,j},
σ̂²_j = 1/(I-j-1) · Σ_{i=0}^{I-j-1} C_{i,j} (C_{i,j+1}/C_{i,j} - f̂_j)².   (3.12)

- f̂_j is unconditionally and conditionally, given B_j, unbiased for f_j.
- f̂_0, ..., f̂_{J-1} are uncorrelated.

If we define the individual development factors by

F_{i,j+1} = C_{i,j+1}/C_{i,j},   (3.13)

then the age-to-age factor estimates f̂_j are weighted averages of the F_{i,j+1}, namely

f̂_j = Σ_{i=0}^{I-j-1} (C_{i,j} / Σ_{k=0}^{I-j-1} C_{k,j}) · F_{i,j+1}.   (3.14)

Lemma 3.3 Under Assumptions 3.2 the estimator f̂_j is the B_{j+1}-measurable unbiased estimator for f_j which has minimal conditional variance among all linear combinations of the unbiased estimators (F_{i,j+1})_{0 ≤ i ≤ I-j-1} for f_j, conditioned on B_j, i.e.

Var(f̂_j | B_j) = min_{α_i ∈ R} Var(Σ_{i=0}^{I-j-1} α_i F_{i,j+1} | B_j).   (3.15)
The conditional variance of f̂_j is given by

Var(f̂_j | B_j) = σ²_j / Σ_{i=0}^{I-j-1} C_{i,j}.   (3.16)

We need the following lemma to prove the statement:

Lemma 3.4 Assume that P_1, ..., P_H are stochastically independent unbiased estimators for µ with variances σ²_1, ..., σ²_H. Then the minimum variance unbiased linear combination of the P_h is given by

P̂ = (Σ_{h=1}^H P_h/σ²_h) / (Σ_{h=1}^H 1/σ²_h),   (3.17)

with

Var(P̂) = (Σ_{h=1}^H 1/σ²_h)^{-1}.   (3.18)

Proof. See Proposition 12.1 in Taylor [75]; the proof is based on the method of Lagrange.

Proof of Lemma 3.3. Consider the individual development factors

F_{i,j+1} = C_{i,j+1}/C_{i,j}.   (3.19)

Conditioned on B_j, the F_{i,j+1} are unbiased and independent estimators for f_j with

Var(F_{i,j+1} | B_j) = Var(F_{i,j+1} | C_{i,j}) = σ²_j / C_{i,j}.   (3.20)

With Lemma 3.4 the claim immediately follows, with

Var(f̂_j | B_j) = σ²_j / Σ_{i=0}^{I-j-1} C_{i,j}.   (3.21)

Lemma 3.5 Under Assumptions 3.2 we have:
a) σ̂²_j is, given B_j, an unbiased estimator for σ²_j, i.e. E[σ̂²_j | B_j] = σ²_j,
b) σ̂²_j is unconditionally unbiased for σ²_j, i.e. E[σ̂²_j] = σ²_j.
Proof. b) easily follows from a). Hence we prove a). Consider

E[(C_{i,k+1}/C_{i,k} - f̂_k)² | B_k]
  = E[(C_{i,k+1}/C_{i,k} - f_k)² | B_k] - 2 E[(C_{i,k+1}/C_{i,k} - f_k)(f̂_k - f_k) | B_k] + E[(f̂_k - f_k)² | B_k].   (3.22)

Hence we calculate the terms on the r.h.s. of the equality above:

E[(C_{i,k+1}/C_{i,k} - f_k)² | B_k] = Var(C_{i,k+1}/C_{i,k} | B_k) = σ²_k / C_{i,k}.   (3.23)

The next term is, using the independence of different accident years,

E[(C_{i,k+1}/C_{i,k} - f_k)(f̂_k - f_k) | B_k] = Cov(C_{i,k+1}/C_{i,k}, f̂_k | B_k)
  = (C_{i,k} / Σ_{l=0}^{I-k-1} C_{l,k}) · Var(C_{i,k+1}/C_{i,k} | B_k) = σ²_k / Σ_{l=0}^{I-k-1} C_{l,k}.   (3.24)

Whereas for the last term we obtain

E[(f̂_k - f_k)² | B_k] = Var(f̂_k | B_k) = σ²_k / Σ_{l=0}^{I-k-1} C_{l,k}.   (3.25)

Putting all this together gives

E[(C_{i,k+1}/C_{i,k} - f̂_k)² | B_k] = σ²_k (1/C_{i,k} - 1/Σ_{l=0}^{I-k-1} C_{l,k}).   (3.26)

Hence we have that

E[σ̂²_k | B_k] = 1/(I-k-1) · Σ_{i=0}^{I-k-1} C_{i,k} E[(C_{i,k+1}/C_{i,k} - f̂_k)² | B_k] = σ²_k,   (3.27)

which proves claim a). This finishes the proof of Lemma 3.5.

The following equality plays an important role in the derivation of an estimator for the conditional estimation error:

E[f̂²_k | B_k] = Var(f̂_k | B_k) + f²_k = σ²_k / Σ_{i=0}^{I-k-1} C_{i,k} + f²_k.   (3.28)
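The parameter estimators (3.12) can be sketched in code. This is a hypothetical helper of my own; the triangle is a list of observed rows of cumulative claims, and the last variance parameter, for which only one observation pair exists, is left undefined (in practice one uses a tail extrapolation such as (3.38) below).

```python
# Hedged sketch of the Mack estimators (3.12): age-to-age factors f_j and
# variance parameters sigma^2_j from a cumulative claims triangle.

def mack_parameters(triangle):
    J = len(triangle[0]) - 1
    f_hat, sigma2_hat = [], []
    for j in range(J):
        rows = [r for r in triangle if len(r) > j + 1]  # pairs (C_{i,j}, C_{i,j+1})
        f_j = sum(r[j + 1] for r in rows) / sum(r[j] for r in rows)
        f_hat.append(f_j)
        n = len(rows)
        if n > 1:
            # sigma^2_j = 1/(n-1) * sum_i C_{i,j} (F_{i,j+1} - f_j)^2
            s2 = sum(r[j] * (r[j + 1] / r[j] - f_j) ** 2 for r in rows) / (n - 1)
        else:
            s2 = float("nan")  # only one observation: needs an extrapolation
        sigma2_hat.append(s2)
    return f_hat, sigma2_hat
```

The weighting by C_{i,j} inside the sum is exactly what makes σ̂²_j conditionally unbiased (Lemma 3.5) and makes f̂_j the minimum variance estimator of Lemma 3.3.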
In Estimator 2.4 we have already seen how we estimate the ultimate claim C_{i,J}, given the information D_I, in the chain-ladder model:

Ĉ^CL_{i,J} = Ê[C_{i,J} | D_I] = C_{i,I-i} · f̂_{I-i} · ... · f̂_{J-1}.   (3.29)

Our goal is to derive an estimate for the conditional mean square error of prediction (conditional MSEP) of Ĉ^CL_{i,J} for single accident years i ∈ {I-J+1, ..., I},

msep_{C_{i,J}|D_I}(Ĉ^CL_{i,J}) = E[(Ĉ^CL_{i,J} - C_{i,J})² | D_I]   (3.30)
  = Var(C_{i,J} | D_I) + (Ĉ^CL_{i,J} - E[C_{i,J} | D_I])²,

and for aggregated accident years we consider

msep_{Σ_i C_{i,J}|D_I}(Σ_{i=I-J+1}^{I} Ĉ^CL_{i,J}) = E[(Σ_{i=I-J+1}^{I} Ĉ^CL_{i,J} - Σ_{i=I-J+1}^{I} C_{i,J})² | D_I].   (3.31)

From (3.30) we see that we need to give an estimate for the process variance and for the estimation error, which comes from the fact that f_j is estimated by f̂_j.

3.2.2 Conditional process variance

We consider the first term on the right-hand side of (3.30), which is the conditional process variance. Assume J > I - i; then

Var(C_{i,J} | D_I) = Var(C_{i,J} | C_{i,I-i})   (3.32)
  = E[Var(C_{i,J} | C_{i,J-1}) | C_{i,I-i}] + Var(E[C_{i,J} | C_{i,J-1}] | C_{i,I-i})
  = σ²_{J-1} E[C_{i,J-1} | C_{i,I-i}] + f²_{J-1} Var(C_{i,J-1} | C_{i,I-i})
  = σ²_{J-1} C_{i,I-i} ∏_{j=I-i}^{J-2} f_j + f²_{J-1} Var(C_{i,J-1} | C_{i,I-i}).

Hence we obtain a recursive formula for the process variance. If we iterate this procedure, we find that

Var(C_{i,J} | C_{i,I-i}) = Σ_{m=I-i}^{J-1} (∏_{n=m+1}^{J-1} f²_n) σ²_m E[C_{i,m} | C_{i,I-i}]   (3.33)
  = E[C_{i,J} | C_{i,I-i}]² · Σ_{m=I-i}^{J-1} (σ²_m / f²_m) / E[C_{i,m} | C_{i,I-i}].

This gives the following lemma:

Lemma 3.6 (Process variance for single accident years) Under Model Assumptions 3.2 the conditional process variance for the ultimate claim of a single accident year i ∈ {I-J+1, ..., I} is given by

Var(C_{i,J} | D_I) = E[C_{i,J} | C_{i,I-i}]² · Σ_{m=I-i}^{J-1} (σ²_m / f²_m) / E[C_{i,m} | C_{i,I-i}].   (3.34)

Hence we estimate the conditional process variance for a single accident year i by

V̂ar(C_{i,J} | D_I) = (Ĉ^CL_{i,J})² · Σ_{m=I-i}^{J-1} (σ̂²_m / f̂²_m) / Ĉ^CL_{i,m}.   (3.35)

The estimator for the conditional process variance can be rewritten in recursive form: for i ∈ {I-J+1, ..., I}

V̂ar(C_{i,J} | D_I) = V̂ar(C_{i,J-1} | D_I) · f̂²_{J-1} + σ̂²_{J-1} Ĉ^CL_{i,J-1},   (3.36)

where V̂ar(C_{i,I-i} | D_I) = 0. Because different accident years are independent, we estimate the conditional process variance for aggregated accident years by

V̂ar(Σ_{i=I-J+1}^{I} C_{i,J} | D_I) = Σ_{i=I-J+1}^{I} V̂ar(C_{i,J} | D_I).   (3.37)

Example 3.7 (Chain-ladder method)

We come back to our example in Table 2.2 (see Example 2.7). Since we do not have enough data (i.e. we do not have I > J) we are not able to estimate the last variance parameter σ²_{J-1} with the estimator σ̂²_{J-1} (cf. (3.12)). There is an extensive literature on the estimation of tail factors and variance estimates. We do not further discuss this here, but simply choose the extrapolation used by Mack [49]:

σ̂²_{J-1} = min{ σ̂⁴_{J-2}/σ̂²_{J-3} ; σ̂²_{J-3} ; σ̂²_{J-2} }   (3.38)

as estimate for σ²_{J-1}. This estimate is motivated by the observation that the series σ̂_0, ..., σ̂_{J-2} is usually decreasing (cf. Table 3.1). This gives the estimated conditional process standard deviations in Table 3.2. We define the estimated conditional variational coefficient for accident year i, relative to the estimated CL reserves, as

V̂co_i = V̂co(C_{i,J} | D_I) = V̂ar(C_{i,J} | D_I)^{1/2} / (Ĉ^CL_{i,J} - C_{i,I-i}).   (3.39)

If we take this variational coefficient as a measure for the uncertainty, we see that the uncertainty of the total CL reserves is about 7%.
48 Chapter 3. Chain-ladder models 0 1 2 3 4 5 6 7 8 1 1.6257 1.0926 1.0197 1.0192 1.0057 1.0060 1.0013 1.0010 1.0014 2 1.5115 1.0754 1.0147 1.0065 1.0035 1.0050 1.0011 1.0011 3 1.4747 1.0916 1.0260 1.0147 1.0062 1.0051 1.0008 4 1.4577 1.0845 1.0206 1.0141 1.0092 1.0045 5 1.4750 1.0767 1.0298 1.0244 1.0109 6 1.4573 1.0635 1.0255 1.0107 7 1.5166 1.0663 1.0249 8 1.4614 1.0683 9 1.4457 10 bf j 1.4925 1.0778 1.0229 1.0148 1.0070 1.0051 1.0011 1.0010 1.0014 bσ j 135.253 33.803 15.760 19.847 9.336 2.001 0.823 0.219 0.059 Table 3.1: Observed historical individual chain-ladder factors F i,j+1, estimated chain-ladder factors f j and estimated standard deviations σ j i C i,i i CL Ci,J d CL reserves Var d `Ci,J D I 1/2 Vco i 0 11 148 124 11 148 124 0 1 10 648 192 10 663 318 15 126 191 1.3% 2 10 635 751 10 662 008 26 257 742 2.8% 3 9 724 068 9 758 606 34 538 2 669 7.7% 4 9 786 916 9 872 218 85 302 6 832 8.0% 5 9 935 753 10 092 247 156 494 30 478 19.5% 6 9 282 022 9 568 143 286 121 68 212 23.8% 7 8 256 211 8 705 378 449 167 80 077 17.8% 8 7 648 729 8 691 971 1 043 242 126 960 12.2% 9 5 675 568 9 626 383 3 950 815 389 783 9.9% Total 6 047 061 424 379 7.0% Table 3.2: Estimated chain-ladder reserves and estimated conditional process standard deviations 3.2.3 Estimation error for single accident years Next we need to derive an estimate for the conditional parameter/estimation error, i.e. we want to get an estimate for the accuracy of our chain-ladder factor estimates f j. The parameter error for a single accident year in the chain-ladder estimate is given by see 3.30, 2.3 and 2.8 CL 2 Ĉi,J E [Ci,J D I ] = C 2 i,i i fi i... f 2 J 1 f I i... f J 1 3.40 J 1 J 1 J 1 = Ci,I i 2 f j 2 + fj 2 2 f j f j. j=i i j=i i j=i i Hence we would like to calculate 3.40. Observe that the realizations of the estimators f I i,..., f J 1 are known at time I, but the true chain-ladder factors f I i,..., f J 1 are unknown. Hence 3.40 can not be calculated explicitly. 
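Before turning to the estimation error, note that the process-variance recursion (3.36) used for Table 3.2 is straightforward to implement. The following is a hedged sketch with a hypothetical helper name; it takes the diagonal value C_{i,I-i}, its development year I-i, and the estimated Mack parameters.

```python
# Hedged sketch of the recursion (3.36): starting from the diagonal with
# variance 0, iterate Var_j = Var_{j-1} * f_{j-1}^2 + sigma^2_{j-1} * C_hat_{j-1}.

def process_variance(c_diag, dev, f_hat, sigma2_hat):
    """c_diag: C_{i,I-i}; dev: development year I-i of the diagonal;
    f_hat, sigma2_hat: estimated chain-ladder factors and variance parameters."""
    J = len(f_hat)
    c_hat, var = c_diag, 0.0
    for j in range(dev, J):
        var = var * f_hat[j] ** 2 + sigma2_hat[j] * c_hat
        c_hat *= f_hat[j]          # update C-hat_{i,j+1}^CL along the way
    return var                     # estimated Var(C_{i,J} | D_I), cf. (3.35)
```

Summing the results over the accident years and taking the square root gives, per (3.37), the aggregated process standard deviation reported in the "Total" row of Table 3.2.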
In order to determine the conditional estimation error we will analyze how much the possible chain-ladder factors $\widehat f_j$ fluctuate around $f_j$. We measure these volatilities of the estimates $\widehat f_j$ by means of resampled observations for $\widehat f_j$. There are different approaches to resample these values, conditional and unconditional ones; see Buchwalder et al. [13]. For the explanation of these different approaches we fix an accident year $i \in \{I-J+1,\ldots,I\}$. From the right-hand side of (3.40) we see that the main difficulty in determining the volatility of the estimates comes from the calculation of the squares of the estimated chain-ladder factors. Therefore, we focus for the moment on these squares, i.e. we need to resample the product of squared estimates
\[
\widehat f_{I-i}^{\,2}\cdots\widehat f_{J-1}^{\,2}; \tag{3.41}
\]
the treatment of the last term in (3.40) is then straightforward. To be able to distinguish the different resampling approaches we define
\[
\mathcal D^O_{I,i} = \big\{C_{k,j} \in \mathcal D_I;\ j > I-i\big\} \subset \mathcal D_I, \tag{3.42}
\]
the upper right corner of the observations $\mathcal D_I$ with respect to development year $j = I-i+1$.

[Table 3.3: The upper right corner $\mathcal D^O_{I,i}$ (accident years $0,\ldots,I$ against development years $0,\ldots,I-i,\ldots,J$).]

For the following explanation observe that $\widehat f_j$ is $\mathcal B_{j+1}$-measurable.

Approach 1 (Unconditional resampling in $\mathcal D^O_{I,i}$). In this approach one calculates the expectation
\[
E\big[\widehat f_{I-i}^{\,2}\cdots\widehat f_{J-1}^{\,2}\,\big|\,\mathcal B_{I-i}\big]. \tag{3.43}
\]
This is the complete averaging over the multidimensional distribution after time $I-i$. Since $\mathcal D^O_{I,i}\cap\mathcal B_{I-i} = \emptyset$ holds true, the value (3.43) does not depend on the observations in $\mathcal D^O_{I,i}$, i.e. the observed realizations in the upper corner $\mathcal D^O_{I,i}$ have no influence on the estimate of the parameter error. Therefore we call this
the unconditional version, because it gives the average/expected estimation error independent of the observations in $\mathcal D^O_{I,i}$.

Approach 2 (Partial conditional resampling in $\mathcal D^O_{I,i}$). In this approach one calculates the value
\[
\widehat f_{I-i}^{\,2}\cdots\widehat f_{J-2}^{\,2}\; E\big[\widehat f_{J-1}^{\,2}\,\big|\,\mathcal B_{J-1}\big]. \tag{3.44}
\]
In this version the averaging is only done partially. However, $\mathcal D^O_{I,i}\cap\mathcal B_{J-1}\neq\emptyset$ holds, i.e. the value (3.44) depends on the observations in $\mathcal D^O_{I,i}$. If one decouples the resampling problem in a smart way, one can even choose the position $j\in\{I-i,\ldots,J-1\}$ at which one wants to do the partial resampling.

Approach 3 (Conditional resampling in $\mathcal D^O_{I,i}$). Calculate the value
\[
E\big[\widehat f_{I-i}^{\,2}\,\big|\,\mathcal B_{I-i}\big]\; E\big[\widehat f_{I-i+1}^{\,2}\,\big|\,\mathcal B_{I-i+1}\big]\cdots E\big[\widehat f_{J-1}^{\,2}\,\big|\,\mathcal B_{J-1}\big]. \tag{3.45}
\]
Unlike in Approach 2 (cf. (3.44)), the averaging is now done at every position $j\in\{I-i,\ldots,J-1\}$ on the conditional structure. Since $\mathcal D^O_{I,i}\cap\mathcal B_j\neq\emptyset$ for $j>I-i$, the observed realizations in $\mathcal D^O_{I,i}$ have a direct influence on the estimate, and (3.45) depends on the observations in $\mathcal D^O_{I,i}$. In contrast to (3.43), the averaging is only done over the conditional distributions and not over the multidimensional distribution after time $I-i$. Therefore we call this the conditional version. From a numerical point of view it is important to note that Approach 3 allows for a multiplicative structure of the measure of volatility (see Figure 3.1).

Concluding, this means that we consider different probability measures for the resampling, conditional and unconditional ones. Observe that the estimated chain-ladder factors $\widehat f_j$ are functions of $C_{0,j+1},\ldots,C_{I-j-1,j+1}$ and $C_{0,j},\ldots,C_{I-j-1,j}$, i.e.
\[
\widehat f_j = \widehat f_j\big(C_{0,j+1},\ldots,C_{I-j-1,j+1};\ C_{0,j},\ldots,C_{I-j-1,j}\big) = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}}. \tag{3.46}
\]
In the conditional resampling the denominator serves as a fixed volume measure, whereas in the unconditional resampling the denominator is also resampled.
Since each time series $(C_{k,j})_j$ is a Markov chain, we can write its probability distribution with the help of stochastic kernels $K_j$ as follows:
\[
dP_k(x_0,\ldots,x_J) = K_0(dx_0)\,K_1(x_0,dx_1)\,K_2(x_0,x_1,dx_2)\cdots K_J(x_0,\ldots,x_{J-1},dx_J) = K_0(dx_0)\,K_1(x_0,dx_1)\,K_2(x_1,dx_2)\cdots K_J(x_{J-1},dx_J). \tag{3.47}
\]
In Approach 1 one considers a complete resampling on $\mathcal D^O_{I,i}$, i.e. one looks, given $\mathcal B_{I-i}$, at the measures
\[
dP\big((x_{k,j})_{k,j}\,\big|\,\mathcal B_{I-i}\big) = \prod_{k<i} dP_k\big(x_{k,I-i+1},\ldots,x_{k,I-k}\,\big|\,C_{k,I-i}=x_{k,I-i}\big) = \prod_{k<i} K_{I-i+1}(x_{k,I-i},dx_{k,I-i+1})\cdots K_{I-k}(x_{k,I-k-1},dx_{k,I-k}), \tag{3.48}
\]
for the resampling of the estimated chain-ladder factors
\[
\prod_{j\ge I-i}\widehat f_j = \prod_{j\ge I-i}\widehat f_j\big(x_{0,j+1},\ldots,x_{I-j-1,j+1};\ x_{0,j},\ldots,x_{I-j-1,j}\big). \tag{3.49}
\]
In Approach 3 we always keep the set of actual observations $C_{i,j}$ fixed and we only resample the next step in the time series, i.e. given $\mathcal D_I$ we consider the measures (see also Figure 3.1)
\[
dP_{\mathcal D_I}\big((x_{k,j})_{k,j}\big) = \prod_{k<i} K_{I-i+1}(C_{k,I-i},dx_{k,I-i+1})\cdots K_{I-k}(C_{k,I-k-1},dx_{k,I-k}), \tag{3.50}
\]
for the resampling of
\[
\prod_{j\ge I-i}\widehat f_j = \prod_{j\ge I-i}\widehat f_j\big(x_{0,j+1},\ldots,x_{I-j-1,j+1};\ C_{0,j},\ldots,C_{I-j-1,j}\big). \tag{3.51}
\]
Hence, in this context $C_{i,j}$ serves as a volume measure for the resampling of $C_{i,j+1}$. In Approach 1 this volume measure is also resampled, whereas in Approach 3 it is kept fixed.

Observe: The question of which approach should be chosen is not a mathematical one, and it has led to extensive discussions in the actuarial community (see Buchwalder et al. [11], Mack et al. [52], Gisler [29] and Venter [78]). It depends on the circumstances which approach should be used for a specific practical problem; hence only the practitioner can choose the appropriate approach for his problems and questions.
Approach 1 (Unconditional resampling)

In the unconditional approach we have, due to the uncorrelatedness of the estimated chain-ladder factors, that
\[
E\Big[\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)^2\,\Big|\,\mathcal B_{I-i}\Big] = C_{i,I-i}^2\, E\Big[\prod_{j=I-i}^{J-1}\widehat f_j^{\,2} + \prod_{j=I-i}^{J-1}f_j^2 - 2\prod_{j=I-i}^{J-1}\widehat f_j\,f_j \,\Big|\,\mathcal B_{I-i}\Big] = C_{i,I-i}^2\,\Big( E\Big[\prod_{j=I-i}^{J-1}\widehat f_j^{\,2}\,\Big|\,\mathcal B_{I-i}\Big] - \prod_{j=I-i}^{J-1}f_j^2 \Big). \tag{3.52}
\]
Hence, to give an estimate for the estimation error in the unconditional version, we need to calculate the expectation in the last term of (3.52) as described in Approach 1. This would be easy if the estimated chain-ladder factors $\widehat f_j$ were independent. But they are only uncorrelated, see Lemma 2.5 and the following lemma (for a similar statement see also Mack et al. [52]):

Lemma 3.8 Under Model Assumptions 3.2 the squares of two successive chain-ladder estimators $\widehat f_{j-1}$ and $\widehat f_j$ are, given $\mathcal B_{j-1}$, negatively correlated, i.e.
\[
\mathrm{Cov}\big(\widehat f_{j-1}^{\,2},\widehat f_j^{\,2}\,\big|\,\mathcal B_{j-1}\big) < 0 \tag{3.53}
\]
for $1\le j\le J-1$.

Proof. Observe that $\widehat f_{j-1}$ is $\mathcal B_j$-measurable. We define
\[
S_j = \sum_{i=0}^{I-j-1} C_{i,j}. \tag{3.54}
\]
Hence, we have that
\[
\mathrm{Cov}\big(\widehat f_{j-1}^{\,2},\widehat f_j^{\,2}\,\big|\,\mathcal B_{j-1}\big) = E\Big[\mathrm{Cov}\big(\widehat f_{j-1}^{\,2},\widehat f_j^{\,2}\,\big|\,\mathcal B_j\big)\,\Big|\,\mathcal B_{j-1}\Big] + \mathrm{Cov}\Big(E\big[\widehat f_{j-1}^{\,2}\,\big|\,\mathcal B_j\big],\,E\big[\widehat f_j^{\,2}\,\big|\,\mathcal B_j\big]\,\Big|\,\mathcal B_{j-1}\Big) = \mathrm{Cov}\Big(\widehat f_{j-1}^{\,2},\,\frac{\sigma_j^2}{S_j}+f_j^2\,\Big|\,\mathcal B_{j-1}\Big) = \sigma_j^2\,\mathrm{Cov}\Big(\widehat f_{j-1}^{\,2},\,\frac{1}{S_j}\,\Big|\,\mathcal B_{j-1}\Big). \tag{3.55}
\]
Moreover, using
\[
\Big(\sum_{i=0}^{I-j} C_{i,j}\Big)^2 = S_j^2 + 2\,S_j\,C_{I-j,j} + C_{I-j,j}^2, \tag{3.56}
\]
the independence of different accident years and $E[C_{I-j,j}\mid\mathcal B_{j-1}] = f_{j-1}\,C_{I-j,j-1}$ leads to
\[
\mathrm{Cov}\big(\widehat f_{j-1}^{\,2},\widehat f_j^{\,2}\,\big|\,\mathcal B_{j-1}\big) = \frac{\sigma_j^2}{S_{j-1}^2}\,\Big[\mathrm{Cov}\Big(S_j^2,\frac{1}{S_j}\,\Big|\,\mathcal B_{j-1}\Big) + 2\,f_{j-1}\,C_{I-j,j-1}\,\mathrm{Cov}\Big(S_j,\frac{1}{S_j}\,\Big|\,\mathcal B_{j-1}\Big)\Big]. \tag{3.57}
\]
Finally, we need to calculate both covariance terms on the right-hand side of (3.57). Using Jensen's inequality we obtain for $\alpha = 1,2$
\[
\mathrm{Cov}\Big(S_j^{\alpha},\frac{1}{S_j}\,\Big|\,\mathcal B_{j-1}\Big) = E\big[S_j^{\alpha-1}\,\big|\,\mathcal B_{j-1}\big] - E\big[S_j^{\alpha}\,\big|\,\mathcal B_{j-1}\big]\,E\big[S_j^{-1}\,\big|\,\mathcal B_{j-1}\big] < E\big[S_j^{\alpha-1}\,\big|\,\mathcal B_{j-1}\big] - E[S_j\mid\mathcal B_{j-1}]^{\alpha}\,E[S_j\mid\mathcal B_{j-1}]^{-1} = 0; \tag{3.58}
\]
Jensen's inequality is strict because we have assumed strictly positive variances $\sigma_{j-1}^2>0$, which implies that $S_j$ is not deterministic at time $j-1$. This finishes the proof of Lemma 3.8.

Lemma 3.8 implies that the term
\[
E\Big[\prod_{j=I-i}^{J-1}\widehat f_j^{\,2}\,\Big|\,\mathcal B_{I-i}\Big] \tag{3.59}
\]
cannot easily be calculated. Hence, from this point of view, Approach 1 is not a promising route towards a closed formula for the estimation error.

Approach 3 (Conditional resampling)

In Approach 3 we explicitly resample the observed chain-ladder factors $\widehat f_j$. To do this resampling we introduce stronger model assumptions, namely a time series model. Such time series models for the chain-ladder method can be found in several papers in the literature, see e.g. Murphy [55], Barnett-Zehnwirth [7] or Buchwalder et al. [13].

Model Assumptions 3.9 (Time series model)
- Different accident years $i$ are independent.
- There exist constants $f_j>0$, $\sigma_j>0$ and random variables $\varepsilon_{i,j+1}$ such that for all $i\in\{0,\ldots,I\}$ and $j\in\{0,\ldots,J-1\}$ we have
\[
C_{i,j+1} = f_j\,C_{i,j} + \sigma_j\,\sqrt{C_{i,j}}\;\varepsilon_{i,j+1}, \tag{3.60}
\]
where, conditionally given $\mathcal B_0$, the $\varepsilon_{i,j+1}$ are independent with $E[\varepsilon_{i,j+1}\mid\mathcal B_0]=0$, $E[\varepsilon_{i,j+1}^2\mid\mathcal B_0]=1$ and $P[C_{i,j+1}>0\mid\mathcal B_0]=1$ for all $i\in\{0,\ldots,I\}$ and $j\in\{0,\ldots,J-1\}$.
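The mechanism (3.60) for generating other possible observations can be sketched as a short simulation (an illustrative helper of our own). Note that Gaussian innovations are a simplification: Model Assumptions 3.9 additionally require the distribution of $\varepsilon_{i,j+1}$ to keep $C_{i,j+1}$ positive almost surely.

```python
import numpy as np

def simulate_year(C0, f, sigma, rng):
    """Simulate one accident year under the time series model (3.60):
    C_{j+1} = f_j * C_j + sigma_j * sqrt(C_j) * eps_{j+1}.
    Standard normal eps is used for illustration only; it does not strictly
    guarantee positivity of the simulated path."""
    C = [C0]
    for fj, sj in zip(f, sigma):
        eps = rng.standard_normal()
        C.append(fj * C[-1] + sj * np.sqrt(C[-1]) * eps)
    return C
```

With all $\sigma_j = 0$ the simulation reduces to the deterministic chain-ladder projection $C_{i,j+1} = f_j C_{i,j}$.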
Remarks 3.10
- The time series model defines an auto-regressive process. It is particularly useful for the derivation of the estimation error and reflects the mechanism generating sets of other possible observations.
- The random variables $\varepsilon_{i,j+1}$ are defined conditionally, given $\mathcal B_0$, in order to ensure that the cumulative payments $C_{i,j+1}$ stay positive, $P[\,\cdot\mid\mathcal B_0]$-a.s.
- It is easy to show that Model Assumptions 3.9 imply the Assumptions 3.2 of the classical stochastic chain-ladder model of Mack [49].
- The definition of the time series model in Buchwalder et al. [11] is slightly different. The difference lies in the fact that here we assume a.s. positivity of $C_{i,j}$. This could also be done with the help of conditional assumptions, i.e. the theory would also go through if we assumed that
\[
P[C_{i,j+1}>0\mid C_{i,j}] = 1 \tag{3.61}
\]
for all $i$ and $j$.

In the sequel we use Approach 3, i.e. we do conditional resampling in the time series model. We therefore resample the observations for $\widehat f_{I-i},\ldots,\widehat f_{J-1}$, given the upper trapezoid $\mathcal D_I$. Thereby we take into account that, given $\mathcal D_I$, the observations for $\widehat f_j$ could have been different from the observed values. To account for this source of uncertainty we proceed as usual in statistics: given $\mathcal D_I$, we generate for $i\in\{0,\ldots,I\}$ and $j\in\{0,\ldots,J-1\}$ a set of new observations $\widetilde C_{i,j+1}$ by the formula
\[
\widetilde C_{i,j+1} = f_j\,C_{i,j} + \sigma_j\,\sqrt{C_{i,j}}\;\widetilde\varepsilon_{i,j+1}, \tag{3.62}
\]
where $\sigma_j>0$ and $\widetilde\varepsilon_{i,j+1}$, $\varepsilon_{i,j+1}$ are independent and identically distributed, given $\mathcal B_0$ (cf. Model Assumptions 3.9). This means that $C_{i,j}$ acts as a fixed volume measure and we resample $\widetilde C_{i,j+1}\stackrel{(d)}{=}C_{i,j+1}$, given $\mathcal B_j$. In the language of stochastic kernels this means that we consider the distributions $K_{j+1}(C_{i,j},dx_{j+1})$, see (3.50).

Remark. We have chosen a different notation, $\widetilde C_{i,j+1}$ vs. $C_{i,j+1}$, to clearly illustrate that we resample on the conditional structure, i.e. the $\widetilde C_{i,j+1}$ are random variables and the $C_{i,j}$ are deterministic volumes, given $\mathcal B_j$.
In the spirit of Approach 3 (cf. (3.45)) we resample the observations for $\widehat f_j$ by only resampling the observations of development year $j+1$. Together with the resampling assumption (3.62) this leads to the following resampled representation for the estimates of the development factors:
\[
\widetilde f_j = \frac{\sum_{i=0}^{I-j-1}\widetilde C_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}} = f_j + \frac{\sigma_j}{S_j}\sum_{i=0}^{I-j-1}\sqrt{C_{i,j}}\;\widetilde\varepsilon_{i,j+1}, \qquad 0\le j\le J-1, \tag{3.63}
\]
where
\[
S_j = \sum_{i=0}^{I-j-1} C_{i,j}. \tag{3.64}
\]
As in (3.50) we denote the probability measure of these resampled chain-ladder estimates by $P_{\mathcal D_I}$. These resampled estimates of the development factors have, given $\mathcal B_j$, the same distribution as the original estimated chain-ladder factors. Unlike the observations $\{C_{i,j};\ i+j\le I\}$, the observations $\{\widetilde C_{i,j};\ i+j\le I\}$ and also the resampled estimates are random variables, given $\mathcal D_I$. Furthermore, the observations $C_{i,j}$ and the random variables $\widetilde\varepsilon_{i,j}$ are independent, given $\mathcal B_0$. This and (3.63) show that
1. the estimators $\widetilde f_0,\ldots,\widetilde f_{J-1}$ are conditionally independent w.r.t. $P_{\mathcal D_I}$,
2. $E_{\mathcal D_I}\big[\widetilde f_j\big] = f_j$ for $0\le j\le J-1$, and
3. $E_{\mathcal D_I}\big[\widetilde f_j^{\,2}\big] = f_j^2 + \sigma_j^2/S_j$ for $0\le j\le J-1$.

Figure 3.1 illustrates the conditional resampling for two different possible observations $\mathcal D_I^{1}$ and $\mathcal D_I^{2}$ of the original data $\mathcal D_I$, which would give two different chain-ladder estimates $\widehat C^{1}_{i,J}$ and $\widehat C^{2}_{i,J}$ for $E[C_{i,J}\mid\mathcal D_I]$. Therefore, in Approach 3 we estimate the estimation error by using 1.-3.:
\[
E_{\mathcal D_I}\Big[C_{i,I-i}^2\big(\widetilde f_{I-i}\cdots\widetilde f_{J-1} - f_{I-i}\cdots f_{J-1}\big)^2\Big] = C_{i,I-i}^2\,\mathrm{Var}_{P_{\mathcal D_I}}\big(\widetilde f_{I-i}\cdots\widetilde f_{J-1}\big) = C_{i,I-i}^2\Big(\prod_{l=I-i}^{J-1} E_{\mathcal D_I}\big[\widetilde f_l^{\,2}\big] - \prod_{l=I-i}^{J-1} f_l^2\Big) = C_{i,I-i}^2\Big(\prod_{l=I-i}^{J-1}\Big(f_l^2+\frac{\sigma_l^2}{S_l}\Big) - \prod_{l=I-i}^{J-1} f_l^2\Big). \tag{3.65}
\]
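Properties 2. and 3. of the resampled factors can be checked numerically. The following Monte Carlo sketch (with an arbitrary illustrative column of fixed volumes and assumed true parameters, not data from the text) verifies $E_{\mathcal D_I}[\widetilde f_j] = f_j$ and $E_{\mathcal D_I}[\widetilde f_j^{\,2}] = f_j^2 + \sigma_j^2/S_j$ for one development step:

```python
import numpy as np

# Monte Carlo check of the moments of the resampled factor (3.63).
rng = np.random.default_rng(0)
C_j = np.array([100.0, 80.0, 120.0])   # fixed volumes C_{i,j} (illustrative)
f_j, sigma_j = 1.5, 2.0                # assumed true parameters
S_j = C_j.sum()

n = 200_000
eps = rng.standard_normal((n, C_j.size))
C_next = f_j * C_j + sigma_j * np.sqrt(C_j) * eps   # resampled C~_{i,j+1}
f_tilde = C_next.sum(axis=1) / S_j                  # resampled factors (3.63)

print(f_tilde.mean())          # close to f_j
print((f_tilde ** 2).mean())   # close to f_j**2 + sigma_j**2 / S_j
```

Note that the whole column of development year $j+1$ is resampled while the denominator $S_j$ stays fixed, which is exactly the conditional structure of Approach 3.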
[Figure 3.1: Conditional resampling in $\mathcal D^O_{I,i}$ (Approach 3). The figure shows the one-step conditional moments $E[\widetilde f_l^{\,2}\mid\mathcal B_l] = f_l^2+\sigma_l^2/S_l$ for $l=I-i,\ldots,J-1$, their product $\prod_{l=I-i}^{J-1}\big(f_l^2+\sigma_l^2/S_l\big)$, and two resampled paths starting in $C_{i,I-i}$ that lead to the estimates $\widehat C^1_{i,J}$ and $\widehat C^2_{i,J}$ around $E[C_{i,J}\mid\mathcal D_I]$.]

Observe that this calculation is exact; the estimation step was made at the point where we decided to use Approach 3 for the estimation error, i.e. the estimate was obtained by choosing the conditional probability measure $P_{\mathcal D_I}$. Next we replace the parameters $\sigma_{I-i}^2,\ldots,\sigma_{J-1}^2$ and $f_{I-i},\ldots,f_{J-1}$ by their estimators, and we obtain the following estimator for the conditional estimation error of accident year $i\in\{I-J+1,\ldots,I\}$:
\[
\widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,J}\,\big|\,\mathcal D_I\big) = \widehat E_{\mathcal D_I}\Big[\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)^2\Big] = C_{i,I-i}^2\,\Big(\prod_{l=I-i}^{J-1}\Big(\widehat f_l^{\,2}+\frac{\widehat\sigma_l^2}{S_l}\Big) - \prod_{l=I-i}^{J-1}\widehat f_l^{\,2}\Big). \tag{3.66}
\]
The estimator for the conditional estimation error can be written in a recursive form. We obtain for $i\in\{I-J+1,\ldots,I\}$
\[
\widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,J}\,\big|\,\mathcal D_I\big) = \widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,J-1}\,\big|\,\mathcal D_I\big)\Big(\widehat f_{J-1}^{\,2}+\frac{\widehat\sigma_{J-1}^2}{S_{J-1}}\Big) + C_{i,I-i}^2\,\frac{\widehat\sigma_{J-1}^2}{S_{J-1}}\prod_{l=I-i}^{J-2}\widehat f_l^{\,2}, \tag{3.67}
\]
where $\widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,I-i}\,\big|\,\mathcal D_I\big)=0$.

Estimator 3.11 (MSEP for single accident years, conditional version) Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim of a single accident year $i\in\{I-J+1,\ldots,I\}$:
\[
\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big) = \widehat E\Big[\big(\widehat C^{CL}_{i,J}-C_{i,J}\big)^2\,\Big|\,\mathcal D_I\Big] = \underbrace{\big(\widehat C^{CL}_{i,J}\big)^2\sum_{l=I-i}^{J-1}\frac{\widehat\sigma_l^2/\widehat f_l^{\,2}}{\widehat C^{CL}_{i,l}}}_{\text{process variance}} + \underbrace{C_{i,I-i}^2\Big(\prod_{l=I-i}^{J-1}\Big(\widehat f_l^{\,2}+\frac{\widehat\sigma_l^2}{S_l}\Big)-\prod_{l=I-i}^{J-1}\widehat f_l^{\,2}\Big)}_{\text{estimation error}}. \tag{3.68}
\]

We can rewrite (3.68) as follows:
\[
\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big) = \big(\widehat C^{CL}_{i,J}\big)^2\,\Bigg[\sum_{l=I-i}^{J-1}\frac{\widehat\sigma_l^2/\widehat f_l^{\,2}}{\widehat C^{CL}_{i,l}} + \prod_{l=I-i}^{J-1}\Big(1+\frac{\widehat\sigma_l^2/\widehat f_l^{\,2}}{S_l}\Big) - 1\Bigg]. \tag{3.69}
\]
We could also make a linear approximation to the estimation error:
\[
\prod_{l=I-i}^{J-1}\Big(1+\frac{\widehat\sigma_l^2/\widehat f_l^{\,2}}{S_l}\Big) - 1 \;\approx\; \sum_{l=I-i}^{J-1}\frac{\widehat\sigma_l^2/\widehat f_l^{\,2}}{S_l}. \tag{3.70}
\]
Observe that the right-hand side of (3.70) is in fact a lower bound for the left-hand side. This immediately gives the following estimate:

Estimator 3.12 (MSEP for single accident years) Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim of a single accident year $i\in\{I-J+1,\ldots,I\}$:
\[
\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big) = \big(\widehat C^{CL}_{i,J}\big)^2\sum_{l=I-i}^{J-1}\frac{\widehat\sigma_l^2}{\widehat f_l^{\,2}}\Big(\frac{1}{\widehat C^{CL}_{i,l}}+\frac{1}{S_l}\Big). \tag{3.71}
\]
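Both single-year estimators can be computed side by side; the sketch below (an illustrative helper of our own, with hypothetical inputs) implements the process variance (3.35), the exact estimation error (3.66) of Estimator 3.11, and the linearized estimation error of Estimator 3.12:

```python
from math import prod

def msep_single_year(C_diag, C_hat, f_hat, sigma2_hat, S):
    """Conditional MSEP for one accident year.
    C_diag: latest observed value C_{i,I-i};
    C_hat : CL-projected values [Chat_{i,I-i}, ..., Chat_{i,J}];
    f_hat, sigma2_hat, S: estimates per remaining development step.
    Returns (Estimator 3.11, Estimator 3.12)."""
    ult = C_hat[-1]
    # process variance (3.35); zip pairs each step with Chat_{i,l}, l < J
    proc = ult ** 2 * sum(s2 / f ** 2 / c
                          for s2, f, c in zip(sigma2_hat, f_hat, C_hat))
    # exact conditional estimation error (3.66)
    est_exact = C_diag ** 2 * (prod(f ** 2 + s2 / s for f, s2, s
                                    in zip(f_hat, sigma2_hat, S))
                               - prod(f ** 2 for f in f_hat))
    # linear approximation (3.70)-(3.71), i.e. Mack's formula
    est_mack = ult ** 2 * sum(s2 / f ** 2 / s
                              for s2, f, s in zip(sigma2_hat, f_hat, S))
    return proc + est_exact, proc + est_mack
```

Since (3.70) is a lower bound, the second returned value never exceeds the first.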
The Mack [49] approach

Mack [49] gives yet another approach to the estimation of the estimation error. Introduce for $j\in\{I-i,\ldots,J-1\}$
\[
T_j = \widehat f_{I-i}\cdots\widehat f_{j-1}\,\big(\widehat f_j - f_j\big)\,f_{j+1}\cdots f_{J-1}. \tag{3.72}
\]
Observe that
\[
\widehat f_{I-i}\cdots\widehat f_{J-1} - f_{I-i}\cdots f_{J-1} = \sum_{j=I-i}^{J-1} T_j. \tag{3.73}
\]
This implies that (see (3.40))
\[
\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)^2 = C_{i,I-i}^2\,\Big(\sum_{j=I-i}^{J-1} T_j^2 + 2\sum_{I-i\le j<k\le J-1} T_j\,T_k\Big). \tag{3.74}
\]
Each term in the sums on the right-hand side of the equality above is now estimated by a slightly modified version of Approach 2: we estimate $T_j\,T_k$ for $j<k$ by
\[
E[T_j\,T_k\mid\mathcal B_k] = \widehat f_{I-i}^{\,2}\cdots\widehat f_{j-1}^{\,2}\;\widehat f_j\big(\widehat f_j-f_j\big)\;\widehat f_{j+1}f_{j+1}\cdots\widehat f_{k-1}f_{k-1}\;f_k\,E\big[\widehat f_k-f_k\,\big|\,\mathcal B_k\big]\;f_{k+1}^2\cdots f_{J-1}^2 = 0, \tag{3.75}
\]
and $T_j^2$ is estimated by
\[
E\big[T_j^2\,\big|\,\mathcal B_j\big] = \widehat f_{I-i}^{\,2}\cdots\widehat f_{j-1}^{\,2}\;E\Big[\big(\widehat f_j-f_j\big)^2\,\Big|\,\mathcal B_j\Big]\;f_{j+1}^2\cdots f_{J-1}^2 = \widehat f_{I-i}^{\,2}\cdots\widehat f_{j-1}^{\,2}\;\frac{\sigma_j^2}{S_j}\;f_{j+1}^2\cdots f_{J-1}^2. \tag{3.76}
\]
Hence (3.40) is estimated by
\[
C_{i,I-i}^2\sum_{j=I-i}^{J-1}\widehat f_{I-i}^{\,2}\cdots\widehat f_{j-1}^{\,2}\;\frac{\sigma_j^2}{S_j}\;f_{j+1}^2\cdots f_{J-1}^2. \tag{3.77}
\]
If we now replace the unknown parameters $\sigma_j^2$ and $f_j$ by their estimates, we obtain exactly the estimate $\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big)$ for the conditional estimation error presented in Estimator 3.12.

Remarks 3.13
- We see that the Mack estimate for the conditional estimation error, also presented in Estimator 3.12, is a linear approximation to, and a lower bound for, the estimate coming from Approach 3. The difference comes from the fact that Mack [49] decouples the estimation error in an appropriate way with the help of the terms $T_j$ and then applies a partial conditional resampling to each term in the decoupling.
- The Time Series Model 3.9 has slightly stronger assumptions than the weighted average development (WAD) factor model studied in Murphy [55], Model IV. To obtain the crucial recursive formula for the conditional estimation error (Theorem 3 in Appendix C of [55]), Murphy assumes independence of the estimators of the chain-ladder factors. However, this assumption is inconsistent with the model assumptions: the estimated chain-ladder factors are indeed uncorrelated (see Lemma 2.5c), but the squares of two successive chain-ladder estimators are negatively correlated, as we can see from Lemma 3.8. The point is that by his assumptions Murphy [55] obtains a multiplicative structure for the measure of volatility. In Approach 3 we obtain the multiplicative structure by the choice of the conditional resampling probability measure $P_{\mathcal D_I}$ for the measure of the conditional volatility of the chain-ladder estimator (see the discussion in Section 3.2.3); that is, in Approach 3 we do not assume that the estimated chain-ladder factors are independent. Since a multiplicative structure is used in both estimators, it turns out that the recursive estimator (3.67) for the conditional estimation error is exactly the estimator presented in Theorem 3 of Murphy [55] (see also Appendix B in Barnett-Zehnwirth [7]).

Example 3.7, revisited

We come back to our example in Table 2.2.
This gives the following error estimates. From Tables 3.4 and 3.5 we see that the differences between the estimates for the conditional estimation error coming from the exact formula and from the linear approximation (Mack formula) are negligible; we have reached the same conclusion in all examples we have considered.

3.2.4 Conditional MSEP in the chain-ladder model for aggregated accident years

Consider two different accident years $i<l$. From the model assumptions we know that the ultimate losses $C_{i,J}$ and $C_{l,J}$ are independent. Nevertheless we have to be
careful if we aggregate $\widehat C^{CL}_{i,J}$ and $\widehat C^{CL}_{l,J}$: the estimators are no longer independent, since they use the same observations for estimating the age-to-age factors $f_j$. We have that
\[
\mathrm{msep}_{C_{i,J}+C_{l,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}\big) = E\Big[\big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}-(C_{i,J}+C_{l,J})\big)^2\,\Big|\,\mathcal D_I\Big] = \mathrm{Var}\big(C_{i,J}+C_{l,J}\,\big|\,\mathcal D_I\big) + \big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}-E[C_{i,J}+C_{l,J}\mid\mathcal D_I]\big)^2. \tag{3.78}
\]

 i | $\widehat C^{CL}_{i,J}$ | CL reserves | $\widehat{\mathrm{Var}}(C_{i,J}\mid\mathcal D_I)^{1/2}$ | $\widehat{\mathrm{Var}}(\widehat C^{CL}_{i,J}\mid\mathcal D_I)^{1/2}$ | $\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}(\widehat C^{CL}_{i,J})^{1/2}$
 0 | 11 148 124 |           |                 |                |
 1 | 10 663 318 |    15 126 |     191 ( 1.3%) |    187 ( 1.2%) |     267 ( 1.8%)
 2 | 10 662 008 |    26 257 |     742 ( 2.8%) |    535 ( 2.0%) |     914 ( 3.5%)
 3 |  9 758 606 |    34 538 |   2 669 ( 7.7%) |  1 493 ( 4.3%) |   3 058 ( 8.9%)
 4 |  9 872 218 |    85 302 |   6 832 ( 8.0%) |  3 392 ( 4.0%) |   7 628 ( 8.9%)
 5 | 10 092 247 |   156 494 |  30 478 (19.5%) | 13 517 ( 8.6%) |  33 341 (21.3%)
 6 |  9 568 143 |   286 121 |  68 212 (23.8%) | 27 286 ( 9.5%) |  73 467 (25.7%)
 7 |  8 705 378 |   449 167 |  80 077 (17.8%) | 29 675 ( 6.6%) |  85 398 (19.0%)
 8 |  8 691 971 | 1 043 242 | 126 960 (12.2%) | 43 903 ( 4.2%) | 134 337 (12.9%)
 9 |  9 626 383 | 3 950 815 | 389 783 ( 9.9%) | 129 770 ( 3.3%) | 410 817 (10.4%)

Table 3.4: Estimated chain-ladder reserves and error terms according to Estimator 3.11

Table 3.5 (estimated chain-ladder reserves and error terms according to Estimator 3.12) is identical to Table 3.4 up to rounding; the only visible difference is the estimation error for accident year 9, which is 129 769 instead of 129 770.
Using the independence of the different accident years, we obtain for the first term
\[
\mathrm{Var}\big(C_{i,J}+C_{l,J}\,\big|\,\mathcal D_I\big) = \mathrm{Var}\big(C_{i,J}\,\big|\,\mathcal D_I\big) + \mathrm{Var}\big(C_{l,J}\,\big|\,\mathcal D_I\big), \tag{3.79}
\]
whereas for the second term we obtain
\[
\big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}-E[C_{i,J}+C_{l,J}\mid\mathcal D_I]\big)^2 = \big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)^2 + \big(\widehat C^{CL}_{l,J}-E[C_{l,J}\mid\mathcal D_I]\big)^2 + 2\,\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)\big(\widehat C^{CL}_{l,J}-E[C_{l,J}\mid\mathcal D_I]\big). \tag{3.80}
\]
Hence we have the following decomposition for the conditional prediction error of the sum of two accident years:
\[
E\Big[\big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}-(C_{i,J}+C_{l,J})\big)^2\,\Big|\,\mathcal D_I\Big] = E\Big[\big(\widehat C^{CL}_{i,J}-C_{i,J}\big)^2\,\Big|\,\mathcal D_I\Big] + E\Big[\big(\widehat C^{CL}_{l,J}-C_{l,J}\big)^2\,\Big|\,\mathcal D_I\Big] + 2\,\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)\big(\widehat C^{CL}_{l,J}-E[C_{l,J}\mid\mathcal D_I]\big). \tag{3.81}
\]
Hence we obtain
\[
\mathrm{msep}_{C_{i,J}+C_{l,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}+\widehat C^{CL}_{l,J}\big) = \mathrm{msep}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big) + \mathrm{msep}_{C_{l,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{l,J}\big) + 2\,\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)\big(\widehat C^{CL}_{l,J}-E[C_{l,J}\mid\mathcal D_I]\big). \tag{3.82}
\]
In addition to the conditional mean square error of prediction of single accident years, we need to average, similarly to (3.40), over the possible values of $\widehat f_j$ for the cross-products of the conditional estimation errors of the two accident years:
\[
\big(\widehat C^{CL}_{i,J}-E[C_{i,J}\mid\mathcal D_I]\big)\big(\widehat C^{CL}_{l,J}-E[C_{l,J}\mid\mathcal D_I]\big) = C_{i,I-i}\big(\widehat f_{I-i}\cdots\widehat f_{J-1}-f_{I-i}\cdots f_{J-1}\big)\; C_{l,I-l}\big(\widehat f_{I-l}\cdots\widehat f_{J-1}-f_{I-l}\cdots f_{J-1}\big). \tag{3.83}
\]
Now we could have the same discussion about resampling as above. Here we simply use Approach 3 for the resampling, i.e. we choose the probability measure $P_{\mathcal D_I}$. Then we can calculate these cross-products explicitly. As in (3.65) we obtain as estimate for the cross-products (for $i<l$)
\[
C_{i,I-i}\,C_{l,I-l}\;E_{\mathcal D_I}\Big[\Big(\prod_{j=I-i}^{J-1}\widetilde f_j-\prod_{j=I-i}^{J-1}f_j\Big)\Big(\prod_{j=I-l}^{J-1}\widetilde f_j-\prod_{j=I-l}^{J-1}f_j\Big)\Big] = C_{i,I-i}\,C_{l,I-l}\;\mathrm{Cov}_{P_{\mathcal D_I}}\big(\widetilde f_{I-i}\cdots\widetilde f_{J-1},\,\widetilde f_{I-l}\cdots\widetilde f_{J-1}\big) = C_{i,I-i}\,C_{l,I-l}\;f_{I-l}\cdots f_{I-i-1}\;\mathrm{Var}_{P_{\mathcal D_I}}\big(\widetilde f_{I-i}\cdots\widetilde f_{J-1}\big) = C_{i,I-i}\;E[C_{l,I-i}\mid\mathcal D_I]\;\Big(\prod_{j=I-i}^{J-1}\Big(f_j^2+\frac{\sigma_j^2}{S_j}\Big)-\prod_{j=I-i}^{J-1}f_j^2\Big). \tag{3.84}
\]
The estimation of the covariance term is then straightforward from the estimate for a single accident year.
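The cross-product estimate (3.84), with parameters replaced by their estimators, can be accumulated over all pairs $i<l$ as in the following sketch (a hypothetical helper of our own; the `(d, C)` encoding of the diagonal is an assumption for illustration):

```python
from math import prod

def cl_cov_term(diag, f_hat, sigma2_hat, S):
    """Covariance part of the aggregated MSEP (cf. (3.84) and (3.85)).
    diag: list of (d, C) pairs, one per accident year i, where d = I - i is
    the development year of the latest observation C = C_{i,I-i}.
    f_hat, sigma2_hat, S: CL parameter estimates for steps j = 0..J-1."""
    J = len(f_hat)
    total = 0.0
    for d_i, C_i in diag:                    # accident year i (older)
        for d_l, C_l in diag:
            if d_l < d_i:                    # year l is younger than year i
                # CL projection of year l up to development year d_i = I - i
                C_hat_l = C_l * prod(f_hat[d_l:d_i])
                gamma = (prod(f_hat[j] ** 2 + sigma2_hat[j] / S[j]
                              for j in range(d_i, J))
                         - prod(f_hat[j] ** 2 for j in range(d_i, J)))
                total += 2.0 * C_i * C_hat_l * gamma
    return total
```

Adding this term to the sum of the single-year MSEP estimates gives the aggregated estimator of the following section.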
Estimator 3.14 (MSEP aggregated accident years, conditional version) Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim for aggregated accident years:
\[
\widehat{\mathrm{msep}}_{\sum_i C_{i,J}\mid\mathcal D_I}\Big(\sum_{i=I-J+1}^{I}\widehat C^{CL}_{i,J}\Big) = \widehat E\Big[\Big(\sum_{i=I-J+1}^{I}\widehat C^{CL}_{i,J}-\sum_{i=I-J+1}^{I}C_{i,J}\Big)^2\,\Big|\,\mathcal D_I\Big] = \sum_{i=I-J+1}^{I}\widehat{\mathrm{msep}}_{C_{i,J}\mid\mathcal D_I}\big(\widehat C^{CL}_{i,J}\big) + 2\sum_{I-J+1\le i<l\le I} C_{i,I-i}\,\widehat C^{CL}_{l,I-i}\,\Big(\prod_{j=I-i}^{J-1}\Big(\widehat f_j^{\,2}+\frac{\widehat\sigma_j^2}{S_j}\Big)-\prod_{j=I-i}^{J-1}\widehat f_j^{\,2}\Big). \tag{3.85}
\]

Remarks 3.15
- The last terms (covariance terms) in the result above can be rewritten as
\[
2\sum_{I-J+1\le i<l\le I}\frac{\widehat C^{CL}_{l,I-i}}{C_{i,I-i}}\;\widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,J}\,\big|\,\mathcal D_I\big), \tag{3.86}
\]
where $\widehat{\mathrm{Var}}\big(\widehat C^{CL}_{i,J}\mid\mathcal D_I\big)$ is the conditional estimation error of the single accident year $i$ (see (3.66)). This may be helpful in implementations, since it leads to matrix multiplications.
- We can again make a linear approximation; we then find the estimator presented in Mack [49].

Example 3.7, revisited

We come back to our example in Table 2.2. This gives the error estimates in Table 3.6.

3.3 Analysis of error terms

In this section we further analyze the conditional mean square error of prediction of the chain-ladder method. In fact, we consider three different kinds of error terms:
(a) conditional process error,
(b) conditional prediction error,
(c) conditional estimation error.
To analyze these three terms we define a model which differs from the classical chain-ladder model. It is slightly more complicated than the classical
model, but in return it leads to a clear distinction between these error terms. The motivation for a clear distinction between the three error terms is that the sources of these error classes are rather different, and we believe that in the light of the solvency discussions (see e.g. SST [73], Sandström [67], Buchwalder et al. [11, 14] or Wüthrich [88]) one should clearly distinguish between the different risk factors. In this section we closely follow Wüthrich [90]. For a similar Bayesian approach we also refer to Gisler [29].

The per-year entries of Table 3.6 (estimated chain-ladder reserves and error terms according to Estimator 3.14) coincide with those of Table 3.4; in addition it contains the aggregation rows:

Cov. term |           |                 | 116 811        | 116 811
Total     | 6 047 061 | 424 379 ( 7.0%) | 185 026 ( 3.1%) | 462 960 ( 7.7%)

Table 3.6: Estimated chain-ladder reserves and error terms according to Estimator 3.14

3.3.1 Classical chain-ladder model

The observed individual development factors were defined by (see also (3.19))
\[
F_{i,j} = \frac{C_{i,j}}{C_{i,j-1}}; \tag{3.87}
\]
under Model Assumptions 3.2 we then have
\[
E[F_{i,j}\mid C_{i,j-1}] = f_{j-1} \quad\text{and}\quad \mathrm{Var}\big(F_{i,j}\mid C_{i,j-1}\big) = \frac{\sigma_{j-1}^2}{C_{i,j-1}}. \tag{3.88}
\]
The conditional variational coefficients of the development factors $F_{i,j}$ are given by
\[
\mathrm{Vco}\big(F_{i,j}\mid C_{i,j-1}\big) = \mathrm{Vco}\big(C_{i,j}\mid C_{i,j-1}\big) = \frac{\sigma_{j-1}}{f_{j-1}}\,C_{i,j-1}^{-1/2} \;\longrightarrow\; 0, \qquad\text{as } C_{i,j-1}\to\infty. \tag{3.89}
\]
Hence, for increasing volume the conditional variational coefficients of $F_{i,j}$ converge to zero! It is exactly this property (3.89) which is crucial in risk management.
If we assume that risk is measured through these variational coefficients, this means that the risk completely disappears for very large portfolios (law of large numbers). But we all know that this is not the case in practice: there are always external factors influencing a portfolio which are not diversifiable, e.g. if jurisdiction
changes, it does not help to have a large portfolio, etc. The experience of recent years has also shown that we have to be very careful about external factors and parameter errors, since they cannot be diversified. Therefore, almost all developments of new solvency guidelines and requirements pay close attention to these risks. The goal here is to define a chain-ladder model which also reflects this risk class.

3.3.2 Enhanced chain-ladder model

The approach in this section modifies (3.89) as follows. We assume that there exist constants $a_0^2, a_1^2,\ldots \ge 0$ such that for all $1\le j\le J$ we have
\[
\mathrm{Vco}^2\big(F_{i,j}\mid C_{i,j-1}\big) = \frac{\sigma_{j-1}^2}{f_{j-1}^2}\,C_{i,j-1}^{-1} + a_{j-1}^2. \tag{3.90}
\]
Hence $\mathrm{Vco}^2\big(F_{i,j}\mid C_{i,j-1}\big) > a_{j-1}^2$ and
\[
\lim_{C_{i,j-1}\to\infty}\mathrm{Vco}^2\big(F_{i,j}\mid C_{i,j-1}\big) = a_{j-1}^2, \tag{3.91}
\]
i.e. the squared variational coefficient is now bounded from below by $a_{j-1}^2$. This implies that we replace the chain-ladder condition on the variance by
\[
\mathrm{Var}\big(C_{i,j}\mid C_{i,j-1}\big) = \sigma_{j-1}^2\,C_{i,j-1} + a_{j-1}^2\,f_{j-1}^2\,C_{i,j-1}^2. \tag{3.92}
\]
This means that we add a quadratic term to ensure that the variational coefficient does not disappear as the volume goes to infinity. As above, we define the chain-ladder-consistent time series model. This time series model gives an algorithm for simulating additional observations, which will be used for the calculation of the estimation error.

Model Assumptions 3.16 (Enhanced time series model)
- Different accident years $i$ are independent.
- There exist constants $f_j>0$, $\sigma_j^2>0$, $a_j^2\ge 0$ and random variables $\varepsilon_{i,j+1}$ such that for all $i\in\{0,\ldots,I\}$ and $j\in\{0,\ldots,J-1\}$ we have
\[
C_{i,j+1} = f_j\,C_{i,j} + \big(\sigma_j^2 + a_j^2\,f_j^2\,C_{i,j}\big)^{1/2}\,\sqrt{C_{i,j}}\;\varepsilon_{i,j+1}, \tag{3.93}
\]
where, conditionally given $\mathcal B_0$, the $\varepsilon_{i,j+1}$ are independent with $E[\varepsilon_{i,j+1}\mid\mathcal B_0]=0$, $E[\varepsilon_{i,j+1}^2\mid\mathcal B_0]=1$ and $P[C_{i,j+1}>0\mid\mathcal B_0]=1$ for all $i\in\{0,\ldots,I\}$ and $j\in\{0,\ldots,J-1\}$.
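The variance floor (3.90)-(3.91) can be illustrated numerically: under (3.93) the variational coefficient of $F_{i,j+1}=C_{i,j+1}/C_{i,j}$ no longer vanishes for large volumes but tends to $a_j$. The parameters below are illustrative assumptions, and the Gaussian innovations are a simplification of Model Assumptions 3.16:

```python
import numpy as np

# Monte Carlo illustration of the variational-coefficient floor (3.90).
rng = np.random.default_rng(1)
f, sigma, a = 1.5, 2.0, 0.05           # assumed parameters (illustrative)

for C in (1e2, 1e4, 1e6):
    eps = rng.standard_normal(100_000)
    # enhanced time series step (3.93) for a column of identical volumes C
    C_next = f * C + np.sqrt((sigma ** 2 + a ** 2 * f ** 2 * C) * C) * eps
    F = C_next / C
    print(C, F.std() / F.mean())       # decreases towards the floor a
```

The theoretical value is $\mathrm{Vco}(F\mid C) = (\sigma^2/(f^2 C) + a^2)^{1/2}$, which converges to $a = 0.05$ as $C\to\infty$ instead of to zero as in the classical model (3.89).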
Remark. See Remarks 3.10.

Lemma 3.17 Model 3.16 satisfies Model Assumptions 3.2 with (3.11) replaced by (3.92), i.e. the model satisfies the chain-ladder assumptions with the modified variance function. For $a_j \equiv 0$ we obtain the Time Series Version 3.9.

3.3.3 Interpretation

In this subsection we give an interpretation of the variance term (3.92). Alternatively, we could use a model with latent variables $\Theta_{i,j}$. This is similar to Bayesian approaches such as the one used in Gisler [29], saying that the true chain-ladder factors $f_j$ are themselves random variables depending on external/latent factors.

A1 Conditionally, given $\Theta_{i,j}$, we have
\[
E[C_{i,j+1}\mid\Theta_{i,j},C_{i,j}] = f_j(\Theta_{i,j})\,C_{i,j}, \tag{3.94}
\]
\[
\mathrm{Var}\big(C_{i,j+1}\mid\Theta_{i,j},C_{i,j}\big) = \sigma_j^2(\Theta_{i,j})\,C_{i,j}. \tag{3.95}
\]

A2 The $\Theta_{i,j}$ are independent with
\[
E[f_j(\Theta_{i,j})\mid C_{i,j}] = f_j, \tag{3.96}
\]
\[
\mathrm{Var}\big(f_j(\Theta_{i,j})\mid C_{i,j}\big) = a_j^2\,f_j^2, \tag{3.97}
\]
\[
E[\sigma_j^2(\Theta_{i,j})\mid C_{i,j}] = \sigma_j^2. \tag{3.98}
\]

Remark. The variables $F_{i,j} = C_{i,j+1}/C_{i,j}$ satisfy the Bühlmann-Straub model assumptions (see Bühlmann-Gisler [18] and Section 4.3 below).

For the variance term we obtain
\[
\mathrm{Var}\big(C_{i,j+1}\mid C_{i,j}\big) = E\big[\mathrm{Var}(C_{i,j+1}\mid\Theta_{i,j},C_{i,j})\,\big|\,C_{i,j}\big] + \mathrm{Var}\big(E[C_{i,j+1}\mid\Theta_{i,j},C_{i,j}]\,\big|\,C_{i,j}\big) \tag{3.99}
\]
\[
= \sigma_j^2\,C_{i,j} + a_j^2\,f_j^2\,C_{i,j}^2. \tag{3.100}
\]
Moreover we see that
\[
\mathrm{Vco}\big(f_j(\Theta_{i,j})\mid C_{i,j}\big) = a_j. \tag{3.101}
\]
Hence we introduce the following terminology:
(a) Conditional process error / conditional process variance. The conditional process error corresponds to the term
\[
\sigma_j^2\,C_{i,j} \tag{3.102}
\]
and reflects the fact that the $C_{i,j+1}$ are random variables which have to be predicted. For increasing volume $C_{i,j}$ the variational coefficient of this term disappears.

(b) Conditional prediction error. The conditional prediction error corresponds to the term
\[
a_j^2\,f_j^2\,C_{i,j}^2 \tag{3.103}
\]
and reflects the fact that we have to predict the future development factors $f_j(\Theta_{i,j})$. These future development factors are themselves subject to uncertainty and hence may be modelled stochastically (Bayesian point of view). The Mack formula and the Estimator 3.14 for the conditional mean square error of prediction do not consider this kind of risk.

(c) Conditional estimation error. There is a third kind of risk, namely the risk coming from the fact that we have to estimate the true parameters $f_j$ in (3.96) from the data. This error term will be called the conditional estimation error. It is also considered in the Mack model and in Estimator 3.14. For the derivation of an estimate for this term we will use Approach 3 (page 50), based on the time series definition of the chain-ladder method.

3.3.4 Chain-ladder estimator in the enhanced model

Under Model Assumptions 3.16 we have
\[
F_{i,j+1} = f_j + \big(\sigma_j^2\,C_{i,j}^{-1} + a_j^2\,f_j^2\big)^{1/2}\,\varepsilon_{i,j+1}, \tag{3.104}
\]
with
\[
E[F_{i,j+1}\mid C_{i,j}] = f_j \quad\text{and}\quad \mathrm{Var}\big(F_{i,j+1}\mid C_{i,j}\big) = \sigma_j^2\,C_{i,j}^{-1} + a_j^2\,f_j^2. \tag{3.105}
\]
This immediately gives the following lemma:

Lemma 3.18 Under Model Assumptions 3.16 we have for $i > I-J$ that
\[
E[C_{i,J}\mid\mathcal D_I] = E[C_{i,J}\mid C_{i,I-i}] = C_{i,I-i}\prod_{j=I-i}^{J-1} f_j. \tag{3.106}
\]

Proof. See the proof of Lemma 2.3.
Remark. As soon as we know the chain-ladder factors $f_j$ we can calculate the conditionally expected ultimate claim $C_{i,J}$, given the information $\mathcal D_I$. Of course, in general the chain-ladder factors $f_j$ are not known and need to be estimated from the data.

3.3.5 Conditional process and prediction errors

We now derive the recursive formula for the conditional process and prediction error. Under Model Assumptions 3.16 we have for the ultimate claim $C_{i,J}$ of accident year $i > I-J$ that
\[
\mathrm{Var}\big(C_{i,J}\mid\mathcal D_I\big) = \mathrm{Var}\big(C_{i,J}\mid C_{i,I-i}\big) = E\big[\mathrm{Var}(C_{i,J}\mid C_{i,J-1})\,\big|\,C_{i,I-i}\big] + \mathrm{Var}\big(E[C_{i,J}\mid C_{i,J-1}]\,\big|\,C_{i,I-i}\big). \tag{3.107}
\]
For the first term on the right-hand side of (3.107) we obtain under Model Assumptions 3.16 that
\[
E\big[\mathrm{Var}(C_{i,J}\mid C_{i,J-1})\,\big|\,C_{i,I-i}\big] = E\big[\sigma_{J-1}^2\,C_{i,J-1} + a_{J-1}^2\,f_{J-1}^2\,C_{i,J-1}^2\,\big|\,C_{i,I-i}\big] = \sigma_{J-1}^2\,C_{i,I-i}\prod_{j=I-i}^{J-2}f_j + a_{J-1}^2\,f_{J-1}^2\,\Big(\mathrm{Var}\big(C_{i,J-1}\mid\mathcal D_I\big) + E[C_{i,J-1}\mid C_{i,I-i}]^2\Big) = C_{i,I-i}^2\,\Big(\frac{\sigma_{J-1}^2}{C_{i,I-i}}\prod_{j=I-i}^{J-2}f_j + a_{J-1}^2\prod_{j=I-i}^{J-1}f_j^2\Big) + a_{J-1}^2\,f_{J-1}^2\,\mathrm{Var}\big(C_{i,J-1}\mid\mathcal D_I\big). \tag{3.108}
\]
For the second term on the right-hand side of (3.107) we obtain under Model Assumptions 3.16
\[
\mathrm{Var}\big(E[C_{i,J}\mid C_{i,J-1}]\,\big|\,C_{i,I-i}\big) = \mathrm{Var}\big(f_{J-1}\,C_{i,J-1}\,\big|\,C_{i,I-i}\big) = f_{J-1}^2\,\mathrm{Var}\big(C_{i,J-1}\mid\mathcal D_I\big). \tag{3.109}
\]
This leads to the following recursive formula (compare with (3.32)):
\[
\mathrm{Var}\big(C_{i,J}\mid\mathcal D_I\big) = C_{i,I-i}^2\,\Big(\frac{\sigma_{J-1}^2}{C_{i,I-i}}\prod_{j=I-i}^{J-2}f_j + a_{J-1}^2\prod_{j=I-i}^{J-1}f_j^2\Big) + \big(1+a_{J-1}^2\big)\,f_{J-1}^2\,\mathrm{Var}\big(C_{i,J-1}\mid\mathcal D_I\big). \tag{3.110}
\]
For $a_{J-1}^2=0$ it coincides with the formula given in (3.32). This gives the following lemma:
Lemma 3.19 (Process/prediction errors for single accident years) Under Model Assumptions 3.16 the conditional process variance and prediction error for the ultimate claim of a single accident year $i\in\{I-J+1,\ldots,I\}$ are given by
\[
\mathrm{Var}\big(C_{i,J}\mid\mathcal D_I\big) = C_{i,I-i}^2\sum_{m=I-i}^{J-1}\Big[\prod_{n=m+1}^{J-1}\big(1+a_n^2\big)f_n^2\Big]\Big(\frac{\sigma_m^2}{C_{i,I-i}}\prod_{j=I-i}^{m-1}f_j + a_m^2\prod_{j=I-i}^{m}f_j^2\Big) = E[C_{i,J}\mid\mathcal D_I]^2\sum_{m=I-i}^{J-1}\Big(\frac{\sigma_m^2/f_m^2}{E[C_{i,m}\mid\mathcal D_I]} + a_m^2\Big)\prod_{n=m+1}^{J-1}\big(1+a_n^2\big). \tag{3.111}
\]

Lemma 3.19 implies that the conditional variational coefficient of the ultimate claim $C_{i,J}$ is given by
\[
\mathrm{Vco}\big(C_{i,J}\mid\mathcal D_I\big) = \Bigg[\sum_{m=I-i}^{J-1}\Big(\frac{\sigma_m^2/f_m^2}{E[C_{i,m}\mid\mathcal D_I]} + a_m^2\Big)\prod_{n=m+1}^{J-1}\big(1+a_n^2\big)\Bigg]^{1/2}. \tag{3.112}
\]
Hence we see that the part corresponding to the conditional process error disappears for infinitely large volume $C_{i,I-i}$, and the limit corresponds to the conditional prediction error,
\[
\lim_{C_{i,I-i}\to\infty}\mathrm{Vco}\big(C_{i,J}\mid\mathcal D_I\big) = \Bigg[\sum_{m=I-i}^{J-1}a_m^2\prod_{n=m+1}^{J-1}\big(1+a_n^2\big)\Bigg]^{1/2}, \tag{3.113}
\]
while the conditional variational coefficient for the conditional process error of $C_{i,J}$ is given by
\[
\Bigg[\sum_{m=I-i}^{J-1}\frac{\sigma_m^2/f_m^2}{E[C_{i,m}\mid\mathcal D_I]}\prod_{n=m+1}^{J-1}\big(1+a_n^2\big)\Bigg]^{1/2}. \tag{3.114}
\]

3.3.6 Chain-ladder factors and conditional estimation error

The conditional estimation error comes from the fact that we have to estimate the $f_j$ from the data.

Estimation Approach 1

From Lemma 3.4 we obtain the following lemma:

Lemma 3.20 Under Model Assumptions 3.16, the estimator
\[
\widehat F_j = \sum_{i=0}^{I-j-1}\frac{C_{i,j}\,\big/\,\big(\sigma_j^2+a_j^2 f_j^2\,C_{i,j}\big)}{\sum_{k=0}^{I-j-1} C_{k,j}\,\big/\,\big(\sigma_j^2+a_j^2 f_j^2\,C_{k,j}\big)}\;F_{i,j+1} = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}\,\big/\,\big(\sigma_j^2+a_j^2 f_j^2\,C_{i,j}\big)}{\sum_{i=0}^{I-j-1} C_{i,j}\,\big/\,\big(\sigma_j^2+a_j^2 f_j^2\,C_{i,j}\big)} \tag{3.115}
\]
is the $\mathcal{B}_{j+1}$-measurable unbiased estimator for $f_j$ which has minimal conditional variance among all linear combinations of the unbiased estimators $F_{i,j+1}$, $0 \le i \le I-j-1$, for $f_j$, conditioned on $\mathcal{B}_j$, i.e.
$$
\operatorname{Var}\big(\widehat{F}_j \mid \mathcal{B}_j\big) = \min_{\alpha_i \in \mathbb{R}}\; \operatorname{Var}\left( \sum_{i=0}^{I-j-1} \alpha_i\, F_{i,j+1} \,\Big|\, \mathcal{B}_j \right). \qquad (3.116)
$$
The conditional variance is given by
$$
\operatorname{Var}\big(\widehat{F}_j \mid \mathcal{B}_j\big) = \left( \sum_{i=0}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2\, C_{i,j}} \right)^{-1}. \qquad (3.117)
$$
Proof. From (3.105) we see that $F_{i,j+1}$ is an unbiased estimator for $f_j$, conditioned on $\mathcal{B}_j$, with
$$
E[F_{i,j+1} \mid \mathcal{B}_j] = E[F_{i,j+1} \mid C_{i,j}] = f_j, \qquad (3.118)
$$
$$
\operatorname{Var}(F_{i,j+1} \mid \mathcal{B}_j) = \operatorname{Var}(F_{i,j+1} \mid C_{i,j}) = \sigma_j^2\, C_{i,j}^{-1} + a_j^2 f_j^2. \qquad (3.119)
$$
Hence the claim follows from Lemma 3.4.

Remark. For $a_j = 0$ we obtain the classical chain-ladder estimators (2.7). Moreover, observe that for calculating the estimate $\widehat{F}_j$ one needs to know the parameters $f_j$, $a_j$ and $\sigma_j$ (see (3.115)). Of course this contradicts the fact that we need to estimate $f_j$. One way out of this dilemma is to use an estimate for $f_j$ which is not optimal, i.e. has larger variance. For the moment, in Estimation Approach 1, we assume that we can calculate (3.115).

Estimator 3.21 (Chain-ladder estimator, enhanced time series model) The CL estimator for $E[C_{i,j} \mid \mathcal{D}_I]$ in the Enhanced Model 3.16 is given by
$$
\widehat{C}_{i,j}^{\,CL,2} = \widehat{E}[C_{i,j} \mid \mathcal{D}_I] = C_{i,I-i} \prod_{l=I-i}^{j-1} \widehat{F}_l \qquad (3.120)
$$
for $i+j > I$.

We obtain the following lemma for the estimators in the enhanced time series model:

Lemma 3.22 Under Model Assumptions 3.16 we have:
(a) $\widehat{F}_j$ is, given $\mathcal{B}_j$, an unbiased estimator for $f_j$, i.e. $E[\widehat{F}_j \mid \mathcal{B}_j] = f_j$,
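The weighted estimator (3.115) and its conditional variance (3.117) can be sketched directly; the figures below are invented for illustration. Note that the weights take the (in practice unknown) parameters $f_j$, $\sigma_j^2$, $a_j^2$ as inputs, which is exactly the circularity the remark above discusses.

```python
def credibility_weighted_factor(C_j, C_j1, f_j, sigma2_j, a2_j):
    """Minimum-variance linear estimator (3.115) of f_j and its conditional
    variance (3.117). C_j and C_j1 are the observed columns C_{i,j} and
    C_{i,j+1}; f_j, sigma2_j, a2_j are assumed known (illustrative)."""
    # weight of accident year i: C_{i,j} / (sigma_j^2 + a_j^2 f_j^2 C_{i,j})
    w = [c / (sigma2_j + a2_j * f_j ** 2 * c) for c in C_j]
    F_hat = sum(wi * c1 / c for wi, c, c1 in zip(w, C_j, C_j1)) / sum(w)
    var = 1.0 / sum(w)  # (3.117)
    return F_hat, var
```

For $a_j^2 = 0$ the weights reduce to $C_{i,j}/\sigma_j^2$ and the estimator collapses to the classical volume-weighted chain-ladder factor, as the remark above states.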
(b) $\widehat{F}_j$ is (unconditionally) unbiased for $f_j$, i.e. $E[\widehat{F}_j] = f_j$,
(c) $\widehat{F}_0, \dots, \widehat{F}_{J-1}$ are uncorrelated, i.e. $E[\widehat{F}_0 \cdots \widehat{F}_{J-1}] = \prod_{j=0}^{J-1} E[\widehat{F}_j]$,
(d) $\widehat{C}_{i,J}^{\,CL,2}$ is, given $C_{i,I-i}$, an unbiased estimator for $E[C_{i,J} \mid \mathcal{D}_I]$, i.e. $E\big[\widehat{C}_{i,J}^{\,CL,2} \,\big|\, C_{i,I-i}\big] = E[C_{i,J} \mid \mathcal{D}_I]$, and
(e) $\widehat{C}_{i,J}^{\,CL,2}$ is (unconditionally) unbiased for $E[C_{i,J}]$, i.e. $E\big[\widehat{C}_{i,J}^{\,CL,2}\big] = E[C_{i,J}]$.

Proof. See proof of Lemma 2.5.

Single accident years

In the sequel of this subsection we assume that the parameters appearing in (3.115) are known, so that $\widehat{F}_j$ can be calculated. Our goal is to estimate the conditional mean square error of prediction (conditional MSEP), as in the classical chain-ladder model,
$$
\operatorname{msep}_{C_{i,J} \mid \mathcal{D}_I}\big(\widehat{C}_{i,J}^{\,CL,2}\big) = E\Big[ \big(C_{i,J} - \widehat{C}_{i,J}^{\,CL,2}\big)^2 \,\Big|\, \mathcal{D}_I \Big]
= \operatorname{Var}(C_{i,J} \mid \mathcal{D}_I) + \Big( E[C_{i,J} \mid \mathcal{D}_I] - \widehat{C}_{i,J}^{\,CL,2} \Big)^2. \qquad (3.121)
$$
The first term is exactly the conditional process variance and the conditional prediction error obtained in Lemma 3.19; the second term is the conditional estimation error. It is given by
$$
\Big( E[C_{i,J} \mid \mathcal{D}_I] - \widehat{C}_{i,J}^{\,CL,2} \Big)^2 = C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \widehat{F}_j \right)^2. \qquad (3.122)
$$
Observe that
$$
\widehat{F}_j = f_j + \left( \sum_{i=0}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2\, C_{i,j}} \right)^{-1} \sum_{i=0}^{I-j-1} \left( \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2\, C_{i,j}} \right)^{1/2} \varepsilon_{i,j+1}. \qquad (3.123)
$$
Hence $\widehat{F}_j$ consists of the constant $f_j$ and a stochastic error term (see also Lemma 3.20). In order to determine the conditional estimation error we now proceed as
in Section 3.2.3 for the Time Series Model 3.9. This means that we use Approach 3 (conditional resampling in $\mathcal{D}_{I,i}^{O}$, page 50) to estimate the fluctuations of the estimators $\widehat{F}_0, \dots, \widehat{F}_{J-1}$ around the chain-ladder factors $f_0, \dots, f_{J-1}$, i.e. to get an estimate for (3.122). We therefore conditionally resample the observations $\widehat{F}_0, \dots, \widehat{F}_{J-1}$, given $\mathcal{D}_I$, and use the resampled estimates to calculate an estimate for the conditional estimation error. For these resampled observations we again use the notation $P_{\mathcal{D}_I}$ for the conditional measure (for a more detailed discussion we refer to Section 3.2.3). Moreover, under $P_{\mathcal{D}_I}$, the random variables $\widehat{F}_j$ are independent with
$$
E_{\mathcal{D}_I}\big[\widehat{F}_j\big] = f_j \qquad \text{and} \qquad E_{\mathcal{D}_I}\big[\widehat{F}_j^2\big] = f_j^2 + \left( \sum_{i=0}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2\, C_{i,j}} \right)^{-1} \qquad (3.124)
$$
(cf. Section 3.2.3, Approach 3). This means that the conditional estimation error (3.122) is estimated by
$$
\begin{aligned}
E_{\mathcal{D}_I}\left[ C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \widehat{F}_j \right)^2 \right]
&= C_{i,I-i}^2\, \operatorname{Var}_{P_{\mathcal{D}_I}}\left( \prod_{j=I-i}^{J-1} \widehat{F}_j \right)
= C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} E_{\mathcal{D}_I}\big[\widehat{F}_j^2\big] - \prod_{j=I-i}^{J-1} f_j^2 \right) \\
&= C_{i,I-i}^2 \prod_{j=I-i}^{J-1} f_j^2 \left[ \prod_{j=I-i}^{J-1} \left( \left( \sum_{k=0}^{I-j-1} \frac{C_{k,j}}{\sigma_j^2/f_j^2 + a_j^2\, C_{k,j}} \right)^{-1} + 1 \right) - 1 \right]. \qquad (3.125)
\end{aligned}
$$
Finally, if we apply a linear approximation to (3.125), we obtain
$$
E_{\mathcal{D}_I}\left[ C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \widehat{F}_j \right)^2 \right]
\approx C_{i,I-i}^2 \prod_{j=I-i}^{J-1} f_j^2\; \sum_{j=I-i}^{J-1} \left( \sum_{k=0}^{I-j-1} \frac{C_{k,j}}{\sigma_j^2/f_j^2 + a_j^2\, C_{k,j}} \right)^{-1}. \qquad (3.126)
$$
For $a_j = 0$ this is exactly the conditional estimation error in the Mack Model 3.2. For an increasing number of observations (accident years $i$) this error term goes to zero. If we use the linear approximation (3.126) and replace the parameters in (3.111) and (3.126) by their estimators (cf. Section 3.3.7), we obtain the following estimator for the conditional mean square error of prediction (for the time being we assume that $\sigma_j^2$ and $a_j^2$ are known).
Estimator 3.23 (MSEP for single accident years) Under Model Assumptions 3.16 we have the following estimator for the conditional mean square error of prediction for the ultimate claim of a single accident year $i \in \{I-J+1, \dots, I\}$:
$$
\widehat{\operatorname{msep}}_{C_{i,J} \mid \mathcal{D}_I}\big(\widehat{C}_{i,J}^{\,CL,2}\big)
= \Big(\widehat{C}_{i,J}^{\,CL,2}\Big)^2 \left[ \sum_{j=I-i}^{J-1} \left( \frac{\widehat{\sigma}_j^2 / \widehat{F}_j^2}{\widehat{C}_{i,j}^{\,CL,2}} + \widehat{a}_j^2 \right) \prod_{n=j+1}^{J-1} \big(1+\widehat{a}_n^2\big)
+ \sum_{j=I-i}^{J-1} \left( \sum_{k=0}^{I-j-1} \frac{C_{k,j}}{\widehat{\sigma}_j^2 / \widehat{F}_j^2 + \widehat{a}_j^2\, C_{k,j}} \right)^{-1} \right]. \qquad (3.127)
$$

Aggregated accident years

Consider two different accident years $k < i$. From our assumptions we know that the ultimate claims $C_{k,J}$ and $C_{i,J}$ are independent. Nevertheless, we have to be careful if we aggregate $\widehat{C}_{k,J}^{\,CL,2}$ and $\widehat{C}_{i,J}^{\,CL,2}$: the estimators are no longer independent, since they use the same observations for estimating the chain-ladder factors $f_j$. We have
$$
E\Big[ \big( \widehat{C}_{k,J}^{\,CL,2} + \widehat{C}_{i,J}^{\,CL,2} - (C_{k,J} + C_{i,J}) \big)^2 \,\Big|\, \mathcal{D}_I \Big]
= \operatorname{Var}(C_{k,J} + C_{i,J} \mid \mathcal{D}_I) + \Big( \widehat{C}_{k,J}^{\,CL,2} + \widehat{C}_{i,J}^{\,CL,2} - E[C_{k,J} + C_{i,J} \mid \mathcal{D}_I] \Big)^2. \qquad (3.128)
$$
Using the independence of the different accident years, we obtain for the first term
$$
\operatorname{Var}(C_{k,J} + C_{i,J} \mid \mathcal{D}_I) = \operatorname{Var}(C_{k,J} \mid \mathcal{D}_I) + \operatorname{Var}(C_{i,J} \mid \mathcal{D}_I), \qquad (3.129)
$$
which is exactly the conditional process and prediction error from Lemma 3.19. For the second term in (3.128) we obtain
$$
\begin{aligned}
\Big( \widehat{C}_{k,J}^{\,CL,2} + \widehat{C}_{i,J}^{\,CL,2} - E[C_{k,J} + C_{i,J} \mid \mathcal{D}_I] \Big)^2
&= \Big( \widehat{C}_{k,J}^{\,CL,2} - E[C_{k,J} \mid \mathcal{D}_I] \Big)^2 + \Big( \widehat{C}_{i,J}^{\,CL,2} - E[C_{i,J} \mid \mathcal{D}_I] \Big)^2 \\
&\quad + 2\, \Big( \widehat{C}_{k,J}^{\,CL,2} - E[C_{k,J} \mid \mathcal{D}_I] \Big) \Big( \widehat{C}_{i,J}^{\,CL,2} - E[C_{i,J} \mid \mathcal{D}_I] \Big). \qquad (3.130)
\end{aligned}
$$
Hence we have the following decomposition for the conditional mean square error of prediction of the sum of two accident years:
$$
\begin{aligned}
E\Big[ \big( \widehat{C}_{k,J}^{\,CL,2} + \widehat{C}_{i,J}^{\,CL,2} - (C_{k,J} + C_{i,J}) \big)^2 \,\Big|\, \mathcal{D}_I \Big]
&= E\Big[ \big( \widehat{C}_{k,J}^{\,CL,2} - C_{k,J} \big)^2 \,\Big|\, \mathcal{D}_I \Big] + E\Big[ \big( \widehat{C}_{i,J}^{\,CL,2} - C_{i,J} \big)^2 \,\Big|\, \mathcal{D}_I \Big] \\
&\quad + 2\, \Big( \widehat{C}_{k,J}^{\,CL,2} - E[C_{k,J} \mid \mathcal{D}_I] \Big) \Big( \widehat{C}_{i,J}^{\,CL,2} - E[C_{i,J} \mid \mathcal{D}_I] \Big). \qquad (3.131)
\end{aligned}
$$
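An estimator of type (3.127) can be sketched as follows (invented illustrative inputs; the parameters are taken as known, as assumed above). The first bracket is the process/prediction part from Lemma 3.19, the second the linearized estimation error (3.126).

```python
def msep_single_year(C_obs, F, sigma2, a2, C_columns):
    """Sketch of Estimator 3.23: conditional MSEP for a single accident
    year. F, sigma2, a2 cover the remaining development periods
    j = I-i, ..., J-1; C_columns[j] lists the observed C_{k,j} entering the
    estimation of F_j (all figures illustrative)."""
    J = len(F)
    mean = [float(C_obs)]                       # E[C_{i,j} | D_I] path
    for Fj in F:
        mean.append(mean[-1] * Fj)
    ultimate = mean[-1]
    proc_pred, estim = 0.0, 0.0
    for j in range(J):
        tail = 1.0
        for n in range(j + 1, J):
            tail *= 1.0 + a2[n]
        # process + prediction term, cf. (3.111)
        proc_pred += (sigma2[j] / F[j] ** 2 / mean[j] + a2[j]) * tail
        # linearized estimation error term, cf. (3.126)
        estim += 1.0 / sum(c / (sigma2[j] / F[j] ** 2 + a2[j] * c)
                           for c in C_columns[j])
    return ultimate ** 2 * (proc_pred + estim)
```

With one remaining period, $C_{i,I-i}=100$, $F_0=2$, $\sigma_0^2=1$, $a_0^2=0$ and observed column $(100,200)$ this gives $200^2 (0.0025 + 1/1200)$, i.e. the familiar Mack-type split into process and estimation error.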
In addition to the conditional MSEP of single accident years (see Estimator 3.23), we need to average the covariance term over the possible values of $\widehat{F}_j$, similarly to (3.122):
$$
\Big( \widehat{C}_{k,J}^{\,CL,2} - E[C_{k,J} \mid \mathcal{D}_I] \Big) \Big( \widehat{C}_{i,J}^{\,CL,2} - E[C_{i,J} \mid \mathcal{D}_I] \Big)
= C_{k,I-k}\, C_{i,I-i} \left( \prod_{l=I-k}^{J-1} \widehat{F}_l - \prod_{l=I-k}^{J-1} f_l \right) \left( \prod_{l=I-i}^{J-1} \widehat{F}_l - \prod_{l=I-i}^{J-1} f_l \right). \qquad (3.132)
$$
As in (3.125), using Approach 3, we obtain for the covariance term
$$
\begin{aligned}
E_{\mathcal{D}_I}&\left[ C_{k,I-k}\, C_{i,I-i} \left( \prod_{l=I-k}^{J-1} \widehat{F}_l - \prod_{l=I-k}^{J-1} f_l \right) \left( \prod_{l=I-i}^{J-1} \widehat{F}_l - \prod_{l=I-i}^{J-1} f_l \right) \right] \\
&= C_{k,I-k}\, C_{i,I-i} \prod_{j=I-i}^{I-k-1} f_j \left( \prod_{j=I-k}^{J-1} E_{\mathcal{D}_I}\big[\widehat{F}_j^2\big] - \prod_{j=I-k}^{J-1} f_j^2 \right) \\
&= C_{k,I-k}\, C_{i,I-i} \prod_{j=I-i}^{I-k-1} f_j \prod_{j=I-k}^{J-1} f_j^2 \left[ \prod_{j=I-k}^{J-1} \left( \left( \sum_{m=0}^{I-j-1} \frac{C_{m,j}}{\sigma_j^2/f_j^2 + a_j^2\, C_{m,j}} \right)^{-1} + 1 \right) - 1 \right]. \qquad (3.133)
\end{aligned}
$$
If we apply the same linear approximation as in (3.126), the estimation of the covariance term is straightforward from (3.117).

Estimator 3.24 (MSEP for aggregated accident years) Under Model Assumptions 3.16 we have the following estimator for the conditional mean square error of prediction of the ultimate claim for aggregated accident years:
$$
\begin{aligned}
\widehat{\operatorname{msep}}_{\sum_i C_{i,J} \mid \mathcal{D}_I}\left( \sum_{i=I-J+1}^{I} \widehat{C}_{i,J}^{\,CL,2} \right)
&= \sum_{i=I-J+1}^{I} \widehat{\operatorname{msep}}_{C_{i,J} \mid \mathcal{D}_I}\big(\widehat{C}_{i,J}^{\,CL,2}\big) \\
&\quad + 2 \sum_{I-J+1 \le k < i \le I} \widehat{C}_{k,J}^{\,CL,2}\, \widehat{C}_{i,J}^{\,CL,2} \sum_{j=I-k}^{J-1} \left( \sum_{m=0}^{I-j-1} \frac{C_{m,j}}{\widehat{\sigma}_j^2 / \widehat{F}_j^2 + \widehat{a}_j^2\, C_{m,j}} \right)^{-1}. \qquad (3.134)
\end{aligned}
$$

Estimation Approach 2

In the derivation of the estimate $\widehat{F}_j$, see (3.115), we have seen that we face the problem that the parameters already need to be known in order to estimate them.
We could also use a different unbiased estimator. We define
$$
\widehat{F}_j^{(0)} = \frac{\sum_{i=0}^{I-j-1} C_{i,j}\, F_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}} = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}}. \qquad (3.135)
$$
$\widehat{F}_j^{(0)} = \widehat{f}_j$ is the classical chain-ladder estimator in the Mack Model 3.2. It is optimal under the Mack variance condition, but it is not optimal under our variance condition (3.90). Observe that
$$
\begin{aligned}
\operatorname{Var}\big(\widehat{F}_j^{(0)} \mid \mathcal{B}_j\big)
&= \frac{1}{\Big( \sum_{i=0}^{I-j-1} C_{i,j} \Big)^2} \sum_{i=0}^{I-j-1} \operatorname{Var}(C_{i,j+1} \mid C_{i,j})
= \frac{\sum_{i=0}^{I-j-1} \big( \sigma_j^2\, C_{i,j} + a_j^2 f_j^2\, C_{i,j}^2 \big)}{\Big( \sum_{i=0}^{I-j-1} C_{i,j} \Big)^2} \\
&= \frac{\sigma_j^2}{\sum_{i=0}^{I-j-1} C_{i,j}} + a_j^2 f_j^2\; \frac{\sum_{i=0}^{I-j-1} C_{i,j}^2}{\Big( \sum_{i=0}^{I-j-1} C_{i,j} \Big)^2}. \qquad (3.136)
\end{aligned}
$$
This immediately gives the following corollary:

Corollary 3.25 Under Model Assumptions 3.16 we have for $i > I-J$ that
$$
C_{i,I-i} \prod_{j=I-i}^{J-1} \widehat{F}_j^{(0)} \qquad (3.137)
$$
defines a conditionally (given $C_{i,I-i}$) unbiased estimator for $E[C_{i,J} \mid \mathcal{D}_I]$.

The process variance and the prediction error are given by Lemma 3.19. For the estimation error of a single accident year we obtain, with Approach 3, the estimate
$$
C_{i,I-i}^2 \left[ \prod_{j=I-i}^{J-1} \left( \frac{\sigma_j^2}{\sum_{k=0}^{I-j-1} C_{k,j}} + a_j^2 f_j^2\; \frac{\sum_{k=0}^{I-j-1} C_{k,j}^2}{\big( \sum_{k=0}^{I-j-1} C_{k,j} \big)^2} + f_j^2 \right) - \prod_{j=I-i}^{J-1} f_j^2 \right]. \qquad (3.138)
$$
This expression is of course larger than the one obtained in (3.125).
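Estimation Approach 2 avoids the circularity: the classical chain-ladder factor (3.135) needs no parameters, and its (larger) conditional variance (3.136) is then evaluated with the given $f_j$, $\sigma_j^2$, $a_j^2$. A small sketch with invented figures:

```python
def chain_ladder_factor_variance(C_j, C_j1, f_j, sigma2_j, a2_j):
    """Approach 2: the classical chain-ladder factor (3.135) and its
    conditional variance (3.136) under the enhanced variance condition.
    C_j, C_j1 are observed columns; f_j, sigma2_j, a2_j illustrative."""
    S = sum(C_j)
    F0 = sum(C_j1) / S                                   # (3.135)
    var = (sigma2_j / S                                  # Mack-type part
           + a2_j * f_j ** 2 * sum(c ** 2 for c in C_j) / S ** 2)  # (3.136)
    return F0, var
```

For $a_j^2 = 0$ the variance reduces to the familiar $\sigma_j^2 / \sum_i C_{i,j}$ of the Mack model.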
3.3.7 Parameter estimation

We need to estimate three families of parameters: $f_j$, $\sigma_j$ and $a_j$. For Estimation Approach 1 the estimation of $f_j$ is given by (3.115), which yields only an implicit expression for the estimate of $f_j$, since the chain-ladder factors also appear in the weights. Therefore we propose an iterative estimation in Approach 1 (in Estimation Approach 2, on the other hand, there is no such difficulty).

Estimation of $a_j$. The sequence $a_j$ can usually not be estimated from the data, unless we have a very large portfolio $C_{i,j}$ such that the conditional process error disappears. Hence $a_j$ can only be obtained if we have data from the whole insurance market. Considerations of this kind have been made for the determination of the parameters for prediction errors in the Swiss Solvency Test (see e.g. Tables 6.4.4 and 6.4.7 in [73]). Unfortunately, these tables only give an overall estimate for the conditional prediction error, not a sequence $a_j$ (e.g. the variational coefficient of the overall error, similar to (3.101), for motor third party liability claims reserves is 3.5%). We reconstruct $a_j$ with the help of (3.113). Define for $j = 0, \dots, J-1$
$$
V_j^2 = \sum_{m=j-1}^{J-1} a_m^2 \prod_{n=m+1}^{J-1} \big(1+a_n^2\big). \qquad (3.139)
$$
Hence $a_{j-1}$ can be determined recursively from $V_j^2 - V_{j+1}^2$:
$$
a_{j-1}^2 = \big( V_j^2 - V_{j+1}^2 \big) \prod_{n=j}^{J-1} \big(1+a_n^2\big)^{-1}. \qquad (3.140)
$$
Henceforth, we can estimate $a_{j-1}$ as soon as we have an estimate for $V_j$. $V_j$ corresponds to
$$
V_j = \lim_{C_{i,j-1} \to \infty} \operatorname{Vco}(C_{i,J} \mid C_{i,j-1}) \qquad (3.141)
$$
(cf. (3.113)). Hence we need to estimate the conditional prediction error of $C_{i,J}$, given the observation $C_{i,j-1}$. Since we do not really have a good idea/guess about the conditional variational coefficient in (3.141), we express the conditional variational coefficient in terms of the reserves:
$$
\begin{aligned}
\operatorname{Vco}(C_{i,J} \mid C_{i,j-1})
&= \frac{\operatorname{Var}(C_{i,J} \mid C_{i,j-1})^{1/2}}{E[C_{i,J} \mid C_{i,j-1}]}
= \frac{\operatorname{Var}(C_{i,J} - C_{i,j-1} \mid C_{i,j-1})^{1/2}}{E[C_{i,J} - C_{i,j-1} \mid C_{i,j-1}]} \cdot \frac{E[C_{i,J} - C_{i,j-1} \mid C_{i,j-1}]}{E[C_{i,J} \mid C_{i,j-1}]} \\
&= \operatorname{Vco}(C_{i,J} - C_{i,j-1} \mid C_{i,j-1}) \cdot \frac{\prod_{l=j-1}^{J-1} f_l - 1}{\prod_{l=j-1}^{J-1} f_l}. \qquad (3.142)
\end{aligned}
$$
In our examples we assume that the conditional variational coefficient of the reserves $C_{i,J} - C_{i,j-1}$ (i.e. the conditional prediction error) is constant, equal to $r$, and we set
$$
\widehat{V}_j = r\; \frac{\prod_{l=j-1}^{J-1} \widehat{F}_l^{(0)} - 1}{\prod_{l=j-1}^{J-1} \widehat{F}_l^{(0)}}. \qquad (3.143)
$$
This immediately gives an estimate $\widehat{a}_j$ for the conditional prediction error parameter $a_j$.

Estimation of $\sigma_j$. $\sigma_j^2$ is estimated iteratively from the data. A tedious calculation on conditional expectations gives
$$
E\left[ \frac{1}{I-j-1} \sum_{i=0}^{I-j-1} C_{i,j} \Big( F_{i,j+1} - \widehat{F}_j^{(0)} \Big)^2 \,\Big|\, \mathcal{B}_j \right]
= \sigma_j^2 + \frac{a_j^2 f_j^2}{I-j-1} \left( \sum_{i=0}^{I-j-1} C_{i,j} - \frac{\sum_{i=0}^{I-j-1} C_{i,j}^2}{\sum_{i=0}^{I-j-1} C_{i,j}} \right). \qquad (3.144)
$$
Hence we get the following iteration for the estimation of $\sigma_j^2$: for $k \ge 1$
$$
\big(\widehat{\sigma}_j^2\big)^{(k)} = \frac{1}{I-j-1} \sum_{i=0}^{I-j-1} C_{i,j} \Big( F_{i,j+1} - \widehat{F}_j^{(0)} \Big)^2
- \frac{\widehat{a}_j^2 \big(\widehat{F}_j^{(k-1)}\big)^2}{I-j-1} \left( \sum_{i=0}^{I-j-1} C_{i,j} - \frac{\sum_{i=0}^{I-j-1} C_{i,j}^2}{\sum_{i=0}^{I-j-1} C_{i,j}} \right). \qquad (3.145)
$$
If $\big(\widehat{\sigma}_j^2\big)^{(k)}$ becomes negative, it is set to 0, i.e. we only have a conditional prediction error and the conditional process error is equal to zero (the volume is sufficiently large that the conditional process error disappears).
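The backward recursion of type (3.139)-(3.140) can be sketched as follows. Note the index convention is shifted here for readability (an assumption of this sketch): we store $V_j^2 = \sum_{m=j}^{J-1} a_m^2 \prod_{n=m+1}^{J-1}(1+a_n^2)$, so that $a_j^2$ is recovered from $V_j^2 - V_{j+1}^2$.

```python
def a2_from_V2(V2):
    """Recover the sequence a_j^2 from the prediction-error levels V_j^2,
    using the shifted convention V2[j] = sum_{m=j}^{J-1} a_m^2
    prod_{n=m+1}^{J-1}(1+a_n^2); cf. (3.139)-(3.140). Illustrative sketch."""
    J = len(V2)
    a2 = [0.0] * J
    a2[J - 1] = V2[J - 1]            # last term has an empty product
    for j in range(J - 2, -1, -1):   # work backwards through j
        tail = 1.0
        for n in range(j + 1, J):
            tail *= 1.0 + a2[n]
        a2[j] = (V2[j] - V2[j + 1]) / tail
    return a2
```

A round-trip check: for $a^2 = (0.04, 0.01)$ one has $V_0^2 = 0.04 \cdot 1.01 + 0.01 = 0.0504$ and $V_1^2 = 0.01$, and the recursion recovers the original values.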
Estimation of $\widehat{F}_j$. The estimators $\widehat{F}_j$ are then determined iteratively via (3.115): for $k \ge 1$
$$
\widehat{F}_j^{(k)} = \frac{\displaystyle\sum_{i=0}^{I-j-1} \frac{C_{i,j+1}}{\widehat{\sigma}_j^2 + \widehat{a}_j^2 \big(\widehat{F}_j^{(k-1)}\big)^2 C_{i,j}}}{\displaystyle\sum_{i=0}^{I-j-1} \frac{C_{i,j}}{\widehat{\sigma}_j^2 + \widehat{a}_j^2 \big(\widehat{F}_j^{(k-1)}\big)^2 C_{i,j}}}. \qquad (3.146)
$$

Remarks 3.26
- In all examples we have looked at we have observed very fast convergence of $\big(\widehat{\sigma}_j^2\big)^{(k)}$ and $\widehat{F}_j^{(k)}$, in the sense that we have not observed any changes in the ultimates after three iterations for the $\widehat{F}_j$.
- To determine $\sigma_j^2$ we could also choose a different unbiased estimator via
$$
1 = \frac{1}{I-j-1} \sum_{i=0}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2\, C_{i,j}}\; E\Big[ \big( F_{i,j+1} - \widehat{F}_j \big)^2 \,\Big|\, \mathcal{B}_j \Big]. \qquad (3.147)
$$
The difficulty with (3.147) is that it again leads to an implicit expression for $\sigma_j^2$.
- The formula for the MSEP, Estimator 3.24, was derived under the assumption that the underlying model parameters $f_j$, $\sigma_j$ and $a_j$ are known. If we replace these parameters by their estimates, as described via the iteration in this section, we obtain additional sources of estimation error! However, since the calculations become too tedious or even impossible, we omit further derivations of the MSEP and take Estimator 3.24 as a first approximation.

We close this section with an example.

Example 3.27 (MSEP in the enhanced chain-ladder model)
We choose two portfolios, Portfolio A and Portfolio B. Both are of a similar type (i.e. they cover the same line of business); moreover, Portfolio B is contained in Portfolio A.

Portfolio A
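A fixed-point iteration of type (3.146) can be sketched in a few lines (invented figures; the book reports convergence after about three iterations). Starting from the Mack estimator $\widehat{F}_j^{(0)}$, each step re-inserts the current estimate into the credibility weights.

```python
def iterate_factor(C_j, C_j1, sigma2_j, a2_j, n_iter=10):
    """Fixed-point iteration of type (3.146) for F_j: start from the
    classical chain-ladder factor and re-insert the estimate into the
    weights until it stabilizes. Inputs illustrative."""
    F = sum(C_j1) / sum(C_j)  # F_j^(0), the Mack estimator (3.135)
    for _ in range(n_iter):
        w = [c / (sigma2_j + a2_j * F ** 2 * c) for c in C_j]
        F = sum(wi * c1 / c for wi, c, c1 in zip(w, C_j, C_j1)) / sum(w)
    return F
```

For $\widehat{a}_j^2 = 0$ the weights are proportional to $C_{i,j}$ in every step, so the iteration leaves the Mack factor unchanged, consistent with the earlier remark.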
 i\j |      0      1      2      3      4      5      6      7      8      9     10
   0 | 111551 154622 156159 156759 157583 158666 160448 160552 160568 160617 160621
   1 | 116163 171449 175502 176533 176989 177269 178488 178556 178620 178621 178644
   2 | 127615 189682 193823 196324 198632 200299 202740 203848 204168 205560 205562
   3 | 147659 217342 220123 222731 222916 223320 223447 223566 227103 227127 227276
   4 | 157495 212770 219680 220978 221276 223724 223743 223765 223669 223601 223558
   5 | 154969 213352 219201 220469 222751 223958 224005 224030 223975 224048 224036
   6 | 152833 209969 214692 220040 223467 223754 223752 223593 223585 223688 223697
   7 | 144223 207644 212443 214108 214661 214610 214564 214484 214459 214459
   8 | 145612 209604 214161 215982 217962 220783 221078 221614 221616
   9 | 196695 282621 288676 290036 292206 294531 294671 294705
  10 | 181381 260308 266497 269130 269404 269691 269720
  11 | 177168 263130 268848 270787 271624 271688
  12 | 156505 230607 237102 244847 245940
  13 | 157839 239723 261213 264755
  14 | 159429 233309 239800
  15 | 169990 246019
  16 | 173377

Table 3.7: Observed cumulative payments $C_{i,j}$ in Portfolio A

 j                  |       0      1      2      3      4      5      6      7      8      9
 $\widehat{f}_j$      |  1.4416 1.0278 1.0112 1.0057 1.0048 1.0025 1.0008 1.0020 1.0010 1.0001
 $\widehat{\sigma}_j$ | 18.3478 8.7551 3.9082 2.2050 2.1491 2.0887 0.8302 2.4751 1.0757 0.1280

Table 3.8: Chain-ladder parameters in Mack's Model 3.2 for Portfolio A
This leads in Mack's Model 3.2 to the following reserves:

   i  | CL reserves |  msep$^{1/2}$    | process error$^{1/2}$ | estimation error$^{1/2}$
   7  |          20 |     64 (322.0%)  |     59 (300.4%)       |     23 (115.8%)
   8  |         231 |    543 (235.2%)  |    510 (220.8%)       |    187  (80.9%)
   9  |         898 |  1 582 (176.1%)  |  1 468 (163.4%)       |    589  (65.5%)
  10  |       1 044 |  1 573 (150.7%)  |  1 470 (140.9%)       |    560  (53.7%)
  11  |       1 731 |  1 957 (113.1%)  |  1 838 (106.2%)       |    674  (38.9%)
  12  |       2 747 |  2 169  (79.0%)  |  2 055  (74.8%)       |    693  (25.2%)
  13  |       4 487 |  2 563  (57.1%)  |  2 426  (54.1%)       |    826  (18.4%)
  14  |       6 803 |  3 169  (46.6%)  |  3 030  (44.5%)       |    928  (13.6%)
  15  |      14 025 |  5 663  (40.4%)  |  5 443  (38.8%)       |  1 564  (11.2%)
  16  |      90 809 | 10 121  (11.1%)  |  9 762  (10.8%)       |  2 669   (2.9%)
 Total|     122 795 | 13 941  (11.4%)  | 12 336  (10.0%)       |  6 495   (5.3%)

Table 3.9: Reserves and conditional MSEP in Mack's Model 3.2 for Portfolio A. Columns: process error$^{1/2}$ = $\widehat{\operatorname{Var}}(C_{i,J} \mid \mathcal{D}_I)^{1/2}$, estimation error$^{1/2}$ = $\widehat{\operatorname{Var}}\big(\widehat{C}_{i,J}^{\,CL} \mid \mathcal{D}_I\big)^{1/2}$; percentages are relative to the CL reserves.

We now compare these results to the estimates in Model 3.16: we set $r = 5\%$ and obtain the parameter estimates given below.

Remark. In practice $a_j$ can only be determined with the help of external know-how and market data. Therefore, e.g. for solvency purposes, $a_j$ should be determined a priori by the regulator. It answers the question "how good can an actuarial estimate at best be?".
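The CL reserves in the first column of Table 3.9 are the classical chain-ladder projections from the run-off triangle. The following minimal sketch computes them for a tiny invented triangle (not the book's Portfolio A data):

```python
def chain_ladder_reserves(triangle):
    """Classical chain-ladder on a cumulative run-off triangle: row i of
    `triangle` holds the observed C_{i,0},...,C_{i,I-i}. Returns the
    reserve per accident year (ultimate minus last observed diagonal)."""
    J = max(len(r) for r in triangle) - 1
    f = []
    for j in range(J):
        # volume-weighted development factor f_j over fully observed pairs
        num = sum(r[j + 1] for r in triangle if len(r) > j + 1)
        den = sum(r[j] for r in triangle if len(r) > j + 1)
        f.append(num / den)
    reserves = []
    for row in triangle:
        ult = row[-1]
        for j in range(len(row) - 1, J):
            ult *= f[j]                 # project to the ultimate
        reserves.append(ult - row[-1])  # reserve = ultimate - diagonal
    return reserves
```

For the toy triangle `[[100, 200, 220], [110, 220], [120]]` the factors are $f_0 = 2.0$ and $f_1 = 1.1$, giving reserves $0$, $22$ and $144$.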
 j                |       0       1       2       3       4       5       6       7       8       9      10
 $\widehat{V}_j$    | 1.7187% 0.2697% 0.1379% 0.0833% 0.0552% 0.0316% 0.0193% 0.0152% 0.0052% 0.0005% 0.0000%
 $\widehat{a}_j$    | 1.6974% 0.2317% 0.1099% 0.0624% 0.0453% 0.0251% 0.0119% 0.0143% 0.0052% 0.0005% 0.0000%

Table 3.10: Estimated $\widehat{a}_j$ and $\widehat{V}_j$ in Model 3.16

 j                          |        0        1        2        3        4        5        6        7        8        9
 $\widehat{F}_j^{(1)}$        |  1.44152  1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
 $\widehat{F}_j^{(2)}$        |  1.44152  1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
 $\widehat{F}_j^{(3)}$        |  1.44152  1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
 $\widehat{\sigma}_j^{(1)}$   | 15.82901  8.68855  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802
 $\widehat{\sigma}_j^{(2)}$   | 15.82926  8.68856  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802
 $\widehat{\sigma}_j^{(3)}$   | 15.82926  8.68856  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802

Table 3.11: Estimated parameters in Model 3.16 for Portfolio A

Already after three iterations the parameters have converged sufficiently that the reserves are stable.
   i  | CL reserves |  msep$^{1/2}$   |  total error    |  process error  | prediction error | estimation error
   7  |          20 |    64 (321.9%)  |    59 (300.4%)  |    59 (300.4%)  |      1   (5.0%)  |     23 (115.8%)
   8  |         231 |   543 (235.1%)  |   510 (220.8%)  |   510 (220.7%)  |     12   (5.0%)  |    187  (80.9%)
   9  |         898 | 1 581 (176.0%)  | 1 468 (163.4%)  | 1 467 (163.3%)  |     45   (5.0%)  |    589  (65.5%)
  10  |       1 044 | 1 573 (150.7%)  | 1 470 (140.8%)  | 1 469 (140.7%)  |     52   (5.0%)  |    560  (53.7%)
  11  |       1 731 | 1 956 (113.0%)  | 1 836 (106.1%)  | 1 834 (106.0%)  |     87   (5.0%)  |    674  (38.9%)
  12  |       2 747 | 2 165  (78.8%)  | 2 051  (74.7%)  | 2 046  (74.5%)  |    137   (5.0%)  |    693  (25.2%)
  13  |       4 489 | 2 556  (56.9%)  | 2 418  (53.9%)  | 2 408  (53.6%)  |    224   (5.0%)  |    826  (18.4%)
  14  |       6 804 | 3 153  (46.3%)  | 3 013  (44.3%)  | 2 994  (44.0%)  |    340   (5.0%)  |    928  (13.6%)
  15  |      14 024 | 5 627  (40.1%)  | 5 405  (38.5%)  | 5 360  (38.2%)  |    701   (5.0%)  |  1 565  (11.2%)
  16  |      90 796 | 9 244  (10.2%)  | 8 844   (9.7%)  | 7 590   (8.4%)  |  4 540   (5.0%)  |  2 688   (3.0%)
 Total|     122 784 | 13 298 (10.8%)  | 11 598  (9.4%)  | 10 640  (8.7%)  |  4 615   (3.8%)  |  6 504   (5.3%)

Table 3.12: Reserves and conditional MSEP in Model 3.16 for Portfolio A. Columns: total error = $\widehat{\operatorname{Var}}(C_{i,J} \mid \mathcal{D}_I)^{1/2}$ (square root of process plus prediction variance), estimation error = $\widehat{\operatorname{Var}}\big(\widehat{C}_{i,J}^{\,CL,2} \mid \mathcal{D}_I\big)^{1/2}$; percentages are relative to the CL reserves.

Comment. The resulting reserves are almost the same in the Mack Model 3.2 and in Model 3.16. We now obtain both a conditional process error and a conditional prediction error term. The sum of these two terms is about the same size as the conditional process error in Mack's method; this comes from the fact that we use the same data to estimate the parameters. But the error term in the enhanced chain-ladder model is now bounded from below by the conditional prediction error, whereas the conditional process error in the Mack model converges to zero for increasing volume.

Portfolio B

We now choose a second portfolio, Portfolio B, which includes business similar to our example given in Table 3.7 (Portfolio A). In fact, Portfolio B, given in Table 3.13, is a sub-portfolio of Portfolio A containing exactly the same line of business. Therefore we assume that the conditional prediction errors are the same as in Table 3.12.
 i\j |      0      1      2      3      4      5      6      7      8      9     10
   0 |  53095  73067  74548  75076  75894  76128  77904  78008  78022  78071  78075
   1 |  59183  87679  89303  90033  90058  90303  91454  91472  91482  91483  91494
   2 |  64640  95734  97648  99429 100462 101683 103549 104642 104917 105560 105560
   3 |  72150 105349 106546 106919 106934 107144 107170 107225 107232 107232 107232
   4 |  76272 105630 108406 108677 108838 110140 110110 110111 110155 110155 110110
   5 |  75469 105987 108779 109093 111366 111390 111422 111448 111367 111369 111369
   6 |  78835 108835 111455 116231 117896 118161 118157 117940 117940 117972 117974
   7 |  70780  98753 101347 102624 102629 102587 102545 102500 102474 102474
   8 |  73311 101911 103657 104516 105297 107749 107911 107949 107949
   9 | 102741 144167 147211 147777 149506 149753 149865 149899
  10 |  97797 143742 147683 149575 149710 149857 149890
  11 |  98682 147042 151029 151960 152645 152682
  12 |  86067 126032 129969 131858 131972
  13 |  87013 131721 150062 152883
  14 |  83678 124048 128322
  15 |  90415 129970
  16 |  86382

Table 3.13: Observed cumulative payments $C_{i,j}$ in Portfolio B

 j                          |        0        1        2        3        4        5        6        7        8        9
 $\widehat{F}_j^{(1)}$        |  1.43999  1.03310  1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
 $\widehat{F}_j^{(2)}$        |  1.43999  1.03310  1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
 $\widehat{F}_j^{(3)}$        |  1.43999  1.03310  1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
 $\widehat{\sigma}_j^{(1)}$   | 11.16524 10.86582  3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593
 $\widehat{\sigma}_j^{(2)}$   | 11.16676 10.86582  3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593
 $\widehat{\sigma}_j^{(3)}$   | 11.16676 10.86582  3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593

Table 3.14: Estimated parameters in Model 3.16 for Portfolio B
   i  | CL reserves |  msep$^{1/2}$    |  total error     |  process error   | prediction error | estimation error
   7  |          -4 |    19 (-485.1%)  |    18 (-453.9%)  |    18 (-453.8%)  |     0  (-12.0%)  |     7 (-171.1%)
   8  |          91 |   242  (265.6%)  |   228  (249.8%)  |   228  (249.7%)  |     6    (6.2%)  |    82   (90.5%)
   9  |         166 |   318  (191.6%)  |   293  (176.4%)  |   292  (175.8%)  |    23   (13.7%)  |   124   (74.7%)
  10  |         320 |   557  (174.4%)  |   519  (162.5%)  |   518  (162.2%)  |    29    (9.1%)  |   202   (63.3%)
  11  |         961 | 1 232  (128.3%)  | 1 159  (120.6%)  | 1 158  (120.5%)  |    49    (5.1%)  |   419   (43.7%)
  12  |       1 445 | 1 453  (100.6%)  | 1 381   (95.6%)  | 1 379   (95.4%)  |    74    (5.1%)  |   452   (31.3%)
  13  |       2 650 | 1 835   (69.2%)  | 1 734   (65.4%)  | 1 729   (65.3%)  |   130    (4.9%)  |   599   (22.6%)
  14  |       3 749 | 2 144   (57.2%)  | 2 051   (54.7%)  | 2 043   (54.5%)  |   182    (4.9%)  |   625   (16.7%)
  15  |       8 224 | 4 726   (57.5%)  | 4 545   (55.3%)  | 4 530   (55.1%)  |   373    (4.5%)  | 1 295   (15.7%)
  16  |      45 878 | 5 885   (12.8%)  | 5 652   (12.3%)  | 5 175   (11.3%)  | 2 273    (5.0%)  | 1 639    (3.6%)
 Total|      63 480 | 8 933   (14.1%)  | 7 967   (12.6%)  | 7 623   (12.0%)  | 2 316    (3.6%)  | 4 041    (6.4%)

Table 3.15: Reserves and conditional MSEP in Model 3.16 for Portfolio B (columns as in Table 3.12; percentages are relative to the CL reserves, hence negative for the negative reserve of accident year 7).

Comments. The error terms of Portfolio A and Portfolio B are now directly comparable:
- The conditional prediction errors are the same for both portfolios.
- The conditional process error decreases from Portfolio B to Portfolio A by about a factor of 2 (in terms of the squared variational coefficient), since Portfolio A has about twice the size of Portfolio B.
- The conditional estimation error decreases from Portfolio B to Portfolio A, since in Portfolio A we have more data to estimate the parameters.
- The conditional prediction errors in Portfolio A and Portfolio B differ slightly, since we obtain different development factors $\widehat{F}_j$ and since the relative weights $C_{i,I-i}$ between the accident years $i$ differ in Portfolio A and Portfolio B.
- A more conservative model would be to assume total dependence of the conditional prediction errors between the accident years; then we would not observe any diversification of the conditional prediction error across accident years.
Chapter 4

Bayesian models

4.1 Introduction to credibility claims reserving methods

In the broadest sense, Bayesian methods for claims reserving are methods in which one combines expert knowledge or existing a priori information with observations, resulting in an estimate for the ultimate claim. In the simplest case this a priori knowledge/information is given by a single value, e.g. an a priori estimate for the ultimate claim or for the average loss ratio (see this section and the following one). In a strict sense, however, the a priori knowledge/information in Bayesian methods for claims reserving is given by an a priori distribution of a random quantity such as the ultimate claim or a risk parameter. Bayesian inference is then understood to be the process of combining the a priori distribution of the random quantity with the data given in the upper trapezoid via Bayes' theorem. In this manner it is sometimes possible to obtain an analytic expression for the a posteriori distribution of the ultimate claim that reflects the change in uncertainty due to the observations. The a posteriori expectation of the ultimate claim is then called the Bayesian estimator for the ultimate claim; it minimizes the quadratic loss in the class of all estimators that are square-integrable functions of the observations (see Section 4.2). In cases where we are not able to calculate the a posteriori expectation of the ultimate claim explicitly, we restrict the search for the best estimator to the smaller class of estimators that are linear functions of the observations (see Sections 4.3, 4.4 and 4.5).
4.1.1 Benktander-Hovinen method

This method goes back to Benktander [8] and Hovinen [37], who independently developed a method leading to the same total estimated loss amount. Choose $i > I-J$. Assume we have an a priori estimate $\mu_i$ for $E[C_{i,J}]$ and that the claims development pattern $(\beta_j)_{0 \le j \le J}$ with $E[C_{i,j}] = \mu_i \beta_j$ is known. Since the Bornhuetter-Ferguson method completely ignores the observation $C_{i,I-i}$ on the last observed diagonal, and the chain-ladder method completely ignores the a priori estimate $\mu_i$ at hand, one could consider a credibility mixture of these two methods (see (2.23)-(2.24)): for $c \in [0,1]$ we define the credibility mixture
$$
\widehat{S}_i(c) = c\, \widehat{C}_{i,J}^{\,CL} + (1-c)\, \mu_i \qquad (4.1)
$$
for $I-J+1 \le i \le I$, where $\widehat{C}_{i,J}^{\,CL}$ is the chain-ladder estimate for the ultimate claim. The parameter $c$ should increase as the claims develop, since with increasing time the observation $C_{i,I-i}$ contains more information about $C_{i,J}$. Benktander [8] proposed to choose $c = \beta_{I-i}$. This leads to the following estimator:

Estimator 4.1 (Benktander-Hovinen estimator) The BH estimator is given by
$$
\widehat{C}_{i,J}^{\,BH} = C_{i,I-i} + (1-\beta_{I-i}) \Big[ \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})\, \mu_i \Big] \qquad (4.2)
$$
for $I-J+1 \le i \le I$.

Observe that we could again identify the claims development pattern $(\beta_j)_{0 \le j \le J}$ with the chain-ladder factors $(f_j)_{0 \le j < J}$. This can be done if we use Model Assumptions 2.9 for the Bornhuetter-Ferguson motivation, see also (2.22). Henceforth, in the sequel of this section we identify
$$
\beta_j = \prod_{k=j}^{J-1} f_k^{-1}. \qquad (4.3)
$$
Since the development pattern $\beta_j$ is known, we also have (using (4.3)) known chain-ladder factors, which implies that we set
$$
\widehat{f}_j = f_j \qquad (4.4)
$$
for $0 \le j \le J-1$. Then the BH estimator can be written in the form
$$
\widehat{C}_{i,J}^{\,BH} = \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})\, \widehat{C}_{i,J}^{\,BF} \qquad (4.5)
$$
$$
= C_{i,I-i} + (1-\beta_{I-i})\, \widehat{C}_{i,J}^{\,BF} \qquad (4.6)
$$
(cf. (2.18) and (2.24)).
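The three estimators (chain-ladder, Bornhuetter-Ferguson and their BH mixture (4.5)) can be sketched together; the figures below are invented for illustration, and the chain-ladder ultimate is taken as $C_{i,I-i}/\beta_{I-i}$ via the identification (4.3).

```python
def bh_estimate(C_obs, mu, beta):
    """Benktander-Hovinen estimator (4.5) as a credibility mixture of the
    chain-ladder and Bornhuetter-Ferguson ultimates, with weight
    beta = beta_{I-i} and a priori estimate mu (illustrative figures)."""
    cl = C_obs / beta                   # chain-ladder ultimate, via (4.3)
    bf = C_obs + (1.0 - beta) * mu      # Bornhuetter-Ferguson ultimate
    bh = beta * cl + (1.0 - beta) * bf  # BH mixture (4.5)
    return cl, bf, bh
```

For $C_{i,I-i} = 80$, $\mu_i = 110$, $\beta_{I-i} = 0.8$ this gives ultimates $100$, $102$ and $100.4$; the BH value indeed lies between CL and BF and equals the iterated-BF form (4.2), $80 + 0.2\,(0.8 \cdot 100 + 0.2 \cdot 110) = 100.4$.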
Remarks 4.2
- Equation (4.6) shows that the Benktander-Hovinen estimator can be seen as an iterated Bornhuetter-Ferguson estimator, using the BF estimate as a new a priori estimate.
- The following lemma shows that the weight $\beta_{I-i}$ is not a fixed point of our iteration, since we have to evaluate the credibility mixture at $1-(1-\beta_{I-i})^2$.

Lemma 4.3 We have
$$
\widehat{C}_{i,J}^{\,BH} = \widehat{S}_i\big( 1 - (1-\beta_{I-i})^2 \big) \qquad (4.7)
$$
for $I-J+1 \le i \le I$.

Proof. Using $C_{i,I-i} = \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL}$, it holds that
$$
\begin{aligned}
\widehat{C}_{i,J}^{\,BH} &= C_{i,I-i} + (1-\beta_{I-i}) \Big[ \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})\, \mu_i \Big] \\
&= \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL} + \big( \beta_{I-i} - \beta_{I-i}^2 \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^2\, \mu_i \qquad (4.8) \\
&= \big( 1 - (1-\beta_{I-i})^2 \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^2\, \mu_i
= \widehat{S}_i\big( 1 - (1-\beta_{I-i})^2 \big).
\end{aligned}
$$
This finishes the proof of the lemma.

Example 4.4 (Benktander-Hovinen method)

We revisit the data set given in Examples 2.7 and 2.11.

  i | $C_{i,I-i}$ |   $\mu_i$   | $\beta_{I-i}$ | $\widehat{C}_{i,J}^{\,CL}$ | $\widehat{C}_{i,J}^{\,BH}$ | reserves CL | reserves BH | reserves BF
  0 | 11 148 124 | 11 653 101 |  100.0%  | 11 148 124 | 11 148 124 |           |           |
  1 | 10 648 192 | 11 367 306 |   99.9%  | 10 663 318 | 10 663 319 |    15 126 |    15 127 |    16 124
  2 | 10 635 751 | 10 962 965 |   99.8%  | 10 662 008 | 10 662 010 |    26 257 |    26 259 |    26 998
  3 |  9 724 068 | 10 616 762 |   99.6%  |  9 758 606 |  9 758 617 |    34 538 |    34 549 |    37 575
  4 |  9 786 916 | 11 044 881 |   99.1%  |  9 872 218 |  9 872 305 |    85 302 |    85 389 |    95 434
  5 |  9 935 753 | 11 480 700 |   98.4%  | 10 092 247 | 10 092 581 |   156 494 |   156 828 |   178 024
  6 |  9 282 022 | 11 413 572 |   97.0%  |  9 568 143 |  9 569 793 |   286 121 |   287 771 |   341 305
  7 |  8 256 211 | 11 126 527 |   94.8%  |  8 705 378 |  8 711 824 |   449 167 |   455 612 |   574 089
  8 |  7 648 729 | 10 986 548 |   88.0%  |  8 691 971 |  8 725 026 | 1 043 242 | 1 076 297 | 1 318 646
  9 |  5 675 568 | 11 618 437 |   59.0%  |  9 626 383 |  9 961 926 | 3 950 815 | 4 286 358 | 4 768 384
Total |          |            |          |            |            | 6 047 061 | 6 424 190 | 7 356 580

Table 4.1: Claims reserves from the Benktander-Hovinen method

We see that the Benktander-Hovinen reserves lie between the chain-ladder reserves and the Bornhuetter-Ferguson reserves. They are closer to the chain-ladder reserves because $\beta_{I-i}$ is larger than 50% for all accident years $i \in \{0, \dots, I\}$.
The next theorem is due to Mack [51]. It says that if we iterate the BF method further, we arrive at the chain-ladder reserve:

Theorem 4.5 (Mack [51]) Choose $\widehat{C}^{(0)} = \mu_i$ and define for $m \ge 0$
$$
\widehat{C}^{(m+1)} = C_{i,I-i} + (1-\beta_{I-i})\, \widehat{C}^{(m)}. \qquad (4.9)
$$
If $\beta_{I-i} > 0$ then
$$
\lim_{m \to \infty} \widehat{C}^{(m)} = \widehat{C}_{i,J}^{\,CL}. \qquad (4.10)
$$
Proof. For $m \ge 1$ we claim that
$$
\widehat{C}^{(m)} = \big( 1 - (1-\beta_{I-i})^m \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^m\, \mu_i. \qquad (4.11)
$$
The claim is true for $m=1$ (BF estimator) and for $m=2$ (BH estimator, see Lemma 4.3). Hence we prove the claim by induction. Induction step $m \to m+1$:
$$
\begin{aligned}
\widehat{C}^{(m+1)} &= C_{i,I-i} + (1-\beta_{I-i})\, \widehat{C}^{(m)} \\
&= C_{i,I-i} + (1-\beta_{I-i}) \Big[ \big( 1 - (1-\beta_{I-i})^m \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^m\, \mu_i \Big] \qquad (4.12) \\
&= \beta_{I-i}\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i}) \big( 1 - (1-\beta_{I-i})^m \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^{m+1}\, \mu_i \\
&= \big( 1 - (1-\beta_{I-i})^{m+1} \big)\, \widehat{C}_{i,J}^{\,CL} + (1-\beta_{I-i})^{m+1}\, \mu_i,
\end{aligned}
$$
which proves (4.11). But from (4.11) the claim of the theorem immediately follows.

Example 4.4, revisited

In view of Theorem 4.5 we have:

  i | $\widehat{C}^{(1)} = \widehat{C}_{i,J}^{\,BF}$ | $\widehat{C}^{(2)} = \widehat{C}_{i,J}^{\,BH}$ | $\widehat{C}^{(3)}$ | $\widehat{C}^{(4)}$ | $\widehat{C}^{(5)}$ | ... | $\widehat{C}^{(\infty)} = \widehat{C}_{i,J}^{\,CL}$
  0 | 11 148 124 | 11 148 124 | 11 148 124 | 11 148 124 | 11 148 124 | ... | 11 148 124
  1 | 10 664 316 | 10 663 319 | 10 663 318 | 10 663 318 | 10 663 318 | ... | 10 663 318
  2 | 10 662 749 | 10 662 010 | 10 662 008 | 10 662 008 | 10 662 008 | ... | 10 662 008
  3 |  9 761 643 |  9 758 617 |  9 758 606 |  9 758 606 |  9 758 606 | ... |  9 758 606
  4 |  9 882 350 |  9 872 305 |  9 872 218 |  9 872 218 |  9 872 218 | ... |  9 872 218
  5 | 10 113 777 | 10 092 581 | 10 092 252 | 10 092 247 | 10 092 247 | ... | 10 092 247
  6 |  9 623 328 |  9 569 793 |  9 568 192 |  9 568 144 |  9 568 143 | ... |  9 568 143
  7 |  8 830 301 |  8 711 824 |  8 705 711 |  8 705 395 |  8 705 379 | ... |  8 705 378
  8 |  8 967 375 |  8 725 026 |  8 695 938 |  8 692 447 |  8 692 028 | ... |  8 691 971
  9 | 10 443 953 |  9 961 926 |  9 764 095 |  9 682 902 |  9 649 579 | ... |  9 626 383

Table 4.2: Iteration of the Bornhuetter-Ferguson method
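The iteration (4.9) is a simple affine fixed-point map; a sketch with invented figures shows the convergence of Theorem 4.5 to the chain-ladder ultimate $C_{i,I-i}/\beta_{I-i}$:

```python
def iterated_bf(C_obs, mu, beta, m):
    """Iterated Bornhuetter-Ferguson (4.9): starting from C^(0) = mu,
    apply C^(k+1) = C_obs + (1 - beta) * C^(k) m times. Since 1-beta < 1,
    the fixed point is C_obs / beta, the chain-ladder ultimate (4.10)."""
    c = float(mu)
    for _ in range(m):
        c = C_obs + (1.0 - beta) * c
    return c
```

With $C_{i,I-i}=80$, $\mu_i=110$, $\beta_{I-i}=0.8$: one step gives the BF estimate $102$, two steps the BH estimate $100.4$ (cf. (4.11)), and many steps converge to the chain-ladder value $100$.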
4.1.2 Minimizing quadratic loss functions

We choose $i > I-J$ and define the reserves for accident year $i$ (see also (1.43))
$$
R_i = C_{i,J} - C_{i,I-i}. \qquad (4.13)
$$
Hence, under the assumption that the development pattern and the chain-ladder factors are known and identified via (4.3) (under Model Assumptions 2.9), the chain-ladder reserve and the Bornhuetter-Ferguson reserve are given by
$$
\widehat{R}_i^{\,CL} = \widehat{C}_{i,J}^{\,CL} - C_{i,I-i} = C_{i,I-i} \left( \prod_{j=I-i}^{J-1} f_j - 1 \right), \qquad (4.14)
$$
$$
\widehat{R}_i^{\,BF} = \widehat{C}_{i,J}^{\,BF} - C_{i,I-i} = (1-\beta_{I-i})\, \mu_i. \qquad (4.15)
$$
If we mix the chain-ladder and Bornhuetter-Ferguson methods, we obtain for the credibility mixture ($c \in [0,1]$)
$$
c\, \widehat{C}_{i,J}^{\,CL} + (1-c)\, \widehat{C}_{i,J}^{\,BF} \qquad (4.16)
$$
the following reserves (see also (4.5)):
$$
\widehat{R}_i(c) = c\, \widehat{R}_i^{\,CL} + (1-c)\, \widehat{R}_i^{\,BF}
= (1-\beta_{I-i}) \big( c\, \widehat{C}_{i,J}^{\,CL} + (1-c)\, \mu_i \big)
= \widehat{C}_{i,J}^{\,BF} - C_{i,I-i} + c\, \big( \widehat{C}_{i,J}^{\,CL} - \widehat{C}_{i,J}^{\,BF} \big). \qquad (4.17)
$$

Question: Which $c$ is optimal? "Optimal" is understood here in the sense of minimizing a quadratic error function. This means that our goal is to minimize the unconditional mean square error of prediction of the reserves $\widehat{R}_i(c)$ (see also Section 3.1),
$$
\operatorname{msep}_{R_i}\big(\widehat{R}_i(c)\big) = E\Big[ \big( R_i - \widehat{R}_i(c) \big)^2 \Big]. \qquad (4.18)
$$
In order to do this minimization we need a proper stochastic model definition.

Model Assumptions 4.6
- Different accident years $i$ are independent.
- There exists a sequence $(\beta_j)_{0 \le j \le J}$ with $\beta_J = 1$ such that for all $j \in \{0, \dots, J\}$
$$
E[C_{i,j}] = \beta_j\, E[C_{i,J}]. \qquad (4.19)
$$
- Moreover, we assume that $U_i$ is a random variable which is unbiased for $E[C_{i,J}]$, i.e. $E[U_i] = E[C_{i,J}]$, and that $U_i$ is independent of $C_{i,I-i}$ and $C_{i,J}$.

Remarks 4.7
- Model Assumptions 4.6 coincide with Model Assumptions 2.9 if we assume that $U_i = \mu_i$ is deterministic.
- Observe that we do not assume that the chain-ladder model is satisfied! The chain-ladder model satisfies Model Assumptions 4.6, but not necessarily vice versa.

Assume that $f_j$ is identified with $\beta_j$ via (4.3) and that
$$
\widehat{C}_{i,J}^{\,CL} = \frac{C_{i,I-i}}{\beta_{I-i}} \qquad \text{and} \qquad \widehat{C}_{i,J}^{\,BF} = C_{i,I-i} + (1-\beta_{I-i})\, U_i. \qquad (4.20)
$$
Hence the reserves are given by
$$
\widehat{R}_i(c) = (1-\beta_{I-i}) \big( c\, \widehat{C}_{i,J}^{\,CL} + (1-c)\, U_i \big). \qquad (4.21)
$$
Under these model assumptions and definitions we minimize
$$
\operatorname{msep}_{R_i}\big(\widehat{R}_i(c)\big) = E\Big[ \big( R_i - \widehat{R}_i(c) \big)^2 \Big]. \qquad (4.22)
$$
Observe that even if we assumed that the chain-ladder model is satisfied, we could not directly compare this situation to the mean square error of prediction calculation in Chapter 3. For the derivation of an MSEP formula for the chain-ladder method we have always assumed that the chain-ladder factors $f_j$ are not known. If they were known, the mean square error of prediction of the chain-ladder reserves would simply be given by (see (3.30))
$$
\begin{aligned}
\operatorname{msep}_{C_{i,J}}\big(\widehat{C}_{i,J}^{\,CL}\big)
&= E\Big[ E\big[ \big( C_{i,J} - \widehat{C}_{i,J}^{\,CL} \big)^2 \,\big|\, \mathcal{D}_I \big] \Big]
= E\Big[ \operatorname{msep}_{C_{i,J} \mid \mathcal{D}_I}\big(\widehat{C}_{i,J}^{\,CL}\big) \Big]
= E\big[ \operatorname{Var}(C_{i,J} \mid \mathcal{D}_I) \big] \\
&= \operatorname{Var}(C_{i,J}) - \operatorname{Var}\big( E[C_{i,J} \mid \mathcal{D}_I] \big)
= \operatorname{Var}(C_{i,J}) - \operatorname{Var}(C_{i,I-i}) \prod_{j=I-i}^{J-1} f_j^2. \qquad (4.23)
\end{aligned}
$$
If we calculate (4.18) under Model Assumptions 4.6 and with (4.20), we obtain $E[\widehat{R}_i(c)] = E[R_i]$ and
$$
\begin{aligned}
\operatorname{msep}_{R_i}\big(\widehat{R}_i(c)\big)
&= \operatorname{Var}(R_i) + E\Big[ \big( E[R_i] - \widehat{R}_i(c) \big)^2 \Big]
+ 2\, E\Big[ \big( R_i - E[R_i] \big) \big( E[R_i] - \widehat{R}_i(c) \big) \Big] \\
&= \operatorname{Var}(R_i) + \operatorname{Var}\big(\widehat{R}_i(c)\big) - 2\, \operatorname{Cov}\big( R_i, \widehat{R}_i(c) \big). \qquad (4.24)
\end{aligned}
$$
Then we have the following theorem:

Theorem 4.8 (Mack [51]) Under Model Assumptions 4.6 and (4.20) the optimal credibility factor $c_i^*$ which minimizes the unconditional mean square error of prediction (4.22) is given by
$$
c_i^* = \frac{\beta_{I-i}\, \Big[ \operatorname{Cov}(C_{i,I-i}, R_i) + \beta_{I-i}(1-\beta_{I-i})\, \operatorname{Var}(U_i) \Big]}{(1-\beta_{I-i}) \Big[ \operatorname{Var}(C_{i,I-i}) + \beta_{I-i}^2\, \operatorname{Var}(U_i) \Big]}. \qquad (4.25)
$$
Proof. We have that
$$
E\Big[ \big( \widehat{R}_i(c) - R_i \big)^2 \Big]
= c^2\, E\Big[ \big( \widehat{R}_i^{\,CL} - \widehat{R}_i^{\,BF} \big)^2 \Big]
+ E\Big[ \big( R_i - \widehat{R}_i^{\,BF} \big)^2 \Big]
- 2c\, E\Big[ \big( \widehat{R}_i^{\,CL} - \widehat{R}_i^{\,BF} \big) \big( R_i - \widehat{R}_i^{\,BF} \big) \Big]. \qquad (4.26)
$$
Hence the optimal $c_i^*$ is given by
$$
\begin{aligned}
c_i^* &= \frac{E\Big[ \big( \widehat{R}_i^{\,CL} - \widehat{R}_i^{\,BF} \big) \big( R_i - \widehat{R}_i^{\,BF} \big) \Big]}{E\Big[ \big( \widehat{R}_i^{\,CL} - \widehat{R}_i^{\,BF} \big)^2 \Big]}
= \frac{E\Big[ \big( 1/\beta_{I-i} - 1 \big) \big( C_{i,I-i} - \beta_{I-i} U_i \big) \big( R_i - (1-\beta_{I-i}) U_i \big) \Big]}{\big( 1/\beta_{I-i} - 1 \big)^2\, E\Big[ \big( C_{i,I-i} - \beta_{I-i} U_i \big)^2 \Big]} \\
&= \frac{\beta_{I-i}\, E\Big[ \big( C_{i,I-i} - \beta_{I-i} U_i \big) \big( R_i - (1-\beta_{I-i}) U_i \big) \Big]}{(1-\beta_{I-i})\, E\Big[ \big( C_{i,I-i} - \beta_{I-i} U_i \big)^2 \Big]}. \qquad (4.27)
\end{aligned}
$$
Since $E[\beta_{I-i} U_i] = E[C_{i,I-i}]$ and $E[U_i] = E[C_{i,J}]$, the expectations in (4.27) are covariances, and we obtain
$$
c_i^* = \frac{\beta_{I-i}\, \operatorname{Cov}\big( C_{i,I-i} - \beta_{I-i} U_i,\; R_i - (1-\beta_{I-i}) U_i \big)}{(1-\beta_{I-i})\, \operatorname{Var}\big( C_{i,I-i} - \beta_{I-i} U_i \big)}
= \frac{\beta_{I-i}\, \Big[ \operatorname{Cov}(C_{i,I-i}, R_i) + \beta_{I-i}(1-\beta_{I-i})\, \operatorname{Var}(U_i) \Big]}{(1-\beta_{I-i}) \Big[ \operatorname{Var}(C_{i,I-i}) + \beta_{I-i}^2\, \operatorname{Var}(U_i) \Big]}. \qquad (4.28)
$$
This finishes the proof.
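The optimal weight (4.25) is a one-line computation once the model moments are specified; the inputs below are illustrative placeholders for $\operatorname{Var}(C_{i,I-i})$, $\operatorname{Var}(U_i)$ and $\operatorname{Cov}(C_{i,I-i}, R_i)$.

```python
def optimal_credibility_weight(beta, var_C, var_U, cov_C_R):
    """Optimal weight c_i* of Theorem 4.8 for mixing the chain-ladder and
    Bornhuetter-Ferguson reserves. beta = beta_{I-i}; the moments var_C,
    var_U, cov_C_R must come from an explicit stochastic model."""
    num = beta * (cov_C_R + beta * (1.0 - beta) * var_U)
    den = (1.0 - beta) * (var_C + beta ** 2 * var_U)
    return num / den
```

Note that for a deterministic a priori estimate ($\operatorname{Var}(U_i) = 0$) the weight reduces to $\beta\,\operatorname{Cov}(C_{i,I-i}, R_i) / \big((1-\beta)\operatorname{Var}(C_{i,I-i})\big)$, so the data receive weight only insofar as the observed diagonal is correlated with the outstanding reserves.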
92 Chapter 4. Bayesian models

We would like to mention once more that we have not considered the estimation errors in the claims development pattern β_j and f_j, respectively. In this sense Theorem 4.8 is a statement giving optimal credibility weights considering the process variance and the uncertainty in the a priori estimate U_i.

Remark. To explicitly calculate c_i in Theorem 4.8 we need to specify an explicit stochastic model. We will do this below in Section 4.1.4, and close this subsection for the moment.

4.1.3 Cape-Cod Model

One main deficiency of the chain-ladder model is that it completely depends on the last observation on the diagonal (see Chain-ladder Estimator 2.4). If this last observation is an outlier, this outlier will be projected to the ultimate claim using the age-to-age factors. Especially in long-tailed lines of business the first observations are not always representative. One possibility to smooth outliers on the last observed diagonal is to combine the BF and CL methods, as e.g. in the Benktander-Hovinen method; another possibility is to robustify such observations. This is done in the Cape-Cod method, which goes back to Bühlmann [15].

Model Assumptions 4.9 (Cape-Cod method)
There exist parameters Π_0, ..., Π_I > 0, κ > 0 and a claims development pattern β_j, 0 ≤ j ≤ J, with β_J = 1 such that

E[C_{i,j}] = κ Π_i β_j   (4.29)

for all i = 0, ..., I and j = 0, ..., J. Moreover, different accident years i are independent.

Observe that the Cape-Cod model assumptions coincide with Model Assumptions 2.9 if we set μ_i = κ Π_i. We have formulated these new assumptions for the Cape-Cod method to make clear that the parameter Π_i can be interpreted as the premium of year i and that κ reflects the average loss ratio. We assume that κ is independent of the accident year i, i.e. the premium level w.r.t. κ is the same for all accident years.
Under (4.3) we can estimate the loss ratio for each accident year using the chain-ladder estimate:

κ̂_i = Ĉ^{CL}_{i,J} / Π_i = C_{i,I−i} ∏_{j=I−i}^{J−1} f_j / Π_i = C_{i,I−i} / (β_{I−i} Π_i).   (4.30)
Chapter 4. Bayesian models 93

This is an unbiased estimate for κ:

E[κ̂_i] = E[Ĉ^{CL}_{i,J}] / Π_i = E[C_{i,I−i}] / (β_{I−i} Π_i) = E[C_{i,J}] / Π_i = κ.   (4.31)

The robustified overall loss ratio is then estimated by

κ̂^{CC} = Σ_{i=0}^{I} (β_{I−i} Π_i / Σ_{k=0}^{I} β_{I−k} Π_k) κ̂_i = Σ_{i=0}^{I} C_{i,I−i} / Σ_{i=0}^{I} β_{I−i} Π_i.   (4.32)

Observe that κ̂^{CC} is an unbiased estimate for κ. A robustified value for C_{i,I−i}, i > I − J, is then found by

Ĉ^{CC}_{i,I−i} = κ̂^{CC} Π_i β_{I−i}.   (4.33)

This leads to the Cape-Cod estimator:

Estimator 4.10 (Cape-Cod estimator) The CC estimator is given by

Ĉ^{CC}_{i,J} = C_{i,I−i} − Ĉ^{CC}_{i,I−i} + Ĉ^{CC}_{i,I−i} ∏_{j=I−i}^{J−1} f_j   (4.34)

for I − J + 1 ≤ i ≤ I.

Lemma 4.11 Under Model Assumptions 4.9 and (4.3) the estimator Ĉ^{CC}_{i,J} − C_{i,I−i} is unbiased for E[C_{i,J} − C_{i,I−i}] = κ Π_i (1 − β_{I−i}).

Proof. Observe that

E[Ĉ^{CC}_{i,I−i}] = E[κ̂^{CC}] Π_i β_{I−i} = κ Π_i β_{I−i} = E[C_{i,I−i}].   (4.35)

Moreover, we have with (4.3) that

Ĉ^{CC}_{i,J} − C_{i,I−i} = Ĉ^{CC}_{i,I−i} (∏_{j=I−i}^{J−1} f_j − 1) = κ̂^{CC} Π_i (1 − β_{I−i}).   (4.36)

This finishes the proof.

Remarks 4.12

The chain-ladder iteration is applied to the robustified diagonal value Ĉ^{CC}_{i,I−i}, but we still add the difference between the original observation C_{i,I−i} and the robustified diagonal value in order to calculate the ultimate.

If we transform the Cape-Cod estimator we obtain (see also (4.36))

Ĉ^{CC}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) κ̂^{CC} Π_i,   (4.37)

which is a Bornhuetter-Ferguson type estimate with modified a priori estimate κ̂^{CC} Π_i.
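A minimal sketch of the Cape-Cod estimates (4.32) and (4.37) on invented numbers (the two-year diagonal, premiums and pattern below are made up, and `cape_cod_ultimates` is a name chosen for this illustration, not from the text):

```python
def cape_cod_ultimates(diag, premiums, beta_diag):
    """diag[i] = C_{i,I-i}, premiums[i] = Pi_i, beta_diag[i] = beta_{I-i}.
    Returns the robustified loss ratio (4.32) and the CC ultimates (4.37)."""
    kappa_cc = sum(diag) / sum(b * p for b, p in zip(beta_diag, premiums))  # (4.32)
    ults = [c + (1.0 - b) * kappa_cc * p                                    # (4.37)
            for c, p, b in zip(diag, premiums, beta_diag)]
    return kappa_cc, ults
```

With a fully developed first year (β = 1) its ultimate stays at the observed diagonal value, while the younger year is completed with the robustified loss ratio instead of its own (possibly outlying) κ̂_i.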
94 Chapter 4. Bayesian models

Observe that

Var(κ̂_i) = Var(C_{i,I−i}) / (β²_{I−i} Π²_i).   (4.38)

Depending on the choice of the variance function of C_{i,j}, this may also suggest that the robustification can be done in another way with smaller variance, see also Lemma 3.4.

Example 4.13 (Cape-Cod method)

We revisit the data set given in Examples 2.7, 2.11 and 4.4.

i | Π_i | κ̂_i | Ĉ^{CC}_{i,I−i} | Ĉ^{CC}_{i,J} | CC reserves | CL reserves | BF reserves
0 | 15 473 558 | 72.0% | 10 411 192 | 11 148 124 | 0 | 0 | 0
1 | 14 882 436 | 71.7% | 9 999 259 | 10 662 396 | 14 204 | 15 126 | 16 124
2 | 14 456 039 | 73.8% | 9 702 614 | 10 659 704 | 23 953 | 26 257 | 26 998
3 | 14 054 917 | 69.4% | 9 423 208 | 9 757 538 | 33 469 | 34 538 | 37 575
4 | 14 525 373 | 68.0% | 9 688 771 | 9 871 362 | 84 446 | 85 302 | 95 434
5 | 15 025 923 | 67.2% | 9 953 237 | 10 092 522 | 156 769 | 156 494 | 178 024
6 | 14 832 965 | 64.5% | 9 681 735 | 9 580 464 | 298 442 | 286 121 | 341 305
7 | 14 550 359 | 59.8% | 9 284 898 | 8 761 342 | 505 131 | 449 167 | 574 089
8 | 14 461 781 | 60.1% | 8 562 549 | 8 816 611 | 1 167 882 | 1 043 242 | 1 318 646
9 | 15 210 363 | 63.3% | 6 033 871 | 9 875 801 | 4 200 233 | 3 950 815 | 4 768 384
κ̂^{CC} = 67.3% | | | | Total | 6 484 530 | 6 047 061 | 7 356 580

Table 4.3: Claims reserves from the Cape-Cod method

Figure 4.1: Estimated individual loss ratios κ̂_i and estimated Cape-Cod loss ratio κ̂^{CC} (loss ratios per accident year 0, ..., 9)
Chapter 4. Bayesian models 95

The example shows the robustified diagonal values Ĉ^{CC}_{i,I−i}, which lead to the Cape-Cod estimate. The Cape-Cod estimate Ĉ^{CC}_{i,J} is smaller than the Bornhuetter-Ferguson estimate Ĉ^{BF}_{i,J}. This comes from the fact that the a priori estimates μ_i used for the Bornhuetter-Ferguson method are rather pessimistic: the loss ratios μ_i/Π_i are all above 75%, whereas the Cape-Cod method gives loss ratios κ̂_i which are all below 75% (see Figure 4.1).

However, as Figure 4.1 shows, we have to be careful with the assumption of a constant loss ratio κ. The figure suggests that we have to consider underwriting cycles carefully. In soft markets loss ratios are rather low since we are able to charge rather high premiums; if there is keen competition we expect low profit margins. If possible, we should adjust our premium with underwriting cycle information. For this reason one finds in practice modified versions of the Cape-Cod method, e.g. the smoothing of the last observed diagonal is only done over neighboring values.

4.1.4 A distributional example to credible claims reserving

To construct the Benktander-Hovinen estimate we have used a credibility weighting between the Bornhuetter-Ferguson method and the chain-ladder method. Theorem 4.8 gave a statement for the best weighted average relative to the quadratic loss function. We now specify an explicit model to apply Theorem 4.8.

Model Assumptions 4.14 (Mack [51])
Different accident years i are independent. There exists a sequence β_j, 0 ≤ j ≤ J, with β_J = 1 such that

E[C_{i,j} | C_{i,J}] = β_j C_{i,J},   (4.39)
Var(C_{i,j} | C_{i,J}) = β_j (1 − β_j) α²(C_{i,J})   (4.40)

for all i = 0, ..., I and j = 0, ..., J.

Remarks 4.15

The spirit of this model is different from the Chain-ladder Model 3.2. In the chain-ladder model we have a forward iteration, i.e. we condition on the preceding observation.
In the model above we rather have a backward iteration: conditioning on the ultimate C_{i,J}, we determine the intermediate cumulative payment states, i.e. this is simply a refined definition of the development pattern.
96 Chapter 4. Bayesian models

This model can also be viewed as a Bayesian approach with latent variables which determine the ultimate claim C_{i,J}. This will be further discussed below.

Observe that this model satisfies Model Assumptions 2.9 with μ_i = E[C_{i,J}]. Moreover, C_{i,j} satisfies the assumptions given in Model Assumptions 4.6. The chain-ladder model is in general not satisfied (see also Section 4.2.2, e.g. (4.68) below).

Observe that the variance condition is such that it converges to zero for β_j → 1, i.e. if the expected outstanding payments are low, the uncertainty is also low.

In view of Theorem 4.8 we have the following corollary (use definitions (4.21) and (4.20)):

Corollary 4.16 Under the assumption that U_i is an unbiased estimator for E[C_{i,J}] which is independent of C_{i,I−i} and C_{i,J}, and under Model Assumptions 4.14, the optimal credibility factor c_i which minimizes the unconditional mean square error of prediction (4.22) is given by

c_i = β_{I−i} / (β_{I−i} + t_i)  with  t_i = E[α²(C_{i,J})] / (Var(U_i) + Var(C_{i,J}) − E[α²(C_{i,J})])   (4.41)

for i ∈ {I − J + 1, ..., I}.

Proof. From Theorem 4.8 we have

c_i = β_{I−i} (Cov(C_{i,I−i}, R_i) + β_{I−i}(1 − β_{I−i}) Var(U_i)) / ((1 − β_{I−i})(Var(C_{i,I−i}) + β²_{I−i} Var(U_i))).   (4.42)

Now we need to calculate the elements of the equation above. We obtain

Var(C_{i,I−i}) = E[Var(C_{i,I−i} | C_{i,J})] + Var(E[C_{i,I−i} | C_{i,J}])
= β_{I−i}(1 − β_{I−i}) E[α²(C_{i,J})] + β²_{I−i} Var(C_{i,J}),   (4.43)

and

Cov(C_{i,I−i}, C_{i,J} − C_{i,I−i}) = Cov(C_{i,I−i}, C_{i,J}) − Var(C_{i,I−i}).   (4.44)

Henceforth we need to calculate

Cov(C_{i,I−i}, C_{i,J}) = E[Cov(C_{i,I−i}, C_{i,J} | C_{i,J})] + Cov(E[C_{i,I−i} | C_{i,J}], E[C_{i,J} | C_{i,J}])
= 0 + Cov(β_{I−i} C_{i,J}, C_{i,J}) = β_{I−i} Var(C_{i,J}).   (4.45)
Chapter 4. Bayesian models 97

This implies that

Cov(C_{i,I−i}, C_{i,J} − C_{i,I−i}) = β_{I−i} Var(C_{i,J}) − Var(C_{i,I−i}).   (4.46)

Hence we obtain

c_i = β_{I−i} (β_{I−i} Var(C_{i,J}) − Var(C_{i,I−i}) + β_{I−i}(1 − β_{I−i}) Var(U_i)) / ((1 − β_{I−i})(Var(C_{i,I−i}) + β²_{I−i} Var(U_i)))
= (Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i)) / ((β^{−1}_{I−i} − 1) E[α²(C_{i,J})] + Var(C_{i,J}) + Var(U_i))
= (Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i)) / (β^{−1}_{I−i} E[α²(C_{i,J})] + Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i)),   (4.47)

which equals β_{I−i}/(β_{I−i} + t_i). This finishes the proof of the corollary.

Corollary 4.17 Under the assumption that U_i is an unbiased estimator for E[C_{i,J}] which is independent of C_{i,I−i} and C_{i,J}, and under Model Assumptions 4.14, we find the following mean square errors of prediction (see also (4.21)-(4.22)):

msep_{R_i}(R̂_i(c)) = E[α²(C_{i,J})] (c²/β_{I−i} + 1/(1 − β_{I−i}) + (1 − c)²/t_i) (1 − β_{I−i})²,
msep_{R_i}(R̂_i(0)) = E[α²(C_{i,J})] (1/(1 − β_{I−i}) + 1/t_i) (1 − β_{I−i})²,
msep_{R_i}(R̂_i(1)) = E[α²(C_{i,J})] (1/β_{I−i} + 1/(1 − β_{I−i})) (1 − β_{I−i})²,
msep_{R_i}(R̂_i(c_i)) = E[α²(C_{i,J})] (1/(β_{I−i} + t_i) + 1/(1 − β_{I−i})) (1 − β_{I−i})²   (4.48)

for i ∈ {I − J + 1, ..., I}.

Proof. Exercise.

Remarks 4.18

The reserve R̂_i(0) corresponds to the Bornhuetter-Ferguson reserve R̂^{BF}_i and R̂_i(1) corresponds to the chain-ladder reserve R̂^{CL}_i. However, msep_{R_i}(R̂_i(1)) and msep_{R_i}(R̂^{CL}_i) from Chapter 3 are not comparable since (a) we use a completely different model, which leads to different process error and prediction error terms; (b) in Corollary 4.17 we do not investigate the estimation error coming from the fact that we have to estimate f_j and β_j.
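The msep expressions of Corollary 4.17 are easy to evaluate and to cross-check numerically (a sketch with invented parameter values; the function name is not from the text):

```python
def msep_reserve(c, beta, t, e_alpha2):
    """Unconditional msep of R_i(c), first line of (4.48);
    beta = beta_{I-i}, t = t_i, e_alpha2 = E[alpha^2(C_{i,J})]."""
    return e_alpha2 * (c**2 / beta + (1 - c)**2 / t + 1 / (1 - beta)) * (1 - beta)**2
```

For c = β/(β + t) this reduces to the last line of (4.48), and one can verify that this choice indeed gives a smaller msep than the pure BF (c = 0) and pure CL (c = 1) reserves.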
98 Chapter 4. Bayesian models

From Corollary 4.17 we see that the Bornhuetter-Ferguson estimate in Model 4.14 is better than the chain-ladder estimate as long as

t_i > β_{I−i}.   (4.49)

I.e. for years with little loss experience (small β_{I−i}) one should take the BF estimate, whereas for older years one should take the CL estimate. Similar statements can be derived for the BH estimate.

Example 4.19 (Mack model, Model Assumptions 4.14)

An easy distributional example satisfying Model Assumptions 4.14 is the following. Assume that, conditionally given C_{i,J}, C_{i,j}/C_{i,J} has a Beta(α_i β_j, α_i (1 − β_j)) distribution. Hence

E[C_{i,j} | C_{i,J}] = C_{i,J} E[C_{i,j}/C_{i,J} | C_{i,J}] = β_j C_{i,J},   (4.50)
Var(C_{i,j} | C_{i,J}) = C²_{i,J} Var(C_{i,j}/C_{i,J} | C_{i,J}) = β_j (1 − β_j) C²_{i,J}/(1 + α_i)   (4.51)

for all i = 0, ..., I and j = 0, ..., J. See appendix, Section B.2.4, for the definition of the Beta distribution and its moments.

We revisit the data set given in Examples 2.7, 2.11 and 4.4. Observe that

E[α²(C_{i,J})] = E[C²_{i,J}]/(1 + α_i) = (Vco²(C_{i,J}) + 1) E[C_{i,J}]²/(1 + α_i).   (4.52)

As already mentioned before, we assume that the claims development pattern β_j, 0 ≤ j ≤ J, is known. This means that in our estimates there is no estimation error term coming from the claims development parameters. We only have a process variance term and the uncertainty in the estimation of the ultimate U_i. This corresponds to a prediction error term in the language of Section 3.3. We assume that an actuary is able to predict the true a priori estimate with an error of 5%, i.e. Vco(U_i) = 5%. Hence we assume that

Vco(C_{i,J}) = (Vco²(U_i) + r²)^{1/2},   (4.53)

where r = 6% corresponds to the pure process error. This leads to the results in Table 4.4. We already see from the choices of our parameters α_i, r and Vco(U_i) that it is rather difficult to apply this method in practice, since we have not estimated these parameters from the data available.
Chapter 4. Bayesian models 99

i | α_i | Vco(U_i) | r | Vco(C_{i,J}) | t_i | c_i | CL reserves | BF reserves | R̂_i(c_i)
0 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.5% | 0 | 0 | 0
1 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.5% | 15 126 | 16 124 | 15 320
2 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.5% | 26 257 | 26 998 | 26 401
3 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.5% | 34 538 | 37 575 | 35 131
4 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.4% | 85 302 | 95 434 | 87 288
5 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.3% | 156 494 | 178 024 | 160 738
6 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 80.1% | 286 121 | 341 305 | 297 128
7 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 79.7% | 449 167 | 574 089 | 474 538
8 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 78.5% | 1 043 242 | 1 318 646 | 1 102 588
9 | 600 | 5.0% | 6.0% | 7.8% | 24.2% | 70.9% | 3 950 815 | 4 768 384 | 4 188 531
Total | | | | | | | 6 047 061 | 7 356 580 | 6 387 663

Table 4.4: Claims reserves from Model 4.14

i | E[U_i] | E[α²(C_{i,J})]^{1/2} | msep^{1/2}(R̂_i(1)) | msep^{1/2}(R̂_i(0)) | msep^{1/2}(R̂_i(c_i))
0 | 11 653 101 | 476 788 | | |
1 | 11 367 306 | 465 094 | 17 529 | 17 568 | 17 527
2 | 10 962 965 | 448 551 | 22 287 | 22 373 | 22 282
3 | 10 616 762 | 434 386 | 25 888 | 26 031 | 25 879
4 | 11 044 881 | 451 902 | 42 189 | 42 751 | 42 153
5 | 11 480 700 | 469 734 | 58 952 | 60 340 | 58 862
6 | 11 413 572 | 466 987 | 81 990 | 85 604 | 81 745
7 | 11 126 527 | 455 243 | 106 183 | 113 911 | 105 626
8 | 10 986 548 | 449 515 | 166 013 | 190 514 | 163 852
9 | 11 618 437 | 475 369 | 396 616 | 500 223 | 372 199
Total | | | 457 811 | 560 159 | 435 814

Table 4.5: Mean square errors of prediction according to Corollary 4.17

The prediction errors are given in Table 4.5. Observe that these mean square errors of prediction cannot be compared to the mean square error of prediction obtained in the chain-ladder method (see Chapter 3). We do not know whether the model assumptions in this example imply the chain-ladder model assumptions. Moreover, we do not investigate the uncertainties in the parameter estimates, and the choice of the parameters was rather artificial, motivated by expert opinions.

4.2 Exact Bayesian models

4.2.1 Motivation

Bayesian methods for claims reserving are methods in which one combines a priori information or expert knowledge with observations in the upper trapezoid D_I.
Available information/knowledge is incorporated through an a priori distribution
100 Chapter 4. Bayesian models

of a random quantity such as the ultimate claim (see Sections 4.2.2 and 4.2.3) or a risk parameter (see Section 4.2.4), which must be modeled by the actuary. This distribution is then connected with the likelihood function via Bayes' theorem. If we use a smart choice for the distribution of the observations and the a priori distribution, such as the exponential dispersion family (EDF) with its associate conjugates (see Section 4.2.4), we are able to derive an analytic expression for the a posteriori distribution of the ultimate claim. This means that we can compute the a posteriori expectation E[C_{i,J} | D_I] of the ultimate claim C_{i,J}, given the observations D_I, which is called the Bayesian estimator for the ultimate claim.

The Bayesian method is called exact since the Bayesian estimator E[C_{i,J} | D_I] is optimal in the sense that it minimizes the squared loss function (MSEP) in the class L²_{C_{i,J}}(D_I) of all estimators for C_{i,J} which are square integrable functions of the observations in D_I, i.e.

E[C_{i,J} | D_I] = argmin_{Y ∈ L²_{C_{i,J}}(D_I)} E[(C_{i,J} − Y)² | D_I].   (4.54)

For its conditional mean square error of prediction we have that

msep_{C_{i,J}|D_I}(E[C_{i,J} | D_I]) = Var(C_{i,J} | D_I).   (4.55)

Of course, if there are unknown parameters in the underlying probabilistic model, we cannot explicitly calculate E[C_{i,J} | D_I]. These parameters need to be estimated by D_I-measurable estimators. Hence we obtain a D_I-measurable estimator Ê[C_{i,J} | D_I] for E[C_{i,J} | D_I] (and C_{i,J}, resp.), which implies that

msep_{C_{i,J}|D_I}(Ê[C_{i,J} | D_I]) = Var(C_{i,J} | D_I) + (Ê[C_{i,J} | D_I] − E[C_{i,J} | D_I])²,   (4.56)

and now we are in a similar situation as in the chain-ladder model, see (3.30).

We close this section with some remarks:

For pricing and tariffication of insurance contracts, Bayesian ideas and techniques are well investigated and widely used in practice.
For the claims reserving problem Bayesian methods are less used, although we believe that they are very useful for answering practical questions; this has e.g. already been mentioned in de Alba [2]. In the literature exact Bayesian models have been studied e.g. in a series of papers by Verrall [79, 81, 82], de Alba [2, 4], de Alba-Corzo [3], Gogol [30], Haastrup-Arjas [32], Ntzoufras-Dellaportas [60] and the corresponding implementation by Scollnik [70]. Many of these results refer to explicit choices of distributions, e.g. the Poisson-gamma or the log-normal/normal cases are considered. Below, we give an approach which suits rather general distributions (see Section 4.2.4).
Chapter 4. Bayesian models 101

4.2.2 Log-normal/Log-normal model

In this section we revisit Model 4.14. We make distributional assumptions on C_{i,J} and on C_{i,j} given C_{i,J} which satisfy Model Assumptions 4.14 and hence would allow for the application of Corollary 4.16. However, in this section we don't follow that route: in Corollary 4.16 we used an a priori estimate U_i for E[C_{i,J}]. Here, we don't use a distribution for the a priori estimate, but we explicitly specify the distribution of C_{i,J}. The distributional assumptions will be such that we can determine the exact distribution of C_{i,J} given C_{i,j} according to Bayes' theorem. It turns out that the best estimate for E[C_{i,J} | C_{i,I−i}] is a credibility mixture between the observation C_{i,I−i} and the a priori mean E[C_{i,J}]. Gogol [30] proposed the following model.

Model Assumptions 4.20 (Log-normal/Log-normal model)
Different accident years i are independent.
C_{i,J} is log-normally distributed with parameters μ_i and σ²_i for i = 0, ..., I.
Conditionally, given C_{i,J}, C_{i,j} has a Log-normal distribution with parameters ν_j(C_{i,J}) and τ²_j(C_{i,J}) for i = 0, ..., I and j = 0, ..., J.

Remark. In the appendix, Section B.2.2, we provide the definition of the Log-normal distribution. μ_i and σ²_i denote the parameters of the Log-normal distribution of C_{i,J}, and

E[C_{i,J}] = exp{μ_i + σ²_i/2}   (4.57)

is the a priori mean of C_{i,J}.

If (C_{i,j})_{0≤j≤J} also satisfies Model Assumptions 4.14, we have that

E[C_{i,j} | C_{i,J}] = exp{ν_j + τ²_j/2} =! β_j C_{i,J},   (4.58)
Var(C_{i,j} | C_{i,J}) = exp{2 ν_j + τ²_j} (exp{τ²_j} − 1) =! β_j (1 − β_j) α²(C_{i,J}).

This implies that we have to choose

τ²_j = τ²_j(C_{i,J}) = log(1 + (1 − β_j) α²(C_{i,J})/(β_j C²_{i,J})),   (4.59)
ν_j = ν_j(C_{i,J}) = log(β_j C_{i,J}) − (1/2) log(1 + (1 − β_j) α²(C_{i,J})/(β_j C²_{i,J})).   (4.60)
102 Chapter 4. Bayesian models

The joint distribution of (C_{i,j}, C_{i,J}) is given by

f_{C_{i,j},C_{i,J}}(x, y) = f_{C_{i,j}|C_{i,J}}(x|y) f_{C_{i,J}}(y)
= (2π)^{−1/2} (τ_j(y) x)^{−1} exp{−(1/2) ((log x − ν_j(y))/τ_j(y))²}   (4.61)
  × (2π)^{−1/2} (σ_i y)^{−1} exp{−(1/2) ((log y − μ_i)/σ_i)²}
= (2π σ_i τ_j(y) x y)^{−1} exp{−(1/2) ((log x − ν_j(y))/τ_j(y))² − (1/2) ((log y − μ_i)/σ_i)²}.

Lemma 4.21 Model Assumptions 4.20 combined with Model Assumptions 4.14 and α²(c) = a² c² for some a ∈ R satisfy the following equalities:

τ²_j(c) = τ²_j = log(1 + (1 − β_j) a²/β_j),   (4.62)
ν_j(c) = log c + log β_j − τ²_j/2.   (4.63)

Moreover, the conditional distribution of C_{i,J}, given C_{i,j}, is again a Log-normal distribution with updated parameters

μ_post(i,j) = (σ²_i/(σ²_i + τ²_j)) (log(C_{i,j}/β_j) + τ²_j/2) + (τ²_j/(σ²_i + τ²_j)) μ_i,   (4.64)
σ²_post(i,j) = τ²_j σ²_i/(σ²_i + τ²_j).   (4.65)

Remarks 4.22

This example shows a typical Bayesian and credibility result:

(i) In this example of conjugated distributions we can exactly calculate the a posteriori distribution of the ultimate claim C_{i,J}, given the information C_{i,j} (cf. Section 4.2.4, and see also Bühlmann-Gisler [18]).

(ii) We see that we need to update the parameter μ_i by choosing a credibility weighted average of the a priori parameter μ_i and the transformed observation log(C_{i,j}/β_j) + τ²_j/2, where the credibility weight is given by

α_{i,j} = σ²_i/(σ²_i + τ²_j).   (4.66)

This implies the updating of the a priori mean of the ultimate claim C_{i,J}

E[C_{i,J}] = exp{μ_i + σ²_i/2}   (4.67)
Chapter 4. Bayesian models 103

to the a posteriori mean of the ultimate claim C_{i,J}

E[C_{i,J} | C_{i,j}] = exp{μ_post(i,j) + σ²_post(i,j)/2}   (4.68)
= exp{(1 − α_{i,j})(μ_i + σ²_i/2) + α_{i,j} (log(C_{i,j}/β_j) + τ²_j/2)}
= exp{(1 − α_{i,j})(μ_i + σ²_i/2) + α_{i,j} (τ²_j/2 − log β_j)} · C_{i,j}^{σ²_i/(σ²_i+τ²_j)},

see also (4.83) below.

Observe that this model does in general not satisfy the chain-ladder assumptions (cf. the last expression in (4.68)). This has already been mentioned in Remarks 4.15.

Observe that in the current derivation we only consider one observation C_{i,j}. We could also consider the whole sequence of observations C_{i,0}, ..., C_{i,j}; then the a posteriori distribution of C_{i,J} is log-normally distributed with mean

μ_post(i,j) = (Σ_{k=0}^{j} (log C_{i,k} − log β_k + τ²_k/2)/τ²_k + μ_i/σ²_i) / (Σ_{k=0}^{j} 1/τ²_k + 1/σ²_i)   (4.69)
= α_{i,j} (Σ_{k=0}^{j} 1/τ²_k)^{−1} Σ_{k=0}^{j} (log C_{i,k} − log β_k + τ²_k/2)/τ²_k + (1 − α_{i,j}) μ_i,

with

α_{i,j} = (Σ_{k=0}^{j} 1/τ²_k) / (Σ_{k=0}^{j} 1/τ²_k + 1/σ²_i),   (4.70)

and variance

σ²_post(i,j) = (Σ_{k=0}^{j} 1/τ²_k + 1/σ²_i)^{−1}.   (4.71)

Observe that this is again a credibility weighted average between the a priori estimate μ_i and the observations C_{i,0}, ..., C_{i,j}. The credibility weights are given by α_{i,j}. Moreover, observe that this model does not have the Markov property; this is in contrast to our chain-ladder assumptions.

Proof of Lemma 4.21. The equations (4.62)-(4.63) easily follow from (4.59)-(4.60). Hence we only need to calculate the conditional distribution of C_{i,J} given C_{i,j}. From (4.61) and (4.63) we see that the joint density of (C_{i,j}, C_{i,J}) is given by

f_{C_{i,j},C_{i,J}}(x, y) = (2π σ_i τ_j x y)^{−1} exp{−(1/2) ((log x − log y − log β_j + τ²_j/2)/τ_j)² − (1/2) ((log y − μ_i)/σ_i)²}.   (4.72)
104 Chapter 4. Bayesian models

Now we have that

((z − c)/τ)² + ((z − μ)/σ)² = ((z − (σ² c + τ² μ)/(σ² + τ²)) / (σ τ/(σ² + τ²)^{1/2}))² + (μ − c)²/(σ² + τ²).   (4.73)

This implies that

f_{C_{i,j},C_{i,J}}(x, y) = (2π σ_i τ_j x y)^{−1} exp{−(1/2) ((log y − (σ²_i c(x) + τ²_j μ_i)/(σ²_i + τ²_j)) / (σ_i τ_j/(σ²_i + τ²_j)^{1/2}))² − (1/2) (μ_i − c(x))²/(σ²_i + τ²_j)},   (4.74)

where

c(x) = log x − log β_j + τ²_j/2.   (4.75)

From this we see that

f_{C_{i,J}|C_{i,j}}(y|x) = f_{C_{i,j},C_{i,J}}(x, y)/f_{C_{i,j}}(x) = f_{C_{i,j},C_{i,J}}(x, y) / ∫ f_{C_{i,j},C_{i,J}}(x, y) dy   (4.76)

is the density of a Log-normal distribution with parameters

μ_post(i,j) = (σ²_i c(C_{i,j}) + τ²_j μ_i)/(σ²_i + τ²_j),   (4.77)
σ²_post(i,j) = σ²_i τ²_j/(σ²_i + τ²_j).   (4.78)

Finally, we rewrite μ_post(i,j) (cf. (4.75)):

μ_post(i,j) = (σ²_i (log C_{i,j} − log β_j + τ²_j/2) + τ²_j μ_i)/(σ²_i + τ²_j).   (4.79)

This finishes the proof of Lemma 4.21.

Estimator 4.23 (Log-normal/Log-normal model, Gogol [30]) Under the assumptions of Lemma 4.21 we have the following estimator for the ultimate claim E[C_{i,J} | C_{i,I−i}]:

Ĉ^{Go}_{i,J} = E[C_{i,J} | C_{i,I−i}]   (4.80)
= exp{(1 − σ²_i/(σ²_i + τ²_{I−i})) (μ_i + σ²_i/2) + (σ²_i/(σ²_i + τ²_{I−i})) (log(C_{i,I−i}/β_{I−i}) + τ²_{I−i}/2)}

for I − J + 1 ≤ i ≤ I.
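The credibility update (4.64)-(4.65) and the resulting a posteriori mean (4.68) can be sketched in a few lines (function names and the test values are invented for this illustration):

```python
from math import log, exp

def lognormal_posterior(c_obs, beta_j, mu_i, sigma2_i, tau2_j):
    """Posterior parameters (4.64)-(4.65) of log C_{i,J}, given one
    observation c_obs = C_{i,j}; alpha is the credibility weight (4.66)."""
    alpha = sigma2_i / (sigma2_i + tau2_j)
    mu_post = alpha * (log(c_obs / beta_j) + tau2_j / 2) + (1 - alpha) * mu_i
    sigma2_post = sigma2_i * tau2_j / (sigma2_i + tau2_j)
    return mu_post, sigma2_post

def posterior_mean(mu_post, sigma2_post):
    """A posteriori mean of the ultimate claim, eq. (4.68)."""
    return exp(mu_post + sigma2_post / 2)
```

For τ²_j = 0 (a noise-free observation) the weight α_{i,j} equals 1 and the posterior mean collapses to the chain-ladder value C_{i,j}/β_j, as one would expect.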
Chapter 4. Bayesian models 105

Observe that we only condition on the last observation C_{i,I−i}, see also Remarks 4.22 on the Markov property.

Remark. We could also consider

Ĉ^{Go,2}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) Ĉ^{Go}_{i,J}.   (4.81)

From a practical point of view Ĉ^{Go,2}_{i,J} is more useful if we have an outlier on the diagonal. However, both estimators are not easily obtained in practice, since there are too many parameters which are difficult to estimate.

Example 4.24 (Model Gogol [30], assumptions of Lemma 4.21)

We revisit the data set given in Example 2.7. For the parameters we make the same choices as in Example 4.19 (see Table 4.4), i.e. we set Vco(C_{i,J}) equal to the value obtained in (4.53). In formula (4.53) this coefficient of variation was decomposed into process error and parameter uncertainties; here we only use the overall uncertainty. Moreover, we choose a² = 1/(1 + α_i) with α_i = 600. Using (4.62), (4.57) and

σ²_i = log(Vco²(C_{i,J}) + 1)   (4.82)

(cf. appendix, Table B.5) leads to Table 4.6.

i | E[C_{i,J}] | Vco(C_{i,J}) | μ_i | σ_i | β_{I−i} | a² | τ_{I−i}
0 | 11 653 101 | 7.8% | 16.27 | 7.80% | 100.0% | 0.17% | 0.0%
1 | 11 367 306 | 7.8% | 16.24 | 7.80% | 99.9% | 0.17% | 0.2%
2 | 10 962 965 | 7.8% | 16.21 | 7.80% | 99.8% | 0.17% | 0.2%
3 | 10 616 762 | 7.8% | 16.17 | 7.80% | 99.6% | 0.17% | 0.2%
4 | 11 044 881 | 7.8% | 16.21 | 7.80% | 99.1% | 0.17% | 0.4%
5 | 11 480 700 | 7.8% | 16.25 | 7.80% | 98.4% | 0.17% | 0.5%
6 | 11 413 572 | 7.8% | 16.25 | 7.80% | 97.0% | 0.17% | 0.7%
7 | 11 126 527 | 7.8% | 16.22 | 7.80% | 94.8% | 0.17% | 1.0%
8 | 10 986 548 | 7.8% | 16.21 | 7.80% | 88.0% | 0.17% | 1.5%
9 | 11 618 437 | 7.8% | 16.27 | 7.80% | 59.0% | 0.17% | 3.4%

Table 4.6: Parameter choice for the Log-normal/Log-normal model

We obtain the credibility weights and estimates for the ultimates in Table 4.7. Using (4.66), (4.57) and Ĉ^{CL}_{i,J} = C_{i,I−i}/β_{I−i}, we obtain for Estimator 4.23 the following representation:

Ĉ^{Go}_{i,J} = exp{(1 − α_{i,I−i})(μ_i + σ²_i/2) + α_{i,I−i} (log(C_{i,I−i}/β_{I−i}) + τ²_{I−i}/2)}
= (E[C_{i,J}])^{1−α_{i,I−i}} exp{α_{i,I−i} (log Ĉ^{CL}_{i,J} + τ²_{I−i}/2)}.   (4.83)
106 Chapter 4. Bayesian models

i | C_{i,I−i} | 1 − α_{i,I−i} | μ_post(i,I−i) | σ_post(i,I−i) | Ĉ^{Go}_{i,J} | Go reserves | CL | BF
0 | 11 148 124 | 0.0% | 16.23 | 0.00% | 11 148 124 | 0 | 0 | 0
1 | 10 648 192 | 0.0% | 16.18 | 0.15% | 10 663 595 | 15 403 | 15 126 | 16 124
2 | 10 635 751 | 0.1% | 16.18 | 0.20% | 10 662 230 | 26 479 | 26 257 | 26 998
3 | 9 724 068 | 0.1% | 16.09 | 0.24% | 9 759 434 | 35 365 | 34 538 | 37 575
4 | 9 786 916 | 0.2% | 16.11 | 0.38% | 9 874 925 | 88 009 | 85 302 | 95 434
5 | 9 935 753 | 0.4% | 16.13 | 0.51% | 10 097 962 | 162 209 | 156 494 | 178 024
6 | 9 282 022 | 0.8% | 16.08 | 0.71% | 9 582 510 | 300 487 | 286 121 | 341 305
7 | 8 256 211 | 1.5% | 15.98 | 0.94% | 8 737 154 | 480 942 | 449 167 | 574 089
8 | 7 648 729 | 3.6% | 15.99 | 1.48% | 8 766 487 | 1 117 758 | 1 043 242 | 1 318 646
9 | 5 675 568 | 16.0% | 16.11 | 3.12% | 9 925 132 | 4 249 564 | 3 950 815 | 4 768 384
Total | | | | | | 6 476 218 | 6 047 061 | 7 356 580

Table 4.7: Estimated reserves in the model of Lemma 4.21

Hence we obtain a weighted average between the a priori estimate E[C_{i,J}] and the chain-ladder estimate Ĉ^{CL}_{i,J} on the log-scale. Together with the bias correction this leads to a multiplicative credibility formula. In Table 4.7 we see that the weights 1 − α_{i,I−i} given to the a priori mean are rather low.

For the conditional mean square error of prediction we have

msep_{C_{i,J}|C_{i,I−i}}(Ĉ^{Go}_{i,J}) = Var(C_{i,J} | C_{i,I−i})
= exp{2 μ_post(i,I−i) + σ²_post(i,I−i)} (exp{σ²_post(i,I−i)} − 1)
= E[C_{i,J} | C_{i,I−i}]² (exp{σ²_post(i,I−i)} − 1)   (4.84)
= (Ĉ^{Go}_{i,J})² (exp{σ²_post(i,I−i)} − 1).

This holds under the assumption that the parameters β_j, μ_i, σ_i and a² are known. Hence it is not directly comparable to the mean square error of prediction obtained from the chain-ladder model, since we have no canonical model for the estimation of these parameters and hence we cannot quantify the estimation error. If we want to compare this mean square error of prediction to the ones obtained in Corollary 4.17, we need to calculate the unconditional version:

msep_{C_{i,J}}(Ĉ^{Go}_{i,J}) = E[(C_{i,J} − Ĉ^{Go}_{i,J})²] = E[Var(C_{i,J} | C_{i,I−i})]   (4.85)
= E[(Ĉ^{Go}_{i,J})²] (exp{σ²_post(i,I−i)} − 1).
Hence we need the distribution of Ĉ^{CL}_{i,J} = C_{i,I−i}/β_{I−i} (cf. (4.83)). Using (4.74)
Chapter 4. Bayesian models 107

we obtain

f_{C_{i,I−i}}(x) = ∫_{R+} f_{C_{i,I−i},C_{i,J}}(x, y) dy   (4.86)
= (2π (σ²_i + τ²_{I−i}))^{−1/2} x^{−1} exp{−(1/2) (log(x/β_{I−i}) + τ²_{I−i}/2 − μ_i)²/(σ²_i + τ²_{I−i})},

where the integration over y uses the fact that the first factor of the integrand in (4.74) is, as a function of y, a Log-normal density with parameters (σ²_i c(x) + τ²_{I−i} μ_i)/(σ²_i + τ²_{I−i}) and σ²_i τ²_{I−i}/(σ²_i + τ²_{I−i}), and hence integrates to 1.

This shows that the estimator Ĉ^{CL}_{i,J} = C_{i,I−i}/β_{I−i} is log-normally distributed with parameters μ_i − τ²_{I−i}/2 and σ²_i + τ²_{I−i}. Moreover, the multiplicative reproductiveness of the Log-normal distribution implies that for γ > 0

(Ĉ^{CL}_{i,J})^γ ~ LN(γ μ_i − γ τ²_{I−i}/2, γ² (σ²_i + τ²_{I−i})).   (4.87)

Using (4.83) and (4.57) this leads to

msep_{C_{i,J}}(Ĉ^{Go}_{i,J}) = E[(Ĉ^{Go}_{i,J})²] (exp{σ²_post(i,I−i)} − 1)   (4.88)
= (E[C_{i,J}])^{2(1−α_{i,I−i})} exp{α_{i,I−i} τ²_{I−i}} (exp{σ²_post(i,I−i)} − 1) E[(Ĉ^{CL}_{i,J})^{2 α_{i,I−i}}]
= exp{2 μ_i + (1 − α_{i,I−i}) σ²_i + 2 α²_{i,I−i} (σ²_i + τ²_{I−i})} (exp{σ²_post(i,I−i)} − 1).

Observe that

α_{i,I−i} (σ²_i + τ²_{I−i}) = σ²_i   (4.89)

(cf. (4.66)). This immediately implies the following corollary:

Corollary 4.25 Under the assumptions of Lemma 4.21 we have

msep_{C_{i,J}}(Ĉ^{Go}_{i,J}) = exp{2 μ_i + (1 + α_{i,I−i}) σ²_i} (exp{σ²_post(i,I−i)} − 1)   (4.90)

for all I − J + 1 ≤ i ≤ I.
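Formula (4.90) depends only on μ_i, σ²_i and τ²_{I−i}; a small sketch (the function name is invented, the inputs in the check below are arbitrary):

```python
from math import exp

def msep_gogol(mu_i, sigma2_i, tau2):
    """Unconditional msep (4.90) of the Gogol estimator;
    tau2 = tau^2_{I-i} of the observed diagonal cell."""
    alpha = sigma2_i / (sigma2_i + tau2)            # credibility weight (4.66)
    s2_post = sigma2_i * tau2 / (sigma2_i + tau2)   # posterior variance (4.65)
    return exp(2 * mu_i + (1 + alpha) * sigma2_i) * (exp(s2_post) - 1)
```

For τ²_{I−i} = 0 the posterior variance vanishes and so does the prediction error, consistent with the observation determining the ultimate exactly in that degenerate case.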
108 Chapter 4. Bayesian models

i | msep^{1/2}_{C_{i,J}|C_{i,I−i}}(Ĉ^{Go}_{i,J}) | msep^{1/2}_{C_{i,J}}(Ĉ^{Go}_{i,J}) | msep^{1/2}(R̂_i(c_i))
0 | | |
1 | 16 391 | 17 526 | 17 527
2 | 21 602 | 22 279 | 22 282
3 | 23 714 | 25 875 | 25 879
4 | 37 561 | 42 139 | 42 153
5 | 51 584 | 58 825 | 58 862
6 | 68 339 | 81 644 | 81 745
7 | 82 516 | 105 397 | 105 626
8 | 129 667 | 162 982 | 163 852
9 | 309 586 | 363 331 | 372 199
Total | 359 869 | 427 850 | 435 814

Table 4.8: Mean square errors of prediction under the assumptions of Lemma 4.21 and in Model 4.14

4.2.3 Overdispersed Poisson model with gamma a priori distribution

In the next subsections we consider a different class of Bayesian models. In Model Assumptions 4.14 we had a distributional assumption on C_{i,j} given the ultimate claim C_{i,J}, which can be seen as a backward iteration. Now we introduce a latent variable Θ_i. Conditioned on Θ_i, we make distributional assumptions on the cumulative sizes C_{i,j} and the incremental quantities X_{i,j}, respectively. Θ_i describes the risk characteristics of accident year i (e.g. was it a good or a bad year), and C_{i,j} is then a random variable with parameters which depend on Θ_i. In the spirit of the previous chapters, Θ_i reflects the prediction uncertainties.

We start with the overdispersed Poisson model. The overdispersed Poisson model differs from the Poisson Model 2.12 in that the variance is not equal to the mean. This model was introduced for claims reserving in a Bayesian context by Verrall [79, 81, 82] and Renshaw-Verrall [65]. Furthermore, the overdispersed Poisson model is also used in a generalized linear model context (see McCullagh-Nelder [53], England-Verrall [25] and references therein, and Chapter 5 below). The overdispersed Poisson model as considered below can be generalized to the exponential dispersion family; this is done in Subsection 4.2.4. We start with the overdispersed Poisson model with gamma a priori distribution (cf. Verrall [81, 82]).
Model Assumptions 4.26 (Overdispersed Poisson-gamma model)
There exist random variables Θ_i and Z_{i,j} as well as constants φ_i > 0 and γ_0, ..., γ_J > 0 with Σ_{j=0}^{J} γ_j = 1 such that for all i ∈ {0, ..., I} and j ∈ {0, ..., J} we have:
Chapter 4. Bayesian models 109

conditionally, given Θ_i, the Z_{i,j} are independent and Poisson distributed, and the incremental variables X_{i,j} = φ_i Z_{i,j} satisfy

E[X_{i,j} | Θ_i] = Θ_i γ_j  and  Var(X_{i,j} | Θ_i) = φ_i Θ_i γ_j.   (4.91)

The pairs (Θ_i, X_{i,0}, ..., X_{i,J}), i = 0, ..., I, are independent, and Θ_i is Gamma distributed with shape parameter a_i and scale parameter b_i.

Remarks 4.27

See appendix, Sections B.1.2 and B.2.3, for the definition of the Poisson and the Gamma distribution.

Observe that, given Θ_i, the expectation and variance of Z_{i,j} satisfy

E[Z_{i,j} | Θ_i] = Var(Z_{i,j} | Θ_i) = Θ_i γ_j/φ_i.   (4.92)

The a priori expectation of the increments X_{i,j} is given by

E[X_{i,j}] = E[E[X_{i,j} | Θ_i]] = γ_j E[Θ_i] = γ_j a_i/b_i.   (4.93)

For the cumulative ultimate claim we obtain

C_{i,J} = φ_i Σ_{j=0}^{J} Z_{i,j}.   (4.94)

This implies that conditionally, given Θ_i,

C_{i,J}/φ_i ~ Poisson(Θ_i/φ_i)  and  E[C_{i,J} | Θ_i] = Θ_i,   (4.95)

this means that Θ_i plays the role of the unknown total expected claim amount of accident year i. The Bayesian approach chosen tells us how we should combine the a priori expectation E[C_{i,J}] = a_i/b_i and the information D_I.

This model is sometimes problematic in practical applications: it assumes that we have no negative increments X_{i,j}. If we count the number of reported claims this may hold true. However, if X_{i,j} denotes incremental payments, we can have negative values; e.g. in motor hull insurance in late development periods one receives more money via subrogation and repayments of deductibles than one spends.
110 Chapter 4. Bayesian models

Observe that we have assumed that the claims development pattern γ_j is known.

Observe that in the overdispersed Poisson model, in general, C_{i,j} is not a natural number. Henceforth, if we work with claims counts with dispersion φ_i ≠ 1, there is not really an interpretation for this model.

Lemma 4.28 Under Model Assumptions 4.26 the a posteriori distribution of Θ_i, given X_{i,0}, ..., X_{i,j}, is a Gamma distribution with updated parameters

a_post(i,j) = a_i + C_{i,j}/φ_i,   (4.96)
b_post(i,j) = b_i + Σ_{k=0}^{j} γ_k/φ_i = b_i + β_j/φ_i,   (4.97)

where β_j = Σ_{k=0}^{j} γ_k.

Remarks 4.29

Since accident years are independent, it suffices to consider X_{i,0}, ..., X_{i,j} for the calculation of the a posteriori distribution of Θ_i.

We assume that a priori all accident years are equal (the Θ_i are i.i.d.). After we have a set of observations D_I, we obtain a posteriori risk characteristics which differ according to the observations.

Model 4.26 belongs to the well-known class of exponential dispersion models with associated conjugates (see e.g. Bühlmann-Gisler [18], Subsection 2.5.1, and Subsection 4.2.4 below).

Using Lemma 4.28 we obtain for the a posteriori expectation

E[Θ_i | D_I] = a_post(i,I−i)/b_post(i,I−i) = (a_i + C_{i,I−i}/φ_i)/(b_i + β_{I−i}/φ_i)   (4.98)
= (b_i/(b_i + β_{I−i}/φ_i)) a_i/b_i + (1 − b_i/(b_i + β_{I−i}/φ_i)) C_{i,I−i}/β_{I−i},

which is a credibility weighted average between the a priori expectation E[Θ_i] = a_i/b_i and the observation C_{i,I−i}/β_{I−i} (see the next section and Bühlmann-Gisler [18] for more detailed discussions).
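The conjugate update (4.96)-(4.98) is a two-line computation; the sketch below (invented function names and test numbers) also verifies that the credibility form of (4.98) agrees with the ratio of the updated Gamma parameters:

```python
def gamma_posterior(a, b, c_obs, beta_j, phi):
    """Updated Gamma parameters (4.96)-(4.97), with c_obs = C_{i,j}."""
    return a + c_obs / phi, b + beta_j / phi

def posterior_theta_mean(a, b, c_obs, beta_j, phi):
    """Credibility form (4.98) of E[Theta_i | D_I]."""
    w = b / (b + beta_j / phi)          # weight given to the a priori mean a/b
    return w * a / b + (1 - w) * c_obs / beta_j
```

The weight on the a priori mean shrinks as β_{I−i} grows (more of the year observed) or as the dispersion φ_i shrinks (less noisy observations), exactly as a credibility formula should behave.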
In fact, we can specify the a posteriori distribution of (C_{i,J} − C_{i,I−i})/φ_i, given D_I. Abbreviating a^{post} = a^{post}_{i,I−i} and b^{post} = b^{post}_{i,I−i}, it holds for k ∈ {0, 1, …} that

    P[(C_{i,J} − C_{i,I−i})/φ_i = k | D_I]
      = ∫_{R_+} e^{−(1−β_{I−i})θ} ((1−β_{I−i})θ)^k / k! · (b^{post})^{a^{post}}/Γ(a^{post}) · θ^{a^{post}−1} e^{−b^{post}θ} dθ
      = (b^{post})^{a^{post}} (1−β_{I−i})^k / (k! Γ(a^{post})) ∫_{R_+} θ^{k+a^{post}−1} e^{−(b^{post}+1−β_{I−i})θ} dθ
      = (b^{post})^{a^{post}} (1−β_{I−i})^k / (k! Γ(a^{post})) · Γ(k+a^{post}) / (b^{post}+1−β_{I−i})^{k+a^{post}}
      = Γ(k+a^{post}) / (k! Γ(a^{post})) · (b^{post}/(b^{post}+1−β_{I−i}))^{a^{post}} · ((1−β_{I−i})/(b^{post}+1−β_{I−i}))^k,   (4.99)

where the second step uses that the integrand is, up to normalization, the density of a Γ(k+a^{post}, b^{post}+1−β_{I−i}) distribution. This is a Negative binomial distribution with parameters r = a^{post}_{i,I−i} and p = b^{post}_{i,I−i}/(b^{post}_{i,I−i} + 1 − β_{I−i}) (see appendix, Section B.1.3).

Proof (of Lemma 4.28). Using (4.92) we obtain for the conditional density of X_{i,0}, …, X_{i,j}, given Θ_i,

    f_{X_{i,0},…,X_{i,j}|Θ_i}(x_0, …, x_j | θ) = ∏_{k=0}^{j} exp{−θγ_k/φ_i} (θγ_k/φ_i)^{x_k/φ_i} / (x_k/φ_i)!.   (4.100)

Hence the joint density of Θ_i and X_{i,0}, …, X_{i,j} is given by

    f_{Θ_i,X_{i,0},…,X_{i,j}}(θ, x_0, …, x_j) = f_{X_{i,0},…,X_{i,j}|Θ_i}(x_0, …, x_j | θ) f_{Θ_i}(θ)
      = [ ∏_{k=0}^{j} exp{−θγ_k/φ_i} (θγ_k/φ_i)^{x_k/φ_i} / (x_k/φ_i)! ] · (b_i^{a_i}/Γ(a_i)) θ^{a_i−1} e^{−b_i θ}.   (4.101)

As a function of θ this is proportional to θ^{a_i + Σ_k x_k/φ_i − 1} e^{−(b_i + β_j/φ_i)θ}, which shows that the a posteriori distribution of Θ_i, given X_{i,0}, …, X_{i,j}, is again a Gamma distribution with the updated parameters

    a^{post}_{i,j} = a_i + C_{i,j}/φ_i,   (4.102)
    b^{post}_{i,j} = b_i + Σ_{k=0}^{j} γ_k/φ_i.   (4.103)

This finishes the proof of the lemma.
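As a quick plausibility check of (4.99), the following sketch (hypothetical posterior parameters) evaluates the Negative binomial probabilities with r = a^{post} and p = b^{post}/(b^{post} + 1 − β_{I−i}), and confirms that they form a proper distribution whose mean equals (1 − β_{I−i}) a^{post}/b^{post}, i.e. the expected outstanding claims divided by φ_i:

```python
from math import lgamma, exp

def negbin_pmf(k, r, p):
    """P[N = k] = Gamma(k+r)/(k! Gamma(r)) * p^r * (1-p)^k, cf. (4.99)."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r)) * p ** r * (1.0 - p) ** k

# hypothetical posterior parameters and development pattern
a_post, b_post, beta = 12.0, 0.508, 0.8
r = a_post
p = b_post / (b_post + (1.0 - beta))

total = sum(negbin_pmf(k, r, p) for k in range(2000))
mean = sum(k * negbin_pmf(k, r, p) for k in range(2000))
```

The mean r(1−p)/p simplifies to (1 − β_{I−i}) a^{post}/b^{post}, consistent with the posterior expectation used in (4.104) below.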
Using the conditional independence of the X_{i,j}, given Θ_i, and (4.91) we obtain

    E[C_{i,J} | D_I] = E[ E[C_{i,J} | Θ_i, D_I] | D_I ]
      = E[ E[ Σ_{j=0}^{I−i} X_{i,j} | Θ_i, D_I ] | D_I ] + E[ E[ Σ_{j=I−i+1}^{J} X_{i,j} | Θ_i, D_I ] | D_I ]   (4.104)
      = C_{i,I−i} + E[ E[ Σ_{j=I−i+1}^{J} X_{i,j} | Θ_i ] | D_I ]
      = C_{i,I−i} + (1 − β_{I−i}) E[Θ_i | D_I].

Together with (4.98) this motivates the following estimator:

Estimator 4.30 (Poisson-gamma model, Verrall [81, 82]) Under Model Assumptions 4.26 we have the following estimator for the ultimate claim E[C_{i,J} | D_I]:

    Ĉ^{PoiGa}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) [ b_i/(b_i + β_{I−i}/φ_i) · a_i/b_i + (1 − b_i/(b_i + β_{I−i}/φ_i)) · C_{i,I−i}/β_{I−i} ]   (4.105)

for I − J + 1 ≤ i ≤ I.

Example 4.31 (Poisson-gamma model)
We revisit the data set given in Example 2.7. For the a priori parameters we make the same choices as in Example 4.19 (see Table 4.4). Since Θ_i is Gamma distributed with shape parameter a_i and scale parameter b_i, we have

    E[Θ_i] = a_i/b_i,   (4.106)
    Vco(Θ_i) = a_i^{−1/2},   (4.107)

and, using (4.91), we obtain

    Var(C_{i,J}) = E[Var(C_{i,J} | Θ_i)] + Var(E[C_{i,J} | Θ_i]) = φ_i E[Θ_i] + Var(Θ_i) = (a_i/b_i)(φ_i + b_i^{−1}).   (4.108)

This leads to Table 4.9. We define

    α_{i,I−i} = (β_{I−i}/φ_i) / (b_i + β_{I−i}/φ_i),   (4.109)
     i   E[Θ_i]        Vco(Θ_i)   Vco(C_{i,J})   a_i    b_i        φ_i
     0   11 653 101    5.00%      7.8%           400    0.00343%   41 951
     1   11 367 306    5.00%      7.8%           400    0.00352%   40 922
     2   10 962 965    5.00%      7.8%           400    0.00365%   39 467
     3   10 616 762    5.00%      7.8%           400    0.00377%   38 220
     4   11 044 881    5.00%      7.8%           400    0.00362%   39 762
     5   11 480 700    5.00%      7.8%           400    0.00348%   41 331
     6   11 413 572    5.00%      7.8%           400    0.00350%   41 089
     7   11 126 527    5.00%      7.8%           400    0.00360%   40 055
     8   10 986 548    5.00%      7.8%           400    0.00364%   39 552
     9   11 618 437    5.00%      7.8%           400    0.00344%   41 826

Table 4.9: Parameter choice for the Poisson-gamma model

which is the credibility weight given to the observation C_{i,I−i}/β_{I−i}, cf. (4.98). The credibility weights and the estimates for the ultimates are given in Table 4.10. Observe that

    α_{i,I−i} = β_{I−i} / (β_{I−i} + φ_i b_i) = β_{I−i} / ( β_{I−i} + E[Var(X_{i,I−i} | Θ_i)] / (γ_{I−i} Var(Θ_i)) ).   (4.110)

The term E[Var(X_{i,I−i} | Θ_i)] / (γ_{I−i} Var(Θ_i)) is the so-called credibility coefficient (see also Remark 4.38).

                                               a^{post}_{i,I−i}/                       estimated reserves
     i   C_{i,I−i}    β_{I−i}   α_{i,I−i}   b^{post}_{i,I−i}   Ĉ^{PoiGa}_{i,J}      PoiGa         CL         BF
     0   11 148 124   100.0%    41.0%       11 446 143         11 148 124                 0          0          0
     1   10 648 192    99.9%    40.9%       11 079 028         10 663 907            15 715     15 126     16 124
     2   10 635 751    99.8%    40.9%       10 839 802         10 662 446            26 695     26 257     26 998
     3    9 724 068    99.6%    40.9%       10 265 794          9 760 401            36 333     34 538     37 575
     4    9 786 916    99.1%    40.8%       10 566 741          9 878 219            91 303     85 302     95 434
     5    9 935 753    98.4%    40.6%       10 916 902         10 105 034           169 281    156 494    178 024
     6    9 282 022    97.0%    40.3%       10 670 762          9 601 115           319 093    286 121    341 305
     7    8 256 211    94.8%    39.7%       10 165 120          8 780 696           524 484    449 167    574 089
     8    7 648 729    88.0%    37.9%       10 116 206          8 862 913         1 214 184  1 043 242  1 318 646
     9    5 675 568    59.0%    29.0%       11 039 755         10 206 452         4 530 884  3 950 815  4 768 384
  total                                                                           6 927 973  6 047 061  7 356 580

Table 4.10: Estimated reserves in the Poisson-gamma model
The conditional mean square error of prediction is given by

    msep_{C_{i,J}|D_I}(Ĉ^{PoiGa}_{i,J}) = E[ (C_{i,J} − Ĉ^{PoiGa}_{i,J})² | D_I ]
      = E[ ( Σ_{j=I−i+1}^{J} X_{i,j} − (1 − β_{I−i}) E[Θ_i | D_I] )² | D_I ]
      = E[ ( Σ_{j=I−i+1}^{J} (X_{i,j} − γ_j E[Θ_i | D_I]) )² | D_I ],   (4.111)

cf. (4.104)-(4.105). Since for j > I − i

    E[X_{i,j} | D_I] = E[ E[X_{i,j} | Θ_i, D_I] | D_I ] = E[ E[X_{i,j} | Θ_i] | D_I ] = γ_j E[Θ_i | D_I],   (4.112)

we have that

    msep_{C_{i,J}|D_I}(Ĉ^{PoiGa}_{i,J}) = Var( Σ_{j=I−i+1}^{J} X_{i,j} | D_I ).   (4.113)

This last expression can be calculated explicitly. We do the complete calculation, but we could also argue with the help of the negative binomial distribution. Using the conditional independence of the X_{i,j}, given Θ_i, and (4.91) we obtain

    Var( Σ_{j=I−i+1}^{J} X_{i,j} | D_I )
      = E[ Var( Σ_{j=I−i+1}^{J} X_{i,j} | Θ_i ) | D_I ] + Var( E[ Σ_{j=I−i+1}^{J} X_{i,j} | Θ_i ] | D_I )   (4.114)
      = E[ Σ_{j=I−i+1}^{J} φ_i Θ_i γ_j | D_I ] + Var( Σ_{j=I−i+1}^{J} Θ_i γ_j | D_I )
      = φ_i (1 − β_{I−i}) E[Θ_i | D_I] + (1 − β_{I−i})² Var(Θ_i | D_I).

With Lemma 4.28 this leads to the following corollary:

Corollary 4.32 Under Model Assumptions 4.26 the conditional mean square error of prediction is given by

    msep_{C_{i,J}|D_I}(Ĉ^{PoiGa}_{i,J}) = φ_i (1 − β_{I−i}) a^{post}_{i,I−i}/b^{post}_{i,I−i} + (1 − β_{I−i})² a^{post}_{i,I−i}/(b^{post}_{i,I−i})²   (4.115)

for I − J + 1 ≤ i ≤ I.
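Formula (4.115) splits into a process variance term, driven by the posterior mean of Θ_i, and a parameter uncertainty term, driven by its posterior variance. A minimal sketch (hypothetical inputs):

```python
def msep_poisson_gamma(phi, beta, a_post, b_post):
    """Conditional MSEP (4.115): process variance plus posterior
    parameter uncertainty of Theta_i."""
    process = phi * (1.0 - beta) * (a_post / b_post)       # E[Theta | D_I] part
    parameter = (1.0 - beta) ** 2 * a_post / b_post ** 2   # Var(Theta | D_I) part
    return process + parameter
```

As β_{I−i} → 1 (the accident year is fully developed) both terms vanish.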
Remark. Observe that we have assumed that the parameters a_i, b_i, φ_i and γ_j are known. If these need to be estimated, we obtain an additional term in the MSEP calculation which corresponds to the parameter estimation error.

The unconditional mean square error of prediction can then easily be calculated. We have

    msep_{C_{i,J}}(Ĉ^{PoiGa}_{i,J}) = E[ msep_{C_{i,J}|D_I}(Ĉ^{PoiGa}_{i,J}) ]   (4.116)
      = φ_i (1 − β_{I−i}) E[ a^{post}_{i,I−i}/b^{post}_{i,I−i} ] + (1 − β_{I−i})² E[ a^{post}_{i,I−i}/(b^{post}_{i,I−i})² ],

and using E[C_{i,I−i}] = β_{I−i} a_i/b_i (cf. (4.93)) we obtain

    msep_{C_{i,J}}(Ĉ^{PoiGa}_{i,J}) = φ_i (1 − β_{I−i}) (a_i/b_i) ( 1 + (1 − β_{I−i}) / (φ_i b_i + β_{I−i}) ).   (4.117)

Hence we obtain Table 4.11 for the conditional prediction errors.

          msep^{1/2}_{C_{i,J}|C_{i,I−i}}            msep^{1/2}_{C_{i,J}}
     i    Ĉ^{PoiGa}_{i,J}   Ĉ^{LN}_{i,J}     Ĉ^{PoiGa}_{i,J}   Ĉ^{LN}_{i,J}   Ĉ_{i,J} (Model 4.14)
     0
     1     25 367    16 391     25 695    18 832    17 527
     2     32 475    21 602     32 659    23 940    22 282
     3     37 292    23 714     37 924    27 804    25 879
     4     60 359    37 561     61 710    45 276    42 153
     5     83 912    51 584     86 052    63 200    58 862
     6    115 212    68 339    119 155    87 704    81 745
     7    146 500    82 516    153 272   113 195   105 626
     8    224 738   129 667    234 207   174 906   163 852
     9    477 318   309 586    489 668   388 179   372 199
  total   571 707   359 869    588 809   457 739   435 814

Table 4.11: Mean square errors of prediction in the Poisson-gamma model, the Log-normal/Log-normal model and in Model 4.14

We have already seen in Table 4.10 that the Poisson-gamma reserves are closer to the Bornhuetter-Ferguson estimate (this stands in contrast to the other methods presented in this chapter). Table 4.11 shows that the prediction error is substantially larger in the Poisson-gamma model than in the other models (comparable to the estimation error in R_i in Table 4.5). This suggests that in the present case the Poisson-gamma method is not an appropriate method.
4.2.4 Exponential dispersion family with its associate conjugates

In the subsection above we have seen that in the Poisson-gamma model the a posteriori distribution of Θ_i is again a Gamma distribution with updated parameters. That is, by a smart choice of the distributions we were able to calculate the a posteriori distribution explicitly. We now generalize the Poisson-gamma model to the exponential dispersion family (EDF) and look for its associate conjugates. These are standard models in Bayesian inference; for literature we refer e.g. to Bernardo-Smith [9]. Similar ideas have been applied for tariffication and pricing (see Bühlmann-Gisler [18], Chapter 2); we transfer these ideas to the reserving context (see also Wüthrich [89]).

Model Assumptions 4.33 (Exponential dispersion model)

- There exists a claims development pattern (β_j)_{0≤j≤J} with β_J = 1, γ_0 = β_0 > 0 and γ_j = β_j − β_{j−1} > 0 for j ∈ {1, …, J}.
- Conditionally, given Θ_i, the X_{i,j} (0 ≤ i ≤ I, 0 ≤ j ≤ J) are independent with

    X_{i,j}/(γ_j μ_i) ~ dF^{(i,j)}_{Θ_i}(x) = a(x, σ²/w_{i,j}) exp{ (x Θ_i − b(Θ_i)) / (σ²/w_{i,j}) } dν(x),   (4.118)

  where ν is a suitable σ-finite measure on R, b(·) is some real-valued twice-differentiable function of Θ_i, μ_i > 0, σ² and w_{i,j} > 0 are some real-valued constants, and F^{(i,j)}_{Θ_i} is a probability distribution on R.
- The random vectors (Θ_i, X_{i,0}, …, X_{i,J}), i = 0, …, I, are independent, and the Θ_i are real-valued random variables with densities w.r.t. the Lebesgue measure given by

    u_{μ,τ²}(θ) = d(μ, τ²) exp{ (μθ − b(θ)) / τ² },   (4.119)

  with μ = 1 and τ² > 0.

Remarks 4.34

- In the following, the measure ν is given by the Lebesgue measure or by the counting measure.
- A distribution of the type (4.118) is said to belong to a one-parametric exponential dispersion family (EDF). The class of one-parametric exponential dispersion families covers a large class of families of distributions, e.g. the Poisson, Bernoulli, Gamma, Normal and Inverse-Gaussian families (cf. Bühlmann-Gisler [18], Section 2.5).
- The first assumption implies that the scaled sizes Y_{i,j} = X_{i,j}/(γ_j μ_i) have, given Θ_i, a distribution F^{(i,j)}_{Θ_i} which belongs to the EDF. A priori they are all alike, which is expressed by the fact that μ and τ² do not depend on i.
- For the time being we assume that all parameters of the underlying distributions are known; w_{i,j} is a known volume measure which will be specified further below.
- For the moment we could also concentrate on a single accident year i, i.e. we only need Model Assumptions 4.33 for a fixed accident year i.
- A pair of distributions given by (4.118) and (4.119) is said to form a one-parametric exponential dispersion family with associate conjugates. Examples are (see Bühlmann-Gisler [18], Section 2.5): the Poisson-gamma case (see also Verrall [81, 82] and Subsection 4.2.3), the Binomial-beta case, the Gamma-gamma case and the Normal-normal case.

We have the following lemma:

Lemma 4.35 (Associate conjugate) Under Model Assumptions 4.33 the conditional distribution of Θ_i, given X_{i,0}, …, X_{i,j}, has the density u_{μ^{(i)}_{post,j}, τ²_{post,j}} with the a posteriori parameters

    τ²_{post,j} = σ² [ σ²/τ² + Σ_{k=0}^{j} w_{i,k} ]^{−1},   (4.120)
    μ^{(i)}_{post,j} = (τ²_{post,j}/σ²) [ σ²/τ² + ( Σ_{k=0}^{j} w_{i,k} ) Ȳ^{(i)}_j ],   (4.121)

where

    Ȳ^{(i)}_j = Σ_{k=0}^{j} ( w_{i,k} / Σ_{l=0}^{j} w_{i,l} ) · X_{i,k}/(γ_k μ_i).   (4.122)
Proof. Define Y_{i,j} = X_{i,j}/(γ_j μ_i). The joint density of Θ_i, Y_{i,0}, …, Y_{i,j} is given by

    f_{Θ_i,Y_{i,0},…,Y_{i,j}}(θ, y_0, …, y_j) = f_{Y_{i,0},…,Y_{i,j}|Θ_i}(y_0, …, y_j | θ) u_{1,τ²}(θ)
      = d(1, τ²) exp{ (θ − b(θ))/τ² } ∏_{k=0}^{j} a(y_k, σ²/w_{i,k}) exp{ (y_k θ − b(θ)) / (σ²/w_{i,k}) }.   (4.123)

Hence the conditional density of Θ_i, given X_{i,0}, …, X_{i,j}, is proportional to

    exp{ θ [ 1/τ² + Σ_{k=0}^{j} (w_{i,k}/σ²) X_{i,k}/(γ_k μ_i) ] − b(θ) [ 1/τ² + Σ_{k=0}^{j} w_{i,k}/σ² ] }.   (4.124)

This finishes the proof of the lemma.

Remarks 4.36

- Lemma 4.35 states that the distribution defined by the density (4.119) is a conjugate distribution to the distribution given by (4.118). This means that the a posteriori distribution of Θ_i, given X_{i,0}, …, X_{i,j}, is again of the type (4.119) with updated parameters τ²_{post,j} and μ^{(i)}_{post,j}.
- From Lemma 4.35 we can calculate the distribution of Y_{i,I−i+1}, …, Y_{i,J}, given D_I. First we remark that different accident years are independent, hence we can restrict ourselves to the observations Y_{i,0}, …, Y_{i,I−i}; the a posteriori distribution is then given by

    ∫ ∏_{j=I−i+1}^{J} F^{(i,j)}_θ(y_j) u_{μ^{(i)}_{post,I−i}, τ²_{post,I−i}}(θ) dθ.   (4.125)

  In the Poisson-gamma case this is a negative binomial distribution. Observe that for the EDF with its associate conjugates we can determine the explicit distributions, not only estimates for the first two moments.

Theorem 4.37 Under Model Assumptions 4.33 we have for i = 0, …, I and j = 0, …, J:

1. The conditional moments of the standardized observations X_{i,j}/(γ_j μ_i) are given by

    μ(Θ_i) := E[ X_{i,j}/(γ_j μ_i) | Θ_i ] = b′(Θ_i),   (4.126)
    Var( X_{i,j}/(γ_j μ_i) | Θ_i ) = (σ²/w_{i,j}) b″(Θ_i).   (4.127)
2. If exp{ (μ_i θ − b(θ))/τ² } disappears on the boundary of the domain of Θ_i for all μ_i, τ², then

    E[X_{i,j}] = γ_j μ_i E[μ(Θ_i)] = γ_j μ_i,   (4.128)
    E[ μ(Θ_i) | X_{i,0}, …, X_{i,j} ] = α_{i,j} Ȳ^{(i)}_j + (1 − α_{i,j}) · 1,   (4.129)

where

    α_{i,j} = Σ_{k=0}^{j} w_{i,k} / ( Σ_{k=0}^{j} w_{i,k} + σ²/τ² ).   (4.130)

Proof. See Lemma 5 below, Theorem 2.20 in Bühlmann-Gisler [18] or Bernardo-Smith [9].

Remarks 4.38

- In Model Assumptions 4.33 and in Theorem 4.37 we study the standardized version of the observations X_{i,j}. If the μ_i were equal for all i, the standardization would not be necessary; since they are not, the standardized version makes comparisons between accident years straightforward.
- Theorem 4.37 says that the a posteriori mean of μ(Θ_i), given the observations X_{i,0}, …, X_{i,j}, is a credibility weighted average between the a priori mean E[μ(Θ_i)] = 1 and the weighted average Ȳ^{(i)}_j of the standardized observations.
- The larger the individual variation σ², the smaller the credibility weight α_{i,j}; the larger the collective variability τ², the larger the credibility weight α_{i,j}. For a detailed discussion of the credibility coefficient

    κ = σ²/τ²   (4.131)

  we refer to Bühlmann-Gisler [18].

Estimator 4.39 Under Model Assumptions 4.33 we have the following estimators for the increments E[X_{i,I−i+k} | D_I] and the ultimate claims E[C_{i,J} | D_I]:

    X̂^{EDF}_{i,I−i+k} = γ_{I−i+k} μ_i μ̂(Θ_i),   (4.132)
    Ĉ^{EDF}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) μ_i μ̂(Θ_i)   (4.133)

for I − J + 1 ≤ i ≤ I and k ∈ {1, …, J − I + i}, where

    μ̂(Θ_i) = E[ μ(Θ_i) | D_I ] = α_{i,I−i} Ȳ^{(i)}_{I−i} + (1 − α_{i,I−i}) · 1.   (4.134)
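The credibility structure of (4.129)-(4.130) and of Estimator 4.39 can be sketched as follows (plain Python; the weights, σ² and τ² are hypothetical):

```python
def credibility_weight(weights, sigma2, tau2):
    """alpha_{i,j} of (4.130) for weights = [w_{i,0}, ..., w_{i,j}]."""
    w = sum(weights)
    return w / (w + sigma2 / tau2)

def edf_posterior_mean(Y_bar, alpha):
    """Posterior mean (4.129)/(4.134): credibility average between the
    standardized observation average Y_bar and the a priori mean 1."""
    return alpha * Y_bar + (1.0 - alpha) * 1.0

def edf_ultimate(C_obs, beta, mu, mu_theta_hat):
    """Estimator (4.133) for the ultimate claim."""
    return C_obs + (1.0 - beta) * mu * mu_theta_hat
```

A larger σ² (individual variation) shrinks α_{i,j} towards the a priori mean, while a larger τ² (collective variability) pushes it towards the observations, as stated in Remarks 4.38.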
Theorem 4.40 (Bayesian estimator) Under Model Assumptions 4.33 the estimators μ̂(Θ_i), X̂^{EDF}_{i,I−i+k} and Ĉ^{EDF}_{i,J} are D_I-measurable and minimize the conditional mean square errors msep_{μ(Θ_i)|D_I}, msep_{X_{i,I−i+k}|D_I} and msep_{C_{i,J}|D_I}, respectively, for I − J + 1 ≤ i ≤ I. I.e. these estimators are Bayesian w.r.t. D_I and minimize the quadratic loss function (L²/P-norm).

Proof. The D_I-measurability is clear. But then the claim for μ̂(Θ_i) is also clear, since the conditional expectation minimizes the mean square error given D_I (see Theorem 2.5 in [18]). Due to our independence assumptions we have

    E[X_{i,I−i+k} | D_I] = E[ E[X_{i,I−i+k} | Θ_i] | D_I ] = γ_{I−i+k} μ_i μ̂(Θ_i),   (4.135)
    E[C_{i,J} | D_I] = C_{i,I−i} + (1 − β_{I−i}) μ_i μ̂(Θ_i),   (4.136)

which finishes the proof of the theorem.

Explicit choice of weights. W.l.o.g. we may and will assume that

    m_b = E[b″(Θ_i)] = 1.   (4.137)

Otherwise we simply multiply σ² and τ² by m_b, which in our context of the EDF with associate conjugates leads to the same model with b(θ) replaced by b_1(θ) = m_b b(θ/m_b). This rescaled model then has

    Var( X_{i,j}/(γ_j μ_i) | Θ_i ) = m_b σ² b″_1(Θ_i) / w_{i,j},   with E[b″_1(Θ_i)] = 1,   (4.138)
    Var(b′_1(Θ_i)) = m_b τ².   (4.139)

Since both σ² and τ² are multiplied by m_b, the credibility weights α_{i,j} do not change under this transformation. Hence we assume (4.137) for the rest of this work.

In Section 4.2.4 we have not yet specified the weights w_{i,j}. In Mack [47] there is a discussion of how to choose appropriate weights (Assumption (A4α) in Mack [47]). In fact, we could choose a design matrix (w_{i,j}) which gives a whole family of models. We do not discuss this further here; we make a canonical choice which is favoured in many applications and has the nice side effect that we obtain a natural mixture between the chain-ladder estimate and the Bornhuetter-Ferguson estimate.

Model Assumptions 4.41
In addition to Model Assumptions 4.33 and (4.137) we assume that there exists δ ≥ 0 with w_{i,j} = γ_j μ^δ_i for all i = 0, …, I and j = 0, …, J, and that exp{ (μ_0 θ − b(θ))/τ² } disappears on the boundary of the domain of Θ_i for all μ_0 and τ².

Hence we have Σ_{k=0}^{j} w_{i,k} = β_j μ^δ_i. This immediately implies:

Corollary 4.42 Under Model Assumptions 4.41 we have for all i = 0, …, I that

    μ̂(Θ_i) = α_{i,I−i} C_{i,I−i}/(β_{I−i} μ_i) + (1 − α_{i,I−i}) · 1,   (4.140)

where

    α_{i,I−i} = β_{I−i} / ( β_{I−i} + σ²/(μ^δ_i τ²) ).   (4.141)

Remark. Compare the weight α_{i,I−i} from (4.141) to α_{i,I−i} from (4.110): in the notation of Subsection 4.2.3 (see (4.110)) we have

    κ_i = φ_i b_i = E[Var(X_{i,I−i} | Θ_i)] / (γ_{I−i} Var(Θ_i)),   (4.142)

and in the notation of this subsection we have

    κ_i = σ²/(μ^δ_i τ²) = E[Var(X_{i,I−i}/μ_i | Θ_i)] / (γ_{I−i} Var(μ(Θ_i))).   (4.143)

This shows that the estimators Ĉ^{PoiGa}_{i,J} and Ĉ^{EDF}_{i,J} give the same estimated reserve (the Poisson-gamma model is an example of the exponential dispersion family with associate conjugates).

Example 4.43 (Exponential dispersion model with associate conjugate)
We revisit the data set given in Example 2.7. For the a priori parameters we make the same choices as in Example 4.19 (see Table 4.4). Observe that the credibility weight of the reserves does not depend on the choice of δ ≥ 0 for given Vco(C_{i,J}): using the conditional independence of the increments X_{i,j}, given Θ_i, and (4.126), (4.127) and (4.137) leads to

    Var(C_{i,J}) = E[ Var( Σ_{j=0}^{J} X_{i,j} | Θ_i ) ] + Var( E[ Σ_{j=0}^{J} X_{i,j} | Θ_i ] )
      = E[ Σ_{j=0}^{J} γ²_j μ²_i Var( X_{i,j}/(γ_j μ_i) | Θ_i ) ] + Var( Σ_{j=0}^{J} γ_j μ_i E[ X_{i,j}/(γ_j μ_i) | Θ_i ] )
      = Σ_{j=0}^{J} γ²_j μ²_i σ²/w_{i,j} + μ²_i Var(b′(Θ_i))   (4.144)
      = (μ²_i/μ^δ_i) σ² + μ²_i τ².
Hence we have

    Vco²(C_{i,J}) = σ²/μ^δ_i + τ².   (4.145)

This implies that

    α_{i,I−i} = β_{I−i} / ( β_{I−i} + σ²/(μ^δ_i τ²) ) = β_{I−i} / ( β_{I−i} + Vco²(C_{i,J})/τ² − 1 ).   (4.146)

For simplicity we have chosen δ = 0 in Table 4.12, which implies that κ_i ≡ κ.

     i    τ       σ       κ        α_{i,I−i}   μ̂(Θ_i)   reserves EDF
     0    5.00%   6.00%   1.4400   41.0%       0.9822              0
     1    5.00%   6.00%   1.4400   40.9%       0.9746         15 715
     2    5.00%   6.00%   1.4400   40.9%       0.9888         26 695
     3    5.00%   6.00%   1.4400   40.9%       0.9669         36 333
     4    5.00%   6.00%   1.4400   40.8%       0.9567         91 303
     5    5.00%   6.00%   1.4400   40.6%       0.9509        169 281
     6    5.00%   6.00%   1.4400   40.3%       0.9349        319 093
     7    5.00%   6.00%   1.4400   39.7%       0.9136        524 484
     8    5.00%   6.00%   1.4400   37.9%       0.9208      1 214 184
     9    5.00%   6.00%   1.4400   29.0%       0.9502      4 530 884
  total                                                    6 927 973

Table 4.12: Estimated reserves in the exponential dispersion model with associate conjugate

The estimates in Table 4.10 and Table 4.12 lead to the same result. Moreover, we see that the Bayesian estimate μ̂(Θ_i) is below 1 for all accident years i (see Table 4.12). This suggests once more that the a priori estimates μ_i for the ultimate claims were chosen too conservatively.

Conclusion 1. Corollary 4.42 implies that the estimator Ĉ^{EDF}_{i,J} gives the optimal mixture between the Bornhuetter-Ferguson and the chain-ladder estimates in the EDF with associate conjugates: assume that β_j and f_j are identified by (4.3) and set Ĉ^{CL}_{i,J} = C_{i,I−i}/β_{I−i}. Then we have

    Ĉ^{EDF}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) [ α_{i,I−i} C_{i,I−i}/β_{I−i} + (1 − α_{i,I−i}) μ_i ]
      = C_{i,I−i} + (1 − β_{I−i}) [ α_{i,I−i} Ĉ^{CL}_{i,J}/... hold ... ]

Let us write this out: 

    Ĉ^{EDF}_{i,J} = C_{i,I−i} + (1 − β_{I−i}) [ α_{i,I−i} Ĉ^{CL}_{i,J} + (1 − α_{i,I−i}) μ_i ]   (4.147)
      = C_{i,I−i} + (1 − β_{I−i}) S_i(α_{i,I−i}),

where S_i is the function defined in (4.1). Hence we have the mixture

    Ĉ^{EDF}_{i,J} = α_{i,I−i} Ĉ^{CL}_{i,J} + (1 − α_{i,I−i}) Ĉ^{BF}_{i,J}   (4.148)
between the CL estimate and the BF estimate. Moreover, it minimizes the conditional MSEP in the exponential dispersion model with associate conjugates. Observe that

    α_{i,I−i} = β_{I−i} / (β_{I−i} + κ_i),   (4.149)

where the credibility coefficient κ_i was defined in (4.143). If we choose κ_i = 0 we obtain the chain-ladder estimate, and if we choose κ_i = ∞ we obtain the Bornhuetter-Ferguson reserve.

Conclusion 2. Using (4.135) we find for all I − i ≤ j < J that

    E[ C_{i,j+1} | C_{i,0}, …, C_{i,I−i} ] = C_{i,I−i} + E[ Σ_{l=I−i+1}^{j+1} X_{i,l} | C_{i,0}, …, C_{i,I−i} ]
      = C_{i,I−i} + Σ_{l=I−i+1}^{j+1} γ_l μ_i μ̂(Θ_i)   (4.150)
      = ( 1 + (β_{j+1} − β_{I−i}) α_{i,I−i}/β_{I−i} ) C_{i,I−i} + (β_{j+1} − β_{I−i}) (1 − α_{i,I−i}) μ_i.

In the second step we have explicitly used that we have an exact Bayesian estimator; (4.150) does not hold true in the Bühlmann-Straub model (see Section 4.3 below).

Formula (4.150) shows that the EDF with associate conjugates is a linear mixture of the chain-ladder model and the Bornhuetter-Ferguson model. If we choose the credibility coefficient κ_i = 0, we obtain

    E[ C_{i,j+1} | C_{i,0}, …, C_{i,j} ] = ( 1 + (β_{j+1} − β_j)/β_j ) C_{i,j} = f_j C_{i,j},   (4.151)

if we assume (4.3). This is exactly the chain-ladder assumption (2.1). If we choose κ_i = ∞, then α_{i,I−i} = 0 and we obtain

    E[ C_{i,J} | C_{i,0}, …, C_{i,I−i} ] = C_{i,I−i} + (1 − β_{I−i}) μ_i,   (4.152)

which is Model 2.8 that we have used to motivate the Bornhuetter-Ferguson estimate Ĉ^{BF}_{i,J}.

Under Model Assumptions 4.41 we obtain for the conditional mean square error of prediction

    msep_{μ(Θ_i)|D_I}(μ̂(Θ_i)) = E[ (μ̂(Θ_i) − μ(Θ_i))² | D_I ] = Var(μ(Θ_i) | D_I),   (4.153)
and hence we have that

    msep_{μ(Θ_i)}(μ̂(Θ_i)) = E[ Var(μ(Θ_i) | D_I) ].   (4.154)

If we plug in the estimator (4.140) we obtain

    msep_{μ(Θ_i)}(μ̂(Θ_i)) = E[ ( α_{i,I−i} C_{i,I−i}/(β_{I−i} μ_i) + (1 − α_{i,I−i}) − μ(Θ_i) )² ]
      = E[ ( α_{i,I−i} ( C_{i,I−i}/(β_{I−i} μ_i) − μ(Θ_i) ) − (1 − α_{i,I−i}) ( μ(Θ_i) − 1 ) )² ]   (4.155)
      = α²_{i,I−i} E[ Var( C_{i,I−i}/(β_{I−i} μ_i) | Θ_i ) ] + (1 − α_{i,I−i})² Var(μ(Θ_i))
      = α²_{i,I−i} σ²/(β_{I−i} μ^δ_i) + (1 − α_{i,I−i})² τ²
      = (1 − α_{i,I−i}) τ²,

where in the last step we have used (1 − α_{i,I−i}) τ² = α_{i,I−i} σ²/(μ^δ_i β_{I−i}), cf. (4.141). From this we derive the unconditional mean square error of prediction for the estimate of C_{i,J}:

    msep_{C_{i,J}}(Ĉ^{EDF}_{i,J}) = E[ ( (1 − β_{I−i}) μ_i μ̂(Θ_i) − (C_{i,J} − C_{i,I−i}) )² ]   (4.156)
      = μ²_i E[ ( (1 − β_{I−i}) ( μ̂(Θ_i) − μ(Θ_i) ) + (1 − β_{I−i}) μ(Θ_i) − (C_{i,J} − C_{i,I−i})/μ_i )² ]
      = μ²_i (1 − β_{I−i})² msep_{μ(Θ_i)}(μ̂(Θ_i)) + Σ_{k=I−i+1}^{J} E[ Var(X_{i,k} | Θ_i) ]
      = μ²_i [ (1 − β_{I−i})² (1 − α_{i,I−i}) τ² + (1 − β_{I−i}) σ²/μ^δ_i ].

Moreover, if we set δ = 0 we obtain

    msep_{C_{i,J}}(Ĉ^{EDF}_{i,J}) = σ² (1 − β_{I−i}) μ²_i (1 + σ²/τ²) / (β_{I−i} + σ²/τ²).   (4.157)

This is the same value as in the Poisson-gamma case, see (4.116)-(4.117) and Table 4.11.

For the conditional mean square error of prediction for the estimate of C_{i,J} one needs to calculate

    Var(μ(Θ_i) | D_I) = Var(b′(Θ_i) | D_I),   (4.158)

where Θ_i, given D_I, has the a posteriori distribution u_{μ^{(i)}_{post,I−i}, τ²_{post,I−i}} given by Lemma 4.35. We omit this further calculation.
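The credibility mixture (4.147)-(4.149) between the chain-ladder and Bornhuetter-Ferguson estimates can be sketched as follows (plain Python; all inputs are hypothetical, kappa stands for the credibility coefficient κ_i):

```python
def edf_ultimate_mixture(C_obs, beta, mu_prior, kappa):
    """Credibility mixture (4.148): alpha * CL + (1 - alpha) * BF,
    with alpha = beta / (beta + kappa), cf. (4.149)."""
    alpha = beta / (beta + kappa)
    cl = C_obs / beta                      # chain-ladder ultimate
    bf = C_obs + (1.0 - beta) * mu_prior   # Bornhuetter-Ferguson ultimate
    return alpha * cl + (1.0 - alpha) * bf
```

κ_i = 0 reproduces the chain-ladder estimate and κ_i → ∞ the Bornhuetter-Ferguson estimate; any finite positive κ_i interpolates between the two.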
Remarks on parameter estimation. So far we have always assumed that μ_i, γ_j, σ² and τ² are known. Under these assumptions we have calculated the Bayesian estimator, which was optimal in the sense that it minimizes the MSEP. If the parameters are not known, the problem becomes substantially more difficult, and in general one loses the optimality results.

Estimation of γ_j. At the moment we do not have a canonical way in which the claims development pattern should be estimated. In practice one often chooses the chain-ladder estimate β̂^{CL}_j provided in (2.25) and then sets

    γ̂^{CL}_j = β̂^{CL}_j − β̂^{CL}_{j−1}.   (4.159)

At the current stage we cannot say anything about the optimality of this estimator. However, observe that for the Poisson-gamma model this estimator is natural in the sense that it coincides with the MLE estimator provided in the Poisson model (see Corollary 2.18). For more on this topic we refer to Subsection 4.2.5.

Estimation of μ_i. Usually one takes a plan value, a budget value or the value used for the premium calculation, as in the BF method.

Estimation of σ² and τ². For known γ_j and μ_i one can give unbiased estimators for these variance parameters. For the moment we omit their formulation, because in Section 4.3 we will see that the exponential dispersion model with its associate conjugates satisfies the assumptions of the Bühlmann-Straub model. Hence we can take the same estimators as in the Bühlmann-Straub model; these are provided in Subsection 4.3.1.

4.2.5 Poisson-gamma case, revisited

In Model Assumptions 4.26 and 4.33 we have assumed that the claims development pattern (γ_j) is known. Of course, in general this is not the case, and in practice one usually uses the estimate (4.159) for the claims development pattern. In Verrall [82] this is called the plug-in estimate, which leads to the CL and BF mixture. However, in a full Bayesian approach one should also estimate this parameter in a Bayesian way, since usually it is not known.
This means that we should also give an a priori distribution to the claims development pattern. For simplicity, we only treat the Poisson-gamma case which was also considered in Verrall [82]. We have the following assumptions
Model Assumptions 4.44 (Poisson-gamma model)

- There exist positive random vectors Θ = (Θ_i)_i and γ = (γ_j)_j with Σ_{j=0}^{J} γ_j = 1 such that for all i ∈ {0, …, I} and j ∈ {0, …, J}, conditionally given Θ and γ, the X_{i,j} are independent and Poisson distributed with mean Θ_i γ_j.
- Θ and γ are independent, the Θ_i are independent Gamma distributed with shape parameter a_i and scale parameter b_i, and γ is f_γ distributed.

As before, we can calculate the joint distribution of {X_{i,j}; i + j ≤ I}, Θ and γ, which is given by

    f( (x_{i,j})_{i+j≤I}, Θ, γ ) = ∏_{i+j≤I} e^{−Θ_i γ_j} (Θ_i γ_j)^{x_{i,j}} / x_{i,j}! · ∏_{i=0}^{I} f_{a_i,b_i}(Θ_i) · f_γ(γ).   (4.160)

The posterior distribution of (Θ, γ), given the observations D_I, is proportional to

    ∏_{i=0}^{I} f_{a^{post}_{i,I−i}, b^{post}_{i,I−i}}(Θ_i) · ∏_{i=0}^{I} ∏_{j=0}^{I−i} γ_j^{x_{i,j}} · f_γ(γ),   (4.161)

with

    a^{post}_{i,j} = a_i + Σ_{k=0}^{j} X_{i,k}   and   b^{post}_{i,j} = b_i + Σ_{k=0}^{j} γ_k,   (4.162)

see also Lemma 4.28. From this we immediately see that one cannot calculate the posterior distribution of (Θ, γ), given the observations D_I, analytically; this also implies that we cannot easily calculate the conditional distribution of X_{k,l}, k + l > I, given the observations D_I. Hence these Bayesian models can only be implemented with the help of numerical methods, e.g. the Markov chain Monte Carlo (MCMC) approach. The implementation using simulation-based MCMC is discussed in de Alba [2, 4] and Scollnik [70].

4.3 Bühlmann-Straub Credibility Model

In the last section we have seen an exact Bayesian approach to the claims reserving problem. The Bayesian estimator

    μ̂(Θ_i) = E[ μ(Θ_i) | X_{i,0}, …, X_{i,I−i} ]   (4.163)
is the best estimator for μ(Θ_i) in the class of all estimators which are square integrable functions of the observations X_{i,0}, …, X_{i,I−i}. The crucial point in the calculation was that for the EDF with its associate conjugates we were able to calculate the a posteriori distribution of μ(Θ_i) explicitly. Moreover, the parameters of the a posteriori distribution and the Bayesian estimator were linear in the observations.

However, in most Bayesian models we are not in the situation where we are able to calculate the a posteriori distribution, and therefore the Bayesian estimator cannot be expressed in a closed analytical form. I.e. in general the Bayesian estimator does not meet the practical requirements of simplicity and intuitiveness and can only be calculated by numerical procedures such as Markov chain Monte Carlo (MCMC) methods.

In cases where we are not able to derive the Bayesian estimator we restrict the class of possible estimators to a smaller class, namely the linear functions of the observations X_{i,0}, …, X_{i,I−i}. This means that we try to find an estimator which minimizes the quadratic loss function among all estimators which are linear combinations of the observations X_{i,0}, …, X_{i,I−i}. The result is an estimator which is practicable and intuitive by construction. This approach is well known in actuarial science as credibility theory, and since "best" is again to be understood in the Bayesian sense, credibility estimators are linear Bayes estimators (see Bühlmann-Gisler [18]). In claims reserving, credibility theory was used e.g. by De Vylder [84], Neuhaus [56] and Mack [51] in the Bühlmann-Straub context.

In the sequel we always assume that the incremental loss development pattern (γ_j)_{j=0,…,J}, given by

    γ_0 = β_0   and   γ_j = β_j − β_{j−1}   for j = 1, …, J,   (4.164)

is known, as in the previous sections on Bayesian estimates.
Model Assumptions 4.45 (Bühlmann-Straub model [18])

- Conditionally, given Θ_i, the increments X_{i,0}, …, X_{i,J} are independent with

    E[ X_{i,j}/γ_j | Θ_i ] = μ(Θ_i),   (4.165)
    Var( X_{i,j}/γ_j | Θ_i ) = σ²(Θ_i)/γ_j   (4.166)

  for all i = 0, …, I and j = 0, …, J.
- The pairs (Θ_i, X_i), i = 0, …, I, are independent, and the Θ_i are independent and identically distributed.
For the cumulative claim amount we obtain

    E[C_{i,j} | Θ_i] = β_j μ(Θ_i),   (4.167)
    Var(C_{i,j} | Θ_i) = β_j σ²(Θ_i).   (4.168)

The latter equation shows that this model is different from Model 4.14: the term (1 − β_j) α²(C_{i,J}) is replaced by σ²(Θ_i). On the other hand, the Bühlmann-Straub model is very much in the spirit of the EDF with its associate conjugates. The parameter Θ_i plays the role of the underlying risk characteristics, i.e. Θ_i is unknown and tells us whether we have a good or a bad accident year. For a more detailed explanation in the framework of tariffication and pricing we refer to Bühlmann-Gisler [18].

In linear credibility theory one looks for an estimator μ̂(Θ_i) of μ(Θ_i) which minimizes the quadratic loss function among all estimators which are linear in the observations X_{i,j} (see also [18], Definition 3.8). I.e. one has to solve the optimization problem

    μ̂(Θ_i)^{cred} = argmin_{μ̃ ∈ L(X,1)} E[ (μ(Θ_i) − μ̃)² ],   (4.169)

where

    L(X, 1) = { μ̃ ; μ̃ = a_{i,0} + Σ_{i=0}^{I} Σ_{j=0}^{(I−i)∧J} a_{i,j} X_{i,j} with a_{i,j} ∈ R }.   (4.170)

Remarks 4.46

- Observe that the credibility estimator μ̂(Θ_i)^{cred} is linear in the observations X_{i,j} by definition.
- If we allowed for general real-valued, square integrable functions of the observations X_{i,j}, we would simply obtain the Bayesian estimator, since the conditional a posteriori expectation minimizes the quadratic loss function among all estimators which are square integrable functions of the observations.
- Credibility estimators can also be constructed using Hilbert space theory: indeed, (4.169) asks for a minimization in an L²-sense, which corresponds to orthogonal projections in Hilbert spaces. For more on this topic we refer to Bühlmann-Gisler [18].

We define the structural parameters

    μ_0 = E[μ(Θ_i)],   (4.171)
    σ² = E[σ²(Θ_i)],   (4.172)
    τ² = Var(μ(Θ_i)).   (4.173)
Theorem 4.47 (inhomogeneous Bühlmann-Straub estimator) Under Model Assumptions 4.45 the optimal linear inhomogeneous estimator of μ(Θ_i), given the observations D_I, is given by

    μ̂(Θ_i)^{cred} = α_i Y_i + (1 − α_i) μ_0   (4.174)

for I − J + 1 ≤ i ≤ I, where

    α_i = β_{I−i} / ( β_{I−i} + σ²/τ² ),   (4.175)
    Y_i = Σ_{j=0}^{(I−i)∧J} (γ_j/β_{I−i}) · X_{i,j}/γ_j = C_{i,(I−i)∧J} / β_{I−i}.   (4.176)

In credibility theory the a priori mean μ_0 can also be estimated from the data. This leads to the homogeneous credibility estimator.

Theorem 4.48 (homogeneous Bühlmann-Straub estimator) Under Model Assumptions 4.45 the optimal linear homogeneous estimator of μ(Θ_i), given the observations D_I, is given by

    μ̂(Θ_i)^{hom} = α_i Y_i + (1 − α_i) μ̂_0   (4.177)

for I − J + 1 ≤ i ≤ I, where α_i and Y_i are given in Theorem 4.47 and

    μ̂_0 = Σ_{i=0}^{I} (α_i/α_•) Y_i,   with α_• = Σ_{i=0}^{I} α_i.   (4.178)

Proof of Theorem 4.47 and Theorem 4.48. We refer to Theorems 4.2 and 4.4 in Bühlmann-Gisler [18].

Remarks 4.49

- If the a priori mean μ_0 is known, we choose the inhomogeneous credibility estimator μ̂(Θ_i)^{cred} from Theorem 4.47. This estimator minimizes the quadratic loss function given in (4.169) among all estimators of the form (4.170).
- If the a priori mean μ_0 is unknown, we estimate its value from the data as well. This is done by switching to the homogeneous credibility estimator μ̂(Θ_i)^{hom} given in Theorem 4.48. The crucial point is that we have to slightly change the set of possible estimators given in (4.170) to

    L_e(X) = { μ̃ ; μ̃ = Σ_{i=0}^{I} Σ_{j=0}^{(I−i)∧J} a_{i,j} X_{i,j} with a_{i,j} ∈ R, E[μ̃] = μ_0 }.   (4.179)
  The homogeneous credibility estimator minimizes the quadratic loss function among all estimators from the set L_e(X), i.e.

    μ̂(Θ_i)^{hom} = argmin_{μ̃ ∈ L_e(X)} E[ (μ(Θ_i) − μ̃)² ].   (4.180)

- The crucial point about the credibility estimators (4.174) and (4.177) is that we take a weighted average between the individual observation Y_i of accident year i and the a priori mean μ_0 or its estimator μ̂_0, respectively. Observe that the weighted average Y_i depends only on the observations of accident year i; this is a consequence of the independence assumption between the accident years. The estimator μ̂_0, however, uses the observations of all accident years, since the a priori mean μ_0 holds for all accident years.
- The credibility weight α_i ∈ [0, 1] of the individual observation Y_i becomes small when the expected fluctuations within the accident years (σ²) are large, and becomes large when the fluctuations between the accident years (τ²) are large.
- The estimator (4.174) is exactly the same as the one from the exponential dispersion model with associate conjugates (Corollary 4.42) if we assume that all a priori means μ_i are equal and δ = 0.
- Since the inhomogeneous estimator μ̂(Θ_i)^{cred} contains a constant, it is automatically an unbiased estimator for the a priori mean μ_0. In contrast to μ̂(Θ_i)^{cred}, the homogeneous estimator μ̂(Θ_i)^{hom} is unbiased for μ_0 by definition of the class (4.179).
- The weights γ_j in the model assumptions could be replaced by weights γ_{i,j}; the Bühlmann-Straub result still holds true. Indeed, one could choose a design matrix γ_{i,j} = Γ_i(j) to apply the Bühlmann-Straub model (see Taylor [75] and Mack [47]); the variance condition is then replaced by

    Var( X_{i,j}/γ_{i,j} | Θ_i ) = σ²(Θ_i) / (V_i γ^δ_{i,j}),   (4.181)

  where V_i > 0 is an appropriate measure for the volume and δ ≥ 0. δ = 1 is the model favoured by Mack [47], whereas De Vylder [84] has chosen δ = 2. For δ = 0 we obtain a condition which is independent of j (credibility model of Bühlmann, see [18]).
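The inhomogeneous and homogeneous estimators of Theorems 4.47-4.48 can be sketched as follows (plain Python; the diagonal observations and development pattern are hypothetical):

```python
def buhlmann_straub(C_diag, betas, sigma2, tau2, mu0=None):
    """Bühlmann-Straub estimators: (4.174) if mu0 is given, otherwise the
    homogeneous version (4.177) with mu0 estimated via (4.178).
    C_diag[i] = C_{i,I-i}, betas[i] = beta_{I-i}."""
    kappa = sigma2 / tau2
    alphas = [b / (b + kappa) for b in betas]
    Y = [c / b for c, b in zip(C_diag, betas)]           # (4.176)
    if mu0 is None:                                      # homogeneous case
        mu0 = sum(a * y for a, y in zip(alphas, Y)) / sum(alphas)
    return [a * y + (1.0 - a) * mu0 for a, y in zip(alphas, Y)]
```

With σ² = 0 the credibility weights are 1 and the estimator reproduces the individual observations Y_i; in the homogeneous case every estimate is a convex combination of the Y_i and therefore stays within their range.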
Different a priori means \mu_i. If X_{i,j}/\gamma_j has different a priori means \mu_i for different accident years i, we modify the Bühlmann-Straub assumptions (4.165)-(4.166) to
E\Big[ \frac{X_{i,j}}{\gamma_j \, \mu_i} \,\Big|\, \Theta_i \Big] = \mu(\Theta_i),    (4.182)
Var\Big( \frac{X_{i,j}}{\gamma_j \, \mu_i} \,\Big|\, \Theta_i \Big) = \frac{\sigma^2(\Theta_i)}{\gamma_j \, \mu_i^{\delta}},    (4.183)
for an appropriate choice \delta \ge 0. In this case we have E[\mu(\Theta_i)] = 1, and the inhomogeneous and homogeneous credibility estimators are given by
\widehat{\mu(\Theta_i)}^{cred} = \alpha_i \, Y_i + (1 - \alpha_i) \cdot 1,    (4.184)
\widehat{\mu(\Theta_i)}^{hom} = \alpha_i \, Y_i + (1 - \alpha_i) \, \widehat{\mu}_0,    (4.185)
respectively, where
Y_i = \frac{C_{i,(I-i) \wedge J}}{\mu_i \, \beta_{I-i}}, \qquad \alpha_i = \frac{\beta_{I-i}}{\beta_{I-i} + \kappa_i} \quad with \quad \kappa_i = \frac{\sigma^2}{\mu_i^{\delta} \, \tau^2}.    (4.186)
Observe that this now gives exactly the same estimator as in the exponential dispersion family with its associate conjugates (see Corollary 4.42). This immediately leads to the following estimators:

Estimator 4.50 (Bühlmann-Straub credibility reserving estimator) In the Bühlmann-Straub Model 4.45 with generalized assumptions (4.182)-(4.183) we have the following estimators
\widehat{C}^{\,cred}_{i,J} = \widehat{E}[C_{i,J} \,|\, D_I] = C_{i,I-i} + (1 - \beta_{I-i}) \, \mu_i \, \widehat{\mu(\Theta_i)}^{cred},    (4.187)
\widehat{C}^{\,hom}_{i,J} = \widehat{E}[C_{i,J} \,|\, D_I] = C_{i,I-i} + (1 - \beta_{I-i}) \, \mu_i \, \widehat{\mu(\Theta_i)}^{hom}    (4.188)
for I - J + 1 \le i \le I.

Lemma 4.51 In the Bühlmann-Straub Model 4.45 the quadratic losses of the credibility estimators are given by
E\Big[ \big( \widehat{\mu(\Theta_i)}^{cred} - \mu(\Theta_i) \big)^2 \Big] = \tau^2 \, (1 - \alpha_i),    (4.189)
E\Big[ \big( \widehat{\mu(\Theta_i)}^{hom} - \mu(\Theta_i) \big)^2 \Big] = \tau^2 \, (1 - \alpha_i) \, \Big( 1 + \frac{1 - \alpha_i}{\alpha_\bullet} \Big)    (4.190)
for I - J + 1 \le i \le I.

Proof. We refer to Theorems 4.3 and 4.6 in Bühlmann-Gisler [18].

Corollary 4.52 In the Bühlmann-Straub Model 4.45 with generalized assumptions (4.182)-(4.183) the mean square error of prediction of the inhomogeneous credibility reserving estimator is given by
msep_{C_{i,J}}\big( \widehat{C}^{\,cred}_{i,J} \big) = \mu_i^2 \Big[ (1 - \beta_{I-i}) \, \sigma^2 / \mu_i^{\delta} + (1 - \beta_{I-i})^2 \, \tau^2 \, (1 - \alpha_i) \Big].    (4.191)
and
msep_{C_{i,J}}\big( \widehat{C}^{\,hom}_{i,J} \big) = msep_{C_{i,J}}\big( \widehat{C}^{\,cred}_{i,J} \big) + \mu_i^2 \, (1 - \beta_{I-i})^2 \, \tau^2 \, \frac{(1 - \alpha_i)^2}{\alpha_\bullet},    (4.192)
respectively, for I - J + 1 \le i \le I.

Remarks 4.53

The first term on the right-hand side of the above equalities stands again for the process error, whereas the second terms stand for the parameter/prediction errors (how well an actuary can predict the mean). Observe again that we assume that the incremental loss development pattern (\gamma_j)_{j=0,...,J} is known; hence we do not quantify any estimation error in the claims development pattern.

Observe that the MSEP formula for the credibility estimator coincides with the one for the exponential dispersion family, see (4.156).

Proof. We separate the mean square error of prediction as follows:
msep_{C_{i,J}}\big( \widehat{C}^{\,cred}_{i,J} \big) = E\Big[ \big( (1 - \beta_{I-i}) \, \mu_i \, \widehat{\mu(\Theta_i)}^{cred} - (C_{i,J} - C_{i,I-i}) \big)^2 \Big].    (4.193)
Conditionally, given \Theta = (\Theta_0, ..., \Theta_I), the increments X_{i,j} are independent. This immediately implies that the expression in (4.193) is equal to
E\Big[ E\big[ (1 - \beta_{I-i})^2 \, \mu_i^2 \, \big( \widehat{\mu(\Theta_i)}^{cred} - \mu(\Theta_i) \big)^2 \,\big|\, \Theta \big] \Big] + E\Big[ E\big[ \big( (1 - \beta_{I-i}) \, \mu_i \, \mu(\Theta_i) - (C_{i,J} - C_{i,I-i}) \big)^2 \,\big|\, \Theta \big] \Big]    (4.194)
= (1 - \beta_{I-i})^2 \, \mu_i^2 \; msep_{\mu(\Theta_i)}\big( \widehat{\mu(\Theta_i)}^{cred} \big) + E\big[ Var( C_{i,J} - C_{i,I-i} \,|\, \Theta ) \big].
But then the claim follows from Lemma 4.51 and
Var( C_{i,J} - C_{i,I-i} \,|\, \Theta ) = (1 - \beta_{I-i}) \, \mu_i^{2-\delta} \, \sigma^2(\Theta_i).    (4.195)

4.3.1 Parameter estimation

So far, the choice of the variance parameters in the examples was rather artificial. In this subsection we provide estimators for \sigma^2 and \tau^2. In practical applications it
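The two prediction error formulas (4.191)-(4.192) translate directly into code. The following is a sketch under our own naming conventions (beta for \beta_{I-i}, alpha_tot for \alpha_\bullet); it is not part of the original text.

```python
def msep_cred(mu_i, beta, sigma2, tau2, alpha_i, delta=0.0):
    """msep of the inhomogeneous estimator, cf. (4.191):
    conditional process variance plus parameter estimation error."""
    return mu_i ** 2 * ((1.0 - beta) * sigma2 / mu_i ** delta
                        + (1.0 - beta) ** 2 * tau2 * (1.0 - alpha_i))

def msep_hom(mu_i, beta, sigma2, tau2, alpha_i, alpha_tot, delta=0.0):
    """msep of the homogeneous estimator, cf. (4.192):
    the extra term reflects the estimation of mu_0 from the data."""
    extra = mu_i ** 2 * (1.0 - beta) ** 2 * tau2 * (1.0 - alpha_i) ** 2 / alpha_tot
    return msep_cred(mu_i, beta, sigma2, tau2, alpha_i, delta) + extra
```

For a fully developed accident year (\beta_{I-i} = 1) there is no reserve left and both errors vanish, which is a quick sanity check on the formulas.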
is often advisable to eliminate outliers before estimating \sigma^2 and \tau^2, since these estimators are often not very robust. Before we start with the parameter estimation, we mention that essentially the same remarks apply in this section as those made on page 125.

We need to estimate \gamma_j, \sigma^2 and \tau^2. For the weights \gamma_j we proceed as in (4.159): estimate the claims development pattern \beta_j from (2.25); the incremental loss development pattern \gamma_j is then estimated by (4.164).

We define
S_i = \frac{1}{(I-i) \wedge J} \sum_{j=0}^{(I-i) \wedge J} \gamma_j \, \Big( \frac{X_{i,j}}{\gamma_j} - Y_i \Big)^2.    (4.196)
Then S_i is an unbiased estimator for \sigma^2 (see [18], (4.22)). Hence \sigma^2 is estimated by the unbiased estimator
\widehat{\sigma}^2 = \frac{1}{I} \sum_{i=0}^{I-1} S_i.    (4.197)

For the estimation of \tau^2 we define
T = \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, \big( Y_i - \bar{Y} \big)^2,    (4.198)
where
\bar{Y} = \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, Y_i = \frac{\sum_{i=0}^{I} C_{i,(I-i) \wedge J}}{\beta_\bullet}, \qquad \beta_\bullet = \sum_{i=0}^{I} \beta_{I-i}.    (4.199)
Then an unbiased estimator for \tau^2 is given by (see [18], (4.26))
\widehat{\tau}^2 = c \, \Big\{ T - I \, \frac{\widehat{\sigma}^2}{\beta_\bullet} \Big\},    (4.200)
with
c = \Big\{ \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, \Big( 1 - \frac{\beta_{I-i}}{\beta_\bullet} \Big) \Big\}^{-1}.    (4.201)
If \widehat{\tau}^2 is negative it is set to zero.

If we work with different \mu_i we have to modify these estimators slightly, see Bühlmann-Gisler [18], Section 4.8.

Example 4.54 (Bühlmann-Straub model, constant \mu_i) We revisit the data given in Example 2.7. Recall that we have set Vco(\mu(\Theta_i)) = 5% and Vco(C_{i,J}) = 7.8%, using external know-how only (see Tables 4.9 and 4.4). For this example we assume that all a priori expectations \mu_i are equal, and we use the homogeneous credibility estimator. We have the following observations, where the incremental claims development pattern \gamma_j is estimated via the chain-ladder method.
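The within-year estimator (4.196) and its average (4.197) can be sketched as follows; this is our own illustration (row i of the triangle holds the observed incremental payments X_{i,j}), not part of the original text.

```python
def S_i(X_row, gamma):
    """Within-year estimator of sigma^2, cf. (4.196).  X_row holds the
    observed incremental payments X_{i,j} of one accident year, gamma the
    matching pattern weights gamma_j (a sketch with our own names)."""
    n = len(X_row)                  # observations j = 0, ..., n-1
    beta = sum(gamma[:n])           # beta_{(I-i) ^ J}
    Y = sum(X_row) / beta           # Y_i = C_{i,(I-i)^J} / beta_{I-i}
    return sum(g * (x / g - Y) ** 2 for x, g in zip(X_row, gamma)) / (n - 1)

def sigma2_hat(triangle, gamma):
    """sigma^2 estimated by averaging S_i over all accident years with at
    least two observations, cf. (4.197)."""
    rows = [r for r in triangle if len(r) >= 2]
    return sum(S_i(r, gamma) for r in rows) / len(rows)
```

If the scaled observations X_{i,j}/\gamma_j are constant within a year, S_i returns zero, as it should.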
134 Chapter 4. Bayesian models 0 1 2 3 4 5 6 7 8 9 0 5 946 975 9 668 212 10 563 929 10 771 690 10 978 394 11 040 518 11 106 331 11 121 181 11 132 310 11 148 124 1 6 346 756 9 593 162 10 316 383 10 468 180 10 536 004 10 572 608 10 625 360 10 636 546 10 648 192 2 6 269 090 9 245 313 10 092 366 10 355 134 10 507 837 10 573 282 10 626 827 10 635 751 3 5 863 015 8 546 239 9 268 771 9 459 424 9 592 399 9 680 740 9 724 068 4 5 778 885 8 524 114 9 178 009 9 451 404 9 681 692 9 786 916 5 6 184 793 9 013 132 9 585 897 9 830 796 9 935 753 6 5 600 184 8 493 391 9 056 505 9 282 022 7 5 288 066 7 728 169 8 256 211 8 5 290 793 7 648 729 9 5 675 568 f b j 1.4925 1.0778 1.0229 1.0148 1.0070 1.0051 1.0011 1.0010 1.0014 Table 4.13: Observed historical cumulative payments Ci,j and estimated chain-ladder factors fj, see Table 2.2 1 2 3 4 5 6 7 8 9 10 0 10 086 719 12 814 544 13 090 078 9 577 303 14 357 308 9 048 371 12 901 245 13 793 367 10 658 637 11 148 124 1 10 764 791 11 179 404 10 569 215 6 997 504 4 710 946 5 331 290 10 340 861 10 390 677 11 152 804 2 10 633 061 10 248 997 12 378 890 12 113 052 10 606 498 9 531 934 10 496 344 8 289 406 3 9 944 313 9 240 019 10 559 131 8 788 679 9 236 259 12 866 767 8 493 606 4 9 801 620 9 453 540 9 556 060 12 602 942 15 995 382 15 325 912 5 10 490 085 9 739 738 8 370 426 11 289 335 7 290 128 6 9 498 524 9 963 120 8 229 388 10 395 847 7 8 969 136 8 402 801 7 716 853 8 8 973 762 8 119 848 9 9 626 383 γj 59.0% 29.0% 6.8% 2.2% 1.4% 0.7% 0.5% 0.1% 0.1% 0.1% Table 4.14: Observed scaled incremental payments Xi,j/γj and estimated incremental claims development pattern γj
Hence we obtain the following estimators:
c = 1.11316,    (4.202)
\bar{Y} = 9 911 975,    (4.203)
\widehat{\sigma} = 337 289,    (4.204)
\widehat{\tau} = 734 887,    (4.205)
\widehat{\mu}_0 = 9 885 584.    (4.206)
With \widehat{\kappa} = \widehat{\sigma}^2 / \widehat{\tau}^2 = 21.1%, Vco(\mu(\Theta_i)) = \widehat{\tau} / \widehat{\mu}_0 = 7.4% and Vco(C_{i,J}) = (\widehat{\sigma}^2 + \widehat{\tau}^2)^{1/2} / \widehat{\mu}_0 = 8.2% this leads to the following reserves:

         \alpha_i   \widehat{C}_{i,J}   reserves CL   reserves hom. cred.
0        82.6%     11 148 124                 0                 0
1        82.6%     10 663 125            15 126            14 934
2        82.6%     10 661 675            26 257            25 924
3        82.5%      9 758 685            34 538            34 616
4        82.5%      9 872 238            85 302            85 322
5        82.4%     10 091 682           156 494           155 929
6        82.2%      9 569 836           286 121           287 814
7        81.8%      8 716 445           449 167           460 234
8        80.7%      8 719 642         1 043 242         1 070 913
9        73.7%      9 654 386         3 950 815         3 978 818
total                                 6 047 061         6 114 503

Table 4.15: Estimated reserves in the homogeneous Bühlmann-Straub model (constant \mu_i)

We see that the estimates are close to the chain-ladder reserves. This comes from the fact that the credibility weights are rather large: since \widehat{\kappa} is rather small compared to \beta_{I-i}, we obtain credibility weights which are all larger than 70%. For the mean square errors of prediction we obtain the values in Table 4.16.

Example 4.55 (Bühlmann-Straub model, varying \mu_i) We revisit the data set given in Example 2.7 and Example 4.54. This time we assume that an a priori differentiation \mu_i is given by Table 4.6 (a priori means for the Bornhuetter-Ferguson method). We apply the scaled model (4.182)-(4.183) for \delta = 0, 1, 2 and obtain the reserves in Table 4.17. We see that the estimates for the different \delta's do not differ much, and they are still close to the chain-ladder reserves. However, they differ from the estimates of the constant \mu_i case (see Table 4.15). For the estimated coefficient of variation we have, for \delta = 0, 1, 2,
Vco(\mu(\Theta_i)) \approx 6.8%.    (4.207)
136 Chapter 4. Bayesian models msep 1/2 cred dci,j C i,j msep 1/2 hom dci,j C i,j 0 0 0 1 12 711 12 711 2 16 755 16 755 3 20 095 20 096 4 31 465 31 467 5 42 272 42 278 6 59 060 59 076 7 78 301 78 339 8 123 114 123 259 9 265 775 267 229 total 314 699 315 998 Table 4.16: Mean square error of prediction in the Bühlmann-Straub model constant µ i credibility weights α i reserves credibility reserves δ = 0 δ = 1 δ = 2 CL δ = 0 δ = 1 δ = 2 0 80.2% 80.6% 81.1% 0 0 0 0 1 80.1% 80.2% 80.3% 15 126 14 943 14 944 14 944 2 80.1% 79.6% 79.1% 26 257 25 766 25 753 25 740 3 80.1% 79.1% 78.0% 34 538 34 253 34 238 34 222 4 80.0% 79.7% 79.3% 85 302 85 056 85 051 85 046 5 79.9% 80.2% 80.4% 156 494 156 562 156 561 156 559 6 79.7% 79.8% 80.0% 286 121 289 078 289 056 289 035 7 79.3% 79.0% 78.8% 449 167 460 871 461 021 461 180 8 78.1% 77.6% 77.0% 1 043 242 1 069 227 1 069 815 1 070 427 9 70.4% 71.0% 71.5% 3 950 815 4 024 687 4 023 270 4 021 903 total 6 047 061 6 160 443 6 159 709 6 159 056 cµ 0 0.8810 0.8809 0.8809 Table 4.17: Estimated reserves in the homogeneous Bühlmann-Straub model varying µ i This describes the accuracy of the estimate of the true expected mean by the actuary. Observe that we have chosen 5% in Example 4.54. Moreover, we see once more that the a priori estimate µ i seems to be rather pessimistic, since µ 0 is substantially smaller than 1 for all δ. For the mean square error of prediction we obtain the values in Table 4.18. 4.4 Multidimensional credibility models In Section 4.3 we have assumed that the incremental payments have the following form E [X i,j Θ i ] = γ j µθ i. 4.208
msep^{1/2}_{C_{i,J}}( \widehat{C}^{\,hom}_{i,J} )
          \delta = 0   \delta = 1   \delta = 2
1           12 835       12 771       12 711
2           16 317       16 532       16 755
3           18 952       19 511       20 094
4           30 871       31 161       31 464
5           43 110       42 682       42 272
6           59 876       59 456       59 059
7           77 383       77 819       78 282
8          120 119      121 536      123 008
9          273 931      269 926      266 054
total      320 377      317 540      314 889

Table 4.18: Mean square error of prediction in the Bühlmann-Straub model (varying \mu_i)

The constant \gamma_j denotes the payment ratio in period j. If we rewrite this in vector form we obtain
E[ X_i \,|\, \Theta_i ] = \gamma \, \mu(\Theta_i),    (4.209)
where X_i = (X_{i,0}, ..., X_{i,J})' and \gamma = (\gamma_0, ..., \gamma_J)'. We see that the stochastic term \mu(\Theta_i) can only act as a scalar. Sometimes we would like to have more flexibility, i.e. we replace \mu(\Theta_i) by a vector. This leads to a generalization of the Bühlmann-Straub model.

4.4.1 Hachemeister regression model

Model Assumptions 4.56 (Hachemeister regression model [31])

There exist p-dimensional design vectors \gamma_j(i) = (\gamma_{j,1}(i), ..., \gamma_{j,p}(i))' and vectors \mu(\Theta_i) = (\mu_1(\Theta_i), ..., \mu_p(\Theta_i))' with p \le J + 1 such that
E[ X_{i,j} \,|\, \Theta_i ] = \gamma_j(i)' \, \mu(\Theta_i),    (4.210)
Cov( X_{i,j}, X_{i,k} \,|\, \Theta_i ) = \Sigma_{j,k,i}(\Theta_i)    (4.211)
for all i \in \{0, ..., I\} and j \in \{0, ..., J\}. The (J+1) \times p matrix \Gamma_i = (\gamma_0(i), ..., \gamma_J(i))' has rank p, and the components \mu_1(\Theta_i), ..., \mu_p(\Theta_i) of \mu(\Theta_i) are linearly independent.

The pairs (\Theta_i, X_i), i = 0, ..., I, are independent, and the \Theta_i are independent and identically distributed.

Remarks 4.57
We are now in the credibility regression case (see Bühlmann-Gisler [18], Section 8.3), where \mu(\Theta_i) = (\mu_1(\Theta_i), ..., \mu_p(\Theta_i))' is a p-dimensional vector which we would like to estimate. \Gamma_i is a known (J+1) \times p design matrix.

We define the following parameters:
\mu = E[ \mu(\Theta_i) ],    (4.212)
S_{j,k,i} = E[ \Sigma_{j,k,i}(\Theta_i) ],    (4.213)
T = Cov( \mu(\Theta_i), \mu(\Theta_i)' ),    (4.214)
S_i = ( S_{j,k,i} )_{j,k=0,...,J}    (4.215)
for i \in \{0, ..., I\} and j, k \in \{0, ..., J\}. Hence T is a p \times p covariance matrix for the variability between the different accident years, and S_i is a (J+1) \times (J+1) matrix that describes the variability within accident year i. An important special case for S_i is given by
S_i = \sigma^2 \, W_i^{-1} = \sigma^2 \, diag( w_{i,0}^{-1}, ..., w_{i,J}^{-1} ),    (4.216)
for appropriate weights w_{i,j} > 0 and a scalar \sigma^2 > 0.

Theorem 4.58 (Hachemeister estimator) Under Model Assumptions 4.56 the optimal linear inhomogeneous estimator for \mu(\Theta_i) is given by
\widehat{\mu(\Theta_i)}^{cred} = A_i \, B_i + (1 - A_i) \, \mu,    (4.217)
with
A_i = T \, \Big( T + \big( \Gamma_i^{[I-i]\,\prime} \, S_i^{-1} \, \Gamma_i^{[I-i]} \big)^{-1} \Big)^{-1},    (4.218)
B_i = \big( \Gamma_i^{[I-i]\,\prime} \, S_i^{-1} \, \Gamma_i^{[I-i]} \big)^{-1} \, \Gamma_i^{[I-i]\,\prime} \, S_i^{-1} \, X_i^{[I-i]},    (4.219)
where
\Gamma_i^{[I-i]} = ( \gamma_0(i), ..., \gamma_{(I-i) \wedge J}(i), 0, ..., 0 )',    (4.220)
X_i^{[I-i]} = ( X_{i,0}, ..., X_{i,(I-i) \wedge J}, 0, ..., 0 )'    (4.221)
for I - J + 1 \le i \le I with p \le I - i + 1. The quadratic loss matrix for the credibility estimator is given by
E\Big[ \big( \widehat{\mu(\Theta_i)}^{cred} - \mu(\Theta_i) \big) \big( \widehat{\mu(\Theta_i)}^{cred} - \mu(\Theta_i) \big)' \Big] = (1 - A_i) \, T.    (4.222)
Proof. See Theorem 8.7 in Bühlmann-Gisler [18].

We have the following corollary:

Corollary 4.59 (standard regression) Under Model Assumptions 4.56 with S_i given by (4.216) we have
A_i = T \, \Big( T + \sigma^2 \, \big( \Gamma_i^{[I-i]\,\prime} \, W_i \, \Gamma_i^{[I-i]} \big)^{-1} \Big)^{-1},    (4.223)
B_i = \big( \Gamma_i^{[I-i]\,\prime} \, W_i \, \Gamma_i^{[I-i]} \big)^{-1} \, \Gamma_i^{[I-i]\,\prime} \, W_i \, X_i^{[I-i]}    (4.224)
for I - J + 1 \le i \le I with p \le I - i + 1.

This leads to the following reserving estimator:

Estimator 4.60 (Hachemeister credibility reserving estimator) In the Hachemeister Regression Model 4.56 the estimator is given by
\widehat{C}^{\,cred}_{i,J} = C_{i,I-i} + \sum_{j=I-i+1}^{J} \gamma_j(i)' \, \widehat{\mu(\Theta_i)}^{cred}    (4.225)
for I - J + 1 \le i \le I with p \le I - i + 1.

Remarks 4.61

If \mu is not known, then (4.217) can be replaced by the homogeneous credibility estimator for \mu(\Theta_i), using
\widehat{\mu} = \Big( \sum_{i=0}^{I} A_i \Big)^{-1} \sum_{i=0}^{I} A_i \, B_i.    (4.226)
In that case the right-hand side of (4.222) needs to be replaced by
(1 - A_i) \, T \, \Big( 1 + \Big( \sum_{i=0}^{I} A_i \Big)^{-1} (1 - A_i) \Big).    (4.227)

Term (4.219) gives the formula for the data compression (see also Theorem 8.6 in Bühlmann-Gisler [18]). We already see from this that for p > 1 we have some difficulties with the youngest accident years, since the dimension p of \mu exceeds the number of available observations whenever p > I - i + 1. Observe that
E[ B_i \,|\, \Theta_i ] = \mu(\Theta_i),    (4.228)
E\Big[ \big( B_i - \mu(\Theta_i) \big) \big( B_i - \mu(\Theta_i) \big)' \,\Big|\, \Theta_i \Big] = \big( \Gamma_i^{[I-i]\,\prime} \, S_i^{-1} \, \Gamma_i^{[I-i]} \big)^{-1}.    (4.229)
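The matrix formulas (4.217)-(4.219) amount to a generalized least squares fit B_i per accident year followed by a matrix credibility mix with the prior mean. A minimal numerical sketch (our own function and variable names, assuming invertible S_i and T + (\Gamma' S^{-1} \Gamma)^{-1}):

```python
import numpy as np

def hachemeister_cred(Gamma, S, X, T, mu):
    """Inhomogeneous Hachemeister estimator, cf. (4.217)-(4.219); a sketch.
    Gamma: (n x p) observed design rows, S: (n x n) within-year covariance,
    X: (n,) observations, T: (p x p) between-year covariance, mu: (p,) prior."""
    Sinv = np.linalg.inv(S)
    M = Gamma.T @ Sinv @ Gamma                    # Gamma' S^{-1} Gamma
    B = np.linalg.solve(M, Gamma.T @ Sinv @ X)    # GLS estimate B_i, (4.219)
    A = T @ np.linalg.inv(T + np.linalg.inv(M))   # credibility matrix A_i, (4.218)
    return A @ B + (np.eye(len(mu)) - A) @ mu     # (4.217)
```

As in the scalar case, a large T pulls the estimator towards the individual fit B_i, a small T towards the prior mean \mu.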
Choices of the design matrix \Gamma_i. There are various possibilities to choose the design matrix \Gamma_i. One possibility in use is the so-called Hoerl curve (see De Jong-Zehnwirth [42] and Zehnwirth [92]): set p = 3 and
\gamma_j(i) = \big( 1, \; \log(j+1), \; j \big)'.    (4.230)

Parameter estimation. It is rather difficult to obtain good parameter estimates in this model for p > 1. If we assume that the covariance matrix (\Sigma_{j,k,i}(\Theta_i))_{j,k=0,...,J} is diagonal with mean S_i given by (4.216), we can estimate S_i with the help of the one-dimensional Bühlmann-Straub model (see Subsection 4.3.1). An unbiased estimator for the covariance matrix T is given by
\widehat{T} = \frac{1}{I-p} \sum_{i=0}^{I-p} \big( B_i - \bar{B} \big) \big( B_i - \bar{B} \big)' - \frac{1}{I-p+1} \sum_{i=0}^{I-p} \big( \Gamma_i^{[I-i]\,\prime} \, S_i^{-1} \, \Gamma_i^{[I-i]} \big)^{-1},    (4.231)
with
\bar{B} = \frac{1}{I-p+1} \sum_{i=0}^{I-p} B_i.    (4.232)

Examples. In all examples we have looked at, it was rather difficult to obtain reasonable estimates for the claims reserves. This has various reasons: 1) There is no obvious choice of a good design matrix \Gamma_i; in our examples the Hoerl curve did not behave well. 2) The estimation of the structural parameters S_i and T is always difficult; moreover, the estimators are not robust against outliers. 3) Already slight perturbations of the data had a large effect on the resulting reserves. For all these reasons we do not give a real data example; i.e. the Hachemeister model is very interesting from a theoretical point of view, but from a practical point of view it is rather difficult to apply to real data.

4.4.2 Other credibility models

In the Bühlmann-Straub credibility model we had a deterministic cashflow pattern \gamma_j and we estimated the exposure \mu(\Theta_i) of the accident years. We could also exchange the roles of these two parameters.

Model Assumptions 4.62 There exist scalars \mu_i (i = 0, ..., I) such that
conditionally, given \Theta_i, we have for all j \in \{0, ..., J\}
E[ X_{i,j} \,|\, \Theta_i ] = \gamma_j(\Theta_i) \, \mu_i.    (4.233)
The pairs (\Theta_i, X_i), i = 0, ..., I, are independent, and the \Theta_i are independent and identically distributed.

Remarks 4.63

Now the whole vector \gamma(\Theta_i) = (\gamma_0(\Theta_i), ..., \gamma_J(\Theta_i)) is a random drawing with
E[ \gamma_j(\Theta_i) ] = \gamma_j,    (4.234)
Cov( \gamma_j(\Theta_i), \gamma_k(\Theta_i) ) = T_{j,k},    (4.235)
Cov( X_{i,j}, X_{i,k} \,|\, \Theta_i ) = \Sigma_{j,k,i}(\Theta_i).    (4.236)

The difficulty in this model is that we have observations X_{i,0}, ..., X_{i,I-i} for \gamma_0(\Theta_i), ..., \gamma_{I-i}(\Theta_i), and we need to estimate \gamma_{I-i+1}(\Theta_i), ..., \gamma_J(\Theta_i). This is slightly different from classical one-dimensional credibility applications. From this it is clear that a crucial role is played by the covariance structure, which projects past observations onto the future. For general covariance structures it is difficult to give nice formulas.

Special cases were studied by Jewell [38] and Hesselager-Witting [35]. Hesselager-Witting [35] assume that the vectors
( \gamma_0(\Theta_i), ..., \gamma_J(\Theta_i) )    (4.237)
are i.i.d. Dirichlet distributed with parameters (a_0, ..., a_J). Define a = \sum_{j=0}^{J} a_j; then we have (see Hesselager-Witting [35], formula (3))
E[ \gamma_j(\Theta_i) ] = \gamma_j = a_j / a,    (4.238)
Cov( \gamma_j(\Theta_i), \gamma_k(\Theta_i) ) = T_{j,k} = (1 + a)^{-1} \, \big( 1_{\{j=k\}} \, \gamma_j - \gamma_j \, \gamma_k \big).    (4.239)
If we then choose a specific form for the covariance structure \Sigma_{j,k,i}(\Theta_i), we can work out a credibility formula for the expected ultimate claim.

Of course there is a large variety of other credibility models, such as e.g. hierarchical credibility models (Hesselager [36]). We do not discuss them further here.
4.5 Kalman filter

Kalman filters are an enhancement of credibility models. We treat only the one-dimensional case, since already in the multivariate credibility context we have seen that it becomes difficult to go to higher dimensions.

Kalman filters are evolutionary credibility models. If we take e.g. the Bühlmann-Straub model, then it is assumed that the \Theta_i (i = 0, ..., I) are independent and identically distributed (see Model 4.45). If we go back to Example 4.54, we obtain the picture in Figure 4.2 for the observations Y_0, ..., Y_I and the estimate \widehat{\mu}_0 of the a priori mean \mu_0 (cf. (4.176) and (4.178), respectively).

[Figure 4.2: Observations Y_i and estimate \widehat{\mu}_0]

From Figure 4.2 it is not obvious that \Theta = (\Theta_0, \Theta_1, ...) is a process of identically distributed random variables. We could also have underwriting cycles, which would rather suggest that neighboring \Theta_i's are dependent. Hence, we assume that \Theta = (\Theta_0, \Theta_1, ...) is a stochastic process of random variables which are not necessarily independent and identically distributed.

Model Assumptions 4.64 (Kalman filter)

\Theta = (\Theta_0, \Theta_1, ...) is a stochastic process. Conditionally, given \Theta, the increments X_{i,j} are independent with, for all i, j,
E[ X_{i,j}/\gamma_j \,|\, \Theta ] = \mu(\Theta_i),    (4.240)
Cov( X_{i,j}/\gamma_j, \, X_{k,l}/\gamma_l \,|\, \Theta ) = 1_{\{i=k, \, j=l\}} \, \sigma^2(\Theta_i)/\gamma_j.    (4.241)
(\mu(\Theta_i))_{i \ge 0} is a martingale.

Remarks 4.65

The assumption (4.241) can be relaxed in the sense that we only need conditional uncorrelatedness in the average over \Theta.

Assumption (4.241) implies that we obtain a recursive updating procedure.

The martingale assumption implies that we have uncorrelated, centered increments \mu(\Theta_{i+1}) - \mu(\Theta_i) (see also (1.25)):
E[ \mu(\Theta_{i+1}) \,|\, \mu(\Theta_0), ..., \mu(\Theta_i) ] = \mu(\Theta_i).    (4.242)
In Hilbert space language this reads as follows: the projection of \mu(\Theta_{i+1}) onto the subspace of all square integrable functions of \mu(\Theta_0), ..., \mu(\Theta_i) is simply \mu(\Theta_i), i.e. the process (\mu(\Theta_i))_{i \ge 0} has centered orthogonal increments. This last assumption could be generalized to linear transformations (see Corollary 9.5 in Bühlmann-Gisler [18]).

We introduce the following notation (it is motivated by the usual terminology of state space models, see e.g. Abraham-Ledolter [1]):
Y_i = ( X_{i,0}/\gamma_0, ..., X_{i,I-i}/\gamma_{I-i} ),    (4.243)
\widehat{\mu}_{i|i-1} = \mathrm{argmin}_{\widehat{\mu} \in L(Y_0, ..., Y_{i-1}, 1)} \; E\big[ (\mu(\Theta_i) - \widehat{\mu})^2 \big],    (4.244)
\widehat{\mu}_{i|i} = \mathrm{argmin}_{\widehat{\mu} \in L(Y_0, ..., Y_i, 1)} \; E\big[ (\mu(\Theta_i) - \widehat{\mu})^2 \big]    (4.245)
(cf. (4.170)). \widehat{\mu}_{i|i-1} is the best linear forecast for \mu(\Theta_i) based on the information Y_0, ..., Y_{i-1}, whereas \widehat{\mu}_{i|i} is the best linear forecast for \mu(\Theta_i) which is also based on Y_i. Hence there are two updating procedures: 1) updating from \widehat{\mu}_{i|i-1} to \widehat{\mu}_{i|i} on the basis of the newest observation Y_i, and 2) updating from \widehat{\mu}_{i|i} to \widehat{\mu}_{i+1|i} due to the parameter movement from \mu(\Theta_i) to \mu(\Theta_{i+1}).

We define the following structural parameters:
\sigma^2 = E[ \sigma^2(\Theta_i) ],    (4.246)
\delta_i^2 = Var( \mu(\Theta_i) - \mu(\Theta_{i-1}) ),    (4.247)
q_{i|i-1} = E\big[ ( \widehat{\mu}_{i|i-1} - \mu(\Theta_i) )^2 \big],    (4.248)
q_{i|i} = E\big[ ( \widehat{\mu}_{i|i} - \mu(\Theta_i) )^2 \big].    (4.249)

Theorem 4.66 (Kalman filter recursion formula, Theorem 9.6 in [18]) Under Model Assumptions 4.64 we have
1. Anchoring (i = 0):
\widehat{\mu}_{0|-1} = \mu_0 = E[ \mu(\Theta_0) ] \quad and \quad q_{0|-1} = \tau_0^2 = Var( \mu(\Theta_0) ).    (4.250)

2. Recursion (i \ge 0):

a) Observation update:
\widehat{\mu}_{i|i} = \alpha_i \, Y_i + (1 - \alpha_i) \, \widehat{\mu}_{i|i-1},    (4.251)
q_{i|i} = (1 - \alpha_i) \, q_{i|i-1},    (4.252)
with
\alpha_i = \frac{\beta_{I-i}}{\beta_{I-i} + \sigma^2 / q_{i|i-1}},    (4.254)
Y_i = \sum_{j=0}^{(I-i) \wedge J} \frac{\gamma_j}{\beta_{I-i}} \, \frac{X_{i,j}}{\gamma_j} = \frac{C_{i,(I-i) \wedge J}}{\beta_{I-i}}.    (4.255)

b) Parameter update:
\widehat{\mu}_{i+1|i} = \widehat{\mu}_{i|i} \quad and \quad q_{i+1|i} = q_{i|i} + \delta_{i+1}^2.    (4.256)

Proof. For the proof we refer to Theorem 9.6 in Bühlmann-Gisler [18].

This leads to the following reserving estimator:

Estimator 4.67 (Kalman filter reserving estimator) In the Kalman filter model 4.64 the estimator is given by
\widehat{C}^{\,Ka}_{i,J} = \widehat{E}[ C_{i,J} \,|\, D_I ] = C_{i,I-i} + (1 - \beta_{I-i}) \, \widehat{\mu}_{i|i}    (4.257)
for I - J + 1 \le i \le I.

Remarks 4.68

In practice we face two difficulties: 1) We need to estimate all the parameters. 2) We need good estimates for the starting values \mu_0 and \tau_0^2 of the iteration.

Parameter estimation: For the estimation of \sigma^2 we choose \widehat{\sigma}^2 as in the Bühlmann-Straub model (see (4.197)). The estimation of \delta_i^2 is less straightforward; in fact, we need to define a special case of Model Assumptions 4.64.
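One recursion step of Theorem 4.66 is easy to code; the following sketch uses our own variable names (mu_pred and q_pred for \widehat{\mu}_{i|i-1} and q_{i|i-1}) and is not part of the original text.

```python
def kalman_step(mu_pred, q_pred, Y, beta, sigma2, delta2_next):
    """One Kalman recursion step of Theorem 4.66 (a sketch):
    observation update (4.251)-(4.254) followed by parameter update (4.256)."""
    alpha = beta / (beta + sigma2 / q_pred)        # credibility weight (4.254)
    mu_filt = alpha * Y + (1.0 - alpha) * mu_pred  # observation update (4.251)
    q_filt = (1.0 - alpha) * q_pred                # (4.252)
    # parameter update (4.256): the forecast keeps mu, the uncertainty grows
    return mu_filt, q_filt, mu_filt, q_filt + delta2_next
```

Iterating this step over the accident years i = 0, ..., I, starting from \widehat{\mu}_{0|-1} = \mu_0 and q_{0|-1} = \tau_0^2, yields the values \widehat{\mu}_{i|i} entering the reserving estimator (4.257).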
Model Assumptions 4.69 (Gerber-Jones [28])

Model Assumptions 4.64 hold. There exists a sequence (\epsilon_i)_{i \ge 1} of independent random variables with E[\epsilon_i] = 0 and Var(\epsilon_i) = \delta^2 such that
\mu(\Theta_i) = \mu(\Theta_{i-1}) + \epsilon_i    (4.258)
for all i \ge 1. \mu(\Theta_0) and \epsilon_i are independent for all i \ge 1.

Remark. In this model we have \delta_i^2 = Var( \mu(\Theta_i) - \mu(\Theta_{i-1}) ) = Var(\epsilon_i) = \delta^2.

Let us first calculate the variances and covariances of the Y_i defined in (4.255):
Var(Y_i) = Var( E[Y_i \,|\, \Theta] ) + E[ Var(Y_i \,|\, \Theta) ]
= Var( \mu(\Theta_i) ) + E\Big[ \sum_{j=0}^{(I-i) \wedge J} \frac{\gamma_j^2}{\beta_{I-i}^2} \, Var\big( X_{i,j}/\gamma_j \,\big|\, \Theta \big) \Big]
= Var( \mu(\Theta_0) ) + i \, \delta^2 + \frac{\sigma^2}{\beta_{I-i}}.    (4.259)
Assume that i > l; then
Cov(Y_i, Y_l) = Cov( E[Y_i \,|\, \Theta], E[Y_l \,|\, \Theta] ) + E[ Cov(Y_i, Y_l \,|\, \Theta) ]
= Cov( \mu(\Theta_i), \mu(\Theta_l) )
= Cov\Big( \mu(\Theta_l) + \sum_{k=l+1}^{i} \epsilon_k, \; \mu(\Theta_l) \Big)
= Var( \mu(\Theta_0) ) + l \, \delta^2.    (4.260)

We define \bar{Y} as in (4.199) with \beta_\bullet = \sum_{i=0}^{I} \beta_{I-i}. Hence
E\Big[ \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, \big( Y_i - \bar{Y} \big)^2 \Big] = \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, Var(Y_i) - \sum_{k,l=0}^{I} \frac{\beta_{I-k} \, \beta_{I-l}}{\beta_\bullet^2} \, Cov(Y_k, Y_l)
= (I+1) \, \frac{\sigma^2}{\beta_\bullet} + \delta^2 \, \sum_{i,k=0}^{I} \max\{i-k, 0\} \, \frac{\beta_{I-i} \, \beta_{I-k}}{\beta_\bullet^2}.    (4.261)
This motivates the following unbiased estimator for \delta^2 (see also (4.198)):
\widehat{\delta}^2 = c \, \Big\{ \sum_{i=0}^{I} \frac{\beta_{I-i}}{\beta_\bullet} \, \big( Y_i - \bar{Y} \big)^2 - (I+1) \, \frac{\widehat{\sigma}^2}{\beta_\bullet} \Big\} = c \, \Big\{ T - (I+1) \, \frac{\widehat{\sigma}^2}{\beta_\bullet} \Big\},    (4.262)
with
c = \Big\{ \sum_{i,k=0}^{I} \max\{i-k, 0\} \, \frac{\beta_{I-i} \, \beta_{I-k}}{\beta_\bullet^2} \Big\}^{-1}.    (4.263)
Observe that expression (4.262) is similar to the estimator of \tau^2 in the Bühlmann-Straub model (4.200); the difference lies in the constant c.

Example 4.70 (Kalman filter) We revisit Example 4.54. We have the following parameters and estimates:
c = 0.62943,    (4.264)
\bar{Y} = 9 911 975,    (4.265)
\widehat{\sigma} = 337 289,    (4.266)
\widehat{\delta} = 545 637.    (4.267)
We start the iteration with the estimates (see also (4.178))
\widehat{\mu}_0 = 9 885 584 \quad and \quad \widehat{\tau}_0 = \widehat{\delta} = 545 637.    (4.268)

   \widehat{\mu}_{i|i-1}  q^{1/2}_{i|i-1}  \alpha_i     Y_i     \widehat{\mu}_{i|i}  q^{1/2}_{i|i}  \widehat{\mu}_{i+1|i}  q^{1/2}_{i+1|i}
0   9 885 584   545 637   72.4%   11 148 123   10 799 066   286 899   10 799 066   616 466
1  10 799 066   616 466   76.9%   10 663 316   10 694 625   296 057   10 694 625   620 781
2  10 694 625   620 781   77.2%   10 662 005   10 669 454   296 651   10 669 454   621 064
3  10 669 454   621 064   77.2%    9 758 602    9 966 628   296 805    9 966 628   621 138
4   9 966 628   621 138   77.1%    9 872 213    9 893 857   297 401    9 893 857   621 423
5   9 893 857   621 423   77.0%   10 092 241   10 046 550   298 230   10 046 550   621 820
6  10 046 550   621 820   76.7%    9 568 136    9 679 468   299 967    9 679 468   622 655
7   9 679 468   622 655   76.4%    8 705 370    8 935 539   302 670    8 935 539   623 962
8   8 935 539   623 962   75.1%    8 691 961    8 752 681   311 533    8 752 681   628 309
9   8 752 681   628 309   67.2%    9 626 366    9 339 528   360 009    9 339 528   653 702

Table 4.19: Iteration in the Kalman filter

We see that in this example the credibility weights are smaller than in the Bühlmann-Straub model (see Table 4.15). However, they are still rather high, which means that the a priori value \widehat{\mu}_{i|i-1} moves rather closely with the observations Y_{i-1}. Hence we are now able to model dependent time series, where the a priori value incorporates the past observed loss ratios; see Figure 4.3.
[Figure 4.3: Observations Y_i, estimate \widehat{\mu}_0 and estimates \widehat{\mu}_{i|i-1}]

      \widehat{C}^{\,Ka}_{i,J}   reserves CL   reserves hom. cred.   reserves Kalman
0     11 148 123             0                 0                 0
1     10 663 360        15 126            14 934            15 170
2     10 662 023        26 257            25 924            26 275
3      9 759 339        34 538            34 616            35 274
4      9 872 400        85 302            85 322            85 489
5     10 091 532       156 494           155 929           155 785
6      9 571 465       286 121           287 814           289 450
7      8 717 246       449 167           460 234           461 042
8      8 699 249     1 043 242         1 070 913         1 050 529
9      9 508 643     3 950 815         3 978 818         3 833 085
total                6 047 061         6 114 503         5 952 100

Table 4.20: Chain-ladder reserves, homogeneous Bühlmann-Straub reserves and Kalman filter reserves
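The moment estimator (4.262)-(4.263) used in the example above can be sketched as follows; this is our own illustration (the list beta holds \beta_{I-i} indexed by accident year i), with the negative case truncated at zero as for \widehat{\tau}^2.

```python
def delta2_hat(Y, beta, sigma2_hat):
    """Moment estimator for delta^2 in the Gerber-Jones model, cf.
    (4.262)-(4.263).  Y[i] is the observation Y_i, beta[i] stands for
    beta_{I-i}; negative estimates are truncated at zero (a sketch)."""
    I = len(Y) - 1                              # accident years i = 0, ..., I
    btot = sum(beta)                            # beta_bullet
    Ybar = sum(b * y for b, y in zip(beta, Y)) / btot
    T = sum(b / btot * (y - Ybar) ** 2 for b, y in zip(beta, Y))
    c = 1.0 / (sum(max(i - k, 0) * beta[i] * beta[k]
                   for i in range(I + 1) for k in range(I + 1)) / btot ** 2)
    return max(0.0, c * (T - (I + 1) * sigma2_hat / btot))
```

With constant observations the weighted dispersion T vanishes and the estimator returns zero, as one would expect for a parameter process without movement.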
Chapter 5

Outlook

Several topics on stochastic claims reserving methods still need to be added to the current version of this manuscript, e.g.:

- explicit distributional models and methods, such as the log-normal model or Tweedie's compound Poisson model
- generalized linear model methods
- bootstrapping methods
- multivariate methods
- Munich chain-ladder method
- etc.
Appendix A

Unallocated loss adjustment expenses

A.1 Motivation

In this appendix we describe the New York-method for the estimation of unallocated loss adjustment expenses (ULAE). The New York-method for estimating ULAE is, unfortunately, only poorly documented in the literature (e.g. as footnotes in Feldblum [26] and Foundation CAS [19]).

In non-life insurance there are usually two different kinds of claims handling costs, external ones and internal ones. External costs, such as fees for external lawyers or for an external expertise, are usually allocated to single claims and are therefore contained in the usual claims payments and loss development figures. These payments are called allocated loss adjustment expenses (ALAE). Internal loss adjustment expenses (income of the claims handling department, maintenance of the claims handling system, etc.), in contrast, are typically not contained in the claims figures and therefore have to be estimated separately. These internal costs can usually not be allocated to single claims. We call these costs unallocated loss adjustment expenses (ULAE).

From a regulatory point of view, we should also build reserves for these costs/expenses, because they are part of the claims handling process which guarantees that an insurance company is able to meet all its obligations. That is, ULAE reserves should guarantee the smooth run-off of the old insurance liabilities without pay-as-you-go financing of the internal claims handling processes by new business/premium.
A.2 Pure claims payments

Usually, claims development figures consist only of pure claims payments, not containing ULAE charges. They are usually studied in loss development triangles or trapezoids as above (see Section 1.3). In this section we denote by X^{pure}_{i,j} the pure incremental payments for accident year i (0 \le i \le I) in development year j (0 \le j \le J). "Pure" always means that these quantities do not contain ULAE; this is exactly the quantity studied in Section 1.3. The cumulative pure payments for accident year i after development period j are denoted by (see (1.41))
C^{pure}_{i,j} = \sum_{k=0}^{j} X^{pure}_{i,k}.    (A.1)
We assume that X^{pure}_{i,j} = 0 for all j > J, i.e. the ultimate pure cumulative loss is given by C^{pure}_{i,J}.

We have observations for
D_I = \{ X^{pure}_{i,j}; \; 0 \le i \le I \; and \; 0 \le j \le \min\{J, I-i\} \},
and the complement of D_I needs to be predicted.

For the New York-method we also need a second type of development trapezoid, namely a reporting trapezoid: for accident year i, Z^{pure}_{i,j} denotes the pure cumulative ultimate claim amount of all those claims which are reported up to and including development year j. Hence Z^{pure}_{i,0}, Z^{pure}_{i,1}, ... with Z^{pure}_{i,J} = C^{pure}_{i,J} describes how the pure ultimate claim C^{pure}_{i,J} is reported over time at the insurance company. Of course, this reporting pattern is much more delicate, because the sizes reported in the upper trapezoid
\{ Z^{pure}_{i,j}; \; 0 \le i \le I \; and \; 0 \le j \le \min\{J, I-i\} \}
are still developing, since usually quite some time passes between the reporting and the final settlement of a claim. In general, the final value of Z^{pure}_{i,j} is only known at time i + J.

Remark: Since the New York-method is an algorithm based on deterministic numbers, we assume that all our variables are deterministic. Stochastic variables are replaced by the best estimate for their conditional mean at time I.
We think that a stochastic framework would not be helpful for explaining the New York-method in the current presentation.
A.3 ULAE charges

The cumulative ULAE payments for accident year i until development period j are denoted by C^{ULAE}_{i,j}, and the total cumulative payments (pure and ULAE) are denoted by
C_{i,j} = C^{pure}_{i,j} + C^{ULAE}_{i,j}.    (A.2)
The cumulative ULAE payments C^{ULAE}_{i,j} and the incremental ULAE charges
X^{ULAE}_{i,j} = C^{ULAE}_{i,j} - C^{ULAE}_{i,j-1}    (A.3)
need to be estimated. The main difficulty is that for each accounting year t \le I we usually have only one aggregated observation
X^{ULAE}_t = \sum_{i+j=t, \; 0 \le j \le J} X^{ULAE}_{i,j} \qquad (sum over the t-diagonal).    (A.4)
I.e. ULAE payments are usually not available for single accident years; rather, we have one position "Total ULAE Expenses" for each accounting year t (in general, ULAE charges are contained in the position "Administrative Expenses" of the annual profit-and-loss statement). Hence, for the estimation of future ULAE payments, we first need to define an appropriate model in order to split the aggregated observations X^{ULAE}_t into the different accident years, i.e. into the X^{ULAE}_{i,j}.

A.4 New York-method

The New York-method assumes that one part of the ULAE charge is proportional to the claims registration (denote this proportion by r \in [0, 1]) and that the other part is proportional to the settlement payments of the claims (proportion 1 - r).

Assumption A.1 We assume that there are two development patterns (\gamma_j)_{j=0,...,J} and (\delta_j)_{j=0,...,J} with \gamma_j \ge 0, \delta_j \ge 0 for all j, and \sum_{j=0}^{J} \gamma_j = \sum_{j=0}^{J} \delta_j = 1, such that (cashflow or payout pattern)
X^{pure}_{i,j} = \gamma_j \, C^{pure}_{i,J}    (A.5)
and (reporting pattern)
Z^{pure}_{i,j} = \sum_{l=0}^{j} \delta_l \; C^{pure}_{i,J}    (A.6)
for all i and j.
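In practice the payout pattern (\gamma_j)_j is typically derived from estimated chain-ladder factors, as proposed in (A.7) below. A small sketch of that conversion (our own function name; f holds the estimated factors \widehat{f}_0, ..., \widehat{f}_{J-1}):

```python
def payout_pattern(f):
    """Incremental payout pattern gamma_j from chain-ladder factors
    f_0, ..., f_{J-1}: beta_j = prod_{k=j}^{J-1} 1/f_k (cumulative paid
    ratio), gamma_j = beta_j - beta_{j-1}.  A sketch, cf. (A.7)."""
    J = len(f)
    beta = [1.0] * (J + 1)          # beta_J = 1: fully developed
    for j in range(J - 1, -1, -1):
        beta[j] = beta[j + 1] / f[j]
    return [beta[0]] + [beta[j] - beta[j - 1] for j in range(1, J + 1)]
```

The resulting increments are nonnegative whenever all factors exceed 1, and they always sum to 1, as Assumption A.1 requires.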
Remarks:

Equation (A.5) describes how the pure ultimate claim C^{pure}_{i,J} is paid over time; in fact, \gamma_j gives the cashflow pattern of the pure ultimate claim C^{pure}_{i,J}. We propose to estimate \gamma_j by the classical chain-ladder factors \widehat{f}_j, see (3.12):
\widehat{\gamma}^{\,CL}_j = \prod_{k=j}^{J-1} \widehat{f}_k^{\,-1} - \prod_{k=j-1}^{J-1} \widehat{f}_k^{\,-1}.    (A.7)
The estimation of the claims reporting pattern \delta_j in (A.6) is more delicate. As we have seen, there are not many claims reserving methods which give a reporting pattern \delta_j. Such a pattern can only be obtained if one separates the claims estimates for reported claims and IBNyR claims (incurred but not yet reported).

Model Assumptions A.2 Assume that there exists r \in [0, 1] such that the incremental ULAE payments satisfy, for all i and all j,
X^{ULAE}_{i,j} = \big( r \, \delta_j + (1 - r) \, \gamma_j \big) \, C^{ULAE}_{i,J}.    (A.8)

Henceforth, we assume that one part r of the ULAE charge is proportional to the reporting pattern (one has loss adjustment expenses at the registration of the claim), and that the other part 1 - r of the ULAE charge is proportional to the claims settlement (measured by the payout pattern).

Definition A.3 (paid-to-paid ratio) We define for all t
\pi_t = \frac{X^{ULAE}_t}{X^{pure}_t} = \frac{\sum_{i+j=t, \; 0 \le j \le J} X^{ULAE}_{i,j}}{\sum_{i+j=t, \; 0 \le j \le J} X^{pure}_{i,j}}.    (A.9)
The paid-to-paid ratio measures the ULAE payments relative to the pure claims payments in each accounting year t.

Lemma A.4 Assume that there exists \pi > 0 such that for all accident years i we have
\frac{C^{ULAE}_{i,J}}{C^{pure}_{i,J}} = \pi.    (A.10)
Under Assumption A.1 and Model A.2 we have for all accounting years t
\pi_t = \pi,    (A.11)
whenever C^{pure}_{i,J} is constant in i.
Proof of Lemma A.4. We have
\pi_t = \frac{\sum_{i+j=t} X^{ULAE}_{i,j}}{\sum_{i+j=t} X^{pure}_{i,j}} = \frac{\sum_{j=0}^{J} \big( r \, \delta_j + (1-r) \, \gamma_j \big) \, C^{ULAE}_{t-j,J}}{\sum_{j=0}^{J} \gamma_j \, C^{pure}_{t-j,J}} = \pi \, \frac{\sum_{j=0}^{J} \big( r \, \delta_j + (1-r) \, \gamma_j \big) \, C^{pure}_{t-j,J}}{\sum_{j=0}^{J} \gamma_j \, C^{pure}_{t-j,J}} = \pi,    (A.12)
where the last step uses that C^{pure}_{t-j,J} is constant in the accident year and that \sum_j \delta_j = \sum_j \gamma_j = 1. This finishes the proof.

We define the following split of the claims reserves for accident year i at time j:
R^{pure}_{i,j} = \sum_{l>j} X^{pure}_{i,l} = \sum_{l>j} \gamma_l \; C^{pure}_{i,J} \quad (total pure claims reserves),
R^{IBNyR}_{i,j} = \sum_{l>j} \delta_l \; C^{pure}_{i,J} \quad (IBNyR reserves, incurred but not yet reported),
R^{rep}_{i,j} = R^{pure}_{i,j} - R^{IBNyR}_{i,j} \quad (reserves for reported claims).

Estimator A.5 (New York-method) Under the assumptions of Lemma A.4 we can predict \pi using the observations \pi_t (accounting year data). The reserves for ULAE charges for accident year i after development year j, R^{ULAE}_{i,j} = \sum_{l>j} X^{ULAE}_{i,l}, are estimated by
\widehat{R}^{ULAE}_{i,j} = \widehat{\pi} \, r \, R^{IBNyR}_{i,j} + \widehat{\pi} \, (1-r) \, R^{pure}_{i,j} = \widehat{\pi} \, R^{IBNyR}_{i,j} + \widehat{\pi} \, (1-r) \, R^{rep}_{i,j}.    (A.13)

Explanation of Estimator A.5. Under the assumptions of Lemma A.4 we have, for all i, j,
R^{ULAE}_{i,j} = \sum_{l>j} \big( r \, \delta_l + (1-r) \, \gamma_l \big) \, C^{ULAE}_{i,J} = \pi \, \sum_{l>j} \big( r \, \delta_l + (1-r) \, \gamma_l \big) \, C^{pure}_{i,J} = \pi \, r \, R^{IBNyR}_{i,j} + \pi \, (1-r) \, R^{pure}_{i,j}.    (A.14)

Remarks:
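The reserve formula (A.13) is a one-liner once the reserve split is available. A sketch with our own names (pi_hat for the estimated paid-to-paid ratio):

```python
def ulae_reserve(pi_hat, r, R_ibnyr, R_rep):
    """New York ULAE reserve, cf. (A.13), a sketch:
    pi * (r * R_IBNyR + (1 - r) * R_pure), with R_pure = R_IBNyR + R_rep."""
    R_pure = R_ibnyr + R_rep
    return pi_hat * (r * R_ibnyr + (1.0 - r) * R_pure)
```

Note that the two forms in (A.13) agree: expanding R^{pure} = R^{IBNyR} + R^{rep} turns \widehat{\pi} r R^{IBNyR} + \widehat{\pi}(1-r) R^{pure} into \widehat{\pi} R^{IBNyR} + \widehat{\pi}(1-r) R^{rep}.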
In practice one assumes the stationarity condition $\pi_t = \pi$ for all $t$. This implies that $\pi$ can be estimated from the accounting data of the annual profit-and-loss statements. Pure claims payments are directly contained in the profit-and-loss statements, whereas ULAE payments are often contained in the administrative expenses. Hence one needs to divide this position into further subpositions, e.g. with the help of an activity-based cost allocation split.

Estimator A.5 gives an easy formula for estimating the ULAE reserves. If we are interested in the total ULAE reserves after accounting year $t$, we simply have
$$\widehat R^{ULAE}_t = \sum_{i+j=t} \widehat R^{ULAE}_{i,j} = \widehat\pi \sum_{i+j=t} R^{IBNyR}_{i,j} + \widehat\pi\,(1-r) \sum_{i+j=t} R^{rep}_{i,j}, \qquad (A.15)$$
i.e. all we need to know is how to split the total pure claims reserves into reserves for IBNyR claims and reserves for reported claims.

The assumptions for the New York-method are rather restrictive in the sense that the pure cumulative ultimate claim $C^{pure}_{i,J}$ must be constant in $i$ (see Lemma A.4). Otherwise the paid-to-paid ratio $\pi_t$ for accounting years is not the same as the ratio $C^{ULAE}_{i,J}/C^{pure}_{i,J}$, even if the latter is assumed to be constant. Of course, in practice the assumption of equal pure cumulative ultimate claims is never fulfilled. If we relax this condition we obtain the following lemma.

Lemma A.6 Assume there exists $\pi > 0$ such that for all accident years $i$ we have
$$\frac{C^{ULAE}_{i,J}}{C^{pure}_{i,J}} = \pi \left( r\, \frac{\bar\delta}{\bar\gamma} + (1-r) \right)^{-1}, \qquad (A.16)$$
with
$$\bar\gamma = \frac{\sum_{j=0}^{J} \gamma_j\, C^{pure}_{t-j,J}}{\sum_{j=0}^{J} C^{pure}_{t-j,J}} \qquad \text{and} \qquad \bar\delta = \frac{\sum_{j=0}^{J} \delta_j\, C^{pure}_{t-j,J}}{\sum_{j=0}^{J} C^{pure}_{t-j,J}}. \qquad (A.17)$$
Under Assumption A.1 and Model A.2 we have for all accounting years $t$: $\pi_t = \pi$.

Proof of Lemma A.6. As in Lemma A.4 we obtain
$$\pi_t = \pi \left( r\, \frac{\bar\delta}{\bar\gamma} + (1-r) \right)^{-1} \frac{\sum_{j=0}^{J} \bigl( r\,\delta_j + (1-r)\,\gamma_j \bigr)\, C^{pure}_{t-j,J}}{\sum_{j=0}^{J} \gamma_j\, C^{pure}_{t-j,J}} \qquad (A.18)$$
$$= \pi. \qquad (A.19)$$
This finishes the proof.
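Lemma A.6 can be verified numerically: with the corrected ratio (A.16), the paid-to-paid ratio equals $\pi$ even for growing ultimates. The patterns, the growth rate and $\pi$ below are illustrative assumptions:

```python
# Numerical check of Lemma A.6: using the correction factor (A.16), the
# paid-to-paid ratio pi_t equals pi even for non-constant pure ultimates.
# Patterns, growth rate and pi are illustrative assumptions.

delta = [0.80, 0.15, 0.05]                   # reporting pattern
gamma = [0.40, 0.35, 0.25]                   # payout pattern
r, pi = 0.5, 0.1
J = len(gamma) - 1
C = [1000.0 * 1.05 ** i for i in range(10)]  # growing pure ultimates C_pure
t = 9                                        # accounting year considered

weights = [C[t - j] for j in range(J + 1)]
gbar = sum(g * w for g, w in zip(gamma, weights)) / sum(weights)  # (A.17)
dbar = sum(d * w for d, w in zip(delta, weights)) / sum(weights)  # (A.17)
ratio = pi / (r * dbar / gbar + (1 - r))     # C_ULAE / C_pure, as in (A.16)

num = sum((r * delta[j] + (1 - r) * gamma[j]) * ratio * C[t - j]
          for j in range(J + 1))             # ULAE payments in year t
den = sum(gamma[j] * C[t - j] for j in range(J + 1))   # pure payments
pi_t = num / den
assert abs(pi_t - pi) < 1e-12                # Lemma A.6: pi_t = pi
```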
Remarks:

If all pure cumulative ultimates are equal, then $\bar\gamma = \bar\delta = 1/(J+1)$ (apply Lemma A.4).

Assume that there exists a constant $i_p > 0$ such that for all $i \ge 0$ we have $C^{pure}_{i+1,J} = (1+i_p)\, C^{pure}_{i,J}$, i.e. constant growth $i_p$. If we blindly apply (A.11) of Lemma A.4 (i.e. we do not apply the correction factor in (A.16)) and estimate the incremental ULAE payments by (A.13) and (A.15), we obtain
$$\sum_{i+j=t} \widehat X^{ULAE}_{i,j} = \widehat\pi \sum_{j=0}^{J} \bigl( r\,\delta_j + (1-r)\,\gamma_j \bigr)\, C^{pure}_{t-j,J}
= \left( r\, \frac{\sum_{j=0}^{J} \delta_j\,(1+i_p)^{J-j}}{\sum_{j=0}^{J} \gamma_j\,(1+i_p)^{J-j}} + (1-r) \right) \sum_{i+j=t} X^{ULAE}_{i,j} \;>\; \sum_{i+j=t} X^{ULAE}_{i,j}, \qquad (A.20)$$
with $\widehat\pi = X^{ULAE}_t / X^{pure}_t$, where the last inequality in general holds true for $i_p > 0$, since usually $(\delta_j)_j$ is more concentrated than $(\gamma_j)_j$, i.e. we usually have $J \ge 1$ and
$$\sum_{l=0}^{j} \delta_l > \sum_{l=0}^{j} \gamma_l \qquad \text{for } j = 0, \ldots, J-1. \qquad (A.21)$$
This comes from the fact that claims are reported before they are paid. That is, if we blindly apply the New York-method under constant positive growth, then the ULAE reserves are too high (for constant negative growth we obtain the opposite sign). This implies that we always have a positive loss experience on the ULAE reserves under constant positive growth.

A.5 Example

We assume that the observations for $\pi_t$ are generated by i.i.d. random variables $X^{ULAE}_t / X^{pure}_t$. Hence we can estimate $\pi$ from this sequence. Assume $\widehat\pi = 10\%$. Moreover, let $i_p = 0$ and set $r = 50\%$ (this is the usual choice, also made in the SST [73]).
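The bias factor in (A.20) can be made explicit in a short sketch; the patterns and the growth rate $i_p$ below are illustrative assumptions:

```python
# Illustration of (A.20): under constant growth i_p > 0, the uncorrected
# New York-method overstates ULAE payments by the factor
#   r * (sum_j delta_j w_j) / (sum_j gamma_j w_j) + (1 - r),
# with weights w_j = (1 + i_p)^(J - j). Patterns and i_p are assumptions.

delta = [0.90, 0.10, 0.00, 0.00, 0.00]   # reporting pattern (front-loaded)
gamma = [0.30, 0.20, 0.20, 0.20, 0.10]   # payout pattern (more spread out)
r, i_p = 0.5, 0.05
J = len(gamma) - 1

w = [(1 + i_p) ** (J - j) for j in range(J + 1)]
factor = (r * sum(d * wj for d, wj in zip(delta, w))
            / sum(g * wj for g, wj in zip(gamma, w))
          + (1 - r))

assert factor > 1.0   # ULAE reserves come out too high for i_p > 0
```

Because the reporting pattern puts its mass earlier than the payout pattern, the weighted ratio exceeds one, which is exactly the systematic overstatement described in the remark.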
Moreover we assume that we have the following reporting and cashflow patterns ($J = 4$):
$$(\delta_0, \ldots, \delta_4) = (90\%, 10\%, 0\%, 0\%, 0\%), \qquad (A.22)$$
$$(\gamma_0, \ldots, \gamma_4) = (30\%, 20\%, 20\%, 20\%, 10\%). \qquad (A.23)$$
Assume that $C^{pure}_{i,J} = 1\,000$. Then the ULAE reserves for accident year $i$ are given by
$$\bigl( \widehat R^{ULAE}_{i,-1}, \ldots, \widehat R^{ULAE}_{i,3} \bigr) = (100, 40, 25, 15, 5), \qquad (A.24)$$
which implies for the estimated incremental ULAE payments
$$\bigl( \widehat X^{ULAE}_{i,0}, \ldots, \widehat X^{ULAE}_{i,4} \bigr) = (60, 15, 10, 10, 5). \qquad (A.25)$$
Hence for the total estimated payments $\widehat X_{i,j} = X^{pure}_{i,j} + \widehat X^{ULAE}_{i,j}$ we have
$$\bigl( \widehat X_{i,0}, \ldots, \widehat X_{i,4} \bigr) = (360, 215, 210, 210, 105). \qquad (A.26)$$
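The numbers (A.24)-(A.26) can be reproduced with a minimal sketch of Estimator A.5, using the stated values $\widehat\pi = 10\%$, $r = 50\%$ and $C^{pure}_{i,J} = 1\,000$:

```python
# Sketch of Estimator A.5 (New York-method) on the worked example of
# Section A.5: pi_hat = 10%, r = 50%, C_pure = 1000, patterns (A.22)-(A.23).

pi_hat = 0.10
r = 0.50
C_pure = 1000.0
delta = [0.90, 0.10, 0.00, 0.00, 0.00]   # reporting pattern (A.22)
gamma = [0.30, 0.20, 0.20, 0.20, 0.10]   # cashflow pattern (A.23)
J = len(gamma) - 1

def ulae_reserve(j):
    """ULAE reserve after development year j, first form of (A.13)."""
    R_ibnyr = sum(delta[l] for l in range(j + 1, J + 1)) * C_pure
    R_pure = sum(gamma[l] for l in range(j + 1, J + 1)) * C_pure
    return pi_hat * (r * R_ibnyr + (1 - r) * R_pure)

reserves = [ulae_reserve(j) for j in range(-1, J)]   # j = -1, ..., 3
payments = [pi_hat * (r * delta[j] + (1 - r) * gamma[j]) * C_pure
            for j in range(J + 1)]                   # incremental ULAE
total = [gamma[j] * C_pure + payments[j] for j in range(J + 1)]

# reserves reproduces (A.24): 100, 40, 25, 15, 5 (up to float rounding)
# payments reproduces (A.25): 60, 15, 10, 10, 5
# total reproduces (A.26):    360, 215, 210, 210, 105
```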
Appendix B

Distributions

B.1 Discrete distributions

B.1.1 Binomial distribution

For $n \in \mathbb{N}$ and $p \in (0,1)$ the Binomial distribution $\text{Bin}(n,p)$ is defined to be the discrete distribution with probability function
$$f_{n,p}(x) = \binom{n}{x}\, p^x\, (1-p)^{n-x} \qquad (B.1)$$
for all $x \in \{0, \ldots, n\}$.

Table B.1: Expectation, variance and variational coefficient of a $\text{Bin}(n,p)$-distributed random variable $X$:
$$E[X] = n\,p, \qquad \text{Var}(X) = n\,p\,(1-p), \qquad \text{Vco}(X) = \sqrt{\frac{1-p}{n\,p}}.$$

B.1.2 Poisson distribution

For $\lambda \in (0,\infty)$ the Poisson distribution $\text{Poisson}(\lambda)$ is defined to be the discrete distribution with probability function
$$f_\lambda(x) = e^{-\lambda}\, \frac{\lambda^x}{x!} \qquad (B.2)$$
for all $x \in \mathbb{N}_0$.
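The moment formulas of Table B.1 can be verified by direct summation of the probability function (B.1); the parameters $n$ and $p$ below are illustrative:

```python
# Check of the Bin(n, p) moment formulas in Table B.1 by direct summation
# of the probability function (B.1); n and p are illustrative assumptions.
from math import comb, sqrt

n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
vco = sqrt(var) / mean

assert abs(sum(pmf) - 1) < 1e-12                   # probabilities sum to 1
assert abs(mean - n * p) < 1e-9                    # E[X] = n p
assert abs(var - n * p * (1 - p)) < 1e-9           # Var(X) = n p (1 - p)
assert abs(vco - sqrt((1 - p) / (n * p))) < 1e-9   # Vco(X) = sqrt((1-p)/(np))
```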
Table B.2: Expectation, variance and variational coefficient of a $\text{Poisson}(\lambda)$-distributed random variable $X$:
$$E[X] = \lambda, \qquad \text{Var}(X) = \lambda, \qquad \text{Vco}(X) = \frac{1}{\sqrt{\lambda}}.$$

B.1.3 Negative binomial distribution

For $r \in (0,\infty)$ and $p \in (0,1)$ the Negative binomial distribution $\text{NB}(r,p)$ is defined to be the discrete distribution with probability function
$$f_{r,p}(x) = \binom{r+x-1}{x}\, p^r\, (1-p)^x \qquad (B.3)$$
for all $x \in \mathbb{N}_0$. For $\alpha \in \mathbb{R}$ and $n \in \mathbb{N}_0$, the generalized binomial coefficient is defined to be
$$\binom{\alpha}{n} = \frac{\alpha\,(\alpha-1)\cdots(\alpha-n+1)}{n!} = \prod_{k=1}^{n} \frac{\alpha-k+1}{k}. \qquad (B.4)$$

Table B.3: Expectation, variance and variational coefficient of a $\text{NB}(r,p)$-distributed random variable $X$:
$$E[X] = \frac{r\,(1-p)}{p}, \qquad \text{Var}(X) = \frac{r\,(1-p)}{p^2}, \qquad \text{Vco}(X) = \frac{1}{\sqrt{r\,(1-p)}}.$$

B.2 Continuous distributions

B.2.1 Normal distribution

For $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ the Normal distribution $\mathcal{N}(\mu, \sigma^2)$ is defined to be the continuous distribution with density
$$f_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \mathbf{1}_{\mathbb{R}}(x). \qquad (B.5)$$

B.2.2 Log-normal distribution

For $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ the Log-normal distribution $\mathcal{LN}(\mu, \sigma^2)$ is defined to be the continuous distribution with density
$$f_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}\; x}\, \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right) \mathbf{1}_{(0,\infty)}(x). \qquad (B.6)$$
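The generalized binomial coefficient (B.4) and the moments of Table B.3 can be checked numerically, with a deliberately non-integer $r$ (the parameters are illustrative):

```python
# Sketch of the generalized binomial coefficient (B.4) and a check of the
# NB(r, p) moments in Table B.3; r, p are illustrative, r non-integer.
from math import sqrt

def gen_binom(alpha, n):
    """Generalized binomial coefficient: prod_{k=1}^{n} (alpha - k + 1) / k."""
    out = 1.0
    for k in range(1, n + 1):
        out *= (alpha - k + 1) / k
    return out

r, p = 2.5, 0.4
# f_{r,p}(x) = C(r + x - 1, x) p^r (1 - p)^x, truncated far in the tail
pmf = [gen_binom(r + x - 1, x) * p**r * (1 - p)**x for x in range(400)]

mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
vco = sqrt(var) / mean

assert abs(sum(pmf) - 1) < 1e-8
assert abs(mean - r * (1 - p) / p) < 1e-6          # E[X] = r(1-p)/p
assert abs(var - r * (1 - p) / p**2) < 1e-6        # Var(X) = r(1-p)/p^2
assert abs(vco - 1 / sqrt(r * (1 - p))) < 1e-6     # Vco(X) = 1/sqrt(r(1-p))
```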
Table B.4: Expectation, variance and variational coefficient of a $\mathcal{N}(\mu, \sigma^2)$-distributed random variable $X$:
$$E[X] = \mu, \qquad \text{Var}(X) = \sigma^2, \qquad \text{Vco}(X) = \frac{\sigma}{\mu}.$$

Table B.5: Expectation, variance and variational coefficient of a $\mathcal{LN}(\mu, \sigma^2)$-distributed random variable $X$:
$$E[X] = e^{\mu + \sigma^2/2}, \qquad \text{Var}(X) = e^{2\mu + \sigma^2}\bigl(e^{\sigma^2} - 1\bigr), \qquad \text{Vco}(X) = \sqrt{e^{\sigma^2} - 1}.$$

B.2.3 Gamma distribution

For $\gamma, c \in (0,\infty)$ the Gamma distribution $\Gamma(\gamma, c)$ is defined to be the continuous distribution with density
$$f_{\gamma,c}(x) = \frac{c^\gamma}{\Gamma(\gamma)}\, x^{\gamma-1}\, e^{-c\,x}\, \mathbf{1}_{(0,\infty)}(x). \qquad (B.7)$$
The map $\Gamma : (0,\infty) \to (0,\infty)$, given by
$$\Gamma(\gamma) = \int_0^\infty u^{\gamma-1}\, e^{-u}\, du, \qquad (B.8)$$
is called the Gamma function. The parameters $\gamma$ and $c$ are called shape and rate (inverse scale) parameter, respectively. The Gamma function has the following properties:
(1) $\Gamma(1) = 1$.
(2) $\Gamma(1/2) = \sqrt{\pi}$.
(3) $\Gamma(\gamma + 1) = \gamma\, \Gamma(\gamma)$.

Table B.6: Expectation, variance and variational coefficient of a $\Gamma(\gamma, c)$-distributed random variable $X$:
$$E[X] = \frac{\gamma}{c}, \qquad \text{Var}(X) = \frac{\gamma}{c^2}, \qquad \text{Vco}(X) = \frac{1}{\sqrt{\gamma}}.$$
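The properties of the Gamma function and the expectation in Table B.6 can be checked with Python's `math.gamma`; the parameters below are illustrative:

```python
# Check of the Gamma function properties (1)-(3) and the Gamma(shape, c)
# expectation in Table B.6; parameters are illustrative assumptions.
from math import gamma as Gamma, exp, pi, sqrt

# properties of the Gamma function
assert abs(Gamma(1.0) - 1.0) < 1e-12              # (1) Gamma(1) = 1
assert abs(Gamma(0.5) - sqrt(pi)) < 1e-12         # (2) Gamma(1/2) = sqrt(pi)
a = 3.7
assert abs(Gamma(a + 1) - a * Gamma(a)) < 1e-9    # (3) recurrence

# E[X] for the Gamma(shape, c) distribution via midpoint integration of (B.7)
shape, c = 2.0, 0.5
dx = 1e-3
mean = sum(x * c**shape / Gamma(shape) * x**(shape - 1) * exp(-c * x) * dx
           for x in (dx * (k + 0.5) for k in range(100_000)))
assert abs(mean - shape / c) < 1e-3               # E[X] = shape / c = 4
```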
B.2.4 Beta distribution

For $a, b \in (0,\infty)$ the Beta distribution $\text{Beta}(a,b)$ is defined to be the continuous distribution with density
$$f_{a,b}(x) = \frac{1}{B(a,b)}\, x^{a-1}\, (1-x)^{b-1}\, \mathbf{1}_{(0,1)}(x). \qquad (B.9)$$
The map $B : (0,\infty) \times (0,\infty) \to (0,\infty)$, given by
$$B(a,b) = \int_0^1 u^{a-1}\, (1-u)^{b-1}\, du = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}, \qquad (B.10)$$
is called the Beta function.

Table B.7: Expectation, variance and variational coefficient of a $\text{Beta}(a,b)$-distributed random variable $X$:
$$E[X] = \frac{a}{a+b}, \qquad \text{Var}(X) = \frac{a\,b}{(a+b)^2\,(a+b+1)}, \qquad \text{Vco}(X) = \sqrt{\frac{b}{a\,(a+b+1)}}.$$
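The identity in (B.10) can be verified against a numerical integral; the parameters $a$ and $b$ below are illustrative:

```python
# Check of the Beta-function identity (B.10),
#   B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b),
# against a midpoint-rule integral; a and b are illustrative assumptions.
from math import gamma as Gamma

a, b = 2.5, 1.5
du = 1e-6
integral = 0.0
for k in range(1_000_000):
    u = du * (k + 0.5)
    integral += u ** (a - 1) * (1 - u) ** (b - 1) * du

closed_form = Gamma(a) * Gamma(b) / Gamma(a + b)
assert abs(integral - closed_form) < 1e-4
```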
Bibliography

[1] Abraham, B., Ledolter, J. (1983), Statistical Methods for Forecasting. John Wiley and Sons, NY.
[2] Alba de, E. (2002), Bayesian estimation of outstanding claim reserves. North American Act. J. 6/4, 1-20.
[3] Alba de, E., Corzo, M.A.R. (2006), Bayesian claims reserving when there are negative values in the runoff triangle. Actuarial Research Clearing House, Jan 01, 2006.
[4] Alba de, E. (2006), Claims reserving when there are negative values in the runoff triangle: Bayesian analysis using the three-parameter log-normal distribution. North American Act. J. 10/3, 45-59.
[5] Arjas, E. (1989), The claims reserving problem in non-life insurance: some structural ideas. ASTIN Bulletin 19/2, 139-152.
[6] Barnett, G., Zehnwirth, B. (1998), Best estimate reserves. CAS Forum, 1-54.
[7] Barnett, G., Zehnwirth, B. (2000), Best estimates for reserves. Proc. CAS, Vol. LXXXII, 245-321.
[8] Benktander, G. (1976), An approach to credibility in calculating IBNR for casualty excess reinsurance. The Actuarial Review, April 1976, p. 7.
[9] Bernardo, J.M., Smith, A.F.M. (1994), Bayesian Theory. John Wiley and Sons, NY.
[10] Bornhuetter, R.L., Ferguson, R.E. (1972), The actuary and IBNR. Proc. CAS, Vol. LIX, 181-195.
[11] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2005), Legal valuation portfolio in non-life insurance. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[12] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), Estimation of unallocated loss adjustment expenses. Bulletin SAA 2006/1, 43-53.
[13] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), The mean square error of prediction in the chain ladder reserving method (Mack and Murphy revisited). To appear in ASTIN Bulletin 36/2.
[14] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), Valuation portfolio in non-life insurance. Preprint.
[15] Bühlmann, H. (1983), Chain ladder, Cape Cod and complementary loss ratio. International Summer School 1983, unpublished.
[16] Bühlmann, H. (1992), Stochastic discounting. Insurance: Math. Econom. 11, 113-127.
[17] Bühlmann, H. (1995), Life insurance with stochastic interest rates. In: Financial Risk in Insurance, G. Ottaviani (Ed.), Springer.
[18] Bühlmann, H., Gisler, A. (2005), A Course in Credibility Theory and its Applications. Springer Universitext.
[19] Casualty Actuarial Society (CAS) (1990), Foundations of Casualty Actuarial Science, fourth edition.
[20] Clark, D.R. (2003), LDF curve-fitting and stochastic reserving: a maximum likelihood approach. CAS Forum (Fall), 41-92.
[21] Efron, B. (1979), Bootstrap methods: another look at the jackknife. Ann. Statist. 7/1, 1-26.
[22] Efron, B., Tibshirani, R.J. (1995), An Introduction to the Bootstrap. Chapman & Hall, NY.
[23] England, P.D., Verrall, R.J. (1999), Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Math. Econom. 25, 281-293.
[24] England, P.D., Verrall, R.J. (2001), A flexible framework for stochastic claims reserving. Proc. CAS, Vol. LXXXIII, 1-18.
[25] England, P.D., Verrall, R.J. (2002), Stochastic claims reserving in general insurance. British Act. J. 8/3, 443-518.
[26] Feldblum, S. (2002), Completing and using schedule P. CAS Forum, 353-590.
[27] Finney, D.J. (1941), On the distribution of a variate whose logarithm is normally distributed. JRSS Suppl. 7, 155-161.
[28] Gerber, H.U., Jones, D.A. (1975), Credibility formulas of the updating type. In: Credibility: Theory and Applications, P.M. Kahn (Ed.), Academic Press, NY.
[29] Gisler, A. (2006), The estimation error in the chain-ladder reserving method: a Bayesian approach. To appear in ASTIN Bulletin 36/2.
[30] Gogol, D. (1993), Using expected loss ratios in reserving. Insurance: Math. Econom. 12, 297-299.
[31] Hachemeister, C.A. (1975), Credibility for regression models with application to trend. In: Credibility: Theory and Applications, P.M. Kahn (Ed.), Academic Press, NY.
[32] Haastrup, S., Arjas, E. (1996), Claims reserving in continuous time; a nonparametric Bayesian approach. ASTIN Bulletin 26/2, 139-164.
[33] Herbst, T. (1999), An application of randomly truncated data models in reserving IBNR claims. Insurance: Math. Econom. 25, 123-131.
[34] Hertig, J. (1985), A statistical approach to the IBNR-reserves in marine insurance. ASTIN Bulletin 15, 171-183.
[35] Hesselager, O., Witting, T. (1988), A credibility model with random fluctuations in delay probabilities for the prediction of IBNR claims. ASTIN Bulletin 18, 79-90.
[36] Hesselager, O. (1991), Prediction of outstanding claims: A hierarchical credibility approach. Scand. Act. J. 1991, 25-47.
[37] Hovinen, E. (1981), Additive and continuous IBNR. ASTIN Colloquium Loen, Norway.
[38] Jewell, W.S. (1976), Two classes of covariance matrices giving simple linear forecasts. Scand. Act. J. 1976, 15-29.
[39] Jewell, W.S. (1989), Predicting IBNyR events and delays. ASTIN Bulletin 19/1, 25-55.
[40] Jones, A.R., Copeman, P.J., Gibson, E.R., Line, N.J.S., Lowe, J.A., Martin, P., Matthews, P.N., Powell, D.S. (2006), A change agenda for reserving. Report of the general insurance reserving issues taskforce (GRIT). Institute of Actuaries and Faculty of Actuaries.
[41] Jong de, E. (2006), Forecasting runoff triangles. North American Act. J. 10/2, 28-38.
[42] Jong de, P., Zehnwirth, B. (1983), Claims reserving, state-space models and the Kalman filter. J.I.A. 110, 157-182.
[43] Jørgensen, B., de Souza, M.C.P. (1994), Fitting Tweedie's compound Poisson model to insurance claims data. Scand. Act. J. 1994, 69-93.
[44] Kremer, E. (1982), IBNR claims and the two way model of ANOVA. Scand. Act. J. 1982, 47-55.
[45] Larsen, C.R. (2005), A dynamic claims reserving model. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[46] Lyons, G., Forster, W., Kedney, P., Warren, R., Wilkinson, H. (2002), Claims reserving working party paper. General Insurance Convention 2002. http://www.actuaries.org.uk
[47] Mack, T. (1990), Improved estimation of IBNR claims by credibility. Insurance: Math. Econom. 9, 51-57.
[48] Mack, T. (1991), A simple parametric model for rating automobile insurance or estimating IBNR claims reserves. ASTIN Bulletin 21/1, 93-109.
[49] Mack, T. (1993), Distribution-free calculation of the standard error of chain ladder reserve estimates. ASTIN Bulletin 23/2, 213-225.
[50] Mack, T. (1994), Measuring the variability of chain ladder reserve estimates. CAS Forum (Spring), 101-182.
[51] Mack, T. (2000), Credible claims reserves: The Benktander method. ASTIN Bulletin 30/2, 333-347.
[52] Mack, T., Quarg, G., Braun, C. (2006), The mean square error of prediction in the chain ladder reserving method - a comment. To appear in ASTIN Bulletin 36/2.
[53] McCullagh, P., Nelder, J.A. (1989), Generalized Linear Models. 2nd edition, Chapman & Hall, London.
[54] Merz, M., Wüthrich, M.V. (2006), A credibility approach to the Munich chain-ladder method. To appear in Blätter DGVFM.
[55] Murphy, D.M. (1994), Unbiased loss development factors. Proc. CAS, Vol. LXXXI, 154-222.
[56] Neuhaus, W. (1992), Another pragmatic loss reserving method or Bornhuetter/Ferguson revisited. Scand. Act. J. 1992, 151-162.
[57] Neuhaus, W. (2004), On the estimation of outstanding claims. Conference paper, presented at the 35th International ASTIN Colloquium 2004, Bergen, Norway.
[58] Norberg, R. (1993), Prediction of outstanding liabilities in non-life insurance. ASTIN Bulletin 23/1, 95-115.
[59] Norberg, R. (1999), Prediction of outstanding liabilities II. Model variations and extensions. ASTIN Bulletin 29/1, 5-25.
[60] Ntzoufras, I., Dellaportas, P. (2002), Bayesian modelling of outstanding liabilities incorporating claim count uncertainty. North American Act. J. 6/1, 113-128.
[61] Partrat, C., Pey, N., Schilling, J. (2005), Delta method and reserving. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[62] Quarg, G., Mack, T. (2004), Munich Chain Ladder. Blätter DGVFM, Band XXVI, 597-630.
[63] Radtke, M., Schmidt, K.D. (2004), Handbuch zur Schadenreservierung. Verlag Versicherungswirtschaft, Karlsruhe.
[64] Renshaw, A.E. (1994), Claims reserving by joint modelling. Actuarial research paper no. 72, Department of Actuarial Sciences and Statistics, City University, London.
[65] Renshaw, A.E., Verrall, R.J. (1998), A stochastic model underlying the chain ladder technique. British Act. J. 4/4, 903-923.
[66] Ross, S.M. (1985), Introduction to Probability Models, 3rd edition. Academic Press, Orlando, Florida.
[67] Sandström, A. (2006), Solvency: Models, Assessment and Regulation. Chapman and Hall, CRC.
[68] Schmidt, K.D., Schaus, A. (1996), An extension of Mack's model for the chain-ladder method. ASTIN Bulletin 26, 247-262.
[69] Schnieper, R. (1991), Separating true IBNR and IBNER claims. ASTIN Bulletin 21, 111-127.
[70] Scollnik, D.P.M. (2002), Implementation of four models for outstanding liabilities in WinBUGS: A discussion of a paper by Ntzoufras and Dellaportas. North American Act. J. 6/1, 113-128.
[71] Smyth, G.K., Jørgensen, B. (2002), Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin 32, 143-157.
[72] Srivastava, V.K., Giles, D.E. (1987), Seemingly Unrelated Regression Equation Models: Estimation and Inference. Marcel Dekker, NY.
[73] Swiss Solvency Test (2005), BPV SST Technisches Dokument, Version 22. Juni 2005. Available under www.sav-ausbildung.ch
[74] Taylor, G. (1987), Regression models in claims analysis I: theory. Proc. CAS, Vol. XLVI, 354-383.
[75] Taylor, G. (2000), Loss Reserving: An Actuarial Perspective. Kluwer Academic Publishers.
[76] Taylor, G., McGuire, G. (2005), Synchronous bootstrapping of seemingly unrelated regressions. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[77] Venter, G.G. (1998), Testing the assumptions of age-to-age factors. Proc. CAS, Vol. LXXXV, 807-847.
[78] Venter, G.G. (2006), Discussion of MSEP in the CLRM MMR. To appear in ASTIN Bulletin 36/2.
[79] Verrall, R.J. (1990), Bayesian and empirical Bayes estimation for the chain ladder model. ASTIN Bulletin 20/2, 217-238.
[80] Verrall, R.J. (1991), On the estimation of reserves from loglinear models. Insurance: Math. Econom. 10, 75-80.
[81] Verrall, R.J. (2000), An investigation into stochastic claims reserving models and the chain-ladder technique. Insurance: Math. Econom. 26, 91-99.
[82] Verrall, R.J. (2004), A Bayesian generalized linear model for the Bornhuetter-Ferguson method of claims reserving. North American Act. J. 8/3, 67-89.
[83] Verdier, B., Klinger, A. (2005), JAB Chain: A model-based calculation of paid and incurred loss development factors. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[84] Vylder De, F. (1982), Estimation of IBNR claims by credibility theory. Insurance: Math. Econom. 1, 35-40.
[85] Vylder De, F., Goovaerts, M.J. (1979), Proceedings of the first meeting of the contact group Actuarial Sciences. KU Leuven, nl. 7904B wettelijk report: D/1979/23761/5.
[86] Wright, T.S. (1990), A stochastic method for claims reserving in general insurance. J.I.A. 117, 677-731.
[87] Wüthrich, M.V. (2003), Claims reserving using Tweedie's compound Poisson model. ASTIN Bulletin 33/2, 331-346.
[88] Wüthrich, M.V. (2006), Premium liability risks: modeling small claims. Bulletin SAA 2006/1, 27-38.
[89] Wüthrich, M.V. (2006), Using a Bayesian approach for claims reserving. Accepted for publication in CAS Journal.
[90] Wüthrich, M.V. (2006), Prediction error in the chain ladder method. Preprint.
[91] Wüthrich, M.V., Bühlmann, H., Furrer, H. (2006), Lecture notes on market consistent actuarial valuation. ETH Zürich, Summer Term 2006.
[92] Zehnwirth, B. (1998), ICRFS-Plus 8.3 Manual. Insureware Pty Ltd., St. Kilda, Australia.
[93] Zellner, A. (1962), An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. American Stat. Assoc. 57, 346-368.