
Linear credibility models based on time series for claim counts

Oana Purcaru (1), Montserrat Guillén (2) and Michel Denuit (3)

Abstract. In this paper, several models for the longitudinal analysis of claim frequencies are presented. These models are fitted to a large panel data set from a major Spanish insurance company. Credibility theory is then used to update the individual claim frequencies. The results are compared and the consequences of increasing the strength of dependence are clearly visible.

Keywords: GEE, frequency credibility, motor insurance, positive dependence, mixed Poisson model.

1 Introduction and Motivation

In automobile third party liability insurance, classification variables encountered in practice include the age, gender and occupation of the policyholders, the type and use of their car, the place where they reside and sometimes even the number of cars in the household, marital status, smoking behavior or the color of the vehicle. However, many important factors cannot be taken into account at this stage; think for instance of swiftness of reflexes or aggressiveness behind the wheel. Consequently, tariff cells are still quite heterogeneous. This residual heterogeneity can be represented by a random effect in a statistical model. The amount of premium charged to all policyholders in a risk class is thus itself an average, so that some policyholders pay too much and subsidize the others. Claims histories can be used to restore fairness in the risk classes, increasing the premiums of policyholders reporting claims and decreasing those of good drivers. The allowance for the policyholder's history in a rating model thus derives from the interpretation of the serial correlation in longitudinal data resulting from hidden features of the risk distribution. In a seminal paper, Dionne and Vanasse (1989) proposed a credibility model which integrates a priori and a posteriori information on an individual basis.
These authors introduced a regression component in the Poisson counting model in order to use all available information in the estimation of accident frequency. The unexplained heterogeneity was then modeled by the introduction of a latent variable representing the influence of hidden policy characteristics.

(1) Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium, purcaru@stat.ucl.ac.be
(2) Dept. Econometrics, University of Barcelona, Diagonal 690, Barcelona, Spain, guillen@eco.ub.es
(3) Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium, denuit@stat.ucl.ac.be

The vast majority of the papers that have appeared in the actuarial literature consider time-independent (or static) heterogeneous models. Noticeable exceptions include the pioneering papers by Gerber & Jones (1975), Sundt (1988) and Pinquet, Guillén & Bolancé (2001, 2003). Allowing for an unknown underlying random parameter that develops over time is justified because the unobservable factors influencing driving abilities are not constant. One might consider either shocks (induced by events such as divorce or nervous breakdown) or continuous modifications (e.g. due to a learning effect). Another reason to allow for random effects that vary with time relates to moral hazard. Indeed, individual efforts to prevent accidents are unobserved and feature temporal dependence. Policyholders may adjust their efforts for loss prevention according to their experience with past claims, the amount of premium, and their awareness of the future consequences of an accident (due to experience rating schemes). The main technical interest of letting the random effects evolve over time is to take the dates of claims into account. This reflects the fact that the predictive ability of a claim depends on its age: a recent claim is a worse sign to the insurer than a very old one.
Contrary to the static case, the total number of claims reported in the past is no longer an exhaustive summary of the policyholder's history. Rather, the sequence of annual claim numbers must now be retained to determine future premiums. In this paper, we resort to three simple models for time series of claim counts, namely: Poisson-LogNormal with an AR(1) structure in the random effects (on the log-scale), Poisson-LogNormal with exchangeable random effects, and Poisson-LogNormal with static (comonotonic) random effects. These models are fitted to a large Spanish third party liability automobile data set, and their relative merits are discussed. In the AR(1) case, the autocorrelation function decreases exponentially with time. Therefore, past claims are forgotten more rapidly than in the two other models, for which the autocorrelation function is constant. The latter models do not incorporate the age of the claim in the risk prediction. We examine carefully the pattern of the a posteriori corrections generated during a period of 10 years, from three points of view: the age of the claims, the a priori characteristics, and the three models considered. We also emphasize the consequences of their application on the distribution of premiums among the policyholders of the portfolio. It will be seen that a recent claim has a greater impact on the revised premium than an old one. We will also remark that if one claim is reported during the first year (over

© BELGIAN ACTUARIAL BULLETIN, Vol. 4, No. 1, 2004

the 10 years considered), the lower the claim frequency, the higher the relative a posteriori correction is. Conversely, if no claim is reported over the 10 years, the higher the claim frequency, the higher the relative a posteriori discount is. Let us also mention a last interesting feature that we observed, namely the remission of sins (see Section 4 for details). It is important to mention that all the models possess the desirable financial stability property. Let us now briefly describe the content of this paper. Section 2 presents the data set used to illustrate the work. Section 3 discusses the modelling of series of claim counts, including the estimation of the parameters; it also displays the estimates for the data of Section 2. Section 4 recalls the linear credibility procedure and carefully compares the values obtained for different types of policyholders, for claims of different ages and for the three models presented in Section 3. The final Section 5 concludes. Before going further, let us make precise the notation that will be used throughout this paper. Random variables (vectors) will be denoted by capital (bold) letters. The Normal distribution will be denoted as N(µ, σ²) and the LogNormal distribution as LN(µ, σ²).

2 Description of the data set

The data used in this paper are a ten percent sample of the automobile portfolio of a major insurance company operating in Spain. We only considered private use cars in this sample. The panel data contain information starting in 1991. Our sample contains policyholders that stay in the company for seven complete yearly periods. We have 18 exogenous variables that are kept in the panel plus the yearly number of accidents. For every policy we have the initial information at the beginning of the period and the total number of claims at fault that took place within each yearly period.
Some exogenous information does not change with time, such as the insured's date of birth, but other information may change, for example, the type of guarantees that are covered by the contract. Table 2.1 contains the observed yearly frequency of claims at fault from the first to the seventh period, together with the maximum number of claims per policyholder. The average claim frequency is 6.9%. The exogenous variables are the ones described in Table 2.2.

Table 2.1: Frequency of claims (observed yearly frequency and maximum number of claims per policyholder, periods 1 to 7)

3 Modelling through random effects

3.1 Description of the model

For each policyholder i = 1,...,n and coverage period t = 1,...,T_i, we have at our disposal explanatory variables summarized in a vector x_it. The information contained in the x_it's partly explains the behavior of the N_it's, the numbers of claims of the i-th policyholder during the t-th coverage period. Since the actuary does not have access to important risk factors, such as aggressiveness behind the wheel, some heterogeneity remains in the portfolio. The effect of the unknown characteristics relating to policyholder i during year t is represented in the model by a random variable Θ_it. The sequences of annual numbers of claims N_i = (N_i1, N_i2, ...) are assumed to be independent, and the components N_i1,...,N_iT_i are assumed to be independent given the sequence Θ_i = (Θ_i1, Θ_i2,...,Θ_iT_i) of random effects. The latent unobservable process Θ_i characterizes the correlation structure of the N_it's. Thus the i-th policy of the portfolio, i = 1, 2,...,n, is represented by a double sequence (Θ_i, N_i), where N_i gathers the observable annual claim numbers and Θ_i is a positive random vector with unit mean representing the unexplained heterogeneity. Specifically, the model is based on the following assumptions:

B1 given Θ_i = θ_i, the random variables N_it, t = 1, 2,...,T_i, are independent and conform to the Poisson distribution with mean λ_it θ_it, i.e.
Pr[N_it = k | Θ_it = θ_it] = exp(−λ_it θ_it) (λ_it θ_it)^k / k!,  k ∈ ℕ,

where λ_it = d_it exp(β' x_it) and d_it is the duration of the t-th coverage period (also referred to as the risk exposure; in our working data set it equals one year).

B2 at the portfolio level, the sequences (Θ_i, N_i), i = 1, 2,...,n, are assumed to be independent. Moreover, the Θ_it's are non-negative random variables with unit mean (E[Θ_it] = 1 for all i, t); the distribution of Θ_i = (Θ_i1, Θ_i2,...,Θ_iT_i) depends only on T_i: writing Θ_i =_d (Θ_1,...,Θ_T_i) and defining T_max = max_i T_i, the distribution of (Θ_1,...,Θ_T_max) is supposed to be stationary.

Henceforth, we denote by G(·) (resp. g(·)) the common cumulative distribution function (resp. probability density function) of the Θ_it's, for t = 1,...,T_i. We also suppose that the squared random effects are integrable.

3.2 Multivariate Poisson-LogNormal distribution

Description of the model

The multivariate LogNormal mixture of independent Poisson distributions is a parametric class of multivariate count distributions supporting a rich correlation structure. In our context,
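As a quick numerical sketch of assumption B1 (not part of the paper; the values σ² = 0.5 and λ = 0.07 are hypothetical), one can simulate a mixed Poisson count with a unit-mean LogNormal random effect and observe the overdispersion such a mixture produces:

```python
import numpy as np

# Illustrative sketch (not from the paper): simulate assumption B1 with a
# unit-mean LogNormal random effect; sigma2 and lam are hypothetical values.
rng = np.random.default_rng(0)
sigma2 = 0.5          # variance of ln(Theta)
lam = 0.07            # a priori annual claim frequency
n_sim = 200_000

# ln(Theta) ~ N(-sigma2/2, sigma2) so that E[Theta] = 1
theta = np.exp(rng.normal(-sigma2 / 2, np.sqrt(sigma2), n_sim))
counts = rng.poisson(lam * theta)

print(counts.mean())  # close to lam
print(counts.var())   # exceeds the mean: overdispersion of the Poisson mixture
```

The sample mean stays at the a priori frequency while the variance exceeds it, the hallmark of any Poisson mixture.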

v2 to v7: vj equals 1 if the record relates to the j-th period, 0 otherwise
v9: equals 1 for women and 0 for men
v10: equals 1 when driving in an urban area, 0 otherwise
v11: equals 1 when the zone is medium risk (Madrid and Catalonia)
v12: equals 1 when the zone is high risk (Northern Spain)
v13: equals 1 if the driving license is between 4 and 14 years old
v14: equals 1 if the driving license is 15 or more years old
v15: equals 1 if the client has been in the company between 3 and 5 years
v16: equals 1 if the client has been in the company for more than 5 years
v17: equals 1 if the insured is 30 years old or younger
v18: equals 1 if the coverage includes comprehensive except fire
v19: equals 1 if the coverage includes comprehensive (material damage and fire)
v20: equals 1 if the power is larger than or equal to 5500cc

Table 2.2: Exogenous variables

the multivariate Poisson-LogNormal law is a natural candidate to describe the joint distribution of the N_i's. This multivariate counting distribution has been considered by Aitchison & Ho (1989); see also Joe (1997). Specifically, we have

Pr[N_i = n_i] = ∫_{ℝ⁺} ... ∫_{ℝ⁺} { ∏_{t=1}^{T_i} exp(−θ_it λ_it) (θ_it λ_it)^{n_it} / n_it! } f_{Θ_i}(θ_i | Σ) dθ_i1 ... dθ_iT_i   (3.1)

where the density f_{Θ_i} corresponds to the multivariate LN(µ, Σ) distribution, i.e.

f_{Θ_i}(θ_i | Σ) = exp{ −(1/2) (ln θ_i − µ)' Σ⁻¹ (ln θ_i − µ) } / ( (2π)^{T_i/2} θ_i1 ... θ_iT_i |Σ|^{1/2} ),

with µ the mean vector and Σ the covariance matrix of the Gaussian random vector ln Θ_i. In our context of serial dependence, Σ = {σ_st}_{1 ≤ s,t ≤ T_i} has the form

σ_tt = σ² and σ_st = σ(|t − s|) for s ≠ t, with µ = −(σ²/2) 1.

We conventionally put σ(0) = σ². All the ln Θ_it's thus conform to the same N(−σ²/2, σ²) distribution.

First- and second-order moments

Although the multiple integral giving Pr[N_i = n_i] cannot be simplified, the moments of N_i are easily obtained as functions of those of Θ_i.
Let us first compute the moments of Θ_i:

E[Θ_it] = 1,
Var[Θ_it] = E[Θ_it²] − (E[Θ_it])² = exp(σ²) − 1,
Cov[Θ_it, Θ_is] = E[Θ_it Θ_is] − E[Θ_it] E[Θ_is] = exp(σ(|t − s|)) − 1.

Let us now compute the means and variances of the N_it's:

E[N_it] = λ_it E[Θ_it] = λ_it   (3.2)

Var[N_it] = E[Var[N_it | Θ_it]] + Var[E[N_it | Θ_it]] = λ_it + λ_it² Var[Θ_it] = λ_it { 1 + λ_it (exp(σ²) − 1) }.   (3.3)

For s ≠ t, the covariance between N_it and N_is is given by

Cov[N_it, N_is] = E[Cov[N_it, N_is | Θ_i]] + Cov[E[N_it | Θ_i], E[N_is | Θ_i]]   (3.4)
= λ_it λ_is Cov[Θ_it, Θ_is] = λ_it λ_is { exp(σ(|t − s|)) − 1 }.   (3.5)

From (3.2)-(3.3), we see that the marginals of N_i exhibit overdispersion relative to the Poisson distribution, as expected from any Poisson mixture. From (3.5), positive (resp. negative) correlation between the random effects Θ_it corresponds to positive (resp. negative) correlation between the counts N_it.

Autocorrelation function of the latent process

Let us now introduce the autocorrelation function ρ_Θ, defined as (h = 1, 2, ...)

ρ_Θ(h) = Corr[Θ_it, Θ_i,t+h] = ( exp(σ(h)) − 1 ) / ( exp(σ²) − 1 ).   (3.6)
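These moment formulas can be checked by simulation; the sketch below (with hypothetical values σ² = 0.4 and σ(|t − s|) = 0.25, not estimates from the paper) verifies the unit mean, the variance exp(σ²) − 1, and the covariance exp(σ(|t − s|)) − 1 on a bivariate LogNormal pair:

```python
import numpy as np

# Monte Carlo check of the moment formulas above; sigma2 = Var[ln Theta_t]
# and sig_ts = Cov[ln Theta_t, ln Theta_s] are hypothetical values.
rng = np.random.default_rng(1)
sigma2, sig_ts = 0.4, 0.25
mu = -sigma2 / 2                      # ensures E[Theta_t] = 1
cov = np.array([[sigma2, sig_ts], [sig_ts, sigma2]])

theta = np.exp(rng.multivariate_normal([mu, mu], cov, size=500_000))

print(theta.mean(axis=0))             # close to (1, 1)
print(theta.var(axis=0))              # close to exp(sigma2) - 1
print(np.cov(theta.T)[0, 1])          # close to exp(sig_ts) - 1
```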

This function captures the temporal dependence between the random effects. If the predictive power of claims decreases with their age, ρ_Θ(h) should decrease with h. The decreasingness of ρ_Θ is in line with the exogenous interpretation of residual heterogeneity (generated by unobservable risk characteristics): the correlation between rating factors relating to different periods should decrease with the lag, for observable variables as well as for hidden features. Similarly, endogenous components of the latent process (e.g. effort variables modeling moral hazard) lead to the same conclusion: a driver will tend to be more prudent after an accident, but this effect certainly decreases with the age of the claim. Computing

Corr[N_it, N_is] = Cov[N_it, N_is] / √( Var[N_it] Var[N_is] )
= λ_it λ_is { exp(σ(|t − s|)) − 1 } / √( λ_it { 1 + λ_it (exp(σ²) − 1) } λ_is { 1 + λ_is (exp(σ²) − 1) } )

for s ≠ t shows that

Corr[N_it, N_i,t+h] = ρ_Θ(h) √( λ_it (exp(σ²) − 1) / ( 1 + λ_it (exp(σ²) − 1) ) ) √( λ_i,t+h (exp(σ²) − 1) / ( 1 + λ_i,t+h (exp(σ²) − 1) ) ),  h ≥ 1.

Note that the inequality Corr[N_it, N_i,t+h] ≤ Corr[Θ_it, Θ_i,t+h] always holds, so that the correlation is decreased by mixing.

Estimating the parameters

In order to estimate the model parameters, direct likelihood maximisation is extremely difficult since the likelihood ∏_{i=1}^n Pr[N_i = n_i], with Pr[N_i = n_i] of the form (3.1), has no explicit form. It is worth mentioning that methods have been developed for Poisson-LogNormal count processes (such as the Monte Carlo approach of Chan & Ledolter (1995)), but they seem to be out of reach for the very large data sets actuaries have to treat. Therefore, we prefer a Generalized Estimating Equations (GEE) approach, which is in line with the Poisson-LogNormal model, except that it is based only upon first- and second-order moments. The idea behind this method is to use the estimated moments of the random effects in the regression equation. Since all the estimators are consistent, we do not expect much difference in practice.
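The classical independence Poisson regression fit that serves as the starting point of the GEE procedure can be sketched as a small Newton-Raphson solver of the score equation. Everything below (design matrix, coefficients, sample size) is simulated and hypothetical, not the Spanish data:

```python
import numpy as np

# Minimal sketch of the independence Poisson maximum-likelihood fit that the
# GEE procedure starts from; data and coefficients are simulated.
rng = np.random.default_rng(2)
n, p = 5000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-2.5, 0.3, -0.2])
N = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(p)
for _ in range(50):
    lam = np.exp(X @ beta)
    score = X.T @ (N - lam)              # score: sum over i of X_i'(N_i - lam_i)
    info = X.T @ (X * lam[:, None])      # Fisher information
    step = np.linalg.solve(info, score)
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)    # close to beta_true
```

The same skeleton carries over to the dependence-adjusted estimating equations once the working covariance of the counts replaces the independence one.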
The advantage of the GEE approach over classical Poisson regression (which assumes serial independence, and whose estimators are also consistent) is that the standard errors of the regression parameters are corrected.

For convenience, let us consider a panel data set (T observation periods for each policyholder, with T = 7 for the Spanish data used in the numerical illustrations). Let us define the T × T diagonal matrix A_i = diag(λ_i1, λ_i2, ..., λ_iT). We also introduce S_i = N_i − λ_i and X_i, the part of the design matrix relating to policyholder i (i.e. X_i is of dimension T × p, where p is the number of observed covariates, and has rows x'_i1,...,x'_iT). We denote by α the vector of parameters relating to the dependence structure; we chose α_h = Cov[Θ_t, Θ_t+h], but other parameterizations are of course possible. It is well known that the maximum likelihood equation for β in the Poisson model with serial independence (i.e. if N_i1,...,N_iT_i are mutually independent) is

∑_{i=1}^n X'_i S_i = ∑_{i=1}^n X'_i (N_i − λ_i) = 0,   (3.7)

where λ_i = (λ_i1,...,λ_iT)' with λ_it = d_it exp(β' x_it). The solution of (3.7) remains consistent if the elements of N_i are correlated (provided the λ_it's are correctly specified). Let us now present estimating equations taking the correlation into account to increase efficiency. Since the estimator of β resulting from (3.7) remains consistent, we do not expect much difference for the point estimates of the regression coefficients. Accounting for serial correlation inflates the standard errors of the estimates of the β_j's, which is important, e.g. in the context of variable selection, as well as to correctly assess the accuracy of the pure premium estimation. Let D_i = A_i X_i. We can view (3.7) as the sum of the D'_i A_i⁻¹ S_i, where A_i is the covariance matrix of N_i in the model with serial independence. The idea is now to substitute for A_i the covariance matrix V_i of N_i in the Poisson mixture model, and to estimate β by the solution of

∑_{i=1}^n D'_i V_i⁻¹ S_i = 0.
(3.8)

This will be done by an iterative procedure, as described in Liang & Zeger (1986), where the properties of the estimators are also given.

3.3 Application to Spanish panel data

Model 1 - Static Poisson-LogNormal distribution

In the static LogNormal model the random effects are Θ_i = (Θ_i,...,Θ_i)', with each Θ_i conforming to the LogNormal distribution (for more details see Purcaru & Denuit (2001)). Note that in this case ρ_Θ(h) = 1 for all h. In this model, the random effect Θ_it representing the unobservable characteristics does not change over time: it is

given from the beginning. Consequently, the dependence between the N_it's is very strong, since they all share the same Θ_i. The estimators of the parameters are displayed in Table 3.1. Step 1 is obtained by solving (3.7); an iterative procedure is then used to get the solution of (3.8). Convergence occurred rapidly (6 steps were needed to reach stability).

Table 3.1: Static LogNormal model - estimates of the parameters (β̂'s and σ̂_Θ, steps 1 to 6)

Table 3.2 displays confidence intervals for the regression parameters. When the estimations are performed under the independence assumption, all the β's are significantly different from 0 (since no confidence interval contains 0). On the contrary, the confidence interval for β_1 contains 0 when serial dependence is taken into account. This is a typical consequence of serial correlation: the explanatory power of the exogenous variables is overestimated under the falsely assumed independence hypothesis.

Table 3.2: Static LogNormal model - 95% confidence intervals for the β̂'s, under the independence and dependence hypotheses

Model 2 - AR(1) Poisson-LogNormal

A simple and powerful time series model for count data is obtained by specifying a Gaussian autoregressive process of order one (AR(1), in short) for the Θ_it's on the log-scale, that is,

ln Θ_it = ϱ ln Θ_i,t−1 + ε_it,

where |ϱ| < 1 and the ε_it's are independent N( −(σ²/2)(1 − ϱ), σ²(1 − ϱ²) ) errors (this choice of mean and variance ensures that the ln Θ_it's remain N(−σ²/2, σ²) distributed). For a reference in actuarial science, see Pinquet, Guillén and Bolancé (2001). In the AR(1) Poisson-LogNormal model, σ(h) = σ² ϱ^h, so that from (3.6), and since σ_Θ² = exp(σ²) − 1, the autocorrelation function is given by

ρ_Θ(h) = ( exp(σ² ϱ^h) − 1 ) / ( exp(σ²) − 1 ) = ( (σ_Θ² + 1)^{ϱ^h} − 1 ) / σ_Θ².   (3.9)

The estimates of the model parameters are given in Table 3.3.
It can be seen that convergence is rather fast: the algorithm converged (the maximum change in the parameter estimates being less than 1E-6) after 5 iterations, and all of the obtained values are significant. Step 1 represents the solution of (3.7). The 95% confidence intervals are presented in Table 3.4. As expected, taking serial dependence into account increases the standard deviation of the estimates, which makes the confidence intervals wider than under independence. Nevertheless, all the β values remain significant (at the 5% level).
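A minimal sketch of the exponential decay implied by (3.9), with hypothetical values σ² = 0.5 and ϱ = 0.6 (not the fitted values):

```python
import numpy as np

# Autocorrelation (3.9) of the AR(1) Poisson-LogNormal random effects;
# sigma2 and rho are hypothetical parameter values.
sigma2, rho = 0.5, 0.6

def rho_theta(h):
    """rho_Theta(h) = (exp(sigma^2 rho^h) - 1) / (exp(sigma^2) - 1)."""
    return (np.exp(sigma2 * rho ** h) - 1) / (np.exp(sigma2) - 1)

for h in range(6):
    print(h, round(rho_theta(h), 4))   # decays towards 0: old claims are forgotten
```

The decay towards zero is what distinguishes this model from the static and product models, whose autocorrelation stays constant in h.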

Table 3.3: AR(1) LogNormal model - estimates of the parameters (β̂'s, σ̂_Θ and ϱ̂, steps 1 to 6)

Table 3.4: AR(1) LogNormal model - 95% confidence intervals for the β̂'s, under the serial independence and serial dependence hypotheses

Model 3 - Product Poisson-LogNormal

Let Θ_it = U_i W_it, where the W_it's are independent and identically distributed, and independent from U_i, with

W_it =_d LogNormal( −σ_W²/2, σ_W² )  and  U_i =_d LogNormal( −σ_U²/2, σ_U² ).

Then Θ_it =_d LogNormal( −(σ_W² + σ_U²)/2, σ_W² + σ_U² ). In this model the covariance of the random effects depends only on the variance of the time-independent component, i.e.

Cov[Θ_it, Θ_i,t+h] = E[ Cov[Θ_it, Θ_i,t+h | U_i] ] + Cov[ E[Θ_it | U_i], E[Θ_i,t+h | U_i] ]
= Cov[ U_i E[W_it], U_i E[W_i,t+h] ] = Var[U_i].

Thus the autocorrelation function of the random effects is constant:

ρ_Θ(h) ≡ ρ = Var[U_i] / ( exp(σ²) − 1 ) = Var[U_i] / σ_Θ²,  for h ≥ 1.   (3.10)

The estimates for this model and their 95% confidence intervals are shown in Table 3.5 and Table 3.6, respectively. Again, convergence occurs very rapidly. Step 1 is obtained by solving (3.7) for β. Since the variance gets inflated once serial dependence has been taken into account, the confidence intervals become wider, as can be seen from Table 3.6.

Comparison of frequency premiums for the three models

Let us compare the results of the two models emphasizing time dependence with those of the time-independent one. We have listed the estimated regression coefficients in Table 3.8, but these values are difficult to compare. For this reason we have identified three policyholders of the portfolio: the worst driver, the best driver and the average driver, described in Table 3.7. The estimated expected claim frequencies (λ_t = d_t exp(β' x_t), where d_t = 1, i.e. one year) according to the three models are given in Table 3.8.
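The constant autocorrelation of the product construction can be checked by simulation; the variances below are hypothetical, not fitted values:

```python
import numpy as np

# Monte Carlo sketch of the product construction Theta_it = U_i * W_it,
# which gives a lag-independent autocorrelation; s2_u, s2_w are hypothetical.
rng = np.random.default_rng(3)
s2_u, s2_w, T, n = 0.3, 0.2, 5, 200_000

u = np.exp(rng.normal(-s2_u / 2, np.sqrt(s2_u), size=(n, 1)))   # shared effect
w = np.exp(rng.normal(-s2_w / 2, np.sqrt(s2_w), size=(n, T)))   # i.i.d. effects
theta = u * w

corr = np.corrcoef(theta.T)          # T x T empirical correlation matrix
print(corr[0, 1], corr[0, 4])        # roughly equal: correlation does not decay
```

Whatever the lag, the empirical correlation stays near Var[U] / (exp(σ²) − 1), as (3.10) states.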

Table 3.5: Product LogNormal model - estimates of the parameters (β̂'s, σ̂_Θ and ρ̂, steps 1 to 6)

Table 3.6: Product LogNormal model - 95% confidence intervals for the β̂'s, under the independence and dependence hypotheses

Table 3.7: Risk classes in the portfolio (values of v9 to v20 for the GOOD, AVERAGE and BAD policyholders)

Table 3.8: Comparison of frequency premiums for the three models (estimated β̂'s and premiums for the Good, Average and Bad drivers under the AR(1), Product and Static LogNormal models)

Clearly, the different models give very similar point estimates for the expected claim frequencies. Nevertheless, the three models will yield very different credibility premiums.

4 Comparison of linear credibility updating formulas

4.1 Derivation of linear credibility formulas

In the spirit of Bühlmann, linear credibility predictors are obtained from a linear regression derived in the model with random effects. This is particularly appealing in our context, since Zeger's (1988) extension of the Poisson regression model allowing for serial correlation is distributionally known only up to the first- and second-order moments. Let H_i,t be the claim history of policyholder i up to (and including) time t − 1 (that is, N_i1,...,N_i,t−1 together with the frequency premiums λ_i1,...,λ_i,t−1). The predictor of E[Θ_i,T_i+1 | H_i,T_i] is of the form

a_i + ∑_{t=1}^{T_i} b_it N_it,   (4.1)

where N_it is the number of claims reported by policyholder i during year t. The coefficients a_i and b_it, t = 1,...,T_i, are determined so as to minimize the expected squared difference between the unknown Θ_i,T_i+1 and the linear credibility predictor (4.1), that is, to minimize

ψ(a_i, b_i1,...,b_iT_i) = E[ ( Θ_i,T_i+1 − a_i − ∑_{t=1}^{T_i} b_it N_it )² ].

Equating ∂ψ/∂a_i to zero yields

a_i + ∑_{t=1}^{T_i} b_it λ_it = 1,

whence it follows that we have to minimize

ψ(b_i1,...,b_iT_i) = E[ ( (Θ_i,T_i+1 − 1) − ∑_{t=1}^{T_i} b_it (N_it − λ_it) )² ].

The stationary equation ∂ψ/∂b_is = 0 yields

Cov[Θ_i,T_i+1, N_is] − ∑_{t=1}^{T_i} b_it Cov[N_is, N_it] = 0,   (4.2)

where

Cov[Θ_i,T_i+1, N_is] = Cov[ Θ_i,T_i+1, E[N_is | Θ_i] ] = λ_is { exp( σ(T_i + 1 − s) ) − 1 }.

Once the covariances appearing in (4.2) are estimated from this relation and from Section 3.2, we can obtain the b_it's.
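The normal equations (4.2) can be solved numerically. The sketch below uses hypothetical AR(1) parameters (σ² = 0.5, ϱ = 0.6) and a constant a priori frequency of 7%, and shows that the credibility weight attached to a claim grows with its recency:

```python
import numpy as np

# Solve the normal equations (4.2) for the linear credibility coefficients
# b_1,...,b_T under hypothetical AR(1) parameters; lam is a constant a priori
# frequency. Python index s = 0..T-1 stands for year s + 1.
sigma2, rho, T = 0.5, 0.6, 10
lam = np.full(T, 0.07)

def sig(h):
    return sigma2 * rho ** h         # sigma(h) on the log-scale; sig(0) = sigma^2

# C[t, s] = Cov[N_t, N_s]; the diagonal is Var[N_t] from (3.3)
C = np.empty((T, T))
for t in range(T):
    for s in range(T):
        C[t, s] = lam[t] * lam[s] * (np.exp(sig(abs(t - s))) - 1)
    C[t, t] = lam[t] * (1 + lam[t] * (np.exp(sigma2) - 1))

# Right-hand side: Cov[Theta_{T+1}, N_s] with lag (T + 1) - (s + 1) = T - s
c = np.array([lam[s] * (np.exp(sig(T - s)) - 1) for s in range(T)])

b = np.linalg.solve(C, c)
a = 1 - b @ lam                      # unbiasedness constraint a + sum b_t lam_t = 1
print(b)                             # weights grow towards the most recent year
```

The most recent year receives the largest weight, which is exactly the "age of claims" effect discussed in Section 4.2.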
Clearly, the predictors (4.1) depend not only on past claims but also on past premiums (that is, on the past characteristics of policyholders). Consequently, they average to one whatever the rating factors. This fairness property is not always fulfilled by the experience rating schemes enforced in real life: if the boni and the mali do not depend on the a priori ratemaking (that is, on the frequency of claims), the average a posteriori correction coefficients increase with the frequency. In order to illustrate the extent to which a posteriori corrections depend on a priori characteristics, let us consider three typical policyholders:

1. a bad driver, i.e. a policyholder paying high a priori premiums;
2. an average driver, i.e. a policyholder paying average a priori premiums;
3. a good driver, i.e. a policyholder paying low a priori premiums.

These persons are precisely described in Table 4.1. They are in line with the risks described in Tables 3.7 and 3.8.

We consider that they pay 100% of the premium for coverage in the first year.

Policyholder | a priori premium
Good | 5%
Medium | 10%
Bad | 20%

Table 4.1: Risk classes

We will also consider a coverage period of 10 years, and we will carefully examine:

(i) how the age of the claims is incorporated in the a posteriori correction (see subsection 4.2);
(ii) how the observable characteristics of the policyholder affect the a posteriori correction (see subsection 4.3);
(iii) how the choice of the model for the series of claim counts influences the a posteriori corrections (see subsection 4.4).

4.2 A posteriori correction according to the age of claims

In order to illustrate the impact of the age of claims on the a posteriori premiums, we consider in this section only the GOOD policyholder, whose a priori premium equals 5%.

Model 1 - Static-LogNormal

Table 4.2 describes the evolution of the expected annual claim frequency for a good driver having reported a single claim in year 1 (column 2) or in year 5 (column 3), over a coverage period of 10 years. We observe that both scenarios yield exactly the same a posteriori corrections from the 6th year onwards. This model does not take the age of claims into account: the predictive ability of a claim does not decrease with time, which is quite restrictive. Intuitively speaking, the first scenario should yield a lower claim frequency for year 10 than the second.

Table 4.2: GOOD policyholder: premiums assuming the Static-LogNormal model, for a claim occurring in year y = 1 or y = 5 (penalisations in bold)

Model 2 - AR(1)-LogNormal

Table 4.3 describes the evolution of the expected claim frequency relating to a good policyholder who has reported a single claim during the first year (column 2) or during the 5th year (column 3). The penalisation is seen in year 2 or year 6, respectively (in bold).
Considering the premium of year 7, we clearly see how the age of the claims affects the amount paid by the policyholder: the amount in the second case (claim during the 5th year) by far exceeds the corresponding value in the first situation. A recent claim thus has more impact on the revised expected claim frequency than an old one.

Table 4.3: GOOD policyholder: premiums assuming the AR(1)-LogNormal model, for a claim occurring in year y = 1 or y = 5 (penalisations in bold)

Model 3 - Product-LogNormal

Table 4.4 is similar to Table 4.3. In this case, the premiums for year 6 are identical in both situations. Like Model 1, this model does not incorporate the age of claims in the risk prediction; this is due to the fact that ρ_Θ is constant.

Table 4.4: GOOD policyholder: premiums assuming the Product-LogNormal model, for a claim occurring in year y = 1 or y = 5 (penalisations in bold)

Table 4.5: Static-LogNormal model - premiums (values and % of the a priori premium) for the Good, Average and Bad policyholders if a claim is filed during the first year

Table 4.6: Static-LogNormal model - premiums and the associated discounts (in %) if no claims are reported over 10 years

In passing, we see that the a posteriori corrections are more severe for the Static-LogNormal model than for the Product-LogNormal model. This is due to the fact that the dependence between the N_it's is stronger in the Static-LogNormal model than in the Product-LogNormal model, so that the credibility system penalizes past claims more heavily in the Static-LogNormal model. In the AR(1)-LogNormal model, since past claims are forgotten more rapidly than in the two others, the penalisations are less severe.

4.3 A posteriori correction according to a priori characteristics

Assume that the three individuals described in Table 4.1 (good, average and bad driver) report a single claim during the first year. The evolution of their expected claim frequency is presented in Tables 4.5, 4.7 and 4.9, according to the model retained to describe the serial dependence. Tables 4.6, 4.8 and 4.10 are the analogues when those three kinds of drivers do not report any claim.

Model 1 - Static-LogNormal

From Table 4.5, we conclude that the relative increase of premium due to the reporting of one claim during year 1 is the biggest for the good policyholder. On the whole, the lower the claim frequency, the higher the relative a posteriori correction. In Table 4.6, when no claim is reported, we observe that the relative discounts granted to the bad policyholder are the highest. On the whole, the higher the claim frequency, the higher the relative a posteriori discounts.
In this case, the result follows directly from the explicit expression

E[Θ_i,T_i+1 | H_i,T_i] = ( 1 + σ_Θ² ∑_{t=1}^{T_i} n_it ) / ( 1 + σ_Θ² ∑_{t=1}^{T_i} λ_it ),

which clearly shows how, for fixed n_it's, the λ_it's affect the a posteriori correction.

Model 2 - AR(1)-LogNormal

As can be seen from line 2 in Table 4.7, the relative increase of premium due to the reporting of one claim during year 1 is the biggest for the good policyholder. On the whole, the lower the claim frequency, the higher the relative a posteriori correction. In Table 4.8 we see that the relative discounts granted to bad policyholders, when they do not report any claims, are the highest. On the whole, the higher the claim frequency, the higher the relative a posteriori discounts. The conclusions for Model 1 carry over to Model 2, even though in the latter case there is no explicit expression for E[Θ_i,T_i+1 | H_i,T_i].

Model 3 - Product-LogNormal

Tables 4.9 and 4.10 reveal the same patterns for the a posteriori corrections according to the a priori riskiness: the relative increase of premium due to the reporting of one claim during year 1 is the biggest for the good policyholder, and the relative discounts granted to bad policyholders, when they do not report any claims, are the highest. On the whole, the lower (higher) the claim frequency, the higher the relative a posteriori correction (discount).
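As a numerical sketch of the explicit static-model expression (the value σ_Θ² = 0.6 is hypothetical; the 5% and 20% frequencies are those of Table 4.1), one can compare the relative corrections of the good and bad drivers:

```python
# Sketch of the static-model credibility update quoted above; sigma_theta2
# is a hypothetical value, the a priori frequencies follow Table 4.1.
sigma_theta2 = 0.6

def correction(total_claims, lam, years):
    # E[Theta_{T+1} | history] = (1 + s2 * sum n_t) / (1 + s2 * sum lam_t)
    return (1 + sigma_theta2 * total_claims) / (1 + sigma_theta2 * lam * years)

# One claim over 10 years: the good (5%) driver is penalised relatively more
print(correction(1, 0.05, 10), correction(1, 0.20, 10))
# No claims over 10 years: the bad (20%) driver gets the larger discount
print(correction(0, 0.05, 10), correction(0, 0.20, 10))
```

With these inputs the good driver's correction exceeds the bad driver's after one claim, and the no-claim discount is deepest for the bad driver, matching the pattern described in the text.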

Table 4.7: AR(1)-LogNormal model - premiums (values and % of the a priori premium) for the Good, Average and Bad policyholders if a claim is filed during the first year

Table 4.8: AR(1)-LogNormal model - premiums and the associated discounts (in %) if no claims are reported over 10 years

Table 4.9: Product-LogNormal model - premiums (values and % of the a priori premium) for the Good, Average and Bad policyholders if a claim is filed during the first year

Table 4.10: Product-LogNormal model - premiums and the associated discounts (in %) if no claims are reported over 10 years

[Table 4.11 (numerical entries lost in extraction): BAD policyholder - a posteriori corrections by model (AR(1)-LogNormal, Product-LogNormal, Static-LogNormal), by year, for no claims over 10 years and for a claim in the first year.]

4.4 A posteriori correction according to the model used for series of claim counts

BAD policyholder

One feature of Table 4.11 appears particularly interesting, namely the time needed for the "remission of sins": how long does it take for the penalty to disappear? If a single claim is reported during year 1, the initial level 0.2 is reached again after a period of 4 years in the Product-LogNormal and Static-LogNormal models, whereas in the AR(1)-LogNormal model 2 years are enough to forget about the claim. A consequence of this phenomenon is that the discounts granted when no claim is reported are smaller in the AR(1)-LogNormal model than in the other two. Only the penalties obtained from the AR(1)-LogNormal model could be enforced in practice.

AVERAGE policyholder

The "remission of sins" phenomenon is no longer as obvious for the average policyholder as it was for the bad one. However, we observe in Table 4.12 that if a single claim is reported in the first year, the premiums decrease faster (within 3 years) toward the premium paid in year 1 with the AR(1)-LogNormal model than with the two other models. One consequence of this feature is that the discounts granted when no claims are reported are smallest in the AR(1)-LogNormal case.

GOOD policyholder

Table 4.13 is similar to Table 4.12. We observe that for the good policyholder, if a single claim is reported in the first year, the premiums decrease faster (within 6 years) toward the premium paid in year 1 with the AR(1)-LogNormal model than with the two other models.
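The "remission of sins" horizon can be illustrated under the static formula: with one claim in year 1 and none afterwards, the correction (1 + σ²)/(1 + σ² t λ) falls back to the a priori level once t λ ≥ 1, independently of σ². This is only a sketch with hypothetical inputs; it does not reproduce the fitted horizons reported in Table 4.11.

```python
# How many claim-free years, after a single claim in year 1, until the
# static-model premium is back at (or below) the a priori level lam?

def years_to_forgive(lam, sigma2):
    """Smallest t with (1 + sigma2) / (1 + sigma2 * t * lam) <= 1,
    i.e. the first year in which t * lam >= 1."""
    t = 1
    while (1.0 + sigma2) / (1.0 + sigma2 * t * lam) > 1.0:
        t += 1
    return t

print(years_to_forgive(lam=0.2, sigma2=0.8))  # -> 5
print(years_to_forgive(lam=0.2, sigma2=2.0))  # -> 5 (independent of sigma2)
```

The horizon shortens as the a priori frequency grows, consistent with the paper's observation that penalties fade faster for riskier profiles.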
[Table 4.12 (numerical entries lost in extraction): AVERAGE policyholder - a posteriori corrections by model (AR(1)-LogNormal, Product-LogNormal, Static-LogNormal), by year, for no claims over 10 years and for a claim in the first year.]

[Table 4.13 (numerical entries lost in extraction): GOOD policyholder - a posteriori corrections by model, same layout as Table 4.12.]

As in the two previous cases, one consequence of this feature is that the discounts granted when no claims are reported are smallest in the AR(1)-LogNormal case.

5 Conclusions

The results of this paper illustrate the different kinds of a posteriori risk evaluation produced by the three models for series of claim counts considered in this study: Static-LogNormal, Product-LogNormal and AR(1)-LogNormal. In the future, we aim to apply copula models to generate a Markov structure of order one for the latent process. This will allow more freedom concerning the marginal distribution of the random effects (in particular, gamma-distributed latent variables are desirable).

ACKNOWLEDGEMENTS

The support of the Belgian Government under contract Projet d'Actions de Recherche Concertées ARC 98/ is gratefully acknowledged.

REFERENCES

[1] Aitchison, J., & Ho, C.H. (1989). The multivariate Poisson-log normal distribution. Biometrika, 75.
[2] Bolancé, C., Guillén, M., & Pinquet, J. (2003). Time-varying credibility for frequency risk models: estimation and tests for autoregressive specifications on the random effects. ASTIN Bulletin, 31, 2.
[3] Chan, K.S., & Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association, 90.
[4] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.
[5] Liang, K.-Y., & Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73.
[6] Pinquet, J., Guillén, M., & Bolancé, C. (2001). Allowance for the age of claims in bonus-malus systems. ASTIN Bulletin, 31, 2.
[7] Purcaru, O., & Denuit, M. (2002). On the dependence induced by frequency credibility models. Belgian Actuarial Bulletin, 2, 1.
[8] Purcaru, O., & Denuit, M. (2003). Dependence in dynamic claim frequency credibility models. ASTIN Bulletin, 33, 1.
[9] Zeger, S.L. (1988). A regression model for time series of counts. Biometrika, 74.


More information

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

More information

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

1 Teaching notes on GMM 1.

1 Teaching notes on GMM 1. Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

More information

Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination

Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination Australian Journal of Basic and Applied Sciences, 5(7): 1190-1198, 2011 ISSN 1991-8178 Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination 1 Mohamed Amraja

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

FAIR TRADE IN INSURANCE INDUSTRY: PREMIUM DETERMINATION OF TAIWAN AUTOMOBILE INSURANCE. Emilio Venezian Venezian Associate, Taiwan

FAIR TRADE IN INSURANCE INDUSTRY: PREMIUM DETERMINATION OF TAIWAN AUTOMOBILE INSURANCE. Emilio Venezian Venezian Associate, Taiwan FAIR TRADE IN INSURANCE INDUSTRY: PREMIUM DETERMINATION OF TAIWAN AUTOMOBILE INSURANCE Emilio Venezian Venezian Associate, Taiwan Chu-Shiu Li Department of Economics, Feng Chia University, Taiwan 100 Wen

More information

Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

More information

Portfolio Using Queuing Theory

Portfolio Using Queuing Theory Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory Jean-Philippe Boucher and Guillaume Couture-Piché December 8, 2015 Quantact / Département de mathématiques, UQAM.

More information

Univariate and Multivariate Methods PEARSON. Addison Wesley

Univariate and Multivariate Methods PEARSON. Addison Wesley Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

More information

Do We Drive More Safely When Accidents are More Expensive? Identifying Moral Hazard from Experience Rating Schemes

Do We Drive More Safely When Accidents are More Expensive? Identifying Moral Hazard from Experience Rating Schemes Do We Drive More Safely When Accidents are More Expensive? Identifying Moral Hazard from Experience Rating Schemes Mark Israel Kellogg School of Management, Northwestern University February 004 Abstract

More information

Asymmetry and the Cost of Capital

Asymmetry and the Cost of Capital Asymmetry and the Cost of Capital Javier García Sánchez, IAE Business School Lorenzo Preve, IAE Business School Virginia Sarria Allende, IAE Business School Abstract The expected cost of capital is a crucial

More information