# NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES


## Transcription

Kivan Kaivanipour

A thesis submitted for the degree of Master of Science in Engineering Physics

Department of Mathematics, Royal Institute of Technology, Stockholm, Sweden, May 2015


### Abstract

In non-life insurance, almost every tariff analysis involves continuous rating variables, such as the age of the policyholder or the weight of the insured vehicle. In the generalized linear model, continuous rating variables are categorized into intervals and all values within an interval are treated as identical. By using the generalized additive model, the categorization part of the generalized linear model can be avoided. This thesis treats different methods for finding the optimal smoothing parameter within the generalized additive model. While the method of cross validation is commonly used for this purpose, a less common method, the L-curve method, is investigated and its performance is compared with that of cross validation. Numerical computations on test data show that the L-curve method is significantly faster than the method of cross validation, but suffers from heavy under-smoothing and is thus not a suitable method for estimating the optimal smoothing parameter.


### Acknowledgements

First of all I would like to thank my supervisor Jonathan Fransson, for your helpful discussions on this subject, and for giving me the possibility of working with you. You taught me insurance pricing theory when we were co-workers at If P&C Insurance. I really appreciate how you always were available for fun discussions about insurance mathematics and esports. Thanks also to my other colleagues at If P&C Insurance and Vardia Insurance Group for fruitful collaborations. I want to thank Camilla Landén, Rasmus Janse and Andreas Runnemo for your helpful comments on the drafts of this thesis. Finally, I would like to use this opportunity to thank my friends and family, for your support and for all the good times we have shared.

Stockholm, May 2015

Kivan Kaivanipour


### Contents

- Introduction
- Insurance
- Tariff analysis
- Rating variables
- The multiplicative model
- Non-life insurance pricing using the GLM
- Exponential dispersion models
- The link function
- Price relativities
- Non-life insurance pricing using the GAM
- Penalized deviances
- Smoothing splines
- Price relativities - one rating variable
- Price relativities - several rating variables
- Optimal choice of the smoothing parameter
- Cross validation
- The L-curve method
- Results
- Numerical computations
- Unmodified test data
- Modified test data
- Summary and conclusion


### Chapter 1 Introduction

In non-life insurance, almost every tariff analysis involves continuous rating variables, such as the age of the policyholder or the weight of the insured vehicle. In the generalized linear model (GLM), continuous rating variables are categorized into intervals and all values within an interval are treated as identical.

Table 1.1: Illustration of a simple tariff with two categorized rating variables. (Columns: tariff cell, age in years, weight in kg, premium; six tariff cells with premiums $P_1, \ldots, P_6$.)

This method is simple and often works well enough. However, a disadvantage of categorizing a rating variable is that two policies with different but close values for the rating variable may get significantly different premiums if the values happen to belong to different intervals. Also, finding a good subdivision into intervals can be time consuming and tedious. The intervals must be large enough to achieve good precision of the price relativities, but at the same time they have to be small if the effect of the rating variable varies much. Sometimes it is difficult to fulfill both these requirements.

With this in mind, an alternative modelling approach can be used. By using the generalized additive model (GAM), the categorization part of the GLM can be avoided. This thesis treats different methods for finding the optimal smoothing parameter within the GAM. While the method of cross validation is commonly used for this purpose, a less common method, the L-curve method, is investigated and its performance is compared with that of cross validation.

### Chapter 2 Insurance

An insurance policy is a contract between an insurer, e.g. an insurance company, and a policyholder, e.g. a consumer or a company. An insurance policy can be considered as a promise made by the insurer to cover certain unexpected losses that the policyholder may encounter. The policyholder pays the insurer a fee for this contract, called the premium. An event reported by the policyholder where he or she demands economic compensation is called a claim.

Insurance policies are commonly categorized into life and non-life insurance. A life insurance policy covers future financial losses and the insurer pays a sum of money to a designated recipient, called the beneficiary. A non-life insurance policy covers damage incurred to the policyholder's possessions or property, and the policyholder receives compensation so that the property can be restored to its previous physical condition. However, there are counterexamples, such as business interruption policies (non-life insurance), where the policyholder (usually a company) is covered for future financial losses.

The strategic business idea of an insurer has its origin in ancient history, where the community supported individual losses [13]. Back then, the premiums were distributed uniformly, but over time risk differentiation has become the key to a modern insurance company's success. Statistics and probability theory can be used to estimate the risk an insurer is exposed to, and the cornerstone of insurance mathematics is the Law of Large Numbers [3]. The theorem states that the expected value $\mu$ of a sequence of independent and identically distributed random variables $Y_i$ can be approximated by the sample average of $\{Y_1, \ldots, Y_n\}$.

Applied to insurance mathematics, this means that the total loss for a large set of customers should be close to its expected value, and the premium for a policy should be based on the expected average loss that is transferred from the policyholder to the insurer [12, 1]. In this thesis we will omit non-risk related costs, such as administration costs, and only discuss the part of the premium that is directly connected to the losses.

#### 2.3 The multiplicative model

There are basically two ways the elements of a tariff could operate when computing the premium. The model can be additive or multiplicative, and the latter is nowadays considered standard for insurance pricing [11]. In order to describe the multiplicative model we need some preliminaries. Let $M$ be the number of rating variables, let $m_k$ be the number of intervals for rating variable $k$ and let $i_k$ denote the interval of rating variable $k$.

| Tariff cell | $i_1$ | $i_2$ |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 2 | 3 |

Table 2.2: An illustrative example with $M = 2$, $m_1 = 2$ (age in years) and $m_2 = 3$ (weight in kg).

Let $\mu$ be the mean of a key ratio $Y$, where the key ratio can be either the claim frequency, the claim severity or the pure premium. The mean $\mu$ for a policy with rating variables pertaining to the intervals $i_1, i_2, \ldots, i_M$ is then given by

$$\mu_{i_1, i_2, \ldots, i_M} = \gamma_0 \prod_{k=1}^{M} \gamma_{k, i_k}, \tag{2.2}$$

where $\gamma_0$ is the base value and $\{\gamma_{k, i_k},\ i_k = 1, 2, \ldots, m_k\}$ are the price relativities for rating variable $k$ [12, 1.3]. The model is over-parameterized, since if we multiply all $\gamma_{1, i_1}$ by any number $\alpha$ and divide all $\gamma_{2, i_2}$ by the same $\alpha$ we get the same $\mu$'s as before. To make the price relativities unique we specify a base cell $\{i_1 = b_1, i_2 = b_2, \ldots, i_M = b_M\}$ and set $\{\gamma_{k, b_k} = 1,\ k = 1, 2, \ldots, M\}$. The base value $\gamma_0$ can now be interpreted as the mean in the base cell, and the price relativities measure the relative difference in relation to the base cell. In Chapter 3 we will get back to the multiplicative model and show how to determine the base value and the price relativities.

| Tariff cell | $i_1$ | $i_2$ | Base cell | $\mu_{i_1, i_2}$ |
|---|---|---|---|---|
| 1 | 1 | 1 | x | $\gamma_0$ |
| 2 | 1 | 2 | | $\gamma_0 \gamma_{2,2}$ |
| 3 | 1 | 3 | | $\gamma_0 \gamma_{2,3}$ |
| 4 | 2 | 1 | | $\gamma_0 \gamma_{1,2}$ |
| 5 | 2 | 2 | | $\gamma_0 \gamma_{1,2} \gamma_{2,2}$ |
| 6 | 2 | 3 | | $\gamma_0 \gamma_{1,2} \gamma_{2,3}$ |

Table 2.3: A multiplicative model with $b_1 = b_2 = 1$ and $\gamma_{1,1} = \gamma_{2,1} = 1$.
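As a small illustration of Equation 2.2, the following Python sketch computes the mean in every tariff cell from a base value and a set of price relativities. The function name `tariff_means` and all numerical values are hypothetical and chosen only for this example; they do not come from the thesis data.

```python
from itertools import product

def tariff_means(gamma0, relativities):
    """Return {(i_1, ..., i_M): mu} for every tariff cell (intervals are 1-based).

    relativities[k] lists the price relativities of rating variable k+1,
    with the base interval first (relativity 1.0).
    """
    ranges = [range(1, len(g) + 1) for g in relativities]
    means = {}
    for cell in product(*ranges):
        mu = gamma0
        for k, i_k in enumerate(cell):
            mu *= relativities[k][i_k - 1]  # Equation 2.2: multiply the relativities
        means[cell] = mu
    return means

# Hypothetical tariff: M = 2, m_1 = 2 (age), m_2 = 3 (weight), base cell (1, 1).
gamma0 = 0.10
relativities = [[1.0, 0.8], [1.0, 1.2, 1.5]]
means = tariff_means(gamma0, relativities)
for cell in sorted(means):
    print(cell, round(means[cell], 4))
```

With the base relativities fixed to 1, the mean in the base cell `(1, 1)` equals the base value, as in Table 2.3.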

### Chapter 3 Non-life insurance pricing using the GLM

As discussed in the previous chapter, the main goal of a tariff analysis is to determine how a key ratio $Y$ varies with a number of rating variables. As mentioned, the key ratio can be either the claim frequency, the claim severity or the pure premium. The linear model (LM) is not suitable for insurance pricing for two reasons:

- (i) Poor distribution approximation. In the LM the response variable $Y$ follows a normal distribution, whereas in insurance, the number of claims follows a discrete probability distribution on the non-negative integers. Furthermore, the distribution of the claim cost is non-negative and often skewed to the right [12, 2].
- (ii) Non-reasonable mean modeling. In the LM the mean $\mu$ is a linear function of the covariates, whereas multiplicative models are usually more reasonable for insurance pricing.

The GLM [10] generalizes the LM in two different ways and is more suitable for insurance pricing for the following reasons:

- (i) Good distribution approximation. In the GLM the response variable $Y$ can follow any distribution that belongs to the class of the exponential dispersion models (EDMs). The normal, Poisson and gamma distributions are all members of this class.
- (ii) Reasonable mean modeling. In the GLM some monotone transformation of the mean, $g(\mu)$, is a linear function of the covariates, with the linear and multiplicative models as special cases [12, 2].

These generalizations are discussed in detail in Sections 3.1 and 3.2, respectively.

#### 3.1 Exponential dispersion models

In the GLM the response variable $Y$ can follow any distribution that belongs to the class of the EDMs [8]. The probability distribution of an EDM is given by the frequency function

$$f_{Y_i}(y_i; \theta_i, \phi) = \exp\left\{\frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i, \phi)\right\}, \tag{3.1}$$

where $y_i$ is a possible outcome of the response variable $Y_i$, $\theta_i$ is a parameter that may depend on $i$, $\phi$ is called the dispersion parameter and $b$ is called the cumulant function. The function $c$ is not of interest in GLM theory. We will now show that the normal, Poisson and gamma distributions all are members of the EDM class.

##### 3.1.1 Normal distribution

Let us first show that the normal distribution used in the LM is a member of the EDM class. Let $Y_i \sim N(\mu_i, \sigma^2)$. The frequency function is then given by

$$\begin{aligned} f_{Y_i}(y_i) &= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_i - \mu_i)^2 / (2\sigma^2)} \\ &= \exp\left\{-\frac{1}{2}\log(2\pi\sigma^2) - \frac{y_i^2 - 2 y_i \mu_i + \mu_i^2}{2\sigma^2}\right\} \\ &= \exp\left\{\frac{y_i \mu_i - \mu_i^2/2}{\sigma^2} - \frac{1}{2}\left(\frac{y_i^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right\}. \end{aligned} \tag{3.2}$$

The normal distribution is thus an EDM with

$$\theta_i = \mu_i, \quad \phi = \sigma^2, \quad b(\theta_i) = \theta_i^2/2 \quad \text{and} \quad c(y_i, \phi) = -\frac{1}{2}\left(\frac{y_i^2}{\phi} + \log(2\pi\phi)\right). \tag{3.3}$$

##### 3.1.2 Poisson distribution

Let us also show that the Poisson distribution is a member of the EDM class. Let $Y_i \sim Po(\mu_i)$. The frequency function is then given by

$$f_{Y_i}(y_i) = \frac{\mu_i^{y_i}}{y_i!}\, e^{-\mu_i} = \exp\{y_i \log \mu_i - \mu_i - \log(y_i!)\}. \tag{3.4}$$

The Poisson distribution is thus an EDM with

$$\theta_i = \log \mu_i, \quad \phi = 1, \quad b(\theta_i) = e^{\theta_i} \quad \text{and} \quad c(y_i, \phi) = -\log(y_i!). \tag{3.5}$$

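##### 3.1.3 Gamma distribution

Let us finally show that the gamma distribution also is a member of the EDM class. Let $Y_i \sim G(\alpha, \beta_i)$. The frequency function is then given by

$$f_{Y_i}(y_i) = \frac{\beta_i^{\alpha}}{\Gamma(\alpha)}\, y_i^{\alpha - 1} e^{-\beta_i y_i} = \exp\left\{\log \beta_i^{\alpha} + \log y_i^{\alpha - 1} - \log \Gamma(\alpha) - \beta_i y_i\right\}. \tag{3.6}$$

Before transforming this expression to EDM form we need to re-parameterize it through $\mu_i = \alpha/\beta_i$ and $\phi = 1/\alpha$. The frequency function is then given by

$$\begin{aligned} f_{Y_i}(y_i) &= \exp\left\{\log\left(\frac{1}{\mu_i \phi}\right)^{1/\phi} + \log y_i^{1/\phi - 1} - \log \Gamma(1/\phi) - \frac{y_i}{\mu_i \phi}\right\} \\ &= \exp\left\{\frac{1}{\phi}\log\frac{1}{\mu_i \phi} + (1/\phi - 1)\log y_i - \log \Gamma(1/\phi) - \frac{y_i}{\mu_i \phi}\right\} \\ &= \exp\left\{\frac{-y_i/\mu_i - \log \mu_i}{\phi} + \frac{\log(y_i/\phi)}{\phi} - \log y_i - \log \Gamma(1/\phi)\right\}. \end{aligned} \tag{3.7}$$

The gamma distribution is thus an EDM with

$$\theta_i = -1/\mu_i = -\beta_i/\alpha, \quad \phi = 1/\alpha, \quad b(\theta_i) = -\log(-\theta_i) \quad \text{and} \quad c(y_i, \phi) = \frac{\log(y_i/\phi)}{\phi} - \log y_i - \log \Gamma(1/\phi). \tag{3.8}$$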
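The EDM representations of the Poisson and gamma distributions can be checked numerically. The sketch below evaluates each density once in its usual form and once through Equation 3.1, with $\theta_i$, $\phi$, $b$ and $c$ chosen as in Section 3.1. The function names and the test values are assumptions made for this illustration only.

```python
import math

def poisson_pmf(y, mu):
    """Ordinary Poisson frequency function."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

def gamma_pdf(y, alpha, beta):
    """Ordinary gamma frequency function with shape alpha and rate beta."""
    return beta ** alpha / math.gamma(alpha) * y ** (alpha - 1) * math.exp(-beta * y)

def edm_density(y, theta, phi, b, c):
    """Equation 3.1: exp{(y*theta - b(theta))/phi + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / phi + c(y, phi))

def poisson_as_edm(y, mu):
    # theta = log(mu), phi = 1, b(theta) = exp(theta), c = -log(y!)
    c = lambda y, phi: -math.lgamma(y + 1)
    return edm_density(y, math.log(mu), 1.0, math.exp, c)

def gamma_as_edm(y, alpha, beta):
    # Re-parameterize through mu = alpha/beta and phi = 1/alpha.
    mu, phi = alpha / beta, 1.0 / alpha
    b = lambda theta: -math.log(-theta)
    c = lambda y, phi: math.log(y / phi) / phi - math.log(y) - math.lgamma(1.0 / phi)
    return edm_density(y, -1.0 / mu, phi, b, c)

print(poisson_pmf(3, 2.5), poisson_as_edm(3, 2.5))
print(gamma_pdf(1.7, 2.0, 3.0), gamma_as_edm(1.7, 2.0, 3.0))
```

Both pairs of printed values should agree up to floating-point rounding.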

#### 3.2 The link function

In the previous section we described how an EDM generalizes the normal distribution used in the LM, and we now turn to the other generalization, which concerns the linear structure of the mean. Let us start by looking at a simple example, in which we only have two rating variables, one with two intervals and one with three intervals. Using the multiplicative model, the mean is given by

$$\mu_{i_1, i_2} = \gamma_0 \gamma_{1, i_1} \gamma_{2, i_2}. \tag{3.9}$$

By taking logarithms we can transform the above model to linear form

$$\log \mu_{i_1, i_2} = \log \gamma_0 + \log \gamma_{1, i_1} + \log \gamma_{2, i_2}. \tag{3.10}$$

We recall from Section 2.3 that the model is over-parameterized, so we choose a base cell, $b_1 = b_2 = 1$, and set $\gamma_{1,1} = \gamma_{2,1} = 1$. We realize that $\mu_{1,1} = \gamma_0$ and that the price relativities measure the relative difference in relation to the base cell.

| Tariff cell | $i_1$ | $i_2$ | Base cell | $\log \mu_{i_1, i_2}$ |
|---|---|---|---|---|
| 1 | 1 | 1 | x | $\log \gamma_0$ |
| 2 | 1 | 2 | | $\log \gamma_0 + \log \gamma_{2,2}$ |
| 3 | 1 | 3 | | $\log \gamma_0 + \log \gamma_{2,3}$ |
| 4 | 2 | 1 | | $\log \gamma_0 + \log \gamma_{1,2}$ |
| 5 | 2 | 2 | | $\log \gamma_0 + \log \gamma_{1,2} + \log \gamma_{2,2}$ |
| 6 | 2 | 3 | | $\log \gamma_0 + \log \gamma_{1,2} + \log \gamma_{2,3}$ |

Table 3.1: A simple example with $b_1 = b_2 = 1$ and $\gamma_{1,1} = \gamma_{2,1} = 1$.

To simplify the notation, we change the index of the mean so it represents the tariff cell instead of the interval for each rating variable. We also introduce the regression parameters $\beta_0 = \log \gamma_0$, $\beta_1 = \log \gamma_{1,2}$, $\beta_2 = \log \gamma_{2,2}$ and $\beta_3 = \log \gamma_{2,3}$.

| Tariff cell $i$ | $i_1$ | $i_2$ | Base cell | $\log \mu_i$ |
|---|---|---|---|---|
| 1 | 1 | 1 | x | $\beta_0$ |
| 2 | 1 | 2 | | $\beta_0 + \beta_2$ |
| 3 | 1 | 3 | | $\beta_0 + \beta_3$ |
| 4 | 2 | 1 | | $\beta_0 + \beta_1$ |
| 5 | 2 | 2 | | $\beta_0 + \beta_1 + \beta_2$ |
| 6 | 2 | 3 | | $\beta_0 + \beta_1 + \beta_3$ |

Table 3.2: Parameterization of a two-way multiplicative model.

Next, we introduce the dummy variables $x_{ij}$ through the relation

$$x_{ij} = \begin{cases} 1, & \text{if } \beta_j \text{ is included in } \log \mu_i, \\ 0, & \text{otherwise.} \end{cases} \tag{3.11}$$

| Tariff cell $i$ | $i_1$ | $i_2$ | Base cell | $x_{i0}$ | $x_{i1}$ | $x_{i2}$ | $x_{i3}$ |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | x | 1 | 0 | 0 | 0 |
| 2 | 1 | 2 | | 1 | 0 | 1 | 0 |
| 3 | 1 | 3 | | 1 | 0 | 0 | 1 |
| 4 | 2 | 1 | | 1 | 1 | 0 | 0 |
| 5 | 2 | 2 | | 1 | 1 | 1 | 0 |
| 6 | 2 | 3 | | 1 | 1 | 0 | 1 |

Table 3.3: Dummy variables in a two-way multiplicative model.

Using the dummy variables $x_{ij}$ and the regression parameters $\beta_j$ that we have introduced, we finally obtain the following useful equation for the logarithm of the mean

$$\log \mu_i = \sum_{j=0}^{3} x_{ij} \beta_j, \quad i = 1, 2, \ldots, 6. \tag{3.12}$$

We now leave our simple example behind and look at the general problem of how a key ratio $Y$ is affected by $M$ rating variables. We let $m_k$ be the number of intervals for rating variable $k$ and introduce

$$\eta_i = \sum_{j=0}^{r} x_{ij} \beta_j, \quad i = 1, 2, \ldots, n, \tag{3.13}$$

where $x_{ij}$ and $\beta_j$ are defined similarly as in the simple example, $r$ is the total number of regression parameters (the base value parameter $\beta_0$ not included) and $n$ is the total number of tariff cells. The values $r$ and $n$ are given by

$$r = \sum_{j=1}^{M} m_j - M, \quad n = \prod_{j=1}^{M} m_j. \tag{3.14}$$

In the LM, $\mu_i = \eta_i$, but in the GLM the relation is $g(\mu_i) = \eta_i$ for an arbitrary function $g$, as long as $g$ is monotone and differentiable [12, 2.2]. The function $g$ is called the link function [10], since it links the mean to the linear structure through

$$g(\mu_i) = \eta_i = \sum_{j=0}^{r} x_{ij} \beta_j. \tag{3.15}$$

Since we are solely working with multiplicative models we will only be using a logarithmic link function from now on

$$g(\mu_i) = \log \mu_i. \tag{3.16}$$
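The dummy variables of Equation 3.11 and the counts $r$ and $n$ of Equation 3.14 can be generated mechanically. The following Python sketch assumes, as in the simple example, that the base cell is the first interval of every rating variable. The helper name `design_matrix` is hypothetical.

```python
from itertools import product

def design_matrix(levels):
    """Build the dummy variables for a tariff with levels = [m_1, ..., m_M].

    The base cell is assumed to be interval 1 of every rating variable.
    Returns (cells, X), where row i of X holds x_i0, x_i1, ..., x_ir.
    """
    cells = list(product(*[range(1, m + 1) for m in levels]))
    X = []
    for cell in cells:
        row = [1]  # x_i0 multiplies beta_0 and is always 1
        for m_k, i_k in zip(levels, cell):
            # one dummy variable per non-base interval of this rating variable
            row.extend(1 if i_k == level else 0 for level in range(2, m_k + 1))
        X.append(row)
    return cells, X

cells, X = design_matrix([2, 3])
r = len(X[0]) - 1   # regression parameters, beta_0 not included
n = len(X)          # number of tariff cells
for cell, row in zip(cells, X):
    print(cell, row)
```

For the two-way example with $m_1 = 2$ and $m_2 = 3$ this gives $r = 2 + 3 - 2 = 3$ and $n = 2 \cdot 3 = 6$, in agreement with Equation 3.14.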

#### 3.3 Price relativities

So far we have only discussed some properties of the GLM, but now it is time for the most important step, the estimation of the base value and the price relativities. We will treat two different cases. For the claim frequency we will assume a Poisson distribution and for the claim severity we will assume a gamma distribution. These assumptions are reasonable and considered standard for insurance pricing [1, 2].

##### 3.3.1 The claim frequency

For the number of claims of an individual policy during any period of time we assume a Poisson distribution. Furthermore, we assume that policies are independent and we use the fact that the sum of independent Poisson distributed random variables is also Poisson distributed [4]. We therefore get a Poisson distribution also at the aggregate level of the total number of claims in a tariff cell. Let $X_i$ be the number of claims in a tariff cell with duration $w_i$ and let $\mu_i$ denote the expected value of $X_i$ when $w_i = 1$. Then $E(X_i) = w_i \mu_i$ and $X_i$ follows a Poisson distribution with frequency function

$$f_{X_i}(x_i; \mu_i) = \frac{(w_i \mu_i)^{x_i}}{x_i!}\, e^{-w_i \mu_i}, \quad x_i = 0, 1, 2, \ldots \tag{3.17}$$

As discussed earlier, we want to model the claim frequency $Y_i = X_i / w_i$, and it follows that the frequency function of the claim frequency is given by

$$f_{Y_i}(y_i; \mu_i) = \frac{(w_i \mu_i)^{w_i y_i}}{(w_i y_i)!}\, e^{-w_i \mu_i}, \quad w_i y_i = 0, 1, 2, \ldots \tag{3.18}$$

We are now ready to use maximum likelihood estimation (MLE) [10] to estimate the regression parameters $\beta_j$. Since we have assumed that all policies are independent, it follows that the log-likelihood of the whole sample is the sum of the log-likelihoods in all the tariff cells [12, 2.3.1]

$$l(y, \mu) = \sum_{i=1}^{n} \Big\{ w_i \big( y_i \log \mu_i - \mu_i + y_i \log w_i \big) - \log\big((w_i y_i)!\big) \Big\}. \tag{3.19}$$

From Equations 3.15 and 3.16 we deduce that

$$\mu_i = \exp\Big\{ \sum_{j=0}^{r} x_{ij} \beta_j \Big\}, \tag{3.20}$$

which inserted into Equation 3.19 gives

$$l = \sum_{i=1}^{n} \Big\{ w_i \Big( y_i \sum_{j=0}^{r} x_{ij} \beta_j - \exp\Big\{ \sum_{j=0}^{r} x_{ij} \beta_j \Big\} + y_i \log w_i \Big) - \log\big((w_i y_i)!\big) \Big\}. \tag{3.21}$$

We then differentiate the log-likelihood with respect to every $\beta_j$ to find the regression parameters that maximize the expression. The partial derivatives are given by

$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} w_i \Big( y_i - \exp\Big\{ \sum_{k=0}^{r} x_{ik} \beta_k \Big\} \Big) x_{ij} = \sum_{i=1}^{n} w_i (y_i - \mu_i) x_{ij}. \tag{3.22}$$

By setting all these $r + 1$ partial derivatives equal to zero we find the stationary point, and the maximum likelihood (ML) equations can be expressed as

$$\sum_{i=1}^{n} w_i (y_i - \mu_i) x_{ij} = 0, \quad j = 0, 1, \ldots, r. \tag{3.23}$$

The ML equations must almost always be solved numerically. Newton-Raphson's method and Fisher's scoring method are two well-known methods for solving the ML equations [12, 3.2.3]. When the ML equations have been solved we obtain the base value and the price relativities by using the fact that

$$\gamma_j = e^{\beta_j}, \quad j = 0, 1, \ldots, r. \tag{3.24}$$

##### 3.3.2 The claim severity

For the cost of an individual claim we assume a gamma distribution and we use the fact that the sum of independent gamma distributed random variables is also gamma distributed [4]. We therefore get a gamma distribution also at the aggregate level of the total claim cost in a tariff cell. Let $X_i$ be the claim cost in a tariff cell with duration $w_i$ and let $\mu_i$ denote the expected value of $X_i$ when $w_i = 1$. Then $E(X_i) = w_i \mu_i$ and $X_i$ follows a gamma distribution with frequency function

$$f_{X_i}(x_i) = \frac{\beta_i^{w_i \alpha}}{\Gamma(w_i \alpha)}\, x_i^{w_i \alpha - 1} e^{-\beta_i x_i}, \quad x_i > 0. \tag{3.25}$$

As discussed earlier, we want to model the claim severity $Y_i = X_i / w_i$, and it follows that the frequency function of the claim severity is given by

$$f_{Y_i}(y_i) = \frac{(w_i \beta_i)^{w_i \alpha}}{\Gamma(w_i \alpha)}\, y_i^{w_i \alpha - 1} e^{-w_i \beta_i y_i}, \quad y_i > 0. \tag{3.26}$$

We re-parameterize this expression through $\mu_i = \alpha / \beta_i$ and $\phi = 1/\alpha$, and obtain

$$f_{Y_i}(y_i; \mu_i) = \frac{1}{\Gamma(w_i/\phi)} \left( \frac{w_i}{\mu_i \phi} \right)^{w_i/\phi} y_i^{w_i/\phi - 1} e^{-w_i y_i / (\mu_i \phi)}, \quad y_i > 0. \tag{3.27}$$

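We are now ready to use MLE to estimate the regression parameters $\beta_j$. Since we have assumed that all policies are independent, it follows that the log-likelihood of the whole sample is the sum of the log-likelihoods in all the tariff cells

$$l = \frac{1}{\phi} \sum_{i=1}^{n} w_i \left[ \log \frac{1}{\mu_i} - \frac{y_i}{\mu_i} + \log \frac{w_i y_i}{\phi} - \frac{\phi}{w_i} \left( \log y_i + \log \Gamma\Big( \frac{w_i}{\phi} \Big) \right) \right]. \tag{3.28}$$

We replace $\mu_i$ with the expression from Equation 3.20, and the log-likelihood can be rewritten as

$$l = \frac{1}{\phi} \sum_{i=1}^{n} w_i \left[ -\sum_{j=0}^{r} x_{ij} \beta_j - y_i \Big/ \exp\Big\{ \sum_{j=0}^{r} x_{ij} \beta_j \Big\} + \log \frac{w_i y_i}{\phi} - \frac{\phi}{w_i} \left( \log y_i + \log \Gamma\Big( \frac{w_i}{\phi} \Big) \right) \right]. \tag{3.29}$$

We then differentiate the log-likelihood with respect to every $\beta_j$ to find the regression parameters that maximize the expression. The partial derivatives are given by

$$\frac{\partial l}{\partial \beta_j} = \frac{1}{\phi} \sum_{i=1}^{n} w_i \Big( y_i \Big/ \exp\Big\{ \sum_{k=0}^{r} x_{ik} \beta_k \Big\} - 1 \Big) x_{ij} = \frac{1}{\phi} \sum_{i=1}^{n} w_i \Big( \frac{y_i}{\mu_i} - 1 \Big) x_{ij}. \tag{3.30}$$

By setting all the partial derivatives equal to zero and multiplying by $\phi$, we get the ML equations

$$\sum_{i=1}^{n} w_i \Big( \frac{y_i}{\mu_i} - 1 \Big) x_{ij} = 0, \quad j = 0, 1, \ldots, r. \tag{3.31}$$

As mentioned before, the ML equations must almost always be solved numerically. When the ML equations have been solved we obtain the base value and the price relativities by using the fact that

$$\gamma_j = e^{\beta_j}, \quad j = 0, 1, \ldots, r. \tag{3.32}$$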
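To make the numerical solution concrete, here is a minimal Newton-Raphson sketch for the Poisson ML equations (3.23) of the claim frequency. It is written in plain Python with a small Gaussian elimination routine, and it is an illustration under stated assumptions rather than the implementation used in the thesis: the function names `solve` and `fit_poisson`, the starting value, the fixed number of iterations and the data are all hypothetical. The data are constructed so that the true relativities are known.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_poisson(X, y, w, iterations=25):
    """Newton-Raphson for sum_i w_i (y_i - mu_i) x_ij = 0 with mu_i = exp(sum_j x_ij beta_j)."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    # Assumed starting value: the overall weighted mean for the base value.
    beta[0] = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    for _ in range(iterations):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Gradient of the log-likelihood (left-hand side of Equation 3.23).
        score = [sum(w[i] * (y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        # Negative Hessian of the log-likelihood.
        info = [[sum(w[i] * mu[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Dummy variables of the two-way example (Table 3.3) and hypothetical durations.
X = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
     [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
w = [10.0, 20.0, 15.0, 12.0, 8.0, 5.0]
true_gamma = [0.10, 0.8, 1.2, 1.5]
# Claim frequencies that follow the multiplicative model exactly.
y = [math.prod(g for g, x in zip(true_gamma, row) if x) for row in X]

beta = fit_poisson(X, y, w)
gamma = [math.exp(b) for b in beta]   # Equation 3.24
print([round(g, 6) for g in gamma])
```

Because the hypothetical observations satisfy the multiplicative model without noise, the fitted base value and price relativities should reproduce `true_gamma`. For the gamma case, the same loop applies with the score and the second derivatives taken from Equation 3.30.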

### Chapter 4 Non-life insurance pricing using the GAM

In the 1980s Hastie and Tibshirani analyzed the effect of continuous variables and introduced the GAM [6]. There are several possible approaches when using the GAM. In this thesis we will only be using smoothing splines [14].

#### 4.1 Penalized deviances

Let us once again look at the general problem of how a key ratio $Y$ is affected by $M$ rating variables. In Section 3.2 we introduced the linear structure

$$\eta_i = \sum_{j=0}^{r} x_{ij} \beta_j, \quad i = 1, 2, \ldots, n. \tag{4.1}$$

Hastie and Tibshirani introduced the GAM, where instead of Equation 4.1 they assumed that

$$\eta_i = \beta_0 + \sum_{j=1}^{M} f_j(x_{ij}), \quad i = 1, 2, \ldots, n, \tag{4.2}$$

for some functions $f_j$. Here, $x_{ij}$ is the value of rating variable $j$ for observation $i$ and should not be mixed up with the dummy variable $x_{ij}$ of Equation 4.1. Also, $n$ is the total number of observations, not the total number of tariff cells. This model is more general since the mean depends more freely on the value of each rating variable. Let us consider a simple example where we model all the rating variables, except one, in the usual way. The linear structure is then given by

$$\eta_i = \sum_{j=0}^{r} x_{ij} \beta_j + f(x_{i1}), \quad i = 1, 2, \ldots, n, \tag{4.3}$$

where the $x_{ij}$ in the sum are dummy variables and $x_{i1}$ is the value of the continuous rating variable. Our goal is now to find the function $f$ that describes the effect of the rating variable $x_{i1}$ in the best way. We want the function to have a good fit to the data, but we also want it to be smooth and not to vary wildly.

##### 4.1.1 The deviance

To measure the goodness of fit of some estimated means $\hat{\mu} = \exp\{f(x)\}$ to the data $y$, we use the deviance $D(y, \hat{\mu})$ [12, 3.1], which is defined as

$$D(y, \hat{\mu}) = 2\phi \sum_{i=1}^{n} \big[ l(y_i, y_i) - l(y_i, \hat{\mu}_i) \big], \tag{4.4}$$

where $\phi$ is the dispersion parameter introduced in Equation 3.1. The deviance may be interpreted as a weighted sum of distances of the estimated means $\hat{\mu}$ from the data $y$. For the Poisson and gamma distributions, we get

$$D(y, \exp\{f(x)\}) = 2 \sum_{i=1}^{n} w_i \big( y_i \log y_i - y_i f(x_i) - y_i + e^{f(x_i)} \big), \tag{4.5}$$

$$D(y, \exp\{f(x)\}) = 2 \sum_{i=1}^{n} w_i \big( y_i / e^{f(x_i)} - 1 - \log y_i + f(x_i) \big). \tag{4.6}$$

##### 4.1.2 The regularization

As a measure of the variability of the function $f$, we use the regularization $R(f(x))$ [12, 5.1], which is defined as

$$R(f(x)) = \int_a^b \big( f''(x) \big)^2 \, dx, \tag{4.7}$$

where $a$ is a value lower than the lowest possible value of $x$ and $b$ is a value higher than the highest possible value of $x$. For a function that varies wildly the regularization will be high, whereas for a function with little variation the regularization will be low.

##### 4.1.3 The smoothing parameter

Now that the deviance and the regularization have been defined, we are looking for the function $f$ that minimizes the penalized deviance

$$\Delta(f(x)) = D(y, \exp\{f(x)\}) + \lambda R(f(x)), \tag{4.8}$$

where $\lambda$ is called the smoothing parameter. The smoothing parameter creates a trade-off between good fit to the data and low variability of the function $f$. A small value of $\lambda$ would increase the weight put on data, letting the function $f$ vary freely (Figure 4.1), whereas a large value of $\lambda$ would decrease the weight put on data, forcing the integrated squared second derivative of the function $f$ to be small (Figure 4.2). In Chapter 5 we will compare two different methods for finding the optimal smoothing parameter.
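The three ingredients of the penalized deviance can be sketched in a few lines of Python. The code below evaluates the Poisson and gamma deviances of Equations 4.5 and 4.6 and approximates the regularization of Equation 4.7 by second differences. It assumes, purely for illustration, that $f$ is tabulated on an equally spaced grid with step `h`, with one observation per grid point; the thesis itself evaluates the integral exactly through B-splines. All function names are hypothetical.

```python
import math

def poisson_deviance(y, w, f):
    """Equation 4.5, with the convention y*log(y) = 0 for y = 0."""
    total = 0.0
    for yi, wi, fi in zip(y, w, f):
        ylogy = yi * math.log(yi) if yi > 0 else 0.0
        total += wi * (ylogy - yi * fi - yi + math.exp(fi))
    return 2.0 * total

def gamma_deviance(y, w, f):
    """Equation 4.6 (requires y > 0)."""
    return 2.0 * sum(wi * (yi / math.exp(fi) - 1.0 - math.log(yi) + fi)
                     for yi, wi, fi in zip(y, w, f))

def roughness(f, h):
    """Approximate the integral of f''(x)^2 from values of f on a grid with step h."""
    second = [(f[i - 1] - 2.0 * f[i] + f[i + 1]) / h ** 2 for i in range(1, len(f) - 1)]
    return h * sum(d * d for d in second)

def penalized_deviance(y, w, f, lam, h, deviance=poisson_deviance):
    """Equation 4.8: goodness of fit plus lambda times the regularization."""
    return deviance(y, w, f) + lam * roughness(f, h)

# Hypothetical claim frequencies on a grid of five ages, one observation per age.
y = [0.20, 0.12, 0.15, 0.08, 0.10]
w = [50.0, 80.0, 60.0, 90.0, 40.0]
interpolating = [math.log(yi) for yi in y]   # perfect fit, but rough
flat = [math.log(0.12)] * 5                  # smooth, but poorer fit
for lam in (0.0, 100.0):
    print(lam, penalized_deviance(y, w, interpolating, lam, 1.0),
          penalized_deviance(y, w, flat, lam, 1.0))
```

With $\lambda = 0$ the interpolating function has the smaller penalized deviance, while for a large enough $\lambda$ the flat function wins, which is the trade-off described above.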

Figure 4.1 (claim frequency against age of policyholder): A small value of $\lambda$ would increase the weight put on data, letting the function $f$ vary freely.

Figure 4.2 (claim frequency against age of policyholder): A large value of $\lambda$ would decrease the weight put on data, forcing the integrated squared second derivative of the function $f$ to be small.

#### 4.2 Smoothing splines

Smoothing splines have important properties in connection to penalized deviances, and it can be shown that among all twice continuously differentiable functions interpolating a given set of points, the natural cubic spline minimizes the integrated squared second derivative [12, Appendix B.1]. This important feature is contained in the following theorem:

Theorem 4.1. For any points $u_1 < \cdots < u_m$ and real numbers $y_1, \ldots, y_m$, there exists a unique natural cubic spline $s$, such that $s(u_k) = y_k$, $k = 1, \ldots, m$. Furthermore, if $f$ is any twice continuously differentiable function such that $f(u_k) = y_k$, $k = 1, \ldots, m$, then for any $a \le u_1$ and $b \ge u_m$,

$$\int_a^b \big( s''(x) \big)^2 \, dx \le \int_a^b \big( f''(x) \big)^2 \, dx. \tag{4.9}$$

By Theorem 4.1 there exists a unique natural cubic spline $s$, such that $\{s(u_k) = f(u_k),\ k = 1, \ldots, m\}$, and since the deviance depends on the function only through its values at these points, $D(y, \exp\{s(x)\}) = D(y, \exp\{f(x)\})$, and it follows that

$$\Delta(s(x)) \le \Delta(f(x)). \tag{4.10}$$

This means that when looking for the twice continuously differentiable function $f$ that minimizes Equation 4.8 we only need to consider the set of natural cubic splines. Since any spline function of given degree can be expressed as a linear combination of B-splines of that degree [12, Appendix B.2], we are going to use B-splines to parameterize the set of natural cubic splines. Let $B_{3k}(x)$ be the $k$th B-spline of order 3. The following useful theorem will be used extensively throughout the rest of this thesis.

Theorem 4.2. For a given set of $m$ knots, a cubic spline $s$ may be written as

$$s(x) = \sum_{k=1}^{m+2} \beta_k B_{3k}(x), \tag{4.11}$$

for unique constants $\beta_1, \ldots, \beta_{m+2}$.

Here, $\beta_1, \ldots, \beta_{m+2}$ are constants which decide the weight of each B-spline and should not be mixed up with the regression parameters defined in Chapter 3.

Figure 4.3: B-splines of order 3. The three internal knots are marked with black diamonds and the two boundary knots are marked with white diamonds.

#### 4.3 Price relativities - one rating variable

It is now time to estimate the price relativities by finding the natural cubic spline $s$ that minimizes the penalized deviance

$$\Delta(s(x)) = D(y, \exp\{s(x)\}) + \lambda R(s(x)). \tag{4.12}$$

We will start by considering one rating variable and we will treat two different cases, one for the claim frequency and one for the claim severity.

##### 4.3.1 The claim frequency

Let us assume that we have one continuous rating variable, and that the observations of the key ratio $Y$ are Poisson distributed. When storing insurance data, continuous rating variables are usually rounded, so instead of assuming millions of different values, they usually assume a much smaller number of different values. For instance, the age of the policyholder is usually stored in years and rounded down to the nearest integer, so fewer than a hundred different values occur.

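Let $x_i$ be the value of the rating variable for observation $i$ and let $z_1, \ldots, z_m$ denote the possible values of $x_i$, in ascending order. For the Poisson distribution, the deviance is given by

$$D(y, \exp\{s(x)\}) = 2 \sum_{i=1}^{n} w_i \big( y_i \log y_i - y_i s(x_i) - y_i + e^{s(x_i)} \big). \tag{4.13}$$

Since $a \le z_1$ and $b \ge z_m$ the regularization can be expressed as

$$R(s(x)) = \int_a^{z_1} \big( s''(x) \big)^2 \, dx + \int_{z_1}^{z_m} \big( s''(x) \big)^2 \, dx + \int_{z_m}^{b} \big( s''(x) \big)^2 \, dx. \tag{4.14}$$

The natural cubic spline is linear outside $[z_1, z_m]$, so the first and third terms of Equation 4.14 equal zero and the regularization can be simplified to

$$R(s(x)) = \int_{z_1}^{z_m} \big( s''(x) \big)^2 \, dx. \tag{4.15}$$

The penalized deviance can now be expressed as

$$\Delta(s(x)) = 2 \sum_{i=1}^{n} w_i \big( y_i \log y_i - y_i s(x_i) - y_i + e^{s(x_i)} \big) + \lambda \int_{z_1}^{z_m} \big( s''(x) \big)^2 \, dx. \tag{4.16}$$

It is time to use the fact that the natural cubic spline $s$ can be expressed as a sum of B-splines (Theorem 4.2), such that

$$s(x) = \sum_{j=1}^{m+2} \beta_j B_j(x), \tag{4.17}$$

where $\beta_1, \ldots, \beta_{m+2}$ are unique parameters and $B_1(x), \ldots, B_{m+2}(x)$ are the cubic B-splines with knots $z_1, \ldots, z_m$. The penalized deviance can then be expressed as a function of the parameters $\beta = [\beta_1, \ldots, \beta_{m+2}]$ and is given by

$$\Delta(\beta) = 2 \sum_{i=1}^{n} w_i \Big( y_i \log y_i - y_i \sum_{j=1}^{m+2} \beta_j B_j(x_i) - y_i + \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(x_i) \Big\} \Big) + \lambda \sum_{j=1}^{m+2} \sum_{k=1}^{m+2} \beta_j \beta_k \Omega_{jk}, \tag{4.18}$$

where

$$\Omega_{jk} = \int_{z_1}^{z_m} B_j''(x) B_k''(x) \, dx. \tag{4.19}$$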

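The numbers $\Omega_{jk}$ can be computed using the basic properties of B-splines [12, Appendix B.2]. We are now ready to use partial derivatives to find the minimizing parameters $\beta_1, \ldots, \beta_{m+2}$. Letting $I_k$ denote the set of $i$ for which $x_i = z_k$, the partial derivatives are given by

$$\begin{aligned} \frac{\partial \Delta}{\partial \beta_l} &= 2 \sum_{i=1}^{n} w_i \Big( -y_i B_l(x_i) + \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(x_i) \Big\} B_l(x_i) \Big) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} \\ &= 2 \sum_{k=1}^{m} \sum_{i \in I_k} w_i \Big( -y_i B_l(z_k) + \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} B_l(z_k) \Big) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} \\ &= 2 \sum_{k=1}^{m} \tilde{w}_k \Big( -\tilde{y}_k + \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} \Big) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl}, \end{aligned} \tag{4.20}$$

where

$$\tilde{w}_k = \sum_{i \in I_k} w_i, \quad \tilde{y}_k = \frac{1}{\tilde{w}_k} \sum_{i \in I_k} w_i y_i. \tag{4.21}$$

In the next step, we set the partial derivatives equal to zero

$$\frac{\partial \Delta}{\partial \beta_l} = 2 \sum_{k=1}^{m} \tilde{w}_k \Big( -\tilde{y}_k + \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} \Big) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \ldots, m + 2, \tag{4.22}$$

and obtain the equations

$$\sum_{k=1}^{m} \tilde{w}_k \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} B_l(z_k) - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \ldots, m + 2. \tag{4.23}$$

Since the first term in Equation 4.23 depends in a non-linear way on $\beta_1, \ldots, \beta_{m+2}$, we will use Newton-Raphson's method [9] to solve the equation system. We begin by defining $h_l(\beta_1, \ldots, \beta_{m+2})$, such that

$$h_l(\beta_1, \ldots, \beta_{m+2}) = \sum_{k=1}^{m} \tilde{w}_k \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} B_l(z_k) - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl}, \quad l = 1, 2, \ldots, m + 2. \tag{4.24}$$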
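The aggregation over the sets $I_k$ can be sketched as follows: observations sharing the same value of the rating variable are collapsed into one weight and one weighted average per distinct value. The function name `aggregate` and the data are hypothetical.

```python
from collections import defaultdict

def aggregate(x, w, y):
    """Collapse observations with equal x into (z_k, total weight, weighted mean of y)."""
    sum_w = defaultdict(float)
    sum_wy = defaultdict(float)
    for xi, wi, yi in zip(x, w, y):
        sum_w[xi] += wi          # sum of w_i over i in I_k
        sum_wy[xi] += wi * yi    # sum of w_i * y_i over i in I_k
    z = sorted(sum_w)            # possible values z_1 < ... < z_m
    w_tilde = [sum_w[zk] for zk in z]
    y_tilde = [sum_wy[zk] / sum_w[zk] for zk in z]
    return z, w_tilde, y_tilde

# Hypothetical observations: age in whole years, duration and claim frequency.
x = [30, 25, 30, 25, 40]
w = [1.0, 2.0, 3.0, 2.0, 1.0]
y = [0.1, 0.2, 0.3, 0.0, 0.5]
z, w_tilde, y_tilde = aggregate(x, w, y)
print(z, w_tilde, y_tilde)
```

After this step the sums over the $n$ observations can be replaced by sums over the $m$ distinct values, which is what makes the equation system small when the rating variable is stored in rounded form.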

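To solve the equations $h_l(\beta_1, \ldots, \beta_{m+2}) = 0$ we proceed by iteratively solving the linear equation systems for the unknowns $\beta_1^{(n+1)}, \ldots, \beta_{m+2}^{(n+1)}$

$$h_l\big(\beta_1^{(n)}, \ldots, \beta_{m+2}^{(n)}\big) + \sum_{j=1}^{m+2} \big( \beta_j^{(n+1)} - \beta_j^{(n)} \big) \frac{\partial h_l}{\partial \beta_j}\big(\beta_1^{(n)}, \ldots, \beta_{m+2}^{(n)}\big) = 0, \quad l = 1, 2, \ldots, m + 2, \tag{4.25}$$

where

$$\frac{\partial h_l}{\partial \beta_j} = \sum_{k=1}^{m} \tilde{w}_k \exp\Big\{ \sum_{q=1}^{m+2} \beta_q B_q(z_k) \Big\} B_j(z_k) B_l(z_k) + \lambda \Omega_{jl}. \tag{4.26}$$

To make the notation a bit easier, we define

$$\gamma_k^{(n)} = \exp\Big\{ \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k) \Big\}. \tag{4.27}$$

Using Equations 4.24, 4.26 and 4.27, Equation 4.25 can be simplified to

$$\sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_l(z_k) - \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j^{(n)} \Omega_{jl} + \sum_{j=1}^{m+2} \big( \beta_j^{(n+1)} - \beta_j^{(n)} \big) \Big( \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_j(z_k) B_l(z_k) + \lambda \Omega_{jl} \Big) = 0, \quad l = 1, 2, \ldots, m + 2. \tag{4.28}$$

Collecting $\beta_j^{(n+1)}$ on one side of the equality and $\beta_j^{(n)}$ on the other side, we get

$$\sum_{j=1}^{m+2} \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} B_j(z_k) B_l(z_k) \beta_j^{(n+1)} + \lambda \sum_{j=1}^{m+2} \beta_j^{(n+1)} \Omega_{jl} = \sum_{k=1}^{m} \tilde{w}_k \gamma_k^{(n)} \Big( \tilde{y}_k / \gamma_k^{(n)} - 1 + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k) \Big) B_l(z_k), \quad l = 1, 2, \ldots, m + 2. \tag{4.29}$$

Let us now introduce the $m \times (m + 2)$ matrix $B$ by

$$B = \begin{pmatrix} B_1(z_1) & B_2(z_1) & \cdots & B_{m+2}(z_1) \\ B_1(z_2) & B_2(z_2) & \cdots & B_{m+2}(z_2) \\ \vdots & \vdots & \ddots & \vdots \\ B_1(z_m) & B_2(z_m) & \cdots & B_{m+2}(z_m) \end{pmatrix}, \tag{4.30}$$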

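and the $m \times m$ diagonal matrix $W^{(n)}$ by

$$W^{(n)} = \begin{pmatrix} \tilde{w}_1 \gamma_1^{(n)} & 0 & \cdots & 0 \\ 0 & \tilde{w}_2 \gamma_2^{(n)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{w}_m \gamma_m^{(n)} \end{pmatrix}. \tag{4.31}$$

Furthermore, let $\Omega$ denote the symmetric $(m + 2) \times (m + 2)$ matrix with elements $\Omega_{jk}$. Also, let $\beta^{(n)}$ denote the vector with elements $\beta_j^{(n)}$ and $y^{(n)}$ the vector with elements $\tilde{y}_k / \gamma_k^{(n)} - 1 + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k)$. The system of linear equations may now be written in matrix form as

$$\big( B^{\top} W^{(n)} B + \lambda \Omega \big) \beta^{(n+1)} = B^{\top} W^{(n)} y^{(n)}, \tag{4.32}$$

and repeating this Newton-Raphson step until convergence yields $\beta$. The natural cubic spline $s(x)$ can then be calculated using Equation 4.17 and the price relativities are given by $e^{s(z_1)}, \ldots, e^{s(z_m)}$.

##### 4.3.2 The claim severity

Let us again assume that we have one continuous rating variable, but this time we assume that the observations of the key ratio $Y$ are gamma distributed. For the gamma distribution, the deviance is given by

$$D(y, \exp\{s(x)\}) = 2 \sum_{i=1}^{n} w_i \big( y_i / e^{s(x_i)} - 1 - \log y_i + s(x_i) \big). \tag{4.33}$$

If we combine the expression above with the conclusion from Equation 4.15 we can express the penalized deviance as

$$\Delta(s(x)) = 2 \sum_{i=1}^{n} w_i \big( y_i / e^{s(x_i)} - 1 - \log y_i + s(x_i) \big) + \lambda \int_{z_1}^{z_m} \big( s''(x) \big)^2 \, dx. \tag{4.34}$$

It is once again time to use the fact that the natural cubic spline $s$ can be expressed as a sum of B-splines. The penalized deviance can then be expressed as a function of the parameters $\beta_1, \ldots, \beta_{m+2}$ and is given by

$$\Delta(\beta) = 2 \sum_{i=1}^{n} w_i \Big( y_i \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(x_i) \Big\} - 1 - \log y_i + \sum_{j=1}^{m+2} \beta_j B_j(x_i) \Big) + \lambda \sum_{j=1}^{m+2} \sum_{k=1}^{m+2} \beta_j \beta_k \Omega_{jk}. \tag{4.35}$$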

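We are now ready to use partial derivatives to find the minimizing parameters $\beta_1, \ldots, \beta_{m+2}$. The partial derivatives are given by

$$\begin{aligned} \frac{\partial \Delta}{\partial \beta_l} &= 2 \sum_{i=1}^{n} w_i \Big( -y_i \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(x_i) \Big\} B_l(x_i) + B_l(x_i) \Big) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} \\ &= 2 \sum_{k=1}^{m} \sum_{i \in I_k} w_i \Big( -y_i \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} B_l(z_k) + B_l(z_k) \Big) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} \\ &= 2 \sum_{k=1}^{m} \tilde{w}_k \Big( -\tilde{y}_k \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} + 1 \Big) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl}. \end{aligned} \tag{4.36}$$

In the next step, we set the partial derivatives equal to zero

$$\frac{\partial \Delta}{\partial \beta_l} = 2 \sum_{k=1}^{m} \tilde{w}_k \Big( -\tilde{y}_k \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} + 1 \Big) B_l(z_k) + 2\lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \ldots, m + 2, \tag{4.37}$$

and obtain the equations

$$-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl} = 0, \quad l = 1, 2, \ldots, m + 2. \tag{4.38}$$

Since the first term in Equation 4.38 depends in a non-linear way on $\beta_1, \ldots, \beta_{m+2}$, we will once again use Newton-Raphson's method to solve the equation system. We begin by defining $h_l(\beta_1, \ldots, \beta_{m+2})$, such that

$$h_l(\beta_1, \ldots, \beta_{m+2}) = -\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) \Big/ \exp\Big\{ \sum_{j=1}^{m+2} \beta_j B_j(z_k) \Big\} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j \Omega_{jl}, \quad l = 1, 2, \ldots, m + 2. \tag{4.39}$$

To solve the equations $h_l(\beta_1, \ldots, \beta_{m+2}) = 0$ we proceed by iteratively solving the linear equation systems for the unknowns $\beta_1^{(n+1)}, \ldots, \beta_{m+2}^{(n+1)}$

$$h_l\big(\beta_1^{(n)}, \ldots, \beta_{m+2}^{(n)}\big) + \sum_{j=1}^{m+2} \big( \beta_j^{(n+1)} - \beta_j^{(n)} \big) \frac{\partial h_l}{\partial \beta_j}\big(\beta_1^{(n)}, \ldots, \beta_{m+2}^{(n)}\big) = 0, \quad l = 1, 2, \ldots, m + 2, \tag{4.40}$$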

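where

$$\frac{\partial h_l}{\partial \beta_j} = \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_j(z_k) B_l(z_k) \Big/ \exp\Big\{ \sum_{q=1}^{m+2} \beta_q B_q(z_k) \Big\} + \lambda \Omega_{jl}. \tag{4.41}$$

Using Equations 4.27, 4.39 and 4.41, Equation 4.40 can be simplified to

$$-\sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_l(z_k) / \gamma_k^{(n)} + \sum_{k=1}^{m} \tilde{w}_k B_l(z_k) + \lambda \sum_{j=1}^{m+2} \beta_j^{(n)} \Omega_{jl} + \sum_{j=1}^{m+2} \big( \beta_j^{(n+1)} - \beta_j^{(n)} \big) \Big( \sum_{k=1}^{m} \tilde{w}_k \tilde{y}_k B_j(z_k) B_l(z_k) / \gamma_k^{(n)} + \lambda \Omega_{jl} \Big) = 0, \quad l = 1, 2, \ldots, m + 2. \tag{4.42}$$

Collecting $\beta_j^{(n+1)}$ on one side of the equality and $\beta_j^{(n)}$ on the other side, we get

$$\sum_{j=1}^{m+2} \sum_{k=1}^{m} \tilde{w}_k \frac{\tilde{y}_k}{\gamma_k^{(n)}} B_j(z_k) B_l(z_k) \beta_j^{(n+1)} + \lambda \sum_{j=1}^{m+2} \beta_j^{(n+1)} \Omega_{jl} = \sum_{k=1}^{m} \tilde{w}_k \frac{\tilde{y}_k}{\gamma_k^{(n)}} \Big( 1 - \frac{\gamma_k^{(n)}}{\tilde{y}_k} + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k) \Big) B_l(z_k), \quad l = 1, 2, \ldots, m + 2. \tag{4.43}$$

Let us now introduce the $m \times m$ diagonal matrix $W^{(n)}$ by

$$W^{(n)} = \begin{pmatrix} \tilde{w}_1 \tilde{y}_1 / \gamma_1^{(n)} & 0 & \cdots & 0 \\ 0 & \tilde{w}_2 \tilde{y}_2 / \gamma_2^{(n)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \tilde{w}_m \tilde{y}_m / \gamma_m^{(n)} \end{pmatrix}. \tag{4.44}$$

Furthermore, let $\beta^{(n)}$ denote the vector with elements $\beta_j^{(n)}$ and $y^{(n)}$ the vector with elements $1 - \gamma_k^{(n)} / \tilde{y}_k + \sum_{j=1}^{m+2} \beta_j^{(n)} B_j(z_k)$. The system of linear equations may now be written in matrix form as

$$\big( B^{\top} W^{(n)} B + \lambda \Omega \big) \beta^{(n+1)} = B^{\top} W^{(n)} y^{(n)}, \tag{4.45}$$

and repeating this Newton-Raphson step until convergence yields $\beta$. The natural cubic spline $s(x)$ can then be calculated using Equation 4.17 and the price relativities are given by $e^{s(z_1)}, \ldots, e^{s(z_m)}$.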
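The penalized Newton-Raphson step has the same matrix form for both key ratios and differs only in the diagonal weights and the working vector. The sketch below implements one such step in plain Python for given matrices $B$ and $\Omega$. Constructing the actual B-spline basis and the matrix $\Omega$ is outside the scope of the example, so a deliberately artificial toy basis (the identity matrix) and a toy penalty on the difference between two coefficients are used. These toy inputs, the function names and the data are assumptions for illustration, not the setup of the thesis.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def newton_step(B, Omega, w_tilde, y_tilde, beta, lam, family="poisson"):
    """One step (B' W B + lam * Omega) beta_new = B' W y_work for the chosen family."""
    m, p = len(B), len(beta)
    s = [sum(B[k][j] * beta[j] for j in range(p)) for k in range(m)]  # spline at z_k
    gam = [math.exp(sk) for sk in s]
    if family == "poisson":   # weights and working vector of the claim frequency case
        W = [w_tilde[k] * gam[k] for k in range(m)]
        work = [y_tilde[k] / gam[k] - 1.0 + s[k] for k in range(m)]
    else:                     # weights and working vector of the claim severity case
        W = [w_tilde[k] * y_tilde[k] / gam[k] for k in range(m)]
        work = [1.0 - gam[k] / y_tilde[k] + s[k] for k in range(m)]
    lhs = [[sum(B[k][j] * W[k] * B[k][l] for k in range(m)) + lam * Omega[j][l]
            for l in range(p)] for j in range(p)]
    rhs = [sum(B[k][j] * W[k] * work[k] for k in range(m)) for j in range(p)]
    return solve(lhs, rhs)

def iterate(B, Omega, w_tilde, y_tilde, lam, family, steps=50):
    """Repeat the Newton step from beta = 0 for a fixed (assumed) number of steps."""
    beta = [0.0] * len(B[0])
    for _ in range(steps):
        beta = newton_step(B, Omega, w_tilde, y_tilde, beta, lam, family)
    return beta

# Toy inputs (NOT a real B-spline basis): identity basis, penalty on beta_1 - beta_2.
B = [[1.0, 0.0], [0.0, 1.0]]
Omega = [[1.0, -1.0], [-1.0, 1.0]]
w_tilde, y_tilde = [1.0, 1.0], [0.5, 2.0]
print(iterate(B, Omega, w_tilde, y_tilde, 0.0, "poisson"))
print(iterate(B, Omega, w_tilde, y_tilde, 1e4, "poisson"))
```

Without a penalty the toy fit reproduces the observations, so each coefficient tends to $\log \tilde{y}_k$. With a large $\lambda$ the two coefficients are forced together, which mimics the smoothing effect of the regularization.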

4.4 Price relativities - several rating variables

We will now continue by considering several rating variables. The backfitting algorithm is an iterative procedure used for estimation of the price relativities within the GAM [7]. The idea behind the backfitting algorithm is to reduce the estimation problem to a one-dimensional problem and only consider one continuous rating variable at a time.

Let us assume that we have a large number of categorical rating variables together with two continuous rating variables. The generalization to the case with an arbitrary number of continuous rating variables is completely straightforward. We let $x_{1i}$ and $x_{2i}$ be the values of observation $i$ for the first and second continuous rating variable, respectively. We also let $z_{11}, \ldots, z_{1m_1}$ and $z_{21}, \ldots, z_{2m_2}$ denote the possible values for the continuous rating variables. The logarithm of the mean, $\eta_i = \log \mu_i$, can then be expressed as

$$
\eta_i = \sum_{j=0}^{r} x_{ij} \beta_j + \sum_{k=1}^{m_1+2} \beta_{1k} B_{1k}(x_{1i}) + \sum_{l=1}^{m_2+2} \beta_{2l} B_{2l}(x_{2i}), \quad i = 1, 2, \ldots, n.
\tag{4.46}
$$

We will once again treat two different cases, one for the claim frequency and one for the claim severity.

4.4.1 The claim frequency

Let us first assume that the observations of the key ratio $Y$ are Poisson distributed. The penalized deviance can then be expressed as

$$
\Delta(\beta_1, \beta_2) = 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \eta_i - y_i + e^{\eta_i} \right) + \lambda_1 \sum_{j=1}^{m_1+2} \sum_{k=1}^{m_1+2} \beta_{1j} \beta_{1k} \Omega^{(1)}_{jk} + \lambda_2 \sum_{j=1}^{m_2+2} \sum_{k=1}^{m_2+2} \beta_{2j} \beta_{2k} \Omega^{(2)}_{jk},
\tag{4.47}
$$

where $\lambda_1$ and $\lambda_2$ are the smoothing parameters for the first and second continuous rating variable, respectively. Let us also introduce the notation

$$
\nu_{0i} = \exp\left\{ \sum_{j=0}^{r} x_{ij} \beta_j \right\},
\tag{4.48}
$$

$$
\nu_{1i} = \exp\left\{ \sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x_{1i}) \right\},
\tag{4.49}
$$

$$
\nu_{2i} = \exp\left\{ \sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x_{2i}) \right\}.
\tag{4.50}
$$

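The deviance can then be written as

$$
\begin{aligned}
D &= 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \eta_i - y_i + e^{\eta_i} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \log \mu_i - y_i + \mu_i \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( y_i \log y_i - y_i \log(\nu_{0i} \nu_{1i} \nu_{2i}) - y_i + \nu_{0i} \nu_{1i} \nu_{2i} \right) \\
&= 2 \sum_{i=1}^{n} w_i \nu_{1i} \nu_{2i} \left( \frac{y_i}{\nu_{1i} \nu_{2i}} \log \frac{y_i}{\nu_{1i} \nu_{2i}} - \frac{y_i}{\nu_{1i} \nu_{2i}} \log \nu_{0i} - \frac{y_i}{\nu_{1i} \nu_{2i}} + \nu_{0i} \right) \\
&= 2 \sum_{i=1}^{n} w_i \nu_{0i} \nu_{2i} \left( \frac{y_i}{\nu_{0i} \nu_{2i}} \log \frac{y_i}{\nu_{0i} \nu_{2i}} - \frac{y_i}{\nu_{0i} \nu_{2i}} \log \nu_{1i} - \frac{y_i}{\nu_{0i} \nu_{2i}} + \nu_{1i} \right) \\
&= 2 \sum_{i=1}^{n} w_i \nu_{0i} \nu_{1i} \left( \frac{y_i}{\nu_{0i} \nu_{1i}} \log \frac{y_i}{\nu_{0i} \nu_{1i}} - \frac{y_i}{\nu_{0i} \nu_{1i}} \log \nu_{2i} - \frac{y_i}{\nu_{0i} \nu_{1i}} + \nu_{2i} \right).
\end{aligned}
\tag{4.51}
$$

If $\beta_{11}, \ldots, \beta_{1,m_1+2}$ and $\beta_{21}, \ldots, \beta_{2,m_2+2}$ were known, the problem of estimating $\beta_0, \ldots, \beta_r$ would be identical to the problem with only categorical rating variables (the regularization terms in Equation 4.47 are only constants in this case), but with observations $y_i/(\nu_{1i} \nu_{2i})$ and weights $w_i \nu_{1i} \nu_{2i}$, as can be seen in Equation 4.51. Similarly, if $\beta_0, \ldots, \beta_r$ and $\beta_{21}, \ldots, \beta_{2,m_2+2}$ were known, the problem of estimating $\beta_{11}, \ldots, \beta_{1,m_1+2}$ is precisely the one treated in Section 4.3.1, with the observations $y_i/(\nu_{0i} \nu_{2i})$ and the weights $w_i \nu_{0i} \nu_{2i}$. The corresponding statement of course holds for the second continuous rating variable as well.

To be able to use the backfitting algorithm we need some initial estimates. A good initial estimate would be to derive $\hat{\beta}_0^{(0)}, \ldots, \hat{\beta}_r^{(0)}$ by analyzing the data with the continuous rating variables excluded and to set all $\hat{\beta}_{11}^{(0)}, \ldots, \hat{\beta}_{1,m_1+2}^{(0)}$ and $\hat{\beta}_{21}^{(0)}, \ldots, \hat{\beta}_{2,m_2+2}^{(0)}$ to zero. Given this set of initial estimates we define

$$
\hat{\nu}_{0i}^{(0)} = \exp\left\{ \sum_{j=0}^{r} x_{ij} \hat{\beta}_j^{(0)} \right\},
\tag{4.52}
$$

$$
\hat{\nu}_{1i}^{(0)} = \exp\left\{ \sum_{j=1}^{m_1+2} \hat{\beta}_{1j}^{(0)} B_{1j}(x_{1i}) \right\} = 1,
\tag{4.53}
$$

$$
\hat{\nu}_{2i}^{(0)} = \exp\left\{ \sum_{j=1}^{m_2+2} \hat{\beta}_{2j}^{(0)} B_{2j}(x_{2i}) \right\} = 1.
\tag{4.54}
$$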
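The rescaling argument behind Equation 4.51 can be checked numerically. The sketch below uses made-up positive numbers for $y_i$, $w_i$ and the three factors (all hypothetical, drawn from a seeded random generator) and compares the Poisson deviance computed directly with the deviance computed from the rescaled observations and weights.

```python
import numpy as np

def poisson_deviance(y, w, mu):
    """D = 2 * sum_i w_i (y_i log y_i - y_i log mu_i - y_i + mu_i), for y_i > 0."""
    return 2.0 * np.sum(w * (y * np.log(y) - y * np.log(mu) - y + mu))

rng = np.random.default_rng(0)          # illustrative data only
n = 6
w = rng.uniform(0.5, 2.0, n)            # durations
y = rng.uniform(0.05, 0.3, n)           # observed claim frequencies
nu0 = rng.uniform(0.5, 1.5, n)
nu1 = rng.uniform(0.5, 1.5, n)
nu2 = rng.uniform(0.5, 1.5, n)

# Third line of Equation 4.51: mean nu0 * nu1 * nu2, original data.
full = poisson_deviance(y, w, nu0 * nu1 * nu2)

# Fourth line: observations y / (nu1 * nu2), weights w * nu1 * nu2, mean nu0.
rescaled = poisson_deviance(y / (nu1 * nu2), w * nu1 * nu2, nu0)

print(full, rescaled)
```

The two printed values agree up to floating-point rounding, which is what allows each backfitting step to reuse a one-variable fitting routine with modified observations and weights.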

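We continue by iterating the following three steps until the estimates have converged (we have used the tolerance 1e-10 throughout this paper). For iteration $q$ we have:

Step 1: Compute $\hat{\beta}_{11}^{(q)}, \ldots, \hat{\beta}_{1,m_1+2}^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{2i}^{(q-1)}\right)$ and the weights $w_i \hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{2i}^{(q-1)}$, with $x_{1i}$ being the only explanatory rating variable. Calculate $\hat{\nu}_{1i}^{(q)}$ with the formula $\hat{\nu}_{1i}^{(q)} = \exp\left\{ \sum_{j=1}^{m_1+2} \hat{\beta}_{1j}^{(q)} B_{1j}(x_{1i}) \right\}$.

Step 2: Compute $\hat{\beta}_{21}^{(q)}, \ldots, \hat{\beta}_{2,m_2+2}^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{1i}^{(q)}\right)$ and the weights $w_i \hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{1i}^{(q)}$, with $x_{2i}$ being the only explanatory rating variable. Calculate $\hat{\nu}_{2i}^{(q)}$ with the formula $\hat{\nu}_{2i}^{(q)} = \exp\left\{ \sum_{j=1}^{m_2+2} \hat{\beta}_{2j}^{(q)} B_{2j}(x_{2i}) \right\}$.

Step 3: Compute $\hat{\beta}_0^{(q)}, \ldots, \hat{\beta}_r^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{1i}^{(q)} \hat{\nu}_{2i}^{(q)}\right)$ and the weights $w_i \hat{\nu}_{1i}^{(q)} \hat{\nu}_{2i}^{(q)}$, using only the categorical rating variables. Calculate $\hat{\nu}_{0i}^{(q)}$ with the formula $\hat{\nu}_{0i}^{(q)} = \exp\left\{ \sum_{j=0}^{r} x_{ij} \hat{\beta}_j^{(q)} \right\}$.

When the estimates have converged, we obtain the price relativities for the categorical rating variables, using the fact that

$$
\gamma_j = e^{\beta_j}, \quad j = 0, 1, \ldots, r.
\tag{4.55}
$$

We obtain the natural cubic splines for the continuous rating variables by

$$
s_1(x) = \sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x),
\tag{4.56}
$$

$$
s_2(x) = \sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x),
\tag{4.57}
$$

and the price relativities for the continuous rating variables are given by $e^{s_1(z_{11})}, \ldots, e^{s_1(z_{1m_1})}$ and $e^{s_2(z_{21})}, \ldots, e^{s_2(z_{2m_2})}$.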
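The three steps above can be sketched as a short loop. To keep the example self-contained, the penalized spline fit of Section 4.3.1 is swapped for a much simpler sub-fit: each rating variable is treated as a factor with one free relativity per level, whose Poisson estimate is the weighted mean of the rescaled observations. The categorical part is reduced to a single base level. The function names and the toy data are hypothetical; only the rescaling of observations and weights follows the text.

```python
import numpy as np

def fit_one_factor(levels, obs, weights, n_levels):
    """Unpenalized one-variable Poisson fit: weighted mean of obs per level."""
    num = np.bincount(levels, weights=weights * obs, minlength=n_levels)
    den = np.bincount(levels, weights=weights, minlength=n_levels)
    return num / den

def backfit_poisson(x1, x2, y, w, tol=1e-10, max_iter=500):
    """Backfitting with a base level g0 and two variables given as level indices."""
    n1, n2 = x1.max() + 1, x2.max() + 1
    g0 = np.sum(w * y) / np.sum(w)      # initial estimate, both variables excluded
    nu1 = np.ones_like(y)               # initial relativities equal to 1
    nu2 = np.ones_like(y)
    for _ in range(max_iter):
        previous = g0 * nu1 * nu2
        # Step 1: observations y/(nu0*nu2), weights w*nu0*nu2.
        g1 = fit_one_factor(x1, y / (g0 * nu2), w * g0 * nu2, n1)
        nu1 = g1[x1]
        # Step 2: observations y/(nu0*nu1), weights w*nu0*nu1.
        g2 = fit_one_factor(x2, y / (g0 * nu1), w * g0 * nu1, n2)
        nu2 = g2[x2]
        # Step 3: base level from observations y/(nu1*nu2), weights w*nu1*nu2.
        g0 = np.sum(w * y) / np.sum(w * nu1 * nu2)
        if np.max(np.abs(g0 * nu1 * nu2 - previous)) < tol:
            break
    return g0, g1, g2

# Toy data (hypothetical): an exactly multiplicative tariff on a 3 x 2 grid.
a = np.array([0.8, 1.0, 1.5])
b = np.array([0.5, 2.0])
x1 = np.repeat(np.arange(3), 2)
x2 = np.tile(np.arange(2), 3)
y = 0.1 * a[x1] * b[x2]
w = np.ones(6)
g0, g1, g2 = backfit_poisson(x1, x2, y, w)
print(g0 * g1[x1] * g2[x2])             # fitted means
```

In this sketch only the product of the three factors is identified, not the individual relativities, since no base level is fixed for the two variables. Replacing `fit_one_factor` with a penalized spline fit would give the algorithm described in the text.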

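4.4.2 The claim severity

Let us now instead assume that the observations of the key ratio $Y$ are gamma distributed. The penalized deviance can then be expressed as

$$
\Delta(\beta_1, \beta_2) = 2 \sum_{i=1}^{n} w_i \left( \frac{y_i}{e^{\eta_i}} - 1 - \log \frac{y_i}{e^{\eta_i}} \right) + \lambda_1 \sum_{j=1}^{m_1+2} \sum_{k=1}^{m_1+2} \beta_{1j} \beta_{1k} \Omega^{(1)}_{jk} + \lambda_2 \sum_{j=1}^{m_2+2} \sum_{k=1}^{m_2+2} \beta_{2j} \beta_{2k} \Omega^{(2)}_{jk}.
\tag{4.58}
$$

The deviance can be written as

$$
\begin{aligned}
D &= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i}{e^{\eta_i}} - 1 - \log \frac{y_i}{e^{\eta_i}} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i}{\mu_i} - 1 - \log \frac{y_i}{\mu_i} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i}{\nu_{0i} \nu_{1i} \nu_{2i}} - 1 - \log \frac{y_i}{\nu_{0i} \nu_{1i} \nu_{2i}} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(\nu_{1i} \nu_{2i})}{\nu_{0i}} - 1 - \log \frac{y_i/(\nu_{1i} \nu_{2i})}{\nu_{0i}} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(\nu_{0i} \nu_{2i})}{\nu_{1i}} - 1 - \log \frac{y_i/(\nu_{0i} \nu_{2i})}{\nu_{1i}} \right) \\
&= 2 \sum_{i=1}^{n} w_i \left( \frac{y_i/(\nu_{0i} \nu_{1i})}{\nu_{2i}} - 1 - \log \frac{y_i/(\nu_{0i} \nu_{1i})}{\nu_{2i}} \right).
\end{aligned}
\tag{4.59}
$$

If $\beta_{11}, \ldots, \beta_{1,m_1+2}$ and $\beta_{21}, \ldots, \beta_{2,m_2+2}$ were known, the problem of estimating $\beta_0, \ldots, \beta_r$ would be identical to the problem with only categorical rating variables (the regularization terms in Equation 4.58 are only constants in this case), but with observations $y_i/(\nu_{1i} \nu_{2i})$, as can be seen in Equation 4.59. Similarly, if $\beta_0, \ldots, \beta_r$ and $\beta_{21}, \ldots, \beta_{2,m_2+2}$ were known, the problem of estimating $\beta_{11}, \ldots, \beta_{1,m_1+2}$ is precisely the one treated in Section 4.3.2, with the observations $y_i/(\nu_{0i} \nu_{2i})$. The corresponding statement of course holds for the second continuous rating variable as well.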

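We use the same initial estimates as in the Poisson case and we continue by iterating the following three steps until the estimates have converged. For iteration $q$ we have:

Step 1: Compute $\hat{\beta}_{11}^{(q)}, \ldots, \hat{\beta}_{1,m_1+2}^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{2i}^{(q-1)}\right)$, with $x_{1i}$ being the only explanatory rating variable. Calculate $\hat{\nu}_{1i}^{(q)}$ with the formula $\hat{\nu}_{1i}^{(q)} = \exp\left\{ \sum_{j=1}^{m_1+2} \hat{\beta}_{1j}^{(q)} B_{1j}(x_{1i}) \right\}$.

Step 2: Compute $\hat{\beta}_{21}^{(q)}, \ldots, \hat{\beta}_{2,m_2+2}^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{0i}^{(q-1)} \hat{\nu}_{1i}^{(q)}\right)$, with $x_{2i}$ being the only explanatory rating variable. Calculate $\hat{\nu}_{2i}^{(q)}$ with the formula $\hat{\nu}_{2i}^{(q)} = \exp\left\{ \sum_{j=1}^{m_2+2} \hat{\beta}_{2j}^{(q)} B_{2j}(x_{2i}) \right\}$.

Step 3: Compute $\hat{\beta}_0^{(q)}, \ldots, \hat{\beta}_r^{(q)}$ based on the observations $y_i / \left(\hat{\nu}_{1i}^{(q)} \hat{\nu}_{2i}^{(q)}\right)$, using only the categorical rating variables. Calculate $\hat{\nu}_{0i}^{(q)}$ with the formula $\hat{\nu}_{0i}^{(q)} = \exp\left\{ \sum_{j=0}^{r} x_{ij} \hat{\beta}_j^{(q)} \right\}$.

When the estimates have converged we obtain the price relativities for the categorical rating variables by using the fact that

$$
\gamma_j = e^{\beta_j}, \quad j = 0, 1, \ldots, r.
\tag{4.60}
$$

We obtain the natural cubic splines for the continuous rating variables by

$$
s_1(x) = \sum_{j=1}^{m_1+2} \beta_{1j} B_{1j}(x),
\tag{4.61}
$$

$$
s_2(x) = \sum_{j=1}^{m_2+2} \beta_{2j} B_{2j}(x),
\tag{4.62}
$$

and the price relativities for the continuous rating variables are given by $e^{s_1(z_{11})}, \ldots, e^{s_1(z_{1m_1})}$ and $e^{s_2(z_{21})}, \ldots, e^{s_2(z_{2m_2})}$.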

Chapter 5 Optimal choice of the smoothing parameter

As mentioned in Section 4.1.3, the smoothing parameter $\lambda$ creates a trade-off between good fit to the data and low variability of the function $f$. A small value of $\lambda$ would increase the weight put on data, letting the function $f$ vary freely, whereas a large value of $\lambda$ would decrease the weight put on data, forcing the integrated squared second derivative of the function $f$ to be small (Figure 5.1). In this chapter we will compare two different methods for finding the optimal smoothing parameter. While the method of cross validation is commonly used for this purpose, a more uncommon method, the L-curve method, is investigated for its performance in comparison to the method of cross validation.

Figure 5.1 (claim frequency versus age of policyholder): A small value of $\lambda$ would increase the weight put on data, letting the function $f$ vary freely (dashed curve), whereas a large value of $\lambda$ would decrease the weight put on data, forcing the integrated squared second derivative of the function $f$ to be small (solid curve).

5.1 Cross validation

Cross validation is a way of measuring the predictive performance of a statistical model. There are plenty of different variations of cross validation but due to its simple properties we will only be using the variation that goes under the name of leave-one-out cross validation [7]. When describing the method of cross validation we will only be considering one continuous rating variable. In the case of several continuous rating variables, we can just proceed and use the approach described below every time we update one of the continuous rating variables in the backfitting algorithm. As the calculations for the Poisson case and the gamma case are very similar we will only consider the case where the observations of the key ratio $Y$ are Poisson distributed.

For the Poisson distribution, the fitted natural cubic spline $s$ minimizes the expression

$$
\Delta(s) = 2 \sum_{k=1}^{m} w_k \left( \tilde{y}_k \log \tilde{y}_k - \tilde{y}_k s(z_k) - \tilde{y}_k + e^{s(z_k)} \right) + \lambda \int_{z_1}^{z_m} \left( s''(x) \right)^2 dx.
\tag{5.1}
$$

Now suppose we remove one particular $z_k$ and the corresponding $\tilde{y}_k$ from the data. We can then, for any $\lambda$, calculate the minimizing natural cubic spline $s_k^{\lambda}(x)$ for this new data set. For a good value of $\lambda$, $s_k^{\lambda}(z_k)$ should be a good predictor of the deleted data point $\tilde{y}_k$. This should be true for any $k$. The cross validation score measures the overall ability of predicting all the removed data points (a smaller value of the cross validation score means better ability to predict the removed data points) and is defined as

$$
C(\lambda) = 2 \sum_{k=1}^{m} w_k \left( \tilde{y}_k \log \tilde{y}_k - \tilde{y}_k s_k^{\lambda}(z_k) - \tilde{y}_k + e^{s_k^{\lambda}(z_k)} \right).
\tag{5.2}
$$

The idea of cross validation is to choose $\lambda$ as the value for which the cross validation score $C(\lambda)$ is minimized. We compute $C(\lambda)$ by finding all the minimizing splines $s_1^{\lambda}, \ldots, s_m^{\lambda}$. Since this can be very time consuming, approximative methods have been developed but these will not be used in this thesis [12, 5.5].

Figure 5.2 (cross validation score versus smoothing parameter): The cross validation score $C(\lambda)$ for different values of $\lambda$.

Figure 5.3 (claim frequency versus age of policyholder): The natural cubic spline based on the $\lambda$ that minimizes the cross validation score $C(\lambda)$.
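The leave-one-out score in Equation 5.2 can be sketched as follows. To stay short and self-contained, the natural cubic spline is replaced by a discrete stand-in: the values $s(z_k)$ on an equally spaced grid are fitted directly, with a squared second-difference penalty in place of the integrated squared second derivative. This substitution, the function names and the toy data are assumptions made for illustration. Removing a data point is implemented by giving it zero weight, so the penalty alone determines the prediction at that point (this requires $\lambda > 0$).

```python
import numpy as np

def second_difference_penalty(m):
    """Omega = D'D, where D takes second differences of s_1, ..., s_m."""
    D = np.diff(np.eye(m), n=2, axis=0)
    return D.T @ D

def fit_poisson_smoother(y, w, lam, tol=1e-10, max_iter=200):
    """Newton iterations for the penalized Poisson deviance on a grid."""
    m = len(y)
    Omega = second_difference_penalty(m)
    s = np.full(m, np.log(np.sum(w * y) / np.sum(w)))   # constant starting value
    for _ in range(max_iter):
        mu = np.exp(s)
        grad = w * (mu - y) + lam * (Omega @ s)          # half the gradient
        hess = np.diag(w * mu) + lam * Omega             # half the Hessian
        step = np.linalg.solve(hess, grad)
        s = s - step
        if np.max(np.abs(step)) < tol:
            break
    return s

def cv_score(y, w, lam):
    """Leave-one-out cross validation score C(lambda), as in Equation 5.2."""
    total = 0.0
    for k in range(len(y)):
        w_minus_k = w.copy()
        w_minus_k[k] = 0.0                               # drop data point k
        s_k = fit_poisson_smoother(y, w_minus_k, lam)[k] # prediction at z_k
        total += 2.0 * w[k] * (y[k] * np.log(y[k]) - y[k] * s_k - y[k] + np.exp(s_k))
    return total

# Toy data (hypothetical): claim frequencies on 8 equally spaced ages.
y = np.exp(0.2 + 0.1 * np.arange(8)) * np.array([1.1, 0.9, 1.05, 0.95, 1.1, 0.9, 1.0, 1.05])
w = np.full(8, 100.0)
lams = np.logspace(-1, 3, 9)
scores = [cv_score(y, w, lam) for lam in lams]
print(lams[int(np.argmin(scores))])                      # grid value with the smallest score
```

Each call to `cv_score` performs one fit per data point, which is the source of the factor $m$ in the running time of cross validation.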

5.2 The L-curve method

An L-curve is a convenient graphical tool for displaying the size of the regularization versus the size of the deviance, as the smoothing parameter varies. A standard L-curve is visualized in a log-log plot and has the shape of an L, but for insurance noise-free data it has been shown that the L-curve always becomes concave [5].

Figure 5.4 (regularization versus deviance): The L-curve becomes concave for insurance data.

The idea of the L-curve method is to find the point on the corner of the L-curve and to choose the underlying smoothing parameter. The reason behind this choice is that the corner separates the flat and vertical parts of the curve where the solution is dominated by the regularization and the deviance, respectively. For standard L-curves with the shape of an L, this makes perfect sense, but for concave L-curves it is not clear whether the corner represents a good value of the smoothing parameter or not. This will be evaluated in Chapter 6.

There are several different ways of defining the corner but we will choose the method proposed by Hansen and O'Leary [5]. We define $h_1(\lambda) = \log \lambda r$ and $h_2(\lambda) = \log D/\lambda$ and we note that both $h_1$ and $h_2$ are twice differentiable with respect to $\lambda$. The corner of the L-curve is then found by minimizing the curvature $\kappa(\lambda)$, where

$$
\kappa(\lambda) = \frac{h_1' h_2'' - h_1'' h_2'}{\left( (h_1')^2 + (h_2')^2 \right)^{3/2}}.
$$

Any simple optimizing method will do for finding the minimal $\kappa(\lambda)$.

Figure 5.5 (curvature versus smoothing parameter): The curvature $\kappa(\lambda)$ for different values of $\lambda$.
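When $h_1$ and $h_2$ are only available on a grid of smoothing parameters, the curvature can be approximated with finite differences. The sketch below is one simple way of doing this, not necessarily the procedure used in the thesis: the function names are hypothetical, the derivatives are taken with `numpy.gradient`, and the grid search over interior points stands in for "any simple optimizing method".

```python
import numpy as np

def l_curve_curvature(lam, h1, h2):
    """Curvature of the curve (h1(lam), h2(lam)) from finite differences."""
    d1 = np.gradient(h1, lam)           # h1'
    d2 = np.gradient(h2, lam)           # h2'
    dd1 = np.gradient(d1, lam)          # h1''
    dd2 = np.gradient(d2, lam)          # h2''
    return (d1 * dd2 - dd1 * d2) / (d1**2 + d2**2) ** 1.5

def corner_index(lam, h1, h2):
    """Index of the grid point with minimal curvature, ignoring the two
    points at each end, where the one-sided differences are inaccurate."""
    kappa = l_curve_curvature(lam, h1, h2)
    return 2 + int(np.argmin(kappa[2:-2]))

# Sanity check on a curve with known curvature: the unit circle
# (cos t, sin t) has curvature 1 everywhere.
t = np.linspace(0.1, 3.0, 2001)
kappa = l_curve_curvature(t, np.cos(t), np.sin(t))
print(kappa[5:-5].min(), kappa[5:-5].max())
```

In an application, `lam` would hold the tested smoothing parameters and `h1`, `h2` the corresponding values of the two functions defined above, so that `lam[corner_index(lam, h1, h2)]` is the suggested smoothing parameter.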

Figure 5.6 (regularization versus deviance): The point on the L-curve with the minimum curvature $\kappa(\lambda)$.

Figure 5.7 (regularization versus deviance): The point on the L-curve with the minimum curvature $\kappa(\lambda)$, with axes range being of equal dimension.

Figure 5.8 (claim frequency versus age of policyholder): The natural cubic spline based on the $\lambda$ that minimizes the curvature $\kappa(\lambda)$.

Chapter 6 Results

In this chapter we present some test results, comparing the method of cross validation with the L-curve method. The data used for this comparison is downloaded test data¹. The test data has been modified in different ways to fully analyze and compare the different methods. The calculations have been carried out in Matlab R2015a.

6.1 Numerical computations

We recall the system of linear equations, derived in Chapter 4, that needs to be solved in the GAM to obtain the regression parameters $\beta$. In the method of cross validation, the equation system needs to be solved $p \cdot m$ times, where $p$ is the total number of different values that are tested for the smoothing parameter and $m$ is the number of possible values for the rating variable. In the L-curve method the equation system only needs to be solved $p$ times. In Table 6.1 we present a summary of running times for different choices of $p$. The number of possible values for the rating variable is set to $m = 12$.

Table 6.1: Running times in seconds for different choices of $p$ (columns: cross validation and the L-curve method).

Note that we have only considered one continuous rating variable in the computations above. When considering several continuous rating variables these running times add up, every time we update one of the continuous rating variables in the backfitting algorithm.
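As a rough plausibility check on these counts, and under the assumption that every solve of the equation system costs about the same, the expected speed-up of the L-curve method over cross validation is the ratio of the number of solves. It can be compared with the running times of about 36 seconds and 2.9 seconds quoted in Section 6.2:

$$
\frac{p \cdot m}{p} = m = 12, \qquad \frac{36\ \text{s}}{2.9\ \text{s}} \approx 12.4.
$$

The two ratios are close, which is consistent with the solve count being the dominant cost in both methods.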

6.2 Unmodified test data

In our first graphical example we are going to use the unmodified test data. We test $p$ different values of the smoothing parameter, both in the method of cross validation and in the L-curve method. The natural cubic splines based on the suggested smoothing parameters are shown in Figure 6.1. The program for the method of cross validation takes about 36 seconds to complete whereas the program for the L-curve method takes about 2.9 seconds to complete. See Section 6.1 for a more comprehensive comparison of running times for the different methods.

Figure 6.1 (claim frequency versus age of policyholder): The natural cubic splines based on the smoothing parameters given by the method of cross validation (solid curve) and the L-curve method (dashed curve).

6.3 Modified test data

Now we are going to modify our original test data in different ways to be able to fully analyze and compare the method of cross validation (solid curve) and the L-curve method (dashed curve). We are going to modify data points with large duration ($w_i$ in Section 3.3) and with small duration, both upwards and downwards, to be able to evaluate how the different methods react to data series with different properties. See the figures below.

Figure 6.2 (claim frequency versus age of policyholder): The claim frequency for an internal data point with large duration is multiplied by 2.

Figure 6.3 (claim frequency versus age of policyholder): The claim frequency for an internal data point with large duration is divided by 2.


### Local classification and local likelihoods

Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

### Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

### Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

### 3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

3. Interpolation Closing the Gaps of Discretization... Beyond Polynomials Closing the Gaps of Discretization... Beyond Polynomials, December 19, 2012 1 3.3. Polynomial Splines Idea of Polynomial Splines

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### EAA series - Textbook

EAA series - Textbook Editors Ch. Hipp M. Koller A. Pelsser E. Pitacco D. Filipovic (Co-Chair) U. Orbanz (Co-Chair) EAA series is successor of the EAA Lecture Notes and supported by the European Actuarial

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Sections 2.11 and 5.8

Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

### DERIVATIVES AS MATRICES; CHAIN RULE

DERIVATIVES AS MATRICES; CHAIN RULE 1. Derivatives of Real-valued Functions Let s first consider functions f : R 2 R. Recall that if the partial derivatives of f exist at the point (x 0, y 0 ), then we

### Factorial experimental designs and generalized linear models

Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

### 4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

### In this chapter, you will learn improvement curve concepts and their application to cost and price analysis.

7.0 - Chapter Introduction In this chapter, you will learn improvement curve concepts and their application to cost and price analysis. Basic Improvement Curve Concept. You may have learned about improvement

### Predictive Modeling for Life Insurers

Predictive Modeling for Life Insurers Application of Predictive Modeling Techniques in Measuring Policyholder Behavior in Variable Annuity Contracts April 30, 2010 Guillaume Briere-Giroux, FSA, MAAA, CFA

### Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution

Heritage Provider Network Health Prize Round 3 Milestone: Team crescendo s Solution Rie Johnson Tong Zhang 1 Introduction This document describes our entry nominated for the second prize of the Heritage

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Review of Fundamental Mathematics

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

### Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

### Note on the EM Algorithm in Linear Regression Model

International Mathematical Forum 4 2009 no. 38 1883-1889 Note on the M Algorithm in Linear Regression Model Ji-Xia Wang and Yu Miao College of Mathematics and Information Science Henan Normal University

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### LINEAR SYSTEMS. Consider the following example of a linear system:

LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x +2x 2 +3x 3 = 5 x + x 3 = 3 3x + x 2 +3x 3 = 3 x =, x 2 =0, x 3 = 2 In general we want to solve n equations in

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### From the help desk: Demand system estimation

The Stata Journal (2002) 2, Number 4, pp. 403 410 From the help desk: Demand system estimation Brian P. Poi Stata Corporation Abstract. This article provides an example illustrating how to use Stata to

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

EECS 556 Image Processing W 09 Interpolation Interpolation techniques B splines What is image processing? Image processing is the application of 2D signal processing methods to images Image representation