How To Predict Insurance Claim Size With A Tree Based Gradient Boosting Algorithm
A Boosted Tweedie Compound Poisson Model for Insurance Premium

Yi Yang, Wei Qian and Hui Zou

McGill University; Rochester Institute of Technology; Corresponding author, [email protected], University of Minnesota

July 27, 2015

Abstract

The Tweedie GLM is a widely used method for predicting insurance premiums. However, the linear model assumption can be too rigid for many applications. As a better alternative, a boosted Tweedie model is considered in this paper. We propose a TDboost estimator of pure premiums and use a profile likelihood approach to estimate the index and dispersion parameters. Our method is capable of fitting a flexible nonlinear Tweedie model and capturing complex interactions among predictors. A simulation study confirms the excellent prediction performance of our method. As an application, we apply our method to auto insurance claim data and show that the new method is superior to the existing methods in the sense that it generates more accurate premium predictions, thus helping solve the adverse selection issue. We have implemented our method in a user-friendly R package that also includes a nice visualization tool for interpreting the fitted model.

1 Introduction

One of the most important problems in insurance business is to set the premium for the customers (policyholders). By insurance contracts, a large number of policyholders' individual losses are transformed into a more predictable, aggregate loss of the insurer. In a
competitive market, it is advantageous for the insurer to charge a fair premium according to the expected loss of the policyholder. In personal car insurance, for instance, if an insurance company charges too much for old drivers and charges too little for young drivers, then the old drivers will switch to its competitors, and the remaining policies for the young drivers would be underpriced. This results in the adverse selection issue (Chiappori and Salanie, 2000; Dionne et al., 2001): the insurer loses profitable policies and is left with bad risks, resulting in economic loss both ways.

To appropriately set the premiums for the insurer's customers, one crucial task is to predict the size of actual (currently unforeseeable) claims. In this paper, we will focus on modeling claim loss, although other ingredients such as safety loadings, administrative costs, cost of capital, and profit are also important factors for setting the premium. One difficulty in modeling the claims is that the distribution is usually highly right-skewed, mixed with a point mass at zero. Such data cannot be transformed to normality by power transformation, and special treatment of zero claims is often required. As an example, Figure 1 shows the histogram of auto insurance claim data (Yip and Yau, 2005), in which there are 6,290 policy records with zero claims and 4,006 policy records with positive losses.

The need for predictive models emerges from the fact that the expected loss is highly dependent on the characteristics of an individual policy, such as the age and annual income of the policyholder, the population density of the policyholder's residential area, and the age and model of the vehicle. Traditional methods used generalized linear models (GLM; Nelder and Wedderburn, 1972) for modeling the claim size (e.g. Renshaw, 1994; Haberman and Renshaw, 1996). However, all of these works performed their analyses on a subset of the policies, namely those that have at least one claim.
Alternative approaches have employed Tobit models by treating zero outcomes as censored below some cutoff points (Van de Ven and van Praag, 1981; Showers and Shotick, 1994), but these approaches rely on a normality assumption of the latent response. Alternatively, Jørgensen and de Souza (1994) and Smyth and Jørgensen (2002) used GLMs with a Tweedie distributed (Jørgensen, 1987, 1997) outcome to simultaneously model frequency and severity of insurance claims. They assume Poisson arrival of claims and gamma distributed amounts for individual claims, so that the size of the total claim amount follows a Tweedie compound Poisson distribution. Due to its ability to simultaneously model the zeros and the continuous positive outcomes, the Tweedie GLM
[Figure 1: histogram; x-axis: Total Insurance Claim Amount (in $1000) Per Policy Year; y-axis: Frequency.]

Figure 1: Histogram of the auto insurance claim data as analyzed in Yip and Yau (2005). It shows that there are 6,290 policy records with zero total claims per policy year, while the remaining 4,006 policy records have positive losses.
has been a widely used method in actuarial studies (Mildenhall, 1999; Murphy et al., 2000; Peters et al., 2008; Quijano Xacur et al., 2011).

Despite the popularity of the Tweedie GLM, a major limitation is that the structure of the link function is restricted to a linear form, which can be too rigid for real applications. In auto insurance, for example, it is known that the risk does not monotonically decrease as age increases (Owsley et al., 1991; McCartt et al., 2003; Anstey et al., 2005). Although nonlinearity may be modeled by adding splines (Zhang, 2011), low-degree splines are often inadequate to capture the nonlinearity in the data, while high-degree splines often overfit and produce unstable estimates. Generalized additive models (GAM; Hastie and Tibshirani, 1990; Wood, 2006) overcome the restrictive linear assumption of GLMs, and can model the continuous variables by smooth functions estimated from data. The structure of the model, however, has to be determined a priori. That is, one has to specify the main effects and interaction effects to be used in the model. As a result, misspecification of nonignorable effects is likely to adversely affect prediction accuracy.

In this paper, we aim to model the claim size by a nonparametric Tweedie compound Poisson model, and we propose a gradient tree-boosting algorithm (TDboost henceforth) to fit this model. Gradient boosting (Freund and Schapire, 1997, 1996) is one of the most successful machine learning algorithms for nonparametric regression and classification. Boosting adaptively combines a large number of relatively simple prediction models called base learners into an ensemble learner to achieve high prediction performance. The seminal work on the boosting algorithm called AdaBoost (Freund and Schapire, 1997, 1996) was originally proposed for classification problems. Later Breiman (1998) and Breiman (1999) pointed out an important connection between the AdaBoost algorithm and a functional gradient descent algorithm. Friedman et al.
(2000), Friedman (2001) and Hastie et al. (2009) developed a statistical view of boosting and proposed gradient boosting methods for both classification and regression. There is a large body of literature on boosting. We refer interested readers to Bühlmann and Hothorn (2007) for a comprehensive review of boosting algorithms.

The TDboost model is motivated by the proven success of boosting in machine learning for classification and regression problems (Friedman, 2001, 2002; Hastie et al., 2009). Its advantages are threefold. First, the model structure of TDboost is learned from data and not predetermined, thereby avoiding an explicit model specification. Nonlinearities,
discontinuities, complex and higher order interactions are naturally incorporated into the model to reduce the potential modeling bias and to produce high predictive performance, which enables TDboost to serve as a benchmark model in scoring insurance policies, guiding pricing practice, and facilitating marketing efforts. Second, in contrast to other nonparametric statistical learning methods, TDboost can provide interpretable results, by means of the partial dependence plots and the relative importance of the predictors. Feature selection is performed as an integral part of the procedure. In addition, TDboost handles predictor and response variables of any type without the need for transformation, and it is highly robust to outliers. Missing values in the predictors are managed almost without loss of information (Elith et al., 2008). All these properties make TDboost an attractive tool for insurance premium modeling.

The remainder of this paper is organized as follows. We briefly review the gradient boosting algorithm and the Tweedie compound Poisson model in Section 2 and Section 3, respectively. We present the main methodology development with implementation details in Section 4. In Section 5, we use simulation to show the high predictive accuracy of TDboost. As an application, we apply TDboost to analyze auto insurance claim data in Section 6.

2 Gradient Boosting

To keep the paper self-contained, we briefly explain the general procedures for gradient boosting. Let $x = (x_1, \ldots, x_p)$ be the $p$-dimensional predictor variables and $y$ be the one-dimensional response variable. The goal is to estimate the optimal prediction function $F^*(\cdot)$ that maps $x$ to $y$ by minimizing the expected value of a loss function $\Psi(\cdot, \cdot)$ over the function class $\mathcal{F}$:

\[ F^*(\cdot) = \arg\min_{F(\cdot) \in \mathcal{F}} E_{y,x}[\Psi(y, F(x))], \]

where $\Psi$ is assumed to be differentiable with respect to $F$. Given the observed data $\{y_i, x_i\}_{i=1}^n$, estimation of $F^*(\cdot)$ can be done by minimizing the empirical risk function

\[ \min_{F(\cdot) \in \mathcal{F}} R_n(F) =: \min_{F(\cdot) \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \Psi(y_i, F(x_i)). \tag{1} \]
For gradient boosting, each candidate function $F \in \mathcal{F}$ is assumed to be an ensemble of $M$ base learners

\[ F(x) = F^{[0]} + \sum_{m=1}^{M} \beta^{[m]} h(x; \xi^{[m]}), \tag{2} \]

where $h(x; \xi^{[m]})$ usually belongs to a class of some simple functions of $x$ called base learners (e.g., regression/decision tree) with the parameter $\xi^{[m]}$ ($m = 1, 2, \ldots, M$). $F^{[0]}$ is a constant scalar and $\beta^{[m]}$ is the expansion coefficient. Note that, differing from the usual structure of an additive model, there is no restriction on the number of predictors to be included in each $h(\cdot)$, and consequently, high-order interactions can be easily considered using this setting.

A forward stagewise algorithm is adopted to approximate the minimizer of (1), which builds up the components $\beta^{[m]} h(x; \xi^{[m]})$ ($m = 1, 2, \ldots, M$) sequentially through a gradient-descent-like approach. At each iteration stage $m$ ($m = 1, 2, \ldots$), suppose the current estimate for $F^*(\cdot)$ is $\hat F^{[m-1]}(\cdot)$. In principle, we want to update $\hat F^{[m-1]}(\cdot)$ to $\hat F^{[m]}(\cdot)$ along the negative gradient direction of $R_n(F)$. However, we only know the negative gradient at the training data points. Indeed, the negative gradient vector $(u_1^{[m]}, \ldots, u_n^{[m]})$ of $R_n(F)$ with respect to $F$ at $\{F(x_i) = \hat F^{[m-1]}(x_i)\}_{i=1}^n$ can be written as

\[ u_i^{[m]} = -\left. \frac{\partial R_n(F)}{\partial F(x_i)} \right|_{F(x_i) = \hat F^{[m-1]}(x_i)}, \]

but we cannot directly update $\hat F^{[m-1]}(\cdot)$ using $(u_1^{[m]}, \ldots, u_n^{[m]})$, as $u^{[m]}$ is only defined at $x_i$ for $i = 1, \ldots, n$ and cannot make predictions based on new data not represented in the training set. To solve this problem, gradient boosting fits the negative gradient vector $(u_1^{[m]}, \ldots, u_n^{[m]})$ (as the working response) to $(x_1, \ldots, x_n)$ (as the predictor) to find a base learner $h(x; \xi^{[m]})$. Thus, the fitted $h(x; \xi^{[m]})$ can be viewed as an approximation of the negative gradient, and it can be evaluated on the entire space of $x$.
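The fitting step just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the loss is taken to be the squared error $\Psi(y, F) = (y - F)^2/2$, so that the negative gradient is the residual $y_i - F(x_i)$ (up to the factor $1/n$ in $R_n$), and the base learner is a two-node regression stump on a single predictor. The names `negative_gradient`, `fit_stump` and `stump_predict` are hypothetical and are not taken from any package.

```python
def negative_gradient(y, f):
    """Negative gradient of sum_i (y_i - f_i)^2 / 2 with respect to each f_i."""
    return [yi - fi for yi, fi in zip(y, f)]


def fit_stump(x, u):
    """Least-squares fit of a two-node regression stump to the working response u.

    Returns (threshold, left_mean, right_mean); points with x <= threshold go left.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # no split is possible between tied values
        left = [u[i] for i in order[:k]]
        right = [u[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, ml, mr)
    if best is None:
        raise ValueError("x has no two distinct values to split on")
    return best[1:]


def stump_predict(stump, x_new):
    """Evaluate the fitted stump at any point, including points outside the training set."""
    threshold, left_mean, right_mean = stump
    return left_mean if x_new <= threshold else right_mean


x = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [1.0, 1.2, 0.8, 3.1, 2.9, 3.0]
f_prev = [sum(y) / len(y)] * len(y)   # current estimate: a constant function
u = negative_gradient(y, f_prev)      # working response, known only at the training points
stump = fit_stump(x, u)               # base learner approximating the negative gradient
print(stump, stump_predict(stump, 0.5))
```

The working response `u` exists only at the six training points, whereas the fitted stump can be evaluated at any `x_new`, which is the reason for fitting a base learner to the gradient rather than using the gradient directly.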
The expansion coefficient $\beta^{[m]}$ can then be determined by a line search

\[ \beta^{[m]} = \arg\min_{\beta} \sum_{i=1}^{n} \Psi\big(y_i, \hat F^{[m-1]}(x_i) + \beta h(x_i; \xi^{[m]})\big), \tag{3} \]

which is reminiscent of the steepest descent method. Consequently, the estimation of
$F^*(x)$ for the next stage is

\[ \hat F^{[m]}(x) := \hat F^{[m-1]}(x) + \nu \beta^{[m]} h(x; \xi^{[m]}), \tag{4} \]

where $0 < \nu \le 1$ is the shrinkage factor (Friedman, 2001) that controls the update step size. A small $\nu$ imposes more shrinkage while $\nu = 1$ gives complete negative gradient steps. Friedman (2001) has found that the shrinkage factor reduces overfitting and improves the predictive accuracy.

3 Compound Poisson Distribution and Tweedie Model

In this section, we briefly introduce the compound Poisson distribution and the Tweedie model, which are necessary for our methodology development. Let $N$ be a Poisson random variable denoted by $\mathrm{Pois}(\lambda)$, and let the $Z_d$'s be i.i.d. gamma random variables denoted by $\mathrm{Gamma}(\alpha, \gamma)$ with mean $\alpha\gamma$ and variance $\alpha\gamma^2$. Assume $N$ is independent of the $Z_d$'s. Define a random variable $Z$ by

\[ Z = \begin{cases} 0 & \text{if } N = 0, \\ Z_1 + Z_2 + \cdots + Z_N & \text{if } N = 1, 2, \ldots \end{cases} \tag{5} \]

Thus $Z$ is the Poisson sum of independent gamma random variables. The resulting distribution of $Z$ is referred to as the compound Poisson distribution (Feller, 1968; Bar-Lev and Stramer, 1987; Jørgensen and de Souza, 1994; Smyth and Jørgensen, 2002), which is known to be closely connected to exponential dispersion models (EDM) (Jørgensen, 1987, 1997). Note that the distribution of $Z$ has a probability mass at zero: $\Pr(Z = 0) = \exp(-\lambda)$. Since $Z$ conditional on $N = j$ is $\mathrm{Gamma}(j\alpha, \gamma)$, the distribution function of $Z$ can be written as

\[ f_Z(z \mid \lambda, \alpha, \gamma) = \Pr(N = 0)\, d_0(z) + \sum_{j=1}^{\infty} \Pr(N = j)\, f_{Z \mid N = j}(z) = \exp(-\lambda)\, d_0(z) + \sum_{j=1}^{\infty} \frac{\lambda^j e^{-\lambda}}{j!} \, \frac{z^{j\alpha - 1} e^{-z/\gamma}}{\gamma^{j\alpha}\, \Gamma(j\alpha)}, \]
where $d_0$ is the Dirac delta function at zero and $f_{Z \mid N = j}$ is the conditional density of $Z$ given $N = j$. This gives the cumulant generating function of $Z$

\[ \log M_Z(t) = \lambda\{(1 - \gamma t)^{-\alpha} - 1\}. \tag{6} \]

Smyth (1996) pointed out that the compound Poisson distribution belongs to a special class of EDMs known as Tweedie models (Tweedie, 1984). The EDMs are defined by the form

\[ f_Z(z \mid \theta, \phi) = a(z, \phi) \exp\left\{ \frac{z\theta - \kappa(\theta)}{\phi} \right\}, \tag{7} \]

where $a(\cdot)$ is a normalizing function, $\kappa(\cdot)$ is called the cumulant function, and both $a(\cdot)$ and $\kappa(\cdot)$ are known. The parameter $\theta$ is in $\mathbb{R}$ and the dispersion parameter $\phi$ is in $\mathbb{R}^+$. EDMs have the property that the mean is $E(Z) \equiv \mu = \dot\kappa(\theta)$ and the variance is $\mathrm{Var}(Z) = \phi\, \ddot\kappa(\theta)$, where $\dot\kappa(\theta)$ and $\ddot\kappa(\theta)$ are the first and second derivatives of $\kappa(\theta)$, respectively. The cumulant generating function of EDMs is

\[ \log M_Z(t) = \frac{1}{\phi}\{\kappa(\theta + t\phi) - \kappa(\theta)\}. \tag{8} \]

Tweedie models are special cases of the EDMs characterized by the power mean-variance relationship $\mathrm{Var}(Z) = \phi\mu^{\rho}$ for some index parameter $\rho$. Such a mean-variance relation gives

\[ \theta = \begin{cases} \dfrac{\mu^{1-\rho}}{1-\rho}, & \rho \ne 1, \\ \log \mu, & \rho = 1, \end{cases} \qquad \kappa(\theta) = \begin{cases} \dfrac{\mu^{2-\rho}}{2-\rho}, & \rho \ne 2, \\ \log \mu, & \rho = 2. \end{cases} \tag{9} \]

One can show that the compound Poisson distribution belongs to the class of Tweedie models. Indeed, if we replace the parameters $(\lambda, \alpha, \gamma)$ in the cumulant function (6) by

\[ \lambda = \frac{1}{\phi} \frac{\mu^{2-\rho}}{2-\rho}, \qquad \alpha = \frac{2-\rho}{\rho-1}, \qquad \gamma = \phi(\rho-1)\mu^{\rho-1}, \tag{10} \]

the cumulant function of the compound Poisson model has the form of a Tweedie model with $1 < \rho < 2$ and $\mu > 0$. As a result, for the rest of this paper, we only consider the model (5), and simply refer to (5) as the Tweedie model (or Tweedie compound Poisson model), denoted by $\mathrm{Tw}(\mu, \phi, \rho)$, where $1 < \rho < 2$ and $\mu > 0$.
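The reparameterization (10) can be checked numerically. The sketch below is an illustration only: the function name `tweedie_to_compound_poisson` is hypothetical, and the parameter values are arbitrary. It maps $(\mu, \phi, \rho)$ to $(\lambda, \alpha, \gamma)$ and compares the compound Poisson moments, $E(Z) = \lambda\alpha\gamma$ and $\mathrm{Var}(Z) = \lambda\alpha\gamma^2(1 + \alpha)$, with the Tweedie values $\mu$ and $\phi\mu^{\rho}$.

```python
import math


def tweedie_to_compound_poisson(mu, phi, rho):
    """Map Tweedie parameters (mu, phi, rho), 1 < rho < 2, to (lam, alpha, gamma) via (10)."""
    if not (1.0 < rho < 2.0) or mu <= 0 or phi <= 0:
        raise ValueError("need mu > 0, phi > 0 and 1 < rho < 2")
    lam = mu ** (2.0 - rho) / (phi * (2.0 - rho))
    alpha = (2.0 - rho) / (rho - 1.0)
    gamma = phi * (rho - 1.0) * mu ** (rho - 1.0)
    return lam, alpha, gamma


mu, phi, rho = 2.0, 0.5, 1.5
lam, alpha, gamma = tweedie_to_compound_poisson(mu, phi, rho)

mean = lam * alpha * gamma                      # E(Z) = E(N) * E(Z_d)
var = lam * alpha * gamma ** 2 * (1.0 + alpha)  # Var(Z) = E(N) * E(Z_d^2)
p_zero = math.exp(-lam)                         # Pr(Z = 0) = Pr(N = 0)

print("lambda, alpha, gamma:", lam, alpha, gamma)
print("mean vs mu:", mean, mu)
print("variance vs phi * mu^rho:", var, phi * mu ** rho)
print("Pr(Z = 0):", p_zero)
```

Both comparisons agree up to floating-point error for any admissible $(\mu, \phi, \rho)$, since $\lambda\alpha\gamma = \mu$ and $\gamma(1 + \alpha) = \phi\mu^{\rho-1}$ follow algebraically from (10).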
It is straightforward to show that the log-likelihood of the Tweedie model is

\[ \log f_Z(z \mid \mu, \phi, \rho) = \frac{1}{\phi}\left( z \frac{\mu^{1-\rho}}{1-\rho} - \frac{\mu^{2-\rho}}{2-\rho} \right) + \log a(z, \phi, \rho), \tag{11} \]

where the normalizing function $a(\cdot)$ can be written as

\[ a(z, \phi, \rho) = \begin{cases} \dfrac{1}{z} \displaystyle\sum_{t=1}^{\infty} W_t(z, \phi, \rho) = \dfrac{1}{z} \displaystyle\sum_{t=1}^{\infty} \dfrac{z^{t\alpha}}{(\rho-1)^{t\alpha}\, \phi^{t(1+\alpha)}\, (2-\rho)^{t}\, t!\, \Gamma(t\alpha)} & \text{for } z > 0, \\ 1 & \text{for } z = 0, \end{cases} \]

with $\alpha = (2-\rho)/(\rho-1)$, and $\sum_{t=1}^{\infty} W_t$ is an example of Wright's generalized Bessel function (Tweedie, 1984).

One of the desirable properties of Tweedie models is that they are the only EDMs that are scale invariant (Jørgensen, 1997, Section 4.1.1): if $Z$ is a Tweedie variable with mean $\mu$ and dispersion $\phi$, then $cZ$ follows the same distribution with mean $c\mu$ and dispersion $c^{2-\rho}\phi$. This property makes Tweedie distributions a good choice for modeling data with an arbitrary monetary unit.

4 Our proposal

In this section, we propose to integrate the Tweedie model into the tree-based gradient boosting algorithm to predict insurance claim size. Specifically, our discussion focuses on modeling personal car insurance as an illustrating example (see Section 6 for a real data analysis), since our modeling strategy is easily extended to other lines of non-life insurance business.

Given an auto insurance policy $i$, let $N_i$ be the number of claims (known as the claim frequency) and $Z_{d_i}$ be the size of each claim observed for $d_i = 1, \ldots, N_i$. Let $w_i$ be the policy duration, that is, the length of time that the policy remains in force. Then $Z_i = \sum_{d_i=1}^{N_i} Z_{d_i}$ is the total claim amount. In the following, we are interested in modeling the ratio between the total claim and the duration, $Y_i = Z_i / w_i$, a key quantity known as the pure premium (Ohlsson and Johansson, 2010).

Following the settings of the compound Poisson model, we assume $N_i$ is Poisson distributed, and its mean $\lambda_i w_i$ has a multiplicative relation with the duration $w_i$, where $\lambda_i$ is a policy-specific parameter representing the expected claim frequency under unit duration.
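The pure premium setup can be illustrated by simulation. The sketch below draws $N \sim \mathrm{Pois}(\lambda w)$, sums $N$ gamma-distributed claim sizes as in the compound Poisson model of Section 3, and divides by the duration $w$. Everything here is an assumption made for illustration: the helper names `sample_poisson` and `sample_pure_premium`, the parameter values, and the seed are hypothetical, and the Poisson sampler simply counts unit-rate exponential arrivals.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate by counting unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_pure_premium(rng, lam, alpha, gamma, w):
    """One draw of Y = Z / w, with N ~ Pois(lam * w) and Z a sum of N Gamma(alpha, gamma) claims."""
    n_claims = sample_poisson(rng, lam * w)
    # random.gammavariate takes (shape, scale), so each claim has mean alpha * gamma
    total = sum(rng.gammavariate(alpha, gamma) for _ in range(n_claims))
    return total / w


rng = random.Random(2015)                    # fixed seed, for reproducibility only
lam, alpha, gamma, w = 1.2, 2.0, 0.8, 0.5    # illustrative values for a half-year policy
draws = [sample_pure_premium(rng, lam, alpha, gamma, w) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
zero_fraction = sum(1 for d in draws if d == 0.0) / len(draws)

print("sample mean of Y:", sample_mean, " lam * alpha * gamma:", lam * alpha * gamma)
print("fraction of zeros:", zero_fraction, " exp(-lam * w):", math.exp(-lam * w))
```

Dividing by $w$ removes the duration from the mean, so the sample mean should be close to $\lambda\alpha\gamma$, while the share of policies with no claim still depends on the duration through $\exp(-\lambda w)$.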
Conditional on $N_i$, assume the $Z_{d_i}$'s ($d_i = 1, \ldots, N_i$) are i.i.d. $\mathrm{Gamma}(\alpha, \gamma_i)$, where $\gamma_i$ is a policy-specific parameter that determines claim severity, and $\alpha$ is a constant. Furthermore, we assume that under unit duration (i.e., $w_i = 1$), the mean-variance relation of a policy satisfies

\[ \mathrm{Var}(Y_i^*) = \phi\,[E(Y_i^*)]^{\rho} \tag{12} \]

for all policies, where $Y_i^*$ is the pure premium under unit duration, $\phi$ is a constant, and $\rho = (\alpha + 2)/(\alpha + 1)$. Note that

\[ \mu_i^* := E(Y_i^*) = E(E(Y_i^* \mid N_i)) = \lambda_i \alpha \gamma_i, \]
\[ \mathrm{Var}(Y_i^*) = E(\mathrm{Var}(Y_i^* \mid N_i)) + \mathrm{Var}(E(Y_i^* \mid N_i)) = \lambda_i \alpha \gamma_i^2 + \lambda_i \alpha^2 \gamma_i^2. \]

Similarly, under duration $w_i$,

\[ \mu_i := E(Y_i) = \frac{1}{w_i} E(Z_i) = \lambda_i \alpha \gamma_i, \]
\[ \mathrm{Var}(Y_i) = \frac{1}{w_i^2} \mathrm{Var}(Z_i) = (\lambda_i \alpha \gamma_i^2 + \lambda_i \alpha^2 \gamma_i^2)/w_i. \]

As a result, we obtain the mean-variance relation for the pure premium $Y_i$:

\[ \mathrm{Var}(Y_i) = \frac{1}{w_i} \mathrm{Var}(Y_i^*) = \frac{\phi}{w_i} (\mu_i^*)^{\rho} = \frac{\phi}{w_i} \mu_i^{\rho}, \tag{13} \]

where the second equation follows from (12). Consequently, the scale invariance property of the Tweedie distribution and (13) imply that $Y_i \sim \mathrm{Tw}(\mu_i, \phi/w_i, \rho)$.

Under the aforementioned settings, consider a portfolio of policies $\{(y_i, x_i, w_i)\}_{i=1}^n$ from $n$ independent insurance contracts, where for the $i$th contract, $y_i$ is the policy pure premium, $x_i$ is a vector of explanatory variables that characterize the policyholder and the risk being insured (e.g. house, vehicle), and $w_i$ is the duration. We then assume that the expected
pure premium $\mu_i$ is determined by a predictor function $F: \mathbb{R}^p \to \mathbb{R}$ of $x_i$:

\[ \log\{\mu_i\} = \log\{E(Y_i \mid x_i)\} = F(x_i). \tag{14} \]

In this paper, we do not impose a linear or other parametric form restriction on $F(\cdot)$. Given the flexibility of $F(\cdot)$, we call this setting the boosted Tweedie model (as opposed to the Tweedie GLM). Given $\{(y_i, x_i, w_i)\}_{i=1}^n$, the log-likelihood function can be written as

\[ \ell(F(\cdot), \phi, \rho \mid \{y_i, x_i, w_i\}_{i=1}^n) = \sum_{i=1}^{n} \log f_Y(y_i \mid \mu_i, \phi/w_i, \rho) = \sum_{i=1}^{n} \left[ \frac{w_i}{\phi} \left( y_i \frac{\mu_i^{1-\rho}}{1-\rho} - \frac{\mu_i^{2-\rho}}{2-\rho} \right) + \log a(y_i, \phi/w_i, \rho) \right]. \tag{15} \]

4.1 Estimating $F(\cdot)$ via TDboost

We estimate the predictor function $F(\cdot)$ by integrating the boosted Tweedie model into the tree-based gradient boosting algorithm. To develop the idea, we assume that $\phi$ and $\rho$ are given for the time being. The joint estimation of $F(\cdot)$, $\phi$ and $\rho$ will be studied in Section 4.2.

Given $\rho$ and $\phi$, we replace the general objective function in (1) by the negative log-likelihood derived in (15), and target the minimizer function $F^*(\cdot)$ over a class $\mathcal{F}$ of base learner functions in the form of (2). That is, we intend to estimate

\[ F^*(x) = \arg\min_{F \in \mathcal{F}} \left\{ -\ell(F(\cdot), \phi, \rho \mid \{y_i, x_i, w_i\}_{i=1}^n) \right\} = \arg\min_{F \in \mathcal{F}} \sum_{i=1}^{n} \Psi(y_i, F(x_i) \mid \rho), \tag{16} \]

where

\[ \Psi(y_i, F(x_i) \mid \rho) = w_i \left\{ -\frac{y_i \exp[(1-\rho)F(x_i)]}{1-\rho} + \frac{\exp[(2-\rho)F(x_i)]}{2-\rho} \right\}. \]

Note that in contrast to (16), the function class targeted by the Tweedie GLM (Smyth, 1996) is restricted to a collection of linear functions of $x$.

We propose to apply the forward stagewise algorithm described in Section 2 for solving (16). The initial estimate of $F^*(\cdot)$ is chosen as a constant function that minimizes the
negative log-likelihood:

\[ \hat F^{[0]} = \arg\min_{\eta} \sum_{i=1}^{n} \Psi(y_i, \eta \mid \rho) = \log\left( \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right). \]

This corresponds to the best estimate of $F$ without any covariates. Let $\hat F^{[m-1]}$ be the current estimate before the $m$th iteration. At the $m$th step, we fit a base learner $h(x; \xi^{[m]})$ via

\[ \hat\xi^{[m]} = \arg\min_{\xi^{[m]}} \sum_{i=1}^{n} \left[ u_i^{[m]} - h(x_i; \xi^{[m]}) \right]^2, \tag{17} \]

where $(u_1^{[m]}, \ldots, u_n^{[m]})$ is the current negative gradient of $\Psi(\cdot \mid \rho)$, i.e.,

\[ u_i^{[m]} = -\left. \frac{\partial \Psi(y_i, F(x_i) \mid \rho)}{\partial F(x_i)} \right|_{F(x_i) = \hat F^{[m-1]}(x_i)} \tag{18} \]
\[ \phantom{u_i^{[m]}} = w_i \left\{ y_i \exp[(1-\rho)\hat F^{[m-1]}(x_i)] - \exp[(2-\rho)\hat F^{[m-1]}(x_i)] \right\}, \tag{19} \]

and use an $L$-terminal node regression tree

\[ h(x; \xi^{[m]}) = \sum_{l=1}^{L} u_l^{[m]} I(x \in R_l^{[m]}) \tag{20} \]

with parameters $\xi^{[m]} = \{R_l^{[m]}, u_l^{[m]}\}_{l=1}^{L}$ as the base learner. To find $R_l^{[m]}$ and $u_l^{[m]}$, we use a fast top-down "best-fit" algorithm with a least squares splitting criterion (Friedman et al., 2000) to find the splitting variables and corresponding split locations that determine the fitted terminal regions $\{\hat R_l^{[m]}\}_{l=1}^{L}$. Note that estimating the $R_l^{[m]}$ entails estimating the $u_l^{[m]}$ as the mean falling in each region:

\[ \bar u_l^{[m]} = \mathrm{mean}_{i:\, x_i \in \hat R_l^{[m]}}(u_i^{[m]}), \qquad l = 1, \ldots, L. \]

Once the base learner $h(x; \xi^{[m]})$ has been estimated, the optimal value of the expansion
coefficient $\beta^{[m]}$ is determined by a line search

\[ \beta^{[m]} = \arg\min_{\beta} \sum_{i=1}^{n} \Psi\big(y_i, \hat F^{[m-1]}(x_i) + \beta h(x_i; \hat\xi^{[m]}) \mid \rho\big) \tag{21} \]
\[ \phantom{\beta^{[m]}} = \arg\min_{\beta} \sum_{i=1}^{n} \Psi\Big(y_i, \hat F^{[m-1]}(x_i) + \beta \sum_{l=1}^{L} \bar u_l^{[m]} I(x_i \in \hat R_l^{[m]}) \,\Big|\, \rho\Big). \]

The regression tree (20) predicts a constant value $\bar u_l^{[m]}$ within each region $\hat R_l^{[m]}$, so we can solve (21) by a separate line search performed within each respective region $\hat R_l^{[m]}$. The problem (21) reduces to finding a best constant $\eta_l^{[m]}$ to improve the current estimate in each region $\hat R_l^{[m]}$ based on the following criterion:

\[ \hat\eta_l^{[m]} = \arg\min_{\eta} \sum_{i:\, x_i \in \hat R_l^{[m]}} \Psi\big(y_i, \hat F^{[m-1]}(x_i) + \eta \mid \rho\big), \qquad l = 1, \ldots, L, \tag{22} \]

where the solution is given by

\[ \hat\eta_l^{[m]} = \log\left\{ \frac{\sum_{i:\, x_i \in \hat R_l^{[m]}} w_i y_i \exp[(1-\rho)\hat F^{[m-1]}(x_i)]}{\sum_{i:\, x_i \in \hat R_l^{[m]}} w_i \exp[(2-\rho)\hat F^{[m-1]}(x_i)]} \right\}, \qquad l = 1, \ldots, L. \tag{23} \]

Having found the parameters $\{\hat\eta_l^{[m]}\}_{l=1}^{L}$, we then update the current estimate $\hat F^{[m-1]}(x)$ in each corresponding region:

\[ \hat F^{[m]}(x) = \hat F^{[m-1]}(x) + \nu\, \hat\eta_l^{[m]} I(x \in \hat R_l^{[m]}), \qquad l = 1, \ldots, L, \tag{24} \]

where $0 < \nu \le 1$ is the shrinkage factor. Following Friedman (2001), we use a small fixed value of $\nu$ in our implementation. More discussion on the choice of tuning parameters is in Section 4.4.

In summary, the complete TDboost algorithm is shown in Algorithm 1. The boosting step is repeated $M$ times and we report $\hat F^{[M]}(x)$ as the final estimate.

4.2 Estimating $(\rho, \phi)$ via profile likelihood

Following Smyth (1996) and Dunn and Smyth (2005), we use the profile likelihood to estimate the dispersion $\phi$ and the index parameter $\rho$, which jointly determine the mean-variance
Algorithm 1 TDboost

1. Initialize $\hat F^{[0]}$:
   \[ \hat F^{[0]} = \log\left( \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right). \]
2. For $m = 1, \ldots, M$ repeatedly do steps 2.(a)-2.(d):
   2.(a) Compute the negative gradient $\{u_i^{[m]}\}_{i=1}^{n}$:
   \[ u_i^{[m]} = w_i \left\{ y_i \exp[(1-\rho)\hat F^{[m-1]}(x_i)] - \exp[(2-\rho)\hat F^{[m-1]}(x_i)] \right\}, \qquad i = 1, \ldots, n. \]
   2.(b) Fit the negative gradient vector $\{u_i^{[m]}\}_{i=1}^{n}$ to $x_1, \ldots, x_n$ by an $L$-terminal node regression tree, giving us the partitions $\{\hat R_l^{[m]}\}_{l=1}^{L}$.
   2.(c) Compute the optimal terminal node predictions $\hat\eta_l^{[m]}$ for each region $\hat R_l^{[m]}$, $l = 1, 2, \ldots, L$:
   \[ \hat\eta_l^{[m]} = \log\left\{ \frac{\sum_{i:\, x_i \in \hat R_l^{[m]}} w_i y_i \exp[(1-\rho)\hat F^{[m-1]}(x_i)]}{\sum_{i:\, x_i \in \hat R_l^{[m]}} w_i \exp[(2-\rho)\hat F^{[m-1]}(x_i)]} \right\}. \]
   2.(d) Update $\hat F^{[m]}(x)$ for each region $\hat R_l^{[m]}$, $l = 1, 2, \ldots, L$:
   \[ \hat F^{[m]}(x) = \hat F^{[m-1]}(x) + \nu\, \hat\eta_l^{[m]} I(x \in \hat R_l^{[m]}), \qquad l = 1, 2, \ldots, L. \]
3. Report $\hat F^{[M]}(x)$ as the final estimate.
relation $\mathrm{Var}(Y_i) = \phi \mu_i^{\rho} / w_i$ of the pure premium. We exploit the fact that in Tweedie models the estimation of $\mu$ depends only on $\rho$: given a fixed $\rho$, the mean estimate $\mu^*(\rho)$ can be solved in (16) without knowing $\phi$. Then, conditional on this $\rho$ and the corresponding $\mu^*(\rho)$, we maximize the log-likelihood function with respect to $\phi$ by

\[ \phi^*(\rho) = \arg\max_{\phi} \left\{ \ell(\mu^*(\rho), \phi, \rho) \right\}, \tag{25} \]

which is a univariate optimization problem that can be solved using a combination of golden section search and successive parabolic interpolation (Brent, 2013). In this way, we determine the corresponding $(\mu^*(\rho), \phi^*(\rho))$ for each fixed $\rho$. Then we acquire the estimate of $\rho$ by maximizing the profile likelihood with respect to 50 equally spaced values $\{\rho_1, \ldots, \rho_{50}\}$ on $(1, 2)$:

\[ \rho^* = \arg\max_{\rho \in \{\rho_1, \ldots, \rho_{50}\}} \left\{ \ell(\mu^*(\rho), \phi^*(\rho), \rho) \right\}. \tag{26} \]

Finally, we apply $\rho^*$ in (16) and (25) to obtain the corresponding estimates $\mu^*(\rho^*)$ and $\phi^*(\rho^*)$.

Some computational issues must be taken care of when evaluating the log-likelihood functions in (25) and (26): since in general there are no closed forms for Tweedie densities, in likelihood evaluation one must deal with an infinite summation in the normalizing function $a(y, \phi, \rho) = \frac{1}{y} \sum_{t=1}^{\infty} W_t$. For numerical evaluation of Tweedie densities, Dunn and Smyth (2005) proposed a series expansion approach, which sums an infinite series arising from a Taylor expansion of the characteristic function. Alternatively, Dunn and Smyth (2008) developed a Fourier inversion approach, which consists of an inversion of the characteristic function based on numerical integration methods for oscillating functions. These two numerical methods turn out to be complementary since each has advantages in a certain situation: when only considering the case $1 < \rho < 2$, the series approach performs very well for small $y$ but gradually loses computational efficiency as $y$ increases, whereas the inversion approach performs very well for large $y$ but gradually fails to provide accurate results as $y$ decreases.
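The infinite sum in $a(y, \phi, \rho)$ can be illustrated with a naive truncated series evaluated in log space. The sketch below is not the algorithm of Dunn and Smyth and not the code of any package; it assumes a fixed truncation point (`max_terms`, a hypothetical parameter) that is adequate only for moderate $y/\phi$, and the function name `tweedie_logpdf` is likewise hypothetical. It evaluates the log-likelihood (11) for $1 < \rho < 2$.

```python
import math


def tweedie_logpdf(y, mu, phi, rho, max_terms=500):
    """Log-density of Tw(mu, phi, rho), 1 < rho < 2, using a truncated series for a(y, phi, rho).

    For y = 0 the returned value is the log of the point mass at zero.
    """
    kernel = (y * mu ** (1.0 - rho) / (1.0 - rho) - mu ** (2.0 - rho) / (2.0 - rho)) / phi
    if y == 0:
        return kernel  # a(0, phi, rho) = 1, so log a = 0
    alpha = (2.0 - rho) / (rho - 1.0)
    # log W_t for t = 1..max_terms, kept in log space to avoid overflow
    log_w = [
        t * alpha * (math.log(y) - math.log(rho - 1.0))
        - t * (1.0 + alpha) * math.log(phi)
        - t * math.log(2.0 - rho)
        - math.lgamma(t + 1.0)
        - math.lgamma(t * alpha)
        for t in range(1, max_terms + 1)
    ]
    top = max(log_w)
    log_sum = top + math.log(sum(math.exp(v - top) for v in log_w))
    return kernel + log_sum - math.log(y)


mu, phi, rho = 2.0, 0.5, 1.5
p_zero = math.exp(tweedie_logpdf(0.0, mu, phi, rho))

# Midpoint-rule integral of the continuous part over (0, 20); the tail beyond 20 is negligible here.
h = 0.01
mass = sum(
    math.exp(tweedie_logpdf((k + 0.5) * h, mu, phi, rho, max_terms=200)) * h
    for k in range(2000)
)
print("point mass at zero:", p_zero)
print("total probability:", p_zero + mass)
```

Two sanity checks are available without any reference implementation: the point mass and the integrated continuous part should sum to approximately one, and the scale invariance stated in Section 3 implies that the density of $cZ$ at $cz$, evaluated with mean $c\mu$ and dispersion $c^{2-\rho}\phi$, equals the density of $Z$ at $z$ divided by $c$.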
Hence the inversion approach is preferred for large $y$ and the series approach for small $y$. Dunn and Smyth (2008) provided a simple guideline to choose between the two methods. In this paper we use their R package tweedie (available at rproject.org/web/packages/tweedie/index.htm) for evaluating Tweedie densities in our profile likelihood computation. For further details regarding their algorithms, the reader
may refer to Dunn and Smyth (2005, 2008).

4.3 Model interpretation

Compared to other nonparametric statistical learning methods such as neural networks and kernel machines, our new estimator provides interpretable results. In this section, we discuss some ways for model interpretation after fitting the boosted Tweedie model.

4.3.1 Marginal effects of predictors

The main effects and interaction effects of the variables in the boosted Tweedie model can be extracted easily. In our estimate we can control the order of interactions by choosing the tree size $L$ (the number of terminal nodes) and the number $p$ of predictors. A tree with $L$ terminal nodes produces a function approximation of $p$ predictors with interaction order of at most $\min(L - 1, p)$. For example, a stump ($L = 2$) produces an additive TDboost model with only the main effects of the predictors, since it is a function based on a single splitting variable in each tree. Setting $L = 3$ allows both main effects and second order interactions.

Following Friedman (2001) we use the so-called partial dependence plots to visualize the main effects and interaction effects. Given the training data $\{y_i, x_i\}_{i=1}^{n}$, with a $p$-dimensional input vector $x = [x_1, x_2, \ldots, x_p]$, let $z_s$ be a subset of size $s$, such that $z_s = \{z_1, \ldots, z_s\} \subset \{x_1, \ldots, x_p\}$. For example, to study the main effect of the variable $j$, we set the subset $z_s = \{z_j\}$, and to study the second order interaction of variables $i$ and $j$, we set $z_s = \{z_i, z_j\}$. Let $z_{\backslash s}$ be the complement set of $z_s$, such that $z_{\backslash s} \cup z_s = \{x_1, \ldots, x_p\}$. Let the prediction $\hat F(z_s \mid z_{\backslash s})$ be a function of the subset $z_s$ conditioned on specific values of $z_{\backslash s}$. The partial dependence of $\hat F(x)$ on $z_s$ can then be formulated as $\hat F(z_s \mid z_{\backslash s})$ averaged over the marginal density of the complement subset $z_{\backslash s}$:

\[ \hat F_s(z_s) = \int \hat F(z_s \mid z_{\backslash s})\, p_{\backslash s}(z_{\backslash s})\, dz_{\backslash s}, \tag{27} \]

where $p_{\backslash s}(z_{\backslash s}) = \int p(x)\, dz_s$ is the marginal density of $z_{\backslash s}$. We estimate (27) by

\[ \bar F_s(z_s) = \frac{1}{n} \sum_{i=1}^{n} \hat F(z_s \mid z_{\backslash s, i}), \tag{28} \]
where $\{z_{\backslash s, i}\}_{i=1}^{n}$ are evaluated at the training data. We then plot $\bar F_s(z_s)$ against $z_s$. We have included the partial dependence plot function in our R package TDboost. We will demonstrate this functionality in the real data analysis.

4.3.2 Variable importance

In many applications it is of interest to identify the relevant predictors of the model in the context of tree-based ensemble methods. The TDboost model defines a variable importance measure for each candidate predictor $X_j$ in the set $X = \{X_1, \ldots, X_p\}$ in terms of prediction/explanation of the response $Y$. The major advantage of this variable selection procedure, as compared to univariate screening methods, is that the approach considers the impact of each individual predictor as well as multivariate interactions among predictors simultaneously.

We start by defining the variable importance (VI henceforth) measure in the context of a single tree. First introduced by Breiman et al. (1984), the VI measure $I_{X_j}(T_m)$ of the variable $X_j$ in a single tree $T_m$ is defined as the total heterogeneity reduction of the response variable $Y$ produced by $X_j$, which can be estimated by adding up all the squared error reductions $\hat\delta_l$ obtained in all $L - 1$ internal nodes when $X_j$ is chosen as the splitting variable. Denote by $v(X_j) = l$ the event that $X_j$ is selected as the splitting variable in the internal node $l$, and let $I_{jl} = I(v(X_j) = l)$. Then

\[ I_{X_j}(T_m) = \sum_{l=1}^{L-1} \hat\delta_l I_{jl}, \tag{29} \]

where $\hat\delta_l$ is defined as the squared error difference between the constant fit and the two sub-region fits (the sub-region fits are achieved by splitting the region associated with the internal node $l$ into the left and right regions). Friedman (2001) extended the VI measure $I_{X_j}$ to the boosting model with a combination of $M$ regression trees, by averaging (29) over $\{T_1, \ldots, T_M\}$:

\[ I_{X_j} = \frac{1}{M} \sum_{m=1}^{M} I_{X_j}(T_m). \tag{30} \]

Despite the wide use of the VI measure, Breiman et al. (1984), White and Liu (1994) and Kononenko (1995) among others have pointed out that the VI measures (29) and (30)
are biased: even if $X_j$ is a non-informative variable to $Y$ (not correlated to $Y$), $X_j$ may still be selected as a splitting variable, hence the VI measure of $X_j$ is non-zero by Equation (30). Following Sandri and Zuccolotto (2008) and Sandri and Zuccolotto (2010), to avoid the variable selection bias, we compute an adjusted VI measure for each explanatory variable by permuting each $X_j$:

(1) For $s = 1, \ldots, S$, repeat steps (2)-(4).
(2) Generate a matrix $z^s$ by randomly permuting (without replacement) the $n$ rows of the design matrix $x$, while keeping the order of columns unchanged.
(3) Create an $n \times 2p$ matrix $x^s = [x, z^s]$ by binding $z^s$ with the $x$ matrix by column.
(4) Use the data $\{y, x^s\}$ to fit the model, and compute VI measures $I^s_{X_j}$ for $X_j$ and $I^s_{Z^s_j}$ for $Z^s_j$, where $Z^s_j$ (the $j$th column of $z^s$) is the pseudo-predictor corresponding to $X_j$.
(5) Compute the VI measure $I_{X_j}$ as the average of $I^s_{X_j}$ and the baseline $I_{Z_j}$ as the average of $I^s_{Z^s_j}$:
\[ I_{X_j} = \frac{1}{S} \sum_{s=1}^{S} I^s_{X_j}, \qquad I_{Z_j} = \frac{1}{S} \sum_{s=1}^{S} I^s_{Z^s_j}. \tag{31} \]
(6) Report the adjusted VI measure as $I^{\mathrm{adj}}_{X_j} = I_{X_j} - I_{Z_j}$ for the variable $X_j$.

The basic idea of the above algorithm is the following: the permutation breaks the association between the response variable $Y$ and each pseudo-predictor $Z^s_j$, but still preserves the association between $Z^s_j$ and $Z^s_k$ ($k \ne j$); since $Z^s_j$ is reshuffled from $X_j$, $Z^s_j$ has the same number of possible splits as the corresponding predictor $X_j$ and has approximately the same probability of being selected in split nodes. Therefore, $I_{Z_j}$ can be viewed as a bias approximation of the importance of $X_j$.

4.4 Implementation

We have implemented our proposed method in an R package TDboost, which is publicly available from the Comprehensive R Archive Network at web/packages/tdboost/index.htm. Here, we discuss the choice of three meta parameters
in Algorithm 1: $L$ (the size of the trees), $\nu$ (the shrinkage factor) and $M$ (the number of boosting steps).

To avoid overfitting and improve out-of-sample predictions, the boosting procedure can be regularized by limiting the number of boosting iterations $M$ (early stopping; Zhang and Yu, 2005) and the shrinkage factor $\nu$. Empirical evidence (Friedman, 2001; Bühlmann and Hothorn, 2007; Ridgeway, 2007; Elith et al., 2008) showed that the predictive accuracy is almost always better with a smaller shrinkage factor, at the cost of more computing time. However, smaller values of $\nu$ usually require a larger number of boosting iterations $M$ and hence induce more computing time (Friedman, 2001). We choose a sufficiently small $\nu$ throughout and determine $M$ by the data.

The value $L$ should reflect the true interaction order in the underlying model, but we almost never have such prior knowledge. Therefore we choose the optimal $M$ and $L$ using $K$-fold cross validation, starting with a fixed value of $L$. The data are split into $K$ roughly equal-sized folds. Let an index function $\pi(i): \{1, \ldots, n\} \to \{1, \ldots, K\}$ indicate the fold to which observation $i$ is allocated. Each time, we remove the $k$th fold of the data ($k = 1, 2, \ldots, K$), and train the model using the remaining $K - 1$ folds. Denoting by $\hat F^{[M]}_{-k}(x)$ the resulting model, we compute the validation loss by predicting on each $k$th fold of the data removed:

\[ \mathrm{CV}(M, L) = \frac{1}{n} \sum_{i=1}^{n} \Psi\big(y_i, \hat F^{[M]}_{-\pi(i)}(x_i; L) \mid \rho\big). \tag{32} \]

We select the optimal $M$ at which the minimum validation loss is reached:

\[ \hat M_L = \arg\min_{M} \mathrm{CV}(M, L). \]

If we need to select $L$ too, then we repeat the whole process for several $L$ (e.g. $L = 2, 3, 4, 5$) and choose the one with the smallest minimum generalization error:

\[ \hat L = \arg\min_{L} \mathrm{CV}(\hat M_L, L). \]

For a given $\nu$, fitting trees with higher $L$ leads to a smaller $M$ being required to reach the minimum error.
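The steps of Algorithm 1 and the cross-validated choice of $M$ in (32) can be sketched together in plain Python. This is an illustration under several assumptions, not the implementation in the TDboost R package: the base learner is a stump ($L = 2$) on a single predictor, the fold index is taken as $\pi(i) = i \bmod K$, a region with no positive response is left unchanged (a guard introduced here because the closed-form solution (23) is undefined in that case), and $\nu = 0.1$ is chosen only to keep the example short. All function names are hypothetical.

```python
import math


def tweedie_loss(y, f, w, rho):
    """Psi(y, F | rho): weighted negative Tweedie log-likelihood, dropping terms free of F."""
    return w * (-y * math.exp((1 - rho) * f) / (1 - rho) + math.exp((2 - rho) * f) / (2 - rho))


def best_split(x, u):
    """Least-squares split point of a stump fitted to the working response u."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue
        left, right = [u[i] for i in order[:k]], [u[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2)
    return best[1]


def node_value(idx, y, w, f, rho):
    """Closed-form line search (23) within one terminal region."""
    num = sum(w[i] * y[i] * math.exp((1 - rho) * f[i]) for i in idx)
    den = sum(w[i] * math.exp((2 - rho) * f[i]) for i in idx)
    # Assumption of this sketch: leave a region unchanged if it has no positive response.
    return math.log(num / den) if num > 0 else 0.0


def tdboost_stumps(x, y, w, rho, n_steps, nu):
    """Algorithm 1 with L = 2 on one predictor; returns (F0, list of shrunken stumps)."""
    f0 = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))           # step 1
    f, stumps, n = [f0] * len(y), [], len(y)
    for _ in range(n_steps):
        u = [w[i] * (y[i] * math.exp((1 - rho) * f[i]) - math.exp((2 - rho) * f[i]))
             for i in range(n)]                                             # step 2.(a)
        s = best_split(x, u)                                                # step 2.(b)
        left = [i for i in range(n) if x[i] <= s]
        right = [i for i in range(n) if x[i] > s]
        eta_l = node_value(left, y, w, f, rho)                              # step 2.(c)
        eta_r = node_value(right, y, w, f, rho)
        for i in left:                                                      # step 2.(d)
            f[i] += nu * eta_l
        for i in right:
            f[i] += nu * eta_r
        stumps.append((s, nu * eta_l, nu * eta_r))
    return f0, stumps


def predict(model, x_new, n_steps=None):
    """F-hat after the first n_steps boosting steps (all steps if n_steps is None)."""
    f0, stumps = model
    return f0 + sum(a if x_new <= s else b for s, a, b in stumps[:n_steps])


def cv_loss(x, y, w, rho, max_steps, nu, n_folds):
    """CV(M, L = 2) of (32) for M = 0..max_steps, with fold index pi(i) = i mod n_folds."""
    n, totals = len(y), [0.0] * (max_steps + 1)
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        held = [i for i in range(n) if i % n_folds == k]
        model = tdboost_stumps([x[i] for i in train], [y[i] for i in train],
                               [w[i] for i in train], rho, max_steps, nu)
        for m in range(max_steps + 1):
            totals[m] += sum(tweedie_loss(y[i], predict(model, x[i], m), w[i], rho)
                             for i in held)
    return [t / n for t in totals]


# Deterministic toy data: a step in the mean at x = 0.5, one third of the responses zero.
n = 40
x = [(i + 0.5) / n for i in range(n)]
y = [0.0 if i % 3 == 0 else (1.0 if x[i] <= 0.5 else 3.0) for i in range(n)]
w = [1.0 if i % 2 == 0 else 0.5 for i in range(n)]
rho, nu = 1.5, 0.1

cv = cv_loss(x, y, w, rho, max_steps=60, nu=nu, n_folds=5)
m_best = min(range(len(cv)), key=cv.__getitem__)
model = tdboost_stumps(x, y, w, rho, m_best, nu)
print(m_best, math.exp(predict(model, 0.1)), math.exp(predict(model, 0.9)))
```

Because boosting is sequential, one fit with `max_steps` iterations per fold yields the whole path $\hat F^{[0]}, \ldots, \hat F^{[M]}$, so the validation loss for every $M$ is obtained from a single pass over the folds. Selecting $L$ as well would repeat this loop with a tree learner of the corresponding size, which the stump-only sketch does not provide.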
5 Simulation Studies

In this section, we compare TDboost with the Tweedie GLM model (TGLM: Jørgensen and de Souza, 1994) and the Tweedie GAM model in terms of function estimation performance. The Tweedie GAM model was proposed by Wood (2001) and is based on a penalized regression spline approach with automatic smoothness selection. There is an R package MGCV accompanying the work, available at packages/mgcv/index.htm. In all numerical examples below using the TDboost model, five-fold cross validation is adopted for selecting the optimal $(M, L)$ pair, while the shrinkage factor $\nu$ is set to its default value.

5.1 Case I

In this simulation study, we demonstrate that TDboost is well suited to fit target functions that are nonlinear or involve complex interactions. We consider two true target functions:

Model 1 (Discontinuous function): The target function is discontinuous, defined by $F(x) = 0.5\, I(x > 0.5)$. We assume $x \sim \mathrm{Unif}(0, 1)$ and $y \sim \mathrm{Tw}(\mu, \phi, \rho)$ with $\rho = 1.5$ and $\phi = 0.5$.

Model 2 (Complex interaction): The target function has two hills and two valleys,

$$F(x_1, x_2) = e^{-5(1 - x_1)^2 + x_2^2} + e^{-5 x_1^2 + (1 - x_2)^2},$$

which corresponds to a common scenario where the effect of one variable changes depending on the effect of another. We assume $x_1, x_2 \sim \mathrm{Unif}(0, 1)$ and $y \sim \mathrm{Tw}(\mu, \phi, \rho)$ with $\rho = 1.5$ and $\phi = 0.5$.

We generate $n = 1000$ observations for training and $n = 1000$ for testing, and fit the training data using TDboost, MGCV, and TGLM. Since the true target functions are known, we use the mean absolute deviation (MAD) as the performance criterion,

$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \big| F(x_i) - \hat{F}(x_i) \big|,$$
where both the true predictor function $F(x_i)$ and the predicted function $\hat{F}(x_i)$ are evaluated on the test set. The resulting MADs on the testing data, averaged over 100 independent replications, are reported in Table 1. The fitted functions from Model 2 are plotted in Figure 2. In both cases, we find that TDboost outperforms TGLM and MGCV in its ability to recover the true functions and gives the smallest prediction errors.

Table 1: The averaged MADs and the corresponding standard errors (in parentheses) based on 100 independent replications.

Model    TGLM        MGCV        TDboost
1        (0.0006)    (0.0016)    (0.0021)
2        (0.0009)    (0.0004)    (0.0008)

5.2 Case II

The idea is to see the performance of the TDboost estimator and the MGCV estimator on a variety of very complicated, randomly generated predictor functions, and to study how the size of the training set, the distribution settings and other characteristics of the problem affect the final performance of the two methods. We use the random function generator (RFG) model of Friedman (2001) in our simulation. The true target function $F$ is randomly generated as a linear expansion of functions $\{g_k\}_{k=1}^{20}$:

$$F(x) = \sum_{k=1}^{20} b_k\, g_k(z_k). \tag{33}$$

Here each coefficient $b_k$ is a uniform random variable from $\mathrm{Unif}[-1, 1]$. Each $g_k(z_k)$ is a function of $z_k$, where $z_k$ is defined as a $p_k$-sized subset of the ten-dimensional variable $x$ in the form

$$z_k = \{x_{\psi_k(j)}\}_{j=1}^{p_k}, \tag{34}$$

where each $\psi_k$ is an independent permutation of the integers $\{1, \ldots, p\}$. The size $p_k$ is randomly selected by $\min(r_k, p)$, where $r_k$ is generated from an exponential distribution with mean 2. Hence the expected order of interactions presented in each $g_k(z_k)$ is between
Figure 2: Fitted curves that recover the target function defined in Model 2. Panels: (a) true $F(x_1, x_2)$; (b) TDboost $\hat{F}(x_1, x_2)$; (c) TGLM $\hat{F}(x_1, x_2)$; (d) MGCV $\hat{F}(x_1, x_2)$. The top left figure shows the true target function. The top right, bottom left, and bottom right figures show the predictions on the testing data from TDboost, TGLM, and MGCV, respectively.
four and five. Each function $g_k(z_k)$ is a $p_k$-dimensional Gaussian function:

$$g_k(z_k) = \exp\Big\{ -\tfrac{1}{2} (z_k - u_k)^T V_k (z_k - u_k) \Big\}, \tag{35}$$

where each mean vector $u_k$ is randomly generated from $N(0, I_{p_k})$. The $p_k \times p_k$ covariance matrix $V_k$ is defined by

$$V_k = U_k D_k U_k^T, \tag{36}$$

where $U_k$ is a random orthonormal matrix, $D_k = \mathrm{diag}\{d_k[1], \ldots, d_k[p_k]\}$, and the square root of each diagonal element $d_k[j]$ is a uniform random variable from $\mathrm{Unif}[0.1, 2.0]$. We generate data $\{y_i, x_i\}_{i=1}^{n}$ according to

$$y_i \sim \mathrm{Tw}(\mu_i, \phi, \rho), \qquad x_i \sim N(0, I_p), \qquad i = 1, \ldots, n, \tag{37}$$

where $\mu_i = \exp\{F(x_i)\}$.

Setting I: when the index is known

First, we study the situation in which the true index parameter $\rho$ is known when fitting models. We generate data according to the RFG model with index parameter $\rho = 1.5$ and dispersion parameter $\phi = 1$ in the true model. We set the number of predictors to $p = 10$ and generate $n \in \{1000, 2000, 5000\}$ observations as training sets, on which both MGCV and TDboost are fitted with $\rho$ specified to be the true value 1.5. An additional test set of $n = 5000$ observations is generated for evaluating the performance of the final estimate. Figure 3 shows simulation results comparing the estimation performance of MGCV and TDboost when varying the training sample size. The empirical distributions of the MADs, shown as boxplots, are based on 100 independent replications. We can see that in all of the cases, TDboost outperforms MGCV in terms of prediction accuracy.

We also test estimation performance on $\mu$ when the index parameter $\rho$ is misspecified, that is, when we use a guess value $\tilde{\rho}$ differing from the true value $\rho$ in fitting the TDboost model. Because $\mu$ is statistically orthogonal to $\phi$ and $\rho$, meaning that the off-diagonal elements of the Fisher information matrix are zero (Jørgensen, 1997), we expect $\hat{\mu}$ to vary very slowly as $\tilde{\rho}$ changes. Indeed, using the previous simulation data with the true value $\rho = 1.5$ and
$\phi = 1$, we fitted TDboost models with nine guess values $\tilde{\rho} \in \{1.1, 1.2, \ldots, 1.9\}$. The resulting MADs are displayed in Figure 4, which shows that the choice of $\tilde{\rho}$ has almost no effect on the estimation accuracy of $\mu$.

Figure 3: Simulation results for Setting I: comparison of the estimation performance of MGCV and TDboost when varying the training sample size and the dispersion parameter in the true model. Boxplots display empirical distributions of the MADs based on 100 independent replications. (Panels: N = 1000, N = 2000, N = 5000; vertical axis: mean absolute deviation (MAD); horizontal axis: method, MGCV or TDboost.)

Setting II: using the estimated index

Next we study the situation in which the true index parameter $\rho$ is unknown, and we use the estimated $\rho$ obtained from the profile likelihood procedure discussed in Section 4.2 for fitting the model. The same data generation scheme is adopted as in Setting I, except that now both MGCV and TDboost are fitted with $\rho$ estimated by maximizing the profile likelihood. Figure 5 shows simulation results comparing the estimation performance of MGCV and TDboost in this setting. The results show no significant difference from those of Setting I: TDboost still outperforms MGCV in terms of prediction accuracy when using the estimated $\rho$ instead of the true value.

Lastly, we demonstrate our results from the estimation of the dispersion $\phi$ and the index $\rho$ by using the profile likelihood. A total of 200 sets of training samples are randomly
Figure 4: Simulation results for Setting I when the index is misspecified: the estimation performance of TDboost when varying the value of the index parameter $\tilde{\rho} \in \{1.1, 1.2, \ldots, 1.9\}$. In the true model $\rho = 1.5$ and $\phi = 1$. Boxplots show empirical distributions of the MADs based on 200 independent replications. (Vertical axis: mean absolute deviation (MAD); horizontal axis: $\tilde{\rho}$.)

Figure 5: Simulation results for Setting II: comparison of the estimation performance of MGCV and TDboost when varying the training sample size and the dispersion parameter in the true model. Boxplots display empirical distributions of the MADs based on 100 independent replications. (Panels: N = 1000, N = 2000, N = 5000; vertical axis: mean absolute deviation (MAD); horizontal axis: method, MGCV or TDboost.)
generated from a true model according to the setting (37) with $\phi = 2$ and $\rho = 1.7$, each sample having 2000 observations. We fit the TDboost model on each sample and compute the estimates $\phi^{*}$ at each of the 50 equally spaced values $\{\rho_1, \ldots, \rho_{50}\}$ on $(1, 2)$. The pair $(\rho_j, \phi^{*}(\rho_j))$ corresponding to the maximal profile likelihood is the estimate of $(\rho, \phi)$. The estimation process is repeated 200 times. The estimated indices have mean $\rho^{*} = 1.68$ and standard error $\mathrm{SE}(\rho^{*}) = 0.026$, so the true value $\rho = 1.7$ is within $\rho^{*} \pm \mathrm{SE}(\rho^{*})$. The estimated dispersions have mean $\phi^{*} = 1.82$. Figure 6 shows the profile likelihood function of $\rho$ for a single run.

Figure 6: The curve represents the profile likelihood function of $\rho$ from a single run. The dotted line shows the true value $\rho = 1.7$. The solid line shows the estimated value $\rho^{*} = 1.68$ corresponding to the maximum likelihood. The associated estimated dispersion is $\phi^{*} = 1.89$. (Vertical axis: profile log-likelihood; horizontal axis: $\rho$.)
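To make the data-generating step $y_i \sim \mathrm{Tw}(\mu_i, \phi, \rho)$ concrete, the following Python sketch draws Tweedie responses for $1 < \rho < 2$ through the compound Poisson representation: a Poisson number of claims with rate $\lambda = \mu^{2-\rho} / (\phi(2-\rho))$, each claim gamma distributed with shape $(2-\rho)/(\rho-1)$ and scale $\phi(\rho-1)\mu^{\rho-1}$. The function names `rpoisson` and `rtweedie` are illustrative choices, not part of the TDboost package, and the simple Poisson sampler is only suitable for small rates.

```python
import math
import random


def rpoisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def rtweedie(mu, phi, rho, rng):
    """One draw from Tw(mu, phi, rho), 1 < rho < 2, as a Poisson sum of gammas."""
    lam = mu ** (2.0 - rho) / (phi * (2.0 - rho))   # Poisson rate
    shape = (2.0 - rho) / (rho - 1.0)               # gamma shape per claim
    scale = phi * (rho - 1.0) * mu ** (rho - 1.0)   # gamma scale per claim
    n_claims = rpoisson(lam, rng)
    if n_claims == 0:
        return 0.0                                  # exact zero: no claims
    # A sum of n i.i.d. Gamma(shape, scale) draws is Gamma(n * shape, scale).
    return rng.gammavariate(n_claims * shape, scale)


# Same dispersion and index as the profile likelihood experiment,
# with a constant mean mu = 1 chosen purely for illustration.
rng = random.Random(0)
sample = [rtweedie(1.0, 2.0, 1.7, rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
zero_frac = sum(v == 0.0 for v in sample) / len(sample)
print(round(mean, 2), round(zero_frac, 2))
```

Under this representation the sample mean should be close to $\mu$ and the fraction of exact zeros close to $e^{-\lambda}$, which is a quick sanity check before feeding simulated data to any fitting routine.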
6 Application: Automobile Claims

We consider an auto insurance claim dataset as analyzed in Yip and Yau (2005) and Zhang and Yu (2005). The data set contains 10,296 driver vehicle records, each record including an individual driver's total claim amount ($z_i$) in the last five years ($w_i = 5$) and 17 characteristics $x_i = (x_{i,1}, \ldots, x_{i,17})$ for the driver and the insured vehicle. We want to predict the expected pure premium based on $x_i$. Table 2 and Table 3 summarize the data set. The histogram of the total claim amounts in Figure 1 shows that the empirical distribution of these values is highly skewed. Approximately 61.1% of policyholders had no claims, and approximately 29.6% of the policyholders had a positive claim amount up to 10,000 dollars. Only 9.3% of the policyholders had a high claim amount above 10,000 dollars, but the sum of their claim amounts made up 64% of the overall sum.

Table 2: Description of the individual total claim amount in the last five years. (Columns: total claim amount, % of observations, % of total sum, mean, median.)

We separate the entire dataset into a training set and a testing set of equal size. The TDboost model is then fitted on the training set and tuned with five-fold cross validation. For comparison, we also fit TGLM and MGCV, both of which are fitted using all the explanatory variables. In MGCV, the numerical variables AGE, BLUEBOOK, HOMEKIDS, INCOME, KIDSDRIV, MVR PTS, NPOLICY, RETAINED and TRAVTIME are modeled by smooth terms represented using penalized regression splines. We find the appropriate smoothness for each applicable model term using Generalized Cross Validation (GCV) (Craven and Wahba, 1978; Wahba, 1990). For the TDboost model, it is not necessary to carry out data transformation, since the tree-based boosting method can automatically handle different types of data. For the other models, we use a logarithmic transformation on two variables, i.e. log(BLUEBOOK) and log(INCOME+10), and scale all the numerical variables except for HOMEKIDS, KIDSDRIV, MVR PTS and NPOLICY to have mean 0 and standard deviation 1. We also create dummy variables for the categorical variables with more
Table 3: Explanatory variables in the claim history data set. Type N stands for numerical variable, Type C stands for categorical variable.

ID   Variable    Type   Description
1    AGE         N      Driver's age
2    BLUEBOOK    N      Value of vehicle
3    HOMEKIDS    N      Number of children
4    INCOME      N      Annual income
5    KIDSDRIV    N      Number of driving children
6    MVR PTS     N      Motor vehicle record points
7    NPOLICY     N      Number of policies
8    RETAINED    N      Number of years as a customer
9    TRAVTIME    N      Distance to work
10   AREA        C      Home/work area: Rural, Urban
11   CAR USE     C      Vehicle use: Commercial, Private
12   CAR TYPE    C      Type of vehicle: Panel Truck, Pickup, Sedan, Sports Car, SUV, Van
13   GENDER      C      Driver's gender: F, M
14   JOBCLASS    C      Unknown, Blue Collar, Clerical, Doctor, Home Maker, Lawyer, Manager, Professional, Student
15   MAX EDUC    C      Education level: High School or Below, Bachelors, High School, Masters, PhD
16   MARRIED     C      Married or not: Yes, No
17   REVOKED     C      Whether license revoked in past 7 years: Yes, No

Table 4: Descriptive statistics for the continuous variables in the claim history data set in Section 6. (Rows: Min, 1st Qu., Median, Mean, 3rd Qu., Max; columns: AGE, INCOME, HOMEKIDS, BLUEBOOK, KIDSDRIV, NPOLICY, RETAINED, TRAVTIME, MVR PTS.)
than two levels (CAR TYPE, JOBCLASS and MAX EDUC). For all models, we use the profile likelihood method to estimate the dispersion $\phi$ and the index $\rho$, which are in turn used in fitting the final models. The estimated values of $\phi$ and $\rho$ are reported in Table 6. We see that the estimated value of the dispersion parameter in TGLM is greater than those in MGCV and TDboost.

Table 5: Descriptive statistics for the categorical variables in the claim history data set in Section 6.

AREA:      Rural 20.2%, Urban 79.8%
MARRIED:   No 39.9%, Yes 60.1%
REVOKED:   No 87.8%, Yes 12.2%
GENDER:    F 53.8%, M 46.2%
CAR USE:   Private 63.2%, Commercial 36.8%
MAX EDUC:  <High School 14.6%, Bachelors 27.3%, High School 28.7%, Masters 20.2%, PhD 9.2%
CAR TYPE:  Panel Truck 8.3%, Pickup 17.3%, Sedan 26.2%, Sports Car 11.4%, SUV 27.9%, Van 8.9%
JOBCLASS:  Blue Collar 22.2%, Clerical 15.5%, Professional 13.6%, Manager 12.2%, Lawyer 10.0%, Student 8.7%, (Other) 17.8%

Table 6: The estimated $\rho$ and $\phi$ of the models TGLM, MGCV and TDboost using the profile likelihood method. (Rows: index $\rho$, dispersion $\phi$; columns: TGLM, MGCV, TDboost.)

To examine the performance of TGLM, MGCV and TDboost, after fitting on the training set, we predict the pure premium $P(x) = \hat{\mu}(x)$ by applying each model to the independent held-out testing set. However, care is needed when measuring the differences between the predicted premiums $P(x)$ and the real losses $y$ on the testing data. The mean squared loss or mean absolute loss is not appropriate here because the losses have a high proportion of zeros and are highly right skewed. Therefore an alternative statistical measure, the ordered Lorenz curve and the associated Gini index proposed by Frees et al. (2011), is used for capturing the discrepancy between the premium and loss distributions. By calculating the
Gini index, the performance of different predictive models can be compared. Here we only briefly explain the idea of the ordered Lorenz curve (Frees et al., 2011, 2013). Let $B(x)$ be the base premium, which is calculated using the existing premium prediction model, and let $P(x)$ be the competing premium calculated using an alternative premium prediction model. In the ordered Lorenz curve, the distribution of losses and the distribution of premiums are sorted based on the relative premium $R(x) = P(x)/B(x)$. The ordered premium distribution is

$$\hat{D}_P(s) = \frac{\sum_{i=1}^{n} B(x_i)\, I(R(x_i) \le s)}{\sum_{i=1}^{n} B(x_i)},$$

and the ordered loss distribution is

$$\hat{D}_L(s) = \frac{\sum_{i=1}^{n} y_i\, I(R(x_i) \le s)}{\sum_{i=1}^{n} y_i}.$$

The two empirical distributions are based on the same sort order, which makes it possible to compare the premium and loss distributions for the same policyholder group. The ordered Lorenz curve is the graph of $(\hat{D}_P(s), \hat{D}_L(s))$. When the percentage of losses equals the percentage of premiums for the insurer, the curve results in a 45-degree line, known as the line of equality. Twice the area between the ordered Lorenz curve and the line of equality measures the discrepancy between the premium and loss distributions, and is defined as the Gini index. Curves below the line of equality indicate that, given knowledge of the relative premium, an insurer could identify the profitable contracts, whose premiums are greater than losses. Therefore, a larger Gini index (hence a larger area between the line of equality and the curve below) would imply a more favorable model.

Following Frees et al. (2013), we successively specify the prediction from each model as the base premium $B(x)$ and use predictions from the remaining models as the competing premium $P(x)$ to compute the Gini indices. The entire procedure of data splitting and Gini index computation is repeated 20 times, and a matrix of the averaged Gini indices and standard errors is reported in Table 7.
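The ordered Lorenz curve and Gini index described above can be computed directly from three vectors: base premiums, competing premiums and observed losses. The Python sketch below is a minimal illustration, not the code used for Table 7. It assumes positive base premiums, breaks ties in the relative premium by input order, evaluates the area with the trapezoid rule, and returns the index as a fraction rather than on any rescaled range. The function names and the toy numbers are hypothetical.

```python
def ordered_lorenz(base, competing, loss):
    """Return (dp, dl): cumulative premium and loss shares, sorted by R = P / B."""
    n = len(loss)
    order = sorted(range(n), key=lambda i: competing[i] / base[i])
    total_base, total_loss = sum(base), sum(loss)
    dp, dl = [0.0], [0.0]
    cum_base = cum_loss = 0.0
    for i in order:
        cum_base += base[i]
        cum_loss += loss[i]
        dp.append(cum_base / total_base)
        dl.append(cum_loss / total_loss)
    return dp, dl


def gini_index(dp, dl):
    """Twice the area between the line of equality and the ordered Lorenz curve."""
    area_under_curve = sum(
        (dp[j + 1] - dp[j]) * (dl[j + 1] + dl[j]) / 2.0
        for j in range(len(dp) - 1)
    )
    return 1.0 - 2.0 * area_under_curve


# Toy example: a flat base premium, and a competing premium that ranks the
# single policy with a loss as the riskiest one.
base = [1.0, 1.0, 1.0, 1.0]
competing = [1.0, 2.0, 3.0, 4.0]
loss = [0.0, 0.0, 0.0, 4.0]
dp, dl = ordered_lorenz(base, competing, loss)
gini = gini_index(dp, dl)
print(gini)
```

In the toy example the competing premium concentrates all losses at the top of the ordering, so the curve lies well below the line of equality and the index is large. When premiums and losses are proportional the curve coincides with the line of equality and the index is zero.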
To pick the best model, we use a minimax strategy (Frees et al., 2013) to select the base premium model that is least vulnerable to competing premium models; that is, we select the model that provides the smallest of the maximal Gini indices, taken over competing premiums. We find that the maximal Gini index
varies with the choice of base premium, $B(x) = \hat{\mu}_{\mathrm{TGLM}}(x)$, $B(x) = \hat{\mu}_{\mathrm{MGCV}}(x)$ or $B(x) = \hat{\mu}_{\mathrm{TDboost}}(x)$, and that TDboost has the smallest maximal Gini index, at 4.677; hence it is the least vulnerable to alternative scores. Figure 7 also shows that when TGLM (or MGCV) is selected as the base premium, the area between the line of equality and the ordered Lorenz curve is larger when choosing TDboost as the competing premium, indicating again that the TDboost model represents the most favorable choice.

Table 7: The averaged Gini indices and standard errors (in parentheses) in the auto insurance claim data example based on 20 random splits. Rows give the base premium, columns the competing premium.

Base premium   TGLM       MGCV       TDboost
TGLM           0          (0.313)    (0.415)
MGCV           (0.551)    0          (0.497)
TDboost        (0.418)    (0.456)    0

Figure 7: The ordered Lorenz curves for the auto insurance claim data. (Panels by base premium: MGCV, TGLM; curves for the models TGLM, MGCV and TDboost; horizontal axis: premium; vertical axis: loss.)

Next, we focus on the analysis using the TDboost model. Several explanatory variables are significantly related to the pure premium. The VI measure and the baseline value of each explanatory variable are shown in Figure 8. We find that REVOKED, MVR PTS, AREA and INCOME have high VI measure scores (the vertical line), and their scores all surpass the corresponding baselines (the horizontal line length), indicating that the importance of those explanatory variables is real. We also find that the variables AGE, CAR TYPE, JOBCLASS, NPOLICY, MARRIED, KIDSDRIV, MAX EDUC and CAR USE have larger-than-baseline VI measure scores, but their absolute scales are much smaller than those of the aforementioned four variables. On the other hand, although the VI measure of, e.g., BLUEBOOK is quite large, it does not significantly surpass the baseline importance.

Figure 8: The variable importance measures and baselines of 17 explanatory variables for modeling the pure premium. (Legend: relative influence, baseline. Variables, in plotted order: REVOKED, MVR PTS, AREA, INCOME, BLUEBOOK, AGE, TRAVTIME, CAR TYPE, JOBCLASS, NPOLICY, MARRIED, RETAINED, KIDSDRIV, MAX EDUC, CAR USE, HOMEKIDS, GENDER. Axis: fraction of reduction in sum of squared error in gradient prediction.)

We now use partial dependence plots to visualize the fitted model. Figure 9 shows the main effects of four important explanatory variables on the pure premium. Strong nonlinear effects clearly exist in the predictors INCOME and MVR PTS. For policyholders with annual income below 163 (in thousands of dollars), the pure premium is negatively associated with income; after income passes 163, the pure premium starts to increase gradually with income until the pure premium curve reaches a plateau when income passes 237. Additionally, the pure premium is positively associated with motor vehicle record points MVR PTS, but the pure premium curve reaches a plateau when MVR PTS exceeds five. On the other hand, the partial dependence plots suggest that a policyholder who lives in the urban area (AREA = URBAN) or with driver's license revoked (REVOKED = YES)
typically has a relatively high pure premium.

Figure 9: Marginal effects of the four most significant explanatory variables on the pure premium. (Panels: REVOKED (No, Yes), AREA (Rural, Urban), INCOME, MVR PTS; vertical axis: pure premium, in $1000.)

In our model, the data-driven choice for the tree size is $L = 7$, which means that our model includes higher-order interactions. In Figure 10, we visualize the effects of four important second-order interactions using joint partial dependence plots. These four interactions are AREA × MVR PTS, AREA × REVOKED, AREA × NPOLICY and INCOME × MVR PTS. The first three interactions all involve the variable AREA: the marginal effects of MVR PTS, REVOKED and NPOLICY on the pure premium are greater for policyholders living in the urban area (AREA = URBAN) than for those living in the rural area (AREA = RURAL). Also, a strong INCOME × MVR PTS interaction suggests
Figure 10: Four strong pairwise interactions. (Panels show the fitted $\mu(x)$ against: (a) MVR PTS and AREA; (b) REVOKED and AREA; (c) AREA and NPOLICY; (d) MVR PTS and INCOME.)
that when the MVR PTS value is low, the income of the policyholder does not have a strong marginal effect on the expected pure premium; but when the MVR PTS value is high, the partial dependence of the pure premium on income becomes stronger.

7 Conclusions

The need for nonlinear risk factors as well as risk factor interactions in modeling insurance claim sizes is well recognized by actuarial practitioners, but practical tools to study them are very limited. In this paper, relying on neither the linear assumption nor a pre-specified interaction structure, a flexible tree-based gradient boosting method is designed for the Tweedie model. We implement the proposed method in a user-friendly R package, TDboost, that can make accurate insurance premium predictions for complex data sets and serve as a convenient tool for actuarial practitioners to investigate nonlinear and interaction effects. In the context of personal auto insurance, we implicitly use the policy duration as a volume measure (or exposure), and demonstrate the favorable prediction performance of TDboost for the pure premium. In cases where exposure measures other than duration are used, which is common in commercial insurance, we can extend the TDboost method to the corresponding claim size by simply replacing the duration with any chosen exposure measure.

We also want to point out that TDboost can be an important complement to the traditional GLM model in insurance rating. Even under the strict circumstances in which regulators demand that the final model have a GLM structure, our approach can still be quite helpful due to its ability to extract additional information such as non-monotonicity, nonlinearity and important interactions. In Appendix A, we provide an additional real data analysis to demonstrate that our method can provide insights into the structure of interaction terms. After integrating the obtained information about the interaction terms into the original GLM model, we can much enhance the overall accuracy of the insurance premium prediction while maintaining a GLM model structure.
References

Anstey, K. J., Wood, J., Lord, S., and Walker, J. G. (2005), Cognitive, sensory and physical factors enabling driving safety in older adults, Clinical Psychology Review, 25.
Bar-Lev, S. K. and Stramer, O. (1987), Characterizations of natural exponential families with power variance functions by zero regression properties, Probability Theory and Related Fields, 76.
Breiman, L. (1998), Arcing classifier (with discussion and a rejoinder by the author), The Annals of Statistics, 26.
Breiman, L. (1999), Prediction games and arcing algorithms, Neural Computation, 11.
Breiman, L., Friedman, J., Olshen, R., Stone, C., Steinberg, D., and Colla, P. (1984), CART: Classification and regression trees, Wadsworth.
Brent, R. P. (2013), Algorithms for minimization without derivatives, Courier Dover Publications.
Bühlmann, P. and Hothorn, T. (2007), Boosting algorithms: Regularization, prediction and model fitting, Statistical Science, 22.
Chiappori, P.-A. and Salanié, B. (2000), Testing for Asymmetric Information in Insurance Markets, The Journal of Political Economy, 108.
Craven, P. and Wahba, G. (1978), Smoothing noisy data with spline functions, Numerische Mathematik, 31.
Dionne, G., Gouriéroux, C., and Vanasse, C. (2001), Testing for evidence of adverse selection in the automobile insurance market: A comment, Journal of Political Economy, 109.
Dunn, P. K. and Smyth, G. K. (2005), Series evaluation of Tweedie exponential dispersion model densities, Statistics and Computing, 15.
Dunn, P. K. and Smyth, G. K. (2008), Evaluation of Tweedie exponential dispersion model densities by Fourier inversion, Statistics and Computing, 18.
Elith, J., Leathwick, J. R., and Hastie, T. (2008), A working guide to boosted regression trees, Journal of Animal Ecology, 77.
Feller, W. (1968), An Introduction to Probability Theory and its Applications, Wiley & Sons, New York.
Frees, E. W., Meyers, G., and Cummings, A. D. (2011), Summarizing insurance scores using a Gini index, Journal of the American Statistical Association, 106.
Frees, E. W. J., Meyers, G., and Cummings, A. D. (2013), Insurance ratemaking and a Gini index, Journal of Risk and Insurance.
Freund, Y. and Schapire, R. (1996), Experiments with a new boosting algorithm, in Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann Publishers, Inc.
Freund, Y. and Schapire, R. (1997), A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55.
Friedman, J. (2001), Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29.
Friedman, J., Hastie, T., and Tibshirani, R. (2000), Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors), The Annals of Statistics, 28.
Friedman, J. H. (2002), Stochastic gradient boosting, Computational Statistics & Data Analysis, 38.
Haberman, S. and Renshaw, A. E. (1996), Generalized linear models and actuarial science, Statistician, 45.
Hastie, T., Tibshirani, R., and Friedman, J. (2009), The elements of statistical learning: Data mining, inference, and prediction, Second Edition, Springer Series in Statistics, Springer.
Hastie, T. J. and Tibshirani, R. J. (1990), Generalized additive models, vol. 43, CRC Press.
Jørgensen, B. (1987), Exponential dispersion models, Journal of the Royal Statistical Society. Series B (Methodological).
Jørgensen, B. (1997), The theory of dispersion models, vol. 76, CRC Press.
Jørgensen, B. and de Souza, M. C. (1994), Fitting Tweedie's compound Poisson model to insurance claims data, Scandinavian Actuarial Journal, 1994.
Kononenko, I. (1995), On biases in estimating multi-valued attributes, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2, Morgan Kaufmann Publishers Inc.
McCartt, A. T., Shabanova, V. I., and Leaf, W. A. (2003), Driving experience, crashes and traffic citations of teenage beginning drivers, Accident Analysis & Prevention, 35.
Mildenhall, S. J. (1999), A systematic relationship between minimum bias and generalized linear models, in Proceedings of the Casualty Actuarial Society, vol. 86.
Murphy, K. P., Brockman, M. J., and Lee, P. K. (2000), Using generalized linear models to build dynamic pricing systems, in Casualty Actuarial Society Forum, Winter.
Nelder, J. and Wedderburn, R. (1972), Generalized Linear Models, Journal of the Royal Statistical Society. Series A (General), 135.
Ohlsson, E. and Johansson, B. (2010), Non-life insurance pricing with generalized linear models, Springer.
Owsley, C., Ball, K., Sloane, M. E., Roenker, D. L., and Bruni, J. R. (1991), Visual/cognitive correlates of vehicle accidents in older drivers, Psychology and Aging, 6, 403.
Peters, G. W., Shevchenko, P. V., and Wüthrich, M. V. (2008), Model risk in claims reserving within Tweedie's compound Poisson models, ASTIN Bulletin, to appear.
Quijano Xacur, O. A. et al. (2011), Property and Casualty Premiums based on Tweedie Families of Generalized Linear Models, Ph.D. thesis, Concordia University.
Renshaw, A. E. (1994), Modelling the claims process in the presence of covariates, ASTIN Bulletin, 24.
Ridgeway, G. (2007), Generalized Boosted Regression Models, R package manual.
Sandri, M. and Zuccolotto, P. (2008), A bias correction algorithm for the Gini variable importance measure in classification trees, Journal of Computational and Graphical Statistics, 17.
Sandri, M. and Zuccolotto, P. (2010), Analysis and correction of bias in Total Decrease in Node Impurity measures for tree-based algorithms, Statistics and Computing, 20.
Showers, V. E. and Shotick, J. A. (1994), The effects of household characteristics on demand for insurance: A tobit analysis, Journal of Risk and Insurance.
Smyth, G. and Jørgensen, B. (2002), Fitting Tweedie's Compound Poisson Model to Insurance Claims Data: Dispersion Modelling, ASTIN Bulletin, 32.
Smyth, G. K. (1996), Regression analysis of quantity data with exact zeros, in Proceedings of the second Australia-Japan workshop on stochastic models in engineering, technology and management, Citeseer.
Tweedie, M. (1984), An index which distinguishes between some important exponential families, in Statistics: Applications and New Directions: Proc. Indian Statistical Institute Golden Jubilee International Conference.
Van de Ven, W. and van Praag, B. M. (1981), Risk aversion and deductibles in private health insurance: application of an adjusted tobit model to family health care expenditures, Health, economics, and health economics.
Wahba, G. (1990), Spline models for observational data, vol. 59, SIAM.
White, A. P. and Liu, W. Z. (1994), Technical note: Bias in information-based measures in decision tree induction, Machine Learning, 15.
Wood, S. (2001), mgcv: GAMs and generalized ridge regression for R, R News, 1.
Wood, S. (2006), Generalized additive models: an introduction with R, CRC Press.
Yip, K. C. and Yau, K. K. (2005), On modeling claim frequency data in general insurance with extra zeros, Insurance: Mathematics and Economics, 36.
Zhang, T. and Yu, B. (2005), Boosting with early stopping: Convergence and consistency, The Annals of Statistics.
Zhang, W. (2011), cplm: Monte Carlo EM algorithms and Bayesian methods for fitting Tweedie compound Poisson linear models, R package, web/packages/cpm/index.htm.
Face Hallucination and Recognition
Face Haucination and Recognition Xiaogang Wang and Xiaoou Tang Department of Information Engineering, The Chinese University of Hong Kong {xgwang1, xtang}@ie.cuhk.edu.hk http://mmab.ie.cuhk.edu.hk Abstract.
Australian Bureau of Statistics Management of Business Providers
Purpose Austraian Bureau of Statistics Management of Business Providers 1 The principa objective of the Austraian Bureau of Statistics (ABS) in respect of business providers is to impose the owest oad
Secure Network Coding with a Cost Criterion
Secure Network Coding with a Cost Criterion Jianong Tan, Murie Médard Laboratory for Information and Decision Systems Massachusetts Institute of Technoogy Cambridge, MA 0239, USA E-mai: {jianong, medard}@mit.edu
A Latent Variable Pairwise Classification Model of a Clustering Ensemble
A atent Variabe Pairwise Cassification Mode of a Custering Ensembe Vadimir Berikov Soboev Institute of mathematics, Novosibirsk State University, Russia [email protected] http://www.math.nsc.ru Abstract.
Teamwork. Abstract. 2.1 Overview
2 Teamwork Abstract This chapter presents one of the basic eements of software projects teamwork. It addresses how to buid teams in a way that promotes team members accountabiity and responsibiity, and
Fixed income managers: evolution or revolution
Fixed income managers: evoution or revoution Traditiona approaches to managing fixed interest funds rey on benchmarks that may not represent optima risk and return outcomes. New techniques based on separate
A Supplier Evaluation System for Automotive Industry According To Iso/Ts 16949 Requirements
A Suppier Evauation System for Automotive Industry According To Iso/Ts 16949 Requirements DILEK PINAR ÖZTOP 1, ASLI AKSOY 2,*, NURSEL ÖZTÜRK 2 1 HONDA TR Purchasing Department, 41480, Çayırova - Gebze,
CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS
Dehi Business Review X Vo. 4, No. 2, Juy - December 2003 CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS John N.. Var arvatsouakis atsouakis DURING the present time,
Fast Robust Hashing. ) [7] will be re-mapped (and therefore discarded), due to the load-balancing property of hashing.
Fast Robust Hashing Manue Urueña, David Larrabeiti and Pabo Serrano Universidad Caros III de Madrid E-89 Leganés (Madrid), Spain Emai: {muruenya,darra,pabo}@it.uc3m.es Abstract As statefu fow-aware services
Vendor Performance Measurement Using Fuzzy Logic Controller
The Journa of Mathematics and Computer Science Avaiabe onine at http://www.tjmcs.com The Journa of Mathematics and Computer Science Vo.2 No.2 (2011) 311-318 Performance Measurement Using Fuzzy Logic Controer
Finance 360 Problem Set #6 Solutions
Finance 360 Probem Set #6 Soutions 1) Suppose that you are the manager of an opera house. You have a constant margina cost of production equa to $50 (i.e. each additiona person in the theatre raises your
Pay-on-delivery investing
Pay-on-deivery investing EVOLVE INVESTment range 1 EVOLVE INVESTMENT RANGE EVOLVE INVESTMENT RANGE 2 Picture a word where you ony pay a company once they have deivered Imagine striking oi first, before
SELECTING THE SUITABLE ERP SYSTEM: A FUZZY AHP APPROACH. Ufuk Cebeci
SELECTING THE SUITABLE ERP SYSTEM: A FUZZY AHP APPROACH Ufuk Cebeci Department of Industrial Engineering, Istanbul Technical University, Macka, Istanbul, Turkey - [email protected] Abstract An Enterprise
TERM INSURANCE CALCULATION ILLUSTRATED. This is the U.S. Social Security Life Table, based on year 2007.
This is the U.S. Social Security Life Table, based on year 2007. This is available at http://www.ssa.gov/oact/stats/table4c6.html. The life experiences of males and females are different, and we usually do separate
A quantum model for the stock market
A quantum model for the stock market Authors: Chao Zhang a,, Lu Huang b Affiliations: a School of Physics and Engineering, Sun Yat-sen University, Guangzhou 5175, China b School of Economics and Business Administration,
Risk Margin for a Non-Life Insurance Run-Off
Risk Margin for a Non-Life Insurance Run-Off Mario V. Wüthrich, Paul Embrechts, Andreas Tsanakas February 2, 2011 Abstract For solvency purposes insurance companies need to calculate so-called best-estimate
Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation
Design of Follow-Up Experiments for Improving Model Discrimination and Parameter Estimation Szu Hui Ng 1 Stephen E. Chick 2 National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260. Technology
GREEN: An Active Queue Management Algorithm for a Self Managed Internet
GREEN: An Active Queue Management Algorithm for a Self Managed Internet Bartek Wydrowski and Moshe Zukerman ARC Special Research Centre for Ultra-Broadband Information Networks, EEE Department, The University of
Normalization of Database Tables. Functional Dependency. Examples of Functional Dependencies: So Now what is Normalization? Transitive Dependencies
ISM 602 Dr. Hamid Nemati Objectives The idea Dependencies Attributes and Design Understand concepts normalization (Higher-Level Normal Forms) Learn how to normalize tables Understand normalization and database
Older people s assets: using housing equity to pay for health and aged care
Key words: aged care; retirement savings; reverse mortgage; financial innovation; financial planning Older people s assets: using housing equity to pay for health and aged care The research agenda on the ageing
Risk Margin for a Non-Life Insurance Run-Off
Risk Margin for a Non-Life Insurance Run-Off Mario V. Wüthrich, Paul Embrechts, Andreas Tsanakas August 15, 2011 Abstract For solvency purposes insurance companies need to calculate so-called best-estimate reserves
Bite-Size Steps to ITIL Success
7 Bite-Size Steps to ITIL Success Plus making a Business Case for ITIL! Do you want to implement ITIL but don t know where to start? 7 Bite-Size Steps to ITIL Success can help you to decide whether ITIL can
Distribution of Income Sources of Recent Retirees: Findings From the New Beneficiary Survey
Distribution of Income Sources of Recent Retirees: Findings From the New Beneficiary Survey by Linda Drazga Maxfield and Virginia P. Rena* Using data from the New Beneficiary Survey, this article examines
WHITE PAPER Understanding the Value of Visual Data Discovery: A Guide to Visualizations
Understanding the Value of Visual Data Discovery A Guide to Visualizations Table of Contents Executive Summary; Chapter 1 - Datawatch Visualizations; Chapter 2 - Snapshot Visualizations; Bar
Betting Strategies, Market Selection, and the Wisdom of Crowds
Betting Strategies, Market Selection, and the Wisdom of Crowds Willemien Kets Northwestern University [email protected] David M. Pennock Microsoft Research New York City [email protected]
effect on major accidents
An Investigation into a weekend (or bank holiday) effect on major accidents Nicola C. Healey 1 and Andrew G. Rushton 2 1 Health and Safety Laboratory, Harpur Hill, Buxton, Derbyshire, SK17 9JN 2 Hazardous Installations
Maintenance activities planning and grouping for complex structure systems
Maintenance activities planning and grouping for complex structure systems Hai Canh Vu, Phuc Do Van, Anne Barros, Christophe Berenguer To cite this version: Hai Canh Vu, Phuc Do Van, Anne Barros, Christophe
Chapter 3: e-business Integration Patterns
Chapter 3: e-business Integration Patterns "Consistency is the last refuge of the unimaginative." Oscar Wilde In This Chapter What Are Integration Patterns?
Introduction the pressure for efficiency the Estates opportunity
Healthy Savings? A study of the proportion of NHS Trusts with an in-house Buildings Repair and Maintenance workforce, and a discussion of early experiences of Supplies efficiency initiatives Management Summary
Oligopoly in Insurance Markets
Oligopoly in Insurance Markets June 3, 2008 Abstract We consider an oligopolistic insurance market with individuals who differ in their degrees of accident probabilities. Insurers compete in coverage and premium.
The Basel II Risk Parameters. Second edition
The Basel II Risk Parameters Second edition. Bernd Engelmann, Robert Rauhmeier Editors The Basel II Risk Parameters Estimation, Validation, Stress Testing with Applications to Loan Risk Management Editors Dr.
Life Contingencies Study Note for CAS Exam S. Tom Struppeck
Life Contingencies Study Note for CAS Exam S Tom Struppeck (Revised 9/19/2015) Introduction Life contingencies is a term used to describe survival models for human lives and resulting cash flows that start or
A Description of the California Partnership for Long-Term Care Prepared by the California Department of Health Care Services
2012 Before You Buy A Description of the California Partnership for Long-Term Care Prepared by the California Department of Health Care Services Only long-term care insurance policies bearing any
A New Statistical Approach to Network Anomaly Detection
A New Statistical Approach to Network Anomaly Detection Christian Callegari, Sandrine Vaton 2, and Michele Pagano Dept of Information Engineering, University of Pisa, ITALY E-mail: {christiancaegari,mpagano}@ietunipiit
Simultaneous Routing and Power Allocation in CDMA Wireless Data Networks
Simultaneous Routing and Power Allocation in CDMA Wireless Data Networks Mikael Johansson *, Lin Xiao and Stephen Boyd * Department of Signals, Sensors and Systems Royal Institute of Technology, SE 00 Stockholm,
COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION
COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION František Mojžíš Department of Computing and Control Engineering, ICT Prague, Technická, 8 Prague [email protected] Abstract This
READING A CREDIT REPORT
Name Date CHAPTER 6 STUDENT ACTIVITY SHEET READING A CREDIT REPORT Review the sample credit report. Then search for a sample credit report online, print it off, and answer the questions below. This activity
Business schools are the academic setting where. The current crisis has highlighted the need to redefine the role of senior managers in organizations.
crossroads REDISCOVERING THE ROLE OF BUSINESS SCHOOLS The current crisis has highlighted the need to redefine the role of senior managers in organizations. JORDI CANALS Professor and Dean, IESE
Comparison of Traditional and Open-Access Appointment Scheduling for Exponentially Distributed Service Time
Journal of Healthcare Engineering Vol. 6 No. 3 Page 34 376 34 Comparison of Traditional and Open-Access Appointment Scheduling for Exponentially Distributed Service Chongjun Yan, PhD; Jiafu Tang *, PhD; Bowen
Let s get usable! Usability studies for indexes. Susan C. Olason. Study plan
Let s get usable! Usability studies for indexes Susan C. Olason The article discusses a series of usability studies on indexes from a systems engineering and human factors perspective. The purpose of these
Early access to FAS payments for members in poor health
Financial Assistance Scheme Early access to FAS payments for members in poor health Pension Protection Fund Protecting People s Futures The Financial Assistance Scheme is administered by the Pension Protection
Betting on the Real Line
Betting on the Real Line Xi Gao 1, Yiling Chen 1,, and David M. Pennock 2 1 Harvard University, {xagao,yiing}@eecs.harvard.edu 2 Yahoo! Research, [email protected] Abstract. We study the problem of designing
HEALTH PROFESSIONS PATHWAYS
The Office of Community College Research and Leadership, College of Education at Illinois The Health Professions Pathways (H2P) Consortium is a national consortium comprised of nine colleges in five states
A Similarity Search Scheme over Encrypted Cloud Images based on Secure Transformation
A Similarity Search Scheme over Encrypted Cloud Images based on Secure Transformation Zhihua Xia, Yi Zhu, Xingming Sun, and Jin Wang Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information
Multi-Robot Task Scheduling
Proc of IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 2013 Multi-Robot Task Scheduling Yu Zhang and Lynne E Parker Abstract The scheduling problem has been studied extensively in
LADDER SAFETY Table of Contents
Table of Contents SECTION 1. TRAINING PROGRAM INTRODUCTION: Training Objectives; Rationale for Training
Take me to your leader! Online Optimization of Distributed Storage Configurations
Take me to your leader! Online Optimization of Distributed Storage Configurations Artyom Sharov Alexander Shraer Arif Merchant Murray Stokely [email protected], {shraex, aamerchant, mstokey}@googe.com
Internal Control. Guidance for Directors on the Combined Code
Internal Control Guidance for Directors on the Combined Code ISBN 1 84152 010 1 Published by The Institute of Chartered Accountants in England & Wales Chartered Accountants Hall PO Box 433 Moorgate Place London
Art of Java Web Development By Neal Ford 624 pages US$44.95 Manning Publications, 2004 ISBN: 1-932394-06-0
IEEE DISTRIBUTED SYSTEMS ONLINE 1541-4922 2005 Published by the IEEE Computer Society Vol. 6, No. 5; May 2005 Editor: Marcin Paprzycki, http://www.cs.okstate.edu/%7emarcin/ Book Reviews: Java Tools and Frameworks
Business Banking. A guide for franchises
Business Banking A guide for franchises Help with your franchise business, right on your doorstep A true understanding of the needs of your business: that s what makes RBS the right choice for financial
University of Southern California
Master of Science in Financial Engineering Viterbi School of Engineering University of Southern California Dial 1-866-469-3239 (Meeting number 924 898 113) to hear the audio portion, or listen through your
Chapter 1 Structural Mechanics
Chapter 1 Structural Mechanics Introduction There are many different types of structures all around us. Each structure has a specific purpose or function. Some structures are simple, while others are complex; however
Integrating Risk into your Plant Lifecycle A next generation software architecture for risk based
Integrating Risk into your Plant Lifecycle A next generation software architecture for risk based operations Dr Nic Cavanagh 1, Dr Jeremy Linn 2 and Colin Hickey 3 1 Head of Safeti Product Management, DNV
Advanced ColdFusion 4.0 Application Development - 3 - Server Clustering Using Bright Tiger
Advanced ColdFusion 4.0 Application Development - CH 3 - Server Clustering Using Bri.. [Figures are not included in this sample chapter] Advanced ColdFusion 4.0 Application Development - 3 - Server
WHITE PAPER Best Practices: Pushing Excel Beyond Its Limits with Information Optimization
Best Practices: Pushing Excel Beyond Its Limits with Information Optimization Executive Overview Microsoft Excel is the
How To Deliver Results
Message We shall make every effort to strengthen the community building programme which serves to foster among the people of Hong Kong a sense of belonging and mutual care. We will continue to implement the District
Serving the Millennial Generation - The Challenge and Opportunity for Financial Services Companies
Serving the Millennial Generation - The Challenge and Opportunity for Financial Services Companies May 2015 Christopher J. Perry, CFA Equity Research Analyst Today, the Millennial Generation (or Generation Y), broadly
FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS. Karl Skretting and John Håkon Husøy
FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS Karl Skretting and John Håkon Husøy University of Stavanger, Department of Electrical and Computer Engineering N-4036 Stavanger,
Discounted Cash Flow Analysis (aka Engineering Economy)
Discounted Cash Flow Analysis (aka Engineering Economy) Objective: To provide economic comparison of benefits and costs that occur over time Assumptions: Future benefits and costs can be predicted A Benefits,
An Idiot s guide to Support vector machines (SVMs)
An Idiot s guide to Support vector machines (SVMs) R. Berwick, Village Idiot SVMs: A New Generation of Learning Algorithms Pre 1980: Almost all learning methods learned linear decision surfaces. Linear learning
CERTIFICATE COURSE ON CLIMATE CHANGE AND SUSTAINABILITY. Course Offered By: Indian Environmental Society
CERTIFICATE COURSE ON CLIMATE CHANGE AND SUSTAINABILITY Course Offered By: Indian Environmental Society INTRODUCTION The Indian Environmental Society (IES) a dynamic and flexible organization with a global vision
Oracle Project Financial Planning. User's Guide Release 11.1.2.2
Oracle Project Financial Planning User's Guide Release 11.1.2.2 Project Financial Planning User's Guide, 11.1.2.2 Copyright 2012, Oracle and/or its affiliates. All rights reserved. Authors: EPM Information Development
Human Capital & Human Resources Certificate Programs
MANAGEMENT CONCEPTS Human Capital & Human Resources Certificate Programs Programs to develop functional and strategic skills in: Human Capital // Human Resources ENROLL TODAY! Contract Holder Contract GS-02F-0010J
The guaranteed selection. For certainty in uncertain times
The guaranteed selection For certainty in uncertain times Making the right investment choice If you can t afford to take a lot of risk with your money it can be hard to find the right investment, especially
Virtual trunk simulation
Virtual trunk simulation Samuli Aalto * Laboratory of Telecommunications Technology Helsinki University of Technology Silvia Giordano Laboratoire de Reseaux de Communication Ecole Polytechnique Federale de Lausanne
Leadership & Management Certificate Programs
MANAGEMENT CONCEPTS Leadership & Management Certificate Programs Programs to develop expertise in: Analytics // Leadership // Professional Skills // Supervision ENROLL TODAY! Contract Holder Contract GS-02F-0010J
Protection Against Income Loss During the First 4 Months of Illness or Injury *
Protection Against Income Loss During the First 4 Months of Illness or Injury * This note examines and describes the kinds of income protection that are available to workers during the first 6 months of illness
Qualifications, professional development and probation
UCU Continuing Professional Development Qualifications, professional development and probation Initial training and further education teaching qualifications Since September 2007 all newly appointed FE lecturers,
Welcome to Colonial Voluntary Benefits. Thank you for your interest in our Universal Life with the Accelerated Death Benefit for Long Term Care Rider.
Hello, Welcome to Colonial Voluntary Benefits. Thank you for your interest in our Universal Life with the Accelerated Death Benefit for Long Term Care Rider. For detail please call 877-685-2656. Please leave your name,
3.3 SOFTWARE RISK MANAGEMENT (SRM)
3.3 SOFTWARE RISK MANAGEMENT (SRM) Fig. 3.2 SRM is a process built in five steps. The steps are: Identify, Analyse, Plan, Track, Resolve. The process is continuous in nature and handled dynamically throughout lifecycle
Market Design & Analysis for a P2P Backup System
Market Design & Analysis for a P2P Backup System Sven Seuken School of Engineering & Applied Sciences Harvard University, Cambridge, MA [email protected] Denis Charles, Max Chickering, Sidd Puri Microsoft
Growth and Performance Contrasts Between Types of Small Firms
140334197x SWP 6/90 GROWTH AND PERFORMANCE CONTRASTS BETWEEN TYPES OF SMALL FIRMS PROFESSOR SUE BIRLEY & DR PAUL WESTHEAD Cranfield Entrepreneurship Research Centre Cranfield School
ABSTRACT. Categories and Subject Descriptors. General Terms. Keywords 1. INTRODUCTION. Jun Yin, Ye Wang and David Hsu
Jun Yin, Ye Wang and David Hsu ABSTRACT Prompt feedback is essential for beginning violin learners; however, most amateur learners can only meet with teachers and receive feedback once or twice a week. To help
SPOTLIGHT. A year of transformation
WINTER ISSUE 2014 2015 SPOTLIGHT Welcome to the winter issue of Oasis Spotlight. These newsletters are designed to keep you up-to-date with news about the Oasis community. This quarterly issue features an article
Chapter 2 Traditional Software Development
Chapter 2 Traditional Software Development 2.1 History of Project Management Large projects from the past must already have had some sort of project management, such as the Pyramid of Giza or Pyramid of Cheops,
Overview of Health and Safety in China
Overview of Health and Safety in China Hongyuan Wei 1, Leping Dang 1, and Mark Hoye 2 1 School of Chemical Engineering, Tianjin University, Tianjin 300072, P R China, E-mail: [email protected] 2 AstraZeneca
Recent Trends in Workers Compensation Coverage by Brian Z. Brown, FCAS Melodee J. Saunders, ACAS
Recent Trends in Workers Compensation Coverage by Brian Z. Brown, FCAS Melodee J. Saunders, ACAS TITLE: RECENT TRENDS IN WORKERS COMPENSATION COVERAGE BY: Ms. Melodee J. Saunders, A.C.A.S., M.A.A.A. Mr.
History of Stars and Rain Education Institute for Autism (Stars and Rain)
History of Stars and Rain Education Institute for Autism (Stars and Rain) Established: March 15, 1993 in Beijing Founder: Ms. Tian Huiping (mother of a boy with autism) STARS AND RAIN was founded in 1993 by a parent and is China
3.5 Pendulum period. $g = 4\pi^2 l / T^2$. $g = 4\pi^2 \times 1\,\mathrm{m} / (4\,\mathrm{s}^2) = \pi^2\,\mathrm{m\,s^{-2}}$.
3.5 Pendulum period Is it coincidence that g, in units of meters per second squared, is 9.8, very close to $\pi^2 \approx 9.87$? Their proximity suggests a connection. Indeed, they are connected
Scheduling in Multi-Channel Wireless Networks
Scheduling in Multi-Channel Wireless Networks Vartika Bhandari and Nitin H. Vaidya University of Illinois at Urbana-Champaign, USA [email protected], [email protected] Abstract. The availability of multiple orthogonal
NCH Software MoneyLine
NCH Software MoneyLine This user guide has been created for use with MoneyLine Version 2.xx NCH Software Technical Support If you have difficulties using MoneyLine please read the applicable topic before requesting
We are XMA and Viglen.
alearn with Microsoft We are XMA and Viglen. Call us now on 0115 846 4900 Visit www.xma.co.uk/aearn Email [email protected] Follow
Niagara Catholic. District School Board. High Performance. Support Program. Academic
Niagara Catholic District School Board High Performance Academic Support Program The Niagara Catholic District School Board, through the charisms of faith, social justice, support and leadership, nurtures an
Oracle Hyperion Tax Provision. User's Guide Release 11.1.2.2
Oracle Hyperion Tax Provision User's Guide Release 11.1.2.2 Tax Provision User's Guide, 11.1.2.2 Copyright 2013, Oracle and/or its affiliates. All rights reserved. Authors: EPM Information Development Team Oracle
NCH Software FlexiServer
NCH Software FlexiServer This user guide has been created for use with FlexiServer Version 1.xx NCH Software Technical Support If you have difficulties using FlexiServer please read the applicable topic before
Leakage detection in water pipe networks using a Bayesian probabilistic framework
Probabilistic Engineering Mechanics 18 (2003) 315-327 www.elsevier.com/locate/probengmech Leakage detection in water pipe networks using a Bayesian probabilistic framework Z. Poulakis, D. Valougeorgis, C. Papadimitriou*
IMPLEMENTING THE RATE STRUCTURE: TIERING IN THE FEE-FOR-SERVICE SYSTEM
The New Jersey Department of Human Services Division of Developmental Disabilities IMPLEMENTING THE RATE STRUCTURE: TIERING IN THE FEE-FOR-SERVICE SYSTEM Elizabeth M. Shea Assistant Commissioner Thomas S.
