A Practitioner's Guide to Generalized Linear Models


A CAS Study Note
Duncan Anderson, FIA
Sholom Feldblum, FCAS
Claudine Modlin, FCAS
Doris Schirmacher, FCAS
Ernesto Schirmacher, ASA
Neeza Thandi, FCAS
Third Edition
February 2007

The Practitioner's Guide to Generalized Linear Models is written for the practicing actuary who would like to understand generalized linear models (GLMs) and use them to analyze insurance data. The guide is divided into three sections. Section 1 provides a foundation for the statistical theory and gives illustrative examples and intuitive explanations which clarify the theory. The intuitive explanations build upon more commonly understood actuarial methods such as linear models and the minimum bias procedures. Section 2 provides practical insights and realistic model output for each stage of a GLM analysis, including data preparation and preliminary analyses, model selection and iteration, model refinement and model interpretation. This section is designed to boost the actuary's confidence in interpreting GLMs and applying them to solve business problems. Section 3 discusses other topics of interest relating to GLMs such as retention modeling and scoring algorithms. More technical material in the paper is set out in appendices.

Acknowledgements
The authors would like to thank James Tanser, FIA, for some helpful comments and contributions to some elements of this paper, Shaun Wang, FCAS, for reviewing the paper prior to inclusion on the CAS exam syllabus, and Volker Wilmsen for some helpful comments on the Second Edition of this paper.

Contents
Section
1 GLMs - theory and intuition
2 GLMs in practice
3 Other applications of GLMs
Bibliography
Appendix
A The design matrix when variates are used
B The exponential family of distributions
C The Tweedie distribution
D Canonical link functions
E Solving for maximum likelihood in the general case of an exponential distribution
F Example of solving for maximum likelihood with a gamma error and inverse link function
G Data required for a GLM claims analysis
H Automated approach for factor categorization
I Cramer's V
J Benefits of modeling frequency and severity separately rather than using Tweedie GLMs

1 GLMs - theory and intuition

Section 1 discusses how GLMs are formulated and solved. The following topics are covered in detail:
- background of GLMs, building upon traditional actuarial methods such as minimum bias procedures and linear models
- introduction to the statistical framework of GLMs
- formulation of GLMs, including the linear predictor, the link function, the offset term, the error term, the scale parameter and the prior weights
- typical model forms
- solving GLMs: maximum likelihood estimation and numerical techniques
- aliasing
- model diagnostics: standard errors and deviance tests.

Background
1.1 Traditional ratemaking methods in the United States are not statistically sophisticated. Claims experience for many lines of business is often analyzed using simple one-way and two-way analyses. Iterative methods known as minimum bias procedures, developed by actuaries in the 1960s, provide a significant improvement, but are still only part way toward a full statistical framework.
1.2 The classical linear model and many of the most common minimum bias procedures are, in fact, special cases of generalized linear models (GLMs).
1.3 The statistical framework of GLMs allows explicit assumptions to be made about the nature of the insurance data and its relationship with predictive variables. The method of solving GLMs is more technically efficient than iteratively standardized methods, which is not only elegant in theory but valuable in practice. In addition, GLMs provide statistical diagnostics which aid in selecting only significant variables and in validating model assumptions.
1.4 Today GLMs are widely recognized as the industry standard method for pricing private passenger auto and other personal lines and small commercial lines insurance in the European Union and many other markets. Most British, Irish and French auto insurers use GLMs to analyze their portfolios, and to the authors' knowledge GLMs are commonly used in Italy, the Netherlands, Scandinavia, Spain, Portugal, Belgium, Switzerland, South Africa, Israel and Australia.
The method is gaining popularity in Canada, Japan, Korea, Brazil, Singapore, Malaysia and eastern European countries.
1.5 The primary applications of GLMs in insurance analysis are ratemaking and underwriting. Circumstances that limit the ability to change rates at will (e.g. regulation) have increased the use of GLMs for target marketing analysis.

The failings of one-way analysis
1.6 In the past, actuaries have relied heavily on one-way analyses for pricing and monitoring performance.
1.7 A one-way analysis summarizes insurance statistics, such as frequency or loss ratio, for each value of each explanatory variable, but without taking account of the effect of other variables. Explanatory variables can be discrete or continuous. Discrete variables are generally referred to as "factors", with the values that each factor can take being referred to as "levels", and continuous variables are generally referred to as "variates". The use of variates is generally less common in insurance modeling.
1.8 One-way analyses can be distorted by correlations between rating factors. For example, young drivers may in general drive older cars. A one-way analysis of age of car may show high claims experience for older cars; however, this may result mainly from the fact that such older cars are in general driven more by high-risk younger drivers. Relativities based on one-way analyses of age of vehicle and age of driver would double-count the effect of age of driver. Traditional actuarial techniques for addressing this problem usually attempt to standardize the data in such a way as to remove the distorting effect of uneven business mix, for example by focusing on loss ratios on a one-way basis, or by standardizing for the effect of one or more factors. These methods are, however, only approximations.
1.9 One-way analyses also do not consider interdependencies between factors in the way they affect claims experience. These interdependencies, or interactions, exist when the effect of one factor varies depending on the levels of another factor. For example, the pure premium differential between men and women may differ by levels of age.
1.10 Multivariate methods, such as generalized linear models, adjust for correlations and allow investigation into interaction effects.

The failings of minimum bias procedures
1.11 In the 1960s, actuaries developed a ratemaking technique known as minimum bias procedures.[1]
These procedures impose a set of equations relating the observed data, the rating variables, and a set of parameters to be determined. An iterative procedure solves the system of equations by attempting to converge to the optimal solution. The reader seeking more information may refer to "The Minimum Bias Procedure: A Practitioner's Guide" by Sholom Feldblum and Dr J. Eric Brosius.[2]

[1] Bailey, Robert A. and LeRoy J. Simon, "Two Studies in Automobile Insurance Ratemaking," Proceedings of the Casualty Actuarial Society, XLVII, 1960.
[2] Feldblum, Sholom and J. Eric Brosius, "The Minimum Bias Procedure: A Practitioner's Guide," Casualty Actuarial Society Forum, Fall.
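As a rough illustration of the iterative procedure just described, the sketch below applies the multiplicative balance principle to two rating factors. It assumes NumPy, unit cell weights, and the four average severities used in the worked example later in this section (urban/rural by male/female). The function name `multiplicative_balance` is hypothetical, and this is a generic balance iteration, not the specific algorithm of the cited papers.

```python
import numpy as np


def multiplicative_balance(r, w, n_iter=100):
    """Multiplicative balance iteration for two rating factors (illustrative).

    r[i, j]: observed statistic for row level i (gender) and column level j (territory).
    w[i, j]: weight of the cell (e.g. exposure); unit weights are used below.
    Returns row relativities x and column relativities y chosen so that, for every
    level, the weighted observed total equals the weighted fitted total x[i] * y[j].
    """
    x = np.ones(r.shape[0])
    y = np.ones(r.shape[1])
    for _ in range(n_iter):
        # Balance each row: sum_j w*r = x_i * sum_j w*y_j
        x = (w * r).sum(axis=1) / (w * y[np.newaxis, :]).sum(axis=1)
        # Balance each column: sum_i w*r = y_j * sum_i w*x_i
        y = (w * r).sum(axis=0) / (w * x[:, np.newaxis]).sum(axis=0)
    return x, y


# Rows: male, female; columns: urban, rural (average claim severities).
r = np.array([[800.0, 500.0],
              [400.0, 200.0]])
w = np.ones_like(r)  # assumption: every cell carries equal weight

x, y = multiplicative_balance(r, w)
fitted = np.outer(x, y)
print(np.round(fitted, 2))
```

The balance equations force the fitted row and column totals to equal the observed ones. Here the fitted values therefore settle at (row total × column total) / grand total, for example 1300 × 1200 / 1900 ≈ 821.05 for urban males.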

1.12 Once an optimal solution is calculated, however, the minimum bias procedures give no systematic way of testing whether a particular variable influences the result with statistical significance. There is also no credible range provided for the parameter estimates. The minimum bias procedures lack a statistical framework which would allow actuaries to better assess the quality of their modeling work.

The connection of minimum bias to GLM
1.13 Stephen Mildenhall has written a comprehensive paper showing that many minimum bias procedures do correspond to generalized linear models.[3] The following table summarizes the correspondence for many of the more common minimum bias procedures. The GLM terminology "link function" and "error function" is explained in depth later in this section. In brief, these functions are key components for specifying a generalized linear model.

Minimum bias procedure | GLM link function | GLM error function
Multiplicative balance principle | Logarithmic | Poisson
Additive balance principle | Identity | Normal
Multiplicative least squares | Logarithmic | Normal
Multiplicative maximum likelihood with exponential density function | Logarithmic | Gamma
Multiplicative maximum likelihood with Normal density function | Logarithmic | Normal
Additive maximum likelihood with Normal density function | Identity | Normal

1.14 Not all minimum bias procedures have a generalized linear model analog, and vice versa. For example, the χ² additive and multiplicative minimum bias models have no corresponding generalized linear model analog.

Linear models
1.15 A GLM is a generalized form of a linear model. To understand the structure of generalized linear models it is helpful, therefore, to review classic linear models.
1.16 The purpose of both linear models (LMs) and generalized linear models is to express the relationship between an observed response variable, Y, and a number of covariates (also called predictor variables), X. Both models view the observations, $Y_i$, as being realizations of the random variable Y.

[3] Mildenhall, Stephen, "A Systematic Relationship between Minimum Bias and Generalized Linear Models," Proceedings of the Casualty Actuarial Society, LXXXVI.

1.17 Linear models conceptualize Y as the sum of its mean, μ, and a random variable, ε:
$$Y = \mu + \varepsilon$$
1.18 They assume that
a. the expected value of Y, μ, can be written as a linear combination of the covariates, X, and
b. the error term, ε, is Normally distributed with mean zero and variance σ².
1.19 For example, suppose a simple private passenger auto classification system has two categorical rating variables: territory (urban or rural) and gender (male or female). Suppose the observed average claim severities are:

       | Urban | Rural
Male   | 800   | 500
Female | 400   | 200

1.20 The response variable, Y, is the average claim severity. The two factors, territory and gender, each have two levels, resulting in the four covariates: male ($X_1$), female ($X_2$), urban ($X_3$), and rural ($X_4$). These indicator variables take the value 1 or 0. For example, the urban covariate, $X_3$, is equal to 1 if the territory is urban, and 0 otherwise.
1.21 The linear model seeks to express the observed item Y (in this case average claim severity) as a linear combination of a specified selection of the four variables, plus a Normal random variable ε with mean zero and variance σ², often written ε ~ N(0, σ²). One such model might be
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$$
1.22 However, this model has as many parameters as it does combinations of rating factor levels being considered, and there is a linear dependency between the four covariates $X_1, X_2, X_3, X_4$. This means that the model in the above form is not uniquely defined: if any arbitrary value k is added to both $\beta_1$ and $\beta_2$, and the same value k is subtracted from $\beta_3$ and $\beta_4$, the resulting model is equivalent.
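The linear dependency can be checked directly. This sketch assumes NumPy and the cell order male-urban, male-rural, female-urban, female-rural. It builds the four-column indicator matrix and confirms that its rank is 3 rather than 4. It then shows that shifting the parameters by an arbitrary k leaves every fitted value unchanged. The parameter values and k = 123 are purely illustrative.

```python
import numpy as np

# Columns: male (X1), female (X2), urban (X3), rural (X4).
# Rows: male-urban, male-rural, female-urban, female-rural.
X_full = np.array([[1, 0, 1, 0],
                   [1, 0, 0, 1],
                   [0, 1, 1, 0],
                   [0, 1, 0, 1]], dtype=float)

# X1 + X2 = X3 + X4 (both equal a column of ones), so only 3 columns are independent.
print(np.linalg.matrix_rank(X_full))  # 3

# Add k to beta1 and beta2, subtract k from beta3 and beta4: fitted values are identical.
beta = np.array([500.0, 200.0, 300.0, 0.0])  # illustrative values only
k = 123.0
shifted = beta + np.array([k, k, -k, -k])
print(np.allclose(X_full @ beta, X_full @ shifted))  # True
```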

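1.23 To make the model uniquely defined in the parameters, consider instead the model
$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$
1.24 This model is equivalent to assuming that there is an average response for men ($\beta_1$) and an average response for women ($\beta_2$), with the effect of being an urban policyholder (as opposed to being a rural one) having an additional additive effect ($\beta_3$) which is the same regardless of gender.
1.25 Alternatively, this could be thought of as a model which assumes an average response for the "base case" of women in rural areas ($\beta_2$), with additional additive effects for being male ($\beta_1 - \beta_2$) and for being in an urban area ($\beta_3$).
1.26 Thus the four observations can be expressed as the system of equations:
$$Y_1 = 800 = \beta_1 + \beta_3 + \varepsilon_1$$
$$Y_2 = 500 = \beta_1 + \varepsilon_2$$
$$Y_3 = 400 = \beta_2 + \beta_3 + \varepsilon_3$$
$$Y_4 = 200 = \beta_2 + \varepsilon_4$$
1.27 The parameters $\beta_1, \beta_2, \beta_3$ which best explain the observed data are then selected. For the classical linear model this is done by minimizing the sum of squared errors (SSE):
$$SSE = \varepsilon_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \varepsilon_4^2 = (800 - \beta_1 - \beta_3)^2 + (500 - \beta_1)^2 + (400 - \beta_2 - \beta_3)^2 + (200 - \beta_2)^2$$
1.28 This expression can be minimized by taking derivatives with respect to $\beta_1$, $\beta_2$ and $\beta_3$ and setting each of them to zero. The resulting system of three equations in three unknowns is:
$$\frac{\partial SSE}{\partial \beta_1} = -2(800 - \beta_1 - \beta_3) - 2(500 - \beta_1) = 0$$
$$\frac{\partial SSE}{\partial \beta_2} = -2(400 - \beta_2 - \beta_3) - 2(200 - \beta_2) = 0$$
$$\frac{\partial SSE}{\partial \beta_3} = -2(800 - \beta_1 - \beta_3) - 2(400 - \beta_2 - \beta_3) = 0$$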

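which can be solved to derive:
$$\beta_1 = 525, \quad \beta_2 = 175, \quad \beta_3 = 250$$

Vector and Matrix Notation
1.29 Formulating the system of equations above quickly becomes complex as both the number of observations and the number of covariates increase; consequently, vector notation is used to express these equations in compact form.
1.30 Let Y be a column vector with components corresponding to the observed values for the response variable:
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{pmatrix} = \begin{pmatrix} 800 \\ 500 \\ 400 \\ 200 \end{pmatrix}$$
1.31 Let $\mathbf{X}_1$, $\mathbf{X}_2$, and $\mathbf{X}_3$ denote the column vectors with components equal to the observed values for the respective indicator variables (e.g. the i-th element of $\mathbf{X}_1$ is 1 when the i-th observation is male, and 0 if female):
$$\mathbf{X}_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{X}_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{X}_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$
1.32 Let β denote a column vector of parameters, and for a given set of parameters let ε be the vector of residuals:
$$\boldsymbol\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad \boldsymbol\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{pmatrix}$$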

1.33 Then the system of equations takes the form:
$$\mathbf{Y} = \beta_1 \mathbf{X}_1 + \beta_2 \mathbf{X}_2 + \beta_3 \mathbf{X}_3 + \boldsymbol\varepsilon$$
1.34 To simplify this further, the vectors $\mathbf{X}_1$, $\mathbf{X}_2$, and $\mathbf{X}_3$ can be aggregated into a single matrix X. This matrix is called the design matrix and in the example above would be defined as:
$$\mathbf{X} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
1.35 Appendix A shows an example of the form of the design matrix X when explanatory variables include continuous variables, or "variates".
1.36 The system of equations takes the form $\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$.
1.37 In the case of the linear model, the goal is to find values of the components of β which minimize the sum of squares of the components of ε. If there are n observations and p parameters in the model, ε will have n components and β will have p components (p < n).
1.38 The basic ingredients for a linear model thus consist of two elements:
a. a set of assumptions about the relationship between Y and the predictor variables, and
b. an objective function which is to be optimized in order to solve the problem.
Standard statistical theory defines the objective function to be the likelihood function. In the case of the classical linear model with an assumed Normal error, it can be shown that the parameters which minimize the sum of squared errors also maximize the likelihood.
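With the design matrix X above, the least-squares solution can be sketched in a few lines. This sketch assumes NumPy, which the text itself does not prescribe. Setting the derivatives of SSE to zero gives the normal equations $(\mathbf{X}^T\mathbf{X})\boldsymbol\beta = \mathbf{X}^T\mathbf{Y}$. Solving them should reproduce $\beta_1 = 525$, $\beta_2 = 175$, $\beta_3 = 250$.

```python
import numpy as np

# Design matrix (columns: male, female, urban) and responses, rows in the order
# male-urban, male-rural, female-urban, female-rural.
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)
Y = np.array([800.0, 500.0, 400.0, 200.0])

# Minimizing SSE = ||Y - X beta||^2 leads to the normal equations (X^T X) beta = X^T Y.
beta = np.linalg.solve(X.T @ X, X.T @ Y)
fitted = X @ beta
residuals = Y - fitted

print(beta)       # approximately [525, 175, 250]
print(fitted)     # approximately [775, 525, 425, 175]
print(residuals)  # approximately [25, -25, -25, 25]
```

For larger problems, a QR- or SVD-based solver such as `np.linalg.lstsq` is usually preferred to forming $\mathbf{X}^T\mathbf{X}$ explicitly, because it is numerically more stable. For this small, well-conditioned example the two approaches agree.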

Classical linear model assumptions
1.39 Linear models assume all observations are independent and each comes from a Normal distribution.
1.40 This assumption does not relate to the aggregate of the observed item, but to each observation individually. An example may help illustrate this distinction.
[Figure: Distribution of individual observations (Women, Men)]
1.41 An examination of average claim amounts by gender may identify that average claim amounts for men are Normally distributed, as are average claim amounts for women, and that the mean of the distribution for men is twice the mean of the distribution for women. The total distribution of average claim amounts across all men and women is not Normally distributed. The only distributions of interest are those of the two separate classes. In this case there are only two classes being considered, but in a more complicated model there would be one such class for each combination of the rating factors being considered.
1.42 Linear models assume that the mean is a linear combination of the covariates, and that each component of the random variable has a common variance.

1.43 The linear model can be written as follows:
$$\mathbf{Y} = E[\mathbf{Y}] + \boldsymbol\varepsilon, \quad E[\mathbf{Y}] = \mathbf{X}\boldsymbol\beta$$
1.44 McCullagh and Nelder outline the explicit assumptions as follows:[4]
(LM1) Random component: Each component of Y is independent and is Normally distributed. The mean, $\mu_i$, of each component is allowed to differ, but they all have common variance σ².
(LM2) Systematic component: The p covariates are combined to give the "linear predictor" η: $\boldsymbol\eta = \mathbf{X}\boldsymbol\beta$.
(LM3) Link function: The relationship between the random and systematic components is specified via a link function. In the linear model the link function is equal to the identity function, so that: $E[\mathbf{Y}] \equiv \boldsymbol\mu = \boldsymbol\eta$.

[4] McCullagh, P. and J. A. Nelder, Generalized Linear Models, 2nd Ed., Chapman & Hall/CRC, 1989.

1.45 The identity link function assumption in (LM3) may appear to be superfluous at this point, but it will become more meaningful when discussing the generalization to GLMs.

Limitations of Linear Models
1.46 Linear models pose quite tractable problems that can be easily solved with well-known linear algebra approaches. However, it is easy to see that the required assumptions are not easy to guarantee in applications:
- It is difficult to assert Normality and constant variance for response variables. Classical linear regression attempts to transform data so that these conditions hold. For example, Y may not satisfy the hypotheses but ln(Y) may. However, there is no reason why such a transformation should exist.
- The values for the response variable may be restricted to be positive. The assumption of Normality violates this restriction.
- If the response variable is strictly non-negative, then intuitively the variance of Y tends to zero as the mean of Y tends to zero. That is, the variance is a function of the mean.

- The additivity of effects encapsulated in the second (LM2) and third (LM3) assumptions is not realistic for a variety of applications. For example, suppose the response variable is equal to the area of the wings of a butterfly and the predictor variables are the width and length of the wings. Clearly, these two predictor variables do not enter additively; rather, they enter multiplicatively. More relevantly, many insurance risks tend to vary multiplicatively with rating factors (this is discussed in more detail in Section 2).

Generalized linear model assumptions
1.47 GLMs consist of a wide range of models that include linear models as a special case. The restrictive LM assumptions of Normality, constant variance and additivity of effects are removed. Instead, the response variable is assumed to be a member of the exponential family of distributions.[5] Also, the variance is permitted to vary with the mean of the distribution. Finally, the effect of the covariates on the response variable is assumed to be additive on a transformed scale. Thus the analogs to the linear model assumptions (LM1), (LM2), and (LM3) are as follows.
(GLM1) Random component: Each component of Y is independent and is from one of the exponential family of distributions.
(GLM2) Systematic component: The p covariates are combined to give the linear predictor η: $\boldsymbol\eta = \mathbf{X}\boldsymbol\beta$.
(GLM3) Link function: The relationship between the random and systematic components is specified via a link function, g, that is differentiable and monotonic, such that: $E[\mathbf{Y}] \equiv \boldsymbol\mu = g^{-1}(\boldsymbol\eta)$.
1.48 Most statistical texts write the relationship in (GLM3) with the link function on the left side of the equation, as $g(\boldsymbol\mu) = \boldsymbol\eta$. Written as above, with μ on the left, the systematic component appears on the right side through the inverse function, $g^{-1}$.

[5] The exponential family is a broader class of distributions sharing the same density form and including Normal, Poisson, gamma, inverse Gaussian, binomial, exponential and other distributions.

Exponential Family of Distributions
1.49 Formally, the exponential family of distributions is a two-parameter family defined as:
$$f_i(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}$$
where $a_i(\phi)$, $b(\theta_i)$, and $c(y_i, \phi)$ are functions specified in advance; $\theta_i$ is a parameter related to the mean; and φ is a scale parameter related to the variance. This formal definition is further explored in Appendix B. For practical purposes it is useful to know that a member of the exponential family has the following two properties:
a. the distribution is completely specified in terms of its mean and variance;
b. the variance of $Y_i$ is a function of its mean.
1.50 This second property is emphasized by expressing the variance as:
$$Var(Y_i) = \frac{\phi V(\mu_i)}{\omega_i}$$
where V(x), called the variance function, is a specified function; the parameter φ scales the variance; and $\omega_i$ is a constant that assigns a weight, or credibility, to observation i.
1.51 A number of familiar distributions belong to the exponential family: the Normal, Poisson, binomial, gamma, and inverse Gaussian.[6] The corresponding value of the variance function is summarized in the table below:

Distribution | Normal | Poisson | Gamma | Binomial | Inverse Gaussian
V(x) | 1 | x | x² | x(1 − x)* | x³
* where the number of trials = 1

1.52 A special member of the exponential family is the Tweedie distribution. The Tweedie distribution has a variance function proportional to $\mu^p$ (where p < 0, 1 < p < 2, or p > 2); for 1 < p < 2 it also has a point mass at zero. This distribution is typically used to model pure premium data directly and is discussed further in Appendix C.

[6] A notable exception to this list is the lognormal distribution, which does not belong to the exponential family.
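The variance functions in the table can be written down directly. This is a minimal sketch assuming NumPy. The dictionary and the helper `glm_variance` are hypothetical names, and the binomial entry assumes a single trial, as in the table. The helper evaluates $Var(Y_i) = \phi V(\mu_i)/\omega_i$.

```python
import numpy as np

# Variance functions V(mu) from the table (binomial with a single trial).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: np.ones_like(mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "binomial": lambda mu: mu * (1.0 - mu),
    "inverse_gaussian": lambda mu: mu ** 3,
}


def glm_variance(mu, family, phi=1.0, weight=1.0):
    """Var(Y_i) = phi * V(mu_i) / omega_i for one value or an array of means."""
    mu = np.asarray(mu, dtype=float)
    return phi * VARIANCE_FUNCTIONS[family](mu) / weight


# The same means under different families: the assumed variance grows ever faster with mu.
for family in ("normal", "poisson", "gamma", "inverse_gaussian"):
    print(family, glm_variance([1.0, 10.0, 100.0], family))
```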

1.53 The choice of the variance function affects the results of the GLM. For example, the graph below considers the result of fitting three different and very simple GLMs to three data points. In each case the model form selected is a two-parameter model (the intercept and slope of a line), and the three points represent the individual observations, with the observed value Y shown on the y-axis for different values of a single continuous explanatory variable shown on the x-axis.
[Figure: Effect of varying the error term (simple example). Series: Data, Normal, Poisson, Gamma]
1.54 The three GLMs considered have a Normal, Poisson and gamma variance function respectively. It can be seen that the GLM with a Normal variance function (which assumes that each observation has the same fixed variance) has produced fitted values which are attracted to the original data points with equal weight. By contrast, the GLM with a Poisson error assumes that the variance increases with the expected value of each observation. Observations with smaller expected values have a smaller assumed variance, which results in greater credibility when estimating the parameters. The model has thus produced fitted values which are more influenced by the observation on the left (with smaller expected value) than by the observation on the right (which has a higher expected value and hence a higher assumed variance).
1.55 It can be seen that the GLM with an assumed gamma variance function is even more strongly influenced by the point on the left than the point on the right, since that model assumes the variance increases with the square of the expected value.
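The behaviour described above can be reproduced with a small iteratively reweighted least squares (IRLS) fit. The three data points below are hypothetical, since the source does not give the values behind its graph, and the sketch assumes NumPy. The helper `fit_identity_link` is likewise a hypothetical name. With an identity link, each IRLS step is a weighted least-squares fit with weights 1/V(μ). Points with a smaller fitted mean therefore receive more weight under the Poisson and gamma variance functions.

```python
import numpy as np


def fit_identity_link(x, y, variance, n_iter=100):
    """Fit mu = a + b*x by IRLS with an identity link.

    For an identity link the working response is y itself and the working
    weights are 1 / V(mu), so each step is a weighted least-squares fit.
    """
    X = np.column_stack([np.ones_like(x), x])
    mu = y.copy()  # start from the observed values
    for _ in range(n_iter):
        w = 1.0 / variance(mu)
        XtW = X.T * w  # equivalent to X.T @ diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        mu = X @ beta
    return beta, mu


# Hypothetical observations of one continuous explanatory variable.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 4.0, 5.0])

fits = {
    "Normal": fit_identity_link(x, y, lambda m: np.ones_like(m)),
    "Poisson": fit_identity_link(x, y, lambda m: m),
    "Gamma": fit_identity_link(x, y, lambda m: m ** 2),
}
for name, (beta, mu) in fits.items():
    print(name, np.round(mu, 3), "left-point residual:", round(y[0] - mu[0], 3))
```

With these points, the absolute residual at the left-hand observation is about 0.33 under the Normal fit, roughly 0.1 under the Poisson fit, and smaller still under the gamma fit. This mirrors the pattern described in the text.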

1.56 A further, rather more realistic, example illustrates how selecting an appropriate variance function can improve the accuracy of a model. This example considers an artificially generated dataset which represents an insurance portfolio. This dataset contains several rating factors (some of which are correlated), and in each case the true effect of the rating factor is assumed to be known. Claims experience (in this case average claim size experience) is then randomly generated for each policy using a gamma distribution, with the mean in each case being that implied by the assumed effect of the rating factors. The claims experience is then analyzed using three models to see how closely the results of each model relate to the (in this case known) true factor effect.
1.57 The three methods considered are:
- a one-way analysis
- a GLM with assumed Normal variance function
- a GLM with assumed gamma variance function.
1.58 The results for one of the several rating factors considered are shown on the graph below. It can be seen that, owing to the correlations between the rating factors in the data, the one-way analysis is badly distorted. The GLM with an assumed Normal distribution is closer to the correct relativities, but it is the GLM with an assumed gamma variance function which yields results that are the closest to the true effect.
[Figure: Effect of varying the error term (insurance rating factor example). y-axis: log of multiplier; x-axis: factor levels A to E; series: True effect, One way, GLM / Normal, GLM / Gamma]
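The distortion of a one-way analysis by correlated factors can be seen in a deterministic toy version of the young-driver/old-car example given earlier. Everything here is hypothetical. The exposures are chosen so that young drivers mostly drive old cars. Claims are set equal to their expected values, with no random noise, unlike the gamma simulation described above. Claim frequency rather than severity is modeled. The sketch assumes NumPy, and `poisson_log_glm` is a hypothetical helper that fits a multiplicative Poisson GLM by IRLS with a log-exposure offset.

```python
import numpy as np


def poisson_log_glm(X, counts, offset, n_iter=50):
    """Multiplicative Poisson GLM (log link, offset) fitted by IRLS.

    Column 0 of X is assumed to be an intercept; it is started at the overall frequency.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(counts.sum() / np.exp(offset).sum())
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)
        z = X @ beta + (counts - mu) / mu  # working response, offset removed
        w = mu                             # working weights for Poisson with log link
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


# Columns: intercept, young driver, old car.
# Rows: young/new car, young/old car, older driver/new car, older driver/old car.
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [1, 0, 0],
              [1, 0, 1]], dtype=float)
exposure = np.array([100.0, 400.0, 400.0, 100.0])  # young drivers mostly in old cars
true_rel = np.array([0.10, 2.0, 1.2])              # base frequency, young, old car
claims = exposure * np.exp(X @ np.log(true_rel))   # expected claims, no noise

# One-way view of car age, ignoring driver age.
old = X[:, 2] == 1
one_way = (claims[old].sum() / exposure[old].sum()) / (claims[~old].sum() / exposure[~old].sum())

beta = poisson_log_glm(X, claims, np.log(exposure))
print("one-way old-car relativity:", round(one_way, 3))
print("GLM base frequency and relativities:", np.round(np.exp(beta), 3))
```

The one-way old-car relativity comes out at 1.8 against a true value of 1.2, because the old-car cells are dominated by young drivers. The GLM adjusts for driver age at the same time and recovers 0.1, 2.0 and 1.2.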

1.59 In addition to the variance function V(x), two other parameters define the variance of each observation: the scale parameter φ and the prior weights $\omega_i$:
$$Var[Y_i] = \frac{\phi V(\mu_i)}{\omega_i}$$

Prior weights
1.60 The prior weights allow information about the known credibility of each observation to be incorporated in the model. For example, if modeling claims frequency, one observation might relate to one month's exposure, and another to one year's exposure. There is more information and less variability in the observation relating to the longer exposure period, and this can be incorporated in the model by defining $\omega_i$ to be the exposure of each observation. In this way observations with higher exposure are deemed to have lower variance, and the model will consequently be more influenced by these observations.
1.61 An example demonstrates the appropriateness of this more clearly. Consider a set of observations for personal auto claims under some classification system. Let cell i denote some generic cell defined by this classification system. To analyze frequency let:
$m_{ik}$ be the number of claims arising from the k-th unit of exposure in cell i
$\omega_i$ be the number of exposures in cell i
$Y_i$ be the observed claim frequency in cell i:
$$Y_i = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} m_{ik}$$
1.62 If the random process generating $m_{ik}$ is Poisson with frequency $f_i$ for all exposures k, then
$$E[m_{ik}] = f_i, \quad Var[m_{ik}] = f_i$$

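1.63 Assuming the exposures are independent, then
$$E[Y_i] = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} E[m_{ik}] = \frac{1}{\omega_i}\, \omega_i f_i = f_i = \mu_i$$
$$Var[Y_i] = \frac{1}{\omega_i^2} \sum_{k=1}^{\omega_i} Var[m_{ik}] = \frac{1}{\omega_i^2}\, \omega_i f_i = \frac{\mu_i}{\omega_i}$$
1.64 So in this case V(μ) = μ, φ = 1, and the prior weights are the exposures in cell i.
1.65 An alternative example would be to consider claims severity. Let
$z_{ik}$ be the claim size of the k-th claim in cell i
$\omega_i$ be the number of claims in cell i
$Y_i$ be the observed mean claim size in cell i:
$$Y_i = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} z_{ik}$$
1.66 This time assume that the random process generating each individual claim is gamma distributed. Denoting $E[z_{ik}] = m_i$ and $Var[z_{ik}] = \sigma^2 m_i^2$, and assuming each claim is independent, then
$$E[Y_i] = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} E[z_{ik}] = \frac{1}{\omega_i}\, \omega_i m_i = m_i = \mu_i$$
$$Var[Y_i] = \frac{1}{\omega_i^2} \sum_{k=1}^{\omega_i} Var[z_{ik}] = \frac{1}{\omega_i^2}\, \omega_i \sigma^2 m_i^2 = \frac{\sigma^2 \mu_i^2}{\omega_i}$$
1.67 So for severity with a gamma distribution, the variance of $Y_i$ follows the general form for all exponential distributions, with V(μ) = μ², φ = σ², and prior weight equal to the number of claims in cell i.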
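A small simulation sketch can check the two variance results above. It assumes NumPy and arbitrary illustrative parameter values: f = 0.1 and ω = 50 exposures per cell for frequency, and mean 1000, σ² = 0.5 and 20 claims per cell for severity. Each simulated cell shares the same true parameters, so the spread of $Y_i$ across cells estimates $Var[Y_i]$.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)
n_cells = 50_000  # number of simulated cells with identical true parameters

# Frequency: omega exposure units per cell, each producing Poisson(f) claims.
f, omega = 0.1, 50
m = rng.poisson(lam=f, size=(n_cells, omega))
Y_freq = m.sum(axis=1) / omega
print(Y_freq.mean(), f)          # E[Y] close to f
print(Y_freq.var(), f / omega)   # Var[Y] close to f / omega, i.e. phi = 1, V(mu) = mu

# Severity: n_claims claims per cell, each gamma with mean m_sev and variance sigma2 * m_sev^2.
m_sev, sigma2, n_claims = 1000.0, 0.5, 20
shape, scale = 1.0 / sigma2, m_sev * sigma2  # gamma mean = shape*scale, variance = shape*scale^2
z = rng.gamma(shape, scale, size=(n_cells, n_claims))
Y_sev = z.mean(axis=1)
print(Y_sev.mean(), m_sev)                        # E[Y] close to m
print(Y_sev.var(), sigma2 * m_sev**2 / n_claims)  # Var[Y] close to sigma^2 m^2 / omega
```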

1.68 Prior weights can also be used to attach a lower credibility to a part of the data which is known to be less reliable.

The scale parameter
1.69 In some cases (e.g. the Poisson distribution) the scale parameter φ is identically equal to 1 and falls out of the GLM analysis entirely. However, in general, and for the other familiar exponential distributions, φ is not known in advance, and in these cases it must be estimated from the data.
1.70 Estimation of the scale parameter is not actually necessary in order to solve for the GLM parameters; however, in order to determine certain statistics (such as standard errors, discussed below) it is necessary to estimate φ.
1.71 φ can be treated as another parameter and estimated by maximum likelihood. The drawback of this approach is that it is not possible to derive an explicit formula for φ, and the maximum likelihood estimation process can take considerably longer.
1.72 An alternative is to use an estimate of φ, such as
a. the moment estimator (Pearson χ² statistic), defined as
$$\hat\phi = \frac{1}{n - p} \sum_i \frac{\omega_i (Y_i - \mu_i)^2}{V(\mu_i)}$$
b. the total deviance estimator
$$\hat\phi = \frac{D}{n - p}$$
where D, the total deviance, is defined later in this paper.

Link Functions
1.73 In practice, when using classical linear regression, practitioners sometimes attempt to transform data to satisfy the requirements of Normality and constant variance of the response variable and additivity of effects. Generalized linear models, on the other hand, merely require that there be a link function that guarantees the last condition of additivity. Whereas (LM3) requires that the mean of Y be additive in the covariates, the generalization (GLM3) instead requires that some transformation of the mean, written g(μ), be additive in the covariates.
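The difference between transforming the data and transforming the mean is visible even in a single rating cell. This sketch assumes NumPy and uses hypothetical severities. Regressing ln(Y) on an intercept estimates the mean of the logs, which back-transforms to the geometric mean. An intercept-only GLM with a log link instead models ln(E[Y]). With equal prior weights, its maximum likelihood estimate of the common mean is the ordinary sample mean, whatever variance function is chosen.

```python
import numpy as np

# Hypothetical severities observed in a single rating cell.
y = np.array([100.0, 300.0, 200.0, 800.0])

# Transforming the data: an intercept-only regression of ln(Y) estimates E[ln Y].
mean_of_logs = np.mean(np.log(y))
# Transforming the mean (log link): the intercept-only GLM estimates ln(E[Y]).
log_of_mean = np.log(np.mean(y))

print(np.exp(mean_of_logs))  # geometric mean, below the arithmetic mean
print(np.exp(log_of_mean))   # arithmetic mean, 350.0
```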

1.74 It is more helpful to consider $\mu_i$ as a function of the linear predictor, so typically it is the inverse of g(x) which is considered:
$$\mu_i = g^{-1}(\eta_i)$$
1.75 In theory a different link function could be used for each observation i, but in practice this is rarely done.
1.76 The link function must satisfy the condition that it be differentiable and monotonic (either strictly increasing or strictly decreasing). Some typical choices for a link function include:

Link | g(x) | g⁻¹(x)
Identity | x | x
Log | ln(x) | eˣ
Logit | ln(x / (1 − x)) | eˣ / (1 + eˣ)
Reciprocal | 1/x | 1/x

1.77 Each error structure has associated with it a "canonical" link function which simplifies the mathematics of solving GLMs analytically. These are discussed in Appendix D. When solving GLMs using modern computer software, however, the use of canonical link functions is not important, and any pairing of link function and variance function which is deemed appropriate may be selected.
1.78 The log-link function has the appealing property that the effects of the covariates are multiplicative. Indeed, writing g(x) = ln(x), so that $g^{-1}(x) = e^x$, results in
$$\mu_i = g^{-1}\Big(\sum_j X_{ij}\beta_j\Big) = \exp(X_{i1}\beta_1 + \dots + X_{ip}\beta_p) = \exp(X_{i1}\beta_1) \cdot \exp(X_{i2}\beta_2) \cdots \exp(X_{ip}\beta_p)$$
1.79 In other words, when a log link function is used, rather than estimating additive effects, the GLM estimates logs of multiplicative effects.
1.80 As mentioned previously, alternative choices of link functions and error structures can yield GLMs which are equivalent to a number of the minimum bias models as well as a simple linear model (see the section "The connection of minimum bias to GLM").
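The link functions in the table and the multiplicative property of the log link can be checked numerically. This is a sketch assuming NumPy. The `LINKS` dictionary and the covariate and parameter values are hypothetical.

```python
import numpy as np

# Each entry: (g, g_inverse); g maps the mean mu to the linear predictor eta.
LINKS = {
    "identity": (lambda x: x, lambda x: x),
    "log": (np.log, np.exp),
    "logit": (lambda x: np.log(x / (1 - x)), lambda x: np.exp(x) / (1 + np.exp(x))),
    "reciprocal": (lambda x: 1 / x, lambda x: 1 / x),
}

mu = np.array([0.05, 0.3, 0.8])
for name, (g, g_inv) in LINKS.items():
    # Each g is invertible on its domain: g_inv(g(mu)) returns mu.
    assert np.allclose(g_inv(g(mu)), mu), name

# Under the log link the additive linear predictor becomes a product of factors.
x = np.array([1.0, 1.0, 0.0])  # hypothetical covariate values for one observation
beta = np.array([np.log(0.1), np.log(2.0), np.log(1.5)])
g_inv = LINKS["log"][1]
print(g_inv(x @ beta), np.prod(np.exp(x * beta)))  # both equal 0.1 * 2.0 = 0.2
```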

The offset term
1.81 There are occasions when the effect of an explanatory variable is known, and rather than estimating parameters in respect of this variable it is appropriate to include information about this variable in the model as a known effect. This can be achieved by introducing an "offset term" ξ into the definition of the linear predictor η:
$$\boldsymbol\eta = \mathbf{X}\boldsymbol\beta + \boldsymbol\xi$$
which gives
$$E[\mathbf{Y}] = \boldsymbol\mu = g^{-1}(\boldsymbol\eta) = g^{-1}(\mathbf{X}\boldsymbol\beta + \boldsymbol\xi)$$
1.82 A common example of the use of an offset term is when fitting a multiplicative GLM to the observed number, or count, of claims (as opposed to claim frequency). Each observation may relate to a different period of policy exposure. An observation relating to one month's exposure will obviously have a lower expected number of claims (all other factors being equal) than an observation relating to a year's exposure. To make appropriate allowance for this, the assumption that the expected count of claims increases in proportion to the exposure of an observation (all other factors being equal) can be introduced in a multiplicative GLM by setting the offset term ξ to be equal to the log of the exposure of each observation, giving:
$$E[Y_i] = g^{-1}\Big(\sum_j X_{ij}\beta_j + \xi_i\Big) = \exp\Big(\sum_j X_{ij}\beta_j + \ln e_i\Big) = \exp\Big(\sum_j X_{ij}\beta_j\Big) \cdot e_i$$
where $e_i$ = the exposure for observation i.
1.83 In the particular case of a Poisson multiplicative GLM, it can be shown that modeling claim counts with an offset term equal to the log of the exposure and prior weights set to 1 produces identical results to modeling claim frequencies with no offset term but with prior weights set equal to the exposure of each observation.
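The Poisson equivalence just stated can be verified with a small IRLS fit. This sketch assumes NumPy. The helper `poisson_log_irls`, the design (intercept, male, urban) and the exposure and claim figures are all hypothetical. For a Poisson log-link model the working weights are ω·μ and the working response is Xβ + (y − μ)/μ. The two parameterizations below produce exactly the same weights and working responses at every iteration, so they cannot end up at different estimates.

```python
import numpy as np


def poisson_log_irls(X, y, weights, offset, n_iter=50):
    """Poisson GLM with log link, prior weights and an offset, fitted by IRLS.

    Column 0 of X is assumed to be an intercept (used only for the starting value).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.sum(weights * y) / np.sum(weights * np.exp(offset)))
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)
        z = X @ beta + (y - mu) / mu  # working response with the offset removed
        w = weights * mu              # omega / (V(mu) g'(mu)^2) with V(mu) = mu, g'(mu) = 1/mu
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)  # intercept, male, urban (hypothetical)
exposure = np.array([120.0, 80.0, 150.0, 60.0])
counts = np.array([18.0, 7.0, 15.0, 3.0])

# (a) claim counts, offset = ln(exposure), prior weights = 1
b_counts = poisson_log_irls(X, counts, np.ones(4), np.log(exposure))
# (b) claim frequencies, no offset, prior weights = exposure
b_freq = poisson_log_irls(X, counts / exposure, exposure, np.zeros(4))

print(np.exp(b_counts))
print(np.exp(b_freq))  # identical to the line above
```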

Structure of a generalized linear model
1.84 In summary, the assumed structure of a GLM can be specified as:
$$\mu_i = E[Y_i] = g^{-1}\Big(\sum_j X_{ij}\beta_j + \xi_i\Big)$$
$$Var[Y_i] = \frac{\phi V(\mu_i)}{\omega_i}$$
where
$Y_i$ is the vector of responses
g(x) is the link function: a specified invertible function which relates the expected response to the linear combination of observed factors
$X_{ij}$ is a matrix (the "design matrix") produced from the factors
$\beta_j$ is a vector of model parameters, which is to be estimated
$\xi_i$ is a vector of known effects or "offsets"
φ is a parameter to scale the function V(x)
V(x) is the variance function
$\omega_i$ is the prior weight that assigns a credibility or weight to each observation.
1.85 The vector of responses $Y_i$, the design matrix $X_{ij}$, the prior weights $\omega_i$, and the offset term $\xi_i$ are based on data in a manner determined by the practitioner. The assumptions which then further define the form of the model are the link function g(x), the variance function V(x), and whether φ is known or to be estimated.

Typical GLM model forms
1.86 The typical model form for modeling insurance claim counts or frequencies is a multiplicative Poisson. As well as being a commonly assumed distribution for claim numbers, the Poisson distribution also has a particular feature which makes it intuitively appropriate, in that it is invariant to measures of time. In other words, measuring frequencies per month and measuring frequencies per year will yield the same results using a Poisson multiplicative GLM. This is not true of some other distributions such as gamma.
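The time invariance of the multiplicative Poisson model can be illustrated by refitting the same hypothetical claims twice, once with exposure measured in months and once in years. Exposures are used as prior weights, as in the typical frequency model form. This is a sketch assuming NumPy, and `poisson_log_irls` is a hypothetical helper. Only the intercept changes (by ln 12), so the factor relativities are unaffected.

```python
import numpy as np


def poisson_log_irls(X, y, weights, n_iter=50):
    """Poisson GLM with log link fitted by IRLS (column 0 of X assumed to be an intercept)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.sum(weights * y) / np.sum(weights))  # start at the overall frequency
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu  # working response
        w = weights * mu              # working weights for Poisson with log link
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


# Hypothetical data: columns = intercept, male, urban.
X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)
claims = np.array([18.0, 7.0, 15.0, 3.0])
exposure_months = np.array([1440.0, 960.0, 1800.0, 720.0])
exposure_years = exposure_months / 12.0

b_month = poisson_log_irls(X, claims / exposure_months, exposure_months)
b_year = poisson_log_irls(X, claims / exposure_years, exposure_years)

print(b_year - b_month)                         # approximately [ln(12), 0, 0]
print(np.exp(b_month[1:]), np.exp(b_year[1:]))  # same relativities either way
```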

1.87 In the case of claim frequencies, the prior weights are typically set to be the exposure of each record. In the case of claim counts, the offset term is set to be the log of the exposure.
1.88 A common model form for modeling insurance severities is a multiplicative gamma. As well as often being appropriate because of its general form, the gamma distribution also has an intuitively attractive property for modeling claim amounts, since it is invariant to measures of currency. In other words, measuring severities in dollars and measuring severities in cents will yield the same results using a gamma multiplicative GLM. This is not true of some other distributions such as Poisson.
1.89 The typical model form for modeling retention and new business conversion is a logit link function and binomial error term (together referred to as a logistic model). The logit link function maps outcomes from the range (0, 1) to (−∞, +∞) and is consequently invariant to measuring successes or failures. If the y-variate being modeled is generally close to zero, and if the results of a model are going to be used qualitatively rather than quantitatively, it may also be possible to use a multiplicative Poisson model form as an approximation, given that the model output from a multiplicative GLM can be rather easier to explain to a non-technical audience.
1.90 The table below summarizes some typical model forms.

Y | Claim frequencies | Claim numbers or counts | Average claim amounts | Probability (e.g. of renewing)
Link function g(x) | ln(x) | ln(x) | ln(x) | ln(x / (1 − x))
Error | Poisson | Poisson | Gamma | Binomial
Scale parameter φ | 1 | 1 | Estimated | 1
Variance function V(x) | x | x | x² | x(1 − x)*
Prior weights ω | Exposure | 1 | # of claims | 1
Offset ξ | 0 | ln(exposure) | 0 | 0
* where the number of trials = 1, or x(t − x)/t where the number of trials = t
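Similarly, the currency invariance of the multiplicative gamma model can be sketched by fitting hypothetical average severities first in dollars and then in cents, with claim counts as prior weights. NumPy is assumed, and `gamma_log_irls` is a hypothetical helper. For a gamma log-link model the IRLS working weights reduce to the prior weights themselves. Only the intercept moves (by ln 100).

```python
import numpy as np


def gamma_log_irls(X, y, weights, n_iter=100):
    """Multiplicative gamma GLM (log link) fitted by IRLS; column 0 of X is an intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.sum(weights * y) / np.sum(weights))  # start at the weighted mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu  # working response: eta + (y - mu) * g'(mu)
        w = weights                   # omega / (V(mu) g'(mu)^2) = omega / (mu^2 * mu^-2) = omega
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


# Hypothetical data: intercept, male, urban; average severities with claim counts as weights.
X = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]], dtype=float)
severity_dollars = np.array([800.0, 500.0, 400.0, 200.0])
n_claims = np.array([10.0, 25.0, 8.0, 30.0])

b_dollars = gamma_log_irls(X, severity_dollars, n_claims)
b_cents = gamma_log_irls(X, 100.0 * severity_dollars, n_claims)

print(b_cents - b_dollars)                         # approximately [ln(100), 0, 0]
print(np.exp(b_dollars[1:]), np.exp(b_cents[1:]))  # identical male and urban relativities
```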

GLM maximum likelihood estimators
1.91 Having defined a model form in terms of X, g(x), ξ, V(x), φ, and ω, and given a set of observations Y, the components of β are derived by maximizing the likelihood function (or, equivalently, the logarithm of the likelihood function). In essence, this method seeks to find the parameters which, when applied to the assumed model form, produce the observed data with the highest probability.
1.92 The likelihood is defined to be the product of probabilities of observing each value of the y-variate. For continuous distributions such as the Normal and gamma distributions, the probability density function is used in place of the probability. It is usual to consider the log of the likelihood since, being a summation across observations rather than a product, this yields more manageable calculations (and any maximum of the likelihood is also a maximum of the log-likelihood). Maximum likelihood estimation in practice, therefore, seeks to find the values of the parameters that maximize this log-likelihood.
1.93 In simple examples the procedure for maximizing likelihood involves finding the solution to a system of equations with linear algebra. In practice, the large number of observations typically being considered means that this is rarely done. Instead, numerical techniques (and in particular multi-dimensional Newton-Raphson algorithms) are used. Appendix E shows the system of equations for maximizing the likelihood function in the general case of an exponential distribution.
1.94 An explicitly solved illustrative example and a discussion of numerical techniques used with large datasets are set out below.

Solving simple examples
1.95 To understand the mechanics involved in solving a GLM, a concrete example is presented. Consider the same four observations discussed in a previous section for average claim severity:

       | Urban | Rural
Male   | 800   | 500
Female | 400   | 200

1.96 The general procedure for solving a GLM involves the following steps:
a. Specify the design matrix X and the vector of parameters β
b. Choose the error structure and link function
c. Identify the likelihood function
d. Take the logarithm to convert the product of many terms into a sum

e. Maximize the logarithm of the likelihood function by taking partial derivatives with respect to each parameter, setting them to zero and solving the resulting system of equations
f. Compute the predicted values.

1.97 Recall that the vector of observations, the design matrix, and the vector of parameters are as follows:

$$Y=\begin{pmatrix}800\\500\\400\\200\end{pmatrix},\qquad X=\begin{pmatrix}1&0&1\\1&0&0\\0&1&1\\0&1&0\end{pmatrix},\qquad \beta=\begin{pmatrix}\beta_1\\\beta_2\\\beta_3\end{pmatrix}$$

where the rows correspond to urban male, rural male, urban female and rural female respectively, the first column of X indicates if an observation is male or not, the second column indicates whether the observation is female, and the last column specifies if the observation is in an urban territory or not.

1.98 The following three alternative model structures are illustrated:
- Normal error structure with an identity link function
- Poisson error structure with a log link function
- Gamma error structure with an inverse link function.

1.99 These three model forms may not necessarily be appropriate models to use in practice; instead they illustrate the theory involved.

1.100 In each case the elements of ω (the prior weights) will be assumed to be 1, and the offset term ξ assumed to be zero, and therefore these terms will, in this example, be ignored.

Normal error structure with an identity link function

1.101 The classical linear model case assumes a Normal error structure and an identity link function. The predicted values in the example take the form:

$$E[Y]=g^{-1}(X\beta)=\begin{pmatrix}g^{-1}(\beta_1+\beta_3)\\g^{-1}(\beta_1)\\g^{-1}(\beta_2+\beta_3)\\g^{-1}(\beta_2)\end{pmatrix}=\begin{pmatrix}\beta_1+\beta_3\\\beta_1\\\beta_2+\beta_3\\\beta_2\end{pmatrix}$$

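1.102 The Normal distribution with mean μ and variance σ² has the following density function:

$$f(y;\mu,\sigma^2)=\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi\sigma^2)\right\}$$

1.103 Its likelihood function is:

$$L(y;\mu,\sigma^2)=\prod_{i=1}^{n}\exp\left\{-\frac{(y_i-\mu_i)^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi\sigma^2)\right\}$$

1.104 Maximizing the likelihood function is equivalent to maximizing the log-likelihood function:

$$l(y;\mu,\sigma^2)=\sum_{i=1}^{n}\left[-\frac{(y_i-\mu_i)^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi\sigma^2)\right]$$

1.105 With the identity link function, $\mu_i=\sum_j X_{ij}\beta_j$ and the log-likelihood function becomes

$$l(y;\mu,\sigma^2)=\sum_{i=1}^{n}\left[-\frac{\left(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\right)^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi\sigma^2)\right]$$

1.106 In this example, up to a constant term of $-2\ln(2\pi\sigma^2)$, the log-likelihood is

$$l^*(y;\mu,\sigma^2)=-\frac{(800-\beta_1-\beta_3)^2}{2\sigma^2}-\frac{(500-\beta_1)^2}{2\sigma^2}-\frac{(400-\beta_2-\beta_3)^2}{2\sigma^2}-\frac{(200-\beta_2)^2}{2\sigma^2}$$

1.107 To maximize l* take derivatives with respect to β1, β2 and β3 and set each of them to zero. The resulting system of three equations in three unknowns is:

$$\frac{\partial l^*}{\partial\beta_1}=\frac{(800-\beta_1-\beta_3)+(500-\beta_1)}{\sigma^2}=0$$
$$\frac{\partial l^*}{\partial\beta_2}=\frac{(400-\beta_2-\beta_3)+(200-\beta_2)}{\sigma^2}=0$$
$$\frac{\partial l^*}{\partial\beta_3}=\frac{(800-\beta_1-\beta_3)+(400-\beta_2-\beta_3)}{\sigma^2}=0$$

which simplify to $2\beta_1+\beta_3=1300$, $2\beta_2+\beta_3=600$ and $\beta_1+\beta_2+2\beta_3=1200$.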
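Because these three equations are linear in β1, β2 and β3, they can be solved mechanically. The sketch below forms the normal equations XᵀXβ = XᵀY for the worked example (rows ordered urban male, rural male, urban female, rural female, with observations 800, 500, 400 and 200) and solves them with exact rational arithmetic from the standard `fractions` module. The helper `solve` is a hypothetical illustration, not a library routine.

```python
from fractions import Fraction

# Worked example: rows are urban male, rural male, urban female, rural female;
# columns of X are male, female, urban.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
Y = [800, 500, 400, 200]

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Setting each derivative of l* to zero gives (X^T X) beta = X^T Y.
XtX = [[sum(X[i][j] * X[i][k] for i in range(4)) for k in range(3)] for j in range(3)]
XtY = [sum(X[i][j] * Y[i] for i in range(4)) for j in range(3)]
beta = solve(XtX, XtY)
fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
print(beta, fitted)
```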

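1.108 It can be seen that these equations are identical to those derived when minimizing the sum of squared errors for a simple linear model. Again, these can be solved to derive:

$$\beta_1=525,\qquad\beta_2=175,\qquad\beta_3=250$$

which produces the following predicted values:

          Urban   Rural
Male        775     525
Female      425     175

The Poisson error structure with a logarithm link function

1.109 For the Poisson model with a logarithm link function, the predicted values are given by

$$E[Y]=g^{-1}(X\beta)=\begin{pmatrix}e^{\beta_1+\beta_3}\\e^{\beta_1}\\e^{\beta_2+\beta_3}\\e^{\beta_2}\end{pmatrix}$$

1.110 A Poisson distribution has the following density function:

$$f(y;\mu)=\frac{e^{-\mu}\mu^{y}}{y!}$$

1.111 Its log-likelihood function is therefore

$$l(y;\mu)=\ln f(y;\mu)=\sum_{i=1}^{n}\left[-\mu_i+y_i\ln\mu_i-\ln y_i!\right]$$

1.112 With the logarithm link function, $\mu_i=\exp\left(\sum_j X_{ij}\beta_j\right)$, and the log-likelihood function reduces to

$$l(y;\mu)=\sum_{i=1}^{n}\left[-\exp\left(\sum_{j=1}^{p}X_{ij}\beta_j\right)+y_i\sum_{j=1}^{p}X_{ij}\beta_j-\ln y_i!\right]$$

1.113 In this example, the equation is

$$l(y;\mu)=-e^{\beta_1+\beta_3}+800(\beta_1+\beta_3)-\ln 800!-e^{\beta_1}+500\beta_1-\ln 500!-e^{\beta_2+\beta_3}+400(\beta_2+\beta_3)-\ln 400!-e^{\beta_2}+200\beta_2-\ln 200!$$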

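1.114 Ignoring the constant of ln 800! + ln 500! + ln 400! + ln 200!, the following function is to be maximized:

$$l^*(y;\mu)=800(\beta_1+\beta_3)-e^{\beta_1+\beta_3}+500\beta_1-e^{\beta_1}+400(\beta_2+\beta_3)-e^{\beta_2+\beta_3}+200\beta_2-e^{\beta_2}$$

1.115 To maximize l* the derivatives with respect to β1, β2 and β3 are set to zero and the following three equations are derived:

$$\frac{\partial l^*}{\partial\beta_1}=0\;\Rightarrow\;800-e^{\beta_1+\beta_3}+500-e^{\beta_1}=0\;\Rightarrow\;e^{\beta_1+\beta_3}+e^{\beta_1}=1300$$
$$\frac{\partial l^*}{\partial\beta_2}=0\;\Rightarrow\;400-e^{\beta_2+\beta_3}+200-e^{\beta_2}=0\;\Rightarrow\;e^{\beta_2+\beta_3}+e^{\beta_2}=600$$
$$\frac{\partial l^*}{\partial\beta_3}=0\;\Rightarrow\;800-e^{\beta_1+\beta_3}+400-e^{\beta_2+\beta_3}=0\;\Rightarrow\;e^{\beta_1+\beta_3}+e^{\beta_2+\beta_3}=1200$$

1.116 These can be solved to derive the following parameter estimates:

$$\beta_1=6.1716,\qquad\beta_2=5.3984,\qquad\beta_3=0.5390$$

which produces the following predicted values:

          Urban   Rural
Male      821.1   478.9
Female    378.9   221.1

The gamma error structure with an inverse link function

1.117 This example is set out in Appendix F.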
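The Poisson equations can also be solved in closed form by substitution. This is a minimal sketch. It writes a = e^β1, b = e^β2 and c = e^β3 (our notation). The first two equations then give a + b = 1900/(c + 1), and substituting into the third gives 1900c = 1200(c + 1).

```python
import math

# Score equations of the Poisson/log-link example with a = e^b1, b = e^b2, c = e^b3:
#   a(c + 1) = 1300,   b(c + 1) = 600,   c(a + b) = 1200.
# Adding the first two gives a + b = 1900 / (c + 1); substituting into the
# third gives 1900 c = 1200 (c + 1), i.e. c = 1200 / 700.
c = 1200 / 700
a = 1300 / (c + 1)
b = 600 / (c + 1)
beta = [math.log(a), math.log(b), math.log(c)]

# Fitted values e^(X beta): urban male, rural male, urban female, rural female.
fitted = [a * c, a, b * c, b]
print([round(v, 4) for v in beta], [round(v, 1) for v in fitted])
```

The fitted values reproduce the observed totals for men (1300), women (600) and urban policyholders (1200), which is exactly what the three score equations require.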

Solving for large datasets using numerical techniques

1.118 The general case for solving for maximum likelihood in the case of a GLM with an assumed exponential distribution is set out in Appendix E. In insurance modeling there are typically many thousands if not millions of observations being modeled, and it is not practical to find values of β which maximize likelihood using the explicit techniques illustrated above and in Appendices E and F. Instead iterative numerical techniques are used.

1.119 As was the case in the simple examples above, the numerical techniques seek to optimize likelihood by seeking the values of β which set the first differential of the log-likelihood to zero; a number of standard methods can be applied to this problem. In practice, this is done using an iterative process, for example Newton-Raphson iteration, which uses the formula:

$$\beta_{n+1}=\beta_n-H^{-1}s$$

where β_n is the nth iterative estimate of the vector of the parameter estimates β (with p elements), s is the vector of the first derivatives of the log-likelihood and H is the p by p matrix containing the second derivatives of the log-likelihood (both evaluated at β_n). This is simply the generalized form of the one-dimensional Newton-Raphson equation,

$$x_{n+1}=x_n-\frac{f'(x_n)}{f''(x_n)}$$

which seeks to find a solution to f'(x) = 0.

1.120 The iterative process can be started using either values of zero for the elements of β_0 or alternatively the estimates implied by a one-way analysis of the data or of another previously fitted GLM.

1.121 Several generic statistical packages are available to fit generalized linear models in this way (such as SAS, S, R, etc.), and packages specifically built for the insurance industry, which fit GLMs more quickly and with helpful interpretation of output, are also available.
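A minimal Newton-Raphson sketch for the Poisson/log-link worked example follows. It uses the standard log-link results for the score, s = Xᵀ(Y - μ), and the Hessian, H = -Xᵀdiag(μ)X. Starting values follow the one-way suggestion above: the log of the mean observation for each gender, with no urban effect. The names `newton_raphson_poisson` and `solve` are hypothetical helpers written for illustration.

```python
import math

# Worked example (rows: urban male, rural male, urban female, rural female;
# columns: male, female, urban).
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
Y = [800, 500, 400, 200]

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def newton_raphson_poisson(X, Y, beta, tol=1e-10, max_iter=50):
    """Maximize the Poisson log-link log-likelihood via beta_{n+1} = beta_n - H^{-1} s."""
    p = len(beta)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Score s = X^T (Y - mu) and Hessian H = -X^T diag(mu) X.
        s = [sum(row[j] * (y - m) for row, y, m in zip(X, Y, mu)) for j in range(p)]
        H = [[-sum(row[j] * row[k] * m for row, m in zip(X, mu)) for k in range(p)]
             for j in range(p)]
        step = solve(H, s)  # H^{-1} s
        beta = [b - d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta

# One-way starting values: log of the mean for each gender, no urban effect.
start = [math.log(650), math.log(300), 0.0]
beta = newton_raphson_poisson(X, Y, start)
print([round(b, 4) for b in beta])
```

The iteration should settle on the same estimates as the explicit solution of the worked example.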

Base levels and the intercept term

1.122 The simple examples discussed above considered a three parameter model, where β1 corresponded to men, β2 to women and β3 to the effect of being in an urban area. In the case of an additive model (with identity link function) this could be thought of as either assuming that there is an average response for men, β1, and an average response for women, β2, with the effect of being an urban policyholder (as opposed to being a rural one) having an additional additive effect β3 which is the same regardless of gender; or assuming there is an average response for the "base case" of women in rural areas, β2, with additional additive effects for being male, β1 - β2, and for being in an urban area, β3.

1.123 In the case of a multiplicative model this three parameter form could be thought of as assuming that there is an average response for men, exp(β1), and an average response for women, exp(β2), with the effect of being an urban policyholder (as opposed to being a rural one) having a multiplicative effect exp(β3) which is the same regardless of gender; or assuming there is an average response for the "base case" of women in rural areas, exp(β2), with additional multiplicative effects for being male, exp(β1 - β2), and for being in an urban area, exp(β3).

1.124 In the example considered, some measure of the overall average response was incorporated in both the values of β1 and β2. The decision to incorporate this in the parameters relating to gender rather than area was arbitrary.

1.125 In practice, when considering many factors each with many levels, it is more helpful to parameterize the GLM by considering, in addition to observed factors, an "intercept term", which is a parameter that applies to all observations.
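To see that the two readings of the additive model describe the same fitted values, the sketch below takes the Normal-model estimates for the worked example (β1 = 525, β2 = 175, β3 = 250). It re-expresses them around a rural-female base case. The dictionary labels are our own.

```python
# Additive (identity-link) worked example: Normal-model estimates.
b1, b2, b3 = 525, 175, 250   # men, women, urban

# Reading 1: separate averages for men and women plus a common urban effect.
param1 = {"urban male": b1 + b3, "rural male": b1,
          "urban female": b2 + b3, "rural female": b2}

# Reading 2: base case of rural women plus additive male and urban effects.
base, male, urban = b2, b1 - b2, b3
param2 = {"urban male": base + male + urban, "rural male": base + male,
          "urban female": base + urban, "rural female": base}

print(param1 == param2, base, male, urban)
```

Only the labeling of the parameters changes; the predictions for every cell are identical, which is why the choice of base case is arbitrary.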

1.126 In the above example, this would have been achieved by defining the design matrix X as

$$X=\begin{pmatrix}1&1&1\\1&1&0\\1&0&1\\1&0&0\end{pmatrix}$$

(rows again urban male, rural male, urban female, rural female; columns intercept, male, urban), that is, by redefining β1 as the intercept term and only having one parameter relating to the gender of the policyholder. It would not be appropriate to have an intercept term and a parameter for every value of gender, since then the GLM would not be uniquely defined: any arbitrary constant k could be added to the intercept term and subtracted from each of the parameters relating to gender, and the predicted values would remain the same.

1.127 In practice, when considering categorical factors and an intercept term, one level of each factor should have no parameter associated with it, in order that the model remains uniquely defined.

1.128 For example, consider a simple rating structure with three factors: age of driver (a factor with 9 levels), territory (a factor with 8 levels) and vehicle class (a factor with 5 levels). An appropriate parameterization might be represented as follows:

Age of driver (9 levels): one parameter for each level except the base level, 40-49
Territory (levels A to H): one parameter for each level except the base level, C
Vehicle class (levels A to E): one parameter for each level except the base level, A
Intercept term: one parameter

That is, an intercept term is defined for every policy, and each factor has a parameter associated with each level except one. If a multiplicative GLM were fitted to claims frequency by selecting a log link function, the exponentials of the parameter estimates could also be set out in tabular form:
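The base-level convention can be sketched as a small dummy-coding routine. The nine age-band labels below are hypothetical placeholders. The territory and vehicle-class levels, and all three base levels, follow the example. With one parameter per non-base level plus the intercept, the structure has 1 + 8 + 7 + 4 = 20 parameters. `design_columns` and `design_row` are illustrative names.

```python
def design_columns(factors):
    """Column names: an intercept plus one 0/1 dummy per non-base level.

    `factors` maps factor name -> (list of levels, base level)."""
    columns = ["intercept"]
    for name, (levels, base) in factors.items():
        columns += [f"{name}={level}" for level in levels if level != base]
    return columns

def design_row(record, factors):
    """Encode one record (factor name -> level) as a row of the design matrix."""
    row = [1]  # the intercept applies to every observation
    for name, (levels, base) in factors.items():
        row += [1 if record[name] == level else 0 for level in levels if level != base]
    return row

factors = {
    # Hypothetical labels for the nine age bands; base level 40-49.
    "age": (["17-21", "22-24", "25-29", "30-34", "35-39",
             "40-49", "50-59", "60-69", "70+"], "40-49"),
    "territory": (list("ABCDEFGH"), "C"),
    "vehicle_class": (list("ABCDE"), "A"),
}
cols = design_columns(factors)
print(len(cols))
# A policy with every factor at its base level only "switches on" the intercept.
print(design_row({"age": "40-49", "territory": "C", "vehicle_class": "A"}, factors))
```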

[Table: multipliers (exponentials of the parameter estimates) for each level of age of driver, territory and vehicle class, together with the intercept term; base levels are shown with a multiplier of 1.0000.]

1.129 In this example the claims frequency predicted by the model can be calculated for a given policy by taking the intercept term and multiplying it by the relevant factor relativities. For the factor levels for which no parameter was estimated (the "base levels"), no multiplier is relevant, and this is shown in the above table by displaying multipliers of 1.0000. The intercept term relates to a policy with all factors at the base level (i.e. in this example the model predicts a claim frequency equal to the intercept term for a 40-49 year old in territory C with a vehicle in class A). This intercept term is not an average rate, since its value is entirely dependent upon the arbitrary choice of which level of each factor is selected to be the base level.

1.130 If a model were structured with an intercept term but without each factor having a base level, then the GLM solving routine would remove as many parameters as necessary to make the model uniquely defined. This process is known as aliasing.

Aliasing

1.131 Aliasing occurs when there is a linear dependency among the observed covariates X1, ..., Xp. That is, one covariate may be identical to some combination of other covariates. For example, it may be observed that X4 = X1 + X5.

1.132 Equivalently, aliasing can be defined as a linear dependency among the columns of the design matrix X.

1.133 There are two types of aliasing: intrinsic aliasing and extrinsic aliasing.

Intrinsic aliasing

1.134 Intrinsic aliasing occurs because of dependencies inherent in the definition of the covariates. These intrinsic dependencies arise most commonly whenever categorical factors are included in the model.

1.135 For example, suppose a private passenger automobile classification system includes the factor vehicle age, which has the four levels: 0-3 years (X1), 4-7 years (X2), 8-9 years (X3), and 10+ years (X4). Clearly if any of X1, X2, X3 is equal to 1 then X4 is equal to 0; and if all of X1, X2, X3 are equal to 0 then X4 must be equal to 1. Thus

$$X_4=1-X_1-X_2-X_3$$

1.136 The linear predictor

$$\eta=\beta_1X_1+\beta_2X_2+\beta_3X_3+\beta_4X_4$$

(ignoring any other factors) can be uniquely expressed in terms of the first three levels:

$$\eta=\beta_1X_1+\beta_2X_2+\beta_3X_3+\beta_4(1-X_1-X_2-X_3)=\beta_4+(\beta_1-\beta_4)X_1+(\beta_2-\beta_4)X_2+(\beta_3-\beta_4)X_3$$

1.137 Upon renaming the coefficients this becomes:

$$\eta=\alpha_0+\alpha_1X_1+\alpha_2X_2+\alpha_3X_3$$

1.138 The result is a linear predictor with an intercept term (if one did not already exist) and three covariates.

1.139 GLM software will remove parameters which are aliased. Which parameter is selected for exclusion depends on the software. The choice of which parameter to alias does not affect the fitted values. For example, in some cases the last level declared (i.e. the last alphabetically) is aliased. In other software the level with the maximum exposure is selected as the base level for each factor first, and then other levels are aliased dependent upon the order of declaration. This latter approach is helpful since it minimizes the standard errors associated with other parameter estimates (this subject is discussed later in this paper).
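Intrinsic aliasing shows up as a rank deficiency in the design matrix. This sketch uses eight hypothetical records and hypothetical integer parameter values. With an intercept plus a dummy for every vehicle-age level, the design matrix has five columns but rank four. After the reparameterization above, the α coefficients reproduce exactly the same linear predictor. The `rank` helper uses exact arithmetic from the standard `fractions` module.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no new independent direction in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Hypothetical records: vehicle-age level index (0: 0-3, 1: 4-7, 2: 8-9, 3: 10+).
levels = [0, 1, 2, 3, 1, 3, 0, 2]
dummies = [[int(l == k) for k in range(4)] for l in levels]      # X1..X4
with_intercept = [[1] + d for d in dummies]
print(rank(with_intercept), "independent columns out of", len(with_intercept[0]))

# Reparameterize with hypothetical values: alpha0 = beta4, alpha_i = beta_i - beta4.
beta = [3, 2, 1, -1]
alpha0, alphas = beta[3], [b - beta[3] for b in beta[:3]]
eta_beta = [sum(b * x for b, x in zip(beta, d)) for d in dummies]
eta_alpha = [alpha0 + sum(a * x for a, x in zip(alphas, d[:3])) for d in dummies]
print(eta_beta == eta_alpha)
```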

Extrinsic aliasing

1.140 This type of aliasing again arises from a dependency among the covariates, but here the dependency results from the nature of the data rather than from inherent properties of the covariates themselves. This data characteristic arises if one level of a particular factor is perfectly correlated with a level of another factor.

1.141 For example, suppose a dataset is enriched with external data and two new factors are added to the dataset: the factors number of doors and color of vehicle. Suppose further that in a small number of cases the external data could not be linked with the existing data, with the result that some records have an unknown color and an unknown number of doors.

[Table: exposures by color of vehicle (Red, Green, Blue, Black, Unknown) and # doors (2, 3, 4, 5, Unknown). Selected base: # doors 4; color Red. Additional aliasing: color Unknown.]

1.142 In this case, because of the way the new factors were derived, the level unknown for the factor color happens to be perfectly correlated with the level unknown for the factor # doors. The covariate associated with unknown color is equal to 1 in every case for which the covariate for unknown # doors is equal to 1, and vice versa.

1.143 Elimination of the base levels through intrinsic aliasing reduces the linear predictor from 10 covariates to 8, plus the introduction of an intercept term. In addition, in this example, one further covariate needs to be removed as a result of extrinsic aliasing. This could either be the unknown color covariate or the unknown # doors covariate. Assuming in this case that the GLM routine aliases on the basis of order of declaration, and that the # doors factor is declared before color, the GLM routine would alias unknown color, reducing the linear predictor to just 7 covariates.
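The declaration-order rule can be sketched as a greedy pass. Columns are kept in the order declared, and any column that does not increase the rank of the columns already kept is aliased. The 17 records are hypothetical. They cover every combination of doors {2, 3, 4, 5} and colors {Red, Green, Blue, Black} once, plus one unmatched record with both fields unknown. `alias_by_declaration` is an illustrative name, not a feature of any particular GLM package.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def alias_by_declaration(columns):
    """Keep (name, values) columns in declaration order; alias any column that is
    a linear combination of the columns already kept."""
    kept, aliased = [], []
    for name, values in columns:
        candidate = kept + [(name, values)]
        matrix = [list(r) for r in zip(*(v for _, v in candidate))]
        if rank(matrix) == len(candidate):
            kept.append((name, values))
        else:
            aliased.append(name)
    return [n for n, _ in kept], aliased

# Hypothetical records: full cross of known levels plus one unmatched record.
records = [(d, c) for d in ["2", "3", "4", "5"] for c in ["Red", "Green", "Blue", "Black"]]
records.append(("Unknown", "Unknown"))

# Base levels (# doors 4, color Red) are removed first; # doors is declared before color.
columns = [("intercept", [1] * len(records))]
columns += [(f"doors={l}", [int(d == l) for d, _ in records])
            for l in ["2", "3", "5", "Unknown"]]
columns += [(f"color={l}", [int(c == l) for _, c in records])
            for l in ["Green", "Blue", "Black", "Unknown"]]
kept, aliased = alias_by_declaration(columns)
print(len(kept) - 1, "covariates plus intercept; aliased:", aliased)
```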

"Near aliasing"

1.144 When modeling in practice, a common problem occurs when two or more factors contain levels that are almost, but not quite, perfectly correlated. For example, if the color of vehicle was known for a small number of policies for which the # doors was unknown, the two-way table of exposure might appear as follows:

[Table: exposures by color of vehicle and # doors as in the previous table, except that a small amount of exposure with color Black has unknown # doors. Selected base: # doors 4; color Red.]

1.145 In this case the unknown level of the color factor is not perfectly correlated with the unknown level of the # doors factor, and so extrinsic aliasing will not occur.

1.146 When levels of two factors are "nearly aliased" in this way, convergence problems can occur. For example, if there were no claims for the exposures in the black color level with unknown # doors, and if a log link model were fitted to claims frequency, the model would attempt to estimate a very large negative parameter for unknown # doors and a very large positive parameter for unknown color. The sum of these two parameters would be an appropriate reflection of the claims frequency for the exposures having unknown # doors and unknown color, while the value of the unknown # doors parameter would be driven by the experience of the few rogue exposures having color black with unknown # doors. This can either give rise to convergence problems, or to results which can appear very confusing.

1.147 In order to understand the problem in such circumstances it is helpful to examine two-way tables of exposure and claim counts for the factors which contain very large parameter estimates. From these it should be possible to identify those factor combinations which cause the near-aliasing. The issue can then be resolved either by deleting or excluding those rogue records, or by reclassifying the rogue records into another, more appropriate, factor level.
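A two-way exposure table can be screened for near-aliasing by computing, for each cell, the share of each of its two levels' total exposure that the cell contains. In this sketch a cell holding nearly all the exposure of both of its levels is flagged "near", and one holding all of it is flagged "exact", which corresponds to extrinsic aliasing. The exposures are hypothetical, and the 0.99 threshold is an arbitrary choice for illustration.

```python
from collections import defaultdict

def near_aliased_pairs(exposure, threshold=0.99):
    """Flag (level1, level2) cells whose exposure is (nearly) all the exposure of
    both levels. `exposure` maps (level of factor 1, level of factor 2) -> exposure."""
    total1, total2 = defaultdict(float), defaultdict(float)
    for (level1, level2), e in exposure.items():
        total1[level1] += e
        total2[level2] += e
    flagged = []
    for (level1, level2), e in exposure.items():
        share1, share2 = e / total1[level1], e / total2[level2]
        if min(share1, share2) >= threshold:
            kind = "exact" if share1 == share2 == 1.0 else "near"
            flagged.append((level1, level2, kind))
    return flagged

# Hypothetical (# doors, color) exposures with a few rogue black/unknown-doors records.
exposure = {
    ("4", "Red"): 5000.0, ("4", "Black"): 4000.0,
    ("2", "Red"): 3000.0, ("2", "Black"): 2500.0,
    ("Unknown", "Unknown"): 1000.0, ("Unknown", "Black"): 5.0,
}
print(near_aliased_pairs(exposure))
```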

Model diagnostics

1.148 As well as deriving parameter estimates which maximize likelihood, a GLM can produce important additional information indicating the certainty of those parameter estimates (which themselves are estimates of some true underlying value).

Standard errors

1.149 Statistical theory can be used to give an estimate of the uncertainty. In particular, the multivariate version of the Cramer-Rao lower bound (which states that the variance of a parameter estimate is greater than or equal to minus one over the second derivative of the log likelihood) can define "standard errors" for each parameter estimate. Such standard errors are defined as the square roots of the diagonal elements of the covariance matrix $-H^{-1}$, where H (the Hessian) is the second derivative matrix of the log likelihood.

1.150 Intuitively, the standard errors can be thought of as indicators of the speed with which log-likelihood falls from the maximum given a change in a parameter. For example, consider the diagram below.

[Figure: Intuitive illustration of standard errors; log likelihood plotted against parameter 1 and parameter 2.]
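For the Poisson/log-link worked example, the standard errors can be computed from the Hessian at the maximum likelihood estimate. This is a minimal sketch in plain Python. It reuses the closed-form fitted values for the example and the standard log-link result H = -Xᵀdiag(μ)X. It then forms a Wald χ² statistic (one degree of freedom) for the urban parameter, whose tail probability is erfc(√(w/2)). `invert` is a hypothetical helper.

```python
import math

X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
# Closed-form MLE for the worked example (a = e^b1, b = e^b2, c = e^b3).
c = 1200 / 700
a, b = 1300 / (c + 1), 600 / (c + 1)
beta = [math.log(a), math.log(b), math.log(c)]
mu = [a * c, a, b * c, b]   # fitted values: urban male, rural male, urban female, rural female

def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Hessian of the Poisson log-likelihood under the log link: H = -X^T diag(mu) X.
H = [[-sum(row[j] * row[k] * m for row, m in zip(X, mu)) for k in range(3)] for j in range(3)]
cov = [[-v for v in row] for row in invert(H)]   # covariance matrix = -H^{-1}
se = [math.sqrt(cov[j][j]) for j in range(3)]    # standard errors

# Wald chi-square (1 degree of freedom) comparing the urban parameter with zero.
wald = beta[2] ** 2 / cov[2][2]
p_value = math.erfc(math.sqrt(wald / 2))         # P(chi-square_1 > wald)
print([round(s, 4) for s in se], round(wald, 1), p_value)
```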

1.151 This diagram illustrates a simple case with two parameters (β1 and β2) and shows how log likelihood varies, for the dataset in question, for different values of the two parameters. It can be seen that movements in parameter 1 from the optimal position reduce log likelihood more quickly than similar movements in parameter 2; that is to say, the log likelihood curve becomes steeper in the parameter 1 direction than in the parameter 2 direction. This can be thought of as the second partial differential of log likelihood with respect to parameter 1 being large and negative, with the result that the standard error for parameter 1 (derived from minus one over the second partial differential) is small. Conversely, the second partial differential of log likelihood with respect to parameter 2 is less large and negative, with the standard error for parameter 2 being larger (indicating greater uncertainty).

1.152 Generally it is assumed that the parameter estimates are asymptotically Normally distributed; consequently it is in theory possible to undertake a simple statistical test on individual parameter estimates, comparing each estimate with zero (i.e. testing whether the effect of each level of the factor is significantly different from the base level of that factor). This is usually performed using a χ² test, with the square of the parameter estimate divided by its variance being compared to a χ² distribution with one degree of freedom. This test in fact compares the parameter with the base level of the factor. This is not necessarily a fully useful test in isolation, as the choice of base level is arbitrary. It is theoretically possible to change the base level repeatedly and so construct a triangle of χ² tests comparing every pair of parameter estimates. If none of these differences is significant then this is good evidence that the factor is not significant.

1.153 In practice, graphical interpretation of the parameter estimates and standard errors is often more helpful, and this is discussed in Section 2.

Deviance tests

1.154 In addition to the parameter estimate standard errors, measures of deviance can be used to assess the theoretical significance of a particular factor.
In broad terms, a deviance is a measure of how much the fitted values differ from the observations.

1.155 Consider a deviance function d(Y; μ) defined by

$$d(Y;\mu)=2\omega\int_{\mu}^{Y}\frac{Y-\zeta}{V(\zeta)}\,d\zeta$$

Under the condition that V(x) is strictly positive, d(Y; μ) is also strictly positive whenever Y ≠ μ and satisfies the conditions for being a distance function. Indeed it should be interpreted as such.
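The deviance integral can be checked numerically against familiar closed forms. With ω = 1 these are (Y - μ)² for V(x) = 1, 2[Y ln(Y/μ) - (Y - μ)] for V(x) = x, and 2[-ln(Y/μ) + (Y - μ)/μ] for V(x) = x². This sketch integrates with Simpson's rule. `unit_deviance`, `CLOSED_FORMS` and the step count are our own illustrative choices.

```python
import math

def unit_deviance(y, mu, variance, weight=1.0, steps=2000):
    """d(Y; mu) = 2 * weight * integral from mu to Y of (Y - z) / V(z) dz,
    evaluated with Simpson's rule (steps must be even)."""
    if y == mu:
        return 0.0
    h = (y - mu) / steps          # negative when Y < mu, giving the signed integral

    def f(z):
        return (y - z) / variance(z)

    total = f(mu) + f(y)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(mu + i * h)
    return 2 * weight * total * h / 3

# (variance function, closed-form unit deviance) for three common error structures.
CLOSED_FORMS = {
    "normal": (lambda z: 1.0, lambda y, mu: (y - mu) ** 2),
    "poisson": (lambda z: z, lambda y, mu: 2 * (y * math.log(y / mu) - (y - mu))),
    "gamma": (lambda z: z ** 2, lambda y, mu: 2 * (-math.log(y / mu) + (y - mu) / mu)),
}
for name, (variance, closed) in CLOSED_FORMS.items():
    print(name, unit_deviance(3.0, 2.0, variance), closed(3.0, 2.0))
```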

1.156 Consider an observation Y and a GLM that makes a prediction μ for that observation. d(Y; μ) is a measure of the difference between the fitted and actual observations which gives more weight to the difference between Y and μ when the variance function V(x) is small. That is, if Y is known to come from a distribution with small variance, then any discrepancy between Y and μ is given more emphasis.

1.157 d(Y; μ) can be thought of as a generalized form of the squared error.

1.158 Summing the deviance function across all observations gives an overall measure of deviance referred to as the total deviance D:

$$D=\sum_{i=1}^{n}2\omega_i\int_{\mu_i}^{Y_i}\frac{Y_i-\zeta}{V(\zeta)}\,d\zeta$$

1.159 Dividing this by the scale parameter φ gives the scaled deviance D*, which can be thought of as a generalized form of the sum of squared errors, adjusting for the shape of the distribution:

$$D^*=\frac{1}{\phi}\sum_{i=1}^{n}2\omega_i\int_{\mu_i}^{Y_i}\frac{Y_i-\zeta}{V(\zeta)}\,d\zeta$$

1.160 For the class of exponential distributions the scaled deviance can be shown to be equal to twice the difference between the maximum achievable log-likelihood (i.e. the log-likelihood where the fitted value is equal to the observation for every record) and the log-likelihood of the model.

1.161 A range of statistical tests can be undertaken using deviance measures. One of the most useful tests considers the ratio of the likelihood of two "nested" models, that is to say where one model contains explanatory variables which are a subset of the explanatory variables in a second model. Such tests are often referred to as "type III" tests, as opposed to "type I" tests, which consider the significance of factors as they are added sequentially to a model with only an intercept term (referred to as a null model).

1.162 The change in scaled deviance between two nested models (which reflects the ratio of the likelihoods) can be considered to be a sample from a χ² distribution with degrees of freedom equal to the difference in degrees of freedom between the two models (where the degrees of freedom for a model is defined as the number of observations less the number of parameters). With model 1 nested within model 2,

$$D_1^*-D_2^*\sim\chi^2_{df_1-df_2}$$
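A sketch of a nested-model test on the Poisson worked example follows. It compares the full model (male, female, urban) with the same model without the urban factor. The maximum likelihood fitted values of the smaller model are simply the mean for each gender. The Poisson scale is known (φ = 1), so the change in deviance is compared with χ² on 2 - 1 = 1 degree of freedom, using the tail probability erfc(√(x/2)).

```python
import math

Y = [800, 500, 400, 200]   # urban male, rural male, urban female, rural female

def poisson_deviance(y, mu):
    """Total Poisson deviance D = sum of 2[y ln(y/mu) - (y - mu)] (prior weights of 1)."""
    return sum(2 * (yi * math.log(yi / mi) - (yi - mi)) for yi, mi in zip(y, mu))

# Full model (male, female, urban): closed-form maximum likelihood fitted values.
c = 1200 / 700
a, b = 1300 / (c + 1), 600 / (c + 1)
mu_full = [a * c, a, b * c, b]
# Nested model without the urban factor: fitted values are the gender means.
mu_reduced = [650, 650, 300, 300]

phi = 1.0                                            # Poisson scale parameter is known
d_full = poisson_deviance(Y, mu_full) / phi          # df = 4 - 3 = 1
d_reduced = poisson_deviance(Y, mu_reduced) / phi    # df = 4 - 2 = 2
change = d_reduced - d_full                          # compare with chi-square on 1 df
p_value = math.erfc(math.sqrt(change / 2))           # chi-square(1) tail probability
print(round(d_full, 2), round(d_reduced, 2), round(change, 2), p_value)
```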

1.163 This allows tests to be undertaken to assess the significance of the parameters that differ between the two models, with the null hypothesis that the extra parameters are not important. Expressed crudely, this measures whether the inclusion of an explanatory factor in a model improves the model enough (i.e. decreases the deviance enough) given the extra parameters which it adds to the model. Adding any factor will improve the fit on the data in question; what matters is whether the improvement is significant given the extra parameterization.

1.164 The χ² tests depend on the scaled deviance. For some distributions (such as the Poisson and the binomial) the scale parameter is assumed to be known, and it is possible to calculate the statistic accurately. For other distributions the scale parameter is not known and has to be estimated, typically as the ratio of the deviance to the degrees of freedom. This can decrease the reliability of this test if the estimate of the scale parameter used is not accurate.

1.165 It is possible to show that, after adjusting for the degrees of freedom and the true scale parameter, the estimate of the scale parameter is also distributed with a χ² distribution. The F-distribution is the ratio of χ² distributions (each divided by its degrees of freedom). The ratio of the change in deviance and the adjusted estimate of the scale is therefore distributed with an F-distribution:

$$\frac{(D_1-D_2)/(df_1-df_2)}{D_2/df_2}\sim F_{df_1-df_2,\;df_2}$$

1.166 This means that the F-test is suitable for use when the scale parameter is not known (for example when using the gamma distribution). There is no advantage to using this test where the scale is known.
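For a case where the scale must be estimated, the sketch below applies the F-statistic to the Normal/identity worked example. For V(x) = 1 and unit prior weights the total deviance is the residual sum of squares. The fitted values are those of the full model (male, female, urban) and of the nested model without the urban factor (the mean for each gender).

```python
Y = [800, 500, 400, 200]             # urban male, rural male, urban female, rural female
mu_full = [775, 525, 425, 175]       # Normal/identity fit with male, female and urban
mu_reduced = [650, 650, 300, 300]    # nested fit without the urban factor

def normal_deviance(y, mu):
    """Total deviance for V(x) = 1 and unit prior weights: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

n = len(Y)
d1, df1 = normal_deviance(Y, mu_reduced), n - 2   # smaller (nested) model
d2, df2 = normal_deviance(Y, mu_full), n - 3      # larger model
F = ((d1 - d2) / (df1 - df2)) / (d2 / df2)        # compare with F(df1 - df2, df2)
print(d1, d2, F)
```

With only four observations the reference distribution here is F with (1, 1) degrees of freedom. A statistics library (for example `scipy.stats.f.sf`) can supply the tail probability.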

GLMs in practice

2.1 Section 1 discussed how GLMs are formularized and solved. This section considers practical issues and presents a plan for undertaking a GLM analysis in four general stages:
- pre-modeling analysis: considering data preparation as well as a range of helpful portfolio investigations
- model iteration: typical model forms and the diagnostics used in both factor selection and model validation
- model refinement: investigating interaction variables, the use of smoothing, and the incorporation of artificial constraints
- interpretation of the results: how model results can be compared to existing rating structures both on a factor-by-factor basis and overall.

Data required

2.2 GLM claim analyses require a certain volume of experience. Depending on the underlying claim frequencies and the number of factors being analyzed, credible results on personal lines portfolios can generally be achieved with around 100,000 exposures (which could for example be 50,000 in each of two years, etc.). Meaningful results can sometimes be achieved with smaller volumes of data (particularly on claim types with adequate claims volume), but it is best to have many 100,000s of exposures. As models fitted to only a single year of data could be distorted by events that occurred during that year, the data should ideally be based on two or three years of experience.

2.3 In addition to combining different years of experience, combining states (or provinces) can also improve stability, assuming inputs are consistent across borders.⁷ In the case where one geographic area has sufficient exposure, it may be more appropriate to fit a model just to that area's experience. If a countrywide model has been run, the goodness of fit of that model on state data may be investigated, or the state and countrywide model results may be compared side-by-side. Examining the interaction of state with each predictive factor may also identify where state patterns differ from countrywide; interaction variables are discussed later in this paper.

⁷ In this sense, inputs refer to explanatory criteria, not necessarily existing rating relativities.
Data coding should be reviewed to ensure some level of consistency, and care should be taken with recycled coding from one state to another (e.g. territory 1 in Virginia should not be added to territory 1 in Florida).

2.4 Different types of claim can be affected by rating factors in different ways, and so it is often appropriate to analyze different types of claim with separate models. Analyzing different claim elements separately will often identify clearer underlying trends than considering models based on a mixture of claims (for example, liability claims combined with theft claims). Even if a single model is required ultimately, it is generally beneficial to model by individual claim type and later to produce a single model which fits the aggregate of the underlying models by claim type.

2.5 The overall structure of a dataset for GLM claims analysis consists of linked policy and claims information at the individual risk level. Typical data requirements and a more detailed discussion of issues such as dealing with IBNR are set out in Appendix G. In summary, however, the following fields would typically be included in a GLM claims dataset.
- Raw explanatory variables, whether discrete or continuous, internal or external to the company.
- Dummy variables to standardize for time-related effects, geographic effects and certain historical underwriting effects.
- Earned exposure fields, preferably by claim type if certain claim types are only present for some policies. These fields should contain the amount of exposure attributable to the record (e.g. measured in years).
- Number of incurred claims fields. There should be one field for each claim type, giving the number of claims associated with the exposure period in question.
- Incurred loss amounts fields. There should be one field for each claim type, giving the incurred loss amount of claims associated with the exposure period in question, based on the most recent possible case reserve estimates.
- Premium fields. These give the premium earned during the period associated with the record. If it is possible to split this premium between the claim types then this can be used to enhance the analysis. This information is not directly required for modeling claims frequency and severity; however, it can be helpful for a range of post-modeling analyses such as measuring the impact of moving to a new rating structure.

2.6 When analyzing policyholder retention or new business conversion, a different form of data is required. For example, to fit GLMs to policyholder renewal experience, a dataset would contain one record for each invitation to renew and would contain the following fields:
- explanatory variables including, for example:
  - rating factors
  - other factors such as distribution channel, method of payment and number of terms with company
  - change in premium on latest renewal⁸
  - change in premium on previous renewal
  - measure of competitiveness on renewal premium
  - details of any mid-term adjustments occurring in the preceding policy period
- number of invitations to renew (typically 1 for each record; this would be the measure of exposure)
- whether or not the policy renewed.

2.7 If several risks are written on a single policy, renewal may be defined at the policy level or at the individual risk level (for example, a personal automobile carrier may write all vehicles in a household on a single policy). An understanding of how the model will be used will aid data preparation. For example, models that will be part of a detailed model office scenario will benefit from data defined at the individual risk level. Models used to gain an overall understanding of which criteria affect policyholder retention (perhaps for marketing purposes) would not require such detail.

⁸ Separation of premium change into rate change and risk criteria change would be beneficial.

Preliminary analyses

2.8 Before modeling, it is generally helpful to undertake certain preliminary analyses. These analyses include data checks such as identification of records with negative or zero exposures, negative claim counts or losses, and blanks in any of the statistical fields. In addition, certain logical tests may be run against the data, for example identifying records with incurred losses but with no corresponding claim count.

Analysis of distributions

2.9 One helpful preliminary analysis is to consider the distribution of key data items for the purpose of identifying any unusual features or data problems that should be investigated prior to modeling. Mainly this concerns the distribution of claim amounts (i.e. number of claim counts by average claim size), which are examined in order to identify features such as unusually large claims and distortions resulting from average reserves placed on newly reported claims. A typical claim distribution graph is shown below.

[Figure: Example of claim distribution; claim type: third party material damage; number of claims plotted against average claim size.]

2.10 This distribution, along with a distribution of loss amount by average claim size, will aid in understanding the tail of the distribution for a particular claim type. When modeling severity, it is often appropriate to apply a large loss threshold to certain claim types, and this helps assess possible thresholds. A tabular representation of the distribution would also help quantify the percent of the claims distribution which would be affected by different large loss thresholds.
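The data checks and the large-loss tabulation described above can be sketched in a few lines. The records, field names and claim amounts are hypothetical. `data_checks` flags blank statistical fields, zero or negative exposure, negative counts or losses, and incurred losses without a claim count. `threshold_table` reports, for each candidate threshold, the share of claims above it and the share of total incurred amount in excess of it.

```python
# Hypothetical policy-level records; field names are illustrative only.
records = [
    {"id": 1, "exposure": 1.0, "claim_count": 1, "incurred": 2500.0},
    {"id": 2, "exposure": 0.0, "claim_count": 0, "incurred": 0.0},
    {"id": 3, "exposure": 0.5, "claim_count": 0, "incurred": 800.0},
    {"id": 4, "exposure": 1.0, "claim_count": 0, "incurred": None},
    {"id": 5, "exposure": 1.0, "claim_count": -1, "incurred": 0.0},
]

def data_checks(records):
    """Return (id, issue) pairs for basic pre-modeling data checks."""
    issues = []
    for r in records:
        exposure, count, incurred = r["exposure"], r["claim_count"], r["incurred"]
        if None in (exposure, count, incurred):
            issues.append((r["id"], "blank statistical field"))
            continue  # remaining checks need all three values
        if exposure <= 0:
            issues.append((r["id"], "zero or negative exposure"))
        if count < 0 or incurred < 0:
            issues.append((r["id"], "negative claim count or loss"))
        if incurred > 0 and count == 0:
            issues.append((r["id"], "incurred loss with no claim count"))
    return issues

def threshold_table(claim_amounts, thresholds):
    """For each threshold: (threshold, share of claims above it,
    share of total incurred amount in excess of it)."""
    n, total = len(claim_amounts), sum(claim_amounts)
    rows = []
    for t in thresholds:
        large = [x for x in claim_amounts if x > t]
        excess = sum(x - t for x in large)
        rows.append((t, len(large) / n, excess / total))
    return rows

claims = [500.0, 1200.0, 3000.0, 15000.0, 60000.0]   # hypothetical claim amounts
print(data_checks(records))
print(threshold_table(claims, [10000, 50000]))
```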

2.11 Distribution analyses can also highlight specific anomalies that might require addressing prior to modeling. For example, if many new claims have a standard average reserve allocated to them, it might be appropriate to adjust the amount of such an average reserve if it was felt that the average level was systematically below or above the ultimate average claims cost.

One and two-way analyses

2.12 Although GLMs are a multivariate method, there is generally benefit in reviewing some one-way and two-way analyses of the raw data prior to modeling.

2.13 Firstly, the one-way distribution of exposure and claims across levels of each raw variable will indicate whether a variable contains enough information to be included in any models (for example, if 99.5% of a variable's exposures are in one level, it may not be suitable for modeling).

2.14 Secondly, assuming there is some viable distribution by levels of the factor, consideration needs to be given to any individual levels containing very low exposure and claim count. If these levels are not ultimately combined with other levels, the GLM maximum likelihood algorithm may not converge. If a factor level has zero claims and a multiplicative model is being fitted, the theoretically correct multiplier for that level will be close to zero, and the parameter estimate corresponding to the log of that multiplier may be so large and negative that the numerical algorithm seeking the maximum likelihood will not converge.

2.15 In addition to investigating exposure and claim distribution, a query of one-way statistics (e.g. frequency, severity, loss ratio, pure premium) will give a preliminary indication of the effect of each factor.

Factor categorizations

2.16 Before modeling, it is necessary to consider how explanatory variables should be categorized, and whether any variables should be modeled in a continuous fashion as variates (or polynomials in variates). Although variates do not require any artificially imposed categorization, their main disadvantage is that the use of polynomials may smooth over interesting effects in the underlying experience.
Often it is better to begin modeling all variables as narrowly defined categorical factors (ensuring sufficient data in each category) and, if the categorical factor presents GLM parameter estimates which appear appropriate for modeling with a polynomial, then the polynomial in the variate may be used in place of the categorical factor.
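A sketch of the one-way summaries follows, using hypothetical records and field names. Each level reports its exposure share, which helps spot a factor with nearly all exposure in one level, together with frequency, severity, pure premium and loss ratio. Severity is left undefined (`None`) where a level has no claims, which is also the situation that can prevent a multiplicative model from converging.

```python
from collections import defaultdict

def one_way(records, factor):
    """One-way summary by level of `factor`: exposure share, frequency,
    severity, pure premium and loss ratio."""
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0.0])   # exposure, claims, incurred, premium
    for r in records:
        t = totals[r[factor]]
        t[0] += r["exposure"]
        t[1] += r["claims"]
        t[2] += r["incurred"]
        t[3] += r["premium"]
    all_exposure = sum(t[0] for t in totals.values())
    summary = {}
    for level, (exposure, claims, incurred, premium) in totals.items():
        summary[level] = {
            "exposure_share": exposure / all_exposure,
            "frequency": claims / exposure,
            "severity": incurred / claims if claims else None,  # undefined with no claims
            "pure_premium": incurred / exposure,
            "loss_ratio": incurred / premium,
        }
    return summary

# Hypothetical records for a single factor.
records = [
    {"territory": "A", "exposure": 100.0, "claims": 10, "incurred": 20000.0, "premium": 25000.0},
    {"territory": "A", "exposure": 50.0, "claims": 5, "incurred": 9000.0, "premium": 12000.0},
    {"territory": "B", "exposure": 50.0, "claims": 0, "incurred": 0.0, "premium": 10000.0},
]
for level, stats in one_way(records, "territory").items():
    print(level, stats)
```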

2.17 When using categorical factors, consideration needs to be given to the way in which the factors are categorized. If an example portfolio contained a sufficient amount of claims for each age of driver (say from age 16 to 99), the categorization of age of driver may consist of each individual age. This is rarely the case in practice, however, and it is often necessary to combine levels of certain rating factors.

2.18 In deriving an appropriate categorization, the existing rating structure may provide initial guidance (particularly if the GLMs are to be applied in ratemaking), with factor levels with insufficient exposure then being grouped together and levels with sufficient exposure being considered separately. In general such a manual approach tends to be the most appropriate. One particular automated approach within the GLM framework is considered in Appendix H. This approach, however, would not necessarily produce any more appropriate results than the manual approach.

Correlation analyses

2.19 Once categorical factors have been defined, it can also be helpful to consider the degree to which the exposures of explanatory factors are correlated. One commonly used correlation statistic for categorical factors is Cramer's V statistic.⁹ Further information about this statistic is set out in Appendix I.

2.20 Although not used directly in the GLM process, an understanding of the correlations within a portfolio is helpful when interpreting the results of a GLM. In particular, it can explain why the multivariate results for a particular factor differ from the univariate results, and can indicate which factors may be affected by the removal or inclusion of any other factor in the GLM.

Data extracts

2.21 In practice it is not necessary to fit every model to the entire dataset. For example, modeling severity for a particular claim type only requires records that contain a claim of that type. Running models against data subsets, or extracts, can improve model run speed.
⁹ Other correlation statistics for categorical factors include the Pearson chi-square, the likelihood ratio chi-square, the Phi coefficient and the Contingency coefficient. A thorough discussion of these statistics is beyond the scope of this paper.
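As an illustration, Cramer's V can be computed from a two-way table of exposures, cross-tabulated by the levels of two categorical factors, using the usual definition $V = \sqrt{\chi^2 / (n\,(\min(r, c) - 1))}$, where $\chi^2$ is Pearson's chi-square statistic for independence, $n$ is the table total and $r$, $c$ are the numbers of rows and columns. This is a sketch under that textbook definition; the function name and the list-of-rows input layout are illustrative assumptions, not taken from Appendix I.

```python
import math


def cramers_v(table):
    """Cramer's V for a two-way table of exposures (or counts).

    table: list of rows, each a list of non-negative numbers. Rows or columns
    with a zero total must be removed first (their expected values would be 0).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic against the independence hypothesis.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical exposure tables for two two-level factors.
independent = cramers_v([[10, 20], [20, 40]])  # same mix in every row: V = 0
fully_linked = cramers_v([[10, 0], [0, 10]])   # one factor determines the other: V = 1
```

Values near 0 suggest the exposures of the two factors are close to independent; values near 1 indicate strong correlation, which is when the multivariate and one-way results for a factor are most likely to diverge.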

2.22 The error term assumed for a model can also influence these data extracts. In the case of claim counts, a particular property of the Poisson multiplicative model is that the observed data Y can be grouped by unique combination of the rating factors being modeled (summing exposure and claim counts for each unique combination), and the GLM parameter estimates and the parameter estimate standard errors will remain unchanged. This is helpful in practice since it can decrease model run times. This is not the case for some other distributions.

2.23 A gamma multiplicative model does not produce identical results if the observations are grouped by unique combinations of factors. Such a grouping would not change the parameter estimates, but it would affect the standard errors. Depending on the line of business, however, it may be appropriate to group the small number of multiple claims which occur on the same policy in the same exposure period.

Model iteration and the role of diagnostics

2.24 Given data relating to the actual observations and the assumptions about the model form, a GLM will yield parameter estimates which best fit the data given that model form. The GLM will not automatically provide information indicating the appropriateness of the model fitted; for this it is necessary to examine a range of diagnostics. This section reviews model forms typically used in practice and discusses the range of diagnostics which aid in both the selection of explanatory factors and the validation of statistical assumptions.

Factor selection

2.25 One of the key issues to consider is which explanatory factors should be included in the model. The GLM will benefit from including factors which systematically affect experience, but excluding factors which have no systematic effect. To distinguish whether a factor effect is systematic or random (and therefore unlikely to be repeated in the future), there are a number of criteria which can be considered, including:
- parameter estimate standard errors;
- deviance tests (type III tests);
- consistency with time;
- common sense.
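The grouping property of the Poisson multiplicative model described under data extracts can be checked numerically. The sketch below assumes a Poisson log-link model with log(exposure) as an offset and a 0/1 design vector per record; the records are hypothetical. The record-level and grouped log-likelihoods differ by a constant that does not depend on the parameters, so both are maximized at the same estimates and share the same Hessian, and therefore the same standard errors.

```python
import math


def poisson_loglik(records, beta):
    """Poisson log-likelihood with log link and log(exposure) offset.

    records: iterable of (x, exposure, claims), where x is a tuple of design
    matrix entries and the expected claim count is exposure * exp(x . beta).
    """
    total = 0.0
    for x, exposure, claims in records:
        mu = exposure * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
        total += claims * math.log(mu) - mu - math.lgamma(claims + 1)
    return total


def group_records(records):
    """Sum exposure and claim counts for each unique combination of rating factors."""
    cells = {}
    for x, exposure, claims in records:
        e, c = cells.get(x, (0.0, 0))
        cells[x] = (e + exposure, c + claims)
    return [(x, e, c) for x, (e, c) in cells.items()]


# Hypothetical policy-level records: intercept plus one two-level factor.
records = [((1, 0), 0.5, 0), ((1, 0), 1.0, 1), ((1, 1), 0.8, 0),
           ((1, 1), 0.7, 2), ((1, 1), 1.0, 0)]
grouped = group_records(records)

# The gap between the two log-likelihoods is the same for any beta.
gaps = [poisson_loglik(records, b) - poisson_loglik(grouped, b)
        for b in [(0.0, 0.0), (-1.0, 0.5), (0.3, -2.0)]]
```

The same check fails for a gamma model, whose likelihood depends on the individual observations through more than their totals, which is consistent with the text's remark that grouping changes its standard errors.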
Standard errors

2.26 As discussed in Section 1, as well as deriving parameter estimates which maximize likelihood, a GLM can produce important additional information indicating the certainty of those parameter estimates.

2.27 One such helpful diagnostic is the standard errors of the parameter estimates, defined as the square roots of the diagonal elements of $-H^{-1}$, where $H$, the Hessian, is the matrix of second derivatives of the log likelihood:

$$\mathrm{SE}(\hat\beta_j) = \sqrt{\left[-H^{-1}\right]_{jj}}, \qquad H_{jk} = \frac{\partial^2 \ell}{\partial \beta_j\,\partial \beta_k}$$

2.28 Although theoretically tests could be performed on individual parameter estimates using standard errors, in practice it is often more helpful to consider, for each factor in the GLM, the fitted parameter estimates alongside the associated standard errors (for one base level) in a graphical form, thus:

[Figure: GLM output, example of significant factor. Log of multiplier by Vehicle symbol, showing one-way relativities, approx 95% confidence interval and parameter estimate, with exposure years as bars; P value 0.0%.]

2.29 One such graph would be shown for each factor in the model. In this case the factor in question is Vehicle Symbol, whose levels are plotted along the horizontal axis.

2.30 The thick solid line shows the fitted parameter estimates. In this case the model is a multiplicative model with a log link function, so the parameter estimates represent logs of multipliers. For clarity the implied loadings are shown as labels by each point on the thick solid line. For example, if the parameter estimate for a given Vehicle Symbol is $\beta$, the model estimates that, all other factors being constant, exposures with that Vehicle Symbol will have a relativity of $e^{\beta}$ times that expected for exposures at the base level. This multiplier is shown on the graph as a "loading" of $(e^{\beta} - 1) \times 100\%$.
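A small sketch of the standard error calculation, assuming the Hessian has already been evaluated at the maximum likelihood estimates. For a Poisson model with log link the Hessian is $-X^{T}WX$, with $W$ the diagonal matrix of fitted claim counts; the two-parameter example (an intercept plus a two-level factor, with hypothetical fitted counts of 100 and 50) recovers the familiar result that the standard error of a log relativity is $\sqrt{1/100 + 1/50}$. The matrix inversion is a plain Gauss-Jordan routine chosen to keep the example dependency-free.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (for example, aliased parameters)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(hessian):
    """Square roots of the diagonal of -H^{-1}, where H is the log-likelihood Hessian."""
    covariance = invert([[-v for v in row] for row in hessian])
    return [math.sqrt(covariance[i][i]) for i in range(len(covariance))]


# Poisson, log link: H = -X^T W X with W = fitted claim counts. Hypothetical
# fitted counts of 100 (base level) and 50 (other level) give
# -H = [[150, 50], [50, 50]].
se_intercept, se_relativity = standard_errors([[-150.0, -50.0], [-50.0, -50.0]])
```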

2.31 The thin solid lines on each graph indicate two standard errors either side of the parameter estimate. Very approximately, this means that, assuming the fitted model is appropriate and correct, the data suggest that the true relativity for each level of the rating factor lies between the two thin solid lines with roughly 95% certainty. The two bands will be wide apart, indicating great uncertainty in the parameter estimate, where there is low exposure volume, where other correlated factors also explain the risk, or where the underlying experience is very variable.

2.32 The dotted line shows the relativities implied by a simple one-way analysis. These relativities make no allowance for the fact that the difference in experience may be explained in part by other correlated factors. These one-way estimates are of interest since they will differ from the multivariate estimates for a given factor when there are significant correlations between that factor and one or more other significant factors. The distribution of exposure for all business considered is also shown as a bar chart at the bottom of each graph. This serves to illustrate which levels of each factor may be financially significant.

2.33 Even though the standard errors on the graph only indicate the estimated certainty of the parameter estimates relative to the base level, such graphs generally give a good intuitive feel for the significance of a factor. For example, in the above case it is clear that the factor is significant, since the parameter estimates for a number of the Vehicle Symbols are considerably larger than twice the corresponding standard errors. By contrast, the graph below (the same factor in a different model, for a different claim type) illustrates an example where a factor is not significant: in this case there are no parameter estimates more than two standard errors from zero.

[Figure: GLM output, example of insignificant factor. Log of multiplier by Vehicle symbol, showing one-way relativities, approx 95% confidence interval and parameter estimate, with number of claims as bars; P value 5.5%.]
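The quantities plotted on these graphs can be sketched for a single level of a log-link factor, given its parameter estimate and standard error. The function name and return layout are illustrative. The band is two standard errors either side of the estimate on the log scale, exponentiated back to a multiplier, so, as the text notes, it is only an approximate 95% interval and is relative to the base level.

```python
import math


def describe_level(beta, se):
    """Summarise one log-link parameter estimate in the terms used on the graphs."""
    multiplier = math.exp(beta)
    return {
        "multiplier": multiplier,
        # The "loading" label shown beside each point on the thick solid line.
        "loading_pct": 100.0 * (multiplier - 1.0),
        # Thin solid lines: beta +/- 2 standard errors, converted to multipliers.
        "band": (math.exp(beta - 2.0 * se), math.exp(beta + 2.0 * se)),
        # Crude visual check: is the estimate more than two standard errors from zero?
        "beyond_two_se": abs(beta) > 2.0 * se,
    }


level = describe_level(math.log(1.5), 0.1)  # hypothetical estimate and standard error
```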

2.34 Sometimes some levels of a categorical factor may be clearly significant, while other levels may be less so. Although the factor as a whole may be statistically significant, this may indicate that it is appropriate to re-categorize the factor, grouping the less significant levels together with other levels.

Deviance tests

2.35 As discussed in Section 1, comparing measures of deviance of two nested models allows "type III" tests ($\chi^2$ or F-tests, depending on whether or not the scale parameter $\phi$ is known) to be performed to determine the theoretical significance of individual factors.

2.36 In the Vehicle Symbol examples above, which were based on frequency models of two different claim types, each with a Poisson error structure, the resulting probability values (P values) from the $\chi^2$ tests are shown as footnotes to the graphs. Each $\chi^2$ test compares a model with Vehicle Symbol to one without. In the first case the $\chi^2$ test shows a probability level close to 0 (displayed to one decimal place as 0.0%). This means that the probability of this factor having such an effect on the deviance by chance is almost zero, i.e. this factor is, according to the $\chi^2$ test, highly significant. Conversely, in the second example the probability value is 5.5%, indicating that the factor is considerably less significant and should be excluded from the model. Typically factors with $\chi^2$ or F-test probability levels of 5% or less are considered significant.

2.37 These kinds of type III likelihood ratio tests can provide additional information to the graphical interpretation of parameter estimates and standard errors. For example, if other correlated factors in a model could largely compensate for the exclusion of a factor, this would be indicated in the type III test.
Also, the type III test is not influenced by the choice of the base level in the way that parameter estimate standard errors are.

2.38 On the other hand, type III tests can be impractical on occasion. For example, if a factor with many levels contained only one level that had any discriminatory effect on experience, a type III test might indicate that the factor was statistically significant, whereas a graphical representation of the model results would show at a glance that the factor contained too many levels and needed to be re-categorized with fewer parameters.
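Where the scale parameter is known (as with the Poisson models above, where $\phi = 1$), the type III test compares the reduction in deviance from including a factor with a $\chi^2$ distribution whose degrees of freedom equal the number of parameters the factor adds. The sketch below computes that tail probability for integer degrees of freedom using only the Python standard library, via the regularized upper incomplete gamma function. The F-test used when $\phi$ is estimated is not shown, and the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df > x).

    Starts from Q(1/2, x/2) = erfc(sqrt(x/2)) (odd df) or Q(1, x/2) = exp(-x/2)
    (even df) and applies Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        s, q = 0.5, math.erfc(math.sqrt(y))
    else:
        s, q = 1.0, math.exp(-y)
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def deviance_test(deviance_without, deviance_with, extra_params, scale=1.0):
    """P value for the factor: chance of a deviance drop this large if it had no effect."""
    statistic = (deviance_without - deviance_with) / scale
    return chi2_sf(statistic, extra_params)
```

For example, a factor that adds one parameter and reduces the deviance by 10 gives `deviance_test(110.0, 100.0, 1)`, a P value of roughly 0.16%, well inside the usual 5% threshold.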

Interaction with time

2.39 In addition to classical statistical tests, it can often be helpful to consider rather more pragmatic tests, such as whether the observed effect of a rating factor is consistent over time. For example, if more than one year's experience is being considered, it is possible to consider the effect of a particular factor in each calendar year of exposure (or alternatively policy year). In theory this could be done by fitting separate models to each year and then comparing the results; however, this can be hard to interpret, since a movement in one factor in one year may to a large extent be compensated for by a movement in another correlated factor. A potentially clearer test, therefore, is to fit a series of models, each one of which considers the interaction of a single factor with time. Interactions are discussed in more detail later in this paper.

2.40 The diagram below shows one example factor interacted with calendar year of exposure. It is clear from this result, which shows lines that are largely parallel, that the factor effect is mainly consistent from year to year, suggesting that the factor is likely to be a good predictor of future experience.

[Figure: GLM output, example showing factor consistent over time. Log of multiplier by Vehicle symbol interacted with year of exposure, showing the parameter estimate and approx 95% confidence interval for each year of exposure, with exposure years as bars.]

2.41 Conversely, the graph below shows an example of a factor (in this case territory classification) which, although significant according to classical type III tests, shows a pattern for some levels which differs from year to year. In such a case it would be appropriate to investigate whether there was a possible explanation for such variations. If the variation can be attributed to some known change (for example, some event in one of the territories during one period), then that can be allowed for when interpreting the results. If no explanation can be found for variations over time, this may indicate that the factor will be an unreliable predictor of future experience.

[Figure: GLM output, example showing factor inconsistent over time. Log of multiplier by Territory interacted with year of exposure, showing the smoothed estimate and approx 95% confidence interval for each year of exposure, with exposure years as bars.]

Intuition

2.42 In addition to statistical and other pragmatic tests, common sense can also play an important role in factor selection. Issues which should be considered when assessing the significance of a factor include:
- whether the observed effect of a factor is similar across models which consider related types of claim (e.g. auto property damage liability and collision);
- whether the observed effect makes logical sense given the other factors in the model;
- whether the observed effects of a categorical factor which represents a continuous variable (such as the age of a vehicle) show a natural trend. The model has no way of knowing that factor levels have a natural order, so if a trend is observed this may suggest that the factor has a more significant effect than the pure statistical tests alone would suggest.

Model iteration / stepwise macros

2.43 It is not generally possible to determine from a single GLM which set of factors is significant, since the inclusion or exclusion of one factor will change the observed effects, and therefore possibly the significance, of other correlated factors in the model. To determine the theoretically optimal set of factors, therefore, it is generally necessary to consider an iterated series of models.

2.44 Often the model iteration starts with a GLM that includes all the main explanatory variables. Insignificant factors can then be excluded, one at a time, refitting the model at each stage.

2.45 When a factor is identified as being insignificant, it is helpful to compare the GLM parameter estimates for that factor with the equivalent one-way relativities. When the GLM parameter estimates differ from the one-way relativities, this indicates that the factor in question is correlated with other factors in the model, and that the removal of that factor from the model is likely to affect the parameter estimates for other factors, and quite possibly also their significance. Conversely, if the one-way relativities are very similar to the GLM relativities for the factor to be excluded, it is likely that there will be no such consequences, and therefore, to save time, a second insignificant factor could also be removed at that iteration.

2.46 If a very large number of factors are to be considered, it can be impractical to start the factor iteration process with all possible factors in the model.
In such cases it is possible to select a model with certain factors which are known to be important, and then to test all other excluded factors by fitting a series of models which, one at a time, test the consequences of including each of the excluded factors. The most significant of the excluded factors can then be included in the model, and the other excluded factors can then be retested for significance.

2.47 Where possible, it is generally best to iterate models manually by analyzing the various diagnostics described above for each factor. In practice, if many factors are being analyzed, this can be impractical. In such circumstances automatic "stepwise" model-iterating algorithms can be programmed to iterate models on the basis of type III tests alone. Such algorithms start with a specified model, and then:

a. the significance of each factor in the model is tested with a type III test, and the least significant factor is removed from the model if its significance is below a certain specified threshold;
b. the significance of each factor not in the model, but in a specified list of potential factors, is tested by creating, one at a time, a new model containing the factors from the previous step plus the potential new factor. The most significant factor not currently in the model (according to a type III test) is then included if its significance is above the specified threshold;
c. steps a. and b. are repeated until all factors in the model are deemed significant, and all factors not in the model are deemed insignificant.

2.48 Such algorithms allow no human judgment to be exercised and can take a significant time to complete. They are also heavily dependent on the type III test, which has some practical shortcomings as described previously. Nevertheless, they can derive a theoretically optimal model which at the very least could form the starting point for a more considered manual iteration.

Model validation

2.49 As well as considering the significance of the modeled rating factors, there are a number of more general diagnostic tests which allow the appropriateness of other model assumptions to be assessed. Diagnostics which aid in this investigation include:
- residuals, which test the appropriateness of the error term;
- leverage, which identifies observations which have undue influence on a model;
- the Box-Cox transformation, which examines the appropriateness of the link function.
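The stepwise loop in steps a to c can be sketched as follows. It assumes a caller-supplied function `p_value(model, factor)` (hypothetical, not part of the source) that would refit the GLM with and without `factor` and return the type III P value. A factor whose "significance is below the threshold" is taken here to mean one whose P value is above it. A cap on the number of rounds guards against the loop cycling.

```python
def stepwise(start, candidates, p_value, threshold=0.05, max_rounds=100):
    """Stepwise factor selection driven only by type III P values.

    start: factor names in the initial model.
    candidates: factor names that may be added.
    p_value(model, factor): hypothetical callback giving the type III P value of
        `factor` in `model` (a set that already contains `factor`).
    """
    model = set(start)
    for _ in range(max_rounds):
        changed = False
        # Step a: remove the least significant factor if it fails the threshold.
        if model:
            worst = max(sorted(model), key=lambda f: p_value(model, f))
            if p_value(model, worst) > threshold:
                model.discard(worst)
                changed = True
        # Step b: add the most significant excluded candidate if it passes.
        outside = sorted(f for f in candidates if f not in model)
        if outside:
            best = min(outside, key=lambda f: p_value(model | {f}, f))
            if p_value(model | {best}, best) <= threshold:
                model.add(best)
                changed = True
        # Step c: stop once a full pass neither removes nor adds a factor.
        if not changed:
            return model
    raise RuntimeError("stepwise search did not settle within max_rounds")
```

In real use every call to `p_value` implies refitting at least one GLM, which is why such searches can take a significant time to complete.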

Residuals

2.50 Various measures of residual can be derived to show, for each observation, how the fitted value differs from the actual observation.

2.51 One measure of residual is the deviance residual:

$$r_D = \operatorname{sgn}(Y-\mu)\sqrt{2\omega\int_{\mu}^{Y}\frac{Y-\zeta}{V(\zeta)}\,d\zeta}$$

which is the square root of the observation's contribution to the total deviance (i.e. a measure of the distance between the observation and the fitted value), multiplied by 1 or -1 depending on whether the observation is more than or less than the fitted value.

2.52 The deviance residuals have various helpful properties. In general they will be more closely Normally distributed than the raw residuals (defined simply as the difference between the actual observation and the expected value predicted by the GLM), as the deviance calculation corrects for the skewness of the distributions. For continuous distributions it is possible to test the distribution of the deviance residuals to check that they are Normally distributed. Any large deviation from this distribution is a good indication that the distributional assumptions are being violated.

2.53 The diagram below shows a distribution of deviance residuals from an example model. In this case the residuals appear to be reasonably consistent with a Normal distribution.

[Figure: Histogram of deviance residuals (frequency against size of deviance residual) for an example claim amounts model.]
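The deviance residual can be evaluated directly from its integral definition. The sketch below does so numerically with Simpson's rule for any variance function $V$, and compares the result with the closed forms of the integral for the Poisson ($V(\zeta) = \zeta$) and gamma ($V(\zeta) = \zeta^2$) cases. The function names, the step count and the example values are illustrative assumptions.

```python
import math


def deviance_residual(y, mu, variance, weight=1.0, steps=1000):
    """r_D = sgn(y - mu) * sqrt(2 * w * integral from mu to y of (y - z) / V(z) dz).

    The integral is evaluated with Simpson's rule (steps must be even), so V(z)
    must be finite and positive on the interval between mu and y.
    """
    h = (y - mu) / steps
    total = 0.0
    for k in range(steps + 1):
        z = mu + k * h
        coeff = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += coeff * (y - z) / variance(z)
    integral = max(total * h / 3.0, 0.0)
    return math.copysign(math.sqrt(2.0 * weight * integral), y - mu)


def poisson_unit_deviance(y, mu):
    # Closed form of 2 * integral for V(z) = z; the y*log(y/mu) term is 0 when y = 0.
    return 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))


def gamma_unit_deviance(y, mu):
    # Closed form of 2 * integral for V(z) = z**2 (requires y > 0).
    return 2.0 * (-math.log(y / mu) + (y - mu) / mu)


r_poisson = deviance_residual(3.0, 1.5, lambda z: z)    # observation above fitted value
r_gamma = deviance_residual(0.5, 2.0, lambda z: z * z)  # observation below fitted value
```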

2.54 For discrete distributions, the deviance residuals based on individual observations tend not to appear Normally distributed. This is because the calculation of the contribution to the deviance can adjust for the shape, but not the discreteness, of the observations. For example, in the case of fitting a model to claim numbers, a GLM might predict a fitted value for a record of, say, 0.1 (representing an expected claims frequency of 10%). In reality (ignoring multiple claims) either a claim occurs for that record or it does not, with the result that the residual for that record will either correspond to an "actual minus expected" value of 0 - 0.1 = -0.1, or, with lower probability, to an "actual minus expected" value of 1 - 0.1 = 0.9.

2.55 Some practitioners group the individual residuals together into large groups of similar risks. This aggregation can disguise the discreteness, allowing some distributional tests to be performed. For example, it is commonly thought that a Poisson distribution with a suitably large mean can be treated as almost Normally distributed. At this point the deviance residuals calculated on the aggregate data should be smooth enough to test meaningfully.

2.56 The deviance residuals are often standardized before being analyzed. The purpose of this standardization is to transform the residuals so that they have variance 1 if the model assumptions hold. This is achieved by adjusting by the square root of the scale parameter and also by the square root of one minus the "leverage" $h$:

$$r_{DS} = \frac{\operatorname{sgn}(Y-\mu)}{\sqrt{\phi\,(1-h)}}\sqrt{2\omega\int_{\mu}^{Y}\frac{Y-\zeta}{V(\zeta)}\,d\zeta}$$

2.57 The leverage $h$ is a measure of how much influence an observation has over its own fitted value. Its formal definition is complex, but essentially it is a measure of how much a change in an observation affects the fitted value for that observation. Leverage always lies between 0 and 1. A leverage close to 1 means that if the observation were changed by a small amount, the fitted value would move by almost the same amount. Where the leverage is close to 1, it is likely that the residual for that observation will be unusually small because of the high influence the observation has on its fitted value. Dividing by the square root of one minus the leverage corrects for this by increasing the residual by an appropriate amount.

2.58 Another commonly used measure of the residual is the standardized Pearson residual. This is the raw residual adjusted for the expected variance and leverage as described above:

$$r_{PS} = \frac{Y-\mu}{\sqrt{\dfrac{\phi\,V(\mu)\,(1-h)}{\omega}}}$$
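The standardized residuals can be sketched directly from the formulas, here with a Poisson deviance residual in closed form. The example at the end reproduces the discreteness problem described for claim numbers: with a fitted value of 0.1, a single record can only ever produce one of two deviance residuals, so a histogram of record-level residuals cannot look Normal. Function names are illustrative.

```python
import math


def poisson_deviance_residual(y, mu, weight=1.0):
    """r_D for a Poisson model (V(mu) = mu), using the closed-form unit deviance."""
    unit = 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))
    return math.copysign(math.sqrt(weight * unit), y - mu)


def standardized_deviance_residual(r_d, phi, leverage):
    """r_DS = r_D / sqrt(phi * (1 - h)); undefined when h = 1."""
    return r_d / math.sqrt(phi * (1.0 - leverage))


def standardized_pearson_residual(y, mu, variance, phi, leverage, weight=1.0):
    """r_PS = (y - mu) / sqrt(phi * V(mu) * (1 - h) / w)."""
    return (y - mu) / math.sqrt(phi * variance(mu) * (1.0 - leverage) / weight)


# Fitted claim count 0.1 for one record: only two residuals are possible.
no_claim = poisson_deviance_residual(0, 0.1)   # -sqrt(0.2), about -0.447
one_claim = poisson_deviance_residual(1, 0.1)  # sqrt(2 * (ln 10 - 0.9)), about 1.675
```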

2.59 This adjustment makes observations with different means comparable, but does not adjust for the shape of the distribution.

2.60 Observing scatter plots of residuals against fitted values can give an indication of the appropriateness of the error function which has been assumed. For example, if the model form is appropriate, the standardized deviance residuals should be Normally distributed, regardless of the fitted value. The example scatter plot below shows the result of fitting a GLM with a gamma variance function to data which has been randomly generated on a hypothetical insurance dataset from a gamma distribution with a mean based on assumed factor effects. It can be seen that, moving from the left to the right of the graph, the general mean and variability of the deviance residuals are reasonably constant, suggesting (as is known to be the case in this artificial example) that the assumed variance function is appropriate.

[Figure: Plot of deviance residual against fitted value for an own damage claim amounts model fitted with a gamma variance function.]

2.61 Conversely, the graph below shows the scaled deviance residuals obtained from fitting a GLM with an assumed Normal error to the same gamma data. In this case the variability increases with fitted value, indicating that an inappropriate error function has been selected and that the variance of the observations increases with the fitted values to a greater extent than has been assumed. This could occur, for example, when a Normal model is fitted to Poisson data, when a Poisson model is fitted to gamma data, or, as is the case here, when a Normal model is fitted to gamma data.

[Figure: Plot of deviance residual against fitted value for the same own damage claim amounts data, fitted with a Normal error.]

Leverage

2.62 As well as being needed to calculate standardized residuals, the leverage statistic is also a helpful diagnostic in its own right, since it can identify particular observations which might have an undue influence on the model. For example, the graph below shows a scatter plot of leverage against fitted value. In this case seven particular observations have clearly higher leverage than other observations with similar fitted values, and it is possible that they are having an undue influence on the model. An inspection of these observations may indicate whether or not it is appropriate to retain them in the model.

[Figure: Plot of leverage against fitted value for an own damage claim amounts model.]
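For a GLM, one common way to compute leverage is as the diagonal of the weighted hat matrix, $h_i = w_i\,x_i^{T}(X^{T}WX)^{-1}x_i$, where $W$ holds the working weights from the final iteration of the fitting algorithm (for a Poisson log-link model these are the fitted values). The sketch below assumes those weights are supplied; the small design matrix is hypothetical. The leverages always sum to the number of parameters, which is a useful sanity check.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (for example, aliased parameters)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(X, weights):
    """Diagonal of the weighted hat matrix: h_i = w_i * x_i^T (X^T W X)^{-1} x_i.

    X: design matrix as a list of rows; weights: working weights w_i from the
    final fitting iteration (for a Poisson log-link model, the fitted values).
    """
    p = len(X[0])
    xtwx = [[sum(w * row[j] * row[k] for row, w in zip(X, weights)) for k in range(p)]
            for j in range(p)]
    inv = invert(xtwx)
    return [w * sum(row[j] * inv[j][k] * row[k] for j in range(p) for k in range(p))
            for row, w in zip(X, weights)]


# Hypothetical design: intercept plus a two-level factor, equal weights.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
h = leverages(X, [1.0] * 5)  # [0.5, 0.5, 1/3, 1/3, 1/3], summing to 2 parameters
```

An observation that is the only one at some factor level would receive a leverage of 1: its parameter fits it exactly, which is the extreme case of undue influence and the point at which the standardized residual formulas above break down.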

Box-Cox transformation and the case for a multiplicative model

2.63 The Box-Cox transformation can be used to assess the appropriateness of the assumed link function. The transformation defines the following link function in terms of a scalar parameter $\lambda$:

$$g(x,\lambda) = \begin{cases} \dfrac{x^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln x, & \lambda = 0 \end{cases}$$

2.64 If $\lambda = 1$, $g(x) = x - 1$. This is equivalent to an identity link function (i.e. an additive model) with a base level shift.

2.65 As $\lambda \to 0$, $g(x) \to \ln x$ (via L'Hôpital's Rule):

$$\lim_{\lambda\to 0}\frac{x^{\lambda}-1}{\lambda} = \lim_{\lambda\to 0}\frac{\frac{d}{d\lambda}\left(e^{\lambda\ln x}-1\right)}{\frac{d}{d\lambda}\lambda} = \lim_{\lambda\to 0} x^{\lambda}\ln x = \ln x$$

This is equivalent to a multiplicative model.

2.66 If $\lambda = -1$, $g(x) = 1 - x^{-1}$. This is equivalent to an inverse link function with a base level shift.

2.67 By fitting a series of GLMs to the data with many different values of $\lambda$ (including real values between -1 and 0 and between 0 and 1), with the models identical in every other respect, it is possible to assess which value of $\lambda$ is most appropriate for the dataset in question by seeing which value of $\lambda$ yields the highest likelihood. Optimal values of $\lambda$ around 0 would suggest that a multiplicative structure with a log link function would be the most appropriate for the data in question, whereas optimal values of $\lambda$ around 1 would suggest that an additive structure would be best, with values around -1 indicating that an inverse link function would be most appropriate.

2.68 Examples based on two real datasets are shown below.

[Figures: Box-Cox transformation results on frequency, and Box-Cox transformation results on severity. Each plots likelihood against $\lambda$ for the two datasets, with the inverse, multiplicative and additive values of $\lambda$ marked.]

2.69 The first graph shows various values of $\lambda$ tested on two different datasets containing private passenger automobile property damage liability frequency experience. The optimal $\lambda$ in one case is very close to zero, suggesting a multiplicative model, but in the other is a small positive value, suggesting that the frequency in that case is largely influenced by explanatory variables in a multiplicative fashion, but to an extent also in an additive fashion.

2.70 The second graph shows the results for claim amounts models for the same data. Here the optimal values of $\lambda$ are near zero (multiplicative), but this time slightly toward the direction of being partly inverse.

2.71 In order to understand how significant the value of $\lambda$ is to the fitted values produced by the model, it is helpful to consider the histogram below, which shows, for one of the two frequency datasets considered above, the distribution of the ratio of fitted values produced by a GLM with $\lambda = 0$ to those produced by an otherwise identical GLM with a small positive value of $\lambda$. It can be seen that there is in fact little difference between the fitted values produced by these two models, with the great majority of fitted values being within a few percent of each other.

[Figure: Distribution of the ratio of fitted values between the multiplicative model ($\lambda = 0$) and the model with a small positive $\lambda$ (count of records against ratio of fitted values).]

2.72 In practice there are many significant advantages to using a multiplicative structure, not least because it is easy to understand. In the above examples there seems to be no strong evidence to use a structure other than a multiplicative structure.
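The Box-Cox link and the search over $\lambda$ can be sketched as follows. `box_cox_link` implements the definition given for the transformation, switching to $\ln x$ at $\lambda = 0$. `best_lambda` assumes a caller-supplied function `loglik(lam)` (hypothetical) that would refit an otherwise identical GLM with that link and return its log-likelihood; the toy likelihood in the example is made up purely to exercise the search.

```python
import math


def box_cox_link(x, lam, tol=1e-12):
    """g(x, lambda) = (x**lambda - 1) / lambda, with the limit ln(x) at lambda = 0."""
    if abs(lam) < tol:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def best_lambda(grid, loglik):
    """Return the lambda in `grid` with the highest log-likelihood.

    loglik(lam): hypothetical callback that refits an otherwise identical GLM
    with link g(., lam) and returns its log-likelihood.
    """
    return max(grid, key=loglik)


grid = [i / 10 for i in range(-10, 11)]  # -1.0, -0.9, ..., 1.0
toy_best = best_lambda(grid, lambda lam: -(lam - 0.2) ** 2)  # made-up likelihood
```

Near $\lambda = 0$ the first branch is numerically close to the logarithm, which is the limit derived via L'Hôpital's Rule; the tolerance switch simply avoids dividing by zero at exactly $\lambda = 0$.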

2.73 While this should be tested in each case, it is often the case that multiplicative structures and log link functions are the most appropriate practical model for modeling insurance risk, and this may explain the high prevalence of multiplicative rating structures, especially in Europe, where GLMs have been in use for many years.

Model refinement

Interactions

2.74 Thus far, the discussion has focused on the independent effect of factors in the model. Generalized linear models can also consider the interaction between two or more factors. Interactions occur when the effect of one factor varies according to the level of another factor.

2.75 Interactions relate to the effect which factors have upon the risk, and are not related to the correlation in exposure between two factors. This is illustrated with two examples which consider two rating factors in a multiplicative rating structure.

Example 1 - correlation but no interaction

[Tables: earned exposure, number of claims and claims frequency by gender (Male, Female) and location (Town, Countryside), with totals.]

2.76 In this example the exposure is not distributed evenly amongst the different rating cells: a higher proportion of town dwellers are male than is the case in the countryside. The effects of the two factors upon the risk, however, do not in this example depend on each other: men are twice the risk of women regardless of location, and town dwellers are twice the risk of countryside dwellers regardless of gender. In this example there is therefore a correlation between the two rating factors, but no interaction.

Example 2 - interaction but no correlation

[Tables: earned exposure, number of claims and claims frequency by gender (Male, Female) and location (Town, Countryside), with totals.]

2.77 In this example the exposure is distributed evenly amongst the different rating cells: the same proportion of town dwellers are male as of countryside dwellers. The effects of the two factors upon the risk, however, do in this example depend on each other: it is not possible to represent accurately the effect of being male compared with being female in terms of a single multiplier, nor can the effect of location be represented by a single multiplier. To reflect the situation accurately, it is necessary in this case to consider multipliers dependent on the combined levels of gender and location.

2.78 An interaction term can be included within a GLM simply by defining an explanatory variable in terms of two or more explanatory variables. In the above example, rather than declaring location and gender as two explanatory variables, each with a base level and one parameter, a combined "gender-location" variable could be declared with four levels (a base level and three parameters).

2.79 Interaction terms should only be included where there is statistical justification for the inclusion of the additional parameters. In the above example the interaction term only involved the addition of one further parameter to the model, but if an interaction is introduced between two ten-level factors, each with a base level and nine parameters, a further 81 parameters would be introduced into the model.
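The distinction between the two examples can be checked with a ratio of ratios: in a multiplicative structure with no interaction, the male/female relativity is the same in town as in the countryside. The sketch below uses made-up frequencies (the tables above are not reproduced) and also counts the extra parameters an interaction adds, matching the counts in the text. Function names are illustrative.

```python
def interaction_ratio(freq):
    """Male/female relativity in town divided by that in the countryside.

    1 means a single multiplier per factor describes the table (no interaction);
    any other value indicates an interaction.
    """
    town = freq["male", "town"] / freq["female", "town"]
    country = freq["male", "countryside"] / freq["female", "countryside"]
    return town / country


def parameter_counts(levels_a, levels_b):
    """Parameters in a marginal and a complete interaction of two categorical factors."""
    marginal = (levels_a - 1) * (levels_b - 1)  # over and above the main effects
    complete = levels_a * levels_b - 1          # one combined factor with a base level
    return marginal, complete


# Made-up frequencies: men twice women and town twice countryside everywhere.
no_interaction = {("male", "town"): 0.08, ("female", "town"): 0.04,
                  ("male", "countryside"): 0.04, ("female", "countryside"): 0.02}
# Same, except young-men-style extra risk concentrated in one cell.
with_interaction = dict(no_interaction)
with_interaction["male", "town"] = 0.12
```

For two ten-level factors, `parameter_counts(10, 10)` gives 81 marginal interaction parameters, the figure quoted in the text.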

"Complete" and "marginal" interactions

2.80 Interactions can be expressed in different ways. For example, consider the case of two factors each with four levels. One way of expressing an interaction is to consider a single factor representing every combination of the two factors (a "complete" interaction). A set of multipliers (in the case of a multiplicative model) could therefore be expressed as follows:

[Table: complete interaction multipliers, with factor 1 levels A, B, C, D across and factor 2 levels W, X, Y, Z down.]

In this case the base level has been selected to be the level corresponding to level B of factor 1 and level X of factor 2, and the interaction term has 15 parameters.

2.81 An alternative representation of this interaction is to consider the single factor effects of factor 1 and factor 2, and the additional effect of an interaction term over and above the single factor effects (a "marginal" interaction). A set of multipliers in this form can be set out as follows:

[Tables: single factor multipliers for factor 1 (A, B, C, D) and factor 2 (W, X, Y, Z), and marginal interaction multipliers, with aliased entries shown as "-".]

In this case fewer parameters are present in the additional interaction term, because the presence of the single factor effects makes some of the interaction terms redundant. When fitted in a GLM (assuming that the single factor effects were declared first), the redundant terms in the additional interaction term would be aliased. Overall the three terms combined still have 15 parameters, and result in identical predicted values (for example, in the case of factor 1 level D and factor 2 level Z, the complete interaction multiplier equals the factor 1 multiplier for D times the factor 2 multiplier for Z times the marginal interaction multiplier for D/Z).
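The equivalence of the two representations can be sketched by splitting a table of complete interaction multipliers into single factor effects and a marginal term. The table below is made up; the base cell is B/X as in the text. Every entry in the base row and base column of the marginal term comes out as exactly 1 (the aliased entries), leaving $(4-1)\times(4-1) = 9$ free marginal entries, and the product of the three terms reproduces every cell.

```python
def decompose(complete, rows, cols, base_row, base_col):
    """Split complete-interaction multipliers into single factor effects and a
    marginal interaction term.

    complete[r, c] is the multiplier for factor 2 level r and factor 1 level c.
    Returns (f1, f2, marginal), each equal to 1 at the base level(s), such that
    complete[r, c] == complete[base_row, base_col] * f1[c] * f2[r] * marginal[r, c].
    """
    base = complete[base_row, base_col]
    f1 = {c: complete[base_row, c] / base for c in cols}
    f2 = {r: complete[r, base_col] / base for r in rows}
    marginal = {(r, c): complete[r, c] / (base * f1[c] * f2[r])
                for r in rows for c in cols}
    return f1, f2, marginal


# Made-up complete interaction for two four-level factors, rescaled so B/X = 1.
rows, cols = "WXYZ", "ABCD"
complete = {(r, c): 1.0 + 0.2 * i + 0.1 * j + 0.05 * i * j
            for i, r in enumerate(rows) for j, c in enumerate(cols)}
complete = {key: value / complete["X", "B"] for key, value in complete.items()}
f1, f2, marginal = decompose(complete, rows, cols, base_row="X", base_col="B")

# Marginal entries that are not aliased (i.e. not exactly 1).
free = [key for key, value in marginal.items() if abs(value - 1.0) > 1e-12]
```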

2.84 In practice, it can sometimes be helpful to consider the "complete" interaction (i.e. just a single factor representation of all combinations of the two factors), and sometimes to consider the additional or "marginal" interaction term over and above the single factor effects. While the fitted values from both approaches are identical, what differs are the statistical diagnostics available in the form of parameter estimate standard errors.

Example

2.85 For example, the graph below shows the result of a "complete" interaction between the age of driver and the gender of driver for the claims frequency of a certain type of auto claim, with age relativities for men and women superimposed (with solid and dotted lines respectively) on the same x-axis.

[Figure: Complete interaction of Age of driver and Sex of driver. Log of multiplier by age of driver, showing the parameter estimate and approx 95% confidence interval for each sex, with exposure years as bars.]

2.86 If there were no significant interaction between these two factors, the solid and dotted lines showing parameters from a log link GLM would be parallel. In this example they are clearly not parallel, showing that while younger drivers have higher frequencies, and while in general male drivers have higher frequencies, in this example (as in many real cases) young men experience a higher frequency than would be predicted by the average independent effects of the two factors.

2.87 The narrow standard error bands around the parameter estimate lines suggest the likely statistical significance of the result; however, they do not provide any sound theoretical basis for assessing the significance of the factor. A more theoretically appropriate test can be applied if a marginal interaction is considered.

2.88 The graphs below show the results of fitting age and then a marginal interaction of age and sex to the same data.

[Figure: Main effect. Log of multiplier by Age of driver, showing one-way relativities, approx 95% confidence interval and parameter estimate, with exposure years as bars.]

[Figure: Partial marginal interaction. Log of multiplier by age of driver for the marginal interaction of Age of driver and Sex of driver, showing the parameter estimate for each sex and the approx 95% confidence interval for females, with exposure years as bars.]

2.89 The first graph shows the single factor effect for age, and the second shows the marginal interaction term over and above this single factor effect. In this case the single factor gender of the driver was not included, since it proved not to be significant.

2.90 Since the male level of the gender factor is ordered after the female level, the male levels of the marginal factor have been aliased, with the result that the first graph represents the age effects for males, and the marginal graph shows the additional adjustment which is appropriate for females of different ages.

2.91 The implied fitted values from the marginal interaction are the same as those from the complete interaction. For example, for female drivers in one of the youngest age bands, the complete interaction multiplier equals the single factor age multiplier for that band times the marginal multiplier for women relative to men at that age (any differences being due to rounding).

2.92 The marginal approach does, however, provide more meaningful diagnostics in the form of parameter estimate standard errors and type III tests. The standard errors on the graph of the marginal term indicate that the marginal term is indeed significant, and the small type III P-value for this factor confirms that this is the case.

2.93 An example of an interaction term which is not significant is shown below. The first graph is the complete interaction, where the parameter estimate lines can be seen to be largely parallel. The second and third graphs show the main effects of age of driver and payment frequency, respectively. The fourth graph shows the marginal interaction, where the marginal interaction term can be seen to be insignificant, both visually and because of the type III P-value of 6.9%.

[Figure: Complete interaction of Age of driver and Payment frequency. Log of multiplier by age of driver, showing the parameter estimate and approx 95% confidence interval for each payment frequency (yearly, half-yearly, quarterly), with exposure years as bars.]

[Figure: Main effect of Age of driver: log of multiplier and exposure years by age of driver, showing one-way relativities, parameter estimate and approximate 95% confidence interval.]

[Figure: Main effect of Payment frequency (Yearly, Half-yearly, Quarterly): log of multiplier and exposure years, showing one-way relativities, parameter estimate and approximate 95% confidence interval.]

[Figure: Marginal interaction of Age of driver and Payment frequency (P value 6.9%): log of multiplier and exposure years by age of driver, showing parameter estimates for Payment frequency: Yearly, Half-yearly and Quarterly, and approximate 95% confidence intervals for Half-yearly and Quarterly.]

Interpreting marginal interactions

2.94 Although the marginal form of an interaction provides a sounder theoretical basis for assessing the significance of a factor, in practice marginal interactions can be hard to interpret. For example, consider two factors each with four levels. It might be the case that the true underlying frequency (all other factors being at a certain level) was as follows:

[Table: true underlying claim frequency by level of Factor 1 (A, B, C, D) and Factor 2 (W, X, Y, Z)]

2.95 However, in reality the exposure available for this analysis might be low for some combinations of these two factors, for example:

[Table: exposure by level of Factor 1 (A, B, C, D) and Factor 2 (W, X, Y, Z)]

2.96 If in general the claims experience was in line with the underlying frequencies, but the one policy with factor 1 level D and factor 2 level X had one claim, resulting in a very high claims frequency, a marginal interaction would yield results which could be hard to interpret. Specifically, if a marginal interaction were fitted, the GLM would seek the following parameters:

[Table: parameters sought by the marginal interaction, by level of Factor 1 (A, B, C, D) and Factor 2 (W, X, Y, Z)]

2.97 Since level X is the base level of factor 2, there is no single term in the marginal interaction which can represent the very high observed frequency for factor 1 level D / factor 2 level X. Instead the model will yield a very high value for the parameter for level D of factor 1, and low values for the three parameters that adjust level D at the other levels of factor 2 (W, Y and Z). Although theoretically correct, the parameter estimates and standard errors for these three parameters would be hard to interpret.

Searching for interactions

2.98 In general the significance of an interaction can be assessed by considering:
- the standard errors of the parameter estimates of the marginal term
- the Type III P-value of the marginal term
- general intuition, given the overall "complete" interaction effect
- the consistency of an interaction over time.
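The behavior described above can be reproduced with a saturated log-linear model, in which every cell is fitted exactly and the marginal interaction parameters have closed forms. The Python sketch below assumes A is the base level of factor 1 (the text only fixes X as the base of factor 2), and the cell frequencies are hypothetical: purely multiplicative except for the single D/X policy.

```python
import math

LEVELS_1 = ["A", "B", "C", "D"]   # factor 1, base level A (assumed)
LEVELS_2 = ["W", "X", "Y", "Z"]   # factor 2, base level X


def marginal_parameters(freq, base1="A", base2="X"):
    """Log-scale parameters of a saturated model: main effects plus marginal interaction.

    With every cell fitted exactly, each parameter has a closed form in the
    log cell frequencies; cells in a base level carry no interaction term.
    """
    log_f = {cell: math.log(value) for cell, value in freq.items()}
    base = log_f[(base1, base2)]
    main1 = {j: log_f[(j, base2)] - base for j in LEVELS_1 if j != base1}
    main2 = {k: log_f[(base1, k)] - base for k in LEVELS_2 if k != base2}
    interaction = {
        (j, k): log_f[(j, k)] - log_f[(j, base2)] - log_f[(base1, k)] + base
        for j in LEVELS_1 if j != base1
        for k in LEVELS_2 if k != base2
    }
    return base, main1, main2, interaction


# Hypothetical, purely multiplicative frequencies ...
row = {"A": 1.0, "B": 1.1, "C": 1.2, "D": 1.3}
col = {"W": 0.9, "X": 1.0, "Y": 1.1, "Z": 1.2}
freq = {(j, k): 0.08 * row[j] * col[k] for j in LEVELS_1 for k in LEVELS_2}
# ... except the single D/X policy, which had one claim
freq[("D", "X")] = 1.0

intercept, main1, main2, interaction = marginal_parameters(freq)
print(f"factor 1 level D: {main1['D']:+.2f}")
for k in ("W", "Y", "Z"):
    print(f"interaction D/{k}: {interaction[('D', k)]:+.2f}")
```

All other interaction terms come out at zero, so the large positive level D parameter and the three large negative D/W, D/Y and D/Z terms exist only to cancel each other outside the one anomalous cell.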

2.99 In theory all possible combinations of pairs or triplets of factors could be tested as interactions, one at a time, in a model. In practice, the design of the current rating plan, the results of two-way analyses and wider experience will influence the choice of what is tested, as will the ease of interpretation and the ultimate application of the model.

2.100 In some cases, rather than considering every combination of two factors with many levels, it can be appropriate to consider only the strongest effects. For example, a marginal interaction (driver age, car symbol, and the interaction of driver age and car symbol, denoted driver age*car symbol) may highlight an interesting effect in one "corner" of the interaction (e.g. young drivers driving high car symbols). In practice, the interaction may be re-parameterized as a combination of detailed single factors for age of driver and car symbol, plus an additional, less detailed factor based on the combination of age of driver and car symbol. This additional factor has the same level for many combinations, and a few levels representing certain combinations of young drivers driving high car symbols.

2.101 The inclusion of several meaningful interactions which share factors (e.g. age*sex, age*multi-car and territory*multi-car) could provide a theoretically correct model, but one that may be very difficult to interpret. The practitioner may consider creating separate models for single-car and multi-car policies, and continue to investigate other interactions.

Smoothing

2.102 Once models have been iterated to include only significant effects, and interactions have been investigated, smoothing of the parameter estimates may be considered in order to improve the predictive power of the model. Much like the offset and prior weight terms in the formulation of GLMs, smoothing is used to incorporate some element of the practitioner's knowledge into the model. In this sense, the practitioner may impart knowledge that some factors have a natural order (e.g. that age of car seven should fall between age of car six and age of car eight). Outliers may also be tempered.
This tempering is not based on commercial selections at this point (i.e. tolerance for rate change), but is rather an attempt to adjust an anomaly once a proper investigation has been done to ensure that the outlier is truly an anomaly and not something systematic in the experience.

2.103 The selection of smoothed parameter estimates can be done in an unscientific fashion (for example, a visual modification to a curve) or in a more scientific fashion (for example, fitting polynomials to the observed parameter estimates, or electing to refit a model using polynomial terms as variates within the GLM). If smoothing is rather severe, the practitioner may consider restricting the values of the smoothed factor and re-running a model to allow other factors to compensate. The concept of restrictions is discussed later in this paper. In general, however, this technique may only remove the random element from one factor and move it to another factor, and often it can be preferable not to refit using restrictions in this way.
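One of the "more scientific" options above is to fit a low-order polynomial to the observed parameter estimates. The sketch below assumes a weighted least squares fit of the log multipliers against the level index, with weights of one over the squared standard error (a common choice, not one this paper prescribes). It uses only the Python standard library, and the age-of-car estimates are hypothetical.

```python
def polyfit_weighted(x, y, weights, degree):
    """Weighted least squares polynomial fit; returns coefficients c0, c1, ..., c_degree."""
    n = degree + 1
    # Normal equations (X'WX) c = X'Wy, built from weighted power sums
    a = [[sum(w * xi ** (r + c) for xi, w in zip(x, weights)) for c in range(n)] for r in range(n)]
    b = [sum(w * yi * xi ** r for xi, yi, w in zip(x, y, weights)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        coeffs[r] = (b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))) / a[r][r]
    return coeffs


def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ..."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical log-multiplier estimates by age of car, with a reversal at ages 6 and 7
ages = list(range(11))
estimates = [0.20, 0.15, 0.12, 0.06, 0.05, 0.00, -0.06, -0.02, -0.10, -0.12, -0.15]
std_errors = [0.02, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.04, 0.05, 0.06, 0.08]
weights = [1.0 / se ** 2 for se in std_errors]

coeffs = polyfit_weighted(ages, estimates, weights, degree=2)
for age, raw in zip(ages, estimates):
    print(f"age of car {age:2d}: raw {raw:+.3f}  smoothed {polyval(coeffs, age):+.3f}")
```

Weighting by precision lets well-supported levels anchor the curve while sparse levels are pulled towards it; the smoothed values would then typically be held fixed (for example through the offset) if the model is refitted.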

Risk Premium

2.104 Fitting GLMs separately to frequency and severity experience can provide a better understanding of the way in which factors affect the cost of claims. It also makes it easier to identify and remove (via smoothing) certain random effects from one element of the experience. Ultimately, however, these underlying models generally need to be combined to give an indication of loss cost, or "risk premium", relativities.¹

2.105 In the case of multiplicative models for a single claim type, the calculation is straightforward: the frequency multipliers for each factor can simply be multiplied by the severity multipliers for the same factors (which is analogous to adding the parameter estimates when using a log link function). Alternatively, models may be fitted directly to pure premium data using the Tweedie distribution discussed in Appendix C. The advantages and disadvantages of this alternative approach are discussed in Appendix J.

2.106 Certain market conditions may warrant the development of a single theoretical risk premium model, even if different types of claim have been modeled separately. An example is the aggregation of homeowners models by peril into a single rating algorithm at point of sale. The derivation of a single model in this situation is not as straightforward, since there is no direct way of combining the model results for the underlying claim types into a single overall expected cost of claims model. In this situation, however, it is possible to approximate the overall effect of rating factors on the total cost of claims by using a further GLM to calculate a weighted average of the GLMs for each of the underlying frequency and severity models for each of the claim types.
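For a single claim type with log links, combining frequency and severity amounts to multiplying the multipliers factor by factor, or equivalently adding the parameter estimates. A minimal sketch with hypothetical relativities for one factor:

```python
import math

# Hypothetical multiplicative relativities for one rating factor
frequency = {"Town": 1.00, "Country": 0.80}
severity = {"Town": 1.00, "Country": 1.10}

# Risk premium relativity: multiply the frequency and severity multipliers ...
risk_premium = {level: frequency[level] * severity[level] for level in frequency}

# ... which is the same as adding the log-link parameter estimates and exponentiating
via_logs = {
    level: math.exp(math.log(frequency[level]) + math.log(severity[level]))
    for level in frequency
}

for level in risk_premium:
    print(f"{level}: {risk_premium[level]:.3f} (via logs {via_logs[level]:.3f})")
```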
Specifically, this can be done by:
- selecting a dataset which most accurately reflects the likely future mix of business
- calculating an expected claim frequency and severity by claim type for each record in the data
- combining these fitted values, for each record, to derive the expected cost of claims according to the individual GLMs for each record
- fitting a further generalized linear model to this total expected cost of claims, with this final GLM containing the union of all factors and interactions in all of the underlying models.

1 The term "risk premium" is used rather than pure premium in order to differentiate between a model fitted directly on pure premium data and a model derived by combining underlying frequency and severity models.
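The last step of the procedure above needs a multiplicative model fitted to the combined expected cost. As a sketch only, the code below assumes two categorical factors, a log link and Poisson-type (quasi-likelihood) estimating equations. For main-effects models of this kind the likelihood equations reduce to matching fitted and actual totals within each level, which the classic iterative multiplicative (Bailey-style) algorithm solves. The records and claim-type models are hypothetical; a production analysis would use GLM software instead.

```python
def fit_multiplicative(records, iterations=200):
    """Fit total_cost ~ exposure * base * rel1[level1] * rel2[level2].

    records: list of (level1, level2, exposure, total_cost) tuples.
    Alternately rescales each factor so that, within every level, the fitted
    total equals the actual total: the Poisson log-link likelihood equations
    for a two-factor main-effects model.
    """
    rel1 = {rec[0]: 1.0 for rec in records}
    rel2 = {rec[1]: 1.0 for rec in records}
    for _ in range(iterations):
        for level in rel1:
            actual = sum(cost for l1, _l2, _e, cost in records if l1 == level)
            fitted = sum(e * rel2[l2] for l1, l2, e, _c in records if l1 == level)
            rel1[level] = actual / fitted
        for level in rel2:
            actual = sum(cost for _l1, l2, _e, cost in records if l2 == level)
            fitted = sum(e * rel1[l1] for l1, l2, e, _c in records if l2 == level)
            rel2[level] = actual / fitted
    # Express each factor relative to its first-seen level; the base carries the scale
    base1, base2 = next(iter(rel1)), next(iter(rel2))
    base = rel1[base1] * rel2[base2]
    rel1 = {k: v / rel1[base1] for k, v in rel1.items()}
    rel2 = {k: v / rel2[base2] for k, v in rel2.items()}
    return base, rel1, rel2


# Hypothetical expected cost per unit exposure = claim type 1 + claim type 2,
# each from its own multiplicative model (so the total is not exactly multiplicative)
def expected_cost(sex, area):
    type1 = 100.0 * {"M": 1.0, "F": 0.9}[sex] * {"Town": 1.0, "Country": 0.7}[area]
    type2 = 80.0 * {"M": 1.0, "F": 1.1}[sex] * {"Town": 1.0, "Country": 0.5}[area]
    return type1 + type2


cells = [("M", "Town", 50), ("M", "Country", 20), ("F", "Town", 30), ("F", "Country", 40)]
records = [(sex, area, exposure, exposure * expected_cost(sex, area)) for sex, area, exposure in cells]
base, sex_rel, area_rel = fit_multiplicative(records)
print(round(base, 2), sex_rel, area_rel)
```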

2.107 An illustrative example is shown below. The top table represents the intercepts and multipliers from underlying frequency and severity models for claim types 1 and 2. The bottom table shows the calculation of the total risk premium, based on the underlying models, for the first four records in the data. The additional GLM is fitted to this last column in this dataset in order to have a single theoretical risk premium model.

[Table: intercept and multipliers (Sex: Male, Female; Area: Town, Country) from the frequency and severity models for claim type 1 and claim type 2]

[Table: for the first four policies (Male/Town, Female/Town, Male/Country, Female/Country), fitted frequency, fitted severity and fitted risk premium for each claim type, and the total risk premium]
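The bottom table of the example is a per-record calculation: fitted frequency times fitted severity for each claim type, summed across claim types. A sketch with hypothetical intercepts and multipliers (not the values from the original table):

```python
# Hypothetical intercepts and multipliers for two claim types
models = {
    "claim type 1": {
        "freq": {"intercept": 0.10, "Sex": {"M": 1.00, "F": 0.90}, "Area": {"T": 1.00, "C": 0.70}},
        "sev": {"intercept": 1000.0, "Sex": {"M": 1.00, "F": 1.00}, "Area": {"T": 1.00, "C": 1.20}},
    },
    "claim type 2": {
        "freq": {"intercept": 0.02, "Sex": {"M": 1.00, "F": 1.10}, "Area": {"T": 1.00, "C": 0.50}},
        "sev": {"intercept": 4000.0, "Sex": {"M": 1.00, "F": 1.00}, "Area": {"T": 1.00, "C": 1.00}},
    },
}


def fitted(model, record):
    """Intercept times the multiplier for each factor level on the record."""
    value = model["intercept"]
    for factor in ("Sex", "Area"):
        value *= model[factor][record[factor]]
    return value


def total_risk_premium(record):
    """Sum over claim types of fitted frequency x fitted severity."""
    return sum(fitted(m["freq"], record) * fitted(m["sev"], record) for m in models.values())


for sex, area in (("M", "T"), ("F", "T"), ("M", "C"), ("F", "C")):
    record = {"Sex": sex, "Area": area}
    print(sex, area, round(total_risk_premium(record), 2))
```

Because the total is a sum of products, it is generally not exactly multiplicative in the rating factors, which is why a further GLM is fitted to this column rather than simply multiplying relativities together.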

2.108 In addition to combining frequency and severity across multiple claim types, the technique of fitting an overall GLM to fitted values of other GLMs can be used to incorporate non-proportional expense elements into the modeled relativities. For example, a constant dollar amount could be added to each observation's expected risk premium and then a GLM refitted to this new field. The resulting "flattened" risk premium relativities will prevent high risk factor levels from being excessively loaded for expenses.

2.109 Alternatively, the amount added to each observation's expected risk premium could be designed to vary according to the results of a separate retention study. This would allow risks with a high propensity to lapse to receive a higher proportion of fixed expense than those risks with a low propensity to lapse. As above, a further GLM is fitted to the sum of the expected risk premium and a lapse-dependent expense load.

Restrictions

2.110 The theoretical risk premium results from a GLM claims analysis will differ from the rates implemented in practice, since consideration needs to be given to price demand elasticity and the competitive situation. There are, however, some situations where legal or commercial considerations may also impose rigid restrictions on the way particular factors are used in practice. When the use of certain factors is restricted, the model may, if desired, be able to compensate to an extent for this artificial restriction by adjusting the fitted relativities for correlated factors. This is achieved using the offset term in the GLM.

2.111 Specifically, the required parameter estimates (logs of multipliers in the case of a multiplicative model) are calculated for each record and added to the offset term ξ. The factor in question is then not included as an explanatory factor in the GLM. This can intuitively be thought of as fixing some selected elements of β to be specified values.

2.112 The graphs below illustrate the use of a restriction. In the upper series of graphs, the dotted lines display the theoretically correct parameter estimates indicated by a GLM containing these four rating factors.
The dashed line in Factor 1 shows the intended restriction. In the lower series of graphs, the solid lines show the output of the GLM after the restriction for Factor 1 has been incorporated and Factors 2, 3 and 4 have been allowed to compensate. It can be seen that the parameter estimates in Factors 2 and 3 have hardly changed, suggesting little correlation between these factors and Factor 1. On the other hand, the solid line in Factor 4 has moved away from the theoretically correct dotted line, suggesting a correlation between the restricted levels in Factor 1 and those levels in Factor 4 which moved to compensate for the restriction.
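The offset mechanism and the resulting compensation can be seen in a very small case. The sketch below assumes a Poisson log-link model in which the restricted factor A is moved entirely into the offset, leaving one categorical factor B to be estimated; for that case the maximum likelihood estimate for each level of B has a closed form (observed claims divided by the sum of exp(offset) over the level's records). The portfolio and the restricted multipliers are hypothetical.

```python
import math


def fit_with_offset(records, restricted):
    """Poisson log-link fit of factor B only, with factor A fixed through the offset.

    records:    list of (level_a, level_b, exposure, claims)
    restricted: fixed (restricted) multipliers for factor A
    For a single categorical factor the maximum likelihood estimate for each
    level is observed claims / sum of exp(offset) over that level's records.
    """
    totals = {}
    for level_a, level_b, exposure, claims in records:
        # Offset = log exposure + log of the restricted multiplier for factor A
        offset = math.log(exposure) + math.log(restricted[level_a])
        observed, expected_base = totals.get(level_b, (0.0, 0.0))
        totals[level_b] = (observed + claims, expected_base + math.exp(offset))
    return {b: obs / base for b, (obs, base) in totals.items()}


# Hypothetical portfolio: young drivers (A = "Yes") are concentrated in vehicle group "High".
# True frequency = 0.1 * {No: 1.0, Yes: 2.0}[A] * {Low: 1.0, High: 1.5}[B]
records = [
    ("No", "Low", 800, 80.0),
    ("No", "High", 200, 30.0),
    ("Yes", "Low", 100, 20.0),
    ("Yes", "High", 400, 120.0),
]

unrestricted = fit_with_offset(records, {"No": 1.0, "Yes": 2.0})   # A at its indicated value
restricted = fit_with_offset(records, {"No": 1.0, "Yes": 1.0})     # A not allowed to vary

for label, fit in (("unrestricted", unrestricted), ("restricted", restricted)):
    print(label, "High/Low relativity:", round(fit["High"] / fit["Low"], 3))
```

The High/Low relativity rises from 1.5 to 2.25 once the young driver loading is removed, because group High carries most of the young drivers: the same kind of compensation shown by Factor 4 in the graphs.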

[Figure: Example of restricting a factor. Upper panels: unrestricted true effect (log of multiplier, with exposure years) for Factors 1 to 4, with the intended restriction shown for Factor 1. Lower panels: restricted parameter estimates, with the model compensating as best it can.]

2.113 Although restrictions could be applied either to frequency or amounts models (or in part to both), generally it is more appropriate to impose the restriction on the model at the risk premium stage, since this allows a more complete and balanced compensation by the other factors. This can be achieved by calculating the expected cost of claims for each record according to "unrestricted" GLMs, and then imposing the restriction in the final GLM, which is fitted to the total expected cost of claims. For restricted risk premium models this approach is necessary even in the case of a single claim type.

2.114 In the US, many personal lines rating plans contain discounts that were initially implemented for marketing appeal or perhaps mandated by regulation. Today's models may indicate that these discounts are not supported by the claims experience, or in many cases may even indicate a surcharge. If a company chooses to continue offering such discounts, it is important that these restrictions are incorporated into the modeling process, since such restrictions can affect the relativities which become appropriate for other correlated factors. Counterintuitive model results may occur on behavioral factors, such as factors which policyholders self-select (for example, limits and deductibles). These factors may require restriction if they are to be used directly in ratemaking.

2.115 Model restrictions are also used in US ratemaking to limit the number of factors which will change in a given rate review. Companies may restrict certain existing rating factors and allow the GLM to measure only the effect of new rating factors. Restrictions may also come into play when applying the results of a countrywide model to a particular state.

2.116 Prior to incorporating restrictions, it is still important to assess the true effect of all factors upon the risk by initially including them in the analysis as if they were ordinary factors. In addition, a comparison of the fitted values of the theoretical model and the restricted models will demonstrate the degree to which other factors have compensated for the restriction.
The examples below show two such comparisons. Each graph shows, on the y-axis, the number of policies that have different ratios of restricted to unrestricted fitted values, shown on the x-axis. The graph is subdivided by levels of the restricted factor, shown in different shading. If the GLM can compensate well for a factor restriction (because there are many other factors in the model correlated with the restricted factor), then this distribution will be narrow. Conversely, if the GLM cannot compensate well for the restriction, this distribution will be wider.
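The comparison described above is a histogram of the ratio of restricted to unrestricted fitted values per record, split by level of the restricted factor. A minimal sketch, assuming both sets of fitted values are already available for each record (the values in the usage lines are hypothetical), that tallies records into ratio bands and measures the spread:

```python
from collections import Counter


def ratio_histogram(records, band_width=0.05):
    """Count records by (restricted factor level, ratio band centre).

    records: list of (level, restricted_fitted, unrestricted_fitted)
    """
    counts = Counter()
    for level, restricted_fit, unrestricted_fit in records:
        ratio = restricted_fit / unrestricted_fit
        band = round(ratio / band_width) * band_width   # nearest band centre
        counts[(level, round(band, 2))] += 1
    return counts


def ratio_spread(records):
    """Range of the ratio: narrow when other factors compensate well for the restriction."""
    ratios = [r / u for _, r, u in records]
    return max(ratios) - min(ratios)


# Hypothetical fitted values for a handful of records
records = [("No", 0.10, 0.10), ("No", 0.11, 0.10), ("Yes", 0.09, 0.10), ("Yes", 0.095, 0.10)]
for (level, band), count in sorted(ratio_histogram(records).items()):
    print(level, band, count)
print("spread of ratios:", round(ratio_spread(records), 3))
```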

2.117 In this particular example the factors in the upper graph have not compensated well for the restriction. The wide distribution of the restricted to unrestricted ratio implies that the restriction is moving the model away from the theoretical result. The lower graph, on the other hand, shows a model which contains factors that are more correlated with the restricted factor, and which have compensated better for the restriction.

[Figure: Distribution of ratio of fitted values between restricted and unrestricted models showing little compensation from other factors: count of records by ratio, shaded by level (A to S) of the restricted factor.]

[Figure: Distribution of ratio of fitted values between restricted and unrestricted models showing some compensation from other factors: count of records by ratio, shaded by level (A to S) of the restricted factor.]

Interpreting the results

2.118 To understand how the results of a GLM claims model differ from the existing rating relativities, it is helpful to consider the results both on a factor-by-factor basis and also by measuring the overall effect of all factor differences combined.

Comparing GLM indicated relativities to current relativities

2.119 The final risk premium models can be plotted on graphs similar to those shown in previous sections. Another line can be added to display the relativities implicit in the current rating structure. This allows easy comparison of the relativities indicated by the model and those which are currently used. An example graph is shown below. In this example it can be seen that the current relativities for young drivers (shown as a dotted line) are too low.

[Figure: Final risk premium model compared to current relativities: log of multiplier and exposure years by age of driver, showing the smoothed parameter estimate, approximate standard errors from the estimate, and current relativities.]

2.120 If the existing rating plan is purely multiplicative, superimposing current relativities on the graph above is very straightforward. Superimposing relativities from a mixed multiplicative/additive rating plan is slightly less straightforward. Some additive components may be re-expressed as an interaction variable (e.g. {A x B x (C + D)}² may be re-expressed to consider the interaction of C and D). Existing rating plans with more complex additive components may be approximated by fitting a multiplicative model to a data field containing existing premium. The appropriateness of this multiplicative proxy to the mixed rating plan can be evaluated by examining the distribution of the ratio of the premium produced by the multiplicative proxy to the actual premium. Proxy models which estimate the rating plan within a narrow distribution (e.g. +/-5%) may well be appropriate to use.

Impact graphs

2.121 The results of a GLM analysis are interdependent and must be considered together. For example, while a GLM analysis might suggest that young driver relativities are too low, it may also suggest that relativities for inexperienced drivers (e.g. less than two years licensed) are too high. Although the existing rating structure may be theoretically wrong, it might be the case that to a large extent these errors compensate each other. To understand the true "bottom line" difference between the existing rating structure and the theoretical claims cost, "impact" graphs such as the one below can be considered.

[Figure: Impact on portfolio of moving to theoretically correct relativities: exposure count by ratio of risk premium to current tariff, with currently profitable and currently unprofitable business marked.]

2 Where A, B, C and D represent factors, each of which possibly has a different number of levels.
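Two ideas from the paragraph on mixed rating plans can be sketched together: an additive component such as (C + D) becomes a single multiplicative factor defined on the C/D combinations, and a multiplicative proxy is judged by the ratio of proxy premium to actual premium against a tolerance (the +/-5% band mentioned in the text). All rating tables and premiums below are hypothetical, as are the function names.

```python
# Hypothetical mixed plan: premium = base * A * B * (C + D)
BASE = 500.0
A = {"a1": 1.0, "a2": 1.2}
B = {"b1": 1.0, "b2": 0.9}
C = {"c1": 1.0, "c2": 1.3}
D = {"d1": 0.0, "d2": 0.25}


def mixed_premium(a, b, c, d):
    return BASE * A[a] * B[b] * (C[c] + D[d])


# The additive part becomes one factor E, the interaction of C and D
E = {(c, d): C[c] + D[d] for c in C for d in D}


def multiplicative_premium(a, b, c, d):
    return BASE * A[a] * B[b] * E[(c, d)]


def proxy_fit_summary(proxy, actual, tolerance=0.05):
    """Share of policies with proxy/actual within 1 +/- tolerance, plus min and max ratio."""
    ratios = [p / q for p, q in zip(proxy, actual)]
    within = sum(1 for r in ratios if abs(r - 1.0) <= tolerance)
    return within / len(ratios), min(ratios), max(ratios)


print(mixed_premium("a2", "b1", "c2", "d2"), multiplicative_premium("a2", "b1", "c2", "d2"))
# Hypothetical proxy premiums from a fitted multiplicative model versus actual premiums
print(proxy_fit_summary([102.0, 97.0, 110.0], [100.0, 100.0, 100.0]))
```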

2.122 The graph above shows the number of exposures in the existing portfolio that would experience different changes in premium if the rating structure were to move from its existing form to the theoretically correct form immediately. It is, of course, exceptionally unlikely that such a dramatic change would be implemented in practice. The purpose of this analysis is to understand the magnitude of the existing cross-subsidies by considering the effect of all rating factors at the same time.

2.123 This graph can also be divided by levels of a particular rating factor. Indeed, one such graph can be produced for each rating factor. This identifies which sectors of the business are currently profitable and which are currently unprofitable, taking into account the correct theoretical model and considering the effect of all factors at the same time. In the example below, the impact graph is segmented by age of driver (notice that the shape does not change, only how the histogram is patterned).

[Figure: Impact on portfolio of moving to theoretically correct relativities, segmented by age of driver: exposure count by ratio of risk premium to current tariff.]

2.124 The histogram shows the impact of all rating factor changes (not just the age of driver factor) by age of driver levels. It can be seen in this example that a large number of the exposures which would experience large increases in premium, if the rating structure were moved immediately to the theoretically correct structure, are young drivers. It had already been seen from the GLM risk premium graphs that young driver relativities were too low. This graph suggests there are no effects from other correlated factors which noticeably mitigate this effect; otherwise, young drivers would not be so strongly on the "unprofitable" side of the impact graph.

2.125 An example may make interpretation of the graph above clearer. Assume the multiplicative claims model uses age, gender, marital status, territory and credit as rating factors. Consider a young driver profile for which the indicated rate change is an increase for age, male and urban territory, and a decrease for single and high credit score. All factors considered, the total indicated rate change for this risk profile is the combination of these individual changes (multiplied together, since the model is multiplicative), and this policy would contribute a count of one to the bar of the graph corresponding to that total. Roughly one-third of the exposures in that band correspond to drivers of this age.

The graph below adds a second, right-hand y-axis, which contains the actual loss ratio present in the historical data. This shows very clearly how the GLM has differentiated between segments of differing profitability: each band on the x-axis represents a band of differing expected profitability, and the solid line shows the actual profitability experienced for that band.

[Figure: Impact on portfolio of moving to theoretically correct relativities, segmented by age of driver, with actual loss ratio also shown: exposure count (left axis) and actual incurred loss ratio (right axis) by ratio of risk premium to current tariff.]
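In a multiplicative model the individual indicated changes in the young driver example combine by multiplication rather than addition. The sketch below uses hypothetical percentages (not the figures from the original example) and then places the policy in a band of the impact graph's x-axis (risk premium / current tariff), assumed here to be 0.1 wide.

```python
import math


def total_change(changes_pct):
    """Combine per-factor indicated changes (in %) multiplicatively; return total change in %."""
    ratio = math.prod(1.0 + c / 100.0 for c in changes_pct.values())
    return (ratio - 1.0) * 100.0


def impact_band(ratio, width=0.1):
    """Lower edge of the impact-graph band (risk premium / current tariff) containing ratio."""
    return round(math.floor(ratio / width + 1e-9) * width, 2)


# Hypothetical indicated changes for one young driver profile
profile = {"age": 50, "male": 10, "single": -5, "urban territory": 20, "high credit score": -10}

change = total_change(profile)
print(f"total indicated change: {change:+.2f}%")   # not the simple sum of +65%
print("impact graph band:", impact_band(1.0 + change / 100.0))
```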

Other applications of GLMs

3.1 This section briefly discusses:
- the role of GLMs in the use of credit in personal lines ratemaking
- the use of scoring algorithms in more general terms, to consider underwriting and marketing scorecards (not necessarily related to credit)
- the use of GLMs in retention/conversion analysis.

The role of GLMs in the use of credit-based insurance scores

3.2 Credit-based insurance scores attempt to measure the predictive power of components of consumer credit report data on the cost of insurance claims. The personal lines insurance industry in the US has been using credit-based insurance scoring for over a decade. A Conning & Company survey of the largest personal automobile insurance writers in the US reported that at least 90% of the respondents use some form of credit scoring.³

3.3 The early published actuarial studies on the use of credit information in insurance demonstrated clear differences in univariate loss ratio by different bands of a credit-based score. Further studies examined this relationship by components of the Insurance Bureau score, and also considered how loss ratio by credit component varied across certain traditional rating variables (i.e. a two-way approach).⁴

3.4 These studies drew early criticism regarding possible double-counting of effects already present in risk classification schemes.⁵ Generalized linear models and other multivariate methods have played a critical role in addressing that criticism. A study conducted by EPIC Actuaries LLC on behalf of the property-casualty insurance industry's four national trade associations offered four major findings about credit-based insurance scores:

3 "Insurance Scoring in Private Passenger Automobile Insurance: Breaking the Silence", Conning Report, Conning.

4 The reader seeking more information may reference the summaries of the Tillinghast study and the James Monoghan paper in "Does Credit Score Really Explain Insurance Losses? Multivariate Analysis from a Data Mining Point of View" by Cheng-Sheng Peter Wu and James C. Guszcza, Casualty Actuarial Society Forum, Winter.

5 The use of credit information in insurance underwriting and ratemaking has also drawn serious criticism regarding issues such as social equity, intuitive correlation with loss, disparate impact by race and level of wealth, etc. These issues are beyond the scope of this paper.

a. using generalized linear models to adjust for correlations between factors, insurance scores were predictive of propensity for private passenger automobile insurance loss (particularly frequency);
b. insurance scores are correlated with other risk characteristics, but after fully accounting for those correlations, the scores significantly increase the accuracy of risk assessment;
c. insurance scores are among the three most important risk factors for each of the six automobile claim types studied;
d. an analysis of property damage liability frequencies by insurance score group for each of the fifty states suggests consistent results across states.⁶

3.5 Model vendors and insurance companies have developed credit-based insurance scoring algorithms which vary in complexity, application and proprietary nature. The Conning & Company survey concluded that smaller insurers were using credit scoring predominantly in their underwriting processes, whereas larger insurers appeared to be focusing on underwriting, pricing and sophisticated market segmentation.

Insurance scores beyond credit

3.6 Other scoring techniques can be used as a way to share vital information between the actuarial department and the rest of the insurance organization. For example, scores can be used to predict the profitability of an insurance policy given a certain rating structure, regardless of whether or not credit is considered. This information can be used in underwriting, cession decisions, marketing and agent compensation schemes.

3.7 The most direct way to manage the profitability of a personal lines product is through effective ratemaking. Often, however, regulatory, practical or commercial conditions restrict the degree to which premiums can be set to reflect the risk. In these circumstances a score based on expected loss ratio can be used by insurers to gauge which customers are likely to be more profitable. As various functional areas are familiar with the application of scoring algorithms, this provides a common language for communicating a desired strategy throughout the insurance organization.
6 "The Relationship of Credit-based Insurance Scores to Private Passenger Automobile Insurance Loss Propensity", an actuarial study by Epic Actuaries LLC; principal authors Michael J. Miller and Richard A. Smith.

3.8 For example, a scoring algorithm could help target marketing campaigns at those customers who are likely to be more profitable. Scores can also be used as part of an incentive scheme for agents, where commission or bonus is linked to the average customer score. Such applications can be particularly useful in highly regulated markets, as the score can include policyholder characteristics that are not permitted in the actual premium.

Producing the score

3.9 One method of deriving a scoring algorithm takes advantage of the "linear" part of generalized linear models (GLMs). The output of a GLM is a series of additive parameters which is then transformed via the link function to give the expected value for an observation. When calculating a score the link function can be omitted, leaving a simple additive structure which orders the risk. A straightforward calculation can then transform the additive structure into a scoring algorithm which produces scores within a desired range.

3.10 To derive a profitability score, the starting point would be a standard analysis of claims experience using GLMs, as discussed in detail in Section 2. This would involve fitting a series of GLMs to historic claims data, considering frequency and severity separately for each claim type. These models would include all standard rating factors, as well as any additional information that will be available at the time the score is to be calculated. Such additional information could include geodemographic data.

3.11 The expected cost of claims can then be calculated for each record in the data based upon the GLM claims models. For each policy this can then be divided by the premium which will be charged, to yield an expected loss ratio, which can then itself be modeled and rescaled to derive the profitability score.

3.12 The model of expected loss ratio should only include those factors that will be considered at the time the score is to be applied.
For direct mailing campaigns this will usually mean that traditional insurance rating factors used in the premium will have to be excluded at this point, since they are not known at the time of the mailing campaign.
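The score construction described above drops the link function and rescales the additive linear predictor. The sketch below assumes a log-link model of expected loss ratio and a linear rescaling of the linear predictor onto 1 to 100 using the portfolio minimum and maximum; the paper does not specify the transformation or the range, and the parameters are hypothetical.

```python
def linear_predictor(params, record):
    """Intercept plus one additive term per factor: the GLM output before the link is applied."""
    return params["intercept"] + sum(params[factor][level] for factor, level in record.items())


def to_scores(etas, low=1, high=100, invert=True):
    """Rescale linear predictors linearly onto [low, high].

    With invert=True the lowest expected loss ratio (smallest linear predictor)
    receives the highest, i.e. most profitable, score.
    """
    lo, hi = min(etas), max(etas)
    scores = []
    for eta in etas:
        position = (eta - lo) / (hi - lo)
        if invert:
            position = 1.0 - position
        scores.append(round(low + position * (high - low)))
    return scores


# Hypothetical additive parameters from a log-link model of expected loss ratio
params = {
    "intercept": -0.4,
    "Age": {"young": 0.3, "old": 0.0},
    "Region": {"north": 0.1, "south": -0.05},
}
records = [
    {"Age": "young", "Region": "north"},
    {"Age": "young", "Region": "south"},
    {"Age": "old", "Region": "north"},
    {"Age": "old", "Region": "south"},
]
etas = [linear_predictor(params, r) for r in records]
print(to_scores(etas))
```

Because the rescaling is monotone, the score preserves exactly the ordering of risks given by the linear predictor; only the scale is changed for communication.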

Example results

3.13 The graph below shows how a score can be used to segment very effectively between profitable and unprofitable business. The bars on the graph show the number of policies that have been allocated each score. The solid line shows the actual loss ratio experienced for business with differing scores. It can be seen that the business towards the left of the graph, with low profitability scores, is experiencing high loss ratios, while the business to the right of the graph, with high scores, is returning much lower loss ratios.

[Figure: Distribution of score: number of policies (bars) and actual loss ratio (line) by score based on expected loss ratio.]

3.14 Scores are simple to produce, easy to explain and are increasingly used by insurers. Actuaries can play a vital role in the development of scoring models with the aid of generalized linear models.

Retention modeling using GLMs

3.15 Traditional ratemaking techniques focus primarily on loss analysis in a static environment. Rate changes developed by these techniques, especially when they are large, can actually contribute to a shortfall in projected premium volume and profitability if insufficient consideration is given to the effect of the rate change and other policy characteristics on customer retention and/or new business conversion. Modeling retention (or its complement, lapse rate) and new business conversion with GLMs can improve ratemaking decisions and profitability forecasts, as well as marketing decisions.

3.16 The data for a retention model must include information on individual policies that have been given a renewal offer, and whether or not each policy renewed.⁷ Similarly, data for a conversion model must contain individual past quotes and whether each quote converted to new business. While most insurers have access to appropriate retention data, many distributing via exclusive agents or independent brokers will not have access to appropriate individual conversion data. The explanatory variables to include in the data can be divided into three categories: customer information, price change data, and information on the competitive position.

3.17 The first category should encompass more than just the standard rating variables (e.g. age, territory, claim experience). Other "softer" variables such as number of years with the company, other products held, payment plan and endorsement activity can determine much about a customer's behavior. Distribution channel, too, can have a clear effect on the retention rate, and may interact significantly with other factors (e.g. the effect of age may be different with internet distribution than with agency distribution).

3.18 Prior rate change, whether measured as a percentage change or a dollar change, is often one of the most significant factors in a retention model. Though it is intuitive that retention is a function of rate change, the slope of the elasticity curve at different rate changes may not be as obvious. In addition, measuring retention using a generalized linear model will adjust for exposure correlations between price elasticity and other explanatory variables (e.g. a GLM will not show that a particular rating factor level has a low retention rate merely because historically there was aggressive rate activity with that level).

3.19 The third type of variable, information on the competitive position, is often the hardest to gather in practice. An example of a competitive index may be the ratio of the company's renewal quote to the third cheapest quote from a specified selection of major competitors at the time of the quote.
3.20 Tracking the myriad of competitor rate changes in a multitude of states can be overwhelming, even with the availability of third party competitive rating software and advances in quote collection procedures. Fortunately, even the most rudimentary competitive index variables can prove to be predictive in a retention model, and more so in a conversion model.

7 Alternatively, retention data may be organized by risk if more than one risk is written on a single policy.
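The competitive index described above (renewal quote divided by the third cheapest quote from a fixed panel of competitors) takes only a few lines to compute. The sketch also includes the variant used in the conversion model example later in this section (ratio to the average of the three cheapest quotes). Function names and quotes are hypothetical.

```python
def third_cheapest_index(own_quote, competitor_quotes):
    """Company's renewal quote relative to the third cheapest competitor quote."""
    ranked = sorted(competitor_quotes)
    if len(ranked) < 3:
        raise ValueError("need at least three competitor quotes")
    return own_quote / ranked[2]


def average_cheapest_index(own_quote, competitor_quotes, n=3):
    """Company's quote relative to the average of the n cheapest competitor quotes."""
    ranked = sorted(competitor_quotes)[:n]
    return own_quote / (sum(ranked) / len(ranked))


# Hypothetical quotes from a fixed panel of competitors
competitors = [500.0, 450.0, 620.0, 480.0, 700.0]
print(round(third_cheapest_index(540.0, competitors), 3))
print(round(average_cheapest_index(540.0, competitors), 3))
```

Using a rank-based reference such as the third cheapest quote, rather than the single cheapest, makes the index less sensitive to one unusually low competitor quote.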

Model form

3.21 As mentioned previously in Section 1, the typical model form for modeling retention (or lapse) and new business conversion is a logit link function and binomial error term (together referred to as a logistic model). The logit link function maps outcomes from the range (0, 1) to (-∞, +∞), and is consequently invariant to measuring successes or failures. If the y-variate being modeled is generally close to zero, and if the results of a model are going to be used qualitatively rather than quantitatively, it may also be possible to use a multiplicative Poisson model form as an approximation, given that the model output from a multiplicative GLM can be rather easier to explain to a non-technical audience.

Example results

3.22 The graph below shows sample GLM output for a lapse model. The main line on the graph demonstrates (on a log scale) the measured multiplicative effect of age of policyholder upon lapse rate. The effect is measured relative to an arbitrarily selected base level, and the results take into account the effect of all other factors analyzed by the GLM.

[Figure: Poisson multiplicative lapse model output: log of multiplier by age of policyholder, showing the unsmoothed estimate and approximate standard errors from the estimate.]
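Two properties mentioned in the model form discussion can be checked numerically: swapping "success" and "failure" only changes the sign of the logit, and for small probabilities the log-odds differ from the log-probability by just -log(1 - p), which is why a multiplicative (log-link) Poisson model can stand in for a logistic one. A standard-library sketch with hypothetical rates:

```python
import math


def logit(p):
    """Log-odds: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


lapse_rate = 0.12
# Modeling renewal (1 - p) instead of lapse (p) only flips the sign of the linear predictor
print(logit(lapse_rate), logit(1.0 - lapse_rate))

# Gap between log-odds and log-probability is -log(1 - p): small when p is near zero
for p in (0.02, 0.05, 0.30):
    print(p, round(logit(p) - math.log(p), 4))
```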

3.23 In this example, which is fairly typical, it can be seen that young policyholders lapse considerably more than older policyholders, perhaps as a result of having more time and enthusiasm for searching for a better quotation, and perhaps also as a result of being generally less wealthy and therefore more interested in finding a competitive price.

3.24 The next graph shows the effect of premium change on lapse rate. This GLM output is from a UK Institute of Actuaries General Insurance Research Organisation (GIRO) study⁸ based on policies across several major UK insurers in 1996. The premium change is measured in ranges of monetary units (British pounds in this case), but the model could easily be based on the percentage change in premium. As would be expected, increases in premium increase lapses. The model, however, quantifies this accurately and enables investigations into potentially optimal rate increases to be undertaken. It can be seen in this case (as is often the case) that decreases in premium beyond a small threshold do not decrease lapses.

[Figure: Poisson multiplicative lapse model output: log of multiplier by change in premium on renewal, showing the unsmoothed estimate and approximate standard errors from the estimate.]

8 Bland, R.H. et al, Institute of Actuaries GIRO Customer Selection and Retention Working Party, ISBN …

Measures of premium change should ideally consider whether customers have an inherent expectation of premium change. For example, customers with recent claims will anticipate a premium increase. They may be prepared to accept their renewal offer rather than face the underwriting guides of a new company. Conversely, customers who are rolling off an accident surcharge, hitting a milestone age or undergoing a change in marital status may expect a decrease. A possible proxy for customer expectation is to adjust the premium change variable. It becomes the ratio of the proposed premium (based on new risk criteria and new rates) to an adjusted proposed premium (based on new risk criteria and last year's rates).

In addition to including premium change, absolute premium can also be considered as a factor in a model. This approach, though not theoretically incorrect, may make the model difficult to interpret. Many other factors in the model will be components of premium and therefore highly correlated with premium size. Adding absolute premium to the model may significantly alter the observed relativities for other factors, which may make the results hard to interpret. One alternative to including absolute premium in such a case is to fit separate models for different bands of average premium.

The next graph shows an example of the effect of competitiveness in a new business conversion model. The measure of competitiveness used in this case is the ratio of the proposed premium to the average of the three cheapest alternative premiums from a selection of alternative insurers. It can be seen that the less competitive the premium, the lower the conversion rate.

[Figure: Logistic new business conversion model output. Log of multiplier of p/(1-p) by quote / average of the three cheapest quotes on the market, showing the smoothed estimate with approximate standard-deviation bands.]
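Both ratios described above can be written down directly. The sketch below is illustrative only: the function names and example premiums are hypothetical. In practice, the "new risk criteria at last year's rates" premium would come from re-rating each renewal through the prior rating plan.

```python
from statistics import mean


def expectation_adjusted_change(new_criteria_new_rates: float,
                                new_criteria_old_rates: float) -> float:
    """Premium change net of changes the customer may anticipate.

    Numerator: proposed premium (new risk criteria, new rates).
    Denominator: adjusted proposed premium (new risk criteria, last year's
    rates). A value of 1.0 means the whole change is explained by movements
    in the customer's own rating criteria (a new claim, a birthday, ...).
    """
    return new_criteria_new_rates / new_criteria_old_rates


def competitiveness(own_quote: float, competitor_quotes: list) -> float:
    """Own quote divided by the average of the three cheapest alternatives."""
    if len(competitor_quotes) < 3:
        raise ValueError("need at least three competitor quotes")
    return own_quote / mean(sorted(competitor_quotes)[:3])


# A renewal rising from 500 to 600 after a claim, where the same new criteria
# at last year's rates would have given 590: most of the rise is "expected".
print(expectation_adjusted_change(600.0, 590.0))  # about 1.017
print(competitiveness(520.0, [650.0, 480.0, 510.0, 600.0, 500.0]))  # about 1.047
```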

A further analysis which can be undertaken is to superimpose the results of two models on one graph: one model that includes the competitiveness measure and one model that does not. The disparity between these two models will show how much of a factor's effect is simply price-related.

Applications

In a fully deregulated market such as the UK, insurance companies can set premium rates according to what the market will bear. In most US states and Canadian provinces, insurance companies need to demonstrate that rates are within a reasonable range of loss and expense cost estimates. Companies can, however, measure the sensitivity of various point selections within those ranges, whether the point estimates pertain to overall rate level or classification ratemaking. Future pricing reviews may give management support on actuarial considerations such as trend and loss development. They may also give a forecast of how various rate change proposals are expected to affect:
- retention;
- conversion;
- premium;
- overall loss ratio (incorporating both the overall rate change and the portfolio shift from classification changes);
- profitability, in the short term and/or long term.

Retention analyses can also lead to operational actions which are unrelated to price. For example, in a highly rate-regulated state, consideration could be given to which segments of the population (given a restricted set of rates) are both profitable and most likely to renew in the future. Such a measure could help form new underwriting guides or targeted marketing and cross-sell campaigns.

Insurance expense analysis is another field of study that is often overshadowed by loss analysis. Suppose acquisition expenses are higher than renewal expenses. Then an understanding of likely retention, and therefore of the expected life of a policy, can be used to amortize the higher acquisition cost over the expected life of the policy.
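A simple way to turn a retention estimate into an expected policy life is to assume, purely for illustration, that a policy renews each year with the same probability r. The number of annual terms is then geometric with mean 1/(1 - r). The sketch below uses hypothetical function names and ignores discounting and duration-dependent retention. It spreads the excess of acquisition over renewal expense across that expected life.

```python
def expected_policy_terms(retention_rate: float) -> float:
    """Expected number of annual terms under a constant renewal probability.

    P(policy lasts k terms) = r**(k - 1) * (1 - r), whose mean is 1 / (1 - r).
    """
    if not 0.0 <= retention_rate < 1.0:
        raise ValueError("retention_rate must lie in [0, 1)")
    return 1.0 / (1.0 - retention_rate)


def amortized_acquisition_load(acquisition_expense: float,
                               renewal_expense: float,
                               retention_rate: float) -> float:
    """Excess acquisition expense spread evenly (undiscounted) per term."""
    excess = acquisition_expense - renewal_expense
    return excess / expected_policy_terms(retention_rate)


# Segments with different modeled retention support different loads
# (segment names and retention rates are hypothetical).
for segment, r in {"segment A": 0.70, "segment B": 0.85}.items():
    print(segment, round(expected_policy_terms(r), 2),
          round(amortized_acquisition_load(150.0, 50.0, r), 2))
```

In a real application, the retention rate for each segment would come from the fitted retention GLM rather than being fixed by hand.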

Conclusion

A GLM statistically measures the effect that variables have on an observed item. In insurance, GLMs are most often used for two purposes. The first is to determine the effect rating variables have on claims experience. The second is to determine the effect that rating variables and other factors have on the probability of a policy renewing or a new business quotation being accepted.

GLMs estimate the true effect of each variable upon the experience, making appropriate allowance for the effect of all other factors being considered. Ignoring correlation can produce significant inaccuracies in rates.

GLMs incorporate assumptions about the nature of the random process underlying claims experience. Having the flexibility to specify a link function and probability distribution that match the observed behavior increases the accuracy of the analysis.

A further advantage of using GLMs is that, as well as estimating the effect that a given factor has on the experience, a GLM provides information about the certainty of model results.

GLMs are robust, transparent and easy to understand. With advances in computer power, GLMs are widely recognized as the industry standard in European personal lines, and are fast gaining acceptance from industry professionals in the US and Canada.

GLMs in insurance are not limited to pricing. Alternative applications of GLM claims analyses include underwriting, selective marketing and agency marketing.

GLMs are grounded in statistical theory and offer a practical method for insurance companies to attain satisfactory profitability and a competitive advantage.

Bibliography

Bailey, Robert A.; and Simon, LeRoy J., "Two Studies in Automobile Insurance Ratemaking," Proceedings of the Casualty Actuarial Society, XLVII, 1960.
Bland, R.H. et al., Institute of Actuaries GIRO Customer Selection and Retention Working Party, ISBN
Brockman, M.J.; and Wright, T.S., "Statistical Motor Rating: Making Effective Use of Your Data," Journal of the Institute of Actuaries 119, Vol. III, 1992.
Conning, "Insurance Scoring in Private Passenger Automobile Insurance: Breaking the Silence," Conning Report.
Feldblum, Sholom; and Brosius, Eric J., "The Minimum Bias Procedure: A Practitioner's Guide," Casualty Actuarial Society Forum, Fall.
Hardin, James; and Hilbe, Joseph, "Generalized Linear Models and Extensions," Stata Press.
Jørgensen, B.; and de Souza, M.C.P., "Fitting Tweedie's Compound Poisson Model to Insurance Claims Data," Scandinavian Actuarial Journal, 1994:69-93.
McCullagh, P.; and Nelder, J.A., "Generalized Linear Models," 2nd Ed., Chapman & Hall/CRC, 1989.
Mildenhall, Stephen, "A Systematic Relationship Between Minimum Bias and Generalized Linear Models," Proceedings of the Casualty Actuarial Society, LXXXVI, 1999.
Miller, Michael J.; and Smith, Richard A., "The Relationship of Credit-Based Insurance Scores to Private Passenger Automobile Insurance Loss Propensity," an actuarial study by Epic Actuaries LLC.
Wu, Cheng-Sheng Peter; and Guszcza, James C., "Does Credit Score Really Explain Insurance Losses? Multivariate Analysis from a Data Mining Point of View," Casualty Actuarial Society Forum, Winter.

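A The design matrix when variates are used

Consider the example of a model which is based on two continuous rating variables: age of driver and age of car. Let $Y$ be a column vector with components corresponding to the $n$ observed values for the response variable, for example severity:

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

Let $X_1$ and $X_2$ denote the column vectors with components equal to the observed values for the continuous variables (e.g. $X_1$ shows the actual age of the driver for each observation):

$$X_1 = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}, \qquad X_2 = \begin{pmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{n2} \end{pmatrix}$$

As before, $\beta$ denotes a column vector of parameters, and $\varepsilon$ the vector of residuals:

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Then the system of equations takes the form:

$$Y = \beta_1 \mathbf{1} + \beta_2 X_1 + \beta_3 X_2 + \varepsilon$$

where $\mathbf{1}$ is a column vector of ones. Alternatively, the design matrix $X$ can be defined as follows:

$$X = \begin{pmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} \end{pmatrix}$$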

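The system of equations then takes the form $Y = X \cdot \beta + \varepsilon$.

Polynomials

Rather than assuming that the value of $X \cdot \beta$ is linear in the variates, it is also possible to include in the definition of $X$ terms based on polynomials in the variates. For example, a model could be based on a third-order polynomial in age of driver and a second-order polynomial in age of vehicle. In this case the design matrix would be defined as follows:

$$X = \begin{pmatrix} 1 & x_{11} & x_{11}^2 & x_{11}^3 & x_{12} & x_{12}^2 \\ 1 & x_{21} & x_{21}^2 & x_{21}^3 & x_{22} & x_{22}^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n1}^2 & x_{n1}^3 & x_{n2} & x_{n2}^2 \end{pmatrix}$$

where:
- the first column represents the intercept term (driver age to the power 0);
- the second column represents the values of driver age;
- the third column represents the values of driver age squared;
- the fourth column represents the values of driver age cubed;
- the fifth column represents the values of vehicle age;
- the sixth column represents the values of vehicle age squared.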
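The polynomial design matrix described above can be assembled column by column. This numpy sketch assumes the raw driver and vehicle ages are held in two arrays; the ages and the parameter vector are made up for illustration. Each row is one observation, and the six columns follow the order listed above.

```python
import numpy as np


def polynomial_design_matrix(driver_age, vehicle_age) -> np.ndarray:
    """Cubic in driver age plus quadratic in vehicle age, with an intercept."""
    d = np.asarray(driver_age, dtype=float)
    v = np.asarray(vehicle_age, dtype=float)
    return np.column_stack([
        np.ones_like(d),  # intercept (driver age to the power 0)
        d,                # driver age
        d ** 2,           # driver age squared
        d ** 3,           # driver age cubed
        v,                # vehicle age
        v ** 2,           # vehicle age squared
    ])


X = polynomial_design_matrix([25, 40, 63], [2, 7, 11])
beta = np.array([6.0, -0.05, 1e-3, -6e-6, 0.02, -1e-3])  # illustrative only
eta = X @ beta  # linear predictor X . beta, one value per observation
print(X.shape, eta)
```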

B The exponential family of distributions

Formally, the exponential family of distributions is a two-parameter family of functions defined by:

$$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

where $a(\phi)$, $b(\theta)$ and $c(y, \phi)$ are specified functions. Conditions imposed on these functions are that:
a. $a(\phi)$ is positive and continuous;
b. $b(\theta)$ is twice differentiable with the second derivative a positive function (in particular, $b(\theta)$ is a convex function); and
c. $c(y, \phi)$ is independent of the parameter $\theta$.

These three functions are related by the simple fact that $f$ must be a probability density function, and so it must integrate to 1 over its domain. Different choices for $a(\phi)$, $b(\theta)$ and $c(y, \phi)$ define a different class of distributions and a different solution to the GLM problem. The parameter $\theta$ is termed the canonical parameter and $\phi$ the scale parameter.

The chart below summarizes some familiar distributions that are members of the exponential family:

$$\begin{array}{llll}
 & a(\phi) & b(\theta) & c(y, \phi) \\ \hline
\text{Normal} & \phi/\omega & \theta^2/2 & -\tfrac{1}{2}\left[\omega y^2/\phi + \ln(2\pi\phi/\omega)\right] \\
\text{Poisson} & \phi/\omega & e^{\theta} & -\ln(y!) \\
\text{Gamma} & \phi/\omega & -\ln(-\theta) & (\omega/\phi)\ln(\omega y/\phi) - \ln y - \ln\Gamma(\omega/\phi) \\
\text{Binomial } (m \text{ trials}) & \phi/\omega & m\ln(1 + e^{\theta}) & \ln\binom{m}{my} \\
\text{Inverse Gaussian} & \phi/\omega & -(-2\theta)^{1/2} & -\tfrac{1}{2}\left[\ln(2\pi\phi y^3/\omega) + \omega/(\phi y)\right]
\end{array}$$

It can be seen that the standard choice for $a(\phi)$ is $a(\phi) = \phi/\omega$, where $\omega$ is a prior weight, a constant that is specified in advance. For insurance applications, common choices for the prior weight are:
- 1 (e.g. when modeling claim counts);
- the number of exposures (e.g. when modeling claim frequency);
- the total number of claims (e.g. when modeling claim severity).

It is also clear from the chart that for certain distributions, such as the Poisson and binomial distributions, the scale parameter $\phi$ is equal to 1 and plays no further role in the modeling problem.

A distribution for each observation $Y_i$ needs to be specified. It is assumed that

$$f_i(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}$$

Thus each observation has a different canonical parameter $\theta_i$, but the scale parameter $\phi$ is the same across all observations. It is further assumed that the functions $a(\phi)$, $b(\theta)$ and $c(y, \phi)$ are the same for all $i$ (apart from the prior weight $\omega_i$ in $a_i(\phi) = \phi/\omega_i$). So each observation comes from the same class within the exponential family, but allowing $\theta_i$ to vary corresponds to allowing the mean of each observation to vary. The parameters $\theta_i$ and $\phi$ encapsulate the mean and variance information about $Y_i$. It can be shown that for this family of distributions:

$$\mu_i = E[Y_i] = b'(\theta_i)$$

$$\mathrm{Var}[Y_i] = b''(\theta_i) \cdot a_i(\phi)$$

where the prime ($'$) denotes differentiation with respect to $\theta$. The first equation implicitly defines $\theta_i$ as a function of $\mu_i$. If an explicit expression for the inverse of $b'(\theta)$ is known (as is the case for the familiar distributions), then the first equation can be solved to express the canonical parameter $\theta_i$ explicitly as a function of the mean $\mu_i$ of the distribution:

$$\theta_i = (b')^{-1}(\mu_i)$$

Thus the canonical parameter is essentially equivalent to the mean. Section 1 describes how a GLM asserts that $\mu_i$ is a function of the linear predictor $\eta_i$. The linear predictor is a linear combination of the $p$ covariates $X_1, \ldots, X_p$:

$$\mu_i = g^{-1}(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})$$

Thus $\theta_i$ is ultimately a complicated function of the elements of $\beta$:

$$\theta_i = (b')^{-1}\left(g^{-1}(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})\right)$$

This derivation makes explicit the manner in which the distribution of $Y_i$ depends on the GLM parameters $\beta_1, \ldots, \beta_p$.
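The identities $\mu = b'(\theta)$ and $\mathrm{Var}[Y] = b''(\theta) \cdot a(\phi)$ can be checked numerically. The sketch below applies central finite differences to two of the $b(\theta)$ functions from the chart. For the Poisson, $b(\theta) = e^{\theta}$ with $\theta = \ln\mu$. For the gamma, $b(\theta) = -\ln(-\theta)$ with $\theta = -1/\mu$. The step sizes are arbitrary choices. With $a(\phi) = \phi/\omega$, the second derivative is the variance function $V(\mu)$.

```python
import math


def first_derivative(f, x, h=1e-5):
    """Central difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def b_poisson(theta):
    return math.exp(theta)


def b_gamma(theta):
    return -math.log(-theta)  # defined for theta < 0


mu = 2.5
theta_poisson = math.log(mu)  # canonical parameter for the Poisson
theta_gamma = -1.0 / mu       # canonical parameter for the gamma

# b'(theta) recovers the mean; b''(theta) gives V(mu): mu and mu**2 here.
print(first_derivative(b_poisson, theta_poisson), second_derivative(b_poisson, theta_poisson))
print(first_derivative(b_gamma, theta_gamma), second_derivative(b_gamma, theta_gamma))
```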

It can be seen from the table above that the expression for $c(y, \phi)$ can be complicated. Fortunately, the form of $c(y, \phi)$ is irrelevant to the solution of the maximum likelihood estimator, as long as $c(y, \phi)$ does not depend on $\theta$. It then also does not depend on $\mu$, and thus not on the GLM modeling parameters $\beta$.

Given that $\theta$ is a function of the mean $\mu$, the equation $\mathrm{Var}[Y] = b''(\theta) \cdot a(\phi)$ can be interpreted as establishing the variance of $Y$ as a function of the mean of $Y$ times some scaling term $a(\phi)$. Thus the scale parameter $\phi$ is a function of the mean and variance of the distribution. Thus the exponential family has two desirable properties:
- each distribution in the family is completely specified in terms of its mean and variance;
- the variance of $Y$ is a function of its mean.

This second property is emphasized by writing

$$\mathrm{Var}[Y] = \frac{\phi\, V(\mu)}{\omega}$$

where the function $V$ is termed the variance function. The chart below covers the familiar distributions. It summarizes the relationship between the mean and the canonical parameter, expresses $\phi$ in terms of the standard parameters for each distribution, and lists the variance function:

$$\begin{array}{llllll}
 & \text{Normal} & \text{Poisson} & \text{Gamma} & \text{Binomial} & \text{Inverse Gaussian} \\ \hline
\text{Notation} & N(\mu, \sigma^2) & P(\mu) & G(\mu, \nu) & B(m, \pi)/m & IG(\mu, \sigma^2) \\
\phi & \sigma^2 & 1 & 1/\nu & 1/m & \sigma^2 \\
\mu(\theta) & \theta & e^{\theta} & -1/\theta & e^{\theta}/(1 + e^{\theta}) & (-2\theta)^{-1/2} \\
V(\mu) & 1 & \mu & \mu^2 & \mu(1 - \mu) & \mu^3
\end{array}$$

C The Tweedie distribution

Direct modeling of pure premium or incurred loss data is problematic. A typical pure premium distribution will consist of a large spike at zero (where policies have not had claims) and then a wide range of amounts (where policies have had claims). This is illustrated in the diagram below.

Many of the traditional members of the exponential family of distributions are not appropriate for modeling claims experience from such a distribution. They do not have a point mass at zero combined with an appropriate spread across non-zero amounts.

The Tweedie distribution is a special member of the exponential family which has a variance function proportional to $\mu^p$, with $p$ being an additional parameter. In the case of $1 < p < 2$, the Tweedie distribution has a point mass at zero. It corresponds to the compound distribution of a Poisson claim number process and a Gamma claim size distribution. The distribution can be Poisson-like (as $p \to 1$) or Gamma-like (as $p \to 2$). In total the distribution has three parameters: a mean parameter, a dispersion parameter, and the "shape" parameter $p$. When $1 < p < 2$, the shape parameter is often written in terms of $\alpha$, where

$$p = \frac{\alpha - 2}{\alpha - 1}$$

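Its density function is rather complex, and in the case of $1 < p < 2$ is defined as:

$$f_Y(y; \theta, \lambda, \alpha) = \sum_{n=1}^{\infty} \frac{\left\{(\lambda\omega)^{1-\alpha}\,\kappa_\alpha(-1/y)\right\}^n}{\Gamma(-n\alpha)\, n!\, y}\, \exp\left\{\lambda\omega\left[\theta y - \kappa_\alpha(\theta)\right]\right\} \qquad \text{for } y > 0$$

and

$$f_Y(0; \theta, \lambda, \alpha) = \exp\left\{-\lambda\omega\,\kappa_\alpha(\theta)\right\}$$

where

$$\kappa_\alpha(\theta) = \frac{\alpha - 1}{\alpha}\left(\frac{\theta}{\alpha - 1}\right)^{\alpha}$$

and $\omega$ is the prior weight corresponding to the exposure of the observation in question. It can be shown that the variance of the above Tweedie distribution is given by

$$\mathrm{Var}[Y] = \frac{\mu^p}{\lambda\omega}$$

In practice the shape parameter can either be assumed to be a particular value or, more usefully, estimated as part of the maximum likelihood process. Typically, values of $p$ just under 1.5 seem to be estimated for auto claims experience. Further information about the Tweedie distribution can be found in the paper "Fitting Tweedie's Compound Poisson Model to Insurance Claims Data" by Jørgensen, B. and de Souza, M.C.P., Scandinavian Actuarial Journal, 1994:69-93.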
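For $1 < p < 2$, the compound Poisson-gamma construction can be simulated directly. The sketch below uses a standard mean/dispersion parameterization, which is assumed here and not taken from the text above. Claim counts are Poisson with mean $\mu^{2-p}/(\phi(2-p))$, called `claim_rate` in the code; this is not the $\lambda$ of the density above. Claim sizes are gamma with shape $(2-p)/(p-1)$ and scale $\phi(p-1)\mu^{p-1}$. Then $E[Y] = \mu$, $\mathrm{Var}[Y] = \phi\mu^p$, and $P(Y = 0) = e^{-\text{claim\_rate}}$. Under the relation $p = (\alpha-2)/(\alpha-1)$, the gamma shape equals $-\alpha$. The parameter values are illustrative.

```python
import numpy as np


def compound_poisson_gamma_params(mu: float, phi: float, p: float):
    """Map Tweedie (mean mu, dispersion phi, power p) to its Poisson-gamma form."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson-gamma form needs 1 < p < 2")
    claim_rate = mu ** (2.0 - p) / (phi * (2.0 - p))  # Poisson mean
    gamma_shape = (2.0 - p) / (p - 1.0)               # equals -alpha
    gamma_scale = phi * (p - 1.0) * mu ** (p - 1.0)
    return claim_rate, gamma_shape, gamma_scale


def simulate_tweedie(mu, phi, p, size, rng):
    """Draw pure-premium-like outcomes: a point mass at 0 plus a gamma spread."""
    claim_rate, shape, scale = compound_poisson_gamma_params(mu, phi, p)
    counts = rng.poisson(claim_rate, size=size)
    totals = np.zeros(size)
    has_claims = counts > 0
    # A sum of n iid gamma(shape, scale) amounts is gamma(n * shape, scale).
    totals[has_claims] = rng.gamma(shape * counts[has_claims], scale)
    return totals


rng = np.random.default_rng(2024)
y = simulate_tweedie(mu=100.0, phi=50.0, p=1.5, size=200_000, rng=rng)
claim_rate, _, _ = compound_poisson_gamma_params(100.0, 50.0, 1.5)
print("share of zeros:", (y == 0).mean(), "theory:", np.exp(-claim_rate))
print("mean:", y.mean(), "variance:", y.var(), "theory:", 50.0 * 100.0 ** 1.5)
```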

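D Canonical link functions

Each of the exponential distributions has a natural link function, called the canonical link. It has the property that $\theta_i = \eta_i$, where $\theta_i$ is the canonical parameter. This property means that the GLM parameters $\beta_1, \ldots, \beta_p$ enter the expression for the distribution function in a simple way. In general

$$f(y_i) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\} = \exp\left\{ \frac{y_i\,(b')^{-1}\left(g^{-1}(\eta_i)\right) - b\left((b')^{-1}\left(g^{-1}(\eta_i)\right)\right)}{a_i(\phi)} + c(y_i, \phi) \right\}$$

but if $\theta_i = \eta_i$ this simplifies to

$$f(y_i) = \exp\left\{ \frac{y_i\eta_i - b(\eta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}$$

and subsequent differentiation with respect to the GLM parameters $\beta_j$ is thus significantly simplified. The canonical link functions associated with the familiar distributions are listed below. For the gamma and inverse Gaussian, they are stated up to sign and constant factors, which can be absorbed into the parameters.

$$\begin{array}{llllll}
 & \text{Normal} & \text{Poisson} & \text{Gamma} & \text{Binomial} & \text{Inverse Gaussian} \\ \hline
\text{Canonical link} & \mu & \ln\mu & 1/\mu & \ln\left(\mu/(1 - \mu)\right) & 1/\mu^2
\end{array}$$

Note the requirement for a canonical link function:

$$\theta_i = (b')^{-1}\left(g^{-1}(\eta_i)\right) = \eta_i$$

This implies that the inverse of the link function, $g^{-1}$, is $b'$; equivalently, the link function $g$ is the inverse of $b'$.

In practice, with sophisticated software to solve GLM modeling problems, there is no imperative to use the canonical link associated with a particular distribution. Instead, any arbitrary pairing of the link function and the error structure can be made, and such non-canonical pairings can in fact yield more predictive models.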
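The statement that $g^{-1} = b'$ for a canonical link can be checked numerically: applying the canonical parameter map $\theta(\mu)$ and then $b'$ should return $\mu$. In this sketch, the exact canonical parameters are used for the gamma ($\theta = -1/\mu$) and inverse Gaussian ($\theta = -1/(2\mu^2)$). These differ from the links $1/\mu$ and $1/\mu^2$ listed above only by constant factors. The binomial is written for a proportion, with $b'(\theta) = e^{\theta}/(1 + e^{\theta})$. The dictionary and function names are hypothetical.

```python
import math

# family: (b'(theta), i.e. inverse canonical link; theta(mu), i.e. canonical link)
CANONICAL = {
    "normal": (lambda t: t, lambda m: m),
    "poisson": (math.exp, math.log),
    "gamma": (lambda t: -1.0 / t, lambda m: -1.0 / m),
    "binomial": (lambda t: 1.0 / (1.0 + math.exp(-t)),
                 lambda m: math.log(m / (1.0 - m))),
    "inverse_gaussian": (lambda t: (-2.0 * t) ** -0.5,
                         lambda m: -1.0 / (2.0 * m * m)),
}


def round_trip(family: str, mu: float) -> float:
    """Apply the canonical link, then b'; should return mu."""
    b_prime, link = CANONICAL[family]
    return b_prime(link(mu))


for name in CANONICAL:
    print(name, round_trip(name, 0.3))
```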

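E Solving for maximum likelihood in the general case of an exponential distribution

In the case of the exponential family of distributions, the log likelihood takes the form:

$$l(y; \theta, \phi) = \sum_i \left[ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right]$$

Log likelihood is maximized by taking, for each $j$, the first-order partial derivative of $l$ with respect to $\beta_j$ and setting it equal to zero:

$$\frac{\partial l}{\partial \beta_j} = 0, \qquad j = 1, \ldots, p$$

If there is an explicit expression for $\theta_i$ in terms of $\beta_1, \ldots, \beta_p$, one can make this substitution into the log likelihood function and then carry out the differentiation. However, the calculations become complicated quite quickly. It is simpler just to apply the chain rule of calculus three times:

$$\frac{\partial l}{\partial \beta_j} = \sum_i \frac{\partial l}{\partial \theta_i} \cdot \frac{\partial \theta_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j}$$

recalling the following relationships:

$$\frac{\partial l}{\partial \theta_i} = \frac{y_i - b'(\theta_i)}{a_i(\phi)} = \frac{y_i - \mu_i}{a_i(\phi)}$$

$$\frac{\partial \theta_i}{\partial \mu_i} = \left(\frac{\partial \mu_i}{\partial \theta_i}\right)^{-1} = \frac{1}{b''(\theta_i)}$$

$$\frac{\partial \mu_i}{\partial \eta_i} = \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} = \frac{1}{g'(\mu_i)}$$

$$\frac{\partial \eta_i}{\partial \beta_j} = x_{ij}, \qquad \text{since } \eta_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$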

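It can be deduced, using $a_i(\phi) = \phi/\omega_i$ and $b''(\theta_i) = V(\mu_i)$, that

$$\frac{\partial l}{\partial \beta_j} = \sum_i \frac{y_i - \mu_i}{a_i(\phi)\, b''(\theta_i)\, g'(\mu_i)}\, x_{ij} = \sum_i \frac{\omega_i\,(y_i - \mu_i)}{\phi\, V(\mu_i)\, g'(\mu_i)}\, x_{ij} = 0, \qquad j = 1, \ldots, p$$

Although the theoretical system of equations which must be satisfied in order to maximize the likelihood can be written relatively easily, finding the solution to these equations is more complicated.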
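The score equations above are usually solved iteratively. One standard method is iteratively reweighted least squares (Fisher scoring); it is assumed here, since the text does not prescribe a solver. Its fixed point satisfies exactly those equations, and $\phi$ cancels, playing no part in the solution for $\beta$. The sketch takes the link, its inverse, its derivative $g'$ and the variance function $V$ as plain functions. The Poisson log-link data at the end are hypothetical.

```python
import numpy as np


def irls(X, y, link, link_inv, link_deriv, variance, prior_weights=None,
         tol=1e-10, max_iter=100):
    """Fit a GLM by iteratively reweighted least squares (Fisher scoring)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w_prior = np.ones(len(y)) if prior_weights is None else np.asarray(prior_weights, float)
    mu = (y + y.mean()) / 2.0  # crude starting values, assumed positive
    eta = link(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g_prime = link_deriv(mu)
        z = eta + (y - mu) * g_prime                 # working response
        w = w_prior / (variance(mu) * g_prime ** 2)  # working weights
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = link_inv(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu


def score(X, y, mu, link_deriv, variance, prior_weights=None):
    """Left-hand side of the score equations, with phi set to 1."""
    X = np.asarray(X, dtype=float)
    w_prior = np.ones(len(mu)) if prior_weights is None else np.asarray(prior_weights, float)
    resid = np.asarray(y, dtype=float) - mu
    return X.T @ (w_prior * resid / (variance(mu) * link_deriv(mu)))


# Hypothetical example: Poisson error, log link, intercept plus one binary factor.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([1.0, 3.0, 2.0, 6.0])
beta, mu = irls(X, y, np.log, np.exp, lambda m: 1.0 / m, lambda m: m)
print(np.exp(beta), score(X, y, mu, lambda m: 1.0 / m, lambda m: m))
```

With this design, the fitted means reproduce the two group averages (2 and 4), so both multipliers $e^{\beta}$ come out as 2.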

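F Example of solving for maximum likelihood with a gamma error and inverse link function

For the gamma error structure with an inverse link function, the predicted values take the form:

$$E[Y] = g^{-1}(X \cdot \beta) = \frac{1}{X \cdot \beta}$$

The gamma error structure has the following density function:

$$f(x; \mu, \phi) = \frac{x^{1/\phi - 1}\, e^{-x/(\mu\phi)}}{\Gamma(1/\phi)\,(\mu\phi)^{1/\phi}}$$

Its log-likelihood function is

$$l(\mu, \phi; x) = \sum_{i=1}^{n} \ln f(x_i; \mu_i, \phi) = \sum_{i=1}^{n} \left[ -\frac{x_i}{\mu_i\phi} - \frac{1}{\phi}\ln\mu_i + \left(\frac{1}{\phi} - 1\right)\ln x_i - \frac{1}{\phi}\ln\phi - \ln\Gamma\left(\frac{1}{\phi}\right) \right]$$

With an inverse link function, $\mu_i = 1/\sum_j X_{ij}\beta_j$, and the log-likelihood function reduces to

$$l = \sum_{i=1}^{n} \left[ -\frac{x_i}{\phi}\sum_{j=1}^{p} X_{ij}\beta_j + \frac{1}{\phi}\ln\left(\sum_{j=1}^{p} X_{ij}\beta_j\right) + \left(\frac{1}{\phi} - 1\right)\ln x_i - \frac{1}{\phi}\ln\phi - \ln\Gamma\left(\frac{1}{\phi}\right) \right]$$

In this example there are four observations, so $l$ is the sum of four terms of this form, one for each observed value $x_i$ and its linear predictor $\sum_j X_{ij}\beta_j$. Ignoring some constant terms and multiplying by $\phi$, the following function is to be maximized:

$$l^*(x; \beta) = \sum_{i=1}^{n} \left[ \ln\left(\sum_{j=1}^{p} X_{ij}\beta_j\right) - x_i \sum_{j=1}^{p} X_{ij}\beta_j \right]$$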

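Again, to maximize $l^*$, take derivatives with respect to $\beta_1$, $\beta_2$ and $\beta_3$. Setting the derivatives to zero, the following three equations are derived:

$$\frac{\partial l^*}{\partial \beta_j} = \sum_{i} X_{ij}\left( \frac{1}{\sum_k X_{ik}\beta_k} - x_i \right) = 0, \qquad j = 1, 2, 3$$

Solving these simultaneous equations gives estimates of $\beta_1$, $\beta_2$ and $\beta_3$, which result in predicted values for each combination of Urban/Rural and Male/Female.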
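A numerical version of this example can be sketched with Newton's method on $l^*$. The four observed values and the design matrix below are hypothetical, chosen only to show the mechanics. The columns are male, female, and a rural adjustment, with urban as the base. The gradient is the left-hand side of the three equations above, and the Hessian is $-\sum_i X_{ij} X_{ik} / \eta_i^2$.

```python
import numpy as np

# Hypothetical design: rows are urban male, rural male, urban female, rural female.
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
x = np.array([900.0, 600.0, 450.0, 300.0])  # hypothetical average claim amounts


def l_star(beta):
    eta = X @ beta
    return np.sum(np.log(eta) - x * eta)


def gradient(beta):
    eta = X @ beta
    return X.T @ (1.0 / eta - x)


def hessian(beta):
    eta = X @ beta
    return -(X.T * (1.0 / eta ** 2)) @ X


# Start from a least-squares fit of 1/x, then take damped Newton steps that
# keep every linear predictor positive (so every fitted mean 1/eta is valid).
beta = np.linalg.lstsq(X, 1.0 / x, rcond=None)[0]
for _ in range(100):
    if np.max(np.abs(gradient(beta))) < 1e-9 * x.max():
        break
    step = np.linalg.solve(hessian(beta), -gradient(beta))
    t = 1.0
    while np.any(X @ (beta + t * step) <= 0) or l_star(beta + t * step) < l_star(beta):
        t /= 2.0
    beta = beta + t * step

mu = 1.0 / (X @ beta)  # predicted values
print(beta, mu)
```

Because $1/\eta_i = \mu_i$ under the inverse link, each score equation says that the fitted means sum to the observed values within a column of $X$. Here that means the male rows, the female rows and the rural rows. This gives a convenient check on the solution.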

G Data required for a GLM claims analysis

The overall structure of a dataset for GLM claims analysis consists of linked policy and claims information at the individual risk level. The definition of individual risk level will vary according to the line of business and the type of model. For instance, in a personal automobile claims model, the definition of risk may be a vehicle. In a personal automobile retention model, the definition of risk may be a policy containing several vehicles.

One record should be present for each period of time during which a policy was exposed to the risk of having a claim, and during which all factors remained unchanged. Policy amendments should ideally appear as two records, with the previous exposure curtailed at the point of amendment. Mid-term policy cancellations should also result in the exposure period being curtailed. If this data is not available, it is often possible to approximate it from less perfect data. For example, the policies in force at one year end could be compared with the policies in force at the previous year end. Matching policies would be assumed to be in force for the whole year, with appropriate approximations made for non-matching policies.

The dataset should contain fields defining the earned exposure and the rating factors applicable at the start of the exposure period. Additionally, premium information (typically earned premium) can be attached to each record. Premium is not used directly in the development of the claims models, but it can still provide valuable information. It helps in measuring the impact of any new rating or underwriting actions, and in producing summary one-way and two-way analyses (including loss ratios).

All explanatory variables in the dataset should record the criteria which were applicable at the start of the policy exposure. Strictly speaking, this means the point at which the premium was determined for the exposure period in question. In the case of categorical variables such as territory or vehicle class, however, the data recorded should ideally be derived by applying the current method of categorization to the historic situation.
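The record-splitting rule can be sketched with a small record type. Everything here is hypothetical: the field names, the single example rating factor, the treatment of end dates as exclusive, and exposure measured as days/365. The point is only that an amendment yields two records whose exposures add up to the original, and that a cancellation curtails the record.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class ExposureRecord:
    policy_id: str
    start: date         # first day of exposure
    end: date           # day after the last day of exposure (exclusive)
    vehicle_group: str  # example rating factor, hypothetical

    @property
    def exposure_years(self) -> float:
        return (self.end - self.start).days / 365.0


def split_at_amendment(record: ExposureRecord, amendment_date: date, **changes):
    """Curtail the current record at the amendment and open a new one."""
    if not record.start < amendment_date < record.end:
        raise ValueError("amendment must fall inside the exposure period")
    before = replace(record, end=amendment_date)
    after = replace(record, start=amendment_date, **changes)
    return [before, after]


def cancel(record: ExposureRecord, cancellation_date: date) -> ExposureRecord:
    """Mid-term cancellation: curtail the exposure period."""
    return replace(record, end=min(record.end, cancellation_date))


policy = ExposureRecord("P001", date(2005, 1, 1), date(2006, 1, 1), "B")
pieces = split_at_amendment(policy, date(2005, 7, 1), vehicle_group="C")
for r in pieces:
    print(r.start, r.end, r.vehicle_group, round(r.exposure_years, 3))
```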
Not all explanatory variables will be used to predict future claims experience. Dummy variables may be used to absorb certain effects that could bias the parameter estimates. For example, if conducting a countrywide study, it may be appropriate to create a dummy variable to standardize for differences in overall loss experience by geography. This dummy variable may be state (province), territory within state (province), or groups of territories within state (province). Similarly, if combining data from several companies, a company identifier may be an appropriate dummy variable. This dummy variable could absorb differences in underwriting standards and overall quality of business between the companies. Dummy variables could also absorb some historical effect which is not expected to continue in the future. Though dummy variables can be used in such a way, it is still preferable to have an experience period devoid of such disruptive effects.

GLM claims datasets are typically based either on a certain policy year period or on a certain calendar-accident year period. An example using the traditional parallelogram and rectangle diagrams illustrates the difference between the two.

[Figure: Dataset A, policy year (PY); Dataset B, calendar-accident year (CAY).]

Policy year: annual policies written during the policy-year period, with exposure earned to a cut-off date. Claims incurred on these policies before that cut-off date are included, with losses evaluated at a later date.

Calendar-accident year: exposure earned during the calendar-accident-year period, in respect of the annual policies that could be earning in that period. Claims occurring on that earned exposure are included, with incurred losses evaluated at a later date.

There are benefits and disadvantages to each method of organization. The policy year approach has the advantage of relating to a certain period of underwriting and method of selling a product. The earning pattern of any given policy year, however, extends beyond the 12-month period. For policies to be fully earned, the cut-off date for exposures needs to extend 12 months beyond the end of the policy year for annual policies, or six months for semi-annual policies. In addition, the need for some IBNR emergence builds in more delay, so the data analyzed is not very recent.

The calendar-accident method of organization requires that each policy be split into its calendar year components. For example, an annual policy written on 1 May will be split into two records: May through December, and January through April. This adds to system requirements and increases the number of records in the dataset. In return, it allows the creation of an accurate calendar year "dummy" explanatory variable, which can be used to absorb trends in claim experience that purely relate to time. If this is not possible, the policy year method of organization can be used, but the effect of any trends can be more difficult to identify.
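Splitting a policy into calendar-year pieces is a short loop over year boundaries. This sketch treats end dates as exclusive and measures exposure as days/365; both are assumptions for illustration. The example is an annual policy written on 1 May.

```python
from datetime import date


def split_by_calendar_year(start: date, end: date):
    """Split the exposure [start, end) at each 1 January.

    Returns (calendar_year, piece_start, piece_end, exposure_years) tuples.
    """
    pieces = []
    current = start
    while current < end:
        piece_end = min(date(current.year + 1, 1, 1), end)
        pieces.append((current.year, current, piece_end,
                       (piece_end - current).days / 365.0))
        current = piece_end
    return pieces


for piece in split_by_calendar_year(date(2005, 5, 1), date(2006, 5, 1)):
    print(piece)
```

Each piece carries its calendar year, which can then serve directly as the calendar year dummy variable described above.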

Claims information

Claim count and loss amount information should be attached to the relevant exposure records, based on the most recent reserve estimates. The choice of definition of incurred claim count, specifically whether this pertains to number of claims or number of claimants, is not particularly important if the claim frequency and claim severity will ultimately be combined to the pure premium level. It is generally easier to model loss information net of deductibles. Losses should ideally not be truncated according to any large loss threshold at this stage, since this allows sensitivity testing of several different large loss thresholds when modeling.

It is appropriate to leave some delay between the end of the experience period and the valuation date. This allows some IBNR claims to emerge and the case estimates to develop. There may be a regular annual or quarterly review of case estimates, or some other known issue surrounding the reserves. If so, the experience period and valuation date should be selected to take advantage of the most accurate information.

The overall base level adjustment for pure IBNR and development of known claims will be made after models are finalized. It is nevertheless necessary to consider whether such time-related influences could bias the model rating factor relativities. There is a range of options for investigating the consequences of claims development upon the relativities measured, including:
- ignoring loss development and assuming that parameter estimates are unaffected;
- including a dummy variable (e.g. calendar year or policy year) in the model to absorb time-related influences. Once models are finalized, the dummy variable is simply removed and the base levels are adjusted via a separate calculation. This assumes the development of claims is similar for all types of policy;
- before modeling the most recent experience, performing a series of GLM analyses on an older dataset which contains claims statistics as at various periods of development.
Comparing GLM relativities based on data as at different development periods shows whether claims development differs materially by type of risk. If it does, the ratio of two models as at different development periods can be used to derive multivariate development factors. These factors can then be applied to analyses based on a more recent dataset.

(Note: dummy variables based on quarters or months may contain an element of seasonality.)

It is also necessary to consider the treatment of claims closed without payment (also known as CWPs). Before modeling, it is generally most appropriate to remove such claims by setting the claim count field to zero in these cases. If CWPs are to be modeled for expense allocation purposes, a new claim type consisting of only CWPs could also be created. If CWPs are not excluded, it can become difficult to model average claim amounts. Some common GLM forms (e.g. those with gamma error functions) cannot be fitted to data containing observations equal to zero.

Generally, one period of policy exposure will have zero or one claim associated with it. Occasionally, there may be two or more accidents occurring in a given period of exposure. There are a number of alternative ways to deal with this situation:
- Multiple claims could be attached to the single exposure record, with the number of such claims and the total amount of such claims being recorded. This is the simplest method. A small amount of information is lost as a result of storing information like this, but such a loss is not generally material.
- Further records could be created in the database in the case of multiple claims. The exposure end date of the original record could be set to the date of occurrence of the first accident, with the exposure start date of the second record being the day after. Each claim could then be attached to one exposure record, and the "number of claims" fields would always be zero or one. All rating factors recorded in the second record would be identical to the original record.
- Further records could be created in the database as in the second option above, but with the exposure dates in the original record remaining unaltered. The exposure start and end dates in the second and subsequent copied records would be equal to each other, so that the additional records had zero days of exposure recorded.
When analyzing claim amounts, the exposure information is not required. When analyzing claim frequencies, the experience could be summarized by unique combination of rating factor levels using an appropriate extract of the data, thus compressing this data to derive the correct exposure.

In practice, the easiest way to program the last two of these three methods produces one extra record for every claim. Policies with one claim would produce two records, and policies with two claims would produce three records. For example, using the second method, the exposure would be split at every claim date, so that there would always be one record with no claims (the last record).
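The second method can be sketched as follows. Dates are inclusive here, matching the wording above ("the exposure start date of the second record being the day after"). The function name and example dates are hypothetical. Each claim closes the current record on its occurrence date, so a policy with k claims yields k + 1 records and the last record carries no claim.

```python
from datetime import date, timedelta


def split_at_claims(start: date, end: date, claim_dates):
    """Split an exposure period (inclusive dates) at each claim date.

    Returns (record_start, record_end, n_claims, exposure_days) tuples, where
    every record holds zero or one claim and the final record holds none.
    """
    records = []
    current = start
    for claim in sorted(claim_dates):
        if not start <= claim <= end:
            raise ValueError("claim date outside the exposure period")
        records.append((current, claim, 1, (claim - current).days + 1))
        current = claim + timedelta(days=1)
    # Remaining exposure after the last claim (zero days if the claim fell on
    # the final day; the record is still kept, as the text describes).
    records.append((current, end, 0, max(0, (end - current).days + 1)))
    return records


for rec in split_at_claims(date(2005, 1, 1), date(2005, 12, 31),
                           [date(2005, 8, 20), date(2005, 3, 10)]):
    print(rec)
```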

General

In addition to volume requirements, how the model is to be used should also be considered. Suppose the model were to be used to identify inaccuracies in the current rating plan. A line of business which undergoes significant rate intervention at point of sale would then not be appropriate, unless the model is being used to guide underwriters on the acceptable range of their intervention. Similarly, if little is collected or stored in the way of explanatory variables, this too would limit the strength of the GLM.

H Automated approach for factor categorization

One automated approach within the GLM framework is to replace a single factor with many levels by a series of factors, each containing just two levels, which are then tested for significance. For example, instead of modeling age of insured with a single factor, a series of binary factors could be created:
- binary factor 1: is the age less than 18?
- binary factor 2: is the age less than 19?
- binary factor 3: is the age less than 20?
- binary factor 4: is the age less than 21?
- and so on, with one age taken as the base level in this example.

These single-parameter binary factors could then be tested for significance using an automatic stepwise algorithm, as discussed earlier. Suppose, for example, that ages 23, 24, 25 and 26 did not have a statistically different effect on the risk. The factors "is age less than 24", "is age less than 25" and "is age less than 26" would then be deemed insignificant and excluded from the model. Those binary factors deemed significant in the model would determine the appropriate age categorization. Implied parameter estimates for each age could then be determined by summing the appropriate binary factors: in the above example, the implied parameter estimate for a given age would be the sum of the parameters for all the binary factors whose condition that age satisfies.

An example result, based on real data, is shown below. The dotted line shows the fitted parameter estimates when age is not grouped, with a parameter allocated to each individual age (rounded to the nearest integer). The solid line shows the parameter estimates implied by the results of the automatic grouping approach described above. Only results for the younger ages are shown, for reasons of confidentiality.
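The binary-factor coding and the summing rule can be sketched as below. The thresholds and parameter values are hypothetical and not fitted to any data. Ages at or above the highest threshold form the base level. An "excluded" factor is represented by a parameter of zero, which merges the two adjacent age groups.

```python
import numpy as np


def threshold_indicators(ages, thresholds):
    """Design columns for the binary factors 'is the age less than t?'."""
    ages = np.asarray(ages, dtype=float)
    return np.column_stack([(ages < t).astype(float) for t in thresholds])


def implied_parameter(age, thresholds, params):
    """Sum the parameters of every binary factor whose condition holds."""
    return sum(b for t, b in zip(thresholds, params) if age < t)


thresholds = [18, 19, 20, 21, 22]       # hypothetical cut points
params = [0.40, 0.25, 0.0, 0.15, 0.10]  # "age < 20" deemed insignificant
print(threshold_indicators([17, 19, 25], thresholds))
for age in range(17, 23):
    print(age, round(implied_parameter(age, thresholds, params), 2))
```

With the "age < 20" factor excluded, ages 19 and 20 receive the same implied estimate, which is exactly the grouping the stepwise test produces.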

[Figure: Example of automatic grouping (partial result only; older ages, including the base level, not shown). Log of multiplier and exposure years by age of driver; series: manual mapping, 95% confidence interval for manual mapping, automatic mapping.]

In this case it is not at all clear that the automatic approach produces a better categorization than a manual approach. For example, the dotted line shows a particular age whose parameter estimate lies between those of a younger age and age 24. Intuitively, it appears wrong to group this level with ages 25 to 26, as the automatic process suggests. A manual approach to categorization can often produce more appropriate results than an automated approach.

I Cramer's V

Cramer's V statistic is a measure of correlation between two categorical factors, and is defined as

$$V = \sqrt{\frac{\sum_{i,j} (n_{ij} - e_{ij})^2 / e_{ij}}{n \cdot \left(\min(a, b) - 1\right)}}$$

where:
- $a$ = number of levels of factor one
- $b$ = number of levels of factor two
- $n_{ij}$ = amount of the exposure measure for the $i$th level of factor one and $j$th level of factor two
- $n = \sum_{i,j} n_{ij}$
- $e_{ij} = \left(\sum_j n_{ij}\right)\left(\sum_i n_{ij}\right) / n$

The statistic takes values between 0 and 1. A value of 0 means that knowledge of one of the two factors gives no knowledge of the value of the other. A value of 1 means that knowledge of one of the factors allows the value of the other factor to be deduced.

The two tables below show possible two-way exposure distributions of two categorical factors, each with only two levels, A and B, expressed as either rows or columns. The top table shows a Cramer's V statistic of 0, and the bottom table gives an example of a Cramer's V of 1.

[Tables: two-way exposure distributions with factor one (levels A, B) as rows and factor two (levels A, B) as columns.]
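The definition translates directly into a few lines of numpy. The sketch assumes every row and column of the exposure table has a positive total; otherwise some $e_{ij}$ would be zero. The two example tables are made up to reproduce the extreme values 0 and 1.

```python
import numpy as np


def cramers_v(exposure_table) -> float:
    """Cramer's V for a two-way table of exposure n_ij."""
    n_ij = np.asarray(exposure_table, dtype=float)
    n = n_ij.sum()
    # e_ij = (row total of i) * (column total of j) / n
    e_ij = np.outer(n_ij.sum(axis=1), n_ij.sum(axis=0)) / n
    chi_square = np.sum((n_ij - e_ij) ** 2 / e_ij)
    a, b = n_ij.shape
    return float(np.sqrt(chi_square / (n * (min(a, b) - 1))))


independent = [[100, 200], [300, 600]]  # rows proportional: V = 0
deducible = [[500, 0], [0, 700]]        # one level determines the other: V = 1
print(cramers_v(independent), cramers_v(deducible))
```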

J Benefits of modeling frequency and severity separately rather than using Tweedie GLMs

Tweedie GLMs fitted to pure premium directly can often give very similar results to the "traditional" approach, which combines models fitted to claim frequencies and claim severities separately. In these cases, using Tweedie GLMs can reduce the amount of iterative modeling work required to produce satisfactory claims models. The traditional approach, however, can provide a better understanding of the way in which factors affect the cost of claims. It can also more easily allow the identification and removal of certain random effects from one element of the experience. This can be done via smoothing, or by excluding certain factors from one of the frequency or amounts models.

For example, the graph below compares the risk premium results from the Tweedie model to those from the traditional approach for one rating factor. The results from the two approaches are nearly identical. The traditional approach, however, provides additional information about the underlying frequency (numbers) and severity (amounts) effects. In this case, the factor affects frequencies and severities in completely opposite ways.

[Figure: Comparison of Tweedie model with traditional frequency / severity approach, by occupation category. Log of multiplier and exposure years; series: Tweedie, Tweedie standard-error bands, risk premium (RP), RP standard-error bands, numbers, amounts.]

The three graphs below demonstrate a case where the results for a particular factor from a Tweedie GLM differ from those produced by the traditional approach. The first two graphs show the underlying frequency and severity model output from the traditional approach. The factor has been removed from the severity model because of its wide standard errors, meaningless pattern and insignificant type III test. Consequently, the traditional risk premium reflects the underlying frequency experience only. The Tweedie model is more affected by the volatility from the underlying severity experience, and produces results which may be less appropriate.

[Figure: Comparison of Tweedie model with traditional frequency / severity approach: traditional frequency and severity models. First panel (frequency): log of multiplier and exposure years by age of driver, showing one-way relativities, approximate 95% confidence interval and parameter estimate. Second panel (severity, EXCLUDED FACTOR): log of multiplier and number of claims by age of driver, showing one-way relativities, approximate 95% confidence interval and smoothed estimate.]

[Figure: Comparison of Tweedie model with traditional frequency / severity approach. Log of multiplier and exposure years by age of driver; series: Tweedie, Tweedie standard-error bands, risk premium (RP), RP standard-error bands.]