# GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

## Transcription

Tab. I: Commonly used link functions

| Distribution | Link function | $g(\mu)$ |
|---|---|---|
| normal | identity | $\mu$ |
| Poisson | log | $\ln(\mu)$ |
| binomial | logit | $\ln\left(\frac{\mu}{1-\mu}\right)$ |
| binomial | cloglog | $\ln(-\ln(1-\mu))$ |
| exponential | log | $\ln(\mu)$ |

Definition 3. Let $Y$ be a random variable with mean denoted by $\mu$ and p.d.f. from the exponential family. Then the generalized linear model (GLM) is given by $g(\mu) = x^{\top}\beta$, where $g(\mu)$ is the link function.

Generalized linear models provide a relatively simple and robust way to analyze the effect of many different factors on an observed event. GLMs are used to price insurance policies on the basis of the number of claims. In a GLM, the number of claims is assumed to be a dependent variable that follows a Poisson distribution and depends on known predictors. The predictors characterize the insured individual or vehicle, e.g. gender, age, engine capacity.

#### 2.2 Methods of Comparing Different Models

Determining an appropriate model is the basis of regression modeling. One important principle of regression modeling is the principle of simplicity: a simpler model that describes the data well takes priority over a more complex model that describes the data almost perfectly.

##### Analysis of Deviance

Along with the basic generalized linear model, we also take into account the following partial models, which are called submodels.

Definition 4. The full model, denoted as $\mathrm{GLM}_{\max}$, satisfies the following conditions:

- it has the same distribution as the proposed model,
- it has the same link function as the proposed model,
- the number of parameters is equal to the number of observations of the response variable.

The full model therefore reproduces the observed responses exactly, i.e. with residuals equal to zero.

Definition 5. The null model, denoted as $\mathrm{GLM}_{\min}$, satisfies the following conditions:

- it has the same distribution as the proposed model,
- it has the same link function as the proposed model,
- the number of parameters is equal to one.
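To make the two extreme models concrete, the following minimal Python sketch evaluates the Poisson log-likelihood of a full model (fitted means equal to the observations) and of a null model (one common mean). The paper's own computations are done in R; the helper name `poisson_loglik` and the claim counts in `y` are hypothetical and chosen only for illustration.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # Convention 0*ln(0) = 0, needed in the full model when yi == 0.
        term = yi * math.log(mi) if yi > 0 else 0.0
        total += term - mi - math.lgamma(yi + 1)
    return total

# Hypothetical claim counts for six groups (not the paper's data).
y = [0, 1, 0, 2, 0, 1]

# Full model: one parameter per observation, so fitted means equal the data.
l_max = poisson_loglik(y, y)

# Null model: a single parameter, so every group gets the overall mean.
mean_y = sum(y) / len(y)
l_min = poisson_loglik(y, [mean_y] * len(y))

print(f"l_max = {l_max:.4f}, l_min = {l_min:.4f}")
```

The log-likelihood of any proposed model with the same distribution and link function should fall between `l_min` and `l_max`.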
The full model gives the best possible fit and the null model the worst fit for the given distribution and link function. The proposed model lies somewhere between these two extremes, and its relevance is evaluated by comparison with them. However, such a comparison is permitted only between a model and its submodel.

Definition 6. Consider a GLM with design matrix $X_{n \times m}$ and vector of parameters $\beta_m$. Its submodel, denoted as $\mathrm{GLM}_{\mathrm{sub}}$, with design matrix $Q_{n \times q}$ and vector of parameters $\beta_q$ satisfies the following conditions:

- it has the same distribution as the proposed GLM,
- it has the same link function as the proposed GLM,
- the number of parameters is $q < m$ and the columns of the design matrix $Q_{n \times q}$ are linear combinations of the columns of the design matrix $X_{n \times m}$.

The deviance is defined as a measure of distance between the full model and the proposed submodel.

Definition 7. The deviance, denoted as $\mathrm{dev}$, is given by

$$\mathrm{dev} = 2\,(l_{\max} - l),$$

where $l_{\max}$ is the logarithm of the likelihood function of the full model and $l$ is the logarithm of the likelihood function of the proposed submodel.

The deviances of the models are used for their comparison as described in the next theorem.

Theorem 8. Consider a GLM with vector of parameters $\beta_m$ and its submodel $\mathrm{GLM}_{\mathrm{sub}}$ with $\beta_q$, where $q < m < n$. If the submodel $\mathrm{GLM}_{\mathrm{sub}}$ is suitable, then the difference of deviances $\Delta\mathrm{dev} = \mathrm{dev}_{\mathrm{sub}} - \mathrm{dev}$ asymptotically follows a $\chi^2$ distribution with $(m - q)$ degrees of freedom.

For more details see Kaas (2009). Hence, for $\Delta\mathrm{dev} > \chi^2_{1-\alpha}(m - q)$, where $\alpha$ is the significance level, we reject the assumption that the submodel is suitable.

##### Information Criteria

There are no perfect models. The idea is to find the model that best approximates reality, that is, the model that minimizes the loss of information. Information criteria quantify how much information is lost when a model is used to describe reality. They balance the accuracy of the fit against the complexity of the model.
Examples of information criteria for selecting a parsimonious model are the Akaike information criterion (AIC), the Schwarz-Bayesian information criterion (BIC), the Hannan-Quinn information criterion and the deviance information criterion (DIC). In our case study, the Akaike information criterion is used to compare the models.
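As a minimal illustration of how such a criterion trades fit against complexity, the Python sketch below computes the standard AIC, $-2l + 2k$, for two candidate models and picks the one with the lower value. The log-likelihoods and parameter counts are hypothetical; they are not results from the paper, whose computations are done in R.

```python
def aic(loglik, k):
    """Akaike information criterion: AIC = -2*l + 2*k."""
    return -2.0 * loglik + 2.0 * k

# Hypothetical (log-likelihood, number of parameters) pairs.
candidates = {
    "smaller model": (-520.0, 9),
    "larger model": (-512.0, 14),
}

scores = {name: aic(l, k) for name, (l, k) in candidates.items()}

# The preferred model is the one with the lowest AIC.
best = min(scores, key=scores.get)
print(scores, "->", best)
```

In this made-up example the larger model is preferred because its gain in log-likelihood outweighs the penalty for five extra parameters.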

Definition 9. The Akaike information criterion (AIC) is given by

$$\mathrm{AIC} = -2l + 2k,$$

where $k$ is the number of model parameters and $l$ is the logarithm of the likelihood function of the proposed model. The preferred model is the one with the lowest AIC.

### 3 RESULTS AND DISCUSSION

Every person applying for a vehicle insurance policy is assigned to a class that is homogeneous in terms of risk. One of the criteria used for assigning an individual to a certain class is the number of claims. Modeling the number of claims in a given insurance portfolio is therefore a very important task for insurance companies. Our aim is to model the dependence of the annual claim frequency on given risk factors.

A data set from vehicle insurance is processed. The data for our case study can be found in (Heller and Jong, 2008). The data set is based on one-year vehicle insurance policies recorded in 2004 or […]. There are […] policies and […] of them (6.82%) have at least one claim. The total amount of claims is […]. The histogram of annual claim frequency is strongly right-skewed (Fig. 1). GLMs are suitable for the analysis of non-normal data such as insurance data. The necessary procedures are implemented in the R software environment. All variables in the data set are given in Tab. II.

Fig. 1: Histogram of Annual Claim Frequency (horizontal axis: Annual Claim Frequency (ACF); vertical axis: Frequency of ACF)

Tab. II: Variables in the data set

| Notation | Name of variable | Range |
|---|---|---|
| expo | Exposure | 0 to 1 |
| clm | Claim occurrence | 0 (no), 1 (yes) |
| numclaims | Number of claims | 0, 1, 2, … |
| veh_body | Vehicle body type | hatchback, sedan, station wagon |
| veh_age | Vehicle age | 1 (new), 2, 3, 4 |
| area | Area of residence | A, B, C, D, E, F |
| gender | Gender | male, female |
| agecat | Age band of policyholder | 1 (youngest), 2, 3, 4, 5, 6 |

The drivers can be divided into groups on the basis of the risk factors (gender, age category, area, vehicle body, vehicle age). From these five risk factors and their values we get 3 · 4 · 6 · 2 · 6 = 864 groups. For each group, the total exposure during the year (expo) and the total number of claims (numclaims) are known. We model the average number of claims per contract (numclaims/expo) and try to find a well-fitting GLM for the claim frequency in terms of the risk factors.

For the number of claims per contract, it is reasonable to assume a Poisson distribution. Our first GLM is a model from the Poisson family with log link, whose parameter estimates are given in Tab. III. The coefficients are expressed relative to the standard class (veh_body1, veh_age1, area1, gender1, agecat1), for which they are taken to be zero.

Tab. III: The estimate of parameters. Coefficients: (Intercept), veh_body2, veh_body3, veh_age2, veh_age3, veh_age4, area2, area3, area4, area5, area6, gender, agecat2, agecat3, agecat4, agecat5, agecat6

According to the estimated parameters, the best group is the one with veh_body1, veh_age4, area4, gender2 and agecat6. Because of the log link, the corresponding average number of claims is the exponential of the sum of the intercept and the coefficients of this group, which equals […]. That means one claim per 10.3 years on average.

#### 3.1 Comparison of the Models

In this subsection, models with different risk factors are compared. In Tab. IV we test the null hypothesis that adding a risk factor to the preceding model has no effect. The deviance (dev) is used to assess the suitability of the proposed submodel. We assume that the difference in deviance ($\Delta\mathrm{dev}$) between the preceding model and the proposed model has a $\chi^2$ distribution with $\Delta\mathrm{df}$ degrees of freedom. This follows from Theorem 8, where the preceding model is a submodel of the proposed model.
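The comparison of a model with its submodel can be sketched in Python as follows (the paper itself uses R). The sketch relies only on the standard library, so the $\chi^2$ distribution function is computed from the series expansion of the regularized lower incomplete gamma function and the critical value is found by bisection. The function names `chi2_cdf`, `chi2_quantile` and `submodel_rejected`, as well as the deviances 900.0 and 890.0, are hypothetical and are not taken from the paper.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower
    incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(p, df):
    """Quantile of the chi-square distribution found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def submodel_rejected(dev_sub, dev, m, q, alpha=0.05):
    """Reject the submodel if dev_sub - dev exceeds the (1 - alpha)
    quantile of chi-square with m - q degrees of freedom."""
    delta_dev = dev_sub - dev
    critical = chi2_quantile(1.0 - alpha, m - q)
    return delta_dev > critical, delta_dev, critical

# Hypothetical deviances: submodel with q = 9 and model with m = 14 parameters.
rejected, delta_dev, critical = submodel_rejected(900.0, 890.0, m=14, q=9)
print(f"delta_dev = {delta_dev:.2f}, critical = {critical:.4f}, rejected = {rejected}")
```

With these made-up numbers the submodel is not rejected at the 0.05 level, while calling `submodel_rejected(900.0, 890.0, m=14, q=9, alpha=0.10)` does reject it, which shows how a test statistic close to the critical value can lead to different decisions at different significance levels.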
According to the analysis of deviance, the best model is 1+agecat+veh_age. However, we choose the model 1+agecat+veh_age+area, although $\Delta\mathrm{dev}$ = […] < $\chi^2(5)$ = […]. The test statistic is close to the critical value of the $\chi^2$ distribution. The inclusion of the factor area can be supported by the AIC. Although the AIC penalizes the number of parameters, the selected model has a smaller AIC than its submodel: for the model 1+agecat+veh_age, AIC = […], and for the model 1+agecat+veh_age+area, AIC = […]. Hence, according to the AIC, the model is improved. Furthermore, based on expert judgment, the effect of the factor area is not negligible.

Tab. IV: The table of analysis of deviance. Columns: Model specification, df, dev, Δdev, Δdf. Models: veh_body, veh_age, area, gender, agecat, agecat+veh_age, agecat+veh_age+area

### 4 CONCLUSION

Because real data from vehicle insurance are not normally distributed, we cannot use the standard linear regression model. This paper proposes an estimate of annual claim frequency in vehicle insurance based on generalized linear models. Using vehicle insurance data, it shows how GLMs can be used to explain the dependence of annual claim frequency on given risk factors.

The case study results confirm the importance of three factors: age group of policyholder (agecat), vehicle age (veh_age) and area of residence (area). This particular case study shows that gender and vehicle body type (veh_body) are relatively unimportant for annual claim frequency. When the model was being created, we also had in mind the principle of simplicity. Therefore, we used analysis of deviance to compare the relevance of the submodels. Our proposed model is quite simple, which is important for its use in practice.

### SUMMARY

The aim of this paper is to estimate the annual frequency of claims (AFC) from which the premium in vehicle insurance is derived. The AFC is assumed to depend on many risk factors. We take into account five factors: vehicle body type, vehicle age, area of residence, gender of policyholder and age band of policyholder. Generalized linear models (GLMs) are used for the estimation of AFC in this paper. This approach is compared with commonly used linear regression and the advantages of GLMs are shown.

In the initial section, the framework of GLMs is introduced, including the definition of the link function and the specification of the exponential family of probability density functions. Then two methods of model comparison, analysis of deviance and minimization of the information loss, are presented. The main part of the paper consists of a case study in which GLMs are applied in vehicle insurance. We process a data set based on one-year vehicle insurance policies. The drivers are divided into groups on the basis of the risk factors. For each group, we model the average number of claims per contract. The aim is to find a well-fitting GLM for the claim frequency in terms of the risk factors. The Poisson distribution is assumed for the number of claims per contract and the log-link function is used. Several different models containing various risk factors are considered. Analysis of deviance, based on a comparison of the goodness of fit, is used to select the best model.
According to the analysis of deviance, the suitable model has two risk factors (age group of policyholder and vehicle age). Nevertheless, the more complex model with one additional factor (area of residence) is adopted, because the effect of the factor area is not negligible in practice. Although the factor area is not significant at a significance level of 0.05, it is significant at a significance level of 0.1. The choice of the model with the factor area is substantiated by the Akaike information criterion (AIC), which is based on minimizing the information loss. The approach introduced in this paper proves to be a useful way to bring the methodology of generalized linear models into actuarial science.

### REFERENCES

- ANTONIO, K., BEIRLANT, J., 2007: Actuarial statistics with generalized linear mixed models. Insurance: Mathematics and Economics, 40, 1.
- CERCHIARA, R., EDWARDS, M., GAMBINI, A., 2008: Generalized linear models in life insurance: decrements and risk factor analysis under Solvency II. In: 18th International AFIR Colloquium. Available online: cfm?lang=en&dsp=afir&act=colloquia.
- DENUIT, M. et al., 2007: Actuarial modelling of claim counts: risk classification, credibility and bonus-malus systems. Hoboken: Wiley, 384 p.
- DOBSON, A. J., 2002: An Introduction to Generalized Linear Models. Boca Raton: CRC Press, 225 p.
- GSCHLÖSSL, S., SCHOENMAEKERS, P., DENUIT, M., 2011: Risk classification in life insurance: methodology and case study. European Actuarial Journal, 1, 1.
- HABERMAN, S., RENSHAW, A. E., 1996: Generalized linear models and actuarial science. The Statistician, 45, 4.
- HARDIN, J. W., HILBE, J., 2007: Generalized linear models and extensions. Texas: Stata Press, 387 p.
- HELLER, G. Z., JONG, P., 2008: Generalized Linear Models for Insurance Data. New York: Cambridge University Press, 204 p.
- KAAS, R., 2009: Modern Actuarial Risk Theory: Using R. Heidelberg: Springer, 400 p.
- MCCULLAGH, P., NELDER, J. A., 1989: Generalized Linear Models. London: Chapman and Hall, 532 p.
- NELDER, J. A., WEDDERBURN, R. W. M., 1972: Generalized linear models. Journal of the Royal Statistical Society: Series A (General), 135, 3.
- OHLSSON, E., JOHANSSON, B., 2010: Non-life insurance pricing with generalized linear models. Berlin / Heidelberg: Springer, 174 p.
- WOLNY-DOMINIAK, A., 2012: Modeling of claim counts using data mining procedures in R CRAN. In: RAMÍK, J. and STAVÁREK, D. (eds.) Proceedings of 30th International Conference Mathematical Methods in Economics. Karviná: Silesian University, School of Business Administration.

Contact information: Silvie Kafková, Lenka Křivánková

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Underwriting risk control in non-life insurance via generalized linear models and stochastic programming

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming 1 Introduction Martin Branda 1 Abstract. We focus on rating of non-life insurance contracts. We

### The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

### Travelers Analytics: U of M Stats 8053 Insurance Modeling Problem

Travelers Analytics: U of M Stats 8053 Insurance Modeling Problem October 30 th, 2013 Nathan Hubbell, FCAS Shengde Liang, Ph.D. Agenda Travelers: Who Are We & How Do We Use Data? Insurance 101 Basic business

### Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Stochastic programming approaches to pricing in non-life insurance

Stochastic programming approaches to pricing in non-life insurance Martin Branda Charles University in Prague Department of Probability and Mathematical Statistics 11th International Conference on COMPUTATIONAL

### Nonnested model comparison of GLM and GAM count regression models for life insurance data

Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development

### Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

### Hierarchical Insurance Claims Modeling

Hierarchical Insurance Claims Modeling Edward W. (Jed) Frees, University of Wisconsin - Madison Emiliano A. Valdez, University of Connecticut 2009 Joint Statistical Meetings Session 587 - Thu 8/6/09-10:30

### Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination

Australian Journal of Basic and Applied Sciences, 5(7): 1190-1198, 2011 ISSN 1991-8178 Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination 1 Mohamed Amraja

### Probabilistic concepts of risk classification in insurance

Probabilistic concepts of risk classification in insurance Emiliano A. Valdez Michigan State University East Lansing, Michigan, USA joint work with Katrien Antonio* * K.U. Leuven 7th International Workshop

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

### Package insurancedata

Type Package Package insurancedata February 20, 2015 Title A Collection of Insurance Datasets Useful in Risk Classification in Non-life Insurance. Version 1.0 Date 2014-09-04 Author Alicja Wolny--Dominiak

### Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Overview Important issues Models treated Curriculum Duration (in lectures) What is driving the result of a

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Model Selection and Claim Frequency for Workers Compensation Insurance

Model Selection and Claim Frequency for Workers Compensation Insurance Jisheng Cui, David Pitt and Guoqi Qian Abstract We consider a set of workers compensation insurance claim data where the aggregate

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Generalized Linear Model Theory

Appendix B Generalized Linear Model Theory We describe the generalized linear model as formulated by Nelder and Wedderburn (1972), and discuss estimation of the parameters and tests of hypotheses. B.1

### A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### Predictive Modeling for Life Insurers

Predictive Modeling for Life Insurers Application of Predictive Modeling Techniques in Measuring Policyholder Behavior in Variable Annuity Contracts April 30, 2010 Guillaume Briere-Giroux, FSA, MAAA, CFA

### Factorial experimental designs and generalized linear models

Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

### A general statistical framework for assessing Granger causality

A general statistical framework for assessing Granger causality The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Logistic and Poisson Regression: Modeling Binary and Count Data. Statistics Workshop Mark Seiss, Dept. of Statistics

Logistic and Poisson Regression: Modeling Binary and Count Data Statistics Workshop Mark Seiss, Dept. of Statistics March 3, 2009 Presentation Outline 1. Introduction to Generalized Linear Models 2. Binary

### WHAT POLICY FEATURES DETERMINE LIFE INSURANCE LAPSE? AN ANALYSIS OF THE GERMAN MARKET

WHAT POLICY FEATURES DETERMINE LIFE INSURANCE LAPSE? AN ANALYSIS OF THE GERMAN MARKET MARTIN ELING DIETER KIESENBAUER WORKING PAPERS ON RISK MANAGEMENT AND INSURANCE NO. 95 EDITED BY HATO SCHMEISER CHAIR

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Lecture 6: Poisson regression

Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

### Generalized Linear Models. Today: definition of GLM, maximum likelihood estimation. Involves choice of a link function (systematic component)

Generalized Linear Models Last time: definition of exponential family, derivation of mean and variance (memorize) Today: definition of GLM, maximum likelihood estimation Include predictors x i through

### Generalised linear models for aggregate claims; to Tweedie or not?

Generalised linear models for aggregate claims; to Tweedie or not? Oscar Quijano and José Garrido Concordia University, Montreal, Canada (First version of May 20, 2013, this revision December 2, 2014)

### More Flexible GLMs Zero-Inflated Models and Hybrid Models

More Flexible GLMs Zero-Inflated Models and Hybrid Models Mathew Flynn, Ph.D. Louise A. Francis FCAS, MAAA Motivation: GLMs are widely used in insurance modeling applications. Claim or frequency models

### Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

### GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus

### A Deeper Look Inside Generalized Linear Models

A Deeper Look Inside Generalized Linear Models University of Minnesota February 3 rd, 2012 Nathan Hubbell, FCAS Agenda Property & Casualty (P&C Insurance) in one slide The Actuarial Profession Travelers

MODELING AUTO INSURANCE PREMIUMS Brittany Parahus, Siena College INTRODUCTION The findings in this paper will provide the reader with a basic knowledge and understanding of how Auto Insurance Companies

### Comparing return to work outcomes between vocational rehabilitation providers after adjusting for case mix using statistical models

Comparing return to work outcomes between vocational rehabilitation providers after adjusting for case mix using statistical models Prepared by Jim Gaetjens Presented to the Institute of Actuaries of Australia

### Statistical Analysis of Life Insurance Policy Termination and Survivorship

Statistical Analysis of Life Insurance Policy Termination and Survivorship Emiliano A. Valdez, PhD, FSA Michigan State University joint work with J. Vadiveloo and U. Dias Session ES82 (Statistics in Actuarial

### From the help desk: hurdle models

The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

### Computer exercise 4 Poisson Regression

Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

### Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

### A revisit of the hierarchical insurance claims modeling

A revisit of the hierarchical insurance claims modeling Emiliano A. Valdez Michigan State University joint work with E.W. Frees* * University of Wisconsin Madison Statistical Society of Canada (SSC) 2014

### Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

### BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING

BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully

### Poisson Regression or Regression of Counts (& Rates)

Poisson Regression or Regression of (& Rates) Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Generalized Linear Models Slide 1 of 51 Outline Outline

### Lecture 8: Gamma regression

Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

### EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA E. J. Dadey SSNIT, Research Department, Accra, Ghana S. Ankrah, PhD Student PGIA, University of Peradeniya, Sri Lanka Abstract This study investigates

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

### Modeling Individual Claims for Motor Third Party Liability of Insurance Companies in Albania

Modeling Individual Claims for Motor Third Party Liability of Insurance Companies in Albania Oriana Zacaj Department of Mathematics, Polytechnic University, Faculty of Mathematics and Physics Engineering

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

### Analyses on Hurricane Archival Data June 17, 2014

Analyses on Hurricane Archival Data June 17, 2014 This report provides detailed information about analyses of archival data in our PNAS article http://www.pnas.org/content/early/2014/05/29/1402786111.abstract

### 13. Poisson Regression Analysis

136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

### SUGI 29 Statistics and Data Analysis

Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,

### Normality Testing in Excel

Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

### The Do s & Don ts of Building A Predictive Model in Insurance. University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D.

The Do s & Don ts of Building A Predictive Model in Insurance University of Minnesota November 9 th, 2012 Nathan Hubbell, FCAS Katy Micek, Ph.D. Agenda Travelers Broad Overview Actuarial & Analytics Career

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Hierarchical Insurance Claims Modeling

Hierarchical Insurance Claims Modeling Edward W. Frees University of Wisconsin Emiliano A. Valdez University of New South Wales 02-February-2007 y Abstract This work describes statistical modeling of detailed,

### GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

### Automated Biosurveillance Data from England and Wales, 1991 2011

Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical

### EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA

EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA Andreas Christmann Department of Mathematics homepages.vub.ac.be/ achristm Talk: ULB, Sciences Actuarielles, 17/NOV/2006 Contents 1. Project: Motor vehicle

### Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

### Models for Count Data With Overdispersion

Models for Count Data With Overdispersion Germán Rodríguez November 6, 2013 Abstract This addendum to the WWS 509 notes covers extra-poisson variation and the negative binomial model, with brief appearances

### AUTOCORRELATED RESIDUALS OF ROBUST REGRESSION

AUTOCORRELATED RESIDUALS OF ROBUST REGRESSION Jan Kalina Abstract The work is devoted to the Durbin-Watson test for robust linear regression methods. First we explain consequences of the autocorrelation

### Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory

Modeling the Number of Insured Households in an Insurance Portfolio Using Queuing Theory Jean-Philippe Boucher and Guillaume Couture-Piché December 8, 2015 Quantact / Département de mathématiques, UQAM.

### A Model of Optimum Tariff in Vehicle Fleet Insurance

A Model of Optimum Tariff in Vehicle Fleet Insurance. Bouhetala and F.Belhia and R.Salmi Statistics and Probability Department Bp, 3, El-Alia, USTHB, Bab-Ezzouar, Alger Algeria. Summary: An approach about

### Institute of Actuaries of Australia. Disability Claims - Does Anyone Recover?

Institute of Actuaries of Australia Disability Claims - Does Anyone Recover? David Service FIA, ASA, FIAA David Pitt BEc, BSc, FIAA 2002 The Institute of Actuaries of Australia This paper has been prepared

### Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

### Profile Likelihood Confidence Intervals for GLM's

Profile Likelihood Confidence Intervals for GLM's The standard procedure for computing a confidence interval (CI) for a parameter in a generalized linear model is by the formula: estimate ± percentile

### 171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Longitudinal Meta-analysis

Quality & Quantity 38: 381–389, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands. Longitudinal Meta-analysis CORA J. M. MAAS, JOOP J. HOX and GERTY J. L. M. LENSVELT-MULDERS Department

### Standard errors of marginal effects in the heteroskedastic probit model

Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949-9962 Abstract In non-linear regression models, such as the heteroskedastic

### Advanced Statistical Analysis of Mortality. Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc. 160 University Avenue. Westwood, MA 02090

Advanced Statistical Analysis of Mortality Rhodes, Thomas E. and Freitas, Stephen A. MIB, Inc 160 University Avenue Westwood, MA 02090 001-(781)-751-6356 fax 001-(781)-329-3379 trhodes@mib.com Abstract

### 15.1 The Structure of Generalized Linear Models

15 Generalized Linear Models Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described

### Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Heather Turner ESRC National Centre for Research Methods, UK and Department of Statistics University of Warwick, UK WU, 2008 04 22-24 Copyright © Heather Turner, 2008 to Generalized

### Perform hypothesis testing

Multivariate hypothesis tests for fixed effects Testing homogeneity of level-1 variances In the following sections, we use the model displayed in the figure below to illustrate the hypothesis tests. Partial

### Optimization approaches to multiplicative tariff of rates estimation in non-life insurance

Optimization approaches to multiplicative tariff of rates estimation in non-life insurance Kooperativa pojišťovna, a.s., Vienna Insurance Group & Charles University in Prague ASTIN Colloquium in The Hague

### Lecture 13: Introduction to generalized linear models

Lecture 13: Introduction to generalized linear models 21 November 2007 1 Introduction Recall that we've looked at linear models, which specify a conditional probability density P(Y | X) of the form Y = α

### How Do We Test Multiple Regression Coefficients?

How Do We Test Multiple Regression Coefficients? Suppose you have constructed a multiple linear regression model and you have a specific hypothesis to test which involves more than one regression coefficient.

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER PROFESSOR IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

### Risk pricing for Australian Motor Insurance

Risk pricing for Australian Motor Insurance Dr Richard Brookes November 2012 Contents 1. Background Scope How many models? 2. Approach Data Variable filtering GLM Interactions Credibility overlay 3. Model

### Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

### One-year reserve risk including a tail factor : closed formula and bootstrap approaches

One-year reserve risk including a tail factor : closed formula and bootstrap approaches Alexandre Boumezoued R&D Consultant Milliman Paris alexandre.boumezoued@milliman.com Yoboua Angoua Non-Life Consultant

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### FAIR TRADE IN INSURANCE INDUSTRY: PREMIUM DETERMINATION OF TAIWAN AUTOMOBILE INSURANCE. Emilio Venezian Venezian Associate, Taiwan

FAIR TRADE IN INSURANCE INDUSTRY: PREMIUM DETERMINATION OF TAIWAN AUTOMOBILE INSURANCE Emilio Venezian Venezian Associate, Taiwan Chu-Shiu Li Department of Economics, Feng Chia University, Taiwan 100 Wen

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models.

Cross Validation techniques in R: A brief overview of some methods, packages, and functions for assessing prediction models. Dr. Jon Starkweather, Research and Statistical Support consultant This month