Lecture 6: Poisson regression


Claudia Czado, TU München
© (Claudia Czado, TU Munich) ZFS/IMS Göttingen

Overview
- Introduction
- EDA for Poisson regression
- Estimation and testing in Poisson regression
- Residuals in Poisson regression

Introduction
We want regression models for count response data. See Cameron and Trivedi (1998) and Winkelmann (1997) for books on regression models for count data.
Example: Canadian car insurance
- $Y_i$ = number of claims within group $i$ of insured with $t_i$ insurance years
- $x_i$ = covariates of group $i$

Poisson regression
Model: $Y_i \sim \mathrm{Poisson}(t_i \mu(x_i, \beta))$, $i = 1, \ldots, n$, independent, i.e.
$$P(Y_i = y_i) = \frac{e^{-t_i \mu(x_i,\beta)} \, (t_i \mu(x_i,\beta))^{y_i}}{y_i!}, \qquad \mu(x_i, \beta) := e^{x_i^t \beta} > 0,$$
$$E(Y_i) = t_i \mu(x_i, \beta) = Var(Y_i).$$
$t_i$ gives the length of time in which events occur; $t_i$ is known.
Likelihood:
$$l(y, \beta) = \prod_{i=1}^n P(Y_i = y_i) = e^{-\sum_{i=1}^n t_i \mu(x_i,\beta)} \prod_{i=1}^n \frac{(t_i \mu(x_i, \beta))^{y_i}}{y_i!}$$
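As a rough illustration of the likelihood above, the following Python sketch evaluates the log-likelihood $\ln l(y, \beta)$ for a given $\beta$. It is written in Python rather than S-Plus so that it can be run on its own; the function name `poisson_loglik` and the toy numbers are assumptions made for this example and are not part of the lecture or of the Canadian data.

```python
import math

def poisson_loglik(beta, X, y, t):
    """Log-likelihood of Y_i ~ Poisson(t_i * exp(x_i' beta)), including -ln(y_i!)."""
    total = 0.0
    for x_i, y_i, t_i in zip(X, y, t):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor x_i' beta
        mean = t_i * math.exp(eta)                   # E(Y_i) = t_i * mu(x_i, beta)
        # ln P(Y_i = y_i) = -mean + y_i * ln(mean) - ln(y_i!)
        total += -mean + y_i * math.log(mean) - math.lgamma(y_i + 1)
    return total

# Hypothetical toy data (not from the lecture): intercept plus one covariate,
# with exposures t_i playing the role of insurance years.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [4, 7, 15]
t = [10.0, 12.0, 20.0]

loglik = poisson_loglik([-1.0, 0.3], X, y, t)
print(loglik)
```

Working on the log scale avoids overflow in $y_i!$ and turns the product over observations into a sum.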

EDA for Poisson regression
For the Poisson model $Y_i \sim \mathrm{Poisson}(t_i e^{x_i^t \beta})$ ind., with $\mu_i := E(Y_i)$,
$$\ln(\mu_i) = \ln(t_i) + x_i^t \beta \qquad (*)$$
If we only have a single obs. for each $x_i$, we can estimate $(*)$ by
$$\ln(y_i) \approx \ln(t_i) + x_i^t \beta.$$
Therefore $\ln(y_i) - \ln(t_i)$ needs to be linear in $x_i$ if the Poisson model is valid. If $Y_i = 0$ holds, we need to consider $\ln(y_i + c)$ for small $c$. As for binary regression, we can investigate main and interaction effects.

If there are several obs. denoted by $Y_{i1}, \ldots, Y_{in_i}$ with $t_{i1}, \ldots, t_{in_i}$ for each design vector $x_i$, we can plot $x_i$ versus
$$\ln\left(\frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}\right) - \ln\left(\frac{1}{n_i} \sum_{j=1}^{n_i} t_{ij}\right).$$
In this case we can check whether $E(Y_i) = Var(Y_i)$ is valid for Poisson regression: plot
$$\hat\mu_i^0 = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} \quad \text{versus} \quad s_i^0 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \hat\mu_i^0)^2,$$
and we expect this plot to be centered around the 45° line.
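The two exploratory quantities above can be computed per design point as in the following Python sketch. The helper names, the default `c=0.0`, and the replicate counts are hypothetical and only serve to show the arithmetic.

```python
import math
from statistics import mean, variance

def empirical_log_rate(counts, exposures, c=0.0):
    """ln(mean count + c) - ln(mean exposure); choose a small c > 0 if the mean count is 0."""
    return math.log(mean(counts) + c) - math.log(mean(exposures))

def mean_and_variance(counts):
    """Return (mu_hat_0, s_0); statistics.variance uses the n_i - 1 divisor."""
    return mean(counts), variance(counts)

# Hypothetical replicates at two design points (not the Canadian data).
groups = {
    "x = 0": ([3, 5, 4, 6], [10.0, 10.0, 10.0, 10.0]),
    "x = 1": ([9, 7, 12, 8], [10.0, 10.0, 10.0, 10.0]),
}

for label, (counts, exposures) in groups.items():
    m, s = mean_and_variance(counts)
    print(label, empirical_log_rate(counts, exposures), m, s)
```

Plotting the first value against $x_i$ checks linearity on the log scale; plotting the mean against the variance checks the Poisson assumption $E(Y_i) = Var(Y_i)$.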

Example: Canadian Automobile Insurance Claims
Source: Bailey and Simon (1960)
Description: The data give the Canadian automobile insurance experience for policy years 1956 and 1957 as of June 30. The data include virtually every insurance company operating in Canada and were collated by the Statistical Agency (Canadian Underwriters Association - Statistical Department) acting under instructions from the Superintendent of Insurance. The data given here are for private passenger automobile liability for non-farmers for all of Canada excluding Saskatchewan.
The variable Merit measures the number of years since the last claim on the policy. The variable Class is a collation of age, sex, use and marital status. The variables Insured and Premium are two measures of the risk exposure of the insurance companies.

Variable Description:
Merit
- 3: licensed and accident free 3 years
- 2: licensed and accident free 2 years
- 1: licensed and accident free 1 year
- 0: all others
Class
- 1: pleasure, no male operator < 25
- 2: pleasure, non-principal male operator < 25
- 3: business use
- 4: unmarried owner and principal operator < 25
- 5: married owner and principal operator < 25

- Insured: earned car years
- Premium: earned premium in 1000's (adjusted to what the premium would have been had all cars been written at 01 rates)
- Claims: number of claims
- Cost: total cost of the claims in 1000's of dollars

Data
> cancar.data
Columns: Merit, Class, Insured, Premium, Claims, Cost

Exploratory Data Analysis:
[Figure: four panels showing the empirical log mean: by Merit Rating; by Class; Class and Merit; Merit and Class]
Linear effect for Merit Rating, but quadratic effect for Class if considered as metric variables. Interaction effects are present, especially when Merit = 2 and 3.

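Likelihood analysis in Poisson regression
Log-likelihood:
$$\ln l(y, \beta) = -\sum_{i=1}^n t_i \mu(x_i, \beta) + \sum_{i=1}^n y_i \ln(t_i \mu(x_i, \beta)) + \text{const. ind. of } \beta$$
$$s_j(\beta) := \frac{\partial \ln l(y,\beta)}{\partial \beta_j} = -\sum_{i=1}^n t_i e^{x_i^t \beta} x_{ij} + \sum_{i=1}^n y_i \frac{e^{x_i^t \beta}}{e^{x_i^t \beta}} x_{ij} = \sum_{i=1}^n (y_i - t_i e^{x_i^t \beta}) x_{ij} = 0, \qquad j = 1, \ldots, p$$
$$s(\beta) = \begin{pmatrix} s_1(\beta) \\ \vdots \\ s_p(\beta) \end{pmatrix} = X^t (Y - \mu) = 0, \qquad \mu = \begin{pmatrix} t_1 e^{x_1^t \beta} \\ \vdots \\ t_n e^{x_n^t \beta} \end{pmatrix} \qquad \text{(score equations)}$$
The MLE $\hat\beta$ solves $s(\hat\beta) = 0$.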

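Fisher information in Poisson regression
$$\frac{\partial^2 \ln l(y,\beta)}{\partial \beta_r \partial \beta_s} = \frac{\partial}{\partial \beta_r} \left( \sum_{i=1}^n (y_i - t_i e^{x_i^t \beta}) x_{is} \right) = -\sum_{i=1}^n t_i e^{x_i^t \beta} x_{is} x_{ir}$$
$$i_{rs} := -E\left( \frac{\partial^2 \ln l(y,\beta)}{\partial \beta_r \partial \beta_s} \right) = \sum_{i=1}^n t_i e^{x_i^t \beta} x_{is} x_{ir}$$
$$I(\beta) := (i_{rs})_{r,s=1,\ldots,p} = X^t D(\beta) X, \qquad \text{where } D(\beta) := \mathrm{diag}(\ldots, t_i e^{x_i^t \beta}, \ldots)$$
$I(\hat\beta)^{-1}$ = estimated asymptotic covariance matrix of $\hat\beta$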
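The score vector $s(\beta) = X^t(Y - \mu)$ and the information matrix $I(\beta) = X^t D(\beta) X$ can be combined into the Fisher scoring update $\beta \leftarrow \beta + I(\beta)^{-1} s(\beta)$. The Python sketch below is one possible minimal implementation, not the routine used by `glm`. The helper `solve`, the fixed iteration count, the starting value (which assumes the first column of `X` is an intercept) and the toy data are all assumptions made for illustration.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fitted_means(beta, X, t):
    """mu_i = t_i * exp(x_i' beta)."""
    return [t_i * math.exp(sum(b * x for b, x in zip(beta, x_i))) for x_i, t_i in zip(X, t)]

def fit_poisson(X, y, t, iterations=25):
    """Fisher scoring for the Poisson model with exposure t_i."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / sum(t))  # assumption: column 0 of X is the intercept
    for _ in range(iterations):
        mu = fitted_means(beta, X, t)
        # score s(beta) = X'(y - mu)
        score = [sum((y_i - m_i) * x_i[j] for x_i, y_i, m_i in zip(X, y, mu)) for j in range(p)]
        # information I(beta) = X' D(beta) X with D = diag(mu_i)
        info = [[sum(m_i * x_i[r] * x_i[s] for x_i, m_i in zip(X, mu)) for s in range(p)]
                for r in range(p)]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
    return beta

# Hypothetical toy data (not the Canadian data).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [12, 9, 5, 4]
t = [100.0, 110.0, 90.0, 120.0]

beta_hat = fit_poisson(X, y, t)
mu_hat = fitted_means(beta_hat, X, t)
print(beta_hat, sum(y), sum(mu_hat))
```

At convergence the score is zero, so $X^t Y = X^t \hat\mu$; with an intercept this means the observed and fitted totals agree.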

Poisson regression as GLM
$$P(Y_i = y_i) = \exp\{-t_i \mu(x_i, \beta) + y_i \ln(t_i \mu(x_i, \beta)) - \ln(y_i!)\} = \exp\{y_i \underbrace{\ln(t_i \mu(x_i, \beta))}_{\theta_i} - \underbrace{t_i \mu(x_i, \beta)}_{b(\theta_i) = e^{\theta_i}} \underbrace{- \ln(y_i!)}_{c(y_i, \phi)}\}, \qquad a(\phi) = 1, \; \phi = 1$$
If $\mu(x_i, \beta) = e^{x_i^t \beta}$, then
$$\mu_i = E(Y_i) = b'(\theta_i) = e^{\theta_i} = t_i \mu(x_i, \beta) = t_i e^{x_i^t \beta} = e^{x_i^t \beta + \ln(t_i)}$$
$$\eta_i = x_i^t \beta = \ln(\mu_i) - \ln(t_i) = \theta_i - \ln(t_i)$$
$$\eta_i = g(\mu_i) \;\Rightarrow\; g(\mu_i) = \ln(\mu_i) - \ln(t_i)$$
Here $\ln(t_i)$ is a known offset, and $g$ is an extended link function.
If $t_i = 1 \; \forall i$, then $\theta_i = \eta_i$ and $g(\mu_i) = \ln(\mu_i)$ is the canonical link.

Properties of the MLE
Since $\hat\beta$ solves $s(\hat\beta) = X^t (Y - \hat\mu) = 0$, we have $X^t Y = X^t \hat\mu$, where $\hat\mu = (\ldots, t_i e^{x_i^t \hat\beta}, \ldots)^t$.
i) If $X$ contains an intercept, then the first column of $X$ is $1_n$ and
$$\sum_{i=1}^n Y_i = 1_n^t Y = 1_n^t \hat\mu = \sum_{i=1}^n \hat\mu_i.$$
ii) If $X = (D_1, D_2, \ldots, D_k)$ with $D_{ij} = 1$ if the $i$-th obs. has level $j$ and $D_{ij} = 0$ otherwise, $j = 1, \ldots, k$, $i = 1, \ldots, n$, $k = p$, then $X^t Y = (N_1, \ldots, N_k)^t$, where $N_j = \sum_{i: D_{ij} = 1} Y_i$ is the observed total for level $j$. The fitted marginal totals ($X^t \hat\mu$) are the same as the observed marginal totals ($X^t Y$).

iii) If two factors $A$ with $I$ levels and $B$ with $J$ levels are considered with the interaction model $Y \sim A * B$, then the observed total of each cell is the same as the fitted total. If we only have a single observation for each cell, then we have a saturated model.

Deviance analysis in Poisson regression
- After the EDA identifies important covariates, one can use the partial deviance test to test for the significance of individual covariates or groups of covariates.
- Goodness of fit can be checked with the residual deviance test.
- The deviance is given by
$$D = 2 \sum_{i=1}^n \left\{ y_i \log\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i) \right\} \overset{a}{\sim} \chi^2_{n-p}, \qquad \text{where } \hat\mu_i := t_i e^{x_i^t \hat\beta}.$$

Remark: The second term of $D$ will be zero if an intercept is used, since $\sum_{i=1}^n y_i = \sum_{i=1}^n \hat\mu_i$.
The Pearson $\chi^2_P$ statistic for Poisson GLMs is given by
$$\chi^2_P = \sum_{i=1}^n \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i} \overset{a}{\sim} \chi^2_{n-p}$$
Note: $\overset{a}{\sim}$ assumes $n$ fixed, but $t_i \mu_i \to \infty \; \forall i$.
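Both goodness-of-fit statistics are simple sums over observations. The following Python sketch computes them from observed counts and fitted means; the function names and the numbers are hypothetical, and the term $y_i \log(y_i / \hat\mu_i)$ is taken as 0 when $y_i = 0$ (its limit).

```python
import math

def deviance(y, mu_hat):
    """D = 2 * sum{ y_i * log(y_i / mu_i) - (y_i - mu_i) }, with 0 * log(0) taken as 0."""
    d = 0.0
    for y_i, m_i in zip(y, mu_hat):
        log_term = y_i * math.log(y_i / m_i) if y_i > 0 else 0.0
        d += log_term - (y_i - m_i)
    return 2.0 * d

def pearson_chi2(y, mu_hat):
    """Pearson statistic: sum of (y_i - mu_i)^2 / mu_i."""
    return sum((y_i - m_i) ** 2 / m_i for y_i, m_i in zip(y, mu_hat))

# Hypothetical observed counts and fitted means (not the Canadian data).
y = [12, 9, 5, 4]
mu_hat = [11.0, 9.5, 6.0, 3.5]

print(deviance(y, mu_hat), pearson_chi2(y, mu_hat))
```

Both values would be compared with a $\chi^2_{n-p}$ distribution, where $p$ is the number of fitted parameters.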

Poisson Regression: Class and Merit as factors
> f.main <- Claims ~ offset(log(Insured)) + Merit + Class
> r.main <- glm(f.main, family=poisson)
> summary(r.main, cor=F)
Call: glm(formula = Claims ~ offset(log(Insured)) + Merit + Class, family = poisson)
Deviance Residuals: Min 1Q Median 3Q Max
Coefficients (columns Value, Std. Error, t value): (Intercept), three Merit terms, four Class terms
Null Deviance on 19 degrees of freedom
Residual Deviance: 580 on 12 degrees of freedom
The residual deviance is too large; this is not a good fit.

With all interaction terms
> f.inter <- Claims ~ offset(log(Insured)) + Merit * Class
> r.inter <- glm(f.inter, family=poisson)
> summary(r.inter, cor=F)
Call: glm(formula = Claims ~ offset(log(Insured)) + Merit * Class, family = poisson)

Coefficients (columns Value, Std. Error, t value): (Intercept), three Merit terms, four Class terms, twelve Merit:Class interaction terms (Merit1, Merit2 and Merit3 with each of four Class terms)
Null Deviance on 19 degrees of freedom
Residual Deviance: 0 on 0 degrees of freedom
This model is the saturated model since there is only one observation per cell. Interaction effects when Merit = 3 or 2 are very significant.

With some interaction terms
> f.inter1 <- Claims ~ offset(log(Insured)) + Merit + Class + Class:(Merit == 3) + Class:(Merit == 2)
> r.inter1 <- glm(f.inter1, family=poisson)
> summary(r.inter1, cor=F)
Call: glm(formula = Claims ~ offset(log(Insured)) + Merit + Class + Class:(Merit == 3) + Class:(Merit == 2), family = poisson)

Coefficients: (4 not defined because of singularities); columns Value, Std. Error, t value
Terms: (Intercept), three Merit terms, four Class terms, Merit == 3 (NA), Merit == 2 (NA), four Merit == 3:Class terms, Merit == 3Class5 (NA), four Merit == 2:Class terms, Merit == 2Class5 (NA)
Null Deviance on 19 degrees of freedom
Residual Deviance: 5.5 on 4 degrees of freedom
The residual deviance is now only 5.5 on 4 df with a p-value of 0.24, a reasonably good fit.
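The quoted p-value can be checked by hand, because for even degrees of freedom the $\chi^2$ upper tail probability has a closed form: $P(\chi^2_{2m} > x) = e^{-x/2} \sum_{k=0}^{m-1} (x/2)^k / k!$. The Python sketch below uses this identity; the function name is an assumption for this example, and odd degrees of freedom are deliberately not handled.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df, via the finite Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Residual deviance 5.5 on 4 degrees of freedom, as reported for the model above.
p_value = chi2_upper_tail_even_df(5.5, 4)
print(round(p_value, 2))  # about 0.24, in line with the p-value quoted above
```

For df = 4 the sum has only two terms, so the p-value is $e^{-2.75}(1 + 2.75)$.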

Class and Merit as metric variables
> Merit.m <- as.numeric(Merit)
> Class.m <- as.numeric(Class)
> f.main.m <- Claims ~ offset(log(Insured)) + Merit.m + poly(Class.m, 2)
> r.main.m <- glm(f.main.m, family=poisson)
> summary(r.main.m, cor=F)
Call: glm(formula = Claims ~ offset(log(Insured)) + Merit.m + poly(Class.m, 2), family = poisson)
Coefficients (columns Value, Std. Error, t value): (Intercept), Merit.m, poly(Class.m, 2)1, poly(Class.m, 2)2
Null Deviance on 19 degrees of freedom
Residual Deviance: 1000 on 16 degrees of freedom

With interaction
> f.inter.m <- Claims ~ offset(log(Insured)) + Merit.m * poly(Class.m, 2)
> r.inter.m <- glm(f.inter.m, family=poisson)
> summary(r.inter.m)
Call: glm(formula = Claims ~ offset(log(Insured)) + Merit.m * poly(Class.m, 2), family = poisson)
Deviance Residuals: Min 1Q Median 3Q Max

Coefficients (columns Value, Std. Error, t value): (Intercept), Merit.m, poly(Class.m, 2)1, poly(Class.m, 2)2, Merit.mpoly(Class.m, 2)1, Merit.mpoly(Class.m, 2)2
t values: poly(Class.m, 2)1: 4.1; Merit.mpoly(Class.m, 2)1: 9.0; Merit.mpoly(Class.m, 2)2: -4.6
Null Deviance on 19 degrees of freedom
Residual Deviance: 580 on 14 degrees of freedom
The residual deviance is still too high; the models with Merit and Class as factors fit better. Note that this is not a saturated model.

Residuals in Poisson regression
Pearson residuals:
$$r_i^P := \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$
Deviance residuals (scaled so that $\sum_i (r_i^D)^2 = D$):
$$r_i^D := \mathrm{sign}(y_i - \hat\mu_i) \left[ 2 \left\{ y_i \log\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i) \right\} \right]^{1/2}$$
In Splus: resid(r.object, type="deviance") and resid(r.object, type="pearson").
If the $t_i$ are not constant across $i$, it is better to consider the standardized response $Y_i^* = Y_i / t_i$, which gives the standardized Pearson residuals
$$r_i^{P*} := \frac{y_i / t_i - \hat\mu_i / t_i}{\sqrt{\hat\mu_i / t_i}} = \frac{1}{t_i} \sqrt{t_i} \, r_i^P = \frac{1}{\sqrt{t_i}} \, r_i^P$$
Interpretation: $\hat\mu_i = t_i e^{x_i^t \hat\beta}$ is the estimated number of events in 0 to $t_i$ units, so $\hat\mu_i / t_i$ is the estimated number of events in 0 to 1 units.
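The three residual types can be computed directly from observed counts, fitted means and exposures. The Python sketch below is a hedged illustration: the function names and numbers are hypothetical, and the deviance residual is taken with the factor 2 inside the square root so that the squared residuals sum to the deviance $D$.

```python
import math

def pearson_residuals(y, mu_hat):
    """r_i^P = (y_i - mu_i) / sqrt(mu_i)."""
    return [(y_i - m_i) / math.sqrt(m_i) for y_i, m_i in zip(y, mu_hat)]

def deviance_residuals(y, mu_hat):
    """sign(y_i - mu_i) * sqrt(2 * {y_i * log(y_i / mu_i) - (y_i - mu_i)})."""
    out = []
    for y_i, m_i in zip(y, mu_hat):
        log_term = y_i * math.log(y_i / m_i) if y_i > 0 else 0.0
        d_i = max(2.0 * (log_term - (y_i - m_i)), 0.0)  # guard against rounding below 0
        out.append(math.copysign(math.sqrt(d_i), y_i - m_i))
    return out

def standardized_pearson_residuals(y, mu_hat, t):
    """Pearson residual of the rate y_i / t_i, equal to r_i^P / sqrt(t_i)."""
    return [r / math.sqrt(t_i) for r, t_i in zip(pearson_residuals(y, mu_hat), t)]

# Hypothetical counts, fitted means and exposures (not the Canadian data).
y = [12, 9, 5, 4]
mu_hat = [11.0, 9.5, 6.0, 3.5]
t = [100.0, 110.0, 90.0, 120.0]

print(pearson_residuals(y, mu_hat))
print(deviance_residuals(y, mu_hat))
print(standardized_pearson_residuals(y, mu_hat, t))
```

Dividing by $\sqrt{t_i}$ puts groups with very different exposures on a comparable scale before plotting the residuals against the index.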

Residual Analysis
[Figure: standardized Pearson residuals plotted against index, one panel per model: Claims~offset+Merit+Class; Claims~offset+Merit+Class+Class:(Merit==3)+Class:(Merit==2); Claims~offset+Merit.m+poly(Class.m,2); Claims~offset+Merit.m*poly(Class.m,2)]
Residuals for model Claims~offset(log(Insured))+Merit+Class+Class:(Merit==3)+Class:(Merit==2) look best.

Interpretation
The estimated numbers of claims per 1000 earned car years for the main-effects-only model and for the interaction model are given by:
> fit.main <- tapply((fitted(r.main)/Insured)*1000, list(Merit, Class), mean)
> fit.main
> fit.inter1 <- tapply((fitted(r.inter1)/Insured)*1000, list(Merit, Class), mean)
> fit.inter1
> fit.obs <- tapply((Claims/Insured)*1000, list(Merit, Class), mean)
> fit.obs

[Figure: three panels over Class and Merit: Observed Claims (observed no. of claims); Expected Claims (Main Effects) (estimated no. of claims); Expected Claims (With Interaction) (estimated claims)]

References
Bailey, R. and L. Simon (1960). Two studies in automobile insurance ratemaking. ASTIN Bulletin V.1, N.4.
Cameron, A. and P. Trivedi (1998). Regression analysis of count data. Cambridge University Press.
Winkelmann, R. (1997). Econometric analysis of count data (Second, revised and enlarged ed.). Berlin: Springer-Verlag.


Introduction to Predictive Modeling Using GLMs Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

7 Generalized Estimating Equations

7 Generalized Estimating Equations Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can

More information

From the help desk: hurdle models

From the help desk: hurdle models The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for

More information

Regression Models for Time Series Analysis

Regression Models for Time Series Analysis Regression Models for Time Series Analysis Benjamin Kedem 1 and Konstantinos Fokianos 2 1 University of Maryland, College Park, MD 2 University of Cyprus, Nicosia, Cyprus Wiley, New York, 2002 1 Cox (1975).

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Use of deviance statistics for comparing models

Use of deviance statistics for comparing models A likelihood-ratio test can be used under full ML. The use of such a test is a quite general principle for statistical testing. In hierarchical linear models, the deviance test is mostly used for multiparameter

More information

Mortgage Loan Approvals and Government Intervention Policy

Mortgage Loan Approvals and Government Intervention Policy Mortgage Loan Approvals and Government Intervention Policy Dr. William Chow 18 March, 214 Executive Summary This paper introduces an empirical framework to explore the impact of the government s various

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Non-life insurance mathematics Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Overview Important issues Models treated Curriculum Duration (in lectures) What is driving the result of a

More information

Choosing number of stages of multistage model for cancer modeling: SOP for contractor and IRIS analysts

Choosing number of stages of multistage model for cancer modeling: SOP for contractor and IRIS analysts Choosing number of stages of multistage model for cancer modeling: SOP for contractor and IRIS analysts Definitions in this memo: 1. Order of the multistage model is the highest power term in the multistage

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

More information

Exchange Rate Regime Analysis for the Chinese Yuan

Exchange Rate Regime Analysis for the Chinese Yuan Exchange Rate Regime Analysis for the Chinese Yuan Achim Zeileis Ajay Shah Ila Patnaik Abstract We investigate the Chinese exchange rate regime after China gave up on a fixed exchange rate to the US dollar

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA E. J. Dadey SSNIT, Research Department, Accra, Ghana S. Ankrah, PhD Student PGIA, University of Peradeniya, Sri Lanka Abstract This study investigates

More information

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return)

Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return) 1 2 3 Marginal Person Average Person (Average Return of College Goers) Return, Cost (Average Return in the Population) 4 (Marginal Return) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

More information

Causal Forecasting Models

Causal Forecasting Models CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

MULTIPLE REGRESSIONS ON SOME SELECTED MACROECONOMIC VARIABLES ON STOCK MARKET RETURNS FROM 1986-2010

MULTIPLE REGRESSIONS ON SOME SELECTED MACROECONOMIC VARIABLES ON STOCK MARKET RETURNS FROM 1986-2010 Advances in Economics and International Finance AEIF Vol. 1(1), pp. 1-11, December 2014 Available online at http://www.academiaresearch.org Copyright 2014 Academia Research Full Length Research Paper MULTIPLE

More information

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

More information

A Comparison of Regression Models for Count Data in Third Party Automobile Insurance

A Comparison of Regression Models for Count Data in Third Party Automobile Insurance Department of Mathematical Statistics Royal Institute of Technology, Stockholm, Sweden Degree Project in Engineering Physics, First Level A Comparison of Regression Models for Count Data in Third Party

More information

Simple example of collinearity in logistic regression

Simple example of collinearity in logistic regression 1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

More information

3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

More information

Predicting customer level risk patterns in non-life insurance ERIK VILLAUME

Predicting customer level risk patterns in non-life insurance ERIK VILLAUME Predicting customer level risk patterns in non-life insurance ERIK VILLAUME Master of Science Thesis Stockholm, Sweden Abstract Several models for predicting future customer profitability early into customer

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Linear Models for Continuous Data

Linear Models for Continuous Data Chapter 2 Linear Models for Continuous Data The starting point in our exploration of statistical models in social research will be the classical linear model. Stops along the way include multiple linear

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information