Lecture 8: Gamma regression

Size: px
Start display at page:

Download "Lecture 8: Gamma regression"

Transcription

1 Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

2 Overview Models with constant coefficient of variation Gamma regression: estimation and testing Gamma regression with weights c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

3 Motivation Linear models: V ar(y i ) = σ 2 constant Poisson models: V ar(y i ) = E(Y i ) = µ i non constant, linear Constant coefficient of variation: [V ar(y i )] 1/2 E(Y i ) = σ V ar(y i ) = σ 2 µ 2 i non constant, quadratic c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

4 Which transformation stablizes the variance if V ar(y i ) = σ 2 µ 2 i holds? Answer: Z i := log(y i ) Proof: According to a 2nd order Taylor expansion we have Z i = log(y i ) log(µ i ) + 1 µ i (Y i µ i ) 1 (Y 2µ 2 i µ i ) 2 i for σ 2 small E(Z i ) log(µ i ) 1 2µ 2 i = log(µ i ) 1 2 σ2 V ar(y i ) } {{ } =σ 2 µ 2 i According to a 1st order Taylor expansion we have E(Z i ) log(µ ( i ) ) E(Zi 2) E [log(µ i ) + 1 µ i (Y i µ i )] 2 ( ) ( = (log µ i ) 2 + 2E log(µ i ) 1 µ i (Y i µ i ) + E = (log µ i ) σ 2 µ 2 µ 2 i = (log µ i) 2 + σ 2 i V ar(z i ) (log µ i ) 2 + σ 2 (log µ i ) 2 = σ 2 For small σ 2 we have E(Z i ) log(µ i ) σ 2 /2 and V ar(z i ) σ 2 1 µ 2 i (Y i µ i ) 2 ) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

5 Multiplicative error structure: Log normal model Assume Y i = e β 0 x β 1 i1 xβ p ip (1 + ɛ i ) } {{ } E(Y i )=:µ i where ɛ i := Y i E(Y i ) E(Y i ) E(ɛ i ) = 0, V ar(ɛ i ) = V ar(y i) E(Y i ) 2 =: σ 2 ln(y i ) = β 0 + β 1 ln x i β p ln x ip + ln(1 + ɛ i ) where E (ln(1 + ɛ i )) = E = E ( ) ln(1 + Y i E(Y i ) E(Y i ) ) ( ) ln( Y i E(Y i ) ) = E(ln(Y i )) ln E(Y } {{ } i ) = σ 2 /2 ln µ i σ2 2 and V ar (ln(1 + ɛ i )) = V ar (ln(y i ) ln E(Y i )) = V ar (ln(y i )) σ 2 c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

6 Consider ln Y i = β 0 + β 1 ln x i β p ln x ip σ2 2 + ɛ i, where ɛ i := ln(1 + ɛ i) + σ2 2 E(ɛ i ) 0, V ar(ɛ i ) σ2 If one assumes ɛ i N(0, σ2 ) i.i.d. and fits the log normal model ln Y i = α 0 + α 1 ln x i α p ln x ip + ɛ i then ˆα 1,..., ˆα p are consistent estimates for β 1,..., β p, but ˆα 0 is biased for β 0 with approx. bias σ 2 /2. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

7 Original scale: Gamma regression If one wants to work on original scale t β (ln x i ) µ i = E(Y i ) = e β 0x β 1 i1 xβ p ip = e } {{ } η i ln x i := (1, ln(x i1,..., ln(x ip )) t g(µ i ) = ln(µ i ). We need distribution for Y i 0 with E(Y i ) = µ i and V ar(y i ) = σ 2 µ 2 i. Such a distribution is the Gamma distribution, i.e. Y Gamma(ν, λ) with f Y (y) = λ Γ(ν) (λy)ν 1 e λy y 0 E(Y ) = ν λ = µ, V ar(y ) = ν = ( ) ν 2 1 λ 2 λ ν = 1 µ2. }{{} ν =σ 2 c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

8 Summary V ar(y i ) = σ 2 (E(Y i )) 2 µ i = E(Y i ) = e β 0x β 1 i1 xβ p ip Log normal model GLM with Y i Γ(ν, λ i ) (normal error on log scale) µ i = ν λ i σ 2 = 1 ν Linear model on log scale Gamma regression c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

9 Properties of the Gamma distribution 1) Mean parametrization µ = ν λ λ = ν µ f Y (y) = λ Γ(ν) (λy)ν 1 e λy = 1 Γ(ν) ( ν µ ) ( ν µ y ) ν 1 e ν µ y = 1 Γ(ν) ( ) ν νy µ e µ ν y 1 y f Y (y)dy = 1 Γ(ν) ( ) ν νy µ e µ ν y 1 y dy }{{} d(ln y) Gamma(µ, ν) Mean reparametrization c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

10 2) The form of the density is determined by ν <ν< ν>= ) Special cases : ν = 1 ν Skewed to the right Skewness exponential distribution normal distribution E(Y µ)3 σ 3 = = 2ν 1/2 0 for ν c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

11 Gamma regression as GLM For known ν in Y Gamma(µ, ν) the log likelihood is given by l(µ, ν, y) = (ν 1) log(y) ν µ y + ν log ν ν log µ log Γ(ν) = ν ( yµ log µ ) + c(y, ν) θ = 1/µ, b(θ) = log µ = log( 1/θ) = log( θ) b (θ) = ( 1) θ = 1/θ = µ expectation b (θ) = 1 θ 2 = µ 2 variance function a(φ) = φ, φ = 1/ν dispersion parameter c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

12 EDA for Gamma regression µ i = e xt i β log(µ i ) = x t i β a plot of x ij versus log(y i ) should be linear µ i = 1 x t i β 1 µ i = x t i β a plot of x ij versus 1/Y i should be linear c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

13 Example: Canadian Automobile Insurance Claims Source: Bailey, R.A. and Simon, LeRoy J. (1960). Two studies in automobile insurance. ASTIN Bulletin, Description: The data give the Canadian automobile insurance experience for policy years 1956 and 1957 as of June 30, The data includes virtually every insurance company operating in Canada and was collated by the Statistical Agency (Canadian Underwriters Association - Statistical Department) acting under instructions from the Superintendent of Insurance. The data given here is for private passenger automobile liability for non-farmers for all of Canada excluding Saskatchewan. The variable Merit measures the number of years since the last claim on the policy. The variable Class is a collation of age, sex, use and marital status. The variables Insured and Premium are two measures of the risk exposure of the insurance companies. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

14 Variable Description: Merit 3 licensed and accident free 3 years 2 licensed and accident free 2 years 1 licensed and accident free 1 year 0 all others Class 1 pleasure, no male operator < 25 2 pleasure, non-principal male operator < 25 3 business use 4 unmarried owner or principal operator< 25 5 married owner or principal operator < 25 c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

15 Variable Description(continued): Insured Earned car years Premium Earned premium in 1000 s (adjusted to what the premium would have been had all cars been written at 01 rates) Claims Number of claims Cost Total cost of the claim in 1000 s of dollars Variable of Interest is Cost. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

16 Data: > cancar.data Merit Class Insured Premium Claims Cost c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

17 Exploratory Data Analysis (unweighted case): Log Link: Merit Rating Class empirical log mean empirical log mean Merit Rating Class and Merit Class Merit and Class empirical log mean empirical log mean Merit Class No obvious functional relationships, treatment of covariates as factors might be appropriate. No strong interactions expected. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

18 Inverse Link: Exploratory Data Analysis (continued): Merit Rating Class empirical inverse mean empirical inverse mean Merit Rating Class and Merit Class Merit and Class empirical inverse mean empirical inverse mean Merit Class No obvious functional relationships, treatment of covariates as factors might be appropriate. Some interactions expected. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

19 Residual deviance in Gamma regression Y i Gamma(µ i, ν) independent µ i = e xt i β l(µ, ν, y) = n i=1 l(y, ν, y) = n i=1 i=1 { ν [ ] y i µ i ln(µ i ) } ln(γ(ν)) + ν ln(νy i ) ln(y i ) {ν [ 1 ln(y i )] ln(γ(ν)) + ν ln(νy i ) ln(y i )} 2 (l(ˆµ, [ ν, y) l(y, ν, y)) n ] = 2 ν( 1 ln(y i ) + y i ˆµ i + ln(ˆµ i )) i=1 = 2 n [ ( ) ] ν ln yi ˆµ i y i ˆµ i ˆµ i D(y, ˆµ) = 2 n i=1 [ ln ( yi ˆµ i ) ] y i ˆµ i ˆµ i saturated log likelihood deviance We have D(y, ˆµ) a φχ 2 n p where φ = 1/ν. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

20 Residual deviance test Y i Gamma(µ i, ν) independent µ i = e xt i β (Model (*)) Reject model (*) at level α if Here ˆφ is an estimate of φ. D(y, ˆµ) ˆφ > χ 2 n p,1 α c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

21 Partial deviance test Consider β = (β t 1, β t 2) t β 1 R p 1, β 2 R p 2 ˆµ 1 = fitted mean in reduced model with design X 2 ˆµ 2 = fitted mean in full model with design X 1 and X 2 ˆφ 1 = estimated dispersion parameter in reduced model ˆφ 2 = estimated dispersion parameter in full model The partial deviance test for H 0 : β 1 = 0 versus H 1 : β 1 0 is given by: Reject H 0 at level α D(y, ˆµ 1 ) ˆφ 1 D(y, ˆµ 2 ) ˆφ 2 > χ 2 p 1,1 α c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

22 Canonical link η i = θ i = 1 µ i Since we need µ i > 0 we need η i < 0, which gives restrictions on β. Therefore the canonical link is not often used. Most often the log link is used. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

23 Estimation of the dispersion parameter For the estimation of the dispersion parameter φ = σ 2 = 1/ν method of moments is used E(Y i ) = µ i V ar(y i ) = φµ 2 i i Z i = Y i µ i has expectation 1, variance φ and is i.i.d. ˆφ = 1 n p n i=1 ( ) 2 Yi ˆµ i 1 = 1 n p n i=1 ( ) 2 Yi ˆµ i ˆµ i since p parameters need to be estimated. Remark: Pearson χ 2 statistic for the gamma regression is given by ˆφ = χ 2 P /(n p). χ 2 P = n i=1 = n i=1 (Y i ˆµ i ) 2 V (ˆµ i ) ( ) 2 Yi ˆµ i ˆµ i V (ˆµ i ) = ˆµ 2 i c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

24 Relationship between gamma regression with log link and a linear model on the log scala For σ 2 = φ small we have V ar(log(y i )) σ 2, therefore we can use log Y i = x t i α + ɛ i ɛ i N(0, σ 2 ) i.i.d. linear model on log scala Cov( ˆα) = σ 2 (X t X) 1. For the gamma regression with Y i Γ(µ i, 1/σ 2 ) and µ i = exp{x t iβ} we have I(ˆβ)) = Fisher information = ˆβ a N(β, I(ˆβ) 1 ), ( E where ( )) 2 l(β) β= β β ˆβ. t c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

25 Here l(β) = 1 σ 2 n E i=1 l(β) β j = 1 σ 2 n i=1 2 l(β) n β s β j = 1 σ 2 ( ) 2 l(β) β s β j i=1 = 1 σ 2 n i=1 I(ˆβ) = 1 σ 2 X t X ( ) y µ i log µ i + const ( yi x ij e xt i β x ij ( ) y ix is x ij e xt i β ) j = 1,..., p x is x ij (independent of β) Cov(ˆβ) σ 2 (X t X) 1 = Cov( ˆα) Distinction between both models is difficult. If X corresponds to an orthogonal design, i.e. (X t X) 1 = I p, we have that the components of ˆα (ˆβ) are (asymptotically) independent. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

26 Gamma regression with weights Example: Y i = total claim size arising from n i claims in group i with covariates x i. Y i = n i j=1 Y ij ; Y ij = j th claim in group i Y s i := Y i /n i = average claim size in group i If Y ij Gamma(µ i, ν), independent, we have E(Y s i ) = 1 n i n i j=1 E(Y ij ) } {{ } =µ i = µ i V ar(y s i ) = 1 n i V ar(y i1 ) = 1 n i µ 2 i /ν = µ2 i /n iν. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

27 Since we have n i Y s i = n i j=1 Y ij Gamma(n i µ i, n i ν) Y s i Gamma(µ i, n i ν). This follows since m Y s i (t) := E ( e ty i s ) tn i Yi s = E(e n i 1 ) = (1 n i µ i n i ν t n i ) n i ν = 1 (1 µ i t n i ν )n i ν = m X(t) t, where X Gamma(µ i, n i ν). Therefore we need to introduce weights for a gamma regression for Yi s. Since E(Yi s) = µ i, the EDA for a weighted gamma regression does not change compared to an unweighted one. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

28 Exploratory Data Analysis (weighted case): Log Link: Merit Rating Class empirical log mean empirical log mean Merit Rating Class Class and Merit Merit and Class empirical log mean empirical log mean Merit Class c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

29 Exploratory Data Analysis (weighted case): Inverse Link: Merit Rating Class empirical inverse mean empirical inverse mean Merit Rating Class and Merit Class Merit and Class empirical inverse mean empirical inverse mean Merit Class c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

30 For estimation of the dispersion parameter σ 2 = φ = 1/ν in a weighted model we use ˆσ 2 = ˆφ = 1 n ( ) 2 Yi ˆµ i n i n p ˆµ i i=1 Residuals: E(Y s i ) = µ i Y i s µ i σ ni µ i = n i σ V ar(yi s) = µ2 i n i ν = µ2 i σ2 n i ( ) Y s i µ i µ i has zero expectation and unit variance i.e. standardized residuals can be defined by r s i := ni σ ( Y s i ) ˆµ i. ˆµ i c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

31 Example: Canadian Automobile Insurance Claims: Gamma Regression: Main Effects with Log Link > f.gamma.main_cost/claims ~ Merit + Class > r.gamma.log.main_glm(f.gamma.main, family = Gamma(link = "log"), weights = Claims) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

32 Example: Gamma Regression: > summary(r.gamma.log.main,cor=f) Call: glm(formula = Cost/Claims ~ Merit + Class, family= Gamma(link = "log"), weights = Claims) Coefficients: Value Std. Error t value (Intercept) Merit Merit Merit Class Class Class Class (Dispersion Parameter for Gamma family taken to be 13 ) Null Deviance:1556 on 19 degrees of freedom Residual Deviance:157 on 12 degrees of freedom c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

33 Example: Gamma Regression (continued): To check the dispersion estimate, use the Pearson Chi Square Statistics. > sum(resid(r.gamma.log.main,type="pearson")^2) [1] 159 > 159/12 [1] 13 For the residual deviance test, we need to scale the deviance > 1-pchisq(156/13,12) [1] 0.45 A p-value of.45 shows no lack of fit. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

34 Main Effects with Inverse Link > f.gamma.main_cost/claims ~ Merit + Class > r.gamma.inverse.main_glm(f.gamma.main, family = Gamma, weights = Claims) > summary(r.gamma.inverse.main,cor=f) Coefficients: Value Std. Error t value (Intercept) Merit Merit Merit Class Class Class Class (Dispersion Parameter for Gamma family taken to be 14 ) Null Deviance:1556 on 19 degrees of freedom Residual Deviance:167 on 12 degrees of freedom c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

35 Example: Gamma Regression (continued): For the residual deviance test, we need to scale the deviance > 1-pchisq(167/14,12) [1] 0.45 A p-value of.45 shows no lack of fit. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

36 Log Link With Interaction > f.gamma.inter_cost/claims ~ Merit * Class > r.gamma.inter_glm(f.gamma.inter, family = Gamma, weights = Claims)) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

37 Example: Gamma Regression (continued): > summary(r.gamma.log.inter,cor=f) Coefficients: Value Std. Error t value (Intercept) NA NA Merit NA NA Merit NA NA Merit NA NA Class NA NA Class NA NA Class NA NA Class NA NA Merit1Class NA NA Merit2Class NA NA Merit3Class NA NA Merit1Class NA NA Merit2Class NA NA Merit3Class NA NA Merit1Class NA NA Merit2Class NA NA Merit3Class NA NA Merit1Class NA NA Merit2Class NA NA Merit3Class NA NA (Dispersion Parameter for Gamma family taken to be NA ) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

38 Example: Gamma Regression (continued): Null Deviance: 1556 on 19 degrees of freedom Residual Deviance: 0 on 0 degrees of freedom Number of Fisher Scoring Iterations: 1 This model is the saturated model since only one observation per cell. Therefore the dispersion estimate can no longer be estimated, since the Pearson Chi Square Statistics is 0. To fit a model with fixed dispersion, use > summary(r.gamma.log.inter,dispersion=1,cor=f) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

39 Example: Gamma Regression (continued): Coefficients: Value Std. Error t value (Intercept) Merit Merit Merit Class Class Class Class Merit1Class Merit2Class Merit3Class Merit1Class Merit2Class Merit3Class Merit1Class Merit2Class Merit3Class Merit1Class Merit2Class Merit3Class (Dispersion Parameter for Gamma family taken to be 1 ) Null Deviance: 1556 on 19 degrees of freedom Residual Deviance: 0 on 0 degrees of freedom From this we see that certain interaction effects are strongly significant. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

40 Log Link With Some Interaction Terms > f.gamma.inter3_cost/claims ~ Merit + Class + I((Merit == 2) * ( Class == 2)) + I((Merit == 2) * (Class == 3)) + I((Merit == 1) * (Class == 4)) + I(( Merit == 3) * (Class == 4)) + I((Merit == 3) * (Class == 2)) + I((Merit == 2) * ( Class == 5)) > r.gamma.log.inter3_glm(f.gamma.inter3,family =Gamma(link="log"),weights=Claims) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

41 Example: Gamma Regression (continued): > summary(r.gamma.log.inter3,cor=f) Call:glm(formula=Cost/Claims ~ Merit+Class+I(( Merit == 2) * (Class == 2)) + I((Merit == 2) * (Class == 3)) + I((Merit == 1) * ( Class == 4)) + I((Merit == 3) * (Class == 4)) + I((Merit == 3) * (Class == 2)) + I(( Merit == 2) * (Class == 5)), family = Gamma(link = "log"), weights = Claims) Deviance Residuals: Min 1Q Median 3Q Max e c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

42 Example: Gamma Regression (continued): Coefficients: Value Std. Error (Intercept) Merit Merit Merit Class Class Class Class I((Merit==2)*(Class==2)) I((Merit==2)*(Class==3)) I((Merit==1)*(Class==4)) I((Merit==3)*(Class==4)) I((Merit==3)*(Class==2)) I((Merit==2)*(Class==5)) c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

43 Example: Gamma Regression (continued): t value (Intercept) Merit1-6.6 Merit2-4.4 Merit3-8.6 Class2 1.3 Class3 2.7 Class4 7.2 Class5-4.1 I((Merit == 2) * (Class == 2)) 3.8 I((Merit == 2) * (Class == 3)) -4.0 I((Merit == 1) * (Class == 4)) 3.0 I((Merit == 3) * (Class == 4)) 3.6 I((Merit == 3) * (Class == 2)) 1.8 I((Merit == 2) * (Class == 5)) -1.9 (Dispersion Parameter for Gamma family taken to be 2.7 ) Null Deviance: 1556 on 19 degrees of freedom Residual Deviance: 16 on 6 degrees of freedom c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

44 Example: Gamma Regression (continued): > 1-pchisq(16/2.7,6) [1] 0.43 # Residual Deviance Test > 1-pchisq((156/13)-(16/2.7),6) [1] 0.41 # Partial Deviance Test We see that the joint effect of the interaction terms are not significant, since the partial deviance test yields a p-value of.41. A partial deviance test when only I((Merit == 2) * (Class == 2)) and I((Merit == 2) * (Class == 3)) are included gives a p-value of.39, thus showing that the interaction effects are not present in this data set. The same results are true when an inverse link is used instead. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

45 Standardized Residual Plots:: log.main inverse.main standardized residuals Index Difference between log link and inverse link is minimal. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

46 What needs to be done if one wants to use the linear model on the log scale? V ar(y s i ) = σ2 n i µ 2 i ( E(log(Yi s)) E log(µ i ) + 1 µ i (Yi s E [ (log(y s i ))2] = log(µ i ) 1 µ 2 i E ( V ar(yi s ) } {{ } µ 2 i σ2 n i [log(µ i ) + 1 µ i (Y s = [log(µ i )] 2 + V ar(y s i ) µ 2 i = (log µ i ) 2 + σ2 n i µ i) 1 µ 2 i σ = log(µ i ) i µ i)] 2 ) (Y s i µ i) 2 ) 2 2n i V ar(log(y s i )) (log µ i) 2 + σ2 n i (log µ i ) 2 = σ2 n i for σ 2 small linear model on the log scale needs also to be weighted c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

47 Log Normal Models: > f.lognormal.main_log(cost/claims)~merit + Class > r.lognormal.main_glm(f.lognormal.main, weights=claims) > summary(r.lognormal.main,cor=f) Call: glm(formula = log(cost/claims) ~ Merit + Class, weights = Claims) Deviance Residuals: Min 1Q Median 3Q Max c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

48 Log Normal Models (continued): Coefficients: Value Std. Error t value (Intercept) Merit Merit Merit Class Class Class Class (Dispersion Parameter for Gaussian family taken to be 13 ) Null Deviance:1515 on 19 degrees of freedom Residual Deviance:156 on 12 degrees of freedom c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

49 Log Normal Models (continued): > 1-pchisq(156/13,12) [1] 0.45 # Residual Deviance Test # Results when Splus function lm() is used: > summary(lm(f.lognormal.main,weights=claims),cor=f) Coefficients: Value Std. Error t value (Intercept) Merit Merit Merit Class Class Class Class Pr(> t ) (Intercept) Merit Merit Merit Class Class Class Class c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

50 Log Normal Models (continued): Residual standard error: 3.6 on 12 degrees of freedom Multiple R-Squared: 0.9 F-statistic: 15 on 7 and 12 degrees of freedom, the p-value is 4.7e-05 Both approaches (R-squared=.9, residual deviance p-value=.45) show no lack of fit. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

51 Fitted Costs per Claims: fitted Cost per Claim log.main inverse.main lognormal.main observed Index Difference between Models are minimal. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

52 Fitted Cost: fitted Cost log.main inverse.main lognormal.main observed Index Difference between Models are minimal. c (Claudia Czado, TU Munich) ZFS/IMS Göttingen

Lecture 6: Poisson regression

Lecture 6: Poisson regression Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Probabilistic concepts of risk classification in insurance

Probabilistic concepts of risk classification in insurance Probabilistic concepts of risk classification in insurance Emiliano A. Valdez Michigan State University East Lansing, Michigan, USA joint work with Katrien Antonio* * K.U. Leuven 7th International Workshop

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Generalised linear models for aggregate claims; to Tweedie or not?

Generalised linear models for aggregate claims; to Tweedie or not? Generalised linear models for aggregate claims; to Tweedie or not? Oscar Quijano and José Garrido Concordia University, Montreal, Canada (First version of May 20, 2013, this revision December 2, 2014)

More information

Nonnested model comparison of GLM and GAM count regression models for life insurance data

Nonnested model comparison of GLM and GAM count regression models for life insurance data Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development

More information

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

More information

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming Underwriting risk control in non-life insurance via generalized linear models and stochastic programming 1 Introduction Martin Branda 1 Abstract. We focus on rating of non-life insurance contracts. We

More information

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech. MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

AN ACTUARIAL NOTE ON THE CREDIBILITY OF EXPERIENCE OF A SINGLE PRIVATE PASSENGER CAR

AN ACTUARIAL NOTE ON THE CREDIBILITY OF EXPERIENCE OF A SINGLE PRIVATE PASSENGER CAR AN ACTUARIAL NOTE ON THE CREDIBILITY OF EXPERIENCE OF A SINGLE PRIVATE PASSENGER CAR BY ROBERT A. BAILEY AND LEROY J. SIMON The experience of the Canadian merit rating plan for private passenger cars provides

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Automated Biosurveillance Data from England and Wales, 1991 2011

Automated Biosurveillance Data from England and Wales, 1991 2011 Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES Kivan Kaivanipour A thesis submitted for the degree of Master of Science in Engineering Physics Department

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

Régression logistique : introduction

Régression logistique : introduction Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence

More information

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x,

GLMs: Gompertz s Law. GLMs in R. Gompertz s famous graduation formula is. or log µ x is linear in age, x, Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus

More information

Time Series Analysis

Time Series Analysis Time Series 1 April 9, 2013 Time Series Analysis This chapter presents an introduction to the branch of statistics known as time series analysis. Often the data we collect in environmental studies is collected

More information

Logistic regression (with R)

Logistic regression (with R) Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as

More information

Time-Series Regression and Generalized Least Squares in R

Time-Series Regression and Generalized Least Squares in R Time-Series Regression and Generalized Least Squares in R An Appendix to An R Companion to Applied Regression, Second Edition John Fox & Sanford Weisberg last revision: 11 November 2010 Abstract Generalized

More information

Examining a Fitted Logistic Model

Examining a Fitted Logistic Model STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

More information

Computer exercise 4 Poisson Regression

Computer exercise 4 Poisson Regression Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

More information

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

More information

GLM with a Gamma-distributed Dependent Variable

GLM with a Gamma-distributed Dependent Variable GLM with a Gamma-distributed Dependent Variable Paul E. Johnson October 6, 204 Introduction I started out to write about why the Gamma distribution in a GLM is useful. In the end, I ve found it difficult

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Vector Time Series Model Representations and Analysis with XploRe

Vector Time Series Model Representations and Analysis with XploRe 0-1 Vector Time Series Model Representations and Analysis with plore Julius Mungo CASE - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin [email protected] plore MulTi Motivation

More information

A Mixed Copula Model for Insurance Claims and Claim Sizes

A Mixed Copula Model for Insurance Claims and Claim Sizes A Mixed Copula Model for Insurance Claims and Claim Sizes Claudia Czado, Rainer Kastenmeier, Eike Christian Brechmann, Aleksey Min Center for Mathematical Sciences, Technische Universität München Boltzmannstr.

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

Causal Forecasting Models

Causal Forecasting Models CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Testing for Lack of Fit

Testing for Lack of Fit Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

More information

15.1 The Structure of Generalized Linear Models

15.1 The Structure of Generalized Linear Models 15 Generalized Linear Models Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described

More information

Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination

Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination Australian Journal of Basic and Applied Sciences, 5(7): 1190-1198, 2011 ISSN 1991-8178 Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination 1 Mohamed Amraja

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds

Combining Linear and Non-Linear Modeling Techniques: EMB America. Getting the Best of Two Worlds Combining Linear and Non-Linear Modeling Techniques: Getting the Best of Two Worlds Outline Who is EMB? Insurance industry predictive modeling applications EMBLEM our GLM tool How we have used CART with

More information

ANOVA. February 12, 2015

ANOVA. February 12, 2015 ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

More information

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA

EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA EMPIRICAL RISK MINIMIZATION FOR CAR INSURANCE DATA Andreas Christmann Department of Mathematics homepages.vub.ac.be/ achristm Talk: ULB, Sciences Actuarielles, 17/NOV/2006 Contents 1. Project: Motor vehicle

More information

Sales forecasting # 2

Sales forecasting # 2 Sales forecasting # 2 Arthur Charpentier [email protected] 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

A Deeper Look Inside Generalized Linear Models

A Deeper Look Inside Generalized Linear Models A Deeper Look Inside Generalized Linear Models University of Minnesota February 3 rd, 2012 Nathan Hubbell, FCAS Agenda Property & Casualty (P&C Insurance) in one slide The Actuarial Profession Travelers

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Lecture 18: Logistic Regression Continued

Lecture 18: Logistic Regression Continued Lecture 18: Logistic Regression Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Lab 13: Logistic Regression

Lab 13: Logistic Regression Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account

More information

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

L3: Statistical Modeling with Hadoop

L3: Statistical Modeling with Hadoop L3: Statistical Modeling with Hadoop Feng Li [email protected] School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...

More information

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1. **BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,

More information

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims

The zero-adjusted Inverse Gaussian distribution as a model for insurance claims The zero-adjusted Inverse Gaussian distribution as a model for insurance claims Gillian Heller 1, Mikis Stasinopoulos 2 and Bob Rigby 2 1 Dept of Statistics, Macquarie University, Sydney, Australia. email:

More information

Smoothing and Non-Parametric Regression

Smoothing and Non-Parametric Regression Smoothing and Non-Parametric Regression Germán Rodríguez [email protected] Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION

ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided

More information

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA

EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA EVALUATION OF PROBABILITY MODELS ON INSURANCE CLAIMS IN GHANA E. J. Dadey SSNIT, Research Department, Accra, Ghana S. Ankrah, PhD Student PGIA, University of Peradeniya, Sri Lanka Abstract This study investigates

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Comparing Nested Models

Comparing Nested Models Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Examples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert.

Examples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert. Cornell University April 25, 2009 Outline 1 2 3 4 A little about myself BA and MA in mathematics PhD in statistics in 1977 taught in the statistics department at North Carolina for 10 years have been in

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

2. Linear regression with multiple regressors

2. Linear regression with multiple regressors 2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

More information

Regression Models for Time Series Analysis

Regression Models for Time Series Analysis Regression Models for Time Series Analysis Benjamin Kedem 1 and Konstantinos Fokianos 2 1 University of Maryland, College Park, MD 2 University of Cyprus, Nicosia, Cyprus Wiley, New York, 2002 1 Cox (1975).

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 005 by the Society of Actuaries and the Casualty Actuarial Society

More information

Nonlinear Regression Functions. SW Ch 8 1/54/

Nonlinear Regression Functions. SW Ch 8 1/54/ Nonlinear Regression Functions SW Ch 8 1/54/ The TestScore STR relation looks linear (maybe) SW Ch 8 2/54/ But the TestScore Income relation looks nonlinear... SW Ch 8 3/54/ Nonlinear Regression General

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Non-life insurance mathematics Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring Overview Important issues Models treated Curriculum Duration (in lectures) What is driving the result of a

More information

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014. University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models Factor Analysis Principal components factor analysis Use of extracted factors in multivariate dependency models 2 KEY CONCEPTS ***** Factor Analysis Interdependency technique Assumptions of factor analysis

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

7 Generalized Estimating Equations

7 Generalized Estimating Equations Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can

More information

Offset Techniques for Predictive Modeling for Insurance

Offset Techniques for Predictive Modeling for Insurance Offset Techniques for Predictive Modeling for Insurance Matthew Flynn, Ph.D, ISO Innovative Analytics, W. Hartford CT Jun Yan, Ph.D, Deloitte & Touche LLP, Hartford CT ABSTRACT This paper presents the

More information