Generalized Linear Model. Badr Missaoui


2 Logistic Regression
Outline: generalized linear models, deviance, logistic regression.

3 All models we have seen so far deal with continuous outcome variables with no restriction on their expectations, and most have assumed that the mean and the variance are unrelated (i.e. the variance is constant). Many outcomes of interest do not satisfy this. Examples: binary outcomes, Poisson count outcomes.
A Generalized Linear Model (GLM) is a model with two ingredients: a link function and a variance function. The link function relates the means of the observations to the predictors (linearization). The variance function relates the means to the variances.

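4 The data involve 462 males between the ages of 15 and 64. The outcome Y is the presence (Y = 1) or absence (Y = 0) of heart disease.
Coefficients of the fitted logistic model (columns Estimate, Std. Error, z value, Pr(>|z|); the numeric values did not survive transcription, only the significance codes and two p-value exponents remain):
(Intercept): p-value of order e-06, ***
sbp: no significance code
tobacco: **
ldl: **
adiposity: no significance code
famhistpresent: p-value of order e-05, ***
typea: **
obesity: no significance code
alcohol: no significance code
age: ***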

5 Motivation
Classical linear model: $Y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, \sigma^2)$. That means $Y \sim N(X\beta, \sigma^2)$.
In the GLM, we specify that $Y \sim P(X\beta)$.

6 We write the GLM as
$$E(Y_i) = \mu_i \quad \text{and} \quad \eta_i = g(\mu_i) = X_i \beta,$$
where the function $g$ is called the link function and the distribution of $Y_i$ belongs to an exponential family.

7 An exponential family density is specified by two components, the canonical parameter $\theta$ and the dispersion parameter $\phi$. Let $Y = (Y_i)_{i=1,\dots,n}$ be a sequence of random variables. $Y_i$ has an exponential family density if
$$f_{Y_i}(y_i; \theta_i, \phi) = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right),$$
where the functions $b$ and $c$ are specific to each distribution and $a_i(\phi) = \phi / w_i$.

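8 Common laws, with their density, mean $\mu$ and variance $\sigma^2$:
Binomial $B(m, p)$: density $\binom{m}{y} p^y (1-p)^{m-y}$ with respect to $\sum_{k=0}^{m} \delta_{\{k\}}$; mean $mp$; variance $mp(1-p)$.
Poisson $P(\mu)$: density $\mu^y e^{-\mu}$ with respect to $\sum_{k=0}^{\infty} \frac{1}{k!} \delta_{\{k\}}$; mean $\mu$; variance $\mu$.
Gaussian $N(\mu, \sigma^2)$: density $\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\}$ with respect to $dy$; mean $\mu$; variance $\sigma^2$.
Inverse Gaussian $IG(\mu, \lambda)$: density $\sqrt{\frac{\lambda}{2\pi y^3}} \exp\left\{ -\frac{\lambda (y-\mu)^2}{2\mu^2 y} \right\}$ with respect to $dy$; mean $\mu$; variance $\mu^3/\lambda$.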

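9 We write $l(y; \theta, \phi) = \log f(y; \theta, \phi)$ for the log-likelihood function of $Y$. Using the facts that
$$E\left( \frac{\partial l}{\partial \theta} \right) = 0 \quad \text{and} \quad \mathrm{Var}\left( \frac{\partial l}{\partial \theta} \right) = -E\left( \frac{\partial^2 l}{\partial \theta^2} \right),$$
we have
$$E(y) = b'(\theta) \quad \text{and} \quad \mathrm{Var}(y) = b''(\theta)\, a(\phi).$$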
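The two moment identities follow in a few lines from the exponential family form of the density. The derivation below is a sketch that only uses the definitions already given on the slides (the functions $b$, $c$ and $a(\phi)$); no new assumptions are introduced.

```latex
% Log-likelihood of one observation
l(y;\theta,\phi) = \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)

% First derivative and its zero expectation give the mean
\frac{\partial l}{\partial\theta} = \frac{y - b'(\theta)}{a(\phi)},
\qquad
E\!\left(\frac{\partial l}{\partial\theta}\right) = 0
\;\Longrightarrow\; E(y) = b'(\theta)

% Second derivative and the information identity give the variance
\frac{\partial^2 l}{\partial\theta^2} = -\frac{b''(\theta)}{a(\phi)},
\qquad
\mathrm{Var}\!\left(\frac{\partial l}{\partial\theta}\right)
  = \frac{\mathrm{Var}(y)}{a(\phi)^2}
  = \frac{b''(\theta)}{a(\phi)}
\;\Longrightarrow\; \mathrm{Var}(y) = b''(\theta)\,a(\phi)
```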

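10 Gaussian case
$$f(y; \theta, \phi) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(y-\mu)^2}{2\sigma^2} \right] = \exp\left( \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left( \frac{y^2}{\sigma^2} + \log(2\pi\sigma^2) \right) \right)$$
We can write $\theta = \mu$, $\phi = \sigma^2$, $a(\phi) = \phi$, $b(\theta) = \theta^2/2$ and $c(y, \phi) = -\frac{1}{2}\left( \frac{y^2}{\sigma^2} + \log(2\pi\sigma^2) \right)$.
Binomial case
$$f(y; \theta, \phi) = \binom{n}{y} \mu^y (1-\mu)^{n-y} = \exp\left( y \log\frac{\mu}{1-\mu} + n \log(1-\mu) + \log\binom{n}{y} \right)$$
We can write $\theta = \log\frac{\mu}{1-\mu}$, $b(\theta) = -n \log(1-\mu)$ and $c(y, \phi) = \log\binom{n}{y}$.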

11 Recall that in ordinary linear models, the MLE of $\beta$ satisfies
$$\hat\beta = (X^T X)^{-1} X^T Y$$
if $X$ has full rank. In a GLM, the MLE $\hat\beta$ generally has no closed form and is computed numerically by iteratively reweighted least squares.

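12 For n observations, the log-likelihood function is
$$L(\beta) = \sum_{i=1}^{n} l(y_i; \theta_i, \phi).$$
Computing, by the chain rule,
$$\frac{\partial l_i}{\partial \beta_j} = \frac{\partial l_i}{\partial \theta_i} \frac{\partial \theta_i}{\partial \mu_i} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta_j} = \frac{y_i - \mu_i}{\phi / w_i} \cdot \frac{1}{b''(\theta_i)} \cdot \frac{1}{g'(\mu_i)} \cdot x_{ij}.$$
Since $\mathrm{Var}(y_i) = b''(\theta_i)\,\phi / w_i$ and $\partial \eta_i / \partial \mu_i = g'(\mu_i)$, the likelihood equations are
$$\frac{\partial L}{\partial \beta_j} = \sum_{i=1}^{n} x_{ij}\, \frac{1}{g'(\mu_i)^2\, \mathrm{Var}(y_i)}\, \frac{\partial \eta_i}{\partial \mu_i}\, (y_i - \mu_i) = 0, \quad j = 1, \dots, p.$$
Put
$$W = \mathrm{diag}\left\{ g'(\mu_i)^2\, \mathrm{Var}(y_i) \right\}_{i=1,\dots,n} \quad \text{and} \quad \frac{\partial \eta}{\partial \mu} = \mathrm{diag}\left\{ \frac{\partial \eta_i}{\partial \mu_i} \right\}_{i=1,\dots,n}.$$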

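13 With $W = \mathrm{diag}\{ g'(\mu_i)^2\, \mathrm{Var}(y_i) \}$ and $\partial\eta/\partial\mu = \mathrm{diag}\{ \partial\eta_i/\partial\mu_i \}$, the likelihood equations are
$$X^T W^{-1} \frac{\partial \eta}{\partial \mu} (y - \mu) = 0.$$
These equations are non-linear in $\beta$ and require an iterative method (e.g. Newton-Raphson). The Fisher information matrix is
$$I = X^T W^{-1} X,$$
with general term
$$[I]_{jk} = -E\left( \frac{\partial^2 L(\beta)}{\partial \beta_j\, \partial \beta_k} \right) = \sum_{i=1}^{n} \frac{x_{ij}\, x_{ik}}{\mathrm{Var}(y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2.$$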

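14 Let $\hat\mu_0 = Y$ be the initial estimate. Then set $\hat\eta_0 = g(\hat\mu_0)$ and form the adjusted variable
$$Z_0 = \hat\eta_0 + (Y - \hat\mu_0) \left. \frac{\partial \eta}{\partial \mu} \right|_{\mu = \hat\mu_0}.$$
Calculate $\hat\beta_1$ by the weighted least squares regression of $Z_0$ on $X$, that is,
$$\hat\beta_1 = \arg\min_\beta\, (Z_0 - X\beta)^T W_0^{-1} (Z_0 - X\beta),$$
so
$$\hat\beta_1 = (X^T W_0^{-1} X)^{-1} X^T W_0^{-1} Z_0.$$
Set
$$\hat\eta_1 = X \hat\beta_1, \quad \hat\mu_1 = g^{-1}(\hat\eta_1).$$
Repeat until the changes in $\hat\beta_m$ are sufficiently small.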
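The iteration can be sketched in a few lines of R. This is a minimal illustration on simulated Poisson data with a log link, not the slides' own code: the data, the variable names and the small offset in the starting value (which avoids log(0) when a count is zero) are all assumptions made for the example.

```r
# Simulated Poisson data (all names and values here are illustrative).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))
X <- cbind(1, x)                        # design matrix with intercept

mu   <- y + 0.1                         # start near Y; the offset avoids log(0)
eta  <- log(mu)                         # eta_0 = g(mu_0)
beta <- c(0, 0)
for (m in 1:50) {
  g_prime <- 1 / mu                     # g'(mu) = d eta / d mu for the log link
  z       <- eta + (y - mu) * g_prime   # adjusted variable Z
  w_inv   <- 1 / (g_prime^2 * mu)       # diagonal of W^{-1}, using Var(y_i) = mu_i
  beta_new <- as.vector(solve(t(X) %*% (w_inv * X), t(X) %*% (w_inv * z)))
  delta <- max(abs(beta_new - beta))
  beta  <- beta_new
  eta   <- as.vector(X %*% beta)
  mu    <- exp(eta)                     # mu = g^{-1}(eta)
  if (delta < 1e-10) break              # stop when beta no longer changes
}

# Compare with R's built-in fit, which uses the same kind of iteration.
fit <- glm(y ~ x, family = poisson)
print(cbind(irls = beta, glm = coef(fit)))
```

The two columns should agree to several decimal places, since `glm` also fits by iteratively reweighted least squares.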

15 Estimation
In theory, $\hat\beta_m \to \hat\beta$ as $m \to \infty$, but in practice the algorithm may fail to converge. Under some conditions,
$$\hat\beta \sim N(\beta, I^{-1}(\beta)).$$
In practice, the asymptotic covariance matrix of $\hat\beta$ is estimated by
$$\phi\, (X^T W_m^{-1} X)^{-1},$$
where $W_m$ is the weight matrix from the $m$-th iteration. If $\phi$ is unknown, it is estimated by
$$\hat\phi = \frac{1}{n - p} \sum_{i=1}^{n} \frac{w_i (y_i - \hat\mu_i)^2}{V(\hat\mu_i)},$$
where $V(\hat\mu_i) = \mathrm{Var}(y_i) / a_i(\phi) = w_i\, \mathrm{Var}(y_i) / \phi$.
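As a hedged illustration of the dispersion estimate, the sketch below computes $\hat\phi$ by hand for a quasi-Poisson fit, where $V(\mu) = \mu$ and all prior weights $w_i$ equal 1, and prints it next to the value reported by `summary()`. The simulated overdispersed counts and the variable names are assumptions made for the example.

```r
# Overdispersed counts simulated from a negative binomial (illustrative only).
set.seed(2)
n <- 300
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + x), size = 2)

fit    <- glm(y ~ x, family = quasipoisson)
mu_hat <- fitted(fit)
p      <- length(coef(fit))

# phi_hat = 1/(n - p) * sum( w_i (y_i - mu_i)^2 / V(mu_i) ), with w_i = 1, V(mu) = mu
phi_hat <- sum((y - mu_hat)^2 / mu_hat) / (n - p)
print(c(manual = phi_hat, glm = summary(fit)$dispersion))
```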

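16 Confidence interval
$$CI_\alpha(\beta_j) = \left[ \hat\beta_j - u_{1-\alpha/2} \frac{1}{\sqrt{n}} \hat\sigma_{\beta_j} \,;\; \hat\beta_j + u_{1-\alpha/2} \frac{1}{\sqrt{n}} \hat\sigma_{\beta_j} \right],$$
where $u_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of $N(0, 1)$ and
$$\hat\sigma_{\beta_j} = \sqrt{ \left[ \left( \frac{1}{n} I(\hat\beta) \right)^{-1} \right]_{jj} }.$$
To test the hypothesis $H_0 : \beta_j = 0$ against $H_1 : \beta_j \neq 0$, we use
$$\frac{\hat\beta_j}{\sqrt{ \phi\, (X^T W_m^{-1} X)^{-1}_{(j,j)} }} \sim N(0, 1),$$
and, if $\phi$ is unknown,
$$\frac{\hat\beta_j}{\sqrt{ \hat\phi\, (X^T W_m^{-1} X)^{-1}_{(j,j)} }} \sim t_{n-p}.$$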
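The interval and the z test can be reproduced from a fitted model object. The sketch below uses simulated binary data (an assumption for the example) and the standard R accessors `coef()` and `vcov()`; the square roots of the diagonal of `vcov(fit)` play the role of $\hat\sigma_{\beta_j}/\sqrt{n}$.

```r
# Simulated binary outcome (illustrative data and coefficients).
set.seed(3)
n <- 250
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))          # estimated standard errors of beta_hat
u   <- qnorm(1 - 0.05 / 2)            # u_{1 - alpha/2} for alpha = 0.05

ci   <- cbind(lower = est - u * se, upper = est + u * se)
z    <- est / se                      # Wald statistic for H0: beta_j = 0
pval <- 2 * pnorm(-abs(z))            # two-sided p-value (phi = 1 for binomial)

print(ci)
print(cbind(z, pval))
```

The same interval is returned by `confint.default(fit)`, and the z values and p-values match the coefficient table of `summary(fit)`.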

17 Goodness-of-Fit
$H_0$: the true model is $M$ versus $H_1$: the true model is $M_{sat}$ (the saturated model).
The likelihood ratio test statistic for this hypothesis is called the deviance. For any submodel $M$,
$$\mathrm{dev}(M) = 2(\hat l_{sat} - \hat l_M).$$
Under $H_0$, $\mathrm{dev}(M) \sim \chi^2_{p_{sat} - p}$.

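18 Goodness-of-Fit
The scaled deviance for a GLM is
$$D(y, \hat\mu) = 2\left[ l(\hat\mu^{sat}, \phi; y) - l(\hat\mu, \phi; y) \right] = \sum_{i=1}^{n} 2 w_i \left\{ y_i \left( \theta(\hat\mu_i^{sat}) - \theta(\hat\mu_i) \right) - b(\theta(\hat\mu_i^{sat})) + b(\theta(\hat\mu_i)) \right\} / \phi$$
$$= \sum_{i=1}^{n} D^*(y_i; \hat\mu_i) / \phi = D^*(y; \hat\mu) / \phi.$$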

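19 Tests
We use the deviance to compare two nested models having $p_1$ and $p_2$ parameters respectively, where $p_1 < p_2$. Let $\hat\mu_1$ and $\hat\mu_2$ denote the corresponding MLEs. Then
$$D(y, \hat\mu_1) - D(y, \hat\mu_2) \sim \chi^2_{p_2 - p_1}.$$
If $\phi$ is unknown, the statistic
$$\frac{D^*(y, \hat\mu_1) - D^*(y, \hat\mu_2)}{(p_2 - p_1)\, \hat\phi}$$
is compared with the quantile $F_{1-\alpha,\, p_2 - p_1,\, n - p_2}$.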
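A minimal sketch of the deviance comparison for two nested logistic models follows. The data are simulated and the model names `fit1` and `fit2` are hypothetical; for the binomial family $\phi = 1$, so the chi-square version of the test applies.

```r
# Two nested logistic models on simulated data (x2 has no true effect).
set.seed(4)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.9 * x1))

fit1 <- glm(y ~ x1,      family = binomial)   # p1 = 2 parameters
fit2 <- glm(y ~ x1 + x2, family = binomial)   # p2 = 3 parameters

stat <- deviance(fit1) - deviance(fit2)       # D(y, mu1) - D(y, mu2)
df   <- df.residual(fit1) - df.residual(fit2) # p2 - p1
pval <- pchisq(stat, df = df, lower.tail = FALSE)
print(c(statistic = stat, df = df, p.value = pval))

# The same test through R's analysis-of-deviance table.
print(anova(fit1, fit2, test = "Chisq"))
```

For families with an estimated dispersion (for example `quasipoisson` or `Gamma`), `anova(fit1, fit2, test = "F")` gives the F version of the comparison.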

20 Goodness-of-Fit
The deviance residuals for a given model are
$$d_i = \mathrm{sign}(y_i - \hat\mu_i) \sqrt{ D^*(y_i; \hat\mu_i) }.$$
A poorly fitting point makes a large contribution to the deviance, so $|d_i|$ will be large.

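21 Diagnostics
The Pearson residuals are defined by
$$r_i = \frac{y_i - \hat\mu_i}{\sqrt{ (1 - h_{ii})\, V(\hat\mu_i) }},$$
where $h_{ii}$ is the $i$-th diagonal element of
$$H = X (X^T W_m^{-1} X)^{-1} X^T W_m^{-1}.$$
The deviance residuals are
$$\hat\varepsilon_i = \frac{ \mathrm{sign}(y_i - \hat\mu_i) \sqrt{ D^*(y_i; \hat\mu_i) } }{ \sqrt{1 - h_{ii}} }.$$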
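These standardized residuals can be computed from a fitted model with `residuals()` and `hatvalues()`. The sketch below assumes a Poisson model on simulated data, so that $\phi = 1$ and $V(\mu) = \mu$; the variable names are illustrative.

```r
# Simulated Poisson regression (illustrative data).
set.seed(5)
n <- 100
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.5 * x))
fit <- glm(y ~ x, family = poisson)

h <- hatvalues(fit)                                        # leverages h_ii
r_pearson  <- residuals(fit, type = "pearson")  / sqrt(1 - h)
r_deviance <- residuals(fit, type = "deviance") / sqrt(1 - h)

print(head(cbind(h, r_pearson, r_deviance)))
```

R's `rstandard(fit)` and `rstandard(fit, type = "pearson")` return the same quantities, up to the dispersion factor, which is 1 here.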

22 Diagnostics
The Anscombe residuals are defined through a transformation $t$ of the response:
$$r_i^A = \frac{ t(y_i) - t(\hat\mu_i) }{ t'(\hat\mu_i) \sqrt{ \phi\, V(\hat\mu_i)\, (1 - h_{ii}) } }.$$
The aim in introducing the function $t$ is to make the residuals as Gaussian as possible. We consider
$$t(x) = \int_0^x V(\mu)^{-1/3}\, d\mu.$$

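23 Diagnostics
Influential points are detected using Cook's distance
$$C_i = \frac{1}{p} (\hat\beta_{(i)} - \hat\beta)^T X^T W_m^{-1} X (\hat\beta_{(i)} - \hat\beta) \approx \frac{ r_i^2\, h_{ii} }{ p\, (1 - h_{ii})^2 },$$
where $\hat\beta_{(i)}$ is the estimate obtained without the $i$-th observation.
Outlying points: if $h_{ii} > 2p/n$ (or, more conservatively, $h_{ii} > 3p/n$), we consider that the $i$-th point is an outlier (a high-leverage point).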
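The approximation and the leverage rule can be checked numerically. In this sketch the data are simulated, $\phi = 1$ (binomial family), and $r_i$ is taken to be the raw Pearson residual, which is the convention used by R's `cooks.distance()`; that reading of $r_i$ is an assumption of the example.

```r
# Simulated logistic regression (illustrative data).
set.seed(6)
n <- 80
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.4 * x))
fit <- glm(y ~ x, family = binomial)

p <- length(coef(fit))
h <- hatvalues(fit)                        # leverages h_ii
r <- residuals(fit, type = "pearson")      # raw Pearson residuals (assumed r_i)

cook <- r^2 * h / (p * (1 - h)^2)          # one-step approximation of C_i
high_leverage <- which(h > 2 * p / n)      # rule of thumb h_ii > 2p/n

print(head(sort(cook, decreasing = TRUE)))
print(high_leverage)
```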

24 Model Selection
Model selection can be done using the AIC and the BIC. Forward, backward and stepwise approaches can be used.

25 Logistic regression
Logistic regression is a generalization of regression that is used when the outcome $Y$ is binary (0 or 1). As an example, we assume that
$$P(Y_i = 1 \mid X_i) = \frac{ e^{\beta_0 + \beta_1 X_i} }{ 1 + e^{\beta_0 + \beta_1 X_i} }.$$
Note that $E(Y_i \mid X_i) = P(Y_i = 1 \mid X_i)$.

26 Logistic regression
Define the logit function
$$\mathrm{logit}(z) = \log\left( \frac{z}{1 - z} \right).$$
We can write
$$\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 X_i,$$
where $\pi_i = P(Y_i = 1 \mid X_i)$. The extension to several covariates is
$$\mathrm{logit}(\pi_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.$$
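A short numerical sketch shows that the logit and the logistic function are inverses of each other. The coefficients `beta0` and `beta1` and the covariate values are hypothetical, chosen only for illustration.

```r
# The logit and its inverse, written out from the definitions.
logit     <- function(z) log(z / (1 - z))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

beta0 <- -1                         # hypothetical intercept
beta1 <- 2                          # hypothetical slope
x     <- c(-1, 0, 0.5, 2)

pi_x <- inv_logit(beta0 + beta1 * x)   # P(Y = 1 | X = x), always in (0, 1)
print(pi_x)
print(logit(pi_x))                     # recovers the linear predictor beta0 + beta1 * x
```

Base R provides the same pair of functions as `plogis()` (inverse logit) and `qlogis()` (logit).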

27 How do we estimate the parameters? The model can be fit using maximum likelihood. The likelihood function is
$$L(\beta) = \prod_{i=1}^{n} f(y_i \mid X_i; \beta) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}.$$
The estimator $\hat\beta$ has to be found numerically.

28 Usually, we use iteratively reweighted least squares:
First set a starting value $\beta^{(0)}$.
Compute
$$\hat\pi_i = \frac{ e^{X_i \beta^{(k)}} }{ 1 + e^{X_i \beta^{(k)}} }.$$
Define the weight matrix $W$ whose $i$-th diagonal element is $\hat\pi_i (1 - \hat\pi_i)$.
Define the adjusted response vector
$$Z = X \beta^{(k)} + W^{-1} (Y - \hat\pi).$$
Take
$$\hat\beta^{(k+1)} = (X^T W X)^{-1} X^T W Z,$$
which is the weighted linear regression of $Z$ on $X$.
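The steps above can be sketched directly in R. The data are simulated, and the starting value, iteration cap and tolerance are arbitrary choices made for the example; the result is printed next to `glm` for comparison.

```r
# Simulated binary data (illustrative coefficients).
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 1.2 * x))
X <- cbind(1, x)

beta <- c(0, 0)                                  # starting value beta^(0)
for (k in 1:25) {
  eta    <- as.vector(X %*% beta)
  pi_hat <- exp(eta) / (1 + exp(eta))            # fitted probabilities
  w      <- pi_hat * (1 - pi_hat)                # diagonal of W
  z      <- eta + (y - pi_hat) / w               # Z = X beta + W^{-1} (Y - pi_hat)
  beta_new  <- as.vector(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = binomial)
print(cbind(irls = beta, glm = coef(fit)))
```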

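29 Model selection and diagnostics
Diagnostics: the Pearson residuals
$$\frac{ Y_i - \hat\pi_i }{ \sqrt{ \hat\pi_i (1 - \hat\pi_i) } }.$$
The deviance residuals
$$\mathrm{sign}(Y_i - \hat\pi_i) \sqrt{ 2 \left[ Y_i \log\left( \frac{Y_i}{\hat\pi_i} \right) + (1 - Y_i) \log\left( \frac{1 - Y_i}{1 - \hat\pi_i} \right) \right] }.$$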

30 To fit this model, we use the glm command. (The numeric values of the output did not survive transcription; the structure and the significance codes are kept.)
Call:
glm(formula = chd ~ ., family = binomial, data = SAheart)

Deviance Residuals: Min, 1Q, Median, 3Q, Max

Coefficients (Estimate, Std. Error, z value, Pr(>|z|)):
(Intercept)      e-06 ***
row.names
sbp
tobacco          **
ldl              **
adiposity
famhistpresent   e-05 ***
typea            **
obesity
alcohol
age              ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 461 degrees of freedom
Residual deviance: on 451 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

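31 Stepwise selection on the fitted model. (The numeric values of the output did not survive transcription; the order of the terms is kept.)
Start: AIC=
chd ~ row.names + sbp + tobacco + ldl + adiposity + famhist + typea + obesity + alcohol + age

Columns: Df, Deviance, AIC. Rows, in the order printed:
- alcohol
- adiposity
- row.names
- sbp
<none>
- obesity
- ldl
- tobacco
- typea
- age
- famhist

etc...

Step: AIC=
chd ~ tobacco + ldl + famhist + typea + age

Columns: Df, Deviance, AIC. Rows, in the order printed:
<none>
- ldl
- typea
- tobacco
- famhist
- age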
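A trace of this kind is produced by R's `step()` function, which removes or adds terms according to the AIC. The sketch below runs a backward search on simulated data rather than on the heart disease data, since that data set is not part of base R; the data frame `d` and its columns are hypothetical.

```r
# Simulated data in which only x1 has a true effect (illustrative).
set.seed(8)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * d$x1))

full    <- glm(y ~ ., family = binomial, data = d)
reduced <- step(full, direction = "backward", trace = 0)   # trace = 1 prints the steps

print(formula(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```

Passing `k = log(n)` to `step()` replaces the AIC penalty by the BIC penalty.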

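32 Suppose $Y_i \sim \mathrm{Binomial}(n_i, \pi_i)$. We can fit the logistic model as before:
$$\mathrm{logit}(\pi_i) = X_i \beta.$$
Pearson residuals
$$r_i = \frac{ Y_i - n_i \hat\pi_i }{ \sqrt{ n_i \hat\pi_i (1 - \hat\pi_i) } }.$$
Deviance residuals
$$d_i = \mathrm{sign}(Y_i - \hat Y_i) \sqrt{ 2 \left[ Y_i \log\left( \frac{Y_i}{\hat\mu_i} \right) + (n_i - Y_i) \log\left( \frac{n_i - Y_i}{n_i - \hat\mu_i} \right) \right] },$$
where $\hat\mu_i = \hat Y_i = n_i \hat\pi_i$.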

33 Goodness-of-Fit test
The Pearson test statistic and the deviance
$$\chi^2 = \sum_i r_i^2, \qquad D = \sum_i d_i^2$$
both have approximately a $\chi^2_{n-p}$ distribution if the model is correct.

34 To fit this model, we use the glm command. (The numeric values of the output did not survive transcription; the structure and the significance codes are kept.)
Call:
glm(formula = cbind(y, n - y) ~ x, family = binomial)

Deviance Residuals: Min, 1Q, Median, 3Q, Max

Coefficients (Estimate, Std. Error, z value, Pr(>|z|)):
(Intercept)      e-16 ***
x                e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 7 degrees of freedom
Residual deviance: on 6 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 4

35 To test the correctness of the model (out is the fitted glm object from the previous slide; the printed values did not survive transcription):
> pvalue = 1 - pchisq(out$deviance, out$df.residual)
> print(pvalue)
[1]
> r = resid(out, type = "deviance")
> p = out$linear.predictors
> plot(p, r, pch = 19, xlab = "linear predictor", ylab = "deviance residuals")
> print(sum(r^2))
[1]
> cooks.distance(out)
Note that the sum of the squared deviance residuals gives back the deviance, and the p-value is large, indicating no evidence of a lack of fit.
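The session above depends on data that are not included in the transcription. A self-contained sketch of the same check, on simulated grouped binomial data with 8 groups (so that the degrees of freedom match the 7 and 6 reported in the output), could look as follows; the group sizes and the true coefficients are assumptions made for the example.

```r
# Simulated grouped binomial data: 8 groups of 50 trials each (illustrative).
set.seed(9)
x <- seq(-2, 2, length.out = 8)
n <- rep(50, 8)
y <- rbinom(8, size = n, prob = plogis(0.3 + 1.1 * x))

out <- glm(cbind(y, n - y) ~ x, family = binomial)

# Deviance goodness-of-fit test against the saturated model.
pvalue <- 1 - pchisq(deviance(out), df.residual(out))

r       <- resid(out, type = "deviance")
pearson <- sum(resid(out, type = "pearson")^2)   # Pearson chi-square statistic

print(c(deviance = deviance(out), sum_r2 = sum(r^2),
        pearson = pearson, pvalue = pvalue))
```

A large p-value indicates no evidence of lack of fit; the Pearson statistic can be compared with the same $\chi^2_{n-p}$ reference distribution.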

Lecture 8: Gamma regression

Lecture 8: Gamma regression Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Lecture 6: Poisson regression

Lecture 6: Poisson regression Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

Multivariate Logistic Regression

Multivariate Logistic Regression 1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Régression logistique : introduction

Régression logistique : introduction Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - [email protected] Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

Lecture 14: GLM Estimation and Logistic Regression

Lecture 14: GLM Estimation and Logistic Regression Lecture 14: GLM Estimation and Logistic Regression Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter

More information

Lecture 18: Logistic Regression Continued

Lecture 18: Logistic Regression Continued Lecture 18: Logistic Regression Continued Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina

More information

Regression Models for Time Series Analysis

Regression Models for Time Series Analysis Regression Models for Time Series Analysis Benjamin Kedem 1 and Konstantinos Fokianos 2 1 University of Maryland, College Park, MD 2 University of Cyprus, Nicosia, Cyprus Wiley, New York, 2002 1 Cox (1975).

More information

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.

MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech. MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Examining a Fitted Logistic Model

Examining a Fitted Logistic Model STAT 536 Lecture 16 1 Examining a Fitted Logistic Model Deviance Test for Lack of Fit The data below describes the male birth fraction male births/total births over the years 1931 to 1990. A simple logistic

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES Kivan Kaivanipour A thesis submitted for the degree of Master of Science in Engineering Physics Department

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

15.1 The Structure of Generalized Linear Models

15.1 The Structure of Generalized Linear Models 15 Generalized Linear Models Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Basic Statistical and Modeling Procedures Using SAS

Basic Statistical and Modeling Procedures Using SAS Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

ANOVA. February 12, 2015

ANOVA. February 12, 2015 ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Logistic regression (with R)

Logistic regression (with R) Logistic regression (with R) Christopher Manning 4 November 2007 1 Theory We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as

More information

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

More information

Computer exercise 4 Poisson Regression

Computer exercise 4 Poisson Regression Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

More information

Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.

Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting. Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables - logistic regression Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming Underwriting risk control in non-life insurance via generalized linear models and stochastic programming 1 Introduction Martin Branda 1 Abstract. We focus on rating of non-life insurance contracts. We

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Automated Biosurveillance Data from England and Wales, 1991 2011

Automated Biosurveillance Data from England and Wales, 1991 2011 Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical

More information

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

We extended the additive model in two variables to the interaction model by adding a third term to the equation. Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Lab 13: Logistic Regression

Lab 13: Logistic Regression Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account

More information

Sections 2.11 and 5.8

Sections 2.11 and 5.8 Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

More information

L3: Statistical Modeling with Hadoop

L3: Statistical Modeling with Hadoop L3: Statistical Modeling with Hadoop Feng Li [email protected] School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

13. Poisson Regression Analysis

13. Poisson Regression Analysis 136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

Simple example of collinearity in logistic regression

1 Confounding and Collinearity in Multivariate Logistic Regression We have already seen confounding and collinearity in the context of linear regression, and all definitions and issues remain essentially

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

Nonnested model comparison of GLM and GAM count regression models for life insurance data

Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development

Chapter 13 Introduction to Nonlinear Regression (非線性迴歸)

Chapter 13 Introduction to Nonlinear Regression (非線性迴歸) and Neural Networks (類神經網路) 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression and Classification

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview: So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics: Behavioural

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents' attitudes to a particular question or statement. One must recall

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

Analysis of ordinal data with cumulative link models estimation with the R-package ordinal

Analysis of ordinal data with cumulative link models estimation with the R-package ordinal Rune Haubo B Christensen June 28, 2015 1 Contents 1 Introduction 3 2 Cumulative link models 4 2.1 Fitting cumulative

Master's Theory Exam Spring 2006

Master's Theory Exam Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER PROFESSOR IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction Principal Components Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

Section 6: Model Selection, Logistic Regression and more...

Section 6: Model Selection, Logistic Regression and more... Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Model Building

Smoothing and Non-Parametric Regression

Smoothing and Non-Parametric Regression Germán Rodríguez Spring, 2001 Objective: to estimate the effects of covariates X on a response y nonparametrically, letting the data suggest

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

Parallelization Strategies for Multicore Data Analysis

Parallelization Strategies for Multicore Data Analysis Wei-Chen Chen 1 Russell Zaretzki 2 1 University of Tennessee, Dept of EEB 2 University of Tennessee, Dept. Statistics, Operations, and Management

1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

GLMs: Gompertz's Law. GLMs in R. Gompertz's famous graduation formula is. or log µ x is linear in age, x,

Computing: an indispensable tool or an insurmountable hurdle? Iain Currie Heriot Watt University, Scotland ATRC, University College Dublin July 2006 Plan of talk General remarks The professional syllabus

Time-Series Regression and Generalized Least Squares in R

Time-Series Regression and Generalized Least Squares in R An Appendix to An R Companion to Applied Regression, Second Edition John Fox & Sanford Weisberg last revision: 11 November 2010 Abstract Generalized

Probabilistic concepts of risk classification in insurance

Probabilistic concepts of risk classification in insurance Emiliano A. Valdez Michigan State University East Lansing, Michigan, USA joint work with Katrien Antonio* * K.U. Leuven 7th International Workshop

Some Essential Statistics The Lure of Statistics

Some Essential Statistics The Lure of Statistics Data Mining Techniques, by M.J.A. Berry and G.S Linoff, 2004 Statistics vs. Data Mining..lie, damn lie, and statistics mining data to support preconceived
