Generalized Linear Model. Badr Missaoui



Outline: generalized linear models, deviance, logistic regression.

All the models we have seen so far deal with continuous outcome variables with no restriction on their expectations, and (most) have assumed that the mean and variance are unrelated (i.e. that the variance is constant). Many outcomes of interest do not satisfy this; examples are binary outcomes and Poisson count outcomes. A Generalized Linear Model (GLM) is a model with two ingredients: a link function and a variance function. The link function relates the means of the observations to the linear predictor (a linearization), and the variance function relates the variances to the means.
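In R, these two ingredients are bundled in the family object passed to glm; a small illustration (not from the slides; the object name fam is mine):

## A GLM family carries the link, its inverse, and the variance function.
fam <- binomial(link = "logit")
fam$linkfun(0.25)      # link g: mean -> linear predictor, logit(0.25) ~ -1.0986
fam$linkinv(-1.0986)   # inverse link g^{-1}, back to about 0.25
fam$variance(0.25)     # variance function V(mu) = mu * (1 - mu) = 0.1875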

The data involve 462 males between the ages of 15 and 64. The outcome Y is the presence (Y = 1) or absence (Y = 0) of heart disease.

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.9207616  1.3265724  -4.463 8.07e-06 ***
sbp             0.0076602  0.0058574   1.308 0.190942    
tobacco         0.0777962  0.0266602   2.918 0.003522 ** 
ldl             0.1701708  0.0597998   2.846 0.004432 ** 
adiposity       0.0209609  0.0294496   0.712 0.476617    
famhistpresent  0.9385467  0.2287202   4.103 4.07e-05 ***
typea           0.0376529  0.0124706   3.019 0.002533 ** 
obesity        -0.0661926  0.0443180  -1.494 0.135285    
alcohol         0.0004222  0.0045053   0.094 0.925346    
age             0.0441808  0.0121784   3.628 0.000286 ***

Motivation. Classical linear model: $Y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, \sigma^2)$. That means $Y \sim N(X\beta, \sigma^2)$. In the GLM, we specify instead that $Y \sim P(X\beta)$ for some other family of distributions $P$.

We write the GLM as
$$E(Y_i) = \mu_i \quad\text{and}\quad \eta_i = g(\mu_i) = X_i\beta,$$
where the function $g$ is called the link function and the distribution of $Y_i$ belongs to an exponential family.

An exponential family density is specified by two components, the canonical parameter $\theta$ and the dispersion parameter $\phi$. Let $Y = (Y_i)_{i=1,\dots,n}$ be a sequence of random variables. $Y_i$ has an exponential family density if
$$f_{Y_i}(y_i; \theta_i, \phi) = \exp\left( \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right),$$
where the functions $b$ and $c$ are specific to each distribution and $a_i(\phi) = \phi / w_i$.

Law                  Density                                                              µ        σ²
$B(m,p)$             $\binom{m}{y}\,p^y(1-p)^{m-y}$                                       $mp$     $mp(1-p)$
$P(\mu)$             $e^{-\mu}\mu^y/y!$                                                   $\mu$    $\mu$
$N(\mu,\sigma^2)$    $\frac{1}{\sigma\sqrt{2\pi}}\exp\{-\frac{(y-\mu)^2}{2\sigma^2}\}$    $\mu$    $\sigma^2$
$IG(\mu,\lambda)$    $\sqrt{\frac{\lambda}{2\pi y^3}}\exp\{-\frac{\lambda(y-\mu)^2}{2\mu^2 y}\}$    $\mu$    $\mu^3/\lambda$

We write $l(y; \theta, \phi) = \log f(y; \theta, \phi)$ for the log-likelihood function of $Y$. Using the facts that
$$E\left(\frac{\partial l}{\partial \theta}\right) = 0 \quad\text{and}\quad \operatorname{Var}\left(\frac{\partial l}{\partial \theta}\right) = -E\left(\frac{\partial^2 l}{\partial \theta^2}\right),$$
we have
$$E(y) = b'(\theta) \quad\text{and}\quad \operatorname{Var}(y) = b''(\theta)\,a(\phi).$$

Gaussian case:
$$f(y; \theta, \phi) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] = \exp\left(\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right)\right).$$
We can write $\theta = \mu$, $\phi = \sigma^2$, $a(\phi) = \phi$, $b(\theta) = \theta^2/2$ and $c(y, \phi) = -\frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right)$.

Binomial case:
$$f(y; \theta, \phi) = \binom{n}{y}\mu^y(1-\mu)^{n-y} = \exp\left(y\log\frac{\mu}{1-\mu} + n\log(1-\mu) + \log\binom{n}{y}\right).$$
We can write $\theta = \log\frac{\mu}{1-\mu}$, $b(\theta) = -n\log(1-\mu) = n\log(1+e^{\theta})$ and $c(y, \phi) = \log\binom{n}{y}$.
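As a further worked example along the same lines (not on the original slides), the Poisson case:
$$f(y; \mu) = \frac{e^{-\mu}\mu^y}{y!} = \exp\left(y\log\mu - \mu - \log y!\right),$$
so $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $a(\phi) = \phi = 1$ and $c(y, \phi) = -\log y!$. The identities above then give $E(y) = b'(\theta) = e^{\theta} = \mu$ and $\operatorname{Var}(y) = b''(\theta)\,a(\phi) = \mu$, as expected.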

Recall that in ordinary linear models, the MLE of $\beta$ satisfies $\hat\beta = (X^TX)^{-1}X^TY$ if $X$ has full rank. In a GLM, the MLE $\hat\beta$ does not have a closed form in general and is computed by iteratively reweighted least squares.

For $n$ observations, the log-likelihood function is
$$L(\beta) = \sum_{i=1}^n l(y_i; \theta_i, \phi).$$
By the chain rule,
$$\frac{\partial l_i}{\partial \beta_j} = \frac{\partial l_i}{\partial \theta_i}\,\frac{\partial \theta_i}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \frac{y_i - \mu_i}{\phi/w_i}\,\frac{1}{b''(\theta_i)}\,\frac{1}{g'(\mu_i)}\,x_{ij}.$$
The likelihood equations are
$$\frac{\partial L}{\partial \beta_j} = \sum_{i=1}^n x_{ij}\,\frac{y_i - \mu_i}{g'(\mu_i)\operatorname{Var}(y_i)} = 0, \qquad j = 1,\dots,p.$$
Put
$$W = \operatorname{diag}\left\{ g'(\mu_i)^2\operatorname{Var}(y_i) \right\}_{i=1,\dots,n} \quad\text{and}\quad \frac{\partial \mu}{\partial \eta} = \operatorname{diag}\left\{ \frac{\partial \mu_i}{\partial \eta_i} \right\}_{i=1,\dots,n}.$$

In matrix form, these likelihood equations are
$$X^T W^{-1}\,\frac{\partial \eta}{\partial \mu}\,(y - \mu) = 0,$$
where $\frac{\partial \eta}{\partial \mu} = \left(\frac{\partial \mu}{\partial \eta}\right)^{-1} = \operatorname{diag}\{g'(\mu_i)\}$. These equations are non-linear in $\beta$ and require an iterative method (e.g. Newton-Raphson). The Fisher information matrix is
$$I = X^T W^{-1} X,$$
and in general terms
$$[I]_{jk} = -E\left(\frac{\partial^2 L(\beta)}{\partial \beta_j \partial \beta_k}\right) = \sum_{i=1}^n \frac{x_{ij}x_{ik}}{\operatorname{Var}(y_i)}\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2.$$

Let $\hat\mu_0 = Y$ be the initial estimate. Then set $\hat\eta_0 = g(\hat\mu_0)$ and form the adjusted variable
$$Z_0 = \hat\eta_0 + (Y - \hat\mu_0)\left.\frac{\partial \eta}{\partial \mu}\right|_{\mu = \hat\mu_0}.$$
Calculate $\hat\beta_1$ by the weighted least squares regression of $Z_0$ on $X$, that is,
$$\hat\beta_1 = \arg\min_\beta\,(Z_0 - X\beta)^T W_0^{-1}(Z_0 - X\beta),$$
so
$$\hat\beta_1 = (X^T W_0^{-1} X)^{-1} X^T W_0^{-1} Z_0.$$
Set $\hat\eta_1 = X\hat\beta_1$ and $\hat\mu_1 = g^{-1}(\hat\eta_1)$. Repeat until the changes in $\hat\beta_m$ are sufficiently small.
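A minimal sketch of this algorithm in R for a Poisson GLM with log link (the toy data, variable names and tolerance are mine, not from the slides; glm uses essentially the same scheme internally):

## IRLS for Poisson regression with log link: Var(y) = mu, g(mu) = log(mu), g'(mu) = 1/mu.
set.seed(1)
n <- 200
x <- runif(n)
X <- cbind(1, x)                              # design matrix with an intercept
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))    # simulated counts

beta <- rep(0, ncol(X))                       # starting value beta^(0)
for (m in 1:25) {
  eta <- drop(X %*% beta)                     # linear predictor eta = X beta
  mu  <- exp(eta)                             # inverse link mu = g^{-1}(eta)
  Z   <- eta + (y - mu) / mu                  # adjusted variable: eta + (y - mu) * g'(mu)
  w   <- mu                                   # working weight 1/(g'(mu)^2 Var(y)), i.e. the W^{-1} of the slides
  beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * Z))   # weighted least squares step
  if (max(abs(beta_new - beta)) < 1e-8) { beta <- beta_new; break }
  beta <- beta_new
}
drop(beta)                                    # IRLS estimate
coef(glm(y ~ x, family = poisson))            # agrees with R's own fit

The same loop works for any exponential family once the link, its derivative and the variance function are swapped in.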

Estimation. In theory $\hat\beta_m \to \hat\beta$ as $m \to \infty$, but in practice the algorithm may fail to converge. Under some conditions,
$$\hat\beta \approx N\left(\beta, I^{-1}(\beta)\right).$$
In practice, the asymptotic covariance matrix of $\hat\beta$ is estimated by $\phi\,(X^T W_m^{-1} X)^{-1}$, where $W_m$ is the weight matrix from the $m$-th (final) iteration. If $\phi$ is unknown, it is estimated by
$$\hat\phi = \frac{1}{n-p}\sum_{i=1}^n \frac{w_i\,(y_i - \hat\mu_i)^2}{V(\hat\mu_i)},$$
where $V(\hat\mu_i) = \operatorname{Var}(y_i)/a_i(\phi) = w_i\operatorname{Var}(y_i)/\phi$.
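A small sketch of this Pearson-based estimate of $\phi$ in R, and how it matches what summary.glm reports for a quasi family (toy data and names are mine):

## Dispersion estimate phi_hat = (1/(n - p)) * sum of squared Pearson residuals.
set.seed(2)
x <- runif(150)
y <- rpois(150, exp(1 + x))                   # toy counts, so phi should be close to 1
fit <- glm(y ~ x, family = quasipoisson)
phi_hat <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
phi_hat
summary(fit)$dispersion                       # the same quantity, as computed by summary.glm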

Confidence interval:
$$CI_\alpha(\beta_j) = \left[\hat\beta_j - u_{1-\alpha/2}\,\hat\sigma_{\beta_j}\;;\;\hat\beta_j + u_{1-\alpha/2}\,\hat\sigma_{\beta_j}\right],$$
where $u_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of $N(0,1)$ and $\hat\sigma_{\beta_j} = \sqrt{\left[I(\hat\beta)^{-1}\right]_{jj}}$.
To test the hypothesis $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, we use
$$\frac{\hat\beta_j}{\sqrt{\phi\,\left[(X^T W_m^{-1} X)^{-1}\right]_{jj}}} \approx N(0,1),$$
and, if $\phi$ is unknown,
$$\frac{\hat\beta_j}{\sqrt{\hat\phi\,\left[(X^T W_m^{-1} X)^{-1}\right]_{jj}}} \approx t_{n-p}.$$
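A hedged sketch in R of the Wald intervals and z-tests above (toy data; all names are mine):

## 95% Wald confidence intervals for the coefficients of a fitted GLM.
set.seed(3)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)
est <- coef(summary(fit))                     # Estimate, Std. Error, z value, Pr(>|z|)
se  <- est[, "Std. Error"]
cbind(lower = coef(fit) - qnorm(0.975) * se,
      upper = coef(fit) + qnorm(0.975) * se)  # hand-built intervals beta_j +/- u_{0.975} * sigma_j
confint.default(fit)                          # the same Wald intervals computed by R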

Goodness-of-Fit. We test $H_0$: the true model is the submodel $M$, against $H_1$: the true model is the saturated model $M_{sat}$. The likelihood ratio statistic for this hypothesis is called the deviance. For any submodel $M$,
$$\operatorname{dev}(M) = 2(\hat l_{sat} - \hat l_M).$$
Under $H_0$, $\operatorname{dev}(M) \approx \chi^2_{p_{sat} - p}$.

Goodness-of-Fit. The scaled deviance for a GLM is
$$D(y, \hat\mu) = 2\left[l(\hat\mu_{sat}, \phi; y) - l(\hat\mu, \phi; y)\right] = \sum_{i=1}^n 2 w_i\left\{ y_i\left(\theta(\hat\mu_i^{sat}) - \theta(\hat\mu_i)\right) - b\left(\theta(\hat\mu_i^{sat})\right) + b\left(\theta(\hat\mu_i)\right)\right\}/\phi = \sum_{i=1}^n D^*(y_i; \hat\mu_i)/\phi = D^*(y; \hat\mu)/\phi.$$

Tests. We use the deviance to compare two nested models having $p_1$ and $p_2$ parameters respectively, where $p_1 < p_2$. Let $\hat\mu_1$ and $\hat\mu_2$ denote the corresponding MLEs. If $\phi$ is known,
$$D(y, \hat\mu_1) - D(y, \hat\mu_2) \approx \chi^2_{p_2 - p_1}.$$
If $\phi$ is unknown, we use
$$\frac{D^*(y, \hat\mu_1) - D^*(y, \hat\mu_2)}{(p_2 - p_1)\,\hat\phi} \approx F_{p_2 - p_1,\, n - p_2},$$
rejecting the smaller model at level $\alpha$ when this statistic exceeds $F_{1-\alpha,\, p_2 - p_1,\, n - p_2}$.
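In R, this comparison is what anova applied to two nested glm fits does; a small sketch (toy data and names are mine):

## Deviance-based comparison of nested GLMs.
set.seed(4)
x1 <- rnorm(200); x2 <- rnorm(200)
y  <- rbinom(200, 1, plogis(-1 + 0.8 * x1))
small <- glm(y ~ x1,      family = binomial)
big   <- glm(y ~ x1 + x2, family = binomial)
anova(small, big, test = "Chisq")   # chi-square test: phi is known (= 1) for the binomial
## For families with an unknown phi (e.g. Gamma or quasi families), use test = "F" instead.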

Goodness-of-Fit. The deviance residuals for a given model are
$$d_i = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{D^*(y_i; \hat\mu_i)}.$$
A poorly fitting point will make a large contribution to the deviance, so $d_i$ will be large.

Diagnostics. The Pearson residuals are defined by
$$r_i = \frac{y_i - \hat\mu_i}{\sqrt{(1 - h_{ii})\,V(\hat\mu_i)}},$$
where $h_{ii}$ is the $i$-th diagonal element of
$$H = X(X^T W_m^{-1} X)^{-1} X^T W_m^{-1}.$$
The (standardized) deviance residuals are
$$\hat\varepsilon_i = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{\frac{D^*(y_i; \hat\mu_i)}{1 - h_{ii}}}.$$
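These residuals and the leverages $h_{ii}$ are available directly in R; a small sketch (toy logistic fit, names are mine):

## Residuals and leverages for a fitted GLM.
set.seed(5)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
residuals(fit, type = "pearson")    # Pearson residuals
residuals(fit, type = "deviance")   # deviance residuals
hatvalues(fit)                      # leverages h_ii
rstandard(fit)                      # deviance residuals standardized by sqrt(1 - h_ii)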

Diagnostics. The Anscombe residuals are defined as a transformation of the Pearson residuals:
$$r_i^A = \frac{t(y_i) - t(\hat\mu_i)}{t'(\hat\mu_i)\sqrt{\phi\,V(\hat\mu_i)(1 - h_{ii})}}.$$
The aim in introducing the function $t$ is to make the residuals as Gaussian as possible. We take
$$t(x) = \int_0^x V(\mu)^{-1/3}\,d\mu.$$

Diagnostics. Influential points can be detected using Cook's distance
$$C_i = \frac{1}{p}\,(\hat\beta_{(i)} - \hat\beta)^T\,X^T W_m^{-1} X\,(\hat\beta_{(i)} - \hat\beta) \approx \frac{r_i^2\,h_{ii}}{p\,(1 - h_{ii})^2}.$$
High-leverage points: if $h_{ii} > 2p/n$ (or, more conservatively, $h_{ii} > 3p/n$), the $i$-th observation is flagged as a high-leverage point and deserves closer inspection.
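A small sketch of these checks in R (toy fit; the 2p/n threshold is the rule of thumb from the slide):

## Influence and leverage diagnostics for a fitted GLM.
set.seed(6)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
p <- length(coef(fit))
n <- nrow(model.matrix(fit))
cooks.distance(fit)                     # Cook's distances C_i
which(hatvalues(fit) > 2 * p / n)       # observations with high leverage, h_ii > 2p/n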

Model Selection. Model selection can be done using the AIC and BIC. Forward, backward and stepwise approaches can be used, as in the sketch below.
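A hedged sketch of AIC- and BIC-based stepwise selection in R with step() (the data frame d and its variables are invented for illustration; the SAheart analysis later in these notes uses the same mechanism):

## Stepwise model selection for a GLM.
set.seed(7)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- rbinom(200, 1, plogis(-0.5 + d$x1))
full_fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
step(full_fit, direction = "both")                      # AIC-based search (forward and backward moves)
step(full_fit, direction = "both", k = log(nrow(d)))    # BIC-type penalty, k = log(n)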

Logistic regression. Logistic regression is a generalization of regression that is used when the outcome $Y$ is binary (0 or 1). As an example with a single covariate, we assume that
$$P(Y_i = 1 \mid X_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}.$$
Note that $E(Y_i \mid X_i) = P(Y_i = 1 \mid X_i)$.

Logistic regression. Define the logit function
$$\operatorname{logit}(z) = \log\left(\frac{z}{1-z}\right).$$
With $\pi_i = P(Y_i = 1 \mid X_i)$, we can write
$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 X_i.$$
The extension to several covariates is
$$\operatorname{logit}(\pi_i) = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}.$$
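In R, the logit and its inverse are available as qlogis and plogis, which is convenient for checking such models by hand:

qlogis(0.75)       # logit(0.75) = log(0.75 / 0.25), about 1.0986
plogis(1.0986)     # inverse logit, back to about 0.75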

How do we estimate the parameters? The model can be fit by maximum likelihood. The likelihood function is
$$L(\beta) = \prod_{i=1}^n f(y_i \mid X_i; \beta) = \prod_{i=1}^n \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}.$$
The estimator $\hat\beta$ has to be found numerically.

Usually, we use reweighted least squares:
1. Set a starting value $\beta^{(0)}$.
2. Compute $\hat\pi_i = \dfrac{e^{X_i\beta^{(k)}}}{1 + e^{X_i\beta^{(k)}}}$.
3. Define the weight matrix $W$ whose $i$-th diagonal element is $\hat\pi_i(1 - \hat\pi_i)$.
4. Define the adjusted response vector $Z = X\beta^{(k)} + W^{-1}(Y - \hat\pi)$.
5. Take $\hat\beta^{(k+1)} = (X^T W X)^{-1} X^T W Z$, which is the weighted linear regression of $Z$ on $X$, and iterate until convergence.
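A minimal sketch of this loop in R for logistic regression (toy data; the names and stopping rule are mine):

## Reweighted least squares for logistic regression.
set.seed(8)
n <- 300
x <- rnorm(n)
X <- cbind(1, x)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))

beta <- rep(0, ncol(X))                                  # starting value beta^(0)
for (k in 1:25) {
  eta    <- drop(X %*% beta)
  pi_hat <- plogis(eta)                                  # pi_i = exp(X_i beta) / (1 + exp(X_i beta))
  w      <- pi_hat * (1 - pi_hat)                        # i-th diagonal of W
  Z      <- eta + (y - pi_hat) / w                       # adjusted response Z = X beta + W^{-1}(Y - pi)
  beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * Z))  # (X'WX)^{-1} X'WZ
  if (max(abs(beta_new - beta)) < 1e-8) { beta <- beta_new; break }
  beta <- beta_new
}
drop(beta)
coef(glm(y ~ x, family = binomial))                      # matches the IRLS estimate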

Model selection and diagnostics. Diagnostics: the Pearson residuals
$$\frac{Y_i - \hat\pi_i}{\sqrt{\hat\pi_i(1 - \hat\pi_i)}}$$
and the deviance residuals
$$\operatorname{sign}(Y_i - \hat\pi_i)\sqrt{2\left[Y_i\log\left(\frac{Y_i}{\hat\pi_i}\right) + (1 - Y_i)\log\left(\frac{1 - Y_i}{1 - \hat\pi_i}\right)\right]}.$$

To fit this model, we use the glm command.

Call:
glm(formula = chd ~ ., family = binomial, data = SAheart)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8320  -0.8250  -0.4354   0.8747   2.5503  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -5.9207616  1.3265724  -4.463 8.07e-06 ***
row.names      -0.0008844  0.0008950  -0.988 0.323042    
sbp             0.0076602  0.0058574   1.308 0.190942    
tobacco         0.0777962  0.0266602   2.918 0.003522 ** 
ldl             0.1701708  0.0597998   2.846 0.004432 ** 
adiposity       0.0209609  0.0294496   0.712 0.476617    
famhistpresent  0.9385467  0.2287202   4.103 4.07e-05 ***
typea           0.0376529  0.0124706   3.019 0.002533 ** 
obesity        -0.0661926  0.0443180  -1.494 0.135285    
alcohol         0.0004222  0.0045053   0.094 0.925346    
age             0.0441808  0.0121784   3.628 0.000286 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11  on 461  degrees of freedom
Residual deviance: 471.16  on 451  degrees of freedom
AIC: 493.16

Number of Fisher Scoring iterations: 5

To select a model, we apply stepwise selection with the step command.

Start:  AIC=493.16
chd ~ row.names + sbp + tobacco + ldl + adiposity + famhist + 
    typea + obesity + alcohol + age

             Df Deviance    AIC
- alcohol     1   471.17 491.17
- adiposity   1   471.67 491.67
- row.names   1   472.14 492.14
- sbp         1   472.88 492.88
<none>            471.16 493.16
- obesity     1   473.47 493.47
- ldl         1   479.65 499.65
- tobacco     1   480.27 500.27
- typea       1   480.75 500.75
- age         1   484.76 504.76
- famhist     1   488.29 508.29

etc...

Step:  AIC=487.69
chd ~ tobacco + ldl + famhist + typea + age

          Df Deviance    AIC
<none>         475.69 487.69
- ldl      1   484.71 494.71
- typea    1   485.44 495.44
- tobacco  1   486.03 496.03
- famhist  1   492.09 502.09
- age      1   502.38 512.38

Suppose $Y_i \sim \operatorname{Binomial}(n_i, \pi_i)$. We can fit the logistic model as before:
$$\operatorname{logit}(\pi_i) = X_i\beta.$$
Pearson residuals:
$$r_i = \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)}}.$$
Deviance residuals:
$$d_i = \operatorname{sign}(Y_i - \hat Y_i)\sqrt{2\left[Y_i\log\left(\frac{Y_i}{\hat\mu_i}\right) + (n_i - Y_i)\log\left(\frac{n_i - Y_i}{n_i - \hat\mu_i}\right)\right]},$$
where $\hat\mu_i = n_i\hat\pi_i$.

Goodness-of-Fit test. The Pearson statistic and the deviance,
$$\chi^2 = \sum_i r_i^2 \quad\text{and}\quad D = \sum_i d_i^2,$$
both have an approximate $\chi^2_{n-p}$ distribution if the model is correct.

To fit this model, we use the glm command.

Call:
glm(formula = cbind(y, n - y) ~ x, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.70832  -0.29814   0.02996   0.64070   0.91132  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -14.73119    1.83018  -8.049 8.35e-16 ***
x             0.24785    0.03031   8.178 2.89e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.7204  on 7  degrees of freedom
Residual deviance:   2.6558  on 6  degrees of freedom
AIC: 28.233

Number of Fisher Scoring iterations: 4

To test the correctness of the model:

> pvalue = 1 - pchisq(out$dev, out$df.residual)
> print(pvalue)
[1] 0.8506433
> r = resid(out, type = "deviance")
> p = out$linear.predictors
> plot(p, r, pch = 19, xlab = "linear predictor", ylab = "deviance residuals")
> print(sum(r^2))
[1] 2.655771
> cooks.distance(out)
           1            2            3            4            5 
0.0004817501 0.3596628502 0.0248918197 0.1034462077 0.0242941942 
           6            7            8 
0.0688081629 0.0014847981 0.0309767612 

Note that the residuals give back the deviance test, and the p-value is large, indicating no evidence of a lack of fit.