Statistics 305: Introduction to Biostatistical Methods for Health Sciences

Modelling the Log Odds: Logistic Regression (Chap 20)

Instructor: Liangliang Wang
Statistics and Actuarial Science, Simon Fraser University
Nov 23 2015

Logistic Regression (Chapter 20)

In logistic regression we study the effect of explanatory variables on the odds of a binary outcome. This is a generalization of the analyses of odds ratios we have studied before.

Think of the binary outcome Y as disease status (0 = non-disease; 1 = disease). The explanatory variables can be categorical (e.g., exposures) or quantitative.

Example

In a sample of low birthweight infants in a neonatal intensive care unit, 76 were diagnosed with bronchopulmonary dysplasia (BPD; Y = 1) and 147 were not (Y = 0). One factor that might affect the risk of BPD is birth weight (BWT; $x_1$). A summary of these data, with birth weight broken into three categories, is as follows.

    BWT (g)      BPD   No BPD   Odds   Log-odds
    0-950         49       19   2.58       0.95
    951-1350      18       62   0.29      -1.24
    1351-1750      9       66   0.14      -1.99
    Total         76      147

Log-ORs are obtained by taking differences between log-odds.
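
The odds and log-odds columns can be recomputed directly from the counts. The following is a minimal Python sketch, not part of the original slides; the function and variable names are illustrative, and the choice of the lightest category as the reference group is an assumption made only for the example.

```python
import math

# Counts from the table above: (BPD, no BPD) per birth-weight category.
counts = {
    "0-950": (49, 19),
    "951-1350": (18, 62),
    "1351-1750": (9, 66),
}

def odds_and_log_odds(cases, noncases):
    """Return (odds, log-odds) of disease for one row of the table."""
    odds = cases / noncases
    return odds, math.log(odds)

summary = {bwt: odds_and_log_odds(*row) for bwt, row in counts.items()}

# A log-OR versus a reference category is a difference of log-odds.
reference = "0-950"  # assumed reference group, for illustration only
log_or = {bwt: summary[bwt][1] - summary[reference][1] for bwt in counts}

for bwt, (odds, lo) in summary.items():
    print(f"{bwt:>10}: odds={odds:.2f}, log-odds={lo:.2f}, "
          f"log-OR vs {reference}={log_or[bwt]:.2f}")
```

Exponentiating a log-OR from `log_or` gives the corresponding odds ratio for that category relative to the reference.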

Logistic Regression Model

We model the log-odds of BPD as a function of birth weight; i.e.,

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1$$

where $\ln$ is the natural logarithm and $p$ is the probability of disease given $x_1$ (suppressed in the notation).

Letting $LO = \alpha + \beta_1 x_1$, it can be shown that

$$p = \frac{e^{LO}}{1 + e^{LO}}$$

which is the logistic function of $LO$.

Rather than least squares, we use the method of maximum likelihood to fit the model (details omitted). For large sample sizes we can make approximate inference about the regression coefficient.
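
The "it can be shown" step is a few lines of algebra: exponentiate both sides of the log-odds equation and solve for $p$. A sketch of the derivation, using the same symbols as the slide:

```latex
\begin{align*}
\ln\left[\frac{p}{1-p}\right] &= LO \\
\frac{p}{1-p} &= e^{LO} \\
p &= e^{LO}\,(1-p) \\
p\,\left(1 + e^{LO}\right) &= e^{LO} \\
p &= \frac{e^{LO}}{1 + e^{LO}}
\end{align*}
```

Because $e^{LO} > 0$ for every real $LO$, the resulting $p$ always lies strictly between 0 and 1.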

Example

Fitted logistic regression of BPD probability on birth weight:

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)  4.0342913  0.6957121   5.799 6.68e-09
    birthwt     -0.0042291  0.0006408  -6.600 4.11e-11

Interpretation of $\hat\beta_1$:
- A one gram increase in birth weight is estimated to change the log-odds of BPD by $-0.0042$.
- A one gram increase in birth weight is estimated to change the odds of BPD by a multiplicative factor of $e^{-0.0042} = 0.996$.
- A 100 gram increase in birth weight is estimated to change the log-odds of BPD by $-0.42$.
- A 100 gram increase in birth weight is estimated to change the odds of BPD by a multiplicative factor of $e^{-0.42} = 0.657$.
- Etc.

Try to interpret in terms of ORs, rather than log-ORs.
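
The interpretations above can be checked numerically. This is a minimal Python sketch, not part of the original slides: it uses the unrounded estimates from the output, so the 100 gram OR comes out slightly different in the third decimal from the value computed with the rounded $-0.42$. The function names and the birth weights used for the fitted probabilities are illustrative assumptions.

```python
import math

# Estimates copied from the fitted model output above.
ALPHA_HAT = 4.0342913    # intercept
BETA1_HAT = -0.0042291   # birthwt slope (per gram)

def odds_ratio(delta_grams, beta=BETA1_HAT):
    """Estimated OR for a birth-weight increase of delta_grams."""
    return math.exp(beta * delta_grams)

def fitted_probability(birthwt, alpha=ALPHA_HAT, beta=BETA1_HAT):
    """Estimated P(BPD) at a given birth weight, via the logistic function."""
    log_odds = alpha + beta * birthwt
    return 1.0 / (1.0 + math.exp(-log_odds))

for delta in (1, 100):
    print(f"OR for +{delta} g: {odds_ratio(delta):.3f}")

# Birth weights chosen for illustration only.
for w in (600, 1000, 1400):
    print(f"birthwt={w} g: fitted P(BPD)={fitted_probability(w):.2f}")
```

The form `1 / (1 + exp(-LO))` is algebraically the same as `exp(LO) / (1 + exp(LO))`.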

Example, continued

[Figure: fitted log-odds of BPD (logodds) plotted against birth weight (birthwt).]

[Figure: fitted probability of BPD (probs) plotted against birth weight (birthwt).]

Odds Ratios

A difference in logarithms, $\ln(a) - \ln(b)$, is the logarithm of the ratio, $\ln(a/b)$. Hence differences in estimated log-odds are estimated log odds ratios.

Take two values $x_{11}$ and $x_{12}$ of $x_1$. The difference in estimated log-odds,

$$(\hat\alpha + \hat\beta_1 x_{11}) - (\hat\alpha + \hat\beta_1 x_{12}) = \hat\beta_1 (x_{11} - x_{12}),$$

is the estimated log OR for $x_{11}$ versus $x_{12}$, and $e^{\hat\beta_1 (x_{11} - x_{12})}$ is the estimated OR for $x_{11}$ versus $x_{12}$.

With one binary explanatory variable, such as a binary exposure, take $x_{11} = 1$ and $x_{12} = 0$ to see that $\hat\beta_1$ is the estimated log-odds ratio and $e^{\hat\beta_1}$ is the estimated odds ratio.
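
As a small illustration of why the intercept cancels, the sketch below (Python, not part of the original slides) computes an estimated OR as the exponentiated difference of two fitted log-odds, using the estimates from the birth-weight fit. The two birth weights compared are arbitrary values chosen for the example.

```python
import math

# Estimates from the single-predictor fit of BPD on birthwt.
ALPHA_HAT = 4.0342913
BETA1_HAT = -0.0042291

def log_odds(x1):
    """Fitted log-odds of BPD at birth weight x1 (grams)."""
    return ALPHA_HAT + BETA1_HAT * x1

def estimated_or(x11, x12):
    """Estimated OR for x11 versus x12: exp of the log-odds difference."""
    return math.exp(log_odds(x11) - log_odds(x12))

# Illustrative comparison: 1000 g versus 1500 g.
or_1000_vs_1500 = estimated_or(1000, 1500)
shortcut = math.exp(BETA1_HAT * (1000 - 1500))  # intercept has cancelled
print(f"OR (1000 g vs 1500 g): {or_1000_vs_1500:.2f}")
print(f"Same via beta * (x11 - x12): {shortcut:.2f}")
```

Swapping the two arguments gives the reciprocal OR, since the sign of the log-odds difference flips.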

Inference in Logistic Regression

Focus inference on $\beta_1$. It can be shown (details omitted) that the sampling distribution of $\hat\beta_1$ is approximately normal with mean $\beta_1$ and a certain SD. Let $SE(\hat\beta_1)$ denote the estimated SD. For large samples, approximately,

$$\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim N(0, 1)$$

Confidence intervals and hypothesis tests follow in the usual way. However, for CIs, we should exponentiate the endpoints to get a confidence interval for the OR parameter, rather than the log OR parameter.

Inference for the BPD Example

The estimate is $\hat\beta_1 = -0.0042$ with SE 0.00064.

An approximate 95% CI for the log OR parameter is

$$(-0.0042 - 1.96 \times 0.00064,\; -0.0042 + 1.96 \times 0.00064) = (-0.0055,\; -0.0029)$$

An approximate 95% CI for the OR parameter is

$$(e^{-0.0055},\; e^{-0.0029}) = (0.995,\; 0.997)$$

The test statistic for testing $H_0: \beta_1 = 0$ is $-0.0042 / 0.00064 = -6.5625$, which gives $p < 0.001$.
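
The interval and test above can be reproduced in a few lines. This is a minimal Python sketch, not part of the original slides; the function name `wald_inference` is hypothetical, and the two-sided p-value is taken from the standard normal distribution via the complementary error function.

```python
import math

def wald_inference(estimate, se, z_crit=1.96):
    """Approximate 95% CI (log-OR and OR scales) and test of H0: beta = 0."""
    lower = estimate - z_crit * se
    upper = estimate + z_crit * se
    z = estimate / se
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "log_or_ci": (lower, upper),
        "or_ci": (math.exp(lower), math.exp(upper)),  # exponentiate endpoints
        "z": z,
        "p_value": p_value,
    }

# Rounded estimate and SE as used on the slide.
result = wald_inference(estimate=-0.0042, se=0.00064)
print("log-OR CI:", tuple(round(v, 4) for v in result["log_or_ci"]))
print("OR CI:    ", tuple(round(v, 3) for v in result["or_ci"]))
print("z =", round(result["z"], 4), " p < 0.001:", result["p_value"] < 0.001)
```

Note that the CI for the OR is obtained by exponentiating the endpoints of the log-OR interval, not by building a symmetric interval around the estimated OR.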

Next Steps

Multiple logistic regression allows us to
- investigate possible synergy between explanatory variables
- adjust for confounders

Example: The data on low birthweight infants that we used to study the relationship between BPD and birth weight also included gestational age.
- Could gestational age modify the effect of birth weight on the odds of BPD?
- If not, does gestational age confound the relationship between birth weight and the odds of BPD?

Multiple Logistic Regression Model

We model the log-odds of disease as a function of $q$ explanatory variables $x_1, x_2, \ldots, x_q$; i.e.,

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$$

where $\ln$ is the natural logarithm and $p$ is the probability of disease given $x_1, \ldots, x_q$ (suppressed in the notation).

When birth weight ($x_1$) and gestational age ($x_2$) are used as explanatory variables, the log-odds of BPD is modelled as $\alpha + \beta_1 x_1 + \beta_2 x_2$.

Letting $LO = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$,

$$p = \frac{e^{LO}}{1 + e^{LO}}$$

Interaction Variables

An interaction between gestage ($x_2$) and birthwt ($x_1$) allows the effect of birthwt on the odds of BPD to vary with gestage:

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$

For a given value $x_2^*$ of $x_2$,

$$\ln\left[\frac{p}{1-p}\right] = (\alpha + \beta_2 x_2^*) + (\beta_1 + \beta_{12} x_2^*)\, x_1$$

Interpretations:
- For gestational age $x_2^*$, a one unit increase in birth weight changes the log-odds of BPD by $\beta_1 + \beta_{12} x_2^*$ units.
- For gestational age $x_2^*$, a one unit increase in birth weight changes the odds of BPD by a multiplicative factor of $e^{\beta_1 + \beta_{12} x_2^*}$.
- $\beta_{12} = 0$ implies that this multiplicative factor does not depend on $x_2$; i.e., the ORs are homogeneous.
- Testing for interaction is like testing for homogeneous ORs with the Mantel-Haenszel procedures.

Interaction in the BPD example

Fitting the model with gestage-by-birthwt interactions gives:

    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
    (Intercept)     33.5735625 11.2277076   2.990  0.00279
    birthwt         -0.0208384  0.0097169  -2.145  0.03199
    gestage         -1.0603539  0.3801499  -2.789  0.00528
    birthwt:gestage  0.0006124  0.0003204   1.912  0.05594

At significance level 5% we do not reject the null hypothesis $H_0: \beta_{12} = 0$ (i.e., no interaction), since the p-value 0.056 exceeds 0.05. We conclude that gestage does not modify the effect of birthwt on the odds of BPD.
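
To see what an interaction term means in practice, the sketch below (Python, not part of the original slides) evaluates the birth-weight slope $\beta_1 + \beta_{12} x_2$ and the corresponding OR at a few gestational ages, using the estimates from the interaction fit. This is purely illustrative: the interaction is not significant at the 5% level, and the gestational ages shown are assumed values, not taken from the data.

```python
import math

# Estimates from the fitted interaction model above.
BETA1_HAT = -0.0208384    # birthwt
BETA12_HAT = 0.0006124    # birthwt:gestage

def birthwt_slope(gestage):
    """Change in log-odds of BPD per gram of birth weight at a given gestage."""
    return BETA1_HAT + BETA12_HAT * gestage

def birthwt_or(delta_grams, gestage):
    """OR for a delta_grams increase in birth weight at a given gestage."""
    return math.exp(birthwt_slope(gestage) * delta_grams)

# Gestational ages (weeks) chosen for illustration only.
for ga in (26, 28, 30, 32):
    print(f"gestage={ga}: slope={birthwt_slope(ga):.5f}, "
          f"OR per 100 g={birthwt_or(100, ga):.3f}")
```

If $\beta_{12}$ were exactly zero, `birthwt_slope` would return the same value for every gestational age, which is the homogeneous-OR case.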

Confounding Variables

Though gestage does not modify the effect of birthwt on the odds of BPD, we must still consider gestage as a possible confounder. We fit the model with birthwt and gestage effects:

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept) 13.8272516  2.9321159   4.716 2.41e-06
    birthwt     -0.0024097  0.0007925  -3.041 0.002361
    gestage     -0.3982616  0.1129995  -3.524 0.000424

and with just birthwt:

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)  4.0342913  0.6957121   5.799 6.68e-09
    birthwt     -0.0042291  0.0006408  -6.600 4.11e-11

and find that the parameter estimate changes by

$$\frac{-0.0024 - (-0.0042)}{0.0024} \times 100\% = 75\%$$

The birthwt estimate changes by more than 10% when gestage is excluded, so gestage is a confounder.
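
The change-in-estimate check can be written as a small helper. This is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and the 10% threshold is the rule of thumb quoted on the slide.

```python
def percent_change(adjusted, crude):
    """Percent change in the estimate when the candidate confounder is
    dropped, relative to the size of the adjusted estimate."""
    return (adjusted - crude) / abs(adjusted) * 100.0

def is_confounder(adjusted, crude, threshold=10.0):
    """Flag confounding when the estimate moves by more than threshold %."""
    return abs(percent_change(adjusted, crude)) > threshold

# Rounded birthwt estimates, as used on the slide.
change = percent_change(adjusted=-0.0024, crude=-0.0042)
print(f"change = {change:.0f}%")
print("confounder:", is_confounder(-0.0024, -0.0042))

# The unrounded estimates from the two fits give nearly the same answer.
change_full = percent_change(adjusted=-0.0024097, crude=-0.0042291)
print(f"change (unrounded) = {change_full:.1f}%")
```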

Interpretation

The birthwt estimate is $\hat\beta_1 = -0.0024$; the gestage estimate is $\hat\beta_2 = -0.398$.

Interpretation of $\hat\beta_1$: For a given gestational age, a one gram increase in birth weight is estimated to change the log-odds of BPD by $-0.0024$, or to change the odds of BPD by a multiplicative factor of $e^{-0.0024} = 0.9976$.

NB: Without interaction, these effects on the odds of BPD are the same for all values of gestational age.

Interpretation of $\hat\beta_2$: For a given birth weight, a one week increase in gestational age is estimated to change the log-odds of BPD by $-0.398$, or to change the odds of BPD by a multiplicative factor of $e^{-0.398} = 0.672$.
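
The adjusted ORs and a fitted probability from the two-variable model can be computed as follows. This is a minimal Python sketch, not part of the original slides: it uses the unrounded estimates from the fit with birthwt and gestage, so the gestage OR may differ in the third decimal from the value computed with the rounded $-0.398$. The infant profile used for the fitted probability (1000 g, 28 weeks) is an assumed example.

```python
import math

# Estimates from the fit with birthwt and gestage (no interaction).
ALPHA_HAT = 13.8272516
BETA1_HAT = -0.0024097   # birthwt, per gram
BETA2_HAT = -0.3982616   # gestage, per week

def adjusted_or(beta, delta=1.0):
    """OR for a delta-unit increase in one variable, holding the other fixed."""
    return math.exp(beta * delta)

def fitted_probability(birthwt, gestage):
    """Estimated P(BPD) from the two-variable model."""
    log_odds = ALPHA_HAT + BETA1_HAT * birthwt + BETA2_HAT * gestage
    return 1.0 / (1.0 + math.exp(-log_odds))

or_birthwt = adjusted_or(BETA1_HAT)   # per gram, for a given gestage
or_gestage = adjusted_or(BETA2_HAT)   # per week, for a given birthwt
print(f"OR per gram of birthwt: {or_birthwt:.4f}")
print(f"OR per week of gestage: {or_gestage:.3f}")

# Hypothetical infant, for illustration only.
p_example = fitted_probability(birthwt=1000, gestage=28)
print(f"fitted P(BPD) at 1000 g, 28 weeks: {p_example:.2f}")
```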

Model checking

There are measures of goodness-of-fit and residual diagnostics for logistic regression, but these are beyond the scope of this course.