Statistics 305: Introduction to Biostatistical Methods for Health Sciences




Modelling the Log Odds: Logistic Regression (Chap 20)
Instructor: Liangliang Wang
Statistics and Actuarial Science, Simon Fraser University
Nov 23 2015

Statistics 305 (SFU) Logistic Regression Nov 23 2015

Logistic Regression (Chapter 20)

In logistic regression we study the effect of explanatory variables on the odds of a binary outcome.
This is a generalization of the analyses of odds ratios we have studied before.
Think of the binary outcome Y as disease status (0 = no disease; 1 = disease).
The explanatory variables can be categorical (e.g., exposures) or quantitative.

Example

In a sample of low birthweight infants in a neonatal intensive care unit, 76 were diagnosed with bronchopulmonary dysplasia (BPD; Y = 1) and 147 were not (Y = 0). One factor that might affect the risk of BPD is birth weight (BWT; $x_1$). A summary of these data, with birth weight in grams broken into three categories, is as follows.

BWT (g)      BPD   no BPD   odds   log-odds
0-950         49       19   2.58       0.95
951-1350      18       62   0.29      -1.24
1351-1750      9       66   0.14      -1.99
Total         76      147

The odds in each row are BPD / no BPD, and the log-odds are their natural logarithms. Log-ORs are obtained by taking differences between log-odds.
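The odds and log-odds columns can be recomputed directly from the counts. The following is a minimal sketch in Python (the slides themselves show no code); the names `counts`, `odds_and_log_odds`, `summary` and `log_or` are illustrative assumptions.

```python
import math

# Counts from the table: birth-weight category -> (BPD, no BPD)
counts = {
    "0-950": (49, 19),
    "951-1350": (18, 62),
    "1351-1750": (9, 66),
}

def odds_and_log_odds(cases, non_cases):
    """Return (odds, log-odds) of BPD for one row of the table."""
    odds = cases / non_cases
    return odds, math.log(odds)

summary = {bwt: odds_and_log_odds(*row) for bwt, row in counts.items()}

# A log odds ratio is a difference of log-odds, here for the
# lightest category versus the heaviest category.
log_or = summary["0-950"][1] - summary["1351-1750"][1]

for bwt, (odds, log_odds) in summary.items():
    print(f"{bwt:>10}  odds={odds:.2f}  log-odds={log_odds:.2f}")
print(f"log-OR (0-950 vs 1351-1750) = {log_or:.2f}")
```

Exponentiating `log_or` recovers the usual cross-product odds ratio for the two rows being compared.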

Logistic Regression Model

We model the log-odds of BPD as a function of birth weight; i.e.,

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1$$

where ln is the natural logarithm and p is the probability of disease given $x_1$ (suppressed in notation).
Letting $LO = \alpha + \beta_1 x_1$, it can be shown that

$$p = \frac{e^{LO}}{1 + e^{LO}}$$

which is the logistic function of LO.
Rather than least squares, we use the method of maximum likelihood to fit the model (details omitted).
For large sample sizes we can make approximate inference about the regression coefficients.
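To see that the logistic function undoes the log-odds transformation, here is a small Python sketch. The function names `logit` and `logistic` are assumptions chosen for illustration, and the test values of LO are arbitrary.

```python
import math

def logit(p):
    """Log-odds ln[p / (1 - p)] of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def logistic(lo):
    """Probability e^LO / (1 + e^LO) for a log-odds value LO."""
    return math.exp(lo) / (1.0 + math.exp(lo))

# Arbitrary log-odds values, used only to illustrate the round trip.
for lo in (-2.0, 0.0, 1.3):
    p = logistic(lo)
    print(f"LO={lo:+.1f}  p={p:.3f}  logit(p)={logit(p):+.3f}")
```

A log-odds of 0 corresponds to p = 0.5, negative log-odds to p below 0.5, and positive log-odds to p above 0.5.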

Example

Fitted logistic regression of BPD probability on birth weight:

Coefficients:
              Estimate    Std. Error  z value  Pr(>|z|)
(Intercept)   4.0342913   0.6957121    5.799   6.68e-09
birthwt      -0.0042291   0.0006408   -6.600   4.11e-11

Interpretation of $\hat\beta_1$:
A one gram increase in birth weight is estimated to change the log-odds of BPD by $-0.0042$.
A one gram increase in birth weight is estimated to change the odds of BPD by a multiplicative factor of $e^{-0.0042} = 0.996$.
A 100 gram increase in birth weight is estimated to change the log-odds of BPD by $-0.42$.
A 100 gram increase in birth weight is estimated to change the odds of BPD by a multiplicative factor of $e^{-0.42} = 0.657$.
Etc. Try to interpret in terms of ORs rather than log-ORs.
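The multiplicative interpretation can be checked numerically. The sketch below uses the unrounded estimates from the output above; the names `fitted_odds` and `odds_multiplier` are illustrative assumptions, and the 1000 g starting weight is an arbitrary choice.

```python
import math

# Unrounded estimates from the fitted model shown above.
ALPHA_HAT = 4.0342913
BETA_HAT = -0.0042291

def fitted_odds(birthwt):
    """Estimated odds of BPD at a given birth weight in grams."""
    return math.exp(ALPHA_HAT + BETA_HAT * birthwt)

def odds_multiplier(delta_grams):
    """Factor by which the estimated odds change for a given weight increase."""
    return math.exp(BETA_HAT * delta_grams)

# The ratio of fitted odds does not depend on the starting weight.
ratio_at_1000 = fitted_odds(1100) / fitted_odds(1000)

print(f"per 1 g:   {odds_multiplier(1):.3f}")
print(f"per 100 g: {odds_multiplier(100):.3f}")
print(f"odds(1100 g) / odds(1000 g) = {ratio_at_1000:.3f}")
```

With the unrounded estimate the 100 gram factor comes out near 0.655; the 0.657 quoted above results from rounding the log-odds change to $-0.42$ before exponentiating.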

Example, continued

[Figure: fitted log-odds of BPD plotted against birthwt (y-axis: logodds).]
[Figure: fitted probability of BPD plotted against birthwt (y-axis: probs).]

Odds Ratios

A difference in logarithms, $\ln(a) - \ln(b)$, is the logarithm of the ratio, $\ln(a/b)$. Hence differences in estimated log odds are estimated log odds ratios.
Take two values $x_{11}$ and $x_{12}$ of $x_1$. The difference in estimated log odds,

$$(\hat\alpha + \hat\beta_1 x_{11}) - (\hat\alpha + \hat\beta_1 x_{12}) = \hat\beta_1 (x_{11} - x_{12})$$

is the estimated log OR for $x_{11}$ versus $x_{12}$, and $e^{\hat\beta_1 (x_{11} - x_{12})}$ is the estimated OR for $x_{11}$ versus $x_{12}$.
With one binary explanatory variable, such as a binary exposure, take $x_{11} = 1$ and $x_{12} = 0$ to see that $\hat\beta_1$ is the estimated log-odds ratio and $e^{\hat\beta_1}$ is the estimated odds ratio.
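The binary-exposure case can be illustrated by fitting the model by maximum likelihood and comparing the slope with the log of the sample odds ratio. The sketch below is an illustration under stated assumptions, not the course's own method: it treats the lightest birth-weight category (0-950 g) as "exposed" (x = 1) and the heaviest (1351-1750 g) as "unexposed" (x = 0), and it uses a hand-written Newton-Raphson iteration, one common way to maximize the likelihood. The name `fit_binary_logistic` is hypothetical.

```python
import math

# (x, BPD cases, total infants); x = 1 for 0-950 g, x = 0 for 1351-1750 g.
# Dichotomizing this way is an assumption made only for illustration.
groups = [(0, 9, 75), (1, 49, 68)]

def fit_binary_logistic(groups, iterations=25):
    """Maximum likelihood fit of ln[p/(1-p)] = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ua = ub = iaa = iab = ibb = 0.0
        for x, cases, total in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = cases - total * p          # observed minus expected cases
            w = total * p * (1.0 - p)          # binomial variance weight
            ua += resid                        # score for a
            ub += x * resid                    # score for b
            iaa += w                           # information matrix entries
            iab += w * x
            ibb += w * x * x
        det = iaa * ibb - iab * iab
        # Newton step: solve the 2x2 system (information) * step = score.
        a += (ibb * ua - iab * ub) / det
        b += (iaa * ub - iab * ua) / det
    return a, b

a_hat, b_hat = fit_binary_logistic(groups)
sample_log_or = math.log((49 / 19) / (9 / 66))
print(f"b_hat = {b_hat:.4f}, log sample OR = {sample_log_or:.4f}")
```

With a single binary predictor the fitted slope agrees with the log of the cross-product odds ratio from the 2x2 table, and the intercept is the log-odds in the x = 0 group.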

Inference in Logistic Regression

Focus inference on $\beta_1$.
It can be shown (details omitted) that the sampling distribution of $\hat\beta_1$ is approximately normal with mean $\beta_1$ and a certain SD.
Let $SE(\hat\beta_1)$ denote the estimated SD. For large samples, approximately,

$$\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim N(0, 1)$$

Confidence intervals and hypothesis tests follow in the usual way.
However, for CIs, we should exponentiate the endpoints to get a confidence interval for the OR parameter rather than the log OR parameter.

Inference for the BPD Example

The estimate is $\hat\beta_1 = -0.0042$ with SE 0.00064.
An approximate 95% CI for the log OR parameter is

$$(-0.0042 - 1.96 \times 0.00064,\ -0.0042 + 1.96 \times 0.00064) = (-0.0055,\ -0.0029)$$

An approximate 95% CI for the OR parameter (per one gram increase in birth weight) is

$$(e^{-0.0055},\ e^{-0.0029}) = (0.995,\ 0.997)$$

The test statistic for testing $H_0: \beta_1 = 0$ is $-0.0042/0.00064 = -6.5625$, which gives p < 0.001.
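These calculations can be reproduced in a few lines. The sketch below assumes the rounded estimate and SE quoted above; `wald_inference` is a hypothetical helper name, and the two-sided p-value is computed from the standard normal distribution through the complementary error function.

```python
import math

def wald_inference(beta_hat, se, z_crit=1.96):
    """Large-sample CI for the log OR and OR, plus a two-sided z test of beta = 0."""
    lower = beta_hat - z_crit * se
    upper = beta_hat + z_crit * se
    z = beta_hat / se
    # For Z ~ N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "log_or_ci": (lower, upper),
        "or_ci": (math.exp(lower), math.exp(upper)),
        "z": z,
        "p_value": p_value,
    }

result = wald_inference(-0.0042, 0.00064)
print("log OR CI:", tuple(round(v, 4) for v in result["log_or_ci"]))
print("OR CI:    ", tuple(round(v, 3) for v in result["or_ci"]))
print("z =", round(result["z"], 4), " p =", result["p_value"])
```

Note that the interval is built on the log OR scale, where the normal approximation applies, and only then exponentiated.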

Next Steps

Multiple logistic regression allows us to (1) investigate possible synergy between explanatory variables and (2) adjust for confounders.
Example: The data on low birthweight infants that we used to study the relationship between BPD and birth weight also included gestational age.
Could gestational age modify the effect of birth weight on the odds of BPD?
If not, does gestational age confound the relationship between birth weight and the odds of BPD?

Multiple Logistic Regression Model

We model the log-odds of disease as a function of q explanatory variables $x_1, x_2, \dots, x_q$; i.e.,

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q$$

where ln is the natural logarithm and p is the probability of disease given $x_1, \dots, x_q$ (suppressed in notation).
When birth weight ($x_1$) and gestational age ($x_2$) are used as explanatory variables, the log-odds of BPD is modelled as $\alpha + \beta_1 x_1 + \beta_2 x_2$.
Letting $LO = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q$,

$$p = \frac{e^{LO}}{1 + e^{LO}}$$

Interaction Variables

Interaction between gestage ($x_2$) and birthwt ($x_1$) allows the effect of birthwt on the odds of BPD to vary with gestage:

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$

For a given value $x_2^*$ of $x_2$,

$$\ln\left[\frac{p}{1-p}\right] = (\alpha + \beta_2 x_2^*) + (\beta_1 + \beta_{12} x_2^*) x_1$$

Interpretations:
For gestational age $x_2^*$, a one unit increase in birth weight changes the log-odds of BPD by $\beta_1 + \beta_{12} x_2^*$ units.
For gestational age $x_2^*$, a one unit increase in birth weight changes the odds of BPD by a multiplicative factor of $e^{\beta_1 + \beta_{12} x_2^*}$.
$\beta_{12} = 0$ implies that this multiplicative factor does not depend on $x_2$, i.e., homogeneous ORs.
Testing for interaction is like testing for homogeneous ORs with the Mantel-Haenszel procedures.

Interaction in the BPD example

Fitting the model with gestage-by-birthwt interactions gives:

Coefficients:
                  Estimate     Std. Error   z value  Pr(>|z|)
(Intercept)       33.5735625   11.2277076    2.990   0.00279
birthwt           -0.0208384    0.0097169   -2.145   0.03199
gestage           -1.0603539    0.3801499   -2.789   0.00528
birthwt:gestage    0.0006124    0.0003204    1.912   0.05594

At significance level 5% we do not reject the null hypothesis $H_0: \beta_{12} = 0$ (i.e., no interaction).
Conclude that gestage does not modify the effect of birthwt on the odds of BPD.
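To make the role of the interaction coefficient concrete, the sketch below evaluates the estimated birth-weight effect at two gestational ages using the point estimates from the interaction fit above. The ages of 26 and 30 weeks and the 100 gram increment are hypothetical values chosen for illustration, as is the name `birthwt_odds_factor`.

```python
import math

# Point estimates from the interaction model above.
BETA_BIRTHWT = -0.0208384
BETA_INTERACTION = 0.0006124

def birthwt_odds_factor(gestage_weeks, delta_grams=100.0):
    """Estimated factor on the odds of BPD for a birth-weight increase,
    at a fixed gestational age: exp((b1 + b12 * gestage) * delta)."""
    slope = BETA_BIRTHWT + BETA_INTERACTION * gestage_weeks
    return math.exp(slope * delta_grams)

# Hypothetical gestational ages, chosen only for illustration.
for weeks in (26, 30):
    print(f"gestage={weeks} weeks: factor per 100 g = {birthwt_odds_factor(weeks):.3f}")
```

The point estimates give a different factor at each gestational age, but since the test of $\beta_{12} = 0$ is not rejected at the 5% level, these data do not show that difference to be statistically significant.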

Confounding Variables

Though gestage does not modify the effect of birthwt on the odds of BPD, we must still consider gestage as a possible confounder. We fit the model with birthwt and gestage effects:

Coefficients:
              Estimate     Std. Error  z value  Pr(>|z|)
(Intercept)   13.8272516   2.9321159    4.716   2.41e-06
birthwt       -0.0024097   0.0007925   -3.041   0.002361
gestage       -0.3982616   0.1129995   -3.524   0.000424

and with just birthwt:

Coefficients:
              Estimate    Std. Error  z value  Pr(>|z|)
(Intercept)   4.0342913   0.6957121    5.799   6.68e-09
birthwt      -0.0042291   0.0006408   -6.600   4.11e-11

and find that the parameter estimate changes by

$$\frac{|-0.0024 - (-0.0042)|}{|-0.0024|} \times 100\% = 75\%$$

The birthwt estimate changes by more than 10% when gestage is excluded, so gestage is a confounder.
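The change-in-estimate check is simple arithmetic and can be written as a small helper. This is a sketch; `percent_change` is a hypothetical name, and the 10% threshold is the rule of thumb stated above.

```python
def percent_change(adjusted, crude):
    """Percent change in a coefficient when the candidate confounder is
    dropped, relative to the adjusted estimate."""
    return abs(adjusted - crude) / abs(adjusted) * 100.0

# Rounded estimates, as used in the calculation above.
change_rounded = percent_change(-0.0024, -0.0042)
# Unrounded estimates from the two fitted models.
change_full = percent_change(-0.0024097, -0.0042291)

is_confounder = change_full > 10.0  # 10% rule of thumb

print(f"rounded: {change_rounded:.1f}%  unrounded: {change_full:.1f}%")
print("gestage treated as a confounder:", is_confounder)
```

Both the rounded and unrounded estimates give a change far above 10%, so the conclusion does not depend on rounding.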

Interpretation

Birthwt estimate $\hat\beta_1 = -0.0024$; gestage estimate $\hat\beta_2 = -0.398$.
Interpretation of $\hat\beta_1$: For a given gestational age, a one gram increase in birth weight is estimated to change the log-odds of BPD by $-0.0024$, or to change the odds of BPD by a multiplicative factor of $e^{-0.0024} = 0.9976$.
NB: Without interaction, these effects on the odds of BPD are the same for all values of gestational age.
Interpretation of $\hat\beta_2$: For a given birth weight, a one week increase in gestational age is estimated to change the log-odds of BPD by $-0.398$, or to change the odds of BPD by a multiplicative factor of $e^{-0.398} = 0.672$.

Model checking

There are measures of goodness-of-fit and residual diagnostics for logistic regression, but these are beyond the scope of this course.