Unit 12: Logistic Regression (supplementary: Chapter 14 in IPS, on the CD; Chapter 16 in the 5th edition)




- Logistic regression generalizes the methods for two-way tables
- It adds the capability of studying several predictors, but is limited to binary response variables
- It is similar in intent to linear regression, but the details are different
- It is a method for estimating the joint association between several predictors and a response variable
- Typically useful in some class projects

Betting in a fair game

- An American roulette wheel has 38 slots: 1, 2, 3, ..., 36, 0, 00
- If you place a $1 bet on 00 for a single spin of the wheel, you have a 1/38 chance of winning on that spin
- You have 1 way to win and 37 ways to lose; equivalently, the casino has 37 ways to win and 1 way to lose
- The odds of winning are 37 to 1 for the house, and 1 to 37 for you

Betting in roulette

- For the game to be fair:
  - The casino keeps your $1 if 00 does not come up
  - The casino pays $37 if 00 comes up, and you keep your bet
  - If X is your winnings from a $1 bet, E(X) = -1 (37/38) + 37 (1/38) = 0
- By paying out only 35 to 1, casinos ensure that roulette is not a fair game; this is how they stay in business. In this case E(X) = -1 (37/38) + 35 (1/38) = -(2/38) = -0.053
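
The two expected values can be checked directly; a minimal Stata sketch (display simply evaluates the arithmetic):

  display -1*(37/38) + 37*(1/38)    // fair 37-to-1 payout: expected winnings are 0
  display -1*(37/38) + 35*(1/38)    // actual 35-to-1 payout: about -0.053 per $1 bet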

Converting probabilities to odds and log(odds)

- In a game of chance, the odds of winning are the same as the ratio of the amounts of money that should be bet by the two players
- In roulette, the odds of your winning are the ratio of the probability of winning to the probability of losing: p/(1-p) = (1/38) / (37/38) = 1/37
- Typically, odds are quoted as the ratio of the payout: 37 to 1 in this case
- The values of an odds range from 0 to +∞; think of probabilities 0.01, 0.001, 0.0001, 0.99, 0.999, 0.9999, etc.
- We will use a further transformation of the odds to log(odds)
- The values of log(odds) range from -∞ to +∞

Odds vs log(odds): transformation of p

- Why consider such a transformation? Answer: it transforms a variable bounded by 0 < p < 1 into a quantitative variable ranging from -∞ to +∞
- It is a simple algebraic operation to go back and forth between probabilities and log(odds)

    p      Odds p/(1-p)    Log odds log(p/(1-p))
    0.0    0               -∞
    0.1    0.111           -2.20
    0.2    0.250           -1.38
    0.3    0.429           -0.85
    0.4    0.667           -0.40
    0.5    1.000            0
    0.6    1.500            0.40
    0.7    2.333            0.85
    0.8    4.000            1.38
    0.9    9.000            2.20
    1.0    +∞               +∞
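
The table can be reproduced in a few lines of Stata using the built-in logit() function, which returns ln(p/(1-p)); p = 0 and p = 1 are omitted because their odds and log odds are not finite:

  clear
  set obs 9
  generate p = _n/10                // p = 0.1, 0.2, ..., 0.9
  generate odds = p/(1-p)           // odds
  generate logodds = logit(p)       // log odds, ln(p/(1-p))
  list p odds logodds, noobs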

Computing odds in data: an example

- The example on the next slide is very similar to IPS Example 8.1 (5th and 6th editions), but the numbers are from the 5th edition
- Be careful when reading the example in the 6th edition

Example: binge drinking survey

    Binger    Men            Women          Total
    Yes       1630 (22.7%)   1684 (17.0%)    3314
    No        5550 (77.3%)   8232 (83.0%)   13782
    Total     7180 (100%)    9916 (100%)    17096

Idea behind logistic regression

- Let p̂_M be the proportion of men who are binge drinkers; log(p̂_M/(1 - p̂_M)) is the log odds for men.
- Let p̂_F be the proportion of women who are binge drinkers; log(p̂_F/(1 - p̂_F)) is the log odds for women.
- The ratio of the odds (called the odds ratio) of men to women being binge drinkers is

    [p̂_M/(1 - p̂_M)] / [p̂_F/(1 - p̂_F)] = [p̂_M/(1 - p̂_M)] × [(1 - p̂_F)/p̂_F] = 0.294/0.205 = 1.434

- Now recall that log(x/y) = log(x) - log(y).

Idea behind logistic regression (continued)

- In the binge drinking table,

    log{[p̂_M/(1 - p̂_M)] × [(1 - p̂_F)/p̂_F]} = log(0.294) - log(0.205) = -1.225 - (-1.587) = 0.362

- The log odds for males differ from the log odds for females by a constant.
- Logistic regression is a model in which the predictors induce changes in log(odds), similar to linear regression, where the predictors induce changes in the mean of the response variable.

Model for Logistic Regression

- Set the log odds equal to a linear combination of the predictor variables:

    log(p/(1 - p)) = β0 + β1 x1 + β2 x2 + ... + βk xk

- This is the logistic regression model
- It is sometimes equivalently written as:

    p = exp(β0 + β1 x1 + ... + βk xk) / [1 + exp(β0 + β1 x1 + ... + βk xk)]
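
As a quick illustration of why the two forms are equivalent, Stata's invlogit() function maps a log odds back to a probability; the intercept, slope, and x value below are made-up numbers for illustration only:

  * Hypothetical coefficients and predictor value, for illustration only
  scalar b0 = -1.5
  scalar b1 = 0.4
  scalar x1 = 2
  scalar lo = b0 + b1*x1            // linear predictor = log odds
  display exp(lo)/(1 + exp(lo))     // probability form, about 0.33
  display invlogit(lo)              // same value via the built-in inverse logit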

The Logistic Regression Model

- Predictor variables (the x's) can be quantitative or binary
- The formulas for the estimates are more complex than least squares
- The omnibus test of the model is now a χ² test, not an F test
- We can test each predictor variable (x_i) for its contribution, but now with a z test, not a t test
- The assumptions of this model are quite complex and are not often checked
- The logistic regression model is widely used
- The coefficients can be derived directly in some two-way tables
- Back to the binge drinking example

Example: binge drinking survey

    Binger    Men            Women          Total
    Yes       1630 (22.7%)   1684 (17.0%)    3314 (19.4%)
    No        5550 (77.3%)   8232 (83.0%)   13782 (80.6%)
    Total     7180 (100%)    9916 (100%)    17096 (100%)

The logistic model: binge drinking

- From the previous slides (log odds of being a binge drinker):
    For men:    log odds = -1.225
    For women:  log odds = -1.587
- The logistic model for one predictor (gender) is
    log(p/(1 - p)) = log odds = b0 + b1 X1
  where Y = 1 if a binge drinker, 0 otherwise, and X1 = 1 if male, 0 if female
- So the logistic model gives
    For men:    log odds = b0 + b1 = -1.225
    For women:  log odds = b0 = -1.587
- Solving: b0 = -1.587 and b1 = -1.225 - b0 = 0.362
- Thus the fitted logistic model for this example is
    log odds = -1.587 + 0.362 X1
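
A minimal Stata sketch that reproduces these hand calculations directly from the table counts (display just evaluates the expressions):

  display log(1630/5550)            // log odds for men, about -1.225 (= b0 + b1)
  display log(1684/8232)            // log odds for women, about -1.587 (= b0)
  display log(1630/5550) - log(1684/8232)   // difference, about 0.362 (= b1)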

The logistic model: binge drinking

- Working backwards to confirm this fitted model:
    log odds = log(p/(1 - p)) = -1.587 + 0.362 X1,  where X1 = 1 if male and X1 = 0 if female
- For men:
    log odds = log(p/(1 - p)) = -1.587 + 0.362(1) = -1.225, so odds = e^(-1.225) = 0.294
  Thus the proportion of binge drinkers is odds/(odds + 1) = 0.294/1.294 = 0.227
- For women:
    log odds = log(p/(1 - p)) = -1.587 + 0 = -1.587, so odds = e^(-1.587) = 0.205
  Thus the proportion of binge drinkers is odds/(odds + 1) = 0.205/1.205 = 0.17

Comparing two proportions: relative risk and odds ratio

              S        F        Total
    Group 1   a        b        a + b
    Group 2   c        d        c + d
    Total     a + c    b + d    a + b + c + d

- Relative risk (RR) = ratio of the proportions of successes: [a/(a + b)] / [c/(c + d)]
- Odds ratio (OR) = ratio of the odds of success: (a/b) / (c/d) = ad/bc

Odds ratio

- As with RR, an odds ratio of 1 indicates that the proportion of successes (events) is the same in both groups
- RR is easier to interpret (it is a ratio of sample proportions)
- When successes are rare, RR and OR are very similar
- When successes are common, RR and OR are similar only if they are close to 1; the OR tends to overstate differences
- Example: binge drinking
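
For the binge drinking table, both measures can be computed directly from the counts; a minimal Stata sketch:

  display (1630/7180) / (1684/9916)         // relative risk, men vs women, about 1.34
  display (1630/5550) / (1684/8232)         // odds ratio, men vs women, about 1.44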

Odds ratio and logistic regression

- The odds ratio is a key output from a logistic regression; an OR is calculated for each predictor variable
- The OR measures the strength of the effect on p (the probability of "success")
- Example: binge drinking
    log odds = -1.587 + 0.362 X1,  where X1 = 1 if male, 0 if female
    For men:    log odds = b0 + b1 = -1.225
    For women:  log odds = b0 = -1.587
- Let OR be the odds ratio of men to women:
    log(OR) = log(odds for men) - log(odds for women) = (b0 + b1) - b0 = b1
  So OR = e^(b1)
- For the binge drinking example, OR = e^0.362 = 1.436

Inference for logistic regression parameters

- A 95% confidence interval for the coefficient β1 is given by b1 ± 1.96 s.e.(b1)
- A 95% confidence interval for the odds ratio e^β1 is given by e^(b1 ± 1.96 s.e.(b1))
- To test the null hypothesis H0: β1 = 0 (i.e., no association between the response variable and the predictor variable X1), use
    Z = b1 / s.e.(b1)
  Z has (approximately) a N(0, 1) distribution when H0 is true.
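
These formulas can be applied by hand; a minimal Stata sketch for the binge drinking slope, where the standard error 0.039 is an assumed value backed out from the z = 9.31 reported two slides later, not a number taken from the slides:

  scalar b1 = 0.362                 // estimated coefficient for Sex
  scalar se1 = 0.039                // assumed s.e., roughly consistent with z = 9.31
  display b1 - 1.96*se1             // lower 95% limit for the coefficient, about 0.29
  display b1 + 1.96*se1             // upper 95% limit, about 0.44
  display exp(b1 - 1.96*se1)        // lower 95% limit for the OR, about 1.33
  display exp(b1 + 1.96*se1)        // upper 95% limit for the OR, about 1.55
  display b1/se1                    // Wald z statistic, about 9.3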

Binge drinking: expanding data from a 2x2 table to a rectangular data file

- The 2x2 table:

    Binger    Men     Women    Total
    Yes       1630    1684      3314
    No        5550    8232     13782
    Total     7180    9916     17096

- Let Binge = 1 if a binger, 0 otherwise; let Sex = 1 if male, 0 otherwise
- Stata commands:

    input Binge Sex Count
    1 1 1630
    0 1 5550
    1 0 1684
    0 0 8232
    end
    expand Count

- This creates a rectangular data file with 17,096 rows (one row per respondent): 1 1, 1 1, etc.

Binge drinking logistic regression

- Stata has two commands: logistic and logit
- logistic displays odds ratios; logit displays the model coefficients
- Note: b0 = -1.587 and b1 = 0.362, as computed earlier
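
A minimal sketch of the two commands, run after the input and expand steps on the previous slide; the values in the comments are those quoted in these slides rather than freshly rerun output:

  * Coefficient scale: constant (b0) about -1.587, Sex coefficient (b1) about 0.362
  logit Binge Sex

  * Odds-ratio scale: OR for Sex about 1.44
  logistic Binge Sex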

Binge drinking logistic regression

- The logistic command displays the odds ratio
- Notes:
  - OR = 1.436, as computed earlier
  - The 95% CI for the OR, (1.33, 1.55), excludes OR = 1
  - z for the Wald test = 9.31 (P < 0.001)
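
As an aside on the data step, the same fit can be obtained without expanding to 17,096 rows by passing the cell counts to the command as frequency weights; a minimal sketch (fweight is standard Stata syntax, but this alternative is not shown in the slides):

  * Alternative to expand: keep the four cell rows and weight by Count
  input Binge Sex Count
  1 1 1630
  0 1 5550
  1 0 1684
  0 0 8232
  end
  logistic Binge Sex [fweight=Count]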

Multiple logistic regression: Intensive Care Unit (ICU) example

- Study of 200 patients admitted to the adult ICU at Baystate Medical Center in Springfield, MA
- Response variable: survival until hospital discharge (Surv)
  - Surv = 1 if the patient died, 0 if the patient survived
- Predictor variables:
  - Age, in years (Age)
  - Sex = 1 if female, 0 if male (Sex)
  - Race = 1 if white, 0 otherwise (Race)
  - Heart rate at ICU admission, in beats/min (HRate)
  - Level of consciousness at ICU admission (LOC); LOC = 1 if deep stupor or coma, 0 otherwise
- Source: Hosmer & Lemeshow (2000), Wiley & Sons

[Histograms (density scale) of Age and of Heart Rate at ICU Admission]

  . table LOC

    Level of Consciousness at ICU Admission    Freq.
    No Coma or Deep Stupor                       185
    Deep Stupor                                    5
    Coma                                          10

  . table surv

    surv    Freq.
    0         160
    1          40

ICU example: summary of the data

- Response variable: survival to hospital discharge (Surv); N = 200, 20% died
- Predictor variables:
  - Sex: 38% female
  - Age: average 57.5 yrs., range 16 to 92 yrs.
  - Race: 87.5% white
  - HRate: average 99, range 39 to 192 beats/min
  - LOC: 7.5% in deep stupor or coma
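
A minimal sketch of descriptive commands that would produce summaries like those on the last two slides, assuming the ICU data are in memory; surv and LOC follow the slide output, while age and hrate are assumed names (Stata is case-sensitive, so adjust to the actual dataset):

  tabulate surv                     // 160 survived (0), 40 died (1)
  tabulate LOC                      // level of consciousness at admission
  summarize age hrate               // means and ranges of the quantitative predictors
  histogram age                     // density-scale histograms, as on the earlier slide
  histogram hrate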

ICU: predictors of death before discharge

- A logistic regression with all 5 predictors
- Age and level of consciousness (LOC) are both significant
- Re-estimate the model keeping only the significant terms
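
A minimal sketch of the two fits described above, with the same assumed variable names (the slides show the corresponding Stata output, which is not reproduced here):

  * Full model with all 5 predictors
  logit surv age sex race hrate LOC

  * Reduced model, keeping only the significant predictors
  logit surv age LOC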

ICU: predictors of death before discharge

- The final logistic model (using the logistic command)
- Age and LOC are significant
- Odds ratios:
  - For Age: 1.028 (95% CI 1.004 to 1.064), P = 0.022
  - For LOC: 36.16 (95% CI 7.63 to 171.24), P < 0.001

ICU: predictors of death before discharge

- The final logistic model (using the logit command) shows the model coefficients:
    log odds = log(p/(1 - p)) = -3.46 + 0.028 Age + 3.59 LOC
- Note: e^0.028 = 1.028 (OR for Age) and e^3.59 = 36.16 (OR for LOC)
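
The odds ratios on the previous slide are just the exponentiated coefficients; a quick check:

  display exp(0.028)                // about 1.028, the OR for Age
  display exp(3.59)                 // about 36.2; the slide's 36.16 comes from the unrounded coefficient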

Interpretation of coefficients of the logistic regression model

- The sign of the βi term indicates whether p increases or decreases as xi increases
- ICU example: both βi terms were positive, so the risk of death increases with age and with the presence of deep stupor or coma
- The magnitude of the βi term gives the additive change in the log odds for a +1 unit change in the predictor variable, holding the other predictors fixed

Interpretation of coefficients

- The magnitude of the odds ratio (= e^βi) gives the multiplicative change in the odds for a +1 change in the predictor
- ICU example: the odds of death increase multiplicatively by 2.8% (OR = 1.028) for each one-year increase in age
- To see this, exponentiate both sides of the logistic model and note that
    p/(1 - p) = e^(β0 + β1(x + 1)) = (e^β0)(e^(β1 x))(e^β1),  where e^β1 = OR
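
A quick numeric check of this multiplicative interpretation, using the fitted ICU coefficients and a hypothetical pair of ages (50 and 51) with LOC held at 0:

  * Odds of death at ages 50 and 51, LOC = 0, from the fitted model
  scalar odds50 = exp(-3.46 + 0.028*50)
  scalar odds51 = exp(-3.46 + 0.028*51)
  display odds51/odds50             // about 1.028 = exp(0.028), the OR for Age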

Final thoughts on logistic regression

- Some of you will find logistic regression useful in a project, so the last problem set has a logistic regression problem. It is not covered on the final exam, because we have not had time to digest it.
- Logistic regression extends the analysis of two-way tables: the response variable must still be binary, but the predictors can now be categorical or quantitative.
- Logistic regression is an example of a class of regression models much more general than linear regression. These models are covered in detail in Stat 138 and Stat 149.