Psychology 5741 (Neuroscience) Logistic Regression

QMIN Logistic Regression

Data Set: Logistic
SAS input: Logistic.input.sas

Background: The purpose of this study was to explore the action of a GABA (γ-aminobutyric acid) blocker on seizures. In most areas of the brain GABA is an inhibitory neurotransmitter, so blocking GABA might in theory lead to excitation and possibly seizures. Rats were given a dose of the blocker and then assessed over a 30-minute period for seizures. Afterwards, the rats were sacrificed, their brains were dissected, and a measure of the GABA receptors blocked by the drug was obtained: the larger the number, the greater the number of receptors blocked (per unit volume). The data set also includes the sex of the rat (0 = female, 1 = male).

Background to Logistic Regression

Logistic regression is used to predict two different types of dependent variables. The first type is a dichotomous dependent variable that takes on one of two mutually exclusive states. Examples of such a variable are success versus fail, correct versus incorrect, and schizophrenic versus not schizophrenic. The second type of dependent variable is an ordinal scale of response. Here, a study on schizophrenia might assign a value of 0 to those participants who lack appreciable schizophrenic pathology, a value of 1 to those with schizotypal personality but not full-blown schizophrenia, and a value of 2 to schizophrenics.

Ordinary regression computes a predicted value of a dependent variable as a linear function of a set of predictor (or independent) variables. The equation is

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k.$$

Logistic regression also begins with a linear function of the predictor (or independent) variables, but this linear function does not equal the predicted value of the dependent variable. Instead, the linear function predicts a new variable that we will denote as L, for liability towards the dependent variable.
Hence, the starting equation for logistic regression is

$$L = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k.$$

Then the probability that the dependent variable takes on a specific state is a function of the liability dimension, L:

$$\Pr(Y = \text{State 1}) = \frac{\exp(L)}{1 + \exp(L)}.$$

In the current example, we want to predict the presence of a seizure from two variables in the data set, sex of the rat and the amount of GABA blocked. We begin by creating a model that predicts the liability of developing a seizure. We should be familiar with writing this type of model because it is the one that we have used for ANOVA and regression. We write L as a function of sex and GABA blockage and allow for the possibility of an interaction between sex and GABA blockage. The equation is

$$L = b_0 + b_1\,\text{sex} + b_2\,\text{GABA} + b_3\,(\text{sex} \times \text{GABA}).$$

Hence, the probability that an animal has a seizure equals

$$\Pr(\text{Seizure}) = \frac{\exp(L)}{1 + \exp(L)}.$$

SAS PROC LOGISTIC

The text below shows a SAS program that performs the logistic regression detailed above.

    PROC LOGISTIC DATA=logistic;
      MODEL seizure = sex gababl sex*gababl;
    RUN;

Note that the MODEL statement uses the same syntax as the MODEL statement for PROC GLM or PROC REG. PROC LOGISTIC will automatically parse the MODEL statement and create the appropriate mathematical equations to solve for the unknown coefficients (i.e., the $b$s). Output from this procedure is given below. We will examine individual sections of the output and comment on them.

    The LOGISTIC Procedure

    Model Information
    Data Set                     WORK.LOGISTIC
    Response Variable            seizure
    Number of Response Levels    2
    Number of Observations       121
    Model                        binary logit
    Optimization Technique       Fisher's scoring

    Response Profile
    Ordered                   Total
      Value    seizure    Frequency
          1    0                 77
          2    1                 44

    Probability modeled is seizure=0.

This first section of the output provides descriptive information about the logistic regression by naming the data set, the dependent variable, the number of observations, and other technical information about the method of analysis. Make certain to examine the table labeled Response Profile. The last line of this section ("Probability modeled is seizure=0") gives the state of the dependent variable that the model is trying to predict. In the present case, we are predicting the absence of a seizure. (This should not be of concern because, as we see later, we only have to reverse the sign of the coefficients to predict the presence of a seizure.)

    Model Fit Statistics
                                Intercept
                  Intercept        and
    Criterion       Only        Covariates
    AIC            160.627       134.870
    SC             163.422       146.053
    -2 Log L       158.627       126.870

    Testing Global Null Hypothesis: BETA=0
    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio       31.7570     3        <.0001
    Score                  27.3657     3        <.0001
    Wald                   20.1104     3        0.0002

The next section of the output presents results that are analogous to the omnibus F test in ANOVA and regression. The bottom section of this output gives three chi-square statistics that assess whether the model as a whole predicts better than chance. To be more specific, the null hypothesis being tested is that the $b$s for all independent variables (but not the intercept, $b_0$) equal 0. If the value of $\chi^2$ is large and its associated p value is less than the chosen significance level, then we reject the null hypothesis that all the $b$s (except for the intercept, $b_0$) can be set to 0. In the present case, the null hypothesis is rejected for all three types of $\chi^2$. Although this pattern is frequently observed, it is not universal. Typically, the likelihood ratio $\chi^2$ is the most powerful while the Wald $\chi^2$ is the most conservative.

The rows above labeled AIC and SC give two alternative statistics used to assess the usefulness of the model. AIC denotes Akaike's Information Criterion and is a measure that balances the gain in predictability from adding variables to the model against the number of variables. Models with a lower AIC are to be preferred over those with a higher AIC. Here, the AIC for a model that fits only the intercept is 160.63, while the AIC for a model that fits an intercept and all three independent variables (confusingly called covariates in the output) is 134.87.
Hence, the model with the independent variables is to be preferred over that with just an intercept. SC (also abbreviated SBC) stands for Schwarz's Bayesian Criterion, and it follows the same logic as the AIC: models with smaller values of SC are preferred over those with larger values. Again, the SC favors the model with the independent variables (146.05 versus 163.42 for the intercept-only model).

Which statistic to report? In neuroscience, it is recommended that you report only the likelihood ratio $\chi^2$, its degrees of freedom, and its p value.
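The entries in the Model Fit Statistics table are linked by simple arithmetic, and a short Python sketch can make that link concrete. It assumes the usual definitions, which are not spelled out in the handout: AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), with k the number of estimated parameters (1 for the intercept-only model, 4 for the full model) and n = 121 observations. It also takes the likelihood ratio chi-square as the difference between the two -2 Log L values. The function names are illustrative only.

```python
import math

# Values copied from the PROC LOGISTIC output.
N_OBS = 121             # Number of Observations
NEG2LL_NULL = 158.627   # -2 Log L, intercept only
NEG2LL_FULL = 126.870   # -2 Log L, intercept and covariates
K_NULL, K_FULL = 1, 4   # assumed parameter counts: intercept; intercept + 3 bs


def aic(neg2ll, k):
    """Akaike's Information Criterion: -2 Log L plus 2 per parameter."""
    return neg2ll + 2 * k


def sc(neg2ll, k, n_obs):
    """Schwarz criterion: -2 Log L plus ln(n) per parameter."""
    return neg2ll + k * math.log(n_obs)


def chi2_upper_tail_3df(x):
    """Upper-tail p value of a chi-square with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Likelihood ratio chi-square: drop in -2 Log L when the 3 predictors are added.
lr_chi_square = NEG2LL_NULL - NEG2LL_FULL
lr_p_value = chi2_upper_tail_3df(lr_chi_square)

print("AIC:", aic(NEG2LL_NULL, K_NULL), aic(NEG2LL_FULL, K_FULL))
print("SC: ", sc(NEG2LL_NULL, K_NULL, N_OBS), sc(NEG2LL_FULL, K_FULL, N_OBS))
print("LR chi-square:", lr_chi_square, "p:", lr_p_value)
```

Because the inputs are rounded to three decimals in the output, the recomputed values can differ from the printed ones in the last digit.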

    Analysis of Maximum Likelihood Estimates
                                 Standard        Wald
    Parameter     DF  Estimate      Error  Chi-Square  Pr > ChiSq
    Intercept      1    6.8144     1.9413     12.3213      0.0004
    sex            1    0.4683     3.0209      0.0240      0.8768
    gababl         1   -0.4754     0.1458     10.6319      0.0011
    sex*gababl     1   -0.0834     0.2349      0.1261      0.7225

Next, the output from PROC LOGISTIC gives the parameter estimates, their standard errors, and a test of whether the estimates differ significantly from 0. This part of the output should be interpreted just as the analogous sections on the parameter estimates from a regression or ANOVA output. SAS tests the estimates using a statistic called a Wald $\chi^2$. If the value of $\chi^2$ is large and its associated p value is small, then reject the null hypothesis that the parameter is 0. For the present example, neither the variable sex nor its interaction with the amount of GABA blockage contributes significantly to prediction.

Above we noted that SAS was predicting the absence of a seizure. To predict the presence of a seizure, all we need to do is reverse the sign of each coefficient. Hence, to predict the presence of a seizure, the coefficient for the variable gababl (GABA blockage) is 0.4754. This denotes that higher levels of GABA blockage (and hence, higher neuronal activity) increase the probability of a seizure.

    Association of Predicted Probabilities and Observed Responses
    Percent Concordant    80.6    Somers' D    0.619
    Percent Discordant    18.7    Gamma        0.624
    Percent Tied           0.7    Tau-a        0.289
    Pairs                 3388    c            0.810

The final section of the output shows the extent to which predictions based on the logistic regression agree with the observed outcome (or dependent variable). The statistics are computed over all 3388 pairs that combine one rat without a seizure and one rat with a seizure (77 × 44 = 3388). A pair is concordant when the rat that had no seizure also received the higher predicted probability of no seizure; this was the case for 80.6% of the pairs. Note that this is a statement about pairs of rats, not the percentage of individual rats classified correctly. Other statistics used for agreement are also presented (see the SAS manual for their meaning).
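Several numbers in the two tables above can be recomputed from other entries, which is a useful check on how they are read. The Python sketch below assumes that each Wald chi-square (1 df) is the squared ratio of an estimate to its standard error, that the number of pairs is the product of the two response frequencies, that Somers' D is the concordant minus the discordant proportion, and that c is the concordant proportion plus half the tied proportion. These are the standard definitions as far as I know; the handout itself defers to the SAS manual, so treat them as assumptions. All names are illustrative.

```python
# (estimate, standard error) copied from the parameter estimates table.
ESTIMATES = {
    "Intercept": (6.8144, 1.9413),
    "sex": (0.4683, 3.0209),
    "gababl": (-0.4754, 0.1458),
    "sex*gababl": (-0.0834, 0.2349),
}


def wald_chi_square(estimate, std_error):
    """Wald chi-square on 1 df: squared ratio of estimate to standard error."""
    return (estimate / std_error) ** 2


# Response frequencies and percentages copied from the output.
N_NO_SEIZURE, N_SEIZURE = 77, 44
PCT_CONCORDANT, PCT_DISCORDANT, PCT_TIED = 80.6, 18.7, 0.7

# Every rat without a seizure is paired with every rat with a seizure.
pairs = N_NO_SEIZURE * N_SEIZURE
# Somers' D: concordant minus discordant, as proportions of all pairs.
somers_d = (PCT_CONCORDANT - PCT_DISCORDANT) / 100
# c: concordant pairs plus half of the tied pairs, as a proportion.
c_statistic = (PCT_CONCORDANT + 0.5 * PCT_TIED) / 100

for name, (estimate, std_error) in ESTIMATES.items():
    print(f"{name:12s} Wald chi-square = {wald_chi_square(estimate, std_error):.4f}")
print("pairs =", pairs, " Somers' D =", somers_d, " c =", c_statistic)
```

Small differences in the fourth decimal are expected, because the printed estimates and standard errors are themselves rounded.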

The Logistic Function

The figure below shows the logistic function for the present example. The X axis plots the value of L for each rat. In this case,

$$L = -6.8144 - 0.4683\,\text{sex} + 0.4754\,\text{GABA} + 0.0834\,(\text{sex} \times \text{GABA}).$$

These are the estimates from the output with their signs reversed, so that L is the liability towards having a seizure. Note that the scale for liability differs from that of the raw variables.
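As a rough illustration of how the liability equation turns into a probability, the following Python sketch evaluates L and Pr(Seizure) = exp(L) / (1 + exp(L)) using the sign-reversed coefficients. The GABA values passed in are hypothetical and chosen only for illustration, since the handout does not give the observed range of the GABA measure. The function names are likewise assumptions of this sketch.

```python
import math

# Sign-reversed estimates from the output, so L is the liability toward a seizure.
B0 = -6.8144
B_SEX = -0.4683
B_GABA = 0.4754
B_SEX_GABA = 0.0834


def liability(sex, gaba):
    """L for a rat with the given sex (0 = female, 1 = male) and GABA blockage."""
    return B0 + B_SEX * sex + B_GABA * gaba + B_SEX_GABA * sex * gaba


def pr_seizure(sex, gaba):
    """Logistic function of the liability: exp(L) / (1 + exp(L))."""
    exp_l = math.exp(liability(sex, gaba))
    return exp_l / (1 + exp_l)


# Hypothetical GABA blockage values, for illustration only.
for gaba in (10.0, 14.0, 18.0):
    print(f"GABA={gaba:5.1f}  female: {pr_seizure(0, gaba):.3f}  male: {pr_seizure(1, gaba):.3f}")
```

For a female rat the liability crosses zero, and the predicted probability crosses 0.5, at a GABA value of 6.8144 / 0.4754, which is about 14.3 on the raw scale.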