QMIN Logistic Regression
Psychology 5741 (Neuroscience)

Logistic Regression

Data Set: Logistic
SAS input: Logistic.input.sas

Background:
The purpose of this study was to explore the action of a GABA (γ-aminobutyric acid) blocker on seizures. In most areas of the brain GABA is an inhibitory neurotransmitter, so blocking GABA might in theory lead to excitation and possibly seizures. Rats were given a dose of the blocker and then assessed over a 30-minute period for seizures. Afterwards, the rats were sacrificed, their brains were dissected, and a measure of the GABA receptors blocked by the drug was obtained: the larger the number, the greater the number of receptors blocked (per unit volume). The data set also includes the sex of the rat (0 = female, 1 = male).

Background to Logistic Regression
Logistic regression is used to predict two different types of dependent variables. The first type is a dichotomous dependent variable that takes on one of two mutually exclusive states. Examples of such a variable are success versus failure, correct versus incorrect, and schizophrenic versus not schizophrenic. The second type of dependent variable is an ordinal scale of response. Here, a study on schizophrenia might assign a value of 0 to those participants who lack appreciable schizophrenic pathology, a value of 1 to those with schizotypal personality but not full-blown schizophrenia, and a value of 2 to schizophrenics.

Ordinary regression computes a predicted value of a dependent variable as a linear function of a set of predictor (or independent) variables. The equation is

$$\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$

Logistic regression also begins with a linear function of the predictor (or independent) variables, but this linear function does not equal the predicted value of the dependent variable. Instead, the linear function predicts a new variable that we will denote as L for liability towards the dependent variable.
Hence, the starting equation for logistic regression is

$$L = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$

Then the probability that the dependent variable takes on a specific state is a function of the liability dimension, L:

$$\Pr(Y = \text{State 1}) = \frac{\exp(L)}{1 + \exp(L)}$$

In the current example, we want to predict the presence of a seizure from two variables in the data set: the sex of the rat and the amount of GABA blocked. We begin by creating a model that predicts the liability of developing a seizure. We should be familiar with writing this type of model because it is the one that we have used for ANOVA and
regression. We write L as a function of sex and GABA blockage and allow for the possibility of an interaction between sex and GABA blockage. The equation is

$$L = \beta_0 + \beta_1\,\text{sex} + \beta_2\,\text{GABA} + \beta_3\,(\text{sex} \times \text{GABA})$$

Hence, the probability that an animal has a seizure equals

$$\Pr(\text{Seizure}) = \frac{\exp(L)}{1 + \exp(L)}$$

SAS PROC LOGISTIC
The text below shows a SAS program that performs the logistic regression detailed above.

    /* Logistic regression of seizure on sex, GABA blockage, and their interaction */
    PROC LOGISTIC DATA=logistic;
       MODEL seizure = sex gababl sex*gababl;
    RUN;

Note that the MODEL statement uses the same syntax as the MODEL statement for PROC GLM or PROC REG. PROC LOGISTIC will automatically parse the MODEL statement and create the appropriate mathematical equations to solve for the unknown coefficients (i.e., the βs). Output from this procedure is given below. We will examine individual sections of the output and comment on them.

    The LOGISTIC Procedure

    Model Information
    Data Set                     WORK.LOGISTIC
    Response Variable            seizure
    Number of Response Levels    2
    Number of Observations       121
    Model                        binary logit
    Optimization Technique       Fisher's scoring

    Response Profile
    Ordered                  Total
      Value    seizure     Frequency
          1    0                  77
          2    1                  44

    Probability modeled is seizure=0.

This first section of the output provides descriptive information about the logistic regression by naming the data set, the dependent variable, the number of observations, and other technical information about the method of analysis. Make certain to examine the table labeled Response Profile. The last line of this section (Probability modeled is seizure
= 0) gives the state of the dependent variable that the model is trying to predict. In the present case, we are predicting the absence of a seizure. (This should not be of concern because, as we see later, we only have to reverse the sign of the coefficients to predict the presence of a seizure.)

    Model Fit Statistics
                                  Intercept
                   Intercept         and
    Criterion        Only         Covariates
    AIC             160.627        134.870
    SC              163.422        146.053
    -2 Log L        158.627        126.870

    Testing Global Null Hypothesis: BETA=0
    Test                 Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        31.7570     3        <.0001
    Score                   27.3657     3        <.0001
    Wald                    20.1104     3        0.0002

The next section of the output presents results that are analogous to the omnibus F test in ANOVA and regression. The bottom section of this output gives three chi-square statistics that assess whether the model as a whole predicts better than chance. To be more specific, the null hypothesis being tested is that the βs for all independent variables (but not the intercept, β_0) equal 0. If the value of χ² is large and its associated p value is less than the chosen significance level, then we reject the null hypothesis that all the βs (except for the intercept, β_0) can be set to 0. In the present case, the null hypothesis is rejected for all three types of χ². Although this pattern is frequently observed, it is not universal. Typically, the likelihood ratio χ² is the most powerful while the Wald χ² is the most conservative.

The rows above labeled AIC and SC give two alternative statistics used to assess the usefulness of the model. AIC denotes Akaike's Information Criterion, a measure that balances the improvement in prediction gained by adding variables to the model against the number of variables added. Models with a lower AIC are to be preferred over those with a higher AIC. Here, the AIC for a model that fits only the intercept is 160.63, while the AIC for a model that fits an intercept and all three independent variables (confusingly called covariates in the output) is 134.87.
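The AIC and SC rows of the Model Fit Statistics table can be reproduced from the -2 Log L row. The sketch below is an illustration in Python rather than SAS. It assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k is the number of estimated parameters (1 for the intercept-only model, 4 for the model with the three independent variables) and n = 121 is the number of observations. The function names are invented for this example.

```python
import math

def aic(neg2_log_l, n_params):
    """Akaike's criterion: -2 Log L plus a penalty of 2 per parameter."""
    return neg2_log_l + 2 * n_params

def sc(neg2_log_l, n_params, n_obs):
    """Schwarz criterion: -2 Log L plus a penalty of ln(n) per parameter."""
    return neg2_log_l + n_params * math.log(n_obs)

N_OBS = 121  # Number of Observations in the SAS output

# (-2 Log L, number of estimated parameters) copied from the SAS output
fits = {
    "intercept only": (158.627, 1),
    "intercept and covariates": (126.870, 4),
}

for label, (neg2ll, k) in fits.items():
    print(f"{label}: AIC = {aic(neg2ll, k):.3f}, SC = {sc(neg2ll, k, N_OBS):.3f}")
```

Because both criteria add a penalty that grows with k, a model with more variables has the smaller AIC or SC only when its -2 Log L drops by more than the added penalty.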
Hence, the model with the independent variables is to be preferred over that with just an intercept. SC (often abbreviated SBC) stands for Schwarz's Bayesian Criterion, and it follows the same logic as the AIC: models with smaller values of SBC are preferred over those with larger values. Again, the SBC is smaller for the model with the independent variables (146.05 versus 163.42), which suggests that they improve prediction.

Which statistic to report? In neuroscience, it is recommended that you report only the likelihood ratio χ², its degrees of freedom, and its p value.
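The likelihood ratio χ² in the output is the difference between the two -2 Log L values, with degrees of freedom equal to the number of independent variables added (3). The sketch below, in Python rather than SAS, recomputes it and its p value. The helper `chi2_sf_df3` is an invented name; it uses the closed-form upper-tail probability of a chi-square distribution that holds only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability Pr(chi-square > x) for exactly 3 df.

    Closed form valid only for df = 3:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# -2 Log L values copied from the Model Fit Statistics table
NEG2LL_INTERCEPT_ONLY = 158.627
NEG2LL_FULL_MODEL = 126.870

lr_chi_square = NEG2LL_INTERCEPT_ONLY - NEG2LL_FULL_MODEL
lr_p_value = chi2_sf_df3(lr_chi_square)

print(f"Likelihood ratio chi-square = {lr_chi_square:.3f} on 3 df")
print(f"p < .0001: {lr_p_value < 0.0001}")

# The same helper applied to the Wald chi-square reported by SAS
wald_p_value = chi2_sf_df3(20.1104)
print(f"Wald p value = {wald_p_value:.4f}")
```

For models with a different number of independent variables, a general chi-square routine from a statistics library would be needed in place of this df = 3 shortcut.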
    Analysis of Maximum Likelihood Estimates
                                  Standard        Wald
    Parameter     DF   Estimate     Error    Chi-Square   Pr > ChiSq
    Intercept      1     6.8144    1.9413       12.3213       0.0004
    sex            1     0.4683    3.0209        0.0240       0.8768
    gababl         1    -0.4754    0.1458       10.6319       0.0011
    sex*gababl     1    -0.0834    0.2349        0.1261       0.7225

Next, the output from PROC LOGISTIC gives the parameter estimates, their standard errors, and a test of whether the estimates differ significantly from 0. This part of the output should be interpreted just as the analogous sections on the parameter estimates from a regression or ANOVA output. SAS tests the estimates using a statistic called a Wald χ². If the value of χ² is large and its associated p value is small, then we reject the null hypothesis that the parameter is 0. For the present example, neither the variable sex nor its interaction with the amount of GABA blockage contributes significantly to prediction.

Above we noted that SAS was predicting the absence of a seizure. To predict the presence of a seizure, all we need to do is reverse the sign of each coefficient. Hence, to predict the presence of a seizure, the coefficient for the variable gababl (GABA blockage) is 0.4754. This denotes that higher levels of GABA blockage (and hence, higher neuronal activity) increase the probability of a seizure.

    Association of Predicted Probabilities and Observed Responses
    Percent Concordant     80.6    Somers' D    0.619
    Percent Discordant     18.7    Gamma        0.624
    Percent Tied            0.7    Tau-a        0.289
    Pairs                  3388    c            0.810

The final section of the output shows the extent to which predictions based on the logistic regression agree with the observed outcome (or dependent variable). The comparison is made over the 3388 pairs that can be formed from one rat without a seizure and one rat with a seizure (77 × 44). In 80.6% of these pairs the prediction was concordant with the data: the rat that had no seizure was given the higher predicted probability of having no seizure. Other statistics used for agreement are also presented (see the SAS manual for their meaning).
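The four summary statistics in the right-hand column of the association table are all functions of the concordant, discordant, and tied pairs. The sketch below, in Python rather than SAS, recomputes them from the left-hand column using the definitions given in the SAS documentation for this table. Because the percentages printed by SAS are rounded to one decimal place, the results should be expected to agree with the output only to about two or three decimals. The function and variable names are invented for this example.

```python
# Counts and percentages copied from the SAS output
N_NO_SEIZURE = 77      # seizure = 0
N_SEIZURE = 44         # seizure = 1
PCT_CONCORDANT = 80.6
PCT_DISCORDANT = 18.7
PCT_TIED = 0.7

def association_stats(n0, n1, pct_conc, pct_disc, pct_tied):
    """Rank-association measures from concordant/discordant/tied pair percentages."""
    n_obs = n0 + n1
    pairs = n0 * n1                      # pairs with different observed outcomes
    conc = pct_conc / 100 * pairs        # approximate number of concordant pairs
    disc = pct_disc / 100 * pairs        # approximate number of discordant pairs
    tied = pct_tied / 100 * pairs        # approximate number of tied pairs
    return {
        "pairs": pairs,
        "somers_d": (conc - disc) / pairs,
        "gamma": (conc - disc) / (conc + disc),
        "tau_a": (conc - disc) / (n_obs * (n_obs - 1) / 2),
        "c": (conc + 0.5 * tied) / pairs,
    }

stats = association_stats(
    N_NO_SEIZURE, N_SEIZURE, PCT_CONCORDANT, PCT_DISCORDANT, PCT_TIED
)
for name, value in stats.items():
    print(f"{name}: {value:.3f}" if name != "pairs" else f"{name}: {value}")
```

The statistic c counts a tied pair as half a concordant pair, so it can be read as the proportion of pairs that the model orders correctly.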
The Logistic Function
The figure below shows the logistic function for the present example. The X axis plots the value of L for each rat. In this case,

$$L = -6.8144 - 0.4683\,\text{sex} + 0.4754\,\text{GABA} + 0.0834\,(\text{sex} \times \text{GABA})$$

Note that the scale for liability differs from that of the raw variables.
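To make the link between liability and probability concrete, the sketch below (in Python rather than SAS) evaluates the liability equation above and converts it to a probability of a seizure with the logistic function exp(L) / (1 + exp(L)). The coefficients are the sign-reversed estimates from the SAS output. The GABA values in the loop are hypothetical and chosen only for illustration; they are not taken from the data set. The function names are invented for this example.

```python
import math

# Sign-reversed SAS estimates, so that L is the liability toward a seizure
B_INTERCEPT = -6.8144
B_SEX = -0.4683
B_GABA = 0.4754
B_SEX_GABA = 0.0834

def liability(sex, gaba):
    """Liability toward a seizure; sex is 0 (female) or 1 (male)."""
    return B_INTERCEPT + B_SEX * sex + B_GABA * gaba + B_SEX_GABA * sex * gaba

def pr_seizure(sex, gaba):
    """Logistic function of L; 1 / (1 + exp(-L)) equals exp(L) / (1 + exp(L))."""
    return 1.0 / (1.0 + math.exp(-liability(sex, gaba)))

def pr_no_seizure(sex, gaba):
    """Probability SAS actually modeled: same function with every sign reversed."""
    return 1.0 / (1.0 + math.exp(liability(sex, gaba)))

# Hypothetical GABA blockage values, for illustration only
for gaba in (5.0, 10.0, 15.0, 20.0):
    for sex, label in ((0, "female"), (1, "male")):
        print(f"GABA = {gaba:4.1f}, {label:6s}: "
              f"L = {liability(sex, gaba):7.3f}, "
              f"Pr(seizure) = {pr_seizure(sex, gaba):.3f}")
```

A liability of 0 corresponds to a probability of .5, and reversing the sign of every coefficient turns Pr(seizure) into Pr(no seizure), which is why the two sets of estimates describe the same model.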