Testing and Interpreting Interactions in Regression In a Nutshell



The principles given here always apply when interpreting the coefficients in a multiple regression analysis containing interactions. Given these principles, however, the meaning of the coefficients for categorical variables varies according to the method used to code the categorical variables. The method assumed here is dummy coding, whereby each category except one is represented by a dummy (or indicator) variable which has a value of one for the category being represented and a value of zero for all other categories. The category for which there is no dummy variable consequently has a value of zero on all the dummy variables and is known as the reference category.

It is also assumed, for convenience, that the indicator variables are entered into the procedure used for the analysis in such a way that the procedure doesn't do its own coding, but leaves the variables exactly as they are coded. For example, if the GLM procedure in SPSS is used, it is assumed here that the indicator variables are entered as covariates rather than fixed factors (after 'with' rather than 'by' in syntax). If they are entered as fixed factors, or after 'by' in GLM syntax, the procedure always makes the highest-numbered category the reference category, which can be a bit difficult to get your head around. For example, if our variable gender is coded 0 for females and 1 for males (so that females are the reference category), GLM reverses this so that males are now the reference category (0) and females are represented by 1.

Note that the comments here apply to the regression coefficients shown in the parameter estimates table, not to the results in the ANOVA table. Exactly the same principles apply to the interpretation of the results shown in the ANOVA table, but programs like SPSS, if we allow them to do the coding of categorical variables for us, typically don't use dummy coding for the results shown in ANOVA tables.
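As a concrete sketch of dummy coding (the variable name and categories below are invented for illustration), a three-category variable is represented by two indicator variables, and the reference category scores zero on both:

```python
import numpy as np

# Invented three-category variable; "control" is the reference category,
# so it gets no dummy of its own and scores 0 on both dummies.
condition = np.array(["control", "drugA", "drugB", "control", "drugB"])

drugA = (condition == "drugA").astype(int)  # 1 for drugA, else 0
drugB = (condition == "drugB").astype(int)  # 1 for drugB, else 0

print(drugA)  # [0 1 0 0 0]
print(drugB)  # [0 0 1 0 1]
```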
This is a topic in itself.

As the emphasis here is on interpreting interactions, no reference is made in the following to interpreting the coefficient for the constant. However, a note at the end briefly describes the effects that the strategies used for interpreting interactions have on the constant.

Two-Way Interactions

In the regression equation for the model y = A + B + A*B (where A*B is the product of A and B, which is a test of their interaction), the regression coefficient for A shows the effect of A when B is zero, and the coefficient for B shows the effect of B when A is zero. (The coefficient for A*B shows how the effect of A changes with a one-unit increase in B, but we won't be concentrating on that here.) This rule holds whether the interaction is significant or not: its mere presence changes the interpretation of the coefficients for A and B from unconditional (when there is no interaction term included) to conditional.
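This rule can be checked numerically. The sketch below (all coefficient values invented for illustration) builds noise-free data from a known model and recovers the coefficients by least squares; the coefficient for A comes back as its effect when B is zero, not as some overall effect:

```python
import numpy as np

# Illustrative data: generate A and B, then build y exactly from
# y = 2 + 3*A + 1.5*B + 0.5*A*B (no error term, so the fit is exact).
rng = np.random.default_rng(0)
A = rng.normal(size=200)
B = rng.normal(size=200)
y = 2 + 3 * A + 1.5 * B + 0.5 * A * B

# Fit y = const + A + B + A*B by ordinary least squares.
X = np.column_stack([np.ones_like(A), A, B, A * B])
const, bA, bB, bAB = np.linalg.lstsq(X, y, rcond=None)[0]

# bA is the slope of y on A *when B is zero* (here, 3), and bAB shows how
# that slope changes with a one-unit increase in B (here, 0.5).
print(round(bA, 3), round(bB, 3), round(bAB, 3))  # 3.0 1.5 0.5
```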

Categorical Variables

Imagine that A and B are single dummy (0,1) variables, and that A represents gender (0=female, 1=male) and B represents condition (0=control, 1=experimental). Then the interaction shows whether the effect of condition is different for males and females (or, equivalently, whether the difference between males and females on y is different for the two conditions). Given the rule stated earlier, the coefficient for A shows the difference in y between males and females for the control condition, because the control condition is coded zero. Likewise, the coefficient for B shows the difference between the control and experimental conditions for female subjects.

Once this is understood, a whole new world of possibilities opens up. To find out whether males' and females' scores differ for the experimental condition, we can run the analysis again with the codes for condition reversed (0=experimental, 1=control), so that now the coefficient for gender shows the difference between males and females in the experimental condition. Likewise, we can see whether the treatment (experimental versus control) had an effect for males by reversing the coding for gender, so that 0=male and 1=female. This is referred to as testing the simple effects of gender and condition.

One Categorical and One Numeric Variable

Now imagine that in y = A + B + A*B (where A*B is the product of A and B, which is a test of their interaction), A is still gender, dummy coded as before, but B is a continuous variable, let's say age in years. Exactly the same rules apply. Let's start with the coefficient for B. It shows the effect of age (by effect I mean the slope of the regression line relating age and y) for females. The test of significance of the effect for age shows whether the slope of the line departs significantly from zero and, of course, the sign of the coefficient shows whether the relationship is positive or negative.
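Going back to the gender-by-condition case for a moment, the recoding trick for simple effects can be sketched with a toy dataset (the cell means below are invented). With one observation per cell, the model reproduces the cell means exactly, so the gender coefficient is exactly the male-female difference in whichever condition is coded zero:

```python
import numpy as np

# Invented cell means: female-control 10, female-experimental 12,
# male-control 12, male-experimental 17.
gender = np.array([0, 0, 1, 1])   # 0=female, 1=male
cond = np.array([0, 1, 0, 1])     # 0=control, 1=experimental
y = np.array([10.0, 12.0, 12.0, 17.0])

def coefs(g, c):
    """Least-squares fit of y = const + g + c + g*c."""
    X = np.column_stack([np.ones_like(y), g, c, g * c])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_gender = coefs(gender, cond)[1]
print(round(b_gender, 3))  # 2.0 -> male-female difference in the control group

# Reverse the coding for condition (0=experimental, 1=control) and refit:
b_gender_rev = coefs(gender, 1 - cond)[1]
print(round(b_gender_rev, 3))  # 5.0 -> male-female difference in the experimental group
```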
In order to examine the relationship between age and y for males, we can reverse the coding for gender, so that now the coefficient for age is the slope for males. This is called testing the simple slopes. What about the difference between males and females? Exactly the same principle applies. The coefficient for A shows the difference between males and females when age is zero. But hang on: if we have a sample of people aged from 18 up, what does this mean? Well, it's a perfectly valid result as far as the model is concerned, but the coefficient is pretty meaningless given that the sample contains no one of zero age (and very few samples are likely to). So, this is where centring comes in. If we subtract the mean of age for the sample from each subject's age, the mean of age is now zero and, when we run the analysis again, the coefficient for gender now shows the difference between males and females at the mean age of the sample, a much more meaningful value. (The exciting thing, of course, is that we don't have to stop at the

mean. Say the mean age of the sample is 35, but we have a goodly number of subjects aged from 18 to 30; it would be legitimate, instead of subtracting the mean age, to subtract (say) 25, and find out whether the model suggests that there is a significant difference between males and females aged 25.)

Two Numeric Variables

Now imagine that in y = A + B + A*B (where A*B is the product of A and B, which is a test of their interaction) both A and B are numeric. Let's say B is age, as above, but now A is IQ. If y is a test score of some sort, the question might be whether the relationship between age and y differs according to IQ (or whether the relationship between IQ and y differs according to age). The coefficient for A shows the relationship between IQ and y when age is zero, and the coefficient for B shows the relationship between age and y when IQ is zero. (Of course, the coefficient for A*B answers the research question, but we're concentrating on how including an interaction changes the meaning of the coefficients for the variables involved in the interaction.) Once again, centring will produce more meaningful values for the coefficients. If we centre age and IQ at their respective means, the coefficient for A shows the effect (slope) of IQ at the mean age and the coefficient for B shows the slope for age at the average IQ of the sample.

Three-Way Interactions

The same principles apply when we move from two-way to higher-level interactions. Here is an example of a model with a three-way interaction and all two-way interactions:

y = A + B + C + A*B + A*C + B*C + A*B*C

Now, as well as considering the effects of the inclusion of an interaction on the interpretation of coefficients for individual variables, we can consider the effects of including higher-order interactions on the interpretation of the coefficients for lower-order interactions. But it all makes perfect sense.
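The centring ideas above can be sketched numerically (ages and coefficient values invented for illustration). Because the data below are built with no error term, the fitted gender coefficient is exactly the male-female gap at whatever age has been shifted to zero:

```python
import numpy as np

# Invented model: the male-female gap is 1 at age 0 and grows by 0.1 per year.
rng = np.random.default_rng(1)
n = 300
gender = rng.integers(0, 2, n)        # 0=female, 1=male
age = rng.uniform(18, 60, n)
y = 5 + 1.0 * gender + 0.2 * age + 0.1 * gender * age

def gender_coef(age_var):
    """Gender coefficient from y = const + gender + age + gender*age."""
    X = np.column_stack([np.ones(n), gender, age_var, gender * age_var])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(round(gender_coef(age), 2))               # 1.0 -> gap at age 0 (an extrapolation)
print(round(gender_coef(age - age.mean()), 2))  # gap at the sample's mean age
print(round(gender_coef(age - 25), 2))          # 3.5 -> gap at age 25 (1 + 0.1*25)
```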
Two-Way Interactions

The rules are as follows. When the interaction A*B*C is included:

The coefficient for A*B shows the interaction between A and B when C is zero,
The coefficient for A*C shows the interaction between A and C when B is zero, and
The coefficient for B*C shows the interaction between B and C when A is zero.
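These rules can also be checked numerically. In the sketch below (all variables and coefficient values invented), C is a dummy-coded gender variable: the fitted A*B coefficient is the age-by-IQ interaction for the group coded zero, and reversing the codes yields the interaction for the other group:

```python
import numpy as np

# Invented model: the age*iq interaction is 0.02 for females (gender=0)
# and 0.02 + 0.03 = 0.05 for males (gender=1). No error term, so the
# least-squares fit recovers the coefficients exactly.
rng = np.random.default_rng(2)
n = 400
gender = rng.integers(0, 2, n)       # 0=female, 1=male
age = rng.uniform(18, 30, n)
iq = rng.normal(100, 15, n)
y = (1 + 0.5 * gender + 0.1 * age + 0.03 * iq
     + 0.01 * gender * age + 0.02 * gender * iq
     + 0.02 * age * iq + 0.03 * gender * age * iq)

def age_iq_coef(g):
    """Coefficient for age*iq from the full three-way model."""
    X = np.column_stack([np.ones(n), g, age, iq,
                         g * age, g * iq, age * iq, g * age * iq])
    return np.linalg.lstsq(X, y, rcond=None)[0][6]

print(round(age_iq_coef(gender), 3))      # 0.02 -> age*iq interaction for females
print(round(age_iq_coef(1 - gender), 3))  # 0.05 -> age*iq interaction for males
```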

The same sorts of things described above apply here: we can manipulate the values of variables to carry out specific tests. For example, if C is gender, where 0=female and 1=male, the original analysis will show whether the interaction of A and B is significant for females; if we reverse the coding for gender, the result for A*B will show whether the interaction A*B is significant for males. If C is a numeric variable, the analysis will show whether the interaction A*B is significant when that variable has a value of zero. The usefulness of this result depends on the meaning of zero on C. If C is age, it is unlikely that the result will be useful. However, if C is centred at the mean, we can see whether the A*B interaction is significant at the mean age.

Main Effects

The rules are as follows. When the interaction A*B*C and all two-way interactions are included:

The coefficient for A shows the effect of A when both B and C are zero,
The coefficient for B shows the effect of B when both A and C are zero, and
The coefficient for C shows the effect of C when both A and B are zero.

Therefore, the simple effects or simple slopes for each variable can be tested by manipulating the values of the other two variables (see the example below).

Four-Way and Higher Interactions

The above principles extend directly to any order of interaction.

An Example

Say we have three IVs, gender (0=female, 1=male), age (mean 20) and IQ (mean 100), and a DV, creativity (the mean is irrelevant for our purposes).

The original analysis:

Selected Examples

1. The coefficient for age*iq shows whether there is an interaction between age and iq for females. Is there such an interaction for males?

temporary.
recode gender (0=1)(1=0).

2. In the original analysis, the coefficient for gender*iq shows whether the relationship between IQ and creativity differs for males and females aged zero. What is the interaction between gender and IQ at the mean age of the sample?

temporary.
compute age = age - 20.

3. In the original analysis, the coefficient for age shows the relationship between age and creativity for females with zero IQ. What is the relationship between age and creativity for males with an average IQ?

temporary.
compute iq = iq - 100.
recode gender (0=1)(1=0).

Two final points

1. Don't get misled and worked up about the apparently very different results obtained for a variable with and without an interaction.

Sometimes the coefficient for, or the significance of, a variable involved in an interaction changes dramatically when an interaction term is included in an analysis, especially if it contains numeric variables. The most likely reason for this is that the coefficient for that variable is now showing something very different from what it was showing when there was no interaction term. Without the interaction term, the coefficient shows the relationship between the variable and the dependent variable averaged over all levels of the other variables. When the interaction is included (whether it is significant or not), the coefficient for the variable shows the effect of that variable when the other variable involved in the interaction is zero. This is called a conditional effect, and can be very different from the unconditional effect obtained when there is no product term included in the analysis. Centring the numeric variable(s) at the mean will often produce coefficients which are much more similar to the unconditional coefficients.

2. The constant is affected by the coding and centring of variables in a regression analysis.

The constant in a regression equation shows the value of the dependent variable when all the independent variables are zero.
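As a minimal numerical sketch of this last point (scores invented): with a single dummy-coded predictor, the constant is the mean of the group coded zero, so reversing the codes swaps which group mean it reports:

```python
import numpy as np

# Invented scores: female mean is 5, male mean is 8.
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
gender = np.array([0, 0, 0, 1, 1, 1])   # 0=female, 1=male

def constant(g):
    """Constant from the least-squares fit of y = const + g."""
    X = np.column_stack([np.ones_like(y), g])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

print(round(constant(gender), 3))      # 5.0 -> mean for females (coded 0)
print(round(constant(1 - gender), 3))  # 8.0 -> mean for males after reversing
```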
Consequently, the constant may change dramatically when numeric variables are centred at the mean. The change is usually to a more sensible value (i.e., more likely to be within the range of the values of the dependent variable actually observed) than is obtained with the uncentred version of an independent variable which does not include zero in its range. The constant may

also change noticeably when a dummy code is reversed. For example, when a variable coded zero for females and one for males is reversed, the constant goes from showing the mean for females to showing the mean for males.

Alan Taylor
20th June 2007