Interactions involving Categorical Predictors




Interactions involving Categorical Predictors

Today's Class:
- To CLASS or not to CLASS: manual vs. program-created differences among groups
- Interactions of continuous and categorical predictors
- Interactions among categorical predictors

CLP 944: Lecture 2c 1

Categorical Predictors (3+ Groups)

Two alternatives for how to include grouping predictors:

1. Manually create and include dummy-coded group contrasts
- Need g-1 contrasts for g groups, added all at once and treated as continuous (WITH in SPSS, the default in SAS, c. in STATA)
- Corresponds more directly to the linear model representation
- Can be easier for setting your own reference group and contrasts of interest

2. Let the program create and include group contrasts for you
- Treated as categorical: BY in SPSS, CLASS in SAS, i. in STATA
- SPSS and SAS: reference = highest/last group; STATA: reference = lowest/first group
- Can be more convenient if you have many groups, want many contrasts, or have interactions among categorical predictors
- The program marginalizes over interacting predictors when estimating main effects

Categorical Predictors Treated as Continuous

Treatgroup variable: Control=0, Treat1=1, Treat2=2, Treat3=3

New variables to be created for the model:
  d1 = 0, 1, 0, 0  (difference between Control and T1)
  d2 = 0, 0, 1, 0  (difference between Control and T2)
  d3 = 0, 0, 0, 1  (difference between Control and T3)

Model: y = β0 + β1*d1 + β2*d2 + β3*d3 + e

How does the model give us all possible group differences? By determining each group's mean, and then the difference:

  Control (Reference): β0
  Treatment 1: β0 + β1
  Treatment 2: β0 + β2
  Treatment 3: β0 + β3

The model for the 4 groups directly provides 3 differences (control vs. each treatment), and indirectly provides another 3 differences (differences between treatments).
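The slides use SAS, but the dummy-coding logic itself can be sketched in a few lines of Python. This is a minimal illustration with made-up group means (all data here are assumptions, not from the lecture): fitting y on the three dummies makes the intercept the Control mean and each dummy coefficient that group's difference from Control.

```python
import numpy as np

# Hypothetical data: groups coded 0-3 (Control, T1, T2, T3), invented means
rng = np.random.default_rng(1)
group = np.repeat([0, 1, 2, 3], 25)
y = np.array([10.0, 12.0, 9.0, 15.0])[group] + rng.normal(0, 1, group.size)

# Manual dummy codes: g-1 = 3 contrasts, Control (group 0) is the reference
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
d3 = (group == 3).astype(float)
X = np.column_stack([np.ones(group.size), d1, d2, d3])

# OLS fit: beta = (b0, b1, b2, b3)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] = Control mean; beta[k] = mean of group k minus the Control mean
```

Because the model is saturated in the grouping variable, the fitted values are exactly the observed group means, which is why the betas reproduce the mean differences.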

Categorical Predictors Treated as Continuous

Group means under the model:
  Control (Reference): β0
  Treatment 1: β0 + β1
  Treatment 2: β0 + β2
  Treatment 3: β0 + β3

Alt Group − Ref Group = Difference:
  Control vs. T1: (β0 + β1) − β0 = β1
  Control vs. T2: (β0 + β2) − β0 = β2
  Control vs. T3: (β0 + β3) − β0 = β3
  T1 vs. T2: (β0 + β2) − (β0 + β1) = β2 − β1
  T1 vs. T3: (β0 + β3) − (β0 + β1) = β3 − β1
  T2 vs. T3: (β0 + β3) − (β0 + β2) = β3 − β2

Main Effects with Manual Dummy Codes

Alt Group − Ref Group = Difference:
  Control vs. T1: (β0 + β1) − β0 = β1
  Control vs. T2: (β0 + β2) − β0 = β2
  Control vs. T3: (β0 + β3) − β0 = β3
  T1 vs. T2: (β0 + β2) − (β0 + β1) = β2 − β1
  T1 vs. T3: (β0 + β3) − (β0 + β1) = β3 − β1
  T2 vs. T3: (β0 + β3) − (β0 + β2) = β3 − β2

Note the order of the equations: the reference group mean is subtracted from the alternative group mean.

TITLE "Manual Contrasts for 4-Group Diffs";
PROC MIXED DATA=dataname METHOD=REML;
  MODEL y = d1 d2 d3 / SOLUTION;
  CONTRAST "Omnibus df=3 main effect F-test" d1 1, d2 1, d3 1;
  ESTIMATE "Control"        intercept 1 d1 0 d2 0 d3 0;
  ESTIMATE "T1"             intercept 1 d1 1 d2 0 d3 0;
  ESTIMATE "T2"             intercept 1 d1 0 d2 1 d3 0;
  ESTIMATE "T3"             intercept 1 d1 0 d2 0 d3 1;
  ESTIMATE "Control vs. T1" d1 1 d2 0 d3 0;
  ESTIMATE "Control vs. T2" d1 0 d2 1 d3 0;
  ESTIMATE "Control vs. T3" d1 0 d2 0 d3 1;
  ESTIMATE "T1 vs. T2"      d1 -1 d2 1 d3 0;
  ESTIMATE "T1 vs. T3"      d1 -1 d2 0 d3 1;
  ESTIMATE "T2 vs. T3"      d1 0 d2 -1 d3 1;
RUN;

In SAS ESTIMATE statements (or SPSS TEST or STATA LINCOM), the variable names refer to their betas; the numbers are the weights applied to those betas. The intercept is used only in predicted values. Positive values indicate addition; negative values indicate subtraction.
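The ESTIMATE arithmetic above — numbers acting as weights on betas — is just a linear combination, which a short Python sketch can mimic (numpy, with invented data; the group means are assumptions for illustration):

```python
import numpy as np

# Hypothetical 4-group data (Control, T1, T2, T3) with made-up means
rng = np.random.default_rng(2)
group = np.repeat([0, 1, 2, 3], 30)
y = np.array([5.0, 7.0, 6.0, 9.0])[group] + rng.normal(0, 1, group.size)

# Design matrix: intercept plus three dummy codes (Control = reference)
X = np.column_stack([np.ones(group.size)] +
                    [(group == k).astype(float) for k in (1, 2, 3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# ESTIMATE "T1 vs. T2" d1 -1 d2 1 d3 0 -> weights on (intercept, d1, d2, d3)
L = np.array([0.0, -1.0, 1.0, 0.0])
t1_vs_t2 = L @ beta          # equals beta2 - beta1

# ESTIMATE "T2" intercept 1 d1 0 d2 1 d3 0 -> predicted T2 mean
t2_mean = np.array([1.0, 0.0, 1.0, 0.0]) @ beta
```

Note how the mean estimate uses the intercept weight while the difference estimate does not, matching the slide's point that intercepts appear only in predicted values.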

Interactions with Manual Dummy Codes

When using manual dummy codes for group contrasts treated as continuous, any interactions have to be specified with each contrast. For example, adding an interaction of treatgroup with age (centered so that 0 = 85 years):

y = β0 + β1*d1 + β2*d2 + β3*d3 + β4*(Age − 85) + β5*d1*(Age − 85) + β6*d2*(Age − 85) + β7*d3*(Age − 85) + e

TITLE "Group by Age for 4-Group Variable Treated as Continuous";
PROC MIXED DATA=dataname METHOD=REML;
  MODEL y = d1 d2 d3 age d1*age d2*age d3*age / SOLUTION;
  CONTRAST "Omnibus df=3 SIMPLE effect F-test" d1 1, d2 1, d3 1;
  CONTRAST "Omnibus df=3 interaction F-test" d1*age 1, d2*age 1, d3*age 1;
  ESTIMATE "Age Slope for Control"     age 1 d1*age 0 d2*age 0 d3*age 0;
  ESTIMATE "Age Slope for T1"          age 1 d1*age 1 d2*age 0 d3*age 0;
  ESTIMATE "Age Slope for T2"          age 1 d1*age 0 d2*age 1 d3*age 0;
  ESTIMATE "Age Slope for T3"          age 1 d1*age 0 d2*age 0 d3*age 1;
  ESTIMATE "Age Slope: Control vs. T1" d1*age 1 d2*age 0 d3*age 0;
  ESTIMATE "Age Slope: Control vs. T2" d1*age 0 d2*age 1 d3*age 0;
  ESTIMATE "Age Slope: Control vs. T3" d1*age 0 d2*age 0 d3*age 1;
  ESTIMATE "Age Slope: T1 vs. T2"      d1*age -1 d2*age 1 d3*age 0;
  ESTIMATE "Age Slope: T1 vs. T3"      d1*age -1 d2*age 0 d3*age 1;
  ESTIMATE "Age Slope: T2 vs. T3"      d1*age 0 d2*age -1 d3*age 1;
RUN;
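The simple-slope logic of the ESTIMATE statements above carries over directly to ordinary least squares: the age slope for T1 is β4 + β5. A Python sketch (numpy, fabricated data, age centered at 85 as in the slide) verifies this against an equivalent fact — with all group-by-age terms in the model, each group's simple slope equals the slope from a separate regression within that group:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
group = rng.integers(0, 4, n)              # 0=Control, 1-3=Treatments (hypothetical)
age_c = rng.uniform(-10, 10, n)            # age centered at 85
slopes = np.array([0.5, 1.0, -0.2, 0.8])   # made-up per-group age slopes
y = group * 2.0 + slopes[group] * age_c + rng.normal(0, 1, n)

# Columns: intercept, d1, d2, d3, age, d1*age, d2*age, d3*age
d = [(group == k).astype(float) for k in (1, 2, 3)]
X = np.column_stack([np.ones(n), *d, age_c,
                     d[0] * age_c, d[1] * age_c, d[2] * age_c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# ESTIMATE "Age Slope for T1"  age 1  d1*age 1  ->  beta4 + beta5
slope_t1 = beta[4] + beta[5]

# Same number from fitting T1's cases alone (saturated model = per-group fits)
sep_slope = np.polyfit(age_c[group == 1], y[group == 1], 1)[0]
```

This also illustrates why the first CONTRAST is labeled a SIMPLE effect test: with the interaction present, d1 d2 d3 alone test group differences only at age = 85 (centered 0).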

Using BY/CLASS/i. Statements Instead

Designate the predictor as categorical in program syntax.

If you let SAS/SPSS do the dummy coding via CLASS/BY, then the highest/last group is the default reference:
- In SAS 9.4 you can change the reference group via REF='level' | FIRST | LAST, but doing so reorders that group to be last in the data (confusing)
- Type III tests of fixed effects provide omnibus tests by default
- LSMEANS/EMMEANS can be used to get all means and comparisons without specifying each individual contrast

If you let STATA do the dummy coding via i.group, then the lowest/first group is the default reference:
- You can change the reference group, e.g., last = ref: ib(last).group
- CONTRAST is used to get omnibus tests (not provided by default)
- MARGINS can be used to get all means and comparisons with much less code than writing each individual contrast

Main Effects of Categorical Predictors

TITLE "Program-Created Contrasts for 4-Group Diffs via CLASS";
PROC MIXED DATA=dataname ITDETAILS METHOD=REML;
  CLASS treatgroup;
  MODEL y = treatgroup / SOLUTION;
  LSMEANS treatgroup / DIFF=ALL;

The LSMEANS line above gives you all of the following (note that one value has to be given for each possible level of the categorical predictor, in their order in the data):

  ESTIMATE "Control"        intercept 1 treatgroup 1 0 0 0;
  ESTIMATE "T1"             intercept 1 treatgroup 0 1 0 0;
  ESTIMATE "T2"             intercept 1 treatgroup 0 0 1 0;
  ESTIMATE "T3"             intercept 1 treatgroup 0 0 0 1;
  ESTIMATE "Control vs. T1" treatgroup -1 1 0 0;
  ESTIMATE "Control vs. T2" treatgroup -1 0 1 0;
  ESTIMATE "Control vs. T3" treatgroup -1 0 0 1;
  ESTIMATE "T1 vs. T2"      treatgroup 0 -1 1 0;
  ESTIMATE "T1 vs. T3"      treatgroup 0 -1 0 1;
  ESTIMATE "T2 vs. T3"      treatgroup 0 0 -1 1;

CLASS gives this contrast by default:
  CONTRAST "Omnibus df=3 main effect F-test" treatgroup -1 1 0 0,
                                             treatgroup -1 0 1 0,
                                             treatgroup -1 0 0 1;

The CLASS statement means "make my dummy codes for me." When predicting intercepts, 1 means "for that group only." When predicting group differences, contrasts must sum to 0; here -1 = ref, 1 = alt, and 0 = ignore.

You can also make up whatever contrasts you like using the DIVISOR option:
  ESTIMATE "Mean of Treat groups" intercept 1 treatgroup 0 1 1 1 / DIVISOR=3;
  ESTIMATE "Control vs. Mean of Treat groups" treatgroup -3 1 1 1 / DIVISOR=3;
RUN;

Interactions with Categorical Predictors

For example, adding an interaction of treatgroup with age (centered so that 0 = 85 years):

TITLE "Group by Age for 4-Group Variable Modeled as Categorical";
PROC MIXED DATA=dataname METHOD=REML;
  CLASS treatgroup;
  MODEL y = treatgroup age treatgroup*age / SOLUTION;
  * To explain the interaction as how group diffs depend on age:;
  LSMEANS treatgroup / DIFF=ALL AT (age)=(-5); * group intercept diffs at age 80;
  LSMEANS treatgroup / DIFF=ALL AT (age)=(0);  * group intercept diffs at age 85;
  LSMEANS treatgroup / DIFF=ALL AT (age)=(5);  * group intercept diffs at age 90;
  * To explain the interaction as how the age slope depends on group:;
  ESTIMATE "Age Slope for Control"     age 1 treatgroup*age 1 0 0 0;
  ESTIMATE "Age Slope for T1"          age 1 treatgroup*age 0 1 0 0;
  ESTIMATE "Age Slope for T2"          age 1 treatgroup*age 0 0 1 0;
  ESTIMATE "Age Slope for T3"          age 1 treatgroup*age 0 0 0 1;
  ESTIMATE "Age Slope: Control vs. T1" treatgroup*age -1 1 0 0;
  ESTIMATE "Age Slope: Control vs. T2" treatgroup*age -1 0 1 0;
  ESTIMATE "Age Slope: Control vs. T3" treatgroup*age -1 0 0 1;
  ESTIMATE "Age Slope: T1 vs. T2"      treatgroup*age 0 -1 1 0;
  ESTIMATE "Age Slope: T1 vs. T3"      treatgroup*age 0 -1 0 1;
  ESTIMATE "Age Slope: T2 vs. T3"      treatgroup*age 0 0 -1 1;
  * Can also make up whatever contrasts you like using the DIVISOR option:;
  ESTIMATE "Mean Age Slope in Treat groups" age 1 treatgroup*age 0 1 1 1 / DIVISOR=3;
  ESTIMATE "Age Slope: Control vs. Mean of Treat" treatgroup*age -3 1 1 1 / DIVISOR=3;
RUN;
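The LSMEANS ... AT (age)=(...) requests above amount to evaluating the predicted group difference at chosen ages. Under the manual-dummy parameterization from earlier slides, the Control vs. T1 difference at centered age a is β1 + β5*a, which a Python sketch (numpy, invented data and effect sizes) can confirm:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
group = rng.integers(0, 4, n)              # hypothetical 4-group indicator
age_c = rng.uniform(-10, 10, n)            # age centered at 85
y = (np.array([3.0, 5.0, 4.0, 6.0])[group]
     + np.array([0.2, 0.6, 0.4, 0.1])[group] * age_c
     + rng.normal(0, 1, n))

# Columns: intercept, d1, d2, d3, age, d1*age, d2*age, d3*age
d = [(group == k).astype(float) for k in (1, 2, 3)]
X = np.column_stack([np.ones(n), *d, age_c,
                     d[0] * age_c, d[1] * age_c, d[2] * age_c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def ctrl_vs_t1(at_age):
    # Predicted T1 minus predicted Control at a given centered age:
    # (b0 + b1 + b4*a + b5*a) - (b0 + b4*a) = b1 + b5*a
    return beta[1] + beta[5] * at_age

diffs = [ctrl_vs_t1(a) for a in (-5.0, 0.0, 5.0)]   # i.e., ages 80, 85, 90
```

At centered age 0 the difference is just β1, and the interaction coefficient β5 is the rate at which the group difference changes per year of age — the two ways of "explaining the interaction" in the SAS code are the same numbers viewed from different directions.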

Categorical Predictors = Marginal Effects

Letting the program build contrasts for categorical predictors (instead of creating manual dummy codes) does the following:
- Allows LSMEANS/EMMEANS/MARGINS (for cell means and differences)
- Provides omnibus (multiple-df) multivariate Wald tests for group effects
- Marginalizes the group effect across interacting predictors, so omnibus F-tests represent marginal main effects (instead of simple effects)

For example, MODEL y = Treatgroup sexMW Treatgroup*sexMW (in which Treatgroup is always categorical):

Type 3 Tests of Fixed Effects:
  Effect            | If sexMW is continuous (no CLASS) | If sexMW is categorical (on CLASS)
  sexMW             | Marginal diff across groups       | Marginal diff across groups
  Treatgroup        | Group diff if sexMW=0             | Marginal diff across sexes
  Treatgroup*sexMW  | Interaction                       | Interaction
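The marginal-versus-simple distinction in the table can be made concrete with arithmetic on cell means. A Python sketch (invented cell means for a hypothetical 4-group by 2-sex design, balanced cells assumed) shows that the marginal group difference is the average of the simple differences within each sex:

```python
import numpy as np

# Hypothetical cell means: rows = Treatgroup (Control, T1, T2, T3), cols = sexMW (M, W)
cell_means = np.array([[10.0, 12.0],
                       [13.0, 13.0],
                       [11.0, 16.0],
                       [14.0, 15.0]])

# Simple effects: Control vs. T2 difference within each sex
simple_m = cell_means[2, 0] - cell_means[0, 0]   # difference for men
simple_w = cell_means[2, 1] - cell_means[0, 1]   # difference for women

# Marginal effect (what CLASS-based Type III tests target in the balanced case):
# the same difference after averaging each group's means over the sexes
marginal = cell_means[2].mean() - cell_means[0].mean()
```

When the simple effects differ (an interaction, as here), this marginal number describes neither sex, which is the slide's warning about interpreting marginal main effects in the presence of interactions.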

Interactions Among Categorical Predictors

By default (i.e., as in ANOVA):
- The model includes all possible higher-order interactions among categorical predictors
- The software does this for you; nonsignificant interactions usually are still kept in the model (but only significant interactions are interpreted)
- This is very different from typical practice in regression!

Omnibus marginal main effects are provided by default:
- i.e., what we asked for via CONTRAST using manual group contrasts
- But they are basically useless given significant interactions

Omnibus interaction effects are provided:
- i.e., what we asked for via CONTRAST using manual group contrasts
- But they are basically useless for actually understanding the interaction

Let's see how to make the software give us more useful info.