Regression with Categorical and Continuous Independent Variables

Regression with Categorical and Continuous Independent Variables. Lecture 12, November 19, 2008. ERSH 8320. Lecture #12-11/19/2008, Slide 1 of 28

Today's Lecture: How regression works with categorical and continuous variables (Chapter 14).

Continuous and Categorical Independent Variables. Previous techniques used either categorical independent variables or continuous independent variables. Now we will look at what happens when we combine both categorical and continuous independent variables in a single analysis.

Example Data. An experiment was designed to study the effects of incentives and study time on retention of classroom material in students.

Study design:
- Groups of students: Incentive or No Incentive. This is a categorical variable.
- Amount of study time: 5, 10, 15, or 20 hours. We will consider this a continuous variable.
- The dependent variable was the score on a retention test.

The Wrong Way to Analyze the Data. One way to analyze these data is to compute two regression lines: one for the Incentive Group and one for the No Incentive Group. Then look to see how the two lines differ (if at all). This is not the right approach (the right one will be shown next).

Two Regression Analyses. [Two scatterplots of retention (Y) against study time, one per group. Incentive Group: Y = 7.33 + 0.21X, R Sq Linear = 0.459. No Incentive Group: Y = 2.50 + 0.27X, R Sq Linear = 0.708.] Do these equations seem different?

The Eyeball Approach. The slopes do not seem that different: .21 is fairly close to .27. The increase in test score as a function of study time is very similar in both incentive groups. There is, however, a large difference in intercepts: the base score (the score with no study time) is almost 5 points greater in the incentive group when the groups are modeled separately. Is that difference significant? Statistics needs evidence, not just eyeballs.

A Better Way. A better way to answer the question is to use a single statistical model. We will refer to a regression equation with both study time and incentive group as IVs as the full model. To set up a comparison, we first need to estimate the regression equation for the full model (both variables together). The model will have both main effects (Incentive Group and Study Time) as well as the interaction between incentive group and study time. Incentive is coded as 1 for No Incentive and -1 for Incentive. This is effect coding.
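The effect coding described above is simple to construct by hand; this is a minimal sketch on hypothetical group labels (the -1/1 codes are the ones given on the slide):

```python
# Effect coding for the two-group incentive factor: the slide assigns
# No Incentive -> 1 and Incentive -> -1. The group labels are hypothetical strings.
codes = {"No Incentive": 1, "Incentive": -1}

groups = ["Incentive", "Incentive", "No Incentive", "No Incentive"]
study_time = [5, 10, 5, 10]  # hours, illustrative values

x1 = [codes[g] for g in groups]                              # effect-coded predictor
interaction = [c * t for c, t in zip(x1, study_time)]        # X1 * X2 product term
```

The product term is exactly what the Transform step in SPSS computes: one new column equal to the coded group variable times study time.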

The Full Model. The full model is the model where incentive, study time, and their interaction are all included to predict an examinee's retention:

Y = a + b1*X1 + b2*X2 + b3*X1X2

Where:
- X1 is the effect-coded variable for the examinee's incentive group (either -1 or 1).
- X2 is the amount of time studied for the test.
- X1X2 is the product of the two variables, representing the interaction. To use the regression package in SPSS, we have to create this variable manually using the Transform function.
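The full model is ordinary least squares with four columns in the design matrix. As a minimal sketch, here it is fit with NumPy on hypothetical, noise-free data (not the lecture's dataset); because there is no error term, the assumed coefficients are recovered exactly:

```python
import numpy as np

# Hypothetical, noise-free data (NOT the lecture's dataset): effect-coded
# group (-1 = Incentive, 1 = No Incentive) and study time in hours.
group = np.array([-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0])
time = np.array([5.0, 10.0, 15.0, 20.0, 5.0, 10.0, 15.0, 20.0])

# Generate Y exactly from assumed coefficients so least squares recovers them.
y = 5.0 - 2.0 * group + 0.25 * time + 0.03 * group * time

# Full-model design matrix: intercept, X1 (group), X2 (time), X1*X2 (interaction).
X = np.column_stack([np.ones_like(time), group, time, group * time])
a, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]
```

With real data the recovered coefficients would carry sampling error, which is what the SPSS standard errors and t tests on the next slide quantify.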

Full Model Results

Model Summary: R = .909, R Square = .827, Adjusted R Square = .801, Std. Error of the Estimate = 1.22270. Predictors: (Constant), interact, Study Time, Incentive.

ANOVA (Dependent Variable: Y - Retention):
Regression: Sum of Squares = 142.725, df = 3, Mean Square = 47.575, F = 31.823, Sig. = .000
Residual: Sum of Squares = 29.900, df = 20, Mean Square = 1.495
Total: Sum of Squares = 172.625, df = 23

Coefficients (Dependent Variable: Y - Retention):
(Constant): B = 4.917, Std. Error = .611, t = 8.042, Sig. = .000
Incentive: B = -2.417, Std. Error = .611, Beta = -.901, t = -3.953, Sig. = .001
Study Time: B = .237, Std. Error = .045, Beta = .493, t = 5.301, Sig. = .000
interact: B = .030, Std. Error = .045, Beta = .153, t = .672, Sig. = .509

Full Model Results. Estimated regression equation:

Y = 4.917 - 2.417*X1 + .237*X2 + .03*X1X2

From the SPSS output we can tell the following:
- No significant interaction between incentive group and study time (b3 = .03, p = .509). We will come to know that no interaction means the slope of the line is the same across all levels of the categorical variable.
- Significant main effect of study time (b2 = .237, p < .001). Regardless of incentive group, retention increases by .237 points for every additional hour of study time.
- Significant main effect of incentive group (b1 = -2.417, p = .001). There is a significant difference in the (adjusted) mean value of retention between the two groups.

Further Interpretation. Because the full model included a categorical independent variable, we can decompose that model into two separate models, one for each group:

Incentive Group (X1 = -1):
Y = 4.917 - 2.417(-1) + .237*X2 + .03(-1)*X2
Y = (4.917 + 2.417) + (.237 - .03)*X2
Y = 7.334 + .207*X2

No Incentive Group (X1 = 1):
Y = 4.917 - 2.417(1) + .237*X2 + .03(1)*X2
Y = (4.917 - 2.417) + (.237 + .03)*X2
Y = 2.5 + .267*X2

Recall from slide 6 the original results: Incentive group: Y = 7.33 + .21*X2; No Incentive group: Y = 2.5 + .27*X2. We get the same numbers!
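The decomposition above is pure arithmetic on the estimated coefficients; a small sketch makes that concrete (coefficient values are taken from the slide, the helper name is hypothetical):

```python
# Full-model estimates from the slide: Y = a + b1*X1 + b2*X2 + b3*X1*X2,
# with effect coding X1 = -1 (Incentive) or X1 = 1 (No Incentive).
a, b1, b2, b3 = 4.917, -2.417, 0.237, 0.03

def group_line(code):
    """Intercept and slope of the per-group regression line for effect code `code`."""
    return a + b1 * code, b2 + b3 * code

incentive = group_line(-1)     # intercept 7.334, slope 0.207
no_incentive = group_line(1)   # intercept 2.500, slope 0.267
```

Substituting each group's code collapses the single four-term equation into the two separate lines that were fit earlier, which is why the numbers match.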

Continuing the Analysis. Because the full model's interaction term was not statistically significant, we should remove the term from the model and re-estimate the model. This is called the reduced model, and it looks like:

Y = a + b1*X1 + b2*X2

Where:
- X1 is the effect-coded variable for the examinee's incentive group (either -1 or 1).
- X2 is the amount of time studied for the test.

Without the interaction, our model makes the assumption of equal slopes across incentive groups. We already tested this assumption and found no evidence that the slopes differed across groups.

Reduced Model Results

Model Summary: R = .907, R Square = .823, Adjusted R Square = .806, Std. Error of the Estimate = 1.20663. Predictors: (Constant), Study Time, Incentive.

ANOVA (Dependent Variable: Y - Retention):
Regression: Sum of Squares = 142.050, df = 2, Mean Square = 71.025, F = 48.783, Sig. = .000
Residual: Sum of Squares = 30.575, df = 21, Mean Square = 1.456
Total: Sum of Squares = 172.625, df = 23

Coefficients (Dependent Variable: Y - Retention):
(Constant): B = 4.917, Std. Error = .603, t = 8.149, Sig. = .000
Incentive: B = -2.042, Std. Error = .246, Beta = -.761, t = -8.289, Sig. = .000
Study Time: B = .237, Std. Error = .044, Beta = .493, t = 5.371, Sig. = .000

Reduced Model Results. Estimated regression equation:

Y = 4.917 - 2.042*X1 + .237*X2

From the SPSS output we can tell the following:
- Significant main effect of study time (b2 = .237, p < .001). Regardless of incentive group, retention increases by .237 points for every additional hour of study time.
- Significant main effect of incentive group (b1 = -2.042, p < .001). There is a significant difference in the (adjusted) mean value of retention between the two groups.

Further Interpretation. Because the reduced model included a categorical independent variable, we can decompose that model into two separate models, one for each group:

Incentive Group (X1 = -1):
Y = 4.917 - 2.042(-1) + .237*X2
Y = (4.917 + 2.042) + .237*X2
Y = 6.959 + .237*X2

No Incentive Group (X1 = 1):
Y = 4.917 - 2.042(1) + .237*X2
Y = (4.917 - 2.042) + .237*X2
Y = 2.875 + .237*X2

Recall from slide 6 the original results: Incentive group: Y = 7.33 + .21*X2; No Incentive group: Y = 2.5 + .27*X2. We do not get the same numbers: without the interaction term, the two lines are forced to share a single slope.
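The same per-group arithmetic applies to the reduced model; a minimal sketch (coefficients from the slide; with no b3 term, both groups share the slope b2):

```python
# Reduced-model estimates from the slide: Y = a + b1*X1 + b2*X2,
# with effect coding X1 = -1 (Incentive) or X1 = 1 (No Incentive).
a, b1, b2 = 4.917, -2.042, 0.237

# Per-group intercepts; the slope b2 = 0.237 is common to both groups by construction.
incentive_intercept = a + b1 * -1    # 4.917 + 2.042 = 6.959
no_incentive_intercept = a + b1 * 1  # 4.917 - 2.042 = 2.875
```

Dropping the interaction term removes the per-group slope adjustment, which is exactly why these intercepts and the shared slope no longer match the two separately fitted lines.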

Categorizing Continuous Variables. Some researchers may find it beneficial to partition continuous variables into a number of categories. In our example, even though study time was continuous, we could also have thought of it as a categorical variable with 4 levels (5, 10, 15, or 20 hours). A 2 x 4 ANOVA could then have been computed.

Another way of categorizing a continuous variable is often seen in a treatment-by-levels design. For example, a researcher may be interested in the difference between two teaching methods. Prior to beginning treatment, the subjects all have different intelligence levels. The experimenter may want to control for intelligence in the design to isolate the information regarding the treatment. The resulting ANOVA will partition out the variance related to the control variable.

Some studies categorize continuous variables in an attempt to study possible interactions between the independent variables. These are often called: Aptitude-Treatment Interaction (ATI), Attribute-Treatment Interaction (ATI), or Trait-Treatment Interaction (TTI). This differs from the previous categorization because the control variable is actually a factor of interest. In the same example, the researcher may want to see if the treatments change test scores differently for people with different intelligence levels.

You can also categorize continuous variables in a counterproductive way. This can occur if a researcher categorizes a continuous variable that has more than one attribute, for example, categorizing personality, attitudes, and so on. Generally, you lose statistical power when you categorize a continuous variable. Contrary to the book's overall advice, categorization is a dangerous endeavor.

Basis For Categorization. How do you categorize a continuous variable? Often, variables are cut in half at the median and then labeled low or high. It should be noted that you should be careful in your categorization, because not all lows are created equal. What effect does categorization have? Categorization leads to a loss of information and a less sensitive analysis.
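The loss of information from a median split can be seen in a quick simulation; this sketch (hypothetical simulated data, not from the lecture) compares how strongly an outcome correlates with a continuous predictor versus its dichotomized low/high version:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)        # continuous predictor
y = x + rng.normal(size=500)    # outcome linearly related to x, plus noise

# Median split: below the median -> "low" (0), at or above -> "high" (1).
high = (x >= np.median(x)).astype(float)

r_continuous = np.corrcoef(x, y)[0, 1]
r_dichotomized = np.corrcoef(high, y)[0, 1]
# The dichotomized predictor carries less information about y,
# so its correlation with y is weaker.
```

Everyone on the same side of the median gets the same score, so within-half differences in x are thrown away, and the correlation with the outcome shrinks accordingly.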

In the case where there is one continuous variable and one categorical variable (as in today's example), the interaction answers the question of whether the regression lines of the dependent variable (Retention) on the continuous variable (Study Time) are parallel across the categories of the categorical variable (Incentive Group). In our example, Study Time was manipulated; however, that is not always the case (researchers may simply ask how long each individual studied, for example). The test of significance would be the same, but the interpretation of the interaction effect would differ. In the previous design, since we know Study Time was manipulated, the cause of any difference has to be related to the Incentive Group. If we do not manipulate Study Time, the significance of the interaction may be a result of both the Incentive Group and the Study Time.

Types of Interaction Effects. There are two main types of interaction effects:

Ordinal Interaction: Reflects the fact that an independent variable seems to have more of an effect under one level of a second independent variable than under another level. If you graph an ordinal interaction, the lines will not be parallel, but they will not cross.

Disordinal Interaction: When an independent variable has one kind of effect in the presence of one level of a second independent variable, but a different kind of effect in the presence of a different level of the second independent variable. This is also called a crossover interaction because the lines in a graph will cross.

Types of Interaction Effects. [Graphs: an ordinal interaction (non-parallel lines that do not cross) and a disordinal interaction (lines that cross).]

Comparing Regression Equations in Nonexperimental Designs. Nonexperimental designs are those in which neither the categorical variable nor the continuous variable is manipulated. The analytic approach in such designs is identical to that of experimental designs; it is the interpretation that differs. The interpretation of the findings is often more complex and ambiguous.

The Study of Bias. One definition of test bias (Cleary, 1968): A test is biased for members of one subgroup of the population if, in the prediction of the criterion for which the test was designed, consistent nonzero errors of prediction are made for members of the subgroup. In other words, the test is biased if the criterion score predicted from the common regression line is consistently too high or too low for members of the subgroup. This is the regression model for test bias. This kind of test bias appears as an interaction when modeling two regression lines representing two categorical groups.

Final Thought. Combining categorical and continuous variables provides powerful statistical tools that help provide evidence about the behavior of the phenomena under study. Such tools provide the basis for most practical models used in quantitative research. Most nonexperimental studies include both categorical and continuous variables. Next time we will see that this approach is called ANCOVA (ANalysis of COVAriance). We will also see how controlling for continuous variables adjusts the means of our experimental groups.

Next Time. Lab: Categorical and continuous independent variables. Homework 8 is due next week at the start of class. No class next week (Thanksgiving break). December 3: Analysis of Covariance (Chapter 15) and final preparation.