Comparing a Multiple Regression Model Across Groups



Comparing a Multiple Regression Across Groups

We might want to know whether a particular set of predictors leads to a multiple regression model that works equally effectively for two (or more) different groups (populations, treatments, cultures, social-temporal changes, etc.). Here's an example: while developing a multiple regression model to be used to select graduate students based on GRE scores, one of the faculty pointed out that it might not be a good idea to use the same model to select Experimental and Clinical graduate students. The way to answer this question is a bit cumbersome, but can be very important to consider. Here's what we'll do:

1. Split the file into a Clinical and an Experimental subfile
2. Run the multiple regression predicting grad GPA from the three GRE scores for each subfile
3. Compare how well the predictor set predicts the criterion for the two groups using Fisher's Z-test
4. Compare the structure (weights) of the model for the two groups using Hotelling's t-test and the Meng et al. Z-test

First we split the sample:

Data --> Split File
Be sure "Organize output by groups" is marked, and move the variable representing the groups into the "Groups Based on:" window. Any analysis you request will now be done separately for each group defined by this variable.

Next, get the multiple regression for each group:

Analyze --> Regression --> Linear
Move graduate gpa into the "Dependent" window and move grev, greq and grea into the "Independent(s)" window. Remember -- with the "split file" we did earlier, we'll get a separate model for each group.

SPSS Syntax

SORT CASES BY program.
SPLIT FILE SEPARATE BY program.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT ggpa
  /METHOD=ENTER grev greq grea.
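In equation form, the same three-predictor model is estimated separately within each group (a sketch using the variable names from the syntax above):

\[ \widehat{ggpa} = a + b_v\,grev + b_q\,greq + b_a\,grea \]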

Here's the abbreviated output.

PROGRAM = Clinical (n = 64)

Model Summary
  R = .835   R Square = .698   Adjusted R Square = .683   Std. Error of the Estimate = .34522
  Predictors: (Constant), Verbal, Quantitative, and Analytic subscores of the GRE

PROGRAM = Experimental (n = 76)

Model Summary
  R = .735   R Square = .541   Adjusted R Square = .521   Std. Error of the Estimate = .39801
  Predictors: (Constant), Verbal, Quantitative, and Analytic subscores of the GRE

Coefficients (Dependent Variable: 1st year graduate gpa -- criterion variable)

PROGRAM = Clinical
                                 Unstandardized B   Standardized Beta        t     Sig.
  (Constant)                        -.773                                 -1.287    .20
  Analytic subscore of GRE          2.698E-03             .200             2.145    .04
  Quantitative subscore of GRE      5.623E-03             .74              8.070    .000
  Verbal subscore of GRE           -1.7E-03              -.06             -1.34     .19

PROGRAM = Experimental
                                 Unstandardized B   Standardized Beta        t     Sig.
  (Constant)                        -.099                                 -2.04     .045
  Analytic subscore of GRE          8.588E-03             .754             7.737    .000
  Quantitative subscore of GRE      2.275E-03             .34              3.472    .001
  Verbal subscore of GRE           -3.22E-03             -.36             -3.48     .001

Comparing the Fit -- the R² values -- of the two models

To compare the "fit" of this predictor set in each group, we use the FZT program to perform Fisher's Z-test comparing the R² values of .698 and .541. Remember that the Fisher's Z part of the FZT program uses R (r) values, not R²! Applying the FZT program with R = .835 & N = 64 and R = .735 & N = 76 gives Z = 1.527, so p > .05.

So, based upon these sample data, we would conclude that the predictor set does equally well for both groups. But remember that this is not a powerful test, and these groups have rather small sample sizes for this test. We might want to re-evaluate this question based on a larger sample.
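If the FZT program isn't handy, this test is easy to reproduce by hand. A minimal worked version using the standard r-to-Z transformation (it reproduces the Z = 1.527 above):

\[ z' = \tfrac{1}{2}\ln\frac{1+r}{1-r}, \qquad Z = \frac{z'_1 - z'_2}{\sqrt{\frac{1}{N_1-3} + \frac{1}{N_2-3}}} \]

\[ z'_1 = \tfrac{1}{2}\ln\frac{1.835}{.165} = 1.204, \qquad z'_2 = \tfrac{1}{2}\ln\frac{1.735}{.265} = 0.940 \]

\[ Z = \frac{1.204 - 0.940}{\sqrt{\frac{1}{61} + \frac{1}{73}}} = \frac{0.264}{0.173} = 1.53 \]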

Comparing the "structure" of the two models. We want to work with the larger of the two groups, so that the test will have best sensitivity. So, first we have to tell SPSS that we want to analyze data only from Experimental students (program = 2). Data Select Cases Be sure "If condition is satisfied" is marked and click the "If " button Specify the variable and value that identifies the cases that are to be included in the analysis Next we have to construct a predicted criterion value from each group's model. Transform Compute

Finally, we get the correlation of each model with the criterion and with each other (remember that the correlation between two models is represented by the correlation between their y' values). Because of the selection we did above, these analyses will be based only on data from the Experimental students.

Analyze --> Correlate --> Bivariate

SPSS Syntax

SELECT IF (program = 2).                                                       <- select Experimental cases
COMPUTE clinpred = (.002698*grea) + (.005623*greq) + (-.0017*grev) - .773.     <- y' based on clinical model
COMPUTE exppred = (.008588*grea) + (.002275*greq) + (-.003227*grev) - .099.    <- y' based on experimental model
CORR VARIABLES = ggpa clinpred exppred.                                        <- correlations to compare the clinical and experimental group models

Here's the output.

Correlations (Experimental cases only; all Sig. (2-tailed) = .000, N = 76)

                                      ggpa     EXPPRED    CLINPRED
  1st year graduate gpa (ggpa)       1.000       .735       .532
  EXPPRED                             .735      1.000       .724
  CLINPRED                            .532       .724      1.000

Direct R (.735) -- the same as the R from the original multiple regression analysis of the Experimental data above.
Crossed R (.532) -- the R obtained when you apply the weights from the Clinical sample multiple regression model to the Experimental sample.
Model correlation (.724) -- remember that the correlation between two models is represented by the correlation between their y' values.

Remember that the Hotelling's t / Steiger's Z part of the FZT program uses R (r) values! Applying the FZT program with ry1 = .735, ry2 = .532, r12 = .724 and N = 76 gives t = 3.44 & Z = 3.22.

Based on this we would conclude there are structural differences between the best multiple regression models for predicting 1st year GPA for Clinical and Experimental graduate students. Inspection of the standardized weights of the two regression models suggests that all three predictors are important for predicting Experimental students' grades, with something of an emphasis on the analytic subscale. For the Clinical students, the quant subscale seems the most important, with a lesser contribution by the analytic (and don't get too brave about ignoring the verbal -- remember the sample size is small).
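For readers without the FZT program, both dependent-correlation tests can be computed directly. A sketch under the usual formulations -- Hotelling's (1940) t and the Meng, Rosenthal & Rubin (1992) Z (with f capped at 1, and z' the Fisher transform of each r) -- which reproduces the values above:

\[ t = (r_{y1} - r_{y2})\sqrt{\frac{(N-3)(1 + r_{12})}{2\left(1 - r_{y1}^2 - r_{y2}^2 - r_{12}^2 + 2\,r_{y1} r_{y2} r_{12}\right)}} = (.203)\sqrt{\frac{73\,(1.724)}{2\,(.219)}} = 3.44 \]

\[ \bar r^2 = \frac{r_{y1}^2 + r_{y2}^2}{2} = .412, \qquad f = \frac{1 - r_{12}}{2(1 - \bar r^2)} = .235, \qquad h = \frac{1 - f\,\bar r^2}{1 - \bar r^2} = 1.536 \]

\[ Z = \left(z'_{y1} - z'_{y2}\right)\sqrt{\frac{N-3}{2(1 - r_{12})\,h}} = (.940 - .593)\sqrt{\frac{73}{2(.276)(1.536)}} = 3.21 \]

which matches the FZT value of 3.22 within rounding.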

Examining Individual Predictors for Between-Group Differences in Contribution

Asking if a single predictor has a different regression weight for two different groups is equivalent to asking if there is an interaction between that predictor and group membership. (Please note that asking about a regression slope difference and asking about a correlation difference are two different things; you already know how to use Fisher's Z-test to compare correlations across groups.) This approach uses a single model, applied to the full sample:

Criterion = b1*predictor + b2*group + b3*predictor*group + a

If b3 is significant, then there is a difference between the predictor's regression weights in the two groups. However, this approach gets cumbersome when applied to models with multiple predictors. With 3 predictors we would look at the model below, in which each interaction term is designed to tell us whether a particular predictor has a regression slope difference across the groups:

y = b1*G + b2*P1 + b3*G*P1 + b4*P2 + b5*G*P2 + b6*P3 + b7*G*P3 + a

Because the collinearity among the interaction terms, and between one predictor's term and the other predictors' interaction terms, all influence the interaction b weights, there has been dissatisfaction with how well this approach works for multiple predictors. Also, because this approach does not involve constructing different models for each group, it does not allow a comparison of the fit of the two models or an examination of the substitutability of the two models.

Another approach is to apply a significance test to each predictor's b weights from the two models, directly testing for a significant difference. (Again, this is different from comparing the same correlation from two groups.) However, there are competing formulas for the SE of the b difference. Here is the most common (e.g., Cohen, 1983):

\[ Z = \frac{b_{G1} - b_{G2}}{SE_{b\text{-difference}}}, \qquad SE_{b\text{-difference}}^2 = \frac{df_{G1}\,SE_{b_{G1}}^2 + df_{G2}\,SE_{b_{G2}}^2}{df_{G1} + df_{G2}} \]

Note: when SEb values aren't available they can be calculated as SEb = b / t.

However, work by two research groups has demonstrated that for large-sample studies (both N > 30) this standard error estimator is negatively biased (produces error estimates that are too small), so that the resulting Z-values are too large, promoting Type I and Type III errors (Brame, Paternoster, Mazerolle & Piquero, 1998; Clogg, Petkova & Haritou, 1995). This leads to the formulas:

\[ SE_{b\text{-difference}} = \sqrt{SE_{b_{G1}}^2 + SE_{b_{G2}}^2} \qquad \text{and} \qquad Z = \frac{b_{G1} - b_{G2}}{\sqrt{SE_{b_{G1}}^2 + SE_{b_{G2}}^2}} \]

Remember: just because the weight from one model is significant and the weight from another model is non-significant does not mean that the two weights are significantly different!!! You must apply this test to determine whether they are significantly different.

Here are the results from these models:

  Predictor       Clinical Group         Experimental Group      SEb-diff    Z (Brame/Clogg)    p
                  b         SEb**        b          SEb**
  Analytic        .002698   .001258      .008588    .000757      .001468         4.01          <.001
  Quantitative    .005623   .000697      .002275    .000665      .0009563        3.475         <.001
  Verbal         -.0017     .001258     -.00322     .000923      .001560         1.309          .1906

  ** SEb obtained as b / t

The results show that both Analytic and Quantitative have significantly different regression weights in the Clinical and Experimental samples, while Verbal has equivalent regression weights in the two groups.
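As a quick check on the table, here is the Quantitative row run through the Brame/Clogg formula (values taken from the table above; the small discrepancy from the tabled Z reflects rounding in the SEs):

\[ Z = \frac{.005623 - .002275}{\sqrt{(.000697)^2 + (.000665)^2}} = \frac{.003348}{.000963} = 3.48 \]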

Example write-up of these analyses (which used some univariate and correlation info not shown above):

A series of regression analyses were run to examine the relationships between graduate school grade point average (GGPA) and the Verbal (V), Quantitative (Q) and Analytic (A) GRE subscales, and to compare the models derived from the Clinical and Experimental programs. Table 1 shows the univariate statistics, the correlation of each variable with GGPA, and the multiple regression weights for the two programs. For the Clinical Program students this model had an R² = .698, F(3,60) = 4.35, p < .001, with Q and A having significant regression weights and Q seeming to have the major contribution (based on inspection of the β weights). For the Experimental Program students this model had an R² = .541, F(3,72) = 47.53, p < .001, with all three predictors having significant regression weights and A seeming to have the major contribution (based on inspection of the β weights).

Comparison of the fit of the models from the Clinical and Experimental programs revealed no significant difference between the respective R² values, Z = 1.527, p > .05. A comparison of the structure of the models from the two groups was also conducted by applying the model derived from the Clinical Program to the data from the Experimental Program and comparing the resulting crossed R² with the direct R² originally obtained from this group. The direct R² = .541 and crossed R² = .283 were significantly different, Z = 3.22, p < .01, which indicates that the apparent differential structure of the regression weights from the two groups described above warrants further interpretation and investigation. Further analyses revealed that both Analytic and Quantitative have significantly different regression weights in the Clinical and Experimental samples (Z = 4.01, p < .001 & Z = 3.50, p < .001, respectively), while Verbal has equivalent regression weights in the two groups (Z = 1.309, p = .19).

Table 1
Summary statistics, correlations and multiple regression weights from the Clinical and Experimental program participants.

              Clinical Program                                Experimental Program
  Variable    mean     std     r with GGPA   b         β      mean     std     r with GGPA   b          β
  GGPA        3.23     .6                                     3.42     .6
  V           567.88   40.99   .170          -.002     -.06   655.00   55.1    .289          -.0032**   -.36
  Q           589.62   82.0    .779**         .0056**   .74   720.00   81.6    .530**         .0023**    .34
  A           576.03   66.86   .532**         .0027*    .200  664.00   81.8    .724**         .0086**    .754
  constant                                   -.773                                           -.099

  * p < .05   ** p < .01