By Hui Bian Office for Faculty Excellence



Email: bianh@ecu.edu Phone: 328-5428 Location: 2307 Old Cafeteria Complex

Use multiple regression when you want to predict one variable from a combination of several variables, when you want to determine which variables are better predictors than others, or when you want to compare models.

It is a model for the relationship between a dependent variable and a collection of independent variables. According to the IBM SPSS manual, linear regression is used to model the value of a dependent scale variable based on its linear (straight-line) relationship to one or more predictors.

Regression equation: Y = b0 + b1x1 + b2x2 + … + bpxp + e
Y: score of the dependent variable
b0: intercept
p: number of predictors
b1-bp: weights or partial regression coefficients (slopes) for the predictors
x1-xp: scores of the predictors
e: error of prediction
Positive and negative regression weights reflect the nature of the correlation between each predictor and the dependent variable.
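The equation above can be sketched in a few lines of Python (illustrative only; the intercept, coefficients, and predictor scores below are made up, not from the SPSS example later in these slides):

```python
# Minimal sketch of the regression equation Y = b0 + b1*x1 + ... + bp*xp,
# using hypothetical coefficients and predictor scores.

def predict(b0, coefs, xs):
    """Predicted score: intercept plus the weighted sum of predictor scores."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# Hypothetical model with two predictors:
b0 = 1.5             # intercept: predicted Y when every predictor is 0
coefs = [0.8, -0.3]  # partial regression coefficients b1, b2
xs = [2.0, 4.0]      # one case's scores on x1, x2

y_hat = predict(b0, coefs, xs)  # 1.5 + 0.8*2.0 + (-0.3)*4.0 = 1.9
print(y_hat)
```

Note the negative weight on the second predictor: its score lowers the predicted value, mirroring a negative correlation with the dependent variable.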

[Figure: a regression line plotted on X-Y axes, showing the intercept and the slope.]

[Figure: scatterplots illustrating a positive relationship, a negative relationship, and no relationship.]

The model is linear because increasing the value of the pth predictor by one unit changes the value of the dependent variable by bp units. b0 is the intercept: the model-predicted value of the dependent variable when every predictor equals 0.

We use the least squares criterion to estimate the parameters. Least squares means that the sum of the squared errors of prediction is minimized. Residual (error) = observed score of y - predicted score of y. The resulting line best fits the data: the vertical distance between an observed value of y and the line is the residual.

In the scatterplot, we have an independent (X) variable and a dependent (Y) variable. Each point in the plot represents one case (one subject). The goal of the linear regression procedure is to fit a line through the points: SPSS computes the line so that the squared deviations of the observed points from that line are minimized. This general procedure is sometimes referred to as least squares estimation.
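The least squares idea can be illustrated in pure Python for the one-predictor case, using the closed-form slope and intercept (a sketch of what SPSS computes internally; the data are hypothetical):

```python
# A minimal sketch of least squares estimation for simple linear regression.

def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(round(b0, 3), round(b1, 3))
```

No other straight line through these points yields a smaller sum of squared vertical distances (residuals) than the one defined by this intercept and slope.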


Assumptions: normality, linearity, and equal variance.

For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear, and all observations should be independent.

The error term has a normal distribution with a mean of 0. The variance of the error term is constant across cases and independent of the variables in the model.

Multicollinearity: moderate to high inter-correlations among the independent variables. It limits the size of R, makes the model unstable in terms of prediction, and makes it hard to interpret the significance of individual predictors.

Checking assumptions: a histogram of the standardized or studentized residuals (normality assumption), and scatter plots of the dependent variable, standardized predicted values, standardized residuals, deleted residuals, adjusted predicted values, Studentized residuals, or Studentized deleted residuals.

Scatter plots: plot the standardized residuals (*ZRESID) against the standardized predicted values (*ZPRED) to check for linearity and equality of variances. Available from SPSS: Dependent, standardized predicted values (*ZPRED), standardized residuals (*ZRESID), deleted residuals (*DRESID), adjusted predicted values (*ADJPRED), Studentized residuals (*SRESID), and Studentized deleted residuals (*SDRESID).

Plots from SPSS.


Regression coefficients indicate the relative importance of the significant predictors when the effects of the other predictors are controlled. Unstandardized regression coefficients (B) reflect raw-score values (different metrics across predictors). Standardized regression coefficients (β) place all variables on the same metric.

Squared multiple correlation (R²): the proportion of the variance of the dependent variable that the model accounts for. Residual (prediction error): the difference between the predicted value and the observed score of the dependent variable.
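R² can be computed directly from the definition above, as one minus the ratio of the residual to the total sum of squares (the observed and predicted scores below are hypothetical):

```python
# A minimal sketch of R-squared = 1 - SS_residual / SS_total.

def r_squared(observed, predicted):
    """Proportion of variance in the observed scores explained by the model."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

observed  = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(observed, predicted))  # 0.95: 95% of the variance explained
```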

Dependent variable: the criterion variable; a scale (interval or ratio), i.e. quantitative, variable. Independent variables: predictors or control variables; continuous or categorical. Inclusion of variables in the model is based on theory and on empirical studies done by other researchers.

1 means the presence of something; 0 means the absence of something (the reference). Number of dummy variables = p - 1, where p = number of levels of the nominal variable. Each dummy variable is dichotomous (0, 1). The reference level is the baseline with which the other levels are compared.

Exercise. One variable: a03 (race). Recode a03 into three categories (White, Black, Others), creating a new variable named a03r (1 = White, 2 = Black, 3 = Others). Then recode a03r into two dummy variables, with White as the reference category: Dummy1 (Black vs. White) and Dummy2 (Others vs. White).

Recode a03 into a03r. Response options for a03r: 1 = White, 2 = Black, 3 = Others.

Recode a03 into a03r: Transform > Recode into Different Variables > highlight a03 and click the arrow button > type a03r as the output variable name > click Change.

Click the Old and New Values button.

        Dummy1  Dummy2
White     0       0
Black     1       0
Others    0       1
Dummy1: participants who are Black are coded 1; the other categories are coded 0. Dummy2: participants who are Others are coded 1; the other categories are coded 0.
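The coding table above can be sketched as a small Python function (the string category labels are illustrative; in the SPSS exercise the categories are the numeric codes of a03r):

```python
# A minimal sketch of dummy coding a three-level race variable,
# with White as the reference category (coded 0 on both dummies).

def dummy_code(race):
    """Return (Dummy1, Dummy2): Black vs. White, Others vs. White."""
    return (1 if race == "Black" else 0,
            1 if race == "Others" else 0)

cases = ["White", "Black", "Others", "Black"]
coded = [dummy_code(r) for r in cases]
print(coded)  # [(0, 0), (1, 0), (0, 1), (1, 0)]
```

Because White is coded (0, 0), the coefficients on Dummy1 and Dummy2 in the regression compare Black and Others, respectively, against White.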

Transform > Recode into Different Variables > highlight a03r and click the arrow button > type Dummy1 as the output variable name.

Click the Old and New Values button.

Follow the same process to create Dummy2. You should then have this window:

Example: we want to determine whether several predictors have an effect on drug-use problems among drug users (those who used any of alcohol, cigarettes, or marijuana in the last 30 days: a28, a29, and a30), while controlling for race (the two dummy variables). Dependent variable: aalcohol_problem (total score: 0-17).

Independent variables: the two dummy variables plus Frequency of marijuana use (a30: During the past 30 days, on how many days did you use marijuana? 1 = 0 days, 2 = 1-3 days, …, 11 = 28-30 days); Self-efficacy (a80r: How sure are you that you can avoid using alcohol if offered by friends? 0 = Very sure, 1 = Somewhat sure to not sure); Self-control (During the past 30 days, which of the following have you used to help you avoid or limit your alcohol, cigarette, or marijuana use? Total score ranges from 0 to 18; a higher score means more self-control); Peer norms (a93a: My friends think that it's okay for me to drink too much alcohol. 1 = Agree a lot, 2 = Agree, 3 = Disagree, 4 = Disagree a lot).

[Figure: regression model for our study: self-efficacy, self-control, marijuana use, and peer norms (plus error) predicting problems related to drug use.]

Enter: enters all independent variables in a single step. Stepwise: enters one independent variable at a time. At each step, the program computes an "F-to-remove" statistic for each variable currently in the model and an "F-to-enter" statistic for each variable not in the model; it then enters the variable with the highest F-to-enter statistic or removes the variable with the lowest F-to-remove statistic. Each predictor is constantly reassessed.

Forward: enters one independent variable at each step, namely the variable with the largest simple correlation with the dependent variable. Once a variable is entered into the model, it remains in the model. Backward: enters all independent variables, then removes non-significant variables from the model one at a time; the variable removed is the one whose loss would least decrease R².
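The forward idea, entering the remaining predictor with the largest simple correlation with the dependent variable, can be sketched as follows. This is a deliberate simplification: SPSS actually uses F-to-enter statistics and significance criteria, and the data and variable names here are hypothetical.

```python
# A rough sketch of forward selection ordering, by |Pearson r| with y.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def forward_order(predictors, y):
    """Return predictor names in the order this sketch would enter them."""
    remaining = dict(predictors)  # copy: don't mutate the caller's dict
    order = []
    while remaining:
        name = max(remaining, key=lambda k: abs(pearson_r(remaining[k], y)))
        order.append(name)
        del remaining[name]
    return order

y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "x1": [1.1, 2.2, 2.8, 4.1, 5.0],  # strongly correlated with y
    "x2": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with y
}
print(forward_order(predictors, y))
```

A fuller implementation would re-evaluate each candidate's contribution (its partial F) after every entry rather than ranking once by simple correlation.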

Data screening. The purpose of data screening is to check the assumptions of the regression model. Residual plots (standardized residuals on the Y axis versus standardized predicted values on the X axis) are used to check the constant-variance assumption: if no assumption is violated, the standardized residuals scatter randomly around a horizontal line at 0. A histogram and normal P-P plot of the standardized or studentized residuals are used to check the normality assumption.
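Standardizing residuals, the quantity plotted on the Y axis of the residual plot, can be sketched as dividing each residual by the residuals' standard deviation (a simplified stand-in for SPSS's *ZRESID; the data are hypothetical):

```python
# A minimal sketch of standardized residuals: each residual divided by the
# standard deviation of the residuals, so values are on a z-score-like scale.

def standardized_residuals(observed, predicted):
    resids = [y - p for y, p in zip(observed, predicted)]
    n = len(resids)
    mean = sum(resids) / n
    sd = (sum((r - mean) ** 2 for r in resids) / (n - 1)) ** 0.5
    return [r / sd for r in resids]

observed  = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
zs = standardized_residuals(observed, predicted)
print([round(z, 2) for z in zs])
```

On this scale, cases beyond about ±3 (like the ID 1090 case discussed later, at 3.06) stand out as potential outliers.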

Run the multiple regression analysis. First select cases (condition: a28>1 a29>1 a30>1), then go to Analyze > Regression > Linear > put aalcohol_problem into Dependent > put a80r, a30, a93a, self-control, and the two dummy variables into Independent(s).

Click Statistics.

Click Plots.

Click Save.

From the Descriptive Statistics table, we know that a total of 202 drug users were in the study. The average drug-problem score is 3.47 (SD = 3.25). The method used is Enter, meaning that all independent variables were entered into the model simultaneously.

1. Model Summary a. R is the Pearson correlation between the predicted and actual values of the dependent variable. b. R² is the squared multiple correlation coefficient, representing the amount of variance of the dependent variable explained by the combination of the six predictors: 14% of the variance in drug problems is explained by the six predictors. c. Adjusted R² is more conservative than R². 2. ANOVA table The significant F value, F(6, 195) = 5.18, p < .01, indicates a significant relationship between drug problems and the six predictors.

1. The regression equation is: Y = 2.275 + .258 Marijuana + 1.128 Self-efficacy + .088 Self-control - .457 Peer norms + .410 Dummy1 + .035 Dummy2. 2. B is the unstandardized regression coefficient and Beta is the standardized regression coefficient. 3. The t test and Sig. columns show the outcome for each independent variable.

Collinearity. Tolerance is the proportion of the variance in a given predictor that cannot be explained by the other predictors. When tolerances are close to 0, there is high multicollinearity and the standard errors of the regression coefficients will be inflated. A variance inflation factor (VIF) greater than 2 is usually considered problematic (based on the SPSS manual).
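For the two-predictor case, tolerance and VIF reduce to simple functions of the predictors' correlation, which makes the definitions easy to verify by hand (the data are hypothetical; with more predictors, the R² comes from regressing each predictor on all the others):

```python
# A minimal sketch of tolerance and VIF for two predictors, where the R² of
# one predictor regressed on the other is just their squared correlation.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.5, 2.1, 3.2, 3.9, 5.3]  # highly correlated with x1

r = pearson_r(x1, x2)
tolerance = 1 - r ** 2   # variance in x1 NOT explained by x2 (near 0 here)
vif = 1 / tolerance      # how much the coefficient's variance is inflated
print(round(tolerance, 3), round(vif, 1))
```

With these near-duplicate predictors the tolerance is close to 0 and the VIF is far above 2, exactly the multicollinearity warning signs described above.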

Histogram of standardized residuals.

Normal Q-Q plot. 1. We want to know whether the distribution of errors matches a normal distribution. 2. If the selected variable matches the test distribution, the points cluster around a straight line.

Residual plot. 1. Our residuals scatter randomly around 0. 2. The constant variance assumption is not violated. 3. The standardized residual of ID 1090 is 3.06.

1. The first two residual plots suggest that the error variance changes with the independent variable. 2. Neither of those distributions is a constant-variance pattern; therefore there is a violation of the equal-error-variance assumption. 3. The last, horizontal-band pattern suggests that the variance of the residuals is constant.

Zero-order correlation: the simple bivariate correlation between an independent variable and the dependent variable. Partial correlation: the correlation between an independent variable and the dependent variable after all other independent variables are controlled. Part (semipartial) correlation: the correlation between an independent variable and the dependent variable with the other independent variables partialled out of that independent variable only; when squared, it represents the unique contribution of the independent variable to the model.
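For one control variable, the partial and part correlations can be computed directly from the zero-order correlations with the standard first-order formulas (the r values below are hypothetical):

```python
# A minimal sketch of first-order partial and part (semipartial) correlations
# built from zero-order correlations:
#   partial: r_xy.z   = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)*(1 - r_yz^2))
#   part:    r_y(x.z) = (r_xy - r_xz*r_yz) / sqrt(1 - r_xz^2)

import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z controlled in both variables."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def part_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of x only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz**2)

# Hypothetical zero-order correlations:
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3

print(round(partial_r(r_xy, r_xz, r_yz), 3))
print(round(part_r(r_xy, r_xz, r_yz), 3))
```

Squaring the part correlation gives the unique share of the dependent variable's variance attributable to that predictor, matching the slide's interpretation.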

A newly created variable holds the standardized residuals: ZRE_1. Run descriptive statistics on ZRE_1, e.g. using the Explore function.

Explore results.

Explore results. The Kolmogorov-Smirnov test is based on a simple way to quantify the discrepancy between the observed and expected distributions. It turns out, however, that it is too simple and does not do a good job of discriminating whether or not your data were sampled from a Gaussian distribution. An expert on normality tests, R. B. D'Agostino, makes a very strong statement: "The Kolmogorov-Smirnov test is only a historical curiosity. It should never be used." ("Tests for Normal Distribution" in Goodness-of-Fit Techniques, Marcel Dekker, 1986).

Run the previous analysis again using the stepwise method: Analyze > Regression > Linear.

1. This table lists how many models were produced in the process and which variable was entered or removed at each step. 2. No variable was removed at any step.

1. The model summary shows R² for each model. 2. Sig. F Change tells us, when an extra IV is added to the model, what contribution that IV makes.

1. The ANOVA table shows the F value for each model. 2. Both models are significant (p < .05). 3. The last model has two predictors.

For self-efficacy, a high score means lower self-efficacy. The results show that drug users who used more marijuana and who had lower self-efficacy were more likely to have drug-use problems.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2006). Applied multivariate research: Design and interpretation. Thousand Oaks, CA: Sage Publications.
Stevens, J. P. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Lawrence Erlbaum Associates.
