Multiple Regression in SPSS


Example: An automotive industry group keeps track of the sales for a variety of personal motor vehicles. In an effort to identify over- and underperforming models, you want to establish a relationship between vehicle sales and vehicle characteristics. Information concerning different makes and models of cars is contained in car_sales.sav. Use linear regression to identify models that are not selling well; that is, to find the attributes or variables that influence the volume of sales.

Which variable to use?

EXAMINE VARIABLES=sales lnsales
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUP
  /STATISTICS NONE
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

The distribution of Log-transformed sales is closer to normal than that of Sales in thousands, and the linear regression model works better with normally distributed variables.
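The car_sales.sav sample file already includes the log-transformed variable, but if you need to create it yourself, a minimal sketch (using the lnsales name that the regression syntax below relies on) is:

* Create a log-transformed copy of sales.
COMPUTE lnsales = LN(sales).
EXECUTE.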

To run a linear regression analysis, from the menus choose: Analyze > Regression > Linear... Select Log-transformed sales as the dependent variable. Select Vehicle type through Fuel efficiency as independent variables.
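The model being fit is the standard multiple linear regression, with log-sales as the response and the vehicle characteristics as predictors:

$$ \ln(\text{sales}_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) $$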

Partial Correlation. The correlation that remains between two variables after removing the correlation that is due to their mutual association with the other variables. That is, the correlation between the dependent variable and an independent variable when the linear effects of the other independent variables in the model have been removed from both. Part Correlation. The correlation between the dependent variable and an independent variable when the linear effects of the other independent variables in the model have been removed from the independent variable only. It is related to the change in R squared when a variable is added to an equation, and is sometimes called the semipartial correlation.
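For reference, the standard formulas in the three-variable case (Y, a predictor X1, and one control X2) make the difference concrete:

$$ r_{Y1\cdot2} = \frac{r_{Y1} - r_{Y2}\,r_{12}}{\sqrt{(1 - r_{Y2}^2)(1 - r_{12}^2)}}, \qquad r_{Y(1\cdot2)} = \frac{r_{Y1} - r_{Y2}\,r_{12}}{\sqrt{1 - r_{12}^2}} $$

The partial correlation removes X2 from both Y and X1; the part correlation removes it from X1 only, which is why its square equals the increase in R squared when X1 is added to a model that already contains X2.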

Collinearity diagnostics. Collinearity (or multicollinearity) is the undesirable situation when one independent variable is a linear function of other independent variables. Eigenvalues of the scaled and uncentered cross-products matrix, condition indices, and variance-decomposition proportions are displayed along with variance inflation factors (VIF) and tolerances for individual variables. Residuals. Displays the Durbin-Watson test for serial correlation of the residuals and casewise diagnostics for the cases meeting the selection criterion (outliers above n standard deviations).
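Tolerance and VIF are computed per predictor: regress predictor j on all of the other predictors to obtain R_j^2, and then

$$ \text{Tolerance}_j = 1 - R_j^2, \qquad \text{VIF}_j = \frac{1}{1 - R_j^2} $$

so a tolerance near 0 (equivalently, a large VIF) means predictor j is almost a linear function of the others.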

Collinearity diagnostics The Collinearity Diagnostics table contains three types of indicators: eigenvalues of the cross-products matrix, condition indices, and tolerances for individual variables. Where tolerances are less than .01, multicollinearity problems are indicated. The results of the table below suggest there is not a problem with multicollinearity in the Leslie Salt regression Model I. This is another indication that the individual coefficients are significantly different from zero.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT lnsales
  /METHOD=ENTER type price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /SCATTERPLOT=(*ZRESID,*ZPRED)
  /RESIDUALS NORM(ZRESID)
  /SAVE PRED RESID ZRESID.

The ANOVA table reports a significant F statistic, indicating that using the model is better than guessing the mean. As a whole, the regression does a good job of modeling sales: nearly half the variation in sales is explained by the model, as the value of R² also shows.
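The F statistic compares the variance explained by the model against the residual variance; with k predictors and n cases, it can be written in terms of R² as

$$ F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} $$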

Even though the model fit looks positive, the first section of the coefficients table shows that there are too many predictors in the model. There are several non-significant coefficients, indicating that these variables do not contribute much to the model.

To determine the relative importance of the significant predictors, look at the standardized coefficients. Even though Price in thousands has a small coefficient compared to Vehicle type, Price in thousands actually contributes more to the model because it has a larger absolute standardized coefficient.
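The standardized coefficient rescales the unstandardized coefficient b_j by the sample standard deviations of the predictor and the dependent variable, which is what puts predictors measured in different units on a common footing:

$$ \beta_j = b_j \cdot \frac{s_{x_j}}{s_Y} $$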

The second section of the coefficients table shows that there might be a problem with multicollinearity. For most predictors, the values of the partial and part correlations drop sharply from the zero-order correlation. This means, for example, that much of the variance in sales that is explained by price is also explained by other variables.

The tolerance is the proportion of the variance in a given predictor that cannot be explained by the other predictors. Thus, the small tolerances show that 70%-90% of the variance in a given predictor can be explained by the other predictors. When the tolerances are close to 0, there is high multicollinearity and the standard errors of the regression coefficients will be inflated. A variance inflation factor greater than 2 is usually considered problematic, and the smallest VIF in the table is 3.193.

The collinearity diagnostics confirm that there are serious problems with multicollinearity. Several eigenvalues are close to 0, indicating that the predictors are highly intercorrelated and that small changes in the data values may lead to large changes in the estimates of the coefficients.
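Each condition index is computed from the eigenvalues of the scaled, uncentered cross-products matrix:

$$ \text{CI}_j = \sqrt{\lambda_{\max} / \lambda_j} $$

so eigenvalues near 0 translate directly into large condition indices; values above about 15 are commonly read as a sign of possible collinearity, and values above 30 as a serious problem.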

Now try to fix the collinearity problems by rerunning the regression, using z-scores of the independent variables and the stepwise method of model selection so that only the most useful variables enter the model. To run a stepwise linear regression on the standardized variables, recall the Linear Regression dialog box. Deselect Vehicle type through Fuel efficiency as independent variables.

Select Zscore: Vehicle type through Zscore: Fuel efficiency as independent variables. Select Stepwise as the entry method. Select Model as the case labeling variable. Then rerun the model.
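If the Zscore: versions of the variables do not already exist in the working file, one way to create them is the DESCRIPTIVES procedure with the /SAVE subcommand, sketched below. SPSS saves the standardized copies under Z-prefixed names (for example Zprice), which older versions truncate to eight characters, giving names like zengine_ as used in the syntax that follows.

* Save standardized (z-score) copies of the predictors.
DESCRIPTIVES VARIABLES=type price engine_s horsepow wheelbas width length curb_wgt fuel_cap mpg
  /SAVE.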

The casewise diagnostics will cover the cases that meet the selection criterion (outliers more than n standard deviations out); here, OUTLIERS(2) in the syntax below sets n to 2.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT lnsales
  /METHOD=STEPWISE ztype zprice zengine_ zhorsepo zwheelba zwidth zlength zcurb_wg zfuel_ca zmpg
  /SCATTERPLOT=(*ZRESID,lnsales) (*SDRESID,*ZPRED)
  /RESIDUALS HIST(ZRESID) NORM(ZRESID) ID(model)
  /CASEWISE PLOT(ZRESID) OUTLIERS(2)
  /SAVE ZPRED COOK LEVER ZRESID.

Collinearity Diagnostics There are no eigenvalues close to 0, and all of the condition indices are much less than 15. The strategy has worked, and the model built using stepwise methods does not have problems with collinearity.

Checking the Model Fit The new model's ability to explain sales compares favorably with that of the previous model.

Checking the Model Fit Look in particular at the adjusted R-square statistics, which are nearly identical. A model with extra predictors will always have a larger R-square, but the adjusted R-square compensates for model complexity to provide a fairer comparison of model performance.
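With n cases and k predictors, the adjustment has the standard form

$$ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} $$

which shrinks R² by a penalty that grows with every predictor added.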

ANOVA for Multiple Regression

Stepwise Coefficients The stepwise algorithm chooses price and size (in terms of the vehicle wheelbase) as predictors. Sales are negatively affected by price and positively affected by size; your conclusion is that cheaper, bigger cars sell well.
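In equation form (coefficient signs as reported in the coefficients table; the numeric values are not reproduced here), the stepwise model is

$$ \widehat{\ln(\text{sales})} = b_0 + b_1\, z_{\text{price}} + b_2\, z_{\text{wheelbase}}, \qquad b_1 < 0, \; b_2 > 0 $$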

Stepwise Coefficients Beta In is the value of the standardized coefficient for the predictor if it is included next. All of the significance values are less than 0.05, so any of the remaining predictors would be adequate if included in the model. Price was chosen first because it is the predictor that is most highly correlated with sales. The remaining predictors are then analyzed to determine which, if any, is the most suitable for inclusion at the next step.

Stepwise Coefficients After adding wheelbase to the model, none of the remaining predictors are significant. However, vehicle type just barely misses the 0.05 cutoff, so you may want to add it manually in a future analysis to see how it changes the results. To choose the best variable to add to the model, look at the partial correlation, which is the linear correlation between the proposed predictor and the dependent variable after removing the effect of the current model. Thus, wheelbase is chosen next because it has the highest partial correlation.

Stepwise Coefficients Engine size would have a larger beta coefficient if added to the model, but it's not as desirable as vehicle type. This is because engine size has a relatively low tolerance compared to vehicle type, indicating that it is more highly correlated with price and wheelbase.

The shape of the histogram follows the shape of the normal curve fairly well, but there are one or two large negative residuals. For more information on these cases, see the casewise diagnostics.

Casewise Diagnostics This table identifies the cases with large negative residuals as the 3000GT and the Cutlass. This means that, relative to other cars of their size and price, these two models underperformed in the market. The Breeze, Prowler, and SW also appear to have underperformed to a lesser extent, while the Explorer seems to be the only overperformer.

Summary Using stepwise methods in Linear Regression, you have selected a "best" model for predicting motor vehicle sales. With this model, you found two vehicle models that were underperforming in the market, while no vehicle was clearly overperforming. Diagnostic plots of residuals and influence statistics indicated that your regression model may be adversely affected by unusually large, expensive cars like the SL-Class. Log-transforming wheelbase and price in thousands in future analyses may keep the SL-Class from being influential.
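A minimal sketch of that follow-up transformation, using hypothetical names lnwheel and lnprice for the new variables:

* Log-transform wheelbase and price for a follow-up model.
* lnwheel and lnprice are illustrative names, not variables in car_sales.sav.
COMPUTE lnwheel = LN(wheelbas).
COMPUTE lnprice = LN(price).
EXECUTE.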

Summary The Linear Regression procedure is useful for modeling the relationship between a scale dependent variable and one or more scale independent variables. Use the Correlations procedure to study the strength of relationships between the variables before fitting a model. If you have categorical predictor variables, try the GLM Univariate procedure. If your dependent variable is categorical, try the Discriminant Analysis procedure. If you have a lot of predictors and want to reduce their number, use the Factor Analysis procedure.

End of Section