Practice 3 SPSS
Partially based on notes from the University of Reading: http://www.reading.ac.uk

Simple Linear Regression

A simple linear regression model is fitted when you want to investigate whether there is a linear relationship between two quantitative variables. It takes the form

    y = α + βx + ε

where y is the response (dependent) variable, in this case height; x is the explanatory (independent) variable, which will be age; α is the intercept term of the model; β is the gradient (slope) of the linear model; and ε is the error term.

To fit a simple linear regression model to the data go to Analyze, then Regression, then Linear (in the Spanish interface: Analizar > Regresión > Lineales). Drag and drop Height into the Dependent variable box and Age into the Independent(s) box.

The output gives the R-Square value, which is the proportion of variation in the response explained by the proposed model; it is the square of Pearson's correlation coefficient. We would ideally like this percentage to be greater than 65%. The coefficients of the fitted regression model, along with their standard errors and p-values (Sig.), are also shown.
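If you prefer syntax to the menus, the same model can be fitted with the REGRESSION command. A minimal sketch, assuming the variables are named Height and Age as in the text; by default it produces the Model Summary (R, R Square), ANOVA and Coefficients tables described above:

    * Simple linear regression of Height on Age.
    REGRESSION
      /DEPENDENT Height
      /METHOD=ENTER Age.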

Note: The assumptions of the model are Normality of the errors and a constant variance for all individuals.

Interpretation: In this example the fitted regression model is Height = 100.43 + 0.35*Age. The t-statistics test whether α = 0 or β = 0, and the corresponding p-values (Sig.) are given.
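As a quick worked example of using the fitted equation for prediction (the units of Age and Height are whatever is recorded in the dataset, which the text does not state): for an individual with Age = 20, the predicted height is

    Height = 100.43 + 0.35 × 20 = 107.43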

These p-values show that both the intercept and gradient parameters are significantly different from zero. The result for β provides strong evidence of an association between height and age. The R-Square value is 0.459, which means that 45.9% of the variation in height is explained by age. The ANOVA table can be ignored here, since for a simple linear regression model it tests the same hypothesis, β = 0, whose result is already given by the t-test.

Creating and adding a regression line to a scatterplot

When creating a scatterplot you may want to add a regression line to show graphically what your regression model is describing. In SPSS we create the scatterplot and add the fitted line all at once. Click on Analyze, then Regression, then Curve Estimation (Analizar > Regresión > Estimación curvilínea). Drag and drop Height to the Dependent variable box and Age to the Independent variable box. Under Models make sure Linear is ticked, and leave the rest at the default settings.

Interpretation: The graph shows a strong positive relationship between age and height: as age increases, height generally increases too. The graph illustrates what was seen earlier in the regression model output.
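The equivalent syntax for the Curve Estimation dialog above, a sketch mirroring what the dialog pastes (TSET NEWVAR=NONE simply suppresses extra saved variables):

    * Scatterplot of Height against Age with the fitted linear model overlaid.
    TSET NEWVAR=NONE.
    CURVEFIT
      /VARIABLES=Height WITH Age
      /CONSTANT
      /MODEL=LINEAR
      /PLOT FIT.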

Multiple Linear Regression

Multiple regression involves fitting a model for a quantitative response (dependent) variable using more than one explanatory variable, with the model linear in its parameters. The data for this example are in delivery.txt. The file concerns drinks deliveries and contains three columns:
- Column 1 is the delivery time in minutes
- Column 2 is the number of cases being delivered
- Column 3 is the distance walked by the delivery man in feet

A multiple regression equation for this problem takes the form

    y = β0 + β1·x1 + β2·x2 + ε

where y is the response (delivery time), the x's are the explanatory variables (number of cases delivered and distance walked), and ε is the error term.

To fit the regression model select Analyze, then Regression, then Linear (Analizar > Regresión > Lineales). The output shows the calculated coefficients for the equation (in the Unstandardized Coefficients, B column). Std. Error is the standard error of each estimated coefficient. The t-statistics (t) test whether βi = 0, and the corresponding p-values are given (Sig.).
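In syntax this is again the REGRESSION command. A sketch assuming the three columns have been read in as Time, Cases and Distance (hypothetical names, since delivery.txt does not name its columns):

    * Multiple regression of delivery time on cases delivered and distance walked.
    * Time, Cases and Distance are assumed names for the three columns.
    REGRESSION
      /DEPENDENT Time
      /METHOD=ENTER Cases Distance.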

The output also shows the estimated standard deviation of the error in the model (Std. Error of the Estimate). The R-Square value is the proportion of variation in the response explained by the model; it should be as high as possible. The ANOVA table tests whether any of the variables are significant, unlike the t-tests, which look at the significance of individual variables.

Note: Unless there is a good reason to keep a non-significant variable in the model, it should be removed and the model fitted again. This avoids unnecessary extra variability in predictions made from the model.

Interpretation: Looking at the p-values of the coefficients, the constant term is only just significant at the 5% level, but we still include it. The p-values for the other terms are highly significant (less than 0.001), so we reject the null hypothesis that each parameter is zero, given the other terms in the model. Therefore each variable influences the response. The R-Square value for this model is 96%, meaning that 96% of the variation in delivery time is explained by the regression model with cases and distance. From the ANOVA table we see that the model is significant with p < 0.001: there is strong evidence that at least one of the parameters is non-zero, which we had already concluded from the earlier tests.

Model Checking

The main reason for model checking is to see whether the underlying assumptions of a multiple regression hold. The main assumptions are:
- The errors are Normally distributed (which implies that the response is as well, if all explanatory variables are quantitative).
- There is a linear relationship between the response and the explanatory variables.
- The variance of the errors is equal for all observations.

To check the model assumptions select Analyze, Regression, then Linear (Analizar > Regresión > Lineales); the same window as above will appear. Click Plots (Gráficos).

Clicking the Save (Guardar) button will produce the following window.

Note: If the graph does not appear approximately Normally distributed, then a transformation, such as a logarithm or an inverse, will be needed.

Note: In the Data View window you will now have two new columns, for the residuals and the predicted values. Create a scatterplot with the unstandardized predicted value as the X variable and the standardized residual as the Y variable.

Note: Residual plots are based on the differences between the observed responses and those predicted by the model. If the standardised residual plots do not show a random scatter of data, as represented above, then the patterns within them indicate problems with the assumptions. Mild deviations from the ideal pattern are not too concerning; major deviations, however, suggest that the model is unreliable.
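The Save and Plots choices can also be requested in syntax. A sketch, again using the hypothetical names Time, Cases and Distance: /SAVE PRED ZRESID adds the two new columns to the data (named PRE_1 and ZRE_1 by SPSS), and /SCATTERPLOT requests a plot of standardized residuals against standardized predicted values (the Plots dialog works with the standardized versions):

    * Save predicted values and standardized residuals, and plot
    * standardized residuals against standardized predicted values.
    REGRESSION
      /DEPENDENT Time
      /METHOD=ENTER Cases Distance
      /SCATTERPLOT=(*ZRESID,*ZPRED)
      /SAVE PRED ZRESID.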

Examples of problems with residual plots

For a residuals versus explanatory variable graph: if the following pattern is displayed, then a squared term needs to be added to the model, i.e. x².

For a residuals versus explanatory variable graph: if the following pattern is displayed, then a cubic term needs to be added to the model, i.e. x³.
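Adding such a term in SPSS means computing a new variable and including it in the model. A sketch for a squared age term in the height example (AgeSq is a name chosen here for illustration):

    * Create a squared term and refit the model with it included.
    COMPUTE AgeSq = Age * Age.
    EXECUTE.
    REGRESSION
      /DEPENDENT Height
      /METHOD=ENTER Age AgeSq.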

For a residuals versus fitted values graph: if a funnel shape like the one shown is displayed, then the variances are not constant, and are in fact increasing with the fitted value.

For a residuals versus fitted values graph: if a funnel shape like the one shown is displayed, then the variances are not constant, and are in fact decreasing with the fitted value.
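A common remedy for variance that grows with the fitted value is one of the transformations mentioned earlier, for example a log transform of the response, followed by refitting. A sketch using the hypothetical Time variable (LogTime is a name chosen here for illustration):

    * Log-transform the response to stabilise increasing variance, then refit.
    COMPUTE LogTime = LN(Time).
    EXECUTE.
    REGRESSION
      /DEPENDENT LogTime
      /METHOD=ENTER Cases Distance.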

Stepwise Selection

Stepwise selection is a method for building a model which contains only those variables that are significant (at a chosen level) in modelling the response (dependent) variable. It is particularly useful when there are many possible explanatory (independent) variables. Some of these variables may be highly correlated with each other, and therefore explain the same variation in the response without being independently predictive; others may not influence the response in any meaningful way. It is advisable to construct a model with as few explanatory variables as possible, to ensure an easy-to-interpret model that will be efficient for future prediction. Stepwise selection aims to construct such a model by dropping and adding variables according to their significance in the presence of the other variables.

For stepwise selection the data would usually be one quantitative response variable plus several quantitative explanatory variables. Normality is assumed for the response variable, and a multiple linear regression model is built.

Note: there is no guarantee that the final model produced is sensible. Standard model checking procedures should be applied to check that the assumptions of the final model are reasonable.

For this example the data used is physical.xlsx. The dataset consists of the mass (weight in kg) plus various physical measurements for 22 healthy young males. To perform stepwise selection select Analyze, then Regression, then Linear (Analizar > Regresión > Lineales). Add the response variable to the Dependent box and the explanatory variables to the Independent(s) box, then select Stepwise (Pasos sucesivos) from the Method (Método) drop-down menu.
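In syntax, stepwise selection is simply a different METHOD on the REGRESSION command. A sketch assuming the response is named Mass; only the four predictor names the text mentions (Waist, Fore, Height, Thigh) are shown, since the other six measurement names in physical.xlsx are not given here; in practice all ten candidates would be listed:

    * Stepwise selection of predictors of Mass.
    * List every candidate explanatory variable after STEPWISE;
    * only the four named in the text are shown here.
    REGRESSION
      /DEPENDENT Mass
      /METHOD=STEPWISE Waist Fore Height Thigh.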

Interpretation: The Variables Entered/Removed box shows the number of steps carried out in the analysis and which variables were added or removed at each stage. The Model Summary section gives the summary statistics for each model in the analysis. Note: of most interest is the fourth model, as it is the final one.

R-Square is the proportion of variation in the response explained by the model; Adjusted R-Square adjusts this value to take into account the number of variables in the model. In the ANOVA table, the final row of results is the one relevant to the final model. The Coefficients section again shows that only four steps were carried out to reach the final model, with only four variables added from the original ten (all four of them significant). The parameter estimates for these variables are given in the Unstandardized Coefficients (B) column. The final model has an R-Square value of 96.6%, so the majority of the variation in the response (mass) is explained by this model. The final fitted model from this selection process includes the variables Waist, Fore, Height and Thigh.