ANNOTATED OUTPUT--SPSS Simple Linear (OLS) Regression


Simple Linear (OLS) Regression

Regression is a method for studying the relationship between a dependent variable and one or more independent variables. Simple linear regression tells you the amount of variance in one variable that is accounted for by another variable. In this example, we are interested in predicting the frequency of sex among a national sample of adults. The dependent variable is frequency of sex. The independent variables are: age, race, general happiness, church attendance, and marital status.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT sexfreq
  /METHOD=ENTER age Married White happy attend.

Variables Entered/Removed(b)

Variables Entered:  CHURCH ATTENDANCE, RACE (White = 1), GENERAL HAPPINESS, AGE, MARITAL (Married = 1)(a)
Variables Removed:  (none)
Method:             Enter

a. All requested variables entered.
b. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR
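The Married and White dummies are recodes of the underlying marital-status and race items. The guide does not show that step; the sketch below is one plausible way to build them, assuming GSS-style source variables named marital (1 = married) and race (1 = white), which are our assumptions rather than part of this output.

* Hedged sketch: build the 0/1 dummies entered in the regression above.
* The source variable names (marital, race) are illustrative assumptions.
RECODE marital (1=1) (ELSE=0) INTO Married.
RECODE race (1=1) (ELSE=0) INTO White.
VARIABLE LABELS Married 'MARITAL (Married = 1)' White 'RACE (White = 1)'.
EXECUTE.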

Model Summary

R       R Square -a-   Adjusted R Square -b-   Std. Error of the Estimate -c-
.572    .327           .323                    1.651

Predictors: (Constant), CHURCH ATTENDANCE, RACE (White = 1), GENERAL HAPPINESS, AGE, MARITAL (Married = 1)

a. R Square tells us how much of the variance in the dependent variable can be explained by the independent variable(s). Basically, it compares the model with the independent variables to a model without them. In this case, about 33% of the variance is explained by the included variables.

b. Adjusted R Square: as predictors are added to the model, each one will explain some of the variance in the dependent variable simply by chance. The Adjusted R Square attempts to produce a more honest estimate of the population R Square. In this example, the difference between R Square and Adjusted R Square is minimal.

c. Std. Error: the standard error of the estimate (a.k.a. the root mean square error) is the standard deviation of the error term, and is the square root of the Mean Square Residual (or Error).
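Both adjusted quantities can be checked by hand from the ANOVA table on the next page. Reading n = 1052 cases and k = 5 predictors from the degrees of freedom there, and using the standard formulas:

Adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1) = 1 - (1 - .327)(1051/1046) ≈ .323
Std. Error of the Estimate = sqrt(Mean Square Residual) = sqrt(2.725) = 1.651

(Small discrepancies in the last digit come from rounding R Square to three decimals.)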

ANOVA(b)

                  Sum of Squares -f-   df     Mean Square   F -g-     Sig.
Regression -d-    1382.662                5   276.532       101.490   .000(a)
Residual -e-      2850.064             1046     2.725
Total             4232.726             1051

a. Predictors: (Constant), CHURCH ATTENDANCE, RACE (White = 1), GENERAL HAPPINESS, AGE, MARITAL (Married = 1)
b. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR

d. The output for Regression displays information about the variation accounted for by the model.

e. The output for Residual displays information about the variation that is not accounted for by your model. The output for Total is the sum of the information for Regression and Residual.

f. A model with a large regression sum of squares relative to the residual sum of squares indicates that the model accounts for most of the variation in the dependent variable. A very high residual sum of squares indicates that the model fails to explain much of the variation, and you may want to look for additional factors that account for a higher proportion of it. In this example, 32.7% of the total sum of squares is made up of the regression sum of squares. You may notice that the R Square for this model is also .327 (this is not a coincidence!).

g. If the significance value of the F statistic is small (say, smaller than 0.05), then the independent variables do a good job of explaining the variation in the dependent variable. If it is larger than 0.05, then they do not. In this example, the model does a good job of explaining the variation in the dependent variable.
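The parts of the table fit together arithmetically:

R Square = SS(Regression) / SS(Total) = 1382.662 / 4232.726 = .327
F = Mean Square(Regression) / Mean Square(Residual) = 276.532 / 2.725 = 101.490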

Coefficients(a)

                        Unstandardized -i-        Standardized -j-
                        B        Std. Error       Beta               t -k-      Sig. -l-
(Constant) -h-          5.553    .250                                22.191     .000
AGE                     -.055    .003             -.475              -18.255    .000
MARITAL (Married = 1)   1.372    .107             .339               12.810     .000
RACE (White = 1)        -.215    .125             -.045              -1.719     .086
GENERAL HAPPINESS       -.262    .085             -.081              -3.094     .002
CHURCH ATTENDANCE       -.067    .020             -.089              -3.436     .001

a. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR

h. Constant represents the Y-intercept, the height of the regression line where it crosses the Y axis. In this example, it is the predicted value of the dependent variable, frequency of sex, when all of the independent variables equal zero. Here, that value is 5.553.

i. These are the values for the regression equation for predicting the dependent variable from the independent variable(s). They are unstandardized (B) coefficients because they are measured in each variable's natural units, and therefore cannot be compared with one another to determine which is more influential.

j. The standardized coefficients, or betas, are an attempt to make the regression coefficients more comparable. Here we can see that the beta for age has the largest absolute value (-.475), which can be interpreted as having the greatest impact on frequency of sex compared to the other variables in the model. The race variable has the smallest absolute value (-.045) and can be interpreted as having the smallest impact on the dependent variable.

k. The t statistics can help you determine the relative importance of each variable in the model. The t statistic is calculated by dividing the variable's unstandardized coefficient by its standard error. As a guide regarding useful predictors, look for t values well below -1.96 or above +1.96. As you can see, race is the only variable whose t value does not fall outside this range (note also that it is the only variable that is not significant).

l. The significance column indicates whether or not a variable is a significant predictor of the dependent variable. The p-value (significance) is the probability that your sample could have been drawn from the population(s) being tested (or that a more improbable sample could be drawn) given the assumption that the null hypothesis is true. A p-value of .05, for example, indicates that you would have only a 5% chance of drawing the sample being tested if the null hypothesis were actually true. As sociologists, we typically look for p-values below .05. For example, in this model the race variable is not significant (p = .086), while the other variables are significant.
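As a quick check of point k: for age, t = B / Std. Error = -.055 / .003 ≈ -18.3, matching the reported -18.255 up to the rounding of B and its standard error.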

Regression Equation

SEXFREQ (predicted) = 5.553 - .055*age + 1.372*marital - .215*race - .262*happy - .067*attend

If we plug in values for the independent variables (age = 35 years; marital = currently married, 1; race = White, 1; happiness = very happy, 1; church attendance = never attends, 0), we can predict a value for frequency of sex:

SEXFREQ (predicted) = 5.553 - .055*35 + 1.372*1 - .215*1 - .262*1 - .067*0 = 4.523

As this variable is coded, a 35-year-old, White, married person with high levels of happiness who never attends church would be expected to report a frequency of sex between values 4 (weekly) and 5 (2-3 times per week). If we plug in 70 years instead, frequency of sex is predicted at 2.598, or approximately 1-2 times per month.

Interpretation

Constant = The predicted value of frequency of sex when all other variables are 0. It is important to note that a value of 0 on every variable is not necessarily interpretable (e.g., age cannot equal 0, since all respondents in our sample are between the ages of 18 and 89).

Age = For every unit increase in age (in this case, one year), frequency of sex will decrease by .055 units.

Marital Status = For every unit increase in marital status, frequency of sex will increase by 1.372 units. Since marital status has only two categories, we can conclude that currently married persons have more sex than currently unmarried persons.

Race = For every unit increase in race, frequency of sex will decrease by .215 units. Since race has only two categories, we can conclude that non-White persons have more sex than White persons.

Happiness = For every unit increase in happiness, frequency of sex will decrease by .262 units. Recall that happiness is coded such that higher scores indicate less happiness. For this example, then, higher levels of happiness predict higher frequency of sex.

Church Attendance = For every unit increase in church attendance, frequency of sex decreases by .067 units.
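Rather than plugging values in by hand, SPSS can save the model's predicted value for every case via the REGRESSION /SAVE subcommand. A minimal sketch (the saved variable name sexfreq_hat is our choice, not part of the original output):

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT sexfreq
  /METHOD=ENTER age Married White happy attend
  /SAVE PRED(sexfreq_hat).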

Simple Linear Regression (with nonlinear variables)

It is known that some variables are often non-linear, or curvilinear; age and income are common examples. In this example, we include the original age variable and an age-squared variable.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT sexfreq
  /METHOD=ENTER age Married White happy attend agesquare.

Coefficients(a)

                        Unstandardized            Standardized
                        B        Std. Error       Beta               t          Sig.
(Constant)              4.838    .427                                11.329     .000
AGE                     -.020    .017             -.175              -1.190     .234
MARITAL (Married = 1)   1.314    .111             .325               11.883     .000
RACE (White = 1)        -.218    .125             -.046              -1.745     .081
GENERAL HAPPINESS       -.271    .085             -.084              -3.207     .001
CHURCH ATTENDANCE       -.067    .020             -.089              -3.423     .001
AGE-SQUARED             .000     .000             -.303              -2.065     .039

a. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR

The age-squared variable is significant, indicating that the effect of age is non-linear.
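The age-squared term is just a computed variable; the guide does not show its creation. A minimal sketch of the step that would precede the run above:

* Hedged sketch: create the squared term entered above.
COMPUTE agesquare = age*age.
EXECUTE.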

Simple Linear Regression (with interaction term)

In a linear model, the effect of each independent variable is always the same. However, it could be that the effect of one variable depends on another. In this example, we might expect that the effect of marital status depends on gender. In the following example, we include an interaction term, male*married.

To test for two-way interactions (often thought of as a relationship between an independent variable (IV) and a dependent variable (DV), moderated by a third variable), first run a regression analysis including both independent variables (the IV and the moderator) and their interaction (product) term. It is highly recommended that the independent variable and moderator be standardized before calculation of the product term, although this is not essential.

For this example, two dummy variables were created for ease of interpretation. Gender was coded such that 1 = Male and 0 = Female. Marital status was coded such that 1 = currently married and 0 = not currently married. The interaction term is the cross-product of these two dummy variables (see the sketch after the table below).

Regression (without interactions)

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT sexfreq
  /METHOD=ENTER age White happy attend Male Married.

Coefficients(a)

                        Unstandardized            Standardized
                        B        Std. Error       Beta               t          Sig.
(Constant)              5.295    .258                                20.503     .000
AGE                     -.055    .003             -.469              -18.089    .000
RACE (White = 1)        -.231    .125             -.048              -1.852     .064
GENERAL HAPPINESS       -.242    .084             -.075              -2.873     .004
CHURCH ATTENDANCE       -.056    .020             -.074              -2.850     .004
GENDER (Male = 1)       .385     .104             .096               3.709      .000
MARITAL (Married = 1)   1.318    .107             .326               12.262     .000

a. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR
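With two dummies, the product term can be computed directly; standardizing first, as recommended above, matters mainly for continuous predictors. A minimal sketch, assuming the Male and Married dummies already exist:

* Hedged sketch: cross-product interaction term used in the next run.
COMPUTE Interaction = Male*Married.
EXECUTE.

* For continuous variables, standardized (Z-score) versions can be created first, e.g.:
DESCRIPTIVES VARIABLES=age /SAVE.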

Regression (with interactions)

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT sexfreq
  /METHOD=ENTER age White happy attend Male Married Interaction.

Coefficients(a)

                                  Unstandardized            Standardized
                                  B        Std. Error       Beta               t          Sig.
(Constant)                        5.136    .262                                19.573     .000
AGE                               -.053    .003             -.456              -17.431    .000
RACE (White = 1)                  -.263    .125             -.055              -2.109     .035
GENERAL HAPPINESS                 -.251    .084             -.078              -2.986     .003
CHURCH ATTENDANCE                 -.053    .020             -.070              -2.669     .008
GENDER (Male = 1)                 .673     .139             .168               4.827      .000
MARITAL (Married = 1)             1.630    .147             .403               11.055     .000
Male x Married Interaction Term   -.643    .209             -.137              -3.078     .002

a. Dependent Variable: FREQUENCY OF SEX DURING LAST YEAR

The product term should be significant in the regression equation in order for the interaction to be interpretable. Here it is, indicating that the effect of being married is significantly different (p < .05) for males and females: marital status has a significantly weaker effect for males than for females.

Regression Equation

SEXFREQ (predicted) = 5.136 - .053*age - .263*race - .251*happy - .053*attend + 1.630*married + (.673 - .643*married)*male

Interpretation

Main Effects

The married coefficient represents the main effect of marital status for females (the 0 category on gender). The effect for females is then 1.630, the marital coefficient. The effect for males is 1.630 - .643, or .987.

The gender coefficient represents the main effect of gender for unmarried persons (the 0 category on marital status). The effect for unmarried persons is then .673, the gender coefficient. The effect for married persons is .673 - .643, or .030.
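Writing the equation out separately for each gender makes these main effects concrete:

Females (male = 0): SEXFREQ (predicted) = 5.136 - .053*age - .263*race - .251*happy - .053*attend + 1.630*married
Males (male = 1):   SEXFREQ (predicted) = (5.136 + .673) - .053*age - .263*race - .251*happy - .053*attend + (1.630 - .643)*married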

Interaction Effects

For a simple interpretation of the interaction term, plug values into the regression equation above (age = 35 years; race = White, 1; happiness = very happy, 1; church attendance = never attends, 0):

Married Men:      SEXFREQ (predicted) = 5.136 - .053*35 - .263*1 - .251*1 - .053*0 + 1.630*1 + (.673 - .643*1)*1 = 4.427
Married Women:    SEXFREQ (predicted) = 5.136 - .053*35 - .263*1 - .251*1 - .053*0 + 1.630*1 + (.673 - .643*1)*0 = 4.397
Unmarried Men:    SEXFREQ (predicted) = 5.136 - .053*35 - .263*1 - .251*1 - .053*0 + 1.630*0 + (.673 - .643*0)*1 = 3.440
Unmarried Women:  SEXFREQ (predicted) = 5.136 - .053*35 - .263*1 - .251*1 - .053*0 + 1.630*0 + (.673 - .643*0)*0 = 2.767

From these values we can see that (1) for both married and unmarried persons, males report a higher frequency of sex than females, and (2) married persons report a higher frequency of sex than unmarried persons. The interaction tells us that the gender difference is much larger for unmarried persons (.673) than for married persons (.030).

Center for Family and Demographic Research