A full analysis example, multiple correlations, partial correlations




New Dataset: Confidence. This dataset contains the confidence scores of 41 employees, collected some years ago, on 4 facets of confidence (Physical, Appearance, Emotional, and Problem Solving), as well as their gender and citizenship status.


Example problem 1: Analyze the correlation between physical confidence and appearance confidence.

The first question we should ask is whether Pearson correlation is appropriate. There are four requirements for correlation:
1. A straight-line relationship.
2. Interval data.
3. Random sampling (we will need to assume this).
4. Normally distributed variables.

Check for normality in the histogram of each variable (Graphs > Legacy Dialogs > Histogram).

The appearance variable is close enough to normal, although it has more values at the upper and lower ends than a normal distribution would. The physical variable has a negative skew, so that could be a problem.
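To put a number on the skew the histogram suggests, the sample skewness can be computed directly. The sketch below uses the adjusted Fisher-Pearson coefficient. It assumes that this is the form SPSS reports (it is generally understood to be), and it assumes blanks are stored as `None`. The function name `adjusted_skewness` and the scores are hypothetical; they do not come from the confidence dataset.

```python
import math


def adjusted_skewness(values):
    """Sample skewness with the small-sample adjustment (often written G1).

    Assumption: this adjusted Fisher-Pearson form is the one SPSS is
    generally understood to report; other software may print the
    unadjusted g1 instead, which differs for small samples.
    """
    xs = [v for v in values if v is not None]  # skip blank cells
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 non-missing values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical scores (not the confidence data) with a long lower tail
print(adjusted_skewness([1, 8, 9, 9, 10]) < 0)  # True: negative skew
```

A negative value corresponds to the long lower tail seen in the physical histogram. A value near zero corresponds to a roughly symmetric shape.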

There are at least two values far below the mean for physical confidence, and we should investigate them further. Use Graphs > Legacy Dialogs > Boxplot, choose Summaries of separate variables, and under Options select Exclude cases variable by variable.

Boxplots identify outliers. From the boxplot, we find that cases 31 and 37 are the outliers in physical confidence. Looking at the data directly, we find that neither of these cases even has a value for appearance.
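The boxplot rule can be sketched in a few lines. Points lying more than 1.5 box lengths (interquartile ranges) beyond the box edges are flagged, and they are reported as 1-based case numbers, as SPSS does. This is only an approximation. The quartiles come from Python's `statistics.quantiles`, whose default method may place the box edges slightly differently from SPSS, so borderline cases can disagree. The helper name and the data are hypothetical.

```python
import statistics


def boxplot_outliers(values):
    """Return 1-based case numbers lying more than 1.5 IQR beyond the box.

    Blank cells (None) are skipped but keep their case number, so the
    result lines up with the row numbers in the data view.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # box edges
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i + 1 for i, v in enumerate(values)
            if v is not None and (v < low or v > high)]


# Hypothetical column with one very low and one very high entry
print(boxplot_outliers([1, 10, 11, 12, 12, 13, 14, 30]))  # [1, 8]
```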

The two outliers in physical have no measured value for appearance, which means they will have no effect on the correlation between physical and appearance. A correlation can only use cases that have values for both variables (a point needs both an X and a Y to exist).
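The "a point needs both an X and a Y" rule amounts to pairwise deletion. The sketch below computes Pearson's r over only the complete pairs and reports how many pairs it used. It assumes blanks are stored as `None`. The function name and values are hypothetical. In the example, case 4 has an extreme x value but no y value, so it cannot influence r.

```python
import math


def pearson_pairwise(x, y):
    """Pearson r over cases where both values are present (None = blank).

    Returns (r, n_used) so the effective sample size is visible.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Case 4 is an extreme x value with no y, so it is simply left out
print(pearson_pairwise([2, 4, 6, -40], [1, 2, 3, None]))  # (1.0, 3)
```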

Next, we look at the scatterplot (Graphs > Legacy Dialogs > Scatter/Dot). There are no obvious signs of a non-linear trend, but there doesn't seem to be a strong trend of any kind. Correlation is an appropriate measure, but it won't be strong.

We run the correlation to find r and see whether it's significant at alpha = 0.05 (Analyze > Correlate > Bivariate). Sig. (2-tailed) is .039, so the correlation is significant at alpha = .05. (Had we chosen the .01 level, this would not be the case.)

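We could also run a t-test by hand to verify the significance level we found. With r = .373 and n = 31, there are n - 2 = 29 degrees of freedom:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.373\sqrt{29}}{\sqrt{1-0.373^2}} \approx 2.16$$

The critical values with 29 df are t* = 2.045 at the 0.05 level and t* = 2.756 at the 0.01 level. Because 2.045 < 2.16 < 2.756, the correlation is significant at 0.05 but not at 0.01.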
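The same hand calculation is shown below as a small function. The critical values 2.045 and 2.756 are taken from the text above. A p-value would need a t-distribution routine, which this standard-library sketch deliberately avoids. The name `t_for_r` is hypothetical.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = t_for_r(0.373, 31)
print(round(t, 2))        # 2.16
print(2.045 < t < 2.756)  # True: significant at .05, not at .01
```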

Let's not sully this moment with a bad pun or something.

A correlation matrix is a table that shows the correlation between each pair of variables.

| | Physical | Appearance |
|---|---|---|
| Physical | 1.000 | .373 |
| Appearance | .373 | 1.000 |

In this case, Physical is correlated with Appearance with r = .373. Likewise, Appearance is correlated with Physical with r = .373. Also, every variable correlates with itself with r = 1.000.

SPSS goes a little further by reporting the correlation coefficient, significance, and sample size for each pair in its matrix. The two confidence facets are significantly correlated, and there are 31 cases for the pair (not 41, because real data has blanks).

If we go back to the Bivariate Correlations dialog and select more than two variables of interest:

We get a 4×4 correlation matrix instead! What's better than two variables? FOUR VARIABLES!

Cutting away all the sample size and significance information, I find:

| | Phys. | Appear. | Emot. | Pr.Solve. |
|---|---|---|---|---|
| Physical | 1 | .373* | .430** | .730** |
| Appearance | | 1 | .483** | .527** |
| Emotional | | | 1 | .540** |
| Problem Solving | | | | 1 |

Significance: * at the 0.05 level, ** at the 0.01 level.

There is a positive correlation between every pair of facets. That means that as any one facet of confidence increases, the others tend to increase as well.
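A correlation matrix is simply the pairwise correlation repeated for every pair of variables. The sketch below stores only the upper triangle, because the matrix is symmetric and its diagonal is always 1. Each entry keeps its own n, as SPSS does when cases are excluded pairwise. The helper names are hypothetical, and the toy columns are not the confidence data.

```python
import math
from itertools import combinations


def pearson_pairwise(x, y):
    """Pearson r over cases where both values are present (None = blank)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def correlation_matrix(columns):
    """Map each unordered pair of column names to (r, n) for that pair."""
    return {
        (a, b): pearson_pairwise(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical columns; None marks a blank cell, so pair sizes differ
data = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, None],
    "z": [4, 3, 2, 1],
}
for (a, b), (r, n) in correlation_matrix(data).items():
    print(a, b, round(r, 3), n)
```

With k variables there are k(k - 1)/2 entries, which gives 6 for the four confidence facets.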

Multiple correlation is useful as a first-look search for connections between variables and for seeing broad trends in the data. If only a few variables were connected to each other, it would help us identify which ones without having to examine all 6 pairs individually.

Pitfalls of multiple correlations:

1. Multiple testing. With 4 variables, 6 correlations are being tested for significance. At alpha = 0.05, there's a 26.5% chance that at least one correlation will show as significant even if there are no correlations at all. With 5 variables, there are 10 tests and a 40.1% chance of falsely rejecting at least one null. With 6 variables, there are 15 tests and a 53.7% chance. (These figures assume that there are no true correlations and that the tests are independent.)
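The percentages above come from 1 - (1 - alpha)^m, where m = k(k - 1)/2 is the number of pairwise tests among k variables. The code reproduces them. Like the figures in the text, it assumes that every null hypothesis is true and that the tests are independent. The function name is hypothetical.

```python
def family_wise_error(k_vars, alpha=0.05):
    """Number of pairwise tests and the chance of at least one false positive.

    Assumes all k(k-1)/2 tests are independent and every null is true.
    """
    m = k_vars * (k_vars - 1) // 2
    return m, 1 - (1 - alpha) ** m


for k in (4, 5, 6):
    m, p = family_wise_error(k)
    print(k, m, f"{p:.1%}")
# 4 6 26.5%
# 5 10 40.1%
# 6 15 53.7%
```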

You don't need to know how to handle multiple testing problems in this class. However, be cautious when dealing with many variables, and be suspicious of correlations that are significant, but only barely. Example: the weakest correlation here is physical with appearance, r = .373. That correlation being significant could be a fluke.

2. Diagnostics don't get easier. Computing correlations as a matrix lets you do the math much faster than checking pairs one at a time. However, diagnostic checks such as histograms, scatterplots, and residual plots don't get any faster. Any correlation we're interested in (even one that isn't showing as significant) still needs to be checked for normality and linearity before use in research.

One big advantage of correlating multiple variables is that we can isolate connections between specific variables that might not be obvious otherwise.

Example: Is there really a correlation between appearance confidence and problem solving confidence SPECIFICALLY, or are they both tied to the same general confidence?

Ponder that over a Mandarin Duck.

To isolate a correlation between two variables from a third variable, we want to look only at the part of that correlation that's really between those two and not the third. We want the partial correlation. Example: ice cream sales increase when murder rates increase. These two variables have nothing logical to do with each other. However, they both increase when it's hot out.

This is the simple correlation between these two variables. We want the relationship between murder and ice cream WITHOUT the confounding variable of heat.
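With one control variable z, the partial correlation can be computed from the three simple correlations alone: r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The numerator subtracts the part of the x-y association that runs through z, and the denominator rescales what remains. The sketch below is this standard first-order formula. It is not SPSS's internal implementation, and the inputs are hypothetical values rather than results from murderice.csv.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: x and y each correlate .5 with z and with each other
print(round(partial_r(0.5, 0.5, 0.5), 3))  # 0.333
```

If z is uncorrelated with both variables (r_xz = r_yz = 0), the formula returns r_xy unchanged, so controlling for an irrelevant variable has no effect.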

In the dataset murderice.csv, we can run a partial correlation to find out. First, a simple correlation reveals highly significant correlations between every pair of variables.

But how much of that connection is truly between murder and ice cream? Analyze > Correlate > Partial

From here, put the two variables of interest in the Variables box (you can put in more than two if you wish). Put the confounding variable in the Controlling for box.

The partial correlation between ice cream and murder is much lower than the simple correlation. It appears that heat (or something common to all three) was a major factor in both. In fact, the correlation is no longer significant (we fail to reject the null hypothesis that there is no correlation).

Also note: SPSS tells us in the output table that heat is a control variable, so we know from the output that this is a partial correlation (hint, hint). We're using three degrees of freedom, one for each variable involved, so df is 57 even though n is 60 (for interest).
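The degrees of freedom can be checked with the usual rule for testing a partial correlation: df = n - 2 - (number of control variables). With one control, this gives the same count as "one df per variable involved" (60 - 3 = 57). The t statistic then has the same form as for a simple correlation, using this df. The helper names are hypothetical.

```python
import math


def partial_df(n, n_controls):
    """Degrees of freedom for testing a partial correlation."""
    return n - 2 - n_controls


def t_for_partial(r_partial, n, n_controls):
    """t statistic for H0: partial rho = 0, returned with its df."""
    df = partial_df(n, n_controls)
    return r_partial * math.sqrt(df) / math.sqrt(1 - r_partial ** 2), df


print(partial_df(60, 1))  # 57, the df reported for murderice.csv
```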

Key observation: the partial correlation will typically be less than the simple correlation if both variables of interest are correlated with the confounding variable in the same direction. Here, both murder and ice cream are positively correlated with heat, so the partial correlation removes the positive relationship that murder and ice cream share through heat. Removing a positive relationship makes the correlation less positive.

Likewise, if the correlations with the confounding variable go in opposite directions, the partial correlation will typically be higher than the simple correlation. If we're only considering positive correlations, this means a confounding variable could be hiding (masking) a correlation between two variables rather than creating a false one.
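Plugging two hypothetical sets of correlations into the standard partial correlation formula shows both cases. When z relates to x and y in the same direction, the partial is smaller than r_xy. When it relates to them in opposite directions, the partial is larger, which is the masking effect. The numbers were chosen only for illustration.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations chosen only to illustrate the two cases
same_sign = partial_r(0.50, 0.60, 0.60)       # z pushes x and y the same way
opposite_sign = partial_r(0.10, 0.60, -0.60)  # z pushes them opposite ways
print(round(same_sign, 2), round(opposite_sign, 2))  # 0.22 0.72
```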

Example: Confidence. Consider the correlations between the types of confidence. Do the correlations among the other three facets still show after we control for problem solving confidence? (The simple correlations are in the 4×4 matrix above.)

The correlation between physical and each of the other facets is removed entirely. That means knowing problem solving confidence tells you as much about an employee's physical confidence as knowing all three other facets.
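The partial correlations can be roughly recomputed from the rounded simple correlations in the 4×4 matrix with the first-order formula. Treat this as an approximation. The inputs are rounded to three decimals and were computed pairwise with differing n, while SPSS's partial correlation procedure works from the raw data under its own missing-value setting, so its printed values may differ somewhat. The dictionary and helper names are hypothetical.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simple correlations copied from the 4x4 matrix (rounded to 3 decimals)
R = {
    frozenset({"Physical", "Appearance"}): 0.373,
    frozenset({"Physical", "Emotional"}): 0.430,
    frozenset({"Physical", "ProblemSolving"}): 0.730,
    frozenset({"Appearance", "Emotional"}): 0.483,
    frozenset({"Appearance", "ProblemSolving"}): 0.527,
    frozenset({"Emotional", "ProblemSolving"}): 0.540,
}


def partial_given(a, b, control):
    """Partial correlation of facets a and b controlling for one facet."""
    return partial_r(R[frozenset({a, b})],
                     R[frozenset({a, control})],
                     R[frozenset({b, control})])


for other in ("Appearance", "Emotional"):
    print(other, round(partial_given("Physical", other, "ProblemSolving"), 2))
# Appearance -0.02
# Emotional 0.06
```

Both values for physical are close to zero, which is consistent with the SPSS output described above.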

With the heat behind murder and ice cream, we had some other non-mathematical information to support the claim that heat was behind the other two variables. It could just as easily have been something we didn't measure, like the proportion of elderly people in an area (retirees often migrate south for winter). In the case of the facets of confidence, we don't have any reason why problem solving confidence would be the common thread. The partial correlations with physical shrink to nothing because, once problem solving confidence is known, the other variables weren't giving much additional information.

If we control for emotional confidence instead, we see there's a connection between problem solving and physical confidence when emotional is taken out of the picture.

Interestingly, controlling for appearance produces the same result. The facets all share a common thread and so increase together, but the real connection is between problem solving and physical confidence. Without partial correlation, we would never have caught this.
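The same approximate calculation can be run for the physical and problem solving pair, controlling first for emotional and then for appearance confidence. It uses the rounded simple correlations from the 4×4 matrix and the first-order formula, so SPSS's output may differ somewhat. The dictionary and helper names are hypothetical.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simple correlations copied from the 4x4 matrix (rounded to 3 decimals)
R = {
    frozenset({"Physical", "Appearance"}): 0.373,
    frozenset({"Physical", "Emotional"}): 0.430,
    frozenset({"Physical", "ProblemSolving"}): 0.730,
    frozenset({"Appearance", "Emotional"}): 0.483,
    frozenset({"Appearance", "ProblemSolving"}): 0.527,
    frozenset({"Emotional", "ProblemSolving"}): 0.540,
}


def partial_given(a, b, control):
    """Partial correlation of facets a and b controlling for one facet."""
    return partial_r(R[frozenset({a, b})],
                     R[frozenset({a, control})],
                     R[frozenset({b, control})])


for control in ("Emotional", "Appearance"):
    print(control, round(partial_given("Physical", "ProblemSolving", control), 2))
# Emotional 0.66
# Appearance 0.68
```

Unlike the near-zero values obtained when controlling for problem solving, the physical and problem solving pair stays strong under either control.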