Pearson s Correlation



Similar documents
SPSS Guide: Regression Analysis

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

Chapter 7: Simple linear regression Learning Objectives

Simple Linear Regression Inference

Hypothesis testing - Steps

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Correlational Research

Module 5: Multiple Regression Analysis

Module 3: Correlation and Covariance

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Association Between Variables

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Simple Regression Theory II 2010 Samuel L. Baker

Study Guide for the Final Exam

Elementary Statistics Sample Exam #3

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Two Related Samples t Test

Chapter 2 Probability Topics SPSS T tests

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

Session 7 Bivariate Data and Analysis

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Module 5: Statistical Analysis

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Introduction to Regression and Data Analysis

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Univariate Regression

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

CORRELATION ANALYSIS

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

Example: Boats and Manatees

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

Chapter 13 Introduction to Linear Regression and Correlation Analysis

CALCULATIONS & STATISTICS

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

Descriptive Statistics

Independent t- Test (Comparing Two Means)

Regression Analysis: A Complete Example

Simple linear regression

THE KRUSKAL WALLLIS TEST

The Dummy s Guide to Data Analysis Using SPSS

Chapter 5 Analysis of variance SPSS Analysis of variance

Section 13, Part 1 ANOVA. Analysis Of Variance

Scatter Plot, Correlation, and Regression on the TI-83/84

Non-Parametric Tests (I)

II. DISTRIBUTIONS distribution normal distribution. standard scores

Additional sources Compilation of sources:

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

UNDERSTANDING THE DEPENDENT-SAMPLES t TEST

Part 2: Analysis of Relationship Between Two Variables

Correlations. MSc Module 6: Introduction to Quantitative Research Methods Kenneth Benoit. March 18, 2010

LINEAR INEQUALITIES. less than, < 2x + 5 x 3 less than or equal to, greater than, > 3x 2 x 6 greater than or equal to,

Question 2: How do you solve a matrix equation using the matrix inverse?

2 Sample t-test (unequal sample sizes and unequal variances)

Roadmap to Data Analysis. Introduction to the Series, and I. Introduction to Statistical Thinking-A (Very) Short Introductory Course for Agencies

A full analysis example Multiple correlations Partial correlations

2013 MBA Jump Start Program. Statistics Module Part 3

Data Analysis for Marketing Research - Using SPSS

Statistical tests for SPSS

A Primer on Forecasting Business Performance

Odds ratio, Odds ratio test for independence, chi-squared statistic.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

DATA INTERPRETATION AND STATISTICS

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

MULTIPLE REGRESSION EXAMPLE

Interaction between quantitative predictors

This can dilute the significance of a departure from the null hypothesis. We can focus the test on departures of a particular form.

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Categorical Data Analysis

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Ordinal Regression. Chapter

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Linear Models in STATA and ANOVA

Correlation and Regression Analysis: SPSS

Overview of Factor Analysis

HYPOTHESIS TESTING WITH SPSS:

Premaster Statistics Tutorial 4 Full solutions

Introduction to Hypothesis Testing

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Chapter 23. Inferences for Regression

Covariance and Correlation

Independent samples t-test. Dr. Tom Pierce Radford University

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Factor Analysis. Principal components factor analysis. Use of extracted factors in multivariate dependency models

Multiple Linear Regression

Multicollinearity Richard Williams, University of Notre Dame, Last revised January 13, 2015

Multiple Regression. Page 24

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Mgmt 469. Regression Basics. You have all had some training in statistics and regression analysis. Still, it is useful to review

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

Regression and Correlation

Binary Logistic Regression

A STUDY ON ONBOARDING PROCESS IN SIFY TECHNOLOGIES, CHENNAI

Transcription:

Pearson s Correlation

Correlation the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the data are from a bivariate normal population. Does not necessarily imply causation.

The four y variables have the same mean (7.5), standard deviation (4.1), correlation (0.81) and regression line (y = 3 + 0.5x).

Pearson s correlation coefficient is a measure of the intensity of the linear association between variables. It is possible to have non-linear associations. Need to examine data closely to determine if any association exhibits linearity. Linear Non-linear

This is called Pearson s Correlation Coefficient. It ranges from -1 to +1. where y x xy r n Y X XY xy n Y Y y n X X x

The correlation coefficient is a measure of the intensity of the association between variables. r is a unit-less number. It can not be used to extrapolate a change in y based on a change in x. If variables are highly correlated, then we may want to investigate their association further to determine if there is a causal mechanism operating.

1 versus -tailed hypotheses -tailed hypotheses concerning r would state that there is a significant correlation between two variables. e.g. H o : r = 0, H a : r 0 1-tailed hypotheses concerning r would state that the association is either positive or negative. e.g. H o : r 0, H a : r > 0

Significance Testing for r If the data are normally distributed we can calculate a t-statistic for the correlation coefficient (r) using the equation: t r s r where s r 1 ( r) n df = n- since there is one df for each column. Here we are testing the null hypothesis that r = 0.

Dissolved Oxygen and Temperature 16 Dissolved Oxygen 15 14 13 5 6 7 8 9 Temperature (C)

Temp (X) DO (Y) 5.419 15.515 6.156 15.389 6.338 15.68 6.419 15.746 6.76 15.648 6.860 14.954 6.913 15.489 7.98 14.540 7.677 14.464 7.781 14.145 8.30 14.76 8.568 14.48 8.74 13.166

Temp (X) DO (Y) X Y 5.419 15.515 9.37 40.7 6.156 15.389 37.90 36.8 6.338 15.68 40.17 44.3 6.419 15.746 41.0 47.94 6.76 15.648 45.7 44.86 6.860 14.954 47.06 3.6 6.913 15.489 47.79 39.91 7.98 14.540 53.6 11.41 7.677 14.464 58.94 09.1 7.781 14.145 60.54 00.08 8.30 14.76 68.9 17.9 8.568 14.48 73.41 09.73 8.74 13.166 76.11 173.34 93. 193.93 680.39 899.79

Temp(X) DO(Y) XY 5.419 15.515 84.08 6.156 15.389 94.73 6.338 15.68 99.05 6.419 15.746 101.07 6.76 15.648 105.81 6.860 14.954 10.58 6.913 15.489 107.08 7.98 14.540 106.11 7.677 14.464 111.04 7.781 14.145 110.06 8.30 14.76 1.55 8.568 14.48 14.08 8.74 13.166 114.86 93. 193.93 1383.1

1.796 4.94 1.796 sign) the (ignore 4.94 0.168 0.83 0.168 11 0.311 13 0.83) ( 1 0.83 (11.93)(6.80) 7.51 7.51 13 (93.)(193.93) 1383.1 6.80 13 (193.93) 899.79 11.93 13 (93.) 680.39 sum) then Y, and X (multiply 1383.1 observations) squared the (sum of 899.79 680.39 colums) the (sum of 193.93 93. 11 13, 13 Critical r t t s r xy y x XY Y X Y X n df n therefore reject H 0

Since the value of r can be either positive or negative the critical value is a range in this case: -1.796 to +1.796 Our t-statistic (-4.94) falls beyond that range so we reject H 0. There is a significant inverse (negative) correlation between temperature and dissolved oxygen (t -4.94, p < 0.005, r=-0.83).

Since the range of the correlation coefficient is from -1 to +1, a value of -0.83 tells us what? 1. The association is inverse meaning as one variable increases the other decreases.. The intensity of this inverse association between temperature and dissolved oxygen is fairly high.

Be aware that r is influenced by samples size. Therefore if the sample size is large, even smaller r values may be important.

Do by hand on the board: State Black Lung Rate Underground Miners Kentucky 6.03 1947 West Virginia 17.37 1439 Ohio.07 1759 Utah 3.09 19 Oklahoma 0.41 36 Tennessee 1.38 390 Alabama 1.9 4014 Virginia 3.87 5101 Pennsylvania 16.41 60 Colorado 1.95 93 r t x y xy r s r xy x X Y XY where y n n Y where X X Y s r n 1 ( r) n SPSS data set s:\geo\pgmarr\quantitative Methods\SPSS Data\BlackLung.sav

Correlati ons BlackLung Underground BlackLung Pearson Correlation Sig. (-tailed) 1.79*.017 N 10 10 Underground Pearson Correlation.79* 1 Sig. (-tailed) N.017 10 10 *. Correlation is signif icant at the 0.05 lev el (-tailed).

Correlation vs Regression Correlation measures of association, but no causal relationship is implied. There are no dependent or independent variables. Regression measures association where a causal relationship is believed to exist. A dependent and 1+ independent variables are assumed. Both correlation and regression assume that the relationship under investigation is linear, but it may be either positive (direct) or negative (inverse).

Boas Native American Tribe Anthropometric Measurements Note that arm and leg lengths are significantly correlated. However, longer arms do not cause longer legs. The causal mechanism is body proportions, meaning that larger individuals tend to have both longer arms and legs.

Remember that causation will result in correlation, but that correlation does not necessarily result in causation. Therefore, correlation is a necessary but not sufficient condition to make causal inferences regarding our data. Causation can really only be determined through controlled data analysis and a firm understanding of the underlying mechanisms which may result in a causal relationship.