The Correlation Coefficient



Similar documents
Diagrams and Graphs of Statistical Data

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

You buy a TV for $1000 and pay it off with $100 every week. The table below shows the amount of money you sll owe every week. Week

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y X

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Father s height (inches)

CURVE FITTING LEAST SQUARES APPROXIMATION

Graphing Quadratic Equations

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Section 3 Part 1. Relationships between two numerical variables

Regression and Correlation

Unit 9 Describing Relationships in Scatter Plots and Line Graphs

LINEAR EQUATIONS IN TWO VARIABLES

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 7: Simple linear regression Learning Objectives

Scatter Plot, Correlation, and Regression on the TI-83/84

Temperature Scales. The metric system that we are now using includes a unit that is specific for the representation of measured temperatures.

Simple Regression Theory II 2010 Samuel L. Baker

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

Mario Guarracino. Regression

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Homework 11. Part 1. Name: Score: / null

Dimensionality Reduction: Principal Components Analysis

Vertical Alignment Colorado Academic Standards 6 th - 7 th - 8 th

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Simple linear regression

DERIVATIVES AS MATRICES; CHAIN RULE

Relationships Between Two Variables: Scatterplots and Correlation

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

ALGEBRA. sequence, term, nth term, consecutive, rule, relationship, generate, predict, continue increase, decrease finite, infinite

CALCULATIONS & STATISTICS

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Linear Approximations ACADEMIC RESOURCE CENTER

2. Simple Linear Regression

Correlation key concepts:

We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

Chapter 11: r.m.s. error for regression

Chapter 27: Taxation. 27.1: Introduction. 27.2: The Two Prices with a Tax. 27.2: The Pre-Tax Position

Chapter 6. Linear Programming: The Simplex Method. Introduction to the Big M Method. Section 4 Maximization and Minimization with Problem Constraints

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

LAYOUT OF THE KEYBOARD

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

Exercise 1.12 (Pg )

HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS

Charts, Tables, and Graphs

Number Who Chose This Maximum Amount

Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length

Common Tools for Displaying and Communicating Data for Process Improvement

Systems of Linear Equations

Module 3: Correlation and Covariance

Pennsylvania System of School Assessment

with functions, expressions and equations which follow in units 3 and 4.

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

1.6. Piecewise Functions. LEARN ABOUT the Math. Representing the problem using a graphical model

Mathematics Pre-Test Sample Questions A. { 11, 7} B. { 7,0,7} C. { 7, 7} D. { 11, 11}

Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving

Means, standard deviations and. and standard errors

GeoGebra Statistics and Probability

LAB 11: MATRICES, SYSTEMS OF EQUATIONS and POLYNOMIAL MODELING

Solving Linear Systems, Continued and The Inverse of a Matrix

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Pearson s Correlation Coefficient

Georgia Standards of Excellence Curriculum Map. Mathematics. GSE 8 th Grade

of surface, , , of triangle, 548 Associative Property of addition, 12, 331 of multiplication, 18, 433

Practice with Proofs

8.9 Intersection of Lines and Conics

9.4 THE SIMPLEX METHOD: MINIMIZATION

Univariate Regression

Session 7 Bivariate Data and Analysis

Chapter 4. Polynomial and Rational Functions. 4.1 Polynomial Functions and Their Graphs

Elements of a graph. Click on the links below to jump directly to the relevant section

The Graphical Method: An Example

COMP 250 Fall 2012 lecture 2 binary representations Sept. 11, 2012

10.1. Solving Quadratic Equations. Investigation: Rocket Science CONDENSED

The correlation coefficient

Review of Fundamental Mathematics

Data Mining Part 5. Prediction

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Part 2: Analysis of Relationship Between Two Variables

Chapter 9 Descriptive Statistics for Bivariate Data

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Designer: Nathan Kimball. Stage 1 Desired Results

1.2 Solving a System of Linear Equations

Decimals and Percentages

Statistics E100 Fall 2013 Practice Midterm I - A Solutions

SECOND DERIVATIVE TEST FOR CONSTRAINED EXTREMA

Solving Systems of Linear Equations

Common Core Unit Summary Grades 6 to 8

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Statistical Analysis Using SPSS for Windows Getting Started (Ver. 2014/11/6) The numbers of figures in the SPSS_screenshot.pptx are shown in red.

Using Microsoft Excel to Plot and Analyze Kinetic Data

3.3. Solving Polynomial Equations. Introduction. Prerequisites. Learning Outcomes

The KaleidaGraph Guide to Curve Fitting

GRADES 7, 8, AND 9 BIG IDEAS

South East of Process Main Building / 1F. North East of Process Main Building / 1F. At 14:05 April 16, Sample not collected

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

AP Physics 1 and 2 Lab Investigations

Transcription:

The Correlation Coefficient Lelys Bravo de Guenni April 22nd, 2015

Outline The Correlation coefficient Positive Correlation Negative Correlation Properties of the Correlation Coefficient Non-linear association Outliers effect on the correlation

The Correlation coefficient Positive Correlation Suppose that the following scatter diagram corresponds to the standard units of the data. The signs of the quadrants are determined by the sign of the products. There are more points in the positive quadrants than in the negative quadrants. So the average is positive yielding a positive correlation. April 22nd, 2015 UCSC AMS5 3/19

April 22nd, 2015 UCSC AMS5 4/19

Negative Correlation In this case there are more points in the negative quadrants than in the positive quadrants. So the average is negative, yielding a negative correlation. April 22nd, 2015 UCSC AMS5 5/19

Properties of the Correlation Coefficient Suppose you measure the correlation between the temperature in New York and that in Boston during a month. Do you expect to get the same correlation if you measure it in Celsius than if you measure it in Fahrenheit? Recall the process of obtaining the correlation. The first step is to convert the samples of both variables to standard units. This implies that the correlation does not depend on units. Thus, no matter if you use Celsius of Fahrenheit, you will get the same correlation. On the other hand, since the correlation depends on the product of the two variables, it does not matter whether you consider the correlation the temperature in NY and that in Boston or the correlation between the temperature in Boston and that in NY. April 22nd, 2015 UCSC AMS5 6/19

The correlation is not affected when the two variables are interchanged. The correlation is not changed if the same number is added to all the values of one of the variables. The correlation is not changed if all the values of one of the variables is multiplied by the same positive number. It will change sign if the number is negative. April 22nd, 2015 UCSC AMS5 7/19

Non-linear association Correlation is useful only when measuring the degree of linear association between two variables. That is, how much the values from two variables cluster around a straight line. April 22nd, 2015 UCSC AMS5 8/19

The variables in this plot have an obvious non-linear association. Nevertheless the correlation between them is 0.3. This is because the points are clustered around a curve that has the shape of a parabola and not a straight line. April 22nd, 2015 UCSC AMS5 9/19

Outliers effect on the correlation Similarly, the presence of outliers produces an artificially low correlation between variables that have a high degree of linear association. April 22nd, 2015 UCSC AMS5 10/19

The correlation of the data in the plot is 0.944. When the points marked with crosses in red are considered, the correlation drops to 0.75. April 22nd, 2015 UCSC AMS5 11/19

Find the correlation coefficient for each of the following data sets: Example 1: x 1 1 1 1 2 2 2 3 3 4 y 5 3 5 7 3 3 1 1 1 1 The average of x is 2, the SD is 1. The average of y is 3 and the SD is 2. Then, the standardized values are April 22nd, 2015 UCSC AMS5 12/19

x -1-1 -1-1 0 0 0 1 1 2 y 1 0 1 2 0 0-1 -1-1 -1 x y -1 0-1 -2 0 0 0-1 -1-2 and the average of the last column is -0.80. April 22nd, 2015 UCSC AMS5 13/19

Example 2: x 1 1 1 1 2 2 2 3 3 4 y 1 2 1 3 1 4 1 2 2 3 x has not changes, y has average equal to 2 and a SD equal to 1. Then, the standardized values are April 22nd, 2015 UCSC AMS5 14/19

x -1-1 -1-1 0 0 0 1 1 2 y -1 0-1 1-1 2-1 0 0 1 x y 1 0 1-1 0 0 0 0 0 2 and the average of the last row is 0.3. April 22nd, 2015 UCSC AMS5 15/19

Example 3 x 1 1 1 1 2 2 2 3 3 4 y 2 2 2 2 4 4 4 6 6 8 We observe that y = 2 x and so there is total linear association between x and y, implying that the correlation is 1. April 22nd, 2015 UCSC AMS5 16/19

Problem: Guess the correlation coefficients in these scatter diagrams April 22nd, 2015 UCSC AMS5 17/19

Answers: a) +1.00; b) -0.50; c) +0.85; d) +0.15 April 22nd, 2015 UCSC AMS5 18/19