Relationships Between Two Variables: Scatterplots and Correlation




Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower, and (2) between engine size and gas mileage?
Do cars with large engines tend to have high horsepower? Do cars with large engines tend to have high gas mileage?
We will see both graphical and numerical ways to summarize the relationship between two variables.
University of South Carolina

Scatterplots
A scatterplot is a graph that shows the relationship between two quantitative variables. Each individual in the data set has two variables measured on it.
For each individual, the value of one variable is plotted on the horizontal axis and the value of the other variable on the vertical axis. The plot has one dot for each observation in the data set. See Figure 1.

Figure 1: Scatterplot of horsepower and vehicle weight for 32 cars. Vehicle weight (in 1000s of pounds) on horizontal axis, horsepower on vertical axis.

Explanatory and Response Variables
Sometimes one variable (called the explanatory variable) may naturally explain or predict the value of the other variable (called the response variable). If so, the explanatory variable is denoted X and plotted on the X-axis, and the response variable is denoted Y and plotted on the Y-axis.
In the engine size / gas mileage example, which is naturally the response variable?
Other times, there is no natural explanatory-response relationship between the two variables, and either one can go on the horizontal axis. (Example: Height/Weight)

Interpreting Scatterplots
The process is the same as for interpreting other graphs: look for the overall pattern in the plot, then look for deviations from that pattern.
Describe the general pattern (not counting outliers).
Then look closely at individual outlying values to determine their cause.

Positive and Negative Associations
A scatterplot can show us the direction of the relationship between two variables.
Two variables have a positive association if observations having large values for one variable also tend to have large values for the other variable. Likewise, when variables are positively associated, observations having small values for one variable tend to have small values for the other variable.
The scatterplot for positively associated variables has a pattern that slopes upward from left to right. (Example: Horsepower and Vehicle Weight)

Positive and Negative Associations (Continued)
Two variables have a negative association if observations having large values for one variable tend to have small values for the other variable.
The scatterplot for negatively associated variables has a pattern that slopes downward from left to right.

Figure 2: Scatterplot of gas mileage and vehicle weight for 32 cars. Vehicle weight (in 1000s of pounds) on horizontal axis, mileage (in miles per gallon) on vertical axis.

Form and Strength of Association
The form of the relationship between two variables may be approximately linear or curved.
Once we identify the form of the relationship, we can characterize its strength.
The relationship is strong if most of the data values closely follow the major trend in the plot.
The relationship is weak if the data values show a great deal of random scatter around the major trend in the plot.

Form and Strength of Association (Continued)
In the two scatterplots shown previously, what are the forms of the relationships?
Which of the two scatterplots shows a stronger relationship between the two variables plotted?
Are there any notable outliers in the scatterplots?

Clicker Quiz 1
What is the best description of the relationship between vehicle weight and gas mileage, based on the scatterplot?
A. There is a fairly strong, roughly linear, positive association between vehicle weight and gas mileage.
B. There is a very weak, roughly linear, negative association between vehicle weight and gas mileage.
C. There is a very weak, curved, positive association between vehicle weight and gas mileage.
D. There is a fairly strong, roughly linear, negative association between vehicle weight and gas mileage.

Correlation
Straight-line relationships between two variables are often of interest in data analyses.
Correlation is a numerical measure of the strength and direction of the linear relationship between two quantitative variables. It can give us a more precise measure of the association than a scatterplot.
The correlation coefficient (denoted r) is a number between -1 and 1, inclusive.
A positive value of r indicates the two variables are positively linearly associated.
A negative value of r indicates the two variables are negatively linearly associated.
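The slides do not give a formula for r, so the following is only a minimal sketch, assuming r means the usual Pearson sample correlation. The function name `pearson_r` and the six data points are hypothetical, invented for illustration; they are not the 32-car data set behind the figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values (NOT the data behind Figures 1 and 2)
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]      # 1000s of pounds
horsepower = [70, 95, 110, 150, 160, 210]
mileage = [33, 29, 25, 22, 18, 15]           # miles per gallon

r_hp = pearson_r(weight, horsepower)   # upward-sloping pattern, so r > 0
r_mpg = pearson_r(weight, mileage)     # downward-sloping pattern, so r < 0
print(f"weight vs horsepower: r = {r_hp:.3f}")
print(f"weight vs mileage:    r = {r_mpg:.3f}")
```

The sign of r matches the direction of the association: positive for the upward-sloping pair and negative for the downward-sloping pair.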

More on Correlation
The correlation also tells us how strong the linear relationship is.
A value of r near -1 or near 1 indicates the two variables are strongly linearly associated.
A value of r near 0 indicates the two variables are weakly linearly associated.

Clicker Quiz 2
If two variables are strongly negatively linearly associated, what might be a value of r for a sample of data on these two variables?
A. r = 0.78
B. r = 0.13
C. r = -0.95
D. r = -0.04

Cautions about Correlation
The correlation does not change if we change the units of measurement for the variables (example: measuring vehicle weight in tons instead of 1000s of pounds).
Correlation ignores any distinction between explanatory and response variables.
Correlation describes only the linear association between two variables, not any curved relationship!
Correlation may be strongly affected (either increased or decreased) by a few outlying observations.
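The first three cautions can be checked numerically. This is a rough sketch, assuming r is the Pearson sample correlation; the helper `pearson_r` and all data values are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not the 32-car data set
weight_klb = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # 1000s of pounds
mileage = [33, 29, 25, 22, 18, 15]            # miles per gallon

# Caution 1: changing units leaves r unchanged.
# 1000 pounds = 0.5 short tons, so the conversion is a rescaling by 0.5.
weight_tons = [w * 0.5 for w in weight_klb]
r_klb = pearson_r(weight_klb, mileage)
r_tons = pearson_r(weight_tons, mileage)

# Caution 2: swapping the roles of the two variables leaves r unchanged.
r_swapped = pearson_r(mileage, weight_klb)

# Caution 3: a perfect but curved relationship can have r = 0.
# Here y is exactly x squared, yet there is no linear trend.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)

print(f"r in 1000s of pounds: {r_klb:.4f}, r in tons: {r_tons:.4f}")
print(f"r with variables swapped: {r_swapped:.4f}")
print(f"r for y = x^2 on symmetric x: {r_curved:.4f}")
```

In the third case y is completely determined by x, which is why a value of r near 0 should be read as "no linear association" and not as "no relationship".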

Clicker Quiz 3
If two variables are very strongly associated, what might be a value of r for a sample of data on these two variables? Choose the best answer:
A. r = 0.98
B. r = 0.03
C. Impossible to say for sure, but definitely near 1 or near -1.
D. Impossible to say for sure.

The Effect of Outliers on Correlation
Recall the scatterplot of Horsepower and Vehicle Weight. The Maserati Bora is somewhat outlying with respect to the overall pattern.
The correlation for the entire data set is 0.659.
If we delete the Maserati Bora observation, will the correlation be larger or smaller?

Figure 3: Scatterplot of horsepower and weight for 32 cars, with the outlier (Maserati Bora) marked. Vehicle weight (in 1000s of pounds) on horizontal axis, horsepower on vertical axis.

The Effect of Outliers on Correlation (Continued)
The correlation for the entire data set is 0.659. If we delete the Maserati Bora observation, the correlation becomes 0.725.
The Maserati Bora observation had dampened the strength of the linear relationship.
In other cases, an outlier could increase the correlation between the variables (see Thought Question 3 picture).
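Both effects, an outlier that dampens r and an outlier that inflates it, can be reproduced on small made-up data. This is an illustrative sketch only: the helper `pearson_r` and every data point below are hypothetical, and the numbers are unrelated to the 0.659 and 0.725 reported for the car data.

```python
import math

def pearson_r(xs, ys):
    """Pearson sample correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case 1: an outlier that dampens the correlation.
# Six points follow a tight upward trend; a seventh sits far above it.
trend_x = [1, 2, 3, 4, 5, 6]
trend_y = [2, 4, 5, 8, 10, 12]
r_trend = pearson_r(trend_x, trend_y)
r_trend_outlier = pearson_r(trend_x + [3.5], trend_y + [20])

# Case 2: an outlier that inflates the correlation.
# Six points form a patternless cloud; a seventh lies far to the upper right.
cloud_x = [1, 2, 3, 1, 2, 3]
cloud_y = [1, 2, 1, 3, 2, 3]
r_cloud = pearson_r(cloud_x, cloud_y)
r_cloud_outlier = pearson_r(cloud_x + [10], cloud_y + [10])

print(f"tight trend:         r = {r_trend:.3f}")
print(f"trend plus outlier:  r = {r_trend_outlier:.3f}")
print(f"cloud:               r = {r_cloud:.3f}")
print(f"cloud plus outlier:  r = {r_cloud_outlier:.3f}")
```

An outlier that falls off the main trend adds scatter and pulls r toward 0, while an outlier that lies far away along a line through the rest of the data can create a linear pattern that the other points do not show.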

Clicker Quiz 4
Consider the left graph in the Thought Question 3 slide. Suppose the correlation between the two variables is 0.57. What might be the correlation if we deleted the solitary outlier?
A. 0.47
B. 0.02
C. -0.83
D. 0.74

Clicker Quiz 5
Consider the right graph in the Thought Question 3 slide. Suppose the correlation between the two variables is 0.83. What might be the correlation if we deleted the outlying value?
A. 0.72
B. -0.12
C. -0.64
D. 0.95

More about Correlation
The ordinary correlation coefficient can only be calculated when both variables are quantitative.
It doesn't make sense to talk about a high correlation between gender and preferred TV network. There may be a relationship between two categorical variables, but it's not measured by correlation.
There are other statistics that measure the association between categorical variables, or between a categorical variable and a quantitative variable (we won't cover these).
Also note: To be complete, we should report means and standard deviations for each variable, along with the correlation.
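One way to report the full set of summaries named in the last note is sketched below. It assumes sample standard deviations (divisor n - 1) and computes r as the average product of standardized values; the function name `summarize_pair` and the data are hypothetical.

```python
import statistics

def summarize_pair(xs, ys):
    """Return means, sample standard deviations, and correlation for paired data."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)   # divisor n - 1
    # r as the sum of products of standardized values, divided by n - 1
    r = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
            for x, y in zip(xs, ys)) / (n - 1)
    return {"mean_x": mean_x, "sd_x": sd_x,
            "mean_y": mean_y, "sd_y": sd_y, "r": r}

# Hypothetical values, not the 32-car data set
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]   # 1000s of pounds
mileage = [33, 29, 25, 22, 18, 15]        # miles per gallon

summary = summarize_pair(weight, mileage)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Reporting the means and standard deviations alongside r matters because r carries no information about the units or typical sizes of the two variables.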