Correlation Coefficient

A scatter plot displays the direction, form, and strength of the relationship between two variables. However, we cannot tell from the plot alone exactly how strong or weak the relationship is; it's difficult to judge without a scale for what counts as strong or weak. The two images below show the same scatterplot, just drawn at different scales.
How do we compare two scatter plots?

Correlation Coefficient: A numerical value (between -1 and +1) that identifies the strength of the linear relationship between two variables.

Some important facts about the correlation coefficient:

1) We use the variable r to symbolize the correlation coefficient. To find r on the calculator: enter the data in L1 and L2 (Stat, Edit), then choose Stat, Calc, LinReg (Choice #4).
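As a rough sketch of the calculation behind r, assuming the usual Pearson formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations), the steps might look like this in Python. The function name pearson_r and the hours/score data are made up for illustration; they are not from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (plays the role of lists L1 and L2)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(round(r, 3))  # prints 0.775
```

The two lists play the same role as L1 and L2 on the calculator: each position holds one (x, y) pair.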
2) The correlation always falls between -1 and 1.

0 ≤ r < .5     weak
.5 ≤ r < .8    moderate
.8 ≤ r < 1     strong
r = 1          perfect positive
r = -1         perfect negative

The same scale applies to negative values.
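The strength scale can be written as a small lookup. This is only a sketch: the function name describe_r is hypothetical, the cutoffs are the ones given on this slide, and the "same scale for negatives" rule is handled by taking the absolute value.

```python
import math

def describe_r(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must fall between -1 and 1")
    size = abs(r)  # the same scale applies to negative values
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"
    return strength, direction

print(describe_r(0.85))   # ('strong', 'positive')
print(describe_r(-0.40))  # ('weak', 'negative')
```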
http://www.rossmanchance.com/applets/guesscorrelation.html
Some more things to consider:

1) Changing units of measurement does not change the correlation coefficient. Example: changing kilograms to lbs.

2) The correlation coefficient has NO UNIT OF MEASUREMENT. It's just r =

3) Correlation ignores the distinction between explanatory and response variables. Example: if we reversed the axes, the r value would stay the same.

4) Correlation measures the strength of LINEAR relationships only. Example: for a curved pattern, r would not describe how strong the relationship is.
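Facts 1, 3, and 4 can be checked numerically. The sketch below uses made-up weight and height values and a hypothetical helper named pearson_r that applies the usual Pearson formula; none of these names or numbers come from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists (usual Pearson formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
weights_kg = [50, 60, 70, 80, 90]
heights_cm = [152, 160, 171, 175, 183]

# Fact 1: converting kilograms to pounds leaves r unchanged
weights_lb = [kg * 2.20462 for kg in weights_kg]
r_kg = pearson_r(weights_kg, heights_cm)
r_lb = pearson_r(weights_lb, heights_cm)
print(math.isclose(r_kg, r_lb))       # True

# Fact 3: swapping explanatory and response variables leaves r unchanged
r_swapped = pearson_r(heights_cm, weights_kg)
print(math.isclose(r_kg, r_swapped))  # True

# Fact 4: a perfectly curved (parabolic) pattern can still give r = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)
print(r_curve)                        # 0.0
```

In the last case y is completely determined by x, yet r is 0, which is why r should only be used to describe linear patterns.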
5) The correlation coefficient IS AFFECTED by outliers. The farther a point is from the rest, the more it affects the correlation coefficient. When you see an outlier, it's important to calculate the r value both with and WITHOUT the outlier to see how it changes.
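A quick way to see the effect of an outlier is to compute r twice, once with the suspect point and once without it. The data below are invented for illustration, and pearson_r is a hypothetical helper using the usual Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists (usual Pearson formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up points that follow a nearly straight line
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# One made-up outlier far below the pattern of the other points
outlier_x, outlier_y = 9, 2.0

r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [outlier_x], ys + [outlier_y])

print("r without outlier:", round(r_without, 3))
print("r with outlier:   ", round(r_with, 3))
```

Moving the outlier farther from the line (for example, lowering outlier_y) should pull r down even more.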
Match each graph with its corresponding correlation coefficient below:

.85    .40    0    .50    .90    .99
4.1A    Name:

The 2008 EPA fuel economy ratings for both highway and city driving are given for 25 randomly selected standard pickup trucks with 4-wheel drive. A scatterplot is shown below.

1. Describe what you see in the scatterplot.

2. Is one of the points on the scatterplot unusual to you in comparison to the other points? Explain why you think this point is unusual.

3. In this situation, is it more reasonable to simply explore the relationship between the two variables or to view one of the variables as an explanatory variable and the other as a response variable? In the latter case, identify which is the explanatory and which is the response variable.

4. Based on this scatterplot, what does this tell you about the relationship between highway mpg and city mpg for trucks in this category?
Graded Assignment 4.1D    Name:

1. State Park rangers are interested in estimating the weight of the bears that inhabit their park. One way to estimate the weight of a bear is by measuring its neck size (distance around the neck). One method used to accomplish this is to measure the bear's neck while it is hibernating, which is how some college students in Maine who were studying to be rangers and conservation officers got their data.

Neck Size (inches)    Weight (pounds)
15                    65
20.5                  142
16                    80
28                    344
32                    432
31                    416
26.5                  262
20                    204
18                    144
24                    207

2. Describe what the scatterplot tells you about the direction and form of the relationship.

3. Using your calculator, find the value of the correlation.

4. Explain in words what this r says.

5. Use the scatterplot to predict the approximate weight of a bear with a neck size of 22 inches.

6. Add a point to the scatterplot that would reduce the correlation between these variables. Explain why the point you added would have this effect.