-Steinberg-5.qxd 11//7 9:9 PM Page 89

Relationship Strength and Direction

Terms: reliability, prediction, scatterplot, bivariate, strength, positive direction, negative direction, linear, curvilinear, outliers

Learning Objectives:
Distinguish between correlational and experimental studies
Create a scatterplot
Estimate the strength and direction of a set of data based on its scatterplot
Understand the effect of outliers on the strength and direction of a correlation

Experimental Versus Correlational Studies

In the preceding modules, we measured the effect of an independent variable on a dependent variable. Then, we tested the result for statistical significance, effect size, and power. In each of those studies, there were at least two groups that differed on the independent variable. We then examined whether the independent variable (the defining difference between the groups) caused the effect in the dependent variable.

In this module, we begin to look at data from a new perspective: correlational studies. In a correlational study, we have only a single group of subjects rather than two or more groups, and each subject has a score on two different variables. Also, in a correlational study, we do not seek cause-and-effect relationships between independent and dependent variables. Rather, we simply want to know whether or not the scores on the two variables are related.

Sometimes correlational studies are used to establish the properties of the tests themselves. The SAT, for example, is given on multiple test dates throughout the year. Students taking the test on one date do not answer the same questions as students taking the test on another date. Rather, there are parallel forms of the test: a different form for each date. Scores have the same meaning regardless of which form students take because the test forms are comparable. But how do the test developers know that the scores are comparable?
During the test development process, they gave the same students (note the single group of subjects) two different forms of the test (note the two variables). Then, they compared the students' scores on both forms (note the correlation). They found that the scores were similar for the same students on both forms of the test. This type of correlation is called test reliability.
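To make the parallel-forms idea concrete, here is a minimal Python sketch that previews the correlation coefficient introduced in the next module. The Form A and Form B scores are hypothetical, invented for illustration (they are not SAT data), and the helper name pearson_r is our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: the same six students (one group) took two forms (two variables).
form_a = [500, 550, 600, 650, 700, 750]
form_b = [510, 540, 610, 640, 710, 740]

reliability = pearson_r(form_a, form_b)
print(f"Parallel-forms reliability: {reliability:.2f}")  # prints 0.99
```

A value near 1.00 is what test developers hope to see: students who score high on one form also score high on the other, so the forms can be treated as interchangeable.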
MODULE : RELATIONSHIP STRENGTH AND DIRECTION

Most of the time, correlational studies are used for prediction rather than for establishing the reliability of the tests themselves. That is, we seek to establish relationships so that a person's score on one variable can be used to predict that person's probable score on a second variable. For example, once a relationship is established between the number of hours children watch television and their academic performance in school, we can predict any given child's probable academic performance in school just by knowing the number of hours of television he or she watches. Similarly, a researcher interested in prediction may want to know the relationship between the amount of time students study and their grade on a test, the amount of antidepressant medication clients take and their reported mood level, air temperature and crime rate, income and years of education, height and weight, and IQ and shoe size (do you think there is any relationship?).

CHECK YOURSELF!
What are two common uses of correlations?

CHECK YOURSELF!
Complete the following table to compare and contrast studies analyzed by three different statistics.

                                                      t test    F test    Correlation
No. of Groups
No. of Variables
Purpose of Study or Type of Conclusion to Be Drawn

PRACTICE
1. Which of these studies might be analyzed with a correlation?
a. The relationship between birth order and academic achievement
b. The number of inches babies grow when breast-fed versus bottle-fed
c. Whether the amount of time runners practice is related to their running times during a tournament
d. Which of three treatments best helps hyperactive children stay on task
2. Which of these studies might be analyzed with a correlation?
a. The cost of various cars when new and the amount of time before those cars need repair
b. The number of illnesses reported per year by vegetarians versus nonvegetarians
c. The duration of fever for viral versus bacterial infections
d. The number of miles dieters walked and the amount of weight they lost

Plotting Correlation Data

A scatterplot displays the scores of individual cases on two variables. It visually displays the degree to which scores on one variable are related to scores on another variable. Because it depicts two variables, it is said to be bivariate (bi = two, variate = variable).

Unlike the graphs you created in an earlier module, in a scatterplot the Y-axis does not indicate frequency. Rather, both axes indicate scores: one axis indicates the score on one variable, and the other axis indicates the score on the other variable. Individual cases are represented by a dot or some other symbol.

To create a scatterplot, locate a subject's score for Variable 1 on the X-axis and the same subject's score for Variable 2 on the Y-axis. Mentally draw perpendicular lines from the axes into the interior of the graph. The point at which the two lines intersect is the location of that subject's scores on both variables, as shown in the figure below:

[Figure: Drawing a Scatterplot (X-axis and Y-axis)]

Assume that 10 students receive the following scores on Quiz 1 and Quiz 2. The scatterplot for this set of data is displayed in the figure below, with the location of the first student, Abby, indicated.

Student Quiz 1 Quiz 2 Abby 9 1 Babs 7 8 Clyde 1 7 DeShawn 9 8 Emily 5 5 Fred Gino 1 8 Hortense 9 Ingrid Jorge 1 9
[Figure: Scores on Quiz 1 and Quiz 2 (X-axis: Score on Quiz 1; Y-axis: Score on Quiz 2; Abby's point labeled)]

PRACTICE
3. Create a scatterplot of the following heights and weights of ninth-grade girls:
Subject Height Weight Quashonna 1 Ronette 8 11 Sherelle 15 Telise 17 Ulinda 7 1 Vonnie 11 Winona 8 1 Xelei 7 18 Yvette 9 Zoe 9 15
4. Create a scatterplot of the following data on the average annual household income and the average cost of a three-bedroom, 1½-bath home (both to the nearest thousand dollars) in the following geographic areas:
Area Income House 1 1 9 95 7 91 5 15 78 8 7 5 18 8 5 7 9 9 119 1 1
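The plotting procedure described above (find the X score, find the Y score, mark where the perpendiculars meet) can be sketched in a few lines of Python without any plotting library. This is a minimal text-only sketch under stated assumptions: scores are small non-negative integers, the function name text_scatterplot is hypothetical, and the quiz pairs are invented rather than taken from the table above.

```python
def text_scatterplot(pairs, max_x, max_y):
    """Return the rows of a character-grid scatterplot; '*' marks each (x, y) pair."""
    # One grid row per possible Y score, with the highest score at the top.
    grid = [[" "] * (max_x + 1) for _ in range(max_y + 1)]
    for x, y in pairs:
        # Column = score on Variable 1 (X); row = score on Variable 2 (Y).
        # The mark goes where the two perpendicular lines would intersect.
        grid[max_y - y][x] = "*"
    lines = []
    for row_index, row in enumerate(grid):
        y_value = max_y - row_index
        lines.append(f"{y_value:>2} |" + "".join(row).rstrip())
    lines.append("   +" + "-" * (max_x + 1))  # the X-axis
    return lines


# Hypothetical (Quiz 1, Quiz 2) scores; not the textbook's data.
scores = [(2, 3), (4, 4), (5, 7), (8, 6), (9, 9)]
for line in text_scatterplot(scores, max_x=10, max_y=10):
    print(line)
```

If two subjects share the same pair of scores, this sketch draws a single mark; a fuller version might print a count at that position instead.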
The relationship between two sets of scores has two characteristics: strength and direction. Let's look at both strength and direction in more detail.

Relationship Strength

The strength of a relationship tells the degree to which scores on one variable are related to scores on the other variable. Strength is expressed on a scale from .00 to 1.00. The higher the numerical value (regardless of sign), the stronger the relationship. A correlation of .8 is strong, while a correlation of .1 is weak.

In a perfect relationship, all data points fall along a straight line. For example, by knowing the temperature on the Celsius scale, we can exactly predict the temperature on the Fahrenheit scale. Thus, the correlation between Celsius and Fahrenheit is 1.00.

"Mathematics is the science of patterns." (Lynn Arthur Steen)

A correlation of .00, at the other extreme, indicates no relationship. For example, there is no relationship between adult IQ and shoe size. Adults with high, medium, or low IQs are equally likely to have small, medium, or large shoe sizes. Thus, the data points fall in a circular blob. Here is how these two scatterplots would look:

[Figure: Scatterplots of Perfect and Zero Relationships (left panel, Perfect relationship: Degrees Celsius and Degrees Fahrenheit; right panel, Zero relationship: IQ and Shoe size)]

Although we sometimes find perfect relationships in the physical sciences, relationships are rarely perfect in the social sciences. With human beings, most variables are only moderately related. SAT score predicts freshman GPA, for example, but only moderately. Other variables such as motivation or test anxiety come into play. Thus, some people who score very high on the SAT do poorly in college, and some people who score low on the SAT do very well in college. Similarly, the number of absences during a semester predicts grade on a final exam, but only moderately.
Some students who never miss a class do poorly on the final exam, and some students who are frequently absent do well on it. Other variables such as prior knowledge, diligence in reading the textbook, and obtaining missed notes come into play.

The data for most social science relationships fall neither on a straight line nor in a circular blob. Rather, they fall in an ellipse. The thinner the ellipse, the closer the points fall to a straight line and, hence, the stronger (closer to 1.00) the relationship. The wider the ellipse, the farther the points fall from a straight line and, hence, the weaker (closer to .00) the relationship. The figure below shows two typical moderate relationships in the social sciences.
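The strength scale can be checked numerically with a minimal sketch. It uses a hand-written pearson_r helper (a preview of the coefficient calculated in the next module) on small made-up data sets: an exact Celsius-to-Fahrenheit conversion, a symmetric "blob" of points, and two ellipses built by adding alternating offsets of 1 and 4 points to the same straight line. All data are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Perfect relationship: Fahrenheit is an exact linear function of Celsius.
celsius = [-10, 0, 15, 30, 100]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
r_perfect = pearson_r(celsius, fahrenheit)

# Zero relationship: a symmetric cloud of points with no upward or downward trend.
blob_x = [1, 1, 3, 3, 2]
blob_y = [1, 3, 1, 3, 2]
r_zero = pearson_r(blob_x, blob_y)

# Ellipses: the same straight-line trend with small versus large scatter around it.
xs = list(range(1, 11))
thin = [x + (1 if i % 2 == 0 else -1) for i, x in enumerate(xs)]
wide = [x + (4 if i % 2 == 0 else -4) for i, x in enumerate(xs)]
r_thin = pearson_r(xs, thin)  # about 0.94
r_wide = pearson_r(xs, wide)  # about 0.48

for label, r in [("perfect", r_perfect), ("zero", r_zero),
                 ("thin ellipse", r_thin), ("wide ellipse", r_wide)]:
    print(f"{label:>12}: {r:.2f}")
```

Both ellipses follow the same straight-line trend; only the spread around it differs, and that spread alone pulls the value from about .94 down to about .48.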
[Figure: Two Scatterplots of Moderate Relationships (upper panel: SAT score and Freshman GPA; lower panel: Absences and Final exam grade)]

Relationship Direction

The direction of a relationship tells whether or not the values on two variables go up and down together. Direction is indicated by a positive or a negative sign.

If two variables are positively correlated, then as the values on one variable go up, so do the values on the other variable. For example, the relationship between SAT score and freshman college GPA is positive. Thus, students who receive higher scores on the SAT are more likely to receive higher grades during their freshman year of college. You can see this pattern in the upper scatterplot of the moderate-relationships figure. Note the direction of the data points: with a positive relationship, the data points go from the lower left to the upper right.
If two variables are negatively correlated, then as the values of one variable go up, the values of the other variable go down. For example, the relationship between the number of absences in a course and score on the final exam in that course is negative. In other words, the more often students are absent from class, the lower their grades tend to be on the final exam. You can see this pattern in the lower scatterplot of the moderate-relationships figure. Note the direction of the data points: with a negative relationship, the data points go from the upper left to the lower right.

Let's return to the set of quiz scores from the beginning of this module. Judging from the scatterplot of those scores, repeated below, what is the approximate strength of the relationship? What is the direction of the relationship?

"He seemed to have very strong intuitions but unfortunately of negative sign." (the biologist Francis Crick, referring to René Thom, in the book What Mad Pursuit)

[Figure: Scores on Quiz 1 and Quiz 2 (X-axis: Score on Quiz 1; Y-axis: Score on Quiz 2)]

The relationship is moderate because the points form an ellipse about halfway between a straight line and a circle. It is also positive because the general trend is from lower left to upper right. That is, as the scores on Quiz 1 go up, the scores on Quiz 2 also go up.

CHECK YOURSELF!
What two terms are used to describe a correlational relationship? What does each term indicate?

PRACTICE
5. For each of the following, indicate whether the expected relationship between the two variables will be positive (+), negative (−), or zero (0):
a. The average number of calories eaten per day and body weight
b. Running speed and general physical condition
c. Length of hair and introversion
d. The amount of education and the time spent on welfare
e. Per capita consumption of alcohol and suicide rate
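Direction can be read from the sign of the sum of cross-products of deviations: positive products arise when both scores fall on the same side of their means, negative products when one is above its mean and the other below. The sketch below illustrates this with a hypothetical helper, relationship_direction, and made-up data; the tol parameter is an assumption added to absorb floating-point noise.

```python
def relationship_direction(xs, ys, tol=1e-9):
    """Classify a bivariate data set as 'positive', 'negative', or 'zero'."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Positive products: both scores on the same side of their means.
    # Negative products: one score above its mean while the other is below.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if cross > tol:
        return "positive"
    if cross < -tol:
        return "negative"
    return "zero"


# Hypothetical data for six students.
absences = [0, 1, 2, 4, 5, 7]
final_exam = [95, 90, 88, 80, 78, 70]
print(relationship_direction(absences, final_exam))  # negative

study_hours = [1, 2, 4, 5, 7, 8]
test_score = [62, 65, 74, 79, 85, 91]
print(relationship_direction(study_hours, test_score))  # positive
```

The sign alone says nothing about strength: as the quiz example shows, a relationship can be positive yet only moderate.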
6. For each of the following, indicate whether the expected relationship between the two variables will be positive (+), negative (−), or zero (0):
a. Air temperature and the amount of snow on the ground
b. The number of minutes of exercise per day and score on a physical fitness test
c. The number of years since having a driver's license and age
d. The number of pages in a textbook and cost of that textbook
e. Age at which a child takes its first step and educational level of the parents

Linear and Nonlinear Relationships

The relationships we have examined thus far have been linear. In linear relationships, the trend in the data is best described by a straight line. That is, we could fit a straight line through the center of the scatterplot to indicate the trend in the data. The figure below shows examples of two straight-line trends.

[Figure: Two Linear Relationships (upper panel: SAT score and Freshman GPA; lower panel: Absences and Final exam grade)]
Not all relationships are linear. Some are curvilinear. In a curvilinear relationship, the trend in the data changes direction. For example, the relationship between test score and test anxiety is curvilinear. High levels of test anxiety impair test performance, but so do low levels of test anxiety. The best test performance occurs among test takers having moderate anxiety. Thus, the relationship takes on an inverted U shape, as you can see in the figure below.

[Figure: An Inverted U Relationship (Anxiety level and Test score)]

The relationship mentioned at the beginning of this module between the amount of antidepressant medication taken and reported mood level is also curvilinear. You can see this pattern in the figure below.

[Figure: An S-Shaped Relationship (Dosage and Mood)]

Note from the graph that mood rises sharply with the initial dosage, then rises more slowly as the dosage increases further, and finally levels off at a still higher dosage. Thus, the relationship takes on a leaning S shape.
If we try to fit a straight line to curvilinear data, many data points will fall off the line. This is especially true for the relationship between test score and test anxiety. However, if we fit a curved line to the data, the data points hug the line quite closely. The figure below shows how a curved line fits the data better than a straight line does.

[Figure: Curvilinear Data Fitted With Linear and Curvilinear Lines (panels: Anxiety level and Test score; Dosage and Mood)]

Recall that the closer data fall to the line, the stronger the relationship. Because the data in this figure closely hug a curved line, fitting a curved line to the data will yield a relatively high correlation, something close to 1.00. Conversely, because the same data fall far from a straight line, fitting a straight line to the data will yield a relatively low correlation (for the inverted U, something close to .00). This is why you should always plot your data before calculating any statistic. As you will learn in a later module, some correlation statistics are linear, and some are curvilinear. Using a linear statistic to describe curvilinear data will seriously underestimate the amount of correlation.
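A small numeric sketch shows why a linear statistic can miss a curvilinear relationship. The inverted-U data below are made up: test score is an exact function of anxiety level, peaking at a moderate level of 5. Pearson r (the same hand-written helper, previewing the next module) comes out at essentially zero. Correlating score with the squared distance from the peak is used here only as a simple stand-in for fitting a curved line, and it assumes we already know where the peak is.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


anxiety = list(range(1, 10))  # hypothetical anxiety levels 1 through 9
# Inverted U: scores peak at moderate anxiety (level 5) and fall off on both sides.
score = [20 - (a - 5) ** 2 for a in anxiety]

# Straight-line view: treats anxiety itself as the predictor.
r_linear = pearson_r(anxiety, score)

# Curved view: squared distance from the peak turns the U into a straight-line pattern.
distance_sq = [(a - 5) ** 2 for a in anxiety]
r_curved = pearson_r(distance_sq, score)

print(f"straight-line fit, strength |r| = {abs(r_linear):.2f}")
print(f"curved fit,        strength |r| = {abs(r_curved):.2f}")
```

The curved-fit correlation itself is negative (score falls as distance from the peak grows), which is why the sketch prints absolute values: the point here is strength, and the same perfectly predictable data register as roughly .00 or 1.00 depending on which kind of line is fitted.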
CHECK YOURSELF!
How can you tell if a relationship is linear or curvilinear? Why does the shape of the data matter when calculating a correlation?

Outliers and Their Effects

In any relationship there may be outliers. Outliers are scores that fall far outside the trend of the rest of the data, typically by multiple standard deviations beyond the next closest score. Consider the two scatterplots shown below. They are identical except for a single score.

[Figure: Data Without and With an Outlier (upper panel without the outlier; lower panel with it)]

For the most part, a straight line fits the data well. However, the outlier pulls the line toward itself, as shown in the lower example. When the line is pulled toward the outlier, the remaining points fall farther from the line than they otherwise would have. The fit is worse; hence, the correlation is lower. An outlier that falls off the trend, like this one, lowers the correlation from what it would be without the outlier.

Often an outlier is due to error: a mismarked answer sheet, a mistake in entering a score into a database, a subject who misunderstood the directions, and so on. You should always seek to understand the cause of an outlying score. If the cause is not legitimate, you should eliminate the outlying score from the analysis. Leaving an incorrect or errant score in the analysis distorts the true correlation.
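The effect of a single off-trend outlier can be reproduced with a minimal sketch. The nine points below are made up to follow a fairly tight upward trend; the tenth, (10, 1), is a hypothetical outlier of the kind a data-entry error might produce. Reporting r with and without it mirrors the practice of presenting the data both ways. As before, pearson_r is our own helper, not a library function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data following a fairly tight upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2, 1, 4, 3, 6, 5, 8, 7, 9]
r_without = pearson_r(xs, ys)  # 56/60, about 0.93

# Add one hypothetical outlier far below the trend.
xs_out = xs + [10]
ys_out = ys + [1]
r_with = pearson_r(xs_out, ys_out)  # falls below 0.5

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

One errant point out of ten is enough to cut the correlation roughly in half here, which is why the cause of an outlying score is worth tracking down before any statistic is reported.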
At other times, outliers are legitimate. One person may legitimately score far beyond everyone else on the variable being measured. Nevertheless, if the outlier seriously distorts interpretation of the remaining data, it is customary to present the data with and without the outlier, state that you are eliminating it, and then conduct further analyses with the outlier removed.

CHECK YOURSELF!
How can you tell if a set of data includes outliers? Why does the presence of outliers matter when calculating a correlation?

Looking Ahead

So far, we have merely estimated the strength and direction of relationships through visual inspection of the data. In the next module, we will calculate a correlation coefficient. Then, we will be able to state the exact strength and direction of a relationship.

PRACTICE
7. Here are the quiz scores for students on each of three pop quizzes:
Quiz 1 Quiz 2 Quiz 3 Ann 11 1 5 Brianna 9 7 9 Chara 8 9 1 Diane 1 1 Eliza 11 Felice 1
a. Create three separate scatterplots: Quiz 1 and Quiz 2, Quiz 1 and Quiz 3, and Quiz 2 and Quiz 3.
b. Describe the direction and apparent strength of the relationship in each scatterplot. Comment on the effect of outliers, if any.
8. Gather data on height and shoe size from a group of adults (perhaps from students in your statistics class).
a. Create a scatterplot for the data.
b. Describe the direction and apparent strength of the relationship. Comment on the effect of outliers, if any.
9. Gather data from a group of adults (perhaps from students in your statistics class) on the number of hours of television watched on an average day and the typical number of hours of sleep obtained in an average night.
a. Create a scatterplot for the data.
b. Describe the direction and apparent strength of the relationship. Comment on the effect of outliers, if any.

Visit the study site at www.sagepub.com/steinbergsastudy for practice quizzes and other study resources.