STAB22 section 2.1

2.3 Both ounces and price are quantitative variables, so we could draw a scatterplot to see how they are related. We might expect that bigger sizes cost more, though a Venti (24 ounces) costs less than twice as much as a Tall (12 ounces), even though it's twice the size. (I have problems with a company that calls its smallest serving a Tall, but that may just be me.) If you leave the variable Size as categorical, you could make something like a bar graph, but with the bar heights showing Ounces instead of frequency. The individuals (cases) here are cups of Mocha Frappuccino.

2.9 The price of a drink depends on the size. So price should be the response and size the explanatory variable, and on your scatterplot price should be on the vertical (y) scale. I typed the numbers into software and produced the plot shown in Figure 1, though you could almost as easily do this by hand. As the size goes up, the price goes up as well, but not in a straight-line way: the relationship looks less steep as the size increases. This reflects the fact that a 24-ounce drink costs the least per ounce of coffee: the coffee itself is only one component of the price, and there is also the fixed cost of hiring a barista to serve you, however big a drink you have.

Figure 1: Scatterplot of price vs. size for Mocha Frappuccino
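The flattening described in 2.9 can be checked with a little arithmetic. The sketch below is illustrative only: the prices and the Grande row are hypothetical stand-ins, not the figures from the exercise. Only the 12-ounce and 24-ounce sizes come from the text above. The function names are also made up for this example.

```python
# Hypothetical (size name, ounces, price in dollars) rows, for illustration only.
# Only the Tall (12 oz) and Venti (24 oz) sizes are taken from the text;
# the Grande row and every price are assumptions.
drinks = [
    ("Tall", 12, 3.00),
    ("Grande", 16, 3.60),
    ("Venti", 24, 4.20),
]


def price_per_ounce(drinks):
    """Return the price per ounce for each size, in the order given."""
    return [price / ounces for _, ounces, price in drinks]


def slopes_between_sizes(drinks):
    """Return the extra price per extra ounce between consecutive sizes."""
    result = []
    for (_, oz_a, price_a), (_, oz_b, price_b) in zip(drinks, drinks[1:]):
        result.append((price_b - price_a) / (oz_b - oz_a))
    return result


per_ounce = price_per_ounce(drinks)
slopes = slopes_between_sizes(drinks)

for (name, _, _), value in zip(drinks, per_ounce):
    print(f"{name}: {value:.3f} dollars per ounce")
print("slopes between consecutive sizes:", [round(s, 3) for s in slopes])
```

With prices that level off in this way, both lists decrease from the smallest size to the largest: the biggest drink is cheapest per ounce, and each extra ounce adds less to the price than the one before, which is what "less steep as the size increases" means on the scatterplot.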

2.24 The first test comes before the final exam chronologically, so the final exam score should be the response (and go on the vertical scale on your scatterplot). Again, this one could be done either by hand or by using software (your choice). I used Minitab, with the results shown in Figure 2. Select Scatterplot and Simple, then select the response as Y and the explanatory variable as X. There is essentially no relationship between the two scores: if you knew the first test score, that would not help you at all in predicting the final exam score. This might be because the first test came very early in the course, and the material it tested was very different from that on the final exam. Or students might react to their first test result: a student who scores poorly might study hard for the final, and a student who scores well might relax a bit too much before the final.

Figure 2: Scatterplot of first test and final exam scores

2.25 Again, the final exam score will be the response. My scatterplot is shown in Figure 3. This appears to be something of a positive association (more so than in Figure 2, anyway), so that knowing the score on the second test helps a bit in predicting the final exam score. (Note that the student who does best on the second test, with 175, does well on the final, and the two students who score under 150 on the second test don't do very well on the final either.) By the time the second test comes around, usually late in the semester, it will usually be pretty clear what material is going to be tested (pretty much the same stuff that will be on the final), so a student who does well on one will probably do well on the other (and will know how hard they need to study for the final).

2.27 Think of whether one variable might be the cause of the other, or whether the two variables are just things that happen to go together.
In (b) and (e), the two values in each case are obtained at the same time, and so they just go together (or not): just explore the relationship in each case. In (a), older children will tend to be heavier, so that if you knew the age of a child, you would be able to predict their weight. Being able to say "if I knew x, I would be able to predict y" means that x is explanatory and y is the response: here age is explanatory and weight is the response. In (c), if you knew how many bedrooms the apartment has, you could make a guess at its rental price. Thus bedrooms is explanatory, and rental price is the response. In (d), likewise, if you knew how much sugar a cup of coffee has, you would be able to guess how sweet it would taste. (A more interesting setup would be to have a friend prepare three cups of coffee with differing amounts of sugar in, and then, by tasting, you would rank them in order of sweetness. If you're a big coffee drinker, you would probably get pretty close to the right order.) In each of (a), (c) and (d) here, you could make a case for the explanatory and response variables being the other way around, but the major interest would be in the relationships as described above. For instance, if you knew the weight of a child, you could guess their age, but you would normally want to do it the other way around.

Figure 3: Scatterplot of second test and final exam scores

2.28 Parents' income is explanatory and college debt is the response, because parental income influences college debt (it comes first). These variables are both quantitative (you would measure them). If the parents have a high income, the student will not have to borrow so much money, so the debt will be low; if the parents have a low income, the student will have to borrow a lot of money to pay tuition, living expenses and so on. So we would expect a negative association. This is assuming that parents will pay their children's college expenses, if they can. This isn't always the case. Some students work while they're at school (or during the summers) and save what they earn, and such students can be expected to graduate with a lower debt than they would otherwise have had.
2.29 IQ is supposed to be a measure of general intelligence, and we would expect more intelligent children to be more interested in and more skilled in reading. This would be especially true for children in the same grade (and thus of about the same age). In Figure 2.13, children with higher IQ scores generally have higher reading scores, though there is a lot of scatter. There are four children (with IQs between 100 and 130, and reading scores less than 20) that don't seem to follow the general trend. Their reading scores are about 40 points less than you would expect based on their IQ; these children could have some kind of developmental problems that hinder their reading even though they score well on general intelligence. Ignoring the outliers, the trend is roughly linear (there is no obvious curve to the relationship, which is how you tell). But it isn't very strong: there is a lot of scatter in the picture, which is another way of saying that if you know a child's IQ, you wouldn't be able to predict their reading test score very accurately. (There is more to reading than general intelligence, in other words.)

2.30 As on a normal probability plot, when you see a stair-step pattern like this, it means that one of the variables only takes a few different values. Here, it's the child's self-estimate of reading ability, which can only be 1, 2, 3, 4 or 5. There are 60 children, so there are several with the same self-estimate. Having said that, children with a high test score also tend to have a high self-estimate (all of the children with test scores above 80 rate themselves 3 or better). Likewise, the children with a test score below 40 rate themselves 3 or worse, with one exception. This exception is the one outlier: a test score of about 10, and a self-estimate of 4, which is a serious over-estimate (looking at the plot, you would expect this child to have a self-rating of 1 or maybe 2).

2.32 Get the data from the disk into your software. In Minitab, select Graph and Plot, with the appropriate variable (cycle length, here) as the response (Y) variable.
My plot is shown in Figure 4.

Figure 4: Plot of cycle length against day length

The point on the far right (with day length close to 24) is an outlier, because it is not part of the general pattern. You could claim that there is a positive association, but it is very weak: if you try to predict cycle length from day length, your prediction won't be very accurate.

2.33 I did this in Minitab again (though you could do this one by hand if you really want to). Get the data from the disk into Minitab; treat brain activity as the response. Select Graph and Plot, and select the two variables into Y and X with brain activity as Y. My plot is in Figure 5.

Figure 5: Scatter plot of brain activity against social distress

The relationship shows an upward trend: a higher score on the distress scale leads to a higher brain activity measurement. The relationship is more or less linear and fairly strong. I don't see any outliers. The data do suggest that distress from social exclusion is related to brain activity in the pain region.

2.34 My plot of team value against revenue is in Figure 6. I don't think there's much of a relationship. If anything, the trend is downward, since one of the teams with no revenue has the highest value, and the team with the highest revenue has almost the lowest value. On the other hand, the plot of value against debt is close to a perfect upward straight line: the larger the debt, the larger the value. There are some outliers at the bottom left (in the sense of points that are further off the line than the others): the Oklahoma City Thunder and the Orlando Magic have higher value than you would expect given their amounts of debt, and the Portland Trail Blazers have lower value than you would expect from their debt. None of these are far off the trend, but the overall fit is so good that I would call even these moderately off values outliers. I'd describe the value-income plot as a weakish positive association, since there does seem to be some relationship. The teams with negative income seem to be following the same trend as the others, except for the Dallas Mavericks: for a team with that kind of value, you'd expect a positive income at least.

Figure 6: Plot of team value against revenue

Figure 7: Plot of team value against debt

Figure 8: Plot of team value against operating income

2.35 The last sentence of the first paragraph in the text gives you a clue as to what should be on the y-axis: rate is the response, and mass the explanatory variable. So get a scatterplot of Rate against Mass, with groups, and use Sex as the grouping categorical variable. Your plot should look something like Figure 9.

Figure 9: Metabolic rate vs. lean body mass

Looking at all the data, the relationship is positive (larger lean body mass goes with larger metabolic rate), and the trend looks linear. The relationship looks quite strong, except perhaps at the upper end. Separating out the men and women, some of the men (red squares) have large lean body mass and large metabolic rate, and the trend overall for the men is not as clear as it is for the women (black circles). (Most of the larger values are men, and all of the smaller values, on both variables, are women.)

2.37 To get the plot with men's and women's records separately labelled, use the same idea as 2.35: do a scatterplot with groups, and select Sex as the grouping variable.

Figure 10: Men's and women's 10,000 record times

Men (red squares) have been running this event for longer than women (black circles), so their history is longer. But the women's record appears to have been dropping more quickly than the men's. In recent years, though, the women's record hasn't dropped very much, while the men's has dropped more quickly. So the data support the first claim of (b), but not the second (the men's record is still lower than the women's, with no apparent sign that the women are going to catch up).
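If you are not using Minitab, the "scatterplot with groups" idea from 2.35 and 2.37 amounts to splitting the rows by the categorical variable and drawing each group with its own symbol. Here is a minimal Python sketch of that idea, not the method used for the figures above. The column names Sex, Mass and Rate follow 2.35, but the data values, the "M"/"F" codes and the function names are hypothetical, and the plotting function assumes matplotlib is installed.

```python
from collections import defaultdict


def split_by_group(rows, group_key, x_key, y_key):
    """Return {group: (xs, ys)} so each group can be drawn with its own symbol."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row[group_key]]
        xs.append(row[x_key])
        ys.append(row[y_key])
    return dict(groups)


def plot_groups(groups, styles, x_label, y_label):
    """Draw one set of points per group (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, (xs, ys) in groups.items():
        marker, color = styles[name]
        ax.scatter(xs, ys, marker=marker, color=color, label=name)
    ax.set_xlabel(x_label)  # explanatory variable on the horizontal scale
    ax.set_ylabel(y_label)  # response on the vertical scale
    ax.legend()
    plt.show()


# Hypothetical rows, for illustration only (not the data from the exercise).
rows = [
    {"Sex": "F", "Mass": 40.0, "Rate": 1100},
    {"Sex": "M", "Mass": 60.0, "Rate": 1700},
    {"Sex": "F", "Mass": 45.0, "Rate": 1250},
    {"Sex": "M", "Mass": 55.0, "Rate": 1600},
]

groups = split_by_group(rows, "Sex", "Mass", "Rate")
print(groups)

# Red squares for men and black circles for women, as in Figures 9 and 10.
styles = {"M": ("s", "red"), "F": ("o", "black")}
# plot_groups(groups, styles, "Mass", "Rate")  # uncomment to draw the plot
```

The same two functions would serve for 2.37, with year as the explanatory variable and record time as the response.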