STAB22 section 2.1


2.3 Both ounces and price are quantitative variables, so we could draw a scatterplot to see how they are related. We might expect bigger sizes to cost more, though a Venti (24 ounces) costs less than twice as much as a Tall (12 ounces), even though it's twice the size. (I have problems with a company that calls its smallest serving a Tall, but that may just be me.) If you leave the variable Size as categorical, you could make something like a bar graph, but using Ounces instead of frequency. The individuals (cases) here are cups of Mocha Frappuccino.

2.9 The price of a drink depends on the size, so price should be the response and size the explanatory variable, and on your scatterplot price should be on the vertical (y) scale. I typed the numbers into software and produced the plot shown in Figure 1, though you could almost as easily do this by hand. As the size goes up, the price goes up as well, but not in a straight-line way: the relationship looks less steep as the size increases. This reflects the fact that a 24-ounce drink costs the least per ounce, because the coffee itself is only one component of the price; there is also the fixed cost of hiring a barista to serve you, however big a drink you have.

Figure 1: Scatterplot of price vs. size for Mocha Frappuccino
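The "less steep as the size increases" point in 2.9 can be checked with a little arithmetic. The sketch below is only an illustration: the three (ounces, price) pairs are assumed values, not figures quoted in these solutions, and the function names are made up for the example. It computes the price per ounce for each size and the slope (extra dollars per extra ounce) between consecutive sizes.

```python
# Illustrative (size in ounces, price in dollars) pairs. These prices are
# assumptions for the sketch, not values quoted in the solutions above.
drinks = [(12, 3.15), (16, 3.65), (24, 4.15)]


def price_per_ounce(drinks):
    """Return (ounces, dollars per ounce) for each drink."""
    return [(oz, price / oz) for oz, price in drinks]


def segment_slopes(drinks):
    """Slope (extra dollars per extra ounce) between consecutive sizes."""
    pts = sorted(drinks)
    return [(p2 - p1) / (o2 - o1) for (o1, p1), (o2, p2) in zip(pts, pts[1:])]


per_oz = price_per_ounce(drinks)
slopes = segment_slopes(drinks)

for oz, rate in per_oz:
    print(f"{oz} oz: ${rate:.3f} per ounce")
print("slopes between sizes:", [round(s, 4) for s in slopes])
```

With these assumed prices, both the per-ounce price and the slope shrink as the size grows, which is the curved (not straight-line) pattern described for Figure 1.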

2.24 The first test comes before the final exam chronologically, so the final exam score should be the response (and go on the vertical scale of your scatterplot). Again, this one could be done either by hand or by using software (your choice). I used Minitab, with the results shown in Figure 2. Select Scatterplot and Simple, then select the response as Y and the explanatory variable as X. There is essentially no relationship between the two scores: if you knew the first test score, that would not help you at all in predicting the final exam score. This might be because the first test came very early in the course, and the material it tested was very different from that on the final exam. Or students might react to their first test result: a student who scores poorly might study hard for the final, and a student who scores well might relax a bit too much before the final.

Figure 2: Scatterplot of first test and final exam scores

2.25 Again, the final exam score will be the response. My scatterplot is shown in Figure 3. There appears to be something of a positive association (more so than in Figure 2, anyway), so that knowing the score on the second test helps a bit in predicting the final exam score. (Note that the student who does best on the second test, 175, does well on the final, and the two students who score under 150 on the second test don't do very well on the final either.) By the time the second test comes around, usually late in the semester, it will usually be pretty clear what material is going to be tested (pretty much the same stuff that will be on the final), so a student who does well on one will probably do well on the other (and will know how hard they need to study for the final).

Figure 3: Scatterplot of second test and final exam scores

2.27 Think of whether one variable might be the cause of the other, or whether the two variables are just things that happen to go together. In (b) and (e), the two values in each case are obtained at the same time, and so they just go together (or not): just explore the relationship in each case. In (a), older children will tend to be heavier, so that if you knew the age of a child, you would be able to predict their weight. Being able to say "if I knew x, I would be able to predict y" means that x is explanatory and y is the response: here age is explanatory and weight is the response. In (c), if you knew how many bedrooms the apartment has, you could make a guess at its rental price. Thus bedrooms is explanatory, and rental price is the response. In (d), likewise, if you knew how much sugar a cup of coffee has, you would be able to guess how sweet it would taste. (A more interesting setup would be to have a friend prepare three cups of coffee with differing amounts of sugar in, and then, by tasting, you would rank them in order of sweetness. If you're a big coffee drinker, you would probably get pretty close to the right order.) In each of (a), (c) and (d) here, you could make a case for the explanatory and response variables being the other way around, but the major interest would be in the relationships as described above. For instance, if you knew the weight of a child, you could guess their age, but you would normally want to do it the other way around.

2.28 Parents' income is explanatory and college debt is the response, because parental income influences college debt (it comes first). These variables are both quantitative (you would measure them). If the parents have a high income, the student will not have to borrow so much money, so the debt will be low; if the parents have a low income, the student will have to borrow a lot of money to pay tuition, living expenses and so on. So we would expect a negative association. This is assuming that parents will pay their children's college expenses if they can. This isn't always the case. Some students work while they're at school (or during the summers) and save what they earn, and such students can be expected to graduate with a lower debt than they would otherwise have had.
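As a rough numerical companion to the idea of a positive or negative association, here is a minimal sketch. It is not a method used in these solutions: it simply checks whether above-average values of one variable tend to go with above-average or below-average values of the other, using the sign of the summed products of deviations from the means. The function name and the income and debt figures are invented for illustration.

```python
def association_direction(xs, ys):
    """Classify the direction of association from the sign of the summed
    products of deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Made-up values (thousands of dollars), purely for illustration.
parent_income = [30, 45, 60, 80, 120]
college_debt = [40, 35, 30, 18, 5]

direction = association_direction(parent_income, college_debt)
print(direction)  # negative
```

This only reports a direction. It says nothing about how strong the association is or whether it is linear, both of which still need the scatterplot.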
2.29 IQ is supposed to be a measure of general intelligence, and we would expect more intelligent children to be more interested in and more skilled in reading. This would be especially true for children in the same grade (and thus of about the same age). In Figure 2.13, children with higher IQ scores generally have higher reading scores, though there is a lot of scatter. There are four children (with IQs between 100 and 130, and reading scores less than 20) that don't seem to follow the general trend. Their reading scores are about 40 points less than you would expect based on their IQ; these children could have some kind of developmental problem that hinders their reading even though they score well on general intelligence. Ignoring the outliers, the trend is roughly linear (there is no obvious curve to the relationship, which is how you tell). But it isn't very strong: there is a lot of scatter in the picture, which is another way of saying that if you know a child's IQ, you wouldn't be able to predict their reading test score very accurately. (There is more to reading than general intelligence, in other words.)

2.30 As on a normal probability plot, when you see a stair-step pattern like this, it means that one of the variables only takes a few different values. Here, it's the child's self-estimate of reading ability, which can only be 1, 2, 3, 4 or 5. There are 60 children, so there are several with the same self-estimate. Having said that, children with a high test score also tend to have a high self-estimate (all of the children with test scores above 80 rate themselves 3 or better). Likewise, the children with a test score below 40 rate themselves 3 or worse, with one exception. This exception is the one outlier: a test score of about 10 and a self-estimate of 4, which is a serious over-estimate (looking at the plot, you would expect this child to have a self-rating of 1 or maybe 2).

2.32 Get the data from the disk into your software. In Minitab, select Graph and Plot, with the right variable (cycle length, here) as the response (Y) variable. My plot is shown in Figure 4.

Figure 4: Plot of cycle length against day length

The point on the far right (with day length close to 24) is an outlier, because it is not part of the general pattern. You could claim that there is a positive association, but it is very weak: if you try to predict cycle length from day length, your prediction won't be very accurate.

2.33 I did this in Minitab again (though you could do this one by hand if you really want to). Get the data from the disk into Minitab; treat brain activity as the response. Select Graph and Plot, and select the two variables into Y and X, with brain activity as Y. My plot is in Figure 5.

Figure 5: Scatterplot of brain activity against social distress

The relationship shows an upward trend: a higher score on the distress scale leads to a higher brain activity measurement. The relationship is more or less linear and fairly strong. I don't see any outliers. The data do suggest that distress from social exclusion is related to brain activity in the pain region.

2.34 My plot of team value against revenue is in Figure 6. I don't think there's much of a relationship. If anything, the trend is downward, since one of the teams with no revenue has the highest value, and the team with the highest revenue has almost the lowest value. On the other hand, the plot of value against debt (Figure 7) is close to a perfect upward straight line: the larger the debt, the larger the value. There are some outliers at the bottom left (in the sense of points that are further off the line than the others): the Oklahoma City Thunder and the Orlando Magic have higher value than you would expect given their amounts of debt, and the Portland Trail Blazers have lower value than you would expect from their debt. None of these are far off the trend, but the overall fit is so good that I would call even these moderately-off values outliers. I'd describe the plot of value against operating income (Figure 8) as a weakish positive association, since there does seem to be some relationship. The teams with negative income seem to be following the same trend as the others, except for the Dallas Mavericks: for a team with that kind of value, you'd expect a positive income at least.

Figure 6: Plot of team value against revenue

Figure 7: Plot of team value against debt

Figure 8: Plot of team value against operating income

2.35 The last sentence of the first paragraph in the text gives you a clue as to what should be on the y-axis: rate is the response, and mass the explanatory variable. So get a scatterplot of Rate against Mass, with groups, and use Sex as the grouping categorical variable. Your plot should look something like Figure 9.

Figure 9: Metabolic rate vs. lean body mass

Looking at all the data, the relationship is positive (larger lean body mass goes with larger metabolic rate), and the trend looks linear. The relationship looks quite strong, except perhaps at the upper end. Separating out the men and women, some of the men (red squares) have large lean body mass and large metabolic rate, and the trend overall for the men is not as clear as it is for the women (black circles). (Most of the larger values are men, and all of the smaller values, on both variables, are women.)

2.37 To get the plot with men's and women's records separately labelled, use the same idea as 2.35: do a scatterplot with groups, and select Sex as the grouping variable.

Figure 10: Men and women's 10,000 record times

Men (red squares) have been running this event for longer than women (black circles), so their history is longer. But the women's record appears to have been dropping more quickly than the men's. In recent years, though, the women's record hasn't dropped very much, while the men's has dropped more quickly. So the data support the first claim of (b), but not the second (the men's record is still less than the women's, with no apparent sign that the women are going to catch up).
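The "scatterplot with groups" used in 2.35 and 2.37 amounts to splitting the (x, y) points by a categorical variable and drawing each group with its own symbol. The sketch below shows only the splitting step, outside Minitab. The rows are made-up numbers, and the function name and marker labels are assumptions chosen to mirror the description of Figure 9.

```python
from collections import defaultdict


def split_by_group(rows):
    """Group (x, y, label) rows into {label: ([x, ...], [y, ...])}."""
    groups = defaultdict(lambda: ([], []))
    for x, y, label in rows:
        xs, ys = groups[label]
        xs.append(x)
        ys.append(y)
    return dict(groups)


# Made-up (lean body mass, metabolic rate, sex) rows, for illustration only.
rows = [
    (40, 1100, "F"),
    (45, 1250, "F"),
    (50, 1400, "F"),
    (55, 1500, "M"),
    (62, 1800, "M"),
]

# Hypothetical symbol choices matching the description of Figure 9.
markers = {"M": "red square", "F": "black circle"}

groups = split_by_group(rows)
for label, (xs, ys) in sorted(groups.items()):
    print(label, markers[label], list(zip(xs, ys)))
```

Each group's x and y lists can then be handed to whatever plotting tool you use, one call per group, so that the two sexes appear with different symbols on the same axes.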