Statistics Chapter 7: Review C KEY

1. After conducting a marketing study to see what consumers thought about a new tinted contact lens they were developing, an eyewear company reported, "Consumer satisfaction is strongly correlated with eye color." Comment on this observation.
   There may be an association between customer satisfaction and eye color, but these are both categorical variables, so they cannot be correlated.

2. On the axes below, sketch a scatterplot described:
   a. a strong negative association
   b. a strong association, but r is near 0
   c. a weak but positive association

3. A school board study found a moderately strong negative association between the number of hours high school seniors worked at part-time jobs after school hours and the students' grade point averages.
   a. Explain in this context what "negative association" means.
      Students who worked more hours tended to have lower grades.
   b. Hoping to improve student performance, the school board passed a resolution urging parents to limit the number of hours students be allowed to work. Do you agree or disagree with the school board's reasoning? Explain.
      They are mistakenly attributing the association to cause and effect. Maybe students with low grades are more likely to seek jobs, or maybe there is some other factor in their home life that leads both to lower grades and to the desire or need to work.

4. Researchers investigating the association between the size and strength of muscles measured the forearm circumference (in inches) of 20 teenage boys. Then they measured the strength of the boys' grips (in pounds). Their data are plotted below.

   [Scatterplot: grip strength (pounds, axis from 40 to 80) versus forearm circumference (inches, axis from 10 to 14)]

   a. Write a few sentences describing the association in context.
      There is a moderate, linear, positive association between forearm circumference and grip strength. Boys with greater forearm circumference tend to have higher grip strengths.
   b. Which value of r represents this correlation: -0.76, -0.22, 0.22, or 0.65?
      0.65.

Term/Concept: Causation vs. Association
Definition/Explanation: Causation means an explanatory variable caused a change in a response variable. This can only be established through experiments. Association just implies that there is a link between two variables.
Example: Causation: cigarettes cause cancer. Association: studying is associated with better grades.

Term/Concept: Lurking variable
Definition/Explanation: A lurking variable is a variable that is behind the scenes in a relationship between two other variables. The lurking variable produces a change in the response variable, and this change can be wrongly attributed to the explanatory variable.
Example: Sunburns and ice cream sales. There is a strong association between the variables, but the lurking variable is the season. In summer there are more ice cream sales and more sunburns.

Term/Concept: Extrapolation (and its dangers)
Definition/Explanation: Going outside the range of the data to predict values.
Example: Using gas prices from the 1950s to predict gas prices in 2010.

2. Sally created the following model using data taken in an experiment in her laboratory.

   Number of Bacteria = 34,219 + 532.1(hours)

   a. Identify the explanatory and response variables she is measuring.
      The explanatory variable is the number of hours and the response variable is the number of bacteria.
   b. What should Sally have done with the data before finding this model?
      She should have checked the Outlier Condition and Straight Enough Condition.
   c. Which are true?
      This model predicts how many bacteria there are after a certain number of hours. TRUE
      This model predicts how many hours a certain number of bacteria have been alive. FALSE
      This model produces (x, y) points of (number of bacteria, number of hours). FALSE
      This model produces (x, y) points of (number of hours, number of bacteria). TRUE
   d. Describe the slope and intercept of this model in context.
      Slope: The slope is the predicted increase in bacteria per hour, 532.1 bacteria/hour.
      Intercept: The intercept is the predicted number of bacteria at the beginning of the experiment, 34,219.
   e. Calculate the residual for this data point (2, 37642).
      Predicted Number of Bacteria = 34,219 + 532.1(2) = 35,283.2.
      Residual = 37,642 - 35,283.2 = 2,358.8 bacteria.
   f. Describe what a positive residual means in this context.
      A positive residual means that the model predicts fewer bacteria than there actually were at some time in the experiment.
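The residual in part (e) is simply the observed count minus the model's prediction, which can be checked with a few lines of code. This is a minimal sketch; the names `predict_bacteria` and `residual` are illustrative and not part of the original key.

```python
# Sally's model from the problem: bacteria = 34219 + 532.1 * hours
INTERCEPT = 34219.0
SLOPE = 532.1


def predict_bacteria(hours):
    """Predicted number of bacteria after the given number of hours."""
    return INTERCEPT + SLOPE * hours


def residual(hours, observed):
    """Residual = observed value minus predicted value."""
    return observed - predict_bacteria(hours)


predicted = predict_bacteria(2)
res = residual(2, 37642)
print(round(predicted, 1), round(res, 1))
```

A positive result means the observed point lies above the line, matching the interpretation in part (f).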

3. Data show a strong positive association between ice cream sales and the number of cases of sunburn in a small town.
   a. Describe what this means in language a 10-year-old would understand.
      One possible explanation: During the summer, when the sun is out and it's warm, people enjoy eating ice cream more than they do at other times of the year. People also get more sunburned in the summer than at other times of the year. So, we notice that as ice cream sales go up, it's usually summer, and so people are more likely to get sunburned.
   b. Does this mean that eating ice cream causes sunburn?
      No. Just an association. This is not an experiment. There is no cause-and-effect relationship.
   c. What might be a lurking variable in this situation? Explain.
      The season. People tend to stay outdoors more in the summer than at other times of the year. The sun is out. People tend to get sunburned more often.

4. Here is some fictional data for a set of cars showing the number of oil changes and the annual cost of repairs for each car. (from http://illuminations.nctm.org/lessondetail.aspx?id=l298)

   Oil changes       3    5    2    3    1    4    6    4    3    2    0    10   7
   Repair costs ($)  300  300  500  400  700  400  100  250  450  650  600  0    150

   a. Justify the use of a linear model for this data.
      Using a scatterplot, the data appear to fulfill the Outlier Condition and the Straight Enough Condition.
   b. Describe the association between the number of oil changes and repair costs for these cars.
      As the number of oil changes a car receives increases, the annual cost of repairs tends to decrease.
   c. Create a model to predict the annual repair costs for a car based on the number of oil changes in that year.
      Costs = 650.27 - 73.07(Changes)
   d. Describe the slope and intercept of your model in context.
      Slope: For every additional oil change, the model predicts a decrease in annual repair costs of $73.07.
      Intercept: A car that receives no oil changes is predicted to have $650.27 in annual repair costs.
   e. Predict the annual repair costs for a car that has 8 oil changes that year.
      Costs = 650.27 - 73.07(8) = $65.71
   f. Predict the number of oil changes for a car that had $85 in repairs that year.
      Changes = 8.0674 - 0.0114(Costs)
      Changes = 8.0674 - 0.0114(85) = 7.1 oil changes
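The two models in the oil change problem can be reproduced from the listed data with an ordinary least-squares fit. The sketch below uses only plain Python; the helper name `least_squares` is illustrative, not something from the original key. Note that predicting oil changes from costs requires a separate regression with the roles of the variables swapped, not an algebraic rearrangement of the first line.

```python
oil_changes = [3, 5, 2, 3, 1, 4, 6, 4, 3, 2, 0, 10, 7]
repair_costs = [300, 300, 500, 400, 700, 400, 100, 250, 450, 650, 600, 0, 150]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Model for parts (c)-(e): costs predicted from oil changes.
b0, b1 = least_squares(oil_changes, repair_costs)
cost_at_8 = b0 + b1 * 8

# Model for part (f): oil changes predicted from costs (variables swapped).
c0, c1 = least_squares(repair_costs, oil_changes)
changes_at_85 = c0 + c1 * 85

print(round(b0, 2), round(b1, 2), round(cost_at_8, 2))
print(round(c0, 4), round(c1, 4), round(changes_at_85, 1))
```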

4. Here is some fictional data for a set of BMX bikes showing their weight and maximum number of inches a rider was able to get them in the air for a trick. (from http://illuminations.nctm.org/lessondetail.aspx?id=l298)

   Weight (lbs)  19     19.5   20     20.5   21     22     22.5   23     23.5
   Height (in)   10.35  10.3   10.25  10.2   10.1   9.85   9.8    9.79   9.7

   a. Justify the use of a linear model for this data.
      Using a scatterplot, the data appear to fulfill the Outlier Condition and the Straight Enough Condition.
   b. Describe the association between the weight and height of jump for these bikes.
      As the weight of a BMX bike increases, the maximum air the bike is able to attain tends to decrease.
   c. Predict the height attained by a bike that weighs one standard deviation more than the mean weight.
      r = -0.988; mean weight = 21.22 lbs; standard deviation of weight = 1.60 lbs.
      A bike one standard deviation above the mean weight is predicted to be r = -0.988 standard deviations from the mean height. From the data above, the mean height is 10.04 inches and the standard deviation of height is 0.252 inches, so
      Height = 10.04 - (0.988)(0.252) = 9.79 inches
   d. Create a model to predict the height of a jump of a bike based on its weight.
      Height = 13.34 - 0.155(Weight)
   e. Describe the slope and intercept of your model in context.
      Slope: For every 1 pound increase in weight, our model predicts a decrease in maximum air height of 0.155 inches.
      Intercept: A bike that has no weight is predicted to reach a maximum air height of 13.34 inches.
   f. Predict the height attained by a bike weighing 24 pounds.
      Height = 13.34 - 0.155(24) = 9.62 inches
   g. Create a model and predict the weight of a bike that reached a height of 12 inches.
      Weight = 84.27 - 6.28(Height)
      Weight = 84.27 - 6.28(12) = 8.91 lbs
   h. Discuss your confidence in using these models to make these predictions.
      I have no confidence in these models to make these predictions. These predictions are extrapolating beyond the data.
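As a check on parts (c) and (d) of the BMX problem, the following sketch recomputes the summary statistics from the listed data using only the standard library. The helper names (`mean`, `sample_sd`, `correlation`) are illustrative and not part of the original key. It relies on the fact that a least-squares line predicts a response r standard deviations from its mean when the predictor is one standard deviation above its own mean.

```python
from math import sqrt

weights = [19, 19.5, 20, 20.5, 21, 22, 22.5, 23, 23.5]           # lbs
heights = [10.35, 10.3, 10.25, 10.2, 10.1, 9.85, 9.8, 9.79, 9.7]  # inches


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / ((len(x) - 1) * sample_sd(x) * sample_sd(y))


r = correlation(weights, heights)
mean_w, sd_w = mean(weights), sample_sd(weights)
mean_h, sd_h = mean(heights), sample_sd(heights)

# Least-squares line for height predicted from weight.
slope = r * sd_h / sd_w
intercept = mean_h - slope * mean_w

# One SD above the mean weight -> r SDs from the mean height.
height_one_sd_up = mean_h + r * sd_h
# The same prediction, read off the regression line.
height_from_line = intercept + slope * (mean_w + sd_w)

print(round(r, 3), round(slope, 3), round(intercept, 2))
print(round(height_one_sd_up, 2), round(height_from_line, 2))
```

Both routes give the same predicted height, which is a useful sanity check that the prediction is expressed in inches rather than pounds.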

Statistics Chapter 8: Review A KEY

1. For each term or concept, give a definition/explanation in your own words and an example. (Only use your book or other resources to check when you're done or if you really get stuck!)

Term/Concept: Model
Definition/Explanation: Our best guess at explaining the linear relationship between two variables.
Example: Cost = 65 + 0.25(MinutesUsed)

Term/Concept: Predicted Value
Definition/Explanation: An estimate based on known data.
Example: Use the cost model above to predict the cost of a cell phone bill when 300 minutes are used.

Term/Concept: Conditions to use a linear model
Definition/Explanation: Straight Enough Condition and Outlier Condition.
Example: If there appears to be an outlier or outliers OR if the data do not appear to be straight, we cannot use the linear model.

Term/Concept: Residuals
Definition/Explanation: The difference between the actual and predicted values (actual minus predicted).
Example: If you use the cost model above to predict the cost for 100 minutes used on a cell phone plan, you would get $90. If the actual value was $80, then the residual would be 80 - 90 = -$10.

Term/Concept: Regression towards the mean
Definition/Explanation: Predicted values tend to be closer to their mean than the predictor value is to its own mean.
Example: One-hit wonders: a band has an amazing first album but future albums are mediocre at best.

Term/Concept: Slope of a linear model (what it is and how to interpret it in context)
Definition/Explanation: The slope tells us the direction of the linear association (positive or negative) and how much the predicted response changes per unit of the explanatory variable. To interpret it, we use the units provided in the context of the scenario.
Example: In the cell phone example above, the slope would be $0.25/minute. This means the model predicts that every minute of use is associated with an increase of $0.25.

Term/Concept: Intercept of a linear model (what it is and how to interpret it in context)
Definition/Explanation: The intercept is the starting point for our model. Sometimes it has meaning and sometimes it does not, depending on the context.
Example: In the above cell phone example, the intercept has meaning. That is, it is the starting value of the cost for the cell phone plan predicted by the model.
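The cell phone examples in the table above can be worked through in code. This sketch assumes the model is Cost = 65 + 0.25 × MinutesUsed, which is the reading consistent with the $0.25/minute slope and the $90 prediction at 100 minutes; the function name `predict_cost` is illustrative.

```python
# Assumed form of the cell phone model: Cost = 65 + 0.25 * MinutesUsed
BASE_COST = 65.0        # intercept, in dollars
COST_PER_MINUTE = 0.25  # slope, in dollars per minute


def predict_cost(minutes_used):
    """Predicted bill in dollars for the given minutes of use."""
    return BASE_COST + COST_PER_MINUTE * minutes_used


predicted_100 = predict_cost(100)
residual_100 = 80 - predicted_100   # actual minus predicted
predicted_300 = predict_cost(300)   # the "Predicted Value" example

print(predicted_100, residual_100, predicted_300)
```

The negative residual indicates that the model overestimated the actual bill.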