Homework 8 Solutions



Similar documents
Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 7 Scatterplots, Association, and Correlation

Relationships Between Two Variables: Scatterplots and Correlation

Exercise 1.12 (Pg )

Simple linear regression

Homework 11. Part 1. Name: Score: / null

Example: Boats and Manatees

Premaster Statistics Tutorial 4 Full solutions

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Chapter 7: Simple linear regression Learning Objectives

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

MTH 140 Statistics Videos

Univariate Regression

Father s height (inches)

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

STT 200 LECTURE 1, SECTION 2,4 RECITATION 7 (10/16/2012)

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Section 3 Part 1. Relationships between two numerical variables

Correlation and Regression

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables

Using Excel for Statistical Analysis

The importance of graphing the data: Anscombe s regression examples

5. Multiple regression

Simple Predictive Analytics Curtis Seare

STAT 350 Practice Final Exam Solution (Spring 2015)

2. Simple Linear Regression

Chapter 13 Introduction to Linear Regression and Correlation Analysis

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

Name: Date: Use the following to answer questions 2-3:

Linear Regression. use waist

Transforming Bivariate Data

Correlation and Simple Linear Regression

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Introduction to Longitudinal Data Analysis

Simple Linear Regression

High School Statistics and Probability Common Core Sample Test Version 2

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96


Lesson Using Lines to Make Predictions

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Session 7 Bivariate Data and Analysis

Regression Analysis: A Complete Example

AGAINST ALL ODDS EPISODE 11 FITTING LINES TO DATA TRANSCRIPT

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Getting Correct Results from PROC REG

Chapter 9 Descriptive Statistics for Bivariate Data

Simple Regression Theory II 2010 Samuel L. Baker

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test March 2014

Session 9 Case 3: Utilizing Available Software Statistical Analysis

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Diagrams and Graphs of Statistical Data

AP STATISTICS REVIEW (YMS Chapters 1-8)

WIN AT ANY COST? How should sports teams spend their m oney to win more games?

Foundations for Functions

Module 5: Multiple Regression Analysis

CALCULATIONS & STATISTICS

An Analysis of the Undergraduate Tuition Increases at the University of Minnesota Duluth

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

II. DISTRIBUTIONS distribution normal distribution. standard scores

Title ID Number Sequence and Duration Age Level Essential Question Learning Objectives. Lead In

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

Scatter Plot, Correlation, and Regression on the TI-83/84

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Regression and Correlation

Elements of statistics (MATH0487-1)

Chapter 23. Inferences for Regression

Statistics 2014 Scoring Guidelines

table to see that the probability is (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: = 1.

Worksheet A5: Slope Intercept Form

How Far is too Far? Statistical Outlier Detection

MULTIPLE REGRESSION EXAMPLE

Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Title: Lending Club Interest Rates are closely linked with FICO scores and Loan Length

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Grade. 8 th Grade SM C Curriculum

COMMON CORE STATE STANDARDS FOR

What is the purpose of this document? What is in the document? How do I send Feedback?

Chapter 7: Modeling Relationships of Multiple Variables with Linear Regression

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE MATH 111H STATISTICS II HONORS

a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams?

LOGIT AND PROBIT ANALYSIS

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

17. SIMPLE LINEAR REGRESSION II

Correlation key concepts:

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Regression Analysis (Spring, 2000)

Descriptive statistics; Correlation and regression

Section 1: Simple Linear Regression

The Big Picture. Correlation. Scatter Plots. Data

Transcription:

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below. It doesn t become clear which is the explanatory variable and which is the response until part (d). Either plot is fine. b) Using Rcmdr, the correlation is 0.9341. > cor(drugs[,c("marijuana","ot her")], use="complete.obs") Marijuana Other Marijuana 1.0000 0.9341 Other 0.9341 1.0000 c) We have a positive, fairly strong linear association between marijuana and other drugs. Countries with higher percentages of teens who have used marijuana tend to have higher percentages of teens that have used other drugs. d) These results do not confirm that marijuana is a gateway drug. An association exists between the percent of teens that have used marijuana and the percent of teens that have used other drugs. This does not mean that one caused the other.

7.40] a) Winning teams generally enjoy greater attendance at their home games. The association between home attendance and number of wins is positive, somewhat straight, and moderately strong. b) The association between winning and home attendance has the strongest correlation, with r = 0.697. This is only slightly higher than the association between home attendance and scoring runs, with r = 0.667. c) The correlation between number of runs scored and number of wins is r = 0.605, indicating a possible moderate association. However, since there is no scatterplot of wins vs. runs provided, we can t be sure the relationship is straight. Correlation may not be an appropriate measure of the strength of the association. Chapter 8 8.14] Residuals. a) The curved pattern in the residuals plot indicates that the linear model is not appropriate. The relationship is not linear. b) The fanned pattern indicates heteroscedastic data. This is a fancy statistics term that means the equal variances assumption has been violated. c) The scattered residuals plot indicates an appropriate linear model. 8.16] Roller coaster. a) The explanatory variable (x) is initial drop, measured in feet, and the response variable (y) is duration, measured in seconds. b) The units of the slope are seconds per foot. c) The slope of the regression line predicting duration from initial drop should be positive. Coasters with higher initial drops probably provide longer rides.

8.28] Last ride. a) According to the linear model, the duration of a coaster ride is expected to increase by about 0.242 seconds for each additional foot of initial drop. b) 91.033 0.242 91.033 0.242 200 139.433 According to the linear model, a coaster with a 200 foot initial drop is expected to last 139.433 seconds. c) 91.033 0.242 150 127.333 2.12 minutes According to the linear model, a coaster with a 150 foot initial drop is expected to last 127.333 seconds. The advertised duration is shorter, at 120 seconds. The difference is 120 seconds 127.333 seconds = 7.333 seconds. This is called a residual. 8.36 (a-d)] Interest Rates and mortgages again. a) Yes, the regression seems appropriate. Both interest rate and total mortgages are quantitative, and the scatterplot looks pretty straight. The spread is fairly constant, and there are no outliers. b) We can use the summary data to find the regression equation. 0.84.. 7.768 151.9 7.768 8.88 220.88 The regression equation is: 220.88 7.768 c) With an interest rate of 20%, we d predict a mortgage amount of 220.88 7.768 20 65.52 million. d) Yes, I would have reservations about this prediction. We are extrapolating beyond the range of our data. We cannot be sure the linear relationship still holds.

8.38] Online clothes II. 8.62] Gators a) We can use the summary data to find the regression equation. 0.722.. 0.0108 572.52 0.0108 50343.40 28.81 The regression equation is: 28.81 0.0108 b) Yes, the regression seems appropriate. Both variables are quantitative, and the scatterplot looks pretty straight. The spread is fairly constant, and there are no outliers. c) With an income of $20,000, we d predict a total yearly purchases amount of 28.81 0.0108 20000 244.81. With an income of $80,000, we d predict a total yearly purchases amount of 28.81 0.0108 80000 892.81. d) This is the R 2 value. 0.722 0.5213. This means that 52.13% of the variation in purchases is explained by the linear regression with income. e) This regression may be useful. The R 2 value isn t great, but the line is still explaining something. a) Weight is the proper dependent variable. The researchers can estimate length from the air, and use length to predict weight. b) The correlation between an alligator s length and weight is 0.836 0.914. Since length and weight are positively associated, the correlation is r = 0.914. c) The linear regression model that predicts and alligator s weight from its length is 393 5.9 d) For each additional inch in length, the model predicts an increase of 5.9 pounds in weight. e) The estimates made using this model should be fairly accurate. The model accounts for 83.6% of the variability in weight. However, care should be taken. With no scatterplot, and no residuals plot, we cannot verify the regression condition of linearity. The association between length and weight may be curved, in which case, the linear model is not appropriate.

Chapter 9 9.4] HDI revisited. a) Fitting a linear model to the association between the number of cell phones and HDI would be misleading, since the relationship is not straight. b) The residuals plot will be curved downward. Imagine a linear regression line through the points. At the left side, the line will be above the data point. Residuals will be negative. In the middle of the plot the line will be below the data points, so the residuals will be positive. At the right of the plot, the line is again above the data points, so the residuals will be negative again. So the residuals will go from negative to positive to negative, a downward curve. 9.14] The extra point revisited. 1) -0.45 d. Point d is influential. Its addition will pull the slope of the regression line toward point d, resulting in the steepest negative slope, a slope of 0.45. 2) -0.30 e. Point e is very influential, but since it is far away from the group of points, its addition will only pull the slope down slightly. The slope is 0.30. 3) 0.00 c. Point c is directly below the middle of the group of points. Its position is directly below the mean of the explanatory variable. It has no influence. Its addition will leave the slope the same, 0. 4) 0.05 b. Point b is almost in the center of the group of points, but not quite. It has very little influence, but what influence it has is positive. The slope will increase very slightly with its addition, to 0.05. 5) 0.85 a. Point a is very influential. Its addition will pull the regression line up to its steepest positive slope, 0.85.