ST 311 Evening Problem Session Solutions Week 11


1. p. 175, Question 32 (Modules 10.1-10.4) [Learning Objectives J1, J3, J9, J11-14, J17]

Since 1980, average mortgage rates have fluctuated from a low of under 6% to a high of over 14%. Is there a relationship between the amount of money people borrow and the interest rate that's offered? Here is a scatterplot of Total Mortgages in the United States (in millions of 2005 dollars) versus Interest Rate at various times over the past 26 years.

R-Squared: 0.706
Reg Equation: TotalMortgages = 220.89 - 7.78 InterestRate

a) Identify the dependent and independent variables.
b) Interpret the meaning of R² in this context.
c) Find the correlation coefficient. If we were to measure Total Mortgages in thousands of dollars instead of millions of dollars, how would the correlation coefficient change?
d) Do these data provide proof that if mortgage rates are lowered, the total mortgage amount that people will take out will increase? Explain.
e) Interpret the meaning of the slope of the regression line in this context.
f) Suppose we discovered a missing measurement that recorded the Total Mortgages as $180 million for an interest rate of 14%. What would its predicted value and residual value be? How will this value affect the slope of our line? What would you expect to happen to the R² value? Explain.

Solutions:
a) The dependent variable is Total Mortgages and the independent variable is Interest Rate.
b) The R² of 0.706 means that Interest Rate accounts for 70.6% of the variation in Total Mortgages.

c) First make a note of the sign of the slope in this problem. Our slope here is -7.78, so we need to take the negative square root of R². The correlation coefficient is r = -√0.706 = -0.840. Changing the units from millions to thousands of dollars would not change the correlation coefficient, because correlation is unaffected by a change of units.
d) These data do not indicate causation; they show only that interest rates are correlated with the amount of mortgages. To establish causation, we would have needed to do an experiment.
e) The slope of -7.78 indicates that when interest rates increase by 1 percentage point, we would expect an average decrease in total mortgages of $7.78 million.
f) The predicted value would be
ŷ = 220.89 - 7.78(14) = 111.97
and the associated residual value would be
Residual = Observed - Predicted = 180 - 111.97 = $68.03 million.
This value will make our regression line flatter, so the new slope would be closer to 0. The R² value should decrease, because the new point lies well above the line.

2. p. 204, Question 30 (Modules 10.1-10.4) [Learning Objectives J1, J3, J9, J11-14]

Here is a scatterplot of the number of wins by American League baseball teams and the average attendance at their home games for the 2006 season, and part of the regression analysis.

R-Squared: 0.485
Reg Equation: HomeAttendance = -14364.5 + 538.915 Wins

a) Identify the dependent and independent variables.
b) Interpret the meaning of R² in this context.
c) Find the correlation coefficient.
d) Estimate the Average Attendance for a team with 72 Wins.
e) Interpret the meaning of the slope of the regression line in this context.
f) The St. Louis Cardinals, the 2006 World Champions, are not included in these data because they are a National League team. During the 2006 regular season, the Cardinals won 83 games and averaged 42,588 fans at their home games. Calculate the residual for this team, and explain what it means. How will this value impact the slope of the line? Explain.

Solutions:
a) The dependent variable is Home Attendance and the independent variable is Wins.
b) The R² of 0.485 means that Wins account for 48.5% of the variation in Home Attendance.
c) First make a note of the sign of the slope in this problem. Our slope here is 538.915, so we need to take the positive square root of R². The correlation coefficient is r = √0.485 = 0.696.
d) ŷ = -14364.5 + 538.915(72) = 24437.4
e) The slope of 538.915 indicates that for every additional win, we would expect the average home attendance to increase by about 539 people.
f) The predicted value would be
ŷ = -14364.5 + 538.915(83) = 30365.4
and the associated residual value would be
Residual = Observed - Predicted = 42588 - 30365.4 = 12222.6
This residual means that the Cardinals' average attendance was about 12,223 people higher than we would expect given the number of wins in their season. If we included this value in our data set, it would make the slope increase.

3. p. 235, Question 30 (Modules 10.1-10.4) [Learning Objectives J1, J9, J11-12, J14]

Information was gathered about the condition and ages of bridges in Tompkins County, NY built since 1880. Below you can find the corresponding scatterplot and some simple linear regression output from StatCrunch.

R-Squared: 0.518
Reg. Equation: Condition = -44.991 + 0.0256 Year

a) Identify the dependent and independent variables.
b) Interpret the meaning of the slope of the regression line in this context.
c) Tompkins County is the home of the oldest covered bridge in daily use in New York. Built in 1853, it is judged to have a condition of 4.523. If we use this regression to predict the condition of the covered bridge, what would its predicted value and residual value be?
d) How do you think this will impact the regression slope? Explain.
e) If we add the covered bridge (from c) to the data, what would you expect to happen to the R² value? Explain.
f) The Tompkins County bridge (from c) was extensively restored in 1972. If we use that date instead of 1853, do you find the condition of the bridge remarkable?

Solutions:
(a) The dependent variable is Condition and the independent variable is Year.
(b) The slope of 0.0256 indicates that as we move later by one year (e.g., 1940 to 1941), we would expect the average condition to increase by 0.0256.
(c) The predicted value would be
ŷ = -44.991 + 0.0256(1853) = 2.4458
and the associated residual value would be
Residual = Observed - Predicted = 4.523 - 2.4458 = 2.0772

(d) This value will make our regression line flatter, so the new slope would be closer to 0.
(e) If we add the bridge to the data, we would expect the R² value to decrease because we would be adding an outlier into the data set, so the line would fit worse.
(f) No. If we consider the year as 1972, then we would predict the condition to be
ŷ = -44.991 + 0.0256(1972) = 5.4922
which gives a residual of 4.523 - 5.4922 = -0.9692, a little less than 1 in magnitude. While the observed value (4.523) is somewhat different from the predicted value (5.4922), if you look at the other bridges built in 1972, it is not unusual to see bridges with condition numbers around 4.5.

4. Additional Question 1 (Modules 10.1-10.4) [Learning Objectives J2, J4, J8, 16]

The following partial regression output explores the relationship between shoe size and height (in inches).

Simple linear regression results:
Dependent Variable: Height
Independent Variable: shoe
Sample size: 389
R (correlation coefficient) = 0.8869
R-sq = 0.78655535
Estimate of error standard deviation: 1.9304528
Parameter Estimates:

Parameter   Estimate    Std. Err.    Alternative  DF   T-Stat     P-Value
Intercept   50.711956   0.45652923   ≠ 0          387  111.08151  < 0.0001
Slope       1.8122975   0.04799014   ≠ 0          387  37.763954  < 0.0001

(a) Describe the relationship between shoe size and height shown in the scatterplot. When commenting on the strength of the relationship, include a specific number from the output that is used to determine if the relationship is strong or weak.
(b) What is the equation of the regression model (report values to 2 decimal places)?
(c) Would it be appropriate to use the model from part (b) to predict the height of a person with a size 4 shoe? Explain why or why not.

Solutions:
(a) Overall, there is a strong, positive linear relationship with no obvious outliers. The relationship is strong since r = 0.8869, which is close to 1. (Note: you could also report that R² is close to 1: R² = 0.7866.)
(b) ŷ = 50.71 + 1.81x
(c) No, since a size 4 shoe is beyond the range of the data we have, and we don't know whether the relationship remains the same as what we see here.
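
The arithmetic in the solutions above can be checked with a short script. This is only a sketch: the helper names (predict, residual, correlation) are made up for illustration and are not part of StatCrunch or the course materials. It also assumes that the slope in Problem 1 and the intercepts in Problems 2 and 3 are negative, since those are the signs that reproduce the worked predictions.

```python
import math


def predict(intercept, slope, x):
    """Predicted value: y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted


def correlation(r_squared, slope):
    """r is the square root of R^2, carrying the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Problem 1: Total Mortgages vs. Interest Rate (slope assumed negative)
r1 = correlation(0.706, -7.78)
yhat1 = predict(220.89, -7.78, 14)
res1 = residual(180, yhat1)

# Problem 2: Home Attendance vs. Wins (intercept assumed negative)
r2 = correlation(0.485, 538.915)
yhat2_72 = predict(-14364.5, 538.915, 72)
yhat2_83 = predict(-14364.5, 538.915, 83)
res2 = residual(42588, yhat2_83)

# Problem 3: Condition vs. Year (intercept assumed negative)
yhat3_1853 = predict(-44.991, 0.0256, 1853)
res3_1853 = residual(4.523, yhat3_1853)
yhat3_1972 = predict(-44.991, 0.0256, 1972)
res3_1972 = residual(4.523, yhat3_1972)

# Problem 4: each T-Stat is the estimate divided by its standard error
t_intercept = 50.711956 / 0.45652923
t_slope = 1.8122975 / 0.04799014

print(round(r1, 3), round(yhat1, 2), round(res1, 2))
print(round(r2, 3), round(yhat2_72, 1), round(yhat2_83, 1), round(res2, 1))
print(round(yhat3_1853, 4), round(res3_1853, 4))
print(round(yhat3_1972, 4), round(res3_1972, 4))
print(round(t_intercept, 2), round(t_slope, 2))
```

Taking the sign of r from the slope with math.copysign mirrors the rule used in the solutions: the correlation coefficient is the positive or negative square root of R², depending on the direction of the fitted line.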