SELF-TEST: SIMPLE REGRESSION

ECO 22000 McRAE

Note: Questions marked (N) are unlikely to appear in this form on an in-class examination, but you should be able to describe the procedures used to get an answer and be able to interpret the answers.

1. What are the assumptions involved in simple linear regression?
2. What line does the method of least squares actually find?
3. What information might we get from a scatter plot of y against x?
4. Describe how to use Excel to create a scatter diagram.
5. Describe how to use Excel to calculate a regression line.
6. The regression equation of starting salary on GPA for a sample of recent graduates of RCCC is salary = 8000 + 3500 * GPA. Randy just graduated with a GPA of 2.6; what starting salary would the regression equation predict for him?
7. For a cross-section of companies, a marketing analyst regressed sales on advertising expenditures, resulting in the following Excel output:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9
R Square             0.81
Adjusted R Square    0.80
Standard Error       100
Observations         45

ANOVA
              df    SS        MS       F     Significance F
Regression     1    48000     48000    16    0.000245081
Residual      43    129000    3000
Total         44    173000

              Coefficients   Standard Error   t Stat   P-value
Intercept     400            75               5.33     3.41301E-06
Advertising   0.5            0.125            4        0.000245081

a) Write out the regression equation, showing sales as a function of advertising expenditures.
b) Give a point prediction for sales for a company whose advertising expenditures equal $7,000.
c) Give a 95% confidence interval for the average sales level for a company spending $2,000 on advertising. Assume the mean advertising expenditure = $4,000.
d) Give a 95% confidence interval for a specific value of y for a company spending $2,000 on advertising, with x̄ = $4,000.
e) Explain why the intervals in c. and d. are not the same.
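
As a rough check on parts 7 b) to d), the interval arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the critical value T_CRIT is copied from a t table for 43 degrees of freedom rather than computed, the sum of squared deviations of x is backed out of the printed standard errors, and all function and variable names are made up for this sketch.

```python
import math

# Values read from the Excel output in question 7.
intercept, slope = 400.0, 0.5
s_e = 100.0       # "Standard Error" of the regression
s_b = 0.125       # standard error of the slope
n = 45            # observations
x_bar = 4000.0    # mean advertising expenditure, given in part c)

# Assumption: two-tailed 5% critical t for 43 d.f., copied from a t table.
T_CRIT = 2.0167


def predict(x):
    """Point prediction from the fitted line."""
    return intercept + slope * x


def half_widths(x):
    """Half-widths of the 95% intervals for the mean of y and for a single y."""
    # Because s_b = s_e / sqrt(Sxx), Sxx can be recovered from the output.
    s_xx = (s_e / s_b) ** 2
    leverage = 1.0 / n + (x - x_bar) ** 2 / s_xx
    mean_hw = T_CRIT * s_e * math.sqrt(leverage)
    pred_hw = T_CRIT * s_e * math.sqrt(1.0 + leverage)
    return mean_hw, pred_hw


mean_hw, pred_hw = half_widths(2000.0)
print("7b point prediction:", predict(7000.0))
print("7c interval: %.0f +/- %.2f" % (predict(2000.0), mean_hw))
print("7d interval: %.0f +/- %.2f" % (predict(2000.0), pred_hw))
```

The only difference between the two half-widths is the extra 1 under the square root, which accounts for the scatter of an individual company around the regression line; that is the point of part e).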

Stats II, Regression, page 2

8. What shape do confidence intervals for y values at given x values have? What does this imply about predicted values far from the mean value of x?
9. Say whether the following statement is true or false and explain your answer: If a regression equation has a high r², statisticians see no problem with making extrapolations well beyond the observed range of x and y values.
10. What does the coefficient of correlation measure? How is it related to a regression line?
11. Find the coefficient of correlation between x and y:

x   y
2   5
1   7
6   3

12. To test whether a correlation between x and y is significant, we should test the null hypothesis ____ with alternative hypothesis ____; the test statistic is a ____ with ____ d.f.
13. Describe three different ways to find the correlation coefficient using Excel.
14. Comment on the following: Among the industrial nations, there is a negative correlation between average medical expenditures and life expectancy; this proves that medical care causes people to live shorter lives.
15. r² is called the ____; it is interpreted as giving the ____ in y which is ____ by variation in x.
16. Generally speaking, what does r-squared tell us about a regression equation?
17. ART's engineers regressed production costs on output and found the regression equation: cost = 4000 + 2 * output. In the regression results, s_y.x = 1800 and s_b = 0.6; the regression was based on a sample of 40 days' output and costs. Give a 98% confidence interval for β₁.
18. Using the data of the preceding question, formulate and conduct an appropriate test for the significance of the regression coefficient.
19. The following Excel output was generated by regressing percentage rates of inflation on percentage rates of increase in the money supply:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7
R Square             0.49
Adjusted R Square    0.46
Standard Error       1
Observations         62

ANOVA
              df    SS      MS    F    Significance F
Regression     1    900
Residual      60    6000
Total         61    6900

               Coefficients   Standard Error   t Stat   P-value
Intercept      -1             0.2              -5       0.00032
X Variable 1   1.2            0.4

a) What is the simple correlation coefficient between prices and money?
b) In a t test of H₀: ρ = 0, what is the calculated value of t?
c) In a t test of H₀: β₁ = 0, what is the calculated value of t? At α = 0.01, what should we do with the null hypothesis?
d) In an ANOVA test of this regression equation, what is the critical value of F for α = 0.025? (Use FINV to find the critical value.)
e) What is the calculated value of F in an ANOVA test? Should we accept or reject the null hypothesis of no linear relation between money and inflation?

20. In a regression ANOVA table, how are the following terms defined? Regression sum of squares; residual sum of squares; total sum of squares. What does each represent?
21. In a regression of managers' salaries on firm size, researchers estimated the equation salary = 20000 + 5000 * sales, where sales were measured in millions of dollars. Observation number 42 works at a firm with annual sales of 8 million dollars, and he makes $53,000 a year. What is the residual for observation 42?
22. How could a graph of the residuals from a regression equation help in determining whether ε is normally distributed?
23. How might you use a histogram of the residuals from a regression equation?

A CPA has gathered the following data for a sample of twelve corporations:

Observation #   Long-Term Assets   Long-Term Debt
 1              54                 28
 2              47                 26
 3              60                 39
 4              56                 43
 5              64                 24
 6              26                 16
 7              47                 30
 8              69                 38
 9              62                 43
10              45                 24
11              48                 36
12              39                 20

24. (N) Suppose that we wish to know whether acquiring long-term assets is done primarily by acquiring long-term debt.
a) Designating assets as y and debt as x, use your spreadsheet to find the regression equation of assets on debt; state this equation in algebraic notation.
b) What does the x coefficient tell you about the relation between assets and debt?
c) What is the correlation between assets and debt? Use a t test to find whether we can consider this significant.
d) Use an appropriate t test to test whether the slope of the regression line can be considered different from 0; set your significance level at 5%.
e) At 1% significance, use ANOVA to test H₀: there is no significant linear relation between assets and debt.
f) Make a point prediction of assets for a corporation which has 25 million dollars of long-term debt.
g) Give a 95% prediction interval for the assets of a corporation with 25 million dollars of debt.
h) Give a 95% confidence interval for the average of all corporations that have 25 million dollars of debt.
i) Compute and interpret the residual for observation #9.
j) Give a 90% confidence interval for the value of β.

25. What would you look for in a residual plot that would be a clue to the presence of each of the following conditions?
a) non-normality of the residuals
b) heteroscedasticity
c) non-linearity of the relation between x and y
d) autocorrelation

26. In the ANOVA table, the regression sum of squares is defined as SSR = Σ(ŷ − ȳ)²; explain why that represents the variation in y which is explained by variation in x.
27. The residual sum of squares, or error sum of squares, is defined as SSE = Σ(y − ŷ)²; explain why this term represents the variation in y which is NOT explained by variation in x.
28. r² is defined as r² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)². Explain how this definition leads to the interpretation usually given of r².
29. What condition is indicated by each of the following residual plots? [Residual plots A, B, C, and D are not reproduced in this transcription.]
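
Questions 26 to 28 turn on the identity SST = SSR + SSE. The sketch below illustrates it numerically with the twelve-corporation data from question 24 (debt as x, assets as y). It uses plain Python rather than a spreadsheet, and the helper names fit_line and sums_of_squares are invented for this illustration.

```python
# Data from the twelve-corporation table: x = long-term debt, y = long-term assets.
debt = [28, 26, 39, 43, 24, 16, 30, 38, 43, 24, 36, 20]
assets = [54, 47, 60, 56, 64, 26, 47, 69, 62, 45, 48, 39]


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sums_of_squares(x, y):
    """Return (SSR, SSE, SST) for the least-squares line through the data."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [a + b * xi for xi in x]
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)             # total variation
    return ssr, sse, sst


a, b = fit_line(debt, assets)
ssr, sse, sst = sums_of_squares(debt, assets)
r_squared = 1 - sse / sst
f_ratio = ssr / (sse / (len(debt) - 2))
residual_9 = assets[8] - (a + b * debt[8])

print("line: y-hat = %.2f + %.2f x" % (a, b))
print("SSR + SSE = %.2f, SST = %.2f" % (ssr + sse, sst))
print("r squared = %.3f, F = %.3f" % (r_squared, f_ratio))
print("residual for observation 9 = %.3f" % residual_9)
```

Because SSR and SSE add up to SST, the ratio SSE/SST is the unexplained share of the variation in y, and 1 minus that ratio is the explained share.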

SELF TEST: MULTIPLE REGRESSION

1. Marketing researchers at ART, Inc., have regressed their sales on Gross Domestic Product and their own advertising expenditures with the following result: Sales = 400,000 + 4,000 GDP + 7,000 A
a) What could we predict ART's sales to be if GDP = 6.5 trillion and advertising expenditures = 20 million?
b) If GDP rose to 6.8 trillion, by how much would we expect sales to change?
c) ART wishes to increase its unit sales by 21,000; by how much will they need to increase their advertising budget?
2. Why is the use of adjusted R² preferred to the use of plain R² in multiple regression? What is it we're adjusting for?
3. When is it important to use adjusted R²? When is it not important?
4. R² can be thought of as the proportion of ____ in y which is ____ by ____ in the x's. State the definition of R² and explain why that definition leads to this interpretation.
5. In performing a t test on a coefficient from multiple regression, what null and alternative hypotheses are we testing?

The following Excel output is for questions 6 to 12:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.774597
R Square             0.6
Adjusted R Square    0.52
Standard Error       10.00
Observations         16

ANOVA
              df    SS      MS    F    Significance F
Regression     3    1800
Residual      12    1200
Total         15    3000

               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept      20.00          10               2        0.06866
X Variable 1   10.00          5
X Variable 2   5.00           1.5
X Variable 3   3.00           0.5

6. What is the regression equation?
7. How many degrees of freedom are there in the t Stats?
8. According to the t ratios, which of the regression coefficients would be significant at the 5% level? Which at the 10% level?
9. What is the F ratio? What null hypothesis would be tested with this value? At α = 0.01, can we reject the null hypothesis? Can we reject at α = 0.05?

10. Suppose x₁ = 6, x₂ = 0, and x₃ = 2; what is ŷ?
11. Observation #8 had x₁ = 8, x₂ = 2, and x₃ = 4; for that observation, y = 108. What is the residual for this observation?
12. Find a 95% confidence interval for β₁, the coefficient on variable X₁.
13. What is a dummy variable?
14. A marketing researcher has created a dummy variable for "Owns own home." John lives in an apartment; what value will this dummy have for him? Mary is paying off the mortgage on her condominium; what value will this dummy have for her?
15. In a regression of monthly entertainment expenditures on several things, the dummy of q. 14 had the value −$21. Explain the meaning of this number.
16. What is multicollinearity? How can we detect it?
17. What are the effects in regression analyses of multicollinearity?
18. Suppose the relation between x and y is not linear: how could you detect this nonlinearity?
19. (N) A researcher wishes to be able to predict the number of movies attended in a year's time on the basis of four explanatory variables: age, education, income, and sex. A sample of ten people yields the following data:

No. of Movies   Age   Education   Income   Sex Dummy (Male = 1)
25              18    11          35       1
12              35    13          38       0
21              21    14          35       1
 9              35    16          50       0
18              25    14          36       0
27              21    13          39       1
 4              39    13          37       0
17              31    12          34       0
17              20    14          41       1
 7              40    12          29       0

a) Using your spreadsheet, find the regression equation and write it out in algebraic notation.
b) Explain what each of the regression X coefficients means.
c) Using an appropriate t test, at 5% significance test H₀: βᵢ = 0 for i = 1 to 4.
d) What is the adjusted R²? How would we interpret that number? Why is there so much difference in this case between R² and adjusted R²?
e) Using ANOVA, state and test the appropriate null hypothesis to test whether there is a significant linear relation among these variables.
f) Predict how many movies will be seen by a 37-year-old female high-school graduate whose family income is $43,000 a year.
g) State the 95% confidence interval for each X coefficient.
h) Calculate a 98% confidence interval for β₂.
i) Find the residual for the first observation (25 movies, age 18 and so on).
j) In examining the residual plots generated by Excel, do you detect any problems or violations of the regression assumptions?
k) Does there appear to be significant multicollinearity among the X variables? How do you know that?
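
Questions 6 to 11 need only arithmetic on the printed Excel output. A minimal sketch of that arithmetic follows, with the coefficients, standard errors, and sums of squares typed in from the output table; the dictionary keys and the predict helper are names chosen for this illustration. Judging significance still requires critical values from t and F tables, which the sketch does not look up.

```python
# Coefficients and standard errors typed in from the Excel output for questions 6 to 12.
coefs = {"Intercept": 20.0, "X1": 10.0, "X2": 5.0, "X3": 3.0}
std_errs = {"Intercept": 10.0, "X1": 5.0, "X2": 1.5, "X3": 0.5}

# ANOVA rows from the same output.
ss_regression, df_regression = 1800.0, 3
ss_residual, df_residual = 1200.0, 12

# t ratio = coefficient / standard error (question 8).
t_ratios = {name: coefs[name] / std_errs[name] for name in coefs}

# F ratio = mean square regression / mean square residual (question 9).
f_ratio = (ss_regression / df_regression) / (ss_residual / df_residual)


def predict(x1, x2, x3):
    """Fitted value from the regression equation of question 6."""
    return coefs["Intercept"] + coefs["X1"] * x1 + coefs["X2"] * x2 + coefs["X3"] * x3


y_hat_q10 = predict(6, 0, 2)
residual_q11 = 108 - predict(8, 2, 4)   # residual = observed y minus fitted y

print("t ratios:", t_ratios)
print("F ratio:", f_ratio)
print("question 10 y-hat:", y_hat_q10)
print("question 11 residual:", residual_q11)
```

Note the sign convention: a residual is observed minus fitted, so an observation lying below the regression surface has a negative residual.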

Selected Answers:

Simple Regression:

6. 17,100
7. a. sales = 400 + 0.5 adv
   b. 3900
   c. 1400 ± 505.07
   d. 1400 ± 543.84
11. −0.945
17. 2 ± 1.4574
18. H₀: β = 0; t = 3.33; p-value = 0.0019
19. a. 0.7
    b. 7.59
    c. 3; reject
    d. 5.29
    e. 9; reject
21. −$7,000
24. a. y-hat = 22.62 + 0.94 X
    b. for each one-dollar increase in debt, assets increase 94 cents
    c. 0.71; since p value = 0.0092, we can reject at 1% significance the hypothesis that population correlation = 0.
    d. for α = 0.05, critical t = 2.228 < calculated 3.219, so reject the null that β = 0. (Alternatively, since p < 0.05, reject.)
    e. Critical F = 10.04 < 10.359, so reject null and conclude there is a significant relation. (Alternatively, in ANOVA table p < 0.01, so reject null.)
    f. 46.16
    g. 46.16 ± 20.71
    h. 46.16 ± 6.72
    i. −1.107
    j. 0.41 ≤ β₁ ≤ 1.47
29. a. nothing in particular
    b. autocorrelation
    c. non-linearity
    d. heteroscedasticity

Multiple Regression:

1. 566,000; +1,200; $3 million
6. ŷ = 20 + 10x₁ + 5x₂ + 3x₃
7. 12
8. β₂ and β₃ at 5%; all at 10%
9. F = 6; with 3, 12 d.f., F.01 = 5.95, so reject H₀ at 1% and 5%
10. 86
11. −14
12. 10 ± 10.89
14. 0; 1
15. homeowners typically spend $21 a month less on entertainment
19. a. movies = 56.71 − 0.93 × age − 1.30 × educ + 0.096 × inc − 2.28 × male
    b. movies attended falls by 0.93 for each year age increases, falls by 1.3 for each extra year of education, and increases by about 0.1 for each extra thousand dollars of family income; other things being equal, males attend 2.28 fewer movies a year than females
    c. reject H₀ for β₁ since p = 0.024; fail to reject for i = 2-4 since all p values > 0.05
    d. Adj. R² = 0.77; these four variables explain 77% of the observed variation in movie attendance.
    e. H₀: β₁ = β₂ = β₃ = β₄ = 0 vs. H₁: at least one equality not true. F = 8.549 with p value = 0.018, so at 2% significance we reject null and conclude there is a significant linear relation with at least one of the x variables.
    f. y-hat = 10.82
    g. see output, Lower 95% and Upper 95% columns
    h. 3.37 ± 4.86
    i. since y-hat = 26.72, residual = −1.72
    j. no
    k. yes; education is highly correlated with income and sex with age; use Data Analysis Correlation tool
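
For readers without a spreadsheet, the fit in multiple-regression question 19 can be approximated by solving the normal equations (X'X)b = X'y directly. The sketch below does this in plain Python with Gaussian elimination. It is a substitute for Excel's regression tool, not the method the handout asks for, and the names rows, solve, and beta are invented here. If the data table was transcribed correctly, the coefficients should land close to those in answer 19 a.

```python
# Movie data from question 19: (movies, age, education, income, male dummy).
rows = [
    (25, 18, 11, 35, 1), (12, 35, 13, 38, 0), (21, 21, 14, 35, 1),
    (9, 35, 16, 50, 0), (18, 25, 14, 36, 0), (27, 21, 13, 39, 1),
    (4, 39, 13, 37, 0), (17, 31, 12, 34, 0), (17, 20, 14, 41, 1),
    (7, 40, 12, 29, 0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


y = [float(r[0]) for r in rows]
X = [[1.0] + [float(v) for v in r[1:]] for r in rows]  # leading 1 for the intercept
k = len(X[0])

# Normal equations: (X'X) beta = X'y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for name, b in zip(["intercept", "age", "educ", "inc", "male"], beta):
    print("%-9s %8.3f" % (name, b))
print("residual for first observation: %.2f" % residuals[0])
```

A useful sanity check on any least-squares fit with an intercept is that the residuals sum to zero and are uncorrelated with every predictor; a fit that fails this has a data-entry or arithmetic error.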