c 2015, Jeffrey S. Simonoff 1




Modeling Lowe's sales

Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have a great effect on sales. One such industry is the building supply industry, since contractor work is a driving force for such purchases. Is it possible to model sales of Lowe's Companies (the world's second largest home improvement retailer and the 14th largest retailer in the U.S.) as a function of generally available economic factors related to the housing industry?

The data studied here were gathered by Mike Nannizzi, and refer to 79 consecutive quarters from the first quarter of 1983 through the third quarter of 2002. We are interested in modeling Lowe's quarterly sales, in millions of dollars, as a function of housing starts (in millions) and average mortgage rate (I also thank Mike for some of the financial analysis quoted here). Examination of the revenue variable shows that it is right-tailed; since it is a money variable, it is natural to take the target variable as logged (base 10) sales. That is, we will fit a semilog model. Recall, by the way, that these sales are in millions of dollars, so these quarterly sales are as big as $7.5 billion. There's a lot of money in hammers and nails!

Here are scatter plots of logged sales versus housing starts and mortgage rate. As would be expected, there is a direct relationship with housing starts (more new houses meaning more building supplies), and an inverse relationship with mortgage rate (higher rates meaning fewer purchases of houses, with the resultant fewer repairs). We also see evidence in both plots of two distinct subgroups in the data, with apparently different relationships between the variables. The group with flatter sales corresponds to the 1980s, while that with higher sales corresponds to the 1990s.

There is also a strong relationship between logged sales and time, reflecting an annual proportional growth in sales. Once again we see evidence that the 1980s and 1990s correspond to two distinct time periods. Why would that be? Unlike Home Depot, which was the market leader in the (urban and suburban) home improvement industry, Lowe's spent the 1980s in mostly rural markets, aiming to support local contractors. As the home improvement concept became tremendously profitable into the 1990s, Lowe's changed its focus to compete more directly with Home Depot.

Here are the results of fitting the model of logged revenue on the three predictors:

Regression Analysis: Log Sales versus Housing starts, Mortgage, Time

Analysis of Variance

Source            DF   Adj SS   Adj MS  F-Value  P-Value
Regression         3  11.8100  3.93665  4924.32    0.000
  Housing starts   1   0.3727  0.37271   466.23    0.000
  Mortgage         1   0.0138  0.01377    17.23    0.000
  Time             1   2.4666  2.46663  3085.49    0.000
Error             75   0.0600  0.00080
Total             78  11.8699

Model Summary

        S    R-sq  R-sq(adj)  R-sq(pred)
0.0282742  99.49%     99.47%      99.44%

Coefficients

Term                Coef   SE Coef  T-Value  P-Value   VIF
Constant          1.8700    0.0444    42.16    0.000
Housing starts   0.09847   0.00456    21.59    0.000  1.13
Mortgage         0.01551   0.00374     4.15    0.000  5.67
Time            0.018073  0.000325    55.55    0.000  5.44

Regression Equation

Log Sales = 1.8700 + 0.09847 Housing starts + 0.01551 Mortgage + 0.018073 Time

The regression fit is apparently very strong. The coefficients can be interpreted as follows. An increase of one million housing starts in a quarter is associated with increasing sales by 25.5%, holding all else fixed (10^0.0985 = 1.255). The coefficient for mortgage rates is puzzling, as it is positive; an increase in mortgage rate by one percentage point is associated with an increase in sales of 3.6% (10^0.01551 = 1.036), holding all else fixed. In fact, this variable adds little to the fit, as the model with it removed has R^2 = .994. Finally, given the other variables, there is a 4.2% quarterly increase in sales (10^0.01807 = 1.042).

Unfortunately, there are problems with this model. There is apparently structure left in the data, related to the time effect noted earlier. In addition, there is a strong effect that sales in the third quarter are systematically lower than during the rest of the year.
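The percentage interpretations above come from back-transforming a base-10 semilog coefficient: a one-unit increase in a predictor multiplies expected sales by 10 raised to the coefficient. Here is a minimal Python sketch of that arithmetic, using the coefficients from the three-predictor output; the function name pct_change is only an illustrative choice, not anything produced by Minitab.

```python
def pct_change(coef, delta=1.0):
    """Percent change in the untransformed response associated with a
    `delta`-unit increase in a predictor, when the response is log10."""
    return (10 ** (coef * delta) - 1) * 100


# Coefficients copied from the three-predictor regression output.
coefs = {"Housing starts": 0.09847, "Mortgage": 0.01551, "Time": 0.018073}

for name, b in coefs.items():
    print(f"{name}: about {pct_change(b):.2f}% per unit increase")
```

The printed values agree with the 25.5%, 3.6% and 4.2% quoted in the text up to rounding of the coefficients.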

We can try to address these model deficiencies by adding two more predictors: Time^2, to address the parabolic pattern in the residuals related to time, and an indicator variable identifying the third quarter. Here is the resultant regression output:

Analysis of Variance

Source            DF   Adj SS   Adj MS  F-Value  P-Value
Regression         5  11.8416  2.36831  6100.00    0.000
  Housing starts   1   0.2340  0.23395   602.58    0.000
  Mortgage         1   0.0010  0.00103     2.66    0.107
  Time             1   0.1370  0.13696   352.77    0.000
  Time sq          1   0.0103  0.01035    26.65    0.000
  Q3               1   0.0168  0.01681    43.29    0.000
Error             73   0.0283  0.00039
Total             78  11.8699

Model Summary

        S    R-sq  R-sq(adj)  R-sq(pred)
0.0197040  99.76%     99.74%      99.71%

Coefficients

Term                Coef   SE Coef  T-Value  P-Value    VIF
Constant          2.0649    0.0489    42.27    0.000
Housing starts   0.09386   0.00382    24.55    0.000   1.64
Mortgage         0.00517   0.00317     1.63    0.107   8.43
Time            0.014280  0.000760    18.78    0.000  61.17
Time sq         0.000037  0.000007     5.16    0.000  37.43
Q3              -0.03489   0.00530    -6.58    0.000   1.08

Regression Equation

Log Sales = 2.0649 + 0.09386 Housing starts + 0.00517 Mortgage + 0.014280 Time + 0.000037 Time sq - 0.03489 Q3

The collinearity between Time and Time^2 is to be expected, so we don't have to worry about that. Apparently we don't need mortgage rate now, so that original positive coefficient wasn't something to worry about anyway:

Analysis of Variance

Source            DF   Adj SS   Adj MS  F-Value  P-Value
Regression         4  11.8405  2.96013  7457.39    0.000
  Housing starts   1   0.2329  0.23293   586.82    0.000
  Time             1   0.2886  0.28855   726.94    0.000
  Time sq          1   0.0214  0.02136    53.82    0.000
  Q3               1   0.0165  0.01650    41.56    0.000
Error             74   0.0294  0.00040
Total             78  11.8699

Model Summary

        S    R-sq  R-sq(adj)  R-sq(pred)
0.0199233  99.75%     99.74%      99.71%

Coefficients

Term                Coef   SE Coef  T-Value  P-Value    VIF
Constant          2.1379    0.0197   108.65    0.000
Housing starts   0.09349   0.00386    24.22    0.000   1.63
Time            0.013331  0.000494    26.96    0.000  25.30
Time sq         0.000044  0.000006     7.34    0.000  25.25
Q3              -0.03453   0.00536    -6.45    0.000   1.08

Regression Equation

Log Sales = 2.1379 + 0.09349 Housing starts + 0.013331 Time + 0.000044 Time sq - 0.03453 Q3

Given time, and whether it is the third quarter, one million additional housing starts is associated with an expected 24.0% increase in Lowe's sales (10^0.09349 = 1.240). Given time and the number of housing starts, sales are about 7.6% lower in the third quarter (10^-0.03453 = 0.924). Why would this be? We wouldn't be surprised to see higher sales in the first part of the year, since that is the peak construction season in the northern part of the country, but why wouldn't this affect the fourth quarter as well? In fact, there is evidence that Lowe's sold goods at a steeper discount in the fourth quarter, as its income as a percentage of sales is one third lower than in any of the other three quarters. This could, perhaps, reflect a desire to pump up end-of-year sales, so as to meet analysts' sales expectations.

The time effect is a little trickier, since it is a quadratic relationship. Since the coefficient for Time^2 is positive, we're seeing an increasing growth rate in sales over time, and a little calculus can make that more specific. Given all else is held fixed, the expected rate of change of the response as a function of a predictor x when x is in the model quadratically (β1 x + β2 x^2) is just the partial derivative with respect to x, or β1 + 2 β2 x. Thus, given all else is held fixed, at the first quarter of 1983 the estimated expected time-related rate of sales growth is 3.1% (.0133314 + (2)(.00004389)(1) = .0134, and 10^0.0134 = 1.031); on the other hand, given all else is fixed, at the first quarter of 2002 the estimated expected time-related rate of sales growth is 4.7% (.0133314 + (2)(.00004389)(77) = .0201, and 10^0.0201 = 1.047). Thus, unless economic conditions change, it seems that Lowe's sales can be expected to continue to rise.

The model now seems to fit pretty well (although the plots of residuals versus housing starts and time of year seem to hint at nonconstant variance).
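The growth-rate calculation for the quadratic time effect can be sketched in a few lines of Python. The two coefficients are the unrounded Time and Time sq values quoted in the text, and the time index is taken to run from 1 (first quarter of 1983) to 77 (first quarter of 2002), as in the text; the function name is hypothetical.

```python
B1 = 0.0133314    # Time coefficient, as quoted in the text
B2 = 0.00004389   # Time^2 coefficient, as quoted in the text


def quarterly_growth_pct(t, b1=B1, b2=B2):
    """Estimated time-related sales growth (percent per quarter) at time
    index t, holding the other predictors fixed.

    The slope of log10(sales) in t is the derivative b1 + 2*b2*t;
    raising 10 to that slope gives the multiplicative change per quarter.
    """
    slope = b1 + 2 * b2 * t
    return (10 ** slope - 1) * 100


for label, t in [("1983 Q1", 1), ("2002 Q1", 77)]:
    print(f"{label}: {quarterly_growth_pct(t):.1f}% per quarter")
```

Evaluating the function at other values of t shows how the implied growth rate rises steadily across the sample period.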

Given the very high R^2, we can say that given housing starts and the time-related variables, we can predict Lowe's sales very accurately. Indeed, the standard error of the estimate s = .0199 implies that 95% of the time Lowe's sales are predicted to within roughly 9-10% high or low (10^-0.0398 = .912; 10^0.0398 = 1.096). Of course, that translates into as much as ±$750 million, so we shouldn't get too excited!

Another potential approach we could have taken here is to split the data into pre-1990 and post-1990 groups, being consistent with the earlier scatter plots. We can do this using the pooled / constant shift / full model approach we discussed earlier. Here is the full model fit:

Analysis of Variance

Source            DF   Adj SS   Adj MS  F-Value  P-Value
Regression         9  11.8515  1.31683  4923.18    0.000
  Housing starts   1   0.0889  0.08893   332.48    0.000
  Time             1   1.2738  1.27379  4762.25    0.000
  Mortgage         1   0.0059  0.00586    21.90    0.000
  Q3               1   0.0155  0.01555    58.12    0.000
  1980s            1   0.0086  0.00864    32.29    0.000
  Housing80s       1   0.0001  0.00007     0.25    0.619
  Time80s          1   0.0138  0.01385    51.77    0.000
  Mortgage80s      1   0.0052  0.00523    19.56    0.000
  Q380s            1   0.0031  0.00313    11.70    0.001
Error             69   0.0185  0.00027
Total             78  11.8699

Model Summary

        S    R-sq  R-sq(adj)  R-sq(pred)
0.0163547  99.84%     99.82%      99.79%

Coefficients

Term                 Coef   SE Coef  T-Value  P-Value     VIF
Constant           1.8473    0.0394    46.88    0.000
Housing starts    0.08892   0.00488    18.23    0.000    3.87
Time             0.019151  0.000278    69.01    0.000   11.83
Mortgage          0.01631   0.00349     4.68    0.000   14.77
Q3               -0.04195   0.00550    -7.62    0.000    1.69
1980s              0.4004    0.0705     5.68    0.000  329.84
Housing80s       -0.00356   0.00714    -0.50    0.619   60.92
Time80s         -0.005737  0.000797    -7.20    0.000   12.17
Mortgage80s      -0.02239   0.00506    -4.42    0.000  233.21
Q380s             0.03226   0.00943     3.42    0.001    2.12

Regression Equation

Log Sales = 1.8473 + 0.08892 Housing starts + 0.019151 Time + 0.01631 Mortgage - 0.04195 Q3 + 0.4004 1980s - 0.00356 Housing80s - 0.005737 Time80s - 0.02239 Mortgage80s + 0.03226 Q380s

Separate slopes for the housing starts variable don't seem to be supported:

Analysis of Variance

Source            DF   Adj SS   Adj MS  F-Value  P-Value
Regression         8  11.8514  1.48142  5598.58    0.000
  Housing starts   1   0.1606  0.16057   606.82    0.000
  Time             1   1.5004  1.50040  5670.29    0.000
  Mortgage         1   0.0059  0.00586    22.16    0.000
  Q3               1   0.0158  0.01578    59.63    0.000
  1980s            1   0.0095  0.00950    35.92    0.000
  Time80s          1   0.0138  0.01380    52.14    0.000
  Mortgage80s      1   0.0052  0.00518    19.58    0.000
  Q380s            1   0.0031  0.00314    11.85    0.001
Error             70   0.0185  0.00026
Total             78  11.8699

Model Summary

        S    R-sq  R-sq(adj)  R-sq(pred)
0.0162667  99.84%     99.83%      99.80%

Coefficients

Term                 Coef   SE Coef  T-Value  P-Value     VIF
Constant           1.8501    0.0388    47.72    0.000
Housing starts    0.08726   0.00354    24.63    0.000    2.07
Time             0.019204  0.000255    75.30    0.000   10.10
Mortgage          0.01632   0.00347     4.71    0.000   14.76
Q3               -0.04140   0.00536    -7.72    0.000    1.62
1980s              0.3866    0.0645     5.99    0.000  279.51
Time80s         -0.005695  0.000789    -7.22    0.000   12.04
Mortgage80s      -0.02224   0.00503    -4.42    0.000  232.37
Q380s             0.03088   0.00897     3.44    0.001    1.94

Regression Equation

Log Sales = 1.8501 + 0.08726 Housing starts + 0.019204 Time + 0.01632 Mortgage - 0.04140 Q3 + 0.3866 1980s - 0.005695 Time80s - 0.02224 Mortgage80s + 0.03088 Q380s

This model implies predictions of sales to within 7-8%, roughly 95% of the time. The model yields two fitted lines: for the 1980s,

LogSales = 2.2368 + .0873 Housing starts + .01351 Time - .0059 Mortgage rate - .0105 Q3,

and for post-1990,

LogSales = 1.8501 + .0873 Housing starts + .0192 Time + .0163 Mortgage rate - .0414 Q3.

The housing starts effect is very similar to that in the quadratic model, and the third quarter effect was stronger in the later time period. Consistent with the increasing predictions from the quadratic model, the estimated quarterly rate of change in sales (given the other variables) was 3.2% in the earlier time period (10^0.01351 = 1.032), and 4.5% in the latter time period (10^0.0192 = 1.045), certainly good news for Lowe's. Interestingly, a similar analysis to this one using Home Depot revenues shows the opposite pattern, with the rate of change of Home Depot's revenues decreasing in recent time periods. Perhaps this accounts for the relatively poor performance of Home Depot stock; Home Depot's price dropped more than 50% from June 2002 to March 2003, while that of Lowe's dropped only (?) 15%.

There are two other points worth mentioning here. These data form a time series, of course, and even though the plot of standardized residuals versus time didn't show apparent autocorrelation, there is, in fact, some autocorrelation in the residuals. It's not that important, however; some basic time series remedies (which we will talk about later) only change the standard error of the estimate from .0163 to .016. In addition, we should recognize that part of the time trend effect that we are seeing is presumably an inflation effect; an analysis that avoided that (uninteresting) effect could be accomplished by using constant dollar sales (inflation-adjusted), rather than the actual (nominal) dollar sales.

Minitab commands

To create all K indicators for a categorical variable (like quarter) click on Calc → Make Indicator Variables and enter the variable name under Indicator variables for:. The program will choose default names for the indicators, but you can change them if you wish.
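Returning to the final model with separate 1980s terms: the two fitted lines quoted for it come from adding each 1980s product-term coefficient to the corresponding base coefficient, and the "within 7-8%" statement comes from back-transforming plus or minus two standard errors of the estimate. The Python sketch below redoes both pieces of arithmetic from the printed output; the dictionary layout and function names are illustrative assumptions, not Minitab features.

```python
# Base coefficients from the final output (these apply to the post-1990 period).
base = {"Constant": 1.8501, "Housing starts": 0.08726, "Time": 0.019204,
        "Mortgage": 0.01632, "Q3": -0.04140}

# Shifts that apply only when the 1980s indicator equals 1
# (the 1980s, Time80s, Mortgage80s and Q380s coefficients).
shift_80s = {"Constant": 0.3866, "Time": -0.005695,
             "Mortgage": -0.02224, "Q3": 0.03088}


def period_line(is_1980s):
    """Implied coefficients of the fitted line for one time period."""
    return {term: round(b + (shift_80s.get(term, 0.0) if is_1980s else 0.0), 5)
            for term, b in base.items()}


def two_s_bounds(s):
    """Multiplicative bounds 10^(-2s) and 10^(+2s) for a log10 response."""
    return 10 ** (-2 * s), 10 ** (2 * s)


print("1980s:    ", period_line(True))
print("post-1990:", period_line(False))
low, high = two_s_bounds(0.0162667)
print(f"rough 95% bounds: {low:.3f} to {high:.3f} times the prediction")
```

The 1980s constant computed this way is 2.2367 rather than the 2.2368 quoted in the text, a difference presumably due to rounding in the displayed coefficients.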