DETERMINANTS OF GROCERY STORE SALES: A MULTIPLE REGRESSION APPROACH

by Larry R. Woodward
University of Mary Hardin-Baylor
900 College St.
Belton, Texas 76513
lwoodward@umhb.edu
254-295-4648

ABSTRACT

Grocery store sales forecasting can be one of the most difficult jobs a store manager must do each month. Store sales vary week to week for many reasons including changes in the general price index, economic growth, purchasing power of the local population, seasonal effects, weather, etc. Typical forecasting methods use exponential smoothing or regression models that extrapolate the recent historical trend. Such models are relatively easy to employ but are quite naive, failing to capture all of the factors affecting sales. A large grocery store chain needs to incorporate specific information relating to each store's particular demographic makeup to obtain a more accurate forecast than a simple trend approach provides. In this study, sales are forecast for a grocery store in a low-income area to highlight the need for customized forecasts for each grocery store in the chain. Controlling for activation days on government food purchasing cards, used by many of the patrons of this store, greatly enhances the forecast. Using backward stepwise regression and dummy variables to control for these activation days, in addition to the traditional factors affecting sales, produced a much more accurate forecasting model.

INTRODUCTION

Whether a business provides goods or services, forecasting sales accurately is a vital function for any company. It is the foundation of how a company will plan and perform daily operations. Many companies fail to realize the importance of this management function, which is a key contributor to a company's success. Excellence in sales forecasting can boost a firm's financial health and gratify customers and employees alike.
[4, p.44] Without a good sales forecast a company is seriously handicapped in planning its production, planning its purchasing, scheduling its inventory adjustments, setting sales quotas, budgeting expense, and, in general, doing the sort of planning, scheduling, and controlling which are vital to making as much profit as possible under competitive conditions [2]. Through observation, Richard C. Wiser (Vice President, Financial Planning and Analysis, Mary Kay Cosmetics) and his colleagues discovered that management sometimes does not understand statistical forecasting methods [5]. The purpose of this paper is to compare the current sales forecasting methodology of a nationwide grocery store chain, which uses a naive trend approach, obtained from a
nationally known forecasting vendor, to sales forecasts from a flexible multiple regression model based on easily known idiosyncrasies of a particular store. A regression model is developed for each department of a specific grocery store to predict sales in that department using a variety of independent variables (holidays, activation days, local economic conditions, etc.), depending on the department. These forecast results will be compared, using the mean squared error on 10 weeks of out-of-sample data, to the actual forecasts made by the store management with their current forecasting software.

Current Forecasting Methodology

Currently the store manager takes two days to forecast sales for all store departments for each of the four weeks in the next month. These sales forecasts are sent to the district manager and then to the corporate home office. The accuracy of the sales forecasts directly impacts the evaluation of the store director, his/her promotion, bonuses, etc., as well as the operating efficiency and profitability of the store. At present, each store manager (several hundred in total) uses his/her own method for making sales forecasts in conjunction with a program provided by the home office. This program, used widely in the industry, simply calculates sales projections by adding the sales trend selected to the previous year's sales. The manager then makes adjustments based on his/her experience, intuition, and professional judgment. Based on a store manager making $100,000 a year, 260 work days in a year, and two days a month to make forecasts, each manager is being paid approximately $9,230 each year just to make forecasts. If these forecasts are not accurate, much of this money is wasted. The degree to which a store manager understands the various factors that drive his/her store sales will vary with the experience of the manager.
Incorrect assumptions about factors which affect sales growth for a particular store could possibly hinder future sales growth for the company as a whole. With an improved forecasting method, store managers can reduce the time they personally spend on forecasting, properly schedule employee work hours, improve order writing, control inventory levels, reduce shrinkage, create efficient store operations, and give better customer service.

DATA

For this study, one grocery store was used as a pilot study to determine if the current method of forecasting should be revised for the entire chain. The particular store was in a low-income neighborhood in a mid-sized city in Texas with a predominantly African-American and Hispanic clientele. It averages several hundred thousand dollars in weekly sales and contains nine typical departments found in most major grocery store chains. Forecasts for the six largest departments (grocery, market, drugs, pharmacy, produce and gasoline) will be included in this paper to help obfuscate the identity of the actual store. Weekly store sales and departmental sales were collected from the last two fiscal years (FY 2003 & 2004). Since sales at grocery stores are impacted by holidays, special dummy variables were formed for the following holidays: Christmas Eve, Christmas,
New Year's Eve, New Year's Day, Easter, Cinco de Mayo, Mother's Day, Father's Day, July 4, Labor Day, Ash Wednesday, Valentine's Day, Memorial Day, Thanksgiving, and Spring Break. Each holiday was analyzed as a separate dummy variable for each individual department forecast model to determine which specific holidays had the greatest impact on sales. For some departments, the individual day was very significant. For instance, Mother's Day and Valentine's Day were the primary determinants for explaining variability in floral sales (results not shown in this paper). However, such holidays were not individually statistically significant in explaining grocery or market sales. Therefore, for most of the departments, holidays were also grouped together as either "special days but open" or "special days but closed." The weekly average price for all grades of gasoline was used as an independent variable for the gasoline department. A variety of economic data was obtained to capture changes in the local economy. However, the size of the labor force, unemployment, interest rates, and population growth changed little over the two years studied, had little statistical significance, and were thus dropped from the models.

"Activation days" are the days of every month when customers who receive government assistance receive their monthly balances on Lone Star cards, used for food purchases only, and social security checks. These days include the 1st, 3rd, 5th, 6th, 7th, 9th, 11th, 12th, 13th, and 15th of every month. This form of payment was found to have an enormous impact on store sales, especially at stores with low-income clientele.

METHODOLOGY

A multiple regression model was formed for each of the six largest departments using the backward stepwise approach.
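The backward stepwise idea can be illustrated with a minimal sketch: fit the full model, drop the weakest variable, and refit until every remaining variable passes a significance cutoff. The sketch below is not the software used in the study. It assumes ordinary least squares, and it uses a cutoff of |t| >= 2.0 as a rough stand-in for a p-value test at about the .05 level. The function names, the toy data, and the variable "unrelated_dummy" are all hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination.

    Raises ZeroDivisionError if the matrix is exactly singular.
    """
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_t_stats(columns, y):
    """Regress y on an intercept plus the given columns.

    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    n = len(y)
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t_stats = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, t_stats


def backward_eliminate(features, y, t_cutoff=2.0):
    """Drop the variable with the smallest |t| until all pass the cutoff.

    features maps a variable name to its list of observations.
    Returns (kept_names, coefficients), intercept first.
    """
    kept = dict(features)
    while kept:
        names = list(kept)
        beta, t = ols_t_stats([kept[name] for name in names], y)
        # t[0] belongs to the intercept, which is always retained.
        weakest = min(range(len(names)), key=lambda i: abs(t[i + 1]))
        if abs(t[weakest + 1]) >= t_cutoff:
            return names, beta
        del kept[names[weakest]]
    return [], [sum(y) / len(y)]


# Hypothetical toy data (not from the study): weekly sales in thousands.
toy_features = {
    "activation_days": [4, 4, 4, 4, 2, 2, 2, 2],
    "unrelated_dummy": [1, 1, 0, 0, 1, 1, 0, 0],
}
toy_sales = [221, 219, 221, 219, 211, 209, 211, 209]
kept_names, coefs = backward_eliminate(toy_features, toy_sales)
```

On this toy data the unrelated dummy is eliminated and only the activation-day variable survives. A production version would normally rely on a statistics package and test p-values directly rather than a fixed t cutoff.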
The goal of the final model for each department was to explain sales in a straightforward manner so that the store manager and home office could accurately interpret the coefficients and the relative impact each independent variable had on store and department sales. With the final model obtained for each department, ten weeks of out-of-sample data were used to compare the new multiple regression sales forecast to the old method sales forecast. The Mean Squared Error (MSE) was computed for the difference between actual sales and each of the forecasted values. The degree to which the MSE dropped was used to evaluate the validity of the new forecast method.

RESULTS

Table 1 shows the final model results for each of the six largest departments and the overall store. Each of the data sets used contained 105 weeks of data, since fiscal year 2003 contained 53 weeks and fiscal year 2004 contained 52 weeks. The specific forecast results for the entire store and for the grocery and market departments will be discussed in detail, with the statistics shown for all six of the separate departments.
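The MSE comparison described in the methodology can be sketched as follows. The function names and the three-week toy series are hypothetical. The final two lines apply the percent-change formula to the grocery and store MSE values reported in Table 2.

```python
def mean_squared_error(actual, forecast):
    """Average squared difference between actual and forecasted sales."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def percent_change_in_mse(old_mse, new_mse):
    """Percent change from the old MSE to the new one (negative = reduction)."""
    return 100.0 * (new_mse - old_mse) / old_mse


# Hypothetical weekly sales and forecasts (not from the study).
actual = [100.0, 110.0, 120.0]
old_forecast = [90.0, 110.0, 135.0]
new_forecast = [95.0, 110.0, 125.0]

mse_old = mean_squared_error(actual, old_forecast)
mse_new = mean_squared_error(actual, new_forecast)

# MSE values as reported in Table 2 for the grocery department and the store.
grocery_change = percent_change_in_mse(67_583_928, 30_230_803)
store_change = percent_change_in_mse(914_480_080, 435_779_593)
```

Rounded to whole percentages, the last two values reproduce the -55% and -52% entries in Table 2.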
Grocery Regression Model

The initial model used to explain "grocery" sales included dummy variables for each of the main holidays, the trend measured by "week", activation days, and moving average. Thanksgiving, New Year's Day, Super Bowl week, and Easter were found to be statistically significant at the .10 level. This model had an R-square of approximately 66%. The inclusion of each holiday as a separate dummy variable stemmed from the results obtained on a small department, "floral" (results not shown), which had an R-square of over 90% due to the extremely strong relationship of Mother's Day and Valentine's Day to flower sales. While separate dummy variables worked extremely well for the floral department, such holidays do not drive sales in the other departments to such an extreme extent. Grouping the holidays together was the logical choice: each holiday is a factor in sales, but as a group they were much more interpretable for the larger departments like grocery and produce.

TABLE 1

                    Grocery  Drugstore  Produce   Market  Pharmacy  Gasoline    Store
R-Square               0.71       0.45     0.71     0.78      0.51      0.49     0.72
Observations            105        105      105      105       105       105      105
F-statistic            58.7       27.9     81.5    116.5      18.4      48.3     63.1
Intercept            212637      42614     7538    67351     78000     13971   504199
                    115.6**     6.26**   3.92**   52.9**    38.5**     3.5**  110.7**
Week                51.54 112.4 133.8 364 2.05* 624 5.36** 4.7** 5.85**
Activation Days     4967 684 7.33** 4554.7 1717 12521 12.89** 4.08** 17.01** 3.98** 13.13**
Special Day Closed  -9412 -6396 -34766 -2.68** -2.86** -3.46**
Special Day Open    11728 13373 5446 5424 18241 5.5** 7.71** 8.7** 3.81** 3.87**
Moving Average      0.28 0.64 0.72 2.59* 8.97** 8.97**

* Significant at .05   ** Significant at .01

The model was run again by grouping all of the holidays into two categories: "special days but closed" and "special days but open." Table 1 indicates an R-square of 70 percent.
Thus 70% of the variability in grocery sales can be attributed to the following independent variables: week, number of activation days in a week, special day but closed, special day but open, and grocery moving average. Base level grocery sales are approximately $212,637 for the period with a slight upward trend of about $51 per week captured by the "week" variable. After correcting for a slight increase in inflation over the two-year period, the "real" grocery sales over the period did not change much, if at all. For
each "activation day", sales in a typical week increased about $4,967. When a week had a holiday and the store remained open, sales increased about $11,728; when the store was closed for the holiday, sales dropped about $9,412. The F-statistic, which tests the overall significance of the model, was 47.33. The grocery moving average variable was removed due to its p-value of .26641.

A comparison of the actual sales data to the forecast using the old method and the new method is shown in Table 2 through the use of the MSE and the percent difference of the old MSE vs. the MSE using the new method. These results are from using the model developed from the previous two years and then forecasting for the next 10 weeks with out-of-sample data. The new method reduced the MSE by 55% vs. the old forecasting method.

TABLE 2

             MSE: OLD FORECAST METHOD   MSE: NEW FORECAST METHOD   % REDUCTION IN MSE
GROCERY                    67,583,928                 30,230,803                 -55%
DRUGSTORE                  27,861,526                 10,623,930                 -62%
PRODUCE                     4,951,915                  2,522,755                 -49%
MARKET                    103,678,060                 26,957,239                 -74%
PHARMACY                  246,073,828                135,349,513                 -45%
GASOLINE                  121,985,762                 69,615,048                 -43%
STORE                     914,480,080                435,779,593                 -52%

Market Regression Model

The results for the Market department were the strongest of any department shown in this study, with the final model having an R-square of 78% and an F-statistic of 116. Each independent variable had a significant impact on market sales with the exception of the special day but closed and market moving average variables. The base level of market sales was found to be $67,351 with a positive trend of $112 per week, about the same increase as would be expected from a general rise in prices over the period. Variability around the base sales level was primarily due to activation days, which accounted for an increase of $4,554 per activation day. Finally, when there was a holiday and the store remained open, sales increased by about $5,424.
It is interesting that market sales, which consist mostly of meat and poultry, were not significantly affected by holidays on which the store closed, unlike general grocery sales. Table 2 shows the new model reduced the MSE of the forecast by 74%.

Total Store Model

The store regression model is inclusive of all department sales. An R-square of 71 percent was obtained with an F-statistic of 63. The only variable that had a p-value greater than .05 was store moving average at .399, which indicated that it was not statistically significant and should be removed. The other independent variables had p-values that were less than .05 and had a significant impact on store sales.
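As a hedged illustration of how the store-level coefficients in Table 1 turn into a weekly forecast, the sketch below counts the activation days that fall in a given week and plugs the count into the fitted equation. The coefficients and the list of activation days come from the paper. Everything else is an assumption made for the example: the function names, the definition of a week as seven consecutive days from a chosen start date, and the numbering of weeks from 1 so that the first out-of-sample week is 106.

```python
from datetime import date, timedelta

# Days of the month listed in the paper as activation days.
ACTIVATION_DAYS = {1, 3, 5, 6, 7, 9, 11, 12, 13, 15}

# Store model coefficients as reported in Table 1.
STORE_COEFS = {
    "intercept": 504199,
    "week": 364,
    "activation_days": 12521,
    "special_day_closed": -34766,
    "special_day_open": 18241,
}


def count_activation_days(week_start):
    """Number of activation days in the 7 days beginning at week_start."""
    days = (week_start + timedelta(days=i) for i in range(7))
    return sum(d.day in ACTIVATION_DAYS for d in days)


def forecast_store_sales(week, activation_days,
                         special_day_closed=0, special_day_open=0):
    """Weekly store sales predicted by the linear model in Table 1.

    week is the trend index (assumed here to count weeks from 1);
    the two special-day arguments are 0/1 dummy variables.
    """
    c = STORE_COEFS
    return (c["intercept"]
            + c["week"] * week
            + c["activation_days"] * activation_days
            + c["special_day_closed"] * special_day_closed
            + c["special_day_open"] * special_day_open)


# Hypothetical week starting on the 8th of a month: the 9th, 11th, 12th,
# and 13th fall inside it, so it contains four activation days.
n_activation = count_activation_days(date(2004, 2, 8))
example_forecast = forecast_store_sales(week=106, activation_days=n_activation)
```

For this hypothetical week the model gives 504,199 + 364 x 106 + 12,521 x 4 = $592,867. A week with no activation days and no holiday at the same trend index would be forecast about $50,000 lower.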
The store as a whole had a base level of sales of over $500,000 each week and an upward trend of $364 each week, due primarily to the general increase in prices of consumer goods. For each activation day, sales increased approximately $12,521. When the store closed, sales dropped by almost $35,000, and when there was a special holiday and the store remained open, sales increased by over $18,000. Table 2 shows that the new forecast model reduced the MSE for total store sales by 52%.

CONCLUSIONS

The statistical models formed using the multiple regression method proved to be much more accurate than those typically employed using simple trend projections. Further, the results highlight the need for upper management to look at the idiosyncrasies of each store and build a unique forecasting model for each rather than use a generic approach. This study focused on a store in a low-income area, and "activation days" were found to be highly significant. In stores located in more affluent areas, such a variable probably would not be necessary and should be replaced by some other factor. With an improved forecasting method, store managers can properly schedule hours, improve order writing, control inventory levels, reduce shrink, create efficient store operations, and consequently provide excellent customer service.

References

[1] Anderson, D., Sweeney, D., Williams, T. (2002). Statistics for Business and Economics (8th ed.). South-Western: Thomson Learning.
[2] MacGowan, T.G. (1998). Forecasting Sales. Harvard Business Review. 760-770.
[3] Malehorn, J. (2001). Forecasting at Ocean Spray Cranberries. The Journal of Business Forecasting. 6.
[4] Moon, M., Mentzer, J., Smith, C., Garver, M. (1998, September). Seven Keys to Better Forecasting. Business Horizons. 44-52.
[5] Wiser, R. (1995). Forecaster's Viewpoint. The Journal of Business Forecasting. 26.