PROJECT REPORT FORECASTING ANALYTICS. Submitted By: Arka Sarkar ( ) Kushal Paliwal ( ) Malvika Gaur ( )


FORECASTING ANALYTICS PROJECT REPORT Submitted By: Arka Sarkar (613161) Kushal Paliwal (613128) Malvika Gaur (6131456) Shwaitang Singh (6131261)

Executive Summary:

Problem Description:
We aim to forecast daily sales (units sold) of the top 5 selling SKUs for the coming week (1st August 2012 to 7th August 2012). We identify the top 5 selling SKUs in 2012 as follows:

Item                                 Sales (INR)
Saffola Gold Oil 5 Lt. Jar           902,920.47
Bikaji Nmkn Bhujia Sev 1KG           819,325.89
Daawat Devaaya BM Rice 5 Kg. Pack    780,153.15
Ashirwaad Atta 1 Kg PO               587,513.58
Fortune Refine Soya Oil 5 Lt. Jar    538,473.99

From a total of nearly 1, SKUs, the top 5 SKUs alone are responsible for nearly 3% of the store's revenues (see Appendix I). These 5 SKUs are also bellwether products for their classes, so store managers can read a change in their sales as an indication of a change in sales for the respective classes.

Forecasts of the sales of the top 5 SKUs can help managers to:
o Estimate the volatility of earnings and design promotion campaigns to smooth earnings
o Protect against stock-outs to avoid lost business opportunities

Data Description:
We are provided with 2 datasets, namely customer data and transaction data. The transaction data comprises details (quantity sold, extended price etc.).

Figure: per day for Saffola Gold 5 Lt. Jar
Figure: per day of the week for Saffola Gold 5 Lt. Jar

Key Data Characteristics:
o Missing Values: There were a number of missing values for the top 5 SKUs. These can be interpreted either as (1) an error in recording data or (2) no sales (i.e. data value = 0) due to either stock-outs or no demand for the product.
o Seasonality: The SKUs exhibit some level of seasonality. In most cases there is a pronounced shift in trend over the weekends as compared to the weekdays. In some cases a 7-day seasonality is exhibited.
o Transaction size: The transaction size (i.e. the number of units bought) is in multiples of 3.
o Outliers: There are a number of outliers (i.e. values more than 3 standard deviations away). These occur due to (1) a higher than normal number of transactions, driven in some cases by proximity to public holidays, or (2) a single bulk buyer who dominates the sales.

High-level description of the final forecasting method and performance:

Item                                 Forecasting Method*                 RMSE     MAPE
Saffola Gold Oil 5 Lt. Jar           MLR and Residual Forecast           5.262    71.66%
Bikaji Nmkn Bhujia Sev 1KG           Seasonal Naïve Forecast**           8.635    69.81%
Daawat Devaaya BM Rice 5 Kg. Pack    MLR and Residual Forecast           6.862    68.14%
Ashirwaad Atta 1 Kg PO               Multiple Linear Regression (MLR)    16.59    45.33%
Fortune Refine Soya Oil 5 Lt. Jar    Holt-Winters Smoothing              11.97    52.69%

*Please refer to Appendices III, IV, V and VI for the respective models tested, intermediate results and forecast plots for the next 7 days. The forecast for Saffola is shown in the accompanying presentation.
**For the SKU Bikaji, although the MLR with Residual Forecast shows a much better visual fit (see Appendix III), it does not outperform the seasonal naïve forecast on the error measures for the specific 7-day period that we chose.

Forecast assumptions:
o Level of confidence for the forecast: These are just forecasts and should not be construed as predictions. The methods used produce forecasts with a 95% level of confidence.
o We believe the relevant error measures in this case are RMSE and MAPE. Relatively higher values of
o Customer preferences do not change over the 7-day forecast period and stay the same as those experienced over the training and validation period (1st August 2011 to 31st July 2012).
o The store has sufficient inventory, and there will be no stock-outs that would act as an upper limit on the number of units that can be sold.
o Missing values imply that there were no sales of the SKU on that day. Zeros have been inserted in the data wherever data was missing.
o Outliers due to bulk purchases (>4X times the regular purchase size) made by a particular customer are considered to be random events and cannot be forecasted.
o The impact of holidays is not incorporated, as we do not have an exhaustive set of holidays for the region where the store is located. Additionally, we have limited information on how the effect of a holiday should be built into the model. In some cases the SKU sales peak 2 days before the actual holiday, in others they peak on the day of the holiday, and in others they peak 2 days after it. In the absence of a credible source on the different types of holiday, and given that we have data for only 1 year (i.e. we cannot confirm the type of holiday across years), we have not incorporated the effects of holidays into our forecast.

Conclusions and recommendations:
o Cash flow variations: Since these five SKUs combined account for over 2% of the revenues of the store, accurate forecasting will help in understanding the cash flows via receipts over the immediate future. This information can then be used to balance outflows of cash, including payments to be made to suppliers, wholesalers, etc. We recommend that the store-owner use this information to manage cash flow positions.
o Protection against stock-outs: Accurate forecasts will help us to gauge demand for these products over the next week. Since these SKUs are important from a revenue standpoint, accurate forecasts will help us to anticipate an increase in demand and give the store-owner ample lead time to order the SKUs and prevent stock-outs.
o Inventory management: The store-owner could contrast the revenues generated by these SKUs with those of lower selling SKUs. In case of high anticipated demand and limited shelf space, the owner could remove lower grossing items and order extra quantities of these top selling SKUs. Additionally, these forecasts could be used to better manage inventory levels in general.
o Holiday demand: It was observed that the sales of these 5 SKUs tend to peak just before holidays. The store-owner could use these forecasts to protect against stock-outs during such periods, especially since these SKUs are the highest grossing items.

Technical Summary:

Data Preparation Issues:
o Filtering: The first level of data preparation was filtering for the particular SKU of interest.
o Missing Values Removal: We insert zeros (assuming that there were no sales on that day) for the dates that have no data in the dataset.
o Outlier Removal: In this step, a frequency chart was created to understand whether there are any aberrations in the purchase pattern. For example, if 3 and 6 are the most common purchase quantities, we needed to understand how to deal with a customer who has bought 9 units. Our treatment of outliers has been conservative and we have replaced the rows with only the highest frequency with the average value; for example, a value of 9 has been replaced with 3 in the case of Saffola Gold.

Methodology:
The sequence of steps in the forecast is shown below:
1. Data Visualization
2. Random Walk Test
3. Benchmarking: Naïve and Seasonal Naïve Forecast
4. Single Layer: Multiple Linear Regression Model
5. Dual Layer: Multiple Linear Regression Model with Residual Forecast
6. Smoothing Method (Additive Holt-Winters)
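The data preparation steps above can be sketched in Python as follows. This is a minimal illustration and not the report's actual workflow: the function names, the `(date, units)` tuple layout and the `min_count` threshold are assumptions, and replacing rare purchase sizes with the most common size is only one possible reading of the outlier treatment described above.

```python
from collections import Counter
from datetime import date, timedelta


def replace_rare_quantities(quantities, min_count=2):
    """Replace purchase sizes seen fewer than min_count times with the most common size.

    Assumption: this is one reading of the outlier rule (e.g. 9 replaced with 3).
    """
    freq = Counter(quantities)
    most_common_size = freq.most_common(1)[0][0]
    return [q if freq[q] >= min_count else most_common_size for q in quantities]


def daily_series(transactions, start, end):
    """Sum units per day and insert zeros for days with no transactions."""
    totals = Counter()
    for day, units in transactions:
        totals[day] += units
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return [(day, totals.get(day, 0)) for day in days]


# Hypothetical transactions for one SKU: (date, units sold).
transactions = [
    (date(2011, 8, 1), 3),
    (date(2011, 8, 1), 6),
    (date(2011, 8, 3), 3),
]
series = daily_series(transactions, date(2011, 8, 1), date(2011, 8, 4))
print([units for _, units in series])  # [9, 0, 3, 0]
```

Cleaning the purchase sizes per transaction before aggregating to daily totals keeps a single bulk purchase from distorting the daily series.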

Caveats:
As part of our investigations, we found that customers who purchased certain SKUs (e.g. Saffola) did in fact return for a second purchase after a certain period of time. To forecast the sales of Saffola, an alternative method could have been to build a forecast of the number of customers visiting the store on any given day to purchase Saffola Gold. We believe this would be venturing into the realm of econometric models, which, given the current data, may have very high explanatory power, but the correlation could not be construed as causation and made the basis for a forecast.

Data Visualization:
We first perform a visual check to detect level, trend and seasonality in the data. To detect seasonality, we plot the ACF of the quantity sold to identify the seasonal lags. An example of how this was done for the Saffola SKU is shown below, where we conclude from the ACF plot that there is a 7-day seasonality in the data. We also see that we would need an additive model, based on the linear trend line plotted for the quantity of Saffola Gold SKU sold per day.

Random Walk Test:
We perform the random walk test to determine whether the series at hand can actually be forecasted. We find that none of the series is a random walk.

Benchmarking:
We use the naïve and the seasonal naïve forecasts as the benchmark for the final model.

Error Measure    Naïve Forecast    Seasonal Naïve Forecast
RMSE             13.8194           11.31513
MAPE             15.2%             89.7%

Multiple Linear Regression Model:
Based on the trend and seasonality involved, we forecast each SKU using a multiple linear regression model with a linear trend, with seasonality built in through dummy variables, and with Training, Validation and Test partitions. The time periods for these partitions are:
o Training: 1st August 2011 to 24th July 2012
o Validation (1 week): 25th July 2012 to 31st July 2012
o Test (1 month): 1st August 2012 to 31st August 2012
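The two benchmarks and the two error measures named above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy series are hypothetical, and days with zero actual sales are skipped in the MAPE because the percentage error is undefined there (relevant here, since zeros were inserted for missing days).

```python
import math


def naive_forecast(train, horizon):
    """Repeat the last observed value for every future day."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, season=7):
    """Forecast each day with the value observed one season (7 days) earlier."""
    last_season = train[-season:]
    return [last_season[h % season] for h in range(horizon)]


def rmse(actual, forecast):
    """Root mean squared error over paired observations."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mape(actual, forecast):
    """Mean absolute percentage error in percent.

    Assumption: observations with a zero actual value are skipped.
    """
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical daily units sold: two training weeks and one validation week.
train = [3, 3, 6, 3, 6, 12, 9, 3, 6, 6, 3, 6, 15, 12]
validation = [3, 3, 6, 6, 6, 12, 12]

for name, forecast in (
    ("naive", naive_forecast(train, 7)),
    ("seasonal naive", seasonal_naive_forecast(train, 7)),
):
    print(name, round(rmse(validation, forecast), 3), round(mape(validation, forecast), 1))
```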

Multiple Linear Regression Model with Residual Forecast:
To further improve the model, we add an additional layer of residual forecast. In the case of the Saffola Gold SKU, we notice that some seasonality (lag 4) remains in the residuals, which can be captured by an AR(4) model.

Holt-Winters Smoothing Method:
Last but not least, we try a smoothing method to estimate the sales for the various SKUs and look for the lowest values of RMSE and MAPE for the 1-week forecast that we intend to develop. Taking the example of Saffola Gold, we obtain the results below by running the Holt-Winters smoothing with various parameters.

Figure: Time Plot of Actual Vs Forecast (Training Data), with Dayindex on the X-axis
Table: Summary of results for the various SKUs
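The additive Holt-Winters method mentioned above can be sketched in plain Python. This is a hedged illustration rather than the tool used in the report: the smoothing constants `alpha`, `beta` and `gamma` are placeholder values, and the initialisation from the first two weeks is one simple, common choice that we assume here.

```python
def holt_winters_additive(y, season=7, alpha=0.2, beta=0.1, gamma=0.1, horizon=7):
    """Additive Holt-Winters: returns (one-step-ahead fitted values, forecast)."""
    if len(y) < 2 * season:
        raise ValueError("need at least two full seasons of data")

    # Assumed initialisation: level and seasonal indices from the first
    # season, trend from the change between the first two seasons.
    first, second = y[:season], y[season:2 * season]
    level = sum(first) / season
    trend = (sum(second) - sum(first)) / season ** 2
    seasonal = [value - level for value in first]

    fitted = []
    for t, observed in enumerate(y):
        s = seasonal[t % season]
        fitted.append(level + trend + s)
        new_level = alpha * (observed - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal[t % season] = gamma * (observed - new_level) + (1 - gamma) * s
        level = new_level

    n = len(y)
    forecast = [
        level + (h + 1) * trend + seasonal[(n + h) % season]
        for h in range(horizon)
    ]
    return fitted, forecast


# Hypothetical series: the same weekly pattern repeated for four weeks.
pattern = [3, 3, 6, 3, 6, 12, 9]
history = pattern * 4
fitted, forecast = holt_winters_additive(history)
print([round(value, 2) for value in forecast])
```

In practice the three smoothing constants would be chosen by comparing RMSE and MAPE on the validation week, which is how the report describes selecting among parameter settings.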

Appendix I: Selection of Top 5 SKUs

Top 5 selling SKUs in 2012 are as shown below:
Total Sales in the Department for the years 2011 and 2012:
Top 5 selling SKUs in 2012 as a percentage of total sales (in INR) in 2012 are shown below:

Item                                 Sales (INR)
Saffola Gold Oil 5 Lt. Jar           902,920.47
Bikaji Nmkn Bhujia Sev 1KG           819,325.89
Daawat Devaaya BM Rice 5 Kg. Pack    780,153.15
Ashirwaad Atta 1 Kg PO               587,513.58
Fortune Refine Soya Oil 5 Lt. Jar    538,473.99
Sum of Top 5 SKUs                    3,628,387.08
Total (for all SKUs)                 122,781,98.6
% of sales                           2.96%

Appendix II: Line Graphs for SKUs

Figure: Haldiram Sev Bhujia (dates on X-axis)
Figure: Fortune Refined Oil (dates on X-axis)
Figure: Daawat Rice (dates on X-axis)
Figure: Aashirwaad Aata (dates on X-axis)

Appendix III: Bikaji Sev Bhujia

A histogram of the data, after data cleaning was performed, is shown below. The ACF plot shows that there is a seven-day seasonality that can be exploited during forecasting.

Figure: Histogram (frequency)
Figure: ACF plot for lags 1 to 10 (ACF, UCI, LCI)

Visualizing the data: The data is fairly noisy, with close to no trend.

Figure: Time plot with linear trend line and moving average

Naïve forecast and seasonal naïve forecast:

Figure: Naïve Forecast
Figure: Seasonal Naïve Forecast

Figure: Multiple Linear Regression plot using a 7-day seasonality (modelled using dummy variables)
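The autocorrelations behind an ACF plot like the one described above can be computed as in the sketch below. The function names and the toy series are hypothetical, and the limits of plus or minus 1.96 divided by the square root of n are the usual large-sample approximation; we assume, without confirmation from the report, that the UCI and LCI lines were computed in a similar way.

```python
import math


def acf(y, max_lag=10):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [value - mean for value in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def confidence_limit(n):
    """Approximate 95% limit for the autocorrelation of a white-noise series."""
    return 1.96 / math.sqrt(n)


# Hypothetical daily sales: a weekly pattern repeated for eight weeks.
sales = [3, 3, 6, 3, 6, 12, 9] * 8
r = acf(sales)
limit = confidence_limit(len(sales))
for lag in range(1, 11):
    flag = "significant" if abs(r[lag]) > limit else ""
    print(lag, round(r[lag], 3), flag)
```

A spike at lag 7 that stands well above the limit is the pattern the report reads as seven-day seasonality.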

The ACF plot for the residuals shows that some signal remains in the residuals. We see a good fit once we capture the signal in the residuals using an AR(5) model and add it back to our original model. However, for the one-week forecast that we wish to make, we do not get a result better than the seasonal naïve forecast on the error measures chosen.

Figure: Actual Value vs. New Forecast

The ACF plot for the residuals shows that there is no further signal to be captured.

Forecast for Bikaji Namkeen for the next 7 days, using the multiple linear regression with an AR(5) model for the residuals:

Figure: Actual and Forecast for days 1 to 7
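The second layer described above, an AR model fitted to the regression residuals and added back to the regression forecast, can be sketched as follows. This is an illustration under assumptions: all names are hypothetical, the AR coefficients are estimated by ordinary least squares without an intercept, and the toy example uses order 2 for brevity where the report uses AR(5).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(residuals, p):
    """Least-squares AR(p) coefficients (no intercept) for a residual series."""
    rows = [[residuals[t - k] for k in range(1, p + 1)] for t in range(p, len(residuals))]
    targets = residuals[p:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * z for r, z in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def forecast_ar(residuals, coefs, horizon):
    """Forecast residuals recursively, feeding each forecast back in as history."""
    history = list(residuals)
    out = []
    for _ in range(horizon):
        nxt = sum(c * history[-k] for k, c in enumerate(coefs, start=1))
        history.append(nxt)
        out.append(nxt)
    return out


def two_layer_forecast(regression_forecast, residuals, p):
    """Add the AR(p) residual forecast to the regression forecast."""
    coefs = fit_ar(residuals, p)
    residual_forecast = forecast_ar(residuals, coefs, len(regression_forecast))
    return [base + r for base, r in zip(regression_forecast, residual_forecast)]


# Hypothetical residuals that follow e[t] = 0.5 e[t-1] - 0.3 e[t-2] exactly.
residuals = [1.0, 2.0]
for _ in range(28):
    residuals.append(0.5 * residuals[-1] - 0.3 * residuals[-2])

print([round(c, 4) for c in fit_ar(residuals, 2)])
print(two_layer_forecast([10.0, 10.0, 10.0], residuals, 2))
```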

Appendix IV: Daawat Rice

The day-of-the-week breakdown for Daawat Rice is shown below. We notice that there is a clear difference in levels for weekends as compared to weekdays.

Data preparation: We see that there are a number of missing values and negative values. For the missing values, we use our judgement to conclude that these were recording errors and replace them with the average value in the data.

Quantity bought per Transaction    Number of such transactions
-6                                 1
-3                                 2
3                                  156
6                                  92
9                                  6
12                                 5
15                                 2
27                                 1
3                                  1
Grand Total                        1166

Figure: Seasonal Naïve Forecast
Figure: Naïve Forecast

We show a histogram of the data after correcting for missing values and negative values. We also show the ACF plot, which indicates a seven-day seasonality in the data.

Figure: Histogram (frequency)
Figure: ACF plot for lags 1 to 10 (ACF, UCI, LCI)

Training and Validation Actual vs. Forecasted (MLR with seven-day seasonality) and the respective residual plot are shown below. We then check the ACF plot for

Figure: Training and Validation, Actual Value vs. Predicted Value
Figure: Training and Validation Residuals (Actual, Predicted, Residual)

We check the residual plots and see that there is some seasonality that is not captured; we attempt to capture it via an AR model.

Figure: ACF plot for the combined residuals, lags 1 to 10 (ACF, UCI, LCI)

Results of the multiple linear regression with an overlay of the residual forecasts show a much better fit. Additionally, the residual plots do not show any further signal that can be captured.

Figure: ARIMA Forecast
Figure: ACF plot for the new residuals, lags 1 to 10 (ACF, UCI, LCI)

Appendix V: Fortune Oil

Visualizing the data:
Sales are dominant on weekends.

Data Clean Up:
Negative quantity values are replaced with the lowest value (assuming those were data entry errors). For missing values we have inserted zero-value rows.

Quantity bought per transaction    No. of such transactions
-3                                 2
3                                  571
6                                  8
9                                  2
Grand Total                        583

ACF Test for Random Walk:

ARIMA Model

ARIMA          Coeff        StErr       p-value
Const. term    2.8973697    .3631583
AR1            .378598      .465415

Naïve Forecast:

Figure: Naïve Forecast
Figure: Residuals and errors

Seasonal Naïve Forecast:
There is not much improvement, since there are many intermediate missing values that cause even the seasonal model to break quite often.

We then check for seasonality in the data. We notice that seasonality is most pronounced for day 7.

Figure: ACF plot for lags 1 to 10 (ACF, UCI, LCI)
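The AR(1) table above suggests that the random walk check compares the AR(1) slope with 1; that reading is an assumption on our part. Under it, a minimal sketch is to regress each value on the previous day's value and look at how many standard errors the slope lies below 1. The function names and the toy series are hypothetical. Note that a formal unit-root test would compare this statistic with Dickey-Fuller critical values rather than normal ones.

```python
import math


def ar1_fit(y):
    """Fit y[t] = const + phi * y[t-1] by least squares; return (const, phi, se_phi)."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    phi = sxz / sxx
    const = mean_z - phi * mean_x
    residuals = [b - const - phi * a for a, b in zip(x, z)]
    error_variance = sum(e * e for e in residuals) / (n - 2)
    se_phi = math.sqrt(error_variance / sxx)
    return const, phi, se_phi


def random_walk_t_stat(y):
    """Number of standard errors between the AR(1) slope and 1."""
    _, phi, se_phi = ar1_fit(y)
    return (phi - 1.0) / se_phi


# Hypothetical daily sales: a weekly pattern repeated for eight weeks.
sales = [3, 3, 6, 3, 6, 12, 9] * 8
const, phi, se_phi = ar1_fit(sales)
print(round(phi, 3), round(se_phi, 3), round(random_walk_t_stat(sales), 2))
```

A slope far below 1 relative to its standard error indicates that the series is not a random walk, which is the conclusion the report draws for all five SKUs.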

Multiple Linear Regression with 7-day seasonality:

The Regression Model

Input variables    Coefficient     Std. Error     p-value    SS
Constant term      5.912266        .9451734                  597.183594
Dayindex           .138713         .286613        .684       67.551285
Weekday_1          -4.97293854     1.196196       .99        37.27494812
Weekday_2          -5.1591244      1.194717       .462       76.1297
Weekday_3          -3.2417698      1.11446846     .38592     22.2744477
Weekday_4          -5.8439149      1.11444271     .23        235.951971
Weekday_5          -5.44441414     1.11442423     .153       332.531669
Weekday_6          -4.63397169     1.11441314     .432       547.5755615

Training Data scoring - Summary Report

Total sum of squared errors    RMS Error     Average Error
11115.7246                     5.56443733    -1.33148E-8

Validation Data scoring - Summary Report

Total sum of squared errors    RMS Error     Average Error
113.118556                     4.19915699    -1.3349953

Actual Vs Forecast and Residuals:

Figure: Forecast
Figure: Residuals
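A regression of this form, with a constant, a linear trend on the day index and six weekday dummies, can be fitted by ordinary least squares as sketched below. The names are hypothetical and the data are synthetic; which weekday serves as the baseline (here the one where `day_index % 7 == 0`) is an assumption, as the report does not state it.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(day_index, season=7):
    """Predictors: constant, day index, and dummies Weekday_1..Weekday_6."""
    weekday = day_index % season  # assumed baseline: weekday 0
    dummies = [1.0 if weekday == d else 0.0 for d in range(1, season)]
    return [1.0, float(day_index)] + dummies


def fit_ols(day_indices, y):
    """Least-squares coefficients via the normal equations."""
    rows = [design_row(t) for t in day_indices]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * value for r, value in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, day_index):
    return sum(c * x for c, x in zip(coefs, design_row(day_index)))


# Synthetic example: level 6, slope 0.02 per day, and fixed weekday effects.
effects = [0.0, -4.0, -5.0, -3.0, -5.0, -5.0, -4.0]
days = list(range(28))
sales = [6.0 + 0.02 * t + effects[t % 7] for t in days]

coefs = fit_ols(days, sales)
next_week = [predict(coefs, t) for t in range(28, 35)]
print([round(c, 3) for c in coefs])
print([round(value, 2) for value in next_week])
```

The residuals from such a fit (actual minus predicted on the training period) are what the second, autoregressive layer is then fitted to.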

ACF plot for the residuals of the multiple linear regression:

Figure: ACF plot for the residuals, lags 1 to 10 (ACF, UCI, LCI)

Plot of the multiple linear regression with an overlay of the AR model for the residuals: we see that there is still some signal remaining in the residuals; however, on running the model we do not see a significant improvement in terms of the error measures.

Appendix VI: Aashirwaad Atta

Visualizing the data:
Sales are dominant on weekends.

Outliers and Missing Data:

Transaction Size    Count of Customer_No
3                   11
6                   114
9                   28
12                  8
15                  5
18                  1
21                  2
24                  2
27                  1
3                   3
36                  1
Grand Total         1166

Missing values are replaced with zero. Spikes are treated as outliers and are replaced with the closest values that fall within our tolerance range. Most such cases are due to a single customer making infrequent bulk purchases.

Random Walk Test:
We see that the

ARIMA          Coeff         StErr        p-value
Const. term    8.31235886    .61765313
AR1            .2253562      .2676752

Naïve Forecast:

Figure: Naïve Forecast

Residuals:

Figure: Residuals

ACF Plot to detect seasonality:

ACF Values

Lags    ACF
1       .22546686
2       .118919
3       .7472585
4       .164873
5       -.139351
6       .391767
7       .21895115
8       .3814469
9       -.123861
10      .7832895

Figure: ACF plot for lags 1 to 10 (ACF, UCI, LCI)

Multiple Linear Regression with seven-day seasonality:

Figure: Forecast

Residuals:

Figure: Residuals

Multiple Linear Regression with overlay of the AR(7) model for the residuals:

Figure: Double Layer Forecast