A Retail Demand Forecasting Model Based on Data Mining Techniques

İrem İşlek
Idea Teknoloji Çözümleri, Istanbul, Turkey
iremislek@ideateknoloji.com.tr

Şule Gündüz Öğüdücü
Istanbul Technical University, Department of Computer Engineering, Istanbul, Turkey
sgunduz@itu.edu.tr

Abstract: This paper addresses the problem of forecasting the demand for various products at main distribution warehouses. Demand forecasting is the activity of building forecasting models to estimate the quantity of a product that customers will purchase. It is affected by many different factors, such as warehouse region size, customer count and product type. As the number of distribution warehouses and products increases, it becomes considerably harder to estimate customer demand. In this study, we provide a methodology for demand forecasting that overcomes these limitations while providing high estimation accuracy. The proposed methodology clusters similar warehouses according to their sales behavior using bipartite graph clustering. A hybrid forecasting phase, which combines a moving average model with a Bayesian network machine learning algorithm, is then applied. Our experimental results on a real data set show that this approach considerably improves forecasting performance.

Index Terms: Bayesian networks, Bipartite graph, Bipartite graph clustering, Demand forecasting, Moving Average, Multilayer perceptron algorithm (MLP), Supply chain

I. INTRODUCTION

Demand forecasting, the process of estimating the quantity of a product that customers will purchase, is a research topic in machine learning. It is an important part of supply chain management. A supply chain can be thought of as the network of organizations that are involved, through upstream and downstream linkages, in the different processes that produce value in the form of products or services delivered to the end consumer [1]. There are various types of supply chain for different environments. Supply
chain models can be categorized into three main types [2]: Direct Supply Chain, Extended Supply Chain and Ultimate Supply Chain. Graphical representations of these models can be seen in Fig. 1. In this study, we focus on demand forecasting models for the Direct Supply Chain. A detailed representation of the Direct Supply Chain can be seen in Fig. 2.

Fig. 1. Types of supply chain.

In direct supply chain management, the products of a manufacturer are sent to main distribution warehouses. Every main distribution warehouse may have a different number of sub-distribution warehouses that distribute products to end sale points such as supermarkets, grocery stores and canteens. Since both the number of warehouses and the variety of products increase in today's competitive and dynamic business environment, accurate demand forecasting becomes more important. Improving the accuracy of demand forecasting yields significant savings for manufacturers. They can produce each product in a sufficient amount, which prevents unnecessary inventory costs. In addition, manufacturers can buy an adequate amount of supply materials and avoid redundant supply material costs.

In this study, we focus on building a demand forecasting model for the main distribution warehouses of a company. The data for this study was taken from a national dried nuts and fruits company in Turkey. This company has nearly one hundred main distribution warehouses. The main distribution warehouses

contain sub-distribution warehouses, which serve end sale points. The company produces and distributes nearly seventy different products. As the variety of products and the size of the warehouses increase, it becomes more difficult to estimate demand accurately with traditional methods. For this reason, most previously proposed methods consider only a limited number of warehouses and products. In addition, the estimation accuracy of these methods is not sufficient. In this paper, a new methodology that can handle numerous main distribution warehouses and products is proposed for demand forecasting. The sale amount of an item can be estimated for every main distribution warehouse. The proposed model, based on data mining techniques, is able to estimate product demands accurately by considering various warehouse, product, shopper demographic and time attributes.

Fig. 2. Main and sub distribution warehouses in direct supply chain.

Our overall model can be summarized as follows. We constructed a dataset from the sales invoices of the company. We then prepared the data for data mining algorithms and calculated moving average values of product sale amounts. Next, we clustered the main distribution warehouses and the sub distribution warehouses. Lastly, we built a Bayesian network model for demand forecasting.

The rest of the paper is organized as follows. A brief review of the literature related to this paper is given in Section II. Section III gives background information for the methodology. Section IV describes the details of the proposed methodology. Section V gives the results of the experiments and a discussion of these results. Finally, in Section VI, we conclude the paper and discuss future work.

II. RELATED WORKS

Demand forecasting has been identified as an important and challenging problem for supply chain management [1]. For this reason, there
have been several studies that applied data mining and machine learning techniques to this problem. Some prior studies on demand forecasting used traditional statistical methodologies such as moving average and Box-Jenkins. Liu et al. used data mining methodologies for time series and improved Box-Jenkins time series forecasting results [3]. Since statistical models could not give satisfactory results, artificial intelligence algorithms were tried in numerous studies. For instance, neural network (NN) algorithms were commonly employed in the literature [4-9], and these studies reported impressive results. Hasin et al. showed that an artificial neural network (ANN) provides better results than traditional statistical methods such as the Box-Jenkins model and the Holt-Winters model [10]. Some subsequent studies combined ANN with another algorithm to obtain more successful methodologies. Doganis et al. used a genetic algorithm with an RBF neural network [11]. Aburto and Weber proposed another hybrid model, which combined the Autoregressive Integrated Moving Average (ARIMA) model with a neural network [12]. Because ANN was a popular algorithm for demand forecasting, Efendigil et al. compared the Adaptive Neural Fuzzy Inference System (ANFIS) with ANN; in their study, ANFIS was more successful than ANN [13]. Sun et al. used several Extreme Learning Machines (ELM) in parallel to forecast sales amounts [5].

Data mining has been used in some recent studies to provide more efficient demand forecasting methodologies. Parikh proposed a data mining application for better demand forecasting and product allocation clustering [14]. Altıntaş and Trick used data mining methods to categorize customer order distributions into data clusters [15].

Existing studies on the demand forecasting problem focus on different points of the supply chain. For instance, most of the studies try to forecast the demand of a single end sale point. In addition to that
some of the studies consider only a limited number of products. In contrast, our problem contains a larger number of main distribution warehouses and products than the aforementioned studies. In this study, we use data mining techniques to overcome this problem.

III. BACKGROUND

This section provides the necessary background on the problem we want to solve. First, we briefly describe bipartite graph clustering. Then, we explain the classification method using Bayesian networks.

A. Bipartite Graph Clustering

In some applications, data can be represented as a bipartite graph G(X, Y, E). A bipartite graph is a special type of graph in which the node sets X and Y represent two different types of objects and the edge set E represents the relations between these objects. In a bipartite graph, nodes of the same type cannot be connected; a connection can exist only between nodes of different types. An example of a bipartite graph representation can be seen in Fig. 3.
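As a minimal sketch of this representation, a weighted bipartite graph can be stored as a biadjacency matrix whose rows are the nodes of X and whose columns are the nodes of Y. The function name `build_biadjacency`, the node labels and the weights below are hypothetical and serve only as an illustration; they are not taken from the paper's data.

```python
import numpy as np


def build_biadjacency(edges):
    """Build the weight matrix of a bipartite graph.

    edges: list of (x_node, y_node, weight) tuples. An edge always joins
    one node of type X to one node of type Y, so same-type connections
    cannot be expressed at all. Repeated (x, y) pairs are summed.
    """
    x_nodes = sorted({x for x, _, _ in edges})
    y_nodes = sorted({y for _, y, _ in edges})
    x_index = {x: i for i, x in enumerate(x_nodes)}
    y_index = {y: j for j, y in enumerate(y_nodes)}

    W = np.zeros((len(x_nodes), len(y_nodes)))
    for x, y, w in edges:
        W[x_index[x], y_index[y]] += w
    return x_nodes, y_nodes, W


# Hypothetical toy data: two X nodes ("w1", "w2") and two Y nodes ("p1", "p2").
toy_edges = [("w1", "p1", 3.0), ("w1", "p2", 1.0),
             ("w2", "p2", 4.0), ("w1", "p1", 2.0)]
x_nodes, y_nodes, W = build_biadjacency(toy_edges)
print(x_nodes, y_nodes)
print(W)
```

A zero entry in `W` means that the two nodes are not connected. Storing only the X-by-Y block, rather than a full square adjacency matrix, makes the bipartite constraint implicit.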

In this study, bipartite graphs are constructed to represent warehouse-item relations. A bipartite graph G = (X, Y, W) is obtained, where the node sets X and Y represent the warehouses and the items, respectively, and the edge weights W represent the sale amount of each item at each warehouse. The bipartite graph partitioning method applied in this study recursively separates a bipartite graph into two bipartite graphs. A vertex partition of G(X, Y, W), denoted by Π(A, B), is defined by a partition of the vertex sets X and Y, respectively: $X = A \cup A^c$ and $Y = B \cup B^c$, as can be seen in Fig. 3. In this partition, $A$ pairs with $B$ and $A^c$ pairs with $B^c$.

Fig. 3. Bipartite graph clustering.

To split the graph into clusters, a partition is sought that minimizes the normalized cut (Ncut), as in Eq. 1, so that the similarity between unmatched vertices is as small as possible:

$$\min_{\Pi(A,B)} \mathrm{Ncut}(A, B) \qquad (1)$$

Ncut(A, B) is calculated using Eq. 2:

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{W(A, Y) + W(X, B)} + \frac{\mathrm{cut}(A^c, B^c)}{W(A^c, Y) + W(X, B^c)} \qquad (2)$$

cut(A, B) can be calculated using Eq. 3, where W(A, Y) is the sum of the weights of the edges with one endpoint in A and the other endpoint in Y:

$$\mathrm{cut}(A, B) = W(A, B^c) + W(A^c, B) \qquad (3)$$

This algorithm was chosen because it clusters bipartite graphs efficiently compared to regular clustering algorithms such as k-nearest neighbor [16].

B. Bayesian Network Algorithm

A Bayesian network is a simple graphical representation of conditional independence assertions. In this representation, every node of the graph symbolizes a random variable, where a random variable can take on possible values from a random experiment. Every edge between these nodes represents a probabilistic dependency between the random variables. Two random variables are said to be independent if the outcome of the second variable is not affected by the outcome of the first. Bayesian networks can be used for numerous applications such as classification, regression and segmentation [17]. In our study, the Bayesian network represents the probabilistic relationships between the demand forecasting results (sale amount predictions of items) and various attributes such as the moving average value, the number of transportation vehicles of the warehouse and the location of the warehouse. Given the attributes, the network can be used to compute the probabilities of the demand forecast results.

IV. DETAILS OF THE METHODOLOGY

The main purpose of the methodology is to forecast the sale amount of a specific item for a specific week and a specific main distribution warehouse. The basic steps of the proposed methodology can be seen in Fig. 4. The proposed method consists of four main stages: (1) the data set obtained from a retailer is prepared so that data mining algorithms can be applied; (2) a moving average value is calculated for each product; (3) a bipartite clustering algorithm is applied in order to group warehouses and their sub-distributors that have similar sales behavior; (4) a Bayesian network is applied to obtain the forecasting results. The details of these steps are explained in this section.

Fig. 4. Basic steps of the methodology: constructing dataset; calculating moving average values; constructing bipartite graph with main warehouses; clustering main warehouses using bipartite graph; constructing bipartite graph with sub warehouses; clustering sub warehouses using bipartite graph; using machine learning algorithm; forecast results.
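The normalized cut objective of Section III-A (Eqs. 1 to 3) can be evaluated directly for any candidate partition. The sketch below is an illustration only: the names `ncut` and `best_bipartition`, the toy weight matrix and the exhaustive search are assumptions made here, and the partitioning algorithm of [16] is not reproduced. Exhaustive search is feasible only for very small graphs, and the code assumes every node has at least one positive edge weight so that the denominators of Eq. 2 are nonzero.

```python
from itertools import product

import numpy as np


def ncut(W, in_A, in_B):
    """Evaluate Eq. 2 for the partition (A, B) of a weight matrix W.

    W[i, j] is the edge weight between X node i and Y node j.
    in_A / in_B are boolean masks selecting A within X and B within Y.
    """
    W = np.asarray(W, dtype=float)
    a = np.asarray(in_A, dtype=bool)
    b = np.asarray(in_B, dtype=bool)
    # Eq. 3: cut(A, B) = W(A, B^c) + W(A^c, B); cut(A^c, B^c) is the same sum.
    cut = W[a][:, ~b].sum() + W[~a][:, b].sum()
    vol_ab = W[a].sum() + W[:, b].sum()      # W(A, Y) + W(X, B)
    vol_rest = W[~a].sum() + W[:, ~b].sum()  # W(A^c, Y) + W(X, B^c)
    return cut / vol_ab + cut / vol_rest


def best_bipartition(W):
    """Eq. 1 by exhaustive search (illustration only, exponential cost)."""
    W = np.asarray(W, dtype=float)
    n_x, n_y = W.shape
    best_score, best_a, best_b = np.inf, None, None
    for a in product([False, True], repeat=n_x):
        for b in product([False, True], repeat=n_y):
            a_arr, b_arr = np.array(a), np.array(b)
            # Skip partitions in which one side of X or Y is empty.
            if a_arr.all() or not a_arr.any() or b_arr.all() or not b_arr.any():
                continue
            score = ncut(W, a_arr, b_arr)
            if score < best_score:
                best_score, best_a, best_b = score, a_arr, b_arr
    return best_score, best_a, best_b


# Hypothetical sale amounts: rows are warehouses, columns are products.
W_toy = np.array([[10.0, 8.0, 0.0],
                  [9.0, 7.0, 1.0],
                  [0.0, 1.0, 12.0]])
score, a_mask, b_mask = best_bipartition(W_toy)
print(score, a_mask, b_mask)
```

In the toy matrix the first two warehouses sell mostly the first two products, so the lowest-Ncut partition groups them together and leaves the third warehouse with the third product. Applying the same split again to each resulting sub-matrix gives the recursive scheme described in the text.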

A. Constructing Dataset

The first step of the methodology is preparing the dataset, which includes the information necessary for generating forecast results. The data was taken from a national dried nuts and fruits company in Turkey. We used the sales invoices of 2011, 2012 and 2013 to construct a specialized dataset for the experiments. The total numbers of warehouses and different products are ninety-eight and seventy, respectively. The dataset contains the following information about the warehouses and products:

Warehouse related attributes: location and size related attributes, such as the number of sub-warehouses, the number of transportation vehicles, the total amount of products sold weekly, the selling area in square meters, the number of employees and the number of customers.

Product related attributes: product category (in this study a product ontology is constructed), selling amount and selling time.

B. Data Preparation

Data preparation is an important step of a model based on data mining techniques. In this step, the data is cleaned and prepared so that data mining algorithms can be applied. In this study, we designed a product ontology, not only to provide an effective way of interoperating with other systems but also to avoid the cold start problem. The cold start problem, also called the new user or new item problem, is the problem of estimating the demand for a new product. We used the Protégé tool [18] to construct the product ontology. All descriptive features of the products were defined in detail. The resulting ontology had four main and twenty-eight sub product categories. Nearly seventy different products were grouped according to the defined product categories.

In this step, we also calculate the moving average of each product's sale amounts over the past three weeks. The moving average value for a specific week t is calculated as in Eq. 4:

$$\text{Moving avg}(t) = \frac{1}{3} \sum_{i=1}^{3} \text{sale amount}(t - i) \qquad (4)$$

C. Constructing bipartite graph and clustering warehouses

It was noticed that some of the main
distribution warehouses show quite different sales behavior. For instance, warehouses in Istanbul have more sub distribution warehouses and serve a wider area than regular warehouses. Warehouse features such as location and size also have an effect on the types of products they sell. For example, one warehouse may sell products with a higher profit margin, whereas the customers of another warehouse in a different location may prefer less expensive products. For this reason, it was decided to group the main distribution warehouses based on their product sale amounts by using a bipartite clustering method.

A bipartite graph, which includes two different types of nodes, was constructed using all main warehouses and all products. The essential approach in this step is that if a main warehouse sells a product, the two corresponding nodes are connected by an edge. The weight of the edge is the total sale amount of the product at the main warehouse. A representative main distribution warehouse-product bipartite graph can be seen in Fig. 5.

The bipartite graph clustering algorithm [16] is applied in order to group warehouses that have similar product sales behavior. This algorithm was chosen because it performs better on bipartite graph clustering than regular clustering algorithms such as k-nearest neighbor. The algorithm recursively separates a bipartite graph into two graphs. In our study, twenty-nine different main warehouse clusters were generated using bipartite graph clustering.

After the main warehouse clustering phase was completed, the sub distribution warehouses were clustered. The motivation for this step is that some sub warehouses of a main warehouse serve disparate regions with different purchasing power. As in the main distribution warehouse clustering step, the bipartite graph clustering algorithm was used to cluster the sub warehouses of each main warehouse. The number of sub distribution warehouse clusters was 97.
Fig. 5. Example of a bipartite graph of main warehouses and products.

D. Using machine learning algorithm

The last step of the proposed methodology is using a machine learning algorithm, as can be seen in Fig. 4. Moving average values, warehouse related attributes and product related attributes were used to construct a Bayesian network model. These attributes correspond to the random variables in the Bayesian network. Forecast results were calculated based on the probabilities among these random variables of the network. The data of 2011 and 2012 were used for training the Bayesian network model, while the data of 2013 were used for testing.

In the first trial, we built a single Bayesian network model that handles all main distribution warehouses together. In the second trial, we built individual Bayesian network models for the main distribution warehouse clusters. In the third trial, separate models were constructed for every sub warehouse cluster. Detailed results of these trials can be found in Section V.
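The three-week moving average of Eq. 4 and the MAPE error measure reported in Section V can be sketched in a few lines. The function names, the weekly sales figures and the decision to skip weeks with zero actual sales (for which the percentage error is undefined) are assumptions made for this illustration. The sketch covers only the stand-alone moving average, not the Bayesian network model.

```python
def moving_average(sales, t, window=3):
    """Mean of the `window` weeks before week t (Eq. 4 with window=3)."""
    if t < window:
        raise ValueError("not enough history before week t")
    return sum(sales[t - i] for i in range(1, window + 1)) / window


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Weeks whose actual value is zero are skipped (an assumption made here,
    since the percentage error is undefined for them).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no week with a nonzero actual value")
    return 100.0 / len(pairs) * sum(abs(a - f) / abs(a) for a, f in pairs)


# Hypothetical weekly sale amounts of one product at one warehouse.
weekly_sales = [100, 120, 110, 130, 90]
test_weeks = [3, 4]  # weeks that have three weeks of history
forecasts = [moving_average(weekly_sales, t) for t in test_weeks]
actuals = [weekly_sales[t] for t in test_weeks]
print(forecasts)
print(mape(actuals, forecasts))
```

In the methodology described above, the moving average value is one input attribute of the Bayesian network rather than the final forecast. Used on its own, as in this sketch, it corresponds to the stand-alone baseline.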

When building the Bayesian network model, we use the moving average values of products. As mentioned before, the demand for a new product that has no past sales data can be estimated by means of the product ontology. In this case, the nearest neighbors of the new product in the ontology are determined, and the moving average values of the neighboring products are used to estimate the quantity of the new product that customers will purchase.

V. EXPERIMENTAL RESULTS

The evaluation metric used to measure the error rate is the mean absolute percentage error (MAPE). Eq. 5 shows how MAPE is calculated, where $A_t$ is the actual value and $F_t$ is the forecast value:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \qquad (5)$$

Used stand-alone, the moving average is one of the most basic forecasting methods. We calculated its error rate on our dataset and obtained a MAPE of 129%. This result shows that basic demand forecasting models give insufficient results in complex structures.

The first trial, which handled all main warehouses together, had a 49% error rate (MAPE) using the hybrid model (moving average and Bayesian network together). In the second trial, main warehouses were clustered according to their sales behavior. This trial had a 24% error rate. The error rate decreased considerably because a separate model was applied to every main warehouse cluster.

TABLE I. RESULT TABLE

Model                                                         Error rate (MAPE)
One model for all main warehouses                             49%
Respective models for main distribution warehouse clusters    24%
Respective models for sub distribution warehouse clusters     17%

In the third trial, clustering was applied to construct sub warehouse clusters. The error rate in this step was 17%. This trial provided the largest improvement for main distribution warehouses that have numerous sub distribution warehouses. For instance, a specific main warehouse cluster had a 37% error rate in the second trial. When the sub warehouses of this main warehouse were clustered as in the third trial, the error rate dropped to 16%.

VI. CONCLUSION

This work presents an approach for demand
forecasting which can handle numerous main distribution warehouses and products. This study showed that using one model for all main distribution warehouses gives unsatisfactory results when the number of main distribution warehouses is large. Furthermore, clustering main distribution warehouses according to their sale amounts per product, and building separate forecasting models for the main warehouse clusters, improved the results. If a main distribution warehouse served a larger area and had more sub warehouses, clustering the sub warehouses of this main warehouse provided better results. In other words, when separate models were applied to sub warehouse clusters, the error rate decreased. This is because some sub warehouses of the same main warehouse serve completely different regions with distinct purchasing power.

ACKNOWLEDGMENT

This research was supported by the Ministry of Science, Industry and Technology of Turkey, SANTEZ project 0484.STZ.2013-2.

REFERENCES

[1] M. L. Christopher, Logistics and Supply Chain Management, London: Pitman Publishing, 1992.
[2] J. T. Mentzer, W. DeWitt, J. S. Keebler, S. Min, N. W. Nix, C. D. Smith, Z. G. Zacharia, "Defining Supply Chain Management," Journal of Business Logistics, vol. 22, no. 2, 2001.
[3] L. M. Liu, S. Bhattacharyya, S. L. Sclove, R. Chen, W. J. Lattyak, "Data Mining on Time Series: An Illustration Using Fast-Food Restaurant Franchise Data," Computational Statistics & Data Analysis, vol. 37, pp. 455-476, 2001.
[4] P. C. Chang, Y. W. Wang, C. H. Liu, "The Development of a Weighted Evolving Fuzzy Neural Network for PCB Sales Forecasting," Expert Systems with Applications, vol. 32, pp. 86-96, 2007.
[5] Z. L. Sun, T. M. Choi, K. F. Au, Y. Yu, "Sales Forecasting Using Extreme Learning Machine With Applications In Fashion Retailing," Decision Support Systems, vol. 46, pp. 411-419, December 2008.
[6] Y. Yu, T. Choi, C. Hui, "An Intelligent Fast Sales Forecasting Model for Fashion Products," Expert Systems with Applications, vol. 38, pp. 7373-7379, 2011.
[7] S. H. Ling, Genetic
Algorithm and Variable Neural Networks: Theory and Application, Lambert Academic Publishing, 2010.
[8] K. F. Au, T. M. Choi, Y. Yu, "Fashion Retail Forecasting by Evolutionary Neural Networks," International Journal of Production Economics, vol. 114, pp. 615-630, 2008.
[9] R. S. Gutierrez, A. Solis, S. Mukhopadhyay, "Lumpy Demand Forecasting Using Neural Networks," International Journal of Production Economics, vol. 111, pp. 409-420, 2008.
[10] M. A. A. Hasin, S. Ghosh, M. A. Shareef, "An ANN Approach to Demand Forecasting in Retail Trade in Bangladesh," International Journal of Trade, Economics and Finance, vol. 2, no. 2, April 2011.
[11] P. Doganis, A. Alexandridis, P. Patrinos, H. Sarimveis, "Time Series Sales Forecasting For Short Shelf-Life Food Products Based On Artificial Neural Networks And Evolutionary Computing," Journal of Food Engineering, vol. 75, pp. 196-204, 2006.
[12] L. Aburto, R. Weber, "Improved supply chain management based on hybrid demand forecasts," Applied Soft Computing, 2007.

[13] T. Efendigil, S. Önüt, C. Kahraman, "A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis," Expert Systems with Applications, vol. 36, no. 3, pp. 6697-6707, 2009.
[14] B. Parikh, Applying Data Mining to Demand Forecasting and Product Allocations, The Pennsylvania State University, 2003.
[15] N. Altintas, M. Trick, "A data mining approach to forecast behavior," Annals of Operations Research, vol. 216, no. 1, pp. 3-22, 2014.
[16] H. Zha, X. He, C. Ding, H. Simon, M. Gu, "Bipartite Graph Partitioning and Data Clustering," CIKM '01, Atlanta, Georgia, USA, November 5-10, pp. 25-32, 2001.
[17] I. Ben-Gal, "Bayesian Networks," in F. Ruggeri, F. Faltin & R. Kenett (Eds.), Encyclopedia of Statistics in Quality and Reliability, John Wiley & Sons, 2007.
[18] Protégé, http://protege.stanford.edu