A Sales Forecasting Model for an Automotive Distributing Company

Jéssica Andressa de Souza 1 and Celso Gonçalves Camilo-Junior 2
1 Faculty of Sciences and Technology, Federal University of Grande Dourados, Dourados, MS, Brazil
2 Institute of Informatics, Federal University of Goiás, Goiânia, GO, Brazil

Abstract - One way to improve inventory management is to analyze sales forecasting information, because those data can improve decision making. This paper therefore proposes the application of Artificial Neural Networks (ANN) to build two prediction models: the first for a single product and the second for a group of products of an automotive distributing company in Brazil. Different ANN architectures were tested to identify the best configuration, and the results show a good performance of the ANN for the scenarios analyzed.

Keywords: Artificial Neural Networks, Forecast, Multilayer Perceptron, Data Mining.

1 Introduction

Data Mining (DM) is the area of computing concerned with discovering new information, in the form of patterns or rules, in large volumes of data [14]. The terms Data Mining and Knowledge Discovery in Databases (KDD) are commonly taken as synonyms; however, DM is one step of the KDD process [4], [19]. Among the various goals of DM and KDD, prediction analyzes how a certain variable will behave as a function of other related attributes [5].

Forecasting is not an end in itself, but a means of providing information and support for a subsequent decision aimed at achieving certain goals [12]. One application is time-series forecasting, which consists of providing the value of a series at time t + h using a predictive model built from the observations collected up to time t, as illustrated in Figure 1.

Figure 1 - Observations of a time series, forecast horizon h and origin t. Source: [13].

The predictions of a time series can be classified as short, medium or long term, depending on the forecast horizon, and two different techniques can be used to predict future values [13] (a short code sketch contrasting them is given at the end of this introductory discussion):

Multi-step forecast: the set of current values is used to predict a given future instant, and the value of each subsequent instant is determined from the current values plus the predictions already made. This technique is adopted for long forecast horizons, since it seeks to identify trends and turning points in the series.

Single-step forecast: the prediction is made only for the time period immediately following the current one, from the observations of the series available so far.

According to Souza (1989), the quality of the forecasts of a time series is only ensured when the forecast horizon is the instant immediately subsequent to the origin t [16].

There is a variety of DM methods for pattern recognition, some of them coming from Artificial Intelligence. Among those, this paper uses Artificial Neural Networks (ANN) as the tool to build the prediction model, a choice motivated by the good performance ANNs have shown elsewhere; comparisons of ANNs with other methods can be found in [17], [11].

A good demand forecast means that stocks are better managed, because once the company is aware of the expected demand it can prepare its decision making accordingly [20]. Given the importance of sales prediction for decision making, this paper discusses the use of ANNs to forecast wholesale demand in an automotive distribution company.
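To make the two strategies concrete, the sketch below contrasts them with a generic one-step predictor. The function name predict_next and the toy data are illustrative assumptions only and are not part of the original work.

```python
# Sketch: single-step vs. recursive multi-step forecasting with a generic
# one-step model. `predict_next` is a hypothetical stand-in for any fitted
# predictor (such as the MLP used later in this paper).

def single_step_forecast(history, predict_next):
    """Predict only the value immediately after the observed series."""
    return predict_next(history)

def multi_step_forecast(history, predict_next, horizon):
    """Recursive multi-step: each prediction is appended to the series and
    treated as an observation, until the forecast horizon is reached."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(extended)
        forecasts.append(next_value)
        extended.append(next_value)   # predicted value feeds the next step
    return forecasts

# Toy usage with a naive "repeat the last value" predictor.
naive = lambda series: series[-1]
print(single_step_forecast([10, 12, 11], naive))    # 11
print(multi_step_forecast([10, 12, 11], naive, 3))  # [11, 11, 11]
```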
Among the various products available for sale, this work deals with the forecast of unit sales of product 6167, considered one of the most financially important products for the company, and also with the forecast of unit sales of the product group to which it belongs, likewise one of the most financially profitable groups.

This paper is organized as follows. Section 2 presents a brief summary of the Multilayer Perceptron (MLP) neural network and the Rprop learning algorithm. Section 3 deals with time-series forecasting. Section 4 presents the company problem addressed. Section 5 presents the experiments and Section 6 the results. Finally, Section 7 presents the conclusions.

2 Multilayer Neural Networks

Multilayer ANNs (Multilayer Perceptron - MLP) have an input layer, hidden layers and an output layer in their architecture [10]. Their main feature is the ability to deal with problems that are not linearly separable, unlike single-layer perceptron networks. The purpose of the hidden layers is to handle this non-linearity and reduce the complexity passed to the output layer: given a problem that is not linearly separable, the intermediate layers transform it into a linearly separable one, and the result is sent to the output layer, which solves the problem presented at the input layer.

MLP networks are progressive (feed-forward), i.e., the outputs of the neurons of any particular layer are connected only to the inputs of the neurons of the next layer, with no feedback loops. Consequently, the input signal propagates through the network, layer by layer, in a forward direction [8]. Figure 2 illustrates an MLP with two hidden layers.

Figure 2 - ANN with an input layer (left) with two neurons, two hidden layers (center), the first with four neurons and the second with four neurons, and an output layer (right) with one neuron. Source: [7].

2.1 Backpropagation

Backpropagation is a supervised, error-correction learning algorithm used to train MLP networks [15]. Learning consists of two steps: propagation, in which the information is processed and propagated from the input layer, layer by layer, up to the output layer, where the network response is obtained and the error is computed; and backpropagation, in which the synaptic weights are adjusted from the output layer back towards the input layer. The two steps are shown in Figure 3.

Figure 3 - Flow of the propagation and back-propagation phases of the Backpropagation algorithm. Source: [1].

During the training phase, input values must be presented to the ANN together with the desired output values for those inputs, so that the network output can be compared with the desired output. The overall error of the ANN, which drives the correction of the weights in the backpropagation step, is thus estimated and progressively reduced [8], [3]. The aim of the Backpropagation algorithm is therefore to find, on the error surface, a global minimum, i.e., values of the synaptic weights that minimize the error of the ANN.

In standard Backpropagation, the size of each weight adjustment depends not only on the learning rate but also on the magnitude of the partial derivative of the error. The unpredictable behavior of this derivative affects the speed of the algorithm and its adaptability, and this was one of the reasons that led to the development of Rprop. To avoid the adaptability problem, Rprop changes the synaptic weights directly, i.e., without considering the magnitude of the partial derivative [15]. These changes are intended to accelerate training and improve performance, since the standard Backpropagation algorithm is often slow.

2.2 Description of Rprop

The Rprop (Resilient Backpropagation) algorithm is a variation of Backpropagation. It updates the weights and their update values once the gradient has been computed over all the patterns, performing a direct adaptation of each weight step based on local gradient information. The algorithm tries to eliminate the harmful influence of the magnitude of the partial derivative on the weight adjustment, as it occurs in Backpropagation, by using only the sign of the derivative rather than its value, so that the adaptation effort is not obscured by the derivative's unpredictable behavior [15], [3].
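Equations (1) and (2) below formalize this rule. As a rough illustration only, the following sketch applies the sign-based update to one weight matrix with NumPy; the parameter values, names and array shapes are assumptions, not taken from the paper's Knime implementation, and the weight-backtracking variant of Rprop is omitted.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    """One Rprop update for a weight matrix `w`, given the current and the
    previous gradient of the error with respect to `w`."""
    sign_product = grad * prev_grad
    # Same sign as before: the last step reduced the error, so grow the step.
    delta = np.where(sign_product > 0,
                     np.minimum(delta * eta_plus, delta_max), delta)
    # Sign changed: the last step was too large, so shrink the step.
    delta = np.where(sign_product < 0,
                     np.maximum(delta * eta_minus, delta_min), delta)
    # The weight change uses only the sign of the gradient, not its magnitude.
    w = w - np.sign(grad) * delta
    return w, delta

# Toy usage on a 2x3 weight matrix.
w = np.zeros((2, 3))
delta = np.full((2, 3), 0.1)
grad_prev = np.zeros((2, 3))
grad = np.array([[0.4, -0.2, 0.0], [1.5, -0.7, 0.3]])
w, delta = rprop_step(w, grad, grad_prev, delta)
```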
As an example: when the partial derivative of the error with respect to a weight w_{ji} is positive, the weight is decreased by its update value; when the derivative is negative, the update value is added. The weight change is given by Equation (1), and the adaptation of the update value itself by Equation (2):

Equation (1): \Delta w_{ji}(t) = \begin{cases} -\Delta_{ji}(t), & \text{if } \partial E/\partial w_{ji}(t) > 0 \\ +\Delta_{ji}(t), & \text{if } \partial E/\partial w_{ji}(t) < 0 \\ 0, & \text{otherwise} \end{cases}

Equation (2): \Delta_{ji}(t) = \begin{cases} \eta^{+} \, \Delta_{ji}(t-1), & \text{if } \partial E/\partial w_{ji}(t-1) \cdot \partial E/\partial w_{ji}(t) > 0 \\ \eta^{-} \, \Delta_{ji}(t-1), & \text{if } \partial E/\partial w_{ji}(t-1) \cdot \partial E/\partial w_{ji}(t) < 0 \\ \Delta_{ji}(t-1), & \text{otherwise} \end{cases}

Under the adaptation rule of Rprop, if the partial derivative of the error with respect to a weight w_{ji} keeps its sign, the last adjustment reduced the error, and the update value is increased by the factor \eta^{+} > 1, speeding up the convergence of the training. When the derivative changes sign, the last adjustment was too large, and the update value is reduced by the factor \eta^{-} < 1 before the direction of adjustment changes.

3 Time-Series Forecasting

A time series is a collection of observations made sequentially over time, and time-series forecasting consists of predicting future values from past values [6], [16]. A discrete time series can be represented by X_T = (x_1, x_2, ..., x_T), where each observation x_t is related to the others through a link of dependence [13]. Forecasts are not perfect and the future involves uncertainty, but they provide information and support for important decision making aimed at achieving a given objective.

3.1 Artificial Neural Networks in Time Series

ANN techniques have been applied to a variety of problem areas, such as image processing, speech processing, forecasting and optimization, where they can obtain better results than conventional methods; however, understanding how ANNs achieve such levels of performance remains difficult [18]. Among these problems, time-series forecasting has received special attention from researchers: predicting the future is fundamental in decision making, given the behavior of series that vary over time, and it has been a great challenge for statistics and computing.

ANNs are a valuable tool for forecasting time series. Their ability to extract difficult, non-linear relationships from noisy input data has produced remarkable results, in most cases better than those achieved by conventional statistical procedures. Examples of applications of ANNs in time-series forecasting include stock price forecasting [13], drought prediction [11] and forecasting the value of agribusiness commodities [7], among others. Statistical techniques do not provide good prediction results for some applications with restricted sample sizes and non-linearities in the data set [21]. Due to their great ability to learn, ANNs can identify characteristics of the series, such as seasonality, besides dealing with non-linear data [21]. Despite these advantages, ANNs also have disadvantages, such as the difficulty of identifying the best architecture for a given application: the appropriate configuration is usually found by trial and error, although some heuristics suggest ranges of values.

4 Description of the Problem

The problem addressed in this paper is based on the daily operation of a large company that has seven branches across the country and a turnover of millions per month. The company works with more than 20,000 different items of various brands and models and therefore faces high complexity in the purchasing process, which is fundamental to good inventory and resource management.
To reduce complexity and improve the purchase decision-making process, the company specialized and assigned its staff to specific product brands, creating a division of tasks and, consequently, a reduction of complexity, since each employee is responsible for the purchases of a single brand. One of the main characteristics of the company is its prompt delivery of products for which there is a gap in the market, which makes it one of the fastest-selling companies in the sector. However, to achieve good financial results it is necessary to size the stock correctly, which can be done with good purchasing management.

In order to help the decision-making process, this paper addresses the forecast of the amount of sales of a product of a specific brand and model and of the amount of sales of that product's group. To this end, it uses a database with 465 samples: 409 samples for training and 56 samples for testing/validation of the ANN, so that the weekly test/validation window gathers the sales of two months. The product chosen for the tests was identified after a preliminary analysis of the data and conversations with employees; the group to which this product is bound varies with the store (branch) of the company and with the week/month of the year. Therefore, the information presented below was selected as input to the ANN, initially a total of five attributes (a short sketch of how these attributes form one input record follows the list):

1. Week: identifies the week of the month in which the sale was made, ranging from 1 to 4;
2. Month: identifies the month in which the sale of the product took place, ranging from 1 to 12;
3. Seller: identifies the branch, ranging from 0 to 6;
4. QuantitySalesPreviousWeek: provides the quantity of the product sold in the previous week;
5. Group: identifies the group to which the product belongs.
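As a concrete illustration of these attributes, the sketch below assembles one input record and scales each value to the [-1, 1] range mentioned in the next section. The function names and the assumed bounds (maximum quantity, number of groups) are hypothetical; the actual preparation was done by the authors' Java program.

```python
def scale(value, low, high):
    """Min-max scaling of `value` from the interval [low, high] to [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0

def build_input_record(week, month, seller, qty_previous_week, group,
                       qty_max=500.0, group_max=50.0):
    """Return the five scaled input attributes for one weekly observation.
    qty_max and group_max are illustrative bounds, not values from the paper."""
    return [
        scale(week, 1, 4),                     # 1. Week of the month
        scale(month, 1, 12),                   # 2. Month of the year
        scale(seller, 0, 6),                   # 3. Branch (seller) identifier
        scale(qty_previous_week, 0, qty_max),  # 4. Units sold in previous week
        scale(group, 0, group_max),            # 5. Product group identifier
    ]

print(build_input_record(week=2, month=4, seller=3, qty_previous_week=120, group=17))
```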

It is believed that there were sales promotions in the analyzed period that influenced the sales of this product. As the output of the ANN, the attribute QuantitySalesWeek was selected, which provides the estimate of the total amount sold in the week. After defining the input attributes, the data were normalized to the range from -1.0 to 1.0, since normalization allows better learning of the ANN.

5 Experiments

After pre-processing the attributes, we sought to identify the number of hidden layers and the number of neurons per hidden layer for the problem addressed. Finding the number of neurons for the hidden layer(s) is not an easy task, because it depends on several factors, for example the amount of data available for training and testing and the quality of those data, among others [2]. With too many neurons in the hidden layer(s), the performance of the ANN is good on the training data but tends to be poor on the test/validation data, besides generating a high computational cost; with too few neurons, performance may be poor both for training and for validation, despite the lower computational cost. Finding the ideal number of hidden neurons is therefore a costly task, because the designer must train and test the ANN several times with different numbers of neurons and layers; the ideal number is the one that achieves adequate performance both on the training data and on the test data.

In order to identify the best architecture of the ANN, tests were performed with one and two hidden layer(s) containing from two to five neurons. This range was established by studying the rule of Baum & Haussler (1989), which proposes that the total number of parameters of the ANN (Z), calculated by Equation (3), and the amount of available data (N) obey the relation given in Equation (4):

Equation (3): Z = (p + 1) q_1 + (q_1 + 1) m

where p is the dimension of the input vector, m is the dimension of the output vector and q_1 is the number of neurons in the first hidden layer.

Equation (4): N > Z / \varepsilon

where \varepsilon is the error tolerated during testing; for example, if \varepsilon = 0.1 (10% tolerance), then N > 10Z. For instance, with p = 5 inputs, m = 1 output and q_1 = 5 hidden neurons, Z = (5 + 1) \cdot 5 + (5 + 1) \cdot 1 = 36, so a 10% tolerance requires N > 360 samples, which the 409 training samples satisfy.

Among the various measures of accuracy, this paper adopts the Mean Percentage Error (EPM), see Equation (5), the Mean Absolute Error (EAM), see Equation (6), and the Mean Absolute Percentage Error (EPAM), see Equation (7), where x_i is the predicted value at instant i, \hat{x}_i is the observed value at instant i and n is the number of predictions made:

Equation (5): EPM = \frac{100}{n} \sum_{i=1}^{n} \frac{x_i - \hat{x}_i}{\hat{x}_i}

Equation (6): EAM = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - \hat{x}_i \rvert

Equation (7): EPAM = \frac{100}{n} \sum_{i=1}^{n} \left\lvert \frac{x_i - \hat{x}_i}{\hat{x}_i} \right\rvert

The input data of the ANN were prepared by a program developed in Java that filters and summarizes the information supplied by the company; the resulting table was the input for the Knime software [9], used for the tests. The ANN was trained with the Rprop algorithm implemented in Knime, running 10,000 iterations per execution, a value indicated as the most appropriate by preliminary tests.
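A sketch of this experimental loop is given below. It is only an approximation of the procedure under stated assumptions: the paper's networks were trained in Knime with Rprop, which scikit-learn does not provide, so MLPRegressor with its default solver is used here as a stand-in, and the data arrays are assumed to be already normalized as described above.

```python
from itertools import product

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def architecture_search(X_train, y_train, X_test, y_test, runs=3):
    """Train one- and two-hidden-layer MLPs with 2 to 5 neurons per layer,
    three executions each, and report the mean absolute test error."""
    results = {}
    for n_layers, n_neurons in product([1, 2], [2, 3, 4, 5]):
        hidden = (n_neurons,) * n_layers
        errors = []
        for run in range(runs):
            net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=10000,
                               random_state=run)
            net.fit(X_train, y_train)
            errors.append(mean_absolute_error(y_test, net.predict(X_test)))
        results[hidden] = float(np.mean(errors))
    return results
```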
6 Results

We applied the ANN with five neurons and one hidden layer (Figure 4) and with two hidden layers (Figure 5) to forecast the number of units sold of the product group, and the ANN with five neurons and one hidden layer (Figure 6) and with two hidden layers (Figure 7) to predict the number of units sold of product 6167.

Figure 4 - Forecast for a group of products with an ANN with one hidden layer of five neurons.
Figure 5 - Forecast for a product with an ANN with one hidden layer of five neurons.
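The error measures of Equations (5)-(7), reported in Table 1 below, can be computed as in the following sketch; the array names are illustrative and follow the paper's convention of x_i for the predicted value and x_hat_i for the observed one.

```python
import numpy as np

def forecast_errors(predicted, observed):
    """Return EAM, EPM (%) and EPAM (%) for a set of n predictions."""
    x = np.asarray(predicted, dtype=float)      # x_i: predicted values
    x_hat = np.asarray(observed, dtype=float)   # x_hat_i: observed values
    diff = x - x_hat
    eam = np.mean(np.abs(diff))                   # Equation (6)
    epm = np.mean(diff / x_hat) * 100.0           # Equation (5), keeps the sign
    epam = np.mean(np.abs(diff / x_hat)) * 100.0  # Equation (7)
    return eam, epm, epam

print(forecast_errors(predicted=[98, 125, 88], observed=[100, 120, 90]))
```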

Figure 6 - Forecast for a group of products with an ANN with two hidden layers of five neurons.
Figure 7 - Forecast for a product with an ANN with two hidden layers of five neurons.

Some figures have vertical lines that delimit the series by store, so that a more detailed assessment can be made: store 0, store 1, store 2, store 3, store 4, store 5 and store 6 (series for the month of April) appear from left to right and, after the last one, a new sequence (series for May) starts. The red line corresponds to the desired value and the blue line to the value predicted by the ANN.

Table 1 shows the EAM, EPM and EPAM of the forecasts for the group, for each architecture.

Table 1 - Forecast errors for the product group. L = hidden layers; N = neurons per hidden layer; E = execution. The last column, filled only for the Average rows, gives the error expressed in units of product.

L  N  E        EAM     EPM (%)   EPAM (%)   Error (units)
1  5  1        0.028    -2.722    2.8
1  5  2        0.036     7.732    3.6
1  5  3        0.036     0.035    3.6
1  5  Average  0.033     1.682    3.333      234.353
1  4  1        0.036     1.103    3.6
1  4  2        0.029    -2.709    2.9
1  4  3        0.032    -1.082    3.2
1  4  Average  0.032    -0.896    3.233     -124.840
1  3  1        0.035     1.327    3.5
1  3  2        0.035     1.293    3.5
1  3  3        0.032     3.779    3.2
1  3  Average  0.034     2.133    3.4        297.191
1  2  1        0.036     1.299    3.6
1  2  2        0.033     3.369    3.3
1  2  3        0.029     7.96     2.9
1  2  Average  0.033     4.209    3.267      586.486
2  5  1        0.038   -12.058    3.8
2  5  2        0.034     5.268    3.4
2  5  3        0.032    -1.299    3.2
2  5  Average  0.035    -2.696    3.467     -375.680
2  4  1        0.036     1.042    3.6
2  4  2        0.036     4.918    3.6
2  4  3        0.037     1.999    3.7
2  4  Average  0.036     2.653    3.633      369.642
2  3  1        0.035     1.313    3.5
2  3  2        0.032     5.305    3.2
2  3  3        0.036    -3.34     3.6
2  3  Average  0.034     1.093    3.4333     152.241
2  2  1        0.04      2.93     4
2  2  2        0.033     0.651    3.3
2  2  3        0.035     5.906    3.5
2  2  Average  0.036     3.162    3.6        440.608

Given these results, it can be observed that the percentage error indices are small. Note also that the results vary according to the store, since some store time series are more complex than others: the ANN has more difficulty with stores 3, 4 and 5, whereas for stores 0, 1 and 2 the model is more effective, as illustrated in Figures 5 and 6. Considering that the results were obtained on points beyond the training points and that the model takes the different stores into account, the ANN showed a good performance on average. With respect to the quantity of units of product 6167 sold, the worst case presented an EPM of -7.801% and the best case 1.422%; in units of product, this corresponds to 54.35 units in the worst case and 51.24 units in the best case. For the product group, the percentage error was 3.6% in the worst case, corresponding to 501.58 units of the group, and 3.3% in the best case, corresponding to 390.12 units, a difference of 111.46 units between the worst and the best case. It can be noticed that the error, both in the forecast of the product and in the forecast of the group, would not cause considerable harm to the company, since these values are small and easily absorbed in the following months.

7 Conclusion

The sales forecast is very important for business decision making, since it enables better inventory management, i.e., it helps the company buy product quantities so that it neither loses sales nor increases the cost of inventory.

Therefore, this paper forecast the sales, in units, of a product of a specific brand and model and of the group of products related to it. The model was built on a week-by-week time window of 2002, covering seven branches, to predict the values of 2003 (April and May). The results showed that it is possible to develop an ANN-based predictive model for the company analyzed that predicts the wholesale sales of the product and of its group satisfactorily; even with few input data, a model with good performance was obtained.

Given that several factors affect the sales forecast, such as promotions, salespeople's commissions and taxes, among others, we suggest as future work the use of more variables that impact the proposed model and an increase in the length of the time series used for training. Moreover, we suggest testing other ANN training algorithms, e.g., Backpropagation with an efficient static learning rate [22].

8 References

[1] BARROS, Adélia Carolina de Andrade. Otimização de Redes Neurais Artificiais para Previsão de Séries Temporais. Undergraduate final project (Computer Engineering), Escola Politécnica de Pernambuco, May 2005.

[2] BAUM, E. B.; HAUSSLER, D. What size net gives valid generalization? Neural Computation, Vol. 1, pp. 151-160, 1989.

[3] BRAGA, Antônio de Pádua; CARVALHO, André Ponce de Leon F. de; LUDERMIR, Teresa Bernarda. Redes Neurais Artificiais: Teoria e Aplicações. 2nd ed. Rio de Janeiro: LTC, 2007.

[4] CALIL, Leonardo Aparecido de Almeida; CARVALHO, Deborah Ribeiro; SANTOS, Celso Bilynkievycz; VAZ, Salete Marcon Gomes. Mineração de Dados e Pós-Processamento em Padrões Descobertos. Publ. UEPG Ciências Exatas e da Terra, Ciências Agrárias e Engenharias, Ponta Grossa, Vol. 14, No. 3, pp. 207-215, Dec. 2003.

[5] FAYYAD, Usama; PIATETSKY-SHAPIRO, Gregory; SMYTH, Padhraic. From Data Mining to Knowledge Discovery in Databases. AI Magazine, American Association for Artificial Intelligence, Fall 1996.

[6] FIGUEREDO, José Clodoaldo. Previsão de Séries Temporais Utilizando a Metodologia Box & Jenkins e Redes Neurais Artificiais para Inicialização de Planejamento e Controle de Produção. Master's dissertation, Universidade Federal do Paraná, Curitiba, 2008.

[7] FREIMAN, José Paulo; PAMPLONA, Edson de O. Redes Neurais Artificiais na Previsão do Valor de Commodity do Agronegócio. V Encuentro Internacional de Finanzas, Santiago, Chile, January 19-21, 2005. Available at: <www.iepg.unifei.edu.br/edson/download/artfreimanchile05.pdf>. Accessed June 2010.

[8] HAYKIN, Simon. Redes Neurais Artificiais: Princípios e Prática. 2nd ed. São Paulo: ARTMED, 1999.

[9] KNIME Desktop. Konstanz Information Miner. Available at: <http://www.knime.org/downloads-0>. Accessed June 2010.

[10] KOVÁCS, Zsolt László. Redes Neurais Artificiais: Fundamentos e Aplicações. 4th ed. São Paulo: Editora Livraria da Física, 2006.

[11] MACIEL, Leandro S.; BALLINI, Rosangela. Neural Networks Applied to Stock Market Forecasting: An Empirical Analysis. Journal of the Brazilian Neural Network Society, Vol. 8, No. 1, pp. 3-22, 2010.

[12] MORETTIN, Pedro Alberto; TOLOI, Clélia Maria de Castro. Modelos para Previsão de Séries Temporais. In: 13º Colóquio Brasileiro de Matemática. Rio de Janeiro, 1981.

[13] MUELLER, Alessandro. Uma Aplicação de RNAs na Previsão do Mercado Acionário. Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Departamento de Pós-Graduação em Engenharia de Produção, Florianópolis, 1996.
[14] NAVATHE, Shamkant; ELMASRI, Ramez E. Sistemas de Banco de Dados: Fundamentos e Aplicações. 4th ed. Pearson Education, 2005, pp. 622-643.

[15] RIEDMILLER, Martin; BRAUN, Heinrich. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, April 1993, pp. 586-591.

[16] SOUZA, Reinaldo Castro. Modelos Estruturais para Previsão de Séries Temporais: Abordagem Clássica e Bayesiana. Instituto de Matemática Pura e Aplicada do CNPq. In: 17º Colóquio Brasileiro de Matemática. Rio de Janeiro, 1989.

[17] VIGLIONI, Giovanni Melo Carvalho. Comparação entre Redes Neurais e Técnicas Clássicas para Previsão de Demanda de Transporte Ferroviário. Available at: <http://publique.rdc.pucrio.br/rica/media/ica01_viglioni.pdf>. Accessed June 2010.

[18] YOON, Youngohc; SWALES, George; MARGAVIO, Thomas. Comparison of Discriminant Analysis versus Artificial Neural Networks. The Journal of the Operational Research Society, Vol. 44, No. 1, pp. 51-60, Jan. 1993. Available at: <http://www.jstor.org/stable/2584434>. Accessed June 2010.

[19] ZUCHINI, Márcio Henrique. Aplicações de Mapas Auto-organizáveis em Mineração de Dados e Recuperação de Informação. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação, Campinas, 2003.

[20] FLORES, João Henrique Ferreira; WERNER, Liane. Aplicação de Redes Neurais Artificiais à Previsão de Vendas de Máquinas Agrícolas. XXVII Encontro Nacional de Engenharia de Produção, Foz do Iguaçu, PR, Brazil, Oct. 2007. Available at: <www.abepro.org.br/biblioteca/enegep2007_tr620466_9360.pdf>. Accessed April 2010.

[21] ABELÉM, Antônio Jorge Gomes. Redes Neurais Artificiais na Previsão de Séries Temporais. Master's dissertation, Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Engenharia Elétrica, Rio de Janeiro, September 1994.

[22] CAMILO, Celso; YAMANAKA, Keiji. A Practical Method for Finding an Efficient Static Learning Rate for ANN. Proceedings of the 2008 International Conference on Artificial Intelligence (ICAI 2008), July 14-17, 2008.