Artificial Neural Networks for Time Series Prediction - a novel Approach to Inventory Management using Asymmetric Cost Functions




Sven F. Crone
University of Hamburg, Institute of Information Systems
crone@econ.uni-hamburg.de

Abstract

Artificial neural networks in time series prediction generally minimize a symmetric statistical error, such as the sum of squared errors, to model least squares predictors. However, applications in business elucidate that real forecasting problems contain non-symmetric errors. In inventory management, the costs arising from over- versus underprediction are dissimilar for errors of identical magnitude, requiring an ex-post correction of the predictor through safety stocks. To reflect this, an asymmetric cost function is developed and employed as the objective function for neural network training, deriving superior forecasts and a cost efficient stock-level directly from the network output. Experimental results are computed using a multilayer perceptron trained with different objective functions, evaluating the performance in competition with statistical forecasting methods on a white noise time series.

1. Introduction

Artificial neural networks (ANN) have found increasing consideration in forecasting theory, leading to successful applications in time series and explanatory sales forecasting [2,14,16]. In management, forecasts are a prerequisite for all decisions based upon planning. Therefore, the quality of a forecast is evaluated by its ability to enhance the quality of the resulting management decision. In inventory management, the final evaluation of a forecast must be measured by the monetary costs arising from setting suboptimal inventory levels based on imprecise predictions of future demand [13]. The costs from over- and underprediction are typically not quadratic in form and frequently non-symmetric [8].
Based upon modest research in non-quadratic error functions in ANN theory [2,11,15] and asymmetric costs in prediction theory [1,8,4], a set of asymmetric cost functions was recently proposed as objective functions for neural network training [6]. In this paper, we analyse the efficiency of a linear asymmetric cost function in inventory management decisions, training a multilayer perceptron to find a cost efficient stock-level for a stationary white noise time series directly from the data. Following a brief introduction to neural network prediction in inventory management, Section 3 assesses statistical error measures and asymmetric cost functions for neural network training. Section 4 gives an experimental evaluation of neural networks trained with asymmetric cost functions, outperforming expert software-systems for time series prediction. Conclusions are given in Section 5.

2. Neural network predictions for inventory management decisions

2.1. Forecasting in inventory management

In management, forecasts are generated as a prerequisite of business decisions. Through decisions based on sub-optimal forecasts, costs arise to the decision maker in choosing an inefficient alternative. In inventory management, forecasts of future demands are generated to select an efficient inventory level, balancing inventory holding costs for excessive stocks with costs of lost sales-revenue through insufficient stock [3]. Although the amount of costs will generally increase with the numerical magnitude of the forecast errors, the costs arising from over- and underprediction are frequently neither symmetric nor quadratic [8,5]. A service-level is routinely determined from strategic objectives or according to the actual costs arising from the decision, e.g. aiming to fulfil 98.5% of customer demand to balance this trade-off. Assuming a Gaussian distribution, setting the inventory level to the optimum predictor will only fulfil 50% of all customer demand.
Therefore, safety-stocks are calculated to reach the service level, using assumptions on the conditional distribution of the ex-post forecast errors of the method applied. For the decision of an inventory level for a single product in a single period of time, the classic newsboy problem is applicable. The decision rule for a service level resulting from a given cost of underprediction c_u and overprediction c_o reads

  p(y ≤ Q*) = c_u / (c_u + c_o),  (1)

giving the value for a lookup of k in the probability table of the valid distribution. The final stock-level s is calculated using the forecast and adding a safety stock (SS) of k standard deviations of the forecast errors:

  s = ŷ_t + k σ_e.  (2)

Consequently, the precision of the forecasts directly determines the safety stocks kept, the inventory level and the inventory holding costs. Hence, forecasting methods with superior accuracy such as ANN may significantly reduce inventory holding costs [3].

2.2. Neural networks for time series forecasting

Forecasting time series with non-recurrent ANNs is generally based on modelling the network in analogy to a non-linear autoregressive AR(p) model [1,4]. At a point in time t, a one-step ahead forecast ŷ_{t+1} is computed using n observations y_t, y_{t-1}, ..., y_{t-n+1} from n preceding points in time t, t-1, t-2, ..., t-n+1, with n denoting the number of input units of the ANN. This models a time series prediction of the form

  ŷ_{t+1} = f(y_t, y_{t-1}, ..., y_{t-n+1}).  (3)

The architecture of a feed-forward multilayer perceptron (MLP) of arbitrary topology, together with the resulting residuals of invalid forecasts denoted as the absolute error AE(t) = |y_t − ŷ_t|, is displayed in Fig. 1.

[Figure 1. Neural network application to time series forecasting in inventory management, applying a (4-4-1)-MLP, using n=4 input units for observations in t, t-1, t-2, t-3, 4 hidden and 1 output neuron for time period t+1 [16].]

The task of the MLP is to model the underlying generator of the data during training, so that a valid forecast is made when the trained network is subsequently presented with a new value for the input vector [2]. Therefore, the objective function used for ANN training determines the resulting system behaviour and performance.
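Under the Gaussian assumption, equations (1) and (2) can be sketched in a few lines of Python; the cost, forecast and deviation figures below are illustrative assumptions, not values from the experiment.

```python
from statistics import NormalDist

def stock_level(forecast, sigma_e, c_u, c_o):
    """Newsboy critical fractile: service level from under-/overprediction
    costs (1), safety factor k as the Gaussian quantile, and the final
    stock s = forecast + k * sigma_e (2)."""
    service_level = c_u / (c_u + c_o)           # equation (1)
    k = NormalDist().inv_cdf(service_level)     # safety factor in std deviations
    return forecast + k * sigma_e, service_level, k   # equation (2)

# Illustrative numbers: a demand forecast of 1000 units, forecast-error std of
# 50 units, $1.00 per unit of lost sales vs. $0.001 holding cost per unit.
s, sl, k = stock_level(1000.0, 50.0, c_u=1.0, c_o=0.001)
```

With these assumed costs the critical fractile is 1.00/1.001 ≈ 0.999, so k lies close to 3.09 standard deviations, matching the safety-stock logic of equation (2).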
The objective functions routinely employed in neural network training differ from the objective function of the underlying inventory management decision in slope, scale and ratio of asymmetry. In the following, alternative objective functions are discussed to incorporate the original objective structure in ANN training.

3. Objective functions for neural network training

Supervised online-training of a MLP is the task of adjusting the weights w_ij of the links between units j and adjusting their thresholds to minimize the error δ_j between the actual and a desired system behaviour [11]. Gradient descent algorithms traditionally minimize the modified sum of squared errors (SSE) as the objective function, ever since the popular description of the backpropagation algorithm by Rumelhart, Hinton and Williams [12]. The SSE, as all statistical error measures, produces a value of 0 for an optimal forecast and is symmetric about e_t = 0, implying symmetric costs of errors in predicting future demand for inventory levels. The consistent use of the modified SSE in time series forecasting with ANN is motivated primarily by analytical simplicity [11] and the similarity to statistical regression problems, modelling the conditional distribution of the output variables [2], similar to most statistical forecasting methods. As neural network theory and applications consistently focus on the symmetric SSE-function for training, also modelling least squares predictors, the forecasts need to be adjusted using safety stocks to attain a desired service-level.

In the following, we propose an asymmetric cost function (ACF), modelling the objective function of the costs arising in the original decision problem instead of least squares predictors. These costs are often not only non-quadratic, but also non-symmetric in form. The objective function in ANN training, determining the size of the error in the output-layer, may thus be interpreted as the actual costs arising from an overprediction or an underprediction of the current pattern p in training.
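As a sketch of this idea, a linear asymmetric cost and its subgradient with respect to the network output can be written as plain functions; in an online backpropagation step, the subgradient would simply replace the derivative of the squared error in the output layer. This is a hypothetical illustration under the stated cost interpretation, not the paper's code:

```python
def linlin_cost(y, y_hat, a, b):
    """Linear asymmetric cost: slope a per unit of overprediction
    (stock-keeping cost), slope b per unit of underprediction (lost sales)."""
    error = y - y_hat
    return -a * error if error < 0 else b * error

def linlin_delta(y, y_hat, a, b):
    """Subgradient of the cost w.r.t. the output y_hat; this term replaces
    the squared-error term in the output-layer weight adaptation."""
    if y_hat > y:
        return a      # overprediction: cost increases with y_hat at slope a
    if y_hat < y:
        return -b     # underprediction: cost decreases with y_hat at slope b
    return 0.0
```

For a = b = 1 the cost reduces to the absolute error AE, so the symmetric statistical measure appears as a special case of the asymmetric cost.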
Recently, we introduced a linear ACF to ANN training [6], originally developed by Granger for statistical forecasts in inventory management problems [9]. The LINLIN cost function (LLC) is linear to the left and right of 0. The parameters a and b give the slopes of the two branches of the cost function and measure the costs of error for each stock keeping unit (SKU) of difference between the forecast ŷ and the actual value y. The parameter a corresponds to an overprediction and the resulting stock-keeping costs, while b relates to the costs of lost sales-revenue for each underpredicted SKU. The LLC yields:

  LLC(y, ŷ) = a |y − ŷ|   for y < ŷ
              0           for y = ŷ    (4)
              b |y − ŷ|   for y > ŷ

The shape of one asymmetric LLC, as a valid linear approximation of a real cost function in our corresponding inventory management problem, is displayed in Fig. 2.

[Figure 2. Empirical asymmetric cost function showing the costs arising for over- and underprediction, using a=$0.001 and b=$1.00, in comparison to the SSE.]

For a ≠ b these cost functions are non-symmetric about 0 and are hence called asymmetric cost functions. The degree of asymmetry is given by the ratio of a to b [4]. For a = b = 1 the LLC equals the statistical absolute error measure AE. The linear form of the ACF represents constant marginal costs arising from the business decision. Our model therefore coincides with the analysis of business decisions based on linear marginal costs and profits.

Yang, Chan and King introduce a classification scheme for objective functions, introducing dynamic non-symmetric margins for support vector regression [17]. Applied to objective functions in ANN training, it allows a classification of all symmetric statistical error functions and asymmetric cost functions previously developed. Linear, non-linear and mixed ACFs have been specified in the literature [1,4], while variable or dynamic objective functions have not yet been developed for ANN training, as shown in Table 1.

Table 1. Objective functions for neural network training

  Variability of         Symmetry of objective function
  objective function     Symmetric                       Non-symmetric
  Fixed                  SSE, AE, ACE etc.               LINLIN etc.
                         (statistical error functions)   (asymmetric cost functions)
  Variable               -                               -

Asymmetric transformations of the error function alter the error surface significantly, resulting in changes of slope and creating different local and global minima. Therefore, using gradient descent algorithms, different solutions are found minimizing cost functions instead of symmetric error functions, finding a cost minimum prediction for the inventory management problem. These asymmetric cost functions may be applied in ANN training using a simple generalisation of the error-term of the back-propagation rule and its derivatives, amending only the error calculation for the weight adaptation in the output layer [6], or by applying alternative training methods or global search methods to allow network training [11].

4. Simulation experiment of neural networks in inventory management

4.1. Experimental time series and objectives

In the following, we conduct an experiment to evaluate the ability of a MLP to evolve a set of weights minimizing an LLC asymmetric cost function for a random, stationary time series. To control additional influences in the forecasting experiment, we analyse a time series which is white noise of the form:

  y_t = c + e_t.  (5)

A white noise model represents a simple random model consisting of an overall level c and a random error component e_t which is uncorrelated in time [1]. The time series considered is derived from the popular monthly airline passenger data, introduced by Brown [3]. In [1], the monthly values from 1/1949 to 12/1956 are detrended and deseasonalised using an X-12-ARIMA Census II decomposition, extracting only the residuals. An analysis of the autocorrelations confirms a stationary white noise model with c=1249 and no data-pattern in the residuals e_t. Considering the structure of a stationary white-noise model, lacking any systematic pattern in the residuals e_t, it should prohibit any ANN from extracting an underlying linear or nonlinear generator of the data and thus outperforming competing ANNs or linear methods, ensuring an unbiased comparison of methods regardless of individual model performance. Experiencing random fluctuations, the ANN, as the other methods, should predict the level c as the optimum predictor without overfitting to the training data.

In order to specify the underlying costs arising from the decision process, we require a suitable objective function. We construct the airline decision problem as an inventory model without backordering. An airline carrier needs to allocate planes of different sizes to match passenger demand of flights. Overprediction of air flight passengers leads to inventory holding costs a per seat, while underprediction results in costs b through lost sales-revenue per seat, assuming b>a and disregarding fixed costs of the decision. For our experiment, we assume a highly asymmetric cost relationship of (a=$0.001; b=$1.00). We generate business forecasts based upon ordinary least-squares predictors added to the safety buffers necessary to achieve the desired service level. Additionally, we use asymmetric cost predictors to decide the cost efficient amount of passenger seats provided for each month, assessing the ex-post performance. For forecasting methods using ordinary least squares predictors, such as ANN trained on the SSE or ARIMA models, the service level is calculated using

  p(y ≤ Q*) = c_u / (c_u + c_o) = 1.00 / 1.001 = 0.999,  (6)

to derive an optimum service level of 99.9% for our inventory problem, therefore requiring k=3.09 standard deviations assuming a Gaussian distribution of residuals.

4.2. Experimental design of the forecasting methods

A sample of n=96 observations is split into three consecutive datasets, using 72 observations for the training-, 12 for the validation- and 12 for the test-set, resulting in 59, 12 and 12 predictable patterns in each set. All data was scaled linearly from its original range to the interval [-1;1]. We consider a fully connected MLP with a topology of 13 input, 12 hidden and 1 output node without shortcut connections, to exploit all feasible yearly time-lags of a monthly series.
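The sliding-window set-up of Section 2.2 and the data split above can be sketched as follows; the stand-in series is synthetic, since the detrended airline residuals are not reproduced here:

```python
def make_patterns(series, n_inputs=13):
    """Sliding-window patterns for a non-linear AR(p)-style MLP, as in
    equation (3): inputs y_t ... y_{t-n+1}, target y_{t+1}."""
    X, y = [], []
    for t in range(n_inputs, len(series)):
        X.append(series[t - n_inputs:t])
        y.append(series[t])
    return X, y

def scale(series, lo, hi):
    """Linear scaling of observations from [lo, hi] to [-1, 1] for tanh units."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]

# 96 monthly observations split 72/12/12: 13 input lags leave 72 - 13 = 59
# training patterns, matching the pattern counts stated above.
demo = list(range(96))  # stand-in series, not the paper's airline data
X, y = make_patterns(demo[:72])
```

The same windowing, applied to the validation and test segments with the preceding lags available, yields the 12 patterns each reported for those sets.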
All units use summation as input-function, the tangens hyperbolicus (tanh) as a semi-linear activation-function and the identity function as output-function. Additionally, 1 bias unit models the thresholds for all units in the hidden and output layer. Two sets of networks were trained: set ANN_SSE was trained minimizing the SSE, while set ANN_LLC was trained minimizing the asymmetric cost function (4) with the parameters LLC (a=$0.001; b=$1.00). Each MLP was initialised and trained five times to account for the randomised starting weights in [-0.3;0.3]. Training consisted of a maximum number of epochs, with a validation after every epoch, applying early stopping if the validation error did not decrease for a given number of epochs. After training, the results for the best network, chosen on its performance on the validation set, and the average of the five networks are computed for all data-sets. Only the test-set data is used to measure generalisation, applying a simple hold-out method for cross-validation. As benchmarks, the Naïve1 method, using last period's sales as a forecast, and results using the software packages Forecast Pro and Autobox were computed. Each software selects and parameterises appropriate models of exponential smoothing or ARIMA intervention models based upon statistical testing and expert knowledge. All ANNs were simulated using NeuralWorks Professional II and distinct error function tables to bias the calculated standard error. For the predictions by ANN_SSE and the expert software systems, the final inventory level was calculated using

  s = ŷ_t + 3.09 σ_e,  (7)

resulting in an ex-post correction of the ordinary least squares predictor. The ANN_LLC was trained to forecast the inventory level directly through the ACF.

4.3. Experimental results on asymmetric costs

Table 2 displays the results using mean error measures computed on each data-set to allow comparison between data-sets of varying length.
Table 2. Results of forecasting methods and ANNs trained on linear asymmetric costs and squared error measures (training-set / validation-set / test-set)

  Method                                        | MSE(e)              | MLLC(e)           | β service level          | Rank on test-set (MSE / MLLC)
  Naïve1 (random walk)                          | 18.4 / 1.26 / .82   | 1.57 / 1.53 / .32 | 98.75% / 98.78% / 99.74% | 4 / 11
  Autobox ARIMA with pulse                      | 1.8 / 2.95 / .58    | .53 / .58 / .25   | 99.58% / 99.54% / 99.8%  | 2 / 9
  Forecast Pro XE ARIMA (0,0,0)                 | 1.3 / 4.39 / .5     | 1.2 / .92 / .23   | 99.19% / 99.27% / 99.82% | 1 / 8
  ANN trained on SSE, best network (No. 45)     | 1.8 / 4.48 / .68    | .96 / .77 / .16   | 99.23% / 99.38% / 99.88% | 3 / 7
  ANN trained on SSE, average of 5 nets         | 1.91 / 4.9 / .82    | 1.7 / .91 / .27   | 99.14% / 99.28% / 99.78% | 4 / 10
  Autobox ARIMA + safety stock k=3              | 18.5 / 21.6 / 18.47 | .5 / .4 / .04     | 99.96% / 100.0% / 100.0% | 8 / 3
  Forecast Pro ARIMA + safety stock k=3         | 123.6 / 81.7 / 81.3 | .5 / .9 / .09     | 99.96% / 100.0% / 100.0% | 11 / 6
  ANN trained on SSE, best net + safety stock k=3     | 39.9 / 3.3 / 31.6  | .19 / .1 / .06 | 99.85% / 100.0% / 100.0% | 9 / 4
  ANN trained on SSE, av. of 5 nets + safety stock k=3 | 42.0 / 32.4 / 33.8 | .18 / .1 / .06 | 99.86% / 100.0% / 100.0% | 10 / 4
  ANN_LLC trained on LLC, best net (No. 48)     | 18.3 / 9.5 / 8.47   | .4 / .17 / .03    | 99.68% / 99.86% / 100.0% | 7 / 1
  ANN_LLC trained on LLC, av. of 5 networks     | 15.23 / 8.87 / 6.36 | .45 / .17 / .03   | 99.65% / 99.81% / 100.0% | 6 / 1

The results in Table 2 are given in the form (training-set / validation-set / test-set) to allow interpretation. The descriptive performance measure of the β-service-level gives the amount of suppressed sales per dataset in relation to all demand. An asymmetric ex-post performance measure is calculated, denoting the ex-post mean LINLIN costs (MLLC) resulting from a given forecast method. All methods are evaluated on, and ranked by, their performance for the mean squared error (MSE) and the MLLC.

Various results may be drawn from the experiment. As expected, the best ANN trained on the standard SSE gives forecasts close to the white noise level c with little deviation due to limited overfitting on the residuals, as shown in Fig. 3. The forecasts given by all ANNs robustly centre around c, as displayed in the comparative performance of the average of all 5 networks trained on minimizing the SSE in Table 2.

[Figure 3. ANN trained on minimizing the symmetric SSE to forecast monthly airline passengers, showing the white noise time series of ticket sales, the ANN forecast and the ex-post forecast error.]

The ARIMA intervention model parameterised by the software Autobox fitted significant deviations from c as pulses to reduce the ex-post forecast error, thereby reducing the standard deviation of the error, as shown in Figure 5.

[Figure 5. Autobox prediction of the ARIMA intervention model, the original white noise time series of ticket sales and the ex-post forecast error.]

However, the ARIMA (0,0,0)-model outperforms all other methods, including the ANN and the random walk model, regarding the MSE on the test-set. The impact of intervention modelling in comparison to standard ARIMA-modelling becomes evident in the proposed inventory levels of the competing models.
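The two ex-post measures used in Table 2 can be computed directly from paired demand and stock series; a minimal sketch with made-up numbers:

```python
def beta_service_level(demand, stock):
    """Beta service level: share of total demand actually served, i.e.
    1 - (suppressed sales / all demand) over a data-set."""
    lost = sum(max(d - s, 0.0) for d, s in zip(demand, stock))
    return 1.0 - lost / sum(demand)

def mean_llc(demand, stock, a, b):
    """Ex-post mean LINLIN cost (MLLC) of an inventory decision: holding
    cost a per unit of surplus, lost-sales cost b per unit of shortage."""
    costs = [a * (s - d) if s > d else b * (d - s) for d, s in zip(demand, stock)]
    return sum(costs) / len(costs)

# Illustrative: three periods, with one stockout of 5 units in period two.
demand = [100.0, 120.0, 110.0]
stock  = [105.0, 115.0, 115.0]
```

With the asymmetric parameters of the experiment, the single 5-unit stockout dominates the MLLC, mirroring how stockouts drive the rankings in Table 2.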
Although all methods avoid costly stockouts, achieving a service level of 100.0%, the increased standard deviation of the ARIMA forecasting residuals including all pulses leads to significantly higher safety stocks and an increased inventory level on the test-set. In comparison, the ARIMA (0,0,0)-model selected by the expert system software Forecast Pro predicts the constant c precisely, as shown in Figure 4.

[Figure 4. Forecast Pro prediction of the ARIMA (0,0,0)-model, the original white noise time series of ticket sales and the ex-post forecast error.]

[Figure 6. Autobox forecast of the ARIMA intervention models including safety stocks of k=3 standard deviations and the ex-post forecast error.]

Consequently, the costs arising from the ARIMA inventory levels exceed those of the intervention models, as displayed in Figures 6 and 7 as well as in the error measures presented in Table 2.

[Figure 7. Forecast Pro forecast of the ARIMA (0,0,0) model including safety stocks of k=3 standard deviations and the ex-post forecast error.]

The best ANN_LLC trained with the asymmetric LINLIN cost function, shown in Fig. 8, gives an overall superior forecast regarding the business objective, achieving the lowest mean costs on the test-data with .03. It exceeds all inventory methods, and clearly outperforms forecasts of the ANN trained with the SSE criterion and added safety stocks, as well as the ARIMA and ARIMA intervention models with safety stocks selected by the software expert systems.

[Figure 8. ANN trained on minimizing the asymmetric cost function LLC to forecast monthly airline passengers, showing the actual ticket sales, the ANN forecast and the ex-post forecast error.]

Analysing the behaviour of the forecast based upon the asymmetry of the cost function, the neural network ANN_LLC raises its predictions in comparison to the ANN trained on squared errors to achieve a cost efficient forecast of the optimum inventory level, accounting for the higher costs of underprediction versus overprediction and therefore avoiding costly stock-outs. This is also evident in the lack of stock-outs, represented by the increased β-service-level of 100.0%. Consequently, the neural network no longer predicts the expected mean c of the white noise function but instead produces a biased optimum predictor, as intended by Granger's original work through ex-post correction of the original predictor [8]. This may be interpreted as finding a point on the conditional distribution of the optimal predictor depending on the standard deviation.
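The biased optimum predictor can be illustrated numerically: for a constant prediction on Gaussian white noise, the LINLIN-minimal choice is the c_u/(c_u + c_o) quantile of the demand distribution rather than its mean. A self-contained sketch with simulated demand (the level and spread are illustrative, not the airline data):

```python
import random

def mean_cost(data, q, a, b):
    """Mean LINLIN cost of a constant predictor q over a demand sample."""
    return sum(a * (q - y) if q > y else b * (y - q) for y in data) / len(data)

random.seed(0)
a, b = 0.001, 1.0                          # highly asymmetric cost relationship
demand = [random.gauss(100.0, 10.0) for _ in range(10_000)]

# The cost-minimal constant predictor is the b/(a+b) = 0.999 empirical
# quantile: the forecast itself absorbs the safety stock.
level = b / (a + b)
q_star = sorted(demand)[int(level * len(demand))]
biased_gap = q_star - 100.0                # roughly 3.09 standard deviations
```

Evaluating `mean_cost` at `q_star` and at the mean shows the biased quantile predictor incurring only a fraction of the cost of the unbiased mean forecast, which is the mechanism behind the MLLC rankings in Table 2.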
For an inventory management problem, the network finds a cost efficient inventory level directly from the cost relationship, without the separate calculation of safety stocks. This reduces the complexity of the overall management process of stock control, successfully calculating a cost efficient inventory level directly through a forecasting method using only a cost function and the data.

5. Conclusion

We have examined symmetric and asymmetric error functions as performance measures for neural network training. The restriction to squared error measures in neural network training may be motivated by analytical simplicity, but it leads to biased results regarding the final performance of forecasting methods. Asymmetric cost functions can capture the actual decision problem directly and allow a robust minimization of relevant costs using standard multilayer perceptrons and training methods, finding optimum inventory levels. Our approach to train neural networks with asymmetric cost functions has a number of advantages. Minimizing an asymmetric cost function allows the neural network not only to forecast, but instead to reach optimal business decisions directly, taking the model building process closer towards business reality. As demonstrated, considerations of finding optimal service levels in inventory management are incorporated within the ANN training process, leading directly to the forecast of a cost minimum stock level without further computations. However, the limitations and promises of using asymmetric cost functions with neural networks require systematic analysis. Future research may incorporate the modelling of dynamic carry-over-, spill-over-, threshold- and saturation-effects for exact asymmetric cost functions where applicable. In particular, verification on multiple time series, other network topologies and architectures is required, in order to evaluate current research results.

References

[1] G. Arminger, N. Götz, Asymmetric Loss Functions for Evaluating the Quality of Forecasts in Time Series for Goods Management Systems, Technical Report 22/1999, Dortmund, 1999.
[2] J. S. Armstrong, Strategic Planning and Forecasting Fundamentals, in K. Albert (ed.), The Strategic Management Handbook, McGraw-Hill, New York, 1983, pp. 2-1 - 2-32.

[3] J. S. Armstrong, Error Measures for Generalizing About Forecasting Methods - Empirical Comparisons, International Journal of Forecasting, 8, 1992, pp. 69-80.
[4] J. S. Armstrong, Evaluating Forecasting Methods, in J. S. Armstrong (ed.), Principles of Forecasting - A Handbook for Researchers and Practitioners, Kluwer Academic, Boston, 2001, pp. 443-472.
[5] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[6] G. Box, G. Jenkins, Time Series Analysis - Forecasting and Control, Holden-Day, San Francisco, 1970.
[7] R. G. Brown, Statistical Forecasting for Inventory Control, McGraw-Hill, New York, 1959.
[8] P. F. Christoffersen, F. X. Diebold, Further Results on Forecasting and Model Selection under Asymmetric Loss, Journal of Applied Econometrics, 11, 1996, 5, pp. 561-572.
[9] P. F. Christoffersen, F. X. Diebold, Optimal Prediction under General Asymmetric Loss, Econometric Theory, 13, 1997, pp. 808-817; reprinted in M. P. Clements, D. F. Hendry (eds.), Forecasting Economic Time Series, Cambridge University Press, Cambridge, 1998.
[10] S. F. Crone, Training Artificial Neural Networks for Time Series Prediction using Asymmetric Cost Functions, in L. Wang, C. Rajapakse, K. Fukushima, S.-Y. Lee, X. Yao (eds.), Computational Intelligence for the E-Age - Proceedings of the 9th International Conference on Neural Information Processing ICONIP'02, Nov. 18-22, 2002, Singapore, vol. III, pp. III-1 - III-8.
[11] G. Dorffner, Neural Networks for Time Series Processing, Neural Network World, 6, 1996, 4, pp. 447-468.
[12] C. W. J. Granger, Prediction with a Generalized Cost of Error Function, Operational Research Quarterly, 20, 1969, pp. 199-207.
[13] C. W. J. Granger, Forecasting in Business and Economics, Academic Press, New York, 1980.
[14] S. Makridakis, S. C. Wheelwright, R. J. Hyndman, Forecasting - Methods and Applications, Wiley, New York, 1998.
[15] A. Parsian, N. S. Farsipour, Estimation of the Mean of the Selected Population under Asymmetric Loss Function, Metrika, 50, 1999, 2, pp. 89-107.
[16] R. D. Reed, R. J. Marks II, Neural Smithing - Supervised Learning in Feedforward Artificial Neural Networks, MIT Press, Cambridge, MA, 1998.
[17] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning Representations by Back-Propagating Errors (reprint), in J. A. Anderson, E. Rosenfeld (eds.), Neurocomputing - Foundations of Research, MIT Press, Cambridge, MA, 1988, pp. 696-699.
[18] J. Schwarze, Statistische Kenngrößen zur Ex-Post-Beurteilung von Prognosen (Prognosefehlermaße) [Statistical measures for the ex-post evaluation of forecasts (forecast error measures)], in J. Schwarze (ed.), Angewandte Prognoseverfahren [Applied Forecasting Methods], Neue Wirtschafts-Briefe, Herne, 1980, pp. 317-344.