ISSUES IN UNIVARIATE FORECASTING
By Rohaiza Zakaria (1), T. Zalizam T. Muda (2) and Suzilah Ismail (3)
UUM College of Arts and Sciences, Universiti Utara Malaysia
(1) rhz@uum.edu.my, (2) zalizam@uum.edu.my and (3) halizus@uum.edu.my
Introduction: Univariate forecasting
Univariate modelling involves only one variable, i.e. a set of time series data. Each series has its own pattern, whether trend, seasonal, cyclical or irregular.
[Figure: example plots of the time series components, including panel (d), an irregular series y_t against t.]
The identification of the components is important in determining a suitable technique.
Forecasting Scenario
[Figure: timeline of the forecasting scenario. T1 is the estimation period (model building: different techniques give different models), T2 the evaluation period with held-out data points and historical forecasts at 1 and 2 steps, ending at t (today/present), and T3 the future forecast period (t+1, t+2, ...) used for policy, planning and control, up to 10 steps ahead.]
Stages in the time series forecasting procedure for a given data set
1. Plot the data and identify the existence of the time series components.
2. Based on the components, choose several suitable forecasting techniques.
3. Divide the data into two parts: model estimation and evaluation.
4. Estimate the models using the techniques identified in (2).
5. Evaluate the models using a recursive process and choose the best model using error measures.
6. Use the best model to forecast the future.
Problems of the study
Solving the forecasting problem by manual calculation is highly costly, so an automated system is needed to avoid this. But how do we automate the forecasting process when tacit knowledge and expert judgement are part of it? This issue of blended knowledge arises in the following parts:
- Identification of time series components.
- Partitioning of data.
- Estimation of forecasting techniques.
- Evaluation using error measures.
For forecasters, producing forecast values is not a problem, although doing it manually takes time. For end users, however, it is a problem because they lack the expertise. Automating this process would therefore be helpful to them.
Issues
- Identification of Time Series Components
- Partitioning of Data
- Estimation of Forecasting Techniques
  - Decomposition (Multiplicative versus Additive)
  - Exponential Smoothing
  - Time Series Regression (Violation of Assumptions)
  - ARIMA Identification
- Evaluation
  - Error Measures
  - Fixed versus Rolling Evaluation
- Solutions to the Issues in Automating the Forecasting Process
Issues in Identification of Time Series Components
Identification needs the blending in of the forecaster's tacit knowledge or expert judgement:
- Trend: what type of trend exists.
- Seasonal: a non-fixed seasonal pattern is difficult to identify.
- Irregular/Cyclical: identifying outliers, which may simply be due to accidental events.
Issues in Partitioning of Data
There is no fixed rule on how to partition the data. Forecasters have adapted the data mining concept of partitioning the data into two parts. The general rule is that the estimation part contains more observations than the evaluation part. What is the best partition ratio?
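As a minimal sketch of the two-part split discussed above, the series can be cut so that the last few observations form the evaluation part and everything before them the estimation part. The class and method names here are illustrative, not from the study's own system; the holdout length is a parameter because, as noted, there is no fixed rule.

```java
import java.util.Arrays;

// Sketch: partition a series into an estimation part and an evaluation part.
// The estimation part keeps the earlier (and larger) portion of the data.
public class Partition {
    public static double[][] split(double[] series, int holdout) {
        int cut = series.length - holdout;
        double[] estimation = Arrays.copyOfRange(series, 0, cut);
        double[] evaluation = Arrays.copyOfRange(series, cut, series.length);
        return new double[][] { estimation, evaluation };
    }
}
```

For example, with a holdout of 5 (the length used later in this study), a series of 100 points yields 95 observations for estimation and 5 for evaluation.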
Issues in Estimation of Forecasting Techniques
- Decomposition (Multiplicative versus Additive): both conditions may be combined simultaneously in the data.
- Exponential Smoothing: choice of initial values and the weighting of the smoothing constant parameters.
- Time Series Regression (Violation of Assumptions): is it true that unfulfilled assumptions lead to inaccurate forecasts? Do we really need to bother checking the assumptions?
- ARIMA Identification: identification of ARIMA models (ACF and PACF plots) and the criteria used in selecting the best ARIMA model (AIC, BIC, standard errors and the parsimony concept).
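One way the smoothing constant issue can be handled automatically is a grid search: fit single exponential smoothing for each candidate alpha and keep the one with the smallest sum of squared one-step-ahead errors. This sketch assumes the initial level equals the first observation (as this study later adopts for Single ES); the class name and the 0.01 grid step are illustrative choices, not the study's exact implementation.

```java
// Sketch: choose the single-exponential-smoothing constant alpha by a
// grid search over the one-step-ahead SSE, with the initial level set
// to the first observation.
public class SesGrid {
    // One-step-ahead SSE of SES for a given alpha.
    public static double sse(double[] y, double alpha) {
        double level = y[0];          // initial value = first data point
        double sum = 0.0;
        for (int t = 1; t < y.length; t++) {
            double e = y[t] - level;  // one-step-ahead error
            sum += e * e;
            level = alpha * y[t] + (1 - alpha) * level;
        }
        return sum;
    }

    // Search alpha on a 0.01 grid in (0, 1) and return the best value.
    public static double bestAlpha(double[] y) {
        double best = 0.01, bestSse = Double.MAX_VALUE;
        for (double a = 0.01; a < 1.0; a += 0.01) {
            double s = sse(y, a);
            if (s < bestSse) { bestSse = s; best = a; }
        }
        return best;
    }
}
```

On a strongly trending series this search pushes alpha toward 1, since a slow-reacting level lags the data; on a flat noisy series it favours small alpha.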
Issues in Evaluation
- Competition: many studies (Fildes et al. (2011), Hyndman and Koehler (2006), Ismail (2005) and Fildes and Ord (2002)) have shown that more than one error measure needs to be used, because each error measure has its own weaknesses and strengths.
- Ranking: ranking methods (Batchelor (1990) and Stekler (1987)) play an important role in selecting the best technique. But by ranking the error measures, the true values of the errors are shadowed by the ranks, so we lose true information about the errors.
- Parametric test: suggested initially by Diebold and Mariano and extended by Harvey et al. (1997), a parametric test is conducted on the mean squared errors of two sets of forecast errors.
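The "more than one error measure, then rank" idea above can be sketched as follows: compute several measures per technique, rank each measure across techniques (smallest error gets rank 1), and sum the ranks. The measures shown are the standard MSE, RMSE and MAPE formulas; the class and method names are illustrative.

```java
import java.util.Arrays;

// Sketch: standard error measures plus a simple ranking helper, mirroring
// the "sum of per-measure ranks" selection approach discussed above.
public class ErrorMeasures {
    public static double mse(double[] actual, double[] fitted) {
        double s = 0;
        for (int i = 0; i < actual.length; i++) {
            double e = actual[i] - fitted[i];
            s += e * e;
        }
        return s / actual.length;
    }

    public static double rmse(double[] actual, double[] fitted) {
        return Math.sqrt(mse(actual, fitted));
    }

    public static double mape(double[] actual, double[] fitted) {
        double s = 0;
        for (int i = 0; i < actual.length; i++)
            s += Math.abs((actual[i] - fitted[i]) / actual[i]);
        return 100.0 * s / actual.length;
    }

    // Rank an array of error values: the smallest error gets rank 1.
    public static int[] rank(double[] errors) {
        Integer[] idx = new Integer[errors.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(errors[a], errors[b]));
        int[] ranks = new int[errors.length];
        for (int r = 0; r < idx.length; r++) ranks[idx[r]] = r + 1;
        return ranks;
    }
}
```

Note the information loss the slide warns about: after `rank`, a technique beaten by a hair and one beaten by an order of magnitude receive the same adjacent ranks.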
Fixed versus Rolling Evaluation
The recursive process is tedious and requires an expert to implement it; because of this, end users still use fixed evaluation.
- Fixed fitted values: performance is evaluated by comparing fixed fitted values against the actual values.
- Rolling fitted values: used in the recursive evaluation, where the equation is updated by including the left-out actual values one by one and re-estimating it, in order to mimic the future process. Once the equation has been re-estimated, fitted values are produced using the updated equation (Fildes et al. (2011) and Lazim (2011)).
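The rolling idea above can be sketched in a few lines: at each evaluation point, re-estimate the model on all actual values seen so far and forecast one step ahead. Single exponential smoothing is used here purely as a stand-in for whichever technique is being evaluated; the class and method names are illustrative, not the study's implementation.

```java
import java.util.Arrays;

// Sketch of rolling (recursive) one-step-ahead evaluation: after each
// left-out point is forecast, it is added back into the history and the
// model is re-estimated, mimicking the future process.
public class RollingEval {
    // One-step-ahead SES forecast re-estimated on the data seen so far.
    static double sesForecast(double[] history, double alpha) {
        double level = history[0];
        for (int t = 1; t < history.length; t++)
            level = alpha * history[t] + (1 - alpha) * level;
        return level; // for SES, the forecast equals the last level
    }

    // Rolling one-step-ahead forecasts over the evaluation part.
    public static double[] roll(double[] series, int holdout, double alpha) {
        double[] forecasts = new double[holdout];
        int cut = series.length - holdout;
        for (int i = 0; i < holdout; i++) {
            // Re-estimate using all actual values up to the forecast origin.
            double[] history = Arrays.copyOfRange(series, 0, cut + i);
            forecasts[i] = sesForecast(history, alpha);
        }
        return forecasts;
    }
}
```

A fixed evaluation, by contrast, would estimate once on the estimation part and reuse that single equation for every evaluation point.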
Initial Solutions to the Issues in Automating the Forecasting Process
- Identification: start by identifying the type of time series data, i.e. yearly or non-yearly (monthly). All forecasting techniques suitable for the data set go through the estimation and evaluation parts.
- Partition: we leave out five data values for the evaluation part, and the rest are used for the estimation part.
- Forecasting techniques: in Exponential Smoothing (ES), the Single ES initial value is set equal to the first data value (Hanke & Wichern, 2005 and Lazim, 2011).
For the Double ES, Holt's and Holt-Winters techniques, the initial values are taken from the coefficients of a time series regression (Gaynor & Kirkpatrick, 1994 and Bowerman, O'Connell & Koehler, 2005). In Time Series Regression, since the data set is large, we assume the assumptions are met. In ARIMA, we use a trial-and-error approach to identify the combination of p, d, q, limiting the search to a maximum of 5 lags for p and q for yearly data and 36 lags for non-yearly (monthly) data.
- Error measures: more than one is used in the algorithm, together with a ranking procedure.
- Rolling evaluation: used in the recursive process to mimic the future process.
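The regression-based initialization mentioned above can be sketched directly: fit y_t = b0 + b1·t by ordinary least squares and use the intercept as the initial level and the slope as the initial trend. The closed-form OLS expressions below are standard; the class name is illustrative, not the study's code.

```java
// Sketch: obtain Holt-type initial values from a simple time series
// regression y_t = b0 + b1*t fitted over t = 1..n, as described above.
public class HoltInit {
    // Returns {b0, b1}: b0 serves as the initial level, b1 as the
    // initial trend.
    public static double[] regress(double[] y) {
        int n = y.length;
        double sumT = 0, sumY = 0, sumTY = 0, sumTT = 0;
        for (int i = 0; i < n; i++) {
            double t = i + 1;
            sumT += t;
            sumY += y[i];
            sumTY += t * y[i];
            sumTT += t * t;
        }
        double b1 = (n * sumTY - sumT * sumY) / (n * sumTT - sumT * sumT);
        double b0 = (sumY - b1 * sumT) / n;
        return new double[] { b0, b1 };
    }
}
```

This is consistent with the findings shown later, where the Holt's initial level and trend in Java equal the time series regression constant and slope.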
Objective of the study
Solve the issues by automating the univariate forecasting process.
Methodology: Model building framework
[Flowchart: Start → Data → Specification (guided by Theory) → Estimation → Model checking: is the model adequate? If No, return to Specification; if Yes, use the model → End.]
Techniques to automate
Yearly data:
- Time series regression
- Moving average
- Double moving average
- Simple exponential smoothing
- Double exponential smoothing
- Holt's exponential smoothing
- Nonseasonal ARIMA
Non-yearly data:
- Time series regression
- Decomposition
- Moving average
- Double moving average
- Simple exponential smoothing
- Double exponential smoothing
- Holt's exponential smoothing
- Holt-Winters exponential smoothing
- Nonseasonal ARIMA
- Seasonal ARIMA
A part of the formulas used to automate the process: the univariate forecasting algorithm is transformed into computer code using Java.
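As an example of how such a formula translates into Java, here is a sketch of Holt's linear exponential smoothing update equations. The initial level/trend and the alpha/gamma values would come from the initialization and optimization steps described earlier; the class and method names are illustrative, not the study's actual code.

```java
// Sketch: Holt's linear exponential smoothing in Java.
//   forecast_t = level + trend
//   level'     = alpha * y_t + (1 - alpha) * (level + trend)
//   trend'     = gamma * (level' - level) + (1 - gamma) * trend
public class Holt {
    public static double[] fit(double[] y, double alpha, double gamma,
                               double level0, double trend0) {
        double level = level0, trend = trend0;
        double[] fitted = new double[y.length];
        for (int t = 0; t < y.length; t++) {
            fitted[t] = level + trend;              // one-step-ahead forecast
            double prevLevel = level;
            level = alpha * y[t] + (1 - alpha) * (level + trend);
            trend = gamma * (level - prevLevel) + (1 - gamma) * trend;
        }
        return fitted;
    }
}
```

A quick sanity check: on a perfectly linear series whose intercept and slope match the initial level and trend, the one-step-ahead fitted values reproduce the data exactly, whatever alpha and gamma are.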
Summarized algorithm for automated univariate time series forecasting
1. Classify each series as yearly or non-yearly data.
2. Partition each series into an estimation part and an evaluation part.
3. Apply all models appropriate for each series to the estimation part, optimizing the parameters (both the smoothing parameters and the initial state variables) of the model in each case.
4. Run the recursive process on the evaluation part of each series to produce predicted values and error values.
5. Select the best model based on a comparison of error measures.
6. Produce point forecasts for three steps ahead using the best model (with optimized parameters).
7. Display the best model with its graph, the three-steps-ahead point forecasts and the errors.
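The seven steps above can be condensed into a skeleton like the following, with two toy candidate models (naive and overall mean) standing in for the full set of techniques. All interfaces and names are illustrative placeholders, not the study's Java implementation.

```java
import java.util.Arrays;

// Skeleton of the pipeline: partition, rolling one-step evaluation of each
// candidate model, and selection of the smallest-error model (steps 2-5).
public class AutoForecast {
    interface Model {
        String name();
        double forecast(double[] history); // one-step-ahead forecast
    }

    static final Model NAIVE = new Model() {
        public String name() { return "naive"; }
        public double forecast(double[] h) { return h[h.length - 1]; }
    };

    static final Model MEAN = new Model() {
        public String name() { return "mean"; }
        public double forecast(double[] h) {
            return Arrays.stream(h).average().orElse(0);
        }
    };

    public static Model best(double[] series, int holdout) {
        int cut = series.length - holdout;    // step 2: partition
        Model bestModel = null;
        double bestMse = Double.MAX_VALUE;
        for (Model m : new Model[] { NAIVE, MEAN }) {
            double sse = 0;
            for (int i = 0; i < holdout; i++) {   // step 4: rolling evaluation
                double[] h = Arrays.copyOfRange(series, 0, cut + i);
                double e = series[cut + i] - m.forecast(h);
                sse += e * e;
            }
            if (sse / holdout < bestMse) {        // step 5: compare errors
                bestMse = sse / holdout;
                bestModel = m;
            }
        }
        return bestModel;
    }
}
```

In the full system, the model list would cover the techniques on the preceding slide, the comparison would use several ranked error measures rather than MSE alone, and the winner would then produce the three-steps-ahead forecasts (steps 6-7).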
We use only two sets of simulated data: yearly data (n=100) and non-yearly data (n=121).
[Figure: plot of the yearly simulated data, y_t against t (t = 1..100, y-axis 0 to 200).]
[Figure: plot of the non-yearly simulated data, y_t against t (t = 1..121, y-axis 0 to 4000).]
Findings: A part of the Java code
Findings: A part of the results in Java
Findings: Yearly data, time series regression

Coefficients     Java      Excel     SPSS
Constant, b0     40.5781   40.5781   40.578
Slope, b1        1.0979    1.0979    1.098
Findings: Yearly data, Holt's technique

Initial values   Java      Excel     SPSS
Level, l         40.5781   40.5781   35.2457
Trend, T         1.0979    1.0979    1.1085

Parameters       Java   Excel   SPSS
Alpha, α         0.90   0.90    0.90
Gamma, γ         0.01   0.01    0.01

                 Java        Excel       SPSS
SSE              3078.4199   3075.0075   3039.0758
Findings: Non-yearly data, multiplicative decomposition

Component at t=116   Java        Excel       SPSS
Trend, TR            1611.3137   1611.3117   1611.8231
Seasonal, SN         1.7169      1.7169      1.7204
Findings: Non-yearly data, Holt-Winters technique

Initial values   Java       Excel      SPSS
Level, l         359.0606   359.0606   395.7604
Trend, T         10.8545    10.8545    10.6649

Parameters       Java   Excel   SPSS
Alpha, α         0.09   0.08    0.08
Gamma, γ         0.01   0.01    0.01
Delta, δ         0.76   0.75    0.70

                 Java          Excel         SPSS
SSE              162230.8601   159890.7404   174061.5416
Findings: Evaluation
Using rolling evaluation:
- For the simulated yearly data (n=100), the best model (smallest total rank) is MA for 1 step ahead, and nonseasonal ARIMA(0,1,1) for 2 and 3 steps ahead.
- For the simulated non-yearly data (n=121), the best model (smallest total rank) for 1, 2 and 3 steps ahead is multiplicative Holt-Winters.
Findings: Evaluation for yearly data (n=100)
Error measures and performance ranking for the 1-step-ahead forecast values

Technique        MSE             RMSE          GRMSE         MAPE          Total rank
TSR              54.3954 (8)     7.3753 (8)    6.0555 (8)    4.9362 (8)    32 (8)
SES              8.9080 (4)      2.9846 (4)    2.3196 (5)    1.8608 (4)    17 (4)
DES              15.1737 (6)     3.8953 (6)    2.0851 (3)    2.2810 (6)    21 (5.5)
HM               7.8751 (3)      2.8063 (3)    2.2294 (4)    1.8090 (3)    13 (3)
MA(3)            6.5333 (1)      2.5560 (1)    1.9850 (1)    1.5935 (1)    4 (1)
DMA(3)           18.4469 (7)     4.2950 (7)    3.1444 (7)    2.6738 (7)    28 (7)
nsARIMA(0,0,1)   5552.4137 (10)  74.5145 (10)  74.4252 (10)  52.9548 (10)  40 (10)
nsARIMA(1,0,0)   241.7160 (9)    15.5472 (9)   15.0902 (9)   10.8593 (9)   36 (9)
nsARIMA(0,1,1)   7.6108 (2)      2.7588 (2)    2.0738 (2)    1.7079 (2)    8 (2)
nsARIMA(1,1,0)   9.6440 (5)      3.1055 (5)    2.4320 (6)    1.9392 (5)    21 (5.5)
Notes: ( ) is the rank of the error measure.
Findings: Evaluation for non-yearly data (n=121)
Error measures and performance ranking for the 1-step-ahead forecast values

Technique        MSE                 RMSE            GRMSE          MAPE          Total rank
TSR              885143.2873 (10)    940.8205 (10)   538.3685 (7)   51.3845 (10)  37 (10)
MD               548.8900 (2)        23.4284 (2)     18.5129 (2)    1.7582 (2)    8 (2)
SES              475881.0043 (5)     689.8413 (5)    543.5030 (9)   38.9002 (7)   26 (7)
DES              580548.4212 (9)     761.9373 (9)    250.7882 (5)   30.3310 (6)   29 (8)
HM               554615.7849 (8)     744.7253 (8)    105.8883 (3)   28.3425 (3)   22 (5.5)
MHW              255.2392 (1)        15.9762 (1)     14.7690 (1)    0.9833 (1)    4 (1)
MA(3)            304800.3333 (3)     552.0873 (3)    540.8231 (8)   40.3006 (8)   22 (5.5)
DMA(3)           1118197.0222 (11)   1057.4484 (11)  623.4768 (10)  54.5727 (11)  43 (11)
nsARIMA(0,0,1)   1509461.4295 (12)   1228.6014 (12)  623.7136 (11)  55.9355 (12)  47 (12)
nsARIMA(1,0,0)   338524.1200 (4)     581.8283 (4)    386.6450 (6)   28.6957 (5)   19 (3)
nsARIMA(0,1,1)   509542.5628 (7)     713.8225 (7)    647.3582 (12)  44.5439 (9)   35 (9)
nsARIMA(1,1,0)   491456.9120 (6)     701.0399 (6)    134.1773 (4)   28.4720 (4)   20 (4)
Notes: ( ) is the rank of the error measure.
Conclusion
This study has shown an example of how an algorithm that requires tacit knowledge can be automated, lessening the role of expert opinion, and thus offers end users a way to use automated time series forecasting. In summary, this study attempts to solve practical forecasting issues faced by non-statistical users. It demonstrates that, at this early stage, an algorithm focused on specific data can optimize parameters to produce better results. Further research on these issues can provide guidelines, especially for end users. Automating the process may help them gain higher forecast accuracy and lead to better decision making.