SINGULAR SPECTRUM ANALYSIS HYBRID FORECASTING METHODS WITH APPLICATION TO AIR TRANSPORT DEMAND
K. Adjenughwure, Delft University of Technology, Transport Institute, Ph.D. candidate
V. Balopoulos, Democritus Thrace University, Dep. of Civil Engineering, Associate Professor
G. Botzoris, Democritus Thrace University, Dep. of Civil Engineering, Assistant Professor
Athens, Conference Hall, Ministry of Infrastructure, Transport and Networks, 5 & 6 November 2015
TRANSPORTATION DEMAND FORECASTING
Transportation demand forecasting is the process of estimating the number of people or vehicles that will use a specific transport facility over a particular time interval. Accurate forecasting of demand is particularly important in air transport, influencing decisions such as ticket pricing, operation of new or closing of existing routes, aircraft purchase, building of new or abandoning of old terminals, etc. The numerous methods that have been developed for or employed in air transport demand forecasting may be classified as qualitative (such as market surveys, Delphi method, and expert meetings), or quantitative (such as econometric, time series, etc.).
TRANSPORTATION DEMAND FORECASTING
Statistical time-series prediction methods, such as the Autoregressive Integrated Moving Average (ARIMA), have long been preferred for modeling airport passenger demand, but recently artificial intelligence methods, such as Artificial Neural Networks (ANN), Fuzzy Logic, and the Adaptive Neuro-Fuzzy Inference System (ANFIS), have gained recognition and have been applied to the same task. All time-series prediction methods are reasonably accurate, but are inherently sensitive to noise. To increase the accuracy of time-series prediction, various methods have been developed to remove noise from raw data and to decompose a time series into its trend, its oscillatory components and its noise components. One of these methods is Singular Spectrum Analysis (SSA).
SINGULAR SPECTRUM ANALYSIS
Singular Spectrum Analysis (SSA) has been combined with classical time-series prediction methods to improve their results. Most related research uses SSA only as a noise-removal tool. A very recent hybrid approach, however, first uses SSA to decompose a time series into several component time series (trend, seasonal and noise), then predicts each non-noise component separately with a chosen time-series prediction model, and finally employs SSA to aggregate the predicted components into predictions for the original time series.
$Y_t = T_t + C_t + S_t + R_t$
where $T_t$ is the trend, $C_t$ the cyclical variation, $S_t$ the seasonal variation and $R_t$ the random variation.
TIME-SERIES DECOMPOSITION OF A SINGLE VARIABLE
[Figure: Heathrow airport, monthly passenger demand (thousands), Jan-05 to Sep-13, shown as original series = TREND + OSCILLATION]
SCOPE OF THE PAPER
The contribution of this paper is to show that SSA decomposition of a time series and the subsequent prediction of its components can improve forecasting results. We demonstrate this using the statistical data of two international airports (Heathrow, London and El. Venizelos, Athens) with very different traffic volumes and characteristics. ANFIS was chosen as the prediction method to allow easy comparison with the work of Xiao et al. (2014).
[Figure: Passengers (in thousands), LHR airport, 2005 to 2013, with the trend and the training and testing periods marked]
ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
ANFIS = ANN + FIS
ANFIS stands for Adaptive Neuro-Fuzzy Inference System. Using a given input/output data set, ANFIS constructs a Fuzzy Inference System (FIS) whose membership function parameters are tuned (adjusted) using either a backpropagation algorithm (as in an Artificial Neural Network) alone or in combination with a least-squares type of method. This adjustment allows the fuzzy system to learn from the data it is modeling.
[Figure: ANFIS architecture, Layer 0 to Layer 5, with inputs x and y, membership functions A1, A2, B1, B2, rule weights w1, w2, rule outputs f1, f2 and overall output f]
Layer 1: Fuzzification Layer
Layer 2: Rule Layer
Layer 3: Normalization Layer
Layer 4: Defuzzification Layer
Layer 5: Summation Layer
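The five layers can be illustrated with a minimal forward pass for a two-input, two-rule first-order Sugeno system. This is only a sketch under stated assumptions: the slides do not specify the membership function shape or the rule operator, so Gaussian membership functions and the product operator are assumed here, and the names `gaussian_mf`, `anfis_forward`, `premise` and `consequent` are hypothetical. Training (tuning of the parameters) is not shown.

```python
import numpy as np

def gaussian_mf(value, center, sigma):
    """Gaussian membership grade (assumed shape, not taken from the slides)."""
    return np.exp(-0.5 * ((value - center) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise:    {"A": [(center, sigma), ...], "B": [(center, sigma), ...]}
    consequent: array of shape (n_rules, 3) with rows (p, q, r),
                so that rule i outputs f_i = p*x + q*y + r.
    """
    # Layer 1 (fuzzification): membership grades of x in A_i and y in B_i
    mu_a = np.array([gaussian_mf(x, c, s) for c, s in premise["A"]])
    mu_b = np.array([gaussian_mf(y, c, s) for c, s in premise["B"]])
    # Layer 2 (rule layer): firing strength w_i = mu_Ai(x) * mu_Bi(y)
    w = mu_a * mu_b
    # Layer 3 (normalization): each strength divided by the sum of strengths
    w_bar = w / w.sum()
    # Layer 4 (defuzzification): normalized strength times the rule output f_i
    f = np.asarray(consequent, dtype=float) @ np.array([x, y, 1.0])
    weighted = w_bar * f
    # Layer 5 (summation): overall output f
    return weighted.sum(), w_bar

# Illustrative parameters only
premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}
consequent = np.array([[1.0, 1.0, 0.0], [2.0, 0.0, 1.0]])
output, strengths = anfis_forward(0.5, 0.5, premise, consequent)
```

In a trained ANFIS the premise parameters (centers, widths) and the consequent parameters (p, q, r) would be the quantities adjusted by backpropagation and least squares.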
ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)
To improve the generalization capability of an ANFIS model, a method known as cross-validation is used. In this method, all the available data is split into three sets: a training set, a validation (or checking) set and a testing set. The training set is used to train the model, while the validation set is used to prevent the model from overfitting by monitoring the error in its output: training is stopped when the error on the validation set is at its minimum. Note that the validation data is never used to fit the model parameters; the model is only evaluated on it, so it serves as an independent check on how well the trained model is doing. After training and validation, the test set is used as a second independent test of the generalization ability of the model. The final model chosen is the one that gives the minimum error on the test set.
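The stopping rule described above can be sketched as follows. To keep the example short, a linear model trained by gradient descent stands in for ANFIS; the 60/20/20 chronological split, the learning rate, the `patience` parameter and all function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              epochs=500, lr=0.01, patience=20):
    """Gradient descent on a linear model, keeping the weights that
    gave the lowest validation error (a stand-in for ANFIS training)."""
    w = np.zeros(X_train.shape[1])
    best_w, best_err, since_best = w.copy(), np.inf, 0
    for _ in range(epochs):
        # Parameter update uses the training set only
        grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad
        # The validation set is only evaluated, never fitted
        val_err = np.mean((X_val @ w - y_val) ** 2)
        if val_err < best_err:
            best_w, best_err, since_best = w.copy(), val_err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # validation error stopped improving
    return best_w, best_err

# Synthetic data, split in order: training, validation, testing
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=100), np.ones(100)])
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)
X_tr, y_tr = X[:60], y[:60]
X_va, y_va = X[60:80], y[60:80]
X_te, y_te = X[80:], y[80:]

w_best, val_mse = train_with_early_stopping(X_tr, y_tr, X_va, y_va)
test_mse = np.mean((X_te @ w_best - y_te) ** 2)  # second independent check
```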
SINGULAR SPECTRUM ANALYSIS (SSA)
SSA consists of two stages: the first is the decomposition of the series, and the second is the reconstruction of the original series from the decomposed series. The three parameters to be selected for the SSA algorithm are the window length L, the number of elementary matrices r used for the reconstruction, and the number of groups m. The most important parameter is the window length L. The other two can be omitted, depending on how SSA is used: for pure decomposition only the window length is required, and for noise removal the grouping stage can be omitted. There is currently no general algorithm for selecting the window length, but many researchers suggest choosing L < N/2 as a general rule, where N is the number of observations in the time series.
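A minimal sketch of the decomposition stage in its common basic form (embedding into a trajectory matrix, singular value decomposition, diagonal averaging) is given below. It is not the authors' implementation; the function name `ssa_decompose` and the synthetic series are hypothetical.

```python
import numpy as np

def ssa_decompose(series, L):
    """Basic SSA: return one reconstructed series per elementary matrix."""
    x = np.asarray(series, dtype=float)
    N = len(x)
    K = N - L + 1
    # Embedding: L x K trajectory matrix whose columns are lagged windows
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # Decomposition into elementary matrices s_i * u_i * v_i^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = np.zeros((len(s), N))
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])
        # Diagonal averaging: the mean of each anti-diagonal gives one
        # value of the component series (row flip turns anti-diagonals
        # into ordinary diagonals)
        flipped = Xi[::-1]
        components[i] = [flipped.diagonal(k).mean() for k in range(-L + 1, K)]
    return components

# Synthetic example: linear trend plus a period-12 oscillation
t = np.arange(48)
demo = 100.0 + 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 12)
parts = ssa_decompose(demo, L=12)

# Reconstruction from the first r elementary components (here r = 3)
smoothed = parts[:3].sum(axis=0)
```

Summing all components returns the original series; summing only the first r of them is the noise-removal use mentioned above, and summing them in m groups gives the grouped components.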
SINGULAR SPECTRUM ANALYSIS (SSA)
For time series data with a known period T, Golyandina et al. (2001) recommend choosing L such that L/T is an integer. For instance, if the data is seasonal with period 4, then choosing L as a multiple of 4 (4, 8, 12, 16, ...) will help capture the periodic components with period 4. If the series has multiple periods ($T_1$, $T_2$, $T_3$), then L should be chosen such that $L/T_i$ is an integer for all i. To extract only a trend component, L should be large enough for the trend to be separable from other components such as the noise, but not too large, because large values of L mix the trend with other components. In short, L should be chosen such that all the components obtained from the decomposition of the time series are separable, i.e. non-correlated.
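The two rules of thumb (L a multiple of every known period, and L < N/2) can be combined in a small helper. Picking the largest admissible value is an assumption made for this sketch, not a recommendation from the slides, and `choose_window_length` is a hypothetical name. `math.lcm` requires Python 3.9 or later.

```python
from math import lcm

def choose_window_length(n_obs, periods):
    """Largest L with L < n_obs / 2 such that L / T_i is an integer
    for every period T_i (illustrative heuristic only)."""
    base = lcm(*periods)           # smallest L divisible by all periods
    upper = (n_obs - 1) // 2       # largest integer strictly below n_obs / 2
    multiples = upper // base
    if multiples == 0:
        raise ValueError("series too short for the given periods")
    return base * multiples

# Hypothetical monthly series with 108 observations and a yearly period
L_monthly = choose_window_length(108, [12])
```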
THE HYBRID MODELS
The proposed hybrid models combine SSA with ANFIS. The goal is to improve the performance of the ANFIS model by first decomposing the time series into a sum of simple components (time series) that are easier to predict, and then combining the predictions of the individual components.
[Figure: Original time series -> decomposition with Singular Spectrum Analysis (SSA) -> time series components PC 1 ... PC L -> grouped components GC 1 ... GC m -> prediction of each grouped component with ANFIS -> predicted grouped components PGC 1 ... PGC m -> summation with SSA -> predicted time series]
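The predict-then-sum step of the scheme can be sketched as below. Two loud assumptions: the grouped components are taken as given (here they are synthetic rather than produced by SSA), and a least-squares autoregressive model without intercept stands in for ANFIS purely to keep the example short. All function names are hypothetical.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares AR(order) coefficients, no intercept (ANFIS stand-in)."""
    rows = np.array([series[t - order:t] for t in range(order, len(series))])
    coef, *_ = np.linalg.lstsq(rows, series[order:], rcond=None)
    return coef

def forecast_ar(series, coef, steps):
    """Recursive multi-step forecast from the last `order` observations."""
    order = len(coef)
    history = list(series[-order:])
    preds = []
    for _ in range(steps):
        nxt = float(np.dot(coef, history[-order:]))
        preds.append(nxt)
        history.append(nxt)
    return np.array(preds)

def hybrid_forecast(grouped_components, order, steps):
    """Predict each grouped component separately, then sum the predictions."""
    preds = [forecast_ar(g, fit_ar(g, order), steps) for g in grouped_components]
    return np.sum(preds, axis=0)

# Synthetic grouped components: a linear trend and a period-12 oscillation
t = np.arange(120)
trend = 100.0 + 0.5 * t
seasonal = 10.0 * np.sin(2 * np.pi * t / 12)
train = [trend[:108], seasonal[:108]]          # first 108 points for fitting
forecast = hybrid_forecast(train, order=2, steps=12)
actual = (trend + seasonal)[108:]              # held-out last 12 points
```

Each component is simple enough that a low-order model captures it, which is the motivation given above for predicting components rather than the raw series.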
THE TIME SERIES CHARACTERISTICS OF THE LONDON HEATHROW (LHR) AND ATHENS (ATH) AIRPORTS
[Figure: two panels, LHR and ATH]
COMPONENTS EXTRACTED FROM THE LHR AIRPORT BY SSA
COMPONENTS EXTRACTED FROM THE ATH AIRPORT BY SSA
COMPARISON OF RESULTS BETWEEN PURE ANFIS AND HYBRID SSA ANFIS MODELS
IMPROVEMENT OF THE FORECASTING ABILITY BY USING THE HYBRID SSA - ANFIS MODEL
Comparison with the predictions of the pure ANFIS model re-emphasises the advantages of the hybrid models. While the pure models did not perform well on average, with MAPE between 4.38% and 8.69% for the two airports, the hybrid SSA ANFIS models gave far better predictions, with MAPE below 2% for both airports. In terms of RMSE, the predictions of the hybrid models were on average 5.3 times better than those of the pure ANFIS. The coefficient of determination $R^2$ also improved by 21% on average across both airports.
Statistic                                    Airport    Pure ANFIS model   Hybrid SSA ANFIS model
Root Mean Square Error (RMSE)                Heathrow   335.49             89.68
                                             Athens     112.96             16.26
Mean Absolute Error (MAE)                    Heathrow   263.99             72.27
                                             Athens     73.70              14.32
Mean Absolute Percentage Error (MAPE), %     Heathrow   4.38               1.2
                                             Athens     8.69               1.52
Coefficient of determination, $R^2$          Heathrow   0.77               0.98
                                             Athens     0.85               0.98
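For reference, the four statistics can be computed as in the following sketch. The slides do not state which definition of the coefficient of determination was used; the form 1 - SS_res / SS_tot is assumed here. The function name and the sample numbers are illustrative and are not data from the paper.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """RMSE, MAE, MAPE (in percent) and R^2 for a forecast."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / actual))   # requires actual != 0
    # Assumed definition: 1 - residual sum of squares / total sum of squares
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Made-up values for illustration
scores = forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```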
NEXT STEP
CONCLUDING REMARKS
Although econometric methods are currently being used to forecast transport demand, the success of time series forecasting models, especially for short-term demand forecasting, has shifted research focus to development of methods to improve the forecasting ability of these models. Consequently, specialized statistical models like ARIMA and more recently artificial intelligence (AI) methods like ANN and ANFIS have been applied successfully to forecast air transport demand time series. Despite the success of AI models, their poor performance when used to predict noisy and seasonal time-series data, like monthly passenger demand of airports, has necessitated better forecasting models that can forecast in the presence of noise and also exploit the seasonality of the data to improve forecasting results. Methods like seasonal ARIMA have been used to forecast seasonal data, while Singular Spectrum Analysis (SSA) has been used as a noise removal tool to forecast noisy data.
CONCLUDING REMARKS
In this paper, hybrid models that combine SSA and ANFIS have been calibrated to forecast the passenger demand of two international airports, London Heathrow and Athens. Forecast results have shown that decomposing a time series by means of SSA into simpler components, predicting the future values of the components using any established prediction method, and then summing the predictions using SSA, can greatly improve forecasting performance. The main reasons for the remarkably improved forecasting achieved by the SSA-hybrid prediction methods are: simplicity, since the component time series are simpler and hence easier to predict; exploitation of seasonality, since each seasonal component is predicted separately; and noise removal, since noise in the data is reduced by removing components with no seasonality or no significant contribution.