Journal of Information & Computational Science 12:9 (2015) 3637-3646, June 1, 2015. Available at http://www.joics.com

Modeling and Prediction of Network Traffic Based on Hybrid Covariance Function Gaussian Regression

Liang Tian, Weifeng Wang
Department of Computer and Information Engineering, Xinxiang University, Xinxiang 4533, China

Abstract

In order to obtain better prediction results for network traffic, this paper proposes a novel network traffic prediction model based on a hybrid covariance function Gaussian process (GP). Firstly, the GP model is built with a hybrid covariance function; the network traffic training set is then fed into the GP model to find the optimal parameters of the covariance and mean functions; finally, the network traffic prediction model is established, and one-step and multi-step prediction tests are carried out to compare its performance with the support vector machine, the neural network, and the traditional Gaussian process. The results show that, compared with the contrast models, the proposed model can describe the changing trend of network traffic and improve prediction accuracy, so it is an effective prediction method for complex network traffic.

Keywords: Network Traffic; Gaussian Process; Phase Space Reconstruction; Modeling and Prediction

1 Introduction

The Internet has been widely adopted in many realms as a result of the development of computer technology. Network traffic prediction, a key technology in network management, plays a significant role in network bandwidth allocation and network congestion control. Thus, a highly accurate network traffic prediction model that can describe network dynamics has attracted a great deal of attention from scholars [1].

The evolution of network traffic shows that its historical data are in fact time series data, so time series models are very important. The classic models include MA, AR, ARIMA, etc. [2, 3]. These models, which adopt difference modeling, are simply structured and assume that network traffic varies stationarily. However, because network traffic data can be affected by external factors, such models can only describe the periodicity and self-similarity of network traffic, not its non-stationarity; network traffic is in fact a sophisticated non-linear system. Traditional time series prediction models cannot establish a prediction model that reflects the variations of network traffic, and their prediction accuracy cannot meet the corresponding demands of network management [4, 5]. With the development of artificial intelligence, some scholars have put forward network traffic prediction models based on chaos theory. Such models use methods such as fuzzy logic, neural networks, kernel-based learning machines

Corresponding author. Email address: gaa252@gmail.com (Liang Tian).
ISSN 1548-7741 / Copyright 2015 Binary Information Press. DOI: 10.12733/jics215954
or support vector machines (SVM) [6-8], together with a delay time and an embedding dimension, to reconstruct the network traffic time series. Non-linear network traffic prediction models are then established by machine learning methods, and good prediction results have been achieved for both single-step and multi-step prediction of network traffic data [9, 10]. The Gaussian process (GP) is a relatively new machine learning method with fewer parameters to adjust. It can combine prior knowledge with observed data. Since it provides probabilistic confidence for its predictions while sharing the kernel function technology of the support vector machine, it has been shown to perform better than the support vector machine or the neural network [11]. In order to improve network traffic prediction accuracy, a network traffic prediction model based on hybrid covariance function Gaussian process regression is proposed. The model first establishes a GP model with a hybrid covariance function and then builds the network traffic prediction model on top of it. Simulation tests on collected traffic data verify the single-step and multi-step prediction results, which are compared with those of the support vector machine and the neural network under the same conditions.

2 Gaussian Process Network Traffic Prediction

2.1 Gaussian Process Regression

GP modeling is in effect an extension of the Gaussian distribution from a finite-dimensional space to an infinite-dimensional function space. Prior knowledge is embedded into the GP by selecting the covariance function and its parameters; the parameters are then inferred in a Bayesian framework, which equips the model predictions with confidence levels.

Given the training set D = \{(\mathbf{x}^{(n)}, y_n)\}_{n=1}^{N}, the function values f(\mathbf{x}^{(1)}), \ldots, f(\mathbf{x}^{(N)}) form a set of random variables with a joint Gaussian distribution, and the GP is written as:

    f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))    (1)

Here, m(\mathbf{x}) = E[f(\mathbf{x})] is the mean function and k(\mathbf{x}, \mathbf{x}') is the covariance function, where E stands for mathematical expectation.

For noisy data, the GP regression model is:

    y = f(\mathbf{x}) + \varepsilon    (2)

Here, \varepsilon stands for noise that is independent of the data.

If f(\mathbf{x}) is a GP, then y also follows a Gaussian distribution, and any finite set of observed values has a joint Gaussian distribution, i.e.:

    y \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}') + \sigma_n^2 \delta_{ij})    (3)

Here, \delta_{ij} stands for the Kronecker delta function.

Writing the covariance function in matrix form gives:

    C(X, X) = E[\mathbf{y}\mathbf{y}^T] = K(X, X) + \sigma_n^2 I    (4)

Here, I is the unit matrix, C(X, X) is the covariance matrix, and K(X, X) is the Gram matrix (kernel matrix).
In the functional space defined by the prior, the predictive distribution is obtained from the prior distribution by Bayes' rule. Given a test set containing N_* samples, the output function values f(\mathbf{x}_*) form the vector \mathbf{f}_*. The observed outputs of the training set and the function values of the test set then follow a joint Gaussian distribution, i.e.:

    \begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mathbf{m} \\ \mathbf{m}_* \end{bmatrix}, \begin{bmatrix} C(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right)    (5)

Here, \mathbf{m} and \mathbf{m}_* are the mean vectors of the training set and the test set. For a general joint Gaussian distribution

    \begin{bmatrix} \mathbf{x} \\ \mathbf{t} \end{bmatrix} \sim N\!\left( \begin{bmatrix} \mathbf{m}_x \\ \mathbf{m}_t \end{bmatrix}, \begin{bmatrix} A & E \\ E^T & B \end{bmatrix} \right)    (6)

the marginal and conditional distributions of \mathbf{x} are:

    \mathbf{x} \sim N(\mathbf{m}_x, A)    (7)

    \mathbf{x} \mid \mathbf{t} \sim N(\mathbf{m}_x + E B^{-1} (\mathbf{t} - \mathbf{m}_t),\; A - E B^{-1} E^T)    (8)

Here, T stands for transposition, and A, E and B are covariance matrices. According to Eqs. (5)-(8), the GP regression equations are:

    \mathbf{f}_* \mid X, \mathbf{y}, X_* \sim N(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*))    (9)

    \bar{\mathbf{f}}_* = E[\mathbf{f}_* \mid X, \mathbf{y}, X_*] = \mathbf{m}_* + K(X_*, X)\, C(X, X)^{-1} (\mathbf{y} - \mathbf{m})    (10)

    \operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\, C(X, X)^{-1} K(X, X_*)    (11)

Here, \bar{\mathbf{f}}_* is the predictive value of \mathbf{f}_*. From Eqs. (10) and (11), the prediction at a single test input \mathbf{x}_* is Gaussian with mean and variance:

    \bar{f}(\mathbf{x}_*) = m(\mathbf{x}_*) + \mathbf{k}_*^T C^{-1} (\mathbf{y} - m(X))    (12)

    \sigma_f^2(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^T C^{-1} \mathbf{k}_*    (13)

Here, \bar{f}(\mathbf{x}_*) is the model output value, \sigma_f^2(\mathbf{x}_*) is its variance, and \mathbf{k}_* = K(X, \mathbf{x}_*).

2.2 Selection of GP Model and Its Parameters

(1). Model Selection

In GP modeling, the covariance function is a key factor: it encodes the assumptions about the function to be learned. Based on Eq. (4), the full covariance function of the GP can be written as:

    c(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = k(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) + \sigma_n^2 \delta_{ij}    (14)

Here, k(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) is the kernel function and \sigma_n^2 is the noise variance.
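As a reference for the regression equations (10)-(13), the following minimal sketch computes the posterior mean and variance for a generic covariance function. It is written in Python/NumPy purely for illustration (the paper's experiments were implemented in VC++ 6.0); the RBF kernel used here is only a placeholder for the covariance functions discussed next, and all names are assumptions, not part of the original work.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Placeholder covariance function k(x, x'); Section 2.2 replaces it with a hybrid kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, y, X_star, kernel=rbf_kernel, noise_var=1e-2, mean=0.0):
    """Posterior mean and variance of Eqs. (12)-(13), with C = K(X, X) + sigma_n^2 I from Eq. (4)."""
    C = kernel(X, X) + noise_var * np.eye(len(X))            # Eq. (4)
    K_s = kernel(X, X_star)                                   # K(X, X*)
    L = np.linalg.cholesky(C)                                 # numerically stable inverse of C
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    f_mean = mean + K_s.T @ alpha                             # Eq. (12)
    v = np.linalg.solve(L, K_s)
    f_var = np.diag(kernel(X_star, X_star)) - np.sum(v * v, axis=0)  # Eq. (13)
    return f_mean, f_var
```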
In the GP model, the covariance functions k(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) used in Eq. (14) are:

    k_{\mathrm{lin}}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = a_0 + a_1 \sum_{l=1}^{d} x_l^{(i)} x_l^{(j)}    (15)

    k_{\mathrm{rq}}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = v_0 \left( 1 + \frac{1}{2a} \sum_{l=1}^{d} w_l \left( x_l^{(i)} - x_l^{(j)} \right)^2 \right)^{-a}    (16)

Here, d stands for the input dimension. Eqs. (15) and (16) are the linear covariance function and the rational quadratic covariance function. In common GP models the parameters of the Gaussian RBF function are shared across dimensions, so a GP model with a single Gaussian kernel function can hardly describe the changes of every input dimension in a complicated system. Therefore, following the idea of composite kernels in support vector machines, several covariance functions are combined in this paper to form a better-performing hybrid covariance function:

    c_1(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = k_{\mathrm{rq}}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) + k_{\mathrm{lin}}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) + \sigma_n^2 \delta_{ij}    (17)

(2). Selection of Model Parameters

The GP model learns the model parameters from the training set. Within the Bayesian framework, the parameters of the covariance function are obtained by maximizing the log marginal likelihood:

    \theta_{\mathrm{opt}} = \arg\max_{\theta} \log p(\mathbf{y} \mid X, \theta) = \arg\max_{\theta} \left\{ -\frac{1}{2} \log\det\!\left(K + \sigma_n^2 I\right) - \frac{1}{2} (\mathbf{y} - \mathbf{m})^T \left[K + \sigma_n^2 I\right]^{-1} (\mathbf{y} - \mathbf{m}) - \frac{N}{2} \log 2\pi \right\}    (18)

The parameters are first initialized randomly, and the conjugate gradient optimization algorithm is then used to iterate Eq. (18) and find the optimal parameters, using the gradients:

    \frac{\partial}{\partial \theta_k} \log p(\mathbf{y} \mid X, \theta) = \frac{1}{2} (\mathbf{y} - \mathbf{m})^T C^{-1} \frac{\partial C}{\partial \theta_k} C^{-1} (\mathbf{y} - \mathbf{m}) - \frac{1}{2} \operatorname{tr}\!\left( C^{-1} \frac{\partial C}{\partial \theta_k} \right), \qquad \frac{\partial}{\partial \theta_m} \log p(\mathbf{y} \mid X, \theta) = (\mathbf{y} - \mathbf{m})^T C^{-1} \frac{\partial \mathbf{m}}{\partial \theta_m}    (19)

Here, \theta_m and \theta_k are the parameters of the mean and covariance functions, respectively.

3 Simulation Test

3.1 Data Set Source

All programs were implemented in VC++ 6.0 on a Pentium Dual-Core 2.8 GHz CPU with 2 GB RAM running Windows 7. The hourly traffic data, 1200 data points in total, were collected at the network computer center of Xinxiang University from Oct 5, 2012 to Nov 23, 2012. The 1200 data points form the network traffic time series shown in Fig. 1.
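Before turning to the data, the modeling step of Section 2.2 can be made concrete with a hedged sketch built on scikit-learn's kernel classes, where RationalQuadratic, DotProduct and WhiteKernel play the roles of the rational quadratic term, the linear term and the noise term of Eq. (17). This is an approximation rather than a faithful reimplementation: the library's rational quadratic kernel is isotropic (no per-dimension weights w_l as in Eq. (16)), and its optimizer is L-BFGS with random restarts rather than the conjugate gradient iteration described above.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RationalQuadratic, DotProduct, WhiteKernel, ConstantKernel)

# Hybrid covariance of Eq. (17): rational quadratic + linear + noise.
hybrid_kernel = (
    ConstantKernel(1.0) * RationalQuadratic(length_scale=1.0, alpha=1.0)  # ~ k_rq, Eq. (16)
    + ConstantKernel(1.0) * DotProduct(sigma_0=1.0)                       # ~ k_lin, Eq. (15)
    + WhiteKernel(noise_level=0.1)                                        # sigma_n^2 * delta_ij
)

# Hyperparameters are chosen by maximizing the log marginal likelihood, Eq. (18);
# scikit-learn restarts its gradient-based search from several random initial values.
gp = GaussianProcessRegressor(kernel=hybrid_kernel, n_restarts_optimizer=5,
                              normalize_y=False)

# X_train: reconstructed phase-space vectors, y_train: next traffic value (see Section 3.3).
# gp.fit(X_train, y_train)
# y_pred, y_std = gp.predict(X_test, return_std=True)
```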
Fig. 1: The collected network traffic data

In order to improve training efficiency, the original network traffic x(i) is pre-processed with Eq. (20), which gives the normalized network traffic value x'(i):

    x'(i) = \frac{x(i) - E_x}{\sigma_x}    (20)

Here, E_x and \sigma_x stand for the mean value and the standard deviation of the collected traffic.

3.2 Model Comparison and Evaluation Standards

LSSVM, \varepsilon-SVR, the regular GP model (with a Gaussian RBF kernel function) and RBFNN are chosen as the network traffic models for comparison. Among these models, the kernel functions of LSSVM and \varepsilon-SVR are all RBF kernel functions, and their parameters are chosen by the 5-fold cross-validation method. PMSE (prediction mean square error) and ARE (average relative error) are used as evaluation standards, defined as follows:

    \mathrm{PMSE} = \frac{\sum_{i=1}^{N} \left[ \hat{x}(i) - x(i) \right]^2}{\sum_{i=1}^{N} \left[ x(i) - \bar{x} \right]^2}    (21)

    \mathrm{ARE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \hat{x}(i) - x(i) \right|}{x(i)}    (22)

In these two formulas, N stands for the sample number, \bar{x} stands for the mean value, and x(i) and \hat{x}(i) are the actual value and the prediction value.

3.3 Network Traffic Phase Space Reconstruction

As the network traffic may be chaotic, the delay time and the embedding dimension have to be determined, so phase space reconstruction is necessary [12]. Fig. 2 shows the curves of the mutual information function and the correlation dimension of the network traffic. From this figure, the optimal delay time is \tau = 2 and the best embedding dimension is m = 5. We then use \tau = 2 and m = 5 to reconstruct the phase space of the network traffic and build the learning samples of the network traffic prediction model.
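The pre-processing and evaluation steps of Sections 3.1-3.3 can be summarized in a short sketch. The function and variable names are illustrative assumptions; \tau = 2 and m = 5 follow the values selected from Fig. 2, and Eqs. (20)-(22) are implemented as reconstructed above.

```python
import numpy as np

def normalize(x):
    """Eq. (20): zero-mean, unit-variance scaling of the raw traffic series."""
    return (x - x.mean()) / x.std()

def phase_space(x, tau=2, m=5):
    """Build delay vectors [x(i), x(i - tau), ..., x(i - (m-1)tau)] and one-step-ahead targets x(i + 1)."""
    X, y = [], []
    for i in range((m - 1) * tau, len(x) - 1):
        X.append(x[i - np.arange(m) * tau])
        y.append(x[i + 1])
    return np.array(X), np.array(y)

def pmse(actual, predicted):
    """Eq. (21): squared prediction error normalized by the variance of the actual series."""
    return np.sum((predicted - actual) ** 2) / np.sum((actual - actual.mean()) ** 2)

def are(actual, predicted):
    """Eq. (22): average relative error."""
    return np.mean(np.abs(predicted - actual) / np.abs(actual))
```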
Fig. 2: Parameter setting for network traffic phase space reconstruction: (a) mutual information function value versus \tau for selecting the delay time; (b) correlation dimension versus m for selecting the embedding dimension

3.4 Result and Analysis

(1). Single-step Prediction Performance Analysis

The last 200 data points are chosen as the test set to analyze model performance, and the remaining data are used as the training set to establish the single-step network traffic prediction model. The improved GP network traffic prediction model is built with the hybrid covariance function of Eq. (17). The actual output values and the GP prediction values are shown in Fig. 3, and the prediction error curve is shown in Fig. 4. Analysis of Fig. 3 and Fig. 4 shows that the single-step prediction output of the improved GP model matches the actual traffic output well. Thus, the improved GP model has high accuracy and is well worth wider application.

Fig. 3: Curve of single-step prediction and actual output (MB/s)

Fig. 4: Prediction error distribution of the network traffic test set (forecast error, MB/s)

Table 1 lists the single-step prediction results of every model. The data in Table 1 show that the prediction of the improved GP model is both stable and superior.

(2). Multiple-step Prediction Performance Analysis

The first 900 network traffic data points are chosen as the training set to establish the multi-step prediction model, which predicts 2 steps or 4 steps ahead. The following 100 network traffic data points are used as the test set to measure model performance. The improved GP network traffic prediction model is again built with the hybrid covariance function of Eq. (17).
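The iterative multi-step scheme (each one-step prediction is fed back into the delay vector as the newest input in order to predict the following step) can be sketched as follows. Here `model` stands for any trained one-step predictor with a scikit-learn-style predict method; the function name and window argument are illustrative assumptions rather than part of the original implementation.

```python
import numpy as np

def iterate_forecast(model, last_window, steps, tau=2, m=5):
    """Predict `steps` values ahead by feeding each prediction back as the newest input.

    last_window: the most recent (m - 1) * tau + 1 normalized traffic values.
    """
    history = list(last_window)
    preds = []
    for _ in range(steps):
        # Current delay vector [x(t), x(t - tau), ..., x(t - (m - 1)tau)].
        x_vec = np.array([history[-1 - k * tau] for k in range(m)])
        y_next = float(model.predict(x_vec.reshape(1, -1))[0])
        preds.append(y_next)
        history.append(y_next)   # the prediction becomes part of the history
    return np.array(preds)

# Example: 2-step-ahead iterative prediction, as used for Fig. 5.
# preds = iterate_forecast(gp, normalized_traffic[-9:], steps=2)
```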
Table 1: Single-step prediction performance comparison in terms of PMSE and ARE

    Prediction model      PMSE     ARE
    Improved GP model     .363     .319
    Common GP model       .551     .572
    LSSVM                 .426     .423
    ε-SVR                 .411     .49
    RBFNN                 .53      .5238

The 2-step-ahead prediction of the improved model and the actual output values are shown in Fig. 5, from which we can see that the two results are approximately the same. The prediction error curve is shown in Fig. 6, which shows that the error stays within a controlled range and differs from zero only at small orders of magnitude.

Fig. 5: 2-step iterative prediction result and actual output

Fig. 6: 2-step iterative prediction error distribution (forecast error, MB/s)

When the improved GP model directly predicts 4 steps ahead, its prediction results and the actual output values are shown in Fig. 7, and the prediction error distribution at every point is shown in Fig. 8. From Fig. 7, we can see that the 4-step-ahead prediction matches the actual output well, so the prediction model generalizes well.

Fig. 7: 4-step-ahead direct prediction

Fig. 8: 4-step-ahead prediction error distribution (forecast error, MB/s)

Table 2 compares the multi-step predictions of the different models. We can see that, once their parameters are set, LSSVM and ε-SVR achieve good prediction results, but the GP model with the hybrid covariance function performs better.
Table 2: Multi-step prediction performance comparison between the GP model and the other models

                          2-step-ahead iterative prediction    4-step-ahead direct prediction
    Prediction model      PMSE        ARE                      PMSE        ARE
    Improved GP model     .37         .3171                    .41         .3433
    Common GP model       .562        .5816                    .68         .6295
    LSSVM                 .435        .4315                    .47         .467
    ε-SVR                 .419        .489                     .454        .4426
    RBFNN                 .513        .5343                    .555        .5783

This is because the improved GP model is a probabilistic kernel machine whose parameters are selected by maximizing the marginal likelihood function. RBFNN also performs well, but its prediction is not stable. The comparison shows that the improved GP model performs better, and its prediction results are more accurate and reliable.

(3). Noisy Network Traffic Prediction Performance Analysis

To test the robustness of the GP model, noise is added to the collected network traffic of Fig. 1, giving the series shown in Fig. 9. These traffic data are then modeled and predicted, and the single-step and 2-step-ahead prediction results are shown in Fig. 10 and Fig. 11. From these two figures, both the single-step and the multi-step predictions achieve satisfying results, so the network traffic model based on the improved GP has good stability, robustness and generalization.

Fig. 9: Noisy network traffic data

Fig. 10: Single-step prediction of noisy network traffic data

Table 3 compares the prediction performance of the different models on the noisy network traffic data. It shows that, compared with LSSVM, ε-SVR, RBFNN and the common GP model, the hybrid covariance function GP model can more accurately describe the changing trend of noisy network traffic, and its prediction results are quite stable. This is because the GP model shares the fast learning ability of the neural network and has the generalization ability of the support vector machine. Thus it can be deemed an effective prediction method for studying complicated network traffic variations.
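The robustness test above amounts to perturbing the traffic series and repeating the same modeling and evaluation pipeline. A minimal sketch is given below; the noise level and the random seed are arbitrary assumptions, since the paper does not report the noise variance it injected.

```python
import numpy as np

def add_noise(x, noise_std=0.05, seed=0):
    """Return a noisy copy of the (normalized) traffic series for the robustness test.

    noise_std is an assumed value; the paper does not state the injected noise variance.
    """
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, noise_std, size=x.shape)

# The noisy series is then fed through the same pipeline, e.g.:
# X_train, y_train = phase_space(add_noise(normalized_traffic))
# gp.fit(X_train, y_train), followed by single-step and 2-step-ahead prediction.
```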
Fig. 11: Multi-step iterative prediction results of the noisy network traffic data

Table 3: Noisy network traffic prediction performance comparison between the different models

                          Single-step prediction               2-step-ahead iterative prediction
    Prediction model      PMSE        ARE                      PMSE        ARE
    Improved GP model     .3866       .46                      .479        .412
    Common GP model       .789        .699                     .727        .7523
    LSSVM                 .526        .54                      .562        .5581
    ε-SVR                 .4985       .521                     .542        .529
    RBFNN                 .6513       .638                     .664        .6911

3.5 Abnormal Prediction Analysis

Abnormal behaviour of network traffic brings many risks or potential risks, so predicting abnormal network traffic precisely is important for improving network performance. In this paper, MAWI data, actual network traffic collected on a link from a network in Japan to a network in the USA, are selected as the simulation object and analyzed with the hybrid covariance function GP. Fig. 12 shows the prediction of the abnormal network traffic, and Fig. 13 shows the prediction error. From Fig. 12, we can see that the proposed model predicts the network traffic precisely and that the prediction values closely match the actual values. Meanwhile, Fig. 13 shows that the prediction error is rather small and stays within a controlled range. These results are evidence of the high applicability and robustness of the proposed model; it can precisely describe network traffic.

Fig. 12: Abnormal network traffic prediction result

Fig. 13: Prediction error curve (forecast error, MB/s)
4 Summary

The GP network traffic prediction model based on a hybrid covariance function is aimed at the difficulty of predicting chaotic network traffic. The GP can capture the non-linear mapping relation of network traffic and offers a plain probabilistic interpretation. The model is compared with LSSVM, the common GP model, ε-SVR and the RBF neural network. The results show that the improved GP model indeed improves network traffic prediction performance and that its prediction results are more reliable. It is therefore an effective non-linear network traffic prediction method with wide potential applicability.

References

[1] T. T. Nguyen, G. Armitage, A survey of techniques for internet traffic classification using machine learning [J], IEEE Communications Surveys and Tutorials, 10(4), 2008, 56-76
[2] Bo Gao, Qin-yu Zhang, Yong-sheng Liang, Ning-ning Liu, Cheng-bo Huang, Nai-tong Zhang, Predicting self-similar networking traffic based on EMD and ARMA [J], Journal on Communications, 32(4), 2011, 47-56
[3] Han-lin Sun, Yue-hui Jin, Yidong Cui, Shiduan Cheng, Large-time scale network traffic short-term prediction by grey model [J], Journal of Beijing University of Posts and Telecommunications, 33(1), 2010, 71-75
[4] A. Callado, R. J. Keu, D. Sadok et al., Better network traffic identification through the independent combination of techniques [J], Journal of Network and Computer Applications, 33(4), 2010, 433-446
[5] A. Este, F. Gringoli, L. Salgarelli, Support vector machines for TCP traffic classification [J], Computer Networks, 53(14), 2009, 2476-2490
[6] Zhenjiang Zhao, Prediction and research on network traffic based on PSO-BP neural network [J], Computer Applications and Software, 26(1), 2009, 218-221
[7] Hao Yu, Zhifeng Chen, Network traffic prediction based on wavelet analysis and Hopfield neural network [J], Computer Applications and Software, 30(6), 2013, 246-249
[8] Nan Xiong, Baifen Liu, Online prediction of network traffic based on adaptive particle swarm optimisation and LSSVM [J], Computer Applications and Software, 30(9), 2013, 21-24, 127
[9] Daowen Liu, Haina Hu, Network traffic prediction based on grid search SVM method [J], Computer Applications and Software, 29(11), 2012, 185-186, 247
[10] Xiaolei Zhou, Wanliang Wang, Weijie Chen, Network traffic prediction model based on wavelet transform and optimised support vector machine [J], Computer Applications and Software, 28(2), 2011, 34-36, 59
[11] Lijun Zhang, Peng You, Single-step and multiple-step prediction of chaotic time series using Gaussian process model [J], Acta Physica Sinica, 60(7), 2011, 1-11
[12] Zhaodong Jin, Hong Chen, Zhenghao Zhang, Predicting network traffic based on adaptive genetic algorithm and LS-SVM [J], Computer Applications and Software, 27(11), 2010, 220-222