1s Inernaional Conference on Exerimens/Process/Sysem Modeling/Simulaion/Oimizaion 1s IC-EsMsO Ahens, 6-9 July, 2005 IC-EsMsO NETWORK TRAFFIC MODELING AND PREDICTION USING MULTIPLICATIVE SEASONAL ARIMA MODELS Vassilios C. Moussas 1*, Marios Daglis 1, and Eva Kolega 1 1 Nework Oeraions Cenre (NOC), Technological Educaional Insiuion (T.E.I.) of Ahens Egaleo GR-12210, Greece, * e-mail: vmouss@eiah.gr Keywords: Time-Series, ARIMA, Seasonal, Nework, Traffic, Uilizaion. Absrac: Today s nework designers are execed o lan for fuure exansion and o esimae he nework s fuure uilizaion. Several simulaors can be used for wha-if scenarios bu hey all require as inu some esimaes of he fuure nework use. A mehod for esimaing he fuure uilizaion of a nework is resened in his work. Nework uilizaion is iniially modeled using an ARIMA model (, d, q), bu is redicion accuracy has a limied ime san. The redicion is imroved significanly by using a mulilicaive seasonal ARIMA (, d, q) x (P, D, Q) s model. The seasonal model roved exremely caable o recreae he curren daa and redic he fuure uilizaion wih recision. The only requiremen of he nonlinear model is he availabiliy of longer as records. The daily, weekly and monhly daases were colleced from real-life nework uilizaion, a he TEI of Ahens camus nework. 1 INTRODUCTION Nework raffic monioring and fuure raffic redicions lay a significan role in he design of neworks and nework ugrades. I is imoran o forecas he nework raffic workload when lanning, designing, conrolling and managing various LAN, MAN or WAN neworks. Time-series echniques look romising for modeling nework uilizaion, as, he Auo Regressive Inegraed Moving Average models are able o caure rends and eriodic behavior. The ARIMA and Seasonal ARIMA models have been alied successfully in economy, signal rocessing, safey and oher research fields [1, 2]. Recenly, several researchers modeled secific nework characerisics, ranging from long-erm backbone raffic forecasing o wireless and GSM raffic, using ime series models wih success [3, 4, 5]. In his aer a se of ARIMA models is used o redic fuure nework uilizaion from simle and common raffic monioring daa. The aim is o obain accurae redicions using common (easy o find, simle o aly and widely used) raffic daases. Four differen uilizaion cases were sudied, namely, a camus Inerne connecion, a building backbone, an office LAN, and a 160 lines dialu connecion. 2 TRAFFIC DATA COLLECTION There are several yes of daa o collec, when sudying he raffic of a nework. Almos any raffic characerisic may be measured and logged i.e. bi or acke rae, acke size, roocols, ors, connecions, addresses, server load, alicaions, ec. Rouers, firewalls, servers and managers (servers wih agens) can be used for his ask, bu measuring and archiving all hese daa for oenial fuure reamen is no a regular rocedure. Usually mos neworks log only he load of heir lines and he uilizaion some criical resources, using a more deailed monioring only when a resource requires secific aenion. As a resul, raffic rae or uilizaion are he only daa collecions ha are always available and wih long hisory records on almos any nework. In his work we focus on such commonly found daa ses in order o develo a more general and widely alied ool. Our raffic daa are aken from he rouer s sandard MIB or from he server s yical. In order o avoid device or sysem secific roblems he daa were aken via a monioring ool. The Muli Rouer Traffic Graher
(MRTG ) [8] ool is used for all daa collecion. The MRTG ool is very common, widely alied, and easily imlemened sofware for collecing and monioring uilizaion daa from a rouer or server MIB. The MRTG ool roduces sandard log files wih curren and as daa ha can be downloaded and saved by any browser or simle GET commands. Our mehod uses some MATLAB rocedures ha read hose sandard files and reare he daa for he model idenificaion ses. We seleced 4 differen yes of load/uilizaion daa regularly colleced in our camus nework (Figure 1): a) he TEI of Ahens camus Inerne connecion raffic rae (in bs), b) one of he camus buildings backbone raffic rae (in bs), c) an offices-only VLAN wih no labs (in bs), d) he number of he on-line Dialu users. Figure 1. Average uilizaion daa from he TEI of Ahens camus nework (weekly daa). a) The camus Inerne connecion. b) A building backbone. c) An offices only VLAN. d) On line dial-u users. Daa samling is erformed by defaul every 5 minues. Alhough i can be changed, his is a sandard adjusmen for MRTG and i is used in mos imlemenaions. Daily raffic is herefore exressed in ses of a 5- minue average raffic rae, measured in bis er seconds (bs). Samling by a shorer eriod (e.g. 1 min) may someimes lead o inaccuracies due o iming and delayed resonses from he rouer s MIB and samling by a longer eriod (e.g. 10 min) may lead o loss of informaion due o averaging. Therefore, he defaul samling rae (5-min) was finally used for his work. For visualizing long eriods MRTG does some averaging over a longer se. Weekly raffic is exressed in 30- min ses, monhly raffic in 2-hour ses, and, yearly raffic in 1-day ses. In order o reserve some informaion los by averaging he ool logs ogeher he average and he maximum value for each se. They can boh be used o roduce forecass, regarding he average or he maximum behavior resecively. As i also clear in Figure 1, he uilizaion of he nework resources demonsraes a daily eriodiciy. In addiion mos raffic aerns demonsrae anoher, less aaren, weekly eriodiciy. The monhly grah in Figure 2 clearly resens boh yes of eriodiciy. Such characerisics are imoran for he modeling rocedure ha follows, and for he forecasing accuracy of he mehod. Figure 2. Dialu uilizaion in TEI demonsraing daily and weekly eriodiciy. 3 TIME SERIES MODELING The rincile underlying his mehodology is ha raffic daa occur in a form of a ime series where observaions are deenden. This deendency is no necessarily limied o one se (Markov assumion) bu i can exend o
many ses in he as of he series. Thus in general he curren value X (= nework raffic a ime ) of he rocess X can be exressed as a finie linear aggregae of revious values of he rocess and he resen and revious values of a random inu u [1], i.e. X = ϕ X + ϕ X + + ϕ X + u θu θ u θ u (1) 1 1 2 2 1 1 2 2 q q In his equaion and reresen resecively he raffic volume and he random inu a equally saced ime inervals, -1, -2,.The random inu u consiue a whie noise sochasic rocess, whose disribuion is assumed o be Gaussian wih zero mean and sandard deviaion σ u. Eqn. (1) can be economically rewrien as (4) by defining he auoregressive (AR) oeraor of order and he moving-average (MA) oeraor of order q resecively by (2) & (3): ( ) 1 1 2 2 ( ) 1 1 2 2 ( B) X θ( B) u ϕ B = ϕ B ϕ B ϕ B (2) θ B = θ B θ B θ B (3) ϕ = (4) where, B sands for he backward shif oeraor defined as B s X = X -s. Anoher closely relaed oeraor is he backward difference oeraor defined as X = X - X -1 and hus, = 1- B, d = (1- B) d and s D = (1- B s ) D. The auoregressive moving-average model (ARMA) as formulaed above is limied o modeling henomena exhibiing saionariy. Clearly his is no he case for he nework raffic daa of Fig. 1. I is ossible hough ha he rocesses sill ossess homogeneiy of some kind. I is usually he case ha he d h difference of he original ime series exhibis saionary characerisics. The revious ARMA model could hen be alied o he new saionary rocess X and eqn. (4) will corresondingly read d ( ) θ( ) ϕ B X = B u (5) This equaion reresens he general model used in his work. Clearly, i can describe saionary (d = 0) or nonsaionary (d 0), urely auoregressive (q = 0) or urely moving-average ( = 0) rocesses. I is called auoregressive inegraed moving-average (ARIMA) rocess of order (, d, q). I emloys + q +1 unknown arameers ϕ 1,, ϕ ; θ 1,, θ q ; σ u, which will have o be esimaed from he daa. Saring from ARIMA model of eqn. (5) i can be deduced [1] ha a seasonal series can be mahemaically reresened by he general mulilicaive model ofen called Seasonal ARMA or SARIMA ϕ s d D s ( ) ( ) θ ( ) ( ) B Φ B X = B Θ B u (6) P s q Q The general scheme for deermining a model includes hree hases, which are: 1. Model idenificaion, where he values of he arameers, d, q are defined 2. Parameer esimaion, where he {ϕ} and {θ} arameers are deermined in some oimal way, and 3. Diagnosic checking for conrolling he model s erformance. As is saed however in [1], here is no uniqueness in he ARIMA models for a aricular hysical roblem. In he selecion rocedure, among oenially good candidaes one is aided by cerain crieria. Alhough more advanced mehods for model selecion have been roosed [6] he mos common and classic crieria remain he Akaike s Informaion Crierion (AIC) and he Schwarz s Bayesian Informaion Crierion (SBC or BIC) [1,7]. If L = L (ϕ 1,, ϕ, θ 1,, θ q, σ u ) reresens he likelihood funcion, formed during he arameer esimaion, he AIC and SBC are exressed resecively, as AIC = 2ln L+ 2 k & SBC = 2ln L+ ln ( n) k (7) where, k = number of free arameers (= + q) and n = number of residuals ha can be comued for he ime series. Proer choice of and q calls for a minimizaion of he AIC and SBC.
4 ARIMA MODEL BUILDING Vassilios C. Moussas, Marios Daglis, and Eva Kolega In he search of he arameers, d, q of he ARIMA model of eqn. (5) we work firs wih a single day raffic daa from Fig. 2. The curve in fig. 2 reresens he average daily raffic of he TEI camus Inerne connecion. The second difference of he daily curve (i.e. d = 2) demonsraes saionariy, as i is shown by is Auocorrelaion Funcion (ACF). The original curve, is 1 s and 2 nd differences and heir corresonding ACF are shown in Fig. 3. Figure 3. Inerne connecion of he TEI camus. Daily raffic daa, he 2 nd difference and heir ACFs. The crieria are minimized for = q = 3, herefore he ARIMA (3,2,3) model should be used. Parameer esimaion for his model yields he following values: ϕ 1 = -0.317, ϕ 1 = 0.318, ϕ 1 = -0.153, θ 1 = 0.764, θ 1 = 0.899, and, θ 1 = -0.7475. The model fis saisfacorily he daa, bu i may redic accuraely only for a few ses in he fuure (Figure 4). As shown in Figure s (4) inside lo, i is no reliable for longer redicions. Figure 4. Fiing of he average daily raffic (Inerne connecion) using he ARIMA (3,2,3) model, and, is redicions. Shor ime (1-4 ses) forecass of he 30 h case may be acceable, bu 30 min (6 ses) or longer forecass for he same case are inaccurae. Daa se (,d,q) AR & MA Parameers Inerne Conn. ( 3,2,3 ) AR: -0.317, 0.318, -0.153, MA: 0.764, 0.899, -0.7475 Gbi Backbone ( 2,2,1 ) AR: -0.4334, -0.2827, MA: 0.9676 Offices VLAN ( 1,1,1 ) AR: -0.3864, MA: -0.5468 Dial-U Users ( 1,2,1 ) AR: -0.1833, MA: 0.5915 Table 1 : ARIMA model arameers for all 4 raffic daases
The idenified models ha minimize he AIC/SBC crieria for each daase are shown in Table 1. The same conclusions regarding fuure redicions hold for all 4 kinds of uilizaion daa. The ARIMA model is no accurae for long forecas eriods. As already menioned in secion 2 he uilizaion daa demonsrae some eriodiciies and herefore anoher ye of model mus be alied ha will ake ino accoun such roeries. 5 SEASONAL ARIMA MODEL BUILDING As he above aroach yields acceable redicions only for a shor eriod and in order o redic reliably for longer eriods, he use of he seasonal aroach is required. From he firs ACF lo in figure 5 becomes clear ha he daa se has wo seasonal comonens, a daily one (a 24h) and a weekly one (a 168h). Figure 5. Inerne connecion. Four monhs of raffic daa, single & seasonal differences and heir ACFs. In he search of he arameers, d, q and P, D, Q of he SARIMA model of eqn. (6) we work as before by aking one single and one seasonal difference. Afer wo differences (i.e. D = 1 & d = 1) he ACF demonsraes saionariy and resens a eak a he seasonal comonen, i.e. lag 168, when he weekly eriodiciy is seleced (Fig. 5). Figure 6. Predicion of he weekly Dial-U uilizaion using he SARIMA (1,1,1)x(0,1,2) 168 model. Saring from se 577 he model redics saisfacorily for 10 days in he fuure.
(a) Camus Inerne Connecion Building Backbone (b) (c) Offices only VLAN Figure 7. Predicions of he weekly raffic using he SARIMA (1,1,1)x(0,1,1) 168 model. Saring from se 350 he model redics saisfacorily for 10 days in he fuure, including days wih lower raffic.
Table 2 summarizes he resuls for each daase using seasonal models wih weekly eriodiciy. Each model is also used for fuure redicions and he resuls are quie accurae as shown in figures 6 & 7. Daa se (,d,q)x(p,d,q) s AR, MA & Seasonal MA Parameers Inerne Conn. ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.520607, MA: 0.975213, SMA: 0.767889 Gbi Backbone ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.404779, MA: 0.982133, SMA: 0.921960 Offices VLAN ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.485866, MA: 0.996179, SMA: 0.921947 Dial-U Users ( 1,1,1 )x( 0,1,2 ) 168 AR: 0.609797, MA: 0.988091, SMA: 0.81489, 0.09942 Table 2 : Seasonal ARIMA model arameers for all 4 raffic daases In addiion o he weekly models, SARIMA models wih daily eriodiciy were also esed. They redic equally well he fuure daily uilizaion, bu under he assumion ha as and fuure day belong in he same caegory. As i is clear in figure 6, here are wo yes of daily uilizaion he one of a normal working day and he oher of a weekend or vacaion. Daily SARIMA doesn searae and erforms an average redicion for he nex day whaever he day is. On he oher hand he weekly SARIMA handles he raffic aern of an enire week ha also includes he lower uilizaion observed during weekends. This is mos obvious in figure 7c showing he Offices VLAN raffic when he offices are closed on Saurday and Sunday. For he above reasons we seleced he weekly seasonal ARIMA for weekly, monhly and long ime redicions. The daily seasonal ARIMA is more aroriae for a differen ye of raffic monioring, i.e., by having several daily aerns available, one can monior raffic and adaively selec he curren ye of uilizaion. Of course his is no resriced o working days or weekends, bu also o oher yes of raffic such as connecion roblems, anomalies or inrusions. Such an online idenificaion of he raffic aern is beyond he scoe of his work (modeling & redicion) and i will be sudied in a searae conribuion. 6 CONCLUSIONS In his work we sudied he fiing and forecasing caabiliies of ime series models o he real raffic and uilizaion daa of a camus nework. The mulilicaive seasonal ARIMA model roduced saisfacory forecass for four differen yes of uilizaion. All fuure redicions were accurae once he nework is working under normal condiions (no failures, DoS aacks, ec.). Moreover he seasonal model wih weekly eriodiciy was suerior o he one wih daily eriodiciy, as i akes ino accoun he raffic reducion during weekends. Our resuls agree o hose of oher researchers and we concluded ha he so-called seasonal ime series could offer a unique ool for modeling, wih aricular bearing o he real ime monioring and redicion of nework raffic & uilizaion. A furher exloraion of he above models for online idenificaion follows in a searae conribuion. REFERENCES [1] Box G., Jenkins G.M. and Reinsel G. (1994), Time Series Analysis: Forecasing & Conrol, 3 rd ed. Prenice Hall,. [2] Solomos G.P. and Moussas V.C. (1991), "A Time Series Aroach o Faigue Crack Proagaion", Srucural Safey, 9,.211-226. [3] Paagiannaki K., Taf N., Zhang Z. and Dio C. (2003), "Long-Term Forecasing of Inerne Backbone Traffic: Observaions and Iniial Models", IEEE Infocom 2003. [4] Shu Y., Yu M., Liu J. and Yang O.W.W. (2003), "Wireless Traffic Modeling and Predicion Using Seasonal ARIMA Models", IEEE Inl. Conference on Communicaion May 2003, ICC 03 vol. 3. [5] You C. and Chandra K. (1999), "Time Series Models for Inerne Daa Traffic", In Proc. 24h Conf. on Local Comuer Neworks Ocober 1999, LCN-99. [6] Kasikas S.K., Likohanassis S.D. and Lainiois D.G. (1990), "AR model idenificaion wih unknown rocess order", IEEE Trans. Acous., Seech and Signal Proc., Vol. ASSP-38, No. 5,. 872-876. [7] Akaike H. (1969), "Fiing Auoregressive models for Predicion", Ann. Ins. Sa. Mah., vol. 21,. 243-247. [8] Oeiker Tobias, (2005), Muli Rouer Traffic Graher (MRTG) ool, Sofware Package & Manuals, h://eole.ee.ehz.ch/~oeiker/webools/mrg/, Las visi: Feb 2005.