NETWORK TRAFFIC MODELING AND PREDICTION USING MULTIPLICATIVE SEASONAL ARIMA MODELS



Similar documents
Usefulness of the Forward Curve in Forecasting Oil Prices

CALCULATION OF OMX TALLINN

Chapter 8: Regression with Lagged Explanatory Variables

Stock Price Prediction Using the ARIMA Model

Optimal Real-Time Scheduling for Hybrid Energy Storage Systems and Wind Farms Based on Model Predictive Control

SPEC model selection algorithm for ARCH models: an options pricing evaluation framework

MACROECONOMIC FORECASTS AT THE MOF A LOOK INTO THE REAR VIEW MIRROR

Predicting Stock Market Index Trading Signals Using Neural Networks

RISK-BASED REPLACEMENT STRATEGIES FOR REDUNDANT DETERIORATING REINFORCED CONCRETE PIPE NETWORKS

PRESSURE BUILDUP. Figure 1: Schematic of an ideal buildup test

2.2 Time Series Analysis Preliminaries Various Types of Stochastic Processes Parameters of Univariate and Bivariate Time Series

TEMPORAL PATTERN IDENTIFICATION OF TIME SERIES DATA USING PATTERN WAVELETS AND GENETIC ALGORITHMS

Sensor Network with Multiple Mobile Access Points

Appendix D Flexibility Factor/Margin of Choice Desktop Research

The Greek financial crisis: growing imbalances and sovereign spreads. Heather D. Gibson, Stephan G. Hall and George S. Tavlas

DOES TRADING VOLUME INFLUENCE GARCH EFFECTS? SOME EVIDENCE FROM THE GREEK MARKET WITH SPECIAL REFERENCE TO BANKING SECTOR

Vector Autoregressions (VARs): Operational Perspectives

Strictly as per the compliance and regulations of:

How To Write A Demand And Price Model For A Supply Chain

An Agent-based Bayesian Forecasting Model for Enhanced Network Security

A Note on Using the Svensson procedure to estimate the risk free rate in corporate valuation

[web:reg] ARMA Excel Add-In

Time Series Analysis Using SAS R Part I The Augmented Dickey-Fuller (ADF) Test

A New Type of Combination Forecasting Method Based on PLS

Journal Of Business & Economics Research September 2005 Volume 3, Number 9

WHAT ARE OPTION CONTRACTS?

Illiquidity and Pricing Biases in the Real Estate Market

Statistical Analysis with Little s Law. Supplementary Material: More on the Call Center Data. by Song-Hee Kim and Ward Whitt

The naive method discussed in Lecture 1 uses the most recent observations to forecast future values. That is, Y ˆ t + 1

Chapter 8 Student Lecture Notes 8-1

Microstructure of Russian stock market and profitability of market making

Automatic measurement and detection of GSM interferences

SEASONAL ADJUSTMENT. 1 Introduction. 2 Methodology. 3 X-11-ARIMA and X-12-ARIMA Methods

Index futures, spot volatility, and liquidity: Evidence from FTSE Xinhua A50 index futures. Yakup Eser Arisoy *

Cointegration: The Engle and Granger approach

INTRODUCTION TO FORECASTING

Measuring macroeconomic volatility Applications to export revenue data,

UPDATE OF QUARTERLY NATIONAL ACCOUNTS MANUAL: CONCEPTS, DATA SOURCES AND COMPILATION 1 CHAPTER 7. SEASONAL ADJUSTMENT 2

The Kinetics of the Stock Markets

Return performance, leverage effect, and volatility spillover in Islamic stock indices evidence from DJIMI, FTSEGII and KLSI

Mathematics in Pharmacokinetics What and Why (A second attempt to make it clearer)

Improving timeliness of industrial short-term statistics using time series analysis

The Application of Multi Shifts and Break Windows in Employees Scheduling

Stability analysis of constrained inventory systems with transportation delay

Hotel Room Demand Forecasting via Observed Reservation Information

Why Did the Demand for Cash Decrease Recently in Korea?

Distributing Human Resources among Software Development Projects 1

DYNAMIC MODELS FOR VALUATION OF WRONGFUL DEATH PAYMENTS

Improvement in Forecasting Accuracy Using the Hybrid Model of ARFIMA and Feed Forward Neural Network

USE OF EDUCATION TECHNOLOGY IN ENGLISH CLASSES

Modeling a distribution of mortgage credit losses Petr Gapko 1, Martin Šmíd 2

Morningstar Investor Return

Key Words: Steel Modelling, ARMA, GARCH, COGARCH, Lévy Processes, Discrete Time Models, Continuous Time Models, Stochastic Modelling

Duration and Convexity ( ) 20 = Bond B has a maturity of 5 years and also has a required rate of return of 10%. Its price is $613.

Diane K. Michelson, SAS Institute Inc, Cary, NC Annie Dudley Zangi, SAS Institute Inc, Cary, NC

Forecasting. Including an Introduction to Forecasting using the SAP R/3 System

TSG-RAN Working Group 1 (Radio Layer 1) meeting #3 Nynashamn, Sweden 22 nd 26 th March 1999

Government Revenue Forecasting in Nepal

A closer look at Black Scholes option thetas

Chapter 1 Overview of Time Series

This is the author s version of a work that was submitted/accepted for publication in the following source:

Task is a schedulable entity, i.e., a thread

Trends in TCP/IP Retransmissions and Resets

Principal components of stock market dynamics. Methodology and applications in brief (to be updated ) Andrei Bouzaev, bouzaev@ya.

Individual Health Insurance April 30, 2008 Pages

FORECASTING NETWORK TRAFFIC: A COMPARISON OF NEURAL NETWORKS AND LINEAR MODELS

COMPARISON OF AIR TRAVEL DEMAND FORECASTING METHODS

WATER MIST FIRE PROTECTION RELIABILITY ANALYSIS

Prostate Cancer. Options for Localised Cancer

Do Public Income Transfer to the Poorest affect Internal Inter-Regional Migration? Evidence for the Case of Brazilian Bolsa Família Program

Contrarian insider trading and earnings management around seasoned equity offerings; SEOs

The Relationship between Stock Return Volatility and. Trading Volume: The case of The Philippines*

PARAMETRIC EXTREME VAR WITH LONG-RUN VOLATILITY: COMPARING OIL AND GAS COMPANIES OF BRAZIL AND USA.

Subband-based Single-channel Source Separation of Instantaneous Audio Mixtures

The Real Business Cycle paradigm. The RBC model emphasizes supply (technology) disturbances as the main source of

Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C.

Hedging with Forwards and Futures

A Scalable and Lightweight QoS Monitoring Technique Combining Passive and Active Approaches

DEMAND FORECASTING MODELS

FORECASTING WATER DEMAND FOR AGRICULTURAL, INDUSTRIAL AND DOMESTIC USE IN LIBYA

Multiprocessor Systems-on-Chips

Mobile Broadband Rollout Business Case: Risk Analyses of the Forecast Uncertainties

Transcription:

1s Inernaional Conference on Exerimens/Process/Sysem Modeling/Simulaion/Oimizaion 1s IC-EsMsO Ahens, 6-9 July, 2005 IC-EsMsO NETWORK TRAFFIC MODELING AND PREDICTION USING MULTIPLICATIVE SEASONAL ARIMA MODELS Vassilios C. Moussas 1*, Marios Daglis 1, and Eva Kolega 1 1 Nework Oeraions Cenre (NOC), Technological Educaional Insiuion (T.E.I.) of Ahens Egaleo GR-12210, Greece, * e-mail: vmouss@eiah.gr Keywords: Time-Series, ARIMA, Seasonal, Nework, Traffic, Uilizaion. Absrac: Today s nework designers are execed o lan for fuure exansion and o esimae he nework s fuure uilizaion. Several simulaors can be used for wha-if scenarios bu hey all require as inu some esimaes of he fuure nework use. A mehod for esimaing he fuure uilizaion of a nework is resened in his work. Nework uilizaion is iniially modeled using an ARIMA model (, d, q), bu is redicion accuracy has a limied ime san. The redicion is imroved significanly by using a mulilicaive seasonal ARIMA (, d, q) x (P, D, Q) s model. The seasonal model roved exremely caable o recreae he curren daa and redic he fuure uilizaion wih recision. The only requiremen of he nonlinear model is he availabiliy of longer as records. The daily, weekly and monhly daases were colleced from real-life nework uilizaion, a he TEI of Ahens camus nework. 1 INTRODUCTION Nework raffic monioring and fuure raffic redicions lay a significan role in he design of neworks and nework ugrades. I is imoran o forecas he nework raffic workload when lanning, designing, conrolling and managing various LAN, MAN or WAN neworks. Time-series echniques look romising for modeling nework uilizaion, as, he Auo Regressive Inegraed Moving Average models are able o caure rends and eriodic behavior. The ARIMA and Seasonal ARIMA models have been alied successfully in economy, signal rocessing, safey and oher research fields [1, 2]. Recenly, several researchers modeled secific nework characerisics, ranging from long-erm backbone raffic forecasing o wireless and GSM raffic, using ime series models wih success [3, 4, 5]. In his aer a se of ARIMA models is used o redic fuure nework uilizaion from simle and common raffic monioring daa. The aim is o obain accurae redicions using common (easy o find, simle o aly and widely used) raffic daases. Four differen uilizaion cases were sudied, namely, a camus Inerne connecion, a building backbone, an office LAN, and a 160 lines dialu connecion. 2 TRAFFIC DATA COLLECTION There are several yes of daa o collec, when sudying he raffic of a nework. Almos any raffic characerisic may be measured and logged i.e. bi or acke rae, acke size, roocols, ors, connecions, addresses, server load, alicaions, ec. Rouers, firewalls, servers and managers (servers wih agens) can be used for his ask, bu measuring and archiving all hese daa for oenial fuure reamen is no a regular rocedure. Usually mos neworks log only he load of heir lines and he uilizaion some criical resources, using a more deailed monioring only when a resource requires secific aenion. As a resul, raffic rae or uilizaion are he only daa collecions ha are always available and wih long hisory records on almos any nework. In his work we focus on such commonly found daa ses in order o develo a more general and widely alied ool. Our raffic daa are aken from he rouer s sandard MIB or from he server s yical. In order o avoid device or sysem secific roblems he daa were aken via a monioring ool. The Muli Rouer Traffic Graher

(MRTG ) [8] ool is used for all daa collecion. The MRTG ool is very common, widely alied, and easily imlemened sofware for collecing and monioring uilizaion daa from a rouer or server MIB. The MRTG ool roduces sandard log files wih curren and as daa ha can be downloaded and saved by any browser or simle GET commands. Our mehod uses some MATLAB rocedures ha read hose sandard files and reare he daa for he model idenificaion ses. We seleced 4 differen yes of load/uilizaion daa regularly colleced in our camus nework (Figure 1): a) he TEI of Ahens camus Inerne connecion raffic rae (in bs), b) one of he camus buildings backbone raffic rae (in bs), c) an offices-only VLAN wih no labs (in bs), d) he number of he on-line Dialu users. Figure 1. Average uilizaion daa from he TEI of Ahens camus nework (weekly daa). a) The camus Inerne connecion. b) A building backbone. c) An offices only VLAN. d) On line dial-u users. Daa samling is erformed by defaul every 5 minues. Alhough i can be changed, his is a sandard adjusmen for MRTG and i is used in mos imlemenaions. Daily raffic is herefore exressed in ses of a 5- minue average raffic rae, measured in bis er seconds (bs). Samling by a shorer eriod (e.g. 1 min) may someimes lead o inaccuracies due o iming and delayed resonses from he rouer s MIB and samling by a longer eriod (e.g. 10 min) may lead o loss of informaion due o averaging. Therefore, he defaul samling rae (5-min) was finally used for his work. For visualizing long eriods MRTG does some averaging over a longer se. Weekly raffic is exressed in 30- min ses, monhly raffic in 2-hour ses, and, yearly raffic in 1-day ses. In order o reserve some informaion los by averaging he ool logs ogeher he average and he maximum value for each se. They can boh be used o roduce forecass, regarding he average or he maximum behavior resecively. As i also clear in Figure 1, he uilizaion of he nework resources demonsraes a daily eriodiciy. In addiion mos raffic aerns demonsrae anoher, less aaren, weekly eriodiciy. The monhly grah in Figure 2 clearly resens boh yes of eriodiciy. Such characerisics are imoran for he modeling rocedure ha follows, and for he forecasing accuracy of he mehod. Figure 2. Dialu uilizaion in TEI demonsraing daily and weekly eriodiciy. 3 TIME SERIES MODELING The rincile underlying his mehodology is ha raffic daa occur in a form of a ime series where observaions are deenden. This deendency is no necessarily limied o one se (Markov assumion) bu i can exend o

many ses in he as of he series. Thus in general he curren value X (= nework raffic a ime ) of he rocess X can be exressed as a finie linear aggregae of revious values of he rocess and he resen and revious values of a random inu u [1], i.e. X = ϕ X + ϕ X + + ϕ X + u θu θ u θ u (1) 1 1 2 2 1 1 2 2 q q In his equaion and reresen resecively he raffic volume and he random inu a equally saced ime inervals, -1, -2,.The random inu u consiue a whie noise sochasic rocess, whose disribuion is assumed o be Gaussian wih zero mean and sandard deviaion σ u. Eqn. (1) can be economically rewrien as (4) by defining he auoregressive (AR) oeraor of order and he moving-average (MA) oeraor of order q resecively by (2) & (3): ( ) 1 1 2 2 ( ) 1 1 2 2 ( B) X θ( B) u ϕ B = ϕ B ϕ B ϕ B (2) θ B = θ B θ B θ B (3) ϕ = (4) where, B sands for he backward shif oeraor defined as B s X = X -s. Anoher closely relaed oeraor is he backward difference oeraor defined as X = X - X -1 and hus, = 1- B, d = (1- B) d and s D = (1- B s ) D. The auoregressive moving-average model (ARMA) as formulaed above is limied o modeling henomena exhibiing saionariy. Clearly his is no he case for he nework raffic daa of Fig. 1. I is ossible hough ha he rocesses sill ossess homogeneiy of some kind. I is usually he case ha he d h difference of he original ime series exhibis saionary characerisics. The revious ARMA model could hen be alied o he new saionary rocess X and eqn. (4) will corresondingly read d ( ) θ( ) ϕ B X = B u (5) This equaion reresens he general model used in his work. Clearly, i can describe saionary (d = 0) or nonsaionary (d 0), urely auoregressive (q = 0) or urely moving-average ( = 0) rocesses. I is called auoregressive inegraed moving-average (ARIMA) rocess of order (, d, q). I emloys + q +1 unknown arameers ϕ 1,, ϕ ; θ 1,, θ q ; σ u, which will have o be esimaed from he daa. Saring from ARIMA model of eqn. (5) i can be deduced [1] ha a seasonal series can be mahemaically reresened by he general mulilicaive model ofen called Seasonal ARMA or SARIMA ϕ s d D s ( ) ( ) θ ( ) ( ) B Φ B X = B Θ B u (6) P s q Q The general scheme for deermining a model includes hree hases, which are: 1. Model idenificaion, where he values of he arameers, d, q are defined 2. Parameer esimaion, where he {ϕ} and {θ} arameers are deermined in some oimal way, and 3. Diagnosic checking for conrolling he model s erformance. As is saed however in [1], here is no uniqueness in he ARIMA models for a aricular hysical roblem. In he selecion rocedure, among oenially good candidaes one is aided by cerain crieria. Alhough more advanced mehods for model selecion have been roosed [6] he mos common and classic crieria remain he Akaike s Informaion Crierion (AIC) and he Schwarz s Bayesian Informaion Crierion (SBC or BIC) [1,7]. If L = L (ϕ 1,, ϕ, θ 1,, θ q, σ u ) reresens he likelihood funcion, formed during he arameer esimaion, he AIC and SBC are exressed resecively, as AIC = 2ln L+ 2 k & SBC = 2ln L+ ln ( n) k (7) where, k = number of free arameers (= + q) and n = number of residuals ha can be comued for he ime series. Proer choice of and q calls for a minimizaion of he AIC and SBC.

4 ARIMA MODEL BUILDING Vassilios C. Moussas, Marios Daglis, and Eva Kolega In he search of he arameers, d, q of he ARIMA model of eqn. (5) we work firs wih a single day raffic daa from Fig. 2. The curve in fig. 2 reresens he average daily raffic of he TEI camus Inerne connecion. The second difference of he daily curve (i.e. d = 2) demonsraes saionariy, as i is shown by is Auocorrelaion Funcion (ACF). The original curve, is 1 s and 2 nd differences and heir corresonding ACF are shown in Fig. 3. Figure 3. Inerne connecion of he TEI camus. Daily raffic daa, he 2 nd difference and heir ACFs. The crieria are minimized for = q = 3, herefore he ARIMA (3,2,3) model should be used. Parameer esimaion for his model yields he following values: ϕ 1 = -0.317, ϕ 1 = 0.318, ϕ 1 = -0.153, θ 1 = 0.764, θ 1 = 0.899, and, θ 1 = -0.7475. The model fis saisfacorily he daa, bu i may redic accuraely only for a few ses in he fuure (Figure 4). As shown in Figure s (4) inside lo, i is no reliable for longer redicions. Figure 4. Fiing of he average daily raffic (Inerne connecion) using he ARIMA (3,2,3) model, and, is redicions. Shor ime (1-4 ses) forecass of he 30 h case may be acceable, bu 30 min (6 ses) or longer forecass for he same case are inaccurae. Daa se (,d,q) AR & MA Parameers Inerne Conn. ( 3,2,3 ) AR: -0.317, 0.318, -0.153, MA: 0.764, 0.899, -0.7475 Gbi Backbone ( 2,2,1 ) AR: -0.4334, -0.2827, MA: 0.9676 Offices VLAN ( 1,1,1 ) AR: -0.3864, MA: -0.5468 Dial-U Users ( 1,2,1 ) AR: -0.1833, MA: 0.5915 Table 1 : ARIMA model arameers for all 4 raffic daases

The idenified models ha minimize he AIC/SBC crieria for each daase are shown in Table 1. The same conclusions regarding fuure redicions hold for all 4 kinds of uilizaion daa. The ARIMA model is no accurae for long forecas eriods. As already menioned in secion 2 he uilizaion daa demonsrae some eriodiciies and herefore anoher ye of model mus be alied ha will ake ino accoun such roeries. 5 SEASONAL ARIMA MODEL BUILDING As he above aroach yields acceable redicions only for a shor eriod and in order o redic reliably for longer eriods, he use of he seasonal aroach is required. From he firs ACF lo in figure 5 becomes clear ha he daa se has wo seasonal comonens, a daily one (a 24h) and a weekly one (a 168h). Figure 5. Inerne connecion. Four monhs of raffic daa, single & seasonal differences and heir ACFs. In he search of he arameers, d, q and P, D, Q of he SARIMA model of eqn. (6) we work as before by aking one single and one seasonal difference. Afer wo differences (i.e. D = 1 & d = 1) he ACF demonsraes saionariy and resens a eak a he seasonal comonen, i.e. lag 168, when he weekly eriodiciy is seleced (Fig. 5). Figure 6. Predicion of he weekly Dial-U uilizaion using he SARIMA (1,1,1)x(0,1,2) 168 model. Saring from se 577 he model redics saisfacorily for 10 days in he fuure.

(a) Camus Inerne Connecion Building Backbone (b) (c) Offices only VLAN Figure 7. Predicions of he weekly raffic using he SARIMA (1,1,1)x(0,1,1) 168 model. Saring from se 350 he model redics saisfacorily for 10 days in he fuure, including days wih lower raffic.

Table 2 summarizes he resuls for each daase using seasonal models wih weekly eriodiciy. Each model is also used for fuure redicions and he resuls are quie accurae as shown in figures 6 & 7. Daa se (,d,q)x(p,d,q) s AR, MA & Seasonal MA Parameers Inerne Conn. ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.520607, MA: 0.975213, SMA: 0.767889 Gbi Backbone ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.404779, MA: 0.982133, SMA: 0.921960 Offices VLAN ( 1,1,1 )x( 0,1,1 ) 168 AR: 0.485866, MA: 0.996179, SMA: 0.921947 Dial-U Users ( 1,1,1 )x( 0,1,2 ) 168 AR: 0.609797, MA: 0.988091, SMA: 0.81489, 0.09942 Table 2 : Seasonal ARIMA model arameers for all 4 raffic daases In addiion o he weekly models, SARIMA models wih daily eriodiciy were also esed. They redic equally well he fuure daily uilizaion, bu under he assumion ha as and fuure day belong in he same caegory. As i is clear in figure 6, here are wo yes of daily uilizaion he one of a normal working day and he oher of a weekend or vacaion. Daily SARIMA doesn searae and erforms an average redicion for he nex day whaever he day is. On he oher hand he weekly SARIMA handles he raffic aern of an enire week ha also includes he lower uilizaion observed during weekends. This is mos obvious in figure 7c showing he Offices VLAN raffic when he offices are closed on Saurday and Sunday. For he above reasons we seleced he weekly seasonal ARIMA for weekly, monhly and long ime redicions. The daily seasonal ARIMA is more aroriae for a differen ye of raffic monioring, i.e., by having several daily aerns available, one can monior raffic and adaively selec he curren ye of uilizaion. Of course his is no resriced o working days or weekends, bu also o oher yes of raffic such as connecion roblems, anomalies or inrusions. Such an online idenificaion of he raffic aern is beyond he scoe of his work (modeling & redicion) and i will be sudied in a searae conribuion. 6 CONCLUSIONS In his work we sudied he fiing and forecasing caabiliies of ime series models o he real raffic and uilizaion daa of a camus nework. The mulilicaive seasonal ARIMA model roduced saisfacory forecass for four differen yes of uilizaion. All fuure redicions were accurae once he nework is working under normal condiions (no failures, DoS aacks, ec.). Moreover he seasonal model wih weekly eriodiciy was suerior o he one wih daily eriodiciy, as i akes ino accoun he raffic reducion during weekends. Our resuls agree o hose of oher researchers and we concluded ha he so-called seasonal ime series could offer a unique ool for modeling, wih aricular bearing o he real ime monioring and redicion of nework raffic & uilizaion. A furher exloraion of he above models for online idenificaion follows in a searae conribuion. REFERENCES [1] Box G., Jenkins G.M. and Reinsel G. (1994), Time Series Analysis: Forecasing & Conrol, 3 rd ed. Prenice Hall,. [2] Solomos G.P. and Moussas V.C. (1991), "A Time Series Aroach o Faigue Crack Proagaion", Srucural Safey, 9,.211-226. [3] Paagiannaki K., Taf N., Zhang Z. and Dio C. (2003), "Long-Term Forecasing of Inerne Backbone Traffic: Observaions and Iniial Models", IEEE Infocom 2003. [4] Shu Y., Yu M., Liu J. and Yang O.W.W. (2003), "Wireless Traffic Modeling and Predicion Using Seasonal ARIMA Models", IEEE Inl. Conference on Communicaion May 2003, ICC 03 vol. 3. [5] You C. and Chandra K. (1999), "Time Series Models for Inerne Daa Traffic", In Proc. 24h Conf. on Local Comuer Neworks Ocober 1999, LCN-99. [6] Kasikas S.K., Likohanassis S.D. and Lainiois D.G. (1990), "AR model idenificaion wih unknown rocess order", IEEE Trans. Acous., Seech and Signal Proc., Vol. ASSP-38, No. 5,. 872-876. [7] Akaike H. (1969), "Fiing Auoregressive models for Predicion", Ann. Ins. Sa. Mah., vol. 21,. 243-247. [8] Oeiker Tobias, (2005), Muli Rouer Traffic Graher (MRTG) ool, Sofware Package & Manuals, h://eole.ee.ehz.ch/~oeiker/webools/mrg/, Las visi: Feb 2005.