KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 4, NO. 3, June 2010 428
Copyright © 2010 KSII

Combining Adaptive Filtering and IF Flows to Detect DDoS Attacks within a Router

Ruoyu Yan1,2, Qinghua Zheng1 and Haifei Li3
1 Department of Computer Science and Technology, MOE KLINNS, Xi'an Jiaotong University, Xi'an, Shaanxi, China [e-mail: qhzheng@mail.xjtu.edu.cn]
2 School of Information Science, Guangdong Ocean University, Zhanjiang, Guangdong, China [e-mail: ryyan@sei.xjtu.edu.cn]
3 Department of Computer Science, Union University, Jackson, TN, USA [e-mail: hli@uu.edu]
*Corresponding author: Qinghua Zheng

Received February 3, 2010; revised March 7, 2010; accepted April 29, 2010; published June 3, 2010

Abstract: Traffic matrix-based anomaly detection and DDoS attack detection in networks are a research focus in the network security and traffic measurement community. In this paper, a new type of unidirectional flow called the IF flow is first proposed. The merits and features of IF flows are analyzed in detail, and two efficient methods are then introduced in our DDoS attack detection and evaluation scheme. The first method uses the residual variance ratio to detect DDoS attacks after a Recursive Least Square (RLS) filter is applied to predict IF flows. The second method uses a generalized likelihood ratio (GLR) statistical test to detect DDoS attacks after a Kalman filter is applied to estimate IF flows. Based on these two complementary methods, an evaluation formula is proposed to assess the seriousness of current DDoS attacks on router ports. Furthermore, the sensitivity of three types of traffic (IF flow, input link and output link) to DDoS attacks is analyzed and compared. Experiments show that IF flows expose anomalies better than the other two types of traffic. Finally, the two proposed methods are compared with each other in terms of detection rate, processing speed, etc., and in detail with the Principal Component Analysis (PCA) and Cumulative Sum (CUSUM) methods. The results demonstrate that the adaptive filter methods have a higher detection rate, a lower false alarm rate and a smaller detection lag time.
Keywords: Anomaly detection, distributed denial of service, Kalman filter, recursive least square, router-wide traffic analysis

A preliminary version of this paper appeared in IEEE ICCS 2008, November 19-21, Guangzhou, China. This version adds a DDoS attack detection scheme, together with a concrete analysis and comparison. The research was supported by the National High-Tech R&D Program of China (28AAZ3), the National Science Foundation of China (682522, 68379, 66332, 6923), the National Key Technologies R&D Program of China (26BAKB2, 26BAJ7B6, 28BAH26B2, 29BAH5B), and the Open Project Program of the Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences (28). We express our thanks to Dr. Junaid Khan, who checked our manuscript.
DOI: 10.3837/tiis.2010.06.004
1. Introduction
Internet-based attacks can be launched from anywhere in the world, and any Internet-based service is a potential target. A denial of service (DoS) attack aims to deny legitimate users access to shared services or resources [1]. When the traffic of a DoS attack comes from multiple sources, it is called a distributed denial of service (DDoS) attack. Using multiple attack sources amplifies the power of a DoS attack and makes defense much more complicated. In 2004, the Federal Bureau of Investigation (FBI) and the Computer Security Institute (CSI) released a survey [2] on the impact of global Internet security events, which showed that DoS attacks were the most costly of all Internet-based attacks that year. Identifying and preventing DDoS attacks is therefore crucial.

A challenge in detecting DDoS attacks is the need to handle huge volumes of network traffic. Many anomaly detection approaches [3][4][5][6][7][8][9] overcome the traffic processing bottleneck by aggregating traffic volume data over time instead of scrutinizing every packet. This speeds up detection and reveals important features for security management. Traffic can be aggregated in many ways [10] to meet different needs, for example by Points of Presence (PoPs), by links, or by IP address prefixes. For DDoS attack detection, consider the typical DDoS attack structure shown in Fig. 1. It resembles a huge funnel in which attack packets converge on the target, and each router acts as a smaller funnel that aggregates attack traffic from different ports onto the destination port. If the attack traffic is huge, detecting the attack at the destination port is not a problem. If the attack traffic is small, however, it is swamped by the large volume of normal traffic aggregated at the egress port and is hard to detect. To detect such minor attacks accurately and in time, we propose a new way of clustering traffic that aggregates the traffic between two ports within a router. This aggregated traffic is called an IF flow. It is well suited to DDoS attack detection because it raises the ratio of attack traffic to normal traffic compared with input links and output links.
Another challenge in identifying DDoS attacks is detection efficiency, given high-bandwidth networks and limited computational resources. The second major contribution of this paper is a scheme that detects and evaluates DDoS attacks in a router based on IF flows, which detects more efficiently than two existing statistical methods. To detect attacks quickly and accurately, the scheme includes two adaptive filtering-based detection methods, which can work independently in different environments according to actual needs.
Fig. 1. DDoS attack tree
In the next section, we briefly summarize existing research on attack detection techniques, especially DoS/DDoS attack detection in network traffic. In Section 3, we define three types of traffic and discuss the merits of IF flows in detail. In Section 4, we propose a DDoS attack detection scheme, including an evaluation formula that assesses how seriously a router port is under DDoS attack. The RLS-based and Kalman-based methods for detecting DDoS attacks are presented in Section 5 and Section 6, respectively. We demonstrate the effectiveness of our approach with an empirical evaluation and give some meaningful comparison results in Section 7. Section 8 describes an extension of the proposed scheme.
2. Related Work
Existing methods for DoS/DDoS attack detection can be roughly categorized into three types according to where they are deployed: 1) on a link or a server, 2) on a router, and 3) on a large-scale network.
Deployed on a Link or a Server: [11] proposed a scheme called MULTOPS that detects DDoS attacks by monitoring the packet rate on both the up and down links. This scheme assumes that packet rates between two hosts or subnets are proportional during normal operation; if the packet rates become significantly disproportional, a DDoS attack is strongly suspected. [12] proposed SYN detection, which detects SYN flooding by monitoring statistical changes in the ratio of SYN packets to FIN and RST packets: when this random sequence becomes statistically non-homogeneous and the ratio changes remarkably, an attack is assumed to be detected. [13] used a Kolmogorov Complexity-based algorithm to detect DDoS attacks with a high accuracy rate. This algorithm works well under the assumption that DDoS attacks always change the distribution of traffic features.
Deployed on a Router: In [14], the authors proposed keeping a history of all the legitimate IP addresses that have previously appeared at a router. When a new IP packet arrives, the history is used to decide whether a DDoS attack has happened. Maintaining and indexing the huge IP address database is a major problem.
[15] analyzed the traffic patterns in a router and adopted a nonparametric Cumulative Sum (CUSUM) algorithm to process the traffic at each input/output port; building on this router-level method, a hierarchical alarm system against DDoS attacks was introduced. [16] targeted changes in the input and output traffic of the ports of a core router, and employed an improved CUSUM algorithm that tracks traffic statistics in real time to detect DDoS attacks. Both methods [15][16] work well under the assumption that the ratio of input traffic to output traffic is nearly constant. However, some DDoS attacks, such as reflector attacks with attackers evenly distributed in the network, are hard to detect this way. [6] used subnet prefixes of different levels to aggregate the traffic going to and from a router port; when the traffic volume and the number of subnets are large, computation becomes a problem for real-time detection. [17] proposed Rényi cross entropy to detect DDoS attacks in a router. This method can only identify whether an attack is happening; it cannot tell exactly which ports are under attack.
Deployed on a Large-scale Network: How to detect DDoS attacks in large-scale networks has become an active research area in recent years [15][18][19]. All these papers adopt distributed detection techniques: detection software is deployed and runs in each router, each router sends its intermediate detection results to a control center, and finally a general result is computed by data fusion. On the other hand, network-wide anomaly detection based on the traffic matrix is a novel method for detecting volume anomalies [7][8][9]. Unlike a hierarchical structure, this kind of method can accomplish anomaly detection in large-scale networks in three steps. The first step is to build
the traffic matrix relationship between origin-destination (OD) flows and input links. The second step is to estimate the traffic matrix by Principal Component Analysis (PCA) or a Kalman filter. The third step is to build a normal profile using a statistical process; any deviation from the normal profile caused by potential attackers is regarded as a strong indication of an attack. In practice, obtaining OD flows in real time is not easy, which makes these methods hard to deploy in a real network environment.
From the viewpoint of detection techniques, the methods mentioned above can generally be classified into two groups. The first group, DDoS-attack-specific detection, is based on the special features of DoS/DDoS attacks, such as [11][12][13][14]. The second group, anomaly detection, models the behavior of normal traffic and then reports any anomalies, such as [7][6][7][8][9][5][6]. Anomaly detection has become a major focus of research because of its ability to detect new attacks, including DDoS attacks. However, anomaly detection techniques face a dilemma in choosing a tradeoff between processing speed and detection accuracy. To resolve this dilemma, we propose two adaptive filtering based anomaly detection algorithms. Both have good accuracy and fast speed, and they are complementary in their applications.
The work most similar to ours is [9], but there are some remarkable differences. First, we use a Kalman filter and an RLS filter to detect and evaluate DDoS attacks in different application environments, whereas [9] mainly compares four different Kalman filter based statistical methods. Furthermore, we focus on detecting and evaluating DDoS attacks in a router, whereas [9] detects anomalies in OD flows. Besides, we compare the Kalman filter method with the RLS filter method, the CUSUM method and the PCA method in many aspects, such as detection rate, processing speed and detection lag time. The sensitivity of three types of traffic (IF flow, input link, output link) to DDoS attacks is also compared.
3. Merits of Using IF Flows to Detect DDoS Attacks
To facilitate the discussion, three types of flow traffic are defined below.
IF flow: IF stands for internal flow in a router. An IF flow is defined as the group of packets traveling from one port to another, different port of a router per unit time. Packets traveling from a port back to the same port are assumed to be very sparse, so they are not considered.
Input link: The group of packets entering a router from one port per unit time.
Output link: The group of packets leaving a router from one port per unit time.
Port identifiers are used to mark IF flows, input links and output links, as in Fig. 2. For example, the group of packets traveling from port A to another port X per unit time is marked as IF flow A-X. As Fig. 2 shows, the traffic actually observed on input links (or output links) arises from the superposition of IF flows within the router. The relationship between input links and IF flows can be captured concisely in the routing matrix H, which has size (number of input links) × (number of IF flows), with $H_{ij}=1$ if IF flow j traverses input link i, and zero otherwise. The vector of traffic counts on input links, Y, is then related to the vector of traffic counts on IF flows, X, by $Y=HX$ [20]. This makes it possible to build a traffic state-space model later. Note that the relationship between output links and IF flows can likewise be expressed by another routing matrix.
Much of the work in anomaly detection has focused on single-link traffic data. A router-wide view of traffic enables detection of anomalies that may go unnoticed in individual link traffic. IF flows are also easier to collect than OD flows, because packet routing across multiple routers need not be considered.
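As a minimal sketch of the routing matrix H and the relation Y = HX, the snippet below enumerates the IF flows of a 5-port router like the one in Fig. 2, builds H for the input links (and, optionally, for the output links), and sums IF-flow counts into link counts. Because an IF flow always joins two distinct ports, an n-port router yields n(n−1) IF flows. The port labels A to E and the uniform count of 1/4 per IF flow are illustrative assumptions, not measured data, and the helper names are hypothetical.

```python
from itertools import permutations

def if_flows(ports):
    """Every ordered pair of distinct ports is one IF flow (src, dst)."""
    return list(permutations(ports, 2))

def routing_matrix(ports, flows, side="input"):
    """H[i][j] = 1 if IF flow j uses link i, else 0.

    side="input" matches flows by source port; side="output" by destination port.
    """
    k = 0 if side == "input" else 1
    return [[1 if flow[k] == p else 0 for flow in flows] for p in ports]

def link_counts(H, x):
    """Y = H X: a link count is the sum of the IF-flow counts that use the link."""
    return [sum(h * v for h, v in zip(row, x)) for row in H]

ports = ["A", "B", "C", "D", "E"]        # hypothetical labels for a 5-port router
flows = if_flows(ports)                  # n(n-1) = 20 IF flows
H_in = routing_matrix(ports, flows)      # 5 x 20 input routing matrix
x = [0.25] * len(flows)                  # assume 1/4 traffic unit per IF flow
print(len(flows), link_counts(H_in, x))  # prints: 20 [1.0, 1.0, 1.0, 1.0, 1.0]
```

Each column of H has exactly one 1, because every IF flow enters the router on exactly one input link; the same holds for the output-side matrix.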
Assume a router has n ports; by definition it can then produce n(n−1) IF flows. For example, the 5-port router shown in Fig. 2 can produce 20 IF flows. To simplify the discussion, a few assumptions are made according to the characteristics of the DDoS attack path. First, among the IF flows shown in Fig. 2, two are anomalous: the IF flows from port A and from port B to a common destination port. Second, the traffic count on each input link or output link is 1, and the traffic count on each IF flow is 1/4 on average. Third, the anomaly traffic count is 1/10 on input links A and B. From these assumptions, anomaly traffic accounts for 10% of all traffic on input links A and B; for 20% on the destination output link, which receives 1/10 from each of the two anomalous IF flows; and for (1/10)/(1/4) = 40% on each of the two anomalous IF flows. Therefore, using IF flows to detect anomalies is more effective than using input links and output links.
Many attackers try to distribute their DDoS attack traffic evenly across a large-scale network to hide attack behavior and avoid being spotted early. Schemes deployed on a single link, such as many anomaly-based IDS systems, find it hard to spot such attacks in time. In contrast, an IF flow based method can find this kind of attack early, because IF flows increase the ratio of attack traffic to normal traffic. In addition, IF flows expose the ports from which the attack traffic comes and to which it is aggregated, and this port information is very useful for defending against the attacks with proper measures.
Fig. 2. DDoS attack emulation
4. DDoS Attacks Detection Scheme
Based on the above analysis and practical application needs, we propose a DDoS attack detection and evaluation scheme based on RLS prediction detection and Kalman estimation detection. The scheme is shown in Fig. 3.
Traffic Collection: At present, no tool is available to obtain IF flows directly. The Simple Network Management Protocol (SNMP) can only collect per-port ingress and egress traffic statistics in a router. Although IF flows could be obtained by analyzing packet routing, this requires knowing the routing table and monitoring all packets in the router. That approach consumes a significant amount of time and resources, and it must cope with routing table updates in real time.
Fortunately, NetFlow records created by the NetFlow cache [21] in a router can be used to achieve this goal. NetFlow, proposed by Cisco, is based on the flow concept: a flow is a unidirectional stream of packets sharing five tuples, namely source IP, destination IP, source port, destination port and layer 3 protocol type. After a router is configured to enable the NetFlow cache, NetFlow entries are created at every time bin and encapsulated in UDP
packets directed to a traffic statistical analysis server. However, the Cisco NetFlow performance analysis white paper [22] indicates that enabling the NetFlow function increases CPU cost: depending on the number of NetFlow records in the router cache, the NetFlow function uses 7%~23% of the CPU on average. To make the scheme more flexible and robust, we propose a Kalman-based method that reduces how long the NetFlow function must stay enabled in a router.
Fig. 3. DDoS attacks detection scheme
Statistical Analysis: A traffic statistical analysis server receives UDP packets at a certain UDP port, unpacks them to extract NetFlow records, and stores the records in a database. Because every NetFlow record includes input and output (interface) fields, simple SQL statements can calculate the byte count, packet count and flow count of input links, output links and IF flows per unit time.
Historical Data: All historical data, including previous NetFlow records and statistical analysis results, are stored in a database for real-time detection. For example, historical IF flows are needed to calculate the threshold for judging anomalies before the RLS-based detection method is applied in real time, and the history of IF flows and input links (or output links) is needed to estimate the iteration parameters before the Kalman-based detection method is applied in real time.
RLS-based Anomaly Detection: Normal historical IF flows are needed to calculate the detection threshold beforehand. When a new IF flow statistical value arrives, this module calculates in real time the prediction error between the new value and its RLS prediction computed at the previous time bin. Then the ratio of the prediction error variance in the detection window to the prediction error variance in the history window is used as the statistic for detecting anomalies. This method has higher detection accuracy but relatively lower processing speed. Detailed procedures are presented in Section 5.
Kalman-based Anomaly Detection: Before the Kalman filter is applied to estimate the IF flow matrix, historical IF flows and input links (or output links) are needed to estimate the iteration parameters in the discrete Kalman equations.
When a new input link (or output link) statistical value arrives, this module estimates the IF flow values and calculates the prediction residual in time. A GLR statistical test is then used to detect anomalies in real time. This method processes data faster, but its detection accuracy is relatively lower. Detailed procedures are presented in Section 6.
DDoS Attacks Evaluation: DDoS attack evaluation judges how seriously each
router port is under DDoS attack. The evaluation mechanism makes full use of the funnel structure of DDoS attacks. Two factors that affect the evaluation index are considered. Factor one is how many IF flows going to a port are detected as anomalous at one time: the more anomalous IF flows go to a port, the more seriously the port is under DDoS attack. Factor two is how large the anomalous IF flows' traffic volume is: the larger the ratio of anomalous IF flow traffic volume to the port's outgoing traffic volume, the more seriously the port is under DDoS attack. The evaluation equation is given below.
$$E_i(t)=\begin{cases}\alpha\sum_{j}F_{ji}(t)+\beta\sum_{j}W_{ji}(t)F_{ji}(t), & n\le 5 \text{ and } \sum_{j}F_{ji}(t)\neq 0\\[4pt]\dfrac{\alpha}{1+\exp\big(n/2-\sum_{j}F_{ji}(t)\big)}+\beta\sum_{j}W_{ji}(t)F_{ji}(t), & n>5 \text{ and } \sum_{j}F_{ji}(t)\neq 0\\[4pt]0, & \sum_{j}F_{ji}(t)=0\end{cases}\qquad(1)$$
where n is the number of active router ports with traversing packets, i and j denote port numbers, and $E_i(t)$, the evaluation index, denotes the intensity of DDoS attacks against port i at time t. α and β denote the weights of factor one and factor two in the evaluation index, respectively. For example, if α > 0.5, the anomalous IF flow count is considered more important than the volume of anomaly traffic when computing the evaluation index. When j ≠ i, $F_{ji}(t)$ is the anomaly value of IF flow j-i at time t, where $F_{ji}(t)=1$ if IF flow j-i is abnormal and zero otherwise, and $W_{ji}(t)$ is the ratio of the traffic count of IF flow j-i to the outgoing traffic count of port #i (namely the traffic count of output link #i) at time t. When j = i, $F_{ji}(t)=0$ and $W_{ji}(t)=0$ by the definition of IF flows. The same count of abnormal IF flows has a bigger impact on a router with a small number of ports than on one with a large number of ports. So, in practice, if a router has no more than five ports, we use a linear function with a big slope to denote the contribution of the count of abnormal IF flows; otherwise, a sigmoid function is used. The sigmoid function is a good threshold function: its front and back parts have small slopes, while its middle part has a steep slope.
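As an illustration, the evaluation index for one port can be sketched in Python. It follows the two factors described above: a linear term in the count of abnormal IF flows when the router has at most five ports, a sigmoid term otherwise, plus β times the share of abnormal IF-flow traffic in the port's output link. The sigmoid midpoint n/2, the unit slope of the linear branch, the zero index when no incoming IF flow is abnormal, and the weights α = β = 0.5 are assumptions made for this sketch rather than values confirmed by the text; the function name is hypothetical.

```python
import math

def evaluation_index(F, W, i, alpha=0.5, beta=0.5):
    """Evaluation index E_i(t) of port i for one time bin (sketch).

    F[j][i]: 1 if IF flow j->i is flagged abnormal, else 0 (F[i][i] is 0).
    W[j][i]: share of IF flow j->i in the traffic of output link i.
    """
    n = len(F)                                        # active ports
    count = sum(F[j][i] for j in range(n) if j != i)  # factor one
    if count == 0:
        return 0.0                                    # assumed: healthy port
    share = sum(W[j][i] * F[j][i] for j in range(n) if j != i)  # factor two
    if n <= 5:
        factor_one = count                            # assumed unit slope
    else:
        factor_one = 1.0 / (1.0 + math.exp(n / 2 - count))  # assumed midpoint n/2
    return alpha * factor_one + beta * share

# Usage: 5-port router, IF flows 0->2 and 1->2 are abnormal and each
# carries 1/4 of output link 2's traffic (hypothetical numbers).
F = [[0] * 5 for _ in range(5)]
W = [[0.0] * 5 for _ in range(5)]
for j in (0, 1):
    F[j][2], W[j][2] = 1, 0.25
print(evaluation_index(F, W, i=2))  # prints 1.25
```

With more than five ports the sigmoid bounds factor one by α, so adding abnormal IF flows beyond the midpoint raises the index only slowly, which matches the saturation behavior the text asks for.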
The sigmoid function captures the situation well: a small count of abnormal IF flows affects the port only slightly; once the count grows to a certain extent, the effect on the port increases very fast; but when the count increases further, the impact of each newly added abnormal IF flow is relatively limited.
5. Detection Algorithm Model
5.1 RLS-based Anomaly Detection
The RLS-based detection algorithm model is shown in Fig. 4. In the following experiments it is applied to three types of traffic: IF flows, input links and output links. We take IF flows as an example.
1. IF flow statistic module. This module is the same as the statistical analysis module shown in Fig. 3. It sums up the byte count, packet count and flow count of IF flows as statistics.
2. RLS prediction module. According to different needs, IF flow statistics are selected and then predicted by RLS at every time bin. Detailed prediction steps are given in Section 5.2.
3. Prediction error module. When a new IF flow statistical value arrives, the prediction error is calculated as the difference between the newly arrived value and its prediction computed at the previous time bin.
4. Variance ratio statistic detection module. It processes the prediction errors in the detection window and the history window. Detailed detection steps are given in Section 5.3.
5. Judge anomaly module. The alarm threshold is first set by variance analysis of historical traffic. If an arriving value goes beyond the alarm threshold, an anomaly is deemed to exist.
Fig. 4. Traffic anomaly detection model
5.2 Network Traffic Prediction Based on RLS
RLS is selected as the prediction algorithm instead of an Auto-Regressive (AR) model for the following reasons: 1) An AR model must update its weights periodically to deal with non-stationary stochastic time series, which increases the computation load, and the situation becomes worse when multiple statistics must be handled. 2) Before prediction, an AR model must be fitted to a section of the series to obtain its weights, and the length of the fitted series affects both prediction precision and calculation speed. 3) RLS adapts to non-stationary stochastic time series, and with a small order, such as 2, it is fast and needs little memory. The RLS algorithm is in essence a kind of Kalman filter [23] and exactly meets the least squares criterion. RLS is mainly used to filter signal noise, but it can also predict signals. The literature [24] points out that prediction precision decreases quickly as the prediction step increases; hence RLS is used here for one-step prediction. Suppose d(t), called the desired response, is the expected signal at time t. We attempt to predict the desired network traffic $d(t)=x(t)$ with an RLS filter. The filter coefficients at time t (namely the N-dimensional weight vector) are $W(t)=[w_0(t)\;w_1(t)\;\cdots\;w_{N-1}(t)]^T$, $t=1,2,\ldots,k$.
When the N-dimensional historical traffic vector at time t, $X_N(t)=[x(t-1)\;x(t-2)\;\cdots\;x(t-N)]^T$, is known, d(t) can be predicted a priori by $\hat d(t)=X_N^T(t)\,W(t-1)$. As the weight vector dimension N grows, more historical information is used and the prediction approximates the pure signal more closely, but a larger N also requires more computation. In the experiments N is set as 2. The initial weight vector can be set to any small values, such as 0, because the filter recursion updates the weight vector iteratively. The prediction algorithm is as follows:
Initialization: $W(0)=\mathbf{0}_N$, $P(0)=\delta^{-1}I$, where δ is a small positive constant and I is the N-by-N identity matrix.
For each unit time t = 1, 2, …:
1. Read the input values: $d(t)$, $X_N(t)$
2. Update the prediction error:
$$e(t)=d(t)-X_N^T(t)\,W(t-1)\qquad(2)$$
3. Update the information gain vector:
$$G(t)=\frac{\lambda^{-1}P(t-1)X_N(t)}{1+\lambda^{-1}X_N^T(t)P(t-1)X_N(t)}\qquad(3)$$
4. Update the weight vector:
$$W(t)=W(t-1)+G(t)\,e(t)\qquad(4)$$
5. Update the inverse of the correlation matrix:
$$P(t)=\lambda^{-1}P(t-1)-\lambda^{-1}G(t)X_N^T(t)P(t-1)\qquad(5)$$
The weight vector is thus corrected by the gain vector times the prediction error. Equations (3) and (5) update the gain vector and the inverse correlation matrix P recursively, so no matrix inversion is needed: the only division at each step is by the scalar in the denominator of (3). Here λ is the forgetting factor. The smaller λ is, the smaller the contribution of previous samples, which makes the filter more sensitive to recent samples and leads to more fluctuation in the filter coefficients. Generally λ is set as 0.99.
5.3 Variance Ratio Statistic Detection Algorithm
Historical traffic is obtained beforehand by monitoring the normal network. After this traffic is predicted, the variance of its prediction error is calculated; this variance captures the statistical characteristics of normal historical traffic. For real-time detection, the sliding window variance ratio detection algorithm shown in Fig. 5 is used to identify anomalies. The algorithm uses two sliding windows, a history window (HisWin) and a detection window (DetWin), both of which slide in real time.
Fig. 5. Sliding windows' variance ratio detection
First, the error variance $DetV_t$ of the data in the detection window (DetWin, t) and the error variance $HisV_t$ of the data in the history window (HisWin, t) at time t are computed. Then the ratio $ratio_t=DetV_t/HisV_t$ is computed to measure how far the data in the detection window depart from the data in the history window at time t. If an anomaly enters the detection window, the ratio increases remarkably, so changes in the ratio can be used to find the anomaly. The detection algorithm has three parameters:
1. Detection window size. Ideally, the detection window size equals the duration of the possible anomaly. It is impossible to know in advance how long the anomaly traffic to be detected will last, but the durations of all historical anomaly traffic are known.
Therefore, the average duration of all historical anomaly traffic is selected as the detection window size.
2. History window size. A larger history window leads to more accurate detection results, but too large a window increases the system's storage and computation cost. Here the history window size is set as …
3. Anomaly threshold. The anomaly threshold is $ratio_{threshold}=\bar{x}+m\sigma$, where $\bar{x}$ and $\sigma$
denote the average value and standard deviation of the ratio computed from normal historical traffic collected beforehand, and m is a small positive value of at most 4. When $ratio_t>ratio_{threshold}$, an anomaly is assumed to be detected at time t.
6. Kalman-based Anomaly Detection
The RLS-based method can only predict one IF flow at a time, whereas the Kalman-based method processes the traffic matrix as a whole and estimates all IF flows simultaneously. As a result, the Kalman-based method has an advantage in speed. Moreover, during real-time detection the Kalman-based method does not need to collect NetFlow records, so it has little effect on router performance.
6.1 Building the Traffic State-space Model
As Fig. 2 shows, the traffic actually observed on input links (and similarly on output links) arises from the superposition of IF flows within a router. To reflect the inaccuracy of data collection, equation (6) expresses the relationship between input links (or output links) and IF flows:
$$Y_t=HX_t+V_t\qquad(6)$$
where $Y_t$ denotes the input link traffic vector (observation vector), $X_t$ denotes the IF flow vector (hidden variable), and H denotes the internal routing matrix, with $H_{ij}=1$ if IF flow j traverses input link #i and zero otherwise. Because the traffic collecting device may introduce measurement errors, the stochastic variable $V_t$ is used to capture them. Although a prediction model can have any structure and a noise process any distribution, the combination of a linear stochastic prediction model and Gaussian noise has recently been applied successfully to many problems. Hence the IF flows are treated here as the network state, and the following linear equation is constructed as a prediction model relating $X_{t+1}$ and $X_t$:
$$X_{t+1}=CX_t+W_t\qquad(7)$$
where the state transition matrix C captures temporal and spatial correlations, and $W_t$ is a stochastic noise process that represents the randomness and unpredictability of IF flows. The diagonal elements of C capture the temporal correlations in each IF flow's evolution. The off-diagonal elements of C describe the dependency of one IF flow on another, thus capturing any spatial correlations among the IF flows (if and when they exist).
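To make equations (6) and (7) concrete, the sketch below simulates the model for a hypothetical 3-port router with 6 IF flows. It assumes a diagonal C (temporal correlation only, no spatial terms) and independent zero-mean Gaussian noises with user-chosen standard deviations; these choices are illustrative and are not the parameters estimated in this paper, and the function name is hypothetical. Note that H has only n rows but n(n−1) columns, so $X_t$ cannot be recovered from $Y_t$ by inverting H alone; this is why a filter that also exploits the temporal model (7) is needed.

```python
import random
from itertools import permutations

def simulate(H, c_diag, x0, steps, q_std=0.0, r_std=0.0, seed=0):
    """Simulate Y_t = H X_t + V_t (6) and X_{t+1} = C X_t + W_t (7).

    C is assumed diagonal (entries c_diag); W_t and V_t are independent
    zero-mean Gaussian noises with standard deviations q_std and r_std.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = list(x0)
    for _ in range(steps):
        # Observation (6): each input link is a noisy sum of its IF flows.
        y = [sum(h * v for h, v in zip(row, x)) + rng.gauss(0.0, r_std) for row in H]
        xs.append(x)
        ys.append(y)
        # State transition (7): each IF flow depends on its own previous value.
        x = [c * v + rng.gauss(0.0, q_std) for c, v in zip(c_diag, x)]
    return xs, ys

ports = [0, 1, 2]                                   # hypothetical 3-port router
flows = list(permutations(ports, 2))                # 6 IF flows (src, dst)
H = [[1 if src == p else 0 for src, _ in flows] for p in ports]
xs, ys = simulate(H, [0.5] * 6, [10, 20, 30, 40, 50, 60], steps=3)
print(ys[0], ys[1])
```

With both noise levels at zero, the run is deterministic: each input link simply sums its two IF flows, and every IF flow halves at each step.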
This model follows the form of a typical linear time-invariant dynamical system. Combining equations (6) and (7) gives a linear state-space dynamic system:
$$X_{t+1}=CX_t+W_t,\qquad Y_t=HX_t+V_t\qquad(8)$$
where the state noise $W_t$ and the measurement noise $V_t$ are uncorrelated, zero-mean white-noise processes with covariance matrices Q and R, respectively. If the dynamic system model
is known, the optimal estimate of the real network state can be obtained from a series of observations $Y_1,\ldots,Y_t$. The Kalman filter is a classical method for solving this problem.
6.2 Discrete Kalman Filtering Equations
The Kalman filtering equations [23] for a discrete time-varying system are listed below; they consist of the prediction step, equations (9), and the update step, equations (10). Some important variables used in the equations are explained in [23]. When the initial conditions $\hat X_{0|0}=E[X_0]$ and the error covariance matrix $P_{0|0}=E[(X_0-\hat X_{0|0})(X_0-\hat X_{0|0})^T]$ are known, the system state $\hat X_{t|t}$ can be estimated iteratively by equations (9) and (10).
$$\hat X_{t+1|t}=C_t\hat X_{t|t},\qquad P_{t+1|t}=C_tP_{t|t}C_t^T+Q_t\qquad(9)$$
$$\hat X_{t+1|t+1}=\hat X_{t+1|t}+K_{t+1}\big[Y_{t+1}-H_{t+1}\hat X_{t+1|t}\big],\qquad P_{t+1|t+1}=(I-K_{t+1}H_{t+1})P_{t+1|t},$$
$$K_{t+1}=P_{t+1|t}H_{t+1}^T\big[H_{t+1}P_{t+1|t}H_{t+1}^T+R_{t+1}\big]^{-1}\qquad(10)$$
6.3 Using the Expectation Maximization (EM) Algorithm to Estimate {C, Q, R}
Equations (9) and (10) describe the Kalman filter in general, for a non-stationary system. Detection results are rarely affected if {C, Q, R} is held constant and not recalibrated for even about one week. To improve detection speed and reduce data collection in practical applications, {H, C, Q, R} is therefore assumed to be constant, and the time subscripts in equations (9) and (10) are dropped in the following experiments. {H, C, Q, R} must be known before the Kalman filter is used to estimate the traffic matrix. The routing matrix H is a known constant, given by the traffic relationship between IF flows and input links (or output links) in the router. The remaining system parameters θ = {C, Q, R} must be calculated, and we use the EM algorithm to estimate them; EM computes the maximum likelihood estimate of θ iteratively. If the system state in equation (8) were observable, θ could be estimated by maximum likelihood from $Y=[Y_1\;Y_2\;\cdots\;Y_N]$ and $X=[X_1\;X_2\;\cdots\;X_N]$. The log-likelihood to maximize is
$$\ln L(\theta;Y,X)=-\frac{N}{2}\log|Q|-\frac12\sum_{k=1}^{N}(X_k-CX_{k-1})^TQ^{-1}(X_k-CX_{k-1})-\frac{N}{2}\log|R|-\frac12\sum_{k=1}^{N}(Y_k-HX_k)^TR^{-1}(Y_k-HX_k)+\text{CONSTANT}$$
Since X is not actually observed, EM replaces the terms involving X with their expectations given Y, which yields the following estimates of the system parameters [25][26].
$$\hat C=BA^{-1}\qquad(11)$$
$$\hat Q=\frac1N\big(D-BA^{-1}B^T\big)\qquad(12)$$
$$\hat R=\frac1N\sum_{t=1}^{N}\big[(Y_t-H\hat X_{t|N})(Y_t-H\hat X_{t|N})^T+HP_{t|N}H^T\big]\qquad(13)$$
In the above equations, A, B and D are defined as follows.
$$A=\sum_{t=1}^{N}\big[\hat X_{t-1|N}\hat X_{t-1|N}^T+P_{t-1|N}\big]\qquad(14)$$
$$B=\sum_{t=1}^{N}\big[\hat X_{t|N}\hat X_{t-1|N}^T+P_{t,t-1|N}\big]\qquad(15)$$
$$D=\sum_{t=1}^{N}\big[\hat X_{t|N}\hat X_{t|N}^T+P_{t|N}\big]\qquad(16)$$
A constant-interval Kalman smoother is used to compute the needed quantities {A, B, D}. This procedure consists of the standard Kalman filter forward recursion and a backward recursion [25][26][27]. The forward recursion is given by equations (9) and (10), except that the parameters {H, C, Q, R} do not depend on time. The backward recursion is:
For t = N, N−1, …, 1,
$$\hat X_{t-1|N}=\hat X_{t-1|t-1}+J_{t-1}\big(\hat X_{t|N}-\hat X_{t|t-1}\big),\quad\text{where } J_{t-1}=P_{t-1|t-1}C^TP_{t|t-1}^{-1}\qquad(17)$$
$$P_{t-1|N}=P_{t-1|t-1}+J_{t-1}\big(P_{t|N}-P_{t|t-1}\big)J_{t-1}^T\qquad(18)$$
For t = N, N−1, …, 2,
$$P_{t-1,t-2|N}=P_{t-1|t-1}J_{t-2}^T+J_{t-1}\big(P_{t,t-1|N}-CP_{t-1|t-1}\big)J_{t-2}^T,\quad\text{where } P_{N,N-1|N}=(I-K_NH)\,C\,P_{N-1|N-1}\qquad(19)$$
Note that values such as $P_{N|N}$, $P_{N|N-1}$ and $P_{N,N-1|N}$ in the backward recursion are initialized with the corresponding final values computed by the forward recursion. To sum up, Fig. 6 shows a flow chart of how the EM algorithm estimates the system parameters. C, Q and R can be initialized as identity matrices. The more EM iterations are run, the higher the estimation precision, but the time cost grows and convergence slows as the number of iterations increases. In practice, a suitable number of iterations, rather than a target precision, is used as the stopping condition.
Fig. 6. Flow chart of EM estimation of {C, Q, R}
6.4 Generalized Likelihood Ratio (GLR)-Based Statistical Test
The one-step prediction $\hat X_{t+1|t}$ and the estimate $\hat X_{t+1|t+1}$ can be obtained directly from the Kalman filter equations. The innovation $\iota_{t+1}$ and the residual $\epsilon_{t+1}$ of the IF flows are computed as follows:
$$\iota_{t+1}=X_{t+1}-\hat X_{t+1|t}\qquad(20)$$
$$\epsilon_{t+1}=\hat X_{t+1|t+1}-\hat X_{t+1|t}=K_{t+1}e_{t+1},\quad\text{where } e_{t+1}=Y_{t+1}-H\hat X_{t+1|t}\qquad(21)$$
ι can be used to detect anomalies in IF flow traffic, but computing it requires the true X, which is unknown. Instead, the residual ε is used as the statistic for detecting anomalies through the GLR test given in [28]. The GLR test performs well even when the mean and variance of the time series are unknown. In the next section, the Kalman-based method is compared with the RLS-based method.
7. Comparison and Analysis
7.1 Measurement Data Used
To validate an anomaly detection method, one common approach is to collect live data in the form of a packet-level or flow-level trace, but operators must then examine the data and mark anomaly events. Labeling a trace is hard, because operators can make mistakes by either missing an anomaly or generating a false positive. Moreover, such live data contain a limited number of anomaly events whose parameters cannot be varied. It is therefore necessary to create synthetic attacks as test samples, which has the advantage that the parameters of an attack can be carefully controlled. The method of creating synthetic anomalies described in detail in [9] is used to create DDoS attack traffic as experimental data. The concrete procedure is as follows.
1. Collect a week of NetFlow traffic from a five-port router at Xi'an Jiaotong University at one-minute intervals. IF flow traffic can be aggregated using the input and output fields of the NetFlow records. Packet counts of IF flows are used as the measurement metric in the following experiments.
2. Use the Daubechies-5 discrete wavelet transform to extract the long-term statistical trend from the selected IF flows. The goal is to capture the diurnal pattern by smoothing the original signal.
3. Add to the smoothed IF flow a zero-mean Gaussian noise whose variance is computed as follows: take the first 5 detail signals from the wavelet transform and compute the variance of their sum. This creates the background IF flow traffic.
4. Randomly select values of the four parameters in Table 1 to characterize DDoS attacks.
Add the anomaly on top of the background traffic.
5. Use the created IF flow traffic to infer the traffic of input links and output links according to the internal traffic matrix relationship within the router.
Although most DDoS attacks last between 5 and 30 minutes [29], some outliers last less than 1 minute or more than one day. Here the attack duration is selected between 1 and 30 minutes. In Table 1, the attack intensity is a multiplicative factor, a multiple of 0.1, that is multiplied by the IF flow baseline traffic to generate the attack traffic load; it therefore equals the ratio of DDoS attack traffic to normal IF flow traffic. For each intensity value, 4 DDoS attacks are generated, starting at different times, and each attack affects 2.5 IF flows on average. (Src,Dst) refers to a DDoS attack entering through Src ports and leaving through Dst ports; Dst=1 indicates that the DDoS attack traffic is aggregated to a single egress port. Ramp and Exponential
are shape functions, described in detail in [9].
Table 1. DDoS attack description parameters
Parameter | Possible values
Duration (min) | 1~30
Attack intensity | multiple of 0.1
Num (Src,Dst) | (1,1)~(4,1)
Shape | Ramp, Exponential
7.2 Validation of DDoS Attacks Detection and Evaluation Methods
Validation of the RLS-based method: Fig. 7 shows the results of the RLS-based method applied to synthetic IF flow traffic measured by packet count; the synthetic IF flow is created with the method described above. In the first sub-figure, the synthetic IF flow with a DDoS attack is drawn as a blue solid curve and its predicted traffic as a red dash-dot curve; the two vertical dotted lines mark the start and end times of the DDoS attack. The second sub-figure shows the results of the variance ratio statistical method applied directly to the original traffic. The third sub-figure shows the prediction error over time, which is normally distributed with zero mean. The final sub-figure shows the detection results of the variance ratio statistical method applied to the prediction error. In sub-figures 2 and 4, the red horizontal line is the threshold for m = 3, and the blue curve is the variance ratio. Sub-figures 2 and 4 show that the prediction error detects anomalies more powerfully than the original traffic: it makes fewer mistakes, detects DDoS attacks at their very start, and indicates their duration precisely.
Fig. 7. RLS-based method to detect synthetic IF flow
Validation of the Kalman-based method: Fig. 8 shows the results of the Kalman-based method applied to original IF flow traffic measured by flow count. The upper sub-figure shows the original IF flow (blue solid curve) and its estimate (red dash-dot curve); the estimate captures the trend of the original traffic's changes. The middle sub-figure shows the residual over time. The bottom sub-figure shows the results of the GLR test [28] used to determine anomalies, with the time window size set as 3. The red horizontal line and the black dashed horizontal line show the control limits for α values of 0.05 and 0.005, respectively.
During time bins 5 to 79 there is violent vibration in the IF flow, and some serious burr (spike) phenomena are also scattered through the traffic. In the middle of the IF flow time series, the traffic count is very small because it is night time. All these phenomena are properly reflected in the bottom sub-figure.

Fig. 8. Kalman-based method to detect original IF flow

Validation of DDoS attacks evaluation: Fig. 9 shows a DDoS attack against port #4 that happened around time bin 35; this attack was carefully selected and validated. Here the RLS-based method is used to detect anomalies, and the threshold is set as m=3. Fig. 10 shows the evaluation results of DDoS attacks against port #4. In the figure, port #4 is regarded as healthy (namely under no DDoS attack, with an attack index of zero) for the whole period, except for some sharper spikes caused by DDoS attacks or normal traffic vibration. Comparing with Fig. 9, it is easy to see that the highest index values appear at exactly the time when the DDoS attack happened. This shows that the method is effective in evaluating the severity of DDoS attacks against a router port.

Fig. 9. Output link #4 and input link #4
Fig. 10. Evaluation results of DDoS attacks against port #4

7.3 Results Comparison of Three Types of Traffic

Within each type of traffic, for each value of the threshold, the entire traffic matrix (thus traversing all anomalies and non-anomalies) is examined. One false positive percentage and
one false negative percentage are computed for each threshold configuration of a scheme. The performance of the method applied to the three types of flows is depicted with Receiver Operating Characteristic (ROC) curves. The ROC curve is the plot of the True Positive Ratio (TPR) against the False Positive Ratio (FPR). TPR is the fraction of DDoS attack traffic correctly classified as DDoS attacks. The False Negative Ratio (FNR) is the fraction of DDoS attack traffic wrongly classified as normal traffic. FPR is the fraction of normal traffic wrongly classified as DDoS attacks. An algorithm is considered better if its ROC curve climbs rapidly towards the upper left corner of the graph. In Fig. 11, each pair of FNR and FPR values results from detecting synthetic anomaly traffic with the same attack intensity. In Fig. 12, each pair of TPR and FPR values is a detection result at the same threshold.

Fig. 11. FNR and FPR as a function of the attack intensity
Fig. 12. ROC curves using synthetic data

The larger the anomalies, the easier the detection, and vice versa: FNR and FPR decrease as the anomaly intensity increases. Fig. 11 and Fig. 12 show the same conclusion. Incidentally, however, the FPR of the Kalman-based method in Fig. 11 increases with the attack intensity. This is because the IF flows estimated by the Kalman filter affect each other's estimates, so estimation errors propagate between IF flows. The observations below concern only the detection results of the RLS-based method.
1. In Fig. 11, as the attack size decreases, the FPR and FNR curves of the three types of flows converge. Once the attack intensity tails off to a certain degree, it is hard to detect anomalies with any of the three types of flows.
2. As the attack intensity increases, the FPR and FNR of IF flows are smaller than those of input links and output links, and the FPR and FNR of output links are smaller than those of input links.
3. Fig. 12 shows that anomaly detection in IF flows is more effective than in input links and output links. At the same FPR, the TPR of IF flows is about …% higher than that of output links and 35% higher than that of input links.
In summary, if IF flows can be obtained easily, detection with IF flows is a good option. If they are not accessible, output links are more suitable than input links for detection.

7.4 Results Comparison of RLS-based Method and Kalman-based Method

The two methods are compared in terms of detection rate, detection speed, applicability, and difficulty of realization. They complement each other and suit different application environments, which makes the scheme more powerful, flexible and robust.
1. Detection rate. Fig. 12 shows that, under the same false alarm rate, the detection rate of RLS-based detection in IF flows is a little higher than that of Kalman-based detection in IF flows. One reason is that the Kalman-based method estimates IF flows less accurately than the RLS-based method, which lowers its detection accuracy. Another reason is that the two detection methods use different statistical detection algorithms, which also leads to different detection results.
2. Detection speed. The Kalman-based method needs to calculate parameters recursively beforehand, and the RLS-based method also needs a lot of historical traffic and computation to calculate thresholds. Because this preparatory work has little effect on real-time detection, only the real-time detection speeds of the two methods are compared. A computer with a Pentium-IV 3.0 GHz CPU and 752M of memory is used to run the MATLAB programs of the two methods for … iterations each. Experiments show that the Kalman-based method needs 0.4 ms per detection on average, whereas the RLS-based method needs 0.45 ms, so the Kalman-based method is faster. In practice, both methods are fast enough to detect anomalies in real time.
3. Applicability. Before using the RLS-based method, previous IF flows must be collected in real time. This places a burden on the router and affects its normal operation.
However, the Kalman-based method only needs IF flows to estimate its iteration parameters beforehand. Once the iteration parameters are estimated, no IF flows are needed to recalibrate them for a long period of time (such as one week), and only input links or output links are needed as observation vectors. These two types of traffic can be obtained via SNMP by accessing the Management Information Base (MIB) in a router, without using the NetFlow cache. As a result, its overhead on the router is very small. The RLS-based method is therefore suitable for a lightly loaded router, whereas the Kalman-based method can be applied to a heavily loaded router.
4. Difficulty of realization. The Kalman-based method involves many matrix operations, especially in the formulas that calculate the iteration parameters, which makes it difficult to implement in practice. Nevertheless, it estimates traffic easily in real time, and the whole IF flow matrix can be estimated in one stroke. The RLS-based method predicts traffic simply and is easy to implement. However, it can only predict one statistic at a time, so the more statistics arrive at once, the more times the RLS-based method must run. Fortunately, IF flows are independent of each other and can be obtained simultaneously, which makes it easy for the RLS-based method to predict traffic in parallel. In addition, the variance ratio test and the GLR statistical test can be computed iteratively, which avoids redundant computation.

7.5 Comparison with CUSUM Detection Method
The CUSUM method has strong real-time detection power and is often used in all kinds of statistical process control. In [16], the authors made some improvements to the general non-parametric CUSUM and used the ratio of input port traffic to output port traffic to detect DDoS attacks in a router. However, they did not report any efficiency experiments, although the results they show look perfect. Before this method can be used for real-time detection, experiments are needed to determine some parameters. To get a general view of detection performance, some typical values of the threshold multiplier and the skewed ratio are selected. The threshold multiplier is selected uniformly between 2 and 8, and the skewed ratio is selected uniformly between … and 0.45. The input link and output link traffic aggregated in the previous experiment (namely input and output port traffic) are used as test data. The stable mean parameters of the five router ports, computed as in [16], are (0.2596, 0.965, 0.2749, 0.254, 0.367). The forgetting factor is set to …. The experiment results are the series of ROC curves shown in Fig. 13, where each ROC curve corresponds to one value of the skewed ratio.

Fig. 13. A series of ROC curves of the CUSUM method

All the ROC curves are almost linear with the same slope. This means that the detection results are consistent as a whole across different skewed ratios, as long as the threshold multiplier is set within a wide enough range. Comparing Fig. 13 with Fig. 12 shows that the CUSUM method has a lower detection rate than both the Kalman method and the RLS method, whichever of the three types of traffic they are applied to. At the same detection rate, the CUSUM method causes many more false alarms, which makes it unsuitable for practical application when attack traffic is very small. Besides small attack traffic, two main reasons lead to the poor performance of CUSUM. The first reason is that, although the CUSUM algorithm can detect an attack, its cumulative value remains large enough to exceed the threshold after the attack has ended. This frequently causes false alarms, especially when the attack lasts for a long time. The second reason is that the traffic sampling time here is one minute, whereas in [16] it is … ms. Normal traffic displays stronger stationarity under statistical multiplexing [30] at shorter sampling times.
With the longer sampling time, the traffic here is not very stationary. This makes it difficult to obtain a reasonable mean, which degrades CUSUM detection performance.

7.6 Comparison with PCA Detection Method

In [8], the authors used a PCA-based Q-statistic to detect anomalies in input link traffic, and their results showed that the PCA-based Q-statistic has strong detection performance. First, the method uses PCA to separate the principal components of offline traffic matrix traces. Then the squared prediction error (SPE) between the original vector and the predicted vector is computed. Finally, the threshold $\delta_\alpha^2$ for SPE at the $1-\alpha$ confidence level is set. When $\mathrm{SPE} > \delta_\alpha^2$, the network traffic is considered abnormal.
The test data used in this experiment are created with the same synthetic traffic generation method described before. The difference is that the attack intensity takes a much wider range of values, from … to 2. For each attack intensity, DDoS attacks are generated randomly. The RLS, Kalman and PCA methods are all applied to IF flow traffic, and their comparison results are shown in Fig. 14 and Fig. 15.

Fig. 14. FNR and FPR as a function of the attack intensity
Fig. 15. ROC curves of three methods

Fig. 14 and Fig. 15 show that the RLS and Kalman methods give results generally consistent with the previous experiments. The following paragraphs compare and analyse the performance of these two methods and the PCA method in detail. In Fig. 14, the left sub-figure shows that the false negative ratio of PCA is much higher than that of RLS and Kalman. As the attack intensity increases, the false negative ratio of PCA decreases gradually, yet the gap between PCA and the other two methods widens. The right sub-figure shows that the false positive ratio of PCA is lower than that of the other methods and changes little when the attack intensity changes. This explains why the PCA method controls false alarms well but at the same time raises the false negative rate, which lowers its detection rate. Fig. 15 shows that when the FPR is between … and 4%, the detection rate of Kalman is on average 8% higher than that of PCA, and when the FPR exceeds 4%, the detection rate of PCA exceeds that of Kalman. When the FPR is between … and 9%, the detection rate of RLS is on average 3% higher than that of PCA, and when the FPR exceeds 9%, the detection rate of PCA exceeds that of RLS. Finally, PCA detects all anomalies when the FPR is about 2%, whereas RLS and Kalman detect all anomalies when the FPR is about 26%. However, when the FPR is very low (about … to 5%), the RLS and Kalman methods have a much higher detection rate than PCA; when the FPR is very high (about
exceeding 9%), the detection rates of the three methods differ little. Therefore the RLS-based method and the Kalman-based method are considered better than the PCA method, and the detection lag time experiment below supports this conclusion.

Detection lag time is a key indicator of whether a detection method is good. It is the time at which the detection method detects a true anomaly minus the time at which the anomaly began. A good detection method should have a short detection lag time, because the shorter the lag, the earlier the anomaly is detected. The cumulative probability distributions of the lag time for the three methods are shown in Fig. 16. The PCA method detects about 3% of all detected anomalies without delay, while the RLS and Kalman methods detect about 6%. In other words, the RLS and Kalman methods detect anomalies more easily and much earlier than the PCA method. Even so, the fraction detected with no lag is not very high for any of the three methods. This is related to the trend functions selected when creating the synthetic attack traffic. A step function raises attack traffic to its maximum value at the very beginning, but the ramp and exponential functions used here raise it smoothly. Smoothly rising attack traffic makes it harder for a detection method to find the anomaly with no lag.

Fig. 16. Cumulative distribution function of detection lag time

8. Discussion

The scheme can be generalized to multiple core routers in a large-scale network, where a distributed approach and cooperation between routers should be considered. One possible method to detect DDoS attacks in a large-scale network is as follows. To explain the method clearly, the 3-router network shown in Fig. 17 is used as an example. In Fig. 17, a number denotes a router number and a letter denotes a router port.

Step 1: Detect and evaluate DDoS attacks in each router to obtain the evaluation indexes of DDoS attacks against ports and the anomalous IF flows. The DDoS attack detection and evaluation results for each router in Fig. 17 are shown in Table 2. In each figure listed in Table 2, a red arrow denotes an anomalous IF flow, a number denotes the evaluation index of DDoS attacks against a port, and a letter denotes a router port.

Step 2: Set a threshold on the evaluation index to obtain the attack-aggregation ports and attack-entering ports.
For example, suppose the threshold is set to 0.6. A port whose evaluation index is larger than 0.6, called an attack-aggregation port, is assumed to be under attack. A port through which an anomalous IF flow enters the router before leaving via the attack-aggregation port is called an attack-entering port. Table 2 shows that the attack-aggregation port of router 1 is …, and its attack-entering ports are A and B. The attack-aggregation port of router 2 is …, and its attack-entering ports are A and …. The attack-aggregation port of router 3 is …, and its attack-entering ports are A and ….
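For a single router, Step 2 reduces to two set computations, sketched below. The data layout is an assumption, not taken from the paper: a dictionary of per-port evaluation indexes, and a list of anomalous IF flows given as (ingress port, egress port) pairs. The function name `classify_ports` is hypothetical, the default threshold of 0.6 matches the example above, and the router data are invented for illustration.

```python
def classify_ports(index_by_port, anomalous_flows, threshold=0.6):
    """Step 2 sketch for one router (hypothetical data layout).

    index_by_port: {port: evaluation index of DDoS attacks against that port}
    anomalous_flows: iterable of (ingress_port, egress_port) anomalous IF flows
    Returns (attack-aggregation ports, attack-entering ports).
    """
    # Ports whose evaluation index exceeds the threshold are assumed to be attacked.
    aggregation = {p for p, idx in index_by_port.items() if idx > threshold}
    # Ingress ports of anomalous IF flows that leave through an aggregation port.
    entering = {src for src, dst in anomalous_flows if dst in aggregation}
    return aggregation, entering


# Illustrative router: index values and flows are made up for this example.
agg, ent = classify_ports(
    {"A": 0.2, "B": 0.2, "C": 0.8, "D": 0.0},
    [("A", "C"), ("B", "C"), ("D", "A")],
)
print(sorted(agg), sorted(ent))  # ['C'] ['A', 'B']
```

The flow ("D", "A") is ignored because port A is not an aggregation port, so only flows that converge on the attacked port mark their ingress ports as attack-entering.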
Step 3: Merge the attack-aggregation ports and attack-entering ports of all routers to construct a directed attack path topology. Attack traffic on the link between two routers is identified as follows: if router A's connected port is an attack-aggregation port and router B's connected port is an attack-entering port, the attack exists on the link, and its direction is from router A to router B. Applying this merging method to the network in Fig. 17 yields the directed attack path topology shown in Fig. 18.

Step 4: Evaluate the directed attack path topology to identify DDoS attack behavior. A DDoS attack path has some general features, such as a one-way tree structure and a sole victim. If the attack path topology does not match these features well, for example if it has poorer connectivity and a less obvious hierarchy, a DDoS attack is less likely to exist in the network. By this evaluation criterion, Fig. 18 clearly indicates that a DDoS attack exists in the network shown in Fig. 17.

Fig. 17. 3-router network (legend: attackers, attacked port, router, attacking flows)
Fig. 18. The directed attack path topology (legend: attackers, attacked port, router, attacking path)
Table 2. DDoS attacks detection and evaluation results on each router (columns: router number, Port A, Port B, Port C, Port D)

Because flash crowds are difficult to distinguish from DDoS attacks, the scheme may have a high false positive rate for flash crowd traffic. Some measures can be considered to solve this problem. For example, a heuristic based on the findings of [31] can be adopted. That work showed that DDoS requests came from clients widely distributed across the Internet (perhaps because DDoS attack traffic is more likely to be spoofed). In that light, traffic that emerges from topologically clustered hosts and is directed to well-known destination ports (such as port 53 (DNS) or 80 (HTTP)) is classified as a flash crowd. Of course, this heuristic may not always hold, so in reality some flash crowds may still be detected as DDoS attacks.
On the other hand, DDoS attacks are characterized by a concentration in destination addresses. Flash crowds, however, come from a dispersed set of source ports and go to a concentrated set of destination addresses [32]. Entropy can therefore be introduced into the scheme to distinguish DDoS attacks from flash crowds.
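The entropy idea above can be illustrated with the sample entropy of a traffic feature (for example destination addresses or source ports) within one time bin, in the style of [32]. This is only a sketch under assumptions. The paper does not specify which features, bin sizes or thresholds would be added to the scheme, and the sample values and the function name `sample_entropy` are hypothetical. An entropy near zero means the feature is concentrated on a few values, while a high entropy means it is dispersed.

```python
import math
from collections import Counter


def sample_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # sum of p * log2(1/p) over the distinct values observed in this bin
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical per-bin samples of one feature (e.g. destination addresses).
concentrated = ["10.0.0.5"] * 8                  # all traffic to one destination
dispersed = [f"10.0.0.{i}" for i in range(8)]    # eight equally frequent values
print(sample_entropy(concentrated), sample_entropy(dispersed))  # 0.0 3.0
```

In such a scheme, a drop in destination-address entropy would signal concentration on a victim. A simultaneous rise in source-port entropy would then be one possible hint that the traffic resembles the flash-crowd pattern described above.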
9. Conclusions

Based on the tree characteristic of DDoS attacks, the features of IF flows and the properties of adaptive filters, this paper proposes a new scheme to detect DDoS attacks within a router. The work makes the following key contributions.
1) A new type of traffic, IF flows, is introduced to detect DDoS attacks, and IF flows are shown to expose anomalies more powerfully than input links and output links.
2) A detection and evaluation scheme against DDoS attacks is proposed. Compared with previous work, it achieves higher detection efficiency and flexibility in terms of detection rate, false alarm rate, detection lag time, etc.

References
[1] V. D. Gligor, "A note on denial-of-service in operating systems," IEEE Trans. Softw. Eng., vol. 10, no. 3, pp. 320-324, 1984.
[2] Computer Crime Research Center, "2004 CSI/FBI Computer Crime and Security Survey," http://www.crime-research.org/news/…/423/
[3] P. Barford, J. Kline, D. Plonka, and A. Ron, "A Signal Analysis of Network Traffic Anomalies," in Proc. of Internet Measurement Workshop, 2002.
[4] S. Kim, A. Reddy, and M. Vannucci, "Detecting Traffic Anomalies at the Source through Aggregate Analysis of Packet Header Data," in Proc. of Networking, 2004.
[5] Tao Qin, Xiaohong Guan, Wei Li and Pinghui Wang, "Dynamic Features Measurement and Analysis for Large-Scale Networks," in Proc. of ICC 2008, SIM workshop, pp. …, 2008.
[6] T. M. Gil and M. Poletto, "MULTOPS: a data-structure for bandwidth attack detection," in Proc. of the 10th USENIX Security Symposium, 2001.
[7] Haakon Ringberg, Augustin Soule, Jennifer Rexford, Christophe Diot, "Sensitivity of PCA for Traffic Anomaly Detection," in Proc. of SIGMETRICS'07, USA, pp. 109-120, June 2007.
[8] Anukool Lakhina, Mark Crovella, Christophe Diot, "Diagnosing Network-wide Traffic Anomalies," in Proc. of SIGCOMM'04, Portland, Oregon, USA, pp. 219-230, 2004.
[9] Augustin Soule, Kave Salamatian, Nina Taft, "Combining Filtering and Statistical Methods for Anomaly Detection," in Proc. of Internet Measurement Conference, pp. 331-344, 2005.
[10] A. Medina, C. Fraleigh, N. Taft, S. Bhattacharyya, C. Diot, "A Taxonomy of IP Traffic Matrices," in Proc. of Scalability and Traffic Control in IP Networks II, Boston, USA, pp. …, 2003.
[11] T. M. Gil and M. Poletto, "MULTOPS: A data-structure for bandwidth attack detection," in Proc. of the 10th USENIX Security Symposium, 2001.
[12] H. Wang, D. Zhang and K. G. Shin, "Detecting SYN flooding attacks," in Proc. of IEEE INFOCOM, pp. 1530-1539, 2002.
[13] Amit Kulkarni and Stephen Bush, "Detecting distributed denial-of-service attacks using Kolmogorov complexity metrics," Journal of Network and Systems Management, vol. 14, no. 1, pp. 69-80, Mar. 2006.
[14] Peng Tao, C. Leckie and K. Ramamohanarao, "Protection from distributed denial of service attacks using history-based IP filtering," in Proc. of ICC'03, pp. 482-486, 2003.
[15] Yu Chen, Kai Hwang, Wei-Shinn Ku, "Collaborative Detection of DDoS Attacks over Multiple Network Domains," IEEE Trans. on Parallel and Distributed Systems, vol. 18, no. 12, pp. 1649-1662, Dec. 2007.
[16] Sun Zhi-Xin, Tang Yi-Wei, Cheng Yuan, "Router Anomaly Traffic Detection Based on Modified-CUSUM Algorithms," Journal of Software, vol. 16, no. 12, pp. 2117-2123, 2005.
[17] Ruoyu Yan and Qinghua Zheng, "Using Renyi Cross Entropy to Analyze Traffic Matrix and Detect DDoS Attacks," Information Technology Journal, vol. 8, no. 8, pp. 1180-1188, 2009.
[18] Krishan Kumar, R. C. Joshi, Kuldip Singh, "A Distributed Approach using Entropy to Detect DDoS Attacks in ISP Domain," in Proc. of International Conference on Signal Processing, Communications and Networking, pp. 331-337, 2007.
[19] David K. Y. Yau, John C. S. Lui, Feng Liang, and Yeung Yam, "Defending Against Distributed Denial-of-Service Attacks With Max-Min Fair Server-Centric Router Throttles," IEEE/ACM Transactions on Networking, vol. 13, no. 1, pp. 29-42, Feb. 2005.
[20] Anukool Lakhina, Konstantina Papagiannaki, Mark Crovella, Christophe Diot, Eric D. Kolaczyk, and Nina Taft, "Structural Analysis of Network Traffic Flows," in Proc. of SIGMETRICS/Performance, New York, USA, pp. 61-72, 2004.
[21] Cisco IOS NetFlow White Papers, http://www.cisco.com/en/us/products/ps6601/prod_white_papers_list.html.
[22] Cisco NetFlow Performance Analysis White Paper, http://www.cisco.com/en/us/technologies/tk543/tk812/technologies_white_paper…_ps6601_products_white_paper.html, 2007.
[23] Simon Haykin, Adaptive Filter Theory, Beijing: Publishing House of Electronics Industry, 2002.
[24] V. Paxson, "Bro: A System for Detecting Network Intruders in Real-Time," Computer Networks, vol. 31, no. 23-24, pp. 2435-2463, 1999.
[25] Brett Ninness, Stuart Gibson, "The EM Algorithm for Multivariable Dynamic System Estimation," Technical Report EE2…, ….
[26] R. H. Shumway, D. S. Stoffer, "Dynamic Linear Models with Switching," Journal of the American Statistical Association, vol. 86, no. 415, pp. 763-769, 1991.
[27] V. Digalakis, J. Rohlicek, M. Ostendorf, "ML Estimation of a Stochastic Linear System with the EM Algorithm and Its Application to Speech Recognition," IEEE Trans. on Speech and Audio Processing, vol. 1, no. 4, pp. 431-442, 1993.
[28] Douglas M. Hawkins, Peihua Qiu, Chang Wook Kang, "The changepoint model for statistical process control," Journal of Quality Technology, vol. 35, no. 4, pp. 355-366, 2003.
[29] D. Moore, G. M. Voelker, S. Savage, "Inferring Internet Denial-of-Service Activity," in Proc. of the 10th USENIX Security Symposium, pp. 9-22, 2001.
[30] Hao Jiang, Constantinos Dovrolis, "Why Is the Internet Traffic Bursty in Short Time Scales?" in Proc. of ACM SIGMETRICS'05, pp. 241-252, June 2005.
[31] J. Jung, B. Krishnamurthy and M. Rabinovich, "Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites," in Proc. of World Wide Web Conference, Hawaii, USA, 2002.
[32] Anukool Lakhina, Mark Crovella, Christophe Diot, "Mining Anomalies Using Traffic Feature Distributions," in Proc. of SIGCOMM'05, Philadelphia, Pennsylvania, USA, pp. 217-228, 2005.

Ruoyu Yan received an M.S. degree in computer science from Beijing Jiaotong University, China, in 2004. He is currently a Ph.D. candidate in computer science at Xi'an Jiaotong University, China. His research interests focus on network security.
Qinghua Zheng received his B.S. and M.S. degrees in computer science and technology from Xi'an Jiaotong University, China, in 1990 and 1993, respectively, and his Ph.D. degree in systems engineering from the same university in 1997. He was a postdoctoral researcher at Harvard University in 2002. Since 1995 he has been with the Department of Computer Science and Engineering at Xi'an Jiaotong University, where he was appointed director of the Department in 2008 and Cheung Kong Professor in 2009. His research interests include network security and intelligent e-learning.

Haifei Li is an associate professor of Computer Science at Union University, Jackson, TN, USA. He received his M.S. and Ph.D. degrees in Computer Science from the University of Florida in 1998 and …, respectively, and his Bachelor's degree in Computer Science from Xi'an Jiaotong University, Xi'an, China, in …. Dr. Li's areas of interest include e-learning, databases, e-commerce, automated business negotiation and business process management.