JOURNAL OF SOFTWARE, VOL. 9, NO. 4, APRIL 2014 985

Mining Web User Behaviors to Detect Application Layer DDoS Attacks

Chuibi Huang (1)
(1) Department of Automation, USTC, Key Laboratory of Network Communication System and Control, Hefei, China
Email: hcb@mail.ustc.edu.cn

Jinlin Wang (1, 2), Gang Wu (1), Jun Chen (2)
(2) Institute of Acoustics, Chinese Academy of Sciences, National Network New Media Engineering Research Center, Beijing, China
Email: {wangjl, chenj}@dsp.ac.cn

Abstract: Distributed Denial of Service (DDoS) attacks pose a continuous and critical threat to Internet services. DDoS attacks are generally conducted at the network layer, and many DDoS detection methods focus on the IP and TCP layers. However, these methods are not suitable for detecting application layer DDoS attacks. In this paper, we propose a scheme based on web user browsing behaviors to detect application layer DDoS attacks (app-DDoS). A clustering method is applied to extract the access features of the web objects. Based on the access features, an extended hidden semi-Markov model is proposed to describe the browsing behaviors of web users. The deviation from the entropy of the training data set fitting to the hidden semi-Markov model can be considered as the abnormality of the observed data set. Finally, experiments are conducted to demonstrate the effectiveness of our model and algorithm.

Index Terms: HsMM, web user behaviors, DDoS, DDoS Attacks, clustering

I. INTRODUCTION

Distributed Denial of Service (DDoS) attacks have become a serious problem in recent years. Generally, DDoS attacks are carried out at the network layer, such as ICMP flooding, SYN flooding and UDP flooding. Without advance warning, a DDoS attack can easily exhaust the computing and communication resources of its victim within a short period of time [1]. Because of the seriousness and destructiveness of these attacks, many studies have been conducted on them. Many effective schemes have been proposed to protect networks and equipment from bandwidth attacks, so it is no longer as effortless as in the past for attackers to launch network layer DDoS attacks.
In order to dodge detection, attackers shift their offensive strategies to application-layer attacks and establish more sophisticated types of DDoS attacks. They utilize legitimate application layer HTTP requests from legitimately connected network machines to overwhelm the web server. These attacks are typically more efficient than TCP or UDP-based attacks, requiring fewer network connections to achieve their malicious purposes. The MyDoom worm and CyberSlam are both instances of this type of attack [2, 3].

The challenges of detecting app-DDoS attacks can be summarized in the following aspects:
The app-DDoS attacks make use of higher layer protocols such as HTTP to pass through most of the current anomaly detection systems, which are designed for the lower layers.
Along with flooding, an app-DDoS attack may track the average request rate of legitimate users and use the same rate for attacking the server, or employ a large-scale botnet to generate low rate attack flows. This makes DDoS attack detection more difficult.
Bursty traffic and high volume are common characteristics of both app-DDoS attacks and flash crowds. "Flash crowd" refers to the situation when a very large number of users simultaneously access a popular website, which produces a surge in traffic to the website and might cause the site to be virtually unreachable [4]. It is not easy for current techniques to distinguish app-DDoS attacks from flash crowds.

In the literature, little existing research focuses on the detection of app-DDoS attacks during a flash crowd event. This paper introduces the hidden semi-Markov model (HsMM) to capture the user browsing patterns during a flash crowd and to implement app-DDoS attack detection. Our contributions in this paper are threefold:
1) We use a clustering method to extract the web objects' access features, which can well portray current web user browsing behaviors.
2) We apply the hidden semi-Markov model (HsMM) to describe the dynamics of the access features and to implement the detection of app-DDoS attacks.
3) Experiments based on real traffic are conducted to validate our detection scheme.

The organization of the paper is as follows.
© 2014 ACADEMY PUBLISHER doi:10.4304/jsw.9.4.985-990
In the next section, we review the related work of our research. In
Section 3, we explain our model and algorithm to detect the app-DDoS attacks. Experiment results are presented in Section 4. Finally, we conclude our paper in Section 5.

II. RELATED WORK

Most current DDoS-related research is conducted on the IP layer or TCP layer instead of the application layer. These mechanisms typically utilize specific network features to detect attacks. Mirkovic et al. [5] proposed a defense system called D-WARD, located in edge routers, to monitor the asymmetry of two-way packet rates and to detect attacks. Yu et al. [6] discriminated DDoS attacks from flash crowds by using the flow correlation coefficient as a similarity metric among suspicious flows. Zhang et al. [7] proposed the Congestion Participation Rate (CPR) metric and a CPR-based approach to detect and filter TCP layer DDoS attacks. Cabrera et al. [8] depend on the MIB traffic variables collected from the systems of attackers to achieve early detection. Yuan et al. [9] capture the traffic pattern by using cross correlation analysis and then decide where and when a DDoS attack possibly arises. Soule et al. [10] used the traffic matrix, which represents the traffic state, to identify various dynamic attacks at an early stage. In [11], DDoS attacks were discovered by analyzing the TCP packet header against well-defined rules and conditions, which distinguished normal from abnormal traffic. In [12], attacks are detected by computing the rate of TCP flags to TCP packets received at a web server.

However, there are few DDoS defense methods that utilize application layer information. In [19], Ranjan et al. used a suspicion assignment mechanism based on statistical methods to detect time-related characteristics of HTTP sessions, such as request inter-arrival time, session inter-arrival time and session arrival time. Yen et al. [21] defended against application DDoS attacks with constrained random request attacks by statistical methods. Kandula et al.
[3] designed a system to protect a web cluster from DDoS attacks by (i) optimally dividing time spent in authenticating new clients and serving authenticated clients and (ii) using CAPTCHAs to design a probabilistic authentication mechanism, but the task of requiring users to solve graphic puzzles causes additional service delay. As a result, the graphic puzzles annoy legitimate users and also act as another DDoS attack point. Xie and Yu [16] introduced a web browsing model represented by the transition of the clicked pages. In previous studies, analysis of user behaviors has been applied in many research fields [13, 15, 17, 20, 22].

III. MODEL PRELIMINARIES AND ASSUMPTIONS

When detecting app-DDoS attacks, we are faced with the following challenges:
1) An attacker may launch an app-DDoS attack by mimicking normal web user access behavior, so the malicious requests differ from the legitimate ones, but not in their traffic characteristics. Therefore, most current detection mechanisms based on traffic characteristics become invalid.
2) Both flash crowds and app-DDoS attacks are unstable, bursty and huge in traffic volume. App-DDoS attackers are increasingly moving away from pure bandwidth flooding to more surreptitious attacks that hide in the normal flash crowd of the website. It is a challenge to detect app-DDoS attacks when they occur during a flash crowd event.

To meet the above challenges, we focus on the analysis of user behaviors. In this paper, we assume that it is impossible for app-DDoS attacks to completely mimic normal web user behaviors. This assumption is based on the following consideration. Generally, web user access behaviors can be described by three factors: HTTP request rate, page retention time, and request sequence. The attackers can mimic normal access behavior by launching attacks with an HTTP request rate and retention time similar to those of normal users. However, app-DDoS attacks cannot simulate the dynamic process of the web user behavior or the request sequence, because the attack is a pre-designed routine and cannot capture the dynamics of the users and the networks.
Another assumption of our paper is that user behavior can be described by the distribution of web object popularity. The variation of web object popularity can then represent the dynamic changing process of user behavior. Since an app-DDoS attacker is unable to obtain historical access records from the victim server, it cannot mimic the dynamics of normal user behaviors. We consider app-DDoS attacks as anomalous browsing behavior. Moreover, we also assume that normal users always access the "hot webpages", whereas the attackers access the "cold webpages". Therefore, we can build the browsing behavior model for both attackers and normal users by monitoring the dynamic change of web object popularity. Using this model, we can distinguish an app-DDoS attack from normal users.

IV. MODEL AND DETECTION PRINCIPLE

A. Hidden Semi-Markov Model

The hidden semi-Markov model (HsMM) [14] is an extension of the hidden Markov model (HMM) [23] with variable state duration. It is a stochastic finite state machine, specified by $\lambda = (Q, \pi, A, B, P)$ where:

$Q$ is a discrete set of hidden states with cardinality $M$, i.e., $Q = \{1, \ldots, M\}$. $q_t \in Q$ denotes the state that the system takes at time $t$;

$\pi$ is the initial state probability distribution, i.e., $\pi = \{\pi_m \mid m \in Q\}$, $\pi_m = \Pr[q_1 = m]$, where $q_1$ denotes the state that the system takes at time 1 and $m \in Q$. The initial state probability distribution satisfies $\sum_{m} \pi_m = 1$;

$A$ is the state transition matrix with probabilities $a_{mn} = \Pr[q_t = n \mid q_{t-1} = m]$, $m, n \in Q$, and the state transition coefficients satisfy $\sum_{n} a_{mn} = 1$;
$B$ is the output probability matrix with probabilities $b_m(k) = \Pr[o_t = v_k \mid q_t = m]$, $m \in Q$, where $o_t$ denotes the observed vector at time $t$, taking values from $\{v_1, \ldots, v_K\}$, and $K$ is the size of the observable output set. For a given state $m$, $\sum_{k} b_m(k) = 1$;

$P$ is the state duration matrix with probabilities $p_m(d) = \Pr[\tau_t = d \mid q_t = m]$, where $\tau_t$ denotes the remaining time of the current state $q_t$, $m \in Q$, $d \in \{1, \ldots, D\}$, $D$ is the maximum interval between any two consecutive state transitions, and the state duration coefficients satisfy $\sum_{d} p_m(d) = 1$.

Then, if the pair process $(q_t, \tau_t)$ takes on value $(m, d)$, the semi-Markov chain will remain in the current state $m$ until time $t + d - 1$ and transit to another state at time $t + d$, for $d \in \{1, \ldots, D\}$. These states are generally not observable. The observable variables are a series of observations $O = (o_1, \ldots, o_T)$, where $o_t$ denotes the observable output at time $t$ and $T$ is the number of samples in the observed sequence. $o_a^b$ represents the observation sequence from time $a$ to time $b$, i.e., $o_a^b = \{o_t : a \le t \le b\}$. When the conditional independence of outputs is assumed, $b_m(o_a^b) = \prod_{t=a}^{b} b_m(o_t)$.

B. Problem Formulation

We use the hidden semi-Markov model to describe the dynamic changing process of web user behaviors. The hidden state $q_t$ is used to represent the distribution of web object popularity, namely the user behavior, at time $t$. In general, the distribution of web object popularity is unobservable. The observable output is the web objects' click rates:

$$\mathrm{Click}(i) = \frac{c_i}{\sum_{j=1}^{N} c_j} \tag{1}$$

where $c_i$ is the click number of the $i$th web object in the time unit, and $N$ is the number of the web server's objects. $p_m(d)$ is the user behavior retention time distribution. The change of web user behaviors can be considered as a transition of the hidden state (i.e., from $q_{t-1}$ to $q_t$). We can estimate the parameters with the following forward and backward algorithm [14].

The forward variable is defined as follows:

$$\alpha_t(m, d) = \Pr[o_1^t, (q_t, \tau_t) = (m, d)]. \tag{2}$$

A transition into state $(q_t, \tau_t) = (m, d)$ takes place either from $(q_{t-1}, \tau_{t-1}) = (m, d + 1)$ or from $(q_{t-1}, \tau_{t-1}) = (n, 1)$ for some $n \ne m$.
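Therefore, we obtain the following forward recursion formulas:

$$\alpha_t(m, d) = \alpha_{t-1}(m, d + 1)\, b_m(o_t) + \Big( \sum_{n \ne m} \alpha_{t-1}(n, 1)\, a_{nm} \Big) b_m(o_t)\, p_m(d), \quad d \ge 1 \tag{3}$$

$$\alpha_1(m, d) = \pi_m\, b_m(o_1)\, p_m(d). \tag{4}$$

The backward variable is defined as follows:

$$\beta_t(m, d) = \Pr[o_{t+1}^T \mid (q_t, \tau_t) = (m, d)]. \tag{5}$$

It can be seen that when $d > 1$ the next state must be $(q_{t+1}, \tau_{t+1}) = (m, d - 1)$, and when $d = 1$ it must be $(q_{t+1}, \tau_{t+1}) = (n, d')$ for some $n \ne m$ and $d' \ge 1$. We thus obtain the following backward recursion formulas:

$$\beta_t(m, d) = b_m(o_{t+1})\, \beta_{t+1}(m, d - 1), \quad \text{for } d > 1 \tag{6}$$

and

$$\beta_t(m, 1) = \sum_{n \ne m} a_{mn}\, b_n(o_{t+1}) \Big( \sum_{d \ge 1} p_n(d)\, \beta_{t+1}(n, d) \Big). \tag{7}$$

Especially when $t = T$:

$$\beta_T(m, d) = 1, \quad d \ge 1. \tag{8}$$

Three joint probability functions can be expressed in terms of the assumed model parameters and the forward and backward variables defined above:

$$\xi_t(m, n) = \Pr[o_1^T, q_{t-1} = m, q_t = n] = \alpha_{t-1}(m, 1)\, a_{mn}\, b_n(o_t) \sum_{d \ge 1} p_n(d)\, \beta_t(n, d) \tag{9}$$

$$\eta_t(m, d) = \Pr[o_1^T, q_{t-1} \ne m, q_t = m, \tau_t = d] = \sum_{n \ne m} \alpha_{t-1}(n, 1)\, a_{nm}\, b_m(o_t)\, p_m(d)\, \beta_t(m, d) \tag{10}$$

$$\gamma_t(m) = \Pr[o_1^T, q_t = m]. \tag{11}$$

Then, the model parameters can be re-estimated by the following formulas:

$$\hat{q}_t = \arg\max_{1 \le m \le M} \Pr[q_t = m \mid o_1^T] = \arg\max_{1 \le m \le M} \gamma_t(m), \quad \text{for } t = 1, \ldots, T \tag{12}$$

$$\hat{\pi}_m = \frac{\gamma_1(m)}{\sum_{n=1}^{M} \gamma_1(n)} \tag{13}$$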
$$\hat{a}_{mn} = \frac{\sum_{t} \xi_t(m, n)}{\sum_{n \ne m} \sum_{t} \xi_t(m, n)} \tag{14}$$

$$\hat{p}_m(d) = \frac{\sum_{t} \eta_t(m, d)}{\sum_{d=1}^{D} \sum_{t} \eta_t(m, d)} \tag{15}$$

$$\hat{b}_m(v_k) = \frac{\sum_{t} \gamma_t(m)\, \delta(o_t - v_k)}{\sum_{k} \sum_{t} \gamma_t(m)\, \delta(o_t - v_k)} \tag{16}$$

where $\{v_k\}$ is the set of observable values, and $\delta(o_t - v_k) = 1$ if $o_t = v_k$, otherwise $\delta(o_t - v_k) = 0$.

The average entropies (AE) of observed sequences fitting to the HsMM model are used to detect app-DDoS attacks:

$$\mathrm{Entropy} = \Pr[o_1^T \mid \lambda] = \sum_{m, d} \Pr[o_1^T, (q_T, \tau_T) = (m, d) \mid \lambda] \tag{17}$$

$$\mathrm{ALE} = \frac{1}{T} \ln(\Pr[o_1^T \mid \lambda]). \tag{18}$$

C. Clustering Method

Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. It is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Since the number of web objects is enormous, it is difficult to deal with the multidimensional observed data without massive computation when training the hidden semi-Markov model. Thus, we apply a clustering method to reduce the dimension of the observed data. The specific clustering algorithm is shown in Fig. 1. We denote the clustering results as:

$$C = \{c_1, c_2, \ldots, c_M\} \tag{19}$$

$$\mathrm{Num} = \{n_1, n_2, \ldots, n_M\} \tag{20}$$

where $C$ denotes the $M$ classes, and $n_i$ denotes the number of web objects in $c_i$. Then we calculate the click rates of each class according to (1).

VI. EXPERIMENTS

We simulate 2 client nodes which act as normal users, using the trace from the semifinals of FIFA WorldCup98 [18]. We randomly select 5% of these nodes as compromised nodes. Furthermore, we assume the attackers can intercept a small portion of the requests from normal web users and issue the same requests, or requests for "hot" pages, to launch the app-DDoS attack on the victim server. But we also assume that the app-DDoS attacks cannot simulate the dynamic process of the web user behavior or the request sequence, because the attack is a pre-designed routine and cannot capture the dynamics of the users and the networks. Thus, when the attack begins, each compromised node replays a snippet of another historical flash crowd trace.
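The average log entropy in (18) can be computed with the forward recursion (3)-(4) alone, since (17) is the sum of the final forward variables over all $(m, d)$. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function name `average_log_entropy`, the array layout (rows are states, `P[m, d-1]` holds $p_m(d)$), the assumption that `A` has a zero diagonal, and the per-step rescaling used to avoid numerical underflow are all illustrative assumptions; observations are assumed to be integer indices of the symbols $v_k$.

```python
import numpy as np

def average_log_entropy(obs, pi, A, B, P):
    """Return (1/T) * ln Pr[o_1^T | lambda] for a discrete-output HsMM.

    obs: sequence of T symbol indices in 0..K-1
    pi:  (M,)   initial state probabilities
    A:   (M, M) transition matrix, assumed to have a zero diagonal
    B:   (M, K) output probabilities b_m(k)
    P:   (M, D) duration probabilities, P[m, d-1] = p_m(d)
    """
    obs = np.asarray(obs)
    M, D = P.shape
    # Eq. (4): alpha_1(m, d) = pi_m * b_m(o_1) * p_m(d)
    alpha = pi[:, None] * B[:, obs[0]][:, None] * P
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            b = B[:, obs[t]]
            # First term of Eq. (3): remaining time d+1 becomes d
            stay = np.zeros((M, D))
            stay[:, :-1] = alpha[:, 1:]
            # Second term of Eq. (3): enter m from states with remaining time 1
            enter = (alpha[:, 0] @ A)[:, None] * P
            alpha = (stay + enter) * b[:, None]
        # Rescale so alpha sums to 1; the log of the scales adds up to
        # ln of the sum in Eq. (17)
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # sequence is impossible under the model
        log_like += np.log(scale)
        alpha = alpha / scale
    return log_like / len(obs)  # Eq. (18)

if __name__ == "__main__":
    # Hypothetical two-state model that alternates states every time unit
    pi = np.array([1.0, 0.0])
    A = np.array([[0.0, 1.0], [1.0, 0.0]])
    B = np.eye(2)
    P = np.ones((2, 1))
    print(average_log_entropy([0, 1, 0], pi, A, B, P))
```

A sequence that fits the trained model yields a value near the training range, while a sequence the model considers unlikely yields a much smaller (more negative) value, which is the property the detection scheme relies on.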
The interval between two consecutive attack requests is decided by the attack rate. In our experiments, we simulate constant rate attacks and increasing rate attacks. We set the time unit to 5s and group 2 consecutive observations into one sequence. The "moving" step is one time unit, and a new sequence is formed using the current observation and the preceding observations.

Algorithm 1. The K-means Clustering
Input: Web objects' click rates dataset D
       Number of clusters K
Output: Set of cluster representatives C
        Cluster membership vector m
/* Initialize cluster representatives C */
Randomly choose K data points from D
Use these K data points as the initial set of C
repeat
    /* Data Assignment */
    Reassign points in D to the closest cluster mean
    Update m such that m_i is the cluster ID of the ith point in D
    /* Relocation of means */
    Update C such that c_j is the mean of the points in the jth cluster
until convergence of the objective function $\sum_{i=1}^{N} \min_j \lVert x_i - c_j \rVert_2^2$

Figure 1. The Clustering Algorithm

A. Attacks during Flash Crowd

The emulation process lasts about 6 h. The first 2 h of data are used to train the model, and the remaining 4 h of data, which include a flash crowd event, are used for testing. The emulated app-DDoS attacks are mixed with the trace chosen from the period of [3.5h, 5.5h]. Fig. 2 shows the average entropies of observations varying with time when constant rate attacks are emulated. Curve a represents the dynamic entropy varying process of the normal flash crowd and curve b represents the dynamic entropy varying process of the flash crowd mixed with constant rate app-DDoS attacks. We can see that the average entropy of observations does not change much during the flash crowd period of [2h, 3.5h], which implies that the main web user behaviors do not vary noticeably during the flash crowd event.
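The clustering step of Algorithm 1 and the per-class click rates derived from (1) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code. It assumes each web object is represented by a feature vector (for example, its click rates over several time units); the function names `kmeans` and `cluster_click_rates`, the iteration cap `n_iter`, and the fixed random seed are hypothetical choices made for the example.

```python
import numpy as np

def _assign(data, centers):
    # Squared Euclidean distance from every point to every center
    dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(data, k, n_iter=100, seed=0):
    """Plain K-means: returns (centers, labels) for an (N, F) data array."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize cluster representatives with K randomly chosen data points
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(data, centers)      # data assignment
        new_centers = centers.copy()
        for j in range(k):                   # relocation of means
            members = data[labels == j]
            if len(members):                 # keep old center if cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                            # centers stopped moving
        centers = new_centers
    return centers, _assign(data, centers)

def cluster_click_rates(clicks, labels, k):
    """Apply Eq. (1) per class: clicks in each cluster over total clicks."""
    clicks = np.asarray(clicks, dtype=float)
    totals = np.bincount(labels, weights=clicks, minlength=k)
    return totals / clicks.sum()

if __name__ == "__main__":
    # Four hypothetical web objects described by a single feature
    features = np.array([[0.0], [0.1], [10.0], [10.1]])
    centers, labels = kmeans(features, k=2)
    print(labels, cluster_click_rates([1, 1, 4, 4], labels, k=2))
```

With the objects grouped this way, the observation at each time unit shrinks from one click rate per web object to one click rate per cluster, which is what keeps HsMM training tractable.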
However, during the period of [3.5h, 5.5h], constant rate attacks appear and the average entropy of observations decreases sharply. For the duration of the constant rate attack, there is a significant deviation from the average entropy of normal observations. Therefore, we can make use of this phenomenon to detect app-DDoS attacks.

Fig. 3 shows the average entropy of observations varying with time when increasing rate attacks are emulated. Curve a represents the dynamic entropy varying process of the normal flash crowd and curve b represents the dynamic entropy varying process of the flash crowd mixed with increasing rate app-DDoS attacks. As shown in Fig. 3, during the period of [3.5h, 5.5h], increasing rate attacks appear and the average entropy of observations decreases gradually. At the end of the increasing rate attacks, the average entropy increases gradually to the value range of the normal case. Similar to the situation described above, the average entropy during the increasing rate attacks is significantly smaller than that in the normal flash crowd period.

Figure 2. Entropy versus time of constant rate attack (panel title: "Detection for constant rate attack"; x-axis: index of time (*5s); y-axis: entropy; curve a: flash crowd; curve b: flash crowd with attack)

the distributions of average entropy. It is easy to see that there exist significant differences in the entropy distributions of the two groups: the entropies of normal web traffic are larger than -3, but most entropies of the traffic containing attacks are less than -4. Therefore, we can make use of this result to distinguish app-DDoS attacks from normal web traffic when conducting DDoS detection. As shown in Fig. 5, if we take -2.7 as the threshold value of normal web traffic's average entropy, the false negative rate (FNR) is about 2%, and the detection rate (DR) is about 97%. We can see that the algorithm can correctly detect app-DDoS attacks that happen during a flash crowd event by making use of the dynamic HsMM models of web user behaviors.

Figure 4.
Distributions of average entropy (x-axis: entropy; y-axis: frequency; legend: Normal, Attacks)

Figure 3. Entropy versus time of increasing rate attack (panel title: "Detection for increase rate attack"; x-axis: index of time (*5s); y-axis: entropy; curve a: flash crowd; curve b: flash crowd with attack)

B. Performance

In the above scenarios, based on the average entropy of observations fitting to the HsMM model, we can detect the abnormality caused by the DDoS attack. Fig. 4 shows

Figure 5. Cumulative distribution of average entropies (x-axis: entropy; y-axis: FNR/DR; curves: FNR, DR; marked points: (-2.7, .97) and (-2.7, .2))

VII. CONCLUSION

In this paper, we applied the hidden semi-Markov model to describe web user behaviors, which can be represented by the click rates of web objects. By training on the observed data of normal web traffic with the forward and backward algorithm, we obtain the hidden semi-Markov model parameters. The average entropies of observed sequences fitting to the HsMM model are used to detect app-DDoS attacks. The experiments show that there is an obvious deviation from the average entropies of normal
observations for the duration of app-DDoS attacks. In order to reduce the amount of calculation in training the HsMM model, we apply a clustering method to reduce the dimension of the observed data. In addition, in future work we can consider how to cope with app-DDoS attacks launched by requesting dynamic webpages.

ACKNOWLEDGMENT

This work and the related experiment environment are supported by the National High Technology Research and Development Program of China under Grant No. 2AAA2, the National Key Technology R&D Program under Grant No. 22BAH8B4 and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA632. We are sincerely grateful for their support.

REFERENCES

[1] C. Douligeris, A. Mitrokotsa, DDoS attacks and defense mechanisms: Classification and state-of-the-art, Computer Networks: The Int. J. Computers and Telecommunications Networking, v 44, n 5, p 643-666, Apr 2004.
[2] Incident Note IN-2004-01 W32/Novarg.A virus, CERT. [Online]. Available: http://www.cert.org/incident_notes/IN-2004-01.html
[3] S. Kandula, D. Katabi, M. Jacob, A. W. Berger, Botz-4-Sale: Surviving Organized DDoS Attacks that Mimic Flash Crowds, MIT, Tech. Rep. TR-969, 2004 [Online]. Available: http://www.usenix.org/events/nsdi05/tech/kandula/kandula.pdf.
[4] Yi. Xie, Shunzheng. Yu, Monitoring the application-layer DDoS attacks for popular websites, IEEE/ACM Transactions on Networking, v 17, n 1, February 2009.
[5] J. Mirkovic, G. Prier, P. Reiher, Attacking DDoS at the source, In Proc. Int. Conf. Network Protocols, p 312-321, 2002.
[6] Shui. Yu, Wanlei. Zhou, Weijia. Jia, Song. Guo, Discriminating DDoS attacks from flash crowds using flow correlation coefficient, IEEE Transactions on Parallel and Distributed Systems, v 23, n 6, June 2012.
[7] Changwang. Zhang, Zhiping. Cai, Weifeng Chen, Flow level detection and filtering of low-rate DDoS, Computer Networks, v 56, n 15, p 3417-3431, October 2012.
[8] J. B. D. Cabrera, L. Lewis, X. Qin, W. Lee, R. K. Prasanth, B. Ravichandran, R. K.
Mehra, Proactive detection of distributed denial of service attacks using MIB traffic variables: a feasibility study, In Proc. IEEE/IFIP Int. Symp. Integr. Netw. Manag., p 609-622, May 2001.
[9] J. Yuan, K. Mills, Monitoring the macroscopic effect of DDoS flooding attacks, IEEE Trans. Dependable and Secure Computing, v 2, n 4, p 324-335, October 2005.
[10] A. Soule, K. Salamatian, N. Taft, Combining filtering and statistical method for anomaly detection, In Proceedings of Internet Measurement Conference, p 331-344, 2005.
[11] L. Limwiwatkul, A. Rungsawang, Distributed denial of service detection using TCP/IP header and traffic measurement analysis, In Proc. Int. Symp. Commun. Inf. Technol., Sapporo, Japan, p 605-610, October 2004.
[12] S. Noh, C. Lee, K. Choi, G. Jung, Detecting Distributed Denial of Service (DDoS) attacks through inductive learning, Lecture Notes in Computer Science, v 2690, p 286-295, 2003.
[13] Jun. Cai, Shunzheng. Yu, Yu. Wang, The community analysis of user behaviors network for web traffic, Journal of Software, v 6, n 11, p 2217-2224, 2011.
[14] S. Z. Yu, H. Kobayashi, An efficient forward-backward algorithm for an explicit duration hidden Markov model, IEEE Signal Process. Lett., v 10, n 1, p 11-14, Jan 2003.
[15] Hui. Chen, Relationship between motivation and behavior of SNS User, Journal of Software, v 7, n 6, p 1265-1272, 2012.
[16] Y. Xie, S. Yu, A large-scale hidden semi-Markov model for anomaly detection on user browsing behaviors, IEEE/ACM Transactions on Networking, v 17, n 1, February 2009.
[17] Wei. Yu, Shijun. Li, Yunlu. Zhang, Zhuo Zhang, Mining users similarity of interest in web community, Journal of Computers, v 6, n 11, p 2357-2364, 2011.
[18] [Online]. Available: http://ita.ee.lbl.gov/html/traces.html.
[19] S. Ranjan, R. Swaminathan, M. Uysal, E. Knightly, DDoS-resilient scheduling to counter application layer attacks under imperfect detection, In Proceedings of IEEE INFOCOM, April 2006.
[20] Qingzhang. Chen, Yanqing. Ou, Hang. Sun, Design and implement of customer communication behavior analysis system, Journal of Software, v 6, n 8, p 1484-1491, 2011.
[21] W. Yen, M.-F.
Lee, Defending application DDoS with constraint random request attacks, In Proc. Asia-Pacific Conf. Commun., Perth, Western Australia, p 620-624, October 2005.
[22] Xiaohua. Hu, Tao. Mu, Weihui. Dai, Hongzhi. Hu, Genghui. Dai, Analysis of browsing behaviors with ant colony clustering algorithm, Journal of Computers, v 7, n 12, p 3096-3102, 2012.
[23] L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, In Proc. of IEEE, v 77, n 2, p 257-286, February 1989.