Network traffc analyss optmzaton for sgnature-based ntruson detecton systems Dmtry S. Kazachkn, Student, Computatonal systems lab at CMC MSU, Denns Y. Gamayunov, scentfc advsor, PhD, Computatonal systems lab at CMC MSU Abstract In ths paper we propose a method for sgnature matchng optmzaton n the feld of ntruson detecton and preventon. Sgnature matchng algorthm performance s one of the key factors n the overall qualty of the IDS/IPS, especally n hgh-speed networks. Optmzaton method proposed n ths paper reles on semantcs of the sgnature matchng task, typcal for such systems as Snort. The method mnmzes the number of patterns called by the detecton system for each network packet, reducng the tme of ts processng. Index Terms Securty, Intruson detecton, Networks, Snort. N I. INTRODUCTION ETWORK ntruson detecton and preventon systems (IDS/IPS), whch are qute common nowadays, mostly use sgnature-based method for attacks recognton. The dea of ths method s n comparson of real observed network traffc wth a set of known attack descrptons. The number of sgnatures n a typcal IDS database s usually several thousands (Snort system has about 7 thousand sgnatures [1]). A sngle sgnature can be magned as an ordered set of tests of network packet header and payload content values of some characterstc header felds, typcal textual substrngs, regular expressons, etc. Thus, straght testng of all the sgnatures set aganst a sngle network packet results n hundreds of thousands machne operatons. Ths s one of the key factors that hnder wde spreadng of those systems at hgh speed networks 1Gbt/s and beyond. Furthermore, there are 2 mportant trends, observed from the begnnng of computers and networks evoluton, called Moor s law and Glder s law. Moor s law reads that processor s throughput of a gven cost doubles every 2 years; Glder s law says that the network throughput trebles at the same perod. So, the network node s computatonal power growth s left behnd by the growth of volume of nformaton transferred through the network, whch contnuously make performance requrements of the securty systems algorthms to grow hgher as well, whch refers to IDSs too. Ths paper proposes a method for optmzed testng process of a gven sgnature set on a sngle packet. The basc model Manuscrpt receved March 31, 2008. D. S. Kazachkn s wth the Computatonal systems lab at CMC MSU, Moscow, Russa (e-mal: zok@lvk.cs.msu.su). D. U. Gamayunov s wth the Computatonal systems lab at CMC MSU, Moscow, Russa (e-mal: gamajun@lvk.cs.msu.su). for sgnature matchng, used s ths paper for evaluatng the optmzed method, s consecutve checkng of every sgnature untl the frst success (gettng true value), whch s n fact a brute-force search. Of course, modern IDSs do not use ths full search for sgnature matchng and most of them have some heurstcs for search space reducton (Snort optmzatons are shown below). The proposed method was mplemented for an expermental IDS, desgned at the Computatonal systems lab of Moscow State Unversty s Faculty of Computatonal Math and Cybernetcs [4], and Snort s sgnature base released at 2007 was used as an ntal pack of sgnatures. Implementng the optmzed sgnature matchng algorthm would be a hard task, for t mples alterng the Snort core engne. Yet, for expermental IDS ths method mplementaton s reduced to sgnature base translaton wthout meddlng n the IDS engne. Expermental IDS uses a specalzed behavor descrpton language called R-lang, whch use fnte automaton for descrbng network object behavor. A set of such descrptons, grouped by the object type (n our case network packets), s called module. R-lang language s based on deas of Eckmann s, Vgna s and Kemmerer s works and STATL language [5,6]. Implementaton of sgnatures matchng optmzaton method offers an optmzng translator from R-lang modules to R-lang modules, whch mnmze the total number of states and transtons n the module, and also the number of testng functons called for a sngle event. Evaluaton of the Snort s sgnature base wth optmzng translator revealed ts essental superfluty, that partally can be explaned by the fact, that ths base was formed by many ndependent developers, and seemngly some of them were usng automatc sgnature generators, based on attack samples. In the end of ths paper s shown how proposed sgnature matchng optmzaton has helped to reduce ths superfluty. II. SIGNATURE FORMAL DEFINITION Frst, we wll gve a formal sgnature defnton to use t afterwards. The object of analyss of a sgnature s a sngle network packet whch consst of header and packet payload. Header s a vector of some fxed felds H = {H 1,H 2,...,H n }, that belong to correspondng fnte spaces. Payload text lne P of unrestrcted length. Also, there s a varable vector C = {С 1,С 2,...,С k }, descrbng the sgnature-based analyzer nner state (so-called
state vector). Varables C also have a fnte value ranges. Header condton CondH (H) s a logcal predcate, whch takes header felds H as ts arguments. A sample of such condton s a testng of the equalty of source port (one of the header felds) to value 80. Payload condton s an array of functons: CondP ( s a logcal predcate, whch takes packet payload and state vector as ts arguments, Effects = {Effect,j (} each functon n ths set (for =1...k) represents the sde effect on varable С k, performed durng CondP ( checkng. It s sad that payload condton evaluates successfully, f CondP ( returns 1. At the same tme, all the varables b C vector smultaneously get the new values C' j =Effect,j (, based on payload text and ther old values. When consecutve payload condtons are evaluated, each of them feels sde effects from already evaluated condtons, and the result s ther complex superposton. Ths means there s no way of cachng condtons evaluaton result n a general case. A sample of such condton s a test of substrng presence n payload P between markers С 1 and С 2 wth a sde effect of movng left marker С 1 to the end of found substrng. Reacton s an element of some fnte set of event classes. Sgnature s a trplet <SH,SR>, where SH set of header condtons, SP ordered set of payload condtons, R reacton. Sgnature evaluaton result <SH,SR>,H,:, f all the header condtons return true and all the payload condtons successfully evaluate consecutve., else III. CONVERTED SIGNATURES Snort s sgnatures are qute strctly descrbed by suggested formal model. Let s examne Snort s sgnature structure and descrbe the way to buld a correspondng formal sgnature. Snort s sgnature conssts of 4 sectons: acton acton performed on rule actvaton (usually, alert ). Corresponds to reacton functon. ( msg: Panc alert tcp any:80 -> any:any (content: qwerty ; acton header optons nfo Fg. 1. Snort s sgnature structure. header context-ndependent condtons on packet header: protocol, IP addresses and ports (f defned by protocol) of source and destnaton, drecton. Corresponds to header condtons from the model. optons context-dependent condtons on packet payload. Conssts of dfferent tests of text, contaned n payload after a sngle movng marker. Marker s moved dependng on a condton type. Corresponds to payload condtons from the model. nfo rule nfo and message, generated at rule actvaton. Corresponds to reacton functon arguments. Let s defne vector H n the followng way: H 1 protocol type (TCP / UDP / ICMP / other IP), H 2 and H 3 IP and port of packet source, H 4 packet drecton (from server/to server), H 5 and H 6 IP and port of packet destnaton. Ports range s 065535, IPs range s 0.0.0.0 255.255.255.255. Regardng to so defned vector H, for any header feld a set of header condtons {CondH (H)} exsts, havng the same functonalty. Vector C s defned as a sngle-varable vector. C 1 current state of the text marker, after whch the pattern matchng takes place. Regardng to so defned vector C, for any optons chan a chan of payload condtons [<CondP(, Effect(>] exsts, havng the same functonalty, ncludng both return value and sde-effects. Reactons space, whch contan dfferent classes of events we can defne as a space of dfferent possble values of classtype feld, whch s representng class of detected attack. The reacton of each sgnature s the value of classtype feld. Ths defnton s correct, because for each sgnature reacton exsts and only one. IV. R-LANG LANGUAGE R-lang s the language used n expermental IDS. It descrbes the behavor of observed system wth a scenaro fnte automaton wth memory, each transton of whch s marked by type, condton and body of transton. There are 3 types of transtons consumng (smply performs a transton), non-consumng (creates automaton exact copy and perform a transton there), unwndng (destroys current automaton copy). Condton s a logcal expresson that should be true to perform transton. Body a program nstructon block, that executes durng the transton. In fact, automaton s an extenson of structure concept, because t can contan functons (methods of scenaro), and automaton s memory s а mplemented n form of nner varables (felds of scenaro), that are globally vsble n all the transtons. There s an mportant partcularty of R-lang language: there s totally no dynamc memory n t, so the framework s safe from troubles caused by scenaro errors. Also there are no global varable, vsble n all the scenaros, so for a lbrary functon, desgned as a functon that s external to scenaro, sde-effect can be performed only over ts drect arguments (structures are passed by reference). Functons, defned nsde scenaro can also perform sde-effects on ts felds, even f they are not passed as arguments. V. CONVERSION TO R-LANG A.Smple converson Structure smlarty of R-lang and STATL languages allows usng converson deas, descrbed by S.T. Eckmann n [2], the paper about translatng snort rules to STATL scenaros. Network sensor of expermental IDS provdes typfed events to scenaros on gettng every packet. Each event contans IP-addresses and ports of packet source and destnaton (IPSrc, PortSrc, IPDst and PortDst), ts drecton
(Drecton), and also ts drecton (Payload). Below s an algorthm of sgnature converson, wrtten n terms of formal model. The result s a R-lang scenaro. If protocol Pr (TC UD IP or ICMP) s granted by header condtons {CondH (H)} (vector H * ={Pr, IPSrc *, PortSrc *, Dr *, IPDst *, PortDst * } exsts and all the condtons return true value on t), then the sgnature has a correspondng scenaro, acceptng events of correspondng packet type. Each header condton has a correspondng logcal expresson n R-lang language, whch tests network event felds, correspondng to H vector elements. Ths expresson s a model of CondH (H) n R-lang. For example, correspondng code for CondH(H):(H 3 =80) s ev.tcpsrcport==80. Each payload condton s assgned to a logcal expresson n R-lang language, whch tests packet payload text P and condton vector C. Ths expresson returns a Boolean value, smulatng CondP (. Also durng ts evaluaton condton vector C can be modfed, smulatng Effect (. For example, CondP(: true, f packet payload contans substrng «qwerty» after the marker C 1, EffectP 1 ( moves C 1 to the end of found substrng. The code correspondng to payload condton <Cond EffectP> s a call content( qwerty, C, ev.payload), where content external functon, modelng the descrbed functonalty, ncludng sdeeffect on passed-by-reference vector C. Each reacton s assgned to an alert sendng code, whch sends to IDS a message, contanng found attack classfcaton. Here s the scheme of sgnature evaluaton performng scenaro n R-lang: scenaro sc(<event correspondng to packet type> ev) { <Secondary varables defnton> ntal state st0; consumng transton st0->st0 event <event correspondng to packet type> ( <Header condton 1> && && <Header condton N> ){ f(<payload condton 1>) f(<payload condton M>) <Reacton>; } }; Before optmzatons, descrbed n ths paper, Snort2R-lang converter worked ths way. B.Header-based optmzaton Condtons alternatve a set of pars <SP,R >, where SP ordered set of payload condtons, R reacton. Alternatve-contanng sgnature a par <SH, SA>, where SH set header condtons, SA condton alternatve. Set of sgnatures wth the same header are converted to alternatve-contanng sgnature ths trval way: { < SH, SB, R > } < SH,{ < SB, R > } > Alternatve-contanng sgnature evaluaton result s defned as: < SH, SA>, H, = U N = 1 < SH, S R >, H, Also, let s defne condton alternatve evaluaton result: U N = 1 SA, = < / ο, SP, R >, H, Here s the structure of transton body at the R-lang model of alternatve-contanng sgnature (anythng else s just as n smple sgnature scenaro): C1=C; f(<seres of payload condtons 1>) <Reacton 1>; C=C1; f(<seres of payload condtons 2>) <Reacton 2>; C=C1; f(<seres of payload condtons N>) <Reacton N>; Tests on Snort sgnature base revealed consderable economy on header condton evaluaton provded by ths optmzaton: there are only 519 dfferent header condton sets for base of 6372 sgnatures. The burden of ths optmzaton s n replacement of excess header condton evaluatons wth context vector restorng operaton, much smpler n the case of Snort s sgnatures, where context vector contan one nteger only. Let a set of alternatve-contanng sgnatures s gven, grantng the same protocol. Because of context ndependency of header condtons, they could be evaluated n tree-style order that provdes a good economy. Ths addtonal optmzaton can be organzed n R-lang by scenaro combnng wth the use of nested f s: consumng transton st0->st0 event <event correspondng to packet type>(true){ f(<header condton 1> f(<header condton 2>) <predcate alternatve for header granted by 1,2> f(<header condton 3>) f(<header condton 4>) <predcate alternatve for header granted by 1,3,4> }} Snort analyss engne use that header optmzaton only, that does not allow achevng further speed-up on a fxed sgnature set. C.Predcate tree The computatonal complexty of payload condton evaluaton usually much hgher than the computatonal complexty of header condton evaluaton. That s why the task of optmzng condtons alternatves evaluaton s urgent. Condton chans contaned n condtons alternatve can have the same begnnngs. A huge beneft can be acheved by usng ths fact. Predcate tree ST tree: the edges are marked wth a payload condton the nodes are marked wth a reacton set, possbly empty the leafs are marked wth non-empty reacton sets Algorthm of buldng a predcate tree based on a condton alternatve s ntutve clear: when addng a next chan of payload condtons to a tree, a ponter moves from root node through the edges marked wth correspondng payload condtons f they exst, otherwse a new branch, contanng left condtons s created and lnked to the current ponter poston.
A B {R1} A C D A {R3} Fg. 2. Predcate tree buldng sketch. {R2} Predcate tree evaluaton recursve traversal of tree from root node. Sub-trees are evaluated only f correspondng condton could be evaluated successfully n current context. Predcate tree evaluaton result ST, a set of all the reacton, that were acheved durng the predcate tree evaluaton. Statement. Let predcate tree ST s converted from condton alternatve SA. Then ST,=SA,. D.Statc result analyss Due to that optmzaton, a hgh beneft s acheved at some scenaros. Here are defned two statc characterstc of ths optmzaton. Tree proft s the dfference between edges number n the tree and the total number of payload condtons n a condtons alternatve that was the source of the predcate tree. Relatonal tree proft s the raton between thee proft and total number of payload condtons n a condtons alternatve. For two sgnature groups n the Snort Base the tree proft exceeds 11000 for each group, and the relatonal tree proft for them s about 80%. These groups contan about 2000 sgnatures,.e. about 1/3 off the whole base. On average, the relatonal tree proft s about 62%. Also, whle buldng those trees, 38 par of dentcal sgnatures and 2 groups of sx dentcal sgnatures were found. The number of context restorng s not ncreased at all comparng to condton alternatve usage. The number of context backups s ncreased by the number of addtonal branchng, whch s not more than the tree proft. Thus, consderng heavness of payload condton evaluaton, burden of ths optmzaton s nsgnfcant n comparson wth economy, acheved by the lowerng the number of condtons. E.Addtonal optmzatons Consder a set of leafs, that are all marked wth same reacton R and beng drect chldren of the same node. Sometmes t s possble to thnk of a payload condton B, so <,[B],R>,H, would be equal to evaluaton result of a sub-tree that ncludes these gven leafs and a parent node, and the computatonal costs of B evaluaton s lower than costs for evaluaton of A 1, A 2... A n n total. For example, payload condtons, whch perform regular expresson analyss (pcre) can be combned nto a sngle predcate, that performs the search of frst pattern, found of ths set. For mplementng ths optmzaton for R-lang scenaros, A C B {R3} D {R1} {R2} A 1 A 2 A n callout mechansm of PCRE lbrary was used. It allows not only to stop the ongong regular expressons set testng after the frst match, but also returns d of matched pattern, whch allows to nform IDS of matched sgnature d and classfcaton text, not only classtype. Also t s proposed to unte payload predcates, performng substrng search (contest) usng Aho-Corasck algorthms [3], whch allows savng on substrng search, performng all the patterns look-up at once. These predcate combnaton methods of qute effectve when performng condtons set evaluaton untl the frst match. But when searchng all the matches (that s necessary, f the branchng node s not pre-leaf) at some strngs, ths method works even worther, than separate matchng. Nevertheless, they demonstrate a good performance n average, so ther applcaton for non-pre-leaf branchng nodes needs further research. VI. CONCLUSION Usng the proposed optmzaton method wth the Snort sgnature base allowed reducng the number of testng functons calls by 60% (see table below). Consderng that most of superfluous condtons were heavy-wegh functons such as substrng search and regular expressons matchng, the real gan could be even hgher. A seres of experments usng dfferent types of traffc s planned to fnd out the numercal evaluaton of that gan. The deas, used n optmzng translator are wth mnmal changes applcable for some R-lang constructons, not overvewed n ths paper. Here are some examples of ths constructons: several consequent transtons of the same type to the same state (condton of transton s a analogue of a payload condtons chan, because some sde-effects are possble here, body of transtons s and analogue of reacton) ; several consequent f-blocks, etc. Implementng these deas would help to develop a unversal optmzng translator for ths language. TABLE I STATIC CHARACTERISTICS OF PROPOSED OPTIMIZATION Characterstc Fg. 3. Addtonal optmzaton scheme. The source sgnatures Usng proposed optmzaton Number of header 6372 519 condtons sets Number of payload 39299 15049 condtons Ths table llustrates advantages of proposed method. The presented data was collected usng Snort IDS sgnature base dated 28.01.2007. B
REFERENCES [1] Snort IDS, http://www.snort.org [2] S.T. Eckmann, "Translatng Snort rules to STATL scenaros" presented at the 4th Internatonal Symposum on Recent Advances n Intruson Detecton (RAID 2001), Davs, CA, October 2001, LNCS 2212, pp. 69-84. [3] M. Norton, "Optmzng Pattern Matchng for Intruson Detecton," whte paper, Sourcefre Inc., 2004 [Onlne] Avalable: http:// docs.dsresearch.org/optmzngpatternmatchng/forids.pdf. [4] D.U. Gamayunov, Network objects behavor analyss for detectng computer attacks PhD thess, Faculty of Computatonal Math and Cybernetcs, Moscow State Unversty, Moscow, 2007. [5] S.T. Eckmann, G. Vgna, and R. A. Kemmerer. STATL: An Attack Language for State-based Intruson Detecton Dept. of Computer Scence, Unversty of Calforna, Santa Barbara, 2000. [6] G. Vgna, R. Kemmerer, "NetSTAT: A Network-based Intruson Detecton Approach." Proceedngs of the 14th Annual Computer Securty Applcaton Conference, Scottsdale, Arzona, December 1998. [7] M. Roesch. "Wrtng Snort Rules: How To wrte Snort rules and keep your santy" [Onlne] Avalable: http://www.snort.org.