Informatica 25 (2001) xxx yyy

Statistical Intrusion Detector with Instance-Based Learning

Ivan Verdonik, Bojan Novak
Fakulteta za elektrotehniko, računalništvo
Univerza v Mariboru
Smetanova 17, 2000 Maribor, Slovenija
ivan.verdonik@siol.net

Keywords: intrusion detection, instance-based learning, reduction techniques

Received: [Enter date]

In this paper we deal with computer security issues. Within this very broad area, we focus on intrusion detection, specifically on statistical detection. Our statistical intrusion detector, presented in the paper, is based on Instance-based Learning with the k-Nearest Neighbours method. A statistical detector requires a good and small database of regular data to be able to validate the actual traffic correctly and promptly. Therefore we considered clustering-based techniques for reducing the gathered data. We adjusted the k-Nearest Neighbours algorithm by comparing a sequence of actual data with sequences of regular data instead of comparing only one actual instance with the k nearest regular instances. For this purpose we explored four similarity measure functions. Finally, our security solution VAL (Varnostni ALarm), consisting of our statistical detector, a SNORT rule-based intrusion detection system, an iptables Linux firewall and a management console, is presented.

1 Introduction

In addition to great opportunities and benefits for mankind, the emergence of global networking in the previous decade has also brought serious security threats to its users. Intrusion detection can be regarded as a tool that can improve the security of a local network and/or individual hosts. Intrusion Detection Systems (IDS) can prevent unauthorized access to system resources and data and catch the attacker in the act. There are two main approaches to intrusion detection [6]: rule-based misuse detection and statistical anomaly detection. Each of them has its strong and weak points. Rule-based detectors are better for internal security (by that we mean security inside the company intranet). On the other hand, the strongest point of statistical detectors is the detection of novel, previously unknown kinds of attacks, while they are weak at internal security.
Therefore it is reasonable to combine a rule-based and a statistical detector into a hybrid detector. The latest trend is to have the intrusion detector block the attacker's IP address with a firewall. Such systems are called intrusion prevention systems. Our security solution VAL encompasses all these features.

2 Instance-based Learning

Instance-Based Learning (IBL) algorithms consist of simply storing the presented training examples together with their attribute lists and their outcomes (the database of regular data). When a new instance is encountered, a set of similar, related instances is retrieved from memory and used to classify the actual (new) instance according to the outcome of the majority of the related training instances [2]. This kind of classification is called the target function. The outcome is in our case either 0 - normal activity or 1 - intrusion. The following are the most common IBL target functions:

- k-Nearest Neighbor
- Locally Weighted Regression
- Radial Basis Function
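The store-then-vote idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `ibl_classify` and `manhattan`, the toy points in `MEMORY`, and the choice of distance are all hypothetical and are not taken from the VAL implementation.

```python
from collections import Counter


def ibl_classify(query, training_examples, k, distance):
    """Classify `query` by the majority outcome of its k most similar examples.

    training_examples: list of (attribute_list, outcome) pairs kept in memory.
    distance: any function of two attribute lists (hypothetical parameter).
    """
    # Retrieve the k stored examples most similar to the new instance.
    nearest = sorted(training_examples,
                     key=lambda example: distance(query, example[0]))[:k]
    # The majority outcome among them becomes the classification.
    votes = Counter(outcome for _, outcome in nearest)
    return votes.most_common(1)[0][0]


def manhattan(a, b):
    """Illustrative distance only; not the measure used by the detector."""
    return sum(abs(u - v) for u, v in zip(a, b))


# Toy memory with made-up points: outcome 0 = normal activity, 1 = intrusion.
MEMORY = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((9, 9), 1), ((9, 8), 1)]
```

Calling `ibl_classify((8, 8), MEMORY, 3, manhattan)` retrieves two intrusion examples and one normal example, so the majority outcome is 1.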
IBL approaches can construct a different approximation of the target function for each distinct new instance to be classified. Some techniques only construct a local approximation of the target function that applies in the neighborhood of the new query instance and never construct an approximation designed to perform well over the entire instance space. This is an advantage when the target function is very complex, but can still be described by a collection of less complex local approximations [2].

2.1 k-Nearest Neighbour

The k-Nearest Neighbor algorithm is the most basic of all Instance-Based Learning (IBL) methods. The algorithm assumes all instances correspond to points in the n-dimensional space $R^n$. The nearest neighbors of an instance are defined in terms of standard Euclidean geometry (distances between points in n-dimensional space). More precisely, let an arbitrary instance x be described by the feature attribute list $\langle a_1(x), a_2(x), a_3(x), \ldots, a_n(x) \rangle$, where $a_r(x)$ denotes the value of the r-th attribute of instance x. In our case the attribute list of the instances consists of TCP packet header parameters. The most important parameters are: source and destination IP addresses, source and destination port numbers and the status of flags. The distance between two instances $x_i$ and $x_j$ [3] is given by Equation 1 below. This is the general form for calculating distance in n-dimensional space.

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left[ a_r(x_i) - a_r(x_j) \right]^2}$$

Equation 1: Euclidean distance between two instances with n attributes

We do not use this distance equation exactly, since we test only the equality between attributes. In nearest-neighbor learning, the target function may be either discrete-valued or real-valued. The form of the discrete-valued target function is $f: R^n \to V$, where $V = \{v_1, v_2, v_3, \ldots, v_s\}$ is a finite set (in our case $V = \{\text{regularity}, \text{intrusion}\}$) and $R^n$ is the real n-dimensional space.
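To make the contrast concrete, the sketch below places Equation 1 next to one possible equality-only comparison. The mismatch count is an assumption about how "testing only the equality between attributes" could be turned into a distance; the paper does not spell out its exact form, and the function names and sample header values are hypothetical.

```python
import math


def euclidean_distance(xi, xj):
    """Equation 1: Euclidean distance between two numeric attribute lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def mismatch_distance(xi, xj):
    """Assumed equality-only variant: count the attributes that differ.

    Works for non-numeric attributes such as IP addresses or flag strings.
    """
    return sum(1 for a, b in zip(xi, xj) if a != b)


# Hypothetical packets: (source IP, destination IP, destination port, flags).
PACKET_A = ("10.0.0.1", "10.0.0.2", 80, "SYN")
PACKET_B = ("10.0.0.1", "10.0.0.9", 80, "SYN")
```

Here `mismatch_distance(PACKET_A, PACKET_B)` is 1, because only the destination address differs. An equality test avoids treating numerically close port numbers or addresses as similar, which Equation 1 would do.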
The k-Nearest Neighbours algorithm for approximating a discrete-valued target function [3] is given in Algorithm 1 below:

Training part:
    For each example <x, f(x)>, add the example to the list of training_examples
Classification part:
    Given a new instance $x_q$ to be classified,
    Step 1: Let $x_1, x_2, \ldots, x_k$ denote the k instances from the training_examples that are nearest to $x_q$,
    Step 2: Return

$$\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$

    where:

$$\delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{if } a \neq b \end{cases}$$

Algorithm 1: k-Nearest Neighbours

In the training part we must collect training examples (instances). We collect their attribute lists as well as their targets. In our case the instances are TCP packets. We collected the important header parameters as the attribute list. In the classification part we first search the training examples for the k instances nearest to the actual instance (i.e. to the new instance). Then we classify this instance according to the outcome of the majority of the nearest training instances. Our case is a bit specific since all our training examples are considered to be regular, i.e. all of them have only one outcome. Therefore a new instance is considered regular if its attributes are close enough to the attribute lists of the training examples.

3 VAL Statistical Detector

As previously said, our statistical detector performs intrusion detection using adapted IBL with the k-Nearest Neighbor method. We collected the training examples by recording
the TCP network activity on a computer plugged into the university department intranet, for two weeks. The training examples collected in this way were highly redundant and noisy. To improve the quality and to reduce the size of the gathered data we first considered clustering methods. Clustering means partitioning the data space into disjoint subsets so that the points in each subset are coherent according to a certain criterion. Our idea was to group TCP network packets into sets of similar packets and to preserve only the packets in the centers of the groups. The methods we have inspected are:

- k-Means
- Mixture of Gaussian distributions used by Expectation-Maximization
- Greedy Clustering Algorithm

3.1 k-Means

k-Means is one of the simplest clustering algorithms. It assumes that the clusters are spherical, that every cluster has a center and that the other points belonging to the cluster lie close around the center [4]. See Algorithm 2 below.

Input:  data points $\{x_1, x_2, \ldots, x_n\}$; number of clusters K
Output: clust(i) for i = 1..n; $c_1, c_2, \ldots, c_K$ positions of centers
Initialize $c_1, c_2, \ldots, c_K$ with random values
Do
    for i = 1..n
        find k such that $\|x_i - c_k\| \le \|x_i - c_{k'}\|$ for all k' = 1,..,K
        clust(i) = k
    for k = 1,..,K
        $C_k = \{ x_i : \mathrm{clust}(i) = k \}$
        $c_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$
until clust(i), i = 1,..,n remain unchanged

Algorithm 2: k-Means clustering

We have n input data points and K clusters, while the output is the assignment of data points to clusters and the positions of the cluster centers. First, the cluster centers are initialized with random values. Then, in a loop, each data point is first assigned to the cluster whose center is nearest to it. In the next step, the cluster centers are recalculated from all the points currently in the cluster. The loop iterates until the assignment of all the data points to the clusters remains unchanged. The k-Means algorithm fails to find the correct clustering when clusters have different sizes and/or (different) elongated shapes [4].

3.2 Mixture of Gaussian distributions

Different models have to be used for clusters that are not spherical. One of them is a mixture of Gaussian distributions.
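As a point of comparison for the mixture model, the loop of Algorithm 2 can be sketched as follows. This is a minimal sketch, not the code used in VAL. It makes three assumptions that are not in the paper: the centers are initialised by sampling K of the data points rather than arbitrary random values, squared Euclidean distance is used for the nearest-center test, and a `max_iter` cap guards against endless looping. The function name `k_means` is hypothetical.

```python
import random


def k_means(points, k, seed=0, max_iter=100):
    """Return (assignment, centers) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points instead of random values.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest center.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centers[c])))
            for x in points
        ]
        # Stop when the assignment of all points remains unchanged.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each center becomes the mean of its current members.
        for c in range(k):
            members = [x for x, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assignment, centers


# Two well-separated toy groups (made-up data).
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

On `POINTS` with `k=2` the loop separates the three points near the origin from the three near (10, 10). On elongated or unequal clusters it would show the failure mode noted in Section 3.1.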
A mixture of Gaussian distributions [4] is a probability density given by

$$f(x) = \sum_{k=1}^{K} \lambda_k f_k(x)$$

where:
- $f_k(x)$ are normal densities with parameters $\mu_k$, $\sigma_k^2$, called the mixture components,
- $\lambda_k \ge 0$ are real numbers satisfying $\sum_{k=1}^{K} \lambda_k = 1$, called the mixture coefficients.

Intuitively, adopting a mixture reflects the assumption that there are K sources which independently generate data ($f_1, f_2, \ldots, f_K$). The probability that data is generated by $f_k$ is $\lambda_k$. So ($\lambda_1, \lambda_2, \ldots, \lambda_K$) represent a discrete distribution over the sources. A new data point is generated in two steps: in the first, a source $f_k$ is randomly picked from ($f_1, f_2, \ldots, f_K$) with a probability given by ($\lambda_1, \lambda_2, \ldots, \lambda_K$); in the second, the data point x is sampled from the chosen $f_k$. We know x, but we do not know k, the index of the source that generated x. Therefore k is called the hidden variable [4].

f(x) can be rewritten to show the two-step data generation model:

$$f(x) = \sum_{k=1}^{K} P(k)\, f(x \mid k)$$

where:

$$P(k) = \lambda_k \ \text{ for } k = 1, \ldots, K, \qquad f(x \mid k) = f_k(x)$$

In this probabilistic framework, the clustering problem can be translated as follows. Finding the clusters is equivalent to estimating the densities of the data sources ($f_1, f_2, \ldots, f_K$). Assigning the data to the clusters means recovering the values of the hidden variable for each data point [4].

3.3 Expectation-Maximization

The Expectation-Maximization (EM) algorithm [4][5] solves the clustering problem as a Maximum Likelihood estimation problem. It is based on the mixture of Gaussian distributions. It takes the data $D = \{x_1, x_2, \ldots, x_n\}$ and the number of clusters K as the input, and outputs the model parameters $\Theta = \{\lambda_1, \ldots, \lambda_K, \mu_1, \ldots, \mu_K, \sigma_1^2, \ldots, \sigma_K^2\}$ and the posterior probability of the clusters for each data point, $\gamma_i(k)$, for i = 1,..,n and k = 1,..,K.

For any given set of model parameters $\Theta$, we compute the probability $P(k \mid x_i)$ that observation $x_i$ was generated by the k-th source $f_k$ using the Bayes formula

$$\gamma_i(k) = P(k \mid x_i) = \frac{P(k)\, f(x_i \mid k)}{\sum_{k'} P(k')\, f(x_i \mid k')} = \frac{\lambda_k f_k(x_i)}{\sum_{k'} \lambda_{k'} f_{k'}(x_i)}$$

The values $\gamma_i(k)$, k = 1,..,K sum to 1. They are called the partial assignments of point $x_i$ to the K clusters - see Algorithm 3.

It can be proved that the EM algorithm converges. The parameters $\Theta$ obtained at convergence represent a local maximum of the likelihood $L(\Theta)$. The complexity of each iteration is O(Kn).

Clustering methods based on EM are popular because they are general and often highly effective. However, when many local optima are present in the likelihood space, the quality of the solution produced can be sensitive to the initial assignment of points to clusters. A larger difficulty for the anomaly detection domain is that the number of clusters to be sought must be known a priori, yet it is not clear how to determine the number of natural clusters in a set of network packets with their parameters. Furthermore, for large K the search time can be prohibitive [1].
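The E and M steps can be illustrated with a one-dimensional sketch. It assumes scalar data points, a fixed number of iterations instead of a convergence test, user-supplied starting parameters instead of random ones, and a small variance floor (`var_floor`) to keep a component from collapsing. None of these choices come from the paper, and all function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(data, lambdas, mus, variances):
    """Partial assignments gamma_i(k) from the Bayes formula."""
    gammas = []
    for x in data:
        weighted = [lam * normal_pdf(x, mu, var)
                    for lam, mu, var in zip(lambdas, mus, variances)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas


def m_step(data, gammas, var_floor=1e-6):
    """Re-estimate coefficients, means and variances from the assignments."""
    n, num_clusters = len(data), len(gammas[0])
    lambdas, mus, variances = [], [], []
    for k in range(num_clusters):
        weight = sum(g[k] for g in gammas)  # equals n * lambda_k
        mu = sum(g[k] * x for g, x in zip(gammas, data)) / weight
        var = sum(g[k] * (x - mu) ** 2 for g, x in zip(gammas, data)) / weight
        lambdas.append(weight / n)
        mus.append(mu)
        variances.append(max(var, var_floor))  # assumed floor, not in the paper
    return lambdas, mus, variances


def em(data, lambdas, mus, variances, iterations=20):
    """Alternate E and M steps a fixed number of times (assumed stop rule)."""
    for _ in range(iterations):
        gammas = e_step(data, lambdas, mus, variances)
        lambdas, mus, variances = m_step(data, gammas)
    # Final partial assignments under the fitted parameters.
    return lambdas, mus, variances, e_step(data, lambdas, mus, variances)


# Made-up data drawn around two centers.
DATA = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]
```

Starting from `em(DATA, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])`, the two means settle near the averages of the two groups. Different starting values can lead to a different local maximum, which is the sensitivity mentioned above.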
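Input:  $\{x_1, x_2, \ldots, x_n\}$ the data points, K the number of clusters
Output: $\gamma_i(k)$ for i = 1,..,n, k = 1,..,K
        $\mu_k, \sigma_k^2$ for k = 1,..,K, the parameters of the mixture components
        $\lambda_k$ for k = 1,..,K, the mixture coefficients
Initialize $\mu_k, \sigma_k^2, \lambda_k$ for k = 1,..,K with random values
Do
    E step
        for i = 1,..,n
            $\gamma_i(k) = \frac{\lambda_k f_k(x_i)}{\sum_{k'} \lambda_{k'} f_{k'}(x_i)}$ for k = 1,..,K
    M step
        for k = 1,..,K
            $\lambda_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_i(k)$
            $\mu_k = \frac{1}{n \lambda_k} \sum_{i=1}^{n} \gamma_i(k)\, x_i$
            $\sigma_k^2 = \frac{1}{n \lambda_k} \sum_{i=1}^{n} \gamma_i(k)\,(x_i - \mu_k)^2$
until convergence

Algorithm 3: Expectation-Maximization

3.4 Greedy Clustering Algorithm

The greedy clustering algorithm [1] builds individual clusters consecutively, attempting to minimize the criterion

$$\mathrm{val}(C) = \frac{1}{|C|} \sum_{x \in C} \sum_{y \in C} \mathrm{Dist}(x, y)$$

for each cluster C. Beginning with the initial point, the cluster grows by including the point which increases val(C) the least. Growth is stopped when the value reaches a local minimum. When the cluster is complete we define its center, i.e. the point which has the minimum distance to all other points in the cluster. Finally, the cluster is represented only by the center point and the mean radius. The complete clustering algorithm is similar to the single cluster construction. We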
sequentially select individual clusters by their ability to maximize the mean inter-cluster distance:

$$\mathrm{val}\{C_1, C_2, \ldots, C_K\} = \sum_{i,j} \mathrm{Dist}(C_{i,\mathrm{cent}}, C_{j,\mathrm{cent}})$$

We halt the clustering process when the inter-cluster value falls below a certain threshold. This parameter therefore determines how many clusters will be created: a small threshold results in many clusters and a large one in few clusters.

3.5 Our Algorithm

After considering all of these clustering methods and the number of network packets collected by recording the network traffic, which was greater than 00000, we had to find a computationally less demanding algorithm. First, we decided to discard all the packets whose combination of source IP, destination IP, port number and TCP flags appeared only once in the collection of packets. After that, we further reduced our collection by preserving only one packet among all those which had the same combination of source IP, destination IP, port number and TCP flags. In this way, we reduced the collection to only about 500 packets.

3.6 Similarity Measure

A decision about intrusion based on only one packet is certainly unreliable. Therefore, we decided to base the decision whether there is an intrusion or not on a sequence of packets. We considered different kinds of similarity functions [1] to compare the sequences of packets. Since an exact match between the involved sequences is not likely, we examined four variants of loosely matching similarity functions. Furthermore, we do not require all header data of two packets (one from the actual sequence and the other from the training examples sequence) to be the same, but at least the source IP, the destination IP, the port and the flags.

The first of the functions, denoted MC-P (Match Count Polynomial), simply counts the number of matching positions between the sequences. The next similarity function is denoted MC-E (Match Count Exponential). This function doubles its value for each matching position between the sequences. The next two similarity functions are based on the feeling that adjacent matches should have stronger weight.
Therefore we explored the MCA-P (Match Count Adjacency Polynomial) and the MCA-E (Match Count Adjacency Exponential) functions. The similarity measure computation is the same in all four cases (only the functions are different) - see Algorithm 4 below.

Set an adjacency counter c to one (c = 1) and the similarity measure to its initial value, Sim = $Sim_0$.
For each position j in the sequence of length l:
    If $X_j = Y_j$ then Sim = f(Sim, c) and c = u(c),
    otherwise c = 1.
After all positions are examined, return the measure value.

Algorithm 4: Similarity measure computation

We have a sequence of l actual packets $X = (x_1, x_2, \ldots, x_l)$ and a sequence of l training examples $Y = (y_1, y_2, \ldots, y_l)$. Finally, the definitions of $Sim_0$, f(Sim, c) and u(c) for all four types of similarity measure are given in Table 1 below.

            Sim_0    f(Sim, c)    u(c)
MC-P        0        Sim + 1      1
MCA-P       0        Sim + c      c + 1
MC-E        1        2 * Sim      1
MCA-E       0        Sim + c      2 * c

Table 1: Functions for different similarity measure computations

It was found that the similarity functions are statistically indistinguishable, so we used MC-P.

3.7 Other parts of VAL

We combined our VAL statistical detector with the GNU-licensed, rule-based, lightweight intrusion detector SNORT. It is used not only for rule-based detection but it serves also for
TCP network traffic capture. The traffic is stored in a MySQL database. From there it is accessed by the statistical detector, written in GNU C. The original database schema defined by SNORT is adjusted and extended. In this way, we produced a hybrid intrusion detection system. Furthermore, we incorporated the Linux iptables personal firewall to block hostile activities detected either by SNORT or by the statistical detector. Additionally, an e-mail is sent to the security administrator if an intrusion is detected. We also built a web management console, written in PHP, with access to the same MySQL database for administrative and informative purposes.

4 Results

To test the statistical detector, we used the Nessus Vulnerability Scanner and generated the attacks ourselves. We executed a whole range of attacks and obtained the following results:

- Total number of packets was 249003
- Number of captured packets was 234213 or 94.060%
- Number of not captured packets was 14790 or 5.940%
- Number of detected intrusive packets was 217103 or 95.273% (among captured)
- Number of undetected intrusive packets was 10772 or 4.727% (among captured)

The results are relatively satisfying. However, at a greater regular traffic load, the result would probably deteriorate. Also, if the attacker goes low and slow, most likely nothing would be detected. However, no statistical detector performs better in similar conditions.

5 Conclusion

Security threats to our computer systems can be reduced with the help of an intrusion detection system. A statistical detector performs intrusion detection by comparing the current activity with a knowledge base of regular activity. Our VAL statistical detector uses Instance-based Learning with the k-Nearest Neighbor method. To improve the quality of the gathered data and to reduce its quantity, we examined various clustering algorithms and finally used our own. Then we inspected functions for similarity measure computation. Since it was found that there is no significant difference in their quality, we used MC-P. We completed our solution with SNORT, the firewall and the management console.
The testing has shown that our detector, combined with the other components, secures a computer connected to the Internet quite well despite its simple construction.

Acknowledgement

The authors are thankful to Miha Strehar for sharing his experience with intrusion detection, and for his help with network traffic acquisition and solution testing.

References

[1] Terran Lane: Machine Learning Techniques for the Computer Security Domain of Anomaly Detection, A Thesis Submitted to the Faculty of Purdue University, 2000
[2] D. Aha, D. Kibler, M. Albert: Instance-Based Learning Algorithms, Machine Learning, 1991
[3] B. V. Dasarathy: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, 1991
[4] D.R. Wilson, T.R. Martinez: Reduction Techniques for Exemplar-Based Learning Algorithms, Machine Learning, 2000
[5] T.K. Moon: The Expectation-Maximization Algorithm, IEEE Signal Processing Magazine, 1996
[6] Stephen Northcutt: Network Intrusion Detection: An Analyst's Handbook, New Riders, 1999