Scan Detection in High-Speed Networks Based on Optimal Dynamic Bit Sharing

Tao Li, Shigang Chen, Wen Luo, Ming Zhang
Department of Computer & Information Science & Engineering, University of Florida

Abstract

Scan detection is one of the most important functions in intrusion detection systems. In order to keep up with the ever-higher line speed, the recent research trend is to implement scan detection in fast but small SRAM. This leads to a difficult technical challenge because the amount of traffic to be monitored is huge but the on-die memory space for performing such a monitoring task is very limited. We propose an efficient scan detection scheme based on dynamic bit sharing, which incorporates probabilistic sampling and bit sharing for compact information storage. We design a maximum likelihood estimation method to extract per-source information from the shared bits in order to determine the scanners. Our new scheme ensures that the false positive/false negative ratios are bounded with high probability. Moreover, given an arbitrary set of bounds, we develop a systematic approach to determine the optimal system parameters that minimize the amount of memory needed to meet the bounds. Experiments based on a real Internet traffic trace demonstrate that the proposed scan detection scheme reduces memory consumption by three to twenty times when compared with the best existing work.

I. INTRODUCTION

Many network-based attacks are preceded by a reconnaissance phase, in which the attacker or its zombies scan the hosts in a network to identify vulnerabilities. As a result, scan detection is one of the most fundamental functions in almost any network intrusion detection system (IDS). Cisco has been pushing for years to build security functions into its high-end routers. Scan detection is increasingly performed by routers with security modules or firewalls that inspect packets [1]. The throughput of high-speed links has increased from 10 Mbps to 100 Mbps, multi-gigabits per second, and even terabits per second [2]. Modern firewalls heavily rely on ASIC chips to avoid becoming bottlenecks in the routing paths.
In such a demanding environment, it is highly desirable to assist the fundamental IDS functions with hardware. Specifically, because scan detection requires storing a large amount of information extracted from the packets, DRAM can be too slow. Recent research suggests implementing this function in SRAM [3], [4], [5], [6]. On-die SRAM is fast but expensive. To achieve high speed, it is desirable to make on-die memory small: the access time for an SRAM of tens of kilobits is certainly much smaller than the access time for an SRAM of tens of megabits. Moreover, this high-speed memory has to be shared among other critical functions for routing, packet scheduling, traffic management and security purposes. The amount that will be allocated for online scan detection is likely to be a small fraction of the available SRAM. Therefore, it is extremely important to make scan detection, as well as other functions implemented in SRAM, memory-efficient.

We define a contact as a source-destination pair, for which the source sends a packet to the destination. The source or destination can be an IP address, a port number, or a combination of them together with other fields in the packet header. The spread of a source is the number of distinct destinations contacted by the source during a measurement period. A source is classified as a scanner if its spread exceeds a certain threshold. Therefore, scan detection is fundamentally an online traffic measurement problem.

One of the greatest challenges for scan detection is that the data volume to be stored can be huge. For example, the main gateway at our campus observes more than 10 million distinct source-destination pairs on an average day. Suppose each measurement period is one day long (in order to catch stealthy low-rate scanners). If we simply store all distinct source/destination address pairs for scan detection, it will require more than 80MB of SRAM, which is too much. A major thrust in the scan detection research is to reduce the memory consumption [3], [4], [6], [7], [8].

Reducing memory consumption does not come for free. The prior research sacrifices detection accuracy for memory saving.
The basic idea is to compress the contact information in limited memory space. The compressed information allows us to estimate the spreads of the sources, instead of counting them exactly. However, the estimated spread values may cause false positives (in which a non-scanner is mistakenly reported as a scanner) and false negatives (in which a scanner is not reported). Consequently, the following questions become important for any practical security system: How serious is the false positive/false negative problem? Can the system be configured such that the false positive/false negative ratios are bounded? To date, few papers directly addressed these questions.

The prior work follows two general methods for memory reduction: probabilistic sampling and storage sharing. The probabilistic sampling method is to record only a certain percentage of randomly sampled contacts. An example is the one-level/two-level algorithms proposed by Venkataraman et al. [3]. These algorithms store the source/destination addresses of the sampled contacts in hash tables. Their main contribution is to derive the optimal sampling probability that ensures with high probability that the false positive/false negative ratios do not exceed certain pre-defined bounds. However, it is not memory-efficient to directly store the
addresses of the contacts made by each source.

A naive solution is to use per-source counters to record the number of packets from each source. Near-optimal counter architectures such as counter braids [9] require only a few bits per source. The problem is that counters cannot remove duplicates: a thousand packets from the same source to the same destination should count as one contact, instead of a thousand. In order to remove duplicates, one may use Bloom filters [3] or bitmap algorithms [7]. They encode the contacts made by each source in a separate bitmap, which automatically filters duplicates. However, per-source bitmaps still take too much space. Cao et al. use a series of Bloom filters and a hash table to reduce the number of sources that need bitmaps [8].

Instead of using a separate bitmap for each source, an interesting space-saving method is to allow storage sharing, where each data structure is no longer dedicated to a single source but shared among multiple sources. This is particularly necessary when the number of sources is more than the number of available bits. Zhao et al. [6] encode each contact in three shared bitmaps using a technique similar to Bloom filters. Yoon et al. [4] design another storage sharing method with superior performance. Although both methods can be used for scan detection, neither of them provides any means to ensure that the false positive/false negative ratios are bounded. Moreover, our experiments show that these existing methods [8], [6], [4] take far more memory than the one proposed in this paper. Also related is the work by Bandi et al. [10] using TCAM.

Another research branch [10], [11], [12], [13], [14], [15], [16], [17], [18] is to find heavy hitters, i.e., sources that send a lot of packets. A heavy hitter may have a spread value of just one because it may send all its packets to the same destination. Hence, heavy hitters and scanners are very different.
This paper proposes an efficient scan detection scheme based on a new storage sharing method, called dynamic bit sharing, which shares the available bits uniformly at random among all sources, such that the memory space is fully utilized for storing contact information. It employs a maximum likelihood estimation method to extract per-source information from the shared bits in order to determine the scanners. It also enhances security through a private key. Our new method ensures that the false positive/false negative ratios are bounded. Moreover, given an arbitrary set of bounds, we show analytically how to choose the optimal system parameters such that the amount of memory needed to satisfy the bounds is minimized. We also perform experiments based on a real traffic trace and demonstrate that, using these optimal parameters, we can reduce the memory consumption by three to twenty times when compared with the best existing work.

II. PROBLEM STATEMENT

The number of distinct destination addresses that an external source has contacted is called the spread of the source. The problem of scan detection is to configure a firewall or an intrusion detection system to report all external sources whose spreads exceed a certain threshold during a measurement period. We refer to these sources as potential scanners (or scanners for short).

If a firewall or an IDS keeps the exact count of distinct destinations that each source has contacted, it is able to report the scanners precisely. However, keeping track of per-source information consumes a large amount of resources. The limited SRAM may only allow us to estimate a rough count of distinct destinations that each source contacts [3], [4], [6]. When precisely reporting scanners is infeasible, the function of scan detection must be defined in probabilistic terms.

We adopt the probabilistic performance objective from [3]. Let $h$ and $l$ be two positive integers, $h > l$. Let $\alpha$ and $\beta$ be two probability values, $0 < \alpha < 1$ and $0 < \beta < 1$.
The objective is to report any source whose spread is $h$ or larger with a probability no less than $\alpha$ and report any source whose spread is $l$ or smaller with a probability no more than $\beta$. Let $k$ be the spread of an arbitrary source $src$. The objective can be expressed in terms of conditional probabilities:

$$\begin{aligned} \mathrm{Prob}\{\text{report } src \text{ as a scanner} \mid k \ge h\} &\ge \alpha, \\ \mathrm{Prob}\{\text{report } src \text{ as a scanner} \mid k \le l\} &\le \beta. \end{aligned} \tag{1}$$

We treat the report of a source whose spread is $l$ or smaller as a false positive, and the non-report of a source whose spread is $h$ or larger as a false negative. Hence, the above objective can also be stated as bounding the false positive ratio by $\beta$ and the false negative ratio by $1 - \alpha$. Our goal is to minimize the amount of SRAM that is needed for achieving the above objective.

The memory requirement for detecting aggressive scanners is likely to be small. For example, suppose an aggressive scanner makes 100 distinct contacts each second, whereas a normal host rarely makes 100 distinct contacts in a day. To detect such a scanner, a firewall can set the measurement period to be a second. The number of contacts that pass the firewall in such a small period is likely to be small. Consequently, it does not need much memory to store them.

However, the situation is totally different for stealthy scanners that make contacts at low rates. Consider a scanner that makes 500 distinct contacts a day. If the measurement period is a day, we are able to set it apart from the normal hosts. However, if the measurement period is a second, we will not detect this scanner because it makes less than 0.006 contact per second on average.

In order to detect different types of scanners, a firewall may execute multiple instances of a scan detection function simultaneously, each having a different measurement period. For aggressive scanners, a small period will be chosen so that they can be detected in real time. For stealthy scanners, a large period will be chosen. In the latter case, timely detection is of second priority because the scanners themselves operate slowly.
But the memory requirement is of first priority due to the large number of contacts that are expected to pass through the firewall in a long measurement period. Reducing memory consumption is the focus of this paper.

III. AN EFFICIENT SCAN DETECTION SCHEME

This section presents an efficient scan detection scheme (ESD), which is the combination of probabilistic sampling,
dynamic bit sharing, and maximum likelihood estimation for scanner report.

A. Probabilistic Sampling

To save resources, a firewall (or IDS) samples the contacts made by external sources to internal destinations, and it only stores the sampled contacts. The firewall selects contacts for storage uniformly at random with a sampling probability $p$. The sampling procedure is simple: the firewall hashes the source/destination address pair of each packet that arrives at the external network interface into a number in a range $[0, N)$. If the hash result is smaller than $p \cdot N$, the contact will be stored; otherwise, the contact will not be stored.

B. Bit-Sharing Storage

A bit array (also called bitmap) may be used to store all sampled contacts made by a source [7]. The bits are initially zeros. Each sampled contact is hashed to a bit in the bitmap, and the bit is set to one. At the end of the measurement period, we can estimate the number of contacts, i.e., the spread of the source, based on the number of zeros remaining in the bitmap.

Using per-source bitmaps is not memory-efficient. On one hand, the size of each bitmap has to be large enough to ensure the accuracy in estimating the spread values of the scanners. On the other hand, the vast majority of normal sources have small spread values and their bitmaps are largely wasted because most bits remain zeros. To solve this problem, we want to put those wasted bits to good use by allowing bitmaps to share their bits.

To fully share the available bits, ESD stores contacts from different sources in a single bit array $B$. Let $m$ be the number of bits in $B$. For an arbitrary source $src$, we use a hash function to pseudo-randomly select a number of bits from $B$ to store the contacts made by $src$. The indices of the selected bits are $H(src \oplus R[0])$, $H(src \oplus R[1])$, ..., $H(src \oplus R[s-1])$, where $H(\ldots)$ is a hash function whose range is $[0, m)$, $R$ is an integer array storing randomly chosen constants whose purpose is to arbitrarily alter the hash result, and $s$ ($\le m$) is a system parameter that specifies the number of bits to be selected. The above bits form a logical bitmap of source $src$, denoted as $LB(src)$.
Similarly, a logical bitmap can be constructed from $B$ for any other source. Essentially, we embed the bitmaps of all possible sources in $B$. The bit-sharing relationship is dynamically determined on the fly as each new source $src'$ whose contacts are sampled by the firewall will be allocated a logical bitmap $LB(src')$ from $B$.

At the beginning of a measurement period, all bits in $B$ are reset to zeros. Consider an arbitrary contact $\langle src, dst \rangle$ that is sampled for storage, where $src$ is the source address and $dst$ is the destination address. The firewall sets a single bit in $B$ to one. Obviously, it must also be a bit in the logical bitmap $LB(src)$. The index of the bit to be set for this contact is given as follows:

$$H(src \oplus R[H(dst \oplus K) \bmod s]).$$

The second hash, $H(dst \oplus K)$, ensures that the bit is pseudo-randomly selected from $LB(src)$. The private key $K$ is introduced to prevent hash collision attacks. In such an attack, a scanner $src$ finds a set of destination addresses, $dst_1$, $dst_2$, ..., that have the same hash value, $H(dst_1) = H(dst_2) = \ldots$ If it only contacts these destinations, the same bit in $LB(src)$ will be set, which allows the scanner to stay undetected. This type of attack can be prevented if we use a cryptographic hash function such as MD5 or SHA1, which makes it difficult to find destination addresses that have the same hash value. However, if a weaker hash function is used for performance reasons, then a private key becomes necessary. Without knowing the key, the scanners will not be able to predict which destination addresses produce the same hash value.

To store a contact, ESD only sets a single bit and performs two hash operations. This is more efficient than the methods that use hash tables [3] or have features similar to Bloom filters that require setting multiple bits for storing each contact [6].

C. Maximum Likelihood Estimation and Scanner Report

At the end of the measurement period, ESD will send the content of $B$ to an offline data processing center. There, the logical bitmap of each source $src$ is extracted and the estimated spread $\hat{k}$ of the source is computed.
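As an illustration, the sampling test of Section III-A and the single-bit update of Section III-B can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `BitSharingStore`, the choice of BLAKE2b as the hash function $H$, the range $N = 2^{32}$, integer-valued addresses, and the use of XOR to combine a value with $R$ or $K$ are all assumptions made for the example.

```python
import hashlib


class BitSharingStore:
    """Toy model of the shared bit array B (all names here are hypothetical)."""

    def __init__(self, m, s, p, key, constants):
        self.m = m                    # number of bits in B
        self.s = s                    # size of every logical bitmap
        self.p = p                    # sampling probability
        self.key = key                # private key K (an integer)
        self.R = list(constants)      # s integer constants
        assert len(self.R) == s
        # One byte per bit keeps the sketch readable; real SRAM would pack bits.
        self.bits = bytearray(m)

    def _hash(self, value, modulus):
        # Assumption: BLAKE2b stands in for the hash function H.
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % modulus

    def sampled(self, src, dst):
        # Hash the address pair into [0, N) and keep it if the result is below p*N.
        N = 1 << 32
        return self._hash((src, dst), N) < self.p * N

    def bit_index(self, src, dst):
        # Index H(src XOR R[H(dst XOR K) mod s]) of the bit to set.
        slot = self._hash(dst ^ self.key, self.s)
        return self._hash(src ^ self.R[slot], self.m)

    def record(self, src, dst):
        """Store one contact; return True only if a bit changed from 0 to 1."""
        if not self.sampled(src, dst):
            return False
        idx = self.bit_index(src, dst)
        changed = self.bits[idx] == 0
        self.bits[idx] = 1
        return changed

    def logical_bitmap(self, src):
        # The s bits of B that form LB(src).
        return [self.bits[self._hash(src ^ r, self.m)] for r in self.R]
```

Duplicate packets of the same contact hash to the same bit, so `record` has no further effect after the first sampled packet.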
Only if $\hat{k}$ is greater than a threshold value $T$ does ESD report the source as a potential scanner. We will discuss how to keep track of the source addresses in Section III-D, and explain how to determine the threshold $T$ in Section IV. Below we derive the formula for $\hat{k}$.

Let $k$ be the true spread of source $src$, and $n$ be the number of distinct contacts made by all sources. Let $V$ be the fraction of bits in $B$ whose values are zeros at the end of the measurement period, $V_s$ be the fraction of bits in $LB(src)$ whose values are zeros, and $U_s$ be the number of bits in $LB(src)$ whose values are zeros. Clearly, $V_s = U_s / s$. Depending on the context, $V$ (or $V_s$, $U_s$) is used either as a random variable or an instance value of the random variable. The probability for any contact to be sampled for storage is $p$.

Consider an arbitrary bit $b$ in $LB(src)$. A sampled contact made by $src$ has a probability of $1/s$ to set $b$ to 1, and a sampled contact made by any other source has a probability of $1/m$ to set $b$ to 1. Hence, the probability $q(k)$ for $b$ to remain 0 at the end of the measurement period is

$$q(k) = \left(1 - \frac{p}{m}\right)^{n-k} \left(1 - \frac{p}{s}\right)^{k}. \tag{2}$$

Each bit in $LB(src)$ has a probability of $q(k)$ to remain 0. The observed number of 0 bits in $LB(src)$ is $U_s$. The likelihood function for this observation to occur is given as follows:

$$L = q(k)^{U_s} \, (1 - q(k))^{s - U_s}. \tag{3}$$

In the standard process of maximum likelihood estimation, the unknown value $k$ is technically treated as a variable in (3). We want to find an estimate $\hat{k}$ that maximizes the likelihood function. Namely,

$$\hat{k} = \arg\max_{k} \{L\}. \tag{4}$$
Since the maximum is not affected by monotone transformations, we use the logarithm to turn the right side of (3) from product to summation:

$$\ln(L) = U_s \ln(q(k)) + (s - U_s) \ln(1 - q(k)).$$

From (2), the above equation can be written as

$$\ln(L) = U_s \left( (n-k) \ln\left(1 - \frac{p}{m}\right) + k \ln\left(1 - \frac{p}{s}\right) \right) + (s - U_s) \ln\left( 1 - \left(1 - \frac{p}{m}\right)^{n-k} \left(1 - \frac{p}{s}\right)^{k} \right).$$

To find the maximum, we differentiate both sides:

$$\frac{\partial \ln(L)}{\partial k} = \ln\left( \frac{1 - \frac{p}{s}}{1 - \frac{p}{m}} \right) \cdot \frac{U_s - s \left(1 - \frac{p}{m}\right)^{n-k} \left(1 - \frac{p}{s}\right)^{k}}{1 - \left(1 - \frac{p}{m}\right)^{n-k} \left(1 - \frac{p}{s}\right)^{k}}. \tag{5}$$

We then let the right side be zero. That is,

$$U_s = s \left(1 - \frac{p}{m}\right)^{n-k} \left(1 - \frac{p}{s}\right)^{k}. \tag{6}$$

Taking logarithm on both sides, we have

$$\ln \frac{U_s}{s} = n \ln\left(1 - \frac{p}{m}\right) + k \left( \ln\left(1 - \frac{p}{s}\right) - \ln\left(1 - \frac{p}{m}\right) \right),$$

$$k = \frac{\ln V_s - n \ln\left(1 - \frac{p}{m}\right)}{\ln\left(1 - \frac{p}{s}\right) - \ln\left(1 - \frac{p}{m}\right)}, \tag{7}$$

where $V_s = U_s / s$. Suppose the number of sources (which equals the number of logical bitmaps) is sufficiently large. Because every bit in every logical bitmap is randomly selected from $B$, in this sense, each of the $n$ contacts has about the same probability $p/m$ of setting any bit in $B$. Hence, we have

$$E(V) = \left(1 - \frac{p}{m}\right)^{n}. \tag{8}$$

Applying (8) to (7), we have

$$k = \frac{\ln V_s - \ln E(V)}{\ln\left(1 - \frac{p}{s}\right) - \ln\left(1 - \frac{p}{m}\right)}. \tag{9}$$

Replacing $E(V)$ by the instance value $V$, we have the following estimation for $k$.

$$\hat{k} = \frac{\ln V_s - \ln V}{\ln\left(1 - \frac{p}{s}\right) - \ln\left(1 - \frac{p}{m}\right)}, \tag{10}$$

where $V_s$ can be measured by counting the number of zeros in $LB(src)$, $V$ can be measured by counting the number of zeros in $B$, and $s$, $p$ and $m$ are pre-set parameters of ESD (see the next section).

D. Source Addresses

ESD does not store the source address of every arriving packet. Instead, it stores a source address only when a contact sets a bit in $B$ from 0 to 1. Hence, the frequency of storing source addresses is much smaller than the frequency at which contacts are sampled for setting bits in $B$. First, numerous packets may be sent from a source to a destination in a TCP/UDP session. Only the first sampled packet may cause the source address to be stored because only the first packet sets a bit from 0 to 1 and the remaining packets will set the same bit (which is already 1).
Second, a source may send thousands or even millions of packets through a firewall, but the number of times its address will be stored is bounded by $s$ (which is the number of bits in the source's logical bitmap). In summary, because the operation of storing source addresses is relatively infrequent, these addresses can be stored in the main memory.

IV. OPTIMAL SYSTEM PARAMETERS AND MINIMUM MEMORY REQUIREMENT

In this section, we first develop the constraints that the system parameters must satisfy in order to achieve the probabilistic performance objective. Based on the constraints, we determine the optimal values for the size $s$ of the logical bitmaps, the sampling probability $p$, and the threshold $T$. We also determine the minimum amount of memory $m$ that should be allocated for ESD to achieve the performance objective. Recall that on-die SRAM may be shared by other functions.

A. Report Probability

Consider an arbitrary source $src$ whose spread is $k$. Given a set of system parameters, $m$, $s$, $p$ and $T$, we derive the probability for ESD to report $src$ as a scanner, i.e., $\mathrm{Prob}\{\hat{k} \ge T\}$. From (10), we know that the following inequalities are equivalent. (The direction of the inequality flips in the last step because the denominator is negative when $s < m$.)

$$\hat{k} \ge T \iff \frac{\ln V_s - \ln V}{\ln\left(1 - \frac{p}{s}\right) - \ln\left(1 - \frac{p}{m}\right)} \ge T \iff V_s \le V \left( \frac{1 - \frac{p}{s}}{1 - \frac{p}{m}} \right)^{T}$$

Let $U_s$ be the random variable for the number of 0 bits in $LB(src)$, with $U_s = s \cdot V_s$. The above inequality becomes

$$U_s \le s \cdot V \left( \frac{1 - \frac{p}{s}}{1 - \frac{p}{m}} \right)^{T}. \tag{11}$$

For a set of parameters $m$, $s$, $p$ and $T$, we define a constant $C = s \cdot V \left( \frac{1 - p/s}{1 - p/m} \right)^{T}$, where the instance value of $V$ can be measured from $B$ after the measurement period. Hence, the probability for ESD to report $src$ is $\mathrm{Prob}\{\hat{k} \ge T\} = \mathrm{Prob}\{U_s \le C\}$.

$U_s$ follows the binomial distribution with parameters $s$ and $q(k)$, where $q(k)$ in (2) is the probability for an arbitrary bit in $LB(src)$ to remain zero at the end of the measurement period. Hence, the probability of having exactly $i$ zeros in $LB(src)$ is given by the following probability mass function:

$$\mathrm{Prob}\{U_s = i\} = \binom{s}{i} q(k)^{i} (1 - q(k))^{s-i}. \tag{12}$$

We must have

$$\mathrm{Prob}\{\hat{k} \ge T\} = \mathrm{Prob}\{U_s \le C\} = \sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(k)^{i} (1 - q(k))^{s-i}. \tag{13}$$
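To make equations (2), (10) and (13) concrete, the following Python sketch computes the estimate $\hat{k}$ and the report probability. It is an illustration under stated assumptions rather than the authors' code: the function names and parameter values are hypothetical, and when no measured $V$ is supplied the sketch substitutes the expectation from (8).

```python
import math


def zero_probability(k, n, m, s, p):
    """q(k) of equation (2): chance that a bit of LB(src) is still 0."""
    return (1 - p / m) ** (n - k) * (1 - p / s) ** k


def estimate_spread(v_s, v, m, s, p):
    """Estimate of equation (10) from the zero fractions V_s and V.

    math.log raises ValueError when v_s is 0, i.e. when every bit of the
    logical bitmap is set and the estimate is undefined.
    """
    return (math.log(v_s) - math.log(v)) / (
        math.log(1 - p / s) - math.log(1 - p / m)
    )


def report_probability(k, n, m, s, p, T, v=None):
    """Equation (13): Prob{U_s <= C} with U_s ~ Binomial(s, q(k))."""
    if v is None:
        v = (1 - p / m) ** n          # assumption: use E(V) from equation (8)
    c = s * v * ((1 - p / s) / (1 - p / m)) ** T
    q = zero_probability(k, n, m, s, p)
    upper = min(s, math.floor(c))
    return sum(
        math.comb(s, i) * q ** i * (1 - q) ** (s - i) for i in range(upper + 1)
    )
```

For example, with the hypothetical values `n = m = 10**6`, `s = 200`, `p = 0.5` and `T = 300`, `report_probability(500, ...)` is close to 1 and `report_probability(100, ...)` is close to 0, matching the intent that sources well above the threshold are reported and sources well below it are not.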
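Fig. 1. [Plot of $Std(V)/E(V)$ versus $m$ (in units of 100KB) for load factors LF = 0.5, 1 and 2.] The relative standard deviation, $Std(V)/E(V)$, approaches zero as $m$ increases. The load factor (LF) is defined as $n \cdot p / m$, where $n \cdot p$ is the number of distinct contacts that are sampled by ESD for storage. In our experiments (reported in Section V), when we use the system parameters determined by the algorithm proposed in this section, the load factor never exceeds 2.

B. Constraints for the System Parameters

We derive the constraints that the system parameters must satisfy in order to achieve the performance objective in (1). First, we give the variance of $V$, which is derived in Appendix A.

$$Var(V) \approx \frac{1}{m} e^{-\frac{np}{m}} \left( 1 - \left(1 + \frac{np^2}{m}\right) e^{-\frac{np}{m}} \right). \tag{14}$$

It approaches zero as $m$ increases. In Figure 1, we plot the ratio of the standard deviation $Std(V) = \sqrt{Var(V)}$ to $E(V)$, which can be found in (8). The figure shows that $Std(V)/E(V)$ is very small when $m$ is reasonably large. In this case, we can approximately treat $V$ as a constant.

$$V \approx E(V) = \left(1 - \frac{p}{m}\right)^{n}. \tag{15}$$

The probabilistic performance objective can be stated as two requirements. First, the probability for ESD to report a source with $k \ge h$ must be at least $\alpha$. That is,

$$\mathrm{Prob}\{\hat{k} \ge T\} \ge \alpha, \quad \forall k \ge h.$$

From (13), this requirement can be written as the following inequality:

$$\sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(k)^{i} (1 - q(k))^{s-i} \ge \alpha,$$

where $C = s \cdot V \left( \frac{1 - p/s}{1 - p/m} \right)^{T} \approx s \left(1 - \frac{p}{m}\right)^{n} \left( \frac{1 - p/s}{1 - p/m} \right)^{T}$. The left side of the inequality is an increasing function in $k$. Hence, to satisfy the requirement in the worst case when $k = h$, the following constraint for the system parameters must be met:

$$\sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(h)^{i} (1 - q(h))^{s-i} \ge \alpha. \tag{16}$$

Second, the probability for ESD to report a source with $k \le l$ must be no more than $\beta$. This requirement can be similarly converted into the following constraint:

$$\sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(l)^{i} (1 - q(l))^{s-i} \le \beta. \tag{17}$$

C. Optimal System Parameters

Our goal is to optimize the system parameters such that the memory requirement, $m$, is minimized under the constraints (16) and (17). The problem is formally defined as follows.

$$\begin{aligned} &\text{Minimize } m \\ &\text{Subject to } \sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(h)^{i} (1 - q(h))^{s-i} \ge \alpha, \\ &\phantom{\text{Subject to }} \sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(l)^{i} (1 - q(l))^{s-i} \le \beta, \\ &\phantom{\text{Subject to }} C = s \left(1 - \frac{p}{m}\right)^{n} \left( \frac{1 - \frac{p}{s}}{1 - \frac{p}{m}} \right)^{T}. \end{aligned} \tag{18}$$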
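The claim behind approximation (15), that $Std(V)/E(V)$ shrinks as $m$ grows, can be checked numerically. The Python sketch below evaluates the ratio from the variance approximation $Var(V) \approx \frac{1}{m} e^{-np/m} (1 - (1 + np^2/m) e^{-np/m})$ and from $E(V) = (1 - p/m)^n$. The function name and the chosen memory sizes are assumptions for illustration, and the variance expression is used in the form stated in the code comment.

```python
import math


def relative_std_of_v(n, m, p):
    """Std(V)/E(V) for n distinct contacts, m bits and sampling probability p."""
    load = n * p / m                                  # load factor LF = n*p/m
    expected = (1 - p / m) ** n                       # E(V), equation (8)
    # Var(V) ~ (1/m) e^{-LF} (1 - (1 + n p^2 / m) e^{-LF}), with n p^2 / m = LF * p
    variance = math.exp(-load) * (1 - (1 + load * p) * math.exp(-load)) / m
    return math.sqrt(variance) / expected


if __name__ == "__main__":
    bits_per_100kb = 100 * 1024 * 8
    for units in (1, 5, 10):                          # memory in units of 100KB
        m = units * bits_per_100kb
        n = m                                         # load factor 1 when p = 1
        print(units, relative_std_of_v(n, m, 1.0))
```

At a fixed load factor the ratio falls roughly as $1/\sqrt{m}$, which is why treating $V$ as a constant becomes safer for larger bit arrays.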
The parameters $h$, $l$, $\alpha$ and $\beta$ are specified in the performance objective. The value of $n$ is decided based on the history data in the past measurement periods. To be conservative, we take the maximum number $n$ of distinct contacts observed in a number of previous measurement periods. More specifically, (8) can be turned into a formula for estimating $n$ in each previous period if we replace $E(V)$ with the instance value $V$.

$$\hat{n} = -\frac{m}{p} \ln V \tag{19}$$

We derive the relative bias and the relative standard deviation of the above estimation.

$$Bias\left(\frac{\hat{n}}{n}\right) = E\left(\frac{\hat{n}}{n}\right) - 1 \approx \frac{e^{\frac{np}{m}} - \frac{np^2}{m} - 1}{2np} \tag{20}$$

$$Std\left(\frac{\hat{n}}{n}\right) = \frac{\sqrt{m}}{np} \left( e^{\frac{np}{m}} - \frac{np^2}{m} - 1 \right)^{1/2} \tag{21}$$

They both approach zero as $m$ increases. Based on the largest $\hat{n}$ value observed in a certain number of past measurement periods, we can set the value of $n$.

To solve the constrained optimization problem (18), we need to determine the optimal values of the remaining three system parameters, $s$, $p$ and $T$, such that $m$ will be minimized. We consider the left side of (16) as a function $F_h(m, s, p, T)$, and the left side of (17) as $F_l(m, s, p, T)$. Namely,

$$F_h(m, s, p, T) = \sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(h)^{i} (1 - q(h))^{s-i},$$

$$F_l(m, s, p, T) = \sum_{i=0}^{\lfloor C \rfloor} \binom{s}{i} q(l)^{i} (1 - q(l))^{s-i}.$$

Both of them are non-increasing functions in $T$, according to the relation between $C$ and $T$.

In the following, we present an iterative numerical algorithm to solve the optimization problem. The algorithm consists of four procedures.

First, we construct a procedure called $Potential(m, s, p)$, which takes a value of $m$, a value of $s$ and a value of $p$ as input and returns the maximum value of $F_h(m, s, p, T)$ under the condition that $F_l(m, s, p, T) \le \beta$ is satisfied. Because $F_h(m, s, p, T)$ is a non-increasing function in $T$, we need to find the smallest value of $T$ that satisfies $F_l(m, s, p, T) \le \beta$.
That can be done numerically through binary search: Pick a small integer $T_1$ such that $F_l(m, s, p, T_1) > \beta$ and a large integer $T_2$ such that $F_l(m, s, p, T_2) \le \beta$. We iteratively shrink the difference between them by resetting one of them to be the average $\lfloor (T_1 + T_2)/2 \rfloor$, while maintaining the inequalities $F_l(m, s, p, T_1) > \beta$ and $F_l(m, s, p, T_2) \le \beta$. The process stops when $T_1$ and $T_2$ are adjacent integers; $T_2$ is then the smallest threshold that satisfies the constraint, and it is denoted as $T^*$. The procedure $Potential(m, s, p)$ returns $F_h(m, s, p, T^*)$. The pseudo code is presented in Algorithm 1. We omit the pseudo code for the other three procedures to save space.

Algorithm 1 $Potential(m, s, p)$
INPUT: $m$, $s$, $p$ and $\beta$
OUTPUT: The maximum value of $F_h(m, s, p, T)$ under the condition that $F_l(m, s, p, T) \le \beta$.
  Pick a small integer $T_1$ such that $F_l(m, s, p, T_1) > \beta$ and a large integer $T_2$ such that $F_l(m, s, p, T_2) \le \beta$.
  while $T_2 - T_1 > 1$ do
    $T = \lfloor (T_1 + T_2)/2 \rfloor$
    if $F_l(m, s, p, T) > \beta$ then
      $T_1 = T$
    else
      $T_2 = T$
    end if
  end while
  $T^* = T_2$
  return $F_h(m, s, p, T^*)$

Essentially, what $Potential(m, s, p)$ returns is the maximum value of the left side in (16) under the condition that (17) is satisfied. The difference between $Potential(m, s, p)$ and $\alpha$ provides us with a quantitative indication of how conservative or aggressive we have been in choosing the value of $m$. If $Potential(m, s, p) - \alpha$ is positive, it means that the performance achieved by the current memory size is more than required. We shall reduce $m$. On the contrary, if $Potential(m, s, p) - \alpha$ is negative, we shall increase $m$. Given the above semantics, when we determine the optimal values for $p$ and $s$, our goal is certainly to maximize the return value of $Potential(m, s, p)$.

Second, given a value of $m$ and a value of $s$, we construct a procedure $OptimalP(m, s)$ that determines the optimal value $p^*$ such that $Potential(m, s, p^*)$ is maximized. When the values of $m$ and $s$ are fixed, $Potential(m, s, p)$ becomes a function of $p$. It is a curve as illustrated in Figure 2; see explanation under the caption (a) and ignore the arrows in the figure for now. We use a binary search algorithm to find a near-optimal value of $p$. Let $p_1 = 0$ and $p_2 = 1$. Let $\delta$ be a small positive value (such as 0.001). Repeat the following operation: Let $p = (p_1 + p_2)/2$.
If $Potential(m, s, p) < Potential(m, s, p + \delta)$, set $p_1$ to be $p$; otherwise, set $p_2$ to be $p$. The above iterative operation stops when $p_2 - p_1 < \delta$. The procedure $OptimalP(m, s)$ returns $(p_1 + p_2)/2$, which is within $\pm\delta/2$ of the optimal. This difference can be made arbitrarily small when we decrease $\delta$ at the expense of increased computation overhead. We want to stress that it is one-time overhead (not online overhead) to determine the system parameters before deployment. The operation of $OptimalP(m, s)$ is illustrated by the arrows in Figure 2; see explanation under the caption (b).

Fig. 2. [Plot of $Potential(m, s, p)$ versus $p$.] (A) The curve (without the arrows) shows the value of $Potential(m, s, p)$ with respect to $p$ when $m$ = 0.45MB and $s$ = 150. Its non-smooth appearance is due to $\lfloor C \rfloor$ in the formula of $F_h(m, s, p, T^*)$. $F_h(m, s, p, T^*)$ depends on the values of $\lfloor C \rfloor$ and $q(h)$, which are both functions of $p$. (B) The arrows illustrate the operation of $OptimalP(m, s)$. In the first iteration (arrow 1), $p_2$ is set to be $(p_1 + p_2)/2$. In the second iteration (arrow 2), $p_1$ is set to be $(p_1 + p_2)/2$. In the third iteration (arrow 3), $p_2$ is set to be $(p_1 + p_2)/2$.

Fig. 3. [Plot of $Potential(m, s, OptimalP(m, s))$ versus $s$.] The value of $Potential(m, s, OptimalP(m, s))$ with respect to $s$ when $m$ = 0.25MB.

Fig. 4. [Log-log plot of the number of sources versus the spread $k$.] Traffic distribution: each point shows the number of sources having a certain spread value.

Third, given a value of $m$, we construct a procedure $OptimalS(m)$ that determines the optimal value $s^*$ such that $Potential(m, s^*, OptimalP(m, s^*))$ is maximized. When the value of $m$ is fixed, $Potential(m, s, OptimalP(m, s))$ becomes a function of $s$. It is a curve as illustrated in Figure 3. We can use a binary search algorithm similar to that of $OptimalP(m, s)$ to find $s^*$.

Fourth, we construct a procedure $OptimalM()$ that determines the minimum memory requirement through binary search: Denote $Potential(m, OptimalS(m), OptimalP(m, OptimalS(m)))$ as $Potential(m, \ldots)$. Pick a small value $m_1$ such that $Potential(m_1, \ldots) <$
$\alpha$, which means that the performance objective is not met; more specifically, according to the semantics of $Potential(\ldots)$, the constraint (16) cannot be satisfied if the constraint (17) is satisfied. Pick a large value
$m_2$ such that $Potential(m_2, \ldots) \ge \alpha$, which means that the performance objective is met. Repeat the following operation. Let $m = (m_1 + m_2)/2$. If $Potential(m, \ldots) < \alpha$, set $m_1$ to be $m$; otherwise, set $m_2$ to be $m$. The above iterative operation terminates when $m_1 = m_2$, which is returned by the procedure $OptimalM()$.

In practice, a network administrator will first define the performance objective that is specified by $\alpha$, $\beta$, $h$ and $l$. He or she sets the value of $n$ based on history data, and then sets $m = OptimalM()$, $s = OptimalS(m)$, $p = OptimalP(m, s)$ and $T$ as the threshold value $T^*$ before the last call to $Potential(m, s, p)$ is returned during the execution of $OptimalM()$. After the firewall (or IDS) is configured with these parameters and begins to measure the network traffic, it also monitors the value of $n$. If the maximum number of distinct contacts in a measurement period changes significantly, the values of $m$, $s$, $p$ and $T$ will be recomputed.

V. EXPERIMENTS

A. Experimental Setup

We evaluate the performance of ESD and compare it with the existing work, including the Two-level Filtering Algorithm (TFA) [3], the Thresholded Bitmap Algorithm (TBA) [8], and the Compact Spread Estimator (CSE) [4]. TFA uses two filters to reduce both the number of sources to be monitored and the number of contacts to be stored. It is designed to satisfy the probabilistic performance objective in (1). TBA is not designed for meeting the probabilistic performance objective. It cannot ensure that the false positive/false negative ratios are bounded. CSE is designed to estimate the spreads of the external sources in a very compact memory space. It can be used for scan detection by reporting the sources whose estimated spreads exceed a certain threshold. However, the design of CSE makes it unsuitable for meeting the objective in (1).

Online Streaming Module (OSM) [6] is another related work. We do not implement OSM in this study because Yoon et al. show that, given the same amount of memory, CSE estimates spread values more accurately than OSM [4]. Moreover, the operations of OSM share certain similarity with Bloom filters.
To store each contact, it performs three hash functions and makes three memory accesses. In comparison, ESD performs two hash functions and makes one memory access.

The experiments use a real Internet traffic trace captured by Cisco's NetFlow at the main gateway of our campus for a week. For example, in one day of the week, the traffic trace records 10,702,677 distinct contacts, 4,007,256 distinct source IP addresses and 56,167 distinct destination addresses. The average spread per source is 2.67, which means a source contacts 2.67 distinct destinations on average. Figure 4 shows the number of sources with respect to the source spread in log scale. The number of sources decreases exponentially as the spread value increases from 1 to 500. After that, there is zero, one or a few sources for each spread value.

We implement ESD, TFA, TBA and CSE, and execute them with the traffic trace as input. As part of the setup in each experiment, the values of $h$ and $l$ are given to specify what to report as scanners. For example, if $h = 500$ and $l = 0.7h$, the sources whose spreads are 500 or more should be reported, and the sources whose spreads are 350 or less should not be reported. In the experiments, the source of a contact is the IP address of the sender and the destination is the IP address of the receiver. The measurement period is one day. A long measurement period helps to separate low-rate scanners from normal hosts. The experimental results are the average over the week-long data.

One performance metric used in comparison is the amount of memory that is required for a scan detection scheme to meet a given probabilistic performance objective. Remarkably, the number of bits required by ESD is far smaller than the number of distinct sources in the traffic trace. That is, ESD requires much less than 1 bit per source to perform scan detection. Other performance metrics include the false positive ratio and the false negative ratio, which will be explained further shortly.

B.
Comparison in Terms of Memory Requirement

The first set of experiments compares ESD and TFA for the amount of memory that they need in order to satisfy a given probabilistic performance objective, which is specified by four parameters, $\alpha$, $\beta$, $h$, and $l$. See Section II for the formal definition of the performance objective. We do not compare TBA and CSE here because they are not designed to meet this objective.

The memory required by ESD is determined based on the iterative algorithm in Section IV-C. The values of other parameters, $s$, $T$ and $p$, are decided by the same algorithm. Using these parameters, we perform experiments on ESD with the traffic trace as input, and the experimental results confirm that the performance objective is indeed achieved for each day during the week. The amount of memory required by TFA is determined experimentally based on the method in [3] together with the traffic trace. The parameters of TFA are chosen based on the original paper.

The memory requirements of ESD and TFA are presented in Tables I-III with respect to $\alpha$, $\beta$, $h$ and $l$. For $\alpha = 0.9$ and $\beta = 0.1$, Table I shows that TFA requires six to twenty-four times the memory that ESD requires, depending on the values of $h$ and $l$ (which the system administrator will select based on the organization's security policy). For example, when $h = 500$ and $l = 0.5h$, ESD reduces the memory consumption by an order of magnitude when compared with TFA.

To demonstrate the impact of probabilistic sampling, the table also includes the memory requirement of ESD when sampling is turned off (by setting $p = 1$). This version of ESD is denoted as ESD-1. Since $p$ is set as a constant, the iterative algorithm in Section IV-C needs to be slightly modified: the procedure $OptimalP(m, s)$ will always return 1, while other procedures remain the same. Table I shows that the memory saved by sampling is significant when $h$ is large. For example, when $h = 5{,}000$ and $l = 0.3h$, ESD with sampling uses less than one thirteenth of the memory that is needed by ESD-1. However, when $h$ becomes smaller or $l/h$ becomes larger, ESD has to choose a larger sampling probability in order to limit the
TABLE I MEMORY REQUIREMENTS (IN MB) OF ESD, TFA AND ESD-1 (I.E. ESD WITH p = 1) WHEN α = 0.9 AND β = 0.1. l = 0.1h l = 0.3h l = 0.5h l = 0.7h h ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 500 0.09 2.02 0.33 0.19 2.53 0.43 0.30 3.61 0.54 0.97 6.12 1.01 1000 0.07 1.10 0.27 0.09 1.29 0.33 0.15 1.85 0.42 0.47 3.11 0.86 2000 0.03 0.55 0.24 0.05 0.71 0.29 0.08 1.02 0.42 0.25 1.62 0.86 3000 0.02 0.42 0.24 0.03 0.51 0.27 0.06 0.68 0.42 0.17 1.09 0.86 4000 0.01 0.32 0.21 0.03 0.38 0.27 0.03 0.52 0.42 0.13 0.83 0.86 5000 0.01 0.24 0.21 0.02 0.31 0.27 0.03 0.43 0.42 0.11 0.66 0.86 TABLE II MEMORY REQUIREMENTS (INMB) OF ESD, TFA AND ESD-1 (I.E. ESD WITHp = 1) WHEN α = 0.95 AND β = 0.05. l = 0.1h l = 0.3h l = 0.5h l = 0.7h h ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 500 0.12 2.41 0.38 0.22 3.27 0.48 0.48 4.59 0.68 1.56 8.03 1.60 1000 0.08 1.29 0.32 0.12 1.65 0.38 0.24 2.34 0.50 0.76 4.04 1.20 2000 0.03 0.69 0.26 0.08 0.87 0.32 0.13 1.21 0.47 0.38 2.12 1.20 3000 0.02 0.46 0.26 0.06 0.60 0.32 0.09 0.83 0.47 0.26 1.42 1.20 4000 0.02 0.37 0.23 0.04 0.45 0.32 0.06 0.63 0.47 0.20 1.08 1.20 5000 0.01 0.29 0.23 0.04 0.35 0.32 0.05 0.52 0.47 0.16 0.89 1.20 TABLE III MEMORY REQUIREMENTS (INMB) OF ESD, TFA AND ESD-1 (I.E. ESD WITHp = 1) WHEN α = 0.99 AND β = 0.01. l = 0.1h l = 0.3h l = 0.5h l = 0.7h h ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 ESD TFA ESD-1 500 0.20 3.60 0.48 0.29 4.82 0.52 0.97 7.25 1.03 4.20 13.15 4.20 1000 0.10 1.92 0.38 0.15 2.42 0.40 0.50 3.60 0.67 1.59 6.54 3.10 2000 0.07 1.01 0.32 0.09 1.30 0.34 0.24 1.85 0.60 0.81 3.21 3.10 3000 0.04 0.68 0.29 0.07 0.85 0.34 0.16 1.24 0.60 0.53 2.18 3.10 4000 0.03 0.50 0.29 0.05 0.66 0.34 0.12 0.96 0.59 0.41 1.70 3.10 5000 0.03 0.42 0.29 0.05 0.55 0.34 0.10 0.77 0.59 0.33 1.38 3.10 error n spread estaton caused by saplng. Consequently, t has to store ore contacts and thus requre ore eory. For nstance, when h = 500 and l = 0.5h, ESD wth saplng uses 55.6% of the eory that s needed by ESD-1. 
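The two worked examples quoted for Table I can be re-derived from the table entries themselves. The Python sketch below is only an illustration: the dictionary layout and the helper names `tfa_over_esd` and `esd_over_esd1` are our own and do not appear in the paper. Because the table values are rounded to two decimals, ratios that involve very small ESD entries are necessarily coarse.

```python
# Memory requirements in MB copied from Table I (alpha = 0.9, beta = 0.1).
# Keys are (h, l/h); values are (ESD, TFA, ESD-1).
# Only the two entries discussed in the text are included here.
TABLE_I = {
    (500, 0.5): (0.30, 3.61, 0.54),
    (5000, 0.3): (0.02, 0.31, 0.27),
}


def tfa_over_esd(h, l_ratio):
    """How many times more memory TFA needs than ESD (hypothetical helper)."""
    esd, tfa, _ = TABLE_I[(h, l_ratio)]
    return tfa / esd


def esd_over_esd1(h, l_ratio):
    """Fraction of the ESD-1 memory that ESD with sampling needs."""
    esd, _, esd1 = TABLE_I[(h, l_ratio)]
    return esd / esd1


if __name__ == "__main__":
    # h = 500, l = 0.5h: TFA needs roughly an order of magnitude more memory.
    print(f"TFA/ESD at h=500, l=0.5h: {tfa_over_esd(500, 0.5):.1f}x")
    # h = 500, l = 0.5h: share of ESD-1 memory used by ESD with sampling.
    print(f"ESD/ESD-1 at h=500, l=0.5h: {esd_over_esd1(500, 0.5):.1%}")
    # h = 5000, l = 0.3h: the share drops below one thirteenth.
    print(f"ESD/ESD-1 at h=5000, l=0.3h: {esd_over_esd1(5000, 0.3):.3f}")
```

The same two helpers can be applied to any other cell once its values are added to the dictionary.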
Table II compares the memory requirements of ESD and TFA when α = 0.95 and β = 0.05. Table III compares the memory requirements when α = 0.99 and β = 0.01. They show similar results: (1) ESD uses significantly less memory than TFA, and (2) the probabilistic sampling method in ESD is critical for memory saving, especially when h is large or l/h is small. The tables also demonstrate that the memory requirement of either ESD or TFA increases when the performance objective becomes more stringent, i.e., α is set larger and β smaller.

TFA requires more memory because it stores the source and destination addresses of the contacts. In [5], the authors also indicate that Bloom filters [19], [20] can be used to reduce the memory consumption. However, the paper does not give a detailed design or parameter settings. Therefore, we cannot implement the Bloom-filter version of TFA. The paper claims that the memory requirement will be reduced by a factor of 2.5 when Bloom filters are used. Even when this factor is taken into account in Tables I-III, the memory saving by ESD will still be significant.

C. Comparison in Terms of False Positive Ratio and False Negative Ratio

The false positive ratio (FPR) is defined as the fraction of all non-scanners (whose spreads are no more than l) that are mistakenly reported as scanners. The false negative ratio (FNR) is the fraction of all scanners (whose spreads are no less than h) that are not reported by the system. In the previous subsection, we have shown that, given the bounds of FPR and FNR, it takes ESD much less memory to achieve the bounds than TFA. Since CSE and TBA are not designed for meeting a given set of bounds, we compare our ESD with them by a different set of experiments that measure and compare the FPR and FNR values under a fixed amount of SRAM.

Given a fixed memory size, we use OptimalS(m, s) in Section IV-C to determine the value of s in ESD, use OptimalP(m, s) to determine the value of p, and then set the threshold T as (h + l)/2. We perform experiments using the week-long traffic trace. For m = 0.05 MB and 0.2 MB, the results are presented in Tables IV and V, respectively. In both tables, l = 0.5h.
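To make the two ratios concrete, the following Python sketch computes FPR and FNR from a set of true spreads and a set of reported sources, following the definitions above. It is a minimal illustration under stated assumptions: the function names, the toy spread values and the rule "report a source when its estimated spread is at least T" are ours, and sources whose true spread lies strictly between l and h count toward neither ratio.

```python
def threshold(h, l):
    """Reporting threshold T = (h + l) / 2, as used in the experiments."""
    return (h + l) / 2


def fpr_fnr(true_spread, reported, h, l):
    """Return (FPR, FNR) for one measurement period.

    true_spread: dict mapping source -> true spread
    reported:    set of sources that the scheme reported as scanners
    """
    scanners = [src for src, k in true_spread.items() if k >= h]
    non_scanners = [src for src, k in true_spread.items() if k <= l]
    false_pos = sum(1 for src in non_scanners if src in reported)
    false_neg = sum(1 for src in scanners if src not in reported)
    fpr = false_pos / len(non_scanners) if non_scanners else 0.0
    fnr = false_neg / len(scanners) if scanners else 0.0
    return fpr, fnr


# Toy data (hypothetical, not taken from the traffic trace).
h, l = 500, 250
true_spread = {"a": 600, "b": 800, "c": 100, "d": 200, "e": 400, "f": 10}
estimated = {"a": 550, "b": 360, "c": 380, "d": 150, "e": 390, "f": 12}

T = threshold(h, l)
# Assumed reporting rule: report when the estimate reaches the threshold.
reported = {src for src, k in estimated.items() if k >= T}
fpr, fnr = fpr_fnr(true_spread, reported, h, l)

if __name__ == "__main__":
    print(f"T = {T}, reported = {sorted(reported)}")
    print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```

In this toy run, source "e" falls in the gap between l and h, so reporting it affects neither ratio.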
We also perform the same experiments for CSE and TBA, and the results are presented in the tables as well. The optimal parameters are chosen for each scheme based on the original papers.

When the available memory is very small, such as 0.05 MB in Table IV, CSE has zero FNR but its FPR is 1.0, which means it reports all non-scanners. The reason is that, without probabilistic sampling, CSE stores information of too many contacts, so that its data structure is fully saturated. In this case, the spread estimation method of CSE breaks down. TBA has a small FPR but its FNR is large. For example, when h = 500, its FNR is 26%. Only ESD achieves small values for both FNR and FPR. For example, when h = 500, its FNR is 7.4% and its FPR is 5.0%. These values decrease quickly as h increases. When h = 1,000, they are 1.0% and 0.55%, respectively, while the FNR of TBA remains at 26%.

When the available memory increases in Table V, the performance of all three schemes improves. Still, ESD performs better in most cases.

TABLE IV
FALSE NEGATIVE RATIO AND FALSE POSITIVE RATIO OF ESD, CSE AND TBA WITH m = 0.05 MB.

      FNR                        FPR
h     ESD      CSE      TBA      ESD      CSE      TBA
500   7.4e-2   0        2.6e-1   5.0e-2   1        9.0e-6
1000  1.0e-2   0        2.6e-1   5.5e-3   1        9.0e-6
2000  4.2e-3   0        2.5e-1   2.0e-3   1        1.1e-5
3000  5.5e-3   0        2.5e-1   2.0e-3   1        1.0e-5
4000  0        0        2.4e-1   2.0e-3   1        7.0e-6
5000  0        0        2.4e-1   2.0e-3   1        7.0e-6

TABLE V
FALSE NEGATIVE RATIO AND FALSE POSITIVE RATIO OF ESD, CSE AND TBA WITH m = 0.2 MB.

      FNR                        FPR
h     ESD      CSE      TBA      ESD      CSE      TBA
500   1.2e-2   3.3e-2   3.7e-3   1.5e-3   1.2e-1   1.8e-4
1000  8.8e-4   0        3.7e-3   7.5e-4   5.5e-2   1.9e-4
2000  0        0        9.3e-3   7.5e-4   5.5e-2   2.0e-4
3000  0        0        7.4e-3   7.5e-4   5.5e-2   1.8e-4
4000  0        0        1.9e-3   7.5e-4   5.5e-2   1.9e-4
5000  0        0        3.7e-3   7.5e-4   5.5e-2   1.8e-4

VI. CONCLUSIONS

Scan detection is one of the most important functions in intrusion detection systems. The recent research trend is to implement such a function in the tight SRAM space to keep up with the rapid advance in network speed. This paper proposes an efficient scan detection scheme based on a new method called dynamic bit sharing, which optimally combines probabilistic sampling, bit-sharing storage, and maximum likelihood estimation. We demonstrate theoretically and experimentally that the new scheme is able to achieve a probabilistic performance objective with arbitrarily set bounds on worst-case false positive/negative ratios. It does so in a very tight memory space where the number of bits available is much smaller than the number of external sources to be monitored.

REFERENCES

[1] R. Deal, Cisco Router Firewall Security, Cisco Press, ISBN-10: 1-58705-175-3, August 2004.
[2] W. David Gardner, Researchers Transmit Optical Data At 16.4 Tbps, InformationWeek, February 2008.
[3] S. Venkataraman, D. Song, P. Gibbons, and A. Blum, New Streaming Algorithms for Fast Detection of Superspreaders, Proc.
of NDSS, February 2005.
[4] M. Yoon, T. Li, and S. Chen, Fit a Spread Estimator in Small Memory, Proc. of IEEE INFOCOM, April 2009.
[5] Q. Zhao, A. Kumar, and J. Xu, Joint Data Streaming and Sampling Techniques for Accurate Identification of Super Sources/Destinations, Proc. of USENIX/ACM Internet Measurement Conference, October 2005.
[6] Q. Zhao, J. Xu, and A. Kumar, Detection of Super Sources and Destinations in High-Speed Networks: Algorithms, Analysis and Evaluation, IEEE Journal on Selected Areas in Communications (JSAC), vol. 24, no. 10, pp. 1840-1852, October 2006.
[7] C. Estan, G. Varghese, and M. Fisk, Bitmap Algorithms for Counting Active Flows on High-Speed Links, IEEE/ACM Transactions on Networking (TON), vol. 14, no. 5, pp. 925-937, October 2006.
[8] J. Cao, Y. Jin, A. Chen, T. Bu, and Z. Zhang, Identifying High Cardinality Internet Hosts, Proc. of IEEE INFOCOM, April 2009.
[9] Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani, Counter Braids: A Novel Counter Architecture for Per-Flow Measurement, Proc. of ACM SIGMETRICS, June 2008.
[10] N. Bandi, D. Agrawal, and A. Abbadi, Fast Algorithms for Heavy Distinct Hitters using Associative Memories, Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), June 2007.
[11] M. Charikar, K. Chen, and M. Farach-Colton, Finding Frequent Items in Data Streams, Proc. of International Colloquium on Automata, Languages, and Programming (ICALP), July 2002.
[12] G. Cormode and S. Muthukrishnan, Space Efficient Mining of Multigraph Streams, Proc. of ACM PODS, June 2005.
[13] E. Demaine, A. Lopez-Ortiz, and J. Ian-Munro, Frequency Estimation of Internet Packet Streams with Limited Space, Proc. of Annual European Symposium on Algorithms (ESA), September 2002.
[14] X. Dimitropoulos, P. Hurley, and A. Kind, Probabilistic Lossy Counting: An Efficient Algorithm for Finding Heavy Hitters, ACM SIGCOMM Computer Communication Review, vol. 38, no. 1, January 2008.
[15] C. Estan and G. Varghese, New Directions in Traffic Measurement and Accounting, Proc. of ACM SIGCOMM, October 2002.
[16] P. Gibbons and Y.
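Matias, New Sampling-based Summary Statistics for Improving Approximate Query Answers, Proc. of ACM SIGMOD, June 1998.
[17] G. Manku and R. Motwani, Approximate Frequency Counts over Data Streams, Proc. of VLDB, August 2002.
[18] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, Online Identification of Hierarchical Heavy Hitters: Algorithms, Evaluation, and Application, Proc. of ACM SIGCOMM IMC, October 2004.
[19] B. H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970.
[20] A. Broder and M. Mitzenmacher, Network Applications of Bloom Filters: A Survey, Internet Mathematics, vol. 1, no. 4, June 2002.

APPENDIX A: VARIANCE OF $V_m$

Let $A_i$ be the event that the $i$th bit in $B$ remains 0 at the end of the measurement period and $1_{A_i}$ be the corresponding indicator random variable. Let $U_m$ be the random variable for the number of 0 bits in $B$. We first derive the probability for $A_i$ to occur and the expected value of $U_m$. For an arbitrary bit in $B$, each distinct contact has a probability of $\frac{p}{m}$ to set the bit to one. All contacts are independent of each other when setting bits in $B$. Hence,

\[ \mathrm{Prob}\{A_i\} = \left(1 - \frac{p}{m}\right)^n, \quad \forall i \in [0, m). \]

The probability for $A_i$ and $A_j$, $\forall i, j \in [0, m)$, $i \neq j$, to happen simultaneously is

\[ \mathrm{Prob}\{A_i \cap A_j\} = \left(1 - \frac{2p}{m}\right)^n. \]

Since $V_m = \frac{U_m}{m}$ and $U_m = \sum_{i=1}^{m} 1_{A_i}$, we have

\begin{align*}
E(V_m^2) &= \frac{1}{m^2} E\Big(\Big(\sum_{i=1}^{m} 1_{A_i}\Big)^2\Big) \\
&= \frac{1}{m^2} E\Big(\sum_{i=1}^{m} 1_{A_i}^2\Big) + \frac{2}{m^2} E\Big(\sum_{1 \le i < j \le m} 1_{A_i} 1_{A_j}\Big) \\
&= \frac{1}{m}\left(1 - \frac{p}{m}\right)^n + \frac{m-1}{m}\left(1 - \frac{2p}{m}\right)^n.
\end{align*}

Based on (8) and the equation above, we have

\begin{align*}
\mathrm{Var}(V_m) &= E(V_m^2) - E(V_m)^2 \\
&= \frac{1}{m}\left(1 - \frac{p}{m}\right)^n + \frac{m-1}{m}\left(1 - \frac{2p}{m}\right)^n - \left(1 - \frac{p}{m}\right)^{2n} \\
&\simeq \frac{e^{-\frac{np}{m}}}{m}\left(1 - \left(1 + \frac{np^2}{m}\right) e^{-\frac{np}{m}}\right). \tag{22}
\end{align*}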
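As a numerical sanity check on the last approximation step, the following Python sketch evaluates the exact expression for the variance of V_m and the closed-form approximation in (22) side by side. It is an illustration only: the function names and the parameter values (m = 10,000 bits, n = 20,000 distinct contacts, p = 0.5) are our own choices and are not taken from the experiments. Powers are computed through `math.log1p` to limit rounding error when p/m is small.

```python
import math


def var_exact(n, m, p):
    """(1/m)(1 - p/m)^n + ((m-1)/m)(1 - 2p/m)^n - (1 - p/m)^(2n)."""
    q1 = math.exp(n * math.log1p(-p / m))        # (1 - p/m)^n
    q2 = math.exp(n * math.log1p(-2.0 * p / m))  # (1 - 2p/m)^n
    return q1 / m + (m - 1) / m * q2 - q1 * q1


def var_approx(n, m, p):
    """(e^{-np/m} / m) * (1 - (1 + n p^2 / m) e^{-np/m}), as in (22)."""
    x = math.exp(-n * p / m)
    return x / m * (1.0 - (1.0 + n * p * p / m) * x)


def relative_gap(n, m, p):
    """Relative difference between the approximation and the exact value."""
    exact = var_exact(n, m, p)
    return abs(var_approx(n, m, p) - exact) / exact


if __name__ == "__main__":
    n, m, p = 20_000, 10_000, 0.5  # illustrative values only
    print("exact :", var_exact(n, m, p))
    print("approx:", var_approx(n, m, p))
    print("relative gap:", relative_gap(n, m, p))
```

For parameters of this size, the two expressions agree to well within one percent, which is consistent with replacing each power by its exponential limit.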