Automating Analysis of Large-Scale Botnet Probing Events

Transcription

Zhichun Li, Anup Goyal and Yan Chen
Northwestern University, 2145 Sheridan Road, Evanston, IL, USA

Vern Paxson
UC Berkeley & ICSI, 1947 Center St., Suite 600, Berkeley, CA, USA

ABSTRACT

Botnets dominate today's attack landscape. In this work we investigate ways to analyze collections of malicious probing traffic in order to understand the significance of large-scale "botnet probes". In such events, an entire collection of remote hosts together probes the address space monitored by a sensor in some sort of coordinated fashion. Our goal is to develop methodologies by which sites receiving such probes can infer, using purely local observation, information about the probing activity: What scanning strategies does the probing employ? Is this an attack that specifically targets the site, or is the site only incidentally probed as part of a larger, indiscriminate attack? Our analysis draws upon extensive honeynet data to explore the prevalence of different types of scanning, including properties such as trend, uniformity, coordination, and darknet avoidance. In addition, we design schemes to extrapolate the global properties of scanning events (e.g., total population and target scope) as inferred from the limited local view of a honeynet. Cross-validating with data from DShield shows that our inferences exhibit promising accuracy.

Categories and Subject Descriptors
C.2.3 [Computer-Communication Networks]: Network Operations: network monitoring; C.2.0 [Computer-Communication Networks]: General: Security and protection

General Terms
Algorithms, Measurement, Security

Keywords
Botnet, Global property extrapolation, Honeynet, Scan strategy inference, Situational awareness, Statistical inference

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASIACCS '09, March 10-12, 2009, Sydney, NSW, Australia. Copyright 2009 ACM /09/03...$5.00.

1. INTRODUCTION

When a site receives probes from the Internet, whether basic attempts to connect to its services, or apparent attacks directed at those services, or simply peculiar spikes in seemingly benign activity, often what the site's security staff most wants to know is not "are we being attacked?" (since the answer to that is almost always "yes, all the time") but rather "what is the significance of this activity?" Is the site being deliberately targeted? Or is the site simply receiving one small part of much broader probing activity?

For example, suppose a site with a /16 network receives malicious probes from a botnet. If the site can determine that the botnet probed only their /16, then they can conclude that the attacker may well have a special interest in their enterprise. On the other hand, if the botnet probed a much larger range, e.g., a /8, then very likely the attacker is not specifically targeting the enterprise.

The answers to these questions greatly influence the resources the site will choose to employ in responding to the activity. Obviously, the site will often care more about the probing if the attacker has specifically targeted the site, since such interest may reflect a worrisome level of determination on the part of the attacker. Indeed, such targeted attacks have recently grown in prominence. Yet given the incessant level of probing all Internet addresses receive [21], how can a site assess the risk a given event reflects? In this work we seek to contribute to the types of analysis that sites can apply to gauge such risks.
We orient much of our methodology with an assumption that most probing events reflect activity from the coordinated botnets that dominate today's Internet attack landscape. Our approach is limited to analyzing fairly large-scale activity that involves multiple local addresses. As such, our techniques are suitable for use by sites that deploy darknets (unused subnets), honeynets (subnets for which some addresses are populated by some form of honeypot responder), or in general any monitored networks with unexpected access, for which we can detect botnet probing events. The main contribution of this paper is the development of a set of techniques for analyzing botnet events, most of which do not require the use of responders. For simplicity, we will refer to the collection of sensors as the site's Sensors.

In contrast to previous work on botnets, which has focused on either host-level observations of single instances of a botnet activity, studies of particular captured botnet binaries [11], or network-level analysis of command-and-control (C&C) activity [24], our techniques aim to characterize facets of large-scale botnet probing events regardless of the nature of the botnet. Our analysis does not require assumptions about the internal organization and communication mechanisms employed by the botnets. We focus on characterization of botnet properties based on inferences from their probing behavior. In addition, our approach has the significant benefit of requiring only local information, rather than global information as required by collaborative efforts such as DShield [27]. We give more detailed comparisons in Section 6.

We frame the contributions of our work as follows. First, we develop a set of statistical approaches to assess the attributes of large-scale probing events seen in Sensors, including checking for trends, uniformity, coordination, and one specific form of hit-list (Section 3). The type of hit-list we focus on is liveness-aware scanning, in which the attackers try to avoid darknets. For trend and uniformity checking, the statistical literature provides apt techniques, but assessing coordination and use of hit-lists requires developing new techniques. We confirmed the consistency of the statistical techniques for inferring event properties with manual inspection or visualization. Applying such statistical testing on massive honeynet traffic reveals some interesting and sophisticated botnet scan behaviors such as coordinated scans. We then used our suite of tests to frame the scanning strategies employed during different probe events, from which we can further extrapolate the global properties for particular strategies.

Second, we devise two algorithms to extrapolate the global properties of a scanning event based on a sensor's limited local view. These algorithms are based on different underlying assumptions and exhibit differing accuracies, but both enable us to infer the global scanning scope of a probing event, as well as the total number of bots including those unseen by the Sensors, and the average scanning speed per bot (Section 4). The global scanning scope enables the site's operators to assess whether their network is a specific target of botnet activity, or if instead the botnet's scanning targets a large network scope that simply happens to include the site. The estimated total botnet size can help us track trends in how botnets are used, with implications for their C&C capabilities.

[Figure 1: System architecture. Botnet detection subsystem: honeynet/honeyfarm traffic, traffic classification, event extraction, misconfiguration separation, worm separation. Botnet inference subsystem: model checking (monotonic trend checking, hit-list checking, uniformity checking, independency checking), followed by global property extrapolation for botnets with a uniform scan model.]

[Figure 2: The distribution of the malicious payload discovered in the scan events. Categories: HTTP Vul., MSSQL Vul., Symantec Vul., VNC Vul., SMB/RPC Vul., Other Vul., Not Vul.]
The algorithms are rooted in the observation (confirmed by our checking of scanning properties) that the most frequent scanning patterns reflect uniform random scanning or uniform hit-list scanning. Indeed, nearly all of the probing events we observed follow one of these two scan patterns (footnote 1).

In Section 5, we evaluate our techniques using a 24-month trace (293 GB total) of Honeynet traffic collected at a large research institution. Of the events classified as likely botnet activity (i.e., not misconfigurations or worms), most reflected either uniform-random or uniform-hit-list scanning. Analyzing the data, we find that 66.5% of botnet events exhibit uniform random scanning and 16.3% of botnet events reflect hit-list scanning, 85% of which were also uniform. Also, we find most of these probes include attacks. As shown in Figure 2, our honeynet measurements find that about 84% of scan events carry malicious payloads targeting vulnerabilities of different protocols, such as SMB/RPC, MSSQL, VNC, etc. (footnote 2). We note that such botnet scans are one key technique employed for botnet recruitment [24]. Through event correlation study, we also find some interesting behaviors of how botmasters control their bots.

Footnote 1: Of course there is the usual arms race here between attackers and defenders. If our techniques become widely used, then attackers may modify their probing traffic to skew the defenders' analysis. But until the botmasters take steps to do so, these techniques have value. We adopt the view common in network security research that there is significant utility in "raising the bar" for attackers even if a technique is ultimately evadable.

Footnote 2: "Not Vul." consists of instances where the honeynet received little or no payload, or purely service-testing probes.

[Figure 3: Temporal distribution of source count for VNC (5900). x-axis: time (six-hour intervals), year 2006; y-axis: unique source counts.]

To validate our estimates of the global properties, we compare our results with those from DShield [27], the Internet's largest global alert repository.
We find that in 75% of cases, our extrapolated scope is within a factor of 1.35 of the scan scope observed in DShield data. In all the cases it is within a factor of 1.5. The results indicate that our approaches hold promise for sufficient accuracy to enable sites to make reliable inferences, with the caveat that we were unable to find any instances of events in our current dataset that reflected a global scope much different from /8.

2. SYSTEM FRAMEWORK

The architecture of our design is shown in Figure 1. The system has two subsystems: botnet detection and botnet inference. In this paper we focus on the latter (right-hand half of Figure 1). All of the steps in our analysis system are automated, most of them fully so. We mainly use the Honeynet sensor to drive the rest of the discussion, although we can generally apply our analysis techniques (the botnet inference subsystem) to botnet probe events detected by other types of sensors.

The system classifies traffic seen on the sensors by different protocols or by session semantics. We define a session as a set of connections between a pair of hosts with a specific purpose, perhaps involving multiple application protocols. The system extracts events based on the number of unique sources arriving in a window of time (cf. the spikes in Figure 3), classifying the activity into misconfigurations, worms, and botnet-like probing.

2.1 Honeynet and Data Collection

Our detection sensor consists of ten contiguous /24 subnets within one of a large research institution's /16 networks. We deployed Honeyd responders [23] on five of the subnets and operated the other five completely dark. (We use this latter for hit-list detection.) The Honeyd configuration is similar to that used by Pang et al. in [21]: we simulate the HTTP, NetBIOS, SMB, WINRPC, MSSQL, MYSQL, SMTP, Telnet, and DameWare protocols, with echo servers for all other port numbers. We evaluate our analysis techniques using 293 GB of trace data collected over two years (2006 and 2007).

2.2 Botnet Detection Subsystem

In this paper we mainly focus on botnet inference. For completeness, we briefly describe here how we detect botnet events. The details are available in our technical report [18].

Traffic Classification: Attack traffic can have complex session structure involving multiple application protocols. For example, an attacker can send an exploit to TCP port 139 which, if successful, results in opening a shell and issuing an HTTP download command. Often the application protocol contacted first is the protocol being exploited (an exception is an initial connection to a portmapper service), so we label sessions with the service associated with the first destination port appearing in them. Doing so also provides consistent labeling for connection attempts seen in darknets or other types of sensors. We aggregate connections into sessions using an approach similar to the first step algorithm by Kannan et al. [14].

For application protocols not commonly used, the background radiation noise (including individual port scans) is typically low, and thus we use port numbers to separate event traffic. However, noise is usually strong for popular protocols, requiring further differentiation based on payload (when available). To do so, we implemented payload summary scripts for 20 commonly seen protocols, based on the Bro system's network analysis capabilities [22].

Event Extraction: Figure 3 shows source arrival counts for VNC (TCP port 5900) for the year 2006 on our sensor, where each point represents the number of sources within a six-hour interval. Large spikes in such plots generally correspond to scanning from worms or apparent botnets, or misconfigurations. We classify such spikes as events, as follows. We define the noise strength N as the per-interval count of unique sources seen in the absence of events. Suppose the time interval length is I. We calculate N as the median of unique source counts of K continuous time intervals before the event.
We define signal strength $S = X - N$ as the peak unique source arrival count $X$ minus the noise strength $N$, and define the signal-to-noise ratio as

$\mathrm{SNR} = \frac{S}{N} = \frac{X - N}{N} = \frac{X}{N} - 1.$

In our evaluation we use I = 6 hours and K = 120. The aggregated time window I × K is about 30 days. We only examine events with SNR ≥ 50. We automatically extract potential events as follows: for any given time interval, we calculate the median of the previous normal K intervals and the SNR. For those spikes exceeding our SNR threshold, we extend the time range to both sides until S drops below ωN, where ω is a tunable parameter controlling the amount of the signal tail to include in the event. (We use ω = 5, though we find that varying it does not significantly alter the results.) For multiple events within one time series, we extract the events iteratively, starting with the event with largest SNR.

One problem we have to consider is that some events have complex session structures involving multiple protocols. After traffic classification by protocol information, a single event can be separated into multiple events. Therefore, after event classification, we need to merge them. We detect such cases by checking the connection correlation. If two connections are in one session, they will both be from host A to host B and the protocols of the two connections are fixed. For example, suppose the first connection is HTTP and the second one is WINRPC. If we find such events to be highly correlated, i.e., for most connections in the HTTP event, each HTTP connection is followed by a WINRPC connection from the WINRPC event for the same source and destination pair, we merge them as one event.
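The event extraction rule above can be illustrated with a short sketch. It is a simplified reading of the text, not the authors' implementation: the function names are hypothetical, the noise N is taken as the median of the K intervals immediately preceding each interval (the paper uses the previous "normal" intervals, which would exclude earlier events), and the comparison operators for the SNR threshold and the ωN tail cut-off are assumptions.

```python
from statistics import median

def find_spikes(counts, k=120, threshold=50.0):
    """Return indices of intervals whose SNR reaches the threshold.

    counts[i] is the number of unique sources seen in interval i.
    """
    spikes = []
    for i in range(k, len(counts)):
        noise = median(counts[i - k:i])      # N: median of the previous K intervals
        if noise <= 0:
            continue                         # SNR is undefined without background noise
        signal = counts[i] - noise           # S = X - N
        if signal / noise >= threshold:      # SNR = S / N = X / N - 1
            spikes.append(i)
    return spikes

def extend_event(counts, peak, noise, omega=5.0):
    """Grow the event around `peak` while the signal stays above omega * N."""
    lo = hi = peak
    while lo > 0 and counts[lo - 1] - noise > omega * noise:
        lo -= 1
    while hi < len(counts) - 1 and counts[hi + 1] - noise > omega * noise:
        hi += 1
    return lo, hi

# 120 quiet six-hour intervals followed by a spike with short tails.
series = [10] * 120 + [10, 100, 600, 80, 12]
peaks = find_spikes(series)
window = extend_event(series, peaks[0], noise=10)
```

With six-hour intervals and K = 120, the noise window covers roughly the 30 days mentioned in the text. Iterative extraction of multiple events (largest SNR first) would wrap these two helpers in a loop that masks out each extracted window.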
Event Classification and Separation: We separate misconfigurations from worms or botnets based on the observation that botnet scans and worms should contact a significant range of the IP addresses, whereas misconfigurations exhibit a few hot-spot targets. We found that most misconfiguration events are due to P2P traffic. A detailed analysis of these misconfigurations is in our technical reports [17, 18]. In general, probing from worms (self-propagating processes) can look very similar to that from botnets (processes under a common C&C), and indeed the line between the two can blur depending on the nature of the commands that botmasters issue to their bots. For our purposes, we identify and remove as worms those events that exhibit an exponential growing trend (per the technique developed in [31]) and deem the remainder as botnet probing events.

[Figure 4: Model checking design space. Dimensions: hit-list vs. not hit-list; monotonic trend, partial monotonic trend, or no monotonic trend; and, without a monotonic trend, uniform & independent, uniform & non-independent, or non-uniform.]

2.3 Botnet Inference Subsystem

Scan Pattern Checking: For botnet probing events, there are numerous scanning strategies that attackers can potentially use. Identifying the particular approach can provide a basis to infer further properties of the events and perhaps of the botnets themselves. We refer to these strategies as scan patterns, and undertake to develop a set of scan-pattern checking techniques to understand different dimensions of such strategies:

- Monotonic trend checking
- Hit-list checking
- Uniformity checking
- Dependency checking

For details, see Section 3.

Global Property Extrapolation: Once we identify a probing event's scan pattern, we then use the scan pattern to extrapolate a global view of the event. We focus on two of the most common scan patterns: uniform random scanning, and uniform hit-list scanning. We confirm their common use both from botnet source code analysis (Appendix A) and experimental observations (Section 5).
We then extrapolate the global scan scope and the global number of bots based on these two scan patterns, using techniques developed in Section 4.

3. PROPERTY CHECKING OF BOTNET SCAN PATTERNS

The whole design space of botnet probing strategies is very large, and it is hard to consider all of it in our botnet inference framework. Through botnet source code analysis and reasoning about what a rational botmaster would do (details are in Appendix A), we find that uniform random scanning, hit-list scanning, monotonic scanning and coordinated permutation scanning are the strategies most likely to be used by botmasters, given that they are simple and effective.

In this section we develop a set of analysis algorithms for detecting these scan strategies. Each is designed to check a single dimension of characteristics in the scan pattern. We then combine the characteristics of an event to construct the scan pattern in use. We first classify the scan traffic pattern into monotonic, partially monotonic and non-monotonic trends. For a non-monotonic trend, we assess the possible use of a hit-list or of random-uniform scanning (even distribution of scans across the portion of the sensor space). Finally, for a random-uniform pattern we test whether the senders can be modeled as independent.
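The order in which the individual checks are combined can be sketched as a small decision function. This is only an illustration of the flow described in the preceding paragraph: the function name, argument names, and output labels are hypothetical, and each argument stands for the outcome of one of the tests developed in Sections 3.1 to 3.4.

```python
def scan_pattern(trend, hit_list=False, uniform=False, independent=True):
    """Combine per-dimension test outcomes into one scan pattern label.

    trend: "monotonic", "partial", or "non-monotonic" (Section 3.1).
    hit_list, uniform, independent: boolean outcomes of the hit-list,
    uniformity, and dependency checks (Sections 3.2 to 3.4).
    """
    if trend == "monotonic":
        return "monotonic"
    if trend == "partial":
        return "partially monotonic"
    scope = "hit-list" if hit_list else "random"
    if not uniform:
        return f"non-uniform {scope}"
    # Independence is only tested once uniformity has been established.
    coordination = "independent" if independent else "coordinated"
    return f"uniform {scope}, {coordination}"

label = scan_pattern("non-monotonic", hit_list=True, uniform=True)
```

Only events that end up labeled as uniform (random or hit-list) are candidates for the global property extrapolation of Section 4.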

[Figure 5: Hit-list and uniform scanning distribution on the sensor. Two panels (hit-list, uniform random); x-axis: destination IPs in the sensor; y-axis: number of scans per IP.]

3.1 Monotonic Trend Checking

Question: Do senders follow a monotonic trend in their scanning?

Monotonically scanning the destination IP addresses (e.g., sequentially one after another) is a common scan strategy widely used by network scanning tools. In our evaluation, we did find a few events that use monotonic trend scanning. Furthermore, for random events, monotonic trend checking can help us filter out the noise caused by non-bot scanners.

For each sender, we test for monotonicity in targeting by applying the Mann-Kendall trend test [15], a non-parametric hypothesis testing approach. In our study, we set the significance level to 0.5%, since a higher significance level will introduce more false positives and we need to check thousands of sources. In our evaluation, we manually check the statistical power and find it high enough to detect weak trends. The intuition behind this test is that if the data have a monotonic trend, the aggregated sign value (+1 for an increase, 0 for a tie, -1 for a decrease) of all the consecutive value pairs would fall outside the range that randomness can achieve. In our technical report [18], we describe the detailed approach and our enhancement to the original Mann-Kendall trend test.

We label an entire event as having a monotonic trend if more than 80% of senders exhibit a trend, and for further analysis remove those that do not reflect a trend as likely representing separate activity (and thus likely removing a source of potential noise). We instead label the event as non-monotonic if more than 80% of senders do not exhibit a trend. We label the remainder as partially monotonic.

3.2 Hit-List Checking

Question: Do the bots use a target hit-list for scanning?

By hit-list scanning, we refer to an event for which the attacker appears to have previously acquired a specific list of targets. Hit-lists are often employed by sophisticated botmasters to achieve high scan efficiency.
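For the per-sender trend test of Section 3.1 and the 80% labeling rule, a compact sketch follows. It implements the textbook Mann-Kendall test with a normal approximation and no tie correction, which is an assumption on my part: the authors state that they use an enhanced variant described in their technical report, and that enhancement is not reproduced here. Function names are hypothetical.

```python
import math

def mann_kendall_p(values):
    """Two-sided p-value of the textbook Mann-Kendall trend test."""
    n = len(values)
    if n < 3:
        return 1.0                           # too few points to claim a trend
    # S sums the sign of every later-minus-earlier difference.
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n - 1) for j in range(i + 1, n))
    variance = n * (n - 1) * (2 * n + 5) / 18.0   # no correction for ties
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def label_event(targets_per_sender, alpha=0.005, majority=0.8):
    """Label an event from the per-sender sequences of probed addresses."""
    trending = sum(mann_kendall_p(t) < alpha for t in targets_per_sender)
    fraction = trending / len(targets_per_sender)
    if fraction > majority:
        return "monotonic"
    if (1 - fraction) > majority:
        return "non-monotonic"
    return "partially monotonic"

sequential = list(range(20))                 # addresses probed in increasing order
zigzag = [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]      # no consistent direction
```

Each inner list holds the destination addresses (as integers) in the order one sender probed them; `alpha=0.005` mirrors the 0.5% significance level used in the text.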
It is important for network administrators to know whether they are in the hit-list. When that is the case, most likely they will be re-scanned by the attacker again and again.

We detect the use of a hit-list based on the observation that such scans should heavily favor live addresses (those that respond) over dark (non-responsive) addresses. To this end, we operate half of our sensor region in a live fashion and half dark. If we observe an event in the Honeynet portion, but not in the darknet portion, this provides strong evidence that the scan used a hit-list. However, one consideration is event pollution (sources that actually are background noise rather than part of the botnet). We therefore do not require a complete absence of darknet scanning, but instead test whether the prevalence of honeynet scans over darknet scans significantly exceeds what we would expect.

Figure 5 compares an example hit-list event (WINRPC) versus a random-uniform event (VNC). To distinguish between two such cases, we define the ratio of the number of senders that target the darknet ($m_d$) over those that target the honeynet ($m_h$) as $\theta = m_d / m_h$. Then we test whether θ crosses a given threshold. In our evaluation, we find the results are not sensitive to the threshold we choose.

Note that for the events that require application-level analysis to separate the activity from the background traffic (e.g., different types of HTTP probing), sources in the event will necessarily be restricted to the honeynet because application-level dialog requires responses that the darknet cannot provide. In this case we can still perform an approximate test, by testing the volume of traffic seen concurrently in the darknet using the same port number. Doing so may miss some hit-list events, however, because we tend to overestimate the amount of activity the botnet exhibits in the darknet.

Other factors could also potentially cause an imbalance between the darknet and the Honeynet.
However, most of these do not result in a significantly small θ, except the one in which an attacker chooses a small scan range that happens to include only the Honeynet addresses. Even if this occurs, we would also expect it (if it does not reflect previous scanning, i.e., is not a hit-list) to occur equally often the other way around, i.e., including only darknet addresses but not Honeynet addresses, which we have not observed over two years. In the 203 events we analyzed, we find 33 (16.3%) hit-list events.

3.3 Uniformity Checking

Question: Does an event uniformly scan the target range?

A natural technique for bots is to employ uniform random scanning across the target range. Testing whether the scans are evenly distributed in the honeynet sensor can be described as a distribution checking problem. We employ a simple χ² test, which is well-suited for the discrete nature of address blocks. When choosing the number of bins for the χ² test, a key requirement is to ensure that the expected value E for any bin exceeds 5 [26]. Accordingly, given that our events have at least several hundred scans in them, we divide the 2,560 addresses in our Honeynet into 40 bins with 64 addresses per bin. We then use the χ² test with a significance level of 0.5%, which is found to work well in our subsequent evaluation in Section 5.

3.4 Dependency Checking

Question: Do the sources scan independently or are they coordinated?

Sophisticated scanning strategies can introduce correlations between the sources in order to control the work that each contributes more efficiently. For example, in Appendix A.2 we describe a more efficient coordinated scheme, ABPS (Advanced Botnet Permutation Scanning), based on permutation scanning, which will induce negative correlations in the targeting among the sources (they try to "get out of each other's way"). Since traditional approaches only work for linear dependence or two-variable cases, we develop a new hypothesis testing approach. To test for such coordination, we use the following hypothesis test.
The null hypothesis is that the senders act in a uniform, independent fashion (where we first test for uniformity as discussed above), while the alternative hypothesis is that the senders do not act in an independent fashion. If an event comprises n scans targeting d destinations in a uniform random manner, we can in principle calculate the distribution of the number of destinations that receive exactly k scans, $Z_k$. We then reject the null hypothesis if the observed value is too unlikely given this distribution (we again use a 0.5% significance level).

THEOREM 1. If n scans target d addresses in a uniform independent manner, the number of addresses $Z_0$ (k = 0) which do not receive any scan follows the probability distribution function:

$P(z_0) = \binom{d}{z_0} \cdot \frac{\mathrm{Stirling2}(n, d - z_0) \cdot (d - z_0)!}{d^n}$
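As a concrete check of Theorem 1, the distribution can be evaluated exactly for small n and d. The sketch below is illustrative only: the function names are hypothetical, the Stirling numbers are computed with the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1), and exact rational arithmetic is used so that the probabilities can be verified to sum to one.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind, via the standard recurrence."""
    row = [1] + [0] * k                      # S(0, 0) = 1, S(0, j) = 0 for j > 0
    for _ in range(n):
        nxt = [0] * (k + 1)
        for j in range(1, k + 1):
            nxt[j] = j * row[j] + row[j - 1]
        row = nxt
    return row[k]

def p_unscanned(n, d, z0):
    """P(Z_0 = z0): exactly z0 of d addresses receive none of n uniform scans."""
    hit = d - z0                             # addresses that receive at least one scan
    ways = comb(d, z0) * stirling2(n, hit) * factorial(hit)
    return Fraction(ways, d ** n)

# Two scans over two addresses: both hit the same address half of the time.
distribution = [p_unscanned(2, 2, z) for z in range(3)]
```

The three factors mirror the formula: choose which z0 addresses stay unscanned, partition the n scans into d - z0 non-empty groups, and assign those groups to the scanned addresses. For realistic n and d the integers become very large, so a practical test would likely rely on approximation or simulation.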

Here Stirling2(n, y) denotes the Stirling number of the second kind [29], which is the number of ways to partition n elements into y non-empty sets. The proof is in Appendix B. However, if n ≪ d, then the sensor range will be sparsely populated, and this distribution does not give us much statistical power. Instead, we need to use a larger value of k. A more detailed analysis is in our technical report [18].

We validate our tests using Monte Carlo simulations with and without introduced correlations. We also confirm that the test correctly detects the correlations introduced by our ABPS scheme. Finally, when applying our test to our two years' worth of data, we do not in fact find any cases exhibiting likely coordinated scanning.

4. EXTRAPOLATING GLOBAL PROPERTIES

We now turn to the problem of estimating a probing event's global scope (target size, participating scanners) based only on local information. This task is challenging because the size of the local sensor may be very small compared to the whole range scanned by a botnet, giving only a very limited view of the scanning event. For our estimation, we considered eight global properties, as shown in Table 1.

Table 1: Global properties estimated from local observations.
Property name | Uniform scanning | Uniform hit-list | Estimation method
Global target scope | Yes | Yes | indirect
Total # of bots | Yes | Yes | indirect
Total # of scans | Yes | Yes | indirect
Average scan speed per bot | Yes | Yes | indirect
Coverage hit ratio | Yes | No | direct
Sender OS distribution | Yes | Yes | direct
Sender AS distribution | Yes | Yes | direct
Sender IP prefix distribution | Yes | Yes | direct

For both uniform-random and uniform-hit-list scanning, the uniformity property enables us to consider the local view as a random sample of the global view. Thus, the operating system (OS), autonomous system (AS), and IP prefix distributions observed in local measurements provide an estimate of the corresponding global distributions (bottom three rows). However, we need to consider that if bots exhibit heterogeneity in their scanning rates, then the probability of observing a bot decreases for slower-scanning ones.
The scanning rate heterogeneity mentioned above introduces a bias towards the faster bots in the population for these distributional properties. By extrapolating the total number of bots, however, we can roughly estimate the prevalence of this effect. It turns out that in all of our analyzed events, more than 70% of the bots appear at the local sensor (footnote 3), as determined by comparing the number of bots seen at the local sensors with the extrapolated global bot population shown in Table 6. Thus, the bias is relatively small.

The coverage hit ratio gives the percentage of target IP addresses scanned by the botnet. As this metric is difficult to estimate for hit-list probing, we mainly consider uniform scanning, for which certain destinations are not reached due to statistical variations. For uniform scanning, we can directly estimate this metric based on the coverage seen in our local sensor.

In the remainder of this section we focus on how to estimate the four remaining properties, each of which requires indirect extrapolation.

Footnote 3: The high percentage of bots appearing at the local sensor arises due to the fact that probing events continue long enough to expose the majority of the bots.

4.1 Assumptions and Requirements

Table 2: Additional assumptions and requirements.
Approach | Property name | Affected by botnet dynamics | Requires IPID or port # continuity
Both | # of bots | No | No
Approach I | Global target scope | No | Yes
Approach I | Total # of scans | No | Yes
Approach I | Average scan speed per bot | Yes | Yes
Approach II | Global target scope | Yes | No
Approach II | Total # of scans | Yes | No
Approach II | Average scan speed per bot | Yes | No

To proceed with indirect extrapolation, we must make two key assumptions:

(1) The attacker is oblivious to our sensors and thus sends probes to them without discrimination. This assumption is fundamental to general honeynet-based traffic study (cf. the probe-response attack developed in [9] and counter-defenses [10]). A general discussion of the problem is beyond the scope of this paper.
However, since we assume our technique is mainly used by a single enterprise or a set of collaborating enterprises, we need not release sensing information to the public, which counters the basic attack in [9]. With this assumption, we can treat the local view as providing unbiased samples of the global view.

(2) Each sender has the same global scan scope. This should be true if all the senders are controlled by the same botmaster and each sender scans uniformly using the same set of instructions.

We argue that these two fundamental assumptions likely apply to any local-to-global extrapolation scheme. In addition, we check for one general requirement before applying extrapolation, namely consistency with the presumption that each sender evenly distributes its scans across the global scan scope. This requirement is valid for the dark regions shown in Figure 4 (Section 3 above), i.e., both uniform random scanning and random permutation scanning, regardless of whether employing a hit-list. Therefore, prior to applying the extrapolation approaches, we test for consistency with uniformity (via the methodology discussed in Section 3), which many of the botnet scan events pass (80.3%).

There are some additional requirements specific to certain extrapolation approaches, as listed in Table 2. Botnet dynamics, such as churn or growth, can influence certain extrapolation approaches. Accordingly these approaches work better for short-lived events. Approach I, as discussed in Section 4.3, requires continuity of the IP fragment identifier (IPID) or ephemeral port, which holds for botnets dominated by Windows or MacOS machines (in our datasets we found all the events are dominated by Windows machines). We use passive OS fingerprinting to check whether we can assume that this property holds.

4.2 Estimating Global Population

Table 3 shows the notation we use in our problem formulation and analysis, marking estimates with "hats".
Table 3: Table of notations.
T | Event duration observed in the local sensor
d | Size of the local sensor
G | Size of global target scope
ρ | Local over global ratio d/G
M | Total # of senders in the global view in T
m | Total # of senders in the local view in T
m_1 | # of senders in the first half of the local view in T
m_2 | # of senders in the second half of the local view in T
m_12 | # of overlapped senders of m_1 and m_2 in T
R | Average scanning speed per bot
R_{G_i} | Global scanning speed of bot i
T_i | Time between first and last scan arrival time from bot i
n_i | Number of local scans observed from bot i in T
t_j | Inter-arrival time between the j and j + 1 scans
Q | Local total # of scans in T

For example, $\hat{\rho}$ represents the estimated local over global ratio, i.e., the ratio of local sensor size compared to the global target scope of the botnet event, and $\hat{G}$ represents the estimated global target scope.

If ρ is small, many senders may not arrive at the sensor at all. In this case, we cannot measure the total bot population directly. Instead, we extrapolate the total number of bots as follows. With the uniform scan assumption discussed above, we have:

$\frac{m_1}{M} = \frac{m_{12}}{m_2}$    (1)

based on the following reasoning. We can split the address range of the sensor into two parts. Since the senders observed in each part are independent samples from the total population M, Equation 1 follows from independence. For example, suppose there are in total M = 400 bots. In the first half of the sensor, we see $m_1 = 100$ bots, which is 1/4 of the total bot population. Consider the second half as another independent sensor, so the bots it observes form another random sample from the total population. Then each bot it observes has a 1/4 chance of having already been seen in the first half. If the second half observes $m_2 = 100$ bots too, the number of shared bots will be close to $m_{12} = 100/4 = 25$. Since in Equation 1 we can directly measure $m_1$, $m_2$, and $m_{12}$, we can therefore solve for M, the total number of bots in the population. This is a simple variation of a general approach used to estimate animal populations known as Mark and Recapture. Since $m_1$, $m_2$ and $m_{12}$ are measured in exactly the same time window (footnote 4), the estimated total population M is the number of bots of the botnet in the time window.

4.3 Exploiting IPID/Port Continuity

We now turn to estimating the global scan scope. We investigated two basic strategies: first, inferring the number of scans sent by sources in between observations of their probes at the Honeynet (Approach I); second, estimating the average bot global scanning speed using the minimal inter-arrival time we observe for each source (Approach II, covered in the next section).

Approach I is based on measuring changes between a source's probes in the IPID or ephemeral port number.
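Equation 1 from Section 4.2 can be turned into a few lines of code. The sketch below is an illustration under the paper's uniformity assumption; the function name is hypothetical, and the inputs are simply the sets of source addresses seen in each half of the sensor during the event window.

```python
def estimate_population(first_half, second_half):
    """Estimate the total bot count M from Equation 1: m1 / M = m12 / m2.

    first_half, second_half: sets of sender addresses observed in each
    half of the sensor address range during the same event window.
    Returns None when no sender is seen in both halves (m12 = 0), since
    the estimate is then undefined.
    """
    m1 = len(first_half)
    m2 = len(second_half)
    m12 = len(first_half & second_half)      # senders seen in both halves
    if m12 == 0:
        return None
    return m1 * m2 / m12

# The worked example from the text: m1 = m2 = 100 and m12 = 25.
seen_a = set(range(0, 100))
seen_b = set(range(75, 175))
estimated_m = estimate_population(seen_a, seen_b)
```

As with any capture-recapture estimate, the result becomes noisy when the overlap m12 is small, which is one reason the approach suits events large enough to produce many repeat senders.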
We predicate use of this test on first applying passive OS fingerprinting to identify whether the sender exhibits continuous IPID and/or ephemeral port selection. This property turns out (see below) to hold for modern Windows and Mac systems, as well as for Linux systems in the case of ephemeral ports.

IPID continuity. Windows and MacOS systems set the 16-bit IPID field in the IP header from a single, global packet counter, which is incremented by 1 per packet. During scanning, if the machine is mainly idle, and if the 16-bit counter does not overflow, we can use the difference in IPID between two observed probes to measure how many additional (unseen by us) scans the sender sent in the interval. (The algorithm becomes a bit more complex because of the need to identify and correct IPID overflow/wrap, as discussed below. We also need to take into account the endianness of the counter as present in the IP header.) A potential problem that arises with this approach is retransmission of TCP SYNs, which may increment the IPID counter even though they do not reflect new scans. Thus, when estimating global scan speed we divide by the average TCP SYN retransmission rate we observe for the sender.

Table 4: Aggregate operating system distribution, from passive OS fingerprinting of probing events.

Operating System          Clients
Windows                   159,152 (85.2%)
  Windows 2000/XP         155,869 (97.9%)
  Windows 2003/Vista      231 (<.1%)
  Windows NT              (1.07%)
  Windows                 (0.7%)
  Windows                 (<.01%)
  Windows other           39 (<.01%)
BSD                       458 (0.2%)
Linux                     126 (<.1%)
Novell                    20 (<.01%)
Unidentified              27,047 (14.4%)
Total                     186,725

Footnote 4: Mark and Recapture ordinarily requires the closed-system assumption because the two visits do not happen at the same time, which is different here.

Ephemeral port number continuity. All of the botnets for which we could inspect source code let the operating system allocate the ephemeral source port associated with scanning probes. Again, these are usually allocated by sequentially incrementing a single, global counter. As with IPID, we then use observed gaps in this header field to estimate the number of additional scans we did not see.
(In this case, the logic for dealing with overflow/wrapping is slightly more complex, since different operating systems confine ephemeral ports to different ranges. If we know the range from the fingerprinted OS, we use it directly; otherwise, we estimate it using the range observed locally, i.e., the maximum port number observed minus the minimum port number observed.)

IPID and ephemeral port number continuity validation. In a controlled experimental environment, we installed five versions of Windows, one of MacOS X, and two versions of Linux, each in a different virtual machine. We then ran Nmap on each to generate scans, confirming that all but Linux (2.4/2.6) exhibit continuity of IPID (with Win98 and NT4 incrementing it little-endian, but Win2000, WinXP, Win2003, and MacOS X using network order) and that all 8 systems allocate ephemeral ports sequentially. As shown in Table 4, for all the probing events in the two-year honeynet dataset, OS fingerprinting (via the p0f tool) indicates that the large majority of bots run Windows 2000/XP/2003/Vista (85%), enabling us to apply both IPID and ephemeral port number based estimation. From this analysis, we also know that the proportion of Windows 95/98/NT4 is very low (0.8%), and only for those cases do we need to switch the byte order. (These percentages match install-base statistics [5] indicating that Win98 and NT4 comprise less than 1.5% of systems overall.)

NAT effects on IPID and ephemeral port continuity. Since NATs can potentially alter IPID and ephemeral ports, we test three popular brands of home router in this regard (Linksys, Netgear and D-Link), which together comprise more than 70% of the home router market [1]. We use Nmap to send scans from hosts behind these NATs and examine whether their IPID or ephemeral ports change. For all three, IPID remains unchanged, and for a single scanner behind the NAT, the ephemeral port also remains unchanged.
For multiple scanners behind the NAT, the ephemeral port numbers of the first sender remain unchanged, though for the D-Link router the ports of additional scanners become arbitrary. Even though IPID remains unchanged, the intermingling of multiple IPID sequences for a single apparent source address renders simple extrapolation of scanning speed impractical. Techniques exist for detecting the presence of multiple sources behind a NAT (also based on IPID), but these require observing a large portion of the traffic coming out of the NAT [8], which is impractical in our case. However, given that we usually have a large number of distinct sources, we can restrict our analysis to those cases that exhibit strong linearity for either IPID or ephemeral port numbers, which avoids conflating patterns arising from multiple sources aliased to the same public IP address. In our evaluation, we find that on average 463 senders per event maintain linearity in IPID and/or ephemeral port numbers; thus, they can be used for extrapolation.

Global scan speed estimation. As the IPID and ephemeral port number approaches work similarly, here we discuss only the former. We proceed by identifying the top sources originating in at least four sets of scanning. We test whether (after overflow recovery) the IPIDs increase linearly with respect to time, as follows. First, for two consecutive scans, if the IPID of the second is smaller than that of the first, we adjust it by 64K. We then try to fit the corrected IPID and its corresponding arrival time t, along with previous points, to a line. If they fit with correlation coefficient r > 0.99, this reflects consistency with a near-constant scan speed, and indicates that the sender is a single host rather than multiple hosts behind a NAT. When this happens, we estimate the global speed from the slope. It is possible that multiple overflows occur between two observations, in which case the simple overflow recovery approach will fail. However, in this case the chance that we can still fit the IPIDs to a line is very small, so in general we will discard such cases. This creates a bias when estimating very large global scopes, because they will more often exhibit multiple overflows.

Sources that happen to engage in activity in addition to scanning can lead to overestimation of their global scan speed, since they will consume IPID or possibly ephemeral port numbers more quickly than the scanning alone would. To offset this bias, when we have both IPID and ephemeral port estimates, we use the lesser of the two. Furthermore, in our evaluation, for the cases where we can get both estimates, we check the consistency between them, and find that IPID estimates usually produce larger results, but more than 95% of the time lie within a factor of two of the ephemeral port estimate. (Clearly, IPID can sometimes advance more quickly if the scanner receives a SYN-ACK in response to a probe, and thus returns an ACK to complete the 3-way handshake.)

Global scan scope extrapolation. With the ability to estimate the global scan speed, we finally estimate the global scan scope. Since we know the local scope, the problem is equivalent to estimating the local over global ratio ρ.
Suppose in a probing event there are m senders seen by the sensor, for which we can estimate the global scan speeds $R_{Gi}$ of a subset of size m'. For sender i (i ∈ [m']), we know $T_i$ (the duration during which we observe the sender in the honeynet) and $n_i$ (the number of observed scans). We use linear regression with correlation coefficient r > 0.99 (as discussed before) to estimate $R_{Gi}$, so this estimate is quite accurate. The main estimation error comes from variation of the observed $n_i$ from its expectation.

Define

$$\hat{\rho}_i = \frac{n_i}{R_{Gi}\,T_i}$$

for each sender i. Sender i's global scan speed is $R_{Gi}$, so globally during $T_i$ it sends out $R_{Gi} T_i$ scans. $n_i$ is the number of scans we see if we sample from $R_{Gi} T_i$ total scans with probability ρ. Therefore, $\hat{\rho}_i$ is an estimator of ρ. If we aggregate over all the m' senders, we get

$$\hat{\rho} = \frac{\sum_{i=1}^{m'} n_i}{\sum_{i=1}^{m'} R_{Gi}\,T_i} \tag{2}$$

As shown in Appendix C, we formally prove that $\hat{\rho}$ is an unbiased estimator of ρ, and that it is more accurate than $\hat{\rho}_i$, which only reflects a single sender. We can then use $\hat{\rho}$ to estimate the global scope a probe targeted.

Average Scan Speed Per Bot. After extrapolating ρ and M, we estimate the average scan speed per bot using:

$$\frac{Q}{R\,T\,M} = \rho \tag{3}$$

Here Q is the number of scans received by the sensor in time T, which should reflect a portion ρ of the total scans. We estimate the total scans by R T M, where R is the average scan speed per bot. This formulation assumes that each bot participates in the entire duration of the event, which is more likely to hold for short-lived events.

[Figure 6: Top 30 estimated speeds of a VNC event. x-axis: rank; y-axis: estimated global speed (probes/sec).]

Limitations. Note that both of the above techniques can fail if attackers either craft raw IP packets or explicitly bind the source port used for TCP probes. Thus, the schemes may lose power in the future. However, crafting raw IP packets and simulating a TCP stack is a somewhat time-consuming process, especially given that most bots (85+%) we observed run Windows, and in modern Windows systems the raw socket interface has been disabled.
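To make Equations 2 and 3 concrete, the following sketch computes the aggregate ratio estimate, the implied global scope, and the average per-bot speed. The function names and all input values are hypothetical, chosen only so the arithmetic is easy to follow; they are not measurements from the paper.

```python
def estimate_rho(senders):
    """Aggregate estimator of Equation 2.

    senders: iterable of (n_i, R_Gi, T_i) tuples, i.e. scans seen locally,
    estimated global speed (probes/s) and observed duration (s).
    """
    senders = list(senders)
    seen_locally = sum(n for n, _speed, _duration in senders)
    sent_globally = sum(speed * duration for _n, speed, duration in senders)
    return seen_locally / sent_globally


def estimate_global_scope(local_size, rho):
    """rho is the local over global ratio d / G, so G = d / rho."""
    return local_size / rho


def average_speed_per_bot(total_local_scans, duration, population, rho):
    """Solve Equation 3, Q / (R * T * M) = rho, for R."""
    return total_local_scans / (rho * duration * population)


# Hypothetical inputs: two senders, a 2560-address sensor, 400 bots.
senders = [(10, 100.0, 100.0), (30, 100.0, 300.0)]
rho_hat = estimate_rho(senders)                # 40 scans seen of 40000 sent
scope = estimate_global_scope(2560, rho_hat)   # roughly 2.56 million addresses
speed = average_speed_per_bot(4000, 1000.0, 400, rho_hat)  # roughly 10 probes/s
print(rho_hat, round(scope), round(speed, 3))
```

Summing numerators and denominators before dividing, rather than averaging the per-sender ratios, weights each sender by the number of scans it sent, which matches the form of Equation 2.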
Empirically, in our datasets we did not find any case for which the techniques did not appear to apply.

4.4 Extrapolating from Interarrival Times

For Approach II, we estimate global scanning speed (and hence global scope, via estimating ρ from an estimate of R using Equation 3) in a quite different fashion, as follows. Clearly, a sender's global scan speed s provides an upper bound on the local speed we might observe for the sender. Furthermore, if we happen to observe two consecutive scans from that sender, then they should arrive about $\Delta t = 1/s$ apart. Accordingly, the minimum observed $\Delta t$ gives us a lower bound on s, but with two important considerations: (i) the lower bound might be too conservative if the global scope is large and we never observe two consecutive scans, and (ii) noise perturbing network timing will introduce potentially considerable inaccuracies in the assumption that the observed $\Delta t$ matches the interarrival spacing present at the source.

We proceed by considering all m senders we observe, other than those that sent only a single scan. We rank these by the estimated global scan rate they imply via $\hat{s} = 1/\widehat{\Delta t}$, where $\widehat{\Delta t}$ is the minimum observed interarrival time for the sender. Naturally, fast senders should tend to reflect larger estimated speeds, which we verified by comparing $\widehat{\Delta t}$ of each sender with how many scans we observed from it. We find that generally the correlation is clear, though with considerable deviations.

Using the fast senders' speeds to form an estimate of the average scanning speed may of course overestimate the average speed. On the other hand, our technique aims at estimating a lower bound. Thus, it is crucial to find a balanced point among the possible estimates. We do so by presenting the different sorted estimates, from which the analyst chooses the knee of the resulting curve, i.e., the point with the smallest rank k for which an increase in k yields little change in s.
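The ranking step of Approach II might be sketched as below. The text leaves the choice of the knee to the analyst, so the `pick_knee` heuristic (a relative-change tolerance) is purely an assumption for illustration, as are the function names and the sample timestamps.

```python
def ranked_speed_estimates(arrivals_by_sender):
    """Estimate s = 1 / (minimum interarrival time) per sender, fastest first.

    arrivals_by_sender maps a sender identifier to the arrival times (in
    seconds) of its scans at the sensor. Senders with a single scan, or with
    only identical timestamps, are skipped.
    """
    speeds = []
    for times in arrivals_by_sender.values():
        ordered = sorted(times)
        gaps = [later - earlier
                for earlier, later in zip(ordered, ordered[1:])
                if later > earlier]
        if gaps:
            speeds.append(1.0 / min(gaps))
    return sorted(speeds, reverse=True)


def pick_knee(sorted_speeds, tolerance=0.1):
    """Return (k, speed) for the smallest 1-based rank k at which moving to
    rank k + 1 changes the speed by less than `tolerance` (relative).

    Assumed heuristic only; returns None if no such rank exists.
    """
    for k in range(1, len(sorted_speeds)):
        current, following = sorted_speeds[k - 1], sorted_speeds[k]
        if (current - following) / current < tolerance:
            return k, current
    return None


# Hypothetical arrival times for three senders; "b" sent a single scan.
arrivals = {"a": [0.0, 0.5, 2.0], "b": [1.0], "c": [0.0, 0.25]}
print(ranked_speed_estimates(arrivals))                  # [4.0, 2.0]
print(pick_knee([100.0, 60.0, 40.0, 21.0, 20.0, 19.5]))  # (4, 21.0)
```

A fixed tolerance is a crude stand-in for visual inspection; an analyst looking at the plotted curve may reasonably pick a different rank.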
Figure 6 shows an example, plotting the top 30 maximum estimated speeds of one VNC event. From the figure we would likely select k = 6 as the knee, and take the speed at that rank as the estimate.

5. EVALUATION

We evaluate our techniques using the honeynet traffic described in Section 2.1. The total data spans 24 months and 293 GB of packet traces. Since the extrapolation algorithms we use are linear in the number of scans in the events, we find that our system takes less than one minute to analyze the scan properties and perform the extrapolation analysis for a given event. We use SNR = 50 and a tail parameter ω = 5 for event extraction (ranging ω from 3 to 8 yields identical results). We extract 203 botnet scan events and 504 misconfiguration events. There were a few moderate worm outbreaks observed during the period, such as the Allaple worm [4].

Table 5: The summary of the events.

Targeted Service     # of kinds of vul./probes    Events
NetBIOS/SMB/RPC      7                            81
VNC                  1                            39
Symantec             1                            34
MS SQL               1                            14
HTTP                 2                            13
Telnet               1                            12
MySQL                1                            6
Others               4                            4
Total                18                           203

The misconfiguration events are mainly caused by P2P traffic. In this paper, we focus on the botnet scan events. We first present characteristics of the botnet scanning events. Then we present the botnet event correlation study. Next we discuss results for the four botnet scan pattern checking techniques and their validation. We finish with the presentation of global extrapolation results and their validation using DShield, a world-wide scan repository.

5.1 Basic Characteristics of the Botnet Events

In Table 5, we break down the 203 events according to their targeted services. We find that most of the events target popular services that have a large install base. We also find that 30 (14.8%) events are purely port reconnaissance without any payloads. Another three events check whether the HTTP service is open by requesting the homepage. The remaining (83.7%) events target certain vulnerabilities. Therefore, these botnet scans likely reflect attempted exploitations.

Figure 7 shows the CDF of event duration. A botnet event can last from a few minutes to a few days. There are 36 events that last very close to half an hour, leading to the spike in Figure 7. As we will discuss in Section 5.2, this is a cluster of events which scan the same vulnerability every half hour over and over again, for days on end. Most likely these botnet events are driven by a single botmaster.

From Figure 8, we also find that the number of sources involved in a botnet event is quite heterogeneous. In Figure 9, we show the CDF of the number of unique ASes per event. For most events (62.7%), the bots come from more than 100 ASes. Only 3% of events reflect fewer than 20 ASes. This implies that cleaning the botnets from some part of the world (some of the ASes) will not improve the situation. Blocking them based on AS number is also very hard, due to the large number of ASes involved.
We also find that the number of destinations a bot scans differs significantly for different events, as shown in Figure 10.

We further study the OS, AS and IP distribution of the events. Table 4 in Section 4 shows the aggregated OS distribution. We see that Microsoft Windows is the most popular OS, with more than 83% of bots using Windows 2000/XP. (We see similar results when analyzing individual events.) For AS and IP address distribution, we find that the aggregated results (203 events together) are close to those seen in previous work [25]. However, we find very large variation across individual events; thus, address blacklists derived from one event might not be effective when defending against other events.

[Figure 7: Event Duration. CDF; x-axis: event duration (hours), log scale; y-axis: cumulative probability.]
[Figure 8: # of Sources. CDF; x-axis: # of sources per event; y-axis: cumulative probability.]
[Figure 9: # Source ASes. CDF; x-axis: # of ASes per event; y-axis: cumulative probability.]
[Figure 10: Avg. # Destinations / Source. CDF; x-axis: the average destinations per source contacted; y-axis: cumulative probability.]
[Figure 11: A subset of the cluster of 36 events which all target the same vulnerability in SMB. Nodes are events labeled SMB_COM_LOGOFF_ANDX-TCP445E; the number on an edge gives the percentage of bots shared.]

5.2 Event Correlation

We study the temporal and source (bot IP address) correlation of different events. In this context, if we find two events that have more than 20% of their source addresses in common, we consider them correlated. We calculate the percentage of sharing as the larger of the two ratios of shared addresses to each event's total addresses. We observe two types of interesting behavior:

Behavior 1: The botmasters ask the same botnet to scan the same vulnerability repeatedly. In our two years of data, we find several event clusters that exhibit this behavior. For example, there is a cluster of 36 events that occur every day, always scanning the same SMB vulnerability.
These events form a nearly complete clique, i.e., each event shares at least 20% of its source addresses with most of the other events. In Figure 11, we show a subset of this commonality graph. These events on average share about 35% of the same sources. Each event occurs on a different day. We speculate this activity reflects the botmaster commanding the same botnet to re-scan the same address range repeatedly.

Behavior 2: The botmasters appear to ask most of the bots in a botnet to focus on one vulnerability, while choosing a small subset of the bots to test another vulnerability. Apart from these big clusters, we find there are some cases in which two events have very high correlation (more than 80% of source address commonality) and occur very close in time, usually on the same day. We find that often the first event is much larger in terms of the number of bots than the second; the second is just a small subset of the bots from the first. This behavior illustrates the difficulty of fingerprinting botnet activity, given that botmasters may select a subset of bots to assign to different tasks.

5.3 Property-Checking Results

Figure 12 shows the breakdown of the events along different scanning dimensions. Six of the 203 events exhibit partial monotonic trends; 16.3% reflect hit-lists; 80.3% follow the random-uniform pattern, passing both uniformity and independence tests. Through manual inspection of the partial monotonic events, we find that nearly half of the bots scan randomly and the other half scan sequentially. All of these bots start to scan at almost the same time. Perhaps they reflect two groups of bots controlled by the same botmaster, with the botmaster asking the two groups to use different scan strategies; but in general, this behavior is puzzling.

Figure 12: Scan pattern checking results.

Hit-list: 16.3% (33)
  Monotonic trend: 0%
  Partial monotonic trend: 0%
  Uniform & independent: 13.8% (28)
  Uniform & non-independent: 0%
  Non-uniform: 2.5% (5)
Not hit-list: 83.7% (170)
  Monotonic trend: 0%
  Partial monotonic trend: 3.0% (6)
  Uniform & independent: 66.5% (135)
  Uniform & non-independent: 0%
  Non-uniform: 14.2% (29)
Overall: with monotonic trend 3.0%; no monotonic trend 97.0%.

[Table 6: Global scope extrapolation results and validation ("ex." denotes extrapolated; "DShield" denotes the validation results using DShield data). Columns: date (2006); desc; ex. scope (I) (/8); DShield scope (/8); scope ratio (I); ex. scope (II) (/8). The twelve rows are, in order: MSSQL; Symantec (three events); VNC (three events); NetBIOS (three events); SMB (two events).]

After that, we test the use of liveness-aware scanning (which we term "hit-lists"). As mentioned above, we use θ (the ratio of the number of senders in the darknet to the number in the live honeynet) as the metric to classify the events. Out of the 106 events classified by port number, 34 reflect hit-list scanning when using θ = 0.5. In fact, all of these have empirical values of θ < 0.01, while the events with θ > 0.5 lie clearly above that threshold. The 97 other events use popular ports also seen in background radiation, and thus we have to classify them based on application-level behavior. For these, we conservatively assume that all the senders in the darknet using the same port number are possible members of the event, which tends to overestimate θ. For these 97 events, we did not find any with small θ, and most of them have θ larger than one. We found that in all cases the results are insensitive to the threshold of θ. In addition, none of the events target only the darknet.
34 of the 197 random events fail the test for uniformity. We visually confirm that all of the remaining 163 events passing the test indeed appear uniform. Three of those that failed appear uniform visually, but have very large numbers of scans, for which the statistical testing becomes stringent in the presence of a minor amount of noise. In the remaining failed cases, we can see hot-spot addresses that clearly attract more activity than others; we do not know why.

Finally, we test the 163 uniform cases for coordination, not finding any instances at a 0.5% significance level. In addition, we simulate the advanced botnet permutation scan (ABPS), and find the dependency test can accurately detect it even with 0% to 20% packet loss. Thus, none of the scanning we observe appears to reflect any significant degree of coordination.

5.4 Extrapolation Evaluation and Validation

We validate two forms of global extrapolation, global scan scope and total number of bots, using data from DShield [27], a very large repository of scanning and attack reports.

Finding: 75% of our estimates of global scanning scope using only local data lie within a factor of 1.35 of estimates from DShield's global data, and all within a factor of 1.5.

Finding: 64% of bot population estimates are within 8% relative error of DShield's global data, and all within 27% relative error.

Of the 163 uniform events, 135 reflect independent uniform scanning and 28 reflect hit-list scanning. For each type we estimate either the total scanning range or the total size of the hit list, respectively. It is difficult to verify hit-list extrapolations because of the difficulty of assessing how the hit-list will align with sources that report to DShield. However, we can validate extrapolations from the first class of events, since we find they usually target a large address range. Due to limited data access to DShield, we have only been able to verify 12 cases as of today, as shown in Table 6.

5.4.1 Global Scope Extrapolation and Validation
Global scope extrapolation results: In Table 6, we show the scan scope we extrapolate from the local honeynet alongside the estimate we make with the DShield data. Column "ex. scope (I)" shows the scan scope extrapolated from the honeynet by Approach I. Column "DShield scope" shows the DShield-based estimate. Column "scope ratio" gives the ratio of the scope extrapolated by Approach I over the DShield scope. Column "ex. scope (II)" shows the scan scope extrapolated by Approach II. From the results, we see that our findings are consistent with those derived from DShield. Next, we introduce how the DShield validation works, and then analyze the accuracy of our results.

Validation Methodology: We find that most DShield sensors appear to have synchronized clocks (i.e., we often find significant temporal overlap between our honeynet events and the corresponding DShield reports). For a given extrapolation, we take two steps for validation. First, since the extrapolation results we obtained are all of /8 size or quite close, we try to find all the /8 networks (except those with private IP prefixes) with sufficient source overlap with the honeynet events. Second, for these /8 networks, we infer the scan scopes and compare them with our results.

Step 1. Let X denote the /8 IP prefix of our sensor. We first calculate the number of shared senders N(X) between our event data and the scan logs for X from DShield. We consider an additional /8 prefix Y if the number of senders it shares with the honeynet, N(Y), is larger than N(X)/3, reflecting an assumption that if a botnet uniformly scans multiple /8 prefixes, each should see quite a few sources in common. For X and each Y, we select the full width at half maximum (FWHM) of the unique source arrival process as a (conservative) way to delineate the global interval of the event. We then calculate the time range overlap with X for each Y; if the overlap of Y exceeds 50% of X's interval, we consider that the botnet scanned X and Y at the same time.

Step 2.
After finding the scanned /8 networks, we estimate the scan scope within each. Alternatively, we compute the ratio of sensors in each network reporting the scans. There are several limitations of DShield data. First, it does not contain complete scan information (only a subset of scans within a prefix are reported). Second, different sensors might use different reporting thresholds and might not see all activity (e.g., due to firewall filtering). These limitations make calibration of the data challenging. To assess the limitations, we check a one-week interval around our events to find which DShield sensors ever report a given type of activity. We treat all the reporting sensors in one /24 network as a single unique sensor. We count the number of sensors from different /24 networks, denoted by $C_{total}$. Similarly, we count the number of unique sensors from different /24 networks that reported scans from shared senders of the given event, denoted $C_{est}$. We reduce the noise from the DShield data by removing sensors that only report a single address within a /24 sensor. We then use $C_{est}/C_{total}$ to estimate the fraction of a /8 network scanned by the botnet, which gives us a conservative estimate of the event's total range. We add up such fractions if there are multiple related /8 networks discovered in the first step, and report the results in Column "DShield scope" of Table 6.

[Figure 13: The CDFs of the scope factors of the 12 events we validate. Two panels: Approach I scope factor and Approach II scope factor; y-axis: cumulative probability.]

Accuracy Analysis: We define the scope factor as

$$\text{scope factor} = \max\left(\frac{\text{DShield scope}}{\text{Honeynet scope}},\ \frac{\text{Honeynet scope}}{\text{DShield scope}}\right)$$

The scope factor indicates the absolute relative error in the log scale. The DShield data shows that our local estimates of global scope exhibit a promising level of accuracy. As shown in Figure 13, for Approach I the scope factors of 75% of the events are less than 1.35, and all of them are less than 1.5. Approach II (column "ex. scope (II)") works less well (58% of events are within a factor of three and 92% within a factor of six), but it may still exhibit enough power to enable sites to differentiate scans that specifically target them from broader sweeps. In our two-year dataset, we did not find any scan events specifically targeting the research institution where the sensor resides; this fits with the institute's threat model, which is mainly framed in terms of indiscriminate attacks.

5.4.2 Total Population Estimate and Validation

We assume that our honeynet event data and the corresponding DShield scan data give us two independent samples of the bot population, which gives another chance to use the Mark and Recapture principle. We count the sources observed by DShield sensors in IP prefix X, on the same port number and in the same time window, as the DShield sources. We term the sources in common between our honeynet and DShield the shared sources.
Following the same idea as Equation 1, the fraction of the DShield sources that are shared sources should equal the ratio between the bots observed in the honeynet and the total population. Since DShield sensors will see other scanners (constituting noise) as well, we will likely underestimate the first fraction, and consequently overestimate the bot population. Per the results shown below, we find the estimates very close to those we estimate locally by splitting the sensor into two halves.

Table 7 shows the extrapolation and DShield validation results. Column "ex. #bots" shows our bot population extrapolation constructed by splitting the sensor into two halves. Column "#bots DShield" shows the results using DShield's global data. Column "#bots ratio" gives the ratio between the two. Note that we only validate the seven port-number-based events (MSSQL, Symantec and VNC). The NetBIOS/SMB events require payload analysis, which we cannot validate through DShield since it does not provide any payloads. We find our approach is quite accurate, given that 64% of cases are within 8% relative error (|ours − DShield| / DShield).

[Table 7: Extrapolated bot population results and validation. Columns: date (2006); desc; ex. #bots; #bots DShield; #bots ratio. The seven rows are, in order: MSSQL; Symantec (three events); VNC (three events).]

5.4.3 Other Extrapolation Results

Based on Approach I, we can also infer the total number of scans and the extrapolated average scan speed of the bots in each event. In Figure 14, we show the extrapolated total number of scans, using a log-scaled X axis. We can see that the number of scans sent in different events can differ significantly, given that the duration and the number of bots differ from event to event. In Figure 15, we show the extrapolated average scan speed of the bots.

[Figure 14: Extrapolated # of scans. CDF; x-axis: # of extrapolated scans (M); y-axis: cumulative probability.]
[Figure 15: Extrapolated average scan speed. CDF; x-axis: extrapolated average speed (probes/sec); y-axis: cumulative probability.]

6. RELATED WORK

The work that most heavily influences us is the vision paper of Yegneswaran and colleagues on Internet situational awareness [30].
Their work outlines the general problem of analyzing honeynet traffic to assess its significance for the site observing it. The authors present the potential promise of such analysis using techniques that rely considerably on visualization. In this work, we aim to go substantially further, developing a toolkit for analyzing particular features of large-scale honeynet events, and devising techniques and a general framework to automatically or semi-automatically derive conclusions based on honeynet data.

DShield is the Internet's largest global alert repository [27]. The advantages of our approach compared with DShield are as follows: (i) In our experience, DShield data is quite noisy, and the sensor density quite non-uniform. These lead to cases where it is difficult to develop sound inferences from the data. (ii) DShield is subject to pollution and avoidance [9]. Depending solely on DShield might not be reliable for operational security. (iii) When the target scope is small, it is hard to find other sensors in DShield which share the same behavior; thus DShield will fail to work in such cases.

While the state of the art in terms of building honeynet systems has advanced considerably, the analysis of large-scale events captured by such systems remains in its early stages. The Honeynet project has developed a set of tools for host-level honeypot analysis [2]. At the network level, Honeysnap [3] analyzes the contents of individual connections, particularly for investigating IRC traffic used for botnet command-and-control. These approaches all either focus on single instances of activity, or on the study of particular botnets over time (e.g., [24]). In contrast, in this paper we aim to understand the significance of single, large-scale events as seen by honeynets. Such activity by definition entails analysis integrated across a large number of instances of the activity, but also (unlike [24]) localized in time.

Furthermore, the literature includes a number of forensic case studies analyzing specific large-scale events, particularly worms [16, 20].
Such case studies have often benefited from a priori knowledge of the underlying mechanisms generating the traffic of interest. For our purposes, however, our goal is to infer the mechanisms themselves from a starting point of more limited knowledge.

Finally, Gu et al. propose a series of botnet detection techniques based on behavior correlation [12, 13]. In contrast, we focus on inferring botnet properties in the wake of detection, rather than on detection itself.

7. CONCLUSIONS

In this paper we present several algorithms that can automatically analyze and determine the features of large-scale events observed at a honeynet that give insight into their underlying nature. In particular, we develop techniques for recognizing botnet scanning strategies and inferring a distributed scan's global properties. An evaluation of our tools using extensive honeynet and DShield data demonstrates the promise our approach holds for contributing to a site's situational awareness, including the crucial question of whether a large probing event detected by the site simply reflects broader, indiscriminate activity, or instead reflects an attacker who has explicitly targeted the site.

8. ACKNOWLEDGMENT

We would like to thank Vinod Yegneswaran and Ruoming Pang for helping collect the data and implementing the Bro payload summary scripts, the operations staff of the Lawrence Berkeley National Laboratory for facilitating the LBNL honeypot setup, and the anonymous reviewers for their valuable comments. This work was supported by DOE CAREER award DE-FG02-05ER25692//A001, DOD (Air Force Office of Scientific Research) Young Investigator Award FA , and NSF grants NSF and CNS . Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding sources.

9. REFERENCES

[1] AP Market Sharing. ups+and+downs/ _
[2] HoneyBow Sensor.
[3] Honeysnap. honeysnap/index.html.
[4] Net-Worm.Win32.Allaple.a. encyclopedia?virusid=
[5] OS Platform Statistics by W3school. browsers_stats.asp.
[6] BACHER, P., HOLZ, T., KOTTER, M., AND WICHERSKI, G. Know your Enemy: Tracking Botnets.
[7] BARFORD, P., ET AL. An inside look at botnets. In Series: Advances in Information Security. Springer,
[8] BELLOVIN, S., ET AL. A technique for counting NATted hosts. In Proc. of USENIX/ACM IMW (2002).
[9] BETHENCOURT, J., ET AL. Mapping internet sensors with probe response attacks. In Proc. of USENIX Security (2005).
[10] CAI, J., ET AL. Honeynets and honeygames: A game theoretic approach to defending network monitors. Tech. Rep. TR1577, University of Wisconsin,
[11] CHIANG, K., AND LLOYD, L. A case study of the rustock rootkit and spam bot. In Proc. of USENIX HotBots (2007).
[12] GU, G., PORRAS, P., YEGNESWARAN, V., FONG, M., AND LEE, W. Bothunter: Detecting malware infection through IDS-driven dialog correlation. In Proc. of USENIX Security (2007).
[13] GU, G., ZHANG, J., AND LEE, W. Botsniffer: Detecting botnet command and control channels in network traffic. In Proc. of NDSS (2008).
[14] KANNAN, J., JUNG, J., PAXSON, V., AND KOKSAL, C. Semi-automated discovery of application session structure. In Proc. of ACM IMC (2006).
[15] KENDALL, M. G. Rank Correlation Methods. Griffin,
[16] KUMAR, A., PAXSON, V., AND WEAVER, N. Exploiting underlying structure for detailed reconstruction of an internet scale event. In Proc. of ACM IMC (2005).
[17] LI, Z., GOYAL, A., CHEN, Y., AND KUZMANOVIC, A. P2P doctor: Measurement and diagnosis of misconfigured peer-to-peer traffic. Tech. Rep. NWU-EECS-07-06, Northwestern University,
[18] LI, Z., GOYAL, A., CHEN, Y., AND PAXSON, V. Towards situational awareness of large-scale botnet events using honeynets. Tech. Rep. NWU-EECS-08-08, Northwestern University,
[19] MANNA, P., CHEN, S., AND RANKA, S. Exact modeling of propagation for permutation-scanning worms. In IEEE INFOCOM (2008).
[20] MOORE, D., PAXSON, V., SAVAGE, S., SHANNON, C., STANIFORD, S., AND WEAVER, N. Inside the slammer worm. IEEE Security and Privacy (2003).
[21] PANG, R., YEGNESWARAN, V., BARFORD, P., PAXSON, V., AND PETERSON, L. Characteristics of Internet background radiation. In Proc. of ACM IMC (2004).
[22] PAXSON, V. Bro: A system for detecting network intruders in real-time. Computer Networks 31 (1999).
[23] PROVOS, N. A virtual honeypot framework. In Proc. of USENIX Security (2004).
[24] RAJAB, M., ZARFOSS, J., MONROSE, F., AND TERZIS, A. A multifaceted approach to understanding the botnet phenomenon. In Proc. of ACM IMC (2006).
[25] RAMACHANDRAN, A., AND FEAMSTER, N. Understanding the network-level behavior of spammers. In Proc. of ACM SIGCOMM '06 (September 2006).
[26] RICE, J. A. Mathematical Statistics and Data Analysis. Duxbury Press,
[27] SANS INSTITUTE. Dshield.org: Distributed intrusion detection system.
[28] STANIFORD, S., PAXSON, V., AND WEAVER, N. How to 0wn the Internet in your spare time. In Proc. of USENIX Security (2002).
[29] WEISSTEIN, E. W. Stirling Number of the Second Kind. StirlingNumberoftheSecondKind.html.
[30] YEGNESWARAN, V., BARFORD, P., AND PAXSON, V. Using honeynets for internet situational awareness. In Proc. of ACM Hotnets IV (2005).
[31] ZOU, C., GAO, L., GONG, W., AND TOWSLEY, D. Monitoring and early warning for internet worms. In Proc. of ACM CCS (2003).

APPENDIX

A. MODELING HOW BOTS SCAN

A.1 Bot Source Code Study

By analyzing the source code of five popular families of bots, we study different dimensions of the scan strategies employed by botnets. The popularity of these five bot families is confirmed in [6, 7]. Our findings confirm those in [7], but we focus more on scan patterns.

Botnet name             Agobot     Phatbot    Spybot     SDBot      rxbot
Global                  Yes        Yes        Yes        Yes        Yes
Local                   Yes        Yes        Yes        Yes        Yes
Hit-list                Possible   Possible   Possible   Possible   Possible
Independent & Uniform   Yes        Yes        No         Yes        Yes
Sequential              No         No         Yes        Yes        Yes
# of lines
Modularity              Medium     High       Low        Low        High

Table 8: Botnet source code study.

Table 8 shows the scan strategies and complexity of the bot families. Some of them have a well-designed modular structure. Currently, these bot families mainly use simple scanning strategies. Each supports both Global scanning (of a specified address block) and Local scanning (relative to each bot's address). By hit-list scanning, we refer to an event for which the attacker appears to have previously acquired a specific list of targets. Such scans may heavily favor live addresses (those that respond) over dark (nonresponsive) addresses. The five bot families we analyzed do not directly automate hit-list scanning, but an attacker can achieve it in two steps: first scanning to gather a list of live addresses or blocks, and then specifying these at the command line. In addition, most bot families support (uniformly) Random and Sequential scanning of the designated addresses or blocks.

Our dataset analysis accords with the above capabilities: most scanners we observe use either simple sequential scanning (the IP address increments by one between scans) or independent uniform random scanning. We do observe more sophisticated monotonic trends (the address incrementing by k), but very infrequently. We also observe botnets using hit-list scanning quite frequently.

A.2 Modeling Botnet Global Scanning

There is a large design space for botmasters when developing scan strategies, but we expect that the following features are usually desired:

- Cover the target scope fully.
- Distribute the load based on bots' capabilities.
- Low communication overhead for coordination.
- Scan detection evasion. Botmasters may want bots to avoid aggressive scanning of a small address range, to avoid easy detection and blocking by IDS/IPS systems.
- Redundancy.
Since the bots in a botnet can readily be lost due to detection or simply the host computer going offline, the botmaster will prefer instructing multiple bots to scan the same addresses.

A similar analysis is proposed in [19] for worms. Given these desired features, a simple and effective approach is to ask each bot to independently scan the specified range in a uniform random fashion. Doing so achieves scan detection evasion, low communication overhead, and load distribution, while also providing good coverage and redundancy. This approach is also simple to implement correctly. Most of the events we found in our datasets are close to uniform scanning.

Advanced Scanning Strategies. In fact, by introducing some simple coordination between bots one can do better than uniform random scanning in both coverage and redundancy. An advanced scanning strategy, called worm scan permutation, was proposed in the context of worm propagation [28]. That strategy, however, is optimized for worms and does not consider the use of botnet C&C channels. Potentially, with C&C channels botnets can achieve even better coordination. Using the botnet C&C, we propose a better scan strategy called Advanced Botnet Permutation Scan (ABPS). Each bot permutes the whole scanning scope in the same way, using a key from the botmaster. Then, based on the bots' capabilities, the botmaster divides replicas of the permuted IP scope among all the bots. This can achieve much better coverage and redundancy. We simulate this strategy in our evaluation.

B. PROOF OF THEOREM 1

PROOF. There are in total $d^n$ ways to distribute the $n$ scans over $d$ addresses. Suppose $X_0$ of these ways leave exactly $z_0$ addresses receiving zero scans (i.e., $z_0$ empty slots). Then $P(z_0) = X_0 / d^n$. We will show that for a given $z_0$,

$$X_0 = \binom{d}{z_0} \cdot \mathrm{Stirling2}(n, d - z_0) \cdot (d - z_0)!$$

Among the $d$ addresses, there are $\binom{d}{z_0}$ configurations selecting which $z_0$ addresses receive zero scans. Each configuration has $z_0$ addresses that receive zero scans and $d - z_0$ addresses that receive a nonzero number of scans.
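$\mathrm{Stirling2}(n, m)$ denotes the number of ways of partitioning a set of $n$ elements into $m$ nonempty sets [29]. After partitioning the $n$ scans into $d - z_0$ sets, we have $(d - z_0)!$ ways to map the sets to the addresses. Therefore, for each configuration we have $\mathrm{Stirling2}(n, d - z_0) \cdot (d - z_0)!$ ways to distribute the $n$ scans over the $d - z_0$ addresses. Hence we have proved

$$X_0 = \binom{d}{z_0} \cdot \mathrm{Stirling2}(n, d - z_0) \cdot (d - z_0)!$$

C. PROOF OF THEOREM 2 AND 3

Proof of Theorem 2:

THEOREM 2. $\hat{\rho}$ is an unbiased estimator for $\rho$.

PROOF.

$$E(\hat{\rho}) = E\left(\frac{\sum_{i=1}^{m} n_i}{\sum_{i=1}^{m} R_{G_i} T}\right) = \frac{\sum_{i=1}^{m} E(n_i)}{\sum_{i=1}^{m} R_{G_i} T}$$

As we mentioned, $n_i$ is the number of scans we see if we sample from $R_{G_i} T$ total scans with probability $\rho$, which follows a binomial distribution. Hence we have $E(n_i) = \rho \cdot R_{G_i} T$. Therefore,

$$E(\hat{\rho}) = \frac{\sum_{i=1}^{m} \rho \cdot R_{G_i} T}{\sum_{i=1}^{m} R_{G_i} T} = \rho$$

Proof of Theorem 3:

THEOREM 3. $VAR(\hat{\rho}) = \frac{\rho (1 - \rho)}{\sum_{i=1}^{m} R_{G_i} T} < VAR(\hat{\rho}_i)$, i.e., the accuracy of the $\rho$ estimator when aggregating over all $m$ senders is higher than that of each and every single sender.

PROOF. Treating the per-sender counts $n_i$ as independent,

$$VAR(\hat{\rho}) = VAR\left(\frac{\sum_{i=1}^{m} n_i}{\sum_{i=1}^{m} R_{G_i} T}\right) = \frac{\sum_{i=1}^{m} VAR(n_i)}{\left(\sum_{i=1}^{m} R_{G_i} T\right)^2}$$

As before, since $n_i$ follows a binomial distribution, we have $VAR(n_i) = \rho (1 - \rho) \cdot R_{G_i} T$. Therefore,

$$VAR(\hat{\rho}) = \frac{\sum_{i=1}^{m} \rho (1 - \rho) \cdot R_{G_i} T}{\left(\sum_{i=1}^{m} R_{G_i} T\right)^2} = \frac{\rho (1 - \rho)}{\sum_{i=1}^{m} R_{G_i} T}$$

On the other hand,

$$VAR(\hat{\rho}_i) = VAR\left(\frac{n_i}{R_{G_i} T}\right) = \frac{VAR(n_i)}{(R_{G_i} T)^2} = \frac{\rho (1 - \rho)}{R_{G_i} T}$$

Since $\sum_{i=1}^{m} R_{G_i} T > R_{G_i} T$ whenever $m > 1$, it follows that $VAR(\hat{\rho}) < VAR(\hat{\rho}_i)$.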
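
The results in Appendices B and C can be checked informally with a short numerical sketch. The Python code below is illustrative only and is not taken from the tools described in this paper. It compares the closed form for $P(z_0)$ against brute-force enumeration on a small instance, and then uses a Monte Carlo simulation to compare the aggregate estimator with the per-sender estimators. All function names and parameter values are hypothetical, and `totals[i]` is assumed to stand for the number of scans $R_{G_i} T$ that sender $i$ issues globally.

```python
import random
from fractions import Fraction
from itertools import product
from math import comb, factorial
from statistics import mean, pvariance


def stirling2(n, m):
    """Stirling number of the second kind via S(a,b) = b*S(a-1,b) + S(a-1,b-1)."""
    table = [[0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for a in range(1, n + 1):
        for b in range(1, min(a, m) + 1):
            table[a][b] = b * table[a - 1][b] + table[a - 1][b - 1]
    return table[n][m]


def prob_empty(n, d, z0):
    """Closed form: P(z0) = C(d, z0) * Stirling2(n, d - z0) * (d - z0)! / d^n."""
    hit = d - z0
    return Fraction(comb(d, z0) * stirling2(n, hit) * factorial(hit), d ** n)


def prob_empty_bruteforce(n, d, z0):
    """Enumerate all d^n assignments of n scans to d addresses (small inputs only)."""
    matches = sum(
        1 for scans in product(range(d), repeat=n) if d - len(set(scans)) == z0
    )
    return Fraction(matches, d ** n)


def simulate_estimators(rho, totals, trials, seed=0):
    """Draw n_i ~ Binomial(totals[i], rho) per trial; return samples of the
    aggregate estimator and of each per-sender estimator."""
    rng = random.Random(seed)
    aggregate, per_sender = [], [[] for _ in totals]
    for _ in range(trials):
        # Each global scan lands in the monitored range with probability rho.
        seen = [sum(rng.random() < rho for _ in range(t)) for t in totals]
        aggregate.append(sum(seen) / sum(totals))
        for samples, n_i, t in zip(per_sender, seen, totals):
            samples.append(n_i / t)
    return aggregate, per_sender


if __name__ == "__main__":
    n, d = 6, 4
    for z0 in range(d + 1):
        assert prob_empty(n, d, z0) == prob_empty_bruteforce(n, d, z0)

    rho, totals = 0.1, [50, 100, 200]
    agg, singles = simulate_estimators(rho, totals, trials=2000)
    print("mean of aggregate estimate:", mean(agg))
    print("variance of aggregate estimate:", pvariance(agg))
    for t, s in zip(totals, singles):
        print(f"variance for sender with {t} scans:", pvariance(s))
```

The enumeration is exponential in $n$, so it only serves as a check on tiny inputs. The simulation relies on the same independence and binomial assumptions as the proofs: under those assumptions the mean of the aggregate estimate should settle near $\rho$, and its variance should fall below that of every individual sender.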