Automating Analysis of Large-Scale Botnet Probing Events
Zhichun Li, Anup Goyal and Yan Chen
Northwestern University
2145 Sheridan Road, Evanston, IL, USA

Vern Paxson
UC Berkeley & ICSI
1947 Center St., Suite 600, Berkeley, CA, USA

ABSTRACT
Botnets dominate today's attack landscape. In this work we investigate ways to analyze collections of malicious probing traffic in order to understand the significance of large-scale "botnet probes". In such events, an entire collection of remote hosts together probes the address space monitored by a sensor in some sort of coordinated fashion. Our goal is to develop methodologies by which sites receiving such probes can infer, using purely local observation, information about the probing activity: What scanning strategies does the probing employ? Is this an attack that specifically targets the site, or is the site only incidentally probed as part of a larger, indiscriminate attack? Our analysis draws upon extensive honeynet data to explore the prevalence of different types of scanning, including properties such as trend, uniformity, coordination, and darknet avoidance. In addition, we design schemes to extrapolate the global properties of scanning events (e.g., total population and target scope) as inferred from the limited local view of a honeynet. Cross-validating with data from DShield shows that our inferences exhibit promising accuracy.

Categories and Subject Descriptors
C.2.3 [Computer-Communication Networks]: Network Operations - network monitoring; C.2.0 [Computer-Communication Networks]: General - Security and protection

General Terms
Algorithms, Measurement, Security

Keywords
Botnet, Global property extrapolation, Honeynet, Scan strategy inference, Situational awareness, Statistical inference

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASIACCS '09, March 10-12, 2009, Sydney, NSW, Australia. Copyright 2009 ACM /09/03...$5.00.

1. INTRODUCTION
When a site receives probes from the Internet (whether basic attempts to connect to its services, or apparent attacks directed at those services, or simply peculiar spikes in seemingly benign activity), often what the site's security staff most wants to know is not "are we being attacked?" (since the answer to that is almost always "yes, all the time") but rather "what is the significance of this activity?" Is the site being deliberately targeted? Or is the site simply receiving one small part of much broader probing activity?

For example, suppose a site with a /16 network receives malicious probes from a botnet. If the site can determine that the botnet probed only their /16, then they can conclude that the attacker may well have a special interest in their enterprise. On the other hand, if the botnet probed a much larger range, e.g., a /8, then very likely the attacker is not specifically targeting the enterprise.

The answers to these questions greatly influence the resources the site will choose to employ in responding to the activity. Obviously, the site will often care more about the probing if the attacker has specifically targeted the site, since such interest may reflect a worrisome level of determination on the part of the attacker. Indeed, such targeted attacks have recently grown in prominence. Yet given the incessant level of probing all Internet addresses receive [21], how can a site assess the risk a given event reflects? In this work we seek to contribute to the types of analysis that sites can apply to gauge such risks.
We orient much of our methodology around the assumption that most probing events reflect activity from the coordinated botnets that dominate today's Internet attack landscape. Our approach is limited to analyzing fairly large-scale activity that involves multiple local addresses. As such, our techniques are suitable for use by sites that deploy darknets (unused subnets), honeynets (subnets for which some addresses are populated by some form of honeypot responder), or in general any monitored networks with unexpected access, for which we can detect botnet probing events. The main contribution of this paper is the development of a set of techniques for analyzing botnet events, most of which do not require the use of responders. For simplicity, we will refer to the collection of sensors as the site's "Sensors".

In contrast to previous work on botnets, which has focused on host-level observations of single instances of botnet activity, studies of particular captured botnet binaries [11], or network-level analysis of command-and-control (C&C) activity [24], our techniques aim to characterize facets of large-scale botnet probing events regardless of the nature of the botnet. Our analysis does not require assumptions about the internal organization and communication mechanisms employed by the botnets. We focus on characterization of botnet properties based on inferences from their probing behavior. In addition, our approach has the significant benefit of requiring only local information, rather than global information as required by collaborative efforts such as DShield [27]. We give more detailed comparisons in Section 6.

We frame the contributions of our work as follows. First, we develop a set of statistical approaches to assess the attributes of large-scale probing events seen in Sensors, including checking for trends, uniformity, coordination, and one specific form of hit-list (Section 3). The type of hit-list we focus on is liveness-aware scanning, in which the attackers try to avoid darknets. For trend and uniformity checking, the statistical literature provides apt techniques, but assessing coordination and use of hit-lists requires developing new techniques. We confirmed the consistency of the statistical techniques for inferring event properties with manual inspection or visualization. Applying such statistical testing on massive honeynet traffic reveals some interesting and sophisticated botnet scan behaviors such as coordinated scans. We then used our suite of tests to frame the scanning strategies employed during different probe events, from which we can further extrapolate the global properties for particular strategies.

[Figure 1: System architecture. Botnet detection: traffic from honeynets/honeyfarms passes through traffic classification, event extraction, misconfiguration separation and worm separation. Botnet inference: model checking (monotonic trend, hit-list, uniformity and independency checking), then global property extrapolation for botnets with a uniform scan model.]

[Figure 2: The distribution of the malicious payload discovered in the scan events. Categories: HTTP Vul., MSSQL Vul., Symantec Vul., VNC Vul., SMB/RPC Vul., Other Vul., Not Vul.]

Second, we devise two algorithms to extrapolate the global properties of a scanning event based on a sensor's limited local view. These algorithms are based on different underlying assumptions and exhibit differing accuracies, but both enable us to infer the global scanning scope of a probing event, as well as the total number of bots including those unseen by the Sensors, and the average scanning speed per bot (Section 4). The global scanning scope enables the site's operators to assess whether their network is a specific target of botnet activity, or if instead the botnet's scanning targets a large network scope that simply happens to include the site. The estimated total botnet size can help us track trends in how botnets are used, with implications for their C&C capabilities.
The algorithms are rooted in the observation (confirmed by our checking of scanning properties) that the most frequent scanning patterns reflect uniform random scanning or uniform hit-list scanning. Indeed, nearly all of the probing events we observed follow one of these two scan patterns.¹

In Section 5, we evaluate our techniques using a 24-month trace (293 GB total) of Honeynet traffic collected at a large research institution. Of the events classified as likely botnet activity (i.e., not misconfigurations or worms), most reflected either uniform-random or uniform-hit-list scanning. Analyzing the data, we find that 66.5% of botnet events exhibit uniform random scanning and 16.3% of botnet events reflect hit-list scanning, 85% of which were also uniform. We also find that most of these probes include attacks. As shown in Figure 2, our honeynet measurements find that about 84% of scan events carry malicious payloads targeting vulnerabilities of different protocols, such as SMB/RPC, MSSQL, VNC, etc.² We note that such botnet scans are one key technique employed for botnet recruitment [24]. Through an event correlation study, we also find some interesting behaviors in how botmasters control their bots.

¹ Of course there is the usual arms race here between attackers and defenders. If our techniques become widely used, then attackers may modify their probing traffic to skew the defenders' analysis. But until the botmasters take steps to do so, these techniques have value. We adopt the view common in network security research that there is significant utility in "raising the bar" for attackers even if a technique is ultimately evadable.

² "Not Vul." consists of instances where the honeynet received little or no payload, or purely service-testing probes.

[Figure 3: Temporal distribution of source count for VNC (5900). X axis: time (six-hour intervals), year 2006. Y axis: unique source counts.]

To validate our estimates of the global properties, we compare our results with those from DShield [27], the Internet's largest global alert repository.
We find that in 75% of cases, our extrapolated scope is within a factor of 1.35 of the scan scope observed in DShield data. In all cases it is within a factor of 1.5. The results indicate that our approaches hold promise for sufficient accuracy to enable sites to make reliable inferences, with the caveat that we were unable to find any instances of events in our current dataset that reflected a global scope much different from /8.

2. SYSTEM FRAMEWORK
The architecture of our design is shown in Figure 1. The system has two subsystems: botnet detection and botnet inference. In this paper we focus on the latter (right-hand half of Figure 1). All of the steps in our analysis system are automated, most of them fully so. We mainly use the Honeynet sensor to drive the rest of the discussion, although we can generally apply our analysis techniques (the botnet inference subsystem) to botnet probe events detected by other types of sensors.

The system classifies traffic seen on the sensors by different protocols or by session semantics. We define a session as a set of connections between a pair of hosts with a specific purpose, perhaps involving multiple application protocols. The system extracts events based on the number of unique sources arriving in a window of time (cf. the spikes in Figure 3), classifying the activity into misconfigurations, worms, and botnet-like probing.

2.1 Honeynet and Data Collection
Our detection sensor consists of ten contiguous /24 subnets within one of a large research institution's /16 networks. We deployed Honeyd responders [23] on five of the subnets and operated the other five completely dark. (We use the latter for hit-list detection.) The Honeyd configuration is similar to that used by Pang et al. in [21]: we simulate the HTTP, NetBIOS, SMB, WINRPC, MSSQL, MYSQL, SMTP, Telnet and DameWare protocols, with echo servers for all other port numbers. We evaluate our analysis techniques using 293 GB of trace data collected over two years (2006
and 2007).

2.2 Botnet Detection Subsystem
In this paper we mainly focus on botnet inference. For completeness, we briefly describe here how we detect botnet events. The details are available in our technical report [18].

Traffic Classification: Attack traffic can have a complex session structure involving multiple application protocols. For example, an attacker can send an exploit to TCP port 139 which, if successful, results in opening a shell and issuing an HTTP download command. Often the application protocol contacted first is the protocol being exploited (an exception is an initial connection to a portmapper service), so we label sessions with the service associated with the first destination port appearing in them. Doing so also provides consistent labeling for connection attempts seen in darknets or other types of sensors. We aggregate connections into sessions using an approach similar to the first-step algorithm by Kannan et al. [14]. For application protocols not commonly used, the background radiation noise (including individual port scans) is typically low, and thus we use port numbers to separate event traffic. However, noise is usually strong for popular protocols, requiring further differentiation based on payload (when available). To do so, we implemented payload summary scripts for 20 commonly seen protocols, based on the Bro system's network analysis capabilities [22].

Event Extraction: Figure 3 shows source arrival counts for VNC (TCP port 5900) for the year 2006 on our sensor, where each point represents the number of sources within a six-hour interval. Large spikes in such plots generally correspond to scanning from worms or apparent botnets, or to misconfigurations. We classify such spikes as events, as follows. We define the noise strength N as the per-interval count of unique sources seen in the absence of events. Suppose the time interval length is I. We calculate N as the median of the unique source counts of the K continuous time intervals before the event.
We define the signal strength $S = X - N$ as the peak unique source count arrival X minus the noise strength N, and define the signal-to-noise ratio as

$$\mathrm{SNR} = \frac{S}{N} = \frac{X - N}{N} = \frac{X}{N} - 1.$$

In our evaluation we use I = 6 hours and K = 120. The aggregated time window $I \times K$ is about 30 days. We only examine events with $\mathrm{SNR} \ge 50$. We automatically extract potential events as follows: for any given time interval, we calculate the median of the previous normal K intervals and the SNR. For those spikes exceeding our SNR threshold, we extend the time range to both sides until the signal strength S falls to $\omega N$, where ω is a tunable parameter controlling the amount of the signal tail to include in the event. (We use ω = 5, though we find that varying it does not significantly alter the results.) For multiple events within one time series, we extract the events iteratively, starting with the event with the largest SNR.

One problem we have to consider is that some events have complex session structures involving multiple protocols. After traffic classification by protocol information, a single event can be separated into multiple events. Therefore, after event classification, we need to merge them. We detect such cases by checking the connection correlation. If two connections are in one session, they will both be from host A to host B and the protocols of the two connections are fixed. For example, suppose the first connection is HTTP and the second one is WINRPC. If we find such events to be highly correlated, i.e., for most connections in the HTTP event, each HTTP connection is followed by a WINRPC connection from the WINRPC event for the same source and destination pair, we merge them into one event.
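The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `extract_event` is hypothetical, the noise window is simply the K intervals immediately preceding each point (the text uses the previous "normal" intervals, i.e., excluding earlier events), only the single strongest spike is returned, and the comparisons `>=` and `>` are assumptions about how the thresholds are applied.

```python
from statistics import median

def extract_event(counts, K=120, snr_threshold=50.0, omega=5.0):
    """Return (start, end, snr) for the strongest spike, or None.

    counts[i] is the number of unique sources seen in interval i.
    Simplifying assumption: noise N is the median of the K intervals
    immediately before the candidate interval.
    """
    best = None
    for i in range(K, len(counts)):
        noise = median(counts[i - K:i])
        if noise <= 0:
            continue  # assumption: skip windows with no background noise
        snr = counts[i] / noise - 1  # SNR = X/N - 1
        if snr >= snr_threshold and (best is None or snr > best[2]):
            best = (i, noise, snr)
    if best is None:
        return None
    peak, noise, snr = best
    start = end = peak
    # Extend to both sides while the neighbouring signal S = X - N
    # is still above omega * N.
    while start > 0 and counts[start - 1] - noise > omega * noise:
        start -= 1
    while end < len(counts) - 1 and counts[end + 1] - noise > omega * noise:
        end += 1
    return start, end, snr
```

For example, with 120 quiet intervals of 10 sources followed by counts of 10, 80, 600, 90 and 10, the spike of 600 gives an SNR of 59 and the event spans the three intervals around it. Extracting several events from one series would repeat this after masking out the intervals already assigned to an event.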
Event Classification and Separation: We separate misconfigurations from worms or botnets based on the observation that botnet scans and worms should contact a significant range of the IP addresses, whereas misconfigurations exhibit a few hot-spot targets. We found that most misconfiguration events are due to P2P traffic. The detailed analysis of these misconfigurations is in our technical reports [17, 18]. In general, probing from worms (self-propagating processes) can look very similar to that from botnets (processes under a common C&C), and indeed the line between the two can blur depending on the nature of the commands that botmasters issue to their bots. For our purposes, we identify and remove as worms those events that exhibit an exponentially growing trend (per the technique developed in [31]) and deem the remainder botnet probing events.

[Figure 4: Model Checking Design Space. Axes: hit-list vs. not hit-list; with vs. without monotonic trend. Cells: monotonic trend, partial monotonic trend, uniform & independent, uniform & non-independent, non-uniform.]

2.3 Botnet Inference Subsystem
Scan Pattern Checking: For botnet probing events, there are numerous scanning strategies that attackers can potentially use. Identifying the particular approach can provide a basis to infer further properties of the events and perhaps of the botnets themselves. We refer to these strategies as scan patterns, and undertake to develop a set of scan-pattern checking techniques to understand different dimensions of such strategies:
- Monotonic trend checking
- Hit-list checking
- Uniformity checking
- Dependency checking
For details, see Section 3.

Global Property Extrapolation: Once we identify a probing event's scan pattern, we then use the scan pattern to extrapolate a global view of the event. We focus on two of the most common scan patterns: uniform random scanning, and uniform hit-list scanning. We confirm their common use both from botnet source code analysis (Section A) and experimental observations (Section 5).
We then extrapolate the global scan scope and the global number of bots based on these two scan patterns, using techniques developed in Section 4.

3. PROPERTY CHECKING OF BOTNET SCAN PATTERNS
The whole design space of botnet probing strategies is very large, and it is hard to consider all of them in our botnet inference framework. Through botnet source code analysis and reasoning about what a rational botmaster would do (the details are in Appendix A), we find that uniform random scanning, hit-list scanning, monotonic scanning and coordinated permutation scanning are the strategies most likely to be used by botmasters, given that they are simple and effective. In this section we develop a set of analysis algorithms for detecting these scan strategies. Each is designed to check a single dimension of the characteristics of the scan pattern. We then combine the characteristics of an event to construct the scan pattern in use.

We first classify the scan traffic pattern into monotonic, partially monotonic and non-monotonic trends. For a non-monotonic trend, we assess the possible use of a hit-list or random-uniform scanning (even distribution of scans across the portion of the sensor space). Finally, for a random-uniform pattern we test whether the senders can be modeled as independent.
[Figure 5: Hit-list and uniform scanning distribution on the sensor. Two panels, "Hit list" and "Uniform random", each plotting #scan per IP against destination IPs in the sensor.]

3.1 Monotonic Trend Checking
Question: Do senders follow a monotonic trend in their scanning?

Monotonically scanning the destination IP addresses (e.g., sequentially one after another) is a common scan strategy widely used by network scanning tools. In our evaluation, we did find a few events that use monotonic trend scanning. Furthermore, for random events, monotonic trend checking can help us filter out the noise caused by non-bot scanners.

For each sender, we test for monotonicity in targeting by applying the Mann-Kendall trend test [15], a non-parametric hypothesis testing approach. In our study, we set the significance level to 0.5%, since a higher significance level would introduce more false positives and we need to check thousands of sources. In our evaluation, we manually checked the statistical power and found it high enough to detect weak trends. The intuition behind this test is that if the data have a monotonic trend, the aggregated sign value (+1 for an increase, 0 for a tie, -1 for a decrease) of all the consecutive value pairs would fall outside the range that randomness can achieve. In our technical report [18], we describe the detailed approach and our enhancement to the original Mann-Kendall trend test.

We label an entire event as having a monotonic trend if more than 80% of senders exhibit a trend, and for further analysis remove those that do not reflect a trend as likely representing separate activity (thus likely removing a source of potential noise). We instead label the event as non-monotonic if more than 80% of senders do not exhibit a trend. We label the remainder as partially monotonic.

3.2 Hit-List Checking
Question: Do the bots use a target hit-list for scanning?

By hit-list scanning, we refer to an event for which the attacker appears to have previously acquired a specific list of targets. Hit-lists are often employed by sophisticated botmasters to achieve high scan efficiency.
It is important for network administrators to know whether they are in the hit-list. When that is the case, they will most likely be re-scanned by the attacker again and again.

We detect the use of a hit-list based on the observation that such scans should heavily favor live addresses (those that respond) over dark (non-responsive) addresses. To this end, we operate half of our sensor region in a live fashion and half dark. If we observe an event in the Honeynet portion, but not in the darknet portion, this provides strong evidence that the scan used a hit-list. However, one consideration is event pollution (sources that actually are background noise rather than part of the botnet). We therefore do not require a complete absence of darknet scanning, and instead test whether the prevalence of honeynet scans over darknet scans significantly exceeds what we would expect.

Figure 5 compares an example hit-list event (WINRPC) with a random-uniform event (VNC). To distinguish between two such cases, we define the ratio of the number of senders that target the darknet ($m_d$) to those that target the honeynet ($m_h$) as $\theta = m_d / m_h$. We then test whether θ crosses a given threshold. In our evaluation, we find the results are not sensitive to the threshold we choose.

Note that for events that require application-level analysis to separate the activity from the background traffic (e.g., different types of HTTP probing), sources in the event will necessarily be restricted to the honeynet, because application-level dialog requires responses that the darknet cannot provide. In this case we can still perform an approximate test, by testing the volume of traffic seen concurrently in the darknet using the same port number. Doing so may miss some hit-list events, however, because we tend to overestimate the amount of activity the botnet exhibits in the darknet.

Other factors could also potentially cause an imbalance between the darknet and the Honeynet.
However, most of these do not result in a significantly small θ, except the one in which an attacker chooses a small scan range that happens to include only the Honeynet addresses. Even if this occurs, we would also (if it does not reflect previous scanning, i.e., is not a hit-list) expect it to occur equally often the other way around, i.e., including only darknet addresses but not Honeynet addresses, which we have not observed over two years. In the 203 events we analyzed, we find 33 (16.3%) hit-list events.

3.3 Uniformity Checking
Question: Does an event uniformly scan the target range?

A natural technique for bots is to employ uniform random scanning across the target range. Testing whether the scans are evenly distributed in the honeynet sensor can be described as a distribution checking problem. We employ a simple $\chi^2$ test, which is well suited to the discrete nature of address blocks. When choosing the number of bins for the $\chi^2$ test, a key requirement is to ensure that the expected value E for any bin exceeds 5 [26]. Accordingly, given that our events have at least several hundred scans in them, we divide the 2,560 addresses in our Honeynet into 40 bins with 64 addresses per bin. We then use the $\chi^2$ test with a significance level of 0.5%, which is found to work well in our subsequent evaluation in Section 5.

3.4 Dependency Checking
Question: Do the sources scan independently or are they coordinated?

Sophisticated scanning strategies can introduce correlations between the sources in order to control more efficiently the work that each contributes. For example, in Appendix A.2 we describe a more efficient coordinated scheme, ABPS (Advanced Botnet Permutation Scanning), based on permutation scanning, which will induce negative correlations in the targeting among the sources (they try to get out of each other's way). Since traditional approaches only work for linear dependence or two-variable cases, we develop a new hypothesis testing approach. To test for such coordination, we use the following hypothesis test.
The null hypothesis is that the senders act in a uniform, independent fashion (where we first test for uniformity as discussed above), while the alternative hypothesis is that the senders do not act in an independent fashion. If an event comprises n scans targeting d destinations in a uniform random manner, we can in principle calculate the distribution of the number of destinations that receive exactly k scans, $Z_k$. We then reject the null hypothesis if the observed value is too unlikely given this distribution (we again use a 0.5% significance level).

THEOREM 1. If n scans target d addresses in a uniform independent manner, the number of addresses $Z_0$ (k = 0) which do not receive any scan follows the probability distribution function:

$$P(z_0) = \binom{d}{z_0} \cdot \mathrm{Stirling2}(n, d - z_0) \cdot (d - z_0)! \,/\, d^n$$
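To make Theorem 1 concrete, the distribution of the number of unscanned addresses can be evaluated exactly for small n and d. The sketch below is illustrative only: the helper names are hypothetical, Stirling2(n, y) is taken to be the Stirling number of the second kind and is computed with the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1), and the use of a lower-tail p-value (too few unscanned addresses, as one might expect when bots avoid each other's targets) is an assumption about how the test is applied, not something stated in the text.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2_row(n, kmax):
    """Stirling numbers of the second kind S(n, k) for k = 0..kmax."""
    row = [1] + [0] * kmax  # S(0, 0) = 1
    for _ in range(n):
        new = [0] * (kmax + 1)
        for k in range(1, kmax + 1):
            new[k] = k * row[k] + row[k - 1]
        row = new
    return row

def empty_count_pmf(n, d):
    """P(Z_0 = z) for z = 0..d when n scans hit d addresses uniformly
    and independently (Theorem 1), as exact fractions."""
    s = stirling2_row(n, d)
    total = d ** n
    return [Fraction(comb(d, z) * s[d - z] * factorial(d - z), total)
            for z in range(d + 1)]

def lower_tail_p_value(n, d, z_observed):
    """Probability of seeing at most z_observed unscanned addresses
    under the null hypothesis (assumed one-sided test)."""
    return sum(empty_count_pmf(n, d)[:z_observed + 1])
```

Under this reading, an event would be flagged as coordinated when `lower_tail_p_value(n, d, z_observed)` falls below 0.005. The exact integers grow quickly, so for realistic n and d a log-space or approximate computation would be needed.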
Here Stirling2(n, y) denotes the Stirling number of the second kind [29], which is the number of ways to partition n elements into y non-empty sets. The proof is in Appendix B. However, if $n \ll d$, then the sensor range will be sparsely populated, and this distribution does not give us much statistical power. Instead, we need to use a larger value of k. A more detailed analysis is in the technical report version [18].

We validate our tests using Monte Carlo simulations with and without introduced correlations. We also confirm that the test correctly detects the correlations introduced by our ABPS scheme. Finally, when applying our test to our two years' worth of data, we do not in fact find any cases exhibiting likely coordinated scanning.

4. EXTRAPOLATING GLOBAL PROPERTIES
We now turn to the problem of estimating a probing event's global scope (target size, participating scanners) based only on local information. This task is challenging because the size of the local sensor may be very small compared to the whole range scanned by a botnet, giving only a very limited view of the scanning event. For our estimation, we considered eight global properties, as shown in Table 1.

Table 1: Global properties estimated from local observations.
Property name                 | Uniform scanning | Uniform hit-list | Estimation method
Global target scope           | Yes              | Yes              | indirect
Total # of bots               | Yes              | Yes              | indirect
Total # of scans              | Yes              | Yes              | indirect
Average scan speed per bot    | Yes              | Yes              | indirect
Coverage hit ratio            | Yes              | No               | direct
Sender OS distribution        | Yes              | Yes              | direct
Sender AS distribution        | Yes              | Yes              | direct
Sender IP prefix distribution | Yes              | Yes              | direct

For both uniform-random and uniform-hit-list scanning, the uniformity property enables us to consider the local view as a random sample of the global view. Thus, the operating system (OS), autonomous system (AS), and IP prefix distributions observed in local measurements provide an estimate of the corresponding global distributions (bottom three rows). However, we need to consider that if bots exhibit heterogeneity in their scanning rates, then the probability of observing a bot decreases for slower-scanning ones.
The scanning rate heterogeneity mentioned above introduces a bias towards the faster bots in the population for these distributional properties. By extrapolating the total number of bots, however, we can roughly estimate the prevalence of this effect. It turns out that in all of our analyzed events, more than 70% of the bots appear at the local sensor³, as found by comparing the number of bots seen at the local sensors with the extrapolated global bot population shown in Table 6. Thus, the bias is relatively small.

³ The high percentage of bots appearing at the local sensor arises because probing events continue long enough to expose the majority of the bots.

The coverage hit ratio gives the percentage of target IP addresses scanned by the botnet. As this metric is difficult to estimate for hit-list probing, we mainly consider uniform scanning, for which certain destinations are not reached due to statistical variations. For uniform scanning, we can directly estimate this metric based on the coverage seen in our local sensor.

In the remainder of this section we focus on how to estimate the four remaining properties, each of which requires indirect extrapolation.

Table 2: Additional assumptions and requirements.
Approach    | Property name              | Affected by botnet dynamics | Requires IPID or port # continuity
Both        | # of bots                  | No                          | No
Approach I  | Global target scope        | No                          | Yes
Approach I  | Total # of scans           | No                          | Yes
Approach I  | Average scan speed per bot | Yes                         | Yes
Approach II | Global target scope        | Yes                         | No
Approach II | Total # of scans           | Yes                         | No
Approach II | Average scan speed per bot | Yes                         | No

4.1 Assumptions and Requirements
To proceed with indirect extrapolation, we must make two key assumptions:

1. The attacker is oblivious to our sensors and thus sends probes to them without discrimination. This assumption is fundamental to general honeynet-based traffic study (cf. the probe-response attack developed in [9] and counter-defenses [10]). A general discussion of the problem is beyond the scope of this paper.
However, since we assume our technique is mainly used by a single enterprise or a set of collaborating enterprises, we need not release sensing information to the public, which counters the basic attack in [9]. With this assumption, we can treat the local view as providing unbiased samples of the global view.

2. Each sender has the same global scan scope. This should be true if all the senders are controlled by the same botmaster and each sender scans uniformly using the same set of instructions.

We argue that these two fundamental assumptions likely apply to any local-to-global extrapolation scheme. In addition, we check for one general requirement before applying extrapolation, namely consistency with the presumption that each sender evenly distributes its scans across the global scan scope. This requirement is valid for the dark regions shown in Figure 4 (Section 3 above), i.e., both uniform random scanning and random permutation scanning, regardless of whether a hit-list is employed. Therefore, prior to applying the extrapolation approaches, we test for consistency with uniformity (via the methodology discussed in Section 3), which many of the botnet scan events pass (80.3%).

There are some additional requirements specific to certain extrapolation approaches, as listed in Table 2. Botnet dynamics, such as churn or growth, can influence certain extrapolation approaches. Accordingly, these approaches work better for short-lived events. Approach I, as discussed in Section 4.3, requires continuity of the IP fragment identifier (IPID) or ephemeral port, which holds for botnets dominated by Windows or MacOS machines (in our datasets we found all the events are dominated by Windows machines). We use passive OS fingerprinting to check whether we can assume that this property holds.

4.2 Estimating Global Population
Table 3 shows the notation we use in our problem formulation and analysis, marking estimates with hats.
For example, $\hat{\rho}$ represents the estimated local over global ratio, i.e., the ratio of the local sensor size to the global target scope of the botnet event, and $\hat{G}$ represents the estimated global target scope.

Table 3: Table of notations.
Symbol     | Meaning
$T$        | Event duration observed in the local sensor
$d$        | Size of the local sensor
$G$        | Size of global target scope
$\rho$     | Local over global ratio, $d/G$
$M$        | Total # of senders in the global view in T
$m$        | Total # of senders in the local view in T
$m_1$      | # of senders in the first half of the local view in T
$m_2$      | # of senders in the second half of the local view in T
$m_{12}$   | # of overlapped senders of $m_1$ and $m_2$ in T
$R$        | Average scanning speed per bot
$R_{G_i}$  | Global scanning speed of bot i
$T_i$      | Time between first and last scan arrival time from bot i
$n_i$      | Number of local scans observed from bot i in T
$t_j$      | Inter-arrival time between the j and j + 1 scans
$Q$        | Local total # of scans in T

If ρ is small, many senders may not arrive at the sensor at all. In this case, we cannot measure the total bot population directly. Instead, we extrapolate the total number of bots as follows. With the uniform scan assumption discussed above, we have:

$$\frac{m_1}{M} = \frac{m_{12}}{m_2} \qquad (1)$$

based on the following reasoning. We can split the address range of the sensor into two parts. Since the senders observed in each part are independent samples from the total population M, Equation 1 follows from independence. For example, suppose there are a total of M = 400 bots. In the first half of the sensor, we see $m_1 = 100$ bots, which is 1/4 of the total bot population. Consider the second half as another independent sensor, so the bots it observes form another random sample from the total population. Each bot seen in the second half then has a 1/4 chance of having already been seen in the first half. If the second half observes $m_2 = 100$ bots too, the number of shared bots will be close to $m_{12} = 100/4 = 25$. Since in Equation 1 we can directly measure $m_1$, $m_2$, and $m_{12}$, we can therefore solve for M, the total number of bots in the population. This is a simple variation of a general approach used to estimate animal populations known as Mark and Recapture. Since $m_1$, $m_2$ and $m_{12}$ are measured in exactly the same time window⁴, the estimated total population M is the number of bots of the botnet in that time window.

4.3 Exploiting IPID/Port Continuity
We now turn to estimating the global scan scope. We investigated two basic strategies: first, inferring the number of scans sent by sources in between observations of their probes at the Honeynet (Approach I); second, estimating the average bot global scanning speed using the minimal inter-arrival time we observe for each source (Approach II, covered in the next section). Approach I is based on measuring changes between a source's probes in the IPID or ephemeral port number.
We predicate use of this test on first applying passive OS fingerprinting to identify whether the sender exhibits continuous IPID and/or ephemeral port selection. This property turns out (see below) to hold for modern Windows and Mac systems, as well as Linux systems for ephemeral ports.

IPID continuity. Windows and MacOS systems set the 16-bit IPID field in the IP header from a single, global packet counter, which is incremented by 1 per packet. During scanning, if the machine is mainly idle, and if the 16-bit counter does not overflow, we can use the difference in IPID between two observed probes to measure how many additional (unseen by us) scans the sender sent in an interval. (The algorithm becomes a bit more complex because of the need to identify and correct IPID overflow/wrap, as discussed below. We also need to take into account the endianness of the counter as present in the IP header.) A potential problem that arises with this approach is retransmission of TCP SYNs, which may increment the IPID counter even though they do not reflect new scans. Thus, when estimating global scan speed we divide by the average TCP SYN retransmission rate we observe for the sender.

Footnote 4: Mark and Recapture requires the closed-system assumption since the two visits do not happen at the same time, which is different here.

Operating System          Clients
Windows                   159,152 (85.2%)
  Windows 2000/XP         155,869 (97.9%)
  Windows 2003/Vista      231 (<.1%)
  Windows NT              (1.07%)
  Windows                 (0.7%)
  Windows                 (<.01%)
  Windows other           39 (<.01%)
BSD                       458 (0.2%)
Linux                     126 (<.1%)
Novell                    20 (<.01%)
Unidentified              27,047 (14.4%)
Total                     186,725

Table 4: Aggregate operating system distribution, from passive OS fingerprinting of probing events.

Ephemeral port number continuity. All of the botnets for which we could inspect source code let the operating system allocate the ephemeral source port associated with scanning probes. Again, these are usually allocated by sequentially incrementing a single, global counter. As with IPID, we then use observed gaps in this header field to estimate the number of additional scans we did not see.
(In this case, the logic for dealing with overflow/wrapping is slightly more complex, since different operating systems confine ephemeral ports to different ranges. If we know the range from the fingerprinted OS, we use it directly; otherwise, we estimate it using the range observed locally, i.e., the maximum port number observed minus the minimum port number observed.)

IPID and ephemeral port number continuity validation. In a controlled experimental environment, we installed five versions of Windows, one of MacOS X, and two versions of Linux, each in a different virtual machine. We then ran Nmap on each to generate scans, confirming that all but Linux (2.4/2.6) exhibit continuity of IPID (with Win98 and NT4 incrementing it little-endian, but Win2000, WinXP, Win2003, and MacOS X using network order) and that all 8 systems allocated the ephemeral ports sequentially.

As shown in Table 4, for all the probing events in the two-year Honeynet dataset, OS fingerprinting (via the p0f tool) indicates that the large majority of bots run Windows 2000/XP/2003/Vista (85%), enabling us to apply both IPID and ephemeral port number based estimation. From this analysis, we also know that the proportion of Windows 95/98/NT4 is very low (0.8%), and only for those cases do we need to switch the byte order. (These percentages match install-based statistics [5] indicating that Win98 and NT4 comprise less than 1.5% of systems overall.)

NAT effects on IPID and ephemeral port continuity. Since NATs can potentially alter IPID and ephemeral ports, we test three popular home routers in this regard: Linksys, Netgear and D-Link, which comprise more than 70% of the home router market [1]. We use Nmap to send the scans from hosts behind these NATs and examine whether their IPID or ephemeral ports changed. For all three, IPID remains unchanged, and for a single scanner behind the NAT, the ephemeral port also remains unchanged.
For multiple scanners behind the NAT, the ephemeral port numbers of the first sender remain unchanged, though for the D-Link router the ports of additional scanners become arbitrary. Even though IPID remains unchanged, the intermingling of multiple IPID sequences for a single apparent source address renders simple extrapolation of scanning speed impractical. Techniques exist for detecting the presence of multiple sources behind a NAT (also based on IPID), but these require observing a large portion of the traffic coming out of the NAT [8], which is impractical in our case. However, given that we usually have a large number of distinct sources, we can restrict our analysis to those cases that exhibit strong linearity for either IPID or ephemeral port numbers, which avoids conflating patterns arising from multiple sources aliased to the same public IP address. In our evaluation, we find that on average 463 senders maintain linearity in IPID and/or ephemeral port numbers for an event; thus, they can be used for extrapolation.

Global scan speed estimation. As the IPID and ephemeral port number approaches work similarly, here we discuss only the former. We proceed by identifying the top sources originating in at least four sets of scanning. We test whether (after overflow recovery) the IPIDs increase linearly with respect to time, as follows. First, for two consecutive scans, if the IPID of the second is smaller than that of the first, we adjust it by 64K. We then try to fit the corrected IPID and its corresponding arrival time t, along with previous points, to a line. If they fit with correlation coefficient r > 0.99, this reflects consistency with a near-constant scan speed, and with the sender being a single host rather than multiple hosts behind a NAT. When this happens, we estimate the global speed from the slope.

It is possible that multiple overflows might occur between two observations, in which case the simple overflow recovery approach will fail. However, in this case the chance that we can still fit the IPIDs to a line is very small, so in general we will discard such cases. This will create a bias when estimating very large global scopes, because they will more often exhibit multiple overflows.

Sources that happen to engage in activity in addition to scanning can lead to overestimation of their global scan speed, since they will consume IPID or possibly ephemeral port numbers more quickly than scanning alone would. To offset this bias, when we have both IPID and ephemeral port estimates, we use the lesser of the two. Furthermore, in our evaluation, for the cases where we can get both estimates, we check the consistency between them, and find that IPID estimates usually produce larger results, but more than 95% of the time within a factor of two of the ephemeral port estimate. (Clearly, IPID can sometimes advance more quickly if the scanner receives a SYN-ACK in response to a probe, and thus returns an ACK to complete the 3-way handshake.)

Global scan scope extrapolation. With the ability to estimate the global scan speed, we finally estimate the global scan scope. Since we know the local scope, the problem is equivalent to estimating the local over global ratio ρ.
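The overflow recovery and line fit used for global scan speed estimation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, `retrans_factor` is assumed to be the average number of SYN packets sent per probe (1.0 means no retransmissions), byte-order correction is assumed to have been applied already, and at most one counter wrap is assumed between consecutive observations. Passing a different `modulus` (the width of the ephemeral port range) would adapt the same logic to port numbers.

```python
import math


def unwrap_counter(values, modulus=65536):
    """Undo counter wraparound: each time a raw value is smaller than the
    previous raw value, assume exactly one wrap and add another modulus."""
    unwrapped, offset, prev = [], 0, None
    for v in values:
        if prev is not None and v < prev:
            offset += modulus
        unwrapped.append(v + offset)
        prev = v
    return unwrapped


def fit_line(xs, ys):
    """Ordinary least-squares slope and Pearson correlation coefficient r.
    Returns None if either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def estimate_global_speed(times, ipids, retrans_factor=1.0,
                          min_points=4, r_threshold=0.99):
    """Return the estimated global scan speed (probes/sec), or None if the
    source has too few observations or its IPIDs are not linear in time."""
    if len(times) < min_points:
        return None
    fit = fit_line(times, unwrap_counter(ipids))
    if fit is None:
        return None
    slope, r = fit
    if r <= r_threshold:
        return None  # likely several hosts behind a NAT, or multiple wraps
    return slope / retrans_factor


# A sender emitting 100 packets/sec, observed every 10 s across one wrap.
print(estimate_global_speed([0, 10, 20, 30, 40],
                            [64000, 65000, 464, 1464, 2464]))
```

Sources whose points do not fit a line are simply dropped, which mirrors the choice in the text to restrict the analysis to strongly linear senders.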
Suppose in a probing event there are m senders seen by the sensor, for which we can estimate the global scan speeds $R_{G_i}$ of a subset of size $m'$. For sender i ($i \in [m']$), we know $T_i$ (the duration during which we observe the sender in the Honeynet) and $n_i$ (the number of observed scans). We use the linear regression with correlation coefficient r > 0.99 (as discussed before) to estimate $R_{G_i}$, which is also quite accurate. The main estimation error comes from variation of the observed $n_i$ from its expectation. Define $\hat{\rho}_i = n_i / (R_{G_i} T_i)$ for each sender. Sender i's global scan speed is $R_{G_i}$, so globally during $T_i$ it sends out $R_{G_i} T_i$ scans, and $n_i$ is the number of scans we see if we sample from those $R_{G_i} T_i$ total scans with probability ρ. Therefore, $\hat{\rho}_i$ is an estimator of ρ. If we aggregate over all $m'$ senders, we get

\[ \hat{\rho} = \frac{\sum_i n_i}{\sum_i R_{G_i} T_i} \qquad (2) \]

As shown in Appendix C, we formally prove that $\hat{\rho}$ is an unbiased estimator of ρ, and that it is more accurate than $\hat{\rho}_i$, which only reflects a single sender. We can then use $\hat{\rho}$ to estimate the global scope a probe targeted.

Average Scan Speed Per Bot. After extrapolating ρ and M, we estimate the average scan speed per bot using:

\[ \frac{Q}{R \cdot T \cdot M} = \rho \qquad (3) \]

Here Q is the number of scans received by the sensor in time T, which should reflect a portion ρ of the total scans. We estimate the total scans by $R \cdot T \cdot M$, where R is the average scan speed per bot. This formulation assumes that each bot participates in the entire duration of the event, which is more likely to hold for short-lived events.

[Figure 6: Top 30 estimated speeds of Event VNC. Axes: rank vs. estimated global speed (probes/sec).]

Limitations. Note that both of the above techniques can fail if attackers either craft raw IP packets or explicitly bind the source port used for TCP probes. Thus, the schemes may lose power in the future. However, crafting raw IP packets and simulating a TCP stack is a somewhat time-consuming process, especially given that most bots (85+%) we observed run Windows, and in modern Windows systems the raw socket interface has been disabled.
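To make the ratio estimator of Equation 2 concrete, here is a small sketch with made-up numbers. It assumes each usable sender is summarized as a tuple of observed scans, estimated global speed, and observed duration; the function names, the sample values, and the local sensor size of 2,560 addresses are hypothetical and chosen only for illustration.

```python
def rho_single(n_i, speed_i, duration_i):
    """Per-sender estimate: n_i / (R_Gi * T_i)."""
    return n_i / (speed_i * duration_i)


def rho_aggregate(senders):
    """Aggregate estimate of Equation 2 over a list of
    (n_i, R_Gi, T_i) tuples: sum of n_i over sum of R_Gi * T_i."""
    total_seen = sum(n for n, _, _ in senders)
    total_sent = sum(r * t for _, r, t in senders)
    return total_seen / total_sent


def global_scope(local_size, rho):
    """Since rho = d / G, the global scope is G = d / rho."""
    return local_size / rho


# Hypothetical senders: (scans seen locally, probes/sec, seconds observed).
senders = [(12, 100.0, 3000.0), (7, 50.0, 4000.0), (21, 200.0, 2500.0)]
rho_hat = rho_aggregate(senders)
print(rho_hat)                      # about 4e-05
print(global_scope(2560, rho_hat))  # about 6.4e7 addresses
```

Summing numerators and denominators before dividing weights each sender by how many scans it sent globally, so senders with few local observations contribute less noise than they would in a plain average of the per-sender ratios.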
Empirically, in our datasets we did not find any case for which the techniques did not appear to apply.

4.4 Extrapolating from Interarrival Times

For Approach II, we estimate global scanning speed (and hence global scope, via estimating ρ from an estimate of R using Equation 3) in a quite different fashion, as follows. Clearly, a sender's global scan speed s provides an upper bound on the local speed we might observe for the sender. Furthermore, if we happen to observe two consecutive scans from that sender, then they should arrive about $\Delta t = 1/s$ apart. Accordingly, the minimum observed $\Delta t$ gives us a lower bound on s, but with two important considerations: (i) the lower bound might be too conservative, if the global scope is large and we never observe two consecutive scans, and (ii) noise perturbing network timing will introduce potentially considerable inaccuracies in the assumption that the observed $\Delta t$ matches the interarrival spacing present at the source.

We proceed by considering all m senders we observe, other than those that sent only a single scan. We rank these by the estimated global scan rate they imply via $\hat{s} = 1/\hat{\Delta t}$, where $\hat{\Delta t}$ is the minimum observed interarrival time for the sender. Naturally, fast senders should tend to reflect larger estimated speeds, which we verified by comparing $\hat{\Delta t}$ of each sender with how many scans we observed from it. We find that generally the correlation is clear, though with considerable deviations. Using the fast senders' speeds to form an estimate of the average scanning speed may of course overestimate the average speed. On the other hand, our technique aims at estimating a lower bound. Thus, it is crucial to find a balanced point among the possible estimates. We do so by presenting the different sorted estimates, from which the analyst chooses the knee of the resulting curve, i.e., the point with smallest rank k for which an increase in k yields little change in s.
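The ranking step of Approach II, and the use of Equation 3 to turn a chosen speed into ρ, might look like the sketch below. It is an illustration only: the data layout (a dict from sender to arrival times in seconds), the function names, and the numbers are assumptions, and the knee itself is still picked by the analyst rather than computed.

```python
def min_interarrival(times):
    """Smallest positive gap between consecutive arrivals, or None."""
    ts = sorted(times)
    gaps = [b - a for a, b in zip(ts, ts[1:]) if b > a]
    return min(gaps) if gaps else None


def ranked_speed_estimates(arrivals_by_sender):
    """Return (speed, sender) pairs sorted from fastest to slowest, where
    speed = 1 / (minimum interarrival time). Senders with a single scan
    are skipped, as in the text."""
    speeds = []
    for sender, times in arrivals_by_sender.items():
        if len(times) < 2:
            continue
        dt = min_interarrival(times)
        if dt is not None:
            speeds.append((1.0 / dt, sender))
    speeds.sort(reverse=True)
    return speeds


def rho_from_speed(q, avg_speed, duration, population):
    """Equation 3: rho = Q / (R * T * M)."""
    return q / (avg_speed * duration * population)


arrivals = {"a": [0.0, 0.5, 4.0], "b": [1.0], "c": [0.0, 2.0, 2.25]}
ranked = ranked_speed_estimates(arrivals)
print(ranked)  # sender "c" first (4 probes/sec), then "a" (2 probes/sec)

# Suppose the analyst picks 4.0 probes/sec at the knee (hypothetical values).
print(rho_from_speed(q=2000, avg_speed=4.0, duration=1000.0, population=500))
```

Plotting the first element of each pair against its rank gives the curve from which the knee is read off.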
Figure 6 shows an example, plotting the top 30 maximum estimated speeds of Event VNC. From the figure we would likely select k = 6 as the knee, taking the speed at that rank as the estimate.

5. EVALUATION

We evaluate our techniques using the honeynet traffic described in Section 2.1. The total data spans 24 months and 293 GB of packet traces. Since the extrapolation algorithms we use are linear in the number of scans in the events, we find that our system takes less than one minute to analyze the scan properties and perform the extrapolation analysis for a given event. We use SNR = 50 and a tail parameter ω = 5 for event extraction (ranging ω from 3 to 8 yields identical results). We extract 203 botnet scan events and 504 misconfiguration events. There were a few moderate worm outbreaks observed during the period, such as the Allaple worm [4].
Targeted Service      # of kinds of vul./probes    Events
NetBIOS/SMB/RPC       7                            81
VNC                   1                            39
Symantec              1                            34
MS SQL                1                            14
HTTP                  2                            13
Telnet                1                            12
MySQL                 1                            6
Others                4                            4
Total                 18                           203

Table 5: The summary of the events.

The misconfiguration events are mainly caused by P2P traffic. In this paper, we focus on the botnet scan events. We first present characteristics of the botnet scanning events. Then we present the botnet event correlation study. Next we discuss results for the four botnet scan pattern checking techniques and their validation. We finish with the presentation of global extrapolation results and their validation using DShield, a world-wide scan repository.

5.1 Basic Characteristics of the Botnet Events

In Table 5, we break down the 203 events according to their targeted services. We find that most of the events target popular services that have a large install base. We also find that 30 (14.8%) events are purely port reconnaissance without any payloads. Another three events check whether the HTTP service is open by requesting the homepage. The remaining (83.7%) events target certain vulnerabilities. Therefore, these botnet scans likely reflect attempted exploitations.

Figure 7 shows the CDF of event duration. A botnet event can last from a few minutes to a few days. There are 36 events that last very close to half an hour, leading to the spike in Figure 7. As we will discuss in Section 5.2, this is a cluster of events that scan the same vulnerability every half hour over and over again, for days on end. Most likely these botnet events are driven by a single botmaster. From Figure 8, we also find that the number of sources involved in a botnet event is quite heterogeneous.

In Figure 9, we show the CDF of the number of unique ASes per event. For most events (62.7%), the bots come from more than 100 ASes. Only 3% of events reflect fewer than 20 ASes. This implies that cleaning up the botnets in one part of the world (some of the ASes) will not improve the situation. Blocking them based on AS number is also very hard, due to the large number of ASes involved.
We also find that the number of destinations a bot scans differs significantly across events, as shown in Figure 10.

We further study the OS, AS and IP distribution of the events. Table 4 in Section 4 shows the aggregated OS distribution. We see that Microsoft Windows is the most popular OS, with more than 83% of bots using Windows 2000/XP. (We see similar results when analyzing individual events.) For AS and IP address distribution, we find that the aggregated results (203 events together) are close to those seen in previous work [25]. However, we find very large variation across individual events; thus, address blacklists derived from one event might not be effective when defending against other events.

[Figure 7: Event duration. CDF; x-axis: event duration (hours), log scale.]
[Figure 8: # of sources. CDF; x-axis: # of sources per event.]
[Figure 9: # of source ASes. CDF; x-axis: # of ASes per event.]
[Figure 10: Avg. # of destinations / source. CDF; x-axis: average number of destinations contacted per source.]
[Figure 11: A subset of the cluster of 36 events that all target the same vulnerability in SMB (nodes labeled SMB_COM_LOGOFF_ANDX-TCP445E). The number on an edge labels the percentage of bots shared.]

5.2 Event Correlation

We study the temporal and source (bot IP address) correlation of different events. In this context, if we find two events that have more than 20% of their source addresses in common, we consider them correlated. We calculate the percentage of sharing as the maximum of the shared addresses over the total addresses of the two events. We observe two types of interesting behavior:

Behavior 1: The botmasters ask the same botnet to scan the same vulnerability repeatedly. In our two years of data, we find several event clusters that exhibit this behavior. For example, there is a cluster of 36 events that occur every day, always scanning the same SMB vulnerability.
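The sharing metric defined at the start of Section 5.2 can be sketched as below. This reflects one reading of the definition, namely that the shared-address count is divided by each event's own total and the larger of the two fractions is used; that reading, the function names, and the integer stand-ins for source addresses are assumptions made for illustration.

```python
def sharing_percentage(sources_a, sources_b):
    """Fraction of shared source addresses, relative to whichever event
    yields the larger fraction (i.e., the smaller event)."""
    if not sources_a or not sources_b:
        return 0.0
    shared = len(sources_a & sources_b)
    return max(shared / len(sources_a), shared / len(sources_b))


def correlated(sources_a, sources_b, threshold=0.20):
    """Two events count as correlated if they share more than 20% of
    their source addresses under the metric above."""
    return sharing_percentage(sources_a, sources_b) > threshold


# A large event and a much smaller one drawn mostly from the same bots.
big_event = set(range(1000))
small_event = set(range(900, 1010))   # 110 sources, 100 of them shared
print(sharing_percentage(big_event, small_event))
print(correlated(big_event, small_event))  # True
```

Taking the maximum means a small event that is mostly a subset of a large one still registers as highly correlated, even though it covers only a small share of the large event's sources.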
These events form a nearly complete clique, i.e., each event shares at least 20% of its source addresses with most of the other events. In Figure 11, we show a subset of this commonality graph. These events on average share about 35% of the same sources. Each event occurs on a different day. We speculate that this activity reflects the botmaster commanding the same botnet to re-scan the same address range repeatedly.

Behavior 2: The botmasters appear to ask most of the bots in a botnet to focus on one vulnerability, while choosing a small subset of the bots to test another vulnerability. Apart from these big clusters, we find some cases in which two events have very high correlation (more than 80% source address commonality) and occur very close in time, usually on the same day. We find that often the first event is much larger in terms of the number of bots than the second; the second is just a small subset of the bots from the first. This behavior illustrates the difficulty of fingerprinting botnet activity, given that botmasters may select a subset of bots to assign to different tasks.

5.3 Property-Checking Results

Figure 12 shows the breakdown of the events along different scanning dimensions. Six of the 203 events exhibit partial monotonic trends; 16.3% reflect hit-lists; 80.3% follow the random-uniform pattern, passing both uniformity and independence tests.

Figure 12: Scan pattern checking results.
  With monotonic trend: 3.0%; no monotonic trend: 97.0%
  Hit list: 16.3% (33)
    Monotonic trend: 0%
    Partial monotonic trend: 0%
    Uniform & independent: 13.8% (28)
    Uniform & non-independent: 0%
    Non-uniform: 2.5% (5)
  Not hit list: 83.7% (170)
    Monotonic trend: 0%
    Partial monotonic trend: 3.0% (6)
    Uniform & independent: 66.5% (135)
    Uniform & non-independent: 0%
    Non-uniform: 14.2% (29)

Through manual inspection of the partial monotonic events, we find that nearly half of the bots scan randomly and the other half scan sequentially. All of these bots start to scan at almost the same time. Perhaps they reflect two groups of bots controlled by the same botmaster, with the botmaster asking the two groups to use different scan strategies; but in general, this behavior is puzzling.

After that, we test the use of liveness-aware scanning (which we term "hit-lists"). As mentioned above, we use θ (the ratio of the number of senders in the darknet to the number in the live honeynet) as the metric to classify the events. Out of the 106 events classified by port number, 34 reflect hit-list scanning when using θ = 0.5. In fact, all have empirical values for θ < 0.01, and all of the events with θ > 0.5 have θ > . The 97 other events use popular ports also seen in background radiation, and thus we have to classify them based on application-level behavior. For these, we conservatively assume that all the senders in the darknet using the same port number are possible members of the event, which tends to overestimate θ. For these 97 events, we did not find any with small θ, and most of them have θ larger than one. We found that in all cases the results are insensitive to the threshold of θ. In addition, none of the events target only the darknet.

[Table 6: Global scope extrapolation results and validation ("ex." denotes extrapolated; "DShield" denotes the validation results using DShield data). Columns: date (2006), desc, ex. scope (I) (/8), DShield scope (/8), scope ratio (I), ex. scope (II) (/8). Rows: MSSQL (1), Symantec (3), VNC (3), NetBIOS (3), SMB (2).]
34 of the 197 random events fail the test for uniformity. We visually confirm that all of the remaining 163 events passing the test indeed appear uniform. Three of those that failed appear uniform visually, but have very large numbers of scans, for which the statistical testing becomes stringent in the presence of a minor amount of noise. In the remaining failed cases, we can see "hot-spot" addresses that clearly attract more activity than others; we do not know why.

Finally, we test the 163 uniform cases for coordination, not finding any instances at a 0.5% significance level. In addition, we simulate the advanced botnet permutation scan (ABPS), and find the dependency test can accurately detect it even with 0% to 20% packet loss. Thus, none of the scanning we observe appears to reflect any significant degree of coordination.

5.4 Extrapolation Evaluation and Validation

We validate two forms of global extrapolation, global scan scope and total number of bots, using data from DShield [27], a very large repository of scanning and attack reports.

Finding: 75% of our estimates of global scanning scope using only local data lie within a factor of 1.35 of estimates from DShield's global data, and all within a factor of 1.5.

Finding: 64% of bot population estimates are within 8% relative error of DShield's global data, and all within 27% relative error.

Of the 163 uniform events, 135 reflect independent uniform scanning and 28 reflect hit-list scanning. For each type we estimate either the total scanning ranges or the total size of the hit lists, respectively. It is difficult to verify hit-list extrapolations because of the difficulty of assessing how the hit-list will align with sources that report to DShield. However, we can validate extrapolations from the first class of events, since we find they usually target a large address range. Due to limited data access to DShield, we have only been able to verify 12 cases as of today, as shown in Table 6.

5.4.1 Global Scope Extrapolation and Validation
Global scope extrapolation results: In Table 6, we compare the scan scope we extrapolate from the local honeynet with the estimate we make from the DShield data. Column "ex. scope (I)" shows the honeynet extrapolated scan scope by Approach I. Column "DShield scope" shows the DShield based estimate. Column "scope ratio" gives the ratio of the honeynet extrapolated scan scope by Approach I over the DShield scope. Column "ex. scope (II)" shows the extrapolated scan scope by Approach II. From the results, we see that our findings are consistent with those derived from DShield. Next, we describe how the DShield validation works, and then we analyze the accuracy of our results.

Validation Methodology: We find that most DShield sensors appear to have synchronized clocks (i.e., we often find significant temporal overlap between our honeynet events and corresponding DShield reports). For a given extrapolation, we take two steps for validation. First, since the extrapolation results we got are all of /8 size or quite close, we try to find all the /8 networks (except those with private IP prefixes) with sufficient source overlap with the honeynet events. Second, for these /8 networks, we infer the scan scopes and compare them with our results.

Step 1. Let X denote the /8 IP prefix of our sensor. We first calculate the number of shared senders N(X) between our event data and scan logs for X from DShield. We consider an additional /8 prefix Y if its number of senders shared with the honeynet, N(Y), is larger than N(X)/3, reflecting an assumption that if a botnet uniformly scans multiple /8 prefixes, each should see quite a few sources in common. For X and each Y, we select the full width at half maximum (FWHM) of the unique source arrival process as a (conservative) way to delineate the global interval of the event. We then calculate the time range overlap with X for each Y; if the overlap of Y exceeds 50% of X's interval, we consider that the botnet scanned X and Y at the same time.

Step 2.
After finding the scanned /8 networks, we estimate the scan scope within each. Alternatively, we compute the ratio of sensors in each network reporting the scans. There are several limitations of DShield data. First, it does not contain complete scan information (only a subset of scans within a prefix are reported). Second, different sensors might use different reporting thresholds and might not see all activity (e.g., due to firewall filtering). These limitations make calibrating the data challenging. To assess the limitations, we check a one-week interval around our events to find which DShield sensors ever report a given type of activity. We treat all the reporting sensors in one /24 network as a single unique sensor. We count the number of sensors from different /24 networks, denoted by $C_{total}$. Similarly, we count the number of unique sensors from different /24 networks that reported scans from shared senders of the given event, denoted $C_{est}$. We reduce the noise from the DShield data by removing sensors that only report a single address within a /24 sensor. We then use $C_{est}/C_{total}$ to estimate the fraction of a /8 network scanned by the botnet, which gives us a conservative estimate of the event's total range. We add up such fractions if there are multiple related /8 networks discovered in the first step, indicating the results in Column "DShield scope" of Table 6.

Accuracy Analysis: We define the scope factor as

\[ \text{scope factor} = \max\left( \frac{\text{DShield scope}}{\text{Honeynet scope}},\; \frac{\text{Honeynet scope}}{\text{DShield scope}} \right) \]

The scope factor indicates the absolute relative error in the log scale. The DShield data shows that our local estimates of global scope exhibit a promising level of accuracy. Figure 13 shows that, for Approach I, the scope factors of 75% of the events are less than 1.35, and all of them are less than 1.5. Approach II (column "ex. scope (II)") works less well (58% of events are within a factor of three and 92% within a factor of six), but it may still exhibit enough power to enable sites to differentiate scans that specifically target them from broader sweeps. In our two-year dataset, we did not find any scan events specifically targeting the research institution where the sensor resides; this fits with the institute's threat model, which is mainly framed in terms of indiscriminate attacks.

[Figure 13: The CDFs of the scope factors of the 12 events we validate. Panels: Approach I scope factor; Approach II scope factor.]

5.4.2 Total Population Estimate and Validation

We assume that our honeynet event data and the corresponding DShield scan data give us two independent samples of the bot population, which is another chance to use the Mark and Recapture principle. We count the sources observed by DShield sensors of IP prefix X, on the same port number and in the same time window, as the sources of DShield sensors. We term the number of sources in common between our honeynet and DShield the shared sources.
Following the same idea as Equation 1, the fraction of shared sources among the DShield sources should equal the ratio between the bots observed in the honeynet and the total population. Since DShield sensors will see other scanners (constituting noise) as well, we will likely underestimate the first fraction, and consequently overestimate the bot population. Per the results shown below, we find the estimates very close to those we estimate locally by splitting the sensor into two halves.

Table 7 shows the extrapolation and DShield validation results. Column "ex. #bots" shows our bot population extrapolation constructed by splitting the sensor into two halves. Column "#bots DShield" shows the results using DShield's global data. Column "#bots ratio" gives the ratio between the two. Note that we only validate the seven port number based events (MSSQL, Symantec and VNC). The NetBIOS/SMB events require payload analysis, which we cannot validate through DShield since it does not provide any payloads. We find our approach is quite accurate, given that 64% of cases are within 8% relative error ($|\text{ours} - \text{DShield}| / \text{DShield}$).

[Table 7: Extrapolated bot population results and validation. Columns: date (2006), desc, ex. #bots, #bots DShield, #bots ratio. Rows: MSSQL (1), Symantec (3), VNC (3).]

5.4.3 Other Extrapolation Results

Based on Approach I, we can also infer the total number of scans and the extrapolated average scan speed of the bots in each event. In Figure 14, we show the extrapolated total number of scans, using a log-scaled X axis. The number of scans sent in an event can differ significantly, given that the duration and the number of bots differ across events. In Figure 15, we show the extrapolated average scan speed of the bots.

[Figure 14: Extrapolated # of scans. CDF; x-axis: # of extrapolated scans (M).]
[Figure 15: Extrapolated average scan speed. CDF; x-axis: extrapolated average speed (probes/sec).]

6. RELATED WORK

The work that most heavily influences us is the vision paper of Yegneswaran and colleagues on Internet situational awareness [30].
Their work outlines the general problem of analyzing honeynet traffic to assess its significance for the site observing it. The authors present the potential promise of such analysis using techniques that rely considerably on visualization. In this work, we aim to go substantially further, developing a toolkit for analyzing particular features of large-scale honeynet events, and devising techniques and a general framework to automatically or semi-automatically derive conclusions based on honeynet data.

DShield is the Internet's largest global alert repository [27]. The advantages of our approach compared with DShield are as follows: (i) In our experience, DShield data is quite noisy, and the sensor density quite non-uniform. These lead to cases where it is difficult to develop sound inferences from the data. (ii) DShield is subject to pollution and avoidance [9]. Depending solely on DShield might not be reliable for operational security. (iii) When the target scope is small, it is hard to find other sensors in DShield that share the same behavior; thus DShield will fail to work in such cases.

While the state of the art in terms of building honeynet systems has advanced considerably, the analysis of large-scale events captured by such systems remains in its early stages. The Honeynet project has developed a set of tools for host-level honeypot analysis [2]. At the network level, Honeysnap [3] analyzes the contents of individual connections, particularly for investigating IRC traffic used for botnet command-and-control. These approaches all either focus on single instances of activity, or on study of particular botnets over time (e.g., [24]). In contrast, in this paper, we aim instead to understand the significance of single, large-scale events as seen by honeynets. Such activity by definition entails analysis integrated across a large number of instances of the activity, but also (unlike [24]) localized in time.

Furthermore, the literature includes a number of forensic case studies analyzing specific large-scale events, particularly worms [16, 20].
Such case studies have often benefited from a priori knowledge of the underlying mechanisms generating the traffic of interest. For our purposes, however, our goal is to infer the mechanisms themselves from a starting point of more limited knowledge.
Finally, Gu et al. propose a series of botnet detection techniques based on behavior correlation [12, 13]. In contrast, we focus on inferring botnet properties in the wake of detection, rather than detection itself.

7. CONCLUSIONS

In this paper we present several algorithms that can automatically analyze and determine the features of large-scale events observed at a honeynet that give insight into their underlying nature. In particular, we develop techniques for recognizing botnet scanning strategies and inferring a distributed scan's global properties. An evaluation of our tools using extensive honeynet and DShield data demonstrates the promise our approach holds for contributing to a site's situational awareness, including the crucial question of whether a large probing event detected by the site simply reflects broader, indiscriminate activity, or instead reflects an attacker who has explicitly targeted the site.

8. ACKNOWLEDGMENT

We would like to thank Vinod Yegneswaran and Ruoming Pang for helping collect the data and implementing the Bro payload summary scripts, the operations staff of the Lawrence Berkeley National Laboratory for facilitating the LBNL honeypot setup, and anonymous reviewers for their valuable comments. This work was supported by DOE CAREER award DE-FG02-05ER25692//A001, DOD (Air Force Office of Scientific Research) Young Investigator Award FA , and NSF grants NSF and CNS . Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding sources.

9. REFERENCES

[1] AP Market Sharing. ups+and+downs/ _
[2] HoneyBow Sensor.
[3] Honeysnap. honeysnap/index.html.
[4] Net-Worm.Win32.Allaple.a. encyclopedia?virusid=
[5] OS Platform Statistics by W3school. browsers_stats.asp.
[6] BACHER, P., HOLZ, T., KOTTER, M., AND WICHERSKI, G. Know your Enemy: Tracking Botnets.
[7] BARFORD, P., ET AL. An inside look at botnets. In Series: Advances in Information Security. Springer.
[8] BELLOVIN, S., ET AL. A technique for counting NATted hosts. In Proc. of USENIX/ACM IMW (2002).
[9] BETHENCOURT, J., ET AL. Mapping internet sensors with probe response attacks. In Proc. of the USENIX Security (2005).
[10] CAI, J., ET AL. Honeynets and honeygames: A game theoretic approach to defending network monitors. Tech. Rep. TR1577, University of Wisconsin.
[11] CHIANG, K., AND LLOYD, L. A case study of the rustock rootkit and spam bot. In Proc. of USENIX HotBots (2007).
[12] GU, G., PORRAS, P., YEGNESWARAN, V., FONG, M., AND LEE, W. Bothunter: Detecting malware infection through ids-driven dialog correlation. In Proc. of USENIX Security (2007).
[13] GU, G., ZHANG, J., AND LEE, W. Botsniffer: Detecting botnet command and control channels in network traffic. In Proc. of NDSS (2008).
[14] KANNAN, J., JUNG, J., PAXSON, V., AND KOKSAL, C. Semi-automated discovery of application session structure. In Proc. of ACM IMC (2006).
[15] KENDALL, M. G. Rank Correlation Methods. Griffin.
[16] KUMAR, A., PAXSON, V., AND WEAVER, N. Exploiting underlying structure for detailed reconstruction of an internet scale event. In Proc. of ACM IMC (2005).
[17] LI, Z., GOYAL, A., CHEN, Y., AND KUZMANOVIC, A. P2p doctor: Measurement and diagnosis of misconfigured peer-to-peer traffic. Tech. Rep. NWU-EECS-07-06, Northwestern University.
[18] LI, Z., GOYAL, A., CHEN, Y., AND PAXSON, V. Towards situational awareness of large-scale botnet events using honeynets. Tech. Rep. NWU-EECS-08-08, Northwestern University.
[19] MANNA, P., CHEN, S., AND RANKA, S. Exact modeling of propagation for permutation-scanning worms. In IEEE INFOCOM (2008).
[20] MOORE, D., PAXSON, V., SAVAGE, S., SHANNON, C., STANIFORD, S., AND WEAVER, N. Inside the slammer worm. IEEE Security and Privacy (2003).
[21] PANG, R., YEGNESWARAN, V., BARFORD, P., PAXSON, V., AND PETERSON, L. Characteristics of Internet background radiation. In Proc. of ACM IMC (2004).
[22] PAXSON, V. Bro: A system for detecting network intruders in real-time. Computer Networks 31 (1999).
[23] PROVOS, N. A virtual honeypot framework. In Proc. of USENIX Security (2004).
[24] RAJAB, M., ZARFOSS, J., MONROSE, F., AND TERZIS, A. A multifaceted approach to understanding the botnet phenomenon. In Proc. of ACM IMC (2006).
[25] RAMACHANDRAN, A., AND FEAMSTER, N. Understanding the network-level behavior of spammers. In Proceedings of ACM SIGCOMM '06 (September 2006).
[26] RICE, J. A. Mathematical Statistics and Data Analysis. Duxbury Press,
[27] SANS INSTITUTE. Dshield.org: Distributed intrusion detection system.
[28] STANIFORD, S., PAXSON, V., AND WEAVER, N. How to 0wn the Internet in your spare time. In Proc. of USENIX Security (2002).
[29] WEISSTEIN, E. W. Stirling Number of the Second Kind. StirlingNumberoftheSecondKind.html.
[30] YEGNESWARAN, V., BARFORD, P., AND PAXSON, V. Using honeynets for internet situational awareness. In Proc. of ACM Hotnets IV (2005).
[31] ZOU, C., GAO, L., GONG, W., AND TOWSLEY, D. Monitoring and early warning for internet worms. In Proc. of ACM CCS (2003).

APPENDIX

A. MODELING HOW BOTS SCAN

A.1 Bot Source Code Study

By analyzing the source code of five popular families of bots, we study different dimensions of the scan strategies employed by botnets. The popularity of these five bot families is confirmed in [6, 7]. Our findings confirm those in [7], but we focus more on the study of scan patterns.
Botnet name             Agobot    Phatbot   Spybot    SDBot     rxbot
Global                  Yes       Yes       Yes       Yes       Yes
Local                   Yes       Yes       Yes       Yes       Yes
Hit-list                Possible  Possible  Possible  Possible  Possible
Independent & Uniform   Yes       Yes       No        Yes       Yes
Sequential              No        No        Yes       Yes       Yes
# of lines
Modularity              Medium    High      Low       Low       High

Table 8: Botnet source code study.

Table 8 shows the scan strategies and complexity of the bot families. Some of them have a well-designed modular structure. Currently, these bot families mainly use simple scanning strategies. Each supports both Global scanning (a specified address block) and Local scanning (relative to each bot's address). By hit-list scanning, we refer to an event for which the attacker appears to have previously acquired a specific list of targets. Such scans may heavily favor live addresses (those that respond) over dark (nonresponsive) addresses. The five bot families we analyzed do not directly automate hit-list scanning, but an attacker can achieve it in two steps: first scanning to gather a list of live addresses/blocks, and then specifying these at the command line. In addition, most bot families support (uniformly) Random and Sequential scanning of the designated addresses or blocks.

Our dataset analysis accords with the above capabilities: most scanners we observe use either simple sequential scanning (the IP address increments by one between scans) or independent uniform random scanning. We do observe more sophisticated monotonic trends (the address incrementing by k), but very infrequently. We also observe botnets using hit-list scanning quite frequently.

A.2 Modeling Botnet Global Scanning

There is a large design space for botmasters when developing scan strategies, but we expect that the following features are usually desired:

- Cover the target scope fully.
- Distribute the load based on bots' capabilities.
- Low communication overhead for coordination.
- Scan detection evasion. Botmasters may want bots to avoid aggressive scanning of a small address range, to avoid easy detection and blocking by IDS/IPS systems.
- Redundancy. Since the bots in a botnet can readily be lost due to detection or simply the host computer going offline, the botmaster will prefer instructing multiple bots to scan the same addresses.

A similar analysis is proposed in [19] for worms. Given these desired features, a simple and effective approach is to ask each bot to independently scan the specified range in a uniform random fashion. Doing so achieves scan detection evasion, low communication overhead, and load distribution, while also providing good coverage and redundancy. This approach is also simple to implement correctly. Most of the events we found in our datasets are close to uniform scanning.

Advanced Scanning Strategies. In fact, by introducing some simple coordination between bots one can do better than uniform random scanning for both coverage and redundancy. An advanced scanning strategy, called worm scan permutation, was proposed in the context of worm propagation [28]. That strategy, however, is optimized for worms and does not consider the use of the C&C channels of botnets. Potentially, with C&C channels botnets can achieve even better coordination. Using the botnet C&C, we propose a better scan strategy called Advanced Botnet Permutation Scan (ABPS). Each bot permutes the whole scanning scope in the same way, using a key from the botmaster. Then, based on the bots' capabilities, the botmaster divides the replicas of the permuted IP scope among all the bots. This can achieve much better coverage and redundancy. We simulate and evaluate this strategy in our evaluation.

B. PROOF OF THEOREM 1

PROOF. There are in total $d^n$ ways to distribute the $n$ scans among $d$ addresses. Suppose that $X_0$ of these ways leave $z_0$ addresses receiving zero scans (i.e., $z_0$ empty slots). Then $P(z_0) = X_0 / d^n$. We will show that for a given $z_0$,

$$X_0 = \binom{d}{z_0} \, \mathrm{Stirling2}(n, d - z_0) \, (d - z_0)!$$

Among the $d$ addresses, there are $\binom{d}{z_0}$ configurations for selecting which $z_0$ addresses receive zero scans. Each configuration has $z_0$ addresses that receive zero scans and $d - z_0$ addresses that receive a non-zero number of scans. $\mathrm{Stirling2}(n, m)$ denotes the number of ways of partitioning a set of $n$ elements into $m$ nonempty sets [29]. After partitioning the $n$ scans into $d - z_0$ sets, we have $(d - z_0)!$ ways to map the sets to the addresses. Therefore, for each configuration we have $\mathrm{Stirling2}(n, d - z_0) \, (d - z_0)!$ ways to distribute the $n$ scans among the $d - z_0$ addresses. Hence

$$X_0 = \binom{d}{z_0} \, \mathrm{Stirling2}(n, d - z_0) \, (d - z_0)!$$

C. PROOF OF THEOREM 2 AND 3

Proof of Theorem 2:

THEOREM 2. $\hat{\rho}$ is an unbiased estimator for $\rho$.

PROOF.

$$E(\hat{\rho}) = E\left(\frac{\sum_{i=1}^{m} n_i}{\sum_{i=1}^{m} R_{G_i} T_i}\right) = \frac{\sum_{i=1}^{m} E(n_i)}{\sum_{i=1}^{m} R_{G_i} T_i}$$

As we mentioned, $n_i$ is the number of scans we see if we sample from $R_{G_i} T_i$ total scans with probability $\rho$, which follows a binomial distribution. Hence we have $E(n_i) = \rho \, R_{G_i} T_i$. Therefore,

$$E(\hat{\rho}) = \frac{\sum_{i=1}^{m} \rho \, R_{G_i} T_i}{\sum_{i=1}^{m} R_{G_i} T_i} = \rho$$

Proof of Theorem 3:

THEOREM 3.

$$\mathrm{VAR}(\hat{\rho}) = \frac{\rho (1 - \rho)}{\sum_{i=1}^{m} R_{G_i} T_i} < \mathrm{VAR}(\hat{\rho}_i),$$

i.e., the accuracy of the $\rho$ estimator when aggregating over all $m$ senders is higher than that of each and every single sender.

PROOF.

$$\mathrm{VAR}(\hat{\rho}) = \mathrm{VAR}\left(\frac{\sum_{i=1}^{m} n_i}{\sum_{i=1}^{m} R_{G_i} T_i}\right) = \frac{\sum_{i=1}^{m} \mathrm{VAR}(n_i)}{\left(\sum_{i=1}^{m} R_{G_i} T_i\right)^2}$$

As before, since $n_i$ follows a binomial distribution, we have $\mathrm{VAR}(n_i) = \rho (1 - \rho) \, R_{G_i} T_i$. Therefore,

$$\mathrm{VAR}(\hat{\rho}) = \frac{\sum_{i=1}^{m} \rho (1 - \rho) \, R_{G_i} T_i}{\left(\sum_{i=1}^{m} R_{G_i} T_i\right)^2} = \frac{\rho (1 - \rho)}{\sum_{i=1}^{m} R_{G_i} T_i}$$

On the other hand,

$$\mathrm{VAR}(\hat{\rho}_i) = \mathrm{VAR}\left(\frac{n_i}{R_{G_i} T_i}\right) = \frac{\mathrm{VAR}(n_i)}{(R_{G_i} T_i)^2} = \frac{\rho (1 - \rho)}{R_{G_i} T_i}$$

Therefore, $\mathrm{VAR}(\hat{\rho}) < \mathrm{VAR}(\hat{\rho}_i)$.
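The results of Appendices B and C lend themselves to a quick numerical sanity check. The Python sketch below is illustrative only and is not part of the original analysis. It compares the count X_0 from Theorem 1 against brute-force enumeration for small n and d, and it simulates the estimators of Theorems 2 and 3 under the binomial sampling model used in the proofs. All function names, the per-sender totals (standing in for the total number of scans each sender issues), the value of rho, and the assumption that senders are sampled independently are hypothetical choices made for this example.

```python
import random
from itertools import product
from math import comb, factorial
from statistics import fmean, pvariance


def stirling2(n, m):
    """Stirling number of the second kind, via S(n,m) = m*S(n-1,m) + S(n-1,m-1)."""
    table = [[0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for a in range(1, n + 1):
        for b in range(1, min(a, m) + 1):
            table[a][b] = b * table[a - 1][b] + table[a - 1][b - 1]
    return table[n][m]


def count_zero_hit(n, d, z0):
    """X_0 from Theorem 1: ways to place n scans on d addresses, z0 left unscanned."""
    return comb(d, z0) * stirling2(n, d - z0) * factorial(d - z0)


def brute_force_count(n, d, z0):
    """Same count by enumerating all d**n assignments (feasible only for small n, d)."""
    return sum(1 for assignment in product(range(d), repeat=n)
               if d - len(set(assignment)) == z0)


def simulate_estimators(rho, totals, trials=4000, seed=1):
    """Simulate the aggregated and per-sender estimators of rho.

    totals holds one hypothetical global scan count per sender; each scan is
    assumed to land in the sensor independently with probability rho.
    Returns (aggregated estimates, list of per-sender estimate lists).
    """
    rng = random.Random(seed)
    aggregated = []
    per_sender = [[] for _ in totals]
    for _ in range(trials):
        seen = []
        for i, total in enumerate(totals):
            # n_i is a Binomial(total, rho) draw, built from Bernoulli trials.
            n_i = sum(rng.random() < rho for _ in range(total))
            seen.append(n_i)
            per_sender[i].append(n_i / total)
        aggregated.append(sum(seen) / sum(totals))
    return aggregated, per_sender


if __name__ == "__main__":
    n, d = 5, 4
    for z0 in range(d + 1):
        print(z0, count_zero_hit(n, d, z0), brute_force_count(n, d, z0))

    rho, totals = 0.1, [50, 100, 200]
    agg, per = simulate_estimators(rho, totals)
    print("mean of aggregated estimate:", fmean(agg))
    print("variance, aggregated:", pvariance(agg),
          "theory:", rho * (1 - rho) / sum(totals))
    for total, estimates in zip(totals, per):
        print("variance, single sender:", pvariance(estimates),
              "theory:", rho * (1 - rho) / total)
```

Run as a script, the first loop prints the closed-form and enumerated counts side by side for each z_0, and the remaining lines print empirical means and variances next to the values predicted by Theorems 2 and 3. The enumeration is exponential in n, so it is suitable only as a check on small cases; the simulation is a Monte Carlo approximation, so its figures should be close to, but not exactly equal to, the theoretical ones.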