Large-Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation

Transcription

1 Large-Scale IP Traceback in High-Seed Internet: Practical Techniques and Theoretical Foundation Jun Li Minho Sung Jun (Jim) Xu College of Comuting Georgia Institute of Technology Li (Erran) Li Bell Labs Lucent Technologies Abstract Tracing attack ackets to their sources, known as IP traceback, is an imortant ste to counter distributed denial-of-service (DDoS) attacks. In this aer, we roose a novel acket logging based (i.e., hash-based) traceback scheme that requires an order of magnitude smaller rocessing and storage cost than the hash-based scheme roosed by Snoeren et al. [9], thereby being able to scalable to much higher link seed (e.g., OC-768). The baseline idea of our aroach is to samle and log a small ercentage (e.g., 3.3%) of ackets. The challenge of this low samling rate is that much more sohisticated techniques need to be used for traceback. Our solution is to construct the attack tree using the correlation between the attack ackets samled by neighboring routers. The scheme using naive indeendent random samling does not erform well due to the low correlation between the ackets samled by neighboring routers. We invent a samling scheme that imroves this correlation and the overall efficiency significantly. Another major contribution of this work is that we introduce a novel information-theoretic framework for our traceback scheme to answer imortant questions on system arameter tuning and the fundamental trade-off between the resource used for traceback and the traceback accuracy. Simulation results based on real-world network toologies (e.g. Skitter) match very well with results from the information-theoretic analysis. The simulation results also demonstrate that our traceback scheme can achieve high accuracy, and scale very well to a large number of attackers (e.g., 5+). The work of J. Li, M. Sung and J. Xu was suorted in art by the National Science Foundation (NSF) under Grant ITR/SY ANI-3933 and the NSF CAREER Award ANI The names of the student authors, Jun Li and Minho Sung, are listed in the alhabetical order, reresenting equal contributions from them.. Introduction Distributed Denial of Service (DDoS) attacks against high-rofile web sites such as Yahoo, CNN, Amazon and E*Trade in early [3] rendered the services of these web sites unavailable for hours or even days. New instances of DDoS attacks continue to be reorted. For examle, a recent DDoS attack against root DNS servers brought down eight of them in an effort to aralyze the Internet []. It is clear that DDoS attacks will not sto or scale down until they are roerly addressed. One ossible way to counter DDoS attacks is to trace the attack sources and unish the eretrators. However, current Internet design makes such tracing difficult in two asects. First, there is no field in the IP header that indicates its source excet for the IP address, which can be easily soofed by an attacker. Second, the Internet is stateless in that it does not kee track of the ath traversed by a acket. Recently, efforts are made to change one or both asects to allow for tracing ackets to their sources, known as IP Traceback. U to now, two main tyes of traceback techniques have been roosed in the literature.. One is to mark each acket with artial ath information robabilistically [9, 8, 3, 8, 4]. By receiving a significant number of ackets, the victim can construct the attack aths. This is referred to as Probabilistic Packet Marking (PPM) scheme.. The other is to store acket digests in the form of Bloom filters [3] at each router [9]. By checking neighboring routers iteratively with attack ackets, the attack ath of a flow can be constructed. This is referred to as hash-based scheme. However, both traceback schemes suffer from scalability roblems. As we will show in the next section, PPM schemes cannot scale to large number of attackers as the best scheme roosed can only efficiently

2 trace fewer than attackers using a 7-bit marking field (discussed later); Hash-based scheme is not scalable for high-seed links since recording % of ackets, even in the Bloom filter digest form, would incur rohibitively high comutational and storage overhead. The objective of our work is to design a traceback scheme that is scalable both in terms of the number of attackers and in terms of high link seed... Scalability roblems of existing aroaches The advantage of PPM schemes is that they do not incur any storage overhead at the routers and the comutation of marking is usually lightweight. However, PPM-based schemes work well only when the number of attackers is small, due artly to the limited number of bits available for marking in the IP header. A recent PPM scheme roosed by Goodrich [4] is shown to be the most scalable one among the PPM schemes. However, with a marking field of 7 bits, it can only scale u to attack trees containing routers. A large-scale DDoS attack can have thousands of attackers and tens of thousands of routers on the attack aths, making the PPM schemes unsuitable for traceback. Hash-based aroach, on the other hand, is very effective for large-scale IP traceback, and needs only a single acket to trace one attacker [9]. However, since it comutes and stores a Bloom filter digest for every acket, its comutational and storage overhead is rohibitive at a router with very high seed links. For examle, assuming a acket size of, bits, a dulex OC-9 link requires 6 million hash oerations to be erformed every second, resulting in the use of SRAM (5ns DRAM is too slow for this) and 44GB of storage sace every hour, with the arameters suggested in [9]. It is imortant to reduce the comutational, memory and storage overhead of the hash-based scheme for it to be ractical for high-seed Internet... New contributions Our technical contributions are two-fold. First, we roose a novel acket logging based traceback scheme Song et at. s scheme [3] allows for traceback to a large number of attackers. However, it requires the knowledge of the routerlevel Internet toology, which may not be ractical. For the traceback to be tamer-resistant, it also requires most of the Internet routers authenticate to the victim, which can be comlicated to deloy and administer. We assume that the message size (defined in [4]) is 64 bits for reresenting the IP addresses of the current router and the revious router, and the collision size (defined in [4]) is no more than. that is scalable to high link seeds. The baseline idea of our aroach is to samle a small ercentage (e.g., 3.3%) of ackets. We construct the attack tree using the correlation between the attack ackets samled by neighboring routers. The scheme with naive indeendent random samling does not erform well due to the low correlation between the ackets samled by neighboring routers. We invent a samling scheme that imroves this correlation and the overall efficiency by orders of magnitude. Samling greatly reduces the comutational and storage overhead for acket logging. For examle, with a samling rate of 3.3% (it can be smaller), our storage overhead is only / ln bits er acket 3, a dulex OC-9 link will require the comutation of 8 million hash functions every second, and the storage of 5.GB for one hour s traffic. This is an order of magnitude more affordable than the scheme in [9]. Our second major contribution is to introduce a novel information-theoretic framework for our traceback scheme to answer imortant questions on system arameter tuning and on the fundamental trade-off between the resource used for traceback and the traceback accuracy. For a given erformance constraint, there is the question of how to tune the traceback scheme in terms of the number of hash functions and the samling rate. This otimization roblem is formulated as a channel caacity maximization roblem in information theory. This framework also allows us to comute the minimum number of attack ackets needed for achieving certain traceback accuracy and how this number scales to larger number of attackers. Our roosed scheme is simulated on three sets of real-world Internet toologies with varying oerating arameters. Simulation results demonstrate that, even when there are a large number of attackers, our traceback scheme can accurately find most of them using a reasonable number of attack ackets. For examle, with a samling robability of only 3.3%, our traceback scheme can identify 9% of infected routes, using only a total of 75, attack ackets for traceback (resulting in a query size of 4.MB 4 ), even when there are, attackers. The rest of the aer is organized as follows. In Section we resent an overview of the roosed traceback scheme and the information-theoretic framework. 3 Each Bloom filter digest uses hash functions. The reason why we use will be clear in Section 4. The term ln is due to the Bloom filter sace-efficiency trade-off and will be exlained in Section Only the invariant arts of IP header(6 bytes) and first 8 bytes of the ayload will be used for traceback as in [9].

3 In Section 3, we articulate the challenges raised by samling, and describe the comonents of our scheme in detail. In Section 4, the roosed scheme is analyzed using a novel information-theoretic framework. The erformance is evaluated in Section 5 through simulation studies. Section 6 surveys the related work and Section 7 concludes the aer.. Overview.. Our solution for large-scale traceback In this aer we roose a new traceback scheme that is scalable both to a large number of attackers and to high link seed. Like [9], our scheme requires Internet routers to record Bloom filter digests of ackets going through them. However, unlike [9], which records % of ackets, our scheme only samles a small ercentage of them (say 3.3%) and stores the digests of the samled ackets. With such a samling rate, the storage and comutational cost becomes much smaller, allowing the link seed to scale to OC-9 seed or higher rates. For examle, our scheme can scale to OC- 768 seed (simlex) using only DRAM, when samling 3.3% of the traffic. The trade-off of samling is that it makes the traceback rocess much more difficult, esecially with a low samling rate such as 3.3%. In articular, it is no longer ossible to trace one attacker with only one acket. This is because, due to samling, the robability that two neighboring routers on the attack ath both samle this acket is very small. This makes the one-acket traceback oeration hard to roceed. In our scheme, the victim uses a set L v of attack ackets it has received as material evidence to trace and construct the attack tree, consisting of attack aths from attackers to the victim. The attack tree starts with the victim as the root and the only leaf. It grows when a leaf node determines that one or more of its neighbors are highly likely to be on an attack ath (called infected hereafter). Such a likelihood is assessed by erforming the following test. Suose R is a leaf node that is already considered as being infected (called convicted ). R would like to check whether one of its neighbors R is likely to be on an attack ath. We define what R has seen as the ackets among L v that match the Bloom filter digests stored at R. Our test is to check whether what R has seen has non-negligible correlation with what R has seen, as determined by a threshold decoding rule. If the answer is yes, R will be convicted; Otherwise, R will be exonerated. If R is convicted, R will further test its neighbors recursively using this rocedure. Designing the aforementioned threshold decoding rule is nontrivial, and careful game-theoretic study is needed to make sure that the rule is loohole-free to the attackers. Clearly, the higher the correlation between the attack ackets samled by neighboring infected routers is, the more accurate our traceback scheme is. Given other arameters such as samling rate and the number of attack ackets gathered by the victim (i.e., L v ) being fixed, it is critical to imrove the correlation factor, the ercentage of the attack ackets samled by R (ustream) matched by the attack ackets samled by R (downstream). A naive samling scheme is that each router indeendently samles a certain ercentage (say 3.3%) of ackets. However, in this case the correlation factor of two routers is just 3.3%. In other words, what R has samled only matches 3.3% of what R has samled. While consistent samling techniques such as trajectory samling [] has the otential to imrove this factor to nearly %, it will not work for an adversarial environment, as we will discuss in Section 3... We roose a novel technique that imroves this correlation factor significantly, using only one bit in the IP header for communications between neighboring routers to coordinate the samling. This scheme is shown to be robust against attackers tamering. Using this technique, our scheme requires much smaller number of attack ackets for traceback, and achieves better traceback accuracy than indeendent samling... Information-theoretic framework of our traceback scheme The design of the scheme leads to a very interesting otimization roblem. We assume that the average number of bits devoted for each acket is a fixed constant s, due to the comutational and storage constraints of a router. In other words, on the average for each acket we comute s hash functions. Then the number of hash functions our scheme comutes for each samled acket is inversely roortional to the ercentage of ackets that is samled. For examle, if the resource constraint is that hash comutations are erformed for each acket, one ossible combination is that the router samles 5% of the ackets and the number of hash functions is 8 (5% 8 = ). With the same resource constraint, an alternative combination is to samle.5% of the ackets, but the number of hash functions is 6. Which one is better? Intuitively, higher samling rate increases the aforementioned correlation between the ackets samled by two routers, making traceback easier. However, the number of hash functions would have to be roortionally

4 smaller, which results in a higher false ositive rate in Bloom filter. This adds noise to the aforementioned traceback rocess and reduces the accuracy. Clearly there is an inherent trade-off between these two arameters, but where is the sweet sot (i.e., otimal arameter setting)? We show that this question can be answered by alying information theory. Our simulation results show that the information-theoretic framework indeed guides us to find the otimal arameter setting. Our information-theoretic framework also allows us to answer another imortant question concerning the trade-off between the amount of evidence the victim has to gather (the number of attack ackets) and the traceback accuracy. In articular, information theory allows us to derive a lower bound on the number of ackets the victim must obtain to achieve a certain level of traceback accuracy. A bonus from studying these lower bounds is that it sheds light on how this number scales to larger number of attackers. 3. Detailed Design Our scheme consists of two algorithms. One is a samling algorithm that is running at the Internet routers to samle and record the Bloom filter digests of the ackets going through them. The other is a traceback algorithm that is initiated by the victim to trace the attackers using the digests stored at these routers, uon the detection of a DDoS attack. In Sections 3. and 3., we describe the samling algorithm and the traceback algorithm in detail. 3.. Samling 3... A design challenge. Our roosed scheme significantly reduces the rocessing and storage requirements by samling. However, samling makes traceback more difficult. In articular, it is now almost imossible to trace one attacker with only one acket as in [9]. This is because, with a low samling ercentage, the first router on the attack ath that will samle a articular attack acket is on the average many hos away. Intuitively, with a samling rate of, the victim needs to receive at least O( ) ackets to be able to trace one attacker, since each router on the ath needs to store at least one attack acket. It turns out that to design a samling algorithm that allows for accurate traceback of one attacker with this minimum number of attack ackets (i.e., O( )) is nontrivial. A naive samling scheme is that each router indeendently samles ackets with the robability. However, this aroach does not work well since it would require a minimum of O( ) attack ackets 5 to trace one attacker. Recall from Section. that if a convicted router R wants to check whether one of its neighbors R is infected, the scheme checks whether the set of ackets R has seen has non-negligible correlation with the set of ackets R has seen. It takes at least O( ) ackets for these two sets to have an overla of one or more ackets. The key roblem of this naive scheme is that the correlation factor between the ackets samled by neighboring routers is only, i.e., what R has samled only matches (ercentage) of what R has samled. We roose a novel samling scheme that imroves this correlation factor to over 5% with the same samling rate at every router, therefore reaching the O( ) asymtotic lower bound. We will describe this scheme in the next section. One may say that there is a scheme that achieves the correlation factor of %, by asking all routers on the same ath to samle the same set of ackets (known as trajectory samling []). However, techniques to achieve such consistent samling will not work in this adversarial environment since an attacker can easily generate ackets that evade being samled. We exlored along this direction and found that it is extremely challenging to design noncrytograhic 6 techniques to achieve consistent samling in this adversarial environment. Our scheme, on the other hand, is robust against the tamering by the attackers, without resorting to crytograhic techniques One-bit Random Marking and Samling (ORMS). Indeendent random samling method does not work well since the correlation factor between the ackets samled by neighboring routers is only, the samling rate. In this section, we resent our samling scheme that significantly imroves this correlation factor. The key idea of our scheme is that, besides samling the ackets, a router also marks the samled ackets so that the next router on the ath, seeing the mark, can coordinate its samling with the revious router to imrove the correlation factor. We use a marking field of only one bit for this coordination. This bit can be easily fit into many ossible locations in the IP header (e.g., IP fragmentation field 7 ). 5 Note that O( ) can be orders of magnitude larger than O( ) when is small. 6 This is ossible with crytograhic techniques. However, it may involve key distribution and management on hundreds of thousands of Internet routers.

5 Our ORMS scheme is resented in Figure. This algorithm is executed at every interface of the articiating routers. If an arriving acket has the bit marked, the bit will be unmarked and the acket will be stored in Bloom filter digest form. However, if the ercentage of ackets (denoted as r) that are marked among the arriving ackets is over, it must have been tamered by an attacker (exlained next). Our scheme will only samle and store the marked ackets with robability r. This is the meaning of subject to a ca of in line 4 of Figure. If an arriving acket is not marked, it will be stored and marked with robability (where comes from will become clear after next aragrah). A router will also measure the ercentage of ackets coming from itself that is marked. If this ercentage is larger or smaller than, the router will adjust it to by marking and unmarking some bits (lines 9 & in Figure ). This can be achieved using traditional rate-control techniques in networking such as leaky bucket [3]. Consider the ath from a remote host to the victim. We will show that the following two invariants hold in the aroximate sense, when only the first ho (a router) from the host have other hosts attached to it and all later hos (routers) are neighboring with other articiating routers only. The first invariant is that aroximately of the ackets from a router will be marked. Note that a router on the first ho from the attacker will mark of the ackets (lines 9 & in Figure ). This argument certainly works for every router, but we would like to show that once the system is jum-started to stationarity, these two lines almost (subject to a small error ɛ) do not need to be executed at later routers. To see this, note that at later routers, aroximately ( ) of the arriving ackets are not marked, and among those ( ) = = will be marked. Therefore, once the system is jum-started to stationarity (with marked), it remains stationary. The second invariant is that each router, excet for the first ho (which may samle less than ), will samle aroximately of the ackets. This is because a router will samle all the ackets marked by the ustream neighbors ( ), and samle another of ackets that are marked by itself. Finally, it is not hard to verify that, no matter how an attacker maniulates the marking field, the first router on the attacker s ath will samle at least and at most of the ackets coming from the attacker. 7 The IP fragmentation field has been reused in the PPM-based IP traceback schemes. The backward comatibility issues has been discussed in [8]. Samling rocedure at router R (given samling rate ):. for each acket w. if (w.mark = ) then 3. write into w.mark; 4. store the digest of w, subject to a ca of ; 5. else 6. with robability 7. store the digest of w; 8. write into w.mark; 9. if (marking ercentage is not ) then. tune it to ; /* make the rocess stationary */ Figure : One-bit random marking and samling (ORMS) scheme Now we quantitatively analyze the benefit of our one-bit marking technique. We claim that the exected correlation factor between two neighboring routers R (downstream) and R (ustream) is, when R is not on the first ho from the attacker. This is because R has samled all ercentage of ackets R has marked, and among another ercentage of ackets that R has samled but unmarked, R samles of them. The total is (+ ), which is. The correlation factor is (samled by both) divided by (samled by R ), which is. Note that is larger than 5% because < <. This reresents orders of magnitude imrovement comared to indeendent random samling, when is small (say < 5%). Finally, we would like to show that the correlation factor of our scheme is resistant to tamering by attackers. In other words, an attacker cannot maniulate this factor by marking or unmarking the ackets they send. This is because our ORMS scheme is oblivious: the first router that receives the marked ackets from an attacker will unmark them and the outut ackets from the router have exactly of them marked (i.e., jum-start to stationarity). Later on, as discussed before, the correlation between neighboring routers will always be Packet digesting. Like [9], we use a saceefficient data structure known as Bloom filter [3] to record acket digests. A Bloom filter reresenting a set of ackets S = {x, x,, x n } of size n is described by an array A of m bits, initialized to. A Bloom filter uses k indeendent hash functions h, h,, h k with range {,, m}. During insertion, given a acket x to be inserted into a set S, the bits A[h i (x)], i =,,, k, are set to. To query for a acket y, i.e., to check if y is in S, we check the values of the bits A[h i (y)], i =,,, k. The answer to the query is

6 yes if all these bits are, and no otherwise. A Bloom filter guarantees not to have any false negative, i.e., returning no even though the set actually contains the acket. However, it may contain false ositives, i.e., returning yes while the acket is not in the set. The caacity factor, denoted as c, of a Bloom filter is defined as the ratio of m to n. In this aer, we assume the Bloom filter at each router is aged to disk before c decreases to k/ ln. Then according to [3], the false ositive rate of the Bloom filter is no more than k. In Sections 4 and 5, the false ositive rate of the Bloom filter is always assumed to be k for analysis and erformance evaluation uroses. Note that same as in [9], we use the first 4 invariant bytes of an IP acket as the hash inut. These 4 bytes include the invariant ortion of the IP header (6 bytes) and the first 8 bytes of the ayload. In the rest of the aer, when we refer to a acket, we always refer to its first 4 invariant bytes. 3.. Traceback rocessing When the victim detects a DDoS attack, it will trigger a traceback rocedure. The victim will first collect a decent number of attack ackets, which is not difficult during a DDoS attack. Then it will use these ackets to track down the attackers. We denote the set of ackets that is used for traceback as L v, described in Section.. The size of L v is tyically between MB and MB deending on the number of attackers and the traceback accuracy desired. The traceback rocedure starts with the victim checking all its immediate neighbors. For any router S which is one ho away from the victim, the victim will first query the corresonding (right date and time) Bloom filter at S with the whole set L v. The router S is added to the attack tree if at least one match is found. If S is convicted, the set of ackets in L v that match the Bloom filter of S will be assembled into a set L S. Each neighbor R of S will then be queried by L S (not L v!), if R has not yet been convicted. Again, if at least one match is found, S convicts R and sends L v to R; Otherwise, nothing needs to be done to R by S. If R is convicted, R will assemble L R, which is the set of ackets in L v that match the Bloom filter at R. The set L R will then be used by R to query its neighbors. This rocess is reeated recursively until it cannot roceed. We now discuss the subtleties involved in our traceback rocessing. In the above algorithm, a router is convicted if the Bloom filter returns yes for at least one acket. It is imortant to use as the detection threshold. Otherwise, an attacker can send identical ackets to avoid detection. This loohole exists because the Bloom filter we use does not count the number of occurrences of a acket 8. This loohole is closed under our one-acket decoding rule. It can be easily verified that an attacker has no incentive to send identical ackets anymore from a game-theoretic oint of view, since this will only increase its robability of being detected. Note that our scheme uses L R to match the Bloom filter at the neighbors of R once R is convicted. A careful reader may wonder why we do not simly use L v to query each router. Recall that, Bloom filter can have a false ositive robability of k where k is the number of hash functions used. We will show that a tyical k value is. When k = (with a false ositive robability ) and L v >> 5,, more than one false ositive will occur with high robability. This will result in almost all Internet routers being convicted. Since L R is much smaller than L v, the number of false ositives caused by L R is also much smaller. 4. An information-theoretic framework In this section we resent our information-theoretic framework that serves as the theoretical foundation of our traceback scheme. We first resent the roblems that are answered by this framework in Section 4.. After briefly introducing the relevant information theory concets and theorems in Section 4., we show how they are alied to our context in Section Why do we need a theoretical foundation? Our information-theoretic framework answers the following two questions concerning arameter tuning and the minimum number of attack ackets needed for accurate traceback, resectively Parameter tuning. We have discussed in Section. that given a resource constraint, the number of hash functions in each Bloom filter is inversely roortional to the samling robability. Clearly there is an otimal trade-off between these two arameters. Information theory will hel us find the sweet sot Tradeoff between traceback overhead and accuracy. The information-theoretic framework also allows us to answer the following question: 8 One can also use counting Bloom filter [] or Sectrum Bloom filter [6] to record the number of occurrences of a acket. Detection rules based on multile ackets can be designed accordingly. However, these schemes are much more comlicated. Also the game-theoretic analysis associated with using the higher threshold is extremely comlex.

7 What is the minimum number of attack ackets that the victim has to gather in order to achieve a traceback error rate of no more than ɛ?. This information is imortant because it exhibits the fundamental trade-off between the number of attack ackets the victim needs to use for traceback, and the accuracy to be achieved. Our solution to this question also answers a related question: How does this number (of attack ackets) scale with resect to certain system arameters such as the number of attackers? For examle, if the number of attackers grows from, to,, how many more attack ackets does the victim have to use to achieve the same accuracy? 4.. Information theory background In this section, we summarize the information theory concets and theorems that will be used in our later exloration. We first review the concets of entroy and conditional entroy. Then we introduce Fano s inequality [7], which will be used to answer the question raised in Section Entroy and conditional entroy. Definition 4. The entroy of a discrete random variable X is defined as H(X) def = x X Pr[X = x] log Pr[X = x] () where X is the set of values that X can take. The entroy of a random variable X measures the uncertainty of X, in the unit of bits. Definition 4. The conditional entroy of a random variable X conditioned on another random variable Y is defined as H(X Y ) def = x X (Pr[X = x, Y = y] y Y log Pr[X = x Y = y]) () where Y is the set of values that Y can take. The concet of conditional entroy arises when we are interested in estimating the value of X, which cannot be observed directly, using the observation of a related random variable Y. The conditional entroy H(X Y ) measures how much uncertainty remains for X given our observation of Y Fano s inequality. In our analysis, we would like to estimate the value of X based on the observation of Y. The conditional entroy H(X Y ) measures how much uncertainty remains for X given our observation of Y. Intuitively, the smaller this conditional entroy value is, the more accurate the estimation that can be made is. This intuition is catured by Fano s inequality [7]. Suose, given an observation of Y, our estimation of X is ˆX. We denote e as the robability that this estimation is incorrect, i.e., e = Pr[ ˆX X]. Fano s inequality states the following. H( e ) + e log ( X ) H(X Y ) (3) Here, H( e ) is overloaded to stand for the entroy of the indicator random variable { ˆX X}. By (), H( e ) = e log e ( e ) log ( e ). In (3), X is the number of different values that X can take. If we are estimating a random variable that will only take ossible values (i.e., X = ), Fano s inequality becomes the following simlified form: H( e ) H(X Y ) (4) Note that, without loss of generality, we can assume that e is no more than.5 (if a binary estimation rocedure A roduces wrong result more than half of the time, we can simly use A). So Fano s inequality and the fact that H is strictly increasing from to.5, imlies that if we would like the estimation of X (binaryvalued) to have an estimation error no more than e, the conditional entroy H(X Y ) has to be no more than H( e ) Alications to our roblems Modeling. As described in Section 3., when a (convicted) router R would like to check whether one of its neighbors R is infected, it queries the Bloom filter at R with L R. Here L R is the set of ackets that match the Bloom filter at R among the set of ackets used for traceback (i.e., L v ). We first define some notations: N : the number of attack ackets used by the victim for traceback. d : the ercentage of the attack ackets that travel through R. d : the ercentage of the attack ackets that travel through R. In the following, we introduce ste by ste the random variables involved in the analysis. By convention, Binom(N, P) reresents the binomial distribution with constant arameters N and P, where N is the number of trials and P is the success robability. In some laces below, we abuse the Binom notation slightly to ut a random variable in the lace of N, which will be made mathematically rigorous next.

8 Let X be a random variable. The rigorous mathematical definition for a random variable Y to have the distribution Binom(X, P) is that, the conditional distribution of Y given that X = x is Binom(x, P), and this holds for all values of x that X will take. This abuse is not counterintuitive, and makes our reasoning much more succinct. Let X t be the number of attack ackets samled by R. It has the robability distribution Binom(N d, ). Let X f be the number of false ositives when L v is queried against the Bloom filter at R. Its robability distribution is Binom(N X t, f). Here f is the false ositive rate of the Bloom filter. Let X t be the number of attack ackets samled by R. Its robability distribution is Binom(N d, ). Let Y t be the number of true ositives (real matches instead of Bloom filter false ositives) when the Bloom filter at R is queried with L R. Its robability distribution is Binom(X t, ). The arameter comes from the fact that the correlation factor between the ackets samled by neighboring routers is in our ORMS scheme. Let Y f be the number of false ositives when the Bloom filter at R is queried with L R. Its robability distribution is Binom(X t + X f Y t, f). During the traceback rocess, we are able to observe the values of the following two random variables: X t + X f : the total number of ackets in the acket set L R. Y t + Y f : the number of ositives when the Bloom filter at R is queried with L R. We are interested in estimating the value of the following random variable Z, which indicates whether R has stored at least one attack acket in the set of the attack ackets used by the victim for traceback. { if Xt > Z = otherwise By information theory, the accuracy of estimating Z from observing X t +X f and Y t +Y f is measured by the conditional entroy H(Z X t + X f, Y t + Y f ). The actual formula of H(Z X t + X f, Y t + Y f ) in terms of system arameters N, d, d, and k is very involved. The details on how to calculate the conditional entroy can be found in Aendix A. We have written a rogram to comute H(Z X t + X f, Y t + Y f ) given a set of arameters. Its results are used to lot the figures related to H(Z X t + X f, Y t + Y f ) in the rest of the aer. In comuting H(Z X t + X f, Y t + Y f ), we assume d = d. This is because, given a tyical router-level Internet toology, when we trace routers several hos away from the victim, with good robability R is the only ustream neighbor of R that is infected (i.e., no more branching ustream). So d = d = d catures the common case. We also assume r[z = ] = r[z = ] = /, that is, we assume no rior knowledge about Z Parameter tuning. As we discussed before, our resource constraint is k s. Here s is the number of bits of comutation (i.e., the number of hashing oerations) devoted to each acket on the average, k is the number of hash functions in each Bloom filter, and is the samling robability. Clearly, the best erformance haens on the curve k = s. Since s is treated as a constant, only one arameter k needs to be tuned ( = s/k). It remains to be found out which k value will allow us to determine with best accuracy whether R has been infected. By information theory, our knowledge about Z from observing X t +X f and Y t +Y f is maximized when the conditional entroy H(Z X t + X f, Y t + Y f ) is minimized. In other words, we would like to comute k = argmin k H(Z X t + X f, Y t + Y f ) (5) subject to the constraint k = s as discussed before. In general, the value of H(Z X t + X f, Y t + Y f ) not only deends on the arameter k we would like to tune, but also deends on other arameters such as d (we assume d = d ). We can view the value of d (say d = d) as a targeted level of concentration. In other words, when k = k, our system is most accurate in estimating the value of Z for those otential R s that have the concentration d. One may wonder if we target a certain concentration, but a different concentration haens during an attack, our k may not be otimal. However, our comutation results show that if we target a low concentration such as 5, which aroximately corresonds to 5, attackers attacking with the same intensity, our k is otimal or close to otimal for other higher concentrations as well. In other words, the otimality of k is not sensitive to the concentration value we are targeting. Therefore, we can choose a k for our scheme to work well even if we do not know the accurate information of d, d. All we need is the range of d, d. We illustrate these results in Figure. Each curve in Figure (c) shows how the value of H(Z X t + X f, Y t + Y f ) varies with different k values, given a

9 conditional entroy (a) d = / N = 5, N = 75, N =, number of hash functions k conditional entroy (b) d = / N =, N = 5, N =, number of hash functions k conditional entroy (c) d = /5 N = 5, N = 375, N = 5, number of hash functions k Figure : Conditional entroy with resect to the number of hash functions used in a Bloom filter for s = with different concentrations certain N value (number of attack ackets used for traceback). The three curves in this figure corresonds to N = 5,, 375,, 5, resectively. Here the resource constraint is s =. The targeted concentration d is 5. We can clearly see that the otimal k value is not sensitive to the arameter N. Given d = 5, Figure (c) shows that k = or 3 results in the lowest value for H(Z X t + X f, Y t + Y f ). Figures (a) and (b) show how the value of H(Z X t + X f, Y t + Y f ) varies with different k values, when d is set to and, resectively. From these two figures, we see that k = is very close to otimal for higher concentrations and. This demonstrates that the otimal value of k is not very sensitive to the value of d. Therefore, in Section 5, our scheme will adot k = when its resource constraint is s =. Simulation results show that k = indeed allows our scheme to achieve the otimal erformance. In other words, the information theory indeed rescribes the otimal arameter setting for our scheme Alication of Fano s inequality. In this section, we will show how Fano s inequality can be used to comute the minimum number of attack ackets needed for achieving a certain traceback accuracy and how this number scales to larger number of attackers. According to Fano s inequality for the estimation of a binary-valued random variable (formula (4)), we have H( e ) H(Z X t + X f, Y t + Y f ). (6) where e = Pr[Ẑ Z] is the robability that our estimation Ẑ is different from the actual value of Z. Therefore, given a desired traceback error rate ɛ, the number of attack ackets has to be larger than N min, where N min is the minimum N that makes H(Z X t + X f, Y t + Y f ) no more than H(ɛ). Figure 3 shows the fundamental trade-off between the traceback error e and N min. In this figure, s is set to and k is set to the aforementioned otimal value. The three curves in this figure corresond to the estimation error d = / d = / d = / number of attack ackets N (x) Figure 3: The trade-off between the estimation error e and N min, given s = and k =. setting d =,, and 5 resectively. For examle, when there are, attackers attacking with the same intensity, to be able to achieve the estimation error rate of., the victim needs to receive and use at least 8, attack ackets. All curves go downward, matching the intuition that larger number of attack ackets are needed for traceback when smaller estimation error rate is desired. Figure 3 also shows how N min scales with the number of attackers. We can see that N min grows almost linearly with the number of attackers for all desired estimation accuracies. For examle, when the desired e is., we need 8,, 66,, 45, ackets for scenarios which have,,,, and 5, attackers with the same intensity, resectively. 5. Performance Evaluation We have conducted extensive simulation on three real-world network toologies to evaluate the erformance of the roosed scheme, using a simulation tool we have develoed. The goal of our simulation is twofold. First, we are interested in knowing how well our information-theoretic results match with our simulation results. We show that they agree with each other very well. Second, we would like to investigate the er-

10 (a) Skitter I toology N = 5, N = 75, N =, (b) Skitter II toology N = 5, N = 75, N =, (c) Bell-lab s toology N = 5, N = 75, N =, number of hash functions k number of hash functions k number of hash functions k Figure 4: Simulation results suorting the theoretical analysis (error level by varying k) Performance Metrics Control Parameters FNR(False Negative Ratio): the ratio of the number of missed routers in the constructed attack tree to the number of infected routers FPR(False Positive Ratio): the ratio of the number of incorrectly convicted routers to the number of convicted routers in the constructed attack tree N a: the number of attackers N : the number of attack ackets used for traceback : the samling rate at an intermediate router k: the number of hash functions in a Bloom filter s: resource constraint (=k ) Table : Performance Metrics and Control Parameters formance of our traceback scheme. We show that our scheme can achieve high traceback accuracy even when there are a large number of attackers, and only requires the victim to collect and use a moderate number of attack ackets. 5.. Simulation set-u: toologies and metrics The following three real-world network toologies are used in our simulation study. Skitter data I collected from a CAIDA-owned host (a-root.skitter.caida.org) on /8/ as a art of the Skitter roject []. This data contains the traceroute data from this server to 9,9 destinations. Skitter data II collected from another CAIDA host (e-root.skitter.caida.org) on /7/, containing routes to 58,8 destinations. Bell-lab s dataset collected from a Bell-labs host [5], containing routes to 86,83 destinations. We merged six route sets originated from the same host into one and trimmed incomlete aths. All three toologies are routes from a single origin to many destinations in the Internet. In our simulation, we assume that this origin is the victim and the attackers are randomly distributed among the destination hosts 9. Table shows the erformance metrics and control arameters used in our simulation. Due to samling, some routers that are on the attack ath may not be detected. We call these routers false negatives. The false negative ratio (FNR) of an attack tree constructed by the traceback scheme is defined as the ratio of the number of false negatives to the number of actual infected routers during the attack. Because Bloom filters are used to store acket digests, the traceback system may identify routers that are not actually on attack aths. We call these routers false ositives. The false ositive ratio (FPR) of an attack tree constructed by the traceback scheme is defined as the ratio of the number of false ositives to the total number of routers in the attack tree. It is ideal for the traceback scheme to be able to trace most of the attackers (i.e., low FNR), using a moderate number of attack ackets. It is in general not necessary for FNR to be zero (i.e., find all attackers) since identifying and removing most of the attackers are effective enough for restoring the services being attacked. Incomlete or aroximate attack ath information is valuable because the efficacy of comlementary measures such as acket filtering imroves as they are alied further from the victim and closer to the attack sources [8]. This is why we count routers instead of routes in these erformance metrics. Among the control arameters, N a denotes the number of attackers, and N reresents the number of attack ackets that are used for traceback. The larger N is, the higher the traceback overhead is. Recall that denotes the samling rate, k denotes the number of 9 In real situation, this assumtion of random distribution can be wrong because many hosts in same vulnerable network can be comromised simultaneously. However, clustering of attackers only hels our scheme because it increases the correlation between routers in attack ath. Recall that a router on the attack ath of an attacker is called infected.

11 (a) Skitter I toology (b) Skitter II toology (c) Bell-lab s toology. k=8 k=9 k= k= k= k=3 k=4 k=5 k=6. k=8 k=9 k= k= k= k=3 k=4 k=5 k=6. k=8 k=9 k= k= k= k=3 k=4 k=5 k=6.. resource constraint s (=k*).. resource constraint s (=k*).. resource constraint s (=k*) Figure 5: Simulation results suorting the theoretical analysis (error level by varying s) (a) Skitter I toology (b) Skitter II toology (c) Bell-lab s toology attackers attackers 5 attackers attackers attackers 5 attackers attackers attackers 5 attackers number of attack ackets N (x) number of attack ackets N (x) number of attack ackets N (x) Figure 6: Simulation results suorting the theoretical analysis (error level by varying N ) hash functions used for each Bloom filter, and s = k is the comutational comlexity er acket. We assume every router uses the same values of s and for evaluation urose. For the urose of simulations, we also assume all the intermediate routers do the marking and store the acket digests. 5.. Verification of theoretical analysis In Section 4, we have develoed an informationtheoretic framework for otimal arameter tuning. In articular, we redict that when the resource constraint is s =, the traceback accuracy is maximized when k = or if there are, attackers with same intensity. We conduct simulations on all toologies to verify the accuracy of our model, and the results are shown in Figures 4(a,b,c). Here the number of attackers N a is,. We use the sum of FNR and FPR to reresent the overall error level of the simulation results, since the entroy concet reflects both FNR and FPR. The three curves corresond to using 5,, 75, and, attack ackets for traceback, resectively. These figures show that the otimal value The error e does not corresond exactly to FNR + FPR, but is close to FNR + FPR when both numbers are reasonably small. of k arameter in our simulation is either or, matching our theoretical rediction erfectly. For examle, when we use hash functions in a Bloom filter and use, attack ackets for traceback on Skitter I toology, we can get.38 and.7 as FNR and FPR resectively. It means that we can correctly identify around 7% of infected routers in attack tree with only.7% of false ositive. Note that this result is obtained using very low resource constraint s = which makes the samling rate as low as 3.3%. We also simulate, given a fixed k value, how the error rate varies with different s values, and the results are shown in Figures 5(a,b,c). Here the number of attackers N a is set to, and the number of attack ackets used for traceback is,. The nine curves in each figure reresent the error rates when k is set to 8, 9,, and 6, resectively. Among different k values, our traceback scheme erforms best with k = when the resource constraint s is no more than. For examle, when k = and s =, we get.9 and.6 as FNR and FPR resectively. When there are more resources (i.e., s > ), our traceback scheme erforms better with larger k values. The interretation of this is that our one-acket decoding rule generates more false ositives when larger s allows for higher samling rate and hence larger L R

12 (a) Skitter I toology (b) Skitter II toology (c) Bell-lab s toology False Negative Ratio. attackers attackers 5 attackers False Negative Ratio. attackers attackers 5 attackers False Negative Ratio. attackers attackers 5 attackers number of attack ackets N (x) number of attack ackets N (x) number of attack ackets N (x) Figure 7: False Negative Ratio of our traceback scheme on three different toologies (a) Skitter I toology (b) Skitter II toology (c) Bell-lab s toology False Positive Ratio. attackers attackers 5 attackers False Positive Ratio. attackers attackers 5 attackers False Positive Ratio. attackers attackers 5 attackers number of attack ackets N (x) number of attack ackets N (x) number of attack ackets N (x) Figure 8: False Positive Ratio of our traceback scheme on three different toologies (number of attack ackets that match the Bloom filter at R ). Since FNR at this oint is already low, the increase on the FPR will wie out the gain we have on FNR. In other words, at this oint, the larger L R becomes a liability rather than an asset. Therefore, when s >, our scheme achieves lower (FNR + FPR), when k is increased to reduce the false ositive rate of the Bloom filter and the size of L R. We also would like to comare the minimum number of ackets needed to achieve a certain level of traceback accuracy with the theoretical lower bound we have established in Section This can be achieved by comaring the curves in Figures 6(a,b,c) and curves in Figure 3 (in Section 4.3.3). The arameter settings used in all figures are the same. All three curves in each figure of Figures 6(a,b,c) are higher than curves in Figure 3. In other words, the required number of ackets to achieve a certain error rate in the simulation is higher than the number from the theoretical analysis. This is exected for the following reason. The error e in the theoretical context is different from (FNR + FPR). In the theoretical context, the error e corresonds to the decoding error when R is correctly convicted and only R is in question. In the (FNR + FNR) measure, however, even R may not have been correctly convicted. Therefore, (FNR + FPR) values are always higher than e values under the same attack scenario. Note that curves in Figures 6(a,b,c) corresonding to, and, attackers go u when a large number of attack ackets are used for traceback. Our exlanation is that when N becomes larger, there are more false ositives due to the one-acket decoding rule. In this case, the decrease in FNR is moderate and outweighed by the increase in FPR Performance of our scheme We would like to investigate how our traceback scheme erforms in terms of FPR and FNR with resect to different number of attackers and different number of attack ackets used for traceback. Figures 7(a,b,c) show the FNR of our scheme against the number of attack ackets N used for traceback, under the three aforementioned Internet toologies. Similarly, Figures 8(a,b,c) show the FPR values. In all six figures, we assume s = (devote bits of comutation to each acket). We set k to bits and to 3.3% ( 3.3% = ), which corresond to the otimal arameter setting rescribed by the informationtheoretic framework in Section The three curves in each figure corresond to,,, and 5, attackers, resectively. For all curves in Figures 7(a,b,c), we observe that as the number of attack ackets used for traceback N increases, FNR value decreases sharly, which corresonds to more and more infected routers being iden-

13 tified. On the other hand, the FPR value in Figures 8(a,b,c) increases very slowly and is always reasonable. The increase of FPR is caused by our oneacket decoding rule. In general, the lower FNR we get from larger N significantly outweighs the slightly higher FPR. We also observe that our scheme can achieve very high traceback accuracy with a reasonable number of attack ackets. For examle in Figure 7(a), under the attack from, attackers, about 75, attack ackets would be enough to track more than 9% of the infected routers, resulting in only 4.4% FPR. In this case, the average number of ackets er attacker is 75. As the number of attackers increases, the number of ackets to achieve the same accuracy also increases. However, normalized over the number of attackers, this number actually decreases. For examle, to track 9% of the infected routers when there are, or 5, attackers, we need 35, or 75, ackets, resectively. The normalized numbers in these two cases are 6 and 45, resectively. The reason is that, the more attackers there are, the easier it is to identify the infected routers located not too far from the victim. 6. Related Work Recent large-scale DDoS attacks have drawn considerable attention [3]. The broad research efforts on defending DDoS attacks can be classified into three categories.. Attack detection and classification. Many techniques have been roosed to detect ongoing DDoS attacks, which can be classified into either signaturebased (e.g., [7]) or statistics-based (e.g., [33]). As we have mentioned, these attack detection techniques are needed to trigger our traceback rocedure. Hussain et al. [5] roose a framework to classify DoS attacks into single source or multile sources. This classification information can hel the victim to better resond to the attacks.. Attack resonse mechanisms. Two classes of solutions have been roosed to address the roblem. One class is the IP traceback schemes [4, 9, 8, 3, 8, 9, 4, ] that we have discussed in detail in Section, including this work. In addition to roosing some PPM-based IP traceback schemes, Adler [] studied the fundamental tradeoffs between the number of ackets needed for traceback and the bits available for erforming acket marking, in the PPM context. In this aer, we studied a similar tradeoff question in the context of logging-based IP traceback (i.e., hashbased) and samling. The techniques used in [] to derive these two tradeoffs are very different. While techniques in [] come mostly from theoretical comuter science, ours come mostly from information theory. Finally, we find it extremely hard to study this tradeoff question when the network allows both PPM and logging, since the question can be cast as a network information theory (mostly unsolved [7]) roblem. The second class is the techniques to revent DDoS attacks and/or to mitigate the effect of such attacks while they are raging on [9, 7, 36, 34, 3, 35,, 5, 8,, 4, ]. In one of our rior work [3], we resent a technique that can effectively filter out the majority of DDoS traffic, thus imroving the overall throughut of the legitimate traffic. Another rior work of ours [34] rooses a ractical DDoS defense system that can rotect the availability of web services during severe DDoS attacks. These two ieces of work fall into the second class. SOS [8] uses overlay techniques with selective re-routing to revent large flooding attacks. Mitigation mechanisms roactively filter attack ackets at strategic laces in the network. For examle, Ferguson [] rooses to deloy ingress filtering in routers to detect and dro ackets sent using soofed IP addresses which do not belong to the stub network. Park et al. [5] roose to install acket filters at the borders of autonomous systems to filter ackets traveling between them. Yarr et al. [35] roose to encode the aths traversed by the ackets and filter out the attack traffic according to the ath identifier. Jin et al. [6] roose to use the TTL values to detect and filter out soofed IP ackets. Schemes in both [9] and [36] use router throttles to allocate the victim bandwidth equally ([9]) or in a min-max fashion ([36]) among erimeter routers. All these schemes aim at filtering out attack traffic or throttling its volume, thereby making legitimate traffic easier to go through. 3. Understanding DoS attack revalence and attack dynamics. Moore et al. used backscatter analysis to gauge the level of Internet DoS activity [3]. They studied the intensity and duration of the DoS attacks and observed a small number of long attacks constituting a significant fraction of the overall attack volume. Paxson [6] analyzed the reflector attacks that conventional PPM schemes can not work against. He then roosed a solution called Reflective Probabilistic Packet Marking Scheme (RPPM). 7. Conclusion In this aer, we have resented a new aroach to IP traceback based on logging samled acket digests. In this aroach, the samling rate can be low enough for the scheme to scale to very high link seed (e.g., OC-768). To achieve high traceback accuracy de-

14 site the low samling rate, we introduce ORMS, a novel samling technique. It significantly increases the correlation between the ackets samled by neighboring routers, thereby enabling our traceback scheme to achieve very high traceback accuracy and efficiency. ORMS is also shown to be resistant to the tamering by the attackers. We analyze the roosed scheme based on a novel information-theoretic framework. This framework allows us to comute the arameters with which our system achieves the otimal erformance. It also allows us to answer imortant questions concerning the trade-off between the amount of evidence the victim uses for traceback (the number of attack ackets) and the traceback accuracy. Our simulation results show that the roosed scheme erforms very well with a reasonable number of attack ackets as evidence, even when there are thousands of attackers and the samling rate is as low as 3.3%. References [] CAIDA s Skitter roject web age. Available at htt:// [] M. Adler. Tradeoffs in robabilistic acket marking for i traceback. In Proc. ACM Symosium on Theory of Comuting (STOC), May. [3] B. Bloom. Sace/time trade-offs in hash coding with allowable errors. Communications of the Association for Comuting Machinery, 3(7):4 46, 97. [4] H. Burch and B. Cheswick. Tracing anonymous ackets to their aroximate source. In Proc. USENIX LISA, ages 39 37, Dec.. [5] B. Cheswick. Internet maing. Available at htt://cm.belllabs.com/who/ches/ma/dbs/index.html, 999. [6] S. Cohen and Y. Matias. Sectral bloom filters. In Proc. ACM SIGMOD Conference on Management of Data, ages 4 5, 3. [7] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley, 99. [8] D. Dean, M. Franklin, and A. Stubblefield. An algebraic aroach to IP traceback. In Proc. NDSS, ages 3, Feb.. [9] T. Doener, P. Klein, and A. Koyfman. Using router staming to identify the source of IP ackets. In Proc. ACM CCS, ages 84 89, Nov.. [] N. Duffield and M. Grossglauser. Trajectory samling for direct traffic observation. IEEE/ACM Transactions on Networking, 9(3):8 9,. [] L. Fan, P. Cao, J. Almeida, and A. Broder. Summary cache: A scalable wide-area Web cache sharing rotocol. IEEE/ACM Transactions on Networking, 8(3):8 93,. [] P. Ferguson. Network Ingress Filtering: Defeating Denial of Service Attacks Which Emloy IP Source Address Soofing. RFC 67, Jan [3] L. Garber. Denial-of-service attacks ri the Internet. IEEE Comuter, 33(4): 7, Ar.. [4] M. T. Goodrich. Efficient acket marking for largescale IP traceback. In Proc. ACM CCS, ages 7 6, November. [5] A. Hussain, J. Heidemann, and C. Paadooulos. A framework for classifying denial of service attacks. In Proc. ACM SIGCOMM, ages 99, Aug. 3. [6] C. Jin, H. Wang, and K. G. Shin. Ho-count filtering: An effective defense against soofed DDoS traffic. In Proc. ACM CCS, ages 3 4, October 3. [7] F. Kargl, J. Maier, S. Schlott, and M. Weber. Protecting web servers from distributed denial of service attacks. In Proc. th Intl. WWW Conference, ages 54 54, May. [8] A. D. Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proc. ACM SIGCOMM, ages 6 7, Aug.. [9] R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker. Controlling high bandwidth aggregates in the network. ACM Comuter Communication Review, 3(3):6 73, July. [] D. McGuire and B. Krebs. Attack on internet called largest ever. htt:// Oct.. [] J. Mirkovic, G. Prier, and P. Reiher. Attacking DDoS at the source. In Proc. IEEE ICNP, ages 3 3, Nov.. [] J. Mirkovic, M. Robinson, P. Reiher, and G. Kuenning. Alliance formation for ddos defense. In Proc. New Security Paradigms Worksho, ACM SIGSAC, Aug. 3. [3] D. Moore, G. M. Voelker, and S. Savage. Inferring Internet Denial-of-Service activity. In USENIX Security Symosium, ages 9,. [4] C. Paadooulos, R. Lindell, J. Mehringer, A. Hussain, and R. Govidan. COSSACK: coordinated suression of simultaneous attacks. In DISCEX III, ages 4, Aril 3. [5] K. Park and H. Lee. On the effectiveness of route-based acket filtering for distributed DoS attack revention in ower-law Internets. In Proc. ACM SIGCOMM, ages 5 6, Aug.. [6] V. Paxson. An analysis of using reflectors for distributed denial-of-service attacks. ACM Comuter Communications Review (CCR), 3(3):38 47, July. [7] M. Roesch. Snort - lightweight intrusion detection for networks. htt:// [8] S. Savage, D. Wetherall, A. Karlin, and T. Anderson. Practical network suort for IP traceback. In Proc. ACM SIGCOMM, ages 95 36, Aug.. [9] A. Snoeren, C. Partridge, et al. Hash-based IP traceback. In Proc. ACM SIGCOMM, ages 3 4, Aug.. [3] D. Song and A. Perrig. Advanced and authenticated marking schemes for IP traceback. In Proc. IEEE IN- FOCOM, ages , Ar.. [3] M. Sung and J. Xu. IP Traceback-based Intelligent Packet Filtering: A Novel Technique for Defending Against Internet DDoS Attacks. IEEE Transactions on

15 Parallel and Distributed Systems, 4(9):86 87, Set. 3. Preliminary version aeared in Proc. th IEEE ICNP. [3] J. Turner. New directions in communications (or which way to the information age?). IEEE Communications Magazine, 5():8 5, Oct [33] H. Wang, D. Zhang, and K. G. Shin. Detecting SYN flooding attacks. In Proc. IEEE INFOCOM, ages , June. [34] J. Xu and W. Lee. Sustaining availability of web services under severe denial of service attacks. IEEE Transaction on Comuters, secial issue on Reliable Distributed Systems, 5():95 8, Feb. 3. [35] A. Yaar, A. Perrig, and D. Song. Pi: A ath identification mechanism to defend against DDoS attacks. In Proc. IEEE Symosium on Security and Privacy, ages IEEE Comuter Society Press, May 3. [36] D. K. Yau, J. C. Lui, and F. Liang. Defending against distributed denial-of-service attacks with max-min fair server-centric router throttles. In Proc. IEEE International Worksho on Quality of Service, ages 35 44, May. Aendix A. Comuting H(Z X t + X f, Y t + Y f ) The number of attack ackets X t samled by router R is a binomial random variable with robability mass function Pr[X t = k] = ( N d ) k k ( ) Nd k. The number of false ositives X f when L v is queried against the Bloom filter at router R is also a binomial random variable, with the following robability mass function: N d ( N i ) Pr[X f = k] = Pr[X t = i] f k ( f) N i k. k i= Let X = X t + X f and Y = Y t + Y f. The robability mass function of X is given as follows: min (k,n d ) Pr[X = k] = Pr[X t = i]pr[x f = k i]. i= The robability mass function of the air of random variables (X, Y ) conditioned on Z = is given as follows: Pr[X = j, Y = i Z = ] = Pr[X = j Z = ]Pr[Y = i X = j, Z = ] Now all we need is to comute Pr[Y t = k X = j, Z = ]. Its comutation is a little involved. We will show how to comute it ste by ste. The random variable X (i.e., X t + X f ) and Y t satisfies X t = Y t + W + W where, W and W have robability distributions Binom(N d X t, /()) and (N d N d, ) resectively. Intuitively, the attack ackets samled by R consist of three arts: () Y t, number of attack ackets that R has samled; () W, number of attack ackets samled from the set of attack ackets that are not samled by R ; (3) W, number of attack ackets samled from attack ackets coming from neighbors other than R. We assume d = d as exlained in Section Since Pr[Y t = k X = j, Z = ] = j l= Pr[X f = l Z = ]Pr[Y t = k X t = j l, X f = l, Z = ], all we need to calculate is Pr[Y t = k X t = j l, X f = l, Z = ]. It is given as follows: = = = Pr[Y t = k X t = j l, X f = l, Z = ] N d Pr[X t = g Z = ] g=k Pr[Y t = k, W = j l k X t = j l, X f = l, X t = g, Z = ] N d Pr[X t = g Z = ] g=k Pr[Y t = k X t = g, Z = ]Pr[W = j l k X t = g, Z = ] N d ( Nd ) ( g ( ) (Nd g) g ( g k) )k ( )(g k) g=k ( Nd g ) (/( )) j l k ( /( )) Nd g j+l+k j l k Once we have comuted Pr[X = i, Y = j Z = ], then according to formula () in Section 4. the conditional entroy can be calculated as follows: H(Z X, Y ) = Pr[X = i, Y = j, Z = ] Pr[X = i, Y = j, Z = ] log Pr[X = i, Y = j] (X,Y ) Pr[X = i, Y = j, Z = ] Pr[X = i, Y = j, Z = ] log Pr[X = i, Y = j] (X,Y ) where Pr[X = i, Y = j Z = ] = Pr[X = i Z = ]Pr[Y = j X = i, Z = ] ( i = Pr[X = i] f j) j ( f) i j The robability mass function of Pr[Y = i X = j, Z = ] is given as follows: min(i,n d ) Pr[Y t+y f = i X = j, Z = ] = Pr[Y t = k X = j, Z = ] k= Pr[Y f = i k X = j, Y t = k, Z = ] where Pr[Y f = i k X = j, Y t = k, Z = ] = ( j k i k) f i k ( f) j i. and Pr[X = i, Y = j] = Pr[Z = ]Pr[X = i, Y = j Z = ] +Pr[Z = ]Pr[X = i, Y = j Z = ] = Pr[X t = ]Pr[X = i, Y = j Z = ] +Pr[X t > ]Pr[X = i, Y = j Z = ]. Finally, note that Pr[X = i, Y = j, Z = a] = Pr[X = i, Y = j Z = a]pr[z = a] for a =,, and Pr[Z = ] = Pr[Z = ] = / as assumed in Sec