Protecting Content Distribution Networks from Denial of Service Attacks

Kang-Won Lee, Suresh Chari, Anees Shaikh, Sambit Sahu, Pau-Chen Cheng
IBM T. J. Watson Research Center
Hawthorne, NY 10532

Abstract

In this paper, we develop two mechanisms to deter DoS attacks against CDN-hosted Web sites and CDN infrastructure servers. First, we propose a novel request routing algorithm which allows CDN servers to effectively distinguish attacks from legitimate requests. Our scheme, based on a keyed hash function, significantly improves the resilience of servers to DoS attacks. Second, we introduce several site allocation algorithms based on binary codes which ensure that an attack on one hosted Web site will have a limited impact on other hosted sites. Our scheme guarantees that a specified minimum number of servers remain available for non-victimized sites. Together, the proposed schemes significantly improve the resilience of CDN-hosted Web sites, and complement other work on countering distributed DoS attacks.

I. INTRODUCTION

The problem of detecting and thwarting denial of service (DoS) attacks against Internet servers has drawn considerable attention. These attacks typically flood a network or server with bogus requests, rendering it unavailable to handle legitimate requests. Despite increased awareness about security issues, denial of service attacks remain a challenging problem. DoS attacks often target network resources by generating a large volume of bogus traffic that consumes network bandwidth. The impact of such brute-force attacks can be mitigated, however, by deploying network level mechanisms such as packet filtering and rate limiting [1]. While network level mechanisms provide a first level of protection, they cannot completely prevent attack traffic from reaching its targets. Therefore, considerable attention has been devoted to developing server-side measures to withstand DoS attacks on Internet servers [2], [3].
In today's Internet architecture, many high volume sites are distributed, either by replicating content in several data centers, or via a content distribution service provider (CDSP). Due to the increased server and network capacity available from their geographically distributed infrastructure, CDSPs offer increased resilience to DoS attacks. However, replicating servers or hosting on content distribution networks (CDNs) is not bullet-proof. An attacker can infiltrate a large number of machines using automated tools and launch a large scale distributed DoS (DDoS) attack. Defense against such DDoS attacks is hardest when the attacker uses legitimate packets, such as TCP SYN packets, to flood the target site. Hence, we focus on flooding attacks using TCP SYN packets on CDNs. In addition, the shared nature of a CDN's infrastructure can be exploited since an attack on a single CDN-hosted site can affect many other sites hosted by the same CDN. Without a careful site allocation strategy, the redundancy provided by the CDN offers limited protection.

In this paper, we develop deterrence mechanisms against DoS attacks that are suited for CDN-like environments. Our proposed scheme makes the job of the attacker significantly more difficult by leveraging the request routing system, which directs client requests to the most appropriate CDN server. We also introduce novel site allocation algorithms to provide sufficient isolation among CDN-hosted sites.

As with any Internet security measure, our scheme is not comprehensive by itself. Rather, we propose a set of mechanisms to complement the existing techniques to combat DDoS attacks. Our proposed mechanisms cannot protect, for example, against attackers who target the network bandwidth resources on the links connected to the CDN servers. Such attacks are better addressed by other DDoS countermeasures, such as packet filtering and rate limiting. It should be noted, however, that many observed attacks do not generate enough traffic to consume the bandwidth on today's high-speed links, but are quite sufficient to overwhelm a server [4].

II. RELATED WORK

Network-level mechanisms such as packet filtering [1] and rate-limiting [5] have been designed to mitigate the impact of DDoS attacks. These mechanisms are deployed in network routers to prevent potential attack packets from reaching their destination. A number of recent studies have proposed techniques to determine the origin of attacks as a deterrence mechanism. Examples of such techniques include controlled flooding [6], audit trails [7], [8], and traceback [9], [10]. In packet-based traceback, packets are specially marked as they are forwarded by the routers, and the path back to the origin can be constructed, given a large enough number of marked packets [9], [10]. While traceback techniques deter potential attackers, they are only good for post-mortem analysis.

One of the most common types of attacks on end systems is SYN flooding [11], in which an attacker sends a large number of TCP SYN packets to the target without fully establishing the connection. Numerous defensive measures have been proposed, including SYN cookies [3], randomly dropping SYN packets, and reducing the time allowed to complete TCP connection establishment. Among these techniques, SYN cookies are the most popular in practice because of their simplicity and effectiveness. The basic idea of SYN cookies is to encode the information
about the incoming SYN packets in the sequence number of the corresponding SYN-ACKs. In this way, the server does not have to maintain state for partially established connections, thereby substantially improving the resilience to SYN flood attacks. In Section IV-B, however, we show that SYN cookies alone may not provide enough protection from attacks with very high packet rates as reported in [4].

Unlike most previous work, the main focus of this paper is to develop security mechanisms tailored to CDNs. Recent work by Jung et al. on flash crowds in CDNs [12] is closely related, but is different in two ways. First, our scheme differentiates attacks from legitimate requests on-line, while Jung et al. provide a post-mortem analysis. Second, our scheme improves the resilience of a CDN-hosted site by filtering out attack packets, whereas Jung et al. proposed a mechanism to dynamically redirect traffic from an overloaded server to less loaded ones.

III. PROBLEM FORMULATION

A. CDN model

We consider a content distribution network model consisting of servers or server clusters distributed in multiple regions. Regions may be arbitrarily defined, though they typically have some topological or geographic significance. Each CDN server in a region is shared in that it serves content of multiple Web sites. Clients access content from the CDN by first contacting a request router which directs the client to a server within the appropriate region (where the region is chosen based on client proximity, for example). We assume that, within a region, the performance received by the client is equivalent for any server in the region. This assumption is consistent with recent work by Krishnamurthy et al. [13]. Moreover, in certain CDNs, each region may have only a single cluster of servers [14], where all servers in the same cluster provide similar performance. We assume the request router bases its decision on the client IP address, perhaps along with other information about the state of the network or the candidate servers.
In practice, the request router may be a specialized DNS server or Web server which chooses a proximal server when the client makes a name resolution request or HTTP request, respectively [15]. If the request router is a DNS server, then additional modifications (e.g., as proposed in [16]) are necessary to expose the client IP address.

We note that our proposed mechanisms are designed to operate in each region. In other words, the proposed request routing algorithm selects a target server only from the local region. Similarly, our site allocation algorithm makes allocation decisions per region.

B. Assumptions about the attack

In this paper, we assume that the primary target of DDoS attacks on a CDN is the CDN servers. This assumption is based on observations that the attacker can overwhelm a server with a relatively small volume of attack traffic [4], [17]. We also assume that the request routers are less susceptible to DoS attacks than the servers. Unlike Web/application servers, which may perform complex operations, such as DB query processing, the request routers handle simple request and response operations without having to establish connections or maintain state. Thus, we consider attacks against CDN servers to pose a more immediate threat.

While certain types of DDoS attacks utilize ICMP or UDP packets, recent studies reveal that the majority (more than 90% in the study) of attacks use TCP packets [4]. Also, the observed attack packets commonly have source IP addresses following a random uniform distribution, indicating that source address spoofing is widely used. Hence, our focus is on flooding attacks using TCP SYN packets with spoofed source IP addresses. By spoofing the source IP address, the attacker will try to hide the true origin of the attack [8], [9] and increase the effectiveness of the attack. Attacks which use genuine IP addresses are not handled by our scheme. Very large-scale attacks using legitimate addresses (e.g., Code Red), however, are much more challenging, and countering such attacks is a subject of ongoing research.

C. Quantifying resilience

We begin with a simple notion of resilience of a hosting environment. Intuitively, we say server A is more resilient than server B if the attacker must send more attack traffic to bring down server A than to bring down B. Thus, we can define the relative resilience of server A with respect to server B as follows:

Definition 1 (Resilience of a server): Server A is $k$ times more resilient than server B if $k$ times more attack traffic is required to make server A unavailable than to make server B unavailable.

For example, when a site is replicated over a group of $n$ servers then the replicated site provides $O(n)$ resilience compared to a single server because it takes roughly $n$ times more attack traffic to bring down all servers. We consider CDNs to have replicated servers and thus have $O(n)$ resilience.¹

¹ Some existing CDN architectures do not provide $O(n)$ resilience, however, since they require the index page of a site to be retrieved from the origin server.

Our second metric quantifies the degree of isolation, or protection, of a Web site from an attack on another site hosted by the same CDN. For example, consider a CDSP hosting two sites A and B. Ideally, a DDoS attack on site A should not affect the performance or availability of site B, which is true when the two sites are not assigned to any common CDN servers. In practice, though, a single server is shared among multiple sites for resource sharing. Our goal is to maximize the number of servers hosting each site while guaranteeing a specified degree of isolation among them. One simple metric for the degree of isolation is the number of servers that are not shared by any two sites:

Definition 2 (Isolation between two sites): Let A and B denote two Web sites, and $S_A = \{s_1, \ldots, s_l\}$ and $S_B = \{s'_1, \ldots, s'_k\}$ denote the sets of CDN servers allocated to A and B, respectively. We define the degree of isolation between A and B to be $\min(|S_A - S_B|, |S_B - S_A|)$.

For example, if $S_A = \{s_1, s_2, s_3\}$ and $S_B = \{s_2, s_3, s_4, s_5\}$, the degree of isolation is 1 because $|S_A - S_B| = |\{s_1\}| = 1$ and $|S_B - S_A| = |\{s_4, s_5\}| = 2$. In Section V, we show that if sites are assigned to an equal number of servers, then the degree of isolation between two sites is $d/2$, where $d$ (which is always even) denotes the number of disjoint CDN servers.

IV. HASH-BASED REQUEST ROUTING

The key idea of hash-based request routing is to treat requests with legitimate source IP addresses differently from bogus requests with spoofed source IP addresses so that most of the attack packets are preferentially dropped when the CDN is overloaded. In particular, the proposed request routing scheme helps the server to filter out a $(n-1)/n$ fraction of the attack traffic, where $n$ servers are hosting the site in a region.

A. Algorithm description

[Fig. 1. Operation of hash-based request routing. (1) The client with IP address cli sends a request with site and cli to the request router. (2) The request router selects a cache using H: H(site, cli) = server id. (3) The request router responds with the server id to contact. (4) The client sends a TCP SYN with cli. (5) The server checks with H: if H(site, cli) = self, insert into the normal queue; else insert into the low priority queue. (6) The server responds with SYN ACK. The request router and the server secretly share the hash function H.]

When a client wants to access a CDN-hosted Web site, it first contacts a request router to find the IP address of the appropriate CDN server to contact. In general, the request routing decision is based on performance or load-balancing metrics. In our approach, however, the hash-based request routing aims to differentiate legitimate requests from potential attack traffic with spoofed IP addresses. This goal is achieved with simple keyed hashing using a secret key shared between request routers and CDN servers [18]. The operation of the hash-based request routing algorithm is described below (see Figure 1).

1) The client first sends a request to a request router to find the server address for the target site.
2) The request router selects a CDN server based on the target site and the client's IP address using a keyed hash function $H$ and the secret key $K$, which is shared with the CDN servers. We assume a simple uniform keyed hash function $H$ : IP addr → server id. In other words, any given IP address is equally likely to hash into any server id in the region.
3) The request router responds with the address of the CDN server to contact. Note that the attacker must use a legitimate source address to query the request router because it will not receive the response otherwise.
4) Upon receiving the response from the request router, the client sends a TCP SYN packet to the server with that server id. When the server receives the SYN packet, it verifies whether the source IP address hashes to its own address using the hash function $H$ and the shared key $K$.
5) If the hash value matches its own address, the CDN server inserts the SYN packet into the normal service queue. Otherwise, the SYN packet is inserted into the low priority queue. Packets in the normal queue are always served before those in the low priority queue.
6) Once the SYN packet is processed by the server, a SYN-ACK packet is returned to the client.

There are a few things to note in this procedure. First, the keyed hash function (Step 2) can dynamically change its behavior by simply changing the key. Second, the attacker cannot discover the mapping for arbitrary IP addresses by permutation because it cannot always receive the responses from the request router (Step 3). Third, the proposed scheme does not generate reverse traffic (TCP SYN-ACK) in response to the attack traffic, as SYN cookies do.

Resilience of hash-based request routing: When sending attack traffic, the attacker will try to guess an address that will pass the hashing test at a CDN server. However, the pseudorandomness of the hash function ensures that the attacker's guess is no better than a random selection. Suppose the attacker uses random IP addresses.
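As an illustration only, Steps 2 to 5 and the random-address attacker can be sketched in Python. The sketch assumes HMAC-SHA256 as the keyed hash $H$, which is a choice made here for concreteness and is not prescribed by the scheme; the function names, the demo key, the site name, and the client address are all hypothetical.

```python
import hashlib
import hmac
import random


def keyed_hash(key: bytes, site: str, client_ip: str, num_servers: int) -> int:
    """H(site, cli): map a (site, client IP) pair to a server id in [0, num_servers).

    HMAC-SHA256 is an assumed stand-in for the uniform keyed hash H.
    """
    msg = f"{site}|{client_ip}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def route_request(key: bytes, site: str, client_ip: str, num_servers: int) -> int:
    """Request router (Step 2): choose the server id the client should contact."""
    return keyed_hash(key, site, client_ip, num_servers)


def classify_syn(key: bytes, site: str, src_ip: str, self_id: int, num_servers: int) -> str:
    """CDN server (Steps 4 and 5): pick the queue for an incoming SYN packet."""
    if keyed_hash(key, site, src_ip, num_servers) == self_id:
        return "normal"
    return "low_priority"


def random_ip(rng: random.Random) -> str:
    """A uniformly random dotted-quad address, as a spoofing attacker might use."""
    return ".".join(str(rng.randrange(256)) for _ in range(4))


def spoofed_pass_fraction(key: bytes, site: str, self_id: int,
                          num_servers: int, trials: int, seed: int = 0) -> float:
    """Fraction of random spoofed SYNs that land in the normal queue of one server."""
    rng = random.Random(seed)
    passed = sum(
        classify_syn(key, site, random_ip(rng), self_id, num_servers) == "normal"
        for _ in range(trials)
    )
    return passed / trials


if __name__ == "__main__":
    key = b"demo-shared-secret"   # hypothetical key shared by routers and servers
    site, client = "www.example.com", "192.0.2.7"
    n = 4
    server = route_request(key, site, client, n)
    print("legitimate client ->", classify_syn(key, site, client, server, n))
    print("spoofed pass fraction:", spoofed_pass_fraction(key, site, server, n, 20000))
```

A legitimate client always lands in the normal queue of the server it was directed to, while the measured pass fraction for random source addresses should be close to one over the number of servers. Changing `key` changes the whole mapping, which mirrors the first note above.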
From our assumption of a uniform hash function $H$, statistically only a $1/n$ fraction of the attack traffic will pass the test at the server, where $n$ is the number of servers in the region. The other $(n-1)/n$ fraction of the attack traffic will fail the test and be silently dropped. From the attacker's perspective, this scheme requires significantly more attack traffic to bring down a server. In the previous section, we observed that a CDN with $n$ servers has $O(n)$ resilience. With the addition of hash-based request routing, each CDN server will accept only $1/n$ of the attack traffic. This effectively increases the amount of traffic necessary to bring down each server by a factor of $n$. As a result, an attacker must generate $O(n^2)$ attack traffic in aggregate to victimize all of the CDN servers in the region.

B. Evaluation

[Fig. 2. Testbed configuration: one client and three attackers connected through a switch to the CDN server, using 100BT and 1000BT links.]

In this section, we present a few performance numbers as a proof of concept. We set up a testbed consisting of one Web server and four clients, all running Linux 2.4.7 on Pentium III 500 MHz PCs (Figure 2). Among the clients, one is assumed to be a legitimate user and the other three serve the role of attackers. The server runs Apache 1.3, and serves local copies
of CNN.com. In this setting, we examine the performance of a single CDN server as if the server were part of a CDN that employs hash-based request routing. We emulate the number of other servers by controlling the hash parameter. Recall that the proposed hash-based algorithm can filter out a $(n-1)/n$ fraction of attack packets at each server if there are $n$ CDN servers collaborating in the same region. The attackers generate spoofed TCP packets at a specified rate using raw IP sockets. We use httperf [19] to generate request traffic from the legitimate client. For each scenario, five sets of data were collected, where each set consists of results from 1,000 accesses to the Web site. At the server, the Linux SYN cookies implementation is turned on as the basic protection against SYN flood attacks. We note that, without SYN cookies, the availability of the server is compromised at a much lower attack rate.

Figure 3 presents the number of timed out requests in the 1,000 accesses from the legitimate user, where the timeout is set to 15 seconds. The x-axis represents the attack rate (in packets/sec) on each server. The figure plots the results from the SYN cookies only case in comparison with the SYN cookies+hashing case, where the number of servers in the region is assumed to be 2, 3, 4, and 5. From the figure, we observe that the proposed hashing scheme reduces timeout events, thereby providing substantially better protection than when only SYN cookies are used. For example, with the hashing scheme, the legitimate user does not experience timeouts at the attack rate of 3,500 packets/sec, whereas 50% of the requests time out when only SYN cookies are used. We also observe that the level of protection increases as the number of CDN servers in the region increases.

A similar trend can be found in Figure 4 for the average response time of the HTTP requests that did not time out. As in the previous case, the average response time is significantly lower when the hashing scheme is used than when it is not.
For the same attack rate, the response time decreases with the number of CDN servers in the same region.

V. ISOLATING THE IMPACT OF THE ATTACK

In this section, we outline strategies to allocate Web sites to different CDN servers in order to isolate the impact of an attack on any individual site. In particular, we want to be able to guarantee at least a specified minimum degree of isolation between any two sites in the CDN, while providing good performance. Intuitively, however, allocating a large fraction of the servers to each of two different sites results in significant overlap in the set of servers hosting both of the sites. Thus, an attack which brings down the servers hosting one site also collaterally causes a large loss of service for the other site. Therefore, we have the following two conflicting goals:

1) For each Web site, we wish to serve the site from a large number of CDN servers in each region.
2) For any pair of Web sites, if one is the target of a DDoS attack then the other should experience minimal service degradation.

The first goal maximizes the throughput of a site hosted by the CDN, and the second protects a Web site from attacks on any other site. Our contributions in this paper are to relate the problem of site allocation to a coding theoretic framework and to obtain good allocation strategies by adapting carefully chosen codes. We consider the case where there is one level of service, where each Web site is hosted on the same number of CDN servers in each region. In ongoing work we are investigating similar allocation strategies to offer multiple levels of service, for example to allocate more servers to more popular sites.

A. Relating site allocation to codes

Let $S$ be the set of CDN servers and $W$ the set of Web sites to allocate to the servers in $S$. For each site $w \in W$, form the bit vector of length $|S|$ with bit $i$ set if $w$ is allocated to server $i$. This bit vector is called the allocation vector for the site. Following standard terminology, the Hamming weight of a binary vector is defined to be the number of 1s in the vector.
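As a small illustration of the allocation vector and Hamming weight just defined, the following Python sketch encodes the two sites from the example given with Definition 2 and computes their degree of isolation. The 0-based server indices and the function names are assumptions made here for illustration.

```python
def allocation_vector(site_servers, num_servers):
    """Bit vector of length num_servers; bit i is 1 if the site is on server i."""
    return [1 if i in site_servers else 0 for i in range(num_servers)]


def hamming_weight(vec):
    """Number of 1s in a binary vector."""
    return sum(vec)


def hamming_distance(a, b):
    """Number of positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def isolation(a, b):
    """Definition 2: min(|S_A - S_B|, |S_B - S_A|), computed from allocation vectors."""
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    return min(only_a, only_b)


if __name__ == "__main__":
    # Servers s1..s5 are mapped to indices 0..4 (an indexing assumption).
    vec_a = allocation_vector({0, 1, 2}, 5)       # S_A = {s1, s2, s3}
    vec_b = allocation_vector({1, 2, 3, 4}, 5)    # S_B = {s2, s3, s4, s5}
    print(vec_a, hamming_weight(vec_a))
    print(vec_b, hamming_weight(vec_b))
    print("isolation:", isolation(vec_a, vec_b))
    print("distance:", hamming_distance(vec_a, vec_b))
```

Representing each site as a bit vector is what lets the allocation problem be treated with coding-theoretic tools: set differences between server sets become bitwise operations on vectors.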
The Hamming weight of the allocation vector of a site represents the number of CDN servers that serve content for this Web site. Hence, our goals may be restated as follows. Requirement (1) states that the allocation vector of each site has the largest Hamming weight possible. Requirement (2) is that for every pair of sites $w_1$ and $w_2$, the number of servers which serve $w_1$ but not $w_2$ (and vice versa) is as large as possible, i.e., if $s_1$ and $s_2$ are the allocation vectors of $w_1$ and $w_2$ respectively, then the Hamming weights of the bit vectors $(s_1 \wedge \bar{s}_2)$ and $(s_2 \wedge \bar{s}_1)$ should be as large as possible.² When all Web sites are treated equally, i.e., when all allocation vectors have equal Hamming weight, we have

$$\text{Hamming weight}(s_1 \oplus s_2) = 2 \times \text{Hamming weight}(s_1 \wedge \bar{s}_2) = 2 \times \text{Hamming weight}(s_2 \wedge \bar{s}_1), \qquad (1)$$

where $a \oplus b$ denotes the XOR of vectors $a$ and $b$. The Hamming weight of $(s_1 \oplus s_2)$ is called the Hamming distance between $s_1$ and $s_2$. Thus, in this restricted case, our problem is to find allocation vectors with large Hamming weight under the constraint that the Hamming distance between the vectors is as large as possible. Thus the problem of site allocation can be stated as follows:

Allocation problem: Given $n$, the number of CDN servers in a region, find an efficient algorithm to enumerate a large number of binary vectors each of length $n$, each vector having Hamming weight exactly $h$ (as large as possible) and the minimum pairwise Hamming distance $d$ between vectors being as large as possible.

Given such an algorithm, we can sequentially generate such bit vectors and assign them as the allocation vectors for each Web site. Under such an allocation each Web site is served by $h$ CDN servers out of $n$. If the servers hosting a particular site are all rendered inoperative due to a DDoS attack, then any other site is guaranteed to be served by at least $d/2$ servers. Each Web site thus utilizes $h/n$ of the available capacity and the resulting loss of service when any one Web site is taken down is at most $\left(1 - \frac{d}{2h}\right) \times 100$ percent.

² $\bar{x}$ denotes the ones complement of the binary vector $x$.

[Fig. 3. Number of timeouts vs. attack traffic rate (packets/sec). Curves: SYN cookies only; cookies+hashing ($n = 2, 3, 4, 5$).]

[Fig. 4. Response time vs. attack traffic rate (packets/sec). Y-axis: average response time (msec) for successful requests. Curves: SYN cookies only; cookies+hashing ($n = 2, 3, 4, 5$).]

B. Allocation strategies from binary codes

In this section, we outline general allocation methods where we try to maximize $h$ and $d$ along with maximizing $m$, the total number of Web sites which we can accommodate. Our allocation strategies are adaptations of results from coding theory and we outline general constructions without details of actual codes. We refer the interested reader to [20] for comprehensive details about codes and their constructions.

In general, defining codes with many vectors where all codewords have exactly a fixed Hamming weight is a very difficult problem in coding theory. The few codes that exist to generate constant Hamming weight codewords generally yield only a small (polynomial in the length of the code) number of codewords. To accommodate a larger number of Web sites, we take arbitrary codes and prune them to yield binary vectors fitting our specification. Our first cut at an allocation strategy is the following naive algorithm:

Algorithm 1: Fix a code of length $n$ with a large minimum distance $d$. Choose parameter $h$ so that there are enough codewords of Hamming weight $h$. Then generate all binary vectors with Hamming weight exactly $h$ and output only those vectors which belong to the code.

In this algorithm we first fix a code from a family of codes, which fixes the parameter $d$.
Once we fix a code, the distribution of the Hamming weights of its codewords is determined. The parameter $h$ is then chosen to have an allocation for at least $m$ Web sites based on this distribution of Hamming weights. Besides finding good values for $h$ and $d$, we also wish to use codes which have explicit constructions and efficient algorithms to enumerate codewords. A particularly good class of codes with easy algorithms to identify codewords is the class of linear codes. This includes a number of codes such as the Reed-Solomon codes [20].

Definition 3: An $(n, k, d)$ code is a linear code of length $n$ with minimum distance $d$, where the dimension of the linear subspace is $k$. It is defined by a $k \times n$ binary matrix $G$ called the generator matrix, and the set of codewords is obtained by $x \cdot G$ where $x$ ranges over all binary vectors of length $k$ [20].

Note that linear codes produce codewords with arbitrary Hamming weight. The algorithm to generate codewords is straightforward: sequentially enumerate vectors of length $k$ and multiply by the generator matrix $G$. Alternately, a linear code is also defined by its syndrome matrix $C$ [20], an $(n - k) \times n$ binary matrix: a length-$n$ word $x$ belongs to the code if and only if $x \cdot C^T = 0$. Using properties of linear codes, our next refinement is the following:

Algorithm 2: Fix an $(n, k, d)$ linear code with a large minimum distance $d$. Systematically generate all binary vectors of Hamming weight $h$ and retain words $x$ such that $x \cdot C^T = 0$. Alternately, enumerate vectors $y$ of length $k$ and generate the codeword $y \cdot G$. Retain only those with Hamming weight $h$.

As before, the parameters $d$ and $h$ are chosen by first fixing a family of linear codes to define $d$. Once the code is fixed, $h$ is chosen to maximize the number of codewords with Hamming weight $h$ in this code.

We describe another general scheme to obtain allocation vectors, which focuses on a particular value for $h$. Intuitively, if $h$ is too large, then there are few codewords of Hamming weight $h$. Also, choosing too large a value for $h$ makes the maximum distance (which can be at most $h$) small.
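As a concrete illustration of Algorithm 2 above, the following Python sketch runs both variants (the syndrome test on all weight-$h$ vectors, and generator-matrix enumeration followed by weight filtering) and lets one check that they agree. The choice of a small $(7, 4, 3)$ Hamming code, its particular matrices, and all function names are assumptions made for illustration; the codes discussed in this paper are larger.

```python
from itertools import combinations, product

# Generator matrix G = [I | P] of a (7, 4, 3) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
# Matching syndrome (parity-check) matrix C = [P^T | I], so that G * C^T = 0 mod 2.
C = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(y, gen):
    """Codeword y * G over GF(2)."""
    return tuple(sum(yi * gi for yi, gi in zip(y, col)) % 2 for col in zip(*gen))


def in_code(x, check):
    """True if x * C^T = 0 over GF(2)."""
    return all(sum(xi * ci for xi, ci in zip(x, row)) % 2 == 0 for row in check)


def by_syndrome(check, n, h):
    """Variant 1: generate every weight-h vector and keep those passing the syndrome test."""
    words = set()
    for ones in combinations(range(n), h):
        x = tuple(1 if i in ones else 0 for i in range(n))
        if in_code(x, check):
            words.add(x)
    return words


def by_generator(gen, h):
    """Variant 2: enumerate all messages y, encode them, and keep weight-h codewords."""
    words = set()
    for y in product((0, 1), repeat=len(gen)):
        codeword = encode(y, gen)
        if sum(codeword) == h:
            words.add(codeword)
    return words


def min_distance(words):
    """Minimum pairwise Hamming distance of a set of equal-length words."""
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(words, 2))


if __name__ == "__main__":
    vectors = by_generator(G, 3)
    print(len(vectors), "allocation vectors of weight 3")
    print("same set from syndrome test:", vectors == by_syndrome(C, 7, 3))
    print("minimum pairwise distance:", min_distance(vectors))
```

Which variant is cheaper depends on the parameters: the syndrome variant tests $\binom{n}{h}$ vectors, while the generator variant encodes $2^k$ messages.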
On the other hand, if $h$ is small, then each Web site is served by at most $h$ CDN servers, which results in wasted capacity. A particularly good value for $h$ is $n/2$: this is the weight at which we have the maximum number of binary vectors and hence potentially a large number of codewords. For $h = n/2$ we can use the following:

Algorithm 3: Fix a code $C$ of length $n$ with minimum distance $d$. Define a modified code $C'$ such that for each codeword $c \in C$, $C'$ contains the $2n$-length word $c' = c\bar{c}$.

In the modified code $C'$, each codeword has length $2n$ and weight exactly $n$ (half the length of the code). The minimum distance between words in $C'$ is at least $2d$. This is a quick way to use any code to produce words of constant Hamming weight equal to half the code length. The number of codewords in $C'$ is the same as that of $C$, but now each codeword can be used as an allocation vector.

These algorithms are general methods to convert codes into allocation strategies for Web sites to CDN servers. Plugging good codes into the constructions yields good allocation strategies. The equivalence holds in the other direction: any
allocation strategy can be converted into a code. This equivalence is useful to verify whether allocation strategies with certain parameters are possible: there are a number of tables [21] which list (for small values of $n$), given values for $h$ and the distance $d$, the maximum number of codewords possible in such a code.

C. Example

Suppose that the CDSP wishes to host 100 Web sites with the guarantee that if a Web site is attacked, all remaining sites have at least 3 functioning servers in the CDN region. Restated, the problem is: given $m = 100$ and minimum distance $d = 6$, find optimal values for $n$ and $h$ (see Table I).

The first step is to find the minimum value of $n$ for which a code with distance $d = 6$ and at least $m = 100$ codewords exists. From standard tables (see [22]), we see that the minimum possible value for $n$ is 15. Our first cut is to use a very specialized non-linear code [22] which yields about 128 codewords with Hamming weight 8 and with length $n = 16$. This is a fairly optimal allocation strategy using an esoteric non-linear code.

Another allocation can be obtained using the Reed-Solomon code of length $n = 21$ with a distance of $d = 5$, which yields 512 codewords. Inspecting the distribution of the number of codewords for each Hamming weight, we find the number of codewords is maximized at weights 10 and 11. We choose $h = 11$ and select only codewords of weight 11, which yields 126 codewords. For these constant weight words, the distance is actually 6.

A slightly less optimal, but straightforward, allocation is to use Algorithm 3, choosing the code $C$ to be the Hamming code of length 15 and distance 3. With our parameters the Hamming code has 2048 codewords. Plugging this code into Algorithm 3 gives us an easily implementable allocation where $n = 30$ and each Web site is assigned to at least 15 servers. While not optimal, the code yields a large number of codewords, which gives us the flexibility to expand to more Web sites. We have chosen these codes from many possibilities to illustrate the tradeoffs.
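The Algorithm 3 allocation from this example can be reproduced in a few lines of Python, as a sketch under stated assumptions: the length-15 Hamming code is built from the standard parity-check matrix whose column $i$ is the binary representation of $i$, words are stored as Python integers, and all function names are hypothetical.

```python
from itertools import combinations


def hamming15_codewords():
    """All 15-bit words with zero syndrome.

    Bit position pos (0-based) stands for code position pos + 1, whose
    parity-check column is the binary representation of pos + 1.
    """
    words = []
    for x in range(1 << 15):
        syndrome = 0
        for pos in range(15):
            if (x >> pos) & 1:
                syndrome ^= pos + 1
        if syndrome == 0:
            words.append(x)
    return words


def double_with_complement(c, n):
    """Algorithm 3: map an n-bit word c to the 2n-bit word (c, complement of c)."""
    mask = (1 << n) - 1
    return (c << n) | (~c & mask)


def weight(x):
    """Hamming weight of a word stored as an integer."""
    return bin(x).count("1")


def min_pairwise_distance(words):
    """Minimum Hamming distance over all pairs of words."""
    return min(weight(a ^ b) for a, b in combinations(words, 2))


if __name__ == "__main__":
    code = hamming15_codewords()
    vectors = [double_with_complement(c, 15) for c in code]
    sites = vectors[:100]                 # allocation vectors for 100 Web sites
    d = min_pairwise_distance(sites)
    print("codewords available:", len(vectors))
    print("servers per site:", {weight(v) for v in sites})
    print("minimum distance:", d, "-> isolation:", d // 2)
```

Because complementing doubles every disagreement between two codewords, the pairwise distance in the modified code is twice that of the original code, which is where the guarantee of at least 3 functioning servers per non-attacked site comes from.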
Optimal codes generally tend to be non-linear with complex encoding algorithms. Straightforward choices for codes such as the Reed-Solomon code give us slightly less optimal values of $n$.

D. General Allocation Strategies

In this section we discuss a number of possible allocation strategies, using various codes to place different emphasis on the number of hosted Web sites ($m$), the number of CDN servers per hosted site ($h$), and the degree of isolation between sites ($d$). Table II summarizes the trade-offs of site allocation using these codes.

Allocations for a small number of Web sites: Our first case is when $m$ is small compared to $n$. If $m \le (2n - 2)$ we can use subsets of Hadamard codes [20], [21] and get very good guarantees on the Hamming distance. The Hadamard code is an $(n, \log(2n), n/2)$ linear code with $2n$ codewords. In fact, using these codes one can construct $2n - 2$ binary vectors, each with a Hamming weight of $n/2$, with minimum pairwise distance $n/2$. With these as allocation vectors, we can assign each site to $n/2$ servers and guarantee that a site will always be served by $n/4$ servers. For small values of $m$, we can therefore get very good guarantees on resilience.

Codes with efficient algorithms: A good class of codes with a large minimum distance and efficient algorithms is the Reed-Solomon codes. Choosing parameters carefully, and using Algorithm 3 stated above, given $n$, we can use Reed-Solomon codes to enumerate an exponential number ($2^{c_1 n}$) of codewords with a minimum distance of at least $c_2 n / \log(n)$, where $c_1$ and $c_2$ are constants. Thus, we can guarantee that no Web site will suffer more than a $\log(n)$ factor drop in service under attack. Although this is high, these codes have the advantage that allocation algorithms are easily implemented.

Allocation strategies for a range of parameters: There are a number of advanced codes which can be converted to good allocation strategies. Care should be taken, however, since they typically have complex algorithms for encoding, and yield the best parameters only for large values of $n$. One such family of codes is the Justesen codes [20].
Plugging these codes into Algorithm 3 gives us an algorithm yielding an exponential number of allocation vectors where each Web site is allocated to $n/2$ servers, and we can guarantee that a Web site which is not under attack will at most suffer a small constant factor loss of service.

TABLE I
EXAMPLE SITE ALLOCATION FOR 100 WEB SITES WITH DEGREE OF ISOLATION = 3 ($m = 100$ AND $d = 6$)

              # Total servers   # Servers/site   Code
Algorithm 1   $n = 16$          $h = 8$          specialized non-linear code [22]
Algorithm 2   $n = 21$          $h = 11$         Reed-Solomon code
Algorithm 3   $n = 30$          $h = 15$         Hamming code

TABLE II
SUMMARY OF THE SITE ALLOCATION STRATEGIES USING CODES

Code                      Properties                                                  Comments
Hadamard code [20], [21]  $m = 2n - 2$, $h = n/2$, $d = n/2$                          good isolation, small number of sites ($m$)
Reed-Solomon code [20]    $m = O(2^{c_1 n})$, $h = n/2$, $d = c_2 n / \log n$         balances isolation and number of sites, efficient construction
Justesen code [20]        $m = O(2^{c_1 n})$, $h = n/2$, $d = c_2 n$                  good isolation, many sites, higher complexity

VI. DISCUSSION

Security vs. performance trade-offs: In most systems security features come at the cost of degraded performance, and our proposed DDoS countermeasures for CDNs face a similar trade-off. As described in Section IV, the hash-based request routing scheme is unlike a standard CDN request routing algorithm that chooses the optimal server based on network or server load, or network proximity. Rather, we assume that each CDN server within a given region provides roughly equivalent performance for clients assigned to that region. To allow further optimization within a region, the approach could be modified to use weighted hash functions, for instance, where the weights are determined using conventional request routing metrics. However, if the request routing function exhibits some predictability based on performance-related information, it may increase the vulnerability of CDN servers to attack. Similarly, the site allocation strategy presented in Section V potentially reduces the performance of an individual Web site by assigning it to fewer CDN servers.

CDN server distribution and footprint: The size and distribution of the CDN influences the effectiveness of our DDoS countermeasures. The mechanisms we propose are directly applicable to large CDSPs which currently operate thousands of CDN servers distributed across many networks. On the other hand, if the CDN is composed of a few large regions, each containing a small number of CDN servers, it may be impossible to find a site allocation that provides sufficient Web site isolation. Similarly, the additional protection afforded by the hash-based request routing is significantly reduced with a small number of servers. The applicability to small CDNs may be improved, however, with the emergence of CDN peering, in which multiple, administratively separate CDNs are combined to create a larger virtual CDN with increased reach and distribution.

VII. CONCLUSION

Recent work on countering DDoS attacks has focused primarily on attacks targeting a centralized server location or network resources. Increasingly, however, high-profile sites are distributed using CDNs. While CDNs, owing to their distributed structure, promise better resilience to DDoS attacks, the shared nature of the CDN infrastructure introduces unique challenges. In this paper, we proposed two mechanisms to significantly improve the resilience of CDN-hosted Web sites and CDN servers to DDoS attacks: (a) a hash-based request routing scheme that enables CDN servers to effectively distinguish attack traffic from legitimate requests, and (b) site allocation algorithms, based on coding theory, which guarantee a minimum level of availability of the sites that are not directly under attack. Together, these schemes improve the resilience of CDN-hosted Web sites, and complement existing techniques used to counter DDoS attacks.

Several issues remain to be addressed in future work.
For example, mechanisms are still needed to secure request routers from attack. Also, a CDSP may wish to support multiple levels of service, or to handle cases where some sites require more or fewer servers. Although the direct correspondence with codes no longer holds in that setting, we are developing allocation strategies for multiple classes of service using codes.

REFERENCES

[1] Kihong Park and Heejo Lee, "On the Effectiveness of Route-Based Packet Filtering for Distributed DoS Attack Prevention in Power-Law Internets," in Proceedings of ACM SIGCOMM, August 2001.
[2] F. Kargl, J. Maier, and M. Weber, "Protecting Web servers from distributed denial of service attacks," vol. 10, May 2001.
[3] D. J. Bernstein, "SYN cookies." http://cr.yp.to/syncookies.html, November 2001.
[4] David Moore, Geoffrey M. Voelker, and Stefan Savage, "Inferring Internet Denial-of-Service Attack," in Proceedings of USENIX Security Symposium, August 2001.
[5] "Strategies to protect against distributed denial of service (DDoS) attacks." Cisco Systems White Paper, February 2000. http://www.cisco.com/warp/public/707/newsflash.html.
[6] H. Burch and B. Cheswick, "Tracing anonymous packets to their approximate source," in Proceedings of USENIX Systems Administration Conference (LISA), December 2000.
[7] G. Sager, "Security fun with OCxmon and cflowd." Presentation to Internet-2 Measurement Working Group, November 1998. http://www.caida.org/projects/ngi/content/security/1198/.
[8] Alex C. Snoeren, Craig Partridge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer, "Hash-based IP Traceback," in Proceedings of ACM SIGCOMM, August 2001.
[9] Stefan Savage, David Wetherall, Anna Karlin, and Tom Anderson, "Practical Network Support for IP Traceback," in Proceedings of ACM SIGCOMM, August 2000.
[10] S. M. Bellovin, M. D. Leech, and T. Taylor. Internet draft (draft-ietf-itrace-01.txt), April 2002.
[11] CERT Coordination Center, "TCP SYN flooding and IP spoofing attacks." CERT Advisory CA-1996-21, September 1996. http://www.cert.org/advisories/ca-1996-21.html.
[12] J. Jung, B. Krishnamurthy, and M. Rabinovich, "Flash crowds and denial of service attacks: Characterization and implications for CDNs and web sites," in Proceedings of 11th WWW Conference, May 2002.
[13] B. Krishnamurthy, C. Wills, and Y. Zhang, "On the use and performance of content distribution networks," in Proceedings of ACM SIGCOMM Internet Measurement Workshop, November 2001.
[14] "IBM e-business Hosting," July 2002. http://www.ibm.com/services/webhosting.
[15] A. Barbir et al., "Known CDN request-routing mechanisms." Internet Draft (draft-ietf-cdi-known-request-routing-00.txt), February 2002.
[16] A. Shaikh, R. Tewari, and M. Agrawal, "On the effectiveness of DNS-based server selection," in Proceedings of IEEE INFOCOM, April 2001.
[17] CERT Coordination Center, "Denial of service attacks." CERT Tech Tips, June 2001. http://www.cert.org/tech_tips/denial_of_service.html.
[18] Mihir Bellare, Ran Canetti, and Hugo Krawczyk, "Keyed Hash Functions for Message Authentication," in Proceedings of CRYPTO, August 1996.
[19] D. Mosberger and T. Jin, "httperf: A tool for measuring web server performance," ACM Performance Evaluation Review, vol. 26, December 1998.
[20] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, 1983.
[21] V. Pless and W. Huffman, eds., Handbook of Coding Theory. Elsevier, Amsterdam, 1998.
[22] M. Best, A. Brouwer, F. MacWilliams, A. Odlyzko, and N. Sloane, "Bounds for Binary Codes of Length less than 25," in IEEE Transactions on Information Theory, January 1978.