A Novel Method to Defense Against Web DDoS

Yan Haitao (1), Wang Fengyu (*2), Cao ZhenZhong (3), Lin Fengbo (4), Chen Chuantong (5)
(1) First Author; (1, 5) School of Computer Science and Technology, Shandong University, JiNan, China, {htyan,ctchen}@mail.sdu.edu.cn
(*2) Corresponding Author; (2, 4) School of Computer Science and Technology, Shandong University, JiNan, China, {wangfengyu, linfb}@sdu.edu.cn
(3) Computer Science College, Qufu Normal University, Qufu, China, caozhzh@gmail.com

Abstract

Web DDoS is one of the common network security problems, and the defenses proposed so far are complex and obscure. In this paper we introduce a simple algorithm that detects attacks and locates the attackers. We show that the length and arrival time of request packets are sufficient to defend against Web DDoS. The rhythm derived from the length and arrival interval of packets is the key to distinguishing illegal traffic from legitimate traffic. We explain how rhythms are generated from flows and why they can be used to defend against Web DDoS. Finally, experiments show that our algorithm defends against Web DDoS effectively and accurately.

Keywords: DDoS (Distributed Denial of Service), Rhythm Matrix, Packet Length, Arrival Interval

1. Introduction

In recent years, defense against Web DDoS has attracted a lot of attention from the research community owing to the prevalence of Web DDoS attacks. Some papers [1,2,3] propose using Turing tests as puzzles to distinguish human users from automated zombies, but this method may interfere with legitimate users' browsing. Walfish et al. [4] propose the "Speak-up" strategy, which encourages all clients to increase their sending rates during application-layer attacks. The strategy assumes that the attackers have already exhausted their available bandwidth, so that only legitimate users can increase theirs.
The limitation is that using bandwidth as a currency is questionable, because user bandwidth varies from dial-up modems to fiber connections. Jamshed et al. [5] propose a framework that reduces bot-generated traffic through human attestation based on trustworthy input devices. It is a multipurpose technology aimed at spamming, password cracking and DDoS attacks, but it may interfere with automatic operations launched by legitimate processes such as mail watchers and antivirus programs. Ranjan et al. [6] detect Web DDoS from statistical characteristics of HTTP sessions and employ request rate-limiting as the defense mechanism; however, the method requires client-side support and may also interfere with the user's browsing. Jie Yu et al. [7] build a layer-7 DDoS attack model and propose a defense mechanism against application-layer DDoS attacks that combines detection and currency technologies. They [8] also propose a lightweight trust management mechanism to defend against DDoS attacks. Xie et al. [9] introduce an extended hidden semi-Markov model to describe browsing behaviors and treat an attack session as anomalous browsing behavior. However, the method is obscure: the choice of model parameters strongly affects the detection result, and the algorithm is hard to implement in a production environment. Other papers [15,16] discuss DDoS defense schemes from the perspectives of network anomaly detection and queue scheduling.

Our goal is to find an efficient way to defend against Web DDoS. To focus our work, we consider two requirements: (a) the algorithm has low complexity and is easy to implement; (b) the algorithm can confront various forms of Web DDoS attack. In this paper, we exploit the rhythm of access flows to address these problems, making use of the mapping between rhythms and matrix elements. As our key contribution, we propose a new and simple algorithm that meets both requirements. To the best of our knowledge, this is the first work to address Web DDoS using the rhythm of flows.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 6, Number 19, October 2012, doi:10.4156/jdcta.vol6.issue19.21

2. Web DDoS attack modes

Following paper [1], we classify Web DDoS attack modes into the following 5 classes:
- Single-URL flooding: repeatedly send a single URL request;
- Multiple-URLs flooding: repeatedly send multiple URL requests;
- Random-URLs flooding: send URL requests randomly selected from the current page;
- Session flooding: repeatedly replay a real HTTP session captured from legitimate access;
- Forge-URLs flooding: send forged URL requests.

In modes 1, 2, 3 and 5, attackers tend to increase the sending rate to achieve a better result [1].

3. Flow and rhythm

3.1. Flow definition

A web surfing session may contain multiple TCP connections. These connections should be treated as a whole to describe the client's browsing behavior accurately. In this paper, we focus on the HTTP request packets sent from client to server and ignore ACK-only packets and packets sent by the server. We therefore define a flow as a sequence of packets, ordered by arrival time, that share the same 4-tuple (source address, destination address, destination port, protocol number). Because the source port is not part of the 4-tuple, all TCP connections a client opens to the same server port fall into one flow. Packets belonging to the same flow are processed in sequence.

3.2. Flow rhythm

Denote the HTTP request packets by $p_i$ and their arrival intervals by $\Delta t_i$. The flow from client to server can then be represented as:

$$F = \{(p_i, \Delta t_i) \mid 1 \le i \le n,\ n = \mathrm{count}(p_i)\} \qquad (1)$$

We focus on the length and arrival interval of packets; the packet payload is not considered.
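The flow definition above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Packet` record and the `build_flows` function are hypothetical names, the input is assumed to be already filtered down to HTTP request packets (ACK-only and server-side packets removed), and the interval of the first packet in a flow, which the paper does not define, is assumed to be 0.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """One HTTP request packet (hypothetical record layout)."""
    arrival: float  # arrival time in seconds
    src: str        # source address
    dst: str        # destination address
    dport: int      # destination port
    proto: int      # IP protocol number (6 = TCP)
    length: int     # packet length in bytes


FlowKey = Tuple[str, str, int, int]


def build_flows(packets: List[Packet]) -> Dict[FlowKey, List[Tuple[int, float]]]:
    """Group request packets by the 4-tuple (src, dst, dport, proto) and
    return, per flow, the (length, arrival interval) pairs in arrival order."""
    grouped: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        grouped[(p.src, p.dst, p.dport, p.proto)].append(p)

    flows: Dict[FlowKey, List[Tuple[int, float]]] = {}
    for key, pkts in grouped.items():
        pkts.sort(key=lambda p: p.arrival)  # order packets by arrival time
        pairs: List[Tuple[int, float]] = []
        prev = None
        for p in pkts:
            # Assumption: the first packet of a flow gets an interval of 0.
            dt = 0.0 if prev is None else p.arrival - prev
            pairs.append((p.length, dt))
            prev = p.arrival
        flows[key] = pairs
    return flows
```

Each list returned by `build_flows` holds the $(l_i, \Delta t_i)$ pairs of one flow, which is all the rhythm computation needs; the payload is never inspected.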
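Given $l_i = \mathrm{length}(p_i)$, formula (1) can be written as:

$$F = \{(l_i, \Delta t_i) \mid 1 \le i \le n,\ n = \mathrm{count}(p_i)\} \qquad (2)$$

The user's browsing behavior is thus mapped into a sequence of packet lengths and a sequence of arrival intervals. We use the following formula to generate the rhythms of an access flow, where Norm() is the normalization function:

$$X_j = \mathrm{Norm}(l_i) \times 100 + \mathrm{Norm}(l_{i+1}) \times 10 + \mathrm{Norm}(l_{i+2})$$
$$Y_j = \mathrm{Norm}(\Delta t_i) \times 100 + \mathrm{Norm}(\Delta t_{i+1}) \times 10 + \mathrm{Norm}(\Delta t_{i+2})$$
$$i = 3x + 1,\quad x = j - 1,\quad 1 \le j \le \lfloor n/3 \rfloor,\quad n = \mathrm{count}(p_i),\quad 0 \le \mathrm{Norm}(\cdot) \le 9 \qquad (3)$$

Since each normalized value is a single digit, formula (3) shows that the $X_j$ and $Y_j$ obtained from rhythmization lie in the range [0, 999].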
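Formula (3) can be sketched as below. The paper does not say how Norm() maps a length or an interval onto the digits 0 to 9, so `norm` here is an assumption: a linear scale against a caller-chosen maximum (`lmax` for lengths, `tmax` for intervals, both hypothetical parameters), clamped to 9. Triples are taken without overlap, following $i = 3x + 1$, and a trailing incomplete triple is simply dropped, which is also an assumption.

```python
from typing import List, Tuple


def norm(value: float, vmax: float) -> int:
    """Hypothetical Norm(): map value linearly onto the digits 0..9,
    clamping values at or above vmax to 9 (the paper leaves Norm() unspecified)."""
    if vmax <= 0:
        raise ValueError("vmax must be positive")
    return max(0, min(9, int(9 * value / vmax)))


def rhythms(seq: List[Tuple[float, float]], lmax: float,
            tmax: float) -> List[Tuple[int, int]]:
    """Formula (3): pack three normalized lengths into X_j and the matching
    three normalized intervals into Y_j, each a value in [0, 999].
    Triples start at i = 3x + 1 (non-overlapping); a partial triple is dropped."""
    out: List[Tuple[int, int]] = []
    for i in range(0, len(seq) - 2, 3):  # 0-based i = 0, 3, 6, ...
        (l0, t0), (l1, t1), (l2, t2) = seq[i:i + 3]
        x = norm(l0, lmax) * 100 + norm(l1, lmax) * 10 + norm(l2, lmax)
        y = norm(t0, tmax) * 100 + norm(t1, tmax) * 10 + norm(t2, tmax)
        out.append((x, y))
    return out
```

For example, with `lmax = 900` bytes and `tmax = 9` seconds (arbitrary choices), the pairs (300, 0.0), (600, 1.0), (900, 2.0) yield the single rhythm (369, 12): the lengths normalize to 3, 6, 9 and the intervals to 0, 1, 2.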
Figure 1 illustrates a sequence of rhythms generated from the real traffic of a client in data set DS2 [12] (the experimental datasets are described in Section 5.1). For ease of observation, the arrival interval rhythms in the diagram are plotted as negative values.

[Figure 1. A Real Datagram Rhythm]

As a result of normalization, the magnitudes of the curves in Figure 1 lie in the range [0, 999]. The interval rhythm curve reflects the varying packet arrival intervals: values near zero indicate short intervals, and downward peaks indicate longer intervals. For comparison, we also plot the 5 Web DDoS attack modes in Figures 2 to 6. The rhythms in these figures are generated from flows extracted from the simulation dataset.

[Figure 2. Rhythm of Single-URL Flooding (y-axis: Rhythm Value)]
Figure 2 illustrates the Single-URL flooding attack. In this attack mode, the attacker repeatedly submits one URL to the server, so the normalized packet length rhythm is a fixed value and its curve in the figure is a straight line.

[Figure 3. Rhythm of Multiple-URLs Flooding]

Figure 3 illustrates the Multiple-URLs flooding attack. In this mode, the attacker repeatedly submits multiple URLs to the server, so the rhythm curve extracted from the attack traffic repeats, as shown in the figure.

[Figure 4. Rhythm of Random-URLs Flooding]
Figure 4 illustrates the Random-URLs flooding attack. The attacker jumps between pages randomly; although the set of URL lengths is fixed, the packet length rhythm has no obvious characteristics.

[Figure 5. Rhythm of Session Flooding]

Figure 5 illustrates the Session flooding attack. The attacker submits URLs in the sequence of a real HTTP session with the real packet intervals, so the resulting length and interval rhythms follow an obvious pattern, and the curves in the figure repeat periodically.

[Figure 6. Rhythm of Forge-URLs Flooding]

Figure 6 illustrates the Forge-URLs flooding attack. Because the attacker aims to exhaust the server-side buffer and force the server to drop legitimate requests, the fake URLs are generally longer than normal ones [1]. As the figure shows, the packet length rhythm has no obvious characteristics and mainly falls into the range (450, 999) because of the longer packet lengths. In Figures 2, 3, 4 and 6, the interval rhythm values stay at 0 because of the short arrival intervals.
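The straight line of Figure 2 and the repeating curve of Figure 3 follow directly from formula (3), which the sketch below checks on synthetic floods. Everything here is illustrative and assumed: `norm` is the same hypothetical linear Norm() (the paper does not define it), the URL request lengths and the 2 ms gap are made-up values, and `flooding_flow` and `rhythm_period` are hypothetical helpers.

```python
from typing import List, Tuple


def norm(value: float, vmax: float) -> int:
    # Hypothetical Norm(): linear map onto 0..9, clamped (unspecified in the paper)
    return max(0, min(9, int(9 * value / vmax)))


def rhythms(seq: List[Tuple[float, float]], lmax: float,
            tmax: float) -> List[Tuple[int, int]]:
    # Formula (3) with non-overlapping triples; a partial triple is dropped
    out = []
    for i in range(0, len(seq) - 2, 3):
        (l0, t0), (l1, t1), (l2, t2) = seq[i:i + 3]
        out.append((norm(l0, lmax) * 100 + norm(l1, lmax) * 10 + norm(l2, lmax),
                    norm(t0, tmax) * 100 + norm(t1, tmax) * 10 + norm(t2, tmax)))
    return out


def flooding_flow(url_lengths: List[int], n: int,
                  gap: float) -> List[Tuple[int, float]]:
    # Synthetic flood: cycle through the given request lengths with a fixed gap
    return [(url_lengths[k % len(url_lengths)], gap) for k in range(n)]


def rhythm_period(rs: List[Tuple[int, int]]) -> int:
    # Smallest p such that the rhythm sequence repeats every p steps
    for p in range(1, len(rs) + 1):
        if all(rs[k] == rs[k + p] for k in range(len(rs) - p)):
            return p
    return len(rs)


# Single-URL flooding: one request length, so every rhythm is identical.
single = rhythms(flooding_flow([450], 30, 0.002), 900, 9.0)
# Multiple-URLs flooding with four URLs: the rhythms cycle with period 4.
multi = rhythms(flooding_flow([300, 500, 700, 850], 30, 0.002), 900, 9.0)
```

Under these assumptions `single` contains only (444, 0), while `multi` cycles through (357, 0), (835, 0), (783, 0), (578, 0); the short gap keeps every interval rhythm at 0. More generally, cycling k URLs through non-overlapping triples gives a rhythm period of k / gcd(k, 3), so a three-URL flood would look like a constant rhythm, just as in Single-URL flooding.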
4. Rhythm matrix

From formula (3) we obtain a series of $X_j$ and $Y_j$. Each tuple $(X_j, Y_j)$ can be mapped to an element of a 1000 × 1000 matrix; for convenience we assume the matrix subscripts start from 0. We call this mapping process a rhythm falling on a matrix element. Every matrix element is initialized to 0, and whenever we obtain a tuple $(X_j, Y_j)$, the value of the element with subscript $(X_j, Y_j)$ is incremented by 1. Let $C_{(i,j)}$ be the value of element (i,j) during a unit interval t (we use a unit interval of 1 minute throughout the experiments). The velocity of rhythms falling on the matrix in period t is:

$$S_{(i,j)} = \frac{C_{(i,j)}}{t} \qquad (4)$$

Processing continuous data over K unit intervals, we obtain a series of velocities $(S_1, S_2, \ldots, S_K)$, one per unit interval. Take

$$S_{(i,j)} = \max_{1 \le m \le K} S_{m,(i,j)} \qquad (5)$$

Formula (5) gives the maximum velocity of matrix element (i,j). Computing the maximum velocity of every matrix element yields a new matrix, called the rhythm velocity matrix. Owing to the similarity of users' interests, the length and arrival interval of request packets are statistically stable as long as the page structure of the web site is stable. Figure 7 illustrates two rhythm velocity matrices generated from dataset DS2 [12]: Figure 7(a) shows the data from 0 to 12 o'clock, and Figure 7(b) the data from 12 to 24 o'clock.

[Figure 7. Rhythm velocity matrix of the experiment dataset: (a) matrix of data from 0 to 12 o'clock; (b) matrix of data from 12 to 24 o'clock]

The x-axis represents the packet length rhythm, the y-axis the arrival interval rhythm, and the z-axis the velocity on a log scale. Because the rhythm is statistically stable, Figure 7(b) is very similar to Figure 7(a). We use the traffic trace of legitimate access as the training set, extract rhythms from the request packets, and generate the rhythm velocity matrix S. Matrix S is the base matrix used to detect the occurrence of DDoS and identify attackers.
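Formulas (4) and (5), together with the comparison against a live matrix S′ described in the next paragraph and in Section 5.2, can be sketched as follows. This is a hedged illustration under stated assumptions: the matrix is stored sparsely as a dictionary rather than as a dense 1000 × 1000 array, t is measured in minutes (one unit interval of 1 minute, so t = 1), and the helper names and the thresholds `factor`, `floor` and `min_run` are hypothetical, because the paper only says "much greater" and "continuously falling" without giving values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Rhythm = Tuple[int, int]      # (X_j, Y_j), each in [0, 999]
Matrix = Dict[Rhythm, float]  # sparse matrix: missing elements are 0


def velocity_matrix(pairs: Iterable[Rhythm], t: float = 1.0) -> Matrix:
    """Formula (4): S(i,j) = C(i,j) / t for one unit interval of length t."""
    counts = Counter(pairs)  # C(i,j): number of rhythms falling on (i,j)
    return {ij: c / t for ij, c in counts.items()}


def base_matrix(intervals: Iterable[List[Rhythm]], t: float = 1.0) -> Matrix:
    """Formula (5): element-wise maximum of S over K training intervals."""
    base: Matrix = {}
    for pairs in intervals:
        for ij, v in velocity_matrix(pairs, t).items():
            base[ij] = max(base.get(ij, 0.0), v)
    return base


def suspected_points(live: Matrix, base: Matrix,
                     factor: float = 10.0, floor: float = 1.0) -> Set[Rhythm]:
    """Elements whose live velocity S'(i,j) is 'much greater' than S(i,j).
    factor and floor are hypothetical thresholds, not values from the paper."""
    return {ij for ij, v in live.items()
            if v > factor * max(base.get(ij, 0.0), floor)}


def is_attack_flow(flow_rhythms: List[Rhythm], suspects: Set[Rhythm],
                   min_run: int = 5) -> bool:
    """Flag a flow whose rhythms fall on suspected points min_run times in a
    row (min_run is a hypothetical reading of 'continuously falling')."""
    run = 0
    for r in flow_rhythms:
        run = run + 1 if r in suspects else 0
        if run >= min_run:
            return True
    return False
```

A sparse dictionary is used because a single flow touches only a handful of the one million possible elements; the `floor` term keeps elements never seen in training from being flagged by a single stray rhythm.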
Under normal circumstances, the rhythm velocity matrix S′ generated from real-time access traffic is similar to S, with $S'_{(i,j)}$ approximately equal to $S_{(i,j)}$. When a DDoS attack occurs, the elements $S'_{(i,j)}$ corresponding to the rhythms of the attack flows will be significantly greater than $S_{(i,j)}$. We can detect the occurrence of DDoS from this unusual increase in element values and mark each such element (i,j) as a suspected point; we then use these suspected points to filter the attacker traffic.

5. Experiment

5.1. Experimental datasets

Two traces [11, 12] are used in our experiment. From each trace we extracted, as a test dataset, the traffic of randomly selected servers with relatively heavy traffic; the extracted traffic is named DS1 and DS2. DS1 and DS2 are each divided into a training set A and a test set B. DS1A and DS2A are used to generate the rhythm velocity matrices, and DS1B and DS2B serve as background traffic for the simulated Web DDoS traffic. The Web DDoS attack simulation method follows previous papers [13, 14]. Ten traces are simulated, covering the different attack modes. For attack modes 1, 2, 3 and 5, the packet arrival interval is set to approximately 2ms. For attack mode 4, a randomly selected HTTP session from the background traffic is used as the simulation pattern.

5.2. Experimental results

For test set B, we generate a rhythm velocity matrix per unit interval and compare it with the base matrix obtained from training set A. If $S'_{(i,j)}$ computed from B is much greater than $S_{(i,j)}$ computed from A, we conclude that DDoS has occurred. We then mark element (i,j) as a suspected point; if a flow continuously falls on these suspected points, we consider it an attack flow.

1) Result of Single-URL flooding (attack mode 1): Single-URL flooding uses only one URL during the attack, so the request packet length rhythm is a fixed value (see Figure 2); meanwhile, the arrival interval rhythm is fixed at 0 because the packet intervals are short.
Thus the rhythms of the attack flows continuously fall on one fixed element of the velocity matrix, so the value of that element increases sharply, far beyond its velocity under normal circumstances, which triggers DDoS detection. The experimental results are shown in Table 1.

Table 1. Experiment result in attack mode 1
| Data set | Attackers | Detected | True positive | False positive |
|---|---|---|---|---|
| DS1B-1 | 2 | 2 | 100% | / |
| DS2B-1 | 2 | 21 | 100% | 0.5% |

2) Result of Multiple-URLs flooding (attack mode 2): Multiple-URLs flooding uses multiple URLs during the attack, so the request packet length rhythm is a repeating sequence (Figure 3). The rhythms of the attack flows fall on a fixed set of matrix elements, causing their values to increase sharply. The experimental results are shown in Table 2.

Table 2. Experiment result in attack mode 2
| Data set | Attackers | Detected | True positive | False positive |
|---|---|---|---|---|
| DS1B-2 | 2 | 2 | 100% | / |
| DS2B-2 | 2 | 2 | 100% | / |

3) Result of Random-URLs flooding (attack mode 3): In Random-URLs flooding, the requested URLs are selected randomly from the page. Regarding the lengths of all URLs in a web site as a set L, the rhythm of Random-URLs flooding generated from URLs randomly selected
from L is similar to that of legitimate access. However, because zombies lack the browsing preferences of human clients, the rhythms of attack flows differ from those of legitimate flows. The experimental result is shown in Table 3.

Table 3. Experiment result in attack mode 3
| Data set | Attackers | Detected | True positive | False positive |
|---|---|---|---|---|
| DS1B-3 | 2 | 22 | 100% | 1% |
| DS2B-3 | 2 | 2 | 100% | / |

4) Result of Session flooding (attack mode 4): In Session flooding, the request packets and arrival intervals of attack flows are similar to those of legitimate flows, so the rhythm of an individual attack flow resembles that of a legitimate flow. However, all attack flows share similar rhythms, which fall on the same elements of the velocity matrix during the attack period, so the velocity of these elements grows abnormally. The experimental results are shown in Table 4.

Table 4. Experiment result in attack mode 4
| Data set | Attackers | Detected | True positive | False positive |
|---|---|---|---|---|
| DS1B-4 | 2 | 22 | 100% | 1% |
| DS2B-4 | 2 | 2 | 100% | / |

5) Result of Forge-URLs flooding (attack mode 5): In Forge-URLs flooding, the attack packets have random lengths, resulting in random rhythms, so the abnormal matrix elements are randomly distributed. The experimental results are shown in Table 5.

Table 5. Experiment result in attack mode 5
| Data set | Attackers | Detected | True positive | False positive |
|---|---|---|---|---|
| DS1B-5 | 2 | 2 | 100% | / |
| DS2B-5 | 2 | 21 | 100% | 0.5% |

As shown above, our algorithm achieves very good results in all 5 attack modes: the true positive rate is 100%, and the maximum false positive rate is 1%.

6. Conclusion

In this paper, we propose a simple but efficient method to defend against Web DDoS. The novelty of the algorithm lies in using rhythm as a tool to detect and filter attack flows. We applied the algorithm to two real traces with very promising results: the true positive rate is 100% and the false positive rate is at most 1%. These results suggest that the proposed algorithm can be practical for monitoring Web DDoS attacks.

7. References

[1] S. Kandula, D. Katabi, M. Jacob, and A.
Berger, "Botz-4-Sale: Surviving organized DDoS attacks that mimic flash crowds," in Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, vol. 2, pp. 287-300, 2005.
[2] V. Gligor, "Guaranteeing access in spite of distributed service-flooding attacks," Lecture Notes in Computer Science, Springer, vol. 3364, pp. 80-96, 2005.
[3] W. Morein, A. Stavrou, C. Cook, A. Keromytis, V. Misra, and R. Rubenstein, "Using graphic Turing tests to counter automated DDoS attacks against web servers," in Proceedings of the 10th ACM Conference on Computer and Communications Security, pp. 8-19, 2003.
[4] M. Walfish, M. Vutukuru, H. Balakrishnan, D. Karger, and S. Shenker, "DDoS defense by offense," ACM Transactions on Computer Systems, vol. 28, no. 1, 2010.
[5] M. A. Jamshed, W. Kim, and K. Park, "Suppressing bot traffic with accurate human attestation," in Proceedings of the First ACM Asia-Pacific Workshop on Systems, pp. 43-48, 2010.
[6] S. Ranjan, R. Swaminathan, M. Uysal, and E. Knightly, "DDoS-resilient scheduling to counter application layer attacks under imperfect detection," IEEE/ACM Transactions on Networking, vol. 17, no. 1, pp. 26-39, 2009.
[7] Jie Yu, Zhoujun Li, Huowang Chen, and Xiaoming Chen, "A detection and offense mechanism to defend against application layer DDoS attacks," in Proceedings of the Third International Conference on Networking and Services, pp. 54-60, 2007.
[8] Jie Yu, Fangfang Cheng, Liming Lu, and Zhoujun Li, "A lightweight mechanism to mitigate application layer DDoS attacks," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, Springer, vol. 18, pp. 175-191, 2009.
[9] Y. Xie and S. Z. Yu, "A large-scale hidden semi-Markov model for anomaly detection on user browsing behaviors," IEEE/ACM Transactions on Networking, vol. 17, no. 1, pp. 54-65, 2009.
[10] J. Xiao, X. C. Yun, and Y. Z. Zhang, "Defend against application-layer distributed denial-of-service attacks based on session suspicion probability model," Chinese Journal of Computers, vol. 33, no. 9, pp. 1713-1724, 2010.
[11] http://mawi.wide.ad.jp/mawi/ditl/ditl2009/
[12] http://www.wand.net.nz/wits/nzix/2/
[13] Y. Xie and S. Z. Yu, "Anomaly detection based on web users' browsing behaviors," Chinese Journal of Software, vol. 18, no. 4, pp. 967-977, 2007.
[14] Y. Xie and S. Z. Yu, "A model for detecting application layer flooding attacks," Journal of Chinese Computer Science, vol. 34, no. 8, pp. 109-111, 2007.
[15] Ming Yu, "A Nonparametric Adaptive Cusum Method and Its Application in Network Anomaly Detection," IJACT, vol. 4, no. 1, pp. 280-288, 2012.
[16] Yu Ming, "Mitigating Flooding-Based DDoS Attacks by Stochastic Fairness Queueing," AISS, vol. 4, no. 6, pp. 145-152, 2012.