DOS ATTACK DETECTION USING SOURCE IP ADDRESS ENTROPY AND AVERAGE PACKET ARRIVAL TIME INTERVAL

Proceedings of the IASTED International Conference Computational Intelligence (CI 2015)
February 16-17, 2015, Innsbruck, Austria
DOI: 10.2316/P.2015.827-008

Keiichirou Kurihara
Graduate School of Systems and Information Engineering, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki, Japan
email: s1320620@u.tsukuba.ac.jp

Kazuki Katagishi
Academic Computing and Communications Center, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki, Japan
email: katagisi@cc.tsukuba.ac.jp

ABSTRACT

DoS attacks are a threat to the ICT (information and communications technology) society. Many detection methods exist, but countermeasures have become difficult as attacks have grown more complicated. Conventional methods use the properties of entropy to detect attacks: by comparing entropy values before and after a point in a time series, they estimate the increase or decrease in the dispersion of header information values such as IP addresses. With only one kind of header information, the detection accuracy of these methods is low, so various kinds of header information are necessary for accurate detection. However, it then takes a long time to distinguish DoS attacks, and the detection method becomes complicated. This paper proposes a detection method that uses only two kinds of header information: packet arrival time and source IP address. The method can detect DoS attacks with fewer kinds of header information than conventional methods. In addition, the False Positive rate is less than 2% and the False Negative rate is 0%. From these results, the method is not only simple but also accurate.

KEY WORDS

DoS attack, entropy, regression analysis

1 Introduction

Cyber crimes are increasing day by day, so countermeasures are urgently needed. In particular, DoS (Denial of Service) attacks place a heavy load on a system and occupy its bandwidth. As a result, the system is forced to stop, which greatly affects society. The number of attacks is increasing, and their scale is growing. Moreover, the purpose of attacks has shifted from amusement to money and politics, and has become more malicious. Attacks are therefore becoming more sophisticated, and countermeasures more difficult.

In [1], the authors treated five problems for DoS attack detection: accuracy, immediacy, network adaptability, periodical adaptability and cyber attack adaptability. We focus on three of these problems: accuracy, immediacy and network adaptability.

First, accuracy means that a detection method can exactly distinguish an attack packet from a normal (non-attack) packet. The wrong detection rate must be as low as possible. Wrong detections are classified into False Positives and False Negatives. A False Positive occurs when a detection method judges a normal packet to be an attack packet. A False Negative occurs when a detection method judges an attack packet to be a normal packet. Reducing wrong detections improves accuracy.

Second, immediacy means that a detection method can detect DoS attacks shortly after they start. When DoS attacks occur, immediacy is important for limiting their influence on the system and the server.

Finally, network adaptability means that a detection method can be applied to various networks and hosts. Because network configurations differ from one another, network adaptability is important when a detection method is actually applied to various network structures.

There are many detection methods. Among them, we focus on methods using entropy. These methods use several parameters of header information to detect DoS attacks accurately, and the fewer the packets used for calculating entropy, the lower the detection accuracy. To extract common features of DoS attacks, we analyze header information in two datasets: DARPA2000 and CAIDA2007. We analyze the time-series distribution of each header value and then calculate the entropy of each header value. We set the number of packets used for calculating the entropy values to 1000 packets and 5000 packets. From the results, we verify that there is a correlation between the average arrival time interval of packets and the entropy of source IP addresses in DARPA2000 and CAIDA2007. At the end of this paper, we propose a detection method and verify its effectiveness.

2 Preliminary

Conventional detection methods for DoS attacks can be classified into signature-based methods and statistical methods. In this section, we outline their mechanisms, existing research, and problems.

2.1 Signature-based methods

Signature-based methods detect attacks by matching signatures. A signature is an attack pattern registered in the database. A pattern that is not registered cannot be detected as an attack, so the signature database needs to be updated constantly and redefined whenever the network configuration changes. Consequently, the more signatures there are, the longer the detection time. From the above, accuracy is an advantage of signature-based methods, while network adaptability and immediacy are disadvantages.

As one existing method, Miyazawa et al. proposed a signature-based method using the history of received attacks [2]. Signatures are usually searched sequentially from the beginning, so the more signatures there are, the longer the detection time, which makes it difficult to detect attacks efficiently. Miyazawa et al. therefore built a database that records the attacks received in the past, and proposed a method that preferentially searches signatures that occur with high frequency. As a result, detection time was reduced by about 70% compared with the conventional method. In addition, the CPU usage rate was reduced by about 40%, so the processing load was reduced. However, neither a method for efficiently updating the signature database nor a countermeasure against attacks that are not registered as signatures is discussed.

2.2 Statistical methods

Statistical methods detect attacks using basic statistical information. These methods decide detection thresholds from statistical information obtained in normal time, that is, time during which no attack is received. Statistical methods are therefore adaptable to various network configurations. In addition, no extra memory for registering signatures is needed, and the dispersion of the detection time distribution is smaller than that of signature-based methods. On the other hand, wrong detections occur depending on the threshold, so it is necessary to vary the threshold adaptively.
Moreover, in addition to the detection time, time is required to store the several tens of thousands of packets needed for calculating statistics. From the above, network adaptability is an advantage of statistical methods, while accuracy, which depends on the threshold, and immediacy, which depends on the time for storing packets, are disadvantages.

There are many conventional statistical methods. General methods use the Gaussian distribution and statistical tests. In [3], attacks were detected by a t-test: the proposed method distinguishes the difference in arrival ratio between SYN packets and ACK packets, and it was effective for SYN flood detection. In [4], a method using the $\chi^2$ test was proposed. It detected attacks from the $\chi^2$ value calculated from the frequency of appearance of source IP addresses. It does not assume that the distribution in normal time obeys the Gaussian distribution, but the authors mentioned the problem that the wrong detection rate is high.

2.3 Entropy-based methods

Entropy-based methods are a class of statistical methods. In this section, we outline entropy and existing methods that use it.

2.3.1 Entropy

Entropy is a value that represents uncertainty. It is calculated by the following equation:

$$H = -\sum_{i=1}^{m} P_i \log_2 P_i \qquad (1)$$

where $m$ is the number of symbols. The parameter $P_i$ is calculated from the number of occurrences $n_i$ ($i = 1, \ldots, m$) of each symbol that appeared in a window of width $W$, so that $P_i = n_i / W$ ($i = 1, \ldots, m$). The window width $W$ is the total number of occurrences of all symbols, that is, $W = n_1 + n_2 + \cdots + n_m$, and is assumed to be fixed. In this case, the smaller the difference between the maximum and minimum occurrence probabilities $P_i$, or the larger the number of symbols $m$, the higher the entropy value.

In this paper, we apply this property to DoS attack detection. We calculate the entropy from packets observed in normal time. When the distribution of entropy for the observed symbols differs from that of normal time, an attack is judged to have occurred.
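As an illustration of equation (1), the following Python sketch computes the entropy of one window of symbols. The function name `shannon_entropy` and the use of a plain list as the window are assumptions made for this example and are not taken from the original method description.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return H = -sum(P_i * log2(P_i)) for the symbols in one window."""
    window_width = len(symbols)          # W = n_1 + ... + n_m
    if window_width == 0:
        raise ValueError("window must contain at least one symbol")
    counts = Counter(symbols)            # n_i for each distinct symbol
    # P_i = n_i / W; each term -P_i * log2(P_i) is non-negative.
    return sum(-(n / window_width) * math.log2(n / window_width)
               for n in counts.values())
```

For instance, a window in which four source addresses each appear equally often yields 2 bits, while a window containing a single repeated address yields 0, which matches the property that a more even spread over more symbols gives a higher entropy.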
The symbols used in detection are packet header fields such as source IP address, source (destination) port, total length of the packet, TTL (time to live), identification, window size and so on.

2.3.2 Existing research

There are many entropy-based methods, and every method uses header information as symbols. In [1], attacks were detected using the Mahalanobis distance of entropy values calculated from nine kinds of header information. In [5], the authors showed that entropy values calculated from five kinds of header information could be classified into 27 clusters; this result can be used to detect DoS attacks. In [6], the authors detected anomalies from entropy calculated from IDS (Intrusion Detection System) logs. The anomalies include DoS attacks, worms and illegal accesses.

3 Analysis for conventional DoS attack datasets

3.1 DDoS datasets

Many datasets of DoS attacks have been published for the evaluation of attack detection methods. We use datasets that consist of raw packet data. In this paper, two typical datasets are used: the DARPA dataset and the CAIDA dataset.

DARPA2000

The DARPA dataset was created from 1998 to 2000 by MIT (Massachusetts Institute of Technology) under the sponsorship of DARPA (Defense Advanced Research Projects Agency) and AFRL (Air Force Research Laboratory) [7]. It consists of unprocessed pcap data. In this paper, we analyze the DARPA2000 dataset. Its scenario assumes that an attacker forcibly intrudes into hosts and infects them with bot programs, and then launches DDoS attacks against a target host. The scenario consists of the following five phases [7].

Phase 1 IPsweep of the AFB from a remote site.
Phase 2 Probe of live IPs to look for the sadmind daemon running on Solaris hosts.
Phase 3 Breakins via the sadmind vulnerability, both successful and unsuccessful on those hosts.
Phase 4 Installation of the trojan mstream DDoS software on three hosts at the AFB.
Phase 5 Launching the DDoS.

CAIDA2007

A series of CAIDA datasets has been distributed by CAIDA (Center for Applied Internet Data Analysis) [8]. In this paper, we analyze the CAIDA2007 dataset. It consists of pcap data, but part of the packet contents (payload) is removed, and IP addresses are anonymized. This dataset also includes packet data for stronger DDoS attacks. However, it does not include packet data in normal time. For about 20 minutes from the beginning, attackers make weak DDoS attacks. After that, attackers make strong DDoS attacks. In addition, several types of attacks are launched at the same time. In this paper, we call the first 20 minutes or so Weak attack time and the remaining time Strong attack time, and we regard weak attacks as non-attack.

3.2 Dataset Analysis

In this paper, we analyze the datasets to extract common features among DoS attacks. We calculate the entropy values of source IP addresses, source and destination ports, identification numbers, TTL, window sizes and total lengths with window width W = 1000 (packets). We also calculate the average arrival time interval of packets.
Entropy is calculated by counting the number of occurrences of each header value per 1000 packets. As an example, we show the procedure for calculating the entropy of source IP addresses. Assume that 10 source IP addresses SrcIP_1, ..., SrcIP_10 occur. First, we count the number of occurrences $n_i$ ($i = 1, \ldots, 10$) of each source IP address SrcIP_i. Next, we calculate the probability of occurrence $P_i = n_i / 1000$ ($i = 1, \ldots, 10$). Finally, we calculate the entropy value from each $P_i$ using equation (1). The entropies of source (destination) ports, total lengths of the packet, TTL, identification numbers and window sizes are calculated in the same way.

3.3 Difference of distribution between attack and non-attack in Correlation between Source IP address entropy and Average arrival time interval

Figures 1 and 2 show the relation between the average arrival time interval of packets and the entropy of source IP addresses in DARPA2000 and CAIDA2007, respectively. The x-axis is the average arrival time interval of packets and the y-axis is the entropy of source IP addresses. The average arrival time interval $x_i$ and the entropy value of source IP addresses $y_i$ are calculated over the same window. These figures show the distributions of the points $(x_i, y_i)$, and a strong correlation can be seen between the entropy of source IP addresses and the average arrival time interval of packets.

Figure 1. Correlation between entropy of source IP addresses and average arrival time interval of packets (DARPA2000)

As Figure 1 shows, the point distribution in Attack time is separated from that in Normal time. Likewise, Figure 2 shows that the point distribution in Weak attack time is separated from that in Strong attack time. In addition, these figures show that the points in Normal time and in Weak attack time are distributed linearly, so each of these distributions can be approximated by a regression line. We therefore use a regression line to detect DoS attacks in this paper.
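The points $(x_i, y_i)$ described above could be computed along the lines of the following Python sketch. It is only an illustration under stated assumptions: packets are given as `(timestamp_seconds, source_ip)` pairs in arrival order, the average arrival time interval is taken as the mean gap between consecutive packets inside one window, and a trailing incomplete window is discarded. The paper does not specify these details, and the name `window_features` is hypothetical.

```python
import math
from collections import Counter


def window_features(packets, window_width):
    """Yield (x_i, y_i) for each full window of `window_width` packets.

    x_i: mean inter-arrival time inside the window (assumed definition).
    y_i: entropy of the source IP addresses in the window, as in equation (1).
    """
    if window_width < 2:
        raise ValueError("window_width must be at least 2")
    window = []
    for packet in packets:
        window.append(packet)
        if len(window) == window_width:
            times = [t for t, _ in window]
            # The gaps between consecutive packets sum to last - first.
            x = (times[-1] - times[0]) / (window_width - 1)
            counts = Counter(ip for _, ip in window)
            y = sum(-(n / window_width) * math.log2(n / window_width)
                    for n in counts.values())
            yield x, y
            window = []   # start the next non-overlapping window
```

With W = 1000 as in the analysis above, each yielded pair corresponds to one plotted point in Figures 1 and 2.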

Figure 2. Correlation between entropy of source IP addresses and average arrival time interval of packets (CAIDA2007)

3.4 Property of residuals e_i in Regression analysis

By regression analysis, we can represent the relation between the independent variables $X = \{x_1, x_2, \ldots, x_N\}$ and the dependent variables $Y = \{y_1, y_2, \ldots, y_N\}$ as $Y = aX + b$. The least-squares method is one of the methods for obtaining $a$ and $b$. It determines $a$ and $b$ by minimizing the sum of squares of the residuals $e = \{e_1, \ldots, e_N\}$, where each residual $e_i = y_i - \hat{y}_i$ is calculated from the fitted value $\hat{y}_i = a x_i + b$ and the observed value $y_i$. The coefficients $a$ and $b$ are obtained by the following equations, in which every sum runs over $i = 1, \ldots, N$:

$$a = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2} \qquad (2)$$

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i y_i \sum x_i}{N \sum x_i^2 - \left(\sum x_i\right)^2} \qquad (3)$$

It is known that when $N$ is large, the residuals have four properties: unbiasedness, homoscedasticity, no correlation and normality. That is, the residuals follow the normal distribution with mean 0 and variance $\sigma^2$ [9], where $\sigma$ is the standard deviation. We verify that this property is seen in the two datasets above.

First, we show the results for the DARPA2000 dataset. Figure 3 shows the histogram of residuals in Normal time. Using the 407 points from the beginning to just before the attacks (the points in Normal time shown in Figure 1), we obtain the regression line y = 38.381x + 2.2375. The variance $\sigma_D^2$ of the residuals, the standard deviation $\sigma_D$ and the mean $\mu_D$ are 0.379089735, 0.615702635 and 0.0000252898 ($\approx 0.0$), respectively.

Figure 3. Histogram of residuals (DARPA2000)

As Figure 3 shows, all residuals in Normal time lie between $\mu_D - 3\sigma_D$ ($\approx -1.86$) and $\mu_D + 3\sigma_D$ ($\approx 1.86$). The mean $\mu_D$ is nearly 0, so the residuals approximately follow the normal distribution with mean 0. Next, Figure 4 shows the distribution of residuals in time series.

Figure 4. Distribution of residuals in time series (DARPA2000)

As Figure 4 shows, residuals in Attack time deviate from the range between $\mu_D - 3\sigma_D$ and $\mu_D + 3\sigma_D$. Therefore attacks are detected when residuals deviate from this range.

Next, we show the results for the CAIDA2007 dataset. Figure 5 shows the histogram of residuals in Weak attack time. Using the 183 points for about 5 minutes just before the strong attacks (the points in Weak attack time shown in Figure 2), we obtain the regression line y = 357.86x + 3.6293. The variance $\sigma_C^2$ of the residuals, the standard deviation $\sigma_C$ and the mean $\mu_C$ are 0.019562181, 0.139864867 and 0.00000311352 ($\approx 0.0$), respectively.

Figure 5. Histogram of residuals (CAIDA2007)

As Figure 5 shows, almost all residuals in Weak attack time lie between $\mu_C - 3\sigma_C$ ($\approx -0.42$) and $\mu_C + 3\sigma_C$ ($\approx 0.42$). The mean $\mu_C$ is nearly 0, so the residuals approximately follow the normal distribution with mean 0. Next, Figure 6 shows the distribution of residuals in time series.

Figure 6. Distribution of residuals in time series (CAIDA2007)

As Figure 6 shows, residuals in Strong attack time deviate from the range between $\mu_C - 3\sigma_C$ and $\mu_C + 3\sigma_C$. Therefore strong attacks are detected when residuals deviate from this range.

4 Proposal method

In this paper, we propose a detection method that uses entropy from only a few kinds of header information. Using the DARPA2000 and CAIDA2007 datasets, we obtained a common feature of DoS attacks: the distribution of the two parameters in Normal time and Weak attack time can be approximated by a regression line. The proposed method then uses the property of the residuals, namely that they follow the normal distribution with mean 0 and variance $\sigma^2$. Figure 7 shows our detection process. It consists of two stages, [I] Initial value setting and [II] Detection, as described below.

[I] Initial value setting:
(1) Input the window width $W$ and the number of data $N$. Namely, the number of packets for learning is $N \times W$ packets. Calculate an initial range for detection by using the average arrival time intervals $x_i^{(0)}$ and the entropy values of source IP addresses $y_i^{(0)}$: calculate the coefficients $a^{(0)}$ and $b^{(0)}$ of the regression line of $(x_1^{(0)}, y_1^{(0)}), \ldots, (x_N^{(0)}, y_N^{(0)})$, and the average $\mu^{(0)}$ and the standard deviation $\sigma^{(0)}$ of the residuals.

[II] Detection (after the $(N+1)$-th $W$ packets in Figure 7):
(2) From the captured $W$ packets, calculate the average arrival time interval of packets $x^{(k)}$ and the entropy value of source IP addresses $y^{(k)}$.
(3) Using $x^{(k)}$ and $y^{(k)}$, calculate the residual $e^{(k)}$ from the regression line.
(4) If the residual $e^{(k)}$ deviates from the range between $\mu^{(k)} - 3\sigma^{(k)}$ and $\mu^{(k)} + 3\sigma^{(k)}$, the algorithm judges an attack, and the range is not updated. If $e^{(k)}$ does not deviate, the algorithm judges non-attack and updates the average $\mu^{(k)}$ and the standard deviation $\sigma^{(k)}$ of the residuals by using the $x^{(k)}$ and $y^{(k)}$ calculated in (2). Then return to (2).

Figures 8 and 9 show flowcharts of the algorithm for [I] Initial value setting and [II] Detection, respectively. This method can detect DoS attacks from only two kinds of header information, source IP address and packet arrival time. The range between $\mu^{(k)} - 3\sigma^{(k)}$ and $\mu^{(k)} + 3\sigma^{(k)}$ is updated every time the method judges non-attack.
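The two stages above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `fit_line` and `ResidualDetector` are hypothetical, the population standard deviation (`statistics.pstdev`) is an assumption because the paper does not say which estimator is used, and the update rule (replace the oldest learned point with the new one and refit) is one reading of the Detection flowchart in Figure 9.

```python
from collections import deque
from statistics import fmean, pstdev


def fit_line(xs, ys):
    """Least-squares slope a and intercept b, following equations (2) and (3)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values must not all be equal")
    return (n * sxy - sx * sy) / denom, (sxx * sy - sxy * sx) / denom


class ResidualDetector:
    """Judges a window an attack when its residual leaves mu +/- 3*sigma."""

    def __init__(self, xs, ys):
        # Stage [I]: learn from the first N windows, assumed to be non-attack.
        # maxlen makes append() drop the oldest point automatically.
        self.xs = deque(xs, maxlen=len(xs))
        self.ys = deque(ys, maxlen=len(ys))
        self._refit()

    def _refit(self):
        self.a, self.b = fit_line(self.xs, self.ys)
        residuals = [y - (self.a * x + self.b)
                     for x, y in zip(self.xs, self.ys)]
        self.mu = fmean(residuals)
        self.sigma = pstdev(residuals)   # population std: an assumption

    def observe(self, x, y):
        """Stage [II]: return True if the window (x, y) is judged an attack."""
        e = y - (self.a * x + self.b)    # residual from the current line
        if e < self.mu - 3 * self.sigma or e > self.mu + 3 * self.sigma:
            return True                  # attack: the range is not updated
        # Non-attack: the new point replaces the oldest one, then refit.
        self.xs.append(x)
        self.ys.append(y)
        self._refit()
        return False
```

In use, the detector would be constructed from the first N points $(x_i^{(0)}, y_i^{(0)})$ and then `observe` would be called once per subsequent window of W packets. Skipping the update on an attack keeps attack traffic from widening the accepted range.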

Figure 7. Detection process

Figure 8. Flowchart of the proposed method [I] Initial value setting
(Flowchart steps: input the window width $W$ and the number of data for learning $N$; for $i = 1, \ldots, N$, read $W$ packets and calculate the average arrival time interval $x_i^{(0)}$ and the entropy of source IP addresses $y_i^{(0)}$; calculate the regression line $Y^{(0)} = a^{(0)} X^{(0)} + b^{(0)}$ from $X^{(0)} = \{x_1^{(0)}, \ldots, x_N^{(0)}\}$ and $Y^{(0)} = \{y_1^{(0)}, \ldots, y_N^{(0)}\}$; calculate each residual $e_j^{(0)} = y_j^{(0)} - (a^{(0)} x_j^{(0)} + b^{(0)})$ for $j = 1, \ldots, N$; calculate the average $\mu^{(0)}$ and the standard deviation $\sigma^{(0)}$ of $E^{(0)} = \{e_1^{(0)}, \ldots, e_N^{(0)}\}$; proceed to Detection.)

Figure 9. Flowchart of the proposed method [II] Detection
(Flowchart steps: while packets remain, read $W$ packets and calculate the average arrival time interval $x^{(k)}$, the entropy of source IP addresses $y^{(k)}$ and the residual $e^{(k)} = y^{(k)} - (a^{(k-1)} x^{(k)} + b^{(k-1)})$. If $e^{(k)} < \mu^{(k-1)} - 3\sigma^{(k-1)}$ or $\mu^{(k-1)} + 3\sigma^{(k-1)} < e^{(k)}$, a DoS attack is detected and $a^{(k)}$, $b^{(k)}$, $c^{(k)}$, $\sigma^{(k)}$, $\mu^{(k)}$, $X^{(k)}$, $Y^{(k)}$ and $E^{(k)}$ are carried over unchanged from step $k - 1$. Otherwise, $x^{(k)}$ and $y^{(k)}$ replace the oldest values in $X^{(k-1)}$ and $Y^{(k-1)}$, and the regression line $Y^{(k)} = a^{(k)} X^{(k)} + b^{(k)}$, the residuals $E^{(k)} = Y^{(k)} - (a^{(k)} X^{(k)} + b^{(k)})$, the average $\mu^{(k)}$ and the standard deviation $\sigma^{(k)}$ are recalculated.)

5 Verification

5.1 Detection rate against previous two datasets

As described above, the regression line and the range are obtained from $N$ points. We calculate False Positive and False Negative for the two datasets above, DARPA2000 and CAIDA2007, to verify how the number of points $N$ influences the detection rate.

First, we show the result for DARPA2000. Figure 10 shows False Negative and False Positive when $N$ is 50, 100, 200, 300 and 400. False Negative is 0% in all cases. False Positive reaches its minimum of 0.57% when $N$ is 400. As Figure 10 shows, the proposed method detects DDoS attacks more accurately as the parameter $N$ increases.

Figure 10. False Positive and False Negative (DARPA)

Next, we show the result for CAIDA2007. Figure 11 shows False Negative and False Positive when $N$ is 50, 100, 200, 300, 400, 500 and 600. False Negative is 0% in all cases. False Positive reaches its minimum of 1.67% when $N$ is 400. As Figure 11 shows, False Positive does not always decrease even if $N$ increases. As a result, we can detect DoS attacks with False Negative 0% and False Positive less than 2%.

Figure 11. False Positive and False Negative (CAIDA2007)

5.2 Verification using a latest dataset

Next, we examine data in CAIDA2013. The CAIDA2013 datasets record packet data captured at data centers in Chicago and San Jose, and do not contain DoS attack data. We verify that the latest dataset has the same property, namely that the relation between the average arrival time interval and the entropy of source IP addresses is linear in non-attack time. Among the CAIDA2013 datasets, we use the latest one, which was recorded in Chicago on 19th December, 2013. This dataset contains 1 hour of pcap data. We analyze it every one minute. Figure 12 shows the number of packets per second in the CAIDA2013 dataset. Figure 13 shows the relation between the average arrival time interval and the entropy of source IP addresses in the CAIDA2013 dataset. As this figure shows, the points for the two parameters are distributed linearly. From these results, we can say that the relation between the two parameters is also linear in the latest data.

Figure 12. The number of packets per second in CAIDA2013

Figure 13. Correlation in CAIDA2013 (parts)

Next, we calculate False Positive for CAIDA2013 every one minute. Figure 14 shows that False Positive is about 0.6% if $N$ is larger than 100. As Figures 10, 11 and 14 show, False Positive is reduced to less than 2% when $N = 400$.

Figure 14. False Positive (CAIDA2013)

6 Conclusion

In this paper, we proposed an entropy-based detection method for DoS attacks that uses two kinds of header information, which is fewer than conventional methods. First, we analyzed two datasets and extracted common features. From the result, we showed that the proposed method can distinguish attack time from non-attack time with False Negative 0% and False Positive less than 2%. The proposed method is simple because only two kinds of header information are required. Because it is an entropy-based method, it has higher network adaptability than signature-based methods, and the verification results show its accuracy. In addition, conventional methods need several tens of thousands of packets for accurate detection, whereas the proposed method detects attacks accurately with a window width of only 1000 packets. We did not show results for a window width of 5000 packets in this paper, but we have confirmed that the detection accuracy was at the same level. The method therefore has higher immediacy than conventional methods. In future work, we will investigate whether the same properties are seen in actual network traffic: we will capture actual packet data and verify the effectiveness of the proposed method.

References

[1] Shunsuke Oshima, Takao Nakajima and Toshinori Sueyoshi: Fast Anomaly Detection Method Using Entropy-based Mahalanobis Distance, IPSJ, 52(2), 656-668, 2011-02-15, ISSN:03875806.
[2] MIYAZAWA Ryota, ABE Koki: Improving Resistance to DoS using Attack History in Signature-based Intrusion Detecting System, CSEC, 2008(71), 143-148, 2008-07-17.
[3] Chin-Ling Chen: A New Detection Method for Distributed Denial-of-Service Attack Traffic based on Statistical Test, Journal of Universal Computer Science, vol.15, no.2, (2009).
[4] Shunsuke Oshima, Takao Nakajima and Toshinori Sueyoshi: Anomaly Detection using Chi-Square Values based on the Typical Features and the Time Deviation, 2011 International Conference on Advanced Information Networking and Applications.
[5] Kuai Xu, Zhi-Li Zhang: Internet Traffic Behavior Profiling for Network Security Monitoring, IEEE/ACM TRANSACTIONS ON NETWORKING, vol.16, no.6, DECEMBER 2008.
[6] TAKEMORI Keisuke, MIYAKE Yutaka, TANAKA Toshiaki, SASASE Iwao: An Anomaly Detection Technique for IDS Events using Deviations of Information Entropy, CSEC, 2004(54), 31-36, 2004-05-21, ISSN:09196072.
[7] www.ll.mit.edu/mission/communications/cyber/cstcorpora/ideval/data/
[8] www.caida.org/data/overview
[9] Hitoshi Kume and Yoshinori Iizuka (1987): Kaiki Bunseki (Regression analysis), Iwanami syoten, Tokyo, ISBN 4000077627.
[10] Monowar H. Bhuyan, H.J. Kashyap, D.K. Bhattacharyya and J.K. Kalita: Detecting Distributed Denial of Service Attacks: Methods, Tools and Future Directions, The Computer Journal Advance Access 2013-3-28.
[11] Keiichirou KURIHARA and Kazuki KATAGISHI: A Simple Detection Method for DoS Attacks based on IP Packets Entropy values, AsiaJCIS.2014.20, 44-51, 2014-09-04.