A Review of Data Mining Techniques for Detection of DDoS Attack

Apurva Tiwari 1, Dr. Sanjiv Sharma 2
(Department of CSE & IT)
1 Madhav Institute of Technology & Science, Gwalior (India)

Abstract- Data Mining plays a crucial role in implementing network security against various types of attacks. Detection of Distributed Denial of Service (DDoS) attacks is one of the key steps in defending against DoS/DDoS attacks. A good detection technique should have a short detection time, a low rate of false positives, a low rate of false negatives and a high normal packet survival ratio. This review paper provides a comparative study of various detection techniques with their corresponding advantages and disadvantages and focuses on the existing attack and defense mechanisms, so that a better understanding of DDoS attacks can be achieved and more efficient defense mechanisms and techniques can be devised. The existing literature reveals various significant areas for detecting DDoS attacks and indicates that data mining techniques are efficient, scalable and flexible for this purpose. This paper presents a comprehensive survey of DDoS attacks, detection methods and tools used in networks.

Index Terms- Data Mining techniques, Distributed Denial of Service (DDoS) attack, Network security.

I. INTRODUCTION

Distributed Denial of Service (DDoS) is a simple and very powerful technique for attacking Internet resources as well as system resources. Multiple distributed agents consume critical resources at the target within a short time and deny service to legitimate clients. As a side effect, they frequently create network congestion on the path from source to target, disturbing normal Internet operation and causing users to lose their connections. Recently, this side effect, together with worm viruses, has seriously threatened real networks.
As the damage caused by DDoS attacks [1] increases, much research on detection mechanisms has been performed, but the existing security mechanisms either do not provide effective defense against these attacks or are limited to specific DDoS attacks. The large number of attacking machines and the use of source IP address spoofing make traceback impossible. The use of legitimate packets for the attack and the variation of packet fields defeat filtering of the attack streams. The distributed nature of the attacks calls for a distributed response, but cooperation between administrative domains is very hard to achieve, and security and authentication of participants incur a high cost.

A DoS attack disables the victim's resources from a single attacker. DDoS is a network flooding attack launched from multiple machines simultaneously. The overall network bandwidth is consumed by filling the network with a large number of packets. A DDoS attack is launched by the attacker from a hidden site. The attack involves four major participants: the attacker, one or more handler nodes, multiple agents, and a target victim. The attacker first selects one or more handlers, which in turn select some innocent hosts to serve as agents that launch the attack on the victim machine.

With the rapid development of network technologies, security has become one of the most important issues. No fundamental defense solutions against Distributed Denial of Service (DDoS) attacks have yet been developed. DDoS attacks cause a victim to deny normal services on the Internet by flooding it with a large volume of malicious traffic. Attackers do not exploit the security holes of a network-connected system but launch attacks against its availability. Well-known web sites such as Yahoo, eBay, and Amazon.com were damaged by DDoS attacks in 2000, although they were well equipped in terms of security. Such web sites were attacked because they are connected through the internet.
Thus, DDoS attacks have become a major threat to the stability of the internet. In a DDoS attack, an attacker compromises a large number of network-connected hosts by exploiting network software vulnerabilities. Attack software is then installed on these host systems through secure channels. A large number of the compromised hosts on which attack software is installed send useless packets toward a victim at the same time. The volume of malicious traffic generated by such hosts is so high that the victim cannot handle it and becomes paralyzed.

The recent rapid development in Data Mining has made available a wide variety of algorithms, drawn from pattern recognition, machine learning and databases. These algorithms made it possible to achieve the aim of this paper. The central theme of this paper is to explore areas where data mining techniques gather audited data to compute patterns that predict actual behaviour and can be used for detecting or tracing various DDoS attacks.

This paper is organized as follows. Section 2 focuses on the basic terminology and architecture of DDoS attacks. Section 3 presents Data Mining in DDoS attack detection. Section 4 covers background and related work in DDoS attack detection using Data Mining. Section 5 compares the available algorithms, mechanisms and methods for detection of DDoS attacks using Data Mining. Section 6 gives the conclusion and future work.

II. BASIC TERMINOLOGY AND ARCHITECTURE OF DDOS ATTACK

DDoS attacks [2] first appeared in June 1998. A DDoS attack disrupts the availability of services or resources on the internet. It is performed to deplete the resources of one or more victims and make the victim unavailable to its legitimate clients. Therefore, it involves dumping packets from
many agents (zombies) towards the victim server. The server is never compromised, the database is never viewed, and the data is never deleted. The backbone of a DDoS attack is the network of zombies, called a decoy network or botnet. A zombie is considered a secondary victim: it is not the target of the DDoS attack but acts as an accomplice. An accomplice is one who assists in a crime without committing the principal act, which is also a punishable offence. The zombies do not initiate the attack but they participate in it, and are therefore accomplices. The ignorance of the zombies' owners not only leaves room for DDoS attacks but also puts their own private, vital and sensitive data at risk of being exploited by the attacker. The agents are compromised hosts that run an attack tool and are responsible for generating a stream of packets towards the intended victim. The users of the agent systems remain unaware of the situation. The main aim of a DDoS attack is to overload the victim and render it incapable of performing normal transactions [3].

To protect network servers, network routers and client hosts from becoming the handlers, zombies and victims of DDoS attacks, Data Mining techniques can be adopted as a weapon against these attacks. The rapid development in Data Mining has made available a wide variety of techniques, drawn from statistics, pattern recognition, machine learning, and databases.

In the architecture of a DDoS attack [4], the attacker sets up a hierarchical attack structure. The attacker chooses one or more handlers that have security vulnerabilities and intrudes into them by gaining access rights, as shown in Fig 1. Agents (zombies) are selected in the same way as handlers; the attacker achieves this indirectly through the handlers.
The agents perform the DDoS attack by simultaneously sending an enormous amount of malicious traffic to the target system. The handlers and agents are located in networks external to both the victim's and the attacker's networks. Once the attacker has selected the handlers and agents, the attacker controls communication among the three systems to carry out the attack.

Figure 1. Architecture of DDoS Attack

III. DATA MINING IN DDOS ATTACK DETECTION

Data Mining [5] is becoming a pervasive technology in activities as diverse as using historical data to predict the success of marketing campaigns, looking for patterns in network traffic to discover illegal activities, or analyzing sequences. Data Mining is also an important part of knowledge discovery in databases (KDD), an iterative process of non-trivial extraction of information from data, and can be applied to developing secure system infrastructure. KDD includes several steps, from the collection of raw data to the creation of new knowledge. Data Mining is used in many domains, such as engineering, finance, biomedicine, and cyber security.

There are two categories of Data Mining methods [15]: supervised and unsupervised. Supervised Data Mining techniques predict a hidden function using training data. The training data consist of pairs of input variables and output classes. The trained method can then predict the class label of new input variables. Examples of supervised mining are classification and prediction. Unsupervised data mining attempts to identify hidden patterns in the given data without using training data. Examples of unsupervised mining are clustering and association rule mining.

Data Mining is the mining of knowledge from a large amount of data. The strong patterns or rules detected by data mining techniques can be used for the non-trivial prediction of new data; that is, information that is implicitly present in the data but was previously unknown is discovered.
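The distinction between supervised and unsupervised mining described above can be illustrated with a minimal sketch. The two traffic features (packets per second and number of distinct source IPs), the sample values, and the function names are hypothetical and chosen only for illustration; the nearest-centroid classifier and the k-means clustering stand in for "classification" and "clustering" in general and are not taken from any of the surveyed papers.

```python
from math import dist
from statistics import mean

# Hypothetical toy features per traffic window:
# (packets per second, number of distinct source IPs).
normal = [(100, 10), (120, 12), (90, 9), (110, 11)]
attack = [(900, 400), (1000, 450), (950, 420), (1100, 500)]


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def train_supervised(labelled):
    """Supervised: learn one centroid per class from (features, label) pairs."""
    by_label = {}
    for features, label in labelled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(model, features):
    """Assign the label whose class centroid is closest to the input."""
    return min(model, key=lambda label: dist(model[label], features))


def kmeans(points, k=2, iterations=10):
    """Unsupervised: group unlabelled points into k clusters (Lloyd's algorithm)."""
    centres = list(points[:k])  # deterministic initialisation, for the sketch only
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centres[i], p))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster happens to be empty.
        centres = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters


# Supervised use: training data carries class labels.
model = train_supervised(
    [(p, "normal") for p in normal] + [(p, "attack") for p in attack]
)
label = predict(model, (980, 430))

# Unsupervised use: the same points, but without any labels.
clusters = kmeans(normal + attack, k=2)
```

In the supervised case the labels come from the training data; in the unsupervised case the two groups emerge from the data alone, and an analyst still has to decide which cluster corresponds to attack traffic.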
Data Mining techniques use statistics, artificial intelligence, and pattern recognition in order to extract behaviours or entities from data. Thus, Data Mining is an interdisciplinary field that employs analysis tools from statistical models, mathematical algorithms, and machine learning methods to discover previously unknown patterns and relationships in large data sets, which are useful for finding hackers and preserving privacy. Proactive security solutions are designed to maintain the overall security of a system even if individual components of the system have been compromised by an attack. Recently, improvements in Data Mining techniques and Information Technology have given Internet and other media users many opportunities to explore new information. The new information may include sensitive information, which has opened a new research domain in which researchers consider Data Mining algorithms from the viewpoint of privacy preservation.

Data Mining approaches can be applied to the detection of DDoS attacks in several application areas.

Intrusion Detection Systems (IDS) aim at detecting attacks against computer systems and networks. An IDS acquires information about an information system in order to diagnose its security status. The goal is to discover security holes and open vulnerabilities that could lead to potential breaches. Intrusion detection techniques can be classified as misuse detection and anomaly detection. Misuse detection systems use patterns of well-known attacks or weak spots of the system to match and identify known intrusions. Anomaly detection systems flag observed activities that deviate significantly from the established normal usage profiles as anomalies, i.e., possible intrusions. The main reason for using Data Mining in intrusion detection systems is the enormous volume of existing and newly appearing network data that requires processing. The literature also provides evidence of Data Mining techniques being used for intrusion detection.
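The idea of anomaly detection, flagging activity that deviates significantly from an established normal profile, can be sketched in a few lines. The sketch below assumes a single hypothetical feature (packets per second), a profile made of the mean and standard deviation of a clean baseline, and a z-score threshold of 3; all of these are illustrative choices rather than a method prescribed by the surveyed work.

```python
from statistics import mean, stdev


def build_profile(baseline_rates):
    """Summarise normal usage as (mean, sample standard deviation)."""
    return mean(baseline_rates), stdev(baseline_rates)


def is_anomalous(rate, profile, z_threshold=3.0):
    """Flag a rate whose distance from the mean exceeds z_threshold deviations."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: any change from the constant baseline is a deviation.
        return rate != mu
    return abs(rate - mu) / sigma > z_threshold


# Hypothetical packets-per-second samples observed during normal operation.
baseline = [100, 110, 95, 105, 98, 102, 107, 99]
profile = build_profile(baseline)

# New observations; the third one is a sudden flood.
observed = [101, 97, 850, 104]
flags = [is_anomalous(rate, profile) for rate in observed]
```

A misuse detection system would instead compare traffic against stored signatures of known attacks; the profile-based check above can flag previously unseen floods, at the cost of false alarms whenever legitimate traffic changes abruptly.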
IP Traceback is the ability to trace IP packets from source to destination. This is a significant step towards identifying and stopping attackers, and IP Traceback is a vital procedure in defending against DDoS attacks. Many techniques are used to trace DDoS attacks. An approach suggested by [6] and [7], called Logging, is to log packets at key routers and then use Data Mining techniques to determine the path that the packets traversed. This scheme has the useful property that it can trace an attack long after the attack has completed. It also has distinct drawbacks, including potentially excessive resource requirements and a large-scale inter-provider database integration problem. Data Mining techniques provide a very efficient way of discovering useful knowledge from the available information. [8] proposed a system that uses packet marking mechanisms along with Intrusion Prevention Systems for efficient IP Traceback.

IV. BACKGROUND AND RELATED WORK

Keunsoo Lee et al. [4] proposed a DDoS attack detection method using cluster analysis. DDoS attacks generate enormous numbers of packets from a large number of agents and can easily exhaust the computing and communication resources of a victim within a short period of time. This paper proposes a method for proactive detection of DDoS attacks by exploiting the attack architecture, which consists of the selection of handlers and agents, communication and compromise, and attack. The authors focus on the procedures of a DDoS attack and select variables based on these features. Cluster analysis is then applied for proactive detection of the attack. The outcomes show that each phase of the attack scenario is partitioned well and that precursors of a DDoS attack, as well as the attack itself, can be detected.

Kanwal Garg et al. [9] describe DDoS attacks as large-scale cooperative attacks launched from a large number of compromised hosts called zombies, and as a major threat to Internet services.
Popular web sites such as Amazon, Yahoo, and CNN are among the prominent victims of DDoS attacks. A large number of companies transacting online face considerable losses because they are targeted by DDoS attacks. Keeping this problem in view, the authors present various significant areas where data mining techniques are strong candidates for detecting and preventing DDoS attacks.

Mihui Kim et al. [1] proposed a combined Data Mining approach for DDoS attack detection. Because DDoS attacks cause serious damage, fast detection and appropriate response mechanisms are critical. Existing security mechanisms do not provide effective defense against these attacks. It is necessary to analyze the fundamental features of DDoS attacks because these attacks can easily vary the protocol used or the method of operation. This paper proposes a combined Data Mining approach for modeling the traffic patterns of normal traffic and of distinct attacks. The approach uses an automated feature selection mechanism to select the relevant attributes. The classifier is built from the selected attributes using a neural network. The experimental results indicate that this approach provides the best performance on a real network, compared with heuristic feature selection and with any single Data Mining approach. The paper experiments with the 2000 DARPA Intrusion Detection Scenario Specific Data Set. The DDoS tools are described in Table 1.

| Features of DDoS | Trinoo | Synk4 | TFN2K | Stacheldraht |
|---|---|---|---|---|
| Attack Type | UDP flood | SYN flood | UDP/SYN/ICMP flood, smurf | UDP/SYN/ICMP flood, smurf |
| Source IP | No spoofing | Spoofing | Capable of controlling the spoofing level | Automated spoofing |
| Source Port | Cannot be specified | Automatic selection | Automatic selection (at random or sequentially) | Automatic selection (at random or sequentially) |
| Target Port | Cannot be specified | Specify the range | Can be specified | Specify the range |
| Etc. | | | Unidirectional control; encrypted communication | Automated agent update; encrypted communication |

Table 1. Features of DDoS Tools

Jignesh Vania et al. [10] proposed an Association Rule based Data Mining approach to HTTP botnet detection. Botnets are among the most dangerous and widespread threats in today's cyber world. A botnet is a group of compromised computers connected via the internet, mostly vulnerable hosts, that are accessed remotely and controlled by a botmaster to deliver various network threats and malicious activities, including spamming, ID theft, phishing and spoofing. Among the challenging characteristics of a botnet, the Command and Control centre is the most basic one, through which the botnet can be updated and commanded. Recently, malicious botnets have evolved from common IRC botnets into HTTP botnets. Data Mining techniques make it possible to automate the detection of characteristics from vast amounts of data, which traditional heuristic and signature-based methods cannot do.

Rui Zhong et al. [11] proposed a DDoS detection system based on Data Mining. DDoS poses a very serious threat to the stability of the Internet. This paper considers the nature of DDoS attacks and recent DDoS attack detection methods, and presents a DDoS attack detection model based on Data
Mining algorithms. The FCM (fuzzy c-means) clustering algorithm and the Apriori association algorithm are used to extract a network traffic model and a network packet protocol status model. A threshold is set for the detection model. The experimental results show that DDoS attacks can be detected efficiently and swiftly.

Christos Douligeris et al. [12] proposed a classification of DDoS attacks and defense mechanisms. With little or no advance warning, a DDoS attack can easily exhaust the computing and communication resources of its victim within a short period of time. This paper presents the problem of DDoS attacks and develops a classification of DDoS defense systems. The relevant features of each attack and defense system category are described, and the advantages and disadvantages of each proposed scheme are outlined. The paper focuses on the existing attack and defense systems, so that a better understanding of DDoS attacks can be achieved and more efficient defense mechanisms can be devised. The DDoS attack classification is described in Fig 4. Different kinds of detection techniques are summarized in Fig 3.

Fig 4. DDoS Attack Classification

Shaveta Gupta et al. [13] presented a comprehensive review of detection techniques against DDoS attacks. DDoS attacks have become a problem for users of computer systems connected to the Internet, so shielding the internet from these attacks has become a necessity. There are three solutions against a DDoS attack: prevention, detection and reaction. Attack detection is one of the key steps in defending against DDoS attacks. There are some challenges that have to be faced when adopting any of the detection techniques. If attacks can be detected close to their sources, attack traffic can be filtered before it wastes any network bandwidth. An acceptable detection technique should have minimum detection time, a low false positive rate, a low false negative rate and a high normal packet survival ratio. This paper presents a comparative analysis of various detection techniques with their corresponding advantages and disadvantages. Fig 2 shows a taxonomy of defense mechanisms against DDoS attacks.

Fig 2. A Taxonomy of Defense Mechanism Against DDoS Attack

Fig 3. DDoS Attack Detection Techniques
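The quality criteria named above for a detection technique (false positive rate, false negative rate and normal packet survival ratio) can be made concrete with a short sketch. The definitions used here are assumptions for illustration: rates are computed per packet from a confusion matrix, and the normal packet survival ratio is taken to be the fraction of legitimate packets that still reach the victim during the attack. The class name, function names and sample counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Per-packet outcome counts of a detector (hypothetical structure)."""
    true_positives: int   # attack packets flagged as attack
    false_positives: int  # normal packets wrongly flagged as attack
    true_negatives: int   # normal packets passed as normal
    false_negatives: int  # attack packets wrongly passed as normal


def false_positive_rate(c):
    """Share of normal packets that were wrongly flagged."""
    return c.false_positives / (c.false_positives + c.true_negatives)


def false_negative_rate(c):
    """Share of attack packets that slipped through undetected."""
    return c.false_negatives / (c.false_negatives + c.true_positives)


def normal_packet_survival_ratio(normal_delivered, normal_sent):
    """Assumed definition: legitimate packets delivered / legitimate packets sent."""
    return normal_delivered / normal_sent


# Made-up example: 1000 attack packets and 1000 normal packets in a test run.
counts = DetectionCounts(
    true_positives=950, false_positives=20, true_negatives=980, false_negatives=50
)
fpr = false_positive_rate(counts)
fnr = false_negative_rate(counts)
# Here every normal packet that was not flagged is assumed to be delivered.
npsr = normal_packet_survival_ratio(counts.true_negatives, 1000)
```

If normal packets are lost only because the detector flags them, the survival ratio is simply one minus the false positive rate; in a congested network, additional legitimate packets may be dropped, which is why the sketch takes delivered and sent counts as separate inputs.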
V. COMPARATIVE STUDY OF EXISTING RESEARCHES

| S. No. | Name of algorithm / mechanism for detection of DDoS attack | Details | Data Mining techniques used | Deployment | Objective | References |
|---|---|---|---|---|---|---|
| 1 | NetShield protocol anomaly detection system using Alarm Matrix | Protects network servers, routers and clients from DDoS attacks using a protocol anomaly detection technique. | Classification | Victim side | Attack prevention | Hwang et al. [14] |
| 2 | Traffic threshold model and packet protocol status model | Uses fuzzy c-means clustering and Apriori techniques to build a model and detect unknown DDoS attacks. | Fuzzy c-means cluster algorithm and Apriori association algorithm | Victim side | Attack detection | Zhong and Yue [11] |
| 3 | Agent handler architecture | Detects DDoS attacks proactively based on cluster analysis with the agent handler architecture. | Cluster analysis technique | Source side | Attack detection | Lee et al. [4] |

Table 2. Comparison among different methods for DDoS attack detection using Data Mining techniques

VI. CONCLUSION AND FUTURE WORK

A DDoS attack is an attempt to make a machine or network resource unavailable to legitimate users. As a result of a DDoS attack, network consumption leads to cost, delay and interruption in communication between legitimate network users. Data Mining techniques provide a very efficient way of discovering useful knowledge from available information. This paper surveys Data Mining techniques in DDoS attack detection and focuses on various research efforts in the form of methods, algorithms and protocols. Furthermore, it explores the overall possibilities for finding DDoS attacks using Data Mining techniques. This survey provides opportunities for developing an advanced detection algorithm. Future work can improve on the detection rates of existing work and can evaluate algorithms against different types of DDoS attacks and data sets.

REFERENCES

[1] Mihui Kim, Hyunjung Na, Kijoon Chae, Hyochan Bang, and Jungchan Na (2004). A Combined Data Mining Approach for DDoS Attack Detection.
ICOIN 2004, LNCS 3090, Springer-Verlag Berlin Heidelberg. pp. 943-950.
[2] Lin, S. C., & Tseng, S. S. (2004). Constructing detection knowledge for DDoS intrusion tolerance. Expert Systems with Applications, 27. pp. 379-390.
[3] Yoohwan Kim, Wing Cheong Lau, Mooi Choo Chuah and Jonathan H. Chao (2004). PacketScore: Statistical-Based Overload Control Against Distributed Denial-of-Service Attacks. IEEE INFOCOM, The 23rd Annual Joint Conference of the IEEE Computer and Communications Societies, Hong Kong, China.
[4] Keunsoo Lee, Juhyun Kim, Ki Hoon Kwon, Younggoo Han, Sehun Kim (2008). DDoS attack detection method using Cluster analysis. Expert Systems with Applications 34. pp. 1659-1665.
[5] P. Sundari, Dr. K. Thangadurai (2010). An Empirical Study on Data Mining Applications. Global Journal of Computer Science and Technology, Vol. 10 Issue 5 Ver. 1.0. pp. 23-27.
[6] G. Sager (1998). Security Fun with Ocxmon and Cflowd. Presented at the Internet 2 Working Group.
[7] R. Stone (2000). CenterTrack: An IP overlay network for tracking DoS floods. In Proc. USENIX Security Symp. pp. 199-212.
[8] K. C. Nalavade and B. B. Meshram (2010). Identifying the Attack Source by IP Traceback. Springer, ICT 2010, CCIS 101. pp. 292-296.
[9] Kanwal Garg, Rshma Chawla (2011). Detection of DDoS attacks using Data Mining. International Journal of Computing and Business Research (IJCBR). ISSN 2229-6166.
[10] Jignesh Vania, Arvind Meniya and Harikrishna Jethva (2013). Association Rule Based Data Mining Approach to HTTP Botnet Detection. IJAIEM, Volume 2, Issue 4. ISSN 2319-4847.
[11] Rui Zhong and Guangxue Yue (2010). DDoS Detection System Based on Data Mining. Proceedings of the Second International Symposium on Networking and Network Security (ISNNS 10), Jinggangshan, P. R. China. ISBN 978-952-5726-09-1 (Print). pp. 062-065.
[12] Christos Douligeris and Aikaterini Mitrokotsa (2004). DDoS attacks and defense mechanisms: A Classification and state-of-the-art. Computer Networks 44. pp. 643-666.
[13] Shaveta Gupta, Dinesh Grover and Abhinav Bhandari (2014). Detection Techniques against DDoS Attacks: A Comprehensive Review. International Journal of Computer Applications, Volume 96 No. 5. ISSN 0975-8887.
[14] Kai Hwang, Pinalkumar Dave and Sapon Tanachaiwat (2003). NetShield: Protocol Anomaly Detection with Data Mining Against DDoS Attacks. The Sixth International Symposium on Recent Advances in Intrusion Detection, Pittsburgh.
[15] Sumit Dua and Xian Du (2011). Data Mining and Machine Learning in Cyber Security. Auerbach Publications. International Standard Book Number-13: 978-1-4398-3943-0.