Defending Against Distributed Denial of Service Attacks


Defending Against Distributed Denial of Service Attacks

By Tao Peng

A thesis submitted to the University of Melbourne in total fulfilment for the degree of Doctor of Philosophy

Department of Electrical and Electronic Engineering
April 2004

© Tao Peng. Produced in LaTeX 2ε.

Abstract

The Denial of Service attack, especially the Distributed Denial of Service (DDoS) attack, has become one of the major threats to the Internet. Generally, attackers launch DDoS attacks by directing a massive number of attack sources to send useless traffic to the victim. The victim's services are disrupted when its host or network resources are occupied by the attack traffic. The threat of DDoS attacks has become even more severe as attackers can compromise a huge number of computers by spreading a computer worm using vulnerabilities in popular operating systems.

This thesis investigates DoS attacks (including DDoS attacks), and is divided into three parts. In the first part, we categorize existing defense mechanisms, and analyze their strengths and weaknesses. In particular, we design a countermeasure for each defense mechanism from the attacker's point of view.

In the second part of our investigation, we develop and evaluate three defense models for DoS attacks: the Victim Model, the Victim-Router Model, and the Router-Router Model. Each of these models provides defense in a different part of the network, and has different resource requirements.

The Victim Model provides defense at the target or victim of an attack. We develop a novel technique for identifying attack traffic based on the connection history at the victim. We then present a history-based IP filtering algorithm to filter attack traffic in an accurate and efficient manner. A key advantage of this technique is that it can filter attack traffic while allowing the majority of normal traffic to reach the service under attack.

The Victim-Router Model uses cooperation between the victim and its upstream routers to locate attack sources and filter attack traffic close to its source. We propose a novel method for tracing the attack path called adjusted probabilistic packet marking. In contrast to previous packet marking schemes, we can minimize the number of packets needed to trace the attack path. We also present a selective pushback scheme that uses the path information provided by packet marking in order to filter attack traffic close to its source.

The Router-Router Model is a distributed defense architecture that can detect attack traffic close to its source. This model is based on a cooperative scheme in which routers can efficiently share evidence of attacks. In order to minimize the communication overhead of this approach, we apply a machine learning technique to decide when to share evidence between routers. A major advantage of this scheme is that it can detect and filter highly distributed attacks before the malicious traffic can congest the network. In each case, we use both analytical results and simulations based on real-life packet traces in order to demonstrate the effectiveness of our models.

In the third part of our investigation, we assess the effectiveness of our defense models for different types of DoS attack. We categorize existing DoS attacks, and evaluate the advantages and disadvantages of our defense models in comparison to existing defense techniques. Finally, we demonstrate how our three defense models complement each other, and can be integrated into a robust solution for DoS attacks.

Declaration

This is to certify that (i) the thesis comprises only my original work towards the PhD, (ii) due acknowledgement has been made in the text to all other material used, and (iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

Tao Peng


Acknowledgements

I would like to thank my supervisors, Dr. Chris Leckie and Professor Rao Kotagiri, for their many suggestions, generous help and constant support during my Ph.D. research. I also want to thank Chen Zhenzhong for introducing me to the University of Melbourne, and Dr. Steven Low for encouraging me to pursue my study at the University of Melbourne. I am grateful for the support I have received from the University of Melbourne through both MIRS and MIFRS scholarships. I also thank the ARC Special Research Centre for Ultra-Broadband Information Networks (CUBIN) for sponsoring my conference trips.

I am extremely grateful for the strong and selfless support from my parents and my dear sister Peng Xiaowen. Without the sacrifices made by my parents, I could never have had the opportunity to undertake this study. It is my sister who devoted her encouragement, trust, and love to me during the most difficult time of my Ph.D. study, which I will never forget.

I also want to thank the Waikato Applied Network Dynamics Research Group for making their traces publicly available, and Dr. Hai Vu for his comments on this thesis. Finally, I wish to thank Laurence, Malcolm, Alistair, Rami, Marija, Nicolas, Jolyon, Boon, John, Jun, Andrew, Bartek and Brian for their help and for making CUBIN such a fun place.


Contents

Abstract
Acknowledgements
List of Figures
List of Tables

1 Introduction
   Background and Problem Statement
   The Internet
   Denial of Service (DoS) Attacks
   Technical Problems
   Research Objective
   Basic Concepts
   Source, Router and Victim
   Definition of Attacks
   Flow, Traffic Aggregate and Internet Protocols
   Firewall and IDS
   False Positive Rate and Detection Accuracy
   Defense Mechanism
   Victim Model (VM)
   Victim-Router Model (VRM)
   Router-Router Model (RRM)
   Overview of the Thesis
   Contributions of the Thesis
   List of Publications

2 A Survey of DoS Attacks
   Introduction
   Bandwidth Attacks
   Impacts of the Bandwidth Attack
   Inherent Vulnerabilities of the Internet Architecture
   Typical Bandwidth Attacks
   Existing DoS Attack Defense Proposals
   Attack Prevention
   Attack Detection
   Attack Source Identification
   Attack Reaction
   Summary

3 History-based Attack Detection and Reaction
   Introduction
   Background
   Problem Definition
   Detecting Attacks in the Vicinity of the Attack Source
   Detecting Attacks Quickly in the Victim's Network
   Reacting to the Detected DDoS Attacks
   Motivation for History-based Attack Detection and Reaction
   Stopping DoS Attacks Using Traffic Control Algorithms
   Intrinsic Attack Feature
   Attack Detection and Reaction
   Our Solution: History-based Attack Detection and Reaction
   Overview of the HADR Scheme
   New Address Detection Engine
   Flow Rate Detection Engine
   Decision Engine
   Filtering Engine: History-based IP Filtering
   Placement of the HADR
   HADR Design
   The Choice of Detection Feature: New IP Addresses
   IP Address Database Design
   Abrupt Change Detection
   Hash Techniques
   An Example of IP Address Database Design
   Normal Traffic Behavior
   Consistency of the IP Addresses
   How to Build an Efficient IP Address Database
   Performance Evaluation
   DDoS Detection Using the New Address Detection Engine
   DDoS Detection Using the Flow Rate Detection Engine
   Performance of the History-based IP Filtering
   Complexity of Using History-based IP Filtering in Routers
   Attacks Against the HADR
   Infiltrating Attacks
   Countermeasures Against Sophisticated Infiltrating Attacks
   DDoS Attacks from Infiltrated Sources
   Discussion
   Conclusion

4 Adjusted Probabilistic Packet Marking
   Introduction
   Motivation
   Background on Probabilistic Packet Marking (PPM)
   Passive Victim-Router Model: APPM
   Number of Hops Traversed by Packet (d1)
   Number of Hops Traversed Since the Packet was Last Marked (d2)
   Number of Hops from Current Router to Destination (d3)
   Summary
   Evaluation for APPM
   Methodology
   Results
   Discussion
   APPM Against DDoS Attacks
   APPM and Marking Field Spoofing
   Conclusion

5 Selective Pushback
   Introduction
   Background
   Definitions
   Limitation of Router-based Pushback
   Selective Pushback
   Overview of Selective Pushback
   An Example of Selective Pushback
   Evaluation for Selective Pushback
   Simulation Methodology
   Results
   Discussion
   Implementation Overhead for Selective Pushback
   Other Related Issues
   Conclusion

6 Distributed Detection by Sharing Combined Beliefs
   Introduction
   Problem Definition
   Distributed Denial of Service Attacks
   Reflector Attacks
   Motivation
   Motivation for Distributed DDoS Attack Detection
   Motivation for Distributed Reflector Attack Detection
   Methodology
   Detecting Abnormal Network Behavior
   Combining Beliefs for Attack Detection
   Learning When to Broadcast the Warning Message
   Evaluation
   Evaluation for DDoS Attack Detection
   Evaluation for Reflector Attack Detection
   Discussion
   Conclusion

7 Analysis of DoS Defense Schemes
   Introduction
   Challenges for DoS Defense Schemes
   The Tragedy of the Commons
   The Power of Many Versus the Strength of Few
   Implementation Cost
   DoS Attack Category
   Victim Type
   The Parameters of Attack Power
   Average Flow Rate and Number of Flows
   Attack Traffic Rate Dynamics
   Impact of Attack
   Comparison Between Our Defense Models
   How to Use Our Defense Models
   DoS Attacks Versus Our Defense Models
   Integrate the VM with the VRM
   Other Related Issues
   Computer Crime Laws
   IP Version
   Conclusion

8 Conclusion

Appendix A: Abbreviations and Glossary of Terms
References
Index

List of Figures

The number of Internet security incidents reported from 1988 to 2003 (the data is collected from CERT [1])
A simple model of the Internet
The relation of different types of attacks
The number of vulnerabilities reported each year according to CERT [1]
TCP 3-Way Handshake
The UDP flooding is initiated by a single packet
A smurf attack, using an intermediary network to amplify a Ping flood
Structure of a typical DDoS attack (based on [2])
Structure of a distributed reflector denial of service (DRDoS) attack (based on [2])
Defending against IP source address spoofing using ingress filtering
Router-based packet filtering
An example of SAVE message update
A model of DoS attack reaction schemes
Intermediate network reaction: controller-agent scheme
Basic SOS architecture
The architecture of History-based Attack Detection and Reaction
The hash table for the detection engine
The placement of the HADR scheme
Effect of choice of detection feature on detecting the occurrence of an attack
The sampling intervals for History-based Attack Detection and Reaction
New Address Detection Engine algorithm
Illustration of the CUSUM algorithm
CUSUM algorithm
For each IP packet, the Bloom filter computes k independent N-bit digests of the 32-bit source IP address, and sets the corresponding bits in the 2^N-bit table
Number of IP addresses that appeared in at least d days
Distribution of IP addresses that generated at least u packets
The trace-driven simulation experiment
Auck-IV-in Trace: the ratio of new IP addresses calculated in the time intervals of 10 seconds for each packet trace
Auck-IV-out Trace: the ratio of new IP addresses calculated in the time intervals of 10 seconds for each packet trace
Bell-I Trace: the ratio of new IP addresses calculated in the time intervals of 10 seconds for each packet trace
Auck-IV-in Trace: CUSUM test statistics under normal operation
Auck-IV-out Trace: CUSUM test statistics under normal operation
Bell-I Trace: CUSUM test statistics under normal operation
DARPA Dataset: DDoS attack scenario
The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 10 new IP addresses
The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 4 new IP addresses
The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 2 new IP addresses
The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 200 new IP addresses
The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 40 new IP addresses
The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 18 new IP addresses
The flow rate distribution of the Auck-IV-in Trace
Detection thresholds for the flow with source IP address
DDoS attacks on a two-dimensional attack detection space
The filtering accuracy of History-based IP Filtering
Accuracy of the combined rule F_c = p_1(d) p_2(u) on the Auckland Trace
Memory requirements for the IP Address Database on the Auckland trace
The relation between the IP Address Database and the source IP addresses of attack traffic
Probabilistic Packet Marking
Definition of different distance measures
Adjusted Probabilistic Packet Marking Scheme One
Adjusted Probabilistic Packet Marking Scheme Two
Adjusted Probabilistic Packet Marking Scheme Three
APPM Schemes 1, 2, and 3 compared with uniform marking probability p = 0.01, 0.04, and
Effect of spoofing the marking field (fake sub-path: v_f1 to v_f3, true path: v_1 to v_3)
Router map showing attack traffic in bold
An example of the Selective Pushback scheme
Simulation topology
The traffic distribution of router R
Detecting a single attack source between 2AM and 3AM
Detecting a single attack source between 11AM and 12PM
Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.0 between 2AM and 3AM
Detecting a DDoS attack with one of 6 distributed attack sources at router R3.0 between 11AM and 12PM
Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.6 between 2AM and 3AM
Detecting a DDoS attack with one of 6 distributed attack sources at router R3.6 between 11AM and 12PM
Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.3 between 2AM and 3AM
Detecting a DDoS attack with one of 6 distributed attack sources at router R3.3 between 11AM and 12PM
A simple topology to show the advantage of Selective Pushback over Router-based Pushback
The challenge of first-mile HADR detection
Overview of detecting reflector attacks
Combining beliefs for DDoS attack detection
The algorithm for learning when to broadcast the warning message
Performance of decision functions of the Router-Router Model for DDoS attack detection
Convergence of the broadcast threshold optimization in the Router-Router Model for DDoS attack detection
The CUSUM statistics for L in distributed reflector attack detection
Performance of decision functions of the Router-Router Model for reflector attack detection
Convergence of the broadcast threshold optimization in the Router-Router Model for reflector attack detection
The accuracy of distributed detection of reflector attacks
Categorization of DoS attacks according to victim type
Categorization of DDoS attacks according to the parameters of attack power
The architecture for combining the Victim Model and the Victim-Router Model

List of Tables

Comparison between bandwidth attacks and flash crowds
Basic assumptions for different attack detection techniques
Percentage of IP addresses in a single day that have previously appeared in the past fortnight
Summary of the packet traces used for testing
Detection performance of the first-mile router
Detection performance of the last-mile router
The rule for the decision engine
The false positive rate for routers R3.0, R3.6 and R3.3 when there are 6 uniformly distributed attack sources
The detection performance of our scheme against DDoS attacks with different numbers of attack sources
Comparison between our defense models
Summary: DoS attacks versus DoS defense models

Chapter 1

Introduction

The Internet was initially designed for openness and scalability. By that yardstick, the infrastructure is certainly working as envisioned. However, the price of this success has been poor security. On the Internet, anyone can send any packet to anyone without being authenticated, while the receiver has to process any packet that arrives at a provided service. The lack of authentication means that attackers can create a fake identity and send malicious traffic with impunity. All systems connected to the Internet are potential targets for attacks, since the openness of the Internet makes them accessible to attack traffic.

A Denial of Service (DoS) attack aims to stop the service provided by a target. It can be launched in two forms. The first form is to exploit software vulnerabilities of a target by sending malformed packets that crash the system. The second form is to use massive volumes of useless traffic to occupy all the resources that could service legitimate traffic. While it is possible to protect against the first form of attack by patching known vulnerabilities, the second form of attack cannot be so easily prevented. The targets can be attacked simply because they are connected to the public Internet. When the traffic of a DoS attack comes from multiple sources, we call it a Distributed Denial of Service (DDoS) attack. By using multiple attack sources, the power of a DDoS attack is amplified and the problem of defense is made more complicated. This thesis presents several techniques for defending against DDoS attacks, and evaluates their effectiveness against a variety of DDoS attacks.

1.1 Background and Problem Statement

The Internet

The Internet (originally known as ARPANET) was first created in 1969 as a research network sponsored by the Advanced Research Projects Agency (ARPA) of the Department of Defense (DoD) in the United States of America. The original aim was to provide an open network for researchers to share their research resources [3]. Therefore, openness and growth of the network were the design priorities, while security issues were less of a concern [3]. The occurrence of the Morris Worm [4] in 1988 marked the first major computer security incident on the Internet. However, the world was not as dependent on the Internet then as it is now. The Internet was still limited to research and educational communities until the late 1990s. Hence, not much attention was paid to Internet security.

In the last decade, the phenomenal growth and success of the Internet has changed its traditional role. The Internet is no longer just a tool for researchers. It has become the main infrastructure of the global information society. Governments use the Internet to provide information to their citizens and the world at large, and they will increasingly use the Internet to provide government services. Companies share and exchange information with their divisions, suppliers, partners and customers efficiently and seamlessly. Research and educational institutes depend more on the Internet as a platform for collaboration and as a medium for disseminating their research discoveries rapidly.
Unfortunately, with the growth of the Internet, attacks on the Internet have also increased incredibly fast. According to CERT [1], a center of Internet security expertise located in the U.S., the number of reported Internet security incidents has jumped from 6 in 1988 to 82,094 in 2002, and the estimated number of Internet security incidents in 2003 is 153,140. The growth in the number of incidents reported between 1988 and 2003 is shown in Figure 1.1.

Figure 1.1: The number of Internet security incidents reported from 1988 to 2003 (the data is collected from CERT [1]).

More importantly, traditional operations in essential services, such as banking, transportation, power, medicine, and defense, are being progressively replaced by cheaper, more efficient Internet-based applications. Historically, an attack on a nation's critical services involves actions that need to cross a physical boundary. These actions can be intercepted and prevented by a nation's security services.
However, the global connectivity of the Internet renders physical boundaries meaningless. Internet-based attacks can be launched from anywhere in the world, and unfortunately no Internet-based services are immune from these attacks. Therefore, the reliability and security of the Internet not only benefits on-line businesses, but is also an issue for national security.

Denial of Service (DoS) Attacks

A DoS attack is a malicious attempt by a single person or a group of people to disrupt an online service. DoS attacks can be launched against both services, e.g., a web server, and networks, e.g., the network connection to a server. The impact of DoS attacks can vary from minor inconvenience to users of a website, to serious financial losses for companies that rely on their on-line availability to do business. On February 9, 2000, Yahoo, eBay, Amazon.com, E*Trade, ZDNet, Buy.com, the FBI, and several other web sites fell victim to DoS attacks, resulting in substantial damage and inconvenience [5]. As emergency and essential services become reliant on the Internet as part of their communication infrastructure, the consequences of DoS attacks could even become life-threatening. Hence, it is crucial to deter, or otherwise minimize, the damage caused by DoS attacks.

Technical Problems

There are four different ways to defend against DoS attacks: (1) attack prevention; (2) attack detection; (3) attack source identification; and (4) attack reaction.

Attack prevention aims to fix security holes, such as insecure protocols, weak authentication schemes and vulnerable computer systems, which can be used as stepping stones to launch a DoS attack. This approach aims to improve the global security level and is the best solution to DoS attacks in theory.
However, the disadvantage is that it needs global cooperation to ensure its effectiveness, which is extremely difficult in reality. Hence, the challenge is how to develop a scalable mechanism with low implementation cost.

Attack detection aims to detect DoS attacks while an attack is in progress. Attack detection is an important procedure for directing any further action. The challenge is how to detect every attack quickly without misclassifying any legitimate traffic.

Attack source identification aims to locate the attack sources regardless of spoofed source IP addresses. It is a crucial step in minimizing the attack damage and providing deterrence to potential attackers. The challenge for attack source identification is how to locate attack sources quickly and accurately without changing the current Internet infrastructure.

Attack reaction aims to eliminate or curtail the effects of an attack. It is the final step in defending against DoS attacks, and therefore determines the overall performance of the defense mechanism. The challenge for attack reaction is how to filter the attack traffic without disturbing legitimate traffic.

1.2 Research Objective

The objective of this research is to develop practical and scalable mechanisms to detect and react to DoS attacks. These defense mechanisms should detect the DoS attack quickly and accurately, ensure reasonable performance for the networks or systems under attack, and track the attack sources accurately with low computational overhead. Our research also includes a classification of different defense models according to their implementation cost and cooperation levels. We investigate the strengths and weaknesses of each model, and provide extensive analysis of the methods for DoS attack defense.

1.3 Basic Concepts

In this section, we introduce some basic concepts on which this thesis is based.

Figure 1.2: A simple model of the Internet.

Source, Router and Victim

We define a source as a device that can generate Internet traffic. The source could be a university's mail server, a company's web server or a home PC connected to the Internet. When the source is used to generate attack traffic, it becomes an attack source. In the rest of the thesis, unless otherwise stated, source refers to an attack source. We define a third party as a device that is not compromised, but is used by an attacker to generate attack traffic without its owner noticing.

We define a victim as a system that provides an Internet service and whose service is disrupted during an attack. We define a target as a system that is being attacked or will be attacked by an attacker. If the services of a target are damaged during an attack, then the target becomes a victim. The victim could be a government's web server, a regional DNS server or an ISP's router. Depending on actual network conditions, a connected device could be a source or a victim or both.

The end host can be defined as a device that connects to the end of the Internet.
We use the term edge router to refer to the router that provides access to the Internet for the subnetwork that we are defending. For incoming traffic, the edge router can be described as the last-mile router. For outgoing traffic, the edge router can be described as the first-mile router. We define a user's upstream routers as the routers that connect the user to the Internet. Given two routers A and B, if A is B's upstream router, then B is A's downstream router. These definitions are illustrated in Figure 1.2.

Definition of Attacks

In general, a denial of service (DoS) attack is any attack which makes an on-line service (e.g., a web service) unavailable. The attack could involve a single packet (e.g., the land attack [6]) exploiting software bugs in a server, or a traffic stream with a tremendous number of packets that congest the target's server or network. We define a bandwidth attack as any attack that consumes a target's resources through a massive traffic volume. In this thesis, we focus on bandwidth attacks, and henceforth we mean bandwidth attacks when we refer to denial of service attacks, unless otherwise stated.

The distributed denial of service (DDoS) attack is a bandwidth attack whose attack traffic comes from multiple sources. To launch a DDoS attack, an attacker usually first compromises many insecure computers connected to the Internet. The DDoS attack is then launched from these compromised computers.

The reflector attack is an attack where innocent third parties (reflectors) are used to bounce attack traffic from the attacker to the target. A reflector can be any network device that responds to any incoming packet, for example, a web server. The attacker can make the attack traffic highly distributed by using many reflectors. The reflector attack is a type of DDoS attack. To summarize, the relations between different types of attacks are illustrated in Figure 1.3.

Figure 1.3: The relation of different types of attacks.

Flow, Traffic Aggregate and Internet Protocols

We define an IP flow as a sequence of packets with the same source IP address. We define a traffic aggregate as a group of IP flows that share the same feature, for example, the same destination address. We define the Internet protocols as all the protocols that are used in the Internet, such as IP, TCP and UDP.

Firewall and IDS

A firewall is an access control device that admits incoming traffic according to a set of rules. An Intrusion Detection System (IDS) is a traffic monitoring system that analyzes the network traffic and reports any suspicious network behavior.

False Positive Rate and Detection Accuracy

We define a false positive as a normal operation that is misdiagnosed by the detection scheme as an attack. The false positive rate is defined as the number of false positives divided by the total number of detection decisions made. We define a false negative as an attack that has not been detected by the detection scheme. The false negative rate is defined as the number of false negatives divided by the total number of detection decisions made. We define the detection accuracy as the number of attacks detected divided by the total number of attacks.

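These definitions translate directly into code. The small sketch below simply restates them; the function names and the example counts are ours, for illustration only.

```python
def false_positive_rate(false_positives: int, total_decisions: int) -> float:
    """Fraction of all detection decisions that wrongly flag normal operation as an attack."""
    return false_positives / total_decisions

def false_negative_rate(false_negatives: int, total_decisions: int) -> float:
    """Fraction of all detection decisions that miss an actual attack."""
    return false_negatives / total_decisions

def detection_accuracy(attacks_detected: int, total_attacks: int) -> float:
    """Fraction of attacks that the detection scheme catches."""
    return attacks_detected / total_attacks

# Hypothetical example: 1000 decisions with 3 false alarms and 2 missed attacks,
# and 38 of 40 injected attacks detected.
print(false_positive_rate(3, 1000))   # 0.003
print(false_negative_rate(2, 1000))   # 0.002
print(detection_accuracy(38, 40))     # 0.95
```
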
1.4 Defense Mechanism

The defense mechanisms developed in this thesis can be categorized into three basic models: (1) the Victim Model (VM), (2) the Victim-Router Model (VRM), and (3) the Router-Router Model (RRM). Each model is classified according to where the defense mechanism is employed, and how the network components, such as the victim and the routers, cooperate.

Victim Model (VM)

The VM is a traditional defense model that identifies and filters attack traffic at a single location, namely, the victim. The key issue for the VM is to be able to identify the attack traffic pattern accurately and efficiently. Our proposal consists of a detection agent and a packet filter that are based on an analysis of the connection history to the victim. We define a new IP address as an IP address that does not appear in the target's connection history. The premise of our approach is that attack traffic is most likely to contain new IP addresses. Therefore, we identify the attack traffic by checking whether it has many new IP addresses. A high proportion of new IP addresses indicates an attack. Once a DoS attack is detected, the VM filters the traffic that contains new IP addresses. The purpose of the research for this model is to demonstrate a stand-alone defense mechanism that does not require cooperation with other network systems.

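A minimal sketch of this idea is shown below: a set of previously seen source addresses is built from the victim's connection history, and a traffic window is flagged when the fraction of new addresses exceeds a threshold. The class name, the threshold value and the windowing are illustrative assumptions; the actual detection and filtering algorithms are developed in Chapter 3.

```python
class NewAddressDetector:
    """Toy victim-side detector based on the fraction of previously unseen source IPs."""

    def __init__(self, threshold: float = 0.5):
        self.known_ips = set()      # connection history built during normal operation
        self.threshold = threshold  # fraction of new IPs that signals an attack

    def train(self, source_ips):
        """Populate the connection history from attack-free traffic."""
        self.known_ips.update(source_ips)

    def is_attack(self, window_ips) -> bool:
        """Return True if the fraction of new source IPs in this window looks like an attack."""
        if not window_ips:
            return False
        new = sum(1 for ip in window_ips if ip not in self.known_ips)
        return new / len(window_ips) > self.threshold


detector = NewAddressDetector(threshold=0.5)
detector.train(["10.0.0.1", "10.0.0.2", "192.0.2.7"])                   # normal clients
print(detector.is_attack(["10.0.0.1", "192.0.2.7"]))                    # False: all seen before
print(detector.is_attack(["203.0.113.5", "203.0.113.9", "10.0.0.1"]))   # True: mostly new sources
```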

Victim-Router Model (VRM)

The VRM is a cooperative model that identifies and filters the attack traffic at multiple locations. The defense process is triggered by a signal from the victim and accomplished with the cooperation of participating routers. There are two key points for this model. The first is that the victim identifies the attack sources using information inserted by the upstream routers. The second is that the victim directs the routers close to the attack sources to filter attack traffic. To implement this model, routers need to run a lightweight packet marking process to include path information in the packets. In addition, the victim needs to analyze the incoming packets to locate the attack sources. Once the victim identifies the attack sources, control messages will be sent to the routers that are adjacent to the attack sources. The routers will then start to filter attack traffic according to the received control messages. The objective of this research is to investigate the feasibility of a victim-based defense architecture that can be implemented incrementally.

Router-Router Model (RRM)

The RRM is a distributed defense model that detects the attack traffic by sharing information among participating routers. The ultimate goal of DDoS defense is to filter attack traffic close to the attack sources so that both network and server resources will be saved. Therefore, routers close to attack sources should be able to identify attack traffic quickly and accurately. However, if the attack sources of a DDoS attack are highly distributed (for instance, in a reflector attack), little attack traffic will be observed by any single router. Due to this scarcity of attack traffic, it is nearly impossible for an individual router to detect an attack, and an effective solution must be formulated with a distributed approach. In the RRM, each router reports any suspicious network behavior to other routers.
At the same time, each router combines the reports from other routers with network statistics observed locally to decide whether an attack has happened. The aim of this research is to study how distributed detection can improve the detection accuracy and reduce the detection time.

Each of these three models has its own trade-off between defense performance and implementation overhead. The defense performance includes detection time, detection accuracy, and what proportion of resources are protected at the victim. The implementation overhead includes computational overhead, level of manual intervention, and cost of deployment. The VM provides a short-term and basic solution to DDoS attacks. Its defense scheme can be extended by adding the VRM and RRM. The VRM and RRM can be regarded as long-term solutions that need cooperation from multiple network devices, such as upstream routers. In an ideal situation, all three models can be integrated to achieve better performance.

1.5 Overview of the Thesis

This thesis is composed of eight chapters. Chapter 1 introduces the DoS attack problem and defines some basic concepts related to DoS attack defense. Chapter 2 gives a detailed literature review of DoS attack prevention, detection and reaction. Chapter 3 discusses our Victim Model (VM), where we present our history-based scheme to detect attacks and filter the attack traffic. Chapters 4 and 5 discuss our Victim-Router Model (VRM), where Chapter 4 gives a traceback scheme to locate the attack sources and Chapter 5 presents a selective pushback scheme to filter the attack traffic close to the attack sources. Chapter 6 discusses our Router-Router Model (RRM), where we present a distributed detection scheme, as well as a machine learning approach for sharing information between distributed detection systems.
Chapter 7 gives a detailed analysis of all three models, and discusses how to combine these models to provide an integrated solution to DoS attacks. Chapter 8 concludes the thesis by presenting a summary of our proposed defense mechanisms. All the abbreviations and glossary terms are listed in the Appendix.

1.6 Contributions of the Thesis

In this thesis, the following contributions have been made.

Chapter 2

We categorize the existing solutions to DoS attacks according to their operation. We highlight the limitations of each defense mechanism and formulate a set of attacks against each scheme.

Chapter 3

We propose an independent defense model that can detect and filter attack traffic locally. We propose a novel attack detection model based on the connection history at the victim. We use trace-driven simulations to demonstrate the efficiency and accuracy of this defense model. We propose a history-based IP filtering algorithm to filter attack traffic accurately with low computational overhead. The results of this chapter are presented in [7] and [8].

Chapter 4

We propose an adjusted probabilistic packet marking (APPM) scheme to identify the IP sources reliably and efficiently. We propose three different methods for implementing our APPM scheme. The results of this chapter are presented in [9].

Chapter 5

We propose a network defense architecture that detects the attack at the victim and filters attack traffic at the routers close to the attack sources. By analyzing the variance of the traffic distribution, our defense model can quickly and accurately detect DoS attacks and locate the attack sources. We conduct extensive simulations to verify the analytical results of our model and gain further insight into its operation. The results of this chapter are presented in [10].

Chapter 6

We propose a distributed detection architecture that can detect DDoS attacks close to the attack sources, and apply a machine learning scheme to improve the performance of the distributed approach. We analyze the traffic characteristics of reflector attacks, and propose a distributed approach to detect them. The results of this chapter are presented in [8] and [11].

Chapter 7

We compare the strengths of each model and investigate the feasibility of combining these three models to provide an integrated solution. We analyze the limits of our proposed DDoS defense approaches and their effectiveness against different types of DoS attacks.

1.7 List of Publications

The following publications were generated in the course of conducting the research that contributed to this thesis.

Published Papers

T. Peng, C. Leckie, and K. Ramamohanarao. Adjusted probabilistic packet marking for IP traceback. In Proceedings of the Second IFIP Networking Conference (Networking 2002), Pisa, Italy, 2002.

T. Peng, C. Leckie, and K. Ramamohanarao. Defending against distributed denial of service attack using selective pushback. In Proceedings of the 9th IEEE International Conference on Telecommunications (ICT 2002), Beijing, China, 2002.

T. Peng, C. Leckie, and K. Ramamohanarao. Prevention from distributed denial of service attacks using history-based IP filtering. In Proceedings of the 38th IEEE International Conference on Communications (ICC 2003), Anchorage, Alaska, USA, 2003.

T. Peng, C. Leckie, and K. Ramamohanarao. Detecting distributed denial of service attacks by sharing distributed beliefs. In Proceedings of the 8th Australasian Conference on Information Security and Privacy (ACISP 2003), Wollongong, Australia, 2003.

T. Peng, C. Leckie, and K. Ramamohanarao. Detecting reflector attacks by sharing beliefs. In Proceedings of the IEEE 2003 Global Communications Conference (Globecom 2003), San Francisco, California, USA, 2003.

Other Papers Under Preparation

T. Peng, C. Leckie, and K. Ramamohanarao. Proactively detecting DDoS attacks using source IP address monitoring. To appear in Networking 2004, Athens, Greece.

T. Peng, C. Leckie, and K. Ramamohanarao. Information sharing for distributed intrusion detection systems. Submitted to the Journal of Computer Communications.

T. Peng, C. Leckie, and K. Ramamohanarao. A survey on DDoS defense mechanisms. In preparation, to be submitted to ACM Computing Surveys.

T. Peng, C. Leckie, and K. Ramamohanarao. Victim-Router Model DDoS defense mechanism. In preparation, to be submitted to IEEE/ACM Transactions on Networking.


Chapter 2

A Survey of DoS Attacks

This chapter presents a survey of denial of service attacks. In this survey, we analyze the fundamental weaknesses of the Internet in terms of its vulnerabilities to denial of service attacks. We review the proposed methods for defending against denial of service attacks, discuss the strengths and weaknesses of each proposal, and present countermeasures that an attacker may employ to defeat the protection provided by each proposal.

2.1 Introduction

Network intrusion has been a growing concern since the invention of the Internet. Problems such as viruses, worms, and hackers are widely reported [12]. Although there is no clear definition of computer and network intrusions, we define two main categories according to the attacker's motivations, namely, unauthorized access and denial of service attacks.

Unauthorized access can take a number of forms, which include the user-to-root attack, the remote-to-local attack, and the scan attack [13].

The user-to-root attack occurs when a normal user gains privileged (root) access to a computer by exploiting a vulnerability of either the operating system or the installed software. A classic example occurred in early UNIX systems using the finger daemon [14]. The finger daemon neglected to limit the size of an input string, which created the risk of a buffer overflow. Since the finger daemon has root privileges, by carefully designing a special input string, the attacker is able to exploit the buffer overflow to execute any command as root.

The remote-to-local attack occurs when a user of a remote system gains local access to a computer by exploiting a vulnerability of the system. For example, the Code Red worm gained access to the Microsoft IIS server by sending a malicious HTTP GET request [15].

The scan attack occurs when an attacker performs reconnaissance on the target network, such as determining the type of operating system and any open ports on the hosts of the target network. The attacker uses this knowledge to launch an attack using any known vulnerabilities of the operating system and network services that are running at the target hosts. For example, the attacker can send an unusual TCP packet in which the SYN flag, which is used to indicate the beginning of a connection, and the FIN flag, which is used to indicate the end of a connection, are both set at the same time. By analyzing the reply of the target host, the attacker can check whether the reply matches the fingerprints of a specific operating system.

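As a concrete illustration, the hypothetical check below flags TCP packets that carry both the SYN and FIN flags, a combination that does not occur in normal traffic. The flag constants follow the standard TCP header bit positions; the function name is ours.

```python
# Standard bit positions of the FIN and SYN flags in the TCP header flags byte.
TCP_FIN = 0x01
TCP_SYN = 0x02

def is_syn_fin_probe(tcp_flags: int) -> bool:
    """Return True for packets with both SYN and FIN set, a classic OS-fingerprinting probe."""
    return (tcp_flags & (TCP_SYN | TCP_FIN)) == (TCP_SYN | TCP_FIN)

print(is_syn_fin_probe(0x02))  # False: an ordinary SYN
print(is_syn_fin_probe(0x03))  # True: SYN and FIN set together
```
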
The second category of security problems is the denial of service (DoS) attack. In this case, the attacker's aim is to make the service provided by the victim unavailable to legitimate users, rather than to obtain unauthorized access. There are two types of DoS attacks. The first type of DoS attack aims to disrupt the services provided by the victim by exploiting a software vulnerability of the system. For example, the ping-of-death attack [6] sends a packet with an illegal payload (i.e., longer than 64K bytes), which causes some operating systems to lock up or reboot due to buffer overflow. The second type of DoS attack is based on the volume of traffic, and is known as a bandwidth attack.

Bandwidth attacks became a major security concern after massive bandwidth attacks paralyzed many high-profile web sites, such as CNN and Yahoo, causing substantial financial loss in February 2000 [5]. After this severe incident, defending against bandwidth attacks has become a very important research issue for both academia and industry. Many schemes have been proposed and many commercial products have been produced to tackle this problem. After three years, one might ask whether the threat of bandwidth attacks has been eliminated. Is it safe to run a mission-critical business via the Internet? This chapter presents a survey of the proposed technologies to defend against bandwidth DoS attacks.

The rest of the chapter is organized as follows. Section 2.2 gives a definition of the bandwidth attack and the fundamental vulnerabilities of the Internet that facilitate bandwidth attacks. Section 2.3 gives a detailed review of the proposed solutions to DoS attacks. Section 2.4 discusses the remaining threats and open issues in solving denial-of-service problems.

2.2 Bandwidth Attacks

The bandwidth attack can be defined as any activity that aims to disable the services provided by the victim by sending an excessive volume of useless traffic. This is in contrast to a flash crowd, which occurs when a large number of legitimate users access a server at the same time. A comparison between bandwidth attacks and flash crowds is shown in Table 2.1.

Impacts of the Bandwidth Attack

There are two major impacts of bandwidth attacks. The first is the consumption of the host's resources. Generally, the victim could be a web server or proxy connected to the Internet. The victim has limited resources to process the incoming packets.

Table 2.1: Comparison between bandwidth attacks and flash crowds

                               Bandwidth Attack    Flash Crowd
  Network impact               congested           congested
  Server impact                overloaded          overloaded
  Traffic                      illegitimate        genuine
  Response to traffic control  unresponsive        responsive
  Traffic type                 any                 mostly web
  Number of flows              any                 large number of flows
  Predictability               unpredictable       mostly predictable

When the traffic load becomes high, the victim will drop packets to inform senders, which consist of both legitimate users and attack sources, to reduce their sending rates. Legitimate users will slow down their sending rates, while the attack sources will maintain or increase their sending rates. Eventually, the victim's resources, such as CPU and memory, will be used up and the victim will be unable to service legitimate traffic.

The second impact is consumption of the network bandwidth, which is more threatening than the first. If the malicious flows are able to dominate the communication links that lead to the victim, then the legitimate flows will be blocked. Therefore, not only is the intended victim of the attack disabled, but so is any system that relies on the communication links along the attack path. Although a congested router can control the traffic flow by dropping packets, legitimate traffic will also be discarded if there is no clear mechanism to differentiate legitimate traffic from attack traffic.

Inherent Vulnerabilities of the Internet Architecture

Bandwidth attacks are the result of several fundamental weaknesses of the Internet architecture.

Connectivity and Resource-sharing

The Internet is designed as an open public infrastructure for sharing information resources. This has two consequences. First, potential victims, such as web servers, must connect to the Internet and be visible to the public in order to provide a public service. This visibility is provided by a globally routable IP address. Second, the Internet is based on packet switching, unlike its counterpart, the public telecommunication network, which is based on circuit switching. In circuit-switched networks, each service (e.g., a phone call) is allocated a separate channel until the end of the service. A user's service will not be interfered with by other users' behavior. In contrast, in packet-switched networks, users share all the resources, and one user's service can be disturbed by other users' behavior. Bandwidth attacks take advantage of these two consequences: attack packets will be delivered to the victim before it is known whether they are malicious or not, and by occupying most of the shared resources, bandwidth attacks manage to disrupt the services for legitimate users.

Authentication and Traceability

The Internet is equipped with no authentication scheme, which leads to a serious problem: IP spoofing. IP spoofing refers to creating an IP packet containing fake information. IP source address spoofing occurs when an IP packet is generated without using the source IP address that is assigned to the computer system. Without an integrity check for each IP packet, attackers can spoof any field of an IP packet and inject it into the Internet. Moreover, routers generally do not have packet tracing functions, for example, keeping records of all previous connections. In practice, this cannot be done due to the huge amount of traffic that would need to be stored.
Therefore, once an IP packet is received by the victim, there is no way to authenticate whether the packet actually comes from where it claims. By hiding their identities using IP spoofing, attackers can launch bandwidth attacks without being held responsible for the damage.

Reliability of Global Network Infrastructure

Denial of service occurs when the attacker is able to consume all of the victim's resources. Generally, even a well-provisioned target can be disabled by the traffic generated by a single attack source. Moreover, the global network infrastructure is not guaranteed to be reliable, which gives attackers an opportunity to amplify their attack power. First, some inappropriate protocol designs can make one malformed attack packet consume much more of the target's resources than one normal packet. For example, one SYN packet can consume more resources at the target than other normal TCP packets. Second, the Internet is a huge community in which many insecure systems exist. Unfortunately, the number of vulnerabilities reported each year is increasing according to CERT statistics [1], as shown in Figure 2.1. Hence, an attacker can control a large number of insecure systems by exploiting their vulnerabilities. By launching bandwidth attacks from these controlled systems, the attack power is tremendously increased.

Typical Bandwidth Attacks

We classify bandwidth attacks according to the way the attack power is magnified. The first category is protocol bandwidth attacks, which take advantage of the Internet protocols. The second category is DDoS attacks, which amplify attack power using a large number of distributed attack sources. In practice, a real attack can belong to both of these categories at the same time.

Figure 2.1: The number of vulnerabilities reported each year according to CERT [1].

Protocol-based Bandwidth Attacks

The protocol bandwidth attack can normally be launched effectively from a single attack source. Its attack power is based on the weaknesses of the Internet protocols. It can be broadly categorized as a SYN flood, UDP flood, or ICMP flood.

SYN flood

In order to describe the SYN flood attack, we first need to define several aspects of TCP connections. We define the client as the host that initiates the TCP connection, and the server as the host that receives the connection request. At the beginning of each TCP connection, the client negotiates with the server to set up a connection; this is called the 3-way handshake and is illustrated in Figure 2.2. First, the client sends a SYN packet to the server, requesting a connection. Then the server responds to the connection request with a SYN-ACK packet, and stores the request information in its memory stack. After receiving the SYN-ACK packet, the client confirms the request using an ACK packet.
When the server receives the ACK packet, it checks the memory stack to see whether this packet confirms an existing request. If it does, the server removes the request information from the memory stack and starts the actual data communication.

Figure 2.2: TCP 3-Way Handshake.

The SYN flood attack exploits a vulnerability of the TCP/IP protocol and is one of the most powerful and commonly seen attacks on the Internet [16]. During SYN flood attacks, the attacker sends SYN packets with source IP addresses that do not exist or are not in use. During the 3-way handshake, when the server puts the request information into the memory stack, it waits for the confirmation from the client that sent the request. Before the request is confirmed, it remains in the memory stack. Since the source IP addresses used in SYN flood attacks are non-existent, the server cannot receive confirmation packets for requests created by the SYN flood attack. Thus, more and more requests accumulate and fill up the memory stack. Therefore, no new request, including legitimate requests, can be processed, and the services of the system are disabled. Generally, the space for the memory stack allocated by the operating system is small, and even a small-scale SYN flood attack can be dangerous. Mechanisms to defend against the SYN flood attack are discussed in Section 2.3.

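The toy model below illustrates why this exhaustion happens: the server's half-open connection table (the "memory stack" above) has a fixed capacity, spoofed SYNs are never acknowledged, and once the table fills up legitimate requests are refused. The capacity value, class name and use of a plain set are simplifying assumptions, not a model of any particular operating system.

```python
class SynBacklog:
    """Toy model of a server's half-open (SYN_RECEIVED) connection table."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.half_open = set()    # (client_ip, client_port) pairs awaiting the final ACK

    def on_syn(self, client) -> bool:
        """Handle an incoming SYN; return False if the request must be dropped."""
        if len(self.half_open) >= self.capacity:
            return False          # backlog full: the service is effectively denied
        self.half_open.add(client)
        return True               # SYN-ACK sent, entry kept until ACK or timeout

    def on_ack(self, client):
        """The final ACK of the 3-way handshake frees the backlog entry."""
        self.half_open.discard(client)


server = SynBacklog(capacity=128)
# Spoofed sources never answer the SYN-ACK, so no on_ack() call ever arrives.
for i in range(200):
    server.on_syn((f"198.51.100.{i % 250}", 40000 + i))
print(server.on_syn(("203.0.113.10", 55321)))  # False: a legitimate client is now refused
```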

UDP flood

The User Datagram Protocol (UDP) is a connectionless protocol that does not have flow control mechanisms, i.e., there is no built-in mechanism for the sender and receiver to synchronize and adapt to changing network conditions. The UDP flood is a type of bandwidth attack that uses UDP packets. Since UDP does not have flow control mechanisms, when traffic congestion happens, neither legitimate nor attack flows will reduce their sending rates. Hence, the victim is unable to decide whether a source is an attack source or a legitimate source just by checking the source's sending rate. Moreover, unlike TCP, UDP does not have a negotiation mechanism before setting up a connection. Therefore, it is easier to spoof UDP traffic without being noticed by the victim.

Figure 2.3: The UDP flooding is initiated by a single packet.

Figure 2.3 gives an example of how a single spoofed UDP packet can initiate a never-ending attack stream. The attacker sends a UDP packet to victim 1, claiming to be from victim 2, requesting the echo service. Since victim 1 does not know this is a spoofed packet, it echoes a UDP packet to victim 2 at port 7 (the echo service). Then victim 2 does exactly the same as victim 1, and the loop of echo requests never ends unless it is stopped by an external party [17]. In addition, if two or more hosts are connected in this way, the intervening network may also become congested and deny service to all hosts whose traffic traverses that network.

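Under the simplifying assumption that neither host rate-limits its echo service, the loop can be traced with the short sketch below; the host names and the iteration cap are ours, for illustration only.

```python
def simulate_echo_loop(max_exchanges: int = 6):
    """Trace the datagrams triggered by one spoofed packet to the echo service (UDP port 7)."""
    # The attacker's packet arrives at victim1 claiming to come from victim2's echo port.
    src, dst = "victim2", "victim1"
    print("attacker -> victim1:7 (source spoofed as victim2:7)")
    for _ in range(max_exchanges):   # in reality the loop only stops when externally interrupted
        print(f"{dst}:7 -> {src}:7 (echo reply)")
        src, dst = dst, src          # each reply triggers another reply in the opposite direction

simulate_echo_loop()
```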

Solutions to the UDP flood are discussed in [17], and include disabling any unused UDP services, e.g., the echo service.

ICMP Flood

The Internet Control Message Protocol (ICMP) is based on the IP protocol and is used to diagnose network status. An ICMP flood is a type of bandwidth attack that uses ICMP packets. On IP networks, a packet can be directed to an individual machine or broadcast to an entire network. When a packet is sent to an IP broadcast address from a machine on the local network, that packet is delivered to all machines on that network. When a packet is sent to that IP broadcast address from a machine outside the local network, it is broadcast to all machines on the target network (as long as routers are configured to pass along that traffic). IP broadcast addresses are usually network addresses with the host portion of the address having all one bits. For example, the IP broadcast address for the network 10.*.*.* is 10.255.255.255. Network addresses with all zeros in the host portion, such as 10.0.0.0, can also produce a broadcast response.

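The directed broadcast address for a given prefix can be computed by setting all host bits to one, as in the short sketch below (using Python's standard ipaddress module; the example prefixes are arbitrary).

```python
import ipaddress

def directed_broadcast(prefix: str) -> str:
    """Return the address with all host bits set to one for the given network prefix."""
    return str(ipaddress.ip_network(prefix).broadcast_address)

print(directed_broadcast("10.0.0.0/8"))      # 10.255.255.255
print(directed_broadcast("192.168.0.0/16"))  # 192.168.255.255
```
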
The smurf attack is a type of ICMP flood, where attackers use ICMP echo request packets directed to IP broadcast addresses from remote locations to generate denial-of-service attacks. There are three parties in these attacks: the attacker, the intermediary, and the victim (note that the intermediary can also be a victim) [18].

Figure 2.4: A smurf attack, using an intermediary network to amplify a Ping flood.

Figure 2.4 gives an example of the smurf attack. First, the attacker sends one ICMP echo request packet to the network broadcast address, and the request is forwarded to all the hosts within the intermediary network. Second, all of the hosts within the intermediary network send ICMP echo replies to flood the victim. Solutions to the smurf attack are discussed in [18], and include disabling the IP-directed broadcast service at the intermediary network.

The aforementioned protocol bandwidth attacks utilize TCP, UDP, and ICMP traffic respectively, all of which are commonly observed in the Internet. All these attacks are based on spoofed IP addresses and take advantage of the vulnerabilities of the Internet protocols.

DDoS Attacks

The Distributed Denial of Service (DDoS) attack is a type of bandwidth attack where the attack traffic is launched from multiple distributed sources. The attack power of a DDoS attack is based on the massive number of attack sources instead of the vulnerabilities of one particular protocol. Hence, the DDoS attack can consist of all types of traffic. There are two common scenarios for DDoS attacks, which we define as the typical DDoS attack and the distributed reflector denial of service (DRDoS) attack [2].

Typical DDoS attack

As shown in Figure 2.5 [2], a typical DDoS attack contains two stages. The first stage is to compromise vulnerable systems available in the Internet and install attack tools on these compromised systems. This is known as turning the computers into zombies.

Figure 2.5: Structure of a typical DDoS attack (based on [2]).

Second, the attacker sends an attack command to the zombies through a secure channel to launch a bandwidth attack against the victim(s) [19]. The attack traffic is sent from the zombies directly to the victim, without using innocent third parties. The attack traffic could use genuine or spoofed source IP addresses. However, there are two major motivations for the attacker to use randomly spoofed IP addresses: (1) to hide the identity of the zombies and reduce the risk of being traced back via the zombies; and (2) to make it hard or impossible to filter this type of traffic without disturbing the legitimate traffic. In this thesis, we propose several novel defense mechanisms against this type of attack.

Distributed Reflector Denial of Service (DRDoS) attack

Figure 2.6 [2] illustrates another type of DDoS attack called the distributed reflector denial of service (DRDoS) attack, which uses third parties (routers or web servers) to bounce the attack traffic to the victim. The DRDoS attack contains three stages. The first stage is very similar to that of the typical DDoS attack. However, in the second stage, after the attacker has gained control of a certain number of zombies, instead of instructing the zombies to send attack traffic to the victims directly, the zombies are ordered to send spoofed traffic, with the victim's IP address as the source IP address, to the third parties.

reply traffic to the victim, which constitutes a DDoS attack. This type of attack shut down a security research website in January 2002, and is considered to be a potent, increasingly prevalent, and worrisome Internet attack [20]. Compared with the typical DDoS attack, the DRDoS attack is more dangerous for the following reasons. First, the DRDoS attack traffic is further diluted by the third parties, which makes the attack traffic even more distributed. Second, as noticed by Paxson [2] and Gibson [20], the distributed reflector denial of service (DRDoS) attack has the ability to amplify the attack traffic, which makes the attack even more potent. Later in this thesis, we propose a novel defense mechanism against this type of attack.
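To make the amplification point concrete, the following back-of-the-envelope calculation uses purely illustrative numbers (assumptions made for this sketch, not measurements of any particular attack) to show how reflection multiplies the bandwidth that the zombies themselves have to supply.

```python
# Illustrative estimate of reflected attack volume (all numbers are hypothetical).
zombies = 1000            # compromised hosts under the attacker's control
request_rate_mbps = 0.5   # spoofed request traffic emitted by each zombie (Mbit/s)
amplification = 5         # assumed ratio of reply size to request size at the reflectors

traffic_at_victim_mbps = zombies * request_rate_mbps * amplification
print(f"Traffic converging on the victim: {traffic_at_victim_mbps / 1000:.1f} Gbit/s")
# With these numbers the zombies emit only 0.5 Gbit/s in total,
# but the reflectors deliver 2.5 Gbit/s of reply traffic to the victim.
```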

2.3 Existing DoS Attack Defense Proposals

Generally, there are four basic steps to defend against bandwidth attacks. The first step is attack prevention, the second step is attack detection, the third step is attack source identification, and the fourth step is attack reaction.

2.3.1 Attack Prevention

Attack prevention is a mechanism which stops attacks before they actually cause damage. This approach assumes attack traffic is spoofed, which is true in most situations since attackers need spoofed traffic to hide their identities 1 and to exploit the protocol vulnerabilities discussed in Section 2.2. This approach normally comprises a variety of packet filtering schemes, which are deployed at the routers. The packet filters are used to make sure only valid (non-spoofed) traffic can pass through. This greatly reduces the chance of DDoS attacks. Of all the available attack countermeasures, attack prevention is the preferred approach because it can minimize attack damage. However, it is not easy to specify a filtering rule that can differentiate attack traffic from legitimate traffic accurately. Moreover, some types of filtering schemes require wide deployment to be effective. Unfortunately, the Internet is an open community without central administration, which makes prevention a taxing and daunting task.

Ingress Filtering

Ingress filtering is a filtering scheme that filters incoming traffic according to a specified rule. We define the customer's network as the network that hosts the Internet users of one organization, for example, a university. We define the external network as the network that is outside of the customer's network. Figure 2.7 illustrates the operation

1 The identities can be the IP addresses of the attackers or their compromised systems.

of ingress filtering in a customer's network.

Figure 2.7: Defending against IP source address spoofing using ingress filtering.

As shown in Figure 2.7, there are two types of ingress filtering [21]. One is ISP-to-customer ingress filtering, which filters the traffic from the external networks to the customer. The other is customer-to-ISP ingress filtering, which filters the traffic from the customer to the external networks (also known as egress filtering).

Analysis of ISP-to-customer Ingress Filtering

For ISP-to-customer ingress filtering, any internal IP address (since internal IP addresses are not expected to arrive from the external network), any private network IP address 2 (e.g., 10.*.*.*) and specified IP addresses (e.g., the IP addresses of known malicious users) will be filtered, as shown in Figure 2.7. ISP-to-customer ingress filtering is normally integrated with firewall technology and is very useful in filtering attack traffic. However, the efficacy of ISP-to-customer ingress filtering is limited.

2 A private network IP address refers to an IP address that is only used internally and is not globally routable. The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private networks: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
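Before turning to these limitations, the filtering rules themselves can be sketched in a few lines. The sketch below only illustrates the two rule sets described above; the customer prefix, the example blocklist address and the function names are assumptions made for this example, not a real configuration.

```python
from ipaddress import ip_address, ip_network

# Illustrative configuration (assumed values, not an actual deployment).
CUSTOMER_PREFIXES = [ip_network("128.250.0.0/16")]   # the customer's own address space
PRIVATE_PREFIXES = [ip_network("10.0.0.0/8"),
                    ip_network("172.16.0.0/12"),
                    ip_network("192.168.0.0/16")]     # RFC 1918 private address blocks
BLOCKLIST = {ip_address("203.0.113.7")}               # known malicious sources (example address)

def isp_to_customer_filter(src: str) -> bool:
    """Return True if a packet arriving from the external network should be dropped."""
    addr = ip_address(src)
    if any(addr in net for net in CUSTOMER_PREFIXES):   # internal sources must not arrive from outside
        return True
    if any(addr in net for net in PRIVATE_PREFIXES):    # private addresses are not globally routable
        return True
    return addr in BLOCKLIST

def customer_to_isp_filter(src: str) -> bool:
    """Egress rule: drop outbound packets whose source is not in the customer's own prefixes."""
    addr = ip_address(src)
    return not any(addr in net for net in CUSTOMER_PREFIXES)

print(isp_to_customer_filter("10.1.2.3"))      # True  - private source arriving from outside
print(customer_to_isp_filter("192.0.2.10"))    # True  - spoofed source leaving the customer network
print(customer_to_isp_filter("128.250.4.20"))  # False - legitimate internal source
```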

First, since the filtering is done close to the victim, it cannot prevent network bandwidth consumption. Second, it only denies a small proportion of IP addresses, so attacks can still be launched using the rest of the IP address space. To bypass the ISP-to-customer ingress filter, the attacker can carefully spoof the IP addresses to avoid using any IP address within the victim's internal network and any IP address from a private network.

Analysis of Customer-to-ISP Ingress Filtering

For customer-to-ISP ingress filtering, all the IP addresses which do not belong to the customer's network will be filtered. As shown in Figure 2.7, all the IP addresses in the University of Melbourne can be characterized as 128.250.*.*. Hence, IP addresses other than 128.250.*.* will be filtered by the customer-to-ISP ingress filter of the University of Melbourne. Customer-to-ISP ingress filtering is effective in defending against IP source address spoofing since it can filter spoofed traffic close to the source. However, only universal deployment of customer-to-ISP ingress filtering can guarantee IP address authenticity. Moreover, implementing customer-to-ISP ingress filtering raises a series of costs. First, as it needs to perform per-packet checking, it will result in potential router overhead (especially for high-speed links). Second, it will conflict with existing services that depend on IP source address spoofing, such as some versions of Mobile IP [22] and some hybrid satellite communications architectures [23]. To beat customer-to-ISP ingress filtering, the attacker can spoof IP source addresses from within the customer's network. The number of IP addresses which can be used as spoofed IP addresses depends on the size of the network where the customer-to-ISP ingress filtering is installed.

Summary

With the wide deployment of ingress filtering, we can greatly limit the attacker's ability to spoof addresses and hence reduce the risk of having denial of

service attacks. However, without universal deployment, ingress filtering cannot completely stop IP source address spoofing. Even if IP source address spoofing can be eliminated, the DDoS attacks launched from compromised systems, which do not need IP spoofing, still pose a strong threat to the Internet.

Router-based Packet Filtering

The Router-based Packet Filtering is a mechanism that filters spoofed traffic according to network topology. On the Internet, an autonomous system (AS) is the basic level of a routing domain, either a single network or a group of networks that are controlled by a common network administrator (or group of administrators) on behalf of a single administrative entity (such as a university, a business enterprise, or a business division). The ASes are connected by border routers using the Border Gateway Protocol (BGP) [24]. As shown in Figure 2.8, each node represents a border router for one AS.

Figure 2.8: Router-based packet filtering.

Analysis of Router-based Packet Filtering

The Router-based Packet Filtering (RPF) approach proposed by Park and Lee [25] extends ingress filtering to the core of the Internet. RPF is implemented in border routers and filters any unexpected traffic on each link.
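The per-link check that RPF performs can be sketched as follows, assuming the border router has already derived, from BGP routing information, a table that maps each incoming link to the set of source prefixes expected on that link. The link names and prefixes below are hypothetical placeholders chosen only to mirror the Figure 2.8 scenario.

```python
from ipaddress import ip_address, ip_network

# Hypothetical incoming table: link -> source prefixes expected to arrive on it.
EXPECTED_SOURCES = {
    "link_7_to_6": [ip_network("192.0.2.0/24")],     # e.g., prefixes originated behind AS 7
    "link_5_to_6": [ip_network("198.51.100.0/24")],  # e.g., prefixes originated behind AS 5, including AS 3
}

def rpf_accept(link: str, src: str) -> bool:
    """Accept a packet only if its source address is expected on the link it arrived on."""
    addr = ip_address(src)
    return any(addr in net for net in EXPECTED_SOURCES.get(link, []))

# A packet spoofing an AS 3 address but arriving on link 7-to-6 is dropped,
# while the same address arriving on link 5-to-6 is accepted.
print(rpf_accept("link_7_to_6", "198.51.100.9"))  # False
print(rpf_accept("link_5_to_6", "198.51.100.9"))  # True
```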

Generally, IP packets travel between the source and the destination using the same path. Hence, one border router would only expect traffic from a stable group of ASes on each link. As shown in Figure 2.8, all the routes from AS 3 are shown in the graph (normal solid arrows). In one scenario, an attack is launched against the target in AS 4 from AS 7 using spoofed IP addresses from AS 3, and an RPF filter is deployed in the border router of AS 6. In this scenario, the attack traffic will be dropped by AS 6 since IP addresses from AS 3 are not expected on link 7-to-6. Therefore, the filtering rule for RPF is based on the topology of the autonomous systems and the policies of each AS.

Discussion

Simulation results show that a significant fraction of spoofed IP addresses can be filtered if RPF is implemented in at least 18% of the ASes in the Internet [25]. However, there are some limitations to this scheme. First, the major drawback of this scheme is still the implementation issue. Given that the number of ASes currently exceeds 10,000, RPF should be implemented in at least 1,800 ASes to make the scheme effective, which is an onerous task, especially when the number of ASes is still increasing. Moreover, RPF needs the BGP [24] messages to carry source addresses, which significantly increases the BGP message size and the processing time for each BGP message. Second, packets dropped by RPF can be legitimate if there has been a recent route change. As shown in Figure 2.8, suppose the route from AS 3 has changed due to a link failure, congestion or a policy change on link 5-to-6. If the RPF filter in the border router of AS 6 has not updated the information about the IP addresses expected on individual links, legitimate traffic with IP addresses in AS 2 will be dropped on link 7-to-6. Finally, similar to ingress filtering, RPF can only restrict the space for IP spoofing rather than completely stopping IP spoofing. Furthermore, RPF cannot prevent non-spoofed DDoS attacks, e.g., attacks launched from compromised systems.

Since the filtering rule in RPF has a very coarse granularity (only at the AS level), the attacker can still spoof IP addresses based on the network topology. Alternatively, the attacker can launch the attack from the compromised systems directly without spoofing the IP addresses. In addition, since RPF depends on BGP messages to configure the RPF filter, the attacker can hijack a BGP session and disseminate bogus BGP messages to mislead border routers into updating filtering rules in favor of the attacker.

Source Address Validity Enforcement (SAVE) Protocol

Analysis of SAVE

As discussed before, the router-based packet filter is vulnerable to asymmetrical and dynamic Internet routing, as it does not provide a scheme to update the routing information. To overcome this disadvantage, Li et al. have proposed a new protocol called the Source Address Validity Enforcement (SAVE) Protocol [26], which enables routers to update the information about expected source IP addresses on each link and to block any IP packet with an unexpected source IP address. Similar to existing routing protocols, SAVE constantly propagates messages containing valid source address information from the source location to all destinations. Hence, each router along the way is able to build an incoming table that associates each link of the router with a set of valid source address blocks. As shown in Figure 2.9, after receiving the SAVE messages, router C builds an incoming table recording that one IP address range is expected only on link 1 and another IP address range is expected only on link 2.

Discussion

SAVE is a protocol that enables the router to filter packets with spoofed source addresses using incoming tables. It shares with ingress filtering and RPF the idea that the source address space on each link of the router is stable and foreseeable. Any packet that violates the expected source address space will be regarded

58 36 A Survey of DoS Attacks Figure 2.9: An example of SAVE message update. as forged and will be filtered. SAVE outperforms ingress filtering and RPF in that it overcomes the asymmetries of Internet routing by updating the incoming tables on each router periodically. However, SAVE needs to change the routing protocol, which will take a long time to accomplish. Moreover, as SAVE filters the spoofed packets to protect other entities, it does not provide direct implementation incentives. If SAVE is not universally deployed, attackers can always spoof the IP addresses within networks that do not implement SAVE. Moreover, even if SAVE were universally deployed, attackers could still launch DDoS attacks using non-spoofed source addresses. Summary To conclude, attack prevention aims to solve IP spoofing, a fundamental weakness of the Internet. However, all the attack prevention schemes lack strong incentive for deployment. Unless new policies or legislation are introduced to force the deployment, it is doubtful that wide deployment of attack prevention schemes will happen in the near future. More importantly, the attack prevention schemes assume attacks will be greatly reduced if every source address is accountable. However, with poor reliability of

global network infrastructure, attackers can easily gain control of a large number of compromised computers known as zombies. The attackers can then direct these zombies to attack using valid source addresses. Since the communication between attackers and zombies is encrypted, only the zombies are exposed, not the attackers.

2.3.2 Attack Detection

Apart from attack prevention, the first step in defending against DoS attacks is attack detection. Detection of DoS attacks differs from general intrusion detection. First, for general intrusions such as user-to-root and remote-to-local attacks, the attacker can hide the attack by changing the system log or deleting any file created by the attack. Thus, these attacks are difficult to detect. However, DoS attacks can be easily detected since the target's services will be degraded, for example, with a high packet drop rate. Second, false positives are a serious concern for DoS attack detection. Since the potency of DoS attacks does not depend on the exploitation of software bugs or protocol vulnerabilities, but only on the volume of attack traffic, DoS attack packets do not need to be malformed, for example by carrying an invalid fragmentation field or a malicious payload, to be effective. As a result, DoS attack traffic will look very similar to legitimate traffic. This means that any detection scheme runs a high risk of mistaking legitimate traffic for attack traffic, which is called a false positive.

Since a DoS attack will be noticed eventually, a common question is why we need attack detection at all. There are three reasons for attack detection. First, if a target can detect an attack before the actual damage occurs, the target gains more time to implement attack reaction and protect legitimate users. Second, if attacks can be detected close to attack sources, attack traffic can be filtered before it wastes any network bandwidth. However, there is generally insufficient attack traffic in the

early stage of an attack and at links close to attack sources. Consequently, it is easy to mistake legitimate traffic for attack traffic. Therefore, it is challenging to detect attacks accurately, quickly, and close to the attack sources. Finally, flash crowds are very similar to DoS attacks, in that both can cause network congestion and service degradation. However, flash crowds are caused by legitimate traffic, whereas DoS attacks are caused by malicious traffic. Hence, it is important to differentiate DoS attacks from flash crowds so that targets can react to them separately.

Generally, there are two measures for DoS attack detection. The first is detection time and the second is false positive rate. A good detection technique should have a short detection time and a low false positive rate. Generally, there are two groups of DoS attack detection techniques. The first group is called DoS-attack-specific detection, which is based on the special features of DoS attacks. The second group is called anomaly-based detection, which models the behavior of normal traffic and then reports any anomalies.
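To make the two measures just mentioned, detection time and false positive rate, concrete, the sketch below computes them for a hypothetical detector replayed against a labelled trace. The per-second trace format and the detector interface are assumptions made purely for illustration; they are not part of any scheme surveyed here.

```python
def evaluate_detector(attack_active, alarms):
    """attack_active[i] and alarms[i] are booleans for second i of a replayed trace."""
    # Detection time: seconds from the first attack second to the first alarm raised during the attack.
    detection_time = None
    attack_start = next((i for i, a in enumerate(attack_active) if a), None)
    if attack_start is not None:
        for i in range(attack_start, len(alarms)):
            if alarms[i]:
                detection_time = i - attack_start
                break
    # False positive rate: fraction of attack-free seconds on which an alarm was raised anyway.
    normal_seconds = [i for i, a in enumerate(attack_active) if not a]
    fpr = sum(alarms[i] for i in normal_seconds) / len(normal_seconds) if normal_seconds else 0.0
    return detection_time, fpr

attack = [False] * 60 + [True] * 60   # attack starts at t = 60 s
alarm = [False] * 63 + [True] * 57    # detector first fires at t = 63 s
print(evaluate_detector(attack, alarm))   # (3, 0.0): 3 s detection delay, no false alarms
```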

DoS-attack-specific Detection

Generally, DoS attack traffic is created at an attacker's will. First, attackers want to send as much traffic as possible to make an attack powerful. Hence, attack traffic does not observe any traffic control protocols, such as TCP flow control. In addition, there will be a flow rate imbalance between the source and the victim if the victim is unable to reply to all packets. Second, attack traffic is created in a random pattern to make an attack anonymous. Third, for each known attack, attack traffic at the target is highly correlated with abnormal traffic behavior at the attack sources.

Analysis of DoS-attack-specific Detection

MULTOPS

Gil and Poletto propose a scheme called MULTOPS [27] to detect denial of service attacks by monitoring the packet rate on both the up and down links. MULTOPS assumes that packet rates between two hosts are proportional during normal operation. A significant, disproportional difference between the packet rates going to and from a host or subnet is a strong indication of a DoS attack.

SYN and Batch Detection

Wang et al. [28] proposed SYN detection to detect SYN floods, and Blažek et al. [29] proposed batch detection to detect DoS attacks. Both methods detect DoS attacks by monitoring statistical changes. The first step for these methods is to choose a parameter of the incoming traffic and model it as a random sequence during normal operation. In [28], the ratio of SYN packets to FIN and RST packets is used, while in [29] a variety of parameters, such as TCP and UDP traffic volumes, are used. The attack detection is based on the following assumptions. First, the random sequence is statistically homogeneous. Second, there will be a statistical change when an attack happens.

Spectral Analysis

Generally, DoS attack flows are not regulated by TCP flow control protocols as normal flows are. Hence, DoS attack flows have different statistical features from normal flows. Based on this assumption, Cheng et al. propose to use spectral analysis [30] to identify DoS attack flows. In this approach, the number of packet arrivals in a fixed interval is used as the signal. In the power spectral density of the signal, a normal TCP flow will exhibit strong periodicity around its round-trip time in both flow directions, whereas an attack flow usually does not.

Kolmogorov Test

Normally, an attacker performs a DoS attack using large numbers of similar packets (in terms of their destination address, protocol type, execution pattern, etc.) generated from various locations but intended for the same destination. Thus, there is a lot of similarity in the traffic pattern. On the other

62 40 A Survey of DoS Attacks hand, legitimate traffic flows tend to have many different traffic types. Hence, traffic flows are not highly correlated and appear to be random. Based on this assumption, Kulkarni et al. proposed a Kolmogorov complexity based detection algorithm [31] to identify attack traffic. Time Series Analysis Based on the strong correlation between traffic behavior at the target and traffic behavior at the attack source, Cabrera et al.[32] have proposed a scheme to proactively detect DDoS attacks using time series analysis. There are three steps to this scheme. The first step is to extract the key variables from the target. For example, the number of ICMP echo packets is the key variable for Ping Flood attacks. The second step is to use statistical tools (e.g., AutoRegressive Model) to find the variables from the potential attackers that are highly related to the key variable. For example, the number of ICMP echo reply packets at the potential attackers is highly correlated with the key variable for Ping Flood attacks. The third step is to build a normal profile using the found variables from the potential attackers. Any anomalies from potential attackers compared with the normal profile are regarded as strong indications of an attack. Step one and two are completed during the off-line training period and step three is done on-line. Discussion All DoS-attack-specific detection techniques are based on one or more assumptions. In the following text, we will challenge each assumption as well as provide countermeasures to evade detection. MULTOPS MULTOPS assumes that the incoming packet rate is proportional to outgoing packet rate, which is not always the case. For example, real audio/video streams are highly disproportional, and with the widespread use of on-line movie and on-line news, where the packet rate from the server is much higher than from the client, false positive rates will become a serious concern for this scheme. Moreover,

63 2.3 Existing DoS Attack Defense Proposals 41 MULTOPS is vulnerable to attacks with randomly spoofed IP source addresses. The simplest way to cripple MULTOPS is to use randomly spoofed IP addresses, which makes the calculation based on genuine IP addresses inaccurate and consumes resources by storing spoofed IP address information. Another countermeasure is to connect to the target from a large number of attack sources in a legitimate manner (e.g. downloading a file from a ftp server). Therefore, the packet rate ratio between in flows and out flows 3 during the attack will appear to be normal and undetected by MULTOPS. SYN and Batch Detection The detection scheme in [28] is based on the fact that a SYN packet will end with a FIN or RST packet during normal TCP connection. When the SYN flood starts, there will be more SYN packets than FIN and RST packets. The attacker can avoid detection by sending the FIN or RST packet in conjunction with the SYN packets. To beat the detection scheme in [29], the attacker can carefully mix different types of traffic to ensure the proportion of each traffic is the same as it is in normal traffic. Therefore, separating different types of traffic cannot make the attack behavior more conspicuous. Spectral Analysis First of all, spectral analysis is only valid for TCP flows. As UDP and ICMP are connectionless protocols, the periodic traffic behavior is unexpected. Attackers can use UDP or ICMP traffic to confuse the detection scheme. Moreover, the attacker can mimic the periodicity of normal TCP flows by sending packets periodically. More importantly, attackers can make the reverse traffic from the target have the designed periodicity by using closed-looped protocols. For example, a large number of zombies can be directed to make legitimate TCP connections to the target. 3 We define in flow as the packet stream going to a host or subnet and out flow as the packet stream going from a host or subnet.
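For reference, using the in-flow/out-flow terminology defined in the footnote above, the disproportionality test that these countermeasures aim to evade can be sketched in a few lines. The threshold and example rates are illustrative assumptions, not the parameters used in [27].

```python
def disproportional(in_pkts_per_s: float, out_pkts_per_s: float, ratio_threshold: float = 10.0) -> bool:
    """Flag a host or subnet whose in-flow and out-flow packet rates are badly out of balance.

    MULTOPS-style heuristic: during normal two-way communication the two rates are roughly
    proportional, so a large imbalance in either direction is treated as suspicious.
    """
    hi, lo = max(in_pkts_per_s, out_pkts_per_s), min(in_pkts_per_s, out_pkts_per_s)
    if lo == 0:
        return hi > ratio_threshold          # purely one-sided traffic above a minimal rate
    return hi / lo > ratio_threshold

print(disproportional(in_pkts_per_s=50_000, out_pkts_per_s=800))   # True  - possible flood toward the host
print(disproportional(in_pkts_per_s=1_200, out_pkts_per_s=950))    # False - ordinary two-way traffic
```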

Time Series Analysis

The vulnerability of this scheme is that the efficacy of training is based on the features of known attacks. The attacker can disturb or disable the detection scheme by inventing new attacks. As DDoS attacks do not necessarily need to use any particular type of traffic, it is easy for the attacker to create a new type of attack simply by combining different types of attack traffic. This results in multiple key variables at the target, and the correlations between the variables from the potential attackers and the target become extremely complex, which complicates the process of building a normal profile and makes the detection less effective.

Kolmogorov Test

The Kolmogorov test rests on the assumption that multiple attack sources use the same DoS attack tool, so that the resulting traffic is highly correlated. Unfortunately, there is no theoretical analysis to support this assumption. Attack sources can be orchestrated to break the correlation by sending attack traffic at different times, with different traffic types, packet sizes, and sending rates. This is easy to achieve. For example, attackers can use the IP address of a compromised computer as the random seed to generate a set of parameters for configuring the attack traffic. By doing this, the attack traffic will appear random, which can bypass detection.

To conclude, the efficacy of DoS-attack-specific detection can be evaluated in two respects: the strength of its assumptions and its technical complexity. As shown in Table 2.2, most assumptions are not strong, since attackers can change their attack patterns to invalidate the assumption and evade detection. Although the assumption for spectral analysis is strong, it only works for TCP flows and it is complicated to implement.

Table 2.2: Basic assumptions for different attack detection techniques

  Detection Technique   | Basic Assumption                                                | Assumption Strength | Technical Complexity
  MULTOPS               | Incoming traffic rate is proportional to outgoing traffic rate | Medium              | Low
  SYN Detection         | Number of SYN packets = number of FIN + RST packets            | Weak                | Low
  Batch Detection       | Attack traffic is statistically unstable                       | Medium              | Low
  Spectral Analysis     | Attack flow does not have periodic behavior                    | Strong              | High
  Kolmogorov Test       | Attack traffic is highly correlated                            | Medium              | High
  Time Series Analysis  | Attacks are limited to known attacks                           | Medium              | Medium

Anomaly-based Detection

Signature-based detection and anomaly-based detection are two different approaches to network-based intrusion detection systems (IDS). Signature-based detection detects

the attack if the monitored traffic behavior matches known characteristics of malicious users, as indicated by a database. Anomaly-based detection detects the attack if the monitored traffic behavior does not match the normal traffic profile that is built using training data. Anomaly-based detection has drawn many researchers' attention since it can detect new attacks. Since DoS attacks are a class of network attacks, all anomaly-based network intrusion detection methods can be applied to detect DoS attacks.

Analysis of Anomaly-based DoS Detection

Statistically-based Anomaly Detection

Building a normal profile is the first step for all anomaly-based detection techniques. Since there is no clear definition of what is normal, statistical modeling plays a crucial role in constructing the normal profile. Statistically-based anomaly detection includes two major parts. The first part is to find effective parameters to generate similarity measures; the parameters can be the IP packet length, the IP packet rate, and so on. Manikopoulos et al. [33] propose to solve this key issue by using statistical preprocessing and neural network classification. The second part is to calculate the similarity distance. Statistical methods, such as χ²-like and Kolmogorov-Smirnov tests [34][33], have been used to provide similarity

66 44 A Survey of DoS Attacks metrics to evaluate the difference between the monitoring traffic and the expected normal traffic. If the distance between the monitored traffic and the normal traffic profile is larger than a given threshold, a DoS attack is detected. Artificial Immune System Inspired by human immunology, Forrest et al. developed a network-based IDS, called LISYS [35], using Artificial Immune System (AIS). LISYS is further extended by Bebo et al. [36]. The general idea for AIS-based network intrusion detection includes the following four steps. First, each IP packet is reduced to a string as its identity. For example, this string can contain the source IP address, destination IP address and destination port number. Second, during the training period, all packets that occur frequently are considered self, i.e. normal. Third, based on self, detector strings are created such that they do not match any self string. Fourth, when the number of incoming packets that match the detector string reaches a certain threshold, an attack is reported. Discussion The common challenge for all anomaly-based intrusion detection system is that it is difficult or impossible for the training data to provide all types of normal traffic behavior. As a result, legitimate traffic can be classified as attack traffic, which is known as a false positive. To minimize the false positive rate, a larger number of parameters are used to provide more accurate normal profiles. For example, in AIS-based IDS, longer strings can be used to improve the detection resolution. However, with the increase of the number of parameters, the computational overhead to detect intrusion increases. This becomes a bottleneck, especially for volume-oriented DoS attacks that will be aggravated by the computational overhead of the detection scheme. More importantly, unlike sophisticated network intrusions that depend on malformed packets or special packet sequences, DoS attacks only need the massive traffic

volume to be effective. Thus, different packet content or traffic patterns will not affect the attack power. Unlike other attacks, which are constrained to sending traffic that exploits a specific vulnerability, DoS attackers can mimic legitimate traffic to avoid anomaly-based detection. For example, an attacker can first use real data traces (either publicly available packet traces or monitored real network traffic) to create a normal traffic profile, and then create the attack traffic according to this profile. Moreover, a system that uses sophisticated detection algorithms will become a victim itself during a large-scale DoS attack.

Summary

DoS-attack-specific detection techniques generally use one or more features of DoS attacks, and can identify attack traffic effectively. However, all these techniques are based on one or more assumptions, which are not always reliable. Attackers can evade detection by violating these assumptions. Anomaly-based detection techniques face a dilemma in trading off processing speed against detection accuracy. Moreover, attackers can use legitimate traffic generators to avoid detection.

2.3.3 Attack Source Identification

Once an attack has been detected, an ideal response would be to block the attack traffic at its source. Unfortunately, there is no easy way to track IP traffic to its source. This is due to two aspects of the IP protocol. The first is the ease with which IP source addresses can be forged. The second is the stateless nature of IP routing, where routers normally know only the next hop for forwarding a packet, rather than the complete end-to-end route taken by each packet. This design decision has given the Internet enormous efficiency and scalability, albeit at the cost of traceability and

network security with respect to DoS attacks. In order to address this limitation, many schemes based on enhanced router functions or modifications of the current protocols have been proposed to support IP traceability.

IP Traceback by Active Interaction

The main feature of the IP traceback schemes in this category is that routers actively interfere with the attack traffic and trace the attack sources based on the reaction of the attack traffic.

Analysis of Active IP Traceback Schemes

Backscatter traceback [37][38] is a traceback scheme based on the fact that DoS attacks generally use invalid spoofed source IP addresses. Typically, DoS attack traffic uses randomly spoofed source IP addresses. However, some IP addresses (e.g., addresses in 10.*.*.*) have been reserved and are not globally routable. They can be used in private networks but are invalid in the Internet. The key procedures for backscatter traceback can be summarized as follows. First, the ISP configures all routers to drop all packets to the victim after a DoS attack is detected or reported. When a router rejects a packet, it also sends an ICMP destination unreachable error message packet (backscatter 4) to the source IP address. It is worth noting that during a DoS attack these source IP addresses are spoofed, and may be invalid IP addresses. Second, the routers are also configured to send all ICMP destination unreachable error message packets with invalid destination IP addresses to an analyzer. Since all these packets can only be caused by DoS attack traffic, the entry point of the attack traffic can be revealed by checking the source IP addresses of these collected ICMP packets. Moreover, a

4 In general, backscatter refers to the response to any unsolicited request (e.g., SYN/ACK packets sent to the spoofed source IP addresses). In this case, it refers to the ICMP destination unreachable error message packet.

69 2.3 Existing DoS Attack Defense Proposals 47 request can be sent to the upstream routers of the attack traffic entry point for further traceback. Burch and Cheswick [39] proposed a link-testing traceback technique. It infers the attack path by flooding all links with large bursts of traffic and observing how this perturbs the attack traffic. This scheme requires considerable knowledge of network topology and the ability to generate huge traffic in any network link. Generally, highspeed routers lack tracking ability, such as the ability to tell which link one packet comes from. Stone [40] proposed an overlay network 5 architecture to overcome this limitation. During DoS attacks, attack traffic (traffic to the target) is rerouted to the overlay network which is called CenterTrack. The CenterTrack is normally equipped with routers configured for tracking. Thus, the attack packets can be easily tracked, hop-by-hop, through the overlay network, from the routers close to the target to the attack entry point of the ISP. Discussion Generally, active IP traceback schemes can locate attack paths reliably and quickly. However, the common shortcoming for all active IP traceback schemes is that substantial control is needed to co-ordinate all participating routers, which is unlikely for the Internet. Consequently, active IP traceback schemes are only suitable for identifying attack path within one ISP s network, where the ownership of routers is unanimous. To evade backscatter traceback, an attacker only needs to use a valid (spoofed or non-spoofed) IP address, as the scheme is based on the assumption that DoS attack traffic will always contain invalid source IP addresses, for example, *.*. As link-testing traceback needs to flood the link to affect the attack traffic, it is questionable whether a target has the right or power to flood links for tracking purposes. 5 An overlay network is a new physical or logical connection of a set of nodes on top of the existing one. In Stone s proposal, it refers to a logical connection.

Besides, when the attack traffic follows multiple attack paths, only a small fraction of the attack traffic is present on any one attack path. Consequently, flooding a single link will cause only a negligible change in the total attack traffic, which renders the link-testing scheme less effective. The CenterTrack scheme creates a logical overlay network by IP tunneling. The overhead of creating the IP tunnels could amplify the negative effect of the DoS attack. In addition, DoS attacks that originate from within the overlay network cannot be tracked. Finally, it is not clear whether this scheme is scalable during a DDoS attack which has multiple entry points into the ISP.

Probabilistic IP Traceback Schemes

The general idea of all probabilistic IP traceback schemes is that routers probabilistically insert partial path information into the incoming traffic, and the target reconstructs the packet path using this partial path information.

Analysis of Probabilistic Traceback Schemes

Savage et al. proposed to trace back the IP source by probabilistic packet marking (PPM) [41]. The main idea of PPM is that each router probabilistically embeds its IP address (partial path information) into the incoming packets while they travel between the source and the destination. Based on the embedded path information, a target can reconstruct the packet transmission path. However, since no specific field has been reserved for tracking purposes in the current Internet protocol, IPv4 (such a field is expected in IPv6 [42]), encoding schemes are needed to squeeze the path information into rarely used fields (e.g., the 16-bit identification field in the IP header). Song et al. have improved the efficiency and security of the PPM scheme by introducing a new hashing scheme to encode the path information and an authentication scheme to ensure the integrity of the marking information [43]. More details about PPM can be found in Section 4.3. In [44], another coding scheme using an algebraic approach to embed

path information is proposed to reduce the number of packets needed to reconstruct the attack path. Bellovin [45] proposed a similar approach called the ICMP traceback scheme. In this scheme, routers generate, with a low probability, an ICMP traceback message (called an itrace packet) containing the address of the router and send it to the destination. For a significant traffic flow, the destination can gradually reconstruct the route that was taken by the packets in the flow. The itrace packets are generated with a very low probability by routers to reduce the additional traffic, which undermines the effectiveness of the scheme. To prevent attackers from spoofing the ICMP packets, an authentication field is used in the itrace packet. This scheme was later improved by Wu et al. [46].
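The sampling idea underlying these probabilistic schemes can be illustrated with a toy simulation: every router on the path overwrites a single marking field with its own identifier with some small probability, and the victim collects marks until every router on the path has been observed. This is a simplified node-sampling sketch for illustration only; real proposals such as [41][43] use edge sampling, distance fields, and compact encodings rather than whole addresses, and the router names and probability below are hypothetical.

```python
import random

def send_packet(path, p=0.04):
    """Simulate one packet traversing `path`; each router overwrites the mark with probability p."""
    mark = None
    for router in path:
        if random.random() < p:
            mark = router
    return mark

path = ["R1", "R2", "R3", "R4", "R5"]   # hypothetical attack path, attacker side first
seen, packets = set(), 0
while len(seen) < len(path):            # collect marks until every router has been observed
    packets += 1
    mark = send_packet(path)
    if mark is not None:
        seen.add(mark)

# Routers far from the victim are marked (and not overwritten) only rarely,
# which is why a substantial number of attack packets is needed for reconstruction.
print(f"All {len(path)} routers observed after {packets} packets")
```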

Discussion

Unlike active IP traceback, the probabilistic approaches trace IP sources passively, without interfering with the incoming traffic. Therefore, less control over routers and fewer computational resources are needed for the probabilistic approaches. One crucial assumption for all probabilistic approaches is that a significant amount of attack traffic travels across the attack path. However, during a highly distributed denial of service attack (e.g., a reflector attack [2]), the attack traffic comes from a large number of links. Hence, the number of attack packets is low on each independent link, where attack packets come from only one attack source. Therefore, these probabilistic approaches will fail to trace back the attack sources due to insufficient attack traffic on independent links. Although authentication schemes have been proposed to protect the marking field or the itrace packet, many implementation issues need to be studied further. For example, many authentication schemes use a public key infrastructure to sign the marked packet or itrace packet. However, it is not clear who has the right to sign a packet and how one can validate that signature. Moreover, how to find a trade-off between the level of security and the computational overhead is still an open research problem. Without secured marking information or itrace packets, it is noted in [47] that the attacker can inject IP packets with spoofed marking fields to mislead the path reconstruction, which makes the probabilistic approaches less effective. More recently, Waldvogel has shown that attackers can insert fake paths efficiently using Groups Of Strongly SImilar Birthdays (GOSSIB) [48] attacks against PPM schemes.

Hash-based IP Traceback

As discussed before, all the probabilistic approaches fail to identify attack paths when attack traffic is very scarce on each independent link during a highly distributed denial of service attack. Similarly, probabilistic approaches also fail to trace back the attack source when the attack contains only a small number of packets. For example, the ping-of-death attack needs only one excessively long ICMP packet. Consequently, a better traceback approach is needed, one that is not affected by traffic volume and is able to trace back even a single packet.

Analysis of Hash-based IP Traceback

Snoeren et al. [49] proposed a scheme, called hash-based IP traceback, to trace individual packets. In this proposal, routers keep a record of every packet passing through the router. A Bloom filter [50] is used to reduce the memory required to store the packet records. Moreover, in order to protect privacy, only packet digests, instead of actual packets, are stored. When a traceback is needed, the target sends a traceback query for one packet to its upstream traceback routers. A router can then identify this packet by checking its records, and pass the query on to its neighboring routers. Eventually, the packet origin can be located.
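A small sketch of the packet-digest idea follows: each router hashes a digest of every forwarded packet into a Bloom filter, so that it can later answer "did this packet pass through me?" with no false negatives and a controllable false positive rate. The filter size, hash construction, and digest contents below are illustrative assumptions, not the parameters used in [49][50].

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# A router stores digests of forwarded packets instead of the packets themselves.
router_log = BloomFilter()
packet_digest = hashlib.sha256(b"invariant header fields + payload prefix").digest()
router_log.add(packet_digest)

# Later, a traceback query asks whether this particular packet passed through the router.
print(packet_digest in router_log)                                   # True
print(hashlib.sha256(b"some other packet").digest() in router_log)   # False (with high probability)
```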

Discussion

This scheme is arguably the most effective scheme for tracing back DDoS attacks. However, the success of traceback depends on the number of tracking routers installed, and on the area covered by these routers. Although an efficient scheme is used to compress the storage, it is still a huge overhead for a router to implement this scheme, especially for high-speed traffic over a long period. Therefore, wide deployment is not expected in the near future, and the traceback strength is limited. More importantly, if a router with tracking facilities is compromised by an attacker, spoofed information can be generated to mislead the traceback.

2.3.4 Attack Reaction

Unlike more subtle attacks, such as remote-to-local attacks, DoS attacks try to damage the target as much as possible, and attackers do not attempt to disguise the attack, since the target will eventually be aware of the damage. All the detection and traceback techniques discussed above aim to shorten the time needed to detect the attack and locate the attack sources. In order to minimize the loss caused by DoS attacks, a reaction scheme must be employed while the attack is underway.

The aim of DoS attacks is to suffocate the target's communication channel, which

includes the target and the network links to which the target is connected. Figure 2.10 shows a simple model of a DoS attack, where thick lines represent high-bandwidth links and thin lines represent low-bandwidth links.

Figure 2.10: A model of DoS attack reaction schemes.

The bottleneck of a target's communication channel can be caused by low-bandwidth network links as well as by poorly-provisioned hosts. DoS attacks take effect once the resource limit of a bottleneck is reached. Hence, to minimize attack damage, the initial attack reaction is to protect the bottleneck's resources, which is called bottleneck resource management. Once the bottleneck resource is protected, the target is able to restore partial service instead of being completely paralyzed by the attack. However, since the Internet is a resource-sharing architecture, resources will be wasted unless attack traffic is removed at its source. The wasted resources degrade the service quality of any host, including the target, that shares a path with the attack traffic. Moreover, if the attack volume is large enough, new bottlenecks will appear even though the poorly-configured resources are protected. As shown in Figure 2.10, the link between router C and the victim is the bottleneck. Attack damage can be alleviated if bottleneck resource management schemes are used to protect this link. However, when the attack traffic volume is excessively high, the bandwidth limit of link A-B will be reached, and normal users S1 and S2 will fail to reach the victim. To protect S1 and S2, attack reaction should be taken at router B. We define intermediate network reaction as the attack reaction taken at the routers between the attacker and the victim. In an ideal situation, attack traffic should be filtered at the source (router A), which is called source end reaction. These three types of attack reaction are illustrated in Figure 2.10.

Bottleneck Resource Management

Resource management is a mechanism by which the victim reorganizes its resource management schemes so that its system is robust against bandwidth attacks. It

75 2.3 Existing DoS Attack Defense Proposals 53 includes two types of schemes. One is the host resource management scheme, which takes effect in the end host, another is the network resource management scheme, which takes effect in the network link. Analysis of Bottleneck Resource Management Host Resource Management One approach to managing host resources is to modify operating systems to fix software-based vulnerabilities. For example, systems using SYN cookies [51] do not need to keep half-open states, and are less vulnerable to SYN flood attacks. Another host resource management scheme is to punish attack traffic and reserve resources for well-behaved users or processes using end-to-end resource accounting [52] and traffic shaping [53]. In [53], Frank et al. have also proposed to use a server farm together with a load balancer to enhance a web server s capacity. With this increased capacity, the web server is able to handle more web requests and is less likely to be disabled by a bandwidth attack. Network Resource Management While the host resources are effectively managed, network resources are likely to become the bottleneck during DoS attacks. How to manage and protect network resources becomes a key step for DoS attack defense. In [54], Lau et al. have shown that class based queuing (CBQ) [55] algorithms can ensure bandwidth for certain classes of input flows, while Random Early Detection (RED) [56] performs poorly with regard to DDoS attacks. This lies in the fact that CBQ classifies traffic and reserves resources for each class of traffic. Yau et al. [57] have proposed a feedback control scheme on the router to throttle the aggressive (attack) traffic flow with max-min fairness. This scheme can proactively rate-limit the attack traffic before it reaches the server, and therefore forestalls the DDoS attack.
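The reservation idea behind class-based queuing and router throttling can be sketched with a per-class token bucket: each traffic class is guaranteed, and limited to, its configured rate, so an aggressive class cannot starve the others. The class names, rates, and interface below are illustrative assumptions, not the CBQ or max-min throttling algorithms of [55][57].

```python
import time

class TokenBucket:
    def __init__(self, rate_pkts_per_s: float, burst: float):
        self.rate, self.capacity = rate_pkts_per_s, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # this class has used up its share; drop or queue the packet

# Hypothetical classes: established TCP traffic is given a larger share than unclassified traffic.
limiters = {"established_tcp": TokenBucket(8000, burst=800),
            "unclassified": TokenBucket(500, burst=50)}

def forward(packet_class: str) -> bool:
    """Forward a packet only if its class still has tokens available."""
    return limiters[packet_class].allow()

print(forward("unclassified"))   # True until the small bucket drains, then False
```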

76 54 A Survey of DoS Attacks Discussion As bottleneck resource management mechanisms aim to deploy at the target or routers close to the target, it is easy to implement. Most commercial DoS attack solutions belong to this type. Both host and network resource management schemes need to classify traffic into several types, and then treat them differently. Unfortunately, it is rather difficult to give an accurate classification as DoS attack traffic can mimic any type of legitimate traffic. Without a proper rule to characterize attack traffic, the target will fail to provide services to legitimate users. Even though a sophisticated algorithm can do a better job on classifying traffic, a large scale of DoS attack can succeed by exploiting its resource-consuming feature. Consequently, any type of large scale DoS attacks that simulate normal traffic behavior will defeat bottleneck resource management schemes. Alternatively, some companies try to eliminate the bottleneck by simply increasing both host and network resources. For example, high profile websites, such as Yahoo and Microsoft, generally weather DoS attacks by investing an enormous amount of money on expanding the server capacity and the Internet connection bandwidth. This solution is arguably very effective. However, it entails a huge financial expense which only a few websites can afford. More importantly, this solution only increases the difficulty for a successful attack, and does not eliminate the DoS attack threat fundamentally. An excessively large DoS attack, such as the code-red worm [15], is still able to succeed. Intermediate Network Reaction As we analyzed above, protecting bottleneck resources only relieves attack damage instead of eliminating attacks completely. It is essential to filter attack traffic close to attack sources. The first benefit is to save bandwidth which will otherwise be wasted by attack traffic. The second benefit is to separate attack traffic from legitimate traffic geographically. Given no accurate attack signature is available at a single

location, the closer the defense location is to the attack sources, the more legitimate traffic will be protected. We define intermediate network reaction as the defense mechanism that filters attack traffic using routers between the attack sources and the target. Unfortunately, it becomes more and more difficult to detect DoS attacks as the distance between the detection point and the target increases, due to the reduced attack evidence. Therefore, a communication mechanism is needed to keep the routers between the target and the attack sources informed of an attack. These routers then start to filter attack traffic according to information provided by the victim or developed by their local defense agents. In the following, we introduce three types of intermediate network reaction schemes: the pushback and controller-agent schemes are based on active cooperation between routers and a victim, while the secure overlay service is based on anonymous routing and multiple-level filtering.

Analysis of Intermediate Network Reaction

Pushback Scheme

Mahajan et al. [58] provide a scheme in which routers learn a congestion signature to tell good traffic from bad traffic based on the volume of traffic to the target from different links. The router then filters the bad traffic according to this signature. Furthermore, a pushback scheme is given to let the router ask its adjacent routers to filter the bad traffic at an earlier stage. By pushing the defense frontier towards the attack sources, more legitimate traffic will be protected.

Controller-Agent Scheme

Tupakula and Varadharajan [59] propose an agent-controller model to counteract DoS attacks within one ISP domain, which is illustrated in Figure 2.11.

Figure 2.11: Intermediate network reaction: controller-agent scheme.

In this model, agents represent the edge routers and controllers represent trusted entities owned by the ISP. Once a target detects an attack, it sends a request to the controller, asking all agents to mark all packets destined to the target. After checking the marking field, the target can find out which agent (edge router) is the

entry point for the attack traffic. The target then sends a refined request to the controller, asking some particular agents to filter attack traffic according to the attack signature provided by the target.

Secure Overlay Service

Keromytis et al. proposed an architecture called secure overlay service (SOS) [60] to secure the communication between confirmed users and the victim. As shown in Figure 2.12 [60], all the traffic from a source point is verified by a secure overlay access point (SOAP). Authenticated traffic is routed to a special overlay node called a beacon in an anonymous manner by consistent hash mapping. The beacon then forwards traffic to another special overlay node called a secret servlet for further authentication, and the secret servlet forwards the verified traffic to the victim. The identity of the secret servlet is revealed to the beacon via a secure protocol, and remains secret from the attacker. Finally, only traffic forwarded by the secret servlet chosen by the victim can pass its perimeter routers. There are two design rationales behind SOS. First, the SOAPs essentially act as a distributed firewall. With a large number of SOAPs working in a distributed manner,

79 2.3 Existing DoS Attack Defense Proposals 57 Figure 2.12: Basic SOS architecture each SOAP only needs to deal with a small proportion of the attack traffic. Therefore, sophisticated protocols, such as IPsec [61], can be used to verify the goodness of the traffic. Secondly, the final node that connects to the victim is unknown to attackers. Therefore, attackers cannot find any vulnerable link of the victim. Discussion The basic assumption for all schemes is that there is a limited number of attack paths, and not all legitimate traffic shares a path with the attack traffic. Without confidence in accurately differentiating attack traffic from legitimate traffic at a single location, all schemes try to identify attack paths based on network topology. By filtering attack traffic along the attack paths, at least legitimate traffic that does not share the path with attack traffic will be protected. Unfortunately, the assumption fails when the attack traffic is uniformly distributed. For example, reflector attack traffic can easily be geographically distributed by choosing reflectors from different locations. Consequently, all intermediate network reaction schemes are vulnerable to a large scale reflector attack.

80 58 A Survey of DoS Attacks Pushback Scheme This scheme is effective against most DDoS attacks except uniformly distributed attack sources. Moreover, it needs a narrow and accurate congestion signature to make sure only attack traffic is filtered while legitimate traffic is not affected. Since the pushback scheme aggregates attack traffic according to destination IP addresses, it is vulnerable to attack traffic with spoofed source addresses. Moreover, this scheme infers attack sources by checking the traffic volume to the victim on each upstream link. If the attack sources are highly distributed, the traffic volume to the victim on each upstream link will appear to be similar, which invalidates the pushback scheme. Controller-Agent Scheme The aim of the controller-agent model is to filter attack traffic at the edge routers of one ISP domain. Since there are two communication processes 6 among the target, controllers and agents, it is doubtful whether the control messages can get through during network congestion, and whether the attack reaction is quick enough to curtail the attack. Moreover, since this model is limited to a single ISP domain, an attacker can paralyze a target by flooding the whole ISP s network given enough attack power. More importantly, if attack sources are geographically distributed, attack traffic can appear from most, if not all, entry points of an ISP. Therefore, the attack traffic will share most entry points with legitimate traffic. Then the effectiveness of the model depends on the capability to separate attack traffic from legitimate traffic at the entry points, which is a rather challenging task. Secure Overlay Service SOS addresses the problem of how to guarantee the communication between legitimate users and a victim during DoS attacks. Keromytis et al. demonstrate that SOS can greatly reduce the likelihood of a successful attack. 6 The first one is the marking process and the second one is the filtering process.

81 2.3 Existing DoS Attack Defense Proposals 59 The power of SOS is based on the number and distribution level of SOAPs. However, wide deployment of SOAPs is a difficult DoS defense challenge. Moreover, the power of SOS is also based on the anonymous routing protocol within the overlay nodes. Unfortunately, the introduction of a new routing protocol is in itself another security issue. If an attacker is able to breach the security protection of some overlay node, then it can launch the attack from inside the overlay network. Moreover, if attackers can gain massive attack power, for example, via worm spread, all the SOAPs can be paralyzed, and the target s services will be disrupted. Source End Reaction As the ultimate goal for DoS attack defense is to filter attack traffic at the source, Mirković et al. proposed a scheme called D-WARD [62] to defend against DoS attacks at the source network, where the attack sources are located. Analysis of D-WARD First, D-WARD collects flow statistics by constantly monitoring two-way traffic between the source network and the rest of the Internet. The flow statistics include the ratio of in-traffic and out-traffic, the number of connections per destination, etc.. Second, it periodically compares the measured statistics with normal flow models, where a separate normal flow model is built for each type of traffic. Third, once a flow mismatches the normal flow model, it will be classified as an attack flow, and will be filtered or rate-limited. Discussion D-WARD addresses the fundamental DoS attack defense rationale: removing attack traffic at its source. However, it faces the following two challenges. First, for a large scale of DDoS attack, attack traffic generated by one source network can be very small and unnoticed compared with legitimate traffic flows. Hence,

detecting attack traffic accurately can be difficult or impossible. A well-organized, geographically distributed DoS attack is likely to defeat this scheme, as attackers can keep the attack traffic originating from each source network within the normal range. Second, while D-WARD plays a similar role to ingress filtering, it is more expensive to implement. Consequently, the motivation for deployment is a big concern.

2.4 Summary

With the release of new operating systems, users are given more power over computer resources. For example, a normal user of Windows XP Home Edition is allowed to access raw sockets, a facility that can be used for IP spoofing and that is available only to root users on Unix-like operating systems. Furthermore, both the number of Internet users and the users' bandwidth have kept increasing dramatically. Unfortunately, the average security knowledge of current Internet users is decreasing while attacks are becoming more and more sophisticated [3]. As a result, the attack power is expanding rapidly. On the other hand, although the security community works very hard to patch vulnerabilities, the effect of these defenses is limited due to the lack of central control of the Internet.

One important step towards combating DoS attacks is to increase the reliability of the global network infrastructure. Having more secure computer systems on the Internet will greatly reduce attackers' power to launch large-scale DDoS attacks. Another important step is global cooperation, for example, cooperative IP traceback. However, it is a long and difficult path to achieve these goals. The main reason is that there is a lack of economic incentive for personal users or ISPs to invest money in security measures that mainly protect other people's networks. A usage-based billing system proposed in [63] might provide a certain level of motivation for personal users to secure their own systems. More importantly, similar problems, such as the Tragedy

of the Commons [64], have been solved through legislation. We are optimistic that the DoS attack problem can draw the attention of lawmakers, and that global cooperation can be enforced by legislative measures.
7 The tragedy of the commons [64] occurs when individuals try to maximize their own benefit while ignoring the public interest.


85 Chapter 3 History-based Attack Detection and Reaction 3.1 Introduction In this chapter, we focus on defending against DDoS attacks using Victim Model approaches. In our Victim Model, the victim is an edge router, which could be a first-mile router or last-mile router. The advantage of victim model approaches is to detect and identify the attack traffic at a single location. Since single location defense does not require cooperation between routers, it is easy to implement. Moreover, all DDoS defense mechanisms need a mechanism to identify and block attack traffic. Therefore, victim model approaches can be used as a basic component of all DDoS defense mechanisms. We propose a simple but robust scheme to detect DDoS attacks by monitoring the increase in new IP addresses, and to filter the attack traffic using an IP Address Database. Our scheme, called History-based Attack Detection and Reaction (HADR), has several important advantages over earlier proposals. Unlike earlier proposals for bandwidth attack detection [27][28][29][30][31][32][36][33] that are either based on 63

86 64 History-based Attack Detection and Reaction unreliable assumptions or too complicated to implement, our scheme is very effective for highly distributed and sophisticated denial of service attacks. Our scheme exploits an inherent feature of DDoS attacks, which makes it hard for the attacker to counter this detection and reaction scheme by changing their attack signature. Our scheme uses a sequential nonparametric change point detection method to improve detection accuracy without requiring a detailed model of normal and attack traffic. At the end of this chapter, we demonstrate that we can achieve high detection and filtering accuracy on a range of different network packet traces with low computational overhead. The rest of this chapter is organized as follows. Section 3.2 provides the background for DDoS attacks. Section 3.3 gives a detailed definition of the attack detection and reaction problem. Section 3.4 gives the motivation of our scheme. Section 3.5 discusses in detail our solution to this problem. Section 3.6 explains our proposed techniques for bandwidth attack detection and reaction. Section 3.7 gives an example of how to build an efficient IP Address Database. Section 3.8 presents the simulation results of our detection and reaction mechanism. Section 3.9 discusses an evaluation of our defense scheme. Section 3.10 discusses how our defense mechanism fits the existing Internet protocols and devices. 3.2 Background Sophisticated tools to gain root access to other people s computers are freely available on the Internet. This process is known as cracking [65]. These tools for cracking are easy to use, even for unskilled users. Once a computer is cracked, it is turned into a zombie under the control of a master computer. The master is operated by the attacker. The attacker can instruct all its zombies to send bogus data to one particular destination. The resulting traffic can clog links, and cause routers near the target or the target itself to fail under the load. The type of DoS attack that

87 3.2 Background 65 causes problems by overloading the target with useless traffic is known as a bandwidth attack. At present, there are no effective means of defending against bandwidth attacks due to the following reasons. Both IP and TCP can be misused to launch a serious DoS attack. Since all Web traffic is TCP/IP based, attackers can release their malicious packets on the Internet without being conspicuous or easily traceable. It is the sheer volume of all packets that poses a threat rather than the characteristics of individual packets. Therefore, a bandwidth attack solution is more complex than a straightforward filter in a router. Two key problems to tackle when solving bandwidth attacks are attack detection and attack reaction. Detection of a bandwidth attack might be easy in the vicinity of the victim, but becomes more difficult as the distance (i.e., the hop count) to the victim increases. The underlying reason is that most bandwidth attacks are launched from distributed sources. This means that the attack traffic is spread across multiple links, which makes it more diffuse and harder to detect. In order to react to the attacks, we need to characterize the attack traffic accurately. When the attack traffic become distributed, the volume of the attack flows are indistinguishable from legitimate traffic flows, which makes it extremely difficult to identify and filter the attack traffic. As defined in Section 2.2.3, DDoS attacks include typical DDoS attacks and distributed reflector denial of service (DRDoS) attacks. Our proposed detection and reaction mechanism is focused on how to defend against these two types of DDoS attacks, which are one of the most challenging threats to Internet security [2]. For simplicity, we refer to these two types of DDoS attacks as highly distributed denial of service (HDDoS) attacks. Our scheme also provides defense against some naive DoS attacks, such as attacks from one or a small number of sources. However, this is not the main focus of our approach since naive DoS attacks can be easily solved using

88 66 History-based Attack Detection and Reaction traditional traffic control mechanisms [66][67]. In our defense model, as the first step to tracing the attacker, we focus on detecting and filtering the attack traffic between the reflector and the victim. Unless otherwise stated, when we talk about DRDoS attack detection in the rest of this thesis, we are referring to detecting the attack traffic from the reflectors to the victim, i.e., the third stage of the DRDoS attack as defined in Section Previously proposed approaches rely on monitoring the volume of traffic that is received by the victim [58][57][29]. A major drawback of these approaches is that they do not provide a way to differentiate DDoS attacks from flash crowds, where a large number of legitimate users access the same website simultaneously. Due to the inherently bursty nature of Internet traffic, a sudden increase in traffic may be mistaken as an attack. If we delay our response in order to ensure that the traffic increase is not just a transient burst, then we risk allowing the target to be overwhelmed by a real attack. Moreover, some persistent increases in traffic may not be attacks, but actually flash crowd events. Because of the lack of an accurate attack traffic model, attack traffic filtering is done with a very coarse granularity. For example, the victim is configured to filter attack traffic according to a particular destination port or destination IP address [58]. Hence, legitimate traffic will be misclassified and filtered. Clearly, there is a need for a better approach to detecting bandwidth attacks. A better approach is to monitor the number of new source IP addresses, rather than the local traffic volume. Jung et al. [68] have observed that during bandwidth attacks, most source IP addresses are new to the victim, whereas most source IP addresses in a flash crowd appeared at the victim before. In this chapter, we propose to monitor the number of new IP addresses in a given time period in order to detect bandwidth attacks. We demonstrate that this is a more sensitive variable for detecting bandwidth attacks than monitoring the total

89 3.2 Background 67 volume of incoming traffic. In addition, we present a method for detecting changes in our monitoring variable, based on the non-parametric Cumulative Sum (CUSUM) algorithm [69][28]. The CUSUM algorithm reduces the false positive rate, and has been shown to be optimal in terms of detection accuracy and computational overhead for parametric models, where the data follows a known distribution. The CUSUM algorithm also has good performance for non-parametric models, where the data distribution is unknown [69]. We also propose an attack reaction mechanism to filter out attack traffic based on an IP Address Database. We present experimental results to show that this mechanism can filter attack traffic efficiently while protecting most of the legitimate traffic. There are three main contributions in this chapter. 1. We propose a novel approach to detecting bandwidth attacks by monitoring the arrival rate of new source IP addresses. We show that this approach is much more effective than earlier schemes, especially when there are multiple attack sources and the attack traffic is highly distributed. We adapt the detection scheme proposed by Wang et al. [28], which is based on an advanced non-parametric change detection scheme, CUSUM, and demonstrate that this approach detects a wide range of simulated attacks quickly and with high accuracy. 2. We combine our new approach with a traditional detection scheme that checks the rate of each flow. We demonstrate that we can detect all DDoS attacks by using this combined approach. 3. We propose a new scheme called History-based IP Filtering to filter attack traffic accurately and efficiently. In particular, we provide a set of guidelines for building an efficient IP Address Database.

3.3 Problem Definition
There are three challenging goals for our defense mechanism: (1) how to detect attacks in the vicinity of the attack source; (2) how to detect attacks quickly in the victim's network; (3) how to react to the detected DDoS attacks.

Detecting Attacks in the Vicinity of the Attack Source
Highly Distributed Denial of Service (HDDoS) attacks are extremely difficult to detect in the vicinity of the attack sources. Suppose the traffic volume needed to shut down a network is $V$, and the HDDoS attack traffic is distributed over $U$ links. It might be easy to detect the HDDoS attack at the victim, since $V$ is significantly larger than the normal traffic volume. However, the attack traffic volume close to an attack source will be indistinguishable from the normal traffic, since $V/U$ becomes very small if $U$ is sufficiently large. Previous DDoS solutions, such as probabilistic packet marking [41] and router-based pushback [58], will become less effective in this context, since they are all based on the assumption that the attack traffic volume is large on each attack path. This assumption does not hold for highly distributed denial of service attacks. However, in order to prevent the attack traffic from consuming network bandwidth, it is essential to detect and block HDDoS attacks as close as possible to the attack sources, namely, at the first-mile routers.

Detecting Attacks Quickly in the Victim's Network
As we mentioned in the previous section, detecting an HDDoS attack at the victim is not hard, since all the attack traffic has been aggregated and the victim will experience a high packet dropping rate and degraded service. However, it is too late to react to the attack at this stage. Victims normally choose to contact the upstream ISPs [41], which is a time-consuming process. An alternative approach is to

91 3.3 Problem Definition 69 implement an automatic pushback control mechanism to block the attack traffic at upstream routers [58]. For example, when a HDDoS attack is detected by the victim, a message is sent to the victim s upstream routers. This message contains a description of the attack traffic, and a request to filter that traffic. It is essential to send the pushback message as soon as possible to successfully defend against DDoS attacks. Consequently, we need a rapid detection mechanism so that the control message can be sent in the early stages of an attack. Although this can be achieved by simply lowering the detection threshold so that we shorten the detection time, this results in an increase in the false alarm rate. Thus, it is crucial for us to detect attacks accurately when the attack traffic is not large enough to congest the network links Reacting to the Detected DDoS Attacks Once an attack has been detected, we need a mechanism for stopping the attack either by blocking the attack traffic or removing the attack sources. Arguably, removing the attack sources is the ideal solution to stop the attack since the attack traffic will be eliminated permanently. However, given a large number of attack sources, it will take considerable effort to locate and remove all the attack sources, e.g., cooperation between different organizations or implementation of traceback schemes [41][49] in most of the routers. In contrast to removing the attack sources, blocking the attack traffic seems more practical since it can be done directly by the victim or the victim s upstream routers. This raises the question of how to differentiate legitimate traffic from attack traffic. Attackers will try to imitate the legitimate traffic as closely as possible to avoid detection. Hence, the defender is facing a dilemma: either filter the legitimate traffic as well as the attack traffic, or risk being flooded by the overwhelming attack traffic. There are three challenges for blocking the attack traffic. First, an accurate rule is needed to distinguish the attack traffic from the legitimate traffic so that legitimate

92 70 History-based Attack Detection and Reaction traffic can still reach the victim while the attack traffic is being filtered. Second, the attack traffic model should use an intrinsic feature of the attack traffic, otherwise a simple change of attack signature will render the IP filtering useless. Third, the filtering rule must be simple and the filtering process should be computationally efficient, otherwise the filtering process will become a denial of service attack itself. 3.4 Motivation for History-based Attack Detection and Reaction Early forms of DoS attacks involved a fixed pattern of attack traffic. For example, smurf attacks use ICMP packets. Thus, it is easy to block the attack traffic according to the protocol number. Unfortunately, recent research [70] shows that DDoS attack tools are sophisticated enough to generate packets with randomly spoofed source addresses, source ports and even packet payloads. This makes the filtering process much more difficult and requires the use of advanced traffic modeling techniques Stopping DoS Attacks Using Traffic Control Algorithms A DoS attack can be viewed as a group of greedy users trying to compete for both network and server resources. When they occupy most of the target s resources, few resources can be used by legitimate users, and the target s service is disrupted. This can be classified as a problem of how to share a target s resources fairly. Traditional traffic control schemes solve this problem by allocating a proportion of resources for each traffic flow [66] and punishing any greedy flows [67] at the router or server. Traffic control algorithms classify flows according to the information obtained from the IP packet header, such as source IP address in IPv4 or flow label in IPv6 [42]. Therefore, with the deployment of these traffic control algorithms, a DoS attack that

93 3.4 Motivation for History-based Attack Detection and Reaction 71 contains a single or a small number of flows can only consume a small amount of resources and hence will be less threatening. However, attackers can counteract these traffic control mechanisms by using a large number of attack flows. This can be achieved by either spoofing many traffic flows from a single or multiple attack sources, or by generating attack traffic from a large number of compromised computers. The traffic volume of each flow can then be kept sufficiently small to avoid being classified as a greedy flow. The power of this type of attack is that the aggregation of these small traffic flows can occupy most of the resources, leaving few for legitimate users. Hence, the real threat is a DoS attack with a huge number of traffic flows, such as DDoS attacks Intrinsic Attack Feature In order to detect and filter DoS attack traffic, we require a means of characterizing whether a given packet is legitimate or part of an attack. A common attack scenario is for attackers to use spoofed traffic, especially spoofed source IP addresses. There are two main reasons for this. First, as we discussed in Section 2.2.3, spoofed source IP addresses are an essential component of certain types of powerful attacks, e.g., SYN flood attacks. Second, spoofed source IP addresses can increase the number of flows in the attack traffic and reduce the risk of being filtered by traffic control mechanisms. In the absence of detailed information about legitimate flows, attackers generally choose to spoof source IP addresses randomly in order to mimic normal flows. In a different attack scenario, when attackers control a large number of compromised computers, spoofed source IP addresses are not necessary. Attackers might use the genuine IP addresses of compromised computers in order to avoid detection by deployed egress/ingress filters. Large numbers of compromised computers can normally be obtained only by exploiting a common software vulnerability. Computers

with such a software vulnerability can potentially appear anywhere in the network at random. Consequently, we expect that source IP addresses of non-spoofed attack traffic from compromised computers will follow some form of random distribution. To conclude, in both scenarios, DoS attack traffic can be characterized as a large number of flows with randomly distributed source IP addresses. In Section 3.9, we examine the consequences for our approach if this assumption does not hold. In the case of normal traffic, the nature of the service provided by a target determines the stability of its physical user group. Users could come from a particular geographical region, e.g., Australia, or a particular profession, e.g., medical institutes. Because of the IP address allocation structure, there is a loose correlation between a physical user and an IP address [71]. Accordingly, the source IP addresses of legitimate traffic are likely to have some degree of consistency. In contrast, the source IP addresses of attack traffic will follow a random distribution and be inconsistent with the source IP addresses of normal traffic. For example, it was observed that only a very small percentage of IP addresses in a Code Red attack had appeared before [68]. In contrast, Jung et al. [68] have found that around 82.9% of all IP addresses in observed flash crowd events had sent a request before. Based on statistical analysis of Internet traffic collected in a medium-sized stub network that is described in Section 3.8, we have found that a large proportion of IP addresses appearing in Internet traffic consistently reappear from day to day. This implies that a reliable feature for identifying legitimate traffic is that its source addresses have regularly appeared at the site before. More discussion about IP address consistency can be found later in this chapter.

Attack Detection and Reaction
A fundamental security problem for IP networks is that there is no scheme to authenticate the source of an IP packet [3]. If a router or web server has the ability to

95 3.5 Our Solution: History-based Attack Detection and Reaction 73 identify which are legitimate IP addresses, the authentication problem can be solved. As we analyzed before, source IP addresses of legitimate traffic are more likely to be consistent with previous source IP addresses that have appeared at the target, while the source IP addresses of attack traffic are likely to be new. Therefore, in order to tell good packets from bad packets, the router or web server should learn from its network connection history. Our proposal is to record all the source IP addresses of the previous successful network connections in order to compile an IP address database. For the incoming traffic, if there is a large number of IP addresses that do not appear in the IP Address Database, then a possible network attack has occurred. If our network or website experiences a high level of congestion, we can discard packets whose source address does not appear in our IP address database. In the case of a DDoS attack, this means that we have a high confidence of filtering out attack packets. In the case of a flash crowd, this means the quality of service will be guaranteed for frequent users that are indicated by the IP Address Database. 3.5 Our Solution: History-based Attack Detection and Reaction We propose a scheme called History-based Attack Detection and Reaction (HADR) to defend against the Highly Distributed Denial of Service (HDDoS) attack. HADR uses an intrinsic feature of HDDoS attacks, namely, the abnormally large number of new IP addresses in the attack traffic to the target. This novel approach has two important advantages: (1) it can detect attacks close to their sources in the early stages of the attack, and (2) its filtering accuracy is not affected by the changes in the attack signature.
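To make this idea concrete, the following minimal sketch (in Python, with hypothetical names; not the implementation described later in this chapter) shows the basic decision: source addresses learned from previous successful connections are admitted when the network is congested, and all other packets are dropped.

```python
# Minimal sketch of history-based filtering (hypothetical names, illustration only).
ip_address_database = set()   # source IPs from previous successful connections

def learn(source_ip):
    """Record a source IP after a successful (legitimate) connection."""
    ip_address_database.add(source_ip)

def admit(source_ip, congested):
    """Under congestion or attack, admit only previously seen sources."""
    if not congested:
        return True                       # normal operation: accept everything
    return source_ip in ip_address_database

# Frequent users keep their service during congestion; unknown sources are dropped.
learn("192.0.2.10")
assert admit("192.0.2.10", congested=True)
assert not admit("203.0.113.77", congested=True)
```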

96 74 History-based Attack Detection and Reaction Overview of the HADR Scheme Figure 3.1 provides an overview of our HADR scheme. The HADR scheme consists of three parts: a detection engine, a decision engine, and a filtering engine. The detection engine analyzes the incoming traffic pattern to detect any abnormalities. The decision engine summarizes the results from the detection engines and decides whether an attack is occurring. The filtering engine, called History-based IP Filtering, filters the attack traffic according to the identified attack traffic pattern. Note that there are two detection engines. The first detection engine, called New Address Detection Engine, is used to detect highly distributed denial of service attacks, while the second detection engine, called Flow Rate Detection Engine, is used to detect nondistributed attacks from a small number of sources. As we can see from Figure 3.1, the detection engines monitor the traffic through a passive (read-only) interface which is pre-configured with a non-routable IP address. This implementation feature can make the detection engines immune to the attacks since the detection engines are invisible to the attacker New Address Detection Engine In order to protect the target effectively, it is essential to detect attacks quickly and accurately. The New Address Detection Engine is the core technology of our detection scheme, and is shown in the shaded part of Figure 3.1. It contains two parts: off-line training, and detection and learning. Off-line Training The purpose of off-line training is to maintain a database of normal IP address, known as the IP Address Database. A learning engine keeps the IP Address Database

97 3.5 Our Solution: History-based Attack Detection and Reaction 75 Figure 3.1: The architecture of History-based Attack Detection and Reaction

98 76 History-based Attack Detection and Reaction updated by adding new legitimate IP addresses and deleting expired IP addresses 1. This is done off-line to make sure the traffic data used for training does not contain any attack traffic. Simple rules can be used to decide whether a new IP address is legitimate or not, for example, a TCP connection with less than 3 packets is considered to be an abnormal IP flow. In Section 3.6.2, we describe how to implement and maintain an efficient IP Address Database. Detection and Learning The purpose of detection and learning is to detect traffic with a large number of new IP addresses and collect normal traffic data traces for off-line training. In this mode of operation, our aim is to measure the percentage of new IP addresses during a sampling period. In the New Address Detection Engine, a hash table is used to record the IP addresses that appeared in the current sampling interval, and whether the address appears in the IP Address Database. Every hash table entry contains two fields, the number of IP packets and the time stamp of the most recent packet for that IP address, which is illustrated in Figure 3.2. If an IP address appeared during the sampling period and it is not in the IP Address Database, it is considered to be a new IP address. By analyzing the number of new IP addresses during the sampling period compared to the size of IP Address Database, we can detect whether a HDDoS attack is occurring. The New Address Detection Engine then outputs its detection result to the decision engine. If no attack is detected, the traffic data trace is then kept to be further sanitized for off-line training. 1 Each IP address is assigned a life limit in the IP Address Database, where the life limit corresponds to the time since a packet from that IP address was last seen. Once its life limit is reached, the IP address expires.
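As a rough illustration of this off-line learning step, the sketch below applies the two criteria mentioned above (a TCP flow with fewer than 3 packets is ignored, and entries past their life limit are expired); the function name, the flow representation and the parameters are assumptions, not the thesis implementation.

```python
def offline_training(tcp_flows, database, now, life_limit):
    """Update the IP Address Database from a sanitized (attack-free) trace.

    tcp_flows:  iterable of (source_ip, packet_count, last_packet_time) tuples.
    database:   dict mapping source IP -> time stamp of the most recent packet.
    life_limit: seconds an address may remain in the database without being seen.
    """
    for source_ip, packet_count, last_packet_time in tcp_flows:
        if packet_count < 3:
            continue                      # fewer than 3 packets: treat as an abnormal flow
        previous = database.get(source_ip, 0)
        database[source_ip] = max(previous, last_packet_time)
    # Expire addresses whose life limit has been reached.
    expired = [ip for ip, seen in database.items() if now - seen > life_limit]
    for ip in expired:
        del database[ip]
    return database
```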

99 3.5 Our Solution: History-based Attack Detection and Reaction 77 Figure 3.2: The hash table for the detection engine Flow Rate Detection Engine The purpose of the Flow Rate Detection Engine is to detect some unsophisticated attacks that use a small number of source IP addresses. The Flow Rate Detection Engine sorts the incoming IP flows according to source IP addresses using the hash table shown in Figure 3.2. During every sampling period, the Flow Rate Detection Engine calculates the traffic volume for each IP address, known as the flow rate. If the flow rate is larger than a certain threshold, an alarm is set to indicate a bandwidth attack. The Flow Rate Detection Engine then outputs the detection result to the decision engine. A detailed description of how the flow rate threshold is set appears in Section Decision Engine The purpose of the decision engine is two-fold: (1) it combines the detection results of the New Address Detection Engine and the Flow Rate Detection Engine to obtain a more accurate result, and (2) it controls the operation of the filtering engine. After receiving the outputs from the two detection engines, the decision engine summarizes the detection results according to a predefined rule, which is discussed in detail in Section The decision engine only generates two detection signals: (1) an attack

100 78 History-based Attack Detection and Reaction is occurring, or (2) no attack. If the decision engine identifies an attack, it sends a control signal to activate the filtering engine. If no attack has been diagnosed in the decision engine for a certain period of time, which is a predefined system parameter, a control signal is sent to stop the filtering engine Filtering Engine: History-based IP Filtering The purpose of the History-based IP Filtering is to protect the target s resources by removing the packets whose source IP addresses are not in the IP Address Database. First, the History-based IP Filtering is loaded with the IP Address Database in the form of a hash table, for example, the hash table described in Figure 3.2. Second, for each incoming IP packet, the source IP address is extracted and used to lookup the hash table. If no matching hash entry is found, the packet is considered to be an attack packet, and will be dropped accordingly. The History-based IP Filtering can be implemented in software or hardware. The complexity of the filtering operation depends on the complexity of the hash function used to construct the IP Address Database. In general, the lookup operation for our hash table is a constant time operation, i.e., it has complexity O(1) Placement of the HADR Wang et al. [28] discussed how attack detection can be performed at either the firstmile or last-mile edge routers. Our History-based Attack Detection and Reaction (HADR) scheme can be installed at either the first-mile or the last-mile edge router, or both. As shown in Figure 3.3, each edge router can be both the first-mile router and last-mile router, depending on the direction of traffic flows between the local network and the Internet. For the packets going out of the local network (outbound traffic), the edge router is their first-mile router. On the other hand, for the incoming

101 3.5 Our Solution: History-based Attack Detection and Reaction 79 Figure 3.3: The placement of the HADR scheme packets into the local network (inbound traffic), the edge router is their last-mile router. Thus, we can deploy the HADR in both inbound and outbound interfaces of the edge router. We call the HADR modules deployed in the edge router s inbound and outbound interfaces the last-mile HADR and the first-mile HADR respectively. The first-mile HADR plays the primary role in detecting a flooding attack, due to its proximity to the sources of the flooding attack. However, the detection sensitivity may decline as the number of attack sources increases. In a large-scale DDoS attack, the attack sources can be orchestrated so that attack traffic generated by each source is scarce and looks similar to legitimate traffic. Consequently, the attack traffic close to the source will cause only an insignificant deviation from the normal traffic pattern. The advantage of the first-mile HADR is to prevent attack traffic from congesting shared Internet resources. Due to the poor reliability of global network infrastructure, isolated security incidents, such as a computer being compromised by an attacker, is inevitable. The first-mile HADR can prevent attack traffic from entering the defended network regardless of whether there are any compromised computers inside the network. Moreover, the first-mile HADR can also help to identify the attack sources within the network and notify the owners of those computers used as attack

102 80 History-based Attack Detection and Reaction sources. However, as the first-mile router s service is hardly affected by the scarce attack traffic, the ISP is less motivated to implement the first-mile HADR since it mainly serves to protect others networks. In contrast, the last-mile HADR can quickly detect an attack since all attack traffic is aggregated at the last-mile router. Although it cannot provide any hint about the bandwidth attack sources, the filtering engine described in Section can be triggered to protect the victim. In order to disable the target under protection, the bandwidth attack sources must significantly increase their attack traffic rates. However, this increased attack traffic makes it easier for the first-mile HADR to detect the bandwidth attack and its sources. The advantage of the last-mile HADR is to identify attack traffic quickly and protect the victim s network effectively. Furthermore, quick detection at the lastmile router can provide more time to inform upstream routers to defend against DDoS attacks cooperatively. Hence, the ISP is more motivated to implement the last-mile HADR. However, the disadvantage of the last-mile HADR is that it cannot stop network bandwidth consumption. To conclude, it is ideal to implement both the first-mile and the last-mile HADR. However, there is less incentive for networks to deploy a first-mile HADR. This is because its cost is suffered by the deploying network, which does not benefit substantially from its operation. Similar problems in history, such as, the Tragedy of the Commons [64], have been solved by legislation. It is expected that legislative action will punish those that do not take efforts to secure their systems in the future. Therefore, deploying the first-mile HADR would become part of a general security commitment, and only networks that deploy the first-mile HADR or similar mechanisms will not be liable for hosting attack sources.

3.6 HADR Design
The key issues in the design of the HADR include (1) which detection feature to use, (2) how to build an efficient IP Address Database, (3) how to implement the detection algorithm, and (4) how to index the incoming IP flows.

The Choice of Detection Feature: New IP Addresses
In comparison with earlier detection proposals [27][28][29][30][31][32][36][33], the key aspect of our detection scheme is that we use new IP addresses as our detection feature. In this section, we discuss this choice of detection feature and its benefits.

Analysis of the Available Features
Let us define the following three traffic scenarios. Normal traffic conditions represent the situation when there is no attack or network congestion. Flash crowds represent the situation when many legitimate users start to access one website at the same time. HDDoS attacks include the typical DDoS attack and the DRDoS attack as defined in Section 2.2.3. Let $A_{normal}$, $A_{flash}$, $A_{attack}$ represent the number of packets, and $B_{normal}$, $B_{flash}$, $B_{attack}$ represent the number of new IP addresses, for normal traffic conditions, flash crowds and HDDoS attacks respectively in the same sampling interval. We define the Monitoring Point as the router where we collect the traffic statistics. For all the features in the above scenarios, we use the same Monitoring Point, and the same IP Address Database is used to calculate the number of new IP addresses.

Let us first discuss the situation when we put the Monitoring Point close to the target, where all the attack traffic aggregates. As we discussed in Section 2.2.3, for a typical DDoS attack, the attack traffic uses randomly spoofed source IP addresses, which will be new to the IP Address Database. For a Distributed Reflector Denial of Service (DRDoS) attack, the attack traffic uses the reflectors' IP addresses. Although the attack traffic is not spoofed, it is unsolicited traffic. The attacker directs the zombies to send the spoofed request traffic to some highly provisioned third parties, for example, backbone routers, as observed in [20]. Thus, the traffic between the zombie and the reflector is easy to disguise among the high-volume background traffic. Generally, a backbone router only generates IP packets to communicate with other backbone routers using the Border Gateway Protocol (BGP) [24]. Therefore it is unusual for IP packets that originate from the backbone routers to appear at the Monitoring Point close to the target. Consequently, most of the source IP addresses of the DDoS attacks will be new to the IP Address Database. A flash crowd event is similar to a DDoS attack from the traffic volume point of view. However, most of the source IP addresses of the flash crowd traffic have appeared at the Monitoring Point before, which has been justified in [68]. We can summarize these observations in terms of the following relations when the Monitoring Point is close to the target.

$$A_{normal} \ll A_{flash} \approx A_{attack} \quad (3.1)$$
$$B_{normal} < B_{flash} \ll B_{attack} \quad (3.2)$$

When the Monitoring Point is far from the target, the attack traffic is very diffuse. For a typical DDoS attack, since the attack traffic from each zombie is randomly spoofed, almost every packet will be new to the Monitoring Point. Hence $B_{attack}$ will be large, although not as large as when the Monitoring Point is close to the target. For a DRDoS attack, the attack traffic from one reflector only contains one new IP

flow to the Monitoring Point. The treatment of this type of attack will be discussed in detail in the following section, and we exclude this situation here for simplicity of discussion. Hence, the following relations apply when the Monitoring Point is far from the target.

$$A_{normal} \approx A_{flash} \approx A_{attack} \quad (3.3)$$
$$B_{normal} \approx B_{flash} < B_{attack} \quad (3.4)$$

Choice of Detection Feature
From the analysis above, the traffic volume cannot be used to differentiate a flash crowd from a DDoS attack when the Monitoring Point is close to the target. Even worse, it cannot differentiate any of these three types of network scenarios when the Monitoring Point is far from the target. However, the number of new IP addresses is effective in differentiating the DDoS attack from the normal traffic condition and the flash crowd. The Internet is a very complicated and dynamic entity and it is nearly impossible to characterize Internet traffic by a simple model [72][73]. Thus, we cannot use the traffic volume as our detection feature, since we can easily be misled by the bursty nature of Internet traffic, which means a sudden increase in traffic volume is not necessarily a bandwidth attack. Therefore, we choose the number of new IP addresses as our detection feature. To demonstrate the effectiveness of our detection feature, we embedded traffic from a simulated attack into a real-life packet trace. The packet trace that we used was from the Internet connection to the University of Auckland [74], on 19 March. We simulated a 5-minute DDoS attack with an attack rate of 160 packets/s. Both the attack length and the attack rate are representative values that are commonly observed in the Internet [16][75]. As shown in Figure 3.4, we can hardly observe

[Figure 3.4 contains two panels covering 12pm to 4pm: the number of packets every 10 seconds (traffic volume) and the percentage of new IP addresses every 10 seconds, with the attack interval marked in each panel.]
Figure 3.4: Effect of choice of detection feature on detecting the occurrence of an attack.
Figure 3.5: The sampling intervals for History-based Attack Detection and Reaction
any sign of attack when analyzing the traffic purely on traffic volume, because of the bursty nature of the Internet traffic. In contrast, we can easily observe a large peak caused by the attack traffic when analyzing the percentage of new IP addresses in the sampling interval. This is because the percentage of new IP addresses stays at a very low level during normal operation. This makes the attacks detectable using our HADR scheme, even when the attacks are highly distributed.

How to use the detection feature
We collect the IP addresses during each sampling interval $\Delta_n$ ($n = 1, 2, 3, \ldots$), which determines the detection resolution. As shown in Figure 3.5, $\Delta_1 = \Delta_2 = \cdots = \Delta_n$, which means the sampling intervals are of equal length. The choice of $\Delta_n$ is a compromise between making $\Delta_n$ small, so that the detection engine can quickly detect an attack, and making $\Delta_n$ large, so that the detection engine has less computational load because it checks the traffic less often. Let $T_n$ represent the set of unique IP addresses observed during the sampling interval, and $D_n$ represent the entries in the IP Address Database, at the end of the sampling interval $\Delta_n$ ($n = 1, 2, 3, \ldots$). The number of new IP addresses in $\Delta_n$ is defined as $|T_n| - |T_n \cap D_n|$. We can use this measure to detect the onset of a DDoS attack. However, $|T_n| - |T_n \cap D_n|$ varies according to the location of the Monitoring Point and different $\Delta_n$. For example, $|T_n| - |T_n \cap D_n|$ observed at the last-mile router of a small ISP will be different from what could be measured in a large corporate network. We can normalize this value by defining
$$X_n = \frac{|T_n| - |T_n \cap D_n|}{|T_n|},$$
which will not be affected by the Monitoring Point and $\Delta_n$. Consequently, we use $X_n$ as our detection variable. The algorithm is shown in Figure 3.6 (a sketch of this per-interval computation is also given below). current_timestamp records the arrival time of a packet, reference_timestamp is the start point of a sampling interval, and sampling_interval determines the length of a sampling interval. During the sampling interval, we record the number of packets with the same IP address, and the arrival time of the most recent packet with that IP address. All the information is kept in a hash table. At the end of a sampling interval, we calculate the total number of IP addresses, and the total number of new IP addresses, in the hash table. Moreover, the detection variable $X_n$ is calculated, and used as input to the detection engines. Detection_Algorithm() represents the algorithms used by our detection engines. Finally, the variables are reset to calculate $X_n$ for the next sampling interval.
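The per-interval bookkeeping just described can be sketched as follows. A Python dictionary stands in for the hash table of Figure 3.2 and a set stands in for the IP Address Database; the names mirror the description above but are otherwise assumptions.

```python
from collections import defaultdict

def detection_variable(packets, ip_address_database, sampling_interval):
    """Yield (n, X_n) for consecutive sampling intervals.

    packets: time-ordered iterable of (timestamp, source_ip) pairs.
    ip_address_database: set of previously learned source IPs (D_n).
    X_n = (|T_n| - |T_n ∩ D_n|) / |T_n|, the fraction of new source IPs in interval n.
    """
    hash_table = defaultdict(lambda: [0, None])   # source IP -> [packet count, last timestamp]
    reference_timestamp, n = None, 0

    for current_timestamp, source_ip in packets:
        if reference_timestamp is None:
            reference_timestamp = current_timestamp
        # Close any completed sampling intervals before handling this packet.
        while current_timestamp - reference_timestamp >= sampling_interval:
            total = len(hash_table)                                   # |T_n|
            known = sum(1 for ip in hash_table if ip in ip_address_database)
            if total:
                yield n, (total - known) / total                      # X_n
            hash_table.clear()                                        # reset for the next interval
            reference_timestamp += sampling_interval
            n += 1
        entry = hash_table[source_ip]
        entry[0] += 1                       # number of packets from this address
        entry[1] = current_timestamp        # time stamp of the most recent packet
```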

Figure 3.6: New Address Detection Engine algorithm

IP Address Database Design
In this section we describe in detail how we differentiate legitimate packets from malicious packets by checking whether the source IP addresses of the incoming packets are in the IP Address Database.

IP Address Database
Let $S^i = \{s^i_1, s^i_2, \ldots, s^i_{n_i}\}$ denote the collection of all the legitimate source IP addresses that appeared in the network on day $i$, where $|S^i| = n_i$. Let $F_k = \{f_1, f_2, \ldots, f_m\}$ denote the collection of all the frequent legitimate IP addresses from days 1 to $k$, where $|F_k| = m$. In practice, we provide two alternative rules for defining frequent

later in this section. Let $A = \{a_1, a_2, \ldots, a_x\}$ denote the IP addresses appearing in a distributed denial of service attack. Please note that $s^i_{n_i}$, $f_m$, and $a_x$ denote IP addresses. Obviously, $F_k \subseteq (S^1 \cup S^2 \cup \cdots \cup S^k)$. We aim to analyze the IP address distribution within $S^1 \cup S^2 \cup \cdots \cup S^k$ and use a statistical approach to develop a threshold to determine the frequent user collection $F$. Let $p_{normal} = \frac{|F \cap S^j|}{|S^j|}$ represent the proportion of normal IP flows admitted by History-based IP Filtering on day $j$ ($j > k$), and $p_{ddos} = \frac{|F \cap A|}{|A|}$ represent the percentage of attack IP flows admitted by History-based IP Filtering. Ideally, we want $p_{normal}$ to be 1 and $p_{ddos}$ to be 0. We define the IP Address Database as the collection of frequent IP addresses that appeared over the training period, for example, one month. For History-based IP Filtering, it is very important to get an accurate and efficient IP Address Database. We develop two different rules to determine what is a frequent IP address. The first rule considers an IP address to be frequent based on the number of days on which it appeared during the training period. Let $p_1(d)$ represent the collection of unique IP addresses that each appeared on at least $d$ days. Let $f_1(d)$ represent the proportion of legitimate traffic getting through when using $p_1(d)$ as the IP Address Database. The second rule is based on the number of packets per IP address. Let $p_2(u)$ represent the collection of unique IP addresses that are each shared by at least $u$ packets. Let $f_2(u)$ represent the proportion of legitimate traffic getting through when using $p_2(u)$ as the IP Address Database. In practice, we want $p_1(d)$ and $p_2(u)$ to be small so that we can reduce the memory requirement for keeping the IP Address Database, and we want $f_1(d)$ and $f_2(u)$ to be large so that we can protect legitimate traffic. In conclusion, there are two design parameters involved in deciding whether an IP address is frequent: the minimum number of days ($d$) on which the IP address appeared, and the minimum number of packets per IP address ($u$). We can tune the parameters according to

different network conditions. Moreover, we can generate a more accurate and efficient IP Address Database by using a combination of these two rules as follows:
$$F_c(d, u) = p_1(d) \cap p_2(u), \quad (3.5)$$
where $F_c(d, u)$ is the set of frequent IP addresses defined by the parameters $d$ and $u$.

Maintaining and Operating the IP Address Database
History-based IP Filtering is only activated when we detect a high level of server or network utilization that leads to packets being dropped. Under normal traffic conditions, History-based IP Filtering is not required, which allows us to learn new IP addresses. When the traffic volume is normal, we can extract IP addresses from the incoming packet trace in order to update the IP Address Database. In particular, we are only interested in learning valid IP addresses. For example, if an IP address appears in a successful TCP handshake, then it is considered to be valid, since a spoofed IP address would not be able to complete the three-way handshake. Consequently, attackers are forced to use their real IP addresses to attack the network. This limits the number of IP addresses that can be used by attackers. Thus, even if their valid addresses appear in the IP Address Database, we can easily detect the attackers due to the abrupt increase in traffic volume from their IP addresses during an attack. We can then trace the path of these sources using existing IP traceback schemes [41][43][9], and block these IP addresses for a certain period. During off-line training, we use a sliding window to remove expired IP addresses. In our experiments we set the length of this window to be 2 weeks. Each IP address in the IP Address Database has an associated time stamp, which records the last time that a packet was received from this IP source address. When we start our learning process, we subtract the time stamp of every entry in the IP Address Database from the current time stamp. If the difference is larger than 2 weeks, then the entry is removed from the IP Address Database. By using the sliding window we can always keep the most relevant IP addresses, and therefore reduce the memory requirements of the IP Address Database.
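Under the assumption that the combination in Eq. 3.5 is the intersection of the two collections, the database construction can be sketched as follows; daily_logs and the thresholds d and u are placeholders, not values from the thesis experiments.

```python
from collections import Counter

def build_ip_address_database(daily_logs, d, u):
    """Sketch of Eq. 3.5: F_c(d, u) = p_1(d) ∩ p_2(u).

    daily_logs: one list of legitimate source IPs (one entry per admitted packet)
                for each day of the training period.
    d: minimum number of days on which an address must appear.
    u: minimum number of packets an address must account for.
    """
    days_seen = Counter()      # source IP -> number of distinct days it appeared on
    packets_seen = Counter()   # source IP -> total packets over the training period
    for day in daily_logs:
        for ip in set(day):
            days_seen[ip] += 1
        packets_seen.update(day)

    p1 = {ip for ip, days in days_seen.items() if days >= d}        # frequent by days
    p2 = {ip for ip, pkts in packets_seen.items() if pkts >= u}     # frequent by packets
    return p1 & p2                                                   # F_c(d, u)
```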

Abrupt Change Detection
In order to detect a DDoS attack, we need to test for changes in our detection feature over time. However, our detection feature is a random variable due to the stochastic nature of Internet traffic. Consequently, we require a mechanism that can accurately discriminate between the onset of a DoS attack and a temporary random fluctuation in traffic.

Change Detection Modelling
Internet traffic can be viewed as a complex stochastic process, and any traffic abnormality, for example, an HDDoS attack, can lead to an abrupt change of the process. Our goal is to detect the change in the number of new IP addresses. There are two approaches to detecting this change. One is fixed-size batch detection [76], which monitors the change of the mean value over every fixed time period. Another is sequential change-point detection [76], which monitors the detection variable successively. The latter is designed to detect a change in the model as soon as possible after its occurrence, which meets the key design requirement for our detection engine. Thus, we choose to model our task as a sequential change point detection problem. Consider the illustrative example in Figure 3.7. For the random sequence $\{X_n\}$, there is a step change of the mean value at time $m$ from $\alpha$ to $\alpha + h$. We require an algorithm to detect changes of at least step size $h$ in the sequence $\{X_n\}$ and estimate the change point $m$ in a sequential manner.

[Figure 3.7 contains three panels: the detection feature $X_n$, whose mean rises from $\alpha$ to $\alpha + h$ at time $m$; the transformed detection feature $Z_n$, whose mean rises from $a < 0$ to $a + h$; and the CUSUM variable $y_n$, which crosses the threshold $N$ at time $\tau_N$.]
Figure 3.7: Illustration of the CUSUM algorithm
In our problem, $\{X_n\}$ represents the percentage of new IP addresses in a sequence of time periods. Inspired by Wang et al. [28], we formulate the random sequence $\{X_n\}$ as follows:
$$X_n = \alpha + \xi_n I(n < m) + (h + \eta_n) I(n \ge m), \quad (3.6)$$
where $\xi = \{\xi_n\}_{n=1}^{\infty}$ and $\eta = \{\eta_n\}_{n=1}^{\infty}$ are random sequences such that $E(\xi_n) = E(\eta_n) = 0$, and $h > 0$. $I(H)$ is the indicator function: it equals 1 when the condition $H$ is satisfied and 0 otherwise.

The CUSUM Algorithm
The CUSUM (Cumulative Sum) algorithm is a commonly used algorithm in statistical process control, which can detect a change in the mean value of a statistical process [76][69]. CUSUM relies on the fact that if a change occurs, the probability distribution of the random sequence will also change. Generally, CUSUM requires

a parametric model for the random sequence so that the probability density function can be applied to monitor the sequence. Unfortunately, the Internet is a very dynamic and complicated entity, and the theoretical construction of Internet traffic models is a complex open problem [72], which is beyond the scope of this thesis. Thus, a key challenge is how to model $\{X_n\}$. Since non-parametric methods are not model-specific, they are more suitable for analyzing the Internet. Consequently, we have implemented a non-parametric CUSUM (Cumulative Sum) method [69] in our detection algorithm. This general approach is based on the model presented in Wang et al. [28] for attack detection using CUSUM. The main idea behind the non-parametric CUSUM algorithm is that we accumulate values of $X_n$ that are significantly higher than the mean level under normal operation. One of the advantages of this algorithm is that it monitors the input random variables in a sequential manner so that real-time detection is achieved. Let us begin by defining our notation before we give a formal definition of our algorithm. As we mentioned in Section 3.6.1, $X_n$ represents the fraction of new IP addresses in the sampling interval $\Delta_n$. The top graph in Figure 3.7 shows an illustrative example of $\{X_n\}$. In normal operation, this fraction will be close to 0, i.e., $E(X_n) = \alpha \ll 1$, since there is only a small proportion of IP addresses that are new to the network under normal conditions [68][7]. However, one of the assumptions for the non-parametric CUSUM algorithm [69] is that the mean value of the random sequence is negative during normal conditions, and becomes positive when a change occurs. Thus, without loss of any statistical feature, $\{X_n\}$ is transformed into another random sequence $\{Z_n\}$ with negative mean $a$, i.e., $Z_n = X_n - \beta$, where $a = \alpha - \beta$ (see the middle graph of Figure 3.7). The parameter $\beta$ is a constant value for a given network condition, and it helps to produce a random sequence $\{Z_n\}$ with a negative mean so that the negative values of $\{Z_n\}$ will not accumulate over time. When an attack happens, $Z_n$ will suddenly become large and positive, i.e., $h + a > 0$,

where $h$ can be viewed as a lower bound on the increase in $X_n$ during an attack. Hence, $Z_n$ with a positive value ($h + a > 0$) is accumulated to indicate whether an attack is happening or not (see the bottom graph of Figure 3.7). One point worth noting is that $h$ is defined as the minimum increase of the mean value during an attack. In order to detect an attack, we derive a third variable $y_n$, which represents the accumulated positive values of $Z_n$, as illustrated in Figure 3.7. If $y_n$ exceeds the attack detection threshold $N$, then we consider that an attack has occurred. Our change detection is based on the observation that $h \gg \beta$. Now our detection problem is to find the abrupt change point $m$ in the random sequence $\{Z_n\}$, which is described as follows:
$$Z_n = a + \xi_n I(n < m) + (h + \eta_n) I(n \ge m), \quad (3.7)$$
where $a < 0$, $|a| < h < 1$, and the other conditions are the same as in Eq. 3.6. The formal definition of the non-parametric CUSUM algorithm is as follows:
$$y_n = Q_n - \min_{1 \le k \le n} Q_k, \quad (3.8)$$
where $Q_k = \sum_{i=1}^{k} Z_i$, with $Q_0 = 0$ at the beginning, and $y_n$ is our test statistic. In order to reduce the overhead for online implementation, we use the recursive version of the non-parametric CUSUM algorithm [76][69][29][28], which is shown as follows:
$$y_n = (y_{n-1} + Z_n)^+, \quad y_0 = 0, \quad (3.9)$$
where $x^+$ is equal to $x$ if $x > 0$ and 0 otherwise. A large $y_n$ is a strong indication of an attack. As we see in the bottom graph of Figure 3.7, $y_n$ represents the cumulative positive values of $Z_n$. We consider the change to have occurred at time $\tau_N$ if $y_{\tau_N} \ge N$.

Figure 3.8: CUSUM algorithm
The decision function can be described as follows:
$$d_N(y_n) = \begin{cases} 0 & \text{if } y_n \le N; \\ 1 & \text{if } y_n > N, \end{cases}$$
where $N$ is the threshold for attack detection, and $d_N(y_n)$ represents the decision at time $n$. The decision is 1 if the test statistic $y_n$ is larger than $N$, which indicates an attack. Otherwise, the decision is 0, which indicates normal operation, i.e., no statistical feature change for the random sequence $\{Z_n\}$. The CUSUM algorithm is shown in Figure 3.8.

Analysis of the CUSUM algorithm
It has been proved [76][69] that if the values in a time series are independent and identically distributed with a parametric model, CUSUM is asymptotically optimal for a variety of Change Point Detection problems. There are two requirements to apply CUSUM to the aforementioned random sequence $\{Z_n\}$. First, the dependence between random variables decreases as the time between them increases. Second, the random variable is bounded by a finite value. This has been formalized in [69][28] as follows:
1. $\psi(s)$, the $\psi$-mixing coefficient of $\{Z_n\}$, approaches 0 as $s \to \infty$, where

$\{Z_n\}_{n=1}^{\infty}$ is a random sequence. Let $F_j^k = \sigma\{\omega : Z_j, Z_{j+1}, \ldots, Z_k\}$ ($1 \le j \le k < \infty$), which is a $\sigma$-algebra generated by the random vectors $\{Z_n\}_{n=j}^{k}$. The $\psi$-mixing coefficient is defined as follows:
$$\psi(s) \stackrel{\mathrm{def}}{=} \sup_{t \ge 1} \; \sup_{A \in F_1^t, \, B \in F_{t+s}^{\infty}, \, P(A)P(B) \ne 0} \left| \frac{P(AB)}{P(A)P(B)} - 1 \right|, \quad (3.10)$$
where sup stands for supremum. As we see from the equation above, if the dependency among $\{Z_n\}$ is very weak, for example, for long range dependent arrival processes, $\psi(s)$ will approach 0 as $s \to \infty$.
2. The one-dimensional distribution of $Z_n$ satisfies the following regularity condition: there exists $H > 0$ such that $E(e^{t Z_n}) < \infty$ for $t \le H$, which means $Z_n$ will not be infinitely large.
In [76][69], the two conditions are described in more detail. Since $Z_n$ is derived from Internet traffic, where long range dependent arrival processes are common, the dependency among the $\{Z_n\}$ samples decays as the time interval increases. Thus, condition (1) is satisfied. Since $0 \le X_n \le 1$ and $Z_n = X_n - \beta$, where $\beta$ is a finite constant, $Z_n$ is also finite. Therefore, condition (2) is satisfied. Consequently, our detection variable $Z_n$ can easily satisfy these two weak requirements for applying the CUSUM algorithm. There are two key measures that are used to evaluate bandwidth attack detection systems. The first evaluation measure is the false alarm rate, which is one of the biggest concerns in the anomaly detection community [77]. If a system produces too many false alarms, it will waste a significant amount of time investigating whether each alarm indicates a real attack or not. If the attack reaction (such as packet filtering)
2 Formally, $X$ is a $\sigma$-algebra if and only if it has the following properties: (1) the empty set is in $X$; (2) if $E$ is in $X$ then so is the complement of $E$; (3) if $E_1, E_2, E_3, \ldots$ is a sequence in $X$ then their (countable) union is also in $X$.

is taken in response to the false alarm, innocent traffic will be unfairly punished and normal network services will be disturbed. The second evaluation measure is the detection time. A bandwidth attack detection system should detect the attack as soon as possible, so that suitable reaction schemes can be applied early to minimize or eliminate the attack damage. Unfortunately, these two parameters are a conflicting pair. It is hard to reduce the detection time and the false alarm rate at the same time. Therefore, a trade-off must be made between the two. As we mentioned before, the CUSUM algorithm is said to be optimal in minimizing the detection time as well as reducing the false alarm rate [76][69][28]. According to previous theoretical work [69] on the non-parametric CUSUM algorithm, the detection time $\tau_N$ and the normalized detection time after a change occurs, $\rho_N$, are defined as follows:
$$\tau_N = \inf\{n : d_N(\cdot) = 1\}, \qquad \rho_N = \frac{(\tau_N - m)^+}{N}, \quad (3.11)$$
where inf stands for infimum, and $m$ represents the starting time of the attack, as illustrated in Figure 3.7. The relation between $\rho_N$ and $h$ (the lower bound of the actual increase in $X_n$ during an attack) is described as follows [69]:
$$\rho_N \rightarrow \gamma = \frac{1}{h - |a|}, \quad (3.12)$$
where $a < 0$ is the mean of $\{Z_n\}$ during normal operation, and $h - |a|$ is the lower bound of the mean of $\{Z_n\}$ when an attack happens. Since the actual increase during an attack will be larger than $h$, the above equation provides a conservative estimate of the normalized detection time. The actual detection time should be shorter.
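A minimal sketch of the non-parametric CUSUM test defined by Eq. 3.9, Eq. 3.11 and Eq. 3.12 is given below. The value of alpha is a placeholder, and the example only illustrates how beta, gamma and N follow from a, h and the required detection delay; the settings actually used at the first-mile and last-mile routers are discussed in the next subsection.

```python
def cusum_detector(x_values, alpha, a, h, tau):
    """Sequential non-parametric CUSUM over the detection variable X_n.

    x_values: sequence of X_n values (fraction of new source IPs per interval).
    alpha:    mean of X_n under normal operation.
    a:        desired (negative) mean of Z_n = X_n - beta, so beta = alpha - a.
    h:        assumed lower bound on the increase of X_n during an attack.
    tau:      required detection delay, in sampling intervals.
    Yields (n, y_n, attack_flag).
    """
    beta = alpha - a                  # offset so that E(Z_n) = a < 0 under normal traffic
    gamma = 1.0 / (h - abs(a))        # Eq. 3.12: asymptotic normalized detection time
    N = tau / gamma                   # threshold derived from Eq. 3.11 with (tau_N - m)+ = tau
    y = 0.0
    for n, x in enumerate(x_values):
        z = x - beta
        y = max(0.0, y + z)           # Eq. 3.9: y_n = (y_{n-1} + Z_n)^+
        yield n, y, y > N             # decision d_N(y_n)

# Illustrative run: X_n jumps when the attack starts (alpha here is a placeholder).
trace = [0.02] * 20 + [0.6] * 5
for n, y, alarm in cusum_detector(trace, alpha=0.02, a=-0.05, h=0.1, tau=1):
    if alarm:
        print(f"attack flagged at interval {n}")
        break
```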

Parameter Specification
The two design goals, a low false alarm rate and a short detection time, are achieved by choosing the optimal parameters $\beta$ and $N$. $\beta$ is used to offset $\{X_n\}$ to give $\{Z_n\}$, which has a negative mean $a$ during normal operation. The larger $\beta$ is chosen, the less likely a positive value will appear in $\{Z_n\}$. Therefore, it is less likely that the test statistic $y_n$ will accumulate to a large value and indicate an attack. $N$ is the attack threshold for $y_n$. The larger $N$ is, the lower the false alarm rate, but the longer the detection time. According to Eq. 3.11 and Eq. 3.12, $N$ can be decided by $a$ and $h$, as we explain below. Moreover, $\beta = \alpha + |a|$. Thus, if $a$ (the mean of $\{Z_n\}$ during normal operation) and $h$ (the lower bound of the actual increase during an attack) are given, then $\beta$ and $N$ will also be decided. As mentioned earlier, it is hard to discuss the optimality of choices for $\beta$ and $h$ given the lack of a parametric model for $\{Z_n\}$. However, it is known that for the CUSUM algorithm the asymptotically optimal value is achieved when $h = 2|a|$ for one of its worst cases, a Gaussian random sequence [69]. This motivated us to choose $h = 2|a|$ in our experiments, similar to the approach in [28]. In Section 3.8, we demonstrate the effectiveness of this choice on simulated network attacks. Based on $a$ and $h$, we can determine $\beta$, the upper bound of $X_n$ during normal operation, and the detection threshold $N$. First, we use Eq. 3.12 to determine $\gamma$ given $h$ and $a$. We can then in turn use $\gamma$ as an estimate of $\rho_N$. Next, given a required detection time, which can be approximated by the product of $N$ and $\rho_N$, we can obtain $N$ from Eq. 3.11. For a given Monitoring Point, we can observe $E(X_n) = \alpha$ under normal conditions. Hence, $\beta$ can be calculated by $\beta = \alpha + |a|$. Since $\alpha$ varies according to different Monitoring Points, we will discuss the value of $\beta$ in our experimental evaluation. When the attack traffic converges at the last-mile router (close to the target), there is a large increase in the percentage of new IP addresses during an attack, which can be easily observed with $h \gg \alpha$. In other words, the change value $h$ caused by

When the attack traffic converges at the last-mile router (close to the target), there is a large increase in the percentage of new IP addresses during an attack, which can be observed easily because h is much larger than alpha. In other words, the change value h caused by the attack traffic will be large. Therefore, we simply choose a = 0.05, and hence h = 2a = 0.1, when our algorithm is used at the last-mile router. For the last-mile router, the false alarm rate is low because of the aggregated attack traffic behavior. Consequently, we are more concerned about the detection time and want it to be as short as possible. Thus, we set the minimum possible detection time to be tau_N = m + 1. Combining this value with a = 0.05 and h = 0.1 in Eq. 3.11 and Eq. 3.12, we obtain rho_N ~ gamma = 1/(h - a) = 1/(0.1 - 0.05) = 20 and N = (tau_N - m)^+ / rho_N = (m + 1 - m)/20 = 0.05.

However, the attack traffic at the first-mile router (close to the attack source) is much more diluted. This is because sophisticated attackers can generate attack traffic from multiple sources so that the attack sources do not stand out from the background traffic, i.e., the change value h contributed by the attack traffic will be small. In order to find a balance between detection sensitivity and false alarm rate, we choose a = 0.01 and h = 0.02 at the first-mile router. For the first-mile router, the most challenging task is to reduce the false positive rate because of the sparse attack traffic. Thus, we let tau_N = m + 3, which gives gamma = 1/(0.02 - 0.01) = 100 and N = 3/100 = 0.03. All these derived values satisfy the requirement for an asymptotically optimal CUSUM algorithm. However, all these values can be adjusted to suit local network conditions. Moreover, it is worth noting that for the same Monitoring Point, alpha can be updated periodically to provide the most accurate estimate of the random sequence {X_n}, and the other detection parameters can be updated accordingly.

Hash Techniques

Once we have populated the IP Address Database, it is essential to have a fast IP address lookup process, especially when the frequent user collection F is large. We propose two schemes for address lookup in the IP Address Database. The first is to use an appropriate hash function to build a single hash table. This scheme is simple to implement but requires a large amount of storage. To reduce the storage space, our second scheme uses a space-efficient data structure known as a Bloom filter [50].

Figure 3.9: For each IP packet, the Bloom filter computes k independent N-bit digests of the 32-bit source IP address, and sets the corresponding bits in the 2^N-bit table.

As shown in Figure 3.9, a Bloom filter computes k distinct IP address digests (d_1(s), d_2(s), d_3(s), ..., d_k(s)) for each IP source address s using k independent uniform hash functions (H_1, H_2, H_3, ..., H_k), and uses the N-bit results to index into a 2^N-sized bit array. Before the IP Address Database is stored in the Bloom filter, the array is initialized to all zeros. Then, for each IP address in the IP Address Database, the k indicated bits in the 2^N-sized bit array are set to one, as shown in Figure 3.9. To test whether an IP address is in the IP Address Database, we simply compute the k digests of the IP address and check the indicated bit positions. If any one of them is zero, the IP address is not in the IP Address Database. If all of the bits are one, it is highly likely that the IP address belongs to the IP Address Database. It is possible that an IP address that is not in the IP Address Database causes all the bit positions to be one, creating a false positive; however, the rate of this kind of false positive can be controlled [50]. As the size of the IP Address Database in our trace-driven experiment is moderate (about 37,000 addresses), we use a single hash table to store the IP Address Database for simplicity of evaluation. The Bloom filter can be used to store the IP Address Database for large websites, such as Yahoo and CNN.
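For illustration, here is a minimal Bloom filter sketch along the lines of Figure 3.9. It is not the thesis code; in particular, it emulates the k independent hash functions H_1, ..., H_k by salting a single cryptographic hash, which is an implementation convenience rather than part of the scheme described above.

import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch for 32-bit source IP addresses (cf. Figure 3.9)."""
    def __init__(self, n_bits_exp=20, k=4):
        self.size = 1 << n_bits_exp          # a 2^N-bit array (here N = 20)
        self.k = k                           # number of digests per address
        self.bits = bytearray(self.size // 8)

    def _digests(self, ip):
        # k digests d_1(s), ..., d_k(s), emulated here by salting one hash function
        for i in range(self.k):
            h = hashlib.sha256(("%d:%s" % (i, ip)).encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, ip):
        for d in self._digests(ip):
            self.bits[d // 8] |= 1 << (d % 8)

    def __contains__(self, ip):
        # any zero bit: the address is definitely not in the database;
        # all ones: it is in the database with high probability (false positives possible)
        return all(self.bits[d // 8] & (1 << (d % 8)) for d in self._digests(ip))

db = BloomFilter()
db.add("10.1.2.3")
assert "10.1.2.3" in db

A zero bit at any probed position proves the address was never inserted, while an all-ones answer is only probabilistic, which matches the controlled false positive behaviour discussed above.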

3.7 An Example of IP Address Database Design

The IP Address Database represents the history of previous connections to the protected network, and is the key component of the HADR. There are two fundamental performance measures for the IP Address Database design:

1. Accuracy: the percentage of legitimate IP addresses included in the IP Address Database.

2. Overhead: the size of the IP Address Database.

We want the second measure to be as small as possible while keeping the first measure as large as possible. However, these are conflicting goals and cannot both be optimized simultaneously. Therefore, the design philosophy of our heuristic method is to minimize the overhead of the IP Address Database while maintaining a certain accuracy. In order to achieve a good balance between accuracy and overhead, we demonstrate the design and performance of an IP Address Database based on real network packet traces.

The packet trace we use is a continuous 6½-week IP header trace taken at the University of Auckland on an OC3 (155 Mbps) Internet access link [74]. The total trace is about 180 GBytes when uncompressed. For privacy reasons, all the IP addresses have been mapped into 10.*.*.* using a one-to-one hash mapping. Since History-based IP Filtering is designed to defend against DDoS attacks from outside the network, we are only interested in traces of incoming packets to the University of Auckland network.

Due to the lack of publicly available packet traces over a long time period, we have also used a private packet trace provided by an industry partner. We refer to this dataset as the Small ISP Trace; it records one month of traffic that went into a class C network located in Australia, from 1 April 2000 to 30 April 2000.

In the following sections, we present an example of IP Address Database design in three steps. First, we specify a simple rule to clean the collected traffic traces. Second, we evaluate the consistency of IP source addresses in the two aforementioned data traces. Finally, we apply the two rules introduced in Section 3.6.2 to the two data traces to optimize the IP Address Database design.

3.7.1 Normal Traffic Behavior

In order to understand the normal network behavior, we need to remove all the network noise, such as network scan traffic and reflector attack traffic. A normal TCP connection includes at least six packets, i.e., SYN, SYN-ACK, ACK, FIN, FIN-ACK, ACK. Consequently, a TCP flow with fewer than six packets is considered to be an anomaly. Since we only analyze the traffic coming into the University of Auckland, any TCP flow with fewer than three packets is an indication of an anomalous flow. The source IP address in this type of anomalous TCP flow is ignored. This type of ignored traffic is typically network scan traffic, or the reply traffic from the victim of a denial of service attack whose spoofed source IP addresses happen to match IP addresses inside the University of Auckland.
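A minimal sketch of this cleaning rule is given below. It assumes a hypothetical flattened trace format of (source, destination, source port, destination port, protocol) tuples for incoming packets only, and simply discards source addresses that appear only in TCP flows of fewer than three packets.

from collections import Counter

MIN_TCP_PACKETS = 3   # incoming half of a normal six-packet TCP connection

def clean_sources(packets):
    """packets: iterable of (src_ip, dst_ip, src_port, dst_port, proto) tuples
    for incoming traffic only (hypothetical trace format).
    Returns the set of source addresses that appear in at least one
    TCP flow with three or more packets."""
    flow_sizes = Counter()
    for src, dst, sport, dport, proto in packets:
        if proto == "TCP":
            flow_sizes[(src, dst, sport, dport)] += 1
    # sources seen only in anomalous (< 3 packet) TCP flows are ignored, since
    # such flows are typically scans or backscatter from spoofed-source DoS victims
    return {src for (src, _dst, _sp, _dp), n in flow_sizes.items()
            if n >= MIN_TCP_PACKETS}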

3.7.2 Consistency of the IP Addresses

In order to validate our earlier analysis that the IP source addresses seen by a network are consistent during normal operation, we compare daily Auckland data traces taken from 26 March to 1 April with the previous two weeks' data traces and calculate the percentage of traffic that appeared before.

Table 3.1: Percentage of IP addresses in a single day that have previously appeared in the past fortnight (daily figures for the Auckland Trace, 26 March to 1 April 2001, and for seven days of the Small ISP Trace in April 2000).

As we can see from Table 3.1, about 88-90% of IP addresses appeared in the previous two weeks. This verifies our idea that most IP addresses that appear in the network under normal conditions have previously appeared in the network. We conducted similar tests on the Small ISP Trace, and found that about 80% of IP addresses appeared in the previous two weeks. Obviously, as the length of the history period increases, the percentage of recurring IP addresses will increase. However, due to the limited data traces, we chose two weeks as the history period.

3.7.3 How to Build an Efficient IP Address Database

There are two reasons to build an efficient IP Address Database. The first is that the size of the IP Address Database can be too big to maintain if we simply keep every IP address that the network has ever received. For example, the total number of IP addresses in the Auckland trace over two weeks (from 12 March 2001 to 25 March 2001) is 373,494. Since these data traces were collected from a medium-size institution, we would expect even more IP addresses for high-profile web sites, such as Yahoo or CNN.

The second reason is that routers and web servers only have limited power to process the incoming traffic during a denial of service attack or flash crowd, so a proportion of packets will be dropped anyway because of buffer overflow. In fact, among all the IP addresses that have appeared in the network before, there are only a small number of stable IP addresses that appear regularly, which we define as the frequent IP addresses. Therefore, it is very important to keep a compact list of IP addresses that have high priority for protection. By narrowing the range of IP addresses that we protect, we can reduce the IP lookup time for History-based IP Filtering and thus achieve a high throughput rate.

Figure 3.10: Percentage of IP addresses, |p1(d)|/T1, that appeared in at least d days (Auckland Trace and Small ISP Trace).

Let us now consider the use of the two rules p1(d) and p2(u) that we defined in Section 3.6.2 for selecting frequent addresses.

Rule 1 [p1(d)], the number of days: Normally, users surf the Internet at regular times and repeat their network usage behavior daily [78]. Thus, we can consider an IP address to be frequent based on the number of days on which it has appeared in the network. Let T1 represent the total number of IP addresses that appeared in 27 days (due to some corrupted data files, we were only able to obtain 27 days of continuous traffic data in our experiment).

As defined in Section 3.6.2, p1(d) represents the collection of unique IP addresses that each appeared on at least d days. Thus |p1(d)|/T1 represents the percentage of IP addresses that appeared on at least d days. As we can see from Figure 3.10, for both the Auckland Trace and the Small ISP Trace, the percentage of IP addresses that appeared on at least two days is only 40%. That means around 60% of the IP addresses appeared on only one day in the 27-day period. These could be considered infrequent IP addresses that are less likely to visit the network again. As d increases, the number of qualifying IP addresses decreases exponentially.

Figure 3.11: Percentage of IP addresses, |p2(u)|/T2, that generated at least u packets (Auckland Trace and Small ISP Trace).

Rule 2 [p2(u)], the number of packets per IP address: Generally, frequent IP addresses should be expected to send a certain number of packets to the network. For example, a normal user needs to generate at least 5 packets to download a web page (2 packets for the TCP connection establishment, 2 packets for the TCP connection

release, and 1 packet for the HTTP request). As defined in Section 3.6.2, p2(u) represents the collection of unique IP addresses that generated at least u packets. Let T2 represent the total number of IP addresses that appeared on 13 March 2001 for the Auckland Trace and on 18 April 2000 for the Small ISP Trace, both of which are Tuesdays and can be used as a sample of network traffic during weekdays. Thus |p2(u)|/T2 represents the percentage of IP addresses that have at least u packets. As we see in Figure 3.11, only 67% and 55% of the IP addresses have at least 5 IP packets during a one-day period for the Auckland Trace and the Small ISP Trace respectively. The reason why about 33% and 45% of the IP addresses in the two traces have fewer than 5 packets could be scan attacks, responses from DDoS attack victims, or network failures. The Auckland Trace only records three types of IP traffic (TCP, UDP, ICMP), and some obviously spoofed IP addresses have possibly been removed from the traces. The Small ISP Trace is completely raw IP traffic over the one-month period, and we have observed several types of denial of service attacks and scan attacks in this trace. This might explain why there are more IP addresses in the Small ISP Trace with fewer than 5 packets. We are only interested in protecting the IP addresses which have more than 5 packets during a DDoS attack. However, the network administrator can tune the minimum number of packets u to reflect local conditions. For example, u can be set to a large value to obtain a more efficient IP Address Database in the case of a high-volume attack or flash crowd event. Our measurements provide a guide to finding a suitable trade-off between minimizing the IP Address Database size and maximizing fairness.
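The selection procedure defined by Rules 1 and 2 can be sketched as follows. This is an illustrative reading of the rules, assuming the combined rule F_c keeps the addresses that satisfy both the day-count threshold d and the packet-count threshold u; note that in our experiments the packet counts for Rule 2 are measured over a single sample day, whereas this sketch accumulates them over the whole training period.

from collections import defaultdict

def build_ip_address_database(daily_sources, d_min, u_min):
    """daily_sources: mapping day -> list of source IP addresses, one entry per
    incoming packet seen on that day during the training period.
    Returns the frequent-address collection p1(d_min) intersected with p2(u_min)."""
    days_seen = defaultdict(set)       # ip -> days on which it appeared
    packet_count = defaultdict(int)    # ip -> packets attributed to it
    for day, sources in daily_sources.items():
        for ip in sources:
            days_seen[ip].add(day)
            packet_count[ip] += 1
    p1 = {ip for ip, days in days_seen.items() if len(days) >= d_min}
    p2 = {ip for ip, count in packet_count.items() if count >= u_min}
    return p1 & p2

# Example: keep addresses seen on at least 2 days that sent at least 5 packets.
# database = build_ip_address_database(training_traces, d_min=2, u_min=5)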

3.8 Performance Evaluation

The CUSUM algorithm detects changes based on the cumulative effect of the changes in a random sequence, rather than using a single threshold to test each individual value in the sequence. Therefore, with the CUSUM algorithm deployed, the performance of our detection scheme is not affected by whether the attack rate is bursty or constant. Due to the difficulty of conducting trials on a large-scale network with real-life traffic, the standard approach to evaluation in this field is to test detection and reaction schemes on publicly available data traces. To evaluate the efficacy of our detection and reaction scheme, the HADR, we conducted the following simulation experiments. As shown in Figure 3.12, we created different types of DDoS attack traffic and merged them with the normal traffic. The HADR was then applied to detect the attacks in the merged traffic and filter the attack traffic.

The normal traffic traces used in our study were collected at different times from three different sources. The first two sets were gathered at the University of Auckland with an OC3 (155 Mbps) Internet access link [74]. The third data trace is taken from the DARPA intrusion detection data set [79]. The fourth data trace was taken on a 9 Mbps Internet connection at Bell Labs [80]. A summary of the data traces used in our experiment is listed in Table 3.2. The last two data traces are not used in the example of IP Address Database design because the DARPA trace is not a trace of real traffic, and the length of the Bell-I trace is insufficient. We focus on evaluating the HADR using the Auckland data traces for the following reasons. First, only the Auckland data traces represent real network traffic over a considerably long period. Second, only the Auckland data traces record the traffic at the entry point of a medium-sized network, where the HADR is proposed to be implemented. Hence, the outgoing traffic data traces of the University of Auckland

can be used as the normal traffic background for the first-mile HADR, while the incoming traffic data traces can be used as the normal traffic background for the last-mile HADR.

Figure 3.12: The trace-driven simulation experiment.

Table 3.2: Summary of the packet traces used for testing
Trace        Trace Length   Creation Date   Traffic Type
Auck-IV-in   3 weeks        March 2001      Uni-directional
Auck-IV-out  3 weeks        March 2001      Uni-directional
DARPA        3 weeks        1999            Bi-directional
Bell-I       1 week         May 2002        Bi-directional

To evaluate the performance of the HADR, we need to generate simulated attack traffic. For simplicity of the experiment design, we assume the attack traffic rate to be constant. The attack period is set to 5 minutes, which is a commonly observed attack period in the Internet [16]. The attack traffic rate for all the simulated DDoS attacks is set to 1 Mbps. Since the network we are defending has a link capacity of 155 Mbps and a peak traffic rate of about 6 Mbps [74], we define 1 Mbps as the minimum traffic rate needed to disrupt the network services. We set this conservative attack traffic rate in order to test the detection sensitivity of the HADR. Attack traffic with a higher volume should be easier to detect, and hence is not covered by our performance evaluation.
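For reference, the trace-driven setup of Figure 3.12 could be assembled along the following lines: constant-rate spoofed attack packets are generated for a 5-minute window and merged with the normal trace by timestamp. The packet size, the trace format, and the spoofing routine are assumptions made for illustration only.

import random

ATTACK_RATE_BPS   = 1_000_000    # 1 Mbps, as in our evaluation
ATTACK_SECONDS    = 300          # 5-minute attack period
PACKET_SIZE_BYTES = 500          # assumed size of each attack packet

def generate_attack(start_time, n_sources):
    """Yield (timestamp, src_ip) pairs for a constant-rate attack that uses
    n_sources randomly spoofed source addresses."""
    pps = ATTACK_RATE_BPS / (PACKET_SIZE_BYTES * 8)
    sources = [".".join(str(random.randrange(1, 255)) for _ in range(4))
               for _ in range(n_sources)]
    for i in range(int(pps * ATTACK_SECONDS)):
        yield (start_time + i / pps, random.choice(sources))

def merge(normal_trace, attack_trace):
    """Merge two (timestamp, src_ip) streams into a single time-ordered stream."""
    return sorted(list(normal_trace) + list(attack_trace), key=lambda p: p[0])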

3.8.1 DDoS Detection Using the New Address Detection Engine

We use two weeks of data traces to build the IP Address Database for the Auck-IV-in, Auck-IV-out, and DARPA data traces, and 6 days of data traces to build the IP Address Database for the Bell-I data trace. Based on the pre-built IP Address Database, we conduct the following performance evaluation for the New Address Detection Engine. First, we illustrate the behavior of the detection variable for the Auck-IV-in, Auck-IV-out, and Bell-I data traces under normal network conditions. Second, we evaluate the false positive rate of the New Address Detection Engine for the Auck-IV-in, Auck-IV-out, and Bell-I data traces. Third, we use the DARPA data trace as an example to demonstrate the efficacy of the New Address Detection Engine during randomly spoofed DDoS attacks. Finally, we use the Auck-IV-in and Auck-IV-out data traces to evaluate the detection accuracy and detection time of the New Address Detection Engine when it is implemented in the first-mile router and the last-mile router respectively.

Normal Traffic Behavior

Before we evaluate the performance of the New Address Detection Engine, it is essential to verify the stability of the detection variable X_n. Auck-IV-in and Auck-IV-out represent the normal traffic behavior of a medium-sized network (an OC-3 connection to the backbone Internet), while Bell-I represents the normal traffic behavior of an intranet (with a 9 Mbps Internet connection to a local ISP). Our detection variable is the percentage of new IP addresses observed in each 10-second sampling interval (X_n). Unless otherwise stated, the sampling interval is set to 10 seconds for the rest of this chapter. Note that, to be consistent with the definition of the detection variable X_n, we represent the percentage of new IP addresses as fractions on the axes of the figures.
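For reference, the detection variable can be computed from a packet stream roughly as follows. This is a sketch only: it assumes packets carry a timestamp and a source address, and it computes the fraction of distinct source addresses per 10-second interval that are absent from the IP Address Database (how addresses are aggregated within an interval is an implementation detail assumed here).

def new_address_ratio(packets, ip_database, interval=10.0):
    """packets: iterable of (timestamp, src_ip) pairs sorted by time.
    Yields X_n, the fraction of distinct source addresses in each
    interval-second window that are new to the IP Address Database."""
    window_end, seen = None, set()
    for ts, src in packets:
        if window_end is None:
            window_end = ts + interval
        while ts >= window_end:                 # close finished windows
            if seen:
                new = sum(1 for ip in seen if ip not in ip_database)
                yield new / len(seen)
            seen = set()
            window_end += interval
        seen.add(src)
    if seen:                                    # flush the final window
        yield sum(1 for ip in seen if ip not in ip_database) / len(seen)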

Figure 3.13: Auck-IV-in Trace: the ratio of new IP addresses (X_n) calculated in time intervals of 10 seconds.

Figure 3.14: Auck-IV-out Trace: the ratio of new IP addresses (X_n) calculated in time intervals of 10 seconds.

Figure 3.15: Bell-I Trace: the ratio of new IP addresses (X_n) calculated in time intervals of 10 seconds.

Figure 3.16: Auck-IV-in Trace: CUSUM test statistics under normal operation.

Figure 3.17: Auck-IV-out Trace: CUSUM test statistics under normal operation.

Figure 3.18: Bell-I Trace: CUSUM test statistics under normal operation.

Figures 3.13, 3.14, and 3.15 show the behavior of this detection feature when applied to the three traces. The variable X_n in the Auck-IV-out Trace (Figure 3.14) is more stable than in the Auck-IV-in and Bell-I traces (Figure 3.13 and Figure 3.15). The reason lies in the fact that the population of users within a local network, such as the University of Auckland, is more stable than the population of users who access that network from the Internet; thus, there are very few IP addresses which are new to the IP Address Database. In contrast, the Bell-I data trace is bi-directional and contains traffic from both outside and inside the network. Hence, the variance of the detection variable is dominated by the traffic from outside the network. Consequently, in our experiment, we use the Bell-I data trace as the background traffic for the last-mile router, which monitors the traffic from outside the network.

False Positives

An important feature of an effective detection mechanism is a low false positive rate. Generally, a false positive is created if an attack is reported during normal network operation. We use the following method to evaluate the false positive rate of the New Address Detection Engine: we use the data traces collected at the University of Auckland and Bell Labs as the normal traffic input to the New Address Detection Engine, and if any attack is detected, a false positive is generated. Figures 3.16, 3.17, and 3.18 illustrate the corresponding CUSUM statistics {y_n}, which are derived by applying our detection algorithm to the aforementioned three traces. Consider the Auck-IV-in trace as an example of how we obtain {y_n}. The mean value of {X_n}, which is E(X_n) = alpha, can be obtained by the learning engine from the traffic statistics before detection; for the Auck-IV-in trace, alpha is the mean value of X_n marked in Figure 3.13. Since this implementation is in the last-mile router, we use a = 0.05 and N = 0.05 according to the discussion in

Section 3.6.3. Thus, beta = alpha + 0.05 and Z_n = X_n - beta, and we can then calculate y_n using the CUSUM recursion. As Figure 3.16 shows, y_n is very stable. It is interesting to see that there are some isolated bursts in Figures 3.16, 3.17, and 3.18. These bursts are caused by the bursty nature of Internet traffic. However, a burst in normal Internet traffic is usually very short, so it does not build up a large accumulated value of y_n in the way that attack traffic does. Thus, these isolated bursts are far below the threshold N = 0.05, as shown in Figures 3.16, 3.17, and 3.18, which provides a large safety margin.

After setting the detection thresholds for each network situation, we randomly chose 20 samples of normal traffic, each one hour in duration, and repeated the experiments described above on each trace independently. In each experiment, no false positive was found. Therefore, the false alarm rate in all of our trace-driven experiments is zero. The excellent false positive performance of the New Address Detection Engine is based on the fact that our detection variable X_n provides an effective way to separate attack traffic from normal traffic.

Randomly Spoofed DDoS Attacks

As most DDoS attack tools choose to spoof source IP addresses randomly [16], we want to evaluate the performance of the New Address Detection Engine during randomly spoofed DDoS attacks. We used the labeled DDoS attack scenario in the DARPA Intrusion Detection Dataset [79], where source IP addresses are randomly spoofed, as an example to demonstrate the short detection delay of the New Address Detection Engine. The labeled attack started at time t = 3 s and lasted for 5 seconds. Since the labeled attack is very short, we set the sampling interval to 0.01 seconds. As we see from Figure 3.19, in the first three seconds the average percentage of new IP addresses is close to zero, i.e., alpha = 0. However, the percentage of new IP addresses jumps to 0.75 at the end of the third second, where the DDoS attack

starts.

Figure 3.19: 2000 DARPA Dataset: DDoS Attack Scenario 1 (ratio of new IP addresses in each 0.01-second interval; the attack starts at t = 3 s).

As the DARPA dataset records incoming traffic, we apply the last-mile New Address Detection Engine for attack detection. As discussed in Section 3.6.3, the mean value a of the transformed detection variable during normal operation is 0.05, and the detection threshold N is 0.05. Hence, the CUSUM variable y_n will be 0.75 - (alpha + a) = 0.75 - (0 + 0.05) = 0.7 when the DDoS attack starts, which is much larger than the detection threshold. Thus, it is clearly easy to detect a DDoS attack with randomly spoofed source IP addresses within one sampling interval. However, this is not the focus of the New Address Detection Engine, which is designed to defend against much more sophisticated DDoS attack scenarios.

DDoS Attacks with a Small Number of Randomly Spoofed IP Addresses

In an attempt to avoid detection by our scheme, attackers may try to reduce the number of spoofed IP addresses that they use. Similarly, in the case of distributed

reflector denial of service (DRDoS) attacks, the number of source IP addresses used by the attack traffic depends on the number of reflectors. Thus, the attacker can control the number of new IP addresses used in the attack. However, there is a lower bound on the number of new IP addresses used, since the number of IP packets from each IP address increases as fewer source IP addresses are used. Therefore, this type of attack will be detected by the Flow Rate Detection Engine, as we describe in Section 3.8.2.

To test the detection sensitivity of the New Address Detection Engine for DDoS attacks with different numbers of new IP addresses, we conducted the following experiment. We used the Auck-IV-in trace as the background traffic for the last-mile router detection evaluation, and the Auck-IV-out trace as the background traffic for the first-mile router detection evaluation. Let W represent the number of IP addresses in the attack traffic which are new to the network. We tested different values of W in our simulation. The detection performance for the first-mile router is shown in Figures 3.20, 3.21, and 3.22. Similarly, the detection performance for the last-mile router is shown in Figures 3.23, 3.24, and 3.25. We repeated the attack detection under a variety of different network conditions, and list the average detection accuracy and detection time in Table 3.3 and Table 3.4.

As we can see from the simulation results, our detection algorithm is very robust in both the first-mile and last-mile routers. For the last-mile router, we can detect a DDoS attack with W = 18 within 81.1 seconds with 100% accuracy, and a DDoS attack with W = 15 with 90% accuracy. Given that the attack duration is no more than 5 minutes, only attack traffic with W < 18 has the possibility of sometimes avoiding our detection. However, by forcing the attacker to use a small number of new IP addresses, we can detect the attack by observing the abrupt change in the number of packets per IP source address using the Flow Rate Detection Engine described in Section 3.8.2.

Figure 3.20: The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 10 new IP addresses.

Figure 3.21: The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 4 new IP addresses.

Figure 3.22: The DDoS attack detection sensitivity in the first-mile router using the Auck-IV-out trace: attacks with 2 new IP addresses.

Figure 3.23: The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 200 new IP addresses.

Figure 3.24: The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 40 new IP addresses.

Figure 3.25: The DDoS attack detection sensitivity for the last-mile router using the Auck-IV-in trace: attacks with 18 new IP addresses.

Table 3.3: Detection performance of the first-mile router (detection accuracy and detection time in seconds for different numbers of new IP addresses).

For the first-mile router, we can achieve 99% detection accuracy even when there are only 2 new IP addresses in the attack traffic in each monitoring period n, where n = 10 s in our simulation. Note that for the first-mile router we do not expect to see many new IP addresses appearing from the local network. Generally, there are very few IP addresses that are new to the network, since all valid IP packets originate from within the same network. Since the IP addresses in the IP Address Database expire and are removed after a certain time period, IP addresses within the subnetworks which have not been used recently will be new to the IP Address Database. This is very similar to ingress filtering [21]. However, ingress filtering cannot detect the attack when the spoofed IP addresses are within the subnetworks. In contrast, our first-mile router detection algorithm can detect spoofed IP addresses within the subnetworks if they are new to the IP Address Database. It is worth noting that we have chosen a sampling interval of n = 10 s in our experiments, which is a conservative choice for a real implementation. If we decrease the sampling interval by using more computing resources, we can reduce the detection time accordingly.

Table 3.4: Detection performance of the last-mile router (detection accuracy and detection time in seconds for different numbers of new IP addresses).

3.8.2 DDoS Detection Using the Flow Rate Detection Engine

The weakness of the New Address Detection Engine is that it cannot detect DoS attacks that use a small number of flows. Fortunately, the Flow Rate Detection Engine can make up for this vulnerability by checking the rate of each individual flow. In this section, we use the Flow Rate Detection Engine for the last-mile HADR as an example to demonstrate its effectiveness. First, we analyze the flow rate distribution of real network traces and motivate the Flow Rate Detection Engine. Second, we introduce a method, called statistical process control [81], to set the detection threshold for each flow. Finally, we discuss the detection strength of the Flow Rate Detection Engine.

The Flow Rate Distribution

Figure 3.26 shows the cumulative distribution of the flow rates in the incoming traffic to the University of Auckland recorded in March 2001. The horizontal axis represents the average rate of each flow, and the vertical axis represents the probability that a flow's average rate is below a given value. From the graph, we can see that about 57.5% of the flows have an average rate below 1 kbps. As we calculate the average flow rate by averaging the flow volume over 10 seconds, flows with a small traffic volume will result in a low flow rate. Consequently, the high percentage

of low-rate flows could be caused by small traffic volume flows, such as short web accesses, DNS queries, or short e-mails. Only 12% of the flows have an average rate larger than 20 kbps. This implies that only a small proportion of the total flows have a high traffic rate.

Figure 3.26: The flow rate distribution of the Auck-IV-in Trace (cumulative probability versus flow rate in kbps).

The flow rate distribution is determined by many factors, such as the physical connection speed of each IP address. Hence, it is very difficult for the attacker to know the average flow rate for each IP address. Without detailed information about the flow rate distribution of the incoming traffic to a target, an attacker is limited to the following two actions. First, an attacker can spread attack traffic over a large number of flows, which is likely to be detected by the New Address Detection Engine. Second, an attacker can spoof a small number of flows using educated guesses, which is likely to be detected by the Flow Rate Detection Engine.

The Detection Thresholds for Each Flow

One important issue for the Flow Rate Detection Engine design is how to set a detection threshold for each flow. We assume the rate of normal flows is statistically stable and in a state of control. Let mu_f be the average rate of an individual flow, and sigma_f be the standard deviation of the rate of that flow; we calculate mu_f and sigma_f for each individual flow. Using the theory of statistical process control [81], we take mu_f + 2*sigma_f as the warning threshold and mu_f + 3*sigma_f as the detection threshold. Figure 3.27 shows an example of how to set the two thresholds for an IP flow. The statistics are based on one flow, identified by its source IP address, in the Auck-IV-in packet trace; for this flow, mu_f = 1.5 kbps and sigma_f = 2.4 kbps.

Figure 3.27: Detection thresholds (warning value and bandwidth limit) for one sample flow from the Auck-IV-in trace (flow speed in kbps over time).

Once the rate of a flow exceeds the detection threshold, it is regarded as a potential attack flow. Hence the HADR will probabilistically drop packets from this flow for a certain period of time. If the flow rate still exceeds the detection threshold after the probabilistic packet dropping, the HADR will block this flow completely. The usage of the warning threshold will be discussed in Section 3.9.
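A minimal sketch of the per-flow thresholds just described is given below, using the statistical process control limits mu_f + 2*sigma_f (warning) and mu_f + 3*sigma_f (detection). The reaction shown (probabilistic dropping first, then blocking if the rate stays above the detection threshold) follows the text; the granularity of one decision per sampling interval is an illustrative assumption.

from statistics import mean, stdev

class FlowRateMonitor:
    """Per-flow statistical process control thresholds and reaction states."""
    def __init__(self, mu_f, sigma_f):
        self.warning_threshold = mu_f + 2 * sigma_f     # mu_f + 2*sigma_f
        self.detection_threshold = mu_f + 3 * sigma_f   # mu_f + 3*sigma_f
        self.rate_limited = False                       # currently being rate-limited?

    def action(self, current_rate):
        """Decide the reaction for the current sampling interval."""
        if current_rate <= self.detection_threshold:
            self.rate_limited = False
            return "pass"
        if not self.rate_limited:
            self.rate_limited = True
            return "rate-limit"     # probabilistically drop packets for a period
        return "block"              # still above the threshold: block the flow

def estimate_statistics(history_rates):
    """Estimate mu_f and sigma_f from a flow's historical per-interval rates."""
    return mean(history_rates), stdev(history_rates)

# The sample flow from the text (mu_f = 1.5 kbps, sigma_f = 2.4 kbps) gives a
# warning threshold of 6.3 kbps and a detection threshold of 8.7 kbps.
monitor = FlowRateMonitor(mu_f=1.5, sigma_f=2.4)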

The Strength of the Flow Rate Detection Engine

Our aim is to evaluate the effectiveness of the Flow Rate Detection Engine. During an attack, an individual flow at the target can comprise an attack flow from a spoofed source IP address and a legitimate flow from the genuine source IP address. For simplicity, we assume in the following attack scenarios that each attack flow has the same traffic rate. Moreover, we also assume that if the attack flow reaches the average rate of that individual flow during normal operation, the overall rate (4) of that individual flow during an attack will exceed the detection threshold. The benefit of choosing this detection threshold instead of the threshold mu_f + 3*sigma_f described in the previous section is that it simplifies the analytical model below without loss of generality; in real operation, we still use mu_f + 3*sigma_f as the detection threshold. For example, suppose the average rate of a particular flow during normal operation is 10 kbps. If the rate of an attack flow spoofing that flow's source IP address reaches 10 kbps, then the overall flow for that source address, which includes the attack flow and the legitimate flow, will be detected by the Flow Rate Detection Engine at the target.

Let us consider the following two extreme scenarios for an attacker generating 1 Mbps of attack traffic. In the first scenario, the attacker uses 50 flows with a rate of 20 kbps each. Let k be the probability that the attacker's IP addresses are included in the IP Address Database. Let us make the conservative assumption that k = 1 in this scenario, which means that all IP addresses of the attack flows are included in the IP Address Database. Hence, this attack can evade the New Address Detection Engine, as none of the attack flows will be new to the target. However, the Flow Rate Detection Engine can be used to detect the attack.

(4) The overall rate is the sum of the attack flow rate and the legitimate flow rate.

According to the flow distribution illustrated in Figure 3.26, about 88% of the IP flows have an average rate lower than 20 kbps. Therefore, about 50 x 0.88 = 44 of the attack flows will exceed the detection threshold and be detected by the Flow Rate Detection Engine. For a given total attack traffic rate, the average flow rate decreases as the number of flows increases. Generally, it is difficult for the Flow Rate Detection Engine to identify an attack flow that has a low traffic rate. Consequently, for simplicity, we set 500 as the maximum number of attack flows that the Flow Rate Detection Engine can detect in our evaluation on the Auck-IV-in packet traces.

In the second scenario, the attacker uses 1000 flows with a rate of 1 kbps each. The expected number of new IP addresses will then be 1000 x (1 - k). If the attacker spoofs the source IP addresses randomly, then k = M / 2^32, where M represents the size of the IP Address Database. Since M is approximately 40,000 in our experiment, k is nearly zero. As shown in Section 3.8.1, the lower detection bound for the New Address Detection Engine is 18 new flows. Therefore, this type of DDoS attack can easily be detected by the New Address Detection Engine.

In summary, if the attacker uses a small number of traffic flows, then each flow will have a large volume, and hence will be detected by the Flow Rate Detection Engine. At the other extreme, if an attacker uses a large number of small-volume attack flows, the large number of new IP addresses will trigger the New Address Detection Engine.

Let us now briefly consider the benefits of using our two detection engines in combination. Generally, any DDoS attack can be characterized by two parameters: (1) the average attack flow rate (R_attack), and (2) the number of attack flows (N_attack). Figure 3.28 illustrates a two-dimensional space based on these two parameters; one point in the space corresponds to one DDoS attack. The total traffic rate of a DDoS attack can be formulated as R_attack x N_attack. All DDoS attacks with the same total traffic

rate can be plotted as a curve in Figure 3.28. For example, curves 1, 2 and 3 represent all DDoS attacks that have total attack traffic rates of V_1, V_2 and V_3 respectively, where V_1 < V_2 < V_3. We assume the probability k that the attacker's IP addresses are included in the IP Address Database is constant. The detection threshold of the New Address Detection Engine can be represented as the minimum number of new flows it can detect, which can also be interpreted as the minimum number of attack flows. The detection threshold of the Flow Rate Detection Engine can be represented as the minimum average flow rate it can detect. In Figure 3.28, the vertical dashed line represents the detection threshold for the New Address Detection Engine, the horizontal dashed line represents the detection threshold for the Flow Rate Detection Engine, and the shaded area represents all the DDoS attacks that can be detected by the New Address Detection Engine, the Flow Rate Detection Engine, or both. As we discussed before, both detection thresholds are small values that depend on the normal background traffic. Attacks with both parameters (R_attack and N_attack) below the detection thresholds will have very small traffic rates, and will not be considered DDoS attacks. Consequently, all DDoS attacks indicated by the curves are covered by the shaded area, which means all DDoS attacks can be detected by the combination of the New Address Detection Engine and the Flow Rate Detection Engine.

Figure 3.28: DDoS attacks on a two-dimensional attack detection space.

3.8.3 Performance of the History-based IP Filtering

An accurate and quick attack detection scheme lets the system become aware of an attack and have enough time to react to it. However, the final performance of a defense system depends on how accurately and efficiently it can filter attack traffic while protecting legitimate traffic. Therefore, the performance of History-based IP Filtering is crucial to the overall performance of History-based Attack Detection

and Reaction. In this section, we aim to evaluate the performance of History-based IP Filtering in three respects: (1) the proportion of legitimate traffic that can be protected, (2) the proportion of attack traffic that can get through the History-based IP Filtering, and (3) the memory requirement for a given filtering accuracy. We do not consider naive DoS attacks with a small number of attack flows, as they can easily be tackled by simply blocking the attack flows according to their source IP addresses. It is worth noting that we use two weeks' worth of data traces to build the IP Address Database due to the lack of publicly available data traces; in a real implementation, if we can build the IP Address Database using traces over a longer period, we can expect better filtering performance.
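As a reference point for the evaluation below, the filtering step itself can be sketched as follows. This assumes the simplest reading of History-based IP Filtering: while an attack is in progress, packets whose source address appears in the IP Address Database are admitted and all others are dropped (a deployment could instead queue unknown sources at low priority). A per-address version of the accuracy measure P_normal is also shown.

def history_based_ip_filter(packets, ip_database, attack_in_progress):
    """packets: iterable of (timestamp, src_ip, payload) tuples.
    Admit everything under normal conditions; during an attack, admit only
    packets whose source address is in the IP Address Database."""
    for packet in packets:
        _, src_ip, _ = packet
        if not attack_in_progress or src_ip in ip_database:
            yield packet
        # otherwise the packet is dropped (or could be given low priority)

def filtering_accuracy(legitimate_sources, ip_database):
    """A per-address version of P_normal: the fraction of legitimate sources admitted."""
    protected = sum(1 for ip in legitimate_sources if ip in ip_database)
    return protected / len(legitimate_sources)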

Figure 3.29: The filtering accuracy of History-based IP Filtering (P_normal) on the Auck-IV-in traces from 26 March to 4 April, for d = 1, 2, 3.

Figure 3.30: Accuracy of the combined rule F_c = p_1(d) ∩ p_2(u) on the Auckland Trace, for u = 4 to 9.

We define the filtering accuracy (P_normal) as the percentage of legitimate traffic that History-based IP Filtering can protect. First, we evaluate the filtering accuracy of History-based IP Filtering with different IP Address Databases. The IP Address Database used for History-based IP Filtering is constructed according to the two parameters d, the minimum number of days on which an IP address appeared, and u, the minimum number of packets per IP address. Figure 3.29 shows how the filtering accuracy changes with time for different values of the parameter d. As we see in Figure 3.29, when we use p_1(1) to build the IP Address Database, the accuracy of History-based IP Filtering is close to 90% for traces from the Auck-IV-in dataset in March, while it drops to about 70% for traces in April. This is because the IP Address Database was generated using traces between March 12 and March 25, and becomes less relevant for the traces in April. It is interesting to see that the accuracy is stable until March 31 and then drops abruptly after that; thus, we can update the IP Address Database every 5 days to achieve better performance. Figure 3.29 also shows that the filtering accuracy of History-based IP Filtering is 88%, 75% and 65% when using p_1(1), p_1(2) and p_1(3) respectively.

Figure 3.30 shows how the filtering accuracy changes with d for different values of the parameter u. As we can see from the graph, the performance of History-based IP Filtering when u = 4 and u = 5 is very close. This is because frequent IP addresses normally contain at least 5 packets, as we discussed in Section 3.7.3. Thus, when we remove the IP addresses containing 4 packets from the IP Address Database in order to reduce its memory requirements, the filtering accuracy is hardly affected.

Second, we want to evaluate the proportion of attack traffic that can get through the History-based IP Filtering. We assume each attack flow has the same traffic volume. Hence, the proportion of attack traffic that can get through is k, where k represents the probability that the attacker's IP addresses are included in the IP Address Database. Generally, k is a very small value. For example, if the attacker uses

randomly spoofed IP source addresses, the probability for the IP Address Database to accept a spoofed IP address is

k = |p_1(d)| / 2^32.   (3.13)

As mentioned in Section 3.7.3, |p_1(d)| is at most a few hundred thousand addresses in our experiment, so the probability of accepting an attack packet is nearly zero.

Figure 3.31 shows the relation between the memory requirement for the IP Address Database and the filtering accuracy of History-based IP Filtering using the Auck-IV-in dataset. We record the system performance in three different testing scenarios. The top curve represents the percentage of legitimate flows with more than 10 packets on March 26 that are protected. The middle curve represents the percentage of all flows on March 26 that are protected. The bottom curve represents the percentage of all flows on March 27 that are protected. As shown in Figure 3.31, in all three testing scenarios the memory requirement grows as the filtering accuracy increases. The top curve indicates good system performance because the frequent IP addresses have a higher probability of being admitted. The middle curve indicates better system performance than the bottom curve because the IP Address Database was generated using traces between March 12 and March 25, which are more relevant to the packets of March 26. We can also see that the three curves become very flat after the memory of the IP Address Database reaches 2.5 MB, and converge when the memory of the IP Address Database is 4 MB. This means that increasing the memory size of the IP Address Database beyond 2.5 MB does not improve the filtering accuracy significantly, and 4 MB is the optimal memory size of the IP Address Database for the Auck-IV-in dataset.

3.8.4 Complexity of Using History-based IP Filtering in Routers

We can analyze the computational complexity of History-based IP Filtering in routers from two perspectives. First, we can analyze the theoretical complexity of our algorithm in terms of features of the network traffic to be analyzed, e.g., the packet arrival rate and the number of new IP addresses seen over the time scale of interest. Second, we can estimate the computational delay that will be experienced by a router in terms of these traffic features. The traffic features of interest can be measured from real-life packet traces. Although it would be highly desirable to evaluate our detection scheme by testing its performance in a router on live traffic, it is not feasible to deploy a research prototype on live traffic in a real-life network. Consequently, we have followed the accepted practice [28][30][43][41] of evaluating the complexity of our filtering scheme using trace-driven experiments.

When History-based IP Filtering is deployed in the router, the processing overhead for each packet is just one hash table lookup. Consequently, the filtering operation complexity is O(1), and the memory cost is O(n), where n is the number of IP addresses in the IP Address Database. In our trace-driven experiments, our hash function simply extracts the least significant 16 bits of the source address, which can be calculated with a single CPU instruction. In order to estimate the throughput that a router can achieve using History-based IP Filtering, we measured the CPU time required to check every packet in a real-life packet trace, which contains approximately 1.6 billion packets. The trace-driven experiment was conducted on a Linux PC with dual 900 MHz Pentium III Xeon processors and 512 MB DDR RAM. We ran this experiment on 1000 batches of 10^6 packets; the average time needed to process one packet was 8 ± 1 nanoseconds. Taking the average processing time per packet to be 8 nanoseconds, 125 million packets can be processed per second by History-based IP Filtering.
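The per-packet lookup measured above can be illustrated with the following sketch, which mimics the simple hash used in our trace-driven experiments (the least significant 16 bits of the 32-bit source address index a table of buckets). The bucket layout and the timing loop are illustrative assumptions, and an interpreted-language measurement like this will of course be far slower than the 8-nanosecond figure obtained from the compiled implementation.

import time

def build_table(ip_database):
    """Index the IP Address Database by the least significant 16 bits of each
    32-bit source address (the hash function used in our experiments)."""
    table = [[] for _ in range(1 << 16)]
    for ip in ip_database:                  # ip is a 32-bit integer
        table[ip & 0xFFFF].append(ip)
    return table

def lookup(table, ip):
    """One hash-table lookup per packet: O(1) expected time."""
    return ip in table[ip & 0xFFFF]

table = build_table({0x0A000001, 0x0A000002})       # two example addresses
start = time.perf_counter()
for _ in range(10**6):
    lookup(table, 0x0A000001)
print((time.perf_counter() - start) / 10**6, "seconds per lookup")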

Given an average packet size of 200 bytes, the average throughput that History-based IP Filtering can achieve on a dedicated processor will thus be 125 x 10^6 x 200 x 8 bits = 200 Gbps. In the worst case, where the average packet size is 44 bytes (IP header plus TCP header), the average throughput will be 44 Gbps, which can easily handle traffic at the OC-192c rate. In practice, a router's throughput is limited by one routing procedure, called packet classification, which includes IP routing table lookup and firewall rule checking. The state-of-the-art research on packet classification in [82] claims an average packet processing time of 33 nanoseconds per packet using pipelined hardware. Hence, History-based IP Filtering is unlikely to become the bottleneck for the router.

3.9 Attacks Against the HADR

The only possibility of defeating the HADR is if the attacker has a high probability of using addresses that already appear in the IP Address Database. Since the contents of the IP Address Database are unknown to the attacker, the only option for the attacker is to infiltrate the IP Address Database by first making some legitimate connections using genuine IP addresses. This means that the number of attack sources will be small, and the volume of traffic from each source will be much more noticeable, and hence easily detected by our Flow Rate Detection Engine. Thus, the key advantage of our HADR approach is that it can detect a wide variety of DDoS attack schemes. In this section, our aim is to analyze the robustness of the HADR and how the HADR performs under a variety of sophisticated attacks.

Let U represent the total IP address space, M the collection of legitimate IP addresses (namely the IP Address Database), and A the collection of source IP addresses of the attack traffic. The effectiveness of the IP Address Database depends on the size of M ∩ A, which is indicated by the shaded area in Figure 3.32.

Figure 3.31: Memory requirements for the IP Address Database on the Auckland trace (filtering accuracy P_normal versus database memory in MB for three testing scenarios).

Figure 3.32: The relation between the IP Address Database and the source IP addresses of attack traffic.

To avoid being detected by the New Address Detection Engine, the attacker wants |A \ M| to be small, so that there are very few IP addresses that are new to the IP Address Database. To avoid being detected by the Flow Rate Detection Engine, the attacker wants |A| to be large, so that no single attack flow looks suspicious. However, these are two conflicting goals given a small value of k, where k is the probability that the attacker's IP addresses are included in the IP Address Database, as defined earlier. The target's connection history is confidential data and should be unknown to the attacker; therefore, k should be small. We assume the attacker cannot obtain the complete connection history of the target, otherwise much more serious security incidents would occur. Hence, it is not possible for the attacker to achieve k = 1 and a large |A| simultaneously. Without detailed knowledge of the target's connection history, the attacker might try to increase the value of k using educated guesses about normal access patterns for the target network. For example, when attacking an Australian website, the attacker could choose to use IP addresses located in Australia. By doing this, the attacker has a greater chance of obtaining a higher probability k.

Let D_1 be the minimum number of attack flows that the New Address Detection Engine can detect, and let D_2 be the maximum number of attack flows that the Flow Rate Detection Engine can detect. The detection engines of the HADR fail only if the following two conditions are satisfied at the same time:

|A| (1 - k) < D_1,   (3.14)

|A| > D_2.   (3.15)

In our experiment, we choose D_1 = 18 and D_2 = 500, as described in Sections 3.8.1 and 3.8.2. Combining Eq. 3.14 and Eq. 3.15 gives k > 1 - D_1/D_2 = 1 - 18/500 = 0.964, which is impossible to achieve without detailed knowledge of the target's connection history. Hence, the HADR is robust against possible attacks.

Infiltrating Attacks

If the attackers know that the HADR is based on previous network connections, they could mislead the HADR into including their own addresses in the IP Address Database, which we refer to as infiltrating. For example, the attackers can first use a set of IP addresses to make some legitimate connections, known as reconnaissance, before the real attack. The attackers can keep the reconnaissance traffic sufficiently low that it is not detected by the HADR. If the HADR considers the reconnaissance traffic to be part of the normal

traffic, it will add the attacker's reconnaissance IP addresses into the IP Address Database. The attacker can then use these reconnaissance IP addresses to launch the DDoS attack. Since these IP addresses appear in the IP Address Database, the attack traffic can pass the filter easily, which constitutes a successful denial-of-service attack.

There are several possible solutions for defeating infiltrating attacks: (1) increase the period over which IP addresses must appear in order to be considered frequent, (2) randomize the learning time for the IP Address Database and keep it secret from the attacker, and (3) ensure that only IP addresses that have successfully completed a TCP connection are included in the IP Address Database. The last approach prevents the attacker from using spoofed IP addresses for which no host exists; the attackers can only launch their attacks using the real IP addresses of their own or compromised computers, which makes it much easier to identify and block the source of the attack. We may also be able to use techniques from scan detection [83] to identify IP addresses with unusual patterns of access. Moreover, we can combine additional rules for defining frequent IP addresses in order to improve the accuracy of the HADR. For example, the type of service accessed by the user and the length of each session may be useful measures for identifying frequent IP addresses.

Countermeasures Against Sophisticated Infiltrating Attacks

The main type of infiltrating attack is that attackers train the IP Address Database to include their own set of addresses by using reconnaissance traffic over a period of several weeks before the attack. To achieve this type of attack, the attacker needs to control a large number of computers for a reasonably long period, e.g., two months, depending on the training period chosen by the target. In particular, if the target chooses to update the IP Address Database at random times, the attacker needs an even longer period of time to train the IP Address Database in order to infiltrate it.

Attackers normally take over a large number of computers through worm propagation, including the spread of malicious e-mail attachments. Generally, most DDoS attacks are by-products of worm propagation [5]. Worm propagation can result in thousands of computers being compromised in a short period of time, such as a few hours. Fortunately, many Internet security companies, such as Symantec [84], and government-funded security research centers, such as CERT [1], monitor Internet activities continuously, and the propagation of a new worm is unlikely to proceed for a few months without being noticed. Moreover, some open source projects, such as the Honeypot project [85], also help the early disclosure of worms. Once a new worm is discovered, users can be alerted through a variety of channels, such as e-mails from system administrators and news on TV or radio, to make them aware of the new security threat. Hence, most users will start to check the integrity of their computers, and IT technicians will also make efforts to isolate and patch compromised computers. Consequently, it is not feasible for an attacker to keep a large set of computers compromised for a long period. This explains why the time gap between worm propagation and the DDoS attack is generally within one day, as observed in most worm propagations [15][86]. With such short-lived control of the compromised

computers, it is nearly impossible for the attacker to train the IP Address Database to include their own set of addresses.

More importantly, we can make the IP Address Database more robust by deploying a challenge-response scheme to authenticate whether an IP address corresponds to a genuine user. In order to make sure that the packets sent to the target web server are initiated by a human rather than an automated computer program, the web server can send a challenge to the users. For example, the challenge can be a randomly generated string in picture format, and the user is instructed to type the string and send it back to the web server [87]. This is easy for a human but hard for a computer program. In this way, the attacker needs manual intervention to respond to the challenge, which makes the infiltrating attack practically impossible.

DDoS Attacks from Infiltrated Sources

A DDoS attack from infiltrated sources occurs when the attack traffic consists of a large number of IP addresses while only a few of the IP addresses are new to the IP Address Database; this is the worst attack scenario. Even worse, we assume the attacker is able to ensure that the traffic volume generated by each flow is below the detection threshold. In this case, neither detection engine may be able to detect the attack on its own due to insufficient attack evidence. However, both detection engines will observe some suspicious network behavior at the same time. This motivates us to refine the output of each detection engine and combine the outputs to make a more accurate decision. Each detection engine can be redesigned to produce three output states, namely a normal state, a suspicious state, and an abnormal state. For the New Address Detection Engine, the three states are defined as follows:

Normal state: the CUSUM statistic y_n is below G_w, where G_w is a value

slightly less than the detection threshold N.
Suspicious state: y_n is between G_w and N.
Abnormal state: y_n is larger than N.

For the Flow Rate Detection Engine, the three states are defined as follows:

Normal state: No flow exceeds the detection threshold and the number of flows that exceed the warning threshold is below T_w. The warning threshold was defined earlier in this chapter.
Suspicious state: The number of flows that exceed the warning threshold is above T_w.
Abnormal state: There is at least one flow that exceeds the rate limit.

The two parameters T_w and G_w can be chosen according to normal traffic conditions. The decision engine is used to summarize the results from the two detection engines. We employ the simple rule listed in Table 3.5 to combine the output of the two detection engines.

Table 3.5: The rule for the decision engine

    New Address Detection Engine    Flow Rate Detection Engine    Decision Engine
    normal                          normal                        NORMAL
    normal                          suspicious                    NORMAL
    suspicious                      normal                        NORMAL
    suspicious                      suspicious                    ATTACK
    attack                          any output                    ATTACK
    any output                      attack                        ATTACK

As we see in Table 3.5, when both detection engines output suspicious states, an attack will be reported by the History-based Attack Detection and Reaction (HADR) scheme. In this way, HADR can be more robust against DDoS attacks from infiltrated sources.
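To make the combination rule concrete, the following sketch (an illustration only, not the thesis implementation; the state names and thresholds follow the definitions above) shows how the two detection-engine outputs could be mapped to a final verdict:

```python
def new_address_state(y_n, G_w, N):
    """Three-state output of the New Address Detection Engine (CUSUM statistic y_n)."""
    if y_n < G_w:
        return "normal"
    if y_n <= N:
        return "suspicious"
    return "attack"

def flow_rate_state(flow_rates, warning_threshold, detection_threshold, T_w):
    """Three-state output of the Flow Rate Detection Engine."""
    if any(rate > detection_threshold for rate in flow_rates):
        return "attack"                      # at least one flow exceeds the rate limit
    if sum(rate > warning_threshold for rate in flow_rates) > T_w:
        return "suspicious"
    return "normal"

def decision_engine(state_a, state_b):
    """Combine the two engine outputs according to the rule in Table 3.5."""
    if "attack" in (state_a, state_b):
        return "ATTACK"
    if state_a == "suspicious" and state_b == "suspicious":
        return "ATTACK"
    return "NORMAL"
```

With these definitions, decision_engine("suspicious", "suspicious") returns ATTACK, while a single suspicious engine leaves the verdict at NORMAL, as in Table 3.5.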

3.10 Discussion

With the growth in Internet users, several protocols have been proposed to make full use of the limited Internet resources. For example, with the deployment of the IP Network Address Translator (NAT) [88], the Dynamic Host Configuration Protocol (DHCP) [89] and proxy services, multiple users can share the same source IP address. However, the source IP address can still represent some level of identity, for example, a group of users with geographical proximity. Since the IP addresses in our traces have been sanitized using a one-to-one hash mapping, the network information in the IP address is lost. In practice, we can use the network address of the source IP address to represent the user's identity, for example, the class C network address. Moreover, with the increasing deployment of IPv6 [42], every Internet user can be allocated a unique IP address. This will strengthen the correlation between source IP address and user identity.

High profile websites, such as Yahoo and CNN, have a large and diverse range of users. In this environment, it is more challenging to maintain an IP Address Database because of the dynamic nature of their massive user group. Fortunately, these high profile websites use Content Distribution Networks (CDN) to balance their global traffic load. Generally, Content Distribution Networks bind users to their local CDN server. Hence, we can build an IP Address Database for each local CDN server separately, since each CDN server will have a consistent group of users. For search engine websites, such as Google, the user behavior may be more dynamic compared with corporate or university websites. However, due to the lack of data traces from search engine websites, we cannot obtain any results regarding IP address consistency.
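As a minimal illustration of grouping sources by their class C (/24) network address (hypothetical addresses, not taken from the traces):

```python
def class_c_prefix(ip: str) -> str:
    """Map a source IPv4 address to its class C (/24) network prefix."""
    return ".".join(ip.split(".")[:3]) + ".0/24"

# Two hosts on the same /24 map to a single IP Address Database entry.
print(class_c_prefix("203.0.113.7"), class_c_prefix("203.0.113.250"))  # both 203.0.113.0/24
```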

3.11 Conclusion

In this chapter, we proposed a defense mechanism, the History-based Attack Detection and Reaction (HADR), to detect and react to distributed denial of service attacks by using the target's connection history. We have also presented a sequential change point detection algorithm that can identify when an attack has occurred. We demonstrated the efficiency and robustness of this scheme by using trace-driven simulations. The experimental results on the Auckland traces show that we can detect DDoS attacks with 100% accuracy using as few as 18 new IP addresses at the last-mile router, and using as few as 2 new IP addresses at the first-mile router. Our on-line detection algorithm is fast and has a very low computing overhead. Furthermore, our first-mile router HADR has the advantage over ingress filtering [21] that it can detect attack traffic with spoofed source IP addresses from within the monitored subnetwork. More importantly, we showed that with the combination of the New Address Detection Engine and the Flow Rate Detection Engine, we can detect both sophisticated and naive DoS attacks.

In addition, we have presented History-based IP Filtering as an approach to reacting against denial of service attacks. We have demonstrated that we can build a practical IP Address Database based on packet traces collected from real-life networks. Furthermore, we have shown the effectiveness of using this database to filter out attack packets during simulated attacks. Our experiments on the Auckland Trace show that we can protect 90% of legitimate traffic with 4 MB of memory. We also found that History-based IP Filtering can protect 80% of legitimate traffic for the Small ISP Trace with 800 KB of memory. The key advantage of our approach is that we can defend against attacks that use randomly spoofed source IP addresses. In contrast to previous flow-based monitoring schemes that try to categorize attack flows using source IP addresses, which are

unreliable, our technique achieved negligible false positive errors when filtering attacks that use randomly spoofed source IP addresses. Consequently, our HADR scheme forces attackers to launch attacks using genuine IP addresses, which makes it easy to traceback the attack sources. In the next two chapters, we will discuss the Victim-Router Model defense approach.


Chapter 4

Adjusted Probabilistic Packet Marking

4.1 Introduction

In the last chapter, we proposed a mechanism called History-based Attack Detection and Reaction (HADR) to detect and filter attack traffic. HADR can be used at either the first-mile router near the source or the last-mile router near the target. For the first-mile HADR, attack traffic becomes highly diluted as attack sources are geographically distributed. Hence, it is difficult to detect highly distributed attacks near their source. Moreover, the effectiveness of first-mile HADR defense relies upon its universal deployment, which requires significant effort. For the last-mile HADR, attack traffic is detected and filtered at the victim, which only protects the server resources. The attack traffic can still congest network resources, which will jeopardise the whole defense efficacy unless the attack sources are located and removed. Consequently, the task of locating attack sources is of critical importance, in terms of providing a deterrent to the attacker and a warning to the compromised systems. Attackers will be discouraged from launching attacks for fear of being punished,

and compromised systems will be motivated to secure their systems because of the possible liability for attack damage. Therefore, an essential component of the solution to bandwidth attacks is to locate and remove all attack sources.

One of the biggest difficulties in locating attack sources is that attackers always use incorrect, or spoofed, IP source addresses to disguise their true origins. Therefore, source IP addresses are not reliable for identifying attack sources. In contrast, routers that are managed by major ISPs can be regarded as trustworthy parties. Hence, the router can play a key role in identifying the attack sources. In general, attack sources are separated from the victim by a sequence of routers that are not under the victim's control. In order to respond to an attack, the victim needs support from routers in the network to trace the sources of the attack.

We present a Victim-Router Model to defend against bandwidth attacks. The Victim-Router Model integrates a victim's local defense mechanism with cooperative actions that can be taken by network routers on the victim's behalf. This model consists of two approaches. The first approach is called adjusted probabilistic packet marking (APPM), which is used to traceback attack sources. APPM is a passive Victim-Router Model, where the routers and the victim operate independently. The second approach is called Selective Pushback, which is used to filter attack traffic at routers close to the attack sources. Selective Pushback is an active Victim-Router Model, where a victim directly interacts with participating routers. APPM has the advantage of low implementation overhead, while Selective Pushback has the advantage of stopping attacks in real time. We demonstrate using simulation results that both approaches are very effective against bandwidth attacks. In this chapter, we will focus on the passive Victim-Router Model and discuss the active Victim-Router Model in the next chapter.

The rest of this chapter is organized as follows. Section 4.2 provides the motivation for APPM. Section 4.3 provides the

background for APPM. Section 4.4 describes three different APPM schemes. Section 4.5 evaluates the performance of APPM. Section 4.6 discusses the related issues for APPM.

4.2 Motivation

Bandwidth attacks are extremely difficult to defend against [90], due to the ability of attackers to use incorrect ("spoofed") IP addresses in the attacking packets and therefore disguise the real origin of the attacks. This has made it very difficult or impossible to traceback the source of attacking IP packets. A number of recent studies have approached the problem of IP packet traceback by Probabilistic Packet Marking (PPM) [41][43]. It is assumed that the attacking packets are much more frequent than the normal packets. The main idea is to let every router mark packets probabilistically and let the victim reconstruct the attack path from the marked packets. All of the probabilistic marking algorithms try to overload the marking information into the 16-bit identification field in the IP packet header, which is seldom used [91][92].

Since routers mark packets indiscriminately, packets marked by upstream routers can be overwritten by downstream routers. Hence, a major issue with existing probabilistic marking schemes is that they use a fixed marking probability, which means that there is a greatly reduced probability of getting packets from routers which are far away from the victim. Consequently, the number of packets needed to reconstruct the attack path depends on the number of packets that are marked by the furthest router in the attack path. If we can increase the marking probability for the routers that are far away from the victim, then we need fewer packets to reconstruct the attack path.

A potential problem with packet marking is that the attacker can forge the marking field. The authenticity of the marking field is the biggest challenge for probabilistic

packet marking, which is discussed in [47]. Although Song and Perrig [43] have proposed a scheme for router authentication, the implementation overhead is high and there are still opportunities for the attacker to spoof the marking field. However, if we can let the routers mark all the packets when they first enter the network, then there is no way for the attacker to use a spoofed marking field to mislead the victim.

In order to overcome the aforementioned weaknesses of the previous PPM approaches, we propose an adjusted probabilistic packet marking (APPM) scheme. This scheme allows the victim to traceback the approximate origin of spoofed IP packets efficiently and reliably. APPM chooses to mark each packet with a probability that is adjusted by the distance between the marking router and the destination. We develop three techniques to adjust the probability so that the victim can receive an equal number of marked packets from each marking router. Then there will be no bottleneck for the victim to collect the path information. Hence, the total number of packets needed by the victim to reconstruct the attack path is significantly reduced. More importantly, we give a detailed analysis of the vulnerabilities of probabilistic packet marking, and describe a version of our adjusted probabilistic packet marking scheme whose performance is not affected by spoofed marking fields.

In this chapter, we make two contributions. First, we have developed three techniques for adjusting the probability used by routers to mark packets, in order to reduce the number of packets needed by the victim to reconstruct the attack path. Second, we give a detailed analysis of the vulnerabilities of PPM, and describe a version of our adjusted probabilistic packet marking scheme whose performance is not affected by spoofed marking fields. We demonstrate the benefits of our approach with an analytical model as well as an experimental evaluation using simulated packet traces.

4.3 Background on Probabilistic Packet Marking (PPM)

Once an attack has been detected, an ideal response would be to block the attack traffic at its source. Unfortunately, there is no easy way to trace IP traffic to its source. This is due to two features of the IP protocol. The first feature is the ease with which IP source addresses can be forged. The second feature is the stateless nature of IP routing, where routers normally know only the next hop for forwarding a packet, rather than the complete end-to-end route taken by each packet. This design decision has given the Internet enormous efficiency and scalability, albeit at the cost of traceability. In order to address this limitation, probabilistic packet marking (PPM) has been proposed to support IP traceability.

Definitions

The main idea of PPM is to let routers mark packets with path information probabilistically and let the victim reconstruct the attack path using the marked packets. Routers must also mark packets that are already marked, so that attackers cannot pre-fill the marking field of all attack packets and thereby block out any marking by downstream routers. Denial-of-service attacks are only effective so long as they occupy the resources of the victim. As a result, most denial-of-service attacks consist of thousands or millions of packets. PPM is based on the assumption that even if we mark each packet with only a small probability, the victim will receive sufficient packets to reconstruct the attack path.

The network can be viewed as a directed graph G = (V, E), where V is the set of nodes and E is the set of edges. V can be further partitioned into end systems (leaf nodes) and routers (internal nodes). The edges denote physical links between

elements in V. Let S ⊂ V denote the set of attackers and let t ∈ V \ S denote the victim. We will first consider the case |S| = 1 (single-source attack) and treat the distributed DoS attack case separately. We assume that routes are fixed, and that the attack path A = (s, v_1, v_2, ..., v_d, t) is comprised of d routers (or hops) and has path length d [47]. Let N denote the number of packets sent from s to t. A packet x is assumed to have a marking field where the identity of a link (v, v') ∈ E traversed can be inscribed. A packet travels on the attack path sequentially. At a hop v_i ∈ {v_1, ..., v_d}, packet x is marked with the edge value (v_{i-1}, v_i), i = 1, ..., d, with probability p. As we see in Figure 4.1, packet 1 is marked with edge value (v_1, v_2) and distance 2; packet 2 is marked with edge value (v_2, v_3) and distance 1. When t receives these two packets it can reconstruct the attack path (v_1, v_2, v_3).

Each router marks a packet with probability p. When the router decides to mark a packet, it writes its own IP address into the edge field and zero into the distance field. Otherwise, if the distance field is already zero, which means this packet has been marked by the previous router, it processes the packet as follows: (1) it combines its IP address with the existing value in the edge field, and writes the combined value into the edge field; (2) it increases the distance value by 1. Thus, the edge value contains information from both the previous router and the current router. Finally, if the router does not mark the packet, it always increments the distance field. The distance field indicates the number of hops between the victim and the router that marked the packet, and should be updated using saturating addition, meaning that the distance field is not allowed to overflow. When using this scheme, any packet written by the attacker will have a distance field greater than or equal to the length of the real attack path. In contrast, a packet which is marked by a router will have a distance field which is less than the length of the path traversed from that router.
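The per-router behaviour just described can be sketched as follows (an illustrative simplification, not the exact encoding of [41][43]; the edge "combination" is shown as an XOR of hashed addresses and real field widths are ignored):

```python
import random

MAX_DISTANCE = 31  # a 5-bit distance field saturates rather than overflowing

def ppm_mark(packet, router_addr, p):
    """One router's step in basic probabilistic packet marking.

    `packet` is a dict standing in for the marking field: {'edge': int, 'distance': int}.
    """
    if random.random() < p:
        packet["edge"] = hash(router_addr)        # start a new mark at this router
        packet["distance"] = 0
    elif packet["distance"] == 0:
        packet["edge"] ^= hash(router_addr)       # previous hop marked: complete the edge
        packet["distance"] = 1
    else:
        packet["distance"] = min(packet["distance"] + 1, MAX_DISTANCE)  # saturating addition
    return packet
```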

Figure 4.1: Probabilistic Packet Marking

Let us take the topology shown in Figure 4.1 as an example. Any packet created by the source s will have a distance field which is greater than or equal to 3, unless it is marked by a router. If a packet is only marked by router v_1, it will arrive at the victim with a distance field of 2.

Savage et al. proposed a method called the fragment marking scheme (FMS) [41] to compress the IP addresses and reconstruct the attack path. It was later improved by Song and Perrig [43]. Unless otherwise stated, when we talk about PPM in the rest of this thesis, we are referring to Song and Perrig's version of PPM.

Limitation of Previous PPM Schemes

Our aim is to minimize the time required to reconstruct the attack path. This depends on the time it takes to receive packets that have been marked by each router on the attack path. This in turn depends on the choice of the marking probability p. In this section, we model the performance of PPM in terms of p, and highlight the limitation of using a fixed marking probability.

Definition 1. Let α_i denote the probability that a packet arriving at the victim was last marked at node v_i and nowhere after v_i. For a uniform marking probability, α_i = Pr{x_d = (v_{i-1}, v_i)} = p(1 - p)^(d-i), i = 1, 2, ..., d.

Definition 2. Let α_0 denote the probability that a packet sent from the attacker reaches the victim without being marked at any of the routers. For a uniform marking probability, α_0 = (1 - p)^d.

In order to reconstruct the attack path as quickly as possible, the victim needs to receive a sample of packets marked by each router in

the path. An unmarked packet provides no information to the victim. In fact, there is a risk that unmarked packets may contain misleading information that has been spoofed by the attacker. Consequently, we want as many packets to be marked as possible. This implies that p should be large, so that α_0 is as small as possible. However, there is a penalty for making p too large. As p increases, there is a greater likelihood that packets marked by routers close to the source will be overwritten by routers close to the victim. Note that α_d ≥ ... ≥ α_2 ≥ α_1, so α_1 is the smallest value; the situation is worst for packets marked by the first router after the source. So we need to choose p such that α_0 is minimized and α_1 is maximized.

According to the coupon collecting problem (footnote 1) [93], for each attack path with d routers (excluding the victim) and with marking probability p, the expected number of packets needed to reconstruct the attack path is N(d) = (ln(d) + O(1)) / (p(1 - p)^(d-1)) [41]. In particular, let N_f(d) represent the number of packets needed to reconstruct the attack path using a fixed marking probability, and let N_a(d) represent the number of packets needed using an adjusted marking probability. N(d) is minimized when p(1 - p)^(d-1) is maximized, i.e., when its derivative equals 0:

d/dp [p(1 - p)^(d-1)] = (1 - p)^(d-1) - (d - 1)p(1 - p)^(d-2) = 0.

Since (1 - p) ≠ 0, this gives (1 - p) - (d - 1)p = 1 - p - dp + p = 1 - dp = 0. Therefore, the minimum value of N(d) is achieved when p = 1/d. Consequently, the number of packets needed to get one sample from each router is

N_f(d) ≈ (ln(d) + O(1)) / ((1/d)(1 - 1/d)^(d-1)).

This is the best result we can achieve for marking algorithms with a fixed probability. Our proposal is to reduce the total number of packets required, N(d), by using a higher marking probability for routers close to the source. Ideally, we want to receive an equal number of packets marked by each router on the attack path, i.e., α_i = 1/d.

Footnote 1. Coupon collecting problem: if there are n distinct kinds of coupons, each equally likely to be received with any given purchase, what is the expected number of purchases needed to acquire a complete set of coupons? The well-known solution is n(1 + 1/2 + 1/3 + ... + 1/n) = n(ln(n) + O(1)).
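As a rough numerical illustration of these bounds (ignoring the O(1) terms; the d·ln(d) figure for the equal-probability case is derived next):

```python
import math

def expected_packets_fixed(d, p):
    """Coupon-collector estimate ln(d) / (p * (1 - p)**(d - 1)) for a fixed marking probability."""
    return math.log(d) / (p * (1 - p) ** (d - 1))

def expected_packets_equalized(d):
    """Estimate d * ln(d) when every router's mark is equally likely to survive."""
    return d * math.log(d)

for d in (10, 20, 30):
    best_fixed = expected_packets_fixed(d, 1.0 / d)   # optimal fixed probability p = 1/d
    print(d, round(best_fixed), round(expected_packets_equalized(d)))
# The ratio approaches e (about 2.7): the equal-probability case needs far fewer packets.
```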

According to the coupon collecting problem [93], the number of packets needed for reconstruction under the adjusted scheme is N_a(d) ≈ d(ln(d) + O(1)). The savings of this approach are

N_f(d)/N_a(d) ≈ [(ln(d) + O(1)) / ((1/d)(1 - 1/d)^(d-1))] / [d(ln(d) + O(1))] = 1/(1 - 1/d)^(d-1),

which is at least 2 for d ≥ 2. Our aim has been to develop a technique for adjusting the marking probability so that we can achieve the performance of N_a(d).

4.4 Passive Victim-Router Model: APPM

Existing proposals for PPM [41][43] have some critical limitations. A major disadvantage of previous PPM schemes is that the number of packets needed for path reconstruction grows exponentially with the path length. This makes it difficult to react to attacks quickly and could become a system bottleneck. To address the vulnerabilities of previous PPM schemes, we propose a passive Victim-Router Model called adjusted probabilistic packet marking (APPM). According to the analysis in Section 4.3, we propose that a router should adjust its packet marking probability based on its position in the attack path. However, the position of the router in the attack path is not known, since the position of the attacker is unknown. We need to estimate this distance based on the available information. In this section, we propose three different schemes for adjusting the marking probability, based on three different distance measures d_1, d_2 and d_3. The definition of these distances is shown in Figure 4.2.

Figure 4.2: Definition of different distance measures

Number of Hops Traversed by Packet (d_1)

Let every router mark the packet with probability p_1(d_1) = 1/d_1. The ideal case for packet marking is to receive packets marked by each router with equal probability α_i = 1/d, where the path length is d. Let p_1(d_1) represent the marking probability of the router at distance d_1 from the source, where d_1 = 1, 2, ..., d. Then we obtain the following equations:

α_d = p_1(d) = 1/d    (4.1)
α_{d-1} = p_1(d-1)[1 - p_1(d)] = 1/d    (4.2)
α_{d-2} = p_1(d-2)[1 - p_1(d-1)][1 - p_1(d)] = 1/d    (4.3)

From equation 4.1 we get p_1(d) = 1/d; from equation 4.2 we get p_1(d-1) = 1/(d-1); from equation 4.3 we get p_1(d-2) = 1/(d-2). Accordingly, we can summarize the marking probability formula as p_1(d_1) = 1/d_1. Then, for the router at distance d_1,

α_{d_1} = (1/d_1)(1 - 1/(d_1+1))(1 - 1/(d_1+2)) ··· (1 - 1/d).

This equation can be simplified as

α_{d_1} = (1/d_1) · (d_1/(d_1+1)) · ((d_1+1)/(d_1+2)) ··· ((d-1)/d) = 1/d.

This means that if each router marks the packet with probability p_1(d_1) = 1/d_1, we receive the packets marked by each router with equal probability 1/d, given that the path length is d.

In order to implement this marking scheme, we need to know the distance measure d_1. We propose to add an extra field in the IP option field. This field can be used to record the number of hops (d_1) traversed by the packet. The default value of this field is 0, and the router increases this value by 1 every time it forwards the packet.
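This telescoping calculation can be checked numerically (a small sanity check under the idealised assumption that d_1 is known exactly at every hop):

```python
def survival_probability(d1, d):
    """alpha_{d1}: probability that the mark of the router d1 hops from the source
    survives to the victim when the router at distance k marks with probability 1/k."""
    prob = 1.0 / d1
    for k in range(d1 + 1, d + 1):
        prob *= 1.0 - 1.0 / k          # each downstream router declines to overwrite
    return prob

d = 25
print([round(survival_probability(d1, d), 4) for d1 in (1, 10, 25)])  # all equal 1/25 = 0.04
```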

Every time the router receives a packet, it extracts the information d_1 from the option field and marks the packet with probability 1/d_1. In order to prevent the attacker from spoofing this field, we can use the encryption schemes that are discussed in [43].

Number of Hops Traversed Since the Packet was Last Marked (d_2)

In the original probabilistic packet marking (PPM) scheme [41], there are three parts in the marking field. One part is called the distance field (d_2), which is used to hold the distance from the last router that marked the packet to the current router. We denote d_2 = 0 for routers next to each other. The larger the d_2 value, the higher the likelihood that the existing mark will be overwritten. Thus, we should use a low marking probability for packets with a high d_2 value. We propose that a router seeing distance value d_2 should mark the packet with probability 1/(2(d_2 + 1)). Let us illustrate the derivation of this formula by considering an example where the attack path length is 3. A packet which has a distance value d_2 in the marking field is marked by the router with a probability p_2(d_2). We assume that routers mark each packet when it first enters the network, so when the packet passes the first router, the d_2 value will be set to 0. By analyzing all the possibilities of the d_2 value as packets traverse the attack path, we can derive expressions for α_i, i = 1, 2, ..., d. Using these equations, we can find optimal marking probabilities for α_1, α_2, α_3. However, the equations become more complicated as the path length increases. Consequently, we propose that the general marking probability should be p_2(d_2) = 1/(2(d_2 + 1)) (footnote 2), which has been shown through experiments to have the best performance.

Footnote 2. Based on the principle that a packet with a large d_2 value should have a small marking probability, we tried a variety of formulas that observe this principle and chose p_2(d_2) = 1/(2(d_2 + 1)) as the optimum marking scheme based on the empirical results.

Since there are 5 bits in the marking field to hold the information d_2 in the existing probabilistic marking scheme [41][43], we only need to extract this information from the marking field and mark the packet according to the formula p_2(d_2) = 1/(2(d_2 + 1)).

Number of Hops from Current Router to Destination (d_3)

If we can determine the distance of the current router to the destination (d_3), we can mark each packet with a probability p_3(d_3) = 1/(c + 1 - d_3), where c is a constant. Using this scheme, we can receive packets marked by each router with a probability of 1/c. We can derive this result as follows:

α_{d_3} = p_3(d_3)[1 - p_3(d_3 - 1)] ··· [1 - p_3(2)][1 - p_3(1)]
        = (1/(c + 1 - d_3)) · (1 - 1/(c + 2 - d_3)) ··· (1 - 1/(c - 1)) · (1 - 1/c)
        = (1/(c + 1 - d_3)) · ((c + 1 - d_3)/(c + 2 - d_3)) ··· ((c - 2)/(c - 1)) · ((c - 1)/c)
        = 1/c.

In order to ensure that our probabilities are greater than zero, we have to make sure that c + 1 - d_3 > 0. Since most path lengths in the Internet are bounded by 30 [94][95][96], we can take c = 30 for safety. So if we mark with probability p_3(d_3) = 1/(31 - d_3), we can make sure that we receive the packets marked by each router with probability 1/30.

We rely on the routing protocol to provide us with the distance measure d_3. Current Internet routing protocols are destination-based, and every time the router forwards a packet, it looks at the routing table to find the next hop to the destination. For example, Open Shortest Path First (OSPF) [97] is such a protocol, which is widely adopted as the TCP/IP Internet routing standard within an Autonomous System. Internet routing protocols provide us with a measure of the number of

hops to each destination, which can be stored in the routing table as a measure of the distance d_3. When the router starts to route the packet, it can extract the distance information d_3 from the routing table and then mark the packet according to the formula p_3(d_3) = 1/(31 - d_3).

The three Adjusted Probabilistic Packet Marking schemes are shown in Figures 4.3, 4.4 and 4.5. P.distance and P.edge denote the distance and edge information of the marking field. More details about the encoding schemes can be found in [43].

Figure 4.3: Adjusted Probabilistic Packet Marking Scheme One

Figure 4.4: Adjusted Probabilistic Packet Marking Scheme Two

Figure 4.5: Adjusted Probabilistic Packet Marking Scheme Three
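As a compact sketch of the three marking rules shown in Figures 4.3-4.5 (illustrative only; the actual handling of P.edge and P.distance follows [43], and the first-hop guard in scheme 1 is an assumption):

```python
C = 30  # assumed upper bound on Internet path lengths, as chosen for scheme 3

def p_scheme1(d1):
    """Scheme 1: d1 = hops the packet has traversed so far (hop-count option field)."""
    return 1.0 / max(d1, 1)            # guard: treat the very first hop as d1 = 1

def p_scheme2(d2):
    """Scheme 2: d2 = value of the distance field since the packet was last marked."""
    return 1.0 / (2 * (d2 + 1))

def p_scheme3(d3):
    """Scheme 3: d3 = hops from this router to the destination, read from the routing table."""
    return 1.0 / (C + 1 - d3)

# A router then performs the usual marking step on P.edge and P.distance, but with the
# fixed probability p replaced by the value returned above for its scheme.
```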

Summary

We can summarize each marking scheme in terms of its performance and practicality.

Marking scheme 1: p_1(d_1) = 1/d_1 can achieve the ideal marking performance. With this marking scheme, we can receive the packets marked by each router with equal probability for any path length. Furthermore, every packet is marked under this scheme, so the attacker has no chance to spoof the marking field. However, this scheme requires a special hop count field, and there is a risk that this field can be spoofed by the attacker. In order to make this scheme work, we need a strong authentication scheme which can stop the attacker from spoofing, such as that described in [43].

Marking scheme 2: p_2(d_2) = 1/(2(d_2 + 1)) uses the distance field that is part of the packet marking scheme. This scheme can achieve a performance which is close to the optimal performance. In order to make this scheme work, we need to make sure the distance value in the marking field is trustworthy. One possibility is to let the routers mark all the packets when they first enter the network; then the attackers have no

way to spoof the distance value. However, this is only practical if we control the ingress routers to our network, and thus it is effectively the same as a technique called ingress filtering [21].

Marking scheme 3: p_3(d_3) = 1/(c + 1 - d_3) uses information from the routing protocol and can achieve better results than using a uniform marking probability. Since the information comes from the routing protocol, it cannot be manipulated by an attacker. So scheme 3 is the safest and most practical scheme.

4.5 Evaluation for APPM

Our aim is to evaluate the performance of our three APPM schemes in comparison to standard PPM. Our evaluation is based on the number of packets needed to reconstruct the attack path for a range of simulated attacks.

Methodology

We simulate attacks from different distances using the methodology in [43]. The network topology is based on a real traceroute dataset obtained from Lucent Bell Labs [98]. In our simulation, we vary the attack path from 1 to 30 hops and conduct 1000 random trials at each path length value. We measured the number of packets required to reconstruct the attack path using our schemes, and compared this to the number of packets required by PPM [43], where our implementation of PPM used a threshold parameter of M = 5 as defined in [43]. As analyzed in Section 4.3, the optimum marking probability for PPM is p = 1/d, where d is the path length. We varied the uniform marking probability of PPM using the values p = 0.01, 0.04, and 0.1. Note that p = 0.04 is recommended as the optimum choice for PPM [41].
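The following toy Monte Carlo (a single linear path with simplified marks, not the Bell Labs topology used in the actual evaluation) illustrates the kind of measurement performed in these trials:

```python
import random

def packets_to_reconstruct(d, marking_prob, trials=200):
    """Average number of packets before the victim has seen a mark from all d routers."""
    total = 0
    for _ in range(trials):
        seen, packets = set(), 0
        while len(seen) < d:
            packets += 1
            last_mark = None
            for hop in range(1, d + 1):              # hop 1 is next to the source
                if random.random() < marking_prob(hop, d):
                    last_mark = hop                  # later routers overwrite earlier marks
            if last_mark is not None:
                seen.add(last_mark)
        total += packets
    return total / trials

uniform_004 = lambda hop, d: 0.04          # PPM with the recommended p = 0.04
scheme_one  = lambda hop, d: 1.0 / hop     # APPM scheme 1
print(packets_to_reconstruct(20, uniform_004), packets_to_reconstruct(20, scheme_one))
```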

Figure 4.6: APPM Schemes 1, 2, and 3 compared with uniform marking probability p = 0.01, 0.04, and 0.1 (number of packets needed for reconstruction versus path length)

Results

The performance of our APPM schemes 1 to 3 is shown in Figure 4.6 as we vary the path length from the attacker to the victim. Schemes 1 and 2 perform the best, outperforming PPM for all values of p tested. However, these results assume that the distance field has not been tampered with by the attacker. For example, the attack packets can be forged with a large d_1 or d_2 value in order to reduce the probability of being marked by the downstream routers. Scheme 3 is the most practical, since its distance measure cannot be tampered with by the attacker. Scheme 3 outperformed PPM with p = 0.01 and p = 0.04. Although PPM with p = 0.1 outperforms scheme 3 for small hop counts, scheme 3 performs far better when the attack path is large.

Scheme 3 outperforms scheme 2 when the path length is 20 or higher, as shown in Figure 4.6. This is because, as the path length increases, scheme 3 approaches the optimum performance while scheme 2 cannot achieve the optimum performance, as we discussed in Section 4.4. Furthermore, scheme 1 and scheme 3 converge when the path length equals 30, because c equals the path length, which makes p_3(d_3) equivalent to p_1(d_1).

4.6 Discussion

APPM Against DDoS Attacks

During a distributed denial-of-service attack, there are many attacking sources. We have found that the number of packets needed for reconstruction increases linearly with the number of attackers. This makes it very difficult to identify all the attacking sources during a DDoS attack. Thus, our method of reducing the number of packets needed for reconstruction becomes extremely important for improving the reconstruction efficiency.

APPM and Marking Field Spoofing

By spoofing the marking field, it is possible for attackers to make the attack appear as though it has come from a more distant source, e.g., a false source s_f as shown in Figure 4.7. However, the attacker cannot change the marking of routers between it and the victim, e.g., v_1 to v_3. Consequently, we can always reconstruct the path to the attacker, although we may also reconstruct a false sub-path at the start of the true path, e.g., v_f1 to v_f3.

If we are unable to authenticate the marking field, then this false sub-path can affect the performance of our first two schemes. This is because the distance measures

d_1 and d_2 will be inflated by the false sub-path, thus decreasing the packet marking probability of routers in the true attack path. However, our third scheme is unaffected by the actions of the attacker. This is because d_3 is derived from information in the routing table of each router and from the destination field. The attacker cannot fake the destination field without defeating the purpose of the attack, and the attacker cannot manipulate the contents of the routing tables in the routers. Thus, the performance of our third scheme is secure against manipulation by the attacker.

Figure 4.7: Effect of spoofing the marking field (fake sub-path: v_f1 to v_f3; true path: v_1 to v_3)

In summary, every marking scheme in the literature uses a fixed marking probability. This means that only a small number of packets that are marked by the more distant routers will arrive at the victim. In contrast, we have developed several schemes that solve this problem by adjusting the marking probability in each router, which significantly reduces the number of packets required to reconstruct the attack path. Furthermore, no existing scheme completely solves the security problem that the attacker can fake the marking field. However, our third marking scheme does not use the contents of the marking field to adjust the marking probability, and thus cannot be manipulated by the attacker. Thus, our third APPM scheme provides a more robust and efficient method for tracing the source of an attack.

4.7 Conclusion

We have proposed a novel Victim-Router Model called the Adjusted Probabilistic Packet Marking scheme, which is a passive approach to defense. Our first contribution to packet marking is that we developed three techniques to adjust the marking probability used by each router so that the victim receives packets marked by each router with equal probability. Scheme 1 lets the IP packet carry a message to inform the router how far the packet has traveled. Scheme 2 uses the distance value of the marking field in the IP packet. Scheme 3 extracts the distance between the router and the destination from the routing table. Both Scheme 1 and Scheme 2 need authentication to prevent the attacker from spoofing the required information. Scheme 3 is the most practical one and can improve the reconstruction efficiency compared with the optimal uniform marking probability (p = 0.04). By implementing this scheme, we can substantially reduce the number of packets needed to reconstruct the attack path in comparison to PPM. Our second contribution is that we give a detailed analysis of the vulnerability of PPM, and describe a version of our adjusted probabilistic packet marking scheme whose performance is not affected by the vulnerability caused by spoofed marking fields. In the next chapter, we will discuss the active Victim-Router Model for defense.


Chapter 5

Selective Pushback

5.1 Introduction

The advantage of Adjusted Probabilistic Packet Marking (APPM) is that it can locate attack sources regardless of spoofed source addresses, which provides a deterrent to potential attackers. However, the common limitation of all traceback schemes [41][43][44][9] is that the damage of the attack cannot be controlled during the traceback process without an effective source filtering scheme. Moreover, in the absence of source filtering, the final goal of traceback schemes is to find the ultimate attack sources, which becomes extremely challenging for distributed attack sources. To identify attack sources as well as filter attack traffic, we introduce a router-based system called Selective Pushback to defend against distributed denial of service (DDoS) attacks. DDoS attacks are treated as a congestion-control problem. The main issue is to identify the congestion and then push back a packet filter to the router closest to the source that causes the congestion. This allows us to save the bandwidth that would otherwise be wasted by packets destined to be dropped at the destination. This mechanism is also useful for tackling a flash crowd, where a link is inundated with many more legitimate requests than it can handle. Unlike previous approaches, we

propose an anomaly detection scheme using source information. Since the source IP address is not trustworthy, we obtain source information using our APPM scheme. By filtering packets based on this source information, we can filter malicious traffic while protecting legitimate traffic.

In this chapter, we make two contributions. First, we propose a Selective Pushback scheme to fix the inherent vulnerability of probabilistic packet marking. The robustness of Selective Pushback lies in the fact that it locates attack sources using traffic volume as well as historical traffic statistics. Second, we use the path reconstructed by probabilistic packet marking to direct the pushback and block the malicious traffic more efficiently and accurately.

The rest of the chapter is organized as follows. Section 5.2 provides the background for Selective Pushback. Section 5.3 describes the Selective Pushback algorithm in detail. Section 5.4 evaluates the performance of the Selective Pushback scheme. Section 5.5 discusses the related issues of Selective Pushback.

5.2 Background

When we detect a DDoS attack, the most important step is how to react to the attack. Mahajan et al. [58] have proposed a scheme to recursively push back a control signal into the network. The control signal includes a description of the traffic aggregate which causes the congestion. When a router receives the pushback signal, it starts to monitor the incoming traffic and decides whether to propagate the pushback signal to its neighboring routers.

Definitions

Consider the network in Figure 5.1. Server t is under attack; the routers Rn (n = 1, 2, ..., 8) are the last few routers through which traffic reaches t. The thick lines show

links through which attack traffic is flowing; the thin lines show links without attack traffic. Only the last link R8-t is actually congested, as the other part of the network is adequately provisioned. Without any special measures, hardly any non-attack traffic can reach the destination. All of the links carry some non-attack traffic, but most of it is dropped due to congestion on R8-t. If the congested router R8 can propagate the congestion signal to its neighboring routers, the attack traffic can be effectively filtered at R1 and R .

Figure 5.1: Router map showing attack traffic in bold

Limitation of Router-based Pushback

The deployment of filters in upstream routers depends on the downstream router's ability to estimate what fraction of the aggregate comes from each upstream router. Mahajan et al. [58] propose to let the downstream router send a dummy pushback request to all upstream neighbors. The recipient will then estimate the arrival rate of the specified aggregate and report it to the downstream router in

status messages. In this scheme, the downstream router needs to contact all its upstream neighbors, and all the upstream neighbors need to estimate the aggregate arrival rate. This additional processing makes the router implementation much more complicated and increases the attack reaction time. Moreover, during a DDoS attack, attack traffic on one remote upstream link could be less than the legitimate traffic on another remote upstream link. Hence, estimating the traffic aggregate arrival rate does not give substantial supporting evidence to locate attack sources.

5.3 Selective Pushback

In this section, we will first give an overview of the active Victim-Router Model, Selective Pushback, and then give an illustrative example to describe its operation.

Overview of Selective Pushback

For normal pushback, the pushback signal is sent to all of the router's upstream neighbors, and every router has to start a monitoring process after receiving the pushback packet. We propose a mechanism, called Selective Pushback, to direct the pushback packet directly to routers near the source of the attack. The potential benefits are to reduce the computing overhead for the routers, and to reduce the time for the relevant routers to receive a control signal. Since PPM can provide us with partial information about where the attack traffic comes from [9], we can send the pushback packet to the routers in the reconstructed attack path and ask those routers to filter the attack traffic. Therefore, this method is more efficient compared with the hop-by-hop pushback mode proposed in [58].

However, DDoS attacks can create a large number of attack flows with small traffic volume. Since Probabilistic Packet Marking (PPM) relies on a high traffic volume

on each attack path in order to quickly reconstruct the attack path, the performance of PPM will be degraded in a DDoS attack. Although the attacker can make the attack traffic well distributed across all the links, we can still see changes in traffic volume from the normal state on each link. This requires the router to keep a normal profile of where the traffic comes from. Obviously, this depends on the source information of the traffic flows. Unfortunately, due to the inability of the TCP/IP protocol suite to authenticate the source address, we cannot use the source IP address to keep this record. Since PPM provides source information with high confidence without depending on the IP source address, it is possible to perform source-based anomaly detection by analyzing changes in the distribution of PPM fields.

During normal conditions, the target collects source information from the incoming traffic and builds up a normal profile for the sources of the traffic flows. Since routers mark packets probabilistically, the target counts the packets according to the marking field. This enables the target to learn a distribution of how many packets come from each router. We can record the traffic volume profile at regular intervals; in our current evaluation, we profile traffic on an hourly basis. Thus, the target keeps different tables for different time intervals. All the normal traffic profiles are updated automatically every week.

When severe congestion is experienced at the target (e.g., a high packet dropping rate), the target starts to compile a temporary profile of the arrival rates of packets from different routers. By comparing the temporary profile with the normal profile, we can calculate the change in rate for different routers. We can then decide whether the rate change is abnormally large by comparing it to a threshold. We describe in Section 5.4 how we calculate that threshold. If the change in rate exceeds the threshold, a pushback signal is sent to the routers that are suspected of being attack source candidates.
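A minimal sketch of this comparison (hypothetical data structures; a single illustrative threshold stands in for the per-router control limits derived in Section 5.4):

```python
def traffic_shares(marked_counts):
    """Per-router share of traffic, estimated from APPM-marked packets."""
    total = sum(marked_counts.values())
    return {router: count / total for router, count in marked_counts.items()}

def suspicious_routers(current_counts, normal_profile, threshold):
    """Routers whose share of traffic rose abnormally above their normal profile."""
    current = traffic_shares(current_counts)
    return [router for router, share in current.items()
            if share - normal_profile.get(router, 0.0) > threshold]

# Hypothetical numbers mirroring the example in the next subsection: R1 rises from 2%
# to 4% of the traffic (suspicious) while R2 falls from 8% to 7% (normal).
normal = {"R1": 0.02, "R2": 0.08, "others": 0.90}
during = {"R1": 4, "R2": 7, "others": 89}        # marked packets counted during congestion
print(suspicious_routers(during, normal, threshold=0.01))   # -> ['R1']
```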

An Example of Selective Pushback

Figure 5.2 shows an example of the Selective Pushback scheme, where attack traffic comes from routers R1 and R3, and the attack path is indicated by the thick lines. First, all the routers mark packets using APPM Scheme 3, which is discussed in Section 4.4. We can choose the constant c to be 30. The benefit of using APPM is to ensure that server t receives packets marked by each router with the same probability; in our case, the probability will be 1/30. Hence, by collecting the marked packets, we can estimate the actual traffic volume from each router to server t. For example, if server t receives 10 packets marked by router R3, then there are about 10 × 30 = 300 packets passing through R3 to server t.

Figure 5.2: An example of the Selective Pushback scheme

As shown in Figure 5.2, server t keeps the traffic distribution during normal operation, which is called the normal profile. When an attack is detected, we can identify R1, R3, R5 and R6 to be abnormal routers by comparing the current traffic distribution with the normal profile. As R1 and R3 are the routers close to the attack sources according to the network topology,

server t sends a pushback packet to routers R1 and R3 directly. However, a traditional router-based pushback scheme [58] will infer the attack path hop-by-hop by estimating the incoming traffic rate to server t, which is much more time-consuming.

One thing worth noting is that R2 accounts for 7% of the total traffic during the attack, while R1 only accounts for 4%. A router-based pushback scheme will identify R2 instead of R1 to be abnormal, as R2 has the higher traffic volume. Consequently, a pushback packet will be sent to the normal router R2 to filter or rate-limit the traffic to server t, which will unfairly punish the legitimate traffic. In contrast, our Selective Pushback scheme can identify the attack path by checking the deviation of the traffic distribution. As R1 normally accounts for 2% of the total traffic, it becomes suspicious when its traffic distribution jumps to 4%. As R2 normally accounts for 8% of the total traffic, it is unlikely to be an attack source when its traffic distribution becomes 7%; this decrease in its traffic share can be caused by the increase of the total traffic volume. Therefore, Selective Pushback can identify R1 to be the abnormal router, and send a pushback packet to R1 directly.

5.4 Evaluation for Selective Pushback

Simulation Methodology

Our aim is to simulate attacks in order to investigate the performance of our Selective Pushback scheme. This requires realistic background traffic. Unfortunately, it is difficult to obtain extensive real-life packet traces. The most extensive public-domain packet traces come from the University of Auckland [74]. In our experiment, we use the data traces from the University of Auckland as our background traffic. We assume there is no DDoS attack traffic in the Auckland data traces. Then, we split the Auckland data traces into two parts. The first part is used as training data,

which can be used to develop a normal traffic distribution profile. The second part is merged with some simulated DDoS attacks, and is used as the testing data. We can evaluate our scheme by analyzing the performance of Selective Pushback against the testing data.

Normal Traffic Simulation

In order to simulate normal traffic conditions, we analyzed packet traces from the Internet link to the University of Auckland [74]. In order to extract connection information from the data traces, we use TCP SYN packets to indicate traffic which goes into the University.

The key component of the Selective Pushback scheme is the normal traffic distribution profile. The aim of the normal traffic profile is to provide a yardstick to decide whether the observed traffic distribution is normal or abnormal. Generally, traffic levels can vary according to the network location and the time of day. In order to build an accurate normal traffic distribution profile, we need to take these two factors into account.

In order to simulate the real-time traffic to the University of Auckland, we need to know the path of the network traffic. Given that we do not have the real router topology leading to the University of Auckland, and given that the IP addresses in the traces have been randomized, we have used the simulated topology shown in Figure 5.3, where t is the target we want to protect. In our case, t represents the network of the University of Auckland. As the IP addresses in the Auckland data traces are encoded, the network information of the IP address is lost. However, for simplicity of the simulation, we still use the 24-bit IP prefix to group the network traffic. We assign the IP addresses with the same 24-bit IP prefix in the trace to the upstream routers randomly. For example, all packets passing router R3.0 have the IP prefix of *. Hence, in order to

keep the traffic distribution for router R3.0, we only need to count the proportion of network traffic with IP prefix *.

Figure 5.3: Simulation topology

As the time of day affects the traffic level, we analyzed traffic from the same hour of 15 weekdays from March 12, 2001 to March 30, 2001. Consequently, the target R0.0 keeps a normal traffic distribution profile for each hour of the day. Note that weekend traffic is normally different from weekday traffic, and is not used in our experiment. We process the data traces as follows (a sketch of this computation appears after the list):

1. Take traffic data traces from the same hour, for example, 2.00AM to 3.00AM;
2. Calculate the total number of SYN packets (denoted s);
3. Calculate the number of SYN packets that originate from router i (denoted a_i);
4. Calculate the traffic distribution of router i, which can be represented as a_i/s.
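The sketch below transcribes these four steps directly (the record fields are assumptions about how the trace is pre-processed, not the actual trace format):

```python
from collections import Counter

def traffic_distribution(records, hour):
    """Per-router share of TCP SYN packets observed during one hour of a trace."""
    syns = [r for r in records if r["hour"] == hour and r["is_syn"]]   # step 1
    s = len(syns)                                                      # step 2: total SYNs
    a = Counter(r["router"] for r in syns)                             # step 3: per-router SYNs
    return {router: a_i / s for router, a_i in a.items()}              # step 4: a_i / s
```

Repeating this for the same hour across the 15 weekday traces gives, for each router, the samples from which the control limits below are derived.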

Figure 5.4: The traffic distribution of router R3.0

Statistical Process Control

Figure 5.4 plots the traffic distribution of router R3.0 during three different hours (2AM to 3AM, 9AM to 10AM, and 11AM to 12PM). The horizontal axis represents the date of the trace. For example, 1 represents March 12, 2001 (Monday), 9 represents March 22, 2001 (Thursday), and 15 represents March 30, 2001 (Friday). The vertical axis represents the traffic distribution of router R3.0. As shown in Figure 5.4, the traffic distribution during one particular hour varies according to the date. In order to decide whether the incoming traffic distribution is normal or abnormal, it is essential to develop a threshold for each router at one particular hour using the normal traffic distribution profile. For example, the traffic distribution of router R3.0 (footnote 1) between 2.00AM and 3.00AM varies from day to day, ranging upward from 0.

Footnote 1. As explained in Section 5.5.2, the source IP addresses are encoded in the traces we are using. Hence, the traffic statistics are an approximation of the real statistics for upstream routers.

If the traffic distribution of router R3.0 is 0.06 in the monitoring period between 2.00AM and 3.00AM, we should be able to decide whether an attack has happened. In order to find an appropriate threshold, we assume that the network without attack traffic is a stable system in a state of control. Let µ represent the mean value and σ represent the standard deviation of the traffic distribution. According to the theory of statistical process control [81], we take µ ± 2σ as the warning line and µ ± 3σ as the action line. We use these limit lines to indicate whether the system is in a state of control. The warning line and action line are built from the 15 days of traffic data, and are updated after a certain period (for example, one month). Since a DoS attack is indicated by an increase in the percentage of traffic, we monitor only the upper control limit rather than the lower limit. Thus, we choose µ + 2σ to be the warning line and µ + 3σ to be the action line in our simulation. Since most of the DoS attacks observed in the Internet last from 5 to 10 minutes, we use 5 minutes as the time window size to monitor the distribution of incoming traffic. Once we observe that the traffic distribution of router R_n is outside the warning line, the server will send a pushback packet to router R_n to rate-limit the traffic to the server. If the traffic distribution of router R_n is outside the action line, the server will send a pushback packet to router R_n to filter all traffic to the server. For simplicity of the evaluation, we only use the warning line, µ + 2σ, as the threshold to detect DoS attacks.

Results

In order to evaluate the performance of our Selective Pushback scheme, we require data traces containing attack traffic. We used 6 weekdays' traffic from April 2, 2001 to April 9, 2001 as background traffic, then added the simulated attack traffic to the background traffic.
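The control-limit computation described above amounts to the following sketch (hypothetical profile values; the thresholds used in the evaluation are computed from the Auckland traces):

```python
from statistics import mean, stdev

def control_limits(daily_shares):
    """Warning line (mu + 2*sigma) and action line (mu + 3*sigma) for one router and hour."""
    mu, sigma = mean(daily_shares), stdev(daily_shares)
    return mu + 2 * sigma, mu + 3 * sigma

def reaction(observed_share, warning_line, action_line):
    """Reaction for one 5-minute monitoring window (upper control limits only)."""
    if observed_share > action_line:
        return "pushback: filter all traffic from this router"
    if observed_share > warning_line:
        return "pushback: rate-limit traffic from this router"
    return "no action"

# Hypothetical 15-day profile of one router's traffic share for a single hour of the day.
shares = [0.021, 0.018, 0.025, 0.030, 0.016, 0.022, 0.027, 0.019,
          0.024, 0.020, 0.023, 0.017, 0.026, 0.028, 0.021]
warning, action = control_limits(shares)
print(reaction(0.06, warning, action))   # a jump to 6% of traffic exceeds both limits
```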

To simplify the evaluation, we chose 2.00AM to 3.00AM to represent the off-peak time, and 11AM to 12PM to represent the peak time. All our evaluation is based on the traffic during these two time periods. The simulated DoS attacks we are using are SYN flood attacks, which are commonly observed Internet attacks [16]. To test the sensitivity of our scheme, we have simulated attacks from a single attack source as well as from distributed attack sources. The detection performance is evaluated in terms of two detection parameters, the false positive rate and the detection accuracy, which were defined earlier in this thesis. We want to minimize the false positive rate and maximize the detection accuracy.

Single Attack Source

We define a single-source attack as an attack which originates from one router. We create a data trace for a SYN flood attack of 100 SYNs/sec which runs for 5 minutes with IP address prefix *; hence, the simulated attack traffic is sent from router R3.0. 100 SYNs/sec is the lowest limit for a single attack source to disable a network service [75], and 5 minutes is the commonly observed attack length in the Internet [16]. As we calculate the traffic distribution every 5 minutes, there are 12 time windows during a one-hour period. The attack trace is assigned to the 6th time window of each daily data trace.

Figure 5.5 illustrates the detection performance of our scheme when operating between 2AM and 3AM; the detection threshold we use is calculated from µ + 2σ, i.e., the warning line. Figure 5.6 illustrates the detection performance of our scheme when operating between 11AM and 12PM, with the detection threshold calculated in the same way. If the traffic distribution is above the threshold, a DoS attack is detected. As we can see from Figure 5.5 and Figure 5.6, the traffic distribution jumps to above 0.9 when a DoS attack occurs, and all 6 inserted DoS attacks in each simulation scenario can be easily detected by our scheme. Hence, the detection accuracy is 100%.

Figure 5.5: Detecting a single attack source between 2AM and 3AM (traffic distribution of router R3.0, 5-minute time windows).

Figure 5.6: Detecting a single attack source between 11AM and 12PM (traffic distribution of router R3.0, 5-minute time windows).

When our scheme is operating between 2AM and 3AM, only one false positive is found, and the false positive rate is 1.4%. As no traffic distribution is above the detection threshold during normal operation between 11AM and 12PM, no false positive is reported. From the trace-driven simulation results, we can conclude that it is very easy to detect single-source attacks by using statistical control limit thresholds [81] with a low false positive rate.

Distributed Attack Sources

Attackers can reduce the attack traffic volume from each single source to hide their identities. We assume that the attack traffic comes evenly from multiple sources, which is the worst case for a DDoS attack [58]. In our simulation, we choose the IP address ranges *, *, *, *, *, * as the 6 sources respectively. The total attack traffic rate that we use for the simulation is still 100 SYNs/sec. Thus, every source will send around 16.7 SYNs/sec for 5 minutes to constitute an effective DDoS attack. Similar to the single attack source simulation, the simulated attack is also assigned to the 6th time window of each daily data trace.

As we aggregate traffic according to encoded IP addresses, this may not reflect the actual network traffic scenario. In order to make up for the disadvantage that we only have encoded real-life traffic on a simulated network topology, we evaluate a set of routers with different traffic distribution levels. We use router R3.0, which accounts for about 6.6% of the total traffic during normal operation, to represent routers with a high traffic load. We use router R3.6, which accounts for about 1.1% of the total traffic during normal operation, to represent routers with a medium traffic load. Finally, we use router R3.3, which accounts for about 0.11% of the total traffic during normal operation, to represent routers with a low traffic load.

Figure 5.7 and Figure 5.8 show the detection performance when one distributed attack source originates from router R3.0. Figure 5.9 and Figure 5.10 show the

Figure 5.7: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.0 between 2AM and 3AM. (Traffic distribution of router R3.0 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

Figure 5.8: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.0 between 11AM and 12PM. (Traffic distribution of router R3.0 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

Figure 5.9: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.6 between 2AM and 3AM. (Traffic distribution of router R3.6 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

Figure 5.10: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.6 between 11AM and 12PM. (Traffic distribution of router R3.6 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

Figure 5.11: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.3 between 2AM and 3AM. (Traffic distribution of router R3.3 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

Figure 5.12: Detecting a DDoS attack with one of the 6 distributed attack sources at router R3.3 between 11AM and 12PM. (Traffic distribution of router R3.3 against the detection threshold; 5-minute windows over 2–6 and 9 April.)

detection performance when one distributed attack source originates from router R3.6. Figure 5.11 and Figure 5.12 show the detection performance when one distributed attack source originates from router R3.3. The figures show that all inserted DDoS attacks can be easily detected. Hence, the detection accuracy is 100%. In addition, the false positive rates are shown in Table 5.1.

Table 5.1: The false positive rate for routers R3.0, R3.6, and R3.3 when there are 6 uniformly distributed attack sources

              2AM to 3AM    11AM to 12PM
Router R3.0      1.4%            0
Router R3.6      2.8%          5.6%
Router R3.3      6.9%          8.3%

Router R3.3 has a higher false positive rate because it accounts for only a small proportion of the traffic, which makes its traffic distribution more bursty. To reduce the false positive rate, we can either tune the threshold according to local network conditions or combine other traffic statistics, such as traffic volume, to decide when to raise an alarm. The average traffic distribution for routers R3.0 and R3.6 between 11AM and 12PM is higher than between 2AM and 3AM because 11AM to 12PM is normal working time. However, the average traffic distribution for router R3.3 between 2AM and 3AM is higher than between 11AM and 12PM. This may be partly because the traffic of router R3.3 comes from somewhere with a time difference from New Zealand, for example, North America. To conclude, DDoS attacks with 6 attack sources located at different types of routers can be easily detected. However, the attacker could have many more attack sources. Thus, we are more interested in the maximum number of attack sources we can detect. Let $S_t$ represent the total number of SYN packets to

the University of Auckland during normal traffic time, let $S_i$ represent the number of SYN packets from one single source $i$ ($i = 1, 2, \ldots, n$), let $A_t$ represent the total number of SYN packets from all the distributed attack sources, and let $A_j$ represent the number of SYN packets from attack source $j$ ($j = 1, 2, \ldots, m$). Obviously, $S_t = \sum_{i=1}^{n} S_i$ and $A_t = \sum_{j=1}^{m} A_j$. It is worth noting that $S_i$, $S_t$, $A_j$ and $A_t$ are calculated every 5 minutes. As we mentioned in Section 5.4.1, the detection threshold we are using is $\mu + 2\sigma$. In order to detect the attack, the following condition should be satisfied:

$$\frac{S_i + A_j}{S_t + A_t} \geq \mu + 2\sigma, \qquad (5.1)$$

where $S_i$ and $A_j$ come from the same router. In our simulation, the total attack traffic volume $A_t$ during 5 minutes is $100 \times 300 = 30{,}000$ SYN packets, which remains unchanged. For a uniformly distributed DDoS attack, $A_j = A_t/m$, where $m$ is the number of attack sources. When $m$ increases, $A_j$ will decrease. Hence, $(S_i + A_j)/(S_t + A_t)$ will also decrease, given that $S_t$ and $S_i$ remain stable. If the detection threshold $\mu + 2\sigma$ is small, attacks are more likely to be detected even if $m$ is large. We inserted simulated DDoS attacks with different numbers of attack sources into the testing traces, and repeated the experiments described above. The detection performance is shown in Table 5.2. We can see from the table that the detection performance varies according to the time of day, the total number of attack sources, and the attack source location. For example, if one of 15 evenly distributed attack sources is located at router R3.0 between 2AM and 3AM, this attack source can be detected with 100% accuracy. However, if one of 16 evenly distributed attack sources is located at router R3.0 between 2AM and 3AM, this attack source can be detected with only 83.3% accuracy. Therefore, for router R3.0 between 2AM and 3AM, the maximum number of attack sources we can detect is 15.
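As a rough illustration of how the condition in Equation (5.1) bounds the number of detectable sources, the snippet below finds the largest $m$ for which an attack source co-located with a given router would still push that router's traffic distribution above the threshold. The normal-traffic figures and the threshold are placeholder values chosen for illustration, not measurements from the Auckland traces.

```python
def max_detectable_sources(S_i, S_t, A_t, threshold, cap=100_000):
    """Largest m with (S_i + A_t/m) / (S_t + A_t) >= threshold, cf. Equation (5.1).

    Assumes S_i / (S_t + A_t) < threshold, so the search terminates; `cap` is a safety bound.
    """
    m = 1
    while m < cap and (S_i + A_t / (m + 1)) / (S_t + A_t) >= threshold:
        m += 1
    return m

# Hypothetical 5-minute window: the router carries 6.6% of 450,000 normal SYN packets,
# the attack contributes 30,000 SYN packets in total, and the warning line is 0.075.
S_t, A_t = 450_000, 30_000
S_i = 0.066 * S_t
print(max_detectable_sources(S_i, S_t, A_t, threshold=0.075))   # 4 under these numbers
```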

As router R3.0 normally accounts for a reasonably large proportion (6.6%) of the overall traffic, only attack sources that generate a high traffic volume will significantly change R3.0's traffic distribution, and hence be detected. As router R3.3 normally accounts for a small proportion (0.11%) of the overall traffic, even attack sources with a low traffic volume will affect R3.3's traffic distribution dramatically, and hence be detected. Consequently, as shown in Table 5.2, the maximum number of attack sources we can detect for router R3.0 between 11AM and 12PM is 6, while the maximum number of attack sources we can detect for router R3.3 between 11AM and 12PM is 411. It is worth noting that R3.0 is a router with a high traffic load, which will be one of the few routers close to the target. Generally, attack traffic will aggregate at router R3.0 due to its proximity to the target. From router R3.0's point of view, the number of attack sources is therefore limited to a small number. Hence, a maximum of 6 detectable attack sources for R3.0 is still an effective detection result.

Comparison with Router-based Pushback

As the strength of Router-based Pushback lies in inferring attack sources hop-by-hop by monitoring the aggregate traffic arrival rate, it has been evaluated using the dynamic network simulation tool NS [58][99]. In contrast, as the strength of Selective Pushback lies in accurately locating attack sources using anomaly-based detection, we use real network traffic traces for evaluation. As real network traffic traces with a given network topology are not publicly available, we compare Selective Pushback with Router-based Pushback subject to the following constraints: (1) a simulated topology is assigned to the real traffic data traces; (2) simulated attacks are merged with normal background traffic to generate the testing data. For simplicity of comparison, we assume that both Router-based Pushback and Selective Pushback filter all traffic to the target t at the router where attack sources have been located. We use the following simple example to demonstrate the advantage of Selective Pushback over Router-based Pushback.

Table 5.2: The detection performance of our scheme against DDoS attacks with different numbers of attack sources. For each attack time (2AM to 3AM and 11AM to 12PM) and attack source location (routers R3.0, R3.6, and R3.3), the table lists the total number of attack sources, the detection accuracy, and the false positive rate.

Figure 5.13: A simple topology to show the advantage of Selective Pushback over Router-based Pushback

Consider the topology shown in Figure 5.13, where the overall DDoS attack rate is 100 SYN packets per second from 1000 evenly distributed attack sources. Let us suppose there are 5 attack sources in R3.3, 3 attack sources in R3.6, and no attack sources in R3.0. When the attack occurs between 11AM and 12PM, the link R3.0-R2 will account for 7%, the link R3.3-R2 for 0.39%, and the link R3.6-R2 for 0.46% of the total traffic to the target t. The Router-based Pushback scheme determines the links that contribute to the congestion by examining the traffic rate. In this case, link R3.0-R2 will most likely be identified as the link contributing to the congestion. Hence, a filter will be placed at router R3.0 to filter all the traffic to the target t. Therefore, about 6.6% of the normal traffic will be punished while no attack traffic has been removed. However, as shown before, our Selective Pushback scheme can identify that link R3.3-R2 and

link R3.6-R2 are contributing to the congestion. Hence, filters will be placed at R3.3 and R3.6. Therefore, only about 1.2% of the normal traffic to the target t will be punished, while all 8 attack sources will be removed.

5.5 Discussion

Implementation Overhead for Selective Pushback

A major challenge for pushback techniques is how to identify and characterize high-volume aggregates [58], so that the pushback signal can specify what type of traffic should be traced and filtered. Since the source field is unreliable in IP traffic, some other combination of protocol fields must be used to categorize the traffic, e.g., destination address, destination port, and protocol number. This problem of building a signature that describes the high-volume aggregate is a significant computational burden for the victim, particularly given the high volume of traffic and the many possible combinations of protocol fields. A significant advantage of our approach is that we eliminate this complexity by using the contents of the packet marking fields to characterize traffic flows. This is a reliable form of source information, which can be used to detect the abnormal increases in traffic flow that constitute an attack. In order to implement our scheme, the routers only need to perform probabilistic packet marking, which has a low overhead. In fact, some commercial routers (e.g., Cisco routers [100]) already have the marking function. The target only needs to keep a normal profile of where its traffic comes from. When congestion happens, which is indicated by a high dropping rate, the target starts to collect a temporary profile and compares it with the normal profile. Since we only keep records according to the routers, the memory requirement for the target is very low.
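To illustrate the profile comparison, the sketch below keeps per-upstream-router packet counts (as recovered from the marking fields), normalizes them into traffic shares, and flags routers whose share grows sharply during congestion. The router identifiers, the relative-increase test, and the example counts are assumptions made for illustration; they are not the exact mechanism or data used in the thesis.

```python
from collections import Counter

def build_profile(marked_packets):
    """Turn a list of upstream-router IDs (one per packet) into a share-of-traffic profile."""
    counts = Counter(marked_packets)
    total = sum(counts.values())
    return {router: n / total for router, n in counts.items()}

def suspicious_routers(normal, temporary, factor=3.0, min_share=0.01):
    """Routers whose traffic share grew by at least `factor`, or appeared from nowhere."""
    flagged = []
    for router, share in temporary.items():
        baseline = normal.get(router, 0.0)
        if share >= min_share and (baseline == 0.0 or share / baseline >= factor):
            flagged.append(router)
    return flagged

normal = build_profile(["R3.0"] * 66 + ["R3.6"] * 11 + ["R3.3"] * 1 + ["other"] * 22)
during_congestion = build_profile(["R3.0"] * 66 + ["R3.6"] * 11 + ["R3.3"] * 40 + ["other"] * 22)
print(suspicious_routers(normal, during_congestion))   # ['R3.3'] in this example
```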

Although our simulation is mainly based on SYN flood attacks, we can easily change the protocol number to detect UDP flood attacks and other types of DDoS attacks.

Detection Performance

In our experiment, the time window size we chose was 5 minutes. Hence, the target t decides whether to send a selective pushback message every 5 minutes. In order to shorten the time for the target to react to an attack, we can reduce the time window size. More importantly, to improve the detection accuracy, we can build normal profiles for individual hosts instead of for all the hosts within one network. The more normal profiles we keep, the more memory space is needed. However, memory space in routers is limited. Hence, it is up to the users to choose a trade-off between memory space and performance (in terms of detection sensitivity and detection accuracy).

Other Related Issues

Geographical information is very important for building a normal profile, since user behaviour differs according to geographical position. Unfortunately, the real IP addresses in the Auckland data traces [74] have been masked by a one-to-one hash mapping to protect privacy, so we cannot obtain any geographical information from the IP addresses. An issue for future work is generating normal profiles using IP prefixes that represent users from the same geographical area, based on IP addresses that have not been masked. We do not consider the interference between the attack traffic and legitimate traffic, as we use trace-driven experiments. Another issue for future work is implementing our scheme on a real network test-bed so that the real-life performance can be evaluated.

5.6 Conclusion

We have proposed a novel Victim-Router Model called Selective Pushback, which is an active defense approach. Our first contribution in this area is a scheme to detect and react to denial of service attacks. Second, we described a scheme to detect the attack based on the source information provided by probabilistic packet marking. This means we can push back according to the detection result, which is much more efficient than hop-by-hop pushback. Third, we applied statistical process control [81] techniques to detect attacks. Finally, we demonstrated the effectiveness of our scheme against a well-distributed denial of service attack. Simulation results show that our scheme can locate up to 411 evenly distributed attack sources and remove attack sources more effectively than Router-based Pushback. One point worth noting is that one attack source in our trace-driven simulation can be the aggregation of traffic sent by many compromised computers in a real attack scenario. Consequently, our detection scheme is also effective during a large-scale DDoS attack, where thousands of compromised computers are used to launch the attack. In the next chapter, we will discuss the Router-Router Model defense approach.


Chapter 6

Distributed Detection by Sharing Combined Beliefs

6.1 Introduction

In the previous three chapters, we discussed two defense models: the Victim Model and the Victim-Router Model. As discussed in Section 5.1, it is important to filter attack traffic as close as possible to its sources in order to prevent attack traffic from congesting shared Internet resources. However, if attacks are launched from distributed sources, the attack evidence created by each attack source might be hard to detect against the background of normal traffic. The Victim-Router Model pushes the attack defense from the victim toward the attack sources. However, this Victim-Router Model defense is initiated by the victim and depends on the cooperation between the victim and the routers. When the victim detects the attack, the attack traffic has already wasted network bandwidth. If the attack traffic can be filtered before it congests the network links, more bandwidth can be saved. Hence, a better model is needed to achieve the following two goals. The first goal is to be able to detect highly distributed attacks close to the attack

sources. The second goal is to be independent of the victim. We propose a Router-Router Model to realize these two goals. In the Router-Router Model, when a router observes any suspicious network behavior that might not be serious enough to raise an alarm in the Victim Model, it broadcasts a warning message to other routers. After a router receives a warning message, it combines the local network behavior with the warning message to make a detection decision. This has the advantage that global network information can be shared among routers and thus attacks can be detected in their early stages. For highly distributed denial of service (HDDoS) attacks, attack sources are spread across the Internet. The traffic to the victim looks like legitimate flows when it is close to the attack sources. However, the attack traffic aggregates as it approaches the victim, and constitutes a powerful denial of service attack. It is easy to detect the attack traffic close to the victim because the traffic volume is substantially high. However, it is difficult to detect attack traffic close to the attack sources because the traffic generated by each attack source is very low. More importantly, it is crucial to detect the attack close to the source so that the attack damage can be minimized. Therefore, there is a universal challenge for all detection approaches: how can we identify a malicious flow if its traffic statistics, such as traffic volume, satisfy the requirements for a legitimate flow? In Chapter 3, we introduced a novel detection feature, the number of new IP addresses, to increase the detection accuracy and reduce the detection time. Given that the attacker controls a large number of attack sources, this approach still faces the detection challenge close to the attack sources due to the diluted attack traffic. As soon as attack flows leave the attack sources, the global view of the Internet is that a massive number of attack flows are approaching the victim. Unfortunately, traditional detection systems do not have a global view and fail to detect the attack close to the sources. To strengthen the detection capability, we need to equip the

detection system with a global view. Hence, we propose a distributed detection model called the Router-Router Model. In the Router-Router Model, a detection agent is implemented in routers throughout the network. Each detection agent informs other detection agents once it observes any suspicious network behavior. With the distributed information from other detection agents, each agent is able to maintain a global picture of the state of the network, which helps the agent to detect a DDoS attack close to the source. In this chapter, we use two examples to illustrate the operation of the Router-Router Model. First, as a distributed version of the Victim Model, we apply the Router-Router Model to detect DDoS attacks by monitoring the increase of new IP addresses. As discussed in Chapter 3, our Victim Model scheme exploits an inherent feature of DDoS attacks, which makes it hard for the attacker to counter this detection scheme by changing their attack signature. In the Router-Router Model approach, we show that by sharing the distributed beliefs, we can improve the detection efficiency of the Victim Model. Second, we apply the Router-Router Model to detect a type of distributed denial of service attack known as the reflector attack. In this type of attack, every potential reflector monitors the incoming packets, and broadcasts a warning message to other potential reflectors if any abnormal traffic is observed. The warning message contains a description of the abnormal traffic it has observed. A detection decision can be made based on the combined information from multiple potential reflectors. We present a learning algorithm to decide when to broadcast the warning message. Our contribution in this chapter is two-fold. First, we propose two distributed detection models to detect DDoS attacks close to the sources. Second, we describe how machine learning can be used to improve the efficiency of a multi-agent system for distributed attack detection. We show that this approach is much more effective than centralized detection schemes, especially when there are multiple attack sources

and the attack traffic is highly distributed. The rest of the chapter is organized as follows. Section 6.2 defines the problems we want to solve. Section 6.3 gives the motivation for using the Router-Router Model defense. Section 6.4 discusses the methods we use in the Router-Router Model. Section 6.5 gives the evaluation of the Router-Router Model. Section 6.6 discusses related issues of the Router-Router Model.

6.2 Problem Definition

Distributed Denial of Service Attacks

In Section 3.9, we discussed the detection strength of the History-based Attack Detection and Reaction (HADR). In that discussion, we assumed the attack traffic volume is 1 Mbps, which is a conservative assumption for the attack traffic close to the victim, and demonstrated that most DDoS attacks can be detected by the last-mile HADR. However, the attack traffic close to the source does not have a lower bound for traffic volume. Even worse, it is easy for an attacker to guess what will be a legitimate IP address due to its proximity to the first-mile HADR. As discussed in Chapter 3, a DDoS attack can be characterized by the average attack flow rate and the number of attack flows. As shown in Figure 6.1, any DDoS attack can be described as one point in this two-dimensional space. Each curve or trajectory in this two-dimensional space represents a group of DDoS attacks with the same total traffic rate. The attack detection zone represents all DDoS attacks that can be detected by the Flow Rate Detection Engine or the New Address Detection Engine or both. As shown in Figure 6.1, when the average attack flow rate and the number of attack flows are carefully chosen, the trajectory of a DDoS attack can be out of the detection zone.

Figure 6.1: The challenge of first-mile HADR

Consequently, the group of DDoS attacks indicated by the trajectory cannot be detected by the first-mile HADR.

Reflector Attacks

The reflector attack is one type of highly distributed denial of service attack. It is also called the distributed reflector denial of service (DRDoS) attack, as defined in Section . As shown in Figure 2.6, the attacker directs a set of compromised hosts, called zombies, to send attack traffic to the reflectors using the victim's IP address as the source address. The reflectors then send the reply traffic to the victim. The more reflectors are used, the more diffuse the attack traffic between the reflectors and the victim becomes. A reflector could be any IP host that will reply to every packet sent,

e.g., a web server. Since there are many such reflectors available in the Internet, the attacker can diffuse the attack traffic using many reflectors. In this chapter, we define the reflector attack as the second stage of a DRDoS attack, where attack traffic is sent from the zombies to the reflectors. Paxson has made a detailed analysis of reflector attacks in [2]. He describes a method to filter the reflector traffic at the victim. Although this filtering scheme can lessen the damage to the victim, it hardly helps to stop the abuse of network bandwidth. Therefore, the final solution for defending against the reflector attack is to let the reflectors detect the attack and trace back to the attacker. However, Paxson does not provide a scheme for the reflectors to detect the attack. Wang et al. [28] proposed a CUSUM detection scheme to detect SYN flooding attacks by observing the ratio of the number of SYN packets to the number of FIN packets. Gil proposed a scheme called MULTOPS [27] to detect denial of service attacks by monitoring the packet rate in both the up and down links. However, none of these schemes has been designed to detect reflector attacks. To tackle this problem, we propose a scheme called Detection with Information Sharing (DIS) that lets all the potential reflectors exchange information once any slightly abnormal traffic is observed. By analyzing the combined information from multiple reflectors, we can quickly and efficiently detect the reflector attack. In this thesis, we adapt the distributed probabilistic detection scheme in [101] to detect reflector attacks by using an approach based on the CUSUM algorithm [69].

6.3 Motivation

Motivation for Distributed DDoS Attack Detection

Suppose there are $k$ Intrusion Detection System (IDS) agents monitoring all the traffic that goes into the victim's network. During a DDoS attack, the attack evidence seen by IDS agent $i$ is $a_i$, where $i = 1, \ldots, k$, and the attack evidence seen by the victim is $A$. The attack evidence could be the traffic volume, or the number of abnormal connections. In this chapter, $a_i$ refers to the percentage of new IP addresses. It is difficult to identify the DDoS attack using an individual IDS agent, as $a_i \ll A$ when $k$ is large. If we reduce the detection threshold for each IDS, many legitimate traffic flows may be mistaken for attack traffic. However, since the $k$ IDS agents monitor all the traffic to the victim, we have

$$A = \sum_{i=1}^{k} a_i. \qquad (6.1)$$

Therefore, if each IDS agent knows what the other agents observe, the attack evidence seen by individual IDS agents becomes $A$. This motivates the proposal of a multi-agent scenario to share information between IDS agents.

Motivation for Distributed Reflector Attack Detection

The Importance of Detecting at the Reflectors

The reflector is generally not compromised during the attack, and plays the role of an innocent third party. If we implement the detection agent in the reflectors, we can detect the attack traffic and filter it before it arrives at the victim. Furthermore, we also provide a platform to trace back to the real attacker.

The Detection Opportunities

There are two detection opportunities for the reflector. The first is to monitor the spoofed traffic from the zombies. The second is to monitor the reply traffic from the victim. Since the attacker can spoof any type of traffic and there is only a small amount of traffic sent to the reflector, it is rather hard for the reflector to detect the spoofed traffic sent from the zombies. In contrast, the packets sent from the reflector to the victim are unsolicited. Generally, the victim will reply with a TCP reset (RST) packet for TCP packets and an ICMP port unreachable packet for UDP packets. Thus, by monitoring the incoming RST or ICMP port unreachable packets, we can detect abnormalities in the network.

Sharing Distributed Beliefs

The most challenging part of detecting attacks at the reflector is that the reply traffic from the victim to the reflectors may be widely distributed. Therefore, the traffic seen by one reflector is only a very small part of the reply traffic and may not be large enough to raise an alarm. We can take two steps to tackle this problem. First, we can divide the network we aim to protect into several subnetworks according to the network topology. A detection agent is implemented to monitor each subnetwork's traffic, which represents a proportion of the reply traffic from the victim to the reflectors. Second, each detection agent informs the other detection agents when it observes any abnormal traffic. By combining the distributed beliefs of each agent, we can detect the reflector attack. The architecture of our detection mechanism is shown in Figure 6.2.
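As a concrete picture of what such an exchange might look like, the sketch below defines a hypothetical warning message carrying an agent's partial evidence and shows a receiving agent folding it into its own view. The field names and the simple additive combination rule are assumptions made for illustration only; the actual message contents and combination method are developed in Section 6.4.

```python
from dataclasses import dataclass

@dataclass
class WarningMessage:
    agent_id: str        # which detection agent observed the suspicious traffic
    window_start: float  # start of the sampling interval (epoch seconds)
    evidence: float      # the agent's partial detection statistic for that interval
    descriptor: str      # short description of the suspicious traffic

class DetectionAgent:
    def __init__(self, agent_id, threshold):
        self.agent_id = agent_id
        self.threshold = threshold
        self.local_evidence = 0.0
        self.peer_evidence = {}          # latest evidence reported by each peer agent

    def receive(self, msg: WarningMessage):
        self.peer_evidence[msg.agent_id] = msg.evidence

    def attack_detected(self):
        # Combine the local statistic with what the peers have reported so far.
        combined = self.local_evidence + sum(self.peer_evidence.values())
        return combined >= self.threshold

agent_r = DetectionAgent("R", threshold=50.0)
agent_r.local_evidence = 22.0                       # not enough on its own
agent_r.receive(WarningMessage("L", 0.0, 31.5, "unsolicited RST to tcp/80"))
print(agent_r.attack_detected())                    # True: 22.0 + 31.5 >= 50.0
```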

Figure 6.2: Overview of detecting reflector attacks

6.4 Methodology

The Router-Router Model includes three key parts. The first is a scheme for individual detection agents to detect abnormal network behavior. The second is a mechanism for combining distributed information from each agent. The third is a rule to decide when to broadcast the warning message.

Detecting Abnormal Network Behavior

Abnormal Traffic Behavior for the DDoS Attack

One conspicuous traffic abnormality caused by a DDoS attack is the large number of new IP addresses. In Chapter 3, we proposed a Victim Model to detect the DDoS attack. For simplicity, we use the New Address Detection Engine at each detection agent in the distributed DDoS detection. The parameters for the New Address Detection Engine are defined in Chapter 3.

Abnormal Traffic Behavior for the Reflector Attack

Before we start to analyze the traffic that the reflectors may receive, let us assume that the reflectors are trustworthy and not compromised. Let us take TCP-based reflector attacks as an example. We define TCP-based reflector attacks as attacks based on the TCP protocol, excluding TCP reflector attacks that use application-layer exploits [2]. When the attacker sends a spoofed TCP packet to the reflector, there are two possible replies. If the attacker sends a SYN request with a spoofed source address, then the reflector will reply with a SYN/ACK packet. If the attacker sends any other TCP packet with a spoofed source address to the reflector, the reflector will reply with a TCP reset packet, namely the RST packet. As for the victim, it will reply to the SYN/ACK packets with RST packets, and will ignore the RST packets. Thus, we can detect the first type of reflector attack by monitoring the incoming RST packets, and the second type of reflector attack by monitoring the outgoing RST packets. A naive approach would be to simply count the number of RST packets and use it to detect reflector attacks. However, there are three reasons why a host can generate TCP reset packets. The first is that the host responds to a connection request to a nonexistent port; the second is that the host wants to abort a connection; the third is that the

host terminates a half-open connection (a TCP connection where one end has closed or aborted the connection without the knowledge of the other end). Therefore, we have to investigate the causes of the RST packets instead of just counting them. Since the RST packets from the victim are caused by the SYN/ACK packets during a TCP-based reflector attack, we only count an RST packet that has a SYN/ACK state in the corresponding outgoing connection. Unless otherwise stated, the RST packet only refers to the TCP reset packet which is generated by the victim in reply to the SYN/ACK packet. It is a trivial matter to extend this approach to use ICMP port unreachable packets to detect other types of reflector attacks. Following our motivation from Section 3.6.3, let $X_n$ represent the number of RST packets in sampling interval $n$. The top graph in Figure 3.7 shows an illustrative example of the random sequence $\{X_n\}$. The mean value of $\{X_n\}$ is $\alpha$. In normal operation, the number of RST packets will be close to 0, since RST packets are very rare under normal network conditions. However, one single RST packet does not constitute a reflector attack, since other network abnormalities, e.g., a network scan, can also account for RST packets. If a reflector attack happens, the number of RST packets will increase continuously. In order to detect this continuous increase, we use the non-parametric Cumulative Sum (CUSUM) algorithm [69], which is discussed in detail in Section . We choose CUSUM to detect the increase, since this algorithm is designed to accumulate values of $X_n$ that are significantly higher than the mean level under normal operation. One of the advantages of this algorithm is that it monitors the input random variables in a sequential manner, so that real-time detection is achieved. We follow the rules defined in Section to choose the parameters for the CUSUM algorithm.
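The fragment below sketches this idea: only RST packets that match an earlier outgoing SYN/ACK are counted in each sampling interval, and a CUSUM-style statistic accumulates counts that exceed the normal mean. The connection matching is reduced to a simple set lookup, and the parameter values are illustrative placeholders rather than the values chosen in the thesis.

```python
def count_solicited_rsts(incoming_rsts, synack_flows):
    """Count RST packets whose flow matches a connection we answered with SYN/ACK.

    incoming_rsts: iterable of (src_ip, src_port, dst_ip, dst_port) tuples
    synack_flows:  set of the same tuples recorded for outgoing SYN/ACK packets
    """
    return sum(1 for flow in incoming_rsts if flow in synack_flows)

def cusum_update(y_prev, x_n, alpha, offset):
    """One step of a non-parametric CUSUM: accumulate the excess over (alpha + offset)."""
    return max(0.0, y_prev + (x_n - alpha - offset))

# Hypothetical per-interval RST counts: a quiet network, then a reflector attack begins.
counts = [0, 1, 0, 0, 2, 9, 14, 12, 15]
alpha, offset, threshold = 0.5, 1.0, 20.0

y = 0.0
for n, x_n in enumerate(counts):
    y = cusum_update(y, x_n, alpha, offset)
    if y >= threshold:
        print(f"reflector attack suspected at interval {n}, CUSUM statistic = {y:.1f}")
        break
```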

Combining Beliefs for Attack Detection

As we discussed before, in order to detect attacks, our IDS agents need to cooperate by sharing their beliefs about potentially suspicious traffic. This raises two challenges. First, we need a framework for combining different agents' beliefs about the incoming traffic. Second, we need a function that decides when to share beliefs about the incoming traffic. It is important that our model for combining beliefs uses summary information about the traffic rather than raw measurements about each new IP address or RST packet in the incoming traffic, in order to minimize the communication overhead. In this section, we first focus on how to combine beliefs for reflector attack detection, and then on how to combine beliefs for DDoS attack detection. In the next section, we will discuss when to share beliefs.

Combining Beliefs for Reflector Attack Detection

For reflector attack detection, a centralized Intrusion Detection System would monitor all the incoming RST packets to the target network, and use these measurements to test the hypothesis that the incoming RST packets are caused by a reflector attack. Based on these measurements, we can evaluate our belief as to whether the hypothesis is true. However, since the locations of the reflectors can be highly distributed, it is hard to collect the measurement information of all the reflectors at a single point. Without loss of generality, let us consider a multi-agent scenario as shown in Figure 6.2, where the target network is divided into two subnetworks, each with its own intrusion detection system. It is a trivial matter to apply our model to larger numbers of agents. For convenience, we shall call the Intrusion Detection System (IDS) agent at the left subnetwork L and the IDS agent at the right subnetwork R. Each agent only sees the subset of the RST packets that go to its subnetwork from the

victim(s). Consequently, the RST packets sent by the victim(s) are split between the two agents. Every agent runs the CUSUM algorithm to detect the network abnormality. Let $y_{rf}(L)$ and $y_{rf}(R)$ represent the CUSUM detection variables observed by agents L and R respectively. Let $N_{rf}(L)$ and $N_{rf}(R)$ represent the detection thresholds employed by agents L and R respectively. Each agent may believe that such a small number of RST packets is not suspicious, i.e., $y_{rf}(L) < N_{rf}(L)$ and $y_{rf}(R) < N_{rf}(R)$. Thus, each agent acting in isolation has insufficient evidence to consider that an attack is occurring. In order to detect this reflector attack, we need to combine the beliefs from both agents. The first step is to extend our centralized model to calculate the combined CUSUM statistics for the incoming RST packets. Each subnetwork can monitor the number of RST packets addressed to hosts in that subnetwork. Any RST packets that are in transit through a subnetwork, and thus are not addressed to a host in that subnetwork, will not be counted. Since each agent is responsible for monitoring a separate set of destination host addresses, the sum of the two detection variables can be used as an estimate of the detection variable for the whole network, i.e., $y_{rf} = y_{rf}(L) + y_{rf}(R)$.

Combining Beliefs for DDoS Attack Detection

As shown in Figure 6.3, we consider two transit networks through which the DDoS attack traffic passes, each with its own IDS agent. L is the IDS agent that monitors the traffic passing through the left transit network, and R is the IDS agent that monitors the traffic passing through the right transit network. Again, it is a trivial matter to apply our model to larger numbers of agents. As we analyzed before, the DDoS attack traffic will cause an increase in the number of new IP addresses seen at the victim. However, since each transit network will only see part of the DDoS attack traffic, the number of new IP addresses might not be large enough to raise an alarm. Let $y_{DDoS}(L)$ and $y_{DDoS}(R)$ be the CUSUM detection variables that

IDS agents L and R will observe, and let $N_{DDoS}(L)$ and $N_{DDoS}(R)$ be the detection thresholds for IDS agents L and R respectively.

Figure 6.3: Combining Beliefs for DDoS Attack Detection

Then, $y_{DDoS}(L) < N_{DDoS}(L)$ and $y_{DDoS}(R) < N_{DDoS}(R)$. Again, each agent acting in isolation has insufficient evidence to consider the traffic to be suspicious. Hence, we need to combine beliefs from both agents in order to detect this DDoS attack. Let $D_L$ and $D_R$ denote the sets of hosts in the left and right transit networks, respectively. Let $P_n(D_L)$ and $P_n(D_R)$ represent the percentage of new IP addresses that pass through transit networks L and R respectively. As we analyzed in Section 3.6.3, the decision function is based on monitoring the percentage of new IP addresses during the designated time interval. If one IDS agent can update the percentage of new IP addresses by sharing the distributed beliefs, it can recalculate the detection variable $y_{DDoS}$ using the CUSUM algorithm. Therefore, the IDS agent can make a decision by combining beliefs from other IDS agents.

The first step in realizing our distributed model is to calculate the percentage of new IP addresses by sharing the distributed beliefs. Let $F_L$ and $F_R$ represent the collections of frequent IP addresses which are stored in the IP Address Database (IAD), and let $M_L$ and $M_R$ represent the collections of incoming IP addresses during the sampling interval. Thus, we have

$$P_n(D_L) = \frac{|M_L| - |M_L \cap F_L|}{|F_L|} \quad \text{and} \quad P_n(D_R) = \frac{|M_R| - |M_R \cap F_R|}{|F_R|}.$$

Ideally, when we combine the beliefs, the percentage of new IP addresses across the two transit networks should be

$$P_n(D) = \frac{|M_L \cup M_R| - |(M_L \cup M_R) \cap (F_L \cup F_R)|}{|F_L \cup F_R|}.$$

However, in order to get an accurate value of $P_n(D)$ we need raw measurements, for example the sets $M_L$ and $F_L$, which creates a huge communication overhead. In order to simplify the implementation of this scheme, we make the following assumptions. First, due to the dynamics of Internet traffic paths [102], packets from one source can arrive at the victim through both transit networks over a long period. Hence, we assume the IP Address Databases of the two transit networks have a large overlap, i.e., $|F_L \cup F_R| \approx \max(|F_L|, |F_R|)$. (If there is unlikely to be any significant overlap between the IP Address Databases of the two transit networks, we can instead approximate $|F_L \cup F_R| \approx |F_L| + |F_R|$ without changing the overall structure of our solution.) Second, as the path taken by a packet is generally stable during the short sampling interval (10 seconds in our experiment), we assume $M_L$ and $M_R$ are disjoint collections. Thus, the simplified calculation is

$$P_n(D) = \frac{|M_L| + |M_R| - |M_L \cap F_L| - |M_R \cap F_R|}{\max(|F_L|, |F_R|)}.$$
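A small sketch of this summary-based combination: each agent reports only three counts — its incoming addresses $|M|$, how many of them are already in its IAD $|M \cap F|$, and its IAD size $|F|$ — and any agent can then recompute the combined percentage of new IP addresses. The structure follows the simplified formula above, while the numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AgentSummary:
    incoming: int   # |M|: addresses seen in this sampling interval
    known: int      # |M ∩ F|: how many of those are already in the IAD
    iad_size: int   # |F|: size of the IP Address Database

def local_new_fraction(s: AgentSummary) -> float:
    """P_n for one transit network: (|M| - |M ∩ F|) / |F|."""
    return (s.incoming - s.known) / s.iad_size

def combined_new_fraction(left: AgentSummary, right: AgentSummary) -> float:
    """Simplified combined P_n(D), assuming M_L and M_R are disjoint and the
    two IADs overlap heavily, so |F_L ∪ F_R| ≈ max(|F_L|, |F_R|)."""
    new_addresses = (left.incoming - left.known) + (right.incoming - right.known)
    return new_addresses / max(left.iad_size, right.iad_size)

L = AgentSummary(incoming=1200, known=1150, iad_size=40_000)
R = AgentSummary(incoming=900, known=820, iad_size=38_000)
print(local_new_fraction(L), local_new_fraction(R), combined_new_fraction(L, R))
```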

Learning When to Broadcast the Warning Message

Given the two methods above for combining beliefs about abnormal network behavior from different agents, we need to formulate a technique for deciding when to broadcast this information, and hence combine beliefs between agents. A naive approach to deciding when to share beliefs would be to broadcast our beliefs to other agents every time we observe a new IP address or an RST packet. This liberal approach effectively replicates all measurements at all agents. Given that most of the measurements are normal and thus of little interest to other agents, this approach would be an enormous waste of bandwidth and resources. An alternative approach that is equally naive would be to broadcast our beliefs only when we have sufficient local measurements to confirm our hypothesis. The incoming traffic is independently monitored by each agent until one agent confirms that an attack is occurring, and then notifies the other agents. This conservative approach to sharing beliefs minimizes communication overheads. However, it also maximizes the delay in confirming a hypothesis, because beliefs are not shared about unconfirmed hypotheses. Our aim is to find a decision function that lies somewhere between these two extremes. Agents should share information when there is a significant change in belief that is likely to help confirm a hypothesis. Our approach is based on the learning scheme described in [101]. For simplicity of discussion, we make the following specifications. In the case of DDoS attack detection, let $N(L)$, $N(R)$, $y(L)$ and $y(R)$ represent $N_{DDoS}(L)$, $N_{DDoS}(R)$, $y_{DDoS}(L)$ and $y_{DDoS}(R)$ respectively. In the case of reflector attack detection, let $N(L)$, $N(R)$, $y(L)$ and $y(R)$ represent $N_{rf}(L)$, $N_{rf}(R)$, $y_{rf}(L)$ and $y_{rf}(R)$ respectively. Recall that agent L considers that an attack has occurred if $N(L) < y(L)$. Our decision function should trigger a broadcast before the agent has confirmed that the incoming traffic is attack traffic. The key issue is how small the difference $N(L) - y(L)$ should be before we broadcast. We introduce a parameter $T$ that represents the threshold at which we should

broadcast. Thus, our decision function is: broadcast if $N(L) - y(L) < T$. If $T$ is large then the agent will broadcast early, when it has seen few new IP addresses or RST packets and $y(L)$ is small. Conversely, if $T$ is small, then the agent will delay broadcasting until it has seen sufficient new IP addresses or RST packets to increase $y(L)$ in comparison to $N(L)$. The aim of learning is to find an optimum broadcast threshold $T$, so that we avoid wasting broadcasts while minimizing the detection delay. We need to adjust $T$ in response to feedback about how our multi-agent system performs in comparison to a centralized monitoring approach. Each time an attack occurs, we record how many measurements ($\gamma_m$) were needed before our multi-agent system detected the attack. We can also determine how many measurements ($\gamma_s$) would have been needed by a centralized system using a single agent to analyze all the incoming traffic. $\gamma_m$ and $\gamma_s$ refer to the number of new IP addresses during DDoS attack detection, and the number of RST packets during reflector attack detection. Note that $\gamma_m \geq \gamma_s$. We refer to the difference $\delta = \gamma_m - \gamma_s$ as the confirmation delay of using a distributed approach. We can also record whether an agent issued a broadcast in the course of analyzing the incoming traffic. Let $\sigma = 1$ if a broadcast was made, otherwise $\sigma = 0$. In order to measure the performance of our multi-agent system, we can average $\delta$ and $\sigma$ over a large number of simulated DDoS attacks. Let $\bar{\delta}$ and $\bar{\sigma}$ denote the average confirmation delay and the average number of broadcasts over a set of DDoS attacks. Given that we want to minimize both these quantities, we define our feedback function as $f(T) = u(\bar{\delta})^2 + v(\bar{\sigma})^2$, where $u$ and $v$ can be any functions. In our case, we have used the identity function for both $u$ and $v$.
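The decision rule and the feedback function translate almost directly into code, as the sketch below shows. The trial data are made up, and $u$ and $v$ are the identity, as in the text.

```python
def should_broadcast(N_local, y_local, T):
    """Broadcast a warning once the gap to the local detection threshold falls below T."""
    return (N_local - y_local) < T

def feedback(trials):
    """f(T) = mean(delta)^2 + mean(sigma)^2 over a set of simulated attacks.

    trials: list of (gamma_multi, gamma_single, broadcast_made) tuples, one per attack.
    """
    deltas = [gm - gs for gm, gs, _ in trials]
    sigmas = [1.0 if made else 0.0 for _, _, made in trials]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(deltas) ** 2 + mean(sigmas) ** 2

# Hypothetical trials: (measurements needed by the multi-agent system, by a centralized one, broadcast?).
trials = [(25, 18, True), (30, 22, True), (19, 19, False), (28, 20, True)]
print(feedback(trials))
```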

For a given setting of the threshold $T$ in our decision function, we can observe the feedback function $f(T)$ by averaging over a set of DDoS attacks. Consequently, we can use $f(T)$ as our objective function to optimize $T$. This is an example of a stochastic optimization problem, where the objective function and its gradient can only be estimated by observation. We can solve this problem using a technique known as stochastic approximation (see [103] for an overview). We use the current value $T_k$ at the $k$th iteration to estimate $T_{k+1}$ using

$$T_{k+1} = T_k - a_k \hat{g}_k(T_k),$$

where $\hat{g}_k(T_k)$ is an estimate of the gradient of the objective function at $f(T_k)$, and $a_k$ is a step-size coefficient. The gradient is estimated using perturbations $\pm c_k$ around $T_k$:

$$\hat{g}_k(T_k) = \frac{f(T_k + c_k) - f(T_k - c_k)}{2 c_k}.$$

We choose the perturbations and step size based on a scheme by Spall [104]. Based on Spall's recommendations, we found that a global minimum was obtained using $a_k = 10/k$ and $c_k = 1/k$. Using this scheme, we can learn an optimum value of $T$ that minimizes both the communication overhead and the confirmation delay. In our test domain, we observed that there was a well-defined global minimum for $T$. We have used this approach in a centralized learning scheme, where each agent uses the same threshold value. The algorithm is also illustrated in Figure 6.4. It is a simple matter for agents to archive measurements of confirmed attacks, so that they can be downloaded later as training examples for learning. In order to provide a basis for comparison with our machine learning approach, we have developed a default decision function that is based on random broadcasts. Our default decision function is to broadcast after the CUSUM variable calculated by an agent has reached $M$, where $M$ is uniformly distributed as $\mathrm{Uniform}(1, M_{max})$.
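In code, the threshold update is a small loop around the feedback function: perturb $T$ by $\pm c_k$, estimate the gradient with a central difference, and step against it. The sketch below uses a toy stand-in for the trace-driven simulator, and the gain sequences follow the form quoted above; it is illustrative rather than the exact training code used in the thesis.

```python
import random

def optimize_threshold(evaluate_f, T0, iterations=10):
    """Stochastic-approximation search for the broadcast threshold T.

    evaluate_f(T) should run a batch of simulated attacks at threshold T and
    return the (noisy) feedback value f(T).
    """
    T = T0
    for k in range(1, iterations + 1):
        a_k = 10.0 / k          # step-size sequence
        c_k = 1.0 / k           # perturbation sequence
        g_hat = (evaluate_f(T + c_k) - evaluate_f(T - c_k)) / (2.0 * c_k)
        T = T - a_k * g_hat     # move against the estimated gradient
    return T

# Toy stand-in for the simulator: a shallow, slightly noisy bowl with its minimum near T = 3.
toy_f = lambda T: 0.05 * (T - 3.0) ** 2 + random.gauss(0.0, 0.01)
print(optimize_threshold(toy_f, T0=10.0))   # prints a value close to 3
```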

Figure 6.4: The algorithm for learning when to broadcast the warning message.

Random broadcasts: broadcast belief in an attack each time the CUSUM variable reaches $M$, where $M \sim \mathrm{Uniform}(0, M_{max})$ is reset after each broadcast. We use this decision function as a benchmark to explore the trade-off between communication overhead and confirmation delay by varying $M_{max}$.

6.5 Evaluation

Evaluation for DDoS Attack Detection

We have evaluated our learning technique by testing its performance on a set of simulated distributed denial of service attacks, and comparing its performance to a default decision function that is based on random broadcasts. We measure the performance of these two approaches in terms of the average confirmation delay and the average number of broadcasts made by our multi-agent system on a set of simulated DDoS attacks. We introduce two types of costs for learning. The first cost is the cost of sharing information by broadcasting. The second cost is the confirmation delay. When an attacker starts a DDoS attack, the traffic is initially classified as normal until it has created enough new IP addresses to be classified as an attack. The same attack takes longer to detect in a multi-agent system than in a centralized system, because each agent sees only a subset of the attack traffic. Given enough new IP addresses, the multi-agent system will reach the same conclusion as the centralized system. Hence, it is important to measure this confirmation delay. In order to measure these two costs, we have tested our multi-agent approach on a set of known DDoS attacks, and compared its performance to our centralized approach. We have generated DDoS attacks with sufficiently large volume so that they are always detected by the centralized approach, and almost always detected by the multi-agent approach. On the rare occasions when our multi-agent approach is unable to detect the DDoS attack within the given number of new IP addresses, the cost of misclassification is reflected by setting the confirmation delay to the number of new IP addresses during the total length of the DDoS attack. We have based our simulated DDoS attacks on the Auck-IV-in traces [74], which were described in detail in Section 3.8. In the Auck-IV-in traces, all the IP addresses

have been mapped into 10.*.*.* using a one-to-one hash mapping for privacy. Let IP prefix 10.1.*.* represent transit network L and IP prefix 10.2.*.* represent transit network R. All the traffic with source IP addresses in 10.1.*.* and 10.2.*.* is analyzed by intrusion detection agents L and R respectively. Each agent monitors the percentage of new IP addresses and calculates the CUSUM variable $y_{DDoS}$ to decide whether there is an attack. If any evidence has been broadcast from the other agent, then it is included in this evaluation. The agent also uses its decision function to determine if it should share its beliefs with the other agent. Once an agent has confirmed that the traffic is attack traffic, we record the total number of new IP addresses that were generated by the DDoS attack before it was detected, as well as the number of broadcasts received by the agent before it reached its conclusion. We also determined the number of new IP addresses that would have been required by a centralized agent in order to confirm that a DDoS attack is happening. The difference between the number of new IP addresses needed by the multi-agent system and the centralized system represents the confirmation delay of using a distributed approach. We used this procedure to evaluate our optimized and default decision functions in terms of the number of broadcasts needed and the confirmation delay. For the optimized decision function, our feedback function $f(T)$ was averaged over 1000 trials, where each trial is defined as a new simulated DDoS attack with a random assignment of attack traffic volume and number of attack source IP addresses. It was necessary to average over a large number of trials in order to eliminate random variations in individual DDoS attacks. For the default decision function using random broadcasts, we tried 17 different settings of $M_{max}$ from 0.01 to . At each setting, we averaged the results over 1000 random trials. The results are shown in Figure 6.5. Each point in the graph corresponds to the

average of 1000 trials using the indicated decision function.

Figure 6.5: Performance of decision functions of the Router-Router Model for DDoS attack detection (average number of broadcasts versus average confirmation delay in new IP addresses, for random broadcasts with small to large $M_{max}$ and for the optimized broadcast threshold).

The average number of broadcasts and the 95% confidence interval for the average confirmation delay are shown at each point. The results using random broadcasts form a curve, with small values of $M_{max}$ on the left and large values on the right. If an optimized decision function is to be considered acceptable, it should fall below the envelope formed by the random broadcasts. Our learning technique found an optimum value of $T = 0.03$, which resulted in an average of 1.1 broadcasts per agent and an average confirmation delay of 20 new IP addresses. The learning algorithm converged after 10 iterations, as shown in Figure 6.6. Figure 6.5 shows the trajectory of successive $T_k$ values moving from right to left, with the results for the optimum value of $T_k$ indicated. Note that all the values of $T_k$ performed better than the random broadcasts. As discussed in Section 3.8.1,

the false positive rate of the centralized New Address Detection Engine is zero for the Auck-IV-in traces. Since our multi-agent detection approach chooses the same parameters as the centralized approach, no false positive is found.

Figure 6.6: Convergence of the broadcast threshold optimization in the Router-Router Model for DDoS attack detection (broadcast threshold versus iteration).

Evaluation for Reflector Attack Detection

Similar to our evaluation of DDoS attack detection, we have evaluated our distributed detection technique for reflector attack detection by testing its performance on a set of simulated reflector attacks. We have chosen the bidirectional traffic at the New Zealand Internet Exchange [105] on 9 July 2000 as our background traffic. The total trace contains about packets, and is 8.6 GBytes when uncompressed. All the IP addresses have been hashed into 10.*.*.* to protect privacy. Let L represent

Figure 6.7: The CUSUM statistics for L in distributed reflector attack detection (the CUSUM statistic $y_{rf}(L)$ over a 24-hour period, with the distributed and centralized thresholds and the inserted attacks marked).

Figure 6.8: Performance of decision functions of the Router-Router Model for reflector attack detection (average number of broadcasts versus average confirmation delay in RST packets, for random broadcasts with small to large $M_{max}$ and for the optimized broadcast threshold).

Figure 6.9: Convergence of the broadcast threshold optimization in the Router-Router Model for reflector attack detection (broadcast threshold $T$ versus iteration).

Figure 6.10: The accuracy of distributed detection of reflector attacks (percentage of false positives and false negatives versus the broadcast threshold $T$).

the hosts with IP addresses 10.0.*.* and let R represent the hosts with IP addresses 10.1.*.*. There are 49,180 active hosts in L and 41,048 active hosts in R. The data trace is regarded as clean since no reflector attacks are found in it. We randomly merged simulated reflector attacks of various intensities into the clean trace. The merged trace is then processed by our detection scheme. Figure 6.7 shows the CUSUM statistic $y_{rf}(L)$ observed by the agent in L. There are two thresholds. One is the centralized threshold used when the agent works in standalone mode. The other is the distributed threshold used when the agent cooperates with other agents. How to specify this detection threshold is explained in Section . However, when the attacker only generates a small amount of reflector attack traffic from L, it may take a long time, or may be impossible, to detect the attack without sharing the beliefs of other agents. Therefore, we have to broadcast the warning message before we observe sufficient evidence of the reflector attack. The decision is based on the broadcast threshold $T$ as described in Section . The other agents start to recalculate $y_{rf}$ and make a new decision after they receive the warning message. Figure 6.8 shows the learning costs of the two broadcast mechanisms. Each point in the graph corresponds to the average of 1000 trials using the indicated decision function. The average number of broadcasts and the 95% confidence interval for the average confirmation delay are shown at each point. The results using random broadcasts form a curve, with small values of $M_{max}$ on the left, and large values on the right. Clearly, as $M_{max}$ increases, the frequency of broadcasts by each agent decreases, and the confirmation delay increases. An optimized decision function is considered acceptable if it falls below the envelope formed by the random broadcasts. Our learning technique found an optimum value of $T = 35.5$, which resulted in an average of 1.1 broadcasts per agent and an average confirmation delay of 15.8 RST packets. The learning algorithm converged after 9 iterations, as shown in Figure 6.9.

Figure 6.8 shows the trajectory of successive $T_k$ values moving from right to left, with the results for the optimum value of $T_k$ indicated. Note that all the values of $T_k$ performed better than the decision function based on random broadcasts. Figure 6.10 shows the average false alarm rate for different broadcast thresholds. We define a false positive as normal traffic that has been detected as an attack, and a false negative as an attack which has not been detected. When operating at the optimum threshold, we achieved a 0.5% false negative rate and a 1.5% false positive rate. In practice, our learning technique can be trained using an off-line simulation with statistics from a real network. We can then learn the optimum value of $T$ based on simulated reflector attacks, as we have done in this section. The optimum threshold can then be loaded into the monitoring agents in the live network. In summary, our results demonstrate that we can learn a decision function for when to share beliefs without requiring any prior knowledge of the domain. Furthermore, we can learn a decision function that outperforms a default decision function based on random broadcast periods.

6.6 Discussion

When the Router-Router Model is applied to a large number of agents, the communication overhead between agents may be a concern. We can minimize this overhead by using the following approaches. First, we can optimize the broadcast threshold for each agent so that the number of broadcasts from each agent is minimized. Second, we can minimize the number of agents used, and place the agents in strategic locations, e.g., at the gateways between major networks. Third, we can use a hierarchical architecture to group agents into regions. We can then combine evidence within local regions before broadcasting globally between regions. However, testing the scalability

and effectiveness of our approach on a large-scale network testbed is beyond the scope of this thesis.

6.7 Conclusion

In this chapter, we proposed a Router-Router Model to detect distributed denial of service attacks by sharing distributed beliefs. We have also presented a machine learning scheme to optimize the communication between the agents while sharing the distributed beliefs. We applied our Router-Router Model to two detection scenarios to demonstrate its efficacy. In the first detection scenario, we proposed a multi-agent scheme to detect DDoS attacks by monitoring the increase of new IP addresses. We have evaluated our multi-agent scheme using extensive simulations of DDoS attacks based on packet trace data from a real network. This evaluation demonstrated that we can reduce both the delay and the communication overhead required to detect network intrusions, in comparison to a default decision function that relies on arbitrarily chosen broadcast periods. In the second detection scenario, we proposed a distributed mechanism to detect reflector attacks. Our contributions are two-fold. First, we have presented a scheme to detect the abnormal packets caused by the reflector attack by analyzing the inherent features of the reflector attack. Second, we have adapted a machine learning scheme to combine the beliefs of multiple agents that an attack is occurring. Experimental results show that our optimized learning scheme outperforms a random broadcast policy and accurately detects reflector attacks.

237 Chapter 7 Analysis of DoS Defense Schemes 7.1 Introduction In chapters 3, 4, 5 and 6, we introduced three defense models. Each model has its own defense strengths and implementation overhead. In this chapter, we aim to understand the challenges faced by DoS defense systems, and compare the strengths of our defense models with other approaches to DoS attack defense. In particular, we examine how we can optimize the defense performance by combining complementary models of defense. The rest of the chapter is organized as follows. Section 7.2 explains the fundamental challenges for DoS defense schemes. Section 7.3 categorizes DoS attacks according to features such as victim type and attack power. Section 7.4 compares the three defense models presented in the previous chapters, and analyzes the pros and cons of each model. Section 7.5 discusses how to use our defense models to defend against different type of DoS attacks and how to combine these models to make a complete DoS attack defense system. Finally, Section 7.6 discusses the impact of computer crime laws and IPv6 to DDoS defense. 215

238 216 Analysis of DoS Defense Schemes 7.2 Challenges for DoS Defense Schemes The DoS attack poses an unprecedented threat to the Internet community. All DoS defense systems face the three challenges listed below, which make DoS attacks one of the most difficult problems in managing network intrusions The Tragedy of the Commons As we discussed in Section 2.2.2, a DoS attack is a network problem which involves multiple entities, such as the attack source network, the transit network, and the target network. A comprehensive solution to this problem requires the cooperation of all these entities. Unfortunately, it is extremely difficult to establish cooperation between these entities. This is known as the tragedy of the commons [64], where no-one takes responsibility for the care of common resources for the common good. For example, as demonstrated in Chapter 3 and Chapter 6, the first-mile Victim Model (VM) defense and the Router-Router Model (RRM) defense are very effective solutions as they can filter attack traffic close to the source. However, there is no immediate incentive for network operators to deploy these solutions, since the benefits of these defenses are felt elsewhere in the network. Consequently, those who deploy these solutions incur the costs without receiving the benefits The Power of Many Versus the Strength of Few DoS defense can be described as a campaign between attack power and defense strength. Attackers accumulate attack power by compromising many computer systems, which is called the power of many. Targets maintain their defense strength by implementing distributed filters or increasing the host and network resources. The defense strength is mainly provided by the target, which is called the strength of the few. If the attack power is larger than the defense strength, the target s services

239 7.2 Challenges for DoS Defense Schemes 217 will be disrupted. Unfortunately, due to the frequency of discovered flaws in popular software platforms, there is a widespread knowledge of vulnerable computers on the Internet. This makes it easy for the attacker to gain tremendously large attack power. The attack power becomes even more daunting with the increase of the connection bandwidth of the compromised computers. The Code Red worm [15] is a prominent example of how attackers gain attack power. On the other hand, due to limited financial budgets and difficulties in cooperation, the target s defense strength is generally restricted. Only high profile organizations, such as Microsoft, are financially capable of increasing their defense strength. To conclude, targets are generally vulnerable to DoS attacks because of the imbalance between the attack power and defense strength Implementation Cost Generally, it is expensive or impossible to eliminate the DoS attack problem entirely. As we discussed in the previous chapters, the most effective DoS defense scheme is to detect and block attack traffic close to the source. However, the implementation cost for this scheme is high. First, there is a cost to legitimate traffic. As DoS attacks do not necessarily depend on malformed packets or a particular type of protocol, it is difficult to identify attack flows, especially when attack traffic is close to the source. Hence, legitimate flows could be blocked as a result of inaccuracies in DoS defense systems. Second, there is a cost in terms of the complexity of detection algorithms. To improve detection accuracy, many detection algorithms conduct intensive packet inspection to find attack packets, which can slow down end-to-end throughput in the network. Third, international cooperation in terms of law enforcement is required to track the perpetrators and to hold them accountable. However, since cooperation is driven by both economic and political motivations, it is difficult to reach a worldwide agreement on fighting Internet crimes in the short term. As a result, these costs impede the deployment of DoS defense systems.

240 218 Analysis of DoS Defense Schemes Figure 7.1: Categorization of DoS attacks according to victim type 7.3 DoS Attack Category Generally, all DoS attacks cause some degree of disruption and inconvenience to victims. However, the scale of the damage varies according to the type of services and network used by the victim, as well as the type and volume of attack traffic. More importantly, DoS attack defense systems perform differently against different types of DoS attacks. Hence, it is essential to categorize the main types of DoS attacks so that appropriate countermeasures for each type of DoS attack can be carefully developed Victim Type Generally, DoS attacks consume both host and network resources. The DoS attack succeeds if it ties up the resources of the victim. The specific target of an attack depends on the type of service used by the victim. We can categorize DoS attacks according to victim type as shown in Figure 7.1, namely: Service, Host, Network and Infrastructure attacks. Service Attack Each host normally provides multiple services, where each service is a separate application. A service attack aims to disable one particular service by exploiting an inherent vulnerability of that service. It can also consume all the resources of the target host. However, if the shared resources of the target host are not completely

241 7.3 DoS Attack Category 219 consumed, other services can still be accessible to the users. For example, for a host with both authentication and mail services, an attacker can send forged signatures to tie up the resources of the authentication application. However, the mail service application can still proceed to function because of its insulation from the authentication application 1. It is difficult to detect this type of DoS attack. First, as the service attack aims to target one particular service that has a relatively small proportion of resources, its traffic volume is low. Hence, the attack traffic is virtually indistinguishable from the legitimate traffic at the IP layer. Second, as applications that are not under attack operate normally, the host is likely to be unaware of the attack. In order to detect the attack, we need to configure a detection scheme for each service of the host. Host Attack Each host has a communication kernel that provides network connections for higher layer applications. The host attack aims to disable the host s communication kernel. For example, a SYN flood ties up the resources of the TCP/IP stack using bogus connection requests. Once these resources have been used up, no new TCP connection can be established. The host can detect this type of attack easily as the attack impact is conspicuous in terms of traffic patterns. Moreover, the host attack is characterized by its high attack traffic volume. Filtering attack traffic at the host can only relieve, rather than eliminate, the attack damage. The ideal solution is still to block attack traffic close to the source. 1 This example assumes that the CPU time is shared fairly among these multiple applications and each application has an upper bound of its share of CPU time.

242 220 Analysis of DoS Defense Schemes Network Attack Network attacks aim to consume the bandwidth of the target s network. As a result, all communication using the victim s network will fail. During a network attack, the destination IP addresses of the attack packets share the same network address. The capacity of the target s network determines the power needed to launch a successful network attack. As the network attack mainly consumes the network bandwidth, the attack traffic is generally high. To curtail this attack, the attack traffic should be filtered close to the source. Infrastructure Attack The infrastructure attack aims to disable the services of critical components of the Internet. The result of an infrastructure attack is potentially catastrophic as the whole Internet may be affected. For example, DNS root servers are the source of information for resolving web addresses, such as An infrastructure attack can tie up both network and host resources of the DNS root server, disrupting all Internet users from using web services. Normally, critical network infrastructure is highly provisioned. Therefore, significant power is required to launch a successful infrastructure attack. Given the massive attack power of the infrastructure attack, global cooperation is essential for an effective defense The Parameters of Attack Power The DoS attack s goal is to consume the resources of the target. The capability to tie up the resources is determined by the attack power. Generally, the attack power consists of two parameters. The first parameter is the traffic volume, which can be represented by the number of packets in a given period. The second parameter is the

243 7.3 DoS Attack Category 221 Figure 7.2: Categorization of DDoS attacks according to the parameters of attack power resources consumed per packet, which can be represented by CPU time or memory size needed to process the packet. We can categorize the DoS attacks according to the values of these two parameters as shown in Figure 7.2. Volume Attack The volume attack is based on a massive number of packets. During the volume attack, each attack packet consumes the same level of resources as a normal packet. With high traffic volume, the network links to the target will be highly congested. Hence, most packets to the target, including legitimate packets, will be dropped. The target s services are disrupted because its communication channel is congested. Malformed Attack The malformed attack sends a set of malformed packets, which consume more resources than normal packets. These malformed packets generally contain an invalid source address, fragmentation field, or protocol number. For example, when the target receives maliciously fragmented packets, it has to store these fragmented packets in memory, and then reassemble them, which is resource-consuming. Malformed packets can also be crafted for service-level attacks. For example, the CrashIIS attack [13] sends a malformed GET request via telnet to port 80 on a target Windows NT host

244 222 Analysis of DoS Defense Schemes to crash its web service. Hence, the power of a malformed attack is based on the relatively high level of resources consumed by each malformed packet. More importantly, the traffic volume of a malformed attack does not need to be high to be effective. Volume and Malformed Attack Generally, most operating systems have adjusted their kernels to be resilient to malformed packets. For example, a system can limit the maximum number of fragmented packets it will accept, preventing malformed packets from wasting unlimited resources. As a result, attackers have to increase the volume of malformed packets to maintain the attack power. We call this type of attack the volume and malformed attack, which is based on both the traffic volume and resource consumed per malformed packet Average Flow Rate and Number of Flows Given the attack traffic volume is constant, there are two variable parameters: average flow rate and number of flows. According to different values of these two parameters, we can categorize DDoS attacks as follows. Centralized Attack The centralized attack contains a single flow with extremely high flow rate. For example, a centralized attack can be one UDP flow with flow rate of 10 Mbps. Normally, the attack traffic is generated by a single attack source without IP spoofing. This type of attack traffic is easy to filter according to its source IP address. Distributed Attack The distributed attack contains a small number of flows with high average flow rate. The attack traffic can be generated by a few attack sources without IP spoofing or

245 7.3 DoS Attack Category 223 with partially spoofed IP source addresses. For example, a distributed attack occurs when an attacker directs each of the 20 compromised zombies to send 20 spoofed TCP flows at the rate of 50 Kbps to the target machine. If we assume the spoofed flows from each zombie are unique, then there will be = 400 flows with the rate of 50 Kbps at the target host. This type of attack traffic is still easy to filter according to their source IP addresses although more resources are needed to do so. Highly Distributed Attack The highly distributed attack contains an extremely large number of flows with normal or high average flow rate. The attack traffic can be generated by a massive number of compromised computer systems without IP spoofing or a small number of attack sources with random IP spoofing. A prominent example is that during the spread of the Code Red worm, over 300,000 zombie machines were compromised to launch a denial of service attack on the White House website [106]. If each zombie sends SYN traffic at a rate of 100 Kbps, the White House website will experience a 30 Gbps attack traffic rate. This type of attack traffic is hard to filter due to its wide range of source IP addresses and high overall attack traffic volume Attack Traffic Rate Dynamics During a DoS attack, an attack can choose the attack traffic rate dynamics. The result can be either a constant rate attack or a variable rate attack. Constant Rate Attack During a constant rate attack, the attack sources send attack traffic to the target at a constant traffic rate, which causes continuous service disruption to the target. This type of attack is easy to detect due to the continuous service degradation at the

246 224 Analysis of DoS Defense Schemes target. The attack sources can be located as the attack traffic is stable during the attack. Variable Rate Attack During a variable rate attack, the attack sources send attack traffic to the target with changing traffic rate. There are two scenarios of variable rate attack. In the first attack scenario, the attack sources send attack traffic to the target using variable traffic rates, which causes periodic service disruption to the target. For example, each of the 1,000 attack sources sends 500 SYN packets per second to the target continuously for 10 seconds, then waits for 30 seconds, and sends again. It is more difficult to detect this attack than the constant rate attack as the attack traffic is not stable. In the second attack scenario, the attacker divides the attack sources into several groups, each time only one group of attack sources are used to send attack packets. The result will be that each attack source has a variable sending rate while the victim will suffer from continuous service disruption. For example, 4,000 attack sources are divided into 4 groups with 1000 attacks sources each. Each attack source within one group sends 500 SYN packets per second to the target continuously for 10 seconds, then waits for 30 seconds, and sends again. Then the victim will constantly receive = 500, 000 SYN packets per second. The detection of this attack at the target is the same as the constant rate attack detection. However, it is more difficult to detect this attack at the source than the constant rate attack as each attack source uses a variable attack rate.

247 7.3 DoS Attack Category Impact of Attack The aim of a DoS attack is to disrupt the target s services. The impact of an attack depends on attack power that has been harnessed by the attacker in terms of the traffic volume and resources consumed per packet, and the resources available to the target. The result can be either a malign attack, which causes complete disruption, or a benign attack, which causes only partial disruption. Malign Attack There are two types of malign attacks, where the target s services are completely disrupted. One is a non-recoverable attack, where some systems will lock up or crash as the result of an attack. The system has to reboot in order to restore the normal service. For example, the Ping-of-death attack [6] will crash a computer system by sending an ICMP packet over 64K bytes. Another is a recoverable attack, where the target can continue to operate once the attack ceases. For example, the target under a UDP flood attack will function normally as soon as the attack stops. Benign Attack The benign attack only consumes a proportion, instead of all, of the resources of the target. During this attack, the target can still provide services to some legitimate users. However, most users will experience degraded performance or fail to access the target s services. This attack is extremely annoying while being difficult to distinguish from normal congestion, since the attack traffic volume is lower than the recoverable malign attack.

248 226 Analysis of DoS Defense Schemes 7.4 Comparison Between Our Defense Models In this thesis, we have proposed three defense models, namely, the Victim Model (VM), the Victim-Router Model (VRM) and the Router-Router Model (RRM). As the detection and filtering location of each model is different, it is important to understand the defense strength and implementation overhead of each model. As the First-mile Victim Model (FVM) and the Last-mile Victim Model (LVM) vary in defense strength and deployment overhead due to different implementation locations, we discuss these two models separately in the rest of this chapter. Table 7.1 compares our defense models in terms of three aspects: detection and filtering mechanisms (items 1 to 3), defense strength of each model (items 4 to 6) and implementation overhead (items 7 to 9). As shown in Table 7.1, the attack signatures used by the First-mile Victim Model and the Last-mile Victim Model are the same. The VRM uses the variation of traffic distribution to detect attacks while the RRM uses multiple rules to identify traffic anomalies (e.g., the number of RST packets is used to detect Reflector Attacks as shown in Section 6.4.1). The detection accuracy of the Lastmile Victim Model is high as all attack traffic converges at the last-mile router. In contrast, the detection accuracy of First-mile Victim Model depends on abnormal traffic patterns, e.g., spoofed addresses, as attack traffic can be scarce at the firstmile router. The normal packet survival ratio is low for the Last-mile Victim Model during a large-scale DDoS attack. As the Last-mile Victim Model only protects local network and server resources, normal packets still cannot reach the victim due to the congestion at the victim s upstream networks. The normal packet survival ratio is high for the VRM and the RRM, as both models propose to filter attack traffic close to the source. More importantly, if enough First-mile Victim Models are deployed with zero false positive rate, the normal packet survival ratio will approach 100% because all

249 First-mile Victim Model Last-mile Victim Model Victim-Router Model Router-Router Model (FVM) (LVM) (VRM) (RRM) 1. Detection Potential attack sources Potential victims Potential victims A set of detection systems Locations networks and/or their network and/or their networks and/or their distributed in the potential upstream ISP networks upstream ISP networks upstream ISP networks attack sources networks 2. Filtering Same as Same as Same as the detection Same as Locations the detection the detection locations and further the detection locations locations upstream ISP networks locations 3. Attack A large number of A large number of High traffic volume on A variety of traffic Signatures new IP addresses and/or new IP addresses and/or attack path and the anomalies observed by the flows with high traffic rate flows with high traffic rate change of traffic distribution distributed detection systems 4. False =0 if a proper detection =0 if a proper detection 0 (=0 if packet marking 0 (small if an Positive threshold is chosen threshold is chosen field cannot be spoofed and appropriate broadcast Rate as shown in Section as shown in Section normal traffic profile is updated) threshold is chosen) 5. Detection 1 (large if attack packets 1 (=1 if attack traffic 1 (=1 if attack traffic 1 (=1 if enough distributed Accuracy use spoofed addresses) volume is high) volume is high) detection systems are used) 6. Normal 0 (=1 if a substantial 0 (small in 0 (high if attack path 0 (high if attacks are Packet number of FVMs a sufficiently is accurately identified and detected accurately and the Survival are deployed and the large-scale attack traffic is filtered distributed detection systems Ratio false positive rate is 0) DDoS attack) close to the attack sources) are optimally placed) 7. New Pushback protocols Protocols Communication Not required Not required between victims and between distributed Protocols their upstream ISP networks detection systems 8. Low Low (Moderate if High (Moderate if High Computational the potential victim pushback is limited Requirement is a large website) in one ISP network) 9. Deployment High due to low Low Moderate (High for Moderate (High for Difficulty deployment incentive widespread deployment) widespread deployment) Table 7.1: Comparison between our defense models 7.4 Comparison Between Our Defense Models 227

250 228 Analysis of DoS Defense Schemes attack packets will be filtered before congesting the Internet links. Among the four defense models listed, the Last-mile Victim Model is the most deployable. This is mainly because the effectiveness of the Last-mile Victim Model does not depend on widespread deployment. In contrast, the First-mile Victim Model is the most difficult to deploy as customers generally do not have a direct economic incentive for deployment. It is difficult to deploy the VRM and the RRM widely as the cooperation between multiple parties is costly (e.g., the authentication between the victim and its upstream routers). When the potential victim is a large website, more memory space is needed to build an accurate IP Address Database (IAD) for the Last-mile Victim Model. Hence, the computational requirement for the Last-mile Victim Model is moderate. As the VRM needs to investigate the traffic variance for each upstream router and the RRM needs to identify the traffic anomalies according to a set of rules, their computational requirements are generally high. 7.5 How to Use Our Defense Models DoS Attacks Versus Our Defense Models We have proposed the First-mile Victim Model (FVM), Last-mile Victim Model (LVM), Victim-Router Model (VRM), and Router-Router Model (RRM). In Section 7.3, we discussed the categories of DoS attacks. As shown in Table 7.1, each model achieves a different trade-off between defense strength and implementation overhead. More importantly, depending on the desired security level and expected attacks, a target may be particularly interested in defending against specific types of DoS attacks. For example, a small organization that uses the Internet to provide basic web services would be more concerned about service attacks and host attacks. In contrast, a large organization, such as Yahoo, would be more concerned about

251 7.5 How to Use Our Defense Models 229 network attacks and infrastructure attacks. Hence, it is important to know how effective each of our defense models is against different types of DoS attacks. Table 7.2 shows the efficacy of each defense model against different types of DoS attacks, and provides a recommendation of which defense models should be used for each attack. In the following text, we will give a brief explanation of how to understand this table. As shown in the table, the FVM, VRM, and RRM are all effective in defending against the volume attack. However, the ideal defense mechanism is to filter the attack traffic before it congests the Internet. The VRM starts to filter attack traffic only after the victim has located the attack sources. However, by this time it is possible that the attack damage has already occurred. In the case of a highly distributed DoS attack, an individual First-mile Victim Model (FVM) may observe little attack traffic, which increases the difficulty of detecting the attack. However, the RRM can detect a highly distributed DoS attack by sharing distributed beliefs, and filter attack traffic close to the source. Consequently, the RRM is recommended to defend against the volume attack. As the power of a malformed attack is based on the packet contents instead of the traffic volume, it generally has low traffic volume. Hence, little network bandwidth will be consumed by this type of attack, and it is not necessary to filter attack traffic close to the source. More importantly, if the malformed attack is launched from distributed sources, the FVM may fail to detect the attack due to the small number of attack packets seen by each First-mile Victim Model (FVM), while the RRM needs a substantial number of distributed detection systems to detect this attack. In contrast, it is easy for the Last-mile Victim Model (LVM) to detect the attack as all attack packets converge at the victim. Consequently, the LVM is recommended to defend against the malformed attack.

252 230 Analysis of DoS Defense Schemes Table 7.2: Summary: DoS attacks versus DoS defense models Attack Type Effective Defense Models Recommended Defense Models Service Attack LVM LVM Host Attack FVM LVM VRM RRM RVM Network Attack FVM VRM RRM FVM RRM Infrastructure Attack FVM VRM RRM FVM RRM Volume Attack FVM VRM RRM RRM Malformed Attack LVM LVM Volume and Malformed Attack LVM RRM RRM Centralized Attack FVM VRM VRM Distributed Attack FVM VRM RRM FVM RRM Highly Distributed Attack RRM RRM Constant Rate Attack FVM LVM VRM RRM FVM VRM RRM Variable Rate Attack LVM RRM RRM Malign Attack (recoverable) FVM VRM RRM FVM Malign Attack (non-recoverable) FVM LVM LVM Benign Attack VRM RRM LVM VRM Integrate the VM with the VRM Although each defense model can operate independently, there are opportunities to integrate two or more models into one model. The original VRM is based on probabilistic packet marking (PPM) to infer the attack path information. This results in two limitations. First, PPM assumes that the traffic volume is large on each link of the attack path. Unfortunately, this assumption is not true for the links that are far away from the victim during DDoS attacks. Second, routers filter attack traffic according to source or destination address, which does not accurately reflect whether the traffic is malicious. On the other hand, the VM is a history-based defense scheme that does not depend on traffic volume. Moreover, the VM provides a detection mechanism as well as an accurate filtering scheme. Hence, we can integrate the VM with the VRM so that the combined system can robustly locate the attack source and defend against the attack close to the source. The architecture of combining the VM and VRM is shown in Figure 7.3. The operation of this combined model can be described as follows:

253 7.5 How to Use Our Defense Models 231 Figure 7.3: The architecture for combining the Victim Model and the Victim-Router Model. 1. Incoming traffic is partitioned according to router addresses that can be inferred from the packet marking field. The target is then able to build a separate IP Address Database (IAD) for each router, which is a subset of the target s total IAD. As shown in Figure 7.3, IAD: R1 means the IAD for router R1. 2. We apply history-based detection techniques to analyze each router s traffic to the target, and decide whether a router is on the attack path. We refer to a router on an attack path as an abnormal router. 3. Once an attack is detected, the target will send pushback messages to its furthest abnormal routers based on the distance field that is introduced in Chapter 4. A pushback message contains the target s IAD for that router s address. 4. Once a router receives an authenticated pushback message, it starts to filter the traffic to the target according to the IAD included in a pushback message.

254 232 Analysis of DoS Defense Schemes 5. Once a target does not observe any DoS attack for a certain period of time, a pushback message is sent to inform routers to stop traffic filtering. There are two major advantages for this combined model. First, the target can identify attack sources using not only traffic volume but also connection history. This will greatly improve the accuracy of locating an attack source. Second, once a router is informed that an attack occurs, it starts to filter traffic according to the connection history to the target. This provides a more accurate filtering rule than purely checking source or destination address. Hence, more legitimate traffic will be protected. However, the cost involved for this combined model is two-fold. First, the target needs more computational resources to maintain an IAD for each upstream router. At the same time, the target needs to perform detection for traffic from each router. Second, the target needs to include an IAD in the pushback message, which requires a large communication bandwidth. For the first cost, techniques introduced in Section can be used to build efficient IADs. Moreover, the target can enhance the processing capacity, for example, deploying a parallel detection architecture. For the second cost, compression techniques can be used to reduce the message size. Alternatively, a subset of the IAD can be sent, which contains only the most frequent IP addresses. Moreover, an outband channel can be used to prioritize the communication between the target and the upstream routers. How to evaluate this combined model is a future research issue. 7.6 Other Related Issues In this section, we will discuss the impact of computer crime laws and IPv6 [42] to DoS attack defense. However, further investigation into the computer crime laws and IPv6 is beyond the scope of this thesis.

255 7.6 Other Related Issues Computer Crime Laws As we mentioned in Section 2.4 and Section 7.2.1, law enforcement can help to solve the problem of DoS attacks. The first comprehensive initiative on computer crime was a staff study by the U.S. Senate Government Operations Committee in February 1977 [107]. After more than two decades evolution, many countries, including Australia, have established their own legislation against computer crimes [107]. However, computer crimes often cross multiple administrative, jurisdictional, and national boundaries. Hence, intense international cooperation is essential to combat against computer crimes. Interpol is the first International organization to deal with law enforcement in regarding to computer crime [107]. Currently, significant efforts to achieve effective, collaborative, and efficient approaches to fighting against computer crimes and cyber-terrorism are underway. For example, the G8 Recommendations on Transnational Crime [108] and the Council of Europe Convention on Cybercrime [109] have set up a high-level framework for international cooperation against computer crimes. The recent survey of the member countries of the forum for Asia Pacific Economic Cooperation [110] has also shown the achievement of international cooperation on law enforcement against computer crimes IP Version 6 IPv6 [42] is the proposed successor to IPv4. The most significant change from IPv4 to IPv6 is, of course, the 128 bits IPv6 address fields. With the increased IP address space, we expect each Internet user will be assigned a unique IP address. Hence, IP addresses are more associated with Internet users, which will improve the IP consistency that is the basis of our Victim Model approaches as discussed in Chapter 3. Consequently, our History-based Attack Detection and Reaction scheme proposed in Chapter 3 will be at least as efficient if not more as under IPv6.

256 234 Analysis of DoS Defense Schemes IPv6 will affect our Adjusted Probabilistic Packet Marking Scheme (APPM) proposed in Chapter 4.3 as there is no identification field in the IPv6 header, which is proposed to store the path information in APPM. However, we can propose similar techniques for IPv6. For example, instead of encoding the path information into the 16 bits identification field in IPv4 packet header, we can encode the path information into the 24 bits flow label field in the IPv6 packet header. As the IPv6 address has 128 bits, we have to overload the edge information which is = 256 bits into the 24 bit field. Consequently, we need to collect more marked packets to reconstruct the attack path comparing with IPv4. For the Selective Pushback scheme proposed in Chapter 5, we need to collect more marked packets to build a normal traffic distribution profile. As our Router-Router Model defense approaches do not depend on any special field of the IPv4 packet, they should continue to function in the case of IPv Conclusion In this chapter, we discussed three major challenges faced by DoS defense schemes, categorized DoS attacks in terms of victim type, attack power, attack flow statistics and attack damage. Moreover, we made a thorough comparison between our defense models in these aspects: detecting and filtering mechanisms, defense strength, and implementation overhead. Finally, we demonstrated how to use our defense models in two steps. First, we discussed the efficacy of each defense model against different types of DoS attack. Second, we investigated how to combine the Victim Model with the Victim-Router Model to make a more effective defense system.

257 Chapter 8 Conclusion After analyzing existing Denial of Service (DoS) attack defense techniques in Chapter 2, we find that the major challenges of DoS attack defense are how to identify the attack traffic accurately and efficiently, and how to locate attack sources and filter attack traffic close to the source. To address these challenges, we have proposed three defense models to defend against DoS attacks, namely, the Victim Model, the Victim-Router Model, and the Router-Router Model. In the Victim Model, attack traffic is detected and filtered at the target. The Victim Model mainly comprises the New Address Detection Engine, Flow Rate Detection Engine, and History-based IP Filtering. The New Address Detection Engine detects DoS attacks by monitoring the abnormal increase of new IP addresses. The Flow Rate Detection Engine detects DoS attacks by monitoring abnormal increase of flow rate. By combining the New Address Detection Engine and the Flow Rate Detection Engine, we can detect most of the DoS attacks. The History-based IP Filtering scheme can accurately filter attack traffic according to connection history without affecting legitimate traffic. Our major contribution in the Victim Model is to propose an efficient algorithm to identify DoS attack traffic accurately using the connection history at the victim. We 235

258 236 Conclusion have found that the connection history provides a much more stable way of identifying attack traffic, in contrast to other traffic features, such as the traffic rate, which are much more volatile. In particular, when combined with a traditional detection algorithm based on traffic volume, it can detect most DoS attacks. Moreover, we demonstrated that our History-based IP Filtering scheme can filter attack traffic while allowing the majority of the normal traffic to reach the target under attack. Finally, as our detection and filtering schemes only analyze simple traffic features, they are easy and efficient to implement. In order to locate the attack paths and protect network bandwidth during DoS attacks, we proposed the Victim-Router Model. In the Victim-Router Model, the victim detects the attack and cooperates with upstream routers to filter attack traffic close to the source. The Victim-Router Model includes the Adjusted Probabilistic Packet Marking scheme (the passive mode) and the Selective Pushback scheme (the active mode). In the Adjusted Probabilistic Packet Marking scheme, the upstream routers mark the incoming packets probabilistically, where the probability is decided by the distance between the marking router and the target. During an attack, the target can reconstruct the attack path by collecting the marked packets and hence identify the attack sources. The major contribution of Adjusted Probabilistic Packet Marking is to greatly reduce the number of packets needed to reconstruct the attack path. This is extremely useful in the case of a DDoS attack with many attack paths. In the Selective Pushback scheme, the target identifies the attack source according to the abnormal variance of the traffic distribution from upstream routers, and sends pushback messages to the routers close to the attack sources to filter attack traffic. One major contribution of Selective Pushback is that the target can identify attack sources quickly and send pushback messages to the routers close to the attack sources directly. This approach is much more efficient than previous pushback proposals, which have to propagate pushback messages to upstream routers incrementally, i.e.,

259 237 hop-by-hop. Another major contribution is that we can identify DDoS attack traffic accurately even though the attack traffic is low on each distributed link. The ideal DoS attack defense scenario is to filter attack traffic before it congests the Internet, which is the objective of our Router-Router Model. In the Router- Router Model, routers close to the potential attack sources communicate with each other to share local evidence of attacks. This enables routers to detect attacks much more quickly than if they acted in isolation. We applied the Router-Router Model to two intrusion detection scenarios. One is to detect Reflector Attacks and another is to detect Distributed Denial of Service Attacks. A machine learning scheme is used to decide when to share the information. The contributions of the Router-Router Model are three-fold. First, the Router- Router Model addresses the key issue of DoS attack defense, namely, filtering attack traffic at the source. Second, the Router-Router Model provides a new way of information sharing, which makes detection and filtering at the source possible. Third, the machine learning scheme applied in the Router-Router Model can let distributed Intrusion Detection System agents share information effectively with minimal cost. In each case, we demonstrated the effectiveness of our defense models using both analytical results and simulations based on real-life packet traces. Moreover, we highlight how our three defense models complement each other, and can be integrated into a robust solution for DoS attacks. Finally, our study gives a rich understanding of the DoS attack problem, and opens the way for long-term solutions, e.g., the important role that the legislation will play in eliminating DoS attack threats. The openness of Internet makes it inherently vulnerable to DoS attacks. In the short term, Victim Model provides an immediate defense, but does not guarantee to protect the Internet itself. In the long term, greater cooperation is needed between carriers in order to provide defense in-depth, e.g., by using the Victim-Router and

260 238 Conclusion Router-Router Models. Ultimately, we require a mechanism for enforcing traffic controls at the source of the traffic, or at least a legislative framework that allows action to be taken against the initiators of DoS attacks once they have been detected. In the meantime, the contribution of this thesis provide a range of defenses that can severely limit the damage caused by DoS attacks. This is itself is a significant step forward in providing a robust Internet service that can be used with confidence for electronic commerce and other on-line services.

261 Appendix A: Abbreviations and Glossary of terms Abbreviations [APPM] Adjusted Probabilistic Packet Marking [DoS Attack] Denial of Service Attack [DDoS Attack] Distributed Denial of Service Attack [FRDE] Flow Rate Detection Engine [FVM] First-mile Victim Model [HDDoS Attack] Highly Distributed Denial of Service Attack [HIF] History-based IP Filtering [HADR] History-based Attack Detection and Reaction [IAD] IP Address Database [IDS] Intrusion Detection System [IP] Internet Protocol 239

262 240 Appendix A: Abbreviations and Glossary of terms [IPv4] Internet Protocol version 4 [IPv6] Internet Protocol version 6 [kbps] kilobit per second [LVM] Last-mile Victim Model [Mbps] Megabit per second [NADE] New Address Detection Engine [NTMP] Network Traffic Monitoring Point [PPM] Probabilistic Packet Marking [RP] Router-based Pushback [RRM] Router-Router Model [RST Packet] TCP Reset Packet [SIA] Source IP Address [SP] Selective Pushback [TCP] Transmission Control Protocol [UDP] User Datagram Protocol [VM] Victim Model [VRM] Victim-Router Model

263 241 Glossary of terms [Detection Accuracy] The number of attacks detected over the total number of attacks. [False Positive] A normal operation that is misdiagnosed by the detection scheme as an attack. [False Positive Rate] The number of false positives over the total number of detection decisions made. [Filtering Accuracy] Percentage of legitimate traffic that a filtering scheme can protect. [Zombie] A computer that is compromised and controlled by an attacker.

264 242 Appendix A: Abbreviations and Glossary of terms

265 References [1] CERT/CC Statistics, URL stats.html. [2] V. Paxson. An analysis of using reflectors for distributed denial-of-service attacks. ACM Computer Communications Review 31(3), (2001). [3] H. F.Lipson. Tracking and tracing cyber-attacks: Technical challenges and global policy issues. Special Report CMU/SEI-2002-SR-009, CERT Coordination Center (2002). [4] J. A. Rochlis and M. W. Eichin. With microscope and tweezers: The worm from MIT s perspective. Communications of the ACM 32(6), (1989). [5] L. Garber. Denial-of-service attacks rip the Internet. IEEE Computer 33(4), (2000). [6] D. J. Marchette. Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint (Springer, 2001). [7] T. Peng, C. Leckie, and K. Ramamohanarao. Prevention from distributed denial of service attacks using history-based IP filtering. In Proceeding of 38th IEEE International Conference on Communications (ICC 2003), pp (Anchorage, Alaska, USA, 2003). 243

266 244 References [8] T. Peng, C. Leckie, and K. Ramamohanarao. Detecting distributed denial of service attacks by sharing distributed beliefs. In Proceedings of 8th Australasian Conference on Information Security and Privacy (ACISP 2003), pp (Wollongong, Australia, 2003). [9] T. Peng, C. Leckie, and K. Ramamohanarao. Adjusted probabilistic packet marking for IP traceback. In Proceedings of the Second IFIP Networking Conference (Networking 2002), pp (Pisa, Italy, 2002). [10] T. Peng, C. Leckie, and K. Ramamohanarao. Defending against distributed denial of service attack using selective pushback. In Proceedings of 9th IEEE International Conference on Telecommunications (ICT 2002), pp (Beijing, China, 2002). [11] T. Peng, C. Leckie, and K. Ramamohanarao. Detecting reflector attacks by sharing beliefs. In Proceedings of IEEE 2003 Global Communications Conference (Globecom 2003) (San Francisco, California, USA, 2003). [12] L. D. Paulson. Wanted: more network-security graduates and research. IEEE Computer 35(2), (2002). [13] J. Korba. Windows NT Attacks for the Evaluation of Intrusion Detection Systems (2000). S.M. Thesis, Massachusetts Institute of Technology. [14] E. H. Spafford. Crisis and aftermath. Communications of the ACM 32(6), (1989). [15] CERT Advisory CA : Code Red Worm Exploiting Buffer Overflow In IIS Indexing Service DLL, URL

267 References 245 [16] D. Moore, G. M. Voeker, and S. Savage. Inferring Internet Denial-of-Service acitivity. In Proceedings of the USENIX Security Symposium, pp (2001). [17] CERT Advisory CA : UDP Port Denial-of-Service Attack, URL [18] CERT Advisory CA : Smurf IP Denial-of-Service Attacks, URL [19] S. Dietrich, N. Long, and D. Dittrich. Analyzing distributed denial of service attack tools: The shaft case. In Proceedings of 14th Systems Administration Conference, pp (New Orleans, Louisiana, USA, 2000). [20] S. Gibson. Distributed Reflection Denial of Service (2002). URL [21] P. Ferguson and D. Senie. Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing. RFC 2267, the Internet Engineering Task Force (IETF) (1998). [22] C. Perkins. IP Mobility Support. RFC 2002, the Internet Engineering Task Force (IETF) (1996). [23] V. Arora, N. Suphasindhu, J. Baras, and D. Dillon. Effective extensions of Internet in hybrid satellite-terrestrial networks. Technical Report CSHCN TR , University of Maryland (1996). [24] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC 1771, the Internet Engineering Task Force (IETF) (1995). [25] K. Park and H. Lee. On the effectiveness of router-based packet filtering for distributed DoS attack prevention in power-law Internets. In Proceedings of

268 246 References the 2001 ACM SIGCOMM Conference, pp (San Diego, California, USA, 2001). [26] J. Li, J. Mirkovic, M. Wang, P. Reither, and L. Zhang. Save: Source address validity enforcement protocol. In Proceedings of IEEE INFOCOM 2002, pp (2002). [27] T. M. Gil and M. Poletto. Multops: a data-structure for bandwidth attack detection. In Proceedings of the 10th USENIX Security Symposium (2001). [28] H. Wang, D. Zhang, and K. G. Shin. Detecting SYN flooding attacks. In Proceedings of IEEE INFOCOM 2002, pp (2002). [29] R. B. Blažek, H. Kim, B. Rozovskii, and A. Tartakovsky. A novel approach to detection of denial-of-service attacks via adaptive sequential and batchsequential change-point detection methods. In Proceedings of IEEE Systems, Man and Cybernetics Information Assurance Workshop (2001). [30] C.-M. Cheng, H. T. Kung, and K.-S. Tan. Use of spectral analysis in defense against DoS attacks. In Proceedings of IEEE GLOBECOM 2002, pp (2002). [31] A. Kulkarni, S. Bush, and S. Evans. Detecting distributed denialof-service attacks using Kolmogorov complexity metrics. Technical Report 2001CRD176, GE Research & Development Center (2001). URL bushsf/ftn/2001crd176.pdf. [32] J. B. D. Cabrera, L. Lewis, X. Qin, W. Lee, R. K. Prasanth, B. Ravichandran, and R. K. Mehra. Proactive detection of distributed denial of service attacks using MIB traffic variables- a feasibility study. In Proceedings of the

269 References 247 7th IFIP/IEEE International Symposium on Integrated Network Management, pp (Seattle, WA, 2001). [33] C. Manikopoulos and S. Papavassiliou. Network intrusion and fault detection: A statistical anomlay approach. IEEE Communications Magazine 40(10), (2002). [34] Z. Zhang, J. Li, C. Manikopoulos, J. Jorgenson, and J. Ucles. HIDE: a hierarchical network intrusion detection system using statistical preprocessing and neural network classification. In Proceedings of the 2001 IEEE Workshop on Information Assurance and Security (United States Military Academy, West Point, NY, 2001). [35] S.Forrest and S.Hofmeyr. Architecture for an artificial immune system. Evolutionary Computation Journal 7(1), (1999). [36] J. L. Bebo, G. H. Gunsch, G. D. Lamont, P. D. Williams, and K. P. Anchor. CDIS: Towards a computer immune system for detecting network intrusions. In Proceedings of 4th International Workshop, RAID 2001 (2001). [37] B. Gemberling, C. Morrow, and B. Greene. ISP Security- Real World Techniques. Presentation, NANOG (2001). URL [38] C. Morrow. BlackHole Route Server and Tracking Traffic on an IP Network. UUNET, WorldCom, Inc., URL [39] H. Burch and B. Cheswick. Tracing anonymous packets to their approximate source. In Proceedings of the 14th Systems Administration Conference (New Orleans, Louisiana, USA, 2000).

270 248 References [40] R. Stone. Centertrack: An IP overlay network for tracking DoS floods. In Proceedings of the 9th USENIX Security Symposium (Denver, Colorado, USA, 1999). [41] S. Savage, D. Wetherall, A. Karlin, and T. Anderson. Practical network support for IP traceback. In Proceedings of the 2000 ACM SIGCOMM Conference, pp (2000). [42] S. Deering and R. Hinden. Internet Protocol, Version 6 (IPv6) Specification. RFC 2401, the Internet Engineering Task Force (IETF) (1998). [43] D. X. Song and A. Perrig. Advanced and authenticated marking schemes for IP traceback. In Proceedings of IEEE INFOCOM 2001, pp (2001). [44] D. Dean, M. Franklin, and A. Stubblefield. An algebraic approach to IP traceback. ACM Transactions on Information and System Security 5(2), (2002). [45] S. Bellovin. The ICMP traceback message. IETF Internet Draft (2000). URL [46] S. F. Wu, L. Zhang, D. Massey, and A. Mankin. Intension- Driven ICMP Trace-Back. IETF Internet Draft (2001). URL [47] K. Park and H. Lee. On the effectiveness of probabilistic packet marking for IP traceback under denial of service attack. In Proceedings of IEEE INFOCOM 2001, pp (2001). [48] M. Waldvogel. Gossib vs. IP traceback rumors. In Proceedings of 18th Annual Computer Security Applications Conference (ACSAC 2002) (2002).

271 References 249 [49] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, S. T. Kent, and W. T. Strayer. Hash-based IP traceback. In Proceedings of the 2001 ACM SIGCOMM Conference, pp (San Diego, California, USA, 2001). [50] B. H. Bloom. Space/time tradeoffs in hash coding with allowable errors. Communications of the ACM 13(7), (1970). [51] D. J. Bernstein and E. Schenk. Linux Kernel SYN Cookies Firewall Project. URL [52] O. Spatscheck and L. L. Petersen. Defending against denial of service attacks in Scout. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (1999). [53] F. Kargl, J. Maier, and M. Weber. Protecting web servers from distributed denial of service attacks. In Proceedings of 10th International World Wide Web Conference, pp (2001). [54] F. Lau, S. H. Rubin, M. H. Smith, and L. Trajković. Distributed denial of service attacks. In Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp (2000). [55] S. Floyd and V. Jacobson. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking 3(4), (1995). [56] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking 1(4), (1993). [57] D. K. Y. Yau, J. C. S. Lui, and F. Liang. Defending against distributed denial-of-service attacks with max-min fair server-centric router throttles. In

272 250 References Proceedings of IEEE International Workshop on Quality of Service (IWQoS), pp (Miami Beach, FL, 2002). [58] R. Mahajan, S. M. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker. Controlling high bandwidth aggregates in the network. ACM Computer Communications Review 32(3), (2002). [59] U. Tupakula and V. Varadharajan. A practical method to counteract denial of service attacks. In Proceedings of Twenty-Sixth Australasian Computer Science Conference (ACSC2003), pp (Adelaide, Australia, 2003). [60] A. D.Keromytis, V. Misra, and D. Rubenstein. SOS: Secure overlay services. In Proceedings of the 2002 ACM SIGCOMM Conference, pp (2002). [61] S. Kent and R. Atkinson. Security Architecture for the Internet Protocol. RFC 2401, the Internet Engineering Task Force (IETF) (1998). [62] J. Mirkovic, G. Prier, and P. Reiher. Attacking DDoS at the source. In Proceedings of ICNP 2002, pp (Paris, France, 2002). [63] X. Geng and A. Whinston. Defeating distributed denial of service attacks. IEEE IT Professional 2(4), (2000). [64] G. Hardin. The tragedy of the commons. Science pp (1968). [65] J. D. Howard. An analysis of security incidents on the Internet. Ph.D. thesis, Carnegie Mellon University (1998). [66] A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queuing algorithm. In Proceedings of the 1990 ACM SIGCOMM Conference, pp (1990).


Index

k, 122, 124
3-way handshake, 23
bandwidth attack, 7, 19, 65
Bloom filter, 98
client, 23
code red worm, 18
customer's network, 30
CUSUM, 90
DDoS attack, 7
denial of service attack, 4
detection accuracy, 9
distributed denial of service attack, 7
DoS attack, 4
downstream router, 7
DRDoS, 27
DRDoS attack, 28, 66
edge router, 6
egress filtering, 31
end host, 6
external network, 30
false negative, 9
false negative rate, 9
false positive, 8
false positive rate, 8
filtering accuracy, 127
finger daemon, 18
firewall, 8
first-mile router, 7
flash crowd, 19, 66
Flow Rate Detection Engine, 77
frequent IP address, 87, 102
half-open connection, 197
HDDoS, 65
ICMP flood, 26
IDS, 8
infiltrating, 133
ingress filtering, 30
Internet protocol, 8
intrusion detection system, 8
IP Address Database, 87
IP flow, 8
IP spoofing, 21
IPv6, 70, 233
last-mile router, 7
monitoring point, 81
New Address Detection Engine, 74
new IP address, 9
non-parametric model, 67
normal traffic conditions, 81
parametric model, 67
ping-of-death, 18
private network IP address, 31
reflector, 7
reflector attack, 7
remote-to-local attack, 17
Router-Router Model, 10
sampling interval, 84
source, 6
SYN flood, 23
target, 6
the Tragedy of the Commons, 61
third party, 6
traffic aggregate, 8
typical DDoS attack, 27
UDP flood, 24
upstream routers, 7
user-to-root attack, 17
victim, 6
Victim Model, 9
Victim-Router Model, 10
