Detection of Distributed Denial of Service Attack with Hadoop on Live Network




Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof. Rahul Patil 5
Students, Dept. of Computer Engineering, PCCOE, Pune, India 1,2,3,4
Assistant Professor, Dept. of Computer Engineering, University of Pune, Pune, India 5

ABSTRACT: Distributed Denial of Service (DDoS) flooding attacks are among the biggest challenges to the availability of online services today. These attacks overwhelm the victim with a huge volume of traffic, rendering it incapable of communication or crashing it completely. If detection of a flooding attack is delayed, the victim must be disconnected manually while the problem is fixed. With the rapid increase in DDoS attack volume and frequency, current detection techniques struggle to handle such attack volumes within a reasonable and affordable response time. In this paper we propose a Hadoop-based live DDoS detection framework that performs efficient analysis of flooding attacks using Hadoop's core components, MapReduce and HDFS. We implemented a counter-based DDoS detection algorithm for four major flooding attacks (TCP-SYN, UDP, ICMP and HTTP GET) in MapReduce, consisting of mapper and reducer functions, and deployed a testbed to evaluate the performance of the framework for live DDoS detection. The experiments show that Hadoop is capable of processing and detecting DDoS attacks in affordable time.

KEYWORDS: Hadoop, HDFS, DDoS, Flooding attacks

I. INTRODUCTION

Distributed Denial of Service (DDoS) flooding attacks are one of the biggest concerns for security and network professionals.
As the volume of Internet traffic increases explosively year after year, Intrusion Detection Systems (IDSes) face the problem of assuring both scalability and accuracy when analysing DDoS attacks within such huge volumes of data. A Denial of Service (DoS) attack is an attempt to make a computer resource unavailable to its normal users, and DoS attacks keep growing more powerful as attacker behaviour evolves. An attack that leverages multiple sources to create the denial-of-service condition is known as a Distributed Denial of Service (DDoS) attack. DDoS attacks are a major threat to Internet services, and Internet traffic itself continues to grow massively. A major advantage of hardware-based DDoS defence systems is that they can process packets at high speed, but such systems suffer from high false-positive rates. To address both the false-positive problem and the big-data scale of attack traffic, we implement a software-based system that can cope with high traffic rates. In this paper we use the Hadoop framework to address the big-data problem caused by DDoS attacks, and we propose a counter-based detection algorithm that exploits attacker behaviour to keep the false-positive rate low.

II. HADOOP FRAMEWORK

Hadoop is an open-source software framework that supports data-intensive distributed applications. It enables applications to work with thousands of computationally independent computers and petabytes of data, increasing storage space and processing power by uniting many computers into one. A small Hadoop cluster includes a single master and multiple worker (slave) nodes, as in Figure 1.

Copyright to IJIRCCE DOI: 10.15680/IJIRCCE.2016.0401018

Figure 1: Hadoop Framework

The master node consists of a JobTracker, TaskTracker, NameNode and DataNode. A slave (worker) node acts as both a DataNode and a TaskTracker. In a large cluster, HDFS is managed through a dedicated NameNode server that hosts the file-system index, and a secondary NameNode that can generate snapshots of the NameNode's memory structures, preventing file-system corruption and reducing loss of data. The Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system written in Java for the Hadoop framework. MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.

III. HADOOP DDOS DETECTION FRAMEWORK

The Hadoop-based DDoS detection framework comprises four major phases:
1. Packet capturing and log generation.
2. Log transfer to HDFS.
3. Detection of the DDoS attack.
4. Result notification.

Each phase is implemented as a separate component, and the components communicate with each other to perform their assigned tasks. Packet capturing and log generation are handled at the capturing server, whereas DDoS detection and result notification are performed by the detection server. Log transfer is handled through Flume. The following subsections explain the functionality of each phase in detail.

Figure 2: Different Phases
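As a concrete illustration of how the detection server might launch the detection phase on the cluster, here is a hypothetical sketch using Hadoop Streaming. The paper does not state the job-submission mechanism (a native Java MapReduce job is equally plausible), and all paths, the jar name and the script names below are assumptions.

```python
# Hypothetical sketch: submitting the detection job with Hadoop Streaming.
# Paths, jar name and mapper/reducer script names are illustrative only.

def build_streaming_command(input_dir, output_dir,
                            streaming_jar="hadoop-streaming.jar"):
    """Build the `hadoop jar` command line for one detection run."""
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_dir,      # log blocks already placed in HDFS
        "-output", output_dir,    # reducers write flagged attacker IPs here
        "-mapper", "ddos_mapper.py",
        "-reducer", "ddos_reducer.py",
    ]

cmd = build_streaming_command("/hadec/logs", "/hadec/results")
print(" ".join(cmd))
```

In a real deployment the detection server would run this command (e.g. via `subprocess.run(cmd)`) once Flume signals that the log transfer is complete.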

A. Packet Capturing and Log Generation

DDoS detection starts with capturing the network traffic arriving at the server. The system provides a web interface through which the administrator can tune the capturing server with the desired parameters. Wireshark is an open-source tool capable of capturing a huge amount of traffic; under default settings its command-line capture utility outputs results to the console. To log the traffic for later use, we use Flume to create a pipeline with Wireshark and read all the packets it outputs. We have also tuned Wireshark to output only the information relevant to the detection phase: timestamps, source IP, destination IP, packet protocol and brief packet-header information.

Figure 3: Network Traffic Capturing and Log Generation Component

B. Log Transfer

After the log file is generated, the Traffic Handler on the capturing server temporarily pauses Wireshark's capture operations. It then notifies the detection server and shares the file information with it via Flume.

Figure 4: Log Transfer Phase

C. DDoS Detection

Apache Hadoop consists of two core components, HDFS and MapReduce. Hadoop's central management node, also known as the NameNode, splits the data into a large number of same-size blocks and distributes them among the cluster nodes. MapReduce then ships packaged code to the nodes so that each node processes, in parallel, the data it is responsible for. The detection server serves as the Hadoop NameNode, the centrepiece of the DDoS detection cluster. On successful transfer of the log file(s), the detection server splits each file into same-size blocks and starts MapReduce DDoS detection jobs on the cluster nodes. The MapReduce job analyser and the counter-based DDoS detection algorithm are discussed in the following sections. Once the detection task is finished, the results are saved into HDFS.
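The brief per-packet record described in subsection A can be illustrated with a small parser. The tab-separated field layout below is an assumption, not the authors' exact format; it matches the kind of output Wireshark's command-line capture can produce with field selection (e.g. `tshark -T fields -e frame.time_epoch -e ip.src -e ip.dst -e _ws.col.Protocol`).

```python
# Illustrative sketch of the "brief" packet log record described above:
# timestamp, source IP, destination IP and protocol, one packet per line.
# The tab-separated layout is an assumption for illustration.

def parse_packet_line(line):
    """Parse one captured-packet line into a small record dict."""
    ts, src, dst, proto = line.rstrip("\n").split("\t")[:4]
    return {"ts": float(ts), "src": src, "dst": dst, "proto": proto}

record = parse_packet_line("1458903214.12\t10.0.0.7\t192.168.1.5\tTCP")
print(record["src"])  # -> 10.0.0.7
```

Keeping only these few fields is what keeps the log roughly 15x smaller than the raw traffic, which matters for the transfer and detection timings reported in Section V.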
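Since log transfer is handled through Flume, a minimal agent configuration of the kind that could wire the capture output into HDFS might look like the fragment below. The agent name, capture command, channel sizing and HDFS path are all assumptions for illustration, not the authors' configuration.

```properties
# Hypothetical Flume agent: exec source reading the capture stream,
# memory channel, HDFS sink. All names and paths are illustrative.
a1.sources = capture
a1.channels = mem
a1.sinks = hdfs-out

a1.sources.capture.type = exec
a1.sources.capture.command = tshark -l -T fields -e frame.time_epoch -e ip.src -e ip.dst -e _ws.col.Protocol
a1.sources.capture.channels = mem

a1.channels.mem.type = memory
a1.channels.mem.capacity = 100000

a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.hdfs.path = hdfs://namenode:9000/hadec/logs
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.channel = mem
```

An exec source keeps the pipeline simple, at the cost of no delivery guarantee if the agent dies; a file channel would trade throughput for durability.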

Figure 5: DDoS Detection on Hadoop Cluster

Counter-Based Algorithm

Figure 6: The MapReduce algorithm for counter-based DDoS detection

In the detection phase we implement this algorithm to detect DDoS attacks. The algorithm needs three inputs, the time interval, the threshold and the unbalance ratio, which can be loaded through the distributed-cache mechanism of MapReduce. The time interval limits the monitoring duration of page requests. The threshold indicates the frequency of page requests to the server relative to the previously observed normal state of the network, and determines whether the server should be alarmed. The unbalance ratio denotes the ratio of responses per page request between a specific client and a server; this value is used to single out attackers among the clients. Finally, the algorithm aggregates and summarizes these values per server. When the total number of requests to a specific server from a client exceeds the threshold, the MapReduce job emits the records whose ratio of requests to responses is greater than the

unbalance ratio, marking them as attackers' IP addresses. While this algorithm has low computational complexity and converts easily into a MapReduce implementation, it has a prerequisite: the threshold must be known in advance from historical monitoring data of the network.

D. Result Notification

Once the execution of all the MapReduce tasks is finished, Hadoop saves the results in HDFS. The detection server then parses the result file from HDFS and sends the information about the attackers back to the administrator via the capturing server. Once the results have been delivered, both the input and output folders are deleted from HDFS by the detection server for better memory management.

IV. MAPREDUCE JOB AND DDOS DETECTION

A MapReduce program is composed of a Map task that performs filtering and sorting and a Reduce task that performs a summary operation. Here we explain how we implemented detection of the DDoS flooding attacks (UDP, HTTP GET, ICMP and TCP-SYN) as a MapReduce task on the Hadoop cluster using the counter-based algorithm.

A. Mapper Job

After the MapReduce task starts, the first task is the mapper, which takes its input from HDFS one block at a time. In our case a block represents a file in text format, and the input to each iteration of the mapper function is a single line of the file; any single line contains only the brief information of one network packet captured through Wireshark. A mapper takes a pair of data as input and returns a list of pairs, and its output type may differ from its input type; in our case the mapper's input is a pair of an arbitrary offset and the network packet. In our MapReduce algorithm, the map function filters the network packets and generates keys composed of the server IP address, a masked timestamp and the client IP address. The masked timestamp, derived from the time interval, is used for counting the number of requests from a specific client to a specific URL within the same time window.
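The counter-based logic described above can be sketched, in simplified single-process form, as a map and a reduce function. This is not the authors' code: the parameter values are illustrative, each input record is assumed to already carry a request/response flag, and the shuffle between map and reduce is collapsed into an in-memory pass.

```python
# Simplified, single-process sketch of the counter-based detection
# described above. THRESHOLD, UNBALANCE_RATIO and TIME_INTERVAL are
# illustrative values, not the paper's.
from collections import defaultdict

TIME_INTERVAL = 60        # seconds; monitoring window for masking timestamps
THRESHOLD = 500           # requests per window that trigger inspection
UNBALANCE_RATIO = 5.0     # request-to-response ratio marking an attacker

def map_packet(rec):
    """Emit ((server IP, masked timestamp, client IP), is_request)."""
    masked = int(rec["ts"]) // TIME_INTERVAL * TIME_INTERVAL
    return (rec["dst"], masked, rec["src"]), rec["is_request"]

def reduce_counts(pairs):
    """Aggregate per key; flag clients exceeding both limits."""
    requests, responses = defaultdict(int), defaultdict(int)
    for key, is_request in pairs:
        (requests if is_request else responses)[key] += 1
    attackers = set()
    for key, n_req in requests.items():
        ratio = n_req / max(responses[key], 1)   # avoid division by zero
        if n_req > THRESHOLD and ratio > UNBALANCE_RATIO:
            attackers.add(key[2])                # client IP judged attacker
    return attackers
```

A client that sends 600 requests in one window but receives only 10 responses (ratio 60) would be flagged, while a client with 100 requests and 90 responses stays below both limits.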
The mapper job also uses hashing to combine all log records by source IP address, so that it becomes easier for the reducer to analyze the attack traffic.

B. Reducer Job and Counter-Based Algorithm

Once the mapper tasks are complete, the reducers start operating on the list of key/value (i.e. key/packet) pairs produced by the mapper functions. Each reducer is assigned a group with a unique key, meaning that all packets sharing a key are handled by the same reducer. Hadoop can be configured to run reducer jobs on a varying number of data nodes, and for efficiency and performance it is important to identify the right number of reducers for the analysis job. Hadoop runs the counter-based algorithm on the reducer nodes. The reduce function summarizes the number of URL requests, page requests and server responses between a client and a server. The counter-based algorithm is the simplest, yet a very effective, way to analyze DDoS flooding attacks by monitoring traffic volumes per source IP address.

V. SIMULATION RESULTS

Figures 7 and 8 show the overall performance of the proposed framework in detecting DDoS attacks. The numbers present the total time required for capturing, processing, transferring and detection with different file sizes. For the experiments in Figure 7 we used an 80-20 attack volume, a 128 MB block size and a threshold of 500; for Figure 8 we changed only the threshold, to 1000. In Figure 7 we can observe that as the file size increases, the overall overhead of the capturing and transferring phases increases. A 10 MB file takes approximately 16 seconds (42%) in the capturing/transferring phases and 21 seconds in the detection phase; the best case for a 1 GB file (10-node cluster) takes 178 seconds (77%) in capturing/transferring and just 50 seconds in detection.
On the whole, it takes between 3.82 and 4.3 minutes to analyse a 1 GB log file that resolves 10K attackers, generated from an aggregate attack volume of 15.83 GB.
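As a back-of-the-envelope check on these figures, the worst-case end-to-end time implies the following effective rate over the raw attack traffic (a derived estimate, not a number reported by the authors):

```python
# Worst case from the results above: 15.83 GB of raw attack traffic,
# condensed to a 1 GB log, fully analysed in about 4.3 minutes.
attack_volume_gb = 15.83
worst_case_s = 4.3 * 60                                   # ~258 seconds
effective_rate = attack_volume_gb * 1024 / worst_case_s   # MB of raw traffic/s
print(round(effective_rate, 1))  # -> 62.8
```

That is, even in the worst case the pipeline keeps pace with roughly 63 MB/s of attack traffic end to end, capture included.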

Figure 7: Total time to detect a DDoS attack in HADEC with threshold 500
Figure 8: Total time to detect a DDoS attack in HADEC with threshold 1000

VI. CONCLUSION

In this paper we presented a scalable Hadoop-based live DDoS detection framework that is capable of analysing DDoS attacks in affordable time. It captures live network traffic, processes it to log the relevant information in brief form, and uses Hadoop's MapReduce and HDFS to run the detection algorithm for DDoS flooding attacks. The framework addresses the scalability, memory-inefficiency and processing-complexity issues of conventional solutions by exploiting Hadoop's parallel data processing. The evaluation results showed that it takes less than 5 minutes to

process (from capturing to detecting) 1 GB of log file, generated from approximately 15.83 GB of live network traffic. With a small log file the overall detection time can be further reduced to a couple of seconds.