A Small-time Scale Netflow-based Anomaly Traffic Detecting Method Using MapReduce
Wang Jin-Song, Zhang Long, Shi Kai and Zhang Hong-hao
School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China
[email protected]

Abstract

Detecting anomalous traffic from Netflow data is an important problem in the field of network security. In this paper we propose an approach based on the MapReduce model, realized by observing deviations in the entropy and DFN (distinct feature number) distributions of traffic features under anomalies at small time scales. MapReduce is used to handle the huge amount of data with the aid of cluster processing. Experimental results show the effectiveness of the proposed approach.

Keywords: Netflow; MapReduce; small time scales; traffic features

1. Introduction

In recent years, with the rapid global development of the Internet and the popularization of all kinds of Internet applications, the Internet has become an essential tool for carrying information in people's daily lives [1]. The security of networked information systems has therefore become more and more important. Netflow is a traffic-statistics protocol developed by Cisco [2]. It works as follows: using the standard switching model, Netflow processes the first IP packet of a data flow and creates an entry in the Netflow cache; subsequent packets of the same flow are then forwarded according to the cached information without being matched against the access policies again, while the cache accumulates the statistics of the flow. In other words, a flow is a unidirectional sequence of packets sharing the same source IP, destination IP, source port and destination port. Netflow data can be collected in several formats depending on the version; the most widely used versions at present are V5 and V8 [3]. Several traffic features (e.g., flow size, ports and addresses) have been suggested as candidates for entropy-based anomaly detection [4].

The goal of this paper is to provide a better understanding of the use of Netflow-based methods in anomaly detection and to improve detection efficiency. We use existing equipment and some low-cost hardware to design a small-time-scale Netflow-based anomaly traffic detection method using MapReduce. By analyzing Netflow data, the method can discover attack and intrusion behavior in the network. We propose a ten-dimensional anomaly analysis index to detect anomalous traffic and show its stability at small time scales. The MapReduce computing model helps us accelerate detection. We base our method on Netflow records whose fields are: source IP, destination IP, input interface, output interface, source port, destination port, protocol type, number of packets and number of bytes [5].
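As a concrete illustration of this record layout, the following minimal Java sketch parses one whitespace-separated Netflow line into the nine fields listed above. The field order and the NetflowRecord class are assumptions made for illustration; the actual export format produced by the collector used in the experiments may differ.

```java
/** Minimal sketch of a Netflow record (assumed field order; not the authors' actual parser). */
public class NetflowRecord {
    public final String srcIp, dstIp, protocol;
    public final int inIf, outIf, srcPort, dstPort;
    public final long packets, bytes;

    public NetflowRecord(String srcIp, String dstIp, int inIf, int outIf,
                         int srcPort, int dstPort, String protocol,
                         long packets, long bytes) {
        this.srcIp = srcIp; this.dstIp = dstIp; this.inIf = inIf; this.outIf = outIf;
        this.srcPort = srcPort; this.dstPort = dstPort; this.protocol = protocol;
        this.packets = packets; this.bytes = bytes;
    }

    /** Parses one whitespace-separated line in the assumed field order. */
    public static NetflowRecord parse(String line) {
        String[] f = line.trim().split("\\s+");
        return new NetflowRecord(
                f[0], f[1],                                   // source IP, destination IP
                Integer.parseInt(f[2]), Integer.parseInt(f[3]), // input/output interface
                Integer.parseInt(f[4]), Integer.parseInt(f[5]), // source/destination port
                f[6],                                          // protocol type
                Long.parseLong(f[7]), Long.parseLong(f[8]));    // packets, bytes
    }
}
```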
The remainder of this paper is organized as follows: Section 2 introduces the MapReduce computing model. Section 3 analyzes the characteristics of network anomalies, presents our proposed method and explains the details of the procedure. Section 4 describes the experimental set-up, the results and their analysis. Finally, Section 5 concludes the paper.

2. The MapReduce Computing Model

MapReduce, introduced by Google, is a programming model for expressing distributed computation over massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers [6]. The underlying idea of MapReduce comes from well-known principles in parallel and distributed processing [7]. Hadoop is an open-source implementation of MapReduce [8], written in Java, which provides reliable, scalable and fault-tolerant distributed computing. Setting up a Hadoop environment involves a great number of parameters that are crucial for achieving the best performance. It allows programmers to develop distributed applications without any special knowledge of distributed systems.

Key-value pairs form the basic data structure in MapReduce. Keys and values may be primitives such as integers, floating-point values, strings and raw bytes, or they may be arbitrarily complex structures (lists, tuples, associative arrays, etc.). Programmers typically need to define their own custom data types. The map function takes the input records and generates intermediate key-value pairs. The reduce function takes an intermediate key and a set of values and produces a smaller set of values; typically just zero or one output value is produced per reducer invocation. In MapReduce, the programmer defines a mapper and a reducer with the following signatures:

Map (k1, v1) -> [(k2, v2)]
Reduce (k2, [v2]) -> [(k3, v3)]

where [ ] denotes a list [9].
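To make these signatures concrete, here is a minimal Hadoop skeleton of a mapper, reducer and driver in the org.apache.hadoop.mapreduce API. It is a generic counting job, not the authors' code; the class names, the job name and the column index are illustrative only.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FeatureCountJob {

    // Map(k1, v1) -> [(k2, v2)]: emit (feature value, 1) for each input record.
    public static class FeatureMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().trim().split("\\s+");
            if (fields.length >= 2) {
                ctx.write(new Text(fields[1]), ONE); // e.g. destination IP column (assumed index)
            }
        }
    }

    // Reduce(k2, [v2]) -> [(k3, v3)]: sum the counts for each key.
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "feature-count");
        job.setJarByClass(FeatureCountJob.class);
        job.setMapperClass(FeatureMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```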
The MapReduce framework is responsible for automatically splitting the input, distributing each chunk to workers (mappers) on multiple machines, grouping and sorting all intermediate values associated with each intermediate key, and passing these values to workers (reducers) on multiple resources, as shown in Figure 1. The master monitors the execution of the mappers and reducers and re-executes them when failures are detected. It is common for MapReduce jobs to have thousands of individual tasks that need to be assigned to nodes in the cluster. In large jobs, the total number of tasks may exceed the number of tasks that can be run on the cluster concurrently, making it necessary for the scheduler to maintain some sort of task queue and to track the progress of running tasks so that waiting tasks can be assigned to nodes as they become available.

HDFS [10] is a subproject of the Apache Hadoop project used to construct a distributed file system from cheap PC hardware. Compared to other distributed file systems, HDFS has the advantages of high reliability and low cost. HDFS has the following characteristics:

High fault tolerance: if some nodes break down, HDFS quickly detects the faults and takes measures to restore the data of the failed nodes.
Streaming data access: data in HDFS is accessed in a streaming fashion; the random access model is not supported.
Support for massive data: HDFS is designed for large-file storage; a large number of small files results in poor system performance.
Simple consistency model: HDFS supports a write-once, read-many access mode; once files are created, they cannot be modified.

Figure 1. Simplified View of MapReduce

3. Implementation of the Anomaly Traffic Detecting Method

Generally, Netflow data reflects the real-time performance of the network, and the network traffic features change when anomalous traffic occurs. We pick several network traffic features from the Netflow data and study the relationship between anomalous traffic and these features.

3.1. The Network Traffic Features in Anomaly Traffic Detection

Entropy [11], which reflects the probability distribution of microscopic states, describes the diversity or uniformity of micro states in thermodynamics. From the point of view of communication, random interference is inevitable; a communication system therefore has statistical characteristics, and an information source can be seen as a set of random events. The randomness of this set is similar to the degree of disorder of micro states in thermodynamics, and information entropy is obtained when the thermodynamic probability is extended to the probability of occurrence of the source signals of the system. Information entropy measures how much information is contained and describes the uncertainty of the system. The entropy of a sample of size n lies in the range [0, log n]: the minimum value 0 is taken when there is no variation in the data items (e.g., a single IP address or port), and the maximum value log n is reached when all data items are distinct or the variation is large. In entropy-based detection techniques, the entropy of a random variable X that takes N distinct values, where the i-th value occurs n_i times, is calculated as

H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log \frac{n_i}{S}    (1)

where S = \sum_{i=1}^{N} n_i.
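A minimal sketch of equation (1) follows, assuming the natural logarithm (the paper does not state the logarithm base): given the list of values a traffic feature takes within one observation window, it counts occurrences and returns the entropy, and also exposes the number of distinct values N, which reappears below as the DFN indicator.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Equation (1) over one observation window; natural log is an assumption. */
public final class FeatureEntropy {

    /** H(X) = -sum_i (n_i / S) * log(n_i / S), where n_i counts each distinct value. */
    public static double entropy(List<String> featureValues) {
        Map<String, Long> counts = new HashMap<>();
        for (String v : featureValues) {
            counts.merge(v, 1L, Long::sum);   // n_i: occurrences of each distinct value
        }
        double s = featureValues.size();      // S = sum of all n_i
        double h = 0.0;
        for (long n : counts.values()) {
            double p = n / s;                 // n_i / S
            h -= p * Math.log(p);             // accumulate -(n_i/S) log(n_i/S)
        }
        return h;
    }

    /** N in equation (1): the number of distinct values, used later as the DFN indicator. */
    public static int distinctCount(List<String> featureValues) {
        return new HashSet<>(featureValues).size();
    }
}
```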
Entropy is used to capture the degree of dispersal or concentration of the distributions of traffic features: a higher entropy indicates a more dispersed distribution, whereas a lower entropy indicates a more concentrated one. At the same time we use N in formula (1) as a new indicator, the DFN (Distinct Feature Number). The DFN also reflects the degree of dispersal or concentration of the distributions of traffic features. We define a fixed number of consecutive packets in the Netflow data as a PU (packet unit); the PU is the unit over which entropy and DFN are computed. By analyzing the entropy and DFN values per PU, anomalous traffic can be detected. We thus obtain a ten-dimensional anomaly analysis index system comprising the entropy and DFN of the source/destination IP, source/destination port number and packet length, as shown in Table 1.

Table 1. Ten-dimensional Anomaly Analysis Index

X(*)      Characteristics of flow     Entropy        DFN
X(SIP)    Source IP                   H(X(SIP))      N(X(SIP))
X(DIP)    Destination IP              H(X(DIP))      N(X(DIP))
X(SPT)    Source port number          H(X(SPT))      N(X(SPT))
X(DPT)    Destination port number     H(X(DPT))      N(X(DPT))
X(PKT)    Packet length               H(X(PKT))      N(X(PKT))

3.2. Implementation using MapReduce

The MapReduce distributed data analysis framework is well suited to large-scale parallel computing. We obtain all the Netflow data from the Netflow collector of the monitored network. The volume of Netflow data is so large that traditional detection methods usually work on a sample; MapReduce allows us to use all the Netflow data in the detection of anomalous traffic. The initial Netflow data files are divided into several new files classified by the network traffic features, i.e., the entropy and DFN of the source/destination IP, source/destination port number and packet length. All processed files are uploaded to HDFS, and map and reduce functions are set up to analyze them. By studying the analysis results we can find the relationship between the network traffic features and anomalous traffic. The specific process is as follows (a sketch of the analysis job is given after the steps):

Step 1: Data collection: collect Netflow data from the Cisco router.
Step 2: Data processing: separate the files into new PU files, classified by the network traffic features, i.e., the entropy and DFN of the source/destination IP, source/destination port number and packet length.
Step 3: File upload: upload all processed files to HDFS and use MapReduce to process them.
Step 4: Result analysis: analyze the results to find the relationships between the network traffic features and anomalous traffic.
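One possible MapReduce realization of Steps 3-4 is sketched below, under the assumption that each input line carries a PU identifier, a feature name and a single feature value; this layout and the class names are illustrative, not the authors' implementation. The mapper keys each observation by (PU, feature) and the reducer computes the entropy and DFN of that feature within that PU.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Assumed input line format: "<puId> <featureName> <featureValue>". */
public class PuEntropyDfn {

    public static class PuMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] f = line.toString().trim().split("\\s+");
            if (f.length == 3) {
                // key = "puId:featureName", value = the observed feature value
                ctx.write(new Text(f[0] + ":" + f[1]), new Text(f[2]));
            }
        }
    }

    public static class EntropyDfnReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            Map<String, Long> counts = new HashMap<>();
            long total = 0;
            for (Text v : values) {                 // count occurrences n_i of each distinct value
                counts.merge(v.toString(), 1L, Long::sum);
                total++;
            }
            double h = 0.0;
            for (long n : counts.values()) {        // H = -sum (n_i/S) log(n_i/S)
                double p = (double) n / total;
                h -= p * Math.log(p);
            }
            int dfn = counts.size();                // DFN = number of distinct values N
            ctx.write(key, new Text(h + "\t" + dfn));
        }
    }
}
```

The reducer output (one entropy/DFN pair per PU and feature) is what the later analysis compares against the threshold ranges of Section 4.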
Figure 2. Implementation Procedure of the Method (stages: Cisco router, data collecting, data processing, MapReduce computing on Hadoop, analysis and detection)

4. Simulation

The experimental environment is shown in Figure 3. The data source in the simulation comes from the Tianjin Urban Education Network. R1 is the border router, a Cisco 7606. Node A is the Netflow collector, a server running nfdump [12] that collects the Netflow data and writes it into a file every minute. Host B is the payload analysis server, a Linux machine running Snort, a well-known open-source NIDS, used to monitor network anomalies and to compare against the Netflow detection results. C is the Hadoop cluster, used for storing and analyzing the Netflow data. In this experiment the fixed number of consecutive packets is 6000, i.e., one PU contains 6000 consecutive packets, and one file usually contains around 450 PUs (a sketch of this PU splitting is given after Figure 3).

Figure 3. Experimental Environment
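The following sketch shows how the per-minute Netflow files could be cut into PUs of 6000 consecutive records before upload to HDFS, assuming a plain-text, one-record-per-line export; it is an illustration, not the authors' actual preprocessing code, and the file naming is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Splits a one-record-per-line Netflow dump into PU files of 6000 consecutive records. */
public class PuSplitter {
    private static final int PU_SIZE = 6000;

    public static void split(Path input, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            int puIndex = 0, inPu = 0;
            BufferedWriter out = newPuWriter(outputDir, puIndex);
            while ((line = in.readLine()) != null) {
                if (inPu == PU_SIZE) {              // current PU is full: start the next one
                    out.close();
                    puIndex++;
                    inPu = 0;
                    out = newPuWriter(outputDir, puIndex);
                }
                out.write(line);
                out.newLine();
                inPu++;
            }
            out.close();
        }
    }

    private static BufferedWriter newPuWriter(Path dir, int index) throws IOException {
        return Files.newBufferedWriter(dir.resolve("pu-" + index + ".txt"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        split(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```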
Case Ⅰ: the network is stable and reliable. Figures 4-8 show the entropy values of the destination/source IP, destination/source port and packet length of 460 PUs; the time window is 20:00-20:01 on 21 May.

Figure 4. The Entropy of Destination IP in Normal Network
Figure 5. The Entropy of Source IP in Normal Network
Figure 6. The Entropy of Destination Port Number in Normal Network
Figure 7. The Entropy of Source Port Number in Normal Network
Figure 8. The Entropy of Packet Length in Normal Network

Figures 9-13 show the DFN values of the destination/source IP, destination/source port and packet length of 460 PUs for the same time window, 20:00-20:01 on 21 May.

Figure 9. The DFN of Destination IP in Normal Network
Figure 10. The DFN of Source IP in Normal Network
Figure 11. The DFN of Destination Port Number in Normal Network
Figure 12. The DFN of Source Port Number in Normal Network
Figure 13. The DFN of Packet Length in Normal Network

Figures 4-13 make it clear that in the stable and reliable network, the entropy and DFN values of the five traffic features are steady and stay within a small range.

Case Ⅱ: the network is under DDoS attack, i.e., anomalous traffic is present. Figure 14 shows the entropy value of the destination IP, and Figure 15 shows the DFN value of the destination IP.

Figure 14. The Entropy of Destination IP under DDoS Attack
Figure 15. The DFN of Destination IP under DDoS Attack

Figures 14-15 show that when a DDoS attack happens, the entropy value of the destination IP remains steady within a small range, but the DFN value of the destination IP becomes irregular and exceeds the threshold value. The DFN can therefore reveal anomalous traffic that the entropy value does not show clearly.

Case Ⅲ: Figures 16-20 show the entropy values of the destination/source IP, destination/source port and packet length of 460 PUs while we scan the network; the time window is 20:05-20:06 on 25 May 2012.
Figure 16. The Entropy of Destination IP in Abnormal Network
Figure 17. The Entropy of Source IP in Abnormal Network
Figure 18. The Entropy of Destination Port Number in Abnormal Network
Figure 19. The Entropy of Source Port Number in Abnormal Network
Figure 20. The Entropy of Packet Length in Abnormal Network

Figures 21-25 show the DFN values of the destination/source IP, destination/source port and packet length of 460 PUs for the same time window, 20:05-20:06 on 25 May 2012.

Figure 21. The DFN of Destination IP in Abnormal Network
Figure 22. The DFN of Source IP in Abnormal Network
Figure 23. The DFN of Destination Port Number in Abnormal Network
Figure 24. The DFN of Source Port Number in Abnormal Network
Figure 25. The DFN of Packet Length in Abnormal Network

Figures 16-25 depict the entropy and DFN results when anomalous traffic is present in the network. They show that the entropy and DFN become irregular and that the values exceed the threshold values. From long-term experiments, we found that the entropy and DFN of the traffic features fluctuate within a range of values; when a value falls outside this range, anomalous traffic is occurring in the network (a sketch of this range check is given after the observations below). Table 2 shows the value ranges.

Table 2. The Range of Values of the Entropy and DFN of the Traffic Features

Characteristic of flow   Threshold value   Characteristic of flow   Threshold value
H(X(SIP))                                  N(X(SIP))
H(X(DIP))                                  N(X(DIP))
H(X(SPT))                                  N(X(SPT))
H(X(DPT))                                  N(X(DPT))
H(X(PKT))

According to the experimental results, we arrive at the following observations:
(1) The entropy and DFN of the network traffic features are quite stable in a normal network at small time scales.
(2) When anomalous traffic occurs, the entropy and DFN of the network traffic features exceed their threshold values.
(3) The DFN of the packet length shows no significant change when anomalous traffic occurs.
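A minimal sketch of the range check described above follows. The numeric bounds of Table 2 are not reproduced here, so the thresholds below are placeholders; the class and method names are illustrative, not part of the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Flags a PU as anomalous when any indicator leaves its normal range (cf. Table 2). */
public class RangeDetector {

    /** Inclusive normal range [low, high] for one indicator. */
    public static final class Range {
        final double low, high;
        Range(double low, double high) { this.low = low; this.high = high; }
        boolean contains(double v) { return v >= low && v <= high; }
    }

    private final Map<String, Range> normalRanges = new HashMap<>();

    public RangeDetector() {
        // Placeholder bounds: to be calibrated from normal-traffic measurements, not taken from this sketch.
        normalRanges.put("H(X(DIP))", new Range(0.0, Double.MAX_VALUE));
        normalRanges.put("N(X(DIP))", new Range(0.0, Double.MAX_VALUE));
        // ... remaining indicators of the ten-dimensional index ...
    }

    /** @param indicators per-PU values keyed by indicator name, e.g. "H(X(DIP))" -> 4.2 */
    public boolean isAnomalous(Map<String, Double> indicators) {
        for (Map.Entry<String, Double> e : indicators.entrySet()) {
            Range r = normalRanges.get(e.getKey());
            if (r != null && !r.contains(e.getValue())) {
                return true;   // at least one indicator is outside its normal range
            }
        }
        return false;
    }
}
```

In practice the bounds would be calibrated from the normal-traffic measurements of Case Ⅰ.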
5. Conclusion

In this paper we presented a Netflow-based anomaly traffic detection method realized with the aid of MapReduce. A ten-dimensional anomaly analysis index system, comprising the entropy and DFN of the source/destination IP, source/destination port number and packet length, was also proposed. With the use of MapReduce computing, the proposed approach improves the efficiency of anomaly traffic detection. The entropy and DFN of the traffic features are steady at small time scales. Experimental results show that the presented method can find anomalous traffic in the network in a timely manner.

Acknowledgement

This work was supported by the National Natural Science Foundation of China ( and ). The authors would like to thank the Tianjin Key Lab of Intelligent Computing and Novel Software Technology and the Key Laboratory of Computer Vision and System, Ministry of Education, for their support of this work.

References

[1] F. Bashir Shaikh and S. Haider, "Security Threats in Cloud Computing", 6th International Conference on Internet Technology and Secured Transactions, (2011) December.
[2] "Introduction to Cisco IOS NetFlow - A Technical Overview", Cisco Systems, (2007).
[3] Cisco Systems Inc., "Introduction to Cisco NetFlow - A Technical Overview".
[4] Z. Jia, J. Yuehui and Y. Xiaowei, "Netflow based Anomaly Traffic Analyzer", Microcomputer Application, vol. 28, no. 7, (2007) July.
[5] F. Raspall and A. Kock, "Implementation of a General-Purpose Network Measurement System", Budapest, Hungary, (2004).
[6] G. Yang, "The Application of MapReduce in the Cloud Computing", International Symposium on Intelligence Information Processing and Trusted Computing, (2011).
[7] J. Dean, "Experiences with MapReduce, an abstraction for large-scale computation", Proc. 15th International Conference on Parallel Architectures and Compilation Techniques, (2006).
[8] S. Hammoud, M. Li, Y. Liu, N. K. Alham and Z. Liu, "MRSim: A discrete event based MapReduce simulator", Seventh International IEEE Conference on Fuzzy Systems and Knowledge Discovery (FSKD), (2010).
[9] D. Peng and F. Dabek, "Large-scale Incremental Processing Using Distributed Transactions and Notifications", Operating Systems Design and Implementation, (2010) October.
[10] T. White, "Hadoop: The Definitive Guide", (2011).
[11] H. Xian-hua, Z. Yun, L. Ai-bing, H. Hong-rang, Y. Xiang-rong and Z. Jian, "Study of Entropy Flow Characteristics during the Evolution of Typhoon Morakot", 2011 International Conference on Electronics and Optoelectronics, (2011).
[12] R. Hofstede, A. Sperotto, T. Fioreze and A. Pras, "The network data handling war: MySQL vs. NfDump", Lecture Notes in Computer Science, vol. 6164, Networked Services and Applications - Engineering, Control and Management, (2010).

Authors

Wang Jin-song is currently a teacher with the School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China ([email protected]).
Zhang Long is currently a student with the School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China (chh @126.com).

Shi Kai is currently a teacher with the School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China ([email protected]).

Zhang Hong-hao is currently a teacher with Tianjin University of Technology, Tianjin, China. His research interests include network security, trusted networks and next-generation networks ([email protected]).