Network Traffic Analysis using HADOOP Architecture. Zeng Shan ISGC2013, Taibei

Size: px
Start display at page:

Download "Network Traffic Analysis using HADOOP Architecture. Zeng Shan ISGC2013, Taibei zengshan@ihep.ac.cn"

Transcription

1 Network Traffic Analysis using HADOOP Architecture Zeng Shan ISGC2013, Taibei

2 Flow VS Packet what are netflows? Outlines Flow tools used in the system nprobe nfdump Introduction to Hadoop HDFS Map/Reduce Architecture of our system Function Display

3 Flow VS Packet

4 What is a flow? Network flow is a sequence of packets From a source computer to a destination, which may be another host, a multicast group, or a broadcast domain. A network flow measures sequences of IP packets sharing a common feature as they pass between devices. Flow format: NetFlow(Cisco) J-Flow(Juniper) Sflow(HP).

5 Netflows Netflows concept developed by Cisco Defines a protocol and packet content for collecting network traffic data Several versions of the protocol, v5 most widespread Netflow v5 A flow is a unique tuple of (router port,source IP+port,dest IP+port,TCP protocol) A flow also contains TCP flags and numbers of packets and bytes transferred Flows are cached on the router and sent regularly to a recording node If the load on a router becomes high, flows can get lost as the cache may be flushed Typically more than 2000 Flows/s can be recorded

6 Flow VS Packet Flows contain relevant information to characterize and analyze the traffic Flows contain less information than packets, so recording the flows will not lead to the high network loads Storing flows, comparing with storing packets, requires much less disk space 1 day traffic in IHEP is roughly 1.5G which is stored in flows Instead of recording all flows, sampling could also be used Flows can be delivered by the router or created from packets by nprobe nprobe is an alternative if the router doesn t output netflows

7 Flow Tools used in the System

8 What is nprobe? nprobe is an open source tools Capture packets flowing on a Ethernet segment, computes flows and export them to the specified collectors. Features: Ability to keep up with Gbit speeds on Ethernet networks handling thousand of packets per second without packet sampling on commodity hardware. Support for major OS including Unix, Windows and Mac OS X it is designed for environments with limited resources

9 nfdump fulfils two tasks: data storage and data filtering/resolution nfcapd: does receive flows on a configurable port and stores it every n minutes nfdump: reads dump files containing flow data, filters it and resolve it to readable files nfdump is IPv6 enabled Current stable version is nfdump usage is very similar to tcpdump, especially for filtering rules Example: Looking for any activity of a host with IP on Mar from 11:00 to 11:59am nfdump R nfcapd :nfcapd ip

10

11 Information contains in nfdump output Tag Description Tag Description %ts Start Time - first seen %in Input Interface num %te End Time - last seen %out Output Interface num %td Duration %pkt Packets counts in this flow %pr Protocol %byt Bytes count in this flow %sa Source Address %fl Number of flows. %da Destination Address %flg TCP Flags %sap Source Address: Port %tos Type of Service %dap Destination Address:Port %bps bits per second %sp Source Port %pps packets per second %dp Destination Port %bpp Bytes per package %sas Source AS %das Destination AS

12

13 Introduction to Hadoop

14 What can Hadoop do? Hadoop is an open-source software framework. It was originally developed to support distribution for the Nutch search engine project. Licensed under the Apache v2 license. Supports data-intensive distributed applications. It enables applications to work with thousands of computation-independent computers and petabytes of data Storage: HDFS Compute: Map/Reduce

15 HDFS HDFS is short for Hadoop Distributed File System Differences from other distributed file systems: highly fault-tolerant designed to be deployed on low-cost hardware. Portability across heterogeneous hardware and software platforms Applications run on HDFS need streaming access to their data sets provides high throughput access to application data suitable for applications that have large data sets Moving computation is cheaper than moving data

16 HDFS master/slave architecture NameNode Manages name space of the file system Regulates access to files by clients Determine the mapping of blocks to DataNodes DataNodes Responsible for serving read and write requests from the clients Perform block creation, deletion and replication upon the instructions from NameNode

17 Data Replication in HDFS To ensure the fault tolerance in HDFS, the blocks of a file are replicated, the replicas of a block can be specified by the application

18 Map/Reduce MapReduce is a programming model for processing large data sets MapReduce is typically used to do distributed computing on clusters of computers MapReduce can take advantage of locality of data, processing data on or near the storage assets to decrease transmission of data. The model is inspired by the map and reduce functions "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to slave nodes. The slave node processes the smaller problem, and passes the answer back to its master node. "Reduce" step: The master node then collects the answers of all the sub-problems and combines them in some way to form the final output

19 Architecture of our System

20 Architecture

21 Drawing tools RRDtool acronym for round-robin database tool The data are stored in a round-robin database(circular buffer) It also includes tools to extract RRD data in a graphical format drawing flow trend graph in three dimensionality: Flow count, Packet count, Traffic count Highstock Highstock lets you create stock or general timeline charts in pure JavaScript including sophisticated navigation options like a small navigator series, preset date ranges, date picker, scrolling and panning. We just need to write API between HDFS and Highstock

22 Function Display

23

24

25

26

27 Summary Network trend chart for IHEP every 5 minutes in three dimension: Flow count/packet count/traffic count Detailed traffic information chart select a single timeslot and get the detailed traffic information select a time window and get the detailed traffic information on hovering the chart, a tooltip text with traffic information on each point and series can be displayed. the tooltip follows as the user moves the mouse over the graph Traffic information related to the HEP experiments Once the IP address of the machines related to the data transferring of the HEP experiment is specified we already have DYB/YBJ/CMS/ALTAS Intrusion Detection is also possible Writing rules, generating alerts

28 Thank You Questions?

Network Traffic Analysis using HADOOP Architecture. Shan Zeng HEPiX, Beijing 17 Oct 2012

Network Traffic Analysis using HADOOP Architecture. Shan Zeng HEPiX, Beijing 17 Oct 2012 Network Traffic Analysis using HADOOP Architecture Shan Zeng HEPiX, Beijing 17 Oct 2012 Outline Introduction to Hadoop Traffic Information Capture Traffic Information Resolution Traffic Information Storage

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Hadoop Distributed File System. Jordan Prosch, Matt Kipps Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?

More information

HDFS Architecture Guide

HDFS Architecture Guide by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5

More information

Introduction to Netflow

Introduction to Netflow Introduction to Netflow Mike Jager Network Startup Resource Center mike.jager@synack.co.nz These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (http://creativecommons.org/licenses/by-nc/4.0/)

More information

Network Monitoring and Management NetFlow Overview

Network Monitoring and Management NetFlow Overview Network Monitoring and Management NetFlow Overview These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/)

More information

NfSen Plugin Supporting The Virtual Network Monitoring

NfSen Plugin Supporting The Virtual Network Monitoring NfSen Plugin Supporting The Virtual Network Monitoring Vojtěch Krmíček krmicek@liberouter.org Pavel Čeleda celeda@ics.muni.cz Jiří Novotný novotny@cesnet.cz Part I Monitoring of Virtual Network Environments

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data

More information

Fault Tolerance in Hadoop for Work Migration

Fault Tolerance in Hadoop for Work Migration 1 Fault Tolerance in Hadoop for Work Migration Shivaraman Janakiraman Indiana University Bloomington ABSTRACT Hadoop is a framework that runs applications on large clusters which are built on numerous

More information

nfdump and NfSen 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH

nfdump and NfSen 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH Some operational questions, popping up now and then: Do you see this peek on port 445 as well? What caused this peek on your

More information

An overview of traffic analysis using NetFlow

An overview of traffic analysis using NetFlow The LOBSTER project An overview of traffic analysis using NetFlow Arne Øslebø UNINETT Arne.Oslebo@uninett.no 1 Outline What is Netflow? Available tools Collecting Processing Detailed analysis security

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents

More information

Network forensics 101 Network monitoring with Netflow, nfsen + nfdump

Network forensics 101 Network monitoring with Netflow, nfsen + nfdump Network forensics 101 Network monitoring with Netflow, nfsen + nfdump www.enisa.europa.eu Agenda Intro to netflow Metrics Toolbox (Nfsen + Nfdump) Demo www.enisa.europa.eu 2 What is Netflow Netflow = Netflow

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Case Study : 3 different hadoop cluster deployments

Case Study : 3 different hadoop cluster deployments Case Study : 3 different hadoop cluster deployments Lee moon soo moon@nflabs.com HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

The ntop Project: Open Source Network Monitoring

The ntop Project: Open Source Network Monitoring The ntop Project: Open Source Network Monitoring Luca Deri 1 Agenda 1. What can ntop do for me? 2. ntop and network security 3. Integration with commercial protocols 4. Embedding ntop 5. Work in

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela

Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Who is Generating all This Traffic?

Who is Generating all This Traffic? Who is Generating all This Traffic? Network Monitoring in Practice Luca Deri Who s ntop.org? Started in 1998 as open-source monitoring project for developing an easy to use passive monitoring

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

Open Source in Network Administration: the ntop Project

Open Source in Network Administration: the ntop Project Open Source in Network Administration: the ntop Project Luca Deri 1 Project History Started in 1997 as monitoring application for the Univ. of Pisa 1998: First public release v 0.4 (GPL2) 1999-2002:

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Netflow Collection with AlienVault Alienvault 2013

Netflow Collection with AlienVault Alienvault 2013 Netflow Collection with AlienVault Alienvault 2013 CONFIGURE Configuring NetFlow Capture of TCP/IP Traffic from an AlienVault Sensor or Remote Hardware Level: Beginner to Intermediate Netflow Collection

More information

vsphere Networking vsphere 6.0 ESXi 6.0 vcenter Server 6.0 EN-001391-01

vsphere Networking vsphere 6.0 ESXi 6.0 vcenter Server 6.0 EN-001391-01 vsphere 6.0 ESXi 6.0 vcenter Server 6.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more

More information

Sector vs. Hadoop. A Brief Comparison Between the Two Systems

Sector vs. Hadoop. A Brief Comparison Between the Two Systems Sector vs. Hadoop A Brief Comparison Between the Two Systems Background Sector is a relatively new system that is broadly comparable to Hadoop, and people want to know what are the differences. Is Sector

More information

[Optional] Network Visibility with NetFlow

[Optional] Network Visibility with NetFlow [Optional] Network Visibility with NetFlow TELE301 Laboratory Manual Contents 1 NetFlow Architecture........................... 1 2 NetFlow Versions.............................. 2 3 Requirements Analysis...........................

More information

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop R. David Idol Department of Computer Science University of North Carolina at Chapel Hill david.idol@unc.edu http://www.cs.unc.edu/~mxrider

More information

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag 2005 SWITCH What I am going to present: The Motivation. What are NfSen and nfdump? The Tools in Action. Outlook

More information

Hadoop Technology for Flow Analysis of the Internet Traffic

Hadoop Technology for Flow Analysis of the Internet Traffic Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Introduction to Hadoop Comes from Internet companies Emerging big data storage and analytics platform HDFS and MapReduce

More information

Viete, čo robia Vaši užívatelia na sieti? Roman Tuchyňa, CSA

Viete, čo robia Vaši užívatelia na sieti? Roman Tuchyňa, CSA Viete, čo robia Vaši užívatelia na sieti? Roman Tuchyňa, CSA What is ReporterAnalyzer? ReporterAnalyzer gives network professionals insight into how application traffic is impacting network performance.

More information

Configuring Flexible NetFlow

Configuring Flexible NetFlow CHAPTER 62 Note Flexible NetFlow is only supported on Supervisor Engine 7-E, Supervisor Engine 7L-E, and Catalyst 4500X. Flow is defined as a unique set of key fields attributes, which might include fields

More information

Cisco NetFlow TM Briefing Paper. Release 2.2 Monday, 02 August 2004

Cisco NetFlow TM Briefing Paper. Release 2.2 Monday, 02 August 2004 Cisco NetFlow TM Briefing Paper Release 2.2 Monday, 02 August 2004 Contents EXECUTIVE SUMMARY...3 THE PROBLEM...3 THE TRADITIONAL SOLUTIONS...4 COMPARISON WITH OTHER TECHNIQUES...6 CISCO NETFLOW OVERVIEW...7

More information

Processing of Hadoop using Highly Available NameNode

Processing of Hadoop using Highly Available NameNode Processing of Hadoop using Highly Available NameNode 1 Akash Deshpande, 2 Shrikant Badwaik, 3 Sailee Nalawade, 4 Anjali Bote, 5 Prof. S. P. Kosbatwar Department of computer Engineering Smt. Kashibai Navale

More information

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala

More information

Lab VI Capturing and monitoring the network traffic

Lab VI Capturing and monitoring the network traffic Lab VI Capturing and monitoring the network traffic 1. Goals To gain general knowledge about the network analyzers and to understand their utility To learn how to use network traffic analyzer tools (Wireshark)

More information

Scalable Extraction, Aggregation, and Response to Network Intelligence

Scalable Extraction, Aggregation, and Response to Network Intelligence Scalable Extraction, Aggregation, and Response to Network Intelligence Agenda Explain the two major limitations of using Netflow for Network Monitoring Scalability and Visibility How to resolve these issues

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

GraySort and MinuteSort at Yahoo on Hadoop 0.23

GraySort and MinuteSort at Yahoo on Hadoop 0.23 GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters

More information

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

IPV6 流 量 分 析 探 讨 北 京 大 学 计 算 中 心 周 昌 令

IPV6 流 量 分 析 探 讨 北 京 大 学 计 算 中 心 周 昌 令 IPV6 流 量 分 析 探 讨 北 京 大 学 计 算 中 心 周 昌 令 1 内 容 流 量 分 析 简 介 IPv6 下 的 新 问 题 和 挑 战 协 议 格 式 变 更 用 户 行 为 特 征 变 更 安 全 问 题 演 化 流 量 导 出 手 段 变 化 设 备 参 考 配 置 流 量 工 具 总 结 2 流 量 分 析 简 介 流 量 分 析 目 标 who, what, where,

More information

NetFlow Analysis with MapReduce

NetFlow Analysis with MapReduce NetFlow Analysis with MapReduce Wonchul Kang, Yeonhee Lee, Youngseok Lee Chungnam National University {teshi85, yhlee06, lee}@cnu.ac.kr 2010.04.24(Sat) based on "An Internet Traffic Analysis Method with

More information

Network Management & Monitoring

Network Management & Monitoring Network Management & Monitoring NetFlow Overview These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/)

More information

Big Data Testbed for Research and Education Networks Analysis. SomkiatDontongdang, PanjaiTantatsanawong, andajchariyasaeung

Big Data Testbed for Research and Education Networks Analysis. SomkiatDontongdang, PanjaiTantatsanawong, andajchariyasaeung Big Data Testbed for Research and Education Networks Analysis SomkiatDontongdang, PanjaiTantatsanawong, andajchariyasaeung Research and Education Networks ThaiREN is a specialized Internet Service Provider

More information

Netflow Overview. PacNOG 6 Nadi, Fiji

Netflow Overview. PacNOG 6 Nadi, Fiji Netflow Overview PacNOG 6 Nadi, Fiji Agenda Netflow What it is and how it works Uses and Applications Vendor Configurations/ Implementation Cisco and Juniper Flow-tools Architectural issues Software, tools

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 8, August 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Network Monitoring and Traffic CSTNET, CNIC

Network Monitoring and Traffic CSTNET, CNIC Network Monitoring and Traffic Analysis in CSTNET Chunjing Han Aug. 2013 CSTNET, CNIC Topics 1. The background of network monitoring 2. Network monitoring protocols and related tools 3. Network monitoring

More information

NetFlow Performance Analysis

NetFlow Performance Analysis NetFlow Performance Analysis Last Updated: May, 2007 The Cisco IOS NetFlow feature set allows for the tracking of individual IP flows as they are received at a Cisco router or switching device. Network

More information

MapReduce Job Processing

MapReduce Job Processing April 17, 2012 Background: Hadoop Distributed File System (HDFS) Hadoop requires a Distributed File System (DFS), we utilize the Hadoop Distributed File System (HDFS). Background: Hadoop Distributed File

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

Cisco IOS Flexible NetFlow Technology

Cisco IOS Flexible NetFlow Technology Cisco IOS Flexible NetFlow Technology Last Updated: December 2008 The Challenge: The ability to characterize IP traffic and understand the origin, the traffic destination, the time of day, the application

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

NetFlow-Lite offers network administrators and engineers the following capabilities:

NetFlow-Lite offers network administrators and engineers the following capabilities: Solution Overview Cisco NetFlow-Lite Introduction As networks become more complex and organizations enable more applications, traffic patterns become more diverse and unpredictable. Organizations require

More information

NTT DOCOMO Technical Journal. Large-Scale Data Processing Infrastructure for Mobile Spatial Statistics

NTT DOCOMO Technical Journal. Large-Scale Data Processing Infrastructure for Mobile Spatial Statistics Large-scale Distributed Data Processing Big Data Service Platform Mobile Spatial Statistics Supporting Development of Society and Industry Population Estimation Using Mobile Network Statistical Data and

More information

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia Log Management with Open-Source Tools Risto Vaarandi SEB Estonia Outline Why use open source tools for log management? Widely used logging protocols and recently introduced new standards Open-source syslog

More information

Limitations of Packet Measurement

Limitations of Packet Measurement Limitations of Packet Measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing

More information

Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com

Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Hadoop at Yahoo! Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since Feb

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

vsphere Networking vsphere 5.5 ESXi 5.5 vcenter Server 5.5 EN-001074-02

vsphere Networking vsphere 5.5 ESXi 5.5 vcenter Server 5.5 EN-001074-02 vsphere 5.5 ESXi 5.5 vcenter Server 5.5 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more

More information

How To Analyze Network Traffic With Mapreduce On A Microsoft Server On A Linux Computer (Ahem) On A Network (Netflow) On An Ubuntu Server On An Ipad Or Ipad (Netflower) On Your Computer

How To Analyze Network Traffic With Mapreduce On A Microsoft Server On A Linux Computer (Ahem) On A Network (Netflow) On An Ubuntu Server On An Ipad Or Ipad (Netflower) On Your Computer A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce Anjali P P and Binu A Department of Information Technology, Rajagiri School of Engineering and Technology,

More information

User Documentation nfdump & NfSen

User Documentation nfdump & NfSen User Documentation nfdump & NfSen 1 NFDUMP This is the combined documentation of nfdump & NfSen. Both tools are distributed under the BSD license and can be downloaded at nfdump http://sourceforge.net/projects/nfdump/

More information

How To Monitor And Test An Ethernet Network On A Computer Or Network Card

How To Monitor And Test An Ethernet Network On A Computer Or Network Card 3. MONITORING AND TESTING THE ETHERNET NETWORK 3.1 Introduction The following parameters are covered by the Ethernet performance metrics: Latency (delay) the amount of time required for a frame to travel

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

NoSQL Data Base Basics

NoSQL Data Base Basics NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS

More information

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Detection of Distributed Denial of Service Attack with Hadoop on Live Network Detection of Distributed Denial of Service Attack with Hadoop on Live Network Suchita Korad 1, Shubhada Kadam 2, Prajakta Deore 3, Madhuri Jadhav 4, Prof.Rahul Patil 5 Students, Dept. of Computer, PCCOE,

More information

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations

More information

Generic Log Analyzer Using Hadoop Mapreduce Framework

Generic Log Analyzer Using Hadoop Mapreduce Framework Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,

More information

Applications of Passive Message Logging and TCP Stream Reconstruction to Provide Application-Level Fault Tolerance. Sunny Gleason COM S 717

Applications of Passive Message Logging and TCP Stream Reconstruction to Provide Application-Level Fault Tolerance. Sunny Gleason COM S 717 Applications of Passive Message Logging and TCP Stream Reconstruction to Provide Application-Level Fault Tolerance Sunny Gleason COM S 717 December 17, 2001 0.1 Introduction The proliferation of large-scale

More information

TP1: Getting Started with Hadoop

TP1: Getting Started with Hadoop TP1: Getting Started with Hadoop Alexandru Costan MapReduce has emerged as a leading programming model for data-intensive computing. It was originally proposed by Google to simplify development of web

More information

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

Flow Based Traffic Analysis

Flow Based Traffic Analysis Flow based Traffic Analysis Muraleedharan N C-DAC Bangalore Electronics City murali@ncb.ernet.in Challenges in Packet level traffic Analysis Network traffic grows in volume and complexity Capture and decode

More information

Overview of Network Traffic Analysis

Overview of Network Traffic Analysis Overview of Network Traffic Analysis Network Traffic Analysis identifies which users or applications are generating traffic on your network and how much network bandwidth they are consuming. For example,

More information

IPv6 Traffic Analysis and Storage

IPv6 Traffic Analysis and Storage Report from HEPiX 2012: Network, Security and Storage david.gutierrez@cern.ch Geneva, November 16th Network and Security Network traffic analysis Updates on DC Networks IPv6 Ciber-security updates Federated

More information

Monitoring high-speed networks using ntop. Luca Deri <deri@ntop.org>

Monitoring high-speed networks using ntop. Luca Deri <deri@ntop.org> Monitoring high-speed networks using ntop Luca Deri 1 Project History Started in 1997 as monitoring application for the Univ. of Pisa 1998: First public release v 0.4 (GPL2) 1999-2002:

More information

Transport and Network Layer

Transport and Network Layer Transport and Network Layer 1 Introduction Responsible for moving messages from end-to-end in a network Closely tied together TCP/IP: most commonly used protocol o Used in Internet o Compatible with a

More information

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M Log Management with Open-Source Tools Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M Outline Why do we need log collection and management? Why use open source tools? Widely used logging protocols and recently

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

IPTV Traffic Monitoring System with IPFIX/PSAMP

IPTV Traffic Monitoring System with IPFIX/PSAMP IPTV Traffic Monitoring System with IPFIX/PSAMP Shingo Kashima NTT Information Sharing Platform Laboratories 3rd NMRG Workshop 2010 NTT Information Sharing Platform Laboratories Outline Introduction Motivation

More information

plixer Scrutinizer Competitor Worksheet Visualization of Network Health Unauthorized application deployments Detect DNS communication tunnels

plixer Scrutinizer Competitor Worksheet Visualization of Network Health Unauthorized application deployments Detect DNS communication tunnels Scrutinizer Competitor Worksheet Scrutinizer Malware Incident Response Scrutinizer is a massively scalable, distributed flow collection system that provides a single interface for all traffic related to

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015

MASSIVE DATA PROCESSING (THE GOOGLE WAY ) 27/04/2015. Fundamentals of Distributed Systems. Inside Google circa 2015 7/04/05 Fundamentals of Distributed Systems CC5- PROCESAMIENTO MASIVO DE DATOS OTOÑO 05 Lecture 4: DFS & MapReduce I Aidan Hogan aidhog@gmail.com Inside Google circa 997/98 MASSIVE DATA PROCESSING (THE

More information

Machine Learning Algorithm for Pro-active Fault Detection in Hadoop Cluster

Machine Learning Algorithm for Pro-active Fault Detection in Hadoop Cluster Malaviya National Institute of Technology Department of Computer Science & Engineering Machine Learning Algorithm for Pro-active Fault Detection in Hadoop Cluster Winter Internship of: Agrima Seth Advisor:

More information

Hands on Workshop. Network Performance Monitoring and Multicast Routing. Yasuichi Kitamura NICT Jin Tanaka KDDI/NICT APAN-JP NOC

Hands on Workshop. Network Performance Monitoring and Multicast Routing. Yasuichi Kitamura NICT Jin Tanaka KDDI/NICT APAN-JP NOC Hands on Workshop Network Performance Monitoring and Multicast Routing Yasuichi Kitamura NICT Jin Tanaka KDDI/NICT APAN-JP NOC July 18th TEIN2 Site Coordination Workshop Network Performance Monitoring

More information

VisuSniff: A Tool For The Visualization Of Network Traffic

VisuSniff: A Tool For The Visualization Of Network Traffic VisuSniff: A Tool For The Visualization Of Network Traffic Rainer Oechsle University of Applied Sciences, Trier Postbox 1826 D-54208 Trier +49/651/8103-508 oechsle@informatik.fh-trier.de Oliver Gronz University

More information

MapReduce, Hadoop and Amazon AWS

MapReduce, Hadoop and Amazon AWS MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables

More information

IP address format: Dotted decimal notation: 10000000 00001011 00000011 00011111 128.11.3.31

IP address format: Dotted decimal notation: 10000000 00001011 00000011 00011111 128.11.3.31 IP address format: 7 24 Class A 0 Network ID Host ID 14 16 Class B 1 0 Network ID Host ID 21 8 Class C 1 1 0 Network ID Host ID 28 Class D 1 1 1 0 Multicast Address Dotted decimal notation: 10000000 00001011

More information