Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework"

Transcription

1 Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework Aryan TaheriMonfared Tomasz Wiktor Wlodarczyk Chunming Rong Department of Electrical Engineering and Computer Science University of Stavanger CloudCom, 2013

2 Problem? & Solution Problem? Proper network operation requires efficient monitoring Different monitoring instruments and protocols exist Monitoring data are huge Diverse query types are required (planned vs. ad-hoc)

3 Problem? & Solution Contributions A mechanism for: Scalable and flexible storage Real-time processing, long-term analysis Protocol independent

4 Norwegian NREN Backbone Network Data Characteristics Norwegian NREN backbone network Flow information from two core routers Anonymized records Average number of NetFlow records: 22m /day Average volume of NetFlow records: 60GB /day Sampling rate: 8

5 Overview Solution Overview Hadoop framework: HDFS, HBase, MapReduce HBase: nosql data store (row key, column-families, columns) Row Key: Facilitate accessing a specific data point or a range of them

6 Schema Schemas Composite row key: {src, dst}{addr, port}{ts} Three table types are required: IP Based Tables Port Based Tables Time Based Tables Single table has actual data, others are lookup tables

7 Implementation Implementation Initial data collection didn t perform well For a single day of NetFlow data: HBase max # op/s: 50 HBase max op latency: 2.3 s HDFS max # written bytes/s: 81 MB/s MR job duration: min This is not good at all

8 Implementation What is wrong? Non uniform distribution of data across regions (Hot Regions) Write Ahead Log Concurrent-Mark-Sweep Garbage Collection (CMS-GC) Old generation heap fragmentation etc.

9 Performance Tuning What to do? Using Compression Tuning Swap Disabling Write Ahead Log Enabling Deferred Log Flush Increasing Heap Size Specifying Concurrent-Mark-Sweep Garbage Collection Enabling MemStore-Local Allocation Buffers (MSLAB) Pre-Splitting Regions

10 Performance Tuning Regions Basic element of availability and distribution for tables Has start and end row keys Two Splitting Strategies 1 Uniform splitting over leading field of rowkey IP in IP Based tables ((2 32 1)/#Regions) Port in Port Based tables ((2 16 1)/#Regions) 2 Empirical study of leading field value domain Norwegian IP blocks Popular src, dst Popular services

11 Performance Tuning Pre-Splitting Regions 1) Uniform Distribution Results: x30 more operation/s x14 faster operation x3 shorter duration 2) Empirical study Results: x64 more operation/s x80 faster operation x7.5 shorter duration

12 Top-N Host Pairs Top-N Host Pairs Results Finding host pairs which exchanged most traffic Belongs to long-term query family Aggregation of input and output bytes for all host pairs Query on Reference table (T1) with 5 billion records Traditional tools: not capable handling this much data (e.g. nfdump) Chaining MapReduce jobs: min (Average response time) Reasonable duration

13 Service Server Discovery Service Server Discovery for a Given Period Criteria: Port number and Time range Four methods of execution: 1 HBase 2 OpenTSDB 3 NFD1 (Over complete dataset) 4 NFD2 (Limited dataset by time)

14 Service Server Discovery Service Server Discovery Results HBase x87 faster than OpenTSDB HBase x4472 faster than NFD1

15 Summary Data-intensive frameworks are effective for network monitoring Solutions should be protocol independent Designing proper data structure is crucial Data characteristics should be well studied Different query types have heterogeneous demands One size doesn t fit all

16 Ongoing Research End-to-End secure virtual layer 2 networks

Security Monitoring and Enforcement for the Cloud Model

Security Monitoring and Enforcement for the Cloud Model Security Monitoring and Enforcement for the Cloud Model Aryan TaheriMonfared aryan.taherimonfared@uis.no June 21, 2013 Agenda 1 Infrastructure Architecture for a Cloud IaaS Provider 10000 Foot View 1000

More information

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework 2013 IEEE International Conference on Cloud Computing Technology and Science Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework Aryan TaheriMonfared, Tomasz Wiktor Wlodarczyk,

More information

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework Aryan TaheriMonfared, Tomasz Wiktor Wlodarczyk, Chunming Rong, Department of Electrical Engineering and Computer Science,

More information

A Scalable Data Transformation Framework using the Hadoop Ecosystem

A Scalable Data Transformation Framework using the Hadoop Ecosystem A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Getting Started with SandStorm NoSQL Benchmark

Getting Started with SandStorm NoSQL Benchmark Getting Started with SandStorm NoSQL Benchmark SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop,

More information

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services HBase Schema Design NoSQL Ma4ers, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera ConsulFng on Hadoop projects (everywhere) Apache Commi4er HBase and Whirr

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Figure 1. perfsonar architecture. 1 This work was supported by the EC IST-EMANICS Network of Excellence (#26854).

Figure 1. perfsonar architecture. 1 This work was supported by the EC IST-EMANICS Network of Excellence (#26854). 1 perfsonar tools evaluation 1 The goal of this PSNC activity was to evaluate perfsonar NetFlow tools for flow collection solution and assess its applicability to easily subscribe and request different

More information

Wireshark Developer and User Conference

Wireshark Developer and User Conference Wireshark Developer and User Conference Using NetFlow to Analyze Your Network June 15 th, 2011 Christopher J. White Manager Applica6ons and Analy6cs, Cascade Riverbed Technology cwhite@riverbed.com SHARKFEST

More information

Big Data and Cloud Computing

Big Data and Cloud Computing . or g Chunming Rong Tomasz Wiktor Wlodarczyk Chair (IEEE CloudCom) Big Data Chair (IEEE CloudCom) Head (CIPSI) Administrative Head (CIPSI) Professor (UiS) Associate Professor (UiS) chunming.rong@uis.no

More information

NfSen Plugin Supporting The Virtual Network Monitoring

NfSen Plugin Supporting The Virtual Network Monitoring NfSen Plugin Supporting The Virtual Network Monitoring Vojtěch Krmíček krmicek@liberouter.org Pavel Čeleda celeda@ics.muni.cz Jiří Novotný novotny@cesnet.cz Part I Monitoring of Virtual Network Environments

More information

Limitations of Packet Measurement

Limitations of Packet Measurement Limitations of Packet Measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Scaling-out Semantic Data Management and Processing

Scaling-out Semantic Data Management and Processing Scaling-out Semantic Data Management and Processing Tomasz Wiktor Wlodarczyk, Norway CIPSI Director: prof. Chunming Rong Areas of interest: Reasoning, analytics and simulation Distributed systems Dependability

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

An Open Source NoSQL solution for Internet Access Logs Analysis

An Open Source NoSQL solution for Internet Access Logs Analysis An Open Source NoSQL solution for Internet Access Logs Analysis A practical case of why, what and how to use a NoSQL Database Management System instead of a relational one José Manuel Ciges Regueiro

More information

Port evolution: a software to find the shady IP profiles in Netflow. Or how to reduce Netflow records efficiently.

Port evolution: a software to find the shady IP profiles in Netflow. Or how to reduce Netflow records efficiently. TLP:WHITE - Port Evolution Port evolution: a software to find the shady IP profiles in Netflow. Or how to reduce Netflow records efficiently. Gerard Wagener 41, avenue de la Gare L-1611 Luxembourg Grand-Duchy

More information

Use case: Merging heterogeneous network measurement data

Use case: Merging heterogeneous network measurement data Use case: Merging heterogeneous network measurement data Jorge E. López de Vergara and Javier Aracil Jorge.lopez_vergara@uam.es Credits to Rubén García-Valcárcel, Iván González, Rafael Leira, Víctor Moreno,

More information

Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp

Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp Performance Management in Big Data Applica6ons Michael Kopp, Technology Strategist NoSQL: High Volume/Low Latency DBs Web Java Key Challenges 1) Even Distribu6on 2) Correct Schema and Access paperns 3)

More information

Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments

Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments Aryan TaheriMonfared Department of Electrical Engineering and Computer Science University of Stavanger

More information

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013

Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

An overview of traffic analysis using NetFlow

An overview of traffic analysis using NetFlow The LOBSTER project An overview of traffic analysis using NetFlow Arne Øslebø UNINETT Arne.Oslebo@uninett.no 1 Outline What is Netflow? Available tools Collecting Processing Detailed analysis security

More information

Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments

Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments Software-Defined Networking Architecture Framework for Multi-Tenant Enterprise Cloud Environments by Aryan TaheriMonfared A dissertation submitted in partial satisfaction of the requirements for the degree

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Media Upload and Sharing Website using HBASE

Media Upload and Sharing Website using HBASE A-PDF Merger DEMO : Purchase from www.a-pdf.com to remove the watermark Media Upload and Sharing Website using HBASE Tushar Mahajan Santosh Mukherjee Shubham Mathur Agenda Motivation for the project Introduction

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Network Traffic Analysis using HADOOP Architecture. Zeng Shan ISGC2013, Taibei zengshan@ihep.ac.cn

Network Traffic Analysis using HADOOP Architecture. Zeng Shan ISGC2013, Taibei zengshan@ihep.ac.cn Network Traffic Analysis using HADOOP Architecture Zeng Shan ISGC2013, Taibei zengshan@ihep.ac.cn Flow VS Packet what are netflows? Outlines Flow tools used in the system nprobe nfdump Introduction to

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014 Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the

More information

HADOOP PERFORMANCE TUNING

HADOOP PERFORMANCE TUNING PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

Hypertable Goes Realtime at Baidu. Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843)

Hypertable Goes Realtime at Baidu. Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843) Hypertable Goes Realtime at Baidu Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843) Agenda Motivation Related Work Model Design Evaluation Conclusion 2 Agenda Motivation Related

More information

Cloud Computing, Software Defined Networking, Network Function Virtualization

Cloud Computing, Software Defined Networking, Network Function Virtualization Cloud Computing, Software Defined Networking, Network Function Virtualization Aryan TaheriMonfared Department of Electrical Engineering and Computer Science University of Stavanger August 27, 2015 Outline

More information

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359

More information

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala

More information

How Companies are! Using Spark

How Companies are! Using Spark How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made

More information

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com NetFlow Tracker Overview Mike McGrath x ccie CTO mike@crannog-software.com 2006 Copyright Crannog Software www.crannog-software.com 1 Copyright Crannog Software www.crannog-software.com 2 LEVELS OF NETWORK

More information

Applying Hadoop to Semantic Big Data Processing. CIPSI and Big Data. Tomasz Wiktor Wlodarczyk Chunming Rong. University of Stavanger, Norway

Applying Hadoop to Semantic Big Data Processing. CIPSI and Big Data. Tomasz Wiktor Wlodarczyk Chunming Rong. University of Stavanger, Norway Applying Hadoop to Semantic Big Data Processing CIPSI and Big Data Tomasz Wiktor Wlodarczyk Chunming Rong University of Stavanger, Norway Index Querying in Hadoop- based triple store Scaling- out NCBO

More information

How good can databases deal with Netflow data

How good can databases deal with Netflow data How good can databases deal with Netflow data Bachelorarbeit Supervisor: bernhard fabian@net.t-labs.tu-berlin.de Inteligent Networks Group (INET) Ernesto Abarca Ortiz eabarca@net.t-labs.tu-berlin.de OVERVIEW

More information

Turn Big Data to Small Data

Turn Big Data to Small Data Turn Big Data to Small Data Use Qlik to Utilize Distributed Systems and Document Databases October, 2014 Stig Magne Henriksen Image: kdnuggets.com From Big Data to Small Data Agenda When do we have a Big

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Analyzing Big Data at. Web 2.0 Expo, 2010 Kevin Weil @kevinweil

Analyzing Big Data at. Web 2.0 Expo, 2010 Kevin Weil @kevinweil Analyzing Big Data at Web 2.0 Expo, 2010 Kevin Weil @kevinweil Three Challenges Collecting Data Large-Scale Storage and Analysis Rapid Learning over Big Data My Background Studied Mathematics and Physics

More information

Analytics on Spark & Shark @Yahoo

Analytics on Spark & Shark @Yahoo Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment

More information

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM 2013 HareDB HBase Client Web Version USER MANUAL HAREDB TEAM Connect to HBase... 2 Connection... 3 Connection Manager... 3 Add a new Connection... 4 Alter Connection... 6 Delete Connection... 6 Clone Connection...

More information

and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs

and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs ICmyNet.Flow: NetFlow based traffic investigation, analysis, and reporting Slavko Gajin slavko.gajin@rcub.bg.ac.rs AMRES Academic Network of Serbia RCUB - Belgrade University Computer Center ETF Faculty

More information

Hybrid network traffic engineering system (HNTES)

Hybrid network traffic engineering system (HNTES) Hybrid network traffic engineering system (HNTES) Zhenzhen Yan, Zhengyang Liu, Chris Tracy, Malathi Veeraraghavan University of Virginia and ESnet Jan 12-13, 2012 mvee@virginia.edu, ctracy@es.net Project

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee

Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee {yhlee06, lee}@cnu.ac.kr http://networks.cnu.ac.kr/~yhlee Chungnam National University, Korea January 8, 2013 FloCon 2013 Contents Introduction

More information

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag 2005 SWITCH What I am going to present: The Motivation. What are NfSen and nfdump? The Tools in Action. Outlook

More information

Scaling Up 2 CSE 6242 / CX 4242. Duen Horng (Polo) Chau Georgia Tech. HBase, Hive

Scaling Up 2 CSE 6242 / CX 4242. Duen Horng (Polo) Chau Georgia Tech. HBase, Hive CSE 6242 / CX 4242 Scaling Up 2 HBase, Hive Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le

More information

Comparing Scalable NOSQL Databases

Comparing Scalable NOSQL Databases Comparing Scalable NOSQL Databases Functionalities and Measurements Dory Thibault UCL Contact : thibault.dory@student.uclouvain.be Sponsor : Euranova Website : nosqlbenchmarking.com February 15, 2011 Clarications

More information

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding

CS 78 Computer Networks. Internet Protocol (IP) our focus. The Network Layer. Interplay between routing and forwarding CS 78 Computer Networks Internet Protocol (IP) Andrew T. Campbell campbell@cs.dartmouth.edu our focus What we will lean What s inside a router IP forwarding Internet Control Message Protocol (ICMP) IP

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Big Data. A general approach to process external multimedia datasets. David Mera

Big Data. A general approach to process external multimedia datasets. David Mera Big Data A general approach to process external multimedia datasets David Mera Laboratory of Data Intensive Systems and Applications (DISA) Masaryk University Brno, Czech Republic 7/10/2014 Table of Contents

More information

Load Balancing in Distributed Web Server Systems With Partial Document Replication

Load Balancing in Distributed Web Server Systems With Partial Document Replication Load Balancing in Distributed Web Server Systems With Partial Document Replication Ling Zhuo, Cho-Li Wang and Francis C. M. Lau Department of Computer Science and Information Systems The University of

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information

NetFlow Analysis with MapReduce

NetFlow Analysis with MapReduce NetFlow Analysis with MapReduce Wonchul Kang, Yeonhee Lee, Youngseok Lee Chungnam National University {teshi85, yhlee06, lee}@cnu.ac.kr 2010.04.24(Sat) based on "An Internet Traffic Analysis Method with

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10

Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs

More information

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs

Decoding DNS data. Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs Decoding DNS data Using DNS traffic analysis to identify cyber security threats, server misconfigurations and software bugs The Domain Name System (DNS) is a core component of the Internet infrastructure,

More information

Time-Series Databases and Machine Learning

Time-Series Databases and Machine Learning Time-Series Databases and Machine Learning Jimmy Bates November 2017 1 Top-Ranked Hadoop 1 3 5 7 Read Write File System World Record Performance High Availability Enterprise-grade Security Distribution

More information

Big Data and Scripting map/reduce in Hadoop

Big Data and Scripting map/reduce in Hadoop Big Data and Scripting map/reduce in Hadoop 1, 2, parts of a Hadoop map/reduce implementation core framework provides customization via indivudual map and reduce functions e.g. implementation in mongodb

More information

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks

More information

Scalable Extraction, Aggregation, and Response to Network Intelligence

Scalable Extraction, Aggregation, and Response to Network Intelligence Scalable Extraction, Aggregation, and Response to Network Intelligence Agenda Explain the two major limitations of using Netflow for Network Monitoring Scalability and Visibility How to resolve these issues

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Scalable Network Measurement Analysis with Hadoop. Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL

Scalable Network Measurement Analysis with Hadoop. Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL Scalable Network Measurement Analysis with Hadoop Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL Outline Motivation Hadoop overview Approach doing the right thing, Avro what worked,

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Software Defined Networking What is it, how does it work, and what is it good for?

Software Defined Networking What is it, how does it work, and what is it good for? Software Defined Networking What is it, how does it work, and what is it good for? slides stolen from Jennifer Rexford, Nick McKeown, Michael Schapira, Scott Shenker, Teemu Koponen, Yotam Harchol and David

More information

From GWS to MapReduce: Google s Cloud Technology in the Early Days

From GWS to MapReduce: Google s Cloud Technology in the Early Days Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

More information

Exploring Netflow Data using Hadoop

Exploring Netflow Data using Hadoop Exploring Netflow Data using Hadoop X. Zhou 1, M. Petrovic 2,T. Eskridge 3,M. Carvalho 4,X. Tao 5 Xiaofeng Zhou, CISE Department, University of Florida Milenko Petrovic, Florida Institute for Human and

More information

Storage of Structured Data: BigTable and HBase. New Trends In Distributed Systems MSc Software and Systems

Storage of Structured Data: BigTable and HBase. New Trends In Distributed Systems MSc Software and Systems Storage of Structured Data: BigTable and HBase 1 HBase and BigTable HBase is Hadoop's counterpart of Google's BigTable BigTable meets the need for a highly scalable storage system for structured data Provides

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements

More information

!"#$%&' ( )%#*'+,'-#.//"0( !"#$"%&'()*$+()',!-+.'/', 4(5,67,!-+!"89,:*$;'0+$.<.,&0$'09,&)"/=+,!()<>'0, 3, Processing LARGE data sets

!#$%&' ( )%#*'+,'-#.//0( !#$%&'()*$+()',!-+.'/', 4(5,67,!-+!89,:*$;'0+$.<.,&0$'09,&)/=+,!()<>'0, 3, Processing LARGE data sets !"#$%&' ( Processing LARGE data sets )%#*'+,'-#.//"0( Framework for o! reliable o! scalable o! distributed computation of large data sets 4(5,67,!-+!"89,:*$;'0+$.

More information

A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce

A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce A Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce Anjali P P and Binu A Department of Information Technology, Rajagiri School of Engineering and Technology,

More information

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it Dan Ariely MYSQL AND HBASE ECOSYSTEM

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2015/16 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Big data Journey. From a pallet of parts to big data analytics. Sebastian Castro.nz Registry Services

Big data Journey. From a pallet of parts to big data analytics. Sebastian Castro.nz Registry Services Big data Journey From a pallet of parts to big data analytics Sebastian Castro.nz Registry Services Introduction Apache Hadoop Open-source software framework Storage using HDFS Data processing in batch

More information

Cloudera Certified Developer for Apache Hadoop

Cloudera Certified Developer for Apache Hadoop Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

Openbus Documentation

Openbus Documentation Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:

More information

KNOM Tutorial 2003. Internet Traffic Measurement and Analysis. Sue Bok Moon Dept. of Computer Science

KNOM Tutorial 2003. Internet Traffic Measurement and Analysis. Sue Bok Moon Dept. of Computer Science KNOM Tutorial 2003 Internet Traffic Measurement and Analysis Sue Bok Moon Dept. of Computer Science Overview Definition of Traffic Matrix 4Traffic demand, delay, loss Applications of Traffic Matrix 4Engineering,

More information

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES 1 HYPER-CONVERGED INFRASTRUCTURE STRATEGIES MYTH BUSTING & THE FUTURE OF WEB SCALE IT 2 ROADMAP INFORMATION DISCLAIMER EMC makes no representation and undertakes no obligations with regard to product planning

More information

Duke University http://www.cs.duke.edu/starfish

Duke University http://www.cs.duke.edu/starfish Herodotos Herodotou, Harold Lim, Fei Dong, Shivnath Babu Duke University http://www.cs.duke.edu/starfish Practitioners of Big Data Analytics Google Yahoo! Facebook ebay Physicists Biologists Economists

More information

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @ The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...

More information

Discovering Business Insights in Big Data Using SQL-MapReduce

Discovering Business Insights in Big Data Using SQL-MapReduce Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013

More information