Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

1 Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework
Aryan TaheriMonfared, Tomasz Wiktor Wlodarczyk, Chunming Rong
Department of Electrical Engineering and Computer Science, University of Stavanger
CloudCom, 2013

2 Problem? & Solution: Problem?
- Proper network operation requires efficient monitoring
- Different monitoring instruments and protocols exist
- Monitoring data are huge
- Diverse query types are required (planned vs. ad-hoc)

3 Problem? & Solution: Contributions
A mechanism for:
- Scalable and flexible storage
- Real-time processing and long-term analysis
- Protocol independence

4 Norwegian NREN Backbone Network: Data Characteristics
- Norwegian NREN backbone network
- Flow information from two core routers
- Anonymized records
- Average number of NetFlow records: 22m/day
- Average volume of NetFlow records: 60 GB/day
- Sampling rate: 8

5 Overview: Solution Overview
- Hadoop framework: HDFS, HBase, MapReduce
- HBase: NoSQL data store (row key, column families, columns)
- Row key: facilitates access to a specific data point or a range of data points

6 Schema: Schemas
- Composite row key: {src, dst}{addr, port}{ts}
- Three table types are required:
  - IP-based tables
  - Port-based tables
  - Time-based tables
- A single table holds the actual data; the others are lookup tables
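
The composite row key above can be illustrated with a minimal sketch. It assumes fixed-width, big-endian fields (an IPv4 address, a 16-bit port, and a 32-bit epoch timestamp); the function names, field order, and widths are hypothetical choices for illustration, not the layout used in the talk. Fixed-width big-endian packing keeps the byte-wise sort order of keys equal to the numeric order of the fields, which is what makes range access on the leading field possible.

```python
import ipaddress
import struct

# Hypothetical layout: 4-byte IPv4 address, 2-byte port, 4-byte timestamp.
# ">" selects big-endian with no padding, so keys sort byte-wise in the
# same order as the (address, port, timestamp) tuples sort numerically.
KEY_FORMAT = ">IHI"


def make_row_key(addr: str, port: int, ts: int) -> bytes:
    """Pack an address, port, and timestamp into one composite key."""
    addr_int = int(ipaddress.IPv4Address(addr))
    return struct.pack(KEY_FORMAT, addr_int, port, ts)


def parse_row_key(key: bytes):
    """Recover the (address, port, timestamp) fields from a key."""
    addr_int, port, ts = struct.unpack(KEY_FORMAT, key)
    return str(ipaddress.IPv4Address(addr_int)), port, ts


key = make_row_key("192.0.2.10", 443, 1_380_000_000)
fields = parse_row_key(key)
```

Because the address is the leading field, all records for one host occupy a contiguous key range in this sketch.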

7 Implementation
Initial data collection didn't perform well. For a single day of NetFlow data:
- HBase max # op/s: 50
- HBase max op latency: 2.3 s
- HDFS max # written bytes/s: 81 MB/s
- MR job duration: min
This is not good at all.

8 Implementation: What is wrong?
- Non-uniform distribution of data across regions (hot regions)
- Write Ahead Log
- Concurrent-Mark-Sweep Garbage Collection (CMS-GC)
- Old generation heap fragmentation
- etc.

9 Performance Tuning: What to do?
- Using compression
- Tuning swap
- Disabling Write Ahead Log
- Enabling deferred log flush
- Increasing heap size
- Specifying Concurrent-Mark-Sweep Garbage Collection
- Enabling MemStore-Local Allocation Buffers (MSLAB)
- Pre-splitting regions

10 Performance Tuning: Regions
- Basic element of availability and distribution for tables
- Each region has start and end row keys
- Two splitting strategies:
  1. Uniform splitting over the leading field of the row key
     - IP in IP-based tables: (2^32 - 1)/#Regions
     - Port in port-based tables: (2^16 - 1)/#Regions
  2. Empirical study of the leading field's value domain
     - Norwegian IP blocks
     - Popular src, dst
     - Popular services
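
The uniform splitting strategy above can be sketched as follows. This is a minimal illustration that assumes the leading key field is an unsigned integer of a known bit width; `uniform_split_points` is a hypothetical helper name, and the region count of 4 is arbitrary. It only computes the boundary values, and passing them to HBase at table creation is left out.

```python
import ipaddress


def uniform_split_points(bits: int, num_regions: int) -> list:
    """Return the num_regions - 1 boundaries that cut the value domain
    [0, 2**bits - 1] into equally sized regions."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    max_value = (1 << bits) - 1
    step = max_value // num_regions  # (2^bits - 1) / #Regions
    return [i * step for i in range(1, num_regions)]


# IP-based tables: 32-bit leading field, shown in dotted-quad form.
ip_splits = [str(ipaddress.IPv4Address(p)) for p in uniform_split_points(32, 4)]

# Port-based tables: 16-bit leading field.
port_splits = uniform_split_points(16, 4)
```

Uniform boundaries ignore how the traffic is actually distributed, which is why the second, empirical strategy places boundaries according to observed address blocks and services instead.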

11 Performance Tuning: Pre-Splitting Regions
1) Uniform distribution results:
- 30x more operations/s
- 14x faster operations
- 3x shorter duration
2) Empirical study results:
- 64x more operations/s
- 80x faster operations
- 7.5x shorter duration

12 Top-N Host Pairs: Results
- Finds the host pairs that exchanged the most traffic
- Belongs to the long-term query family
- Aggregates input and output bytes for all host pairs
- Query on the reference table (T1) with 5 billion records
- Traditional tools (e.g. nfdump) are not capable of handling this much data
- Chaining MapReduce jobs: min (average response time)
- Reasonable duration
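
The two chained steps of the Top-N query (aggregate bytes per host pair, then select the largest totals) can be sketched in plain Python. This is an in-memory illustration of the logic only, not the MapReduce jobs from the talk; the record layout `(src, dst, bytes)`, the function names, and the choice to treat a pair as unordered (so both traffic directions are summed together) are assumptions.

```python
import heapq
from collections import defaultdict


def aggregate_bytes(flows):
    """Step 1: sum bytes per host pair, counting both directions together."""
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        pair = tuple(sorted((src, dst)))  # (a, b) and (b, a) share one key
        totals[pair] += nbytes
    return totals


def top_n(totals, n):
    """Step 2: pick the n pairs with the largest byte totals."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Illustrative records only; addresses and byte counts are made up.
flows = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.2", "10.0.0.1", 300),
    ("10.0.0.3", "10.0.0.1", 700),
    ("10.0.0.2", "10.0.0.3", 100),
]
result = top_n(aggregate_bytes(flows), 2)
```

In a chained MapReduce setting, the first step would correspond to one job keyed by host pair and the second to a follow-up job that ranks the aggregated totals.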

13 Service Server Discovery: Service Server Discovery for a Given Period
- Criteria: port number and time range
- Four methods of execution:
  1. HBase
  2. OpenTSDB
  3. NFD1 (over the complete dataset)
  4. NFD2 (dataset limited by time)
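
A query with these two criteria maps naturally onto a range scan over keys that start with the port and continue with the timestamp. The sketch below imitates such a scan over an in-memory sorted list using `bisect`; the key layout (16-bit port, 32-bit timestamp), the function names, and the half-open time interval are assumptions made for illustration, and no HBase client calls are shown.

```python
import bisect
import struct


def port_time_key(port: int, ts: int) -> bytes:
    """Hypothetical port-based key: 2-byte port, then 4-byte timestamp."""
    return struct.pack(">HI", port, ts)


def scan_servers(sorted_rows, port, ts_start, ts_end):
    """Return servers seen on `port` with ts_start <= ts < ts_end.

    `sorted_rows` is a list of (key, server) tuples sorted by key,
    standing in for a table ordered by row key.
    """
    keys = [key for key, _ in sorted_rows]
    lo = bisect.bisect_left(keys, port_time_key(port, ts_start))
    hi = bisect.bisect_left(keys, port_time_key(port, ts_end))
    return {server for _, server in sorted_rows[lo:hi]}


# Illustrative rows only; addresses and timestamps are made up.
rows = sorted([
    (port_time_key(80, 100), "192.0.2.10"),
    (port_time_key(80, 200), "192.0.2.11"),
    (port_time_key(80, 300), "192.0.2.10"),
    (port_time_key(443, 150), "192.0.2.20"),
])
servers = scan_servers(rows, 80, 100, 300)
```

Only the contiguous slice between the start and stop keys is touched, which is the property a port-based table with a time suffix is meant to provide.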

14 Service Server Discovery: Results
- HBase: 87x faster than OpenTSDB
- HBase: 4472x faster than NFD1

15 Summary
- Data-intensive frameworks are effective for network monitoring
- Solutions should be protocol independent
- Designing a proper data structure is crucial
- Data characteristics should be well studied
- Different query types have heterogeneous demands
- One size doesn't fit all

16 Ongoing Research
- End-to-end secure virtual layer 2 networks
