Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

Aryan TaheriMonfared, Tomasz Wiktor Wlodarczyk, Chunming Rong
Department of Electrical Engineering and Computer Science, University of Stavanger
CloudCom, 2013

Problem and Solution: Problem
  Proper network operation requires efficient monitoring
  Different monitoring instruments and protocols exist
  Monitoring data are huge
  Diverse query types are required (planned vs. ad hoc)

Problem and Solution: Contributions
  A mechanism for:
    Scalable and flexible storage
    Real-time processing and long-term analysis
    Protocol independence

Norwegian NREN Backbone Network: Data Characteristics
  Norwegian NREN backbone network
  Flow information from two core routers
  Anonymized records
  Average number of NetFlow records: 22 million/day
  Average volume of NetFlow records: 60 GB/day
  Sampling rate: 8

Solution Overview
  Hadoop framework: HDFS, HBase, MapReduce
  HBase: NoSQL data store (row key, column families, columns)
  Row key: facilitates access to a specific data point or a range of data points

Schemas
  Composite row key: {src, dst}{addr, port}{ts}
  Three table types are required:
    IP-based tables
    Port-based tables
    Time-based tables
  A single table holds the actual data; the others are lookup tables
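The composite row key above can be illustrated with a minimal sketch. The byte layout here is an assumption, not the layout used in the talk: the leading field (an IPv4 address or a port) is packed big-endian, followed by an 8-byte big-endian timestamp, so that lexicographic byte order matches numeric order. The function names `ip_row_key` and `port_row_key` are hypothetical.

```python
import ipaddress
import struct


def ip_row_key(addr: str, ts: int) -> bytes:
    """Hypothetical IP-based key: 4-byte IPv4 address + 8-byte timestamp."""
    return ipaddress.IPv4Address(addr).packed + struct.pack(">Q", ts)


def port_row_key(port: int, ts: int) -> bytes:
    """Hypothetical port-based key: 2-byte port + 8-byte timestamp."""
    return struct.pack(">HQ", port, ts)


# Keys for the same address sort by time; different addresses sort by address.
k1 = ip_row_key("10.0.0.1", 5)
k2 = ip_row_key("10.0.0.1", 6)
k3 = ip_row_key("10.0.0.2", 0)
```

Because all fields are fixed-width and big-endian, a range of rows for one address or one port is contiguous in key order, which is what makes range scans on the leading field cheap.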

Implementation
  Initial data collection didn't perform well
  For a single day of NetFlow data:
    HBase max # op/s: 50
    HBase max op latency: 2.3 s
    HDFS max # written bytes/s: 81 MB/s
    MR job duration: 45.46 min
  This is not good at all

Implementation: What is wrong?
  Non-uniform distribution of data across regions (hot regions)
  Write Ahead Log
  Concurrent-Mark-Sweep Garbage Collection (CMS-GC)
  Old generation heap fragmentation
  etc.

Performance Tuning: What to do?
  Using compression
  Tuning swap
  Disabling the Write Ahead Log
  Enabling deferred log flush
  Increasing heap size
  Specifying Concurrent-Mark-Sweep Garbage Collection
  Enabling MemStore-Local Allocation Buffers (MSLAB)
  Pre-splitting regions

Performance Tuning: Regions
  Basic element of availability and distribution for tables
  Each region has start and end row keys
  Two splitting strategies:
    1. Uniform splitting over the leading field of the row key
         IP in IP-based tables: (2^32 - 1)/#Regions
         Port in port-based tables: (2^16 - 1)/#Regions
    2. Empirical study of the leading field's value domain
         Norwegian IP blocks
         Popular src, dst
         Popular services
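The uniform strategy can be sketched as follows: divide the leading field's value domain into equal steps of (2^bits - 1)/#Regions and use the step boundaries as region split points. This is only an illustration of the arithmetic on the slide; the helper names and the 4-byte big-endian key encoding are assumptions.

```python
import struct
from typing import List


def uniform_split_points(domain_bits: int, num_regions: int) -> List[int]:
    """Return num_regions - 1 boundaries that evenly divide [0, 2^bits - 1]."""
    max_value = (1 << domain_bits) - 1
    step = max_value // num_regions
    return [step * i for i in range(1, num_regions)]


def ip_split_keys(num_regions: int) -> List[bytes]:
    """Encode IPv4 split points as 4-byte big-endian keys (assumed encoding)."""
    return [struct.pack(">I", p) for p in uniform_split_points(32, num_regions)]


port_splits = uniform_split_points(16, 4)  # boundaries for port-based tables
ip_splits = ip_split_keys(4)               # boundaries for IP-based tables
```

Uniform boundaries ignore how the traffic is actually distributed over addresses and ports, which is the motivation for the second, empirical strategy.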

Performance Tuning: Pre-Splitting Regions
  1) Uniform distribution results:
       x30 more operations/s
       x14 faster operations
       x3 shorter duration
  2) Empirical study results:
       x64 more operations/s
       x80 faster operations
       x7.5 shorter duration

Top-N Host Pairs: Results
  Finding the host pairs that exchanged the most traffic
  Belongs to the long-term query family
  Aggregation of input and output bytes for all host pairs
  Query on the reference table (T1) with 5 billion records
  Traditional tools (e.g. nfdump): not capable of handling this much data
  Chaining MapReduce jobs: 26.10 min (average response time)
  Reasonable duration
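The chained-job idea can be sketched in plain Python as two stages: the first aggregates bytes per host pair, the second selects the N largest totals. This is an in-memory stand-in for illustration, not the MapReduce jobs used in the talk. The record layout `(src, dst, in_bytes, out_bytes)` and the choice to treat a pair as unordered are assumptions.

```python
import heapq
from collections import defaultdict


def job1_aggregate(records):
    """Stage 1: sum input and output bytes per (unordered) host pair."""
    totals = defaultdict(int)
    for src, dst, in_bytes, out_bytes in records:
        pair = tuple(sorted((src, dst)))  # assumption: direction is ignored
        totals[pair] += in_bytes + out_bytes
    return dict(totals)


def job2_top_n(totals, n):
    """Stage 2: pick the n host pairs with the largest byte totals."""
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


sample = [
    ("a", "b", 10, 5),
    ("b", "a", 1, 1),
    ("a", "c", 100, 0),
    ("c", "d", 3, 3),
]
top2 = job2_top_n(job1_aggregate(sample), 2)
```

Splitting the work this way keeps the second stage small: it only sees one total per host pair rather than every flow record.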

Service Server Discovery for a Given Period
  Criteria: port number and time range
  Four methods of execution:
    1. HBase
    2. OpenTSDB
    3. NFD1 (over the complete dataset)
    4. NFD2 (dataset limited by time)
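With a port-leading row key, the port and time-range criteria translate into a single contiguous range scan. The sketch below emulates that scan over a sorted in-memory list using `bisect`; it is not HBase client code, and the key layout (2-byte port followed by 8-byte timestamp) and the names `port_key` and `discover_servers` are assumptions.

```python
import bisect
import struct


def port_key(port: int, ts: int) -> bytes:
    """Hypothetical port-based row key: 2-byte port + 8-byte timestamp."""
    return struct.pack(">HQ", port, ts)


def discover_servers(rows, port, t_start, t_end):
    """Return server addresses seen on `port` with t_start <= ts < t_end.

    `rows` is a list of (row_key, server_addr) sorted by row_key, standing in
    for a table whose rows are stored in key order.
    """
    keys = [key for key, _ in rows]
    lo = bisect.bisect_left(keys, port_key(port, t_start))
    hi = bisect.bisect_left(keys, port_key(port, t_end))
    return {addr for _, addr in rows[lo:hi]}


rows = sorted([
    (port_key(80, 100), "10.0.0.1"),
    (port_key(80, 200), "10.0.0.2"),
    (port_key(80, 300), "10.0.0.3"),
    (port_key(443, 150), "10.0.0.9"),
])
servers = discover_servers(rows, 80, 100, 300)
```

Only the rows between the start and stop keys are touched, so the cost depends on the size of the requested window rather than on the size of the whole dataset.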

Service Server Discovery: Results
  HBase: x87 faster than OpenTSDB
  HBase: x4472 faster than NFD1

Summary
  Data-intensive frameworks are effective for network monitoring
  Solutions should be protocol independent
  Designing a proper data structure is crucial
  Data characteristics should be well studied
  Different query types have heterogeneous demands
  One size doesn't fit all

Ongoing Research
  End-to-end secure virtual layer 2 networks