Network monitoring in high-speed links




Network monitoring in high-speed links: Algorithms and challenges

Pere Barlet-Ros
Advanced Broadband Communications Center (CCABA)
Universitat Politècnica de Catalunya (UPC)
http://monitoring.ccaba.upc.edu
pbarlet@ac.upc.edu
NGI course, 30 Nov. 2010

Outline
1. Network monitoring
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers

Outline
1. Network monitoring
   - Introduction
   - Active monitoring
   - Passive monitoring
   - Technological and social barriers
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers

Introduction to network monitoring

The process of measuring network systems and traffic:
- Systems: routers, switches, servers, ...
- Traffic: volume, type, topology, ...

Monitoring is crucial for network operation and management:
- Traffic engineering, capacity planning, bandwidth management
- Fault diagnosis, troubleshooting, performance evaluation
- Accounting, billing, security, ...

Network measurements are also important for networking research:
- Design and evaluation of protocols, applications, ...
- Traffic modeling and characterization

Network monitoring is very challenging and datasets are scarce, from both a technological and a social standpoint.

Classification

Network monitoring tools and methods can be classified along several axes:
- Hardware vs. software
- Online vs. offline
- LAN vs. WAN
- Protocol level
- Active vs. passive

Active monitoring

Active tools are based on traffic injection: probe traffic is generated by a measurement device, and the response to that probe traffic is measured.

Pros: flexibility
- Devices can be deployed at the edge (e.g., end-hosts)
- No instrumentation at the core is needed
- Measurement does not directly rely on existing traffic

Cons: intrusiveness
- Probe traffic can degrade network performance
- Probe traffic can impact the measurement itself

Main usages:
- Performance evaluation (e.g., ping)
- Bandwidth estimation (e.g., pathload)
- Topology discovery (e.g., traceroute)

Passive monitoring

Traffic is collected from inside the network:
- Routers and switches (e.g., Cisco NetFlow)
- Passive devices (e.g., libpcap, DAG cards, optical taps)

Pros: transparency
- Network performance is not affected
- No additional traffic is injected
- Useful even with a single measurement point

Cons: complexity
- Requires administrative access to network devices
- Requires the explicit presence of the traffic under study
- Online collection and analysis is hard (e.g., sampling)
- Privacy concerns

Multiple and diverse usages:
- Traffic analysis and classification, ...
- Anomaly and intrusion detection, ...

Technological and social barriers

Datasets and platforms for research purposes are limited:
- Outdated datasets
- Academic traffic only
- Anonymized and without payloads

Technological barriers:
- The Internet was designed without monitoring in mind
- Collection at Gb/s rates and storage of TB/day
- Link speeds increase at a faster pace than processing speeds
- Building monitoring applications is error-prone and time-consuming

Social barriers:
- Lack of coordination between projects
- ISPs have no incentive to share information
- Privacy and competition concerns
- Monitoring hardware is expensive and difficult to manage

Outline
1. Network monitoring
2. Addressing the technological barriers
   - Bloom filters
   - Bitmap algorithms: direct bitmaps and variants; bitmaps over sliding windows
3. Use cases
4. Addressing the social barriers

Technological challenges

Only a few nanoseconds are available per packet:
- Packet interarrival times: 8 ns (40 Gb/s), 32 ns (10 Gb/s)
- Memory access times: < 10 ns (SRAM), tens of ns (DRAM)

Obtaining even simple metrics becomes extremely challenging: approaches based on hash tables, the core of most monitoring algorithms, do not scale. Examples: active flows, flow size distribution, heavy-hitter detection, delay, entropy, sophisticated sampling, ...

Probabilistic approach: trade accuracy for speed
- Extremely efficient compared to computing the exact answer
- Fits in SRAM, 1 access per packet
- Probabilistic guarantees (bounded error)

Bloom filters [1]

A space-efficient data structure to test set membership, based on hashing (e.g., pseudo-random hash functions).

Example of usage in network monitoring: replace a hash table to check whether a flow has already been seen (the definition of a flow is flexible).

Advantages:
- Needs little memory (fits in SRAM) compared to a hash table

Limitations:
- False positives are possible
- Removals are not possible (counting variants can support them)

[1] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), 1970.

Bloom filters: parameters
- k: number of hash functions
- m: size of the bitmap
- p: false positive rate
- n: maximum number of elements in the filter

Figure: example of a Bloom filter [2]

[2] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1(4), 2005.
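These parameters can be made concrete with a minimal sketch in Python. The SHA-256-based salted hashing and the 5-tuple flow key below are illustrative choices, not part of the original slides:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array and k hash functions."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)       # one byte per bit, for clarity

    def _indexes(self, item):
        # k pseudo-random positions derived from salted SHA-256 digests
        for salt in range(self.k):
            h = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i] = 1

    def __contains__(self, item):
        # Never a false negative; false positives occur with probability p
        return all(self.bits[i] for i in self._indexes(item))

bf = BloomFilter(m=8192, k=4)
flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")      # a 5-tuple flow key
bf.add(flow)
print(flow in bf)                                     # True: flow already seen
print(("10.0.0.3", "10.0.0.4", 53, 53, "udp") in bf)  # False, almost surely
```

Note how the limitations above show up directly: a lookup can only report "possibly seen" or "definitely not seen", and clearing the bits of one flow could erase evidence of another.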

Direct bitmaps (linear counting) [3]

A space-efficient algorithm to count the number of unique items, e.g., the number of flows over a fixed time interval.

Basic idea:
- Each flow (and all its packets) hashes to one bit position
- Counting the number of 1s is inaccurate due to collisions, so count the number of unset positions instead
- E.g., 20 KB suffice to count 1M flows with 1% error

Estimation formulas, with b the bitmap size and n the number of flows:
- Probability that a flow hashes to a given bit: p = 1/b
- Probability that no flow hashes to a given bit: p_z = (1 - p)^n ≈ e^(-n/b)
- Expected number of unset bits: E[z] = b·p_z ≈ b·e^(-n/b)
- Estimated number of flows, with z the observed number of unset bits: n̂ = b·ln(b/z)

[3] K.-Y. Whang et al. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2), 1990.
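The estimator n̂ = b·ln(b/z) can be checked with a short simulation; a sketch, where the bitmap size and synthetic flow keys are arbitrary:

```python
import hashlib, math

def linear_count(flow_keys, b=8192):
    """Direct bitmap (linear counting): hash every flow to one bit and
    estimate the number of distinct flows from the unset positions."""
    bitmap = bytearray(b)
    for key in flow_keys:
        bitmap[int(hashlib.sha256(str(key).encode()).hexdigest(), 16) % b] = 1
    z = b - sum(bitmap)             # observed number of unset bits
    return b * math.log(b / z)      # n-hat = b * ln(b/z)

# 5000 distinct flows, each observed several times (repeats change nothing,
# since every packet of a flow hashes to the same position)
flows = [f"flow-{i}" for i in range(5000)] * 3
print(linear_count(flows))  # close to 5000
```

With b = 8192 and n = 5000 the load factor is about 0.61, well inside the regime where linear counting stays within a few percent of the true count.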

Bitmap variants [4]

Direct bitmaps scale linearly with the number of flows. Variants address this: virtual, multiresolution, adaptive, triggered bitmaps, ...

[4] C. Estan, G. Varghese, M. Fisk. Bitmap algorithms for counting active flows on high speed links. IEEE/ACM Trans. Netw., 14(5), 2006.

Bitmaps over sliding windows

Timestamp Vector (TSV) [5]
- Vector of timestamps (instead of bits)
- O(n) query cost

Countdown Vector (CDV) [6]
- Vector of small timeout counters (instead of full timestamps)
- Independent query and update processes
- O(1) query cost

[5] H. Kim, D. O'Hallaron. Counting network flows in real time. In Proc. of IEEE Globecom, Dec. 2003.
[6] J. Sanjuàs-Cuxart et al. Counting flows over sliding windows in high speed networks. In Proc. of IFIP/TC6 Networking, May 2009.
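A minimal sketch of the TSV idea, assuming an illustrative hash function and vector size (a real implementation would also correct for collisions, as in linear counting):

```python
import hashlib

class TimestampVector:
    """TSV sketch: a vector of last-seen timestamps instead of bits.
    Updates are O(1); a query scans the whole vector, hence the O(n) cost."""
    def __init__(self, size=8192):
        self.ts = [0.0] * size

    def update(self, flow_key, now):
        i = int(hashlib.sha256(str(flow_key).encode()).hexdigest(), 16) % len(self.ts)
        self.ts[i] = now

    def query(self, now, window):
        # Positions touched within the last `window` seconds; as with direct
        # bitmaps, collisions make this an underestimate for many flows.
        return sum(1 for t in self.ts if t > 0 and now - t <= window)

tsv = TimestampVector()
tsv.update("flow-a", now=100.0)
tsv.update("flow-b", now=115.0)
print(tsv.query(now=120.0, window=30))  # 2: both flows seen inside the window
print(tsv.query(now=140.0, window=30))  # 1: flow-a has aged out
```

The CDV trades the full timestamps for small countdown counters decremented by an independent process, which is what removes the O(n) scan from the query path.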

CDV and TSV performance

Evaluation on a 30-min trace (271 Mbps, 1 query/s, 50K flows/10 s to 1.8M flows/min) [7]. Figures: relative error (average and 95th percentile, for windows w = 10 s, 300 s, and 600 s) as a function of the counter initialization value; memory (bytes) and memory accesses per second for TSV, TSV (extension), and CDV as a function of the window size.

[7] A hash table would require several MBs with these settings.

Outline
1. Network monitoring
2. Addressing the technological barriers
3. Use cases
   - Load shedding
   - Lossy difference aggregator
4. Addressing the social barriers

Overload problem

Previous solutions focus on a particular metric; they are not valid for any (arbitrary) monitoring application.

Monitoring systems are prone to dramatic overload situations:
- Link speeds, anomalous traffic, the bursty nature of traffic, ...
- Complexity of traffic analysis methods

Overload situations lead to uncontrolled packet loss, with a severe and unpredictable impact on the accuracy of applications... precisely when results are most valuable!

Load shedding scheme: efficiently handle extreme overload situations (over-provisioning is not feasible).

Case study: Intel CoMo

CoMo (Continuous Monitoring) [8]:
- Open-source passive monitoring system
- Framework to develop and execute network monitoring applications
- Open (shared) network monitoring platform
- Traffic queries are defined as plug-in modules written in C and contain complex computations

Traffic queries are black boxes, with arbitrary computations and data structures, so load shedding cannot use knowledge of the queries.

[8] http://como.sourceforge.net

Load shedding approach [9]

Working scenario: a monitoring system supporting multiple arbitrary queries, with a single resource, CPU cycles.

Approach: real-time modeling of the queries' CPU usage
1. Find the correlation between traffic features and CPU usage (features are query-agnostic, with deterministic worst-case cost)
2. Exploit the correlation to predict CPU load
3. Use the prediction to decide the sampling rate

[9] P. Barlet-Ros, G. Iannaccone, J. Sanjuàs-Cuxart, and J. Solé-Pareta. IEEE/ACM Transactions on Networking, 2010.

System overview

Figure: prediction and load shedding subsystem

Traffic features vs. CPU usage

Figure: CPU usage over time compared to the number of packets, bytes, and 5-tuple flows (100 s of traffic)

Traffic features vs. CPU usage

Figure: CPU usage versus the number of packets per batch, grouped by the number of new 5-tuple hashes (< 500, 500 to 700, 700 to 1000, >= 1000)

Prediction methodology [10]

Multiple Linear Regression (MLR):

Y_i = β_0 + β_1·X_1i + β_2·X_2i + ... + β_p·X_pi + ε_i,   i = 1, 2, ..., n

- Y_i: n observations of the response variable (measured cycles)
- X_ji: n observations of the p predictors (traffic features)
- β_j: p regression coefficients (unknown parameters to estimate)
- ε_i: n residuals (OLS minimizes the SSE)

Feature selection: a variant of the Fast Correlation-Based Filter (FCBF) removes irrelevant and redundant predictors, which significantly reduces the cost and improves the accuracy of the MLR.

[10] P. Barlet-Ros et al. Load Shedding in Network Monitoring Applications. In Proc. of USENIX Annual Technical Conf., 2007.
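The MLR fit can be sketched with NumPy's least squares on synthetic per-batch data. The feature names, coefficients, and noise level below are invented for illustration; they are not measurements from the paper:

```python
import numpy as np

# Synthetic per-batch observations: traffic features (predictors) and
# measured CPU cycles (response).
rng = np.random.default_rng(0)
n = 200
packets   = rng.integers(1800, 3000, n).astype(float)   # packets/batch
new_flows = rng.integers(300, 1200, n).astype(float)    # new 5-tuple hashes
# Assume a linear ground truth plus noise, as the MLR model posits
cycles = 1e5 + 600.0 * packets + 900.0 * new_flows + rng.normal(0, 5e4, n)

# OLS fit of Y = b0 + b1*X1 + b2*X2 (minimizes the sum of squared residuals)
X = np.column_stack([np.ones(n), packets, new_flows])
beta, *_ = np.linalg.lstsq(X, cycles, rcond=None)

# Predict the CPU cost of an incoming batch from its traffic features
next_batch = np.array([1.0, 2500.0, 800.0])   # [intercept, packets, flows]
predicted = next_batch @ beta
print(predicted)   # close to 1e5 + 600*2500 + 900*800 = 2.32e6 cycles
```

In the real system the fit is updated continuously, and FCBF first prunes the candidate feature set before the regression is solved.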

Load shedding scheme [11]

Prediction and load shedding subsystem:
1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (multi-resolution bitmaps)
3. The most relevant features are selected (using FCBF) to be used by the MLR
4. The MLR predicts the CPU cycles required by each query to run
5. Load shedding is performed to discard a portion of the batch
6. CPU usage is measured (using the TSC) and fed back to the prediction system

[11] P. Barlet-Ros et al. Robust network monitoring in the presence of non-cooperative traffic queries. Computer Networks, 53(3), 2009.
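Step 5 hinges on one decision: how much of the batch to keep. A minimal sketch of that decision, under the simplifying assumption that query cost scales roughly linearly with the sampled traffic (the real system refines this per query and accounts for the shedding overhead itself):

```python
def sampling_rate(predicted_cycles, available_cycles):
    """Keep the largest fraction of the batch whose predicted CPU cost
    fits the budget, assuming cost scales linearly with sampled traffic."""
    if predicted_cycles <= available_cycles:
        return 1.0                                  # no shedding needed
    return available_cycles / predicted_cycles      # shed the excess

# Queries predicted to need 3e9 cycles, but only 2e9 available:
# keep roughly two thirds of the batch.
print(sampling_rate(3e9, 2e9))
```

The feedback loop of step 6 is what keeps this usable: measured cycles correct the prediction so the rate tracks the actual cost of the black-box queries.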

Results: load shedding performance

Figure: stacked CPU usage over a working day (09 am to 05 pm) under predictive load shedding: CoMo cycles, load shedding cycles, query cycles, predicted cycles, and the CPU limit

Results: load shedding performance

Figure: CDF of the CPU usage per batch for the original, reactive, and predictive systems, against the CPU limit per batch

Results: packet loss

Figure: link load and packet drops over time for (a) the original CoMo, (b) reactive load shedding, and (c) predictive load shedding, showing total packets, DAG drops, and unsampled packets

Results: error of the queries

Query               | Original        | Reactive       | Predictive
--------------------|-----------------|----------------|------------
application (pkts)  | 55.38% ±11.80   | 10.61% ±7.78   | 1.03% ±0.65
application (bytes) | 55.39% ±11.80   | 11.90% ±8.22   | 1.17% ±0.76
counter (pkts)      | 55.03% ±11.45   | 9.71% ±8.41    | 0.54% ±0.50
counter (bytes)     | 55.06% ±11.45   | 10.24% ±8.39   | 0.66% ±0.60
flows               | 38.48% ±902.13  | 12.46% ±7.28   | 2.88% ±3.34
high-watermark      | 8.68% ±8.13     | 8.94% ±9.46    | 2.19% ±2.30
top-k destinations  | 21.63 ±31.94    | 41.86 ±44.64   | 1.41 ±3.32

One-way delay measurement

Traditional approaches are expensive:
- Probing traffic (intrusive)
- Trajectory sampling: constant overhead per sample

Alternative: the Lossy Difference Aggregator (LDA) [12]
- Send only sums of timestamps
- Deal with packet loss (sampling + partitioning the input stream)

[12] R. Kompella et al. Every microsecond counts: tracking fine-grain latencies with a lossy difference aggregator. SIGCOMM, 2010.

Lossy Difference Aggregator (LDA)

Running example: host a sends packets with timestamps 1, 2, ..., 5; they reach host b through the network at times 2, 3, ..., 6 (delay = 1).

Naive approach: average the per-packet differences, ((2-1) + (3-2) + ...)/n = 1. This requires sending n timestamps.

LDA idea: the same average can be computed from the sums alone, ((2 + 3 + ...) - (1 + 2 + ...))/n = (Σt_b - Σt_a)/n = 1, so each side only keeps a timestamp sum and a packet count: a stores <15, 5>, b stores <20, 5>, and the result is (20 - 15)/5 = 1, sending just one timestamp sum.

Packet loss: if a packet is lost, b might store <17, 4> while a stores <15, 5>. The mismatch in packet counts invalidates the sums, so the scheme must protect against packet loss.

Solution: hash packets into several buckets, each holding its own <sum, count> pair. For example, a holds buckets <1,1>, <2,1>, <7,2>, <5,1> (total <15, 5>) and b holds <2,1>, <3,1>, <9,2>, <6,1> (total <20, 5>), giving (20 - 15)/5 = 1. If the packet in the second bucket is lost, b holds <0,0> there; only that bucket's counts mismatch, so it is discarded on both sides, leaving totals <13, 4> and <17, 4> and the estimate (17 - 13)/4 = 1. The full LDA additionally attaches a sampling stage to each bucket.
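The bucketed scheme can be sketched in a few lines of Python; the bucket count, hash choice, and loss pattern below are illustrative:

```python
import hashlib

class LDAEndpoint:
    """One side of an LDA: per-bucket (timestamp sum, packet count),
    with the bucket chosen by hashing the packet identifier."""
    def __init__(self, buckets=8):
        self.sums = [0.0] * buckets
        self.cnts = [0] * buckets

    def record(self, pkt_id, timestamp):
        b = int(hashlib.sha256(str(pkt_id).encode()).hexdigest(), 16) % len(self.sums)
        self.sums[b] += timestamp
        self.cnts[b] += 1

def estimate_delay(sender, receiver):
    # Use only buckets whose counts match on both sides: a mismatch means
    # the bucket saw a loss, so its timestamp sums are invalid.
    diff = matched = 0.0
    for i in range(len(sender.sums)):
        if sender.cnts[i] == receiver.cnts[i] and sender.cnts[i] > 0:
            diff += receiver.sums[i] - sender.sums[i]
            matched += sender.cnts[i]
    return diff / matched

a, b = LDAEndpoint(), LDAEndpoint()
for pkt in range(1, 21):              # packets sent at t = 1..20
    a.record(pkt, float(pkt))
    if pkt != 3:                      # packet 3 is lost inside the network
        b.record(pkt, pkt + 1.0)      # every survivor arrives 1s later
print(estimate_delay(a, b))           # 1.0
```

Discarding a mismatched bucket loses all the packets that hashed to it, which is exactly the tradeoff the sampling-rate analysis on the next slide dimensions.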

Dimensioning the packet sampling rate [13]

The packet sampling rate (PSR) p presents a classical tradeoff: the higher it is, the more packets the LDA aggregates, but the higher the chance of counter invalidation. We wish to maximize the expected effective sample size

E[S] = (1 - r)·p·n·e^(-n·r·p/b)

where n is the number of packets, r the loss rate, and b the number of buckets. This is maximized with p = b/(n·r). The original analysis obtained p ≈ 0.5·b/(n·r); our analysis doubles the sampling rate and improves the sample size by 20%. This makes the LDA the best algorithm in terms of network overhead up to 25% loss.

[13] J. Sanjuàs-Cuxart et al. Validation and improvement of the Lossy Difference Aggregator to measure packet delays. TMA, 2010.
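The claimed maximizer can be verified numerically; a sketch with arbitrary parameter values:

```python
import math

def expected_sample_size(p, n, r, b):
    """E[S] = (1 - r)*p*n*e^(-n*r*p/b): packets the LDA keeps when sampling
    at rate p, discounted by the chance of bucket invalidation by losses."""
    return (1 - r) * p * n * math.exp(-n * r * p / b)

n, r, b = 1_000_000, 0.05, 1024   # packets, loss rate, buckets (arbitrary)
p_opt = b / (n * r)               # the claimed maximizer p* = b/(n*r)
best = expected_sample_size(p_opt, n, r, b)

# No nearby sampling rate does better than p*...
for p in (0.5 * p_opt, 0.9 * p_opt, 1.1 * p_opt, 2.0 * p_opt):
    assert expected_sample_size(p, n, r, b) < best
# ...and the earlier choice p = 0.5*b/(n*r) yields about 0.82x the sample
# size (the ratio is 0.5*e^0.5), consistent with the ~20% improvement.
print(expected_sample_size(0.5 * p_opt, n, r, b) / best)
```

Analytically, dE[S]/dp vanishes where 1 - n·r·p/b = 0, which is where p* = b/(n·r) comes from.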

Outline
1. Network monitoring
2. Addressing the technological barriers
3. Use cases
4. Addressing the social barriers
   - The CoMo-UPC initiative

The problem

Several researchers are working on anomaly detection (AD) and traffic classification (TC), and real packet traces are needed to test novel AD and TC methods. Several AD and TC algorithms require:
- Unanonymized IP addresses (or prefix-preserving anonymization)
- Payload inspection (e.g., IDS, DPI, ...)

Traditional solution: anonymized traffic traces (e.g., NLANR, CAIDA, CRAWDAD, ...).

Anonymization is not the right solution:
- Data owners: privacy concerns
- Researchers: not enough data
- Lack of recent publicly available traces

CoMo-UPC [14]

Move the code to the data, instead of publishing anonymized data traces. This significantly lowers the privacy concerns:
- Traffic data do not leave the provider's premises
- Data providers keep the ownership of the data
- The source code can be inspected by the data owner

Researchers have (blind) access to unanonymized traffic: IP addresses and payloads can be processed... but not stored or exported.

The CoMo system is based on this model: AD and TC methods are implemented as CoMo modules.

[14] http://monitoring.ccaba.upc.edu/como-upc

UPC traffic

UPC access link: 1 GE full-duplex (≈900 Mbps, 60K flows/s), connecting 40 departments and 25 faculties (10 campuses). 12 traffic traces are also available.