Limitations of Packet Measurement

Similar documents
Network Monitoring and Management NetFlow Overview

Introduction to Netflow

NfSen Plugin Supporting The Virtual Network Monitoring

Network congestion control using NetFlow

Large-Scale TCP Packet Flow Analysis for Common Protocols Using Apache Hadoop

NetFlow Performance Analysis

J-Flow on J Series Services Routers and Branch SRX Series Services Gateways

Connecting North Carolina s Future Today. Application Monitoring: ClassScape Case Study. NCSU Centennial Networking Lab

Watch your Flows with NfSen and NFDUMP 50th RIPE Meeting May 3, 2005 Stockholm Peter Haag

Network Management & Monitoring

Network Monitoring and Traffic CSTNET, CNIC

Netflow Overview. PacNOG 6 Nadi, Fiji

Beyond Monitoring Root-Cause Analysis

GPU-Based Network Traffic Monitoring & Analysis Tools

Practical Experience with IPFIX Flow Collectors

Internet Infrastructure Measurement: Challenges and Tools

Research on Errors of Utilized Bandwidth Measured by NetFlow

NetFlow-Lite offers network administrators and engineers the following capabilities:

Monitoring of Tunneled IPv6 Traffic Using Packet Decapsulation and IPFIX

Case Study: Instrumenting a Network for NetFlow Security Visualization Tools

First Midterm for ECE374 03/09/12 Solution!!

Network forensics 101 Network monitoring with Netflow, nfsen + nfdump

MLPPP Deployment Using the PA-MC-T3-EC and PA-MC-2T3-EC

Transport Layer Protocols

WAN Optimization, Web Cache, Explicit Proxy, and WCCP. FortiOS Handbook v3 for FortiOS 4.0 MR3

Outline. The Problem BGP/Routing Information. Netflow/Traffic Information. Conclusions

Datagram-based network layer: forwarding; routing. Additional function of VCbased network layer: call setup.

Network Traffic Analysis using HADOOP Architecture. Zeng Shan ISGC2013, Taibei

An overview of traffic analysis using NetFlow

Beyond Monitoring Root-Cause Analysis

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

nfdump and NfSen 18 th Annual FIRST Conference June 25-30, 2006 Baltimore Peter Haag 2006 SWITCH

Network Traffic Monitoring & Analysis with GPUs

NetStream (Integrated) Technology White Paper HUAWEI TECHNOLOGIES CO., LTD. Issue 01. Date

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

and reporting Slavko Gajin

NetFlow Tracker Overview. Mike McGrath x ccie CTO mike@crannog-software.com

NetFlow Aggregation. Feature Overview. Aggregation Cache Schemes

Flow Analysis Versus Packet Analysis. What Should You Choose?

Port evolution: a software to find the shady IP profiles in Netflow. Or how to reduce Netflow records efficiently.

Advanced Computer Networks IN Dec 2015

The Value of Flow Data for Peering Decisions

CS514: Intermediate Course in Computer Systems

Securing and Accelerating Databases In Minutes using GreenSQL

Network Security Monitoring and Behavior Analysis Pavel Čeleda, Petr Velan, Tomáš Jirsík

Internet Management and Measurements Measurements

The Ecosystem of Computer Networks. Ripe 46 Amsterdam, The Netherlands

Cisco IOS Flexible NetFlow Technology

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

Gaining Operational Efficiencies with the Enterasys S-Series

Viete, čo robia Vaši užívatelia na sieti? Roman Tuchyňa, CSA

plixer Scrutinizer Competitor Worksheet Visualization of Network Health Unauthorized application deployments Detect DNS communication tunnels

OpenFlow Based Load Balancing

Understanding Slow Start

Software Defined Networks

Network Traffic Monitoring and Analysis with GPUs

Final for ECE374 05/06/13 Solution!!

Question: 3 When using Application Intelligence, Server Time may be defined as.

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations

PANDORA FMS NETWORK DEVICE MONITORING

NetFlow Analysis with MapReduce

Analyzing 6LoWPAN/ZigBeeIP networks with the Perytons Protocol Analyzer May, 2012

Instructions for Access to Summary Traffic Data by GÉANT Partners and other Organisations

Internet Content Distribution

NetFlow/IPFIX Various Thoughts

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

First Midterm for ECE374 02/25/15 Solution!!

Big Data Analytics for Net Flow Analysis in Distributed Environment using Hadoop

From traditional to alternative approach to storage and analysis of flow data. Petr Velan, Martin Zadnik

Steelcape Product Overview and Functional Description

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Lab 5 Explicit Proxy Performance, Load Balancing & Redundancy

Outline. VL2: A Scalable and Flexible Data Center Network. Problem. Introduction 11/26/2012

Firewalls P+S Linux Router & Firewall 2013

Scala Storage Scale-Out Clustered Storage White Paper

Application Latency Monitoring using nprobe

Application Delivery Networking

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

A Link Load Balancing Solution for Multi-Homed Networks

Protocols. Packets. What's in an IP packet

Internet Working 5 th lecture. Chair of Communication Systems Department of Applied Sciences University of Freiburg 2004

Improving DNS performance using Stateless TCP in FreeBSD 9

How To Create A Network Monitoring System (Flowmon) In Avea-Tech (For Free)

Network Monitoring On Large Networks. Yao Chuan Han (TWCERT/CC)

Masters Project Proxy SG

A Small-time Scale Netflow-based Anomaly Traffic Detecting Method Using MapReduce

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

IP - The Internet Protocol

Wide Area Networks. Learning Objectives. LAN and WAN. School of Business Eastern Illinois University. (Week 11, Thursday 3/22/2007)

Securing and Monitoring BYOD Networks using NetFlow

Behavior Analysis of TCP Traffic in Mobile Ad Hoc Network using Reactive Routing Protocols

Stateful Firewalls. Hank and Foo

Internet Packets. Forwarding Datagrams

Transcription:

Limitations of Packet Measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing faster: Move to kernel space Distributed collection & processing Dedicated hardware

Sampling For packet-based measurements: Systematic sampling (every nth) (bad!) Random sampling: n-to-n sampling Dynamic sampling: e.g. sample big flows more often For flow-based measurements, too! Packet sampling Flow sampling SURFnet: netflow with 1:100 packet sampling

Estimating Distributions From Sample Statistics How to reverse the effects of sampling? Example: 1. Create flows based on packets sampled with ratio 1:10 2. Observe flows with byte size b 1,b 2,b 3, 3. What were the original byte sizes b 1,b 2,b 3,? 4. Estimation: b i = 10 b 1 5. Right?

Sampling Error How does sampling influence the results? (sampling error) Results are only statistical estimations with a certain confidence Depends on many factors: Nature of data (packets or flows) Sampling method (periodic, random, ) Sampling rate Characteristics of the sampled process (distribution, )

Sampled Flows A flow could be splitted into two: Original packet sequence: a b c d e Assume: Time between b and d larger than timeout used by flow collector to determine end of flow If c is not sampled: two flows ab and de! High sampling ratio: Small flows have a significant probability not to be sampled at all Result is biased towards large flows Periodic sampling: Fails if periodicities in the data Example: back-to-back communications

Example: Flows with Periodic Sampling (from: Estimating Flow Distributions from Sampled Flow Statistics, 2003)

Example: Sampling Small Flows (from: Estimating Flow Distributions from Sampled Flow Statistics, 2003)

Limitations of packet measurement Collect and process less information: Only collect packet headers, not payload Ignore single packets (aggregate) Ignore some packets (sampling) Make collection and processing faster: Move to kernel space Distributed collection & processing Dedicated hardware

Dedicated Hardware Exist for packet measurements and flow measurements Jobs that could be handled by hardware: Protocol analysis (analysis of IP packet payload) Filtering (as a pre-processing step) Analysis (run analysis algorithms over the packet) For flows: building flow records

How Flow Exporters Work 1 When packet arrives: 1. Calculate hash value for packet 2. Lookup in hash table whether flow exists Yes: update flow information (number of bytes, packets, ) No: create new flow entry Entry from table is removed and flow record is exported if Inactivity Timeout: no new packets for a flow since x seconds Activity Timeout: flow is active but older than y seconds Flow record Flow

How Flow Exporters Work 2 Usually more complex: Out of memory: forced flow export Don t export single flow records: wait until several flows records are ready for export Cache levels Can be implemented in software or hardware Loss of data: Packet arrival rate too high Flow export rate too high

Example: Hardware Flow Exporter From: www.invea-tech.com Hardware-accelerated flow exporter FlowMon: The accelerated model can capture 6 million packets/s providing full 4 x 1 Gbps throughput under all conditions.

Storing and Processing Traffic Measurements

Storing and Processing Traffic Measurements UT, 2007: Average bandwidth: 652 Mb/s Maximum bandwidth: 1010 Mb/s Volume (packets): 21.6 TB in 2 days Flows (no sampling): 983 Million flows in 2 day Surfnet, 2007: Average bandwidth: 7730 Mb/s Maximum bandwidth: 10500 Mb/s Volume (packets): 162.3 TB in 2 days Flows (1:100 sampling): 523.7 Million flows in 2 days

Online vs. Offline Analysis Online: process data continuously, faster than arrival rate. Often, only a portion of the data (last n seconds) or no data at all is stored Simple example: protocol usage statistics Offline: store data and analyze it afterwards Needed if restriction on time (analysis algorithm too complex) or space (large amounts of data to be analyzed)

Nfsen Combination of offline and online approach Graphical web-based front end for nfdump netflow tools Flow data exported by router(s) is collected by capture deamon(s) and stored in round robin database(s) in pieces of several minutes Nfdump/nfsen allow to browse, search, filter, etc. (syntax similar to tcpdump) Profiles supported Extendable by plugins

Nfsen: Screenshot (from: nfsen.sourceforge.net)

Nfsen: Plugins

Relational Databases Advantages: Standard software Performant server-side processing of queries (stored procedures) Use existing interfaces to programming languages Example of SQL query: SELECT ipv4_dst,count(*) c FROM data WHERE port_dst=80 AND protocol=6 GROUP BY ipv4_dst ORDER BY c DESC

Relational Databases: Example Netflow data of UT measurement (2 days) stored in MySQL database MyISAM engine (transactions, constraints, etc. not needed) Table: Columns: start/end time, IP src/dst, port src/dst, protocol, Data size: 49.6 GB ~ 53 Bytes/row In total 10 indexes: start time, src, dst, Total index size: 86.9 GB

Distributed Relational Databases MySQL cluster (from: MySQL reference manual)

Stream databases Continuous sequence of data instead of tables Queries operate on streams and return a table by applying a sliding window to the input or again a stream. Example: GSQL query in Gigascope SELECT tb, srcip, sum(len) FROM IPv4 WHERE protocol=6 GROUP BY time/60 as tb, srcip HAVING count(*)>5 Speed: several Gbs

Temporal Aggregation Temporal aggregation is a basic operation in data processing Example: A plot showing the number of transfered bytes for each month of the year Expensive operation if all data (every packet, every flow, ) is stored Simple solution: calculate aggregates (sums, ) online and only store the results But: which granularity to use? Fine: too much data Coarse: usefulness limited

Temporal Aggregation Using Multiple Granularities From: Temporal Aggregation over Data Streams using Multiple Granularities, Zhang et al., 2002. Idea: Recent data more interesting than old data use different granularities for old and recent data Maintain different indexes for each segment, integrated as a unified index

Adaptive Aggregation No fixed aggregation granularity A new measurement record is created if value aggregated so far differs from last measurement record by a given Example: (from: Adaptive Distributed Monitoring with Accuracy Objectives, 2006)

Hadoop, Google s MapReduce Designed to run on hundred or thousands of nodes Execution: 1. Data is split into pieces and distributed over nodes ( splits ) 2. Worker processes read data from splits and apply the map function. Result: (key,value) pairs. 3. Worker processes sort the intermediate data according to the key and apply the reduce function to each set of values with same key. Result: output data. 4. (Do another MapReduce call)

MapReduce: Example Data: flow data Goal: count number of flows per source IP address Map: Emit a pair (key=source IP,value=1) for each flow record in a split Reduce: For all pairs with same source IP, sum up the values. Result data: list of pairs (source IP,total count)

Hadoop, Google s MapReduce: Overview (from: MapReduce: Simplified Data Processing on Large Clusters, OSDI 04)

Many other approaches P2P-based distributed analysis Flow record query language (Jacobs University Bremen)

Network Tomography and Geolocation

Network Tomography Learn the internal characteristics of a network from external observations Why in that way? Internal data is often not available. Example:Internet Decentralized management Splitted up in subnetworks Examples: Bandwidth estimation Topology identification

Bandwidth Estimation Applications of bandwidth estimation: Network administrator: find bottlenecks, find corrupt lines, Video streaming: adapt compression rate P2P: find best peer

Bandwidth Estimation with Packet Pair Probing

Topology Identification Determine the topology of the network Applications of topology identification: Optimize routing Optimize virtual overlay networks (e.g. P2P) Optimize data delivery: What is the closest cache server to this particular client? Attacks Simple way: traceroute (only works if the network cooperates)

Topology Identification with Sandwich Probe Measurement (from: Maximum Likelihood Network Topology Identification from Edge-based Unicast Measurements, 2002)

Topology Identification with Sandwich Probe Measurement Every link shared by the paths from the start node to the two destination nodes will increase d Approach: 1. Make many measurements from one start node to several destination nodes 2. Use statistical techniques to calculate the most probable tree connecting the start nodes with the destination nodes Assumption: cross-traffic has zero-mean effect

Geolocation Identify the real-world geographical location of a network host Sources of information: IP address, GPS (if available), WiFi Applications: Provide location-specific information to user (where is the next...?) Tracking (parcel tracking, criminal prosecution, ) Advertisement Simple approach: map IP address to geographical location

Geolocation with CBG Idea: measure delays to reference hosts with known position (from: Constraint-Based Geolocation of Internet Hosts, 2004)

WiFi-based Geolocation of Internet Hosts Approach: Companies collect information on geographical positions of WiFi networks (Skyhook, Google Streetview cars, ) Client sends list of WLANs that it currently receives to the service provider Geographical position is calculated + also works inside buildings + no special hardware (GPS) required - requires dense deployment of WiFi