Scalable High Resolution Network Monitoring
Open Cloud Day, Winterthur, 16th of June, 2016
Georgios Kathareios, Ákos Máté, Mitch Gusat
IBM Research GmbH, Zürich Laboratory
{ios, kos, mig}@zurich.ibm.com
Traditional Network Monitoring
Network monitoring is essential for:
- Performance tuning
- Security
- Reliability and troubleshooting
Traditional monitoring tools:
- SNMP
- sFlow
- NetFlow / IPFIX
- OpenFlow statistics
Insufficient at high speeds:
- Data is delivered through the control plane (ms to s granularity)
- Short-lived, volatile traffic patterns can affect networking performance and stability, and eventually the user experience
- MapReduce congestive episodes last in the range of 100s of µs (the TCP incast problem)
Traditional monitoring is too slow for the speed of modern networks.
zmon: architectural goals
- Goal: implement high-resolution load monitoring for 10/25/40/100 Gbps networks
- What: scalable, global, continuous network monitoring
- Where: in the data plane; the switch CPU is not involved
- How: in a non-intrusive manner, re-using available data-plane sampling and mirroring mechanisms
- Targets: large-scale datacenter and IXP networks
- Project: http://www.h2020-endeavour.eu
The Heatmap method - overview
1. Layer-2 load sampling
- Monitor the switch via a Layer-2 sampler
- Sample all output queue occupancies
2. Heatmap construction
- Queue samples are uploaded to a controller
- The datacenter network topology is assumed to be available
- Network state snapshots are produced
3. Synchronization (time-coherency)
- The result is a stream of snapshots over time, i.e. a video
- The frame rate depends on: (1) the speed of monitoring data collection, (2) the speed of monitoring data processing
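As a hedged illustration of steps 2-3, the sketch below bins per-queue occupancy samples into fixed-length frames and emits one snapshot (a map of switch/port to occupancy) per frame. The record layout and the 1 ms frame length are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict

FRAME_US = 1000  # frame length in microseconds (illustrative; sets the "video" frame rate)

def build_snapshots(samples):
    """samples: iterable of (timestamp_us, switch_id, port_id, queue_occupancy_bytes).
    Returns {frame_index: {(switch_id, port_id): latest_occupancy}} -- one heatmap per frame."""
    frames = defaultdict(dict)
    for ts_us, sw, port, occ in samples:
        frame = ts_us // FRAME_US
        # keep the most recent sample of each queue within the frame
        frames[frame][(sw, port)] = occ
    return frames

# Example: three samples from two switches falling into two consecutive frames
samples = [(120, "sw1", 3, 40_000), (980, "sw2", 1, 12_000), (1500, "sw1", 3, 64_000)]
for frame, snapshot in sorted(build_snapshots(samples).items()):
    print(frame, snapshot)
```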
The Heatmap method - the QCN protocol
Quantized Congestion Notification (QCN):
- Layer-2 congestion management scheme defined in the IEEE DCB 802.1Qau standard
- Objective: adapt the source injection rate to the available network capacity
Congestion detection at the switch:
1. Sample packets with a probability that depends on the severity of the congestion (output queue size)
2. Calculate a feedback value based on the queue occupancy
3. Send a congestion notification message (CNM) to the traffic source
The rate limiter at the traffic source follows two control laws:
- Reduces the TX rate proportionally to the feedback
- Increases the rate autonomously, based on a byte counter and a timer
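A minimal sketch of the congestion-point feedback computation, following the usual 802.1Qau formulation Fb = -(Q_off + w * Q_delta). The constants (w, the equilibrium set-point Q_eq) and the mapping from feedback to sampling probability are illustrative assumptions, not values from the talk.

```python
# Sketch of a QCN congestion point's feedback calculation (802.1Qau style).
# Constants below (W, Q_EQ, sampling range) are illustrative, not from the talk.

W = 2.0          # weight of the queue-growth term
Q_EQ = 26_000.0  # equilibrium queue set-point in bytes (illustrative)

def qcn_feedback(q_now: float, q_old: float) -> float:
    """Fb = -(Q_off + w * Q_delta); a more negative Fb means more severe congestion."""
    q_off = q_now - Q_EQ        # how far the queue sits above the set-point
    q_delta = q_now - q_old     # how fast the queue is growing
    return -(q_off + W * q_delta)

def sampling_probability(fb: float) -> float:
    """More severe congestion -> sample packets more often (1%..10% range, illustrative)."""
    if fb >= 0:
        return 0.01
    return min(0.10, 0.01 + 0.09 * min(1.0, -fb / 64_000.0))

# Example: the output queue grew from 30 KB to 40 KB since the last sample
fb = qcn_feedback(40_000.0, 30_000.0)
print(fb, sampling_probability(fb))
```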
The Heatmap method - repurposing QCN
- For monitoring, we use only the congestion-detection part of QCN (the rate limiter is not needed)
- As an IEEE standard, it should already be implemented in hardware in most switches (non-intrusive)
- CNMs are created by the switch and gathered at the end-nodes of the network (distributed, scalable)
- The switch CPU is not involved: monitoring stays in the data plane (fast, µs-scale sampling)
- Prototype implemented in simulation: detection of congestion trees within tens of µs
[Figure: heatmap snapshots at T+0.07 ms, T+0.18 ms, T+1.74 ms]
Anghel, Andreea, Robert Birke, and Mitch Gusat. "Scalable High Resolution Traffic Heatmaps: Coherent Queue Visualization for Datacenters." Traffic Monitoring and Analysis, Springer Berlin Heidelberg, 2014, pp. 26-37.
The Heatmap method - challenges in real life
- Trident switches don't support QCN in hardware
- Where implemented, it is mostly a reduced version in firmware
- Trident-2 and later do implement a proper hardware QCN FSM
More challenges:
- Timestamping of CNMs must take place in the switch
- CNMs are scattered to the network edge: this provides scalability, but makes aggregation harder
- Each end-node needs a collector/aggregator for CNMs
- A central collector aggregates all CNMs for heatmap production
- All operations are at line speed, so aggregation/filtering must run at the same speed
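A minimal sketch of an end-node collector that pre-aggregates CNMs before forwarding them to the central collector, assuming each CNM carries a switch-side timestamp, the congested queue identifier, and the feedback value. Keeping only the worst (most negative) feedback per queue per frame is one possible filtering policy, not necessarily the prototype's.

```python
from collections import defaultdict

FRAME_US = 1000  # should match the heatmap frame length used by the central collector

class CnmAggregator:
    """Per end-node CNM pre-aggregation: keep the worst feedback per (switch, queue) per frame."""
    def __init__(self):
        self.worst = defaultdict(dict)   # frame -> {(switch, queue): most negative feedback}

    def on_cnm(self, ts_us, switch_id, queue_id, feedback):
        frame = ts_us // FRAME_US
        key = (switch_id, queue_id)
        prev = self.worst[frame].get(key)
        if prev is None or feedback < prev:    # more negative = more severe congestion
            self.worst[frame][key] = feedback

    def flush(self, frame):
        """Return the aggregated records for a closed frame, to be sent upstream."""
        return self.worst.pop(frame, {})

agg = CnmAggregator()
agg.on_cnm(150, "sw1", 3, -12.0)
agg.on_cnm(700, "sw1", 3, -30.0)   # same queue, worse congestion within the same frame
print(agg.flush(0))                # {('sw1', 3): -30.0}
```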
Another sampling approach: Planck
Work performed at Brown and Rice universities, IBM Research Austin and Brocade.
Architectural goals:
- Obtain very fast samples across all switches in the network
- Use the samples to infer the global state of the network (flow throughput, flow paths, port congestion state)

System                            Measurement speed (ms)
Hedera (NSDI '10)                 5,000
DevoFlow polling (SIGCOMM '11)    500 - 15,000
Mahout polling (INFOCOM '11)      190
sFlow/OpenSample (ICDCS '14)      100
Helios (SIGCOMM '10)              77.4
Planck (SIGCOMM '14)              < 4.2

Rasley, Jeff, Brent Stephens, Colin Dixon, Eric Rozner, Wes Felter, Kanak Agarwal, John Carter, and Rodrigo Fonseca. "Planck: Millisecond-scale Monitoring and Control for Commodity Networks." ACM SIGCOMM Computer Communication Review 44.4 (2014).
Large parts of this and the following slides were contributed by the authors of the paper.
Port mirroring in Planck
- Planck leverages the port-mirroring function of modern switches
- Port mirroring copies all packets, e.g. those going out of a port, to a designated mirror port
- Planck mirrors all ports to a single mirror port: intentional oversubscription
- The drop behavior at the oversubscribed mirror port approximates sampling
- Result: sampling in the data plane
[Diagram: switch carrying production traffic, with all ports mirrored to a single mirror port]
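A back-of-the-envelope sketch of why oversubscribed mirroring approximates sampling: when the aggregate mirrored rate exceeds the mirror port's capacity, the excess is dropped, so roughly a capacity/aggregate fraction of packets survives. The uniform-drop model below is an illustration, not Planck's analysis.

```python
def effective_sampling_rate(port_rates_gbps, mirror_port_gbps=10.0):
    """Fraction of mirrored packets expected to survive the oversubscribed mirror port.
    Assumes drops are spread roughly uniformly across the mirrored traffic (illustrative model)."""
    aggregate = sum(port_rates_gbps)
    if aggregate <= mirror_port_gbps:
        return 1.0
    return mirror_port_gbps / aggregate

# Example: 48 ports each carrying 2 Gbps, all mirrored into a single 10 Gbps mirror port
print(effective_sampling_rate([2.0] * 48))   # ~0.104, i.e. roughly 1-in-10 sampling
```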
Planck architecture
- Oversubscribed port mirroring is used as a primitive
- Collectors receive samples from the mirror ports
- netmap is used for fast packet processing (with an ongoing move to Intel DPDK)
- Collectors reconstruct flow information across all flows in the network
- Flow throughput is estimated from TCP sequence numbers
- Collectors can interact with an SDN controller to implement various applications
[Diagram: switches with mirror ports feeding Planck collector instance(s), which interact with an SDN controller]
Inferring throughput
- The sampling function is unknown, while inferring throughput from samples typically relies on understanding the sampling function
- Limited to packets that carry sequence numbers (e.g. TCP)
- Given two samples of the same flow, A at time t_A with sequence number S_A and B at time t_B with sequence number S_B (t_A < t_B):
  throughput = (S_B - S_A) / (t_B - t_A)
- Planck uses an adaptive window-based rate estimator
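A minimal sketch of the estimator implied by the formula above: the throughput between two samples of the same TCP flow is the sequence-number difference divided by the time difference. The 32-bit wrap handling and the function signature are assumptions for illustration.

```python
def tcp_throughput_bps(seq_a, t_a_s, seq_b, t_b_s):
    """Estimate throughput (bits/s) between two samples of the same TCP flow.
    seq_* are TCP sequence numbers (bytes), t_*_s are sample times in seconds, with t_a_s < t_b_s."""
    if t_b_s <= t_a_s:
        raise ValueError("samples must be time-ordered")
    delta_bytes = (seq_b - seq_a) % (1 << 32)   # handle 32-bit sequence-number wrap-around
    return 8.0 * delta_bytes / (t_b_s - t_a_s)

# Example: two samples 1 ms apart covering 125,000 bytes of sequence space -> ~1 Gbps
print(tcp_throughput_bps(1_000_000, 0.000, 1_125_000, 0.001))
```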
Planck challenges
- Mirroring increases the packet drop rate of production traffic, due to the shared buffer
- Mirrored packets hog the shared buffer space, reducing the space available for production traffic
[Diagram: switch with a shared buffer and a collector attached to the mirror port]
Planck challenges (2)
- Planck mirroring is implemented with OpenFlow
- Without OpenFlow, one has to rely on the switch's mirroring implementation, which is not always well implemented
- On Trident switches, dropped mirrored packets also cause the original packets to be dropped
[Diagram: with mirroring disabled, both flows achieve 100% throughput; with mirroring enabled and 50% drops at the mirror port, both flows fall to 50% throughput]
Towards Planck 2.0
- Uses the match-and-mirror capability of commodity switches to reduce mirrored traffic
- Match-and-mirror is also used by Everflow, a Microsoft telemetry tool for troubleshooting datacenter network problems
- With match-and-mirror, we can address the unfairness of sampling between large and small flows
- In addition, overhead is reduced by packet chunking and aggregation
Zhu, Y., Kang, N., Cao, J., Greenberg, A., Lu, G., Mahajan, R., ... & Zheng, H. (2015, August). "Packet-level Telemetry in Large Datacenter Networks." In Proceedings of the 2015 ACM SIGCOMM Conference, pp. 479-491. ACM.
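To illustrate how match-and-mirror can cut mirrored traffic while keeping small flows visible, the sketch below mirrors only packets matching a simple predicate (TCP SYN/FIN/RST, which bound every flow, plus sparse sampling of the rest). The predicate and constants are illustrative examples, not Everflow's or Planck 2.0's actual rule set.

```python
# Illustrative match-and-mirror predicate: mirror only selected packets instead of all traffic.
import random

TCP_FLAG_FIN, TCP_FLAG_SYN, TCP_FLAG_RST = 0x01, 0x02, 0x04
FALLBACK_SAMPLE_1_IN_N = 1024   # occasionally mirror other packets too (illustrative)

def should_mirror(is_tcp: bool, tcp_flags: int) -> bool:
    """Return True if this packet should be copied to the mirror port."""
    if is_tcp and tcp_flags & (TCP_FLAG_FIN | TCP_FLAG_SYN | TCP_FLAG_RST):
        return True                       # flow start/end markers: cheap, and cover small flows too
    return random.randrange(FALLBACK_SAMPLE_1_IN_N) == 0   # sparse sampling of everything else

print(should_mirror(True, TCP_FLAG_SYN))   # True: flow-start packets are always mirrored
```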
Next steps: ingestion pipeline
Monitoring and sampling data, application traces -> Messaging system -> Pre-processing -> Storage -> Machine learning
Monitoring as a Big Data application
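A minimal in-process sketch of such a pipeline, using a local queue as a stand-in for the messaging system and a list as a stand-in for storage; the stage names and record layout are assumptions for illustration.

```python
# Minimal sketch of the ingestion pipeline shape: samples -> messaging -> pre-processing -> storage.
# A real deployment would use a distributed messaging system and a datastore; this is only the shape.
import queue

messaging = queue.Queue()   # stand-in for the messaging system
storage = []                # stand-in for the storage layer feeding later analysis / ML

def produce(sample):
    """Monitoring/sampling sources push raw records into the messaging layer."""
    messaging.put(sample)

def preprocess(sample):
    """Normalize a raw record; here we just tag a congestion flag (illustrative threshold)."""
    sample["congested"] = sample["queue_occupancy"] > 50_000
    return sample

def ingest_all():
    while not messaging.empty():
        storage.append(preprocess(messaging.get()))

produce({"switch": "sw1", "port": 3, "queue_occupancy": 64_000})
produce({"switch": "sw2", "port": 1, "queue_occupancy": 12_000})
ingest_all()
print(storage)
```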
Conclusions
- Legacy monitoring techniques are not adequate for modern networks
- We need high-resolution, non-intrusive monitoring in the data plane
- Two approaches for zmon:
  - Heatmap creation by repurposing the QCN protocol
  - (Ab)use of the traffic-mirroring functionality
- Processing the monitoring data is a Big Data application, and still a challenge