Petr Velan petr.velan@cesnet.cz High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa
Motivation
What is high-density flow monitoring?
Monitor high traffic in as few rack units as possible
Why do we want high-density flow monitoring?
Flow monitoring is deployed on many lines
The number of flow probes is growing
Management and operational costs are growing
One probe per link does not scale
Petr Velan High-Density Network Flow Monitoring 12. 5. 2015 1 / 18
Our Approach
Use our custom-made network interface cards to monitor multiple 10 G links
See what throughput can be achieved in one machine
Test how advanced features of our NICs help the monitoring
Identify performance bottlenecks
Monitoring Setup
The Testbed
The Server
Dell PowerEdge R720 server (2 rack units)
2x E5-2670 v1 CPUs (8 cores, 3 GHz)
64 GB DDR3 RAM (1600 MHz)
Scientific Linux 6.5 with 2.6.32-41 kernel
2x COMBO-80G cards
COMBO-80G NIC
FPGA-based programmable hardware
Two QSFP+ interfaces in 40 G or 4x 10 G Ethernet mode, 80 G per card
PCI-Express gen3 x8 bus (64 Gb/s)
Additional features:
Accurate timestamps
Hash-based packet distribution
Packet trimming
Packet feature extraction into Unified Headers
Our setup allows monitoring of 16x 10 G links
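The hash-based packet distribution feature can be illustrated with a minimal sketch: a hash over the flow key selects a DMA channel, so all packets of one flow reach the same CPU core. This is only an assumed illustration of the concept; the card's actual hardware hash function is not described here.

```python
import hashlib

NUM_CHANNELS = 8  # one DMA channel per CPU core in this setup


def channel_for_packet(src_ip, dst_ip, src_port, dst_port, proto):
    """Pick a DMA channel from a hash of the flow 5-tuple, so every
    packet of the same flow lands on the same core (illustrative only,
    not the COMBO-80G's real hardware hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_CHANNELS
```

Keeping a flow on one core avoids locking and cross-core cache traffic in the flow cache, which is why per-flow distribution matters for multi-core scaling.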
COMBO-80G NIC
[Diagram: 8x 10 GE enters via two QSFP+ ports; the FPGA assigns timestamps, trims packets, and parses Unified Headers; a hash selects the channel, and data (Unified Headers or packets) is transferred by DMA over PCI-Express into RAM for CPU cores 0 to 7]
Flow Exporter Architecture
Multi-threaded design
Utilizes 2N + 1 CPU cores, where N is the number of ring buffers
[Diagram: input threads 1 to N read from ring buffers 0 to N-1, flow cache threads N+1 to 2N aggregate records, and thread 2N+1 performs export]
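The 2N + 1 thread layout can be sketched in miniature (a simplified model with queues standing in for the DMA ring buffers; names and record fields are illustrative, not the exporter's real API):

```python
import queue
import threading

N = 2  # number of ring buffers; the design uses 2N + 1 threads in total

rings = [queue.Queue(maxsize=1024) for _ in range(N)]  # stand-ins for DMA ring buffers
handoff = [queue.Queue() for _ in range(N)]            # input thread -> flow cache thread
export_q = queue.Queue()                               # flow cache threads -> export thread


def input_thread(i):
    """Threads 1..N: drain ring buffer i and hand packets to its flow cache."""
    while True:
        pkt = rings[i].get()
        handoff[i].put(pkt)
        if pkt is None:  # end-of-stream marker
            return


def cache_thread(i):
    """Threads N+1..2N: aggregate packets into flow records by flow key."""
    flows = {}
    while True:
        pkt = handoff[i].get()
        if pkt is None:
            break
        rec = flows.setdefault(pkt["key"], {"packets": 0, "bytes": 0})
        rec["packets"] += 1
        rec["bytes"] += pkt["len"]
    for item in flows.items():
        export_q.put(item)
    export_q.put(None)


def export_thread(results):
    """Thread 2N+1: collect expired flow records from all caches."""
    finished = 0
    while finished < N:
        item = export_q.get()
        if item is None:
            finished += 1
        else:
            results.append(item)


def run(packet_batches):
    """Feed one batch of packets per ring buffer, return the exported flows."""
    results = []
    threads = [threading.Thread(target=input_thread, args=(i,)) for i in range(N)]
    threads += [threading.Thread(target=cache_thread, args=(i,)) for i in range(N)]
    threads.append(threading.Thread(target=export_thread, args=(results,)))
    for t in threads:
        t.start()
    for i, batch in enumerate(packet_batches):
        for pkt in batch:
            rings[i].put(pkt)
        rings[i].put(None)
    for t in threads:
        t.join()
    return dict(results)
```

Because the hardware hash keeps each flow on one ring buffer, every flow cache owns its flows exclusively and no locking is needed between the N cache threads.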
Data Generation
Spirent TestCenter hardware
1x 10 G stream repeated to all 16 interfaces
IPv4 UDP packets
Packet sizes: 64 B, 128 B, 256 B, 512 B
Flow counts: 2^11, 2^18, 2^21
Results
The Measurement
Basic Performance on Full Packets
[Graph: packets/s vs. packet size (64 B to 512 B) for 128, 16384, and 131072 flows on Card 0 and Card 1, with the Maximum Ethernet and Maximum PCIe limits shown]
Full packet processing performance in packets/s.
Basic Performance on Full Packets
The PCI-Express limit is reached only for the longest packets
The NUMA architecture affects the performance
The number of flows has a large impact, mainly on short packets
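Why the PCI-Express limit appears only for the longest packets can be checked with back-of-envelope arithmetic (a sketch assuming the standard 20 B per-frame Ethernet overhead of preamble plus inter-frame gap, and that whole frames cross the bus):

```python
ETH_OVERHEAD = 20  # preamble (8 B) + inter-frame gap (12 B) per frame on the wire


def line_rate_pps(link_gbps, packet_size):
    """Packets/s one link carries at a given frame size."""
    bits_per_frame = (packet_size + ETH_OVERHEAD) * 8
    return link_gbps * 1e9 / bits_per_frame


def card_data_gbps(packet_size, links=8, link_gbps=10):
    """Frame data (Gb/s) one card must push over its PCIe bus."""
    return links * line_rate_pps(link_gbps, packet_size) * packet_size * 8 / 1e9
```

At 512 B each card's 8x 10 G feed carries about 77 Gb/s of frame data, above the 64 Gb/s PCIe gen3 x8 budget, so the bus is the bottleneck. At 64 B the frame data is only about 61 Gb/s, under the bus limit, so the packet rate (roughly 119 Mpps per card) limits processing instead.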
Hardware Accelerated Performance
[Graph: packets/s vs. packet size (64 B to 512 B) for Full Packets, Trimmed Packets, and Unified Headers, with the Maximum Ethernet and Maximum PCIe limits shown]
Packet processing performance in packets/s for 2^18 flows.
Hardware Accelerated Performance
Packet trimming and Unified Headers help significantly
Full-throughput monitoring for 256 B and longer packets
Packet trimming and Unified Headers solve the problem of PCI-Express throughput
Impact of CPU Choice
Comparison of the E5-2670 with the E5-2620 (only 6 cores, 2 GHz frequency)
[Graph: packets/s vs. packet size (64 B to 512 B) for 128, 16384, and 131072 flows on the E5-2670 and the E5-2620]
Comparison for two different CPUs on trimmed packets
Impact of CPU Choice
A faster CPU helps greatly for fewer flows
A large number of flows has a greater impact on memory bus utilization
Both CPUs perform well for longer packets
Conclusions & Future Work
Conclusions
It is possible to monitor 16x 10 G links in one 2U box
Hardware acceleration can significantly improve the performance
PCI-Express can be limiting for commodity cards
The NUMA architecture must be taken into consideration
The number of flows has a significant impact on performance
Future Work
Test the performance with COMBO-100G cards (PCIe gen3 x16)
High-speed experiments with application flow measurements
Build a better framework for measurements:
Different flow lengths
More complex packets and flows
Different packets in one flow
Thanks for your attention High-Density Network Flow Monitoring IM2015 12 May 2015, Ottawa Petr Velan petr.velan@cesnet.cz