EVALUATING NETWORK BUFFER SIZE REQUIREMENTS

Similar documents

Campus Network Design Science DMZ

TCP Labs. WACREN Network Monitoring and Measurement Workshop Antoine Delvaux perfsonar developer

Using Linux Traffic Control on Virtual Circuits J. Zurawski Internet2 February 25 nd 2013

TCP loss sensitivity analysis ADAM KRAJEWSKI, IT-CS-CE

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Performance of VMware vcenter (VC) Operations in a ROBO Environment TECHNICAL WHITE PAPER

1000Mbps Ethernet Performance Test Report

D1.2 Network Load Balancing

Tier3 Network Issues. Richard Carlson May 19, 2009

High-Speed TCP Performance Characterization under Various Operating Systems

Iperf Tutorial. Jon Dugan Summer JointTechs 2010, Columbus, OH

Please purchase PDF Split-Merge on to remove this watermark.

perfsonar: End-to-End Network Performance Verification

Fundamentals of Data Movement Hardware

The new frontier of the DATA acquisition using 1 and 10 Gb/s Ethernet links. Filippo Costa on behalf of the ALICE DAQ group

Question: 3 When using Application Intelligence, Server Time may be defined as.

Transparent Optimization of Grid Server Selection with Real-Time Passive Network Measurements. Marcia Zangrilli and Bruce Lowekamp

Deploying 10/40G InfiniBand Applications over the WAN

Allocating Network Bandwidth to Match Business Priorities

Evaluation of 40 Gigabit Ethernet technology for data servers

Netest: A Tool to Measure the Maximum Burst Size, Available Bandwidth and Achievable Throughput

perfsonar Host Hardware

Open Source Bandwidth Management: Introduction to Linux Traffic Control

Optimizing Throughput on Guaranteed-Bandwidth WAN Networks for the Large Synoptic Survey Telescope (LSST)

Comparison of 40G RDMA and Traditional Ethernet Technologies

VMWARE WHITE PAPER 1

PE310G4BPi40-T Quad port Copper 10 Gigabit Ethernet PCI Express Bypass Server Intel based

Scaling Up Your Network Monitoring: From the Garden Hose to the Fire Hose

Gigabit Ethernet Design

TCP tuning guide for distributed application on wide area networks 1.0 Introduction

High Speed Ethernet. Dr. Sanjay P. Ahuja, Ph.D. Professor School of Computing, UNF

Network testing with iperf

Overview of Network Measurement Tools

Performance of Host Identity Protocol on Nokia Internet Tablet

Network Monitoring with the perfsonar Dashboard

Small is Better: Avoiding Latency Traps in Virtualized DataCenters

N5 NETWORKING BEST PRACTICES

Quantifying the Performance Degradation of IPv6 for TCP in Windows and Linux Networking

Infrastructure for active and passive measurements at 10Gbps and beyond

Cisco Bandwidth Quality Manager 3.1

Improving the Performance of TCP Using Window Adjustment Procedure and Bandwidth Estimation

Network Diagnostic Tools. Jijesh Kalliyat Sr.Technical Account Manager, Red Hat 15th Nov 2014

Frequently Asked Questions

Challenges of Sending Large Files Over Public Internet

A Simulation Study of Effect of MPLS on Latency over a Wide Area Network (WAN)

Optimizing WAN Performance for the Global Enterprise

APPOSITE TECHNOLOGIES Smoothing the Transition to 10 Gbps. WAN Emulation Made Easy

Performance Tuning Guidelines for PowerExchange for Microsoft Dynamics CRM

Network Performance Optimisation and Load Balancing. Wulf Thannhaeuser

ITL Lab 5 - Performance Measurements and SNMP Monitoring 1. Purpose

Measuring Wireless Network Performance: Data Rates vs. Signal Strength

TCP Tuning Techniques for High-Speed Wide-Area Networks. Wizard Gap

Globus Striped GridFTP Framework and Server. Raj Kettimuthu, ANL and U. Chicago

LotServer Deployment Manual

Data Networks Summer 2007 Homework #3

Using TrueSpeed VNF to Test TCP Throughput in a Call Center Environment

Quantum StorNext. Product Brief: Distributed LAN Client

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Network Probe. Figure 1.1 Cacti Utilization Graph

Choosing Tap or SPAN for Data Center Monitoring

Measure wireless network performance using testing tool iperf

How Router Technology Shapes Inter-Cloud Computing Service Architecture for The Future Internet

Configuring Your Computer and Network Adapters for Best Performance

Effects of Interrupt Coalescence on Network Measurements

Windows Server 2012 R2 Hyper-V: Designing for the Real World

Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking

Smart Tips. Enabling WAN Load Balancing. Key Features. Network Diagram. Overview. Featured Products. WAN Failover. Enabling WAN Load Balancing Page 1

A High-Performance Storage and Ultra-High-Speed File Transfer Solution

Hands on Workshop. Network Performance Monitoring and Multicast Routing. Yasuichi Kitamura NICT Jin Tanaka KDDI/NICT APAN-JP NOC

Operating Systems and Networks Sample Solution 1

Linux NIC and iscsi Performance over 40GbE

Voice Over IP. MultiFlow IP Phone # 3071 Subnet # Subnet Mask IP address Telephone.

Deploying in a Distributed Environment

Leveraging NIC Technology to Improve Network Performance in VMware vsphere

perfsonar Host Hardware

Advanced Network Services Teaming

Network Instruments white paper

Layer 2 Network Encryption where safety is not an optical illusion Marko Bobinac SafeNet PreSales Engineer

IP SAN Best Practices

100 Gigabit Ethernet is Here!

Clearing the Way for VoIP

Net Optics and Cisco NAM

Requirements of Voice in an IP Internetwork

Technical Bulletin. Arista LANZ Overview. Overview

Performance Measurement of Wireless LAN Using Open Source

How To Monitor And Test An Ethernet Network On A Computer Or Network Card

pathchar a tool to infer characteristics of Internet paths

Transcription:

EVALUATING NETWORK BUFFER SIZE REQUIREMENTS for Very Large Data Transfers Michael Smitasin Lawrence Berkeley National Laboratory (LBNL) Brian Tierney Energy Sciences Network (ESnet)

[ 2 ]

Example Workflow Step 1 Analyze raw data at Beamline in real time and adjust experiment [ 3 ]

Example Workflow Step 2 Store raw data [ 4 ]

Example Workflow Step 3 Analyze stored data locally [ 5 ]

Example Workflow Step 4 Transfer to supercomputers over WAN [ 6 ]

Example Workflow Step 5 Analyze processed results and adjust experiment [ 7 ]

LAN Transfer Bits per 10 millisecs 1 ~1ms RTT 1 100Mb/0.01sec = 10Gbps [ 8 ]

WAN 1 Transfer Bits per 10 millisecs 2 70ms RTT The WAN just sucks, right? 1 70ms simulated RTT via netem 2 100Mb/0.01sec = 10Gbps [ 9 ]

Uncongested vs Congested WAN Transfers Bits per 1 millisec 2 captured via passive tap Uncongested Congested 3 Data Transfer (TCP) Background Traffic (UDP) 1 70ms simulated RTT via netem 2 10Mb/0.001sec = 10Gbps 3 Adding 2Gbps UDP Traffic [ 10 ]

Real World Testing @ 70ms RTT 35ms x2 Special thanks to Mark Lukasczyk at Brookhaven National Laboratory for providing far-end test servers [ 11 ]

Uncongested vs Congested WAN Transfers Real World tests California to New York 1 Optical tap / cpacket @ LBNL border 1 ~ 70ms real RTT [ 12 ]

Impact of packet loss at different distances Source: http://fasterdata.es.net [ 13 ]

TCP s Congestion Control w/ insufficient buffers 50ms simulated RTT Congestion w/ 2Gbps UDP traffic HTCP / Linux 2.6.32 [ 14 ]

TCP s Congestion Control w/ sufficient buffers 50ms simulated RTT Congestion w/ 2Gbps UDP traffic HTCP / Linux 2.6.32 [ 15 ]

Congestion = Packet Loss = Poor Performance so what do we do? With this? Replace (something like) this Royalty checks to: Michael Smitasin PO Box 919013 Berkeley, CA 94707 Source: http://www.juniper.net [ 16 ]

That is a hard ( and expensive) pill to swallow. (and it s not always the right choice) [ 17 ]

What if you could try before you buy? Easy to build test environment Open source (free) software LAN distances, WAN latencies Isolated, controlled (no saturating production links!) Quickly compare models, vendors, configs, etc. [ 18 ]

Hardware 4x Servers w/ 10G NICs We used Dell R320s and Intel X520s 1x Receiving switch w/ 3x 10G ports Doesn t need to be expensive, this is where data fans out so no congestion on this side. 5x SFP+ Direct-Attached Copper Cables (cheapest) OR 6x 10G Optics + 5x fiber cables (flexible options) We used SR optics + 50µm multimode fiber Wanted flexibility for testing models w/ X2, XENPAK, etc Sending switch(es) w/ 3x 10G ports (to test) [ 19 ]

Software / Configuration Linux (distro generally your preference) We used CentOS 6 (some Fedora too) Install test utilities (or just use perfsonar 1 ) import Internet2 repo install iperf nuttcp bwctl-client bwctl-server Host and NIC tuning per FasterData 2 recommendations: TCP Tuning - /etc/sysctl.conf TX Queue Length TX / RX Descriptors Jumbo Frames 1 http://www.perfsonar.net 2 http://fasterdata.es.net [ 20 ]

Test Overview Add delay between Host 1 and Host 3 using netem Host 3 runs iperf3 server Host 4 runs iperf3 server Host 2 sends UDP traffic @ 2Gbps Host 1 sends TCP traffic and we measure rate Sending Switch s outbound interface is congested and buffering occurs [ 21 ]

Test Commands Add delay: host1 # tc qdisc add dev ethn root netem delay 25ms host3 # tc qdisc add dev ethn root netem delay 25ms Start iperf3 servers to receive data: host3 # iperf3 -s host4 # iperf3 -s Start background traffic (to cause congestion): host2 # iperf3 -c host4 -u -b2g -t3000 Start TCP traffic to simulate data transfer: host1 # iperf3 -c host3 -P2 -t30 -O5 [ 22 ]

Test Results [ ID] Interval Transfer Bandwidth Retr Cwnd... [ 4] 29.00-30.00 sec 201 MB 1.69 Gbps 0 9.54 MB [ 6] 29.00-30.00 sec 126 MB 1.06 Gbps 0 6.05 MB [SUM] 29.00-30.00 sec 328 MB 2.75 Gbps 0 - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-30.00 sec 5.85 GB 1.68 Gbps 40 sender [ 4] 0.00-30.00 sec 5.83 GB 1.67 Gbps receiver [ 6] 0.00-30.00 sec 4.04 GB 1.16 Gbps 39 sender [ 6] 0.00-30.00 sec 4.01 GB 1.15 Gbps receiver [SUM] 0.00-30.00 sec 9.89 GB 2.83 Gbps 79 sender [SUM] 0.00-30.00 sec 9.85 GB 2.82 Gbps receiver [ 23 ]

Swap out sending switch and repeat [ 24 ]

Average TCP results, various switches Buffers per 10G egress port 2x parallel TCP streams 50ms simulated RTT 2Gbps UDP background traffic 15 iterations 1MB Brocade MLXe 1 9MB Arista 7150 16MB Cisco 6704 64MB Brocade MLXe 1 90MB Cisco 6716 2 200MB Cisco 6716 3 VOQ Arista 7504 1 NI-MLX-10Gx8-M 2 Over-subscription Mode 3 Performance Mode [ 25 ]

Tunable Buffers with a Brocade MLXe 1 Buffers per 10G egress port 2x parallel TCP streams 50ms simulated RTT 2Gbps UDP background traffic 15 iterations qos queue-type 0 max-queue-size 1 NI-MLX-10Gx8-M [ 26 ]

Is netem accurate? Real World RTT vs Simulated RTT 20% = ~76MB? 70ms RTT 2x parallel TCP streams 2Gbps UDP background traffic Juniper MX80 config: class-of-service scheduler buffer-size percent Special thanks to Mark Lukasczyk at Brookhaven National Laboratory for providing far-end test servers [ 27 ]

Tap from LBNL border CA to NY MX80 w/ 1% buffer MX80 w/ 5% buffer [ 28 ]

Tap from LBNL border CA to NY MX80 w/ 50% buffer [ 29 ]

What if there s a small buffered switch upstream? No Congestion on it? No problem Congestion on it? Many problem [ 30 ]

Alternate test - nuttcp [ 31 ]

Alternate test nuttcp commands Add delay: host1# tc qdisc add dev eth1 root netem delay 25ms host2# tc qdisc add dev eth1 root netem delay 25ms Start 2Gbps UDP flow to add congestion: host4# iperf3 s host3# iperf3 -c host4 -u -b2g -t3000 nuttcp basic test parameters 1 : host2# nuttcp -S host1# nuttcp -l8972 -T30 -u -w4m Ri300m/X i1 host2 1 https://fasterdata.es.net/performance-testing/networktroubleshooting-tools/nuttcp/ [ 32 ]

nuttcp conclusion This will probably have no packet loss on smaller buffer switches: nuttcp -l8972 -T30 -u -w4m -Ri300m/65 -i1 While this will probably have some: nuttcp -l8972 -T30 -u -w4m -Ri300m/300 -i1 BUT only applies to where there is congestion. Small buffer switch that isn t congested won t be detectable with this method. [ 33 ]

Then Big Buffers = good? Only in the context of these Elephant flows Very large data transfers (Terabytes, Petabytes) Large pipes (10 Gbps & up) Long distances (50ms+) Between small numbers of hosts By big we re talking MBs per 10G port, not GBs. Important to have enough buffers to ride out micro-bursts. May need to drop 1 or 2 packets to fit available capacity, but to maintain performance we need to keep TCP from getting stuck in loss recovery mode. [ 34 ]

General Purpose Network Congestion, Insufficient Buffering [ 35 ]

General Purpose Network Science DMZ [ 36 ]

Workflow-specific Networks: Beamline 10G LAN Campus LAN Science DMZ [ 37 ]

Effects of TCP Segmentation Offload TCP Throughput on Small Buffer Switch (Congestion w/ 2Gbps UDP background traffic) ethtool -k EthN Show current offload settings ethtool -K EthN tso off Set TSO off TSO Off was negligible on our hosts with Intel X520 @ 10Gbps (~100-200Mbps) BUT may be NIC specific, or more applicable @ 40Gbps+ [ 38 ]

Fair Queuing and Pacing in Kernel 4.1.4 TCP Throughput on Small Buffer Switch (Congestion w/ 2Gbps UDP background traffic) Requires newer kernel version not available in 2.6.32 tc qdisc add dev EthN root fq Enable Fair Queuing Pacing side effect of Fair Queuing yields ~1.25Gbps increase in throughput @ 10Gbps on our hosts TSO differences still negligible on our hosts w/ Intel X520 [ 39 ]

Additional Information A History of Buffer Sizing http://people.ucsc.edu/~warner/bufs/buffer-requirements Jim Warner s Packet Buffer Page http://people.ucsc.edu/~warner/buffer.html Faster Data @ ESnet http://fasterdata.es.net Michael Smitasin mnsmitasin@lbl.gov Brian Tierney bltierney@es.net [ 40 ]

EVALUATING NETWORK BUFFER SIZE REQUIREMENTS for Very Large Data Transfers Michael Smitasin Lawrence Berkeley National Laboratory (LBNL) Brian Tierney Energy Sciences Network (ESnet)