Throughput Issues for High-Speed Wide-Area Networks




Throughput Issues for High-Speed Wide-Area Networks
Brian L. Tierney, ESnet, Lawrence Berkeley National Laboratory
http://fasterdata.es.net/ (HEPIX, Oct 30, 2009)

Why does the Network seem so slow?

TCP Performance Issues Getting good TCP performance over high-latency, high-bandwidth networks is not easy! You must keep the TCP window full: the size of the window is directly related to the network latency. It is easy to compute the maximum throughput: Throughput = buffer size / latency. E.g.: 64 KB buffer / 40 ms path = 1.6 MBytes/sec (12.8 Mbits/sec).
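
A quick way to reproduce that ceiling is a one-line calculation (illustrative only; the 64 KB buffer is taken as 64,000 bytes to match the slide's round numbers):

    # Max TCP throughput with a fixed 64 KB window on a 40 ms RTT path
    awk 'BEGIN { buf = 64000; rtt = 0.040;
                 printf "%.1f MBytes/sec (%.1f Mbits/sec)\n",
                        buf/rtt/1e6, buf*8/rtt/1e6 }'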

Setting the TCP buffer sizes It is critical to use the optimal TCP send and receive socket buffer sizes for the link you are using. Recommended size to fill the pipe: 2 x Bandwidth Delay Product (BDP). Recommended size to leave some bandwidth for others: around 20% of (2 x BDP) = 0.4 * BDP. Default TCP buffer sizes are way too small for today's high-speed networks. Until recently, default TCP send/receive buffers were typically 64 KB. The tuned buffer needed to fill the LBL to BNL link is 10 MB, 150X bigger than the default buffer size. With default TCP buffers, you can only get a small percentage of the available bandwidth!

Buffer Size Example Optimal buffer size formula: buffer size = 20% * (2 * bandwidth * RTT). Ping time (RTT) = 50 ms; narrow link = 500 Mbps (62 MBytes/sec), e.g. an end-to-end path consisting entirely of 1000BT Ethernet and OC-12 (622 Mbps). TCP buffers should be: 0.05 sec * 62 MBytes/sec * 2 * 20% = 1.24 MBytes.
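
A minimal sketch of the same arithmetic, following the slide's convention of expressing the narrow link in MBytes/sec:

    # Recommended TCP buffer = 20% of (2 x bandwidth x RTT)
    awk 'BEGIN { rtt = 0.050;      # round-trip time, seconds
                 bw  = 62;         # narrow link, MBytes/sec (500 Mbps)
                 printf "%.2f MBytes\n", 0.20 * 2 * bw * rtt }'   # -> 1.24 MBytes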

Sample Buffer Sizes (LBL to each site, sized for 50% of the pipe):
SLAC (RTT = 2 ms, narrow link = 1000 Mbps): 256 KB
BNL (RTT = 80 ms, narrow link = 1000 Mbps): 10 MB
CERN (RTT = 165 ms, narrow link = 1000 Mbps): 20.6 MB
Home DSL, US to Europe (RTT = 150 ms, narrow link = 2 Mbps): 38 KB, so default buffers are OK.
Note: the default buffer size is usually only 64 KB, and the default maximum autotuning buffer size is often only 256 KB (e.g., FreeBSD 7.2 autotuning default max = 256 KB). That is 10-150 times too small!

Socket Buffer Autotuning To solve the buffer tuning problem, based on work at LANL and PSC, the Linux OS added TCP buffer autotuning. Sender-side TCP buffer autotuning was introduced in Linux 2.4; receiver-side autotuning was added in Linux 2.6. Most OSes now include TCP autotuning. The TCP send buffer starts at 64 KB; as the data transfer takes place, the buffer size is continuously re-adjusted up to the max autotune size. Current OS autotuning default maximum buffers: Linux 2.6: 256 KB to 4 MB, depending on version; FreeBSD 7: 256 KB; Windows Vista: 16 MB; Mac OSX 10.5: 8 MB.

Autotuning Settings
Linux 2.6:
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    # autotuning min, default, and max number of bytes to use
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
FreeBSD 7.0:
    net.inet.tcp.sendbuf_auto=1
    net.inet.tcp.recvbuf_auto=1
    net.inet.tcp.sendbuf_max=16777216
    net.inet.tcp.recvbuf_max=16777216
Windows Vista:
    netsh interface tcp set global autotuninglevel=normal
    (max buffer fixed at 16 MB)
Mac OSX 10.5 ("Self-Tuning TCP"):
    kern.ipc.maxsockbuf=16777216
For more info, see: http://fasterdata.es.net/tcp-tuning/
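
A minimal sketch of applying the Linux settings above at runtime and persisting them (assumes root access; values copied from the slide):

    # Apply immediately
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
    # Persist: add the same lines to /etc/sysctl.conf, then reload
    sysctl -p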

More Problems: TCP Congestion Control Path = LBL to CERN (Geneva), OC-3 (in 2000), RTT = 150 ms, average BW = 30 Mbps

Parallel Streams Can Help

Parallel Streams Issues Parallel streams are potentially unfair and place more load on the end hosts, but they are necessary when you don't have root access and can't convince the sysadmin to increase the max TCP buffers. (Graph from Tom Dunigan, ORNL.)
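
As an illustration (not from the slides), a memory-to-memory comparison of one stream versus several can be run with the common iperf tool; the host name is a placeholder for a machine running "iperf -s":

    # Single stream vs. 4 parallel streams (-P 4), each with a 4 MB socket buffer (-w 4M)
    iperf -c remote.example.org -t 30 -w 4M
    iperf -c remote.example.org -t 30 -w 4M -P 4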

TCP Response Function It is well known that TCP Reno does not scale to high-speed networks. Average TCP congestion window = 1.2 / sqrt(p) segments, where p = the packet loss rate (see: http://www.icir.org/floyd/hstcp.html). What this means: for a TCP connection with 1500-byte packets and a 100 ms round-trip time, filling a 10 Gbps pipe would require a congestion window of 83,333 packets and a packet drop rate of at most one drop every 5,000,000,000 packets. That is at most one packet loss every 6000 s, or 1h:40m, to keep the pipe full.
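
Those numbers can be checked with a short calculation based on the response function above (illustrative arithmetic only):

    awk 'BEGIN { mss = 1500*8; rtt = 0.100; rate = 10e9;
                 w = rate*rtt/mss;            # window needed, in packets
                 p = (1.2/w)^2;               # loss rate implied by w = 1.2/sqrt(p)
                 printf "window = %.0f packets\n", w;
                 printf "max loss rate = 1 drop per %.1e packets\n", 1/p;
                 printf "time between drops = %.0f sec\n", (1/p)/(rate/mss) }'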

Proposed TCP Modifications Many alternate congestion control algorithms have been proposed: BIC/CUBIC, HTCP (Hamilton TCP), Scalable TCP, Compound TCP, and several more.

Linux 2.6.12 Results [Graph: TCP throughput (Mbps, 0-800) vs. time slot (5-second intervals) for Linux 2.6 with BIC TCP, Linux 2.4, and Linux 2.6 with BIC off; RTT = 67 ms.] Note that BIC reaches max throughput MUCH faster.

Comparison of Various TCP Congestion Control Algorithms

Selecting TCP Congestion Control in Linux To determine the current configuration:
    sysctl -a | grep congestion
    net.ipv4.tcp_congestion_control = cubic
    net.ipv4.tcp_available_congestion_control = cubic reno
Use /etc/sysctl.conf to set it to any available congestion control algorithm. Supported options (may need to be enabled in your kernel): CUBIC, BIC, HTCP, HSTCP, STCP, LTCP, and more.
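
A minimal sketch of switching algorithms, with HTCP chosen purely as an example (tcp_htcp is the usual module name on stock kernels, but that is an assumption about your build):

    # Load the algorithm's kernel module if it is not built in
    modprobe tcp_htcp
    # Select it at runtime
    sysctl -w net.ipv4.tcp_congestion_control=htcp
    # Persist by adding this line to /etc/sysctl.conf:
    #   net.ipv4.tcp_congestion_control = htcp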

Summary so far To optimize TCP throughput, do the following: Use a newer OS that supports TCP buffer autotuning Increase the maximum TCP autotuning buffer size Use a few parallel streams if possible Use a modern congestion control algorithm

PERFSONAR

High Performance Networking Most of the R&E community has access to 10 Gbps networks. Naive users with the right tools should be able to easily get 200 Mbps per stream between properly maintained systems and 2 Gbps aggregate rates between significant computing resources. Most users are not experiencing this level of performance. "There is widespread frustration involved in the transfer of large data sets between facilities, or from the data's facility of origin back to the researcher's home institution." (From the BES network requirements workshop: http://www.es.net/pub/esnet doc/bes Net Req Workshop 2007 Final Report.pdf) We can increase performance by measuring the network and reporting problems!

Network Troubleshooting is a Multi-Domain Problem [Diagram: end-to-end path from the source campus (S) through a regional network, the education backbone, and the JET network to the destination campus (D).]

Where are common problems? [Same multi-domain diagram:] congested or faulty links between domains, and latency-dependent problems inside domains with small RTT.

Soft Network Failures Soft failures are where basic connectivity functions, but high performance is not possible. TCP was intentionally designed to hide all transmission errors from the user: "As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users." (From IEN 129 / RFC 716.) Some soft failures only affect high-bandwidth, long-RTT flows. Hard failures are easy to detect and fix; soft failures can lie hidden for years.

Common Soft Failures
Small queue tail drop: switches not able to handle the long packet trains prevalent in long-RTT sessions and cross traffic at the same time.
Unintentional rate limiting: process switching on Cisco 6500 devices due to faults, ACLs, or misconfiguration.
Security devices: e.g., a 10X improvement by turning off Cisco Reflexive ACLs.
Random packet loss: bad fibers or connectors, low light levels due to failing amps/interfaces, duplex mismatch.

Local testing will not find all problems [Diagram: source campus (S) to destination campus (D) across two regional networks and the R&E backbone, with a regional switch with small buffers in the path. Performance is good when the RTT is < 20 ms, but poor when the RTT exceeds 20 ms.]

Addressing the Problem: perfsonar Developing an open, web-services-based framework for collecting, managing, and sharing network measurements; deploying the framework across the science community; encouraging people to deploy known-good measurement points near domain boundaries; and using the framework to find and correct soft network failures.

perfsonar Deployments Internet2 ESnet Argonne National Lab Brookhaven National Lab Fermilab National Energy Research Scientific Computing Center Pacific Northwest National Lab University of Michigan, Ann Arbor Indiana University Boston University University of Texas Arlington Oklahoma University, Norman Michigan Information Technology Center William & Mary University of Wisconsin Madison Southern Methodist University, Dallas University of Texas Austin Vanderbilt University APAN GLORIAD JGN2PLUS KISTI Korea Monash University, Melbourne NCHC, HsinChu, Taiwan Simon Fraser University GEANT GARR HUNGARNET PIONEER SWITCH CCIN2P3 CERN CNAF DE-KIT NIKHEF/SARA PIC RAL TRIUMF

perfsonar Architecture Interoperable network measurement middleware (SOA): modular, web-services-based, decentralized, and locally controlled. Integrates network measurement tools and data archives, data manipulation, and information services (discovery, topology, authentication and authorization). Based on the Open Grid Forum (OGF) Network Measurement Working Group (NM-WG) schema, the formalization of perfsonar protocols in the OGF NMC-WG, and the network topology description work in the OGF NML-WG.

Main perfsonar Services
Lookup Service: gLS (global service used to find services) and hLS (home service for registering local perfsonar metadata)
Measurement Archives: SNMP MA (interface data) and pSB MA (scheduled bandwidth and latency data)
Measurement Points: BWCTL, OWAMP, PingER
Troubleshooting Tools: NDT, NPAD
Topology Service
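
As an illustration of what the measurement points provide (hypothetical host name; assumes the bwctl and owamp client tools are installed on the local machine):

    # On-demand 30-second throughput test toward a host running bwctld
    bwctl -c ps-host.example.edu -t 30
    # One-way latency and loss test toward a host running owampd
    owping ps-host.example.edu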

PerfSONAR-PS Deployment Status Currently deployed in over 130 locations: 45 bwctl and owamp servers, 40 pSB MAs, 15 SNMP MAs, and 5 gLS plus 135 hLS instances. US ATLAS deployment: monitoring all Tier 1 to Tier 2 connections. For the current list of services, see: http://www.perfsonar.net/activeservices.html

Deploying a perfsonar measurement host in under 30 minutes Using the pS Performance Toolkit is very simple. Boot from CD. Use the command-line tool to configure which disk partition to use for persistent data, the network address and DNS, and the user and root passwords. Use the web GUI to select which services to run, select remote measurement points for bandwidth and latency tests, and configure Cacti to collect SNMP data from key router interfaces.

SAMPLE RESULTS

Example: US ATLAS LHC Tier 1 to Tier 2 Center Data transfer problem: couldn't exceed 1 Gbps across a 10GE end-to-end path that included 5 administrative domains. perfsonar tools were used to localize the problem and identify the problem device: an unrelated domain had leaked a full routing table to the router for a short time, causing FIB corruption. The routing problem was fixed, but the router started process switching some flows after that. Fix: rebooting the device cleared the symptoms, and better BGP filters were configured to prevent reoccurrence.

Sample Results: Fixed a soft failure

Sample Results: Bulk Data Transfer between DOE Supercomputer Centers Users were having problems moving data between supercomputer centers; one user was waiting more than an entire workday for a 33 GB input file. perfsonar measurement tools were installed and regularly scheduled measurements were started. Numerous choke points were identified and corrected, and dedicated wide-area transfer nodes were set up and tuned for wide-area transfers. Now moving 40 TB in less than 3 days.

Importance of Regular Testing You can't wait for users to report problems and then fix them; soft failures can go unreported for years. Problems that get fixed have a way of coming back: system defaults return after hardware/software upgrades, and new employees may not know why the previous employee set things up that way. perfsonar makes it easy to collect, archive, and alert on throughput information.
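
perfsonar's scheduled tests handle this properly; the following is only a minimal standalone sketch of the idea, with a hypothetical host name, threshold, and alert address, and output parsing that assumes an iperf-style "Mbits/sec" summary line:

    #!/bin/sh
    # Hypothetical hourly throughput check, e.g. from cron:
    #   0 * * * * /usr/local/bin/throughput-check.sh
    HOST=ps-host.example.edu
    THRESHOLD_MBPS=500
    MBPS=$(bwctl -c "$HOST" -t 20 2>/dev/null |
           awk '/Mbits\/sec/ {rate=$(NF-1)} END {print int(rate)}')
    echo "$(date -u +%FT%TZ) $HOST ${MBPS} Mbits/sec" >> /var/log/throughput-check.log
    if [ "${MBPS:-0}" -lt "$THRESHOLD_MBPS" ]; then
        echo "Throughput to $HOST dropped to ${MBPS} Mbits/sec" |
            mail -s "throughput alert" noc@example.edu
    fi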

Example: NERSC <-> ORNL Testing

More Information http://fasterdata.es.net/ email: BLTierney@es.net Also see: http://fasterdata.es.net/talks/bulk-transfer-tutorial.pdf https://plone3.fnal.gov/p0/wan/netperf/methodology/