High Performance Active End-to-end Network Monitoring Les Cottrell, Connie Logg, Warren Matthews, Jiri Navratil, Ajay Tirumala SLAC Prepared for the 1st SCAMPI Workshop, Amsterdam, January 2003 http://www.slac.stanford.edu/grp/scs/net/talk/caida-jun02.html Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), by the SciDAC base program, and also supported by IUPAP
Outline High performance testbed Challenges for measurements at high speeds Infrastructure for regular high-performance measurements 2
Testbed
[Diagram: 6 cpu servers + 4 disk servers and a 7606 and T640 at one end, OC192/POS (10 Gbits/s) to a GSR with 12 cpu servers + 4 disk servers at Sunnyvale; a 2.5 Gbits/s link to a further 7606 with 6 cpu servers.]
Problems: Achievable TCP throughput
Typically use iperf. Want to measure stable throughput, i.e. after slow start.
Slow start takes quite a long time at high BW*RTT: for GE from California to Geneva (RTT = 182 ms), slow start takes ~5 s, so for slow start to contribute < 10% to the measured throughput we need to run for 50 s (about double that for Vegas/FAST TCP).
So we are developing Quick Iperf: use Web100 to tell when we are out of slow start, then measure for 1 second afterwards. 90% reduction in duration and bandwidth used.
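The slow-start arithmetic above can be sketched as a back-of-envelope model. The growth rate (~1.5x per RTT, modeling delayed ACKs) and the 1460-byte MSS are assumptions not stated on the slide:

```python
import math

def slow_start_time(bw_bps, rtt_s, mss=1460, growth=1.5):
    """Approximate time to exit slow start: cwnd grows by `growth`
    per RTT (1.5x models delayed ACKs) until it reaches one
    bandwidth-delay product."""
    bdp_bytes = bw_bps * rtt_s / 8
    rounds = math.log(bdp_bytes / mss, growth)
    return rounds * rtt_s

def min_run_length(bw_bps, rtt_s, max_fraction=0.1):
    """Run long enough that slow start contributes < max_fraction
    of the measurement."""
    return slow_start_time(bw_bps, rtt_s) / max_fraction

# GE from California to Geneva (RTT = 182 ms): ~5 s slow start,
# hence a ~50 s run for a < 10% slow-start contribution.
print(slow_start_time(1e9, 0.182), min_run_length(1e9, 0.182))
```

This reproduces the slide's ~5 s / ~50 s figures, which motivates Quick Iperf's much shorter post-slow-start measurement.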
Examples
[Plots: slow-start examples on a 24 ms RTT path and a 140 ms RTT path.]
Problems: Achievable bandwidth
Typically use packet-pair dispersion or packet-size techniques (e.g. pchar, pipechar, pathload, pathchirp, ...).
In our experience current implementations fail above 155 Mbits/s and/or take a long time to make a measurement.
So we developed a simple, practical packet-pair tool, ABwE: typically uses 40 packets, tested up to 950 Mbits/s, low impact, a few seconds per measurement (can use for real-time monitoring).
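A minimal sketch of the packet-pair idea behind a tool like ABwE (not the actual ABwE code; the median filter is an assumed noise-rejection choice):

```python
def packet_pair_bandwidth(pkt_bytes, dispersions_s):
    """Estimate bandwidth from packet-pair dispersion: back-to-back
    packets are spread apart by the narrow link, so
    bw = packet size / inter-arrival gap. Uses the median gap over
    a small train (ABwE-style tools send ~40 packets) to reject
    queueing noise."""
    gaps = sorted(dispersions_s)
    median_gap = gaps[len(gaps) // 2]
    return pkt_bytes * 8 / median_gap  # bits/s

# 1500-byte packets arriving 12 microseconds apart -> ~1 Gbit/s
print(packet_pair_bandwidth(1500, [12e-6] * 5))
```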
Screenshot: real-time ABwE display tool for monitoring at SC2002. Measurements at 1-minute separation. Note the sudden dip in available bandwidth every hour.
Problem: File copy applications
Some tools will not allow a large enough window (e.g. bbcp is currently limited to 2 MBytes).
Same slow-start problem as iperf.
Need a big file to ensure it is not cached: e.g. 2 GBytes at 200 Mbits/s takes 80 s to transfer, even longer at lower speeds.
Looking at whether we can get the same effect as a big file but with a small (64 MByte) file, by playing with commit.
Many more factors involved, e.g. it adds the file system, disk speeds, RAID etc. Maybe the best bet is to let the user measure it for us.
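The 2 GBytes / 80 s figure above is just size over rate; as a one-line sketch:

```python
def transfer_time_s(file_bytes, rate_bps):
    """Wall-clock time to move a file at a given network rate."""
    return file_bytes * 8 / rate_bps

# 2 GBytes at 200 Mbits/s -> 80 s
print(transfer_time_s(2e9, 200e6))
```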
Passive (Netflow) Measurements
Use Netflow measurements from the border router. Netflow records time, duration, bytes, packets etc. per flow; calculate throughput from bytes/duration. Validate vs. iperf, bbcp etc.
No extra load on the network, and it covers other SLAC & remote hosts & applications: ~10-20K flows/day, 100-300 unique pairs/day.
Tricky to aggregate all the flows for a single application call: look for flows with a fixed triplet (src & dst addr, and port), starting at the same time ± 2.5 s and ending at roughly the same time. Needs tuning; currently missing some delayed flows. Check that this works for known active flows.
To identify the application we need a fixed server port (bbcp is peer-to-peer, but we have modified it to support this).
Investigating differences with tcpdump. Aggregate the throughputs, note the number of flows/streams.
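The triplet-plus-start-time aggregation described above might look like the following sketch (assumed record fields; not the production analysis code):

```python
from collections import defaultdict

def _summarize(key, bundle):
    """Aggregate one bundle of flows into streams and bits/s."""
    total_bytes = sum(f["bytes"] for f in bundle)
    start = min(f["start"] for f in bundle)
    end = max(f["start"] + f["duration"] for f in bundle)
    return {"key": key, "streams": len(bundle),
            "bps": total_bytes * 8 / (end - start)}

def aggregate_flows(flows, start_slack=2.5):
    """Group Netflow records belonging to one application transfer:
    same (src, dst, port) triplet, start times within `start_slack`
    seconds. Each flow is a dict with src, dst, port, start,
    duration, bytes."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f["src"], f["dst"], f["port"])].append(f)
    results = []
    for key, fs in groups.items():
        fs.sort(key=lambda f: f["start"])
        bundle = [fs[0]]
        for f in fs[1:]:
            if f["start"] - bundle[0]["start"] <= start_slack:
                bundle.append(f)      # same transfer, another stream
            else:
                results.append(_summarize(key, bundle))
                bundle = [f]          # a later, separate transfer
        results.append(_summarize(key, bundle))
    return results
```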
[Plots: passive vs active throughput, SLAC to Caltech (Feb-Mar 02): iperf (0-450 Mbits/s) and bbftp (0-80 Mbits/s) vs date, with active and passive points overlaid.]
Iperf matches well; bbftp under-reports what it actually achieves.
Passive bbftp from SLAC to IN2P3 (unrelated to the active measurements)
TCP fair sharing results in the green flow (60 streams) getting twice the throughput of the magenta flow (30 streams) when both run simultaneously.
Adding the flows together, we see we can get about 80 Mbits/s.
Problems: Host configuration
Need a fast interface and a high-speed Internet connection; a powerful enough host; large enough available TCP windows; enough memory; enough disk space.
Windows and Streams
It is well accepted that multiple streams and/or big windows are important to achieve optimal throughput, but they can be unfriendly to others.
The optimum windows & streams change as the path changes, so they are hard to optimize.
For 3 Gbits/s and 200 ms RTT we need a 75 MByte window.
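The 75 MByte figure is one bandwidth-delay product:

```python
def required_window_bytes(bw_bps, rtt_s):
    """TCP needs a window of at least one bandwidth-delay product
    to keep the pipe full: window = bandwidth * RTT (in bytes)."""
    return bw_bps * rtt_s / 8

# 3 Gbits/s at 200 ms RTT -> 75 MBytes
print(required_window_bytes(3e9, 0.2) / 1e6)
```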
Even with big windows (1 MB) we still need multiple streams with stock TCP.
ANL, Caltech & RAL reach a knee (between 2 and 24 streams); above the knee throughput still improves, but only slowly, maybe because the large number of streams squeezes out others and takes more than a fair share.
Configurations 1/2
Do we measure with standard parameters, or with optimal ones? We need to measure all of them to understand the effects of the parameters and configurations: windows, streams, txqueuelen, TCP stack, MTU. A lot of variables.
FAST TCP no longer needs multiple streams; this is a major simplification (reduces the number of variables by 1).
[Plots: examples of 2 TCP stacks: stock TCP vs FAST TCP, 1500 B MTU, 65 ms RTT.]
Configurations: Jumbo frames
Become more important at higher speeds: they reduce interrupts to the CPU and packets to process.
Similar effect to using multiple streams (Hacker).
Jumbo can achieve > 95% utilization, SNV to CHI, with 1 or multiple streams up to a Gbit/s.
Factor of 5 improvement over 1500 B MTU throughput for stock TCP.
An alternative to a new stack.
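The interrupt-rate argument for jumbo frames, as a sketch (the 9000 B jumbo MTU is the usual value, though the slide does not state it):

```python
def packets_per_second(bw_bps, mtu_bytes):
    """Packet (and hence interrupt) rate needed to sustain a given
    throughput at a given MTU, which is the main win from jumbos."""
    return bw_bps / (mtu_bytes * 8)

std = packets_per_second(1e9, 1500)    # ~83,000 pkt/s at 1 Gbit/s
jumbo = packets_per_second(1e9, 9000)  # ~14,000 pkt/s
print(std, jumbo, std / jumbo)         # jumbos: 6x fewer packets
```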
Repetitive long term measurements 17
IEPM-BW = PingER NG
Driven by the data replication needs of HENP, PPDG, DataGrid: we no longer ship plane/truck loads of data (the latency is poor); we now ship all data by network (TB/day today, doubling each year).
Complements PingER, but for high-performance nets.
Build an infrastructure to make E2E network (e.g. iperf, packet-pair dispersion) & application (FTP) measurements for high-performance A&R networking. Started at SC2001.
Tasks
Develop/deploy a simple, robust, ssh-based E2E application & network measurement and management infrastructure for making regular measurements. A major step is setting up collaborations and getting trust, accounts/passwords.
Can use dedicated or shared hosts, located at borders or with real applications. COTS hardware & OS (Linux or Solaris) simplifies application integration.
Integrate a base set of measurement tools (ping, iperf, bbcp, ...), provide simple (cron) scheduling.
Develop data extraction, reduction, analysis, reporting, simple forecasting & archiving.
Purposes
Compare & validate tools: with one another (pipechar vs pathload vs iperf, or bbcp vs bbftp vs GridFTP vs Tsunami), with passive measurements, with Web100.
Evaluate TCP stacks (FAST, Sylvain, HS TCP, Frank Kelly, ...).
Troubleshooting. Setting expectations, planning.
Understand the requirements for high performance and for jumbos; performance issues in the network, OS, cpu, disk/file system etc.
Provide public access to results for people & applications.
Measurement Sites
Production, i.e. they choose their own remote hosts and run the monitor themselves: SLAC (40) San Francisco, FNAL (2) Chicago, INFN (4) Milan, NIKHEF (32) Amsterdam, APAN Japan (4).
Evaluating the toolkit: Internet2 (Michigan), Manchester University, UCL, Univ. Michigan, GA Tech (5).
Also demonstrated at iGrid2002 and SC2002. Using on the Caltech / SLAC / DataTag / Teragrid / StarLight / SURFnet testbed.
If all goes well it takes 30-60 minutes to install a monitoring host; common problems are keys, disk space, blocked ports, hosts not registered in DNS, and the need for web access.
SLAC is monitoring over 40 sites in 9 countries.
[Map: SLAC (SNV) monitoring host reaching ~40 remote sites (CERN, FNAL, ANL, NIKHEF, IN2P3, Caltech, SDSC, BNL, RAL, UCL, UManc, DL, UTDallas, UMich, I2, SOX, UFL, APAN, RIKEN, INFN-Roma, INFN-Milan, Rice, ORNL, JLAB, TRIUMF, KEK, NERSC, LANL, Stanford, UIUC, ...) over ESnet, Abilene, Geant, JAnet, GARR, CESnet, CAnet, SURFnet, Renater and CalREN, with measured throughputs in Mbits/s per path.]
Results
Time-series data, scatter plots, histograms.
CPU utilization required (MHz per Mbits/s) for jumbo and standard frames and for the new stacks.
Forecasting; diurnal behavior characterization.
Disk throughput as a function of OS, file system, caching.
Correlations with passive measurements and with Web100.
www.slac.stanford.edu/comp/net/bandwidth-tests/antonia/html/slac_wan_bw_tests.html
[Screenshot: results in Excel]
Problem Detection
There must be lots of people working on this? Our approach:
Rolling averages if we have recent data; diurnal-change fits otherwise.
Rolling Averages
EWMA ~ average of the last 5 points, ± 2%.
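A sketch of the rolling-average alarm; the EWMA weight (1/3, weighting roughly the last 5 points) and the tolerance band are illustrative assumptions, not the slide's exact settings:

```python
def ewma_alarm(history, new_value, alpha=1/3, band=0.25):
    """Flag a measurement that falls outside a tolerance band
    around an exponentially weighted moving average of recent
    points. Returns (ewma, alarm_flag)."""
    ewma = history[0]
    for x in history[1:]:
        ewma = alpha * x + (1 - alpha) * ewma
    alarm = abs(new_value - ewma) > band * ewma
    return ewma, alarm

# A sudden drop from ~100 Mbits/s to 50 trips the alarm.
print(ewma_alarm([100, 100, 100, 100, 100], 50))
```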
Fit to α*sin(t+φ)+γ
Indicate diurnalness by δγ; if we do not have recent measurements we can look at the previous week at the same time of day.
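The diurnal fit becomes plain linear least squares once α·sin(ωt+φ)+γ is rewritten as a·sin(ωt)+b·cos(ωt)+γ. A self-contained sketch, assuming a 24-hour period and times in hours (the slide does not give the fitting method):

```python
import math

def _solve3(m, v):
    """Gauss-Jordan elimination with partial pivoting, 3x3 system."""
    m = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[r][3] / m[r][r] for r in range(3)]

def fit_diurnal(times_h, values):
    """Least-squares fit of y = alpha*sin(w*t + phi) + gamma with a
    24 h period, via the linear form a*sin(wt) + b*cos(wt) + gamma
    and the 3x3 normal equations. Returns (alpha, phi, gamma)."""
    w = 2 * math.pi / 24
    rows = [(math.sin(w * t), math.cos(w * t), 1.0) for t in times_h]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    a, b, gamma = _solve3(ata, aty)
    return math.hypot(a, b), math.atan2(b, a), gamma
```

The ratio of fitted amplitude to mean then gives a measure of how diurnal a link is.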
Alarms
Too much to keep track of by hand, and we would rather not wait for complaints.
Automated alarms: rolling average à la RIPE-TTM.
Action
However the concern is generated: look for changes in traceroute; compare tools; compare common routes; cross-reference other alarms.
Next steps
Rewrite (again) based on experiences.
Improved ability to add new tools to the measurement engine and integrate them into extraction and analysis: GridFTP, tsunami, UDPMon, pathload.
Improved robustness, error diagnosis, management. Need improved scheduling. Want to look at other security mechanisms.
More Information
IEPM/PingER home site: www-iepm.slac.stanford.edu/
IEPM-BW site: www-iepm.slac.stanford.edu/bw
Quick Iperf: http://www-iepm.slac.stanford.edu/bw/iperf_res.html
ABwE: submitted to PAM2003
Passive vs active correlations [scatter plot]: strong.
IEPM-BW Uses/deliverables
Understand and identify the resources needed to achieve high-throughput performance for Grid and other data-intensive applications.
Provide access to archival and near real-time data and results for eyeballs and applications: for planning and expectation setting, and to see the effects of upgrades; to assist in troubleshooting problems by identifying what is impacted and the time and magnitude of changes and anomalies; as input for application steering (e.g. data grid bulk data transfer) and changing configuration parameters; for forecasting and further analysis.
Identify critical changes in performance; record them and notify administrators and/or users.
Provide a platform for evaluating new network tools (e.g. pathrate, pathload, GridFTP, INCITE, UDPmon, ...).
Provide a measurement/analysis/reporting suite for Grid & high-performance sites.
IEPM-BW Deployment
[Map: IEPM-BW deployment: monitoring sites (SLAC, FNAL, ANL, NIKHEF, CERN, IN2P3, Caltech, SDSC, BNL, RAL, UCL, UManc, DL, Rice, UTDallas, NCSA, UMich, I2, SOX, UFL, APAN, RIKEN, INFN-Roma, INFN-Milan, JLAB, TRIUMF, KEK, NERSC, LANL, Stanford, ...) across ESnet, Abilene, Geant, JAnet, CESnet, APAN and CalREN; EDG and PPDG/GriPhyN sites marked.]
Early results
Reasonable estimates of achievable throughput with a 10 s iperf measurement.
Multiple streams and big windows are critical: they improve over the defaults by factors of 5 to 60. There is an optimum windows*streams product.
Continuous data at 90-minute intervals from SLAC to 33 hosts in 8 countries since Dec 01.
Early results
1 MHz ~ 1 Mbps.
Bbcp mem-to-mem tracks iperf; bbftp & bbcp disk-to-disk track iperf until disk performance limits.
Bandwidth estimators fail above 100 Mbits/s.
High throughput affects RTT for others: e.g. to Europe it adds ~100 ms.
Archival raw throughput data & graphs are already available via http.
File copy disk-to-disk
[Scatter plot: file-copy disk-to-disk throughput (0-100 Mbits/s) vs iperf TCP (0-400 Mbits/s), with Fast Ethernet, OC3 and disk-limited regions marked.]
Above ~60 Mbits/s, iperf >> file copy.
Disk performance
It matters for the applications. Depends on: the disk sub-system; the file system (nfs, ufs, /tmp); and caching (data can stay cached for a long time, which can improve apparent throughput by a factor of 10 or more).
Read speeds vary from 4 MBytes/s to 230 MBytes/s measured for 30 remote hosts (depending on caching). Uncached write speeds vary from 1 MByte/s to 29 MBytes/s.
If disk speed < network speed there is no point measuring the network, so we need to parallelize across disks & servers.
Iperf vs pipechar
[Scatter plot: pipechar min throughput (0-300 Mbits/s) vs iperf TCP (0-400 Mbits/s).]
Pipechar (which does not use TCP) disagrees badly above 100 Mbits/s (6 hosts, 20%); 50% of hosts show reasonable agreement. Typical of several bandwidth-prediction tools. Working with the developer.
Web100 vs iperf throughputs
A nice sanity check is to see whether Web100 sees the same throughput as iperf reports.
Web100 throughput = Streams * DataBytesOut * 8 / max(iperf stream duration) / 10^6.
It looks like good agreement.
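The slide's formula, directly (assuming DataBytesOut is the per-stream byte count, as the Streams multiplier implies):

```python
def web100_throughput_mbps(streams, data_bytes_out, stream_durations_s):
    """Aggregate throughput from Web100 counters:
    Streams * DataBytesOut * 8 / max(stream duration) / 1e6."""
    return streams * data_bytes_out * 8 / max(stream_durations_s) / 1e6

# 4 streams of 125 MBytes each, longest stream lasting 10 s -> 400 Mbits/s
print(web100_throughput_mbps(4, 125e6, [9.0, 10.0]))
```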
Web100 estimated vs observed throughput
EstBW ~ C * Streams * MSS / (RTT * sqrt(loss)) (Mathis et al. & Hacker).
Measure retransmissions using Web100: loss ~ pkts_retrans / pkts_out.
Note the high degree of correlation (R^2 > 0.8).
If the window is large enough (>= 128 KB) then there is approximately a common relation; the window threshold varies from link to link.
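The Mathis/Hacker estimate as code. The slide leaves C unspecified; the default sqrt(3/2) below is a common choice and therefore an assumption:

```python
import math

def mathis_bw_mbps(streams, mss_bytes, rtt_s, loss, c=math.sqrt(3 / 2)):
    """Mathis et al. throughput bound, scaled by stream count as in
    Hacker's multi-stream extension:
    BW ~ C * Streams * MSS / (RTT * sqrt(loss)), in Mbits/s.
    loss can be estimated from Web100 as pkts_retrans / pkts_out."""
    return c * streams * mss_bytes * 8 / (rtt_s * math.sqrt(loss)) / 1e6

# Single stream, 1460 B MSS, 100 ms RTT, 1e-4 loss
print(mathis_bw_mbps(1, 1460, 0.1, 1e-4))
```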
Effect of saturation
If we keep below the knee then RTT stays low. Could also use congestion signals.
Could use this to optimize, e.g. to throttle back the application.
Forecasting
Given access to the data one can do real-time forecasting of TCP bandwidth and file transfer/copy throughput: e.g. NWS, and "Predicting the Performance of Wide Area Data Transfers" by Vazhkudai, Schopf & Foster.
We are developing a simple prototype that uses the average of previous measurements: validate predictions against observations; get better estimates to adapt the frequency of active measurements & reduce their impact; also use ping RTTs and route information; look at the need for diurnal corrections; use for steering applications.
Working with NWS on more sophisticated forecasting.
Can also use on-demand bandwidth estimators (e.g. pipechar, but we need to know their range of applicability).
Forecast results
Predict = moving average of the last 5 measurements ± σ.
[Plot: iperf TCP throughput, SLAC to Wisconsin, Jan 02: observed vs predicted, 60-100 Mbits/s.]
% average error = average(abs(observe - predict)/observe), over 33 nodes:
  iperf TCP: 13% ± 11%
  bbcp mem: 23% ± 18%
  bbcp disk: 15% ± 13%
  bbftp: 14% ± 12%
  pipechar: 13% ± 8%
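The simple forecaster and error metric from this slide, sketched:

```python
import statistics

def forecast(history, n=5):
    """Predict the next measurement as the mean of the last n
    points, with the sample standard deviation as the error bar."""
    window = history[-n:]
    return statistics.mean(window), statistics.stdev(window)

def mean_pct_error(observed, predicted):
    """% average error = mean(abs(observe - predict) / observe)."""
    return 100 * statistics.mean(
        abs(o - p) / o for o, p in zip(observed, predicted))

print(forecast([85, 90, 95, 88, 92, 91, 89]))
```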
Impact on Others
Make ping measurements with & without iperf loading; compare loss and RTT, loaded vs unloaded.
Looking at how to avoid the impact, e.g.: QBSS/LBE, application pacing, a control loop on stdev(RTT) that reduces streams; we want to avoid scheduling.
Possible HEP usage of QBSS
Apply priority to lower-volume interactive voice/video-conferencing and real-time control; apply QBSS to high-volume data replication; leave the rest as Best Effort.
Since 40-65% of the bytes to/from SLAC come from a single application, we have modified it to enable setting of the TOS bits.
Need to identify the bottlenecks and implement QBSS there; bottlenecks tend to be at the edges, so we hope to try this with a few HEP sites.
Experiences
Getting ssh accounts and resources on remote hosts: tremendous variation in account procedures from site to site; takes up to 7 weeks; requires knowing somebody who cares; sites are becoming increasingly circumspect. Steep learning curve on ssh and its different versions. Getting disk space for file copies (100s of MBytes).
The diversity of OSs, userids, directory structures, where to find perl, iperf..., and contacts required a database to track. It also anonymizes hostnames, tracks code versions, and records whether to execute a command (e.g. no ping if the site blocks ping) & with what options. Developed tools to download software and to check remote configurations.
Remote servers (e.g. iperf) crash: start & kill the server remotely for each measurement. Commands lock up or never end: time out all commands; hung processes need to be killed (or else they fill up sockets in CLOSE-WAIT).
Some commands (e.g. pipechar) take a long time; others (disk throughput, pathrate) have results that change infrequently, so they run on different schedules.
AFS tokens to allow access to the .ssh identity timed out; used trscron.
Protocol/port blocking: ssh following the Xmas attacks; bbftp and iperf ports; big variation between sites. Wrote analyses to recognize and track problems and work with site contacts. An ongoing issue, especially with the increasing need for security, and since we want to measure inside firewalls close to real applications.
Next steps
Develop/extend management, analysis, reporting and navigation tools; improve robustness and manageability; work around ssh anomalies.
Get improved forecasters (NWS), quantify how well they work, and provide tools to access them. Optimize intervals (using forecasts and lighter-weight measurements) and durations.
Evaluate a self-rate-limiting application (bbcp); look at using Web100 for a feedback loop.
Extend the analysis of passive Netflow measurements.
Add gridftp (with Allcock@ANL), UDPmon (RHJ@manchester) & new bandwidth measurers: netest (Jin@LBNL), pathrate, pathload (Dovrolis@UDel).
Make early data available via http to interested & friendly researchers: CAIDA for correlation and validation of pipechar & iperf etc. (sent documentation); NWS for forecasting with UCSB (sent documentation). Understand correlations, validate the various tools, choose an optimum set.
Make data available by standard methods (e.g. MDS, GMA, ...) with Dantong@BNL, Jenny Schopf@ANL & Tierney@LBNL.
Make the tools portable and set up other monitoring sites, e.g. PPDG sites: SLAC ported to Linux; currently porting the measurement tools to Manchester; will work with INFN/Trieste & FNAL to port to other sites.