perfSONAR: End-to-End Network Performance Verification


perfSONAR: End-to-End Network Performance Verification. Toby Wong, Sr. Network Analyst, BCNET; Ian Gable, Technical Manager, Canada

Overview: 1. Introductions 2. Problem Statement/Example Scenario 3. Why perfSONAR? 4. Success Stories 5. Demonstration 6. Q & A

Introductions: Ian Gable, Technical Manager, igable@uvic.ca; Toby Wong, Network Analyst, toby.wong@bc.net

Problem Statement: Research Scientist, Data

Data Movement. Data flows are getting larger, and more projects are able to move data: ATLAS, CANFAR, Genomics, Square Kilometer Array, LSST 3.2-gigapixel camera, etc. Large data transfers are sensitive to loss. The capability of individual data transfer machines is growing, especially with SSDs. The sites holding the data are widely distributed, and so are the users. All data flows cross multiple network domains. CANARIE and BCNET are 100G, and soon many sites will be. Note: some slides inspired by/derived from work by Jason Zurawski, ESnet perfSONAR guru.

Network View: Jim's host in Vancouver, "The Network" (or "The Internets"), Jane's host in Toronto

Network View: Jim's host in Vancouver, switch A, router, router, router, router, router, firewall, switch B, switch C, Jane's host in Toronto

Back to Reality. The problem is almost never the network, but what do you do when it is? Is the problem in the core, i.e. BCNET or CANARIE? Is the problem in the campus network? How about the local switch port? Is the TCP stack on the local host properly tuned? Are the right drivers installed? Or is it just poor disk I/O?

Problems on Long-RTT Paths. Most network problems scale with the round-trip time (RTT) of the path. TCP buffer space: Throughput = BufferSize / RTT. Packet loss: Throughput = (MSS / RTT) x (1 / sqrt(packet loss)). This fact can lead to local problems being misinterpreted as a problem with the WAN: "It works fine when I run the transfer between two local hosts, so it must be a problem on the WAN." 
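The loss formula above is the well-known Mathis et al. approximation. A small numerical sketch (the MSS, RTT, and loss-rate values are illustrative assumptions, not numbers from the talk) shows how sharply even tiny loss limits throughput on a 50 ms cross-country path:

```python
# Mathis et al. approximation: throughput <= (MSS / RTT) * (1 / sqrt(loss))
# Illustrative sketch; the MSS, RTT, and loss values are assumptions.
from math import sqrt

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on TCP throughput, in bits/s, for a given loss rate."""
    return (mss_bytes * 8 / rtt_s) * (1 / sqrt(loss))

# A 1460-byte MSS over a 50 ms path, at three loss rates:
for loss in (1e-2, 1e-4, 1e-6):
    mbps = mathis_throughput_bps(1460, 0.050, loss) / 1e6
    print(f"loss={loss:.0e}  max ~{mbps:,.1f} Mbit/s")
```

Three orders of magnitude less loss buys three orders of magnitude more throughput, which is why long-RTT paths are so unforgiving of even rare drops.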

Buffer Size Example. For Long Fat Networks you must have sufficiently large TCP buffers for there to be enough data in flight to fill the pipe. Max throughput = BufferSize / latency. Optimum: BufferSize = 2 x Bandwidth x RTT, i.e. BufferSize = 2 x (bandwidth-delay product). A researcher wants to move data from U of Toronto to Vancouver and expects 500 Mbit/s: (0.05 s x 500 Mbps x 2) / 8 = 6.25 MB buffer needed. Older RHEL default max buffer size: 256 KB. Linux distro defaults vary from 256 KB to 4 MB.
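The slide's buffer arithmetic can be checked in a few lines; this is a minimal sketch (the function name is mine, not from the toolkit):

```python
def bdp_buffer_bytes(bandwidth_bps: float, rtt_s: float, factor: float = 2.0) -> float:
    """Recommended TCP buffer size in bytes: factor x bandwidth-delay product."""
    return factor * bandwidth_bps * rtt_s / 8  # /8 converts bits to bytes

# Toronto -> Vancouver: 500 Mbit/s target over a 50 ms RTT
buf = bdp_buffer_bytes(500e6, 0.050)
print(f"{buf / 1e6:.2f} MB buffer needed")  # 6.25 MB, matching the slide
```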

What does the researcher get? 256 KB / 0.050 s = 5 MB/s = 40 Mbit/s. Researcher's conclusion: "The network between here and Toronto sucks." Excellent resource: http://fasterdata.es.net/
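On Linux the default maximums mentioned above can be raised with sysctl. The fragment below is an illustrative sketch in the spirit of the fasterdata.es.net host-tuning advice; the specific values are assumptions to be tuned to your own bandwidth-delay product, not settings given in the talk:

```
# /etc/sysctl.conf fragment (illustrative values; size to your BDP)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# min / default / max TCP receive and send buffers, in bytes
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
```

Apply with `sysctl -p` and re-test; the autotuning maximum, not the default, is what caps a long-RTT transfer.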

Packet Loss

What is perfSONAR? A standardized measurement infrastructure developed by Internet2, GEANT, and ESnet. It can be used by both network operators and end users. All software is open source. Runs on commodity PC hardware.

Benefits. Increases network awareness. Reduces network diagnostic time. Isolates network performance issues. Provides current and historical visibility into network performance. Ability to troubleshoot across multiple network domains.

Benefits. Trusted network measurement point. Ability to diagnose whether a problem is a host issue vs. a network issue. Continuous performance verification; historical data retained.

Quick Tests: reverse traceroute, reverse ping, reverse tracepath

Network Tests. Access to on-demand and historical information: delay, packet loss, throughput/transfer rates.

Network Test Suite. BWCTL: client/server bandwidth test controller and scheduler. OWAMP: client/server program designed to test one-way delay and jitter. NDT: client/server Java application to test end-user bandwidth performance. NPAD: client/server Java application which diagnoses the last-mile segment. Ping/traceroute: ICMP responders.

Server Requirements. Intel CPU. Ethernet interface (1 Gb, 10 Gb, and now 40 GbE). Usually a 1U rack server. Network Time Protocol (NTP) source. Operating system: runs on CentOS (Linux). perfSONAR is best run on a dedicated server to prevent competition for resources when running tests. Ideally, two separate servers would be used: one for latency and the other for throughput.

Deployment Considerations. Servers should be placed closest to the service that is to be measured. Multiple nodes along your network can provide a view of your network (especially important in a distributed environment). Placing a firewall in front of the servers is not recommended, because it skews the results.

Deployment: NTP. Network Time Protocol (NTP) needs an accurate time source; NTP matters here because the tests require accurate time. Time sources should ideally be at the same stratum: geographically close servers on a low-latency, low-jitter network. Use 4-5 NTP time sources.
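A minimal /etc/ntp.conf along the lines of this advice might look like the sketch below; the server names are placeholders, not recommendations from the talk:

```
# 4-5 time sources, ideally at the same stratum and network-close
server ntp1.example.net iburst
server ntp2.example.net iburst
server ntp3.example.net iburst
server ntp4.example.net iburst
driftfile /var/lib/ntp/drift
```

Stable, symmetric paths to the time sources matter more than raw stratum number, since OWAMP one-way delay results are only as good as the clock sync on each end.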

Adoption Worldwide and in Canada. http://stats.es.net/servicesdirectory/ (April 29 - May 1, perfSONAR: End-to-End Network Performance Verification)

Adoption in Canada. BWCTL servers: 46. NDT servers: 52. NPAD servers: 33. OWAMP servers: 29. Ping responders: 55. Traceroute responders: 55. Used by BCNET, CANARIE, Compute Canada, LHC. Servers now available in most large computing centres.

BCNET Implementation: Hardware. Intel Xeon 2.4 GHz CPU, 8 GB RAM, 1 TB SAS hard drive, onboard 1 Gb NIC, PCIe 10 Gb NIC (bandwidth servers).

BCNET Implementation. Two servers at each POP site (1 x 1 Gbps latency, 1 x 10 Gbps bandwidth). Server pairs located in Surrey, Victoria, Kamloops, Kelowna, and Prince George. Network reachable via both R&E and Internet networks. Latency servers are configured to test against other latency servers (mesh configuration). Throughput servers are configured to test against other throughput servers (mesh configuration). Tests are also configured against the nearest CANARIE servers, which are located in Vancouver.

Case: Slow throughput from UVic to Chicago. Users reported slow throughput between the University of Victoria (UVic) and Argonne National Laboratory (ANL).

Case: Slow throughput from UVic to Chicago. perfSONAR chart showing performance.

Case: Slow throughput from UVic to Chicago. Perform traceroutes to verify the source-to-destination path:

 1. nestaffgw.bc.net                           0.0%  663   56.2    5.1   0.2   95.2  12.6
 2. ORAN-CU-ALL-cr1.vantx1.BC.net              0.0%  663    0.3    1.2   0.2  106.3   7.9
 3. 205.189.32.194                             0.0%  663   11.6   12.1  11.4   44.0   3.6
 4. nlr-1-lo-jmb-706.sttlwa.pacificwave.net    0.0%  663   25.9   30.8  25.8  234.3  24.4
 5. 216.24.186.69                              0.0%  662   79.7   83.5  79.5  279.0  21.6
 6. 216.24.186.71                              0.0%  662   85.6   83.4  79.5  279.8  21.0
 7. 216.24.186.62                              0.0%  662   80.8   84.1  79.5  285.6  21.7
 8. xe-1-2-0.118.rtr.ictc.indiana.gigapop.net  0.0%  662   84.0   86.1  83.8  149.5   9.3
 9. 149.165.144.50                             0.0%  662   88.8   89.0  88.6  114.0   2.0
10. vm-102.uc.futuregrid.org                   0.0%  662   89.3   89.1  89.1   89.8   0.1

Network path = BCNET, CANARIE, Pacific Wave, National Lambda Rail (NLR), NorthernLights GigaPoP, and the destination, Argonne National Lab (aka FutureGrid).

Case: Slow throughput from UVic to Chicago. Perform traceroutes to verify the destination-to-source path:

vm-102:~# traceroute elephant01.heprc.uvic.ca
traceroute to elephant01.heprc.uvic.ca (206.12.154.1), 30 hops max, 40 byte packets
 1  * * *
 2  xe-0-2-0.2014.rtr.ictc.indiana.gigapop.net (149.165.144.49)  5.395 ms  5.405 ms  5.431 ms
 3  tge-0-1-0-0.2093.chic.layer3.nlr.net (149.165.254.226)  10.003 ms  10.056 ms  10.112 ms
 4  216.24.186.63 (216.24.186.63)  64.097 ms  64.194 ms  64.255 ms
 5  216.24.186.70 (216.24.186.70)  64.035 ms  64.114 ms  64.181 ms
 6  216.24.186.68 (216.24.186.68)  63.707 ms  63.685 ms  63.705 ms
 7  207.231.240.21 (207.231.240.21)  77.812 ms  77.818 ms  77.765 ms
 8  c4-bcnet.canet4.net (205.189.32.193)  88.926 ms  181.947 ms  181.686 ms
 9  cr1-bb3901.victx1.bc.net (206.12.0.38)  90.658 ms  90.677 ms  90.656 ms
10  ORAN-UVicA.VICTX.BC.net (207.23.241.114)  91.046 ms  91.150 ms  90.962 ms
11  csc1cled050.bb.uvic.ca (142.104.252.245)  90.989 ms  90.999 ms  90.952 ms
12  drc1edc111.bb.uvic.ca (142.104.252.154)  91.005 ms  90.991 ms  90.985 ms
13  elephant.heprc.uvic.ca (206.12.154.1)  90.978 ms !x  90.963 ms !x  90.939 ms !x

Network path = Argonne National Lab, NorthernLights GigaPoP, National Lambda Rail (NLR), Pacific Wave, CANARIE, BCNET, and then UVic.

Case: Slow throughput from UVic to Chicago. Perform bandwidth tests to verify where the bottleneck is in the path. Test from the Argonne host to the NDT server at Northern Lights GigaPoP, Minneapolis, MN:

vm-101:~# web100clt -n 146.57.255.17
Testing network path for configuration and performance problems
Using IPv4 address
Checking for Middleboxes.................. Done
checking for firewalls................... Done
running 10s outbound test (client to server)..... 866.03 Mb/s
running 10s inbound test (server to client)...... 755.87 Mb/s
The slowest link in the end-to-end path is a 1.0 Gbps Gigabit Ethernet subnet
Server '146.57.255.17' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Information: Network Address Translation (NAT) box is modifying the Server's IP address
  Server says [146.57.255.17] but Client says [infotech-sv-62.ggnet.umn.edu]
Information: Network Address Translation (NAT) box is modifying the Client's IP address
  Server says [149.165.148.101] but Client says [vm-101.uc.futuregrid.org]

Case: Slow throughput from UVic to Chicago. Perform bandwidth tests to verify where the bottleneck is in the path. Additional NDT tests: Argonne to Chicago: pass. Argonne to Denver: pass. Argonne to Seattle: fail. UVic to Seattle: pass. UVic to Denver: fail. UVic to Chicago: fail.

Case: Slow throughput from UVic to Chicago. Perform IP fragmentation test.

Standard MTU test:
[tobywong@knoppix ~]$ ping -s 1472 vm-101.uc.futuregrid.org
PING vm-101.uc.futuregrid.org (149.165.148.101) 1472(1500) bytes of data.
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=1 ttl=55 time=90.9 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=2 ttl=55 time=90.8 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=3 ttl=55 time=91.3 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=4 ttl=55 time=91.2 ms
^C
--- vm-101.uc.futuregrid.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 90.891/91.097/91.328/0.353 ms

Jumbo MTU test:
[tobywong@knoppix ~]$ ping -s 1500 vm-101.uc.futuregrid.org
PING vm-101.uc.futuregrid.org (149.165.148.101) 1500(1528) bytes of data.
^C
--- vm-101.uc.futuregrid.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms
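The -s values in these pings come from header overhead: a 20-byte IPv4 header plus an 8-byte ICMP header, 28 bytes in total. A short sketch of the arithmetic (the function name is mine):

```python
IPV4_HEADER = 20  # bytes
ICMP_HEADER = 8   # bytes

def ping_payload_for_mtu(mtu: int) -> int:
    """ICMP payload size (the ping -s value) that makes the packet exactly one MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(ping_payload_for_mtu(1500))  # 1472, the standard-MTU test above
print(ping_payload_for_mtu(9000))  # 8972, for a full jumbo-frame path test
```

The slide's "jumbo" test simply sends -s 1500 (1528 bytes on the wire) to exceed a 1500-byte MTU anywhere on the path; the 100% loss shows a hop that cannot carry larger-than-standard frames.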

Case: Slow throughput from UVic to Chicago. Perform IP fragmentation test. Additional standard/jumbo frame tests using ping: UVic to Seattle: pass. UVic to Denver: fail. UVic to Chicago: fail.

Case: Slow throughput from UVic to Chicago. Problem network configuration found! The MTU on the interface between Pacific Wave and NLR was set to 1500 instead of 9000.

Dashboard Showing Packet Loss. Failure in the column indicates a problem toward Toronto.

Detected Hardware Problem. 05:47: failed switch port in the TRIUMF LHCONE router.

Deployment in Practice

Compute Canada Dashboard. http://ps-dashboard.computecanada.ca/maddash-webui/ Lixin Liu <liu@sfu.ca>

Mesh Configurations. Larger groupings of independently operated perfSONAR hosts can be hard to coordinate: one has to contact individual operators to set up a grouping of tests to examine a problem. A mesh pull configuration allows an individual perfSONAR host to pull its test configurations from a central spot.

Web GUI for Mesh Configurations

Dashboards. Other dashboards: latency, bandwidth. BCNET dashboards: latency, bandwidth. Per-community dashboards.

Summary. perfSONAR provides greater visibility into the network for network operators and users. A useful tool for identifying issues across multiple network domains. Allows continuous performance verification. A trusted measurement resource for users and network operators alike. A standardized measurement toolkit that is both easy to deploy and easy to use. Develop views that are useful to communities.

Getting the performance you expect from high-performance networks

Links. Official website: http://psps.perfsonar.net/ Installation guide: https://code.google.com/p/perfsonar-ps/wiki/psperformancetoolkit33