perfSONAR: End-to-End Network Performance Verification
Toby Wong, Sr. Network Analyst, BCNET
Ian Gable, Technical Manager, Canada
Overview
1. Introductions
2. Problem Statement / Example Scenario
3. Why perfSONAR?
4. Success Stories
5. Demonstration
6. Q & A
Introductions
Ian Gable, Technical Manager, igable@uvic.ca
Toby Wong, Network Analyst, toby.wong@bc.net
Problem Statement
[Diagram: Research Scientist <-> Data]
Data Movement
Data flows are getting larger, and more projects are able to move data: ATLAS, CANFAR, genomics, Square Kilometre Array, LSST 3.2-gigapixel camera, etc.
Large data transfers are sensitive to loss.
The capability of individual data transfer machines is growing, especially with SSDs.
Sites with the data are widely distributed, and so are the users.
All data flows cross multiple network domains.
CANARIE and BCNET are 100G, and soon many sites will be too.
Note: some slides inspired by/derived from work by Jason Zurawski, ESnet perfSONAR guru.
Network View
[Diagram: Jim's host in Vancouver <-> "The Network" (or "The Internets") <-> Jane's host in Toronto]
Network View
[Diagram: Jim's host in Vancouver <-> switch A <-> routers <-> firewall <-> switch B <-> switch C <-> Jane's host in Toronto]
Back to Reality
The problem is almost never the network, but what do you do when it is?
Is the problem in the core, i.e. BCNET or CANARIE?
Is the problem in the campus network?
How about the local switch port?
Is the TCP stack on the local host properly tuned? Are the right drivers installed?
Is it just poor disk I/O?
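A quick host-side check for the TCP-tuning question is to inspect the kernel's buffer limits. This is a sketch using standard Linux sysctl keys; the tuned values shown are illustrative examples, not a recommendation from this talk.

```shell
# Inspect the current TCP buffer limits on a Linux host.
sysctl net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

# Illustrative tuning for a long fat network: raise the maximums so TCP
# autotuning can grow the window (example values only).
sudo sysctl -w net.core.rmem_max=67108864
sudo sysctl -w net.core.wmem_max=67108864
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"
```

Changes made with `sysctl -w` do not survive a reboot; persistent settings belong in /etc/sysctl.conf or /etc/sysctl.d/.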
Problems on Long RTT Paths
Most network problems scale with the round-trip time (RTT) of the path:
TCP buffer space: Throughput = Buffer / RTT
Packet loss: Throughput = (MSS / RTT) * (1 / sqrt(packet_loss))
This fact can lead to local problems being misinterpreted as a problem with the WAN:
"It works fine when I run the transfer between two local hosts, so it must be a problem on the WAN."
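The loss formula above (the Mathis model) can be plugged in directly. A sketch, with hypothetical inputs: a 1460-byte MSS, 90 ms RTT (roughly Victoria to Chicago) and only 0.01% packet loss:

```shell
# Mathis model: throughput ceiling = (MSS / RTT) * (1 / sqrt(loss)).
# Hypothetical inputs: 1460-byte MSS, 90 ms RTT, 0.01% packet loss.
mss=1460; rtt=0.090; loss=0.0001
awk -v mss="$mss" -v rtt="$rtt" -v loss="$loss" \
    'BEGIN { printf "ceiling: %.1f Mbit/s\n", (mss * 8 / rtt) / sqrt(loss) / 1e6 }'
# -> ceiling: 13.0 Mbit/s
```

Even a tiny loss rate caps a long path at a small fraction of a 10G link, while the same loss on a 1 ms LAN path would be barely noticeable.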
Buffer Size Example
For Long Fat Networks you must have sufficiently large TCP buffers for there to be enough data in flight to fill the pipe.
Max throughput = BufferSize / latency
Optimum: BufferSize = 2 x Bandwidth x RTT = 2 x (Bandwidth-Delay Product)
A researcher wants to move data from U of Toronto to Vancouver and expects 500 Mbit/s:
(0.05 sec x 500 Mbps x 2) / 8 = 6.25 MB buffer needed
Older RHEL default max buffer size: 256 KB
Linux distro defaults vary from 256 KB to 4 MB.
What does the researcher get?
256 KB / 0.050 seconds = 5 MB/s = 40 Mbit/s
Researcher's conclusion: "The network between here and Toronto sucks."
Excellent resource: http://fasterdata.es.net/
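The two calculations above can be reproduced in a couple of lines; the 50 ms RTT, 500 Mbit/s target and 256 KB default are the slide's own numbers:

```shell
# Buffer needed: 2 x bandwidth x RTT, converted from bits to bytes.
awk 'BEGIN { printf "buffer needed: %.2f MB\n", 2 * 500e6 * 0.050 / 8 / 1e6 }'
# -> buffer needed: 6.25 MB

# What a 256 KB default buffer can sustain over the same 50 ms path.
awk 'BEGIN { printf "achievable: %.1f Mbit/s\n", 256 * 1024 * 8 / 0.050 / 1e6 }'
# -> achievable: 41.9 Mbit/s
```

The second figure matches the slide's ~40 Mbit/s; the slide rounds 256 KB to 5 MB/s of buffer drain, hence 40 Mbit/s.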
Packet Loss
What is perfSONAR?
A standardized measurement infrastructure
Developed by Internet2, GEANT and ESnet
Can be used by both network operators and end users
All software is open source
Runs on commodity PC hardware
Benefits
Increases network awareness
Reduces network diagnostic time
Isolates network performance issues
Provides current and historical visibility into network performance
Ability to troubleshoot across multiple network domains
Benefits
Trusted network measurement point
Ability to diagnose whether a problem is a host issue or a network issue
Continuous performance verification; historical data retained
Quick Tests
Reverse traceroute
Reverse ping
Reverse tracepath
Network Tests
Access to on-demand and historical information:
Delay
Packet loss
Throughput / transfer rates
Network Test Suite
BWCTL: client/server bandwidth test controller and scheduler
OWAMP: client/server program designed to test one-way delay and jitter
NDT: client/server Java application to test end-user bandwidth performance
NPAD: client/server Java application which diagnoses last-mile segment problems
Ping / Traceroute: ICMP responders
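For reference, typical command-line invocations of these tools look like the following sketch; the hostnames are placeholders, and the flags shown are the tools' standard ones:

```shell
# BWCTL: 10-second throughput test against a remote bwctld host (placeholder name).
bwctl -c ps-bandwidth.example.net -t 10

# OWAMP: one-way delay/jitter test against a remote owampd host.
owping ps-latency.example.net

# NDT: command-line client against an NDT server.
web100clt -n ndt.example.net
```

Each tool needs the corresponding daemon running on the far end, which is exactly what a perfSONAR measurement host provides.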
Server Requirements
Intel CPU
Ethernet interface (1 Gb, 10 Gb and now 40 GbE)
Usually a 1U rack server
Network Time Protocol (NTP) source
Operating system: runs on CentOS (Linux)
perfSONAR is best run on a dedicated server to prevent competition for resources when running tests. Ideally, two separate servers would be used: one for latency and the other for throughput.
Deployment Considerations
Servers should be placed closest to the service that is to be measured.
Multiple nodes along your network can provide a view of your network (especially important in a distributed environment).
Placing a firewall in front of the servers is not recommended, as it skews the results.
Deployment - NTP
Network Time Protocol (NTP): an accurate time source
NTP is important here because the tests require accurate time.
Time sources should ideally use the same stratum.
Use geographically close servers, located on a low-latency, low-jitter network.
Use 4-5 NTP time sources.
Adoption: Worldwide and in Canada
http://stats.es.net/servicesdirectory/
April 29 - May 1, perfSONAR: End-to-End Network Performance Verification
Adoption: Canada
BWCTL servers: 46
NDT servers: 52
NPAD servers: 33
OWAMP servers: 29
Ping responders: 55
Traceroute responders: 55
Used by: BCNET, CANARIE, Compute Canada, LHC
Servers now available in most large computing centres.
BCNET Implementation: Hardware
Intel Xeon 2.4 GHz CPU
8 GB RAM
1 TB SAS hard drive
Onboard 1 Gb NIC
PCIe 10 Gb NIC (bandwidth servers)
BCNET Implementation
2 servers at each POP site (1 x 1 Gbps latency, 1 x 10 Gbps bandwidth)
Server pairs located in Surrey, Victoria, Kamloops, Kelowna and Prince George
Network reachable via both R&E and Internet networks
Latency servers are configured to test against other latency servers (mesh configuration)
Throughput servers are configured to test against other throughput servers (mesh configuration)
Tests are also configured to the nearest CANARIE servers, which are located in Vancouver
Case: Slow throughput from UVic to Chicago
Users reported slow throughput between the University of Victoria (UVic) and Argonne National Laboratory (ANL).
Case: Slow throughput from UVic to Chicago
[perfSONAR chart showing performance]
Case: Slow throughput from UVic to Chicago
Perform traceroutes to verify the source-to-destination path (per-hop loss %, packets sent, and last/avg/best/worst/std-dev RTT in ms):
1. nestaffgw.bc.net 0.0% 663 56.2 5.1 0.2 95.2 12.6
2. ORAN-CU-ALL-cr1.vantx1.BC.net 0.0% 663 0.3 1.2 0.2 106.3 7.9
3. 205.189.32.194 0.0% 663 11.6 12.1 11.4 44.0 3.6
4. nlr-1-lo-jmb-706.sttlwa.pacificwave.net 0.0% 663 25.9 30.8 25.8 234.3 24.4
5. 216.24.186.69 0.0% 662 79.7 83.5 79.5 279.0 21.6
6. 216.24.186.71 0.0% 662 85.6 83.4 79.5 279.8 21.0
7. 216.24.186.62 0.0% 662 80.8 84.1 79.5 285.6 21.7
8. xe-1-2-0.118.rtr.ictc.indiana.gigapop.net 0.0% 662 84.0 86.1 83.8 149.5 9.3
9. 149.165.144.50 0.0% 662 88.8 89.0 88.6 114.0 2.0
10. vm-102.uc.futuregrid.org 0.0% 662 89.3 89.1 89.1 89.8 0.1
Network path = BCNET, CANARIE, Pacific Wave, National Lambda Rail (NLR), NorthernLights GigaPoP, and the destination Argonne National Lab (aka FutureGrid)
Case: Slow throughput from UVic to Chicago
Perform traceroutes to verify the destination-to-source path:
vm-102:~# traceroute elephant01.heprc.uvic.ca
traceroute to elephant01.heprc.uvic.ca (206.12.154.1), 30 hops max, 40 byte packets
1 * * *
2 xe-0-2-0.2014.rtr.ictc.indiana.gigapop.net (149.165.144.49) 5.395 ms 5.405 ms 5.431 ms
3 tge-0-1-0-0.2093.chic.layer3.nlr.net (149.165.254.226) 10.003 ms 10.056 ms 10.112 ms
4 216.24.186.63 (216.24.186.63) 64.097 ms 64.194 ms 64.255 ms
5 216.24.186.70 (216.24.186.70) 64.035 ms 64.114 ms 64.181 ms
6 216.24.186.68 (216.24.186.68) 63.707 ms 63.685 ms 63.705 ms
7 207.231.240.21 (207.231.240.21) 77.812 ms 77.818 ms 77.765 ms
8 c4-bcnet.canet4.net (205.189.32.193) 88.926 ms 181.947 ms 181.686 ms
9 cr1-bb3901.victx1.bc.net (206.12.0.38) 90.658 ms 90.677 ms 90.656 ms
10 ORAN-UVicA.VICTX.BC.net (207.23.241.114) 91.046 ms 91.150 ms 90.962 ms
11 csc1cled050.bb.uvic.ca (142.104.252.245) 90.989 ms 90.999 ms 90.952 ms
12 drc1edc111.bb.uvic.ca (142.104.252.154) 91.005 ms 90.991 ms 90.985 ms
13 elephant.heprc.uvic.ca (206.12.154.1) 90.978 ms!x 90.963 ms!x 90.939 ms!x
Network path = Argonne National Lab, NorthernLights GigaPoP, National Lambda Rail (NLR), Pacific Wave, CANARIE, BCNET and then UVic
Case: Slow throughput from UVic to Chicago
Perform bandwidth tests to verify where the bottleneck is in the path.
Test from the Argonne host to: NDT - Northern Lights GigaPoP @ Minneapolis, MN
======================
vm-101:~# web100clt -n 146.57.255.17
Testing network path for configuration and performance problems
Using IPv4 address
Checking for Middleboxes.................. Done
checking for firewalls................... Done
running 10s outbound test (client to server)..... 866.03 Mb/s
running 10s inbound test (server to client)...... 755.87 Mb/s
The slowest link in the end-to-end path is a 1.0 Gbps Gigabit Ethernet subnet
Server '146.57.255.17' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Information: Network Address Translation (NAT) box is modifying the Server's IP address
Server says [146.57.255.17] but Client says [infotech-sv-62.ggnet.umn.edu]
Information: Network Address Translation (NAT) box is modifying the Client's IP address
Server says [149.165.148.101] but Client says [vm-101.uc.futuregrid.org]
Case: Slow throughput from UVic to Chicago
Perform bandwidth tests to verify where the bottleneck is in the path.
Additional NDT tests:
Argonne to Chicago: pass
Argonne to Denver: pass
Argonne to Seattle: fail
UVic to Seattle: pass
UVic to Denver: fail
UVic to Chicago: fail
Case: Slow throughput from UVic to Chicago
Perform IP fragmentation test
Standard MTU test:
[tobywong@knoppix ~]$ ping -s 1472 vm-101.uc.futuregrid.org
PING vm-101.uc.futuregrid.org (149.165.148.101) 1472(1500) bytes of data.
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=1 ttl=55 time=90.9 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=2 ttl=55 time=90.8 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=3 ttl=55 time=91.3 ms
1480 bytes from vm-101.uc.futuregrid.org (149.165.148.101): icmp_seq=4 ttl=55 time=91.2 ms
^C
--- vm-101.uc.futuregrid.org ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 90.891/91.097/91.328/0.353 ms
Jumbo MTU test:
[tobywong@knoppix ~]$ ping -s 1500 vm-101.uc.futuregrid.org
PING vm-101.uc.futuregrid.org (149.165.148.101) 1500(1528) bytes of data.
^C
--- vm-101.uc.futuregrid.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms
Case: Slow throughput from UVic to Chicago
Perform IP fragmentation test
Additional standard/jumbo-frame tests using ping:
UVic to Seattle: pass
UVic to Denver: fail
UVic to Chicago: fail
Case: Slow throughput from UVic to Chicago
Problem network configuration found! The MTU on the interface between Pacific Wave and NLR was set to 1500 instead of 9000.
Dashboard Showing Packet Loss
A failure in the column indicates a problem on the path to Toronto.
Detected Hardware Problem
05:47: failed switch port in the TRIUMF LHCONE router
Deployment in Practice
Compute Canada Dashboard
http://ps-dashboard.computecanada.ca/maddash-webui/
Lixin Liu <liu@sfu.ca>
Mesh Configurations
Larger groupings of independently operated perfSONAR hosts can be hard to coordinate: one has to contact individual operators to set up a grouping of tests to examine a problem.
A mesh pull configuration allows an individual perfSONAR host to pull its test configurations from a central location.
Web GUI for Mesh Configurations
Dashboards
Other dashboards: Latency, Bandwidth
BCNET dashboards: Latency, Bandwidth
Per-community dashboards
Summary
perfSONAR provides greater visibility into the network for network operators and users.
A useful tool for identifying issues across multiple network domains.
Allows continuous performance verification.
A trusted measurement resource for users and network operators alike.
A standardized measurement toolkit that is both easy to deploy and easy to use.
Develop views that are useful to communities.
Getting the performance you expect from high-performance networks
Links
Official website: http://psps.perfsonar.net/
Installation guide: https://code.google.com/p/perfsonar-ps/wiki/psperformancetoolkit33