Network Monitoring with the perfSONAR Dashboard

Andy Lake, Brian Tierney
ESnet Advanced Network Technologies Group
TIP2013, Honolulu, HI, January 15, 2013

Overview
- perfSONAR overview
- Dashboard history and motivation
- Current ESnet dashboards
- Installation and integration with the Toolkit
- Future work and collaboration
- Information on the new perfSONAR-PS release

What is perfSONAR?
- perfSONAR is a tool to:
  - Set network performance expectations
  - Find network problems ("soft failures")
  - Help fix these problems
  - All in multi-domain environments
- These problems are all harder when multiple networks are involved
- perfSONAR provides a standard way to publish active and passive monitoring data
- This data is of interest to network researchers as well as network operators

Setting Expectations: Time to Copy 1 Terabyte
- 10 Mbps network: 300 hrs (12.5 days)
- 100 Mbps network: 30 hrs
- 1 Gbps network: 3 hrs (are your disks fast enough?)
- 10 Gbps network: 20 minutes (need really fast disks/filesystem)
- These figures assume some headroom is left for other users
- Compare these speeds to a USB 2.0 portable disk:
  - 60 MB/sec (480 Mbps) peak
  - 5-15 MB/sec more typical
  - 15-40 hours to load 1 Terabyte
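These figures follow from simple arithmetic. 1 TB is 8 × 10^12 bits, so at the full 10 Mbps line rate the copy takes about 222 hours. The slide's numbers are higher because they leave headroom for other users. The sketch below assumes, purely for illustration, that about 75% of the line rate is usable, which lands close to the slide's rounded values. The function name and the 75% figure are assumptions and do not come from the slides.

```python
def transfer_time_hours(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link running at link_bps.

    efficiency is the fraction of the line rate assumed usable
    (an illustrative knob, not a value taken from the slides).
    """
    if link_bps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_bps must be positive and 0 < efficiency <= 1")
    seconds = size_bytes * 8 / (link_bps * efficiency)  # bytes -> bits, then divide by rate
    return seconds / 3600


TERABYTE = 1e12  # decimal terabyte, in bytes

if __name__ == "__main__":
    links = [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]
    for label, bps in links:
        ideal = transfer_time_hours(TERABYTE, bps)
        derated = transfer_time_hours(TERABYTE, bps, efficiency=0.75)  # assumed headroom
        print(f"{label:>8}: {ideal:8.2f} h at line rate, {derated:8.2f} h at 75%")
```

For 10 Mbps this gives about 222 h at line rate versus about 296 h at 75%. For 10 Gbps it gives roughly 13 versus 18 minutes. At 10 Gbps the disks must sustain about 1.25 GB/s, which is why the slide asks for really fast disks.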

Soft Network Failures
- Soft failures are where basic connectivity functions, but high performance is not possible.
- TCP was intentionally designed to hide all transmission errors from the user:
  "As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users." (From IEN 129, RFC 716)
- Some soft failures only affect high-bandwidth, long-RTT flows.
- Hard failures are easy to detect and fix; soft failures can lie hidden for years!
- One network problem can often mask others.
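The claim that some soft failures only hurt high-bandwidth, long-RTT flows can be made concrete with the Mathis et al. approximation for steady-state TCP Reno throughput: rate ≈ (MSS / RTT) × C / sqrt(p). Here p is the packet loss rate and C ≈ sqrt(3/2). This model is not on the slide. It ignores timeouts, window limits, and newer congestion-control algorithms, so it is used here only to show how throughput scales. The 1460-byte MSS and the 0.01% loss rate are illustrative assumptions.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP Reno throughput bound, in bits per second.

    Mathis et al. model: rate ~= (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2).
    It ignores timeouts, receive-window limits and modern congestion control.
    """
    if mss_bytes <= 0 or rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need mss_bytes > 0, rtt_s > 0 and 0 < loss_rate < 1")
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    loss = 1e-4  # one packet in 10,000: an illustrative "soft failure" loss rate
    for rtt_ms in (1, 10, 50, 100):
        bps = mathis_throughput_bps(1460, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>3} ms -> about {bps / 1e6:8.1f} Mbps")
```

With the same 0.01% loss, the bound is about 1.4 Gbps at 1 ms RTT but only about 14 Mbps at 100 ms. A loss rate that goes unnoticed on a short local path can therefore cripple a cross-country transfer.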

Sample Results: Finding/Fixing Soft Failures
- Rebooted router with full route table
- Gradual failure of optical line card

perfSONAR Services
- The PS-Toolkit includes these measurement tools:
  - BWCTL: network throughput
  - OWAMP: network loss, delay, and jitter
  - traceroute
- Test scheduler: runs bwctl, traceroute, and owamp tests on a regular interval
- Measurement Archives (data publication):
  - SNMP MA: router interface data
  - pSB MA: results of bwctl, owamp, and traceroute tests
- Lookup Service: used to find services
- The PS-Toolkit also includes these web100-based troubleshooting tools:
  - NDT (TCP analysis, duplex mismatch, etc.)
  - NPAD (TCP analysis, router queuing analysis, etc.)
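OWAMP reports one-way loss, delay, and jitter. The sketch below illustrates what those numbers mean by summarizing a list of hypothetical probe records. It is not OWAMP's own code, output format, or exact statistics. One-way delay is only meaningful if the sender's and receiver's clocks are synchronized, which is why OWAMP hosts normally run NTP. Here "jitter" is the mean absolute change in delay between consecutively received packets, which is one simple delay-variation measure among several.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

# Hypothetical record: (sequence number, send time in s, receive time in s or None if lost).
Sample = Tuple[int, float, Optional[float]]


def summarize_one_way(samples: Sequence[Sample]) -> dict:
    """Loss, one-way delay, and a simple jitter figure from probe samples.

    Illustrative only: jitter is the mean absolute change in one-way delay
    between consecutively received packets (an IPDV-style measure).
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples, key=lambda s: s[0])  # restore send order
    delays = [recv - sent for _, sent, recv in ordered if recv is not None]
    lost = len(ordered) - len(delays)
    if len(delays) > 1:
        jitter = sum(abs(b - a) for a, b in zip(delays, delays[1:])) / (len(delays) - 1)
    else:
        jitter = 0.0
    return {
        "sent": len(ordered),
        "lost": lost,
        "loss_pct": 100.0 * lost / len(ordered),
        "min_delay_s": min(delays) if delays else None,
        "median_delay_s": median(delays) if delays else None,
        "jitter_s": jitter,
    }
```

For example, four probes with one lost packet and delays of 10, 12, and 11 ms summarize to 25% loss, a 10 ms minimum, an 11 ms median, and 1.5 ms of jitter.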

perfSONAR-PS Software
- perfSONAR-PS is an open source implementation of the perfSONAR measurement infrastructure and protocols
- Mostly written in Perl, with some Java and Python too
- http://software.internet2.edu/ps-performance_toolkit/
- Documentation and issue tracker at: http://code.google.com/p/perfsonar-ps/
- All components are available as RPMs
- The perfSONAR-PS consortium supports CentOS
- RPMs are compiled for both i386 and x86_64 architectures
- Functionality on other platforms and architectures is possible, but not supported:
  - Should work: Red Hat Enterprise Linux and Scientific Linux (v5)
  - Harder, but possible: Fedora Linux, SuSE, Debian variants
(Internet2, 2011)

World-Wide perfSONAR-PS Deployments: 536 bwctl nodes, 505 owamp nodes as of Jan 7

perfSONAR-PS Toolkit 3.3-RC1 Now Available
- perfSONAR-PS Toolkit 3.3-RC1 announced. It is still considered beta, but contains the foundation for a wealth of new features and enhancements:
  - CentOS 6 support
  - 32-bit and 64-bit support
  - LiveUSB installation
  - New REST-based Lookup Service
  - Support for centralized mesh configuration of multiple hosts
  - MaDDash add-on package for displaying a performance measurement dashboard
  - Many more fixes/features, with more to come in future RCs
- The dashboard is the focus of this presentation, but it is part of a suite of new tools being added to perfSONAR-PS

ESnet perfSONAR Infrastructure
- ESnet maintains a perfSONAR deployment of 80 nodes on the backbone and at site borders:
  - 36 throughput nodes running regular BWCTL tests
  - 34 latency nodes running regular OWAMP tests
  - 10 combined hosts running both BWCTL and OWAMP
- Over 1000 point-to-point tests to monitor within the network alone
- Manually walking through every graph trying to find problems is not feasible
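Manual inspection stops scaling because the number of pairs grows quadratically. If each of n hosts tests every other host in both directions, there are n(n-1) tests. As a hypothetical, the 46 hosts that run BWCTL here (36 throughput plus 10 combined) would yield 46 × 45 = 2070 directed throughput tests in a full mesh. The slide does not describe ESnet's actual test configuration, which may well be sparser. The helper below is an illustrative sketch. It is not the perfSONAR-PS centralized mesh configuration tool or its file format, and the host names are made up.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def full_mesh_tests(hosts: Sequence[str]) -> List[Tuple[str, str]]:
    """All ordered (source, destination) pairs of distinct hosts.

    Ordered because throughput and one-way loss can differ by direction,
    so a->b and b->a count as separate tests.
    """
    if len(set(hosts)) != len(hosts):
        raise ValueError("host names must be unique")
    return list(permutations(hosts, 2))


if __name__ == "__main__":
    for n in (10, 46, 80):
        hosts = [f"host{i:02d}" for i in range(n)]  # hypothetical host names
        print(f"{n:>3} hosts -> {len(full_mesh_tests(hosts)):>5} directed tests")
```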

First Attempt at Automated Alerts from perfSONAR Data: Nagios Checks
- Developed a set of Nagios checks to report when tests fell below certain thresholds
- Integrated into ESnet production monitoring
- Nagios is really good at looking at individual hosts and services, but looking at pairs of hosts was not straightforward
- Started off looking at things in aggregate, but this caused us to miss smaller issues drowned out by working tests
- It was a one-dimensional solution to an inherently two-dimensional problem
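The aggregate problem can be shown in a few lines. The sketch returns the code a Nagios plugin would exit with, following the standard convention of 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The thresholds, the per-pair results, and the function name are all hypothetical, and this is not ESnet's actual check. Averaging six directed results where one path has collapsed still reports OK. Checking each pair separately flags the bad path.

```python
from statistics import mean
from typing import Dict, Optional, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_throughput(mbps: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Nagios-style threshold check for one throughput value, in Mbps."""
    if mbps is None:
        return UNKNOWN, "UNKNOWN - no recent result"
    if mbps < crit:
        return CRITICAL, f"CRITICAL - {mbps:.0f} Mbps < {crit:.0f}"
    if mbps < warn:
        return WARNING, f"WARNING - {mbps:.0f} Mbps < {warn:.0f}"
    return OK, f"OK - {mbps:.0f} Mbps"


if __name__ == "__main__":
    # Hypothetical per-pair results (Mbps); the c->b path has a soft failure.
    results: Dict[Tuple[str, str], float] = {
        ("a", "b"): 9100, ("b", "a"): 9300, ("a", "c"): 8900,
        ("c", "a"): 9050, ("b", "c"): 9200, ("c", "b"): 450,
    }
    _, msg = check_throughput(mean(results.values()), warn=5000, crit=1000)
    print("aggregate:", msg)  # the average (about 7667 Mbps) hides the bad path
    for pair, value in results.items():
        _, msg = check_throughput(value, warn=5000, crit=1000)
        print(pair, msg)
```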

Developing a Dashboard
- Realized we needed a better solution
- At that time USATLAS had started experimenting with a dashboard, but it had not yet evolved into the modular dashboard
- There was clearly a community (not just ESnet) need for a solution
- Developed MaDDash (Monitoring and Debugging Dashboard) to help address this problem
- Immediately saw patterns that allowed us to identify problems with the measurement infrastructure that were causing tests to fail
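A dashboard treats the same data as a matrix. Rows are sources, columns are destinations, and each cell is colored by its latest result. The sketch below renders such a grid as text. The thresholds, the one-letter colors, and the function names are illustrative assumptions, not MaDDash's actual implementation. A missing result shows up as "?". A host whose whole row and column are "?" therefore stands out immediately, and that pattern points at a broken measurement host rather than a network fault.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def status(mbps: Optional[float], warn: float = 5000, crit: float = 1000) -> str:
    """Map one throughput result to a one-letter color (hypothetical thresholds)."""
    if mbps is None:
        return "?"  # no data: often a measurement-host problem, not a network one
    if mbps < crit:
        return "R"  # red
    if mbps < warn:
        return "Y"  # yellow
    return "G"      # green


def render_grid(hosts: Sequence[str], results: Dict[Tuple[str, str], float]) -> List[str]:
    """One text row per source host; columns follow the same host order.

    '-' marks the diagonal, where a host would test itself.
    """
    width = max(len(h) for h in hosts)
    rows = []
    for src in hosts:
        cells = ["-" if src == dst else status(results.get((src, dst))) for dst in hosts]
        rows.append(f"{src:>{width}} " + " ".join(cells))
    return rows


if __name__ == "__main__":
    hosts = ["a", "b", "c"]
    results = {("a", "b"): 9100, ("b", "a"): 9300, ("a", "c"): 8900,
               ("c", "a"): 9050, ("b", "c"): 9200, ("c", "b"): 450}
    print("\n".join(render_grid(hosts, results)))
```

With these hypothetical results, the single red cell in row c, column b isolates the one bad direction. The Nagios aggregate shown earlier could not do that.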

Demo

perfSONAR-PS Toolkit Integration
- Install with "yum install maddash" on a CentOS box, including those running the perfSONAR-PS Toolkit NetInstall
- Install instructions at: http://code.google.com/p/perfsonar-ps/wiki/maddashinstall
- The new centralized mesh configuration tool can automatically generate the configuration
- Integration with some of the administrative GUIs, such as those that enable and disable services

Dashboard Deployment
- Contains a REST interface that allows easy access to data
- Able to use the interface to integrate with the My ESnet portal:
  - https://my.es.net/network/performance/bwctl
  - https://my.es.net/network/performance/owamp
- Also able to extract it into a standalone GUI that others can install via RPM:
  - NCAR/XSEDE: http://ps.ncar.xsede.org/maddash-webui/
  - Internet2: http://lab234.internet2.edu/maddash-webui/
  - PennREN: http://bwctl.net.pennren.net/maddash-webui/index.cgi?dashboard=pennren
  - ESnet: http://ps-dashboard.es.net

Future Work and Collaboration
- Working with USATLAS to share components and make sure tools can work with each other using common APIs and code
- Job scheduling component of MaDDash ported to work with the USATLAS modular dashboard
- Ongoing effort to improve performance and GUIs, and to simplify configuration
- Possible refinement of the algorithm that determines when to change from green to yellow and back
- Identifying new dashboards and helping others get them running
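One common way to keep a cell from flapping between green and yellow is hysteresis, where the color changes only after several consecutive results disagree with the current one. The slide says only that the transition algorithm may be refined. The rule below is a hypothetical illustration of the idea, not MaDDash's actual algorithm, and the threshold and `needed` count are arbitrary.

```python
from typing import Iterable, List


def debounced_states(values: Iterable[float], threshold: float, needed: int = 3,
                     start: str = "green") -> List[str]:
    """Return the displayed color after each result.

    The color flips between 'green' and 'yellow' only after `needed`
    consecutive results disagree with the current color (simple hysteresis).
    """
    if needed < 1:
        raise ValueError("needed must be >= 1")
    state, streak, out = start, 0, []
    for v in values:
        observed = "green" if v >= threshold else "yellow"
        if observed == state:
            streak = 0  # agreement resets the run of disagreeing results
        else:
            streak += 1
            if streak >= needed:
                state, streak = observed, 0
        out.append(state)
    return out
```

With `needed=3` and a threshold of 5000, the series 9000, 400, 9000, 400, 400, 400, 9000 (Mbps) stays green until the third consecutive low result. It then stays yellow through a single good result. With `needed=1`, every change in the observed color flips the cell.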

Questions? Thanks!
- Andy Lake - andy@es.net
- Brian Tierney - bltierney@es.net
- http://www.es.net/
- http://fasterdata.es.net/
- http://code.google.com/p/perfsonar-ps/wiki/maddashinstall

Extra Slides

Demo: Dashboard List

Demo: ESnet Throughput

Demo: ESnet Loss

Demo: ESnet to APAN

Demo: Graphs

perfSONAR-PS Toolkit 3.3-RC1
- perfSONAR-PS Toolkit 3.3-RC1 released on Monday
- 3.3 is a major overhaul, so we expect multiple RCs and need everyone's help to test!
- New features in RC1:
  - CentOS 6 (32-bit and 64-bit support)
  - LiveUSB distribution in addition to NetInstall and LiveCD
  - Completely rewritten and redesigned Lookup Service
  - Centralized mesh configuration management software
  - Integration with the MaDDash performance monitoring dashboard
  - Numerous other bug fixes and enhancements
- Expect more in the coming weeks from future RCs before the final release:
  - Traceroute visualization developed by the University of Wisconsin
  - Iperf3 integration with BWCTL
  - Web10G integration
- Since it is a release candidate, it should be considered beta software and not used to upgrade existing production Toolkit hosts