Network monitoring with perfSONAR. Duncan Rand, Imperial College London
2 A little history: Gridmon
- Cast your minds back to GridPP16 at QMUL, about six years ago: we saw a not-too-dissimilar network monitoring status report by Robin
- GridPP's Gridmon network monitoring system was developed at Daresbury
- Gridmon v1 ran in a full mesh between the 12 original GridPP sites
- Functionality:
  - iperf: throughput
  - ping: packet loss, RTT
  - udpmon: UDP throughput and packet loss
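The three probes above reduce to a handful of derived metrics. As a rough sketch (hypothetical helper functions, not Gridmon's actual code), the quantities each tool reports can be computed like this:

```python
# Sketch of the metrics Gridmon's probes reported
# (illustrative helpers; Gridmon's real implementation differed).

def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """iperf-style achieved throughput in Mbit/s."""
    return bytes_transferred * 8 / seconds / 1e6

def packet_loss_pct(sent: int, received: int) -> float:
    """ping/udpmon-style packet loss as a percentage."""
    return 100.0 * (sent - received) / sent

def mean_rtt_ms(rtt_samples_ms: list[float]) -> float:
    """ping-style mean round-trip time."""
    return sum(rtt_samples_ms) / len(rtt_samples_ms)

# 375 MB moved in a 30 s test:
print(throughput_mbps(375_000_000, 30))  # 100.0 Mbit/s
```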
3 Gridmon
- Version 2 of the Gridmon software was proposed to run on the MONARC model of the envisaged LHC data flow from the Tier-1 at RAL to the Tier-2s
- A star topology from RAL to all ~20 Tier-2s, because of concerns about contention (sites had less bandwidth in those days)
- A full mesh only within the distributed Tier-2s (i.e. ScotGrid, NorthGrid, etc.)
- Status of the site roll-out was similar to now, i.e. at the tail end
- The URL still works
5 LHC data: CMS
- Switching to the present: where is the LHC VO data that's being copied to Tier-2s actually coming from?
- CMS Tier-2s have always got data from sites other than their local Tier-1
- For example: a snapshot of data flow into Imperial College taken over the last week
- A 46 TB dataset transfer, mostly from Fermilab
6 CMS
- Over the last six months UK CMS Tier-2s have got only 14% of their data from within the UK, and about 28% comes from one site: Fermilab
- Well worth monitoring that link
7 ATLAS
- A similar 48-hour picture for a typical ATLAS site: QMUL
- 35% (11 TB) from BNL
8 ATLAS
- ATLAS UK Tier-2s over the last year got 54% of their data from outside the UK
- RAL-LCG2 is still the site with the largest overall proportion (34%)
- So we clearly need an international network performance monitoring and diagnostic system
9 perfSONAR
- perfSONAR stands for PERFormance Service-Oriented Network monitoring ARchitecture
- perfSONAR is an infrastructure for network performance monitoring, making it easier to solve end-to-end performance problems on paths crossing several networks
- A collaboration between:
  - ESnet (the US DoE Energy Sciences Network)
  - GÉANT2 (Europe's high-bandwidth academic internet)
  - Internet2 (advanced networking consortium)
  - RNP (Rede Nacional de Ensino e Pesquisa: Brazil's academic internet backbone)
- Also a protocol and several interoperable software packages
- Two flavours: perfSONAR-MDM (Multi-Domain Monitoring, the GÉANT flavour) and perfSONAR-PS (the US version)
10 UK deployment
- The UK's DRI money was fortuitously timed, as it enabled GridPP sites to go out and buy suitable perfSONAR monitoring hosts en masse
- Some sites bought two hosts, one for bandwidth monitoring and one for latency/packet-loss monitoring; others bought only one
- The RAL Tier-1 pioneered the installation, followed by Oxford and QMUL
- A wiki was set up to help new sites with installation
- Currently 16 UK sites, including the Tier-1, run perfSONAR boxes
- The last remaining sites are those with very little (if any) GridPP manpower
11 Test mesh
- Sites join the GridPP community and thereby appear for other sites to schedule tests to
- A full mesh of bandwidth and latency tests between all UK sites
- Typically we measure bandwidth with a 30 s iperf test every 4 hours, and one-way latency by sending a set of packets to each destination site using NTP-synchronised clocks
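The scale of the schedule above is easy to work out. A minimal sketch (site names and the subset size are illustrative, not the actual GridPP configuration):

```python
from itertools import permutations

# Illustrative subset of GridPP sites (placeholder names).
sites = ["RAL", "Oxford", "QMUL", "Imperial"]

# A full mesh means every ordered (source, destination) pair,
# since throughput and one-way latency differ by direction.
mesh = list(permutations(sites, 2))

# One 30 s iperf bandwidth test per pair every 4 hours:
TEST_DURATION_S = 30
INTERVAL_HOURS = 4
tests_per_day = len(mesh) * (24 // INTERVAL_HOURS)

print(len(mesh), tests_per_day)  # 12 ordered pairs, 72 tests per day
```

With all 16 UK sites the same arithmetic gives 240 ordered pairs, which is why a dashboard view (next slides) matters more than per-site graphs.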
12 Bandwidth (throughput) results
14 Packet loss / latency results
16 BNL dashboard
- When the US deployed perfSONAR they found it difficult to get an overview; they were continually having to monitor each site's perfSONAR rate graphs
- So they developed the BNL dashboard
- Matrices of bandwidth and packet loss between sites, all on one page
- USCMS, USATLAS, Italy, Canada, France, LHCOPN and LHCONE (selected sites) clouds
- UK sites have been added to the UK cloud as and when they came online, and now form the largest country cloud (16 sites)
- Inter-cloud groups monitor international performance
17 BNL dashboard
18 How can perfSONAR help us? ATLAS T2D
- ATLAS now has standard Tier-2s and a higher-performance class of Tier-2 known as T2D
- T2Ds have demonstrated good network connectivity; ATLAS ships more data directly to them and uses them to export data within their cloud to other Tier-2s
- The requirement is to be able to copy to and from 9 of the 12 Tier-1s at >5 MB/s, reliably, over a sustained period (about a month)
- A number of UK ATLAS Tier-2s have been T2Ds for a while (Glasgow, Manchester, QMUL, Lancaster, Oxford, ECDF)
- Others were promoted more recently: RHUL and Cambridge
- Until recently a site, once promoted, was apparently unlikely to be demoted
- Now ADC are hinting at demotions; ECDF was recently put on negative watch, a real networking problem to solve
- We have set up an ATLAS-UK inter-cloud group to help with such issues
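The T2D criterion above is a simple threshold count. A minimal sketch of the rule as stated on the slide (the function and the rate figures are illustrative, not ADC's actual tooling):

```python
def is_t2d_candidate(rates_mb_s: dict[str, float],
                     threshold_mb_s: float = 5.0,
                     required: int = 9) -> bool:
    """T2D rule as stated: sustained transfers to/from at least
    9 of the 12 Tier-1s at more than 5 MB/s.
    rates_mb_s maps Tier-1 name -> worst-direction sustained rate
    over the evaluation period (about a month)."""
    good = sum(1 for rate in rates_mb_s.values() if rate > threshold_mb_s)
    return good >= required

# Hypothetical monthly rates (MB/s) for the 12 Tier-1s:
rates = {f"Tier1-{i:02d}": (6.0 if i < 10 else 2.0) for i in range(12)}
print(is_t2d_candidate(rates))  # True: 10 of 12 links exceed 5 MB/s
```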
19 Other ATLAS sites to UK inter-cloud
20 Current issues I: UK Tier-2s to FZK/KIT
- Many UK Tier-2 sites had been getting poor file-transfer rates into KIT for a while
- perfSONAR results were also poor; this was diagnosed as a KIT firewall problem
- Manchester and Oxford were moved outside the firewall: a dramatic change
- KIT sees LHCONE as the solution to this problem
- GGUS-Ticket-ID: #
21 Current issues II: Taiwan to QMUL
- On 4th September QMUL upgraded their WAN to 10 Gbps
- They saw an increase in bandwidth to other UK sites, as expected
- But after a temporary increase there was a dramatic decrease in ATLAS sonar file-transfer rates from ASGC
- traceroute and perfSONAR reverse-traceroute commands suggested possibly differing routes to and from Taiwan
- Solved!
- GGUS-Ticket-ID: #
24 Current issues III: Oxford to BNL
- Since April Oxford has got very poor results copying ATLAS sonar files to BNL
- But perfSONAR throughput to BNL is OK (0.8 Gbps), as is Oxford to AGLT2 (since June)
- traceroute indicates the same routing as for other UK sites to BNL
- GGUS-Ticket-ID: #
25 (perfSONAR plots: AGLT2, MWT2, BNL)
26 Current issues IV: Birmingham to BNL
- A similar problem copying from Birmingham to BNL
- Poor FTS rates, but OK to MWT2, and good perfSONAR throughput rates
- What about a reverse traceroute?
- GGUS-Ticket-ID: #
28 CMS use of perfSONAR
- CMS is developing the way in which it intends to use perfSONAR
- USCMS + Brazil have a cloud on the BNL dashboard
- UK CMS sites in the UK cloud form our national CMS cloud
- RALPP is part of a prototype inter-cloud group with Nebraska, Florida, DESY, Beijing, SPRACE, Bari, TIFR, SINP and Taiwan
- Purpose: testing network connections between widely separated CMS computing sites (Tier-2s), a few per geographic region
29 Conclusion
- We clearly need internationally standard network monitoring, both to watch links and to actively diagnose them when problems occur
- perfSONAR, together with timely DRI hardware, provides this capability
- The UK is now in a strong position: most (16) of the UK Tier-2s have deployed their perfSONAR hosts and have been added to the BNL dashboard
- perfSONAR is already being used to assist in diagnosing network issues
- There is plenty to do in terms of understanding how best to use this diagnostic tool to optimise and debug links between sites
- The UK needs to keep up to date with developments associated with LHCONE
- Thanks to Tom Wlodek (BNL), Shawn McKee and Jason Zurawski (Internet2) for their help
