Introduction & Motivation

Introduction & Motivation This document is a result of work by the perfSONAR Project (http://www.perfsonar.net) and is licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/). Event Presenter, Organization, Email Date

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

Problem Statement The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale networks.

Problem Statement While these networks all interconnect, each network is owned and operated by separate organizations (called "domains") with different policies, customers, funding models, hardware, bandwidth and configurations.

The R&E Community The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale resources, each independently owned and operated. This complex, heterogeneous set of networks must operate seamlessly from end to end to support science and research collaborations that are distributed globally. Data mobility is required; there is no liquid market for HPC resources (people use what they can get: DOE, XSEDE, NOAA, etc.). To stay competitive, we must learn the use patterns and support them. This may mean making sure your network, and the networks of others, are functional.

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

Understanding Data Trends [Chart: data scale (10 GB up to 100 PB) versus collaboration scale. Small collaboration scale, e.g. light and neutron sources; medium collaboration scale, e.g. HPC codes; large collaboration scale, e.g. LHC. A few large collaborations have internal software and networking organizations. Source: http://www.es.net/science-engagement/science-requirements-reviews/]

Sample Data Transfer Rates This table is available at: http://fasterdata.es.net/home/requirements-and-expectations

Challenges to Network Adoption Causes of performance issues are complicated for users. Lack of communication and collaboration between the CIO's office and researchers on campus. Lack of IT expertise within a science collaboration or experimental facility. Users' performance expectations are low ("The network is too slow", "I tried it and it didn't work"). Cultural change is hard ("we've always shipped disks!"). Scientists want to do science, not IT support. The Capability Gap.

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

Let's Talk Performance "In any large system, there is always something broken." (Jon Postel) Modern networks are occasionally designed to be one-size-fits-most; e.g. if you have ever heard the phrase "converged network", the design is to facilitate CIA (Confidentiality, Integrity, Availability). This is not bad for protecting the HVAC system from hackers. It's all TCP: bulk data movement is a common thread (move the data from the microscope, to the storage, to the processing, to the people, and they are all sitting in different facilities). This fails when TCP suffers due to path problems (ANYWHERE in the path); it's easier to work with TCP than to fix it (20+ years of trying). TCP suffers the most from unpredictability; packet loss/delays are the enemy: small buffers on the network gear and hosts, incorrect application choice, packet disruption caused by overzealous security, congestion from herds of mice. It all starts with knowing your users, and knowing your network.
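
A useful rule of thumb here (the well-known Mathis et al. estimate for loss-limited, Reno-style TCP; an illustration added to this transcription, not part of the original slide) shows why even tiny loss rates are the enemy on long paths:

```latex
\mathrm{Throughput} \;\lesssim\; \frac{\mathrm{MSS}}{\mathrm{RTT}\cdot\sqrt{p}}
```

For example, with a 1500-byte MSS, a 100 ms RTT, and only 0.1% packet loss (p = 0.001), the bound is roughly 1500 B / (0.1 s x 0.032), about 470 kB/s, i.e. under 4 Mbit/s, no matter how much capacity the path has.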

Where Are The Problems? [Diagram: source campus S connected through regional networks, an NREN, and the backbone to destination campus D. Trouble spots: congested or faulty links between domains; latency-dependent problems inside domains with small RTT; congested intra-campus links.]

Local Testing Will Not Find Everything [Diagram: source campus S connected via regional networks and the R&E backbone to destination campus D, with a switch with small buffers in the path. Performance is good when RTT is < ~10 ms, but poor when RTT exceeds ~10 ms.]
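
A minimal sketch (not from the original slides) of why a small-buffer switch only hurts longer paths: the buffering a bursty TCP flow can demand scales with the bandwidth-delay product, so a LAN-scale test never exercises the shortfall that a cross-country RTT does. The 10 Gbit/s link speed and the RTT values below are illustrative assumptions.

```python
def bandwidth_delay_product_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes 'in flight' on the path: bandwidth x round-trip time."""
    return (link_gbps * 1e9 / 8) * (rtt_ms / 1e3)

if __name__ == "__main__":
    # Illustrative 10 Gbit/s path at LAN, metro, and continental RTTs.
    for label, rtt_ms in [("LAN", 0.5), ("metro", 5), ("continental", 80)]:
        bdp = bandwidth_delay_product_bytes(10, rtt_ms)
        print(f"{label:12s} RTT {rtt_ms:5.1f} ms -> BDP {bdp / 1e6:7.2f} MB")
    # A switch with only a few MB of packet buffer can absorb the LAN case,
    # but will drop packets under bursts on the continental-scale path.
```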

Soft Failures Cause Packet Loss and Degraded TCP Performance [Plot: throughput vs. path length (Local/LAN, Metro Area, Regional, Continental, International) for measured TCP Reno, measured HTCP, theoretical TCP Reno, and a measured no-loss baseline. With loss, high performance beyond metro distances is essentially impossible.]
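
The shape of that plot can be reproduced with the same Mathis-style estimate sketched earlier; the RTTs and the 0.01% loss rate below are illustrative assumptions, not measurements from the slides.

```python
import math

MSS_BYTES = 1500   # typical Ethernet MSS (assumption)
LOSS_RATE = 1e-4   # 0.01% packet loss (assumption)

def mathis_throughput_mbps(rtt_ms: float, loss: float, mss: int = MSS_BYTES) -> float:
    """Loss-limited, Reno-style TCP throughput estimate in Mbit/s."""
    rtt_s = rtt_ms / 1e3
    return (mss / (rtt_s * math.sqrt(loss))) * 8 / 1e6

if __name__ == "__main__":
    paths = [("Local (LAN)", 1), ("Metro", 5), ("Regional", 20),
             ("Continental", 80), ("International", 150)]
    for label, rtt in paths:
        mbps = mathis_throughput_mbps(rtt, LOSS_RATE)
        print(f"{label:15s} RTT {rtt:4d} ms -> ~{mbps:8.1f} Mbit/s")
    # Even 0.01% loss caps a continental-RTT flow at a small fraction of a
    # 10 Gbit/s path, while the same loss barely matters at LAN distances.
```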

Soft Network Failures Soft failures are where basic connectivity functions, but high performance is not possible. TCP was intentionally designed to hide all transmission errors from the user: "As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users." (From IEN 129, RFC 716) Some soft failures only affect high-bandwidth, long-RTT flows. Hard failures are easy to detect and fix; soft failures can lie hidden for years! One network problem can often mask others.

Problem Statement: Hard vs. Soft Failures Hard failures are the kind of problems every organization understands: fiber cut; power failure takes down routers; hardware ceases to function. Classic monitoring systems are good at alerting on hard failures, i.e., the NOC sees something turn red on their screen and engineers are paged by monitoring systems. Soft failures are different and often go undetected: basic connectivity (ping, traceroute, web pages, email) works, but performance is just poor. How much should we care about soft failures?

Causes of Packet Loss Network congestion: easy to confirm via SNMP, easy to fix with $$. This is not a soft failure, but just a network capacity issue; often people assume congestion is the issue when in fact it is not. Under-buffered switch dropping packets: hard to confirm. Under-powered firewall dropping packets: hard to confirm. Dirty fibers or connectors, failing optics/light levels: sometimes easy to confirm by looking at error counters in the routers. Overloaded or slow receive host dropping packets: easy to confirm by looking at CPU load on the host.
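
As a hedged sketch of the "look at error counters" check (not from the original slides): assuming net-snmp's snmpwalk is installed, the device permits SNMP v2c reads with the community string shown, and it supports IF-MIB, a small poll like this surfaces interfaces accumulating errors or discards. The host name and community string are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal sketch: poll interface error counters via net-snmp's snmpwalk."""
import subprocess

HOST = "router.example.edu"   # hypothetical device
COMMUNITY = "public"          # assumed read-only community string

def walk(oid: str) -> dict:
    """Run snmpwalk for one OID and return {ifIndex: counter value}."""
    out = subprocess.run(
        ["snmpwalk", "-v2c", "-c", COMMUNITY, HOST, oid],
        capture_output=True, text=True, check=True,
    ).stdout
    counters = {}
    for line in out.splitlines():
        # Example line: IF-MIB::ifInErrors.3 = Counter32: 17
        if " = " not in line:
            continue
        name, _, value = line.partition(" = ")
        counters[name.rsplit(".", 1)[-1]] = int(value.split()[-1])
    return counters

if __name__ == "__main__":
    errors = walk("IF-MIB::ifInErrors")
    discards = walk("IF-MIB::ifInDiscards")
    for idx in sorted(errors, key=int):
        if errors[idx] or discards.get(idx, 0):
            print(f"ifIndex {idx}: {errors[idx]} errors, "
                  f"{discards.get(idx, 0)} discards")
```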

Under-buffered switches are probably our biggest problem today.

The Science DMZ in 1 Slide Consists of four key components, all required: (1) Friction-free network path: highly capable network devices (wire-speed, deep queues); virtual circuit connectivity option; security policy and enforcement specific to science workflows; located at or near site perimeter if possible. (2) Dedicated, high-performance Data Transfer Nodes (DTNs): hardware, operating system, libraries all optimized for transfer; includes optimized data transfer tools such as Globus and GridFTP. (3) Performance measurement/test node: perfSONAR. (4) Education & engagement with end users. Details at http://fasterdata.es.net/science-dmz/

The Abstract Science DMZ [Diagram: the WAN connects over a clean, high-bandwidth path to the Border Router, which feeds both the Enterprise Border Router/Firewall (Site/Campus LAN, low-latency LAN path) and the Science DMZ Switch/Router. The Science DMZ hosts a high-performance Data Transfer Node with high-speed storage and perfSONAR measurement nodes, with per-service security policy control points; the site/campus has access to Science DMZ resources. Links are 10G/10GE; the WAN path is high latency, the LAN path low latency.]

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

But It's Not Just the Network Perhaps you are saying to yourself: "I have no control over parts of my campus, let alone the 5 networks that sit between me and my collaborators." Significant gains are possible in isolated areas of the OSI stack. Things you control: choice of data movement applications (say no to SCP and RSYNC); configuration of local gear (hosts, network devices); placement and configuration of diagnostic tools, e.g. perfSONAR; use of the diagnostic tools. Things that need some help: configuration of remote gear; addressing issues when the diagnostic tools alarm; getting someone to care.

Network Monitoring All networks do some form of monitoring. This addresses the needs of local staff for understanding the state of the network. Would this information be useful to external users? Can these tools function on a multi-domain basis? Beyond passive methods, there are active tools. E.g. often we want a throughput number; can we automate that idea? Wouldn't it be nice to get some sort of plot of performance over the course of a day? Week? Year? Multiple endpoints? perfSONAR = measurement middleware.
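
As a hedged sketch of the "automate a throughput number" idea (perfSONAR's own regular testing via pScheduler is the production answer): assuming iperf3 is installed locally, an iperf3 server is listening on the hypothetical peer below, and the path allows the test, one sample can be collected and logged like this.

```python
#!/usr/bin/env python3
"""Minimal sketch: grab one active throughput sample with iperf3."""
import json
import subprocess
import time

TARGET = "ps-test.example.edu"   # hypothetical measurement peer

def measure_throughput_mbps(target: str, seconds: int = 10) -> float:
    """Run a single iperf3 client test and return received throughput."""
    result = subprocess.run(
        ["iperf3", "-c", target, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

if __name__ == "__main__":
    # Run this from cron (or similar) to build the day/week/year plot
    # the slide asks about.
    mbps = measure_throughput_mbps(TARGET)
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {TARGET} {mbps:.1f} Mbit/s")
```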

perfSONAR All the previous Science DMZ network diagrams have little perfSONAR boxes everywhere. The reason for this is that consistent behavior requires correctness, and correctness requires the ability to find and fix problems: you can't fix what you can't find; you can't find what you can't see; perfSONAR lets you see. Especially important when deploying high-performance services: if there is a problem with the infrastructure, you need to fix it; if the problem is not with your stuff, you need to prove it. There are many players in an end-to-end path, and the ability to show correct behavior aids in problem localization. [Diagram: the abstract Science DMZ again, with perfSONAR nodes at the WAN border, on the Science DMZ switch/router, and alongside the Data Transfer Node.]

What is perfSONAR? perfSONAR is a tool to: set network performance expectations; find network problems ("soft failures"); help fix these problems. All in multi-domain environments; these problems are all harder when multiple networks are involved. perfSONAR provides a standard way to publish active and passive monitoring data. This data is interesting to network researchers as well as network operators.

perfSONAR History perfSONAR can trace its origin to the Internet2 End-to-End Performance Initiative from the year 2000. What has changed since 2000? The good news: TCP is much less fragile; CUBIC is the default congestion control algorithm, and autotuning and larger TCP buffers are everywhere. Reliable parallel transfers via tools like Globus Online. High-performance UDP-based commercial tools like Aspera. The bad news: the "wizard gap" is still large. Jumbo frame use is still small. Under-buffered switches and routers are still common. Under-powered/misconfigured firewalls are common. Soft failures still go undetected for months. User performance expectations are still too low.

Simulating Performance It's infeasible to perform at-scale data movement all the time; as we see in other forms of science, we need to rely on simulations. Network performance comes down to a couple of key metrics: throughput (e.g. "how much can I get out of the network"); latency (time it takes to get to/from a destination); packet loss/duplication/ordering (for some sampling of packets, do they all make it to the other side without serious abnormalities occurring?); network utilization (the opposite of throughput for a moment in time). We can get many of these from a selection of active and passive measurement tools: enter the perfSONAR Toolkit.
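
A minimal sketch (assuming the Linux iputils ping output format; not from the original slides) of pulling two of those metrics, latency and packet loss, from a quick active probe against a hypothetical peer:

```python
import re
import subprocess

def ping_stats(host: str, count: int = 10) -> dict:
    """Return average RTT (ms) and packet loss (%) from one ping run."""
    out = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=True,  # raises if unreachable
    ).stdout
    loss = float(re.search(r"([\d.]+)% packet loss", out).group(1))
    # iputils summary line: rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms
    avg_rtt = float(re.search(r" = [\d.]+/([\d.]+)/", out).group(1))
    return {"avg_rtt_ms": avg_rtt, "loss_pct": loss}

if __name__ == "__main__":
    print(ping_stats("ps-test.example.edu"))   # hypothetical peer
```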

perfSONAR Toolkit The perfSONAR Toolkit is an open source implementation and packaging of the perfSONAR measurement infrastructure and protocols: http://www.perfsonar.net. All components are available as RPMs, DEBs, and bundled as a CentOS 6-based netinstall. perfSONAR tools are much more accurate if run on a dedicated perfSONAR host. Very easy to install and configure; usually takes less than 30 minutes.

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

Toolkit Beacon Use Case The general use case is to establish some set of tests to other locations/facilities, to answer the what/why questions. Regular testing with select tools helps to establish patterns: how much bandwidth we would see during the course of the day, or when packet loss appears. We do this to points of interest to see how well a real activity (e.g. a Globus transfer) would do. If performance is bad, don't expect much from the data movement tool.

Benefits: Finding the Needle in the Haystack Above all, perfSONAR allows you to maintain a healthy, high-performing network because it helps identify the soft failures in the network path. Classical monitoring systems have limitations: performance problems are often only visible at the ends; individual network components (e.g. routers) have no knowledge of end-host state. perfSONAR tests the network in ways that classical monitoring systems do not. More perfSONAR distributions equal better network visibility.

Deployment By The Numbers Last updated August 2015. Adoption trend increases with each release; CC-NIE and the innovation platform helped as well.

http://stats.es.net/servicesdirectory/

perfSONAR Dashboard: Raising Expectations and Improving Network Visibility Status at-a-glance: packet loss, throughput, correctness. Current live instances at http://pas.net.internet2.edu/ and http://ps-dashboard.es.net/. Drill-down capabilities: test history between hosts; ability to correlate with other events. Very valuable for fault localization and isolation.

Outline: Problem Statement on Network Connectivity; Supporting Scientific Users; Network Performance & TCP Behaviors w/ Packet Loss; What is perfSONAR; Deployment Overview; Conclusions

Benefit: Active and Growing Community Active email lists and forums provide: instant access to advice and expertise from the community; the ability to share metrics, experience and findings with others to help debug issues on a global scale. Joining the community automatically increases the reach and power of perfSONAR: more endpoints means exponentially more ways to test, discover issues, and compare metrics.

perfSONAR Community The perfSONAR collaboration is working to build a strong user community to support the use and development of the software. perfSONAR mailing lists: Announcement list: https://mail.internet2.edu/wws/subrequest/perfsonar-announce Users list: https://mail.internet2.edu/wws/subrequest/perfsonar-users

Resources perfSONAR website: http://www.perfsonar.net/ perfSONAR Toolkit manual: http://docs.perfsonar.net/ perfSONAR mailing lists: http://www.perfsonar.net/about/getting-help/ perfSONAR directory: http://stats.es.net/servicesdirectory/ FasterData Knowledgebase: http://fasterdata.es.net/

Introduction & Motivation This document is a result of work by the perfSONAR Project (http://www.perfsonar.net) and is licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/). Event Presenter, Organization, Email Date