Monitoring the ATLAS TDAQ Network at CERN




Monitoring the ATLAS TDAQ Network at CERN
Lucian LEAHU
Brasov, 15/01/2009

Large Scale
- 3000 nodes, 200 edge switches, 5 core routers
- 6000 ports

Plus physicists!
- Network dimensioned to meet requirements
- Maximum average link occupancy = 60%
- Should mean peace of mind for Network Support
- Actually seen as a challenge by physicists
- 40% for free!
- Turn up the wick until something breaks!
- Continuous running out of spec!
- Must distinguish between real and self-inflicted problems

Basic question: Does any of it work????
- Monitor everything!
- ICMP: Internet Control Message Protocol
- SNMP: Simple Network Management Protocol
- SPECTRUM
- Tells you it's alive
- Tells you if it dies
- Fetches status info... and hides it!
- Commissioning

Commercial versus Proprietary
Special needs:
- Multiple networks per processor
- Want to see whole picture for system analysis
- Want to see all detail for component analysis
- Want to see traffic volume visually
- Want traffic/errors qualified
- Spectrum CORBA API clumsy
- Multiple requests hit the CPU hard

Current Infrastructure
[Architecture diagram; labels: Statistics RRD SPECTRUM SpectroServer sw_script Python and Java based applications Graphing server Network Search Monitoring Discovery Network Service Gateway Alarms 3D Display 2D Display Network Browser Network Panel RRD Aggregates Events & Alarms Knowledge Base LanDB Rack Wizard REPORTS RRD Top Dynamic pages]

Network Browser: Miniplots
- Click on switch folder for miniplots of all ports.

Dynamic pages with real time traffic
- Real-time global top
- Most active connections in real time.
- Current ATLAS applications running in the network.
- Connections and traffic per switch port.

Network Browser: Port Stats
- Click on port instance for traffic and error plots.

Affiliated mini-plots

Searching and Aggregation

Scaling and aggregated plots

System Diagnostics
- Port states: Error free, OK, Stressed, Overload
- We have prior detailed measurements of losses and latencies against loads.
- System dimensioned for a nominal operating maximum of 60% load on any port.
- Naïve world view:
  - Flag loads with high thresholds: above, say, 85% throughput
  - Flag errors with low thresholds: above, say, 0.05%, or about 10 pkts/sec at 1 Gbps
- Unfortunately it's not so simple

Where is the real error?
- TCP: 80% of traffic, 5K pkts/s errors
- UDP: <2% of traffic, 10 pkts/s errors
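The "naïve world view" on the System Diagnostics slide amounts to two fixed thresholds per port. A minimal sketch of that rule follows, assuming per-port rates are already available. The `PortSample` record and the `naive_flags` function are hypothetical names for illustration, not part of the ATLAS tools. Only the 85% and 0.05% values come from the slide.

```python
from dataclasses import dataclass

# Thresholds quoted on the slide ("say, 85%" and "say, 0.05%").
LOAD_THRESHOLD = 0.85
ERROR_RATE_THRESHOLD = 0.0005


@dataclass
class PortSample:
    """Hypothetical per-port measurement over one polling interval."""
    name: str
    link_bps: float       # nominal link speed in bits per second
    traffic_bps: float    # measured throughput in bits per second
    packets_per_s: float  # measured packet rate
    errors_per_s: float   # measured error rate


def naive_flags(sample: PortSample) -> list[str]:
    """Return the list of naive alarms ("load", "errors") for one port."""
    flags = []
    if sample.traffic_bps / sample.link_bps > LOAD_THRESHOLD:
        flags.append("load")
    # Guard against idle ports, where the error fraction is undefined.
    if sample.packets_per_s > 0:
        if sample.errors_per_s / sample.packets_per_s > ERROR_RATE_THRESHOLD:
            flags.append("errors")
    return flags
```

As the slides note, a rule like this is too simple in practice: the same error count can mean very different things depending on the protocol and on how much of the traffic it carries.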

Anomalous traffic
- ...and from here to rules-based expert systems

Transition to displays
- So far have concentrated on low level port and switch monitoring and diagnostics.
- Identified scaling issues
- Want to have a display with a system view
- Want to retain architectural model
- Want to exploit real time stats

GUESS 2D Visualisation
- Can zoom and pan
- Traffic overlaid onto connectivity.
- No hosts, only switches
- Only 1 data core
- Autoplacement different for each discovery

GUESS: Guided Placement


ATLAS Users Network Panel
- For operators we provide a summary of status per application set.

2D Display Limitations
- 2D Display is very good for switch-to-switch traffic visualisation
- No level-of-detail feature as you zoom in or out
- Can't incorporate host details: would overwhelm the plot
BUT...
- I want all the detail when something goes wrong
- I want neighbouring views when examining individual behaviour
- I want the system wide view for visual correlation
- I want different levels of detail depending on my viewpoint
- I want fly-through, and navigation and easily visible errors
- I want... I want... I want... Google Earth for my network.

3D Display
- Google Earth as inspiration
  - Variable detail as a function of viewing distance
  - Variable viewing angles
  - Intuitive navigation
- Unfortunately Google doesn't cope with our dynamic update requirements
- So we went looking for display software that does.
  - X3D (enhanced VRML)
  - Octaga Player

3D Top Level View

3D Top Level View

Detail of processor farm

Detail of a Processor

Diagnostic Tools: 1
- YATG (Yet Another Traffic Grapher)
  - High speed SNMP-based traffic monitoring (the switch is the limiting factor)
  - Fine time granularity statistics for selected device interfaces
- ATLAS-like traffic bandwidth measurements
  - Distributed applications replicate the transactional request-response transfer protocol
  - Demonstrate maximum achievable bandwidth
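SNMP-based traffic graphing generally works by polling an interface octet counter (for example `ifHCInOctets` from the standard IF-MIB) and converting the difference between two polls into a rate. The sketch below shows only that arithmetic. It is not YATG's code, the polling itself is omitted, and the function name `utilization` and its parameters are assumptions made for illustration.

```python
def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                link_bps: float, counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two counter readings.

    prev_octets / curr_octets: successive readings of an octet counter.
    interval_s: time between the two readings, in seconds.
    link_bps: nominal link speed in bits per second.
    counter_bits: counter width (32 or 64), used to absorb one wrap-around.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction gives the right delta even if the counter
    # wrapped once between the two polls.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s / link_bps
```

With fine time granularity the polling interval shrinks, so the accuracy of `interval_s` and the speed at which the switch answers SNMP requests become the limiting factors, which matches the slide's remark that the switch is the bottleneck.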

Diagnostic Tools: 2
- Latency plots
- Dynamic queue growth measurement
[Diagram: CPUs connected through an edge switch and a core switch]

Diagnostic Tools: 3 sFlow (1)
- sFlow is the standard for statistical packet sampling
- Each network port: sampling system
- All packet samples: central location (software)
- Analysis: information about the content of the traffic
- By collecting packet samples, the packets can be classified into flows
- A flow ~ network conversation between two applications
- The bandwidth occupancy for each flow can be estimated
- We developed an sFlow analysis application in order to study the technology
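The per-flow bandwidth estimate described above can be sketched as follows. This is a minimal illustration, not the in-house sFlow analysis application. It assumes samples have already been decoded into hypothetical `(src, dst, frame_bytes)` tuples, all taken with the same 1-in-N sampling rate over a known interval.

```python
from collections import defaultdict


def estimate_flow_bandwidth(samples, sampling_rate: int, interval_s: float):
    """Estimate bits per second for each (src, dst) flow from packet samples.

    samples: iterable of (src, dst, frame_bytes) tuples (assumed format).
    sampling_rate: N for 1-in-N packet sampling.
    interval_s: length of the observation window in seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    sampled_bytes = defaultdict(int)
    for src, dst, frame_bytes in samples:
        sampled_bytes[(src, dst)] += frame_bytes
    # Each sample stands for roughly sampling_rate packets of similar size,
    # so scale the sampled byte count up before converting to a rate.
    return {
        flow: total * sampling_rate * 8 / interval_s
        for flow, total in sampled_bytes.items()
    }
```

Because the result is keyed by sender and receiver, the same dictionary can be read as a sparse traffic matrix. The estimate is statistical, so flows with only a few samples carry a large relative error.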

Diagnostic Tools: 3 sFlow (2)
- For one switch port
  - Using SNMP: bandwidth usage on port B1 is 89%
  - Using sFlow: pie chart with traffic distribution
- For the entire network
  - Traffic matrix with all senders and receivers

Summary
- Large scale network
- Monitor, diagnose
- Commercial tools
- In-house built software
- Plots for 6000 ports
- Different levels of visual abstraction
- 24/7 operation
- Errors not welcome