SNMP-based Monitoring of a Computing Cluster



Similar documents
How To Monitor Your Computer With Nagiostee.Org (Nagios)

Livrable L13.3. Nature Interne Date livraison 12/07/2012. Titre du Document Energy management system and energy consumption efficiency - COEES Code v1

RTG: A Scalable SNMP Statistics Architecture NANOG 27. Robert Beverly February 10, 2003

ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy

Scaling Graphite Installations

Monitoring Nginx Server

PANDORA FMS NETWORK DEVICES MONITORING

6.2 Reporting BIPublisher Improvements

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

PANDORA FMS NETWORK DEVICE MONITORING

Web Traffic Capture Butler Street, Suite 200 Pittsburgh, PA (412)

MALAYSIAN PUBLIC SECTOR OPEN SOURCE SOFTWARE (OSS) PROGRAMME. COMPARISON REPORT ON NETWORK MONITORING SYSTEMS (Nagios and Zabbix)

Network Management & Monitoring Overview

Monitoring Infrastructure for Superclusters: Experiences at MareNostrum

Configuring SNMP Cisco and/or its affiliates. All rights reserved. 1

Load Balancing and Clustering in EPiServer

Advanced Science and Technology Institute Department of Science and Technology

NTT DOCOMO Technical Journal. Large-Scale Data Processing Infrastructure for Mobile Spatial Statistics

Active Management Services

NETFORT LANGUARDIAN MONITORING WAN CONNECTIONS. How to monitor WAN connections with NetFort LANGuardian Aisling Brennan

Roberto Barbera. Centralized bookkeeping and monitoring in ALICE

Maintaining Non-Stop Services with Multi Layer Monitoring

Bandwidth Management and Optimization System Design (draft)

Simonetta Balsamo, Moreno Marzolla. Dipartimento di Informatica, Università Ca' Foscari di Venezia

Comprehensive Monitoring of VMware vsphere ESX & ESXi Environments

How To Install Linux Titan

SAP WEB DISPATCHER Helps you to make decisions on Web Dispatcher implementation

SCALABILITY AND AVAILABILITY

1. INTERFACE ENHANCEMENTS 2. REPORTING ENHANCEMENTS

Deployment Topologies

MSP End User. Version 3.0. Technical Solution Guide

Wait, How Many Metrics? Monitoring at Quantcast

LinuxWorld Conference & Expo Server Farms and XML Web Services

Building Reliable, Scalable AR System Solutions. High-Availability. White Paper

Vidi NMs Network Management

Free Network Monitoring Software for Small Networks

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

Network Monitoring Comparison

How to create custom dashboards in SNMPc OnLine 2007

CS615 - Aspects of System Administration

Product Guide CM-Relay

How To Manage Technology

ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy

CAREN NOC MONITORING AND SECURITY

Network Monitoring. By: Delbert Thompson Network & Network Security Supervisor Basin Electric Power Cooperative

Analyzing large flow data sets using. visualization tools. modern open-source data search and. FloCon Max Putas

Server Virtualization with Windows Server Hyper-V and System Center

Study of Network Performance Monitoring Tools-SNMP

Load Balancer Comparison: a quantitative approach. a call for researchers ;)

Network Monitoring. Review of Software

OMNITURE MONITORING. Ensuring the Security and Availability of Customer Data. June 16, 2008 Version 2.0

Network management tools: Torrus, Gerty, Mooxu

ENC Enterprise Network Center. Intuitive, Real-time Monitoring and Management of Distributed Devices. Benefits. Access anytime, anywhere

Evaluation of Nagios for Real-time Cloud Virtual Machine Monitoring

Configuring Dell OpenManage IT Assistant 8.0 to Monitor SNMP Traps Generated by VMware ESX Server

Monitoring MySQL. Presented by, MySQL & O Reilly Media, Inc. A quick overview of available tools

Citrix 1Y0-911 Citrix Resource Manager CCEA. Version 1.0

INCREASE SYSTEM AVAILABILITY BY LEVERAGING APACHE TOMCAT CLUSTERING

2 Methodical approach

ELIXIR LOAD BALANCER 2

Load Balancing Web Applications

Notes on network monitoring, by Oliver Gorwits

Log Management with Open-Source Tools. Risto Vaarandi SEB Estonia

Network and Server Statistics Using Cacti

U-LITE Network Infrastructure

Architecting ColdFusion For Scalability And High Availability. Ryan Stewart Platform Evangelist

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations

CommuniGate Pro White Paper. Dynamic Clustering Solution. For Reliable and Scalable. Messaging

THE VIRTUALIZATION OF AN INTERNET SERVER

Scaling Progress OpenEdge Appservers. Syed Irfan Pasha Principal QA Engineer Progress Software

Name: 1. CS372H: Spring 2009 Final Exam

Log Management with Open-Source Tools. Risto Vaarandi rvaarandi 4T Y4H00 D0T C0M

PANDORA FMS OFFICIAL TRAINING

Guideline for stresstest Page 1 of 6. Stress test

Introduction to Engineering Using Robotics Experiments Lecture 18 Cloud Computing

A SURVEY ON AUTOMATED SERVER MONITORING

Network and Server Statistics Using Cacti

Monitoring the Grid at local, national, and global levels

EventSentry Overview. Part I About This Guide 1. Part II Overview 2. Part III Installation & Deployment 4. Part IV Monitoring Architecture 13

CERN local High Availability solutions and experiences. Thorsten Kleinwort CERN IT/FIO WLCG Tier 2 workshop CERN

How To Set Up Foglight Nms For A Proof Of Concept

1. INTERFACE ENHANCEMENTS 2. REPORTING ENHANCEMENTS

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

ISPadmin. by Robert Haskins SYSADMIN. Robert D. Haskins is currently employed by Renesys Corporation in Hanover, NH.

HowTo Check. Microsoft Cluster. Functionality via SNMP

High Availability and Clustering

Open Source in the Data Centre. John Ferlito Bulletproof Networks

A Brief. Introduction. of MG-SOFT s SNMP Network Management Products. Document Version 1.3, published in June, 2008

BASICS OF SCALING: LOAD BALANCERS

Device Provisioning in Cable Environments

Konica Minolta s Optimised Print Services (OPS)

/ Administrator Training PAT-2012

DNS ROUND ROBIN HIGH-AVAILABILITY LOAD SHARING

Course Outline. Create and configure virtual hard disks. Create and configure virtual machines. Install and import virtual machines.

Monitoring Systems and Services. Alwin Brokmann DESY-IT March 24 28,2003 CHEP 2003 San Diego

UltraFlow -Cisco Netflow tools-

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

A Web-based System to Monitor and Analyze Network Management Information in XML

Details. Some details on the core concepts:

WEB SERVER MONITORING SORIN POPA

Transcription:

SNMP-based Monitoring of a Computing Cluster Moreno Marzolla marzolla@dsi.unive.it Dipartimento di Informatica Università di Venezia & INFN Padova/BaBar collaboration Workshop sulle problematiche di calcolo e reti nell INFN p.1

Talk Outline The Monitoring Challenge Some Existing Tools ASC: the Asynchronous SNMP Collector Conclusions and Future Work Workshop sulle problematiche di calcolo e reti nell INFN p.2

The Challenge BaBar s Data Reprocessing suddenly stops. What happened? Client process(es) crashed Server process(es) crashed The local disk failed The CPU melted /tmp overflowed None of the above Workshop sulle problematiche di calcolo e reti nell INFN p.3

Requirements Scalable up to machines and more. Easy to configure: If one has to do simple things the effort required to configure the tool should be minimal. Extensible: New functionalities should be easy to add as they are needed. General: Needs to monitor the network switch, tape library, UPS, environmental system... GUI-independent: Should work as a regular UNIX dæmon, yet providing a convenient user interface(s). Workshop sulle problematiche di calcolo e reti nell INFN p.4

Some tools There are many freely distributable monitoring tools available. MRTG http://people.ee.ethz.ch/ oetiker/webtools/mrtg/ NGop http://www-isd.fnal.gov/ngop/ RemStats http://silverlock.dgim.crc.ca/remstats/release/index.html Ganglia http://ganglia.sourceforge.net/ Nagios http://www.nagios.org/ Snips http://www.netplex-tech.com/software/snips/ Cricket http://cricket.sourceforge.net/ Many, many others... Workshop sulle problematiche di calcolo e reti nell INFN p.5

What s wrong? So, why don t simply use one of the many programs available? Because they fail to meet our requirements. In particular: Many of them don t scale well/at all. Configuration is highly nontrivial in most cases (lots of different configuration files scattered around...). Many of them require special dæmons running on the monitored hosts: can t handle the network switch/tape libary/ups... Workshop sulle problematiche di calcolo e reti nell INFN p.6

The Do-It-Yourself approach So we decided to build a tool (ASC, the Asynchronous SNMP Collector) from scratch. The tool is entirely written in C. Use asynchronous (non-blocking) SNMP requests for data collection. SNMP (Simple Network Management Protocol is a standard protocol supported by many different pieces of hardware. Even our air conditioning system speaks SNMP... Workshop sulle problematiche di calcolo e reti nell INFN p.7

The Do-It-Yourself approach (2) Data are stored in Round Robin Databases Provide facilities for storing timestamped data with different granularities; facilities for plotting graphs are also provided The configuration file is written in XML ASC embeds a simple HTTP interface HTML pages are generated by applying an XSLT stylesheet to an automatically-generated XML status file Workshop sulle problematiche di calcolo e reti nell INFN p.8

XML Configuration File Workshop sulle problematiche di calcolo e reti nell INFN p.9

XML to HTML Output Status XSLT Stylesheets HTML Pages XML Configuration File ASC HTTPD Monitored Hosts Workshop sulle problematiche di calcolo e reti nell INFN p.10

HTML Interface Workshop sulle problematiche di calcolo e reti nell INFN p.11

Conclusions ASC is still under development. Are we approaching the goals? Scalability On the test farm ( machines) it is working very well. Will it scale tenfold? (Note that applying stylesheets is not cheap... Easy Configuration We have a single configuration file, which can optionally be split in different parts and use macros. These are standard features of XML. Workshop sulle problematiche di calcolo e reti nell INFN p.12

Facing a scalability limit Top Level Proxy Proxy Proxy ASC ASC ASC Hosts Workshop sulle problematiche di calcolo e reti nell INFN p.13

Work(s) in Progress Implement Alarms Understand SNMP Traps Active control of ASC with its WEB interface Implement more XSLT stylesheets Write the documentation Workshop sulle problematiche di calcolo e reti nell INFN p.14