EUMEDGrid-Support Supporting EUMEDGRID-Support e-infrastructure sustainability Overview of monitoring tools Fulvio Galeazzi GARR Amman, November 24th 2011 EUMEDGRID-Support ROC-training school
Contents. This will be a hands-on session, describing: a tool to describe your site: GOC-DB; a tool to monitor your site: Nagios; a tool to perform simple network measurements: Smokeping. We will not go into much detail, but will provide more information for the curious.
1. [ GOC-DB ]
GOC-DB: what it is. Grid Operations Center DataBase. Stores (static) site information (responsible persons, email addresses, resources, ...). It is very important to take the time to fill it in correctly, since other tools depend on it.
GOC-DB: main page https://gocdb.africa-grid.org/portal/index.php
GOC-DB: requesting a role. Write access to GOCDB is restricted. Make sure your X509 certificate is loaded in the browser, then click Manage Roles in the left column. Choose the site, choose the privilege, then send an email to Riccardo, Mario, Fulvio.
GOC-DB: browse sites
GOC-DB: endpoints. Each site provides one or more endpoints, that is, services installed at the site: sitebdii, UI, CE, SE, LFC, WMS, LB, ... Select a site and scroll down to see its endpoints.
GOC-DB: add and describe an endpoint. Site admins can insert a new endpoint: please pay attention to the Service Type: your CE should be identified as CREAM-CE! Setting Monitored to Y triggers Nagios monitoring.
2. [ Nagios ]
What is Nagios? Nagios is the tool to monitor: Availability: is the site/service up and properly configured? Reliability: is the site/service really working? Nagios runs automatically: it gets the site description (and downtimes) from GOCDB and schedules a number of tests (ping, LDAP queries, real Grid jobs). Moreover, Nagios offers a web interface to access test results, history, ...
Nagios pre-requisites. If you want your site to be green, make sure that: the site is properly described in GOCDB, the site status is Certified and the service status is Monitored; the site supports the ops VO, not only on services but also on WNs; services are up, running and functional. Note: please set your site to Certified in GOCDB (and/or a service to Monitored) only when it is really working for the ops VO.
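The pre-requisites above amount to a checklist that can be run before flagging a site as Certified. The following is only a minimal sketch: the dictionary layout and its keys (`status`, `ops_on_services`, `ops_on_wns`, `services`, `monitored`, `up`) are hypothetical and do not mirror the real GOCDB schema.

```python
def blocking_issues(site):
    """Return the reasons why a site would probably not show green.

    `site` is a hypothetical dict invented for this sketch; the real
    information lives in GOCDB and on the site's services.
    """
    issues = []
    if site.get("status") != "Certified":
        issues.append("site is not Certified in GOCDB")
    if not site.get("ops_on_services", False):
        issues.append("ops VO not supported on services")
    if not site.get("ops_on_wns", False):
        issues.append("ops VO not supported on WNs")
    for svc in site.get("services", []):
        if not svc.get("monitored", False):
            issues.append(svc["name"] + ": not flagged as Monitored")
        if not svc.get("up", False):
            issues.append(svc["name"] + ": not up and functional")
    return issues


# Illustrative site description (all values are made up).
example_site = {
    "status": "Certified",
    "ops_on_services": True,
    "ops_on_wns": False,
    "services": [
        {"name": "CREAM-CE", "monitored": True, "up": True},
        {"name": "SE", "monitored": False, "up": True},
    ],
}

for issue in blocking_issues(example_site):
    print(issue)
```

An empty list means that none of the listed pre-requisites is obviously violated; it does not guarantee that the Nagios tests will pass.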
Nagios: how the thing works
Nagios and the NGIs. Nagios is the official tool used within the EGI infrastructure for distributed monitoring. Submission, transport, storage and visualization of probes rely on existing technologies (Nagios, ActiveMQ, Django) deployed at each NGI. Results are used to measure the availability and reliability of EGI sites.
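The slides do not give the formulas behind the availability and reliability figures. The sketch below uses the definitions commonly quoted for EGI/WLCG reports, which should be treated as an assumption here: availability is uptime over the time with known status, and reliability additionally excludes scheduled downtime from the denominator. Function and parameter names are illustrative.

```python
def availability_and_reliability(up_h, unscheduled_down_h, scheduled_down_h, unknown_h):
    """Compute (availability, reliability) as fractions between 0 and 1.

    All arguments are hours within the reporting period. The formulas
    are an assumption of this sketch, not taken from the slides.
    Returns None for a figure whose denominator would be zero.
    """
    total = up_h + unscheduled_down_h + scheduled_down_h + unknown_h
    known = total - unknown_h
    availability = up_h / known if known > 0 else None
    reliability_base = known - scheduled_down_h
    reliability = up_h / reliability_base if reliability_base > 0 else None
    return availability, reliability


# A 30-day month (720 h) with 60 h of announced downtime and 60 h of failures.
avail, rel = availability_and_reliability(600, 60, 60, 0)
print(round(avail, 3), round(rel, 3))
```

Under these definitions, downtime declared in advance in GOCDB lowers availability but not reliability.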
Nagios tests. Tests are scheduled automatically. If you have the rights, you can trigger the execution of checks (HANDLE WITH CARE: do not Force checks). A hierarchy of tests exists, which may be confusing or worrying at times: make sure you understand (or at least guess) whether there may be an upstream cause for the problem you noticed. https://tomtools.cern.ch/confluence/display/sam/probes+org.sam
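Looking for an upstream cause can be pictured as walking up the test hierarchy from the failing check. This is a minimal sketch under stated assumptions: the check names (`job-submit`, `ce-ldap`, `ce-ping`) and the parent map are invented for illustration and are not the real SAM probe names or Nagios configuration.

```python
def upstream_cause(check, parents, status):
    """Return the highest failing ancestor of `check`, or `check` itself.

    `parents` maps each check to the check it depends on; `status` maps
    check names to Nagios-style states. Checks with no recorded status
    are assumed to be OK. Both structures are hypothetical.
    """
    cause = check
    current = check
    seen = {check}
    while current in parents:
        current = parents[current]
        if current in seen:  # guard against accidental cycles
            break
        seen.add(current)
        if status.get(current, "OK") != "OK":
            cause = current
    return cause


# Invented hierarchy: job submission needs LDAP, which needs the host to answer ping.
parents = {"job-submit": "ce-ldap", "ce-ldap": "ce-ping"}
status = {"job-submit": "CRITICAL", "ce-ldap": "CRITICAL", "ce-ping": "CRITICAL"}
print(upstream_cause("job-submit", parents, status))
```

If the function returns a different check from the one you started with, look at that one first: the downstream failures may disappear once it is fixed.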
Nagios for EUMEDGrid https://nagios.africa-grid.org/nagios/
Nagios: hosts view
Nagios: host status
Nagios: host status detail
Nagios: host groups
Nagios: MyEGI https://nagios.africa-grid.org/myegi/
MyEGI: Gridmap view
3. [ Smokeping ]
Smokeping: what it is. A tool for monitoring network latency. Ping timing: every 5 minutes, send 20 packets and measure packet loss and round-trip time. Extended to measure the time to reply to a command: every 5 minutes, execute ldapsearch and measure the time to reply and the number of times the command goes unanswered. Keeps a history of tests. Implements a star configuration: from the Smokeping server to the world, so it cannot be aware of problems between sites. Configuration is manual: contact grid-tech@garr.it
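To make the measurement concrete, one round of 20 pings can be reduced to a packet-loss percentage and a typical round-trip time. This is only an illustrative sketch, not Smokeping's own code: the function name, the input format (a list of round-trip times in milliseconds, with `None` for a lost packet) and the choice of the median as the summary value are assumptions of this example.

```python
from statistics import median


def summarise_round(rtts_ms):
    """Summarise one measurement round.

    `rtts_ms` holds one entry per packet sent: the round-trip time in
    milliseconds, or None if no reply was received.
    """
    sent = len(rtts_ms)
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(answered)) / sent if sent else 0.0
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "median_rtt_ms": median(answered) if answered else None,
    }


# Made-up round: 20 packets, 2 of them lost.
sample_round = [10.0] * 9 + [12.0] * 9 + [None] * 2
print(summarise_round(sample_round))
```

The same summary applies to the ldapsearch variant if each entry is the time the command took to reply, with `None` for an unanswered command.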
Smokeping: network latency https://dpm2-4.dir.garr.it/smokeping/smokeping.cgi
Smokeping: network site view
Smokeping: host view
Smokeping: central services
Smokeping: site services
Conclusions. Quite a lot of information is available, maybe too much... information is useful only if you check it :-) We need a to-do list for ROC shifters; it can be borrowed from existing ones. Want to host any of these tools at your site or in your country? Just ask and we'll be glad to help!