The dashboard Grid monitoring framework




Benjamin Gaidioz, on behalf of the ARDA dashboard team (CERN/EGEE)
ISGC 2007 conference

introduction/outline
- goals of the project,
- the team,
- the framework,
- some monitoring applications:
  - job monitoring,
  - site monitoring,
  - data management monitoring.

the project (EGEE/ARDA)
- another monitoring tool,
- a VO-specific monitoring service,
- showing Grid usage from a VO point of view (cross Grid, cross application, submission tool, etc.),
- merging Grid information and VO information,
- implemented in close contact with the VOs.

the team
- Julia Andreeva (lead, CMS) and Juha Herrala (former member, CMS),
- Benjamin Gaidioz and Ricardo Rocha (ATLAS),
- Pablo Saiz (ALICE),
- Gerhild Maier,
- collaborators and visitors:
  - Taipei: Fu-Ming Tsai (daily summaries), Tao-Sheng Chen (PostgreSQL and Oracle), Shih-Chun Chiu (user web interface, PHP), etc.,
  - Moscow State University,
- our contacts in all the VOs and Grids.
contact: dashboard-support@cern.ch

the framework
- a python framework for collecting and publishing monitoring information,
- [diagram: GridPP, RGMA and Monalisa information sources, each with its own collector; dashboard Oracle database; data access object (DAO); dashboard web server; client requests answered as text/html, text/xml, image/png, etc.]
- developer guide, savannah project.
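The components named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dashboard's actual code: `JobRecord`, `StaticCollector`, `InMemoryDAO` and `render` are hypothetical names, the in-memory dictionary stands in for the Oracle database, and the collectors return canned rows instead of querying RGMA, GridPP or Monalisa.

```python
import html
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass(frozen=True)
class JobRecord:
    """One monitored job, as a collector might report it (hypothetical schema)."""
    job_id: str
    site: str
    status: str
    source: str


class StaticCollector:
    """Stand-in for a real collector: yields canned rows for one source."""

    def __init__(self, source, rows):
        self.source = source
        self.rows = rows

    def collect(self):
        for job_id, site, status in self.rows:
            yield JobRecord(job_id, site, status, self.source)


class InMemoryDAO:
    """Hypothetical data access object; a dict replaces the database."""

    def __init__(self):
        self._jobs = {}

    def store(self, record):
        # The latest record for a job id replaces the previous one.
        self._jobs[record.job_id] = record

    def jobs_at(self, site):
        found = [r for r in self._jobs.values() if r.site == site]
        return sorted(found, key=lambda r: r.job_id)


def render(records, content_type):
    """Publish the same records in the format the client asked for."""
    if content_type == "text/xml":
        root = ET.Element("jobs")
        for r in records:
            ET.SubElement(root, "job", id=r.job_id, site=r.site,
                          status=r.status, source=r.source)
        return ET.tostring(root, encoding="unicode")
    if content_type == "text/html":
        items = "".join(
            "<li>%s: %s</li>" % (html.escape(r.job_id), html.escape(r.status))
            for r in records
        )
        return "<ul>%s</ul>" % items
    raise ValueError("unsupported content type: %s" % content_type)


# Example run with made-up job ids and site names.
dao = InMemoryDAO()
collectors = [
    StaticCollector("RGMA", [("job-1", "SITE-A", "Running")]),
    StaticCollector("Monalisa", [("job-2", "SITE-B", "Done")]),
]
for collector in collectors:
    for record in collector.collect():
        dao.store(record)

print(render(dao.jobs_at("SITE-A"), "text/html"))
```

Keeping the collectors, the storage layer and the rendering apart is what lets a new information source or a new output format be added without touching the other two parts.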

a set of applications

applications
1. job monitoring,
2. site monitoring,
3. data management monitoring.
see the links in the last slide for accessing them all.

job monitoring
- real-time view of Grid jobs for a VO, summary views,
- various grid information systems used (EGEE RGMA, GridPP XML files, LCG BDII),
- VO info: job instrumentation (Monalisa's ApMon), ATLAS prodsys database, panda monitoring, GangaAtlas monitoring, Dirac database, etc.,
- consistent merging (Grid info + VO info),
- powerful filtering for serving different use cases (managers, site admins, users),
- examples: ATLAS activities today, ATLAS jobs in Taiwan, CMS daily views.
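The merging and filtering steps listed above can be illustrated with a minimal Python sketch. It assumes that both kinds of sources identify a job by a shared `job_id` and that Grid fields and VO fields do not collide; the field names (`grid_status`, `activity`, `user`), the function names and the sample rows are all hypothetical and do not come from the dashboard itself.

```python
def merge_job_views(grid_rows, vo_rows):
    """Combine Grid-side and VO-side rows into one record per job id."""
    merged = {}
    for row in list(grid_rows) + list(vo_rows):
        merged.setdefault(row["job_id"], {}).update(row)
    return merged


def filter_jobs(jobs, **criteria):
    """Keep the jobs whose fields equal every given criterion."""
    selected = [
        job for job in jobs.values()
        if all(job.get(key) == value for key, value in criteria.items())
    ]
    return sorted(selected, key=lambda job: job["job_id"])


# Made-up rows: what a Grid information system might report...
grid_rows = [
    {"job_id": "j1", "site": "SITE-A", "grid_status": "Running"},
    {"job_id": "j2", "site": "SITE-B", "grid_status": "Done"},
]
# ...and what the VO's own tools might report for the same jobs.
vo_rows = [
    {"job_id": "j1", "user": "alice", "activity": "production"},
    {"job_id": "j2", "user": "bob", "activity": "analysis"},
]

jobs = merge_job_views(grid_rows, vo_rows)

# A site admin's view: everything at one site.
print(filter_jobs(jobs, site="SITE-A"))
# A user's view: only that user's jobs.
print(filter_jobs(jobs, user="bob"))
```

The same merged records serve each use case; only the filter criteria change between a manager's, a site admin's and a user's view.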

job monitoring summary
- installed for ALICE, ATLAS, CMS, LHCb.
- latest/next developments:
  - open HTTP API for a VO to publish job information to the dashboard (in progress),
  - user task monitoring (in progress),
  - alerts (with failure pattern recognition),
  - link with the SAM tests (site functionality tests),
  - RSS feeds.

site monitoring
- linked to job monitoring,
- identify reason of failure of jobs in sites,
- using RGMA (which reports Grid error messages),
- examples: ALICE site info.

job status messages shown on the slide:
Waiting
Ready (unavailable)
Scheduled (Job successfully submitted to Globus)
Ready (7 an authentication operation failed)
Running (Job successfully submitted to Globus)
Done (Job got an error while in the CondorG queue.)
Submitted
Done (Job terminated successfully)
Done (Cannot read JobWrapper output both from Condor and from Maradona.)
Done (/net/hisrv0001/opt.x86_64/grid/globus/etc/globus-user-env.sh not found or unreadable)
Cleared (user retrieved output sandbox)
Waiting (unavailable)

[diagram: job status paths (submit, Waiting, Ready, Scheduled, Running, Done) at four sites: ce02.grid.acad.bg, lepton.rcac.purdue.edu, ce01.cmsaf.mit.edu, cluster.pnpi.nw.ru; outcomes shown: Success; Error, authentication; Error, maradona; Error, wrong installation]
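One way to turn such status messages into per-site failure reasons is simple pattern matching, sketched below in Python. This is an assumption about how the classification could work, not a description of the dashboard's implementation. The labels "authentication", "maradona" and "wrong installation" are the ones shown on the slide; the "condorg" label, the regular expressions, the function names and the site names in the example are hypothetical.

```python
import re
from collections import Counter

# Ordered (label, pattern) pairs; the first matching pattern wins.
FAILURE_PATTERNS = [
    ("authentication", re.compile(r"authentication operation failed")),
    ("maradona", re.compile(r"Cannot read JobWrapper output")),
    ("wrong installation", re.compile(r"not found or unreadable")),
    ("condorg", re.compile(r"error while in the CondorG queue")),  # assumed label
]


def classify(message):
    """Return a failure label for a status message, or None if it is not a known failure."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(message):
            return label
    return None


def failures_by_site(events):
    """Count failure labels per site from (site, status, message) tuples."""
    counts = {}
    for site, _status, message in events:
        label = classify(message)
        if label is not None:
            counts.setdefault(site, Counter())[label] += 1
    return counts


# Made-up events reusing the message texts from the slide.
events = [
    ("site-a.example.org", "Ready", "7 an authentication operation failed"),
    ("site-a.example.org", "Ready", "7 an authentication operation failed"),
    ("site-b.example.org", "Done",
     "Cannot read JobWrapper output both from Condor and from Maradona."),
    ("site-b.example.org", "Done", "Job terminated successfully"),
]

for site, counter in sorted(failures_by_site(events).items()):
    print(site, dict(counter))
```

Messages that match no pattern, such as a successful termination, are left out of the counts, so the result only reflects recognised failures.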

site monitoring summary
- installed for ALICE, ATLAS, CMS, LHCb.
- latest/next developments: merging of all information of a site (not per VO), in order to see if failures are similar for all VOs (in progress).

data management
- an ATLAS-specific application,
- monitoring the ATLAS DDM tool,
- events directly reported by ATLAS software to the dashboard,
- current performance, details,
- developed in close contact with ATLAS DDM admins and developers,
- daily summary sent by mail.

data management: summary
- installed for ATLAS,
- critical component of ATLAS DDM (now official monitoring system),
- latest/next developments:
  - text summary sent by e-mail to site admins,
  - correlation with the SAM tests (site functionality tests).

conclusion
- goal: grid monitoring from a VO point of view:
  - merging VO information and Grid information,
  - fitting the various use cases (managers, users, site admins),
- several applications already implemented using a flexible python framework,
- future work:
  - new applications,
  - new information sources (GridICE, APEL, SAM),
  - new functionalities: alerts, assistance in error tracking.

links
- Savannah project,
- dashboard main page,
- CMS dashboard main page,
- ATLAS dashboard main page,
- LHCb dashboard main page,
- ALICE dashboard main page,
- site reliability.
contact: dashboard-support@cern.ch