The dashboard Grid monitoring framework
1 The dashboard Grid monitoring framework Benjamin Gaidioz on behalf of the ARDA dashboard team (CERN/EGEE) ISGC 2007 conference The dashboard Grid monitoring framework p. 1
2 introduction/outline
- goals of the project,
- the team,
- the framework,
- some monitoring applications: job monitoring, site monitoring, data management monitoring.
3 the project (EGEE/ARDA)
- another monitoring tool,
- a VO-specific monitoring service,
- showing Grid usage from a VO point of view (cross Grid, cross application, submission tool, etc.),
- merging Grid information and VO information,
- implemented in close contact with the VOs.
4 the team
- Julia Andreeva (lead, CMS) and Juha Herrala (former member, CMS),
- Benjamin Gaidioz and Ricardo Rocha (ATLAS),
- Pablo Saiz (ALICE),
- Gerhild Maier,
- collaborators and visitors:
  - Taipei: Fu-Ming Tsai (daily summaries), Tao-Sheng Chen (PostgreSQL and Oracle), Shih-Chun Chiu (user web interface, PHP), etc.,
  - Moscow State University,
- our contacts in all the VOs and Grids.
contact:
5 the framework
- a Python framework for collecting and publishing monitoring information,
- components shown in the architecture diagram:
  - information sources and their collectors: GridPP (GridPP collector), RGMA (RGMA collector), MonALISA (MonALISA collector),
  - dashboard Oracle database, reached through a data access object (DAO),
  - dashboard web server, answering client requests with text/html, text/xml, image/png, etc.,
- developer guide, Savannah project.
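The collector, DAO and web-server roles named on this slide can be illustrated with a minimal sketch. It is not the dashboard's actual code: `JobDAO`, `ListCollector` and `render` are hypothetical names, SQLite stands in for the Oracle database, and the collector reads from an in-memory list instead of polling RGMA, GridPP or MonALISA.

```python
import sqlite3


class JobDAO:
    """Hypothetical data access object; SQLite stands in for the Oracle database."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS job "
            "(job_id TEXT PRIMARY KEY, site TEXT, status TEXT)"
        )

    def save(self, job_id, site, status):
        # Insert a new job or overwrite the stored state of a known one.
        self.conn.execute(
            "INSERT OR REPLACE INTO job (job_id, site, status) VALUES (?, ?, ?)",
            (job_id, site, status),
        )

    def count_by_status(self):
        cursor = self.conn.execute(
            "SELECT status, COUNT(*) FROM job GROUP BY status ORDER BY status"
        )
        return dict(cursor.fetchall())


class ListCollector:
    """Hypothetical collector; a real one would poll an information source."""

    def __init__(self, dao, records):
        self.dao = dao
        self.records = records

    def run_once(self):
        for job_id, site, status in self.records:
            self.dao.save(job_id, site, status)


def render(summary, content_type):
    """Publish the same summary in the format the client asked for."""
    if content_type == "text/xml":
        items = "".join(
            '<status name="%s" jobs="%d"/>' % (name, count)
            for name, count in sorted(summary.items())
        )
        return "<summary>%s</summary>" % items
    # Default to text/html.
    items = "".join(
        "<li>%s: %d</li>" % (name, count) for name, count in sorted(summary.items())
    )
    return "<ul>%s</ul>" % items


dao = JobDAO(sqlite3.connect(":memory:"))
collector = ListCollector(
    dao,
    [("j1", "site-a", "Running"), ("j2", "site-b", "Done"), ("j1", "site-a", "Done")],
)
collector.run_once()
summary = dao.count_by_status()
```

Keeping all SQL inside the DAO means collectors and web views never touch the database directly, which is the usual reason for introducing such a layer.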
6 a set of applications
7 applications
1. job monitoring,
2. site monitoring,
3. data management monitoring.
see the links in the last slide for accessing them all.
8 job monitoring
- real-time view of Grid jobs for a VO,
- summary views,
- various Grid information systems used (EGEE RGMA, GridPP XML files, LCG BDII),
- VO info: job instrumentation (MonALISA's ApMon), ATLAS prodsys database, PanDA monitoring, GangaAtlas monitoring, DIRAC database, etc.,
- consistent merging (Grid info + VO info),
- powerful filtering for serving different use cases (managers, site admins, users),
- examples: ATLAS activities today, ATLAS jobs in Taiwan, CMS daily views.
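The merging and filtering steps listed on this slide can be sketched as follows. This is an illustration under assumptions, not the dashboard's implementation: the function names, the record fields (`job_id`, `site`, `user`, and so on) and the sample values are hypothetical, and it assumes that both sources share one job identifier.

```python
def merge_job_views(grid_records, vo_records):
    """Join Grid-side and VO-side records on a shared job identifier."""
    merged = {}
    for record in grid_records:
        merged[record["job_id"]] = dict(record)
    for record in vo_records:
        # VO fields are added to the Grid fields; a job known only to the VO
        # still gets an entry of its own.
        entry = merged.setdefault(record["job_id"], {"job_id": record["job_id"]})
        entry.update(record)
    return list(merged.values())


def filter_jobs(jobs, **criteria):
    """Keep the jobs whose fields match every given criterion."""
    return [
        job for job in jobs
        if all(job.get(key) == value for key, value in criteria.items())
    ]


grid_info = [
    {"job_id": "j1", "site": "site-a", "grid_status": "Done"},
    {"job_id": "j2", "site": "site-b", "grid_status": "Running"},
]
vo_info = [
    {"job_id": "j1", "user": "alice", "activity": "analysis", "exit_code": 0},
    {"job_id": "j3", "user": "bob", "activity": "production"},
]

jobs = merge_job_views(grid_info, vo_info)
site_admin_view = filter_jobs(jobs, site="site-a")   # what a site admin might ask for
user_view = filter_jobs(jobs, user="bob")            # what a single user might ask for
```

The same merged records serve each use case; only the filter criteria change between a manager, a site admin and a user.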
9 job monitoring summary
- installed for ALICE, ATLAS, CMS, LHCb,
- latest/next developments:
  - open HTTP API for a VO to publish job information to the dashboard (in progress),
  - user task monitoring (in progress),
  - alerts (with failure pattern recognition),
  - link with the SAM tests (site functionality tests),
  - RSS feeds.
10 site monitoring
- linked to job monitoring,
- identifies the reason for job failures at sites,
- using RGMA (which reports Grid error messages),
- examples: ALICE site info.
  Waiting
  Ready (unavailable)
  Scheduled (Job successfully submitted to Globus)
  Ready (7 an authentication operation failed)
  Running (Job successfully submitted to Globus)
  Done (Job got an error while in the CondorG queue.)
  Submitted
  Done (Job terminated successfully)
  Done (Cannot read JobWrapper output both from Condor and from Maradona.)
  Done (/net/hisrv0001/opt.x86_64/grid/globus/etc/globus-user-env.sh not found or unreadable)
  Cleared (user retrieved output sandbox)
  Waiting (unavailable)
11 site monitoring
submit: ce02.grid.acad.bg | lepton.rcac.purdue.edu | ce01.cmsaf.mit.edu | cluster.pnpi.nw.ru
Waiting | Waiting | Waiting | Waiting
Ready | Ready | Ready | Ready
Scheduled | Error, authentication | Scheduled | Scheduled
Running | Running | Running
Error, maradona | Error, wrong installation | Done
Success
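One way to turn raw Grid status messages into the short error labels shown on the site monitoring slides is a small rule table, sketched below. The rules, the pairing of each message with a label, the function names and the site names are assumptions made for illustration; the slides do not describe how the dashboard performs this step.

```python
import re
from collections import Counter

# Hypothetical rules: a pattern matched against the Grid status reason,
# and the short label reported for it.
RULES = [
    (re.compile(r"authentication operation failed"), "Error, authentication"),
    (re.compile(r"Maradona"), "Error, maradona"),
    (re.compile(r"not found or unreadable"), "Error, wrong installation"),
    (re.compile(r"terminated successfully"), "Success"),
]


def classify(reason):
    """Return the label of the first rule matching the reason, else 'Unknown'."""
    for pattern, label in RULES:
        if pattern.search(reason):
            return label
    return "Unknown"


def outcomes_per_site(events):
    """Count classified outcomes for each site from (site, reason) pairs."""
    counts = {}
    for site, reason in events:
        counts.setdefault(site, Counter())[classify(reason)] += 1
    return counts


events = [
    ("site-a", "7 an authentication operation failed"),
    ("site-a", "7 an authentication operation failed"),
    ("site-b", "Cannot read JobWrapper output both from Condor and from Maradona."),
    ("site-c", "Job terminated successfully"),
]
report = outcomes_per_site(events)
```

Messages that match no rule are counted as `Unknown` rather than dropped, so new failure types remain visible until a rule is written for them.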
12 site monitoring summary
- installed for ALICE, ATLAS, CMS, LHCb,
- latest/next developments: merging of all information of a site (not per VO), in order to see if failures are similar for all VOs (in progress).
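The cross-VO view described here could be computed roughly as below. This is one possible reading, sketched under assumptions: the record layout `(site, vo, succeeded)`, the function names, the sample data and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {site: {vo: failed / total}} from (site, vo, succeeded) records."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for site, vo, succeeded in records:
        counts = totals[site][vo]
        counts[0] += 0 if succeeded else 1   # failed jobs
        counts[1] += 1                       # all jobs
    return {
        site: {vo: failed / total for vo, (failed, total) in vos.items()}
        for site, vos in totals.items()
    }


def sites_failing_for_all_vos(rates, threshold=0.5):
    """Sites where every VO seen there fails at least `threshold` of its jobs."""
    return sorted(
        site for site, vos in rates.items()
        if all(rate >= threshold for rate in vos.values())
    )


records = [
    ("site-a", "atlas", False),
    ("site-a", "cms", False),
    ("site-a", "cms", True),
    ("site-b", "atlas", False),
    ("site-b", "cms", True),
]
rates = failure_rates(records)
suspect_sites = sites_failing_for_all_vos(rates)
```

A site that fails for every VO points to a site-level problem, while a site that fails for only one VO suggests a problem specific to that VO's software or configuration.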
13 data management
- an ATLAS-specific application,
- monitoring the ATLAS DDM tool,
- events directly reported by ATLAS software to the dashboard,
- current performance, details,
- developed in close contact with ATLAS DDM admins and developers,
- daily summary sent by mail.
14 data management: summary
- installed for ATLAS,
- critical component of ATLAS DDM (now official monitoring system),
- latest/next developments:
  - text summary sent by mail to site admins,
  - correlation with the SAM tests (site functionality tests).
15 conclusion
16 conclusion
- goal: Grid monitoring from a VO point of view:
  - merging VO information and Grid information,
  - fitting the various use cases (managers, users, site admins),
- several applications already implemented using a flexible Python framework,
- future work:
  - new applications,
  - new information sources (GridICE, APEL, SAM),
  - new functionalities: alerts, assistance in error tracking.
17 links
- Savannah project
- dashboard main page
- CMS dashboard main page
- ATLAS dashboard main page
- LHCb dashboard main page
- ALICE dashboard main page
- site reliability
