The HappyFace Meta-Monitoring Framework

The HappyFace Meta-Monitoring Framework
B. Berge, M. Heinrich, G. Quast, A. Scheurer, M. Zvada
DPG Spring Meeting (DPG Frühjahrstagung), Karlsruhe, 28 March to 1 April 2011
KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu

HappyFace (HF) Basics
HF: what it is
- Allows real-time site monitoring
- Acquires information automatically, not on demand
- Can be used as a sophisticated and modular shift tool
- Also provides detailed information for admins (if required)
- Auto-refresh system; no user intervention necessary
- Provides a rating system (keyword: non-expert shift crews)
- Allows information to be correlated
- Can trigger automatic alarms/notifications
- Highly configurable and adjustable, e.g. certificate-based access control
Meta-monitoring suite
30.03.2011 Dr. Armin Scheurer

Technical Background
HF is:
- Written in Python
- Highly modular
  - Similar modules inherit functionality
  - Configuration is possible on each inheritance level, e.g. 5 modules which provide CMS Dashboard information for T1_DE_KIT can be changed to monitor the site T2_DE_DESY by altering just one line in the parent config file
- DB assisted
  - Intrinsic history functionality
- Lightweight, fast and reliable
- In production use for more than 2 years
Easy deployment on new sites!
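The one-line site switch described above can be illustrated with layered configuration files. This is only a minimal sketch using Python's standard `configparser`; the section name `setup`, the option names `site` and `source`, and the helper `load_module_config` are assumptions made for illustration and are not taken from the HF code base.

```python
import configparser

# Hypothetical parent config shared by several CMS Dashboard modules.
PARENT_CFG = """
[setup]
site = T1_DE_KIT
"""

# Hypothetical child config: only module-specific options live here.
CHILD_CFG = """
[setup]
source = job_summary
"""

def load_module_config(*layers):
    """Read config layers in order; later layers override earlier ones."""
    parser = configparser.ConfigParser()
    for text in layers:
        parser.read_string(text)
    return dict(parser["setup"])

config = load_module_config(PARENT_CFG, CHILD_CFG)
print(config["site"], config["source"])
```

Because every child module reads the parent layer first, replacing `T1_DE_KIT` with `T2_DE_DESY` in the parent text is enough to repoint all children in this sketch.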

The Interface & Rating System
1. History navigation
2. Category navigation
3. Module navigation
4. Module content
Simple module and category rating system
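The slides do not spell out how a category rating is derived from its modules. As one plausible sketch, the category can take the worst status of its rated modules. The numeric scale (1.0 good, 0.5 warning, 0.0 critical), the function names and the face labels are assumptions for illustration only, not the HF algorithm.

```python
def category_status(module_statuses):
    """Return the worst module status; modules without a rating (None) are ignored."""
    rated = [s for s in module_statuses.values() if s is not None]
    return min(rated) if rated else None

def face(status):
    """Map a numeric status to a label a non-expert shifter can read."""
    if status is None:
        return "unknown"
    if status >= 1.0:
        return "happy"
    if status >= 0.5:
        return "neutral"
    return "unhappy"

# Hypothetical module statuses for one category.
batch_category = {"job_efficiency": 1.0, "queue_length": 0.5, "plot_collector": None}
print(face(category_status(batch_category)))
```

Other choices, such as an average over modules, would fit the same interface.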

HF CMS and ATLAS Partners
HF core and module development:
- KIT Karlsruhe
Module development/usage:
- University of Hamburg
- DESY Hamburg
- RWTH Aachen
- University of Göttingen

Selected Module: Batch System Monitoring
- Real-time batch system monitoring
- Use batch system XML provider (CMS has providers for e.g. PBS, LSF, Condor)
- See currently running batch jobs
- Calculate current job efficiency
- Define warning/error thresholds
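A rough sketch of the efficiency calculation and threshold rating follows. The XML layout (`job` elements with `state`, `cputime` and `walltime` attributes), the definition of efficiency as CPU time over wall-clock time, and the threshold values are all assumptions for illustration; the actual provider format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of a batch system XML provider.
SAMPLE_XML = """
<jobs>
  <job id="1" state="running" cputime="3400" walltime="3600"/>
  <job id="2" state="running" cputime="600" walltime="3600"/>
  <job id="3" state="queued" cputime="0" walltime="0"/>
</jobs>
"""

def job_efficiency(xml_text):
    """Return summed CPU time / summed wall time over running jobs, or None."""
    root = ET.fromstring(xml_text.strip())
    cpu = wall = 0.0
    for job in root.iter("job"):
        if job.get("state") != "running":
            continue
        cpu += float(job.get("cputime"))
        wall += float(job.get("walltime"))
    return cpu / wall if wall > 0 else None

def rate(efficiency, warning=0.7, critical=0.4):
    """Compare the efficiency against assumed warning/critical thresholds."""
    if efficiency is None:
        return "unknown"
    if efficiency < critical:
        return "critical"
    if efficiency < warning:
        return "warning"
    return "ok"

efficiency = job_efficiency(SAMPLE_XML)
print(rate(efficiency))
```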

Prototype: CMS Tier1 Batch System Monitoring
- Individual categories for each Tier1, e.g. KIT, FNAL, etc.
- Combined or individual categories for Tier2s
- Easily extendable to integrate e.g. storage system monitoring, etc.
- Everything in one view, cached and thus very fast access to all pre-collected information

HF Access Control
- Certificate-based access control (e.g. Grid certificate)
- Access can be restricted for single modules or whole categories
- Hidden mode
- Use one single HF instance for admins and users!
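One way to picture per-module and per-category restrictions is a lookup of the client certificate's subject DN against allow-lists. The rule layout, the example DNs and the function `is_allowed` are hypothetical; how HF obtains the DN and stores its rules is not described in the slides.

```python
# Hypothetical allow-lists keyed by scope; anything not listed is public.
RULES = {
    "category": {
        "admin_internals": {"/C=DE/O=GermanGrid/OU=KIT/CN=Example Admin"},
    },
    "module": {
        "user_space": {
            "/C=DE/O=GermanGrid/OU=KIT/CN=Example Admin",
            "/C=DE/O=GermanGrid/OU=KIT/CN=Example User",
        },
    },
}

def is_allowed(client_dn, category, module, rules=RULES):
    """Grant access unless a restricted category or module excludes this DN."""
    for scope, name in (("category", category), ("module", module)):
        allowed = rules.get(scope, {}).get(name)
        if allowed is not None and client_dn not in allowed:
            return False
    return True

user = "/C=DE/O=GermanGrid/OU=KIT/CN=Example User"
print(is_allowed(user, "storage", "user_space"))
print(is_allowed(user, "admin_internals", "pool_details"))
```

With rules like these, a single instance can show restricted content to admins while other users see only the public modules.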

Extended Functionality
HF publishes its current status via XML:
- Used as input for a plugin available for the Firefox web browser status bar
- Used as input for smartphone apps (e.g. iPhone, Android)
- Used for Meta²-Monitoring, ideal for central shifts: the HF matrix
- By clicking on the individual arrows, jump directly to the proper HF instance and module
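The XML status export and its consumption by a client (a browser plugin, an app, or a Meta²-Monitoring matrix) might look roughly like the sketch below. The element and attribute names (`happyface`, `category`, `module`, `status`) are invented for illustration and do not reflect the real HF XML schema.

```python
import xml.etree.ElementTree as ET

def export_status(site, categories):
    """Serialise {category: {module: status}} into a small XML document."""
    root = ET.Element("happyface", site=site)
    for cat_name, modules in categories.items():
        cat = ET.SubElement(root, "category", name=cat_name)
        for mod_name, status in modules.items():
            ET.SubElement(cat, "module", name=mod_name, status=str(status))
    return ET.tostring(root, encoding="unicode")

def read_status(xml_text):
    """Parse the exported XML back into {(category, module): status}."""
    root = ET.fromstring(xml_text)
    return {
        (cat.get("name"), mod.get("name")): float(mod.get("status"))
        for cat in root.iter("category")
        for mod in cat.iter("module")
    }

xml_text = export_status("T1_DE_KIT", {"batch": {"job_efficiency": 0.5}})
print(read_status(xml_text))
```

A matrix view over many sites would call something like `read_status` once per HF instance and arrange the results by site and category.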

Current Development: Database Backend
Standard HF DB backend: SQLite
- Pro: lightweight, file-based, well supported, perfectly suited for most HF sites
- Con: huge files, backup difficult, performance scalability
Solution: introduce support for arbitrary DB backends
- E.g. Postgres, MySQL, Oracle, etc.
- Each site can use its preferred DB backend to suit its own setup (e.g. Oracle cluster at CERN)
- Allows site-specific scalable performance optimisations
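A backend-agnostic storage layer could be sketched with a small connection factory on top of Python's DB-API 2.0. Only the SQLite branch is implemented here, using the standard library; the function names, the table layout and the `backend` option are assumptions for illustration and not the design HF adopted.

```python
import sqlite3

def connect(backend, **options):
    """Return a DB-API 2.0 connection for the configured backend."""
    if backend == "sqlite":
        return sqlite3.connect(options.get("path", ":memory:"))
    # Other backends would import their own DB-API driver here; omitted in this sketch.
    raise NotImplementedError("backend not available in this sketch: " + backend)

def store_status(conn, module, timestamp, status):
    """Append one status record; the history is simply all stored rows."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS module_status "
        "(module TEXT, timestamp INTEGER, status REAL)"
    )
    # '?' placeholders are sqlite3's parameter style; other drivers may differ.
    cur.execute(
        "INSERT INTO module_status VALUES (?, ?, ?)", (module, timestamp, status)
    )
    conn.commit()

conn = connect("sqlite")
store_status(conn, "job_efficiency", 1301472000, 0.5)
rows = conn.execute("SELECT module, status FROM module_status").fetchall()
print(rows)
```

Keeping the backend choice behind one factory function means modules only ever see a connection object, so a site could swap the driver without touching module code.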

Summary
HF: what it is
- Modular and easily configurable tool for shifters and admins
- All information is pre-collected (time interval: ~10 min)
  - No waiting time, live feeling
- Stores information from external sources (plots, XML/HTML, text files)
- Stores configuration parameters with the data to allow a consistent history view even after threshold changes, etc.
  - Identify problems: possibility to get the exact state of my site on Sunday night
- Provides a powerful rating system (different algorithms available)
- Possibility to automatically trigger alarms/notifications
- Exports its status via XML for further usage/harvesting (e.g. iPhone app, Firefox plugin, Meta²-Monitoring)
- Used by German CMS and ATLAS sites for more than 2 years now
  - Stable, reliable, tested
- Many modules available; designed for collaboration via central repository

The HappyFace Project
More information & documentation: https://ekptrac.physik.uni-karlsruhe.de/trac/happyface

Backup Slides

Detailed Module Information
History plot
Detailed information about each module is available, including:
- Current status
- Warning and critical thresholds
- Link to the information source
- Instructions for shifters on what to do in case of problems

Some Selected Modules: Data Management
HF module for data management:
- See a list of all datasets available at a site
- Calculates the used space on disk
- Provides information about dataset distribution on the storage system (on disk, only on tape)
- Thresholds mark datasets green
- Allow/disallow user access on datasets which are not staged

Some Selected Modules: Storage System
Karlsruhe HF provides modules to monitor dCache systems:
- Use standard dCache XML provider
- Can be used by any dCache site
- Monitor overall dCache status, down to the level of individual pools
- Monitor current I/O throughput and status of active/queued transfers
- Free space
- And many more features

Some Selected Modules: RSS Feeds
HF is a shifter tool, so it provides RSS feed functionality:
- Inform all shifters about ongoing issues
- Keep track of open tickets, etc.
- Very useful during shift changeover

List of Available Modules
Additional existing modules:
- Access to dCache billing database, e.g. last file access, most/least used datasets, etc.
- SAM test: OPS and VO-specific results
- User-space monitoring for Tier2s (access control via certificate)
- Local computing hardware and software infrastructure surveillance: e.g. VO software area, VOBox, ILO interfaces, network connection tests, etc.
- HF provides an interface to Nagios to include sensors in the status calculation
- PhEDEx agent status, transfer quality, link status, etc.
- CMS Site Readiness status
- Consistency modules: local filespace vs. DBS and/or TMDB
- Collectors for binary information: e.g. plots from Dashboard, PhEDEx, local services (Ganglia), etc.
- ATLAS PanDA monitoring
- Shift features: module providing further monitoring, documentation and contact links