Roberto Barbera. Centralized bookkeeping and monitoring in ALICE




Centralized bookkeeping and monitoring in ALICE
Roberto Barbera
CHEP 2000, 10.02.2000
INFN GRID WP6, 24.07.2001

ALICE and the GRID
Phase I: AliRoot production
[Figure. Labels: The GRID; Powered by ROOT]

How did we get there?
- Automatic Linux installation tool with configurable post-installation. Tested on the new farm at INFN Catania: a node goes from out of the box to ready to run in 15 minutes.
- Local resource monitoring system with a web interface based on MRTG (http://alipc1.ct.infn.it/mrtg/monitoring.html), installed at all sites.
- Network latency monitor (RTT) with a web interface, installed at the production coordination site (http://alipc1.ct.infn.it/mrtg/netmon.html).
- Root/AliRoot automatic installation/upgrade toolkit (via both CVS and tarball), distributed to all sites. It automatically sets up the environment to run AliRoot with Globus.
- Web portal based on XML technology, built in collaboration with NICE s.r.l., for user authentication (via Globus GSI) and job submission (via Globus GRAM) (http://gridct1.ct.infn.it/globus).

The ALICE testbed for Phase I
Dipartimento di Fisica dell'Università di Catania and INFN Catania, Italy
[Map of testbed sites. Labels: Lyon; OSU/C; Mexico City; Merida]

Production test layout for Phase I
[Diagram. Labels: "I'm the PPR production manager"; Globus; EnginFrame; Disk Pool; Linux farm + MRTG monitor; Catania; "I'm the local surveyor"; Test site; Batch surveyor. Run statistics: 1-week run, ~200 events, 300+ GB.]

Utilities
[Screenshot of the web portal. Labels: Bookkeeping system; Web interface to log in to the Grid; CPU load; Disk space; Network availability; Web interface for job submission; callout "Only at Lyon"; LDAP server for ALICE (only in Italy)]

What did we learn? (1)
Pros:
- We are able to successfully manage certificates from different CAs (INFN, CNRS, Globus).
- The Root/AliRoot installation toolkit (Torino) works nicely at all sites (ALICE's and WP6's).
- Many different job managers have been tested: Condor, LSF, PBS and BQS (special interface to Globus realized at CCIN2P3 Lyon).
- The web interface EnginFrame is interfaced not only with the Globus GRAM and GSI but also with the local monitoring systems and with the presently available information service.
- A geographically distributed AliRoot production can be centrally managed.
- Produced data was actually used by physics analysis (TOF group in Bologna).

What did we learn? (2)
Cons:
- The output and error files do not fly back to the submitting machine if a job manager other than fork is used with the Globus commands.
- The absence of a centralized bookkeeping system that could also act as a job monitor was the most critical issue.
- There was no automatic resource broker and no wide-area workload management.
- There was no direct interaction between Root/AliRoot and the GRID services.

ALICE and the GRID
Phase II: Reconstruction and analysis
[Figure. Labels: The GRID]

The ALICE testbed for Phase II

Production test layout for Phase II
[Diagram. Site labels: Manager's site; Production site; Anywhere. Role labels: "I'm the PPR production manager"; "I'm the local site manager"; "I'm the impatient ALICE user checking the availability of events". Component labels: Globus; EnginFrame; Linux farm + MRTG monitor; Disk Pool; Tape Pool (CASTOR/HPSS); WWW/Carrot; MySQL; PHP; Bypass; Run DB; Cron; Mirror DB.]
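The Phase II layout places a MySQL run database between the production sites and the users who check event availability. The slides do not give its schema, so the following is only a minimal sketch of what such bookkeeping might look like. The table name, columns, status values and function names are all hypothetical, and Python's built-in SQLite is used here as a stand-in for MySQL so that the example is self-contained.

```python
import sqlite3


def open_run_db():
    """Create an in-memory run database (SQLite stands in for MySQL here)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per production run.
    db.execute(
        "CREATE TABLE runs ("
        " run_id INTEGER PRIMARY KEY,"
        " site TEXT NOT NULL,"
        " status TEXT NOT NULL,"  # assumed states: submitted, running, done, failed
        " events INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def register_run(db, site):
    """Record a newly submitted run and return its identifier."""
    cur = db.execute(
        "INSERT INTO runs (site, status) VALUES (?, 'submitted')", (site,)
    )
    db.commit()
    return cur.lastrowid


def update_run(db, run_id, status, events=0):
    """Update the status and event count of a run (the job-monitor role)."""
    db.execute(
        "UPDATE runs SET status = ?, events = ? WHERE run_id = ?",
        (status, events, run_id),
    )
    db.commit()


def events_available(db):
    """Total number of events in finished runs (the user's query)."""
    row = db.execute(
        "SELECT COALESCE(SUM(events), 0) FROM runs WHERE status = 'done'"
    ).fetchone()
    return row[0]


def status_by_site(db):
    """Number of runs per site and status (the production manager's view)."""
    rows = db.execute(
        "SELECT site, status, COUNT(*) FROM runs "
        "GROUP BY site, status ORDER BY site, status"
    )
    return rows.fetchall()
```

In this sketch a production site would call register_run and update_run, while a web front end would only need the two read-only queries.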

More integration: Grid services directly addressed from within Root
- TAuthorization (P. Malzacher): interface between Root and the Globus GSI service.
- TLDAP (P. Malzacher): interface between Root and the Globus GIS service based on LDAP directories.
- TPServer, TPServerSocket, TPSocket, TFTP, rootd (F. Rademakers): parallel socket transfer using TCP (files and objects).
- RootFTP (F. Rademakers): file transfer utility which uses parallel sockets.
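The general idea behind a parallel socket transfer is to split a payload into contiguous pieces, send each piece over its own TCP stream, and put the pieces back in order at the receiving end. The sketch below shows only that partitioning and reassembly logic, with no networking. It is not Root code, and the function names are hypothetical.

```python
def split_ranges(total_bytes, n_streams):
    """Split total_bytes into n_streams contiguous (offset, length) ranges.

    Lengths differ by at most one byte; earlier streams get the remainder.
    """
    if total_bytes < 0 or n_streams < 1:
        raise ValueError("total_bytes must be >= 0 and n_streams >= 1")
    base, extra = divmod(total_bytes, n_streams)
    ranges = []
    offset = 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def reassemble(chunks):
    """Rebuild the payload from (offset, data) chunks received in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(data for _, data in ordered)
```

For example, a sender could compute split_ranges(len(payload), 4) and hand each range to a separate socket, and the receiver would pass the chunks it collects to reassemble.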

Next Root developments
- Interface between Root and the DataGRID WP2 middleware (GDMP APIs). PROOF can use the Grid File Catalogue and Replication Manager to map LFNs to a chain of RFNs.
- Interface between PROOF and the GRID Resource Broker to detect which nodes in a cluster can be used in the parallel session (use of TLDAP for resource discovery from the GRID information service(s)).
- Interface between TFTP/PROOF and the Globus GSI services via TAuthorization. Comparison with GridFTP.
- Interface between PROOF and the Grid Monitoring Services.
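Mapping LFNs to a chain of RFNs amounts to a catalogue lookup followed by a choice among the registered replicas. The sketch below illustrates this with a plain dictionary standing in for the Grid File Catalogue. The catalogue layout, the site-preference rule and every name in it are assumptions made for illustration, and it does not use the GDMP or PROOF APIs.

```python
def resolve_chain(lfns, catalogue, preferred_sites=()):
    """Map each logical file name to one replica name, preserving order.

    catalogue: dict mapping an LFN to a list of {"site": ..., "rfn": ...}
               entries (a hypothetical stand-in for a replica catalogue).
    preferred_sites: sites to try first, in order of preference; if none
               holds a replica, the first registered replica is used.
    """
    chain = []
    for lfn in lfns:
        replicas = catalogue.get(lfn)
        if not replicas:
            raise KeyError(f"no replica registered for {lfn}")
        chosen = next(
            (r for site in preferred_sites for r in replicas if r["site"] == site),
            replicas[0],
        )
        chain.append(chosen["rfn"])
    return chain
```

The resulting list could then be handed to whatever builds the chain of files for the analysis session.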

DataGrid & Root
[Workflow diagram. Labels: Selection parameters; TAG DB; selected events; Root RDB; LFN, #hits; Grid RB; output LFNs; Grid log & monitor; PROOF loop; Grid replica manager; Grid authenticate; Spawn PROOF tasks; best places; Grid perf mon; Grid cost evaluator; Grid MDS; Grid perf log; Grid replica catalog; Update Root RDB; Send results back]

Internal milestones for Phase II
- 6/2001: List of ALICE users (and certificates) distributed to the test sites (ftp://alipc1.ct.infn.it/pub/grid/test).
- 7/2001: Distributed production/reconstruction test with a centralized bookkeeping system and Bypass. As many sites involved as possible.
- 8/2001: Distributed analysis test with PROOF.
- 9/2001: Test with/on DataGRID WP6 resources (new version of the installation toolkit with the new scripts).
- 12/2001: First results of tests of the DataGRID PM9 middleware release.