Client/Server Grid applications to manage complex workflows



Similar documents
The CMS analysis chain in a distributed environment

CMS Dashboard of Grid Activity

Report on WorkLoad Management activities

PoS(EGICF12-EMITC2)110

Analisi di un servizio SRM: StoRM

EDG Project: Database Management Services

McAfee Web Gateway Administration Intel Security Education Services Administration Course Training

ATLAS job monitoring in the Dashboard Framework

Status and Integration of AP2 Monitoring and Online Steering

The Grid-it: the Italian Grid Production infrastructure

aaps algacom Account Provisioning System

Cluster, Grid, Cloud Concepts

The glite File Transfer Service

Data Grids. Lidan Wang April 5, 2007

The Data Quality Monitoring Software for the CMS experiment at the LHC

Status and Evolution of ATLAS Workload Management System PanDA

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

Data Management in an International Data Grid Project. Timur Chabuk 04/09/2007

Welcome to the User Support for EGEE Task Force Meeting

OMU350 Operations Manager 9.x on UNIX/Linux Advanced Administration

Grid Computing in Aachen

CERN local High Availability solutions and experiences. Thorsten Kleinwort CERN IT/FIO WLCG Tier 2 workshop CERN

onetransport 2016 InterDigital, Inc. All Rights Reserved.

Distributed Database Access in the LHC Computing Grid with CORAL

Qlik UKI Consulting Services Catalogue

Grid Scheduling Architectures with Globus GridWay and Sun Grid Engine

<Insert Picture Here> Achieving Business & Government Interoperability through PaaS & SaaS

GigaSpaces XAP.NET Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP.NET DISTRIBUTED SYSTEMS

GigaSpaces XAP 9.7 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

Building Semantic Content Management Framework

COMP5426 Parallel and Distributed Computing. Distributed Systems: Client/Server and Clusters

Beyond the SOA/BPM frontiers Towards a complete open cooperative environment

CNR-INFM DEMOCRITOS and SISSA elab Trieste

This module explains the Microsoft Dynamics NAV architecture and its core components.

AN APPROACH TO DEVELOPING BUSINESS PROCESSES WITH WEB SERVICES IN GRID

The glite Workload Management System

Interwise Connect. Working with Reverse Proxy Version 7.x

Anar Manafov, GSI Darmstadt. GSI Palaver,

The GridWay Meta-Scheduler

Simplified Management With Hitachi Command Suite. By Hitachi Data Systems

DARMADI KOMO: Hello, everyone. This is Darmadi Komo, senior technical product manager from SQL Server marketing.

Atrium Discovery for Storage. solution white paper

IGI Portal architecture and interaction with a CA- online

7/15/2011. Monitoring and Managing VDI. Monitoring a VDI Deployment. Veeam Monitor. Veeam Monitor

EMI views on Cloud Computing

MONITORING RED HAT GLUSTER SERVER DEPLOYMENTS With the Nagios IT infrastructure monitoring tool

CLOUD ARCHITECTURE DIAGRAMS AND DEFINITIONS

A Survey Study on Monitoring Service for Grid

Internet of Things. Reply Platform

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Managed Cloud Services

Administering Cisco Unified Communications Manager (ACUCM) v10.0

Sisense. Product Highlights.

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

glibrary: Digital Asset Management System for the Grid

Delivering secure, real-time business insights for the Industrial world

HISP: a data-driven portal for hadron therapy

Internet Engineering: Web Application Architecture. Ali Kamandi Sharif University of Technology Fall 2007

DSA1.4 R EPORT ON IMPLEMENTATION OF MONITORING AND OPERATIONAL SUPPORT SYSTEM. Activity: SA1. Partner(s): EENet, NICPB. Lead Partner: EENet

A Grid Architecture for Manufacturing Database System

Simplify SSL Certificate Management Across the Enterprise

The Evolution of Cloud Computing in ATLAS

Functional Requirements for Digital Asset Management Project version /30/2006

Glassfish Architecture.

Oracle Database Cloud Services OGh DBA & Middleware Day

ATLAS Petascale Data Processing on the Grid: Facilitating Physics Discoveries at the LHC

A Generic Database Web Service

Using Microsoft Operations Manager To Monitor And Maintain Your Farm. Michael Noel.

GridSolve: : A Seamless Bridge Between the Standard Programming Interfaces and Remote Resources

Parallels Mac Management v4.0

Cloud Computing and Advanced Relationship Analytics

The GENIUS Grid Portal

GigaSpaces XAP 10.0 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

Transcription:

Client/Server Grid applications to manage complex workflows Filippo Spiga* on behalf of CRAB development team * INFN Milano Bicocca (IT)

Outline Science Gateways and Client/Server computing Client/server approach for CMS distributed analysis Usage statistics and trends Considerations, remarks and future works 2

Science Gateways & Client/Server Computing A Science Gateway is a community-developed set of tools, applications, and data collections that are integrated via a portal or a suite of applications. The Client/Server Computing is a distributed application architecture that partitions tasks or work loads between service providers (servers) and service requesters (clients). 3

Science Gateways & Client/Server Computing Science gateways and client/server applications share the main fundamental ideas address the same purposes are realized using different technologies are deployed in different ways interact with users through different interfaces The choice between them will be made considering the target community (both users and developers) 4

Motivations & Advantages (for users) In the context of HEP computing, the client/server applications can automate (as much as possible) complex analysis workflows allow more advanced job monitoring with centralized front ends hide configuration complexity on heterogeneous grid environment hide implementation details to the users automatic resubmission in case of failure allow (strong) centralized policies 5

Motivations & Advantages (for operators and developers) In the context of HEP computing, the client/server applications can extend/integrate the functionalities of the middleware help to manage centrally updates improve support to users by operators avoid destructive congestions interacting with the grid improve the scalability of the entire computing model 6

The experience with CRAB tool in CMS CRAB, a Python tool intended to create, submit and manage CMS analysis jobs in a Grid environment. A single tool addresses different applications/purposes: CMSSW analysis over the Grid private MC production complex workflow automation Started as standalone application it is evolved to a client/server application (in production since July 2007) 7

Standalone vs Client-Server (workflow point of view) UI C R A B CRAB DatasetPath LFNs, #events, fileblocks DBS Dataset Discovery CRAB server Proxy UserBox CRAB Client datasetpath DBS Output jdl, job, RB/WMS jdl, job CE WN... Fileblocks SEs Data SE PhEDEx DLS (DLI interface) Dataset Location trivial file catalog LFN!> PFN File Location BOSS DB Output gsiftp server Job jdl, CE WN... Jdl, Job direct submission available jdl RB LFN, #events, fileblocks Data fileblocks SEs fileblocks SEs SE File Location Dataset Discovery PhEDEx DLS (DLI enabled, LFC based Dataset Location trivial file catalog Output (also to remote SE) Output 8

Server architecture Agent-based model USER (BROWSER, MAILBOX, USER INTERFACE) Publish&Subscribe message service Most components are multi-threading Transparent support to multiple storage systems (SEAPI) and multiple middlewares (BossLite) TASK TRACKING ADMIN CONTROL ERROR HANDLER JOB KILLER COMMAND MANAGER JOB TRACKING HTTP FRONTEND MYSQL DATABASE (BOSSLITE) MIDDLEWARE GET OUTPUT NOTIFI- CATION CRAB JOB CREATOR TASK LIFE MANAGER TASK REGISTER CRAB SERVER WORKER STORAGE MYPROXY 9

Components interaction example (submission) CLIENT gsoap SERVER SQLITE DB -submit COMMAND MANAGER If a component/thread crashes, the system continues to work GSIFTP Storage Element TASK REGISTER CRAB SERVER WORKER MySQL DATABASE GRID Other services provided by the server will be not compromised At worst, smooth degradation of QoS 10

CRAB deployment on Grid infrastructure CRAB deployment strategy for CMS enforce fault-tolerance avoid single point congestion glite is the main middleware used periodic challenges are performed to test and stress the infrastructure Different CRAB-AS in production now located at CERN (CH), UCSD (US), BARI (IT) each physic groups can deploy it s own unofficial server Analysis Operators team (AnaOps) manages and monitors the services 11

Statistics: starting point Source of statistics: CMS Dashboard Focus on comparison between standalone and client/ server approaches trend analysis from Jan 2008 to Feb 2010 Statistic is not perfect sometimes testing jobs submitted by CRAB developers are counted as analysis jobs Only analysis jobs on T2 level were considered 12

CRAB Standalone vs Server (2008 2009) 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% SERVER DIRECT 2008 2009 SERV; 23% DIRECT; 77% SERV; 47% DIRECT; 53% Source: CMS Dashboard, queried @ 27/03/2010 13

CRAB Standalone vs Server (last 6 months, Sep 09 - Feb 10) 54% of submissions were made through the server Source: CMS Dashboard, queried @ 27/03/2010 14

CRAB Standalone vs Server (2009) Restart of LHC Number of distinct CRAB users: 150~200/day Max distinct users peak reached: ~250 Source: CMS Dashboard, queried @ ~02/2010 15

Considerations & Remarks Architectural aspects of the software are the key factors to reach success good performance achieved good stability and reliability Inside CMS community, the adoption of client/server approach is increasing good feedback by users good quality of service reached The only (relevant but known) problems are related to data movement stage-out remains a critical operation across the Grid, influenced by several factors 16

Improvements & Future works Consolidate the actual architecture change the interaction with glite middleware (CLI instead of API) better strategies for stage-out operations maintain the best interoperability for all middlewares supported reduce the number of jobs aborted improvements for intelligent resubmission New framework for all CMS computing: WMCore one application targets the online (T0), the official production (T1) and the user analyses (T2) communications based on RESTful web services too young to go to production (expected for ) 17

THANK YOU FOR YOUR ATTENTION Questions? Special thanks to Daniele Spiga (CRAB), Fabio Farina (CRAB), Mattia Cinquilli (CRAB) and Julia Andreeva (CMS Dashboard) 18

The CRAB history (backup slide) CRAB_2_X_X (10/2007) CRAB_2_7_1 (03/2010) CRAB_1_X_X 2004/2005 2006 2007 2008 2009 2010 CRABSERVER_0_0_1 (07/2007) CRABSERVER_1_0_X (05/2008) CRABSERVER_1_1_0 (11/2009) CRABSERVER_1_1_1 (03/2010) 19