GRID workload management system and CMS fall production. Massimo Sgaravatto INFN Padova




What do we want to implement (simplified design)
- Submit jobs (using Class-Ads) to the Master
- Master: chooses in which resources the jobs must be submitted
- Resource discovery: Grid Information Service (GIS), providing information on characteristics and status of local resources
- Master to Condor-G: condor_submit (Universe)
- Condor-G: able to provide reliability; use of Condor tools for job monitoring, logging, ...
- globusrun as uniform interface to different local resource management systems
- Local Resource Management Systems: CONDOR, LSF
- Farms: Site1, Site2, Site3

What can be implemented now
- Submit jobs: condor_submit (Universe)
- Grid Information Service (GIS): not very useful in this model; it provides information on characteristics and status of local resources
- Condor-G: able to provide reliability; use of Condor tools for job monitoring, logging, ...
- globusrun as uniform interface to different local resource management systems
- Local Resource Management Systems: CONDOR, LSF
- Farms: Site1, Site2, Site3
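A submission in this model could look roughly like the sketch below, which builds the text of a Condor-G submit description file for one farm. The gatekeeper host, the executable path and the file-name tag are hypothetical placeholders, and the exact set of submit commands accepted depends on the Condor version in use, so this is only an illustration under those assumptions.

```python
def make_submit_description(executable, gatekeeper, jobmanager, n_jobs, tag="cms"):
    """Return the text of a Condor-G submit description file.

    All argument values used in examples are hypothetical; only the submit
    command names (universe, globusscheduler, ...) come from Condor itself.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        # Jobs are handed to a remote gatekeeper rather than to a local pool.
        "universe            = globus",
        # The jobmanager suffix selects the local resource management system
        # behind the gatekeeper (for example lsf or condor).
        f"globusscheduler     = {gatekeeper}/jobmanager-{jobmanager}",
        f"executable          = {executable}",
        # The application is assumed to be already installed on the farm.
        "transfer_executable = false",
        # $(Process) is expanded by Condor to 0 .. n_jobs-1.
        f"output              = {tag}.$(Process).out",
        f"error               = {tag}.$(Process).err",
        # One user log shared by all jobs of this submission.
        f"log                 = {tag}.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description(
        "/path/to/run_production.sh",   # hypothetical path on the farm
        "gatekeeper.example.org",       # hypothetical gatekeeper host
        "lsf",
        5,
    ))
```

The printed text would be saved to a file and passed to condor_submit on the submitting machine; the farm is chosen explicitly through the gatekeeper and jobmanager values.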

Status
- Tests on basic capabilities and functionalities have been performed
- Problems with scalability and fault tolerance found
- CMS production: a useful exercise to test everything with real applications and real environments

CMS production
- Application: Pythia + Cmsim (traditional applications)
- Overview
  - Job management (submission, monitoring) from a single machine using Condor tools
  - User must explicitly define in which resource (which farm) the jobs must be submitted
  - The applications and the input files must be stored in the file system of the executing machine
  - The output files will be created in the file system of the executing machine
  - We can try to have just the standard output/error files (useful to check the status of the production) created in the submitting machine, using bypass and/or GASS
  - CMS wants to test bypass as a second step
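Since the farm for each job has to be chosen explicitly, the production manager needs some static plan for splitting the jobs. The sketch below shows one possible way to do this, dividing job indices among farms in proportion to integer shares; the farm names and shares are hypothetical, and nothing in the slides prescribes this particular policy.

```python
def assign_jobs(n_jobs, farm_shares):
    """Split job indices 0..n_jobs-1 across farms in proportion to their shares.

    farm_shares maps a farm name to a non-negative integer weight
    (for example its number of CPUs). Names and weights are hypothetical.
    """
    total = sum(farm_shares.values())
    if n_jobs < 0 or total <= 0:
        raise ValueError("need n_jobs >= 0 and a positive total share")

    # First pass: each farm gets the floor of its proportional share.
    counts = {farm: n_jobs * share // total for farm, share in farm_shares.items()}

    # Second pass: hand out the jobs lost to rounding, one each,
    # starting from the farms with the largest share.
    leftover = n_jobs - sum(counts.values())
    by_share = sorted(farm_shares, key=farm_shares.get, reverse=True)
    for farm in by_share[:leftover]:
        counts[farm] += 1

    # Turn the counts into contiguous ranges of job indices.
    plan, start = {}, 0
    for farm in farm_shares:
        plan[farm] = list(range(start, start + counts[farm]))
        start += counts[farm]
    return plan


if __name__ == "__main__":
    for farm, jobs in assign_jobs(10, {"site1": 2, "site2": 1, "site3": 1}).items():
        print(farm, jobs)
```

Each farm's list of job indices could then be turned into one submission for that farm.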

Bypass vs. GASS
- Bypass
  - Written by Douglas Thain (Condor team)
  - Redirection of standard input/output/error of a program to a remote machine while the program is running
  - Can be used for dynamically linked programs
  - Successfully tested with Pythia
  - Use of Security Infrastructure
- GASS
  - Possibility to copy the input file to the remote machine before the execution, and to get the output file back after the execution (otherwise it is necessary to modify the source code)

What is necessary
- Local farms with shared file system between the various nodes
  - Done using CMS installation toolkit
  - Installation and support up to CMS/local administrators
- Installation of CMS environment on these farms
  - Done using CMS installation toolkit
  - Support up to CMS

What is necessary
- Local resource management system to manage the local farm
  - LSF
    - Installation and support up to CMS/local administrators
    - We should define in a common way how to configure the queue(s) where the jobs run
  - Local Condor pool
    - Installation and configuration (for dedicated machines) using CMS toolkit
    - Support???
  - PBS
    - Are there sites where PBS will be used???
    - Tests of Condor-G with PBS not performed yet
  - Fork
    - Strongly discouraged (even for a single machine)
    - Necessary to install on each machine
    - Job queuing up to the production manager

What is necessary
- One installation per farm (on a visible node)
- Use of personal certificates and host certificates signed by INFN CA
  - User certificates signed by CA are accepted as well
  - By default it is not possible to use resources outside INFN using personal certificates signed by INFN CA
    - Workaround 1: users also have personal certificates signed by CA
    - Workaround 2: small modification in the configuration of these resources outside INFN in order to accept our certificates too
- Installation
  - Done by CMS/local administrators/WP1 member (if present) using the distribution and procedures provided by the INFN GRID release team (http://www.pi.infn.it/grid/grid_inst_1.1.html)
  - In case of problems: globus@infn.it

What is necessary
- Condor-G
  - Just one installation, used by the production manager (Ivano Lippi?)
  - Installation and maintenance: Massimo Sgaravatto???
- Scripts to run CMS production using this GRID environment
  - Up to CMS
- Tools to monitor production
  - condor_q
  - Condor Job Viewer (Java GUI)
- Run the production
  - Up to production manager
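Besides condor_q, the Condor user log written on the submitting machine can be summarized with a small script. The sketch below assumes the classic text log layout, where each event starts with a three-digit code followed by (cluster.proc.subproc), and it handles only a few event codes; the sample log lines are hypothetical and the layout may differ between Condor versions.

```python
import re
from collections import Counter

# Subset of Condor user-log event codes handled by this sketch.
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "005": "terminated",
    "009": "aborted",
}

# An event header looks like: "001 (12.000.000) <date> <time> <message>".
EVENT_LINE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")


def latest_job_states(log_text):
    """Map (cluster, proc) to the name of the last known event for that job."""
    states = {}
    for line in log_text.splitlines():
        match = EVENT_LINE.match(line)
        if not match:
            continue  # continuation lines and "..." separators
        code, cluster, proc = match.groups()
        if code in EVENT_NAMES:
            states[(int(cluster), int(proc))] = EVENT_NAMES[code]
    return states


def summarize(log_text):
    """Count how many jobs are currently in each state."""
    return Counter(latest_job_states(log_text).values())


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = "\n".join([
        "000 (12.000.000) 10/01 09:00:00 Job submitted from host: <10.0.0.1:1234>",
        "...",
        "000 (12.001.000) 10/01 09:00:01 Job submitted from host: <10.0.0.1:1234>",
        "...",
        "001 (12.000.000) 10/01 09:05:00 Job executing on host: <10.0.0.2:1234>",
        "...",
        "005 (12.000.000) 10/01 10:00:00 Job terminated.",
    ])
    print(dict(summarize(sample)))
```

Events with codes outside the small table are ignored, so a job keeps its last recognized state until a known event arrives.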

Open questions
- Some items/actors missing???
- When???
- Relations with other activities???
  - Data Management (GDMP, ...)???