The glite Workload Management System



Similar documents
The ENEA-EGEE site: Access to non-standard platforms

Sun Grid Engine, a new scheduler for EGEE

Roberto Barbera. Centralized bookkeeping and monitoring in ALICE

The GENIUS Grid Portal

DiskPulse DISK CHANGE MONITOR

Report on WorkLoad Management activities

Client/Server Grid applications to manage complex workflows

NorduGrid ARC Tutorial

Integrating VoltDB with Hadoop

An objective comparison test of workload management systems

The CMS analysis chain in a distributed environment

GRID workload management system and CMS fall production. Massimo Sgaravatto INFN Padova

PBS Tutorial. Fangrui Ma Universit of Nebraska-Lincoln. October 26th, 2007

ARC Clients User Manual for ARC (client versions 1.0.0) and above

Grid Computing in Aachen

The Grid-it: the Italian Grid Production infrastructure

Condor for the Grid. 3)

Introduction to Programming and Computing for Scientists

Bitrix Site Manager ASP.NET. Installation Guide

Recommendations for Static Firewall Configuration in D-Grid

Grid Scheduling Dictionary of Terms and Keywords

An approach to grid scheduling by using Condor-G Matchmaking mechanism

Command Line Interface User Guide for Intel Server Management Software

IceWarp to IceWarp Server Migration

Universal Event Monitor for SOA Reference Guide

HP Operations Manager Software for Windows Integration Guide

Sun Grid Engine, a new scheduler for EGEE middleware

Analisi di un servizio SRM: StoRM

PageR Enterprise Monitored Objects - AS/400-5

Network Detective Client Connector

Using LDAP Authentication in a PowerCenter Domain

Scheduling in SAS 9.3

How To Install An Aneka Cloud On A Windows 7 Computer (For Free)

? Index. Introduction. 1 of 38 About the QMS Network Print Monitor for Windows NT

Cluster, Grid, Cloud Concepts

Status and Integration of AP2 Monitoring and Online Steering

Scheduling in SAS 9.4 Second Edition

WildFire Cloud File Analysis

Tutorial: Load Testing with CLIF

Hadoop Streaming. Table of contents

SOA Software API Gateway Appliance 7.1.x Administration Guide

Web Testing, Java Testing, Server Monitoring. AppPerfect Installation Guide

StreamServe Job Gateway

Grid Engine 6. Troubleshooting. BioTeam Inc.

Introduction to grid technologies, parallel and cloud computing. Alaa Osama Allam Saida Saad Mohamed Mohamed Ibrahim Gaber

32-Bit Workload Automation 5 for Windows on 64-Bit Windows Systems

CERN local High Availability solutions and experiences. Thorsten Kleinwort CERN IT/FIO WLCG Tier 2 workshop CERN

Quick Tutorial for Portable Batch System (PBS)

IBM Operational Decision Manager Version 8 Release 5. Getting Started with Business Rules

SDK Code Examples Version 2.4.2

EMC Documentum Content Services for SAP iviews for Related Content

Application Notes for Configuring Alternate Methods of Domain Based Routing for Outbound SIP Calls with the Avaya SIP Trunk Architecture Issue 1.

Hands-on Exercises with Big Data

Grid Computing in SAS 9.4 Third Edition

EDG Project: Database Management Services

glibrary: Digital Asset Management System for the Grid

Internet Technologies. World Wide Web (WWW) Proxy Server Network Address Translator (NAT)

Anar Manafov, GSI Darmstadt. GSI Palaver,

TSM for Windows Installation Instructions: Download the latest TSM Client Using the following link:

MIGRATING DESKTOP AND ROAMING ACCESS. Migrating Desktop and Roaming Access Whitepaper

PTC Integrity Eclipse and IBM Rational Development Platform Guide

Backing Up and Restoring Data

Deltek Costpoint Process Execution Modes

Bulk Downloader. Call Recording: Bulk Downloader

Presto User s Manual. Collobos Software Version Collobos Software, Inc

CA Workload Automation Agent for Remote Execution

Business Enterprise Server Help Desk Integration Guide. Version 3.5

Geant4 simulations using grid computing

Oracle Service Bus Examples and Tutorials

CNR-INFM DEMOCRITOS and SISSA elab Trieste

Handle Tool. User Manual

Integrating with BarTender Integration Builder

Transcription:

Consorzio COMETA - Progetto PI2S2 FESR The glite Workload Management System Annamaria Muoio INFN Catania Italy Annamaria.muoio@ct.infn.it Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July 2007 Catania www.consorzio-cometa.it

Outline This presentation will cover the following arguments: Overview of the glite WMS Architecture Job Description Language Overview - Principal Attributes References and hands-on Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 2

Overview of glite Middleware Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 3

Workload Management System Workload Management System (WMS) comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources. Purpose of Workload Manager (WM) is accept and satisfy requests for job management coming from its clients meaning of the submission request is to pass the responsibility of the job to the WM. WM will pass the job to an appropriate CE for executiontaking into account requirements and the preferences expressed in the job description. The decision of which resource should be used is the outcome of a matchmaking process. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 4

Job submission UI Network Server LFC Job Contr. - Condor CE characts & status Information System SE characts & status Storage Logging & Book-keeping keeping Computing Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 5

Job Status UI Network Server LFC UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs) WMS: Workload Management System Job Contr. - CondorG CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 6

edg-job-submit myjob.jdl myjob.jdl JobType = Normal ; RB node Executable = "$(CMS)/exe/sum.exe"; Network Server LFC InputSandbox UI = {"/home/user/wp1testc","/home/file* }; OutputSandbox = { sim.err, test.out, sim.log"}; Requirements = other. GlueHostOperatingSystemName == linux" && other.gluecepolicymaxcputime > 10000; Rank = other.gluecestatefreecpus; Job Status submitted Job Status Job Contr. - CondorG Job Description Language (JDL) Information to specify job characteristics Systemand requirements SE characts CE characts & status & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 7

NS: network daemon Job responsible for acceptingstatus incoming requests submitted UI Job Network Server LFC Input Sandbox files WMS storage Job Status Job Contr. - CondorG CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 8

Job Status submitted UI Job Network Server Match- Maker/ Broker LFC waiting WMS storage WM: responsible to take the appropriate actions to satisfy the request Job Status Job Contr. - CondorG Where must this job be executed? CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 9

Where are (which SEs) the needed data? Job Status submitted UI Matchmaker: responsible to find the best CE where to submit a job Network Server Match- Maker/ Broker LFC waiting WMS storage Job Status Job Contr. - Condor What is the status of the Grid? CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 10

Job Status submitted UI Network Server Match- Maker/ Broker LFC waiting WMS storage CE choice Job Status Job Contr. - Condor CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 11

Job Status submitted UI Network Server LFC waiting Job Status WMS storage Job Contr. - Condor Job Adapter JA: responsible for the final touches to the job before it s passed to Condor (e.g. creation of wrapper script, etc.) CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 12

Job Status submitted UI Network Server LFC waiting ready WMS storage Job JC: responsible for the actual job management operations (done via CondorG) Logging & Book-keeping keeping Job Status Computing Job Contr. - Condor CE characts & status Information System SE characts & status Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 13

Job Status submitted UI Network Server LFC waiting WMS storage ready scheduled Job Status Input Sandbox files Job Contr. - Condor Job CE characts & status Information System SE characts & status Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 14

Job Status submitted UI Network Server LFC waiting ready WMS storage scheduled Input Sandbox Job Status Job Contr. - Condor Information System running Logging & Book-keeping keeping Computing Grid enabled data transfers/ accesses Storage Job Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 15

Job Status submitted UI Network Server LFC waiting ready WMS storage scheduled Output Sandbox files Job Status Job Contr. - Condor Information System running done Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 16

edg-job-get-output <dg-job-id> Job Status submitted UI Network Server LFC waiting ready WMS storage scheduled Job Status Output Sandbox Job Contr. - Condor Information System running done Logging & Book-keeping keeping Computing Storage Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 17

Job Status submitted UI Network Server LFC waiting Output Sandbox files WMS storage ready scheduled Job Status Job Contr. - Condor Information System running done Logging & Book-keeping keeping Computing Storage cleared Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 18

Possible job states Flag Meaning SUBMITTED WAIT READY submission logged in the LB job match making for resources job being sent to executing CE SCHEDULED job scheduled in the CE queue manager RUNNING DONE CLEARED ABORT job executing on a WN of the selected CE queue job terminated without grid errors job output retrieved job aborted by middleware, check reason Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 19

Workload Management System User interface Input sandbox Output sandbox Resource Broker (WorkLoad Mgr.) DataSets info SE & CE info Input sandbox + Broker Info LFC Catalogue Information Service Output sandbox Author. &Authen. Job Submit Event Job Query Job Status Publish Storage Logging & Book-keeping keeping Job Status Computing Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 20

Command Line Interface Job Submission $ edg-job-submit [options] <jdl_file> --vo <vo name> : perform submission with a different VO than the UI default one. --output, -o <output file> save jobid on a file. --resource, -r <resource value> specify the resource for execution. --nomsgi neither message nor errors on the stdout will be displayed. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 21

If the request has been correctly submitted this is the typical output that you can get: edg-job-submit test.jdl ====================edg-job-submit Success ===================== The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier (edg_jobid) is: - https://lxshare0234.cern.ch:9000/ribubkffkhnsq6cjiluy8q ============================================================== In case of failure, an error message will be displayed instead, and an exit status different form zero will be retured. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 22

If the command returns the following error message: **** Error: API_NATIVE_ERROR **** Error while calling the "NSClient::multi" native api AuthenticationException: Failed to establish security context... **** Error: UI_NO_NS_CONTACT **** Unable to contact any Network Server it means that there are authentication problems between the UI and the Network Server (check your proxy or contact the site administrator). Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 23

It is possible to see which CEs are eligible to run a job specified by a given JDL file using the command edg-job-list-match test.jdl Connecting to host lxshare0380.cern.ch, port 7772 Selected Virtual Organisation name (from UI conf file): dteam ********************************************************************* COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite adc0015.cern.ch:2119/jobmanager-lcgpbs-long adc0015.cern.ch:2119/jobmanager-lcgpbs-short ********************************************************************** Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 24

After a job is submitted, it is possible to see its status using the glite-job-status command. edg-job-status https://lxshare0234.cern.ch:9000/x-ehtxfdlxxsoidvls0l0w ************************************************************* BOOKKEEPING INFORMATION: Printing status info for the Job: https://lxshare0234.cern.ch:9000/x-ehtxfdlxxsoidvls0l0w Current Status: Scheduled Status Reason: Job successfully submitted to Globus Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite reached on: Fri Aug 1 12:21:35 2003 ************************************************************* Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 25

After the job has finished (it reaches the DONE status), its output can be copied to the UI edg-job-get-output https://lxshare0234.cern.ch:9000/snpegp1ymjcns22yf5pflg Retrieving files from host lxshare0234.cern.ch ***************************************************************** JOB GET OUTPUT OUTCOME Output sandbox files for the job: - https://lxshare0234.cern.ch:9000/snpegp1ymjcns22yf5pflg have been successfully retrieved and stored in the directory: /tmp/joboutput/larocca_snpegp1ymjcns22yf5pflg ***************************************************************** By default, the output is stored under /tmp/joboutput, but it is possible to specify in which directory to save the output using the - -dir <path name> option. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 26

A job can be canceled before it ends using the command edg-job-cancel. edg-job-cancel https://lxshare0234.cern.ch:9000/dae162is6estca0vqhvkog Are you sure you want to remove specified job(s)? [y/n]n :y =================== edg-job-cancel Success==================== The cancellation request has been successfully submitted for the following job(s) - https://lxshare0234.cern.ch:9000/dae162is6estca0vqhvkog =========================================================== Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 27

In glite Job Description Language (JDL) is used to describe jobs for execution on Grid. The JDL adopted within the glite middleware is based upon Condor CLASSified Advertisement language (ClassAd). A ClassAd is a record-like structure composed of a finite number of attributes separated by a semi-colon (;) A ClassAd is highly flexible and can be used to represent arbitrary services The JDL is used in glite to specify the job s characteristics and constrains, which are used during the match-making making process to select the best resources that satisfy job s requirements. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 28

The JDL syntax consists on statements like: Attribute = value; Comments must be preceded by a sharp character ( # ) or have to follow the C++ syntax WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 29

In a JDL, some attributes are mandatory while others are optional. An essential JDL is the following: Executable = test.sh ; StdOutput = std.out ; StdError = std.err ; InputSandbox = { test.sh, input.dat }; OutputSandbox = { std.out, std.err }; If needed, arguments to the executable can be passed: Arguments = Hello World! ; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 30

If the argument contains quoted strings, the quotes must be escaped with a backslash e.g. Arguments = \ Hello World!\ 10 ; Special characters such as &,, >, < are only allowed if specified inside a quoted string or preceded by triple \ (e.g. Arguments = "-f file1\\\&file2";) The backtick character ` cannot be specified in the JDL. Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 31

JDL : Relevant Attributes Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 32

JobType (optional) Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric Or combination of them Checkpointable, Interactive Checkpointable, MPI E.g. JobType = Interactive ; JobType = { Interactive{ Interactive, Checkpointable }; Interactive + MPI not yet permitted Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 33

Executable (mandatory) This is a string representing the executable/command name. The user can specify an executable which is already on the remote CE Executable = { /opt/egeode/gct/egeode.sh{ /opt/egeode/gct/egeode.sh }; The user can provide a local executable name which will be staged from the UI to the WN Executable = { egeode.sh{ egeode.sh }; InputSandbox = { /home/larocca/egeode/{ egeode.sh }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 34

Arguments (optional) This is a string containing all the job command line arguments. E.g.: If your executable sum has to be started as: $ sum N1 N2 out result.out Executable = sum ; Arguments = N1 N2 out result.out ; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 35

Environment (optional) List of environment settings needed by the job to run properly E.g. Environment = { JAVABIN=/usr/local/java{ JAVABIN=/usr/local/java }; InputSandbox (optional) List of files on the UI local disk needed by the job for running The listed files will automatically staged to the remote resource E.g. InputSandbox ={ myscript.sh myscript.sh, /tmp/cc,sh }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 36

OutputSandbox (optional) List of files, generated by the job, which have to be retrieved E.g. OutputSandbox = { std.out, std.err, image.png }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 37

Requirements (optional) Job requirements on computing resources Specified using attributes of resources published in the Information Service If not specified, default value defined in UI config\uration file is considered Default. Requirements = other.gluecestatestatus == "Production ; Requirements = other.glueceinfolrmstype == PBS && other.glueceinfototalcpus > 2 && Member ( ALICE-2.1.7, other.gluehostapplicationsoftwareruntimeenvironment); Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 38

Rank (optional) Floating-point expression used to rank CEs that have already fulfill the Requirements expression. The Rank expression can contain attributes that describe the CE in the Information System (IS). The evaluation of the rank expression is performed by the Resource Broker (RB) during the match-making phase. A higher numeric value equals a better rank. E.g.: Rank = other.gluecestatefreecpus; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 39

InputData (optional) This is a string or a list of strings representing the Logical File Name (LFN) orgrid Unique Identifier (GUID) needed by the job as input. The list is used by the RB to find the CE from which the specified files can be better accessed and schedules the job to run there. InputData = { lfn:cmstestfile, guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70 }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 40

DataAccessProtocol (mandatory if InputData has been specified) The protocol or the list of protocols which the application is able to speak with for accessing files listed in InputData on a given SE. Supported protocols in glite are currently gsiftp, and file. DataAccessProtocol = { file{ file, gsiftp }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 41

OutputSE (optional) This string representing the URI of the Storage (SE) where the user wants to store the output data. This attribute is used by the Resource Broker to find the bestce close to this SE and schedule the job there. OutputSE = aliserv6.ct.infn.it ; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 42

OutputData (optional) This attribute allows the user to ask for the automatic upload and registration of datasets produced by the job on the Worker Node (WN). This attribute contains the following three attributes: OutputFile Storage LogicalFileName Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 43

NodeNumber (mandatory if JobType=MPICH) NodeNumber attribute is an integer specifying the number of nodes needed for a MPI job. The RB uses this attribute during the matchmaking for selecting those CE having a number of CPUs equals or greater the one specified in NodeNumber. NodeNumber = 5; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 44

JobSteps (mandatory for checkpointable or partitionable jobs) JobSteps attribute can be either an integer representing the number of steps for a checkpointable or partitionable job e.g.: JobSteps = 100000; or a list of strings representing labels associated to the steps of a checkpointable or partitionable job e.g.: JobSteps = { d0, d1, gmos }; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 45

CurrentStep (mandatory for checkpointable or partitionable jobs) CurrentStep attribute used to indicate the initial step when submitting a checkpointable or partitionable job. CurrentStep = 2; Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 46

References & Hands-on JDL (sottomissione via WMS Netrwork Server) https://edms.cern.ch/file/555796/1/egee-jra1-tec-555796-jdl- Attributes-v0-7.doc https://grid.ct.infn.it/twiki/bin/view/gilda/simplejobs ubmissionwithrb https://grid.ct.infn.it/twiki/bin/view/gilda/moreonjdl -withedgcommands Remember to initialize the proxy before to interact with the WMS! Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 47

Thank you for your attention!!!! Tutorial per utenti e sviluppo di applicazioni in Grid 16-20 July -Catania 48