GRID workload management system and CMS fall production. Massimo Sgaravatto INFN Padova
1 GRID workload management system and CMS fall production Massimo Sgaravatto INFN Padova
2 What do we want to implement (simplified design)
- Master chooses in which resources the jobs must be submitted
- Condor-G able to provide reliability
- Use of Condor tools for job monitoring, logging, ...
[Diagram labels: Submit jobs (using Class-Ads); Master; condor_submit ( Universe); Condor-G; Resource Discovery; Grid Information Service (GIS): information on characteristics and status of local resources; globusrun as uniform interface to different local resource management systems; Local Resource Management Systems (CONDOR, LSF); Farms at Site1, Site2, Site3]
3 What can be implemented now
- Condor-G able to provide reliability
- Use of Condor tools for job monitoring, logging, ...
- Grid Information Service (GIS): not very useful in this model
[Diagram labels: Submit jobs; condor_submit ( Universe); Condor-G; Grid Information Service (GIS): information on characteristics and status of local resources; globusrun as uniform interface to different local resource management systems; Local Resource Management Systems (CONDOR, LSF); Farms at Site1, Site2, Site3]
4 Status
- Tests on basic capabilities and functionalities have been performed
- Problems with scalability and fault tolerance found
- CMS production is a useful exercise to test everything with real applications and real environments
5 CMS production
- Application: Pythia + Cmsim
  - Traditional applications
- Overview
  - Job management (submission, monitoring) from a single machine using Condor tools
  - User must explicitly define in which resource (which farm) the jobs must be submitted
  - The applications and the input files must be stored in the file system of the executing machine
  - The output files will be created in the file system of the executing machine
  - We can try to have just the standard output/error files (useful to check the status of the production) created in the submitting machine, using bypass and/or GASS
  - CMS wants to test bypass as a second step
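The requirement that the user explicitly names the target farm can be illustrated with a small generator for Condor-G submit description files. This is only a sketch under stated assumptions: the slides do not show a submit file, so the `universe = globus` and `globusscheduler` commands reflect Condor-G usage of that period as an assumption, and the host name, paths, and the `FarmJob` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FarmJob:
    """One production job bound to an explicitly chosen farm (hypothetical helper)."""
    gatekeeper: str   # hypothetical host name of the farm's visible node
    jobmanager: str   # e.g. "jobmanager-lsf" or "jobmanager-condor"
    executable: str   # path on the executing machine's file system
    tag: str          # used to name the stdout/stderr/log files


def render_submit_file(job: FarmJob) -> str:
    """Return the text of a submit description file for a single job."""
    lines = [
        # Assumption: the universe left blank in the slides is the Globus one.
        "universe = globus",
        f"globusscheduler = {job.gatekeeper}/{job.jobmanager}",
        f"executable = {job.executable}",
        # The application already lives on the executing machine.
        "transfer_executable = false",
        # Only stdout/stderr and the job log come back to the submitting machine.
        f"output = {job.tag}.out",
        f"error = {job.tag}.err",
        f"log = {job.tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing that file to condor_submit would queue one job on the chosen farm; choosing a different farm means changing only the `gatekeeper` and `jobmanager` fields.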
6 Bypass vs. GASS
- Bypass
  - Written by Douglas Thain (Condor team)
  - Redirection of standard input/output/error of a program to a remote machine while the program is running
  - Can be used for dynamically linked programs
  - Successfully tested with Pythia
  - Use of Security Infrastructure
- GASS
  - Possibility to copy the input file to the remote machine before the execution, and get the output file back after the execution (otherwise it is necessary to modify the source code)
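The GASS-style pattern described above (copy the input over before execution, bring the output back afterwards, leave the program itself unmodified) can be sketched as follows. This is not GASS itself: local directories stand in for the submitting and executing machines, and the function and file names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_with_staging(
    input_file: Path,
    work_dir: Path,
    result_dir: Path,
    job: Callable[[Path, Path], None],
) -> Path:
    """Stage the input in, run the job, stage the output back out."""
    work_dir.mkdir(parents=True, exist_ok=True)
    result_dir.mkdir(parents=True, exist_ok=True)

    # Stage-in: copy the input to the "executing machine" directory.
    staged_input = work_dir / input_file.name
    shutil.copy(input_file, staged_input)

    # Execution: the job only ever sees local paths, so it needs no changes.
    produced = work_dir / "output.dat"
    job(staged_input, produced)

    # Stage-out: copy the result back to the "submitting machine" directory.
    returned = result_dir / produced.name
    shutil.copy(produced, returned)
    return returned
```

The point of the design is that the job callable is unaware of any remote transfer, which mirrors the slide's remark that the alternative would be modifying the source code.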
7 What is necessary
- Local farms with shared file system between the various nodes
  - Done using CMS installation toolkit
  - Installation and support up to CMS/local administrators
- Installation of CMS environment on these farms
  - Done using CMS installation toolkit
  - Support up to CMS
8 What is necessary
- Local resource management system to manage the local farm
  - LSF
    - Installation and support up to CMS/local administrators
    - We should define in a common way how to configure the queue(s) where the jobs run
  - Local Condor pool
    - Installation and configuration (for dedicated machines) using CMS toolkit
    - Support???
  - PBS
    - Are there sites where PBS will be used???
    - Tests on Condor-G PBS not performed yet
  - Fork
    - Strongly discouraged (even for a single machine)
    - Necessary to install on each machine
    - Job queuing up to the production manager
9 What is necessary
- One installation per farm (on a visible node)
- Use of personal certificates and host certificates signed by INFN CA
  - User certificates signed by CA are accepted as well
  - By default it is not possible to use resources outside INFN using personal certificates signed by INFN CA
    - Workaround 1: Users also have personal certificates signed by CA
    - Workaround 2: Small modification in the configuration of these resources outside INFN in order to accept our certificates too
- Installation
  - Installation done by CMS/local administrators/wp1 member (if present) using distribution and procedures provided by INFN GRID release team
  - In case of problems: [email protected]
10 What is necessary
- Condor-G
  - Just one installation, used by the production manager (Ivano Lippi?)
  - Installation and maintenance: Massimo Sgaravatto???
- Scripts to run CMS production using this GRID environment
  - Up to CMS
- Tools to monitor production
  - condor_q
  - Condor Job Viewer (Java GUI)
- Run the production
  - Up to production manager
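As a rough illustration of monitoring production from the submitting machine, the sketch below summarizes job states from text such as that printed by `condor_q -format "%d\n" JobStatus`. How the production manager actually used condor_q is not stated in the slides, so the command line, the helper names, and the sample input are assumptions; the numeric codes are the usual values of the JobStatus ClassAd attribute.

```python
from collections import Counter
from typing import Dict, List

# Usual meanings of the JobStatus attribute in Condor job ClassAds.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}


def parse_status_lines(text: str) -> List[int]:
    """Parse one integer status code per whitespace-separated token."""
    return [int(token) for token in text.split()]


def summarize(statuses: List[int]) -> Dict[str, int]:
    """Count jobs per state; codes outside the table are reported as Unknown."""
    counts = Counter(JOB_STATUS.get(code, "Unknown") for code in statuses)
    return dict(counts)
```

A production script could run the condor_q command periodically, feed its output to `parse_status_lines`, and alert when the Held count becomes non-zero.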
11 Open questions
- Some items/actors missing???
- When???
- Relations with other activities???
- Data Management (GDMP, ...)???
