1 elab and the FVG grid
Stefano Cozzini, CNR-INFM DEMOCRITOS and SISSA elab, Trieste
2 Agenda/Aims
- Present elab and its computational infrastructure
- GRID-FVG: structure, basic requirements, technical choices, open questions
- Discussion
3 what is elab?
4 elab Mission
- Maintain a cutting-edge computational infrastructure for research groups at SISSA and DEMOCRITOS
- Plan and set up advanced solutions for scientific and technical computing:
  - Software engineering
  - Parallel computing (platforms & algorithms)
  - Cluster computing (high-performance and distributed)
  - Grid computing
- Enabling e-science at SISSA/DEMOCRITOS!
5 elab main partner: DEMOCRITOS
- CNR/INFM National Simulation Center for condensed matter
- Research areas:
  - Computational nano-science and materials engineering
  - Computational biochemistry, biotechnology, and drug design
  - Hard and soft condensed matter: superconductivity and disordered systems
  - IT for computer simulations (software engineering; parallel, cluster, and grid computing)
6 elab computational environment
7 elab numbers
- established in ...; people involved: ... (3 permanent staff)
- ~200 servers managed (~1000 CPUs), from HPC and GRID
- more than 2 million hours computed so far (November 2008: ~300k hours)
- budget: ~1 million Euro
- strong collaboration with Trieste scientific institutions: ELETTRA/INAF/ICTP
8 elab HPC resources
- HG1, the main cluster; it includes hardware from:
  - the CUBENET project
  - the GRID-FVG project
  - funds from SISSA and DEMOCRITOS researchers
- built using v0.1 of EPICO (WIP), the Enhanced Package for Installation and Configuration: a suite of software tools to install and manage HPC and Grid infrastructure, developed by elab
- heterogeneous platform
9 elab computational infrastructure
10 elab training events
- Advanced School in High Performance Computing Tools for e-science: joint DEMOCRITOS/INFM-eLab/SISSA-ICTP activity, 5-16 March 2007 (Trieste, Italy); ~100 participants (~50 from the Trieste area)
- Advanced School in High Performance and Grid Computing: joint DEMOCRITOS/INFM-eLab/SISSA-ICTP activity, 3-14 November 2008 (Trieste, Italy)
11 HG1 hardware: computational nodes
- ZEBRA partition: 160 cores
  - 20 Supermicro machines; 2 quad-core Intel CPUs at 2.50 GHz and 16 GB RAM each; 20 Gb InfiniBand card
- BLADE partition: 352 cores
  - 40 (c) + 48 (m) Eurotech blade nodes; 2 dual-core Opteron CPUs and ... GB RAM each; 10 Gb InfiniBand card; diskless
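(As a check on the totals: ZEBRA, 20 machines × 2 CPUs × 4 cores = 160 cores; BLADE, 88 blades × 2 CPUs × 2 cores = 352 cores.)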
13 GRID infrastructure at elab
- A small grid site within the EGEE infrastructure, active in several VOs:
  - ce-01.grid.sissa.it
  - se-01.grid.sissa.it
  - lfc.grid.sissa.it (implemented a JIT security mechanism for the StoRM SE)
- The GRID-FVG infrastructure, based on HPC: the hg core HPC platform located in Amaro (Mercurio headquarters)
- The GridSeed VM tool
14 The GRID-FVG project
- SISSA/elab is the scientific partner of Eurotech in delivering a grid HPC system to Mercurio FVG, to be offered to industry and local public bodies
- Initial hardware resources of the project:
  - at SISSA: 200 cores + services, integrated into hg1
  - at Mercurio FVG: 200 cores + services
15 Requirements of GRID-FVG
- Requirement 1: HPC resources should be integrated seamlessly into the GRID environment.
- Requirement 2: HPC resources should be used and exploited as HPC resources (heterogeneity as an added value, not as a problem).
- Requirement 3: computational resources should be available through the grid infrastructure and as local resources as well.
16 Technical deployment (requirement 1)
- gLite adoption: central grid services at SISSA/eLab
- HPC systems act as LCG Computing Elements
- HPC systems installed using standard elab procedures: NO gLite installation procedures on the HPC infrastructure => the ENEA SPAGO solution (see later)
- Status: satisfactory
- Future development: integration with CREAM
17 Technical deployment (requirement 2)
- FULL MPI support via GRID job submission (see the JDL sketch below)
- MPI-start approach:
  - tricks on the WN to load the appropriate module
  - appropriate tags for the information system
- Status: unsatisfactory:
  - lack of features in JDL
  - lack of info in the GLUE 2.0 schema
- Future development: we are part of the EGEE MPI working group
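For concreteness, a minimal sketch of the MPI-start submission pattern from a User Interface; the file names (mpi-job.jdl, mpi-start-wrapper.sh, my_mpi_app) are placeholders, and the tags follow those the site publishes (see slide 23):

  $ cat mpi-job.jdl
  JobType       = "Normal";
  CpuNumber     = 8;
  Executable    = "mpi-start-wrapper.sh";
  Arguments     = "my_mpi_app OPENMPI-1.3";
  InputSandbox  = {"mpi-start-wrapper.sh", "my_mpi_app"};
  StdOutput     = "std.out";
  StdError      = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  Requirements  = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                  && Member("OPENMPI-1.3", other.GlueHostApplicationSoftwareRunTimeEnvironment);
  $ glite-wms-job-submit -a mpi-job.jdl

Note how the desired MPI flavour has to be encoded through Requirements on runtime-environment tags: this is exactly the "lack of features in JDL" lamented above.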
19 Technical deployment (requirement 3)
- Not yet completed:
  - local users have a separate management
  - dedicated parallel filesystems
- Issues:
  - authentication/authorization (should be the same): not really complicated, I guess
  - resource management: easy for CPUs, the LRMS does it for you if appropriately configured (see the sketch below)
  - data management: complicated, under study...
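A sketch of what "appropriately configured" could mean, assuming a Torque/PBS LRMS; the queue and group names are hypothetical:

  $ qmgr -c "create queue shared queue_type=execution"
  $ qmgr -c "set queue shared acl_group_enable = True"
  $ qmgr -c "set queue shared acl_groups = gridusers"    # pool-mapped grid accounts
  $ qmgr -c "set queue shared acl_groups += localusers"  # local researchers
  $ qmgr -c "set queue shared enabled = True"
  $ qmgr -c "set queue shared started = True"

With a single shared queue the batch server arbitrates between grid and local jobs; fair-share policies can then be layered on top by the scheduler.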
20 gLite standard architecture
[diagram: VOMS server, Resource Broker and MyProxy server on the WAN; UI for grid users; CE, SE and batch server in the DMZ; WNs on the LAN]
21 gLite / SPAGO architecture
[diagram: central gLite services (VOMS server, MyProxy server, Resource Broker) on the WAN; site gLite services (UI, CE, SE) in the DMZ; grid jobs and local jobs both go through the PBS/LSF batch server to local master nodes; the local cluster facilities on the LAN run on heterogeneous HW/SW with no gLite middleware; local users submit to the local resources directly]
22 SPAGO implementation
[diagram: grid users are MAPPED to shared homes (/home/grid001, /home/grid, /opt/something) exported via NFS, AFS, ...; only the computing element runs gLite; the cluster nodes and the local pbs/lsf batch server run no gLite; one cluster node acts as PROXY WORKER NODE]
- Every node delegates all grid commands concerning transfers to/from the grid to the proxy worker node (a minimal sketch of that delegation follows).
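The sketch assumes an ssh transport and the host name proxy-wn.grid.sissa.it, both of which are assumptions; SPAGO's actual mechanism may differ:

  #!/bin/sh
  # globus-url-copy stand-in installed on the gLite-free cluster nodes:
  # it forwards the whole transfer to the proxy worker node, which runs
  # the middleware and has the required connectivity.
  PROXY_WN=proxy-wn.grid.sissa.it    # hypothetical host name
  exec ssh "$PROXY_WN" globus-url-copy "$@"

Since the homes are shared across the cluster, the proxy node sees the same files the job wrote, so only the command needs to travel.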
23 SPAGO MPI support
- MPI-start is the interface that hides the complexity of the local cluster:
  - different MPI vendors
  - different batch servers
- MPI-start is installed on the CE; NOTHING is installed on the cluster computing nodes
- The cluster nodes must be able:
  - to load the same environment as a standard gLite WN (MPI_* variables)
  - to execute the MPI-start scripts (fake mpirun, openmpi.mpi, ...)
- When the cluster supports module loading, a mechanism should map the MPI tags published in the CE_RUNTIMEENV into a call to a specific module (a sketch follows below):

  $ lcg-info --list-ce --attrs Tag --query 'CE=grid2.mercuriofvg.it:2119/jobmanager-lcgpbs-mercurio'
  ... MPI-START MPI_SHARED_HOME OPENMPI-1.3 MPICH2-1.2p1 ...

  module load openmpi/1.3-intel/...
  export PATH=...
  export LD_LIBRARY_PATH=...
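A hedged sketch of that tag-to-module mapping on a node that supports environment modules; the local module names are assumptions:

  #!/bin/sh
  # Map a published MPI tag (passed as $1, e.g. by an MPI-start hook)
  # to a local environment module; illustrative only.
  . /etc/profile.d/modules.sh        # initialize environment modules
  case "$1" in
    OPENMPI-1.3)  module load openmpi/1.3-intel ;;   # assumed module name
    MPICH2-1.2p1) module load mpich2/1.2p1-intel ;;  # assumed module name
    *) echo "no module mapped for tag $1" >&2; exit 1 ;;
  esac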
24 SPAGO MPI test
- Intel MPI Benchmarks on the computing nodes, in a submission FROM the User Interface:
  - OPENMPI-1.3 (GNU wrapper) with InfiniBand support
  - MVAPICH2 (Intel wrapper)
  - LAM (Intel wrapper) on TCP
- When accessing non-standard gLite platforms, compilation of the source code can't be targeted
25 IMB results
[plot: Intel MPI Benchmarks results]
27 GridSeed as of today: central services
- NOT gLite-dependent
- CA: Certification Authority service + DNS + NTP + ...
28 User Interfaces in GridSeed
- UI-1: standard UI, based on the gLite UI
- UI-2: clean Linux box + the MILU 3.1 package (MILU: Miramare Lightweight User interface)
A typical session looks the same from either UI; see the sketch below.
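A sketch of that session; the VO name myvo and the file job.jdl are placeholders:

  $ voms-proxy-init --voms myvo           # create a VOMS proxy
  $ glite-wms-job-list-match -a job.jdl   # list the CEs matching the job
  $ glite-wms-job-submit -a job.jdl       # submit through the WMS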
29 GridSeed typical site-grids
- CE-?: Computing Element (LFC-CE + TORQUE)
- CExWN1, CExWN2: worker nodes, 2 x CPU each
- SE-?: Storage Element (StoRM SRM v2.2 server)
30 Tutorials/exercises: elab.sissa.it/gridseed
- Basic gLite tutorials:
  - Getting Started with GridSeed and the gLite middleware
  - Basic Data Management (see the lcg-utils sketch below)
- Advanced gLite user tutorials (all the examples for this section are available in /opt/examples on the GridSeed UI):
  - Advanced job submission mechanisms
  - MPI job submission
- Specific eLab tool user tutorials, specific to the elab and EU-IndiaGrid tools developed to help grid users run their applications:
  - Automatic thread optimization on the GRID using GOTO Blas and Reser
  - Run the Quantum ESPRESSO pw.x code using SMP resources on the GRID
  - A simple client-server Python tool for the GRID
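As an illustration of the basic data management exercise, the classic lcg-utils round trip against the site SE; the VO name myvo and the LFN path are placeholders:

  $ lcg-cr --vo myvo -d se-01.grid.sissa.it \
      -l lfn:/grid/myvo/user/sample.dat file://$PWD/sample.dat       # copy & register
  $ lcg-cp --vo myvo lfn:/grid/myvo/user/sample.dat file://$PWD/sample.copy   # retrieve
  $ lcg-del --vo myvo -a lfn:/grid/myvo/user/sample.dat              # delete all replicas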
31 Future development
- Automatic/semi-automatic updating procedure for the gLite software
- Compacting/reducing the size of the central services: more services on one VM; all central services on a single DVD
- Client/server mechanism to automatically add more grid sites to a basic configuration
- More grid services: DB elements, respect of the gLite software, and so on...