John Walsh and Brian Coghlan, Grid Ireland / e-INIS / EGEE, John.Walsh@cs.tcd.ie


Grid Computing and EGEE
John Walsh and Brian Coghlan, Grid Ireland / e-INIS / EGEE
John.Walsh@cs.tcd.ie
www.eu-egee.org
EGEE III / Grid Ireland. EGEE and gLite are registered trademarks.

Background
- Grid is an evolving and maturing architecture
- Based on several well-established computer science domains
  - Distributed Computing
  - Group and Role Management
  - Distributed Data Management
  - Public Key Encryption

What is a Grid Anyway?
- "Grid" is a highly overloaded and often incorrectly used term, applied to
  - Peer-to-peer networks
  - Compute cycle donation systems (BLAST, SETI, etc.)
  - Distributed clusters over VPN
- Basic requirements
  - Facilitate virtual organisations across domain boundaries
  - Secure and trusted infrastructure
  - Single sign-on login
  - Resource sharing
    - Storage
    - Compute cycles
    - New media access (grid filesystems, GridSite Wiki)
  - Coordinated, long-term infrastructure
  - Policies for joining/leaving a Grid

Virtual Organisation Motivations
- Academic work often involves large and small collaborations
  - Some local or national
  - Others across international borders
- This gives rise to several challenges:
  - How do I share my data and resources with someone who doesn't work locally?
  - I don't personally know all the people in the collaboration: are they bona fide users?
  - How do I ensure that I can limit access to just a well-defined set of people and resources?
  - How do I do this at large scale, across national boundaries?

Non-Grid Solutions
- Basic solutions address some of these issues, e.g.:
  - SSH for login access to remote batch systems
  - Web with simple (but secure) password protection
  - FTP with password protection, scp, ...
- However, these are not scalable
- OpenID aims at single sign-on for users

Grid Solutions
Grids address these problems by providing:
- Internationally recognised identity credentials
  - Renewed annually
  - Must be revoked when user leaves institute
  - Ability to revoke cert quickly (misuse, security, ...)
- Support for virtual organisations (Software + Service + Policies)
  - Software (middleware) to support controlled access to resources
  - Extra support for roles/groups within the VO
- Secure access to larger pool of resources available to users
  - Subject to site policy
  - Storage + CPU (batch systems)
- Single sign-on support
  - Login once, use everywhere
  - Limited duration
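The credential lifecycle on this slide (annual renewal, revocation, limited-duration single sign-on) can be illustrated with a minimal Python sketch. This is a toy model, not the middleware's API: the class names, the example subject string, the serial number and the 12-hour proxy lifetime are all assumptions chosen for illustration, and no real cryptography is involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Certificate:
    subject: str          # hypothetical distinguished name of the user
    serial: int           # hypothetical serial number, used for revocation
    not_before: datetime
    not_after: datetime   # long-lived credential: renewed annually


@dataclass(frozen=True)
class Proxy:
    certificate: Certificate
    not_after: datetime   # short-lived credential: "limited duration"


def issue_certificate(subject, serial, now):
    """Issue a certificate valid for one year from `now`."""
    return Certificate(subject, serial, now, now + timedelta(days=365))


def create_proxy(cert, now, lifetime=timedelta(hours=12)):
    """Derive a short-lived proxy; it can never outlive its certificate."""
    return Proxy(cert, min(now + lifetime, cert.not_after))


def is_usable(proxy, now, revoked_serials):
    """A proxy is usable only if its certificate is valid and not revoked."""
    cert = proxy.certificate
    if cert.serial in revoked_serials:
        return False
    if not (cert.not_before <= now <= cert.not_after):
        return False
    return now <= proxy.not_after


start = datetime(2009, 1, 1, 9, 0)
cert = issue_certificate("/C=IE/O=Example/CN=Example User", 42, start)
proxy = create_proxy(cert, start)
```

The user signs on once (creating the proxy) and every service then only has to call a check like `is_usable`; revoking the certificate's serial number invalidates all proxies derived from it.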

Certificate Authorities
- National credential services
  - Known as a CA (Certificate Authority)
  - See http://www.eugridpma.org
  - Often compared to a Passport Office
- User applies for a grid identity using a web browser
  - Certificate request
- User is then required to:
  - Present themselves to a local Registration Authority (RA)
  - Prove that they are who they say they are (formal ID of Institute)
- After approval, the user picks up their certificate using the same web browser
- Uses established Public Key Infrastructure (PKI) and the X.509 standard

CA in Ireland
- Run by the Grid Ireland Operations Centre
  - Dr Brian Coghlan, Dr David O'Callaghan (TCD)
- Officially recognised by GridPMA
- Registration Authorities
  - Cork (Prof John Morrison, Brian Clayton, UCC)
  - Galway (Dr Andy Shearer, Dr Bruno Voisin, NUI Galway)
- To apply for a certificate, see: http://www.grid.ie/getting-a-cert.html

The International Dimension
- Many international Grids
  - EGEE (EU 27, spans 50+ countries)
  - OSG, TeraGrid (USA)
  - NAREGI (Japan)
  - ARC (Nordic countries)
  - NGS (UK)
  - DEISA II (leading national supercomputing centres in EU)
  - PRACE (peta-scale computing; ICHEC is Irish partner)
  - Int.EU.Grid, the interactive grid
    - TCD is a partner (Stuart Kenny, Active Security)
    - Finished in April
- Grid interoperations/interoperability demo at SC07
- These are mostly based around the Globus Toolkit
- Grid interoperability is work in progress

EGEE III
- Well-established infrastructure
  - EDG (2001/04), EGEE I (2004/06), EGEE II (2006/08)
  - EGEE III (May 2008 to 2010), EGI from 2010
- EU-supported grid with mandate to:
  - Run a European-wide production grid infrastructure
    - Provides compute and data storage for e-Science
    - > 600 people involved in day-to-day management
    - Running 24/7/365, >150k jobs per day
    - 265+ sites, 55 countries, 11 federations
  - Build, test, certify and deploy quality grid middleware solutions
    - gLite middleware stack, developed by the project
  - Interface with other well-established grids
  - Encourage grid awareness and build new communities
  - Encourage new application domains on the grid
  - Establish and encourage grid technology in the business world

The EGEE Grid [figure]

Grid Ireland and the EGEE
- Grid Ireland at TCD is the contractual partner with EGEE
- Responsible for a distributed Regional Operations Centre with UK partners (STFC, NGS + others)
  - Forms UK/I Federation
- Operations team integrated into European Operations
  - Problem resolution system (GGUS, www.ggus.org)
  - Resolve problems at sites
  - Provide solutions and add to knowledge base
  - Weekly reporting on operational issues
- 2 sites in Ireland (DIAS, TCD), 21+ in UK
- NGI for Ireland in EGI

EGEE Grid Architecture and Services
- Administrative
  - Global Grid User Support
  - Grid Operations Centre DB
  - Service Availability Monitoring
  - Core Infrastructure Centre
  - Certificate Authorities
  - Grid Training
- Access Services
  - User Interface
  - Grid logon
  - Grid Portals
- Core Services
  - Information Services: top-level BDII, R-GMA Registry
  - Authentication and Authorisation: VOMS, MyProxy, CRL service
  - Workload Management: WMS/LB
  - Data Management: LFC, FTS, AMGA
- Site Services
  - Job Management: GateKeeper/batch
  - Storage Management
  - Information Management

Abstract Architecture [figure]

Grid Site Internals
A Grid Site consists of:
- 1 or more Gatekeepers linked to the local batch system
- Batch system (including worker nodes)
- A Grid-enabled Storage Service
- A set of host information systems and a top-level site system
  - Each information system collects data about the status of the services running on the host
  - The site BDII collects and aggregates information about the available services
- R-GMA monitoring service
- Optional (core) services
  - Workload Management/Logging and Bookkeeping
  - MyProxy
  - File Transfer Service
  - ...
- UI (allowing a local user to sign on to the Grid)

Core Services
- Form the backbone of the Grid
- Provide the necessary services to allow a user at Site A to send a job or access a data resource at another site
- Maintained at one or more sites
- Most important services
  - VOMS (Authentication and Authorisation)
  - Workload Management Service (Job Management)
  - Information Management
  - Data Management
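To make the authorisation role of VOMS more concrete, here is a minimal Python sketch of a group/role check. It assumes attribute strings in the `/vo/group/Role=name` style used by VOMS; the VO name `cosmo`, the group and role names, and the rule that membership of a subgroup also satisfies a check on its parent group are hypothetical choices for illustration, not the behaviour of any particular service.

```python
def parse_fqan(fqan):
    """Split an attribute like '/vo/group/Role=name' into (group_path, role)."""
    groups = []
    role = None
    for part in fqan.split("/"):
        if not part:
            continue
        if part.startswith("Role="):
            role = part[len("Role="):]
        else:
            groups.append(part)
    return "/" + "/".join(groups), role


def is_authorised(user_fqans, required_group, required_role=None):
    """True if any attribute places the user in the group (or a subgroup)
    and, when a role is required, carries exactly that role."""
    for fqan in user_fqans:
        group, role = parse_fqan(fqan)
        in_group = (group == required_group
                    or group.startswith(required_group + "/"))
        if in_group and (required_role is None or role == required_role):
            return True
    return False


# Hypothetical attributes presented by one user.
user_attributes = ["/cosmo/simulation/Role=production", "/cosmo"]
```

A site would apply a check of this kind after authenticating the user's certificate, so that access decisions depend on VO membership rather than on knowing each user personally.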

Information System Basics
- In order for a grid to function, the current status of resources needs to be known
  - e.g. job matchmaking will use this info when deciding where to send a job
- The amount of information gathered is very large
- This is handled hierarchically
  - Top-level BDII queries site BDIIs from a list of known sites
  - This information is amalgamated from lower-level info systems
- Each data entry has a unique Distinguished Name
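The hierarchical collection described above can be sketched in a few lines of Python. This is only an illustration of the idea: a real BDII is an LDAP service, and the Distinguished Name layout, site names and attributes below are invented for the example.

```python
def site_bdii(site_name, service_reports):
    """Aggregate per-service status reports at one site.

    Each entry is keyed by a Distinguished Name so that it stays unique
    once merged with entries from other sites (DN layout is hypothetical).
    """
    entries = {}
    for service_id, attributes in service_reports.items():
        dn = f"service={service_id},site={site_name},o=grid"
        entries[dn] = dict(attributes)
    return entries


def top_level_bdii(known_sites):
    """Query every known site and amalgamate the results into one view."""
    merged = {}
    for site_name, reports in known_sites.items():
        for dn, attributes in site_bdii(site_name, reports).items():
            if dn in merged:
                raise ValueError(f"duplicate DN: {dn}")
            merged[dn] = attributes
    return merged


# Hypothetical status reports from two sites.
known_sites = {
    "site-a": {"ce01": {"free_cpus": 12}, "se01": {"free_gb": 500}},
    "site-b": {"ce01": {"free_cpus": 0}},
}
grid_view = top_level_bdii(known_sites)
```

Both sites run a service called `ce01`, but because the site name is part of the DN the two entries do not collide in the merged view.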

Site BDII
- Gathers information from a list of known supported services at the site, e.g.
  - Compute Element
  - Storage Element
  - Top-level services supported by the site managers
    - Workload management system (WMS/LB, optional)
    - Credential renewal (MyProxy, optional)
    - File catalog (LFC, optional)
    - File transfer service (FTS, optional)
- Top-level BDII will pull this status information and republish it at regular intervals
  - Quasi-realtime status

Information Hierarchy [figure]

Managing Jobs
- Grid jobs are described using JDL
- A user can
  - Match a job (as described by the JDL) against resources
    - 70+ primitives for matching (e.g. against available #CPU, storage, site VO access restrictions, etc.)
  - Submit a job/jobs via WMS/LB
    - Submitted job returns a unique ID to the user
  - Query status of job
  - Retrieve data from finished jobs
  - Cancel submitted jobs
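The match / submit / query / cancel cycle can be sketched as a toy workload manager in Python. Everything here is an assumption for illustration: the job and resource fields stand in for JDL requirements and published resource attributes, the state names are a simplified subset, and the class is not the WMS interface.

```python
import uuid


def match(job, resources):
    """Return names of resources satisfying the job, most free CPUs first."""
    candidates = [
        r for r in resources
        if job["vo"] in r["supported_vos"]
        and r["free_cpus"] >= job["min_cpus"]
        and r["free_storage_gb"] >= job["min_storage_gb"]
    ]
    candidates.sort(key=lambda r: r["free_cpus"], reverse=True)
    return [r["name"] for r in candidates]


class ToyWorkloadManager:
    """Tracks submitted jobs by a unique ID (simplified states only)."""

    def __init__(self, resources):
        self.resources = resources
        self.jobs = {}

    def submit(self, job):
        matches = match(job, self.resources)
        if not matches:
            raise LookupError("no resource matches the job requirements")
        job_id = str(uuid.uuid4())  # unique ID returned to the user
        self.jobs[job_id] = {"state": "Submitted", "resource": matches[0]}
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["state"]

    def cancel(self, job_id):
        self.jobs[job_id]["state"] = "Cancelled"


# Hypothetical resources and job description.
resources = [
    {"name": "ce.site-a", "supported_vos": {"cosmo"}, "free_cpus": 4, "free_storage_gb": 100},
    {"name": "ce.site-b", "supported_vos": {"cosmo", "biomed"}, "free_cpus": 32, "free_storage_gb": 50},
    {"name": "ce.site-c", "supported_vos": {"biomed"}, "free_cpus": 64, "free_storage_gb": 900},
]
job = {"vo": "cosmo", "min_cpus": 2, "min_storage_gb": 10}
```

Matching is separate from submission, so a user can inspect the candidate resources before committing a job.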

Job Workflow [figure]

What type of jobs can run on the Grid?
- Grid is not a replacement for HPC
- Grid is best suited to high-throughput computing
  - Short/medium-lifetime jobs
    - Support for long-running jobs available
  - Relatively independent tasks
    - Monte Carlo
    - Parameter sweeps
- Grid can handle
  - Large-volume batches of single jobs
  - Complex workflows
  - MPI jobs
    - MPI support led by Grid Ireland & Int.EU.Grid
    - Cross-site MPI is not yet well supported, but is getting better
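Why Monte Carlo work suits high-throughput computing can be seen in a small Python sketch: the calculation splits into tasks that share nothing but their parameters, so each could run as a separate grid job. The example (estimating pi) and the function names are chosen purely for illustration; here the tasks simply run one after another in a loop.

```python
import random


def monte_carlo_task(seed, samples):
    """One independent task: count random points inside the unit quarter circle.

    Each task owns its random generator, so tasks need no communication
    and can run in any order, on any worker node.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks, samples_per_task):
    """Combine the per-task results; on a grid this step runs after
    the outputs of all finished jobs have been retrieved."""
    hits = sum(monte_carlo_task(seed, samples_per_task)
               for seed in range(num_tasks))
    return 4.0 * hits / (num_tasks * samples_per_task)


pi_estimate = estimate_pi(num_tasks=20, samples_per_task=5000)
```

A tightly coupled MPI job, by contrast, needs its processes to exchange data while running, which is why it is harder to spread across sites.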

Infrastructure Statistics
- Number of sites steadily increasing (Jan 2009 = 265)
- CPU + storage count increasing (very dynamic)
- More than 49 million jobs have been run in the last year
  - Continuous increase observed
  - Doubling the total for the first year (18.9 million jobs in 12 months)
- Number of jobs run by VOs other than LHC or Ops also increasing, with almost 20,000 jobs each day around October 2007
- More and more VOs make significant usage of the infrastructure
  - As of March 2008 there are 115 active VOs, with more than 40 having used over 1 CPU-year per week over extended periods during the first nine months of 2007
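As a rough check on the job figures quoted on this slide, the yearly totals can be turned into a daily average and a growth factor. The inputs are the slide's own numbers; since the slide says "more than 49 million", both results should be read as lower bounds.

```python
jobs_last_year = 49_000_000    # "more than 49 million jobs" in the last year
jobs_first_year = 18_900_000   # "18.9 million jobs for 12 months"

jobs_per_day = jobs_last_year / 365
growth_factor = jobs_last_year / jobs_first_year

print(round(jobs_per_day))      # 134247
print(round(growth_factor, 2))  # 2.59
```

An average of roughly 134,000 jobs per day over the year is consistent with the ">150k jobs per day" quoted on the EGEE III slide if the daily rate was still rising at the end of the period.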

EGEE Site Growth [figure: EGEE site growth to October 2007]
- May 2008: production sites = 255

No. of Jobs on EGEE Infrastructure [figure]
- Job data is generated over a period of days
- The drop-off is due to a lack of accounting data during that period

CPU/Cores [figure]
- May 2008: approx. 68,000 cores/CPUs

Grid Communities and Applications
What communities could I get involved with?
- LHC HEP community is well established
  - ALICE, ATLAS, CMS, LHCb
  - Smaller HEP VOs (BaBar, PhenoGrid, ...)
- Astronomy and Astrophysics
  - CosmoGrid
  - Planck: processing Planck satellite data
  - MAGIC: origin of VHE gamma rays
- Earth Science
  - DEGREE: seismic sensor network
  - EGEODE: seismic data processing
  - ESR: Earth observation, climate, solid earth physics
- FUSION: nuclear fusion applications
- BIOMED: medical imaging, bioinformatics, drug discovery
- Finance, Infrastructure, Geophysics, Comp Chem, ...

Top 10 Accounting [figure: top 10 VO accounting for the last 12 months]

No such thing as a free lunch
- Virtual Organisations
  - Provide resources
  - Provide application developers to Gridify software
- EGEE & Grid Ireland do not have infinite manpower
- EGEE doesn't provide resources, the sites do
- Resource sharing principle
  - VOs should contribute as much as they use

Getting Involved
- Web site: http://www.grid.ie
- All e-mail enquiries to grid-ireland-help@cs.tcd.ie
  - Request Grid Ireland training and grid porting help
- Check to see if an existing VO is appropriate to you
  - http://cic.gridops.org/index.php?section=home&page=volist
  - Contact VO manager (see VO ID card)
- Is a new VO needed?
- Enable your site/batch system on the Grid

Future
- e-INIS awarded 12 million under PRTLI 4
  - DIAS (lead), UCC, HEAnet, NUI Galway, NUIM, ICHEC, TCD, Grid Ireland
  - Building an integrated e-Infrastructure in Ireland
    - HPC + DATA + GRID = e-Infrastructure
  - Building on success of CosmoGrid
- Resource improvements (TCD)
  - Current WN pool increased to 96 Quad Cores (768 cores)
    - Made possible under grant from SFI
  - Grid Ireland Central Services will be upgraded
    - Facilitated by e-INIS
- User Controlled Light Paths (HEAnet)
  - Allowing sites to have dedicated connections over the fibre optic network

Questions?

Grid Ireland
- Facilitate virtual organisations in Ireland (and abroad)
  - Provide a point of presence at institutes
    - Machines + middleware providing grid access = Gateways
  - Manage grid infrastructure
  - Encourage institutes and communities to provide/share resources
- Manage grid operations in Ireland
- Basic user training
- Participate in the international grid communities
  - Working groups and forums
  - Help define standards
- Develop innovative grid middleware solutions
- 18 sites + national services centrally managed

Grid Gateway [figure]
- Diagram components: Network switch, Gridfw (firewall), Gridinstall (Quattor), Gridgate (CE), Gridstore (SE), Gridmon (test WN), Gridui (UI), Gridnm (NM), UPS
- Grid Gateway:
  - All virtual machines
  - All run on 1 physical machine
  - Remotely managed by OpsCentre
- Cluster(s):
  - Managed by local admins
  - OpsCentre supports integration
  - Various config & install options

Grid Ireland Innovations
- Transactional Deployment with rollback
  - Distribute and configure all Grid Ireland gateways
- Grid Virtualisation
  - Sites are instances of VMs
  - VMs used in Grid Ireland testbeds
    - Testing and certification
  - VMs used in e-Learning testbed
  - Non-trivial networking in testbeds
    - Allows testing of real-world setup
- RemotePBS
  - JobManager allowing submission to non-grid batch systems
- GridFS (Grid FileSystem)
  - R/W access to remote filesystems using grid credentials
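The idea behind transactional deployment with rollback can be sketched generically in Python: either every gateway receives the new configuration, or all of them are returned to the previous one. This is not Grid Ireland's implementation; the gateway names, the version strings and the `push` callback are hypothetical, and a real system would also have to cope with a rollback step that itself fails.

```python
class DeploymentError(Exception):
    """Raised when a deployment failed and was rolled back."""


def deploy_transactionally(gateways, new_config, push):
    """Apply new_config to every gateway, or leave all of them unchanged.

    gateways: dict mapping gateway name -> currently deployed config
    push:     callable(name, config) that installs a config and raises on failure
    """
    applied = []  # (name, old_config) for gateways already updated
    try:
        for name in list(gateways):
            push(name, new_config)
            applied.append((name, gateways[name]))
            gateways[name] = new_config
    except Exception as exc:
        # Undo in reverse order so every gateway ends up where it started.
        for name, old_config in reversed(applied):
            push(name, old_config)
            gateways[name] = old_config
        raise DeploymentError(f"rolled back after failure: {exc}") from exc
    return [name for name, _ in applied]


def failing_push(name, config):
    """Hypothetical push that cannot install 'v2' on one gateway."""
    if name == "gw-c" and config == "v2":
        raise RuntimeError(f"{name} rejected {config}")


def working_push(name, config):
    """Hypothetical push that always succeeds."""
    return None


gateways = {"gw-a": "v1", "gw-b": "v1", "gw-c": "v1"}
```

Calling `deploy_transactionally(gateways, "v2", failing_push)` raises `DeploymentError` and leaves all three gateways on `v1`; with `working_push` all three move to `v2` together.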

Current VOs in Ireland
- CosmoGrid: Grid-enabled Computational Physics of Natural Phenomena
- WebCom-G: middleware stack supporting complicated workflows, aims to hide complexity of Grid
- Gene: bioinformatics
- Solo VO: a catch-all VO for new users who would like to try Grid

Grid Ireland Successes
- EGEE III SA3 Porting Coordinator (Eamonn Kenny)
  - Producing ports for CentOS 5, SUSE SLES 10 + others
- Secretary of OGF e-Learning WG (Kathryn Cassidy)
- Chair of EGEE MPI WG (Stephen Childs)
  - Co-author of EGEE technical report on EGEE MPI integration
  - Closer integration with Int.EU.Grid
- AuthZ integration into EGEE R-GMA (Stuart Kenny)
- VM/VNet work being followed by others
- Quattor Working Group (fabric management)
- Deployment of 36 TB of data storage (April 2008)
- Deployment of 768 cores in TCD (Dec 2007)

End Users Support
- Once a user has a cert, they can be associated with an existing virtual organisation
- Facilitate establishment of international VOs
- Help VOs port applications to grid
- User training
  - Adaptive e-Learning
    - Developed by Kathryn Cassidy
    - Will be used for training sessions tomorrow
  - Introductory and advanced courses
    - Grid introduction, site integration, MPI on grid
- The EGEE GILDA testbed
  - https://gilda.ct.infn.it/
  - Open to everyone to try grid for themselves

Grid Ireland Infrastructure [figure: central services, Grid Ireland sites, clusters]