Introduction to Grid computing




Introduction to Grid computing
The INFNGrid Project Team

Introduction
- This tutorial takes the DataGrid (EDG) tutorial as its starting point. Many thanks to the EDG tutorials team!
- Various slides and exercises come from the EDG tutorial
- Many changes have been applied to that material, e.g. information on Globus has been added by the INFNGrid Tutorials team

Overview
- What is a Grid?
- What is Grid computing?
- Why Grids?
- Grid projects worldwide
- The European DataGrid project

What is a Grid?
- Dependable, consistent, pervasive access to resources
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, trust relationships
- Make it easy to use diverse, geographically distributed, locally managed and controlled computing facilities as if they formed a coherent local cluster

What is Grid computing?
- "Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations" [I. Foster]
- A VO is a collection of users sharing similar needs and requirements in their access to processing, data and distributed resources and pursuing similar goals
- A big challenge for computing, with high sociological impact and a new way of collaborating

The Grid Vision
- Researchers perform their activities regardless of geographical location, interact with colleagues, share and access data
- The Grid: networked data processing centres, with middleware software as the glue between resources
- Scientific instruments and experiments provide huge amounts of data

The Grid distributed computing idea 1/2
Once upon a time... mainframe, microcomputer, minicomputer, cluster (by Christophe Jacquet)

The Grid distributed computing idea 2/2
... and today (by Christophe Jacquet)
28-29 January, 2004 EGrid Project - Grid Tutorial, Trieste

Grid vs distributed computing
Distributed computing:
- Distributed applications tend to be specialised systems, intended for a single purpose or user group
- Resources are usually centrally managed and controlled, or at least managed using the same mechanisms, policies and tools
- Resources are usually homogeneous
- Static nature
Grid computing:
- Resources are locally managed and controlled
- The owner of a resource has full control, so different Grid resources can be managed using different policies and mechanisms, e.g.:
  - Computing resources can be managed by different local resource management systems
  - Different storage systems (disk pools, mass storage systems, ...) are used at different Grid sites
  - Different policies and priorities are given to the same user in different Grid systems
- Different kinds of resources: not always the same hardware and applications
- Dynamic nature: resources and users are added/removed/changed frequently

Main entities of a Grid architecture
Service providers:
- Publish the availability of their services via information systems
- Such services may come and go or change dynamically
- E.g. a Grid site that offers x CPUs and y GB of storage
Service brokers:
- Register and categorize published services and provide search capabilities
- E.g. 1) the Resource Broker selects the best resources for a job; 2) catalogues of data held at each Grid site
Service requesters:
- Single sign-on: log into the Grid once
- Use brokering services to find a needed service and employ it
- E.g. a scientist submits a simulation job that needs 12 CPUs for 6 hours and 15 GB, which gets scheduled, via the Resource Broker, on a certain Grid site
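
The provider / broker / requester cycle can be illustrated with a small Python sketch. It is a toy model only, not the EDG Resource Broker: the names SiteOffer, JobRequest, InformationSystem and select_site, the site names and capacities, and the "most free CPUs wins" ranking are all assumptions made for the example. Only the request for 12 CPUs, 6 hours and 15 GB is taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteOffer:
    """What a (hypothetical) Grid site publishes about itself."""
    name: str
    free_cpus: int
    free_storage_gb: int
    max_walltime_hours: int


@dataclass
class JobRequest:
    """What a requester asks the broker for."""
    cpus: int
    walltime_hours: int
    storage_gb: int


class InformationSystem:
    """Toy registry: providers publish and withdraw offers dynamically."""

    def __init__(self) -> None:
        self._offers: Dict[str, SiteOffer] = {}

    def publish(self, offer: SiteOffer) -> None:
        self._offers[offer.name] = offer

    def withdraw(self, name: str) -> None:
        self._offers.pop(name, None)

    def offers(self) -> List[SiteOffer]:
        return list(self._offers.values())


def select_site(info: InformationSystem, job: JobRequest) -> Optional[SiteOffer]:
    """Toy broker: keep sites that satisfy the request, then rank them.

    The ranking rule (most free CPUs) is an assumption for illustration.
    """
    candidates = [
        o for o in info.offers()
        if o.free_cpus >= job.cpus
        and o.free_storage_gb >= job.storage_gb
        and o.max_walltime_hours >= job.walltime_hours
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.free_cpus)


# Providers publish their (made-up) capacities.
info = InformationSystem()
info.publish(SiteOffer("site-a", free_cpus=8, free_storage_gb=100, max_walltime_hours=24))
info.publish(SiteOffer("site-b", free_cpus=32, free_storage_gb=50, max_walltime_hours=12))
info.publish(SiteOffer("site-c", free_cpus=64, free_storage_gb=10, max_walltime_hours=48))

# The requester's job from the slide: 12 CPUs for 6 hours and 15 GB.
job = JobRequest(cpus=12, walltime_hours=6, storage_gb=15)
chosen = select_site(info, job)
print(chosen.name if chosen else "no matching site")
```

Here site-a has too few CPUs and site-c too little storage, so the broker picks site-b. If site-b later withdraws its offer, the same request finds no match, which mirrors the "services may come and go" point above.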

Why Grids
- Distributed (world-wide) user collaborations
- Distributed computational and storage resources
- Scale of the problems
- Grids provide access to large data processing power and huge data storage possibilities
- As the Grid grows its usefulness increases (more resources available)
- Large communities of possible Grid users:
  - High Energy Physics
  - Environmental studies: earthquake forecasting, geologic and climate changes, ozone monitoring
  - Biology, Genetics, Earth Observation
  - Astrophysics, new composite materials research
  - Astronautics, etc.

High Energy Physics
The LHC detectors: ATLAS, CMS, LHCb
- ~6-8 PetaBytes/year
- ~10^8 events/year
- ~10^3 batch and interactive users
Source: Federico Carminati, EU review presentation

Earth Observation
- ESA missions: about 100 Gbytes of data per day (ERS 1/2); 500 Gbytes for the next ENVISAT mission (2002)
- Grid contributions to EO:
  - enhance the ability to access high-level products
  - allow reprocessing of large historical archives
  - improve complex Earth science applications (data fusion, data mining, modelling, ...)
Sources: Federico Carminati, EU review presentation, 1 March 2002; L. Fusco, June 2001

Biology, Bioinformatics
Bio-informatics:
- Phylogenetics
- Search for primers
- Statistical genetics
- Bio-informatics web portal
- Parasitology
- Data-mining on DNA chips
- Geometrical protein comparison
Medical imaging:
- MR image simulation
- Medical data and metadata management
- Mammographies analysis
- Simulation platform for PET/SPECT
Example workflow:
1. Query the medical image database and retrieve a patient image
2. Compute similarity measures over the database images (submit 1 job per image)
3. Retrieve most similar cases
(Diagram labels: exam image, patient key, ACL, ...; metadata; medical images; similar images; low score images)
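
The three-step medical imaging workflow (retrieve a query image, compute one similarity score per database image, keep the best matches) can be sketched as follows. This is a minimal illustration under stated assumptions: images are stood in for by short feature vectors, the similarity measure (1 / (1 + Euclidean distance)) is invented for the example, and a local thread pool plays the role of "one Grid job per image". The case identifiers and values are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Image = List[float]  # stand-in for real image data (assumption)


def similarity(a: Image, b: Image) -> float:
    """Toy similarity score in (0, 1]; higher means more similar."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def rank_similar(query: Image, database: Dict[str, Image], top_k: int) -> List[Tuple[str, float]]:
    """Score every database image independently, then keep the top_k.

    Each call to similarity() is independent of the others, which is
    what makes "submit 1 job per image" possible on a Grid. Here the
    jobs are simply run in a local thread pool.
    """
    keys = list(database)
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(lambda k: similarity(query, database[k]), keys))
    ranked = sorted(zip(keys, scores), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical image database keyed by case identifier.
database = {
    "case-001": [0.1, 0.9, 0.3],
    "case-002": [0.8, 0.2, 0.5],
    "case-003": [0.1, 0.8, 0.3],
    "case-004": [0.9, 0.9, 0.9],
}

query = [0.1, 0.88, 0.3]          # step 1: the retrieved patient image
best = rank_similar(query, database, top_k=2)  # steps 2 and 3
for case_id, score in best:
    print(case_id, round(score, 3))
```

The low-scoring images (case-002 and case-004 in this made-up data) are simply dropped by the top_k cut, matching the "similar images" versus "low score images" split in the diagram.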

Status (some years ago)
- People have been discussing the Grid for several years, but until some years ago more or less only the Globus Toolkit was available
- Globus Toolkit: core services for Grid tools and applications (security, information service, resource management, etc.)
- A good basis to build on, but:
  - No high-level services
  - Many problems (e.g. handling of high volumes of data) not addressed
  - No production-quality implementations
- Not possible to do real work with Grids yet
- Necessary to address the shortcomings and implement the missing (high-level) services

Major existing Grid projects
Europe-based projects:
- European DataGrid (EDG): 2001-2003, www.edg.org
- LHC Computing GRID (LCG): 2002-2008, cern.ch/lcg
- CrossGrid: 2002-2005, www.crossgrid.org
- DataTAG: 2002-2003, www.datatag.org
- GridLab: 2002-2004, www.gridlab.org
- EGEE: 2004-2007, www.cern.ch/egee
European national projects:
- INFNGRID, UK-GridPP, NorduGrid (Nordic test bed for wide area computing)

Major existing Grid projects
US projects:
- GriPhyN (HEP): www.griphyn.org
- PPDG (HEP): www.ppdg.net
- iVDGL (joint GriPhyN, PPDG): www.ivdgl.org
- TERAGRID (NSF): www.teragrid.org
  - IBM, Intel, Qwest, Myricom, Sun Microsystems, Oracle
- National Middleware Initiative (NSF NMI): www.nsfmiddleware.org
- ESG: www.earthsystemgrid.org
- NEESgrid (virtual lab, earthquake engineering): www.neesgrid.org
- BIRN (biomedical informatics research network): birn.ncrr.nih.gov/birn/
Asia-based projects:
- ApGRID: www.apgrid.org
- TWGRID: www.twgrid.org
- Many Grid projects in Korea, Japan, China, Australia

Global Grid Forum (GGF)
Mission: to focus on the promotion and development of Grid technologies and applications via the development and documentation of "best practices," implementation guidelines, and standards, with an emphasis on "rough consensus and running code"
- An open process for development of standards
- A forum for information exchange
- A regular gathering to encourage shared effort
See http://www.globalgridforum.org

The European Data Grid (EDG) Project
- To build on the emerging Grid technology to develop a sustainable computing model for effective sharing of computing resources and data
- Applications/end-user communities: HEP, Earth Observation, Biology
- Start: Jan 1, 2001. End: Dec 31, 2003
- Specific project objectives:
  - Middleware for fabric & Grid management
  - Large scale testbed
  - To collaborate with and complement other European and US projects
  - To contribute to Open Standards and international bodies: DataGrid is a co-founder of the Global Grid Forum (GGF)

DataGrid in Numbers
Testbeds:
- >15 regular sites
- >10 000s jobs submitted
- >1000 CPUs
- >5 TeraBytes disk
- 3 Mass Storage Systems
People:
- >350 registered users
- 12 Virtual Organisations
- 16 Certificate Authorities
- >500 people trained
- 278 man-years of effort (100 years funded)
Software:
- 50 use cases
- 18 software releases
- >300K lines of code
Scientific applications:
- 5 Earth Obs institutes
- 9 bio-informatics apps
- 6 HEP experiments

EDG software architecture
EDG architectural functional blocks:
- Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on VDT (basically Globus + Condor)
- Higher-level EDG middleware (developed within EDG)
- Applications (HEP, BIO, EO)
Layer diagram, top to bottom:
- Specific application layer: ALICE, ATLAS, CMS, LHCb, other apps
- VOs common application layer: LHC, other apps
- Grid middleware: high-level Grid middleware
- VDT: Basic Services
- OS & Net services

EDG middleware Grid architecture
Layer diagram, top to bottom:
- Local Computing: Local Application, Local Database
- Grid Application Layer: Job Management, Data Management, Metadata Management
- Collective Services: Grid Scheduler, Replica Manager, Information & Monitoring
- Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, Service Index
- Fabric services: Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management
(Side labels in the diagram: APPLICATIONS, Grid, M/W, Fabric, GLOBUS)

Current EDG Status
- EDG currently provides a set of middleware services:
  - Job & Data Management
  - Grid Information Services
  - Grid & Network monitoring
  - Security, Authentication & Authorization tools
  - Fabric Management
- EDG release 2.1 is currently deployed to the EDG testbeds:
  - GNU/Linux RedHat 7.3 on Intel PCs
  - ~15 sites in the application testbed, actively used by application groups
- EDG software is also used in other Grid infrastructures:
  - Deployed in the LCG Grid infrastructure (for LHC experiments)
  - Also deployed at a total of ~40 sites via CrossGrid, DataTAG and national grid projects
  - Deployed in the GRID.IT grid (used by different scientific applications)
- Many applications have been ported to EDG testbeds and are actively being used
- Activities will continue in the framework of the EGEE project

Conclusions
- Grids enable distributed communities ("virtual organizations") to share geographically distributed resources
- Many achievements in recent years:
  - Achievements in Grid middleware functionality and stability
  - More and more applications using the Grid
  - Industry is also joining the Grid
- But still a lot to do:
  - Moving towards standardization (OGSA, etc.)
  - Moving from Grid prototypes to Grid infrastructures to be used for real production activities

28/9/2003: Computational grids finally reached the same level of stability as electrical power grids!