What is this Thing Called the Grid?



What is this Thing Called the Grid? Or: Is the Emperor Naked? Leif Nixon, 2 December 2003

Please, interrupt whenever you have questions. These slides will be available from http://www.nsc.liu.se/~nixon/upplysning/ 1/55

The Google Hype Measurement 2/55

Some quotes Irving Wladawsky-Berger, Vice President, IBM: "Grid Computing is really the natural evolution of the Internet. This is really looking at the Internet with all its promise of universal connectivity and reach, and making it work far better [...] [...] the promise here is our customers will be able to have their cake and eat it too." Wolfgang Gentzsch, Engineering Director, Sun: "19th Century: Steam Engine. 20th Century: Combustion Engine. 21st Century: Grid Engine." Holly Korab, NCSA: "Cyberspace on steroids." 3/55

News clippings IBM Takes Sony PlayStation Gaming to the Grid "The new online gaming network, based on grid technology, will be unveiled next week at the Game Developer's Conference in San Jose, California. It will allow developers to engineer games directly onto a grid and will make it easier for them to support online gamers." (Hey, let's build a cluster of PlayStations and hook them up to the grid! Wait, that's already been done...) Sun Unveils Grid Computing Solutions Program "[...] more than 6,000 grids have been deployed worldwide, over 2,500 in EMEA alone" (It seems Sun have their very own definition of what a grid is.) 4/55

Money makes the world go around Some EU-funded grid projects:
ASP-BP: Development of a framework for 6 experiments regarding ASP technology-based applications applied in different industrial fields.
COG: Demonstration of the applicability of Grid technologies to industry.
CROSSGRID: Development of techniques for large-scale grid-enabled real-time simulations and visualisations.
DAMIEN: Development of essential software supporting the Grid infrastructure.
DATAGRID: Development of techniques supporting the processing and data storage requirements of next generation scientific research.
DATATAG: Development of techniques to support reliable and high-speed collaboration across widely distributed networks.
EUROGRID: Development of core Grid software components.
FLOWGRID: On-demand CFD (Computational Fluid Dynamics) simulation and visualisation using Grid computing.
GEMSS: Development of new Grid-enabled tools for improved diagnosis, operative planning and surgical procedures.
GRACE: Development of a search and categorisation engine for flexible allocation of computational and data storage resources in Grid environments.
GRASP: Development of architecture and business models for delivering ASP services over Grid-enabled networks.
GRIA: Development of business models and processes that make it feasible and cost-effective to offer and use computational services securely in an open Grid marketplace.
GRIDLAB: Development of software capable of fully exploiting dynamic resources.
GRIDSTART: Accompanying measure with the objective of maximising the impact of EU-funded Grid and related activities through clustering.
GRIP: Interoperability of Globus and UNICORE, two leading software packages central to the operation of the Grid.
LeGE-WG: Network of Excellence facilitating the establishment of a European Learning Grid Infrastructure by supporting the systematic exchange of information and by creating opportunities for close collaboration between the different actors.
OPENMOLGRID: Development of tools for molecular design based on a UNICORE-enabled distributed computing environment.
Call it grid, and you get funding. 5/55

An often felt frustration Ian Foster, Argonne National Lab: "Grids have moved from the obscurely academic to the highly popular. We read about Compute Grids, Data Grids, Science Grids, Access Grids, Knowledge Grids, Bio Grids, Sensor Grids, Cluster Grids, Campus Grids, Tera Grids, and Commodity Grids. [...] If by deploying a scheduler on my local area network I create a Cluster Grid, then doesn't my Network File System deployment over that same network provide me with a Storage Grid? [...] Is there any computer system that isn't a Grid?" 6/55

Cloudscapes of vapourware An often repeated claim: There are OGSA (Open Grid Service Architecture) implementations available in Python, Perl and .NET. Actual state of implementations: Python: "This code is presented as pre-alpha/development quality code." Perl: "OGSI::Lite is an experiment in creating a Grid Services Container using Perl." .NET: "The latest version of OGSI.NET is Tech Preview 1.1." Makes you feel safe and cozy, right? 7/55

The big question Is the emperor naked? 8/55

Well, let's see... Actually, you can use grid stuff today. Especially suitable are tasks that: are computation intensive, are throughput-biased (large numbers of similar, independent jobs), and run well on commodity hardware. We'll look at an example (alright, a toy example): raytracing. 9/55

Example: Raytracing Raytracing is good, since it is an embarrassingly parallel task; the computation for determining the colour of a pixel in the image is independent for each pixel. This means we can decompose the rendering of an image into a number of independent computations. 10/55
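
As a rough illustration, the decomposition amounts to splitting the image rows into fixed-size slices, each of which becomes one grid job; a minimal sketch, assuming a 600-row image rendered in 50-row slices (both numbers are arbitrary for this example):

    #!/bin/sh
    # Split a 600-row image into 50-row slices; each range becomes one grid job.
    HEIGHT=600   # assumed image height
    SLICE=50     # assumed slice size
    start=1
    while [ $start -le $HEIGHT ]; do
        end=$((start + SLICE - 1))
        echo "slice: rows $start-$end"
        start=$((end + 1))
    done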

Example: Raytracing Decomposition of a model of a rack-mounted cluster: We can submit each slice as a separate grid job. 11/55

Brief overview of running grid jobs First, establish your grid credentials. Then, for each slice of the image: 1. Create a job description, using XRSL, that specifies: input and output files, what command to run, requirements on the machine that will run the job, and so on... 2. Submit the job to the grid. It will automatically be sent to the most suitable machine. 3. Wait until execution is finished. 4. Retrieve the output files. When all slices are retrieved, paste them into a composite image. 12/55
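
Put together, the whole round trip can be driven by a small script along these lines. This is only a sketch: it assumes one XRSL file per slice with zero-padded row numbers in the names, and the -a ("all jobs") options to ngstat/ngget as well as the exact job state names are assumptions about the NorduGrid tools of the time; the final compositing step assumes ImageMagick is available.

    #!/bin/sh
    # Sketch: submit one job per slice, wait for them all, fetch the results.
    grid-proxy-init                      # establish grid credentials first

    for xrsl in monopov-*.xrsl; do       # one job description per slice (assumed naming)
        ngsub -f "$xrsl"
    done

    # Crude polling: wait while any job is still in a non-final state.
    while ngstat -a | grep -E -q 'Status: (ACCEPTED|PREPARING|INLRMS|FINISHING)'; do
        sleep 300
    done

    ngget -a                             # retrieve the output files of all jobs

    # Paste the slices into one composite image; zero-padded slice names
    # make the glob sort in row order.
    convert monolith-*.png -append monolith.png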

Meanwhile, let's step back for a moment... What are people actually trying to achieve, and why? Two facts A) Today, we have large amounts of resources connected through high-bandwidth networks: computing resources, storage and databases, instruments, all sorts of stuff, really. But they all have their own access mechanisms. (Just think of the differences between PDC and NSC.) B) Computing tasks are getting huge. (The LHC will generate approx. 5 PB/yr.) 13/55

A new utility The goal of the grid Connect all these resources. Provide a uniform and seamless way of accessing them. The long-term vision The resources should just be there when you need them. Computing should be just another utility like tap water or electricity. Sounds simple, right? 14/55

A brief timeline Interest in parallel and distributed computing in the 80s. Testbeds connecting American supercomputing centres in the late 90s. Foster, Kesselman: The Grid: Blueprint for a New Computing Infrastructure, 1998. Lots of software development (Globus, Legion, Condor, NWS, SRB, NetSolve, AppLeS, Unicore), but not much actual use. Around 2000 grid interest starts to boom. High energy physics is a driving force, especially CERN. Globus toolkit v2 becomes the leading middleware. 2002-2003: Commitment from big players in industry. Hype levels are rising. OGSA/OGSI standards emerge. Globus toolkit v3 is released. People have been thinking of these things for a long time. Many of the problems involved are non-trivial. 15/55

To formalize things... A grid is a decentralized system spanning multiple administrative domains, where both the set of users and the set of resources vary dynamically. It can handle extremely large numbers of systems and volumes of data. It provides uniform access to heterogeneous systems, both from the point of view of the user and from the point of view of the user's application. It provides a flexible mechanism for locating resources based on the user's criteria. It provides non-trivial quality of service. 16/55

Don't believe what they say There is today no such thing as a grid in actual operation. However, there are many testbeds, some of which seem to work reasonably well, within certain limits. We will focus on one of these, Nordugrid. Sun's claim of 6,000 operational grids is, frankly, b*llsh*t. They have a very... creative definition of the word grid. 17/55

Nordugrid The Nordugrid project started out in 2001 with a small project team. The team began building a grid middleware based on the Globus version 2 toolkit, with the focus on getting something working fast. Start with simple things that work and proceed from there. The Nordugrid middleware has been used to build a testbed, currently encompassing approx. 1000 CPUs. Swegrid will use the Nordugrid middleware, i.e. it is not OGSA-based. 18/55

Whom can we trust? One of the basic problems in building a grid is how to authenticate users and machines throughout the grid. The Globus, and hence Nordugrid, approach is to use certificates (X.509 style) based on public key cryptography. The security subsystem is known as GSI, Grid Security Infrastructure. 19/55

Public key cryptography In public key cryptography, you have two keys: a private key, which you keep secret and never share with anybody, and a public key, which is published for the whole world to see. Anyone can send a secret message to you by encrypting it with your public key. It can then only be decrypted by using your private key. Contrariwise, you can sign documents with your private key. Anyone can then verify that signature using your public key. 20/55
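
The mechanics can be tried out with plain OpenSSL; a minimal sketch (the key size and file names are arbitrary, and newer OpenSSL releases prefer pkeyutl over the rsautl shown here):

    # Generate a key pair.
    openssl genrsa -out private.pem 2048                  # private key: keep secret
    openssl rsa -in private.pem -pubout -out public.pem   # public key: publish

    # Anyone can encrypt a message with the public key...
    openssl rsautl -encrypt -pubin -inkey public.pem -in msg.txt -out msg.enc
    # ...and only the holder of the private key can decrypt it.
    openssl rsautl -decrypt -inkey private.pem -in msg.enc -out msg.dec

    # Conversely: sign with the private key, verify with the public key.
    openssl dgst -sha256 -sign private.pem -out msg.sig msg.txt
    openssl dgst -sha256 -verify public.pem -signature msg.sig msg.txt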

Who is who? But there is a problem here: how do we know that a certain public key really belongs to a certain person? The solution is to introduce a trusted third party, the Certificate Authority, or CA for short. The CA signs public keys of users, thereby certifying that they belong to certain individuals. The public key and the CA's signature together form a certificate. (You still need to get hold of the CA's public key in a secure manner, though.) 21/55
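
In plain OpenSSL terms, issuing a certificate boils down to the CA signing a certificate request; a sketch with invented file names (the subject is the one that appears later in the talk):

    # User: create a certificate request for the public key.
    openssl req -new -key private.pem \
        -subj "/O=Grid/O=NorduGrid/OU=nsc.liu.se/CN=Leif Nixon" -out request.csr

    # CA: sign the request, producing the user's certificate.
    openssl x509 -req -in request.csr -CA ca-cert.pem -CAkey ca-key.pem \
        -CAcreateserial -days 365 -out usercert.pem

    # Anyone holding the CA's certificate can now verify the user's certificate.
    openssl verify -CAfile ca-cert.pem usercert.pem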

Not just persons Not only do we need to identify persons, we also need to identify the remote machines we speak to. Hence server certificates are issued to machines in the same way as to individuals. Server certificates work the same way as personal certificates, except that they point to a hostname instead of a person. 22/55

The private key must be kept private. Really private. Be paranoid! And I mean private. 23/55

Delegation Grid resources need to do stuff on behalf of the user, like reading and writing files using the privileges of the user. We certainly don't want to transfer our private key to the remote resource. The solution is that the user issues a time-limited certificate giving the holder the same rights as the user. This proxy certificate is signed by the user's private key. A bit of paranoia is needed in dealing with the proxy certificate. We don't want nasty people to get hold of one of our proxy certificates. Still, proxy certificates are not quite as sensitive as your private key. 24/55
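
Since proxies are time-limited, it is worth checking what you are actually carrying around; grid-proxy-info ships with Globus alongside grid-proxy-init, though the exact option names may vary between versions:

    # Show the proxy's subject and how long it remains valid.
    grid-proxy-info -subject
    grid-proxy-info -timeleft     # remaining lifetime in seconds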

Authorization Users and resources are grouped into VOs, virtual organizations. Each VO has a VO manager that decides who can be a member of the VO. Resource owners can decide authorization based on VO membership, and don't need to keep track of individual users. (Instead of saying "This resource can be used by users U1, U2, U3, ..., Un", you can say "This resource can be used by the members of the virtual organization X".) 25/55
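
In Globus-based middleware of this vintage, the resource side ultimately maps certificate subjects to local accounts through a grid-mapfile; one common arrangement, sketched here with invented file names and an assumed one-subject-per-line member list, is to regenerate that file periodically from the VO's published membership instead of maintaining it by hand:

    #!/bin/sh
    # Sketch: rebuild the grid-mapfile from a VO member list.
    VO_MEMBERS=/var/spool/vo/members.txt        # assumed: one certificate subject per line
    MAPFILE=/etc/grid-security/grid-mapfile

    : > "$MAPFILE.new"
    while read -r subject; do
        # Map every member of the VO to the shared local account "griduser".
        echo "\"$subject\" griduser" >> "$MAPFILE.new"
    done < "$VO_MEMBERS"
    mv "$MAPFILE.new" "$MAPFILE"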

Data handling Another problem is handling the large data sets that you often have in the grid world. (In fact, it is often the size of the data sets that forces users onto the grid in the first place.) Nordugrid offers two main tools for this: gridftp and data replication. 26/55

Gridftp Gridftp (earlier known as gsiftp) is based on the ordinary ftp protocol. Gridftp uses GSI for authentication and authorization. Data transfer can be made using several TCP streams in parallel for increased bandwidth. Data transfers can be initiated by a third party; a client can transfer data between two servers, without needing the data stream to go through the client. Nordugrid uses gridftp wherever data needs to be moved around. 27/55
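
With the Globus client this looks roughly as follows; globus-url-copy is the standard tool, -p sets the number of parallel TCP streams, and the server names below are made up:

    # Upload a local file using four parallel TCP streams.
    globus-url-copy -p 4 file:///scratch/bigfile.dat \
        gsiftp://storage.example.org/data/bigfile.dat

    # Third-party transfer: move data directly between two gridftp servers;
    # the data stream never passes through the client.
    globus-url-copy gsiftp://storage.example.org/data/bigfile.dat \
        gsiftp://cluster.example.org/scratch/bigfile.dat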

Data replication Data sets can be replicated to: achieve higher availability (servers do crash sometimes), keep the data close at hand, and avoid repeatedly downloading the data over long distances. A Replica Catalog keeps track of the various physical copies of each replicated file. 28/55

Resource location So, OK, we've got tools to handle the across-administrative-domains stuff, and to handle large data sets, but how do we actually find the resources we need? Information system Resource specifications 29/55

The Nordugrid information system Based on the Globus MDS component. Uses the standard LDAP protocol. Each grid unit (cluster, storage element, ...) runs its own Grid Resource Information Server (GRIS). Each GRIS registers to a national Grid Information Index Server (GIIS). The national GIISes register to a set of top-level redundant GIISes. The result is a hierarchical, dynamic system; only the top-level GIISes need to be known to clients. Caching is used at each level to increase performance. 30/55

The Nordugrid information system Example: Which clusters in Sweden have i686 class processors?

    $ grid-info-search -h grid.quark.lu.se \
          -b mds-vo-name=sweden,o=grid -x \
          nordugrid-cluster-architecture=i686 \
          nordugrid-cluster-name
    # farm.hep.lu.se, Sweden, grid
    dn: nordugrid-cluster-name=farm.hep.lu.se,mds-vo-name=...
    nordugrid-cluster-name: farm.hep.lu.se

    # seth.hpc2n.umu.se, Sweden, grid
    dn: nordugrid-cluster-name=seth.hpc2n.umu.se,mds-vo-name=...
    nordugrid-cluster-name: seth.hpc2n.umu.se
      :
      :

31/55
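Since MDS is ordinary LDAP underneath, the same query can also be issued with a stock ldapsearch client; the MDS port number (2135) is the conventional one, but treat it as an assumption for any particular setup:

    ldapsearch -x -h grid.quark.lu.se -p 2135 \
        -b "mds-vo-name=Sweden,o=grid" \
        "(nordugrid-cluster-architecture=i686)" \
        nordugrid-cluster-name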

CLI-challenged? Don't worry, there is also a nice web interface to the information system, the Grid Monitor, at http://www.nordugrid.org/. 32/55

Resource specification The resource specification contains two types of information: it describes the job to be run, and it specifies what resources are needed, in terms of CPU time, installed software, operating system, and so on. The latter information is used to find a suitable machine to run on; the former is used to actually run the job. Resource specifications are written using the Extended Resource Specification Language (XRSL). 33/55

Resource specification Hello World The grid version of Hello World:

    &(executable="/bin/echo")
     (arguments="hello Grid!")
     (stdout="hello.txt")
     (gmlog="debuginfo")
     (jobname="My Hello Grid")
     (cputime=1)

34/55

Resource specification I/O Input and output files can be specified in a flexible manner:

    (inputfiles=(file1.dat http://example.com/test.dat)
                (file2.dat gsiftp://niak.nsc.liu.se/myfile.dat)
                (file3.dat rc://dc1.uio.no/2000/his/atlas.dat)
                (file4.dat /scratch/bigfile.dat)
                (file5.dat ""))
    (outputfiles=(output1.dat gsiftp://example.com/result17.dat)
                 (output2.dat ""))

35/55

Resource specification Runtime environments You can specify that your job needs a certain software package: (runtimeenvironment=povray-3.5) In this case, version 3.5 of the POVray raytracer is requested. This serves two purposes: the job will be submitted to a machine where POVray 3.5 is available, and the job's environment will be properly initialized to run POVray (the povray binary will be in $PATH, and so on). 36/55
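
On the resource side, a runtime environment is essentially a shell script that the grid manager uses to set up the job's environment; a minimal sketch of what a povray-3.5 script might contain (the install path is invented, and real Nordugrid scripts are invoked at several stages):

    #!/bin/sh
    # Sketch of a povray-3.5 runtime environment script on the resource.
    # Its only job here is to put the requested software into the job's PATH.
    POVRAY_HOME=/usr/local/povray-3.5       # invented install location
    PATH=$POVRAY_HOME/bin:$PATH
    export PATH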

Resource specification You can do much more in XRSL. Full details are in the documentation on http://www.nordugrid.org/papers.html 37/55

By the way... The XRSL for one of the raytracing jobs looks like this, in case you were curious.

    &(executable=runpov.sh)
     (arguments=101 150)
     (stdout="pov.out")
     (stderr="pov.err")
     (gmlog="pov.log")
     (jobname="monopov-101-150")
     (cputime=5)
     (runtimeenvironment=povray-3.5)
     (inputfiles=(monolith.pov "") (front.png ""))
     (outputfiles=(monolith-101-150.png ""))

38/55
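
runpov.sh itself is not shown in the slides; a plausible sketch, assuming POV-Ray's +SR/+ER options select the start and end row of a partial render and assuming 800x600 image dimensions:

    #!/bin/sh
    # Sketch of runpov.sh: render rows $1 through $2 of monolith.pov.
    # Assumes the povray-3.5 runtime environment has put povray in $PATH.
    START=$1
    END=$2
    povray +Imonolith.pov +Omonolith-$START-$END.png \
        +W800 +H600 +SR$START +ER$END +FN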

The user interface The Nordugrid user interface consists of command line tools to submit jobs and access them in various ways. There is also a web portal. (No personal experience, though) 39/55

The user interface - proxy certificates To create a proxy certificate:

    $ grid-proxy-init
    Your identity: /O=Grid/O=NorduGrid/OU=nsc.liu.se/CN=Leif Nixon
    Enter GRID pass phrase for this identity:
    Creating proxy... Done
    Your proxy is valid until: Wed Oct 22 00:26:51 2003
    $

To destroy it when you are done:

    $ grid-proxy-destroy
    $

(Strictly, these commands are not part of the Nordugrid User Interface; they are straight from Globus.) 40/55

The user interface - submitting jobs The ngsub command locates a suitable resource (by querying the information system) and submits the job there.

    $ ngsub -f hello_grid.xrsl
    Job submitted with jobid gsiftp://toornaarsuk.distlab.diku.dk:2811/jobs/17235972981974760487
    $

Note that there is no central broker; all of the work matching resources to the XRSL is performed locally. 41/55

The user interface - checking on jobs To check how the job is proceeding, the ngstat command is used:

    $ ngstat gsiftp://toornaarsuk.distlab.diku.dk:2811/jobs/17235972981974760487
    Job gsiftp://toornaarsuk.distlab.diku.dk:2811/jobs/17235972981974760487
      Jobname: My Hello Grid
      Status: FINISHING
    $

Thankfully, you can use the job name defined in the XRSL to specify the job:

    $ ngstat "My Hello Grid"
    Job gsiftp://toornaarsuk.distlab.diku.dk:2811/jobs/17235972981974760487
      Jobname: My Hello Grid
      Status: FINISHED 2003-10-21 12:34:30
    $

42/55

The user interface - retrieving results When the job is finished, you can download its output files using ngget:

    $ ngget "My Hello Grid"
    Downloading files to /home/nixon/docs/grid/test/hello_grid/17235972981974760487
    ngget: Download successful. Deleting job from gatekeeper.
    $ ls 17235972981974760487/
    hello.txt  grid.debug
    $

Output files are typically kept on the resource for 24 hours and then erased. Either download them before that, or specify in your XRSL file that they should be uploaded somewhere. 43/55

So, is the emperor naked? Let's check... Does this stuff work? 44/55

Swegrid A national Swedish grid, Swegrid, is currently under construction. It is planned to enter full-scale production in March 2004. It consists of six clusters across the country, each consisting of 100 PCs running Linux. Each node has a 2.8 GHz Intel P4 processor with 800 MHz front side bus, 2 GB RAM and an 80 GB IDE hard disk. Each cluster has approximately 2 TB of fibre channel attached disk for short-term storage. The nodes within each cluster are interconnected by gigabit Ethernet. 45/55

Usage AMANDA AMANDA is the Antarctic Muon and Neutrino Detector Array. It consists of more than 700 photodetectors buried in the ice at the geographic South Pole. 46/55

Usage AMANDA 47/55

Usage AMANDA 48/55

Usage LHC Large Hadron Collider, the world's largest accelerator. To be in operation in 2007. 49/55

Usage LHC Radius: 4.3 km. Depth: approx 100 m 50/55

Usage LHC 51/55

Usage LHC 52/55

Usage LHC 53/55

Usage LHC Some figures: Estimated cost: 1.3 × 10^10 SEK. Speed of protons: 0.999999991c. Amount of preprocessed data from ATLAS: 100 MB/s. Amount of data from LHC: 5-7 PB/yr. One third of Swegrid will be dedicated to LHC computing. We need disks. Lots of disks. 54/55

Swegrid work ahead The technical platform works reasonably well. What's lacking is the boring stuff: handling allocations and accounting, coordinating cluster configurations, coordinating AUPs, educating users, all sorts of bean counting, really... 55/55