
Cloud Computing Lecture 5 Grid Case Studies 2014-2015

Up until now: Introduction. Definition of Cloud Computing. Grid Computing: schedulers, Globus Toolkit.

Summary: Grid Case Studies. Monitoring: TeraGrid IS. Data Transfer: LIGO. Task Distribution: GEO600.

TeraGrid Information Systems

TeraGrid (National Science Foundation). TeraGrid DEEP: integrates NSF's 60 largest computers (more than 60 TF) with more than 2 PB of online storage; a national data visualization infrastructure; the most powerful computing network in the world. TeraGrid WIDE Science Portals: integration of scientific communities; more than 90 community data collections; cooperation with other grid projects in Europe and Asia/Pacific.

The Challenge: Provide a mechanism that allows all participants to publish and discover information about available capabilities: What are TeraGrid's computing resources? Which features does each resource provide? Where are the login services? Where can I access a particular data collection? Who has a weather forecast service? Provide a mechanism adapted to the TeraGrid open community: editors record information (instead of submitting it to a centralized database); a central index allows for aggregation and discovery; multiple access interfaces (WS/SOAP, WS/REST, browser).
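
A minimal sketch of how a client might use such a REST interface; the endpoint path and the response fields are assumptions made for illustration, not the actual TeraGrid IIS API.

# Hypothetical sketch: querying a REST information service for login services.
# The URL path and JSON fields are illustrative assumptions, not the real IIS API.
import json
import urllib.request

def list_login_services(base_url="http://info.teragrid.org"):
    """Fetch a (hypothetical) JSON listing of registered services and keep logins."""
    with urllib.request.urlopen(base_url + "/web-apps/json/kit-services/v1") as resp:
        services = json.load(resp)
    return [s for s in services if s.get("Type") == "login"]

if __name__ == "__main__":
    for svc in list_login_services():
        print(svc.get("ResourceID"), svc.get("Endpoint"))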

Technical Issues: Information is stored in legacy systems: databases (of different types, with restricted access) and static and dynamic web interfaces. Multiple and varied database schemas: it is very complex to design an integrated database that supports all data types and relations. Many types of clients (browsers, SOAP, REST). Service availability is critical: both TeraGrid management (testing, documentation and planning) and its users and partners depend on this service, so it has a 99.5% availability goal.

TeraGrid Information Service Architecture (diagram): WS/REST and HTTP GET clients reach the TeraGrid central services through an Apache 2.0 front end backed by a cache; WS/SOAP clients reach WebMDS running on Tomcat and the central WS MDS4 index, which aggregates the WS MDS4 services published by the resources.

Registry: Editors record the available content: the local service maintains a registration at the central registry; entries expire automatically, so they must be refreshed periodically; editors keep ownership of their information systems (they can even participate in other grids). Indexing services pull content: registry entries have access control; registry entries link back to the source; the cache masks service faults, etc.
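
The expire-and-refresh behaviour can be modelled as a small time-to-live registry; this is an illustrative sketch, not TeraGrid code, and the keys and URLs are invented.

# Illustrative sketch of a registry whose entries expire unless the owning
# editor re-registers them periodically (not actual TeraGrid code).
import time

class Registry:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (link_to_source, expiry_time)

    def register(self, key, link_to_source):
        """Editors (re)register their entries; every refresh extends the TTL."""
        self.entries[key] = (link_to_source, time.time() + self.ttl)

    def lookup(self, key):
        """Indexing services pull content; expired entries are silently dropped."""
        link, expiry = self.entries.get(key, (None, 0.0))
        if time.time() > expiry:
            self.entries.pop(key, None)
            return None
        return link

registry = Registry(ttl_seconds=3600)
registry.register("psc.teragrid.org/gridftp", "https://info.psc.example/gridftp.xml")
print(registry.lookup("psc.teragrid.org/gridftp"))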

High Availability Architecture (diagram): clients reach the TG central servers at info.teragrid.org; resources and partners publish through info.dyn.teragrid.org (the TeraGrid dynamic name).

TGUP Batch Load & Queue Data IIS provides queue & batch load information from all RP sites for TGUP to use in system monitor http://portal.teragrid.org/ <LoadRP xmlns=""> <ComputeResourceLoad xmlns=""> <ResourceID>pople.psc.teragrid.org</ResourceID> <SiteID>psc.teragrid.org</SiteID> <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11t13:46:19z"> <Load> <Type>queue</Type> <Value>98</Value> </Load>

TeraGrid Results: Does not require deep modification of, or loss of ownership over, the legacy systems. Simple and consistent access mechanism. Integrates: descriptions of computing services and queue state; a registry of service and software availability; centralized documentation; a test, validation and verification service (Inca) with a testing and execution portal.

GT4: Base Globus Toolkit (component diagram). Common Runtime: Java, C and Python runtimes. Security: GSI-OpenSSH, MyProxy, Delegation, CAS. Execution: GRAM, GridWay. Data: Data Replication, GridFTP, Replica Location, Reliable File Transfer. Monitoring: MDS4.

LIGO

LIGO: Laser Interferometer Gravitational-Wave Observatory. Goal: observe gravitational waves. Three physical detectors in two locations (plus the GEO detector in Germany). More than 10 centres for data analysis. Collaborators at more than 40 institutions.

LIGO: Laser Interferometer Gravitational-Wave Observatory. LIGO records thousands of data channels, generating 1 TB/day of data during test periods: the resulting data are published, and data centres subscribe to the parts that local users want for analysis or storage. Data analysis produces further derived data, about 30% of LIGO's total, which is also published and replicated. More than 35 million files on the grid: more than 6 million unique files and more than 30 million files that are copies of the originals.

The Challenge: Replicate more than 1 TB/day of data to more than 10 locations. Solution: a publish/subscribe model. Let scientists specify and discover data using application-level criteria (metadata). Let scientists locate copies of the data.

Technical Constraints: Efficiency: avoid under-using the available bandwidth during transfers, especially on high-bandwidth (10 Gbps) links, and avoid idle time between data transfers.

Lightweight Data Replicator: Combines three basic Globus services: 1. Metadata service (MDS): What files are available? Holds information about files such as size, MD5 checksum and date, and propagates metadata. 2. Globus Replica Location Service (RLS): Where are the files? A catalog service that translates filenames into URLs, mapping files to locations. 3. GridFTP service: How can we copy the files? A server plus an adapted client, used to replicate data between locations.
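
GridFTP addresses the efficiency constraint above (keeping high-bandwidth links busy) with parallel TCP streams and tunable buffers. A hedged example invocation, with invented hostnames, paths and sizes:

globus-url-copy -p 8 -tcp-bs 4194304 gsiftp://source.example.org/data/frame.gwf gsiftp://dest.example.org/data/frame.gwf

Here -p sets the number of parallel streams and -tcp-bs the TCP buffer size in bytes.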

LIGO Data Replicator Architecture: Each participant has a machine dedicated to transferring the files requested by local clients. The scheduler queries the metadata and replica catalogs to identify missing files, which are added to a priority list. The transfer daemon checks the list, transfers the files and updates the LRC (Local Replica Catalog). If a transfer fails, the file remains on the transfer list.
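
A minimal sketch of that scheduling and transfer loop, under simplifying assumptions (in-memory catalogs, a stubbed GridFTP copy function); the real LDR daemons are considerably more elaborate.

# Illustrative sketch of the LDR scheduler/transfer-daemon loop described above.
def schedule_missing(metadata_catalog, local_replica_catalog):
    """Compare published metadata against local replicas; return the files to fetch."""
    return [name for name in metadata_catalog if name not in local_replica_catalog]

def transfer_pass(priority_list, remote_replica_catalog, local_replica_catalog, gridftp_copy):
    """Copy each wanted file; failed transfers stay on the list for the next pass."""
    still_pending = []
    for name in priority_list:
        source_url = remote_replica_catalog.get(name)
        try:
            local_url = gridftp_copy(source_url)      # stub standing in for a GridFTP transfer
            local_replica_catalog[name] = local_url   # update the LRC on success
        except OSError:
            still_pending.append(name)                # keep the file for retry
    return still_pending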

LIGO Results: Complete LIGO/GEO experiment: replicated 30 TB in 30 days (~700 MB/minute). MTBF: 1 month. More than 35 million files on the LDR network. Performance limited by the chosen programming language (Python). Partnership with the Globus Alliance to include a version of LDR in the Globus Toolkit.

Data Replication Service (DRS): A reimplementation of LDR's publish/subscribe capabilities. Uses Java-based WS-RF services. Uses the RLS and RFT services from GT4. Included in GT4.

GT4: Base Globus Toolkit (component diagram). Common Runtime: Java, C and Python runtimes. Security: GSI-OpenSSH, MyProxy, Delegation, CAS. Execution: GRAM, GridWay. Data: Data Replication, GridFTP, Replica Location, Reliable File Transfer. Monitoring: MDS4.

GEO600

GEO600 Observatory: Goal: observe gravitational waves. A 600 m laser interferometer near Hannover. Its members also take part in LIGO.

The Challenge: Search the data segments produced by GEO600 for gravitational waves: very complex signal processing and very large amounts of data to process. D-GRID (the state-funded German grid) and the Open Science Grid are available.

Two-Pronged Approach: Einstein@Home: shared with the LIGO community; runs on volunteers' PCs; uses the BOINC network; running since mid-2006; more than 70,000 computers/week and about 19,000 work units/day. AstroGrid-D: the same application as Einstein@Home; uses D-Grid and OSG; running since October 2007; distributes tasks using GRAM; averages 4,000 work units/day.

Traditional Approach to Resource Management: Accessing multiple sites means accounts, permissions, etc. Use a metascheduler to decide on resource selection: the GridWay metascheduler uses GRAM to contact the different job submission sites.
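
As an illustration of this route, a GridWay-style job template might look roughly like the following; the executable and file names are invented for the example, and only the general key = value template format follows GridWay. Such a template would be submitted with a command like gwsubmit -t search.jt and monitored with gwps.

# search.jt -- hypothetical GridWay job template (illustrative names and values)
EXECUTABLE   = search_binary       # assumed analysis program
ARGUMENTS    = segment0001.dat     # hypothetical input segment to analyse
INPUT_FILES  = segment0001.dat     # staged to the selected remote resource
OUTPUT_FILES = result0001.out      # staged back to the submission node
STDOUT_FILE  = stdout.out
STDERR_FILE  = stderr.err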

GEO600 Approach: The submission node keeps a list of all the GEO600 resources with minimum and maximum job capacities for each. The list is reviewed hourly and jobs are dispatched: whenever a resource has fewer than the minimum number of jobs, more jobs are transferred, up to the maximum, and the corresponding input files are transferred asynchronously. The status of each job is maintained on the submission machine, and job output files are sent back to the submission system asynchronously. GEO600 processes 4,000 jobs/day this way.
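
A sketch of this hourly top-up policy under stated assumptions: the capacity table, job representation and the submit/stage_input callbacks are all illustrative, not GEO600's actual scripts.

# Sketch of the hourly top-up policy described above (illustrative data structures).
import time

# Per-resource capacities kept on the submission node: resource -> (minimum, maximum).
CAPACITIES = {"resource-a": (50, 200), "resource-b": (20, 100)}

def review_and_dispatch(running_jobs, pending_jobs, submit, stage_input):
    """Top up every resource that has dropped below its minimum job count."""
    for resource, (low, high) in CAPACITIES.items():
        active = running_jobs.get(resource, 0)
        if active < low and pending_jobs:
            to_send = min(high - active, len(pending_jobs))
            batch, pending_jobs[:] = pending_jobs[:to_send], pending_jobs[to_send:]
            for job in batch:
                stage_input(resource, job)   # input files are transferred asynchronously
                submit(resource, job)        # e.g. dispatched through GRAM
            running_jobs[resource] = active + len(batch)

def main_loop(running_jobs, pending_jobs, submit, stage_input):
    """Review the resource list every hour and dispatch new jobs as needed."""
    while True:
        review_and_dispatch(running_jobs, pending_jobs, submit, stage_input)
        time.sleep(3600)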

D-GRID Approach (diagram): at each resource, GEO600 jobs arrive through the GRAM4 service alongside other GRAM4 tasks and purely local jobs; the GRAM4 service hands everything to the local scheduler (e.g. Condor at Resource A, SGE at Resource B), which runs the jobs on that resource's computing nodes.

Next time: Cycle Sharing. Edge Computing. Scheduling Algorithms.