The Virtual Grid Application Development Software (VGrADS) Project

Similar documents
Infrastructure for Cloud Computing

The Eucalyptus Open-source Cloud Computing System

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

Building Platform as a Service for Scientific Applications

FREE AND OPEN SOURCE SOFTWARE FOR CLOUD COMPUTING SERENA SPINOSO FULVIO VALENZA

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

A Model-Based Proxy for Unified IaaS Management

Cisco Cloud Portal Delivers Self-Service Provisioning for Data Center Services

An Introduction to Virtualization and Cloud Technologies to Support Grid Computing

The GridWay Meta-Scheduler

Cluster, Grid, Cloud Concepts

Extending Hadoop beyond MapReduce

An Experimental Study of Load Balancing of OpenNebula Open-Source Cloud Computing Platform

Cloud-pilot.doc SA1 Marcus Hardt, Marcin Plociennik, Ahmad Hammad, Bartek Palak E U F O R I A

CLEVER: a CLoud-Enabled Virtual EnviRonment

PRIVACY PRESERVATION ALGORITHM USING EFFECTIVE DATA LOOKUP ORGANIZATION FOR STORAGE CLOUDS

VMM-Level Distributed Transparency Provisioning Using Cloud Infrastructure Technology. Mahsa Najafzadeh, Hadi Salimi, Mohsen Sharifi

A Service for Data-Intensive Computations on Virtual Clusters

Comparison of Several Cloud Computing Platforms

MCSE: Private Cloud Training Course (System Center 2012)

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

SUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager Product Marketing Manager

WORKFLOW ENGINE FOR CLOUDS

An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment

Workprogramme

Scientific versus Business Workflows

HP CLOUD STRATEGY AND SOLUTIONS THE PATH TO HYBRID DELIVERY. Copyright 2011 Hewlett-Packard Development Company, L.P.

Scheduling and Resource Management in Grids and Clouds

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

Comparing Open Source Private Cloud (IaaS) Platforms

Windows Azure platform What is in it for you? Dominick Baier Christian Weyer

Grid Scheduling Architectures with Globus GridWay and Sun Grid Engine

Data Management in the Cloud. Zhen Shi

Manjrasoft Market Oriented Cloud Computing Platform

AWS Account Setup and Services Overview

Lavanya Ramakrishnan Phone: (919)

Cloud and Virtualization to Support Grid Infrastructures

Unified Batch & Stream Processing Platform

Logical Data Models for Cloud Computing Architectures

Open Source Cloud Computing Management with OpenNebula

Improving MapReduce Performance in Heterogeneous Environments

Comparing Ganeti to other Private Cloud Platforms. Lance Albertson

C-Meter: A Framework for Performance Analysis of Computing Clouds

10751-Configuring and Deploying a Private Cloud with System Center 2012

A SURVEY ON WORKFLOW SCHEDULING IN CLOUD USING ANT COLONY OPTIMIZATION

Resource Scalability for Efficient Parallel Processing in Cloud

HPC and Grid Concepts

Towards an Optimized Big Data Processing System

StratusLab project. Standards, Interoperability and Asset Exploitation. Vangelis Floros, GRNET

HPC technology and future architecture

How to Do/Evaluate Cloud Computing Research. Young Choon Lee

A Cost-Evaluation of MapReduce Applications in the Cloud

Cloud Computing Architecture with OpenNebula HPC Cloud Use Cases

BSC vision on Big Data and extreme scale computing

Solution for private cloud computing

A Taxonomy and Survey of Grid Resource Planning and Reservation Systems for Grid Enabled Analysis Environment

TECHNOLOGY WHITE PAPER Jun 2012

ANALYSIS OF WORKFLOW SCHEDULING PROCESS USING ENHANCED SUPERIOR ELEMENT MULTITUDE OPTIMIZATION IN CLOUD

Data Sheet Netrounds Control Center

Daniel J. Adabi. Workshop presentation by Lukas Probst

FP-Hadoop: Efficient Execution of Parallel Jobs Over Skewed Data

CiteSeer x in the Cloud

Moving beyond Virtualization as you make your Cloud journey. David Angradi

LSKA 2010 Survey Report I Device Drivers & Cloud Computing

Scalable Architecture on Amazon AWS Cloud

Technology Enablement

Provisioning and Resource Management at Large Scale (Kadeploy and OAR)

Transcription:

The Virtual Grid Application Development Software (VGrADS) Project VGrADS: Enabling e-science Workflows on Grids and Clouds with Fault Tolerance http://vgrads.rice.edu/

VGrADS Goal: Distributed Problem Solving Where We Want To Be o Transparent computing In an increasingly distributed space Applications to HPC Applications to cloud computing Where We Were (circa 2003) o Low-level hand programming o Programmer had to manage: Heterogeneous resources Scheduling of computation and data movement Fault tolerance and performance adaptation What Progress Have We Made? o Separate application development from resource management o VGrADS provides a uniform virtual grid abstraction atop widely differing resources Provide tools to bridge the gap Scheduling, resource management, distributed launch, simple programming models, fault tolerance, grid economies

Overview of SCʼxx Activities for VGrADS Built on previous SC demonstrations o Gradually built up system to handle LEAD workflow o Previous years focused improved performance estimates, scheduling methods, fault tolerance o Use LEAD as an application driver Current status o VGrADS integrates HPC and cloud resources Using TeraGrid (HPC), Amazon EC2 (cloud), Eucalyptus (cloud) resources Using reservations, batch queues, and on-demand clouds o Scheduling for balancing deadlines, reliability, and cost vges supports search for best set of resources Application-specific trade-offs of reliability, time, cost o Abstractions really do work!

VGrADS Components o Virtual Grid Execution System (vges) Uses Amazon EC2 tools to interact with cloud resources Uses QBETS and Globus to provision batch resources Uses Personal PBS to control execution on batch resources Provides a resource gantt chart view of resources to aid higher level workflow orchestration tool o Eucalyptus Developed for VGrADS (now Eucalyptus Inc.) Implements cloud computing on Xen-enabled clusters Open-source software infrastructure that is compatible with Amazon EC2 vges thinks a Eucalyptus cloud is EC2 o Fault Tolerance (FTR) Schedules a task to increase the probability of successful execution of a task up to a desired level, constrained by resource availability and application deadlines

Executing LEAD Workflow Sets www.leadproject.org o Demonstrate planning and execution of LEAD workflow sets execution atop virtualized cloud and Grid resources. o LEAD Workflow Orchestration schedules a set of independent workflows with characteristics a deadline D (e.g. 2 hours) fraction F such that at least F of the workflows finish by the deadline (e.g. 3/8) o Virtual Grid Execution System (vges) provides an abstraction over batch and cloud systems including Amazon EC2 and Eucalyptus cloud sites.

Workflow Orchestration o Workflow Orchestration o Phase 1: Minimal Scheduling Schedule minimum fraction of workflows using simple probabilistic DAG scheduler o Phase 2: Fault Tolerance Tradeoff Compare scheduling additional workflows with increasing fault-tolerance of one or more tasks of the scheduled workflows o Phase 3: Additional Scheduling Use available slots for other scheduling o Phase 4: EC2 Scheduling - For tasks below certain threshold, schedule on EC2 o Execution Manager A prototype for ordered execution of tasks on the slot based on the schedule determined by the orchestration.

Comparison of the LEAD-VGrADS collaboration system with cyberinfrastructure production deployments Without VGrADS User With VGrADS User Portal Portal Workflow Engine Application Service Execution Manager resource planning Workflow Planner Virtual Grid Execution System query execution plan Workflow Engine Application Service Execution Manager Globus HPC Resources specific protocol based execution EC2 interfaces Cloud resource binding Globus HPC Resources EC2 interfaces Cloud standard execution

Example Scheduling of Workflows (a) Example Workflow Set need F=2/3 B A D C F G E H I J K L (b) Uncoordinated Schedule Without VGrADS Batch Queue Cloud J K L A B C D E F G H I Start Deadline D (c) Coordinated Schedule With VGrADS Batch Queue Cloud J K L A B C D E F G H I E F G H I Start Deadline D

Interaction of system components for resource procurement and planning Workflow Planner resource find & bind schedule DAG Scheduler Fault tolerance EC2 Phase 1,2,3 Phase 2,3 Phase 4 Virtual Grid Execu3on System query batch queue times BQP virtual res. VARQ submit job setup virtual machines GT4, Personal PBS Globus EC2 interfaces HPC Resources Cloud

LEAD / VGrADS Architecture : Putting It All Together Portal LEAD System LEAD BPEL Workflow Engine Event Broker Workflow Planner schedule Application Service App. Factory resource find & bind DAG Scheduler Fault tolerance failures Execution Manager failures Virtual Grid Execution System BQP / VAR VGrADS System Batch Systems Globus HPC and Cloud Resources HPC Cloud GT4, Personal PBS

Visualization Key Resource Name Workflow Color Slot Width Done Failed Run Time Task Status

Snapshot of Execution of 6 Workflows

6 Workflows on 7 Clusters

Infrastructure Timing Metrics Binding time Planning time line Workflow execution time line

Fault Tolerance Exploration Low Reliability High Reliability Medium Reliability

Conclusions VGrADS virtual grid abstraction simplifies o Programming grid and cloud systems for e-science workflows o Managing QoS (performance and reliability) The VGrADS system unifies workflow execution over o batch queue systems (with and without advanced reservations), and o cloud computing sites (including Amazon EC2 and Eucalyptus) The system provides an enabling technology for executing deadline-driven, fault-tolerant workflows The integrated cyber-infrastructure from the LEAD and VGrADS system components provides a strong foundation for next-generation dynamic and adaptive environments for scientific workflows

Thank You VGrADS: Enabling e-science Workflows on Grids and Clouds with Fault Tolerance, L. Ramakrishnan, D. Nurmi, A. Mandal, C. Koelbel, D. Gannon, T. M. Huang, Y. S. Kee, G. Obertelli, K. Thyagaraja, R. Wolski, A. Yarkhan and D. Zagorodnov Paper to be presented at 3:30 p.m., Thursday, Nov. 19 in Room E145-146 Thank you..