Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice


Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Eddy Caron 1, Frédéric Desprez 2, Adrian Mureșan 1, Frédéric Suter 3, Kate Keahey 4
1 École Normale Supérieure de Lyon, France; 2 INRIA; 3 IN2P3 Computing Center, CNRS; 4 UChicago, Argonne National Laboratory
Joint-lab workshop

Outline: Context; Theory (Models, Proposed solution, Simulations); Practice (Architecture, Application, Experimentation); Conclusions and perspectives

Workflows are a common pattern in scientific applications:
- applications built on legacy code
- applications built as an aggregate, using inherent task parallelism
- phenomena with an inherent workflow structure
Workflows are omnipresent!

Figure: Workflow application examples: (a) Ramses, (b) Montage, (c) Ocean current modeling, (d) Epigenomics

Classic model of resource provisioning: static allocations in a grid environment
- researchers compete for resources
- researchers tend to over-provision and under-use
- workflow applications have a non-constant resource demand
This is inefficient, but can it be improved? Yes! How? On-demand resources, automated resource provisioning, and smarter scheduling strategies.

Why on-demand resources?
- more efficient resource usage
- eliminates overbooking of resources
- can be easily automated
- virtually unlimited resources
Our goal:
- consider a more general model of workflow applications
- consider on-demand resources and a budget limit
- find a good allocation strategy

Related work
- Functional workflows: Bahsi, E.M., Ceyhan, E., Kosar, T.: Conditional Workflow Management: A Survey and Analysis. Scientific Programming 15(4), 283-297 (2007)
- bi-CPA: Desprez, F., Suter, F.: A Bi-Criteria Algorithm for Scheduling Parallel Task Graphs on Clusters. In: Proc. of the 10th IEEE/ACM Intl. Symposium on Cluster, Cloud and Grid Computing. pp. 243-252 (2010)
- Chemical programming for workflow applications: Fernandez, H., Tedeschi, C., Priol, T.: A Chemistry Inspired Workflow Management System for Scientific Applications in Clouds. In: Proc. of the 7th Intl. Conference on E-Science. pp. 39-46 (2011)
- Pegasus: Malawski, M., Juve, G., Deelman, E., Nabrzyski, J.: Cost- and Deadline-Constrained Provisioning for Scientific Workflow Ensembles in IaaS Clouds. 24th IEEE/ACM International Conference on Supercomputing (SC12) (2012)

Outline: Context; Theory (Models, Proposed solution, Simulations); Practice (Architecture, Application, Experimentation); Conclusions and perspectives

Figure: Workflow types: (a) Classic, (b) PTG, (c) Functional

Application model: non-deterministic (functional) workflows
An application is a graph G = (V, E), where:
- V = {v_i | i = 1, ..., |V|} is a set of vertices
- E = {e_{i,j} | (i, j) ∈ {1, ..., |V|} × {1, ..., |V|}} is a set of edges representing precedence and flow constraints
A vertex can be:
- a computational task [parallel, moldable]
- an OR-split [transitions described by random variables]
- an OR-join
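As a minimal sketch of this model (the class names, branch probabilities, and use of Python's `random` module are illustrative assumptions, not the paper's implementation), a non-deterministic workflow can be represented as a graph of task vertices and OR-split vertices whose outgoing transition is sampled from a random variable:

```python
import random

class Task:
    """A computational task vertex; successors encode precedence edges."""
    def __init__(self, name):
        self.name = name
        self.successors = []

class OrSplit:
    """A non-deterministic OR-split vertex: which outgoing edge fires is
    described by a random variable (here, simple branch probabilities)."""
    def __init__(self, name, branches):
        self.name = name
        self.branches = branches  # list of (successor, probability)

    def fire(self, rng):
        # Sample one outgoing edge according to the branch probabilities.
        r, acc = rng.random(), 0.0
        for succ, prob in self.branches:
            acc += prob
            if r < acc:
                return succ
        return self.branches[-1][0]

# A tiny workflow: an OR-split choosing between two tasks with p = 0.5 each
task2, task3 = Task("t2"), Task("t3")
split = OrSplit("or1", [(task2, 0.5), (task3, 0.5)])
chosen = split.fire(random.Random(0))
```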

Figure: Example workflow

Platform model: a provider of on-demand resources from a catalog C = {vm_i = (ncpu_i, cost_i) | i ≥ 1}, where:
- ncpu_i is the number of equivalent virtual CPUs
- cost_i is a monetary cost per running hour (Amazon-like)
- communication follows a bounded multi-port model
Makespan: C = max_i C(v_i), where C(v_i) is the finish time of task v_i ∈ V.
Cost of a schedule S: Cost = Σ_{vm_i ∈ S} (T_end_i - T_start_i) × cost_i, where T_start_i and T_end_i are the start and end times of vm_i, and cost_i is the catalog cost of virtual resource vm_i.
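The cost formula can be sketched directly. This is a minimal sketch: the `VMUsage` structure and the optional per-started-hour rounding are assumptions (Amazon-like billing charges each started hour in full):

```python
import math
from dataclasses import dataclass

@dataclass
class VMUsage:
    start: float          # T_start_i, in hours
    end: float            # T_end_i, in hours
    cost_per_hour: float  # catalog cost_i

def schedule_cost(schedule, hourly_billing=True):
    """Cost = sum over VMs in the schedule of (T_end - T_start) * cost.

    With hourly_billing=True, each VM is billed per started hour
    (Amazon-like); otherwise the billing is exact.
    """
    total = 0.0
    for vm in schedule:
        duration = vm.end - vm.start
        if hourly_billing:
            duration = math.ceil(duration)  # each started hour is charged
        total += duration * vm.cost_per_hour
    return total

# Two m1.small VMs (0.09/hour) running 1.5 hours each:
# billed as 2 started hours each under hourly billing
vms = [VMUsage(0.0, 1.5, 0.09), VMUsage(0.5, 2.0, 0.09)]
```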

Problem statement: given a workflow application G, a provider of resources from the catalog C, and a budget B, find a schedule S such that:
- Cost ≤ B (the budget limit is not exceeded)
- C (the makespan) is minimized with a predefined confidence

Proposed approach
1. Decompose the non-DAG workflow into DAG sub-workflows
2. Distribute the budget among the sub-workflows
3. Determine allocations by adapting an existing allocation approach

Step 1: Decomposing the workflow. Figure: Decomposing a non-trivial workflow

Step 2: Allocating budget
1. Compute the number of executions of each sub-workflow: the number of transitions of the edge connecting its parent OR-split to its start node, described by a random variable following a distinct normal distribution, together with a confidence parameter.
2. Give each sub-workflow a share of the budget proportional to its work contribution. The work contribution of a sub-workflow G_i is the sum of the average execution times of its tasks, where:
- the average execution time is computed over the catalog C
- the task speedup model is taken into consideration
- multiple executions of a sub-workflow are also considered
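The proportional split of step 2 can be sketched as follows (the data structure mapping each sub-workflow to its expected number of executions and average task times is an illustrative assumption, not the paper's exact model):

```python
def distribute_budget(total_budget, subworkflows):
    """Split the budget proportionally to each sub-workflow's work
    contribution (expected executions x sum of average task times).

    `subworkflows` maps a sub-workflow id to a pair
    (expected_executions, [avg_task_time, ...]).
    """
    work = {
        wf_id: n_exec * sum(task_times)
        for wf_id, (n_exec, task_times) in subworkflows.items()
    }
    total_work = sum(work.values())
    return {wf_id: total_budget * w / total_work for wf_id, w in work.items()}

# G1 runs once with 30 time units of work; G2 runs 3 times with 10 units
# per run, so both contribute 30 units and get equal budget shares.
subs = {"G1": (1, [10.0, 20.0]), "G2": (3, [5.0, 5.0])}
shares = distribute_budget(60.0, subs)
```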

Step 3: Determining allocations. Two algorithms based on the bi-CPA algorithm.
Eager algorithm:
- one allocation for each task
- good trade-off between makespan and average time-cost area
- fast algorithm
- considers allocation-time cost estimations only
Deferred algorithm:
- outputs multiple allocations for each task
- good trade-off between makespan and average time-cost area
- slower algorithm
- one allocation is chosen at scheduling time

Algorithm parameters
- TA_over, TA_under: average work allocated to tasks
- T_CP: duration of the critical path
- B: estimation of the used budget when TA and T_CP meet
- TA keeps increasing as we increase the allocation of tasks and T_CP keeps decreasing, so they will eventually meet. When they do meet, we have a trade-off between the average work in tasks and the makespan.
- p(v_i): number of processing units allocated to task v_i

The eager allocation algorithm
Algorithm 1: Eager-allocate(G_i = (V_i, E_i), B_i)
1: for all v ∈ V_i do
2:   Alloc(v) ← {min_{vm_i ∈ C} ncpu_i}
3: end for
4: Compute B
5: while T_CP > TA_over and Σ_{j=1}^{|V_i|} cost(v_j) ≤ B_i do
6:   for all v_i ∈ Critical Path do
7:     Determine Alloc'(v_i) such that p'(v_i) = p(v_i) + 1
8:     Gain(v_i) ← T(v_i, Alloc(v_i)) / p(v_i) - T(v_i, Alloc'(v_i)) / p'(v_i)
9:   end for
10:  Select v such that Gain(v) is maximal
11:  Alloc(v) ← Alloc'(v)
12:  Update TA_over and T_CP
13: end while
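A heavily simplified, runnable sketch of this loop for a chain of tasks. The linear speedup model, the plain CPU-count "catalog", and the per-CPU-hour pricing are all assumptions standing in for the paper's moldable-task and catalog models:

```python
def eager_allocate(work, budget, max_cpus, cost_per_cpu_hour):
    """Simplified Eager-allocate for a chain of tasks.

    `work` maps task name -> sequential work (CPU-hours). Every task
    starts on the smallest allocation (1 CPU). While the critical path
    T_CP exceeds the average allocated work TA and the estimated cost
    stays within the budget, grant one extra CPU to the task with the
    largest Gain, as in Algorithm 1.
    """
    alloc = {t: 1 for t in work}

    def runtime(t):                      # linear speedup assumption
        return work[t] / alloc[t]

    def cost():                          # allocation-time cost estimate
        return sum(runtime(t) * alloc[t] * cost_per_cpu_hour for t in work)

    while True:
        t_cp = sum(runtime(t) for t in work)   # chain: CP = sum of runtimes
        ta = sum(runtime(t) * alloc[t] for t in work) / len(work)
        if t_cp <= ta or cost() > budget:
            break

        def gain(t):                     # Gain(v) = T/p - T'/p'
            if alloc[t] >= max_cpus:
                return float("-inf")
            return runtime(t) / alloc[t] - (work[t] / (alloc[t] + 1)) / (alloc[t] + 1)

        best = max(work, key=gain)
        if gain(best) <= 0:
            break
        alloc[best] += 1
    return alloc

# Two tasks of 4 and 1 CPU-hours, up to 4 CPUs each, 0.1 per CPU-hour
allocation = eager_allocate({"a": 4.0, "b": 1.0}, budget=10.0,
                            max_cpus=4, cost_per_cpu_hour=0.1)
```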

Methodology
- Simulation using SimGrid
- 864 synthetic workflows for three types of applications: Fast Fourier Transform, Strassen matrix multiplication, and random workloads
- a virtual resource catalog inspired by Amazon EC2
- a classic list scheduler for task mapping
- measured cost and makespan after task mapping

Name          #VCPUs  Network performance   Cost / hour
m1.small      1       moderate              0.09
m1.med        2       moderate              0.18
m1.large      4       high                  0.36
m1.xlarge     8       high                  0.72
m2.xlarge     6.5     moderate              0.506
m2.2xlarge    13      high                  1.012
m2.4xlarge    26      high                  2.024
c1.med        5       moderate              0.186
c1.xlarge     20      high                  0.744
cc1.4xlarge   33.5    10 Gigabit Ethernet   0.186
cc2.8xlarge   88      10 Gigabit Ethernet   0.744
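The catalog above can be encoded directly as data, with a small helper that a scheduler might use to pick the cheapest instance satisfying a CPU requirement (the helper itself is an illustrative assumption; values are copied from the table):

```python
# The EC2-inspired catalog from the table: (name, vcpus, cost per hour)
CATALOG = [
    ("m1.small", 1, 0.09), ("m1.med", 2, 0.18), ("m1.large", 4, 0.36),
    ("m1.xlarge", 8, 0.72), ("m2.xlarge", 6.5, 0.506),
    ("m2.2xlarge", 13, 1.012), ("m2.4xlarge", 26, 2.024),
    ("c1.med", 5, 0.186), ("c1.xlarge", 20, 0.744),
    ("cc1.4xlarge", 33.5, 0.186), ("cc2.8xlarge", 88, 0.744),
]

def cheapest_vm(min_vcpus):
    """Return the cheapest catalog entry with at least `min_vcpus`
    virtual CPUs, or None if no instance type is large enough."""
    candidates = [vm for vm in CATALOG if vm[1] >= min_vcpus]
    return min(candidates, key=lambda vm: vm[2]) if candidates else None
```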

Figure: Relative makespan (Eager / Deferred) for all workflow applications, plotted against the budget limit (1 to 50); the "Equality" line marks a ratio of 1.

Figure: Relative cost (Eager / Deferred) for all workflow applications, plotted against the budget limit (1 to 45); the "Equality" line marks a ratio of 1.

First conclusions
- Eager is fast but cannot guarantee the budget constraint after mapping
- Deferred is slower, but guarantees the budget constraint
- above a certain budget, they yield identical allocations
- for small applications and small budgets, Deferred should be preferred; when the size of the application increases or the budget limit approaches task-parallelism saturation, Eager is preferable

Outline: Context; Theory (Models, Proposed solution, Simulations); Practice (Architecture, Application, Experimentation); Conclusions and perspectives

Architecture. Figure: System architecture. The scientist sends a workflow to the DIET (MADAG) workflow engine, which acquires and releases resources through the Phantom service; Phantom manages autoscaling groups of Nimbus VMs (VM1 ... VMn, each running a Phantom frontend and SeDs such as Grafic1 and Ramses), onto which workflow tasks are deployed; the result is returned to the scientist.

Main components: Nimbus
- open-source IaaS provider
- provides low-level resources (VMs)
- compatible with the Amazon EC2 interface
- used a FutureGrid install

Main components: Phantom
- auto-scaling and high-availability provider
- high-level resource provider
- subset of the Amazon auto-scaling service
- part of the Nimbus platform
- used a FutureGrid install
- still under development

Main components: MADAG workflow engine
- part of the DIET (Distributed Interactive Engineering Toolkit) software
- one service implementation per task
- each service launches its associated task
- supports DAG, PTG and functional workflows

Main components: Client
- describes the workflow in XML
- implements the services
- calls the workflow engine
- no explicit resource management
- selects the IaaS provider to deploy on

How does it work? Figure: System architecture (same diagram as above), showing the flow: send workflow, acquire/release resources, manage autoscaling groups, deploy workflow tasks, receive result.

RAMSES
- n-body simulations of dark matter interactions, the backbone of galaxy formation
- AMR (adaptive mesh refinement) workflow application
- parallel (MPI) application
- can refine at different zoom levels

Figure: The RAMSES workflow

Methodology
- used a FutureGrid Nimbus installation as a testbed
- measured running time for static and dynamic allocations
- estimated cost for each allocation
- varied the maximum number of used resources

Figure: Slice through a 2^8 × 2^8 × 2^8 box simulation

Results. Figure: Running times (MM:SS) for a 2^6 × 2^6 × 2^6 box simulation, plotted against the number of VMs (2 to 16): runtime (static), VM allocation (dynamic), and runtime (dynamic).

Results. Figure: Estimated costs (in units) for a 2^6 × 2^6 × 2^6 box simulation, plotted against the number of VMs (0 to 16): static vs. dynamic allocation.

Outline: Context; Theory (Models, Proposed solution, Simulations); Practice (Architecture, Application, Experimentation); Conclusions and perspectives

Conclusions
- proposed two algorithms, Eager and Deferred, each with its own pros and cons
- on-demand resources can better model workflow usage
- on-demand resources have a VM allocation overhead
- the allocation overhead decreases with the number of VMs
- for RAMSES, the cost is greatly reduced

Perspectives
- preallocate VMs
- spot instances
- smarter scheduling strategy: determine, per application type, where the tipping point is
- compare our algorithms with others
Collaborations
- continue the collaboration with the Nimbus/FutureGrid teams
- on the algorithms themselves (currently too complicated for an actual implementation)
- understanding (obtaining models of) clouds and virtualized platforms: going from theoretical algorithms to (accurate) simulations and actual implementations

References
- Eddy Caron, Frédéric Desprez, Adrian Muresan and Frédéric Suter. Budget Constrained Resource Allocation for Non-Deterministic Workflows on an IaaS Cloud. 12th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP-12), Fukuoka, Japan, September 4-7, 2012.
- Adrian Muresan, Kate Keahey. Outsourcing Computations for Galaxy Simulations. In eXtreme Science and Engineering Discovery Environment 2012 (XSEDE12), Chicago, Illinois, USA, June 15-19, 2012. Poster session.
- Adrian Muresan. Scheduling and Deployment of Large-Scale Applications on Cloud Platforms. PhD thesis, Laboratoire de l'Informatique du Parallélisme (LIP), ENS Lyon, France, December 10, 2012.