DYNAMIC CLOUD PROVISIONING FOR SCIENTIFIC GRID WORKFLOWS
Simon Ostermann, Radu Prodan and Thomas Fahringer
Institute of Computer Science, University of Innsbruck
Technikerstrasse 21a, Innsbruck, Austria
simon@dps.uibk.ac.at
OVERVIEW
Introduction
Optimized Cloud Provisioning
  Cloud Start
  Instance Size
  Grid Scheduling
  Cloud Stop
Evaluation using 3 scientific workflows: Wien2k, Invmod, MeteoAG
Conclusion
INTRODUCTION
Infrastructure as a Service: a branch of Cloud computing
On-demand resources, e.g. Amazon EC2, GoGrid, ...
Other common Cloud computing areas not covered:
  Platform as a Service
  Software as a Service
  Specialized solutions for storage, web hosting, ...
CLOUD COMPUTING FOR SCIENTIFIC COMPUTING?
Rent resources instead of buying own hardware
Eliminates permanent operation, maintenance, and depreciation costs
Scale an infrastructure up/down based on temporary, immediate needs
Significantly reduces over-provisioning
Virtualised resources enable scalable deployment and provisioning of application software
Reliability through business SLA relationships that bind actors to offer higher QoS guarantees
CLOUD MODELS
Cloud computing is mostly billed on an hourly basis; some research papers assume a finer granularity
[Figure: Cloud instance lifecycle with example durations — Unallocated, Requested, Starting, Running/Accessible (usable interval), Shutting down, Terminated]
Interesting problems arise:
  How much of the paid full hour do I use?
  How can I maximize the usage / minimize the cost?
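The hourly billing question can be made concrete with a small sketch (hypothetical helper names; it only assumes per-started-hour billing as offered by e.g. Amazon EC2 at the time):

```python
import math

HOUR = 3600  # instances are billed per started hour

def billed_hours(used_seconds):
    """Hours charged for an instance that was accessible for used_seconds."""
    return max(1, math.ceil(used_seconds / HOUR))

def utilization(used_seconds):
    """Fraction of the paid time actually used."""
    return used_seconds / (billed_hours(used_seconds) * HOUR)

# An instance used for 70 minutes is billed 2 full hours:
print(billed_hours(70 * 60))           # 2
print(round(utilization(70 * 60), 2))  # 0.58
```

Maximizing usage therefore means either filling the remainder of a started hour with further work, or finishing close to an hour boundary.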
GRID COMPUTING
The Grid has emerged as a worldwide shared distributed platform for solving large-scale scientific problems
Grid computing with additional Cloud resources to speed up scientific computing
Just-in-time scheduler from ASKALON, a workflow execution system for Grid and Cloud resources
ASKALON is a workflow system developed by the DPS group at the University of Innsbruck
Multiple scientific workflows from different fields of science
GROUDSIM
Grid and Cloud simulator
Event-based for scalability reasons
Experiments showed up to 90% better performance and better scalability than GridSim
Java-based, to allow integration into existing software
Simulation allows wide analysis of Cloud usage without expenses
Simulation results match real executions
GROUDSIM ARCHITECTURE
[Figure: GroudSim architecture — an event-based simulation engine (put events in list / get next event) drives the infrastructure and application simulation; failure generation, job submission, file transfers, and callbacks connect the simulated Grid and Cloud entities]
OPTIMIZED CLOUD PROVISIONING
Analysis of regular executions and the resulting costs
The analysis revealed multiple parts needing optimization
Choices have to be made about the start and stop of resources and the number of instances requested
Four optimizations were found, defined as algorithms (in the paper) and exploited in the evaluation
CLOUD START
Parallel Grid regions with more tasks than available cores
Depending on Cloud and Grid speed, serialization and imbalance overheads are analyzed
When minimization of the runtime of the parallel section is possible, Cloud resources are started
[Figure: Gantt charts of a parallel region on three Grid cores, without and with an additional Cloud core]
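The start criterion above — launch Cloud instances only when they shorten the parallel section — can be sketched as follows (hypothetical function names and a simplified uniform-task model, not the paper's algorithm):

```python
import math

def grid_makespan(n_tasks, grid_cores, grid_task_time):
    # Surplus tasks serialize into additional waves on the Grid cores.
    return math.ceil(n_tasks / grid_cores) * grid_task_time

def should_start_cloud(n_tasks, grid_cores, grid_task_time,
                       cloud_cores, cloud_task_time, startup_overhead):
    """Start Cloud instances only if offloading the surplus tasks
    shortens the parallel region."""
    if n_tasks <= grid_cores:
        return False  # no serialization on the Grid, nothing to gain
    surplus = n_tasks - grid_cores
    cloud_finish = startup_overhead + math.ceil(surplus / cloud_cores) * cloud_task_time
    combined = max(grid_task_time, cloud_finish)
    return combined < grid_makespan(n_tasks, grid_cores, grid_task_time)

# 4 tasks of 120s on 3 Grid cores need two waves (240s); a slow Cloud
# core needing 250s for the fourth task would not help:
print(should_start_cloud(4, 3, 120, 1, 250, 0))   # False
# A faster Cloud core (100s per task, 20s startup) finishes at 120s:
print(should_start_cloud(4, 3, 120, 1, 100, 20))  # True
```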
INSTANCE SIZE
Instances may offer different numbers of cores
When only part of the Cloud cores are used, the cost efficiency is lower
Getting too few cores may result in serialization / no benefit
It is important to decide whether the number of instances to request is rounded up or down, resulting in 2 behaviors:
  generous: better performance but more expensive
  economical: less expensive but performance may not improve
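The two rounding behaviors amount to a ceiling versus a floor on the instance count (a minimal sketch with a hypothetical helper name):

```python
import math

def instances_to_request(tasks, cores_per_instance, policy):
    """Number of instances to request for `tasks` parallel tasks."""
    exact = tasks / cores_per_instance
    if policy == "generous":
        return math.ceil(exact)   # enough cores for every task, some may idle
    return math.floor(exact)      # economical: no idle cores, surplus tasks serialize

# 10 tasks on 8-core instances:
print(instances_to_request(10, 8, "generous"))    # 2 (6 cores idle)
print(instances_to_request(10, 8, "economical"))  # 1 (2 tasks serialize)
```

Note that the economical policy may request no instance at all when the tasks fit on less than one instance's cores.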
GRID SCHEDULING
The Grid is a dynamic, shared environment
Resources may become available while the workflow execution uses Cloud resources
Rescheduling jobs back to the Grid might save cost / might decrease execution time
Decisions are made depending on the work already completed by a job mapped to a Cloud resource and the speed difference between Grid and Cloud
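A simplified sketch of that rescheduling decision (hypothetical names; the slide's two criteria reduced to remaining work, relative speeds, and the time left in the already-paid hour):

```python
HOUR = 3600

def should_reschedule_to_grid(remaining_work, cloud_speed, grid_speed,
                              seconds_left_in_paid_hour):
    """Move a job from a Cloud instance to a newly freed Grid resource
    if that decreases execution time or avoids a further paid hour."""
    time_on_cloud = remaining_work / cloud_speed
    time_on_grid = remaining_work / grid_speed
    saves_time = time_on_grid < time_on_cloud
    saves_cost = time_on_cloud > seconds_left_in_paid_hour  # would need another hour
    return saves_time or saves_cost

# A fast Grid node halves the remaining time -> reschedule:
print(should_reschedule_to_grid(1200, 1.0, 2.0, HOUR))  # True
# A slow Grid node, and the Cloud job fits in the paid hour -> keep it:
print(should_reschedule_to_grid(1200, 1.0, 0.5, HOUR))  # False
```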
CLOUD STOP
Unused resources are shut down to save money
A shutdown after 5 minutes of a paid hour is as expensive as one after 58 minutes
Resources might be reused in the upcoming 53 minutes, and this reuse reduces the overall Cloud provisioning overheads
The shutdown time falls within the paid period, therefore the point in time has to be chosen knowing the shutdown duration of the Cloud
In some cases one hour of Cloud time can be saved
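Since any point within a paid hour costs the same, an idle instance can be kept accessible for reuse until the last moment at which a shutdown still completes inside the paid hour. A sketch of that deadline computation (hypothetical helper names):

```python
HOUR = 3600

def shutdown_trigger_time(start_time, now, shutdown_duration):
    """Latest moment to trigger a shutdown so that the instance
    terminates within the hour already paid for."""
    hours_paid = (now - start_time) // HOUR + 1
    paid_until = start_time + hours_paid * HOUR
    return paid_until - shutdown_duration

def should_shut_down(start_time, now, shutdown_duration, idle):
    # An idle instance stays accessible for reuse until just before
    # the paid hour runs out; shutting it down earlier saves nothing.
    return idle and now >= shutdown_trigger_time(start_time, now, shutdown_duration)

# With a 2-minute shutdown duration, keep an idle instance until second 3480:
print(should_shut_down(0, 1000, 120, idle=True))  # False (wait for reuse)
print(should_shut_down(0, 3480, 120, idle=True))  # True  (last safe moment)
```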
EVALUATION Three different scientific workflows with different levels of parallelism Execution simulated using GroudSim Impact of different optimizations on the three workflows when using 3 different types of Cloud resources and 3 Clusters from the Austrian Grid
METRIC
Comparison of executions on Grid resources and executions using Grid plus additional on-demand Cloud resources
We define a new metric CT called cost per unit of saved time ($/T)
It represents how much a unit of saved execution time costs, under the assumption that Grid resources are freely available
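The CT metric follows directly from the two execution times and the Cloud bill (a sketch with hypothetical names; it assumes, as stated above, that the Grid resources themselves are free):

```python
def cost_per_saved_time(grid_time, grid_plus_cloud_time, cloud_cost):
    """CT ($/T): Cloud cost per unit of execution time saved,
    with Grid resources assumed to be free."""
    saved = grid_time - grid_plus_cloud_time
    if saved <= 0:
        return float("inf")  # the Cloud brought no speedup
    return cloud_cost / saved

# Saving 10 hours of execution for $100 of Cloud time costs 10 $/hour saved:
print(cost_per_saved_time(30, 20, 100))  # 10.0
```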
WORKFLOWS
From different fields of science, with different structures
Parallelisation size x: a factor representing the number of tasks in a workflow, evaluated for values from 1 to 900
Computationally intensive; data transfers are a small part of each workflow
Cloud network speed and storage influence kept low
Simulation data based on real executions in the Austrian Grid
GENERAL OBSERVATIONS
[Figure: Cost over parallelisation size (0–900) for Grid+m1.small, Grid+m1.large and Grid+c1.xlarge, each with and without the Cloud stop optimization]
Comparison of regular and optimized executions of differently sized workflows
WIEN2K
Vienna University of Technology
Theoretical chemistry (materials science)
Electronic structure calculations for solids using density functional theory
Number of activities: 2 * x + 3, x = parallelisation size
WIEN2K
[Figures: execution times and cost on the Grid and with additional Cloud resources (m1.small, m1.large, c1.xlarge), and cost per unit of saved time (CT, $/T) on a logarithmic scale, over parallelisation size 0–900]
INVMOD
A hydrological application using the Levenberg-Marquardt algorithm to minimize the error between simulation and measurements
Number of activities: 12 * x + 1, x = parallelisation size
INVMOD
[Figures: execution times and cost on the Grid and with additional Cloud resources (m1.small, m1.large, c1.xlarge), and cost per unit of saved time (CT, $/T) on a logarithmic scale, over parallelisation size 50–300]
METEOAG
Meteorology and Geophysics Institute
Meteorological simulations with the numerical model RAMS
Resolves alpine watersheds and thunderstorms in the Arlberg region of western Austria
Number of activities: 69 * x + 2, x = parallelisation size
[Figure: MeteoAG workflow — simulation_init, then for each case: case_init, rams_makevfile (initial conditions), rams_init (6 h simulation), revu_compare (post-process), raver (verify and select); if continued: rams_hist (18 h simulation) and revu_dump (post-process); finally stageout]
METEOAG
[Figures: execution times and cost on the Grid and with additional Cloud resources (m1.small, m1.large, c1.xlarge), and cost per unit of saved time (CT, $/T) on a logarithmic scale, over parallelisation size 50–300]
CONCLUSION
The granularity of Cloud payment plays an important role in Cloud allocation decisions
Optimizations like the ones presented are needed to allow efficient usage of this dynamic resource class
The longer Cloud resources are needed, the lower the impact of the optimizations
A future extension with full graph scheduling algorithms is planned
THANK YOU Any questions?