Experiments on cost/power and failure aware scheduling for clouds and grids




Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad
Universidade do Porto, Faculdade de Engenharia, LIACC
Porto, Portugal, jbarbosa@fe.up.pt

Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters


Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
Cloud computing paradigm:
- Dynamic provisioning of computing services.
- Employs Virtual Machine (VM) technologies for consolidation and environment isolation purposes.
- Node failures can occur due to hardware or software problems.
Image source: http://www.commputation.kit.edu/92.php

Characteristics
Dependability of the infrastructure:
- Distributed systems continue to grow in scale and complexity.
- Failures become the norm, which can lead to violations of the negotiated SLAs.
- The Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1).
Energy consumption:
- The main part of energy consumption is determined by the CPU.
- Energy consumption dominates the operational costs.
[Figure: tasks 1 to n run in VMs, which are hosted through VMMs on physical machines (PMs) 1 to m.]
(1) S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," Journal of Parallel and Distributed Computing, vol. 70, April 2010, pp. 384-393, doi: 10.1016/j.jpdc.2010.01.002.

Related Work
Dynamic allocation of VMs, considering the reliability of PMs, based on a failure predictor tool with 76.5% accuracy:
(1) Optimistic Best-Fit (OBFIT) algorithm
- Selects the PM with the minimum weighted available capacity and reliability.
(2) Pessimistic Best-Fit (PBFIT) algorithm
- Also selects unreliable PMs in order to increase the job completion rate.
- Selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum required capacity, where C_avg is the average capacity of the reliable PMs.
[Figure: proposed architecture for reconfigurable distributed VMs (1).]

Approach
The goal: construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadline.
- It is a best-effort approach, not an SLA-based approach.
- Virtual-to-physical resource mapping decisions must consider both the power efficiency and the reliability levels of compute nodes.
- Virtual-to-physical configurations are updated dynamically (CPU usage and migration).

Approach
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto optimal solutions and letting the user select the best solution.
2- Combining the two functions in a single objective function.
3- Bicriteria scheduling, in which the user specifies a limit for one criterion (power or budget constraints), and the algorithm tries to optimize the other criterion under this constraint.
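The three ways of handling two objectives can be illustrated with a small sketch. It is not taken from the slides: the candidate schedules, the (time, cost) tuple layout, the weight `w`, and the function names are all hypothetical, and both objectives are assumed to be minimized.

```python
def pareto_front(points):
    """Way 1: keep the (time, cost) points that no other point dominates."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


def weighted_sum_choice(points, w):
    """Way 2: merge both objectives into one function (assumed weight w in [0, 1])."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


def constrained_choice(points, budget):
    """Way 3: bound the cost, then minimize time among feasible points."""
    feasible = [p for p in points if p[1] <= budget]
    return min(feasible, key=lambda p: p[0]) if feasible else None


# Hypothetical candidate schedules as (time, cost) pairs.
candidates = [(10, 9), (12, 5), (15, 4), (14, 8), (20, 4)]
front = pareto_front(candidates)
best_weighted = weighted_sum_choice(candidates, w=0.5)
best_in_budget = constrained_choice(candidates, budget=5)
```

In practice the weighted sum would need both objectives normalized to comparable scales; the raw values are used here only to keep the sketch short.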

Approach
Leverage virtualization tools:
- Xen credit scheduler: dynamically update the cap parameter, but enforcing work-conserving behavior.
- Stop & copy migration: faster VM migrations, preferable for proactive failure management.
[Figure: CPU% (0 to 100) over time for VMs on PM1, PM2 and PM3, annotated with increasing CPU power consumption, a failure, stop & copy migration, and failure prediction accuracy.]

System Overview
Cloud architecture:
- Private cloud.
- Homogeneous PMs.
- A cluster coordinator manages user jobs.
- VMs are created and destroyed dynamically.
User jobs:
- A job is a set of independent tasks.
- A task runs in a single VM, and its CPU-intensive workload is known.
- The number of tasks per job and the task deadlines are defined by the user.
[Figure: private cloud management architecture.]

Power Model
- Linear power model: P = p1 + p2 · CPU%
- Power efficiency of P
- Completion rate of users' jobs
- Working efficiency: measures the quantity of useful work done (i.e. completed users' jobs) relative to the consumed power.
[Figure: example of a power efficiency curve (p1 = 175 W, p2 = 75 W).]
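A minimal sketch of the linear power model, using the example values p1 = 175 W and p2 = 75 W from the slide. CPU% is passed as a fraction between 0 and 1, which is an assumption about units. The `work_per_watt` ratio is only an illustrative stand-in, not the power efficiency definition used by the authors, whose formula is not preserved here.

```python
def power_watts(cpu_fraction, p1=175.0, p2=75.0):
    """Linear power model P = p1 + p2 * CPU%, with CPU% as a fraction in [0, 1]."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    return p1 + p2 * cpu_fraction


def work_per_watt(cpu_fraction, p1=175.0, p2=75.0):
    """Assumed illustrative ratio: CPU share delivered per watt drawn."""
    return cpu_fraction / power_watts(cpu_fraction, p1, p2)


idle_power = power_watts(0.0)   # only the fixed term p1
full_power = power_watts(1.0)   # p1 + p2
half_power = power_watts(0.5)
```

Because the idle term p1 dominates, the assumed ratio grows with utilization, which is the usual argument for consolidating VMs onto fewer, busier PMs.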

Proposed algorithms
Minimum Time Task Execution (MTTE) algorithm
Constraints: slack time to accomplish task t; capacity of PM i.
Selects a PM if:
- it guarantees the maximum processing power required by the VM (task);
- it has higher reliability; and
- it increases CPU power efficiency.

Proposed algorithms
Relaxed Time Task Execution (RTTE) algorithm
Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of resources necessary to accomplish the task within its deadline.
[Figure: VM cap set in the Xen credit scheduler, as a share of host CPU between 0% and 100%.]
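The idea of reserving only the minimum CPU share needed to meet a deadline can be sketched as follows. This is an assumption-laden illustration, not the authors' RTTE implementation: the function name, the workload unit (millions of floating-point operations), and the default PM speed of 800 MFLOPS (borrowed from the simulation setup) are all choices made for the example.

```python
def rtte_cap(remaining_mflop, seconds_to_deadline, pm_mflops=800.0):
    """Smallest CPU share in (0, 1] that finishes the remaining work by the deadline.

    Returns None when even 100% of the PM would miss the deadline.
    """
    if seconds_to_deadline <= 0:
        return None
    cap = remaining_mflop / (pm_mflops * seconds_to_deadline)
    return cap if cap <= 1.0 else None


# 40,000 MFLOP left and 100 s to go on an 800 MFLOPS core: half the CPU is enough.
cap = rtte_cap(40_000, 100)
```

A scheduler built on this would recompute the cap as the deadline approaches or after a migration, since both change the remaining work and the remaining time.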

Performance Analysis
Simulation setup:
- 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS;
- VM stop & copy migration overhead of 12 s;
- 30 synthetic jobs, each consisting of 5 CPU-intensive tasks;
- Failed PMs stay unavailable for 60 s;
- The predicted occurrence time of a failure precedes the actual occurrence time;
- Failure instants, job arrival times, and task workload sizes follow a uniform distribution.

Performance Analysis
Implementation considerations:
- Stabilization to avoid multiple migrations.
- Concurrency among cluster coordinators.
Algorithms compared to ours:
- Common Best-Fit (CBFIT): selects the PM with the maximum power efficiency and does not consider resource reliability.
- Optimistic Best-Fit (OBFIT).
- Pessimistic Best-Fit (PBFIT).

Performance Analysis
Migrations occurring due to proactive failure management only:
- The failure predictor tool has 76.5% accuracy;
- The RTTE algorithm presents the best results;
- Working efficiency, as well as the job completion rate, decreases as failure prediction becomes less accurate.

Performance Analysis
Migrations occurring due to proactive failure management and power efficiency:
- Sliding window of 36 seconds, with a threshold of 65% (a migration starts if CPU usage is below 65%);
- RTTE returns the best results for 76.5% failure prediction accuracy;
- Compared to the earlier results, the rate of completed jobs diminishes, since the number of VM migrations increases.
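The sliding-window trigger can be sketched as below. The slides give only the window length (36 s) and the threshold (65%); sampling once per second, averaging over the window, and waiting until the window is full before triggering are assumptions made for this example, as is the class name.

```python
from collections import deque


class UnderloadDetector:
    """Flags a PM for consolidation when its recent CPU usage stays low."""

    def __init__(self, window_seconds=36, threshold=0.65, sample_period=1):
        # One sample per sample_period seconds; old samples fall out automatically.
        self.samples = deque(maxlen=window_seconds // sample_period)
        self.threshold = threshold

    def observe(self, cpu_fraction):
        """Record a sample and report whether a migration should start."""
        self.samples.append(cpu_fraction)
        window_full = len(self.samples) == self.samples.maxlen
        mean_usage = sum(self.samples) / len(self.samples)
        return window_full and mean_usage < self.threshold


# Short window so the behavior is easy to follow by hand.
detector = UnderloadDetector(window_seconds=3)
decisions = [detector.observe(u) for u in (0.5, 0.5, 0.5, 1.0)]
```

Requiring a full window before triggering is one simple way to get the stabilization mentioned under the implementation considerations, since a single low sample cannot start a migration.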

Performance Analysis
Number of migrations occurring due to failure management and power efficiency:
- RTTE and MTTE show a stable number of migrations and respawns as failure prediction accuracy varies.
Migrations occurring due to proactive failure management only (75% accuracy):
- RTTE and MTTE return the best working efficiency as the number of failures in the cloud infrastructure rises.

Conclusions (1)
Concluding remarks:
- Power- and failure-aware dynamic allocations improve the job completion rate;
- Dynamically adjusting the cap parameter of the Xen credit scheduler proves capable of obtaining a better job completion rate (RTTE);
- An excessive number of VM migrations to optimize power efficiency reduces the job completion rate.
Future directions:
- Dynamic allocation considering workload characteristics;
- Data locality;
- Scalability;
- Compare/integrate the DVFS feature;
- Improve PM consolidation (why a 65% threshold?);
- Heterogeneous CPUs.


A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
- A job is represented by a workflow.
- A workflow is a Directed Acyclic Graph (DAG): a node is an individual task, and an edge represents an inter-task dependency.
- Workflow scheduling: mapping tasks to resources.
- The main goal is a lower finish time for the exit task.
[Figure: workflow tasks mapped onto CPU1, CPU2 and CPU3.]

Introduction
Target platform:
- Utility Grids that are maintained and managed by a service provider.
- Based on user requirements, the provider finds a schedule that meets the user constraints.
In utility Grids, QoS attributes other than execution time, such as economic cost or deadline, may be considered. It is a multi-objective problem.
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto optimal solutions and letting the user select the best solution;
2- Combining the two functions in a single objective function;
3- Bicriteria scheduling, in which the user specifies a limit for one criterion (power or budget constraints), and the algorithm tries to optimize the other criterion under this constraint.

Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
HBCS has two phases:
- Task selection phase: we use the upward rank to assign priorities to the tasks in the DAG.
- Processor selection phase: we combine both objective functions (cost and time) in a single function; the processor that maximizes that function for the current task is selected.
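The task selection phase can be illustrated with the standard upward rank used by HEFT: the rank of a task is its average computation cost plus the largest value, over its successors, of the average communication cost plus the successor's rank. The sketch below assumes that definition; the DAG, the cost values, and the function name are hypothetical.

```python
def upward_ranks(avg_cost, avg_comm, successors):
    """Compute the upward rank of every task in a DAG.

    avg_cost:   task -> average computation cost over all processors
    avg_comm:   (task, successor) -> average communication cost
    successors: task -> list of direct successors
    """
    ranks = {}

    def rank(task):
        if task not in ranks:
            ranks[task] = avg_cost[task] + max(
                (avg_comm[(task, s)] + rank(s) for s in successors.get(task, [])),
                default=0.0,  # exit tasks have no successors
            )
        return ranks[task]

    for task in avg_cost:
        rank(task)
    return ranks


# Hypothetical DAG: A -> B, A -> C, B -> D, C -> D
avg_cost = {"A": 10.0, "B": 5.0, "C": 8.0, "D": 4.0}
avg_comm = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "D"): 3.0, ("C", "D"): 1.0}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

ranks = upward_ranks(avg_cost, avg_comm, successors)
# Tasks are scheduled in decreasing rank order.
priority_order = sorted(ranks, key=ranks.get, reverse=True)
```

Sorting by decreasing upward rank always places a task before its successors, so the resulting order respects the DAG dependencies.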

Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
[Equation: objective function, with 0 <= k <= 1.]

Experimental Results
Workflow structure:
- Synthetic DAG generation (www.loria.fr/~suter/dags.html).
- Applications have between 30 and 50 tasks, generated randomly.
- The total number of DAGs in our simulation is 1000.
Workflow budget:
$BUDGET = C_{cheapest} + k \, (C_{HEFT} - C_{cheapest})$, with $0 \le k \le 1$
- Lowest budget (k = 0): cheapest schedule, higher makespan.
- Highest budget (k = 1): shortest makespan (HEFT schedule).
Performance metric:
$NormalizedMakespan = \frac{makespan}{makespan_{HEFT}}$
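The budget and the metric can be computed with a few lines. This sketch assumes that the budget interpolates linearly between the cost of the cheapest schedule and the cost of the HEFT schedule, which matches the k = 0 and k = 1 cases stated on the slide; the function names and the sample costs are hypothetical.

```python
def workflow_budget(c_cheapest, c_heft, k):
    """Budget between the cheapest schedule cost (k = 0) and the HEFT cost (k = 1)."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    return c_cheapest + k * (c_heft - c_cheapest)


def normalized_makespan(makespan, makespan_heft):
    """Makespan relative to the HEFT schedule; 1.0 means as fast as HEFT."""
    return makespan / makespan_heft


# Hypothetical costs: cheapest schedule costs 100, HEFT schedule costs 300.
budgets = [workflow_budget(100.0, 300.0, k) for k in (0.0, 0.5, 1.0)]
slowdown = normalized_makespan(150.0, 100.0)
```

Sweeping k from 0 to 1 gives a family of budgets against which the normalized makespan of each algorithm can be compared.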

Experimental Results
Simulation platform:
- We use SimGrid, which allows a realistic description of the infrastructure parameters.
- We consider a bandwidth sharing policy; only one processor can send data over one network link at a time.
- We consider nodes of clusters from the Grid'5000 platform.

Results
[Figures: results for the Sophia, Rennes and Grenoble sites; HBCS time complexity.]

Conclusions (2)
Concluding remarks:
- We considered a realistic model of the infrastructure;
- The HBCS algorithm achieves better performance, in particular for lower budget values (makespan and time complexity).
Future directions:
- Compare other combinations of cost and time factors in the objective function;
- Data locality;
- Multiple DAG scheduling.
