Experiments on cost/power and failure aware scheduling for clouds and grids
Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad
Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal, jbarbosa@fe.up.pt
Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
Cloud computing paradigm
- Dynamic provisioning of computing services.
- Employs Virtual Machine (VM) technologies for consolidation and environment isolation purposes.
- Node failure can occur due to hardware or software problems.
Image source: http://www.commputation.kit.edu/92.php
Characteristics
Dependability of the infrastructure
- Distributed systems continue to grow in scale and in complexity
- Failures become the norm, which can lead to violation of the negotiated SLAs
- Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1)
Energy consumption
- The main part of energy consumption is determined by the CPU
- Energy consumption dominates the operational costs
[Figure: tasks 1 to n run in VMs 1 to n, hosted through VMMs on physical machines (PM) 1 to m]
(1) S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," Journal of Parallel and Distributed Computing, vol. 70, April 2010, pp. 384-393, doi: 10.1016/j.jpdc.2010.01.002.
Related Work
Dynamic allocation of VMs, considering PM reliability; based on a failure predictor tool with 76.5% accuracy (1).
1- Optimistic Best-Fit (OBFIT) algorithm
- Selects the PM with minimum weighted available capacity and reliability.
2- Pessimistic Best-Fit (PBFIT) algorithm
- Also selects unreliable PMs in order to increase the job completion rate.
- Selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum required capacity, where C_avg is the average capacity of the reliable PMs.
[Figure: proposed architecture for reconfigurable distributed VMs (1)]
Approach
The goal: construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadline.
- It is a best-effort approach, not an SLA-based approach;
- Virtual-to-physical resource mapping decisions must consider both the power efficiency and the reliability levels of compute nodes;
- Dynamic update of virtual-to-physical configurations (CPU usage and migration).
Approach
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto-optimal solutions and letting the user select the best solution.
2- Combining the two functions into a single objective function.
3- Bicriteria scheduling, in which the user specifies a limit for one criterion (power or budget constraints) and the algorithm tries to optimize the other criterion under this constraint.
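The three ways listed above can be illustrated on a toy set of candidate schedules. This is only a sketch: the (time, cost) pairs, the weight `w` and the helper names are hypothetical and do not come from the slides.

```python
# Each candidate schedule is a (time, cost) pair; both criteria are minimized.
# All values and helper names here are illustrative assumptions.

def dominates(a, b):
    """True if a is at least as good as b on every criterion and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_front(points):
    """Way 1: keep only the schedules that no other schedule dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_best(points, w):
    """Way 2: fold both criteria into one objective, w*time + (1-w)*cost."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])

def best_under_limit(points, cost_limit):
    """Way 3: the user bounds the cost; minimize time among feasible schedules."""
    feasible = [p for p in points if p[1] <= cost_limit]
    return min(feasible, default=None)

candidates = [(10, 5), (8, 7), (12, 4), (11, 6), (8, 9)]
front = pareto_front(candidates)
single = weighted_best(candidates, w=0.6)
bounded = best_under_limit(candidates, cost_limit=6)
```

The third function mirrors the bicriteria setting: the bound acts as a filter, and only the remaining criterion is optimized.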
Approach
Leverage virtualization tools
- Xen credit scheduler: dynamically update the cap parameter, but enforcing work-conserving mode
- Stop & copy migration: faster VM migrations, preferable for proactive failure management
[Figure: CPU% (0 to 100) versus time, with increasing power consumption; VMs on PM1, PM2 and PM3; failure; stop & copy migration; failure prediction accuracy]
System Overview
Cloud architecture
- Private cloud
- Homogeneous PMs
- Cluster coordinator manages user jobs
- VMs are created and destroyed dynamically
User jobs
- A job is a set of independent tasks
- A task runs in a single VM, whose CPU-intensive workload is known
- The number of tasks per job and the task deadlines are defined by the user
[Figure: private cloud management architecture]
Power Model
- Linear power model: P = p1 + p2 · CPU%
- Power efficiency of P
- Completion rate of user jobs
- Working efficiency: measures the quantity of useful work done (i.e. completed user jobs) relative to the consumed power.
[Figure: example of power efficiency curve (p1 = 175 W, p2 = 75 W)]
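A minimal sketch of the linear power model, using the p1 = 175 W and p2 = 75 W values from the example curve. The slide's own power efficiency formula did not survive extraction, so `power_efficiency` below is an assumption: CPU utilization per watt, normalized so that a fully loaded PM scores 1.0.

```python
def power_watts(cpu, p1=175.0, p2=75.0):
    """Linear power model P = p1 + p2 * cpu, with cpu as a fraction in [0, 1]."""
    if not 0.0 <= cpu <= 1.0:
        raise ValueError("cpu must be a fraction between 0 and 1")
    return p1 + p2 * cpu

def power_efficiency(cpu, p1=175.0, p2=75.0):
    """ASSUMED definition (not taken from the slides): utilization per watt,
    scaled by the full-load power so that cpu = 1.0 gives 1.0."""
    return cpu / power_watts(cpu, p1, p2) * (p1 + p2)

idle = power_watts(0.0)      # power drawn by an idle PM
half = power_watts(0.5)      # power drawn at 50% CPU
full = power_watts(1.0)      # power drawn at 100% CPU
```

Because p1 is paid even at zero load, any ratio of useful CPU to consumed power grows with utilization, which is the motivation for consolidating VMs on fewer PMs.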
Proposed algorithms
Minimum Time Task Execution (MTTE) algorithm
- Slack time to accomplish task t
- PM i capacity constraints
Selects a PM if:
- It guarantees the maximum processing power required by the VM (task);
- It has higher reliability;
- And it increases CPU power efficiency.
Proposed algorithms
Relaxed Time Task Execution (RTTE) algorithm
- Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of resources necessary to accomplish the task within its deadline.
[Figure: host CPU from 0% to 100%, with the VM cap set in the Xen credit scheduler]
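One way to read "the minimum amount of resources necessary" is as the smallest CPU share that finishes the remaining work exactly at the deadline. The sketch below assumes that reading; the function name, the MFLOP workload unit and the rounding policy are illustrative assumptions, not the authors' implementation. The 800 MFLOPS default matches the PM capacity quoted in the simulation setup.

```python
import math

def rtte_cap_percent(workload_mflop, slack_s, pm_mflops=800.0):
    """Smallest integer CPU cap (in %) that completes the remaining workload
    within the slack time on a PM of the given capacity.

    Returns None when even 100% of the PM cannot meet the deadline.
    """
    if slack_s <= 0:
        return None
    percent = 100.0 * workload_mflop / (slack_s * pm_mflops)
    if percent > 100.0:
        return None
    # Round up so the deadline is still met. In the Xen credit scheduler a
    # cap of 0 means "no cap", so never return 0.
    return max(1, math.ceil(percent))

cap = rtte_cap_percent(24000, slack_s=60)        # half of the PM is enough
too_late = rtte_cap_percent(24000, slack_s=20)   # would need 150% of the PM
```

Under this reading, the cap would be recomputed as the slack shrinks, which corresponds to the dynamic update of the cap parameter mentioned earlier.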
Performance Analysis
Simulation setup
- 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS;
- VM stop & copy migration overhead takes 12 s;
- 30 synthetic jobs, each consisting of 5 CPU-intensive workload tasks;
- Failed PMs stay unavailable for 60 s;
- The predicted occurrence time of a failure precedes the actual occurrence time;
- Failure instants, job arrival times and task workload sizes follow a uniform distribution.
Performance Analysis
Implementation considerations
- Stabilization to avoid multiple migrations
- Concurrency among cluster coordinators
Algorithms compared to ours
- Common Best-Fit (CBFIT): selects the PM with the maximum power efficiency and does not consider resource reliability
- Optimistic Best-Fit (OBFIT)
- Pessimistic Best-Fit (PBFIT)
Performance Analysis
Migrations occurring due to proactive failure management only:
- The failure predictor tool has 76.5% accuracy;
- The RTTE algorithm presents the best results;
- Working efficiency, as well as the job completion rate, decreases as failure prediction becomes less accurate.
Performance Analysis
Migrations occurring due to proactive failure management and power efficiency:
- Sliding window of 36 seconds, with a threshold of 65% (a migration starts if CPU usage is below 65%);
- RTTE returns the best results for 76.5% failure prediction accuracy;
- Compared to the earlier results, the rate of completed jobs diminishes, since the number of VM migrations increases.
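The sliding-window trigger can be sketched as follows. The slide gives only the window length (36 s) and the threshold (65%); the 12 s sampling period, the use of the window mean, and the class name are assumptions made for illustration.

```python
from collections import deque

class UnderuseDetector:
    """Flags a PM for power-efficiency migration when its mean CPU usage
    over a full sliding window falls below a threshold."""

    def __init__(self, window_s=36, period_s=12, threshold=0.65):
        # Number of samples that fit in the window (3 with the defaults).
        self.samples = deque(maxlen=window_s // period_s)
        self.threshold = threshold

    def observe(self, cpu):
        """Record one CPU usage sample (fraction in [0, 1]) and report
        whether a migration should start."""
        self.samples.append(cpu)
        window_full = len(self.samples) == self.samples.maxlen
        mean = sum(self.samples) / len(self.samples)
        return window_full and mean < self.threshold

detector = UnderuseDetector()
decisions = [detector.observe(u) for u in (0.9, 0.5, 0.4, 1.0, 1.0)]
```

Waiting for a full window before deciding is one simple form of the stabilization mentioned under the implementation considerations, since a single low sample cannot trigger a migration.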
Performance Analysis
Number of migrations occurring due to failure management and power efficiency:
- RTTE and MTTE have a stable number of migrations and respawns as the failure prediction accuracy varies.
Migrations occurring due to proactive failure management only (75% accuracy):
- RTTE and MTTE return the best working efficiency as the number of failures in the cloud infrastructure rises.
Conclusions (1)
Concluding remarks:
- Power- and failure-aware dynamic allocations improve the job completion rate;
- Dynamically adjusting the cap parameter of the Xen credit scheduler proves capable of obtaining a better job completion rate (RTTE);
- An excessive number of VM migrations to optimize power efficiency reduces the job completion rate.
Future directions:
- Dynamic allocation considering workload characteristics;
- Data locality;
- Scalability;
- Compare/integrate the DVFS feature;
- Improve PM consolidation (why a 65% threshold?);
- Heterogeneous CPUs.
A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
A job is represented by a workflow
- A workflow is a Directed Acyclic Graph (DAG)
- A node is an individual task
- An edge represents an inter-task dependency
Workflow scheduling
- Mapping tasks to resources
- The main goal is to minimize the finish time of the exit task
[Figure: example DAG and its schedule on CPU1, CPU2 and CPU3]
Introduction
Target platform:
- Utility Grids that are maintained and managed by a service provider.
- Based on user requirements, the provider finds a schedule that meets the user constraints.
In utility Grids, QoS attributes other than execution time, such as economic cost or deadline, may be considered. It is a multi-objective problem.
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto-optimal solutions and letting the user select the best solution;
2- Combining the two functions into a single objective function;
3- Bicriteria scheduling, in which the user specifies a limit for one criterion (power or budget constraints) and the algorithm tries to optimize the other criterion under this constraint.
Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
HBCS has two phases:
- Task selection phase: we use the upward rank to assign priorities to the tasks in the DAG.
- Processor selection phase: we combine both objective functions (cost and time) in a single function; the processor that maximizes that function for the current task is selected.
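For the task selection phase, the sketch below computes the upward rank as it is defined for HEFT: the average execution cost of a task plus the largest value of (average communication cost + upward rank) over its successors. The toy DAG, its costs and the function name are illustrative assumptions. The combined cost/time function of the processor selection phase is not reproduced here, because its formula is not recoverable from the slides.

```python
def upward_ranks(avg_cost, successors, avg_comm):
    """Upward rank of every task in a DAG (HEFT definition).

    avg_cost:   task -> average execution cost over all processors
    successors: task -> list of immediate successor tasks
    avg_comm:   (task, successor) -> average communication cost
    """
    ranks = {}

    def rank(task):
        if task not in ranks:
            ranks[task] = avg_cost[task] + max(
                (avg_comm[(task, s)] + rank(s) for s in successors.get(task, [])),
                default=0.0,  # exit tasks have no successors
            )
        return ranks[task]

    for task in avg_cost:
        rank(task)
    return ranks

# Toy diamond-shaped workflow: A -> B, A -> C, B -> D, C -> D.
avg_cost = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 1.0}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
avg_comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.0, ("C", "D"): 1.0}

ranks = upward_ranks(avg_cost, successors, avg_comm)
# Tasks are scheduled in order of decreasing upward rank.
priority_order = sorted(ranks, key=ranks.get, reverse=True)
```

Sorting by decreasing upward rank always places a task before its successors, so the resulting order respects the DAG dependencies.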
Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
[Equation: objective function, with $0 \le k \le 1$]
Experimental Result
Workflow structure:
- Synthetic DAG generation (www.loria.fr/~suter/dags.html)
- Applications have between 30 and 50 tasks, generated randomly.
- The total number of DAGs in our simulation is 1000.
Workflow budget:
- $BUDGET = C_{cheapest} + k \, (C_{HEFT} - C_{cheapest})$, with $0 \le k \le 1$
- Lowest budget (k = 0): cheapest scheduling, higher makespan
- Highest budget (k = 1): shortest makespan (HEFT scheduling)
Performance metric:
- $NormalizedMakespan = \dfrac{makespan}{makespan_{HEFT}}$
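The budget interpolation and the performance metric are simple enough to check numerically. The cost and makespan values in this sketch are made up for illustration only.

```python
def workflow_budget(c_cheapest, c_heft, k):
    """BUDGET = C_cheapest + k * (C_HEFT - C_cheapest), with 0 <= k <= 1."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return c_cheapest + k * (c_heft - c_cheapest)

def normalized_makespan(makespan, makespan_heft):
    """Makespan of a schedule relative to the HEFT makespan for the same DAG."""
    return makespan / makespan_heft

# Hypothetical costs: the cheapest schedule costs 100, the HEFT schedule 180.
lowest = workflow_budget(100.0, 180.0, k=0.0)
quarter = workflow_budget(100.0, 180.0, k=0.25)
highest = workflow_budget(100.0, 180.0, k=1.0)
ratio = normalized_makespan(150.0, 120.0)
```

With k = 0 the user can afford only the cheapest schedule, and with k = 1 the budget covers the HEFT schedule; a normalized makespan above 1 means the schedule finishes later than HEFT would.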
Experimental Result
Simulation platform:
- We use SIMGRID, which allows a realistic description of the infrastructure parameters.
- We consider a bandwidth sharing policy; only one processor can send data over one network link at a time.
- We consider nodes of clusters from the GRID 5000 platform.
Results
[Figures: results for the Sophia, Rennes and Grenoble sites; HBCS time complexity]
Conclusions (2)
Concluding remarks:
- We considered a realistic model of the infrastructure;
- The HBCS algorithm achieves better performance, in particular for lower budget values (makespan and time complexity).
Future directions:
- Compare other combinations of cost and time factors in the objective function;
- Data locality;
- Multiple DAG scheduling.