DynamicCloudSim: Simulating Heterogeneity in Computational Clouds
Marc Bux, Ulf Leser
{bux, leser}@informatik.hu-berlin.de
The 2nd International Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET'13)
Meet Sandra
Meet Paul
- Small Instance: 1.7 GB RAM, 1 EC2 Compute Unit, 160 GB local storage
- Compute Unit: equivalent CPU capacity of a 1.0-1.2 GHz Opteron or Xeon
- No guarantees with respect to I/O throughput, network delay, or bandwidth

Meet Paul
- Any one cloud instance is unlike another.
Heterogeneity in EC2 Cloud Instances
- Different CPUs on physical host systems [Jackson10, Schad10]
  - Intel Xeon E5430 (2.66 GHz quad)
  - AMD Opteron 270 (2 GHz dual)
  - AMD Opteron 2218 HE (2.6 GHz dual)
- I/O throughput varies as well [Dejun10]
- No correlation between CPU and I/O performance
[Figure: Amazon EC2 performance [Schad10]]
[Figure; source: [Dejun10]]

Dynamic Changes of Performance
- Occasional CPU performance slumps and failures during task execution [Dejun10, Jackson10]
- Variance in I/O and network throughput [Zaharia08, Jackson10]
- Performance depends on hour of day and day of week [Schad10]
[Figure: EC2 disk performance vs. VM co-allocation [Zaharia08]]
[Figure: CPU performance slumps [Dejun10]]
Vision
- Adaptive scheduling of scientific workflows
- Exploit heterogeneous resources
- Exhibit robustness to instability

Vision
- The standard approach for evaluation is simulation
- Cloud simulation toolkits do not model instability [Braun01, Blythe05]
Agenda
1) Simulating Heterogeneity in Computational Clouds
2) Evaluating Established Workflow Schedulers
3) Summary and Outlook
CloudSim
- R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya (2011), CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software - Practice and Experience 41(1):23-50.
- More than 250 citations in Google Scholar
- https://code.google.com/p/cloudsim/
[Diagram: Task, VM, Host, Datacenter]
DynamicCloudSim
Extend CloudSim with models for
1. Heterogeneous computational resources (Het)
2. Dynamic changes of performance at runtime (DCR)
3. Straggler VMs and failed task executions (SaF)
- More fine-grained representation of computational resources
- https://code.google.com/p/dynamiccloudsim/
[Diagram: Error-prone Task, Dynamic VM, Heterogeneous Host, Datacenter]
Realism: can we ever get there?
- Simulation can never perfectly resemble reality
- We model inhomogeneity and dynamic changes by sampling from normal distributions
- Default values for the mean and the standard deviation (STD) or relative standard deviation (RSD) are obtained from [Zaharia08, Dejun10, Jackson10, Schad10, Iosup11]
- Many performance characteristics in EC2 follow a normal distribution [Schad10]
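Sampling heterogeneous VM performance from a normal distribution could look roughly like the sketch below. The function names, the mean of 1000 (arbitrary units), the RSD of 0.25, and the clamping to a small positive floor are illustrative assumptions, not DynamicCloudSim's actual code or defaults.

```python
import random

def sample_vm_performance(mean, rsd, rng):
    """Draw one VM's performance from a normal distribution.

    rsd is the relative standard deviation (std / mean). The result is
    clamped to a small positive floor so that no VM ends up with
    non-positive performance (the floor is an assumption of this sketch).
    """
    std = rsd * mean
    return max(rng.gauss(mean, std), 0.01 * mean)

def provision_vms(n, mean, rsd, seed=0):
    """Provision n VMs whose performance values differ from each other."""
    rng = random.Random(seed)
    return [sample_vm_performance(mean, rsd, rng) for _ in range(n)]

# Hypothetical parameters: 8 VMs, mean performance 1000, RSD 0.25.
vms = provision_vms(n=8, mean=1000.0, rsd=0.25, seed=42)
```

With `rsd=0.0` every VM gets exactly the mean, which corresponds to a homogeneous setup.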
Simulating VM Performance: DynamicCloudSim (DCS) vs. CloudSim (CS)
1. Heterogeneous computational resources (Het)
2. Dynamic changes of performance at runtime (DCR)
3. Straggler VMs and failed task executions (SaF)
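As a rough illustration of how dynamic changes (DCR) and stragglers and failures (SaF) might be layered on top of a VM's baseline performance, consider the following sketch. The function names, the straggler slowdown factor, and the per-step re-sampling are hypothetical choices for this example and are not taken from DynamicCloudSim itself.

```python
import random

def simulate_vm(baseline, dcr_rsd, straggler_likelihood,
                straggler_factor, steps, rng):
    """Return (is_straggler, performance trace) for one VM.

    SaF: with probability straggler_likelihood the VM is a straggler and
    its performance is scaled by straggler_factor (assumed to be < 1).
    DCR: at every step the performance is re-sampled from a normal
    distribution around the VM's effective baseline.
    """
    is_straggler = rng.random() < straggler_likelihood
    effective = baseline * straggler_factor if is_straggler else baseline
    trace = []
    for _ in range(steps):
        perf = rng.gauss(effective, dcr_rsd * effective)
        trace.append(max(perf, 0.01 * effective))  # keep performance positive
    return is_straggler, trace

def task_fails(failure_likelihood, rng):
    """SaF: decide whether a single task execution fails."""
    return rng.random() < failure_likelihood

rng = random.Random(7)
straggler, trace = simulate_vm(baseline=1000.0, dcr_rsd=0.125,
                               straggler_likelihood=0.0125,
                               straggler_factor=0.5, steps=10, rng=rng)
```

Setting `dcr_rsd=0.0` and both likelihoods to zero yields a constant trace, i.e. a stable VM with no runtime variation.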
Agenda
1) Simulating Heterogeneity in Computational Clouds
2) Evaluating Established Workflow Schedulers
   a) Scheduling Scientific Workflows
   b) Evaluation Workflows
   c) Evaluation Results
3) Summary and Outlook
Scheduling of Scientific Workflows
- Scheduling: mapping tasks to the available physical resources
- Usual goal: minimize overall execution time
Static Scheduling:
- Schedule is assembled prior to workflow execution
- Schedule is strictly abided by at runtime
Adaptive Scheduling:
- Monitor computational infrastructure
- Adjust workflow execution at runtime
Static Schedulers
Baseline: Round Robin
- Assign tasks to resources in turn
- Equal number of tasks per resource
Elaborate: HEFT (Heterogeneous Earliest Finish Time) [Topcuoglu02]
- Implemented in the scientific workflow management system (SWfMS) Pegasus
- Requires runtime estimates for each task on each resource
- Tasks with the longest time to finish are assigned a fixed timeslot on a suitable (well-performing) resource
- Exploits heterogeneity in the computational infrastructure (Het)
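The difference between the two static strategies can be illustrated with a small sketch. The second function is a deliberately simplified earliest-finish-time heuristic for independent tasks; it ignores task dependencies, data transfer, and the ranking and insertion policy of full HEFT. Task sizes and resource speeds are made-up example values.

```python
def round_robin(num_tasks, num_resources):
    """Assign task i to resource i mod num_resources."""
    return [i % num_resources for i in range(num_tasks)]

def earliest_finish_time(work, speeds):
    """Simplified HEFT-like heuristic for independent tasks.

    Largest tasks are placed first, each on the resource where it would
    finish earliest, using runtime estimates work[i] / speeds[r].
    """
    order = sorted(range(len(work)), key=lambda i: work[i], reverse=True)
    ready = [0.0] * len(speeds)        # time at which each resource is free
    assignment = [None] * len(work)
    for i in order:
        best = min(range(len(speeds)),
                   key=lambda r: ready[r] + work[i] / speeds[r])
        ready[best] += work[i] / speeds[best]
        assignment[i] = best
    return assignment

def makespan(assignment, work, speeds):
    """Total runtime of a static schedule if estimates were exact."""
    ready = [0.0] * len(speeds)
    for i, r in enumerate(assignment):
        ready[r] += work[i] / speeds[r]
    return max(ready)

work = [2.0, 8.0, 2.0, 8.0]   # hypothetical task sizes
speeds = [2.0, 1.0]           # resource 0 is twice as fast as resource 1
rr = makespan(round_robin(len(work), len(speeds)), work, speeds)
eft = makespan(earliest_finish_time(work, speeds), work, speeds)
```

In this example Round Robin places both large tasks on the slow resource (makespan 16.0), while the runtime-aware heuristic reaches a makespan of 8.0. Both schedules are fixed before execution, so neither reacts if the speeds change at runtime.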
Adaptive Schedulers
Baseline: Greedy Task Queue
- Assign tasks to resources at runtime in a first-come-first-served manner
- Adapts to changes of performance at runtime (DCR)
Elaborate: LATE (Longest Approximate Time to End) [Zaharia08]
- Developed for Hadoop to increase robustness to instability
- 10% of the tasks progressing at a below-average rate are replicated and speculatively executed
- Exploits dynamic changes of performance
- Robust to straggler VMs and failed task executions (SaF)
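A minimal sketch of both adaptive strategies follows. The greedy queue hands each waiting task to whichever resource becomes free first. The speculation part follows the description on this slide (below-average progress rate, 10% cap) and uses the time-to-end estimate (1 - progress) / progress rate from [Zaharia08]; function names, the data layout, and the rule of always allowing at least one speculative copy are assumptions of this example.

```python
import heapq

def greedy_queue_makespan(work, speeds):
    """First-come-first-served: the next task goes to the next free resource."""
    free_at = [(0.0, r) for r in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for w in work:
        t, r = heapq.heappop(free_at)   # resource that becomes free first
        t += w / speeds[r]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, r))
    return finish

def time_to_end(progress, elapsed):
    """Estimated remaining time, assuming a constant progress rate."""
    rate = progress / elapsed
    return (1.0 - progress) / rate

def pick_speculative(tasks, cap_fraction=0.1):
    """tasks maps task id -> (progress in (0, 1], elapsed time).

    Among tasks with a below-average progress rate, return those with the
    longest estimated time to end, capped at cap_fraction of all tasks.
    """
    rates = {tid: p / e for tid, (p, e) in tasks.items()}
    average = sum(rates.values()) / len(rates)
    slow = [tid for tid, rate in rates.items() if rate < average]
    slow.sort(key=lambda tid: time_to_end(*tasks[tid]), reverse=True)
    cap = max(1, int(cap_fraction * len(tasks)))  # assumption: at least one
    return slow[:cap]

running = {"a": (0.9, 10.0), "b": (0.8, 10.0), "c": (0.1, 10.0), "d": (0.85, 10.0)}
to_replicate = pick_speculative(running)
```

Here task `c` has made far less progress than the others in the same time, so it is the one selected for speculative re-execution.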
Evaluation Workflow: Montage [Berriman04]

Abstract Montage Workflow
- One task can have many task instances.

Concrete Montage Workflow
- 43,318 tasks reading and writing 534 GB of data
- 10 GB of input files, which have to be uploaded to the cloud
- Determine the average runtime over 100 simulations of workflow execution
Evaluation Workflow: Comparative Genomics

Concrete Genomics Workflow
- Align 10% of the reads produced in a sequencing experiment against the smallest of human chromosomes (chr22)
- Use about 0.2% of the available data
- 4,266 tasks reading and writing 436 GB of data (2.3 GB upload)
- Workflow steps:
  1. Upload to cloud
  2. Indexing (bowtie, SHRiMP, PerM)
  3. Alignment (bowtie, SHRiMP, PerM)
  4. Convert (samtools view)
  5. Sort (samtools sort)
  6. Merge (merge)
  7. Preprocess (samtools mpileup)
  8. Variant calling (VarScan)
  9. Sense-Making (VCFTools)
  10. Download from cloud
Runtime depending on Heterogeneity (Het)
[Figure: two bar-chart panels of average runtime in minutes for Static Round Robin, HEFT, Greedy Queue, and LATE at RSD parameters for heterogeneous resources (Het) of 0, 0.125, 0.25, 0.375, and 0.5]

Runtime depending on Dynamic Changes (DCR)
[Figure: two bar-chart panels of average runtime in minutes for Static Round Robin, HEFT, Greedy Queue, and LATE at RSD parameters for dynamic changes at runtime (DCR) of 0, 0.125, 0.25, 0.375, and 0.5]

Runtime with Stragglers and Failures (SaF)
[Figure: two bar-chart panels of average runtime in minutes for Static Round Robin, HEFT, Greedy Queue, and LATE at likelihoods of straggler VMs and failed tasks (SaF) of 0, 0.00625, 0.0125, 0.01875, and 0.025]
That's all well and good, but...
- Scheduling in SWfMS: Static or Greedy Task Queue
- HEFT and LATE have a computational overhead and require information not available in real scenarios:
  - HEFT: runtime estimates of each task on each machine
  - LATE: progress rate of each running task
- Untapped optimization potential: multiple resource scheduling
  - Find appropriate matches between tasks and machines
Summary and Outlook
- EC2: heterogeneity and instability in VM performance
- DynamicCloudSim introduces several factors of instability into CloudSim
- Simulation experiments reproduce known strengths and shortcomings of established schedulers
- Outlook: comparative evaluation on real hardware

Thanks for your attention!
https://code.google.com/p/dynamiccloudsim/

Questions
Literature
[Braun01] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, R. F. Freund (2001), A Comparison Study of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems, Journal of Parallel and Distributed Computing 61:810-837.
[Blythe05] J. Blythe, S. Jain, E. Deelman, Y. Gil, K. Vahi, A. Mandal, K. Kennedy (2005), Task Scheduling Strategies for Workflow-based Applications in Grids, in: Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid, volume 2, Cardiff, UK, pp. 759-767.
[Jackson10] K. R. Jackson, et al. (2010), Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud, in: Proceedings of the 2nd International Conference on Cloud Computing Technology and Science, Indianapolis, USA, pp. 159-168.
[Dejun09] J. Dejun, et al. (2009), EC2 Performance Analysis for Resource Provisioning of Service-Oriented Applications, in: Proceedings of the 7th International Conference on Service Oriented Computing, Stockholm, Sweden, pp. 197-207.
[Zaharia08] M. Zaharia, et al. (2008), Improving MapReduce Performance in Heterogeneous Environments, in: Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, San Diego, USA, pp. 29-42.
[Schad10] J. Schad, J. Dittrich, J.-A. Quiané-Ruiz (2010), Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance, Proceedings of the VLDB Endowment 3(1):460-471.
[Iosup11] A. Iosup, N. Yigitbasi, D. Epema (2011), On the Performance Variability of Production Cloud Services, in: Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, California, USA, pp. 104-113.
[Topcuoglu02] H. Topcuoglu, S. Hariri, M.-Y. Wu (2002), Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, IEEE Transactions on Parallel and Distributed Systems 13(3):260-274.
[Berriman04] G. B. Berriman, et al. (2004), Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand, in: Proceedings of the SPIE Conference on Astronomical Telescopes and Instrumentation, volume 5493, Glasgow, Scotland, pp. 221-232.