Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures Maciej'Malawski 1,2,'Piotr'Nowakowski 1,'Tomasz'Gubała 1,'Marek'Kasztelnik 1,' Marian'Bubak 1,2,'Rafael'Ferreira'da'Silva 3,'Ewa'Deelman 3,'Jarek'Nabrzyski 4' ' NSFCloud'Workshop'on'Experimental'Support'for'Cloud'CompuOng' December'11Q12,'2014,'Arlington,'VA' AGH University of Science and Technology: 1 ACC Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków, Poland 2 Department of Computer Science, al. Mickiewicza 30, 30-095 Kraków, Poland 3 University of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA 4 Center for Research Computing, University of Notre Dame, IN, USA
Research Challenges Execution of complex scientific applications on clouds: workflows and their ensembles Pegasus Workflow Management System (OCI SI2-SSI #1148515) HyperFlow Workflow Engine Platform for deployment and sharing of scientific applications on hybrid clouds Atmosphere Framework Algorithms for scheduling, provisioning and cost optimization: Dynamic and Static Algorithms Mathematical Programming Cloud Workflow Simulator 2 2
Research: The Atmosphere Framework Hybrid cloud as a means of provisioning computing power for virtual experiments GUI'host'(provisions'endQuser' features'and'access'opoons)' Cloud'Management'Portlets' Provide'GUI'elements'which'enable'service'developers'and' end'users'to'interact'with'the'atmosphere'playorm'and' create/deploy'services'on'the'available'cloud'resources' Secure'RESTful'API' (Cloud'Facade)' Atmosphere'Core' Atmosphere'Core' Services'Host' AuthenOcaOon'and'authorizaOon'logic' CommunicaOon'with'underlying'computaOonal'clouds' Launching'and'monitoring'service'instances' CreaOng'new'service'templates' Billing'and'accounOng' Logging'and'administraOve'services' Head' Image'store' Head' Image'store' Worker' Worker' Worker' 96'CPU'cores' 184'GB'RAM' 4'TB'storage' private'ip'space' OpenStack'cloud'site'at'ACC'CYFRONET'AGH' Worker'node'w/large' resource'pool'( fat'node )' Worker'node'w/large' resource'pool'( fat'node )' 128'CPU'cores' 256'GB'RAM' 4'TB'storage' private'ip'space' VPHQShare'cloud'site'at'UNIVIE' Atmosphere'Registry'(AIR)' user'accounts' available'cloud'sites' services'and'templates' API' host' Worker' Massive' (funcoonally' limitless)' hardware' resource' pool' Worker' public'ip'space' Image'store' Amazon'ElasOc'Compute'Cloud'(EC2)' 'European'availability'zone' 3 3
Research: Simulation and Scheduling of Large-Scale Scientific Workflows on IaaS Clouds Large-scale scientific workflows from Pegasus WMS Workflows of 100,000 tasks Workflow Ensembles Schedule as many workflows as possible within a budget and deadline Uses a Cloud Workflow Simulator VM Time M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. SC 2012: 22 4 4
Research: Cost Optimization of Applications on Clouds Infrastructure model Multiple compute and storage clouds Heterogeneous instance types Application model Bag of tasks Multi-level workflows Modeling with AMPL and CMPL Modeling Language for Mathematical Programming Cost optimization Under deadline constraints Cost ($) Layer 1 Layer 2 Layer 3 Layer 4 Layer 5 3000 2500 2000 1500 1000 500 A B B C B D E F 20000 tasks, 512 MiB input and 512 MiB output, task execution time 0.1h @ 1ccu machine Multiple providers 1h 2.5 h 0.5 h 0.3 h 2 h Amazon's and private instances 6 h Rackspace and private instances Storage Input Output Rackspace instances Task m1.small t1.micro Amazon Application Compute m1.large m2.xlarge Amazon S3 Rackspace Cloud Files Optimal private Compute Private cloud rs.1gb rs.2gb rs.4gb rs.16gb Compute Rackspace Storage Mixed integer programming Bonmin, Cplex solvers 0 0 10 20 30 40 50 60 70 80 90 100 Time limit (hours) M. Malawski, K. Figiela, J. Nabrzyski, Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, 29(7), 2013, pp.1786-1794, http://dx.doi.org/10.1016/j.future.2013.01.004 M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski, Cost Optimization of Execution of Multi-level Deadline- Constrained Scientific Workflows on Clouds. PPAM, 2013, 251-260 http://dx.doi.org/10.1007/978-3-642-55224-3_24 5 5
Research: Cloud Performance Evaluation Performance of VM deployment times Virtualization overhead Evaluation of open source cloud stacks Eucalyptus, OpenNebula, OpenStack Survey of European public cloud providers Performance evaluation of top cloud providers EC2, RackSpace, SoftLayer A grant from Amazon has been obtained M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski, S. Varma, Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013, Delft, the Netherlands, pp.13-16, 2013 6 6 IaaS Provider EEA Zoning jclouds API Support BLOB storage support Perhour instance billing API Access Published price VM Image Import / Export Relational DB support Weight 20 20 10 5 5 5 3 2 1 Amazon AWS 1 1 1 1 1 1 0 1 27 2 Rackspace 1 1 1 1 1 1 0 1 27 3 SoftLayer 1 1 1 1 1 1 0 0 25 4 CloudSigma 1 1 0 1 1 1 1 0 18 5 ElasticHosts 1 1 0 1 1 1 1 0 18 6 Serverlove 1 1 0 1 1 1 1 0 18 7 GoGrid 1 1 0 1 1 1 0 0 15 8 Terremark ecloud 1 1 0 1 1 0 1 0 13 9 RimuHosting 1 1 0 0 1 1 0 1 12 10 Stratogen 1 1 0 0 1 0 1 0 8 11 Bluelock 1 1 0 0 1 0 0 0 5 12 Fujitsu GCP 1 1 0 0 1 0 0 0 5 13 BitRefinery 0 0 0 0 0 1 0 1 0 14 BrightBox 1 0 0 1 1 1 1 0 0 15 BT Global Services 1 0 0 0 1 0 1 0 0 16 Carpathia Hosting 1 0 0 0 0 0 1 0 0 17 City Cloud 1 0 0 1 1 1 0 0 0 18 Claris Networks 0 0 0 1 0 0 0 0 0 19 Codero 0 0 0 1 1 1 0 0 0 20 CSC 1 0 0 0 0 0 1 0 0 21 Datapipe 1 0 0 1 1 0 0 0 0 22 e24cloud 1 0 0 1 0 1 0 0 0 23 eapps 0 0 0 0 0 1 0 0 0 24 FlexiScale 1 0 0 1 1 1 1 0 0 25 Google GCE 1 0 1 1 1 1 0 1 0 26 Green House Data 0 0 0 0 1 0 1 0 0 27 Hosting.com 0 0 0 0 0 1 1 1 0 28 HP Cloud 0 1 1 1 1 1 1 1 0 29 IBM SmartCloud 0 0 1 1 1 1 0 1 0 30 IIJ GIO 0 0 0 0 0 0 0 0 0 31 iland cloud 1 0 0 1 0 1 1 0 0 32 Internap 0 0 1 1 1 1 0 0 0 33 Joyent 0 0 0 1 1 1 0 0 0 34 LunaCloud 1 0 1 1 1 1 0 0 0 35 Oktawave 1 0 1 1 1 1 0 1 0 36 Openhosting.co.uk 1 0 0 0 0 1 0 0 0 37 Openhosting.com 0 1 0 1 1 1 1 0 0 38 OpSource 1 0 1 1 1 1 1 0 0 39 ProfitBricks 1 0 0 1 1 1 0 0 0 40 Qube 1 0 0 0 0 1 0 0 0 41 ReliaCloud 0 0 0 0 0 0 0 0 0 42 SaavisDirect 0 0 1 1 0 1 0 0 0 43 SkaliCloud 0 1 0 1 1 1 1 0 0 44 Teklinks 0 0 0 0 0 0 0 0 0 45 Terremark vcloud 0 1 0 1 1 1 1 0 0 46 Tier 3 0 0 0 0 1 0 0 0 0 47 Umbee 1 0 0 1 1 1 1 0 0 48 VPS.net 1 0 0 0 1 1 0 0 0 49 Windows Azure 1 0 1 1 1 1 0 1 0 Score
Experiment: Evaluation of autoscaling techniques for Atmosphere cloud platform Challenges Requires repeated tests under varying workloads Experiments in an isolated environment Goals Perform autoscaling based on: Complex event processing Time series database Build an isolated environment on NSFCloud 7 7
Experiment: Scalability of Scientific Workflows in HyperFlow Model Challenges Issues on data transfers and data locality Calibrate the performance models of applications Goals Execute large-scale deployments on multi-site NSFCloud facilities Assess the impact of network latency and bandwidth limitations PaaSage platform Upperware Executionware Workflow CAMEL generator PaaSage application CAMEL model Metrics Metrics RabbitMQ Deploy & scale infrastructure Cloud Workflow generator Workflow graph Hyperflow engine Ready tasks Task scheduler Scheduled tasks Monitoring Job queue Results VMs Workflow Executor Executor 1 Executor components 1 Redis 8 8
Experiment: Influence of Variability of Clouds on the Quality of Algorithms Challenges Static scheduling methods assume that the estimates of task runtimes are available 2.0 WA The runtime variations and various uncertainties influence the actual execution Goals A large-scale experimental testbed will allow investigating the influence of the uncertainties Development of new models to mitigate uncertainties negative effects Makespan / Deadline 1.5 1.0 0.5 0.0 WA WA WA WA 0 % 1 % 2 % 5 % 10 % 20 % 50 % Runtime estimate error WA WA WA 9 9
Experiment: Interoperation of Cloud Testbed of PL-Grid Infrastructure with NSFCloud PL-Grid One of the largest national grid infrastructures in Europe (2500+ users, 500+ teams) Cloud testbed based on OpenNebula and OpenStack Goals Possibility to run transatlantic and global-scale experiments Evaluation of impact of wide-area and high-latency networks 10 10
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures Thank&you.&! DICE!Team!at!AGH:!h0p://dice.cyfronet.pl! Center!for!Research!Compu@ng!at!Notre!Dame:!h0ps://crc.nd.edu! Pegasus!Team!at!USC:!h0p://pegasus.isi.edu!