In Cloud, Do MTC or HTC Service Providers Benefit from the Economies of Scale?

Lei Wang, Jianfeng Zhan, Weisong Shi, Yi Liang, Lin Yuan
Institute of Computing Technology, Chinese Academy of Sciences
Department of Computer Science, Wayne State University
Department of Computer Science, Beijing University of Technology

Cloud Computing
As an economic platform for daily computing, cloud computing is coming.
[Figure: resources over time (capacity versus demand) for a Web service hosted locally and a Web service in the cloud, with unused resources marked.]
Cloud: a large pool of easily usable and accessible virtualized resources, which can be dynamically reconfigured and are typically exploited by a pay-per-use model. [Luis Vaquero et al. A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review, Volume 39, Issue 1, January 2009]
Cloud economics: Armbrust et al. show in theory that the workloads of Web service applications can benefit from the economies of scale of cloud computing systems. [Michael Armbrust, Armando Fox. 2009. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report]

MTC vs. HTC
Many-task computing (MTC) delivers very large numbers of computing resources over short periods of time to accomplish many computational tasks. Example: the Montage workflow.
High-throughput computing (HTC) delivers large amounts of processing capacity over long periods of time. Example: batch jobs in Condor.
HTC and MTC are a new model for running applications on Blue Gene/P. [Alan Gara. 2008. Many-task applications in use today with a look toward the future. MTAGS08]

Our motivation
Can we consolidate MTC and HTC workloads on a large cloud platform? The two differ in usage scenarios, application characteristics, and evaluation metrics.
On the cloud platform, do MTC or HTC service providers benefit from the economies of scale?
Service provider: performance and cost.
Resource provider: the total resource consumption and capacity planning.

DawningCloud
Dawning 5000: ranked in the top 10 of the Top 500 supercomputers in November 2008.
[Figure: DawningCloud hosting an MTC runtime environment and an HTC runtime environment.]
DawningCloud manages a cluster-based cloud platform, creates the specific runtime environment for MTC or HTC, and dynamically provisions resources to runtime environments.
Using DawningCloud, an organization does not need to own a local cluster system!

Our contributions
An innovative usage model for cloud computing: the dynamic service provision (DSP) model. The resource provider can create specific runtime environments on demand for service providers; the service provider can dynamically resize the provisioned resources of the runtime environment according to the workload status.
An enabling system for cloud computing: DawningCloud, the enabling system based on the DSP model, provides automatic management for MTC and HTC workloads.
A comprehensive evaluation: an emulation experiment shows that, for typical MTC and HTC workloads on the cloud platform, MTC and HTC service providers and the resource provider can benefit from the economies of scale.

Outline
Previous work
Our solution
Evaluations

Previous work: enabling systems and consolidation
Enabling systems: EC2, RightScale, Shirako.
The effect of MTC and HTC consolidation: existing work consolidates Web service applications, or Web service applications and parallel batch jobs.

Previous work: dedicated cluster systems (DCS)
Prevailing in the MTC and HTC communities: the owner purchases, builds, and fully controls the cluster.
Shortcomings: TCO is high, and capacity planning is hard.

Previous work: usage models for cloud computing
Direct resource provision (DRP) model: the staff of an organization directly lease virtual machine resources from EC2 for a specified period to run applications.
Static service provision (SSP) model: the organization as a whole rents a fixed amount of resources from EC2 to create a virtual cluster system on which a queuing system is deployed.

Our solution
We propose an innovative usage model, the dynamic service provision (DSP) model. The resource provider can create specific runtime environments on demand for service providers. The service provider can dynamically resize the provisioned resources of the runtime environment according to the workload status.
We design and implement an enabling system, DawningCloud, which provides automatic management for MTC and HTC workloads.

Three players in a cloud
Resource provider: owns a cloud platform and offers outsourced resources, for example Amazon.
Service provider: leases resources from the resource provider and provides computing services to its end users through a runtime environment, for example the proxy of an organization.
End users: submit and manage their computing applications, for example the staff of the organization.
[Figure: end users and their applications, the runtime environments of service providers, and the shared resources of the resource provider.]

The DSP model
[Figure: end users, workloads, runtime environments, service providers, and the nodes of the resource provider.]
The resource provider can create specific runtime environments on demand for MTC or HTC service providers.
The service provider can dynamically resize the provisioned resources of the runtime environment according to the workload status.

The details of DSP
[Figure: interactions among end users (submit/manage applications), service providers (request, manage), the resource provider (create, confirm, destroy, negotiate resources), accounts, and runtime environments.]
Similar to the DCS/SSP models, service providers in the DSP model can fully control their runtime environments.
Unlike in the DCS/SSP models, service providers can dynamically resize the resources according to the workload status.

DawningCloud
Consolidates MTC and HTC workloads.
Creates runtime environments on demand for MTC or HTC service providers.
Automatically provisions resources to runtime environments.

Creating runtime environments on demand
[Figure: architecture. CSF: resource provision service, lifecycle management service, deployment service, VM provision service, agent. HTC TRE: HTC server, HTC portal, HTC scheduler. MTC TRE: MTC server, MTC portal, MTC scheduler, trigger monitor.]

The automatic resource management
Setup policy: determines when and how to do the setup work.
Resource management policy: the service provider specifies its requirements for resource management.
Resource provision policy: the resource provider specifies its requirements for resource provision.
[Figure: the server of the TRE, the lifecycle management service of the CSF, and the resource provision service of the CSF.]

The resource management policy for HTC
[Flowchart: the HTC TRE holds initial nodes and dynamic nodes and scans once per minute. First check: Ratio_obtaining_resources > Ratio_threshold (Yes/No). Second check, on No: Size_biggest_job > Size_current (Yes).]
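The two conditions named on this slide can be sketched as a single decision function. This is a minimal sketch, not DawningCloud's code: the function name, the `Decision` labels, and the argument names are hypothetical. The slide does not define how Ratio_obtaining_resources is computed or how many nodes are requested, so the sketch takes the ratio as an input and only reports which condition fired.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical labels for the outcome of one scan of the HTC queue."""
    REQUEST_BY_RATIO = "ratio exceeds threshold"
    REQUEST_BY_BIGGEST_JOB = "biggest job exceeds current size"
    NO_REQUEST = "keep current size"


def htc_scan(ratio_obtaining_resources: float,
             ratio_threshold: float,
             size_biggest_job: int,
             size_current: int) -> Decision:
    """Evaluate the two flowchart conditions in the order they appear."""
    # First check: the ratio against its threshold.
    if ratio_obtaining_resources > ratio_threshold:
        return Decision.REQUEST_BY_RATIO
    # Second check (reached only on "No"): the biggest queued job
    # against the current size of the runtime environment.
    if size_biggest_job > size_current:
        return Decision.REQUEST_BY_BIGGEST_JOB
    return Decision.NO_REQUEST
```

In a running system this function would be called once per scan interval (one minute for HTC according to the slide).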

The resource management policy for MTC
The resource management policy of the MTC service provider differs from that of HTC service providers in two aspects:
The MTC server scans the jobs in the queue every three seconds.
The MTC server calculates the accumulated resource demands of all jobs in the queue or the resource demand of the currently biggest job.

Scenario and method
Scenario: one resource provider; three workloads from three different organizations, two providing HTC services and one providing MTC service; DawningCloud, DRP, SSP, and DCS systems are each used to provide the computing service.
Evaluation method: emulation, sped up by a factor of 100.
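To make the speed-up factor concrete, the following sketch converts real trace durations into emulated wall-clock time. It assumes the factor of 100 is applied uniformly to all durations, which the slide does not state explicitly; the function name is hypothetical.

```python
SPEEDUP = 100  # speed-up factor stated on the slide


def emulated_seconds(real_seconds: float, speedup: int = SPEEDUP) -> float:
    """Wall-clock time of the emulation for a given real duration.

    Assumption: every duration is divided by the same factor.
    """
    return real_seconds / speedup


# A two-week trace (14 days * 24 hours) in emulated hours.
two_weeks_seconds = 14 * 24 * 3600
two_weeks_emulated_hours = emulated_seconds(two_weeks_seconds) / 3600

# A task of 11.38 real seconds in emulated seconds.
task_emulated_seconds = emulated_seconds(11.38)
```

Under this assumption a two-week trace replays in about 3.36 hours.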

Emulation system
[Figures: the emulated DawningCloud; the emulated SSP and DCS system; the emulated DRP system. Components shown: MTC server, MTC scheduler, MTC job emulator, HTC server, HTC scheduler, HTC job emulator, resource provision service.]

Workloads
MTC workload: Montage workflow (http://vtcpc.isi.edu/pegasus/index.php/workflowgenerator), which includes 1,000 tasks with an average task execution time of 11.38 seconds.
HTC workload traces:
NASA iPSC trace (http://www.cs.huji.ac.il/labs/parallel/workload/), two weeks from Fri Oct 01 00:00:03 PDT 199
SDSC BLUE trace (http://www.cs.huji.ac.il/labs/parallel/workload/), two weeks from Apr 25 15:00:03 PDT 2000
[Figures: Montage workflow; the number of arrived jobs in each hour for NASA; the number of arrived jobs in each hour for BLUE.]

Evaluation metrics
Service provider
Performance: the number of completed jobs for HTC; tasks per second for MTC.
Cost: resource consumption (node*hour).
Resource provider
The economies of scale: the total resource consumption (node*hour).
Capacity planning: peak resource consumption (nodes per hour).
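The two resource metrics can be illustrated with a short sketch. It assumes that usage is recorded as the number of nodes held in each one-hour interval; that representation and the function names are hypothetical, not taken from the slides.

```python
from typing import Sequence


def total_consumption(nodes_per_hour: Sequence[int]) -> int:
    """Total resource consumption in node*hour: one sample per hour."""
    return sum(nodes_per_hour)


def peak_consumption(nodes_per_hour: Sequence[int]) -> int:
    """Peak resource consumption: the largest number of nodes in any hour."""
    return max(nodes_per_hour)


# A fixed-size environment of 128 nodes held for two weeks (336 hours).
fixed_usage = [128] * (14 * 24)
fixed_total = total_consumption(fixed_usage)

# A made-up dynamically resized environment over four hours.
dynamic_usage = [2, 4, 4, 1]
dynamic_total = total_consumption(dynamic_usage)
dynamic_peak = peak_consumption(dynamic_usage)
```

The fixed-size case gives 43008 node*hour, the same value the deck reports for the DCS system on the NASA trace.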

The evaluation metrics of HTC service providers

The metrics of the service provider for the NASA trace:
Configuration    Number of completed jobs    Resource consumption (node*hour)    Saved resources
DCS system       2603                        43008                               /
SSP system       2603                        43008                               0
DRP system       2603                        54118                               -25.8%
DawningCloud     2603                        29014                               32.5%

The metrics of the service provider for the BLUE trace:
Configuration    Number of completed jobs    Resource consumption (node*hour)    Saved resources
DCS system       2649                        48384                               /
SSP system       2649                        48384                               0
DRP system       2657                        35838                               25.9%
DawningCloud     2653                        35201                               27.2%

Saved resources are relative to the DCS system.
[Callouts on the slide: in comparison with the DCS/SSP systems; in comparison with the DRP systems.]
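The "Saved resources" column can be reproduced from the resource consumption values. The sketch below assumes the column is the relative reduction against the DCS system, rounded to one decimal place; the function name is hypothetical, and the numbers are the ones reported on this slide.

```python
def saved_resources_percent(baseline: int, consumption: int) -> float:
    """Relative saving against the baseline, in percent, one decimal place."""
    return round((baseline - consumption) / baseline * 100, 1)


# NASA trace, baseline: DCS system (43008 node*hour).
nasa_drp = saved_resources_percent(43008, 54118)
nasa_dawningcloud = saved_resources_percent(43008, 29014)

# BLUE trace, baseline: DCS system (48384 node*hour).
blue_drp = saved_resources_percent(48384, 35838)
blue_dawningcloud = saved_resources_percent(48384, 35201)
```

A negative value means the configuration consumes more than the DCS baseline, as the DRP system does on the NASA trace.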

The evaluation metrics of MTC service providers

The metrics of the service provider for Montage:
Configuration    Tasks per second    Resource consumption (node*hour)    Saved resources
DCS system       2.49                166                                 /
SSP system       2.49                166                                 0
DRP system       2.71                662                                 -298.8%
DawningCloud     2.49                166                                 0

Saved resources are relative to the DCS system.
[Callouts on the slide: in comparison with the DCS/SSP systems; in comparison with the DRP systems.]

The metrics for the resource provider
[Figures: total resource consumption of the resource provider; peak resource consumption of the resource provider.]
DawningCloud reduces the total resource consumption by 29.7% relative to the DCS/SSP systems and by 29.0% relative to the DRP system.
The peak resource consumption of DawningCloud is 1.06 times that of the DCS/SSP systems and only 0.21 times that of the DRP system.
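The two percentages for total resource consumption can be checked against the per-provider tables earlier in the deck. The sketch assumes that the resource provider's total is the sum of the three service providers' consumption (NASA, BLUE, Montage); the slides do not state this, but the assumption reproduces the reported figures.

```python
# Resource consumption (node*hour) per workload, from the earlier tables.
consumption = {
    "DCS/SSP": {"NASA": 43008, "BLUE": 48384, "Montage": 166},
    "DRP": {"NASA": 54118, "BLUE": 35838, "Montage": 662},
    "DawningCloud": {"NASA": 29014, "BLUE": 35201, "Montage": 166},
}

# Assumption: the provider-level total is the sum over the three workloads.
totals = {system: sum(per_workload.values())
          for system, per_workload in consumption.items()}


def saving_percent(baseline: int, value: int) -> float:
    """Relative reduction against the baseline, in percent, one decimal."""
    return round((baseline - value) / baseline * 100, 1)


saving_vs_dcs_ssp = saving_percent(totals["DCS/SSP"], totals["DawningCloud"])
saving_vs_drp = saving_percent(totals["DRP"], totals["DawningCloud"])
```

The peak figures (1.06 and 0.21 times) cannot be derived this way, because they depend on hourly usage that the tables do not report.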

Management overhead for the resource provider
[Figure: the accumulated number of node adjustments.]
Allocating or reclaiming nodes or VMs incurs management overhead for the resource provider.
In our real test, the total cost of adjusting one node is 15.743 seconds.
The average overhead of DawningCloud for the resource provider is approximately 341 seconds per hour.
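As a rough reading of these two numbers, the sketch below expresses the overhead as a fraction of an hour and as an implied number of node adjustments per hour. The second quantity assumes the hourly overhead is simply the per-node cost multiplied by the number of adjustments, which the slide does not state; the constant names are hypothetical.

```python
PER_NODE_ADJUST_SECONDS = 15.743     # cost of adjusting one node (from the slide)
AVG_OVERHEAD_SECONDS_PER_HOUR = 341  # average overhead per hour (from the slide)

# Share of each hour spent on management overhead.
overhead_fraction = AVG_OVERHEAD_SECONDS_PER_HOUR / 3600

# Assumption: overhead = adjustments * per-node cost, so the implied
# average number of node adjustments per hour is the quotient.
implied_adjustments_per_hour = (
    AVG_OVERHEAD_SECONDS_PER_HOUR / PER_NODE_ADJUST_SECONDS
)
```

Under that assumption the overhead is a little under 10% of each hour and corresponds to roughly 22 node adjustments per hour.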

TCO of the service provider in the SSP and DCS systems
We take a real case (a DCS system) from the grid lab of Beijing University of Technology (http://www.bjut.edu.cn/college/jsjxy/jigoushezhi.jsp?id=4).
TCO_dcs = (CapEx depreciation) + OpEx = $3,160 per month
TCO_ssp = (total instance cost) + (inbound transfer cost) = $2,260 per month

Analysis
From the perspective of service providers, SSP is more cost-effective than the DCS system: service providers get the same performance, but their TCO in the SSP system is lower than in the DCS system.
DawningCloud outperforms the other two cloud solutions, SSP and DRP, from the perspectives of both service providers and the resource provider, because DawningCloud has dynamic resource management mechanisms and policies.

Conclusion
With the enabling system DawningCloud, MTC or HTC service providers benefit from the economies of scale on the cloud platform.

Work in progress
We have built a formal model and verified the economies of scale on the cloud platform.
For any MTC or HTC workload, there is a configuration of DawningCloud with which the service provider consumes no more resources than with the SSP/DCS systems, with performance assurance.
For any consolidation of MTC or HTC workloads, the resource provider uses no more resources than with the SSP/DCS systems.
We have done more experiments with different parameter configurations and workloads.

Future work
Investigating the optimal resource management and scheduling policies in the context of cloud computing.
Finding the optimal parameter configurations of DawningCloud with mathematical methods.

Thanks & Questions
weisong@wayne.edu

The comparison of different usage models

Property                       DCS            SSP            DRP            DSP
Resource property              local          leased         leased         leased
Runtime environment            stereotyped    stereotyped    no offering    created on demand
Resources provision for RE     fixed          fixed          manual         flexible

The resource provision policy
The resource provision service provisions enough initial resources to the TRE at the startup of the TRE.
If the server of a TRE requests dynamic resources, the resource provision service either assigns enough resources to the server or rejects the request.
If the server of a TRE releases dynamic resources, the resource provision service passively reclaims all the released resources.
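The three rules above can be sketched as a small class. This is an illustrative sketch under stated assumptions, not DawningCloud's implementation: the class name, method names, and the node-counting model are hypothetical, and raising an error when the initial resources cannot be provided is an assumption, since the slide only says that enough initial resources are provisioned.

```python
class ResourceProvisionService:
    """Hypothetical node-counting model of the provision policy."""

    def __init__(self, total_nodes: int) -> None:
        self.free_nodes = total_nodes
        self.dynamic = {}  # TRE name -> dynamic nodes currently held

    def start_tre(self, tre: str, initial_nodes: int) -> None:
        """Rule 1: provision the initial resources at TRE startup."""
        if initial_nodes > self.free_nodes:
            # Assumption: startup fails if the pool is too small.
            raise RuntimeError("not enough nodes for initial provisioning")
        self.free_nodes -= initial_nodes
        self.dynamic[tre] = 0

    def request_dynamic(self, tre: str, nodes: int) -> bool:
        """Rule 2: assign the full request or reject it (no partial grant)."""
        if nodes > self.free_nodes:
            return False
        self.free_nodes -= nodes
        self.dynamic[tre] += nodes
        return True

    def release_dynamic(self, tre: str, nodes: int) -> int:
        """Rule 3: passively reclaim whatever the TRE server releases."""
        reclaimed = min(nodes, self.dynamic[tre])
        self.dynamic[tre] -= reclaimed
        self.free_nodes += reclaimed
        return reclaimed


# Usage: a 10-node pool, one HTC runtime environment.
service = ResourceProvisionService(total_nodes=10)
service.start_tre("htc", initial_nodes=4)
granted = service.request_dynamic("htc", 5)   # fits in the 6 free nodes
rejected = service.request_dynamic("htc", 2)  # only 1 node is free
reclaimed = service.release_dynamic("htc", 5)
```

"Passive" is modeled here as the service never taking nodes back on its own: nodes return to the pool only when a TRE server releases them.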

The configurations of the experiments

Key factor                                                                      Setting
The scheduling policy for the DawningCloud, SSP, and DCS systems                first-fit for HTC; FCFS for MTC
The time unit of leasing resources for the DawningCloud, SSP, and DRP systems   one hour
The configurations of runtime environment for the SSP and DCS systems           128/144 for HTC; 166 for MTC
The configurations of runtime environment for DawningCloud                      resource management and provision policies for DawningCloud