On the Amplitude of the Elasticity Offered by Public Cloud Computing Providers

Rostand Costa a,b, Francisco Brasileiro a

a Federal University of Campina Grande, Systems and Computing Department, Distributed Systems Lab, Av. Aprígio Veloso, 882 - Bloco CO, Bodocongó, Campina Grande, Paraíba, Brazil
b Federal University of Paraíba, Informatics Department, Digital Video Applications Lab, Campus I - Cidade Universitária, João Pessoa, Paraíba, Brazil

Abstract

The cloud computing paradigm allows for the provision of Information Technology infrastructure in the form of a service that clients acquire on demand, paying only for the amount of service that they actually consume. In particular, considering that the area of a service request is given by the product of the amount of resources requested per unit of time and the time interval during which the resources are used, there is a cost associativity property that provides clients with many choices on how to mold the area of their service requests, while paying the same price for any two requests that have the same area. This property is very useful to classes of applications that are becoming more and more popular. Unfortunately, current public cloud computing providers impose very strict limits on the amount of service that a single user can acquire at each instant of time, restricting the extent to which applications can take advantage of cost associativity. In this paper we analyze the reasons why these providers need to impose such a limit. We show that increases on this limit have an important impact on the profit achieved by the providers; therefore, applications that could benefit from extremely high elasticity cannot be appropriately served by the current model of public cloud computing provision.

Keywords: Cloud computing, elasticity, availability, capacity planning, BoT

Preprint submitted to Information Processing Letters, June 18, 2011
1. Introduction

Cloud computing is an evolving paradigm that allows the provision of Information Technology (IT) in the form of a service that can be purchased online and on demand by clients. The resources that are bought from a public cloud computing provider can be rapidly provisioned to customers, and later released by them, offering a theoretically unlimited elasticity in the way service consumption increases and decreases with time. Moreover, clients are charged using a pricing model where they pay only for the amount of service that they actually consume. More formally, let the service requested by a client from a cloud computing provider over time be defined by an infinite sequence of tuples s_1, s_2, ..., with s_i = ⟨ρ_i, σ_i, δ_i⟩, where ρ_i is the amount of resources requested in service request s_i, σ_i is the time when the client wants to start using the resources, and δ_i is the duration of the time interval for which the ρ_i resources are requested. The elasticity property defines that no restrictions are imposed on ρ_i − ρ_{i−1}, for any i, i > 1, while the pay-as-you-go property defines that the fee charged to the client for any request s_i is a function of ρ_i · δ_i. Moreover, a cost associativity property [1] defines that clients are charged the same fee for any two requests s_i and s_j such that ρ_i · δ_i = ρ_j · δ_j.

The cost associativity property of the cloud computing model is particularly interesting for an important class of applications that has become increasingly popular nowadays: the so-called embarrassingly parallel, or simply bag-of-tasks (BoT), applications [2]. They are comprised of a very large number of tasks that can be independently executed. Thus, they can be trivially parallelized using a simple workqueue scheduler that keeps dispatching tasks to be executed on any available processing resource, until all tasks are executed.
It is easy to see that, considering homogeneous resources, the more resources are available to run BoT applications, the faster they will complete their executions.
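The interplay between cost associativity and BoT parallelization can be illustrated with a small sketch. The per-slot price and the task and slot sizes below are arbitrary assumptions; the point is only that the billed area, and hence the fee, is the same whether the tasks run on few or on many resources:

```python
import math

# Hypothetical per-slot price per resource; any constant works for the argument.
PRICE_PER_RESOURCE_SLOT = 0.10

def bot_cost_and_makespan(n_tasks, n_resources, slots_per_task=1):
    """A workqueue scheduler on homogeneous resources: each of the
    n_resources executes tasks back to back until all are done."""
    # Makespan: number of rounds needed to drain the task queue.
    makespan = math.ceil(n_tasks / n_resources) * slots_per_task
    # Billed area: resources held for the whole makespan.
    area = n_resources * makespan
    return makespan, area * PRICE_PER_RESOURCE_SLOT

# 1000 single-slot tasks: 10 resources vs. 1000 resources.
m_slow, fee_slow = bot_cost_and_makespan(1000, 10)
m_fast, fee_fast = bot_cost_and_makespan(1000, 1000)
print(m_slow, fee_slow)   # 100 slots
print(m_fast, fee_fast)   # 1 slot, same fee
```

With 1000 single-slot tasks, 10 resources finish in 100 slots and 1000 resources finish in a single slot, for the same fee: under cost associativity, maximum parallelization comes at no extra cost.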
Due to the cost associativity property, clients who need to run a BoT application could fully exploit the elasticity property of cloud computing providers, and request as many computing resources as needed in order to maximize the level of parallelization of the execution of their applications, allowing them to run their applications in the shortest possible time without any additional cost, when compared to other scheduling options. Obviously, the elasticity of a cloud computing provider is limited by its capacity. However, public cloud computing providers currently in operation impose a limit on the amount of resources that each client can request simultaneously that is very low compared with their capacity; for instance, currently, one of the major players in the market limits this to 20 resources in most cases [3]. Although the limits currently imposed by cloud computing providers do not prevent most clients from seeing the service provided as an infinite source of resources, this is not the case for most of the clients that need to execute BoT applications, which may require the instantiation of thousands of computing resources in a single request [4]. In this paper, we analyze the reasons why public cloud computing providers impose on their clients limits that restrict the usefulness of their services for the execution of BoT applications.

2. A Simple IaaS Provider Model

The cloud computing paradigm can be used at different levels of the IT stack [5]. At its lowest level, infrastructure-as-a-service, or IaaS for short, clients can purchase fully functional virtual machines which run a particular operating system, on top of which clients can install and run their own applications. In this paper, we focus our attention at this level. Following the cloud computing paradigm, a client of an IaaS provider requests the provision of resources whenever it needs them. If available, these resources are allocated to the client by the provider for some time.
Typically, the client defines the duration of this time interval, and returns the resources that were allocated to it whenever it no longer needs them. Providers charge
clients based on a price that is associated with a fixed-length, minimal reference allocation interval. Thus, clients are always charged for the smallest multiple of this interval that is larger than or equal to the time period for which the resources were used.

We are interested in analyzing the behavior of an IaaS provider within a long enough time interval of length T. To simplify the model, we consider that this time interval is divided into small slots of fixed size, and that allocations and deallocations of resources are always performed at the beginning of such time slots. We model an IaaS provider as a tuple P = ⟨K, U, D, A, C_i, C_u, C_v, V⟩, where: K is the provider's capacity, representing the maximum amount of resources available; U is the set of users (clients) registered in the provider; D is the distribution of demand of these users; A is the resource allocation strategy used by the provider; C_i is the cost incurred by the provider for making each individual resource available in each time slot, no matter if it is being used or not [6]; C_u is the additional cost incurred by the provider for each time slot in which an individual resource is effectively used [6][7]; C_v is the cost to the provider for each violation committed; V is the amount charged to users for the effective utilization of a resource during one time slot. We consider that D is represented by the aggregation of the independent infinite sequences of requests s_1, s_2, ..., issued by each user u ∈ U. We represent the demand that a user u has for resources at a time slot t by the function D(u, t). Depending on the allocation strategy adopted (A) and the provider's
capacity (K), each user u requesting D(u, t) resources in time slot t will receive an allocation of resources that is expressed by the function A(u, t), with 0 ≤ A(u, t) ≤ D(u, t). When A(u, t) < D(u, t) we have a violation in the availability of the service provided. In this way, the total number of violations occurred in a time slot t is given by: V(t) = Σ_{u ∈ U} 1_{A(u,t) < D(u,t)}. One way to gauge the efficiency of the provider is to measure its profit in the period of time considered. Let U(t) = Σ_{u ∈ U} A(u, t) be the total utilization of the provider at time slot t, and P(t) be the profit attained in time slot t; thus, the total profit achieved by the provider is given by: P = Σ_{t=1}^{T} P(t), where P(t) = U(t) · (V − C_u) − V(t) · C_v − K · C_i.

3. Characterization of Users' Profiles

Considering the interval of interest T, our analysis needs to consider only those users that are active, i.e. those that have requested resources in at least one time slot within the period T. Formally, U_a = {u ∈ U | ∃t, 1 ≤ t ≤ T, D(u, t) > 0}. We consider that active users can have one of two possible behaviors: regular or eventual. Regular users are those with uninterrupted use during the whole period T, and are described as: U_r = {u ∈ U_a | ∀t, 1 ≤ t ≤ T, D(u, t) > 0}. All other active users are classified as eventual, i.e. U_e = U_a \ U_r. During the period of interest T, eventual active users alternate between periods of consumption (on session) and periods with no consumption (off session) [8]. The interval of time between two consecutive sessions is called the think time, and follows some particular distribution [9].

4. Analysis

Let us first consider the case where no service violations occur during the period of interest, i.e. V(t) = 0, ∀t, 1 ≤ t ≤ T. For that, the capacity of the provider must be such that K ≥ U(t), ∀t, 1 ≤ t ≤ T. In this case, the total profit attained by the provider is given by:
P = Σ_{t=1}^{T} (V − C_u) · U(t) − K · C_i, with K = max_{1 ≤ t ≤ T} U(t). As V, C_u and C_i are constants, we can infer that as the ratio E(U(t))/K tends to 1, the profit of the provider in T is maximized. This would be the case if the users' combined demand per time slot followed a normal distribution with a small variance.

The demand of each regular user is typically modeled as a normal distribution with small variance. Thus, the aggregated demand of regular users also follows a normal distribution, with both mean and variance being the sum of the individual means and variances, respectively. On the other hand, if there are eventual users, their behavior can lead to very high short-lived peak demands, compared to E(U(t)), especially if these are users executing BoT applications. This pressures K to take a value that is much higher than E(U(t)), increasing the idleness of the infrastructure in the periods where only regular users are requesting resources, which, in turn, leads to a decrease in the provider's profit. Note that an increase in the number of eventual users, compared to the number of regular users, only makes the problem worse, given that they do not demand resources in all time slots.

In this case, the provider can choose to keep K closer to E(U(t)), at the expense of reducing its service availability, i.e. allowing violations to happen. Depending on the number of violations and the value of C_v, this may be a more profitable approach. Fundamentally, the provider needs to set K in such a way that the losses due to idleness ((K − E(U(t))) · C_i) and those due to penalties (V(t) · C_v) are balanced. Establishing this balance may turn out to be difficult in practice. Thus, a more pragmatic approach is to impose a limit L on the maximum number of resources that a single user can demand at any time slot, essentially forcing users to self-regulate their demand within L.
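The trade-off between idleness losses and violation penalties can be sketched numerically. The parameter values and the demand trace below are made-up assumptions, and the first-come-first-served allocation is just one possible instance of the strategy A:

```python
# Assumed parameters: price V, usage cost C_u, idleness cost C_i, penalty C_v.
V, C_u, C_i, C_v = 1.0, 0.2, 0.3, 5.0

def profit(demands, K):
    """demands[t] is a list with each active user's demand in slot t;
    returns the total profit P = sum over t of
    U(t)*(V - C_u) - V(t)*C_v - K*C_i."""
    total = 0.0
    for slot in demands:
        allocated = 0
        violations = 0
        for d in slot:                 # first-come-first-served allocation
            a = min(d, K - allocated)  # grant what still fits within K
            allocated += a
            if a < d:
                violations += 1        # availability violation for this user
        total += allocated * (V - C_u) - violations * C_v - K * C_i
    return total

# Regular users need 40 per slot; an eventual user spikes to 200 in one slot.
trace = [[40], [40], [40, 200], [40]]
print(profit(trace, K=240))  # sized for the peak: high idleness cost
print(profit(trace, K=40))   # sized for the regular demand: one violation
```

In this (assumed) setting, sizing K for the regular demand and accepting a single violation is more profitable than sizing K for the eventual peak, illustrating why keeping K close to E(U(t)) can pay off.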
Note that this does not prevent violations from happening, but it allows for a better planning of the provider's capacity. By using such a limit, the cloud computing provider forces the natural demands of eventual users to be artificially truncated. In order to be served,
any user that has a demand at any time slot t that is larger than L will need to accommodate its demand, if possible, in a number of subsequent time slots. This guarantees that the average demand of any user, regular or not, is smaller than L, and makes the variance of the demand of eventual users a function of L. Therefore, the smaller the L, the smaller the idleness cost associated with the infrastructure. On the other hand, the smaller the L, the less service can be sold to the provider's users, and the smaller the profit achieved. Thus, L needs to be set large enough to serve regular users well, and small enough to spread the demand of eventual users and increase the utilization of the installed capacity of the provider.

5. Related Work

The work by Menascé and Ngo [10] discusses how traditional methods of capacity planning were impacted by the advent of cloud computing, and how the risks and costs involved are migrating from customers to providers. The further investigation we conducted in this paper on the aspects of availability and demand regulation on the part of providers confirms this observation. The study by Greenberg et al. [11] shows that the typical costs associated with the construction of cloud data centers are distributed into four categories. A framework for the detailed analysis of these investments, and of how they make up the total cost of ownership of providers, was proposed by Li et al. [6]. Their study expanded the classification to eight categories of costs, and introduced the concept of Utilization Cost, the cost associated with the effective consumption of resources by users. We use the depreciation and utilization costs proposed by Li et al. to compute the profit obtained by an IaaS provider. Anandasivam et al.
[12] introduce a version of the concept of bid price tailored to cloud computing, in which the provider uses an auction system that influences the behavior of price-sensitive users and regulates the use of the available resources. Our study shows that the limit imposed by providers is another way to regulate users' demands.
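The regulating effect of the limit L discussed in Section 4 can be sketched in a few lines; the demand trace below is an assumed example:

```python
# A sketch of demand truncation under a per-slot limit L: a demand larger
# than L must be accommodated, if possible, in subsequent time slots.

def truncate_demand(demand, L):
    """Rewrite a per-slot demand sequence so that no slot exceeds L,
    carrying the excess over to the following slots."""
    out, carry = [], 0
    for d in demand:
        want = d + carry
        grant = min(want, L)
        carry = want - grant
        out.append(grant)
    while carry > 0:            # drain leftover demand in extra slots
        grant = min(carry, L)
        carry -= grant
        out.append(grant)
    return out

# An eventual user bursting to 100 resources under a limit L = 20:
print(truncate_demand([100, 0, 0], L=20))  # [20, 20, 20, 20, 20]
```

The total area (and hence the fee) is preserved, but the peak seen by the provider drops from 100 to L, at the cost of a longer completion time for the user's BoT application.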
6. Conclusion

Our study shows that, as the demand of regular users is permanent and predictable, its growth is beneficial to the profitability of the provider, since it does not impose a risk of oversizing the infrastructure. Thus, the profit of the provider is negatively affected only by the portion of the demand that comes from eventual users, which can result in increased idleness of the infrastructure, if not controlled. This is only exacerbated when eventual users are eager resource consumers and place very large sporadic demands. We observed that users with eventual and intense utilization push up the minimum capacity necessary and increase the idleness of the system, increasing the operational costs of the provider. In this way, not only is the imposition of a limit on the allocation of resources necessary, but the value assigned to it also has a significant impact on the investments in infrastructure required to ensure an adequate level of availability for the provider.

The next steps in our research include the investigation of alternative ways to minimize the costs involved in increasing the capacity of public cloud computing providers to appropriately deal with the demand of eventual eager users, such as those that need to run BoT applications. These costs are a major obstacle to the provision of elasticity under more flexible conditions, which would allow these users to fully benefit from the advantages of the cloud computing model. The discovery, federation and resale of resources already amortized in other contexts may represent a promising path, because they rely on the existence of idle capacity in contexts where the costs of availability have already been absorbed by other businesses or purposes [13].

References

[1] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., A view of cloud computing, Communications of the ACM 53 (2010) 50-58.
[2] W. Cirne, D. Paranhos, L. Costa, E. Santos-Neto, F. Brasileiro, J. Sauve, F. Silva, C. Barros, C. Silveira, Running Bag-of-Tasks applications on computational grids: the MyGrid approach, IEEE, 2003.
[3] AWS, Amazon Web Services, 2011.
[4] M. Sevior, T. Fifield, N. Katayama, Belle monte-carlo production on the Amazon EC2 cloud, Journal of Physics: Conference Series 219 (2010).
[5] K. Stanoevska-Slabeva, T. Wozniak, Cloud basics - an introduction to cloud computing, in: K. Stanoevska-Slabeva, T. Wozniak, S. Ristol (Eds.), Grid and Cloud Computing, Springer Berlin Heidelberg, 2010, pp. 47-61.
[6] X. Li, Y. Li, T. Liu, J. Qiu, F. Wang, The Method and Tool of Cost Analysis for Cloud Computing, in: 2009 IEEE International Conference on Cloud Computing, 2009, pp. 93-100.
[7] L. A. Barroso, U. Hölzle, The Case for Energy-Proportional Computing, Computer 40 (2007) 33-37.
[8] D. G. Feitelson, Workload Modeling for Computer Systems Performance Evaluation, 2009.
[9] R. Jain, The Art of Computer Systems Performance Analysis, Wiley-Interscience, 1991.
[10] D. A. Menascé, P. Ngo, Understanding Cloud Computing: Experimentation and Capacity Planning, in: 2009 Computer Measurement Group Conference, p. 11.
[11] A. Greenberg, J. Hamilton, D. A. Maltz, P. Patel, The cost of a cloud, ACM SIGCOMM Computer Communication Review 39 (2008) 68.
[12] A. Anandasivam, S. Buschek, R. Buyya, A Heuristic Approach for Capacity Control in Clouds, in: IEEE CEC 2009, IEEE, 2009, pp. 90-97.
[13] R. Costa, F. Brasileiro, G. Lemos, D. Mariz, Just in Time Clouds: Enabling Highly-Elastic Public Clouds over Low Scale Amortized Resources, 2010.