Randomized Algorithms for Scheduling VMs in the Cloud

Randomized Agorithms for Scheduing VMs in the Coud Javad Ghaderi Coumbia University Abstract We consider the probem of scheduing VMs (Virtua Machines) in a muti-server system motivated by coud computing appications. VMs arrive dynamicay over time and require various amounts of resources (e.g., CPU, Memory, Storage, etc.) for the duration of their service. When a VM arrives, it is queued and ater served by one of the servers that has sufficient remaining capacity to serve it. The scheduing of VMs is subect to: (i) packing constraints, i.e., mutipe VMs can be be served simutaneousy by a singe server if their cumuative resource requirement does not vioate the capacity of the server, and (ii) non-preemption, i.e., once a VM is schedued in a server, it cannot be interrupted or migrated to another server. To achieve maximum throughput, prior resuts hinge on soving a hard combinatoria probem (Knapsack) at the instances that a the servers become empty (the so-caed goba refresh times which require synchronization among the servers). The main contribution of this paper is that it resoves these issues. Specificay, we present a cass of randomized agorithms for pacing VMs in the servers that can achieve maximum throughput without preemptions. The agorithms are naturay distributed, have ow compexity, and each queue needs to perform imited operations. Further, our agorithms dispay good deay performance in simuations, comparabe to deay of heuristics that may not be throughput-optima, and much better than the deay of the prior known throughput-optima agorithms. Index Terms Resource Aocation, Markov Chains, Stabiity, Coud Computing, Knapsack Probem I. INTRODUCTION Coud computing has obtained considerabe momentum recenty and different coud computing modes and services have emerged to meet the needs of various users. By using coud, users no onger require to insta and maintain their own infrastructure and can instead use massive coud computing resources on demand. In Infrastructure as Service (IaaS) mode of the coud computing, the coud provider (e.g., Amazon EC2 ) provides computing, networking, and storage capabiities to users through Virtua Machines (or Instances). Each Virtua Machine (VM) specifies certain amounts of CPU, memory, storage, etc. The users can request from mutipe Virtua Machine (Instance) types depending on their needs. As demand for the coud services continues to scae, resource aocation to meet the demand presents a chaenging probem for two reasons: first, the coud workoad is a priori unknown and wi ikey be variabe over both time and space; and second, serving VMs in the servers of the coud is subect to packing constraints, i.e., the same server can serve mutipe VMs simutaneousy if the cumuative resource requirement of those VMs does not vioate the capacity of the server. Therefore, to maintain the scaabiity and efficiency of the coud architecture, it is imperative to deveop efficient resource aocation agorithms addressing these chaenges. In this paper, we consider a system consisting of a (possiby arge) number of servers. The servers are not necessariy homogeneous in terms of their capacity for various resources (e.g. CPU, memory, storage). The VMs of various types arrive dynamicay over time. Once a VM arrives, it is queued and ater served by one of the servers that has sufficient remaining capacity to serve it. Once the service is competed, the VM departs the system and reeases the resources. The throughput of the system is defined as the average number of VMs of various types that can be served by the system over time. We are interested in efficient and scaabe scheduing agorithms that maximize the throughput of the system. Further, we ideay woud ike to do scheduing without preemptions (i.e., without interrupting the ongoing services of the obs in the system) since preemptions require the interrupted obs to be migrated to new machines or restored at a ater time, which are both undesirabe (expensive) operations 2. In this paper, we use the terms VM (Virtua Machine) and ob interchangeaby. A. Motivation and Chaenges Consider a very simpe exampe: a singe server system with two types of obs. Suppose there is ony one type of resource and the server capacity is units. Jobs of type require 2 units of resource and obs of type 2 require 3 units of resource. This impies that, at any time, the server can simutaneousy serve k obs of type and k 2 obs of type 2 if k (2)+k 2 (3). We refer to (k, k 2 ) as the server configuration. There are two queues Q and Q 2 for hoding obs of type and type 2 waiting to get service. To ensure maximum throughput, prior work 3 5 essentiay reies on the max weight agorithm 6 which operates as foows: consider a weight for each ob type equa to its queue size and then choose a configuration that has the maximum sum weight. Formay, given the current queue sizes Q and Q 2, the max weight agorithm seects a pair (k, k 2 ) that soves the foowing combinatoria optimization probem max (k,k 2) subect to 2k + 3k 2 Q k + Q 2 k 2 () k, k 2 {,, 2, } There are two main issues with respect to compexity and dynamics of this agorithm that wi be substantiay magnified

as the system size scaes: (i) Compexity: Soving the optimization () is not easy, especiay when there is a arge number of mutidimensiona ob types and the system consists of a arge number of (inhomogeneous) servers. In fact, the optimization () is an instance of the cassica Knapsack probem which in genera is NP-compete 7. (ii) Preemption: The agorithm needs to sove () to find the right configuration, whenever the queues change (i.e., every time a ob arrives/departs), and reset the configuration accordingy. Resetting the configuration however can interrupt the ongoing services of the existing obs in the servers and require them to be migrated to new servers or restored at a ater time (which is expensive). One aternative proposed in 4, 5 is to reset the configuration to the max weight configuration at the so-caed refresh times. The oca refresh times for a server are defined as the times when a the obs in the server eave and the server becomes empty. This resoves the preemption issue but, as noted in 4, 5, this approach in genera requires resetting the configurations of a the servers at the goba refresh times which are times at which a the servers in the system become empty simutaneousy. Such goba refresh times become extremey infrequent as the number of servers increases, which has a negative impact on the deay performance; further, it requires synchronization among the servers which is not practica. B. Contribution The contribution of this paper is to resove the compexity and refresh time issues discussed above. In particuar, we propose a simpe ow-compexity agorithm that provides seamess transition between configurations and prove that it is throughput-optima. For the singe server, two ob-type exampe described in Section I-A, our agorithm can be essentiay described as foows: Assign a dedicated Poisson cock of rate ( + Q ) to the -th queue, =, 2. Whenever the cock of the -th queue ticks, try to fit a type- ob into the server if possibe. As we wi show, this simpe mechanism can be easiy extended to systems that have a arge number of servers (which are not necessariy homogeneous) and have many ob types. In summary, our agorithm (i) has ow compexity, is scaabe to arge scae server systems with centraized or distributed queues, and is throughput-optima. (ii) does not rey on refresh times, thus provides seamess transition among the configurations without preemption and without coordination among the servers. (iii) dispays good deay performance in simuations, comparabe to deay of heuristics that may not be throughputoptima, and much better than the deay performance of the prior known throughput-optima poicies. C. Reated Work At the high eve, our work is at the intersection of resource aocation in coud data centers (e.g. 8, 9,, 3, 4) and scheduing agorithms in queueing systems (e.g. 6, 5 2). The VM pacement in an infinite server system mode of coud has been studied in 2 24. We woud ike to highight three cosey reated papers 3, 4, 5 where a finite mode of the coud is studied and preemptive 3 and non-preemptive 4, 5 scheduing agorithms to stabiize the system are proposed. The proposed agorithms however essentiay rey on the max weight agorithm and hence, as expained in Section I-A, in genera suffer from high compexity and resetting at the goba refresh times. In the case that a the servers are identica, it is sufficient to reset the server configurations at the so-caed oca refresh times, namey, time instances when a server becomes empty, which are more frequent than the goba refresh times 4, 5; however, it is not cear if operation based on oca refresh times is stabe in genera. D. Notations Some of the basic notations used in this paper are the foowing. S denotes the cardinaity of a set S. (x A) is the indicator function which is if x A, and otherwise. e denotes a vector whose -th entity is and its other entities are. e i denotes a matrix whose entity (i, ) is one and its other entities are. R + denotes the set of rea nonnegative numbers. For any two probabiity vectors π, ν R n +, the Kuback Leiber (KL) divergence of π from ν is defined as D KL (π ν) = i π i og πi ν i. We use Ξ n to denote the n- dimensiona simpex of probabiity vectors Ξ n = {p R n + : i p i = }. Given two vectors x, y R n, x < y means x i < y i componentwise. f(x) = o(g(x)) means f(x)/g(x) goes to in a imit in x specified in the context. E. Organization The rest of the paper is organized as foows. We start with the system mode and definitions in Section II. Sections III and IV contain the description of our agorithms for centraized and distributed queueing architectures respectivey. The main idea behind the agorithms is briefy expained in Section V. The proof detais are presented in Section VI. Section VII contains the simuation resuts. Section VIII contains concusions and possibe extensions of our resuts. II. SYSTEM MODEL Coud Custer Mode and VM-based Jobs: Consider a coection of servers L. Each server L has a imited capacity for various resource types, e.g. CPU, memory, storage, etc. We assume there are n types of resources. Servers are not necessariy homogeneous in terms of their capacity. We define L = L. There is a coection of VM (Virtua Machine) types J, where each VM type J requires certain amounts of various resources. Hence each VM type can be thought of as an n-dimensiona vector of resource requirements. We define J = J.

VM (Job) Arrivas and Departures: Henceforth, we use the word ob and VM interchangeaby. We assume VMs of type arrive according to a Poisson process with rate λ. Each VM must be paced in a server that has enough avaiabe capacity to accommodate it. VMs of type require an exponentiay distributed service time with mean /µ. The assumptions such as Poisson arrivas and exponentia service times can be reaxed (see Section VIII) but for now et us consider this mode for simpicity. Server Configuration and System Configuration: For each server, a row vector k = (k,, kj ) RJ + is said to be a feasibe configuration if the server can simutaneousy accommodate k type- VMs, k2 type-2 VMs,, kj type-j VMs. We use K to denote the set of feasibe configurations for server. Note that we do not necessariy need the resource requirements to be additive, we ony require the monotonicity of the feasibe configurations, i.e., if k K, and k k (component-wise), then k K. We aso define the system configuration as a matrix k R L J + whose -th row (k ) is the configuration of server. We use K to denote the set of a feasibe configuration matrices for the system. Queueing Dynamics and Stabiity: When obs arrive, they are queued and then served by the servers. We use Q (t) to denote the tota number of type- obs in the system waiting for service. We aso define the row vector Q(t) = (Q (t), J ). The obs can be queued either centray or ocay as described beow. (i) Centraized Queueing Architecture: There are a tota of J queues, one queue for each ob type. When a ob arrives, it is paced in the corresponding queue and ater served by one of the servers. Hence here Q (t) is simpy the size of the -th centraized queue. (ii) Distributed Queueing Architecture: There are a tota of J L queues ocated ocay at the servers. Each server has J queues, one queue for each ob type. When a ob arrives, it is paced in a proper oca queue at one of the servers and then served by the same server ater. We use Q (t) to denote the size of the -th queue at server and use Q (t) to denote the row vector Q (t) = (Q (t), J ). Hence Q (t) = L Q (t) is the tota number of type- obs waiting for service in the system. We aso use Q(t) to denote a matrix whose -th row is Q (t). Under both architectures, the tota number of obs waiting for service foows the usua dynamics: Q (t) = Q () + A (, t) D (, t), (2) where A (, t) is the number of type- obs arrived up to time t and D (, t) denotes the number of type- obs either departed up to time t or receiving service at time t. The system is said to be stabe if the queues remain bounded in the sense that im sup E t Q (t) <. (3) J A vector of arriving rates λ and mean ob sizes /µ is said to be supportabe if there exists a resource aocation agorithm under which the system is stabe. Let ρ = λ /µ be the workoad of type- obs. Define C = {x R J + : x = L x, x Conv(K )} where Conv( ) is the convex hu operator. It has been shown in 3 5 that the set of supportabe work oads is the interior of C, i.e., C o = {ρ R J + : x C s.t. ρ < x} The goa of this paper is to deveop ow compexity agorithms that can stabiize the queues for a ρ C o (i.e., throughput optimaity), under both centraized and distributed queueing architectures, without preemptions. III. SCHEDULING ALGORITHM WITH CENTRALIZED QUEUES In this section, we present our agorithm for the system with centraized queues. Reca that by centraized queues, we mean that there is a set of common queues Q, J, representing the obs waiting to get service. When a type- ob arrives, it is added to queue Q. Once a ob is schedued for service in one of the servers, it is paced in the server and eaves the queue. The agorithm is based on construction of dedicated Poisson cocks for the queues. Each queue Q is assigned an independent Poisson cock of rate exp(w (t)), where w (t) = f(q (t)) for an increasing concave function f to be specified ater. Hence, at each time t, the time duration unti the tick of the next cock is an exponentia random variabe with parameter exp(w (t)). The description of the agorithm is given beow. Agorithm Scheduing Agorithm with Centraized Queues Suppose the dedicated cock of the type- queue makes a tick, then: : One of the servers is chosen uniformy at random. 2: If a type- ob can fit into this server: - if there are type- obs in the queue, pace the headof-the-ine ob in this server, - ese, pace a dummy type- ob is this server. Ese: do nothing. In Agorithm, dummy obs are treated as rea obs, i.e., dummy obs of type depart after an exponentiay distributed time duration with mean /µ. The foowing theorem states the main resut regarding the throughput-optimaity of Agorithm. Theorem : Any ob oad vector ρ C o is supportabe by Agorithm. This means that if Q changes at a time t > t before the cock makes a tick, the time duration unti the next tick is reset to an independent exponentia random variabe with parameter exp(w (t )).

IV. SCHEDULING ALGORITHM WITH DISTRIBUTED QUEUES In this section, we present the agorithm for the system with distributed queues. In this architecture, each server L has a set of oca queues Q, J. When a type- ob arrives to the system, it is routed to a proper server where it is queued. Each server seects a set of obs from its own set of oca queues to serve. Simiar to Agorithm, each queue Q is assigned an independent Poisson cock of rate exp(w (t)), where w (t) = f(q (t)) for an increasing concave function f to be specified ater. The description of the agorithm is given beow. Agorithm 2 JSQ Routing and Scheduing Agorithm with Distributed Queues Job Arriva Instances: Suppose a type- ob arrives at time t. The ob is routed based on JSQ (Join the Shortest Queue), i.e., it is assigned to the server with the shortest queue for type- obs. Formay, et (t) = arg min Q (t) (break ties arbitrariy). Then { Q Q (t + ) = (t) +, if = (t) Q (t), otherwise. Dedicated Cock Instances: Suppose the dedicated cock of queue Q makes a tick at time t, then: If a type- ob can fit into server : if Q (t) is nonempty, pace the head-of-the-ine ob in this server, ese, pace a dummy type- ob is this server. Ese: do nothing. In Agorithm 2, dummy obs are treated as rea obs, i.e., dummy obs of type depart after an exponentiay distributed time duration with mean /µ. Remark : The JSQ routing can be repaced by simper aternatives such as the power-of-two-choices routing 25. Namey, when a ob arrives, two servers are seected at random, and the ob is routed to the server which has the smaer queue for that ob type. The foowing theorem states the main resut regarding the throughput-optimaity of Agorithm 2. Theorem 2: Any ob workoad vector ρ C o is supportabe by Agorithm 2. V. MAIN IDEA BEHIND THE ALGORITHMS AND CONNECTION TO LOSS SYSTEMS The main idea behind the agorithms is that the generation of configurations is essentiay governed by an imaginary oss system whose ob arrivas are the dedicated Poisson cocks in Agorithms and 2. We first define the oss system formay and then mention a few properties of the oss system that wi constitute the basis for the anaysis of our agorithms. Definition (LOSS(L, J, w)): The oss system consists of a set of servers L, a set of ob types J, and a vector of weights w = (w ; J ). The obs of type J arrive as a poisson process of rate exp(w ). Every time a ob arrives, one of the servers is samped uniformy at random, if the ob can fit in the server, it is paced in that server, otherwise the ob is dropped (ost forever). The obs of type eave the system after an exponentiay distributed time duration with mean /µ. The evoution of configurations in the oss system can be described by a continuous-time Markov chain over the space of configurations K with the transition rates as foows: k k + e exp(w ) at rate (k + e K), L k k e at rate µ k(k > ). The foowing emma characterizes the steady-state behavior of configurations in the oss system. Lemma : The steady state probabiity of configuration k in LOSS (L, J, w) is given by φ w (k) = Z w exp( w k)) ( ) k k!, (4) Lµ where Z w is the normaizing constant. Proof: Under the uniform routing, the Markov chain is reversibe. For any pair k and k+e K, the detaied baance equation is given by φ(k)exp(w ) L = φ(k + e )(k + )µ, where the eft-hand side is the transition rate from k to k+e, and the right-hand side is the transition rate from k + e to k. It can be shown that the set of detaied baance equations have a soution as given in (4). Lemma 2: The probabiity distribution φ w soves the foowing maximization probem max E p p Ξ K w k J L D KL (p φ ), (5) where D KL ( ) is the KL divergence distance, Ξ K is the set of probabiity distributions over K, and φ = φ w=, i.e, φ (k) = ( ) k Z k!. (6) Lµ Proof: Let F (p) denote the obective function F (p) = E p w k D KL (p φ ). J L Observe that F (p) is stricty concave in p. The agrangian is given by L(p, η) = F (p) + η( k K p(k) ), where η R is the agrange mutipier associated with the constraint p Ξ K (i.e., p : k K p(k) = ). Taking the partia derivatives and soving L/ p(k) = yieds p(k) = exp( + η)φ (k) exp( kw ); k K,

which is automaticay nonnegative for any η. Hence, by KKT conditions (p, η ) is the optima prima-dua pair if it satisfies K p (k) =. Thus the optima distribution p is given by φ w as defined in 4. Lemma 2 indicates that given a set of weights w, J, the oss system can generate configurations that roughy have the maximum weight max k w k. The foowing coroary formaizes this statement. Coroary : The probabiity distribution φ w satisfies E φ w k w max k K k w + min k K og φ (k), for φ defined in (6) independenty of w. Proof: Define k := arg max kw, (7) k K and et δ k (k) = (k = k ). As a direct consequence of Lemma 2, E φ w kw D KL (φ w φ ) k w D KL (δ k φ ). But D KL (υ φ ), for any distribution υ, so E φ w kw k w D KL (δ k φ ) = Connection to Agorithm : k w + og φ (k ) k w + min k K og φ (k). The generation of configurations under Agorithm is governed by the (imaginary) oss system LOSS(L, J, w(t)) whose ob arrivas are the Poisson cocks of Agorithm. Now suppose the dynamics of the oss system converges to the steady state at a much faster time-scae compared to the time-scae of changes in w(t) (i.e., a time-scae separation occurs), then the distribution of configurations in the system roughy foows the stationary distribution of configurations in LOSS(L, J, w(t)) given by φ w(t) in (4). Then by Coroary, Agorithm on average generates configurations which are cose to the max weight configuration (off by a constant factor min k K og φ (k) independent of queue sizes) which suffices for throughput-optimaity. The time-scae separation hods under functions f(x) that grow as o(og(x)) when x. Simiar time-scae separations arise in the context of scheduing in wireess networks, switches and oss networks and have been formay proved in a sequence of paper 7, 9, 2, 26. Estabishing the time-scae separation for our setting foows simiar arguments and is not the main contribution of this paper. Hence, in the proof of throughput optimaity, we opt for simpy assuming that the time-scae separation hods. Connection to Agorithm 2: Suppose the weights w = (w, J ), L are fixed. From the perspective of server, the configuration k evoves according to the oss system LOSS({}, J, w ), independenty of the evoution of other severs configurations. Let φ w (k ) denote the steady state probabiity of configuration k in server, then by appying Lemma to LOSS({}, J, w ), φ w (k ) = exp( Z w w k ) ( ) k k!, (8) µ where Z w is the normaizing constant. Then the steady-state probabiity of system configuration k = (k, L) foows the product form ψ w (k) := = Z w ψ φ w (k ) exp( wk ) ( ) k k! (9). µ Note that the distributions ψ w (9) and φ w (4) are amost identica (the minor difference is in the /L term inside the product in (4)). The rest of the argument is more or ess simiar to Agorithm. We emphasize that here the ob arriva process to the queues and the queue process are couped through the dynamics of JSQ, nevertheess, the mean number of arrivas/departures that can happen over any time interva, it is sti bounded, and hence by choosing functions f(x) of the form o(og(x)) the time-scae separation sti hods. In Section VI we present more detais regarding the proofs. VI. LYAPUNOV ANALYSIS AND PROOFS The proof of Theorems and 2 is based on the properties of the oss system (Coroary ) and standard Lyapunov arguments 27, 28. A. Proof of Theorem Define the state of the system as S(t) = (Q(t), k(t)) where Q(t) is the vector of queue sizes and k(t) is the system configuration matrix whose -th row is the configuration of server. Under Agorithm, the process {S(t)} {t } evoves as a continuous-time and irreducibe Markov chain. Let Q (t) = Q (t) + L k (t). Consider a Lyapunov function V (t) = J µ F ( Q (t)), () where F (x) = x f(τ)dτ. Reca that f : R + R + is a concave increasing function; thus F is convex. Choose an

arbitrariy sma u >. It foows from convexity of F that for any t, V (t + u) V (t) f( µ Q () (t + u))( Q () (t + u) Q () (t)) = J f( µ Q () (t))( Q () (t + u) Q () (t))+ J (f( µ Q () (t + u)) f( Q () (t)))( Q (t + u) Q (t)). J We can write Q (t + u) Q (t) = A (t, t + u) D (t, t + u), where A (t, t + u) is the number of arrivas of type- obs during (t, t+u) and D (t, t+u) is the number of departures of rea and dummy type- obs from the system during (t, t+u). Let N A(t) and N D (t), J, denote independent unit-rate Poisson processes. Then, the processes A and D can be constructed as A (, t) = N A (λ t), D (, t) = N D ( It is easy to see that t k(τ)µ dτ). f( Q (t + u)) f( Q (t)) f () Q (t + u) Q (t), by the mean vaue theorem, and the fact that f is a concave increasing function. For notationa compactness, et E Z = E Z, given a random variabe Z. Suppose the maximum number of obs of any type that can fit in a server is ess than M <, then k < LM. It is then foows that E S(t) V (t + u)) V (t) E S(t) f( µ Q (t))(a (t, t + u) D (t, t + u)) + f () µ E S(t) A (t, t + u) D (t, t + u) 2 E S(t) f( µ Q (t))(a (t, t + u) D (t, t + u)) + K 2 u + o(u), () where K 2 = f ()( ρ + ML). Note that f( Q (t)) f(q (t)) f () Q (t) Q (t) = f () k (t), (2) again by the mean vaue theorem, and the fact that f is a concave increasing function. Thus E S(t) D (t, t + u)f( µ Q (t)) = J k(t)f( Q (t))u + o(u) k(t)f(q (t))u + o(u). (3) Simiary, using (2), it foows that E S(t) A () (t, t + u)f( µ Q () (t)) J ρ f(q (t))u + K 3 u (4) where J K 3 = f () ρ k(t) f ()ML Hence, using (4) and (3) in (), u E S(t) V (t + u) V (t) ( f(q (t)) ρ ) k(t) + K 2 + K 3 + o(). (5) J The process S(t) has two interacting components. On one hand the evoution of the queue process Q(t) depends on k(t); and on the other hand, the evoution of the configuration process k(t) depends on the queue process Q(t) through the weights w (t) = f(q (t)) in the agorithm. As expained in Section V, a separation of time-scaes happens by using ogarithmic-type functions f that change very sowy with queue size, i.e., the evoution of k(t) occurs on a much faster time-scae compared to the change in the weights. Then roughy speaking, the evoution of the process Q(t) is governed by φ w (t) (i.e., the time-average distribution of configurations when the queue size Q(t) is fixed) defined in (4). Then by Coroary, E Q(t) k(t)f(q (t)) k (t)f(q (t)) ρ. + min k K og φ (k), (6) where k is the max weight configuration defined in (7) with w = f(q(t)) and φ was defined in (6). Let t := im u u E Q(t) V (t + u) V (t) (7) be the infinitesima drift operator. Taking the expectation of both sides of (5) with respect to φ w (conditiona distribution of configurations given the queues), and using (6), yieds t ( f(q (t)) ρ ) k (t) + K.

where K = K 2 + K 3 min k K og φ (k). Since ρ C o, there exists a δ > such that ρ ( + δ) x for some x Conv(K ). Hence, by definition of k, f(q (t))( + δ)ρ f(q (t)) x (t) J J and therefore t δ J J f(q (t)) f(q (t))ρ + K. k (t), Hence the drift is negative for any Q(t) outside of a finite set. Hence the Markov chain is positive recurrent by the continuous-time version of Foster-Lyapunov theorem and further the stabiity in the sense im sup t E f(q (t)) < foows (see e.g., Theorem 4.2 of 28). The stabiity in the mean sense 3 then foows by an extra step as in 5 (See Theorem and Lemma 4 in 5). B. Proof of Theorem 2 The anaysis is simiar to the proof of Theorem () with minor differences. The system state is given by S(t) = (Q(t), k(t)) where Q is the queue-size matrix whose -th row is the vector of queue sizes at server. Consider a Lyapunov function V ( ) as V (t) = J L µ F ( Q (t)), (8) where Q (t) = Q (t) + k (t). For each server and ob type, Q (t + u) Q (t) = A (t, t + u) D (t, t + u), where A (t, t + u) is the number of arrivas of type- obs during (t, t + u) and D (t, t + u) is the number of departures of rea and dummy type- obs from server during (t, t + u). Note that D (, t) is a (time-inhomogeneous) Poisson process of rate k (t). Simiar to the the proof of Theorem (), the Lyapunov drift can be bounded as E S(t) V (t + u) V (t) E S(t) f( µ Q (t))(a (t, t + u) D(t, t + u)) + K 2 u + o(u), (9) for the same constant K 2 as in the proof of Theorem (). The term invoving the product D () (t, t+u)f( Q () (t)) is bounded by an expression simiar to (3), i.e., E S(t) D µ (t, t + u)f( Q (t)) = k(t)f( Q (t))u + o(u) k(t)f(q (t))u + o(u). (2) The term invoving A (t, t + u)f( Q (t)) must be treated more carefuy because, unike Agorithm, the arriva process {A (t, t + u), J } {t } and the queue process {Q (t), J } {t } are now dependent through the dynamics of JSQ. This step can be done as foows. E S(t) A µ (t, t + u)f( Q (t)) E S(t) A µ (t, t + u)f(q (t)) + K 3 u a = = E S(t) A (t, t + u)f(q µ (t)) + K 3 u + o(u) ρ f(q (t))u + K 3u + o(u), (2) where K 3 is same constant as in the proof of Theorem (), and equaity (a) is due to the JSQ property (see Agorithm 2). Then foowing simiar arguments as in the proof of Theorem, t f(q (t))ρ f(q (t))k (t) + ˆK, where ˆK = K 2 + K 3 min k K og ψ (k), for ψ = ψ w= defined based on (9). Without oss of generaity, we can assume that e K for a and (otherwise if there exists an and such that e / K, we can simpy do not consider any queue for type obs at server.) Then since ρ C o, there must exist a δ > such that ρ < x and < x ( + δ) Conv(K ). Then by the JSQ property f(q (t))ρ and by the definition of k, f(q (t))( + δ)x Hence t δ f(q (t))x, (22) f(q (t))k (t). (23) f(q (t))x + ˆK, and therefore the Markov chain is positive recurrent by the continuous-time version of Foster-Lyapunov theorem 28. The rest of the arguments are simiar to the proof of Agorithm. VII. SIMULATION RESULTS In this section, we present our simuation resuts to confirm our anaytica resuts as we as to investigate the deay performance. We consider the same VM types considered in 3, 4, 5, which are three representative instances avaiabe in Amazon EC2 (see Tabe I). We consider arriva rates of the form λ = ζ V, where V = L = V and V is obtained by averaging the maxima configurations of server. Thus V is a point on the boundary of the supportabe oad region C o and ζ (, ) contros the traffic intensity. The mean service times are normaized to one. For simuations, we consider the distributed queueing

VM Type Memory CPU Storage Standard Extra Large 5 GB 8 EC2 units 69 GB High-Memory Extra Large 7. GB 6.5 EC2 units 42 GB High-CPU Extra Large 7 GB 2 EC2 units 69 GB TABLE I THREE REPRESENTATIVE INSTANCES IN AMAZON EC2 Average deay 2 5 5 MW goba Agorithm 2..2.3.4.5.6.7.8 Traffic intensity Fig.. Average deay comparison of MW-goba and Agorithm 2 in a 2- server system. Average deay 2 5 5 MW goba Agorithm 2..2.3.4.5.6.7.8 Traffic intensity Fig. 2. Average deay comparison of MW-goba and Agorithm 2 in a - server system. architecture as it is more common in data centers (queue memory needs to operate at a much sower speed compared to the centraized queueing architecture). This wi aow us to compare our ow compexity agorithm (Agorithm 2) not ony with Max Weight with goba refresh times, but aso with heuristics that operate ocay on the servers such as Max Weight with oca refresh times. We expect our comparisons to be even more pronounced in the case of centraized queueing architecture as these heuristics are not suitabe for this case. A. Comparison with Max Weight with goba refresh times The Max Weight agorithm with goba refresh times (MWgoba) has been shown to be stabe for any ρ C o 4, 5. The agorithm chooses a max weight configuration for each server at the instances of goba refresh times, namey, times when a the servers in the system do not contain any ongoing service (queues might be sti nonempty). We consider a homogeneous muti-server system, each server with capacity 3 GB Memory, 3 EC2 units CPU, and 4 GB Storage. For Agorithm 2, we consider f(x) = og((+x)). We compare the average deay performance of Agorithm 2 and MW-goba for various traffic intensities. The deay for each ob is the time duration from the moment it enters the system unti it starts getting service. Figure shows the average deay when there Tota Queue Size 3.5 3 2.5 2.5.5 4 x 4 Agorithm 2 MW Loca.5.5 2 Time 2.5 3 3.5 4 x 4 Fig. 3. Time evoution of tota queue size under MW-Loca and Agorithm 2 in an inhomogeneous system. are ony two servers and Figure 2 shows the average deay when the number of servers increases to. As we expect, when the number of servers increases, the goba refresh times become extremey infrequent which wi deteriorate the deay performance. B. Comparison with Max Weight with oca refresh times The Max Weight with oca refresh times (MW-oca) reported in 4, 5 has substantiay better deay than the MWgoba. Under MW-oca a max weight configuration is chosen for each server at its oca refresh times. The oca refresh time for a server is a time instance at which the server does not contain any ongoing ob services. Ceary the oca refresh times occur much more frequenty than the goba refresh times. However, it is not cear if MW-Loca is stabe in heterogeneous systems 4, 5. To investigate the performance in a heterogeneous system, we consider servers, 5 of which have capacity (3 GB Memory, 3 EC2 units CPU, and 4 GB Storage), and 5 of which have capacity (9 GB Memory, 9 EC2 units CPU, 5 GB Storage) and ζ =.9. Figure VII depicts the time evoution of the tota queue size under Agorithm 2 and MW-Loca. The figure suggests that the MW- Loca is unstabe whie Agorithm 2 is stabe. Nevertheess, to get a sense of deay performance of our agorithm, we compare the performance of two agorithms in a homogeneous system. Figures 4 and 5 show the queue size and deay performance in a system consisting of homogenous servers, each server with capacity 3 GB Memory, 3 EC2 units CPU, and 4 GB Storage. We have potted the average tota queue size and average deay for various vaues of traffic intensity ζ. Interestingy, the queue size and deay performance of Agorithm 2 are very cose to MW-oca agorithm. However, as noted, MW-oca has higher compexity and it is not cear if it is throughput-optima in genera. VIII. DISCUSSION AND EXTENSIONS This paper presents randomized agorithms for scheduing VMs in coud systems. Our agorithms are throughput-optima, they have ow compexity and are scaabe to arge scae server systems with centraized or distributed queues, and provide seamess change in the server configurations without reying on refresh times or preemptions.

Average tota queue size 35 3 25 2 5 5 MW oca Agorithm 2..2.3.4.5.6.7.8.9 Traffic intensity Fig. 4. Average queue size for MW-oca and Agorithm 2 in a homogeneous system for various vaues of traffic intensity. Average deay.9.8.7.6.5.4.3.2 Agorithm 2 MW oca...2.3.4.5.6.7.8.9 Traffic intensity Fig. 5. Average deay for MW-oca and Agorithm 2 in a homogeneous system for various vaues of traffic intensity. An important feature of our agorithms is that their performance is not restricted to the traffic mode assumptions made in the paper. For exampe, the Lyapunov anaysis can be easiy extended to non-poisson ob arriva processes, e.g., i.i.d. timesotted processes where in each time sot a batch of obs can arrive with finite first and second moments. The agorithms are aso robust to the service time distributions. This is because the oss system LOSS(L, J, w) is reversibe and by the insensitivity property 29, the steady-state distribution ony depends on the mean service times. The dedicated Poisson cocks are crucia in estabishing our resuts however the cocks are part of the agorithms and are not reated to traffic statistics. REFERENCES http://aws.amazon.com/ec2. 2 R. Niranan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, Portand: A scaabe faut-toerant ayer 2 data center network fabric, in ACM SIGCOMM Computer Communication Review, vo. 39, no. 4, 29, pp. 39 5. 3 S. T. Maguuri, R. Srikant, and L. Ying, Stochastic modes of oad baancing and scheduing in coud computing custers, in Proceedings of IEEE INFOCOM, 22, pp. 72 7. 4 S. T. Maguuri and R. Srikant, Scheduing obs with unknown duration in couds, in Proceedings 23 IEEE INFOCOM, 23, pp. 887 895. 5, Scheduing obs with unknown duration in couds, IEEE/ACM Transactions on Networking, vo. 22, no. 6, pp. 938 95, 24. 6 L. Tassiuas and A. Ephremides, Stabiity properties of constrained queueing systems and scheduing poicies for maximum throughput in mutihop radio networks, IEEE Transactions on Automatic Contro, vo. 37, no. 2, pp. 936 948, 992. 7 H. Keerer, U. Pferschy, and D. Pisinger, Introduction to NP- Competeness of knapsack probems. Springer, 24. 8 M. Stiwe, F. Vivien, and H. Casanova, Virtua machine resource aocation for service hosting on heterogeneous distributed patforms, in Parae & Distributed Processing Symposium (IPDPS), 22, pp. 786 797. 9 K. Mis, J. Fiiben, and C. Dabrowski, Comparing VM-pacement agorithms for on-demand couds, in 2 IEEE Internationa Conference on Coud Computing Technoogy and Science (CoudCom), 2, pp. 9 98. J. Xu and J. A. Fortes, Muti-obective virtua machine pacement in virtuaized data center environments, in 2 IEEE/ACM Int Conference on Green Computing and Communications (GreenCom), & Int Conference on Cyber, Physica and Socia Computing (CPSCom), 2, pp. 79 88. J. W. Jiang, T. Lan, S. Ha, M. Chen, and M. Chiang, Joint VM pacement and routing for data center traffic engineering, in Proceedings of IEEE INFOCOM, 22, pp. 2876 288. 2 X. Meng, V. Pappas, and L. Zhang, Improving the scaabiity of data center networks with traffic-aware virtua machine pacement, in 2 Proceedings of IEEE INFOCOM, 2, pp. 9. 3 Y. O. Yazir, C. Matthews, R. Farahbod, S. Nevie, A. Guitouni, S. Ganti, and Y. Coady, Dynamic resource aocation in computing couds using distributed mutipe criteria decision anaysis, in IEEE Conference on Coud Computing (CLOUD), 2, pp. 9 98. 4 J. Ghaderi, S. Shakkottai, and R. Srikant, Scheduing storms and streams in the coud, SIGMETRICS 25, Poster paper, June 25. 5 T. Bonad and D. Cuda, Rateoptima scheduing schemes for asynchronous inputqueued packet switches, ACM SIGMETRICS Performance Evauation Review, vo. 4, no. 3, pp. 95 97, 22. 6 S. Ye, Y. Shen, and S. Panwar, An O() scheduing agorithm for variabe-size packet switching systems, in Annua Aerton Conference on Communication, Contro, and Computing, 2, pp. 683 69. 7 J. Ghaderi and R. Srikant, On the design of efficient CSMA agorithms for wireess networks, in 49th IEEE Conference on Decision and Contro (CDC), 2, pp. 954 959. 8 M. A. Marsan, A. Bianco, P. Giaccone, E. Leonardi, and F. Neri, Packet-mode scheduing in input-queued ce-based switches, IEEE/ACM Transactions on Networking (TON), vo., no. 5, pp. 666 678, 22. 9 J. Ghaderi, T. Ji, and R. Srikant, Fow-eve stabiity of wireess networks: Separation of congestion contro and scheduing, IEEE Transactions on Automatic Contro, vo. 59, no. 8, pp. 252 267, 24. 2 D. Shah and J. Shin, Randomized scheduing agorithm for queueing networks, The Annas of Appied Probabiity, vo. 22, no., pp. 28 7, 22. 2 A. L. Stoyar, An infinite server system with genera packing constraints, Operations Research, vo. 6, no. 5, pp. 2 27, 23. 22 A. L. Stoyar and Y. Zhong, A arge-scae service system with packing constraints: Minimizing the number of occupied servers, in Proceedings of the ACM SIGMETRICS, 23, pp. 4 52. 23 A. Stoyar and Y. Zhong, Asymptotic optimaity of a greedy randomized agorithm in a arge-scae service system with genera packing constraints, arxiv preprint arxiv:36.499, 23. 24 J. Ghaderi, Y. Zhong, and R. Srikant, Asymptotic optimaity of BestFit for stochastic bin packing, ACM SIGMETRICS Performance Evauation Review, vo. 42, no. 2, pp. 64 66, 24. 25 M. Mitzenmacher, The power of two choices in randomized oad baancing, IEEE Transactions on Parae and Distributed Systems, vo. 2, no., pp. 94 4, 2. 26 S. Raagopaan, D. Shah, and J. Shin, Network adiabatic theorem: An efficient randomized protoco for contention resoution, in ACM SIGMETRICS Performance Evauation Review, vo. 37, no.. ACM, 29, pp. 33 44. 27 S. P. Meyn and R. L. Tweedie, Stabiity of markovian processes I: Criteria for discrete-time chains, Advances in Appied Probabiity, pp. 542 574, 992. 28, Stabiity of markovian processes II: continuous-time processes and samped chains, Advances in Appied Probabiity, pp. 487 57, 993. 29 T. Bonad, Insensitive queueing modes for communication networks, in Proceedings of the st internationa conference on Performance evauation methodogies and toos, 26, p. 57.