Stochastic Models of Load Balancing and Scheduling in Cloud Computing Clusters




Siva Theja Maguluri and R. Srikant
Department of ECE and CSL, University of Illinois at Urbana-Champaign
siva.theja@gmail.com; rsrikant@illinois.edu

Lei Ying
Department of ECE, Iowa State University
leiying@iastate.edu

Abstract—Cloud computing services are becoming ubiquitous, and are starting to serve as the primary source of computing power for both enterprises and personal computing applications. We consider a stochastic model of a cloud computing cluster, where jobs arrive according to a stochastic process and request virtual machines (VMs), which are specified in terms of resources such as CPU, memory and storage space. While there are many design issues associated with such systems, here we focus only on resource allocation problems, such as the design of algorithms for load balancing among servers and algorithms for scheduling VM configurations. Given our model of a cloud, we first define its capacity, i.e., the maximum rates at which jobs can be processed in such a system. Then, we show that the widely-used Best-Fit scheduling algorithm is not throughput-optimal, and we present alternatives which achieve any arbitrary fraction of the capacity region of the cloud. We then study the delay performance of these alternative algorithms through simulations.

I. INTRODUCTION

Cloud computing services are becoming the primary source of computing power for both enterprises and personal computing applications. A cloud computing platform can provide a variety of resources, including infrastructure, software, and services, to users in an on-demand fashion. To access these resources, a cloud user submits a request for resources. The cloud provider then provides the requested resources from a common resource pool (e.g., a cluster of servers), and allows the user to use these resources for a required time period. Compared to traditional own-and-use approaches, cloud computing services eliminate the costs of purchasing and maintaining the infrastructure for cloud users, and allow users to dynamically scale computing resources up and down in real time based on their needs.
Several cloud computing systems are now commercially available, including the Amazon EC2 system [1], Google's AppEngine [2], and Microsoft's Azure [3]. We refer to [4], [5], [6] for comprehensive surveys on cloud computing. While cloud computing services in practice provide many different services, in this paper we consider cloud computing platforms that provide infrastructure as a service, in the form of Virtual Machines (VMs), to users. We assume cloud users request virtual machines (VMs), which are specified in terms of resources such as CPU, memory and storage space. Each request is called a job. The type of a job specifies the type of VM the user wants, and the size of the job specifies the amount of time required. After receiving these requests, the cloud provider schedules the VMs on physical machines, called servers. There are many design issues associated with such systems [7], [8], [9], [10], [11], [12]. In this paper, we focus only on resource allocation problems, such as the design of algorithms for load balancing among servers and algorithms for scheduling VM configurations.

We consider a stochastic model of a cloud computing cluster. We assume that jobs with variable sizes arrive according to a stochastic process and are assigned to the servers according to a resource allocation algorithm. A job departs from the system after the VM is hosted for the required amount of time. We assume jobs are queued in the system when all servers are busy. We are interested in the maximum rates at which jobs can be processed in such a system, and in resource allocation algorithms that can support those maximum rates. The main contributions of this paper are summarized below.

(1) We characterize the capacity region of a cloud system by establishing its connection to the capacity region of a wireless network. The capacity of a cloud system is defined to be the set of traffic loads under which the queues in the system can be stabilized.

(2) We then consider the widely-used Best-Fit scheduling algorithm and provide a simple example to show that it is not throughput-optimal.
Next, we point out that the well-known MaxWeight algorithm is throughput-optimal in an ideal scenario, where jobs can be preempted and can migrate among servers, and servers can be reconfigured at each time instant. In practice, preemption and VM migration are costly. Therefore, motivated by the MaxWeight algorithm, we present a non-preemptive algorithm which myopically allocates a new job to a server using current queue length information whenever a departure occurs. We characterize the throughput of this myopic algorithm, and show that it can achieve any arbitrary fraction of the capacity region if the algorithm parameters are chosen appropriately.

(3) The algorithms mentioned above require central queues. In practice, a more scalable approach is to route jobs to servers right after their arrivals. We consider the Join-the-Shortest-Queue (JSQ) algorithm, which routes a job to the server with the shortest queue. We prove that this entails no loss in throughput compared to maintaining a

single central queue.

(4) JSQ needs to keep track of the queue lengths at all servers, which may become prohibitive when we have a large number of servers and the arrival rates of jobs are large. To address this issue, we propose power-of-two-choices routing for the case of identical servers, and pick-and-compare routing for the case of non-identical servers.

II. MODEL DESCRIPTION

A cloud system consists of a number of networked servers. Each of the servers may host multiple Virtual Machines (VMs). Each VM requires a set of resources, including CPU, memory, and storage space. VMs are classified according to the resources they request. As an example, Table I lists three types of VMs (called instances) available in Amazon EC2.

TABLE I
THREE REPRESENTATIVE INSTANCES IN AMAZON EC2

Instance Type              Memory    CPU            Storage
Standard Extra Large       15 GB     8 EC2 units    1,690 GB
High-Memory Extra Large    17.1 GB   6.5 EC2 units  420 GB
High-CPU Extra Large       7 GB      20 EC2 units   1,690 GB

We assume there are M distinct VM configurations and that each VM configuration is specified in terms of its requirements for K different resources. Let R_mk be the amount of type-k resource (e.g., memory) required by a type-m VM (e.g., a standard extra large VM). Further, we assume that the cloud system consists of L different servers. Let C_lk denote the amount of type-k resource at server l. Given a server l, an M-dimensional vector N is said to be a feasible VM-configuration if the given server can simultaneously host N_1 type-1 VMs, N_2 type-2 VMs, ..., and N_M type-M VMs. In other words, N is feasible at server l if and only if

    Σ_{m=1}^{M} N_m R_mk ≤ C_lk   for all k.

We let N_max denote the maximum number of VMs of any type that can be served on any server.

Example 1: Consider a server with 30 GB memory, 30 EC2 computing units and 4,000 GB storage space. Then N = (2, 0, 0) and N = (0, 1, 1) are two feasible VM-configurations on the server, where N_1 is the number of standard extra large VMs, N_2 is the number of high-memory extra large VMs, and N_3 is the number of high-CPU extra large VMs.
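The feasibility condition Σ_m N_m R_mk ≤ C_lk can be checked directly. Below is a minimal sketch, using the Table I requirement vectors and the Example 1 server; the function and variable names are ours:

```python
def is_feasible(N, R, C):
    """Return True if a server with capacity vector C can simultaneously
    host N[m] VMs of each type m, where R[m][k] is the amount of type-k
    resource required by one type-m VM."""
    M, K = len(N), len(C)
    return all(sum(N[m] * R[m][k] for m in range(M)) <= C[k]
               for k in range(K))

# Requirement vectors (memory GB, EC2 units, storage GB) from Table I.
R = [
    (15.0, 8.0, 1690.0),   # Standard Extra Large
    (17.1, 6.5, 420.0),    # High-Memory Extra Large
    (7.0, 20.0, 1690.0),   # High-CPU Extra Large
]
C_server = (30.0, 30.0, 4000.0)  # the server of Example 1

print(is_feasible((2, 0, 0), R, C_server))  # True
print(is_feasible((0, 1, 1), R, C_server))  # True
print(is_feasible((0, 2, 1), R, C_server))  # False: too little memory and CPU
```

The last call reproduces the infeasibility claim made for N = (0, 2, 1) below.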
N = (0, 2, 1) is not a feasible VM-configuration on this server because the server does not have enough memory and computing units.

In this paper, we consider a cloud system which hosts VMs for clients. A VM request from a client specifies the type of VM the client needs and the amount of time requested. We call a VM request a job. A job is said to be a type-m job if a type-m VM is requested. We consider a time-slotted system in this paper, and we say that the size of a job is S if the VM needs to be hosted for S time slots. Given our model of a cloud system, we next define the concept of capacity for a cloud.

III. CAPACITY OF A CLOUD

What is the capacity of a cloud? First, as an example, consider three servers as defined in Example 1. Clearly this system has an aggregate capacity of 90 GB of memory, 90 EC2 compute units and 12,000 GB of storage space. However, such a crude definition of capacity fails to reflect the system's ability to host VMs. For example, while

    4 × 17.1 + 3 × 7 = 89.4 ≤ 90,
    4 × 6.5 + 3 × 20 = 86 ≤ 90,
    4 × 420 + 3 × 1690 = 6750 ≤ 12000,

it is easy to verify that the system cannot host 4 high-memory extra large VMs and 3 high-CPU extra large VMs at the same time. Therefore, we have to introduce a VM-centric definition of capacity.

Let 𝒜_m(t) denote the set of type-m jobs that arrive at the beginning of time slot t, and let A_m(t) = |𝒜_m(t)|, i.e., the number of type-m jobs that arrive at the beginning of time slot t. Indexing the jobs in 𝒜_m(t) from 1 through A_m(t), we define

    W_m(t) = Σ_{j ∈ 𝒜_m(t)} S_j

to be the overall size of the jobs in 𝒜_m(t), i.e., the total number of time slots requested by these jobs. We assume that W_m(t) is a stochastic process which is i.i.d. across time slots, with E[W_m(t)] = λ_m and Pr(W_m(t) = 0) > ε_W for some ε_W > 0, for all m and t. Many of these assumptions can be relaxed, but we consider the simplest model for ease of exposition.

Let D_m(t) denote the number of type-m jobs that are served by the cloud at time slot t. Note that the job size of each of these D_m(t) jobs reduces by one at the end of time slot t. Let Q_m(t) denote the total backlogged job size of type-m jobs in the network at the beginning of time slot t, before any job arrivals.
Then the dynamics of Q_m can be described as

    Q_m(t+1) = Q_m(t) + W_m(t) - D_m(t).    (1)

We say that the cloud system is stable if

    lim sup_{t→∞} E[ Σ_m Q_m(t) ] < ∞,

i.e., the expected backlog is bounded in steady state. A vector of arrival loads λ is said to be supportable if there exists a resource allocation mechanism under which the cloud is stable. In the following, we first identify the set of supportable λ's. Let N_l be the set of feasible VM-configurations on server l. We define a set C such that

    C = { λ : λ = Σ_{l=1}^{L} λ^(l) and λ^(l) ∈ Conv(N_l) },    (2)

where Conv denotes the convex hull. We next use a simple example to illustrate the definition of C.

Example 2: Consider a simple cloud system consisting of three servers. Servers 1 and 2 are of the same type (i.e., they have the same amount of resources), and server 3 is of a different type. Assume there are two types of VMs. The set of feasible VM-configurations on servers 1 and 2 is assumed to be N_1 =

[Fig. 1. Regions Conv(N_1) and Conv(N_3).]  [Fig. 2. The capacity region C.]

N_2 = {(0, 0), (1, 0), (0, 1)}, i.e., each of these servers can host at most either one type-1 VM or one type-2 VM. The set of feasible configurations on server 3 is assumed to be N_3 = {(0, 0), (1, 0), (2, 0), (0, 1)}, i.e., the server can host at most either two type-1 VMs or one type-2 VM. The regions Conv(N_1) and Conv(N_3) are plotted in Figure 1. Note that the vector (0.75, 0.25) is in the region Conv(N_1). While a type-1 server cannot host 0.75 type-1 VMs and 0.25 type-2 VMs, we can host one type-1 VM on server 1 for 3/4 of the time, and one type-2 VM on server 1 for 1/4 of the time, to support the load (0.75, 0.25). The capacity region C for this simple cloud system is plotted in Figure 2.

We call C the capacity region of the cloud. This definition of the capacity of a cloud is motivated by similar definitions in [13]. We introduce the following notation: the servers are indexed by l. Let N^(l)(t) denote the VM-configuration on server l at time slot t. Further define D_m(t) = Σ_l N_m^(l)(t), so D_m(t) is the total number of type-m VMs hosted in the cloud at time t. As in [13], it is easy to show the following results.

Lemma 1: D(t) ∈ C for any t.

Theorem 1: For any λ ∉ C, lim_{t→∞} E[ Σ_m Q_m(t) ] = ∞.

IV. THROUGHPUT-OPTIMAL SCHEDULING: CENTRALIZED APPROACHES

In this section, we study centralized approaches to job scheduling. We assume that jobs arrive at a central job scheduler and are queued at the job scheduler. The scheduler dispatches a job to a server when the server has enough resources to host the VM requested by the job. In this setting, servers do not have queues and do not make scheduling decisions. We call a job scheduling algorithm throughput-optimal if the algorithm can support any λ such that (1 + ε)λ ∈ C for some ε > 0.

A. Best Fit is not Throughput Optimal: A Simple Example

A scheduling policy that is used in practice is the so-called best-fit policy [14], [15], i.e., the job which uses the largest amount of resources, among all jobs that can be served, is selected for service whenever resources become available.
Such a definition has to be made more precise when a VM consists of multiple resources. In the case of multiple resources, we can select one type of resource as the reference resource and define best fit with respect to this resource; if there is a tie, then best fit with respect to another resource is considered, and so on. Alternatively, one can consider a particular linear or nonlinear combination of the resources as a meta-resource and define best fit with respect to the meta-resource.

We now show that best fit is not throughput-optimal. Consider a simple example where we have one server, one type of resource and two types of jobs. A type-1 job requests half of the resource and three time slots of service, and a type-2 job requests the whole resource and one time slot of service. Now assume type-1 jobs arrive once every two time slots starting from time slot 1, and type-2 jobs arrive according to some arrival process with arrival rate ε. Under the best-fit policy, if a type-1 job is scheduled in time slot 1, then type-1 jobs are scheduled forever, since type-2 jobs cannot be scheduled while a type-1 job is in the server. So the amount of backlogged type-2 jobs will blow up to infinity for any ε > 0. The system, however, is clearly stabilizable for sufficiently small ε. Suppose we schedule type-1 jobs only in time slots 3, 7, 11, 15, 19, ..., i.e., once every four time slots. Then time slots 6, 10, 14, 18, 22, ... are available for type-2 jobs. So if ε < 1/4, both queues can be stabilized.

Note that in this example, we assumed a periodic arrival process. It is not difficult to show that the resource allocation algorithms to be considered in the next few sections will stabilize the queues for such arrival processes. But for ease of exposition, in the rest of the paper, we consider only stochastic arrival processes. This raises the question as to whether there are throughput-optimal policies which stabilize the queues for all arrival rates which lie within the capacity region, without requiring knowledge of the actual arrival rates.
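The instability of best fit in this example can be reproduced in a few lines of simulation. The sketch below is ours; it assumes Bernoulli(ε) arrivals for the type-2 jobs (the text only requires some arrival process of rate ε) and shows the type-2 backlog growing roughly linearly:

```python
import random

def simulate_best_fit(n_slots, eps, seed=0):
    """One server of capacity 1. Type-1 jobs: demand 0.5, duration 3 slots,
    arriving at slots 1, 3, 5, ...; type-2 jobs: demand 1.0, duration 1 slot,
    arriving with probability eps per slot. Best fit schedules, among the
    queued jobs that fit into the free capacity, the one with the largest
    demand. Returns the final type-2 backlog."""
    rng = random.Random(seed)
    running = []            # [remaining_slots, demand] per job in service
    q1 = q2 = 0             # backlogged jobs of each type
    for t in range(1, n_slots + 1):
        for job in running:                       # service progresses
            job[0] -= 1
        running = [j for j in running if j[0] > 0]
        if t % 2 == 1:                            # type-1 arrival
            q1 += 1
        if rng.random() < eps:                    # type-2 arrival
            q2 += 1
        while True:                               # best-fit scheduling
            free = 1.0 - sum(d for _, d in running)
            if q2 > 0 and free >= 1.0:            # type-2 has the larger demand
                q2 -= 1
                running.append([1, 1.0])
            elif q1 > 0 and free >= 0.5:
                q1 -= 1
                running.append([3, 0.5])
            else:
                break
    return q2

# Once a type-1 job enters service, some type-1 job is in service in every
# subsequent slot, so the type-2 backlog grows at rate roughly eps.
print(simulate_best_fit(2000, 0.1, seed=1))
```

Scheduling type-1 jobs only once every four slots instead, as in the text, leaves every fourth slot fully free and stabilizes both queues for ε < 1/4.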
In the next subsection, we answer this question affirmatively by relating the problem to a well-known scheduling problem in wireless networks. However, such a scheduling algorithm requires job preemption. In the later sections, we discuss non-preemptive policies and the loss of capacity (which can be made arbitrarily small) due to non-preemption.

B. Preemptive Algorithms

In this subsection, we assume that all servers can be reconfigured at the beginning of each time slot, and that a job can be interrupted at the beginning of each time slot and put back in the queue. We will study schemes that do not interrupt job service in the next subsection. We further assume the job

scheduler maintains a separate queue for each type of job, and that the sizes of all jobs are bounded by S_max. Recall that Q_m(t) is the overall size of backlogged type-m jobs at the beginning of time slot t. We consider the following server-by-server MaxWeight allocation scheme.

Server-by-server MaxWeight allocation: Consider the l-th server. If the jobs on the server are not finished, move them back to the central queue. Find a VM-configuration N^(l)(t) such that

    N^(l)(t) ∈ arg max_{N ∈ N_l} Σ_m Q_m(t) N_m.

At server l, we create up to N_m^(l)(t) type-m VMs, depending on the number of jobs that are backlogged. Let N̄_m^(l)(t) be the actual number of VMs that were created. Then, we set

    Q_m(t+1) = ( Q_m(t) + W_m(t) - Σ_l N̄_m^(l)(t) )^+.

The fact that the proposed algorithm is throughput-optimal follows from [13] and is stated as a theorem below.

Theorem 2: Assume that a server can serve at most N_max jobs at the same time, and that E[W_m(t)^2] ≤ σ^2 for all m. The server-by-server MaxWeight allocation is throughput-optimal, i.e.,

    lim sup_{t→∞} E[ Σ_m Q_m(t) ] < ∞

if there exists ε > 0 such that (1 + ε)λ ∈ C.

C. Non-preemptive Algorithms

The algorithm presented in the previous subsection requires us to reconfigure the servers and re-allocate jobs at the beginning of each time slot. In practice, a job may not be interruptible, or interrupting a job may be very costly (the system needs to store a snapshot of the VM to be able to restart it later). In this subsection, we introduce a non-preemptive algorithm which is nearly throughput-optimal. Before we present the algorithm, we first outline the basic ideas. We group T time slots into a super time slot, where T > S_max. At the beginning of a super time slot, a configuration is chosen according to the MaxWeight algorithm. When jobs depart a server, the remaining resources in the server are filled again using the MaxWeight algorithm; however, we impose the constraint that only jobs that can be completed within the super slot can be served. So the algorithm uses resources myopically (without consideration of the future), but is queue-length aware since it uses the MaxWeight algorithm. We now describe the algorithm more precisely.
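The arg-max step shared by the preemptive scheme above and the myopic variant described next can be sketched by brute-force enumeration over configurations; this is our illustration for small instances only (all names, the enumeration bound n_max, and the reuse of the Table I / Example 1 numbers are ours):

```python
from itertools import product

def max_weight_config(Q, R, C, n_max, running=None):
    """Pick a VM-configuration N maximizing sum_m Q[m] * N[m] by brute-force
    enumeration over 0 <= N[m] <= n_max. If `running` is given, those VMs are
    non-preemptable and N is chosen over the residual capacity they leave,
    as in the myopic variant."""
    M, K = len(R), len(C)
    used = [0.0] * K
    if running is not None:
        used = [sum(running[m] * R[m][k] for m in range(M)) for k in range(K)]
    best, best_w = (0,) * M, 0.0
    for N in product(range(n_max + 1), repeat=M):
        if all(used[k] + sum(N[m] * R[m][k] for m in range(M)) <= C[k]
               for k in range(K)):
            w = sum(Q[m] * N[m] for m in range(M))
            if w > best_w:
                best, best_w = N, w
    return best

# Table I demands and the Example 1 server; the backlogs Q are made up.
R = [(15.0, 8.0, 1690.0), (17.1, 6.5, 420.0), (7.0, 20.0, 1690.0)]
C = (30.0, 30.0, 4000.0)
print(max_weight_config((10.0, 1.0, 1.0), R, C, n_max=2))                    # (2, 0, 0)
print(max_weight_config((10.0, 1.0, 1.0), R, C, n_max=2, running=(1, 0, 0)))  # (1, 0, 0)
```

With one standard-extra-large VM still running, the residual capacity (15, 22, 2310) admits only one more standard-extra-large VM, matching the non-preemptive refill step.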
Myopic MaxWeight allocation: We group T time slots into a super time slot. At time slot t, consider the l-th server. Let N^(l)(t-) be the set of VMs that are hosted on server l at the beginning of time slot t, i.e., these correspond to the jobs that were scheduled in the previous time slot but are still in the system. These VMs cannot be reconfigured due to our non-preemption requirement. The central controller finds a new vector of configurations Ñ^(l)(t) to fill up the resources not used by N^(l)(t-), i.e.,

    Ñ^(l)(t) ∈ arg max_{N : N + N^(l)(t-) ∈ N_l} Σ_m Q_m(t) N_m.

The central controller selects as many jobs as are available in the queue, up to a maximum of Ñ_m^(l)(t) type-m jobs at server l, and subject to the constraint that a type-m job can be served only if S_j ≤ T - (t mod T). Let N̄_m^(l)(t) denote the number of type-m jobs selected. Server l then serves the N̄_m^(l)(t) new jobs of each type, and the set of jobs N^(l)(t-) left over from the previous time slot. The queue length is updated as follows:

    Q_m(t+1) = Q_m(t) + W_m(t) - Σ_l ( N_m^(l)(t-) + N̄_m^(l)(t) ).

Note that this myopic MaxWeight allocation algorithm differs from the server-by-server MaxWeight allocation in two aspects: (i) jobs are not interrupted when served, and (ii) when a job departs from a server, new jobs are accepted without reconfiguring the server. We next characterize the throughput achieved by the myopic MaxWeight allocation under the following assumptions: (i) job sizes are uniformly bounded by S_max, and (ii) W_m(t) ≤ W_max for all m and t.

Theorem 3: Any job load that satisfies (1 + ε) T/(T - S_max) λ ∈ C for some ε > 0 is supportable under the myopic MaxWeight allocation.

We skip the proof of this theorem because it is very similar to the proof of Theorem 4 in the next section. It is important to note that, unlike Best Fit, the myopic MaxWeight algorithm can be made to achieve any arbitrary fraction of the capacity region by choosing T sufficiently large.

V. RESOURCE ALLOCATION WITH LOAD BALANCING

In the previous section, we considered the case where there was a single queue for jobs of the same type, served at different servers.
This requires a central authority to maintain a single queue for all servers in the system. A more distributed solution is to maintain queues at each server and route jobs to one of the servers as soon as they arrive. To the best of our knowledge, this problem does not fit into the scheduling/routing model in [13]. However, we show that one can still use MaxWeight-type scheduling if the servers are load-balanced using a join-the-shortest-queue (JSQ) routing rule.

In our model, we assume that each server maintains M different queues, one for each type of job. It then uses this queue length information in making scheduling decisions. Let Q denote the vector of these queue lengths, where Q_m^(l) is the queue length of type-m jobs at server l. Routing and scheduling are performed as described in Algorithm 1. The following theorem characterizes the throughput performance of the algorithm.

Theorem 4: Any job load vector that satisfies (1 + ε) T/(T - S_max) λ ∈ C for some ε > 0 is supportable under JSQ routing and myopic MaxWeight allocation as described in Algorithm 1.

Algorithm 1 JSQ Routing and Myopic MaxWeight Scheduling

1) Routing Algorithm (JSQ Routing): All the type-m jobs that arrive in time slot t are routed to the server with the shortest queue for type-m jobs, i.e., to the server

    l*_m(t) = arg min_{l ∈ {1, 2, ..., L}} Q_m^(l)(t).

Therefore, the arrivals to Q_m^(l) in time slot t are given by

    W_m^(l)(t) = W_m(t) if l = l*_m(t), and 0 otherwise.    (3)

2) Scheduling Algorithm (Myopic MaxWeight Scheduling), for each server l: T time slots are grouped into a super time slot. A MaxWeight configuration is chosen at the beginning of a super time slot. So, for t = nT, a configuration Ñ^(l)(t) is chosen according to

    Ñ^(l)(t) ∈ arg max_{N ∈ N_l} Σ_m Q_m^(l)(t) N_m.

For all other t, at the beginning of the time slot, a new configuration is chosen as follows:

    Ñ^(l)(t) ∈ arg max_{N : N + N^(l)(t-) ∈ N_l} Σ_m Q_m^(l)(t) N_m,

where N^(l)(t-) is the configuration of jobs at server l that are still in service at the end of the previous time slot. As many jobs as are available are selected for service from the queue, up to a maximum of Ñ_m^(l)(t) jobs of type m, and subject to the constraint that a new type-m job is served only if it can finish its service by the end of the super time slot, i.e., only if S_j ≤ T - (t mod T). Let N̄_m^(l)(t) denote the actual number of type-m jobs selected at server l, and define N_m^(l)(t) = N_m^(l)(t-) + N̄_m^(l)(t). The queue lengths are updated as follows:

    Q_m^(l)(t+1) = Q_m^(l)(t) + W_m^(l)(t) - N_m^(l)(t).    (4)

Proof: Let Y_m^(l) denote the state of the queue for type-m jobs at server l, where Y_mj^(l) is the size of the j-th type-m job at server l. First, it is easy to see that Y(t) = {Y_m^(l)(t)}_{m,l} is a Markov chain under this algorithm. Further, define S = {y : Pr(Y(t) = y | Q(0) = 0) > 0 for some t}; then Y(t) is an irreducible Markov chain on the state space S, assuming Y(0) = 0. This claim holds because (i) any state in S is reachable from 0, and (ii) since Pr(W_m(t) = 0) ≥ ε_W for all m and t, the Markov chain can move from Y(t) to 0 in finite time with positive probability. Further, Q_m^(l) = Σ_j Y_mj^(l), i.e., Q is a function of Y. We will first show that Σ_m Q_m^(l) N_m^(l) does not decrease by much within a super time slot.
For any t such that 1 ≤ (t mod T) ≤ T - S_max, for each server l,

    Σ_m Q_m^(l)(t-1) N_m^(l)(t-1)
      = Σ_m Q_m^(l)(t-1) N_m^(l)(t-) + Σ_m Q_m^(l)(t-1) ( N_m^(l)(t-1) - N_m^(l)(t-) )
      ≤(a) Σ_m Q_m^(l)(t-1) N_m^(l)(t-) + Σ_m Q_m^(l)(t-1) Ñ_m^(l)(t)
      = Σ_m ( Q_m^(l)(t-1) N_m^(l)(t-) + Q_m^(l)(t-1) Ñ_m^(l)(t) ) I_{Q_m^(l)(t-1) ≥ S_max N_max}
        + Σ_m ( Q_m^(l)(t-1) N_m^(l)(t-) + Q_m^(l)(t-1) Ñ_m^(l)(t) ) I_{Q_m^(l)(t-1) < S_max N_max}
      ≤(b) Σ_m Q_m^(l)(t-1) N_m^(l)(t) + M S_max N_max²,

where inequality (a) follows from the definition of Ñ^(l)(t), and inequality (b) holds because when Q_m^(l)(t-1) ≥ S_max N_max there are enough type-m jobs to allocate to the servers, and when 1 ≤ (t mod T) ≤ T - S_max, all backlogged jobs are eligible for service in terms of job sizes. Now, since Q_m^(l)(t) - Q_m^(l)(t-1) = W_m^(l)(t-1) - N_m^(l)(t-1) ≥ -(W_max + N_max), we have

    Σ_m Q_m^(l)(t-1) N_m^(l)(t-1) ≤ β + Σ_m Q_m^(l)(t) N_m^(l)(t),    (5)

where β = M N_max (W_max + N_max) + M S_max N_max².

Let V(t) = Σ_{l,m} ( Q_m^(l)(t) )² be the Lyapunov function, and let t = nT + τ with 0 ≤ τ < T. Then

    E[ V(nT+τ+1) - V(nT+τ) | Q(nT) = q ]
      = E[ Σ_{l,m} ( Q_m^(l) + W_m^(l) - N_m^(l) )² - ( Q_m^(l) )² | Q(nT) = q ]    (6)
      = E[ Σ_{l,m} 2 Q_m^(l) ( W_m^(l) - N_m^(l) ) + ( W_m^(l) - N_m^(l) )² | Q(nT) = q ]    (7)
      ≤ K + 2 E[ Σ_{l,m} Q_m^(l) W_m^(l) - Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ]    (8)
      = K + 2 Σ_m E[ Q_m^(l*_m(t)) W_m | Q(nT) = q ] - 2 E[ Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ]    (9)
      = K + 2 Σ_m λ_m E[ Q_m^(l*_m(t)) | Q(nT) = q ] - 2 E[ Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ]    (10)
      ≤ K + 2 Σ_m λ_m W_max τ + 2 Σ_m λ_m E[ Q_m^(l̄_m)(nT) | Q(nT) = q ] - 2 E[ Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ]    (11)
      = K + 2 Σ_m λ_m W_max τ + 2 Σ_m λ_m q_m^(l̄_m) - 2 E[ Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ],    (12)

where K = M L (S_max + N_max)² and l̄_m = l*_m(nT) = arg min_{l ∈ {1, 2, ..., L}} q_m^(l). Equation (9) follows from the definition of W_m^(l) in the routing algorithm in (3). Equation (10) follows from the independence of the arrival process from the queue length process. Inequality (11) comes from the fact that Q_m^(l*_m(t))(t) ≤ Q_m^(l̄_m)(t) ≤ Q_m^(l̄_m)(nT) + W_max τ.

Now, applying (5) repeatedly for t ∈ [nT, (n+1)T - S_max] and summing over l, we get

    Σ_{l,m} Q_m^(l)(t) N_m^(l)(t) ≥ Σ_{l,m} Q_m^(l)(nT) N_m^(l)(nT) - L(t - nT)β.    (13)

Since (1+ε) T/(T - S_max) λ ∈ C, there exists {λ^(l)} such that (1+ε) T/(T - S_max) λ^(l) ∈ Conv(N_l) for all l and Σ_l λ^(l) = λ. According to the scheduling algorithm, for each l, we have

    Σ_m Q_m^(l)(nT) N_m^(l)(nT) ≥ (1+ε) T/(T - S_max) Σ_m Q_m^(l)(nT) λ_m^(l).    (14)

Thus, we get

    Σ_{l,m} Q_m^(l)(t) N_m^(l)(t) ≥ -L(t - nT)β + Σ_{l,m} Q_m^(l)(nT) N_m^(l)(nT)    (15)
      ≥ -L(t - nT)β + (1+ε) T/(T - S_max) Σ_{l,m} Q_m^(l)(nT) λ_m^(l)    (16)
      = -L(t - nT)β + (1+ε) T/(T - S_max) Σ_{l,m} q_m^(l) λ_m^(l).    (17)

Substituting this in (12), we get, for t ∈ [nT, (n+1)T - S_max],

    E[ V(nT+τ+1) - V(nT+τ) | Q(nT) = q ]
      ≤ K + 2 Σ_m λ_m W_max τ + 2L(t - nT)β + 2 Σ_m λ_m q_m^(l̄_m) - 2 (1+ε) T/(T - S_max) Σ_{l,m} q_m^(l) λ_m^(l).    (18)

Summing the drift for τ from 0 to T-1, using (18) for τ ∈ [0, T - S_max] and (12) for the remaining τ, we get

    E[ V((n+1)T) - V(nT) | Q(nT) = q ]
      ≤ T K + 2 Σ_m λ_m W_max Σ_{τ=0}^{T-1} τ + 2Lβ Σ_{τ=0}^{T-S_max-1} τ + 2T Σ_m λ_m q_m^(l̄_m) - 2 (1+ε) T/(T - S_max) (T - S_max) Σ_{l,m} q_m^(l) λ_m^(l)
      ≤ T K + 2 Σ_m λ_m W_max Σ_{τ=0}^{T-1} τ + 2Lβ Σ_{τ=0}^{T-S_max-1} τ - 2εT Σ_m q_m^(l̄_m) λ_m.

The theorem then follows from the Foster-Lyapunov theorem [16], [17].

VI. SIMPLER LOAD BALANCING ALGORITHMS

Though the JSQ routing algorithm is throughput-optimal, the job scheduler needs the queue length information from all the servers. This poses a considerable communication overhead as the arrival rate of jobs and the number of servers increase. In this section, we present two alternatives which have much lower routing complexity.

A. Power-of-Two-Choices Routing and Myopic MaxWeight Scheduling

An alternative to JSQ routing that is much simpler to implement is the power-of-two-choices algorithm [18]. When a job arrives, two servers are sampled at random, and the job is routed to the server with the smaller queue for that job type.
In our algorithm, in each time slot t, for each type m of job, two servers l_m^1 and l_m^2 are chosen uniformly at random. The job scheduler then routes all the type-m job arrivals in this time slot to the server with the shorter queue length among these two, i.e., to

    l*_m(t) = arg min_{l ∈ {l_m^1, l_m^2}} Q_m^(l)(t),

and so

    W_m^(l)(t) = W_m(t) if l = l*_m(t), and 0 otherwise.

Otherwise, the algorithm is identical to the JSQ-myopic MaxWeight algorithm considered earlier. In this section, we will provide a lower bound on the throughput of this power-of-two-choices algorithm in the non-preemptive case when all the servers have identical resource constraints.

Theorem 5: When all the servers are identical, any job load that satisfies (1+ε) T/(T - S_max) λ ∈ C for some ε > 0 is supportable under the power-of-two-choices routing and myopic MaxWeight allocation algorithm.

Proof: Again, we use V(t) = Σ_{l,m} ( Q_m^(l) )² as the Lyapunov function. Then, from (8), we have

    E[ V(t+1) - V(t) | Q(nT) = q ] ≤ K + 2 E[ Σ_{l,m} Q_m^(l) W_m^(l) | Q(nT) = q ] - 2 E[ Σ_{l,m} Q_m^(l) N_m^(l) | Q(nT) = q ].    (19)

For fixed m, let X_m be the random variable which denotes the two servers chosen by the routing algorithm at time t for jobs of type m. X_m is then uniformly distributed over all sets of two servers. Now, using the tower property of conditional expectation, we have

    E[ Σ_l Q_m^(l) W_m^(l) | Q(nT) = q ]
      = E_{X_m}[ E[ Σ_l Q_m^(l) W_m^(l) | Q(nT) = q, X_m = {l', l''} ] ]
      = E_{X_m}[ E[ Q_m^(l') W_m^(l') + Q_m^(l'') W_m^(l'') | Q(nT) = q, X_m = {l', l''} ] ]
      = E_{X_m}[ E[ min( Q_m^(l'), Q_m^(l'') ) W_m | Q(nT) = q, X_m = {l', l''} ] ]    (20)
      ≤ E_{X_m}[ E[ ( Q_m^(l') + Q_m^(l'') )/2 · W_m | Q(nT) = q, X_m = {l', l''} ] ]
      = E_{X_m}[ λ_m ( q_m^(l') + q_m^(l'') )/2 ]
      = λ_m (1/L) Σ_l q_m^(l)    (21)
      = λ_m q̄_m,    (22)

where q̄_m = (1/L) Σ_l q_m^(l). Equation (20) follows from the routing algorithm, and (21) follows from the fact that X_m is uniformly distributed: each server appears in L-1 of the L(L-1)/2 equally likely pairs.

Since the scheduling algorithm is identical to Algorithm 1, (13) still holds for any t such that 1 ≤ (t mod T) ≤ T - S_max. Thus, we have

    Σ_{l,m} Q_m^(l) N_m^(l) ≥ Σ_{l,m} Q_m^(l)(nT) N_m^(l)(nT) - L(t - nT)β.    (23)

We assume that all the servers are identical. So C is obtained by summing L copies of Conv(N_1). Thus, since (1+ε) T/(T - S_max) λ ∈ C, we have (1+ε) T/(T - S_max) (λ/L) ∈ Conv(N_l) for all l. According to the scheduling algorithm, for each l, we have

    Σ_m Q_m^(l)(nT) N_m^(l)(nT) ≥ (1+ε) T/(T - S_max) Σ_m Q_m^(l)(nT) λ_m / L.    (24)

Thus, we get

    Σ_{l,m} Q_m^(l) N_m^(l) ≥ -L(t - nT)β + Σ_{l,m} Q_m^(l)(nT) N_m^(l)(nT)    (25)
      ≥ -L(t - nT)β + (1+ε) T/(T - S_max) Σ_m (λ_m / L) Σ_l Q_m^(l)(nT).    (26)

Now, substituting (22) and (26) in (19) and summing over t ∈ [nT, (n+1)T - 1], we get

    E[ V((n+1)T) - V(nT) | Q(nT) = q ]
      ≤ T K + 2Lβ Σ_{τ=0}^{T-S_max-1} τ + 2T Σ_m λ_m q̄_m - 2 (1+ε) T Σ_m λ_m q̄_m
      = T K + 2Lβ Σ_{τ=0}^{T-S_max-1} τ - 2εT Σ_m λ_m q̄_m.

The proof can be completed by applying the Foster-Lyapunov theorem [16], [17].

B. Pick-and-Compare Routing and Myopic MaxWeight Scheduling

One drawback of power-of-two-choices scheduling is that it is throughput-optimal only when all the servers are identical. In the case of non-identical servers, one can use a pick-and-compare routing algorithm instead of power-of-two choices.
The algorithm is motivated by the pick-and-compare algorithm for wireless scheduling and switch scheduling [19]. It is as simple to implement as power-of-two choices, and can be shown to be throughput optimal even if the servers are not identical. We describe it next. The scheduling algorithm is identical to the previous case. Pick-and-compare routing works as follows. In each time slot t, for each type m of job, a server is chosen uniformly at random and compared with the server to which packets were routed in the previous time slot. The server with the shorter queue length among these two is chosen, and all the type-m job arrivals in this time slot are routed to that server. Let i*_m(t) be the server to which packets will be routed in time slot t. Then, i*_m(t) = arg min_{i ∈ {i', i*_m(t−1)}} Q_{mi}, where i' is the randomly chosen server, and so

W_{mi} = W_m if i = i*_m(t), and W_{mi} = 0 otherwise.
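For comparison with the power-of-two-choices rule, one slot of pick-and-compare routing can be sketched as follows (again an illustrative sketch with our own names; `prev_server` plays the role of i*_m(t−1) for a single job type m):

```python
import random

def route_pick_and_compare(queues, prev_server, num_arrivals, rng=random):
    """One slot of pick-and-compare routing for a single job type.

    A candidate server is drawn uniformly at random and compared
    with the server used in the previous slot; the shorter of the
    two queues receives all of this slot's arrivals.
    """
    candidate = rng.randrange(len(queues))
    target = candidate if queues[candidate] < queues[prev_server] else prev_server
    queues[target] += num_arrivals
    return target  # remembered as prev_server for the next slot
```

The returned server index is carried over to the next slot, which is the memory that lets the rule track a short queue even when servers are not identical.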

Theorem 6: Any job load vector that satisfies ((1+ε)T/(T − S_max)) λ ∈ C for some ε > 0 is supportable under the pick-and-compare routing and myopic MaxWeight allocation algorithm.

Proof: Consider the irreducible Markov chain Y(t) = (Q(t), {i*_m(t)}_m) and the Lyapunov function V(t) = Σ_{m,i} Q_{mi}²(t). Then, as in the proof of Theorem 5, similar to (19), we have

E[V(t+1) − V(t) | Q(nT) = q, i*(nT) = i*]
  ≤ K + 2E[ Σ_{m,i} Q_{mi}W_{mi} | Q(nT) = q, i*(nT) = i* ] − 2E[ Σ_{m,i} Q_{mi}N_m^{(i)} | Q(nT) = q, i*(nT) = i* ].   (27)

Since ((1+ε)T/(T − S_max)) λ ∈ C, there exists {λ_{mi}} such that ((1+ε)T/(T − S_max)) λ_i ∈ Conv(N̂^{(i)}) for all i, where λ_i = (λ_{1i}, ..., λ_{Mi}), and Σ_i λ_{mi} = λ_m. This {λ_{mi}} can be chosen so that there is a κ such that λ_m ≤ κ λ_{mi} for all i. This is possible because, if λ_m > 0, one can choose {λ_{mi}} so that λ_{mi} > 0.

Since the scheduling part of the algorithm is identical to Algorithm 1, (16) still holds for t ∈ [nT, (n+1)T − S_max]. Thus, we have

Σ_{m,i} Q_{mi}(t)N_m^{(i)}(t) ≥ ((1+ε)T/(T − S_max)) Σ_{m,i} Q_{mi}(nT) λ_{mi} − L(t − nT)β.   (28)

We also need a bound on the increase in Σ_{m,i} Q_{mi}N_m^{(i)} over multiple super time slots. So, for any n', we have

Σ_{m,i} Q_{mi}(nT)N_m^{(i)}(nT)
  ≤ Σ_{m,i} Q_{mi}((n+n')T)N_m^{(i)}(nT) + n'T L M N_max (W_max + N_max)
  ≤ Σ_{m,i} Q_{mi}((n+n')T)N_m^{(i)}((n+n')T) + n'T Lβ,   (29)

where the second inequality follows from the fact that we use MaxWeight scheduling every T slots and from the definition of β. Now, again, using (14) and (28), we have that, for any t such that 1 ≤ (t mod T) ≤ T − S_max,

Σ_{m,i} Q_{mi}(t)N_m^{(i)}(t) ≥ ((1+ε)T/(T − S_max)) Σ_{m,i} Q_{mi}(nT) λ_{mi} − L(t − nT)β.   (30)

Fix m. Let i_n = arg min_{i ∈ {1,2,...,L}} Q_{mi}(nT). Note that

Q_{mi}(t) − Q_{mi}(t−1) = W_{mi} − N_m^{(i)} ≤ W_max + N_max.

Therefore, once there is a t_0 ≥ nT such that i*_m(t_0) satisfies

Q_{m i*_m(t_0)}(t_0) ≤ Q_{m i_n}(t_0),   (31)

then, for all t ≥ t_0, we have Q_{m i*_m(t)}(t) ≤ Q_{m i_n}(nT) + (t − nT)(W_max + N_max). The probability that (31) does not happen by time t_0 is at most (1 − 1/L)^{t_0 − nT}. Choose t_0 so that this probability is less than p = ε/(4κ), so that 1 + κp = 1 + ε/4. Choose k so that kT > (t_0 − nT) and ((n+k)T − t_0) + κ(t_0 − nT) ≤ kT(1 + ε/4). Then

E[ Σ_{t=nT}^{(n+k)T−1} Σ_i Q_{mi}W_{mi} | Q(nT) = q, i*(nT) = i* ]
  = E[ Σ_{t=nT}^{t_0−1} Σ_i Q_{mi}W_{mi} | Q(nT) = q, i*(nT) = i* ] + E[ Σ_{t=t_0}^{(n+k)T−1} Σ_i Q_{mi}W_{mi} | Q(nT) = q, i*(nT) = i* ].   (32)
(n+k)t 1 + t=t 0 (1 p)λ ( q n +(t nt ) (W ax + N ax )) + pλ ((n + k)t t 0 ) q (n+k)t 1 + p (t nt ) (W ax + N ax ) LW ax (33) t=t 0 (1 p) ((n + k)t t 0 ) + kt q n λ τ (W ax + N ax ) LW ax + (1 p)λ (t 0 nt ) K 1 + (1 p) ((n + k)t t 0 ) + (1 p)κ(t 0 nt ) K 1 + (1 p)kt (1 + ɛ/4) + (1 + ɛ/4)κpkt K 1 + kt (1 + ɛ/4) 2 K 1 + kt (1 + 3ɛ/4) q + pλ kt q λ q λ + κpkt q λ q (34) q λ (35) q λ (36) q λ (37) q λ (38) wherek 1 = kt τ (W ax + N ax ) LW ax. Equatons (36) and (37) follow fro our choce of k and p respectvely.

Now, substituting (38) and (29) in (27) and summing over t ∈ [nT, (n+k)T − 1], we get

E[V((n+k)T) − V(nT) | Q(nT) = q, i*(nT) = i*]
  ≤ kTK + MK_1 + 2Lβ Σ_{τ=1}^{kT−S_max−1} τ + 2kT(1 + 3ε/4) Σ_{m,i} q_{mi}λ_{mi} − 2(1+ε)(T/(T − S_max)) k(T − S_max) Σ_{m,i} q_{mi}λ_{mi}
  = K' − (1/2)kTε Σ_{m,i} q_{mi}λ_{mi},

where K' = kTK + MK_1 + 2Lβ Σ_{τ=1}^{kT−S_max−1} τ. The result follows from the Foster-Lyapunov theorem [16], [17].

VII. SIMULATIONS

In this section, we use simulations to compare the performance of the centralized myopic MaxWeight scheduling algorithm and the joint routing and scheduling algorithm based on power-of-two-choices routing and MaxWeight scheduling. We consider a cloud computing cluster with 100 identical servers, where each server has the hardware configuration specified in Example 1. We assume jobs served in this cloud belong to one of the three types specified in Table I, so VM configurations (2, 0, 0), (1, 0, 1), and (0, 1, 1) are the three maximal VM configurations for each server. It is easy to verify that the load vector λ = (1, 1/3, 2/3) is on the boundary of the capacity region of a server. To model the large variability in job sizes, we assume job sizes are distributed as follows: when a new job is generated, with probability 0.7 its size is an integer uniformly distributed in the interval [1, 50]; with probability 0.15 it is an integer uniformly distributed between 251 and 300; and with probability 0.15 it is uniformly distributed between 451 and 500. Therefore, the average job size is 130.5 and the maximum job size is 500. We further assume that the number of type-m jobs arriving at each time slot follows a binomial distribution with parameters (αλ_m/130.5, 100). We varied the parameter α from 0.5 to 1 in our simulations, which varied the traffic intensity of the cloud system from 0.5 to 1, where the traffic intensity is the factor by which the load vector has to be divided so that it lies on the boundary of the capacity region. Each simulation was run for 500,000 time slots.
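The stated mean job size follows directly from the three-component mixture; as a quick sanity check (`components` is our own name for the (probability, min, max) triples of the setup above):

```python
# Job-size mixture from the simulation setup:
# U[1,50] w.p. 0.7, U[251,300] w.p. 0.15, U[451,500] w.p. 0.15.
components = [(0.7, 1, 50), (0.15, 251, 300), (0.15, 451, 500)]

# The mean of a discrete uniform on {lo, ..., hi} is (lo + hi) / 2.
mean_size = sum(p * (lo + hi) / 2 for p, lo, hi in components)
max_size = max(hi for _, _, hi in components)
print(mean_size, max_size)  # 130.5 500
```

That is, 0.7 · 25.5 + 0.15 · 275.5 + 0.15 · 475.5 = 130.5, matching the average job size quoted above.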
First, we study the difference between power-of-two-choices routing and JSQ routing by comparing the mean delay of the two algorithms at various traffic intensities for different choices of frame sizes. Our simulation results indicate that the delay performance of the two algorithms was not very different. Due to page limitations, we only provide a representative sample of our simulations here, for the case where the frame size is 4000, in Figure 3.

Fig. 3. Comparison of mean delay in the cloud computing cluster in the case with a common queue and in the case with power-of-two-choices routing when the frame size is 4000.

Next, we show the performance of our algorithms for various values of the frame size T in Figure 4. Again, we have only shown a representative sample for the power-of-two-choices routing (with myopic MaxWeight scheduling). From Theorems 3 and 5, we know that any load less than (T − S_max)/T is supportable. The simulations indicate that the system is stable even for loads greater than this value. This is to be expected since our proofs of Theorems 3 and 5 essentially ignore the jobs that are scheduled in the last S_max time slots of a frame. However, the fact that the stability region is larger for larger values of T is confirmed by the simulations. It is even more interesting to observe the delay performance of our algorithms as T increases. Figure 4 indicates that the delay performance does not degrade as T is increased, while the throughput increases with T. In particular, our theorems are not valid when T = ∞. However, the simulations indicate that the system is stable even for T = ∞. When T = ∞, our routing and scheduling algorithms myopically fill space using queue length information. So the use of queue-length information seems to be the key ingredient of the algorithm, while the optimal implementation of the MaxWeight algorithm seems to be secondary. This is somewhat similar to the performance of the LQF (Longest-Queue-First) algorithm in wireless networks, which is not known to be throughput optimal for all topologies but performs nearly as well as the MaxWeight algorithm in simulations [20].
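The qualitative gap between uniform-random routing and power-of-two-choices routing can also be reproduced in a toy single-job-type model. The sketch below is not the VM model simulated above — it assumes unit-size jobs, one unit of service per server per slot, and parameters of our own choosing — but it illustrates why sampling two queues sharply reduces backlog at high load.

```python
import random

def mean_backlog(two_choices, servers=50, slots=3000, load=0.95, seed=7):
    """Time-averaged total backlog of a toy discrete-time system.

    Each slot, Binomial(servers, load) unit-size jobs arrive and are
    routed one by one; every server then serves one job. With
    two_choices=True, each job samples two servers and joins the
    shorter queue; otherwise it picks one server uniformly at random.
    """
    rng = random.Random(seed)
    q = [0] * servers
    acc = 0
    for _ in range(slots):
        arrivals = sum(rng.random() < load for _ in range(servers))
        for _ in range(arrivals):
            if two_choices:
                a, b = rng.sample(range(servers), 2)
                dest = a if q[a] <= q[b] else b
            else:
                dest = rng.randrange(servers)
            q[dest] += 1
        q = [max(0, x - 1) for x in q]   # one unit of service per server
        acc += sum(q)
    return acc / slots
```

In a quick run at load 0.95, the two-choices variant yields a much smaller time-averaged backlog than uniform-random routing, consistent with the known power-of-two-choices effect [18].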
However, as in the case of wireless networks, it is possible that the choice of T = ∞ may result in suboptimal throughput in some topologies [21], but, in this paper, we have not attempted to determine whether such topologies exist.

Fig. 4. Comparison of the power-of-two-choices routing algorithm for various frame lengths T.

VIII. CONCLUSIONS

We considered a stochastic model for load balancing and scheduling in cloud computing clusters. A primary contribution is the development of frame-based non-preemptive VM configuration policies. These policies can be made nearly throughput-optimal by choosing sufficiently long frame durations, whereas the widely used Best-Fit policy was shown to be not throughput optimal. Simulations indicate that long frame durations are not only good from a throughput perspective but also seem to provide good delay performance.

REFERENCES

[1] EC2, http://aws.amazon.com/ec2/.
[2] AppEngine, http://code.google.com/appengine/.
[3] Azure, http://www.microsoft.com/windowsazure/.
[4] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree compared," in Grid Computing Environments Workshop, 2008 (GCE '08), 2008, pp. 1-10.
[5] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., "Above the clouds: A Berkeley view of cloud computing," Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, 2009.
[6] D. A. Menasce and P. Ngo, "Understanding cloud computing: Experimentation and capacity planning," in Proc. 2009 Computer Measurement Group Conf., 2009.
[7] X. Meng, V. Pappas, and L. Zhang, "Improving the scalability of data center networks with traffic-aware virtual machine placement," in Proc. IEEE Infocom, 2010, pp. 1-9.
[8] Y. Yazir, C. Matthews, R. Farahbod, S. Neville, A. Guitouni, S. Ganti, and Y. Coady, "Dynamic resource allocation in computing clouds using distributed multiple criteria decision analysis," in 2010 IEEE 3rd International Conference on Cloud Computing, 2010, pp. 91-98.
[9] K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas, and A. Delis, "Flexible use of cloud resources through profit maximization and price discrimination," in Data Engineering (ICDE), 2011 IEEE 27th International Conference on, 2011, pp. 75-86.
[10] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in Proc. IEEE Infocom, 2011, pp. 1098-1106.
[11] M. Wang, X. Meng, and L. Zhang, "Consolidating virtual machines with dynamic bandwidth demand in data centers," in Proc. IEEE Infocom, 2011, pp. 71-75.
[12] U. Sharma, P. Shenoy, S. Sahu, and A. Shaikh, "Kingfisher: Cost-aware elasticity in the cloud," in Proc. IEEE Infocom, 2011, pp. 206-210.
[13] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Trans. Automat. Contr., vol. 37, pp. 1936-1948, December 1992.
[14] B. Speitkamp and M. Bichler, "A mathematical programming approach for server consolidation problems in virtualized data centers," IEEE Transactions on Services Computing, pp. 266-278, 2010.
[15] A. Beloglazov and R. Buyya, "Energy efficient allocation of virtual machines in cloud data centers," in 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010, pp. 577-578.
[16] S. Asmussen, Applied Probability and Queues. New York: Springer-Verlag, 2003.
[17] S. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. Cambridge University Press, 2009.
[18] M. Mitzenmacher, "The power of two choices in randomized load balancing," Ph.D. dissertation, University of California at Berkeley, 1996.
[19] L. Tassiulas, "Linear complexity algorithms for maximum throughput in radio networks and input queued switches," in Proc. IEEE Infocom, 1998.
[20] X. Lin and N. Shroff, "The impact of imperfect scheduling on cross-layer rate control in wireless networks," in Proc. IEEE Infocom, vol. 3, Miami, FL, March 2005, pp. 1804-1814.
[21] A. Dimakis and J. Walrand, "Sufficient conditions for stability of longest queue first scheduling," Adv. Appl. Prob., pp. 505-521, 2006.