2012 Proceedings IEEE INFOCOM

Stochastic Models of Load Balancing and Scheduling in Cloud Computing Clusters

Siva Theja Maguluri and R. Srikant
Department of ECE and CSL
University of Illinois at Urbana-Champaign
siva.theja@gmail.com; rsrikant@illinois.edu

Lei Ying
Department of ECE
Iowa State University
leiying@iastate.edu

Abstract—Cloud computing services are becoming ubiquitous, and are starting to serve as the primary source of computing power for both enterprises and personal computing applications. We consider a stochastic model of a cloud computing cluster, where jobs arrive according to a stochastic process and request virtual machines (VMs), which are specified in terms of resources such as CPU, memory and storage space. While there are many design issues associated with such systems, here we focus only on resource allocation problems, such as the design of algorithms for load balancing among servers, and algorithms for scheduling VM configurations. Given our model of a cloud, we first define its capacity, i.e., the maximum rates at which jobs can be processed in such a system. Then, we show that the widely-used Best-Fit scheduling algorithm is not throughput-optimal, and present alternatives which achieve any arbitrary fraction of the capacity region of the cloud. We then study the delay performance of these alternative algorithms through simulations.

I. INTRODUCTION

Cloud computing services are becoming the primary source of computing power for both enterprises and personal computing applications. A cloud computing platform can provide a variety of resources, including infrastructure, software, and services, to users in an on-demand fashion. To access these resources, a cloud user submits a request for resources. The cloud provider then provides the requested resources from a common resource pool (e.g., a cluster of servers), and allows the user to use these resources for a required time period. Compared to traditional own-and-use approaches, cloud computing services eliminate the costs of purchasing and maintaining infrastructure for cloud users, and allow users to dynamically scale computing resources up and down in real time based on their needs. Several cloud computing systems are now commercially available, including the Amazon EC2 system [1], Google's AppEngine [2], and Microsoft's Azure [3]. We refer to [4], [5], [6] for comprehensive surveys on cloud computing.

While cloud computing services in practice provide many different services, in this paper we consider cloud computing platforms that provide infrastructure as a service, in the form of Virtual Machines (VMs), to users. We assume cloud users request virtual machines (VMs), which are specified in terms of resources such as CPU, memory and storage space. Each request is called a job. The type of a job specifies the type of VM the user wants, and the size of the job specifies the amount of time required. After receiving these requests, the cloud provider schedules the VMs on physical machines, called servers. There are many design issues associated with such systems [7], [8], [9], [10], [11], [12]. In this paper, we focus only on resource allocation problems, such as the design of algorithms for load balancing among servers, and algorithms for scheduling VM configurations.

We consider a stochastic model of a cloud computing cluster. We assume that jobs with variable sizes arrive according to a stochastic process, and are assigned to the servers according to a resource allocation algorithm. A job departs from the system after its VM has been hosted for the required amount of time. We assume jobs are queued in the system when all servers are busy. We are interested in the maximum rates at which jobs can be processed in such a system, and in resource allocation algorithms that can support those maximum rates.
The main contributions of this paper are summarized below.

(1) We characterize the capacity region of a cloud system by establishing its connection to the capacity region of a wireless network. The capacity of a cloud system is defined to be the set of traffic loads under which the queues in the system can be stabilized.

(2) We then consider the widely-used Best-Fit scheduling algorithm and provide a simple example to show that it is not throughput-optimal. Next, we point out that the well-known MaxWeight algorithm is throughput-optimal in an ideal scenario, where jobs can be preempted and can migrate among servers, and servers can be reconfigured at each time instant. In practice, preemption and VM migration are costly. Therefore, motivated by the MaxWeight algorithm, we present a non-preemptive algorithm which myopically allocates a new job to a server using current queue length information whenever a departure occurs. We characterize the throughput of this myopic algorithm, and show that it can achieve any arbitrary fraction of the capacity region if the algorithm parameters are chosen appropriately.

(3) The algorithms mentioned above require central queues. In practice, a more scalable approach is to route jobs to servers right after their arrivals. We consider the Join-the-Shortest-Queue (JSQ) algorithm, which routes a job to the server with the shortest queue. We prove that this entails no loss in throughput compared to maintaining a single central queue.
(4) JSQ needs to keep track of queue lengths at all servers, which may become prohibitive when we have a large number of servers and the arrival rates of jobs are large. To address this issue, we propose power-of-two-choices routing for the case of identical servers, and pick-and-compare routing for the case of non-identical servers.

II. MODEL DESCRIPTION

A cloud system consists of a number of networked servers. Each of the servers may host multiple Virtual Machines (VMs). Each VM requires a set of resources, including CPU, memory, and storage space. VMs are classified according to the resources they request. As an example, Table I lists three types of VMs (called instances) available in Amazon EC2.

Instance Type             Memory     CPU             Storage
Standard Extra Large      15 GB      8 EC2 units     1,690 GB
High-Memory Extra Large   17.1 GB    6.5 EC2 units   420 GB
High-CPU Extra Large      7 GB       20 EC2 units    1,690 GB

TABLE I: THREE REPRESENTATIVE INSTANCES IN AMAZON EC2

We assume there are M distinct VM configurations and that each VM configuration is specified in terms of its requirements for K different resources. Let $R_{mk}$ be the amount of type-$k$ resource (e.g., memory) required by a type-$m$ VM (e.g., a standard extra large VM). Further, we assume that the cloud system consists of L different servers. Let $C_{ik}$ denote the amount of type-$k$ resource at server $i$. Given a server $i$, an M-dimensional vector $N$ is said to be a feasible VM-configuration if the given server can simultaneously host $N_1$ type-1 VMs, $N_2$ type-2 VMs, ..., and $N_M$ type-M VMs. In other words, $N$ is feasible at server $i$ if and only if

$\sum_{m=1}^{M} N_m R_{mk} \le C_{ik} \quad \text{for all } k.$

We let $N_{\max}$ denote the maximum number of VMs of any type that can be served on any server.

Example 1: Consider a server with 30 GB memory, 30 EC2 computing units and 4,000 GB storage space. Then $N = (2, 0, 0)$ and $N = (0, 1, 1)$ are two feasible VM-configurations on the server, where $N_1$ is the number of standard extra large VMs, $N_2$ is the number of high-memory extra large VMs, and $N_3$ is the number of high-CPU extra large VMs. $N = (0, 2, 1)$ is not a feasible VM configuration on this server because the server does not have enough memory and computing units.
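The feasibility condition above is straightforward to check in code. The following is a minimal sketch (ours, not the paper's code): it encodes the Table I requirements in a matrix `R` (names are ours) and verifies the configurations from Example 1.

```python
# Minimal sketch (not from the paper): check whether an M-dimensional
# vector N is a feasible VM-configuration on a server, i.e., whether
# sum_m N[m] * R[m][k] <= C[k] for every resource k.

# Rows of R: (memory GB, EC2 units, storage GB) for the three Table I types.
R = [
    (15.0, 8.0, 1690.0),   # Standard Extra Large
    (17.1, 6.5, 420.0),    # High-Memory Extra Large
    (7.0, 20.0, 1690.0),   # High-CPU Extra Large
]

def is_feasible(N, C, R=R):
    """Return True if configuration N fits within server capacities C."""
    return all(
        sum(n * R[m][k] for m, n in enumerate(N)) <= C[k]
        for k in range(len(C))
    )

if __name__ == "__main__":
    server = (30.0, 30.0, 4000.0)          # the Example 1 server
    print(is_feasible((2, 0, 0), server))  # True
    print(is_feasible((0, 1, 1), server))  # True
    print(is_feasible((0, 2, 1), server))  # False: memory and CPU exceeded
```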
In this paper, we consider a cloud system which hosts VMs for clients. A VM request from a client specifies the type of VM the client needs, and the amount of time requested. We call a VM request a job. A job is said to be a type-$m$ job if a type-$m$ VM is requested. We consider a time-slotted system in this paper, and we say that the size of a job is $S$ if the VM needs to be hosted for $S$ time slots. Given our model of a cloud system, we next define the concept of capacity for a cloud.

III. CAPACITY OF A CLOUD

What is the capacity of a cloud? First, as an example, consider three servers as defined in Example 1. Clearly this system has an aggregate capacity of 90 GB of memory, 90 EC2 compute units and 12,000 GB of storage space. However, such a crude definition of capacity fails to reflect the system's ability to host VMs. For example, while

$4 \times 17.1 + 3 \times 7 = 89.4 \le 90,$
$4 \times 6.5 + 3 \times 20 = 86 \le 90,$
$4 \times 420 + 3 \times 1690 = 6750 \le 12000,$

it is easy to verify that the system cannot host 4 high-memory extra large VMs and 3 high-CPU extra large VMs at the same time. Therefore, we have to introduce a VM-centric definition of capacity.

Let $\mathcal{A}_m(t)$ denote the set of type-$m$ jobs that arrive at the beginning of time slot $t$, and let $A_m(t) = |\mathcal{A}_m(t)|$, i.e., the number of type-$m$ jobs that arrive at the beginning of time slot $t$. We let $W_m(t) = \sum_{j \in \mathcal{A}_m(t)} S_j$ be the total number of time slots requested by these jobs. We assume that $W_m(t)$ is a stochastic process which is i.i.d. across time slots, $E[W_m(t)] = \lambda_m$, and $\Pr(W_m(t) = 0) > \epsilon_W$ for some $\epsilon_W > 0$, for all $m$ and $t$. Many of these assumptions can be relaxed, but we consider the simplest model for ease of exposition.

Let $D_m(t)$ denote the number of type-$m$ jobs that are served by the cloud at time slot $t$. Note that the job size of each of these $D_m(t)$ jobs reduces by one at the end of time slot $t$. The workload due to type-$m$ jobs is defined to be the sum of the remaining job sizes of all jobs of type $m$ in the system. We let $Q_m(t)$ denote the workload of type-$m$ jobs in the network at the beginning of time slot $t$, before any new job arrivals. Then the dynamics of $Q_m(t)$ can be described as

$Q_m(t+1) = Q_m(t) + W_m(t) - D_m(t). \quad (1)$

We say that the cloud system is stable if

$\limsup_{t \to \infty} E\left[\textstyle\sum_m Q_m(t)\right] < \infty,$

i.e., the expected total workload in steady state is bounded. A vector of arriving loads $\lambda$ is said to be supportable if there exists a resource allocation mechanism under which the cloud is stable. In the following, we first identify the set of supportable $\lambda$'s.

Let $\mathcal{N}_i$ be the set of feasible VM-configurations on server $i$. We define a set $C$ such that

$C = \left\{ \lambda : \lambda = \sum_{i=1}^{L} \lambda^{(i)} \text{ and } \lambda^{(i)} \in \text{Conv}(\mathcal{N}_i) \right\}, \quad (2)$

where Conv denotes the convex hull.
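Membership in $C$ can be tested numerically as a small linear program: maximize a scaling factor $\theta$ such that $\theta\lambda$ splits across servers into points of the per-server convex hulls; $\lambda \in C$ exactly when the optimum satisfies $\theta \ge 1$. The sketch below is our own construction (not the paper's code; the helper names and the use of `scipy.optimize.linprog` are our assumptions). For the three-server example above, the load $(0, 4, 3)$ comes out with $\theta \approx 0.75 < 1$, matching the discussion.

```python
# Sketch (our construction, not the paper's): decide lambda in C via LP.
# Variables: x[i][n] = fraction of time server i spends in feasible
# configuration n, plus a scale theta. We maximize theta subject to
#   sum_n x[i][n] <= 1                          for each server i,
#   sum_{i,n} x[i][n] * n_m >= theta * lambda_m for each type m.
from itertools import product
from scipy.optimize import linprog

def feasible_configs(C, R):
    """Brute-force enumeration of feasible configurations on one server."""
    caps = [min(int(C[k] // R[m][k]) for k in range(len(C)))
            for m in range(len(R))]
    return [N for N in product(*(range(c + 1) for c in caps))
            if all(sum(n * R[m][k] for m, n in enumerate(N)) <= C[k]
                   for k in range(len(C)))]

def max_scaling(lam, servers, R):
    configs = [feasible_configs(C, R) for C in servers]
    nx = sum(len(cfgs) for cfgs in configs)   # one var per (server, config)
    c = [0.0] * nx + [-1.0]                   # minimize -theta
    A_ub, b_ub, offsets, off = [], [], [], 0
    for cfgs in configs:                      # time-sharing budget per server
        row = [0.0] * (nx + 1)
        for j in range(len(cfgs)):
            row[off + j] = 1.0
        A_ub.append(row); b_ub.append(1.0)
        offsets.append(off); off += len(cfgs)
    for m in range(len(lam)):                 # service rate covers theta*lam
        row = [0.0] * (nx + 1)
        for i, cfgs in enumerate(configs):
            for j, N in enumerate(cfgs):
                row[offsets[i] + j] = -float(N[m])
        row[-1] = lam[m]
        A_ub.append(row); b_ub.append(0.0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (nx + 1))
    return -res.fun

R = [(15.0, 8.0, 1690.0), (17.1, 6.5, 420.0), (7.0, 20.0, 1690.0)]
servers = [(30.0, 30.0, 4000.0)] * 3          # three Example 1 servers
print(max_scaling((0.0, 4.0, 3.0), servers, R))  # about 0.75, so not in C
```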
We next use a simple example to illustrate the definition of $C$.

Example 2: Consider a simple cloud system consisting of three servers. Servers 1 and 2 are of the same type (i.e., they have the same amount of resources), and server 3 is of a different type. Assume there are two types of VMs. The set of feasible VM configurations on servers 1 and 2 is assumed to be

$\mathcal{N}_1 = \mathcal{N}_2 = \{(0,0), (1,0), (0,1)\},$

i.e., each of these servers can host at most either one type-1 VM or one type-2 VM. The set of feasible configurations on server 3 is assumed to be

$\mathcal{N}_3 = \{(0,0), (1,0), (2,0), (0,1)\},$

i.e., the server can host at most either two type-1 VMs or one type-2 VM. The regions $\text{Conv}(\mathcal{N}_1)$ and $\text{Conv}(\mathcal{N}_3)$ are plotted in Figure 1. Note that the vector $(0.75, 0.25)$ is in the region $\text{Conv}(\mathcal{N}_1)$. While a type-1 server cannot host 0.75 type-1 VMs and 0.25 type-2 VMs, we can host a type-1 VM on server 1 for 3/4 of the time, and a type-2 VM on the server for 1/4 of the time, to support the load $(0.75, 0.25)$. The capacity region $C$ for this simple cloud system is plotted in Figure 2.

[Figure 1: The regions Conv(N_1) and Conv(N_3).]
[Figure 2: The capacity region C.]

We call $C$ the capacity region of the cloud. This definition of the capacity of a cloud is motivated by similar definitions in [13]. We introduce the following notation: the servers are indexed by $i$. Let $N^{(i)}(t)$ denote the VM-configuration on server $i$ at time slot $t$. Further define $D(t) = \sum_i N^{(i)}(t)$, so $D_m(t)$ is the total number of type-$m$ VMs hosted in the cloud at time $t$. As in [13], it is easy to show the following results.

Lemma 1: $D(t) \in C$ for any $t$.

Theorem 1: For any $\lambda \notin C$, $\lim_{t \to \infty} E\left[\sum_m Q_m(t)\right] = \infty$.

IV. THROUGHPUT-OPTIMAL SCHEDULING: CENTRALIZED APPROACHES

In this section, we study centralized approaches for job scheduling. We assume that jobs arrive at a central job scheduler, and are queued at the job scheduler. The scheduler dispatches a job to a server when the server has enough resources to host the VM requested by the job. In this setting, servers do not have queues, and do not make scheduling decisions. We call a job scheduling algorithm throughput-optimal if the algorithm can support any $\lambda$ such that $(1+\epsilon)\lambda \in C$ for some $\epsilon > 0$.

A. Best Fit is not Throughput-Optimal: A Simple Example

A scheduling policy that is used in practice is the so-called best-fit policy [14], [15], i.e., the job which uses the largest amount of resources, among all jobs that can be served, is selected for service whenever resources become available. Such a definition has to be made more precise when a VM requests multiple types of resources. In the case of multiple types of resources, we can select one type of resource as the reference resource, and define best fit with respect to this resource. If there is a tie, then best fit with respect to another resource is considered, and so on. Alternatively, one can consider a particular linear or nonlinear combination of the resources as a meta-resource and define best fit with respect to the meta-resource.

We now show that best fit is not throughput-optimal. Consider a simple example where we have two servers, one type of resource and two types of jobs. A type-1 job requests half of the resource and four time slots of service, and a type-2 job requests the whole resource and one time slot of service. Now assume that initially server 1 hosts one type-1 job and server 2 is empty; two type-1 jobs arrive once every three time slots starting from time slot 3, and type-2 jobs arrive according to some arrival process with arrival rate $\epsilon$ starting at time slot 5. Under the best-fit policy, type-1 jobs are scheduled forever, since type-2 jobs cannot be scheduled when a type-1 job is in a server. So the workload due to type-2 jobs blows up to infinity for any $\epsilon > 0$. The system, however, is clearly stabilizable for $\epsilon < 2/3$. Suppose we schedule type-1 jobs only in time slots 1, 7, 13, 19, ..., i.e., once every six time slots. Then time slots 5, 6, 11, 12, 17, 18, ... are available for type-2 jobs. So if $\epsilon < 2/3$, both queues can be stabilized under this periodic scheduler. The specific arrival process we constructed is not key to the instability of best fit.
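The counterexample is easy to reproduce in a few lines. The sketch below is our own simulation (not the paper's code), with a Bernoulli type-2 arrival process standing in for the rate-$\epsilon$ process; the type-2 backlog grows without bound because neither server is ever completely free.

```python
# Sketch of the best-fit counterexample (our own simulation). Two
# unit-capacity servers; a type-1 job needs half a server for 4 slots,
# a type-2 job needs a whole server for 1 slot.
import random

random.seed(1)
T_END = 60
servers = [[4], []]     # remaining service times of type-1 jobs per server
queue1, queue2 = 0, 0   # backlogged type-1 / type-2 jobs
EPS = 0.5               # type-2 arrival rate (any eps > 0 blows up)

for t in range(1, T_END + 1):
    # arrivals: two type-1 jobs every 3 slots from t=3; type-2 at rate EPS
    if t >= 3 and (t - 3) % 3 == 0:
        queue1 += 2
    if t >= 5 and random.random() < EPS:
        queue2 += 1
    # best fit: a type-2 job (whole server) is "bigger" so it is preferred,
    # but it only fits on a completely empty server.
    scheduled = True
    while scheduled:
        scheduled = False
        for s in servers:
            if queue2 > 0 and len(s) == 0:     # whole server free
                s.append(1); queue2 -= 1; scheduled = True
            elif queue1 > 0 and len(s) <= 1:   # half server free
                s.append(4); queue1 -= 1; scheduled = True
    # service: every hosted job receives one slot of service
    for s in servers:
        s[:] = [r - 1 for r in s if r - 1 > 0]
    print(t, "type-2 backlog:", queue2)
```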
Assume type-1 and type-2 jobs arrive according to independent Poisson processes with rates $\lambda_1$ and $\lambda_2$, respectively. Figure 3 shows a simulation result in which the total number of backlogged jobs blows up under best fit with $\lambda_1 = 0.7$ and $\lambda_2 = 0.1$, but is stable under a MaxWeight-based policy with $\lambda_1 = 0.7$ and $\lambda_2 = 0.5$.

[Figure 3: The number of backlogged jobs under the best-fit policy and a MaxWeight policy.]

This example raises the question as to whether there are throughput-optimal policies which stabilize the queues for all arrival rates that lie within the capacity region, without requiring knowledge of the actual arrival rates. In the next subsection, we answer this question affirmatively by relating the problem to a well-known scheduling problem in wireless networks. However, such a scheduling algorithm requires job preemption. In the later sections, we discuss non-preemptive policies and the loss of capacity (which can be made arbitrarily small) due to non-preemption.

B. Preemptive Algorithms

In this subsection, we assume that all servers can be reconfigured at the beginning of each time slot, and that a job can be interrupted at the beginning of each time slot and put back in the queue. We will study schemes that do not interrupt job service in the next subsection. We further assume that the job scheduler maintains a separate queue for each type of job, and that the sizes of all jobs are bounded by $S_{\max}$.
Recall that $Q_m(t)$ is the workload of type-$m$ jobs at the beginning of time slot $t$. We consider the following server-by-server MaxWeight allocation scheme.

Server-by-server MaxWeight allocation: At the beginning of time slot $t$, consider the $i$th server. If the jobs on the server are not finished, move them back to the central queue. Find a VM-configuration $N^{(i)}(t)$ such that

$N^{(i)}(t) \in \arg\max_{N \in \mathcal{N}_i} \sum_m Q_m(t) N_m.$

At server $i$, we create up to $N_m^{(i)}(t)$ type-$m$ VMs, depending on the number of jobs that are backlogged. Let $\hat{N}_m^{(i)}(t)$ be the actual number of VMs that are created. Then, we set

$Q_m(t+1) = Q_m(t) + W_m(t) - \sum_i \hat{N}_m^{(i)}(t).$

The fact that the proposed algorithm is throughput-optimal follows from [13] and is stated as a theorem below.

Theorem 2: Assume that a server can serve at most $N_{\max}$ jobs at the same time, and $E[W_m^2(t)] \le \sigma^2$ for all $m$. The server-by-server MaxWeight allocation is throughput-optimal, i.e.,

$\limsup_{t \to \infty} E\left[\textstyle\sum_m Q_m(t)\right] < \infty$

if there exists $\epsilon > 0$ such that $(1+\epsilon)\lambda \in C$.

C. Non-preemptive Algorithms

The algorithm presented in the previous subsection requires us to reconfigure the servers and re-allocate jobs at the beginning of each time slot. In practice, a job may not be interruptible, or interrupting a job can be very costly (the system needs to store a snapshot of the VM to be able to restart it later). In this subsection, we introduce a non-preemptive algorithm which is nearly throughput-optimal.

Before we present the algorithm, we first outline the basic ideas. We group $T$ time slots into a super time slot, where $T > S_{\max}$. At the beginning of a super time slot, a configuration is chosen according to the MaxWeight algorithm. When jobs depart from a server, the remaining resources in the server are filled again using the MaxWeight algorithm; however, we impose the constraint that only jobs that can be completed within the super time slot can be served. So the algorithm uses resources myopically (without consideration of the future), but is queue-length aware since it uses the MaxWeight algorithm. We now describe the algorithm more precisely.

Myopic MaxWeight allocation: We group $T$ time slots into a super time slot. At time slot $t$, consider the $i$th server. Let $N^{(i)}(t^-)$ be the configuration of VMs that are hosted on server $i$ at the beginning of time slot $t$, i.e., these correspond to the jobs that were scheduled in a previous time slot but are still in the system. These VMs cannot be reconfigured due to our non-preemption requirement. The central controller finds a new vector of configurations $\tilde{N}^{(i)}(t)$ to fill up the resources not used by $N^{(i)}(t^-)$, i.e.,

$\tilde{N}^{(i)}(t) \in \arg\max_{N : N + N^{(i)}(t^-) \in \mathcal{N}_i} \sum_m Q_m(t) N_m.$

The central controller selects as many jobs as are available in the queue, up to a maximum of $\tilde{N}_m^{(i)}(t)$ type-$m$ jobs at server $i$, and subject to the constraint that a type-$m$ job can only be served if its size satisfies $S_j \le T - (t \bmod T)$. Let $\hat{N}_m^{(i)}(t)$ denote the actual number of type-$m$ jobs selected. Server $i$ then serves the $\hat{N}_m^{(i)}(t)$ new jobs of type $m$, and the jobs $N^{(i)}(t^-)$ left over from the previous time slot. The queue length is updated as follows:

$Q_m(t+1) = Q_m(t) + W_m(t) - \sum_i \left( N_m^{(i)}(t^-) + \hat{N}_m^{(i)}(t) \right).$

Note that this myopic MaxWeight allocation algorithm differs from the server-by-server MaxWeight allocation in two aspects: (i) jobs are not interrupted when served, and (ii) when a job departs from a server, new jobs are accepted without reconfiguring the server. We next characterize the throughput achieved by the myopic MaxWeight allocation under the following assumptions: (i) job sizes are uniformly bounded by $S_{\max}$, and (ii) $W_m(t) \le W_{\max}$ for all $m$ and $t$.
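A sketch of the per-server MaxWeight step follows (our own illustration, not the paper's code; the configuration search is brute force and the names are ours). At $t = nT$ it is called with an empty leftover configuration; mid-frame it only fills the residual capacity left by jobs still in service. The size-eligibility constraint ($S_j \le T - (t \bmod T)$) applies when actual jobs are drawn from the queue, which the sketch leaves out.

```python
# Sketch (ours, not the paper's code) of the myopic MaxWeight step:
# choose the add-on configuration N~ maximizing sum_m Q[m]*N[m] subject
# to N + N_left being a feasible configuration for the server.
from itertools import product

def feasible(N, C, R):
    return all(sum(n * R[m][k] for m, n in enumerate(N)) <= C[k]
               for k in range(len(C)))

def maxweight_addon(Q, N_left, C, R, n_max):
    """Best configuration to add on top of jobs still in service."""
    M = len(Q)
    best, best_w = (0,) * M, -1.0
    for N in product(range(n_max + 1), repeat=M):
        total = tuple(N[m] + N_left[m] for m in range(M))
        if feasible(total, C, R):
            w = sum(Q[m] * N[m] for m in range(M))
            if w > best_w:
                best, best_w = N, w
    return best

R = [(15.0, 8.0, 1690.0), (17.1, 6.5, 420.0), (7.0, 20.0, 1690.0)]
C = (30.0, 30.0, 4000.0)                      # the Example 1 server
print(maxweight_addon((40, 10, 25), (0, 0, 0), C, R, n_max=2))  # fresh frame
print(maxweight_addon((40, 10, 25), (1, 0, 0), C, R, n_max=2))  # mid-frame
```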
Theorem 3: Any job load that satisfies $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$ for some $\epsilon > 0$ is supportable under the myopic MaxWeight allocation.

We skip the proof of this theorem because it is very similar to the proof of Theorem 4 in the next section. It is important to note that, unlike best fit, the myopic MaxWeight algorithm can be made to achieve any arbitrary fraction of the capacity region by choosing $T$ sufficiently large.

V. RESOURCE ALLOCATION WITH LOAD BALANCING

In the previous section, we considered the case where there is a single queue for jobs of the same type, served at different servers. This requires a central authority to maintain a single queue for all servers in the system. A more distributed solution is to maintain queues at each server and route jobs as soon as they arrive.
To the best of our knowledge, this problem does not fit into the scheduling/routing model in [13]. However, we show that one can still use MaxWeight-type scheduling if the servers are load-balanced using a join-the-shortest-queue (JSQ) routing rule.

In our model, we assume that each server maintains M different queues, one for each type of job, and uses this queue length information in making scheduling decisions. Let $Q$ denote the vector of these queue lengths, where $Q_{im}$ is the queue length of type-$m$ jobs at server $i$. Routing and scheduling are performed as described in Algorithm 1.

Algorithm 1: JSQ Routing and Myopic MaxWeight Scheduling

1) Routing Algorithm (JSQ Routing): All the type-$m$ jobs that arrive in time slot $t$ are routed to the server with the shortest queue for type-$m$ jobs, i.e., the server $i_m^*(t) = \arg\min_{i \in \{1,2,\dots,L\}} Q_{im}(t)$. Therefore, the arrivals to $Q_{im}$ in time slot $t$ are given by

$W_{im}(t) = \begin{cases} W_m(t) & \text{if } i = i_m^*(t) \\ 0 & \text{otherwise.} \end{cases} \quad (3)$

2) Scheduling Algorithm (Myopic MaxWeight Scheduling), for each server $i$: $T$ time slots are grouped into a super time slot. A MaxWeight configuration is chosen at the beginning of each super time slot. So, for $t = nT$, a configuration $\tilde{N}^{(i)}(t)$ is chosen according to

$\tilde{N}^{(i)}(t) \in \arg\max_{N \in \mathcal{N}_i} \sum_m Q_{im}(t) N_m.$

For all other $t$, at the beginning of the time slot, a new configuration is chosen as follows:

$\tilde{N}^{(i)}(t) \in \arg\max_{N : N + N^{(i)}(t^-) \in \mathcal{N}_i} \sum_m Q_{im}(t) N_m,$

where $N^{(i)}(t^-)$ is the configuration of jobs at server $i$ that are still in service at the end of the previous time slot. As many jobs as are available are selected for service from the queue, up to a maximum of $\tilde{N}_m^{(i)}(t)$ jobs of type $m$, and subject to the constraint that a new type-$m$ job is served only if it can finish its service by the end of the super time slot, i.e., only if $S_j \le T - (t \bmod T)$. Let $\hat{N}_m^{(i)}(t)$ denote the actual number of type-$m$ jobs selected at server $i$, and define $N_m^{(i)}(t) = \hat{N}_m^{(i)}(t) + N_m^{(i)}(t^-)$. The queue lengths are updated as follows:

$Q_{im}(t+1) = Q_{im}(t) + W_{im}(t) - N_m^{(i)}(t). \quad (4)$
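For concreteness, here is a minimal sketch (ours, not the paper's code) of the JSQ routing step and the frame bookkeeping of Algorithm 1; the per-server scheduling step would reuse a MaxWeight routine like the one sketched in Section IV. The function names and the callback interface are our assumptions.

```python
# Sketch (ours) of Algorithm 1's routing step and frame bookkeeping.
# Q[i][m] is the type-m workload queued at server i; T is the frame size.

def jsq_route(Q, m, W_m):
    """Send all type-m work arriving this slot to the shortest queue."""
    i_star = min(range(len(Q)), key=lambda i: Q[i][m])
    Q[i_star][m] += W_m
    return i_star

def run_slot(Q, t, T, arrivals, schedule_server):
    """One time slot of Algorithm 1.

    arrivals[m] is the type-m workload W_m(t); schedule_server(i, fresh)
    should implement the myopic MaxWeight step for server i, with
    fresh=True at frame boundaries (t % T == 0), and return the number
    of type-m VMs served, by which the queues are decremented.
    """
    for m, w in enumerate(arrivals):
        if w > 0:
            jsq_route(Q, m, w)
    fresh = (t % T == 0)
    for i in range(len(Q)):
        served = schedule_server(i, fresh)     # list indexed by m
        for m, n in enumerate(served):
            Q[i][m] = max(0, Q[i][m] - n)      # defensive clamp at zero

if __name__ == "__main__":
    Q = [[4, 0], [1, 2]]
    print(jsq_route(Q, 0, 5), Q)  # type-0 work goes to server 1
```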
The following theorem characterizes the throughput performance of the algorithm.

Theorem 4: Any job load vector that satisfies $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$ for some $\epsilon > 0$ is supportable under JSQ routing and myopic MaxWeight allocation as described in Algorithm 1.

Proof: Let $Y_{im}(t)$ denote the state of the queue for type-$m$ jobs at server $i$, where $Y_{imj}(t)$ is the remaining size of the $j$th type-$m$ job at server $i$. First, it is easy to see that $Y(t) = \{Y_{im}(t)\}_{i,m}$ is a Markov chain under myopic MaxWeight scheduling. Further define $\mathcal{S} = \{y : \Pr(Y(t) = y \mid Y(0) = 0) > 0 \text{ for some } t\}$; then $Y(t)$ is an irreducible Markov chain on the state space $\mathcal{S}$, assuming $Y(0) = 0$. This claim holds because (i) any state in $\mathcal{S}$ is reachable from $0$, and (ii) since $\Pr(W_m(t) = 0) \ge \epsilon_W$ for all $m$ and $t$, the Markov chain can move from $Y(t)$ to $0$ in finite time with positive probability. Further, $Q_{im}(t) = \sum_j Y_{imj}(t)$, i.e., $Q_{im}(t)$ is a function of $Y_{im}(t)$.

We will first show that $\sum_m Q_{im}(t) N_m^{(i)}(t)$ cannot decrease by more than a bounded amount within a super time slot. For any $t$ such that $1 \le (t \bmod T) \le T - S_{\max}$, for each server $i$,

$\sum_m Q_{im}(t) N_m^{(i)}(t-1) = \sum_m Q_{im}(t) N_m^{(i)}(t^-) + \sum_m Q_{im}(t)\left(N_m^{(i)}(t-1) - N_m^{(i)}(t^-)\right)$
$\overset{(a)}{\le} \sum_m Q_{im}(t) N_m^{(i)}(t^-) + \sum_m Q_{im}(t)\tilde{N}_m^{(i)}(t)$
$= \left(\sum_m Q_{im}(t) N_m^{(i)}(t^-) + \sum_m Q_{im}(t)\tilde{N}_m^{(i)}(t)\right)\mathbb{I}_{\{Q_{im}(t) \ge S_{\max}N_{\max}\,\forall m\}} + \left(\sum_m Q_{im}(t) N_m^{(i)}(t^-) + \sum_m Q_{im}(t)\tilde{N}_m^{(i)}(t)\right)\mathbb{I}_{\{Q_{im}(t) < S_{\max}N_{\max} \text{ for some } m\}}$
$\overset{(b)}{\le} \sum_m Q_{im}(t) N_m^{(i)}(t) + M S_{\max} N_{\max}^2,$

where inequality (a) follows from the definition of $\tilde{N}^{(i)}(t)$ (the jobs that departed at the end of slot $t-1$ form a feasible addition to $N^{(i)}(t^-)$), and inequality (b) holds because when $Q_{im}(t) \ge S_{\max}N_{\max}$ there are enough type-$m$ jobs to be allocated to the servers, and when $1 \le (t \bmod T) \le T - S_{\max}$, all backlogged jobs are eligible to be served in terms of job sizes. Now, since $|Q_{im}(t) - Q_{im}(t-1)| = |W_{im}(t-1) - N_m^{(i)}(t-1)| \le W_{\max} + N_{\max}$, we have

$\sum_m Q_{im}(t-1) N_m^{(i)}(t-1) \le \beta + \sum_m Q_{im}(t) N_m^{(i)}(t), \quad (5)$

where $\beta = M N_{\max}(W_{\max} + N_{\max}) + M S_{\max} N_{\max}^2$.

Let $V(t) = \sum_{i,m} Q_{im}^2(t)$ be the Lyapunov function. Let $t = nT + \tau$ for $0 \le \tau < T$. Then

$E[V(nT+\tau+1) - V(nT+\tau) \mid Q(nT) = q]$
$= E\left[\sum_{i,m} \left(Q_{im}(t) + W_{im}(t) - N_m^{(i)}(t)\right)^2 - Q_{im}^2(t) \,\Big|\, Q(nT) = q\right] \quad (6)$
$= E\left[\sum_{i,m} 2 Q_{im}(t)\left(W_{im}(t) - N_m^{(i)}(t)\right) + \left(W_{im}(t) - N_m^{(i)}(t)\right)^2 \,\Big|\, Q(nT) = q\right] \quad (7)$
$\le K + 2 E\left[\sum_{i,m} Q_{im}(t) W_{im}(t) \,\Big|\, q\right] - 2 E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q\right] \quad (8)$
$= K + 2 \sum_m E[Q_{i_m^*(t)m}(t) W_m(t) \mid q] - 2 E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q\right] \quad (9)$
$= K + 2 \sum_m \lambda_m E[Q_{i_m^*(t)m}(t) \mid q] - 2 E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q\right] \quad (10)$
$\le K + 2 \sum_m \lambda_m W_{\max} \tau + 2 \sum_m \lambda_m E[Q_{\underline{i}_m m}(nT) \mid q] - 2 E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q\right] \quad (11)$
$= K + 2 \sum_m \lambda_m W_{\max} \tau + 2 \sum_m \lambda_m q_{\underline{i}_m m} - 2 E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q\right], \quad (12)$

where $K = ML(W_{\max}^2 + N_{\max}^2)$ and $\underline{i}_m = i_m^*(nT) = \arg\min_{i \in \{1,2,\dots,L\}} q_{im}$. Equation (9) follows from the definition of $W_{im}$ in the routing step (3). Equation (10) follows from the independence of the arrival process from the queue length process. Inequality (11) comes from the fact that $Q_{i_m^*(t)m}(t) \le Q_{\underline{i}_m m}(t) \le Q_{\underline{i}_m m}(nT) + W_{\max}\tau$.

Now, applying (5) repeatedly for $t \in [nT, (n+1)T - S_{\max}]$ and summing over $i$, we get

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + \sum_{i,m} Q_{im}(nT) N_m^{(i)}(nT). \quad (13)$

Since $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$, there exist $\{\lambda^{(i)}\}$ such that $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda^{(i)} \in \text{Conv}(\mathcal{N}_i)$ for all $i$ and $\lambda = \sum_i \lambda^{(i)}$. According to the scheduling algorithm (the configuration chosen at $t = nT$ maximizes $\sum_m Q_{im}(nT)N_m$ over $\text{Conv}(\mathcal{N}_i)$), for each $i$ we have

$(1+\epsilon)\frac{T}{T - S_{\max}} \sum_m Q_{im}(nT) \lambda_m^{(i)} \le \sum_m Q_{im}(nT) N_m^{(i)}(nT). \quad (14)$

Thus, we get

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + \sum_{i,m} Q_{im}(nT) N_m^{(i)}(nT) \quad (15)$
$\ge -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}} \sum_{i,m} Q_{im}(nT) \lambda_m^{(i)} \quad (16)$
$\ge -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}} \sum_m Q_{\underline{i}_m m}(nT) \sum_i \lambda_m^{(i)}$
$= -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}} \sum_m q_{\underline{i}_m m} \lambda_m. \quad (17)$

Substituting this in (12), we get, for $t \in [nT, (n+1)T - S_{\max}]$,

$E[V(nT+\tau+1) - V(nT+\tau) \mid Q(nT) = q] \le K + 2\sum_m \lambda_m W_{\max}\tau + 2L(t - nT)\beta + 2\sum_m \lambda_m q_{\underline{i}_m m} - 2(1+\epsilon)\frac{T}{T - S_{\max}} \sum_m q_{\underline{i}_m m}\lambda_m. \quad (18)$

Summing the drift for $\tau$ from $0$ to $T-1$, using (18) for $\tau \in [0, T - S_{\max}]$ and (12) for the remaining $\tau$, we get

$E[V((n+1)T) - V(nT) \mid Q(nT) = q] \le KT + 2\sum_m \lambda_m W_{\max} \sum_{\tau=0}^{T-1}\tau + 2L\beta \sum_{\tau=0}^{T - S_{\max}}\tau - 2\epsilon T \sum_m q_{\underline{i}_m m}\lambda_m.$

The theorem then follows from the Foster-Lyapunov theorem [16], [17], since the drift is negative once $\sum_m \lambda_m q_{\underline{i}_m m}$ is sufficiently large.

VI. SIMPLER LOAD BALANCING ALGORITHMS

Though the JSQ routing algorithm is throughput-optimal, the job scheduler needs the queue length information from all the servers. This poses a considerable communication overhead as the arrival rates of jobs and the number of servers increase. In this section, we present two alternatives which have much lower routing complexity.

A. Power-of-two-choices Routing and Myopic MaxWeight Scheduling

An alternative to JSQ routing is the power-of-two-choices algorithm [18], [19], [20], which is much simpler to implement. When a job arrives, two servers are sampled at random, and the job is routed to the server with the smaller queue for that job type. In our algorithm, in each time slot $t$, for each type of job $m$, two servers $i_m^1(t)$ and $i_m^2(t)$ are chosen uniformly at random. The job scheduler then routes all the type-$m$ job arrivals in this time slot to the one with the shorter queue length among these two, i.e., $i_m^*(t) = \arg\min_{i \in \{i_m^1(t),\, i_m^2(t)\}} Q_{im}(t)$, and so

$W_{im}(t) = \begin{cases} W_m(t) & \text{if } i = i_m^*(t) \\ 0 & \text{otherwise.} \end{cases}$

Otherwise, the algorithm is identical to the JSQ-Myopic MaxWeight algorithm considered earlier.
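Only the routing step changes relative to Algorithm 1. A minimal sketch (ours, not the paper's code) of the two-choice sampling:

```python
# Sketch (ours) of power-of-two-choices routing: sample two distinct
# servers uniformly at random and send this slot's type-m work to the
# one with the shorter type-m queue.
import random

def po2_route(Q, m, W_m):
    i, j = random.sample(range(len(Q)), 2)   # two distinct servers
    i_star = i if Q[i][m] <= Q[j][m] else j
    Q[i_star][m] += W_m
    return i_star
```

Only two queue lengths are inspected per job type per slot, so the routing overhead no longer grows with the number of servers $L$.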
In this section, we will provide a lower bound on the throughput of this power-of-two-choices algorithm in the non-preemptive case when all the servers have identical resource constraints.

Theorem 5: When all the servers are identical, any job load that satisfies $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$ for some $\epsilon > 0$ is supportable under the power-of-two-choices routing and myopic MaxWeight allocation algorithm.

Proof: Again, we use $V(t) = \sum_{i,m} Q_{im}^2(t)$ as the Lyapunov function. For fixed $m$, let $X_m(t)$ be the random variable which denotes the two servers that were chosen by the routing algorithm at time $t$ for jobs of type $m$. $X_m(t)$ is then uniformly distributed over all sets of two servers. Now, using the tower property of conditional expectation, we have

$E\left[\sum_i Q_{im}(t) W_{im}(t) \,\Big|\, Q(nT) = q\right]$
$= E_X\left[E\left[\sum_i Q_{im}(t) W_{im}(t) \,\Big|\, Q(nT) = q, X_m(t) = \{i, j\}\right]\right]$
$= E_X\left[E\left[Q_{im}(t) W_{im}(t) + Q_{jm}(t) W_{jm}(t) \mid Q(nT) = q, X_m(t) = \{i, j\}\right]\right]$
$= E_X\left[E\left[\min\left(Q_{im}(t), Q_{jm}(t)\right) W_m(t) \mid Q(nT) = q, X_m(t) = \{i, j\}\right]\right] \quad (19)$
$\le \lambda_m E_X\left[E\left[\frac{Q_{im}(t) + Q_{jm}(t)}{2} \,\Big|\, Q(nT) = q, X_m(t) = \{i, j\}\right]\right]$
$= \lambda_m E\left[\frac{2}{L(L-1)} \sum_{\{i,j\}} \frac{Q_{im}(t) + Q_{jm}(t)}{2} \,\Big|\, Q(nT) = q\right] \quad (20)$
$= \lambda_m \frac{1}{L} E\left[\sum_i Q_{im}(t) \,\Big|\, Q(nT) = q\right]. \quad (21)$

Equation (19) follows from the routing algorithm, and (20) follows from the fact that $X_m(t)$ is uniformly distributed (each server belongs to $L-1$ of the $L(L-1)/2$ pairs).

Since the scheduling algorithm is identical to that of Algorithm 1, (13) still holds for any $t$ such that $1 \le (t \bmod T) \le T - S_{\max}$. Thus, we have

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + \sum_{i,m} Q_{im}(nT) N_m^{(i)}(nT). \quad (22)$

We assume that all the servers are identical, so $C$ is obtained by summing $L$ copies of $\text{Conv}(\mathcal{N})$. Thus, since $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$, we have that $(1+\epsilon)\frac{T}{T - S_{\max}}\frac{\lambda}{L} \in \text{Conv}(\mathcal{N})$ for all $i$. According to the scheduling algorithm, for each $i$, we have

$(1+\epsilon)\frac{T}{T - S_{\max}} \sum_m Q_{im}(nT)\frac{\lambda_m}{L} \le \sum_m Q_{im}(nT) N_m^{(i)}(nT). \quad (23)$

Thus, we get

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + \sum_{i,m} Q_{im}(nT) N_m^{(i)}(nT) \quad (24)$
$\ge -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}}\frac{1}{L}\sum_{i,m} Q_{im}(nT)\lambda_m. \quad (25)$

Now, substituting (21) and (25) in (8) (which also holds for power-of-two-choices routing) and summing over $t \in [nT, (n+1)T - 1]$, we get

$E[V((n+1)T) - V(nT) \mid Q(nT) = q] \le KT + 2\sum_m \lambda_m W_{\max}\sum_{\tau=0}^{T-1}\tau + 2L\beta\sum_{\tau=0}^{T - S_{\max}}\tau - 2\epsilon T \sum_m \lambda_m \frac{\sum_i q_{im}}{L}.$

The proof is completed by applying the Foster-Lyapunov theorem [16], [17].

B. Pick-and-Compare Routing and Myopic MaxWeight Scheduling

One drawback of power-of-two-choices routing is that it is throughput-optimal only when all servers are identical. In the case of non-identical servers, one can use a pick-and-compare routing algorithm instead of power-of-two-choices. The algorithm is motivated by the pick-and-compare algorithm for wireless scheduling and switch scheduling [21]; it is as simple to implement as power-of-two-choices, and can be shown to be throughput-optimal even if the servers are not identical. We describe it next.

The scheduling algorithm is identical to the previous case. Pick-and-compare routing works as follows. In each time slot $t$, for each type of job $m$, a server $i_m^r(t)$ is chosen uniformly at random and compared with the server to which type-$m$ jobs were routed in the previous time slot. The server with the shorter queue length among the two is chosen, and all the type-$m$ job arrivals in this time slot are routed to that server. Let $i_m^*(t)$ be the server to which type-$m$ jobs are routed in time slot $t$. Then,

$i_m^*(t) = \arg\min_{i \in \{i_m^r(t),\, i_m^*(t-1)\}} Q_{im}(t)$, and so

$W_{im}(t) = \begin{cases} W_m(t) & \text{if } i = i_m^*(t) \\ 0 & \text{otherwise.} \end{cases}$
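A minimal sketch (ours, not the paper's code) of the routing rule follows; the only state it carries is one remembered server per job type.

```python
# Sketch (ours) of pick-and-compare routing: compare one uniformly
# sampled server with last slot's choice, and keep the shorter queue.
import random

def pick_and_compare_route(Q, m, W_m, last):
    """last[m] is the server chosen for type m in the previous slot."""
    r = random.randrange(len(Q))
    i_star = r if Q[r][m] <= Q[last[m]][m] else last[m]
    last[m] = i_star
    Q[i_star][m] += W_m
    return i_star
```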
Theorem 6: Any job load vector that satisfies $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$ for some $\epsilon > 0$ is supportable under the pick-and-compare routing and myopic MaxWeight allocation algorithm.

Proof: Consider the irreducible Markov chain $\left(Y(t), \{i_m^*(t)\}_m\right)$ and the Lyapunov function $V(t) = \sum_{i,m} Q_{im}^2(t)$. Then, as in (8), for $t \ge nT$ we have

$E[V(t+1) - V(t) \mid Q(nT) = q, i^*(nT)] \le K + 2E\left[\sum_{i,m} Q_{im}(t) W_{im}(t) \,\Big|\, q, i^*(nT)\right] - 2E\left[\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \,\Big|\, q, i^*(nT)\right]. \quad (26)$

Since $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda \in C$, there exist $\{\lambda^{(i)}\}$ such that $(1+\epsilon)\frac{T}{T - S_{\max}}\lambda^{(i)} \in \text{Conv}(\mathcal{N}_i)$ for all $i$ and $\lambda = \sum_i \lambda^{(i)}$. This $\{\lambda^{(i)}\}$ can be chosen so that there is a $\kappa > 0$ with $\lambda_m^{(i)} \ge \kappa\lambda_m$ for all $i$ and $m$. This is possible because if $\lambda_m > 0$ and $\lambda$ is not on the boundary of $C$, one can always find $\{\lambda^{(i)}\}$ with $\lambda_m^{(i)} > 0$.

Since the scheduling part of the algorithm is identical to Algorithm 1, (16) still holds for $t \in [nT, (n+1)T - S_{\max}]$. Thus, we have

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}}\sum_{i,m} Q_{im}(nT)\lambda_m^{(i)}. \quad (27)$

We also need a bound on the decrease of $\sum_{i,m} Q_{im}(t) N_m^{(i)}(t)$ over multiple super time slots. It is not difficult to show that the same bound holds for any $t \ge nT$ such that $1 \le (t \bmod T) \le T - S_{\max}$ (see [22] for details):

$\sum_{i,m} Q_{im}(t) N_m^{(i)}(t) \ge -L(t - nT)\beta + (1+\epsilon)\frac{T}{T - S_{\max}}\sum_{i,m} Q_{im}(nT)\lambda_m^{(i)}. \quad (28)$

Fix $m$ and let $\underline{i}_m = \arg\min_{i \in \{1,2,\dots,L\}} Q_{im}(nT)$. Note that $|Q_{im}(t) - Q_{im}(t-1)| = |W_{im}(t-1) - N_m^{(i)}(t-1)| \le W_{\max} + N_{\max}$. Therefore, once there is a $t_0 \ge nT$ such that $i_m^*(t_0)$ satisfies

$Q_{i_m^*(t_0)m}(t_0) \le Q_{\underline{i}_m m}(t_0), \quad (30)$

then, for all $t \ge t_0$, we have $Q_{i_m^*(t)m}(t) \le Q_{\underline{i}_m m}(nT) + (t - nT)(W_{\max} + N_{\max})$. The probability that (30) has not happened by time $t_0$ is at most $\left(1 - \frac{1}{L}\right)^{t_0 - nT}$. Choose $t_0$ so that this probability is less than $p = \epsilon/4\kappa$; then $1 + \kappa p = 1 + \epsilon/4$. Choose $k$ so that $kT > t_0 - nT$ and $\left((n+k)T - t_0 + \kappa(t_0 - nT)\right)/kT \le 1 + \epsilon/4$. Then (see [22] for details),

$E\left[\sum_{t=nT}^{(n+k)T-1} \sum_i Q_{im}(t) W_{im}(t) \,\Big|\, Q(nT) = q, i^*(nT)\right] \le K_1 + kT(1 + 3\epsilon/4)\, q_{\underline{i}_m m}\lambda_m, \quad (31)$

where $K_1 = \left(\sum_{\tau=0}^{kT-1}\tau\right)(W_{\max} + N_{\max})LW_{\max}$. Now, substituting (31) and (28) in (26) and summing over all $t \in [nT, (n+k)T - 1]$, we get

$E[V((n+k)T) - V(nT) \mid Q(nT) = q, i^*(nT)] \le K_2 - \frac{\epsilon kT}{2}\sum_m q_{\underline{i}_m m}\lambda_m,$

where $K_2 = kTK + 2MK_1 + 2L\beta k\sum_{\tau=0}^{T - S_{\max}}\tau$. The result follows from the Foster-Lyapunov theorem [16], [17].

VII. SIMULATIONS

In this section, we use simulations to compare the performance of the centralized myopic MaxWeight scheduling algorithm, and the joint routing and scheduling algorithm based on power-of-two-choices routing and MaxWeight scheduling. We consider a cloud computing cluster with 100 identical servers, where each server has the hardware configuration specified in Example 1. We assume jobs served in this cloud belong to one of the three types specified in Table I, so the VM configurations $(2, 0, 0)$, $(1, 0, 1)$, and $(0, 1, 1)$ are the three maximal VM configurations for each server. It is easy to verify that the load vector $\lambda = (1, 1/3, 2/3)$ is on the boundary of the capacity region of a server. To model the large variability in job sizes, we assume job sizes are distributed as follows: when a new job is generated, with probability 0.7 its size is an integer that is uniformly distributed in the interval [1, 50]; with probability 0.15 it is an integer that is uniformly distributed between 251 and 300; and with probability 0.15 it is uniformly distributed between 451 and 500. Therefore, the average job size is 130.5 and the maximum job size is 500. We further assume the number of type-$m$ jobs arriving at each time slot follows a binomial distribution with parameters $(\alpha\lambda_m/130.5, 100)$. We varied the parameter $\alpha$ from 0.5 to 1 in our simulations, which varied the traffic intensity of the cloud system from 0.5 to 1, where traffic intensity is the factor by which the load vector has to be divided so that it lies on the boundary of the capacity region. Each simulation was run for 500,000 time slots.
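A sketch (ours, not the paper's code) of this workload model follows. We read the binomial parameters so that the mean number of type-$m$ arrivals per slot is $\alpha\lambda_m/130.5$ (making the workload arrival rate $\alpha\lambda_m$); that reading is our assumption.

```python
# Sketch (ours) of the simulation's workload model: the three-part
# job-size mixture (mean 130.5) and binomial arrivals per type.
import random

def job_size():
    u = random.random()
    if u < 0.7:
        return random.randint(1, 50)      # short jobs
    elif u < 0.85:
        return random.randint(251, 300)   # medium jobs
    else:
        return random.randint(451, 500)   # long jobs

def arrivals(lam_m, alpha, n=100):
    """Type-m jobs this slot: Binomial(n, p) with mean alpha*lam_m/130.5."""
    p = alpha * lam_m / (130.5 * n)
    return sum(random.random() < p for _ in range(n))

# Mean-size check: 0.7*25.5 + 0.15*275.5 + 0.15*475.5 = 130.5
```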
First, we study the difference between power-of-two-choices routing and JSQ routing by comparing the mean delays of the two algorithms at various traffic intensities for different choices of frame sizes. Our simulation results indicate that the delay performance of the two algorithms is not very different. Due to page limitations, we only provide a representative sample of our simulations here, for the case where the frame size is 4000, in Figure 4. Next, we show the performance of our algorithms for various values of the frame size in Figure 5. Again, we have only shown a representative sample, for power-of-two-choices routing (with myopic MaxWeight scheduling).

[Figure 4: Comparison of the mean delays in the cloud computing cluster in the case with a common queue and in the case with power-of-two-choices routing, when the frame size is 4000.]
[Figure 5: Comparison of the power-of-two-choices routing algorithm for various frame lengths.]

From Theorems 3 and 5, we know that any load less than a fraction $(T - S_{\max})/T$ of the capacity region is supportable. The simulations indicate that the system is stable even for loads greater than this value. This is to be expected, since our proofs of Theorems 3 and 5 essentially ignore the jobs that are scheduled in the last $S_{\max}$ time slots of a frame. However, the fact that the stability region is larger for larger values of $T$ is confirmed by the simulations. It is even more interesting to observe the delay performance of our algorithms as $T$ increases. Figure 5 indicates that the delay performance does not degrade as $T$ increases, while the throughput increases with $T$. So the use of queue-length information seems to be the key ingredient of the algorithm, while the exact implementation of the MaxWeight algorithm seems to be secondary.

VIII. CONCLUSIONS

We considered a stochastic model for load balancing and scheduling in cloud computing clusters. A primary contribution is the development of frame-based non-preemptive VM configuration policies. These policies can be made nearly throughput-optimal by choosing sufficiently long frame durations, whereas the widely used best-fit policy was shown to be not throughput-optimal. Simulations indicate that long frame durations are not only good from a throughput perspective but also seem to provide good delay performance.
IX. ACKNOWLEDGEMENTS

Research supported in part by AFOSR MURI FA 9550-10-1-0573, ARO MURI W911NF-08-1-0233, and NSF Grants CNS-0964081 and CNS-0963807.

REFERENCES

[1] Amazon EC2, http://aws.amazon.com/ec2/.
[2] Google AppEngine, http://code.google.com/appengine/.
[3] Microsoft Azure, http://www.microsoft.com/windowsazure/.
[4] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree compared," in Grid Computing Environments Workshop (GCE '08), 2008, pp. 1-10.
[5] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., "Above the clouds: A Berkeley view of cloud computing," Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, 2009.
[6] D. A. Menasce and P. Ngo, "Understanding cloud computing: Experimentation and capacity planning," in Proc. 2009 Computer Measurement Group Conf., 2009.
[7] X. Meng, V. Pappas, and L. Zhang, "Improving the scalability of data center networks with traffic-aware virtual machine placement," in Proc. IEEE Infocom, 2010, pp. 1-9.
[8] Y. Yazir, C. Matthews, R. Farahbod, S. Neville, A. Guitouni, S. Ganti, and Y. Coady, "Dynamic resource allocation in computing clouds using distributed multiple criteria decision analysis," in 2010 IEEE 3rd International Conference on Cloud Computing, 2010, pp. 91-98.
[9] K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas, and A. Delis, "Flexible use of cloud resources through profit maximization and price discrimination," in Proc. IEEE 27th International Conference on Data Engineering (ICDE), 2011, pp. 75-86.
[10] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska, "Dynamic right-sizing for power-proportional data centers," in Proc. IEEE Infocom, 2011, pp. 1098-1106.
[11] M. Wang, X. Meng, and L. Zhang, "Consolidating virtual machines with dynamic bandwidth demand in data centers," in Proc. IEEE Infocom, 2011, pp. 71-75.
[12] U. Sharma, P. Shenoy, S. Sahu, and A. Shaikh, "Kingfisher: Cost-aware elasticity in the cloud," in Proc. IEEE Infocom, 2011, pp. 206-210.
[13] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks," IEEE Trans. Automat. Contr., vol. 37, no. 12, pp. 1936-1948, December 1992.
[14] B. Speitkamp and M. Bichler, "A mathematical programming approach for server consolidation problems in virtualized data centers," IEEE Transactions on Services Computing, pp. 266-278, 2010.
[15] A. Beloglazov and R. Buyya, "Energy efficient allocation of virtual machines in cloud data centers," in 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 2010, pp. 577-578.
[16] S. Asmussen, Applied Probability and Queues. New York: Springer-Verlag, 2003.
[17] S. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. Cambridge University Press, 2009.
[18] M. Mitzenmacher, "The power of two choices in randomized load balancing," Ph.D. dissertation, University of California at Berkeley, 1996.
[19] Y. T. He and D. G. Down, "Limited choice and locality considerations for load balancing," Performance Evaluation, vol. 65, no. 9, 2008.
[20] H. Chen and H. Q. Ye, "Asymptotic optimality of balanced routing," 2010, http://myweb.polyu.edu.hk/~lgtyehq/papers/ChenYe11OR.pdf.
[21] L. Tassiulas, "Linear complexity algorithms for maximum throughput in radio networks and input queued switches," in Proc. IEEE Infocom, 1998.
[22] S. T. Maguluri, R. Srikant, and L. Ying, "Stochastic models of load balancing and scheduling in cloud computing clusters," Technical Report, http://hdl.handle.net/2142/28577.