Dominant Resource Fairness in Cloud Computing Systems with Heterogeneous Servers




Wei Wang, Baochun Li, Ben Liang
Department of Electrical and Computer Engineering, University of Toronto
arXiv:1308.0083v1 [cs.DC] 1 Aug 2013

Abstract: We study the multi-resource allocation problem in cloud computing systems where the resource pool is constructed from a large number of heterogeneous servers, representing different points in the configuration space of resources such as processing, memory, and storage. We design a multi-resource allocation mechanism, called DRFH, that generalizes the notion of Dominant Resource Fairness (DRF) from a single server to multiple heterogeneous servers. DRFH provides a number of highly desirable properties. With DRFH, no user prefers the allocation of another user; no one can improve its allocation without decreasing that of the others; and, more importantly, no user has an incentive to lie about its resource demand. As a direct application, we design a simple heuristic that implements DRFH in real-world systems. Large-scale simulations driven by Google cluster traces show that DRFH significantly outperforms the traditional slot-based scheduler, leading to much higher resource utilization with substantially shorter job completion times.

I. INTRODUCTION

Resource allocation under the notion of fairness and efficiency is a fundamental problem in the design of cloud computing systems. Unlike traditional application-specific clusters and grids, a cloud computing system distinguishes itself with unprecedented server and workload heterogeneity. Modern datacenters are likely to be constructed from a variety of server classes, with different configurations in terms of processing capabilities, memory sizes, and storage spaces [1]. Asynchronous hardware upgrades, such as adding new servers and phasing out existing ones, further aggravate such diversity, leading to a wide range of server specifications in a cloud computing system [2]. Table I illustrates the heterogeneity of servers in one of Google's clusters [2], [3].

TABLE I
CONFIGURATIONS OF SERVERS IN ONE OF GOOGLE'S CLUSTERS [2], [3]. CPU AND MEMORY UNITS ARE NORMALIZED TO THE MAXIMUM SERVER (THE 1.00 / 1.00 CONFIGURATION BELOW).

Number of servers    CPUs    Memory
6732                 0.50    0.50
3863                 0.50    0.25
1001                 0.50    0.75
795                  1.00    1.00
126                  0.25    0.25
52                   0.50    0.12
5                    0.50    0.03
5                    0.50    0.97
3                    1.00    0.50
1                    0.50    0.06

In addition to server heterogeneity, cloud computing systems also exhibit much higher diversity in resource demand profiles. Depending on the underlying applications, the workload spanning multiple cloud users may require vastly different amounts of resources (e.g., CPU, memory, and storage). For example, numerical computing tasks are usually CPU intensive, while database operations typically require high-memory support. The heterogeneity of both servers and workload demands poses significant technical challenges to the resource allocation mechanism, giving rise to many delicate issues, notably fairness and efficiency, that must be carefully addressed.

Despite the unprecedented heterogeneity in cloud computing systems, state-of-the-art computing frameworks employ rather simple abstractions that fall short. For example, Hadoop [4] and Dryad [5], the two most widely deployed cloud computing frameworks, partition a server's resources into bundles, known as slots, that contain fixed amounts of different resources. The system then allocates resources to users at the granularity of these slots. Such a single-resource abstraction ignores the heterogeneity of both server specifications and demand profiles, inevitably leading to a fairly inefficient allocation [6].

Towards addressing the inefficiency of the current allocation system, many recent works focus on multi-resource allocation mechanisms. Notably, Ghodsi et al. [6] suggest a compelling alternative known as the Dominant Resource Fairness (DRF) allocation, in which each user's dominant share, the maximum ratio of any resource that the user has been allocated in a server, is equalized. The DRF allocation possesses a set of highly desirable fairness properties, and has quickly received significant attention in the literature [7], [8], [9], [10].

While DRF and its subsequent works address the demand heterogeneity of multiple resources, they all ignore the heterogeneity of servers, limiting the discussion to a hypothetical scenario where all resources are concentrated in one super computer^1. Such an all-in-one resource model drastically contrasts with the state-of-the-practice infrastructure of cloud computing systems. In fact, with heterogeneous servers, even the definition of dominant resource is unclear: depending on the underlying server configurations, a computing task may bottleneck on different resources in different servers. We shall note that naive extensions, such as applying the DRF allocation to each server separately, lead to a highly inefficient allocation (details in Sec. III-D).

This paper represents the first rigorous study to propose a solution with provable operational benefits that bridges the gap between existing multi-resource allocation models and the prevalent datacenter infrastructure. We propose DRFH, a generalization of the DRF mechanism in heterogeneous environments where resources are pooled by a large number of heterogeneous servers, representing different points in the configuration space of resources such as processing, memory, and storage.

^1 While [6] briefly touches on the case where resources are distributed to small servers (known as the discrete scenario), its coverage is rather informal.

DRFH generalizes the intuition of DRF by seeking an allocation that equalizes every user's global dominant share, which is the maximum ratio of any resource the user has been allocated in the entire cloud resource pool. We systematically analyze DRFH and show that it retains most of the desirable properties that the all-in-one DRF model provides [6]. Specifically, DRFH is Pareto optimal: no user is able to increase its allocation without decreasing other users' allocations. Meanwhile, DRFH is envy-free, in that no user prefers the allocation of another user. More importantly, DRFH is truthful, in that a user cannot schedule more computing tasks by claiming resources that are not needed, and hence has no incentive to misreport its actual resource demand. DRFH also satisfies a set of other important properties, namely single-server DRF, single-resource fairness, bottleneck fairness, and population monotonicity (details in Sec. III-C).

As a direct application, we design a heuristic scheduling algorithm that implements DRFH in real-world systems. We conduct large-scale simulations driven by Google cluster traces [3]. Our simulation results show that, compared to the traditional slot schedulers adopted in prevalent cloud computing frameworks, the DRFH algorithm suitably matches demand heterogeneity to server heterogeneity, significantly improving the system's resource utilization while substantially reducing job completion times.

II. RELATED WORK

Despite the extensive computing system literature on fair resource allocation, much of the existing work limits its discussion to the allocation of a single resource type, e.g., CPU time [11], [12] and link bandwidth [13], [14], [15], [16], [17]. Various fairness notions have also been proposed throughout the years, ranging from application-specific allocations [18], [19] to general fairness measures [13], [20], [21].

As for multi-resource allocation, state-of-the-art cloud computing systems employ naive single-resource abstractions. For example, the two fair-sharing schedulers currently supported in Hadoop [22], [23] partition a node into slots with fixed fractions of resources, and allocate resources jointly at the slot granularity. Quincy [24], a fair scheduler developed for Dryad [5], models the fair scheduling problem as a min-cost flow problem to schedule jobs into slots. The recent work [25] takes job placement constraints into consideration, yet it still uses a slot-based single-resource abstraction.

Ghodsi et al. [6] are the first in the literature to present a systematic investigation of the multi-resource allocation problem in cloud computing systems. They propose DRF to equalize the dominant shares of all users, and show that a number of desirable fairness properties are guaranteed by the resulting allocation. DRF has quickly attracted a substantial amount of attention and has been generalized in many dimensions. Notably, Joe-Wong et al. [7] generalize the DRF measure and incorporate it into a unifying framework that captures the trade-offs between allocation fairness and efficiency. Dolev et al. [8] suggest another notion of fairness for multi-resource allocation, known as Bottleneck-Based Fairness (BBF), under which two fairness properties that DRF possesses are also guaranteed. Gutman and Nisan [9] consider another setting of DRF with a more general domain of user utilities, and show its connection to the BBF mechanism. Parkes et al. [10], on the other hand, extend DRF in several ways, including the presence of zero demands for certain resources, weighted user endowments, and in particular the case of indivisible tasks. They also study the loss of social welfare under the DRF rules. More recently, the ongoing work of Kash et al. [26] extends the DRF model to a dynamic setting where users may join the system over time but never leave. Though motivated by the resource allocation problem in cloud computing systems, all the works above restrict their discussions to a hypothetical scenario where the resource pool contains only one big server, which is not the case in state-of-the-practice datacenter systems.

Other related works include fair-division problems in the economics literature, in particular egalitarian division under Leontief preferences [27] and the cake-cutting problem [28]. However, these works also assume the all-in-one resource model, and hence cannot be directly applied to cloud computing systems with heterogeneous servers.

III. SYSTEM MODEL AND ALLOCATION PROPERTIES

In this section, we model multi-resource allocation in a cloud computing system with heterogeneous servers. We formalize a number of desirable properties that are deemed the most important for allocation mechanisms in cloud computing environments.

A. Basic Setting

In a cloud computing system, the resource pool is composed of a cluster of heterogeneous servers S = {1, ..., k}, each contributing m hardware resources (e.g., CPU, memory, storage) denoted by R = {1, ..., m}. For each server l, let c_l = (c_l1, ..., c_lm)^T be its resource capacity vector, where each element c_lr denotes the total amount of resource r available in server l. Without loss of generality, for every resource r, we normalize the total capacity of all servers to 1, i.e.,

\sum_{l \in S} c_{lr} = 1, \quad r = 1, 2, \dots, m.

Let U = {1, ..., n} be the set of cloud users sharing the cloud system. For every user i, let D_i = (D_i1, ..., D_im)^T be its resource demand vector, where D_ir is the fraction (share) of resource r required by each task of user i over the entire system. For simplicity, we assume positive demands for all users, i.e., D_ir > 0 for all i in U and r in R. We say that resource r_i* is the global dominant resource of user i if

r_i^* \in \arg\max_{r \in R} D_{ir}.

In other words, r_i* is the most heavily demanded resource required by user i's tasks in the entire resource pool. For every user i and resource r, we define

d_{ir} = D_{ir} / D_{i r_i^*}

as the normalized demand, and denote by d_i = (d_i1, ..., d_im)^T the normalized demand vector of user i.
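To make the notation concrete, here is a minimal Python sketch (ours, not part of the paper; the demand numbers are hypothetical) that computes the global dominant resource r_i* and the normalized demand vector d_i from a demand vector D_i expressed as shares of the total pool.

```python
# Illustrative sketch (not from the paper): normalized demand d_i and
# global dominant resource r_i* for one user. Demands D_ir are fractions
# of the *total* pooled capacity of each resource.

def dominant_resource(D):
    """Index of the global dominant resource: argmax_r D_ir."""
    return max(range(len(D)), key=lambda r: D[r])

def normalized_demand(D):
    """d_ir = D_ir / D_{i r_i*}, so the dominant entry becomes 1."""
    r_star = dominant_resource(D)
    return [D_r / D[r_star] for D_r in D]

# Hypothetical user demanding 10% of total CPU, 2% of total memory,
# and 5% of total storage per task.
D_i = [0.10, 0.02, 0.05]
print(dominant_resource(D_i))   # 0 -> CPU is the global dominant resource
print(normalized_demand(D_i))   # [1.0, 0.2, 0.5]
```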

Fig. 1. An example of a system containing two heterogeneous servers shared by two users: server 1 has 2 CPUs and 12 GB memory, while server 2 has 12 CPUs and 2 GB memory. Each computing task of user 1 requires 0.2 CPU time and 1 GB memory, while each computing task of user 2 requires 1 CPU time and 0.2 GB memory.

As a concrete example, consider Fig. 1, where the system contains two heterogeneous servers. Server 1 is high-memory, with 2 CPUs and 12 GB memory, while server 2 is high-CPU, with 12 CPUs and 2 GB memory. Since the system contains 14 CPUs and 14 GB memory in total, the normalized capacity vectors of servers 1 and 2 are c_1 = (CPU share, memory share)^T = (1/7, 6/7)^T and c_2 = (6/7, 1/7)^T, respectively. Now suppose there are two users. User 1 has memory-intensive tasks, each requiring 0.2 CPU time and 1 GB memory, while user 2 has CPU-heavy tasks, each requiring 1 CPU time and 0.2 GB memory. The demand vector of user 1 is D_1 = (1/70, 1/14)^T and the normalized vector is d_1 = (1/5, 1)^T, where memory is the global dominant resource. Similarly, user 2 has D_2 = (1/14, 1/70)^T and d_2 = (1, 1/5)^T, and CPU is its global dominant resource.

For now, we assume users have an infinite number of tasks to be scheduled, and that all tasks are divisible [6], [8], [9], [10], [26]. We will discuss how these assumptions can be relaxed in Sec. V.

B. Resource Allocation

For every user i and server l, let A_il = (A_il1, ..., A_ilm)^T be the resource allocation vector, where A_ilr is the share of resource r allocated to user i in server l. Let A_i = (A_i1, ..., A_ik) be the allocation matrix of user i, and A = (A_1, ..., A_n) the overall allocation for all users. We say an allocation A is feasible if no server is required to use more than any of its total resources, i.e.,

\sum_{i \in U} A_{ilr} \le c_{lr}, \quad \forall l \in S, r \in R.

For every user i, given the allocation A_il in server l, the maximum number of tasks (possibly fractional) that it can schedule is

N_{il}(A_{il}) = \min_{r \in R} \{ A_{ilr} / D_{ir} \}.

The total number of tasks user i can schedule under allocation A_i is hence

N_i(A_i) = \sum_{l \in S} N_{il}(A_{il}).    (1)

Intuitively, a user prefers an allocation that allows it to schedule more tasks. A well-justified allocation should never give a user more resources than it can actually use in a server. Following the terminology used in the economics literature [27], we call such an allocation non-wasteful:

Definition 1: For user i and server l, an allocation A_il is non-wasteful if taking away any resources reduces the number of tasks scheduled, i.e., for all A'_il ≺ A_il,^2 we have

N_{il}(A'_{il}) < N_{il}(A_{il}).

User i's allocation A_i = (A_il) is non-wasteful if A_il is non-wasteful for every server l, and allocation A = (A_i) is non-wasteful if A_i is non-wasteful for every user i.

Note that one can always convert an allocation into a non-wasteful one by revoking resources that are allocated but never actually used, without changing the number of tasks scheduled for any user. Therefore, unless otherwise specified, we limit the discussion to non-wasteful allocations.

^2 For any two vectors x and y, we say x ≺ y if x ≤ y componentwise and x_j < y_j for some component j.
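The task count in (1) is a simple minimum-ratio computation per server, summed over the servers. The following sketch (our notation and numbers; allocations and demands are expressed as shares of the total pool, as in the model) illustrates it, including a case where part of an allocation is wasted.

```python
# Sketch (ours): tasks schedulable under an allocation, as in Eq. (1).
# A_i is a list of per-server allocation vectors A_il; D_i is user i's
# per-task demand vector. All quantities are shares of the total pool.

def tasks_in_server(A_il, D_i):
    """N_il(A_il) = min_r A_ilr / D_ir (possibly fractional)."""
    return min(a / d for a, d in zip(A_il, D_i))

def total_tasks(A_i, D_i):
    """N_i(A_i) = sum over servers of N_il(A_il)."""
    return sum(tasks_in_server(A_il, D_i) for A_il in A_i)

# Hypothetical user: each task needs 1% of total CPU and 2% of total memory.
D_i = [0.01, 0.02]
# Allocations on two servers, as (CPU share, memory share).
A_i = [[0.05, 0.10],   # proportional to D_i -> non-wasteful, 5 tasks
       [0.04, 0.02]]   # memory-bound -> 1 task; 0.03 of CPU share is wasted
print(total_tasks(A_i, D_i))   # 6.0
```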
C. Allocation Mechanism and Desirable Properties

A resource allocation mechanism takes user demands as input and outputs the allocation result. In general, an allocation mechanism should provide the following essential properties, which are widely recognized as the most important fairness and efficiency measures in both cloud computing systems [6], [7], [25] and the economics literature [27], [28].

Envy-freeness: An allocation mechanism is envy-free if no user prefers another user's allocation to its own, i.e., N_i(A_i) ≥ N_i(A_j) for any two users i, j ∈ U. This property essentially embodies the notion of fairness.

Pareto optimality: An allocation mechanism is Pareto optimal if it returns an allocation A such that for all feasible allocations A', if N_i(A'_i) > N_i(A_i) for some user i, then there exists a user j such that N_j(A'_j) < N_j(A_j). In other words, there is no other allocation where all users are at least as well off and at least one user is strictly better off. This property ensures allocation efficiency and is critical for high resource utilization.

Truthfulness: An allocation mechanism is truthful if no user can schedule more tasks by misreporting its resource demand (assuming a user's demand is its private information), irrespective of the other users' behaviour. Specifically, given the demands claimed by the other users, let A be the resulting allocation when user i truthfully reports its resource demand D_i, and let A' be the allocation returned when user i misreports with D'_i ≠ D_i. Then under a truthful mechanism we have N_i(A_i) ≥ N_i(A'_i). Truthfulness is of special importance for a cloud computing system, as it is common to observe in real-world systems that users try to lie about their resource demands to manipulate the schedulers for more allocation [6], [25].

In addition to these essential properties, we also consider four other important properties below:

Single-server DRF: If the system contains only one server, then the resulting allocation should reduce to the DRF allocation.

Fig. 2. DRF allocation for the example shown in Fig. 1, where user 1 is allocated 5 tasks in server 1 and 1 in server 2, while user 2 is allocated 1 task in server 1 and 5 in server 2.

Single-resource fairness: If there is a single resource in the system, then the resulting allocation should reduce to a max-min fair allocation.

Bottleneck fairness: If all users bottleneck on the same resource (i.e., they have the same global dominant resource), then the resulting allocation should reduce to a max-min fair allocation of that resource.

Population monotonicity: If a user leaves the system and relinquishes all its allocations, then the remaining users will not see any reduction in the number of tasks scheduled.

In addition to the aforementioned properties, sharing incentive is another important property that has been frequently mentioned in the literature [6], [7], [8], [10]. It ensures that every user's allocation is no worse than the one obtained by evenly dividing the entire resource pool. While this property is well defined for a single server, it is not for a system containing multiple heterogeneous servers, as there is an infinite number of ways to evenly divide the resource pool among users, and it is unclear which one should be chosen as a benchmark. We defer the discussion to Sec. IV-D, where we weigh two possible alternatives. For now, our objective is to design an allocation mechanism that guarantees all the properties defined above.

D. Naive DRF Extension and Its Inefficiency

It has been shown in [6], [10] that the DRF allocation satisfies all the desirable properties mentioned above when there is only one server in the system. The key intuition is to equalize the fraction of dominant resources allocated to each user in the server. When resources are distributed across many heterogeneous servers, a naive generalization is to apply the DRF allocation separately in each server. Since servers are heterogeneous, a user might have different dominant resources in different servers. For instance, in the example of Fig. 1, user 1's dominant resource in server 1 is CPU, while its dominant resource in server 2 is memory. Now apply DRF in server 1. Because CPU is also user 2's dominant resource there, the DRF allocation gives both users an equal share of the server's CPUs, each allocated 1 CPU. As a result, user 1 schedules 5 tasks on server 1, while user 2 schedules 1 task on the same server. Similarly, in server 2, memory is the dominant resource of both users and is evenly allocated, leading to 1 task scheduled for user 1 and 5 for user 2. The resulting allocations in the two servers are illustrated in Fig. 2, where both users schedule 6 tasks.

Unfortunately, this allocation violates Pareto optimality and is highly inefficient. If we instead allocate server 1 exclusively to user 1, and server 2 exclusively to user 2, then both users schedule 10 tasks, more than under the per-server DRF allocation. In fact, naively applying DRF per server may lead to an allocation with arbitrarily low resource utilization. The failure of the naive DRF extension in the heterogeneous environment necessitates an alternative allocation mechanism, which is the main theme of the next section.
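A small sketch (ours; it uses raw CPU/GB units rather than normalized shares) reproduces the arithmetic of this example: per-server DRF yields 6 tasks per user, whereas assigning each server exclusively yields 10.

```python
# Sketch (ours): the arithmetic behind the inefficiency example of Sec. III-D.
# Server capacities and per-task demands in raw units (CPUs, GB).
servers = {"server1": (2, 12), "server2": (12, 2)}
demand  = {"user1": (0.2, 1.0), "user2": (1.0, 0.2)}

def tasks(alloc, d):
    """Tasks schedulable from a (cpu, mem) allocation given per-task demand d."""
    return min(alloc[0] / d[0], alloc[1] / d[1])

# Per-server DRF: in each server both users share the locally dominant
# resource equally (1 CPU each in server 1, 1 GB each in server 2).
# The non-dominant resource is not binding in either server, so passing
# its full capacity below does not change the task counts.
per_server_drf = {
    "user1": tasks((1, 12), demand["user1"]) + tasks((12, 1), demand["user1"]),
    "user2": tasks((1, 12), demand["user2"]) + tasks((12, 1), demand["user2"]),
}
# Pooled alternative: server 1 exclusively to user 1, server 2 to user 2.
exclusive = {
    "user1": tasks(servers["server1"], demand["user1"]),
    "user2": tasks(servers["server2"], demand["user2"]),
}
print(per_server_drf)  # {'user1': 6.0, 'user2': 6.0}
print(exclusive)       # {'user1': 10.0, 'user2': 10.0}
```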
IV. DRFH ALLOCATION AND ITS PROPERTIES

In this section, we describe DRFH, a generalization of DRF in a heterogeneous cloud computing system where resources are distributed across a number of heterogeneous servers. We analyze DRFH and show that it provides all the desirable properties defined in Sec. III.

A. DRFH Allocation

Instead of allocating separately in each server, DRFH jointly considers resource allocation across all heterogeneous servers. The key intuition is to achieve the max-min fair allocation of the global dominant resources. Specifically, given allocation A_il, let G_il(A_il) be the fraction of the global dominant resource user i receives in server l, i.e.,

G_{il}(A_{il}) = N_{il}(A_{il}) \, D_{i r_i^*} = \min_{r \in R} \{ A_{ilr} / d_{ir} \}.    (2)

We call G_il(A_il) the global dominant share user i receives in server l under allocation A_il. Given the overall allocation A_i, the global dominant share user i receives is therefore

G_i(A_i) = \sum_{l \in S} G_{il}(A_{il}) = \sum_{l \in S} \min_{r \in R} \{ A_{ilr} / d_{ir} \}.    (3)

The DRFH allocation aims to maximize the minimum global dominant share among all users, subject to the per-server resource constraints, i.e.,

\max_{A} \min_{i \in U} G_i(A_i)
s.t. \sum_{i \in U} A_{ilr} \le c_{lr}, \quad \forall l \in S, r \in R.    (4)

Recall that, without loss of generality, we assume the allocation A is non-wasteful (see Sec. III-B). We have the following structural result.

Lemma 1: For user i and server l, an allocation A_il is non-wasteful if and only if there exists some g_il such that A_il = g_il d_i. In particular, g_il is the global dominant share user i receives in server l under allocation A_il, i.e., g_il = G_il(A_il).

Proof: (⇐) Suppose A_il = g_il d_i for some g_il; we show that A_il is non-wasteful. For every resource r ∈ R, we have

A_{ilr} / D_{ir} = g_{il} d_{ir} / D_{ir} = g_{il} / D_{i r_i^*}.

As a result,

N_{il}(A_{il}) = \min_{r \in R} \{ A_{ilr} / D_{ir} \} = g_{il} / D_{i r_i^*}.

Now for any A'_il ≺ A_il, suppose A'_ilr < A_ilr for some resource r. We have

N_{il}(A'_{il}) = \min_{r' \in R} \{ A'_{ilr'} / D_{ir'} \} \le A'_{ilr} / D_{ir} < A_{ilr} / D_{ir} = N_{il}(A_{il}).

Hence, by definition, allocation A_il is non-wasteful.

(⇒) Conversely, suppose A_il is non-wasteful; we show that A_il = g_il d_i for some g_il. Since A_il is non-wasteful, for any two resources r_1, r_2 ∈ R we must have A_ilr1 / D_ir1 = A_ilr2 / D_ir2. Otherwise, without loss of generality, suppose A_ilr1 / D_ir1 > A_ilr2 / D_ir2. There must exist some ε > 0 such that (A_ilr1 − ε) / D_ir1 > A_ilr2 / D_ir2. Now construct an allocation A'_il such that

A'_{ilr} = \begin{cases} A_{ilr_1} - \epsilon, & r = r_1; \\ A_{ilr}, & \text{otherwise}. \end{cases}

Clearly, A'_il ≺ A_il. However, it is easy to see that

N_{il}(A'_{il}) = \min_{r \in R} \{ A'_{ilr} / D_{ir} \} = \min_{r \ne r_1} \{ A'_{ilr} / D_{ir} \} = \min_{r \ne r_1} \{ A_{ilr} / D_{ir} \} = \min_{r \in R} \{ A_{ilr} / D_{ir} \} = N_{il}(A_{il}),

which contradicts the fact that A_il is non-wasteful. As a result, there exists some n_il such that, for every resource r ∈ R, we have

A_{ilr} = n_{il} D_{ir} = n_{il} D_{i r_i^*} d_{ir}.

Now letting g_il = n_il D_{i r_i^*}, we see that A_il = g_il d_i.

Intuitively, Lemma 1 indicates that under a non-wasteful allocation, resources are allocated in proportion to the user's demand. Lemma 1 immediately suggests the following relationship for every user i and its non-wasteful allocation A_i:

G_i(A_i) = \sum_{l \in S} G_{il}(A_{il}) = \sum_{l \in S} g_{il}.    (5)

Problem (4) can hence be equivalently written as

\max_{\{g_{il}\}} \min_{i \in U} \sum_{l \in S} g_{il}
s.t. \sum_{i \in U} g_{il} d_{ir} \le c_{lr}, \quad \forall l \in S, r \in R,    (6)

where the constraints are derived from Lemma 1. Now let g = \min_{i \in U} \sum_{l \in S} g_{il}. Via straightforward algebraic operations, we see that (6) is equivalent to the following problem:

\max_{\{g_{il}\}} g
s.t. \sum_{i \in U} g_{il} d_{ir} \le c_{lr}, \quad \forall l \in S, r \in R,
     \sum_{l \in S} g_{il} = g, \quad \forall i \in U.    (7)

Note that the second constraint ensures fairness with respect to the equalized global dominant share g. By solving (7), DRFH allocates to each user the maximum global dominant share g, under the constraints of both server capacity and fairness. By Lemma 1, the allocation received by each user i in server l is simply A_il = g_il d_i.

Fig. 3. An alternative allocation with higher system utilization for the example of Fig. 1. Servers 1 and 2 are exclusively assigned to users 1 and 2, respectively. Both users schedule 10 tasks.

For example, Fig. 3 illustrates the resulting DRFH allocation in the example of Fig. 1. By solving (7), DRFH allocates server 1 exclusively to user 1 and server 2 exclusively to user 2, allowing each user to schedule 10 tasks with the maximum global dominant share g = 5/7. We next analyze the properties of the DRFH allocation obtained by solving (7) in the following two subsections.
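Since (7) is a linear program in the variables {g_il} and g, it can be solved directly with an off-the-shelf LP solver. The sketch below (ours, using SciPy; the variable layout and names are our own) sets up (7) for the two-server example of Fig. 1 and recovers the equalized global dominant share g = 5/7.

```python
# Sketch (ours): solving the DRFH program (7) as a linear program with SciPy,
# for the two-server example of Fig. 1. Variables are the per-server global
# dominant shares g_il plus the common share g; we maximize g.
import numpy as np
from scipy.optimize import linprog

# Normalized server capacities (shares of total CPU / memory).
c = np.array([[1/7, 6/7],    # server 1: 2 CPUs, 12 GB out of 14 / 14
              [6/7, 1/7]])   # server 2: 12 CPUs, 2 GB
# Normalized demand vectors d_i (dominant entry equals 1).
d = np.array([[0.2, 1.0],    # user 1: memory is the global dominant resource
              [1.0, 0.2]])   # user 2: CPU is the global dominant resource
n, k, m = d.shape[0], c.shape[0], c.shape[1]    # users, servers, resources

# Variable layout: x = [g_11, ..., g_1k, ..., g_n1, ..., g_nk, g]
num_vars = n * k + 1
obj = np.zeros(num_vars); obj[-1] = -1.0        # maximize g

A_ub, b_ub = [], []
for l in range(k):
    for r in range(m):                          # capacity: sum_i g_il d_ir <= c_lr
        row = np.zeros(num_vars)
        for i in range(n):
            row[i * k + l] = d[i, r]
        A_ub.append(row); b_ub.append(c[l, r])

A_eq, b_eq = [], []
for i in range(n):                              # fairness: sum_l g_il = g
    row = np.zeros(num_vars)
    row[i * k:(i + 1) * k] = 1.0
    row[-1] = -1.0
    A_eq.append(row); b_eq.append(0.0)

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * num_vars)
print(res.x[-1])                      # ~0.714 = 5/7, the equalized share g
print(res.x[:-1].reshape(n, k))       # user 1 -> server 1, user 2 -> server 2
```

The heuristics of Sec. V do not solve this LP explicitly; they only approximate the allocation it characterizes.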
B. Analysis of Essential Properties

Our analysis of DRFH starts with the three essential resource allocation properties, namely envy-freeness, Pareto optimality, and truthfulness. We first show that under the DRFH allocation, no user prefers another user's allocation to its own.

Proposition 1 (Envy-freeness): The DRFH allocation obtained by solving (7) is envy-free.

Proof: Let {g_il} be the solution to problem (7). For every user i, its DRFH allocation in server l is A_il = g_il d_i. To show N_i(A_j) ≤ N_i(A_i) for any two users i and j, it is equivalent to prove G_i(A_j) ≤ G_i(A_i). We have

G_i(A_j) = \sum_{l} G_{il}(A_{jl}) = \sum_{l} \min_{r} \{ g_{jl} d_{jr} / d_{ir} \} \le \sum_{l} g_{jl} = G_i(A_i),

where the inequality holds because min_r {d_jr / d_ir} ≤ d_{j r_i*} / d_{i r_i*} = d_{j r_i*} ≤ 1, with r_i* being user i's global dominant resource, and the final equality holds because both Σ_l g_jl and Σ_l g_il equal the common global dominant share g under (7).

We next show that DRFH leads to an efficient allocation under which no user can improve its allocation without decreasing that of the others.

Proposition 2 (Pareto optimality): The DRFH allocation obtained by solving (7) is Pareto optimal.

Proof: Let {g_il}, and the corresponding g, be the solution to problem (7). For every user i, its DRFH allocation in server l is A_il = g_il d_i. Since (6) and (7) are equivalent, {g_il} also solves (6), with g being the maximum value of the objective of (6).

Assume, by way of contradiction, that allocation A is not Pareto optimal, i.e., there exists some allocation A' such that N_i(A'_i) ≥ N_i(A_i) for all users i, and for some user j we have the strict inequality N_j(A'_j) > N_j(A_j).

Equivalently, this implies that G_i(A'_i) ≥ G_i(A_i) for all users i, and G_j(A'_j) > G_j(A_j) for user j. Without loss of generality, let A' be non-wasteful. By Lemma 1, for every user i and server l there exists some g'_il such that A'_il = g'_il d_i. We show that, based on {g'_il}, one can construct some {ĝ_il} that is a feasible solution to (6) yet leads to a higher objective value than g, contradicting the fact that {g_il} optimally solves (6).

To see this, consider user j. We have

G_j(A_j) = \sum_l g_{jl} = g < G_j(A'_j) = \sum_l g'_{jl}.

For user j, there exist a server l' and some ε > 0 such that, after reducing g'_jl' to g'_jl' − ε, the resulting global dominant share remains at least g, i.e., Σ_l g'_jl − ε ≥ g. This leaves at least ε d_j idle resources in server l'. We construct {ĝ_il} by redistributing these idle resources to all users. Denote by {g̃_il} the dominant shares after reducing g'_jl' to g'_jl' − ε, i.e.,

\tilde{g}_{il} = \begin{cases} g'_{jl'} - \epsilon, & i = j, \ l = l'; \\ g'_{il}, & \text{otherwise}. \end{cases}

The corresponding non-wasteful allocation is Ã_il = g̃_il d_i for every user i and server l. Note that allocation Ã is weakly preferred to the original allocation A by all users, i.e., for every user i we have

G_i(\tilde{A}_i) = \sum_l \tilde{g}_{il} = \begin{cases} \sum_l g'_{jl} - \epsilon \ge g = G_j(A_j), & i = j; \\ \sum_l g'_{il} = G_i(A'_i) \ge G_i(A_i), & \text{otherwise}. \end{cases}

We now construct {ĝ_il} by redistributing the ε d_j idle resources in server l' to all users, each increasing its global dominant share g̃_il' by δ = min_r { ε d_jr / Σ_i d_ir }, i.e.,

\hat{g}_{il} = \begin{cases} \tilde{g}_{il} + \delta, & l = l'; \\ \tilde{g}_{il}, & \text{otherwise}. \end{cases}

It is easy to check that {ĝ_il} remains feasible. To see this, it suffices to check server l'. For each of its resources r, we have

\sum_i \hat{g}_{il'} d_{ir} = \sum_i (\tilde{g}_{il'} + \delta) d_{ir} = \sum_i g'_{il'} d_{ir} - \epsilon d_{jr} + \delta \sum_i d_{ir} \le c_{l'r} - (\epsilon d_{jr} - \delta \sum_i d_{ir}) \le c_{l'r},

where the first inequality holds because A' is a feasible allocation. On the other hand, for every user i ∈ U we have

\sum_l \hat{g}_{il} = \sum_l \tilde{g}_{il} + \delta = G_i(\tilde{A}_i) + \delta \ge G_i(A_i) + \delta > g.

This contradicts the premise that g is optimal for (6).

So far, our discussion has been based on the critical assumption that all users truthfully report their resource demands. In a real-world system, however, it is common to observe users attempting to manipulate the scheduler by misreporting their resource demands, so as to receive more allocation [6], [25]. More often than not, these strategic behaviours significantly hurt honest users and reduce the number of their tasks scheduled, inevitably leading to a fairly inefficient allocation outcome. Fortunately, we show in the following proposition that DRFH is immune to such strategic behaviour: reporting the true demand is always the best strategy for every user, irrespective of the behaviour of the others.

Proposition 3 (Truthfulness): The DRFH allocation obtained by solving (7) is truthful.

Proof: For any user i, fix the other users' claimed demands d_{-i} = (d_1, ..., d_{i-1}, d_{i+1}, ..., d_n), which may not be their true demands. Let A be the resulting allocation when user i truthfully reports its demand d_i; that is, A_il = g_il d_i and A_jl = g_jl d_j for every user j ≠ i and server l, where g_il and g_jl are the global dominant shares users i and j receive in server l under A_il and A_jl, respectively. Similarly, let A' be the resulting allocation when user i misreports its demand as d'_i. Let g and g' be the global dominant shares user i receives under A and A' (with respect to the claimed demands), respectively. We check the following two cases and show that G_i(A'_i) ≤ G_i(A_i), which is equivalent to N_i(A'_i) ≤ N_i(A_i).

Case 1: g' ≤ g. In this case, let ρ = min_r { d'_ir / d_ir }. Clearly,

\rho = \min_r \{ d'_{ir} / d_{ir} \} \le d'_{i r_i^*} / d_{i r_i^*} = d'_{i r_i^*} \le 1,

where r_i* is the dominant resource of user i. We then have

G_i(A'_i) = \sum_l G_{il}(A'_{il}) = \min_r \{ d'_{ir} / d_{ir} \} \sum_l g'_{il} = \rho g' \le g = G_i(A_i).

Case 2: g' > g.
For every user j ≠ i, when user i truthfully reports its demand, let G_j(A_j, d_j) be the global dominant share of user j with respect to its claimed demand d_j, i.e.,

G_j(A_j, d_j) = \sum_l \min_r \{ g_{jl} d_{jr} / d_{jr} \} = \sum_l g_{jl} = g.

Similarly, when user i misreports, let G_j(A'_j, d_j) be the global dominant share of user j with respect to its claimed demand d_j, i.e.,

G_j(A'_j, d_j) = \sum_l \min_r \{ g'_{jl} d_{jr} / d_{jr} \} = \sum_l g'_{jl} = g'.

As a result,

G_j(A'_j, d_j) > G_j(A_j, d_j), \quad \forall j \ne i.

We must then have

G_i(A'_i) < G_i(A_i).

Otherwise, allocation A' would be weakly preferred over A by all users and strictly preferred by every user j ≠ i with respect to the claimed demands (d_i, d_{-i}), contradicting the Pareto optimality of the DRFH allocation. (Recall that A is the DRFH allocation given the claimed demands (d_i, d_{-i}).)

C. Analysis of Important Properties

In addition to the three essential properties shown in the previous subsection, DRFH provides a number of other important properties. First, since DRFH generalizes DRF to heterogeneous environments, it naturally reduces to the DRF allocation when there is only one server in the system, in which case the global dominant resource defined in DRFH coincides with the dominant resource defined in DRF.

Proposition 4 (Single-server DRF): DRFH leads to the same allocation as DRF when all resources are concentrated in one server.

Next, by definition, both single-resource fairness and bottleneck fairness trivially hold for the DRFH allocation. We hence omit the proofs of the following two propositions.

Proposition 5 (Single-resource fairness): The DRFH allocation satisfies single-resource fairness.

Proposition 6 (Bottleneck fairness): The DRFH allocation satisfies bottleneck fairness.

Finally, when a user leaves the system and relinquishes all its allocations, the remaining users will not see any reduction in the number of tasks scheduled. Formally,

Proposition 7 (Population monotonicity): The DRFH allocation satisfies population monotonicity.

Proof: Let A be the resulting DRFH allocation; then for every user i and server l, A_il = g_il d_i and G_i(A_i) = g, where {g_il} and g solve (7). Suppose user j leaves the system, changing the resulting DRFH allocation to A'. By DRFH, for every user i ≠ j and server l, we have A'_il = g'_il d_i and G_i(A'_i) = g', where {g'_il}_{i≠j} and g' solve the following optimization problem:

\max_{\{g_{il}\}, i \ne j} g
s.t. \sum_{i \in U, i \ne j} g_{il} d_{ir} \le c_{lr}, \quad \forall l \in S, r \in R,
     \sum_{l \in S} g_{il} = g, \quad \forall i \ne j.    (8)

To show N_i(A'_i) ≥ N_i(A_i) for every user i ≠ j, it is equivalent to prove G_i(A'_i) ≥ G_i(A_i). It is easy to verify that g and {g_il}_{i≠j} satisfy all the constraints of (8) and are hence feasible for (8). As a result, g' ≥ g, which is exactly G_i(A'_i) ≥ G_i(A_i).

D. Discussion of Sharing Incentive

In addition to the aforementioned properties, sharing incentive is another important allocation property that has been frequently mentioned in the literature, e.g., [6], [7], [8], [10], [25]. It ensures that every user's allocation is at least as good as that obtained by evenly partitioning the entire resource pool. When the system contains only a single server, this property is well defined, as evenly dividing the server's resources leads to a unique allocation. However, for a system containing multiple heterogeneous servers, there is an infinite number of ways to evenly divide the resource pool, and it is unclear which one should be chosen as the benchmark for comparison. For example, in Fig. 1, two users share a system with 14 CPUs and 14 GB memory in total. The following two allocations both give each user 7 CPUs and 7 GB memory: (a) user 1 is allocated 1/2 of the resources of server 1 and 1/2 of the resources of server 2, while user 2 receives the rest; (b) user 1 is allocated (1.5 CPUs, 5.5 GB) in server 1 and (5.5 CPUs, 1.5 GB) in server 2, while user 2 receives the rest.

One might think that allocation (a) is a more reasonable benchmark, as it allows all n users to have an equal share of every server, each receiving 1/n of the server's resources. However, this benchmark has little practical meaning: with a large n, each user will receive only a small fraction of the resources on each server, which likely cannot be utilized by any computing task. In other words, having a small slice of resources in each server is essentially meaningless. We therefore consider another benchmark that is more practical. Since cloud systems are constructed by pooling hundreds of thousands of servers [1], [2], the number of users is typically far smaller than the number of servers [6], [25], i.e., k >> n. An equal division would allocate to each user k/n servers drawn from the same distribution as the system's server configurations. For each user, the allocated k/n servers are then treated as a dedicated cloud that is exclusive to that user. The number of tasks scheduled on this dedicated cloud is then used as a benchmark and compared to the number of tasks scheduled in the original cloud computing system shared with all the other users. We will evaluate such a sharing incentive property via trace-driven simulations in Sec. VI.
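One possible way to realize this benchmark in a simulator is sketched below (ours; the sampling-with-replacement choice, the configuration list, and the demand numbers are assumptions, not taken from the paper): each user receives k/n servers sampled from the empirical configuration distribution, and the tasks it could schedule on that dedicated cloud are counted.

```python
# Sketch (ours): constructing the "dedicated cloud" benchmark of Sec. IV-D.
# Each user is assigned k/n servers sampled from the empirical distribution
# of server configurations, and we count the tasks it could schedule there.
import random

def dedicated_cloud(server_configs, k, n, seed=0):
    """Sample k/n servers (with replacement) from the configuration list."""
    rng = random.Random(seed)
    return [rng.choice(server_configs) for _ in range(k // n)]

def tasks_on_cloud(servers, demand):
    """Tasks schedulable with exclusive use of the sampled servers."""
    return sum(min(cap / req for cap, req in zip(server, demand))
               for server in servers)

# Hypothetical configuration pool (CPU units, memory units) and task demand.
configs = [(0.50, 0.50), (0.50, 0.25), (1.00, 1.00), (0.25, 0.25)] * 25
demand = (0.02, 0.01)                      # per-task CPU / memory units
dc = dedicated_cloud(configs, k=100, n=4)  # 25 servers for this user
print(tasks_on_cloud(dc, demand))          # benchmark task count for the user
```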
V. PRACTICAL CONSIDERATIONS

So far, our discussion has been based on several assumptions that may not hold in a real-world system. In this section, we relax these assumptions and discuss how DRFH can be implemented in practice.

A. Weighted Users with a Finite Number of Tasks

In the previous sections, users are assumed to have equal weights and infinite computing demands. Both assumptions can be easily removed with minor modifications of DRFH. When users are assigned uneven weights, let w_i be the weight associated with user i. DRFH seeks an allocation that achieves weighted max-min fairness across users. Specifically, we maximize the minimum weight-normalized global dominant share of all users under the same resource constraints as in (4), i.e.,

\max_{A} \min_{i \in U} G_i(A_i) / w_i
s.t. \sum_{i \in U} A_{ilr} \le c_{lr}, \quad \forall l \in S, r \in R.

When users have a finite number of tasks, the DRFH allocation is computed iteratively. In each round, DRFH increases the global dominant share allocated to all active users, until one of them has all its tasks scheduled, after which that user becomes inactive and is no longer considered in the following allocation rounds. DRFH then starts a new iteration and repeats the allocation process above, until no user is active or no more resources can be allocated. Our analysis presented in Sec. IV also extends to weighted users with a finite number of tasks.

B. Scheduling Tasks as Entities

Until now, we have assumed that all tasks are divisible. In a real-world system, however, fractional tasks may not be accepted. To schedule tasks as entities, one can apply progressive filling as a simple implementation of DRFH: whenever there is a scheduling opportunity, the scheduler accommodates the user with the lowest global dominant share, and picks the first server that fits the user's task.

While this First-Fit algorithm offers a fairly good approximation to DRFH, we propose another simple heuristic that can lead to a better allocation with higher resource utilization. Similar to First-Fit, the heuristic also chooses the user with the lowest global dominant share to serve. However, instead of simply picking the first server that fits, the heuristic chooses the server that most suitably matches user i's tasks, and is hence referred to as Best-Fit DRFH. Specifically, for user i with resource demand vector D_i = (D_i1, ..., D_im)^T and a server l with available resource vector c̄_l = (c̄_l1, ..., c̄_lm)^T, where c̄_lr is the share of resource r remaining available in server l, we define the following heuristic function to measure the task's fitness for the server:

H(i, l) = \| D_i / \|D_i\|_1 - \bar{c}_l / \|\bar{c}_l\|_1 \|_1,    (9)

where ||.||_1 is the L1-norm. Intuitively, the smaller H(i, l) is, the more similar the resource demand vector D_i is to the server's available resource vector c̄_l, and the better fit user i's task is for server l. For example, a CPU-heavy task is more suitable to run in a server with more available CPU resources. Best-Fit DRFH schedules user i's tasks to the server l with the least H(i, l). We evaluate both First-Fit DRFH and Best-Fit DRFH via trace-driven simulations in the next section.
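One way to implement a Best-Fit DRFH scheduling step, under our own naming and with hypothetical numbers, is sketched below: after the user with the lowest global dominant share has been selected, the scheduler evaluates H(i, l) from (9) over the servers that can still fit one of the user's tasks and picks the minimizer.

```python
# Sketch (ours): one scheduling step of Best-Fit DRFH, following Eq. (9).
def l1_normalize(v):
    s = sum(v)
    return [x / s for x in v]

def fitness(demand, available):
    """H(i, l): L1 distance between the normalized demand vector and the
    normalized available-resource vector of server l."""
    return sum(abs(a - b) for a, b in
               zip(l1_normalize(demand), l1_normalize(available)))

def best_fit_server(demand, servers_available):
    """Among servers that can fit one task, pick the one minimizing H(i, l)."""
    feasible = [l for l, avail in enumerate(servers_available)
                if all(a >= d for a, d in zip(avail, demand))]
    if not feasible:
        return None
    return min(feasible, key=lambda l: fitness(demand, servers_available[l]))

# Hypothetical state: available (CPU, memory) shares on three servers.
available = [(0.40, 0.05), (0.10, 0.30), (0.20, 0.20)]
cpu_heavy_task = (0.05, 0.01)
print(best_fit_server(cpu_heavy_task, available))   # 0: the CPU-rich server
```

A First-Fit variant is obtained by simply returning the first feasible server instead of the H-minimizing one.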
VI. SIMULATION RESULTS

In this section, we evaluate the performance of DRFH via extensive simulations driven by Google cluster-usage traces [3]. The traces contain resource demand/usage information for over 900 users (i.e., Google services and engineers) on a cluster of 12K servers. The server configurations are summarized in Table I, where the CPUs and memory of each server are normalized so that the maximum server is 1. Each user submits computing jobs, divided into a number of tasks, each requiring a set of resources (i.e., CPU and memory). From the traces, we extract the computing demand information, namely the required amount of resources and the task running time, and use it as the demand input of the allocation algorithms under evaluation.

Dynamic allocation: Our first evaluation focuses on the allocation fairness of the proposed Best-Fit DRFH when users dynamically join and depart the system. We simulate 3 users submitting tasks with different resource requirements to a small cluster of 100 servers. The server configurations are randomly drawn from the distribution of Google cluster servers in Table I, leading to a resource pool containing 52.75 CPU units and 51.32 memory units in total. User 1 joins the system at the beginning, requiring 0.2 CPU and 0.3 memory for each of its tasks. As shown in Fig. 4, since only user 1 is active at the beginning, it is allocated a 40% CPU share and a 62% memory share. This allocation continues until 200 s, at which time user 2 joins and submits CPU-heavy tasks, each requiring 0.5 CPU and 0.1 memory. Both users now compete for computing resources, leading to a DRFH allocation in which both users receive a 44% global dominant share. At 500 s, user 3 starts to submit memory-intensive tasks, each requiring 0.1 CPU and 0.3 memory. The algorithm now allocates the same global dominant share of 26% to all three users, until user 1 finishes its tasks and departs at around 1080 s. After that, only users 2 and 3 share the system, each receiving the same share of their global dominant resources. A similar process repeats until all users finish their tasks. Throughout the simulation, we see that the Best-Fit DRFH algorithm precisely achieves the DRFH allocation at all times.

Fig. 4. CPU, memory, and global dominant shares of the three users on a 100-server system with 52.75 CPU units and 51.32 memory units in total.

Resource utilization: We next evaluate the resource utilization of the proposed Best-Fit DRFH algorithm. We take the 24-hour computing demand data from the Google traces and simulate it on a smaller cloud computing system of 2,000 servers, so that fairness becomes relevant. The server configurations are randomly drawn from the distribution of Google cluster servers in Table I. We compare Best-Fit DRFH with two other benchmarks: the traditional Slots scheduler, which schedules tasks onto slots of servers (e.g., the Hadoop Fair Scheduler [23]), and First-Fit DRFH, which chooses the first server that fits the task. For the former, we try different slot sizes and choose the one with the highest CPU and memory utilization. Table II summarizes our observations, where dividing the maximum server (1 CPU unit and 1 memory unit in Table I) into 14 slots leads to the highest overall utilization.

TABLE II
RESOURCE UTILIZATION OF THE SLOTS SCHEDULER WITH DIFFERENT SLOT SIZES.

Number of slots           CPU utilization    Memory utilization
10 per maximum server     35.1%              23.4%
12 per maximum server     42.2%              27.4%
14 per maximum server     43.9%              28.0%
16 per maximum server     45.4%              24.2%
20 per maximum server     40.6%              20.0%

Fig. 5 depicts the time series of CPU and memory utilization of the three algorithms. We see that the two DRFH implementations significantly outperform the traditional Slots scheduler, with much higher resource utilization, mainly because the latter ignores the heterogeneity of both servers and workload.
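The utilization gap is easy to see with a back-of-the-envelope sketch (ours; the slot count and the task demand are hypothetical): when a server is carved into fixed slots and each task occupies a whole slot, whichever resource the task under-uses sits idle.

```python
# Sketch (ours): why a fixed slot abstraction strands resources.
# A server is cut into `slots` equal bundles; each task occupies one slot.
def slot_utilization(server, slots, task_demand):
    slot = tuple(cap / slots for cap in server)
    # A slot can host the task only if it covers the demand in every resource.
    if any(d > s for d, s in zip(task_demand, slot)):
        return (0.0, 0.0)
    used = tuple(d * slots for d in task_demand)        # one task per slot
    return tuple(u / cap for u, cap in zip(used, server))

server = (1.0, 1.0)                  # normalized maximum server (Table I)
cpu_heavy = (0.06, 0.01)             # hypothetical per-task demand
print(slot_utilization(server, 14, cpu_heavy))   # (0.84, 0.14): memory idles
```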

Fig. 5. Time series of CPU and memory utilization for Best-Fit DRFH, First-Fit DRFH, and the Slots scheduler.

Fig. 6. DRFH improvements in job completion times over the Slots scheduler: (a) CDF of job completion times; (b) mean completion time reduction, broken down by job size (number of tasks).

This observation is consistent with findings in the homogeneous environment where all servers have the same hardware configuration [6]. As for the two DRFH implementations, we see that Best-Fit DRFH leads to uniformly higher resource utilization than the First-Fit alternative at all times.

The high resource utilization of Best-Fit DRFH naturally translates into the shorter job completion times shown in Fig. 6a, where the CDFs of job completion times for both the Best-Fit DRFH and Slots schedulers are depicted. Fig. 6b offers a more detailed breakdown, where jobs are classified into 5 categories based on the number of their computing tasks, and for each category the mean completion time reduction is computed. While DRFH shows no improvement over the Slots scheduler for small jobs, a significant completion time reduction is observed for jobs containing more tasks. Generally, the larger the job, the more improvement one may expect. Similar observations have been made in homogeneous environments [6].

Fig. 6 does not account for partially completed jobs and focuses only on those having all tasks finished under both Best-Fit and Slots. As a complementary study, Fig. 7 shows the task completion ratio, i.e., the number of tasks completed over the number of tasks submitted, for every user under the Best-Fit DRFH and Slots schedulers, respectively. The radius of each circle is scaled logarithmically to the number of tasks the user submitted. We see that Best-Fit DRFH leads to a higher task completion ratio for almost all users. Around 20% of users have all their tasks completed under Best-Fit DRFH but not under Slots.

Fig. 7. Task completion ratio of users using the Best-Fit DRFH and Slots schedulers, respectively. Each bubble's size is logarithmic in the number of tasks the user submitted.

Fig. 8. Task completion ratio of users running on dedicated clouds (DCs) and in the shared cloud (SC). Each circle's radius is logarithmic in the number of tasks submitted.

Sharing incentive: Our final evaluation concerns the sharing incentive property of DRFH. As mentioned in Sec. IV-D, for each user we run its computing tasks on a dedicated cloud (DC) that is a proportional subset of the original shared cloud (SC). We then compare the task completion ratio in the DC with that obtained in the SC. Fig. 8 illustrates the results. While DRFH does not guarantee 100% sharing incentive for all users, it benefits most of them by pooling their DCs together. In particular, only 2% of users see fewer tasks finished in the shared environment. Even for these users, the task completion ratio decreases only slightly, as can be seen from Fig. 8.

VII. CONCLUDING REMARKS

In this paper, we study the multi-resource allocation problem in a heterogeneous cloud computing system where the resource pool is composed of a large number of servers with different configurations in terms of resources such as processing, memory, and storage.
The proposed multi-resource allocation mechanism, known as DRFH, equalizes the global dominant share allocated to each user, and hence generalizes the DRF allocation from a single server to multiple heterogeneous servers. We analyze DRFH and show that it retains almost all of the desirable properties that DRF provides in the single-server scenario. Notably, DRFH is envy-free, Pareto optimal, and truthful. We design a Best-Fit heuristic that implements DRFH in a real-world system. Our large-scale simulations driven by Google cluster traces show that, compared to traditional single-resource abstractions such as a slot scheduler, DRFH achieves significant improvements in resource utilization, leading to much shorter job completion times.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, no. 4, pp. 50-58, 2010.
[2] C. Reiss, A. Tumanov, G. Ganger, R. Katz, and M. Kozuch, "Heterogeneity and dynamicity of clouds at scale: Google trace analysis," in Proc. ACM SoCC, 2012.
[3] C. Reiss, J. Wilkes, and J. L. Hellerstein, "Google cluster-usage traces," http://code.google.com/p/googleclusterdata/.
[4] Apache Hadoop, http://hadoop.apache.org.
[5] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks," in Proc. EuroSys, 2007.
[6] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, "Dominant resource fairness: Fair allocation of multiple resource types," in Proc. USENIX NSDI, 2011.

[7] C. Joe-Wong, S. Sen, T. Lan, and M. Chiang, "Multi-resource allocation: Fairness-efficiency tradeoffs in a unifying framework," in Proc. IEEE INFOCOM, 2012.
[8] D. Dolev, D. Feitelson, J. Halpern, R. Kupferman, and N. Linial, "No justified complaints: On fair sharing of multiple resources," in Proc. ACM ITCS, 2012.
[9] A. Gutman and N. Nisan, "Fair allocation without trade," in Proc. AAMAS, 2012.
[10] D. Parkes, A. Procaccia, and N. Shah, "Beyond dominant resource fairness: Extensions, limitations, and indivisibilities," in Proc. ACM EC, 2012.
[11] S. Baruah, J. Gehrke, and C. Plaxton, "Fast scheduling of periodic tasks on multiple resources," in Proc. IEEE IPPS, 1995.
[12] S. Baruah, N. Cohen, C. Plaxton, and D. Varvel, "Proportionate progress: A notion of fairness in resource allocation," Algorithmica, vol. 15, no. 6, pp. 600-625, 1996.
[13] F. Kelly, A. Maulloo, and D. Tan, "Rate control for communication networks: Shadow prices, proportional fairness and stability," J. Oper. Res. Soc., vol. 49, no. 3, pp. 237-252, 1998.
[14] J. Mo and J. Walrand, "Fair end-to-end window-based congestion control," IEEE/ACM Trans. Networking, vol. 8, no. 5, pp. 556-567, 2000.
[15] J. Kleinberg, Y. Rabani, and É. Tardos, "Fairness in routing and load balancing," in Proc. IEEE FOCS, 1999.
[16] J. Blanquer and B. Özden, "Fair queuing for aggregated multiple links," in Proc. ACM SIGCOMM, 2001.
[17] Y. Liu and E. Knightly, "Opportunistic fair scheduling over multiple wireless channels," in Proc. IEEE INFOCOM, 2003.
[18] C. Koksal, H. Kassab, and H. Balakrishnan, "An analysis of short-term fairness in wireless media access protocols," in Proc. ACM SIGMETRICS (poster session), 2000.
[19] M. Bredel and M. Fidler, "Understanding fairness and its impact on quality of service in IEEE 802.11," in Proc. IEEE INFOCOM, 2009.
[20] R. Jain, D. Chiu, and W. Hawe, "A quantitative measure of fairness and discrimination for resource allocation in shared computer system," Eastern Research Laboratory, Digital Equipment Corporation, 1984.
[21] T. Lan, D. Kao, M. Chiang, and A. Sabharwal, "An axiomatic theory of fairness in network resource allocation," in Proc. IEEE INFOCOM, 2010.
[22] Hadoop Capacity Scheduler, http://hadoop.apache.org/docs/r0.20.2/capacity_scheduler.html.
[23] Hadoop Fair Scheduler, http://hadoop.apache.org/docs/r0.20.2/fair_scheduler.html.
[24] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg, "Quincy: Fair scheduling for distributed computing clusters," in Proc. ACM SOSP, 2009.
[25] A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica, "Choosy: Max-min fair sharing for datacenter jobs with constraints," in Proc. ACM EuroSys, 2013.
[26] I. Kash, A. Procaccia, and N. Shah, "No agent left behind: Dynamic fair division of multiple resources," 2012.
[27] J. Li and J. Xue, "Egalitarian division under Leontief preferences," 2011, manuscript.
[28] A. D. Procaccia, "Cake cutting: Not just child's play," Commun. ACM, 2013.