On the Interaction between Load Balancing and Speed Scaling

Lijun Chen, Na Li and Steven H. Low
Engineering & Applied Science Division, California Institute of Technology, USA

Abstract

Speed scaling has been widely adopted in computer and communication systems, in particular to reduce energy consumption. An important question is how speed scaling interacts with other resource allocation mechanisms such as scheduling and routing. In this paper, we study the interaction of speed scaling with load balancing. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. In particular, we characterize the load-balancing-speed-scaling equilibrium with respect to the optimal load balancing schemes in processor sharing systems. Our results show that the degree of inefficiency at the equilibrium is mostly bounded by the heterogeneity of the system, but is independent of the number of servers. These results provide insights into the interaction of load balancing with speed scaling and can guide new designs.

Index Terms: Load balancing, Speed scaling, Energy efficiency, Efficiency loss, Data centers.

I. INTRODUCTION

The energy consumption rate of computer and communication systems has been increasing exponentially. Computer and communication systems must make a fundamental tradeoff between performance and energy usage; see, e.g., [1], [2]. The addition of energy to standard performance metrics such as delay, throughput and loss fundamentally changes the problem space of some resource allocation designs. Not only are new mechanisms needed to optimize energy usage, but existing algorithms and protocols must also be re-examined, as a formerly optimal algorithm may now perform poorly with respect to a new energy-aware metric.
Energy management decisions must be decomposed and coordinated spatially as well as temporally, and yet global optimality must be achieved through local algorithms that are implementable in a distributed manner.

In this paper we study load balancing and its interaction with speed scaling. Energy-aware speed scaling, which adapts the speed of the system so as to balance energy and performance metrics, is a widely adopted power management technique; see, e.g., [3], [4], [5], [6], [7], [8], [9], [10]. Previous works on speed scaling usually focus on a single server and study its interaction with scheduling; see, e.g., [], [8], [9], [10]. Here we consider a network setting and study the interaction of speed scaling with load balancing, to provide insights into such issues as:

i) How does the system perform under speed scaling, in terms of traditional performance metrics as well as energy-aware metrics?
ii) How should energy-aware optimal load balancing be designed, and can we decouple the design of load balancing from that of speed scaling?
iii) How does the sophistication of speed scaling impact the design and performance of load balancing?

We focus on gated-static speed scaling in processor sharing systems, and our results provide useful insights into the first two questions. Specifically, we characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing design problems, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima. In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium, in terms of delay as well as the energy-aware metric. We show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but is independent of the number of servers in the system.
Our results suggest that, since in many applications a low-order polynomial provides a good approximation to the power function, we can decouple the design of load balancing from speed scaling without incurring much inefficiency in delay. In terms of the power-aware performance metric, our results suggest that, as long as the heterogeneity in the system is small, we can decouple the design of load balancing from speed scaling without incurring much efficiency loss; but when the heterogeneity in the system is large, we have to do energy-aware load balancing if energy consumption is a main concern.

The paper is organized as follows. The next section briefly discusses some related work. Section III describes the system model. Section IV gives a brief characterization of the load-balancing-speed-scaling equilibrium, and introduces two optimal load balancing design problems. Section V studies in detail the load-balancing-speed-scaling interaction in processor sharing systems with gated-static speed scaling. Section VI provides numerical examples to complement the theoretical analysis, and Section VII concludes with some discussion on further research.

II. RELATED WORK

Power management techniques have been increasingly adopted in designs from the single-device level, such as chips, to the network level, such as data centers. It has spurred a new branch

of research in its own right. In particular, starting with Yao et al. [2], there is extensive research on the analytical study of speed scaling; see, e.g., [3], [4], [5], [6], [7], [8], [], [9], [8], [20], [2], [9], [10]. Bansal et al. [8] show that a speed scaling policy $(\mathrm{SRPT}, P^{-1}(n+1))$ is 3-competitive for regular power functions in the worst-case analysis. This result has been tightened and extended to PS scheduling as well as to stochastic analysis by Andrew et al. [10]. In particular, Andrew et al. [10] provide a comprehensive study of speed scaling and its interaction with scheduling, and show a fundamental tradeoff between optimality, fairness and robustness in speed scaling designs.

Related work also includes [22], [23], which show that the degree of inefficiency in delay for load balancing in processor sharing systems with fixed server speeds scales with the number of servers in the system. This result has been extended to the processor sharing system with multi-class load [24], and to other scheduling policies such as SRPT [25]. In contrast to these results, we show that the degree of inefficiency in delay for load balancing in processor sharing systems with speed scaling is bounded by the heterogeneity of the system, but is independent of the number of servers.

III. SYSTEM MODEL

Consider a system with a set $N$ of servers and a Poisson arrival process of rate $\lambda > 0$; see Figure 1. We assume that the job size is i.i.d. and, without loss of generality, has a mean of 1. Associated with each server $i \in N$ is a service rate (or speed) $s_i$. There is a load balancing dispatcher that probabilistically routes arrivals to servers according to a certain traditional performance metric $F$ that end users are concerned with, so that $F_i$ at each server is the same and minimal. The metric $F_i$ can be, for example, the mean response time $E[T_i]$ at the server, the sum of $E[T_i]$ and the propagation delay $\tau_i$, or the blocking probability $p_i$.

Fig. 1. A pictorial diagram of the system model.

Footnote 1: Here a server can be a single server, or represent a cluster of collocated servers in, e.g., a micro-datacenter.

It follows that the resulting arrival process to server $i$ is Poisson with rate $\lambda_i$. We assume that server $i$'s performance curve $F_i = f_i(s_i, \lambda_i)$ (or its analytical approximation) is continuously differentiable, increasing in the arrival rate $\lambda_i$, and decreasing in the service capacity $s_i$, with $f_i(\infty, \lambda_i) = 0$. This is a rather general assumption. In order to ensure stability, we must have $\lambda_i < s_i$ for all $i \in N$. We can thus assume that $f_i(s_i, \lambda_i) = \infty$ when $\lambda_i \ge s_i$.

Besides the performance metric $F$ that is perceived by end users, each server incurs a certain cost $c_i(s_i)$ per unit time when it runs at a speed of $s_i$. The cost can be, for example, the power expended at the server, or any other type of service cost. Given an incoming rate of $\lambda_i$, let $g_i(s_i, \lambda_i) = E[c_i(s_i)]$ be the average cost. The average cost depends on the speed as well as the scheduling policy at the server. The cost function $g_i(s_i, \lambda_i)$ (or its analytical approximation) is assumed to be continuously differentiable, increasing in $s_i$, and nondecreasing in $\lambda_i$.

Given the arrival rate $\lambda_i$ and the scheduling policy, each server will choose a speed $s_i$ to minimize a cost-aware performance metric $M_i$:

$$M_i = g_i(s_i, \lambda_i) + \beta_i \lambda_i f_i(s_i, \lambda_i), \tag{1}$$

where $\beta_i > 0$ characterizes the relative weight of the internal cost and the traditional performance metric. With the above model, we have in effect assumed a form of static speed scaling, i.e., a single speed $s_i$ is chosen for a given arrival rate $\lambda_i$. With more complicated notation, we can also model dynamic scaling, i.e., adapting the speed to different states such as the number of jobs in the server.

Speed scaling can be broadly defined as any behavior of adapting speed to load, and can be due to various reasons, corresponding to different choices of the cost function $g_i(s_i, \lambda_i)$. In this paper, we will mostly focus on energy-aware speed scaling as a concrete system to study the interaction between load balancing and speed scaling, and consider the following performance metric:

$$M_i = E[P_i(s_i)] + \beta_i \lambda_i E[T_i], \tag{2}$$

where $P_i(s_i)$ is the power expended when server $i$ runs at speed $s_i$. The modeling of the power function $P_i(s_i)$ is an active research topic, and measurements have shown that it can take different forms depending on the system involved. In many applications a low-order polynomial form

$$P_i(s_i) = k_i s_i^{\alpha_i}, \quad k_i > 0, \ \alpha_i > 1 \tag{3}$$

provides a good approximation. For example, for dynamic power in CMOS, $P_i$ is often assumed to be cubic in previous works [2]. We will focus on the polynomial power function (3) in this paper, as in many previous works on speed scaling.

IV. LOAD-BALANCING-SPEED-SCALING INTERACTION

In this section, we characterize the equilibrium resulting from the interaction between load balancing and speed scaling for the general model described in Section III. We then introduce two optimal load balancing problems, F-optimal load balancing and cost-aware optimal load balancing, under speed scaling. We intend to characterize the equilibrium with respect to those two optimal load balancing problems, as well

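as to propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima.

Given server speeds $(s_i)_{i \in N}$, denote the set of servers used at load balancing by $N_b$, i.e., $i \in N_b$ iff $\lambda_i > 0$. At load balancing, the $F_i$ value at any server $i \in N_b$ is thus the same, and not larger than the $F_j$ value a job would experience if routed to any unused server $j \in N \setminus N_b$. This can be written mathematically as

$$f_i(s_i, \lambda_i) \le f_j(s_j, \lambda_j), \quad \forall j \in N, \ i \in N_b, \tag{4}$$
$$\sum_{i \in N} \lambda_i = \lambda, \tag{5}$$

where $(\lambda_i)_{i \in N}$ are the arrival rates at the servers at load balancing (footnote 2). Denote the $F_i$ value at server $i \in N_b$ at load balancing by $\gamma$. The load balancing condition (4)-(5) can be equivalently written as: there exists a $\gamma > 0$ such that

$$(f_i(s_i, \lambda_i) - \gamma)(\tilde{\lambda}_i - \lambda_i) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{6}$$
$$\sum_{i \in N} \lambda_i = \lambda. \tag{7}$$

To see this equivalence, note that equations (6)-(7) imply that $\gamma$ must equal the $F_i$ value at any server $i \in N_b$ at load balancing: if $\lambda_i > 0$, then $\tilde{\lambda}_i - \lambda_i$ can take either sign and (6) forces $f_i = \gamma$, while if $\lambda_i = 0$, (6) only requires $f_i \ge \gamma$.

Assume that the speed scaling problem $\min_{s_i > \lambda_i} M_i$ has a unique solution $s_i(\lambda_i)$. Under the aforementioned assumptions on $f_i$ and $g_i$, speed scaling $s_i(\lambda_i)$ satisfies (footnote 3):

$$\frac{\partial g_i(s_i, \lambda_i)}{\partial s_i} + \beta_i \lambda_i \frac{\partial f_i(s_i, \lambda_i)}{\partial s_i} = 0. \tag{8}$$

Definition 1: The load-balancing-speed-scaling (LBSS) equilibrium is defined as a triple $\{(\lambda_i)_{i \in N}, (s_i)_{i \in N}, \gamma\}$ that satisfies the variational inequalities (6), (7) and (8).

The performance of the system under load balancing and speed scaling is determined by the LBSS equilibrium. At the LBSS equilibrium $\{(\lambda_i)_{i \in N}, (s_i)_{i \in N}, \gamma\}$, $s_i = s_i(\lambda_i)$ and

$$(f_i(s_i(\lambda_i), \lambda_i) - \gamma)(\tilde{\lambda}_i - \lambda_i) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{9}$$
$$\sum_{i \in N} \lambda_i = \lambda. \tag{10}$$

The following result is straightforward [26].

Theorem 2: The LBSS equilibrium satisfies the local optimality condition for the following optimization problem:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} \int_0^{\lambda_i} f_i(s_i(z), z)\, dz \tag{11}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda, \tag{12}$$

and $\gamma$ is the corresponding optimal dual variable.

Proof: Note that the LBSS equilibrium condition (9)-(10) is a variational inequality characterization of the optimality condition for optimization problem (11)-(12) and its dual [26].

Footnote 2: Note that in this paper we overload the notation: $\lambda_i$ denotes both an arbitrary arrival rate of server $i$ and the arrival rate of server $i$ at load balancing, depending on the context.

Footnote 3: The dynamic speed range of a server is usually finite, i.e., $s_i \le r_i$ for some $r_i > 0$.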
For simplicity, we do not consider such a constraint in this paper. However, such a constraint does not change the general structure of our model, in terms of, e.g., the equilibrium characterization and the distributed decomposition structure.

An optimization problem characterization of the equilibrium is usually very useful. It captures the global structure of the problem, and often we can easily tell from the optimization problem whether an equilibrium exists and how many equilibria there are, as well as derive a distributed or efficient algorithm that reaches the equilibrium. When there is no speed scaling, i.e., $s_i$ is fixed, we recover the optimization problem characterization of usual load balancing. In this situation, problem (11)-(12) is strictly convex, as $f_i(s_i, \lambda_i)$ is an increasing function of $\lambda_i$, and the equilibrium is unique.

In general, there may be no or multiple LBSS equilibria, depending on the properties of the performance curve $f_i(s_i(\lambda_i), \lambda_i)$ under speed scaling. For example, consider performance metric (2) with power function (3) in a processor sharing system with gated-static speed scaling (see the next section). Speed scaling $s_i(\lambda_i)$ satisfies $\beta_i (s_i - \lambda_i)^{-2} = k_i(\alpha_i - 1) s_i^{\alpha_i - 2}$. When $\alpha_i < 2$, $f_i(s_i(\lambda_i), \lambda_i)$ is decreasing. So, problem (11)-(12) becomes a problem of minimizing a concave objective function, which is usually computationally hard and may admit multiple solutions.

In the above load balancing model, the dispatcher routes the arrivals according to the traditional performance metric $F$ but does not consider the internal cost $g_i$ of the server. We call this model cost-oblivious load balancing (e.g., energy-oblivious in the case of energy-aware speed scaling). It can also be seen as a selfish routing game where each job chooses a server with minimal $F$ value [27]. So, the LBSS equilibrium might not be socially optimal, in terms of metric $F$ as well as the energy-aware metric $M$.

As we mentioned before, speed scaling brings an additional dimension, such as energy, into the design objective.
It is of significant value to study its interaction with the existing algorithms and protocols: for example, whether an existing design remains optimal with respect to the traditional performance metric $F$ as well as the new one $M$, how to design distributed optimal algorithms in terms of the new performance metric, and whether we can decouple speed scaling from other resource allocation mechanisms. In order to study these questions for load balancing, we consider two new load balancing models, as follows.

F-optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the traditional performance metric $F$:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} \lambda_i f_i(s_i(\lambda_i), \lambda_i) \tag{13}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{14}$$

When $F_i = E[T_i]$, we call it delay optimal load balancing.

Cost-aware optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the cost-aware performance metric $M$:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} \left[ g_i(s_i(\lambda_i), \lambda_i) + \beta_i \lambda_i f_i(s_i(\lambda_i), \lambda_i) \right] \tag{15}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{16}$$

We call it energy-aware optimal load balancing in the case of energy-aware speed scaling.

The end users as a whole care about problem (13)-(14), and the servers and end users as a whole care about problem (15)-(16). We intend to characterize the LBSS equilibrium with respect to them, as well as to propose distributed algorithms to achieve the corresponding equilibrium or optima. Again, the general problems (13)-(14) and (15)-(16) may be highly nontrivial, depending on the performance curve $f_i$ under speed scaling. In the remainder of this paper, we will focus on load balancing with energy-aware speed scaling in processor sharing systems with performance metric (2) and power function (3), as a concrete system to study the interaction between load balancing and speed scaling. We will leave the general problem to future work.

V. LOAD-BALANCING-SPEED-SCALING INTERACTION IN PROCESSOR SHARING SYSTEMS

In this section, we consider energy-aware speed scaling in processor sharing (PS) systems with performance metric (2) and power function (3). While general speed scaling policies can be adopted at a server, we focus on gated-static speed scaling, in which the server has zero speed when there is no job and otherwise runs at a constant speed that balances the response time and energy usage; see, e.g., [9], [10]. Gated-static speed scaling is the simplest nontrivial form of speed scaling, and it requires minimal hardware support. For example, a CMOS chip may set a constant clock speed but AND it with the gating signal to set the speed to 0 when there is no job [10]. Gated-static speed scaling captures some of the essence of dynamic speed scaling while admitting a more tractable analysis.

As mentioned in Section IV, when $\alpha_i < 2$, the problem under gated-static speed scaling may become a hard problem of minimizing a concave objective function. We thus focus on systems with $\alpha_i \ge 2$, in order to obtain a clean characterization and gain insights.
Power functions with $\alpha_i \ge 2$ are also practically important: in a server with $\alpha_i \ge 2$, energy cost is usually the driving force in deciding the server speed, while in a server with $\alpha_i < 2$ the traditional performance metric is the driving force. Besides, the results obtained for gated-static speed scaling with $\alpha_i \ge 2$ are expected to carry over to static provisioning with $\alpha_i \ge 1$, in which the server runs at a constant static speed that is chosen based on workload to balance the response time and energy usage. Static provisioning is the simplest form of speed scaling, and is a model often used in energy-aware capacity provisioning in data centers.

A. Energy-oblivious load balancing

Under PS scheduling, the mean response time at server $i$ takes the form:

$$f_i(s_i, \lambda_i) = \frac{1}{s_i - \lambda_i}. \tag{17}$$

Under gated-static speed scaling, the energy cost is only incurred during the time when the server is busy. Note that the fraction of time when the server is busy is $\lambda_i / s_i$. So, the server decides on its speed $s_i$ by solving the following optimization problem:

$$\min_{s_i > \lambda_i} \ \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i). \tag{18}$$

With power function (3), the speed scaling $s_i(\lambda_i)$ satisfies

$$-\frac{\bar{\beta}_i}{(s_i - \lambda_i)^2} + s_i^{\alpha_i - 2} = 0, \tag{19}$$

where $\bar{\beta}_i = \frac{\beta_i}{k_i(\alpha_i - 1)}$. By equation (19), we have

$$s_i'(\lambda_i) = \frac{2 s_i(\lambda_i)}{\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i} > 0, \tag{20}$$
$$s_i''(\lambda_i) = \frac{(2\alpha_i - 4)\,(s_i(\lambda_i) - \lambda_i s_i'(\lambda_i))}{(\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i)^2} \ge 0, \tag{21}$$

where the second inequality follows from the fact that $s_i'(\lambda_i) \le 1$; moreover, $s_i'(\lambda_i) = 1$ and $s_i''(\lambda_i) = 0$ if and only if $\alpha_i = 2$. Hence, speed scaling $s_i(\lambda_i)$ is a strictly increasing, convex function of $\lambda_i$. Further,

$$f_i(s_i(\lambda_i), \lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i} = \frac{(s_i(\lambda_i))^{\alpha_i/2 - 1}}{\sqrt{\bar{\beta}_i}} \tag{22}$$

is also a strictly increasing function of $\lambda_i$.

Corollary 3: There exists a unique LBSS equilibrium for processor sharing systems with gated-static speed scaling.

Proof: By Theorem 2, the LBSS equilibrium satisfies the optimality conditions for the optimization problem:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz \tag{23}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{24}$$

Since $\frac{1}{s_i(\lambda_i) - \lambda_i}$ is strictly increasing in $\lambda_i$, the above optimization problem is strictly convex. The existence and uniqueness of the LBSS equilibrium follow from the fact that problem (23)-(24) has a unique optimum [26].

Now, let us characterize the equilibrium.
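To make the speed scaling map concrete, the following is a minimal sketch, under stated assumptions, of how one might evaluate $s_i(\lambda_i)$ numerically from condition (19), rewritten as $(s_i - \lambda_i)^2 s_i^{\alpha_i - 2} = \bar{\beta}_i$, and compare the slope formula (20) against a finite difference. The function names `solve_speed` and `speed_slope` and the parameter values are hypothetical and chosen only for illustration; bisection is one convenient root finder here, not a method prescribed by the paper.

```python
def solve_speed(lam, beta_bar, alpha, iters=100):
    """Solve (s - lam)^2 * s^(alpha - 2) = beta_bar for s > lam by bisection.

    Assumes alpha >= 2 and beta_bar > 0, so the left-hand side is
    increasing in s on (lam, infinity).
    """
    lo = lam
    # At s = lam + max(1, beta_bar) the left-hand side is >= beta_bar.
    hi = lam + max(1.0, beta_bar)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (mid - lam) ** 2 * mid ** (alpha - 2.0) < beta_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def speed_slope(lam, beta_bar, alpha):
    """Slope s'(lam) from equation (20)."""
    s = solve_speed(lam, beta_bar, alpha)
    return 2.0 * s / (alpha * s - (alpha - 2.0) * lam)


if __name__ == "__main__":
    # Illustrative (assumed) parameters: beta = 12, k = 1.5, alpha = 3.
    beta, k, alpha = 12.0, 1.5, 3.0
    beta_bar = beta / (k * (alpha - 1.0))
    for lam in (0.0, 1.0, 5.0):
        s = solve_speed(lam, beta_bar, alpha)
        print(lam, s, 1.0 / (s - lam), speed_slope(lam, beta_bar, alpha))
```

At $\lambda_i = 0$ the solver returns $\bar{\beta}_i^{1/\alpha_i}$, and for $\alpha_i = 2$ it reduces to $s_i = \lambda_i + \sqrt{\bar{\beta}_i}$, which matches the statement that $s_i'(\lambda_i) = 1$ in that case.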
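For each server, define the base service rate $s_i^0 = s_i(0^+) = \bar{\beta}_i^{1/\alpha_i}$ (footnote 4). Without loss of generality, we assume that $s_1^0 \ge s_2^0 \ge \cdots \ge s_N^0$. For later convenience, we also assume that $s_{N+1}^0 = 0$.

Theorem 4: The set of servers that are used at the equilibrium is $N_e = \{1, 2, \ldots, n\}$, with a unique $n$ that satisfies

$$\sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \lambda \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right), \tag{25}$$

where

$$\bar{f}_i(\lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i}. \tag{26}$$

Footnote 4: For a function $f(x): \mathbb{R} \to \mathbb{R}$, $f(a^+)$ denotes the right-hand limit $\lim_{x \to a^+} f(x)$.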

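Proof: By equilibrium condition (9), we have $\frac{1}{s_i^0} < \gamma$ if $i \in N_e$ and $\frac{1}{s_i^0} \ge \gamma$ otherwise. Further,

$$\lambda_i = s_i - \frac{1}{\gamma} > 0, \quad \text{if } \frac{1}{s_i^0} < \gamma, \tag{27}$$
$$\lambda_i = 0, \quad \text{if } \frac{1}{s_i^0} \ge \gamma. \tag{28}$$

Since $s_i^0$ is decreasing in $i$, $N_e$ takes the form $\{1, 2, \ldots, n\}$. Note that $\frac{1}{s_n^0} < \gamma \le \frac{1}{s_{n+1}^0}$, and $\bar{f}_i(\lambda_i)$ is an increasing function. So,

$$\sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \sum_{i=1}^{n} (\bar{f}_i)^{-1}(\gamma) \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right),$$

i.e.,

$$\sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \sum_{i=1}^{n} \lambda_i = \lambda \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right).$$

The uniqueness of $n$ follows from the fact that the LBSS equilibrium is unique.

We see that the LBSS equilibrium has a water-filling structure. If we see load balancing as a selfish routing problem [27], the arrivals will aggressively occupy fast servers with low delay first.

1) Distributed load balancing algorithm: The (convex) optimization problem characterization of the LBSS equilibrium also suggests a distributed algorithm to achieve the equilibrium.

At the $k$-th iteration:

Each server estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{29}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server. Denote by $E[t(k)]$ the minimal $\bar{t}(k)$ at step $k$ such that $\bar{t}(k) = \frac{1}{|N(k)|} \sum_{i \in N(k)} t_i(k)$ with $N(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } t_i(k) \le \bar{t}(k),\ i \in N \}$ (footnote 5). The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = [\lambda_i(k) - \varepsilon (t_i(k) - E[t(k)])]^+, \tag{30}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}_+$, the set of nonnegative real numbers.

When $\varepsilon$ is small enough, the above algorithm converges. Let $\delta_i(k) = \lambda_i(k+1) - \lambda_i(k)$. It is easy to verify that

$$\sum_{i} \delta_i(k) = 0, \tag{31}$$
$$\sum_{i} \delta_i(k)\, t_i(k) \le 0. \tag{32}$$

We see that $\sum_i \delta_i(k) t_i(k) = 0$ only if $\delta_i(k) = 0$ for all $i$, which requires $t_i = \bar{t}$, or $\lambda_i = 0$ and $t_i > \bar{t}$.

Footnote 5: $\bar{t}$ and $\bar{N}$ can be determined recursively as follows. In the beginning, let $\bar{N} = N$ and calculate $\bar{t} = \frac{1}{|\bar{N}|} \sum_{i \in \bar{N}} t_i(k)$, and then exclude from $\bar{N}$ those servers such that $\lambda_i = 0$ and $t_i > \bar{t}$. Repeat the same procedure with the new set $\bar{N}$; when it stops, we obtain $E[t]$.

The above algorithm actually follows the negative gradient direction of $\sum_i \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz$ subject to $\sum_i \lambda_i = \lambda$ [26].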
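The iteration (29)-(30) can be illustrated with a small simulation. The sketch below is hypothetical: the three-server system, the values of $\bar{\beta}_i$, the stepsize and the iteration count are assumptions picked for illustration, the helper names (`solve_speed`, `reference_delay`, `energy_oblivious_lb`) are not from the paper, and the speed $s_i(\lambda_i)$ is obtained by bisection on $(s_i - \lambda_i)^2 s_i^{\alpha_i - 2} = \bar{\beta}_i$ rather than by any method the paper specifies. The reference delay follows the recursion described in footnote 5.

```python
def solve_speed(lam, beta_bar, alpha, iters=60):
    """Bisection for (s - lam)^2 * s^(alpha - 2) = beta_bar, assuming alpha >= 2."""
    lo, hi = lam, lam + max(1.0, beta_bar)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (mid - lam) ** 2 * mid ** (alpha - 2.0) < beta_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reference_delay(t, lam):
    """Average delay over the active set, pruned as in footnote 5."""
    idx = list(range(len(t)))
    while True:
        t_bar = sum(t[i] for i in idx) / len(idx)
        # Drop unused servers whose delay exceeds the current average.
        kept = [i for i in idx if lam[i] > 0.0 or t[i] <= t_bar]
        if len(kept) == len(idx):
            return t_bar
        idx = kept


def energy_oblivious_lb(total, beta_bar, alpha, eps=0.5, steps=3000):
    """Run the update (29)-(30) from an equal split; return (lam, s, t)."""
    n = len(beta_bar)
    lam = [total / n] * n
    for _ in range(steps):
        s = [solve_speed(lam[i], beta_bar[i], alpha[i]) for i in range(n)]
        t = [1.0 / (s[i] - lam[i]) for i in range(n)]
        t_bar = reference_delay(t, lam)
        lam = [max(0.0, lam[i] - eps * (t[i] - t_bar)) for i in range(n)]
    s = [solve_speed(lam[i], beta_bar[i], alpha[i]) for i in range(n)]
    t = [1.0 / (s[i] - lam[i]) for i in range(n)]
    return lam, s, t


if __name__ == "__main__":
    # Assumed example: three cubic-power servers, total load 10.
    lam, s, t = energy_oblivious_lb(10.0, [8.0, 4.0, 2.0], [3.0, 3.0, 3.0])
    print("arrival rates:", lam)
    print("speeds:", s)
    print("delays:", t)
```

In this assumed example all three servers stay in use, so the delays at the servers should approach a common value $\gamma$, with the server of largest base rate carrying the most load, in line with the water-filling picture above.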
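Any algorithm that follows a properly chosen negative gradient direction would work, and (30) picks a specific gradient direction that facilitates the convergence analysis. We skip the convergence proof for brevity.

B. Delay optimal load balancing

In this subsection, we study the delay optimal load balancing design:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} \tag{33}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda, \tag{34}$$

and characterize the LBSS equilibrium with respect to it. By equation (19),

$$\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} = \frac{s_i^{\alpha_i/2}}{\sqrt{\bar{\beta}_i}} - 1, \tag{35}$$

which is strictly increasing and convex in $s_i$. Note that $s_i(\lambda_i)$ is increasing and convex. It follows that $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is a strictly convex function of $\lambda_i$ (footnote 6). So, problem (33)-(34) is strictly convex, and has a unique optimum. Denote the optimum by $(\lambda_i^*)_{i \in N}$. There exists a unique $\gamma^* > 0$ such that the optimality condition can be written as [26]

$$\left( \frac{s_i(\lambda_i^*) - \lambda_i^* s_i'(\lambda_i^*)}{(s_i(\lambda_i^*) - \lambda_i^*)^2} - \gamma^* \right)(\tilde{\lambda}_i - \lambda_i^*) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{36}$$
$$\sum_{i \in N} \lambda_i^* = \lambda. \tag{37}$$

Theorem 5: The set of servers that are used at the optimum is $N_o = \{1, 2, \ldots, n^*\}$, with a unique $n^*$ that satisfies

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}\!\left(\frac{1}{s_{n^*}^0}\right) < \lambda \le \sum_{i=1}^{n^*} (\hat{f}_i)^{-1}\!\left(\frac{1}{s_{n^*+1}^0}\right), \tag{38}$$

where

$$\hat{f}_i(\lambda_i) = \frac{s_i(\lambda_i) - \lambda_i s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}. \tag{39}$$

Moreover, $\gamma^* \ge \gamma$ and $n^* \ge n$.

Proof: Note that $\hat{f}_i(\lambda_i)$ is an increasing function of $\lambda_i$, and $\hat{f}_i(0) = \frac{1}{s_i^0}$. The first part of the theorem follows the same proof as in Theorem 4. For the second part of the theorem, note that $s_i'(\lambda_i) \le 1$ by equation (20), so $\hat{f}_i(\lambda_i) \ge \bar{f}_i(\lambda_i)$. If $\gamma^* < \gamma$, then $n^* \le n$ and

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) < \sum_{i=1}^{n^*} (\bar{f}_i)^{-1}(\gamma^*) \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}(\gamma) \le \lambda.$$

This contradicts $\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) = \sum_{i=1}^{n^*} \lambda_i^* = \lambda$. So, $\gamma^* \ge \gamma$, and $n^* \ge n$ follows.

Footnote 6: Note that, when $\alpha_i = 2$, $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is not strictly convex but linear in $\lambda_i$. But this would not change the uniqueness of the optimum.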
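The comparison $\hat{f}_i \ge \bar{f}_i$ used in the proof above can be checked numerically. The sketch below is illustrative only: it assumes the gated-static condition $(s_i - \lambda_i)^2 s_i^{\alpha_i - 2} = \bar{\beta}_i$ from (19), evaluates $s_i'$ through (20), and uses hypothetical helper names (`solve_speed`, `f_bar`, `f_hat`) and arbitrary parameter values.

```python
def solve_speed(lam, beta_bar, alpha, iters=100):
    """Bisection for (s - lam)^2 * s^(alpha - 2) = beta_bar, assuming alpha >= 2."""
    lo, hi = lam, lam + max(1.0, beta_bar)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (mid - lam) ** 2 * mid ** (alpha - 2.0) < beta_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_bar(lam, beta_bar, alpha):
    """Per-job delay 1 / (s(lam) - lam), the quantity equalized at the LBSS equilibrium."""
    s = solve_speed(lam, beta_bar, alpha)
    return 1.0 / (s - lam)


def f_hat(lam, beta_bar, alpha):
    """Marginal social delay from (39), with s'(lam) taken from (20)."""
    s = solve_speed(lam, beta_bar, alpha)
    ds = 2.0 * s / (alpha * s - (alpha - 2.0) * lam)
    return (s - lam * ds) / (s - lam) ** 2


if __name__ == "__main__":
    # Assumed parameters: beta_bar = 4, alpha = 3.
    for lam in (0.0, 0.5, 2.0, 10.0):
        print(lam, f_bar(lam, 4.0, 3.0), f_hat(lam, 4.0, 3.0))
```

Both functions start from the same value $1/s_i^0$ at $\lambda_i = 0$, and they coincide for $\alpha_i = 2$ because $s_i' = 1$ there; for $\alpha_i > 2$ the marginal social delay lies above the per-job delay, which is why the optimum spreads load over at least as many servers as the equilibrium.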

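1) Distributed load balancing algorithm: The delay optimal load balancing is a convex problem. We can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design.

At the $k$-th iteration:

Each server estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{40}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $\hat{f}_i$ according to

$$\hat{f}_i(k) = \hat{f}_i(\lambda_i(k)) = \frac{\alpha_i \lambda_i(k)\,(t_i(k))^2 + \alpha_i t_i(k)}{2 \lambda_i(k)\, t_i(k) + \alpha_i}. \tag{41}$$

Denote by $E[\hat{f}(k)]$ the minimal $\bar{\hat{f}}(k)$ at step $k$ such that $\bar{\hat{f}}(k) = \frac{1}{|N(k)|} \sum_{i \in N(k)} \hat{f}_i(k)$ with $N(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } \hat{f}_i(k) \le \bar{\hat{f}}(k),\ i \in N \}$. The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = [\lambda_i(k) - \varepsilon (\hat{f}_i(k) - E[\hat{f}(k)])]^+, \tag{42}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}_+$, the set of nonnegative real numbers.

Note that the delay optimal load balancing algorithm (40)-(42) is more complicated than the simple, energy-oblivious load balancing algorithm (29)-(30). It requires estimating $\hat{f}_i$. In addition, it requires the dispatcher to know the servers' power function parameters $\alpha_i$ and $k_i$.

2) Efficiency loss in delay at the LBSS equilibrium: Define the social cost in delay:

$$C = \sum_{i \in N} \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}. \tag{43}$$

We now characterize the inefficiency in delay at the LBSS equilibrium.

Lemma 6: Let $\alpha = \max_i \alpha_i$. Then,

$$\gamma \le \gamma^* \le \frac{\alpha}{2} \gamma. \tag{44}$$

Proof: The first inequality has been proved in Theorem 5. It remains to prove the second one. By equation (35), $\hat{f}_i$ can be written as

$$\hat{f}_i(\lambda_i) = \frac{\alpha_i}{2} \frac{s_i^{\alpha_i/2 - 1}}{\sqrt{\bar{\beta}_i}}\, s_i'. \tag{45}$$

Note that $s_i(\lambda_i)$ is increasing, and $s_i'(\lambda_i) \ge \frac{2}{\alpha_i}$ by equation (20). Combining with $s_i'(\lambda_i) \le 1$, we get

$$\bar{f}_i(\lambda_i) \le \hat{f}_i(\lambda_i) \le \frac{\alpha_i}{2} \bar{f}_i(\lambda_i) \le \frac{\alpha}{2} \bar{f}_i(\lambda_i).$$

If $\gamma^* > \frac{\alpha}{2} \gamma$, then $(\hat{f}_i)^{-1}(\gamma^*) \ge (\bar{f}_i)^{-1}\!\left(\frac{2}{\alpha} \gamma^*\right) > (\bar{f}_i)^{-1}(\gamma)$, and hence

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) > \sum_{i=1}^{n} (\bar{f}_i)^{-1}(\gamma) = \lambda.$$

This contradicts the fact that $\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) = \lambda$ (also note that $n^* \ge n$). So, $\gamma^* \le \frac{\alpha}{2} \gamma$.

Theorem 7: Denote the social cost in delay at the LBSS equilibrium by $C_e$ and the optimal cost by $C_o$. Then,

$$\frac{C_e}{C_o} \le \frac{\alpha}{2}. \tag{46}$$

Proof: The social cost at the LBSS equilibrium is

$$C_e = \lambda \gamma. \tag{47}$$

When $\lambda_i^* > 0$, by equations (22), (45) and (44), we have

$$\frac{1}{s_i(\lambda_i^*) - \lambda_i^*} = \frac{s_i^{\alpha_i/2 - 1}}{\sqrt{\bar{\beta}_i}} = \frac{2\gamma^*}{\alpha_i s_i'} \ge \frac{2\gamma^*}{\alpha} \ge \frac{2\gamma}{\alpha}. \tag{48}$$

So,

$$C_o = \sum_{i} \frac{\lambda_i^*}{s_i(\lambda_i^*) - \lambda_i^*} \ge \frac{2\gamma}{\alpha} \sum_{i} \lambda_i^* = \frac{2 \lambda \gamma}{\alpha}, \tag{49}$$

and therefore

$$\frac{C_e}{C_o} \le \frac{\alpha}{2}. \tag{50}$$

We see that the degree of inefficiency in delay at the LBSS equilibrium depends only on the order $\alpha$ of the power functions. For example, if $\alpha = 2$, the LBSS equilibrium achieves the social optimum. As $\alpha$ is a constant independent of the number $N$ of servers in the system, this result is very different from the efficiency loss of the usual load balancing (with fixed server speeds), which scales with $N$; see, e.g., [22]. Also, note that $\frac{\alpha}{2}$ can be seen as a measure of heterogeneity in power functions. We can thus say that the degree of inefficiency at the LBSS equilibrium is bounded by the heterogeneity of the system.

As the power function can usually be well approximated by a low-order polynomial function, the above result suggests a benign interaction between energy-oblivious load balancing and power-aware speed scaling, in terms of delay. As energy-oblivious load balancing is already employed in practice and is simple to implement, we may not need to change it, since it does not incur a large penalty in delay.

C. Energy-aware optimal load balancing

In this subsection, we study the energy-aware optimal load balancing design:

$$\min_{\lambda_i,\, s_i} \ \sum_{i \in N} \left[ \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \right] \tag{51}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda, \tag{52}$$

and characterize the LBSS equilibrium with respect to it. By speed scaling (i.e., solving for $s_i$ first), the above problem reduces to:

$$\min_{\lambda_i \ge 0} \ \sum_{i \in N} h_i(\lambda_i) \tag{53}$$
$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda, \tag{54}$$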

where

$$h_i(\lambda_i) = \beta_i \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} + \frac{\lambda_i}{s_i(\lambda_i)} P_i(s_i(\lambda_i)). \tag{55}$$

Note that

$$h_i'(\lambda_i) = \beta_i \frac{s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2} + k_i (s_i(\lambda_i))^{\alpha_i - 1} = \frac{\alpha_i \beta_i}{\alpha_i - 1} \frac{s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}, \tag{56}$$
$$h_i''(\lambda_i) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \frac{2 s_i(\lambda_i) - (s_i(\lambda_i) + \lambda_i)\, s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^3}. \tag{57}$$

We see that $h_i' > 0$ and $h_i'' > 0$, and thus $h_i(\lambda_i)$ is strictly increasing and convex. So, problem (53)-(54) is a strictly convex problem, and has a unique optimum. Denote the optimum by $(\lambda_i^+)_{i \in N}$. There exists a unique $\gamma^+ > 0$ such that the optimality condition can be written as [26]

$$(h_i'(\lambda_i^+) - \gamma^+)(\tilde{\lambda}_i - \lambda_i^+) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{58}$$
$$\sum_{i \in N} \lambda_i^+ = \lambda. \tag{59}$$

Note that $h_i'(\lambda_i)$ is strictly increasing, and

$$h_i'(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1} \hat{f}_i(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1} \bar{f}_i(\lambda_i). \tag{60}$$

Let $d_i^0 = h_i'(0) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \frac{1}{s_i^0}$. We can define a permutation $\pi : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ such that $d_{\pi(i)}^0$ is in nondecreasing order in $i$, so that $\pi(1)$ is the server with the lowest marginal cost at zero load. We have the following characterization of the optimum.

Theorem 8: The set of servers that are used at the optimum is $N_s = \{\pi(1), \pi(2), \ldots, \pi(m^+)\}$, with a unique $m^+$ that satisfies

$$\sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left(d_{\pi(m^+)}^0\right) < \lambda \le \sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left(d_{\pi(m^+ + 1)}^0\right).$$

Proof: It follows the same proof as in Theorem 4. We skip it for brevity.

We see that the energy-aware optimal load balancing has a similar water-filling effect: the arrivals will occupy servers with low marginal cost in the energy-aware metric first. As a result, the jobs will be consolidated into a subset of servers that have low energy-aware cost.

1) Distributed load balancing algorithm: The energy-aware optimal load balancing is a convex problem. Again, we can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design.

At the $k$-th iteration:

Each server estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{61}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $h_i'$ according to

$$h_i'(k) = h_i'(\lambda_i(k)) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \left( \lambda_i(k)\,(t_i(k))^2 + t_i(k) \right). \tag{62}$$

Denote by $E[h'(k)]$ the minimal $\bar{h}'(k)$ at step $k$ such that $\bar{h}'(k) = \frac{1}{|N(k)|} \sum_{i \in N(k)} h_i'(k)$ with $N(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } h_i'(k) \le \bar{h}'(k),\ i \in N \}$. The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = [\lambda_i(k) - \varepsilon (h_i'(k) - E[h'(k)])]^+, \tag{63}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}_+$, the set of nonnegative real numbers.

Again, the energy-aware optimal load balancing algorithm (61)-(63) is more complicated than the energy-oblivious load balancing algorithm (29)-(30). In addition to the servers' power function parameters, the dispatcher needs to know their weights $\beta_i$.

2) Efficiency loss in energy-aware performance metric at the LBSS equilibrium: Define the social cost in the energy-aware performance metric $M$:

$$D = \sum_{i \in N} \left[ \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \right] = \sum_{i \in N} h_i(\lambda_i). \tag{64}$$

We now characterize the inefficiency in the energy-aware performance metric at the LBSS equilibrium. It is complicated to characterize the efficiency loss for a system with arbitrary power functions and loads. Here we give a partial characterization, focusing on the case with power functions of the same order, i.e., $P_i(s_i) = k_i s_i^{\alpha}$ for all servers, and in heavy traffic, i.e., $\lambda \to \infty$. We leave a complete characterization of the efficiency loss to future work. The case with power functions of the same order models a system that employs similar servers but with different scaling factors and weights. The heavy traffic regime is of significant interest, as the inefficiency of the load-balancing-speed-scaling interaction is intuitively worst under heavy traffic.

Theorem 9: Assume that $\alpha = 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D_e$ and the optimal cost by $D_o$. Under the aforementioned conditions, we have

$$\frac{D_e}{D_o} \le \frac{\max_i k_i}{\min_i k_i}\, N. \tag{65}$$

Proof: When $\alpha = 2$, at the LBSS equilibrium $(\lambda_i)_{i \in N}$, the arrivals will be routed to the server that has the maximal $\bar{\beta}_i$ value (footnote 7). Under heavy traffic, the energy-aware social cost at the LBSS equilibrium is

$$D_e \approx k_i \lambda^2 \le \max_i k_i\, \lambda^2.$$

At the social optimum $(\lambda_i^+)_{i \in N}$, $\lambda_j^+ \approx \frac{1/k_j}{\sum_i 1/k_i} \lambda$. The optimal social cost is

$$D_o \approx \sum_i k_i (\lambda_i^+)^2 = \frac{\lambda^2}{\sum_i 1/k_i} \ge \frac{\min_i k_i\, \lambda^2}{N}.$$

Therefore,

$$\frac{D_e}{D_o} \le \frac{\max_i k_i}{\min_i k_i}\, N. \tag{66}$$

Footnote 7: There may exist multiple servers that have the maximal $\bar{\beta}_i$ value. But it is reasonable to expect that the number of such servers is bounded by a constant that does not scale with the total number of servers in the system. For simplicity of presentation, we assume that there is only one server that has the maximal $\bar{\beta}_i$ value. If there are multiple such servers, this only brings a constant factor into the bound on efficiency loss.
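A small worked example may help make Theorem 9 concrete. The sketch below is hypothetical and rests on a closed form that follows from (19) and (55) when $\alpha = 2$: the speed is $s_i = \lambda_i + \sqrt{\beta_i / k_i}$, so $h_i(\lambda_i) = k_i \lambda_i^2 + 2 \lambda_i \sqrt{\beta_i k_i}$. The function name `alpha2_costs` and the numbers are assumptions for illustration, and the optimum is computed under the assumption that traffic is heavy enough for every server to be used.

```python
import math


def alpha2_costs(total, k, beta):
    """Return (D_e, D_o, optimal rates) for quadratic power functions (alpha = 2)."""
    n = len(k)
    c = [math.sqrt(beta[i] * k[i]) for i in range(n)]

    def cost(i, lam):
        # h_i(lam) = k_i * lam^2 + 2 * lam * sqrt(beta_i * k_i) when alpha = 2.
        return k[i] * lam * lam + 2.0 * c[i] * lam

    # LBSS equilibrium: all load goes to the server with the largest beta_i / k_i.
    star = max(range(n), key=lambda i: beta[i] / k[i])
    d_e = cost(star, total)

    # Social optimum: equalize marginal costs 2*k_i*lam_i + 2*c_i = gamma_plus.
    gamma_plus = 2.0 * (total + sum(c[i] / k[i] for i in range(n))) / sum(1.0 / x for x in k)
    lam_opt = [(gamma_plus - 2.0 * c[i]) / (2.0 * k[i]) for i in range(n)]
    if min(lam_opt) < 0.0:
        raise ValueError("load too light: not every server is used at the optimum")
    d_o = sum(cost(i, lam_opt[i]) for i in range(n))
    return d_e, d_o, lam_opt


if __name__ == "__main__":
    # Assumed example: k = (1, 2, 4), equal weights beta = 8, total load 1000.
    k, beta = [1.0, 2.0, 4.0], [8.0, 8.0, 8.0]
    d_e, d_o, lam_opt = alpha2_costs(1000.0, k, beta)
    print("ratio D_e / D_o:", d_e / d_o)
    print("bound (max k / min k) * N:", max(k) / min(k) * len(k))
```

In this assumed instance the cost ratio stays well below the bound in (65), which is consistent with the bound being a worst-case statement over the scaling factors $k_i$.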

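We see that when $\alpha = 2$, the degree of inefficiency at the LBSS equilibrium scales with the number of servers in the system. This happens because energy-oblivious load balancing uses only the server with the largest base rate, which incurs a huge energy cost at this server, while energy-aware optimal load balancing spreads load across all servers, which leads to a much smaller energy cost at the servers. This suggests that we should do energy-aware load balancing if energy consumption is a main concern.

Lemma 10: Assume $\alpha > 2$. Define $\zeta_i = \alpha k_i \bar{\beta}_i^{\frac{\alpha - 1}{\alpha - 2}}$ for each server $i$. Then,

$$\min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}} \le \gamma^+ \le \max_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}. \tag{67}$$

Proof: By equations (56) and (19), $h_i'$ can be written as

$$h_i'(\lambda_i) = \alpha k_i (s_i(\lambda_i))^{\alpha - 1} = \zeta_i \left( \bar{f}_i(\lambda_i) \right)^{\frac{2(\alpha - 1)}{\alpha - 2}}. \tag{68}$$

If $\gamma^+ < \min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}$, then $(h_i')^{-1}(\gamma^+) < (\bar{f}_i)^{-1}(\gamma)$, and hence

$$\sum_i (h_i')^{-1}(\gamma^+) < \sum_i (\bar{f}_i)^{-1}(\gamma) = \lambda.$$

This contradicts the fact that $\sum_i (h_i')^{-1}(\gamma^+) = \lambda$. So, $\gamma^+ \ge \min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}$. The second inequality can be proved similarly.

Theorem 11: Assume $\alpha > 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D_e$ and the optimal cost by $D_o$. Under the aforementioned conditions, we have

$$\frac{D_e}{D_o} \le \left( \frac{\max_i \zeta_i}{\min_i \zeta_i} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{69}$$

Proof: Under heavy traffic, $\lambda \to \infty$. By a lemma in [9], we have the following approximation for speed scaling under heavy traffic:

$$s_i(\lambda_i) \approx \lambda_i + \sqrt{\frac{\bar{\beta}_i}{\lambda_i^{\alpha - 2}}} \approx \lambda_i,$$

so that

$$\beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \approx \frac{\beta_i \lambda_i^{\alpha/2}}{\sqrt{\bar{\beta}_i}} + k_i \lambda_i^{\alpha} \approx k_i \lambda_i^{\alpha}.$$

Note that, at the LBSS equilibrium $(\lambda_i)_{i \in N}$,

$$\gamma = \frac{s_i^{\alpha/2 - 1}}{\sqrt{\bar{\beta}_i}} \approx \frac{\lambda_i^{\alpha/2 - 1}}{\sqrt{\bar{\beta}_i}}.$$

The energy-aware social cost at the LBSS equilibrium is

$$D_e \approx \sum_i k_i \left( \bar{\beta}_i \gamma^2 \right)^{\frac{\alpha}{\alpha - 2}} \le \sum_i k_i \bar{\beta}_i^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}, \tag{70}$$

where the inequality follows from (67). At the social optimum $(\lambda_i^+)_{i \in N}$,

$$\gamma^+ = \alpha k_i s_i^{\alpha - 1} \approx \alpha k_i (\lambda_i^+)^{\alpha - 1}. \tag{71}$$

The optimal social cost is

$$D_o \approx \sum_i k_i \left( \frac{\gamma^+}{\alpha k_i} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{72}$$

Therefore,

$$\frac{D_e}{D_o} \le \max_i \frac{k_i \bar{\beta}_i^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}}{k_i \left( \frac{\gamma^+}{\alpha k_i} \right)^{\frac{\alpha}{\alpha - 1}}} = \max_i \left( \frac{\zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}} = \left( \frac{\max_i \zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{73}$$

We see that when $\alpha > 2$, the degree of inefficiency at the LBSS equilibrium depends only on the degree of heterogeneity $\frac{\max_i \zeta_i}{\min_j \zeta_j}$ in the system, but not on the number of servers $N$. If the degree of heterogeneity in the system is small, energy-oblivious load balancing interacts benignly with speed scaling, in terms of the energy-aware cost. In this situation, we may not need complicated energy-aware load balancing, i.e., we can decouple the design of load balancing from speed scaling. Otherwise, we must do energy-aware optimal load balancing if energy consumption is a main concern.

VI. NUMERICAL EXAMPLES

In this section, we provide numerical examples to complement the analysis in previous sections. We mainly focus on evaluating the distributed load balancing algorithms, since our other results on the LBSS equilibrium, delay optimal load balancing and energy-aware optimal load balancing are analytical.

We consider a system with 10 servers with speed scaling. Half of the servers have a power function of the form $P_i(s_i) = k_i s_i^{5/2}$ and the other half have a power function of the form $P_i(s_i) = k_i s_i^{3}$. The total load is normalized to $\lambda = 10$, and the values of the parameters $k_i$ and $\beta_i$ used to obtain the numerical results are randomly drawn from $[1, 10]$ and $[5, 15]$, respectively.

Fig. 2. The arrival rate and service rate evolution of energy-oblivious load balancing.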

Fig. 3. The arrival rate and service rate evolution of delay optimal load balancing.

Fig. 4. The arrival rate and service rate evolution of energy-aware optimal load balancing.

Figures 2, 3 and 4 show the evolution of the arrival rate and service rate with stepsize γ = 0.2 for energy-oblivious load balancing, delay optimal load balancing and energy-aware optimal load balancing, respectively. We see that the arrival rates and service rates approach the corresponding equilibrium or optimum quickly.

The numerical results confirm the previous analysis and intuitions. As we go from energy-oblivious load balancing to delay optimal load balancing, the load is spread more across the servers, which is driven by minimizing the social cost in delay. We also see that the changes in the arrival rate and service rate are not severe, which intuitively confirms Theorem 7, which gives a small bound on the efficiency loss at the LBSS equilibrium. As we move to energy-aware optimal load balancing, the load becomes more evenly distributed. This is driven by minimizing the energy-aware social cost: an uneven load distribution leads to an uneven service rate distribution, which may result in a large energy cost at the server(s) with large speed. We also see large changes in the arrival rate and service rate. This implies a large degree of inefficiency at the LBSS equilibrium, which intuitively confirms Theorem 11, even though that theorem characterizes a system with power functions of the same order.

In order to study the impact of different choices of the stepsize on the convergence of the algorithms, we have run simulations with different stepsizes. We found that the smaller the stepsize, the slower the convergence; the larger the stepsize, the faster the convergence, but the system may only converge to within a certain neighborhood of the equilibrium, which is a general characteristic of any gradient-based method. In practice, the dispatcher can first choose large stepsizes to ensure fast convergence, and subsequently reduce the stepsizes once the price starts oscillating around some mean value.

VII. CONCLUSION

We have studied the interaction between load balancing and speed scaling. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima. In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium in terms of delay as well as an energy-aware metric, and show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but is independent of the number of servers. These results provide insights for understanding the interaction of load balancing with speed scaling and for guiding new designs.

Further research stemming from this paper includes the following. We are characterizing the efficiency loss in the energy-aware metric at the load-balancing-speed-scaling equilibrium for systems with power functions of different polynomial orders. We are also studying the load balancing and speed scaling interaction in processor sharing systems with general power functions (e.g., nonconvex, discontinuous, with possibly a discrete set of allowable speeds), as well as in systems with other scheduling policies such as Shortest Remaining Processing Time (SRPT). We will further study other speed scaling policies and their impact on the design and performance of load balancing. Finally, we will go beyond energy-aware speed scaling, and study other types of speed scaling behaviors and their interaction with load balancing in, e.g., data centers or call centers.

REFERENCES

[1] O. S. Unsal and I. Koren. System-level power-aware design techniques in real-time systems. Proc. IEEE, 97(3):1055-1069, 2003.
[2] S. Kaxiras and M. Martonosi. Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.
[3] S. Irani and K. R. Pruhs. Algorithmic problems in power management. SIGACT News, 36(2):63-76, 2005.
[4] L. Yuan and G. Qu. Analysis of energy reduction on dynamic voltage scaling-enabled systems. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 24(12):1827-1837, 2005.
[5] Y. Zhu and F. Mueller. Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling. Real Time Systems, 31:33-63, 2005.
[6] N. Bansal, T. Kimbrel, and K. Pruhs. Speed scaling to manage energy and temperature. J. ACM, 54(1):1-39, 2007.
[7] S. Herbert and D. Marculescu. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors. In Proc. ISLPED, 2007.
[8] N. Bansal, H.-L. Chan, and K. Pruhs. Speed scaling with an arbitrary power function. In Proc. ACM-SIAM SODA, 2009.
[9] A. Wierman, L. L. H. Andrew, and A. Tang. Power-aware speed scaling in processor sharing systems. In Proceedings of IEEE Infocom, 2009.
[10] L. L. Andrew, M. Lin, and A. Wierman. Optimality, fairness, and robustness in speed scaling designs. In Proceedings of ACM Sigmetrics, 2010.

[11] N. Bansal, K. Pruhs, and C. Stein. Speed scaling for weighted flow times. In Proc. ACM-SIAM SODA, 2007.
[12] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 1995.
[13] J. M. George and J. M. Harrison. Dynamic control of a queue with adjustable service rate. Operations Research, 49(5):720-731, 2001.
[14] K. Pruhs, P. Uthaisombut, and G. Woeginger. Getting the best response for your erg. In Scandinavian Worksh. Alg. Theory, 2004.
[15] J. R. Bradley. Optimal control of a dual service rate M/M/1 production-inventory model. European Journal of Operations Research, 161(3):812-837, 2005.
[16] S. Albers and H. Fujiwara. Energy-efficient algorithms for flow time minimization. Lecture Notes in Computer Science, 3884:621-633, 2006.
[17] D. P. Bunde. Power-aware scheduling for makespan and flow. In Proc. ACM Symp. Parallel Alg. and Arch, 2006.
[18] S. Zhang and K. S. Chatha. Approximation algorithm for the temperature-aware scheduling problem. In Proceedings of IEEE Conference on Computer Aided Design, 2007.
[19] N. Bansal, H.-L. Chan, T.-W. Lam, and L.-K. Lee. Scheduling for speed bounded processors. In Int. Colloq. Automata, Languages and Programming, 2008.
[20] N. Bansal, H.-L. Chan, K. Pruhs, and D. Katz. Improved bounds for speed scaling in devices obeying the cube-root rule. In Int. Colloq. Automata, Languages and Programming, 2009.
[21] T.-W. Lam, L.-K. Lee, I. K. K. To, and P. W. H. Wong. Speed scaling functions for flow time scheduling based on active job count. In Proc. Euro. Symp. Alg., 2009.
[22] M. Haviv and T. Roughgarden. The price of anarchy in an exponential multi-server. Operations Research Letters, 35:421-426, 2007.
[23] T. Wu and D. Starobinski. On the price of anarchy in unbounded delay networks. In Proc. of Game Theory for Comm. and Networks, 2006.
[24] E. Altman, U. Ayesta, and B. J. Prabhu. Optimal load balancing in processor sharing systems. In Proceedings of GameComm, 2008.
[25] H. Chen, J. Marden, and A. Wierman. On the impact of heterogeneity and back-end scheduling in load balancing designs. In Proceedings of IEEE Infocom, 2009.
[26] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation. Prentice Hall, 1989.
[27] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic Game Theory. Cambridge University Press, 2007.