On the Interaction between Load Balancing and Speed Scaling




Lijun Chen and Na Li

Abstract

Speed scaling has been widely adopted in computer and communication systems, in particular to reduce energy consumption. An important question is how speed scaling interacts with other resource allocation mechanisms such as scheduling and routing. In this paper, we study the interaction of speed scaling with load balancing. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. In particular, we characterize the load-balancing-speed-scaling equilibrium with respect to the optimal load balancing schemes in processor sharing systems. Our results show that the degree of inefficiency at the equilibrium is mostly bounded by the heterogeneity of the system, but independent of the number of servers. These results provide insights into the interaction of load balancing with speed scaling and guide new designs.

Index Terms: Load balancing, speed scaling, energy efficiency, efficiency loss, data centers.

I. INTRODUCTION

The energy consumption rate of computer and communication systems has been increasing exponentially. Computer and communication systems must make a fundamental tradeoff between performance and energy usage; see, e.g., [1], [2]. The addition of energy to standard performance metrics such as delay, throughput and loss fundamentally changes the problem space of some resource allocation designs. Not only are new mechanisms needed to optimize energy usage, but existing algorithms and protocols must also be re-examined, as a formerly optimal algorithm may now perform poorly with respect to a new energy-aware metric. Energy management decisions must be decomposed and coordinated spatially as well as temporally, and yet global optimality must be achieved through local algorithms that are implementable in a distributed manner.
In this paper we study load balancing and its interaction with speed scaling. Energy-aware speed scaling, which adapts the speed of the system so as to balance energy and performance metrics, is a widely adopted power management technique; see, e.g., [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13]. Previous works on speed scaling usually focus on a single server and study its interaction with scheduling; see, e.g., [4], [8], [9], [10], [13]. Here we consider a network setting and study the interaction of speed scaling with load balancing, to provide insights into such issues as:

1) How does the system perform under speed scaling in terms of traditional performance metrics as well as energy-aware metrics?
2) How to design energy-aware optimal load balancing, and can we decouple the design of load balancing from that of speed scaling?
3) How does the sophistication of speed scaling impact the design and performance of load balancing?

We focus on gated-static speed scaling in processor sharing systems, and our results provide useful insights into the first two questions. Specifically, we characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing design problems, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima.

L. Chen is with Computer Science and Telecommunications, University of Colorado, Boulder, CO 80309, USA (email: lijun.chen@colorado.edu). N. Li is with Electrical Engineering, Harvard University, Cambridge, MA 02138, USA (email: nali@seas.harvard.edu). Preliminary results of this paper were presented at the INFORMS Annual Meeting, Austin, Texas, 2010, and the ITA Workshop, La Jolla, California, 2011.
In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium, in terms of delay as well as the energy-aware metric. We show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but independent of the number of servers in the system. Our results suggest that, since in many applications a low-order polynomial provides a good approximation to the power function, we can decouple the design of load balancing from speed scaling without incurring much inefficiency in delay. In terms of the power-aware performance metric, our results suggest that, as long as the heterogeneity in the system is small, we can decouple the design of load balancing from speed scaling without incurring much efficiency loss; but when the heterogeneity in the system is large, we have to do energy-aware load balancing if energy consumption is a main concern.

To summarize, we make the following main contributions in this paper:

1) We formulate three different models to study the interaction of load balancing and speed scaling: energy-oblivious load balancing, where the dispatcher minimizes the delay experienced by a job; delay-optimal load balancing, where the dispatcher minimizes the overall delay incurred at the servers; and energy-aware optimal load balancing, where the dispatcher minimizes the overall energy consumption at the servers.
2) We characterize the equilibria of the above three load balancing designs in terms of the set of active servers, and propose distributed algorithms for achieving the equilibria. Our algorithms have low implementation complexity, and require only information that can be estimated or measured locally at the dispatcher and the servers.
3) We characterize the efficiency loss of the energy-oblivious load balancing (which dominates the current practice) in delay and energy consumption, and show that the degree of inefficiency is mostly bounded by the heterogeneity of the system. These results provide insights into the interaction of load balancing with speed scaling and guide new designs.
4) We provide numerical examples to demonstrate the convergence of the proposed algorithms, and verify the bounds on the efficiency loss.

The paper is organized as follows. The next section briefly discusses some related work. Section III describes the system model. Section IV gives a brief characterization of the load-balancing-speed-scaling equilibrium, and introduces two optimal load balancing design problems. Section V studies in detail the load-balancing-speed-scaling interaction in processor sharing systems with gated-static speed scaling. Section VI provides numerical examples to complement the theoretical analysis, and Section VII concludes with some discussion on further research.

Notation: major notation used in this paper is summarized in Table I.

TABLE I
NOTATIONS

$N$                  Set of servers
$\lambda$            Job arrival rate at the dispatcher
$\lambda_i, s_i$     Job arrival rate and service rate at server $i \in N$
$F_i$                Performance metric at server $i \in N$
$T_i$                Delay at server $i \in N$
$c_i(s_i)$           Operating cost at speed $s_i$ of server $i \in N$
$P_i(s_i)$           Power function of server $i \in N$
$\alpha_i$           Order of polynomial power function of server $i \in N$
$M_i$                Cost-aware performance metric at server $i \in N$
$e$                  Superscript denoting LBSS equilibrium
$*$                  Superscript denoting delay-optimal LB
$+$                  Superscript denoting energy-aware optimal LB
$C^e$                Social cost in delay at LBSS equilibrium
$C^o$                Optimal cost in delay
$D^e$                Energy-aware social cost at LBSS equilibrium
$D^o$                Optimal energy-aware cost

II. RELATED WORK

Power management techniques have been increasingly adopted in designs from the single-device level, such as chips, to the network level, such as data centers.
It has spurred a new branch of research in its own right. In particular, starting with Yao et al. [15], there is extensive research on the analytical study of speed scaling; see, e.g., [16], [17], [18], [19], [20], [21], [4], [22], [8], [23], [24], [9], [10], [11], [12], [13]. Bansal et al. [8] show that a speed scaling policy (SRPT, $P^{-1}(n+1)$) is 3-competitive for regular power functions in the worst-case analysis. This result has been tightened and extended to PS scheduling as well as to stochastic analysis by Andrew et al. [10]. In particular, Andrew et al. [10] provide a comprehensive study of speed scaling and its interaction with scheduling, and show a fundamental tradeoff between optimality, fairness and robustness in speed scaling designs. Stanojevic and Shorten [11] study distributed speed scaling to minimize energy consumption subject to performance constraints.

Son and Krishnamachari [12] study speed-scaling-aware load balancing for cellular networks, and their model has structural similarity to ours for the energy-aware optimal load balancing (see Section V-C). However, their model includes delay and energy consumption in the networking components, in addition to those in the computing/processing components. Their equilibrium characterization focuses on user association, while ours focuses on the set of active servers. Their iterative algorithm is based heuristically on a variant of the equilibrium characterization (i.e., user association at the optimum), while ours is based on the gradient method.

Related work also includes [25], [26], which show that the degree of inefficiency in delay for load balancing in processor sharing systems with fixed server speeds scales with the number of servers in the system. This result has been extended to the processor sharing system with multi-class load [27], and to other scheduling policies such as SRPT [28]. In contrast to these results, we show that the degree of inefficiency in delay for load balancing in processor sharing systems with speed scaling is bounded by the heterogeneity of the system, but independent of the number of servers.

III. SYSTEM MODEL

Consider a system with a set $N$ of servers (see footnote 1) and a Poisson arrival process of rate $\lambda > 0$; see Figure 1. We assume that job size is i.i.d., and without loss of generality, has a mean of 1. Associated with each server $i$ is a service rate (or speed) $s_i$. There is a load balancing dispatcher that probabilistically routes arrivals to servers according to a certain traditional performance metric $F_i$ that end users are concerned with, so that $F_i$ at each server is the same and minimal. The metric $F_i$ can be, for example, the mean response time $E[T_i]$ at the server, the summation of $E[T_i]$ and propagation delay $\tau_i$, the blocking probability $p_i$, etc. It follows that the resulting arrival process to server $i$ is Poisson with rate $\lambda_i$. We assume that server $i$'s performance curve $F_i = f_i(s_i, \lambda_i)$ (or its analytical approximation) is continuously differentiable, increasing in arrival rate $\lambda_i$, and decreasing in service capacity $s_i$ with $f_i(\infty, \lambda) = 0$. This is a rather general assumption. In order to ensure stability, we must have $\lambda_i < s_i$ for all $i \in N$. We can thus assume that $f_i(s_i, \lambda_i) = \infty$ when $\lambda_i \ge s_i$.

Footnote 1: Here a server can be a single server, or represent a cluster of collocated servers in, e.g., a micro-datacenter.

[Fig. 1. A pictorial diagram of the system model: a dispatcher splits the arrival stream of rate $\lambda$ into rates $\lambda_1, \lambda_2, \ldots, \lambda_N$ routed to servers with speeds $s_1, s_2, \ldots, s_N$.]

Besides the performance metric $F_i$ that is perceived by end users, each server incurs a certain cost $c_i(s_i)$ per unit time when it runs at a speed of $s_i$. The cost can be, for example, the power expended at the server, or any other type of service cost. Given an incoming rate of $\lambda_i$, let $g_i(s_i, \lambda_i) = E[c_i(s_i)]$ denote the average cost. The average cost depends on the speed as well as the scheduling policy at the server. The cost function $g_i(s_i, \lambda_i)$ (or its analytical approximation) is assumed to be continuously differentiable, increasing in $s_i$, and nondecreasing in $\lambda_i$.

Given arrival rate $\lambda_i$ and scheduling policy, each server will choose a speed $s_i$ to minimize a cost-aware performance metric $M_i$:

$$ M_i = g_i(s_i, \lambda_i) + \beta_i \lambda_i f_i(s_i, \lambda_i), \tag{1} $$

where $\beta_i > 0$ is used to characterize the relative weight of internal cost and traditional performance metric. By the above model, we have actually assumed some kind of static speed scaling, i.e., choosing a single speed $s_i$ for a given arrival rate $\lambda_i$. With more complicated notation, we can also model dynamic scaling, i.e., adapting speed to different states such as the number of jobs in the server.

Speed scaling can be broadly defined as any behavior of adapting speed to load, and can be due to various reasons, corresponding to different choices of cost function $g_i(s_i, \lambda_i)$. In this paper, we will mostly focus on energy-aware speed scaling as a concrete system to study the interaction between load balancing and speed scaling, and consider the following performance metric:

$$ M_i = E[P_i(s_i)] + \beta_i \lambda_i E[T_i], \tag{2} $$

where $P_i(s_i)$ is the power expended when server $i$ runs at speed $s_i$. The modeling of the power function $P_i(s_i)$ is an active research topic, and measurements have shown it can take on different forms depending on the system involved. In many applications a low-order polynomial form

$$ P_i(s_i) = k_i s_i^{\alpha_i}, \quad k_i > 0,\ \alpha_i > 1 \tag{3} $$

provides a good approximation. For example, for dynamic power in CMOS, $P_i$ is often assumed to be cubic in previous works [2]. We will focus on the polynomial power function (3) in this paper, as in many previous works on speed scaling.

IV. LOAD-BALANCING-SPEED-SCALING INTERACTION

In this section, we characterize the equilibrium resulting from the interaction between load balancing and speed scaling for the general model described in Section III.
We then introduce two optimal load balancing problems under speed scaling: F-optimal load balancing and cost-aware optimal load balancing. We intend to characterize the equilibrium with respect to those two optimal load balancing problems, as well as to propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima.

Given server speeds $(s_i)_{i \in N}$, denote the set of servers used at load balancing by $N_b$, i.e., $i \in N_b$ iff $\lambda_i > 0$. At load balancing, the $F_i$ value at any server $i \in N_b$ is thus the same, and not larger than the $F_j$ value a job would experience if routed to any unused server $j \in N \setminus N_b$. This can be written mathematically as

$$ f_i(s_i, \lambda_i) \le f_j(s_j, \lambda_j), \quad \forall j \in N,\ i \in N_b, \tag{4} $$
$$ \sum_{i \in N} \lambda_i = \lambda, \tag{5} $$

where $(\lambda_i)_{i \in N}$ are the arrival rates at the servers at load balancing. Denote the $F_i$ value at server $i \in N_b$ at load balancing by $\gamma$. The load balancing condition (4)-(5) can be equivalently written as: there exists a $\gamma > 0$, such that

$$ (f_i(s_i, \lambda_i) - \gamma)(\tilde{\lambda}_i - \lambda_i) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{6} $$
$$ \sum_{i \in N} \lambda_i = \lambda. \tag{7} $$

To see this equivalence, note that equations (6)-(7) imply that $\gamma$ must equal the $F_i$ value at server $i \in N_b$ at load balancing.

Assume that the speed scaling problem $\min_{s_i > \lambda_i} M_i$ has a unique solution $s_i(\lambda_i)$. Under the aforementioned assumptions on $f_i$ and $g_i$, speed scaling $s_i(\lambda_i)$ satisfies:

$$ \frac{\partial g_i(s_i, \lambda_i)}{\partial s_i} + \beta_i \lambda_i \frac{\partial f_i(s_i, \lambda_i)}{\partial s_i} = 0. \tag{8} $$

Notice that the dynamic speed range of a server is usually finite, i.e., $s_i \le r_i$ for some $r_i > 0$. For simplicity, we do not consider such a constraint in this paper. Such a constraint does not change the general structure of our model, since it does not change the convexity and the distributed decomposition structure of the model. But it will affect the characterization of efficiency loss in Section V. However, removing the speed range constraint is reasonable for two reasons. First, one key aspect of this paper is to study the impact of speed scaling, but a speed range constraint (that is tight relative to the job arrival rate) will limit the capability of, or even disable, a server's speed scaling.
Second, the computing capacity is usually not a constraint; in fact, a main motivation for speed scaling is to scale down idle server capacity in order to save energy.

Definition 1. The load-balancing-speed-scaling (LBSS) equilibrium is defined as a triple $\{(\lambda_i^e)_{i \in N}, (s_i^e)_{i \in N}, \gamma^e\}$ that satisfies the variational inequalities (6), (7) and (8).

The performance of the system under load balancing and speed scaling is determined by the LBSS equilibrium. At the LBSS equilibrium $\{(\lambda_i^e)_{i \in N}, (s_i^e)_{i \in N}, \gamma^e\}$, $s_i^e = s_i(\lambda_i^e)$ and

$$ (f_i(s_i(\lambda_i^e), \lambda_i^e) - \gamma^e)(\tilde{\lambda}_i - \lambda_i^e) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{9} $$
$$ \sum_{i \in N} \lambda_i^e = \lambda. \tag{10} $$
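Condition (9) says that every server that receives traffic sees the common value $\gamma^e$, while a server with zero load must not offer a smaller value. A minimal Python sketch of that check follows. The function name `satisfies_lbss_conditions`, the tolerance, and the sample numbers are hypothetical, and the values $F_i$ are assumed to have been evaluated already at the speeds $s_i(\lambda_i)$.

```python
def satisfies_lbss_conditions(lam, f_values, gamma, total, tol=1e-9):
    """Check (9)-(10) for a candidate allocation.

    lam      -- candidate arrival rates lambda_i (hypothetical input)
    f_values -- F_i evaluated at (s_i(lambda_i), lambda_i), assumed precomputed
    gamma    -- candidate common performance level
    total    -- total arrival rate lambda at the dispatcher
    """
    # Feasibility: nonnegative rates that add up to the total arrival rate (10).
    if any(l < 0.0 for l in lam) or abs(sum(lam) - total) > tol:
        return False
    for l, f in zip(lam, f_values):
        # A used server must sit exactly at the common level gamma.
        if l > tol and abs(f - gamma) > tol:
            return False
        # An unused server must not be strictly better than gamma.
        if l <= tol and f < gamma - tol:
            return False
    return True


# Illustrative numbers only: the third server is idle and slower than gamma.
print(satisfies_lbss_conditions([2.0, 1.0, 0.0], [1.5, 1.5, 1.8], 1.5, 3.0))
```

The two branches inside the loop are the two cases of the variational inequality: for $\lambda_i > 0$ the test direction $\tilde{\lambda}_i - \lambda_i$ can take either sign, which forces equality, and for $\lambda_i = 0$ it can only be nonnegative, which leaves a one-sided bound.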

The following result is straightforward [29].

Theorem 2. The LBSS equilibrium satisfies the local optimality condition for the following optimization problem:

$$ \min_{\lambda_i \ge 0} \ \sum_{i \in N} \int_0^{\lambda_i} f_i(s_i(z), z)\, dz \tag{11} $$
$$ \text{s.t.} \ \sum_{i \in N} \lambda_i = \lambda, \tag{12} $$

and $\gamma^e$ is the corresponding optimal dual variable.

Proof. Note that the LBSS equilibrium condition (9)-(10) is a variational inequality characterization of the optimality condition for optimization problem (11)-(12) and its dual [29].

An optimization problem characterization of the equilibrium is usually very useful. It captures the global structure of the problem, and often we can easily tell from the optimization problem whether there exists an equilibrium and how many equilibria there are, as well as derive a distributed or efficient algorithm to reach the equilibrium. When there is no speed scaling, i.e., $s_i$ is fixed, we recover the optimization problem characterization of usual load balancing. In this situation, problem (11)-(12) is strictly convex as $f_i(s_i, \lambda_i)$ is an increasing function of $\lambda_i$, and the equilibrium is unique.

In general, there may be no or multiple LBSS equilibria, depending on properties of the performance curve $f_i(s_i(\lambda_i), \lambda_i)$ under speed scaling. For example, consider performance metric (2) with power function (3) in a processor sharing system with gated-static speed scaling (see the next section). Speed scaling $s_i(\lambda_i)$ satisfies

$$ \frac{\beta_i}{(s_i - \lambda_i)^2} = k_i(\alpha_i - 1)\, s_i^{\alpha_i - 2}. $$

When $\alpha_i < 2$, $f_i(s_i(\lambda_i), \lambda_i)$ is decreasing. So, problem (11)-(12) becomes a problem of minimizing a concave objective function, which is usually a hard computing problem and may admit multiple solutions.

In the above load balancing model, the dispatcher routes the arrivals according to the traditional performance metric $F_i$ but does not consider the internal cost $g_i$ of the server. We call this model cost-oblivious load balancing (e.g., energy-oblivious in the case of energy-aware speed scaling). It can also be seen as a selfish routing game where each job chooses a server with minimal $F_i$ value [30]. So, the LBSS equilibrium might not be socially optimal, in terms of metric $F_i$ as well as the energy-aware metric $M_i$.
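The role of $\alpha_i$ in the example above can be illustrated numerically. The sketch below assumes the first-order condition $\beta_i/(s_i - \lambda_i)^2 = k_i(\alpha_i - 1)s_i^{\alpha_i - 2}$ quoted above and the processor sharing delay $1/(s_i - \lambda_i)$ used in Section V. It solves for the speed by bisection, which is a choice made here for illustration, and the names `optimal_speed` and `delay` and all default parameter values are hypothetical.

```python
def optimal_speed(lam, beta=1.0, k=1.0, alpha=3.0, tol=1e-12):
    """Solve beta / (s - lam)^2 = k * (alpha - 1) * s^(alpha - 2) for s > lam.

    The left-minus-right residual h(s) below is increasing on (lam, inf)
    for alpha > 1, so bisection on a bracketing interval is sufficient.
    """
    def h(s):
        return k * (alpha - 1.0) * s ** (alpha - 2.0) * (s - lam) ** 2 - beta

    lo, hi = lam, lam + 1.0
    while h(hi) < 0.0:            # grow the bracket until the sign changes
        hi = lam + 2.0 * (hi - lam)
    while hi - lo > tol:          # h is negative just above lam, positive at hi
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay(lam, **params):
    """Mean response time 1 / (s(lam) - lam) under the chosen speed."""
    return 1.0 / (optimal_speed(lam, **params) - lam)


for alpha in (1.5, 2.0, 3.0):
    values = [delay(lam, alpha=alpha) for lam in (0.5, 1.0, 2.0)]
    print(alpha, [round(v, 4) for v in values])
```

With these illustrative parameters the delay falls with load for $\alpha_i = 1.5$, stays constant for $\alpha_i = 2$, and rises with load for $\alpha_i = 3$, which is the behavior that makes the case $\alpha_i < 2$ a concave minimization.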
As we mentioned before, speed scaling brings additional dimensions such as energy into the design objective. It is of significant value to study its interaction with the existing algorithms and protocols, e.g., whether it is optimal with respect to the traditional performance metric $F_i$ as well as a new one $M_i$, how to design distributed optimal algorithms in terms of the new performance metric, and whether we can decouple speed scaling from other resource allocation mechanisms. In order to study these questions for load balancing, we consider two new load balancing models, as follows.

F-optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the traditional performance metric $F_i$:

$$ \min_{\lambda_i \ge 0} \ \sum_{i \in N} \lambda_i f_i(s_i(\lambda_i), \lambda_i) \tag{13} $$
$$ \text{s.t.} \ \sum_{i \in N} \lambda_i = \lambda. \tag{14} $$

When $F_i = E[T_i]$, we call it delay-optimal load balancing.

Cost-aware optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the cost-aware performance metric $M_i$:

$$ \min_{\lambda_i \ge 0} \ \sum_{i \in N} \left[ g_i(s_i(\lambda_i), \lambda_i) + \beta_i \lambda_i f_i(s_i(\lambda_i), \lambda_i) \right] \tag{15} $$
$$ \text{s.t.} \ \sum_{i \in N} \lambda_i = \lambda. \tag{16} $$

We call it energy-aware optimal load balancing in the case of energy-aware speed scaling.

The end users as a whole care about problem (13)-(14), and the servers and end users as a whole care about problem (15)-(16). We intend to characterize the LBSS equilibrium with respect to them, as well as to propose distributed algorithms to achieve the corresponding equilibrium or optima. Again, the general problems (13)-(14) and (15)-(16) may be highly nontrivial, depending on the performance curve $f_i$ under speed scaling. In the remainder of this paper, we will focus on load balancing with energy-aware speed scaling in processor sharing systems with performance metric (2) and power function (3), as a concrete system to study the interaction between load balancing and speed scaling. We will leave the general problem to future work.

V. LOAD-BALANCING-SPEED-SCALING INTERACTION IN PROCESSOR SHARING SYSTEMS

In this section, we consider energy-aware speed scaling in processor sharing (PS) systems with performance metric (2) and power function (3).
While general speed scaling policies can be taken at a server, we focus on gated-static speed scaling, in which the server has zero speed when there is no job and otherwise runs at a constant speed that balances the response time and energy usage; see, e.g., [9], [10]. Gated-static speed scaling is the simplest nontrivial speed scaling. It requires minimal hardware support. For example, a CMOS chip may set a constant clock speed but AND it with the gating signal to set the speed to 0 when there is no job [10]. Gated-static speed scaling captures some of the essence of dynamic speed scaling while admitting more tractable analysis.

As mentioned in Section IV, when $\alpha_i < 2$, the problem under gated-static speed scaling may become a hard problem of minimizing a concave objective function. We thus focus on the system with $\alpha_i \ge 2$, in order to obtain a clean characterization and gain insights. Power functions with $\alpha_i \ge 2$ are also practically important: in a server with a power function with $\alpha_i \ge 2$, energy cost is usually the driving force in deciding on server speed, while in a server with a power function with $\alpha_i < 2$, the traditional performance metric is the driving force. Besides, the results obtained for gated-static speed scaling with $\alpha_i \ge 2$ are expected to carry over to static provisioning with $\alpha_i \ge 1$, in which the server runs at a constant static speed that is chosen based on workload to balance the response time and energy usage. Static provisioning is the simplest form of speed scaling, and is a model often used in energy-aware capacity provisioning in data centers.

A. Energy-oblivious load balancing

Under PS scheduling, the mean response time at server $i$ takes the form:

$$ f_i(s_i, \lambda_i) = \frac{1}{s_i - \lambda_i}. \tag{17} $$

Under gated-static speed scaling, the energy cost is only incurred during the time when the server is busy. Note that the fraction of the time when the server is busy is $\lambda_i / s_i$. So, the server decides on speed $s_i$ by solving the following optimization problem:

$$ \min_{s_i > \lambda_i} \ \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i). \tag{18} $$

Thus, the speed scaling $s_i(\lambda_i)$ satisfies

$$ -\frac{\tilde{\beta}_i}{(s_i - \lambda_i)^2} + s_i^{\alpha_i - 2} = 0, \tag{19} $$

where $\tilde{\beta}_i = \frac{\beta_i}{k_i(\alpha_i - 1)}$. By equation (19), we have

$$ s_i'(\lambda_i) = \frac{2 s_i(\lambda_i)}{\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i} > 0, \tag{20} $$
$$ s_i''(\lambda_i) = \frac{(2\alpha_i - 4)\left(s_i(\lambda_i) - \lambda_i s_i'(\lambda_i)\right)}{\left(\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i\right)^2} \ge 0, \tag{21} $$

where the second inequality follows from the fact that $s_i'(\lambda_i) \le 1$, and moreover, $s_i'(\lambda_i) = 1$ and $s_i''(\lambda_i) = 0$ if and only if $\alpha_i = 2$. Hence, speed scaling $s_i(\lambda_i)$ is a strictly increasing, convex function of $\lambda_i$. Further,

$$ f_i(s_i(\lambda_i), \lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i} = \sqrt{\frac{(s_i(\lambda_i))^{\alpha_i - 2}}{\tilde{\beta}_i}} \tag{22} $$

is also a strictly increasing function of $\lambda_i$.

Corollary 3. There exists a unique LBSS equilibrium for processor sharing systems with gated-static speed scaling.

Proof. By Theorem 2, the LBSS equilibrium satisfies the optimality conditions for the optimization problem:

$$ \min_{\lambda_i \ge 0} \ \sum_{i \in N} \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz \tag{23} $$
$$ \text{s.t.} \ \sum_{i \in N} \lambda_i = \lambda. \tag{24} $$

Since $\frac{1}{s_i(\lambda_i) - \lambda_i}$ is strictly increasing in $\lambda_i$, the above optimization problem is strictly convex. The existence and uniqueness of the LBSS equilibrium follows from the fact that problem (23)-(24) has a unique optimum [29].

Now, let us characterize the equilibrium. For each server $i$, define the base service rate $s_i^0 = s_i(0^+) = \tilde{\beta}_i^{1/\alpha_i}$ (see footnote 2). Without loss of generality, we assume that $s_1^0 \ge s_2^0 \ge \cdots \ge s_N^0$. For later convenience, we also assume that $s_{N+1}^0 = 0$.

Theorem 4.
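The set of servers that are used at the equilibrium is $N_e = \{1, 2, \ldots, n\}$, with a unique $n$ that satisfies

$$ \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \lambda \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right), \tag{25} $$

where

$$ \bar{f}_i(\lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i}. \tag{26} $$

Proof. By equilibrium condition (9), we have $\frac{1}{s_i^0} < \gamma^e$ if $i \in N_e$ and $\frac{1}{s_i^0} \ge \gamma^e$ otherwise. Further,

$$ \lambda_i^e = s_i^e - \frac{1}{\gamma^e} > 0, \quad \text{if } \frac{1}{s_i^0} < \gamma^e, \tag{27} $$
$$ \lambda_i^e = 0, \quad \text{if } \frac{1}{s_i^0} \ge \gamma^e. \tag{28} $$

Since $s_i^0$ is decreasing in $i$, $N_e$ takes the form of $\{1, 2, \ldots, n\}$. Note that $\frac{1}{s_n^0} < \gamma^e \le \frac{1}{s_{n+1}^0}$, and $\bar{f}_i(\lambda_i)$ is an increasing function. So,

$$ \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \sum_{i=1}^{n} (\bar{f}_i)^{-1}(\gamma^e) = \sum_{i=1}^{n} \lambda_i^e = \lambda \le \sum_{i=1}^{n} (\bar{f}_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right), $$

which is (25). The uniqueness of $n$ follows from the fact that the LBSS equilibrium is unique.

We see that the LBSS equilibrium has a water-filling structure. If we see load balancing as a selfish routing problem [30], the arrivals will aggressively occupy fast servers with low delay first.

1) Distributed load balancing algorithm: The (convex) optimization problem characterization of the LBSS equilibrium also suggests a distributed algorithm to achieve the equilibrium. At the $k$-th iteration:

Each server $i$ estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$ s_i(k) = s_i(\lambda_i(k)). \tag{29} $$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server. Denote by $E[t(k)]$ the minimal $t(k)$ at step $k$ such that $t(k) = \frac{1}{|N(k)|} \sum_{i \in N(k)} t_i(k)$ with $N(k) := \{i \mid \lambda_i(k) > 0 \text{ or } t_i(k) \le t(k),\ i \in N\}$ (see footnote 3). The dispatcher adjusts $\lambda_i$ to each server according to

$$ \lambda_i(k+1) = \left[\lambda_i(k) - \varepsilon\left(t_i(k) - E[t(k)]\right)\right]^+, \tag{30} $$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $R^+$, the set of nonnegative real numbers.

When $\varepsilon$ is small enough, the above algorithm converges. Let $\delta_i(k) = \lambda_i(k+1) - \lambda_i(k)$. It is easy to verify that

$$ \sum_{i \in N} \delta_i(k) = 0, \tag{31} $$
$$ \sum_{i \in N} \delta_i(k)\, t_i(k) \le 0. \tag{32} $$

We see that $\sum_{i \in N} \delta_i(k) t_i(k) = 0$ only if $\delta_i(k) = 0$, which requires $t_i = t$, or $\lambda_i = 0$ and $t_i > t$. The above algorithm actually follows the negative gradient direction of $\sum_{i \in N} \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz$ subject to $\sum_{i \in N} \lambda_i = \lambda$ [29]. Any algorithm that follows a properly chosen negative gradient direction would work, and (30) picks a specific gradient direction that facilitates the convergence analysis. We skip the convergence proof for brevity.

The above distributed algorithm, as well as the other two proposed in Section V.B.1) and Section V.C.1), has low implementation complexity. All the information required in the algorithm can be estimated or measured locally at the dispatcher and individual servers. Such algorithms are highly desirable in a network setting that may involve a large number of servers.

Footnote 2: For a function $f(x): R \to R$, $f(a^+)$ denotes the right-hand limit $\lim_{x \to a^+} f(x)$.

Footnote 3: $t(k)$ and $N(k)$ can be determined in a recursive way as follows. In the beginning, let $N(k) = N$ and calculate $t(k) = \frac{1}{|N(k)|} \sum_{i \in N(k)} t_i(k)$, and then exclude from $N(k)$ those servers such that $\lambda_i = 0$ and $t_i > t$. Repeat the same procedure with the new set $N(k)$, and when it stops we get $E[t(k)]$.

B. Delay-optimal load balancing

In this subsection, we study the delay-optimal load balancing design:

$$ \min_{\lambda_i \ge 0} \ \sum_{i \in N} \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} \tag{33} $$
$$ \text{s.t.} \ \sum_{i \in N} \lambda_i = \lambda, \tag{34} $$

and characterize the LBSS equilibrium with respect to it. By equation (19),

$$ \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} = \frac{s_i^{\alpha_i/2}}{\sqrt{\tilde{\beta}_i}} - 1, \tag{35} $$

which is strictly increasing and convex in $s_i$. Note that $s_i(\lambda_i)$ is increasing and convex. It follows that $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is a strictly convex function of $\lambda_i$ (see footnote 4). So, problem (33)-(34) is strictly convex, and has a unique optimum. Denote the optimum by $(\lambda_i^*)_{i \in N}$. There exists a unique $\gamma^* > 0$, such that the optimality condition can be written as [29]

$$ \left( \frac{s_i(\lambda_i^*) - \lambda_i^* s_i'(\lambda_i^*)}{\left(s_i(\lambda_i^*) - \lambda_i^*\right)^2} - \gamma^* \right)(\tilde{\lambda}_i - \lambda_i^*) \ge 0, \quad \forall \tilde{\lambda}_i \ge 0, \tag{36} $$
$$ \sum_{i \in N} \lambda_i^* = \lambda. \tag{37} $$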
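^4 Note that, when $\alpha_i = 2$, $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is not strictly convex but linear in $\lambda_i$. But this would not change the uniqueness of the optimum.

Theorem 5. The set of servers that are used at the optimum is $N^o = \{1, 2, \ldots, n^*\}$, with a unique $n^*$ that satisfies

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}\!\left(\frac{1}{s_{n^*}^0}\right) < \lambda \le \sum_{i=1}^{n^*} (\hat{f}_i)^{-1}\!\left(\frac{1}{s_{n^*+1}^0}\right), \tag{38}$$

where

$$\hat{f}_i(\lambda_i) = \frac{s_i(\lambda_i) - \lambda_i s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}. \tag{39}$$

Moreover, $\gamma^* \ge \gamma^e$ and $n^* \ge n$.

Proof. Note that $\hat{f}_i(\lambda_i)$ is an increasing function of $\lambda_i$, and $\hat{f}_i(0) = \frac{1}{s_i^0}$. The first part of the theorem follows the same proof as in Theorem 4. For the second part of the theorem, note that $s_i'(\lambda_i) \le 1$ by equation (20). Thus, $\hat{f}_i(\lambda_i) \ge f_i(\lambda_i)$. If $\gamma^* < \gamma^e$, then $n^* \le n$ and

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) < \sum_{i=1}^{n^*} (f_i)^{-1}(\gamma^*) \le \sum_{i=1}^{n^*} (f_i)^{-1}(\gamma^e) \le \lambda.$$

This contradicts $\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) = \sum_{i=1}^{n^*} \lambda_i^* = \lambda$. So, $\gamma^* \ge \gamma^e$, and $n^* \ge n$ follows.

Notice that $\gamma^e$ has the interpretation of the delay under energy-oblivious load balancing, but $\gamma^*$ does not have such an interpretation as delay. So, $\gamma^* \ge \gamma^e$ does not imply a larger delay under delay-optimal load balancing. In fact, under delay-optimal load balancing different servers may have different delays, and the whole system has the best overall delay performance.

1) Distributed load balancing algorithm: Delay-optimal load balancing is a convex problem. We can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design. At the $k$-th iteration:

Each server estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{40}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $\hat{f}_i$ according to

$$\hat{f}_i(k) = \hat{f}_i(\lambda_i(k)) = \frac{\alpha_i \lambda_i(k) (t_i(k))^2 + \alpha_i t_i(k)}{2 \lambda_i(k) t_i(k) + \alpha_i}. \tag{41}$$

Denote by $E[\hat{f}(k)]$ the minimal $\bar{\hat{f}}(k)$ at step $k$ such that $\bar{\hat{f}}(k) = \frac{1}{|\bar{N}(k)|} \sum_{i \in \bar{N}(k)} \hat{f}_i(k)$ with $\bar{N}(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } \hat{f}_i(k) \le \bar{\hat{f}}(k),\ i \in N \}$. The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = \left[ \lambda_i(k) - \varepsilon \left( \hat{f}_i(k) - E[\hat{f}(k)] \right) \right]^+, \tag{42}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}_+$, the set of nonnegative real numbers.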
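As a sanity check on the measurable form (41), the sketch below compares it with a finite-difference derivative of the per-server delay cost $\lambda_i / (s_i(\lambda_i) - \lambda_i)$, which is what $\hat{f}_i$ in (39) represents. The helper names are hypothetical, and the speed rule is again the assumed root of $(\alpha_i - 1) k_i s^{\alpha_i - 2}(s - \lambda_i)^2 = \beta_i$ for a polynomial power function; neither is taken verbatim from the text.

```python
def speed(lam, k, alpha, beta):
    """Assumed gated-static speed: root s > lam of
    (alpha-1)*k*s**(alpha-2)*(s-lam)**2 = beta, found by bisection."""
    def g(s):
        return (alpha - 1.0) * k * s ** (alpha - 2.0) * (s - lam) ** 2 - beta
    lo, hi = lam, lam + 1.0
    while g(hi) < 0.0:
        hi = lam + 2.0 * (hi - lam)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay_cost(lam, k, alpha, beta):
    """Per-server term of the objective (33): lam / (s(lam) - lam)."""
    return lam / (speed(lam, k, alpha, beta) - lam)


def f_hat(lam, t, alpha):
    """Marginal delay cost from measured load lam and delay t, as in (41)."""
    return (alpha * lam * t * t + alpha * t) / (2.0 * lam * t + alpha)


def f_hat_numeric(lam, k, alpha, beta, h=1e-5):
    """Central finite difference of delay_cost, for comparison with f_hat."""
    return (delay_cost(lam + h, k, alpha, beta)
            - delay_cost(lam - h, k, alpha, beta)) / (2.0 * h)
```

Under the stated assumption, `f_hat(lam, t, alpha)` with `t = 1 / (speed(lam, k, alpha, beta) - lam)` agrees with `f_hat_numeric(lam, k, alpha, beta)` up to discretization error, so the dispatcher can evaluate the marginal cost from $\lambda_i$, $t_i$ and $\alpha_i$ alone.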
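Note that the delay-optimal load balancing algorithm (40)-(42) is more complicated than the simple, energy-oblivious load balancing algorithm (29)-(30). It requires estimating $\hat{f}_i$. In addition, it requires the dispatcher to know the servers' power function characteristic parameters $\alpha_i$ and $k_i$.

2) Efficiency loss in delay at the LBSS equilibrium: Define the social cost in delay

$$C = \sum_i \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}. \tag{43}$$

We now characterize the inefficiency in delay at the LBSS equilibrium.

Lemma 6. Let $\bar{\alpha} = \max_i \alpha_i$. Then,

$$\gamma^e \le \gamma^* \le \frac{\bar{\alpha}}{2} \gamma^e. \tag{44}$$

Proof. The first inequality has been proved in Theorem 5. It remains to prove the second one. By equation (35), $\hat{f}_i$ can be written as

$$\hat{f}_i(\lambda_i) = \frac{\alpha_i}{2} \sqrt{\frac{(\alpha_i - 1) k_i}{\beta_i}}\; s_i^{\frac{\alpha_i}{2} - 1}\, s_i'(\lambda_i). \tag{45}$$

Note that $s_i'(\lambda_i)$ is increasing. Thus, $s_i'(\lambda_i) \ge \frac{2}{\alpha_i}$ by equation (20). Combining with $s_i'(\lambda_i) \le 1$, we get

$$f_i(\lambda_i) \le \hat{f}_i(\lambda_i) \le \frac{\alpha_i}{2} f_i(\lambda_i) \le \frac{\bar{\alpha}}{2} f_i(\lambda_i).$$

If $\gamma^* > \frac{\bar{\alpha}}{2} \gamma^e$, then

$$(\hat{f}_i)^{-1}(\gamma^*) \ge (f_i)^{-1}\!\left(\frac{2}{\bar{\alpha}} \gamma^*\right) > (f_i)^{-1}(\gamma^e).$$

Thus,

$$\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) > \sum_{i=1}^{n} (f_i)^{-1}(\gamma^e) = \lambda.$$

This contradicts the fact that $\sum_{i=1}^{n^*} (\hat{f}_i)^{-1}(\gamma^*) = \lambda$ (also note that $n^* \ge n$). So, $\gamma^* \le \frac{\bar{\alpha}}{2} \gamma^e$.

Theorem 7. Denote the social cost in delay at the LBSS equilibrium by $C^e$ and the optimal cost by $C^o$. Then,

$$\frac{C^e}{C^o} \le \frac{\bar{\alpha}}{2}. \tag{46}$$

Proof. The social cost at the LBSS equilibrium is

$$C^e = \lambda \gamma^e. \tag{47}$$

When $\lambda_i^* > 0$, by equations (22), (45) and (44), we have

$$\frac{1}{s_i(\lambda_i^*) - \lambda_i^*} = \sqrt{\frac{(\alpha_i - 1) k_i}{\beta_i}}\; s_i^{\frac{\alpha_i}{2} - 1} = \frac{2 \gamma^*}{\alpha_i s_i'} \ge \frac{2 \gamma^*}{\alpha_i} \ge \frac{2 \gamma^e}{\bar{\alpha}}. \tag{48}$$

So,

$$C^o = \sum_i \frac{\lambda_i^*}{s_i(\lambda_i^*) - \lambda_i^*} \ge \sum_i \frac{2 \gamma^e}{\bar{\alpha}} \lambda_i^* = \frac{2 \lambda \gamma^e}{\bar{\alpha}}. \tag{49}$$

Thus,

$$\frac{C^e}{C^o} \le \frac{\bar{\alpha}}{2}. \tag{50}$$

We see that the degree of inefficiency in delay at the LBSS equilibrium depends only on the order $\bar{\alpha}$ of the power functions. For example, if $\bar{\alpha} = 2$, the LBSS equilibrium achieves the social optimum. As $\bar{\alpha}$ is a constant independent of the number $N$ of servers in the system, this result is very different from the efficiency loss of the usual load balancing (with fixed server speeds), which scales with $N$; see, e.g., [25]. Also, note that $\frac{\bar{\alpha}}{2}$ can be seen as a measure of heterogeneity in power functions. We can thus say that the degree of inefficiency at the LBSS equilibrium is bounded by the heterogeneity of the system. As the power function can usually be well approximated by a low-order polynomial function, the above result suggests a benign interaction between energy-oblivious load balancing and power-aware speed scaling, in terms of delay.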
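The sandwich between the delay and the marginal delay cost that drives Lemma 6 can be checked directly from the measurable form of $\hat{f}_i$: dividing (41) by the delay $t_i$ gives a ratio that depends only on $x = \lambda_i t_i$ and $\alpha_i$. The short sketch below (the function name is illustrative) evaluates that ratio; for $\alpha_i \ge 2$ it stays between $1$ and $\alpha_i / 2$.

```python
def marginal_to_delay_ratio(x, alpha):
    """Ratio f_hat / f obtained by dividing (41) by the delay t,
    written in terms of x = lam * t >= 0."""
    return (alpha * x + alpha) / (2.0 * x + alpha)


def within_lemma6_bounds(x, alpha, tol=1e-12):
    """True if 1 <= f_hat / f <= alpha / 2 (expected for alpha >= 2)."""
    r = marginal_to_delay_ratio(x, alpha)
    return 1.0 - tol <= r <= alpha / 2.0 + tol
```

At $x = 0$ the ratio equals $1$ (an idle server, where $\hat{f}_i = f_i$), and as $x$ grows it approaches $\alpha_i / 2$; with $\alpha_i = 2$ it is identically $1$, consistent with the remark that the equilibrium is then socially optimal in delay.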
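As energy-oblivious load balancing is already employed in practice and simple to implement, we may not need to change it, as it does not incur a large penalty in delay.

C. Energy-aware optimal load balancing

In this subsection, we study the energy-aware optimal load balancing design:

$$\min_{\lambda, s} \ \sum_i \left( \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i P_i(s_i)}{s_i} \right) \tag{51}$$

$$\text{s.t.} \ \sum_i \lambda_i = \lambda, \tag{52}$$

and characterize the LBSS equilibrium with respect to it. By speed scaling (i.e., solving for $s$ first), the above problem reduces to:

$$\min_{\lambda} \ \sum_i h_i(\lambda_i) \tag{53}$$

$$\text{s.t.} \ \sum_i \lambda_i = \lambda, \tag{54}$$

where

$$h_i(\lambda_i) = \beta_i \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} + \frac{\lambda_i P_i(s_i(\lambda_i))}{s_i(\lambda_i)}. \tag{55}$$

Note that

$$h_i'(\lambda_i) = \frac{\beta_i s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2} + k_i (s_i(\lambda_i))^{\alpha_i - 1} = \frac{\alpha_i}{\alpha_i - 1} \frac{\beta_i s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}, \tag{56}$$

$$h_i''(\lambda_i) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \frac{2 s_i(\lambda_i) - (s_i(\lambda_i) + \lambda_i) s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^3}. \tag{57}$$

We see that $h_i' > 0$ and $h_i'' > 0$, and thus $h_i(\lambda_i)$ is strictly increasing and convex. So, problem (53)-(54) is a strictly convex problem, and has a unique optimum. Denote the optimum by $(\lambda_i^+)_{i \in N}$. There exists a unique $\gamma^+ > 0$ such that the optimality condition can be written as [29]

$$(h_i'(\lambda_i^+) - \gamma^+)(\lambda_i - \lambda_i^+) \ge 0, \quad \forall \lambda_i \ge 0, \tag{58}$$

$$\sum_{i \in N} \lambda_i^+ = \lambda. \tag{59}$$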
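The two expressions for the marginal energy-aware cost in (56) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the power function is taken to be $P_i(s) = k_i s^{\alpha_i}$, and the speed rule is assumed to be the root of $(\alpha_i - 1) k_i s^{\alpha_i - 2}(s - \lambda_i)^2 = \beta_i$, i.e. the first-order condition of the inner minimization over $s_i$ in (51). It compares a finite-difference derivative of $h_i$ in (55) with the closed form written in terms of the load and the measured delay.

```python
def speed(lam, k, alpha, beta):
    """Assumed gated-static speed: root s > lam of
    (alpha-1)*k*s**(alpha-2)*(s-lam)**2 = beta, found by bisection."""
    def g(s):
        return (alpha - 1.0) * k * s ** (alpha - 2.0) * (s - lam) ** 2 - beta
    lo, hi = lam, lam + 1.0
    while g(hi) < 0.0:
        hi = lam + 2.0 * (hi - lam)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def energy_aware_cost(lam, k, alpha, beta):
    """h(lam) as in (55) with P(s) = k * s**alpha."""
    s = speed(lam, k, alpha, beta)
    return beta * lam / (s - lam) + lam * k * s ** (alpha - 1.0)


def marginal_cost(lam, t, alpha, beta):
    """Closed form of h'(lam) from (56), using s/(s-lam)**2 = lam*t**2 + t
    where t = 1/(s - lam) is the measured delay."""
    return alpha * beta / (alpha - 1.0) * (lam * t * t + t)


def marginal_cost_numeric(lam, k, alpha, beta, h=1e-5):
    """Central finite difference of energy_aware_cost."""
    return (energy_aware_cost(lam + h, k, alpha, beta)
            - energy_aware_cost(lam - h, k, alpha, beta)) / (2.0 * h)
```

For example, with $k = 2$, $\alpha = 3$, $\beta = 4$ and $\lambda = 2$, the assumed speed rule gives $s(s - 2)^2 = 1$, and both `marginal_cost` and `marginal_cost_numeric` agree with $\alpha k s^{\alpha - 1}$ evaluated at that speed.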

Note that $h_i'(\lambda_i)$ is strictly increasing, and

$$h_i'(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1} \hat{f}_i(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1} f_i(\lambda_i). \tag{60}$$

Let $d_i^0 = h_i'(0) = \frac{\alpha_i}{\alpha_i - 1} \frac{\beta_i}{s_i^0}$. We can define a permutation $\pi : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ such that $d_{\pi(i)}^0$ is in increasing order in $i$, so that servers with lower marginal cost at zero load come first. We have the following characterization of the optimum.

Theorem 8. The set of servers that are used at the optimum is $N^s = \{\pi(1), \pi(2), \ldots, \pi(m^+)\}$, with a unique $m^+$ that satisfies

$$\sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left(d_{\pi(m^+)}^0\right) < \lambda \le \sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left(d_{\pi(m^+ + 1)}^0\right).$$

Proof. It follows the same proof as in Theorem 4. We skip it for brevity.

We see that energy-aware optimal load balancing has a similar water-filling effect, and the arrivals will first occupy servers with low marginal cost in the energy-aware metric. As a result, the jobs will be consolidated onto a subset of servers that have low energy-aware cost.

1) Distributed load balancing algorithm: Energy-aware optimal load balancing is a convex problem. Again, we can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design. At the $k$-th iteration:

Each server estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{61}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $h_i'$ according to

$$h_i'(k) = h_i'(\lambda_i(k)) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \left( \lambda_i(k) (t_i(k))^2 + t_i(k) \right). \tag{62}$$

Denote by $E[h'(k)]$ the minimal $\bar{h}'(k)$ at step $k$ such that $\bar{h}'(k) = \frac{1}{|\bar{N}(k)|} \sum_{i \in \bar{N}(k)} h_i'(k)$ with $\bar{N}(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } h_i'(k) \le \bar{h}'(k),\ i \in N \}$. The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = \left[ \lambda_i(k) - \varepsilon \left( h_i'(k) - E[h'(k)] \right) \right]^+, \tag{63}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}_+$, the set of nonnegative real numbers.

Again, the energy-aware optimal load balancing algorithm (61)-(63) is more complicated than the energy-oblivious load balancing algorithm (29)-(30). In addition to the servers' power function characteristic parameters, the dispatcher needs to know their weights $\beta_i$.

2) Efficiency loss in energy-aware performance metric at the LBSS equilibrium: Define the social cost in the energy-aware performance metric $M$:

$$D = \sum_i \left. \left( \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i P_i(s_i)}{s_i} \right) \right|_{s_i = s_i(\lambda_i)} = \sum_i h_i(\lambda_i). \tag{64}$$

We now characterize the inefficiency in the energy-aware performance metric at the LBSS equilibrium. It is complicated to characterize the efficiency loss for a system with arbitrary power functions and loads. Here we give a partial characterization, focusing on the case with power functions of the same order, i.e., $P_i(s_i) = k_i s_i^{\alpha}$ for all servers, and in heavy traffic, i.e., $\lambda \to \infty$. We leave a complete characterization of the efficiency loss to future work.

The case with power functions of the same order models a system that employs similar servers but with different scaling factors and weights. The heavy traffic regime is of significant interest, as the inefficiency of the load-balancing-speed-scaling interaction is intuitively worst under heavy traffic.

Theorem 9. Assume that $\alpha = 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D^e$ and the optimal cost by $D^o$. Under the aforementioned conditions, we have

$$\frac{D^e}{D^o} \le \frac{\max_i k_i}{\min_i k_i} N. \tag{65}$$

Proof. When $\alpha = 2$, at the LBSS equilibrium $(\lambda_i^e)_{i \in N}$, the arrivals will be routed to the server that has the maximal $\beta_i$ value.^5 Under heavy traffic, the energy-aware social cost at the LBSS equilibrium is

$$D^e \approx k_i \lambda^2 \le \max_i k_i \lambda^2,$$

where $i$ is the server that receives the load. At the social optimum $(\lambda_i^+)_{i \in N}$, $\lambda_i^+ \approx \frac{1/k_i}{\sum_j 1/k_j} \lambda$. The optimal social cost is

$$D^o \approx \sum_i k_i (\lambda_i^+)^2 = \frac{\lambda^2}{\sum_i 1/k_i} \ge \frac{\min_i k_i}{N} \lambda^2.$$

Thus,

$$\frac{D^e}{D^o} \le \frac{\max_i k_i}{\min_i k_i} N. \tag{66}$$

We see that when $\alpha = 2$, the degree of inefficiency at the LBSS equilibrium scales with the number of servers in the system. This happens because energy-oblivious load balancing uses only the server with the largest base rate, which incurs a huge energy cost at this server, while energy-aware optimal load balancing will spread load across all servers, which leads to a much smaller energy cost at the servers. This suggests that we should do energy-aware load balancing if energy consumption is a main concern.

^5 There may exist multiple servers that have the maximal $\beta_i$ value. But it is reasonable to expect that the number of such servers is bounded by a constant that does not scale with the total number of servers in the system. For simplicity of presentation, we assume that there is only one server that has the maximal $\beta_i$ value. If there are multiple such servers, this only brings a constant factor into the bound on efficiency loss.
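The arithmetic behind Theorem 9 is easy to reproduce. Using the heavy-traffic approximations from its proof ($D^e \approx k_i \lambda^2$ for the single server that receives the load, and $D^o \approx \lambda^2 / \sum_j 1/k_j$), the ratio does not depend on $\lambda$. The sketch below (function names are illustrative) evaluates that approximate ratio and the bound (65).

```python
def heavy_traffic_ratio_alpha2(k, used):
    """Approximate D^e / D^o for alpha = 2 in heavy traffic:
    k[used] * lam**2 divided by lam**2 / sum_j(1/k_j)."""
    return k[used] * sum(1.0 / kj for kj in k)


def theorem9_bound(k):
    """Right-hand side of (65): (max_i k_i / min_i k_i) * N."""
    return max(k) / min(k) * len(k)
```

With `k = [2.0, 4.0, 8.0]` and the load sent to the first server, the ratio is `1.75`, well below the bound `12.0`. With identical scaling factors the ratio equals the number of servers, so the linear growth in $N$ in (65) cannot be removed in general.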

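Lemma 10. Assume $\alpha > 2$. Define $\zeta_i = \alpha k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha - 1}{\alpha - 2}}$ for each server $i$. Then,

$$\min_i \zeta_i\, (\gamma^e)^{\frac{2\alpha - 2}{\alpha - 2}} \le \gamma^+ \le \max_i \zeta_i\, (\gamma^e)^{\frac{2\alpha - 2}{\alpha - 2}}. \tag{67}$$

Proof. By equation (56), $h_i'$ can be written as

$$h_i'(\lambda_i) = \alpha k_i (s_i(\lambda_i))^{\alpha - 1} = \zeta_i (f_i(\lambda_i))^{\frac{2\alpha - 2}{\alpha - 2}}. \tag{68}$$

If $\gamma^+ < \min_i \zeta_i (\gamma^e)^{\frac{2\alpha - 2}{\alpha - 2}}$, then

$$(h_i')^{-1}(\gamma^+) < (f_i)^{-1}(\gamma^e).$$

Thus,

$$\sum_i (h_i')^{-1}(\gamma^+) < \sum_i (f_i)^{-1}(\gamma^e) = \lambda.$$

This contradicts the fact that $\sum_i (h_i')^{-1}(\gamma^+) = \lambda$. So, $\gamma^+ \ge \min_i \zeta_i (\gamma^e)^{\frac{2\alpha - 2}{\alpha - 2}}$. The second inequality can be proved similarly.

Theorem 11. Assume $\alpha > 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D^e$ and the optimal cost by $D^o$. Under the aforementioned conditions, we have

$$\frac{D^e}{D^o} \le \left( \frac{\max_i \zeta_i}{\min_i \zeta_i} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{69}$$

Proof. Under heavy traffic, $\lambda \to \infty$. By Lemma 1 in [9], we have the following approximation for speed scaling under heavy traffic:

$$s_i(\lambda_i) \approx \lambda_i + \sqrt{\frac{\beta_i}{(\alpha - 1) k_i}}\; \lambda_i^{1 - \frac{\alpha}{2}} \approx \lambda_i.$$

Thus,

$$\beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i P_i(s_i)}{s_i} \approx \sqrt{(\alpha - 1) k_i \beta_i}\; \lambda_i^{\frac{\alpha}{2}} + k_i \lambda_i^{\alpha} \approx k_i \lambda_i^{\alpha}.$$

Note that, at the LBSS equilibrium $(\lambda_i^e)_{i \in N}$,

$$\gamma^e = \sqrt{\frac{(\alpha - 1) k_i}{\beta_i}}\; (s_i^e)^{\frac{\alpha - 2}{2}} \approx \sqrt{\frac{(\alpha - 1) k_i}{\beta_i}}\; (\lambda_i^e)^{\frac{\alpha - 2}{2}}.$$

The energy-aware social cost at the LBSS equilibrium is

$$D^e \approx \sum_i k_i \left( \frac{\beta_i (\gamma^e)^2}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \le \sum_i k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}, \tag{70}$$

where the inequality follows from (67). At the social optimum $(\lambda_i^+)_{i \in N}$,

$$\gamma^+ = \alpha k_i s_i^{\alpha - 1} \approx \alpha k_i (\lambda_i^+)^{\alpha - 1}. \tag{71}$$

The optimal social cost is

$$D^o \approx \sum_i k_i \left( \frac{\gamma^+}{k_i \alpha} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{72}$$

Thus,

$$\frac{D^e}{D^o} \le \frac{\sum_i k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}}{\sum_i k_i \left( \frac{\gamma^+}{k_i \alpha} \right)^{\frac{\alpha}{\alpha - 1}}} \le \max_i \frac{k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}}{k_i \left( \frac{\gamma^+}{k_i \alpha} \right)^{\frac{\alpha}{\alpha - 1}}} = \max_i \left( \frac{\zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}} = \left( \frac{\max_i \zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{73}$$

We see that when $\alpha > 2$, the degree of inefficiency at the LBSS equilibrium depends only on the degree of heterogeneity $\frac{\max_i \zeta_i}{\min_j \zeta_j}$ in the system but not on the number of servers $N$. If the degree of heterogeneity in the system is small, energy-oblivious load balancing interacts benignly with speed scaling, in terms of the energy-aware cost. In this situation, we may not need complicated energy-aware load balancing, i.e., we can decouple the design of load balancing from speed scaling. Otherwise, we must do energy-aware optimal load balancing if energy consumption is a main concern.

VI. NUMERICAL EXAMPLES

In this section, we provide numerical examples to complement the analysis in previous sections.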
We first show the convergence of the three distributed algorithms proposed in Section V, and then verify the bounds on efficiency loss described in Theorem 7, Theorem 9, and Theorem 11.

We consider a system with 10 servers with speed scaling. Half of the servers have a power function of the form $P_i(s_i) = k_i s_i^{5/2}$ and the other half have a power function of the form $P_i(s_i) = k_i s_i^{3}$. The total load is chosen to be $\lambda = 800$, corresponding to a heavy traffic scenario, and the values for the parameters $k_i$ and $\beta_i$ used to obtain the numerical results are randomly drawn from $[2, 8]$ and $[1, 5]$, respectively. The key consideration in choosing the parameter values is to incorporate enough heterogeneity in the system.

A. Distributed algorithms

Figures 2, 3 and 4 show the evolution of the arrival rate and service rate with stepsize $\varepsilon = 0.2$ for energy-oblivious load balancing, delay-optimal load balancing and energy-aware optimal load balancing, respectively. We see that the arrival rates and service rates approach the corresponding equilibrium or optimum quickly.

The numerical results confirm the previous analysis and intuitions. As we go from energy-oblivious load balancing to delay-optimal load balancing, the load is spread more across the servers, which is driven by minimizing the social cost in delay. We also see that the changes in the arrival rate and service rate are not severe, which intuitively confirms Theorem 7, which gives a small bound on efficiency loss at the LBSS equilibrium. As we move to energy-aware optimal load balancing, the load becomes more evenly distributed. This is driven by minimizing the energy-aware social cost: an uneven load distribution leads to an uneven service rate distribution, which may result in a large energy cost at the server(s) with large speed. We also see large changes in the arrival rate and service rate. This implies a large degree of inefficiency at the LBSS equilibrium, which intuitively confirms Theorem 11 even though it is a characterization for a system with power functions of the same order.

In order to study the impact of different choices of the stepsize on the convergence of the algorithms, we have run simulations with different stepsizes. We found that the smaller the stepsize, the slower the convergence; the larger the stepsize, the faster the convergence, but the system may only approach to within a certain neighborhood of the equilibrium, which is a general characteristic of any gradient-based method. In practice, the dispatcher can first choose large stepsizes to ensure fast convergence, and subsequently reduce the stepsizes once the price starts oscillating around some mean value.

Fig. 2. The arrival rate and service rate evolution of energy-oblivious load balancing. (Two panels: arrival rate $\lambda$ and service rate $s$ of servers 1-10 versus number of iterations.)

Fig. 3. The arrival rate and service rate evolution of delay-optimal load balancing. (Two panels: arrival rate $\lambda$ and service rate $s$ of servers 1-10 versus number of iterations.)

Fig. 4. The arrival rate and service rate evolution of energy-aware optimal load balancing. (Two panels: arrival rate $\lambda$ and service rate $s$ of servers 1-10 versus number of iterations.)

Fig. 5. Ratio of $C^e / C^o$ versus number of servers.

Fig. 6. Ratio of $D^e / D^o$ with homogeneous $\alpha = 2$ versus number of servers.

Fig. 7. Ratio of $D^e / D^o$ with homogeneous $\alpha = 3$ versus number of servers.

B. Comparison between the three load balancing algorithms

Firstly, as shown in Figures 2 and 3, we observe that energy-oblivious load balancing and delay-optimal load balancing generate similar patterns of load scheduling and service rate. This leads to a similar social cost in delay, which is further confirmed in Figure 5. In Figure 5, we simulate systems of different sizes, with the number of servers being $n = 10, 14, \ldots, 50$. Figure 5 plots the ratio $C^e / C^o$, where $C^o$ is the social cost in delay of delay-optimal load balancing and $C^e$ is the social cost in delay of the LBSS equilibrium. We see that the efficiency loss of the LBSS equilibrium is very small with respect to the social cost in delay. These simulation results are consistent with the analysis in Theorem 7.

As shown in Figures 2 and 4, as we go from energy-oblivious load balancing to energy-aware optimal load balancing, the load is spread more across the servers. This is driven by minimizing the energy-aware social cost: an uneven load distribution leads to an uneven service rate distribution, which may result in a large energy cost at the server(s) with large speed. To further verify Theorems 9 and 11, we simulate systems of different sizes, with the number of servers being $n = 10, 14, \ldots, 50$, using homogeneous $\alpha$. Figure 6 plots the ratio $D^e / D^o$ with $\alpha = 2$ and Figure 7 plots the ratio $D^e / D^o$ with $\alpha = 3$. Here $D^o$ is the social cost in the energy-aware performance metric $M$ of energy-aware optimal load balancing and $D^e$ is the social cost in the energy-aware performance metric $M$ of the LBSS equilibrium. Firstly, we observe that the ratio $D^e / D^o$ is much smaller than the worst-case bounds provided in Theorem 9 and Theorem 11.^6 Secondly, we see that when $\alpha = 2$, the ratio increases as the network size increases; in contrast, when $\alpha = 3$, the ratio is independent of the network size. This is consistent with the theoretical bounds in Theorem 9 and Theorem 11.

VII. CONCLUSION

We have studied the interaction between load balancing and speed scaling. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of the traditional performance metric and the cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima. In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium in terms of delay as well as the energy-aware metric, and show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but independent of the number of servers. These results provide insights for understanding the interaction of load balancing with speed scaling and for guiding new designs.

Further research stemming from this paper includes the following. We are characterizing the efficiency loss in the energy-aware metric at the load-balancing-speed-scaling equilibrium for systems with power functions of different polynomial orders.
We are also studying the load balancing and speed scaling interaction in the processor sharing system with general power functions (e.g., nonconvex, discontinuous, with possibly a discrete set of allowable speeds), as well as in systems with other scheduling policies such as Shortest Remaining Processing Time (SRPT). We will further study other speed scaling policies and their impact on the design and performance of load balancing. Finally, we will go beyond energy-aware speed scaling, and study other types of speed scaling behaviors and their interaction with load balancing in, e.g., data centers or call centers.

^6 The worst-case bound provided in Theorem 9 is $4n$ and the worst-case bound provided in Theorem 11 is around 353.

REFERENCES

[1] O. S. Unsal and I. Koren. System-level power-aware design techniques in real-time systems. Proc. IEEE, 97(3):1055-1069, 2003.
[2] S. Kaxiras and M. Martonosi. Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.
[3] S. Irani and K. R. Pruhs. Algorithmic problems in power management. SIGACT News, 36(2):63-76, 2005.
[4] L. Yuan and G. Qu. Analysis of energy reduction on dynamic voltage scaling-enabled systems. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 24(12):1827-1837, 2005.
[5] Y. Zhu and F. Mueller. Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling. Real Time Systems, 31:33-63, 2005.
[6] N. Bansal, T. Kimbrel, and K. Pruhs. Speed scaling to manage energy and temperature. J. ACM, 54(1):1-39, 2007.
[7] S. Herbert and D. Marculescu. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors. In Proc. ISLPED, 2007.
[8] N. Bansal, H.-L. Chan, and K. Pruhs. Speed scaling with an arbitrary power function. In Proc. ACM-SIAM SODA, 2009.
[9] A. Wierman, L. L. H. Andrew, and A. Tang. Power-aware speed scaling in processor sharing systems. In Proceedings of IEEE Infocom, 2009.
[10] L. L. Andrew, M. Lin, and A. Wierman. Optimality, fairness, and robustness in speed scaling designs. In Proceedings of ACM Sigmetrics, 2010.
[11] R. Stanojevic and R. Shorten. Distributed dynamic speed scaling. In INFOCOM, 2010 Proceedings IEEE, pages 1-5, March 2010.
[12] Kyuho Son and B. Krishnamachari. Speedbalance: Speed-scaling-aware optimal load balancing for green cellular networks. In INFOCOM, 2012 Proceedings IEEE, pages 2816-2820, March 2012.
[13] Maryam Elahi, Carey Williamson, and Philipp Woelfel. Decoupled speed scaling: Analysis and evaluation. Performance Evaluation, 73(0):3-17, 2014.
[14] N. Bansal, K. Pruhs, and C. Stein. Speed scaling for weighted flow times. In Proc. ACM-SIAM SODA, 2007.
[15] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 1995.
[16] J. M. George and J. M. Harrison. Dynamic control of a queue with adjustable service rate. Operations Research, 49(5):720-731, 2001.
[17] K. Pruhs, P. Uthaisombut, and G. Woeginger. Getting the best response for your erg. In Scandinavian Worksh. Alg. Theory, 2004.
[18] J. R. Bradley. Optimal control of a dual service rate M/M/1 production-inventory model. European Journal of Operations Research, 161(3):812-837, 2005.
[19] S. Albers and H. Fujiwara. Energy-efficient algorithms for flow time minimization. Lecture Notes in Computer Science, 3884:621-633, 2006.
[20] D. P. Bunde. Power-aware scheduling for makespan and flow. In Proc. ACM Symp. Parallel Alg. and Arch., 2006.
[21] S. Zhang and K. S. Chatha. Approximation algorithm for the temperature-aware scheduling problem. In Proceedings of IEEE Conference on Computer Aided Design, 2007.
[22] N. Bansal, H.-L. Chan, T.-W. Lam, and L.-K. Lee. Scheduling for speed bounded processors. In Int. Colloq. Automata, Languages and Programming, 2008.
[23] N. Bansal, H.-L. Chan, K. Pruhs, and D. Katz. Improved bounds for speed scaling in devices obeying the cube-root rule. In Int. Colloq. Automata, Languages and Programming, 2009.
[24] T.-W. Lam, L.-K. Lee, I. K. K. To, and P. W. H. Wong. Speed scaling functions for flow time scheduling based on active job count. In Proc. Euro. Symp. Alg., 2009.
[25] M. Haviv and T. Roughgarden. The price of anarchy in an exponential multi-server. Operations Research Letters, 35:421-426, 2007.
[26] T. Wu and D. Starobinski. On the price of anarchy in unbounded delay networks. In Proc. of Game Theory for Comm. and Networks, 2006.
[27] E. Altman, U. Ayesta, and B. J. Prabhu. Optimal load balancing in processor sharing systems. In Proceedings of GameComm, 2008.
[28] H. Chen, J. Marden, and A. Wierman. On the impact of heterogeneity and back-end scheduling in load balancing designs. In Proceedings of IEEE Infocom, 2009.
[29] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation. Prentice Hall, 1989.
[30] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic game theory. Cambridge University Press, 2007.

Lijun Chen (M'05) is an Assistant Professor of Computer Science and Telecommunications at the University of Colorado at Boulder. He received a Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 2007. He was a co-recipient of the Best Paper Award at the IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS) in 2007. His current research interests include optimization and control of networked systems, distributed optimization and control, convex relaxation and parsimonious solutions, and game theory and its engineering application.

Na Li (M'09) is an assistant professor in the School of Engineering and Applied Sciences at Harvard University. She received her B.S. degree in mathematics and applied mathematics from Zhejiang University in China and her PhD degree in Control and Dynamical Systems from the California Institute of Technology in 2013. She was a postdoctoral associate of the Laboratory for Information and Decision Systems at the Massachusetts Institute of Technology. She was a Best Student Paper Award finalist at the 2011 IEEE Conference on Decision and Control. Her research lies in the design, analysis, optimization and control of distributed network systems, with particular applications to power networks and systems biology/physiology.