On the Interaction between Load Balancing and Speed Scaling




Lijun Chen, Na Li and Steven H. Low
Engineering & Applied Science Division, California Institute of Technology, USA

Abstract

Speed scaling has been widely adopted in computer and communication systems, in particular to reduce energy consumption. An important question is how speed scaling interacts with other resource allocation mechanisms such as scheduling and routing. In this paper, we study the interaction of speed scaling with load balancing. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. In particular, we characterize the load-balancing-speed-scaling equilibrium with respect to the optimal load balancing schemes in processor sharing systems. Our results show that the degree of inefficiency at the equilibrium is mostly bounded by the heterogeneity of the system, but independent of the number of servers. These results provide insights into the interaction of load balancing with speed scaling and guide new designs.

Index Terms: Load balancing, Speed scaling, Energy efficiency, Efficiency loss, Data centers.

I. INTRODUCTION

The energy consumption rate of computer and communication systems has been increasing exponentially. Computer and communication systems must make a fundamental tradeoff between performance and energy usage, see, e.g., [], [2]. The addition of energy to standard performance metrics such as delay, throughput and loss fundamentally changes the problem space of some resource allocation designs. Not only are new mechanisms needed to optimize energy usage, but existing algorithms and protocols must also be re-examined, as a formerly optimal algorithm may now perform poorly with respect to a new energy-aware metric. Energy management decisions must be decomposed and coordinated spatially as well as temporally, and yet global optimality must be achieved through local algorithms that are implementable in a distributed manner.

In this paper we study load balancing and its interaction with speed scaling. Energy-aware speed scaling, which adapts the speed of the system so as to balance energy and performance metrics, is a widely adopted power management technique, see, e.g., [3], [4], [5], [6], [7], [8], [9], [10]. Previous works on speed scaling usually focus on a single server and study its interaction with scheduling, see, e.g., [], [8], [9], [10]. Here we consider a network setting and study the interaction of speed scaling with load balancing, to provide insights into such issues as: i) How does the system perform under speed scaling in terms of traditional performance metrics as well as energy-aware metrics? ii) How should energy-aware optimal load balancing be designed, and can we decouple the design of load balancing from that of speed scaling? iii) How does the sophistication of speed scaling impact the design and performance of load balancing?

We focus on gated-static speed scaling in processor sharing systems, and our results provide useful insights into the first two questions. Specifically, we characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing design problems, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima. In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium, in terms of delay as well as the energy-aware metric. We show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but independent of the number of servers in the system.
Our results suggest that, since in many applications a low-order polynomial provides a good approximation to the power function, we can decouple the design of load balancing from speed scaling without incurring much inefficiency in delay. In terms of the power-aware performance metric, our results suggest that, as long as the heterogeneity in the system is small, we can decouple the design of load balancing from speed scaling without incurring much efficiency loss; but when the heterogeneity in the system is large, we have to do energy-aware load balancing if energy consumption is a main concern.

The paper is organized as follows. The next section briefly discusses some related work. Section III describes the system model. Section IV gives a brief characterization of the load-balancing-speed-scaling equilibrium, and introduces two optimal load balancing design problems. Section V studies in detail the load-balancing-speed-scaling interaction in processor sharing systems with gated-static speed scaling. Section VI provides numerical examples to complement the theoretical analysis, and Section VII concludes with some discussion on further research.

II. RELATED WORK

Power management techniques have been increasingly adopted in designs from the single-device level such as chips to the network level such as data centers. This has spurred a new branch of research in its own right. In particular, starting with Yao et al. [2], there is extensive research on the analytical study of speed scaling, see, e.g., [3], [4], [5], [6], [7], [8], [], [9], [8], [20], [2], [9], [10]. Bansal et al. [8] show that the speed scaling policy $(\mathrm{SRPT}, P^{-1}(n+1))$ is 3-competitive for regular power functions in the worst-case analysis. This result has been tightened and extended to PS scheduling as well as to stochastic analysis by Andrew et al. [10]. In particular, Andrew et al. [10] provide a comprehensive study of speed scaling and its interaction with scheduling, and show a fundamental tradeoff between optimality, fairness and robustness in speed scaling designs.

Related work also includes [22], [23], which show that the degree of inefficiency in delay for load balancing in processor sharing systems with fixed server speeds scales with the number of servers in the system. This result has been extended to the processor sharing system with multi-class load [24], and to other scheduling policies such as SRPT [25]. In contrast to these results, we show that the degree of inefficiency in delay for load balancing in processor sharing systems with speed scaling is bounded by the heterogeneity of the system, but independent of the number of servers.

III. SYSTEM MODEL

Consider a system with a set $N$ of servers$^1$ and a Poisson arrival process of rate $\lambda > 0$; see Figure 1. We assume that job sizes are i.i.d. and, without loss of generality, have a mean of 1. Associated with each server $i$ is a service rate (or speed) $s_i$. There is a load balancing dispatcher that probabilistically routes arrivals to servers according to a certain traditional performance metric $F_i$ that end users are concerned with, so that $F_i$ at each used server is the same and minimal. The metric $F_i$ can be, for example, the mean response time $E[T_i]$ at the server, the sum of $E[T_i]$ and a propagation delay $\tau_i$, or the blocking probability $p_i$. It follows that the resulting arrival process to server $i$ is Poisson with rate $\lambda_i$. We assume that server $i$'s performance curve $F_i = f_i(s_i, \lambda_i)$ (or its analytical approximation) is continuously differentiable, increasing in arrival rate $\lambda_i$, and decreasing in service capacity $s_i$ with $f_i(\infty, \lambda_i) = 0$. This is a rather general assumption.

Fig. 1. A pictorial diagram of the system model.

$^1$ Here a server can be a single server, or represent a cluster of collocated servers in, e.g., a micro-datacenter.

In order to ensure stability, we must have $\lambda_i < s_i$ for all $i \in N$. We can thus assume that $f_i(s_i, \lambda_i) = \infty$ when $\lambda_i \ge s_i$. Besides the performance metric $F_i$ that is perceived by end users, each server incurs a certain cost $c_i(s_i)$ per unit time when it runs at a speed of $s_i$. The cost can be, for example, the power expended at the server, or any other type of service cost. Given an incoming rate of $\lambda_i$, let $g_i(s_i, \lambda_i) = E[c_i(s_i)]$ denote the average cost. The average cost depends on the speed as well as the scheduling policy at the server. The cost function $g_i(s_i, \lambda_i)$ (or its analytical approximation) is assumed to be continuously differentiable, increasing in $s_i$, and nondecreasing in $\lambda_i$.

Given arrival rate $\lambda_i$ and scheduling policy, each server will choose a speed $s_i$ to minimize a cost-aware performance metric $M_i$:

$$M_i = g_i(s_i, \lambda_i) + \beta_i \lambda_i f_i(s_i, \lambda_i), \tag{1}$$

where $\beta_i > 0$ characterizes the relative weight of the internal cost and the traditional performance metric. With the above model, we have in effect assumed a kind of static speed scaling, i.e., a single speed $s_i$ is chosen for a given arrival rate $\lambda_i$. With more complicated notation, we can also model dynamic scaling, i.e., adapting the speed to different states such as the number of jobs in the server.

Speed scaling can be broadly defined as any behavior of adapting speed to load, and can be due to various reasons, corresponding to different choices of the cost function $g_i(s_i, \lambda_i)$. In this paper, we will mostly focus on energy-aware speed scaling as a concrete system to study the interaction between load balancing and speed scaling, and consider the following performance metric:

$$M_i = E[P_i(s_i)] + \beta_i \lambda_i E[T_i], \tag{2}$$

where $P_i(s_i)$ is the power expended when server $i$ runs at speed $s_i$. The modeling of the power function $P_i(s_i)$ is an active research topic, and measurements have shown it can take on different forms depending on the system involved. In many applications a low-order polynomial form

$$P_i(s_i) = k_i s_i^{\alpha_i}, \quad k_i > 0, \; \alpha_i > 1 \tag{3}$$

provides a good approximation. For example, for dynamic power in CMOS, $P_i$ is often assumed to be cubic in previous works [2]. We will focus on the polynomial power function (3) in this paper, as in many previous works on speed scaling.

IV. LOAD-BALANCING-SPEED-SCALING INTERACTION

In this section, we characterize the equilibrium resulting from the interaction between load balancing and speed scaling for the general model described in Section III. We then introduce two optimal load balancing problems, F-optimal load balancing and cost-aware optimal load balancing, under speed scaling. We intend to characterize the equilibrium with respect to those two optimal load balancing problems, as well as to propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima.

Given server speeds $(s_i)_{i \in N}$, denote the set of servers used at load balancing by $N_b$, i.e., $i \in N_b$ iff $\lambda_i > 0$. At load balancing, the $F_i$ value at any server $i \in N_b$ is the same, and not larger than the $F_j$ value a job would experience if routed to any unused server $j \in N \setminus N_b$. This can be written mathematically as

$$f_i(s_i, \lambda_i) \le f_j(s_j, \lambda_j), \quad \forall j \in N, \; i \in N_b, \tag{4}$$

$$\sum_{i \in N} \lambda_i = \lambda, \tag{5}$$

where $(\lambda_i)_{i \in N}$ are the arrival rates at the servers at load balancing.$^2$ Denote the $F_i$ value at server $i \in N_b$ at load balancing by $\gamma$. The load balancing condition (4)-(5) can be equivalently written as: there exists a $\gamma > 0$ such that

$$(f_i(s_i, \lambda_i) - \gamma)(\tilde\lambda_i - \lambda_i) \ge 0, \quad \forall \tilde\lambda_i \ge 0, \tag{6}$$

$$\sum_{i \in N} \lambda_i = \lambda. \tag{7}$$

To see this equivalence, note that equations (6)-(7) imply that $\gamma$ must equal the $F_i$ value at any server $i \in N_b$ at load balancing: if $\lambda_i > 0$, then $\tilde\lambda_i - \lambda_i$ can take either sign, which forces $f_i = \gamma$; if $\lambda_i = 0$, then $\tilde\lambda_i - \lambda_i \ge 0$, which only requires $f_i \ge \gamma$.

Assume that the speed scaling problem $\min_{s_i > \lambda_i} M_i$ has a unique solution $s_i(\lambda_i)$. Under the aforementioned assumptions on $f_i$ and $g_i$, speed scaling $s_i(\lambda_i)$ satisfies:$^3$

$$\frac{\partial g_i(s_i, \lambda_i)}{\partial s_i} + \beta_i \lambda_i \frac{\partial f_i(s_i, \lambda_i)}{\partial s_i} = 0. \tag{8}$$

Definition 1: The load-balancing-speed-scaling (LBSS) equilibrium is defined as a triple $\{(\lambda_i)_{i \in N}, (s_i)_{i \in N}, \gamma\}$ that satisfies the variational inequalities (6), (7) and (8).

The performance of the system under load balancing and speed scaling is determined by the LBSS equilibrium. At the LBSS equilibrium $\{(\lambda_i)_{i \in N}, (s_i)_{i \in N}, \gamma\}$, $s_i = s_i(\lambda_i)$ and

$$(f_i(s_i(\lambda_i), \lambda_i) - \gamma)(\tilde\lambda_i - \lambda_i) \ge 0, \quad \forall \tilde\lambda_i \ge 0, \tag{9}$$

$$\sum_{i \in N} \lambda_i = \lambda. \tag{10}$$

The following result is straightforward [26].

Theorem 2: The LBSS equilibrium satisfies the local optimality condition for the following optimization problem:

$$\min_{\lambda_i \ge 0} \; \sum_{i \in N} \int_0^{\lambda_i} f_i(s_i(z), z)\, dz \tag{11}$$

$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda, \tag{12}$$

and $\gamma$ is the corresponding optimal dual variable.

Proof: Note that the LBSS equilibrium condition (9)-(10) is a variational inequality characterization of the optimality condition for the optimization problem (11)-(12) and its dual [26].

$^2$ Note that in this paper we overload the notation: $\lambda_i$ denotes both an arbitrary arrival rate of server $i$ and the arrival rate of server $i$ at load balancing, depending on the context.

$^3$ The dynamic speed range of a server is usually finite, i.e., $s_i \le r_i$ for some $r_i > 0$.
For simplicity, we do not consider such a constraint in this paper. However, such a constraint does not change the general structure of our model in terms of, e.g., equilibrium characterization and distributed decomposition structure.

An optimization problem characterization of the equilibrium is usually very useful. It captures the global structure of the problem, and often we can easily tell from the optimization problem whether an equilibrium exists and how many equilibria there are, as well as derive a distributed or efficient algorithm to reach the equilibrium. When there is no speed scaling, i.e., $s_i$ is fixed, we recover the optimization problem characterization of usual load balancing. In this situation, problem (11)-(12) is strictly convex as $f_i(s_i, \lambda_i)$ is an increasing function of $\lambda_i$, and the equilibrium is unique. In general, there may be no or multiple LBSS equilibria, depending on the properties of the performance curve $f_i(s_i(\lambda_i), \lambda_i)$ under speed scaling. For example, consider performance metric (2) with power function (3) in a processor sharing system with gated-static speed scaling (see the next section). Speed scaling $s_i(\lambda_i)$ satisfies $\frac{\beta_i}{(s_i - \lambda_i)^2} = k_i(\alpha_i - 1)s_i^{\alpha_i - 2}$. When $\alpha_i < 2$, $f_i(s_i(\lambda_i), \lambda_i)$ is decreasing. So, problem (11)-(12) becomes a problem of minimizing a concave objective function, which is usually computationally hard and may admit multiple solutions.

In the above load balancing model, the dispatcher routes the arrivals according to the traditional performance metric $F_i$ but does not consider the internal cost $g_i$ of the server. We call this model cost-oblivious load balancing (e.g., energy-oblivious in the case of energy-aware speed scaling). It can also be seen as a selfish routing game where each job chooses a server with minimal $F_i$ value [27]. So, the LBSS equilibrium might not be socially optimal, in terms of metric $F_i$ as well as the energy-aware metric $M_i$. As we mentioned before, speed scaling brings an additional dimension such as energy into the design objective.
It is of significant value to study its interaction with the existing algorithms and protocols, e.g., whether an existing design is optimal with respect to the traditional performance metric $F_i$ as well as a new one $M_i$, how to design distributed optimal algorithms in terms of the new performance metric, and whether we can decouple speed scaling from other resource allocation mechanisms. In order to study these questions for load balancing, we consider two new load balancing models, as follows.

F-optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the traditional performance metric $F_i$:

$$\min_{\lambda_i \ge 0} \; \sum_{i \in N} \lambda_i f_i(s_i(\lambda_i), \lambda_i) \tag{13}$$

$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{14}$$

When $F_i = E[T_i]$, we call it delay optimal load balancing.

Cost-aware optimal load balancing: The dispatcher routes arrivals so as to achieve the social optimum in terms of the cost-aware performance metric $M_i$:

$$\min_{\lambda_i \ge 0} \; \sum_{i \in N} \left[ g_i(s_i(\lambda_i), \lambda_i) + \beta_i \lambda_i f_i(s_i(\lambda_i), \lambda_i) \right] \tag{15}$$

$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{16}$$

We call it energy-aware optimal load balancing in the case of energy-aware speed scaling.

The end users as a whole care about problem (13)-(14), and the servers and end users as a whole care about problem (15)-(16). We intend to characterize the LBSS equilibrium with respect to them, as well as to propose distributed algorithms to achieve the corresponding equilibrium or optima. Again, the general problems (13)-(14) and (15)-(16) may be highly nontrivial, depending on the performance curve $f_i$ under speed scaling. In the remainder of this paper, we will focus on load balancing with energy-aware speed scaling in processor sharing systems with performance metric (2) and power function (3), as a concrete system to study the interaction between load balancing and speed scaling. We will leave the general problem to future work.

V. LOAD-BALANCING-SPEED-SCALING INTERACTION IN PROCESSOR SHARING SYSTEMS

In this section, we consider energy-aware speed scaling in processor sharing (PS) systems with performance metric (2) and power function (3). While general speed scaling policies can be used at a server, we focus on gated-static speed scaling, in which the server has zero speed when there is no job and otherwise runs at a constant speed that balances the response time and energy usage; see, e.g., [9], [10]. Gated-static speed scaling is the simplest nontrivial form of speed scaling, and it requires minimal hardware support. For example, a CMOS chip may set a constant clock speed but AND it with the gating signal to set the speed to 0 when there is no job [10]. Gated-static speed scaling captures some of the essence of dynamic speed scaling while admitting a more tractable analysis.

As mentioned in Section IV, when $\alpha_i < 2$, the problem under gated-static speed scaling may become a hard problem of minimizing a concave objective function. We thus focus on systems with $\alpha_i \ge 2$, in order to obtain a clean characterization and gain insights.
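Power functions with $\alpha_i \ge 2$ are also practically important: in a server whose power function has $\alpha_i \ge 2$, energy cost is usually the driving force in deciding on the server speed, while in a server whose power function has $\alpha_i < 2$, the traditional performance metric is the driving force. Besides, the results obtained for gated-static speed scaling with $\alpha_i \ge 2$ are expected to carry over to static provisioning with $\alpha_i \ge 1$, in which the server runs at a constant static speed that is chosen based on workload to balance the response time and energy usage. Static provisioning is the simplest form of speed scaling, and is a model often used in energy-aware capacity provisioning in data centers.

A. Energy-oblivious load balancing

Under PS scheduling, the mean response time at server $i$ takes the form:

$$f_i(s_i, \lambda_i) = \frac{1}{s_i - \lambda_i}. \tag{17}$$

Under gated-static speed scaling, the energy cost is only incurred while the server is busy. Note that the fraction of time during which the server is busy is $\lambda_i / s_i$. So, the server decides on speed $s_i$ by solving the following optimization problem:

$$\min_{s_i > \lambda_i} \; \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i). \tag{18}$$

With $P_i(s_i) = k_i s_i^{\alpha_i}$, setting the derivative of the objective with respect to $s_i$ to zero shows that the speed scaling $s_i(\lambda_i)$ satisfies

$$-\frac{\bar\beta_i}{(s_i - \lambda_i)^2} + s_i^{\alpha_i - 2} = 0, \tag{19}$$

where $\bar\beta_i = \frac{\beta_i}{k_i(\alpha_i - 1)}$. By equation (19), we have

$$s_i'(\lambda_i) = \frac{2 s_i(\lambda_i)}{\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i} > 0, \tag{20}$$

$$s_i''(\lambda_i) = \frac{(2\alpha_i - 4)\,(s_i(\lambda_i) - \lambda_i s_i'(\lambda_i))}{(\alpha_i s_i(\lambda_i) - (\alpha_i - 2)\lambda_i)^2} \ge 0, \tag{21}$$

where the second inequality follows from the fact that $s_i'(\lambda_i) \le 1$; moreover, $s_i'(\lambda_i) = 1$ and $s_i''(\lambda_i) = 0$ if and only if $\alpha_i = 2$. Hence, speed scaling $s_i(\lambda_i)$ is a strictly increasing, convex function of $\lambda_i$. Further,

$$f_i(s_i(\lambda_i), \lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i} = \frac{(s_i(\lambda_i))^{\alpha_i/2 - 1}}{\sqrt{\bar\beta_i}} \tag{22}$$

is also a strictly increasing function of $\lambda_i$.

Corollary 3: There exists a unique LBSS equilibrium for processor sharing systems with gated-static speed scaling.

Proof: By Theorem 2, the LBSS equilibrium satisfies the optimality conditions for the optimization problem:

$$\min_{\lambda_i \ge 0} \; \sum_{i \in N} \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz \tag{23}$$

$$\text{s.t.} \quad \sum_{i \in N} \lambda_i = \lambda. \tag{24}$$

Since $\frac{1}{s_i(\lambda_i) - \lambda_i}$ is strictly increasing in $\lambda_i$, the above optimization problem is strictly convex. The existence and uniqueness of the LBSS equilibrium follow from the fact that problem (23)-(24) has a unique optimum [26].

Now, let us characterize the equilibrium.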
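For each server $i$, define the base service rate $s_i^0 = s_i(0^+) = \bar\beta_i^{1/\alpha_i}$.$^4$ Without loss of generality, we assume that $s_1^0 \ge s_2^0 \ge \cdots \ge s_{|N|}^0$. For later convenience, we also assume that $s_{|N|+1}^0 = 0$.

Theorem 4: The set of servers that are used at the equilibrium is $N_e = \{1, 2, \ldots, n\}$, with a unique $n$ that satisfies

$$\sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \lambda \le \sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right), \tag{25}$$

where

$$\tilde f_i(\lambda_i) = \frac{1}{s_i(\lambda_i) - \lambda_i}. \tag{26}$$

$^4$ For a function $f(x): R \to R$, $f(a^+)$ denotes the right-hand limit $\lim_{x \to a^+} f(x)$.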
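The threshold condition (25) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes every server has $\alpha_i > 2$, so that combining (22) and (26) gives a closed-form inverse of the delay function, and it finds the common delay $\gamma$ by bisection, which is a choice made here for illustration. The function names and the sample parameters are hypothetical; each server is described by the pair (normalized weight from (19), exponent $\alpha_i$).

```python
import math

def base_rate(beta_bar, alpha):
    """Base service rate s_i^0 = beta_bar ** (1 / alpha)."""
    return beta_bar ** (1.0 / alpha)

def inv_f_tilde(gamma, beta_bar, alpha):
    """Load at which the delay 1 / (s_i(lam) - lam) equals gamma; 0 if unused.

    From (22): s = (gamma * sqrt(beta_bar)) ** (2 / (alpha - 2)), and then
    lam = s - 1 / gamma.  Assumption of this sketch: alpha > 2.
    """
    if gamma <= 1.0 / base_rate(beta_bar, alpha):
        return 0.0
    s = (gamma * math.sqrt(beta_bar)) ** (2.0 / (alpha - 2.0))
    return max(0.0, s - 1.0 / gamma)

def lbss_equilibrium(total_load, servers):
    """Return (n, gamma, loads) for servers given as (beta_bar, alpha) pairs.

    Loads are reported in order of decreasing base rate.
    Assumption of this sketch: total_load > 0.
    """
    ranked = sorted(servers, key=lambda p: base_rate(*p), reverse=True)
    # Delay thresholds 1 / s_i^0, with 1 / s_{|N|+1}^0 taken as infinity.
    levels = [1.0 / base_rate(b, a) for b, a in ranked] + [math.inf]

    def total(gamma, n):
        return sum(inv_f_tilde(gamma, b, a) for b, a in ranked[:n])

    # Threshold test (25): lower sum < total_load <= upper sum.
    n = next(m for m in range(1, len(ranked) + 1)
             if total(levels[m - 1], m) < total_load <= total(levels[m], m))

    # Bisection on the common delay gamma among the n used servers.
    lo, hi = levels[n - 1], 2.0 * levels[n - 1]
    while total(hi, n) < total_load:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid, n) < total_load:
            lo = mid
        else:
            hi = mid
    loads = [inv_f_tilde(hi, b, a) for b, a in ranked[:n]]
    return n, hi, loads + [0.0] * (len(ranked) - n)

# Hypothetical example: three servers with cubic power functions.
n, gamma, loads = lbss_equilibrium(1.0, [(1.0, 3.0), (4.0, 3.0), (0.01, 3.0)])
```

Raising `total_load` in this example brings slower servers into use one at a time, which is the water-filling behavior that the theorem describes.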

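Proof: By equilibrium condition (9), we have $\frac{1}{s_i^0} < \gamma$ if $i \in N_e$ and $\frac{1}{s_i^0} \ge \gamma$ otherwise. Further,

$$\lambda_i = s_i - \frac{1}{\gamma} > 0, \quad \text{if } \frac{1}{s_i^0} < \gamma, \tag{27}$$

$$\lambda_i = 0, \quad \text{if } \frac{1}{s_i^0} \ge \gamma. \tag{28}$$

Since $s_i^0$ is decreasing in $i$, $N_e$ takes the form $\{1, 2, \ldots, n\}$. Note that $\frac{1}{s_n^0} < \gamma \le \frac{1}{s_{n+1}^0}$, and $\tilde f_i(\lambda_i)$ is an increasing function. So,

$$\sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \sum_{i=1}^{n} (\tilde f_i)^{-1}(\gamma) \le \sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right),$$

i.e.,

$$\sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_n^0}\right) < \sum_{i=1}^{n} \lambda_i = \lambda \le \sum_{i=1}^{n} (\tilde f_i)^{-1}\!\left(\frac{1}{s_{n+1}^0}\right).$$

The uniqueness of $n$ follows from the fact that the LBSS equilibrium is unique.

We see that the LBSS equilibrium has a water-filling structure. If we see load balancing as a selfish routing problem [27], the arrivals will aggressively occupy fast servers with low delay first.

1) Distributed load balancing algorithm: The (convex) optimization problem characterization of the LBSS equilibrium also suggests a distributed algorithm to achieve the equilibrium. At the $k$-th iteration:

Each server $i$ estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{29}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server. Denote by $E[t(k)]$ the minimal $\bar t(k)$ at step $k$ such that $\bar t(k) = \frac{1}{|\mathcal{N}(k)|} \sum_{i \in \mathcal{N}(k)} t_i(k)$ with $\mathcal{N}(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } t_i(k) \le \bar t(k), \; i \in N \}$.$^5$ The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = [\lambda_i(k) - \varepsilon (t_i(k) - E[t(k)])]^+, \tag{30}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $R^+$, the set of nonnegative real numbers.

When $\varepsilon$ is small enough, the above algorithm converges. Let $\delta_i(k) = \lambda_i(k+1) - \lambda_i(k)$. It is easy to verify that

$$\sum_{i} \delta_i(k) = 0, \tag{31}$$

$$\sum_{i} \delta_i(k)\, t_i(k) \le 0. \tag{32}$$

We see that $\sum_i \delta_i(k) t_i(k) = 0$ only if $\delta_i(k) = 0$ for all $i$, which requires $t_i = \bar t$, or $\lambda_i = 0$ and $t_i > \bar t$.

$^5$ $\bar t$ and $\mathcal{N}$ can be determined recursively as follows. In the beginning, let $\mathcal{N} = N$ and calculate $\bar t = \frac{1}{|\mathcal{N}|} \sum_{i \in \mathcal{N}} t_i(k)$, and then exclude from $\mathcal{N}$ those servers such that $\lambda_i = 0$ and $t_i > \bar t$. Repeat the same procedure with the new set $\mathcal{N}$; when it stops, we get $E[t]$.

The above algorithm actually follows the negative gradient direction of $\sum_i \int_0^{\lambda_i} \frac{1}{s_i(z) - z}\, dz$ subject to $\sum_i \lambda_i = \lambda$ [26].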
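Any algorithm that follows a properly chosen negative gradient direction would work, and (30) picks a specific gradient direction that facilitates the convergence analysis. We skip the convergence proof for brevity.

B. Delay optimal load balancing

In this subsection, we study the delay optimal load balancing design:

$$\min_{\lambda_i \ge 0} \; \sum_{i} \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} \tag{33}$$

$$\text{s.t.} \quad \sum_{i} \lambda_i = \lambda, \tag{34}$$

and characterize the LBSS equilibrium with respect to it. By equation (19),

$$\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i} = \frac{s_i^{\alpha_i/2}}{\sqrt{\bar\beta_i}} - 1, \tag{35}$$

which is strictly increasing and convex in $s_i$. Note that $s_i(\lambda_i)$ is increasing and convex. It follows that $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is a strictly convex function of $\lambda_i$.$^6$ So, problem (33)-(34) is strictly convex, and has a unique optimum. Denote the optimum by $(\lambda_i^*)_{i \in N}$. There exists a unique $\gamma^* > 0$ such that the optimality condition can be written as [26]

$$\left( \frac{s_i(\lambda_i^*) - \lambda_i^* s_i'(\lambda_i^*)}{(s_i(\lambda_i^*) - \lambda_i^*)^2} - \gamma^* \right)(\tilde\lambda_i - \lambda_i^*) \ge 0, \quad \forall \tilde\lambda_i \ge 0, \tag{36}$$

$$\sum_{i \in N} \lambda_i^* = \lambda. \tag{37}$$

Theorem 5: The set of servers that are used at the optimum is $N_o = \{1, 2, \ldots, n^*\}$, with a unique $n^*$ that satisfies

$$\sum_{i=1}^{n^*} (\hat f_i)^{-1}\!\left(\frac{1}{s_{n^*}^0}\right) < \lambda \le \sum_{i=1}^{n^*} (\hat f_i)^{-1}\!\left(\frac{1}{s_{n^*+1}^0}\right), \tag{38}$$

where

$$\hat f_i(\lambda_i) = \frac{s_i(\lambda_i) - \lambda_i s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}. \tag{39}$$

Moreover, $\gamma^* \ge \gamma$ and $n^* \ge n$.

Proof: Note that $\hat f_i(\lambda_i)$ is an increasing function of $\lambda_i$, and $\hat f_i(0) = \frac{1}{s_i^0}$. The first part of the theorem follows the same proof as in Theorem 4. For the second part of the theorem, note that $s_i'(\lambda_i) \le 1$ by equation (20), so $\hat f_i(\lambda_i) \ge \tilde f_i(\lambda_i)$. If $\gamma^* < \gamma$, then $n^* \le n$ and

$$\sum_{i=1}^{n^*} (\hat f_i)^{-1}(\gamma^*) < \sum_{i=1}^{n^*} (\tilde f_i)^{-1}(\gamma^*) \le \sum_{i=1}^{n} (\tilde f_i)^{-1}(\gamma) \le \lambda.$$

This contradicts $\sum_{i=1}^{n^*} (\hat f_i)^{-1}(\gamma^*) = \sum_{i=1}^{n^*} \lambda_i^* = \lambda$. So, $\gamma^* \ge \gamma$, and $n^* \ge n$ follows.

$^6$ Note that, when $\alpha_i = 2$, $\frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}$ is not strictly convex but linear in $\lambda_i$. But this would not change the uniqueness of the optimum.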
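The comparison in Theorem 5 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it parametrizes each server by its speed $s$, recovers the load from (19), and runs the same water-filling search twice, once with the selfish delay (26) and once with the marginal social delay (39) in which $s_i'$ from (20) has been substituted. It assumes $\alpha_i > 2$ for every server; the nested bisection, the function names and the sample parameters are all choices made here.

```python
import math

def load_from_speed(s, beta_bar, alpha):
    """Invert (19) for the load: lam = s - sqrt(beta_bar) * s ** (1 - alpha / 2)."""
    return s - math.sqrt(beta_bar) * s ** (1.0 - alpha / 2.0)

def f_tilde(s, beta_bar, alpha):
    """Delay 1 / (s - lam) that selfish routing equalizes, as in (26)."""
    return 1.0 / (s - load_from_speed(s, beta_bar, alpha))

def f_hat(s, beta_bar, alpha):
    """Marginal social delay (39), with s'(lam) from (20) substituted in."""
    lam = load_from_speed(s, beta_bar, alpha)
    return alpha * s / ((s - lam) * (alpha * s - (alpha - 2.0) * lam))

def load_at_level(level, marginal, beta_bar, alpha, iters=100):
    """Load of one server when its marginal value equals `level` (0 if unused)."""
    s0 = beta_bar ** (1.0 / alpha)          # base service rate
    if marginal(s0, beta_bar, alpha) >= level:
        return 0.0
    lo, hi = s0, 2.0 * s0
    while marginal(hi, beta_bar, alpha) < level:   # assumes alpha > 2
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if marginal(mid, beta_bar, alpha) < level:
            lo = mid
        else:
            hi = mid
    return load_from_speed(hi, beta_bar, alpha)

def water_fill(total_load, servers, marginal, iters=100):
    """Return (level, loads) such that the loads sum to total_load."""
    def total(level):
        return sum(load_at_level(level, marginal, b, a) for b, a in servers)
    lo, hi = 0.0, 1.0
    while total(hi) < total_load:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) < total_load:
            lo = mid
        else:
            hi = mid
    return hi, [load_at_level(hi, marginal, b, a) for b, a in servers]

# Hypothetical servers given as (beta_bar, alpha) pairs.
servers = [(4.0, 3.0), (1.0, 3.0), (0.25, 4.0)]
gamma, eq_loads = water_fill(5.0, servers, f_tilde)       # LBSS equilibrium
gamma_star, opt_loads = water_fill(5.0, servers, f_hat)   # delay optimum
```

For a given load, `gamma_star` should come out no smaller than `gamma`, and the delay optimum should use at least as many servers as the equilibrium, in line with the theorem.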

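1) Distributed load balancing algorithm: The delay optimal load balancing is a convex problem. We can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design. At the $k$-th iteration:

Each server $i$ estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to

$$s_i(k) = s_i(\lambda_i(k)). \tag{40}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $\hat f_i$ according to

$$\hat f_i(k) = \hat f_i(\lambda_i(k)) = \frac{\alpha_i \lambda_i(k) (t_i(k))^2 + \alpha_i t_i(k)}{2 \lambda_i(k) t_i(k) + \alpha_i}. \tag{41}$$

Denote by $E[\hat f(k)]$ the minimal $\bar{\hat f}(k)$ at step $k$ such that $\bar{\hat f}(k) = \frac{1}{|\mathcal{N}(k)|} \sum_{i \in \mathcal{N}(k)} \hat f_i(k)$ with $\mathcal{N}(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } \hat f_i(k) \le \bar{\hat f}(k), \; i \in N \}$. The dispatcher adjusts $\lambda_i$ to each server according to

$$\lambda_i(k+1) = [\lambda_i(k) - \varepsilon (\hat f_i(k) - E[\hat f(k)])]^+, \tag{42}$$

where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $R^+$, the set of nonnegative real numbers.

Note that the delay optimal load balancing algorithm (40)-(42) is more complicated than the simple, energy-oblivious load balancing algorithm (29)-(30). It requires estimating $\hat f_i$. In addition, it requires the dispatcher to know the servers' power function parameters $\alpha_i$ and $k_i$.

2) Efficiency loss in delay at the LBSS equilibrium: Define the social cost in delay:

$$C = \sum_{i} \frac{\lambda_i}{s_i(\lambda_i) - \lambda_i}. \tag{43}$$

We now characterize the inefficiency in delay at the LBSS equilibrium.

Lemma 6: Let $\alpha = \max_i \alpha_i$. Then,

$$\gamma \le \gamma^* \le \frac{\alpha}{2} \gamma. \tag{44}$$

Proof: The first inequality has been proved in Theorem 5. It remains to prove the second one. By equation (35), $\hat f_i$ can be written as

$$\hat f_i(\lambda_i) = \frac{\alpha_i}{2} \frac{s_i^{\alpha_i/2 - 1}}{\sqrt{\bar\beta_i}}\, s_i'. \tag{45}$$

Note that $s_i'(\lambda_i)$ is increasing, so $s_i'(\lambda_i) \ge \frac{2}{\alpha_i}$ by equation (20). Combining with $s_i'(\lambda_i) \le 1$, we get $\tilde f_i(\lambda_i) \le \hat f_i(\lambda_i) \le \frac{\alpha_i}{2} \tilde f_i(\lambda_i) \le \frac{\alpha}{2} \tilde f_i(\lambda_i)$. If $\gamma^* > \frac{\alpha}{2} \gamma$, then $(\hat f_i)^{-1}(\gamma^*) \ge (\tilde f_i)^{-1}\!\left(\frac{2}{\alpha} \gamma^*\right) > (\tilde f_i)^{-1}(\gamma)$, and hence

$$\sum_{i=1}^{n^*} (\hat f_i)^{-1}(\gamma^*) > \sum_{i=1}^{n} (\tilde f_i)^{-1}(\gamma) = \lambda.$$

This contradicts the fact that $\sum_{i=1}^{n^*} (\hat f_i)^{-1}(\gamma^*) = \lambda$ (also note that $n^* \ge n$). So, $\gamma^* \le \frac{\alpha}{2} \gamma$.

Theorem 7: Denote the social cost in delay at the LBSS equilibrium by $C_e$ and the optimal cost by $C_o$. Then,

$$\frac{C_e}{C_o} \le \frac{\alpha}{2}. \tag{46}$$

Proof: The social cost at the LBSS equilibrium is

$$C_e = \lambda \gamma. \tag{47}$$

When $\lambda_i^* > 0$, by equations (22), (45) and (44), we have

$$\frac{1}{s_i(\lambda_i^*) - \lambda_i^*} = \frac{s_i^{\alpha_i/2 - 1}}{\sqrt{\bar\beta_i}} = \frac{2 \gamma^*}{\alpha_i s_i'} \ge \frac{2 \gamma^*}{\alpha_i} \ge \frac{2 \gamma}{\alpha}. \tag{48}$$

So,

$$C_o = \sum_{i} \frac{\lambda_i^*}{s_i(\lambda_i^*) - \lambda_i^*} \ge \frac{2 \gamma}{\alpha} \sum_{i} \lambda_i^* = \frac{2 \lambda \gamma}{\alpha}, \tag{49}$$

and therefore

$$\frac{C_e}{C_o} \le \frac{\alpha}{2}. \tag{50}$$

We see that the degree of inefficiency in delay at the LBSS equilibrium depends only on the order $\alpha$ of the power functions. For example, if $\alpha = 2$, the LBSS equilibrium achieves the social optimum. As $\alpha$ is a constant independent of the number $|N|$ of servers in the system, this result is very different from the efficiency loss of the usual load balancing (with fixed server speeds), which scales with $|N|$, see, e.g., [22]. Also, note that $\frac{\alpha}{2}$ can be seen as a measure of heterogeneity in power functions. We can thus say that the degree of inefficiency at the LBSS equilibrium is bounded by the heterogeneity of the system.

As the power function can usually be well approximated by a low-order polynomial, the above result suggests a benign interaction between energy-oblivious load balancing and power-aware speed scaling, in terms of delay. As energy-oblivious load balancing is already employed in practice and simple to implement, we may not need to change it, as it does not incur a large penalty in delay.

C. Energy-aware optimal load balancing

In this subsection, we study the energy-aware optimal load balancing design:

$$\min_{\lambda_i, s_i} \; \sum_{i} \left[ \beta_i \frac{\lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \right] \tag{51}$$

$$\text{s.t.} \quad \sum_{i} \lambda_i = \lambda, \tag{52}$$

and characterize the LBSS equilibrium with respect to it. By speed scaling (i.e., solving for $s_i$ first), the above problem reduces to:

$$\min_{\lambda_i} \; \sum_{i} h_i(\lambda_i) \tag{53}$$

$$\text{s.t.} \quad \sum_{i} \lambda_i = \lambda, \tag{54}$$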

where
$$h_i(\lambda_i) = \frac{\beta_i \lambda_i}{s_i(\lambda_i) - \lambda_i} + \frac{\lambda_i}{s_i(\lambda_i)} P_i(s_i(\lambda_i)). \tag{55}$$
Note that
$$h_i'(\lambda_i) = \frac{\beta_i s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2} + k_i (s_i(\lambda_i))^{\alpha_i - 1} = \frac{\alpha_i \beta_i}{\alpha_i - 1} \cdot \frac{s_i(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^2}, \tag{56}$$
$$h_i''(\lambda_i) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \cdot \frac{2 s_i(\lambda_i) - (s_i(\lambda_i) + \lambda_i)\, s_i'(\lambda_i)}{(s_i(\lambda_i) - \lambda_i)^3}. \tag{57}$$
We see that $h_i' > 0$ and $h_i'' > 0$, and thus $h_i(\lambda_i)$ is strictly increasing and convex. So, problem (53)-(54) is a strictly convex problem and has a unique optimum. Denote the optimum by $(\lambda_i^+)_{i \in \mathcal{N}}$. There exists a unique $\gamma^+ > 0$ such that the optimality condition can be written as [26]
$$(h_i'(\lambda_i^+) - \gamma^+)(\lambda_i - \lambda_i^+) \ge 0, \quad \forall \lambda_i \ge 0, \tag{58}$$
$$\sum_{i \in \mathcal{N}} \lambda_i^+ = \lambda. \tag{59}$$
Note that $h_i'(\lambda_i)$ is strictly increasing, and
$$h_i'(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1}\, \hat{f}_i'(\lambda_i) \ge \frac{\alpha_i \beta_i}{\alpha_i - 1}\, f_i(\lambda_i). \tag{60}$$
Let $d_i^0 = h_i'(0) = \frac{\alpha_i}{\alpha_i - 1} \cdot \frac{\beta_i}{s_i^0}$. We can define a permutation $\pi : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ such that $d_i^0$ is in increasing order under $\pi$. We have the following characterization of the optimum.

Theorem 8: The set of servers that are used at the optimum is $\mathcal{N}_s = \{\pi(1), \pi(2), \ldots, \pi(m^+)\}$, with a unique $m^+$ that satisfies
$$\sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left( d^0_{\pi(m^+)} \right) < \lambda \le \sum_{i=1}^{m^+} (h_{\pi(i)}')^{-1}\!\left( d^0_{\pi(m^+ + 1)} \right).$$

Proof: The proof follows that of Theorem 4; we skip it for brevity.

We see that the energy-aware optimal load balancing has a similar water-filling effect: the arrivals will first occupy servers with low marginal cost in the energy-aware metric. As a result, the jobs will be consolidated into a subset of servers that have low energy-aware cost.

1) Distributed load balancing algorithm: The energy-aware optimal load balancing is a convex problem. Again, we can apply a distributed algorithm similar to algorithm (29)-(30) to guide the optimal load balancing design. At the $k$-th iteration:

Each server $i$ estimates the arrival rate $\lambda_i$, and adjusts its speed $s_i$ according to
$$s_i(k) = s_i(\lambda_i(k)). \tag{61}$$

The dispatcher measures the delay $t_i(k) = \frac{1}{s_i(k) - \lambda_i(k)}$ experienced at each server, and estimates $h_i'$ according to
$$h_i'(k) = h_i'(\lambda_i(k)) = \frac{\alpha_i \beta_i}{\alpha_i - 1} \left( \lambda_i(k)\,(t_i(k))^2 + t_i(k) \right). \tag{62}$$

Denote by $E[h'(k)]$ the minimal $h'(k)$ at step $k$ such that $h'(k) = \frac{1}{|\mathcal{N}(k)|} \sum_{i \in \mathcal{N}(k)} h_i'(k)$ with $\mathcal{N}(k) := \{ i \mid \lambda_i(k) > 0 \text{ or } h_i'(k) \le h'(k),\ i \in \mathcal{N} \}$. The dispatcher adjusts $\lambda_i$ to each server according to
$$\lambda_i(k+1) = \left[ \lambda_i(k) - \varepsilon \left( h_i'(k) - E[h'(k)] \right) \right]^+, \tag{63}$$
where $\varepsilon$ is a positive stepsize, and $[\cdot]^+$ denotes the projection onto $\mathbb{R}^+$, the set of nonnegative real numbers.

Again, the energy-aware optimal load balancing algorithm (61)-(63) is more complicated than the energy-oblivious load balancing algorithm (29)-(30). In addition to the servers' power function characteristic parameters, the dispatcher needs to know their weights $\beta_i$.

2) Efficiency loss in energy-aware performance metric at the LBSS equilibrium: Define the social cost in the energy-aware performance metric:
$$D = \sum_i \left[ \frac{\lambda_i \beta_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \right] = \sum_i h_i(\lambda_i). \tag{64}$$
We now characterize the inefficiency in the energy-aware performance metric at the LBSS equilibrium.

It is complicated to characterize the efficiency loss for a system with arbitrary power functions and loads. Here we give a partial characterization, focusing on the case with power functions of the same order, i.e., $P_i(s_i) = k_i s_i^{\alpha}$ for all servers, and in heavy traffic, i.e., $\lambda \to \infty$. We leave a complete characterization of the efficiency loss to future work. The case with power functions of the same order models a system that employs similar servers but with different scaling factors and weights. The heavy traffic regime is of significant interest, as the inefficiency of the load-balancing-speed-scaling interaction is intuitively worst under heavy traffic.

Theorem 9: Assume that $\alpha = 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D_e$ and the optimal cost by $D_o$. Under the aforementioned conditions, we have
$$\frac{D_e}{D_o} \le \frac{\max_i k_i}{\min_i k_i}\, N. \tag{65}$$

Proof: When $\alpha = 2$, at the LBSS equilibrium $(\lambda_i)_{i \in \mathcal{N}}$, the arrivals will be routed to the server that has the maximal $\beta_i / k_i$ value.⁷ Under heavy traffic, the energy-aware social cost at the LBSS equilibrium is
$$D_e \approx \sum_i k_i \lambda_i^2 \le \max_i k_i\, \lambda^2.$$
At the social optimum $(\lambda_i^+)_{i \in \mathcal{N}}$,
$$\lambda_j^+ \approx \frac{1/k_j}{\sum_i 1/k_i}\, \lambda.$$
The optimal social cost is
$$D_o \approx \sum_i k_i (\lambda_i^+)^2 = \frac{\lambda^2}{\sum_i 1/k_i} \ge \frac{\min_i k_i}{N}\, \lambda^2.$$
So,
$$\frac{D_e}{D_o} \le \frac{\max_i k_i}{\min_i k_i}\, N. \tag{66}$$

⁷ There may exist multiple servers that have the maximal $\beta_i / k_i$ value. But it is reasonable to expect that the number of such servers is bounded by a constant that does not scale with the total number of servers in the system. For simplicity of presentation, we assume that there is only one server that has the maximal value. If there are multiple such servers, this only brings a constant factor into the bound on efficiency loss.
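The heavy-traffic estimates used in the proof of Theorem 9 can be checked numerically. The sketch below is illustrative only: it assumes quadratic power functions $P_i(s) = k_i s^2$, uses the approximations $D_e \approx k_i \lambda^2$ and $D_o \approx \lambda^2 / \sum_i (1/k_i)$ rather than exact costs, and all function names and parameter values are hypothetical.

```python
def equilibrium_cost(k, beta, lam):
    """Heavy-traffic estimate D_e ~ k_i * lam^2 for the one server the equilibrium uses."""
    # Assumption: for alpha = 2 the speed rule gives s_i - lambda_i = sqrt(beta_i / k_i),
    # so the delay of server i is sqrt(k_i / beta_i) at any load, and the server with
    # the largest beta_i / k_i attracts all arrivals.
    i_star = max(range(len(k)), key=lambda i: beta[i] / k[i])
    return k[i_star] * lam ** 2


def optimal_cost(k, lam):
    """Heavy-traffic estimate D_o ~ lam^2 / sum_i (1 / k_i), with lambda_j^+ proportional to 1 / k_j."""
    return lam ** 2 / sum(1.0 / ki for ki in k)


def bound(k):
    """Right-hand side of (65): (max_i k_i / min_i k_i) * N."""
    return max(k) / min(k) * len(k)


k = [1.0, 2.0, 4.0, 8.0]       # hypothetical scaling factors
beta = [8.0, 8.0, 8.0, 8.0]    # hypothetical weights
lam = 1000.0                   # a large total load standing in for heavy traffic
ratio = equilibrium_cost(k, beta, lam) / optimal_cost(k, lam)
```

With identical scaling factors the estimated ratio equals $N$ exactly, which is the case where the bound (65) is tight.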

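We see that when $\alpha = 2$, the degree of inefficiency at the LBSS equilibrium scales with the number of servers in the system. This happens because the energy-oblivious load balancing uses only the server with the largest base rate, which incurs a huge energy cost at this server, while the energy-aware optimal load balancing spreads load across all servers, which leads to a much smaller energy cost at the servers. This suggests that we should do energy-aware load balancing if energy consumption is a main concern.

Lemma 10: Assume $\alpha > 2$. Define $\zeta_i = \alpha k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha - 1}{\alpha - 2}}$ for each server $i$. Then,
$$\min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}} \le \gamma^+ \le \max_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}. \tag{67}$$

Proof: By equation (56), $h_i'$ can be written as
$$h_i'(\lambda_i) = \alpha k_i (s_i(\lambda_i))^{\alpha - 1} = \zeta_i \left( f_i(\lambda_i) \right)^{\frac{2(\alpha - 1)}{\alpha - 2}}. \tag{68}$$
If $\gamma^+ < \min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}$, then $(h_i')^{-1}(\gamma^+) < f_i^{-1}(\gamma)$, and thus
$$\sum_i (h_i')^{-1}(\gamma^+) < \sum_i f_i^{-1}(\gamma) = \lambda.$$
This contradicts the fact that $\sum_i (h_i')^{-1}(\gamma^+) = \lambda$. So, $\gamma^+ \ge \min_i \zeta_i\, \gamma^{\frac{2(\alpha - 1)}{\alpha - 2}}$. The second inequality can be proved similarly.

Theorem 11: Assume $\alpha > 2$. Denote the energy-aware social cost at the LBSS equilibrium by $D_e$ and the optimal cost by $D_o$. Under the aforementioned conditions, we have
$$\frac{D_e}{D_o} \le \left( \frac{\max_i \zeta_i}{\min_i \zeta_i} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{69}$$

Proof: Under heavy traffic, $\lambda \to \infty$. By a lemma in [9], we have the following approximation for speed scaling under heavy traffic:
$$s_i(\lambda_i) \approx \lambda_i + \sqrt{\frac{\beta_i}{(\alpha - 1) k_i \lambda_i^{\alpha - 2}}} \approx \lambda_i,$$
and thus
$$\frac{\beta_i \lambda_i}{s_i - \lambda_i} + \frac{\lambda_i}{s_i} P_i(s_i) \approx \lambda_i \sqrt{(\alpha - 1) \beta_i k_i \lambda_i^{\alpha - 2}} + k_i \lambda_i^{\alpha} \approx k_i \lambda_i^{\alpha}.$$
Note that, at the LBSS equilibrium $(\lambda_i)_{i \in \mathcal{N}}$,
$$\gamma = \frac{1}{s_i - \lambda_i} \approx \sqrt{\frac{(\alpha - 1) k_i \lambda_i^{\alpha - 2}}{\beta_i}}.$$
The energy-aware social cost at the LBSS equilibrium is
$$D_e \approx \sum_i k_i \left( \frac{\beta_i \gamma^2}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \le \sum_i k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}, \tag{70}$$
where the inequality follows from (67). At the social optimum $(\lambda_i^+)_{i \in \mathcal{N}}$,
$$\gamma^+ = \alpha k_i s_i^{\alpha - 1} \approx \alpha k_i (\lambda_i^+)^{\alpha - 1}. \tag{71}$$
The optimal social cost is
$$D_o \approx \sum_i k_i \left( \frac{\gamma^+}{\alpha k_i} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{72}$$
Since a ratio of sums is at most the largest termwise ratio,
$$\frac{D_e}{D_o} \le \frac{\sum_i k_i \left( \frac{\beta_i}{(\alpha - 1) k_i} \right)^{\frac{\alpha}{\alpha - 2}} \left( \frac{\gamma^+}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}}{\sum_i k_i \left( \frac{\gamma^+}{\alpha k_i} \right)^{\frac{\alpha}{\alpha - 1}}} \le \max_i \left( \frac{\zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}} = \left( \frac{\max_i \zeta_i}{\min_j \zeta_j} \right)^{\frac{\alpha}{\alpha - 1}}. \tag{73}$$

We see that when $\alpha > 2$, the degree of inefficiency at the LBSS equilibrium depends only on the degree of heterogeneity $\frac{\max_i \zeta_i}{\min_j \zeta_j}$ in the system, but not on the number of servers $N$. If the degree of heterogeneity in the system is small, energy-oblivious load balancing interacts benignly with speed scaling in terms of the energy-aware cost. In this situation, we may not need complicated energy-aware load balancing, i.e., we can decouple the design of load balancing from speed scaling. Otherwise, we must do energy-aware optimal load balancing if energy consumption is a main concern.

VI. NUMERICAL EXAMPLES

In this section, we provide numerical examples to complement the analysis in previous sections. We mainly focus on evaluating the distributed load balancing algorithms, as our other results on the LBSS equilibrium, delay-optimal load balancing and energy-aware optimal load balancing are analytical.

We consider a system with 10 servers with speed scaling. Half of the servers have a power function of the form $P_i(s_i) = k_i s_i^{5/2}$ and the other half have a power function of the form $P_i(s_i) = k_i s_i^{3}$. The total load is normalized to be $\lambda = 10$, and the values of the parameters $k_i$ and $\beta_i$ used to obtain numerical results are randomly drawn from $[1, 10]$ and $[5, 15]$, respectively.

Fig. 2. The arrival rate and service rate evolution of energy-oblivious load balancing.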
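The setup above can be reproduced in a few lines. The following Python sketch runs the energy-aware iteration (61)-(63) on a system of this shape; it is not the authors' simulation code. The seed, the uniform initial split, the stepsize `eps`, the bisection solver and the rescaling step that keeps the loads summing to the total are all assumptions made for illustration, and the delay is computed from the model rather than measured.

```python
import random


def speed(lam, k, alpha, beta):
    """Speed s_i(lam): root of (alpha-1)*k*s^(alpha-2)*(s-lam)^2 = beta with s > lam."""
    g = lambda s: (alpha - 1) * k * s ** (alpha - 2) * (s - lam) ** 2 - beta
    lo, hi = lam, lam + 1.0
    while g(hi) < 0:                      # grow the bracket until it contains the root
        hi = lam + 2 * (hi - lam)
    for _ in range(100):                  # bisection; g is increasing for s > lam
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


def marginal_cost(lam, s, alpha, beta):
    """h_i'(lam) computed from the delay t = 1/(s - lam), as in (62)."""
    t = 1.0 / (s - lam)
    return alpha * beta / (alpha - 1) * (lam * t * t + t)


def mean_over_active(lam, grad):
    """E[h'(k)]: mean marginal cost over N(k) = {i : lam_i > 0 or h_i' <= mean}."""
    active = [g for l, g in zip(lam, grad) if l > 0]
    total, count = sum(active), len(active)
    for g in sorted(g for l, g in zip(lam, grad) if l <= 0):
        if g > total / count:             # idle server is too costly to join N(k)
            break
        total, count = total + g, count + 1
    return total / count


def cost(lam, k, alpha, beta):
    """Energy-aware social cost D = sum_i h_i(lam_i), as in (64)."""
    total = 0.0
    for l, ki, a, b in zip(lam, k, alpha, beta):
        s = speed(l, ki, a, b)
        total += b * l / (s - l) + l * ki * s ** (a - 1)
    return total


def energy_aware_lb(k, alpha, beta, total_load, eps=0.002, iters=2000):
    """Iterate (61)-(63) from a uniform split and return the final arrival rates."""
    n = len(k)
    lam = [total_load / n] * n
    for _ in range(iters):
        s = [speed(l, ki, a, b) for l, ki, a, b in zip(lam, k, alpha, beta)]
        grad = [marginal_cost(l, si, a, b) for l, si, a, b in zip(lam, s, alpha, beta)]
        avg = mean_over_active(lam, grad)
        lam = [max(0.0, l - eps * (g - avg)) for l, g in zip(lam, grad)]
        scale = total_load / sum(lam)     # assumption: rescale so loads sum to total_load
        lam = [l * scale for l in lam]
    return lam


rng = random.Random(0)                    # hypothetical seed; the paper's draws are not given
N = 10
alpha = [2.5] * (N // 2) + [3.0] * (N // 2)
k = [rng.uniform(1, 10) for _ in range(N)]
beta = [rng.uniform(5, 15) for _ in range(N)]
lam_opt = energy_aware_lb(k, alpha, beta, total_load=10.0)
```

Swapping `marginal_cost` for the estimate in (41) would give the corresponding sketch of the delay-optimal algorithm (40)-(42).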

Fig. 3. The arrival rate and service rate evolution of delay optimal load balancing.

Fig. 4. The arrival rate and service rate evolution of energy-aware optimal load balancing.

Figures 2, 3 and 4 show the evolution of the arrival rate and service rate with stepsize γ = 0.2 for energy-oblivious load balancing, delay-optimal load balancing and energy-aware optimal load balancing, respectively. We see that the arrival rates and service rates approach the corresponding equilibrium or optimum quickly.

The numerical results confirm the previous analysis and intuitions. As we go from energy-oblivious load balancing to delay-optimal load balancing, the load is spread more across the servers, which is driven by minimizing the social cost in delay. We also see that the changes in the arrival rate and service rate are not severe, which intuitively confirms Theorem 7, which gives a small bound on efficiency loss at the LBSS equilibrium. As we move to energy-aware optimal load balancing, the load becomes more evenly distributed. This is driven by minimizing the energy-aware social cost: an uneven load distribution leads to an uneven service rate distribution, which may result in a large energy cost at the server(s) with large speed. We also see large changes in the arrival rate and service rate. This implies a large degree of inefficiency at the LBSS equilibrium, which intuitively confirms Theorem 11 even though that theorem characterizes systems with power functions of the same order.

To study the impact of the stepsize on the convergence of the algorithms, we have run simulations with different stepsizes. We found that the smaller the stepsize, the slower the convergence; the larger the stepsize, the faster the convergence, but the system may only approach to within a certain neighborhood of the equilibrium, which is a general characteristic of any gradient-based method. In practice, the dispatcher can first choose large stepsizes to ensure fast convergence, and subsequently reduce the stepsizes once the price starts oscillating around some mean value.

VII. CONCLUSION

We have studied the interaction between load balancing and speed scaling. We characterize the equilibrium resulting from the load balancing and speed scaling interaction, and introduce two optimal load balancing designs, in terms of a traditional performance metric and a cost-aware (in particular, energy-aware) performance metric, respectively. We study in detail the load-balancing-speed-scaling equilibrium and the optimal load balancing designs in processor sharing systems with gated-static speed scaling, and propose distributed load balancing algorithms to achieve the corresponding equilibrium and optima. In particular, we characterize the degree of inefficiency at the load-balancing-speed-scaling equilibrium in terms of delay as well as the energy-aware metric, and show that the degree of inefficiency is mostly bounded by the heterogeneity of the system, but independent of the number of servers. These results provide insights into the interaction of load balancing with speed scaling and can guide new designs.

Further research stemming from this paper includes the following. We are characterizing the efficiency loss in the energy-aware metric at the load-balancing-speed-scaling equilibrium for systems with power functions of different polynomial orders. We are also studying the load balancing and speed scaling interaction in the processor sharing system with general power functions (e.g., nonconvex, discontinuous, possibly with a discrete set of allowable speeds), as well as in systems with other scheduling policies such as Shortest Remaining Processing Time (SRPT). We will further study other speed scaling policies and their impact on the design and performance of load balancing. Finally, we will go beyond energy-aware speed scaling, and study other types of speed scaling behaviors and their interaction with load balancing in, e.g., data centers or call centers.

REFERENCES

[1] O. S. Unsal and I. Koren. System-level power-aware design techniques in real-time systems. Proc. IEEE, 97(3):1055-1069, 2003.
[2] S. Kaxiras and M. Martonosi. Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.
[3] S. Irani and K. R. Pruhs. Algorithmic problems in power management. SIGACT News, 36(2):63-76, 2005.
[4] L. Yuan and G. Qu. Analysis of energy reduction on dynamic voltage scaling-enabled systems. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 24(12):1827-1837, 2005.
[5] Y. Zhu and F. Mueller. Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling. Real Time Systems, 31:33-63, 2005.
[6] N. Bansal, T. Kimbrel, and K. Pruhs. Speed scaling to manage energy and temperature. J. ACM, 54(1):1-39, 2007.
[7] S. Herbert and D. Marculescu. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors. In Proc. ISLPED, 2007.
[8] N. Bansal, H.-L. Chan, and K. Pruhs. Speed scaling with an arbitrary power function. In Proc. ACM-SIAM SODA, 2009.
[9] A. Wierman, L. L. H. Andrew, and A. Tang. Power-aware speed scaling in processor sharing systems. In Proceedings of IEEE Infocom, 2009.
[10] L. L. Andrew, M. Lin, and A. Wierman. Optimality, fairness, and robustness in speed scaling designs. In Proceedings of ACM Sigmetrics, 2010.
[11] N. Bansal, K. Pruhs, and C. Stein. Speed scaling for weighted flow times. In Proc. ACM-SIAM SODA, 2007.
[12] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 1995.
[13] J. M. George and J. M. Harrison. Dynamic control of a queue with adjustable service rate. Operations Research, 49(5):720-731, 2001.
[14] K. Pruhs, P. Uthaisombut, and G. Woeginger. Getting the best response for your erg. In Scandinavian Worksh. Alg. Theory, 2004.
[15] J. R. Bradley. Optimal control of a dual service rate M/M/1 production-inventory model. European Journal of Operations Research, 161(3):812-837, 2005.
[16] S. Albers and H. Fujiwara. Energy-efficient algorithms for flow time minimization. Lecture Notes in Computer Science, 3884:621-633, 2006.
[17] D. P. Bunde. Power-aware scheduling for makespan and flow. In Proc. ACM Symp. Parallel Alg. and Arch, 2006.
[18] S. Zhang and K. S. Chatha. Approximation algorithm for the temperature-aware scheduling problem. In Proceedings of IEEE Conference on Computer Aided Design, 2007.
[19] N. Bansal, H.-L. Chan, T.-W. Lam, and L.-K. Lee. Scheduling for speed bounded processors. In Int. Colloq. Automata, Languages and Programming, 2008.
[20] N. Bansal, H.-L. Chan, K. Pruhs, and D. Katz. Improved bounds for speed scaling in devices obeying the cube-root rule. In Int. Colloq. Automata, Languages and Programming, 2009.
[21] T.-W. Lam, L.-K. Lee, I. K. K. To, and P. W. H. Wong. Speed scaling functions for flow time scheduling based on active job count. In Proc. Euro. Symp. Alg., 2009.
[22] M. Haviv and T. Roughgarden. The price of anarchy in an exponential multi-server. Operations Research Letters, 35:421-426, 2007.
[23] T. Wu and D. Starobinski. On the price of anarchy in unbounded delay networks. In Proc. of Game Theory for Comm. and Networks, 2006.
[24] E. Altman, U. Ayesta, and B. J. Prabhu. Optimal load balancing in processor sharing systems. In Proceedings of GameComm, 2008.
[25] H. Chen, J. Marden, and A. Wierman. On the impact of heterogeneity and back-end scheduling in load balancing designs. In Proceedings of IEEE Infocom, 2009.
[26] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation. Prentice Hall, 1989.
[27] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic Game Theory. Cambridge University Press, 2007.