Improving Response Time and Energy Efficiency in Server Clusters




Raphael Guerra, Luciano Bertini and J.C.B. Leite
Instituto de Computação - Universidade Federal Fluminense
Rua Passo da Pátria, 156, Bloco E, 24.210-240, Niterói, RJ, Brazil
[rguerra, lbertini, julius]@ic.uff.br

Abstract. The development of energy-efficient web server clusters requires the study of different request dispatch policies applied by the cluster's central access point (the front-end), and/or the application of hardware techniques that allow the best use of resources. However, energy efficiency should not be attained at the expense of poor response times. This paper describes a technique that aims to balance energy consumption and adequate response times for soft real-time applications in server clusters.

Resumo. The development of energy-efficient web servers requires not only the study of dispatch policies to be applied by the central node, but also the use of hardware techniques that allow better use of resources. However, this efficiency cannot be obtained at the expense of meeting request execution deadlines. This paper describes a technique that tries to balance energy consumption and adequate response times for soft real-time applications in server clusters.

1 Introduction

The development of energy-efficient mechanisms for web server clusters requires the study of different request dispatch policies applied by the cluster's central access point (the front-end), and/or the application of hardware techniques that allow the best use of resources. Several works have been published on these policies, and a good review is presented in [Cardellini et al. 2002]. Essentially, they classify the algorithms into those that work at OSI layer 4 and those that work at OSI layer 7. The former are not content aware, i.e., they cannot inspect the requested content when making the dispatch decision.
On the other hand, the latter can rely on information extracted from the URL, for purposes such as improving cache affinity, increasing load sharing, and using specialized server nodes to provide, for example, streaming media and dynamic content. Most of the work mentioned, however, aimed at maximizing performance, not energy efficiency.

Another important structural characteristic of a server cluster, for which research is still beginning, is node heterogeneity combined with energy efficiency. When maintaining a web cluster, a replacement or newly added node naturally differs from the old ones. Thus, a cluster is usually homogeneous only when it is first put into service. Another viewpoint on heterogeneity is given in [Lefurgy et al. 2003], on the architecture of commercial servers and the possibilities for energy efficiency in their various subsystems. In that work, the authors state that mixing power-efficient and performance-efficient processors is important for supporting Internet applications, because these applications require both efficient network-protocol processing and application-level computation. Whatever the motivation, it is necessary to develop new power management

techniques aware of the cluster heterogeneity. Furthermore, for the mentioned service differentiation, it is necessary to provide some kind of QoS control, for example, at the response time level.

There are two main mechanisms that can be used to reduce the energy consumption of a cluster, leaving aside memory, disks, and other peripherals. The first is DVS (Dynamic Voltage Scaling): scaling the voltage and frequency of the processor among predefined supported levels. The other is dynamic configuration of the cluster structure, also called VOVO (Vary-On Vary-Off) or simply dynamic cluster reconfiguration: turning a server off to save energy, or turning one on to improve performance. Both techniques have been used together by several authors and have proved successful.

Although energy minimization is important, it will not always be desired at the maximum level. For example, the system administrator may wish to speed up the system at a higher energy cost, or it may be desirable to maintain different classes of clients with different privileges regarding response time. In an e-commerce application, for example, clients that have already started a transaction should get better response times than those merely browsing the site. For this reason, the system must be designed with QoS in mind.

The purpose of this paper is to present a heterogeneous web server cluster model, with the goal of attaining minimum energy expenditure while guaranteeing response time requirements. The techniques used are DVS and cluster reconfiguration, with a content-blind request dispatch algorithm. Through simulation, we show results that outperform state-of-the-art techniques.

The paper is organized as follows: section 2 presents related work on energy-efficient web servers. Section 3 presents the system model adopted, and section 4 the problem formulation and solution.
Section 5 presents some results and section 6 our conclusions.

2 Related Work

In [Bohrer et al. 2002] the authors applied DVS to a single server, using utilization limits to change the frequency. Also for single servers, the technique of DVS combined with request delaying is presented in [Elnozahy et al. 2003]. Important works for clustered servers [Chase and Doyle 2001, Chase et al. 2001, Pinheiro et al. 2003, Rajamani and Lefurgy 2003] presented similar ways of applying DVS and cluster reconfiguration, using threshold values based on the utilization or the system load to define the transition points, keeping the processor frequencies as low as possible with the fewest possible active nodes. All these works are summarized in the survey presented in [Bianchini and Rajamony 2004]. The work in [Rusu et al. 2004] evaluates DVS policies for power management in systems with unpredictable workloads. One simple technique, used in [Xu et al. 2005], is application-oblivious prediction, based on periodic utilization monitoring. They also show more complex techniques that attempt to predict performance needs by monitoring the arrival rate and CPU requirements of each request. In [Elnozahy et al. 2002] the IVS (Independent Voltage Scaling) and CVS (Coordinated Voltage Scaling) techniques are proposed. In the former, each server node decides

locally its frequency value, while in the latter all nodes operate close to the average frequency of the whole cluster. The authors also combine these DVS techniques with VOVO, which was originally proposed in an earlier version of [Pinheiro et al. 2003]. In that work, only continuous frequencies are considered. The work in [Sharma et al. 2003] considers DVS in QoS-enabled web server clusters, assuming load balancing among the nodes, which makes the power management problem symmetric across the cluster. [Lien et al. 2004] presents a simple reconfiguration technique for a server cluster. Their model assumes an M/M/m queue, and the energy consumption is calculated using the system's expected waiting time. However, they consider neither heterogeneity nor the DVS capability. Finally, the papers [Xu et al. 2005] and [Rusu et al. 2006] are the most relevant to our work. The former proposes the LAOVS (Load-Aware On-off with independent Voltage Scale) technique, where the number of active nodes is determined using a table calculated off-line, with a load discretization. For each load value, the best number of active nodes is obtained. The local power management is based on DVS, using the same techniques presented in [Rusu et al. 2004]. They do not consider heterogeneity. In [Rusu et al. 2006] they include heterogeneity and QoS restrictions.

3 System Model

In our model, we consider a cluster with a total of N server nodes, of which n are active, one front-end node, and only one type of request. The servers can be turned on and off as needed, and their operating frequencies can be adjusted in a discrete way. The front-end node, assumed to work at OSI layer 4, receives the requests from clients and redistributes them to the server nodes using a content-blind request distribution method. The dispatching algorithm is a weighted random dispatch, where the requests are split into n streams, n being the number of active nodes in the cluster.
The probability of an incoming request being sent to a stream is proportional to the operating frequency of the associated node. This same dispatching technique is used in some commercial web servers based on a layer-4 web switch [Cardellini et al. 2002]. We consider that the requests follow a Poisson process with average arrival rate λ. The requests are distributed to N queues, each one with a service rate µ_i (thus allowing for heterogeneous servers). The arrival rate for each queue is q_i λ, where q_i is the probability of sending a request to server i, given by q_i = fop_i / Σ_{j=1}^{N} fop_j. Here, fop_i is the operating frequency of server i, and the denominator is the sum of the operating frequencies of all nodes (inactive nodes count as 0). Thus, the probability q_i represents the fraction of the load that server i can handle in the current configuration. An inactive node obviously handles zero load and has null probability of receiving a request. The request service times follow an exponential distribution, with service rate µ if executed on the fastest processor at its highest frequency. Thus, the service rates of the queues are given by µ·fop_1, µ·fop_2, ..., µ·fop_N. The model is shown in Figure 1. In the model described, one Poisson process is split into N sequences of requests among the N servers, randomly selected as previously described. It is a well-known result that in this case we have N Poisson subprocesses, each one with arrival rate q_i λ.
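The content-blind weighted random dispatch described above can be sketched in a few lines. The following is an illustrative Python sketch (function names and interface are ours, not the paper's): the front-end draws a server index with probability proportional to its operating frequency, so inactive nodes (fop_i = 0) are never selected.

```python
import random

def dispatch_probabilities(fop):
    """Per-node dispatch probabilities q_i = fop_i / sum_j fop_j.
    Inactive nodes (fop_i == 0) get probability zero."""
    total = sum(fop)
    return [f / total for f in fop]

def pick_server(fop, rng=random):
    """Pick a server index with probability proportional to its
    operating frequency (content-blind weighted random dispatch)."""
    total = sum(fop)
    r = rng.random() * total
    acc = 0.0
    for i, f in enumerate(fop):
        acc += f
        if r < acc:
            return i
    return len(fop) - 1  # guard against floating-point edge cases
```

A layer-4 switch would call pick_server once per incoming connection; the weights only need to be recomputed when the cluster configuration changes.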

Figure 1. Cluster model (the front-end splits the Poisson arrival stream λ into per-node streams with rates λ_i = fop_i λ / Σ_j fop_j and service rates µ_i = µ·fop_i)

The response time (deadline) will be used as a QoS parameter, and the goal is to keep a predefined fraction β of the requests finishing before this deadline. We call β the reliability factor. Thus, we should keep the probability W(t) = Pr[response time ≤ t] ≥ β. We calculate the mean value of this probability for the whole cluster as the average of each W_i weighted by the probability q_i. The equation for W(t), using the distribution function for the response time of an M/M/1 queue [Kleinrock 1975], is as follows:

    W(t) = \sum_{i=1}^{N} q_i \left( 1 - e^{-(\mu_i - \lambda_i) t} \right)    (1)

The maximum workload that the system supports, in cycles per second, is \sum_{i=1}^{N} maxfreq_i. Without loss of generality, frequencies are normalized by the maximum frequency of all the processors, and the parameter µ refers to this maximum frequency. Thus, the mean number of cycles per request is 1/µ, and the actual load is λ/µ, in cycles per second. We can then normalize the system load by the maximum supported load:

    x = \frac{\lambda / \mu}{\sum_{i=1}^{N} maxfreq_i} = \frac{\lambda}{\mu \sum_{i=1}^{N} maxfreq_i}    (2)

The actual capacity of the active cluster, given by \sum_{i=1}^{N} fop_i, must be greater than or equal to the actual workload, in order to keep up with the incoming requests. In other words, using the normalized quantities, the normalized workload x must be smaller than \sum_{i=1}^{N} fop_i / \sum_{i=1}^{N} maxfreq_i.

Finally, it is assumed that VOVO and DVS decisions are cluster-wide and thus taken only by the front-end node. Load measurements are made periodically, and the decision to reconfigure the system is taken after an increase or decrease of the load has been observed a predefined number of consecutive times.

4 Problem Definition and Solution Sketch

The problem to be solved is to establish, for each processor, whether it will be on or off and, in the former case, its operating frequency, subject to energy and timing restrictions.
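The timing restriction relies on evaluating the response-time probability W(t) of Eq. (1). A minimal Python sketch of that evaluation, under the model's assumptions (Poisson arrivals split by q_i, M/M/1 nodes, frequencies normalized so the fastest processor's top frequency is 1); the function name and interface are illustrative:

```python
import math

def met_deadline_fraction(fop, lam, mu, t):
    """W(t) from Eq. (1): expected fraction of requests finishing
    within deadline t, averaged over the cluster.

    fop: operating frequencies, normalized by the fastest processor's
         maximum frequency (0.0 for inactive nodes).
    lam: total request arrival rate (requests/s).
    mu:  service rate at the maximum normalized frequency (requests/s).
    """
    total = sum(fop)
    w = 0.0
    for f in fop:
        if f == 0.0:
            continue                 # inactive node: q_i = 0
        q = f / total                # dispatch probability q_i
        mu_i = mu * f                # per-node service rate
        lam_i = q * lam              # Poisson split: per-node arrival rate
        w += q * (1.0 - math.exp(-(mu_i - lam_i) * t))
    return w
```

The QoS constraint of the next section is then simply met_deadline_fraction(fop, lam, mu, t) ≥ β.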
The solution to the problem is a vector {fop_1, fop_2, ..., fop_N}, where fop_i is the operating frequency of processor i, set to zero if processor i is not active. We

can see this problem as an optimization problem whose goal is to minimize the total aggregate power of the cluster while still guaranteeing an acceptable response time. Let p_i(f_j) be the power consumption of processor i running at frequency f_j. The value f_0 will be equal to zero, representing that processor i is turned off and consumes no energy. With these assumptions, considering only the active servers, the aggregate power of the cluster is P = Σ_{i=1}^{N} [ρ p_i(fop_i) + (1 − ρ) p_i(idle)], where p_i(idle) is the power of processor i when idle, and ρ is the processor utilization. The problem can be stated as follows:

Minimize

    P = \sum_{i=1}^{N} \left[ \rho\, p_i(fop_i) + (1 - \rho)\, p_i(idle) \right]    (3)

subject to

    \frac{\sum_{i=1}^{N} fop_i}{\sum_{i=1}^{N} maxfreq_i} \ge x    (4)

and

    \sum_{i=1}^{N} q_i \left( 1 - e^{-(\mu_i - \lambda_i) t} \right) \ge \beta    (5)

where t is the predefined expected response time and β is the minimum fraction of the requests that should fulfill the QoS requirement.

In order to determine the number of active nodes and their respective operating frequencies, and inspired by the work done in [Rusu et al. 2006], this optimization problem is solved off-line and a number of tables are obtained. That is, assuming the normalized workload is represented in r_x discrete levels, the desired response time in r_t levels, and that we have r_r different reliability factors β, we will have a maximum of r_r · r_t tables, each one with r_x entries. Many techniques could be used to obtain the solution. For this experiment, we used a search algorithm that solves the problem optimally, which proved adequate for the number of nodes considered here (up to 10). Although an exact algorithm becomes inefficient for a greater number of nodes, this is not a concern in this work; heuristics such as GRASP or tabu search could be used to reduce the off-line computation time.
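Because the tables are computed off-line, even an exhaustive search over all on/off and frequency combinations is workable at this scale. The sketch below is an illustrative brute-force version of such a search, not the authors' implementation: it uses a simplified power model (active power only, omitting the ρ/idle split of Eq. 3), and the per-node frequency and power tables are hypothetical.

```python
import itertools
import math

def best_configuration(freq_levels, power, x, mu, t, beta):
    """Exhaustive off-line search for the cheapest feasible configuration.

    freq_levels[i]: candidate normalized frequencies of node i,
                    with 0.0 meaning 'node off'.
    power[i]:       dict mapping each frequency of node i to watts
                    (power[i][0.0] == 0.0).
    Returns (cfg, watts): the vector {fop_1, ..., fop_N} minimizing
    aggregate power subject to capacity (Eq. 4) and QoS (Eq. 5).
    """
    cap_total = sum(max(levels) for levels in freq_levels)
    lam = x * mu * cap_total            # arrival rate implied by load x (Eq. 2)
    best, best_p = None, math.inf
    for cfg in itertools.product(*freq_levels):
        total = sum(cfg)
        if total == 0 or total / cap_total < x:
            continue                    # capacity constraint, Eq. (4)
        # W(t) of Eq. (1): q_i = f/total, mu_i = mu*f, lam_i = q_i*lam
        w = sum((f / total) * (1.0 - math.exp(-(mu * f - (f / total) * lam) * t))
                for f in cfg if f > 0.0)
        if w < beta:
            continue                    # QoS constraint, Eq. (5)
        p = sum(power[i][f] for i, f in enumerate(cfg))
        if p < best_p:
            best, best_p = cfg, p
    return best, best_p
```

Running this once per (load level, deadline, β) triple yields exactly the kind of lookup tables described above.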
5 Simulation Results

In our experiments we assumed a server cluster with 8 machines, two of each type shown in Table 1. The requests follow a Poisson process, with the average depending on the desired workload. The execution time of each request follows an exponential distribution with average 0.01 s (if executed at the highest frequency of the fastest processor). In each experiment, a total of 8 × 10^5 requests were simulated. To compute the tables referred to in the previous section, a granularity of 0.01 was assumed for the workload. The QoS requirement and reliability factor are set according to the specific experiment. Also,

it is assumed that load measurements are made every 1 s, that changes in the configuration are made after 5 consecutive load increases (or decreases), and that ρ is computed every 1 s in the simulation. Finally, it should be mentioned that the simulator takes into account the effect of switching a server on and off: switching on a server implies a 33 s penalty and an additional 190 J of energy consumption.

Table 1. Processor specifications

  Processor           | Frequencies (MHz)                             | Resp. power consumption (W)
  XScale              | idle, 150.0, 400.0, 600.0, 800.0, 1000.0      | 0.355, 0.355, 0.445, 0.675, 1.175, 1.875
  PowerPC 750         | idle, 4.125, 8.25, 16.5, 33, 99, 115.5, 132   | 1.150, 1.150, 1.369, 1.811, 2.661, 4.763, 5.269, 6.533
  PowerPC 750GX 1GHz  | idle, 533, 600, 667, 733, 800, 867, 933, 1000 | 7.63, 7.63, 7.8, 7.97, 8.13, 8.30, 10.35, 12, 12.25
  PowerPC 405GP       | idle, 66, 133, 200, 266                       | 0.74, 0.74, 1.09, 1.36, 1.58

To assess our method, we compared it to the one proposed in [Rusu et al. 2006]. Figure 2 shows the average power consumption of our method, for different response-time QoS parameters and a constant reliability factor of 0.8, and of the method presented in [Rusu et al. 2006], but without the QoS restrictions, so that we compare both methods in their most energy-efficient situation. For this comparison, we assumed a QoS parameter of 1 s, because this value is high enough for the great majority of requests to be executed in time, resulting in the best energy efficiency. Our method presents better results, even in some cases with a tighter response-time restriction (workload lower than 0.3). The reason is that our search algorithm finds the best configuration for each load level, while the method presented in [Rusu et al. 2006] uses a predefined sequence of machines to be turned on and off, which limits the optimization process.
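The reconfiguration rule stated at the beginning of this section (act only after 5 consecutive load increases or decreases, with 1 s samples) is a simple hysteresis filter. A possible sketch follows; the class and its interface are ours, for illustration only:

```python
class ReconfigTrigger:
    """Fires a reconfiguration only after the measured load has moved
    in the same direction for k consecutive sampling periods (the
    setup above uses k = 5 with 1 s samples), damping oscillations."""

    def __init__(self, k=5):
        self.k = k
        self.prev = None
        self.streak = 0      # signed run length: +n rising, -n falling

    def observe(self, load):
        """Feed one load sample; return 'up', 'down', or None."""
        if self.prev is not None:
            if load > self.prev:
                self.streak = self.streak + 1 if self.streak > 0 else 1
            elif load < self.prev:
                self.streak = self.streak - 1 if self.streak < 0 else -1
            else:
                self.streak = 0
        self.prev = load
        if self.streak >= self.k:
            self.streak = 0
            return 'up'
        if self.streak <= -self.k:
            self.streak = 0
            return 'down'
        return None
```

When observe returns 'up' or 'down', the front-end would look up the new load level in the off-line tables and apply the corresponding configuration.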
As expected, the tighter the QoS requirement, or the higher the workload, the faster the processors must run to answer the requests within the specified deadline, thus consuming more power. This behavior can be clearly seen in the figure. In our implementation, whenever the defined QoS cannot be satisfied, we set the cluster to full power, in order to operate at a best-effort level. This is why all curves, at some point, meet on the same line (the full-power situation for a given load). For example, for workloads greater than 0.7 in Figure 2, all configurations with QoS response times of 0.02, 0.05, and 0.07 seconds reach the same power consumption.

Figure 3 shows the cluster power consumption for different workloads. In this experiment, the QoS requirement was kept constant at 0.05 s. As can be seen, the effect on power consumption of imposing a higher reliability factor grows with the workload. In this situation, the system quickly saturates and starts to work at full power (best-effort approach). For example, the curve with workload 0.8 saturates at reliability 0.6, while the curve with workload 0.6 saturates only at reliability 0.8.

Finally, Figure 4 shows the actual fraction of requests that have their time demands satisfied, for different β and workloads. Ideally, this curve should follow the identity line, but it is easy to see that, as the workload increases, it becomes harder to ensure a high percentage

of met deadlines (or even impossible, due to cluster saturation).

Figure 2. Cluster aggregate power for different QoS requirements, with β = 0.8
Figure 3. Cluster aggregate power for different workloads and a QoS requirement of 0.05 s

As can be seen, as β increases the curves depart from the identity line and eventually saturate (meaning that the system cannot satisfy the QoS requirement at the specified reliability level, shown as points below the identity line). Additionally, due to the discrete frequencies of the processors, as the workload and factor β decrease, the curves bend upward. This is because the processors have a minimum operating frequency, so requests are processed at a higher frequency than necessary. This can be clearly seen in the step-like curve for workload 0.2.

Figure 4. Actual fraction of QoS restrictions met, as a function of β, for different workloads, with QoS = 0.05 s

6 Conclusion

In this paper we presented a technique to minimize energy consumption while maintaining adequate response times for soft real-time applications in web server clusters. The problem is stated as an optimization problem and is solved off-line. During system operation, according to the offered load, the QoS restriction (response times), and the predefined proportion of requests that should have their deadlines met (the soft real-time criterion), processors are switched on and off, and the active ones are set to an optimal frequency of operation.
In our simulations and comparisons with other proposals, the technique described here showed promising results.

7 Acknowledgments

The authors would like to thank CNPq, Capes and Faperj for partially funding this research, and the anonymous reviewers for their comments.

References

Bianchini, R. and Rajamony, R. (2004). Power and energy management for server systems. IEEE Computer, 37(11):68-74.

Bohrer, P., Elnozahy, M., Kistler, M., Lefurgy, C., McDowell, C., and Rajamony, R. (2002). The case for power management in web servers. In Graybill, R. and Melhem, R., editors, Power Aware Computing. Kluwer Academic Publishers.

Cardellini, V., Casalicchio, E., Colajanni, M., and Yu, P. S. (2002). The state of the art in locally distributed web-server systems. ACM Computing Surveys, 34(2):263-311.

Chase, J., Anderson, D., Thakur, P., and Vahdat, A. (2001). Managing energy and server resources in hosting centers. In Proceedings of the 18th Symposium on Operating Systems Principles, pages 103-116, Banff, Alberta, Canada.

Chase, J. and Doyle, R. (2001). Balance of power: Energy management for server clusters. In Eighth Workshop on Hot Topics in Operating Systems.

Elnozahy, M., Kistler, M., and Rajamony, R. (2002). Energy-efficient server clusters. In Second Workshop on Power Aware Computing Systems, pages 179-196, Cambridge, MA, USA.

Elnozahy, M., Kistler, M., and Rajamony, R. (2003). Energy conservation policies for web servers. In 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, USA.

Kleinrock, L. (1975). Queueing Systems, volume 1. John Wiley and Sons.

Lefurgy, C., Rajamani, K., Rawson, F., Felter, W., Kistler, M., and Keller, T. W. (2003). Energy management for commercial servers. IEEE Computer, 36(12):39-48.

Lien, C.-H., Bai, Y.-W., Lin, M.-B., and Chen, P.-A. (2004). The saving of energy in web server clusters by utilizing dynamic server management. In 12th IEEE International Conference on Networks, volume 1, pages 253-257, Singapore.

Pinheiro, E., Bianchini, R., Carrera, E. V., and Heath, T. (2003). Dynamic cluster reconfiguration for power and performance. In Compilers and Operating Systems for Low Power. Kluwer Academic Publishers.

Rajamani, K. and Lefurgy, C. (2003). On evaluating request-distribution schemes for saving energy in server clusters. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 111-122, Austin, Texas, USA.

Rusu, C., Ferreira, A., Scordino, C., Watson, A., Melhem, R., and Mossé, D. (2006). Energy-efficient real-time heterogeneous server clusters. In IEEE Real-Time and Embedded Technology and Applications Symposium, San Jose, CA, USA.

Rusu, C., Xu, R., Melhem, R., and Mossé, D. (2004). Energy-efficient policies for request-driven soft real-time systems. In 16th Euromicro Conference on Real-Time Systems, pages 175-183, Catania, Italy.

Sharma, V., Thomas, A., Abdelzaher, T. F., Skadron, K., and Lu, Z. (2003). Power-aware QoS management in web servers. In 24th IEEE Real-Time Systems Symposium, pages 63-72, Cancun, Mexico.

Xu, R., Zhu, D., Rusu, C., Melhem, R., and Mossé, D. (2005). Energy-efficient policies for embedded clusters. SIGPLAN Notices, 40(7):1-10.