Improving Response Time and Energy Efficiency in Server Clusters

Raphael Guerra, Luciano Bertini and J.C.B. Leite
Instituto de Computação - Universidade Federal Fluminense
Rua Passo da Pátria, 156, Bloco E, 24.210-240, Niterói, RJ, Brazil
[rguerra, lbertini, julius]@ic.uff.br

Abstract. The development of energy-efficient web server clusters requires the study of different request dispatch policies applied by the central access point to the cluster, the front-end, and/or the application of hardware techniques that allow the best usage of resources. However, energy efficiency should not be attained at the expense of poor response times. This paper describes a technique that tries to balance energy consumption and adequate response times for soft real-time applications in server clusters.

1 Introduction

The development of energy-efficient mechanisms for web server clusters requires the study of different request dispatch policies applied by the central access point to the cluster, the front-end, and/or the application of hardware techniques that allow the best usage of resources. Several works have been published on these policies, and a good review is presented in [Cardellini et al. 2002]. Essentially, they classify the algorithms into those that work at OSI layer 4 and those that work at OSI layer 7. The former are not content aware, i.e., they cannot look at what content is being requested to make the dispatch decision.
On the other hand, the latter can rely on information extracted from the URL, for purposes such as improving cache affinity, increasing load sharing, and using specialized server nodes to provide, for example, streaming media and dynamic content. Most of the work mentioned, however, was done aiming to maximize performance, not energy efficiency.

Another important structural characteristic of a server cluster, for which research is still in its early stages, is to consider node heterogeneity together with energy efficiency. When maintaining a web cluster, a replacement node or a new node to be added is naturally different from the old ones; thus, a cluster is usually homogeneous only when it is first put into service. Another viewpoint on heterogeneity is given in [Lefurgy et al. 2003], on the architecture of commercial servers and the possibilities for energy efficiency in their various subsystems. In that work, the authors state that mixing power-efficient and performance-efficient processors is important for the support of Internet applications, because these applications require both efficient network-protocol processing and application-level computation. Whatever the motivation, it is necessary to develop new power management
techniques that are aware of cluster heterogeneity. Furthermore, for the mentioned service differentiation, it is necessary to provide some kind of QoS control, for example, at the response-time level.

There are two main mechanisms that can be used to reduce the energy consumption of a cluster, leaving aside memory, disks and other peripherals. The first is DVS (Dynamic Voltage Scaling), which scales the voltage and frequency of the processor among predefined supported levels. The other is the dynamic structural configuration of the server cluster, called VOVO (Vary-On Vary-Off) or simply dynamic cluster reconfiguration: turning a server off to save energy, or turning a server on to improve performance. Both techniques have been used together by several authors and have proved successful.

Although energy minimization is important, it is not always desired at its maximum level. For example, the system administrator may wish to speed up the system at a higher energy cost, or it may be desirable to maintain different classes of clients with different response-time privileges. In an e-commerce application, for example, clients that have already started a transaction should get better response times than those that are merely browsing the site. For this reason, the system must be designed with QoS in mind.

The purpose of this paper is to present a heterogeneous web server cluster model, with the goal of attaining minimum energy expenditure while guaranteeing response-time requirements. The techniques used are DVS and cluster reconfiguration, with a content-blind request dispatch algorithm. Through simulation, we show results that outperform state-of-the-art techniques.

The paper is organized as follows: Section 2 presents related work on energy-efficient web servers, Section 3 presents the system model adopted, and Section 4 the problem formulation and solution.
Section 5 presents some results and Section 6 presents our conclusions.

2 Related Work

In [Bohrer et al. 2002] the authors applied DVS to a single server, based on utilization limits to change the frequency. Also for single servers, a technique that combines DVS with the delaying of requests is presented in [Elnozahy et al. 2003]. Important works for clustered servers [Chase and Doyle 2001, Chase et al. 2001, Pinheiro et al. 2003, Rajamani and Lefurgy 2003] presented similar ways of applying DVS and cluster reconfiguration, using threshold values based on the utilization or the system load to define the transition points, keeping the processor frequencies as low as possible with the fewest possible number of active nodes. All these works are summarized in the survey presented in [Bianchini and Rajamony 2004]. The work in [Rusu et al. 2004] evaluates DVS policies for power management in systems with unpredictable workloads. One simple technique, used in [Xu et al. 2005], is application-oblivious prediction, based on periodic utilization monitoring. They also show more complex techniques which attempt to predict performance needs by monitoring the arrival rate and the CPU requirements of each request. In [Elnozahy et al. 2002] the IVS (Independent Voltage Scaling) and CVS (Coordinated Voltage Scaling) techniques are proposed. In the former, each server node decides
its frequency locally, while in the latter all nodes operate close to the average frequency for the whole cluster. They also combine these DVS techniques with VOVO, which was originally proposed in an earlier version of [Pinheiro et al. 2003]. In that work, only continuous frequencies are considered. The work in [Sharma et al. 2003] considers DVS in QoS-enabled web server clusters, assuming load balancing among the nodes, which makes the power management problem symmetric across the cluster. A simple reconfiguration technique for a server cluster is presented in [Lien et al. 2004]. Their model assumes an M/M/m queue, and the energy consumption is calculated using the system's expected waiting time. However, they consider neither heterogeneity nor the DVS capability. Finally, the papers [Xu et al. 2005] and [Rusu et al. 2006] are the most relevant to our work. The former proposes the LAOVS technique (Load-Aware On-off with independent Voltage Scale), where the number of active nodes is determined using a table calculated off-line, with a load discretization; for each load value, the best number of active nodes is obtained. The local power management is based on DVS, using the same techniques presented in [Rusu et al. 2004]. They do not consider heterogeneity. The latter, [Rusu et al. 2006], includes heterogeneity and QoS restrictions.

3 System Model

In our model, we consider a cluster with a total of N server nodes, of which n are active, one front-end node, and only one type of request. The servers can be turned on and off as needed, and their operating frequencies can be adjusted in discrete steps. The front-end node, assumed to work at OSI layer 4, receives the requests from clients and redistributes them to the server nodes using a content-blind request distribution method. The dispatching algorithm is a weighted random dispatch, in which the requests are split into n streams, one per active node.
The probability of an incoming request being sent to a stream is proportional to the operating frequency of the associated node. This same dispatching technique is used in some commercial web servers based on a layer-4 web switch [Cardellini et al. 2002]. We consider that the requests follow a Poisson process with average arrival rate λ. The requests are distributed to N queues, each one with a service rate µ_i (thus allowing for heterogeneous servers). The arrival rate for each queue is q_i λ, where q_i is the probability of sending a request to server i, given by

    q_i = fop_i / Σ_{j=1}^{N} fop_j

where fop_i is the operating frequency of server i and the denominator is the sum of the operating frequencies of all nodes (inactive nodes count as 0). Thus, the probability q_i represents the fraction of the load that server i handles in the current configuration; an inactive node handles no load and has a null probability of receiving a request. The request service times follow an exponential distribution, with service rate µ when executed on the fastest processor at its highest frequency. With frequencies normalized to that maximum, the service rate of each queue is µ_i = µ·fop_i, i.e., the rates are µ·fop_1, µ·fop_2, ..., µ·fop_N. The model is shown in Figure 1. In the model described, one Poisson process is split into N sequences of requests among the N servers, randomly selected as described above. It is a well-known result that in this case we obtain N Poisson subprocesses, each one with arrival rate q_i λ.
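Assuming Python, the front-end's frequency-proportional selection can be sketched with the standard library's weighted sampling; the function and variable names below are our own illustration of the dispatch rule, not the paper's implementation.

```python
import random

# Sketch of the content-blind weighted random dispatch: an incoming
# request goes to server i with probability q_i = fop_i / sum_j fop_j.
# `fop` holds the operating frequencies, with 0.0 for inactive nodes,
# so an inactive node is never selected.

def dispatch(fop):
    """Return the index of the server that receives the next request."""
    return random.choices(range(len(fop)), weights=fop, k=1)[0]

# Example: node 0 is off, node 1 runs twice as fast as node 2, so it
# should receive roughly two thirds of the requests.
counts = [0, 0, 0]
for _ in range(10_000):
    counts[dispatch([0.0, 2.0, 1.0])] += 1
```

Splitting a Poisson stream this way yields the N Poisson subprocesses with rates q_i λ mentioned above.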
[Figure 1. Cluster model: the front-end splits the Poisson arrival stream of rate λ into per-server streams with rates λ_i = (fop_i / Σ_{j=1}^{N} fop_j) λ, each served at rate µ_i = µ·fop_i.]

The response time (deadline) is used as a QoS parameter, and the goal is to keep a predefined fraction β of the requests finishing before this deadline; we call β the reliability factor. Thus, we should keep the probability W(t) = Pr[response time ≤ t] ≥ β. We calculate the mean value of this probability for the whole cluster as the average of the per-server probabilities W_i, weighted by the probabilities q_i. The equation for W(t), using the distribution function for the response time of an M/M/1 queue [Kleinrock 1975], is as follows:

    W(t) = Σ_{i=1}^{N} q_i (1 − e^{−(µ_i − λ_i) t})        (1)

The maximum workload that the system supports, in cycles per second, is Σ_{i=1}^{N} max_freq_i. Without loss of generality, frequencies are normalized by the maximum frequency among all processors, and the parameter µ refers to this maximum frequency. Thus, the mean number of cycles per request is 1/µ, and the actual load of the system, in normalized cycles per second, is λ/µ. We can then normalize the system load by the maximum supported load:

    x = (λ/µ) / Σ_{i=1}^{N} max_freq_i = λ / (µ Σ_{i=1}^{N} max_freq_i)        (2)

The actual capacity of the active cluster, given by Σ_{i=1}^{N} fop_i, must be greater than or equal to the actual workload, in order to keep up with the incoming requests. In other words, using the normalized quantities, the normalized workload x must not exceed Σ_{i=1}^{N} fop_i / Σ_{i=1}^{N} max_freq_i.

Finally, it is assumed that VOVO and DVS decisions are cluster-wide and thus taken only by the front-end node. Load measurements are made periodically, and the decision to reconfigure the system is taken after an increase or decrease in load has been observed a predefined number of consecutive times.

4 Problem Definition and Solution Sketch

The problem to be solved is to establish, for each processor, whether it will be on or off and, in the former case, its operating frequency, subject to energy and timing restrictions.
The solution to the problem is a vector {fop_1, fop_2, ..., fop_N}, where fop_i is the operating frequency of processor i, set to zero if processor i is not active. We
can see this as an optimization problem, where the goal is to minimize the total aggregate power expended by the cluster while still guaranteeing an acceptable response time. Let p_i(f_j) be the power consumption of processor i running at frequency f_j. The value f_0 is equal to zero and represents that processor i is turned off and consumes no energy. With these assumptions, considering only the active servers, the aggregate power of the cluster is P = Σ_{i=1}^{N} [ρ p_i(fop_i) + (1 − ρ) p_i(idle)], where p_i(idle) is the power of processor i when idle and ρ is the processor utilization. The problem can be stated as follows:

    Minimize P = Σ_{i=1}^{N} [ρ p_i(fop_i) + (1 − ρ) p_i(idle)]        (3)

subject to

    Σ_{i=1}^{N} fop_i / Σ_{i=1}^{N} max_freq_i ≥ x        (4)

and

    Σ_{i=1}^{N} q_i (1 − e^{−(µ_i − λ_i) t}) ≥ β        (5)

where t is the predefined expected response time and β is the minimum fraction of the requests that should fulfill the QoS requirement. In order to determine the number of active nodes and their respective operating frequencies, and inspired by the work done in [Rusu et al. 2006], this optimization problem is solved off-line and a number of tables are obtained. That is, assuming that the normalized workload is represented in r_x discrete levels, the desired response time in r_t levels, and that there are r_r different reliability factors β, we will have a maximum of r_r · r_t tables, each one with r_x entries. Many techniques could be used to obtain the solution. For this experiment, we used a search algorithm that solves the problem optimally, which proved adequate for the number of nodes considered here (≤ 10). Although an exact algorithm becomes inefficient for a larger number of nodes, this is not a concern in this work; heuristics such as GRASP or Tabu Search could be used to reduce the off-line computation time.
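A minimal sketch of such an off-line exhaustive search is given below. The tiny two-level frequency sets, the per-level power tables passed in as dictionaries, and the fixed utilization ρ are illustrative assumptions; the paper only requires that a power value be known for each processor and frequency level.

```python
import itertools
import math

# Exhaustive search sketch for the optimization problem (3)-(5).
# Frequencies are normalized by the fastest processor's maximum,
# and the level 0.0 means "server off" (no power consumed).

def solve(levels, power, idle, lam, mu, t, beta, rho):
    """Return the minimum-power vector (fop_1, ..., fop_N), or None.

    levels -- per-server list of allowed normalized frequencies (0.0 = off)
    power  -- power[i][f]: active power of server i at frequency f
    idle   -- idle[i]: idle power of server i
    lam    -- total arrival rate; mu -- service rate at frequency 1.0
    t      -- response-time bound; beta -- required fraction served in time
    rho    -- assumed processor utilization
    """
    best, best_p = None, math.inf
    for fop in itertools.product(*levels):
        cap = sum(fop)
        if cap == 0 or lam / mu > cap:           # constraint (4): capacity
            continue
        # Constraint (5): weighted M/M/1 response-time distribution, eq. (1).
        w = 0.0
        for i, f in enumerate(fop):
            if f == 0:
                continue
            q = f / cap                           # dispatch probability q_i
            mu_i, lam_i = mu * f, q * lam
            w += q * (1.0 - math.exp(-(mu_i - lam_i) * t))
        if w < beta:
            continue
        # Objective (3): aggregate power of the active servers.
        p = sum(rho * power[i][f] + (1 - rho) * idle[i]
                for i, f in enumerate(fop) if f > 0)
        if p < best_p:
            best, best_p = tuple(fop), p
    return best
```

For the tables described above, this search would be repeated once per discretized (x, t, β) combination; GRASP or Tabu Search could replace the exhaustive loop for larger clusters.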
5 Simulation Results

In our experiments we assumed a server cluster with 8 machines, two of each type shown in Table 1. The requests follow a Poisson process, with an average rate dependent on the desired workload. The execution time of each request follows an exponential distribution with average 0.01 s (if executed at the highest frequency of the fastest processor). In each experiment, a total of 8 × 10^5 requests were simulated. To compute the tables referred to in the previous section, a granularity of 0.01 was assumed for the workload. The QoS requirement and the reliability factor are set according to the specific experiment. Also,
it is assumed that load measurements are made every 1 s, that changes in the configuration are made after 5 consecutive load increases (or decreases), and that ρ is computed every 1 s in the simulation. Finally, it should be mentioned that the simulator takes into account the effect of switching a server on and off: for the experiments described, switching on a server implies a 33 s penalty and an additional 190 J of energy consumption.

Table 1. Processor specifications

    Processor            Frequencies (MHz)                               Respective power consumption (W)
    XScale               idle, 150.0, 400.0, 600.0, 800.0, 1000.0        0.355, 0.355, 0.445, 0.675, 1.175, 1.875
    PowerPC 750          idle, 4.125, 8.25, 16.5, 33, 99, 115.5, 132     1.150, 1.150, 1.369, 1.811, 2.661, 4.763, 5.269, 6.533
    PowerPC 750GX 1GHz   idle, 533, 600, 667, 733, 800, 867, 933, 1000   7.63, 7.63, 7.8, 7.97, 8.13, 8.30, 10.35, 12, 12.25
    PowerPC 405GP        idle, 66, 133, 200, 266                         0.74, 0.74, 1.09, 1.36, 1.58

To assess our method, we compared it to the one proposed in [Rusu et al. 2006]. Figure 2 shows the average power consumption of our method, for different response-time QoS parameters and a constant reliability factor of 0.8, and of the method presented in [Rusu et al. 2006] without its QoS restrictions, so that we can compare both methods in their most energy-efficient situation. For this comparison, we assumed a QoS parameter of 1 s, because this value is high enough for the great majority of requests to be executed in time, resulting in the best energy efficiency. Our method presents better results, even in some cases with tighter response-time restrictions (workload lower than 0.3). The reason for this is that our search algorithm finds the best configuration for each load level, while the method presented in [Rusu et al. 2006] uses a predefined sequence of machines to be turned on and off, which limits the optimization process.
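The measurement-and-trigger rule described at the start of this section (reconfigure only after 5 consecutive load increases or decreases) can be sketched as follows; the class and its names are our own illustration, with the 1 s measurement period implicit in each call to observe().

```python
# Sketch of the front-end's reconfiguration trigger: load is sampled
# periodically (every 1 s in the simulation), and a reconfiguration is
# signalled only after `limit` consecutive increases or decreases,
# which damps reactions to transient load spikes.

class ReconfigTrigger:
    def __init__(self, limit=5):
        self.limit = limit
        self.last = None
        self.ups = 0      # consecutive load increases seen so far
        self.downs = 0    # consecutive load decreases seen so far

    def observe(self, load):
        """Feed one periodic load measurement; return 'up', 'down' or None."""
        if self.last is not None:
            if load > self.last:
                self.ups, self.downs = self.ups + 1, 0
            elif load < self.last:
                self.downs, self.ups = self.downs + 1, 0
        self.last = load
        if self.ups >= self.limit:
            self.ups = 0
            return 'up'       # move to a higher-capacity table entry
        if self.downs >= self.limit:
            self.downs = 0
            return 'down'     # move to a lower-capacity table entry
        return None
```

On an 'up' or 'down' event, the front-end would look up the precomputed table for the new load level and apply the corresponding on/off and frequency settings cluster-wide.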
As expected, the tighter the QoS requirement or the higher the workload, the faster the processors must work to respond to the requests within the specified deadline, thus consuming more power. This behavior can be clearly seen in the figure. In our implementation, whenever the defined QoS cannot be satisfied we set the cluster to full power, in order to operate at a best-effort level. This is why all curves, at some point, meet on the same line (the full-power situation for a given load). For example, for workloads greater than 0.7 in Figure 2, all the configurations with QoS response times of 0.02, 0.05, and 0.07 seconds reach the same power consumption.

Figure 3 shows the cluster power consumption for different workloads. In this experiment, the QoS requirement was kept constant at 0.05 s. As can be seen, the effect on power consumption of imposing a higher reliability factor grows as the workload increases. In this situation, the system quickly saturates and starts to work at full power (the best-effort approach). For example, the curve with workload 0.8 saturates at reliability 0.6, while the curve with workload 0.6 saturates only at reliability 0.8.

Finally, Figure 4 shows the actual fraction of requests that have their time demands satisfied, for different β and workloads. Ideally, this curve should follow the identity line, but it is easy to note that, as the workload increases, it is harder to ensure a high percentage
of met deadlines (or even impossible, due to cluster saturation).

[Figure 2. Cluster aggregate power for different QoS requirements, with β = 0.8. Curves: Rusu2006 and QoS = 0.02 s, 0.05 s, 0.07 s, 1.00 s; power (W) versus normalized workload.]

[Figure 3. Cluster aggregate power for different workloads and a QoS requirement of 0.05 s. Curves: workload = 0.2, 0.4, 0.6, 0.8; power (W) versus reliability factor.]

As can be seen, as β increases the curves depart from the identity line and eventually saturate (meaning that the system cannot satisfy the QoS requirement at the specified reliability level, shown as points below the identity line). Additionally, due to the discrete frequencies of the processors, as the workload and the factor β decrease, the curves bend upward. This is because the processors have a minimum operating frequency, so the requests end up being processed at a higher frequency than necessary. This can be clearly seen in the step-like curve for workload 0.2.

[Figure 4. Actual fraction of QoS restrictions met, as a function of β, for different workloads, with QoS = 0.05 s. Curves: workload = 0.20, 0.55, 0.71, and the identity line.]

6 Conclusion

In this paper we presented a technique to minimize energy consumption while keeping adequate response times for soft real-time applications in web server clusters. The problem is stated as an optimization problem and solved off-line. During system operation, according to the offered load, the QoS restriction (response time) and the predefined proportion of requests that should have their deadlines met (the soft real-time criterion), processors are switched on and off, and the active ones are set to an optimal operating frequency.
In our simulations and in the comparison with other proposals, the technique described here showed promising results.
7 Acknowledgments

The authors would like to thank CNPq, Capes and Faperj for partially funding this research, and the anonymous reviewers for their comments.

References

Bianchini, R. and Rajamony, R. (2004). Power and energy management for server systems. IEEE Computer, 37(11):68-74.

Bohrer, P., Elnozahy, M., Kistler, M., Lefurgy, C., McDowell, C., and Rajamony, R. (2002). The case for power management in web servers. In Graybill, R. and Melhem, R., editors, Power Aware Computing. Kluwer Academic Publishers.

Cardellini, V., Casalicchio, E., Colajanni, M., and Yu, P. S. (2002). The state of the art in locally distributed web-server systems. ACM Computing Surveys, 34(2):263-311.

Chase, J., Anderson, D., Thakur, P., and Vahdat, A. (2001). Managing energy and server resources in hosting centers. In Proceedings of the 18th Symposium on Operating Systems Principles, pages 103-116, Banff, Alberta, Canada.

Chase, J. and Doyle, R. (2001). Balance of power: Energy management for server clusters. In Eighth Workshop on Hot Topics in Operating Systems.

Elnozahy, M., Kistler, M., and Rajamony, R. (2002). Energy-efficient server clusters. In Second Workshop on Power Aware Computing Systems, pages 179-196, Cambridge, MA, USA.

Elnozahy, M., Kistler, M., and Rajamony, R. (2003). Energy conservation policies for web servers. In 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, USA.

Kleinrock, L. (1975). Queueing Systems, volume 1. John Wiley and Sons.

Lefurgy, C., Rajamani, K., Rawson, F., Felter, W., Kistler, M., and Keller, T. W. (2003). Energy management for commercial servers. IEEE Computer, 36(12):39-48.

Lien, C.-H., Bai, Y.-W., Lin, M.-B., and Chen, P.-A. (2004). The saving of energy in web server clusters by utilizing dynamic server management. In 12th IEEE International Conference on Networks, volume 1, pages 253-257, Singapore.

Pinheiro, E., Bianchini, R., Carrera, E. V., and Heath, T. (2003). Dynamic cluster reconfiguration for power and performance. In Compilers and Operating Systems for Low Power. Kluwer Academic Publishers.

Rajamani, K. and Lefurgy, C. (2003). On evaluating request-distribution schemes for saving energy in server clusters. In IEEE International Symposium on Performance Analysis of Systems and Software, pages 111-122, Austin, Texas, USA.

Rusu, C., Ferreira, A., Scordino, C., Watson, A., Melhem, R., and Mossé, D. (2006). Energy-efficient real-time heterogeneous server clusters. In IEEE Real-Time and Embedded Technology and Applications Symposium, San Jose, CA, USA.

Rusu, C., Xu, R., Melhem, R., and Mossé, D. (2004). Energy-efficient policies for request-driven soft real-time systems. In 16th Euromicro Conference on Real-Time Systems, pages 175-183, Catania, Italy.

Sharma, V., Thomas, A., Abdelzaher, T. F., Skadron, K., and Lu, Z. (2003). Power-aware QoS management in web servers. In 24th IEEE Real-Time Systems Symposium, pages 63-72, Cancun, Mexico.

Xu, R., Zhu, D., Rusu, C., Melhem, R., and Mossé, D. (2005). Energy-efficient policies for embedded clusters. SIGPLAN Notices, 40(7):1-10.