Decentralized Control Architectures for Power Management in Data Centers. A Thesis Submitted to the Faculty of Drexel University.


Decentralized Control Architectures for Power Management in Data Centers

A Thesis Submitted to the Faculty of Drexel University by Rui Wang in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering

July 2013

Dedications

This thesis is dedicated to my wife Bo Sun and our parents, for their endless love and support.

Acknowledgements

Here, I would like to express my deepest gratitude to my advisor, Dr. Nagarajan Kandasamy. It is my great honor to be inspired by such a capable and respectable mentor, and I would not have been able to reach this important career milestone without his patient guidance and generous support over the past five years. I have learned enormously from him, not only in research activities but also in daily life. I am fortunate to have Dr. Kandasamy as my advisor.

I would also like to express my sincere gratitude to the members of my research proposal and dissertation committees, Dr. Spiros Mancoridis, Dr. Chika Nwankpa, Dr. Harish Sethu, Dr. Baris Taskin, and Dr. Steven Weber, for their precious time and valuable comments. Their suggestions have helped to improve much of my work.

To the alumni and students in our team, Dr. Dara M. Kusic, Dr. James A. Shackleford, Ph.D. candidate Salvador DeCelles, and Ph.D. candidate Tingshan Huang, I offer my thanks for the enjoyable discussions. To all the faculty and staff members, I offer thanks from the bottom of my heart for making the ECE Department a great place to study and stay.

Last but not least, I thank my dear wife Bo Sun for her care and infinite love, and our parents for their continuous support.

Table of Contents

List of Tables
List of Figures
Abstract

1. Introduction
    Motivation
    Adaptive Management of Large-Scale Computing Systems
    Contributions
    Organization

2. Hierarchical Control Architecture
    Introduction
    Related Work
    Preliminaries
    The System Architecture
    Workload Generation
    The Control Architecture
    The Distributed Local Controllers
    Receding Horizon Controllers
    Neural Network Controllers
    The Supervisory Controller
    The RH Control Scheme
    The QP Problem
    The MINLP Problem
    Validation Experiments
    Controller Parameters
    Building System Models via Profiling
    Performance of the Control Structures
    Summary

3. Decentralized Control Architecture
    Introduction
    The System Model
    Control Architecture
    Estimating the Workload
    Switching Scheme at the L1 Level
    Resource Allocation at the L0 Level
    The Decentralized Control Structures
    A Centralized Controller
    Performance Analysis
    System Parameters
    Building the Application Models
    Static Teams of Controllers
    Dynamic Teams of Controllers
    Sensitivity Analysis
    Computational Complexity
    Summary

4. Data Centers as Controllable Load Resources
    Introduction
    Related Work
    Preliminaries
    The Demand Response Programs
    The Optimization Framework
    System Modeling Assumptions
    Formulation of the Basic Optimization Problem
    Modeling and Optimization of Risks
    Computation of Success Probability
    Case Studies
    Evaluation Parameters
    Evaluating the Basic Optimization Framework
    Formulations with Risks
    Multiple Data Centers Signaled at A Time
    Summary

5. Conclusion and Future Work

Bibliography

List of Tables

3.1 The arrival rates accommodated by VMs for each of the three services as a function of CPU share

Performance of static teams in terms of SLA satisfaction, number of active cores, and switching activity per server

Performance of dynamic teams in terms of SLA satisfaction, number of active cores, and switching activity per server

Performance of a centralized RHC scheme

Fault tolerance capabilities of the distributed control scheme. The system performance is not affected when a reasonable number of faults take place

The effect of tuning key control parameters on overall system performance. SLA satisfaction can be tuned, and different priorities can be assigned to services

Execution time overhead incurred by the controller as a function of the number of supported services and server types

Distances in km between the seven datacenters used in the case study

The PUE ratings of each datacenter and the retail electricity prices in cents/kWh

The four types of servers housed within the datacenters

Number of servers in each of the seven datacenters

The expected bandwidth E[w] (Gbps), standard deviation σ(w) (%), and σ(1/w) (1/Gbps) from datacenter 1 to other datacenters

The expected LMPs E[LMP_1t] ($/MWh) and the standard deviation σ(lmp_1t) ($/MWh) from hour 15 to hour 20 at datacenter

The estimated number of active and idle servers in the datacenters

In the basic scenario, the number of servers migrated out from datacenter 1 to other datacenters, and the corresponding migration time on each link

In the scenario considering bandwidth risk, the number of servers migrated out from datacenter 1 to other datacenters, and the corresponding migration time on each link

4.10 Summary of test scenarios discussed in the chapter

List of Figures

1.1 A glance at the inside of data centers [1]. Online services are typically hosted on multiple heterogeneous networked servers that are housed in data centers

The CO2 emissions of data centers in 2007 [60] [6]. It was comparable to those of some countries, and took 0.3% of the world's total

The average daily utilization of servers [60] [6]. It rarely exceeds 6%, and up to 30% of servers in data centers are idle

An illustration of virtualization technology and its impact for power efficiency. It allows a physical server to be shared among multiple virtual machines, thus improving the server utilization and power efficiency

The system architecture hosting the three online services. The supervisory controller makes switching decisions affecting servers in the application tier whereas local controllers on each server decide the CPU share provided to VMs under their control

The schematic of the receding horizon controller. It determines the optimal CPU share for VMs on the local server, based on the incoming workload, the processing capacity of other VMs in the virtual cluster, and the system model

A feasible sequence of control actions selected by the receding horizon controller. Only the first control action (marked by the red circle) is applied to the system and the rest are discarded

The structure of a back-propagation based neural network. It contains three layers: an input layer, a hidden layer, and an output layer

The schematic for the neural network (NN) based controller. The NN learns from the error between the desired and actual response time, updates the weights in the NN, and then computes a new CPU share for the VM

The receding horizon control scheme underlying the supervisory controller. It decides which servers are kept active based on the incoming workload as well as the system model

A logical grouping of servers in our testbed based on processing capacities: Apollo and Poseidon each have 14 GHz of aggregate processing capacity whereas Eros and Demeter possess an aggregate processing capacity of 11 GHz each

An illustration of normal distribution [77]. Here, µ + 2σ is larger than 97.8% (= 1 - 2.1% - 0.1%) of the distribution

2.9 Average response times achieved by a VM hosting the Gold service as a function of request arrival rate and the CPU share provided to it

The arrival rates accommodated by VMs for each of the three services as a function of CPU share

The power consumed by two models of Dell servers, the 1950 and 2950, when loaded with VMs hosting an application server. The line is fit from experimentally collected data

The dynamic workload provided to the cluster and the switching behavior of the servers as commanded by the supervisor

The CPU share provided to the Silver and Bronze virtual clusters by the RH-based L0 controllers in response to the workload

The CPU share provided by the RH-based control structure to each of the VMs within the Bronze cluster

The CPU shares provided to the Silver and Bronze virtual clusters by the NN-based L0 controllers in response to the workload

The CPU share provided by the NN-based control structure to each of the VMs within the Bronze cluster

Performance of the RH and NN-based controllers against the Oracle in terms of both QoS violations and power savings

The execution time incurred by the supervisor as a function of the cluster size

The system architecture hosting the online services. Local controllers on each server decide both the CPU share to assign to VMs under their control within the application tier as well as the number of processing cores to operate. Here, Λ is the incoming workload to the cluster and λ is the workload dispatched to an individual server. The subscripts g, s, and b denote the Gold, Silver, and Bronze services, respectively

Microprocessor Core Idle (C) States [2]. As a power management state, it allows the operating system to idle any core in the package such that the core consumes less power

The types of heterogeneous servers assumed in our setup, based on their CPU capacities and services supported

3.4 Schematic of the L1 control level. It decides if a server should be active or idled, based on the incoming workload as well as the number of active servers in the system

Schematic of the L0 control level. It decides the CPU share provided to VMs under its control, based on the local workload

Two possible organizational structures for decentralized control. In static teams of controllers, the information observed or estimated by each controller at time k is independent of what other team members have done during that time step. In dynamic teams, the observations and control decisions of a controller during time step k incorporate the actions of other controllers during that time step

Schematic of the centralized controller. It determines which servers to stay active as well as the corresponding CPU share allocated to their VMs

A portion of the workload provided to the Gold, Silver, and Bronze services. They can change dramatically in a short time period

The global Bronze workload and the corresponding aggregate CPU share provided to the Bronze virtual computing cluster during a one-hour period

The LMP values in a wholesale electricity market over a 24-hour period for Philadelphia [48]. The y-axis shows the price in $/MWh

The operating principle underlying the RTDR and EDR programs, summarized from [30]. Here, t_L and t_D denote the lead time and the downtime, respectively, and P denotes the load curtailment requested by the RTO. The actual measured load curtailment is denoted by P

With VMware's vMotion, live VMs can be migrated to other geographically distributed data centers [65]

The locations of distributed datacenters used in our case study. Blue lines indicate communication links less than 500 km between datacenters whereas red lines indicate longer links; live VM migration is not feasible over these links since we assume that VMs can only be migrated between datacenters within 500 km of each other

VM migration scheme from source datacenter i to destination data center j. The resource of a type l server in data center j must be no less than that of a type k server in data center i

4.6 The time incurred in migrating a VM as a function of distance between two datacenters. The VM used in this experiment hosted Microsoft SQL server and had a memory usage of 8 GB. The plot shows both the original curve as well as the best-fit linear line

Abstract

Decentralized Control Architectures for Power Management in Data Centers
Rui Wang
Advisor: Nagarajan Kandasamy, Ph.D.

Data centers host online services on distributed computing systems comprising heterogeneous networked servers. Virtualization technology is a promising solution to support multiple online services using fewer computing resources. It enables a single server to be shared among multiple performance-isolated platforms called virtual machines (VMs), where each VM can serve one or more applications. Virtualization also enables on-demand computing, where resources such as CPU, memory, and disk space are allocated to applications as needed, based on the currently prevailing workload demand, rather than statically, based simply on the peak workload demand. By dynamically provisioning VMs and turning servers on and off appropriately, data center operators can maintain the desired quality of service (QoS) while achieving higher server utilization and lower power consumption.

Various techniques have been proposed to automate the system management tasks of computing systems. In terms of control architectures, a centralized controller, though offering the best performance, is only capable of managing the performance of a stand-alone server or a small-scale system comprising a few servers. Significant challenges must still be addressed to achieve real-time control of a large-scale computing system with multiple interacting components. Hierarchical control and decentralized decision making for computing systems have therefore recently emerged as an area of active research.

This thesis focuses on designing hierarchical and decentralized control architectures to manage the power and performance of large-scale computing systems.

First, we propose a hierarchical control architecture to manage a virtualized server cluster hosting VMs and supporting online services. In this hierarchy, fully distributed local controllers optimize the CPU share of VMs under their control such that the aggregate CPU share provided to the cluster covers the incoming workload, while a supervisory controller on top dynamically shuts down the extra machines during periods of light workload to reduce the cluster's power consumption. Two different strategies, receding horizon control and neural network based control, are compared for the local controllers. We validate the framework on a cluster supporting three online services, showing that our scheme adapts quickly to dynamic workload changes, and is scalable and quite flexible in that servers can be added or removed at any time while maintaining overall system performance. Also, when managed using our control scheme, the cluster saves, on average, 20% in power-consumption costs over a three-hour period when compared to a system operating without dynamic control.

Second, we propose a fully decentralized control architecture to further improve the scalability of the hierarchical design. Here, each controller manages one server: its inner loop constantly optimizes the per-VM computing resources to guarantee the service level agreement (SLA), and its outer loop appropriately switches the server or processor package on or off so that the dynamic workload is consolidated onto the fewest number of active servers to reduce power usage. In addition, we organize the controllers in different fashions and analyze how the organizations affect the overall performance of large clusters with up to a thousand servers. Our studies indicate that the control structure, when organized as a causal system in which a precedence relation exists among the individual controllers, achieves a high degree of SLA satisfaction (> 98%) while significantly reducing the corresponding switching cost.

Finally, we extend our focus further to manage the power consumption of multiple geographically distributed data centers. Assuming each data center is controlled by a well-designed power management scheme such as the ones mentioned above, we develop a high-level optimization framework so that data centers can earn financial reward when curtailing power consumption as requested by electric utilities. Specifically, we integrate the demand response (DR) program offered by the electricity market into data center operations and achieve power reduction in some centers by migrating live VMs to other centers. The optimizer aims to maximize the expected profit by trading off among reward, VM migration costs/time/distance, and risks from bandwidth and reward uncertainties. A set of case studies involving data centers participating in an economic DR program is used to validate the framework.


1. Introduction

1.1 Motivation

Online services such as business, e-commerce, and scientific applications are enabled by enterprise applications, defined broadly as any software that simultaneously provides services to a large number of users over a computer network. These applications are typically hosted on computing systems comprising multiple heterogeneous networked servers that are housed in a facility called a data center, as shown in Fig. 1.1. Data centers hosting online services must typically satisfy stringent quality-of-service (QoS) requirements when operating in highly dynamic environments; for example, the workload to be processed may be time-varying, and hardware or software components may fail during operation.

Meanwhile, data centers are also major consumers of electricity. An average data center consumes as much energy as 25,000 households. In 2007, there were about 30 million servers worldwide, consuming 104 TWh of energy, almost 0.5% of the world's energy production, at a cost of $9 billion; consumption rose to 200 TWh in 2010 at a cost of $23 billion, and has been projected to grow at about 9% annually [8] [19].

Figure 1.1: A glance at the inside of data centers [1]. Online services are typically hosted on multiple heterogeneous networked servers that are housed in data centers.

As illustrated in Fig. 1.2, due to the enormous electricity consumption, CO2 emissions caused by data centers were 170 megatonnes in 2007, exceeding those of some countries such as Argentina and the Netherlands, and accounting for 0.3% of the world's CO2 emissions. It is projected that, by 2020, CO2 emissions from data centers will quadruple to 340 megatonnes, exceeding those of the airline industry [60] [6] [43]. Therefore, reducing energy consumption is also an urgent and important task for data centers.

(a) The CO2 emissions (in megatonnes) of data centers and some countries. (b) The CO2 emissions of some industries as a percentage of the world's total.
Figure 1.2: The CO2 emissions of data centers in 2007 [60] [6]. It was comparable to those of some countries, and took 0.3% of the world's total.

A recent study by McKinsey & Company points out that there is little correlation between a data center's energy consumption and its server utilization. For example, as Fig. 1.3 shows, the average daily utilization of a server rarely exceeds 6%, and up to 30% of the servers housed in many data centers are functionally dead (with less than 3% utilization), meaning these servers are powered on but not performing at their full capacity [43]. This phenomenon indicates tremendous waste in terms of the energy used and capital employed in data centers. Therefore, there is great potential for data centers to reduce power consumption by improving their server utilization.

Figure 1.3: The average daily utilization of servers [60] [6]. It rarely exceeds 6%, and up to 30% of servers in data centers are idle.

Virtualization technology is a promising solution to increase the server utilization and power efficiency of data centers. It can support multiple enterprise applications using fewer computing resources. Specifically, as shown in Fig. 1.4(a), it enables a single server to be shared among multiple performance-isolated platforms called virtual machines (VMs), where each VM can serve one or more applications. Virtualization therefore plays a significant role in the power efficiency of data centers and can improve it by 25-30%, as Fig. 1.4(b) indicates. Virtualization also provides a control knob for dynamic resource provisioning. It enables on-demand computing where computing resources such as CPU, memory, and disk space are allocated to applications as needed, based on the currently prevailing workload demand, rather than statically, based simply on the peak workload demand. As a result, by dynamically provisioning computing resources to VMs and turning servers on and off appropriately based on the time-varying workload, data center operators can satisfy the QoS requirements of online services and improve the power efficiency at the same time.

(a) An illustration of virtualization technology [63]. (b) The impact of virtualization technology for power efficiency [60] [6].
Figure 1.4: An illustration of virtualization technology and its impact for power efficiency. It allows a physical server to be shared among multiple virtual machines, thus improving the server utilization and power efficiency.

However, as computing systems become larger and more complex, the task of tuning numerous control parameters, such as CPU share, memory usage, and disk space, is becoming tedious, time-consuming, expensive, and almost impossible for data center operators to do manually. As a result, it is highly desirable for such systems to be largely self-managing or autonomic, only requiring high-level guidance from operators [21].

1.2 Adaptive Management of Large-Scale Computing Systems

So far, various management techniques have been proposed to automate the performance and power management of computing systems, drawing on control theory, optimization theory, intelligent control, and machine learning.

(1) Control theoretic methods formulate these tasks as online control problems in terms of cost/performance metrics, and have been successfully applied to a variety of management problems in computing systems [25, 26]. Classical control theory, such as Proportional-Integral-Derivative (PID) control, has been proposed for QoS adaptation in web servers [4], CPU power management [59], task scheduling [37], and load balancing [40]. Typically, assuming a linear system with a continuous input/output domain, a closed-loop feedback controller is designed under stability and sensitivity requirements. However, such Single-Input-Single-Output (SISO) control techniques are too simple to effectively manage a system with multiple physical servers and VMs, which should be managed in a coordinated fashion to achieve system-wide objectives. So, modern control theory, which can solve Multi-Input-Multi-Output (MIMO) control problems, has also been proposed for system management. For example, model-predictive control (MPC) has been used for power management of server clusters [73], and a linear quadratic regulator (LQR) has been used to adjust CPU usage and reduce power consumption [75].

(2) Optimization techniques and intelligent control have also been introduced into this area. In [78], given a single server hosting multiple VMs and supporting two enterprise applications, the authors propose a two-level fuzzy logic based optimization scheme to allocate the per-VM CPU share. The local controllers within each VM request a certain amount of CPU share, while the global controller arbitrates those requests to maximize the profit generated by the server.

(3) Machine learning is another emerging and promising method for automating computing systems. Tesauro describes how reinforcement learning (RL) can be implemented within a two-level control architecture to manage the service level agreements (SLAs) of a non-virtualized computing environment by allocating servers among multiple applications [62]. Rao et al. also design an RL-based approach, VCONF, to maintain the response time of applications by configuring the CPU time, the number of virtual CPUs, and the memory of VMs [54].
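To make the classical SISO feedback approach surveyed above more concrete, the following is a minimal sketch of a discrete-time PID loop that adjusts a single VM's CPU share to track a response-time target. It is an illustration only and is not taken from any of the cited works: the plant model (response time = demand / share), the gains, the share limits, and all names such as `PIDController` and `simulate` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Hypothetical discrete PID controller whose output is a CPU share."""
    kp: float
    ki: float
    kd: float = 0.0
    u_min: float = 0.1   # assumed lower bound on the CPU share
    u_max: float = 1.0   # assumed upper bound on the CPU share
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, target: float, measured: float, dt: float = 1.0) -> float:
        # Positive error means the service is too slow and needs more CPU.
        error = measured - target
        candidate = self.integral + error * dt
        derivative = (error - self.prev_error) / dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        share = min(max(raw, self.u_min), self.u_max)
        # Simple anti-windup: accumulate the integral only when unsaturated.
        if share == raw:
            self.integral = candidate
        self.prev_error = error
        return share


def simulate(steps: int = 200, target: float = 0.2, demand: float = 0.05):
    """Closed loop against a toy plant: response time = demand / share."""
    controller = PIDController(kp=0.2, ki=0.5)
    share = 1.0
    history = []
    for _ in range(steps):
        measured = demand / share          # assumed (toy) plant model
        share = controller.update(target, measured)
        history.append((share, measured))
    return history
```

Under these assumptions the loop settles near the share at which the toy plant meets the target (0.05 / 0.2 = 0.25). The sketch also shows the limitation noted above: the controller sees one input and one output, so it has no notion of other VMs or servers that share the same system-wide objective.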
Although many management techniques have been proposed, as mentioned above, one problem is that, in terms of control architectures, a large majority of the literature adopts centralized control. A centralized controller has access to all the state information and seems more likely to make an optimal decision. However, it also has three serious weaknesses.

(1) The scalability of a centralized controller is a major bottleneck. For a control architecture to be of practical value in a large-scale and distributed system, it must successfully tackle the so-called curses of modeling and dimensionality. The number of available tuning options is typically quite large in distributed systems, and the corresponding search space grows exponentially with each new variable, making centralized controller designs intractable. Complex, non-linear, and possibly time-varying component behavior as well as component interactions must be accurately modeled and carefully managed at run time to achieve system-wide goals. The system management task is further complicated when these components must cooperate with each other to solve the overall problem. Therefore, as the system size increases, there are more tuning options, and the computational complexity and execution time of the controller grow exponentially. By the time the optimal solution is finally computed, it is already outdated and no longer optimal with respect to the dynamic workload.

(2) A centralized controller only executes once every control period. It cannot react to bursty workload in a timely fashion.

(3) A centralized control system has a single point of failure. Once the controller crashes or behaves abnormally, the whole system is paralyzed.

To sum up, a centralized architecture is only capable of managing a stand-alone server or a small-scale system comprising a few servers, and is impractical in large-scale systems. Significant challenges must still be addressed to achieve real-time control of a large-scale computing system with multiple interacting components. To overcome the weaknesses of centralized control, researchers have recently been moving to hierarchical and decentralized decision making for computing systems, and this is an area of active research [47] [58]. Related work in this area is discussed in Section 2.2. We believe that the control frameworks described in this thesis are an important step in this direction.

1.3 Contributions

This thesis focuses on designing hierarchical and decentralized control architectures to manage the power and performance of virtualized server clusters in data centers.
It has three major contributions.

First, we propose a hierarchical control framework to enable system management of computing systems. We consider a server cluster hosting enterprise applications on a set of VMs, where the problem of interest is to manage the cluster's power consumption by dynamically tuning its processing capacity to match the incoming workload intensity while achieving desired response times for multiple applications. The control hierarchy comprises two levels: (1) a fully distributed control structure at the L0 level that optimizes the CPU share provided to the cluster, wherein local controllers on each server cooperate to optimize the CPU capacity allocated to VMs under their control; and (2) a supervisory controller at the L1 level that reduces power consumption by packing VMs onto fewer servers during periods of light workload and shutting down the extra machines. We develop and compare two different strategies, receding horizon control and neural network based control, at the L0 level, while the supervisor always applies receding horizon control. Our scheme adapts quickly to dynamic workload changes, and is scalable and quite flexible in that servers can be added or removed at any time while maintaining overall system performance.

Second, we design a fully decentralized control architecture to address the workload consolidation problem in large-scale server clusters, wherein the cluster's processing capacity is dynamically tuned to satisfy the SLAs associated with the incoming workload while the workload is consolidated onto the fewest number of servers. In a decentralized setting, this problem is decomposed into simpler subproblems, each of which is mapped to a server and solved by a controller assigned to that server. Though controllers on different servers run independently of each other, they are implicitly coupled via the shared high-level system goal, and interactions among controllers may result in undesired system behavior such as SLA violations and frequent switching of processor cores on and off.
Using the proposed architecture as the reference, we analyze how the organization of individual controllers within the control structure affects the overall performance for large clusters of up to a thousand servers.

Finally, based on the decentralized management scheme designed above, we aim to coordinate among multiple geographically distributed data centers so that they can earn financial reward from electric utilities when reducing power consumption as requested. Specifically, we develop an optimization framework to enable data centers to operate as controllable load resources within the demand dispatch regime, a demand response (DR) program in which incentives are designed to induce lower electricity usage not just during times of high prices but also when the reliability of the local grid is jeopardized or when the electricity supply and demand are unbalanced. Assuming the availability of geographically distributed and virtualized data centers situated in multiple regional electrical markets, the basic idea is to migrate the workload in the form of live VMs among these centers to maximize the expected payoff. The proposed framework addresses issues specific to the demand dispatch of data centers, such as the timeliness of VM migrations and the impact of geographic distance on migration times. It also explicitly incorporates risks that may cause the load curtailment operation to be ultimately unsuccessful and result in monetary losses to data center operations; specifically, variability in network bandwidth that can cause uncertainty in VM migration times, as well as the uncertain payoff when participating in DR markets.

1.4 Organization

The rest of this thesis is organized as follows.

Chapter 2 proposes a hierarchical control architecture to manage a server cluster hosting VMs and supporting online services. In this control hierarchy, fully distributed local controllers optimize the CPU share of VMs under their control such that the aggregate CPU share provided to the cluster covers the incoming workload, while a supervisory controller on top reduces the power consumption by shutting down the extra machines during periods of light workload. Two different strategies for the local controllers, receding horizon control and neural network based control, are also experimentally compared.

Chapter 3 proposes a fully decentralized control architecture to further improve the scalability of the work in Chapter 2.
Here, each controller manages one server: its inner loop constantly optimizes the per-VM computing resources to guarantee SLAs, and its outer loop appropriately switches the server or processor package on or off so that the dynamic workload is consolidated onto the fewest number of active servers for power reduction. In addition, we organize the controllers in different fashions and analyze how the organizations affect the overall performance of large clusters with up to a thousand servers.

Chapter 4 extends our focus further to manage multiple geographically distributed data centers. Assuming each data center is controlled by a well-designed power management scheme such as the one in Chapter 3, this chapter develops a high-level optimization framework so that data centers can earn financial reward from electric utilities when reducing power consumption as signaled. The idea is to integrate the demand response program into data center operations and curtail power consumption in a data center by migrating the workload in the form of live VMs to other centers. The optimizer aims to maximize the expected profit by trading off among reward, costs, VM migration time/distance, and risks from bandwidth and reward uncertainties.

Chapter 5 concludes the thesis and outlines directions for future work.

2. Hierarchical Control Architecture

This chapter proposes a hierarchical control architecture to manage a virtualized server cluster hosting VMs and supporting enterprise applications. Here, fully distributed local controllers optimize the CPU share of VMs under their control such that the aggregate processing capacity provided to the cluster covers the incoming workload and the required response time is guaranteed. A supervisory controller on top shuts down the extra machines during periods of light workload so that the power usage is minimized. Two different strategies, receding horizon control and neural network based control, are compared for the local controllers. The material presented in this chapter was previously published in [67] [72] [69].

2.1 Introduction

This chapter develops and experimentally validates a control architecture to manage the performance/power of virtualized computing environments using concepts from hierarchical and distributed control. Specifically, we consider a heterogeneous, virtualized server cluster hosting multiple enterprise applications on VMs and processing a time-varying workload where the QoS requirement is to meet desired response times for these applications. The problem of interest is to manage the cluster's power consumption by tuning its processing capacity to closely match the incoming workload intensity at any given time instant. We show how to develop a control hierarchy to achieve this, structured as follows:

A fully distributed control structure optimizes the CPU share provided to the cluster wherein local controllers on each server cooperate to dynamically optimize the CPU share (representing the processing rate) provided to VMs under their control to match the workload intensity and guarantee the response-time requirements. We term this level of the control hierarchy the L0 level.

Since the distributed controllers tune the CPU share of VMs to closely match the

incoming workload, servers typically have spare processing capacity available during periods of light workload. A supervisory controller uses this knowledge to increase server utilization and reduce power consumption by packing VMs on to fewer servers and shutting down the extra machines. We term this level of the control hierarchy the L1 level.

At the L0 level, the overall control problem of assigning CPU shares to VMs is decomposed into a set of corresponding subproblems and each subproblem is mapped to an underlying system component: a server. Controllers, implemented locally within each server, solve their respective subproblems of assigning CPU shares to VMs under their control in a cooperative fashion such that the specified performance goals for the overall system are satisfied. To improve scalability, local controllers are developed as non-communicating agents wherein each controller infers the actions of other controllers in the cluster without explicitly exchanging messages between controllers. We develop and compare two different control strategies at this level: (1) receding horizon control, a form of predictive control where the idea is to solve an optimal control problem over a given time horizon and then continuously extend this horizon forward, and (2) reactive control using neural-network based controllers that continuously learn how to tune a VM's CPU share based on observed errors between the actual and desired response times¹.

At the L1 level, a supervisory controller aims to reduce the cluster's power consumption by consolidating the workload on to fewer VMs and shutting down servers not needed during periods of light workload. The control laws governing the supervisor are simplified (thereby reducing computational complexity) to provide approximate solutions that the L0 controllers can refine further.
For example, the supervisor estimates the incoming workload over a prediction horizon and decides only the number of hosts to operate such that the cluster possesses enough aggregate processing capacity to satisfy this workload, leaving the assignment and fine tuning of CPU shares to individual VMs to the L0 controllers.

Footnote 1: A classical PID scheme is not used since it is difficult to find a suitable combination of gain values for the controller, given a non-linear system such as ours with a time-varying input. These considerations are discussed in Section 2.4.
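The supervisor's coarse decision described above (choose only how many hosts to keep on so that aggregate capacity covers the forecast workload) can be illustrated with a minimal sketch. This is not the control law developed later in the thesis; it is a simple greedy stand-in, and the function name `hosts_needed`, the MHz units for demand, and the `headroom` safety margin are assumptions introduced purely for illustration.

```python
from typing import Sequence


def hosts_needed(capacities_mhz: Sequence[float],
                 predicted_demand_mhz: Sequence[float],
                 headroom: float = 0.1) -> int:
    """Return the fewest hosts whose aggregate capacity covers the forecast peak.

    capacities_mhz: CPU capacity of each available host (hypothetical units).
    predicted_demand_mhz: forecast demand at each step of the prediction horizon.
    headroom: assumed fractional safety margin added to the peak demand.
    """
    if not capacities_mhz or not predicted_demand_mhz:
        raise ValueError("need at least one host and one forecast value")

    # Size the cluster for the worst step in the prediction horizon.
    target = max(predicted_demand_mhz) * (1.0 + headroom)

    # Taking the largest hosts first minimizes the number of hosts powered on.
    total = 0.0
    for count, cap in enumerate(sorted(capacities_mhz, reverse=True), start=1):
        total += cap
        if total >= target:
            return count

    # Forecast exceeds total cluster capacity: keep every host on.
    return len(capacities_mhz)
```

The sketch deliberately leaves per-VM CPU shares undecided, mirroring the division of labor in the text: the supervisor fixes the host count, and the L0 controllers refine the allocation within those hosts.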

We validate the control framework on a cluster of heterogeneous servers hosting three benchmark applications (Trade6, RUBBoS, and RUBiS) and processing a time-varying workload. Experimental results demonstrate that the scheme is scalable and allows for the dynamic addition and removal of servers during system operation while maintaining overall performance. The cluster, when subject to our workload traces and managed using the proposed approach, saves, on average, 20% in power-consumption costs over a two-hour period when compared to a system operating without dynamic control. We also characterize the effect of using the different control regimes at the L0 level of the hierarchy on both power consumption and QoS, finding that receding horizon control consistently outperforms the neural network based scheme.

The chapter is organized as follows. Section 2.2 discusses related work in this area. Section 2.3 introduces the testbed used in our experiments and outlines the proposed control architecture. We then develop the architecture in a bottom-up fashion, focusing on the distributed controllers in Section 2.4 and the supervisory controller in Section 2.5. Experimental results validating the control framework are presented in Section 2.6, and Section 2.7 summarizes this chapter.

2.2 Related Work

This section discusses the related work on power/performance management of computing systems using different control architectures. A number of centralized control architectures have been proposed to tackle the management problem in server clusters. For example, Kusic et al. propose a two-level lookahead controller to manage a virtualized cluster by dynamically provisioning VM and CPU resources [34]. Wang et al. design a centralized controller to reduce the power consumed by a virtualized server while achieving the specified SLA [75]. This controller is only able to manage VMs residing within a single server. Meng et al.
propose a provisioning method to consolidate multiple VMs as per their cumulative capacity needs, so that resource utilization can be improved while guaranteeing SLAs [44]. This method focuses on relatively high-level resource provisioning over a long timescale (the timescale of interest is in hours), whereas our work considers resource allocation in which

the fine tuning happens over a much shorter time period, on the order of a few seconds. A common problem faced by centralized control architectures is their limited scalability; as the number of applications/VMs/servers in the system grows, the complexity of the control problem increases exponentially, leading to very long controller execution times and making centralized designs intractable.

Recently, hierarchical control architectures have been proposed to manage large-scale computing systems. For example, the authors in [76] implement hierarchical control to manage multiple VMs in a coordinated fashion. The authors of [74] present a three-level control structure for power management of data centers. However, as its goal is to drive the power consumption to a desired set point, this architecture can only be applied to applications which allow for degraded performance. The authors also do not directly address the workload consolidation problem in that underutilized servers are not turned off to save power. By contrast, our structure offers tuning knobs to effect a tradeoff between power and performance, and unused servers are idled for further power efficiency. Jung et al. propose a control framework to maximize the utility of a virtualized computing environment by optimizing the power, performance, and transient costs involved in turning servers on and off [31]. The experiments reported in the paper only test a small-scale system of up to eight homogeneous servers, with no discussion of how the controller scales when applied to a larger number of heterogeneous servers. Raghavendra et al. introduce a coordinated multi-level architecture for power management across hardware and software for virtualized computing systems [53]. Although they assume two types of servers, only one type of homogeneous server is used in each individual simulation. In our work, the framework is capable of handling a large number of heterogeneous servers.
Tesauro describes how reinforcement learning (RL) can be implemented within a two-level control architecture to manage the SLA of a non-virtualized computing environment by allocating servers among multiple applications [62], but that paper does not consider the issue of power management. Rao et al. also design an RL-based approach, VCONF, to maintain the response time of applications by configuring the CPU time, the number of virtual CPUs, and the memory of VMs [54]. However, to learn an optimal or sub-optimal policy, the RL technique requires

a large amount of training data and incurs a long learning time. Additionally, if a major change (such as adding new server types or modifying the SLA) occurs in the system, the RL policy needs to be retrained. The control structures proposed in our work can accommodate such changes easily without major modifications to the structure itself; these changes can be reflected in our system by modifying parameter values in the appropriate local controllers.

In terms of research into fully decentralized control structures, Das et al. demonstrate a multi-agent approach to manage power and performance in data centers by switching servers off during light workload [15]. However, as they use an off-line model building approach to empirically measure the response time as a function of different numbers of clients and servers powered on, the approach is only feasible for small-scale systems. The authors in [72] use distributed model-predictive controllers to manage the performance of a computing system, but reducing energy consumption is out of their scope. Chen et al. propose an integrated solution with multiple controllers to manage application performance as well as power and cooling in a virtualized data center comprising twenty servers [12]. However, the scalability of the approach when managing larger numbers of servers is not evaluated.

To summarize, decentralized decision making in the context of real-time control of large-scale systems is an emerging research topic within the autonomic computing community. We believe that our control framework developed and analyzed in Chapter 2 and Chapter 3 extends the current state of the art in this area in the following aspects. First, we eventually design a fully distributed control structure to address the workload consolidation problem of a cluster with a thousand servers.
Second, we study how interactions between independent controllers as well as specific controller organizations within this control structure affect system behavior.

2.3 Preliminaries

This section describes the experimental setup, including the system architecture, benchmark applications used for the online services, and workload generation. The overall control architecture is also discussed.

[Figure 2.1 diagram. Labels: Workload Generator (workload for Gold, Silver, Bronze services); Dispatcher; Supervisory Controller; Chronos; System State; Switching Decisions; application tier: Apollo, Poseidon, Eros, Demeter, each with a Controller and Bronze, Silver, and Gold VMs; database tier: Gold Database, Silver Database, Bronze Database on Rada, Starscream, Megatron.]

Figure 2.1: The system architecture hosting the three online services. The supervisory controller makes switching decisions affecting servers in the application tier whereas local controllers on each server decide the CPU share provided to VMs under their control.

The System Architecture

The computing cluster used in our experiments consists of eight heterogeneous servers: a mix of Dell PowerEdge 2950 and PowerEdge 1950 machines networked via a gigabit switch. Virtualization of this cluster is enabled by VMware's ESX Server. The operating system on each VM is SUSE Linux Enterprise Server 10. The ESX server controls the CPU share (in MHz), disk space, and memory allotted to the VMs, and provides an application programming interface (API) to support the remote management of VMs. The controllers use this API to dynamically assign CPU shares to the VMs.

Fig. 2.1 shows the architecture supporting three web-based applications termed Gold, Silver, and Bronze using front-end application servers and back-end databases. The Gold service is enabled by Trade6, a stock-trading benchmark which allows users to browse, buy, and sell stocks [28]. Users can thus perform dynamic content retrieval as well as transaction commitments, requiring database reads and writes, respectively. Trade6 resides within the IBM WebSphere Application Server, which in turn, is hosted by a VM in the application

tier, and the database component is DB2. The Silver service is enabled by RUBBoS, a bulletin board application that allows users to browse stories and post comments [56]. The Bronze service is enabled by RUBiS, an auction site that allows users to browse for items, bid, and post comments [57]. The Silver and Bronze services are each hosted by Apache application servers with MySQL as the database.

We focus on dynamic CPU resource provisioning only within the application tier, since in most cases the application layer requires many more CPUs than the database layer for each online service [7]. For example, this ratio can be as high as ten application processors per database processor for the SAP enterprise application. Similarly, Oracle applications have about a five to one ratio of application server processors to database processors. Therefore, increasing processor utilization at the application layer by consolidating the workload has the potential for significant energy savings.

In our setup, the application tier comprises four servers, each hosting three VMs. Each VM within a server is dedicated to one of the Gold, Silver, or Bronze services, and VMs residing on different servers but supporting the same application form a virtual computing cluster. The local controller on a server dynamically allocates the optimal CPU share to each of its VMs in response to the incoming workload intensity. Servers comprising the database tier are not virtualized. These servers run SUSE Linux Enterprise with DB2 or MySQL as the database component and each server supports a dedicated database servicing a single application.

Incoming requests to an application are dispatched to VMs within the corresponding virtual cluster in weighted round-robin fashion with the weights being proportional to CPU share. At the start of a control step, each local controller transmits its most recent CPU-share decision to the dispatcher.
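The dispatching rule just described can be sketched as follows. This is a minimal illustration only: the thesis does not specify the dispatcher's implementation, so the smooth weighted round-robin variant used here, the class name `WeightedRoundRobinDispatcher`, and the VM identifiers are all assumptions made for the example.

```python
from typing import Dict


class WeightedRoundRobinDispatcher:
    """Routes requests to VMs in proportion to their reported CPU shares.

    Uses the "smooth" weighted round-robin scheme: every VM accumulates
    credit equal to its weight on each request, the VM with the most credit
    is chosen, and the chosen VM is charged the sum of all weights.
    """

    def __init__(self, cpu_shares_mhz: Dict[str, float]) -> None:
        self.update_shares(cpu_shares_mhz)

    def update_shares(self, cpu_shares_mhz: Dict[str, float]) -> None:
        """Install the latest CPU-share decisions (hypothetically, once per control step)."""
        weights = {vm: s for vm, s in cpu_shares_mhz.items() if s > 0}
        if not weights:
            raise ValueError("at least one VM must have a positive CPU share")
        self._weights = weights
        self._credit = {vm: 0.0 for vm in weights}

    def next_vm(self) -> str:
        """Return the VM that should receive the next request."""
        total = sum(self._weights.values())
        for vm, weight in self._weights.items():
            self._credit[vm] += weight
        chosen = max(self._credit, key=self._credit.get)
        self._credit[chosen] -= total
        return chosen


# Example: a VM with three times the CPU share receives three times the requests.
dispatcher = WeightedRoundRobinDispatcher({"vm_a": 3000.0, "vm_b": 1000.0})
routed = [dispatcher.next_vm() for _ in range(8)]
```

Calling `update_shares` whenever the local controllers report new decisions keeps the request split aligned with the current CPU allocation; the smooth variant also interleaves requests rather than sending long bursts to one VM.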
Since a VM's CPU share reflects processing capacity, the larger the CPU share, the more requests that VM will receive.

Workload Generation

We use Httperf, an open-loop workload generator, to send buy/sell or browse/comment requests to each service [46]. We ensure that the incoming workload to each of the three

services shows time-of-day variations typical of many enterprise workloads where the number of arrivals can change quite significantly within a short time period. The workload used in our experiments was synthesized, in part, using log files from the Soccer World Cup 1998 Web site [5]. The traces have the desirable characteristics of burstiness and variability for stressing web applications. Also, the results presented in this thesis assume a sessionless workload; requests are assumed to be independent of each other and there is no need to maintain state information for multiple requests belonging to one user session. However, our controller design and implementation can be extended easily to accommodate session-based workloads as well. For example, a clustered configuration of our IBM WebSphere and Apache Tomcat installations will enable sessions to be replicated across all live instances of the application servers. Session replication ensures that state information for each session is shared by all application servers in the cluster via peer-to-peer communication. This allows requests belonging to a session to be re-routed between VMs when hosts are powered up/down by the supervisor with no interruption in service to the end user.

The Control Architecture

Centralized controller designs quickly become intractable for larger systems. Fortunately, hierarchical or decentralized control where multiple controllers interact with each other to satisfy system-wide QoS goals can be used to reduce the dimensionality of the overall problem [58]. In a hierarchical structure, a controller is only responsible for optimizing the behavior of components under its control while satisfying the constraints imposed on it by a higher-level controller. Fig. 2.1 shows our hierarchical control solution superimposed on the system architecture, comprising a supervisory controller and distributed controllers local to each server.
The different control levels have the following responsibilities:

L0 control level: A fully distributed control structure tunes the CPU share provided to the cluster wherein controllers on each server cooperate to dynamically optimize the CPU share provided to VMs under their control to match the workload dispatched to the server and guarantee the response-time requirements.