Data Center Optimization WHITE PAPER PARC, 3333 Coyote Hill Road, Palo Alto, California 94304 USA +1 650 812 4000 engage@parc.com www.parc.com
Abstract

Recent trends in data center technology have created unprecedented flexibility in data center configuration and operation, yet current data centers still waste a substantial amount of capacity and energy. A PARC collaboration with Power Assure is developing and commercializing new technology to aid the operation of data centers. Our tools aim to provide monitoring insights, recommendations, and automation to expand the capacity and reduce the energy used in data centers.

Background

Data centers are recognized as consuming an increasingly troublesome share of electricity in the US. A recent revision of the Koomey report [1] puts this at 2% of all US electricity consumption and 1.3% of worldwide electricity consumption. While a few larger companies have the internal resources to devote specialized effort to improving the efficiency of their own data center operations, Power Assure's products target a broader market of data centers. Power Assure's tools can provide power management insights and automation to these data centers, while still allowing customers to achieve their own mission-critical performance objectives.

The development of metrics of data center efficiency (e.g. PUE, power usage effectiveness) has focused attention on improving energy efficiency in data centers. These early metrics drew particular attention to cooling and other overhead power consumption beyond the power used directly for IT operations. Indeed, data centers have improved dramatically according to these metrics. However, because these metrics focus on overhead, they can miss opportunities to reduce the IT power consumption itself. Metrics devised more recently (e.g. CADE, corporate average data center efficiency) have drawn attention more broadly to all power consumption in the data center, including both HVAC and IT, showing that there is still significant underutilization in data centers. While energy efficiency is certainly important, the reliability of the applications is still the primary objective.
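The way an overhead-focused metric can miss IT savings is easy to see in a small worked example. The sketch below uses the standard definition of PUE (total facility power divided by IT equipment power). The function name and the load figures are hypothetical and chosen only for illustration.

```python
def pue(it_power_kw: float, overhead_power_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw


# Hypothetical figures, for illustration only: cooling and other overhead
# stay fixed at 500 kW while consolidation cuts IT power from 1000 to 800 kW.
before = pue(it_power_kw=1000.0, overhead_power_kw=500.0)
after = pue(it_power_kw=800.0, overhead_power_kw=500.0)

print(f"before: total = 1500 kW, PUE = {before:.3f}")
print(f"after:  total = 1300 kW, PUE = {after:.3f}")
```

In this example the facility draws 200 kW less in total, yet its PUE rises from 1.5 to 1.625, so a manager watching PUE alone would see the IT saving as a regression.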
Another key development is virtualization. By allowing multiple virtual machines (servers with their own operating system) to share the same physical resources, virtualization technologies are enabling major improvements in data center consolidation. To coax IT managers away from dedicated physical resources, the initial steps toward virtualization often involve significant overprovisioning of resources. This is one of the reasons why we still see very low utilization in data centers, even when they are virtualized and have new flexibility in operations. PARC and Power Assure's technology can exploit this new flexibility and find extra capacity and power savings in virtualized data centers without impacting reliability.

[1] Jonathan Koomey. 2011. Growth in Data Center Electricity Use 2005 to 2010. Oakland, CA: Analytics Press. August 1. http://www.analyticspress.com/datacenters.html

Contingent Resources

An important concept in data center management is that idle resources (such as CPU, memory, disk, or bandwidth) are not necessarily wasted resources. If resources are idle but reserved for mission-critical applications that might need them, they provide a valuable contingent benefit to the operation of the data center. For example, an application implementing a website or web service may not be able to predict precisely in advance the number of visitors using the service. This uncertainty makes it particularly challenging to manage contingent resources in data centers. Without modeling and optimization, data center operators play it safe and usually provision for peak demand plus 20% to ensure they never exceed capacity.

Using patented modeling and optimization technology, PARC and Power Assure have developed methods to ensure that some idle resources play a valuable contingent role and some play an emergency role, rather than all being excess reservations made primarily out of fear and uncertainty about applications' resource needs. As a result, power consumption is significantly reduced, peak capacity per application is increased, service levels are met, and overall reliability is much higher.

The management of contingent resources is built on three pillars: monitoring and modeling, quality of service, and optimization (see Figure 1).

Figure 1. Key elements of data center management: monitoring and modeling, quality of service, and optimization

Monitoring and Modeling

The foundation of good management is careful monitoring and modeling of the existing data center operations. An essential part of this is predicting applications' future resource needs. Applications that are naturally highly unpredictable will benefit from much larger contingent reservations than those that are highly predictable.

Another essential part of monitoring and modeling is accurately modeling the power and performance consequences of changes to the operation of a virtualized data center. To this end, Power Assure has developed the PAR 4 benchmarking methodology, which can be used to characterize accurately the performance of IT equipment and the impact of migration between different machine types, as well as the power consumption of machine types under different IT loadings. This modeling, and the monitoring necessary to produce it, ties directly to key performance objectives in data centers and is critical to accurately optimizing overall performance and capacity.
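To give a rough sense of what a power model for a machine type under different loadings might look like, the sketch below interpolates linearly between idle and peak power. This linear form is a common simplifying assumption, not the PAR 4 methodology, and the class name, field names, and wattages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    """Hypothetical power profile for one machine type."""
    name: str
    idle_watts: float
    peak_watts: float

    def power_at(self, utilization: float) -> float:
        """Estimated draw in watts, assuming power grows linearly with load."""
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be between 0 and 1")
        return self.idle_watts + (self.peak_watts - self.idle_watts) * utilization


# Illustrative numbers only.
server = MachineType("example-server", idle_watts=200.0, peak_watts=300.0)

# Two servers at 25% load each, versus one server at 50% with the other off.
spread_out = 2 * server.power_at(0.25)
consolidated = server.power_at(0.50)

print(f"spread out:   {spread_out:.0f} W")
print(f"consolidated: {consolidated:.0f} W")
```

Under this assumption most of the saving comes from no longer paying the idle power of the second machine, which is why consolidation matters even when the total work is unchanged.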
PARC and Power Assure are developing and commercializing technology to address the practical challenges of modeling for data center optimization: producing models quickly for new deployments, adapting models as application behavior shifts over time, and conditioning the models on factors such as time of day and day of week that can improve the quality of predictions.
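Conditioning on time of day and day of week can be illustrated with a very simple model: group historical demand samples by (weekday, hour) and summarize each group separately. The sketch below is a minimal illustration on synthetic data, not the modeling technology described in this paper; the function name and the demand values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def conditional_profile(samples):
    """Summarize (timestamp, demand) samples per (weekday, hour) bucket.

    Returns a dict mapping (weekday, hour) to (mean, standard deviation),
    where weekday 0 is Monday.
    """
    buckets = defaultdict(list)
    for timestamp, demand in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(demand)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}


# Synthetic trace: two weeks of hourly samples, busy on weekdays 09:00-17:00.
start = datetime(2012, 1, 2)  # a Monday
samples = []
for hour_offset in range(14 * 24):
    ts = start + timedelta(hours=hour_offset)
    busy = ts.weekday() < 5 and 9 <= ts.hour < 17
    samples.append((ts, 80.0 if busy else 20.0))

profile = conditional_profile(samples)
overall = mean(demand for _, demand in samples)

print("Monday 10:00   ->", profile[(0, 10)])
print("Saturday 10:00 ->", profile[(5, 10)])
print("unconditioned mean ->", round(overall, 1))
```

The unconditioned mean describes neither the busy nor the quiet periods well, while the per-bucket summaries separate them cleanly. A real deployment would also have to cope with sparse buckets and with behavior that drifts over time.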
Quality of Service

A second major consideration is the quality of service (QoS) needs of applications. QoS needs help measure how important contingent resources are to the performance of an application. For example, two applications might have nearly identical predicted resource needs, but one application (e.g. a web server attempting to deliver rapid response) has high QoS (Level 4) needs, while the other application (e.g. a batch job that can complete overnight) has low QoS (Level 2) needs. The first application would benefit from large contingent reservations to cover the uncertainty of its resource needs, while the second application, even though it has similar uncertainty about resource needs, would not require large contingent reservations.

Both prediction of resource needs and quality of service are important for determining contingent reservations. It is the combination of these two aspects of the application that distinguishes a system that is making valuable contingent reservations from a system that is wasting idle resources.

Power Assure's product is aimed at a broad market of diverse data center operations. To meet the needs of this market, PARC and Power Assure have developed a general-purpose approach to specifying applications' QoS needs and ensuring that appropriate contingent reservations are used to protect the performance of applications.

Figure 2 illustrates the kind of consolidation possible when multiple virtualized applications sharing physical resources (a hardware cluster) have been sized using both predictive models of their resource needs and specifications of their QoS requirements. The savings illustrated are over and above a more typical sizing operation where each individual application is given a virtual reservation equal to its original pre-virtualized, stand-alone hardware configuration.
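One way to picture how prediction uncertainty and QoS combine into a contingent reservation is sketched below. It assumes that demand is normally distributed around its predicted mean and that each QoS level maps to a target probability of demand staying within the reservation. Both the normality assumption and the level-to-probability table are hypothetical; this paper does not define them.

```python
from statistics import NormalDist

# Hypothetical mapping from QoS level to the probability that demand stays
# within the reservation. These values are illustrative assumptions only.
QOS_COVERAGE = {1: 0.50, 2: 0.80, 3: 0.95, 4: 0.999}


def contingent_reservation(mean_demand, std_demand, qos_level):
    """Return (total reservation, contingent part above the predicted mean).

    Assumes normally distributed demand with std_demand > 0.
    """
    coverage = QOS_COVERAGE[qos_level]
    total = NormalDist(mean_demand, std_demand).inv_cdf(coverage)
    return total, total - mean_demand


# Two applications with identical predicted needs (in CPU cores) but
# different QoS levels, as in the web server versus batch job example.
web_total, web_extra = contingent_reservation(4.0, 1.0, qos_level=4)
batch_total, batch_extra = contingent_reservation(4.0, 1.0, qos_level=2)

print(f"web server (Level 4): reserve {web_total:.2f} cores, {web_extra:.2f} contingent")
print(f"batch job  (Level 2): reserve {batch_total:.2f} cores, {batch_extra:.2f} contingent")
```

With the same prediction and the same uncertainty, the Level 4 application ends up with a contingent reservation several times larger than the Level 2 application, which matches the qualitative behavior described above.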
Figure 2. Consolidation from modeling and QoS analysis

Optimization

The final step in consolidation and energy savings is the optimization of a group of applications sharing the same physical resources. Perhaps the most compelling benefit of virtualization technologies is that multiple applications can share the same physical resources. Rather than pinning physical resources one to one with applications, all virtualization systems, by various means, allow resources unused by one application to be used by other applications.

Surprisingly, even though most virtualization systems benefit from this sharing, capacity planning frequently focuses on individual applications. Their individual reservations are often set based on individual traces, or on pre-virtualized resource needs, with little anticipation of the sharing benefits of their combined virtualized deployment. This type of planning is like a phone company sizing its circuits for the maximum of all possible phone calls, when a considerably smaller circuit capacity would meet the statistically predicted aggregate needs of its customers.
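The gap between planning for individual applications and planning for the group, as in the phone company analogy, can be shown with a toy calculation. The sketch below compares the sum of each application's own peak with the peak of the combined demand. The traces and function names are hypothetical, and a historical peak of the aggregate is not by itself a safe capacity target.

```python
def sum_of_peaks(traces):
    """Capacity needed if every application is sized for its own peak."""
    return sum(max(trace) for trace in traces)


def peak_of_sum(traces):
    """Highest combined demand observed at any single time step."""
    return max(sum(values) for values in zip(*traces))


# Hypothetical demand traces (same time steps, arbitrary resource units).
# Each application peaks at a different time.
traces = [
    [8, 2, 1, 1],
    [1, 7, 2, 1],
    [1, 1, 9, 2],
]

individual = sum_of_peaks(traces)
aggregate = peak_of_sum(traces)

print("sum of individual peaks:", individual)
print("peak of aggregate demand:", aggregate)
```

Here the individually sized plan needs 24 units while the combined demand never exceeds 12. The benefit shrinks when applications peak together, which is why the aggregate has to be modeled rather than assumed.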
Similar to the statistical multiplexing of communication channels, data center management can benefit from statistical packing of the aggregate resource needs of virtualized applications. PARC and Power Assure have developed highly efficient optimization algorithms that address the practical challenges of applying statistical packing to data centers. At the highest level, our approach explicitly plans to share resources, and can achieve higher consolidation and reduced power consumption. To achieve this consolidation, we have also addressed more detailed planning so that the underlying virtualization system can function at its best, allowing us to benefit fully from its scheduling and live migration capabilities. We have also found attractive ways to let data center operators participate in the optimization, by allowing them to control the degree of isolation among applications and the level of supervision of automation.

Figure 3 illustrates the additional consolidation possible when contingent resource planning is performed for a group of virtualized applications, based on the optimization algorithms described above. The jobs have individual contingent reservations (not shown), but they have been planned anticipating sharing possibilities within the group. Any single application can use the spare resources (shown as pink space in the figure) that are available, giving individual applications a generous peak capacity even under consolidation.

Note the additional improvement beyond Figure 2, which already shows the consolidation benefit of good modeling and QoS. Figure 3 shows the benefits of modeling, QoS, and optimization by statistical packing of the group. It illustrates a semi-automatic configuration of a cluster, in which an operator supervises the implementation of a configuration designed to function well for an entire week.
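As a rough illustration of the idea behind statistical packing, the sketch below places applications onto hosts with a first-fit-decreasing heuristic and sizes each host for the group's combined demand. It is not the optimization algorithm described in this paper. It assumes independent, normally distributed demands (so variances add), and all names, capacities, and coverage values are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class App:
    name: str
    mean: float  # predicted demand
    std: float   # prediction uncertainty


@dataclass
class Host:
    capacity: float
    apps: list = field(default_factory=list)


def group_need(apps, coverage):
    """Capacity covering the group's total demand with the given probability.

    Assumes independent normal demands, so the group variance is the sum
    of the individual variances.
    """
    z = NormalDist().inv_cdf(coverage)
    return sum(a.mean for a in apps) + z * sqrt(sum(a.std ** 2 for a in apps))


def individual_need(apps, coverage):
    """Capacity when every application keeps its own full reservation."""
    return sum(group_need([a], coverage) for a in apps)


def pack(apps, capacity, need, coverage=0.99):
    """First-fit decreasing: place each app on the first host where it fits."""
    hosts = []
    ordered = sorted(apps, key=lambda a: group_need([a], coverage), reverse=True)
    for app in ordered:
        for host in hosts:
            if need(host.apps + [app], coverage) <= host.capacity:
                host.apps.append(app)
                break
        else:  # no existing host had room
            if need([app], coverage) > capacity:
                raise ValueError(f"{app.name} does not fit on an empty host")
            hosts.append(Host(capacity, [app]))
    return hosts


# Eight hypothetical applications with identical predictions.
apps = [App(f"app-{i}", mean=2.0, std=1.0) for i in range(8)]

isolated = pack(apps, capacity=16.0, need=individual_need)
shared = pack(apps, capacity=16.0, need=group_need)

print("hosts with individual reservations:", len(isolated))
print("hosts with group-based planning:   ", len(shared))
```

In this toy case, group-based planning fits the same eight applications on two hosts instead of three at the same nominal coverage. Correlated demand would weaken the independence assumption and reduce the benefit, which is one reason operator control over the degree of isolation is useful.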
Figure 3. Further consolidation through group-based resource planning
Summary

PARC and Power Assure have found that significant opportunities for power savings arise from carefully monitoring and managing the quality of service needs of data center users. Carefully preparing to meet contingencies is our central design principle for operating data centers efficiently. Rather than throwing unnecessary resources at jobs, the careful modeling, analysis, and optimization of contingent resources can both improve the performance and capacity of applications running in data centers and reduce the cost and impact of data center operations: reducing power consumption, increasing the flexibility of power consumption, and finding new capacity in existing data centers.

PARC, a Xerox company, is in The Business of Breakthroughs. Practicing open innovation, we provide custom R&D services, technology, expertise, best practices, and IP to global Fortune 500 and Global 1000 companies, startups, and government agency partners. We create new business options, accelerate time to market, augment internal capabilities, and reduce risk for our clients.

© 2012 Palo Alto Research Center Incorporated. All Rights Reserved. PARC, the PARC logo and The Business of Breakthroughs are service marks of Palo Alto Research Center Incorporated. All other trademarks used herein are the property of their respective owners. PAR 4 and Power Assure are registered trademarks of Power Assure, Inc.