White Paper
Managing Data Center Power and Cooling

Introduction: Crisis in Power and Cooling

As server microprocessors become more powerful in accordance with Moore's Law, they also consume more power and generate more heat. Similar geometric improvement in disk storage technology has driven rapid growth of online data, using mass storage systems that often consume as much power as the servers themselves. With the rapid growth in computing power and storage capacity, typical data center power consumption (kW) and power density (kW/sq. ft.) are both spiraling upward, placing a strain on many existing data center power distribution and cooling systems.

Each watt of power consumed in the data center requires 3.413 BTU/hour of cooling capacity to remove the associated heat. Depending on climate factors, removing heat at the rate of 3.413 BTU/hour requires an additional 0.4-0.6 watts of electrical power. According to HP, data center power densities have grown from 2.1 kW/rack in 1992 to 14 kW/rack in 2006, requiring cooling systems that can deal with local "hot spots" within the computer room.

A Ziff-Davis survey conducted in November 2005 found that 71% of IT decision makers are dealing with or tracking issues related to power consumption and cooling, while 63% are increasing electrical power capacity or expanding the size of the data center. A similar survey by IDC in May 2006 found that power provisioning and power consumption are among the top three issues in the data center.

Another aspect of the power problem is the growing cost of electricity. Currently the 3-year cost of power and cooling in the U.S. is roughly equivalent to the acquisition cost of data center capital equipment, according to IDC. With the demand for computing power and the cost of electrical power continuing to escalate, power can be expected to consume a larger share of IT budgets, possibly as much as 50% in the next few years.
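The conversion factors above lend themselves to a quick sizing calculation. The sketch below is a minimal illustration in Python, taking the cooling overhead as the midpoint of the 0.4-0.6 W range cited above and using the 2006 HP figure of a 14 kW rack as the example load:

```python
BTU_PER_WATT_HOUR = 3.413  # BTU/hour of heat produced per watt of IT load

def cooling_load(it_watts, cooling_overhead=0.5):
    """Return (heat in BTU/hour, total electrical watts including cooling).

    cooling_overhead is the extra cooling power needed per watt of IT
    load, in the 0.4-0.6 range depending on climate.
    """
    btu_per_hour = it_watts * BTU_PER_WATT_HOUR
    total_watts = it_watts * (1 + cooling_overhead)
    return btu_per_hour, total_watts

# A fully loaded 14 kW rack:
heat, total = cooling_load(14_000)
print(f"{heat:,.0f} BTU/hour of heat, {total:,.0f} W total draw")
# -> 47,782 BTU/hour of heat, 21,000 W total draw
```

The same function, with the overhead set to 0.4 or 0.6, brackets the climate-dependent range given above.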
As these trends continue to unfold, data center managers will need to give careful consideration to the impact of each investment on the power and cooling profile of their facility. In addition, the life cycle costs of power and cooling will become increasingly important factors in the TCO calculations used to guide selection among competing solutions. Aiding this analysis is an emerging set of data center and server power efficiency metrics being developed by the EPA and industry consortia. In the long run, the EPA hopes that manufacturers of data center equipment will publish ENERGY STAR metrics to help customers better manage power consumption. The key power efficiency metrics expected to come out of this and similar efforts are application workload/watt for servers, GB/watt for storage, and Gbps/watt for networking.

In parallel with standards efforts, end users can readily develop their own power efficiency metrics and calculations. For compute power efficiency, for example, one can focus on the performance benchmarks that are meaningful for critical applications and divide by the corresponding power consumption in watts. For High Performance Computing, Mflops/watt is a popular power efficiency metric. For database application workloads, TPC-H Composite Queries-per-Hour per watt (QphH/watt) at a given database size is one possible power efficiency metric.

There are a number of potential benefits to be derived from an increased focus on power consumption and power efficiency:
- Extending the life of existing data centers and minimizing retrofits
- Gaining at least partial control of growing expenses for power and cooling
- Optimizing new data center designs

© 2007 Force10 Networks, Inc. [Page 1 of 5]
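The do-it-yourself metric described above amounts to a one-line division. The sketch below shows it for a database workload; the QphH score and wattage are made-up illustrative numbers, not published benchmark results:

```python
def power_efficiency(benchmark_score, watts):
    """Benchmark units of work per watt, e.g. QphH/watt or Mflops/watt."""
    return benchmark_score / watts

# Hypothetical server: 50,000 QphH at a given database size, drawing 2,500 W
print(f"{power_efficiency(50_000, 2_500):.1f} QphH/watt")  # -> 20.0 QphH/watt
```

The same division works for any benchmark meaningful to a critical application, provided power is measured under the same workload that produced the score.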
Optimizing the overall power efficiency of the data center requires a comprehensive approach that focuses on technologies and strategies to minimize power consumption and maximize power efficiency at every level within the infrastructure, including CPU chips, power supplies, servers, storage devices, and networking equipment. In addition to measures that maximize power efficiency for hardware devices, there are also software strategies, such as server virtualization, that can play a significant role in reducing power consumption.

CPU Chips: At a given level of compute performance, the basic architecture of the CPU chip can have a significant impact on power consumption. For example, integrated memory controllers on the CPU can reduce the overall power consumption of the chip set. Many server manufacturers offer a fairly wide choice of CPUs in a given model of server, allowing power efficiency to be considered when making product selections and tradeoffs.

Beyond architectural differences, all leading-performance CPU chips have reached levels of power consumption that prevent tracking Moore's Law simply by increasing clock speed. As shown in Figure 1, the geometric growth in transistors per chip is expected to continue unabated through 2010, while power per chip and clock speed are being forced to level off. These technology trends have led chip manufacturers to turn to multi-core chips to take advantage of continued growth in transistor densities. For example, dual-core CPUs can deliver higher performance than single-core CPUs because scaling back clock frequency by only ~15-20% can cut power consumption by ~40%, allowing two cores per die. Over the remainder of this decade we can expect Moore's Law at the transistor level to drive a doubling in the number of cores per chip every 18-24 months.
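The ~15-20% clock reduction / ~40% power reduction tradeoff follows from a common first-order CMOS model (an assumption here, not stated in the paper): dynamic power scales with f·V², and supply voltage can be lowered roughly in proportion to frequency, giving P ∝ f³:

```python
def relative_power(freq_scale):
    """Relative dynamic power when clock frequency is scaled by freq_scale,
    assuming voltage scales with frequency (P ~ f^3). Simplified model."""
    return freq_scale ** 3

for scale in (0.85, 0.80):
    saving = 1 - relative_power(scale)
    print(f"clock x{scale}: ~{saving:.0%} power saving per core")
# -> clock x0.85: ~39% power saving per core
# -> clock x0.8: ~49% power saving per core
```

Under this model, two cores at 0.85x clock draw roughly 1.23x the power of one full-speed core while offering nearly double the throughput on well-threaded workloads, which is the tradeoff behind the dual-core figures above.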
Figure 1. Growth trends in transistors/chip, clock speed, and power/chip

The applications that benefit the most from multi-core chip architectures include multi-threaded applications (such as cluster computing), transaction processing, and multi-tasking. For applications such as these, a dual-core processor can deliver >60% higher performance than a single-core processor dissipating the same power. However, single-core processors will offer better performance for I/O-intensive, single-threaded applications, because multiple cores have to contend for memory and I/O bandwidth. From both a performance and a power perspective, the trend to multi-core CPUs should drive a higher priority on multi-threaded programming models for new applications.

A second chip-level technique for improving power efficiency is dynamic Clock Frequency and Voltage Scaling (CFVS). CFVS provides performance-on-demand by dynamically adjusting CPU performance (via clock rate and voltage) to match the workload. With CFVS, the CPU runs at the minimum clock speed (and power level) needed by the current workload. Clock frequency and voltage are controlled by the operating system's power management utility via industry-standard Advanced Configuration and Power Interface (ACPI) calls.

Figure 2. Effect of CFVS on power consumption and power efficiency

The benefits of CFVS are depicted conceptually in the charts of Figure 2. In the chart on the left, power consumption is shown as a function of utilization, both with and without CFVS. CFVS can deliver up to 75% power savings at idle and 40-70% power savings for utilization in the 20-80% range. In the chart on the right, power efficiency in workload units per watt is shown as a function of utilization. Because CPU performance is not degraded by CFVS, dramatic improvements in power efficiency are possible, as the chart depicts.

Server Packaging: Rack-optimized servers and blade servers can share a number of components, including higher-efficiency power supplies and cooling subsystems. Compared to traditional servers, blade servers can reduce power consumption by as much as 20-50%, while also consuming less floor space. The result is lower overall power consumption but higher power densities measured in watts/rack.
The higher power densities (in the range of 12-15 kW/rack and above) may require special cooling capabilities, such as local liquid cooling of blade server racks. Irrespective of server packaging choices, server power supplies (as well as the power supplies of all data center devices) should be selected, wherever possible, for high power conversion efficiency (e.g., conversion efficiency in excess of 80% and power factors approaching 1.0).

Server Virtualization: Server virtualization allows applications to be consolidated onto a smaller number of servers by eliminating the many low-utilization servers dedicated to single applications or operating system versions. Reducing the number of servers can bring power savings of up to 50%, depending on the application mix. Server virtualization is attractive because it can be deployed on existing servers, minimizes disruption of existing applications, and has many other TCO benefits besides power conservation.

Storage: For storage devices, power is consumed primarily by spindle motors, which means that power consumption is relatively independent of the capacity of the disk. Therefore, storage power efficiency (measured in GB/watt) is maximized by deploying the highest-capacity disks that have I/O characteristics compatible with the applications being served. Currently, large drives (~500 GB) are often deployed for less demanding applications, such as data mining, while smaller drives (<100 GB) are used for applications requiring higher-performance I/O, such as database applications. Storage virtualization technologies and large-scale tiered storage are strategies that offer the potential to maximize power efficiency by minimizing storage over-provisioning.

Switched Network Infrastructure: Power efficiency for switches and routers is measured by throughput efficiency in Gbps/watt.
For high-density, chassis-based switch/routers required for large data centers, power efficiency largely depends on the power characteristics of the device's backplane. In addition to providing the physical connectivity for the switching fabric carrying data between line cards, the backplane serves as the grid that distributes power to the line cards and control modules of the switch. For passive copper backplanes, power efficiency is primarily a function of the resistance of the copper traces. For example, the Force10 E-Series switch/router uses a patented 4-layer, 4-ounce copper backplane to minimize resistance and power consumption. As a result, the E-Series backplane itself has a power efficiency of 4.5 Gbps/watt. As shown in Figure 3 (an excerpt from Force10's power and power-efficiency modeling tool), the E1200 switch/router fully configured with 1000Base-T GbE interfaces running at line rate (LR) has a system-level power efficiency of 0.125 Gbps/watt (8 watts per 1000Base-T port). In similar fashion, an E1200 fully configured with 4-port 10 GbE XFP line cards running at line rate has a system-level power efficiency of 0.12 Gbps/watt (83 watts per 10 GbE port).

Figure 3. Power Consumption of an E1200 with 672 Line-Rate GbE Ports
* This power consumption table uses the maximum power draw in calculating each element of the E-Series system. Actual average power draw will typically be 10-25% lower.

1. Switch Consolidation: Consolidating a number of low-density switches into a large, high-density switch with shared redundant power offers power savings analogous to those of blade servers vs. traditional servers. High density also allows the traditional access and aggregation/distribution layers of switching to be collapsed into a single layer that performs both functions. The scalability and density of the Force10 E-Series often enable network consolidations with a >3:1 reduction in the number of data center switches. This high reduction factor is due to a combination of the following factors:
- Elimination of a distinct access switching layer (i.e., 2-tier switching vs. 3-tier switching)
- More servers per aggregation switch, resulting in fewer aggregation switches
- More aggregation switches per core switch, resulting in fewer core switches
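The per-port efficiency figures quoted for the E1200 can be sanity-checked directly, using the port counts and wattages given above:

```python
def gbps_per_watt(port_gbps, watts_per_port):
    """Throughput efficiency from per-port line rate and per-port draw."""
    return port_gbps / watts_per_port

print(gbps_per_watt(1, 8))              # GbE at 8 W/port     -> 0.125
print(round(gbps_per_watt(10, 83), 2))  # 10 GbE at 83 W/port -> 0.12

# Whole-chassis view: 672 line-rate GbE ports at 8 W each
print(672 * 1, "Gbps total,", 672 * 8, "W total")
# -> 672 Gbps total, 5376 W total
```

Since per-port throughput and per-port draw fully determine the ratio, the same figure applies whether one port or the full chassis is considered.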
The high reserve capacity of the E-Series backplane will allow future improvements in line card port densities to be achieved on the same power budget, which means that system-level power efficiency improvement will track port density. For example, when GbE port density is doubled, system power efficiency will also nearly double. Ultra-high-density, power-efficient switch/routers with carrier-class reliability, such as the Force10 E-Series, can be leveraged in two additional ways to further reduce data center power consumption.

2. Unified Switch Fabric: With the advent of intelligent Ethernet NICs that reduce both host-based latency and CPU utilization for network transfers, Ethernet is well positioned to function as a unified, or converged, switching fabric that provides LAN connectivity, storage networking, and cluster interconnect across the data center. With a unified Ethernet fabric, power is conserved because only one network adapter is needed for each server and no additional sets of switches are required for specialized fabrics.

3. Virtual Data Center: Another approach to reducing power consumption in the data center is to move away from a model based on static, dedicated physical resources for each application toward a virtualized model in which each application draws on a shared pool of resources to satisfy its workload requirements. Since the workloads of various applications peak at different times in the business cycle, the shared resource model can do the same job with far fewer resources, resulting in far lower power consumption. Server virtualization, in conjunction with automated system management and a unified Ethernet switching fabric, constitutes a significant step toward a virtualized data center architecture that offers optimized resource utilization and minimal power consumption. A detailed discussion of the Force10 Networks blueprint for the design of a power-efficient Virtual Data Center is available on the Force10 website.
Conclusion

Until now, IT staff have typically focused on the escalation of computing power and storage capacity, coupled with smaller form factors for servers and storage devices, and the strains these are placing on the power and cooling facilities of the data center. However, according to the Ethernet Alliance, inefficient networking could also be wasting as much as $450 million a year, or 5.8 TWh, in the United States, and potentially three times that much worldwide. To address network energy efficiency, the IEEE has decided to form a study group to scope standards for Energy Efficient Ethernet (EEE). This group will work to ensure maximum efficiency under normal use scenarios and to develop designs for lower energy use at lower utilization and for minimum energy usage over the operational lifetime of networking platforms.

In addition to industry alliance and standards efforts, there are a number of technologies and strategies available to data center managers for improving the power efficiency of existing data centers and optimizing the power and cooling designs of new data centers. Focusing on power efficiency metrics and power conservation at every level within the data center infrastructure minimizes the cost of the physical plant, as well as the recurring cost of electrical power, an increasingly important component of TCO and of overall IT budgets.

Force10 Networks, Inc.
350 Holger Way
San Jose, CA 95134 USA
www.force10networks.com
408-571-3500 PHONE
408-571-3550 FACSIMILE

© 2007 Force10 Networks, Inc. All rights reserved. Force10 Networks and E-Series are registered trademarks, and Force10, the Force10 logo, P-Series, S-Series, TeraScale and FTOS are trademarks of Force10 Networks, Inc. All other company names are trademarks of their respective holders. Information in this document is subject to change without notice. Certain features may not yet be generally available. Force10 Networks, Inc. assumes no responsibility for any errors that may appear in this document.

WP20 307 v1.1