# Data Centers. Comparing Data Center & Computer Thermal Design

By Michael K. Patterson, Ph.D., P.E., Member ASHRAE; Robin Steinbrecher; and Steve Montgomery, Ph.D.

The design of cooling systems and thermal solutions for today's data centers and computers is handled by skilled mechanical engineers using advanced tools and methods. These engineers work in two different areas: some design cooling for computers and servers, and others design data center cooling. Unfortunately, each group often lacks an understanding of the other's methods and design goals. This can lead to non-optimal designs and problems in creating a successful, reliable, energy-efficient data processing environment. This article works to bridge this gap and provide insight into the parameters each engineer works with and the optimizations they go through. A basic understanding of each role will help the counterpart's designs, be it a data center or a server.

## Server Design Focus

Thermal architects are given a range of information to begin designing the thermal solution. They know the thermal design power (TDP) and temperature specifications of each component (typically junction temperature, $T_J$, or case temperature, $T_C$). Using a processor as an example, Figure 1 shows a typical component assembly. The processor is specified with a maximum case temperature, $T_C$, which is used for design purposes. In this example, the design parameters are TDP = 103 W and $T_C$ = 72 °C. Given an ambient temperature specification ($T_A$) of 35 °C, the required case-to-ambient thermal resistance of this example would need to be equal to or lower than:

$$\Psi_{CA,\text{required}} = \frac{T_C - T_A}{\text{TDP}} = \frac{72 - 35}{103} \approx 0.36\ ^\circ\text{C/W} \tag{1}$$

Sometimes this value of $\Psi_{CA}$ is not feasible. One option to relieve the demand for a thermal solution with a lower thermal resistance is a higher $T_C$. Unfortunately, the trend is for $T_C$ to continue to decline. Reductions in $T_C$ result in higher performance, better reliability, and less power used.
Those advantages are worth obtaining, making the thermal challenge greater.

One of the first parameters discussed by the data center designer is the temperature rise for the servers, but this value is a secondary consideration, at best, in the server design. As seen in Equation 1, no consideration is given to chassis temperature rise. The thermal design is driven by maintaining component temperatures within specifications, the primary parameters being $T_C$, $T_{ambient}$, and $\Psi_{CA,\text{actual}}$. The actual thermal resistance of the solution is driven by component selection, material, configuration, and airflow volumes. Usually, the only time that chassis $T_{RISE}$

About the Authors: Michael K. Patterson, Ph.D., P.E., is thermal research engineer, platform initiatives and pathfinding, at Intel's Digital Enterprise Group in Hillsboro, Ore. Robin Steinbrecher is staff thermal architect with Intel's Server Products Group in DuPont, Wash. Steve Montgomery, Ph.D., is senior thermal architect at Intel's Power and Thermal Technologies Lab, Digital Enterprise Group, DuPont, Wash.

38 ASHRAE Journal ashrae.org April
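Equation 1 from the server design discussion is simple enough to check numerically. The following is a minimal sketch, not the authors' tooling: the function name is hypothetical, the first case uses the article's example values (TDP = 103 W, case limit 72 °C, ambient 35 °C), and the second case is an illustrative assumption showing how a cooler 20 °C inlet relaxes the requirement.

```python
def required_thermal_resistance(t_case_c, t_ambient_c, tdp_w):
    """Largest allowable case-to-ambient thermal resistance in degC/W (Equation 1)."""
    if tdp_w <= 0:
        raise ValueError("TDP must be positive")
    return (t_case_c - t_ambient_c) / tdp_w


# Article example: TDP = 103 W, T_C = 72 degC, T_A = 35 degC.
psi_at_35 = required_thermal_resistance(72.0, 35.0, 103.0)

# Illustrative assumption: same component with a 20 degC inlet.
psi_at_20 = required_thermal_resistance(72.0, 20.0, 103.0)

print(f"Required resistance at T_A = 35 degC: {psi_at_35:.2f} degC/W")
print(f"Required resistance at T_A = 20 degC: {psi_at_20:.2f} degC/W")
```

A larger allowable resistance means the thermal solution can meet the case temperature limit with less airflow, which is why the inlet temperature matters so much to the server designer.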

ment. Monitoring of temperature sensors is accomplished via on-die thermal diodes or discrete thermal sensors mounted on the printed circuit boards (PCBs). Component utilization monitoring is accomplished through activity measurement (e.g., memory throughput measurement by the chipset) or power measurement of individual voltage regulators. Either of these methods results in calculation of component or subsystem power.

## Data Center Design Focus

The data center designer faces a similar list of criteria for the design of the center, starting with a set of requirements that drive the design. These include:

- Cost: The owner will have a set budget, and the designer must create a system within the cost limits. Capital dollars are the primary metric. However, good designs also consider the operational cost of running the system needed to cool the data center. Combined, these comprise the total cost of ownership (TCO) for the cooling systems.
- Equipment list: The most detailed information would include a list of equipment in the space and how it will be racked together. This allows for a determination of the total cooling load in the space, and of the airflow volume and distribution in the space. Caution must be taken if the equipment list is used to develop the cooling load by summing up the total connected load, because this leads to over-design: the connected load, or maximum rating of the power supply, is always greater than the maximum heat that the components can dissipate in total. Obtaining the thermal load generated by the equipment from the supplier is the only accurate way of determining the cooling requirements. Unfortunately, the equipment list is not always available, and the designer may be given only a cooling load per unit area and will need to design the systems based upon this information. Sizing the cooling plant is straightforward when the total load is known, but the design of the air-handling system is not as simple.
- Performance: The owner will define the ultimate performance of the space, generally given in terms of ambient temperature and relative humidity. Beaty and Davidson² discuss typical values of the space conditions and how these relate to classes of data centers. Performance also includes values for airflow distribution, total cooling, and percent outdoor air.
- Reliability: The cooling system's reliability level is defined and factored into equipment selection and layout of distribution systems. The reliability of the data center cooling system requires an economic evaluation comparing the cost of the reliability vs. the cost of the potential interruptions to center operations. The servers protect themselves in the event of cooling failure, so the reliability of the cooling system should not be justified based upon equipment protection.

## Data Center Background

Experience in data center layout and configuration is helpful to the understanding of the design issues. Consider two cases at the limits of data center arrangement and cooling configuration:

1. A single rack in a room, and
2. A fully populated room, with racks side by side in multiple rows.

Case 2 assumes a hot-aisle/cold-aisle rack configuration, where the cold aisle is the server airflow inlet side containing the perforated tiles. The hot aisle is formed by the back-to-back server outlets, which discharge the warm air into the room. The hot-aisle/cold-aisle arrangement is the most prevalent configuration because it prevents mixing of inlet cooling air and warm return air. The most common airflow configuration of individual servers is front-to-back, working directly with the hot-aisle/cold-aisle concept, but it is not the only configuration.

Consider the rack of servers in a data processing environment. Typically, these racks are 42U high, where 1U = 44.5 mm (1.75 in.). A U is a commonly used unit to define the height of electronics gear that can be rack mounted.
The subject rack could hold 42 1U servers, or 10 4U servers, or other combinations of equipment, including power supplies, network hardware, and/or storage equipment. To consider the two limits, first take the described rack and place it by itself in a reasonably sized space with some cooling in place. The other limit occurs when this rack of equipment is placed in a data center where the rack is one of many similar racks in an aisle. The data center would have multiple aisles, generally configured front-to-front and back-to-back.

## Common Misconceptions

A review of misconceptions illustrates the problems and challenges facing designers of data centers. During a recent design review of a data center cooling system, one of the engineers claimed that the servers were designed for a 20 °C (36 °F) $T_{RISE}$, the inlet-to-outlet air temperature rise. This is not the case. It is possible that there are servers that, when driven at a given airflow and dissipating their nominal amount of power, may generate a 20 °C (36 °F) $\Delta T$, but none were ever designed with that in mind.

Recall the parameters that were discussed in the section on server design. Reducing $\Psi_{CA}$ can be accomplished by increasing airflow. However, this also has a negative effect. More powerful air movers increase cost, use more space, are louder, and consume more energy. Increasing airflow beyond the minimum required is not a desirable tactic. In fact, reducing the airflow as much as possible would benefit the overall server design. However, nowhere in that optimization problem is the $\Delta T$ across the server considered.

Assuming a single fixed $T_{RISE}$ leads to another set of problems, because it implies a fixed airflow rate. As discussed earlier, most servers monitor temperature at different locations in the system and modulate airflow to keep the components within desired temperature limits. For example, a server in a well-designed data center, particularly if located low in the rack, will likely see a $T_A$ of 20 °C (68 °F) or less.
However, the thermal solution in the server is normally designed to handle a $T_A$ of 35 °C (95 °F). If the inlet temperature is at the lower value, the case temperature will be lower. Then, much less airflow is required, and if variable flow capability is built into the server, it will run quieter and consume less power. The server airflow (and hence $T_{RISE}$) will vary between the $T_A$ = 20 °C (68 °F) and 35 °C (95 °F) cases, a variation described in ASHRAE's Thermal Guidelines for Data Processing Environments. The publication provides a detailed discussion of what data should be reported by the server manufacturer and in which configuration.

Another misconception is that the air temperature at the server exhaust must be maintained below the server ambient environmental specification. The outlet temperature of the server does not need to be below the allowed value for the environment (typically 35 °C [95 °F]).

## Design Decisions

To understand the problems that can arise if the server design process is not fully understood, revisit the two cases introduced earlier. Consider the fully loaded rack in a space with no other equipment. If sufficient cooling is available in the room, the server thermal requirements likely will be satisfied. The servers will pull the required amount of air to cool them, primarily from the raised floor distribution, but if needed, from the sides and above the server as well. It is reasonable to assume the room is well mixed by the server and room distribution airflow. There likely will be some variation of inlet temperature from the bottom of the rack to the top, but if sufficient space exists around the servers, it is most likely not a concern. In this situation, not having the detailed server thermal report, as described in Reference 3, may not be problematic.

At the other limit, a rack is placed in a space that is fully populated with other server racks in a row. Another row sits across the cold aisle facing this row, and another sits back-to-back on the hot-aisle side. The space covered by the single rack unit and its associated cold-aisle and hot-aisle floor space often is called a work cell and generally covers a 1.5 m² (16 ft²) area.
The work cell comprises the 0.6 m × 0.6 m (2 ft × 2 ft) perforated tile in the front, the area covered by the rack (~0.6 m × 1.3 m [~2 ft × 4.25 ft]), and the remaining uncovered solid floor tile on the hot-aisle side. Consider the airflow in and around the work cell. Each work cell needs to be able to exist as a stand-alone thermal zone. The airflow provided to the zone comes from the perforated tile, travels through the servers, and exhausts out the top-back of the work cell, where the hot aisle returns the warm air to the inlet of the room air handlers. The work cell cannot bring air into the front of the servers from the side, as this would be removing air from another work cell and shorting that zone. No air should come in from the top either, as that will bring in air at a temperature well above the desired ambient and possibly above the specification value for $T_A$ (typically 35 °C [95 °F]). Based on this concept of the work cell, it is clear that designers must know the airflow through the servers or else they will not be able to adequately size the flow rate per floor tile. Conversely, if the airflow is not adequate, the server airflow will recirculate, causing problems for servers being fed the warmer air.

Figure 2: The work cell is shown in orange.

If the design basis of the data center includes the airflow rates of the servers, certain design decisions are needed. First, the design must provide enough total cooling capacity for the peak, matching the central plant to the load. Another question is at what temperature to deliver the supply air. Lowering this temperature can reduce the required fan size in the room cooling unit but also can be problematic, as the system, particularly in a high-density data center, must provide the minimum (or nominal) airflow to all of the work cells. A variant of this strategy is that of increasing the $\Delta T$. Doing this allows a lower airflow rate to give the same total cooling capability.
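The link between temperature rise and airflow comes from the sensible heat balance $Q = \rho\, c_p\, \dot{V}\, \Delta T$. The sketch below is illustrative only: the air properties (1.2 kg/m³ and 1005 J/(kg·K)) are assumed round values for room-temperature air, and the 10 kW rack load is a hypothetical number, not one taken from the article.

```python
AIR_DENSITY_KG_M3 = 1.2    # assumed density of air near room temperature
AIR_CP_J_PER_KG_K = 1005.0  # assumed specific heat of air
M3S_TO_CFM = 2118.88        # 1 m^3/s expressed in cubic feet per minute


def airflow_m3_per_s(heat_load_w, delta_t_k,
                     rho=AIR_DENSITY_KG_M3, cp=AIR_CP_J_PER_KG_K):
    """Volumetric airflow needed to carry heat_load_w at a given air temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_w / (rho * cp * delta_t_k)


rack_load_w = 10_000.0  # hypothetical rack heat load
flow_at_10k = airflow_m3_per_s(rack_load_w, 10.0)
flow_at_20k = airflow_m3_per_s(rack_load_w, 20.0)

for delta_t, flow in ((10.0, flow_at_10k), (20.0, flow_at_20k)):
    print(f"dT = {delta_t:.0f} K: {flow:.3f} m^3/s ({flow * M3S_TO_CFM:.0f} CFM)")
```

Doubling the temperature rise halves the airflow needed for the same load, which is the capital-cost appeal of a higher $\Delta T$. The servers, however, set their own airflow, so the room supply still has to match what they actually draw.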
A higher $\Delta T$ yields lower capital costs, but if the resulting airflow rate is too low, recirculation will occur. Also, if the supply temperature is too low, comfort and ergonomic issues could arise.

If the supplier has provided the right data, another decision must be made. Should the system provide enough for the peak airflow, or just the typical? The peak airflow rate will occur when $T_A$ = 35 °C (95 °F), and the typical when $T_A$ = 20 °C to 25 °C (68 °F to 77 °F). Sizing the air-distribution equipment at the peak flow will result in a robust design with flexibility, but at a high cost. Another complication in sizing for the peak flow, particularly in dense data centers, is that it may prove difficult to move this airflow through the raised floor tiles, causing an imbalance or increased leakage elsewhere. Care must be taken to ensure the raised floor is of sufficient height and of an appropriate design for the higher airflows.

If the nominal airflow rate is used as the design point, the design, installation, and operation (including floor tile selection for balancing the distribution) must be correct for the proper operation of the data center, but a cost savings potential exists. It is essential to perform some level of modeling to determine the right airflow. In this design, any time the servers ramp up to their peak airflow rate, the racks will recirculate warm air from the hot aisle to feed some server inlets. This occurs because the work cell has to satisfy its own airflow needs (its neighbors are also short of airflow), so if the servers need more air, they will receive it by recirculating. Another way to visualize this is to consider the walls of symmetry around each work cell and recall that there is no flux across a symmetry boundary. The servers are designed to operate successfully at 35 °C (95 °F) inlet air temperatures, so if the prevalence of this recirculation is not too great, the design should be successful.
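The recirculation argument can be illustrated with a simple flow-weighted mixing estimate. This is a rough sketch under stated assumptions, not a substitute for the modeling the article calls for: it treats the work cell as closed, assumes any airflow shortfall is made up entirely by hot-aisle air at a given exhaust temperature, and reports only an average inlet temperature (in practice some servers see much warmer air than others). The function name and the flow and temperature values are hypothetical.

```python
def average_inlet_temp_c(server_flow, tile_flow, supply_temp_c, exhaust_temp_c):
    """Flow-weighted average server inlet temperature for one work cell.

    Any shortfall between server demand and tile supply is assumed to be
    drawn from the hot aisle at exhaust_temp_c. Flows share any one unit.
    """
    if server_flow <= 0 or tile_flow < 0:
        raise ValueError("flows must be positive")
    recirculated = max(server_flow - tile_flow, 0.0)
    supplied = server_flow - recirculated
    return (supplied * supply_temp_c + recirculated * exhaust_temp_c) / server_flow


# Hypothetical case: tile delivers 80% of the peak server airflow demand.
starved = average_inlet_temp_c(server_flow=1.0, tile_flow=0.8,
                               supply_temp_c=20.0, exhaust_temp_c=40.0)

# Tile delivers at least what the servers draw, so no recirculation occurs.
satisfied = average_inlet_temp_c(server_flow=1.0, tile_flow=1.2,
                                 supply_temp_c=20.0, exhaust_temp_c=40.0)

print(f"Average inlet with 20% shortfall: {starved:.1f} degC")
print(f"Average inlet with full supply:   {satisfied:.1f} degC")
```

With these assumed numbers the average inlet stays well below the 35 °C (95 °F) server design point, consistent with the observation that modest recirculation can be tolerated.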
If the detailed equipment list is unknown when the data center is being designed, the airflow may be chosen based on historical airflows for similarly loaded racks in data centers of the same

