# Reducing the Annual Cost of a Telecommunications Data Center

Applied Math Modeling White Paper

By Paul Bemis and Liz Marshall, Applied Math Modeling Inc., Concord, NH, March 2011

© 2011 Applied Math Modeling Inc.

## Introduction

The facilities managers for a large internet service provider have known for a while that one of their data centers is over-cooled. Over-cooling translates into unnecessary energy consumption and expense, so the managers knew that some changes to the data center were needed. Several options were possible, such as shutting down one or more of the cooling units. Many questions arose, however. For example, what would be the consequences of shutting down a CRAC? Would it be possible to shut down two? If so, which two? Could the supply temperatures be increased? To answer these questions, the operators decided to use computational fluid dynamics (CFD). CFD is a tool that uses airflow predictions to show how effectively the cooling air reaches the equipment in the room and removes its heat. With CFD-based modeling techniques to quantify the efficiency of the data center, different energy-saving strategies can be compared before any physical changes are made to the room.

## Problem Description

The CFD modeling is done using CoolSim software from Applied Math Modeling. The raised-floor data center is 4720 sq. ft. in size (80 ft x 59 ft) and uses a ceiling plenum return. The 2 ft supply plenum, 15 ft

Figure 1: Isometric view of the room geometry, showing the rack rows (pink and gray), CRACs (blue tops), perforated floor tiles (white) and overhead ceiling grilles (green)

…include the heat load and flow rate associated with each rack and the supply temperature and flow rate associated with each CRAC. The measured supply and return temperatures are shown in Table 1, along with the predicted return temperatures from the CFD model. In all but one case, the predicted temperatures are below the measured values. When a CFD model consistently under-predicts the CRAC return temperatures, it often means that either the heat loads are under-represented or the CRAC flow rates are too high. In this data center, the cause could be one of these factors or a combination of both. The effect is small, however, since the error is below 5% in all cases. A validation of the preliminary model, such as this one, is an important step if modifications are to be made. Once the base model is shown to capture the physics within an acceptable margin of error, it can be used to correctly predict trends when one or more changes are made.

Figure 3: Five cooling zones, each of which consists of a pair of opposing CRACs; under normal operating conditions, Zones 2 and 4 are shut down

Table 1: Measurements of supply and return temperature and predicted return temperature for the baseline case (columns: CRAC #, measured supply temperature (°F), measured return temperature (°F), predicted return temperature (°F), error (%)); the error in the predicted return temperature is under 5% for all CRACs

Contours of rack inlet temperature for the baseline case are shown in Figure 4. The temperatures all fall below the ASHRAE recommended maximum value of 80.6°F. The maximum rack inlet temperature is a good metric to follow when comparing cooling strategies. For an over-cooled data center, however, the minimum rack inlet temperature is also important to follow. According to the ASHRAE guidelines, the rack inlet temperature
should not go below the recommended minimum of 64.4°F, although the allowable minimum is 59°F. For the baseline case, at least half of the racks have inlet temperatures that are too cold.

Figure 4: Rack inlet temperatures for the baseline case, in which the CRACs in Zones 2 and 4 are inactive

## Data Center Metrics

### PUE and DCiE

A number of metrics have been defined in recent years to gauge the efficiency of a data center. Metrics can also be used to test whether changes to the data center reduce (or increase) its power demand. One of the most popular metrics is the Power Usage Effectiveness, or PUE, defined as the ratio of total facility power to total IT power:

$$\text{PUE} = \frac{\text{Total Facility Power}}{\text{Total IT Power}} \tag{1}$$

The total facility power includes the power needed to run the CRACs (chillers and fans), the IT equipment, battery backup systems, lighting, and any other heat-producing devices. PUE is therefore always greater than 1, and values close to 1 are better than those that are not. A typical value is 1.8, a good value is 1.4, and an excellent value is 1.2.

### COP

Apart from the IT equipment itself, the largest contributor to the total facility power is the cooling system. This system is composed of the heat exchangers (chillers, condensers and cooling fluid pumps, for example) and fans. The heat exchanger portion of the CRAC is a heat pump, whose job is to move heat from one location (inside the room) to another (outside). Heat pumps are rated by their coefficient of performance, or COP. The COP is the ratio of the heat moved by the pump to the work the pump does to perform this task. The work done by the pump encompasses the heat exchanger work and does not include the CRAC fans. The COP can also be expressed as a power ratio, using the rate at which heat is moved (in watts, say) and the rate at which work is done (again, in watts):

$$\text{COP} = \frac{\text{Heat Moved}}{\text{Work Done}} \tag{2}$$

In more practical terms, the COP is the ratio of the total room heat load to the power needed to run the chillers, condensers and other heat rejection equipment.
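Equations (1) and (2) are plain ratios, so they are easy to script. The Python sketch below is illustrative only. The function names and example values are hypothetical, taken neither from CoolSim nor from this paper's tables, and all powers are assumed to be in kilowatts. The sketch also computes DCiE, named in the PUE and DCiE heading, using The Green Grid's definition: the reciprocal of PUE, expressed as a percentage.

```python
def pue(total_facility_kw: float, total_it_kw: float) -> float:
    """Power Usage Effectiveness, Eq. (1): total facility power / total IT power."""
    if total_it_kw <= 0:
        raise ValueError("total IT power must be positive")
    return total_facility_kw / total_it_kw


def dcie(total_facility_kw: float, total_it_kw: float) -> float:
    """DCiE in percent: IT power as a share of facility power (i.e. 100 / PUE)."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return 100.0 * total_it_kw / total_facility_kw


def heat_rejection_power(room_heat_load_kw: float, cop: float) -> float:
    """Power for chillers/condensers implied by Eq. (2): work = heat moved / COP.

    CRAC fan power is excluded, matching the traditional COP definition.
    """
    if cop <= 0:
        raise ValueError("COP must be positive")
    return room_heat_load_kw / cop


if __name__ == "__main__":
    # Hypothetical loads, chosen only to illustrate the formulas.
    it_kw, facility_kw = 400.0, 720.0
    print(f"PUE  = {pue(facility_kw, it_kw):.2f}")
    print(f"DCiE = {dcie(facility_kw, it_kw):.1f}%")
    print(f"Heat rejection power at COP 2.3: {heat_rejection_power(460.0, 2.3):.1f} kW")
```

With the hypothetical 720 kW facility load and 400 kW IT load, the PUE comes out at 1.8, the "typical" value quoted above.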
For data center cooling equipment, COP values range from 2 to 5, with larger numbers corresponding to better heat pumps. Note that an alternative definition of COP could be made for the data center as a whole, rather than just for the heat rejection system. In this alternative
definition, the work done would include the power used to run the CRAC fans. For the purposes of this paper, the traditional definition of COP is used.

### Return Temperature Index™

The Return Temperature Index, a trademark of ANCIS Inc. (www.ancis.us), is a percentage based on the ratio of the total demand air flow rate to the total supply air flow rate:

$$\text{RTI} = \frac{\text{Total Demand Air Flow Rate}}{\text{Total Supply Air Flow Rate}} \times 100\% \tag{3}$$

Alternatively, it can be computed as the ratio of the average temperature drop across the CRACs to the average temperature rise across the racks. In either case, a value of 100% indicates a perfectly balanced airflow configuration, in which the supply equals the demand. An RTI below 100% indicates excess cooling airflow, so some supply air short-circuits back to the CRACs without passing through the racks. An RTI above 100% indicates a deficit of cooling air, so hot air recirculates from the rack exhausts to the rack inlets. It is best to have RTI values that are less than, but close to, 100%.

### Rack Cooling Index

The Rack Cooling Index, a registered trademark of ANCIS Inc., is computed from the average number of degrees by which the rack inlet temperatures fall above (or below) the ASHRAE recommended temperature range (64.4°F to 80.6°F). One index is defined for temperatures above the range ($\mathrm{RCI}_{HI}$) and another for temperatures below the range ($\mathrm{RCI}_{LO}$). For the high side:

$$\mathrm{RCI}_{HI} = \left[1 - \frac{\sum_{i=1}^{n}\left(T_i - T_{R,HI}\right)}{N\left(T_{A,HI} - T_{R,HI}\right)}\right] \times 100\% \tag{4}$$

where

- $T_{R,HI}$ is the ASHRAE recommended maximum temperature (80.6°F)
- $T_{A,HI}$ is the ASHRAE allowable maximum temperature (90°F)
- $T_i$ is the maximum inlet temperature on the $i$th rack
- $n$ is the number of racks with $T_i > T_{R,HI}$
- $N$ is the total number of racks in the sample

The index on the low side is defined similarly:

$$\mathrm{RCI}_{LO} = \left[1 - \frac{\sum_{i=1}^{n}\left(T_{R,LO} - T_i\right)}{N\left(T_{R,LO} - T_{A,LO}\right)}\right] \times 100\% \tag{5}$$

where

- $T_{R,LO}$ is the ASHRAE recommended minimum temperature (64.4°F)
- $T_{A,LO}$ is the ASHRAE allowable minimum temperature (59°F)
- $T_i$ is the minimum inlet temperature on the $i$th rack
- $n$ is the number of racks with $T_i < T_{R,LO}$
- $N$ is the total number of racks in the sample

Ideally, no racks should be outside the recommended range, so the ideal value is 100% for both indices. Values between 90% and 100% are in the acceptable-to-good range, while values under 90% are considered poor.
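Equations (3) to (5) translate directly into code. The sketch below is minimal and rests on one assumption: per-rack maximum and minimum inlet temperatures (°F) have already been extracted from a CFD solution. The function names are hypothetical. The RCI sums skip racks inside the recommended range, which is equivalent to summing over the $n$ racks outside it.

```python
from collections.abc import Sequence

# ASHRAE limits quoted in the text, in °F.
T_R_HI, T_A_HI = 80.6, 90.0  # recommended / allowable maximum
T_R_LO, T_A_LO = 64.4, 59.0  # recommended / allowable minimum


def rti(demand_cfm: float, supply_cfm: float) -> float:
    """Return Temperature Index, Eq. (3), in percent."""
    return 100.0 * demand_cfm / supply_cfm


def rci_hi(max_inlet_f: Sequence[float]) -> float:
    """RCI_HI, Eq. (4): only racks above T_R_HI contribute to the sum."""
    if not max_inlet_f:
        raise ValueError("need at least one rack")
    over = sum(t - T_R_HI for t in max_inlet_f if t > T_R_HI)
    return 100.0 * (1.0 - over / (len(max_inlet_f) * (T_A_HI - T_R_HI)))


def rci_lo(min_inlet_f: Sequence[float]) -> float:
    """RCI_LO, Eq. (5): only racks below T_R_LO contribute to the sum."""
    if not min_inlet_f:
        raise ValueError("need at least one rack")
    under = sum(T_R_LO - t for t in min_inlet_f if t < T_R_LO)
    return 100.0 * (1.0 - under / (len(min_inlet_f) * (T_R_LO - T_A_LO)))


if __name__ == "__main__":
    # Demand and supply airflows (CFM) as listed in Table 2.
    print(f"RTI, Trial 0: {rti(59_871, 87_000):.1f}%")
    print(f"RTI, Trial 1: {rti(59_871, 58_000):.1f}%")
    # Hypothetical rack temperatures, for illustration only.
    print(f"RCI_HI: {rci_hi([70.0, 75.0, 82.0, 85.0]):.1f}%")
    print(f"RCI_LO: {rci_lo([50.0, 52.0, 55.0, 58.0]):.1f}%")  # negative: racks far too cold
```

With the Table 2 airflows, `rti` gives about 68.8% and 103.2%, consistent with the 69% and 103% reported for Trials 0 and 1. The last line shows how $\mathrm{RCI}_{LO}$ can drop below zero when racks sit well under the recommended minimum.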

…and Modifying the Design

$$\mathrm{RCI}_{LO} < 0\% \tag{10}$$

A value of 100% for $\mathrm{RCI}_{HI}$ means that no racks have inlet temperatures above the recommended maximum value. A value less than 0 for $\mathrm{RCI}_{LO}$ indicates that the average number of degrees below the recommended minimum is greater than the 5.4°F gap between the recommended (64.4°F) and allowable (59°F) minimum values. In other words, the inlet temperatures on the whole are much too cold. The metrics calculated for the baseline case are summarized in Table 2.

## Estimating the Baseline Data Center Costs

Before considering changes to the data center, the cost of running the facility in its present state is estimated. This requires the total facility power and the cost of electricity. Using 746.3 kW as the total facility power and \$0.09 as the cost per kWh, the estimated annual cost of running the data center is about \$588,300, which is within 10% of the actual cost. While this value is not based on the CFD analysis, a similar calculation can be done for each proposed modification to the data center. The CFD analysis judges the efficacy of each design, and the companion energy calculation estimates its cost savings.

## Disabling Zones

As a first step, each of the three active zones is disabled in turn in a series of trials. These trials are solved concurrently on separate nodes at CoolSim's remote simulation facility (RSF) using the CRAC Failure Analysis model. Trial 1 has Zones 1, 2, and 4 disabled, Trial 2 has Zones 2, 3, and 4 disabled, and Trial 3 has Zones 2, 4, and 5 disabled. For each of these trials, the maximum rack inlet temperature is at most 75°F, well below the ASHRAE recommended value of 80.6°F. Trial 1 has the highest rack inlet temperature, and contours for all of the racks in this case are shown in Figure 5. Note that when the left two zones are shut down, the temperature on that side of the room increases.
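The three single-zone trials follow a simple pattern: each active zone is switched off in turn, on top of Zones 2 and 4, which are already off. The hypothetical helper below is not part of CoolSim's CRAC Failure Analysis model. It only lists the configurations to be solved. Passing `k=2` lists the two-zone combinations raised in the introduction.

```python
from itertools import combinations

ALL_ZONES = {1, 2, 3, 4, 5}
NORMALLY_OFF = {2, 4}  # zones already shut down in normal operation


def failure_trials(all_zones: set[int], normally_off: set[int], k: int = 1) -> list[list[int]]:
    """List the disabled-zone sets obtained by switching off k more active zones."""
    active = sorted(all_zones - normally_off)
    return [sorted(normally_off | set(extra)) for extra in combinations(active, k)]


if __name__ == "__main__":
    for number, disabled in enumerate(failure_trials(ALL_ZONES, NORMALLY_OFF), start=1):
        print(f"Trial {number}: Zones {disabled} disabled")
```

Run as a script, it prints the three trials in the same order as the text: Trial 1 with Zones [1, 2, 4] disabled, Trial 2 with [2, 3, 4], and Trial 3 with [2, 4, 5].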
Pathlines of the supply air in the plenum (Figure 6) show that jets from the opposing CRACs collide and deflect the cooling air to the left side of the room, keeping the rack temperatures in range. These trials show that the simplest modification to the data center, shutting down one of the zones, will not adversely affect the equipment.

Figure 5: Contours of rack inlet temperature for Trial 1 of the baseline case, where Zones 1, 2, and 4 are shut down

Figure 6: Pathlines of supply air in the plenum for Trial 1 of the baseline case, where Zones 1, 2, and 4 are shut down

The data center metrics computed for Trial 1 show a great deal of improvement in energy efficiency and associated cost savings. The power needed to run the cooling system and CRAC fans is two-thirds of the earlier value. As a result, the total cooling power is reduced and the COP increases to 2.3. The total facility power is reduced as well, which lowers the PUE. The return temperature index increases from 69% to 103%. Ideally, the RTI should be below 100%. Because an additional 5% of infrastructure equipment is included in the total heat load, however, the demand air flow rate is assumed to increase correspondingly, which may overstate it. (Additional heat from overhead lamps may be lost through the ceiling, for example.) The $\mathrm{RCI}_{HI}$ index remains at 100%, indicating that there are still no racks with temperatures above the recommended value. The $\mathrm{RCI}_{LO}$ index remains below 0, but only slightly. Thus, while the rack inlet temperatures are not as cold as before, they are still colder than they need to be.

Because the total facility power drops, the cost to run the data center also drops. The new annual cost is estimated to be \$492,400, a savings of about \$95,900. These results are summarized in Table 2.

## Increasing the Supply Temperatures

One of the dominant factors in reducing data center energy consumption is the air supply temperature. For every 1.8°F increase in supply air temperature, the efficiency of the heat pump improves by 3.5% (*Design Considerations for Datacom Equipment Centers*, Atlanta: ASHRAE, 2005). Raising the supply air temperature also widens the window for free cooling, since air-side or water-side economizers can then be used on more days of the year.
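The ASHRAE rule of thumb above can be applied as a simple scaling of the COP. This sketch assumes the 3.5% gain per 1.8°F adds linearly rather than compounding. That reading is our assumption, but it is consistent with the paper's own figures. An 8°F rise gives about 15.6%, in line with the 15% quoted for Trial 0. Raising Trial 1 from a 60°F average to 68°F takes a COP of 2.3 to about 2.66, the value reported for that case. The function names are hypothetical.

```python
def adjusted_cop(base_cop: float, delta_supply_f: float,
                 gain_per_step: float = 0.035, step_f: float = 1.8) -> float:
    """COP after raising the supply air temperature by delta_supply_f (°F).

    Assumes the 3.5%-per-1.8°F improvement accumulates linearly.
    """
    return base_cop * (1.0 + gain_per_step * delta_supply_f / step_f)


def heat_rejection_power_ratio(base_cop: float, new_cop: float) -> float:
    """For a fixed room heat load, heat rejection power scales as 1 / COP (Eq. 2)."""
    return base_cop / new_cop


if __name__ == "__main__":
    # Trial 1: average supply temperature raised from 60°F to 68°F.
    cop_68 = adjusted_cop(2.3, 68.0 - 60.0)
    print(f"COP at 68°F: {cop_68:.2f}")
    print(f"Heat rejection power: {heat_rejection_power_ratio(2.3, cop_68):.1%} of baseline")
```

The second function shows why a higher COP saves money: with the room heat load fixed, the chiller and condenser power falls in proportion to the COP gain.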
Economizers improve the efficiency of the cooling system by using the reservoir of outside air in the heat rejection process. As the temperature difference between the supply air and the outside air shrinks, economizers can augment, or even replace, the chillers and condensers in the heat rejection system, resulting in large gains in the COP. Because the data center is initially over-cooled, it is a prime candidate for an increased supply temperature. Thus, as a second modification, all of the supply temperatures are increased to 65°F. Recall that in the original configuration, measured temperatures were used for the CRAC boundary conditions and all but two were below 60°F. Increasing all of the supply temperatures to 65°F should alleviate the problems suggested by the $\mathrm{RCI}_{LO}$ index and improve the COP, which will save a significant amount of power. To properly assess such a proposed change, a CFD analysis is needed to determine whether hot spots will form and affect performance at the upper end of the recommended range.

| Metric | Trial 0 (baseline) | Trial 1 (baseline) |
|---|---|---|
| IT Heat Load (kW) | | |
| Total IT Heat Load (kW) | | |
| CRAC Cooling Power (kW) | | |
| CRAC Fan Power (kW) | | |
| Total Room Heat Load (kW) | | |
| Total Cooling Power (kW) | | |
| Total Facility Power (kW) | 746.3 | |
| COP | | 2.3 |
| PUE | | |
| Total Supply Air Flow (CFM) | 87,000 | 58,000 |
| Total Demand Air Flow (CFM) | 59,871 | 59,871 |
| RTI (%) | 69 | 103 |
| $\mathrm{RCI}_{HI}$ (%) | 100 | 100 |
| $\mathrm{RCI}_{LO}$ (%) | <0 | <0 |
| Cost of Electricity (\$/kWh) | 0.09 | 0.09 |
| Annual Cost (\$) | 588,300 | 492,400 |
| Savings (\$) | | 95,900 |

Table 2: Data center metrics comparing Trials 0 and 1 for the baseline case, in which Zones 2 and 4 and Zones 1, 2, and 4 are shut down, respectively

Contours of the rack inlet temperatures for Trial 0 of this scenario, with Zones 2 and 4 disabled, are shown in Figure 7. The minimum and maximum values for the contours are shown in the key on the left. The range (65°F to 78°F) falls within the ASHRAE recommended range (64.4°F to 80.6°F), so all racks satisfy the condition and the $\mathrm{RCI}_{HI}$ and $\mathrm{RCI}_{LO}$ values are both 100%. The average supply temperature for the baseline case with only two zones disabled is 57°F. Increasing the average supply temperature to 65°F (an 8°F increase) corresponds to a 15% increase in the COP.

Figure 7: Rack inlet temperatures corresponding to 65°F CRAC supply temperatures for Trial 0, where Zones 2 and 4 are disabled

The previous analysis showed, however, that disabling an additional zone results in potential savings of about \$95,000 a year. A CRAC failure analysis should therefore be done with the 65°F supply temperature boundary condition. This ensures that the rack inlet temperatures aren't too high if one more zone is disabled. Figure 8 shows the rack inlet temperatures for the trial with the highest maximum rack inlet temperature. Again, this is Trial 1, in which Zones 1, 2, and 4 are disabled. Based on the maximum value shown in the figure, some of the racks have temperatures above the ASHRAE recommended maximum of 80.6°F. A calculation of $\mathrm{RCI}_{HI}$ supports this finding, with a value of 97.3%. RCI values between 95% and 100% are considered good for a data center. The value suggests that the average deviation above the recommended maximum is small, and the detailed results confirm this.
All racks have inlet temperatures that are well below the ASHRAE allowable maximum value (90°F), and, as expected, $\mathrm{RCI}_{LO}$ has a value of 100%. The average supply temperature for Trial 1 in the baseline case is 60°F, so the increase in supply temperature for this case is only 5°F. This yields a smaller COP gain than the 8°F increase in Trial 0.

Increasing the supply temperatures to 68°F results in $\mathrm{RCI}_{HI}$ and $\mathrm{RCI}_{LO}$ indices of 100% for Trial 0, with a further increase in the COP. For Trial 1, $\mathrm{RCI}_{LO}$ remains at 100%, but $\mathrm{RCI}_{HI}$ drops to 84%. Even so, none of the rack inlet temperatures go above the ASHRAE allowable value. The COP increases to 2.66 for this scenario.

Figure 8: Rack inlet temperatures corresponding to 65°F CRAC supply temperatures for Trial 1, where Zones 1, 2, and 4 are disabled

The total facility power can be computed for each of these cases, and from it, the annual cost of running the data center. Table 3 summarizes the COP values and associated costs for the various trials discussed in this section.

| Trial 0 | Baseline | Supply 65°F | Supply 68°F |
|---|---|---|---|
| Average $T_{SUPPLY}$ (°F) | 57 | 65 | 68 |
| COP | | | |
| Total Facility Power (kW) | 746.3 | | |
| Annual Cost (\$) | 588,300 | | |
| Savings (\$) | | 28,500 | 37,300 |

| Trial 1 | Baseline | Supply 65°F | Supply 68°F |
|---|---|---|---|
| Average $T_{SUPPLY}$ (°F) | 60 | 65 | 68 |
| COP | 2.3 | | 2.66 |
| Total Facility Power (kW) | | | |
| Annual Cost (\$) | 492,400 | | |
| Savings (\$) | | 12,500 | 19,000 |

Table 3: A comparison of COP and predicted annual costs resulting from increased CRAC supply temperatures; savings of at least \$28,000 can be achieved if 3 of the 5 zones are operational (Trial 0, top) and at least \$12,000 if one additional zone is disabled (Trial 1, bottom)

Comparison of the Trial 0 results shows that between \$28,500 and \$37,300 can be saved by increasing the supply temperatures. Comparison of the Trial 1 results shows that an additional \$12,500 to \$19,000 can be saved by raising the supply temperatures once a further zone has been disabled. Combining the savings in Tables 2 and 3, disabling one of the zones and increasing the supply temperature to 65°F could cut the annual cost of the data center by about \$108,400. This total is made up of \$95,900 from disabling the zone plus \$12,500 from the higher supply temperature.

## Summary

Computational fluid dynamics and data center metrics have been used to study a data center for which a number of measurements were available. The ten CRACs in the room are controlled using five zones, with two CRACs in each zone. Because the heat load is less than the originally planned value, the data center currently operates with only three of the five zones active. Even so, the normal operating configuration produces temperatures that are colder than needed. CFD was used to test alternative scenarios with additional zones disabled and with increased supply temperatures. For each design modification, energy calculations were performed to estimate the total facility power usage and corresponding cost. The results show that one additional zone can be disabled and the supply temperatures can be raised slightly. With these changes, the rack inlet temperatures will remain well within the ASHRAE allowable temperature range, and the annual cost of running the facility will be reduced by about \$100,000.
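The cost figures in Tables 2 and 3 rest on one line of arithmetic: total facility power times hours of operation times the electricity rate. This sketch assumes continuous operation (8,760 hours per year), which the paper does not state explicitly. Under that assumption, the baseline's 746.3 kW at \$0.09/kWh gives about \$588,400, within \$100 of the paper's \$588,300 estimate. The savings constants are copied from Tables 2 and 3, and the names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the facility runs continuously all year


def annual_cost(total_facility_kw: float, usd_per_kwh: float = 0.09) -> float:
    """Annual electricity cost in dollars for a constant total facility load."""
    return total_facility_kw * HOURS_PER_YEAR * usd_per_kwh


# Baseline (Trial 0) total facility power quoted in the text.
baseline_cost = annual_cost(746.3)

# Savings from Table 2 (disable one more zone) and Table 3 (Trial 1 at 65°F supply).
ZONE_SAVINGS = 95_900
SUPPLY_65F_SAVINGS = 12_500
combined_savings = ZONE_SAVINGS + SUPPLY_65F_SAVINGS

if __name__ == "__main__":
    print(f"Baseline annual cost:  ${baseline_cost:,.0f}")
    print(f"Combined savings:      ${combined_savings:,}")
    print(f"Projected annual cost: ${baseline_cost - combined_savings:,.0f}")
```

The same `annual_cost` call, fed with the total facility power of any proposed configuration, gives the companion energy estimate that accompanies each CFD trial.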

