Energy Smart Data Center Phase-III R&D
Andrés Márquez
Pacific Northwest National Laboratory

Overall Task Division
- Advance spraycooling technology (ISR)
- Advance 3D chip-scale packaging technology (Tessera)
- Evaluate these technologies in conjunction and in comparison with other technologies (PNNL)

Objectives of ESDC Phase-III R&D: Spraycooling
- Demonstrate spot spraycooling technology at the facility level by deploying a spraycooled cluster
- Instrument at processor, server, rack, and facility level to acquire real-time data
- Measure electrical/thermal parameters of an air-cooled cluster subsystem as a baseline; determine the coefficient of performance (COP)
- Measure electrical/thermal parameters of a spraycooled cluster; determine COP
- Characterize total cost of ownership (TCO)
- Simulate and validate facility computational fluid dynamics (CFD)
- Monitor cluster availability and reliability
- Perform a trade study of spraycooled system delivery and installation
- Demonstrate +40 kW/rack high-power-density computing
- Demonstrate indirect spraycooling

Demonstrate Spot Spraycooling Technology at Facility Level
- First demonstration at this scale (~10 servers)
- Enables validation of spraycooling technology with production equipment, versus the engineering samples used in previous phases
- Engages HPC OEM in spraycooling technology, addressing issues such as integration, deployment, and installation

Instrument at Processor, Server, Rack, and Facility Level
- First multi-scale instrumentation of spraycooling technology
- Processor instrumentation enables
  - Tracking processor health
  - Root-cause analysis of failures in worst-case scenarios
- Server and rack instrumentation enables
  - Tracking system health
  - Monitoring behavior of TMU
  - Measurement of thermal interaction between servers
  - Measurement of thermal loads on the facility per rack
- Facility instrumentation enables
  - Access to data unavailable in a production facility (Power/Temp/Flows)
  - Populating models with real data (COP/TCO/CFD)
  - Exercising what-if scenarios (chiller on/off)

Measure Parameters of Air-Cooled Sub-System
- Measure Power/Temp/Flows with a conventionally air-cooled system, enabling
  - Validation of specifications as established by OEM
  - Comparison of competitive technologies under similar conditions by providing a baseline
  - Identification of universally accepted parameters such as COP
  - Validation of models such as TCO/CFD

Measure Parameters of Spraycooled Cluster
- Measure Power/Temp/Flows with a spot spraycooled system, enabling
  - Validation of specifications as established by OEM+ISR
  - Comparison of competitive technologies under similar conditions by providing a baseline
  - Identification of universally accepted parameters such as COP
  - Validation of models such as TCO/CFD
  - Validation and follow-up experimentation such as chiller bypass, load change, etc.

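The slides name COP as the common figure of merit for the air-cooled baseline and the spraycooled cluster. COP is conventionally the heat removed divided by the power spent removing it. The following is a minimal sketch of that calculation over a set of readings; the `CoolingSample` fields and all numbers are hypothetical placeholders, not values from this project.

```python
from dataclasses import dataclass


@dataclass
class CoolingSample:
    """One instrumentation reading (field names are hypothetical)."""
    heat_removed_kw: float   # thermal load carried away from the IT equipment
    cooling_power_kw: float  # electrical power drawn by fans, pumps, chiller


def cop(samples):
    """Aggregate COP over a run: total heat removed / total cooling power."""
    heat = sum(s.heat_removed_kw for s in samples)
    power = sum(s.cooling_power_kw for s in samples)
    if power <= 0:
        raise ValueError("total cooling power must be positive")
    return heat / power


# Placeholder readings for illustration only; these are not measurements.
air_cooled = [CoolingSample(10.0, 5.0), CoolingSample(12.0, 6.0)]
spraycooled = [CoolingSample(10.0, 2.5), CoolingSample(12.0, 3.0)]

if __name__ == "__main__":
    print("air-cooled COP:", cop(air_cooled))
    print("spraycooled COP:", cop(spraycooled))
```

Summing before dividing weights each reading by its load, which avoids averaging per-sample ratios taken at very different operating points.
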
Characterize Total Cost of Ownership
- Plug air-cooled and spraycooled measured values into the TCO model.

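The slides do not describe the TCO model itself. As an illustration of how a measured COP could feed such a model, here is a deliberately simple sketch that assumes TCO is capital cost plus yearly energy and maintenance cost, with cooling power derived from the IT load and the COP. Every parameter name and the cost structure are assumptions made for this example.

```python
def annual_energy_cost(it_load_kw, cop, price_per_kwh, hours_per_year=8760):
    """Yearly cost of powering the IT load plus the cooling needed to remove its heat."""
    if cop <= 0:
        raise ValueError("COP must be positive")
    cooling_kw = it_load_kw / cop  # power spent on cooling, from the COP definition
    return (it_load_kw + cooling_kw) * hours_per_year * price_per_kwh


def tco(capital_cost, it_load_kw, cop, price_per_kwh, annual_maintenance, years):
    """Assumed model: capital + years * (energy + maintenance). No discounting."""
    yearly = annual_energy_cost(it_load_kw, cop, price_per_kwh) + annual_maintenance
    return capital_cost + years * yearly
```

Running `tco` once with the air-cooled COP and once with the spraycooled COP, keeping the other inputs matched, gives a like-for-like comparison under this assumed cost structure.
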
Simulate and Validate Facility CFD
- Perform facility CFD simulation
  - Under air cooling conditions
  - Under spraycooling conditions
- Validate facility CFD with measured data
  - If necessary, improve the model or change the sampling discretization and reiterate
- Use the CFD model to improve air flow conditions and avoid over-engineering that might affect COP/TCO numbers

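The validate-and-reiterate step above can be sketched as a comparison of simulated and measured temperatures at the same sensor locations. The error metrics (RMSE and worst-case deviation), the sensor naming, and the tolerance are assumptions for illustration; the slides do not specify an acceptance criterion.

```python
import math


def validation_errors(measured, simulated):
    """Return (rmse, worst_abs_error) between two {sensor_id: temperature} maps."""
    if not measured or set(measured) != set(simulated):
        raise ValueError("both maps must cover the same non-empty set of sensors")
    diffs = [simulated[k] - measured[k] for k in measured]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    worst = max(abs(d) for d in diffs)
    return rmse, worst


def needs_refinement(measured, simulated, tolerance_c):
    """True if the model should be improved or re-discretized and rerun."""
    rmse, _ = validation_errors(measured, simulated)
    return rmse > tolerance_c
```

For example, `needs_refinement({"rack1_inlet": 20.0}, {"rack1_inlet": 23.0}, tolerance_c=2.0)` returns `True`, signalling another iteration.
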
Monitor Cluster Availability and Reliability
- Phase-III production-grade hardware is expected to be fairly stable without requiring interventions and modifications
- A sub-system of the cluster will be exempt from perturbations resulting from experiments to avoid influencing reliability studies
- Availability and reliability information will also be used to establish maintenance and servicing criteria

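Availability and reliability figures of this kind are commonly derived from an outage log. The sketch below uses the standard definitions (availability as the uptime fraction, MTBF as uptime per failure, MTTR as downtime per failure); the function names and the log format, a list of outage durations in hours, are assumptions for this example.

```python
def availability(observation_hours, outage_hours):
    """Fraction of the observation window during which the cluster was up."""
    downtime = sum(outage_hours)
    if observation_hours <= 0 or downtime > observation_hours:
        raise ValueError("invalid observation window or outage log")
    return (observation_hours - downtime) / observation_hours


def mtbf_mttr(observation_hours, outage_hours):
    """Return (MTBF, MTTR) in hours, or None if no outage was recorded."""
    failures = len(outage_hours)
    if failures == 0:
        return None
    downtime = sum(outage_hours)
    return (observation_hours - downtime) / failures, downtime / failures
```

Computing these only for the sub-system that is exempt from experiments keeps experiment-induced downtime out of the reliability numbers.
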
Perform Trade Study of System Delivery and Installation
- Mean time to repair
- Field repair strategies at different levels
- Troubleshooting and diagnostics
- Repair process and tools
- Material requirements (consumables)
- Typical cost considerations
- Reliability, availability, service requirements
- Design for serviceability
- Design for manufacturability
- Service parts strategy
- Training strategy and requirements
- Onsite installation and testing
- Facility modifications and installation requirements

High Density Computing +40 kW/rack
- Build a rack that combines resistive heat loads with some fully spraycooled boards, for a thermal load of +40 kW
- Demonstrate high performance density per volume
- Demonstrate feasibility of spraycooling thermal management

Indirect Spraycooling
- Demonstrate the capability of indirect spraycooling to cool components that are not amenable to spot cooling
- Enable combined spot and indirect cooling

Additional Objectives of ESDC Phase-III R&D
- Alternative Cooling Solutions
  - Conduct studies on alternative cooling solutions. These studies include solutions by established vendors at the data center level as well as various cooling solutions at the chip level
- DC Power Delivery
  - Investigate conversion and distribution losses on a DC branch and compare with power delivery on an AC branch
  - Consider using the High Density Computing platform as DUT
- Architecture Studies
  - Conduct a study to investigate whether these technologies are applicable to high-density hybrid computing

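For the DC power delivery comparison listed above, one common way to express conversion and distribution losses is as a chain of stage efficiencies whose product gives the end-to-end efficiency of a branch. The sketch below assumes that model; the stages and their efficiency values would come from measurement and are not given in the slides.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """End-to-end efficiency of a power delivery branch (product of its stages)."""
    for e in stage_efficiencies:
        if not 0 < e <= 1:
            raise ValueError("each stage efficiency must be in (0, 1]")
    return prod(stage_efficiencies)


def losses_kw(delivered_kw, stage_efficiencies):
    """Power lost in the branch while delivering `delivered_kw` to the load."""
    input_kw = delivered_kw / chain_efficiency(stage_efficiencies)
    return input_kw - delivered_kw
```

Calling `losses_kw` with the same delivered power for the AC and the DC stage lists gives a direct loss comparison between the two branches.
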