Thermal Management of Datacenters
Qinghui Tang
Preliminaries
- What is a data center?
- What is thermal management?
- Why does Intel care?
- Why computer science?
Typical layout of a datacenter
[Figure: datacenter layout labeled with the air conditioner supply temperature T_s, the rack inlet temperature T_in, and the rack outlet temperature T_out]
State of the Art in Datacenter Thermal Management
- Power densities are increasing exponentially, in step with Moore's Law.
- Current cooling solutions exist at several levels:
  - chip/component level
  - server/board level
  - rack level
  - data center level
- Software-based thermal management solutions: HP + Duke
Thermal Management of Datacenters: Motivation and Significance
- Compute-intensive applications (online gaming, computer movie animation, data mining) require increased utilization of data centers.
- Maximizing computing capacity is a demanding requirement.
- New blade servers can be packed more densely.
- Energy costs are rising dramatically.
Goals
- Improve thermal performance
- Lower the hardware failure rate
- Reduce energy cost
Typical layout of a datacenter: system variables
System variable | Symbol
Inlet air temperature of servers | T_i^in
Outlet air temperature of servers | T_i^out
Power consumption of servers | P_i
Heat removal capacity of HVAC | H_i
Power consumption of air conditioner | CP_i
Heat dissipation rate | V_i
Temperature threshold of servers | TH_i
Application profile on servers | C_i
Temperature threshold of environment | TH_E
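The inlet temperature, outlet temperature, and power of a server are tied together by a steady-state energy balance: the heat a server dissipates is carried off by its airflow, so P_i = rho * c_p * f_i * (T_i^out - T_i^in). The sketch below assumes that all electrical power ends up as heat in the exhaust air, uses typical sea-level air properties, and introduces a per-server airflow f_i that is not one of the variables listed on this slide.

```python
# Steady-state air-side energy balance for one server (assumption: all power becomes heat):
#   P_i = rho * c_p * f_i * (T_out - T_in)  =>  T_out = T_in + P_i / (rho * c_p * f_i)
RHO_AIR = 1.19   # air density in kg/m^3 (typical sea-level value, assumed)
CP_AIR = 1005.0  # specific heat of air in J/(kg*K) (assumed)


def outlet_temperature(t_in: float, power_w: float, flow_m3_s: float) -> float:
    """Return the outlet air temperature (degC) from inlet temperature (degC),
    dissipated power (W), and volumetric airflow f_i (m^3/s, hypothetical input)."""
    if flow_m3_s <= 0:
        raise ValueError("airflow must be positive")
    return t_in + power_w / (RHO_AIR * CP_AIR * flow_m3_s)
```

For example, `outlet_temperature(20.0, 300.0, 0.05)` gives roughly 25.0 degC: a 300 W server with 0.05 m^3/s of airflow raises its air by about 5 K.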
New Challenges
- Planning perspective: how do we design an efficient data center? For example, does upgrading 10% of the blade servers to smart ones help reduce cost?
- Operation perspective: how do we operate a data center efficiently and lower its cost? What is the trade-off between utility cost and hardware failure cost?
  - Overcooling wastes energy and increases utility cost.
  - Undercooling increases the frequency of hardware failures.
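A minimal sketch of how overcooling and undercooling could be detected from the server inlet temperatures T_i^in and thresholds TH_i. The comfort band is a hypothetical parameter, not from the slides: a server is flagged as undercooled when its inlet exceeds its threshold, and as overcooled when it sits more than the band below it.

```python
from typing import List


def classify_cooling(inlet_temps: List[float], thresholds: List[float],
                     band: float = 5.0) -> List[str]:
    """Label each server by comparing T_i^in with TH_i.

    'undercooled': T_in > TH (hardware-failure risk)
    'overcooled' : T_in < TH - band (cooling energy wasted)
    'ok'         : otherwise
    `band` is a hypothetical margin in degC.
    """
    if len(inlet_temps) != len(thresholds):
        raise ValueError("one threshold per server is required")
    labels = []
    for t_in, th in zip(inlet_temps, thresholds):
        if t_in > th:
            labels.append("undercooled")
        elif t_in < th - band:
            labels.append("overcooled")
        else:
            labels.append("ok")
    return labels
```

With a 27 degC threshold, inlets of 18, 24, and 28 degC are labeled overcooled, ok, and undercooled respectively; the operator's trade-off is where to place the band.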
Research Issues of Thermal Management of Datacenters
[Diagram layers: Understanding, Modeling, Control]
Components: scheduler; thermal performance evaluation; other impact factors; cost optimization; abstract heat flow model; power & load characterization; multiscale & multimodal information analysis
Example of multiple granularity and scale
[Figure: three panels, Task Map, Power Map, and Temperature Map]
Multiscale and multimodal nature of datacenter management
- Information perspective:
  - multiple system variables
  - different change patterns
  - different sampling rates
- Control perspective:
  - responsiveness
  - control granularity (spatial and temporal level)
  - sensitivity analysis
[Figure: spatial scale (chassis level, row level, room level) versus temporal scale (seconds, minutes, hours)]
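Because variables are sampled at different rates and observed at different spatial levels, a monitoring pipeline has to bring them to a common granularity. The sketch assumes two simple aggregation rules that are not from the slides: time-window averaging for the temporal scale, and taking the hottest chassis as the row value for the spatial scale, on the reasoning that the hottest device is usually what limits cooling.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def downsample(samples: List[Tuple[float, float]], window_s: float) -> Dict[int, float]:
    """Average (timestamp_s, value) samples into fixed time windows,
    keyed by window index (e.g. seconds -> minutes with window_s=60)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in samples:
        buckets[int(t // window_s)].append(v)
    return {k: mean(vs) for k, vs in sorted(buckets.items())}


def roll_up(chassis_temps: Dict[str, float], chassis_to_row: Dict[str, str]) -> Dict[str, float]:
    """Aggregate chassis-level temperatures to row level using the hottest chassis."""
    rows: Dict[str, List[float]] = defaultdict(list)
    for chassis, temp in chassis_temps.items():
        rows[chassis_to_row[chassis]].append(temp)
    return {row: max(temps) for row, temps in rows.items()}
```

Other choices (percentiles, exponential smoothing) fit the same interface; the point is that each control loop consumes data at the granularity it acts on.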
Approaches
- CFD simulation to characterize the thermal performance of the data center
- Online measurement and feedback control system
CFD Simulation
CFD model of a real facility, based on the ASU HPC center
Schematic View of Thermal Management: Thermal-Aware Task Scheduling
Diagram components:
- Datacenter onsite survey
- CFD simulation software
- Sensor data database: collects environmental data and load information from sensors (current and historical sensor data)
- Abstract heat model
- Correlation of load & power: maps load to power consumption
- Cost analysis
- Other impact factors
- Policy controller (control policy, scheduling policy)
- Scheduler (incoming tasks)
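The schematic maps load to power consumption and feeds a thermal-aware scheduler. The sketch below assumes a linear load-to-power model, P = P_idle + (P_peak - P_idle) * u, and a coolest-inlet placement rule; both, along with the idle and peak wattages, are illustrative assumptions rather than the policy used in this work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    inlet_temp: float      # current T_i^in from sensors (degC)
    utilization: float     # fraction of capacity in use, 0.0-1.0
    p_idle: float = 100.0  # hypothetical idle power (W)
    p_peak: float = 250.0  # hypothetical peak power (W)


def power_from_load(s: Server) -> float:
    """Linear load-to-power model (assumption): P = P_idle + (P_peak - P_idle) * u."""
    return s.p_idle + (s.p_peak - s.p_idle) * s.utilization


def pick_server(servers: List[Server], task_load: float) -> Optional[Server]:
    """Thermal-aware placement: among servers with room for the task,
    choose the one with the coolest inlet air; None if no server fits."""
    candidates = [s for s in servers if s.utilization + task_load <= 1.0]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.inlet_temp)
```

With servers a (25 degC, 50% busy), b (20 degC, 90% busy), and c (22 degC, 30% busy), a task needing 20% of a server goes to c: b is coolest but lacks capacity.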
Two-Pronged Approach
- Real-time measurement (online monitoring & control): the computer room is instrumented with sensors; sensor data drive a control decision, which is issued as an operation command to the CRAC unit, which in turn affects the room temperature.
- Online lightweight simulation & prediction: a CFD simulation of the computer room, combined with a model of the CRAC unit, predicts the temperature impact of control algorithms and operation commands.
  - Feedback from measurements tunes the simulation parameters.
  - The simulation provides design guidelines for evaluating deployment performance.
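One step of the online monitoring & control loop can be sketched as a proportional controller on the CRAC supply temperature T_s: the hottest measured inlet temperature is compared with a target, and T_s is lowered when the room runs hot and raised when it runs cool, to avoid overcooling. The proportional control law, the gain, and the T_s limits are hypothetical choices for illustration.

```python
def next_supply_temp(t_supply: float, max_inlet: float, target: float,
                     gain: float = 0.5, t_min: float = 12.0, t_max: float = 27.0) -> float:
    """Return the next CRAC supply temperature T_s (degC).

    error > 0: hottest inlet is below target, so raise T_s and save cooling energy.
    error < 0: hottest inlet is above target, so lower T_s to protect hardware.
    The result is clamped to the (hypothetical) operating range [t_min, t_max].
    """
    error = target - max_inlet
    new_ts = t_supply + gain * error
    return max(t_min, min(t_max, new_ts))
```

Called once per sampling interval, e.g. `next_supply_temp(18.0, 27.0, 25.0)` lowers T_s from 18 to 17 degC because the hottest inlet is 2 K above target. A real deployment would also need to handle the CRAC response delay, which is where the simulation prong helps.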
Goal: Datacenter Energy Cost Optimization
Diagram elements:
- Throughput or computation capacity
- Thermal distribution
- Distributed server model
- Cooling system model
- Total cost: computation energy cost, cooling energy cost, hardware cost, and operation cost
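If the four cost components are additive and energy is billed at a flat price e, the total cost over a period Δt can be written in terms of the server power P_i and the air conditioner power CP_j. The flat price, the additive form, and the index j over air conditioner units are assumptions for illustration.

```latex
C_{\text{total}}
  = \underbrace{e\,\Delta t \sum_{i} P_i}_{\text{computation energy cost}}
  + \underbrace{e\,\Delta t \sum_{j} CP_j}_{\text{cooling energy cost}}
  + C_{\text{hw}} + C_{\text{op}}
```

Here C_hw and C_op stand for the hardware and operation costs; the hardware term is where the failure rate caused by undercooling would enter.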
Different optimization goals
- Maximize computation capacity under an energy cost constraint
- Minimize an individual cost (computing cost or cooling cost)
- Achieve thermal balancing
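The three goals can be stated as optimization problems over the workload assignment. In this sketch, x_i (load placed on server i), the total workload W, the budget B, and the cost functions C_comp(x) and C_cool(x) are assumed notation; the server thresholds TH_i appear as constraints.

```latex
\text{(1)}\quad \max_{x}\ \sum_i x_i
  \quad \text{s.t.}\quad C_{\text{comp}}(x) + C_{\text{cool}}(x) \le B,
  \quad T_i^{\text{in}}(x) \le TH_i \ \ \forall i

\text{(2)}\quad \min_{x}\ C_{\text{comp}}(x) \ \ \text{or} \ \ \min_{x}\ C_{\text{cool}}(x)
  \quad \text{s.t.}\quad \sum_i x_i = W,
  \quad T_i^{\text{in}}(x) \le TH_i \ \ \forall i

\text{(3)}\quad \min_{x}\ \Big( \max_i T_i^{\text{in}}(x) - \min_i T_i^{\text{in}}(x) \Big)
  \quad \text{s.t.}\quad \sum_i x_i = W
```

Goal (3) measures balance as the spread of inlet temperatures; a variance objective would be an equally reasonable choice.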
Questions and answers