Emerging Topics Day (Journée Thématique Emergente), EDF Clamart, 13 January 2011: The Energy Aspects of Computing
Introduction to IBM tools to manage energy consumption
François Thomas, Luigi Brochard
[ft,luigi.brochard]@fr.ibm.com
Agenda
- Why IBM?
- The Power Cycle and Equation
- POWER7 EnergyScale
- Software to Manage Power
- Summary
Why IBM?
- Early in the game
  - Blue Gene/L design started in 1999-2000
  - First appearance in the Top500 list in 2004, before the first Green500 list was out (2005)
- Consistently at the top of the Green500 list
  - Blue Gene/P: 2007
  - Cell Broadband Engine (Roadrunner) + BG/P: 2008-2009
  - Blue Gene/Q: 2010 onwards
  - SuperMUC: 2011?
- Our expertise in energy efficiency is now a strong selling point
  - Purpose-built systems (Blue Gene)
  - Accelerated computing (Roadrunner, GPU)
  - COTS hardware (Intel based)
Sources: green500.org and hpcwire.com
LRZ + IBM Germany Smart System Cooling: innovative hot-water usage
- First high-end HPC system with hot-water cooling
  - Compute nodes are cooled with hot water
  - Inlet temperature up to 45 °C
  - Enables all-year free cooling in Garching
  - Based on the Aquasar prototype
- No chillers (compressors) required
- Enables reuse of the system's waste heat for heating or process energy
- Developed in Germany, at the IBM Böblingen Lab
Smarter Systems for a Smarter Planet. © 2010 IBM Germany GmbH
LRZ + IBM Germany Smart Job Scheduling: energy-aware application scheduling and system management
- First implementation of an energy-aware HPC software stack on x86
- Application energy consumption will be monitored, stored and reported to the user
- For a second run of the application, the scheduler will decide, based on administrative policies, which processor frequency is optimal for the application
  - A lower frequency reduces energy consumption
- System nodes that are currently unused will be put into sleep mode or shut down, based on the administrator's capacity expectations
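A minimal sketch of the frequency decision described above, assuming the application's runtime and energy are already known (measured, or predicted from the first run) at each candidate frequency, and that the administrative policy is expressed as a maximum tolerated slowdown. `FrequencyProfile`, `pick_frequency` and all numbers are hypothetical illustrations, not part of the LRZ/IBM software stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyProfile:
    """Hypothetical record: runtime and energy of one application at one CPU frequency."""
    freq_ghz: float
    runtime_s: float
    energy_j: float


def pick_frequency(profiles: list[FrequencyProfile], max_slowdown: float) -> float:
    """Return the lowest-energy frequency whose runtime stays within
    (1 + max_slowdown) times the fastest runtime (the admin's policy bound)."""
    fastest = min(p.runtime_s for p in profiles)
    allowed = [p for p in profiles if p.runtime_s <= fastest * (1 + max_slowdown)]
    return min(allowed, key=lambda p: p.energy_j).freq_ghz


if __name__ == "__main__":
    # Illustrative numbers only: lower frequency -> longer runtime, less energy.
    profiles = [
        FrequencyProfile(2.7, 1000.0, 300_000.0),
        FrequencyProfile(2.3, 1080.0, 270_000.0),
        FrequencyProfile(1.9, 1250.0, 260_000.0),
    ]
    print(pick_frequency(profiles, max_slowdown=0.10))  # 2.3
    print(pick_frequency(profiles, max_slowdown=0.30))  # 1.9
```

With a 10% slowdown budget the 1.9 GHz setting is rejected (25% slower), so the policy settles on 2.3 GHz; loosening the budget to 30% lets the scheduler pick the lowest-energy setting.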
The power cycle: power, compute and cool
[Figure: data-center power and cooling chain. Power path: utility provider (2 sources), N+1 generators with fuel oil (48 hours typical), UPS batteries (10-15 min), static switches A and B, PDUs A and B, servers on a raised floor. Cooling path: cooling towers, N+1 chillers, CRAC units, makeup water storage; water at 45 °F, 55 °F, 85 °F and 95 °F, 55 °F supply air, 75 °F data-center air.]
Green Datacenter Market Drivers and Trends
- Increased green consciousness and rising cost of power
- IT demand outpaces technology improvements
  - Server energy use doubled from 2000 to 2005 and is expected to increase 15% per year
  - 15% power growth per year is not sustainable
- Koomey study: servers use 1.2% of U.S. electricity
- ICT industries consume 2% of worldwide energy, with carbon dioxide emissions comparable to global aviation
- Real actions needed (Brouillard, APC, 2006)
- Future datacenters will be dominated by energy cost; half of the energy is spent on cooling
Source: IDC 2006, Document #201722, "The Impact of Power and Cooling on Datacenter Infrastructure", John Humphreys, Jed Scaramella
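The two growth figures above are consistent with each other: a quantity growing at a constant annual rate r doubles in ln 2 / ln(1 + r) years, and doubling over 5 years implies r = 2^(1/5) - 1, about 14.9% per year. A small sketch of that arithmetic (the function names are illustrative):

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity growing at a constant annual rate to double."""
    return math.log(2) / math.log(1 + annual_growth)


def implied_annual_growth(factor: float, years: float) -> float:
    """Constant annual rate that multiplies a quantity by `factor` over `years`."""
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    # Doubling between 2000 and 2005 -> roughly 15% per year.
    print(f"{implied_annual_growth(2, 5):.1%}")
    # 15% per year -> energy use doubles roughly every 5 years.
    print(f"{doubling_time_years(0.15):.2f} years")
```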
How much does it cost? Acquisition costs vs. energy costs over 4 years
[Chart: "Ratio of power" (IT power vs. cooling power) and "Ratio of costs" (acquisition costs vs. energy costs)]
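To make the acquisition-vs-energy comparison concrete, the energy side over the same 4 years can be estimated from IT power, a facility overhead factor and the electricity price. This sketch assumes PUE (Power Usage Effectiveness: total facility power divided by IT power), a metric not used on the slide; the 100 kW load, PUE of 2.0 and price of 0.10 per kWh are placeholder values, not IBM data. A PUE of 2.0 matches the "half energy spent on cooling" case from the market-trends slide if all overhead is attributed to cooling.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def energy_cost(it_power_kw: float, pue: float, price_per_kwh: float, years: float = 4) -> float:
    """Electricity cost of running an IT load continuously for `years`.

    PUE (an assumption of this sketch) scales IT power up to total facility
    power, which includes cooling and power-distribution losses.
    """
    facility_kw = it_power_kw * pue
    return facility_kw * HOURS_PER_YEAR * years * price_per_kwh


def overhead_share(pue: float) -> float:
    """Fraction of facility energy that does not reach the IT equipment."""
    return 1 - 1 / pue


if __name__ == "__main__":
    # Placeholder inputs: 100 kW of IT load, PUE 2.0, 0.10 currency units per kWh.
    cost = energy_cost(100, 2.0, 0.10)
    print(f"4-year energy cost: {cost:,.0f}")  # 700,800
    print(f"non-IT share: {overhead_share(2.0):.0%}")  # 50%
```

Comparing this figure with the purchase price of the same 100 kW of equipment gives the kind of ratio the chart shows.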
Our approach works at multiple levels
- Micro-electronics
  - Energy is pervasive in IBM design (especially on our journey to Exascale)
  - Long history of energy-efficient designs: SOI, SMT, eDRAM, ...
- Server and rack level
  - Energy management features on all recent IBM servers
  - Water cooling: rear-door heat exchangers (iDataPlex), "cold plates" (POWER6, POWER7, BG/Q)
  - Hot-water cooling (LRZ)
- Software level
  - Application tuning
  - Unified software for power management
  - Cluster management
  - Power- and energy-aware job schedulers
- Data center level
  - Centres of expertise in datacenter design
  - Example: the Green Data Center at IBM Montpellier, France
  - Another example: hot-water cooling at IBM Böblingen, Germany
  - Best practices, monitoring
The Power Problem
[Figure: module heat flux (W/cm²), 1950 to 2030, for bipolar and CMOS technologies, with milestones from the junction transistor and the integrated circuit to low-power multicore and 3DI]
[Figure: number of transistors per chip, 1980 to 2010, growing at ~50% CAGR from 1 million to 1 billion]
- Power = Capacitance × Voltage² × Frequency, i.e. $P = C V^2 f$. Because voltage has to scale roughly with frequency, $P \propto f^3$: two cores at 80% frequency consume about as much as one core at 100% frequency ($2 \times 0.8^3 \approx 1.02$).
- We have a frequency problem: power per chip is held constant by cooling limits, hence multicore chips at constant frequency.
- And we have a passive power problem: smaller lithography means more leakage current, hence more idle power.
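The multicore argument above reduces to a few lines of arithmetic. This sketch assumes the slide's cubic model ($P \propto f^3$) for dynamic power and, for throughput, perfectly parallel work whose speed scales linearly with frequency; leakage (passive) power is not modeled.

```python
def relative_power(n_cores: int, freq_fraction: float) -> float:
    """Dynamic power relative to one core at nominal frequency, assuming P ~ f^3."""
    return n_cores * freq_fraction ** 3


def relative_throughput(n_cores: int, freq_fraction: float) -> float:
    """Ideal throughput relative to one core at nominal frequency (perfect scaling)."""
    return n_cores * freq_fraction


if __name__ == "__main__":
    for cores, f in [(1, 1.0), (2, 0.8)]:
        print(cores, f, round(relative_power(cores, f), 3), round(relative_throughput(cores, f), 2))
```

Two cores at 80% frequency draw about 1.02x the power of one full-speed core but can deliver up to 1.6x the throughput, which is why designs moved to more cores at constant frequency.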
Passive Power Continues to Explode
- Oxide thickness is near the limit: traditional CMOS scaling has ended.
- Density improvements will continue, but power efficiency from technology will only improve very slowly.
- The historic trend of power-efficiency improvement will slow.
POWER7 Processor
- IBM's 45 nm SOI process
- 567 mm², 1.2B transistors
- 8 out-of-order cores, 4-way SMT
- 32 KB L1 D/I and 256 KB L2 per core; 32 MB shared L3 in IBM's eDRAM process
- 2 on-chip memory controllers, each with 2 pairs of buffered memory channels
- Designed for blades, commercial SMPs and supercomputers
- 4X the cores of POWER6 in a similar power envelope
- Designed for energy efficiency and effective power management
Thermal, Power and Activity Sensors
- Thermal sensors
  - 44 digital thermal sensors on chip (5 per chiplet, 4 outside the chiplets); the maximum chiplet thermal sensor reading(s) are also directly available to firmware
  - On-board ambient temperature sensor, memory buffer/DIMM thermal sensors and VRM thermal-trip logic
- Power measurement
  - On-board measurement circuits and A/D channels for full-system, processor-socket, memory-subsystem, I/O-subsystem and fan power measurements
- Performance/activity sensors
  - Core-level usage: active cycle counts, instruction throughput counts
  - Core-level memory hierarchy usage: event-based, programmable weight counters, used to gauge frequency impact at high loads
  - Memory-controller-level activity: requests and power-mode usage statistics
Rack to Rack: Power 755 Compared to Power 575 (POWER6)

                     Power 755   Power 575
Cores/chip           8           4
Total cores          32          32
Frequency            3.3 GHz     4.7 GHz
Memory (max)         256 GB      256 GB
Cooling              Air         Water
Cores/rack           320         448
Rack type            19-inch     24-inch
Power (W, Linpack)   1650        5400

Each Power 755 node offers the same core count as a Power 575 node with:
- 40-50% improvement in performance
- Air cooling instead of water cooling
- 1/3 of the energy consumption
- 37% improvement in floor space for a 64-node configuration
Green500: ~495 MFlops/W
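The table supports a quick performance-per-watt estimate. This sketch assumes the 1650 W and 5400 W Linpack figures are per node and that the 40-50% performance gain applies to the same node comparison; `perf_per_watt_gain` is an illustrative helper.

```python
def perf_per_watt_gain(speedup: float, new_power_w: float, old_power_w: float) -> float:
    """Ratio of (performance / power) for the new node vs. the old node."""
    return speedup * old_power_w / new_power_w


if __name__ == "__main__":
    print(f"power ratio: {1650 / 5400:.2f}")  # about 1/3, as stated on the slide
    for speedup in (1.4, 1.5):  # the slide's 40-50% performance improvement
        print(f"{speedup}x faster -> {perf_per_watt_gain(speedup, 1650, 5400):.1f}x perf/W")
```

Under those assumptions, a Power 755 node delivers roughly 4.6 to 4.9 times the performance per watt of a Power 575 node.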
IBM EnergyScale Functions
- Power/thermal trending
  - Collect and report power consumption, inlet and exhaust temperature
- Power capping
  - Guaranteed (hard cap): enforces a power cap via dynamic voltage and frequency slewing
  - Soft power cap: attempts a lower cap, but is not guaranteed
- Energy management modes (enhanced for POWER7)
  - Static Power Save (SPS): saves power via a fixed voltage and frequency drop (as much as 30% down for POWER7)
  - Dynamic Power Save (DPS): optimizes power vs. performance using dynamic voltage and frequency slewing
    - Provides a performance boost at very high utilization
    - Saves power at most utilizations
  - Dynamic Power Save, Favor Performance (DPS-FP)
    - Provides a performance boost at most utilizations
    - Saves power only at very low utilization
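A hard cap is enforced by a feedback loop: measure power, slew frequency and voltage down when over the cap, and back up when there is headroom. The sketch below is a deliberately simplified, hypothetical controller over a list of P-states (index 0 = fastest) with a hysteresis margin; it is not the EnergyScale firmware algorithm. A truly guaranteed cap also requires the slowest P-state to fit under the cap, and this loop can exceed the cap briefly while it steps down.

```python
def cap_step(pstate: int, measured_w: float, cap_w: float, n_pstates: int, margin_w: float) -> int:
    """One control step. P-state 0 is the fastest; a higher index means lower
    frequency and voltage. Step down when over the cap; step up only when power
    is more than `margin_w` below the cap (hysteresis avoids oscillation)."""
    if measured_w > cap_w and pstate < n_pstates - 1:
        return pstate + 1
    if measured_w < cap_w - margin_w and pstate > 0:
        return pstate - 1
    return pstate


def simulate(power_by_pstate: list[float], cap_w: float, steps: int, margin_w: float) -> list[int]:
    """Run the controller against a static, hypothetical power-per-P-state table."""
    pstate = 0
    history = []
    for _ in range(steps):
        measured = power_by_pstate[pstate]
        pstate = cap_step(pstate, measured, cap_w, len(power_by_pstate), margin_w)
        history.append(pstate)
    return history


if __name__ == "__main__":
    table = [1800.0, 1650.0, 1500.0, 1350.0]  # illustrative node power (W) per P-state
    print(simulate(table, cap_w=1550.0, steps=5, margin_w=100.0))  # [1, 2, 2, 2, 2]
    print(simulate(table, cap_w=1550.0, steps=5, margin_w=0.0))    # [1, 2, 1, 2, 1]
```

Without the margin, the controller keeps stepping back up into a P-state that violates the cap, which is exactly what a guaranteed cap must avoid.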
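High-Level System Power Control View
[Figure: the POWER Hypervisor (PHYP) and the TPMD exchange policy and feedback over a communication interface, using sensor information (temperature, current, performance). Controlled components: the POWER7 chip (P-states; idle states entered through architected idle instructions such as Doze, Nap, ...; modes 2B, 3, 4, 5), memory (modes 1 and 2A), I/O and fans.]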
Cooperative Power Management in EnergyScale
[Figure: layered cooperation between system monitoring and management tools (Active Energy Manager), the FSP, operating systems, the hypervisor, the TPMD, off-chip/on-board sensors and controls, and POWER7 mechanisms. Functions shown: real-time, policy-guided power/thermal control with performance-aware energy-saving algorithms; dynamic resource folding and any explicit low-power mode control; access, low-level coordination among controllers, an in-band/out-of-band communication channel, autonomous/configurable control engines, and sensors.]
Software to manage power: some examples
- IBM Active Energy Manager (AEM)
  - Monitors power consumption at the node/rack level
  - Manages power consumption (capping, trending, provisioning)
- IBM Research tools
  - Much higher sampling rates than AEM
  - Can separate CPU power, RAM power and other power, down to every VRM on a motherboard
- Cluster management tool
  - Extension to xCAT (Extreme Cloud/Cluster Administration Toolkit) to query and set power states
- Job scheduler
  - Extension to LoadLeveler: power- and energy-aware job scheduling
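Energy is the time integral of power, so the sampling rate of these tools matters: a coarse sampler can miss short power peaks entirely. A minimal sketch using trapezoidal integration; the node with a 100 W baseline and a one-second 300 W spike is made up for illustration.

```python
def energy_joules(times_s: list[float], power_w: list[float]) -> float:
    """Trapezoidal integration of sampled power (W) over time (s); returns joules."""
    if len(times_s) != len(power_w):
        raise ValueError("need one power sample per timestamp")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def spiky_load(t: float) -> float:
    """Hypothetical node: 100 W baseline with a 300 W spike for t in [5, 6)."""
    return 300.0 if 5 <= t < 6 else 100.0


if __name__ == "__main__":
    fine = [float(t) for t in range(11)]  # 1 Hz over 10 s
    coarse = [0.0, 10.0]                  # one sample every 10 s
    print(energy_joules(fine, [spiky_load(t) for t in fine]))      # 1200.0
    print(energy_joules(coarse, [spiky_load(t) for t in coarse]))  # 1000.0
```

The 1 Hz series recovers the spike's extra 200 J; the 10-second series reports only the baseline.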
IBM Systems Director Active Energy Manager (AEM)
- Monitoring energy in a data center lets you begin to manage it
- AEM is a cornerstone of the IBM energy management framework
- Measure, monitor and control energy usage
- Power and thermal measurement
  - Supports System x, POWER and System z natively
  - Supports other equipment via external sensors
- Integrates with infrastructure management
- Integrates with enterprise management
IBM Systems Director Active Energy Manager V4.2
- AEM application supported on Windows, AIX and Linux (x86, POWER and System z)
- Web-based user interface requiring only a browser
- Energy thresholding
  - Enables a user to set an energy or temperature threshold and be notified when it is reached (or have an action taken automatically)
- Soft power capping (an option within power capping)
  - Ability to set a lower energy cap value so that clients can save energy
- Group capping (an option within power capping)
  - Easily set power caps on multiple systems
  - Enables a user to set an energy cap for a group of servers (such as all the servers in a rack)
- Data to aid in server power on/off scenarios
  - Understand time to IPL and standby power
  - Number of lifetime IPLs and reliability threshold (POWER7 only)
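Group capping has to turn one cap for a set of servers into per-server caps. AEM's actual allocation method is not described here; the sketch below assumes a simple, hypothetical policy: if total demand fits under the group cap, each server keeps its demand; otherwise each server gets a fixed floor plus a share of the remaining budget proportional to its demand above that floor.

```python
def split_group_cap(group_cap_w: float, demand_w: dict[str, float], floor_w: float) -> dict[str, float]:
    """Hypothetical per-server caps whose sum never exceeds group_cap_w."""
    if sum(demand_w.values()) <= group_cap_w:
        return dict(demand_w)  # no throttling needed
    if floor_w * len(demand_w) > group_cap_w:
        raise ValueError("group cap is below the sum of per-server floors")
    budget = group_cap_w - floor_w * len(demand_w)
    excess = {name: max(0.0, d - floor_w) for name, d in demand_w.items()}
    total_excess = sum(excess.values())  # > 0 here, since total demand exceeds the cap
    return {name: floor_w + budget * e / total_excess for name, e in excess.items()}


if __name__ == "__main__":
    rack = {"node01": 700.0, "node02": 300.0}
    print(split_group_cap(1100.0, rack, floor_w=100.0))  # fits: demands unchanged
    print(split_group_cap(800.0, rack, floor_w=100.0))   # {'node01': 550.0, 'node02': 250.0}
```

A server whose demand is below the floor still receives the floor here; a more refined policy could hand that unused budget back to the busier servers.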
xCAT
- Manage power consumption on an ad hoc basis, for example while the cluster is being installed, or when there is high power consumption in other parts of the lab for a period of time
- Query: power saving mode, power capping value, power consumption information, CPU usage, fan speed, environment temperature
- Set: power saving mode and power capping value
Power and Energy Aware LoadLeveler
Goals:
- Identify idle nodes in the cluster and put them in the lowest power mode
- Give system admins the ability to query historical power and energy usage by workload, user, etc.
- Reduce the energy consumption of workloads with minimal impact on performance
Choices for the system admin:
- Whether to use the Energy Optimize policy on the system
- The maximum performance degradation an application may suffer when the energy policy is applied
- If the energy policy is on, it is applied only to jobs that match the performance degradation criteria
- The system admin can query the LoadLeveler (LL) database to evaluate the impact of the potential policy on performance degradation and energy savings
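The two kinds of query an administrator would run, historical energy per user and a what-if evaluation of the degradation threshold, might look like the following. The real LoadLeveler database schema is not shown on the slide, so the `job_energy` table, its columns (including per-job runtime and energy predicted at a reduced frequency) and the helper functions are hypothetical; SQLite stands in for whatever database is actually used.

```python
import sqlite3

# Hypothetical schema: one row per finished job, with measured values at nominal
# frequency and predicted values at the reduced frequency the policy would use.
SCHEMA = """
CREATE TABLE job_energy (
    job_id         TEXT PRIMARY KEY,
    user_name      TEXT NOT NULL,
    energy_nom_j   REAL NOT NULL,
    energy_low_j   REAL NOT NULL,
    runtime_nom_s  REAL NOT NULL,
    runtime_low_s  REAL NOT NULL
)
"""

J_PER_KWH = 3.6e6


def open_db(jobs: list[tuple]) -> sqlite3.Connection:
    """Create an in-memory database and load job rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO job_energy VALUES (?, ?, ?, ?, ?, ?)", jobs)
    return conn


def energy_by_user(conn: sqlite3.Connection) -> list[tuple]:
    """Historical energy (kWh) and job count per user, largest consumers first."""
    return conn.execute(
        "SELECT user_name, SUM(energy_nom_j) / ? AS kwh, COUNT(*) "
        "FROM job_energy GROUP BY user_name ORDER BY kwh DESC",
        (J_PER_KWH,),
    ).fetchall()


def what_if(conn: sqlite3.Connection, max_slowdown: float) -> tuple:
    """Number of jobs the policy would apply to, and the energy (kWh) it would save."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(energy_nom_j - energy_low_j), 0) / ? "
        "FROM job_energy WHERE runtime_low_s <= runtime_nom_s * (1 + ?)",
        (J_PER_KWH, max_slowdown),
    ).fetchone()


if __name__ == "__main__":
    jobs = [  # illustrative values only
        ("j1", "alice", 3.6e6, 3.0e6, 1000.0, 1050.0),
        ("j2", "bob", 7.2e6, 6.8e6, 2000.0, 2400.0),
        ("j3", "alice", 1.8e6, 1.5e6, 500.0, 540.0),
    ]
    conn = open_db(jobs)
    print(energy_by_user(conn))  # [('bob', 2.0, 1), ('alice', 1.5, 2)]
    print(what_if(conn, 0.10))   # (2, 0.25)
```

With a 10% degradation limit, job j2 (predicted 20% slower) is excluded, so the policy would apply to two jobs and save 0.25 kWh on this sample.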
Summary
- IBM started working early on the energy consumption of its servers.
- Energy management is pervasive in IBM server design, from chips to servers to clusters to datacenters, and even more so with the trend to Exascale.
- Good energy management can be a key differentiator in some HPC deals.
- We tackle the problem at several levels: chip design, system design, cluster management software and job schedulers.
- Our monitoring tools work across the whole IBM server portfolio, whatever the microprocessor architecture (IBM or Intel) or the form factor (rackable servers, blades, integrated racks).
- Using these tools, our customers can substantially reduce their energy bills.
Thank you. Questions?