Dynamic Thermal Management for Distributed Systems



Similar documents
How High Temperature Data Centers & Intel Technologies save Energy, Money, Water and Greenhouse Gas Emissions

STANDARD EXTRUDED HEAT SINKS

Agenda. Context. System Power Management Issues. Power Capping Overview. Power capping participants. Recommendations

Module 1 : Conduction. Lecture 5 : 1D conduction example problems. 2D conduction

D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0

Figure 1 - Unsteady-State Heat Conduction in a One-dimensional Slab

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche

Case study: End-to-end data centre infrastructure management

How High Temperature Data Centers & Intel Technologies save Energy, Money, Water and Greenhouse Gas Emissions

Green Wireless Technology Panel Presentation

Load Balancing on a Non-dedicated Heterogeneous Network of Workstations

Effect of Rack Server Population on Temperatures in Data Centers

Thermal Analysis, Heat Sink Design and Performance Verification for GE Fanuc Intelligent Platform s WANic 3860 Packet Processor PCI Card

Energy Efficient MapReduce

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Using NTC Temperature Sensors Integrated into Power Modules

Welcome to this presentation on LED System Design, part of OSRAM Opto Semiconductors LED 101 series.

Model Order Reduction for Linear Convective Thermal Flow

Building an energy dashboard. Energy measurement and visualization in current HPC systems

Dynamic Power Variations in Data Centers and Network Rooms

Statistical Profiling-based Techniques for Effective Power Provisioning in Data Centers

Characterizing Task Usage Shapes in Google s Compute Clusters

Development and Evaluation of Point Cloud Compression for the Point Cloud Library

ENERGY STAR Program Requirements Product Specification for Computer Servers. Test Method Rev. Apr-2013

The CEETHERM Data Center Laboratory

Everline Module Application Note: Round LED Module Thermal Management

Using CFD for optimal thermal management and cooling design in data centers

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Energy Constrained Resource Scheduling for Cloud Environment

A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems


Application of CFD modelling to the Design of Modern Data Centres

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

Dynamic Power Variations in Data Centers and Network Rooms

SUN ORACLE EXADATA STORAGE SERVER

Gerard Mc Nulty Systems Optimisation Ltd BA.,B.A.I.,C.Eng.,F.I.E.I

R&D Engineer. equipment. the power

Power Rating Simulation of the new QNS connector generation

Mercury and Freon thermal Management

Mercury and Freon: Temperature Emulation and Management for Server Systems

Integration of a fin experiment into the undergraduate heat transfer laboratory

ConSil: Low-cost Thermal Mapping of Data Centers

Overlapping Data Transfer With Application Execution on Clusters

The Temporal Firewall--A Standardized Interface in the Time-Triggered Architecture

Thermal Management of Datacenter

Multi-core and Linux* Kernel

HyperThreading Support in VMware ESX Server 2.1

Energy Efficient Data Center Design. Can Ozcan Ozen Engineering Emre Türköz Ozen Engineering

CFD SIMULATION OF SDHW STORAGE TANK WITH AND WITHOUT HEATER

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

CHAPTER 1 INTRODUCTION

OpenFlow with Intel Voravit Tanyingyong, Markus Hidell, Peter Sjödin

Isolines: Energy-efficient Mapping in Sensor Networks

Thermal Management for Low Cost Consumer Products

Calibrating DC Shunts: Techniques and Uncertainties

THERMAL DESIGN AND TEST REQUIREMENTS FOR OUTSIDE PLANT CABLE TELECOMMUNICATIONS EQUIPMENT Al Marshall, P.E. Philips Broadband Networks

CHAPTER 4 GRID SCHEDULER WITH DEVIATION BASED RESOURCE SCHEDULING

Power & Environmental Monitoring

Real Time Simulation of Power Plants

PERFORMANCE TOOLS DEVELOPMENTS

Run-time Resource Management in SOA Virtualized Environments. Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang

From Big Data to Smart Data Thomas Hahn

Data Bulletin. Mounting Variable Frequency Drives in Electrical Enclosures Thermal Concerns OVERVIEW WHY VARIABLE FREQUENCY DRIVES THERMAL MANAGEMENT?

Motor-CAD Software for Thermal Analysis of Electrical Motors - Links to Electromagnetic and Drive Simulation Models

Knowledge Base and Research Report

HERMES: Human Exposure and Radiation Monitoring of Electromagnetic Sources.

SIMULATION OF NONSTATIONARY HEATING/COOLING PROCESS OF TWO-STAGE THERMOELECTRIC MODULE

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

How To Build A Clustered Storage Area Network (Csan) From Power All Networks

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

Distributed Temperature Sensing - DTS

The New Data Center Cooling Paradigm The Tiered Approach

Preserving Performance While Saving Power Using Intel Intelligent Power Node Manager and Intel Data Center Manager

Power efficiency and power management in HP ProLiant servers

Power Management in the Cisco Unified Computing System: An Integrated Approach

Effects of AC Ripple Current on VRLA Battery Life. A Technical Note from the Experts in Business-Critical Continuity

50. DFN Betriebstagung

QoS for (Web) Applications Velocity EU 2011

Intelligent Power Optimization for Higher Server Density Racks

Heat equation examples

Application of Virtual Instrumentation for Sensor Network Monitoring

Control 2004, University of Bath, UK, September 2004

Active-Active Servers and Connection Synchronisation for LVS

Transcription:

Dynamic Thermal Management for Distributed Systems Andreas Weissel Frank Bellosa Department of Computer Science 4 (Operating Systems) University of Erlangen-Nuremberg Martensstr. 1, 91058 Erlangen, Germany {weissel,bellosa}@cs.fau.de 1

Benefits of Dynamic Thermal Management Cooling servers, server clusters cooling facilities often dimensioned for worst-case temperatures or overprovisioned Guarantee temperature limits no need for overprovisioning of cooling units reduced costs (floor space, energy consumption, maintenance,...) Increased reliability safe operation in case of cooling unit failure avoid local hot-spots in the server room Temperature sensors 2

Drawbacks of Existing Approaches If critical temperature is reached throttle the CPU: e.g. halt cycles, reduced duty cycle, reduced speed But: neglect of application-, user- or service-specific requirements due to missing online information about the originator of a specific hardware activation and the amount of energy consumed by that activity Throttling penalizes all tasks 3

Outline From events to energy event-monitoring counters on-line estimation of energy consumption From energy to temperature temperature model Energy Containers accounting of energy consumption task-specific temperature management Infrastructure for temperature management in distributed systems 4

Approaches to Energy Characterization Reading of thermal diode embedded in modern CPUs low temporal resolution significant overhead no information about originator of power consumption 5

Approaches to Energy Characterization Reading of thermal diode embedded in modern CPUs low temporal resolution significant overhead no information about originator of power consumption Counting CPU cycles time as an indicator for energy consumption time as an indicator for contribution to temperature level throttling according to runtime but: wide variation of the active power consumption 6

Approaches to Energy Characterization P4 (2 GHz) running compute intensive tasks: CPU load of 100% variation between 30 51 W 50 power consumption [Watt] 40 30 20 10 0 alu-add alu-add-bswap-inc alu-mov-inc-add mem-load mem-store pushpop mem-load/store mem-inc L1 L2 adler ln2 sha ripemd160 wrap factor idle power consumption: 12.3 W 7

From Events to Energy: Event-Monitoring Counters Event counters register energy-critical events in the complete system architecture. several events can be counted simultaneously low algorithmic overhead high temporal resolution fast response Energy estimation correlate a processor-internal event to an amount of energy select several events and use a linear combination of these event counts to compute the energy consumption (QHUJ\ = HYHQW L ZHLJKW L L 8

From Events to Energy: Methodology Measure the energy consumption of training applications Find the events with the highest correlation to energy consumption Compute weights from linear combination of event counts and real power measurements of the CPU solve linear optimization problem: find the linear combination of these events that produce the minimum estimation error PLQ HYHQW L ZHLJKW L PHDVXUHGHQHUJ\ L avoid underestimation of energy consumption PHDVXUHGHQHUJ\ HYHQW L ZHLJKW L L 9

From Events to Energy: Methodology Set of events and their weights event weight [nj] time stamp counter 6.17 unhalted cycles 7.12 µop queue writes 4.75 retired branches 0.56 mispred branches 340.46 mem retired 1.73 ld miss 1L retired 13.55 Limitations of the Pentium 4 insufficient events for MMX, SSE & floating point instructions the case for dedicated Energy Monitoring Counters 10

Outline From events to energy event-monitoring counters on-line estimation of energy consumption From energy to temperature temperature model Energy Containers accounting of energy consumption task-specific temperature management Infrastructure for temperature management in distributed systems 11

From Energy to Temperature: Thermal Model CPU and heat sink treated as a black box with energy in- and output Heat Sink Thermal Resistance Energy(#Events) CPU Energy( Temp) Thermal Capacity energy input: electrical energy being consumed energy output: heat radiation and convection 12

From Energy to Temperature: Thermal Model Energy input: energy consumed by the processor G7 3 = F 3GW &38SRZHU FRQVXPSWLRQ Heat Sink Thermal Resistance Energy(#Events) CPU Thermal Capacity Energy( Temp) 13

From Energy to Temperature: Thermal Model Energy output: primarily due to convection G7 3 = F 3GW &38SRZHU FRQVXPSWLRQ Heat Sink Thermal Resistance G7 = F ( 7 7 )GW 7 DPELHQW WHPSHUDWXUH Energy(#Events) CPU Energy( Temp) Thermal Capacity 14

From Energy to Temperature: Thermal Model Altogether: G7 = [ F 3 F ( 7 7 )]GW energy estimator power consumption P time stamp counter time interval dt the constants F, F and 7 have to be determined 15

From Energy to Temperature: Thermal Model Altogether: G7 = [ F 3 F ( 7 7 )]GW energy estimator power consumption P time stamp counter time interval dt the constants F, F and 7 have to be determined Solving this differential equation yields 7W () = F ------- H F W F dynamic part + F --- 3 + 7 F static part 16

Thermal Model: Dynamic Part Measurements of the processor temperature on a sudden constant power consumption and a sudden power reduction to HLT power. fit an exponential function to the data: coefficient = F 60 Pentium 4@2GHz with active heat sink temperature [ Celsius] 55 50 45 40 35 Idle Active Idle (50 W) (12 W) 30 0 1000 2000 3000 4000 5000 time [s] 17

Thermal Model: Static Part Static temperatures and power consumption of the test programs 60 55 static temperature [Celsius] 50 45 40 35 30 10 20 30 40 50 power consumption [Watt] 18

Thermal Model: Static Part Linear function to determine and F 7 60 55 static temperature [Celsius] 50 45 40 35 30 10 20 30 40 50 power consumption [Watt] 19

Thermal Model: Implementation Linux 2.6 kernel Periodically compute a temperature estimation from the estimated energy consumption Deviation of a few degrees celsius over 24 hours or if ambient temperature changes Re-calibration with measured temperature every few minutes 20

Thermal Model: Accuracy 60 55 estimated temperature measured temperature temperature (celsius) 50 45 40 35 0 2000 4000 6000 8000 10000 12000 time [s] 21

Outline From events to energy event-monitoring counters on-line estimation of energy consumption From energy to temperature temperature model Energy Containers accounting of energy consumption task-specific temperature management Infrastructure for temperature management in distributed systems 22

Properties of Energy Accounting Accounting to different tasks/activities/clients example: web server serving requests from different client classes e.g. Internet/Intranet, different service contracts Resource principal can change dynamically Client/server relationships between processes account energy consumption of server to client 23

Energy Containers Resource Containers [OSDI 99] Energy Containers separation of protection domain and resource principal Container Hierarchy root container (whole system) processes are attached to containers this association can be changed dynamically (client/server relationship) energy is automatically accounted to the activity responsible for it Energy shares root 80% 20% Intranet 131.188.34.* web server amount of energy available (depending on energy limit) periodically refreshed if a container runs out of energy, its processes are stopped Internet 24

Energy Containers Example: web server working for two clients with different shares power consumption (Watt) 60 50 40 30 20 10 web server (total) client1 (80% share) client2 (20% share) Start throttling 0 0 100 200 300 time (seconds) 25

Task-specific Temperature Management Periodically compute an energy limit for the root container (depending on the temperature limit ) G7 = [ F 3 + F ( 7 7 )]GW 7 OLPLW 7 7 OLPLW Dissolve to 3 3 OLPLW Energy budgets of all containers are limited according to their shares Tasks are automatically throttled according to their contribution to the current temperature Throttling is implemented by removing tasks from the runqueue 26

Temperature Management Example: Enforcing a temperature limit of 45º 55 estimated temperature measured temperature temperature (celsius) 50 45 40 temperature limit 35 0 500 1000 1500 2000 2500 time [s] 27

Outline From events to energy event-monitoring counters on-line estimation of energy consumption From energy to temperature temperature model Energy containers accounting of energy consumption task-specific temperature management Infrastructure for temperature management in distributed systems 28

Energy Containers Distributed energy accounting root root root IPv6 IPv6 id 1 Id 2 id 1 id 2 id 1 id 2 Id transmitted with the network packets (IPv6 extension headers) receiving process attached to the corresponding energy container temperature and energy are cluster-wide accounted and limited transparent to applications and unmodified operating systems 29

Energy Containers Energy accounting across machine boundaries requests from two different clients represented by two containers web server sends requests to factorization server power consumption [W] 50 40 30 20 10 0 0 500 1000 time [s] server #1 (web server) container 1 (id 1) root container (total) container 2 (id 2) 1500 0 500 1000 1500 time [s] server #2 (factorization) the energy consumption of the server is correctly accounted to the client 30

Infrastructure for DTM in Distributed Systems Distributed energy accounting Foundation for policies managing energy and temperature in server clusters account, monitor and limit energy consumption and temperature of each node Examples set equal energy/temperature limits for all servers cluster-wide uniform temperature and power densities, no hot spots in the server room use energy/temperature limits to throttle affected servers in case of a cooling unit failure reduce number of active cooling units in case of low utilization 31

Conclusion Event-monitoring counters enable on-line energy accounting task-specific temperature management Correctly account client/server relations across machine boundaries Transparent to applications and unmodified operating systems Future directions examine more sophisticated energy models task-specific frequency scaling to adjust the thermal load 32