How Much Power Oversubscription is Safe and Allowed in Data Centers?




Xing Fu (1,2), Xiaorui Wang (1,2), Charles Lefurgy (3)
(1) EECS, University of Tennessee, Knoxville
(2) ECE, The Ohio State University
(3) IBM Research, Austin

Introduction
- Power: a first-class constraint in data center design
- Power oversubscription by power capping
  - Improves power facility utilization
  - Improves server performance
- Power capping at different levels
  - Servers, racks, and data centers
- However, they all share a common assumption
  - Power should never exceed the rated power capacity?
  - Otherwise the circuit breaker (CB) would trip?
- Not really! Circuit breakers can sustain short overloads.

How Much Power Oversubscription is Safe?
- Whether a CB trips depends on
  - Magnitude of the overload
  - Duration of the overload
- Ideal upper bound? The lower bound of the tolerance band
- This paper
  - Investigates CB trip features
  - Proposes adaptive power control to
    1. Fully utilize the allowed overload interval for maximized server performance
    2. Safely host more servers without upgrading power facilities
[Figure: trip curve of a typical circuit breaker; annotations: "two minutes", "1.17", "rated capacity"]

Proposed Solution: CB-Adaptive
- More than just a standalone controller
- A methodology that adapts the parameters of existing power controllers to engineer their settling times
- Example: adapting a server power controller [Lefurgy ICAC 07]
  1. Obtain the tripping time from the CB tripping curve
  2. The desired settling time should be the tripping time
  3. Adapt controller parameter K to enforce the settling time
[Figure: feedback loop with blocks labeled power cap, error, K × error, CPU frequency, server, power measured, and "adapt controller parameter K"]
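Step 1 of the example above (reading a tripping time off the CB curve) can be sketched as a table lookup. This is only an illustration: the curve points in TRIP_CURVE, the log-log interpolation, and the helper names trip_time and settling_periods are assumptions made here, not data or code from the talk or from any real breaker datasheet.

```python
import math

# Hypothetical trip-curve points: (load as a multiple of rated current, trip time in s).
# Illustrative values only; a real curve comes from the breaker's datasheet.
TRIP_CURVE = [(1.2, 3600.0), (1.5, 300.0), (2.0, 60.0), (3.0, 15.0), (5.0, 5.0)]


def trip_time(load_ratio, curve=TRIP_CURVE):
    """Interpolate the trip time on log-log axes.

    Returns math.inf below the first curve point (assumed never to trip)
    and the last tabulated time at or beyond the last point.
    """
    if load_ratio < curve[0][0]:
        return math.inf
    if load_ratio >= curve[-1][0]:
        return curve[-1][1]
    for (x0, t0), (x1, t1) in zip(curve, curve[1:]):
        if x0 <= load_ratio <= x1:
            frac = (math.log(load_ratio) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return math.exp(math.log(t0) + frac * (math.log(t1) - math.log(t0)))


def settling_periods(load_ratio, control_period_s):
    """Step 2: desired settling time, in whole control periods (None if no overload)."""
    t = trip_time(load_ratio)
    if math.isinf(t):
        return None
    return max(1, math.floor(t / control_period_s))
```

A controller would then be tuned (step 3) so that it settles within settling_periods(...) control periods for the overload it currently observes.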

CB-Adaptive Design Details
- System model: p(k+1) = p(k) + A d(k)
  - p(k) is the power of the server
  - d(k) is the change to the CPU frequency
  - A is a hardware-specific parameter when the server runs LINPACK
- How to adapt the controller parameter?
  - The relationship between the parameter and the settling time
  - The parameter is a function of the measured power, the rated current of the CB, and the control period.
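To see how the parameter relates to the settling time, the system model can be iterated numerically. The sketch below assumes a simple proportional law d(k) = K (cap - p(k)); that law, the function name simulate, and the numeric values of A and K are illustrative assumptions, not values from the talk.

```python
def simulate(p0, cap, A, K, steps):
    """Iterate p(k+1) = p(k) + A*d(k) with the assumed law d(k) = K*(cap - p(k)).

    p0   : initial server power (W)
    cap  : power cap the controller steers toward (W)
    A    : hardware-specific gain (W per unit frequency change), hypothetical value
    K    : controller parameter
    Returns the list of power samples p(0) .. p(steps).
    """
    trace = [p0]
    p = p0
    for _ in range(steps):
        d = K * (cap - p)   # frequency change chosen by the proportional controller
        p = p + A * d       # system model from the slide
        trace.append(p)
    return trace


# With A*K = 0.5 the error to the cap halves every control period.
trace = simulate(p0=120.0, cap=80.0, A=10.0, K=0.05, steps=6)
```

A larger K (with A*K kept below 1) shrinks the error faster and so shortens the settling time; a smaller K stretches it, which is the knob CB-Adaptive turns.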

Two CB-Adaptive Improvements
- Temperature-aware CB-Adaptive
  - The CB trip curve is affected by the ambient temperature.
  - The rated current of the CB is a linear function of the temperature.
  - K is also a function of the ambient temperature.
- CB-Proactive
  - Carefully increases the DVFS level in a proactive way
  - Further improves server performance
  - When and to what extent is the DVFS level increased?
    - CB enters the long-delay region
    - Increase the frequency to the highest level
[Figure: the rated current versus temperature]
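The linear dependence of the rated current on temperature can be written as a one-line derating function. The reference temperature and slope below are made-up placeholders (a real breaker's derating data would come from its datasheet), and the name derated_current is hypothetical.

```python
def derated_current(rated_current_a, temp_c, ref_temp_c=30.0, slope_per_c=-0.005):
    """Effective rated current (A) at ambient temperature temp_c.

    Assumes a linear derating factor of 1.0 at ref_temp_c that changes by
    slope_per_c per degree Celsius. Both defaults are illustrative only.
    """
    factor = 1.0 + slope_per_c * (temp_c - ref_temp_c)
    return rated_current_a * factor


# A hotter room lowers the effective rating, so the same load is a larger overload.
cool = derated_current(20.0, 20.0)
hot = derated_current(20.0, 50.0)
```

A temperature-aware controller would feed this effective rating, rather than the nameplate value, into the trip-time lookup that determines K.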

Discussion on Power Oversubscription
- Possible applications of CB-Adaptive
  - Hosting additional servers
- Safety issues
  - A typical power delivery system
  - Every component can tolerate overloads like CBs
  - Overload capacity: power beyond which permanent damage occurs to the component

More Discussion
- Components other than CBs do not experience overloads frequently.
  - It is less likely that many servers reach their peak power simultaneously.
  - Evidenced by a real Google data center [Fan ISCA 07]
- When only a branch circuit is overloaded
  - CB-Adaptive can be applied directly
- When multiple branch circuits are overloaded
  - CB-Adaptive needs to consider the tripping time of components other than CBs.

Hardware Testbed
- Dell OptiPlex 380
- Rockwell Allen-Bradley 1489-A industrial CB
- Workloads
  - SPEC CPU2006
  - SPEC JBB
  - LINPACK

Baselines
- NoControl
  - Estimates the peak power consumption of a server
  - No power caps
  - Unsafe and conservative
- P-Control
  - Measures the power in every control period
  - A non-adaptive proportional controller calculates frequency changes to enforce a power budget.
- P-Control-CB
  - The power budget is different from that of P-Control
  - Upper bound of the long-delay region of the CB

Power Control Comparison
[Figure: four panels of power consumption (Watt) versus control period (5 sec), showing NoControl, P-Control, P-Control-CB, CB-Adaptive, and CB-Proactive against the rated limit and the long-delay upper limit; annotations: "Converge to 80W after hours", "Converge to 80W"]
- NoControl causes the CB to trip: unsafe
- P-Control and P-Control-CB: unsafe and conservative
- CB-Adaptive fully utilizes the overload intervals of CBs
- Raise CPU frequency for higher performance

Performance Comparison
- CB-Adaptive outperforms P-Control by
  - 66% for LINPACK
  - 29% to 49% for SPEC CPU2006
  - 74% for SPEC JBB
[Figure: bar charts of LINPACK performance (Gflops), SPECJBB performance (bops), and SPEC2006 benchmark performance (base ratio) for P-Control, P-Control-CB, CB-Adaptive, and CB-Proactive]

Impact of Temperature
[Figure: trip time (sec) versus temperature (degrees Celsius) for NoControl at 21.7, 26.4, 31.6, and 34.8 °C, and for P-Control-CB, CB-Adaptive, and CB-Proactive at 45 °C]
- Temperature impacts the trip time significantly.
- The temperature-blind solutions P-Control-CB, CB-Adaptive, and CB-Proactive are not safe.

Temperature-Aware CB-Adaptive
- As the temperature increases, the performance of servers decreases.
- The performance decrease is modest.

Power Provisioning Analysis
- NoControl
  - Number of servers = (rated power of the CB) / (estimated server power)
  - The estimation is too conservative
  - 7 servers hosted per branch
- P-Control
  - Enforces a power budget instead of an estimation of power
  - 13 servers hosted per branch
- CB-Adaptive
  - Enforces a higher power budget than P-Control
  - 20 servers hosted per branch
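The provisioning arithmetic is easy to check. In the sketch below the server counts (7, 13, 20) are the ones reported on the slide; the wattages passed to servers_per_branch are hypothetical numbers chosen only to show the division, and both function names are illustrative.

```python
import math


def servers_per_branch(branch_power_w, per_server_power_w):
    """Whole servers that fit under a branch limit at a given per-server power."""
    return math.floor(branch_power_w / per_server_power_w)


def percent_more(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical wattages, only to illustrate the slide's division.
example = servers_per_branch(branch_power_w=1000.0, per_server_power_w=140.0)

# Server counts reported on the slide.
no_control, p_control, cb_adaptive = 7, 13, 20
gain_vs_p_control = percent_more(cb_adaptive, p_control)    # about 53.8
gain_vs_no_control = percent_more(cb_adaptive, no_control)  # about 185.7
```

Going from 13 to 20 servers per branch is an increase of about 54%, which matches the "54% more servers" figure quoted in the conclusions.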

Conclusions
- A common assumption of existing power capping
  - Peak power should never exceed the rated CB capacity
- This paper
  - Systematically studies the CB tripping characteristics
  - Identifies the ideal upper bound of safe power oversubscription
  - Proposes two adaptive power control strategies
- Evaluation on safe power oversubscription
  - A single server: 38% performance improvement
  - Circuit branch: host 54% more servers without upgrading power infrastructure

Questions?

Acknowledgements
- NSF CAREER Award CNS-0845390
- NSF CSR Grant CNS-0720663
- NSF SHF Grant CCF-1017336
- Prof. Leon Tolbert at the University of Tennessee

Thank you!

Backup

Control Theoretic Analysis
- How to adapt the controller parameter?
  - K = (1 - 0.02^(1/m)) / A
- Details of the derivation
  - Z transform of the system model
  - Z-domain controller
  - Calculate the closed-loop transfer function
  - Inverse Z transform
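The derivation steps listed on this slide can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' worked derivation: it assumes a proportional law d(k) = K (P_s - p(k)) with power cap P_s, reads m as the desired settling time in control periods, and reads 0.02 as a 2% settling criterion. It also needs 0 < AK < 1 so that the error decays monotonically.

```latex
\begin{align*}
  p(k+1) &= p(k) + A\,d(k), & d(k) &= K\,\bigl(P_s - p(k)\bigr) \\
  \frac{P(z)}{D(z)} &= \frac{A}{z-1}, & \frac{P(z)}{P_s(z)} &= \frac{AK}{z-(1-AK)} \\
  e(k) &= P_s - p(k) = (1-AK)^{k}\,e(0) \\
  (1-AK)^{m} &= 0.02 \;\Longrightarrow\; K = \frac{1-\sqrt[m]{0.02}}{A}
\end{align*}
```

Under these assumptions the closed loop has a single pole at 1 - AK, so choosing a shorter tripping time (smaller m) yields a larger K and a faster response.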

Power Provisioning Analysis
- UPS cannot tolerate overloads
  - Not a problem because each UPS runs at 50% of its capacity
- Factors limiting overload capacities
