Agenda. Context. System Power Management Issues. Power Capping Overview. Power capping participants. Recommendations



Similar documents
Host Power Management in VMware vsphere 5

Power Management in the Linux Kernel

Multi-core and Linux* Kernel

Power efficiency and power management in HP ProLiant servers

PCIe Over Cable Provides Greater Performance for Less Cost for High Performance Computing (HPC) Clusters. from One Stop Systems (OSS)

Host Power Management in VMware vsphere 5.5

Intel 64 and IA-32 Architectures Software Developer s Manual

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

How System Settings Impact PCIe SSD Performance

System Management Interrupt Free Hardware

A Brief Tutorial on Power Management in Computer Systems. David Chalupsky, Emily Qi, & Ilango Ganga Intel Corporation March 13, 2007

Intelligent Power Optimization for Higher Server Density Racks

Achieving QoS in Server Virtualization

Agenda. Capacity Planning practical view CPU Capacity Planning LPAR2RRD LPAR2RRD. Discussion. Premium features Future

Know your Cluster Bottlenecks and Maximize Performance

Power and Cooling Innovations in Dell PowerEdge Servers

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

Fastboot Techniques for x86 Architectures. Marcus Bortel Field Application Engineer QNX Software Systems

IO latency tracking. Daniel Lezcano (dlezcano) Linaro Power Management Team

Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure

Comparing Power Saving Techniques for Multi cores ARM Platforms

Summary. Key results at a glance:

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Linux on POWER for Green Data Center

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche

Binary search tree with SIMD bandwidth optimization using SSE

Capacity Planning for Microsoft SharePoint Technologies

Getting maximum mileage out of tickless

Linux VM Infrastructure for memory power management

BridgeWays Management Pack for VMware ESX

Full and Para Virtualization

Operating Systems. Lecture 03. February 11, 2013

NEC Corporation of America Intro to High Availability / Fault Tolerant Solutions

How To Manage Energy At An Energy Efficient Cost

Energy Management in a Cloud Computing Environment

GHG Protocol Product Life Cycle Accounting and Reporting Standard ICT Sector Guidance. Chapter 7:

Delivering Quality in Software Performance and Scalability Testing

Moab and TORQUE Highlights CUG 2015

Windows8 Internals, Sixth Edition, Part 1

Performance Testing. Configuration Parameters for Performance Testing

Virtualization. Dr. Yingwu Zhu

The Bus (PCI and PCI-Express)

Parallel Algorithm Engineering

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

BIOS Update Release Notes

Mobile Devices - An Introduction to the Android Operating Environment. Design, Architecture, and Performance Implications

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 11 th, 2014 NEC Corporation

Memory Channel Storage ( M C S ) Demystified. Jerome McFarland

Tutorial: Analyzing Energy Usage on an Android* Platform

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

Harmonizing policy management with Murphy in GENIVI, AGL and TIZEN IVI

Exar. Optimizing Hadoop Is Bigger Better?? March Exar Corporation Kato Road Fremont, CA

Advances in Virtualization In Support of In-Memory Big Data Applications

PCI Express Impact on Storage Architectures and Future Data Centers. Ron Emerick, Oracle Corporation

Symantec NetBackup 5220

HP PCIe IO Accelerator For Proliant Rackmount Servers And BladeSystems

Hardware RAID vs. Software RAID: Which Implementation is Best for my Application?

HRG Assessment: Stratus everrun Enterprise

Understanding Linux on z/vm Steal Time

Energy-aware Memory Management through Database Buffer Control

Database Instance Caging: A Simple Approach to Server Consolidation. An Oracle White Paper September 2009

SUSE Linux Enterprise 10 SP2: Virtualization Technology Support

Options in Open Source Virtualization and Cloud Computing. Andrew Hadinyoto Republic Polytechnic

Accelerating CFD using OpenFOAM with GPUs

Module I-7410 Advanced Linux FS-11 Part1: Virtualization with KVM

An Oracle White Paper August Oracle VM 3: Server Pool Deployment Planning Considerations for Scalability and Availability

Types Of Operating Systems

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Preserving Performance While Saving Power Using Intel Intelligent Power Node Manager and Intel Data Center Manager

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

EUCIP IT Administrator - Module 1 PC Hardware Syllabus Version 3.0

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

Statistical Profiling-based Techniques for Effective Power Provisioning in Data Centers

Intel Power Gadget 2.0 Monitoring Processor Energy Usage

Performance Management for Cloudbased STC 2012

A Taxonomy and Survey of Energy-Efficient Data Centers and Cloud Computing Systems

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Laptop 2.0. Timo Hönig, Holger Macht, Helmut Schaa. 31. May Laptop 2.0

Enterprise Application Monitoring with

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Caching SMB Data for Offline Access and an Improved Online Experience

Analyzing Network Servers. Disk Space Utilization Analysis. DiskBoss - Data Management Solution

FUSION iocontrol HYBRID STORAGE ARCHITECTURE 1

Memory Forensics with Hyper-V Virtual Machines.

CGL Architecture Specification

How To Develop An Iterio Data Acquisition System For A Frustreo (Farc) (Iterio) (Fcfc) (For Aterio (Fpc) (Orterio).Org) (Ater

Optimizing the Performance of Your Longview Application

Virtualization. Michael Tsai 2015/06/08

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

Virtualization and the U2 Databases

Transcription:

Power Capping Linux

Agenda Context System Power Management Issues Power Capping Overview Power capping participants Recommendations Introduction of Linux Power Capping Framework 2

Power Hungry World Worldwide, the digital warehouses use about 30 billion watts of electricity Equivalent to the output of 30 nuclear power plants On average, using only 6 percent to 12 percent of the electricity powering their servers to perform computations. The rest was essentially used to keep servers idling and ready in case of a surge in activity that could slow or crash their operations. Source: http://www.nytimes.com/2012/09/23/technology/data-centers-waste-vast-amounts-of-energy-belying-industryimage.html?pagewanted=all&_r=0 3

Unpredictability of battery life "Many report unpredictable spikes in battery use, and batteries becoming unnervingly hot, even when used outdoors in the shade." Read more: http://www.dailymail.co.uk/sciencetech/article-2055562/apple-iphone-4s-battery-dies-12- hours--users-forced-ways-patch-mend.html#ixzz2dj5l1uqz 4

Constrained by Power and Thermal Thinner and more power efficient mobile devices Limited power delivery capacity Limited cooling capacity Battery life predictability Aggressive power budgeting Unexpected peaks in system utilization Handling catastrophic thermal/power exceptions are never graceful 5

Power Capping Linux Overview Monitoring power usage Limit power consumed by devices at runtime Redistribution of power to meet power budget at system level Maximize performance Maximize efficiency Avoid critical conditions 6

7 Power Capping Usage Model

Power Capping Participants CPUs GPUs DRAM Others Multimedia sub system Wireless sub system IO 8

Power Capping Ideas Drink less individually Device performance states Throttling states Let the lions handle it (reduce head counts) CPU core offline Reduce number of active lanes on an IO bus Beyond nature Take turns (Idle injection) Avoid fighting / catastrophic shutdown 9

Power Capping Techniques CPU Dynamic performance states (aka P-state) adjustment Dynamic throttling states (aka T-state) adjustment Processor offline Idle injection ACPI power meter ACPI processor aggregator CPU/ DRAM / GPU Running Average Power Limit (RAPL) 10

Power Capping Measurements Test Setup Intel Ivy Bridge Dual Core Linux 3.11.rc3 Power meter: Yokogawa WT210 Test load: openssl speed sha256 11

P-States P-state: a voltage/frequency pair P1 : Guaranteed performance P0 Turbo H/W Control fast Pn-P1 range in OS control/request P1 P0: Max possible performance under HW control (Power = C V 2 f) OS controlled States Pn slow 12

13 Turbo States

Controlling P-states Intel P State Driver sysfs: max_perf_pct, min_perf_pct, no_turbo CPUFREQ Sysfs: scaling_max_freq, scaling_min_freq, scaling_setspeed Thermal Cooling device sysfs: /sys/class/thermal/cooling_device# /type = Processor RAPL (Running Average Power Limit) sysfs: via power capping framework 14

15 P States Performance (Using Intel P State Driver)

RAPL Power monitoring capability Very accurate, model based, per domain built-in power meter Throttle duration Power Limiting Performance feedback mechanism Notification when OS requested P-state is denied Implemented close to processor Interface via MSR, PCIe config space 16

17 RAPL Domains and Interfaces

18 RAPL Performance

RAPL DRAM Power Limit Throttles DRAM Bandwidth Significant DRAM throttling tend to be power inefficient First throttle CPUs 19

T-States Allow Software Controlled Clock modulation Controls stop clock duty cycle Time period for clock signal to drive processor Controlled via thermal cooling device interface for processor Processor clock Stop clock duty cycle Example: 25% 20

21 T-States Performance

Idle Injection Idle states (C-states) get deeper in modern processors Synchronized idle time across all cores achieves the maximum power benefit 22

Idle Injection Implemented by Intel Power Clamp Driver sysfs: /sys/class/thermal/cooling_device# /type = intel_powerclamp Monitors and enforces idle time for each online CPU User selectable idle ratio from 0 to 50% CPU 0 Kidle_inject/0 inactive Force idle inactive CPU 1 Kidle_inject/1 inactive Force idle inactive 23

24 Idle Injection Performance

CPU Offline Migrate activity on current CPU to new CPU Processes, interrupts, timers Logical online/offline CPUs using sysfs interface sysfs: /sys/devices/system/cpu/cpu#/online Conditional physical offline/online depends on BIOS and kernel build flags Limited CPU 0 offline 25

26 CPU Offline Performance

ACPI Power Meter Expose power meter support defined in ACPI 4.0 Depends on BIOS support Interface to read power over a configurable interval Trip point configuration for notification Configuration for power capping parameters power#_cap_min/power#_cap_max 27

ACPI Processor Aggregator(ACPI_PAD) Triggered by ACPI notification only Used to resolve short term thermal emergencies Not a CPU Offline/online but has similar affect Doesn t affect cupset Puts affected CPUs in deep C states 28

Watts Perf % ACPI_PAD Vs. CPU Offline 350 300 250 200 150 100 50 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 # cpu Offlined CPU Offline acpi_pad 100 90 80 70 60 50 40 30 20 10 0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 # cpu Offlined CPU Offline acpi_pad Test platform: Intel Romley 2 socket system 29

Recommendations In applicable range RAPL P State Idle Injection CPU Offline T States 30

Linux Power Capping Framework Interface Set power limits read current power for a system or a sub-system Client driver API Avoid code duplication RFC: http://lwn.net/articles/562015/ 31

Linux Power Capping Framework Class sysfs I/F Power cap class driver Registration Callbacks Power cap client Power drivers cap client Power drivers cap client drivers 32

Power Capping Class Features Allow multiple power zones Power zone: Independent unit which has capability to measure and enforce power Allow parent child relationship among zones Exports Sys-FS Interface (monitoring and control) Get Current energy consumption per power zone Set Power limit per zone Power capping driver interface (kernel API) 33

Power Capping Sys-FS hierarchy example /sys/class/powercap intel-rapl powerclamp control_type n enabled intel-rapl:0 intel-rapl:1 intel-rapl:n enabled name energy_uj max_energy_range_uj constraint_x_name constraint_x_power_limit_uw intel_rapl:0:1 constraint_x_time_window_us 34

Power Capping sysfs Arranged as a tree, with root as a control type Control type : Method to implement power capping. E.g. intel-rapl A control type contains multiple power zones Power zone node names are qualified with the control type, E.g. intelrapl:0, intel-rapl:1 Each power zone can have children as power zones Parent child relationship should be based on the relationship of power. E.g. When child power limit is applied, parent power is also affected 35

Takeaways* Prefer direct hardware interface Prefer RAPL for power capping from P0 to Pn over indirect control via P-state governor Consider Idle injection instead of CPU Offline for extended power capping range, maybe platform dependent. Pay attention to hardware trend, e.g. deeper idle Reconsider the effectiveness of legacy features e.g. T-states Suggestions on Improving Power Capping Framework? (*Disclaimer: only based on the test result shown in the slides, not an official recommendation by Intel) 36

Q&A Contact: jacob.jun.pan@linux.intel.com 37

Thank You