Comparing Power Saving Techniques for Multi cores ARM Platforms



Similar documents
Update on big.little scheduling experiments. Morten Rasmussen Technology Researcher

EECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun

Power Management in the Linux Kernel

Process Scheduling in Linux

Linux Load Balancing

IO latency tracking. Daniel Lezcano (dlezcano) Linaro Power Management Team

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

Multi-core and Linux* Kernel

Chapter 5 Linux Load Balancing Mechanisms

Real-time KVM from the ground up

Linux Process Scheduling. sched.c. schedule() scheduler_tick() hooks. try_to_wake_up() ... CFS CPU 0 CPU 1 CPU 2 CPU 3

Beyond the Hypervisor

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

Agenda. Context. System Power Management Issues. Power Capping Overview. Power capping participants. Recommendations

Cortex-A9 MPCore Software Development

Know your Cluster Bottlenecks and Maximize Performance

big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

big.little Technology: The Future of Mobile Making very high performance available in a mobile envelope without sacrificing energy efficiency

Hardware accelerated Virtualization in the ARM Cortex Processors

The Microsoft Windows Hypervisor High Level Architecture

Real-Time Scheduling 1 / 39

2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput

Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ARM R&D December 2012

Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems

Virtualization for Hard Real-Time Applications Partition where you can Virtualize where you have to

SYSTEM ecos Embedded Configurable Operating System

Development With ARM DS-5. Mervyn Liu FAE Aug. 2015

Task Scheduling for Multicore Embedded Devices

Kernel Optimizations for KVM. Rik van Riel Senior Software Engineer, Red Hat June

CPU Scheduling Outline

Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat

CPU Scheduling. Core Definitions

Real-Time KVM for the Masses Unrestricted Siemens AG All rights reserved

Host Power Management in VMware vsphere 5

Linux scheduler history. We will be talking about the O(1) scheduler

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Multilevel Load Balancing in NUMA Computers

HyperThreading Support in VMware ESX Server 2.1

Improvement of Scheduling Granularity for Deadline Scheduler

MODULE 3 VIRTUALIZED DATA CENTER COMPUTE

Sierraware Overview. Simply Secure

Administering Microsoft SQL Server 2012 Databases

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

Hybrid Platform Application in Software Debug

Why Relative Share Does Not Work

vsphere Resource Management

ARM Cortex-A9 MPCore Multicore Processor Hierarchical Implementation with IC Compiler

Optimizing Background Sync on Smartphones

High Performance or Cycle Accuracy?

Multi-core Programming System Overview

Performance Testing. Configuration Parameters for Performance Testing

Hello, and welcome to this presentation of the STM32L4 reset and clock controller.

Multi-Threading Performance on Commodity Multi-Core Processors

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

How To Understand And Understand An Operating System In C Programming

W4118 Operating Systems. Instructor: Junfeng Yang

Inside the Erlang VM

Virtual machine CPU monitoring with Kernel Tracing

Deep Dive Monitoring Servers using BI 4.1. Alan Mayer Solid Ground Technologies SESSION CODE: 0305

UNIT 4 Software Development Flow

Overview of the Current Approaches to Enhance the Linux Scheduler. Preeti U. Murthy IBM Linux Technology Center

Load-Balancing for a Real-Time System Based on Asymmetric Multi-Processing

Hard Real-Time Linux

In-System Programmer USER MANUAL RN-ISP-UM RN-WIFLYCR-UM

vsphere Resource Management

Operating Systems 4 th Class

USB readout board for PEBS Performance test

Code:1Z Titre: Oracle WebLogic. Version: Demo. Server 12c Essentials.

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012

Types Of Operating Systems

Operating Systems. Lecture 03. February 11, 2013

KVM PERFORMANCE IMPROVEMENTS AND OPTIMIZATIONS. Mark Wagner Principal SW Engineer, Red Hat August 14, 2011

Computer-System Architecture

Running VirtualCenter in a Virtual Machine

M68EVB908QL4 Development Board for Motorola MC68HC908QL4

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

Virtualization in the ARMv7 Architecture Lecture for the Embedded Systems Course CSD, University of Crete (May 20, 2014)

How System Settings Impact PCIe SSD Performance

Completely Fair Scheduler and its tuning 1

vsphere Resource Management Guide

How to Perform Real-Time Processing on the Raspberry Pi. Steven Doran SCALE 13X

Load-Balancing for Improving User Responsiveness on Multicore Embedded Systems

Building Blocks for PRU Development

Distribution One Server Requirements

x86 ISA Modifications to support Virtual Machines

vsphere Resource Management

Low Power AMD Athlon 64 and AMD Opteron Processors

Architecting Microsoft SQL Server on VMware vsphere M A R C H

The CPU Scheduler in VMware vsphere 5.1

ELI: Bare-Metal Performance for I/O Virtualization

Overview of the Linux Scheduler Framework

Transcription:

Comparing Power Saving Techniques for Multi cores ARM Platforms <vincent.guittot@linaro.org>

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Why to use CPU hotplug? cpuidle driver managed only CPU0 CPU1 must be unplugged to reach deep C-states Now most of cpuidle driver manages SMP ARM But Both CPUs are managed Reach deep C-States in SMP mode Same power level as CPU hotplug Must synchronize the idle sequence Enter/Leave sequences are more complex

Why to use CPU hotplug? CPU hotplug does better Retention rate State transition Power consumption Aggressive power saving use cases Low CPU load use cases Background activity Voice call MP3 playback

Why to use CPU hotplug? MP3 playback on Linaro Android 12.01 Not a power optimized SW Snowball Retention rate increases from 70 to 74 % ~1mW over 10mW on ARM power rail CPU1 is not power gated Panda ~5mW over 37mW on VDD core power rail Which part is for ARM? CPU1 is power gated

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Tests environment We have used Cyclictest Simulate multi threaded use case Simulate light CPU load use case Preferred to low power MP3 Easier to tests and reproduce No dependency with optimization No dependency with external devices & firmware

Tests environment SW environment Linaro kernel Linaro developer rootfs Cpuidle Cluster retention HW environment Dual cortex-a9 Snowball board Modified to measure IARM

Tests environment Power consumption figures Comparing ARM cluster power consumption Cluster state statistics 20 100% mw 15 10 Hotplug SMP 80% 60% 40% Running WFI Retention 5 20% 0 1 2 3 4 5 6 7 0% Hotplug SMP Thread

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

CPU hotplug constraints A power consuming sequence Not only update CPU masks Reset memory structures linked to the CPU Dozens of callbacks for CPU up/down notifications Large time scale Compensate the cost of a unplug/plug sequence

CPU hotplug constraints A time consuming sequence Create dedicated threads of the CPU latency increases above a load threshold low latency (~115ms) on system not loaded

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

What we have Current SMP behavior Both CPUs are used Not simultaneously most of the time

What we want Do not wake up CPU1 Let CPU1 in deepest power state Do not run on CPU1 Neither thread nor interrupt Targeted behavior

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Interruption affinity Default interrupt affinity All cores can be used ARM gic set affinity cpumask_any_and(mask_val, cpu_online_mask) Take the 1st CPU of the mask Unless specific configuration Interruptions raised on CPU0

Scheduler & CPU topology Current ARM topology configuration Multi-core topology Easy migration on the idle CPU When task wakes up Good for responsiveness/performance Bad for keeping tasks on CPU0

Scheduler & CPU topology sched_mc_power_saving mode 2 Try to gather tasks in a package Only one package for Cortex-A9 MP! Load balance Default CPU capacity is 1 task

Single core / Multi package How to gather tasks on 1 core? Emulate multi-package topology Gather tasks in one package Increase cpu_power Keep tasks on one core Use asymmetric behavior Use CPU0 in priority

sched_mc_power_saving figures CPU1 stays idle Nothing is scheduled on CPU1 Quite close to CPU hotplug behavior

sched_mc_power_saving figures Power consumption figures Comparing ARM cluster power consumption cluster state statistics 20 100% mw 15 10 Hotplug SMP MC 80% 60% 40% Running WFI Retention 5 20% 0 1 2 3 4 5 6 7 0% Hotplug SMP MC Thread

sched_mc_power_saving figures Still differences with CPU hotplug Let us look over a bigger period Traces show some awaking of CPU1 2 seconds

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Idle Load Balance Idle load balance Run on an idle CPU More than 1 task in the run queue Wake up CPU1 Idle load balance

Load balance Load balance Newly idle CPU Periodic (10~256ms) Periodic load balance Newly idle load balance Nothing when only 1 CPU is online

Sched_RT No power saving policy Can break the effort of CFS load balance RT threads Use cpuset in low power mode Pinned tasks on CPU0?

Deferrable Some activities are pinned to CPU1 Created on cpu_up Deferrable Do not wake the CPU Not a problem in itself

Timer Timer use current CPU base Unless idle and another CPU is not idle A timer can stay on a CPU Cluster wake up Timer fire Deferrable work Force timer migration on CPU0 Unplug / plug CPU1

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Load balance Load balance is based on CPU capacity CPU0 is out of capacity Some task migrate on CPU1 Out of capacity of CPU0 detected during periodic check CPU0 has capacity Tasks migrate on CPU0

Load balance Task migration possibility A newly idle CPU can pull running tasks Idle Load balance triggered by tick Task wakes up on a new CPU What about a isolated task Asynchronous to the tick Asynchronous to other tasks

Gather or spread? The most difficult question What we have used? P-state number of running tasks May be not enough C-state Others

Content Why to use CPU hotplug? Tests environment CPU hotplug constraints What we have / What we want How to do? Sched_mc_power_saving Why these awaking? Gather or spread Conclusion

Conclusion Close to CPU hotplug power consumption On snowball, small overhead on MP3 (<2%) Still some issues to solve isolated task New mechanisms will help

References https://wiki.linaro.org/workinggroups/powermanagement/ http://git.linaro.org/gitweb http://igloocommunity.org/ https://rt.wiki.kernel.org/articles/c/y/c/cyclictest.html

QUESTIONS?

THANK YOU

Backup slide MP3 sequence