Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms


Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms Meng Xu, Linh T.X. Phan, Insup Lee, Oleg Sokolsky, Sisu Xi, Chenyang Lu and Christopher D. Gill

Complex Systems on Multicore Platforms
- Embedded systems become more and more complex and consist of multiple sub-systems.
- On multicore platforms, the number of cores keeps increasing.
(Sources: http://www.codeproject.com/articles/16165/robotics-embedded-systems-part-i; International technology roadmap for semiconductors, 2007 edition: System drivers)

Virtualization
The benefits of virtualization: consolidate legacy systems; integrate large, complex systems.
(Figure: VM 0, VM 1, VM 2, each running a guest OS with its VCPUs, on top of a Virtual Machine Monitor over four CPUs, each with a private cache.)

Compositional Analysis for Real-Time Guarantees
Step 1: Abstract each component (VM) into an interface.
Step 2: Transform each interface into a set of VCPUs, each characterized as (Period, Budget).
Step 3: Abstract the VCPUs of all VMs into the system's interface.
(Figure: VM 0, VM 1, VM 2 with Interface 0, Interface 1, Interface 2; their VCPUs run on the Virtual Machine Monitor over four CPUs with private caches.)
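Step 2 above is mechanical: an interface with period, budget, and a count of full VCPUs maps to that many full VCPUs plus one partial VCPU. A minimal sketch of this transformation (function and representation are illustrative, not from the paper):

```python
# Sketch (illustrative names): transform a multiprocessor interface
# <period, budget, m_full> into the (period, budget) VCPU pairs that
# Step 2 of the compositional analysis hands to the VMM scheduler.
def interface_to_vcpus(period, budget, m_full):
    vcpus = [(period, period)] * m_full   # full VCPUs: budget == period
    if budget > 0:
        vcpus.append((period, budget))    # one partial VCPU
    return vcpus

# e.g. the DMPR interface <5, 1, 2> used later in the talk:
print(interface_to_vcpus(5, 1, 2))  # [(5, 5), (5, 5), (5, 1)]
```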

Limitations of Existing Multicore Compositional Analysis
Existing multicore compositional analysis does not consider platform overhead, but in practice platform overhead is not negligible; cache overhead is one example. The result is an unsafe analysis: because it does not consider the effect of cache overhead in virtualization, it under-estimates the resource demand. Examples: cache overhead due to task preemption, VCPU preemption, and VCPU completion.

Contributions
- Introduce an overhead-free compositional analysis based on DMPR, an improved MPR resource model
- Quantify the events that cause cache overhead: task-preemption events, VCPU-preemption events, and VCPU-completion events
- Propose a cache-aware compositional analysis: a hybrid analysis combining task-centric analysis and model-centric analysis

Deterministic Multi-Processor Resource Model (DMPR)
A DMPR interface is µ = ⟨Π, Θ, m⟩: m full VCPUs (i.e., VCPUs with bandwidth 1) plus one partial VCPU with period Π and budget Θ. The interface bandwidth is m + Θ/Π.
(Figure: worst-case resource supply of the DMPR µ = ⟨5, 1, 2⟩, with partial VCPU VP1 and full VCPUs VP2 and VP3.)
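A minimal sketch of a DMPR supply bound function, assuming the standard periodic-resource SBF for the partial VCPU (the paper's exact worst-case supply expressions may differ; this is an illustration, not the authors' formula):

```python
import math

# Sketch: supply bound function (SBF) of a DMPR interface <PI, THETA, m>.
# The m full VCPUs supply m*t over any interval of length t; the partial
# VCPU is modeled as a standard periodic resource (PI, THETA), whose
# worst case starves the component for up to 2*(PI - THETA) time units.
def partial_sbf(PI, THETA, t):
    if THETA == 0 or t < PI - THETA:
        return 0
    y = math.floor((t - (PI - THETA)) / PI)  # whole periods fully served
    return y * THETA + max(0, t - 2 * (PI - THETA) - y * PI)

def dmpr_sbf(PI, THETA, m, t):
    return m * t + partial_sbf(PI, THETA, t)

# The DMPR <5, 1, 2> from the slide: 2 full VCPUs plus a partial VCPU (5, 1).
print(dmpr_sbf(5, 1, 2, 9))  # 2*9 + 1 = 19
```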

Assumptions
- Each core has a private cache; there is no shared cache.
- The period of each component's interface is given by the designers.
- The maximum cache overhead per task preemption or migration in the system is upper bounded by Δ_crpmd.
- The virtual machine monitor schedules VCPUs with hybrid EDF (hEDF).
(Figure: cpu1–cpu4; full VCPUs VP1 and VP3 are pinned to cores, while VP2, VP4, and VP5 are under hEDF scheduling.)

Outline: Introduction | Events that cause cache overhead | Cache-aware compositional analysis | Evaluation

Event 1: Task-Preemption Event
Definition: A task-preemption event happens when a task preempts another task within the same VM.
(Figure: tasks τ1, τ2, τ3 share cpu1; when a higher-priority task preempts a lower-priority one, the preempted task incurs cache overhead when it resumes.)

Event 2: VCPU-Preemption Event
Definition: A VCPU-preemption event occurs when a VCPU is preempted by another VCPU of another VM.
(Figure: (a) VM configuration: components C1 with µ1 = ⟨5,3,1⟩, C2 with µ2 = ⟨8,3,1⟩, and C3 with µ3 = ⟨6,4,0⟩; (b) VCPU configuration: full VCPUs VP1 and VP3 pinned to cores, partial VCPUs VP2, VP4, VP5 under hEDF.)

Event 2: VCPU-Preemption Event (cont.)
(Figure: (c) scheduling of the partial VCPUs VP2 (5,3), VP4 (8,3), and VP5 (6,4); (d) cache overhead of the tasks τ4 = (8,4), τ5 = (6,2), τ6 = (10,1.5) in component C2: while VP4 is preempted it is unavailable, and the task that resumes on it incurs cache overhead caused by the VCPU-preemption event.)

Event 3: VCPU-Completion Event
Definition: A VCPU-completion event of a VCPU happens when the VCPU exhausts its budget in a period and stops its execution.
(Figure: a component C with a full VCPU VP1 and a partial VCPU VP2 = (4,2); tasks τ1 = (8,4), τ2 = (6,2), τ3 = (10,1.5). When VP2 exhausts its budget and becomes unavailable, the task that later resumes on it incurs cache overhead caused by the VCPU-completion event.)

Outline: Introduction | Events that cause cache overhead | Cache-aware compositional analysis | Evaluation

Task-Centric Analysis
- Task-preemption event: inflate the higher-priority task's WCET with one cache overhead:
    e'_k = e_k + Δ_crpmd    (1)
- VCPU-preemption/completion events: inflate the task's WCET with the number of cache overheads caused by VCPU-preemption/completion events during the task's period:
    e'_k = e_k + Δ_crpmd · (N^{VP,2}_k + N^{VP,3}_k)    (2)
where N^{VP,2}_k and N^{VP,3}_k are the numbers of VCPU-preemption and VCPU-completion events during a period of task τ_k. (See the paper for how to compute the number of VCPU-preemption/completion events.)

Task-Centric Analysis (cont.)
Inflated WCET of each task:
    e'_k = e_k + Δ_crpmd + Δ_crpmd · (N^{VP,2}_k + N^{VP,3}_k)
The system is schedulable under cache overhead if the inflated workload is schedulable.
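The inflation rule above is simple arithmetic once the event counts are known. A hedged sketch, assuming the event counts per task period are given (the Δ value and task parameters below are illustrative, not from the paper):

```python
DELTA_CRPMD = 0.5  # illustrative cache-overhead bound (ms), not the measured value

# Sketch of the task-centric inflation: each WCET is charged one overhead
# for a task preemption plus one per VCPU-preemption/completion event
# occurring within the task's period (event counts assumed precomputed).
def inflate_wcet(e_k, n_vp_preempt, n_vp_complete, delta=DELTA_CRPMD):
    return e_k + delta + delta * (n_vp_preempt + n_vp_complete)

# e.g. a task with WCET 4 ms that sees 2 VCPU-preemption and 1
# VCPU-completion events per period:
print(inflate_wcet(4.0, 2, 1))  # 4.0 + 0.5 + 0.5*3 = 6.0
```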

Pessimistic When the Number of Tasks Is Large
Only two tasks actually incur cache overhead in a VCPU-preemption/completion event, but we do not know which two. To be safe, the analysis has to inflate every task's WCET with one cache overhead per VCPU-preemption/completion event.
(Figure: tasks τ1, τ2, τ3 on VP1 and VP2; only two of them incur cache overhead due to a VCPU-completion event while VP2 is unavailable.)

Model-Centric Approach
Subtract the overhead due to VCPU-preemption/completion events from the original resource supply of the interface to obtain its effective resource supply; task-preemption event overhead is still charged to the tasks. The question is how to compute the effective resource supply.

Effective SBF of a DMPR Interface
A DMPR interface provides resource with one partial VCPU and m full VCPUs, so its effective SBF is the combination of the effective SBF of the partial VCPU and the effective SBF of the m full VCPUs.

Worst-Case Scenario of the Effective Resource Supply of a Partial VCPU
The worst case happens when:
(1) the partial VCPU incurs all VCPU-preemption/completion events;
(2) the partial VCPU incurs the overhead as late as possible in the first period and as early as possible in the remaining periods;
(3) the time interval t begins when the VCPU finishes supplying its effective resource in the first period.
The maximum number of VCPU-preemption/completion events during a partial VCPU's period is computed in the paper; the proof of the worst case is also in the paper.

Effective Resource Supply of a Partial VCPU
For a partial VCPU VP_i of an interface µ = ⟨Π, Θ, m⟩:
    SBF_{VP_i}(t) = y·Θ* + max{0, t − x − y·Π − z}  if Θ > 0, and 0 if Θ = 0
where Θ* = max{0, Θ − N^stop_{VP_i,t} · Δ_crpmd} is the effective budget after subtracting the overhead of the VCPU-stop (preemption/completion) events, y = ⌊(t − x)/Π⌋, and x and z are offsets determined by Π and Θ* (full expressions in the paper).
(Figure: worst-case effective resource supply of the partial VCPU.)
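The key step is shrinking the budget to Θ* before evaluating the supply. A hedged sketch: the effective budget follows the slide's formula, while the SBF below uses the standard periodic-resource SBF with Θ* as a stand-in (the paper's exact offsets x and z differ, so this only illustrates the shape of the computation):

```python
import math

# Effective budget of a partial VCPU after charging the overhead of
# n_stop VCPU-preemption/completion events, as on the slide:
#   THETA* = max(0, THETA - n_stop * DELTA_CRPMD)
def effective_budget(THETA, n_stop, delta_crpmd):
    return max(0.0, THETA - n_stop * delta_crpmd)

# Stand-in effective SBF: the standard periodic-resource SBF evaluated
# with the shrunken budget THETA* (not the paper's exact expression).
def effective_sbf(PI, THETA, n_stop, delta_crpmd, t):
    ts = effective_budget(THETA, n_stop, delta_crpmd)
    if ts == 0 or t < PI - ts:
        return 0.0
    y = math.floor((t - (PI - ts)) / PI)
    return y * ts + max(0.0, t - 2 * (PI - ts) - y * PI)

print(effective_budget(3.0, 2, 0.25))  # 2.5
```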

Effective SBF of the Interface
The effective SBF of the interface combines the effective SBF of the partial VCPU with the effective SBF of the m full VCPUs.

Model-Centric Analysis
Step 1: Account for task-preemption event overhead.
Step 2: Account for VCPU-preemption/completion event overhead.
Step 3: Check whether the effective resource supply >= the resource demand.
(Example in the figure: a component C with interface µ = ⟨10, 8.5, 1⟩ and tasks τ1, ..., τ5 = (20, 5).)

Pessimistic When the Number of Full VCPUs Is Large
In practice, only one full VCPU is affected per VCPU-preemption/completion event, but when we compute the effective SBF of the m full VCPUs, all full VCPUs are marked unavailable at every VCPU-preemption/completion event.
(Figure: VP1–VP4; at each CRPMD event, resource that is actually supplied is treated as unavailable in the analysis.)

Task-Centric vs. Model-Centric
Neither of the two analyses dominates the other.
- Task-centric is better: for a system with root period 25 under hEDF, components C1 (period 20) and C2 (period 50), Δ_crpmd = 2, and tasks τ1, ..., τ5 = (100, 50), task-centric analysis yields bandwidth 4.94 versus 6.90 for model-centric analysis.
- Model-centric is better: for the same structure with Δ_crpmd = 1 and tasks τ1, ..., τ5 = (100, 25), task-centric analysis yields bandwidth 3.8 versus 2.86 for model-centric analysis.

Hybrid Cache-Aware Analysis
Example system: root period 25 under hEDF; components C1 (period 20) and C2 (period 50); workload τ1, ..., τ5 = (100, 25) and τ6, ..., τ10 = (100, 25) split across the two components.

Hybrid Cache-Aware Analysis (cont.)
For the example system (τ1, ..., τ5 = (100, 25) and τ6, ..., τ10 = (100, 25)):
- Task-centric analysis: component interfaces C1: µ = ⟨20, 9.8, 0⟩ and C2: µ = ⟨50, 36.1, 1⟩; system interface µ = ⟨25, 4.1, 3⟩ with bandwidth 3.8.
- Model-centric analysis: component interfaces C1: µ = ⟨20, 8.8, 0⟩ and C2: µ = ⟨50, 39.7, 1⟩; system interface µ = ⟨25, 4.3, 2⟩ with bandwidth 2.86.
The hybrid analysis combines the two, keeping the better (smaller-bandwidth) result.
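A minimal sketch of the hybrid idea (the selection rule and interface values below are assumptions for illustration, not verbatim from the paper): compute an interface under both analyses and keep the cheaper one.

```python
# Bandwidth of a DMPR interface <PI, THETA, m> is m + THETA/PI.
def bandwidth(PI, THETA, m):
    return m + THETA / PI

# Hybrid selection sketch: keep whichever interface needs less bandwidth.
def hybrid_interface(task_centric_iface, model_centric_iface):
    return min(task_centric_iface, model_centric_iface,
               key=lambda iface: bandwidth(*iface))

# Hypothetical interfaces for one component:
print(hybrid_interface((20, 9.8, 0), (20, 8.8, 0)))  # (20, 8.8, 0)
```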

Outline: Introduction | Events that cause cache overhead | Cache-aware compositional analysis | Evaluation

Experimental Setup
Hardware: Dell Optiplex-980 quad-core workstation running RT-Xen with hEDF (3 cores for guest VMs, 1 core for VM0, i.e., domain 0). Four guest domains D1–D4 with interface periods Π1 = 256, Π2 = 128, Π3 = 64, Π4 = 32. The cache overhead measured with LITMUS^RT is Δ_crpmd = 1.9 ms for a working-set size (WSS) of 256KB. Task set: total utilization 1.8; task utilizations distributed uniformly in [0.001, 0.1].

Cache Overhead Is Not Negligible
A taskset claimed schedulable by the overhead-free analyses is not schedulable in practice (unsafe), while the same taskset is claimed NOT schedulable by the cache-aware analyses (safe):

                MPR                 DMPR
                Theory   RT-Xen     Theory   RT-Xen
  Schedulable   Yes      No         Yes      No

                Cache-aware Hybrid  Cache-aware Task-centric
                Theory   RT-Xen     Theory   RT-Xen
  Schedulable   No       No         No       No

Simulation Setup
Same four-domain structure under hEDF (D1–D4 with periods 256, 128, 64, 32); Δ_crpmd = 0.9 ms.
Task periods: uniformly in [350ms, 850ms].
Task utilization distributions:
- uniform: uniformly in [0.001, 0.1]
- bimodal-light: 8/9 in [0.1, 0.4] and 1/9 in [0.5, 0.9]
- bimodal-medium: 6/9 in [0.1, 0.4] and 3/9 in [0.5, 0.9]
- bimodal-heavy: 4/9 in [0.1, 0.4] and 5/9 in [0.5, 0.9]

Hybrid Analysis Saves Bandwidth
With an average Δ_crpmd-to-WCET ratio of 0.003, the hybrid approach saves bandwidth over task-centric analysis for 64% of the tasksets.
(Figure: bandwidth saved by hybrid analysis over task-centric analysis, per taskset utilization.)

Hybrid Analysis Saves Bandwidth (cont.)
The hybrid analysis still saves bandwidth over task-centric analysis when the distribution of task utilizations changes.
(Figure: (a) bimodal-light distribution (average Δ_crpmd/WCET = 0.0005); (b) bimodal-medium distribution (0.0004); (c) bimodal-heavy distribution (0.0003).)

Related Work
Overhead-free compositional analysis:
- S. Baruah and N. Fisher. Component-based design in multiprocessor real-time systems. In ICESS, 2009.
- A. Easwaran, I. Shin, and I. Lee. Optimal virtual cluster-based multiprocessor scheduling. Real-Time Systems, 43(1):25-59, 2009.
- H. Leontyev and J. H. Anderson. A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. In ECRTS, 2008.
- G. Lipari and E. Bini. A framework for hierarchical scheduling on multiprocessors: From application requirements to run-time allocation. In RTSS, 2010.
- E. Bini, M. Bertogna, and S. Baruah. Virtual multiprocessor platforms: Specification and use. In RTSS, 2009.
Overhead-aware analysis in non-virtualized environments:
- B. B. Brandenburg. Scheduling and Locking in Multiprocessor Real-Time Operating Systems. PhD thesis, The University of North Carolina at Chapel Hill, 2011.
Methods of obtaining the cache overhead value:
- A. Bastoni, B. B. Brandenburg, and J. H. Anderson. Cache-Related Preemption and Migration Delays: Empirical Approximation and Impact on Schedulability. In OSPERT, 2010.
- S. Altmeyer, R. I. Davis, and C. Maiza. Improved cache related preemption delay aware response time analysis for fixed priority preemptive systems. Real-Time Systems, 2012.

Conclusion
Contributions:
- Proposed the DMPR resource model and an overhead-free compositional analysis under DMPR
- Quantified the events that cause cache overhead
- Proposed a cache-aware compositional analysis
Future work:
- Extend our method to multi-level cache hierarchies with shared caches
- Explore cache management methods to reduce the cache overhead