Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

Similar documents
Schedulability Analysis for Memory Bandwidth Regulated Multicore Real-Time Systems

Achieving QoS in Server Virtualization

Performance Isolation for Real-time Systems with Xen Hypervisor on Multi-cores

EECS 750: Advanced Operating Systems. 01/28 /2015 Heechul Yun

CPU Shielding: Investigating Real-Time Guarantees via Resource Partitioning

White Paper: Performance of Host-based Media Processing

Resource Efficient Computing for Warehouse-scale Datacenters

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

A hypervisor approach with real-time support to the MIPS M5150 processor

Real- Time Mul,- Core Virtual Machine Scheduling in Xen

Scheduling. Scheduling. Scheduling levels. Decision to switch the running process can take place under the following circumstances:

Using EDF in Linux: SCHED DEADLINE. Luca Abeni

The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa

Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Intel DPDK Boosts Server Appliance Performance White Paper

An OS-oriented performance monitoring tool for multicore systems

Designing Predictable Multicore Architectures for Avionics and Automotive Systems extended abstract

RETIS Lab Real-Time Systems Laboratory

Model based design of real time systems

Sporadic Server Revisited

Virtual Pla*orms Hypervisor Methods to Improve Performance and Isola7on proper7es of Shared Mul7core Servers

System Software and TinyAUTOSAR

QoS and Communication Performance Management

Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms

Optimizing Shared Resource Contention in HPC Clusters

Cache-aware compositional analysis of real-time multicore virtualization platforms

Run-time Resource Management in SOA Virtualized Environments. Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang

Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

Operating System Impact on SMT Architecture

Cache-Aware Compositional Analysis of Real-Time Multicore Virtualization Platforms

Predictable response times in event-driven real-time systems

Computer Science 146/246 Homework #3

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Performance Characterization of SPEC CPU2006 Integer Benchmarks on x Architecture

Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies

Introduction to the NI Real-Time Hypervisor

A Dual-Layer Bus Arbiter for Mixed-Criticality Systems with Hypervisors

Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6

Real-Time Scheduling 1 / 39

Datacenter Operating Systems

Outline. Introduction. Multiprocessor Systems on Chip. A MPSoC Example: Nexperia DVP. A New Paradigm: Network on Chip

Operatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings

Achieving Performance Isolation with Lightweight Co-Kernels

Overlapping Data Transfer With Application Execution on Clusters

FPGA-based Multithreading for In-Memory Hash Joins

FACT: a Framework for Adaptive Contention-aware Thread migrations

APerformanceComparisonBetweenEnlightenmentandEmulationinMicrosoftHyper-V

Q-Clouds: Managing Performance Interference Effects for QoS-Aware Clouds

CPU Scheduling Outline

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Quality of Service su Linux: Passato Presente e Futuro

white paper Capacity and Scaling of Microsoft Terminal Server on the Unisys ES7000/600 Unisys Systems & Technology Modeling and Measurement

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

Web Load Stress Testing

W4118 Operating Systems. Instructor: Junfeng Yang

Eloquence Training What s new in Eloquence B.08.00

Scheduling 0 : Levels. High level scheduling: Medium level scheduling: Low level scheduling

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Reducing Dynamic Compilation Latency

Delivering Quality in Software Performance and Scalability Testing

Parallel Processing and Software Performance. Lukáš Marek

High Performance or Cycle Accuracy?

Secure Containers. Jan Imagination Technologies HGI Dec, 2014 p1

Multicore partitioned systems based on hypervisor

Chapter 5 Process Scheduling

Enabling Technologies for Distributed Computing

Principles and characteristics of distributed systems and environments

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

DATABASE. Pervasive PSQL Performance. Key Performance Features of Pervasive PSQL. Pervasive PSQL White Paper

ICRI-CI Retreat Architecture track

I/O virtualization. Jussi Hanhirova Aalto University, Helsinki, Finland Hanhirova CS/Aalto

Partition Scheduling in APEX Runtime Environment for Embedded Avionics Software

CS 147: Computer Systems Performance Analysis

Hardware accelerated Virtualization in the ARM Cortex Processors

Addressing Shared Resource Contention in Multicore Processors via Scheduling

Next Generation Operating Systems

Ada Real-Time Services and Virtualization

KerMon: Framework for in-kernel performance and energy monitoring

Application Performance Analysis of the Cortex-A9 MPCore

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Benchmarking Cassandra on Violin

Informatica Data Director Performance

CPU Scheduling. CPU Scheduling

Multi-core Programming System Overview

Virtualization. Pradipta De

9/26/2011. What is Virtualization? What are the different types of virtualization.

Unifying Information Security

10G Ethernet: The Foundation for Low-Latency, Real-Time Financial Services Applications and Other, Future Cloud Applications

Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC

Precise and Accurate Processor Simulation

Architectures for Distributed Real-time Systems

Measuring the Performance of Prefetching Proxy Caches

Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses

Scheduling Virtual Machines for Load balancing in Cloud Computing Platform

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

Dynamic Virtual Machine Scheduling in Clouds for Architectural Shared Resources

Transcription:

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign + Univerity of Waterloo *

Multi-core Systems Mainstream in smartphone Dual/quad-core smartphones More performance with less power Tegra 3 (4 cores) Traditional embedded/real-time domains Avionics companies are investigating [Nowotsch12] 8 core P4080 processor from Freescale [Nowotsch12] Leveraging Multi-Core Computing Architectures in Avionics, EDCC, 2012 2

Challenge Core1 Core2 Core3 Core4 System bus DRAM Timing isolation is hard to achieve 3

Challenge Appl1 App2 App 3 App 4 Core1 Core2 Core3 Core4 System bus DRAM Cores compete for shared HW resources System bus, DRAM controller, shared cache, 4

Effect of Memory Contention 70% 60% 50% Run-time increase (%) 60% WCET increase is unacceptable 40% App. membomb 30% 20% 10% Core Core Shared system bus Memory 0% 429.mcf 471.omnetpp 473.astar 433.milc 470.lbm Run-time increase due to contention Five SPEC2006 benchmarks Compared to solo execution 5

Goal Mechanism to control memory contention Software based controller for COTS multi-core processors Response time analysis accounting memory contention effect Based on proposed software based controller 6

Outline Motivation Memory Access Control System Response Time Analysis Evaluation Conclusion 7

System Architecture Core1 Core2 Core3 Core4 Memory bandwidth controllers (Part of OS) 20% 30% 10% 40% System bus DRAM Assign memory bandwidth to each core using per-core memory bandwidth controller 8

Memory Bandwidth Controller Periodic server for memory resource Periodically monitor memory accesses of the core and control user specified bandwidth using OS scheduler Monitoring can be efficiently done by using per-core hardware performance counter Bandwidth = # memory accesses X avg. access time 9

Memory Bandwidth Controller Period: 10 time unit, Budget: 2 memory accesses memory access takes 1 time unit Budget 2 1 Enqueue tasks Task 0 10 20 Dequeue tasks Dequeue tasks computation memory fetch 10

Outline Motivation Memory Bandwidth Control System Response Time Analysis Evaluation Conclusion 11

System Model Critical core Interfering cores Core1 Core2 Core3 Core4 Memory bandwidth controller System bus DRAM Cores are partitioned based on criticality Critical core runs periodic real-time tasks with fixed priority scheduling algorithm Interfering cores run non-critical workload and regulated with proposed memory access controller 12

Assumptions Critical core Interfering cores Core1 Core2 Core3 Core4 Memory bandwidth controller System bus DRAM Private or partitioned last level cache (LLC) Round-robin bus arbitration policy Memory access latency is constant 1 LLC miss = 1 DRAM access 13

Simple Case: One Interfering Core Critical Core Interfering Core Memory bandwidth controller System bus DRAM Critical core - core under analysis Interfering core generating memory interference 14

Problem Formulation For a given periodic real-time task set T = {τ 1, τ 1,, τ n } on a critical core Problem: Determine T is schedulable on the critical core given memory access control budget Q and period P on the interfering core 15

Task Model CM C time computation memory fetch (cache stall) C : WCET of a task on isolated core (no interference) CM: number of last level cache misses (DRAM accesses) L: stall time of single cache miss 16

Memory Interference Model P : memory access controller period Q: memory access time budget α u (t): Linearized interfering memory traffic upper-bound 17

Background [Pellizzoni07] Accounting Memory Interference Cache bound: maximum interference time <= maximum number cache-accesses (CM) * L of the task under analysis Traffic bound: maximum interference time <= maximum bus time requested by the interfering core Cache-bound Traffic bound C: WCET account memory stall delay L: stall time of single cache miss [Pellizzoni07] Toward the predictable integration of real-time cots based systems, RTSS, 2007 18

Classic Response Time Analysis R i k+1 = C i + j<i R i k T j C j Tasks are sorted in priority order low index = high priority task C i : WCET of task i (in isolation w/o memory interference) R i : Response time of task i T j : Period of task I 19

Extended Response Time Analysis R i k+1 = C i + j<i R i k T j N t : aggregated cache misses over time t α u (t): interfering memory traffic over time t P: memory access control period Q: memory access time budget C j + min N R k L, α u R k where N t = α u t = t Q P Proposed method achieves tighter response time than using C j i t T j CM j + 2 Q(P Q) P 20

Outline Motivation Memory Bandwidth Control System Response Time Analysis Evaluation Conclusion 21

Linux Kernel Implementation Extending CPU bandwidth reservation feature of group scheduler Specify core and bandwidth (memory budget, period) mkdir /sys/fs/cgroup/core3; cd /sys/fs/cgroup/core3 echo 3 > cpuset.cpus core 3 echo 10000 > cpu.cfs_period_us echo 500000 > cpu.cfs_quota_event period cache-misses budget Added feature Monitor memory usage at every scheduler tick and context switch 22

Experimental Platform Intel Core2Quad Core 0 Core 1 Core 2 Core 3 L1-I L1-D L1-I L1-D L1-I L1-D L1-I L1-D L2 Cache L2 Cache System Bus DRAM Core 0,2 were disabled to simulate a private LLC system Running a modified Linux 3.2 kernel https://github.com/heechul/linux-sched-coreidle/tree/sched-3.2-throttle-v2 23

Synthetic Task App. membomb Core1 Core3 Shared system bus Memory Core under analysis runs a synthetic task with 50% memory bandwidth Vary throttling budget of the interfering core from 0 to 100% Two findings: (1) we can control interference, (2) analysis provide an upper bound (albeit still pessimistic) 24

H.264 Movie Playback Cache-miss counts sampled over every 100ms Some inaccuracy in regulation due to implementation limitation Current version is improved accurate by using hardware overflow interrupt 25

Conclusion Shared hardware resources in multi-core systems are big challenges for designing real-time systems We proposed and implemented a mechanism to provide memory bandwidth reservation capability on COTS multi-core processors We developed a response time analysis method using the proposed memory access control mechanism 26

Thank you. 27