Fine-Grained Multiprocessor Real-Time Locking with Improved Blocking

Fine-Grained Multiprocessor Real-Time Locking with Improved Blocking
Bryan C. Ward and James H. Anderson
Dept. of Computer Science, UNC-Chapel Hill

Motivation Locks can be used to control access to shared memory objects and shared hardware resources (e.g., buses, GPUs, shared caches). However, locks cause blocking. Our goal: improve worst-case blocking so as to improve real-time schedulability.

Multi-Resource Locking Sometimes jobs need to access multiple shared resources concurrently. Two general approaches: coarse-grained locking and fine-grained locking. The Real-Time Nested Locking Protocol (RNLP) was the first fine-grained locking protocol for multicore real-time systems.
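
As a rough illustration of the difference (not part of the RNLP itself), here is how a job that must update two shared objects A and B might synchronize under each approach; the pthread-mutex code and all names are illustrative only:

```c
#include <pthread.h>

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER; /* coarse-grained */
static pthread_mutex_t lock_A     = PTHREAD_MUTEX_INITIALIZER; /* fine-grained   */
static pthread_mutex_t lock_B     = PTHREAD_MUTEX_INITIALIZER;

static int A, B;   /* two shared objects */

/* Coarse-grained: one lock protects both objects, so a request that
 * only touches A still blocks a request that only touches B. */
void update_both_coarse(void)
{
        pthread_mutex_lock(&group_lock);
        A++;
        B++;
        pthread_mutex_unlock(&group_lock);
}

/* Fine-grained: each object has its own lock and requests are nested.
 * Unrelated requests no longer interfere, but nesting introduces the
 * ordering and deadlock issues a protocol like the RNLP must handle. */
void update_both_fine(void)
{
        pthread_mutex_lock(&lock_A);
        pthread_mutex_lock(&lock_B);            /* nested request */
        A++;
        B++;
        pthread_mutex_unlock(&lock_B);
        pthread_mutex_unlock(&lock_A);
}
```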

Our Contributions We extend the RNLP in three ways: 1. Reduce system-call overhead. 2. Improve worst-case blocking when critical sections have different lengths. 3. Support replicated resources. These techniques can be combined, but are presented separately (unless noted).

RNLP Background The RNLP has two components: a token lock and a request satisfaction mechanism (RSM). [Figure: token lock in front of per-resource queues for A, B, and C, with tasks T3 and T4.] Different token locks can be used to achieve better blocking bounds under different schedulers and analysis assumptions. The RSM orders the satisfaction of requests; we focus exclusively on it for the rest of the talk.
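
A minimal structural sketch of this two-part design, assuming hypothetical helpers token_acquire, rsm_enqueue, and wait_until_satisfied (the actual rules for each component are defined in the RNLP paper):

```c
#include <stdbool.h>

struct job      { bool has_token; /* ... */ };
struct resource { int id;         /* ... */ };

/* Hypothetical helpers standing in for the two components. */
extern bool token_acquire(struct job *j);
extern void rsm_enqueue(struct job *j, struct resource *r);
extern void wait_until_satisfied(struct job *j, struct resource *r);

void rnlp_lock(struct job *j, struct resource *r)
{
        /* 1. Token lock: limits how many jobs may have outstanding
         *    resource requests; the choice of token lock affects the
         *    blocking bound under different schedulers and analyses. */
        if (!j->has_token)
                j->has_token = token_acquire(j);

        /* 2. Request satisfaction mechanism (RSM): orders the
         *    satisfaction of requests among token-holding jobs. */
        rsm_enqueue(j, r);
        wait_until_satisfied(j, r);
}
```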

Fine-Grained Locking Challenge [Figure: animation of tasks T3 and T4 issuing nested requests for resources A, B, and C under fine-grained locking.] Observation: a later-issued request (T4) can block earlier-issued requests (T3).

RNLP Solution [Figure: when a request is issued, placeholders are inserted into the queues of resources it may need later; T3 and T4 then enqueue behind them in the queues for A, B, and C.] Invariant: a later-issued request never blocks an earlier-issued request.

Dynamic Group Locks (DGLs) Goal: reduce the number of lock/unlock calls. A job requests all of its potentially needed resources at once, as a single request for a subset of all resources. The job may have to request resources that it doesn't end up using. The request is satisfied only when all of the requested resources are available.
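
A minimal sketch of the "one call acquires the whole group" idea, assuming resources are identified by bits in a mask; it deliberately omits the RNLP's timestamp ordering and real-time progress rules, so it illustrates only the interface, not the protocol:

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t state_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_changed = PTHREAD_COND_INITIALIZER;
static uint32_t held;   /* bit i set => resource i is currently held */

/* One lock call acquires every resource named in 'want'. */
void dgl_lock(uint32_t want)
{
        pthread_mutex_lock(&state_lock);
        while (held & want)                     /* some needed resource is busy */
                pthread_cond_wait(&state_changed, &state_lock);
        held |= want;                           /* acquire the whole group */
        pthread_mutex_unlock(&state_lock);
}

/* One unlock call releases the whole group. */
void dgl_unlock(uint32_t want)
{
        pthread_mutex_lock(&state_lock);
        held &= ~want;
        pthread_cond_broadcast(&state_changed);
        pthread_mutex_unlock(&state_lock);
}
```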

DGL Example [Figure: T3 and T4 each acquire their resource groups among A, B, and C with a single request.] Invariant: a later-issued request never blocks an earlier-issued request.

DGLs vs. RNLP Same worst-case blocking behavior. DGLs require only one lock and unlock call per outermost request instead of one per resource. RNLP may provide better observed blocking behavior.

Short vs. Long Critical sections may take longer for some resources than others. For example: GPUs = long (milliseconds). Shared memory objects = short (microseconds).

Short vs. Long Problem [Figure: queues for resources A and B (long) and C (short); a long request's placeholder sits in C's queue ahead of requests by T3 and T4.] Problem: a short request (T4) is blocked by the placeholder of a long request.

Eliminating Short-on-Long Blocking Key ideas: Long requests do not insert placeholders for short resources. Long requests enqueue in short resource queues based on outermost lock request time. Details in the paper show how different waiting mechanisms (spin vs. suspend) can be used for short and long resources. Asymptotic optimality retained.
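
As an illustration of the second idea, a short resource's queue can be kept ordered by the timestamp of each job's outermost request rather than by arrival order; the following insertion routine and its names are hypothetical, not the paper's implementation:

```c
#include <stdint.h>
#include <stddef.h>

struct request {
        uint64_t ts;             /* timestamp of the job's OUTERMOST request */
        struct request *next;
};

/* Insert 'req' into a short resource's queue in outermost-timestamp
 * order, so a long request that issues this nested request late still
 * takes the position its outermost timestamp entitles it to. */
void enqueue_by_timestamp(struct request **head, struct request *req)
{
        while (*head != NULL && (*head)->ts <= req->ts)
                head = &(*head)->next;
        req->next = *head;
        *head = req;
}
```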

Multi-unit Locking Resource model: For each resource, there are k replicas. Jobs require access to any of the k replicas. Example use: Shared hardware (e.g. GPUs).

k-RNLP FIFO-ordered queue per replica. At request time, the job is enqueued in the shortest replica queue of each potentially required resource. A placeholder could be inserted for potentially needed resources.
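
A minimal sketch of the shortest-replica-queue selection, with an illustrative data layout (queue contents and request-satisfaction rules are omitted):

```c
#define MAX_REPLICAS 8

struct replica_queue {
        int length;                  /* number of pending requests queued */
        /* ... FIFO of pending requests ... */
};

struct resource {
        int k;                                        /* number of replicas   */
        struct replica_queue queue[MAX_REPLICAS];     /* one FIFO per replica */
};

/* Return the index of the replica whose queue is currently shortest;
 * the issuing job is enqueued there at request time. */
int pick_replica(const struct resource *res)
{
        int best = 0;
        for (int i = 1; i < res->k; i++)
                if (res->queue[i].length < res->queue[best].length)
                        best = i;
        return best;
}
```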

Queue Structure [Figure: resources A and B each have a FIFO queue per replica (Replica 1, Replica 2); T3 is enqueued in one replica queue of each resource, and T4 arrives later.] Observation: later-issued requests (T4) may be satisfied earlier by acquiring a different replica.

k-RNLP Blocking Analysis Queue length is reduced by a factor of k. Worst-case blocking is at most the maximum queue length. In the paper, we show that the k-RNLP can be configured for O(m/k) or O(n/k) blocking under different analysis assumptions. Asymptotically optimal.
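
As a back-of-the-envelope illustration (not the paper's analysis): assume at most m requests compete for a resource with k replicas, each critical section lasts at most L_max, and waiting is by spinning. By pigeonhole, the shortest of the k FIFO queues has at most floor((m-1)/k) earlier requests in it when a new request arrives, so the new request's blocking B is bounded by

$$ B \;\le\; \left\lfloor \frac{m-1}{k} \right\rfloor L_{\max} \;=\; O\!\left(\frac{m}{k}\right) L_{\max}, $$

which is consistent with the O(m/k) bound stated above.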

Schedulability [Plot: hard real-time (HRT) schedulability as a function of system utilization, for k = 2.]

Conclusions We presented three RNLP modifications: DGLs, which reduce system-call overhead; the elimination of short-on-long blocking; and support for multi-unit resources. These modifications improve real-time schedulability while maintaining asymptotic optimality.

Questions?