Advanced Multiprocessor Programming


Advanced Multiprocessor Programming
Jesper Larsson Träff traff@par.tuwien.ac.at
Research Group Parallel Computing
Faculty of Informatics, Institute of Information Systems
Vienna University of Technology (TU Wien)

Combining and Counting (Chap. 12)

Parallelizing associative function application: get-and-increment on a shared counter as the running example.

Trivial solution: protect the counter with a lock (Java: synchronized method = monitor with condition variable). Properties:
- With no contention (no concurrent get-and-increment): O(1) and fast
- On contention: O(n) serialization of the n contending threads
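The trivial lock-protected counter can be sketched as below (a minimal illustration, not from the slides; class and method names are chosen for the example). The synchronized method is the monitor: mutual exclusion plus an implicit condition variable.

```java
// Trivial solution: a counter protected by a lock via a synchronized method.
// O(1) without contention; n contending threads are serialized (O(n)).
class LockedCounter {
    private int value = 0;

    public synchronized int getAndIncrement() {
        return value++; // returns the old value, then increments
    }
}

public class LockedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        final LockedCounter c = new LockedCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) c.getAndIncrement();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // 4 threads x 1000 increments = 4000; the next call returns 4000
        System.out.println(c.getAndIncrement());
    }
}
```

Every contending thread must acquire the same lock, which is exactly the O(n) serialization the combining tree below tries to avoid.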

Tree-based implementation (how?): always O(log n) per operation, even with no contention; but, possibly, also only O(log n) on contention. Latency/throughput trade-off: use a better data structure to get better throughput, possibly at the cost of higher latency per operation.

Combining tree

[Figure: a combining tree; the root (RT) holds the shared counter, initially 0]

Combining tree:
- The shared value (counter) is maintained at the root node
- Each thread is assigned to a leaf node; at most two threads share a leaf node
- At most two threads share an interior node
- Each node has a status indicating what to do next

To update the shared value (increment the counter), a thread starts at its leaf and works up the tree. If it meets another thread at some node, their two update values are combined; one thread remains active and proceeds upwards, eventually reaching the root and updating the shared value, while the other, passive thread waits for the result value from the active thread.

public class Node {
    enum Node_status { IDLE, FIRST, SECOND, RESULT, ROOT }

    boolean locked;
    Node_status status;
    int firstval, secondval;
    int result;
    Node parent;

    public Node() { // root constructor
        status = Node_status.ROOT;
        locked = false;
    }

    public Node(Node p) { // interior node
        parent = p;
        status = Node_status.IDLE;
        locked = false;
    }
}

Per-node fields: locked flag; status: I(dle), R(oo)T, F(irst), S(econd), R(esult); result; firstval/secondval: the value from the first/second thread.

public CombiningTree(int width) {
    Node[] nodes = new Node[width - 1];
    nodes[0] = new Node(); // the root
    for (int i = 1; i < width - 1; i++) {
        nodes[i] = new Node(nodes[(i - 1) / 2]); // parent of node i is node (i-1)/2
    }
    leaves = new Node[(width + 1) / 2]; // leaves is a field, used by getandinc()
    for (int i = 0; i < (width + 1) / 2; i++) {
        leaves[i] = nodes[width - 2 - i];
    }
}
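The index arithmetic can be checked with a small sketch (illustration only, not from the slides): for width = 8, leaf i is nodes[width - 2 - i], and the parent of nodes[j] is nodes[(j - 1) / 2], so consecutive leaves pair up under a shared parent.

```java
// Prints the leaf-to-node mapping of the CombiningTree constructor
// for width = 8 (8 threads, 4 leaves, 7 nodes, nodes[0] = root).
public class TreeLayoutDemo {
    public static void main(String[] args) {
        int width = 8;
        for (int i = 0; i < (width + 1) / 2; i++) {
            int leafIndex = width - 2 - i;           // leaf i lives here
            int parentIndex = (leafIndex - 1) / 2;   // its interior parent
            System.out.println("leaf " + i + " -> nodes[" + leafIndex
                + "], parent nodes[" + parentIndex + "]");
        }
    }
}
```

Leaves 0 and 1 map to nodes[6] and nodes[5] with common parent nodes[2]; leaves 2 and 3 map to nodes[4] and nodes[3] with common parent nodes[1], matching the rule that at most two threads share a leaf and at most two paths meet at an interior node.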

[Figure sequence: Thread A executes getandinc(); the root (RT) initially holds 0. A reserves its combining path, carries its increment to the root, updates the root from 0 to 1, and returns 0, the old value at the root.]

Phases of getandinc():
1. Reserve the combining path where the thread is active (precombine)
2. Write values on the active path (combine)
3. Perform the operation on the last node (op)
4. Distribute the result back down (distribute)

public int getandinc() {
    Stack<Node> stack = new Stack<Node>();
    Node leaf = leaves[ThreadID.get() / 2]; // ThreadID.get(): this thread's id
    Node node = leaf;
    // Phase 1: precombine - reserve the combining path where this thread is active
    while (node.precombine()) node = node.parent;
    Node last = node;
    // Phase 2: combine - write values on the active path
    node = leaf;
    int combined = 1;
    while (node != last) {
        combined = node.combine(combined);
        stack.push(node);
        node = node.parent;
    }
    // Phase 3: perform the operation on the last node
    int prior = last.op(combined);
    // Phase 4: distribute the result back down the path
    while (!stack.empty()) {
        node = stack.pop();
        node.distribute(prior);
    }
    return prior;
}

synchronized boolean precombine() throws InterruptedException {
    while (locked) wait();
    switch (status) {
    case IDLE:
        status = Node_status.FIRST;
        return true; // continue upwards
    case FIRST:
        // this (second) thread becomes passive and will have to wait
        locked = true;
        status = Node_status.SECOND;
        return false;
    case ROOT:
        return false;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

synchronized int combine(int combined) throws InterruptedException {
    while (locked) wait(); // wait on the (implicit) condition variable
    locked = true;
    firstval = combined;
    switch (status) {
    case FIRST:
        return firstval;
    case SECOND:
        return firstval + secondval;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

Java synchronized method: lock (mutual exclusion) plus an implicit condition variable (wait()/notifyAll()).
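The monitor idiom that combine() and the other node methods rely on can be shown in isolation (a minimal sketch, not from the slides; the Gate class and its names are invented for illustration): a synchronized method gives mutual exclusion on the object's lock, and wait()/notifyAll() use the object's implicit condition variable.

```java
// Minimal Java monitor: synchronized = lock, wait()/notifyAll() = condition.
class Gate {
    private boolean open = false;

    public synchronized void pass() throws InterruptedException {
        while (!open) wait(); // releases the lock, sleeps until notified
    }

    public synchronized void openGate() {
        open = true;
        notifyAll(); // wakes all threads waiting on this object's condition
    }
}

public class GateDemo {
    public static void main(String[] args) throws InterruptedException {
        final Gate g = new Gate();
        Thread t = new Thread(() -> {
            try {
                g.pass();
                System.out.println("passed");
            } catch (InterruptedException e) { /* ignore in demo */ }
        });
        t.start();
        Thread.sleep(50); // small delay so t very likely blocks in pass() first
        g.openGate();
        t.join();
    }
}
```

Note the while loop around wait(): a woken thread must re-check its condition, since notifyAll() wakes all waiters and another thread may have changed the state first. The node methods above use exactly this pattern with the locked flag and status field.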

synchronized int op(int combined) throws InterruptedException {
    switch (status) {
    case ROOT:
        int prior = result;
        result += combined;
        return prior; // the old value of the counter
    case SECOND:
        secondval = combined;
        locked = false;
        notifyAll(); // wake up the active thread waiting in combine()
        while (status != Node_status.RESULT) wait();
        locked = false;
        notifyAll();
        status = Node_status.IDLE;
        return result;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

synchronized void distribute(int prior) {
    switch (status) {
    case FIRST:
        status = Node_status.IDLE;
        locked = false;
        break;
    case SECOND:
        result = prior + firstval;
        status = Node_status.RESULT;
        break;
    default: // cannot happen
        throw new IllegalStateException();
    }
    notifyAll();
}

[Figure sequence: Threads A and B call getandinc() concurrently and meet at a shared node. A arrives first and becomes the active (First) thread; B becomes the passive (Second) thread. Their increments are combined, and A carries the combined value 2 to the root, updating it from 0 to 2. A returns 0, B returns 1.]

Properties:
- Fine-grained locking through synchronized methods; there is no lock on the whole data structure
- Blocking: threads may have to wait at locked nodes for the active thread to complete its update
- Linearizable
- Not unfair (what does that mean?)
- Not likely to be a competitor for a hardware fetch_add() operation, but could be useful for more complex update operations
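For comparison, the hardware fetch_add() mentioned above is available in Java as AtomicInteger.getAndIncrement(), which on mainstream JVMs compiles down to a single atomic instruction. A short sketch (illustration only; class name invented for the example):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hardware-backed fetch-and-add: no locks, no tree, one atomic instruction.
public class FetchAddDemo {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) counter.getAndIncrement();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // 8 threads x 10000 increments = 80000
        System.out.println(counter.get());
    }
}
```

The combining tree cannot beat this for plain increments; its interest lies in applying the same combining idea to operations that hardware does not support atomically.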