Advanced Multiprocessor Programming


Advanced Multiprocessor Programming
Jesper Larsson Träff traff@par.tuwien.ac.at
Research Group Parallel Computing
Faculty of Informatics, Institute of Information Systems
Vienna University of Technology (TU Wien)

Combining and Counting (Chap. 12)

Parallelizing associative function application: get-and-increment on a shared counter as the running example.

Trivial solution: protect the counter with a lock (Java: synchronized method = monitor with condition variable). Properties:
- With no contention (no concurrent get-and-increment): O(1) and fast
- On contention: O(n) serialization of the n contending threads
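The trivial lock-protected counter can be sketched as below (a minimal illustration, not from the slides; class and method names are chosen for the example). The synchronized method is the monitor: mutual exclusion plus an implicit condition variable.

```java
// Trivial solution: a counter protected by a lock via a synchronized method.
// O(1) without contention; n contending threads are serialized (O(n)).
class LockedCounter {
    private int value = 0;

    public synchronized int getAndIncrement() {
        return value++; // returns the old value, then increments
    }
}

public class LockedCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        final LockedCounter c = new LockedCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) c.getAndIncrement();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // 4 threads x 1000 increments = 4000; the next call returns 4000
        System.out.println(c.getAndIncrement());
    }
}
```

Every contending thread must acquire the same lock, which is exactly the O(n) serialization the combining tree below tries to avoid.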

Tree-based implementation (how?): always O(log n) per operation, even with no contention; but, possibly, also only O(log n) on contention. Latency/throughput trade-off: use a better data structure to get better throughput, possibly at the cost of higher latency per operation.

Combining tree

[Figure: a combining tree; the root (RT) holds the shared counter, initially 0]

Combining tree:
- The shared value (counter) is maintained at the root node
- Each thread is assigned to a leaf node; at most two threads share a leaf node
- At most two threads share an interior node
- Each node has a status indicating what to do next

To update the shared value (increment the counter), a thread starts at its leaf and works up the tree. If it meets another thread at some node, their two update values are combined; one thread remains active and proceeds upwards, eventually reaching the root and updating the shared value, while the other, passive thread waits for the result value from the active thread.

public class Node {
    enum Node_status { IDLE, FIRST, SECOND, RESULT, ROOT }

    boolean locked;
    Node_status status;
    int firstval, secondval;
    int result;
    Node parent;

    public Node() { // root constructor
        status = Node_status.ROOT;
        locked = false;
    }

    public Node(Node p) { // interior node
        parent = p;
        status = Node_status.IDLE;
        locked = false;
    }
}

Per-node fields: locked flag; status: I(dle), R(oo)T, F(irst), S(econd), R(esult); result; firstval/secondval: the value from the first/second thread.

public CombiningTree(int width) {
    Node[] nodes = new Node[width - 1];
    nodes[0] = new Node(); // the root
    for (int i = 1; i < width - 1; i++) {
        nodes[i] = new Node(nodes[(i - 1) / 2]); // parent of node i is node (i-1)/2
    }
    leaves = new Node[(width + 1) / 2]; // leaves is a field, used by getandinc()
    for (int i = 0; i < (width + 1) / 2; i++) {
        leaves[i] = nodes[width - 2 - i];
    }
}
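The index arithmetic can be checked with a small sketch (illustration only, not from the slides): for width = 8, leaf i is nodes[width - 2 - i], and the parent of nodes[j] is nodes[(j - 1) / 2], so consecutive leaves pair up under a shared parent.

```java
// Prints the leaf-to-node mapping of the CombiningTree constructor
// for width = 8 (8 threads, 4 leaves, 7 nodes, nodes[0] = root).
public class TreeLayoutDemo {
    public static void main(String[] args) {
        int width = 8;
        for (int i = 0; i < (width + 1) / 2; i++) {
            int leafIndex = width - 2 - i;           // leaf i lives here
            int parentIndex = (leafIndex - 1) / 2;   // its interior parent
            System.out.println("leaf " + i + " -> nodes[" + leafIndex
                + "], parent nodes[" + parentIndex + "]");
        }
    }
}
```

Leaves 0 and 1 map to nodes[6] and nodes[5] with common parent nodes[2]; leaves 2 and 3 map to nodes[4] and nodes[3] with common parent nodes[1], matching the rule that at most two threads share a leaf and at most two paths meet at an interior node.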

[Figure sequence: Thread A executes getandinc(); the root (RT) initially holds 0. A reserves its combining path, carries its increment to the root, updates the root from 0 to 1, and returns 0, the old value at the root.]

Phases of getandinc():
1. Reserve the combining path where the thread is active (precombine)
2. Write values on the active path (combine)
3. Perform the operation on the last node (op)
4. Distribute the result back down (distribute)

public int getandinc() {
    Stack<Node> stack = new Stack<Node>();
    Node leaf = leaves[ThreadID.get() / 2]; // ThreadID.get(): this thread's id
    Node node = leaf;
    // Phase 1: precombine - reserve the combining path where this thread is active
    while (node.precombine()) node = node.parent;
    Node last = node;
    // Phase 2: combine - write values on the active path
    node = leaf;
    int combined = 1;
    while (node != last) {
        combined = node.combine(combined);
        stack.push(node);
        node = node.parent;
    }
    // Phase 3: perform the operation on the last node
    int prior = last.op(combined);
    // Phase 4: distribute the result back down the path
    while (!stack.empty()) {
        node = stack.pop();
        node.distribute(prior);
    }
    return prior;
}

synchronized boolean precombine() throws InterruptedException {
    while (locked) wait();
    switch (status) {
    case IDLE:
        status = Node_status.FIRST;
        return true; // continue upwards
    case FIRST:
        // this (second) thread becomes passive and will have to wait
        locked = true;
        status = Node_status.SECOND;
        return false;
    case ROOT:
        return false;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

synchronized int combine(int combined) throws InterruptedException {
    while (locked) wait(); // wait on the (implicit) condition variable
    locked = true;
    firstval = combined;
    switch (status) {
    case FIRST:
        return firstval;
    case SECOND:
        return firstval + secondval;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

Java synchronized method: lock (mutual exclusion) plus an implicit condition variable (wait()/notifyAll()).
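The monitor idiom that combine() and the other node methods rely on can be shown in isolation (a minimal sketch, not from the slides; the Gate class and its names are invented for illustration): a synchronized method gives mutual exclusion on the object's lock, and wait()/notifyAll() use the object's implicit condition variable.

```java
// Minimal Java monitor: synchronized = lock, wait()/notifyAll() = condition.
class Gate {
    private boolean open = false;

    public synchronized void pass() throws InterruptedException {
        while (!open) wait(); // releases the lock, sleeps until notified
    }

    public synchronized void openGate() {
        open = true;
        notifyAll(); // wakes all threads waiting on this object's condition
    }
}

public class GateDemo {
    public static void main(String[] args) throws InterruptedException {
        final Gate g = new Gate();
        Thread t = new Thread(() -> {
            try {
                g.pass();
                System.out.println("passed");
            } catch (InterruptedException e) { /* ignore in demo */ }
        });
        t.start();
        Thread.sleep(50); // small delay so t very likely blocks in pass() first
        g.openGate();
        t.join();
    }
}
```

Note the while loop around wait(): a woken thread must re-check its condition, since notifyAll() wakes all waiters and another thread may have changed the state first. The node methods above use exactly this pattern with the locked flag and status field.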

synchronized int op(int combined) throws InterruptedException {
    switch (status) {
    case ROOT:
        int prior = result;
        result += combined;
        return prior; // the old value of the counter
    case SECOND:
        secondval = combined;
        locked = false;
        notifyAll(); // wake up the active thread waiting in combine()
        while (status != Node_status.RESULT) wait();
        locked = false;
        notifyAll();
        status = Node_status.IDLE;
        return result;
    default: // cannot happen
        throw new IllegalStateException();
    }
}

synchronized void distribute(int prior) {
    switch (status) {
    case FIRST:
        status = Node_status.IDLE;
        locked = false;
        break;
    case SECOND:
        result = prior + firstval;
        status = Node_status.RESULT;
        break;
    default: // cannot happen
        throw new IllegalStateException();
    }
    notifyAll();
}

[Figure sequence: Threads A and B call getandinc() concurrently and meet at a shared node. A arrives first and becomes the active (First) thread; B becomes the passive (Second) thread. Their increments are combined, and A carries the combined value 2 to the root, updating it from 0 to 2. A returns 0, B returns 1.]

Properties:
- Fine-grained locking through synchronized methods; there is no lock on the whole data structure
- Blocking: threads may have to wait at locked nodes for the active thread to complete its update
- Linearizable
- Not unfair (what does that mean?)
- Not likely to be a competitor for a hardware fetch_add() operation, but could be useful for more complex update operations
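For comparison, the hardware fetch_add() mentioned above is available in Java as AtomicInteger.getAndIncrement(), which on mainstream JVMs compiles down to a single atomic instruction. A short sketch (illustration only; class name invented for the example):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hardware-backed fetch-and-add: no locks, no tree, one atomic instruction.
public class FetchAddDemo {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) counter.getAndIncrement();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // 8 threads x 10000 increments = 80000
        System.out.println(counter.get());
    }
}
```

The combining tree cannot beat this for plain increments; its interest lies in applying the same combining idea to operations that hardware does not support atomically.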