Advanced Multiprocessor Programming Jesper Larsson Träff traff@par.tuwien.ac.at Research Group Parallel Computing Faculty of Informatics, Institute of Information Systems Vienna University of Technology (TU Wien)
Combining and Counting (Chap. 12)
Parallelizing associative function application: get-and-increment of a shared counter as example.
Trivial: protect the counter with a lock (Java: synchronized method = monitor with condition variable)
Properties:
With no contention (no concurrent get-and-increment): O(1) and fast
On contention: O(n) serialization of the n threads
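The trivial lock-protected counter can be sketched as follows (class name LockedCounter and the driver method are ours, not from the slides): synchronized makes each method a critical section on the counter object, i.e. a Java monitor.

```java
// Minimal sketch of the trivial lock-protected counter (names are ours):
// synchronized serializes all calls on the same counter object.
class LockedCounter {
    private int value = 0;

    // O(1) when uncontended; n concurrent calls are serialized, O(n) in total
    synchronized int getAndIncrement() {
        return value++;
    }

    synchronized int get() {
        return value;
    }

    // demo driver: nthreads threads each perform incs increments
    static int run(int nthreads, int incs) {
        LockedCounter c = new LockedCounter();
        Thread[] ts = new Thread[nthreads];
        for (int i = 0; i < nthreads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < incs; j++) c.getAndIncrement();
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return c.get(); // no increment is lost: the lock serializes them
    }
}
```

The lock guarantees correctness, but all n threads queue on the same lock, which is exactly the O(n) contention bottleneck the combining tree is designed to avoid.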
Tree-based implementation (how?):
Always O(log n) per operation, even when there is no contention
But, possibly, also O(log n) on contention
Latency/throughput trade-off: use a better data structure to get better throughput, possibly at the cost of higher latency per operation
Combining tree
[Figure: binary combining tree; root node RT holds counter value 0]
Combining tree:
The shared value (counter) is maintained at the root node
Each thread is assigned to a leaf node; (at most) two threads share a leaf node
At most two threads share an interior node
Each node has a status, indicating what to do next
To update the shared value (increment the counter): a thread starts at its leaf and works up the tree. If it meets another thread at some node, their two update values are combined; one thread remains active and proceeds upwards, eventually reaching the root and updating the shared value, while the other, passive thread waits for the result value from the active thread
public class Node {
    enum Node_status { IDLE, FIRST, SECOND, RESULT, ROOT };
    boolean locked;
    Node_status status;
    int firstval, secondval;
    int result;
    Node parent;

    public Node() { // root constructor
        status = Node_status.ROOT;
        locked = false;
    }
    public Node(Node p) { // interior node
        parent = p;
        status = Node_status.IDLE;
        locked = false;
    }
}
Node fields in the figures: Locked; Status: I(dle), R(oo)T, F(irst), S(econd), R(esult); Result; Value from first thread / value from second thread
public CombiningTree(int width) {
    Node[] nodes = new Node[width-1];
    nodes[0] = new Node(); // the root
    for (int i=1; i<width-1; i++) {
        nodes[i] = new Node(nodes[(i-1)/2]); // parent of node i is node (i-1)/2
    }
    leaves = new Node[(width+1)/2]; // leaves is a field, used by getandinc()
    for (int i=0; i<(width+1)/2; i++) {
        leaves[i] = nodes[width-1-i-1]; // leaves are the last nodes in the array
    }
}
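The constructor stores the width-1 tree nodes in a single array, implicit-heap style: root at index 0, parent of node i at index (i-1)/2, and the (width+1)/2 leaves in the last slots of the array. A small index-arithmetic sketch (class and method names are ours) makes the layout concrete:

```java
// Sketch of the implicit array layout used by the CombiningTree constructor
// (indices only, no Node objects; names are ours, not from the slides).
class TreeLayout {
    // parent index of node i in the implicit binary tree
    static int parent(int i) { return (i - 1) / 2; }

    // array index of the leaf shared by threads 2*l and 2*l+1,
    // i.e. nodes.length - l - 1 with nodes.length == width - 1
    static int leafIndex(int width, int l) { return (width - 1) - 1 - l; }

    // depth of node i = number of parent steps to reach the root,
    // i.e. the path length a thread may have to climb
    static int depth(int i) {
        int d = 0;
        while (i != 0) { i = parent(i); d++; }
        return d;
    }
}
```

For width = 8 (8 threads) there are 7 nodes, the 4 leaves sit at indices 6, 5, 4, 3, and every leaf is at depth 2, matching the O(log n) path length claimed earlier.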
Thread A: getandinc()
[Figure, animation: thread A walks up from its leaf toward the root RT, which holds 0]
1. Reserve combining path where thread is active
1. Reserve combining path where thread is active (precombine)
2. Write values on active path (combine)
1. Reserve combining path where thread is active (precombine)
2. Write values on active path (combine)
3. Perform op on last node
4. Distribute
Thread A: getandinc()
[Figure: combining path reserved and released again; root RT updated]
Return 0 (old value at root)
public int getandinc() {
    Stack<Node> stack = new Stack<Node>();
    Node leaf = leaves[ThreadID.get()/2];
    Node node = leaf;
    // 1. precombine: reserve the combining path
    while (node.precombine()) node = node.parent;
    Node last = node;
    // 2. combine: write values on the active path
    node = leaf;
    int combined = 1;
    while (node != last) {
        combined = node.combine(combined);
        stack.push(node);
        node = node.parent;
    }
    // 3. perform op on last node
    int prior = last.op(combined);
    // 4. distribute results back down the path
    while (!stack.empty()) {
        node = stack.pop();
        node.distribute(prior);
    }
    return prior;
}
synchronized boolean precombine() {
    while (locked) wait(); // InterruptedException handling elided
    switch (status) {
    case IDLE:
        status = Node_status.FIRST;
        return true;
    case FIRST: // passive thread, will have to wait
        locked = true;
        status = Node_status.SECOND;
        return false;
    case ROOT:
        return false;
    default: // cannot happen, throw exception
    }
}
synchronized int combine(int combined) {
    while (locked) wait(); // wait on condition variable
    locked = true;
    firstval = combined;
    switch (status) {
    case FIRST:
        return firstval;
    case SECOND:
        return firstval+secondval;
    default: // cannot happen, throw exception
    }
}
Java synchronized method: lock (mutual exclusion) and implicit condition variable
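The wait()/notifyAll() pattern the Node methods rely on can be isolated in a minimal monitor sketch (class name Gate is ours): every Java object carries a lock and one implicit condition variable; a synchronized method holds the lock, wait() releases it while blocking, and notifyAll() wakes all waiters, which must recheck their condition in a loop.

```java
// Minimal Java monitor sketch (names are ours, not from the slides):
// a gate that threads block on until some thread opens it -- the same
// lock + implicit-condition-variable mechanism used by the Node methods.
class Gate {
    private boolean open = false;

    synchronized void pass() throws InterruptedException {
        while (!open)   // recheck in a loop: wakeups may be spurious
            wait();     // releases the monitor lock while blocked
    }

    synchronized void open() {
        open = true;
        notifyAll();    // wake every thread blocked in pass()
    }

    // demo: a thread blocks in pass() until the gate is opened
    static boolean demo() {
        Gate g = new Gate();
        final boolean[] passed = { false };
        Thread t = new Thread(() -> {
            try { g.pass(); passed[0] = true; } catch (InterruptedException e) {}
        });
        t.start();
        g.open();
        try { t.join(); } catch (InterruptedException e) {}
        return passed[0];
    }
}
```

In the combining tree, the passive thread plays the role of the caller of pass() (waiting for status RESULT), and the active thread plays the role of open() when it distributes the result.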
synchronized int op(int combined) {
    switch (status) {
    case ROOT:
        int prior = result;
        result += combined;
        return prior;
    case SECOND:
        secondval = combined;
        locked = false;
        notifyAll(); // wake up waiting threads
        while (status != Node_status.RESULT) wait();
        locked = false;
        notifyAll();
        status = Node_status.IDLE;
        return result;
    default: // cannot happen, throw exception
    }
}
synchronized void distribute(int prior) {
    switch (status) {
    case FIRST:
        status = Node_status.IDLE;
        locked = false;
        break;
    case SECOND:
        result = prior + firstval;
        status = Node_status.RESULT;
        break;
    default: // cannot happen, throw exception
    }
    notifyAll();
}
Threads A and B: concurrent getandinc()
[Figure, animation: A and B meet at a shared node (status F, then S); B deposits its value and waits, A carries the combined value 2 to the root; root RT goes from 0 to 2]
Thread A returns 0, thread B returns 1
Properties:
Fine-grained locking by synchronized methods; there is no lock on the whole data structure
Blocking: threads may have to wait on locked nodes for the active thread to complete its update
Linearizable
Not unfair (what does that mean?)
Not likely to be a competitor for the hardware fetch_add() operation, but could be useful for more complex update operations?
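The hardware fetch_add() referred to above is available in Java as AtomicInteger.getAndIncrement(), typically a single atomic instruction with no locks and no tree. Linearizability can be observed directly: n concurrent increments return the n distinct prior values 0..n-1 in some order. A quick check (class and method names are ours):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (names are ours): verify that concurrent getAndIncrement() calls
// on a hardware-supported atomic counter return pairwise-distinct prior
// values 0..n-1, as a linearizable get-and-increment must.
class AtomicCount {
    static boolean distinctPriors(int nthreads, int incs) {
        AtomicInteger counter = new AtomicInteger(0);
        boolean[] seen = new boolean[nthreads * incs]; // seen[v]: some call returned v
        Thread[] ts = new Thread[nthreads];
        for (int i = 0; i < nthreads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < incs; j++)
                    seen[counter.getAndIncrement()] = true; // each index written at most once
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        for (boolean b : seen) if (!b) return false;
        return true; // every prior value 0..n-1 was returned exactly once
    }
}
```

For plain counting this is hard to beat; the combining tree's appeal is that the same precombine/combine/op/distribute skeleton works for any associative update, not just addition by one.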