SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27th Symposium on Parallel Architectures and Algorithms


1 SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY
27th Symposium on Parallel Architectures and Algorithms
Nuno Diegues, Paolo Romano and Stoyan Garbatov

2 Seer: Scheduling for Commodity HTM SPAA
The multi-core (r)evolution
Multi-cores are now ubiquitous
Concurrent programming is complex
[Diagram: CPU 1, CPU 2, CPU 3 and CPU 4; Shared Memory; Transactional Memory System]
Classic approach: Locking
  Hard to get right: fine-grained locks, deadlocks, correctness
Transactional Memory abstraction
  atomic { withdraw(acc1,val); deposit(acc2,val); }
  Programmer identifies atomic blocks
  Runtime implements synchronization

3 Too much optimism
Example: one transaction reads x (y = x) while another updates it (x++)
Problem: CPU time is wasted
  run other computations instead
  inhibit parallelism: improve cache usage, increase core frequency, reduce power consumption
Identify likely conflicts before they happen

4 Scheduler
Software TM (STM):
  the library has full control over concurrency
  it can pinpoint the culprit of a conflict
Hardware TM (HTM), available in commodity processors:
  feedback is quite limited
  only a rough categorization of the type of conflict

5 Objective: Scheduling for Commodity HTM
How to find the root cause of a data conflict?
Avoid running T1 and T2 concurrently

6 In an ideal world for HTMs
    xbegin
        withdraw(acc1,val)
        deposit(acc2,val)
    xend
Transactions may abort because of contention on the same memory locations
Transactions restart, and every transaction shall eventually succeed

7 In practice: HTMs are Best-Effort
No progress guarantees: a transaction may always abort
Aborts happen for a number of reasons:
  Forbidden instructions
  Capacity of caches (for reads and writes)
  Faults and signals
  Contending transactions, aborting each other

8 Single Global Lock (SGL): fall-back path for HTM
Hardware transaction executes if the SGL is free
Acquire the SGL depending on the retry policy
The SGL is a very simple scheduler:
  Ignores the root cause
  Takes a global decision (the SGL)
Adaptive Transaction Scheduling [SPAA08]
We need better scheduling for commodity HTMs

9 Related Work

Scheduler            Support for HTM?   Support for imprecise information?   Fine-grained scheduling?
ATS [SPAA08]         Yes                Yes                                  No
CAR-STM [PODC08]     No                 No                                   Yes
Shrink [PODC09]      No                 No                                   Yes
ProPS [Euro-Par14]   No                 No                                   Yes
SER [PPoPP10]        No                 No                                   Yes
TxLinux [SOSP07]     Yes                No                                   Yes
SOA [HiPEAC09/10]    Yes                No                                   Yes
Seer                 Yes                Yes                                  Yes

10 Key Idea
Transactions to be executed are announced
Many observations are collected upon transaction commit and abort:
  which transactions were active at the same time?
Over time, the outliers will be identifiable with high probability (w.h.p.)
A dynamic, fine-grained locking scheme is devised
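The bookkeeping on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `ObservationLog`, its method names and the single shared table are hypothetical choices made here, not Seer's actual data structures.

```python
from collections import defaultdict


class ObservationLog:
    """Hypothetical sketch: record which transactions were announced as
    active whenever another transaction commits or aborts."""

    def __init__(self):
        # thread id -> id of the transaction (atomic block) it announced
        self.active = {}
        # counts[x][y] = [commits of x, aborts of x] seen while y was active
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def announce(self, thread, tx):
        """A thread announces the transaction it is about to execute."""
        self.active[thread] = tx

    def finish(self, thread, committed):
        """Record the outcome of one attempt against all other active txs."""
        tx = self.active.pop(thread)
        slot = 0 if committed else 1
        for other in self.active.values():
            self.counts[tx][other][slot] += 1
```

For example, if T1 aborts while T2 is announced, `counts["T1"]["T2"]` becomes `[0, 1]`. A real implementation would keep such counters per thread to stay cheap; this sketch uses one table for brevity.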

11 Seer: overview
Transaction = source code transaction
[Diagram: active transactions]

12 Seer: details
Threads collect lightweight events independently (low overhead)
Locking scheme (re-)calculated periodically:
  One lock per transaction (atomic block in the application)
  T1's lock (L1) is taken by T2 if they are deemed to conflict
  T1 waits for L1 to be free before executing
Calculate conditional probabilities of commit/abort
Relevance threshold based on mean/stdev
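One plausible reading of "conditional probabilities" with a "relevance threshold based on mean/stdev" is sketched below in Python. It estimates the probability that x aborts given that y was active, then flags y when that estimate is an outlier among x's estimates. The function names, the `counts` layout and the factor `k` are assumptions made for illustration; the slides do not give the exact formula.

```python
from statistics import mean, pstdev


def abort_probability(commits, aborts):
    """Estimate P(x aborts | y active) from observed counts."""
    total = commits + aborts
    return aborts / total if total else 0.0


def conflicting_pairs(counts, k=1.0):
    """counts[x][y] = (commits, aborts) of x observed while y was active.

    Returns the pairs (x, y) whose abort probability exceeds
    mean + k * stdev of all probabilities observed for x (assumed rule).
    """
    pairs = set()
    for x, row in counts.items():
        probs = {y: abort_probability(c, a) for y, (c, a) in row.items()}
        if len(probs) < 2:
            continue  # too few observations to call anything an outlier
        values = list(probs.values())
        cutoff = mean(values) + k * pstdev(values)
        for y, p in probs.items():
            if p > cutoff:
                pairs.add((x, y))
    return pairs
```

With `counts = {"T1": {"T2": (1, 9), "T3": (9, 1), "T4": (10, 0)}}`, only T2 stands out, so a scheme built from the result would have T1 and T2 take each other's lock.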

13 Seer: details
Each pair of transactions (x,y) acquires the lock of each other if:
  Are abort events of x common enough with y running concurrently?
  Is y one of the main causes for x to abort?
A hill-climbing-based adaptive loop searches for the optimal thresholds
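A generic hill-climbing loop of the kind named on this slide could look like the Python sketch below. Here `measure` is a hypothetical callback standing in for whatever performance signal is sampled for a given threshold (for instance, throughput over the last period); the step-halving rule, the bounds and the number of rounds are assumptions, not details taken from the talk.

```python
def hill_climb(measure, start, step=0.1, lo=0.0, hi=1.0, rounds=20):
    """Search [lo, hi] for a threshold that maximises measure(threshold)."""
    best, best_score = start, measure(start)
    for _ in range(rounds):
        # Try a neighbouring threshold, clamped to the allowed range.
        candidate = min(hi, max(lo, best + step))
        score = measure(candidate)
        if score > best_score:
            best, best_score = candidate, score  # keep moving this way
        else:
            step = -step / 2  # reverse direction and refine the step
    return best
```

Called with a toy objective such as `lambda t: -(t - 0.3) ** 2` and `start=0.8`, the loop walks down towards 0.3 and then oscillates around it with shrinking steps.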

14 Seer: optimizations
Only one thread (re-)calculates the locking scheme:
  Whenever it is waiting for the SGL (some thread is on the fall-back path)
  If the SGL is rarely taken, then scheduling will not improve
Capacity aborts: another limitation from the best-effort nature
  Per-core lock
  Taken when capacity aborts occur
  Tailored for hyper-thread usage
Lock acquisition
  Hardware transaction used as a multi-CAS for 2+ locks

15 Evaluation
Intel Haswell, 4 cores (8 hyper-threads)
HLE: Intel Hardware Lock Elision, i.e., no scheduling
RTM: Intel commodity HTM with an SGL
SCM: Software-assisted Contention Management [PODC14]
  schedules with a (single) auxiliary lock
  the aux lock is not read speculatively (in the hw tx)
Seer: our Probabilistic Scheduler on top of Intel RTM

16 How much can we gain with Seer?
[Plots: speedup vs. number of threads, for Genome and Intruder]
50% geometric mean speedup in STAMP

17 What motivates these gains?
HLE: 77% with fall-back lock
RTM: 37% with SGL
SCM: 5% with SGL, 29% with (single) auxiliary lock
Seer:
  3% with at least one tx lock
  4% with core lock
  12% with tx + core locks
  1% with SGL
Fine-grained locks
Geometric mean over STAMP with 8 threads

18 Relevance of each mechanism?
Transaction locks: detect conflicts inherent to benchmarks
Core locks: only relevant for more than 4 threads (hyper-threading)
HTM lock acquisition: small improvement, benchmark dependent (the more locks, the better)
Threshold tuning for probabilities: consistent/small improvement
Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions.

19 Summary
First scheduler tailored for commodity HTMs:
  Copes with imprecise information
  Schedules transactions in a fine-grained manner
50% performance improvement with 8 threads
0-8% overhead from monitoring/calculation
  Measured by running Seer without acquiring locks

20 Thank you. Questions?
Nuno Diegues, Paolo Romano and Stoyan Garbatov

21 Backup slides

22 HTM with a fall-back path
    start:
        int status = htm_begin
    code:
        application logic
        htm_end                     // fast-path

23 HTM with a fall-back path
    start:
        int status = htm_begin
        if (status == ok)           // != ok when aborted
            if (fallback-in-use())
                htm_abort           // fall-back in use
            else
                goto code           // fast-path
        ??
    code:
        application logic
        if (infastpath)
            htm_end                 // fast-path
        else
            ??

24 HTM with a fall-back path
    start:
        int status = htm_begin
        if (status == ok)           // != ok when aborted
            if (fallback-in-use())
                htm_abort           // fall-back in use
            else
                goto code           // fast-path
        if (shouldretry())          // retry policy
            goto start
        else
            use-fallback()          // use fall-back
    code:
        application logic
        if (infastpath)
            htm_end                 // fast-path
        else
            quit-fallback()         // fall-back

25 HTM with a fall-back: a single lock
Still simple enough.
    start:
        int status = htm_begin
        if (status == ok)           // != ok when aborted
            if (istaken(lock))
                htm_abort           // fall-back in use
            else
                goto code           // fast-path
        if (shouldretry())          // retry policy: e.g., limit retries to 10
            goto start
        else
            acquire(lock)           // use fall-back
    code:
        application logic
        if (infastpath)             // fast-path
            htm_end
        else                        // fall-back
            release(lock)
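The control flow of this slide can be exercised without TSX hardware by simulating the hardware attempt. The Python sketch below is such a simulation, not an RTM implementation: `try_htm` is a hypothetical callback that runs the body and reports whether the simulated hardware transaction committed, and `run_transaction` is a name introduced here. The limit of 10 retries follows the example retry policy on the slide.

```python
import threading

MAX_RETRIES = 10  # example retry policy from the slide


def run_transaction(body, try_htm, lock, max_retries=MAX_RETRIES):
    """Run body on the simulated fast path, else under the single lock.

    try_htm(body) is a hypothetical stand-in for htm_begin/htm_end: it
    returns True if the simulated hardware transaction committed.
    """
    for _ in range(max_retries):
        if lock.locked():
            continue  # fall-back in use: this attempt counts as aborted
        if try_htm(body):
            return "fast-path"
    # Retries exhausted: serialize on the single lock (fall-back path).
    with lock:
        body()
    return "fall-back"
```

With a `try_htm` that always commits, the call returns `"fast-path"`; with one that always aborts, the body runs exactly once under the lock and the call returns `"fall-back"`. On real hardware the lock must be read inside the transaction so that a fall-back acquisition aborts concurrent hardware transactions, which this simulation does not model.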
