Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea



Similar documents
Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

SEER PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27 th Symposium on Parallel Architectures and Algorithms

CSC 2405: Computer Systems II

Improving In-Memory Database Index Performance with Intel R Transactional Synchronization Extensions

Performance monitoring with Intel Architecture

Understanding Hardware Transactional Memory

Exception and Interrupt Handling in ARM

Computer-System Architecture

Mutual Exclusion using Monitors

Chapter 2: Computer-System Structures. Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General System Architecture

Technical Note. Micron NAND Flash Controller via Xilinx Spartan -3 FPGA. Overview. TN-29-06: NAND Flash Controller on Spartan-3 Overview

Systemnahe Programmierung KU

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (I)

Linux Driver Devices. Why, When, Which, How?

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

SYSTEM ecos Embedded Configurable Operating System

Jorix kernel: real-time scheduling

An Introduction to the ARM 7 Architecture

Lecture 17: Virtual Memory II. Goals of virtual memory

Chapter 6 Concurrent Programming

A Comparison Of Shared Memory Parallel Programming Models. Jace A Mogill David Haglin

CPU performance monitoring using the Time-Stamp Counter register

Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification

Whitepaper: performance of SqlBulkCopy

COS 318: Operating Systems

Virtualization Technologies

Using the CoreSight ITM for debug and testing in RTX applications

Coding conventions and C++-style

Synchronization. Todd C. Mowry CS 740 November 24, Topics. Locks Barriers

Operating Systems 4 th Class

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

µtasker Document FTP Client

RTOS Debugger for ecos

Page 1 of 5. IS 335: Information Technology in Business Lecture Outline Operating Systems

Keil C51 Cross Compiler

Keil Debugger Tutorial

Cortex-A9 MPCore Software Development

Storage. The text highlighted in green in these slides contain external hyperlinks. 1 / 14

Concurrent Programming

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

Introduction to Cloud Computing

AN141 SMBUS COMMUNICATION FOR SMALL FORM FACTOR DEVICE FAMILIES. 1. Introduction. 2. Overview of the SMBus Specification. 2.1.

Overview of the Cortex-M3

(General purpose) Program security. What does it mean for a pgm to be secure? Depends whom you ask. Takes a long time to break its security controls.

EXAM PRO:Design & Develop Windows Apps Using MS.NET Frmwk 4. Buy Full Product.

Intel 64 and IA-32 Architectures Software Developer s Manual

evm Virtualization Platform for Windows

Chapter 11: Input/Output Organisation. Lesson 06: Programmed IO

Xen and the Art of. Virtualization. Ian Pratt

A Study of Performance Monitoring Unit, perf and perf_events subsystem

Why Threads Are A Bad Idea (for most purposes)

Tradeoffs in Transactional Memory Virtualization

Chapter 6, The Operating System Machine Level

Virtual Machines. COMP 3361: Operating Systems I Winter

An Implementation Of Multiprocessor Linux

How To Write A Multi Threaded Software On A Single Core (Or Multi Threaded) System

Virtual Private Systems for FreeBSD

URI and UUID. Identifying things on the Web.

Facing the Challenges for Real-Time Software Development on Multi-Cores

Optimizing Application Performance with CUDA Profiling Tools

I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich

Operating Systems. Lecture 03. February 11, 2013

A Survey of Parallel Processing in Linux

System Calls and Standard I/O

6.828 Operating System Engineering: Fall Quiz II Solutions THIS IS AN OPEN BOOK, OPEN NOTES QUIZ.

Priority Based Implementation in Pintos

The Microsoft Windows Hypervisor High Level Architecture

Debugging with TotalView

Linux Scheduler. Linux Scheduler

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

User Manuals. Connection to Siemens S5 PU (AS511) Part Number: Version: 2. Date:

Below is a diagram explaining the data packet and the timing related to the mouse clock while receiving a byte from the PS-2 mouse:

Hardware Based Virtualization Technologies. Elsie Wahlig Platform Software Architect

ARM Thumb Microcontrollers. Application Note. Software ISO 7816 I/O Line Implementation. Features. Introduction

Boost SQL Server Performance Buffer Pool Extensions & Delayed Durability

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

System Requirements for Microsoft Dynamics NAV 2009 SP1

Introduction. What is an Operating System?

INTRODUCTION ADVANTAGES OF RUNNING ORACLE 11G ON WINDOWS. Edward Whalen, Performance Tuning Corporation

Google File System. Web and scalability

Distributed System Monitoring and Failure Diagnosis using Cooperative Virtual Backdoors

Chapter 1 Computer System Overview

Exploiting nginx chunked overflow bug, the undisclosed attack vector

SMTP-32 Library. Simple Mail Transfer Protocol Dynamic Link Library for Microsoft Windows. Version 5.2

ELEC 377. Operating Systems. Week 1 Class 3

Operating Systems CSE 410, Spring File Management. Stephen Wagner Michigan State University

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Transcription:

Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

Example: toy banking application with RTM

Code written and tested in emulator, not in hardware! Hardware did not exist yet Uses STL std::vector<int> as an array of accounts Perform a few operations on two accounts concurrently Protect each operation by protecting its entire scope 22

Classic lock-based implementation

{ std::cout << "open new account" << std::endl; TransactionScope guard; // protect everything in this scope Accounts.push_back(0); } { std::cout << "put 100 units into account 0" <<std::endl; TransactionScope guard; // protect everything in this scope Accounts[0] += 100; // atomic update due to RTM } class TransactionScope { SimpleSpinLock & lock; TransactionScope(); // forbidden public: TransactionScope(SimpleSpinLock & lock_): lock(lock_) {lock.lock();} ~TransactionScope() { lock.unlock(); } }; 24

Naive RTM implementation

class TransactionScope { public: TransactionScope() { int nretries = 0; while(1) { ++nretries; unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) return; // successful start // abort handler std::cout << "DEBUG: Transaction aborted "<< nretries << " time(s) with the status "<< status << std::endl; } } ~TransactionScope() { _xend(); } }; 26

Output open new account DEBUG: Transaction aborted 1 time(s) with the status 0 DEBUG: Transaction aborted 2 time(s) with the status 0 DEBUG: Transaction aborted 3 time(s) with the status 0 DEBUG: Transaction aborted 4 time(s) with the status 0 27

Improved RTM implementation Acquire a fall-back spin lock non-transactionally after a specified number of retries

TransactionScope(SimpleSpinLock & fallbacklock_, int max_retries = 3) : fallbacklock (fallbacklock_) { int nretries = 0; while(1) { ++nretries; unsigned status = _xbegin(); if (status == _XBEGIN_STARTED) { if (!fallbacklock.islocked()) return; //successfully started transaction /* started txn but someone is executing the txn section non-speculatively (acquired the fall-back lock) -> aborting */ _xabort(0xff); // abort with code 0xff } // abort handler InterlockedIncrement64(&naborted); // do abort statistics std::cout << "DEBUG: Transaction aborted "<< nretries << " time(s) with the status "<< status << std::endl; // handle _xabort(0xff) from above if( (status & _XABORT_EXPLICIT) && _XABORT_CODE(status)==0xff &&!(status & _XABORT_NESTED)) { // wait until the lock is free while (fallbacklock.islocked()) _mm_pause(); } // too many retries, take the fall-back lock if (nretries >= max_retries) break; } //end while fallbacklock.lock(); 29 }

~TransactionScope() { if (fallbacklock.islocked()) fallbacklock.unlock(); else _xend(); } open new account DEBUG: Transaction aborted 1 time(s) with the status 0 DEBUG: Transaction aborted 2 time(s) with the status 0 DEBUG: Transaction aborted 3 time(s) with the status 0 open new account put 100 units into account 0 transfer 10 units from account 0 to account 1 atomically! 30

All txns except the first one, succeeded First txn failed 3 times and then took the fallback path It failed repeatedly because of its expensive operations: reserving and touching memory for its vector from the OS, which involves: System calls Privilege transitions Page faults Initialization/zeroing of big memory chunks that may not fit in the transactional buffer 31

Leveraging RTM Abort Status Bits

RTM EAX abort codes EAX Register bit position 0 1 2 3 4 5 23:6 31:24 Meaning Set if abort set by XABORT If set, txn may succeed on retry; clear if bit 0 is set Set if another CPU conflicted with a mem addr part of the aborted txn Set if an internal buffer overflowed Set if debug breakpoint was hit Set if an abort occurred during nested txn Reserved XABORT argument (only valid if bit 0 set, otherwise reserved) 33

In case of such hard aborts, the retry bit (position 1) is not set If the hardware thinks the txn will succeed on retry, it will set the bit The optimization: if the bit is not set, choose the fallback already The idea is to make faster progress, rather than keep aborting and retrying when it seems unlikely that the txn will succeed on subsequent attempts 34

if( (status & _XABORT_EXPLICIT) && _XABORT_CODE(status)==0xff &&!(status & _XABORT_NESTED)) { // wait until the lock is free while (fallbacklock.islocked()) _mm_pause(); } else if(!(status & _XABORT_RETRY)) break; /* take the fall-back lock if the retry abort flag is not set */ // too many retries, take the fall-back lock if (nretries >= max_retries) break; open new account DEBUG: Transaction aborted 1 time(s) with the status 0 open new account Faster progress! put 100 units into account 0 transfer 10 units from account 0 to account 1 atomically! atomically draw 10 units from account 0 if there is enough money add 1000 empty accounts atomically 35

More Statistics Run two worker threads that randomly choose two accounts from the array, and write to both accounts (transfer money from one to the other) Outcome: 100-300 aborts for 20k txns Testing if it all works correctly: Introduce an update/increment of global counter in each txn (create many conflicts) Outcome: 5k-15k aborts 36