Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

Similar documents
Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea

SEER PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27 th Symposium on Parallel Architectures and Algorithms

Improving In-Memory Database Index Performance with Intel R Transactional Synchronization Extensions

Chapter 6, The Operating System Machine Level

Resource Utilization of Middleware Components in Embedded Systems

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Synchronization Extensions for High-Performance Computing

Challenges for synchronization and scalability on manycore: a Software Transactional Memory approach

Chapter 6 Concurrent Programming

GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages. Nicki Dell Spring 2014

Transactional Memory

Maximum Benefit from a Minimal HTM

Operating System for the K computer

First-class User Level Threads

Mutual Exclusion using Monitors

Improving the performance of data servers on multicore architectures. Fabien Gaud

A Thread Monitoring System for Multithreaded Java Programs

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010

WebSphere Architect (Performance and Monitoring) 2011 IBM Corporation

Multi-core Programming System Overview

THE VELOX STACK Patrick Marlier (UniNE)

Replication on Virtual Machines

Data Management in the Cloud

A Review of Customized Dynamic Load Balancing for a Network of Workstations

Combining Events And Threads For Scalable Network Services

BLM 413E - Parallel Programming Lecture 3

Module 15: Monitoring

What s Cool in the SAP JVM (CON3243)

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 5 - DBMS Architecture

The Design and Implementation of Scalable Parallel Haskell

Data Management for Portable Media Players

Performance Improvement In Java Application

Exploiting Hardware Transactional Memory in Main-Memory Databases

Hagit Attiya and Eshcar Hillel. Computer Science Department Technion

Linux Process Scheduling Policy

Monitors, Java, Threads and Processes

Transactionality and Fault Handling in WebSphere Process Server Web Service Invocations. version Feb 2011

Cloned Transactions: A New Execution Concept for Transactional Memory

I/O Device and Drivers

Cloud Haskell. Distributed Programming with. Andres Löh. 14 June 2013, Big Tech Day 6 Copyright 2013 Well-Typed LLP. .Well-Typed

Trace-Based and Sample-Based Profiling in Rational Application Developer

Course Development of Programming for General-Purpose Multicore Processors

A Comparative Study on Vega-HTTP & Popular Open-source Web-servers

Java SE 8 Programming

Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory

Computer Architecture Syllabus of Qualifying Examination

Real Time Programming: Concepts

Survey on Job Schedulers in Hadoop Cluster

Dynamic Load Balancing. Using Work-Stealing 35.1 INTRODUCTION CHAPTER. Daniel Cederman and Philippas Tsigas

Client/Server Computing Distributed Processing, Client/Server, and Clusters

Chapter 3 Operating-System Structures

Validating Java for Safety-Critical Applications

Have both hardware and software. Want to hide the details from the programmer (user).

aicas Technology Multi Core und Echtzeit Böse Überraschungen vermeiden Dr. Fridtjof Siebert CTO, aicas OOP 2011, 25 th January 2011

Enhancing SQL Server Performance

SYSTEM ecos Embedded Configurable Operating System

Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines

KITES TECHNOLOGY COURSE MODULE (C, C++, DS)

Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment

Between Mutual Trust and Mutual Distrust: Practical Fine-grained Privilege Separation in Multithreaded Applications

Eloquence Training What s new in Eloquence B.08.00

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Understanding Hardware Transactional Memory

Load balancing; Termination detection

An Oracle White Paper September Advanced Java Diagnostics and Monitoring Without Performance Overhead

Design Patterns in C++

DATABASE CONCURRENCY CONTROL USING TRANSACTIONAL MEMORY : PERFORMANCE EVALUATION

11.1 inspectit inspectit

Supporting OpenMP on Cell

Linux Performance Optimizations for Big Data Environments

Mobile Cloud Computing for Data-Intensive Applications

Operating System: Scheduling

CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015

Agility Database Scalability Testing

Features of AnyShare

The Microsoft Windows Hypervisor High Level Architecture

Parallel Processing and Software Performance. Lukáš Marek

System Software and TinyAUTOSAR

Performance Tools for Parallel Java Environments

How to analyse your system to optimise performance and throughput in IIBv9

Finding Lightweight Opportunities for Parallelism in.net C#

Multi-Threading Performance on Commodity Multi-Core Processors

Getting Started with CodeXL

OpenACC 2.0 and the PGI Accelerator Compilers

Instruction Set Architecture (ISA)

Chapter 2: OS Overview

VMware Server 2.0 Essentials. Virtualization Deployment and Management

The Advanced JTAG Bridge. Nathan Yawn 05/12/09

Inside the Erlang VM

HiDb: A Haskell In-Memory Relational Database

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

Practical Performance Understanding the Performance of Your Application

Operating Systems OBJECTIVES 7.1 DEFINITION. Chapter 7. Note:

Google File System. Web and scalability

Running applications on the Cray XC30 4/12/2015

Driving force. What future software needs. Potential research topics

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Middleware for Heterogeneous and Distributed Information Systems

Transcription:

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Ryan Yates 5-5-2014 1/21

Introduction Outline Thesis Why Haskell? Preliminary work Hybrid TM for GHC Obstacles to Performance Remaining Implementation Work Opportunities for Improvement Timeline 2/21

Thesis Work Haskell has existing and ongoing work that explores how to express concurrent and parallel execution. This high-level expressiveness gives opportunities to gain performance. Our focus is the synchronization performance of Concurrent Haskell. 3/21

Why Haskell? Active research community. Existing uses of STM. Purity by default and expressive type system gives unique opportunities and challenges. Controlling of effects avoids tricky problems. Lazy evaluation makes reasoning about performance difficult. 4/21

Preliminary Work Hybrid TM for GHC. Full Haskell transactions can be executed as hardware transactions. Global lock on commit phase of software fallback elided with HLE. Fully software fallback acquires global lock for validation and commit. 5/21

Three Forms of Transactions Hardware XBEGIN XEND XACQUIRE XRELEASE Software (HLE) start commit Lock acquire Lock release Software commit 6/21

Haskell Transactions as Hardware Transactions Start transaction with XBEGIN. Read values directly from TVars. Write values directly to TVars. Commit transaction with XEND. Check the fallback lock! 7/21

Fallback Start transaction with empty TRec. Reads Writes Each entry stores TVar, initial value, and new value. Search the TRec. If not found, read from the TVar and store in TRec. Write to TRec. 8/21

Fallback Commit Acquire global lock. Check that initial reads in TRec match values in TVars. Perform writes to TVars. Release global lock. 9/21

HLE Commit Acquire global lock. Elide the global lock. Check that initial reads in TRec match values in TVars. Perform writes to TVars. Release global lock. 10/21

Respecting the Fallback Commit y x A start commit B XBEGIN XEND x y 11/21

Obstacles to performance C Call Overhead Hardware transactions are sensitive to memory operations. Inlining read and write operations rather than calling C functions greatly increased performance. Indirection Indirection increases the number of memory locations that need to be read. 12/21

Indirection Node TRec key TVar prev value value index parent watch tvar left old right Watch Queue Watch Queue new color thread thread tvar next prev next prev old new... tvar old new 13/21

Handling retry and orelse retry Discard the effects of the current transaction and attempt the transaction again. orelse a b If action a reaches retry, then discard the writes of a and perform the action b. The original implementation blocks the transaction s thread when it encounters a retry without an orelse. It then waits for a change from a TVar in the read set to trigger a wake up. 14/21

Handling retry with Hardware Transactions To support the same implementation of retry in a hardware: Track the read set to know which TVars to wait on. Track the write set to know which TVars to wake up. Alternatively we could wake up all blocked transactions on every commit. 15/21

Something in the Middle Use constant space to approximate the read and write sets. Communicate the write set and discard writes by using XABORT. Only 8 bits can be communicated! Read-only transactions can use XEND. If we had non-transactional writes, we could do more. 16/21

Handling orelse with Hardware Transactions We cannot do a partial rollback of writes. Transactions that reach retry before any writes can move immediately to the alternative branch. Our idea: rewrite transactions to buffer writes until after the decision to invoke retry. 17/21

Opportunities for Improvement Purity by default. More code is free to move. More flexibility for improvements. Less reliance on static analysis. Ongoing work explores high-level APIs for concurrent and parallel programming. This work is built on simple primitives. Has the same challenges as our preliminary work. New and refined APIs. Leverage the expressiveness of the types system. Use high-level properties to improve synchronization. 18/21

Timeline Summer 2014 Finish retry and orelse support in hybrid TM. Start implementing benchmarks that exercise retry and orelse. Fall 2014 Finish benchmarks for retry and orelse. Implement compiler passes for improving code with orelse. Start work on efficient TArray support. Look for collaboration opportunities with LVar researchers. Spring 2015 Implement a NOrec-based STM and hybrid TM for GHC. Apply what we have learned to Lightweight Concurrency branch. Explore improvements to IO manager. Explore further compile-time distinctions between transactions. 19/21

Timeline Summer 2015 Implement improved random supply inside transactions. Formalize the Haskell TM s interaction with unsafe operations. Fall 2015 Make run-time system improvements specific to parallel and concurrent libraries. Explore transactions in the context of distributed processes. Begin writing thesis Spring 2016 Continue work on unfinished projects. Explore new opportunities discovered in the work done so far. Finish writing thesis and prepare defense. 20/21

Questions? 21/21