SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27th Symposium on Parallel Architectures and Algorithms

1 Seer: Probabilistic Scheduling for Commodity Hardware Transactional Memory
  27th Symposium on Parallel Architectures and Algorithms
  Nuno Diegues, Paolo Romano and Stoyan Garbatov

2 Seer: Scheduling for Commodity HTM, SPAA
  The multi-core (r)evolution
  - Shared-memory multi-cores are now ubiquitous
  - Concurrent programming is complex
  Classic approach: locking
  - Hard to get right: fine-grained locks, deadlocks, correctness
  Transactional Memory abstraction
    atomic { withdraw(acc1,val); deposit(acc2,val); }
  - Programmer identifies atomic blocks
  - Runtime implements synchronization
  [Figure: CPU 1, CPU 2, CPU 3 and CPU 4 on top of a Transactional Memory System]

3 Too much optimism
  [Figure: y = x and x++]
  Problem: CPU time is wasted
  - run other computations instead
  - inhibit parallelism
  - improve cache usage
  - increase core frequency
  - reduce power consumption
  Identify likely conflicts before they happen

4 Scheduler
  Software TM (STM):
  - library has full concurrency control
  - can point precisely to the culprit for the conflict
  Hardware TM (HTM):
  - feedback is quite limited
  - rough categorization for the type of conflict
  HTM available in commodity processors

5 Objective: Scheduling for Commodity HTM
  How to find the root cause for the data conflict?
  Avoid running T1 and T2 concurrently

6 In an ideal world for HTMs
    xbegin
      withdraw(acc1,val)
      deposit(acc2,val)
    xend
  - Transactions may abort because of contention on the same memory locations
  - Transactions restart
  - and every transaction shall eventually succeed

7 In practice: HTMs are best-effort
  No progress guarantees: a transaction may always abort, for a number of reasons:
  - Forbidden instructions
  - Capacity of caches (for reads and writes)
  - Faults and signals
  - Contending transactions, aborting each other

8 Single Global Lock (SGL)
  SGL fall-back path for HTM
  - Hardware transaction executes if SGL is free
  - Acquire SGL depending on retry policy
  SGL is a very simple scheduler
  - Ignores the root cause
  - Takes a global decision: the SGL
  Adaptive Transaction Scheduling [SPAA08]
  We need better scheduling for commodity HTMs

9 Related Work

  Scheduler            Support for HTM?   Support for Imprecise Information?   Schedules Transactions in a Fine-Grained Fashion?
  ATS [SPAA08]         Yes                Yes                                  No
  CAR-STM [PODC08]     No                 No                                   Yes
  Shrink [PODC09]      No                 No                                   Yes
  ProPS [Euro-Par14]   No                 No                                   Yes
  SER [PPoPP10]        No                 No                                   Yes
  TxLinux [SOSP07]     Yes                No                                   Yes
  SOA [HiPEAC09/10]    Yes                No                                   Yes
  Seer                 Yes                Yes                                  Yes

10 Key Idea
  - Transactions to be executed are announced
  - Many observations are collected upon transaction commit and abort: which transactions were active at the same time?
  - Over time, the outliers will be identifiable w.h.p.
  - A dynamic, fine-grained locking scheme is devised
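The announce-and-observe idea above can be sketched as a small simulation. This is only an illustration under assumptions: the class name `Observations`, its methods, and the counter layout are hypothetical and are not taken from the Seer implementation, which collects these events per thread around real hardware transactions.

```python
from collections import Counter


class Observations:
    """Hypothetical event log: who was active when a transaction ended."""

    def __init__(self, n_threads):
        # One announcement slot per thread: the atomic block it is running.
        self.active = [None] * n_threads
        # (x, y) -> number of times x committed/aborted while y was active.
        self.commits = Counter()
        self.aborts = Counter()

    def begin(self, thread, tx):
        """Announce that `thread` is about to run atomic block `tx`."""
        self.active[thread] = tx

    def end(self, thread, committed):
        """Record the outcome together with the concurrently active blocks."""
        tx = self.active[thread]
        self.active[thread] = None
        concurrent = [t for t in self.active if t is not None]
        target = self.commits if committed else self.aborts
        for other in concurrent:
            target[(tx, other)] += 1


obs = Observations(n_threads=3)
obs.begin(0, "T1")
obs.begin(1, "T2")
obs.end(0, committed=False)   # T1 aborts while T2 is active
obs.begin(0, "T1")
obs.begin(2, "T3")
obs.end(0, committed=True)    # T1 commits while T2 and T3 are active
```

The hardware does not say which transaction caused an abort, so the log only records co-occurrence; the culprit has to be inferred statistically from many such samples.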

11 Seer: overview
  Transaction = source code transaction
  [Figure: active transactions]

12 Seer: details
  - Threads collect lightweight events independently (low overhead)
  - Locking scheme (re-)calculated periodically
  - One lock per transaction (atomic block in the application)
  - T1's lock (L1) is taken by T2 if they are deemed to conflict
  - T1 waits for L1 to be free before executing
  - Calculate conditional probabilities of commit/abort
  - Relevance threshold based on mean/stdev
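One possible reading of the last two bullets is sketched below. The slides do not give the exact statistic, so everything here is an assumption made for illustration: the function names `abort_probabilities` and `conflicting_pairs`, the estimate P(x aborts | y active) = aborts / (aborts + commits), and the cutoff of mean plus `k` standard deviations.

```python
from statistics import mean, pstdev


def abort_probabilities(aborts, commits):
    """Estimate P(x aborts | y active) for every observed pair (x, y)."""
    probs = {}
    for pair in set(aborts) | set(commits):
        a = aborts.get(pair, 0)
        c = commits.get(pair, 0)
        if a + c > 0:
            probs[pair] = a / (a + c)
    return probs


def conflicting_pairs(probs, k=1.0):
    """Flag pairs whose abort probability is an outlier (assumed cutoff)."""
    values = list(probs.values())
    if len(values) < 2:
        return set()
    cutoff = mean(values) + k * pstdev(values)
    return {pair for pair, p in probs.items() if p > cutoff}


# Made-up counts: T1 aborts 9 times out of 10 when T2 is active.
aborts = {("T1", "T2"): 9, ("T1", "T3"): 1, ("T2", "T3"): 1, ("T3", "T1"): 1}
commits = {("T1", "T2"): 1, ("T1", "T3"): 9, ("T2", "T3"): 9, ("T3", "T1"): 9}

probs = abort_probabilities(aborts, commits)
conflicting = conflicting_pairs(probs, k=1.0)
```

With these counts only the pair (T1, T2) stands out, so in the scheme described on the slide those two atomic blocks would be serialized through a transaction lock while all other pairs keep running concurrently.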

13 Seer: details
  For each pair of transactions (x,y), acquire the lock of each other if:
  - Are abort events of x common enough with y running concurrently?
  - Is y one of the main causes for x to abort?
  Hill-climbing-based adaptive loop for optimal threshold search.
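A generic one-dimensional hill climb gives a feel for the adaptive threshold search mentioned above. It is a sketch under assumptions, not Seer's tuning loop: `hill_climb`, its parameters, and the `measure` callback (standing in for "run with this threshold for a while and report throughput") are all hypothetical.

```python
def hill_climb(measure, start, step, lo, hi, rounds=20):
    """Move the threshold in whichever direction improves `measure`."""
    best_x = start
    best_y = measure(start)
    for _ in range(rounds):
        improved = False
        for candidate in (best_x + step, best_x - step):
            if lo <= candidate <= hi:
                y = measure(candidate)
                if y > best_y:
                    best_x, best_y = candidate, y
                    improved = True
                    break
        if not improved:
            step /= 2   # no neighbour is better: search more finely
    return best_x


# Toy objective with a single peak at threshold 0.7 (made up for the demo).
def toy_throughput(threshold):
    return -(threshold - 0.7) ** 2


best = hill_climb(toy_throughput, start=0.2, step=0.1, lo=0.0, hi=1.0)
```

In a running system the measurement is noisy and the best threshold can drift with the workload, which is why the slide describes the search as an adaptive loop and not a one-off optimization.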

14 Seer: optimizations
  Only one thread (re-)calculates the locking scheme:
  - Whenever it is waiting for the SGL (some thread is on the fallback path)
  - If the SGL is rarely taken, then scheduling will not improve
  Capacity aborts: another limitation from the best-effort nature
  - Per-core lock
  - Taken when capacity aborts occur
  - Tailored for hyper-thread usage
  Lock acquisition
  - Hardware transaction used as multi-CAS for 2+ locks

15 Evaluation
  Intel Haswell, 4 cores (8 hyper-threads)
  - HLE: Intel Hardware Lock Elision, i.e., no scheduling
  - RTM: Intel commodity HTM with an SGL
  - SCM: Software-assisted Contention Management [PODC14]: schedule with a (single) auxiliary lock; the aux lock is not read speculatively (in hw tx)
  - Seer: our probabilistic scheduler on top of Intel RTM

16 How much can we gain with Seer?
  [Plots: speedup vs. number of threads for Genome and Intruder]
  50% geometric mean speedup in STAMP

17 What motivates these gains?
  - HLE: 77% with fall-back lock
  - RTM: 37% with SGL
  - SCM: 5% with SGL, 29% with (single) auxiliary lock
  - Seer (fine-grained locks): 3% with at least one tx lock, 4% with core lock, 12% with tx + core locks, 1% with SGL
  Geometric mean over STAMP w/ 8 threads

18 Relevance of each mechanism?
  - Transaction locks: detect conflicts inherent to benchmarks
  - Core locks: only relevant for >4 threads (hyper-threading)
  - HTM lock acquisition: small improvement, benchmark dependent; the more locks, the better
  - Threshold tuning for probabilities: consistent/small improvement
  Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions.

19 Summary
  First scheduler tailored for commodity HTMs:
  - Copes with imprecise information
  - Schedules transactions in a fine-grained manner
  50% performance improvement with 8 threads
  0-8% overhead from monitoring/calculation
  - Taken by measuring Seer, but without acquiring locks

20 Thank you
  Questions?
  Nuno Diegues, Paolo Romano and Stoyan Garbatov

21 Backup slides

22 HTM with a fall-back path
  start:
    int status = htm_begin
  code:
    application logic
    htm_end                       // fast-path

23 HTM with a fall-back path
  start:
    int status = htm_begin
    if (status == ok)             // != ok when aborted
      if (fallback-in-use())
        htm_abort                 // fall-back in use
      else
        goto code                 // fast-path
    ??
  code:
    application logic
    if (infastpath)
      htm_end                     // fast-path
    else
      ??

24 HTM with a fall-back path
  start:
    int status = htm_begin
    if (status == ok)             // != ok when aborted
      if (fallback-in-use())
        htm_abort                 // fall-back in use
      else
        goto code                 // fast-path
    if (shouldretry())            // retry policy
      goto start
    else
      use-fallback()              // use fall-back
  code:
    application logic
    if (infastpath)
      htm_end                     // fast-path
    else
      quit-fallback()             // fall-back

25 HTM with a fall-back: a single lock
  start:
    int status = htm_begin
    if (status == ok)             // != ok when aborted
      if (istaken(lock))
        htm_abort                 // fall-back in use
      else
        goto code                 // fast-path
    if (shouldretry())            // retry policy: e.g., limit retries to 10
      goto start
    else
      acquire(lock)               // use fall-back
  code:
    application logic
    if (infastpath)               // fast-path
      htm_end
    else                          // fall-back
      release(lock)
  Still simple enough.
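The control flow of the single-lock fall-back can be exercised without TSX hardware by simulating the hardware transaction. The sketch below is an assumption-laden illustration: `run_atomic`, `flaky_hw` and the `try_hw` callback are hypothetical names, the simulated transaction just returns True on commit and False on abort, and a real implementation would use the processor's transactional instructions and read the lock inside the transaction.

```python
import threading

MAX_RETRIES = 10   # retry policy from the slide: limit retries to 10


def run_atomic(body, try_hw, sgl, max_retries=MAX_RETRIES):
    """Run `body` on the fast path if possible, else under the global lock."""
    for _ in range(max_retries):
        if sgl.locked():
            continue              # fall-back in use: counts as an abort
        if try_hw(body):
            return "fast-path"    # simulated hardware commit
    with sgl:                     # retries exhausted: take the fall-back
        body()
    return "fall-back"


def flaky_hw(aborts_before_commit):
    """Simulated hardware transaction that aborts a fixed number of times."""
    remaining = [aborts_before_commit]

    def attempt(body):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False          # abort: body's effects never happen
        body()
        return True

    return attempt


balance = {"acc1": 100, "acc2": 0}


def transfer():
    balance["acc1"] -= 10
    balance["acc2"] += 10


sgl = threading.Lock()
first = run_atomic(transfer, flaky_hw(3), sgl)     # commits on the 4th try
second = run_atomic(transfer, flaky_hw(50), sgl)   # gives up, takes the lock
```

The second call shows the weakness discussed in the talk: once the retry budget is spent, the transaction serializes against every other transaction, whether or not they conflict with it.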

Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea

Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea Intel TSX (Transactional Synchronization Extensions) Mike Dai Wang and Mihai Burcea 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Example: toy banking application with RTM Code written and tested in

More information

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell Ryan Yates 5-5-2014 1/21 Introduction Outline Thesis Why Haskell? Preliminary work Hybrid TM for GHC Obstacles to Performance

More information

UConn UTC Workshop on Advanced Systems Engineering Research and Curriculum Development

UConn UTC Workshop on Advanced Systems Engineering Research and Curriculum Development Multicore Software for Safety Critical Applications Omer Khan Assistant Professor of Electrical and Computer Engineering University of Connecticut khan@uconn.edu, (860)-486-2192 Workshop on Research and

More information

Challenges for synchronization and scalability on manycore: a Software Transactional Memory approach

Challenges for synchronization and scalability on manycore: a Software Transactional Memory approach Challenges for synchronization and scalability on manycore: a Software Transactional Memory approach Maurício Lima Pilla André Rauber Du Bois Adenauer Correa Yamin Ana Marilza Pernas Fleischmann Gerson

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

Improving In-Memory Database Index Performance with Intel R Transactional Synchronization Extensions

Improving In-Memory Database Index Performance with Intel R Transactional Synchronization Extensions Appears in the 20th International Symposium On High-Performance Computer Architecture, Feb. 15 - Feb. 19, 2014. Improving In-Memory Database Index Performance with Intel R Transactional Synchronization

More information

Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems

Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems Department of Computer Science Institute for Systems Architecture, Systems Engineering Group Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems Andrey Brito1, Christof Fetzer1,

More information

Transactional Memory

Transactional Memory Transactional Memory Konrad Lai Microprocessor Technology Labs, Intel Intel Multicore University Research Conference Dec 8, 2005 Motivation Multiple cores face a serious programmability problem Writing

More information

ByteSTM: Virtual Machine-Level Java Software Transactional Memory

ByteSTM: Virtual Machine-Level Java Software Transactional Memory ByteSTM: Virtual Machine-Level Java Software Transactional Memory Mohamed Mohamedin, Binoy Ravindran, and Roberto Palmieri Virginia Tech USA {mohamedin,binoy,robertop}@vt.edu Coordination 2013 Concurrency

More information

Software and the Concurrency Revolution

Software and the Concurrency Revolution Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )

More information

Synchronization Extensions for High-Performance Computing

Synchronization Extensions for High-Performance Computing Performance Evaluation of Intel Transactional R Synchronization Extensions for High-Performance Computing Richard M. Yoo richard.m.yoo@intel.com Konrad Lai konrad.lai@intel.com Christopher J. Hughes christopher.j.hughes@intel.com

More information

Maximum Benefit from a Minimal HTM

Maximum Benefit from a Minimal HTM Maximum Benefit from a Minimal HTM Owen S. Hofmann Christopher J. Rossbach Emmett Witchel University of Texas at Austin osh@cs.utexas.edu, rossbach@cs.utexas.edu, witchel@cs.utexas.edu Abstract A minimal,

More information

Adaptive thread scheduling techniques for improving scalability of software transactional memory. Title. Chan, K; Lam, KT; Wang, CL

Adaptive thread scheduling techniques for improving scalability of software transactional memory. Title. Chan, K; Lam, KT; Wang, CL Title Adaptive thread scheduling techniques for improving scalability of software transactional memory Author(s) Chan, K; Lam, KT; Wang, CL Citation The 10th IASTED International Conference on Parallel

More information

Intel Hyper-Threading. Matthew Joyner & Mike Diep Computer Architecture

Intel Hyper-Threading. Matthew Joyner & Mike Diep Computer Architecture Intel Hyper-Threading Matthew Joyner & Mike Diep Computer Architecture Outline What is Multithreading Temporal Multithreading Simultaneous Multithreading Intel's Hyper-Threading Architecture Performance

More information

Performance Evaluation of Adaptivity in Software Transactional Memory

Performance Evaluation of Adaptivity in Software Transactional Memory Performance Evaluation of Adaptivity in Software Transactional Memory Mathias Payer ETH Zurich, Switzerland mathias.payer@inf.ethz.ch Thomas R. Gross ETH Zurich, Switzerland trg@inf.ethz.ch Abstract Transactional

More information

Parallel Computing and Performance Evaluation -- Amdahl s Law

Parallel Computing and Performance Evaluation -- Amdahl s Law Parallel Computing and Performance Evaluation -- Amdahl s Law 9/29/205 Yinong Chen Chapter 7 Roadmap: Evaluation in Design Process Amdahl s Law 2 Multi-Core and HyperThreading 3 4 Application of Amdahl

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to

More information

Scaling HTM-Supported Database Transactions to Many Cores

Scaling HTM-Supported Database Transactions to Many Cores 1 Scaling HTM-Supported Database Transactions to Many Cores Viktor Leis, Alfons Kemper, and Thomas Neumann Abstract So far, transactional memory although a promising technique suffered from the absence

More information

April 6, 2016 ASPLOS 2016 Atlanta, Georgia.

April 6, 2016 ASPLOS 2016 Atlanta, Georgia. Noam Shalev Technion Eran Harpaz Technion Hagar Porat Technion Idit Keidar Technion Yaron Weinsberg IBM Research April 6, 2016 ASPLOS 2016 Atlanta, Georgia. Technology scaling Many core is here Machines

More information

Running a Workflow on a PowerCenter Grid

Running a Workflow on a PowerCenter Grid Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

System Copy GT Manual 1.8 Last update: 2015/07/13 Basis Technologies

System Copy GT Manual 1.8 Last update: 2015/07/13 Basis Technologies System Copy GT Manual 1.8 Last update: 2015/07/13 Basis Technologies Table of Contents Introduction... 1 Prerequisites... 2 Executing System Copy GT... 3 Program Parameters / Selection Screen... 4 Technical

More information

A Pattern-Based Approach to. Automated Application Performance Analysis

A Pattern-Based Approach to. Automated Application Performance Analysis A Pattern-Based Approach to Automated Application Performance Analysis Nikhil Bhatia, Shirley Moore, Felix Wolf, and Jack Dongarra Innovative Computing Laboratory University of Tennessee (bhatia, shirley,

More information

Predictive modeling for software transactional memory

Predictive modeling for software transactional memory VU University Amsterdam BMI Paper Predictive modeling for software transactional memory Author: Tim Stokman Supervisor: Sandjai Bhulai October, Abstract In this paper a new kind of concurrency type named

More information

Improving the performance of data servers on multicore architectures. Fabien Gaud

Improving the performance of data servers on multicore architectures. Fabien Gaud Improving the performance of data servers on multicore architectures Fabien Gaud Grenoble University Advisors: Jean-Bernard Stefani, Renaud Lachaize and Vivien Quéma Sardes (INRIA/LIG) December 2, 2010

More information

Exploiting Hardware Transactional Memory in Main-Memory Databases

Exploiting Hardware Transactional Memory in Main-Memory Databases Exploiting Hardware Transactional Memory in Main-Memory Databases Viktor Leis, Alfons Kemper, Thomas Neumann Fakultät für Informatik Technische Universität München Boltzmannstraße 3, D-85748 Garching @in.tum.de

More information

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications

IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications Open System Laboratory of University of Illinois at Urbana Champaign presents: Outline: IMCM: A Flexible Fine-Grained Adaptive Framework for Parallel Mobile Hybrid Cloud Applications A Fine-Grained Adaptive

More information

Using Restricted Transactional Memory to Build a Scalable In-Memory Database

Using Restricted Transactional Memory to Build a Scalable In-Memory Database Using Restricted Transactional Memory to Build a Scalable In-Memory Database Zhaoguo Wang, Hao Qian, Jinyang Li, Haibo Chen School of Computer Science, Fudan University Institute of Parallel and Distributed

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Parallel Computing. Frank McKenna. UC Berkeley. OpenSees Parallel Workshop Berkeley, CA

Parallel Computing. Frank McKenna. UC Berkeley. OpenSees Parallel Workshop Berkeley, CA Parallel Computing Frank McKenna UC Berkeley OpenSees Parallel Workshop Berkeley, CA Overview Introduction to Parallel Computers Parallel Programming Models Race Conditions and Deadlock Problems Performance

More information

Overview. Overview of Transaction Management. Definition of a transaction. What is a transaction? Chapter 16

Overview. Overview of Transaction Management. Definition of a transaction. What is a transaction? Chapter 16 1 2 Overview Overview of Transaction Management Chapter 16 Transactions and the atomicity concept The so-called ACID principle Backup and recovery mechanisms» the types of failures that may occur» describe

More information

Multicore Programming with LabVIEW Technical Resource Guide

Multicore Programming with LabVIEW Technical Resource Guide Multicore Programming with LabVIEW Technical Resource Guide 2 INTRODUCTORY TOPICS UNDERSTANDING PARALLEL HARDWARE: MULTIPROCESSORS, HYPERTHREADING, DUAL- CORE, MULTICORE AND FPGAS... 5 DIFFERENCES BETWEEN

More information

Versioned Transactional Shared Memory for the

Versioned Transactional Shared Memory for the Versioned Transactional Shared Memory for the FénixEDU Web Application Nuno Carvalho INESC-ID/IST nonius@gsd.inesc-id.pt João Cachopo INESC-ID/IST joao.cachopo@inesc-id.pt António Rito Silva INESC-ID/IST

More information

LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig

LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015. Hermann Härtig LOAD BALANCING DISTRIBUTED OPERATING SYSTEMS, SCALABILITY, SS 2015 Hermann Härtig ISSUES starting points independent Unix processes and block synchronous execution who does it load migration mechanism

More information

DATABASE CONCURRENCY CONTROL USING TRANSACTIONAL MEMORY : PERFORMANCE EVALUATION

DATABASE CONCURRENCY CONTROL USING TRANSACTIONAL MEMORY : PERFORMANCE EVALUATION DATABASE CONCURRENCY CONTROL USING TRANSACTIONAL MEMORY : PERFORMANCE EVALUATION Jeong Seung Yu a, Woon Hak Kang b, Hwan Soo Han c and Sang Won Lee d School of Info. & Comm. Engr. Sungkyunkwan University

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable

More information

Performance Tuning and Optimizing SQL Databases 2016

Performance Tuning and Optimizing SQL Databases 2016 Performance Tuning and Optimizing SQL Databases 2016 http://www.homnick.com marketing@homnick.com +1.561.988.0567 Boca Raton, Fl USA About this course This four-day instructor-led course provides students

More information

Chapter 6, The Operating System Machine Level

Chapter 6, The Operating System Machine Level Chapter 6, The Operating System Machine Level 6.1 Virtual Memory 6.2 Virtual I/O Instructions 6.3 Virtual Instructions For Parallel Processing 6.4 Example Operating Systems 6.5 Summary Virtual Memory General

More information

Transactional Support for SDN Control Planes "

Transactional Support for SDN Control Planes Transactional Support for SDN Control Planes Petr Kuznetsov Telecom ParisTech WTTM, 2015 Software Defined Networking An emerging paradigm in computer network management Separate forwarding hardware (data

More information

CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009

CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009 CSE 30321 Computer Architecture I Fall 2009 Final Exam December 18, 2009 Test Guidelines: 1. Place your name on EACH page of the test in the space provided. 2. every question in the space provided. If

More information

Describe the SQL Server components and SQL OS Describe the differences between Windows Scheduling and SQL scheduling Describe waits and queues

Describe the SQL Server components and SQL OS Describe the differences between Windows Scheduling and SQL scheduling Describe waits and queues Course Page - Page 1 of 5 Performance Tuning and Optimizing SQL Databases M-10987 Length: 4 days Price: $ 2,495.00 Course Description This four-day instructor-led course provides students who manage and

More information

GMP implementation on CUDA - A Backward Compatible Design With Performance Tuning

GMP implementation on CUDA - A Backward Compatible Design With Performance Tuning 1 GMP implementation on CUDA - A Backward Compatible Design With Performance Tuning Hao Jun Liu, Chu Tong Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto haojun.liu@utoronto.ca,

More information

B.Sc (Computer Science) Database Management Systems UNIT - IV

B.Sc (Computer Science) Database Management Systems UNIT - IV 1 B.Sc (Computer Science) Database Management Systems UNIT - IV Transaction:- A transaction is any action that reads from or writes to a database. Suppose a customer is purchasing a product using credit

More information

Capacity Estimation for Linux Workloads

Capacity Estimation for Linux Workloads Capacity Estimation for Linux Workloads Session L985 David Boyes Sine Nomine Associates 1 Agenda General Capacity Planning Issues Virtual Machine History and Value Unique Capacity Issues in Virtual Machines

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Historically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately.

Historically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately. Historically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately. Hardware Solution Evolution of Computer Architectures Micro-Scopic View Clock Rate Limits Have Been Reached

More information

Race Conditions, Critical Sections and Semaphores

Race Conditions, Critical Sections and Semaphores Race Conditions, Critical Sections and Semaphores In a multiprogrammed system, there are several processes "active" at once. Even a single job can create multiple processes (as in the Lab project using

More information

Why Threads Are A Bad Idea (for most purposes)

Why Threads Are A Bad Idea (for most purposes) Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories john.ousterhout@eng.sun.com http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).

More information

Lecture 7: Concurrency control. Rasmus Pagh

Lecture 7: Concurrency control. Rasmus Pagh Lecture 7: Concurrency control Rasmus Pagh 1 Today s lecture Concurrency control basics Conflicts and serializability Locking Isolation levels in SQL Optimistic concurrency control Transaction tuning Transaction

More information

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Pessimistic Timestamp Ordering. Write Operations and Timestamps (Pessimistic) stamp Ordering Another approach to concurrency control: Assign a timestamp ts(t) to transaction T at the moment it starts Using Lamport's timestamps: total order is given. In distributed

More information

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015

Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015 Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program

More information

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting

More information

Parallelization of Matrix Multiply

Parallelization of Matrix Multiply 18.337/6.338: Parallel Computing Final Project Report Parallelization of Matrix Multiply A Look At How Differing Algorithmic Approaches and CPU Hardware Impact Scaling Calculation Performance in Java Elliotte

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

HyperThreading Support in VMware ESX Server 2.1

HyperThreading Support in VMware ESX Server 2.1 HyperThreading Support in VMware ESX Server 2.1 Summary VMware ESX Server 2.1 now fully supports Intel s new Hyper-Threading Technology (HT). This paper explains the changes that an administrator can expect

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

Scalability evaluation of barrier algorithms for OpenMP

Scalability evaluation of barrier algorithms for OpenMP Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science

More information




