SEER: PROBABILISTIC SCHEDULING FOR COMMODITY HARDWARE TRANSACTIONAL MEMORY. 27th Symposium on Parallel Architectures and Algorithms
Transcription
1 27th Symposium on Parallel Architectures and Algorithms. Seer: Probabilistic Scheduling for Commodity Hardware Transactional Memory. Nuno Diegues, Paolo Romano and Stoyan Garbatov
2 Seer: Scheduling for Commodity HTM SPAA
The multi-core (r)evolution
- Shared-memory multi-cores are now ubiquitous
- Concurrent programming is complex
- Classic approach: locking. Hard to get right: fine-grained locks, deadlocks, correctness
- Transactional Memory abstraction: atomic { withdraw(acc1,val); deposit(acc2,val); }
  - Programmer identifies atomic blocks
  - Runtime implements synchronization
- [Diagram: CPU 1, CPU 2, CPU 3 and CPU 4 on top of a Transactional Memory System]
3 Too much optimism
- Example conflict: one transaction runs y = x while another runs x++
- Problem: CPU time is wasted
- That time could be used to run other computations instead
- Alternatively, inhibiting parallelism can improve cache usage, increase core frequency, and reduce power consumption
- Identify likely conflicts before they happen
4 Scheduler
- Software TM (STM): the library has full concurrency control and can pinpoint the culprit of a conflict
- Hardware TM (HTM), available in commodity processors: feedback is quite limited, with only a rough categorization of the type of conflict
5 Objective: Scheduling for Commodity HTM
- How to find the root cause of a data conflict?
- Avoid running T1 and T2 concurrently
6 In an ideal world for HTMs
- xbegin; withdraw(acc1,val); deposit(acc2,val); xend
- Transactions may abort because of contention on the same memory locations
- Transactions restart, and every transaction shall eventually succeed
7 In practice: HTMs are best-effort
- No progress guarantees: a transaction may always abort, for a number of reasons:
  - Forbidden instructions
  - Capacity of caches (for reads and writes)
  - Faults and signals
  - Contending transactions, aborting each other
8 Single Global Lock
- SGL: fall-back path for HTM
  - A hardware transaction executes if the SGL is free
  - The SGL is acquired depending on the retry policy
- The SGL is a very simple scheduler
  - Ignores the root cause
  - Takes a global decision (the SGL)
- Adaptive Transaction Scheduling [SPAA08]
- We need better scheduling for commodity HTMs
9 Related Work
Scheduler            Support for HTM?   Support for imprecise information?   Schedules transactions in a fine-grained fashion?
ATS [SPAA08]         Yes                Yes                                  No
CAR-STM [PODC08]     No                 No                                   Yes
Shrink [PODC09]      No                 No                                   Yes
ProPS [Euro-Par14]   No                 No                                   Yes
SER [PPoPP10]        No                 No                                   Yes
TxLinux [SOSP07]     Yes                No                                   Yes
SOA [HiPEAC09/10]    Yes                No                                   Yes
Seer                 Yes                Yes                                  Yes
10 Key Idea
- Transactions to be executed are announced
- Many observations are collected upon transaction commit and abort: which transactions were active at the same time?
- Over time, the outliers will be identifiable with high probability (w.h.p.)
- A dynamic, fine-grained locking scheme is devised
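The announce-and-observe step can be pictured with a small sketch. This is only an illustration under assumptions: the names `announced`, `ObservationLog` and `active_others` are hypothetical, and the slides do not describe Seer's actual data structures. Each thread publishes the transaction it is about to run, and on every commit or abort it records which other transactions were announced at that moment.

```python
from collections import defaultdict


class ObservationLog:
    """Hypothetical per-thread log of co-active transactions at commit/abort."""

    def __init__(self):
        # commits[x][y] / aborts[x][y]: times x committed/aborted while y was announced
        self.commits = defaultdict(lambda: defaultdict(int))
        self.aborts = defaultdict(lambda: defaultdict(int))

    def record(self, tx, active, committed):
        """Record one outcome of `tx` given the transactions announced by other threads."""
        table = self.commits if committed else self.aborts
        for other in active:
            table[tx][other] += 1


# Shared announcement slots (assumed layout): slot i holds what thread i is running.
announced = [None, None, None, None]


def active_others(me):
    """Transactions currently announced by threads other than `me`."""
    return [tx for i, tx in enumerate(announced) if i != me and tx is not None]


log = ObservationLog()
announced[0], announced[1], announced[2] = "T1", "T2", "T3"
log.record("T1", active_others(0), committed=False)  # T1 aborted with T2 and T3 active
announced[2] = None
log.record("T1", active_others(0), committed=True)   # T1 committed with only T2 active
```

Keeping one log per thread, as the later slides suggest, would avoid synchronizing on the counters themselves.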
11 Seer: overview
- Transaction = source code transaction
- [Figure: active transactions]
12 Seer: details
- Threads collect lightweight events independently (low overhead)
- Locking scheme is (re-)calculated periodically
- One lock per transaction (atomic block in the application)
  - T1's lock (L1) is taken by T2 if they are deemed to conflict
  - T1 waits for L1 to be free before executing
- Calculate conditional probabilities of commit/abort
- Relevance threshold based on mean/stdev
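One way to read the last two points is sketched below. The slides do not give the exact formulas, so this is an assumption-laden illustration: `abort_probabilities` estimates P(x aborts | y active) from co-occurrence counts, and `outliers` flags every y whose probability exceeds mean + k * stdev, where `k` is a hypothetical tuning knob. The counts are made-up inputs, not measurements.

```python
from statistics import mean, pstdev


def abort_probabilities(aborts, commits, x):
    """Estimate P(x aborts | y active) for every y observed alongside x."""
    probs = {}
    for y in set(aborts.get(x, {})) | set(commits.get(x, {})):
        a = aborts.get(x, {}).get(y, 0)
        c = commits.get(x, {}).get(y, 0)
        if a + c > 0:
            probs[y] = a / (a + c)
    return probs


def outliers(probs, k=1.0):
    """Transactions whose probability exceeds mean + k * stdev (k is an assumption)."""
    if len(probs) < 2:
        return set()
    cutoff = mean(probs.values()) + k * pstdev(probs.values())
    return {y for y, p in probs.items() if p > cutoff}


# Illustrative counts: T1 aborted 9 of 10 times when T2 was active, 1 of 10 otherwise.
aborts = {"T1": {"T2": 9, "T3": 1, "T4": 1, "T5": 1}}
commits = {"T1": {"T2": 1, "T3": 9, "T4": 9, "T5": 9}}
probs = abort_probabilities(aborts, commits, "T1")
suspects = outliers(probs)  # only T2 stands out from the rest
```

A threshold relative to the mean and deviation, rather than a fixed cutoff, keeps the test meaningful when every pair has a similar baseline abort rate.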
13 Seer: details
- For each pair of transactions (x,y), they acquire each other's lock if:
  - Are abort events of x common enough with y running concurrently?
  - Is y one of the main causes for x to abort?
- Hill-climbing-based adaptive loop searches for the optimal threshold
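The hill-climbing loop can be sketched generically. This is not Seer's implementation: `measure` is a hypothetical callback that would, in a real system, report something like the throughput observed while running with a given threshold; here a synthetic function with a known peak stands in for it.

```python
def hill_climb(measure, start=0.5, step=0.1, lo=0.0, hi=1.0, rounds=20):
    """Move the threshold in whichever direction improves measure(threshold)."""
    best, best_score = start, measure(start)
    for _ in range(rounds):
        improved = False
        for candidate in (best + step, best - step):
            candidate = min(hi, max(lo, candidate))  # stay inside [lo, hi]
            score = measure(candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
                break
        if not improved:
            step /= 2  # neither neighbour is better: refine around the current best
    return best


# Synthetic stand-in for a throughput measurement, peaking at threshold 0.3.
def synthetic(threshold):
    return -(threshold - 0.3) ** 2


tuned = hill_climb(synthetic)
```

Hill climbing only needs the relative ordering of successive measurements, which suits an online setting where the absolute optimum is unknown and may drift with the workload.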
14 Seer: optimizations
- Only one thread (re-)calculates the locking scheme:
  - Whenever it is waiting for the SGL (some thread is on the fall-back path)
  - If the SGL is rarely taken, then scheduling will not improve
- Capacity aborts: another limitation from the best-effort nature
  - Per-core lock
  - Taken when capacity aborts occur
  - Tailored for hyper-thread usage
- Lock acquisition
  - Hardware transaction used as a multi-CAS for 2+ locks
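The multi-CAS idea is that several locks are taken together or not at all. Python has no access to hardware transactions, so the sketch below is a software stand-in that only reproduces the all-or-nothing outcome; unlike a hardware transaction, it is not a single atomic step, and other threads can observe the intermediate state. The function names are hypothetical.

```python
import threading


def try_acquire_all(locks):
    """Acquire every lock or none of them; return True on success."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            # One lock is busy: undo the partial acquisition and report failure.
            for held in reversed(taken):
                held.release()
            return False
    return True


def release_all(locks):
    """Release locks previously obtained with try_acquire_all."""
    for lock in locks:
        lock.release()


tx_lock, core_lock = threading.Lock(), threading.Lock()
got_both = try_acquire_all([tx_lock, core_lock])
if got_both:
    release_all([tx_lock, core_lock])
```

Backing out on failure instead of blocking avoids holding one lock while waiting on another, which is the deadlock risk that a single atomic acquisition removes by construction.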
15 Evaluation
- Intel Haswell, 4 cores (8 hyper-threads)
- HLE: Intel Hardware Lock Elision, i.e., no scheduling
- RTM: Intel commodity HTM with an SGL
- SCM: Software-assisted Contention Management [PODC14]
  - Schedules with a (single) auxiliary lock
  - The aux lock is not read speculatively (in the hardware transaction)
- Seer: our probabilistic scheduler on top of Intel RTM
16 How much can we gain with Seer?
- [Plots: speedup vs. number of threads for Genome and Intruder]
- 50% geometric mean speedup in STAMP
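For readers unfamiliar with the metric, the geometric mean of n speedups is the n-th root of their product. The sketch below shows the computation; the input values are placeholders chosen to make the arithmetic easy to follow, not results from the talk.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Placeholder speedups for three hypothetical benchmarks (not measured data).
placeholder_speedups = [1.0, 2.0, 4.0]
gm = geometric_mean(placeholder_speedups)                        # cube root of 8
am = sum(placeholder_speedups) / len(placeholder_speedups)       # arithmetic mean, for contrast
```

The geometric mean is the usual choice for averaging ratios such as speedups because a single benchmark with a very large gain pulls it up less than it would pull up the arithmetic mean.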
17 What motivates these gains?
Geometric mean over STAMP with 8 threads:
- HLE: 77% with fall-back lock
- RTM: 37% with SGL
- SCM: 5% with SGL, 29% with (single) auxiliary lock
- Seer (fine-grained locks): 3% with at least one tx lock, 4% with core lock, 12% with tx + core locks, 1% with SGL
18 Relevance of each mechanism?
- Transaction locks: detect conflicts inherent to benchmarks
- Core locks: only relevant for more than 4 threads (hyper-threading)
- HTM lock acquisition: small improvement, benchmark dependent; the more locks, the better
- Threshold tuning for probabilities: consistent, small improvement
- Baseline: Seer with all mechanisms enabled (i.e., their overhead) but without any lock acquisitions
19 Summary
- First scheduler tailored for commodity HTMs:
  - Copes with imprecise information
  - Schedules transactions in a fine-grained manner
- 50% performance improvement with 8 threads
- 0-8% overhead from monitoring/calculation, measured by running Seer without acquiring locks
20 Thank you. Questions?
Nuno Diegues, Paolo Romano and Stoyan Garbatov
21 Backup slides
22 HTM with a fall-back path
    start:
        int status = htm_begin
    code:
        application logic
        htm_end                      // fast-path
23 HTM with a fall-back path
    start:
        int status = htm_begin
        if (status == ok)            // != ok when aborted
            if (fallback-in-use())
                htm_abort            // fall-back in use
            else
                goto code            // fast-path
        ??
    code:
        application logic
        if (infastpath)
            htm_end                  // fast-path
        else
            ??
24 HTM with a fall-back path
    start:
        int status = htm_begin
        if (status == ok)            // != ok when aborted
            if (fallback-in-use())
                htm_abort            // fall-back in use
            else
                goto code            // fast-path
        if (shouldretry())           // retry policy
            goto start
        else
            use-fallback()           // use fall-back
    code:
        application logic
        if (infastpath)
            htm_end                  // fast-path
        else
            quit-fallback()          // fall-back
25 HTM with a fall-back: a single lock
Still simple enough.
    start:
        int status = htm_begin
        if (status == ok)            // != ok when aborted
            if (istaken(lock))
                htm_abort            // fall-back in use
            else
                goto code            // fast-path
        if (shouldretry())           // retry policy: e.g., limit retries to 10
            goto start
        else
            acquire(lock)            // use fall-back
    code:
        application logic
        if (infastpath)
            htm_end                  // fast-path
        else
            release(lock)            // fall-back
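The control flow of the single-lock fall-back can be exercised without HTM hardware by simulating the speculative attempt. In the sketch below, `htm_attempt` is a hypothetical stand-in for the htm_begin/htm_end pair: it returns True if the speculative run committed and False if it aborted. On real hardware the lock would be read inside the transaction, so that a later acquisition aborts it; this simulation only checks the lock before each attempt.

```python
import threading


def run_transaction(body, htm_attempt, lock, max_retries=10):
    """Try the speculative path up to max_retries times, then run under the lock.

    Returns "htm" or "fallback" to report which path completed the work.
    """
    for _ in range(max_retries):
        if lock.locked():
            continue  # fall-back in use: treated as an aborted attempt
        if htm_attempt(body):
            return "htm"  # fast-path committed
    with lock:  # retry budget exhausted: use the fall-back
        body()
    return "fallback"


counter = {"value": 0, "attempts": 0}


def increment():
    counter["value"] += 1


def always_abort(body):
    """Simulated HTM that never commits (e.g., a persistent capacity abort)."""
    counter["attempts"] += 1
    return False


def always_commit(body):
    """Simulated HTM that runs the body and commits."""
    body()
    return True


sgl = threading.Lock()
first = run_transaction(increment, always_abort, sgl)    # ends on the fall-back path
second = run_transaction(increment, always_commit, sgl)  # ends on the fast-path
```

Bounding the retries is what gives the best-effort HTM a progress guarantee: the lock-protected path always completes, at the cost of serializing everything else while it runs.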
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
More informationParallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
More informationOperatin g Systems: Internals and Design Principle s. Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings
Operatin g Systems: Internals and Design Principle s Chapter 10 Multiprocessor and Real-Time Scheduling Seventh Edition By William Stallings Operating Systems: Internals and Design Principles Bear in mind,
More informationThe Comeback of Batch Tuning
The Comeback of Batch Tuning By Avi Kohn, Time Machine Software Introduction A lot of attention is given today by data centers to online systems, client/server, data mining, and, more recently, the Internet.
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationWeighted Total Mark. Weighted Exam Mark
CMP2204 Operating System Technologies Period per Week Contact Hour per Semester Total Mark Exam Mark Continuous Assessment Mark Credit Units LH PH TH CH WTM WEM WCM CU 45 30 00 60 100 40 100 4 Rationale
More informationInside the Erlang VM
Rev A Inside the Erlang VM with focus on SMP Prepared by Kenneth Lundin, Ericsson AB Presentation held at Erlang User Conference, Stockholm, November 13, 2008 1 Introduction The history of support for
More informationThe continuum of data management techniques for explicitly managed systems
The continuum of data management techniques for explicitly managed systems Svetozar Miucin, Craig Mustard Simon Fraser University MCES 2013. Montreal Introduction Explicitly Managed Memory systems lack
More informationGPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
More information