Programming Language Seminar Concurrency I-1: Java and C# Memory Models
1 Programming Language Seminar
Concurrency I-1: Java and C# Memory Models
Peter Sestoft
Friday
2 Outline for today, 1
- Why parallel programming?
- Concurrency in Java and C#
  - Problem: shared mutable state (data, fields)
  - Solutions: locks, synchronized; AtomicInteger, AtomicLong, AtomicReference
- Concurrency without locks
  - Weird behavior legal in Java and C# for speed
  - Safe publication
  - The double meaning of synchronized
  - The meaning of volatile
  - Immutability and visibility
  - The double meaning of final
3 Why parallel programming?
- Until 2003, CPUs became faster every year
  - So sequential software became faster every year
- Today, CPUs still run at 2-4 GHz, as in 2003
  - So sequential software has not become faster
- Instead, we get
  - Multicore: 2, 4, 8, ... CPUs on a chip
  - Vector instructions (4 x MAC, SIMD, SSE) in CPUs
  - Superfast Graphics Processing Units (GPU)
    - 96 simple CUDA cores in this ancient 2009 laptop
    - 3072 simple but fast CUDA cores in Nvidia Tesla K10
- Herb Sutter: The free lunch is over (2005)
- More speed requires parallel programming
  - But parallel programming is difficult and error-prone
  - ... with existing means: threads, synchronization, ...
4 A simple counter, incremented in parallel

Simple counter:

    class BareCounter implements Counter {
        private int counter = 0;
        public void inc() {
            counter++;   // read, add 1, write back: three steps, not one
        }
    }

Many threads increment the counter in parallel:

    Thread[] ts = new Thread[threads];
    for (int j=0; j<threads; j++)
        ts[j] = new Thread() {
            public void run() {
                for (int i=0; i<iterations; i++)
                    counter.inc();
            }
        };
    for (int j=0; j<threads; j++)
        ts[j].start();
    for (int j=0; j<threads; j++)
        ts[j].join();

- This goes wrong, of course
- Why?
5 Locks: Ensure mutual exclusion

Synchronized counter (file ConcurrentCounters.java):

    class SyncCounter implements Counter {
        private int counter = 0;
        public synchronized void inc() {
            counter++;
        }
    }

Really, an abbreviation for this code:

    class SyncCounter implements Counter {
        private int counter = 0;
        public void inc() {
            synchronized (this) {
                counter++;
            }
        }
    }

- This works
- Why?
6 Locking/synchronization
- A lock does not guarantee anything in itself
- Disciplined use of locks can lead to
  - Exclusive access to shared mutable state
  - And hence consistent update of the state
- Easy to misuse
  - Forget synchronized in one place => anarchy
- Low performance under high contention
  - Context switches
- Not compositional
  - Using multiple locks can lead to deadlock
  - Easy to avoid by always locking in the same order
  - But hard to know that libraries, GUI, ... do
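The "always lock in the same order" rule can be illustrated with a small sketch. Everything here is an assumption made for illustration and does not come from the slides: the Account class, its id field used as the global lock order, and the transfer helper. Two threads transfer money in opposite directions; because both always take the lock of the lower-id account first, they cannot deadlock.

```java
final class LockOrderingDemo {
    // Hypothetical helper: always lock the account with the smaller id first.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.add(-amount);
                to.add(amount);
            }
        }
    }

    // Runs two threads that transfer in opposite directions; returns both balances.
    static long[] run(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new long[] { a.balance(), b.balance() };
    }

    public static void main(String[] args) {
        long[] r = run(100_000);
        System.out.println("a=" + r[0] + " b=" + r[1]);
    }
}

// Hypothetical account type; the id defines the global lock order.
final class Account {
    final int id;
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized void add(long delta) { balance += delta; }

    synchronized long balance() { return balance; }
}
```

If transfer instead locked from before to, the two threads could each hold one lock and wait forever for the other. The sketch also shows why the rule is hard to enforce across libraries: every piece of code that takes both locks must agree on the same order.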
7 Atomic update (Java 5)

Atomic counter:

    import java.util.concurrent.atomic.AtomicInteger;

    class AtomicCounter implements Counter {
        private final AtomicInteger counter = new AtomicInteger();
        public void inc() {
            counter.getAndIncrement();
        }
    }

- This uses an atomic x86 instruction
  - (Mono JITted code, from CIL, from C#)
- See file Interlocked.cs
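The three counters can be compared in one small harness. This is a sketch under assumptions: the slides do not show the Counter interface, so the one below (with an added get() method) and the CounterDemo class are made up for illustration. The synchronized and atomic counters should always reach threads * iterations, while the bare counter typically ends up lower because of lost updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterDemo {
    // Runs `threads` threads that each call inc() `iterations` times.
    static int run(Counter counter, int threads, int iterations) {
        Thread[] ts = new Thread[threads];
        for (int j = 0; j < threads; j++) {
            ts[j] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) counter.inc();
            });
        }
        for (Thread t : ts) t.start();
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.get();   // safe to read: join() makes the updates visible
    }

    public static void main(String[] args) {
        int threads = 4, iterations = 1_000_000;
        System.out.println("expected: " + threads * iterations);
        System.out.println("bare:     " + run(new BareCounter(), threads, iterations));
        System.out.println("sync:     " + run(new SyncCounter(), threads, iterations));
        System.out.println("atomic:   " + run(new AtomicCounter(), threads, iterations));
    }
}

// Assumption: the slides do not show Counter; get() is added for this demo.
interface Counter {
    void inc();
    int get();
}

class BareCounter implements Counter {
    private int counter = 0;
    public void inc() { counter++; }                  // unsynchronized read-modify-write
    public int get() { return counter; }
}

class SyncCounter implements Counter {
    private int counter = 0;
    public synchronized void inc() { counter++; }     // mutual exclusion via the object's lock
    public synchronized int get() { return counter; }
}

class AtomicCounter implements Counter {
    private final AtomicInteger counter = new AtomicInteger();
    public void inc() { counter.getAndIncrement(); }  // single atomic update
    public int get() { return counter.get(); }
}
```

The bare result varies from run to run, which is exactly what makes this kind of bug hard to find by testing.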
8 Atomic variables
- Java: java.util.concurrent.atomic package
  - AtomicInteger, AtomicReference<T>, ...
- C#/.NET: System.Threading.Interlocked class
  - Add(ref int, int), Exchange<T>(ref T, T), ...
- More efficient than locking/synchronized
  - When applicable
  - Translates directly to x86 instructions
- We shall look more into these next week
  - In lock-free algorithms
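As a small preview of how atomic variables are used without locks, here is a sketch of the usual compare-and-set retry loop on an AtomicInteger. The updateMax helper and the CasDemo class are made up for illustration; only AtomicInteger.get and AtomicInteger.compareAndSet are library calls.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasDemo {
    // Hypothetical helper: atomically raise `max` to at least `candidate`.
    static void updateMax(AtomicInteger max, int candidate) {
        int current;
        do {
            current = max.get();
            if (candidate <= current) return;               // nothing to do
        } while (!max.compareAndSet(current, candidate));   // retry if another thread won
    }

    // Each thread proposes its own range of values; returns the final maximum.
    static int run(int threads, int perThread) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) updateMax(max, base + i);
            });
        }
        for (Thread th : ts) th.start();
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

No thread ever blocks: a thread whose compareAndSet fails simply re-reads the current value and tries again, or gives up if its candidate is no longer larger.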
9 Strange but legal behavior
- Java Language Specification, sect 17.4:
  - Run these code fragments in two threads
  - Assume A and B shared fields, initially 0

    Thread 1        Thread 2
    r2 = A;         r1 = B;
    B = 1;          A = 2;

- What are the possible results?
- Strangely, r1==1 and r2==2 is possible
- The Java (or C#/.NET) memory model
  - Does not guarantee sequential consistency
  - Not between threads, only within each individual thread
  - Compiler may reorder and share memory accesses
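To see why r1==1 and r2==2 is strange, it helps to enumerate what sequential consistency would allow. The sketch below is an illustration, not part of the slides: the ScOutcomes class simulates every interleaving of the two threads that respects each thread's program order and collects the resulting (r1, r2) pairs.

```java
import java.util.Set;
import java.util.TreeSet;

final class ScOutcomes {
    // pc1, pc2: number of statements already executed by Thread 1 and Thread 2.
    // a, b: current values of the shared fields A and B.
    static void explore(int pc1, int pc2, int a, int b, int r1, int r2, Set<String> out) {
        if (pc1 == 2 && pc2 == 2) {
            out.add("r1=" + r1 + ",r2=" + r2);
            return;
        }
        // Let Thread 1 take the next step, if it has one left.
        if (pc1 == 0) explore(1, pc2, a, b, r1, a, out);        // r2 = A;
        else if (pc1 == 1) explore(2, pc2, a, 1, r1, r2, out);  // B = 1;
        // Let Thread 2 take the next step, if it has one left.
        if (pc2 == 0) explore(pc1, 1, a, b, b, r2, out);        // r1 = B;
        else if (pc2 == 1) explore(pc1, 2, 2, b, r1, r2, out);  // A = 2;
    }

    // All results reachable when statements are interleaved but never reordered.
    static Set<String> outcomes() {
        Set<String> out = new TreeSet<>();
        explore(0, 0, 0, 0, 0, 0, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(outcomes());
    }
}
```

Under sequential consistency only three results are reachable: (r1=0, r2=0), (r1=0, r2=2) and (r1=1, r2=0). The result r1==1 and r2==2 needs each thread's write to take effect before its own preceding read, which is only possible if the compiler or hardware reorders the statements within a thread.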
10 Why permit such strange behaviors?
- More comprehensible example from JLS 17.4
- Assume p, q shared, p==q and p.x==0

    Thread 1           Thread 2
    r1 = p;            r6 = p;
    r2 = r1.x;         r6.x = 3;
    r3 = q;
    r4 = r3.x;
    r5 = r1.x;

- Classic compiler optimization: reuse r2 instead of reading r1.x again

    Thread 1           Thread 2
    r1 = p;            r6 = p;
    r2 = r1.x;         r6.x = 3;
    r3 = q;
    r4 = r3.x;
    r5 = r2;

- (p.x seems to switch from r2=0 to r4=3 and back to r5=0)
11 Sequential consistency
- The volatile field modifier
  - avoids these compiler optimizations
  - offers a number of guarantees (in Java and C#)
  - but loses some performance
- IntArray.IsSorted example, sequential
  - Files VolatileArray.java, VolatileArray.cs
  - Java, sec non-volatile, sec volatile
  - C# MS sec non-volatile, sec volatile
  - C# Mono sec in both cases
- In this particular case, Mono does no optimization
  - See machine code, in source file
12 Java and C#
- Java
  - Java Language Specification (JLS), Java 7, 2013: section Volatile Fields (brief) and section 17.4 Memory Model (rather complicated)
  - JVM Specification just refers to JLS
- C#/.NET
  - C# Language Specification: Volatile Fields
  - CLI Ecma-335 standard section I :
    - "volatile read has acquire semantics... the read is guaranteed to occur prior to any references to memory that occur after the read instruction in the CIL instruction sequence"
    - "volatile write has release semantics... the write... occur after any memory references... prior to the write..."
13 Thread-unsafe integer holder

    public class MutableInteger {
        private int value;
        public int get() { return value; }
        public void set(int value) { this.value = value; }
    }

- One thread may never see the updates performed by another one
14 Thread-safe integer holder

    public class MutableInteger {
        private int value;
        public synchronized int get() { return value; }
        public synchronized void set(int value) { this.value = value; }
    }

- Locking (synchronized) has two effects:
  - Mutual exclusion
  - Visibility of memory updates: all fields visible to thread A before releasing a lock are visible to thread B after acquiring the same lock (it "synchronizes")
15 Visibility by synchronization
[Figure: "release" and "acquire" of a lock; Goetz p]
16 Another thread-safe integer holder?

    public class MutableInteger {
        private volatile int value;
        public int get() { return value; }
        public void set(int value) { this.value = value; }
    }

- Not in the book, but should work
- The volatile modifier has one effect:
  - Visibility of memory updates: all fields visible to thread A before writing the volatile field are visible to thread B after reading the field (it "synchronizes")
- Stronger guarantee than in C/C++
- Affects visibility of all fields, not just the volatile one
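The point that a volatile write affects the visibility of all fields, not just the volatile one, can be sketched as follows. The VolatileFlagDemo class and its field names are made up for illustration. The writer stores to a plain field and then to a volatile flag; the reader spins on the flag and then reads the plain field.

```java
final class VolatileFlagDemo {
    private int payload = 0;                 // plain, non-volatile field
    private volatile boolean ready = false;  // volatile flag that publishes payload

    // Starts a reader thread, publishes 42 through the flag, returns what the reader saw.
    int run() {
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            seen[0] = payload;               // guaranteed to see 42
        });
        reader.start();
        payload = 42;                        // ordinary write ...
        ready = true;                        // ... made visible by this volatile write
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(new VolatileFlagDemo().run());
    }
}
```

Without volatile on ready, the Java memory model would allow the reader to spin forever, or to leave the loop and still read a stale payload. Note that volatile gives visibility only: two threads doing ready-style flag handshakes still get no mutual exclusion from it.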
17 C#/.NET
- CLI Ecma-335 standard section I :
  - "A volatile write has release semantics... the write is guaranteed to happen after any memory references prior to the write instruction in the CIL instruction sequence"
  - "volatile read has acquire semantics... the read is guaranteed to occur prior to any references to memory that occur after the read instruction in the CIL instruction sequence"
- So same as Java: volatile write+read has the visibility effect of lock release+acquire (but not the mutual exclusion effect, of course)
18 Goetz factorization servlet example: Stateless servlet

    public class StatelessFactorizer ... implements Servlet {
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = factor(i);
            encodeIntoResponse(resp, factors);
        }
        BigInteger extractFromRequest(ServletRequest req) { ... }
        BigInteger[] factor(BigInteger i) { ... }
        void encodeIntoResponse(ServletResponse resp, ...) { ... }
    }

- No concurrent access to any shared state
- All state is thread-confined (local variables)
19 Goetz factorization servlet example: Count accesses in shared int

    public class UnsafeCountingFactorizer ... {
        private long count = 0;                       // Shared state
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = factor(i);
            ++count;                                  // Unsafe
            encodeIntoResponse(resp, factors);
        }
    }

- Concurrent access to shared mutable state
- Unsafe because the ++count operation is not atomic
- Risk of lost updates
20 Goetz factorization servlet example: Count accesses with atomic int

    public class CountingFactorizer ... {
        private final AtomicLong count = new AtomicLong(0);   // Shared state
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = factor(i);
            count.incrementAndGet();                          // Safe
            encodeIntoResponse(resp, factors);
        }
    }

- Concurrent access to shared mutable state
- Safe because the operation is atomic
- No lost updates
- Could we use synchronized instead?
21 Goetz factorization servlet example: Cache last factorization

    public class UnsafeCachingFactorizer ... {
        private final AtomicReference<BigInteger> lastNumber = ...;
        private final AtomicReference<BigInteger[]> lastFactors = ...;
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            if (i.equals(lastNumber.get()))
                encodeIntoResponse(resp, lastFactors.get());
            else {
                BigInteger[] factors = factor(i);
                lastNumber.set(i);           // two separate atomic updates:
                lastFactors.set(factors);    // another thread may run in between
                encodeIntoResponse(resp, factors);
            }
        }
    }

- Invariant: lastNumber = product of lastFactors
- Unsafe, may violate invariant
- Can we use synchronized here?
22 Goetz factorization servlet example: Cache last factorization, I

    public class CachedFactorizer ... {
        private BigInteger lastNumber;
        private BigInteger[] lastFactors;
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = null;
            synchronized (this) {
                if (i.equals(lastNumber))
                    factors = lastFactors.clone();     // Why needed?
            }
            if (factors == null) {
                factors = factor(i);
                synchronized (this) {                  // Preserves invariant
                    lastNumber = i;
                    lastFactors = factors.clone();
                }
            }
            encodeIntoResponse(resp, factors);
        }
    }
23 Immutable factor cache

    public class OneValueCache {
        private final BigInteger lastNumber;
        private final BigInteger[] lastFactors;
        public OneValueCache(BigInteger i, BigInteger[] factors) {
            lastNumber = i;
            // null check so that new OneValueCache(null, null) does not throw
            lastFactors = factors == null ? null : Arrays.copyOf(factors, factors.length);
        }
        public BigInteger[] getFactors(BigInteger i) {
            if (lastNumber == null || !lastNumber.equals(i))
                return null;
            else
                return Arrays.copyOf(lastFactors, lastFactors.length);
        }
    }

- Final fields, and instance-private copies of arrays, and BigInteger instances are immutable
24 Goetz factorization servlet example: Cache last factorization, II

    public class VolatileCachedFactorizer ... {
        private volatile OneValueCache cache = new OneValueCache(null, null);
        public void service(ServletRequest req, ServletResponse resp) {
            BigInteger i = extractFromRequest(req);
            BigInteger[] factors = cache.getFactors(i);
            if (factors == null) {
                factors = factor(i);
                cache = new OneValueCache(i, factors);
            }
            encodeIntoResponse(resp, factors);
        }
    }

- Volatile field cache ensures visibility
- NB! Immutable cache object avoids shared mutable state and ensures visibility
25 Semantics of final fields
- Final has two effects
  - field cannot be updated after initialization, and
  - field's value is visible after construction
- Java Language Specification 17.5:
  - A thread that can only see a reference to an object after [that object's constructor has finished] is guaranteed to see the correctly initialized values for that object's final fields
- This is similar to volatile fields
- But the JIT compiler can perform lots of optimizations (caching, ...) on final fields that are not possible for volatile fields
26 JLS example

    class FinalFieldExample {
        final int x;
        int y;
        static FinalFieldExample f;

        public FinalFieldExample() {
            x = 3;
            y = 4;
        }

        // Thread 1. Writes to f after constructor finished
        static void writer() {
            f = new FinalFieldExample();
        }

        // Thread 2
        static void reader() {
            if (f != null) {
                int i = f.x;  // guaranteed to see 3
                int j = f.y;  // could see 0
            }
        }
    }
27 What about C#/.NET readonly fields?
- No mention found in C# Language Specification (readonly) or Ecma-335 CLI Specification (initonly)
- In fact, no such guarantee intended, see mails from Microsoft (Carol Eidt and Eric Eilebrecht)
28 Visibility of memory updates
- Caused by synchronized/lock
- Caused by volatile
- Caused by final (visible after construction)
- Caused by CAS and similar (next week)
- Caused by synchronized collections, in
  - Java package java.util.concurrent
  - .NET namespace System.Collections.Concurrent
  - and older synchronized collections
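As a sketch of the last point, the collections in java.util.concurrent take care of both atomicity and visibility internally, so callers need no explicit locks. The ConcurrentMapDemo class and the "hits" key are made up for illustration; ConcurrentHashMap.merge is a library call that performs the whole read-modify-write atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentMapDemo {
    // Each thread bumps the same map entry `perThread` times; returns the final count.
    static int run(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.merge("hits", 1, Integer::sum);   // atomic insert-or-add
                }
            });
        }
        for (Thread th : ts) th.start();
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return hits.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000));
    }
}
```

Doing the same with a plain HashMap and separate get and put calls would be unsafe for the same reason as the bare counter: the read-modify-write is not atomic, and updates need not become visible to other threads.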
29 Reading
- Week 1 (this week)
  - Read Goetz et al.: Java Concurrency in Practice, chapters 1, 2, 3, 4, 5
  - Look at Java Language Specification, section
- Week 2
  - Goetz et al.: Java Concurrency in Practice, chapter 15
  - Michael and Scott: Simple, fast, and practical...
  - Herlihy & Shavit: The Art of Multiprocessor Programming, chapters 3 and 9
More informationJava Coding Practices for Improved Application Performance
1 Java Coding Practices for Improved Application Performance Lloyd Hagemo Senior Director Application Infrastructure Management Group Candle Corporation In the beginning, Java became the language of the
More informationParallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.
Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationOperating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015
Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program
More informationGPU Hardware Performance. Fall 2015
Fall 2015 Atomic operations performs read-modify-write operations on shared or global memory no interference with other threads for 32-bit and 64-bit integers (c. c. 1.2), float addition (c. c. 2.0) using
More informationHadoop Parallel Data Processing
MapReduce and Implementation Hadoop Parallel Data Processing Kai Shen A programming interface (two stage Map and Reduce) and system support such that: the interface is easy to program, and suitable for
More informationJava Performance. Adrian Dozsa TM-JUG 18.09.2014
Java Performance Adrian Dozsa TM-JUG 18.09.2014 Agenda Requirements Performance Testing Micro-benchmarks Concurrency GC Tools Why is performance important? We hate slow web pages/apps We hate timeouts
More informationLe langage OCaml et la programmation des GPU
Le langage OCaml et la programmation des GPU GPU programming with OCaml Mathias Bourgoin - Emmanuel Chailloux - Jean-Luc Lamotte Le projet OpenGPU : un an plus tard Ecole Polytechnique - 8 juin 2011 Outline
More informationTowards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com
Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to
More informationon an system with an infinite number of processors. Calculate the speedup of
1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements
More informationWhy Threads Are A Bad Idea (for most purposes)
Why Threads Are A Bad Idea (for most purposes) John Ousterhout Sun Microsystems Laboratories john.ousterhout@eng.sun.com http://www.sunlabs.com/~ouster Introduction Threads: Grew up in OS world (processes).
More informationDesign Pattern for the Adaptive Scheduling of Real-Time Tasks with Multiple Versions in RTSJ
Design Pattern for the Adaptive Scheduling of Real-Time Tasks with Multiple Versions in RTSJ Rodrigo Gonçalves, Rômulo Silva de Oliveira, Carlos Montez LCMI Depto. de Automação e Sistemas Univ. Fed. de
More informationReal Time Programming: Concepts
Real Time Programming: Concepts Radek Pelánek Plan at first we will study basic concepts related to real time programming then we will have a look at specific programming languages and study how they realize
More informationExtreme Performance with Java
Extreme Performance with Java QCon NYC - June 2012 Charlie Hunt Architect, Performance Engineering Salesforce.com sfdc_ppt_corp_template_01_01_2012.ppt In a Nutshell What you need to know about a modern
More informationLecture 6: Semaphores and Monitors
HW 2 Due Tuesday 10/18 Lecture 6: Semaphores and Monitors CSE 120: Principles of Operating Systems Alex C. Snoeren Higher-Level Synchronization We looked at using locks to provide mutual exclusion Locks
More informationLesson Objectives. To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization
Lesson Objectives To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization AE3B33OSD Lesson 1 / Page 2 What is an Operating System? A
More informationShared Address Space Computing: Programming
Shared Address Space Computing: Programming Alistair Rendell See Chapter 6 or Lin and Synder, Chapter 7 of Grama, Gupta, Karypis and Kumar, and Chapter 8 of Wilkinson and Allen Fork/Join Programming Model
More informationSources: On the Web: Slides will be available on:
C programming Introduction The basics of algorithms Structure of a C code, compilation step Constant, variable type, variable scope Expression and operators: assignment, arithmetic operators, comparison,
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
More informationComparison of Concurrency Frameworks for the Java Virtual Machine
Universität Ulm Fakultät für Ingenieurwissenschaften und Informatik Institut für Verteilte Systeme, Bachelorarbeit im Studiengang Informatik Comparison of Concurrency Frameworks for the Java Virtual Machine
More informationCUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing
CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @
More information1. Properties of Transactions
Department of Computer Science Software Development Methodology Transactions as First-Class Concepts in Object-Oriented Programming Languages Boydens Jeroen Steegmans Eric 13 march 2008 1. Properties of
More informationIntroduction to GPU Computing
Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture
More information