Intro to GPU computing. Spring 2015, Mark Silberstein, 048661, Technion

2 Serial vs. parallel program
Serial: one instruction at a time
Parallel: multiple instructions in parallel

3 GPUs are parallel processors
Throughput-oriented hardware
Example problem: I have 100 apples to eat
  1) High performance: finish one apple faster
  2) High throughput: finish all apples faster
Performance = parallel hardware + scalable parallel program!

4 Simplified GPU model
(see an introductory paper on my home page: «GPUs: High-performance Accelerators for Parallel Applications»)

5 GPU 101
[Figure: a CPU and a GPU, each with its own memory]

6-9 GPU is a co-processor (co-processor model)
[Figure sequence; labels: CPU, Computation, GPU, GPU kernel, Memory (one per processor)]

10 Simple GPU program
Idea: the same set of operations is applied to different data chunks in parallel
Implementation:
  Every thread runs the same code on different data chunks
  The GPU concurrently runs many parallel threads

11-14 Vector sum A=A+B
Sequential algorithm:
  For every element i < A.size
    A[i]=A[i]+B[i]
Parallel algorithm:
  In parallel for every i < A.size
    A[i]=A[i]+B[i]
  Instantiated and executed on GPU in thousands of hardware threads
Execution on GPU:
  In parallel for every threadId < total GPU kernel threads
    A[threadId]=A[threadId]+B[threadId]

15 GPU programming basics

16 GPU execution model
Thread: a sequence of sequentially executed instructions
  Each thread has private state: registers, stack
Single Instruction Multiple Threads (SIMT) model
  The same program, called a kernel, is instantiated and invoked for each thread
  Each thread gets a unique ID
Many threads (tens of thousands)
  Threads wait in a hardware queue until resources become available
  All threads share access to GPU memory
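The "unique ID" point can be made concrete with a minimal sketch in which every thread stores its own ID in GPU memory. The kernel name record_thread_ids and the host helper collect_thread_ids are hypothetical names chosen for illustration; the ID formula assumes a one-dimensional launch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread runs this same kernel; only the computed ID differs.
__global__ void record_thread_ids(int* ids, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;  // unique per thread in a 1-D launch
    if (id < n) ids[id] = id;                        // all threads write to shared GPU memory
}

// Hypothetical host helper: launches blocks * threads_per_block threads and
// returns the IDs they wrote, or an empty vector if a CUDA call fails.
std::vector<int> collect_thread_ids(int blocks, int threads_per_block) {
    int n = blocks * threads_per_block;
    std::vector<int> host_ids(n, -1);
    int* d_ids = nullptr;
    if (cudaMalloc((void**)&d_ids, n * sizeof(int)) != cudaSuccess) return {};
    record_thread_ids<<<blocks, threads_per_block>>>(d_ids, n);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess) {
        // The copy waits for the kernel to finish before reading the results.
        err = cudaMemcpy(host_ids.data(), d_ids, n * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_ids);
    if (err != cudaSuccess) return {};
    return host_ids;
}
```

Calling collect_thread_ids(4, 256) should return the values 0 through 1023 in order, one per thread.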

17 CPU control of GPU
GPU* and CPU have different memory spaces
GPU is fully managed by a CPU
  cannot* access CPU RAM
  cannot* reallocate its own memory
  cannot* start computations
CPU must allocate memory and transfer input before kernel invocation
CPU must transfer output and clean up memory after kernel termination

18 Vector sum for A.size=1024
GPU:
  A[threadId]=A[threadId]+B[threadId]
CPU:
  1. Allocate arrays in GPU memory
  2. Copy data CPU -> GPU
  3. Invoke kernel with 1024 threads
  4. Wait until complete and copy data GPU -> CPU

19 Vector sum kernel code in CUDA

    // __global__: CUDA modifier to signify GPU kernels
    __global__ void sum(int* A, int* B) {
        // Get unique thread ID (the slides abbreviate this as gethwthreadid())
        int my = blockIdx.x * blockDim.x + threadIdx.x;
        A[my] = A[my] + B[my];  // fine-grain parallelism: one operation per thread
    }

20-22 CPU code for GPU management (typical code structure)

    #include <cstdio>

    #define SIZE 1024                        // A.size from slide 18
    #define SIZE_BYTES (SIZE * sizeof(int))

    // GPU programming: the kernel, executed on the GPU
    __global__ void sum(int* A, int* B) {
        int my = blockIdx.x * blockDim.x + threadIdx.x;
        A[my] = A[my] + B[my];
    }

    // GPU integration: management code, executed on the CPU
    int main(int argc, char** argv) {
        // Two sets of pointers: d_A, d_B refer to GPU memory; A, B live in CPU memory
        int *d_A, *d_B;
        int A[SIZE] = {0}, B[SIZE] = {0};  // placeholder input; the slides omit initialization

        // GPU memory allocated by CPU
        cudaMalloc((void**)&d_A, SIZE_BYTES);
        cudaMalloc((void**)&d_B, SIZE_BYTES);

        // Input copied from CPU to GPU
        cudaMemcpy(d_A, A, SIZE_BYTES, cudaMemcpyHostToDevice);
        cudaMemcpy(d_B, B, SIZE_BYTES, cudaMemcpyHostToDevice);

        // Pass pointers to device memory at kernel invocation
        dim3 threads_in_block(512), blocks(2);
        sum<<<blocks, threads_in_block>>>(d_A, d_B);

        // Wait until the kernel is over
        cudaDeviceSynchronize();
        cudaError_t error = cudaGetLastError();
        if (error != cudaSuccess) {
            fprintf(stderr, "kernel execution failed:%s\n", cudaGetErrorString(error));
            return 1;
        }

        // Copy data back to CPU
        cudaMemcpy(A, d_A, SIZE_BYTES, cudaMemcpyDeviceToHost);
        printf("Success");
        return 0;
    }
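The launch above covers exactly 1024 elements with 2 blocks of 512 threads. As a hedged sketch of how the same structure might handle an arbitrary length, the variant below rounds the number of blocks up and guards the array access. The names sum_n and vector_sum and the use of std::vector are assumptions made for this illustration, not part of the slides.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same per-thread operation as sum(), plus a bounds check for the last block.
__global__ void sum_n(int* A, const int* B, int n) {
    int my = blockIdx.x * blockDim.x + threadIdx.x;
    if (my < n) A[my] = A[my] + B[my];  // surplus threads in the last block do nothing
}

// Hypothetical host helper: computes a = a + b on the GPU.
// Returns true on success; a is updated in place.
bool vector_sum(std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) return false;
    int n = (int)a.size();
    if (n == 0) return true;
    size_t bytes = n * sizeof(int);
    int* d_A = nullptr;
    int* d_B = nullptr;
    if (cudaMalloc((void**)&d_A, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void**)&d_B, bytes) != cudaSuccess) { cudaFree(d_A); return false; }
    cudaMemcpy(d_A, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, b.data(), bytes, cudaMemcpyHostToDevice);
    const int threads_in_block = 512;
    int blocks = (n + threads_in_block - 1) / threads_in_block;  // round up
    sum_n<<<blocks, threads_in_block>>>(d_A, d_B, n);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(a.data(), d_A, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_A);  // clean up GPU memory after kernel termination
    cudaFree(d_B);
    return err == cudaSuccess;
}
```

With 1000 elements this launches 2 blocks (1024 threads), and the 24 surplus threads fail the bounds check and exit without touching memory.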

23 BUT!
Vector sum is simple: purely data parallel
What if we need coordination between tasks?

24 Inter-thread communication
Programming model usability is determined by the means of inter-thread communication
  No communication: easy for hardware, hard for software
  Fine-grained communication: hard to make efficient in hardware, easy for software

25-26 Communications: inner product
[Figure: a row of eight element-wise products (x) whose results are summed]
Eventually all-to-all thread communication is required
How can we make it efficient?

27 Compromise: Hierarchy
Threads -> Threadblocks
All threads are split up into fixed-size groups: threadblocks
Threads in a threadblock can synchronize and communicate efficiently
  Share fast memory
  Can use barriers

28 Execution model: stream of threadblocks
Threadblock scheduling is non-deterministic
  Cannot rely on a schedule for correctness
No inter-threadblock global barrier in the programming model
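One way to observe the threadblock schedule is to let each block take a ticket from a global atomic counter when it starts. This is only an illustrative sketch: record_block_order and observe_block_order are hypothetical names, and the order seen on a given GPU and run may well look sequential, which is exactly why correctness must not depend on it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The first thread of each block takes the next ticket from a global counter.
// order[b] is the position at which block b obtained its ticket.
__global__ void record_block_order(int* order, int* counter) {
    if (threadIdx.x == 0) {
        order[blockIdx.x] = atomicAdd(counter, 1);
    }
}

// Hypothetical host helper: returns the ticket of every block,
// or an empty vector if a CUDA call fails.
std::vector<int> observe_block_order(int blocks, int threads_per_block) {
    std::vector<int> order(blocks, -1);
    int* d_order = nullptr;
    int* d_counter = nullptr;
    if (cudaMalloc((void**)&d_order, blocks * sizeof(int)) != cudaSuccess) return {};
    if (cudaMalloc((void**)&d_counter, sizeof(int)) != cudaSuccess) {
        cudaFree(d_order);
        return {};
    }
    cudaMemset(d_counter, 0, sizeof(int));  // tickets start at 0
    record_block_order<<<blocks, threads_per_block>>>(d_order, d_counter);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(order.data(), d_order, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_order);
    cudaFree(d_counter);
    if (err != cudaSuccess) return {};
    return order;
}
```

The only property a program may rely on is that the tickets form a permutation of 0 .. blocks-1; which block receives which ticket is up to the hardware scheduler.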

29 Hierarchical Dot-Product
[Figure: eight element-wise products (x) split between Threadblock 1 and Threadblock 2]

30 Slow coordination should be rare
[Figure: the same eight products in Threadblock 1 and Threadblock 2; labels: Slow coordination, Efficient communication, Coarse grain task, Fine-grain task]

31-32 Dot product

    // TBSIZE: number of threads per threadblock (compile-time constant, power of two)
    __global__ void vector_dotproduct_kernel(float* ga, float* gb, float* gout) {
        __shared__ float l_res[TBSIZE];  // fast local (per-core) memory

        // Two levels of indexing: thread within block, block within grid
        int tid = threadIdx.x;
        int bid = blockIdx.x;
        int offset = bid * TBSIZE + tid;

        l_res[tid] = ga[offset] * gb[offset];
        __syncthreads();  // fast sync: wait for all products in a threadblock

        // parallel reduction
        for (int i = TBSIZE / 2; i > 0; i /= 2) {
            if (tid < i) l_res[tid] = l_res[tid] + l_res[i + tid];
            __syncthreads();  // wait for all partial sums
        }
        if (tid == 0) gout[bid] = l_res[0];
    }

33 How to finalize the parallel reduce
Option 1: use CPU
Option 2: use atomics
Option 3: use multiple kernel invocations
  kernel completion = global barrier
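Options 1 and 3 can be sketched as follows. This is an illustration under stated assumptions rather than the course's reference code: the names reduce_in_block, partial_dot, sum_partials and dot_product are hypothetical, the input length must be a positive multiple of TBSIZE, and the number of threadblocks must not exceed TBSIZE so that a single second-stage block can add all partial sums.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TBSIZE 256  // threads per threadblock; must be a power of two

// Tree reduction over l_res[0..TBSIZE) within one threadblock; the sum ends up in l_res[0].
__device__ void reduce_in_block(float* l_res, int tid) {
    __syncthreads();  // all values are written before the reduction starts
    for (int i = TBSIZE / 2; i > 0; i /= 2) {
        if (tid < i) l_res[tid] += l_res[tid + i];
        __syncthreads();
    }
}

// Stage 1: every threadblock writes one partial dot product to gout[blockIdx.x].
__global__ void partial_dot(const float* ga, const float* gb, float* gout) {
    __shared__ float l_res[TBSIZE];
    int tid = threadIdx.x;
    int offset = blockIdx.x * TBSIZE + tid;
    l_res[tid] = ga[offset] * gb[offset];
    reduce_in_block(l_res, tid);
    if (tid == 0) gout[blockIdx.x] = l_res[0];
}

// Stage 2 for Option 3: a single threadblock adds up to TBSIZE partial sums.
__global__ void sum_partials(const float* partial, int count, float* out) {
    __shared__ float l_res[TBSIZE];
    int tid = threadIdx.x;
    l_res[tid] = (tid < count) ? partial[tid] : 0.0f;
    reduce_in_block(l_res, tid);
    if (tid == 0) out[0] = l_res[0];
}

// Hypothetical host helper. finalize_on_cpu == true selects Option 1,
// false selects Option 3. Returns false on invalid input or CUDA failure.
bool dot_product(const std::vector<float>& a, const std::vector<float>& b,
                 bool finalize_on_cpu, float* result) {
    int n = (int)a.size();
    int blocks = n / TBSIZE;
    if (n == 0 || a.size() != b.size() || n % TBSIZE != 0 || blocks > TBSIZE) return false;

    // One allocation holding a, b, the partial sums and the final result.
    float* d_buf = nullptr;
    size_t total = 2 * (size_t)n + blocks + 1;
    if (cudaMalloc((void**)&d_buf, total * sizeof(float)) != cudaSuccess) return false;
    float* d_a = d_buf;
    float* d_b = d_a + n;
    float* d_partial = d_b + n;
    float* d_out = d_partial + blocks;
    cudaMemcpy(d_a, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    partial_dot<<<blocks, TBSIZE>>>(d_a, d_b, d_partial);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess && finalize_on_cpu) {
        // Option 1: copy the partial sums back and add them on the CPU.
        std::vector<float> partial(blocks);
        err = cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float),
                         cudaMemcpyDeviceToHost);
        float s = 0.0f;
        for (float p : partial) s += p;
        *result = s;
    } else if (err == cudaSuccess) {
        // Option 3: the second launch starts only after the first kernel has
        // completed (same stream), so kernel completion acts as the global barrier.
        sum_partials<<<1, TBSIZE>>>(d_partial, blocks, d_out);
        err = cudaGetLastError();
        if (err == cudaSuccess) {
            err = cudaMemcpy(result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        }
    }
    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

For two vectors of 1024 elements filled with 1.0 and 2.0, both options should produce 2048. Option 1 moves the partial sums over the CPU-GPU link; Option 3 keeps them on the GPU at the cost of a second launch.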

34 Dot product with atomics

    // TBSIZE: number of threads per threadblock (compile-time constant, power of two)
    __global__ void vector_dotproduct_kernel(float* ga, float* gb, float* gout) {
        __shared__ float l_res[TBSIZE];  // local core memory
        int tid = threadIdx.x;
        int bid = blockIdx.x;
        int offset = bid * TBSIZE + tid;

        l_res[tid] = ga[offset] * gb[offset];
        __syncthreads();  // wait for all products in a threadblock

        // parallel reduction
        for (int i = TBSIZE / 2; i > 0; i /= 2) {
            if (tid < i) l_res[tid] = l_res[tid] + l_res[i + tid];
            __syncthreads();  // wait for all partial sums
        }
        // All threadblocks are going to atomically update the global memory gout[0]
        if (tid == 0) atomicAdd(gout, l_res[0]);
    }
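The atomic variant needs some care on the CPU side: the accumulator in GPU memory must be zeroed before the launch, because every threadblock adds to it. The sketch below shows one possible host driver. The names dot_atomic and dot_product_atomic are hypothetical, the input length is assumed to be a positive multiple of TBSIZE, and atomicAdd on float requires a GPU of compute capability 2.0 or later.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TBSIZE 256  // threads per threadblock; must be a power of two

// Same structure as the slide's kernel: per-block reduction in shared memory,
// then one atomic update of the global result per threadblock.
__global__ void dot_atomic(const float* ga, const float* gb, float* gout) {
    __shared__ float l_res[TBSIZE];
    int tid = threadIdx.x;
    int offset = blockIdx.x * TBSIZE + tid;
    l_res[tid] = ga[offset] * gb[offset];
    __syncthreads();
    for (int i = TBSIZE / 2; i > 0; i /= 2) {
        if (tid < i) l_res[tid] += l_res[tid + i];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(gout, l_res[0]);
}

// Hypothetical host helper. Returns false on invalid input or CUDA failure.
bool dot_product_atomic(const std::vector<float>& a, const std::vector<float>& b,
                        float* result) {
    int n = (int)a.size();
    if (n == 0 || a.size() != b.size() || n % TBSIZE != 0) return false;

    // One allocation holding a, b and the single-float accumulator.
    float* d_buf = nullptr;
    if (cudaMalloc((void**)&d_buf, (2 * (size_t)n + 1) * sizeof(float)) != cudaSuccess) {
        return false;
    }
    float* d_a = d_buf;
    float* d_b = d_a + n;
    float* d_out = d_b + n;
    cudaMemcpy(d_a, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));  // all-zero bytes represent 0.0f

    dot_atomic<<<n / TBSIZE, TBSIZE>>>(d_a, d_b, d_out);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

Because the threadblocks reach atomicAdd in a non-deterministic order, the floating-point additions may be performed in a different order from run to run, so results can differ in the last bits when the partial sums are not exactly representable.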

35 GPU hardware in more detail

36 BEGIN: Terminology break

37 Flynn's HARDWARE Taxonomy

38 SIMD
Lock-step execution
Example: vector operations (SSE)

39 MIMD
Example: multicores

40 END: Terminology break

41 GPU hardware characteristics
Massive parallelism
Runs thousands of concurrent execution flows (threads), identified and explicitly programmed by a developer

42 GPU hardware parallelism
1. MIMD = multi-core
[Figure: a GPU with several cores (MPs) and GPU memory]

43 GPU hardware parallelism
2. SIMD = vector
[Figure: a core containing several SIMD vector units]

44-48 Fine-grain multithreading
3. Parallelism for latency hiding
[Figure sequence; labels: GPU, GPU memory, Core (MP), execution state of threads T1, T2, T3; memory reads R 0x01, R 0x04, R 0x08 appear one per step]

49 Putting it all together: 3 levels of hardware parallelism
[Figure: a GPU with several cores (MPs) and GPU memory; a core holds SIMD vector units and thread state contexts 1..k]

50 What is GPU thread?
[Figure: the diagram of slide 49 with Thread 1 .. Thread n marked]

51 What is thread block?
[Figure: the diagram of slide 49 with Thread 1 .. Thread n grouped into a thread block; label: scratchpad memory in core]

52 What is GPU kernel
[Figure: thread blocks in a hardware queue in front of the GPU's cores (MPs) and GPU memory]

53 10,000-s of concurrent threads!
NVIDIA K20x GPU: 64x14x32 = 28,672 concurrent threads
[Figure: the diagram of slide 49 with Thread 1 .. Thread n marked]
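The same count can be estimated for whatever GPU is installed by querying the CUDA runtime. The sketch below is an illustration, with hypothetical helper names resident_threads and max_resident_threads. Reading the slide's three factors as thread contexts per core, number of cores and SIMD width is an interpretation of the formula, not something the slide states explicitly.

```cuda
#include <cuda_runtime.h>

// Product of the three factors from the slide (interpretation: thread contexts
// per core x number of cores x SIMD width).
constexpr int resident_threads(int contexts_per_core, int cores, int simd_width) {
    return contexts_per_core * cores * simd_width;
}

static_assert(resident_threads(64, 14, 32) == 28672, "K20x figure from the slide");

// Hypothetical helper: upper bound on simultaneously resident threads for the
// given device, or -1 if the query fails.
int max_resident_threads(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    // multiProcessorCount: number of cores (MPs) on the device
    // maxThreadsPerMultiProcessor: thread contexts per core, counted in threads
    return prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;
}
```

Calling max_resident_threads(0) reports the limit for the first GPU in the system; how many threads are actually resident also depends on the registers and shared memory each thread block uses.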

54 Using shared memory for efficient matrix product computations (using CUDA samples)

55-57 Walk-through matrix product code
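The code walked through on these slides is not part of the transcript. As a stand-in, here is a minimal sketch of the general technique, a tiled matrix product that stages tiles in shared memory. It is not the code of the CUDA sample itself: the names matmul_tiled and matmul, the tile size of 16 and the restriction to square row-major matrices whose dimension is a multiple of the tile size are all assumptions made to keep the example short.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16  // tile edge; each threadblock has TILE x TILE threads

// C = A * B for square n x n row-major matrices, n a multiple of TILE.
// Each threadblock computes one TILE x TILE tile of C.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];  // tile of A in fast shared memory
    __shared__ float Bs[TILE][TILE];  // tile of B in fast shared memory
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        // Each thread loads one element of each tile from GPU memory.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // both tiles are fully loaded
        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // everyone is done before the tiles are overwritten
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper. Returns false on invalid input or CUDA failure.
bool matmul(const std::vector<float>& a, const std::vector<float>& b,
            std::vector<float>& c, int n) {
    if (n <= 0 || n % TILE != 0) return false;
    size_t count = (size_t)n * n;
    if (a.size() != count || b.size() != count) return false;
    c.assign(count, 0.0f);

    // One allocation holding A, B and C.
    float* d_buf = nullptr;
    if (cudaMalloc((void**)&d_buf, 3 * count * sizeof(float)) != cudaSuccess) return false;
    float* d_a = d_buf;
    float* d_b = d_a + count;
    float* d_c = d_b + count;
    cudaMemcpy(d_a, a.data(), count * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threads(TILE, TILE), blocks(n / TILE, n / TILE);
    matmul_tiled<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaGetLastError();  // reports launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(c.data(), d_c, count * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

With tiling, each element of A and B is read from GPU memory n / TILE times instead of n times, because every value loaded into shared memory is reused by TILE threads of the block.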

58 More info about CUDA
NVIDIA programming manual
UDACITY interactive online course: Intro to parallel computing
  Highly recommended: skim through the first two lectures before the next class


More information

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach Beniamino Di Martino, Antonio Esposito and Andrea Barbato Department of Industrial and Information Engineering Second University of Naples

More information

Parallel Firewalls on General-Purpose Graphics Processing Units

Parallel Firewalls on General-Purpose Graphics Processing Units Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering

More information

Hardware design for ray tracing

Hardware design for ray tracing Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex

More information

IMAGE PROCESSING WITH CUDA

IMAGE PROCESSING WITH CUDA IMAGE PROCESSING WITH CUDA by Jia Tse Bachelor of Science, University of Nevada, Las Vegas 2006 A thesis submitted in partial fulfillment of the requirements for the Master of Science Degree in Computer

More information

Lecture 3. Optimising OpenCL performance

Lecture 3. Optimising OpenCL performance Lecture 3 Optimising OpenCL performance Based on material by Benedict Gaster and Lee Howes (AMD), Tim Mattson (Intel) and several others. - Page 1 Agenda Heterogeneous computing and the origins of OpenCL

More information

Chapter 2 Parallel Architecture, Software And Performance

Chapter 2 Parallel Architecture, Software And Performance Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Programming GPUs with CUDA

Programming GPUs with CUDA Programming GPUs with CUDA Max Grossman Department of Computer Science Rice University johnmc@rice.edu COMP 422 Lecture 23 12 April 2016 Why GPUs? Two major trends GPU performance is pulling away from

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka

Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka René Widera1, Erik Zenker1,2, Guido Juckeland1, Benjamin Worpitz1,2, Axel Huebl1,2, Andreas Knüpfer2, Wolfgang E. Nagel2,

More information

GPI Global Address Space Programming Interface

GPI Global Address Space Programming Interface GPI Global Address Space Programming Interface SEPARS Meeting Stuttgart, December 2nd 2010 Dr. Mirko Rahn Fraunhofer ITWM Competence Center for HPC and Visualization 1 GPI Global address space programming

More information

COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP

COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP COMP/CS 605: Introduction to Parallel Computing Lecture 21: Shared Memory Programming with OpenMP Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Institute for Multimedia and Software Engineering Conduction of Exercises: Institute for Multimedia eda and Software Engineering g BB 315c, Tel: 379-1174 E-mail: marius.rosu@uni-due.de

More information

Course Development of Programming for General-Purpose Multicore Processors

Course Development of Programming for General-Purpose Multicore Processors Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 wzhang4@vcu.edu

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

GPGPU Parallel Merge Sort Algorithm

GPGPU Parallel Merge Sort Algorithm GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led

More information

CSE 6040 Computing for Data Analytics: Methods and Tools

CSE 6040 Computing for Data Analytics: Methods and Tools CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 12 Computer Architecture Overview and Why it Matters DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS

More information

Introduction to Parallel Programming and MapReduce

Introduction to Parallel Programming and MapReduce Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Operating System: Scheduling

Operating System: Scheduling Process Management Operating System: Scheduling OS maintains a data structure for each process called Process Control Block (PCB) Information associated with each PCB: Process state: e.g. ready, or waiting

More information

Parallel Prefix Sum (Scan) with CUDA. Mark Harris mharris@nvidia.com

Parallel Prefix Sum (Scan) with CUDA. Mark Harris mharris@nvidia.com Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com April 2007 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release April 2007

More information

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could

More information

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, typeng2001@yahoo.com

More information

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to

More information

Real-time Visual Tracker by Stream Processing

Real-time Visual Tracker by Stream Processing Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol

More information

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University

More information

GPGPU Computing. Yong Cao

GPGPU Computing. Yong Cao GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

More information

Multi-core Programming System Overview

Multi-core Programming System Overview Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Performance Analysis for GPU Accelerated Applications

Performance Analysis for GPU Accelerated Applications Center for Information Services and High Performance Computing (ZIH) Performance Analysis for GPU Accelerated Applications Working Together for more Insight Willersbau, Room A218 Tel. +49 351-463 - 39871

More information

Trends in High-Performance Computing for Power Grid Applications

Trends in High-Performance Computing for Power Grid Applications Trends in High-Performance Computing for Power Grid Applications Franz Franchetti ECE, Carnegie Mellon University www.spiral.net Co-Founder, SpiralGen www.spiralgen.com This talk presents my personal views

More information

First-class User Level Threads

First-class User Level Threads First-class User Level Threads based on paper: First-Class User Level Threads by Marsh, Scott, LeBlanc, and Markatos research paper, not merely an implementation report User-level Threads Threads managed

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Intro, GPU Computing February 9, 2012 Dan Negrut, 2012 ME964 UW-Madison "The Internet is a great way to get on the net. US Senator Bob Dole

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah *, Jin-Woo Kim *, Youngsam Shin, Jaedon Lee, Seok-Yoon Jung SAIT, SAMSUNG Electronics, Yonsei Univ. *,

More information

Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model X. SHI ET AL. 1 Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model Xuanhua Shi 1, Xuan Luo 1, Junling Liang 1, Peng Zhao 1, Sheng Di 2, Bingsheng He 3, and Hai Jin 1 1 Services Computing

More information

COSCO 2015 Heterogeneous Computing Programming

COSCO 2015 Heterogeneous Computing Programming COSCO 2015 Heterogeneous Computing Programming Michael Meyer, Shunsuke Ishikuro Supporters: Kazuaki Sasamoto, Ryunosuke Murakami July 24th, 2015 Heterogeneous Computing Programming 1. Overview 2. Methodology

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Computer Systems II. Unix system calls. fork( ) wait( ) exit( ) How To Create New Processes? Creating and Executing Processes

Computer Systems II. Unix system calls. fork( ) wait( ) exit( ) How To Create New Processes? Creating and Executing Processes Computer Systems II Creating and Executing Processes 1 Unix system calls fork( ) wait( ) exit( ) 2 How To Create New Processes? Underlying mechanism - A process runs fork to create a child process - Parent

More information

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components

More information