Programming with CUDA

Jens K. Mueller
Department of Mathematics and Computer Science, Friedrich-Schiller-University Jena
Monday 23rd May, 2011

Today's lecture: OpenCL

OpenCL
- Heterogeneous (GPU, CPU, etc.) standard for general-purpose programming
- Efficiency
- Portability
- Ranging from compute servers to handheld devices
- Based on C99
- Language for writing kernels and runtime APIs
- OpenCL 1.1 (14th June 2010)
- Developed by the Khronos Group consortium [9]
- Implementations by AMD, Nvidia, IBM, Apple, ...

Computing Platform and Terminology
- 1 host and 1+ compute devices
- The host submits work to a device's work queue
- Terminology
  - Work item: basic unit of work
  - Kernel: describes the work of a work item (C function)
  - Program: collection of kernels
  - Context: environment for working with devices

OpenCL Header File
- C header files
  1. Include CL/opencl.h
- There are also C++ bindings at the Khronos OpenCL API Registry.
  1. Download cl.hpp
  2. Include cl.hpp

    #include <CL/opencl.h>

Listing 1: Including OpenCL header files

OpenCL Platform Layer API
- Underlying hardware abstraction
- Query OpenCL devices
- Device configuration information
- Create an OpenCL context for one or more devices
  - clCreateContext
  - clCreateContextFromType
- Device types: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, CL_DEVICE_TYPE_ALL

Creating a Context

    // error code returned by the OpenCL calls
    cl_int clerror;
    // create context
    cl_context context;
    context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, &clerror);

Listing 2: Creating an OpenCL context

Check for Devices in the Context

    // query all devices available to the context
    size_t ncontextdescriptorsize;
    // first call: size 0 and no buffer, only ask for the required number of bytes
    clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &ncontextdescriptorsize);
    cl_device_id *devices = (cl_device_id *) malloc(ncontextdescriptorsize);
    // second call: fill the device list
    clGetContextInfo(context, CL_CONTEXT_DEVICES, ncontextdescriptorsize, devices, NULL);

Listing 3: Query devices within the context

Command Queues and Events
- Queues belong to a device
- Enqueuing kernels
- Events to synchronize between queues

    // create a command queue for first device of the context
    cl_command_queue cmdqueue;
    cmdqueue = clCreateCommandQueue(context, devices[0], 0, &clerror);

Listing 4: Create a command queue

- Queue properties (third argument): CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, CL_QUEUE_PROFILING_ENABLE

OpenCL Programs
- Kernels: derived from C99
  - No function pointers, recursion, variable length arrays, bit fields, variadic functions, ...
  - Work items/work groups
  - Vector types
  - Synchronization
  - Address space qualifiers
  - Built-in functions (image manipulation, work-item manipulation, math functions, ...)
- Kernels are built by the runtime (not by an external compiler)

Vector Types

    int4 vi0 = (int4) -7;
    int4 vi1 = (int4)(0, 1, 2, 3);
    vi0.lo = vi1.hi;
    int8 v8 = (int8)(vi0, vi1.s01, vi1.odd);

- Vector operations
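To make the component selectors on this slide concrete, the following host-side sketch emulates the four statements with plain C arrays. The types `emu_int4` and `emu_int8` and the function names are hypothetical stand-ins invented for this illustration; they are not part of OpenCL. The sketch assumes the usual OpenCL C meaning of the selectors: `.lo` and `.hi` are the lower and upper halves, `.s01` picks components 0 and 1, and `.odd` picks the odd-indexed components.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-ins for the OpenCL C types int4 and int8. */
typedef struct { int s[4]; } emu_int4;
typedef struct { int s[8]; } emu_int8;

/* (int4) x: casting a scalar to a vector replicates it into every component. */
emu_int4 emu_splat4(int x)
{
    emu_int4 v = {{x, x, x, x}};
    return v;
}

/* Replays the four statements of the slide and returns the final int8 value. */
emu_int8 emu_slide_example(void)
{
    emu_int4 vi0 = emu_splat4(-7);      /* int4 vi0 = (int4) -7;          */
    emu_int4 vi1 = {{0, 1, 2, 3}};      /* int4 vi1 = (int4)(0, 1, 2, 3); */

    /* vi0.lo = vi1.hi; the lower half of vi0 takes the upper half of vi1 */
    vi0.s[0] = vi1.s[2];
    vi0.s[1] = vi1.s[3];

    /* int8 v8 = (int8)(vi0, vi1.s01, vi1.odd); 4 + 2 + 2 components */
    emu_int8 v8 = {{
        vi0.s[0], vi0.s[1], vi0.s[2], vi0.s[3],  /* vi0               */
        vi1.s[0], vi1.s[1],                      /* vi1.s01           */
        vi1.s[1], vi1.s[3]                       /* vi1.odd (1 and 3) */
    }};
    assert(sizeof v8.s / sizeof v8.s[0] == 8);
    return v8;
}
```

Under these assumptions `emu_slide_example()` yields the components 2, 3, -7, -7, 0, 1, 1, 3.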

Kernel Configuration
- Global domain of work items (global dimension)
- Local domain of work groups (local dimension)
- No synchronization between work groups
- Synchronization within a work group is possible
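The relation between the global and the local domain can be sketched per dimension in plain C. The function names below are hypothetical helpers for this illustration, not OpenCL API calls; the sketch assumes a zero global offset and a global size that is a multiple of the local size, which OpenCL 1.1 requires when a local size is specified.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work groups along one dimension.
 * Assumes the global size is evenly divisible by the local size. */
size_t groups_per_dimension(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Global id of a work item, given its group id and its local id
 * inside that group (global offset assumed to be zero). */
size_t global_id_of(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For example, a global size of 64 with a local size of 16 gives 4 work groups, and local id 5 in group 2 corresponds to global id 37.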

OpenCL Memories
- Private memory (per work item): private
- Local memory (per work group): local
- Global/constant memory (all work groups): global and constant
- Host memory

Building Programs
- clCreateProgramWithSource and clCreateProgramWithBinary
- clBuildProgram

    cl_program program;
    program = clCreateProgramWithSource(context, 1, &kernelsource, NULL, &clerror);
    CHECK_EQ(CL_SUCCESS, clerror);

Listing 5: Create a program

    clerror = clBuildProgram(program, 1, devices, NULL, NULL, NULL);

Listing 6: Build a program

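Program Build Info
- clGetProgramBuildInfo

    size_t sizebuildlog = 0;
    // first call: ask how many bytes the build log needs
    // (a fixed-size buffer fails with CL_INVALID_VALUE when the log is longer)
    clerror = clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 0, NULL, &sizebuildlog);
    CHECK_EQ(CL_SUCCESS, clerror);
    char *result = (char *) malloc(sizebuildlog);
    // second call: copy the log into the buffer
    clerror = clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, sizebuildlog, result, NULL);
    CHECK_EQ(CL_SUCCESS, clerror);
    LOG(INFO) << "Build log: " << result;
    free(result);

Listing 7: Build Information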

Create a Kernel
- clCreateKernel and clCreateKernelsInProgram

    cl_kernel kernel;
    kernel = clCreateKernel(program, "kernel", &clerror);

Listing 8: Create kernel

Execute a Kernel
- clSetKernelArg

    int arg = 0;
    clerror = clSetKernelArg(kernel, arg++, sizeof(cl_mem), (void *) &image);
    clerror = clSetKernelArg(kernel, arg++, sizeof(val), (void *) &val);
    clerror = clSetKernelArg(kernel, arg++, sizeof(val), (void *) &val1);
    clerror = clSetKernelArg(kernel, arg++, sizeof(val), (void *) &val2);

Listing 9: Specify kernel arguments

Execute a Kernel (cont.)
- clEnqueueNDRangeKernel

    const cl_uint dim = 2;
    // size_t localworksize[dim] = {16, 16};
    size_t globalworksize[dim] = {width, height};
    // execute kernel; a NULL local work size lets the implementation choose the work-group size
    clerror = clEnqueueNDRangeKernel(cmdqueue, kernel, dim, NULL, globalworksize, NULL, 0, NULL, NULL);

Listing 10: Execute a kernel
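The listing passes NULL as the local work size. If the commented-out {16, 16} were passed instead, OpenCL 1.1 would require each global size to be a multiple of the corresponding local size. A common workaround, sketched here with a hypothetical helper `round_up` that is not part of the OpenCL API, is to pad the global size on the host and let the kernel ignore work items outside the image.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds value up to the next multiple of `multiple`.
 * Hypothetical host-side helper; intended for padding a global work size
 * so that it is divisible by an explicitly chosen local work size. */
size_t round_up(size_t value, size_t multiple)
{
    assert(multiple > 0);
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}
```

With a local size of 16, a width of 500 would be padded to 512 while a width of 640 stays unchanged. The kernel then has to compare its global id against the real width and height before writing.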

Kernel

    kernel void kernel(write_only image2d_t dst, float zoom, float to_x, float to_y)
    {
        uint dimensions = get_work_dim();
        for (uint d = 0; d < dimensions; ++d) {
            size_t globalsize = get_global_size(d);
            size_t globalid = get_global_id(d);
            size_t localsize = get_local_size(d);
            size_t localid = get_local_id(d);
            size_t numgroups = get_num_groups(d);
            size_t groupid = get_group_id(d);
            size_t globaloffset = get_global_offset(d);
        }
    }

Listing 11: A Kernel

Memory Objects
- Buffer objects and image objects
- Manage memory
- Sub-buffer objects to distribute to multiple devices
- Buffer objects
  - clCreateBuffer, clCreateSubBuffer
  - clEnqueueReadBuffer, clEnqueueWriteBuffer, clEnqueueCopyBuffer

Memory Objects (cont.)
- Buffer objects and image objects
- Image objects
  - clCreateImage{2,3}D
  - clGetSupportedImageFormats
  - clEnqueueCopyImageToBuffer and clEnqueueCopyBufferToImage
  - clEnqueueReadImage, clEnqueueCopyImage, and clEnqueueWriteImage
  - CL_RGBA, CL_BGRA (optional: CL_R, CL_A, ...)
  - Kernel: {read,write}_image{f,i,ui}

Image Example

    // allocate an image
    cl_image_format format;
    format.image_channel_order = CL_RGBA;
    format.image_channel_data_type = CL_UNSIGNED_INT8;
    cl_mem image = clCreateImage2D(context, CL_MEM_WRITE_ONLY, &format, width, height, NULL, NULL, &clerror);

Listing 12: Allocate an image
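When such an image is later read back to the host (for example with clEnqueueReadImage), the host buffer has to be large enough for the chosen format. The following sketch computes the tightly packed size for the CL_RGBA / CL_UNSIGNED_INT8 format used above, that is, four channels of one byte each. The function and constant names are hypothetical and introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Layout of one CL_RGBA / CL_UNSIGNED_INT8 pixel: 4 channels, 1 byte each. */
enum { RGBA_CHANNELS = 4, UINT8_BYTES = 1 };

/* Bytes in one tightly packed row of the image. */
size_t rgba8_row_pitch(size_t width)
{
    assert(width > 0);  /* a 2D image needs a non-zero width */
    return width * RGBA_CHANNELS * UINT8_BYTES;
}

/* Bytes a host buffer needs to hold the whole image without row padding. */
size_t rgba8_image_bytes(size_t width, size_t height)
{
    assert(height > 0); /* a 2D image needs a non-zero height */
    return rgba8_row_pitch(width) * height;
}
```

For a 640 by 480 image this gives a row pitch of 2560 bytes and a total of 1228800 bytes.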

References
[6] OpenCL 1.1 Quick Reference card. Version URL:
[7] OpenCL 1.1 Reference Pages. Khronos Group. URL: docs/man/xhtml/.
[8] OpenCL 1.1 Specification. Version 36. Sept. 30, URL: http: // 1.1.pdf.
[9] OpenCL. The open standard for parallel programming of heterogeneous systems. URL: (cit. on p. 3).

Mitglied der Helmholtz-Gemeinschaft. OpenCL Basics. Parallel Computing on GPU and CPU. Willi Homberg. 23. März 2011

Mitglied der Helmholtz-Gemeinschaft. OpenCL Basics. Parallel Computing on GPU and CPU. Willi Homberg. 23. März 2011 Mitglied der Helmholtz-Gemeinschaft OpenCL Basics Parallel Computing on GPU and CPU Willi Homberg Agenda Introduction OpenCL architecture Platform model Execution model Memory model Programming model Platform

More information

OpenCL. Administrivia. From Monday. Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011. Assignment 5 Posted. Project

OpenCL. Administrivia. From Monday. Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011. Assignment 5 Posted. Project Administrivia OpenCL Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 5 Posted Due Friday, 03/25, at 11:59pm Project One page pitch due Sunday, 03/20, at 11:59pm 10 minute pitch

More information

Lecture 3. Optimising OpenCL performance

Lecture 3. Optimising OpenCL performance Lecture 3 Optimising OpenCL performance Based on material by Benedict Gaster and Lee Howes (AMD), Tim Mattson (Intel) and several others. - Page 1 Agenda Heterogeneous computing and the origins of OpenCL

More information

WebCL for Hardware-Accelerated Web Applications. Won Jeon, Tasneem Brutch, and Simon Gibbs

WebCL for Hardware-Accelerated Web Applications. Won Jeon, Tasneem Brutch, and Simon Gibbs WebCL for Hardware-Accelerated Web Applications Won Jeon, Tasneem Brutch, and Simon Gibbs What is WebCL? WebCL is a JavaScript binding to OpenCL. WebCL enables significant acceleration of compute-intensive

More information

Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful:

Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: Course materials In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: OpenCL C 1.2 Reference Card OpenCL C++ 1.2 Reference Card These cards will

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

AMD Accelerated Parallel Processing. OpenCL Programming Guide. November 2013. rev2.7

AMD Accelerated Parallel Processing. OpenCL Programming Guide. November 2013. rev2.7 AMD Accelerated Parallel Processing OpenCL Programming Guide November 2013 rev2.7 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Accelerated Parallel Processing, the

More information

Accelerating sequential computer vision algorithms using OpenMP and OpenCL on commodity parallel hardware

Accelerating sequential computer vision algorithms using OpenMP and OpenCL on commodity parallel hardware Accelerating sequential computer vision algorithms using OpenMP and OpenCL on commodity parallel hardware 25 August 2014 Copyright 2001 2014 by NHL Hogeschool and Van de Loosdrecht Machine Vision BV All

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

Introduction to OpenCL Programming. Training Guide

Introduction to OpenCL Programming. Training Guide Introduction to OpenCL Programming Training Guide Publication #: 137-41768-10 Rev: A Issue Date: May, 2010 Introduction to OpenCL Programming PID: 137-41768-10 Rev: A May, 2010 2010 Advanced Micro Devices

More information

OpenCL Static C++ Kernel Language Extension

OpenCL Static C++ Kernel Language Extension OpenCL Static C++ Kernel Language Extension Document Revision: 04 Advanced Micro Devices Authors: Ofer Rosenberg, Benedict R. Gaster, Bixia Zheng, Irina Lipov December 15, 2011 Contents 1 Overview... 3

More information

Cross-Platform GP with Organic Vectory BV Project Services Consultancy Services Expertise Markets 3D Visualization Architecture/Design Computing Embedded Software GIS Finance George van Venrooij Organic

More information

Programming Guide. ATI Stream Computing OpenCL. June 2010. rev1.03

Programming Guide. ATI Stream Computing OpenCL. June 2010. rev1.03 Programming Guide ATI Stream Computing OpenCL June 2010 rev1.03 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, Radeon, FireStream, FirePro, Catalyst,

More information

COSCO 2015 Heterogeneous Computing Programming

COSCO 2015 Heterogeneous Computing Programming COSCO 2015 Heterogeneous Computing Programming Michael Meyer, Shunsuke Ishikuro Supporters: Kazuaki Sasamoto, Ryunosuke Murakami July 24th, 2015 Heterogeneous Computing Programming 1. Overview 2. Methodology

More information

CUDA Basics. Murphy Stein New York University

CUDA Basics. Murphy Stein New York University CUDA Basics Murphy Stein New York University Overview Device Architecture CUDA Programming Model Matrix Transpose in CUDA Further Reading What is CUDA? CUDA stands for: Compute Unified Device Architecture

More information

GPGPU. General Purpose Computing on. Diese Folien wurden von Mathias Bach und David Rohr erarbeitet

GPGPU. General Purpose Computing on. Diese Folien wurden von Mathias Bach und David Rohr erarbeitet GPGPU General Purpose Computing on Graphics Processing Units Diese Folien wurden von Mathias Bach und David Rohr erarbeitet Volker Lindenstruth (www.compeng.de) 15. November 2011 Copyright, Goethe Uni,

More information

How OpenCL enables easy access to FPGA performance?

How OpenCL enables easy access to FPGA performance? How OpenCL enables easy access to FPGA performance? Suleyman Demirsoy Agenda Introduction OpenCL Overview S/W Flow H/W Architecture Product Information & design flow Applications Additional Collateral

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

OpenCL. An Introduction for HPC programmers. Tim Mattson, Intel

OpenCL. An Introduction for HPC programmers. Tim Mattson, Intel OpenCL An Introduction for HPC programmers (based on a tutorial presented at ISC 11 by Tim Mattson and Udeepta Bordoloi) Tim Mattson, Intel Acknowledgements: Ben Gaster (AMD), Udeepta Bordoloi (AMD) and

More information

Optimization. NVIDIA OpenCL Best Practices Guide. Version 1.0

Optimization. NVIDIA OpenCL Best Practices Guide. Version 1.0 Optimization NVIDIA OpenCL Best Practices Guide Version 1.0 August 10, 2009 NVIDIA OpenCL Best Practices Guide REVISIONS Original release: July 2009 ii August 16, 2009 Table of Contents Preface... v What

More information

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.

More information

SYCL for OpenCL. Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March 2014. Copyright Khronos Group 2014 - Page 1

SYCL for OpenCL. Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March 2014. Copyright Khronos Group 2014 - Page 1 SYCL for OpenCL Andrew Richards, CEO Codeplay & Chair SYCL Working group GDC, March 2014 Copyright Khronos Group 2014 - Page 1 Where is OpenCL today? OpenCL: supported by a very wide range of platforms

More information

Monte Carlo Method for Stock Options Pricing Sample

Monte Carlo Method for Stock Options Pricing Sample Monte Carlo Method for Stock Options Pricing Sample User's Guide Copyright 2013 Intel Corporation All Rights Reserved Document Number: 325264-003US Revision: 1.0 Document Number: 325264-003US Intel SDK

More information

VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units

VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units Shucai Xiao Pavan Balaji Qian Zhu 3 Rajeev Thakur Susan Coghlan 4 Heshan Lin Gaojin Wen 5 Jue Hong 5 Wu-chun Feng

More information

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1

Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1 Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion

More information

gpus1 Ubuntu 10.04 Available via ssh

gpus1 Ubuntu 10.04 Available via ssh gpus1 Ubuntu 10.04 Available via ssh root@gpus1:[~]#lspci -v grep VGA 01:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a) 03:00.0 VGA compatible controller: nvidia Corporation

More information

Java GPU Computing. Maarten Steur & Arjan Lamers

Java GPU Computing. Maarten Steur & Arjan Lamers Java GPU Computing Maarten Steur & Arjan Lamers Overzicht OpenCL Simpel voorbeeld Casus Tips & tricks Vragen Waarom GPU Computing Afkortingen CPU, GPU, APU Khronos: OpenCL, OpenGL Nvidia: CUDA JogAmp JOCL,

More information

CUDA Debugging. GPGPU Workshop, August 2012. Sandra Wienke Center for Computing and Communication, RWTH Aachen University

CUDA Debugging. GPGPU Workshop, August 2012. Sandra Wienke Center for Computing and Communication, RWTH Aachen University CUDA Debugging GPGPU Workshop, August 2012 Sandra Wienke Center for Computing and Communication, RWTH Aachen University Nikolay Piskun, Chris Gottbrath Rogue Wave Software Rechen- und Kommunikationszentrum

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

Stichting NIOC en de NIOC kennisbank

Stichting NIOC en de NIOC kennisbank Stichting NIOC Stichting NIOC en de NIOC kennisbank Stichting NIOC (www.nioc.nl) stelt zich conform zijn statuten tot doel: het realiseren van congressen over informatica onderwijs en voorts al hetgeen

More information

GPU Profiling with AMD CodeXL

GPU Profiling with AMD CodeXL GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel OUTLINE 1. Motivation 2. GPU Recap 3. OpenCL 4. CodeXL Overview 5. CodeXL Internals 6. CodeXL Profiling 7. CodeXL Debugging 8. Sources

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Altera SDK for OpenCL

Altera SDK for OpenCL Altera SDK for OpenCL Best Practices Guide Subscribe OCL003-15.0.0 101 Innovation Drive San Jose, CA 95134 www.altera.com TOC-2 Contents...1-1 Introduction...1-1 FPGA Overview...1-1 Pipelines... 1-2 Single

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

GPU ACCELERATED DATABASES Database Driven OpenCL Programming. Tim Child 3DMashUp CEO

GPU ACCELERATED DATABASES Database Driven OpenCL Programming. Tim Child 3DMashUp CEO GPU ACCELERATED DATABASES Database Driven OpenCL Programming Tim Child 3DMashUp CEO SPEAKERS BIO Tim Child 35 years experience of software development Formerly VP Engineering, Oracle Corporation VP Engineering,

More information

GPGPU Parallel Merge Sort Algorithm

GPGPU Parallel Merge Sort Algorithm GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led

More information

PDC Summer School Introduction to High- Performance Computing: OpenCL Lab

PDC Summer School Introduction to High- Performance Computing: OpenCL Lab PDC Summer School Introduction to High- Performance Computing: OpenCL Lab Instructor: David Black-Schaffer Introduction This lab assignment is designed to give you experience

More information

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009 AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU

More information

CSI 402 Lecture 13 (Unix Process Related System Calls) 13 1 / 17

CSI 402 Lecture 13 (Unix Process Related System Calls) 13 1 / 17 CSI 402 Lecture 13 (Unix Process Related System Calls) 13 1 / 17 System Calls for Processes Ref: Process: Chapter 5 of [HGS]. A program in execution. Several processes are executed concurrently by the

More information

Lecture 1: an introduction to CUDA

Lecture 1: an introduction to CUDA Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Overview hardware view software view CUDA programming

More information

Parallel Web Programming

Parallel Web Programming Parallel Web Programming Tobias Groß, Björn Meier Hardware/Software Co-Design, University of Erlangen-Nuremberg May 23, 2013 Outline WebGL OpenGL Rendering Pipeline Shader WebCL Motivation Development

More information

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts

More information

CUDA Programming. Week 4. Shared memory and register

CUDA Programming. Week 4. Shared memory and register CUDA Programming Week 4. Shared memory and register Outline Shared memory and bank confliction Memory padding Register allocation Example of matrix-matrix multiplication Homework SHARED MEMORY AND BANK

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon

More information

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C

Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C Embedded Programming in C/C++: Lesson-1: Programming Elements and Programming in C 1 An essential part of any embedded system design Programming 2 Programming in Assembly or HLL Processor and memory-sensitive

More information

OpenACC 2.0 and the PGI Accelerator Compilers

OpenACC 2.0 and the PGI Accelerator Compilers OpenACC 2.0 and the PGI Accelerator Compilers Michael Wolfe The Portland Group michael.wolfe@pgroup.com This presentation discusses the additions made to the OpenACC API in Version 2.0. I will also present

More information

Rootbeer: Seamlessly using GPUs from Java

Rootbeer: Seamlessly using GPUs from Java Rootbeer: Seamlessly using GPUs from Java Phil Pratt-Szeliga. Dr. Jim Fawcett. Dr. Roy Welch. Syracuse University. Rootbeer Overview and Motivation Rootbeer allows a developer to program a GPU in Java

More information

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ Answer the following 1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++ 2) Which data structure is needed to convert infix notations to postfix notations? Stack 3) The

More information

Illustration 1: Diagram of program function and data flow

Illustration 1: Diagram of program function and data flow The contract called for creation of a random access database of plumbing shops within the near perimeter of FIU Engineering school. The database features a rating number from 1-10 to offer a guideline

More information

GPU Tools Sandra Wienke

GPU Tools Sandra Wienke Sandra Wienke Center for Computing and Communication, RWTH Aachen University MATSE HPC Battle 2012/13 Rechen- und Kommunikationszentrum (RZ) Agenda IDE Eclipse Debugging (CUDA) TotalView Profiling (CUDA

More information

Introduction to CUDA C

Introduction to CUDA C Introduction to CUDA C What is CUDA? CUDA Architecture Expose general-purpose GPU computing as first-class capability Retain traditional DirectX/OpenGL graphics performance CUDA C Based on industry-standard

More information

The C Programming Language course syllabus associate level

The C Programming Language course syllabus associate level TECHNOLOGIES The C Programming Language course syllabus associate level Course description The course fully covers the basics of programming in the C programming language and demonstrates fundamental programming

More information

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...

More information

College of William & Mary Department of Computer Science

College of William & Mary Department of Computer Science Technical Report WM-CS-2010-03 College of William & Mary Department of Computer Science WM-CS-2010-03 Implementing the Dslash Operator in OpenCL Andy Kowalski, Xipeng Shen {kowalski,xshen}@cs.wm.edu Department

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

Tutorial: Harnessing the Power of FPGAs using Altera s OpenCL Compiler Desh Singh, Tom Czajkowski, Andrew Ling

Tutorial: Harnessing the Power of FPGAs using Altera s OpenCL Compiler Desh Singh, Tom Czajkowski, Andrew Ling Tutorial: Harnessing the Power of FPGAs using Altera s OpenCL Compiler Desh Singh, Tom Czajkowski, Andrew Ling OPENCL INTRODUCTION Programmable Solutions Technology scaling favors programmability and parallelism

More information

3F6 - Software Engineering and Design. Handout 10 Distributed Systems I With Markup. Steve Young

3F6 - Software Engineering and Design. Handout 10 Distributed Systems I With Markup. Steve Young 3F6 - Software Engineering and Design Handout 10 Distributed Systems I With Markup Steve Young Contents 1. Distributed systems 2. Client-server architecture 3. CORBA 4. Interface Definition Language (IDL)

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

OpenACC Basics Directive-based GPGPU Programming

OpenACC Basics Directive-based GPGPU Programming OpenACC Basics Directive-based GPGPU Programming Sandra Wienke, M.Sc. wienke@rz.rwth-aachen.de Center for Computing and Communication RWTH Aachen University Rechen- und Kommunikationszentrum (RZ) PPCES,

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Release Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11

Release Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11 Release Notes for Open Grid Scheduler/Grid Engine Version: Grid Engine 2011.11 New Features Berkeley DB Spooling Directory Can Be Located on NFS The Berkeley DB spooling framework has been enhanced such

More information

Android Renderscript. Stephen Hines, Shih-wei Liao, Jason Sams, Alex Sakhartchouk srhines@google.com November 18, 2011

Android Renderscript. Stephen Hines, Shih-wei Liao, Jason Sams, Alex Sakhartchouk srhines@google.com November 18, 2011 Android Renderscript Stephen Hines, Shih-wei Liao, Jason Sams, Alex Sakhartchouk srhines@google.com November 18, 2011 Outline Goals/Design of Renderscript Components Offline Compiler Online JIT Compiler

More information

SMTP-32 Library. Simple Mail Transfer Protocol Dynamic Link Library for Microsoft Windows. Version 5.2

SMTP-32 Library. Simple Mail Transfer Protocol Dynamic Link Library for Microsoft Windows. Version 5.2 SMTP-32 Library Simple Mail Transfer Protocol Dynamic Link Library for Microsoft Windows Version 5.2 Copyright 1994-2003 by Distinct Corporation All rights reserved Table of Contents 1 Overview... 5 1.1

More information
