Introduction to GPU Programming Languages

Similar documents
Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Next Generation GPU Architecture Code-named Fermi

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

GPU Parallel Computing Architecture and CUDA Programming Model

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPGPU Computing. Yong Cao

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

GPUs for Scientific Computing

Introduction to GPU hardware and to CUDA

Introduction to Cloud Computing

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

CUDA programming on NVIDIA GPUs

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

ultra fast SOM using CUDA

Computer Graphics Hardware An Overview

Evaluation of CUDA Fortran for the CFD code Strukti

An Introduction to Parallel Computing/ Programming

L20: GPU Architecture and Models

LSN 2 Computer Processors

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

Introduction to GPU Computing

The Future Of Animation Is Games

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU for Scientific Computing. -Ali Saleh

HPC with Multicore and GPUs

Chapter 2 Parallel Architecture, Software And Performance

Case Study on Productivity and Performance of GPGPUs

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

GPU Computing - CUDA

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Parallel Computing with MATLAB

Computer Architecture TDTS10

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Scalability and Classifications

10- High Performance Compu5ng

Introduction to GPU Architecture

Guided Performance Analysis with the NVIDIA Visual Profiler

ST810 Advanced Computing

Introduction to GPGPU. Tiziano Diamanti

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

High Performance Computing

Lecture 1: an introduction to CUDA

Data-parallel Acceleration of PARSEC Black-Scholes Benchmark

CMSC 611: Advanced Computer Architecture

NVIDIA GeForce GTX 580 GPU Datasheet

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Go Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING

HPC Wales Skills Academy Course Catalogue 2015

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

GPGPU accelerated Computational Fluid Dynamics

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

Middleware and Distributed Systems. Introduction. Dr. Martin v. Löwis

NVIDIA Tools For Profiling And Monitoring. David Goodwin

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach

A Comparison of Distributed Systems: ChorusOS and Amoeba

OpenCL Programming for the CUDA Architecture. Version 2.3

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

~ Greetings from WSU CAPPLab ~

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers

Texture Cache Approximation on GPUs

RWTH GPU Cluster. Sandra Wienke November Rechen- und Kommunikationszentrum (RZ) Fotos: Christian Iwainsky

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

ANALYSIS OF SUPERCOMPUTER DESIGN

Part I Courses Syllabus

Parallel Programming

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

Parallel Computing for Data Science

GPGPU acceleration in OpenFOAM

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture

A Pattern-Based Comparison of OpenACC & OpenMP for Accelerators

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Parallel Programming Survey

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Clustering Billions of Data Points Using GPUs

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky

GPGPU Parallel Merge Sort Algorithm

Concurrent Solutions to Linear Systems using Hybrid CPU/GPU Nodes

CUDA Basics. Murphy Stein New York University

GPU Programming Strategies and Trends in GPU Computing

Transcription:

CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho

http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

Intel s Response to NVIDIA GPUs

To Accelerate Or Not To Accelerate Pro: They make your code run faster. Cons: They re expensive. They re hard to program. Your code may not be cross-platform.

When is GPUs appropriate? Applications Traditional GPU Applications: Gaming, image processing i.e., manipulating image pixels, oftentimes the same operation on each pixel Scientific and Engineering Problems: physical modeling, matrix algebra, sorting, etc. Data parallel algorithms: Large data arrays Single Instruction, Multiple Data (SIMD) parallelism Floating point computations

Parallel Hardware Landscape: Instruction and Data Streams Flynn s Classification: Hardware dimensions of memory and control Data Streams Single Multiple Instruction Single SISD: Intel Pentium 4 SIMD: GPUs Streams Multiple MISD: No Examples Today MIMD: Intel Nalhelm

Single Instruction, Multiple Data (SIMD) Example Instruction: A[i][j] = A[i][j]++; Element-wise operations on vectors or matrices of data. Multiple processors: All processors execute the same set of instructions at the same time Each with data at a different address location. 0,0 0,1 0,2 0,3 1,0 1,1 1,2 1,3 Advantages: Simplifies synchronization Reduces instruction control hardware; One program, many results. Works best for highly data-parallel applications (e.g., matrix operations, monte carlo calculations) 2,0 2,1 2,2 2,3 3,0 3,1 3,2 3,3

CPU vs. GPU Hardware Design Philosophies Control CPU ALU ALU ALU ALU GPU DRAM DRAM

CUDA-capable GPU Hardware Architecture Processors execute computing threads Thread execution managers issues threads 128 thread processors grouped into 16 streaming multiprocessors (SMs) enables thread cooperation. CPU CPU GPU Host Input Assembler Thread Execution Manager GPU... Texture Texture Texture Texture Texture Texture Texture Texture Load/store Load/store Load/store Load/store Load/store Load/store Global Memory

CUDA-capable GPU Hardware Architecture Processors execute computing threads Thread execution managers issues threads 128 thread processors grouped into 16 streaming multiprocessors (SMs) enables thread cooperation. CPU CPU GPU Host Input Assembler Thread Execution Manager GPU... Texture Texture Texture Texture Texture Texture Texture Texture Load/store Load/store Load/store Load/store Load/store Load/store Global Memory

Single Instruction, Multiple Threads (SIMT) A version of SIMD used in GPUs. GPUs use a thread model to achieve a high parallel performance and hide memory latency. On a GPU, 10,000s of threads are mapped on to available processors that all execute the same set of instructions (on different data addresses). 0,0 0,1 0,2 0,3 1,0 1,1 1,2 1,3 2,0 2,1 2,2 2,3 3,0 3,1 3,2 3,3

Is it hard to program on a GPU? In the olden days (pre-2006) programming GPUs meant either: using a graphics standard like OpenGL (which is mostly meant for rendering), or getting fairly deep into the graphics rendering pipeline. To use a GPU to do general purpose number crunching, you had to make your number crunching pretend to be graphics. This is hard. Why bother?

How to Program on a GPU Today Proprietary programming language or extensions NVIDIA: CUDA (C/C++) AMD/ATI: StreamSDK/Brook+ (C/C++) OpenCL (Open Computing Language): an industry standard for doing number crunching on GPUs. Portland Group Inc (PGI) Fortran and C compilers with accelerator directives; PGI CUDA Fortran (Fortran 90 equivalent of NVIDIA s CUDA C). OpenMP version 4.0 may include directives for accelerators.