Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Similar documents
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Introduction to GPGPU. Tiziano Diamanti

GPGPU Computing. Yong Cao

Introduction to GPU Programming Languages

ST810 Advanced Computing

Introduction to GPU Architecture

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

Introduction to GPU hardware and to CUDA

Introduction to GPU Computing

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

GPU Architecture. Michael Doggett ATI

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

The Future Of Animation Is Games

L20: GPU Architecture and Models

QCD as a Video Game?

Computer Graphics Hardware An Overview

Next Generation GPU Architecture Code-named Fermi

Radeon HD 2900 and Geometry Generation. Michael Doggett

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

Real-time Visual Tracker by Stream Processing

GPUs for Scientific Computing

HPC with Multicore and GPUs

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Graphic Processing Units: a possible answer to High Performance Computing?

Data Parallel Computing on Graphics Hardware. Ian Buck Stanford University

ENHANCEMENT OF TEGRA TABLET'S COMPUTATIONAL PERFORMANCE BY GEFORCE DESKTOP AND WIFI

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

Appendix L. General-purpose GPU Radiative Solver. Andrea Tosetto Marco Giardino Matteo Gorlani (Blue Engineering & Design, Italy)

HP Workstations graphics card options

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

Interactive Level-Set Deformation On the GPU

Accelerating CFD using OpenFOAM with GPUs

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

NVIDIA GeForce GTX 580 GPU Datasheet

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Le langage OCaml et la programmation des GPU

CUDA programming on NVIDIA GPUs

GPU Parallel Computing Architecture and CUDA Programming Model

Introduction to Computer Graphics

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Parallel Web Programming

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Evaluation of CUDA Fortran for the CFD code Strukti

GPGPU in Scientific Applications

Image Processing & Video Algorithms with CUDA

Parallel Programming Survey

High Performance GPGPU Computer for Embedded Systems

AMD EMBEDDED PCIe ADD-IN BOARD Comparison

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

GPGPU accelerated Computational Fluid Dynamics

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Texture Cache Approximation on GPUs

Retargeting PLAPACK to Clusters with Hardware Accelerators

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Introduction to Computer Graphics. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

GPU Computing: More exciting times for HPC! Paul G. Howard, Ph.D. Chief Scientist, Microway, Inc. Copyright 2009 by Microway, Inc.

Brook for GPUs: Stream Computing on Graphics Hardware

gpus1 Ubuntu Available via ssh

HP Z Workstations graphics card options

GPGPU: General-Purpose Computation on GPUs

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Writing Applications for the GPU Using the RapidMind Development Platform

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE

How To Run A Factory I/O On A Microsoft Gpu 2.5 (Sdk) On A Computer Or Microsoft Powerbook 2.3 (Powerpoint) On An Android Computer Or Macbook 2 (Powerstation) On

HP Workstations graphics card options

Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Advanced Rendering for Engineering & Styling

SAPPHIRE HD GB GDDR5 PCIE.

Go Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

Console Architecture. By: Peter Hood & Adelia Wong

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Guided Performance Analysis with the NVIDIA Visual Profiler

Crosswalk: build world class hybrid mobile apps

GPGPUs, CUDA and OpenCL

ultra fast SOM using CUDA

1. INTRODUCTION Graphics 2

System requirements for Autodesk Building Design Suite 2017

White Paper OpenCL : The Future of Accelerated Application Performance Is Now. Table of Contents

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

Imperial College London

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Introduction to OpenCL Programming. Training Guide

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Transcription:

GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas Huckle Technische Universität München March 29 April 7, 2009

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

Introduction GPU Hardware highly parallel 1 Tflops for <120 e ( ˆ= 5,500 RUB) increasingly easy to program This talk: Applicability to numerical simulation and immediate visualization?

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

GPU Functionality I Standard Tasks display of graphical models and scenes on a computer works with polygons, vertices, textures

GPU Functionality II Standard Task: Shading of Pixels The same code fragment for every pixel ( SPMD) Typically a pixel s value is independent on the others massive gain through parallelization Recent Extensions increased programmability for more realistic visual effects simulation of rigid, deformable or fluid objects in game physics general purpose programmability

GPU Functionality III Programming Interfaces & Terminology DirectX Vertex Shader Pixel Shader Geometry Shader OpenGL Vertex Shader Fragment Shader (Geometry Shader)

Introduction GPU Hardware GPU Computing Today GPU Computing Example GPU Functionality IV Demo of AMD RenderMonkey Outlook Summary

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

GPU CPU Comparison x86 CPU Multi Purpose OS Instructions One ALU per Core Mild SIMD per Core Slow Memory Big Cache GPU Specialized to Graphics No OS Concepts +100 Parallel ALUs Extensive MIMD Fast Memory Small Cache

Peak Performance AMD/ATI Radeon HD 4870 180 e 1,200 MADD Gflops (single precision) 240 Gflops 115.2 GBytes/s Memory Bandwidth NVIDIA Geforce GTX 285 330 e 1,063 MADD Gflops (single precision) 78 Gflops 159.0 GBytes/s Memory Bandwidth Intel Core i7 965 XE 1000 e 70 M+ADD Gflops (single precision) 35 Gflops 25.6 GBytes/s Memory Bandwidth (= What about double precision?)

Peak Performance AMD/ATI Radeon HD 4870 180 e 1,200 MADD Gflops (single precision) 240 Gflops 115.2 GBytes/s Memory Bandwidth NVIDIA Geforce GTX 285 330 e 1,063 MADD Gflops (single precision) 78 Gflops 159.0 GBytes/s Memory Bandwidth Intel Core i7 965 XE 1000 e 70 M+ADD Gflops (single precision) 35 Gflops 25.6 GBytes/s Memory Bandwidth (= What about double precision?)

Peak Performance AMD/ATI Radeon HD 4870 180 e 1,200 MADD Gflops (single precision) 240 Gflops 115.2 GBytes/s Memory Bandwidth NVIDIA Geforce GTX 285 330 e 1,063 MADD Gflops (single precision) 78 Gflops 159.0 GBytes/s Memory Bandwidth Intel Core i7 965 XE 1000 e 70 M+ADD Gflops (single precision) 35 Gflops 25.6 GBytes/s Memory Bandwidth (= What about double precision?)

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

BrookGPU I BrookGPU was the first GPGPU framework Developed at Stanford Extension to C Vector datatypes: float, float2, float3, float4 int, int2, int3, int4... double, double2

BrookGPU II Streams float s<10, 10>; 2 dimensional 10x10 float matrix Access to stream items is restricted and regularized Kernels kernel void k(float s<>, float3 f, float a[10][10], out float o<>) {...

BrookGPU III Reduction Operations on Streams void reduce sum (float a<>, reduce float result<>) result = result + a; Scatter & Gather Ops on Streams streamscatterop(s,...) streamgatherop(s,...)

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

AMD/ATI: Stream SDK I Custom Operations AMD relies on Brook+, an extension to BrookGPU AMD Core Math Library Optimized implementation of BLAS Can use multiple CPU cores and multiple GPUs

AMD/ATI: Stream SDK II Demo of Brook+ and AMD Stream KernelAnalyzer

NVIDIA: CUDA SDK Custom Operations CUDA is very similar to Brook (streams, kernels,...) Math Libraries CUBLAS: BLAS implemented in CUDA CUFFT: Fourier Transforms on GPU

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

Applications using GPU Computing Image processing Video de-/encoding Protein folding (folding@home) SETI@home

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

2D Simplified Fluid Simulation I Denote u to be the height of the water surface at a certain point in time and space and let c be the wave speed: 2 u t 2 = c2 ( 2 ) u x 2 + 2 u y 2

2D Simplified Fluid Simulation II Discretizing space and time each equidistantally by t and h: u t+1 i,j 2u t i,j + u t 1 i,j t 2 = c 2 ( u t i+1,j + u t i 1,j + u t i,j+1 + ut i,j 1 4ut i,j h 2 ) ( ) Applying ui,j t := 0.5 u t+1 i,j + ui,j t yields system of linear equations.

Interactive Demo 2D simplified fluid simulation using conjugate gradient solver

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

DirectX 11 DirectX drives the industry standard for GPUs DX 11 compliant hardware is designed to support GPGPU Object Oriented Programming in Compute Shader Great increase in flexibility Increased double precision performance Wait until Windows 7

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

OpenCL Open standard, driven by Khronos group Not Object Oriented, just C, like Brook et al. Interoperability with OpenGL Abstraction layer for any type of hardware

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

Heterogeneous Computing Software support via OpenCL GPU and CPU combined hardware soon to come ( Intel Larrabee, AMD Fusion)

Outline 1 Introduction 2 GPU Hardware GPU Functionality GPU CPU Comparison 3 GPU Computing Today BrookGPU OEM SDKs Applications 4 GPU Computing Example 2D Simplified Fluid Simulation 5 Outlook DirectX 11 OpenCL Heterogeneous Computing 6 Summary

Summary GPUs feature high performance for little money Capable of descent immediate visualization Upcoming standards in hardware & APIs are promising

Introduction GPU Hardware GPU Computing Today GPU Computing Example Thank you Outlook Summary

Copyright Notice NVIDIA and AMD are registered trademarks of NVIDIA and AMD Corp. respectively. AMD RenderMonkey, the corresponding samples and the AMD Stream SDK are copyright AMD.