GPGPU Computing. Yong Cao




Why Graphics Card? It's powerful! A quiet trend. Copyright 2009 by Yong Cao

Why Graphics Card? It's powerful!

Processor               Processing Units  FLOPs per Unit  Clock Speed  Processing Power
High-End Quad-Core CPU  16 (4 per core)   2               3000 MHz     96 GFLOPs
NVIDIA GTX285           240               3               1476 MHz     1063 GFLOPs
ATI Radeon HD 4870      800               2               750 MHz      1200 GFLOPs

Why Graphics Card? It's powerful! Die size: 576 mm², 1.4 billion transistors. Memory width: 512-bit. Bandwidth: 142 GB/s. Max board power: 236 W (GTX285: 183 W at 55 nm).

Why Graphics Card? It's cheap and everywhere. E.g., NVIDIA has sold 100 million high-end GPGPU devices. A 128-core GeForce 9800 GTX (648 GFLOPs) is $129 on Newegg.com.

Why Graphics Card NOW? Before: two years ago, everyone was using graphics APIs (Cg, GLSL, HLSL) for GPGPU programming. Restricted random read (via textures), no random write. (No pointers!)

Why Graphics Card NOW? Now: NVIDIA released CUDA two years ago; since then: thousands of CUDA software engineers, a new job title ("CUDA programmer"), around 200 CUDA-based technical publications! Why? Standard C language. Supports pointers! Random read and write on GPU memory. Works with C++ and Fortran.

Where's the GPU in the system?

NVIDIA GPU Architecture: lots of ALUs (where a CPU has lots of control logic). Focus on graphics applications (data parallel).

NVIDIA GT200 Architecture: Thread Processing Cluster; Atomic Memory Access Control.

NVIDIA GT200 Architecture: Multi-Processor.

Execution Mode: Graphics. [G80 block diagram: Host → Input Assembler → Vtx / Geom / Pixel Thread Issue and Setup / Rstr / ZCull, feeding arrays of streaming processors (SP) with texture filtering (TF) and L1 caches, backed by L2 caches and framebuffer (FB) partitions.] NVIDIA G80 GPU. Copyright 2009 by Yong Cao, Referencing UIUC ECE498AL Course Notes

Execution Mode: General Computing. [G80 as a compute device: Host → Input Assembler → Thread Execution Manager, feeding processor clusters that each have a Parallel Data Cache and texture units, with load/store paths to Global Memory.] NVIDIA G80 GPU

GPGPU from a Graphics Point of View. Graphics processing pipeline (on GPU): fixed-function pipeline; programmable pipeline with shaders. GPU processing model: stream computing model.

3D Graphics Applications Demos.

GPU Fundamentals: The Graphics Pipeline. Application (graphics state, on the CPU) → Vertices (3D) → Transform & Light → Xformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processing → Final Pixels (Color, Depth) → Video Memory (Textures), with render-to-texture feedback. A simplified graphics pipeline. Copyright 2009 by Yong Cao, Referencing SIGGRAPH 2005 Course Notes from David Luebke

Programmable Graphics Pipeline - Shaders. Same pipeline, with programmable stages: Application (graphics state, on the CPU) → Vertices (3D) → Vertex Processor (Transform) → Xformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor (Shade) → Final Pixels (Color, Depth) → Video Memory (Textures), with render-to-texture. Programmable vertex processor! Programmable fragment processor!

GPU Pipeline: Transform. Vertex processor (multiple in parallel): transform from world space to image space; compute per-vertex lighting.

GPU Pipeline: Rasterize. Rasterizer: convert geometric representation (vertices) to image representation (fragments). Fragment = image fragment: a pixel plus associated data (color, depth, stencil, etc.). Interpolate per-vertex quantities across pixels.

Pixel / Fragment Processor. Fragment processors (multiple in parallel): compute a color for each pixel; optionally read colors from textures (images).

GPU Programming Model: Stream. Stream programming model. Streams: an array of data units. Kernels: take streams as input, produce streams as output; perform computation on streams; kernels can be linked together. [Diagram: Stream → Kernel → Stream]

Why Streams? GPGPU computing: ample computation by exposing parallelism. Streams expose data parallelism: multiple stream elements can be processed in parallel. Pipeline (task) parallelism: multiple tasks can be processed in parallel. Efficient communication: producer-consumer locality; predictable memory access pattern; optimize for throughput of all elements, not latency of one; processing many elements at once allows latency hiding.

Reading Material. NVIDIA CUDA Programming Guide 2.0, Chapter One: http://www.nvidia.com/object/cuda_develop.html (look under Documentation). NVIDIA GeForce GTX 280 Technical Brief: http://www.nvidia.com/docs/io/55506/geforce_gtx_200_gPU_Technical_Brief.pdf