Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011



Similar documents
GPGPU Computing. Yong Cao

Introduction to GPGPU. Tiziano Diamanti

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Computer Graphics Hardware An Overview

NVIDIA GeForce GTX 580 GPU Datasheet

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Next Generation GPU Architecture Code-named Fermi

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Introduction to GPU Architecture

QCD as a Video Game?

GPU Architecture. Michael Doggett ATI

Radeon HD 2900 and Geometry Generation. Michael Doggett

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

Introduction to GPU hardware and to CUDA

GPU Parallel Computing Architecture and CUDA Programming Model

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Introduction to GPU Computing

L20: GPU Architecture and Models

GPUs for Scientific Computing

Introduction to GPU Programming Languages

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Parallel Programming Survey

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Texture Cache Approximation on GPUs

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

Guided Performance Analysis with the NVIDIA Visual Profiler

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

HPC with Multicore and GPUs

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

NVIDIA GeForce GTX 750 Ti

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Performance Analysis and Optimisation

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

1. INTRODUCTION Graphics 2

GPU Programming Strategies and Trends in GPU Computing

Accelerating CFD using OpenFOAM with GPUs

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

High Performance GPGPU Computer for Embedded Systems

HP Workstations graphics card options

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

The Future Of Animation Is Games

Turbomachinery CFD on many-core platforms experiences and strategies

Advanced Rendering for Engineering & Styling

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

AMD EMBEDDED PCIe ADD-IN BOARD Comparison

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

In the early 1990s, ubiquitous

IP Video Rendering Basics

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

The Fastest, Most Efficient HPC Architecture Ever Built

Stream Processing on GPUs Using Distributed Multimedia Middleware

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

NVIDIA Quadro K5200 Amazing 3D Application Performance and Capability

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

GPGPU accelerated Computational Fluid Dynamics

Fast Implementations of AES on Various Platforms

Communicating with devices

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

NVIDIA Solves the GPU Computing Puzzle

Case Study on Productivity and Performance of GPGPUs

~ Greetings from WSU CAPPLab ~

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

CUDA programming on NVIDIA GPUs

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE

GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Evaluation of CUDA Fortran for the CFD code Strukti

Mixed Precision Iterative Refinement Methods Energy Efficiency on Hybrid Hardware Platforms

Configuring Memory on the HP Business Desktop dx5150

Choosing a Computer for Running SLX, P3D, and P5

Jean-Pierre Panziera Teratec 2011

gpus1 Ubuntu Available via ssh

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

Transcription:

Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011

Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis Conclusion

Graphics Processing Units Processor optimized for graphics processing Very good at parallel processing Designed for computation-heavy operations

Graphics History 1981: First PC video cards created by IBM 1984: First processor-based graphics card 1990 s: Advent of 3D graphics, advancements driven by gaming/cad (Computer Aided Design) 1992: OpenGL application programming interface introduced 1999: Nvidia GeForce 256 marketed as world s first GPU 2000 s present: Increased programmability (DirectX), general purpose usage (GPGPU)

Why GPUs? Demand for better graphics (3D) Too much work for CPU alone 3D pipeline would add complexity to CPU Solution: offload graphics processing to GPU

CPU vs. GPU Less data parallelism More complex control Lower performance goals Memory optimized to reduce latency Serial programming High data parallelism Special purpose HW Memory optimized for high throughput Less emphasis on cache Stream programming

Characteristics Stream processing Lots of functional units Several cores (>100) Multiple, light threads Fast memory 2 types of programmable processors: vertex and fragment

Instructions Computation-heavy, few memory accesses (has what it needs beforehand) Independent and highly data parallel (same instruction performed on several data elements)

Stream Computation/Graphics Pipeline Stream: ordered set of data of the same type Efficiency increases with stream length Kernel: operation performed on an entire stream Stream applications are created by stringing kernels together Creates a graphics pipeline

Inside the Pipeline

The Vertex Processor Polygonal models are made up of vertices (x,y,z,w) coordinates (r,g,b, α) component colors Vertex processor applies a vertex program Changes position of vertices Makes use of texture cache (modern GPUs only) Output of vertex processor goes to the rasterizer Rasterizer converts primitives into fragments

The Fragment Processor AKA: Pixel Shader Fragment: contains all information required to make a pixel Fragment processor performs fragment program Computes final color of pixel Utilizes data from texture cache

General Architecture Fatahalian 2009

Geforce 8800-GTX200 Architecture Very Close to Tesla Architecture 16-30 SMP Cores Running up to 768 Threads Each Warps of 32 Threads Each Core Unified Pixel/Vertex Shader Architecture 6 Pairs of L2 Cache and Memory Buses L2 Cache is Read-Only for Textures First card with DirectX 10 Support

Geforce 8800 Owens 2007

Tesla Architecture GTX200 Luebke 2008

Fermi Architecture 40nm Process, 3 Billion Transistors 512 CUDA cores organized into 16 SMPs Each supporting up to 1536 simultaneous threads 16 Load/Store, 16 ALU, 4 Special Purpose Units Each 6 GDDR5 DRAM Interfaces, 40 bit Addresses Cores Clocked at 1.3Ghz+, while Cache is at 600Mhz

Fermi Architecture Embedded Unified Shader Architecture Incorporates Tessellation into Vertex, Geometry, and Pixel Shading 8x Double Precision FP Performance over Tesla Distributed Rasterizer L1 Cache - Register Spilling, Stack Operations, and Load/Store only L2 Cache For Textures, Read/Write ECC Memory Protection

Fermi Architecture Nickolls and Dally 2010

AMD s Cayman Architecture 40nm Technology, 2.64 Billion Transistors 24 880Mhz Cores Each Running 4-Wide VLIW Instructions Static Scheduling 8KB L1 Texture Cache Shared L2 Cache 4KB Write Cache OpenCL and DirectCompute instead of CUDA Interleaved ALU Functions All ALU/Execution Units are same, no special units 7840KB of Registers

AMD s Cayman Architecture http://www.pcper.com/reviews/graphics-cards/amd-radeon-hd-6970-and-6950-review-cayman-architecture-revealed

Architecture Comparisons Card Stream Processors Graphics Clock Speed Peak Memory Bandwidth Power Peak GFLOPs Geforce 8800GT 112 600Mhz 57.6 GB/s 105 W 504 GTX 280 240 602Mhz 141.7 GB/s 236 W 933 GTX 480 (Fermi) 480 700Mhz 177.4 GB/s 250 W 1344.96 Radeon HD 6950 (Cayman) 960 800Mhz 160 GB/s 200 W 2253

Performance Metrics Based on Warp Parallelism Memory Warp Parallelism How many concurrent memory requests # Warps Accessing Memory in each Memory Period Computational Warp Parallelism How much computation can be done in warps while one waits for memory CWP = (Memory Cycles + Comp Cycles)/Comp Cycles Parallel Element Scalability = # Elements/Total Execution Time Parallel Subset Scalability = (Execution time of Subset) * (# Subsets / # GPU Cores)

GPU Performance Benefits Jacobi Algorithm 2-7.5x Speedup Monte Carlo Simulations 16x Speedup EDA Scheduability 17x for Entire Problem, 100x in Kernel Computations Euler Algorithm Up to 500x Speedup Equivalent to 125 Quad-Core CPUs

Future Architectures Nvidia s Kepler 28nm Technology Supposedly in 2011, but Likely not until Mid 2012 DirectX 12 Support 3-4x Performance over comparable Fermi cards, with 3-4x less power consumption Support for Pre-emption and virtual memory Nvidia s Maxwell Late 2013 Nvidia s Echelon 2017 10nm Technology

Future Applications/Obstacles General Purpose GPU (GPGPU) Simulation, video encoding, numerical methods, databases GPU used to process gaming physics/artificial intelligence? More programmability Power management Memory bandwidth limits Faster PCI-Express buses Combined CPU/GPU

Questions?