Introduction to GPGPU. Tiziano Diamanti, t.diamanti@cineca.it

2 Agenda: from GPUs to GPGPUs; GPGPU architecture; CUDA programming model.

3 Perspective projection. The lines connecting the vanishing point to every point of the 3D model intersect the XY plane; the intersection points form the projected object.

4 An example of perspective projection

$$P_x = \frac{Z_x Q_z - Q_x Z_z}{Q_z - Z_z}, \qquad P_y = \frac{Z_y Q_z - Q_y Z_z}{Q_z - Z_z}$$

where $P = (P_x, P_y)$ is the pixel on the screen, $Q = (Q_x, Q_y, Q_z)$ is the starting point in 3D coordinates, and $Z = (Z_x, Z_y, Z_z)$ is the vanishing point. For simplicity, the projection is onto the XY plane, so $z = 0$.
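The projection formulas Px = (Zx·Qz - Qx·Zz)/(Qz - Zz) and Py = (Zy·Qz - Qy·Zz)/(Qz - Zz) can be checked with a short sketch. It is a minimal Python illustration, assuming the slide's convention (projection onto the plane z = 0, with the vanishing point Z acting as the centre of projection); the function name `project_to_xy` is hypothetical.

```python
def project_to_xy(q, z):
    """Project 3D point q onto the plane z = 0 along the line through z.

    q = (Qx, Qy, Qz) is the model point, z = (Zx, Zy, Zz) the vanishing point
    (centre of projection). Returns the screen point (Px, Py).
    """
    qx, qy, qz = q
    zx, zy, zz = z
    denom = qz - zz
    if denom == 0:
        # The line from Z through Q is parallel to the projection plane.
        raise ValueError("Q and Z lie at the same depth; no intersection with z = 0")
    px = (zx * qz - qx * zz) / denom
    py = (zy * qz - qy * zz) / denom
    return px, py


# Example: eye at (0, 0, 10), model point at (2, 4, 5)
print(project_to_xy((2.0, 4.0, 5.0), (0.0, 0.0, 10.0)))  # (4.0, 8.0)
```

A point that already lies on the plane (Qz = 0) is returned unchanged, which is a quick sanity check on the signs.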

5 Hidden line removal. Many algorithms for this problem can be found in the literature, such as the painter's algorithm.
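A minimal Python sketch of the painter's algorithm, assuming each polygon has already been rasterized into a list of covered pixels and is assigned a single depth value (distance from the observer). The function name and data layout are illustrative, not taken from the slides.

```python
def painters_algorithm(width, height, polygons, background=None):
    """Draw polygons back to front into a width x height frame.

    polygons: list of (depth, color, pixels), where depth is the polygon's
    distance from the observer and pixels is an iterable of (x, y) it covers.
    """
    frame = [[background] * width for _ in range(height)]
    # Sort from farthest to nearest so nearer polygons are painted last.
    for _depth, color, pixels in sorted(polygons, key=lambda p: p[0], reverse=True):
        for x, y in pixels:
            frame[y][x] = color  # overwrite whatever farther polygon was there
    return frame


near = (1.0, "blue", [(1, 0), (2, 0)])
far = (5.0, "red", [(0, 0), (1, 0)])
print(painters_algorithm(3, 1, [near, far]))  # [['red', 'blue', 'blue']]
```

Because each polygon gets one depth, this breaks down for intersecting or cyclically overlapping polygons, which is one reason hardware relies on the Z-buffer instead.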

6 Rasterization. Graphics primitives are transformed into pixels of the frame buffer.
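One common way to decide which pixels a triangle covers is the edge-function test. The slides do not say which rasterization method is used, so treat this Python sketch as one illustrative technique; it assumes vertices are already in screen (pixel) coordinates and samples each pixel at its centre. Function names are hypothetical.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle A, B, P: the sign tells which side of A->B P is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of pixels (x, y) whose centres lie inside the triangle v0, v1, v2."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Only test pixels inside the triangle's bounding box, clipped to the frame buffer.
    x_min = max(math.floor(min(xs)), 0)
    x_max = min(math.ceil(max(xs)), width - 1)
    y_min = max(math.floor(min(ys)), 0)
    y_max = min(math.ceil(max(ys)), height - 1)
    covered = set()
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside if P is on the same side of all three edges (either winding order).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    return covered


pixels = rasterize_triangle((0, 0), (4, 0), (0, 4), width=4, height=4)
print(len(pixels))  # 10
```

Real rasterizers add a tie-breaking rule (such as the top-left rule) so that a pixel on an edge shared by two triangles is drawn only once; the sketch omits it.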

7 Z-Buffer. When filled polygons are used instead of lines, there is a method that is easy to implement in hardware to solve the depth problem: the Z-buffer. This buffer has the same size as the viewport and stores a depth value for each pixel that has been drawn, where depth is the distance to the observer. A pixel's color is changed if and only if the depth of the new fragment is less than the stored value. In this way, polygons closer to the observer hide those farther away, because their pixels overwrite the pixels of the more distant polygons.

8 Z-Buffer (figure: top view of the eye and two polygons at Z = -0.5 and Z = -0.3, with the final image)

9 The Z-Buffer algorithm. Step 1: initialize and enable the depth buffer.

10 The Z-Buffer algorithm. Step 2: as the polygons are rendered, OpenGL stores their z coordinates in the depth buffer (figure: the eye and the polygons at Z = -0.5 and Z = -0.3).

11 The Z-Buffer algorithm. Step 3: draw the polygons according to their z position (figure: the eye and the polygons at Z = -0.5 and Z = -0.3).
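The three Z-buffer steps can be sketched in Python, assuming polygons have already been rasterized into fragments and that depth is the distance from the observer, as on slide 7 (smaller means closer; the polygon at Z = -0.3 is 0.3 from the eye). The function name and fragment layout are illustrative. Unlike the painter's algorithm, the draw order does not matter.

```python
import math


def zbuffer_render(width, height, fragments, background=None):
    """Resolve visibility with a depth (Z) buffer.

    fragments: iterable of (x, y, depth, color), in any order, where depth is
    the distance from the observer (smaller = closer).
    """
    # Step 1: initialise depth to "infinitely far" and colour to the background.
    depth_buffer = [[math.inf] * width for _ in range(height)]
    color_buffer = [[background] * width for _ in range(height)]
    # Steps 2-3: keep an incoming fragment only if it is closer than what is stored.
    for x, y, depth, color in fragments:
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer, depth_buffer


# The nearer (blue) polygon is drawn first, the farther (red) one second; blue still wins at (1, 0).
frags = [(1, 0, 0.3, "blue"), (2, 0, 0.3, "blue"), (0, 0, 0.5, "red"), (1, 0, 0.5, "red")]
colors, depths = zbuffer_render(3, 1, frags)
print(colors)  # [['red', 'blue', 'blue']]
```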

12 Texture mapping. Texture mapping applies a bitmap image (the texture) to the surface of a polygon.
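A minimal sketch of the texture lookup step, assuming each fragment already carries interpolated texture coordinates (u, v) in [0, 1] and using nearest-neighbour sampling. GPUs also offer filtered (e.g. bilinear) sampling, which is not shown. The function name is hypothetical.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbour texture lookup.

    texture: list of rows (height x width) of texel values.
    (u, v): texture coordinates, nominally in [0, 1]; values outside are clamped.
    """
    height, width = len(texture), len(texture[0])
    # Scale to texel indices and clamp to the valid range (u = 1.0 maps to the last column).
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    return texture[y][x]


checker = [["black", "white"],
           ["white", "black"]]
print(sample_nearest(checker, 0.25, 0.25), sample_nearest(checker, 0.75, 0.25))  # black white
```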

13 Rendering pipeline. Stages: Application → Vertex Processor → Rasterizer → Fragment Processor → Frame Buffer. Data passed between stages: vertices, transformed vertices (with the vertex connections), fragments (with the fragment positions), textured fragments (figure example: a wooden texture).

14 The first graphics computers. The first graphics supercomputers, typically from SGI, had hardware acceleration and were used in areas such as military and flight simulation.

15 3D accelerators for PCs. Graphics accelerators for PCs have been available since 1997. Progress has been very fast (roughly one new generation per year).

16 1999: Nvidia Riva TNT. 128-bit bus and graphics engine; 180 million pixels/s fill rate; 6 million triangles/s peak; 16 MB frame buffer.

17 The AGP bus. The Accelerated Graphics Port was introduced by Intel in 1997. The PCI bus was 32 bits wide and ran at 33 MHz (33.33 MHz exactly), so its bandwidth was 33.33 MHz × 4 bytes ≈ 133 MB/s. AGP (1x) ran at 66 MHz with a 32-bit width, for a bandwidth of 266 MB/s. AGP 2x offered 533 MB/s, AGP 4x doubled that again to 1066 MB/s, and AGP 8x offered about 2 GB/s.
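The bus figures on this slide are clock × width × transfers per clock. A small Python sketch, assuming the nominal clocks of 33.33 MHz (PCI) and 66.67 MHz (AGP) and that AGP 2x/4x/8x move 2/4/8 transfers per clock; the helper name is illustrative.

```python
def bus_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak bandwidth in MB/s = clock (MHz) x bus width (bytes) x transfers per clock."""
    return clock_mhz * (width_bits // 8) * transfers_per_clock


PCI_CLOCK = 100 / 3  # nominal 33.33 MHz
AGP_CLOCK = 200 / 3  # nominal 66.67 MHz

print(f"PCI:    {bus_bandwidth_mb_s(PCI_CLOCK, 32):7.1f} MB/s")
for mult in (1, 2, 4, 8):
    print(f"AGP {mult}x: {bus_bandwidth_mb_s(AGP_CLOCK, 32, mult):7.1f} MB/s")
```

The exact values (133.3, 266.7, 533.3, 1066.7, 2133.3 MB/s) explain why the slide's numbers look like slightly rounded doublings.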

18 2000: Nvidia GeForce. Transform & Lighting (T&L): for the first time, perspective transformations and lighting are computed on the GPU. 256-bit graphics engine; 480 million pixels/s; 15 million triangles/s; 32 MB frame buffer.

19 2001: Nvidia GeForce 3. 57 million transistors; the first 3D chip with vertex and pixel shaders; 2 textures per pixel.

20 2002: Nvidia GeForce 4 Ti. 63 million transistors; million triangles/s; 128 MB frame buffer; the number of vertex shader units was doubled.

21 2002: Nvidia GeForce FX. 130 million transistors; 315 million triangles/s; 128/256 MB frame buffer; DirectX 9 vertex and pixel shaders.

22 The PCI Express bus, introduced by Intel in 2004. PCI Express x16 offers 4 GB/s in each direction (to and from the GPU); this is increasingly important for retrieving the results of computations performed on the GPU (GPGPU).

23 2004: Nvidia GeForce. million transistors; 128/256/512 MB frame buffer; 16 pixel shader pipelines; 6 vertex shader units; DirectX 9.0c.

24 Two graphics cards in one PC: Nvidia SLI, ATI CrossFire.

25 2005: Nvidia GeForce. million transistors; 24 pixel pipelines; 8 vertex shader units; available only for PCI Express; 15.6 billion pixels/s; 1400 million vertices/s.

26 2006: Nvidia GeForce 8. Shader Model 4.0, geometry shader (DirectX 10); up to 768 MB of on-board memory; 36.8 billion pixels/s; 681 million transistors; 128 unified graphics pipelines; million vertices/s.

27 PCI Express 2.0. PCI Express 2.0 doubles the signaling rate of PCI Express 1.1 (from 2.5 to 5 GT/s per lane), doubling the available bandwidth. It is backward compatible with the PCI Express 1.1 specification.
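The 4 GB/s figure for PCI Express x16 and its doubling in PCI Express 2.0 follow from the per-lane signaling rate. A Python sketch, assuming the published rates of 2.5 GT/s (1.x) and 5 GT/s (2.0) per lane and the 8b/10b line encoding used by those generations; the helper name is illustrative.

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes, encoding=(8, 10)):
    """Peak one-direction bandwidth in GB/s for a PCI Express link.

    gt_per_s: transfer rate per lane in GT/s (1.x: 2.5, 2.0: 5.0).
    encoding: (payload bits, line bits); PCIe 1.x and 2.0 use 8b/10b.
    """
    payload_bits, line_bits = encoding
    bits_per_s = gt_per_s * 1e9 * payload_bits / line_bits * lanes
    return bits_per_s / 8 / 1e9  # bytes per second, expressed in GB/s


print(pcie_bandwidth_gb_s(2.5, 16))  # PCIe 1.x x16: 4.0 GB/s per direction
print(pcie_bandwidth_gb_s(5.0, 16))  # PCIe 2.0 x16: 8.0 GB/s per direction
```

The same amount flows in the other direction at the same time, since each lane has separate transmit and receive pairs.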

28 2008: Nvidia GeForce 9. Shader Model 4.0, geometry shader (DirectX 10); up to 1 GB of on-board memory; 43.2 billion pixels/s; support for PCI Express; million transistors; 128 graphics pipelines; 65 nm process.

29 2008: GTX 200. Shader Model 4.0, geometry shader (DirectX 10); 240 streaming processors; 55 nm process; 51.8 billion pixels/s.

30 New generation: Fermi. Nvidia's new generation of graphics chips is code-named Fermi and is marketed as the GTX 400/500 series. The original design included 512 CUDA cores and up to 6 GB of GDDR5 memory. It is manufactured on TSMC's 40 nm process (Nvidia has always been fabless). Source: _platform/b010101_40nm.htm

31 2010: Fermi. 512 CUDA cores, up to 6 GB GDDR5 memory, TSMC 40 nm process. GeForce GTX 580: 512 CUDA cores, 1536 MB GDDR5. GeForce GTX 570: 480 CUDA cores, 1280 MB GDDR5.

32 New generation: Fermi

33 Fermi. The GPU is organized into 4 Graphics Processing Clusters (GPCs). Each GPC has 4 sub-units (streaming multiprocessors, SMs), each with 32 streaming processors that execute the same instruction in parallel (by comparison, the GTX 200 chip had 8). Each SM has its own L1 cache and shared memory, and 2 dispatch units.

34 Fermi introduces a cache hierarchy (L1 and L2 caches).

35 Shared memory. A kind of explicitly managed cache. It resides on the chip, so it is much faster than the on-board memory. Its size is 16 KB per multiprocessor (up to 48 KB on Fermi).

36 Fermi (3). NVIDIA introduces the GigaThread™ Engine, which allows concurrent kernel execution: threads belonging to different kernels can run simultaneously, which was not possible on previous-generation GPUs.

37 GF104. The GF104 chip, introduced with the GeForce GTX 460, brings hardware changes: each SM has 48 CUDA cores instead of 32, for a total of 384 cores (8 SMs × 48). The GTX 460 has one SM disabled, for a total of 336 cores; the GTX 560 Ti implements the full 384 cores.

38 GF104. To balance the larger number of cores per SM, the dispatch units per SM were doubled from 2 to 4.

39 Nvidia naming. Mainstream and laptops: GeForce (target: video games and multimedia). Workstation: Quadro (target: graphics professionals who use CAD and 3D modeling applications; the price premium is due to more memory and, above all, to drivers specifically tuned to accelerate those applications). GPGPU: Tesla (target: high performance computing).

40 Fermi: real products
Mainstream: GeForce GTX 580: 512 CUDA cores, 1536 MB GDDR5; GeForce GTX 570: 480 CUDA cores, 1280 MB GDDR5.
Computing (memory can be configured to use ECC): Tesla C2050: 448 CUDA cores, 3 GB GDDR5; Tesla C2070: 448 CUDA cores, 6 GB GDDR5.
Note: with ECC on, 12.5% of the GPU memory is used for ECC bits. For example, 3 GB of total memory yields 2.625 GB of user-available memory with ECC on.
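The ECC overhead is simple arithmetic. A Python sketch applying the 12.5% figure from the note to both Tesla boards listed on this slide; the helper name is illustrative.

```python
def usable_memory_gb(total_gb, ecc_on, ecc_fraction=0.125):
    """User-available memory when a fraction of the memory is reserved for ECC bits."""
    return total_gb * (1 - ecc_fraction) if ecc_on else total_gb


for name, total in (("Tesla C2050", 3), ("Tesla C2070", 6)):
    print(name, usable_memory_gb(total, ecc_on=True), "GB usable with ECC on")
```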

41 Tesla C2050. Peak double-precision floating-point performance: 515 GFLOPS. Peak single-precision floating-point performance: 1.03 TFLOPS. The previous generation (Tesla C1060) reached 78 and 933 GFLOPS respectively.
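Peak figures like these come from cores × clock × floating-point operations per core per clock. A Python sketch reproducing the Tesla C2050 numbers, assuming a 1.15 GHz shader clock (not stated on the slide), one fused multiply-add (2 flops) per core per clock, and double precision running at half the single-precision rate on Tesla-class Fermi.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_clock):
    """Theoretical peak = cores x clock x floating-point operations per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock


# Tesla C2050 (assumed shader clock 1.15 GHz): one fused multiply-add = 2 flops per core per clock.
sp = peak_gflops(448, 1.15, 2)
# Assumption: double precision runs at half the single-precision rate on this board.
dp = sp / 2
print(f"SP {sp:.1f} GFLOPS, DP {dp:.1f} GFLOPS")  # SP 1030.4 GFLOPS, DP 515.2 GFLOPS
```

These are theoretical peaks; sustained performance in real codes is usually well below them.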

42 Rendering pipeline (the same diagram as slide 13): Application → Vertex Processor → Rasterizer → Fragment Processor → Frame Buffer.

43 Shading languages. High-level: HLSL (Microsoft, 2002), Cg (Nvidia, 2002), GLSL (ARB, 2003). Low-level: assembly (ASM) shading languages (2001). Graphics APIs: Direct3D (Microsoft, 1995), OpenGL (ARB, 1992).

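44 GLSL: example
// Vertex shader: transform each vertex by the combined model-view-projection matrix
void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
// Fragment shader: color every fragment opaque red
void main()
{
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}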

45 High-level languages. C-like syntax. Data types: vectors (1 to 4 components of floating-point, integer, or boolean type), matrices (2x2, 3x3, 4x4), arrays and textures. Conditionals, loops, functions. Matrix and vector algebra. Special instructions: trigonometry, exponentials, geometry, interpolation.

46 GPGPU (general-purpose computation on GPUs): non-graphics use of the programmable shaders.

47 Future trends. Power dissipation cannot be increased much further: we are already at the limits of air cooling. Power consumption does not grow linearly with the clock: $P = C f V^2$, and since V is roughly proportional to f, the relation is cubic ($P \propto f^3$). High clock rates therefore lead to very low efficiency. Multi-core processors can help: reducing the clock by 20% cuts power by about 50% ($0.8^3 \approx 0.51$). Using the transistors for more cores is more efficient than raising the clock of a single processor.
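The cubic power relation can be made concrete with a few lines of Python. This is a sketch of the idealized model only, assuming dynamic power P = C·f·V², voltage scaling linearly with frequency, and perfect parallel scaling across cores; real chips also have static (leakage) power that the model ignores.

```python
def relative_power(freq_scale):
    """Dynamic power relative to baseline when the clock (and, with it, the voltage) is scaled.

    P = C * f * V**2 with V assumed proportional to f, so P scales as f**3.
    """
    return freq_scale ** 3


one_slow_core = relative_power(0.8)       # ~0.512: about half the power
two_slow_cores = 2 * relative_power(0.8)  # ~1.024: roughly the original power budget
throughput_two_slow = 2 * 0.8             # 1.6x the work per unit time (ideal parallel scaling)
print(f"power of one core at 80% clock: {one_slow_core:.3f}")
print(f"two cores at 80% clock: power {two_slow_cores:.3f}, throughput {throughput_two_slow:.1f}x")
```

Within the same power budget, two cores at 80% clock deliver about 1.6 times the throughput of one full-speed core. For a fixed amount of work on a single slowed core, the run takes 1/0.8 = 1.25 times longer, so energy per task falls to about 0.64 of the baseline rather than 0.5.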

48 Architecture of a GPU. Nvidia GTX 580: bandwidth GB/s; estimated GFLOPS. Intel Core i7-980X: max memory bandwidth 25.6 GB/s; estimated 107 GFLOPS.

49 AMD's architecture: VLIW5. VLIW5 (Very Long Instruction Word, 5 slots) is designed to process a 4-component dot product (e.g. w, x, y, z) and a scalar operation (e.g. lighting) at the same time. It is used in the Radeon HD 6800 series and earlier.

50 AMD's architecture: VLIW4. In games, VLIW5 reached an average utilization of 3.4 of its 5 slots. Starting with the Radeon HD 6900 series, AMD introduced VLIW4: the die area previously allocated to the T-unit (the transcendental unit) can now be used for more SIMDs. Drivers and compilers are more complicated on this architecture than on Nvidia's, because they must exploit not only the parallelism across SIMDs but also the vectorization inside each SIMD.

51 Nvidia vs AMD. Nvidia's SIMDs are simpler (one instruction per clock cycle), but they run at twice the clock of the rest of the chip. For example, GeForce GTX 580 specs: 512 CUDA cores; graphics clock 772 MHz; processor clock 1544 MHz. AMD Radeon HD 6970 specs: 1536 stream processors (384 × 4); clock 880 MHz. AMD Radeon HD 6870 specs: 1120 stream processors (224 × 5); clock 900 MHz.
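Plugging these specs into cores × clock × 2 (one multiply-add per stream processor per clock, assumed for both vendors) gives the theoretical single-precision peaks. The Python sketch below is illustrative; the function name is hypothetical.

```python
def sp_peak_gflops(cores, clock_mhz, flops_per_clock=2):
    """Theoretical single-precision peak: cores x clock x 2 flops (one multiply-add per clock)."""
    return cores * clock_mhz * flops_per_clock / 1000


gpus = {
    "GeForce GTX 580 (processor clock 1544 MHz)": (512, 1544),
    "Radeon HD 6970": (1536, 880),
    "Radeon HD 6870": (1120, 900),
}
for name, (cores, mhz) in gpus.items():
    print(f"{name}: {sp_peak_gflops(cores, mhz):.0f} GFLOPS peak")
```

On paper the Radeon parts lead (about 2703 and 2016 GFLOPS against 1581). However, as slide 50 notes, VLIW slots are only partly filled in practice, so these peaks overstate AMD's delivered throughput relative to Nvidia's simpler cores.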


More information

Several tips on how to choose a suitable computer

Several tips on how to choose a suitable computer Several tips on how to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and postprocessing of your data with Artec

More information

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY

GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY IV. Évfolyam 4. szám - 2009. december Fürjes János furjes.janos@chello.hu GRAPHICS CARDS IN RADIO RECONNAISSANCE: THE GPGPU TECHNOLOGY Absztrakt/Abstract Jelen írás egy modern technológiát elemez, amely

More information

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems

Comp 410/510. Computer Graphics Spring 2016. Introduction to Graphics Systems Comp 410/510 Computer Graphics Spring 2016 Introduction to Graphics Systems Computer Graphics Computer graphics deals with all aspects of creating images with a computer Hardware (PC with graphics card)

More information

CUDA programming on NVIDIA GPUs

CUDA programming on NVIDIA GPUs p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

More information

NVIDIA GeForce GTX 750 Ti

NVIDIA GeForce GTX 750 Ti Whitepaper NVIDIA GeForce GTX 750 Ti Featuring First-Generation Maxwell GPU Technology, Designed for Extreme Performance per Watt V1.1 Table of Contents Table of Contents... 1 Introduction... 3 The Soul

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

IP Video Rendering Basics

IP Video Rendering Basics CohuHD offers a broad line of High Definition network based cameras, positioning systems and VMS solutions designed for the performance requirements associated with critical infrastructure applications.

More information

QuickSpecs. NVIDIA Quadro K2200 4GB Graphics INTRODUCTION. NVIDIA Quadro K2200 4GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro K2200 4GB Graphics INTRODUCTION. NVIDIA Quadro K2200 4GB Graphics. Technical Specifications J3G88AA INTRODUCTION The NVIDIA Quadro K2200 delivers outstanding professional 3D application performance in a sub-75 Watt graphics design. Ultra-fast 4GB of GDDR5 GPU memory enables you to create large,

More information

Turbomachinery CFD on many-core platforms experiences and strategies

Turbomachinery CFD on many-core platforms experiences and strategies Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29

More information

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE SAPPHIRE HD 6870 1GB GDDR5 PCIE The SAPPHIRE HD 6870 has a new architecture with a total of 1120 stream processors and 56 texture units delivering massively parallel computing power for graphics and other

More information

Developer Tools. Tim Purcell NVIDIA

Developer Tools. Tim Purcell NVIDIA Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging

More information

Console Architecture. By: Peter Hood & Adelia Wong

Console Architecture. By: Peter Hood & Adelia Wong Console Architecture By: Peter Hood & Adelia Wong Overview Gaming console timeline and evolution Overview of the original xbox architecture Console architecture of the xbox360 Future of the xbox series

More information

~ Greetings from WSU CAPPLab ~

~ Greetings from WSU CAPPLab ~ ~ Greetings from WSU CAPPLab ~ Multicore with SMT/GPGPU provides the ultimate performance; at WSU CAPPLab, we can help! Dr. Abu Asaduzzaman, Assistant Professor and Director Wichita State University (WSU)

More information

Case Study on Productivity and Performance of GPGPUs

Case Study on Productivity and Performance of GPGPUs Case Study on Productivity and Performance of GPGPUs Sandra Wienke wienke@rz.rwth-aachen.de ZKI Arbeitskreis Supercomputing April 2012 Rechen- und Kommunikationszentrum (RZ) RWTH GPU-Cluster 56 Nvidia

More information

ST810 Advanced Computing

ST810 Advanced Computing ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview

More information

In the early 1990s, ubiquitous

In the early 1990s, ubiquitous How GPUs Work David Luebke, NVIDIA Research Greg Humphreys, University of Virginia In the early 1990s, ubiquitous interactive 3D graphics was still the stuff of science fiction. By the end of the decade,

More information

3D Computer Games History and Technology

3D Computer Games History and Technology 3D Computer Games History and Technology VRVis Research Center http://www.vrvis.at Lecture Outline Overview of the last 10-15 15 years A look at seminal 3D computer games Most important techniques employed

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

System requirements for Autodesk Building Design Suite 2017

System requirements for Autodesk Building Design Suite 2017 System requirements for Autodesk Building Design Suite 2017 For specific recommendations for a product within the Building Design Suite, please refer to that products system requirements for additional

More information

Experiences on using GPU accelerators for data analysis in ROOT/RooFit

Experiences on using GPU accelerators for data analysis in ROOT/RooFit Experiences on using GPU accelerators for data analysis in ROOT/RooFit Sverre Jarp, Alfio Lazzaro, Julien Leduc, Yngve Sneen Lindal, Andrzej Nowak European Organization for Nuclear Research (CERN), Geneva,

More information

Boundless Security Systems, Inc.

Boundless Security Systems, Inc. Boundless Security Systems, Inc. sharper images with better access and easier installation Product Overview Product Summary Data Sheet Control Panel client live and recorded viewing, and search software

More information

Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors

Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors Joe Davis, Sandeep Patel, and Michela Taufer University of Delaware Outline Introduction Introduction to GPU programming Why MD

More information

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing

More information

Optimizing AAA Games for Mobile Platforms

Optimizing AAA Games for Mobile Platforms Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo

More information

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

More information

gpus1 Ubuntu 10.04 Available via ssh

gpus1 Ubuntu 10.04 Available via ssh gpus1 Ubuntu 10.04 Available via ssh root@gpus1:[~]#lspci -v grep VGA 01:04.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a) 03:00.0 VGA compatible controller: nvidia Corporation

More information

Latency and Bandwidth Impact on GPU-systems

Latency and Bandwidth Impact on GPU-systems NTNU Norwegian University of Science and Technology Faculty of Information Technology, Mathematics and Electrical Engineering Department of Computer and Information Science TDT4590 Complex Computer Systems,

More information

How to choose a suitable computer

How to choose a suitable computer How to choose a suitable computer This document provides more specific information on how to choose a computer that will be suitable for scanning and post-processing your data with Artec Studio. While

More information

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE

SUBJECT: SOLIDWORKS HARDWARE RECOMMENDATIONS - 2013 UPDATE SUBJECT: SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE KEYWORDS:, CORE, PROCESSOR, GRAPHICS, DRIVER, RAM, STORAGE SOLIDWORKS RECOMMENDATIONS - 2013 UPDATE Below is a summary of key components of an ideal SolidWorks

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

More information

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University

General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University A little about me http://graphics.stanford.edu/~mhouston Education: UC San Diego, Computer Science BS Stanford

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Dynamic Resolution Rendering

Dynamic Resolution Rendering Dynamic Resolution Rendering Doug Binks Introduction The resolution selection screen has been one of the defining aspects of PC gaming since the birth of games. In this whitepaper and the accompanying

More information