Heterogeneous Computing -> Fusion (7/14/10)

Phil Rogers, AMD Corporate Fellow
June 2010

Definitions
Heterogeneous Computing: a system comprised of two or more compute engines with significant structural differences. In our case, a low-latency x86 CPU and a high-throughput Radeon GPU.
Fusion: bringing together two or more components and joining them into a single unified whole. In our case, combining CPUs and GPUs on a single silicon die for higher performance and lower power.

AMD Balanced Platform Advantage
CPU is ideal for scalar processing:
- Out-of-order x86 cores with low-latency memory access
- Optimized for sequential and branching algorithms
- Runs existing applications very well
GPU is ideal for parallel processing:
- GPU shaders optimized for throughput computing
- Ready for emerging workloads: media processing, simulation, natural UI, etc.
Workloads range from serial/task-parallel workloads through graphics workloads to other highly parallel workloads.

Three Eras of Processor Performance
Single-Core Era
- Enabled by: Moore's Law, voltage scaling, microarchitecture
- Constrained by: power, complexity
Multi-Core Era
- Enabled by: Moore's Law, desire for throughput, 20 years of SMP architecture
- Constrained by: power, parallel software availability, scalability
Heterogeneous Systems Era
- Enabled by: Moore's Law, abundant data parallelism, power-efficient GPUs
- Temporarily constrained by: programming models, communication overheads
[Charts: single-thread performance vs. time; throughput performance vs. time (# of processors); targeted application performance vs. time (data-parallel exploitation); each marked "we are here".]
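The Balanced Platform slide contrasts sequential, branching work (suited to the CPU) with throughput work (suited to the GPU). A minimal Python sketch of that distinction, purely illustrative and not AMD code; `saxpy_kernel`, `run_data_parallel` and `prefix_recurrence` are hypothetical names. The element-wise kernel gives the same result whatever order its work-items run in, which is what allows a GPU to spread it across many lanes; the recurrence reads the previous step's result, so it stays sequential.

```python
# Illustration only (not AMD code): a data-parallel kernel can run its
# work-items in any order, while a loop-carried recurrence cannot.


def saxpy_kernel(gid, a, x, y, out):
    # One work-item: reads only its own inputs and writes only its own output.
    out[gid] = a * x[gid] + y[gid]


def run_data_parallel(a, x, y, order):
    # "Launch" one work-item per element, visiting them in the given order.
    out = [0.0] * len(x)
    for gid in order:
        saxpy_kernel(gid, a, x, y, out)
    return out


def prefix_recurrence(x, order):
    # s[i] = s[i-1] + x[i]: each step reads the previous step's result,
    # so visiting elements out of order reads values not yet computed.
    s = [0.0] * len(x)
    for i in order:
        prev = s[i - 1] if i > 0 else 0.0
        s[i] = prev + x[i]
    return s


if __name__ == "__main__":
    n = 8
    x = [float(i) for i in range(n)]
    y = [1.0] * n
    forward = list(range(n))
    backward = forward[::-1]
    # Order-independent: safe to spread across many GPU lanes.
    print(run_data_parallel(2.0, x, y, forward) == run_data_parallel(2.0, x, y, backward))  # True
    # Order-dependent: naturally a sequential, CPU-style loop.
    print(prefix_recurrence(x, forward) == prefix_recurrence(x, backward))  # False
```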

Emerging Application Spaces
Massive Data Mining
- Characteristics: full 64-bit addressing, huge data sets, new data types
- Application examples: image, video and audio processing; pattern analytics and search
Natural User Interfaces
- Characteristics: massive behind-the-scenes computing
- Application examples: face and gesture recognition; real-time video and audio processing; physical-world interpretation
Visualization
- Characteristics: advanced rendering, interactive physics
- Application examples: multi-layered graphics; holographic displays; scientific visualization and CAD; next-generation gaming
Cloud + Client Applications
- Characteristics: seamless responsiveness, workload partitioning
- Application examples: next-generation browsers; HTML5; apps with native code from JavaScript

Performance Expectations over Time
[Charts: GPU SP ALU performance, GPU DP ALU performance and GPU bandwidth over time, comparing the CPU, Radeon HD 4870 and Radeon HD 5870.]

GPU Computing Efficiency Trend
[Chart: GPU GFLOPS/W over time.]

Thread Processors: 5-way VLIW Architecture
- 4 stream cores and 1 special-function stream core
- Separate branch unit
- All 5 cores co-issue; scheduling across the cores is done by the compiler
- Each core delivers a 32-bit result per clock, so a thread processor writes 5 results per clock

ATI Radeon HD 5870 Compute Architecture
[Diagram: 20 SIMD engines (1600 shader cores), Ultra-Threaded Dispatch Processor, instruction and constant caches, memory export buffer, fetch path with multi-level caches, Global Data Store.]
Each SIMD unit includes:
- 16 thread processors (80 shader cores) and a 32KB Local Data Share
- Its own thread sequencer, which operates a shared set of threads
- A dedicated fetch unit with an 8KB L1 cache
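To make "scheduling across the cores is done by the compiler" concrete, here is a toy list scheduler that packs operations into 5-wide bundles. It assumes a simplified issue model: an op can issue once its inputs were produced in an earlier bundle, transcendental ops must use the special-function core, and ordinary ops take a stream core first and fall back to the special-function core. The slot names `x`, `y`, `z`, `w`, `t` and all function names are hypothetical; a real shader compiler does considerably more. The block also checks the slides' core count: 20 SIMD engines × 16 thread processors × 5 cores = 1600.

```python
from dataclasses import dataclass

# Radeon HD 5870 topology as given on the slides.
SIMD_ENGINES = 20
THREAD_PROCESSORS_PER_SIMD = 16
CORES_PER_THREAD_PROCESSOR = 5  # 4 stream cores + 1 special-function core

GENERAL_SLOTS = ("x", "y", "z", "w")  # hypothetical names for the 4 stream cores
SPECIAL_SLOT = "t"                   # hypothetical name for the special-function core


@dataclass(frozen=True)
class Op:
    name: str
    deps: tuple = ()              # names of ops whose results this op reads
    transcendental: bool = False  # e.g. rsqrt: needs the special-function core


def pack_vliw5(ops):
    """Greedy list scheduler packing ops into 5-wide bundles (toy model).

    An op may issue once every dep was issued in an earlier bundle.
    Transcendental ops need the special slot; other ops take a general slot
    first and fall back to the special slot.
    """
    done = set()
    pending = list(ops)
    bundles = []
    while pending:
        bundle, free_general, special_free, issued = {}, list(GENERAL_SLOTS), True, []
        for op in pending:
            if not all(d in done for d in op.deps):
                continue  # an operand is not ready yet
            if not op.transcendental and free_general:
                bundle[free_general.pop(0)] = op.name
            elif special_free:
                bundle[SPECIAL_SLOT] = op.name
                special_free = False
            else:
                continue  # no suitable slot left in this bundle
            issued.append(op)
        if not issued:
            raise ValueError("dependency that can never be satisfied")
        for op in issued:
            pending.remove(op)
        done.update(op.name for op in issued)
        bundles.append(bundle)
    return bundles


# Normalize a 4-component vector: four squares, a sum tree, rsqrt, four scales.
EXAMPLE_OPS = [
    Op("mx"), Op("my"), Op("mz"), Op("mw"),
    Op("s0", ("mx", "my")), Op("s1", ("mz", "mw")),
    Op("dot", ("s0", "s1")),
    Op("inv", ("dot",), transcendental=True),
    Op("nx", ("inv",)), Op("ny", ("inv",)), Op("nz", ("inv",)), Op("nw", ("inv",)),
]

if __name__ == "__main__":
    print(SIMD_ENGINES * THREAD_PROCESSORS_PER_SIMD * CORES_PER_THREAD_PROCESSOR)  # 1600
    bundles = pack_vliw5(EXAMPLE_OPS)
    for i, b in enumerate(bundles):
        print(i, b)
    used = sum(len(b) for b in bundles)
    print(f"slot utilization: {used}/{5 * len(bundles)}")  # 12/25
```

On this dependency chain only 12 of 25 slots are filled, which illustrates why the compiler needs independent operations (for example from several pixels or work-items) to keep all five cores busy.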

TeraScale Architecture: Radeon HD 5870 Memory Hierarchy
- Distributed memory controller, optimized for latency hiding and memory access efficiency
- GDDR5 memory at 150GB/s
- Up to 272 billion 32-bit fetches/second
- Up to 1 TB/sec L1 texture fetch bandwidth
- Up to 435 GB/sec between L1 and L2

Comparative Stats on ATI Radeon HD 5870 GPU
                     AMD Opteron 2435    ATI Radeon HD 4870   ATI Radeon HD 5870   One-year difference
Die size             346 mm²             263 mm²              334 mm²              1.27x
Transistors          904 million         956 million          2.15 billion         2.25x
Memory bandwidth     12.8 GB/s           115 GB/s             153 GB/s             1.33x
SP GFLOPS
DP GFLOPS
ALUs
Board power*, idle   15.5 W              90 W                 27 W                 0.3x
Board power*, max    115 W               160 W                188 W                1.17x
* Based on internal AMD testing

Yesterday's Chip Designs Won't Do
Yesterday:
- 105 million transistors: compute tasks including video decode
- 110 million transistors: 2D and 3D gaming, nascent video processing
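The one-year-difference column of the HD 4870 vs. HD 5870 comparison can be rechecked by simple division, and the 32-bit fetch rate converts directly into the L1 bandwidth figure. This sketch takes the values as 263 vs. 334 mm² die size, 956 million vs. 2.15 billion transistors, 115 vs. 153 GB/s memory bandwidth, and 90/160 W vs. 27/188 W idle/max board power; the dictionary keys and function names are hypothetical.

```python
# HD 4870 -> HD 5870 figures from the comparison slide.
HD4870 = {"die_mm2": 263, "transistors_m": 956, "mem_bw_gbs": 115,
          "idle_w": 90, "max_w": 160}
HD5870 = {"die_mm2": 334, "transistors_m": 2150, "mem_bw_gbs": 153,
          "idle_w": 27, "max_w": 188}


def generation_ratios(old, new):
    """Return new/old for every metric the two dicts share."""
    return {k: new[k] / old[k] for k in old if k in new}


def l1_fetch_bandwidth_gbs(fetches_per_s, bytes_per_fetch=4):
    """Aggregate fetch bandwidth in GB/s (1 GB = 1e9 bytes); a 32-bit fetch is 4 bytes."""
    return fetches_per_s * bytes_per_fetch / 1e9


if __name__ == "__main__":
    for metric, r in generation_ratios(HD4870, HD5870).items():
        print(f"{metric:14s} {r:.3f}x")
    print(l1_fetch_bandwidth_gbs(272e9), "GB/s")  # 1088.0 GB/s
```

Die size, transistor count and memory bandwidth come out at about 1.27x, 2.25x and 1.33x; maximum board power is 1.175x, which the table shows as 1.17x. At 4 bytes per fetch, 272 billion fetches per second is 1088 GB/s, consistent with "up to 1 TB/sec".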

Today We Are Evolving:
- 758 million transistors: multi-tasking, most compute tasks
- 2.15 billion transistors: 3D OS, multi-panel HD gaming, full HD video and audio
Tomorrow Will Amaze:
- ~1 billion transistors in one design
- Significantly enhances active/resting battery life
- APU: fusion of CPU and GPU compute power within one processor
- High-bandwidth I/O

AMD Fusion APUs Fill the Need
[Chart: throughput performance vs. programmer accessibility, with regions labeled "Unacceptable", "Experts Only", "Mainstream Microprocessor Advancement", "GPU Advancement" and "Fusion APUs".]
x86 CPU owns the software world:
- Windows, MacOS and Linux franchises
- Thousands of apps
- Established programming and memory model
- Mature tool chain
- Extensive backward compatibility for applications and OSs
- High barrier to entry
GPU optimized for modern workloads:
- Enormous parallel computing capacity
- Outstanding performance-per-watt-per-dollar
- Very efficient hardware threading
- SIMD architecture well matched to modern workloads: video, audio, graphics
Fusion APUs, putting it all together:
- High-performance task-parallel execution
- Power-efficient data-parallel execution
- Programmability ranging from graphics driver-based programs, through OCL/DC (OpenCL/DirectCompute) driver-based programs, to system-level programmable

[Diagram: PC with discrete GPU; Fusion APU based PC.]

Two x86 Cores Tuned for Target Markets
- Bulldozer
- Bobcat

Heterogeneous Computing: Next-Generation Software Ecosystem
- Increase ease of application development
- Load balance across CPUs and GPUs; leverage AMD Fusion performance advantages
- Drive new features into industry standards
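Load balancing across CPUs and GPUs can be as simple as a static split in proportion to each device's measured throughput, so that both finish at about the same time. A minimal sketch under that assumption; `split_work` and the rates are hypothetical, and it deliberately ignores transfer and dispatch costs, the kind of communication overhead the Three Eras slide lists as a constraint.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split n_items so the CPU and GPU finish at about the same time.

    cpu_rate and gpu_rate are items/second measured (or estimated) on each
    device; both are hypothetical inputs here. Returns (cpu_items, gpu_items).
    Transfer and dispatch costs are ignored.
    """
    if n_items < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("need n_items >= 0 and positive rates")
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    return n_items - gpu_items, gpu_items


if __name__ == "__main__":
    cpu_rate, gpu_rate = 1_000_000, 9_000_000  # hypothetical items/second
    cpu_items, gpu_items = split_work(1_000_000, cpu_rate, gpu_rate)
    print(cpu_items, gpu_items)                        # 100000 900000
    print(cpu_items / cpu_rate, gpu_items / gpu_rate)  # 0.1 0.1
    print("GPU alone:", 1_000_000 / gpu_rate)
```

With a 9:1 throughput ratio both devices finish in 0.1 s, versus about 0.111 s if the GPU took all the work; a dynamic scheme (work queues, re-measured rates) would be needed when throughput varies during the run.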

Open Standards: Maximize Developer Freedom and Addressable Market
- Vendor-specific, cross-platform limiters: Apple Display Connector, 3dfx Glide, Nvidia CUDA, Nvidia Cg, Rambus, Unified Display Interface
- Vendor-neutral, cross-platform enablers: OpenCL and DirectX 11 DirectCompute

How will developers choose?
DirectX 11 DirectCompute
- Easiest path to add compute capabilities to existing DirectX applications
- Windows Vista and Windows 7 only
OpenCL
- Ideal path for new applications porting to the GPU for the first time
- True multiplatform: Windows, Linux, MacOS
- Natural programming without dealing with a graphics API

The Benefits of Fusion
- Unparalleled processing capabilities in mobile form factors
- Shared memory for the CPU and GPU
  - Eliminates copies, increasing performance
  - Reduces dispatch overhead
  - Lower latency from the GPU to memory
- Power-efficient design
- Enables architectural innovations between the CPU, GPU and memory system
- Scalable architecture that can target a broad range of platforms, from mobile to data center

The Fusion Opportunity
- A new architectural and performance balance point for computing
- A new machine target for research
- A high-volume opportunity for new algorithms, new workloads and new applications
- The deployment opportunity is especially strong in the consumer marketplace
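The copy-elimination benefit of shared CPU/GPU memory can be seen in a toy cost model: a discrete GPU pays for moving the data across a link in both directions, while a shared-memory design skips that step. The function name and every number below are hypothetical; kernel time is held fixed for both cases, which a real comparison would not assume, and dispatch overhead is a separate parameter so its reduction can be modeled independently.

```python
def offload_time_s(n_bytes, kernel_s, dispatch_s, copy_gbs=None):
    """Wall time for one GPU job in a toy cost model.

    copy_gbs=None models CPU and GPU sharing memory (no staging copies).
    Otherwise n_bytes are copied to the GPU and the same amount copied back
    over a link of copy_gbs GB/s (1 GB = 1e9 bytes).
    """
    copy_s = 0.0 if copy_gbs is None else 2 * n_bytes / (copy_gbs * 1e9)
    return dispatch_s + copy_s + kernel_s


if __name__ == "__main__":
    # Hypothetical job: 64 MB buffer, 5 ms kernel, 50 us dispatch, 8 GB/s link.
    discrete = offload_time_s(64e6, 5e-3, 50e-6, copy_gbs=8.0)
    shared = offload_time_s(64e6, 5e-3, 50e-6)
    print(f"discrete GPU: {discrete * 1e3:.2f} ms")  # 21.05 ms
    print(f"shared memory: {shared * 1e3:.2f} ms")  # 5.05 ms
```

In this example the two 64 MB copies cost 16 ms, more than three times the kernel itself, which is the situation where removing copies matters most.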


More information

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming More on GPU Architecture

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming More on GPU Architecture CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming More on GPU Architecture Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com On the motherboard PCI-e Developed

More information

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory supports up to 4 display monitor(s) without DisplayPort 4 x Maximum Display

More information

How a GPU Works. Kayvon Fatahalian (Fall 2011)

How a GPU Works. Kayvon Fatahalian (Fall 2011) How a GPU Works Kayvon Fatahalian 15-462 (Fall 2011) Today 1. Review: the graphics pipeline 2. History: a few old GPUs 3. How a modern GPU works (and why it is so fast!) 4. Closer look at a real GPU design

More information

White Paper AMD EMBEDDED G-SERIES SOC PLATFORM. Excellent Performance in an Ultra Compact Footprint with Enterprise-class ECC Support

White Paper AMD EMBEDDED G-SERIES SOC PLATFORM. Excellent Performance in an Ultra Compact Footprint with Enterprise-class ECC Support White Paper AMD EMBEDDED G-SERIES SOC PLATFORM Excellent Performance in an Ultra Compact Footprint with Enterprise-class ECC Support TABLE OF CONTENTS SUPERIOR PERFORMANCE PER WATT AND ENHANCED MULTIMEDIA

More information

Pangaea: A Tightly-Coupled Heterogeneous IA32 Chip Multiprocessor

Pangaea: A Tightly-Coupled Heterogeneous IA32 Chip Multiprocessor Pangaea: A Tightly-Coupled Heterogeneous IA32 Chip Multiprocessor Henry Wong 1, Anne Bracy 2, Ethan Schuchman 2, Tor M. Aamodt 1, Jamison D. Collins 2, Perry H. Wang 2, Gautham Chinya 2, Ankur Khandelwal

More information

Real-Time Chroma Keying on the GPU. A White Paper by David Yamnitsky Boris FX, Nov. 2009

Real-Time Chroma Keying on the GPU. A White Paper by David Yamnitsky Boris FX, Nov. 2009 Real-Time Chroma Keying on the GPU A White Paper by David Yamnitsky Boris FX, Nov. 2009 Real-Time Chroma Keying on the GPU David Yamnitsky Boris FX (a) Input Image (b) Keyed Matte (c) Composite FIGURE

More information

GPU Computing with NVIDIA CUDA. Ian Buck NVIDIA

GPU Computing with NVIDIA CUDA. Ian Buck NVIDIA GPU Computing with NVIDIA CUDA Ian Buck NVIDIA Stunning Graphics Realism Lush, Rich Worlds Crysis 2006 Crytek / Electronic Arts Incredible Physics Effects Core of the Definitive Gaming Platform Hellgate:

More information

SAPPHIRE HD 6870 1GB GDDR5 PCIE. www.msystems.gr

SAPPHIRE HD 6870 1GB GDDR5 PCIE. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE Get Radeon in Your System - Immerse yourself with AMD Eyefinity technology and expand your games across multiple displays. Experience ultra-realistic visuals and explosive

More information

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University

Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University Project3 Cache Race Games night Monday, May 4 th, 5pm Come, eat, drink, have fun and be merry! Location: B17 Upson Hall

More information

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. NVIDIA Quadro M6000 12GB Graphics. Overview

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. NVIDIA Quadro M6000 12GB Graphics. Overview Overview L2K02AA INTRODUCTION Push the frontier of graphics processing with the new NVIDIA Quadro M6000 12GB graphics card. The Quadro M6000 features the top of the line member of the latest NVIDIA Maxwell-based

More information

COSC 243. Computer Architecture 2. Lecture 13 Computer Architecture 2. COSC 243 (Computer Architecture)

COSC 243. Computer Architecture 2. Lecture 13 Computer Architecture 2. COSC 243 (Computer Architecture) COSC 243 1 Overview This Lecture Architectural topics CISC RISC Multi-core processors Source: lecture notes Next Lecture Operating systems 2 Moore s Law 3 CISC What is the best thing to do with all those

More information

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques

More information

AMD 2010 Financial Analyst Day

AMD 2010 Financial Analyst Day AMD 2010 Financial Analyst Day Rick Bergman Senior Vice President and General Manager November 9, 2010 Agenda AMD Strategy Why AMD Fusion? Winning with AMD Fusion Product Roadmaps 3 AMD 2010 Financial

More information

The ARM Mali -T880 Mobile GPU. Ian Bratt ARM Media Processing Group

The ARM Mali -T880 Mobile GPU. Ian Bratt ARM Media Processing Group The ARM Mali -T880 Mobile GPU Ian Bratt ARM Media Processing Group 1 Million units ARM Mali Success 73 licensees Total 114 licenses 25 new Mali licenses in FY14 600 500 550M Mali based GPUs shipped in

More information

GPU Computing & Architectures 1. Introduction. Ezio Bartocci Vienna University of Technology

GPU Computing & Architectures 1. Introduction. Ezio Bartocci Vienna University of Technology GPU Computing & Architectures 1. Introduction Ezio Bartocci Vienna University of Technology Objectives: Aim of this course Gaining understanding of GPU computing architecture Getting familiar with GPU

More information

INF5063: Programming heterogeneous multi-core processors Introduction

INF5063: Programming heterogeneous multi-core processors Introduction INF5063: Programming heterogeneous multi-core processors Introduction Håkon Kvale Stensland August 28 th, 2012 INF5063 IXA: Internet Exchange Architecture IXP2400 basic features: 1 embedded 600 MHz Intel

More information

QuickSpecs. AMD FirePro W7100 8GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. AMD FirePro W7100 8GB Graphics. Overview

QuickSpecs. AMD FirePro W7100 8GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. AMD FirePro W7100 8GB Graphics. Overview Overview J3G93AA INTRODUCTION The AMD FirePro W7100 workstation graphics delivers great performance, superb visual quality, and outstanding multi-display capabilities. It is an excellent high-end solution

More information

The University of Texas at Arlington Lecture 2

The University of Texas at Arlington Lecture 2 The University of Texas at Arlington Lecture 2 Reading Assignment Reading Assignment for Tuesday January 25 Read Chapter 1 in Multi-Core Programming Text Also available for download at: http://www.intel.com/intelpress/samples/mcp_sam

More information

Xbox 360 System Architecture. Jeff Andrews Nick Baker Xbox Semiconductor Technology Group

Xbox 360 System Architecture. Jeff Andrews Nick Baker Xbox Semiconductor Technology Group Xbox 360 System Architecture Jeff Andrews Nick Baker Xbox Semiconductor Technology Group Hot Chips Presentation Hardware Specs Architectural Choices Programming Environment QA Hot Chips 17 2 Overview Design

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming. OpenCL in Action

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming. OpenCL in Action CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming OpenCL in Action Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Many slides from this lecture are adapted

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Short Introduction to GPU programming for Scientific Computing

Short Introduction to GPU programming for Scientific Computing PDC Summer School 2016 Short Introduction to GPU programming for Scientific Computing 2015-08-19 Michael Schliephake Szilárd Páll KTH CSC HPCViz Programming with GPU - Michael Schliephake, KTH - CSC -

More information

INF5063: Programming heterogeneous multi-core processors. September 13, 2010

INF5063: Programming heterogeneous multi-core processors. September 13, 2010 INF5063: Programming heterogeneous multi-core processors September 13, 2010 Overview Course topic and scope Background for the use and parallel processing using heterogeneous multi-core processors Examples

More information

GPU programming using C++ AMP

GPU programming using C++ AMP GPU programming using C++ AMP Petrika Manika petrika.manika@fshn.edu.al Elda Xhumari elda.xhumari@fshn.edu.al Julian Fejzaj julian.fejzaj@fshn.edu.al Abstract Nowadays, a challenge for programmers is to

More information

Power Efficient Processor Design and the Cell Processor

Power Efficient Processor Design and the Cell Processor Power Efficient Design and the Cell H. Peter Hofstee, Ph. D. hofstee@us.ibm.com Architect, Cell Synergistic Element IBM Systems and Technology Group Austin, Texas Agenda Power Efficient Architecture System

More information

Spring 2010 Prof. Hyesoon Kim

Spring 2010 Prof. Hyesoon Kim Spring 2010 Prof. Hyesoon Kim Outstanding performance, especially on game/multimedia applications. Challenges: Power Wall, Frequency Wall, Memory Wall Real time responsiveness to the user and the network.

More information