Heterogeneous Computing -> Fusion

1 Heterogeneous Computing -> Fusion
Norm Rubin, AMD Fellow
SAAHPC 2010

2 Definitions
Heterogeneous Computing
- A system composed of two or more compute engines with significant structural differences
- In our case, a low-latency x86 CPU and a high-throughput Radeon GPU
Fusion
- Bringing together two or more components and joining them into a single unified whole
- In our case, combining CPUs and GPUs on a single silicon die for higher performance and lower power

3 AMD Balanced Platform Advantage
CPU is ideal for scalar processing
- Out-of-order x86 cores with low-latency memory access
- Optimized for sequential and branching algorithms
- Runs existing applications very well
GPU is ideal for parallel processing
- GPU shaders optimized for throughput computing
- Ready for emerging workloads
- Media processing, simulation, natural UI, etc.
Diagram labels: Graphics Workloads; Serial/Task-Parallel Workloads; Other Highly Parallel Workloads
Delivers optimal performance for a wide range of platform configurations

4 Three Eras of Processor Performance
Single-Core Era (chart axes: Single-thread Performance vs. Time)
- Enabled by: Moore's Law, voltage scaling, microarchitecture
- Constrained by: power, complexity
Multi-Core Era (chart axes: Throughput Performance vs. Time (# of Processors))
- Enabled by: Moore's Law, desire for throughput, 20 years of SMP architecture
- Constrained by: power, parallel SW availability, scalability
Heterogeneous Systems Era (chart axes: Targeted Application Performance vs. Time (Data-parallel exploitation))
- Enabled by: Moore's Law, abundant data parallelism, power-efficient GPUs
- Temporarily constrained by: programming models, communication overheads
Each chart marks a "we are here" point (one of them with a question mark).

5 Emerging Application Spaces
Category: Massive Data Mining; Natural User Interfaces; Visualization; Cloud + Client Applications
Characteristics: Full 64b addressing; Huge data sets; New data types; Massive behind-the-scenes computing; Advanced rendering; Interactive physics; Seamless responsiveness; Workload partitioning
Application Examples: Image, video, audio processing; Pattern analytics and search; Face and gesture recognition; Real-time video & audio processing; Physical world interpretation; Multi-layered graphics; Holographic displays; Scientific visualization & CAD; Next-generation gaming; Next-generation browsers; HTML5 apps with native code from JavaScript

6 GPU SP ALU Performance (chart; series labeled HD5870, HD4870, CPU)

7 GPU DP ALU Performance (chart; series labeled HD5870, HD4870, CPU)

8 GPU BW: Performance expectations over time (chart; labels HD4870, HD…)

9 GPU Computing Efficiency Trend (chart; axes GFLOPS/W and GFLOPS/mm²)

10 ATI Radeon HD 5870 Compute Architecture
- 20 SIMD Engines
- 1600 shader cores
- Ultra-Threaded Dispatch Processor
- Instruction and Constant Caches
- Memory Export Buffer
- Fetch path with multi-level caches
- Global Data Store

11 Memory Hierarchy
- Distributed Memory Controller
- Optimized for latency hiding and memory access efficiency
- GDDR5 memory at 150 GB/s
- Up to 272 billion 32-bit fetches/second
- Up to 1 TB/s L1 texture fetch bandwidth
- Up to 435 GB/s between L1 & L2
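The fetch rate and the L1 bandwidth quoted on this slide can be checked against each other. The sketch below is only an illustration: the function name is made up for this example, and it assumes decimal units (1 GB = 10^9 bytes) and that each 32-bit fetch moves exactly 4 bytes.

```python
def fetch_rate_to_gb_per_s(fetches_per_second: float, bits_per_fetch: int = 32) -> float:
    """Convert a fetch rate into bandwidth in GB/s (assumes 1 GB = 1e9 bytes)."""
    bytes_per_fetch = bits_per_fetch // 8
    return fetches_per_second * bytes_per_fetch / 1e9


# Slide figures: 272 billion 32-bit fetches/s and 150 GB/s GDDR5.
l1_fetch_gb_s = fetch_rate_to_gb_per_s(272e9)
gddr5_gb_s = 150.0

# How many times faster the L1 fetch path is than off-chip memory.
l1_over_dram = l1_fetch_gb_s / gddr5_gb_s

print(f"L1 fetch bandwidth: {l1_fetch_gb_s:.0f} GB/s")
print(f"L1 / GDDR5 ratio:   {l1_over_dram:.2f}x")
```

Under these assumptions the fetch rate works out to roughly 1.09 TB/s, which is consistent with the slide's "up to 1 TB/s" figure.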

12 Comparative Stats on ATI Radeon HD 5870 GPU

Metric             | AMD Opteron Model 2435 | ATI Radeon HD 4870 | ATI Radeon HD 5870 | One Year Difference
Die Size           | 346 mm²                | … mm²              | … mm²              | …x
Transistors        | 904 million            | 956 million        | 2.15 billion       | 2.25x
Memory Bandwidth   | 12.8 GB/s              | 115 GB/s           | 153 GB/s           | 1.33x
SP GFlops          | …                      | …                  | …                  | …x
DP GFlops          | …                      | …                  | …                  | …
ALUs               | …                      | …                  | …                  | …x
Board Power*, Idle | 15.5 W                 | 90 W               | 27 W               | 0.3x
Board Power*, Max  | 115 W                  | 160 W              | 188 W              | 1.17x

* Based on internal AMD testing
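The "One Year Difference" column appears to be the HD 5870 value divided by the HD 4870 value. The short sketch below recomputes it for the rows whose numbers survive in this transcription; the dictionary layout and variable names are illustrative only.

```python
# (HD 4870 value, HD 5870 value, ratio stated on the slide)
rows = {
    "transistors (millions)": (956.0, 2150.0, 2.25),
    "memory bandwidth (GB/s)": (115.0, 153.0, 1.33),
    "idle board power (W)": (90.0, 27.0, 0.3),
    "max board power (W)": (160.0, 188.0, 1.17),
}

# Recompute each ratio as new / old.
computed = {name: new / old for name, (old, new, _stated) in rows.items()}

# Largest gap between a recomputed ratio and the slide's stated ratio.
max_abs_error = max(
    abs(computed[name] - stated) for name, (_old, _new, stated) in rows.items()
)

for name, value in computed.items():
    print(f"{name}: {value:.3f}x (slide says {rows[name][2]}x)")
```

All four recomputed ratios land within 0.01 of the values on the slide, so the surviving figures are at least internally consistent.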

13 Yesterday's Chip Designs Won't Do
CPU: 105 million
- Compute tasks including video decode
GPU: 110 million
- 2D and 3D gaming
- Nascent video processing

14 Today We Are Evolving
Multi-core CPU: 758 million
- Multi-tasking
- Most compute tasks
TeraFLOPS-class GPU: 2.15 billion
- 3D OS
- Multi-panel HD gaming
- Full HD video and audio

15 Tomorrow Will Amaze
~1 billion in one design
- APU: Fusion of CPU & GPU compute power within one processor
- Significantly enhances active/resting battery life
- High-bandwidth I/O

16 AMD Fusion APUs Fill the Need
x86 CPU owns the Software World
- Windows, MacOS and Linux franchises
- Thousands of apps
- Established programming and memory model
- Mature tool chain
- Extensive backward compatibility for applications and OSs
- High barrier to entry
GPU Optimized for Modern Workloads
- Enormous parallel computing capacity
- Outstanding performance-per-watt-per-dollar
- Very efficient hardware threading
- SIMD architecture well matched to modern workloads: video, audio, graphics

17 Fusion APUs: Putting it all together
Diagram axes: Programmer Accessibility (Unacceptable, Experts Only, Mainstream) and Throughput Performance
Diagram labels: GPU Advancement; Microprocessor Advancement; Single-Thread Era; Multi-Core Era; Heterogeneous Systems Era; Fusion APU; High Performance Task Parallel Execution; Heterogeneous Computing; System-level Programmable; Power-efficient Data Parallel Execution; OCL/DC Driver-based programs; Graphics Driver-based programs

18 PC with Discrete GPU

19 Fusion APU Based PC

20 Two x86 Cores Tuned for Target Markets
Bulldozer: Performance & Scalability
- Mainstream Client and Server Markets
Bobcat: Flexibility, Low Power & Low Cost
- Low Power Markets
- Lower Cost
- Cloud Optimized

21 Heterogeneous Computing: Next-Generation Software Ecosystem
Software stack:
- End-user Applications
- High Level Frameworks
- Middleware/Libraries: Video, Imaging, Math/Sciences, Physics
- OpenCL & Direct Compute
- Tools: HLL compilers, Debuggers, Profilers
- Hardware & Drivers: AMD Fusion, Discrete CPUs/GPUs
Goals:
- Increase ease of application development
- Advanced Optimizations & Load Balancing: load balance across CPUs and GPUs; leverage AMD Fusion performance advantages
- Drive new features into industry standards
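The slide does not say how load balancing across CPUs and GPUs would be done. One simple possibility, sketched here purely as an illustration, is a static split in proportion to each device's measured throughput so that both finish at about the same time. The function name and the throughput numbers are hypothetical, not taken from the talk.

```python
def split_work(total_items: int, cpu_items_per_s: float, gpu_items_per_s: float):
    """Split total_items between CPU and GPU in proportion to throughput.

    Returns (cpu_items, gpu_items). Throughputs are assumed to be measured
    beforehand; a real runtime would also have to account for transfer and
    dispatch overheads, which this sketch ignores.
    """
    if cpu_items_per_s <= 0 or gpu_items_per_s <= 0:
        raise ValueError("throughputs must be positive")
    gpu_share = gpu_items_per_s / (cpu_items_per_s + gpu_items_per_s)
    gpu_items = round(total_items * gpu_share)
    return total_items - gpu_items, gpu_items


# Hypothetical throughputs: the GPU processes items 9x faster than the CPU.
cpu_items, gpu_items = split_work(1000, cpu_items_per_s=100.0, gpu_items_per_s=900.0)
cpu_seconds = cpu_items / 100.0
gpu_seconds = gpu_items / 900.0
print(cpu_items, gpu_items, cpu_seconds, gpu_seconds)
```

With these made-up rates the CPU receives 100 items and the GPU 900, and both devices finish in about one second.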

22 Open Standards: Maximize Developer Freedom and Addressable Market
Column headings: Vendor specific (cross-platform limiters); Vendor neutral (cross-platform enablers)
Technologies shown: Apple Display Connector; 3dfx Glide; Nvidia CUDA; Digital Visual Interface; OpenCL; DirectX; Nvidia Cg; Rambus; Certified DP; JEDEC; OpenGL; Unified Display Interface

23 The Benefits of Fusion
- Unparalleled processing capabilities in mobile form factors
- Shared memory for the CPU and GPU
  - Eliminates copies, increasing performance
  - Reduces dispatch overhead
  - Lower latency from the GPU to memory
- Power-efficient design
- Enables architectural innovations between CPU, GPU and the memory system
- Scalable architecture that can target a broad range of platforms from mobile to data center
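To see why eliminating copies can matter, consider a rough cost model of one offloaded task. This is a back-of-the-envelope sketch, not a measurement: the function names, buffer sizes, link bandwidth and kernel time are all hypothetical, and it assumes the kernel itself runs equally fast in both configurations.

```python
def discrete_offload_seconds(bytes_in: float, bytes_out: float,
                             link_gb_per_s: float, kernel_seconds: float) -> float:
    """Time for one offload when inputs and results are copied over a link."""
    copy_seconds = (bytes_in + bytes_out) / (link_gb_per_s * 1e9)
    return copy_seconds + kernel_seconds


def shared_memory_offload_seconds(kernel_seconds: float) -> float:
    """Time for one offload when CPU and GPU share memory (no copies modeled)."""
    return kernel_seconds


# Hypothetical workload: 64 MB in, 64 MB out, 8 GB/s link, 4 ms kernel.
discrete = discrete_offload_seconds(64e6, 64e6, link_gb_per_s=8.0, kernel_seconds=0.004)
shared = shared_memory_offload_seconds(kernel_seconds=0.004)
speedup = discrete / shared
print(f"discrete: {discrete * 1e3:.1f} ms, shared: {shared * 1e3:.1f} ms, {speedup:.1f}x")
```

In this made-up case the copies take four times as long as the kernel, so removing them gives about a 5x improvement; for long-running kernels on small data the gain would be far smaller.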

24 The Fusion Opportunity
- A new architectural and performance balance point for computing
- A new machine target for research
- A high-volume opportunity for new algorithms, new workloads and new applications
- The deployment opportunity is especially strong in the consumer marketplace

25 Questions?

26 Backup slides

27 Thread Processors
Stream Cores
- 4 32-bit FP MAD per clock
- 2 64-bit FP MUL or ADD per clock
- 1 64-bit FP MAD per clock
- 4 24-bit Int MUL or ADD per clock
Special functions
- 1 32-bit FP MAD per clock
5-way VLIW Architecture
- 4 Stream Cores and 1 Special Function Stream Core
- Separate Branch Unit
- All 5 cores co-issue
- Scheduling across the cores is done by the compiler
- Each core delivers a 32-bit result per clock
- Thread Processor writes 5 results per clock
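The per-clock rates above translate into peak throughput once a clock frequency is chosen. The sketch below is illustrative: it counts a MAD as 2 floating-point operations, takes 320 thread processors (20 SIMD engines x 16 thread processors, from the other slides), and assumes an 850 MHz engine clock, which is not stated anywhere in this deck.

```python
def peak_gflops(thread_processors: int, mads_per_clock: int, clock_ghz: float) -> float:
    """Peak GFLOPS, counting each multiply-add (MAD) as 2 floating-point ops."""
    flops_per_mad = 2
    return thread_processors * mads_per_clock * flops_per_mad * clock_ghz


THREAD_PROCESSORS = 20 * 16   # 20 SIMD engines x 16 thread processors each
ASSUMED_CLOCK_GHZ = 0.85      # assumption: engine clock is not given in the slides

# Single precision: 4 MADs from the stream cores + 1 from the special function core.
sp_gflops = peak_gflops(THREAD_PROCESSORS, 5, ASSUMED_CLOCK_GHZ)
# Double precision: 1 64-bit MAD per thread processor per clock.
dp_gflops = peak_gflops(THREAD_PROCESSORS, 1, ASSUMED_CLOCK_GHZ)

print(f"peak SP: {sp_gflops:.0f} GFLOPS, peak DP: {dp_gflops:.0f} GFLOPS")
```

Under the assumed clock this gives about 2720 GFLOPS single precision and 544 GFLOPS double precision, a 5:1 ratio that follows directly from the per-clock MAD counts.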

28 SIMD Engines
(Diagram shows 2 SIMD Engines)
Each SIMD Unit includes:
- 16 Thread Processors (80 shader cores) + 32KB Local Data Share
- Its own Thread Sequencer, which operates a shared set of threads
- A dedicated fetch unit with an 8KB L1 cache
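The core counts quoted across these slides fit together arithmetically. The sketch below just multiplies them out; the class and field names are invented for this example, and the chip-wide cache totals are derived figures rather than numbers from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    simd_engines: int
    thread_processors_per_simd: int
    cores_per_thread_processor: int   # 5-way VLIW: 4 stream cores + 1 special function core
    lds_kb_per_simd: int              # Local Data Share per SIMD engine
    l1_kb_per_simd: int               # L1 cache in each SIMD's fetch unit

    @property
    def shader_cores_per_simd(self) -> int:
        return self.thread_processors_per_simd * self.cores_per_thread_processor

    @property
    def total_shader_cores(self) -> int:
        return self.simd_engines * self.shader_cores_per_simd

    @property
    def total_lds_kb(self) -> int:
        return self.simd_engines * self.lds_kb_per_simd

    @property
    def total_l1_kb(self) -> int:
        return self.simd_engines * self.l1_kb_per_simd


# Figures from the HD 5870 slides: 20 SIMD engines, 16 thread processors each.
hd5870 = GpuLayout(simd_engines=20, thread_processors_per_simd=16,
                   cores_per_thread_processor=5, lds_kb_per_simd=32, l1_kb_per_simd=8)

print(hd5870.shader_cores_per_simd, hd5870.total_shader_cores)
print(hd5870.total_lds_kb, hd5870.total_l1_kb)
```

The products reproduce the slide figures of 80 shader cores per SIMD engine and 1600 shader cores in total.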

29 TeraScale 2 Architecture Radeon HD

30 OpenCL and DirectX 11 DirectCompute
How will developers choose?
DirectX 11 DirectCompute
- Easiest path to add compute capabilities to existing DirectX applications
- Windows Vista and Windows 7 only
OpenCL
- Ideal path for new applications porting to the GPU for the first time
- True multiplatform: Windows, Linux, MacOS
- Natural programming without dealing with a graphics API

White Paper COMPUTE CORES

White Paper COMPUTE CORES White Paper COMPUTE CORES TABLE OF CONTENTS A NEW ERA OF COMPUTING 3 3 HISTORY OF PROCESSORS 3 3 THE COMPUTE CORE NOMENCLATURE 5 3 AMD S HETEROGENEOUS PLATFORM 5 3 SUMMARY 6 4 WHITE PAPER: COMPUTE CORES

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Introduction to GPU Architecture

Introduction to GPU Architecture Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three

More information

Xbox 360 GPU and Radeon HD Michael Doggett Principal Member of Technical Staff Marlborough, Massachusetts October 29, 2007

Xbox 360 GPU and Radeon HD Michael Doggett Principal Member of Technical Staff Marlborough, Massachusetts October 29, 2007 Xbox 360 GPU and Radeon HD 2900 Michael Doggett Principal Member of Technical Staff Marlborough, Massachusetts October 29, 2007 Overview Introduction to 3D Graphics Xbox 360 GPU Radeon 2900 Pipeline Blocks

More information

GPU Architecture Overview. John Owens UC Davis

GPU Architecture Overview. John Owens UC Davis GPU Architecture Overview John Owens UC Davis The Right-Hand Turn [H&P Figure 1.1] Why? [Architecture Reasons] ILP increasingly difficult to extract from instruction stream Control hardware dominates µprocessors

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate

More information

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

More information

SAPPHIRE R9 270X 4GB GDDR5 WITH BOOST & OC

SAPPHIRE R9 270X 4GB GDDR5 WITH BOOST & OC SAPPHIRE R9 270X 4GB GDDR5 WITH BOOST & OC Specification Display Support Output GPU Video Memory Dimension Software Accessory 3 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort 1.2

More information

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008 Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

GPGPU Computing. Yong Cao

GPGPU Computing. Yong Cao GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

More information

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary Up to 725 MHz engine clock (up to 775 MHz wh boost) Up to 2GB GDDR5 memory and 2GB DDR3 Memory Up to 1.125 GHz

More information

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group ATI Radeon 4800 series Graphics Michael Doggett Graphics Architecture Group Graphics Product Group Graphics Processing Units ATI Radeon HD 4870 AMD Stream Computing Next Generation GPUs 2 Radeon 4800 series

More information

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook

More information

ARM Mali GPUs Today and Tomorrow

ARM Mali GPUs Today and Tomorrow ARM Mali GPUs Today and Tomorrow Growing Demand for Graphics 1080p & 2K resolutions 120FPS >150M tablets sold in 2012 **** 12B app downloads in 2012 * Our world is visual Our world is graphical 8K4K resolutions

More information

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth. SAPPHIRE FleX HD 6870 1GB GDDE5 SAPPHIRE HD 6870 FleX can support three DVI monitors in Eyefinity mode and deliver a true SLS (Single Large Surface) work area without the need for costly active adapters.

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009 AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU

More information

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION. NVIDIA Quadro M6000 12GB Graphics. Overview

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION. NVIDIA Quadro M6000 12GB Graphics. Overview Overview L2K02AA INTRODUCTION Push the frontier of graphics processing with the new NVIDIA Quadro M6000 12GB graphics card. The Quadro M6000 features the top of the line member of the latest NVIDIA Maxwell-based

More information

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques

More information

SAPPHIRE HD 6870 1GB GDDR5 PCIE. www.msystems.gr

SAPPHIRE HD 6870 1GB GDDR5 PCIE. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE Get Radeon in Your System - Immerse yourself with AMD Eyefinity technology and expand your games across multiple displays. Experience ultra-realistic visuals and explosive

More information

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE

Msystems Ltd. www.msystems.gr SAPPHIRE HD 6870 1GB GDDR5 PCIE SAPPHIRE HD 6870 1GB GDDR5 PCIE The SAPPHIRE HD 6870 has a new architecture with a total of 1120 stream processors and 56 texture units delivering massively parallel computing power for graphics and other

More information

AMD WHITE PAPER GETTING STARTED WITH SEQUENCEL. AMD Embedded Solutions 1

AMD WHITE PAPER GETTING STARTED WITH SEQUENCEL. AMD Embedded Solutions 1 AMD WHITE PAPER GETTING STARTED WITH SEQUENCEL AMD Embedded Solutions 1 Optimizing Parallel Processing Performance and Coding Efficiency with AMD APUs and Texas Multicore Technologies SequenceL Auto-parallelizing

More information

AMD Radeon HD 2900 Highlights

AMD Radeon HD 2900 Highlights C O N F I D E N T I A L 2007 Hot Chips 19 AMD s Radeon HD 2900 2 nd Generation Unified Shader Architecture Mike Mantor Fellow AMD Graphics Products Group michael.mantor@amd.com AMD Radeon HD 2900 Highlights

More information

Introducing the Singlechip Cloud Computer

Introducing the Singlechip Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. NVIDIA Quadro M6000 12GB Graphics. Overview

QuickSpecs. NVIDIA Quadro M6000 12GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. NVIDIA Quadro M6000 12GB Graphics. Overview Overview L2K02AA INTRODUCTION Push the frontier of graphics processing with the new NVIDIA Quadro M6000 12GB graphics card. The Quadro M6000 features the top of the line member of the latest NVIDIA Maxwell-based

More information

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Programming models for heterogeneous computing Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga Talk outline [30 slides] 1. Introduction [5 slides] 2.

More information

Xbox 360 System Architecture. Jeff Andrews Nick Baker Xbox Semiconductor Technology Group

Xbox 360 System Architecture. Jeff Andrews Nick Baker Xbox Semiconductor Technology Group Xbox 360 System Architecture Jeff Andrews Nick Baker Xbox Semiconductor Technology Group Hot Chips Presentation Hardware Specs Architectural Choices Programming Environment QA Hot Chips 17 2 Overview Design

More information

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory supports up to 4 display monitor(s) without DisplayPort 4 x Maximum Display

More information

L20: GPU Architecture and Models

L20: GPU Architecture and Models L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.

More information

INF5063: Programming heterogeneous multi-core processors. September 13, 2010

INF5063: Programming heterogeneous multi-core processors. September 13, 2010 INF5063: Programming heterogeneous multi-core processors September 13, 2010 Overview Course topic and scope Background for the use and parallel processing using heterogeneous multi-core processors Examples

More information

Multiprocessor Graphic Rendering Kerey Howard

Multiprocessor Graphic Rendering Kerey Howard Multiprocessor Graphic Rendering Kerey Howard EEL 6897 Lecture Outline Real time Rendering Introduction Graphics API Pipeline Multiprocessing Parallel Processing Threading OpenGL with Java 2 Real time

More information

Data Center and Cloud Computing Market Landscape and Challenges

Data Center and Cloud Computing Market Landscape and Challenges Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution

More information

Intel Core i7-990x Processor Extreme Edition. Press Deck: February, 2010 Owner: Karl Shurts, PC Client Group

Intel Core i7-990x Processor Extreme Edition. Press Deck: February, 2010 Owner: Karl Shurts, PC Client Group Intel Core i7-990x Processor Extreme Edition Press Deck: February, 2010 Owner: Karl Shurts, PC Client Group Intel Core i7-990x Processor Extreme Edition Super smart Intel Turbo Boost Technology provides

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST Specification Display Support Output GPU Video Memory Dimension Software Accessory 4 x Maximum Display Monitor(s) support 1 x HDMI (with 3D) 1 x DisplayPort

More information

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications J3G90AA INTRODUCTION The NVIDIA Quadro K5200 gives you amazing application performance and capability, making it faster and easier to accelerate 3D models, render complex scenes, and simulate large datasets.

More information

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010 GPU Architecture An OpenCL Programmer s Introduction Lee Howes November 3, 2010 The aim of this webinar To provide a general background to modern GPU architectures To place the AMD GPU designs in context:

More information

AMD EMBEDDED PCIe ADD-IN BOARD Comparison

AMD EMBEDDED PCIe ADD-IN BOARD Comparison AMD EMBEDDED PCIe ADD-IN BOARD Comparison AMD Radeon E6460 AMD Radeon E6760 Graphics Processing Unit Process Technology 40 nm 40 nm Graphics Engine Operating Frequency (max) 600 MHz 600 MHz CPU Interface

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Dual Core Architecture: The Itanium 2 (9000 series) Intel Processor

Dual Core Architecture: The Itanium 2 (9000 series) Intel Processor Dual Core Architecture: The Itanium 2 (9000 series) Intel Processor COE 305: Microcomputer System Design [071] Mohd Adnan Khan(246812) Noor Bilal Mohiuddin(237873) Faisal Arafsha(232083) DATE: 27 th November

More information

Low power GPUs a view from the industry. Edvard Sørgård

Low power GPUs a view from the industry. Edvard Sørgård Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today

More information

Boosting Long Term Evolution (LTE) Application Performance with Intel System Studio

Boosting Long Term Evolution (LTE) Application Performance with Intel System Studio Case Study Intel Boosting Long Term Evolution (LTE) Application Performance with Intel System Studio Challenge: Deliver high performance code for time-critical tasks in LTE wireless communication applications.

More information
