Bifrost - The GPU architecture for next five billion
Transcription
1 Bifrost - The GPU architecture for next five billion. Alan Tsai, Regional Marketing Manager, Media Processing Group. ARM Tech Forum Taipei, July 1st, 2016
2 Why New GPU Architecture? ARM 2016
3 Market Drivers and Trends
- Evolution of mobile gaming and graphics
- Virtual Reality
- Augmented Reality
- Increasing User Interface complexity
- APIs adapting to developer needs
4 Evolution of Mobile Graphics
- 2010: TrueForce. Hardware: Galaxy S2; GPU: Mali-400MP4; API support: OpenGL ES 2.0; Primitives per frame: 16k; Cycles per pixel: 3.7; Draw calls per frame: 50
- 2013: Trollheim. Hardware: Nexus 10; GPU: Mali-T604; API support: OpenGL ES 3.0, OpenCL 1.1; Primitives per frame: 150k; Cycles per pixel: 16; Draw calls per frame: 60
- 2016: Lofoten. Hardware: Galaxy S7; GPU: Mali-T880MP12; API support: Vulkan 1.0; Primitives per frame: 600k; Cycles per pixel: 40; Draw calls per frame: 500
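As a rough worked example of the growth these three demos represent, the sketch below computes how many times larger each per-frame figure is in 2016 than in 2010. Only the numbers come from the slide; the dictionary layout and the `growth` helper are illustrative assumptions.

```python
# Figures copied from the slide; the data layout is an illustrative assumption.
demos = {
    2010: {"primitives": 16_000, "cycles_per_pixel": 3.7, "draw_calls": 50},
    2013: {"primitives": 150_000, "cycles_per_pixel": 16, "draw_calls": 60},
    2016: {"primitives": 600_000, "cycles_per_pixel": 40, "draw_calls": 500},
}


def growth(metric, start=2010, end=2016):
    """Return how many times larger `metric` is in `end` than in `start`."""
    return demos[end][metric] / demos[start][metric]


for metric in ("primitives", "cycles_per_pixel", "draw_calls"):
    print(f"{metric}: {growth(metric):.1f}x")
```

On these figures, geometry complexity grows fastest, which fits the talk's later emphasis on the geometry data flow.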
5 Vulkan: Developer-driven GFX API
- Vulkan graphics API driven by developer need
- Low-level API
- Ideal for new and emerging use-cases
- Fully exploit heterogeneous system
- Fully multi-threaded
- Benefit from HW coherency
[Diagram, OpenGL ES stack: Application; Mali OpenGL ES Driver (driver handles context, memory and error management); Mali GPU]
[Diagram, Vulkan stack: Application (application handles memory allocation, resources, and thread management to generate command buffers); Mali Vulkan Driver (low-overhead driver); Mali GPU]
6 What Vulkan Means for GPU Architecture
- Stricter system requirements
  - GPU address faults cannot destabilize system
  - Process isolation must be guaranteed
  - Coherent shared memory is mandatory
- GPU architecture is more exposed
  - Reduces flexibility in some areas, e.g. resource descriptors
- Application provides more information, sooner
  - Reduces need for indirection/late binding
7 Bifrost
8 Why is the Architecture Called Bifrost?
9 ARM Mali Architecture Evolution
- BIFROST: Mali-G71 GPU. Unified shader cores, scalar ISA, clause execution, full coherency, Vulkan, OpenCL
- MIDGARD: Mali-T600, Mali-T700 and Mali-T800 GPU series. Unified shader cores, SIMD ISA, OpenGL ES 3.x, OpenCL, Vulkan
- UTGARD: Mali-200, Mali-300, Mali-400, Mali-450 and Mali-470 GPUs. Separate shader cores, SIMD ISA, OpenGL ES 2.x
10 Bifrost Features
- A more efficient architecture: more performance overall, per mm² and per line of real-world shader code
- Major shader core redesign
  - New scalar, clause-based ISA
  - New quad-based arithmetic units
  - New core fabric
- New geometry data flow
  - Reduces memory bandwidth and footprint
- 1.5x performance improvement
11 Bifrost Architectural Innovations
- Energy efficiency
  - Claused Shaders
  - Index Driven Position Shading
  - Wire light pipelines
- Developer friendly
  - Designed for Vulkan and VR/AR
- Heterogeneous computing
  - Full system coherency
[Diagram: Midgard vs. Bifrost; labels: CPU, GPU, Coherent Interconnect, DRAM]
12 Bifrost GPU Design
13 Bifrost GPU Design
[Block diagram: Driver Software; Job Manager; Shader Core 0, 1, 2 ... 31; Control Fabric; Tiler; MMU; L2 Cache Segments; AXI Memory Buses]
14 Scalable System Design
- Up to 32 shader cores supported
[Block diagram as on slide 13]
15 Geometry Flow Improvement
[Block diagram as on slide 13]
16 Index-Driven Position Shading
[Figure: geometry data flow, Midgard vs. Bifrost. Processing stages: Tiler Assembly, Position Shading, Tiler Culling, Varying Shading, Fragment Shading. Memory buffers: Indices, Positions, Transformed Positions, Polygon List, Vertex Attributes, Vertex Varyings. Scale: read/write bandwidth used relative to memory storage size (x times of storage size). Values shown: 3.5x, 2.0x, 2.5x, 1.5x, 1x, ½x; their assignment to stages and architectures is not recoverable from the transcription.]
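The slide names the stage order (position shading, then culling, then varying shading) but not the algorithm, so the following is only a toy model of why that ordering can save work: positions are shaded once per vertex that the index buffer references, and varyings are shaded only for vertices of triangles that survive culling. The function name, the callbacks and the visibility test are all hypothetical and do not describe the actual hardware.

```python
def index_driven_shading(indices, positions, is_visible, shade_position, shade_varying):
    """Toy model: shade positions for indexed vertices, cull, then shade varyings."""
    # Position shading: run once per distinct vertex referenced by the index buffer.
    shaded_pos = {i: shade_position(positions[i]) for i in sorted(set(indices))}

    # Culling: keep only triangles that pass the (hypothetical) visibility test.
    triangles = [tuple(indices[k:k + 3]) for k in range(0, len(indices), 3)]
    visible = [t for t in triangles if is_visible([shaded_pos[i] for i in t])]

    # Varying shading: run only for vertices that belong to a surviving triangle.
    live = sorted({i for t in visible for i in t})
    varyings = {i: shade_varying(i) for i in live}
    return visible, shaded_pos, varyings


# Seven vertices; vertex 3 is never referenced by the index buffer, and the
# second triangle lies entirely at x > 1, which this toy test treats as off-screen.
positions = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (9.0, 9.0),
             (2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
indices = [0, 1, 2, 4, 5, 6]

visible, shaded_pos, varyings = index_driven_shading(
    indices,
    positions,
    is_visible=lambda tri: any(x <= 1.0 for x, _ in tri),
    shade_position=lambda p: p,            # identity transform for the sketch
    shade_varying=lambda i: {"vertex": i},
)
print(len(shaded_pos), "positions shaded,", len(varyings), "varyings shaded")
```

In this model the unreferenced vertex costs nothing, and the culled triangle costs position shading only.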
17 Memory System
- Full coherency using ACE protocol
[Block diagram as on slide 13]
18 Memory System
- Full system coherency support
  - Supports tightly coupled CPU+GPU use cases
- L2 cache improvements
  - Single logical L2 cache makes software easier
  - Fewer partial lines written to AXI, which improves LPDDR4 performance
[Diagram: Cortex-A73 CPU; Mali-G71 GPU; CoreLink CCI-550; DMC-500; DRAM]
19 Bifrost Core Design
20 Execution Core Improvements
[Block diagram as on slide 13]
21 Bifrost Core Design
[Core block diagram: Compute Frontend; Fragment Frontend; Quad Creator (x2); Quad Manager; Quad Control; Execution Engine 0, 1, 2; Quad State (x3); Control Fabric; Load/store Unit; Attribute Unit; Varying Unit; Texture Unit; Blender & Tile Access; Depth & Stencil; ZS Memory; Tile Memory; Tile Writeback; links to the L2 memory system]
22 Quad Creation
[Core block diagram as on slide 21]
23 Quad Management
[Core block diagram as on slide 21]
24 Quad Execution
[Core block diagram as on slide 21]
25 Quad Vectorization
- Bifrost uses quad-parallel execution
  - Four scalar threads executed in lockstep in a quad
  - One quad at a time executes in each pipeline stage
  - Each thread fills one 32-bit lane of the hardware
  - 4 threads doing a vec3 FP32 add takes 3 cycles
  - Improves utilization
- Quad vectorization is compiler friendly
  - Each thread only sees a stream of scalar operations
  - Vector operations can always be split into scalars
[Figure: occupancy of Lane 0 to Lane 3 over Cycles 1 to 4, with per-thread components T0.x to T3.z and idle slots; the exact layout is not recoverable from the transcription.]
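The "3 cycles" claim can be checked with a small lane-occupancy model. The sketch below compares two ways of mapping four threads doing a vec3 add onto four 32-bit lanes. The SIMD mapping (one thread per cycle, components across lanes) is an assumption used for contrast; the quad mapping follows the slide's description. The function names are illustrative.

```python
LANES = 4  # 32-bit lanes per arithmetic unit, as on the slide


def simd_schedule(threads, width):
    """One thread per cycle; its vector components are spread across the lanes."""
    return [[f"T{t}.{'xyzw'[c]}" if c < width else "Idle" for c in range(LANES)]
            for t in range(threads)]


def quad_schedule(threads, width):
    """One component per cycle; each lane carries a different thread."""
    return [[f"T{t}.{'xyzw'[c]}" if t < threads else "Idle" for t in range(LANES)]
            for c in range(width)]


def utilization(schedule):
    """Fraction of lane slots that do useful work."""
    slots = [slot for cycle in schedule for slot in cycle]
    return sum(slot != "Idle" for slot in slots) / len(slots)


for name, schedule in (("SIMD", simd_schedule(4, 3)), ("quad", quad_schedule(4, 3))):
    print(f"{name}: {len(schedule)} cycles, {utilization(schedule):.0%} lane utilization")
    for cycle, lanes in enumerate(schedule, start=1):
        print(f"  cycle {cycle}: {' '.join(lanes)}")
```

Under these assumptions the SIMD mapping needs 4 cycles with one lane idle in each, while the quad mapping finishes in 3 cycles with every lane busy.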
26 Classic Instruction Execution
- Scheduling decision before every instruction
- Architecturally visible state guaranteed after every instruction
[Figure legend: Overhead, Instruction]
27 Clause Execution
- Back-to-back execution guaranteed within a clause
- Allows aggressive optimisation
[Figure legend: Overhead, Instruction]
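A minimal sketch of the difference between the two execution models, counting only scheduling decisions: classic execution makes one before every instruction, clause execution makes one per clause. The clause sizes below are hypothetical, and the model ignores everything else a real scheduler does.

```python
def scheduling_decisions(clause_sizes, per_clause=True):
    """Count scheduling decisions for a program split into clauses.

    clause_sizes lists the instruction count of each clause. Classic
    execution decides before every instruction; clause execution decides
    once per clause and then runs its instructions back to back.
    """
    return len(clause_sizes) if per_clause else sum(clause_sizes)


program = [4, 6, 2, 8]  # hypothetical clause sizes, 20 instructions in total
classic = scheduling_decisions(program, per_clause=False)
claused = scheduling_decisions(program, per_clause=True)
print(f"classic: {classic} decisions, claused: {claused} decisions")
```

The longer the clauses a compiler can form, the more the per-instruction overhead is amortized.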
28 Clause Execution
- A simple register-based instruction set
- Each instruction fetches arguments from the register file
- And writes results back to the register file
[Figure: register file R0 to R7 before and after ADD R2, R0, R1]
29 Clause Execution
- Register file access can be expensive
  - Register file is often the most-used part of the GPU
  - High bandwidth means high power consumption
  - Thread allocation keeps registers close to where they are used
[Figure: register file R0 to R7 around ADD R2, R0, R1 and ADD R4, R2, R3]
30 Clause Execution
- Back-to-back register access is common
  - The result from one instruction is often only used as input to the next
[Figure: register file R0 to R7 around the sequence ADD R2, R0, R1; ADD R4, R2, R3; ADD R0, R4, R5]
31 Clause Execution
- Back-to-back register access is common
  - Register file bypass saves power
  - Allows use of simpler, smaller register files
[Figure: the sequence ADD T, R0, R1; ADD T, T, R3; ADD R0, T, R5, with intermediate results held in temporary T alongside register file R0 to R7]
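To make the saving concrete, the sketch below counts register-file reads and writes for the two three-instruction sequences shown on the slides, treating any operand named T as a temporary that bypasses the register file. The counting function is an illustrative assumption, not a model of the real datapath.

```python
def register_file_traffic(program):
    """Count register-file reads and writes; operands named "T" bypass the file."""
    reads = writes = 0
    for line in program:
        _opcode, operands = line.split(maxsplit=1)
        dst, *srcs = [op.strip() for op in operands.split(",")]
        writes += dst != "T"
        reads += sum(src != "T" for src in srcs)
    return reads, writes


without_bypass = ["ADD R2, R0, R1", "ADD R4, R2, R3", "ADD R0, R4, R5"]
with_bypass = ["ADD T, R0, R1", "ADD T, T, R3", "ADD R0, T, R5"]

print(register_file_traffic(without_bypass))  # (6, 3)
print(register_file_traffic(with_bypass))     # (4, 1)
```

For this sequence the temporary removes two of six reads and two of three writes.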
32 Clause Scheduling
- Delay next clause if asynchronous data not ready
[Figure labels: TEX (texture unit operation); Unrelated?; Required data not ready?; Use result. Legend: Overhead, Instruction]
33 Clause Scheduling
- Another quad can use this execution unit
- High utilization, high efficiency
[Figure labels: TEX (texture unit operation); ?; Use result. Legend: Overhead, Quad 1, Quad 2]
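The effect of letting another quad use the execution unit can be illustrated with a small simulation. This is a sketch under stated assumptions: a single execution unit, a first-ready scheduling policy, and made-up clause lengths and texture latency. None of these numbers come from the slides.

```python
def run(quads):
    """Simulate one execution unit shared by several quads.

    Each quad is a list of (cycles, wait) clauses: the clause occupies the
    unit for `cycles`, then the quad's next clause may not start until `wait`
    further cycles have passed (standing in for a texture fetch).
    Returns (total_cycles, busy_cycles).
    """
    pending = [list(q) for q in quads]
    ready_at = [0] * len(quads)
    now = busy = 0
    while any(pending):
        # Pick the first quad that still has clauses and whose data is ready.
        runnable = [i for i, q in enumerate(pending) if q and ready_at[i] <= now]
        if not runnable:
            # Nothing can issue: the unit idles until the earliest data arrives.
            now = min(ready_at[i] for i, q in enumerate(pending) if q)
            continue
        i = runnable[0]
        cycles, wait = pending[i].pop(0)
        now += cycles
        busy += cycles
        ready_at[i] = now + wait
    return now, busy


# Hypothetical quad: a 4-cycle clause issuing TEX, a 10-cycle fetch, then a
# 4-cycle clause that uses the result.
quad = [(4, 10), (4, 0)]
for count in (1, 2):
    total, busy = run([quad] * count)
    print(f"{count} quad(s): {total} cycles, {busy / total:.0%} utilization")
```

With one quad the unit sits idle for the whole fetch; with two, the second quad's first clause fills part of that gap, so twice the work finishes in only a few more cycles.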
34 Bifrost Arithmetic Functional Units
- Executes quad-parallel scalar operations
  - 4x32-bit multiplier (FMA)
  - 4x32-bit adder (ADD)
  - Adder includes special function unit
- Smaller and more area efficient
- Simplified layout eases compilation
  - Better scheduling in today's code
  - Better utilization
  - One instruction word contains two instructions
[Figure: Main Regs Read; FMA; ADD/SF; Main Regs Write; Temp Registers]
35 Bifrost Arithmetic Functional Units
- Retains support for smaller width data types
  - Integers useful for deep learning
  - 2x performance for FP16, useful for pixel shaders
[Figure: data widths shown: 4 x int8 (8-bit integers), 2 x int16 (16-bit integers), 1 x int32 (32-bit integers), 2 x float16 (16-bit floating point), 1 x float32 (32-bit floating point); datapath: Main Regs Read, FMA, ADD/SF, Main Regs Write, Temp Registers]
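The data widths on this slide follow from how many values of each type fit in 32 bits. The sketch below uses Python's standard `struct` module to show the counts and to pack two FP16 values into four bytes. Relating this to the "2x performance for FP16" figure is an interpretation rather than something the slide states, and the helper names are illustrative.

```python
import struct

# struct format codes: b = int8, h = int16, i = int32, e = float16, f = float32
LANE_BYTES = 4  # one 32-bit lane


def values_per_lane(code):
    """How many values of the given struct type fit in one 32-bit lane."""
    return LANE_BYTES // struct.calcsize("<" + code)


def pack_lane(code, values):
    """Pack values into a single 32-bit lane, little-endian."""
    assert len(values) == values_per_lane(code)
    return struct.pack(f"<{len(values)}{code}", *values)


for name, code in (("int8", "b"), ("int16", "h"), ("int32", "i"),
                   ("float16", "e"), ("float32", "f")):
    print(f"{name}: {values_per_lane(code)} per lane")

lane = pack_lane("e", [1.5, -2.0])  # two FP16 values in one lane
print(lane.hex(), struct.unpack("<2e", lane))
```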
36 Load/Store Units Separated
[Core block diagram as on slide 21]
37 First Incarnation of Bifrost
38 Mali-G71
- Built on the innovative new Bifrost architecture
- Premium GPU delivering our highest performance ever
- ARM's most scalable GPU to date
39 Mali-G71: Efficiency Drives Performance
- 20% higher energy efficiency*
- 32 shader cores
- 40% better performance density*
- 20% bandwidth improvement*
- Optimized for next generation, advanced, real-world content
*Compared to Mali-T880, on same process node under the same conditions.
40 The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright 2016 ARM Limited
GPU Architecture. Michael Doggett ATI
GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super
Midgard GPU Architecture. October 2014
Midgard GPU Architecture October 2014 1 The Midgard Architecture HARDWARE EVOLUTION 2 3 Mali GPU Roadmap Mali-T760 High-Level Architecture Distributes tasks to shader cores Efficient mapping of geometry
Introduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
Performance Optimization and Debug Tools for mobile games with PlayCanvas
Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with
The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA
The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques
AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009
AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU
Low power GPUs a view from the industry. Edvard Sørgård
Low power GPUs a view from the industry Edvard Sørgård 1 ARM in Trondheim Graphics technology design centre From 2006 acquisition of Falanx Microsystems AS Origin of the ARM Mali GPUs Main activities today
Computer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
Next Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group
Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control
Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
Radeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
GPGPU Computing. Yong Cao
GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power
Optimizing AAA Games for Mobile Platforms
Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo
Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005
Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005 Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex
Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Modern GPU
A Scalable VISC Processor Platform for Modern Client and Cloud Workloads
A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background
FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015
FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic
Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software
GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas
Introduction to GPGPU. Tiziano Diamanti [email protected]
[email protected] Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group
ATI Radeon 4800 series Graphics Michael Doggett Graphics Architecture Group Graphics Product Group Graphics Processing Units ATI Radeon HD 4870 AMD Stream Computing Next Generation GPUs 2 Radeon 4800 series
Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture
Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture Ray Hwang, Segment Marketing Manager, ARM Jack Porter, Engine Development Lead, Epic Games Korea Hessed Choi, Senior Field Applications
GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Agenda Introduction The author and ARM Preliminary knowledge Unity Pro, OpenGL ES 3.0 Identify
Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university
Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook
big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices
big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices Brian Jeff November, 2013 Abstract ARM big.little processing
GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010
GPU Architecture An OpenCL Programmer s Introduction Lee Howes November 3, 2010 The aim of this webinar To provide a general background to modern GPU architectures To place the AMD GPU designs in context:
How To Teach Computer Graphics
Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction http://www.cs.vu.nl/~graphics/
GPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
GPU architecture II: Scheduling the graphics pipeline
GPU architecture II: Scheduling the graphics pipeline Mike Houston, AMD / Stanford Aaron Lefohn, Intel / University of Washington 1 Notes The front half of this talk is almost verbatim from: Keeping Many
Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture
Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture Arnon Peleg (Intel) Ben Ashbaugh (Intel) Dave Helmly (Adobe) Legal INFORMATION IN THIS DOCUMENT IS PROVIDED
How To Build An Engine 4 Mobile Graphics On Anarm V8-A (A64)
Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture Jesse Barker, Principal Software Engineer, ARM Marius Bjørge, Graphics Research Engineer, ARM Niklas Smedis Smedberg, Senior Engine Programmer,
Introduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
L20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
HPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
Hardware accelerated Virtualization in the ARM Cortex Processors
Hardware accelerated Virtualization in the ARM Cortex Processors John Goodacre Director, Program Management ARM Processor Division ARM Ltd. Cambridge UK 2nd November 2010 Sponsored by: & & New Capabilities
Xeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
OpenGL Performance Tuning
OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer
SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing
SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing Won-Jong Lee, Shi-Hwa Lee, Jae-Ho Nah *, Jin-Woo Kim *, Youngsam Shin, Jaedon Lee, Seok-Yoon Jung SAIT, SAMSUNG Electronics, Yonsei Univ. *,
NVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT
COMP27112 Computer Graphics and Image Processing 2: Introducing image synthesis [email protected] 1 Introduction In these notes we ll cover: Some orientation how did we get here? Graphics system
How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Brains Eden, 28 th June 2013 Agenda Introduction The author ARM Ltd. What do you need to have What do you need to know Identify
Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions
Module 4: Beyond Static Scalar Fields Dynamic Volume Computation and Visualization on the GPU Visualization and Computer Graphics Group University of California, Davis Overview Motivation and applications
White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE
White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE Table of Contents INTRODUCTION 2 COMPUTE UNIT OVERVIEW 3 CU FRONT-END 5 SCALAR EXECUTION AND CONTROL FLOW 5 VECTOR EXECUTION 6 VECTOR REGISTERS 6
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH [email protected] SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA GFLOPS 3500 3000 NVPRO-PIPELINE Peak Double Precision FLOPS GPU perf improved
This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo
Data Center and Cloud Computing Market Landscape and Challenges
Data Center and Cloud Computing Market Landscape and Challenges Manoj Roge, Director Wired & Data Center Solutions Xilinx Inc. #OpenPOWERSummit 1 Outline Data Center Trends Technology Challenges Solution
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
Parallel Web Programming
Parallel Web Programming Tobias Groß, Björn Meier Hardware/Software Co-Design, University of Erlangen-Nuremberg May 23, 2013 Outline WebGL OpenGL Rendering Pipeline Shader WebCL Motivation Development
Intel Data Direct I/O Technology (Intel DDIO): A Primer >
Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
Introduction to AMBA 4 ACE and big.little Processing Technology
Introduction to AMBA 4 and big.little Processing Technology Ashley Stevens Senior FAE, Fabric and Systems June 6th 2011 Updated July 29th 2013 Page 1 of 15 Why AMBA 4? The continual requirement for more
Writing Applications for the GPU Using the RapidMind Development Platform
Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...
Silverlight for Windows Embedded Graphics and Rendering Pipeline 1
Silverlight for Windows Embedded Graphics and Rendering Pipeline 1 Silverlight for Windows Embedded Graphics and Rendering Pipeline Windows Embedded Compact 7 Technical Article Writers: David Franklin,
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Introduction to GPU Architecture
Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three
Trends in HTML5. Matt Spencer UI & Browser Marketing Manager
Trends in HTML5 Matt Spencer UI & Browser Marketing Manager 6 Where to focus? Chrome is the worlds leading browser - by a large margin 7 Chrome or Chromium, what s the difference Chromium is an open source
IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus
Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
OC By Arsene Fansi T. POLIMI 2008 1
IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5
Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
big.little Technology: The Future of Mobile Making very high performance available in a mobile envelope without sacrificing energy efficiency
big.little Technology: The Future of Mobile Making very high performance available in a mobile envelope without sacrificing energy efficiency Introduction With the evolution from the first mobile phones
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single
GPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg
Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing
