Xbox 360 System Architecture. Jeff Andrews Nick Baker Xbox Semiconductor Technology Group

Similar documents
GPU Architecture. Michael Doggett ATI

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon HD 2900 and Geometry Generation. Michael Doggett

game console product.

Computer Graphics Hardware An Overview

Console Architecture. By: Peter Hood & Adelia Wong

NVIDIA GeForce GTX 580 GPU Datasheet

AMD GPU Architecture. OpenCL Tutorial, PPAM Dominik Behr September 13th, 2009

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

Introduction to GPGPU. Tiziano Diamanti

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

GPGPU Computing. Yong Cao

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

Next Generation GPU Architecture Code-named Fermi

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to Cloud Computing

Touchstone -A Fresh Approach to Multimedia for the PC

SAPPHIRE HD GB GDDR5 PCIE.

Msystems Ltd. SAPPHIRE HD GB GDDR5 PCIE

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

Introduction to GPU Programming Languages

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

The Orca Chip... Heart of IBM s RISC System/6000 Value Servers

Intel Pentium 4 Processor on 90nm Technology

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

NVIDIA GeForce GTX 750 Ti

"JAGUAR AMD s Next Generation Low Power x86 Core. Jeff Rupley, AMD Fellow Chief Architect / Jaguar Core August 28, 2012

Brainlab Node TM Technical Specifications

ARM Processors for Computer-On-Modules. Christian Eder Marketing Manager congatec AG

Optimizing AAA Games for Mobile Platforms

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

White Paper. Intel Sandy Bridge Brings Many Benefits to the PC/104 Form Factor

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Multi-Threading Performance on Commodity Multi-Core Processors

Architecture of Hitachi SR-8000

LSI SAS inside 60% of servers. 21 million LSI SAS & MegaRAID solutions shipped over last 3 years. 9 out of 10 top server vendors use MegaRAID

Getting Started with RemoteFX in Windows Embedded Compact 7

OC By Arsene Fansi T. POLIMI

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

Intel Xeon Processor E5-2600

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

Parallel Algorithm Engineering

GPU Parallel Computing Architecture and CUDA Programming Model

CFD Implementation with In-Socket FPGA Accelerators

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

760 Veterans Circle, Warminster, PA Technical Proposal. Submitted by: ACT/Technico 760 Veterans Circle Warminster, PA

Why You Need the EVGA e-geforce 6800 GS

Dynamic Resolution Rendering

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

CHAPTER 7: The CPU and Memory

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Generations of the computer. processors.

Chapter 1 Computer System Overview

Computer Architecture TDTS10

The Transition to PCI Express* for Client SSDs

Chapter 13. PIC Family Microcontroller

System requirements for Autodesk Building Design Suite 2017

Chapter 2 Parallel Architecture, Software And Performance

PikeOS: Multi-Core RTOS for IMA. Dr. Sergey Tverdyshev SYSGO AG , Moscow

Standardization with ARM on COM Qseven. Zeljko Loncaric, Marketing engineer congatec

White Paper AMD GRAPHICS CORES NEXT (GCN) ARCHITECTURE

Overview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

SiS AMD Athlon TM 64FX/PCI-E Solution. Silicon Integrated Systems Corp. Integrated Product Division April. 2004

Going Linux on Massive Multicore

Mainframe. Large Computing Systems. Supercomputer Systems. Mainframe

How To Build An Ark Processor With An Nvidia Gpu And An African Processor

Autodesk Revit 2016 Product Line System Requirements and Recommendations

Parallel Computing. Benson Muite. benson.

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

IP Video Rendering Basics

A+ Guide to Managing and Maintaining Your PC, 7e. Chapter 1 Introducing Hardware


<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Nutaq. PicoDigitizer 125-Series 16 or 32 Channels, 125 MSPS, FPGA-Based DAQ Solution PRODUCT SHEET. nutaq.com MONTREAL QUEBEC

Programming Techniques for Supercomputers: Multicore processors. There is no way back Modern multi-/manycore chips Basic Compute Node Architecture

ICRI-CI Retreat Architecture track

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

QUESTIONS & ANSWERS. ItB tender 72-09: IT Equipment. Elections Project

CORRIGENDUM TO TENDER FOR HIGH PERFORMANCE SERVER

NVIDIA workstation 3D graphics card upgrade options deliver productivity improvements and superior image quality

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

Low power GPUs a view from the industry. Edvard Sørgård

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Transcription:

Xbox 360 System Architecture Jeff Andrews Nick Baker Xbox Semiconductor Technology Group

Hot Chips Presentation Hardware Specs Architectural Choices Programming Environment QA Hot Chips 17 2

Overview Design Principles Next generation gaming Flexibility Programmability Optimized for achievable performance Hot Chips 17 3

Hardware Designed for Games Triple-core, 3.2 GHz custom CPU Shared 1MB L2 cache Customized vector floating point unit per core 5.4Gbps FSB: 10.8 GB/sec read and 10.8 GB/sec write GPU can read from L2 500 MHz custom GPU 48 parallel unified shaders 10 MB embedded DRAM for frame buffer: 256 GB/sec 512 MB unified memory 700Mhz GDDR3: 22.4 GB/sec 12X dual-layer DVD 20 GB hard drive High Definition video out Hot Chips 17 4

System Block Diagram CPU Memory 512 MB DRAM Core0 Core1 Core2 L1D L1I MC0 MC1 L1D L1I 1MB L2 GPU 10MB EDRAM L1D BIU/IO Intf 3D Core L1I Video Out I/O Chip XMA Decoder SMC Analog Chip DVD (SATA) HDD port (SATA) Front controllers (2 USB) Wireless controllers MU ports (2 USB) Rear Panel USB Ethernet IR Audio Out FLASH System control Video Out Hot Chips 17 5

CPU Chip/PPC Core Specs Three 3.2 GHz PowerPC cores Shared 1MB L2 cache, 8-way set associative Per-Core Features 2-issue per cycle, in-order, decoupled Vector/Scalar issue queue 2 symmetric fine grain hardware threads L1 Caches: 2-way I$ / 4-way D$ Execution pipelines Branch Unit, Integer Unit, Load/Store Unit VMX128 Units: Floating Point Unit, Permute Unit, Simple Unit Scalar FPU VMX128 enhanced for game and graphics workloads All execution units 4-way SIMD 128 128-bit vector registers per thread Custom dot-product instruction Native D3D compressed data formats Hot Chips 17 6

CPU Diagram Core 2 Core 1 L1I Instruction Unit Core 0 L1I Instruction Unit L1I Branch VIQ Instruction Unit Branch VIQ L1D Int Ld/St Branch VIQ L1D Int Ld/St VMX VMX VMX VMX VMX VMX FP Perm Simp MMU VMX VMX VMX FP Perm Simp FPU MMU FP Perm Simp FPU VSU VSU VSU Int FPU Ld/St L1D MMU L2 Node Crossbar / Queuing PIC Test Debug Clocks Temp Sensor Uncached Uncached Uncached Unit0 Unit1 Unit2 L2 Dir L2 Dir Bus Interface L2 Data Front Side Bus (FSB) Hot Chips 17 7

CPU Data Streaming Specs High bandwidth data streaming support with minimal cache thrashing 128B cache line size (all caches) Flexible set locking in L2 Write streaming: L1s are write through, writes do not allocate in L1 4 uncacheable write gathering buffers per core 8 cacheable, non-sequential write gathering buffers per core Read streaming: xdcbt data prefetch around L2, directly into L1 8 outstanding load/prefetches per core Tight GPU data streaming integration (XPS) XPS Xbox Procedural Synthesis GPU 128B read from L2 GPU low latency cacheable writebacks to CPU GPU shares D3D compressed data formats with CPU => at least 2x effective bus bandwidth for typical graphics data Hot Chips 17 8

CPU Cached Data Streaming Example Core 2 Core 1 L1I Instruction Unit Core 0 L1I Instruction Unit L1I Branch VIQ L1D Instruction Unit Int Ld/St Branch VIQ L1D Int Ld/St Branch VIQ L1D Int Ld/St VMX VMX VMX MMU VMX VMX VMX FP D3D Perm Compressed Simp FPU Data, MMU VMX VMX VMX FP Perm Simp FPU VMX128 Stores to L2 MMU FP Perm Simp FPU VSU xdcbt 128B Prefetch around L2, into L1 D$ L2 PIC Test Debug Clocks Temp Sensor VSU Uncached Uncached Uncached Unit0 Unit1 Unit2 L2 Dir VSU Node Crossbar / Queuing L2 Dir Bus Interface L2 Data Front Side Bus (FSB) Non-sequential Gathering, Locked Set in L2 GPU 128B Read from L2 From Mem To GPU Hot Chips 17 9

GPU Specs 500 MHz graphics processor 48 parallel shader cores (ALUs); dynamically scheduled; 32bit IEEE FLP 24 billion shader instructions per second Superscalar design: vector, scalar and texture ops per instruction Pixel fillrate: 4 billion pixels/sec (8 per cycle); 2x for depth / stencil only AA: 16 billion samples/sec; 2x for depth / stencil only Geometry rate: 500 million triangles/sec Texture rate: 8 billion bilinear filtered samples / sec 10 MB EDRAM 256 GB/s fill Direct3D 9.0-compatible High-Level Shader Language (HLSL) 3.0+ support Custom features Memory export: Particle physics, Subdivision surfaces Tiling acceleration: Full resolution Hi-Z, Predicated Primitives XPS: CPU cores can be slaved to GPU processing GPU reads geometry data directly from L2 Hardware scaling for display Hot resolution Chips 17 matching 10

GPU Block Diagram FSB Command proc Main Die BIU Vtx Cache Txt Cache Hi-Z Vtx assy / tesselator Sequencer Interpolators Shader complex Shader export Blending i/f Mem 0 Mem1 MC0 MC1 Mem I/F IO Gfx Display PCI-E Control Bus IO AA+AZ 10MB EDRAM DRAM Die Video Hot Chips 17 11

Architectural Choices - SMP Floating point and integer important for games Power consumption Mainstream parallel technique Keep easy to balance Solution: Limited SMP using simplified yet powerful cores Tightly coupled to vector floating point Hot Chips 17 12

Architectural Choices - EDRAM FSAA, alpha and z place heavy load on memory BW Post-process effects require large depth complexity Enable flexible UMA solution Main memory FB/ZB unpredictable performance Many different rendering styles in use, bottlenecks move Solution: Take FB/ZB fill-rate out of the equation Hot Chips 17 13

Software SMP/SMT Mainstream techniques Everything is simplified by being symmetric UMA No partitioning headaches OS All 3 cores available for game developers Standard APIs Win32, OpenMP Direct3D, HLSL Assembly (CPU & Shader) supported - direct hardware access Standard tools XNA: PIX, XACT Visual C++, works with multiple threads Hot Chips 17 14

Software Multi Thread Hot Chips 17 15

The Xbox 360 Platform The Xbox 360 platform delivers breakthrough gaming and entertainment experiences. To ignite the next generation of games and entertainment, we re putting the most powerful next generation platform into the hands of the world s greatest game creators High performance hardware Elegant software Innovative services Xbox 360 was designed from the ground up, specifically to deliver the best console gaming experience Hot Chips 17 16

Summary Designed for next generation gaming Flexible and Programmable Optimized for achievable performance 2005 Microsoft Corporation. Microsoft, Xbox 360, Xbox, XNA, Visual C++, Windows, Win32, Direct3D, and the Xbox 360 logo and Visual Studio logo are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. IBM, PowerPC, and VMX are trademarks of International Business Machines Corporation in the United States, or other countries, or both. IEEE is a registered trademark in the United States, owned by the Institute of Electrical and Electronics Engineers. OpenMP is a trademark of the OpenMP Architecture Review Board. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. Hot Chips 17 17