Touchstone -A Fresh Approach to Multimedia for the PC



Similar documents
Computer Graphics Hardware An Overview

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Recent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPU Architecture. Michael Doggett ATI

Radeon HD 2900 and Geometry Generation. Michael Doggett

Image Synthesis. Transparency. computer graphics & visualization

Introduction to Computer Graphics

NVIDIA GeForce GTX 580 GPU Datasheet

Lecture Notes, CEng 477

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Introduction to GPGPU. Tiziano Diamanti

Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPUs Under the Hood. Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology

INTRODUCTION TO RENDERING TECHNIQUES

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Optimizing AAA Games for Mobile Platforms

Advanced Visual Effects with Direct3D

Optimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

ACE: After Effects CC

Real-Time Graphics Architecture

Dynamic Resolution Rendering

Intel 810 and 815 Chipset Family Dynamic Video Memory Technology

Intel Graphics Media Accelerator 900

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Low power GPUs a view from the industry. Edvard Sørgård

Getting Started with RemoteFX in Windows Embedded Compact 7

Software Virtual Textures

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

ACE: After Effects CS6

Graphical displays are generally of two types: vector displays and raster displays. Vector displays

Remote Desktop Protocol Performance

How To Understand The Power Of Unity 3D (Pro) And The Power Behind It (Pro/Pro)

Configuring Memory on the HP Business Desktop dx5150

Note monitors controlled by analog signals CRT monitors are controlled by analog voltage. i. e. the level of analog signal delivered through the

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

VA (Video Acceleration) API. Jonathan Bian 2009 Linux Plumbers Conference

Advanced Rendering for Engineering & Styling

The Future Of Animation Is Games

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

A Short Introduction to Computer Graphics

A Crash Course on Programmable Graphics Hardware

GPGPU Computing. Yong Cao

Intel s Next Generation Integrated Graphics Architecture Intel Graphics Media Accelerator X3000 and 3000

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

B2.53-R3: COMPUTER GRAPHICS. NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions.

How To Make A Texture Map Work Better On A Computer Graphics Card (Or Mac)

Solomon Systech Image Processor for Car Entertainment Application

Motivation: Smartphone Market

BUILDING TELEPRESENCE SYSTEMS: Translating Science Fiction Ideas into Reality

Midgard GPU Architecture. October 2014

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-17: Memory organisation, and types of memory

Computer Architecture

High-speed image processing algorithms using MMX hardware

Lezione 4: Grafica 3D*(II)

Using Photorealistic RenderMan for High-Quality Direct Volume Rendering

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

MMGD0203 Multimedia Design MMGD0203 MULTIMEDIA DESIGN. Chapter 3 Graphics and Animations

Visualization and Feature Extraction, FLOW Spring School 2016 Prof. Dr. Tino Weinkauf. Flow Visualization. Image-Based Methods (integration-based)

ADVANTAGES OF AV OVER IP. EMCORE Corporation

Introduction to Game Programming. Steven Osman

Workstation Applications for Windows. NVIDIA MAXtreme User s Guide

How To Teach Computer Graphics

Adding Animation With Cinema 4D XL

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur

WinFast 3D S320V. User s Manual. Leadtek Research Inc.

Video Coding Standards. Yao Wang Polytechnic University, Brooklyn, NY11201

OpenGL Performance Tuning

Large Scale Data Visualization and Rendering: Scalable Rendering

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

3D Math Overview and 3D Graphics Foundations

White Paper Video Surveillance Implementation Using FPGAs

An Embedded Based Web Server Using ARM 9 with SMS Alert System

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

COMP175: Computer Graphics. Lecture 1 Introduction and Display Technologies

For Articulation Purpose Only

SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

BCC Multi Stripe Wipe

IT Fundamentals of Multimedia (Optional)

COMPACT DISK STANDARDS & SPECIFICATIONS

VIRGINIA WESTERN COMMUNITY COLLEGE

3D Computer Games History and Technology

Video display interfaces

AMD Opteron Quad-Core

Modern Graphics Engine Design. Sim Dietrich NVIDIA Corporation

EDIUS 7 EDIT ANYTHING

Classes of multimedia Applications

Car Racing Game. Figure 1 The Car Racing Game

Water Flow in. Alex Vlachos, Valve July 28, 2010

Networking Remote-Controlled Moving Image Monitoring System

Network administrators must be aware that delay exists, and then design their network to bring end-to-end delay within acceptable limits.

Computer Graphics. Anders Hast

Primary Image Ltd. Issue 1.1

Architecture Guide. VideoCore IV 3D. VideoCore IV 3D Architecture Reference Guide

Performance Optimization and Debug Tools for mobile games with PlayCanvas

9! Multimedia Content! Production and Management

Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.

Introduction Computer stuff Pixels Line Drawing. Video Game World 2D 3D Puzzle Characters Camera Time steps

Transcription:

Touchstone -A Fresh Approach to Multimedia for the PC Emmett Kilgariff Martin Randall Silicon Engineering, Inc

Presentation Outline Touchstone Background Chipset Overview Sprite Chip Tiler Chip Compressed Textures Advanced Rendering Features

Touchstone Background First Implementation of Features in the Talisman Architecture Integrated 3D/multimedia chipset for the PC Joint development project: Cirrus Logic Fujitsu Microelectronics Microsoft Corporation Samsung Semiconductor Silicon Engineering

Touchstone Goals Create a new media solution which addresses key limitations in the PC architecture Memory bandwidth limitations for 3D graphics Advanced 3D rendering features Integration of 2D, 3D, audio and video Enable programmable media accelerators Optimized for DirectX

Product Differentiation Integrated Audio, Video, 2D and 3D graphics Simplifies content development & Lowers overall system cost 1024x768 24-bit RGB at 72Hz versus 640x480 16-bit RGB at 30Hz Sharper images without banding 72Hz frame rate greatly improves immersive experience Optimized for guaranteed frame rate Effective polygon rate of 2M triangles/sec Allows for more complex environments Advanced rendering features Provides higher-quality imaging and special effects

Chipset Overview Tiler & Sprite Chips: Silicon Engineering/Cirrus Logic Memory Controller 3D Rendering Pipeline Performs GSprite transformation and controls Compositor Media Signal Processor (MSP) - Samsung Semiconductor Real-Time Kernel, Graphics Control, Geometry, MPEG, Audio Compositing DAC: Fujitsu Microelectronics Compositing Buffer for GSprites into display image RAMDAC

3D Graphics Existing architectures evolved from Mechanical CAD Animation achieved through brute force Re-render entire screen at frame rates How can we use silicon more efficiently?

Chipset Overview 2Mx8 RDRAM 2Mx8 RDRAM Touchstone VLSI Components Standard Components Commodity DRAM Memory R PCI Media DSP Polygon Object Processor Image Layer Compositor Compositing DAC G B Video (Tiler) (Sprite) Audio Chip 2 Channel Audio Modem

Key Concepts: DirectDraw Surfaces Affine Tranform and Composite Surfaces in Display Pipeline. Exploits Temporal Coherency of 3D apps, re-use surfaces Compressed Textures and Surfaces Amplifies Memory Capacity and Bandwidth by 10x. Chunking Render 32x32 tiles of a surface at a time No Global Z-buffer, or Anti-aliasing buffers needed.

Rendering Pipeline MSP performs Geometry Engine operations Transform, Light, Clip, Setup Separate triangles into bins, depending on viewable area Tiler renders DirectDraw Surfaces, one chunk at a time Display list input - pointers to bins needed for each chunk Renders primitives into chunk-sized buffer (32x32) Chunk is compressed and stored in memory

Display Pipeline MSP creates a surface display list, in front-to-back order Compositing-DAC signals start of display Display is composited one band at a time (1344x32) Sprite Chip fetches surfces, performs affine transform Compositing buffer receives pixels from Sprite Chip, composites one band at a time, then displays completed bands

Advanced Feature: Anisotropic Texture Filtering Provides sharper textures (without temporal artifacts) especially at high angles of perspective

Sprite Chip - Block Diagram Initial Evaluation Image Layer Header Registers Pre-Rasterizer Image Layer Queue Image Layer Read Queue Image Layer Cache Control Cache Address Map Rasterizer Tiler Interface Control Compressed Image layer Cache Decompress Image Layer Cache Image Layer Filter Engine Compositing Buffer Controller To Comp. Buffer

Sprite Chip - Functional Overview Fetches DirectDraw Surfaces from memory at display time Performs affine transforms on DirectDraw Surfaces Filtering Modes: Point Sampled, Bilinear, Bicubic Composites one display band at a time, rendering front to back and writing results to Compositing Buffer Composites Pixels at 320Mpixels/sec (1.28GBytes/sec)

Tiler Chip - Block Diagram Initial Evaluation Primitive Register Pre-Rasterizer Pixel Queue Primitive Queue Texture Read Queue Texture Cache Control Cache Address Map Rasterizer Media DSP Command and Memory Control Compressed Texture Cache Decompress Texture Cache Texture Filter Engine RAMBUS Channels Depth/Stencil/Priority Buffer Compress Fragment Resolve Fragment Buffer Color Buffers Pixel Engine

Tiler Chip - Functional Overview Chunk-based rendering pipe Display List Processor Clip triangle to Chunk, and calculate starting point s attributes Interpolate coordinates, colors and texture coefficients - Calculate coverage mask Calculate texture coordinates, lookup samples and filter Per-Pixel operations: Z-buffering, Anti-aliasing, Transparency, stencil, etc After entire chunk is rendered, it is compressed and written to memory

Compressed Textures: Challenges Texture mapping accesses data randomly Most compression methods need to read most of image to decompress one pixel How do you find the random data in a compressed image? Long Latency Between Address Generation and Uncompressed Data Avaliable. Uncompressed 3-D address Needs to be mapped to linear, compressed address Data Fetch Data Decompress

Challenge 1: Not Fetching Most of Image. Solution: Use Jpeg-like 8x8 compressed blocks. Key difference: DC components not differentially encoded. Cache 8x8 blocks uncompressed for re-use.

Challenge 2: Random Access. Solution: List of pointers to Compressed blocks kept in linear Memory Compressed data stored in Linked-list memory: Simplifies memory allocation. Caching of sections of the pointer list, and linked - list elements (128-byte MAUs) in compressed cache increases effective cache size by 10x.

Challenge3: High Latency. Solution 1 (Sprite Chip): Two Rasterizers. Surface edge equations are fed into Pre-Rasterizer and FIFO to Post-rasterizer. Pre-Rasterizer calculates addresses, forcing compressed cache hits/misses early. Post Rasterizer calculates same address sequance later, forcing uncompressed cache hits/misses. Time beween Pre-and post rasterization used to fetch and uncompress blocks.

Challenge3: High Latency. Solution 2: (Tiler Chip): One Rasterizer, Big texel address FIFO. Rasterizer calculates addresses, forcing compressed cache hits/misses early. Rasterizer places addresses in Pixel FIFO Addresses are applied to the uncompressed cache later, after blocks have been fetched and uncompressed.

Advanced Rendering Features: Anisotropic Texture Filtering Shadows Transparency Anti-Aliasing Motion Blur Depth-of-Field Effects

Advanced Features: Motion Blur and Depth-of-Field Render Object in separate GSprite, shrink in one or both dimensions. Expand to normal size during display.

Functionality and Performance Single PCI board with audio, video, 2D and 3D graphics High-resolution display 1344x1024 @ 75Hz 24-bit true color for all resolutions Rendering rate of 40M pixels/sec with anisotropic filtering and Z-buffered anti-aliasing Image compositing at 320M pixels/sec Equivalent to 2M polygons/sec Advanced rendering features - subpixel-filtered antialiasing, translucent surfaces, shadows, blur, fog, etc

Conclusions >120x overall performance improvement: >4x performance improvement through Spriting 10x memory capacity and bandwidth improvement through Compression 3x memory bandwidth improvement through Chunking Allows RE2-level performance, image quality, features at PC price points