Accelerated Cloud Graphics. Franck DIARD, Ph.D. Chief SW Architect, NVIDIA

Similar documents
Cloud Gaming & Application Delivery with NVIDIA GRID Technologies. Franck DIARD, Ph.D. GRID Architect, NVIDIA

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

AGENDA. Overview GPU Video Encoding NVIDIA Video Encoding Capabilities. Software API Performance & Quality. Kepler vs Maxwell GPU capabilities Roadmap

HIGH PERFORMANCE VIDEO ENCODING WITH NVIDIA GPUS

HIGH-PERFORMANCE GPU VIDEO ENCODING ABHIJIT PATAIT SR. MANAGER, NVIDIA

NVIDIA VIDEO ENCODER 5.0

GRID SDK FAQ. PG January Frequently Asked Questions

S Hands on Tutorial: Deploying GRID in Citrix and VMware Virtual Desktop Environments

REMOTE HIGH FIDELITY VISUALIZATION. May 2015 Jeremy Main, Sr. Solution Architect GRID

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

VIRTU Universal MVP Installation Guide

S5445 BUILDING THE BEST USER EXPERIENCE WITH CITRIX XENAPP & NVIDIA GRID THOMAS POPPELGAARD

HP Workstations graphics card options

PNY Professional Solutions NVIDIA GRID - GPU Acceleration for the Cloud

PLANNING FOR DENSITY AND PERFORMANCE IN VDI WITH NVIDIA GRID JASON SOUTHERN SENIOR SOLUTIONS ARCHITECT FOR NVIDIA GRID

Whitepaper. NVIDIA Miracast Wireless Display Architecture

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Technical Specifications

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

VMware and NVIDIA: Bringing Workstations to the cloud

QuickSpecs. NVIDIA Quadro K5200 8GB Graphics INTRODUCTION. NVIDIA Quadro K5200 8GB Graphics. Overview. NVIDIA Quadro K5200 8GB Graphics J3G90AA

HDX 3D Version 1.0 Release Notes

HP Workstations graphics card options

QuickSpecs. NVIDIA Quadro K4200 4GB Graphics INTRODUCTION. NVIDIA Quadro K4200 4GB Graphics. Overview

Low power GPUs a view from the industry. Edvard Sørgård

NVIDIA GeForce GTX 580 GPU Datasheet

Desktop Consolidation. Stéphane Verdy, CTO Devon IT

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

Catalyst Software Suite Version 9.2 Release Notes

Get the Best out of NVIDIA GPUs for 3D Design and Engineering in the Cloud

Intel Graphics Virtualization Technology Update. Zhi Wang,

Getting Started with RemoteFX in Windows Embedded Compact 7

GTC EXPRESS WEBINAR: GETTING THE MOST OUT OF VMWARE HORIZON VIEW VDGA WITH NVIDIA GRID. Steve Harpster - Solution Architect NALA

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Data Sheet. Desktop ESPRIMO. General

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

4Kp60 H.265/HEVC Glass-to-Glass Real-Time Encoder Reference Design

Example of Standard API

Desktop Virtualization. The back-end

Virtualization for Cloud Computing

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Android Virtualization from Sierraware. Simply Secure

Autodesk Revit 2016 Product Line System Requirements and Recommendations

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB _v01

Stream Processing on GPUs Using Distributed Multimedia Middleware

Brainlab Node TM Technical Specifications

GPU Architecture. Michael Doggett ATI

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Data Centers and Cloud Computing

TEGRA X1 DEVELOPER TOOLS SEBASTIEN DOMINE, SR. DIRECTOR SW ENGINEERING

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

Datacenter Operating Systems

Shader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group

Media Cloud Based on Intel Graphics Virtualization Technology (Intel GVT-g) and OpenStack *

Several tips on how to choose a suitable computer

HP Z Workstations graphics card options

NVIDIA GeForce GTX 750 Ti

An Efficient Application Virtualization Mechanism using Separated Software Execution System

Fibre Channel over Ethernet in the Data Center: An Introduction

Virtualization: Hypervisors for Embedded and Safe Systems. Hanspeter Vogel Triadem Solutions AG

Vmware Horizon View with Rich Media, Unified Communications and 3D Graphics

I/O Virtualization Using Mellanox InfiniBand And Channel I/O Virtualization (CIOV) Technology

Application. EDIUS and Intel s Sandy Bridge Technology

Enabling Technologies for Distributed Computing

SierraVMI Sizing Guide

Enabling Technologies for Distributed and Cloud Computing

Overview of CS 282 & Android

Using Mobile Processors for Cost Effective Live Video Streaming to the Internet

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Computer Graphics Hardware An Overview

Finding Performance and Power Issues on Android Systems. By Eric W Moore

COMPUTING. SharpStreamer Platform. 1U Video Transcode Acceleration Appliance

GPU Accelerated XenDesktop 3D Graphics beyond Designers and Engineers

Amazon EC2 Product Details Page 1 of 5

IP Video Rendering Basics

Pedraforca: ARM + GPU prototype

AMD Radeon HD 8000M Series GPU Specifications AMD Radeon HD 8870M Series GPU Feature Summary

GOLD20TH-GTX980-P-4GD5

QuickSpecs. NVIDIA Quadro K1200 4GB Graphics INTRODUCTION PERFORMANCE AND FEATURES. Overview

Several tips on how to choose a suitable computer

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Regional SEE-GRID-SCI Training for Site Administrators Institute of Physics Belgrade March 5-6, 2009

E-Guide VIRTUALIZED GPUS TO IMPROVE VDI PERFORMANCE

Logitech ConferenceCam CC3000e. Best Practices for use with Software Clients. UC for Real People

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Red Hat enterprise virtualization 3.0 feature comparison

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Router Architectures

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Solomon Systech Image Processor for Car Entertainment Application

QULU VMS AND SERVERS Elegantly simple, Ultimately scalable

QuickSpecs. NVIDIA Quadro K2200 4GB Graphics INTRODUCTION. NVIDIA Quadro K2200 4GB Graphics. Technical Specifications

MED 0115 Optimizing Citrix Presentation Server with VMware ESX Server

Transcription:

Accelerated Cloud Graphics Franck DIARD, Ph.D. Chief SW Architect, NVIDIA

[GRID] truly appears to be prepared to change the way we play games from top to bottom. the GRID could be the long-awaited answer to cloud gaming. Convenience wins. Putting movies in the cloud took sales revenue from $50B (packaged DVDs) to more than $250B (on demand). Cloud gaming is next.

Scope System Architectures Kinds of GPU virtualization HW: potential and limits SW: SDK, techniques Streaming New content / Apps

GRID Server Market All cloud gaming platforms from our partners G-cluster, Playcast, Ubitus, Agawi, CyberCloud, Cloud Union Desktop/App remoting for VCA (Visual Computing Appliance) New cloud game engines Video GRID servers vstb: Live TV, PVR, VOD Transcoding farms

GRID Cloud Gaming Ecosystem Partner middleware They deliver full solution with content, writing their middleware They buy simple blank gaming servers They use our Monterey SDK to write their SW stack They fit our HW with their goals, QOS and content with our SDK Partners Middleware NVIDIA HW Servers Monterey Server Side SDK

Comparison

Cloud Gaming CLIENT SERVER Render Kybd/Mse Decode IP Network NIC CPU Encode Render Capture GRID Client <16ms Network 30 ms 2 Frames GRID 30 ms 2 Frames

GRID Gaming What is Cloud gaming? AAA games? Casual games? What is the experience? What is the QOS? That is a lot of things, the spectrum is VERY large What is the right architecture to do all of it? Well, it depends

Some Background on Platforms

SW Architecture Monterey SDK NVENC Low latency Encoder Low Latency Frame Buffer Capture NvFBC Low Latency Render Target Capture NvIFR NVMOS Platform Virtualization Dedicated GPU Game Windows 7+ Game Game Game Windows 7+ Game Windows 7+ Game Ad-hoc API Shimming DirectX GPU GPU GPU GPU

Architectures: Bare Metal OS No virtualization per se Windows GPU GPU GPU GPU Dx/OGL API shimming can be used

Multiple DX/OpenGL Contexts Windows GPU GPU GPU GPU

Bare Metal OS/Shimming Advantages Disadvantages One OS, multiple game instances Load balancing is dynamic depending on content Licencing cost (one OS licence) High CCU capability for casual gaming Require game modification for input, frame output encoding Today limited by CPU encoding on previous middleware partners solution

Architectures: XEN NMOS Windows Windows Linux Windows GPU GPU GPU GPU VM VM VM VM XEN The platform is virtualized, not the GPUs

XEN / Hypervisor for Gaming Advantages Complete separate OSes, per direct attach GPU Security Deployment Basically, a real desktop complete high performance of DirectX, OpenGL Each game can go full screen Problems CPU tax, performance Needs more resources Not well suited for high CCUs License costs

GRID HW

Kepler, first Cloud GPU High performance per watt Integrated hardware encoder Low-latency frame buffer reads 28nm

GRID Boards K340 4 GK107 (GT650) Only 8x Gen3 between PLX and GPU 1GB per GPU Low 3D perf, but more encoders K520 (0.8 GTX680) 2 GK104 4GB per GPU Lot more 3D perf, but less encoding For Dx gaming, GeForce driver is king GRID GPUs need best perf Profiles, optimizations

Systems

HW: Systems Requirements Pack many GPUs 2-4 slots VT-D Intel for direct attach/pass-though HW requirements Cooling Rack space Power SuperMicro

GRID Gaming Systems: SuperMicro 2027 2U 3 or 4 GPU boards 2 Xeons E5 Sandy Bridge Full lanes of PCI 3.0 Supports VT-d well 6-8 GK104 12-16 GK107 Some other form factors VCA (TYAN, 4U) Balance of power, rack space, cooling

1x Chipset 2P, 16 cores

Monterey SDK: Server Side Capture and Stream

Monterey SDK Core components NvFBC: whole display grab: 1 OS = 1 user = 1 stream NvIFR: render buffer grab: 1 OS = n users = n streams Doc Header files, 2 dlls To system memory CSC conversion Partners running CPU encode Bus contention To H264 HW encoder CSC

NvFBC / NvIFR Frame Buffer Capture (NvFBC) Simple onboarding, low CCU A display remote display 1 OS = 1 user = 1 GPU = 1 virtual display = 1 stream Orthogonal to the rest of everything that puts pixels on the screen Supports multiple heads Inband Frame Readback (NvIFR) Complex onboarding, high CCU for small games Injected into apps API calls or modified apps 1 OS = n users = n apps = n streams Can use several GPUs mapped into OS with Dx/OpenGL affinity

NvFBC Overlay, mouse, GDI, Dx, OGL, video 0 CPU cycles Compressor process comes async to grab content Negative latency: NvFBC grabs the display back buffer If Window Manager runs VSYNC d, NvFBC grabs bits before they make it to the local screen (if any)

NvFBC Usage case VCA, VDI solutions, high-end gaming One VM on NMOS has a full OS and will run only one game/desktop per VM So it can go full screen and take ownership of the single display attached to VM Game s behavior is not modified, at all Only one game can run DX 3D DX Windows NVEnc CSC Shader 3D NvFBC A second background process uses NvFBC to capture what is on the screen, gets back the H264 bits and streams it out Complete asynchronous, pure GPU/DMA Game process Streamer process

NvIFR SDK to use with API shimming Render target read back: Dx9, Dx10, Dx11 (OGL in testing) Format conversion, scaling In band with GFX API: Present() call Page locked sysmem H.264 interface Asynchronous Event Signaling, CPU friendly, interrupt driven Not blocking main render loop Texture sharing extension for DX9 Linux Support

NvIFR Usage case Game is injected/hooked DX device creation is forced offscreen, window hidden n smaller games can run at once, generating n streams DX Shim NvIFR Game process CSC NVEnc Shader GPU DX Windows Shim NvIFR Game process GPU DX Shim NvIFR Game process

NvENC: Fast pathed from NvIFR/NvFBC Completely separate GPU unit: <2 watts PSNR comparable to x264 up to 16 encoding contexts, per GPU High Profile Single pass or dual pass 720p: 4 ms 1080p: 8 ms 4 720p streams per GPU at 30 Hz Constrained VBV buffer size network packet framing for real time delivery CBR, VBR, Min QP CUDA Interoperability I-frame on-demand Reference picture invalidation logic API for packet loss 4Kx4K support Stereo MVC Encoding

Baremetal Shimming Techniques

Windows DirectX Gaming Limitations If Windows is great DirectX and OpenGL games run great on our GPUs It can t do more than one game fullscreen at once So to run several games on the same baremetal OS So games they must be injected/hooked to prevent them from taking ownership of the display and preventing other games from launching/rendering

Shim layer DX affinity 3 2 1 0 N GPUs attached to Windows desktop N D3D adapters Apps create D3D device on default adapter (DX ordinal 0) Apps have to be injected and hooked so The CreateDevice() ordinal parameter is overriden to DX ordinal n DXVA follows the Dx device NvIFR uses the same device so the HW encoder will be on the same GPU

Monterey SDK: Client Side Decode and Display

Decoders Low latency Power efficient H264 packets, fastest path to display Windows Android, Surface RT Mac LGTV

HW decoding with nvcuvid/dx Popping off h264 frames from network socket Feeding elementary stream to nvcuvid -> DXVA YUV comes out in a CUDA buffer Run CUDA program to convert YUV to ARGB CUDA buffer Map cuda ARGB buffer into ARGB texture Draw quad with ARGB texture

Android Decode Low latency decode/display Bypass of OS traditional stack, this is not streaming Java frame work calls jni with native window handle Set output of decoder directly into native window SDK: HoneyComb/ICS and up, native JNI Decoder: 8ms 720p Tear free display 720p / 60 FPS 1080p / 30 FPS

New Type of Content

Multi-View game engine

Cloud Game Engine Cloudgine, EA In house prototype Windows GPU GPU GPU GPU New type of games One single app running, one OS Using all GPUs Dx Dx Dx Dx Dx Context Context Context Context Context Game process N Dx Contexts Using NvIFR Rendering different point of views from n different users N Networked Clients

Higher CCUs Having one complete VM per game is too heavy to reach 80 CCUs per servers Extreme CCUs can run baremetal OS with DX shimming to run loads of casual small games Looking at some very specific and very limited applications

Video workloads XBMC Multimedia Center Mostly HW DXVA decoding Compositing, menus, PIP Reencoding with NvIFR Injected with Detours Dx9Ex Present path hooked NvIFR running in side encoding thread

Virtual Set Top Box (vstb) Storage Content Server 64 streams MultiCast Windows 64 transcoded composited streams GPU GPU GPU GPU

48 XBMC sessions

Demos GRID server (3 x K340) Streaming n sessions of XBMC 7 Mbps per stream HW decoding, encoding

GRID Support For partners SDK with collection of samples, tools, tips and techniques Very responsive and knowledgeable team Solution engineers Release planning

Thanks franck.diard@nvidia.com Phil Eisler, Jon Barad, Thomas Ruge