HIGH PERFORMANCE VIDEO ENCODING WITH NVIDIA GPUS



Similar documents
HIGH-PERFORMANCE GPU VIDEO ENCODING ABHIJIT PATAIT SR. MANAGER, NVIDIA

AGENDA. Overview GPU Video Encoding NVIDIA Video Encoding Capabilities. Software API Performance & Quality. Kepler vs Maxwell GPU capabilities Roadmap

NVIDIA VIDEO ENCODER 5.0

Cloud Gaming & Application Delivery with NVIDIA GRID Technologies. Franck DIARD, Ph.D. GRID Architect, NVIDIA

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

REMOTE HIGH FIDELITY VISUALIZATION. May 2015 Jeremy Main, Sr. Solution Architect GRID

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

GRID SDK FAQ. PG January Frequently Asked Questions

Get the Best out of NVIDIA GPUs for 3D Design and Engineering in the Cloud

PNY Professional Solutions NVIDIA GRID - GPU Acceleration for the Cloud

PLANNING FOR DENSITY AND PERFORMANCE IN VDI WITH NVIDIA GRID JASON SOUTHERN SENIOR SOLUTIONS ARCHITECT FOR NVIDIA GRID

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

Introduction to GPU hardware and to CUDA

This letter contains latest information about the above mentioned software version.

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Vmware Horizon View with Rich Media, Unified Communications and 3D Graphics

NVIDIA GeForce GTX 750 Ti

Low power GPUs a view from the industry. Edvard Sørgård

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

4Kp60 H.265/HEVC Glass-to-Glass Real-Time Encoder Reference Design

CSE 237A Final Project Final Report

The GPU Accelerated Data Center. Marc Hamilton, August 27, 2015

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

VMware and NVIDIA: Bringing Workstations to the cloud

Using Mobile Processors for Cost Effective Live Video Streaming to the Internet

ArcGIS Pro: Virtualizing in Citrix XenApp and XenDesktop. Emily Apsey Performance Engineer

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

IP Video Rendering Basics

This letter contains latest information about the above mentioned software version.

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

Performance Testing in Virtualized Environments. Emily Apsey Product Engineer

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

The MeeGo Multimedia Stack. Dr. Stefan Kost Nokia - The MeeGo Multimedia Stack - CELF Embedded Linux Conference Europe

VA (Video Acceleration) API. Jonathan Bian 2009 Linux Plumbers Conference

Introduction to GPU Programming Languages

Virtual Desktop VMware View Horizon

Boundless Security Systems, Inc.

ADVANTAGES OF AV OVER IP. EMCORE Corporation

CTX OVERVIEW. Ucentrik CTX

QuickSpecs. NVIDIA Quadro M GB Graphics INTRODUCTION. NVIDIA Quadro M GB Graphics. Overview

Computer Graphics Hardware An Overview

Technical bulletin: Remote visualisation with VirtualGL

GTC EXPRESS WEBINAR: GETTING THE MOST OUT OF VMWARE HORIZON VIEW VDGA WITH NVIDIA GRID. Steve Harpster - Solution Architect NALA

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

HPC with Multicore and GPUs

Desktop Virtualization

Windows Server 2012 R2 VDI - Virtual Desktop Infrastructure. Ori Husyt Agile IT Consulting Team Manager orih@agileit.co.il

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Study and Implementation of Video Compression standards (H.264/AVC, Dirac)

Amazon EC2 Product Details Page 1 of 5

High Efficiency Video Coding (HEVC) or H.265 is a next generation video coding standard developed by ITU-T (VCEG) and ISO/IEC (MPEG).

How To Use An Amd Ramfire R7 With A 4Gb Memory Card With A 2Gb Memory Chip With A 3D Graphics Card With An 8Gb Card With 2Gb Graphics Card (With 2D) And A 2D Video Card With

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Whitepaper. NVIDIA Miracast Wireless Display Architecture

Case Study: Real-Time Video Quality Monitoring Explored

An Efficient Application Virtualization Mechanism using Separated Software Execution System

Quality Estimation for Scalable Video Codec. Presented by Ann Ukhanova (DTU Fotonik, Denmark) Kashaf Mazhar (KTH, Sweden)

Remoting Your 3D. Ian Williams Manager, PSG Applied Engineering NVIDIA Corporation.

S Hands on Tutorial: Deploying GRID in Citrix and VMware Virtual Desktop Environments

GPU Performance Analysis and Optimisation

Media Cloud Based on Intel Graphics Virtualization Technology (Intel GVT-g) and OpenStack *

Chapter 19 Cloud Computing for Multimedia Services

COMPUTING. SharpStreamer Platform. 1U Video Transcode Acceleration Appliance

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Delivering 3D Graphics from the Private or Public Cloud

Scalability in the Cloud HPC Convergence with Big Data in Design, Engineering, Manufacturing

NVIDIA Quadro K5200 Amazing 3D Application Performance and Capability

NVIDIA GRID DASSAULT CATIA V5/V6 SCALABILITY GUIDE. NVIDIA Performance Engineering Labs PerfEngDoc-SG-DSC01v1 March 2016

Android In The Cloud: A New PaaS Computing Platform

GPU Usage. Requirements

Intel Graphics Virtualization Technology Update. Zhi Wang,

SAPPHIRE VAPOR-X R9 270X 2GB GDDR5 OC WITH BOOST

Video Collaboration & Application Sharing Product Overview

Guided Performance Analysis with the NVIDIA Visual Profiler

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Design and implementation of IPv6 multicast based High-quality Videoconference Tool (HVCT) *

SAPPHIRE TOXIC R9 270X 2GB GDDR5 WITH BOOST

S4726 VIRTUALIZATION AN INTRO TO VIRTUALIZATION. Luke Wignall & Jared Cowart Senior Solution Architects GRID

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

GPU Computing - CUDA

VMware Horizon View 3D Graphics

NVIDIA Performance Primitives & Video Codecs on GPU. Gold Room Thursday 1 st October 2009 Anton Obukhov & Frank Jargstorff

REMOTE RENDERING OF COMPUTER GAMES

White Paper THE VALUE OF SOFTWARE-BASED VIDEO CODECS. September S. Ann Earon, Ph.D. President, Telemanagement Resources International Inc.

S5445 BUILDING THE BEST USER EXPERIENCE WITH CITRIX XENAPP & NVIDIA GRID THOMAS POPPELGAARD

We are presenting a wavelet based video conferencing system. Openphone. Dirac Wavelet based video codec

Video Codec Requirements and Evaluation Methodology

QULU VMS AND SERVERS Elegantly simple, Ultimately scalable

Accelerating Wavelet-Based Video Coding on Graphics Hardware

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Getting Started with RemoteFX in Windows Embedded Compact 7

QuickSpecs. NVIDIA Quadro K4200 4GB Graphics INTRODUCTION. NVIDIA Quadro K4200 4GB Graphics. Overview

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Transcription:

April 4-7, 2016 Silicon Valley HIGH PERFORMANCE VIDEO ENCODING WITH NVIDIA GPUS Abhijit Patait Eric Young April 4 th, 2016

NVIDIA GPU Video Technologies Video Hardware Capabilities AGENDA Video Software Overview Common Use Cases for Video Performance and Quality Tuning New Directions SDK Links 2

NVIDIA GPU VIDEO TECHNOLOGIES 3

NVIDIA VIDEO TECHNOLOGIES Dedicated hardware for encode & decode Linux, Windows, FFMPEG 4

NVIDIA VIDEO TECHNOLOGIES EVOLUTION Low-latency Streaming GRID Cloud transcoding Social media Live streaming Video-on-demand 5

Time GPU VIDEO ENCODE Benefits Low power Low latency High performance and scalability Frame #4 Frame #4 Automatic benefit from improvements in hardware Linux, Windows, C/C++, FFMPEG support Client Client Client Client 6

VIDEO HARDWARE CAPABILITIES 7

NVIDIA GPU VIDEO HARDWARE NVDEC Video decoder MPEG-2, VC-1, H.264, HEVC Fermi, Kepler, Maxwell, and future GPUs NVENC Video encoder H.264, HEVC Kepler, Maxwell, and future GPUs 8

KEPLER (GK107, GK104) ENCODE CAPABILITIES MAXWELL GEN 1 (GM107) MAXWELL GEN 2 (GM200, GM204, GM206) H.264 only H.264 only H.264 and HEVC/H.265 Standard 4:2:0, Planar 4:4:4 & proprietary 4:4:4 ~240 fps 2-pass encoding @ 720p Standard 4:2:0, 4:4:4 and H.264 lossless encoding ~500 fps 2-pass encoding @ 720p Standard 4:2:0, 4:4:4 and H.264 lossless encoding ~900 fps 2-pass encoding @ 720p GRID K340/K520, K1/K2, Quadro K5000, Tesla K10/K20, GeForce GTX 680 Maxwell-based GRID & Quadro products Tesla M4, M40, M6, M60, Quadro M4000, M5000, M6000, GeForce GTX 960, 980, Titan X NV Encode SDK 1.0-5.0 NV Encode SDK 4.0+ NV Encode SDK 5.0 Video Codec SDK 6.0+ 9

DECODE CAPABILITIES KEPLER (GK107, GK104) MPEG-2, MPEG-4, H.264 H.264: ~200 fps at 1080p; 1 stream of 4K@30 MAXWELL 1 (GM107, GM204, GM200) MPEG-2, MPEG-4, H.264, HEVC with CUDA acceleration H.264: ~540 fps at 1080p 4 streams of 4K@30 MAXWELL 2 (GM206) MPEG-2, MPEG-4, H.264 HEVC/H.265 fully in hardware H.264: ~540 fps at 1080p 4 streams of 4K@30 H.265: Not supported H.265: Not supported H.265: ~500 fps at 1080p 4 streams of 4K@30 Video Codec SDK 5.0+ Video Codec SDK 5.0+ Video Codec SDK 5.0+ 4096 4096 4096 4096 4096 4096 10

VIDEO SOFTWARE OVERVIEW 11

NVIDIA VIDEO TECHNOLOGIES PRE-2016 VIDEO DECODE/PLAYBACK DXVA for Windows VDPAU for Linux NVENC SDK Hardware encoder API Windows, Linux CUDA, DirectX interoperability NVCUVID VIDEO DECODING Windows, Linux, CUDA interoperability GRID/CAPTURE SDK, MFT Use-case specific APIs 12

NVIDIA VIDEO TECHNOLOGIES 2016++ VIDEO CODEC SDK Flexibility API for encode + decode Windows, Linux CUDA, DirectX, OpenGL interoperability High performance transcode Current: Video Codec SDK 6.0 FFMPEG SUPPORT* Hardware acceleration for most popular video and audio framework Leverages FFmpeg s Audio codec, stream muxing, and RTP protocols. Windows, Linux Wide adoption *To get access to the latest FFmpeg repository with NVENC support, please contact your NVIDIA relationship manager. 13

VIDEO CODEC SDK FEATURES What s New Feature Video SDK = encode + decode SDK release Why 6.0 Transcoding Quality++ 6.0 Streaming, Transcoding, Broadcast, Video production RGB inputs 6.0 Capture RGB + encode Motion estimation only mode Adaptive quantization Adaptive B-frames Adaptive GOP Look-ahead 6.0 Hardware assisted motion estimation for custom encoders, Image stabilization 7.0 Improved perceptual quality Available in May 2016 14

ROADMAP Q2 15 Q3 15 Q4 15 Q1 16 Q2 16 Q3 16 NVENC SDK 5.0 HEVC Maxwell Gen 2 H.264 4:4:4 H.264 lossless Video SDK 6.0 ME-only (H.264) HEVC Quality+ RGB inputs HEVC AQ Future Quality++ HEVC 10-bit HEVC 4:4:4 HEVC lossless ME-only (HEVC) 4K HEVC 60 fps 8K HEVC GM204 GM206 Maxwell Gen 2 Pascal 15

COMMON USE CASES FOR VIDEO 16

CAPTURE + ENCODE Capture Desktop (NvFBC) and RenderTargets (NvIFR) Low Latency, low CPU overhead Fully offloads H.264 and HEVC with NVENC High density of users per GPU Streaming Games and Enterprise Apps Graphics commands Apps Apps Apps Tesla, GRID, or Quadro GPU 3D Remote Graphics Stack NVIFR H.264 or raw streams NVENC NVFBC Network Render Target Front Buffer Framebuffer 17

STREAM APPLICATIONS Streaming software VMware Horizon Blast Extreme Nice Desktop Cloud Visualization Capture SDK + Encode SDK Capture (NvFBC and NvIFR) Encode with NvENC (H.264 and HEVC) Supported in Virtualized environments GPU direct attached mode vgpu mode (shared GPU) 18

PERFORMANCE STUDY VMWare Horizon Blast Extreme + GPU 37% better performance (fps) 21% lower latency 19% reduction in bandwidth 16% reduction in CPU utilization 18% increase in number of users 19

LIVE VIDEO TRANSCODING Higher number video streams per GPU server 1 stream to N streams (multi-resolution) Fewer servers needed, higher density, lower TCO Requires Lower bitrate (B-Frames) Live Transcoding User Generated Content Live video broadcasts, presidential debates, concerts Broadcasting from mobile device Live game streaming events 20

TRANSCODE FOR ARCHIVING High density of streams per GPU servers Lower TCO, lower latency 1 stream to N streams (multi-resolution) Archiving HQ archiving for non-live video streaming Quality is and low bitrate are the most important (I, B, and P support) Cost per stream 21

VIDEO CONFERENCING Live video conferencing Video transcoding (1 to N streams) Screen sharing for meetings Video enhancements Video stabilization Frame rate up sampling High quality, low bitrate 22

PERFORMANCE AND QUALITY TUNING 23

RECOMMENDED SETTINGS Remote Graphics NVENC has video presets for latency (I and P frames only) NV_HW_ENC_PRESET_LOW_LATENCY_HQ NV_HW_ENC_PARAMS_RC_2_PASS_QUALITY Video Bitrate settings for low latency dwvbvbuffersize = dwavgbitrate / (dwframeratenum/dwframerateden) dwvbvinitialdelay = dwvbvbuffersize Video Bitrate settings for higher quality K = 4; dwvbvbuffersize = K * dwavgbitrate / (dwframeratenum/dwframerateden) dwvbvinitialdelay = dwvbvbuffersize 24

RECOMMENDED SETTINGS NVENC settings for video quality (I, B, P frames) NV_ENC_PRESET_HQ_GUID NV_ENC_PARAMS_RC_2_PASS_QUALITY set B frames > 0 (EncodeConfig::numB) Video Transcoding Video Bitrate settings for low latency dwvbvbuffersize = dwavgbitrate / (dwframeratenum/dwframerateden) dwvbvinitialdelay = dwvbvbuffersize Video Bitrate settings for higher quality K = 4; dwvbvbuffersize = K * dwavgbitrate / (dwframeratenum/dwframerateden) dwvbvinitialdelay = dwvbvbuffersize 25

TESLA PERFORMANCE # NVDEC # NVENC # 1080P30 H.264 STREAMS* # 1080P30 HEVC STREAMS* Xeon E5 sw encode 2 (x264) 0.25-0.5 (x265) Tesla M60 / 2xGM204 1+1 2+2 2 x (14+14) (870+870Mpixels/sec) 2 x (10+10) (622+622Mpixels/sec) Tesla M6 / 1xGM204 1 2 14+14 (870+870Mpixels/sec) 10+10 (622+622Mpixels/sec) Tesla M4 / 1xGM206 1 1 7 (435Mpixels/sec) 5 (311Mpixels/sec) *Each Maxwell NVENC can do: 7x h.264 1080p30 Highest Quality with B-frames 5x HEVC 1080p30 Highest Quality with no B-frames 26

Quality (PSNR) ENCODE PERF/QUALITY 38.0 Slow Quality vs Performance Quality = x264 37.8 Medium Slow Performance Single NVENC is 3-4x vs x264 37.6 Slow Medium NVENC 37.4 Medium QSV x264 37.2 37.0 0 100 200 300 400 500 Performance (FPS) 27

NEW DIRECTIONS 28

NEW USE CASES Standalone NVENC motion estimation mode Continued video quality improvements Adaptive GOP, Adaptive B-frames, Adaptive Quantization Temporal AQ Frame look ahead Video Stabilization with compute Use CUDA cores for image stabilization to remove video shakiness Algorithm is well suited for GPU architectures Takes advantage of texture cache Scales on GPUs because of high level of parallelism\ 29

DEEP LEARNING VIDEO INFERENCE Using 3D ConvNet Video Analysis using pre-trained Convolution3D network (spatiotemporal signals) Use NVDEC to improve performance when running GPU inference https://research.facebook.com/blog/c3d-generic-features-for-video-analysis/ 30

SDK LINKS 31

NVIDIA VIDEO CODEC SDK Since Kepler dgpu have had Fixed- Function Decoder and Encoder blocks NVENC NVIDIA Video Encoder NVDEC NVIDIA Video Decoder Samples and documentation https://developer.nvidia.com/nvidiavideo-codec-sdk GM200 32

FFMPEG + NVENC NVENC added 1/2015 NVRESIZE added 8/2015 CUDA Context sharing and Zero-Copy NVDEC added 1/2016 https://developer.nvidia.com/ffmpeg 33

QUESTIONS? April 4-7, 2016 Silicon Valley Find us at GTC Hangouts GTC Pod B - H6145A: Video and Image Processing 4/5 (Tuesday) @ 12:45 2pm GTC Pod A - H6145B: Video and Image Processing 4/6 (Wednesday) @ 8:45am - 10am Abhijit Patait Eric Young apatait@nvidia.com eyoung@nvidia.com