HIGH PERFORMANCE VIDEO ENCODING USING NVIDIA GPUS
Abhijit Patait, Sr. Manager, GPU Multimedia SW

AGENDA
- Overview
- GPU Video Encoding
- NVIDIA Video Encoding Capabilities
- Kepler vs Maxwell GPU capabilities
- Roadmap
- Software API
- Performance & Quality

WHY GPU VIDEO ENCODING?
BENEFITS OF ENCODING ON GPU
- Low power
- Fixed function hardware
- Reduced memory transfers
- Low latency
- High performance
- Higher density
- Scalability
- Ease of programming
- Linux, Windows, C/C++, application portability

NVIDIA GPU VIDEO ENCODING CAPABILITIES
NVIDIA GPU ENCODING CAPABILITIES

| Feature | Benefits |
|---|---|
| H.264 base, main, high profiles | Wide range of use-cases |
| High performance (up to 16x HD) | Blazing-speed encoding |
| YUV 4:2:0 and 4:4:4 support | High quality encoding without chroma subsampling |
| QP maps | Customizable quality, region of interest encoding |
| MVC | Full resolution stereo encode |
| Up to 4096 × 4096 in HW | High resolution encode |
| API: NV Encode SDK & GRID SDK | Flexible, Win/Linux, DirectX/CUDA |
| Independent of CUDA | Use CUDA and encode simultaneously |

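The difference between 4:2:0 and 4:4:4 in the table above comes down to how many chroma samples are kept per luma sample. As a rough illustration, the sketch below computes the size of one uncompressed frame for each format. It is not part of the NV Encode SDK or GRID SDK; the function name `raw_frame_bytes` and the 8-bit sample depth are assumptions made for this example.

```python
def raw_frame_bytes(width, height, chroma="4:2:0", bits_per_sample=8):
    """Return the size in bytes of one uncompressed YUV frame."""
    # Chroma samples per luma sample, summed over both chroma planes (U and V).
    chroma_factor = {"4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}[chroma]
    samples = width * height * (1 + chroma_factor)
    return int(samples * bits_per_sample) // 8


if __name__ == "__main__":
    for fmt in ("4:2:0", "4:4:4"):
        size = raw_frame_bytes(1920, 1080, fmt)
        print(f"1080p {fmt}: {size} bytes per frame")
```

At the same resolution a 4:4:4 frame carries twice the raw data of a 4:2:0 frame, which is the cost of avoiding chroma subsampling.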
VIDEO ENCODING KEPLER VS. MAXWELL

| Kepler (GK104, GK107, GK106, GK110, GK208) | Maxwell (GM107) |
|---|---|
| Planar 4:4:4 | Standard 4:4:4 and H.264 lossless encoding |
| ~240 fps 2-pass encoding @ 720p | ~500 fps 2-pass encoding @ 720p |
| GRID K340/K520, K1/K2, Quadro, Tesla K10/K20 | Current and future Maxwell GPU boards |
| GeForce: 2 full-speed encode sessions/GPU | GeForce: 2 full-speed encode sessions/GPU |
| NV Encode SDK 1.0, 2.0, 3.0 (now) | NV Encode SDK 4.0+ (May 2014) |
| GRID SDK 1.x, 2.2, 2.3 (now) | GRID SDK 3.0+ (June 2014) |

NVIDIA VIDEO ENCODING ROADMAP
- Performance improvements
- Quality improvements
- 4:4:4 & lossless encoding
- Rate control enhancements
- Adaptive quantization
- ROI, ME-only mode
- New video standards

NVENC SOFTWARE APIS
USING NVENC
Direct Encode (NVENC SDK)
- No capture
- Transcoding
- Archiving
- Video editing
- CUDA pre-process + encoding
- Granular encoder settings
- D3D, CUDA interop
Capture + Encode (GRID SDK)
- Capture + encode
- Optimized for low-latency apps
- Capture + CUDA pre-process + encoding
- Encoder settings optimized for streaming
- D3D, CUDA interop

DIRECT ENCODE (NVENC SDK)
[Block diagram. Components: client application; NVENC API (initialize, configure, encode); CUDA driver; NVENC driver (configure HW); DirectX driver; NVENC firmware + hardware (HW encode). Output: encoded bitstream.]

CAPTURE AND ENCODE (GRID SDK)
[Block diagram. Components: client application (DX/OGL present); NvFBC/NvIFR (capture, encode); DirectX/OGL driver; NVENC driver; GPU 3D engine; NVENC hardware. Data between stages: YUV. Output: encoded bitstream.]

NVENC SDK
- Available on NVIDIA developer zone: https://developer.nvidia.com/nvidia-video-codec-sdk
- Current release: 3.0
- Release 4.0 in May 2014 with Maxwell support
- Interface header, documentation, sample application
- .dll/.so included in the driver
- Unified API for Windows and Linux
- Works on x86/x64
- Various APIs, presets, rate control modes for
  - Transcoding
  - Video conferencing
- GTC Session S4654

NVENC SDK (CONTD.)
Advantages
- Flexibility
  - Dynamic resolution/bitrate change
  - CABAC vs CAVLC; low-level encoder settings, B-frames, sync vs async, custom QP
  - Linux, Windows, DirectX, CUDA, OGL (via CUDA)
  - Also works on GeForce hardware (2 sessions/GPU)
- Error concealment
  - Reference picture invalidation
  - Intra-refresh
- Quality
  - Two-pass modes for higher quality
  - Various presets with quality/performance trade-off
  - 4:4:4 & lossless encoding (Maxwell only)

GRID SDK ENCODE
- Available on NVIDIA developer zone: https://developer.nvidia.com/grid-app-game-streaming
- Current release: 2.2
- Interface header, documentation, sample apps
- .dll/.so included in the driver
- Windows and Linux
- Works on x86/x64
- Various presets and APIs for
  - Remote graphics (cloud gaming, remote desktop, capture & stream)
- Optimized for low latency

GRID SDK (CONTD.)
Advantages
- Simplicity
  - Very simple API; single function call for capture + H.264 encode
- Low-latency, high performance
  - Optimized API
- Error concealment
  - Reference picture invalidation
  - Intra-refresh
- Quality
  - Two-pass modes for higher quality
  - 4:4:4 & lossless encoding (Maxwell only)

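Intra-refresh, listed under error concealment above, is generally understood as spreading intra-coded macroblocks over several frames instead of sending one large I-frame. The sketch below shows one simple way such a schedule could be laid out, refreshing a band of macroblock columns per frame. It illustrates the general idea only and is not how NVENC or the GRID SDK schedules refreshes; the function name `intra_refresh_schedule` and the column-wise layout are assumptions of this example.

```python
import math

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma samples


def intra_refresh_schedule(width_px, period_frames):
    """Return one (first_column, end_column) pair per frame in a refresh cycle.

    Columns are macroblock columns; end_column is exclusive. After
    period_frames frames every column has been intra-coded once.
    """
    mb_cols = math.ceil(width_px / MB_SIZE)
    schedule = []
    for i in range(period_frames):
        start = i * mb_cols // period_frames
        end = (i + 1) * mb_cols // period_frames
        schedule.append((start, end))
    return schedule


if __name__ == "__main__":
    # Hypothetical settings: 1280-pixel-wide picture, 10-frame refresh period.
    for frame, (start, end) in enumerate(intra_refresh_schedule(1280, 10)):
        print(f"frame {frame}: intra-code macroblock columns {start}..{end - 1}")
```

Because each frame carries only a fraction of the intra-coded data, frame sizes stay more uniform than with periodic I-frames, which suits low-latency streaming.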
PERFORMANCE AND QUALITY
PERFORMANCE 720P
[Bar chart: NVENC performance at 720p, Low-Latency HP preset; 720p performance (fps) by rate control mode, Kepler (GRID) vs. Maxwell]
- Rate control modes: CBR_IFRAME_2PASS, 2_PASS_FRAMESIZE_CAP, 2_PASS_QUALITY
- Kepler (GRID): 231 to 232 fps across the three modes (bar labels 231, 232, 232)
- Maxwell: 503 to 505 fps across the three modes (bar labels 504, 503, 505)
- Performance measured on GRID K520 with the GRID SDK NVENC performance benchmarking application

PERFORMANCE 1080P
[Bar chart: NVENC performance at 1080p, Low-Latency HP preset; 1080p performance (fps) by rate control mode, Kepler (GRID) vs. Maxwell]
- Rate control modes: CBR_IFRAME_2PASS, 2_PASS_FRAMESIZE_CAP, 2_PASS_QUALITY
- Kepler (GRID): 118 to 119 fps across the three modes (bar labels 118, 118, 119)
- Maxwell: 238 to 240 fps across the three modes (bar labels 239, 240, 238)
- Performance measured on GRID K520 with the GRID SDK NVENC performance benchmarking application

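To put the fps figures from the two performance slides in terms of stream density, the sketch below converts an encode rate into a per-frame encode time and a count of real-time streams. The 30 fps stream rate is an assumption of this example, as is the idealized model that streams share the encoder evenly with no overhead; one representative fps value is taken per GPU and resolution.

```python
# Representative encode rates (fps) taken from the performance slides.
MEASURED_FPS = {
    ("Kepler", "720p"): 232,
    ("Maxwell", "720p"): 504,
    ("Kepler", "1080p"): 118,
    ("Maxwell", "1080p"): 239,
}


def frame_time_ms(encode_fps):
    """Average time the encoder spends on one frame, in milliseconds."""
    return 1000.0 / encode_fps


def realtime_streams(encode_fps, stream_fps=30):
    """Whole number of streams at stream_fps that fit in the encode rate.

    Idealized: assumes perfect time-slicing and no per-session overhead.
    """
    return encode_fps // stream_fps


if __name__ == "__main__":
    for (gpu, resolution), fps in MEASURED_FPS.items():
        print(
            f"{gpu} {resolution}: {frame_time_ms(fps):.2f} ms/frame, "
            f"{realtime_streams(fps)} streams at 30 fps"
        )
```

Passing `stream_fps=60` gives the equivalent estimate for 60 fps streams.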
ENCODING QUALITY VS X264: ASSUMPTIONS
- Infinite GOP
- IPPP
- VBV buffer = bitrate/framerate
- x264
  - Zero latency
  - CRF = 24
  - Preset = faster
- NVENC
  - Preset = LOW_LATENCY_HQ
  - RC = 2-pass-quality

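The rule "VBV buffer = bitrate/framerate" sizes the buffer to hold the bits of one average frame. A minimal sketch of that arithmetic follows. The bitrates match the 5 Mbps and 12 Mbps test clips in this deck, but the frame rates are assumptions of this example, since the slides do not state them.

```python
def vbv_buffer_bits(bitrate_bps, framerate):
    """VBV buffer size in bits when it holds one average frame."""
    return bitrate_bps / framerate


if __name__ == "__main__":
    # Frame rates below are assumed for illustration only.
    for bitrate_bps, framerate in ((5_000_000, 60), (12_000_000, 30)):
        bits = vbv_buffer_bits(bitrate_bps, framerate)
        print(
            f"{bitrate_bps / 1e6:g} Mbps at {framerate} fps: "
            f"{bits:.0f} bits ({bits / 8 / 1024:.1f} KiB)"
        )
```

A buffer this small leaves the rate control almost no room to let one frame borrow bits from its neighbours, which is why it is a demanding setting for low-latency comparisons.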
NVENC/X264 QUALITY COMPARISON
[Plot: Titan Fall 720p, 5 Mbps, Low-latency HQ. Axes: PSNR Y (dB) and SSIM Y; horizontal axis from 1 to 901. Series: PSNR NVENC, PSNR x264, SSIM NVENC, SSIM x264.]

NVENC/X264 QUALITY COMPARISON
[Plot: Bunny 1080p, 12 Mbps, Low-latency HQ. Axes: PSNR Y (dB) and SSIM Y; horizontal axis from 1 to 501. Series: PSNR NVENC, PSNR x264, SSIM NVENC, SSIM x264.]

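The PSNR Y values plotted above compare the luma plane of each encoded frame against the source. As a reminder of what the metric measures, here is a minimal sketch using the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr_y`, the flat-list representation of a luma plane, and the sample values are assumptions of this example; this is not the tool used to produce the slides.

```python
import math


def psnr_y(reference, encoded, max_value=255):
    """PSNR in dB between two equal-length sequences of luma samples."""
    if len(reference) != len(encoded):
        raise ValueError("planes must have the same number of samples")
    # Mean squared error over all luma samples.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Made-up 8-bit luma samples, for illustration only.
    source = [100, 120, 140, 160]
    decoded = [101, 119, 142, 158]
    print(f"PSNR Y: {psnr_y(source, decoded):.2f} dB")
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to halving the mean squared error.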
QUALITY COMPARISON: PSNR
PSNR comparison, x264 vs NVENC (PSNR Y, dB)

| | Bunny 1080p | NFS Rivals 720p | NFS Rivals 1080p | Titan Fall 720p | Titan Fall 1080p | WoT - 3 1280×768 | WoT - 12 1280×768 |
|---|---|---|---|---|---|---|---|
| PSNR NVENC | 47.24 dB | 34.05 dB | 35.51 dB | 30.58 dB | 28.13 dB | 34.15 dB | 35.60 dB |
| PSNR x264 | 43.71 dB | 33.18 dB | 34.39 dB | 29.78 dB | 30.63 dB | 33.41 dB | 34.72 dB |
| PSNR difference | 3.52 dB | 0.87 dB | 1.12 dB | 0.80 dB | -2.50 dB | 0.74 dB | 0.87 dB |

QUALITY COMPARISON: SSIM
SSIM comparison, x264 vs NVENC (SSIM Y)

| | Bunny 1080p | NFS Rivals 720p | NFS Rivals 1080p | Titan Fall 720p | Titan Fall 1080p | WoT - 3 1280×768 | WoT - 12 1280×768 |
|---|---|---|---|---|---|---|---|
| SSIM NVENC | 0.9874 | 0.9217 | 0.9388 | 0.8350 | 0.8309 | 0.9101 | 0.9169 |
| SSIM x264 | 0.9808 | 0.9103 | 0.9269 | 0.8073 | 0.8567 | 0.8930 | 0.9027 |
| SSIM difference | 0.01 | 0.01 | 0.01 | 0.03 | -0.03 | 0.02 | 0.01 |

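The difference rows in the PSNR and SSIM comparisons are NVENC minus x264, so a negative value means x264 scored higher on that clip. The sketch below recomputes those rows from the reported NVENC and x264 values and lists the clips where x264 leads. The helper names are assumptions of this example; the numbers are the ones reported in the two comparison slides.

```python
CLIPS = [
    "Bunny 1080p", "NFS Rivals 720p", "NFS Rivals 1080p",
    "Titan Fall 720p", "Titan Fall 1080p", "WoT - 3", "WoT - 12",
]
PSNR_NVENC = [47.24, 34.05, 35.51, 30.58, 28.13, 34.15, 35.60]
PSNR_X264 = [43.71, 33.18, 34.39, 29.78, 30.63, 33.41, 34.72]
SSIM_NVENC = [0.9874, 0.9217, 0.9388, 0.8350, 0.8309, 0.9101, 0.9169]
SSIM_X264 = [0.9808, 0.9103, 0.9269, 0.8073, 0.8567, 0.8930, 0.9027]


def differences(nvenc, x264):
    """Per-clip metric difference, NVENC minus x264."""
    return [n - x for n, x in zip(nvenc, x264)]


def clips_where_x264_leads(nvenc, x264):
    """Names of clips where x264 scores higher than NVENC."""
    return [c for c, d in zip(CLIPS, differences(nvenc, x264)) if d < 0]


psnr_diff = differences(PSNR_NVENC, PSNR_X264)
mean_psnr_gain = sum(psnr_diff) / len(psnr_diff)

if __name__ == "__main__":
    for clip, d in zip(CLIPS, psnr_diff):
        print(f"{clip}: {d:+.2f} dB")
    print(f"Mean PSNR difference: {mean_psnr_gain:+.2f} dB")
    print("x264 leads (PSNR):", clips_where_x264_leads(PSNR_NVENC, PSNR_X264))
    print("x264 leads (SSIM):", clips_where_x264_leads(SSIM_NVENC, SSIM_X264))
```

Recomputed differences can deviate from the slide values by 0.01 because the slide figures were presumably derived from unrounded measurements.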
QUESTIONS?