D3D9 Media Surface Sharing Between Intel Quick Sync Video and OpenCL* on Intel HD Graphics

Abstract

Intel has defined an extension to OpenCL* v1.0 and later that allows applications to directly access images embedded in Microsoft DirectX* 9 (DX9) media surfaces without first copying them. For OpenCL v1.2, the Khronos standards organization has defined a standardized extension for the same purpose. Intel Quick Sync Video (Intel QSV) can decode video images into DX9 surfaces. The Intel and Khronos extensions can significantly improve the performance of applications that use OpenCL to process Intel QSV video frames. These benefits can apply to a range of applications, including video enhancement during playback, video editing with effects, and video generation and encoding.

OpenCL and Intel QSV Media Surface Sharing

Intel QSV supports video decode and encode acceleration by Intel processor graphics and can be accessed by applications using the Intel Media SDK [1]. OpenCL [2] is a framework for developing applications that take advantage of heterogeneous platforms. In the case considered in this article, these platforms consist of a CPU and a programmable graphics unit, with parts of the application executing on both to achieve improved processing efficiency and performance. The Intel-defined extension to OpenCL v1.0 (also usable with subsequent versions) enables direct access to DX9 media surfaces created by Intel QSV.

Intel QSV media surface sharing with OpenCL can avoid many surface copies between Intel QSV and OpenCL kernels. If an application applies an OpenCL filter to Intel QSV video without surface sharing, it must copy Intel QSV output surfaces to OpenCL image objects so that the OpenCL filter can consume the data. This copy becomes more time-consuming as the resolution of the input video increases, because there is more data to copy. Figure 1 shows an example of this, in which OpenCL is used to modify a video stream.
Video frames decoded by Intel QSV into DX9 surfaces are copied from the GPU to the CPU, then copied to an OpenCL memory object, copied from CPU to GPU, and processed in OpenCL, with the output copied back to the CPU, then copied into a DX9 surface, and finally copied back to the GPU for video encode by Intel QSV. With shared DX9 surfaces, none of these copies are needed.

Figure 1: Video transcode and processing pipeline without surface sharing

Because Intel processor graphics shares memory with the CPU, image surfaces might only need to be copied during conversion to a system memory buffer. Once a system memory pointer is available, it can be used to create an OpenCL buffer using that same system memory buffer. The inverse is also true: after OpenCL processing completes into an OpenCL buffer that shares a system memory buffer, that buffer can be copied and converted to a DX9 buffer. So the total number of memory copies could be reduced to two for Intel processor graphics. Still, it is better to eliminate all the extra copies.

Using the Intel DirectX 9 media surface sharing extension of OpenCL 1.0, a DX9 image surface in the NV12 format (used internally by Intel QSV) can be converted directly into an NV12 OpenCL image2d memory object without copying, and vice versa, as shown in Figure 2. This avoids the extra data copies otherwise needed to convert video frames going between Intel QSV and an OpenCL filter. So for a transcode application, the shared media surface extension can save at least two extra memory copies.
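To put the cost of those copies in perspective, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes tightly packed NV12 frames (12 bits per pixel, no row padding), counts the six copies listed above for the pipeline in Figure 1, and treats every copy as moving one full frame. The function names and the frame rate used in the note below are illustrative assumptions, not part of any SDK.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one tightly packed NV12 frame: a full-resolution Y plane plus an
// interleaved UV plane subsampled by two in both dimensions (half the Y size).
// Assumption: even dimensions and no row-pitch padding.
std::uint64_t nv12_frame_bytes(std::uint32_t width, std::uint32_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::uint64_t luma = static_cast<std::uint64_t>(width) * height;
    return luma + luma / 2;
}

// Bytes moved per second if every frame is copied `copies_per_frame` times.
// Assumption: each copy moves one whole NV12 frame.
std::uint64_t copy_traffic_per_second(std::uint32_t width, std::uint32_t height,
                                      std::uint32_t copies_per_frame,
                                      std::uint32_t frames_per_second) {
    return nv12_frame_bytes(width, height) * copies_per_frame * frames_per_second;
}
```

Under these assumptions, a 1920x1080 stream at an assumed 30 frames per second moves 559,872,000 bytes per second with six copies per frame, 186,624,000 with the two copies of the shared-system-memory approach, and none with surface sharing.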

Figure 2: Video transcode and processing pipeline using MSDK/OCL surface sharing extensions

OpenCL DirectX 9 Media Sharing APIs

Intel has defined a vendor-specific extension for OpenCL version 1.0 (and beyond) called cl_intel_dx9_media_sharing [3]. That OpenCL extension API includes four functions that allow applications to share DirectX 9 media surfaces with OpenCL:

clGetDeviceIDsFromDX9INTEL
clCreateFromDX9MediaSurfaceINTEL
clEnqueueAcquireDX9ObjectsINTEL
clEnqueueReleaseDX9ObjectsINTEL

Using the above API extension, the OpenCL API can execute kernels that read and/or write memory objects that are also DirectX 9 resources. An OpenCL 2D image object can be created from a DirectX 9 media surface resource. OpenCL memory objects may be created from DirectX 9 objects if and only if the OpenCL context has been created from a DirectX 9 device. That is, the OpenCL clCreateContext or clCreateContextFromType function must be called with a properties list that includes the property CL_CONTEXT_D3D9EX_DEVICE_INTEL paired with a pointer to the IDirect3DDevice9 created for Intel QSV. (More information on this can be found in the Intel DX9 media sharing extension specification [3].)

clGetDeviceIDsFromDX9INTEL: This function is used to query the OpenCL devices corresponding to a DirectX 9 device on a particular platform. The application should create the OpenCL context from the DirectX 9 device.

clCreateFromDX9MediaSurfaceINTEL: This function is used to create an OpenCL 2D image object from a DirectX 9 media surface or a plane of a DirectX 9 media surface. Applications can query the properties of the media surface object created from the DirectX 9 resource by using the clGetMemObjectInfo and clGetImageInfo functions.

clEnqueueAcquireDX9ObjectsINTEL: This function is used to acquire OpenCL memory objects that have been created from DirectX 9 resources. The OpenCL memory objects created from DirectX 9 resources must be acquired before they can be used by any OpenCL commands queued to a command-queue. This function provides synchronization guaranteeing that any DirectX calls made before clEnqueueAcquireDX9ObjectsINTEL is called will complete execution before the execution of any subsequent OpenCL APIs submitted to the command-queue.

clEnqueueReleaseDX9ObjectsINTEL: This function is used to release OpenCL memory objects that have been created from DirectX 9 resources. This function provides synchronization guaranteeing that any calls to a DirectX 9 resource made after the call to clEnqueueReleaseDX9ObjectsINTEL will not start executing until all events in the event list of this API are complete and all work already submitted to the command-queue completes execution.

To use this extension API, the application must first obtain a pointer to each of the corresponding functions. For OpenCL v1.0 and v1.1, use the standard OpenCL function clGetExtensionFunctionAddress, passing it the extension function name as a string. The file cl_ext.h in the Intel OpenCL SDK distribution contains function prototypes (with _fn appended to the API name) that can be used to correctly cast the function pointers.
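Before looking up these entry points, an application would normally confirm that the device actually advertises a DX9 media sharing extension. The sketch below assumes the space-separated extension string has already been retrieved, for example with clGetDeviceInfo and CL_DEVICE_EXTENSIONS, and only shows the matching step. The helper names and the policy of preferring the Khronos extension (cl_khr_dx9_media_sharing, available with OpenCL v1.2) over the Intel one are illustrative assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `wanted` appears as a whole space-separated token in `extensions`.
// A plain substring search would also match longer names that merely start
// with the wanted name, so whole tokens are compared instead.
bool has_extension(const std::string& extensions, const std::string& wanted) {
    assert(!wanted.empty());
    std::istringstream tokens(extensions);
    std::string name;
    while (tokens >> name) {
        if (name == wanted) return true;
    }
    return false;
}

enum class Dx9Sharing { None, Intel, Khronos };

// Hypothetical policy: use the Khronos extension when present, otherwise
// fall back to the Intel vendor extension.
Dx9Sharing pick_dx9_sharing(const std::string& extensions) {
    if (has_extension(extensions, "cl_khr_dx9_media_sharing")) return Dx9Sharing::Khronos;
    if (has_extension(extensions, "cl_intel_dx9_media_sharing")) return Dx9Sharing::Intel;
    return Dx9Sharing::None;
}
```

The result tells the application which set of function names to request when it fetches the extension entry points.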
In OpenCL version 1.2, Khronos has added an equivalent standard Khronos extension [4]. Using that extension, the equivalents of the above API names become:

clGetDeviceIDsFromDX9MediaAdapterKHR
clCreateFromDX9MediaSurfaceKHR
clEnqueueAcquireDX9MediaSurfacesKHR
clEnqueueReleaseDX9MediaSurfacesKHR

Function prototypes for these APIs can be found in the cl_dx9_media_sharing.h file available from Khronos.org [5]. However, OpenCL v1.2 has deprecated clGetExtensionFunctionAddress in favor of a new API function, clGetExtensionFunctionAddressForPlatform, as described in the Khronos.org document OpenCL 1.2 Extensions Specification [4]. The clGetExtensionFunctionAddressForPlatform function requires both a platform ID and the Khronos extension API name as a string.

For the KHR DX9 media surface sharing extension, as with the Intel DX9 media surface sharing extension, Intel QSV media surfaces can only be shared into an OpenCL context created using the property CL_CONTEXT_ADAPTER_D3D9EX_KHR paired with a pointer to the IDirect3DDevice9 created for Intel QSV.

For both the Intel and KHR extensions, the OpenCL functions clGetPlatformInfo and clGetDeviceInfo can be used to get a list of the named extensions supported by an OpenCL implementation, to verify that the desired extension is supported: either cl_intel_dx9_media_sharing for the Intel vendor-specific extension to OpenCL v1.0 or v1.1, or cl_khr_dx9_media_sharing for the OpenCL v1.2 Khronos standard extension. It is also worth noting that other DirectX surface sharing extensions have been created and are also documented on the Khronos.org site.

Intel Media SDK User Functions

Intel Media SDK provides a USER class of functions that allows user-defined functions to participate in the Intel QSV pipeline. This allows the application to integrate OpenCL-based kernels into the Intel QSV pipeline as a plug-in. In this particular case, the USER class of functions is implemented to use OpenCL kernels.

Pipeline diagram: Intel QSV Decode, USER class of functions, Intel QSV Encode

The application needs to do the following to use an OpenCL plug-in inside Intel Media SDK:

Initialize an OpenCL plug-in, registering a set of callback functions (described later) through the MFXVideoUSER_Register function. The Intel Media SDK invokes these callback functions at appropriate times in the Intel QSV pipeline.
Once initialized, use the OpenCL plug-in through the Intel Media SDK function MFXVideoUSER_ProcessFrameAsync to process data. The function returns a sync point for result synchronization, as other Intel Media SDK async functions do.
Close the OpenCL plug-in by unregistering it via the MFXVideoUSER_Unregister function.

The application needs to include mfxplugin.h in addition to other Intel Media SDK files.

The application needs to implement the following callback functions, which are registered through MFXVideoUSER_Register:

PluginInit: Intel Media SDK calls this function to initialize the plug-in components and allocate internal resources.
PluginClose: Intel Media SDK calls this function to close the plug-in components and free the internal resources.
GetPluginParam: Intel Media SDK calls this function to obtain plug-in configuration parameters.
Submit: Intel Media SDK calls this function to check the validity of I/O parameters and submit a task to the SDK for execution.
Execute: Intel Media SDK calls this function to execute the submitted task after resolving all input data dependencies.
FreeResources: Intel Media SDK calls this function when task execution finishes or to cancel the queued task.

The Intel Media SDK kit includes a document, Intel Media Software Development Kit Extensions for User-Defined Functions API version 1.3 [6], that provides the details about all these APIs.

Intel QSV and Intel OpenCL Media Sharing Extension Flow

There are multiple use cases in which Intel QSV and OpenCL can interoperate.

Figure 3: Intel QSV Decoder and OpenCL* interoperability (initial setup: app initialization, DX9 surface creation, OpenCL* setup; frame decoding by the Intel QSV decoder; MFXVideoUSER_ProcessFrameAsync running clEnqueueAcquireDX9ObjectsINTEL, clSetKernelArg, clEnqueueNDRangeKernel, and clEnqueueReleaseDX9ObjectsINTEL; frame rendering by the video renderer)
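The per-frame OpenCL sequence in Figure 3 (acquire the shared surfaces, set kernel arguments and enqueue the kernel, then release the surfaces) can be sketched as a small ordering helper. This is not Intel Media SDK or OpenCL code: SharedFrameOps and process_shared_frame are hypothetical names, and the three callbacks merely stand in for the real enqueue calls. The policy of still releasing after a failed kernel step is an assumption made here so that an acquired surface is always handed back to DirectX.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the real calls made per frame:
//   acquire     -> clEnqueueAcquireDX9ObjectsINTEL
//   run_kernels -> clSetKernelArg followed by clEnqueueNDRangeKernel
//   release     -> clEnqueueReleaseDX9ObjectsINTEL
// Each callback returns true on success.
struct SharedFrameOps {
    std::function<bool()> acquire;
    std::function<bool()> run_kernels;
    std::function<bool()> release;
};

// Runs the three steps in order. If the acquire fails, nothing was acquired,
// so nothing is released. Once the acquire has succeeded, the release is
// attempted even when the kernel step fails (assumed policy).
bool process_shared_frame(const SharedFrameOps& ops) {
    assert(ops.acquire && ops.run_kernels && ops.release);
    if (!ops.acquire()) return false;
    const bool kernels_ok = ops.run_kernels();
    const bool release_ok = ops.release();
    return kernels_ok && release_ok;
}
```

In a real plug-in, logic of this shape would sit inside the Execute callback, with each stand-in replaced by the corresponding enqueue call on the plug-in's command-queue.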

For example, Figure 3 shows the flow of a decoder application in which an OpenCL filter is applied before the frames are displayed. Figure 4 shows the flow of a video transcoding application that uses Intel QSV to decode and encode the video. It applies OpenCL filters to the decoded video frames before encoding them.

Figure 4: Intel QSV and OpenCL* Interoperability (initial setup: app initialization, DX9 surface creation, OpenCL* setup; frame decoding by the Intel QSV decoder; OpenCL kernel execution with clEnqueueAcquireDX9ObjectsINTEL, clSetKernelArg, clEnqueueNDRangeKernel, and clEnqueueReleaseDX9ObjectsINTEL; frame encoding by the Intel QSV encoder)

In each of these cases, the four distinct steps are: initial setup, frame decoding, OpenCL filters, and final video frame processing.

Initial Setup: In this step the application must initialize the Intel QSV components using Intel Media SDK. It must also perform DX9 surface allocation. The application uses the Intel Media SDK API function MFXVideoUSER_Register to register the OpenCL plug-in. After that, the application can initialize other components and create the OpenCL image objects with the clCreateFromDX9MediaSurfaceINTEL function, mapping them to the DX9 surfaces created by Intel Media SDK.

Frame Decoding: After initializing, the application can start decoding frames. When a decoded frame is available, the application calls the MFXVideoUSER_ProcessFrameAsync function. (The decoding stage can be omitted if the application is only generating video frames for Intel QSV encode.) Intel Media SDK then calls the Submit and Execute plug-in callback functions, in that order, to execute the OpenCL kernel. The Execute plug-in callback implements the OpenCL host-side part of the kernel. It calls clEnqueueAcquireDX9ObjectsINTEL to lock the frame, then standard OpenCL API functions such as clSetKernelArg and clEnqueueNDRangeKernel to run the OpenCL kernels that process the frame, followed by clEnqueueReleaseDX9ObjectsINTEL.

Frame Encoding / Frame Rendering: After frame processing, the DX9 surfaces are ready for the next stage, which could be encoding or display depending on the application.

Use Cases

OpenCL can be used to accelerate a range of video processing applications, and using the Intel or Khronos DX9 media sharing extensions helps ensure the best possible performance by minimizing surface copy overhead. Examples of applications using a combination of OpenCL and Intel QSV include video editing, in which OpenCL is used to apply special effects or video transitions (such as blending two decoded video frames and encoding the result); video playback with image enhancement prior to rendering for display; and algorithmic video synthesis, for example creating motion video by manipulating and animating still images and subsequently encoding the resulting sequence of frames.

Again, it is important to note that the minimum number of image surface copies can be obtained only by having OpenCL accept and process frames in NV12 format from Intel QSV, and/or deliver them to Intel QSV in the same format.
If color conversion is required, for example NV12 to (or from) an RGB format, a developer may be able to merge it into the OpenCL processing pipeline as part of the first (or last) kernel that processes the image2d object, thereby avoiding an added pass through memory. If Intel QSV is asked to do the color conversion as part of its post-processing, the conversion becomes a separate processing stage and an additional pass through memory.

Example Code

The best example code for understanding how to integrate OpenCL-based plug-ins into an Intel QSV pipeline is provided as sample code, available on the Intel Developer Zone [7]. The Intel Media SDK code samples also include an OpenCL sample, but that sample does not use the DX9 media surface sharing extension.
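To make the color-conversion point from the Use Cases section concrete, the sketch below shows the per-pixel arithmetic that a first kernel could fold into its own processing instead of running a separate conversion pass. It is a CPU-side reference in C++, not an OpenCL kernel and not taken from the Intel samples. It assumes tightly packed NV12 planes (row pitch equal to width) and the common BT.601 limited-range integer approximation; the article does not say which conversion matrix Intel QSV uses, so the coefficients are an assumption, as are the names Rgb and nv12_pixel_to_rgb.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

static std::uint8_t clamp_to_byte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Converts the pixel at (x, y) of an NV12 frame to RGB.
// NV12 layout: a full-resolution Y plane, then one interleaved U,V pair
// shared by each 2x2 block of pixels.
// Assumptions: planes are tightly packed; BT.601 limited-range coefficients.
Rgb nv12_pixel_to_rgb(const std::vector<std::uint8_t>& y_plane,
                      const std::vector<std::uint8_t>& uv_plane,
                      std::size_t width, std::size_t x, std::size_t y) {
    assert(width % 2 == 0 && x < width);
    const std::size_t y_index = y * width + x;
    // Each UV row covers two pixel rows and is `width` bytes long.
    const std::size_t uv_index = (y / 2) * width + (x / 2) * 2;
    assert(y_index < y_plane.size() && uv_index + 1 < uv_plane.size());

    const int c = y_plane[y_index] - 16;        // luma, offset removed
    const int d = uv_plane[uv_index] - 128;     // U (Cb), centered
    const int e = uv_plane[uv_index + 1] - 128; // V (Cr), centered

    return Rgb{clamp_to_byte((298 * c + 409 * e + 128) / 256),
               clamp_to_byte((298 * c - 100 * d - 208 * e + 128) / 256),
               clamp_to_byte((298 * c + 516 * d + 128) / 256)};
}
```

In an actual kernel, each work-item would read the Y and UV values from the two image2d planes created for the shared surface and apply the same arithmetic before its filter step.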

About the Author

Tom Craver is an Intel application engineer in Chandler, Arizona. He is currently focused on performance of applications using OpenCL on Intel processor graphics, but has extensive background in SIMD and threaded parallel coding for performance, primarily for media applications such as audio and video codecs and video effect processing.

References

1. Intel Media SDK 2012, available for download at http://software.intel.com/en-us/vcsource/tools/media-sdk
2. The OpenCL Specification (versions 1.0, 1.1, and 1.2), available at www.khronos.org under subsections OpenCL and Specs & Headers.
3. The Intel extension for DirectX 9 Media Sharing, in the extensions section for OpenCL version 1.0 at www.khronos.org under subsections OpenCL and Specs & Headers, listed as cl_intel_dx9_media_sharing.
4. OpenCL 1.2 Extensions Specification, in the Khronos standard extensions, including the DirectX 9 Media Sharing extension for OpenCL version 1.2, in the extensions section for version 1.0 at www.khronos.org under subsections OpenCL and Specs & Headers.
5. cl_dx9_media_sharing.h file, in the Khronos standard extensions, including the DirectX 9 Media Sharing extension for OpenCL version 1.2, in the extensions section for version 1.0 at www.khronos.org under subsections OpenCL and Specs & Headers.
6. Intel Media Software Development Kit Extensions for User-Defined Functions Version 1.3 (or later), distributed with the Intel Media SDK, located in <install-folder>\doc\mediasdkuser-man.pdf.
7. Sample code for integrating OpenCL into an Intel Media SDK (Intel QSV) pipeline, available at http://software.intel.com/en-us/vcsource/samples/opencl-and-intelmedia-sdk
8. Intel Media Software Development Kit Reference Manual Version 1.3 (or later), distributed with the Intel Media SDK, located in <install-folder>\doc\mediasdkman.pdf.

Notices INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. 
The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license. Intel and the Intel logo are trademarks of Intel Corporation in the US and/or other countries. OpenCL and the OpenCL logo are trademarks of Apple Inc and are used by permission by Khronos. Copyright 2012 Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.