Median Filter. Intel OpenCL SDK Sample Documentation. Copyright Intel Corporation. All Rights Reserved. Document Number: US

Similar documents
Intel Media SDK Library Distribution and Dispatching Process

Intel Perceptual Computing SDK My First C++ Application

The ROI from Optimizing Software Performance with Intel Parallel Studio XE

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Media Server Studio Professional Edition for Windows* Server

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Intel Solid-State Drive Pro 2500 Series Opal* Compatibility Guide

* * * Intel RealSense SDK Architecture

Intel Integrated Native Developer Experience (INDE): IDE Integration for Android*

Intel SSD 520 Series Specification Update

Software Solutions for Multi-Display Setups

Cloud based Holdfast Electronic Sports Game Platform

Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study

with PKI Use Case Guide

Eliminate Memory Errors and Improve Program Stability

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms

Intel Desktop Board DG43RK

Intel HTML5 Development Environment. Article - Native Application Facebook* Integration

Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011

Specification Update. January 2014

Intel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp

IDE Integration for Android* Part of the Intel Integrated Native Developer Experience (Intel INDE) 1.5.7

Towards OpenMP Support in LLVM

Head-Coupled Perspective

The Case for Rack Scale Architecture

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Intel Desktop Board D945GCPE

Intel Desktop Board DP55WB

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line

Scaling up to Production

Intel HTML5 Development Environment. Tutorial Building an Apple ios* Application Binary

Intel Desktop Board DG41BI

Intel Platform Controller Hub EG20T

Fast, Low-Overhead Encryption for Apache Hadoop*

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

Intel HTML5 Development Environment. Tutorial Test & Submit a Microsoft Windows Phone 8* App (BETA)

Intel Desktop Board DQ35JO

Intel Desktop Board DQ43AP

Intel Service Assurance Administrator. Product Overview

Software Evaluation Guide for Autodesk 3ds Max 2009* and Enemy Territory: Quake Wars* Render a 3D character while playing a game

Intel Desktop Board D945GCPE Specification Update

Intelligent Business Operations

Intel Desktop Board DG41TY

Intel Desktop Board DG31PR

iscsi Quick-Connect Guide for Red Hat Linux

Intel Compiler Code Coverage Tool

Keys to node-level performance analysis and threading in HPC applications

INTEL PARALLEL STUDIO EVALUATION GUIDE. Intel Cilk Plus: A Simple Path to Parallelism

Intel Solid-State Drive Data Center Tool User Guide Version 1.1

Intel Modular Server System MFSYS25

Intel Desktop Board DG33TL

Intel Retail Client Manager Audience Analytics

Enhancing McAfee Endpoint Encryption * Software With Intel AES-NI Hardware- Based Acceleration

Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Cryptographic API Reference

Intel Platform and Big Data: Making big data work for you.

Intel Desktop Board DG41WV

Hetero Streams Library 1.0

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0*

WHITE PAPER. LVDS Flat Panel Display Interface on Intel Desktop Boards. July 2009 Order Number: E

Haswell Cryptographic Performance

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Spread-Spectrum Clocking to Reduce EMI

Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms

Improve Fortran Code Quality with Static Analysis

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

System Event Log (SEL) Viewer User Guide

Extended Attributes and Transparent Encryption in Apache Hadoop

Elemental functions: Writing data-parallel code in C/C++ using Intel Cilk Plus

Accelerating Business Intelligence with Large-Scale System Memory

Intel Desktop Board D945GCL

Intel Desktop Board DQ965GF

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

High Performance Computing and Big Data: The coming wave.

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters

Intel Core TM i7-660ue, i7-620le/ue, i7-610e, i5-520e, i3-330e and Intel Celeron Processor P4505, U3405 Series

Resetting USB drive using Windows Diskpart command

Accomplish Optimal I/O Performance on SAS 9.3 with

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1

Intel vpro Technology. How To Purchase and Install Go Daddy* Certificates for Intel AMT Remote Setup and Configuration

Intel System Event Log (SEL) Viewer Utility

Intel Identity Protection Technology (IPT)

Ensure that the AMD APP SDK Samples package has been installed before proceeding.

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms

This guide explains how to install an Intel Solid-State Drive (Intel SSD) in a SATA-based desktop or notebook computer.

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum

Intel vpro Technology. How To Purchase and Install Symantec* Certificates for Intel AMT Remote Setup and Configuration

2013 Intel Corporation

Evaluating Intel Virtualization Technology FlexMigration with Multi-generation Intel Multi-core and Intel Dual-core Xeon Processors.

A Superior Hardware Platform for Server Virtualization

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Version Rev. 1.0

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Intel Desktop Board DP43BF

Intel Desktop Board DG965RY

Accelerating Business Intelligence with Large-Scale System Memory

Transcription:

Intel OpenCL SDK Sample Documentation Copyright 2010 2011 Intel Corporation All Rights Reserved Document Number: 325264-003US Revision: 1.3 World Wide Web: http://www.intel.com Document Number: 325264-003US

Disclaimer and Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to http://www.intel.com/design/literature.htm. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel, Intel logo, Intel Core, VTune, Xeon are trademarks of Intel Corporation in the U.S. and other countries. * Other names and brands may be claimed as the property of others. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Copyright 2010-2011 Intel Corporation. All rights reserved. 2 Document Number: 325264-003US

Optimization Notice Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the Intel Compiler User and Reference Guides under Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2), Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3 (SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #20110307 Intel OpenCL SDK Sample Documentation 3

Contents About Median Filter Sample...5 Path...5 Introduction...6 Motivation...6 Algorithm...6 OpenCL Implementation...7 Pixels Load...8 Partial Sort...8 Understanding OpenCL Performance Characteristics...8 Benefits of Implicit Compiler Vectorization...8 Work-group Size Considerations...8 Limitations...8 Future Work and Enhancements...8 Project Structure...9 APIs Used...9 Reference (Native) Implementation...9 Controlling the Sample... 10 References... 10 4 Document Number: 325264-003US

About Median Filter Sample Median Filter sample demonstrates how to use median filter in OpenCL. This implementation optimizes filtration process using implicit Single Instruction Multiple Data (SIMD) code vectorization performed by build-in OpenCL compiler vectorizer. Data-level parallelism of the underlying algorithm results in additional performance gain. The sample improves the performance of the method and reduces data loads. The following figure illustrates median filtering [1]: Path Location Executable <INSTALL_DIR>samples\MedianFilter Win32\Release\MedianFilter.exe 32-bit executable x64\release\medianfilter.exe 64-bit executable Win32\Debug\MedianFilter.exe 32-bit debug executable x64\debug\medianfilter.exe 64- bit debug executable Intel OpenCL SDK Sample Documentation 5

Introduction Motivation Median filter is a non-linear filter that removes noise from an image or a signal. One of the advantages of this method is that it can preserve sharp edges while removing noise. To remove noise, the median filter algorithm processes element patterns of the input image or signal. For each pattern of neighboring elements called window or support, the algorithm finds the median value that is further used as filtering result for the central element of the window. Algorithm In general, median filter effect requires a significant number of calculations and color buffer accesses. This sample implementation minimizes color buffer accesses, removes synchronization points, and uses data-level parallelism. This results in significant performance gain and better result quality as compared to applications that use the same filtration technique optimized for traditional GPU architectures. For an example, please see the algorithm implemented for traditional GPUs described in [2]. This sample demonstrates a CPU-optimized implementation of 2D image median filtration, showing how to: implement calculation kernels using OpenCL C99 parallelize the kernels by running several work-groups in parallel organize host-device data exchange with final image storage on the hard drive. The median filter processes each pixel in the image and compares it to its neighbors to determine whether this pixel can represent the window entries. It replaces the central pixel value with the median of the pixel values in the window. To define the median of a window, sort the entries of the window numerically. For windows with an odd number of entries, the median is the value of the middle entry. For windows with an even number of entries, several options are possible. The following figure illustrates a sample calculation of the median value for a pixel neighborhood: 6 Document Number: 325264-003US

This example illustrates a 3 3 square window. As the central pixel value of 150 does not represent the surrounding values well, it is replaced with the median value of 124. Please note that larger windows produce greater smoothing. The advantage of the median filtering is that unrepresentative pixels in a window cannot have significant effect on the median value. Since the median value must be an actual value of one of the window entries, the median filter does not create new unrealistic pixel values when the filter processes an edge region. Thus, median filtering permits to preserve sharp edges. For details, see [3]. OpenCL Implementation This sample applies the following algorithm stages to a 2D image: 3x3 pixels patch load partial bitonic sorting result storage in 4-channel 32-bit integer format. In this implementation, MedianFilterBitonic OpenCL kernel of MedianFilter.cl file uses partial bitonic sorting to perform median filtering. Every input array row corresponds to a unique global ID that the kernel uses for their identification. The full median filtering sequence consists of OpenCL kernel call performed in ExecuteMedianFilterKernel() function of MedianFilter.cpp file. This algorithm implementation consists of the Pixels Load and Partial Sort parts. Intel OpenCL SDK Sample Documentation 7

Pixels Load Partial Sort This sample uses 32-bit red, green, blue, and alpha (RGBA) pixels, with 8-bit unsigned char values representing pixel individual color channel. For further processing, 3x3=9 pixels are preloaded into temporary storage. This sample uses an algorithm that operates on 3x3 box-shaped support and performs partial sort to find the fifth sorted value out of 3x3 = 9 values. Partial sort performs 19 MIN and 20 MAX operations to find median value for 3x3 support for each color channel. The algorithm operates with 32-bit unsigned integer values. For details on this algorithm, please see [4] and [5]. Understanding OpenCL Performance Characteristics Benefits of Implicit Compiler Vectorization The kernel structure enables implicit vectorization performed by Intel OpenCL compiler when work-group size is multiple of four. Consequently, you can achieve ~2x speedup for the current versions of kernel and vectorizer. Work-group Size Considerations Limitations You can specify any work-group size for the kernel. However, OpenCL implementation of the median filter achieves peak performance for 1024x1024 two-dimensional array with work-group size ranging from 4 to 64 elements. The current version of the sample requires scalar data types and a fixed kernel size (3x3 median filter). Future Work and Enhancements The sample performs all calculations in integer values. Each image pixel consists of four 32-bit integer values representing RGBA image channels. You can improve this sample performance by introducing the following: 8 Document Number: 325264-003US

Project Structure APIs Used arbitrary window size alternative implementation of sorting part (binary search) further filter extension (bilateral filter) more compact data representation (uchar). This sample project has the following structure: MedianFilter.cpp - the host code, with OpenCL initialization and processing functions. This source file also includes native implementation of the algorithm. MedianFilter.cl source code of the OpenCL median filter kernel. MedianFilter.vcproj - Microsoft Visual Studio* 2008 project file. This sample uses the following APIs: clkreatekernel clcreatecontextfromtype clgetcontextinfo clcreatecommandqueue clcreateprogramwithsource clbuildprogram clcreatebuffer clsetkernelarg clenqueuendrangekernel clenqueuereadbuffer clreleasememobject clreleasekernel clreleaseprogram clreleasecommandqueue clreleasecontext. Reference (Native) Implementation Reference implementation is done in ExecuteMedianFilterReference() routine of MedianFilter.cpp file. This is single-threaded code that performs exactly Intel OpenCL SDK Sample Documentation 9

the same median filtering sequence as OpenCL implementation, but uses conventional C nested loop. Controlling the Sample References The sample executable is a console application. You can set the input array size using command line arguments. If you do not specify the array size, the sample uses the default value of 1024x1024=1048576 items. -h command line argument prints help information; -h <height> command line argument setups input array size height; -w <width> command line argument setups input array size width. [1] http://tracer.lcc.uma.es/problems/mfp/mfp.html [2]http://developer.download.nvidia.com/compute/cuda/3_0/sdk/website/OpenCL/we bsite/samples.html [3] http://homepages.inf.ed.ac.uk/rbf/hipr2/median.htm [4] http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm [5] Frederick M. Waltz, Ralf Hack, and Bruce G. Batchelor. Fast, efficient algorithms for 3x3 ranked filters using finite-state machines. http://www.engin.umd.umich.edu/~jwvm/ece581/18_rankedf.pdf 10 Document Number: 325264-003US