Intel Integrated Performance Primitives Getting Started Tutorial. Legal Information

Similar documents
Intel Media SDK Library Distribution and Dispatching Process

Overview

How To Install An Intel System Studio 2015 For Windows* For Free On A Computer Or Mac Or Ipa (For Free)

Intel Platform Controller Hub EG20T

Intel Parallel Studio XE 2015 Cluster Edition

The Case for Rack Scale Architecture

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line

Using Windows* 7/Windows Embedded Standard 7* with Platforms Based on the Intel Atom Processor Z670/Z650 and Intel SM35 Express Chipset

The ROI from Optimizing Software Performance with Intel Parallel Studio XE

Upgrading Intel AMT 5.0 drivers to Linux kernel v2.6.31

Contents Overview and Product Contents

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor: Spread-Spectrum Clocking to Reduce EMI

Intel Parallel Studio XE 2015 Update 1 Cluster Edition

Specification Update. January 2014

Intel Media Server Studio Professional Edition for Windows* Server

Intel Perceptual Computing SDK My First C++ Application

Intel Retail Client Manager

Intel HTML5 Development Environment. Tutorial Test & Submit a Microsoft Windows Phone 8* App (BETA)

INTEL PARALLEL STUDIO XE EVALUATION GUIDE

ARM* to Intel Atom Microarchitecture - A Migration Study

Intel HTML5 Development Environment. Tutorial Building an Apple ios* Application Binary

Using GStreamer for hardware accelerated video decoding on Intel Atom Processor E6xx series

Intel EP80579 Software for Security Applications on Intel QuickAssist Technology Cryptographic API Reference

Intel Retail Client Manager Audience Analytics

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Cyber Security Framework: Intel s Implementation Tools & Approach

Intel Integrated Native Developer Experience (INDE): IDE Integration for Android*

Intel(R) IT Director User's Guide

Intel Service Assurance Administrator. Product Overview

Tutorial: Analyzing Energy Usage on an Android* Platform

Internal LVDS Dynamic Backlight Brightness Control

Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011

Intel HTML5 Development Environment Article Using the App Dev Center

Improve Fortran Code Quality with Static Analysis

Intel vpro Technology. How To Purchase and Install Symantec* Certificates for Intel AMT Remote Setup and Configuration

CT Bus Clock Fallback for Linux Operating Systems

Intel Small Business Advantage (Intel SBA) Release Notes for OEMs

Intel vpro Technology. How To Purchase and Install Go Daddy* Certificates for Intel AMT Remote Setup and Configuration

Intel Retail Client Manager

Intel VTune Amplifier XE 2011 Release Notes for Windows* OS

Intel HTML5 Development Environment. Article - Native Application Facebook* Integration

Intel Identity Protection Technology (IPT)

Contributed Article Program and Intel DPD Search Optimization Training. John McHugh and Steve Moore January 2012

Scaling up to Production

Eliminate Memory Errors and Improve Program Stability

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Unleashing Linux*-Based Secure Storage Performance with Intel AES New Instructions

Software Evaluation Guide for Autodesk 3ds Max 2009* and Enemy Territory: Quake Wars* Render a 3D character while playing a game

Development for Mobile Devices Tools from Intel, Platform of Your Choice!

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

Intel Solid-State Drive Pro 2500 Series Opal* Compatibility Guide

Software Solutions for Multi-Display Setups

PHYSICAL CORES V. ENHANCED THREADING SOFTWARE: PERFORMANCE EVALUATION WHITEPAPER

Monte Carlo Method for Stock Options Pricing Sample

Hetero Streams Library 1.0

Haswell Cryptographic Performance

Implementation and Performance of AES-NI in CyaSSL. Embedded SSL

Version Rev. 1.0

iscsi Quick-Connect Guide for Red Hat Linux

* * * Intel RealSense SDK Architecture

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Intel Parallel Studio XE 2016

Intel Data Migration Software

Processor Reorder Buffer (ROB) Timeout

Intel Identity Protection Technology Enabling improved user-friendly strong authentication in VASCO's latest generation solutions

Intel SSD 520 Series Specification Update

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1

Intel Media SDK Features in Microsoft Windows 7* Multi- Monitor Configurations on 2 nd Generation Intel Core Processor-Based Platforms

Intel Software Guard Extensions(Intel SGX) Carlos Rozas Intel Labs November 6, 2013

Intel and Qihoo 360 Internet Portal Datacenter - Big Data Storage Optimization Case Study

Improve Fortran Code Quality with Static Security Analysis (SSA)

with PKI Use Case Guide

Intel Technical Advisory

User Experience Reference Design

Device Management API for Windows* and Linux* Operating Systems

Intel Desktop Board DG43RK

IDE Integration for Android* Part of the Intel Integrated Native Developer Experience (Intel INDE) 1.5.7

Xen in Embedded Systems. Ray Kinsella Senior Software Engineer Embedded and Communications Group Intel Corporation

Software Evaluation Guide for Microsoft Office Excel 2010* and WinZip 15.5*

Intel Internet of Things (IoT) Developer Kit

Partition Alignment of Intel SSDs for Achieving Maximum Performance and Endurance Technical Brief February 2014

Extended Attributes and Transparent Encryption in Apache Hadoop

Intel Desktop Board D945GCPE Specification Update

Towards OpenMP Support in LLVM

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

Intel Desktop Board D945GCPE

Evaluating Intel Virtualization Technology FlexMigration with Multi-generation Intel Multi-core and Intel Dual-core Xeon Processors.

Cloud based Holdfast Electronic Sports Game Platform

System Event Log (SEL) Viewer User Guide

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.

MCA Enhancements in Future Intel Xeon Processors June 2013

Overview

Intel Desktop Board DG41BI

An Architecture to Deliver a Healthcare Dial-tone

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters

Accomplish Optimal I/O Performance on SAS 9.3 with

Transcription:

Intel Integrated Performance Primitives Getting Started Tutorial Legal Information

Intel Integrated Performance Primitives Getting Started Tutorial Contents Legal Information... 3 Chapter 1: Using Intel Integrated Performance Primitives for Application Optimization Learning Objectives... 5 Key Terms and Concepts... 5 The Game of Life Samples... 6 Workflow Steps to Create a Threaded Application using Intel(R) IPP... 6 Integrating Intel IPP into Your Project...6 Using Intel IPP Functions... 7 Threading Your Application...9 Game of Life - Parallelization Approaches... 10 Summary...12 2

By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http:// www.intel.com/design/literature.htm MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.264, MP3, DV, VC-1, MJPEG, AC3, AAC, G.711, G.722, G.722.1, G.722.2, AMRWB, Extended AMRWB (AMRWB+), G.167, G.168, G.169, G.723.1, G.726, G.728, G.729, G. 729.1, GSM AMR, GSM FR are international standards promoted by ISO, IEC, ITU, ETSI, 3GPP and other organizations. Implementations of these standards, or the standard enabled platforms may require licenses from various entities, including Intel Corporation. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel CoFluent, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vpro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vpro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries. 3

Intel Integrated Performance Primitives Getting Started Tutorial *Other names and brands may be claimed as the property of others. Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates. Copyright 2012-2014, Intel Corporation. All rights reserved. 4

Using Intel Integrated Performance Primitives for Application Optimization 1 Learning Objectives This tutorial shows how to use Intel Integrated Performance Primitives (Intel IPP) to add vectorization to an application, how to use some of the Intel IPP APIs, and provides an example of how to add threading to an Intel IPP application using Intel Threading Building Blocks (Intel TBB) and Intel Cilk Plus. After you complete this tutorial, you should be able to do the following: Add Intel IPP and Intel TBB to your Microsoft* Visual Studio* project. Integrate Intel IPP to your Microsoft* Visual Studio* project. Use Intel IPP to add optimization to the Game of Life application. Use Intel TBB to add threading to the Game of Life application. Use Intel Cilk Plus to add threading to the Game of Life application. Key Terms and Concepts Key Terms region of interest: Most Intel IPP image processing functions can operate not only on entire images but also on image areas. The image region of interest (ROI) is a rectangular area that may be either some part of the image or the whole image. channel: The Intel IPP image processing functions support only absolute color images in which each pixel is represented by its channel intensities. The data storage for an image can be either pixel-oriented or planeoriented (planar). For images in pixel order, all channel values for each pixel are clustered and stored consecutively, for example, RGBRGBRGB in case of an RGB image. The number of channels in a pixel-order image can be 1, 2, 3, or 4. For images in planar order, all image data for each channel is stored contiguously followed by the next channel, for example, RRR...GGG...BBB. Key Concept: Adding Intel IPP and Intel TBB to your project There is an easy way to add Intel IPP or Intel TBB compile paths and link libraries to your project. Intel IPP and Intel TBB functions can be directly linked to your application. They also can be embedded in a C++ DLL, which is called from the.net language. Key Concept: Using IPP functions Intel IPP image processing functions typically operate on regions of interest and image data in buffers. There are many Intel IPP image processing functions for manipulating buffers. This tutorial discusses several of them. Key Concept: Threading an application that uses Intel IPP Intel TBB or Intel Cilk Plus can be used to thread an application using Intel IPP. Intel IPP functions are thread safe so they can be called by multiple threads without causing synchronization bugs, as long as the data that is passed to them is protected. 5

1 Intel Integrated Performance Primitives Getting Started Tutorial Optimization Notice Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessordependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 The Game of Life Samples The Game of Life samples are a collection of Microsoft* Visual Studio* 2010 project, solution, and Visual Basic*.NET and C++ source files. You can find the Game of Life samples in the product samples directory: \Composer XE 2015\Samples\en_US\IPP\Life.zip. The algorithm can be implemented using different approaches. For example, you can optimize for speed using vectorization, or threading, or both. There are several different samples available. The samples illustrate different approaches to optimizing the Game of Life algorithm Intel IPP, Intel TBB, and the Intel C++ Compiler with Intel Cilk Plus. The Game of Life algorithm was first introduced by British mathematician John Conway in 1970 and is an example of a cellular automaton. If you are not familiar with the Game of Life algorithm, you can easily find information on the history of the algorithm and other ways to implement it on the worldwide web. Workflow Steps to Create a Threaded Application using Intel(R) IPP In each of the following sections you can find step-by-step instructions on how to convert the Game of Life application from a serial application to a parallel application. The serial project provides baseline performance. There are a number of technologies to take advantage of vector instructions in the processor and to thread an application; this example uses Intel IPP to provide vectorization, and Intel TBB or Intel Cilk Plus to thread the application. To help you see the code lines implementing vectorization and threading, the source code is annotated with # defines stating the action to take in the source code. The following are the workflow steps: Integrating Intel IPP into your project Using Intel IPP functions Threading your application Integrating Intel IPP into Your Project The paths to Intel IPP and Intel TBB include files and the names and paths of the libraries to link to must be available in the project. You can use the Intel Parallel Studio XE Composer Edition to make these files available in the project. To integrate Intel IPP with Microsoft* Visual Studio*: 1. Right-click a project name in the Solution Explorer and select project Properties menu item. 6

Using Intel Integrated Performance Primitives for Application Optimization 1 The project Property Pages dialog box opens. 2. In the Property Pages dialog box, select Intel Performance Libraries pane. ChooseSinglethreaded Static Library option in the Use IPP field and Yes in the Use TBB field. This tutorial uses high level threading by Intel TBB or Intel Cilk Plus, so you need to select the nonthreaded Intel IPP libraries to avoid overuse of threads. Using Intel IPP Functions Running Serial C Code Before running serial C code, check the following in the Life.vb: 1. The SerialLife function is uncommented. 2. The IPPCilkLife and IPPTBBLife functions are commented out. 3. The PictureBox1.Refresh() is uncommented. 7

1 Intel Integrated Performance Primitives Getting Started Tutorial 4. Run the application and check the application performance. Running Intel IPP Optimization Code Intel IPP takes advantage of the Intel Streaming SIMD Extensions (Inte SSE, SSE2, SSE3, SSE4, AVX, and so on) when possible, vectorizing and speeding up the operation. To run IPP optimization code, do the following in the Life.vb project: 1. Uncomment the IPPTBBLife function call. 2. Comment out the SerialLife function call. 3. Run the application and check the application performance. Intel IPP includes a dispatcher that automatically executes the optimal versions of IPP functions at run time, to match the specific processor type and instruction set. If you link your code to Intel IPP statically, you need to call ippinit() function before any other function calls. This function will automatically initialize the most appropriate optimized code for the runtime processor. If this function call is missing, the default minimum SIMD instruction sets (SSE2 on IA-32 and SSE3 on Intel 64 processors) are used. To see the difference between the code with minimal optimization and the code with appropriate CPU optimization: 1. In Lafe.vb, comment out the call of the PictureBox1.Refresh() function around line 68 to get the CPU usage numbers for the IPP optimized code. 8

Using Intel Integrated Performance Primitives for Application Optimization 1 2. Comment out the ippinit() function in the CheckIPPTBBLife.cpp that is around line 17. Run the application and check the performance. Then uncomment this function, run the application again and evaluate the performance difference. Learning Intel IPP Function Calls The Game of Life algorithm specifies that a living cell must have two or three neighbors to stay alive. The Intel IPP function ippicomparec_8u_c1r performs this test on a buffer of cell data instead of one cell at a time. The function ippicomparec_8u_c1r takes a buffer of 8 bit unsigned image data and compares pixels of the source image ROI to a given value specified and writes the results to a one-channel image of Ipp8u data. If the values, in this case 2, are equal, then the corresponding output pixel is set to an IPP_MAX_8U value; otherwise, it is set to 0. The image ROI is set to be the whole buffer. This buffer is used to store which cells have two neighbors. 1. In the IPPTBBLife.cpp, comment out the second function ippicomparec_8u_c1r that is around line 62. 2. In the Lafi.vb, uncomment PictureBox1.Refresh() around line 68. 3. Run the application. You will notice that the visual cellular activity decreases significantly. The ippidup_8u_c1c3r function copies a one-channel (gray scale) image to each channel of the threechannel (color) image. In this case this function copies the new Life cell state to the output buffer so it can be displayed. Run the program with the ippidup_8u_c1c3r function around line 76 commented out in the file IPPTBBLife.cpp. You will notice that the output window is blank. Threading Your Application Intel IPP functions are thread safe. It means that you can thread your application and call Intel IPP functions without synchronization problems if the same functions are called by different threads. Intel TBB and Intel Cilk Plus are two threading technologies that you can use to take advantage of multi-core architectures. Running Application without Threading 1. Comment out the #define USE_TBB statement located around line 5 in the file IPPTBBLife.cpp. This will cause the Game of Life application to run serially. 2. Launch the Windows Task Manager and click on the Performance tab. See the CPU Usage section. To get the CPU usage numbers for just the Game of Life application, without the cycles consumed for the display, comment out the call to PictureBox1.Refresh() around line 68 in Life.vb. 3. Run the application and note the CPU Usage percentage. 9

1 Intel Integrated Performance Primitives Getting Started Tutorial Threading Application with Intel TBB Do the following to see how you can use Intel TBB to thread the Game of Life application: 1. In the IPPTBBLife.cpp file, uncomment the #define USE_TBB statement located around line 5. This will thread the Life application using Intel TBB. 2. Run the application and note the CPU Usage percentage. Threading Application with Intel Cilk Plus Do the following to see how you can use Intel Cilk Plus to thread the Game of Life application: 1. In Life.vb, uncomment the IPPCilkLife function located around the line 51. 2. Comment out the IPPTBBLife and SerialLife functions. 3. Comment out the #define USE_CILK statement located around line 5 in the file IPPClkLife.cpp. This will cause the Game of Life application to run serially. 4. Launch the Windows Task Manager and click on the Performance tab. Run the application and note the CPU Usage percentage. 5. Uncomment the #define USE_CILK statement located around line 5 in the file IPPTBBLife.cpp. This will thread the Game of Life application using Intel Cilk Plus. 6. Run the application and note the CPU Usage percentage. The CPU Usage percentage should be significantly higher than in the serial case for a multi-core system, because Intel Cilk Plus will split the work across multiple processors. A higher CPU Usage percentage indicates more work is being done per unit of time, which means that application performance is better. Game of Life - Parallelization Approaches This section provides step-by-step instructions on how to convert the Game of Life application from a serial application to a parallel application using different approaches: Intel IPP and Intel TBB or Intel Cilk Plus. The serial project provides baseline performance. Follow the steps below to build the serial and threaded Game of Life applications. Serial Project 1. From the directory folder, double-click on the solution file Life.sln to open it in Microsoft* Visual Studio*. 2. Set CheckSerialLife as the StartUp project. 3. View the code for Life.vb in the Life project. 4. Comment the call to IPPTBBLife and IPPCilkLife. 5. Uncomment the call to SerialLife at approximately line 54. 6. Uncomment the call to PictureBox1.Refresh() at approximately line 68. 7. From the Build >... menu select Release to build a release project. 8. Run the Life.exe applicaton. 9. The display window looks similar to this: 10

Using Intel Integrated Performance Primitives for Application Optimization 1 10. Start Task Manager and record the CPU usage. This reading includes the CPU cycles for running the application and displaying the image. To get the CPU usage numbers for just the Game of Life application, without the cycles consumed for the display, perform the following steps: 1. Comment out the call to PictureBox1.Refresh(). 2. From the Build >... menu select Release to build a release project. 3. Run the Life.exe applicaton. 4. Start Task Manager and record the CPU usage. This provides a baseline for CPU usage. Vectorized and Threaded with Intel TBB Create vectorized version: 1. Set CheckIPPTBBLife as the StartUp project. 2. View the code for Life.vb in the Life project. 3. Uncomment the call to IPPTBBLife at approximately line 48. 4. Uncomment the call to PictureBox1.Refresh(). 5. Comment the call to SerialLife and IPPCilkLife. 6. Open the file IPPTBBLife.cpp. 7. Comment out the define for USE_TBB. 8. From the Build >... menu select Release to build a release project. 9. Run the Life.exe applicaton. 10. Start Task Manager and record the CPU usage. This reading includes the CPU cycles for running the application and displaying the image. To get the CPU usage numbers for just the Game of Life application, without the cycles consumed for the display, perform the following steps: 1. Comment out the call to PictureBox1.Refresh(). 2. From the Build >... menu select Release to build a release project. 3. Run the Life.exe applicaton. 4. Start Task Manager and record the CPU usage. This provides a baseline for CPU usage. Is there better CPU usage than in the serial version? Intel IPP provides good vectorization. It means it takes advantage of the Intel Streaming SIMD extensions, which increases CPU usage. 11

1 Intel Integrated Performance Primitives Getting Started Tutorial Parallelize the vectorized version: 1. Open the file IPPTBBLife.cpp. 2. Uncomment the define for USE_TBB. 3. From the Build >... menu select Release to build a release project. 4. Run the Life.exe applicaton. 5. Record the frames per second and compare to the Intel IPP only version. This provides a baseline for IPP plus TBB performance. To get the CPU usage numbers for just the Game of Life application, without the cycles consumed for the display, perform the following steps: 1. Comment out the call to PictureBox1.Refresh(). 2. From the Build >... menu select Release to build a release project. 3. Run the Life.exe applicaton. 4. Start Task Manager and record the CPU usage. This provides a baseline for Intel IPP plus Intel TBB CPU usage. Is there better CPU usage than in either the serial or Intel IPP only versions? Intel TBB provides effective threading so that all cores are used on the machine, which increases CPU usage. Vectorized and Threaded with Intel Cilk Plus Create vectorized and treaded version: 1. Set CheckCilkLife as the StartUp project. 2. View the code for Life.vb in the Life project. 3. Uncomment the call to IPPCilkLife at approximately line 51. 4. Uncomment the call to PictureBox1.Refresh(). 5. Comment the call to SerialLife and IPPTBBLife. 6. Open the file IPPCilkLife.cpp. 7. Uncomment the define for USE_CILK. 8. From the Build >... menu select Release to build a release project. 9. Run the Life.exe applicaton. 10. Start Task Manager and record the CPU usage. This reading includes the CPU cycles for running the application and displaying the image. To get the CPU usage numbers for just the Game of Life application, without the cycles consumed for the display, perform the following steps: 1. Comment out the call to PictureBox1.Refresh(). 2. From the Build >... menu select Release to build a release project. 3. Run the Life.exe applicaton. 4. Start Task Manager and record the CPU usage. This provides a baseline for CPU usage. Is there better CPU usage than in the serial version? Intel IPP plus Intel Cilk Plus provides good vectorization. This means it takes advantage of Intel Streaming SIMD extensions, which increases CPU usage. Intel Cilk Plus provides threading so that more cores are used on the machine, which increases CPU usage. Summary You have completed the tutorial. Here are a few things that you have learned: How to add Intel IPP and Intel TBB or Intel Cilk Plus support to your Microsoft Visual Studio* project. 12

Using Intel Integrated Performance Primitives for Application Optimization 1 How to use Intel TBB and Intel Cilk Plus with Intel IPP to thread your application. How to record the CPU usage. How to use Intel IPP to speed up application code. 13