OpenCL on Intel Iris Graphics

Size: px
Start display at page:

Download "OpenCL on Intel Iris Graphics"

Transcription

1 Copyright Khronos Group, Page 1 OpenCL on Intel Iris Graphics SIGGRAPH 2013 July 2013 Presenter: Adam Lake Content: Ben Ashbaugh, Arnon Peleg, others

2 Copyright Khronos Group, Page 2 Agenda Intel Iris Graphics Architecture Overview Tools to help get the most from OpenCL on Intel CPU and GPU Additional Resources

3 Sub Slice 1 Sub Slice 3 Ring Bus / LLC / Memory Sub Slice 0 Sub Slice 2 Intel Iris Graphics Architecture Overview Command Streamer (CS) Vertex Fetch (VF) Video Front End (VFE) Video Quality Engine Multi-Format CODEC Blitter Display Vertex Shader (VS) Hull Shader (HS) L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Tessellator Domain Shader (DS) Thread Dispatch Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Geometry Shader (GS) Stream Out (SOL) Clip/Setup L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Slice 0 Slice 1 3 Copyright Khronos Group, Page 3

4 Sub Slice 1 Sub Slice 3 Ring Bus / LLC / Memory Sub Slice 0 Sub Slice 2 Intel Iris Graphics Architecture Overview Command Streamer (CS) Vertex Fetch (VF) Video Front End (VFE) Global Assets Command Video Quality Streamer Multi-Format Engine CODEC Thread Dispatch Blitter Display Vertex Shader (VS) Hull Shader (HS) L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Tessellator Domain Shader (DS) Thread Dispatch Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Geometry Shader (GS) Stream Out (SOL) Clip/Setup L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Slice 0 Slice 1 4 Copyright Khronos Group, Page 4

5 Sub Slice 1 Sub Slice 3 Ring Bus / LLC / Memory Sub Slice 0 Sub Slice 2 Intel Iris Graphics Architecture Overview Command Streamer (CS) Vertex Fetch (VF) Video Front End (VFE) Video Quality Engine Multi-Format CODEC Blitter Display Vertex Shader (VS) Hull Shader (HS) Tessellator Sub Slice Execution Units L1 IC$ s and Data Port Instruction and Texture Caches 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Domain Shader (DS) Thread Dispatch Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Geometry Shader (GS) Stream Out (SOL) Clip/Setup L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Slice 0 Slice 1 5 Copyright Khronos Group, Page 5

6 Sub Slice 1 Sub Slice 3 Ring Bus / LLC / Memory Sub Slice 0 Sub Slice 2 Intel Iris Graphics Architecture Overview Command Streamer (CS) Vertex Fetch (VF) Vertex Shader (VS) Hull Shader (HS) Video Front End (VFE) Video Quality Engine L1 IC$ Multi-Format CODEC Blitter 3D Media Data Port Slice DisplayCommon L3 Cache Shared Local Memory Barriers Tex$ L1 IC$ 3D Media Data Port Tex$ Tessellator Domain Shader (DS) Thread Dispatch Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ Geometry Shader (GS) Stream Out (SOL) Clip/Setup L1 IC$ 3D Media Data Port Tex$ L1 IC$ 3D Media Data Port Tex$ Slice 0 Slice 1 6 Copyright Khronos Group, Page 6

7 Intel Iris Graphics Architecture Building Blocks OpenCL* Kernels run on an Execution Unit () Each is a Multi-Threaded SIMD Processor Up to 7 threads per x 8 x 32-bit registers per thread Up to 8, 16, or 32 OpenCL* work items per thread (compiler-controlled) - SIMD8, SIMD16, SIMD32 - SIMD8 More Registers - SIMD16 and SIMD32 Better Efficiency Thread 0 Thread 2 Thread 4 Thread 6 Thread ` 1 Thread 3 Thread 5 7 Copyright Khronos Group, Page 7

8 Sub Slice Intel Iris Graphics Architecture Building Blocks OpenCL* Work Groups run on a Sub Slice - 10 s per Sub Slice - Texture (Images) L1 IC$ 3D Media Data Port Tex$ - Data Port (Buffers) - Instruction and Texture Caches OpenCL* Work Groups may run on multiple threads, on multiple s! 8 Copyright Khronos Group, Page 8

9 Intel Iris Graphics Architecture Building Blocks Two Sub Slices make a Slice Shared Resources: Slice Common - L3 Cache + Shared Local Memory Sub Slice L1 IC$ 3D Media Data Port Tex$ - Barriers Intel Iris Graphics has Two Slices How many work items in flight at once? Rasterizer / Depth L3$ Pixel Ops Render$ Depth$ - 2 slides each with 2 subslices = 4 Sub Slices - 4 subslices x 10 s/subslice = 40 s - Up to 40 s x 7 threads/ = 280 threads - Up to 8960 OpenCL* work items in flight! Sub Slice L1 IC$ 3D Media Data Port Tex$ Slice 9 Copyright Khronos Group, Page 9

10 Intel Iris Graphics Cache Hierarchy EDRAM (non-inclusive victim cache) 128MB/package (Intel Iris Pro 5200) images L1 L2 L3 LLC DRAM buffers 256KB/slice 2-8MB/package (shared w/ CPU) 10 Copyright Khronos Group, Page 10

11 Tools and Additional Resources Copyright Khronos Group, Page 11

12 Occupancy Intel VTune Amplifier XE Copyright Khronos Group, Page 12

13 Additional Resources Intel SDK for OpenCL* Applications Offline kernel builder, Debugger for CPU in MSVC Intel OpenCL* Optimization Guide Intel VTune Amplifier XE 2013 Intel Graphics Performance Analyzers - OpenCL Tuning Intel Linux Graphics hardware Bspec: SIGGRAPH 2013: - Faster, Better Pixels on the Go and in the Cloud with OpenCL* on Intel Architecture - Arnon Peleg - Optimizing OpenCL* Applications for Intel Iris Graphics - Ben Ashbaugh - Backup of this deck contains more from Ben Ashbaugh s excellent SIGGRAPH 2012 talk on optimizing for Intel Iris Graphics! 13 Copyright Khronos Group, Page 13

14 Questions Copyright Khronos Group, Page 14

15 Copyright Khronos Group, Page 15 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel s current plan of record product roadmaps. Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to Intel, Intel Inside, the Intel logo, Centrino, Intel Core, Intel Atom, Pentium, and Ultrabook are trademarks of Intel Corporation in the United States and other countries

16 backup Copyright Khronos Group, Page 16

17 Scheduling s for maximum occupancy Copyright Khronos Group, Page 17

18 Occupancy Goal: Use All Execution Unit Resources This is harder than it sounds! Many factors to consider - Launch Enough Work - More threads means better latency coverage to keep an active - One thread sufficient to prevent an from going idle - Too few threads can result in an being stalled 18 Copyright Khronos Group, Page 18

19 Occupancy Goal: Use All Execution Unit Resources - Don t Waste SIMD Lanes - Use an optimal Local Work Size - Good: Query for compiled SIMD size: CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE - Occasionally Helpful: Compile for a specific local work size (8, 16, or 32): attribute ((reqd_work_group_size(x, Y, Z))) - Best: Let the driver pick (Local Work Size == NULL) Ideal for kernels with no barriers or shared local memory 19 Copyright Khronos Group, Page 19

20 Occupancy (continued ) Barriers - 16 barriers per sub slice - Can be a limiting factor for very small local work groups Shared Local Memory - 64KB shared local memory per sub slice - Can be a limiting factor for kernels that use lots of shared local memory 20 Copyright Khronos Group, Page 20

21 Optimizing OpenCL* Applications for Intel Iris Graphics Ben Ashbaugh (Intel) Copyright Khronos Group, Page 21

22 Agenda Understanding Occupancy - How Intel Iris Graphics executes OpenCL* Kernels Copyright Khronos Group, Page 22

23 How Intel Iris Graphics Runs OpenCL* Copyright Khronos Group, Page 23

24 How Intel Iris Graphics Runs OpenCL* 1. Divide Into Work Groups 24 Copyright Khronos Group, Page 24

25 How Intel Iris Graphics Runs OpenCL* 2. Divide Each Work Group Into Threads 25 Copyright Khronos Group, Page 25

26 Sub Slice How Intel Iris Graphics Runs OpenCL* L1 IC$ 3D Media Data Port Tex$ 3. Launch Threads for the Work Group Onto a Sub Slice - Repeat for each Work Group - Must have enough room in the Sub Slice for all threads for the Work Group - Not enough room in any Sub Slice threads must wait 26 Copyright Khronos Group, Page 26

27 How Intel Iris Graphics Runs OpenCL* 1. Divide Into Work Groups 27 Copyright Khronos Group, Page 27

28 How Intel Iris Graphics Runs OpenCL* 2. Divide Each Work Group Into Threads 28 Copyright Khronos Group, Page 28

29 Sub Slice How Intel Iris Graphics Runs OpenCL* L1 IC$ 3D Media Data Port Tex$ 2. Launch Threads for the Work Group Onto a Sub Slice - Repeat for each Work Group - Must have enough room in the Sub Slice for all threads for the Work Group - Not enough room in any Sub Slice threads must wait 29 Copyright Khronos Group, Page 29

30 Occupancy Goal: Use All Machine Resources This is harder than it sounds! Many factors to consider 1. Launch Enough Work - One thread sufficient to prevent an from going idle - Too few threads can result in an being stalled - More threads better latency coverage keeps an active 30 Copyright Khronos Group, Page 30

31 Occupancy Intel VTune Amplifier XE Copyright Khronos Group, Page 31

32 Agenda - Memory Matters - Host to Device Copyright Khronos Group, Page 32

33 Optimizing Host to Device Transfers Host (CPU) and Device (GPU) share the same physical memory For OpenCL* buffers: - No transfer needed (zero copy)! - Allocate system memory aligned to a cache line (64 bytes) - Create buffer with system memory pointer and CL_MEM_USE_HOST_PTR - Use clenqueuemapbuffer() to access data For OpenCL* images: - Currently tiled in device memory transfer required 33 Copyright Khronos Group, Page 33

34 Operating on Buffers as Images Intel Iris Graphics supports cl_khr_image2d_from_buffer - New OpenCL* 1.2 Extension - Treat data as a buffer for some kernels, as an image for others - Some restrictions for zero copy: buffer size, image pitch Buffer Image 0x123 0x456 0x Copyright Khronos Group, Page 34

35 Interop with Graphics and Media APIs / SDKs Intel Iris Graphics supports many Graphics and Media interop extensions: - cl_khr_dx9_media_sharing (includes DXVA for Intel Media SDK) - cl_khr_d3d10_sharing - cl_khr_d3d11_sharing - cl_khr_gl_sharing - cl_khr_gl_depth_images - cl_khr_gl_event - cl_khr_gl_msaa_sharing Use Graphics API / SDK assets in OpenCL* with no copies! 35 Copyright Khronos Group, Page 35

36 Agenda - Memory Matters - - Device Access - 36 Copyright Khronos Group, Page 36

37 Intel Iris Graphics Cache Hierarchy EDRAM (non-inclusive victim cache) 128MB/package (Intel Iris Pro 5200) images L1 L2 L3 LLC DRAM buffers 256KB/slice 2-8MB/package (shared w/ CPU) 37 Copyright Khronos Group, Page 37

38 global and constant Memory Global Memory Accesses go through the L3 Cache L3 Cache Line is 64 bytes thread accesses to the same L3 Cache Line are collapsed - Order of data within cache line does not matter - Bandwidth determined by number of cache lines accessed - Maximum Bandwidth: 64 bytes / clock / sub slice Good: Load at least 32-bits of data at a time, starting from a 32-bit aligned address Best: Load 4 x 32-bits of data at a time, starting from a cache line aligned address - Loading more than 4 x 32-bits of data is not beneficial 38 Copyright Khronos Group, Page 38

39 Global and Constant Memory Access Examples 1. x = data[ get_global_id(0) ] - One cache line, full bandwidth 2. x = data[ n get_global_id(0) ] - Reverse order, full bandwidth 3. x = data[ get_global_id(0) + 1 ] - Offset, two cache lines, half bandwidth 4. x = data[ get_global_id(0) * 2 ] - Strided, half bandwidth 5. x = data[ get_global_id(0) * 16 ] - Very strided, worst-case Global ID: Global ID: Global ID: Global ID: Global ID: Cache Line n Cache Line n Cache Line n - 1 Cache Line n Cache Line n Cache Line n Cache Line n Cache Line n Cache Line n Cache Line n + 1 Cache Line n Copyright Khronos Group, Page 39

40 Copyright Khronos Group, Page 40 local Memory Accesses Local Memory Accesses also go through the L3 Cache! Key Difference: Local Memory is Banked - Banked at a DWORD granularity, 16 banks - Bandwidth determined by number of bank conflicts - Maximum Bandwidth: Still 64 bytes / clock / sub slice Supports more access patterns with full bandwidth than Global Memory - No bank conflicts full bandwidth - Reading from the same address in a bank full bandwidth

41 Local Memory Access Examples 1. x = data[ get_global_id(0) + 1 ] - Unique banks, full bandwidth 2. x = data[ get_global_id(0) & ~1 Bank: ] - Same address read, full bandwidth 3. x = data[ get_global_id(0) * 2 ] - Strided, half bandwidth 4. x = data[ get_global_id(0) * 16 ] - Very strided, worst-case 5. x = data[ get_global_id(0) * 17 ] - Full bandwidth! Bank: Global ID: Global ID: Bank: Global ID: Bank: Global ID: Bank: Global ID: Copyright Khronos Group, Page 41

42 Thread n+1 Thread n-1 Thread n Copyright Khronos Group, Page 42 private Memory Compiler can usually allocate Private Memory in the Register File Work Item 0 Work Item 1 private int a[100] - Even if Private Memory is dynamically indexed - Good Performance Fallback: Private Memory allocated in Global Memory - Accesses are very strided Work Item n Work Item 0 Work Item 1 Work Item n Work Item 0 Work Item 1 private int b[100] private int c[200] - Bad Performance Work Item n

43 Agenda Compute Characteristics - Maximizing GFlops 43 Copyright Khronos Group, Page 43

44 ISA SIMT ISA with Predication and Branching Divergent code executes both branches Reduced SIMD Efficiency this(); if ( x ) else that(); another(); finish(); time Example: x sometimes true SIMD lane time Example: x never true SIMD lane 44 Copyright Khronos Group, Page 44

45 Compute GFlops s have 2 x 4-wide vector ALUs Second ALU has limitations: - Subset of instructions: add, mov, mad, mul, cmp - Instruction must come from another thread - Only float operands! Peak GFlops: #s x ( 2 x 4-wide ALUs ) x ( MUL + ADD ) x Clock Rate For Intel Iris Pro 5200: 40 x 8 x 2 x 1.3 = 832 GFlops! Add Intel Core Host Processor >1TFlop! 45 Copyright Khronos Group, Page 45

46 Maximizing Compute Performance Use mad() / fma(): Either explicitly with built-ins, or via -cl-mad-enable Use floats wherever possible to maximize co-issue Avoid long and size_t data types - Prefer float over int, if possible - Using short data types may improve performance Trade accuracy for speed: native built-ins, -cl-fastrelaxed-math - Often good enough for graphics 46 Copyright Khronos Group, Page 46

47 Agenda Summary / Questions 47 Copyright Khronos Group, Page 47

48 Summary Maximize Occupancy - Choose a Good Local Work Size - Or, Let the Driver Choose (Local Work Size == NULL) Avoid Host-to-Device Transfers - Create Buffers with CL_MEM_USE_HOST_PTR Access Device Memory Efficiently - Minimize Cache Lines for global Memory - Minimize Bank Conflicts for local Memory Maximize Compute - Avoid Divergent Branches - Use mad / fma and float Data When Possible Copyright Khronos Group, Page 48

49 Copyright Khronos Group, Page 49 Questions / Acknowledgements This presentation would not have been possible without material and review comments from many people Thank you! Murali Sundaresan, Sushma Rao, Aaron Kunze, Tom Craver, Brijender Bharti, Rami Jiossy, Michal Mrozek, Jay Rao, Pavan Lanka, Adam Lake, Arnon Peleg, Raun Krisch, Berna Adalier

Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture

Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture Maximize Application Performance On the Go and In the Cloud with OpenCL* on Intel Architecture Arnon Peleg (Intel) Ben Ashbaugh (Intel) Dave Helmly (Adobe) Legal INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual Overview Metrics Monitor is part of Intel Media Server Studio 2015 for Linux Server. Metrics Monitor is a user space shared library

More information

Intel Media SDK Library Distribution and Dispatching Process

Intel Media SDK Library Distribution and Dispatching Process Intel Media SDK Library Distribution and Dispatching Process Overview Dispatching Procedure Software Libraries Platform-Specific Libraries Legal Information Overview This document describes the Intel Media

More information

The Transition to PCI Express* for Client SSDs

The Transition to PCI Express* for Client SSDs The Transition to PCI Express* for Client SSDs Amber Huffman Senior Principal Engineer Intel Santa Clara, CA 1 *Other names and brands may be claimed as the property of others. Legal Notices and Disclaimers

More information

Specification Update. January 2014

Specification Update. January 2014 Intel Embedded Media and Graphics Driver v36.15.0 (32-bit) & v3.15.0 (64-bit) for Intel Processor E3800 Product Family/Intel Celeron Processor * Release Specification Update January 2014 Notice: The Intel

More information

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum

Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum Intel Core TM i3 Processor Series Embedded Application Power Guideline Addendum July 2012 Document Number: 327705-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

2013 Intel Corporation

2013 Intel Corporation 2013 Intel Corporation Intel Open Source Graphics Programmer s Reference Manual (PRM) for the 2013 Intel Core Processor Family, including Intel HD Graphics, Intel Iris Graphics and Intel Iris Pro Graphics

More information

Intel HTML5 Development Environment. Tutorial Test & Submit a Microsoft Windows Phone 8* App (BETA)

Intel HTML5 Development Environment. Tutorial Test & Submit a Microsoft Windows Phone 8* App (BETA) Intel HTML5 Development Environment Tutorial Test & Submit a Microsoft Windows Phone 8* App v1.00 : 04.09.2013 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.

More information

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze

Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Power Benefits Using Intel Quick Sync Video H.264 Codec With Sorenson Squeeze Whitepaper December 2012 Anita Banerjee Contents Introduction... 3 Sorenson Squeeze... 4 Intel QSV H.264... 5 Power Performance...

More information

Intel Data Direct I/O Technology (Intel DDIO): A Primer >

Intel Data Direct I/O Technology (Intel DDIO): A Primer > Intel Data Direct I/O Technology (Intel DDIO): A Primer > Technical Brief February 2012 Revision 1.0 Legal Statements INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

The Case for Rack Scale Architecture

The Case for Rack Scale Architecture The Case for Rack Scale Architecture An introduction to the next generation of Software Defined Infrastructure Intel Data Center Group Pooled System Top of Rack Switch POD Manager Network CPU/Memory Storage

More information

Intel Service Assurance Administrator. Product Overview

Intel Service Assurance Administrator. Product Overview Intel Service Assurance Administrator Product Overview Running Enterprise Workloads in the Cloud Enterprise IT wants to Start a private cloud initiative to service internal enterprise customers Find an

More information

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic

More information

Intel X38 Express Chipset Memory Technology and Configuration Guide

Intel X38 Express Chipset Memory Technology and Configuration Guide Intel X38 Express Chipset Memory Technology and Configuration Guide White Paper January 2008 Document Number: 318469-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

Intel 965 Express Chipset Family Memory Technology and Configuration Guide

Intel 965 Express Chipset Family Memory Technology and Configuration Guide Intel 965 Express Chipset Family Memory Technology and Configuration Guide White Paper - For the Intel 82Q965, 82Q963, 82G965 Graphics and Memory Controller Hub (GMCH) and Intel 82P965 Memory Controller

More information

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010

GPU Architecture. An OpenCL Programmer s Introduction. Lee Howes November 3, 2010 GPU Architecture An OpenCL Programmer s Introduction Lee Howes November 3, 2010 The aim of this webinar To provide a general background to modern GPU architectures To place the AMD GPU designs in context:

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service. Eddie Dong, Tao Hong, Xiaowei Yang

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service. Eddie Dong, Tao Hong, Xiaowei Yang COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Tao Hong, Xiaowei Yang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO

More information

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.

Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1. Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V Technical Brief v1.0 September 2012 2 Intel Ethernet and Configuring SR-IOV on Windows*

More information

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009

AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009 AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

VNF & Performance: A practical approach

VNF & Performance: A practical approach VNF & Performance: A practical approach Luc Provoost Engineering Manager, Network Product Group Intel Corporation SDN and NFV are Forces of Change One Application Per System Many Applications Per Virtual

More information

Introduction to GPU Architecture

Introduction to GPU Architecture Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three

More information

Intel HTML5 Development Environment. Tutorial Building an Apple ios* Application Binary

Intel HTML5 Development Environment. Tutorial Building an Apple ios* Application Binary Intel HTML5 Development Environment Tutorial Building an Apple ios* Application Binary V1.02 : 08.08.2013 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO

More information

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp.

Vendor Update Intel 49 th IDC HPC User Forum. Mike Lafferty HPC Marketing Intel Americas Corp. Vendor Update Intel 49 th IDC HPC User Forum Mike Lafferty HPC Marketing Intel Americas Corp. Legal Information Today s presentations contain forward-looking statements. All statements made that are not

More information

Intel Media Server Studio Professional Edition for Windows* Server

Intel Media Server Studio Professional Edition for Windows* Server Intel Media Server Studio 2015 R3 Professional Edition for Windows* Server Release Notes Overview What's New System Requirements Installation Installation Folders Known Limitations Legal Information Overview

More information

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service

COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service COLO: COarse-grain LOck-stepping Virtual Machine for Non-stop Service Eddie Dong, Yunhong Jiang 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

Software Solutions for Multi-Display Setups

Software Solutions for Multi-Display Setups White Paper Bruce Bao Graphics Application Engineer Intel Corporation Software Solutions for Multi-Display Setups January 2013 328563-001 Executive Summary Multi-display systems are growing in popularity.

More information

Monte Carlo Method for Stock Options Pricing Sample

Monte Carlo Method for Stock Options Pricing Sample Monte Carlo Method for Stock Options Pricing Sample User's Guide Copyright 2013 Intel Corporation All Rights Reserved Document Number: 325264-003US Revision: 1.0 Document Number: 325264-003US Intel SDK

More information

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),

More information

Douglas Fisher Vice President General Manager, Software and Services Group Intel Corporation

Douglas Fisher Vice President General Manager, Software and Services Group Intel Corporation Douglas Fisher Vice President General Manager, Software and Services Group Intel Corporation Other brands and names are the property of their respective owners. Other brands and names are the property

More information

Haswell Cryptographic Performance

Haswell Cryptographic Performance White Paper Sean Gulley Vinodh Gopal IA Architects Intel Corporation Haswell Cryptographic Performance July 2013 329282-001 Executive Summary The new Haswell microarchitecture featured in the 4 th generation

More information

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group ATI Radeon 4800 series Graphics Michael Doggett Graphics Architecture Group Graphics Product Group Graphics Processing Units ATI Radeon HD 4870 AMD Stream Computing Next Generation GPUs 2 Radeon 4800 series

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide

Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide Intel Q35/Q33, G35/G33/G31, P35/P31 Express Chipset Memory Technology and Configuration Guide White Paper August 2007 Document Number: 316971-002 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters

Creating Overlay Networks Using Intel Ethernet Converged Network Adapters Creating Overlay Networks Using Intel Ethernet Converged Network Adapters Technical Brief Networking Division (ND) August 2013 Revision 1.0 LEGAL INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Intel Retail Client Manager

Intel Retail Client Manager Intel Retail Client Manager Frequently Asked Questions June 2014 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO

More information

The ROI from Optimizing Software Performance with Intel Parallel Studio XE

The ROI from Optimizing Software Performance with Intel Parallel Studio XE The ROI from Optimizing Software Performance with Intel Parallel Studio XE Intel Parallel Studio XE delivers ROI solutions to development organizations. This comprehensive tool offering for the entire

More information

Radeon HD 2900 and Geometry Generation. Michael Doggett

Radeon HD 2900 and Geometry Generation. Michael Doggett Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command

More information

Head-Coupled Perspective

Head-Coupled Perspective Head-Coupled Perspective Introduction Head-Coupled Perspective (HCP) refers to a technique of rendering a scene that takes into account the position of the viewer relative to the display. As a viewer moves

More information

Intel 810 and 815 Chipset Family Dynamic Video Memory Technology

Intel 810 and 815 Chipset Family Dynamic Video Memory Technology Intel 810 and 815 Chipset Family Dynamic Video Technology Revision 3.0 March 2002 March 2002 1 Information in this document is provided in connection with Intel products. No license, express or implied,

More information

Benchmarking Cloud Storage through a Standard Approach Wang, Yaguang Intel Corporation

Benchmarking Cloud Storage through a Standard Approach Wang, Yaguang Intel Corporation Benchmarking Cloud Storage through a Standard Approach Wang, Yaguang Intel Corporation Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

MCA Enhancements in Future Intel Xeon Processors June 2013

MCA Enhancements in Future Intel Xeon Processors June 2013 MCA Enhancements in Future Intel Xeon Processors June 2013 Reference Number: 329176-001, Revision: 1.0 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR

More information

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture

Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture White Paper Intel Xeon processor E5 v3 family Intel Xeon Phi coprocessor family Digital Design and Engineering Three Paths to Faster Simulations Using ANSYS Mechanical 16.0 and Intel Architecture Executive

More information

Intel Platform and Big Data: Making big data work for you.

Intel Platform and Big Data: Making big data work for you. Intel Platform and Big Data: Making big data work for you. 1 From data comes insight New technologies are enabling enterprises to transform opportunity into reality by turning big data into actionable

More information

Intel 845G/GL Chipset Dynamic Video Memory Technology

Intel 845G/GL Chipset Dynamic Video Memory Technology R Intel 845G/GL Chipset Dynamic Video Memory Technology Revision 1.2 June 2002 May 2002 1 Information in this document is provided in connection with Intel products. No license, express or implied, by

More information

Hetero Streams Library 1.0

Hetero Streams Library 1.0 Release Notes for release of Copyright 2013-2016 Intel Corporation All Rights Reserved US Revision: 1.0 World Wide Web: http://www.intel.com Legal Disclaimer Legal Disclaimer You may not use or facilitate

More information

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008

Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008 Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer

More information

Finding Performance and Power Issues on Android Systems. By Eric W Moore

Finding Performance and Power Issues on Android Systems. By Eric W Moore Finding Performance and Power Issues on Android Systems By Eric W Moore Agenda Performance & Power Tuning on Android & Features Needed/Wanted in a tool Some Performance Tools Getting a Device that Supports

More information

Intel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp

Intel Cyber Security Briefing: Trends, Solutions, and Opportunities. Matthew Rosenquist, Cyber Security Strategist, Intel Corp Intel Cyber Security Briefing: Trends, Solutions, and Opportunities Matthew Rosenquist, Cyber Security Strategist, Intel Corp Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Intel SSD 520 Series Specification Update

Intel SSD 520 Series Specification Update Intel SSD 520 Series Specification Update June 2012 Revision 1.0 Document Number: 327567-001US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED,

More information

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

More information

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0*

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0* How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on Citrix* XenServer 6.0* Technical Brief v1.0 December 2011 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

iscsi Quick-Connect Guide for Red Hat Linux

iscsi Quick-Connect Guide for Red Hat Linux iscsi Quick-Connect Guide for Red Hat Linux A supplement for Network Administrators The Intel Networking Division Revision 1.0 March 2013 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH

More information

Keys to node-level performance analysis and threading in HPC applications

Keys to node-level performance analysis and threading in HPC applications Keys to node-level performance analysis and threading in HPC applications Thomas GUILLET (Intel; Exascale Computing Research) IFERC seminar, 18 March 2015 Legal Disclaimer & Optimization Notice INFORMATION

More information

Intel HTML5 Development Environment. Article - Native Application Facebook* Integration

Intel HTML5 Development Environment. Article - Native Application Facebook* Integration Intel HTML5 Development Environment Article - Native Application Facebook* Integration V3.06 : 07.16.2013 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms

Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms EXECUTIVE SUMMARY Intel Cloud Builder Guide Intel Xeon Processor-based Servers Red Hat* Cloud Foundations Intel Cloud Builder Guide: Cloud Design and Deployment on Intel Platforms Red Hat* Cloud Foundations

More information

Intel Technical Advisory

Intel Technical Advisory This Technical Advisory describes an issue which may or may not affect the customer s product Intel Technical Advisory 5200 NE Elam Young Parkway Hillsboro, OR 97124 TA-1054-01 April 4, 2014 Incorrectly

More information

COSBench: A benchmark Tool for Cloud Object Storage Services. Jiangang.Duan@intel.com 2012.10

COSBench: A benchmark Tool for Cloud Object Storage Services. Jiangang.Duan@intel.com 2012.10 COSBench: A benchmark Tool for Cloud Object Storage Services Jiangang.Duan@intel.com 2012.10 Updated June 2012 Self introduction COSBench Introduction Agenda Case Study to evaluate OpenStack* swift performance

More information

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Accelerating High-Speed Networking with Intel I/O Acceleration Technology White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing

More information

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011 Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

More information

Customizing Boot Media for Linux* Direct Boot

Customizing Boot Media for Linux* Direct Boot White Paper Bruce Liao Platform Application Engineer Intel Corporation Customizing Boot Media for Linux* Direct Boot October 2013 329747-001 Executive Summary This white paper introduces the traditional

More information

Intel Solid-State Drive Pro 2500 Series Opal* Compatibility Guide

Intel Solid-State Drive Pro 2500 Series Opal* Compatibility Guide Opal* Compatibility Guide 1.0 Order Number: 331049-001US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL

More information

Partition Alignment of Intel SSDs for Achieving Maximum Performance and Endurance Technical Brief February 2014

Partition Alignment of Intel SSDs for Achieving Maximum Performance and Endurance Technical Brief February 2014 Partition Alignment of Intel SSDs for Achieving Maximum Performance and Endurance Technical Brief February 2014 Order Number: 330105-001US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

* * * Intel RealSense SDK Architecture

* * * Intel RealSense SDK Architecture Multiple Implementations Intel RealSense SDK Architecture Introduction The Intel RealSense SDK is architecturally different from its predecessor, the Intel Perceptual Computing SDK. If you re a developer

More information

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1

How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1 How to Configure Intel Ethernet Converged Network Adapter-Enabled Virtual Functions on VMware* ESXi* 5.1 Technical Brief v1.0 February 2013 Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED

More information

Intel Data Migration Software

Intel Data Migration Software User Guide Software Version 2.0 Document Number: 324324-002US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY

More information

OpenCL Programming for the CUDA Architecture. Version 2.3

OpenCL Programming for the CUDA Architecture. Version 2.3 OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different

More information

Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011

Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011 Intel Core i5 processor 520E CPU Embedded Application Power Guideline Addendum January 2011 Document Number: 324818-001 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

GPU architecture II: Scheduling the graphics pipeline

GPU architecture II: Scheduling the graphics pipeline GPU architecture II: Scheduling the graphics pipeline Mike Houston, AMD / Stanford Aaron Lefohn, Intel / University of Washington 1 Notes The front half of this talk is almost verbatim from: Keeping Many

More information

Intel Software Guard Extensions(Intel SGX) Carlos Rozas Intel Labs November 6, 2013

Intel Software Guard Extensions(Intel SGX) Carlos Rozas Intel Labs November 6, 2013 Intel Software Guard Extensions(Intel SGX) Carlos Rozas Intel Labs November 6, 2013 Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR

More information

Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc.

Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc. Cross-Platform Game Development Best practices learned from Marmalade, Unreal, Unity, etc. Orion Granatir & Omar Rodriguez GDC 2013 www.intel.com/software/gdc Be Bold. Define the Future of Software. Agenda

More information

Intel HTML5 Development Environment Article Using the App Dev Center

Intel HTML5 Development Environment Article Using the App Dev Center Intel HTML5 Development Environment Article Using the App Dev Center v1.06 : 06.04.2013 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

More information

Intel Identity Protection Technology (IPT)

Intel Identity Protection Technology (IPT) Intel Identity Protection Technology (IPT) Enabling improved user-friendly strong authentication in VASCO's latest generation solutions June 2013 Steve Davies Solution Architect Intel Corporation 1 Copyright

More information

GPU Architecture. Michael Doggett ATI

GPU Architecture. Michael Doggett ATI GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super

More information

NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

More information

Intel Platform Controller Hub EG20T

Intel Platform Controller Hub EG20T Intel Platform Controller Hub EG20T General Purpose Input Output (GPIO) Driver for Windows* Order Number: 324257-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Software Evaluation Guide for Autodesk 3ds Max 2009* and Enemy Territory: Quake Wars* Render a 3D character while playing a game

Software Evaluation Guide for Autodesk 3ds Max 2009* and Enemy Territory: Quake Wars* Render a 3D character while playing a game Software Evaluation Guide for Autodesk 3ds Max 2009* and Enemy Territory: Quake Wars* Render a 3D character while playing a game http://www.intel.com/performance/resources Version 2008-09 Rev. 1.0 Information

More information

Intel Retail Client Manager Audience Analytics

Intel Retail Client Manager Audience Analytics Intel Retail Client Manager Audience Analytics By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below. You may not use or facilitate the use of

More information

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA CUDA Optimization with NVIDIA Tools Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nvidia Tools 2 What Does the Application

More information

Intel Identity Protection Technology Enabling improved user-friendly strong authentication in VASCO's latest generation solutions

Intel Identity Protection Technology Enabling improved user-friendly strong authentication in VASCO's latest generation solutions Intel Identity Protection Technology Enabling improved user-friendly strong authentication in VASCO's latest generation solutions June 2013 Dirk Roziers Market Manager PC Client Services Intel Corporation

More information

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing

Big Data Technologies for Ultra-High-Speed Data Transfer and Processing White Paper Intel Xeon Processor E5 Family Big Data Analytics Cloud Computing Solutions Big Data Technologies for Ultra-High-Speed Data Transfer and Processing Using Technologies from Aspera and Intel

More information

A Powerful solution for next generation Pcs

A Powerful solution for next generation Pcs Product Brief 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k 6th Generation Intel Core Desktop Processors i7-6700k and i5-6600k A Powerful solution for next generation Pcs Looking for

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing

CUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @

More information

Intel vpro Technology Module for Microsoft* Windows PowerShell*

Intel vpro Technology Module for Microsoft* Windows PowerShell* Intel vpro Technology Module for Microsoft* Windows PowerShell* 1 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL

More information

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith

GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex

More information

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line

DDR2 x16 Hardware Implementation Utilizing the Intel EP80579 Integrated Processor Product Line Utilizing the Intel EP80579 Integrated Processor Product Line Order Number: 320296-002US Legal Lines and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

More information

This guide explains how to install an Intel Solid-State Drive (Intel SSD) in a SATA-based desktop or notebook computer.

This guide explains how to install an Intel Solid-State Drive (Intel SSD) in a SATA-based desktop or notebook computer. Installation Guide This guide explains how to install an (Intel SSD) in a SATA-based desktop or notebook computer. The instructions include migrating your data from your current storage device (such as

More information

SAP * Mobile Platform 3.0 Scaling on Intel Xeon Processor E5 v2 Family

SAP * Mobile Platform 3.0 Scaling on Intel Xeon Processor E5 v2 Family White Paper SAP* Mobile Platform 3.0 E5 Family Enterprise-class Security SAP * Mobile Platform 3.0 Scaling on Intel Xeon Processor E5 v2 Family Delivering Incredible Experiences to Mobile Users Executive

More information

Intel Desktop Board D945GCPE Specification Update

Intel Desktop Board D945GCPE Specification Update Intel Desktop Board D945GCPE Specification Update Release Date: July 11, 2007 Order Number: E11670-001US The Intel Desktop Board D945GCPE may contain design defects or errors known as errata, which may

More information

Intel 865G, Intel 865P, Intel 865PE Chipset Memory Configuration Guide

Intel 865G, Intel 865P, Intel 865PE Chipset Memory Configuration Guide G, P, PE Chipset Memory Configuration Guide White Paper May 2003 Document Number: 253036-001 INFOMATION IN THIS DOCUMENT IS POVIDED IN CONNECTION WITH INTEL PODUCTS. NO LICENSE, EXPESS O IMPLIED, BY ESTOPPEL

More information

GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

More information

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms

Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Intel Cloud Builders Guide Intel Xeon Processor-based Servers RES Virtual Desktop Extender Intel Cloud Builders Guide to Cloud Design and Deployment on Intel Platforms Client Aware Cloud with RES Virtual

More information

Configuring RAID for Optimal Performance

Configuring RAID for Optimal Performance Configuring RAID for Optimal Performance Intel RAID Controller SRCSASJV Intel RAID Controller SRCSASRB Intel RAID Controller SRCSASBB8I Intel RAID Controller SRCSASLS4I Intel RAID Controller SRCSATAWB

More information