HETEROGENEOUS SYSTEM ARCHITECTURE. Phil Rogers, AMD Corporate Fellow HSA Foundation President
1 HETEROGENEOUS SYSTEM ARCHITECTURE Phil Rogers, AMD Corporate Fellow HSA Foundation President
2 GOALS
- Make the unprecedented processing capability of the APU as accessible to programmers as the CPU is today
- Dramatically expand the APU software ecosystem in client and server
- Enable immersive applications whether hosted locally or in the cloud
Copyright 2012 HSA Foundation. All Rights Reserved.
3 APU: ACCELERATED PROCESSING UNIT
- The APU has arrived and it is a great advance over previous platforms
- Combines scalar processing on the CPU with parallel processing on the GPU and high-bandwidth access to memory
- How do we make it even better going forward?
  - Easier to program
  - Easier to optimize
  - Easier to load balance
  - Higher performance
  - Lower power
4 A NEW ERA OF PROCESSOR PERFORMANCE
[Figure: three performance-versus-time panels, each annotated "we are here" (one with a question mark)]
- Single-Core Era (single-thread performance over time)
  - Enabled by: Moore's Law, voltage scaling
  - Constrained by: power, complexity
  - Programming: Assembly, C/C++, Java
- Multi-Core Era (throughput performance over time, # of processors)
  - Enabled by: Moore's Law, SMP architecture
  - Constrained by: power, parallel SW, scalability
  - Programming: pthreads, OpenMP / TBB
- Heterogeneous Systems Era (modern application performance over time, data-parallel exploitation)
  - Enabled by: abundant data parallelism, power-efficient GPUs
  - Temporarily constrained by: programming models, communication overhead
  - Programming: Shader, CUDA, OpenCL, C++ and Java
5 EVOLUTION OF HETEROGENEOUS COMPUTING
[Figure: three eras plotted against Architecture Maturity & Programmer Accessibility, from Poor to Excellent]
- Proprietary Drivers Era
  - Graphics & proprietary driver-based APIs
  - Adventurous programmers
  - Exploit early programmable shader cores in the GPU
  - Make your program look like graphics to the GPU
  - CUDA, Brook+, etc.
- Standards Drivers Era
  - OpenCL, DirectCompute driver-based APIs
  - Expert programmers
  - C and C++ subsets
  - Compute-centric APIs, data types
  - Multiple address spaces with explicit data movement
  - Specialized work-queue-based structures
  - Kernel mode dispatch
- Architected Era
  - Heterogeneous System Architecture, GPU peer processor
  - Mainstream programmers
  - Full C++
  - GPU as a co-processor
  - Unified coherent address space
  - Task parallel runtimes
  - Nested data parallel programs
  - User mode dispatch
  - Pre-emption and context switching
6 HSA FEATURE ROADMAP
- Physical Integration
  - Integrate CPU & GPU in silicon
  - Unified memory controller
  - Common manufacturing technology
- Optimized Platforms
  - GPU compute C++ support
  - User mode scheduling
  - Bi-directional power management between CPU and GPU
- Architectural Integration
  - Unified address space for CPU and GPU
  - GPU uses pageable system memory via CPU pointers
  - Fully coherent memory between CPU & GPU
- System Integration
  - GPU compute context switch
  - GPU graphics pre-emption
  - Quality of Service
7 COMMITTED TO OPEN STANDARDS
- HSA Foundation believes in open and de-facto standards
  - Compete on the best implementation
- Open standards are the basis for large ecosystems
- Open standards always win over time
- SW developers want their applications to run on multiple platforms from multiple hardware vendors
[Logo: DirectX]
8 HETEROGENEOUS SYSTEM ARCHITECTURE: AN OPEN PLATFORM
- Open architecture, published specifications
  - HSAIL virtual ISA
  - HSA memory model
  - HSA system architecture
- ISA agnostic for both CPU and GPU
- HSA Foundation formed in June 2012
- Inviting partners to join us, in all areas
  - Hardware companies
  - Operating systems
  - Tools and middleware
  - Applications
9 HSA INTERMEDIATE LAYER - HSAIL
- HSAIL is a virtual ISA for parallel programs
  - Finalized to ISA by a JIT compiler or "Finalizer"
  - ISA independent by design for CPU & GPU
- Explicitly parallel
  - Designed for data parallel programming
- Support for exceptions, virtual functions, and other high-level language features
- Syscall methods
  - GPU code can call directly to system services, IO, printf, etc.
- Debugging support
10 HSA MEMORY MODEL
- Designed to be compatible with C++11, Java and .NET memory models
- Relaxed consistency memory model for parallel compute performance
- Loads and stores can be re-ordered by the finalizer
- Visibility controlled by:
  - Load.Acquire
  - Store.Release
  - Barriers
11 HSA SOFTWARE STACKS: APPLICATIONS, SYSTEM SOFTWARE AND PROGRAMMING MODELS
12 APPLICATION AREAS WITH ABUNDANT PARALLEL WORKLOADS
- Natural UI & Gestures: touch, gesture, and voice
- Biometric Recognition: secure, fast, accurate: face, voice, fingerprints
- Augmented Reality: superimpose graphics, audio, and other digital information as a virtual overlay
- Content Everywhere: content from any source to any display seamlessly
- Beyond HD Experiences: streaming media, new codecs, 3D, transcode, audio
- AV Content Management: searching, indexing and tagging of video & audio; multimedia data mining
13 DRIVER STACK / HSA SOFTWARE STACK
[Diagram: two software stacks side by side, both running on Hardware - APUs, CPUs, GPUs]
- Driver Stack: Apps; Domain Libraries; OpenCL 1.x, DX Runtimes, User Mode Drivers; Graphics Kernel Mode Driver
- HSA Software Stack: Apps; HSA Domain Libraries, OpenCL 2.x Runtime; Task Queuing Libraries; HSA JIT; HSA Runtime; HSA Kernel Mode Driver
- Legend: user mode component; kernel mode component; components contributed by third parties
14 INTRODUCING HSA BOLT: PARALLEL PRIMITIVES LIBRARY FOR HSA
- Easily leverage the inherent power efficiency of GPU computing
- Common routines such as scan, sort, reduce, transform
- More advanced routines like heterogeneous pipelines
- Bolt library works with OpenCL or C++ AMP
- Enjoy the unique advantages of the HSA platform
- Move the computation not the data
- Finally a single source code base for the CPU and GPU!
- Developers can focus on core algorithms
- Bolt Preview available in AMD APP SDK 2.8 (2H-2012)
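To make the four primitive names on this slide concrete, here is a minimal sketch of what scan, sort, reduce and transform compute. It uses only Python's standard library on the CPU; it does not call Bolt and should not be read as Bolt's API. The sample data is made up for illustration.

```python
from functools import reduce
from itertools import accumulate
import operator

# Hypothetical input, chosen only for illustration.
data = [3, 1, 4, 1, 5]

# transform: apply the same function to every element independently.
transformed = [x * 2 for x in data]

# scan: inclusive prefix sum, each output is the sum of all inputs so far.
scanned = list(accumulate(data, operator.add))

# reduce: fold the whole sequence down to a single value.
total = reduce(operator.add, data, 0)

# sort: order the elements.
ordered = sorted(data)
```

Each of these operations has abundant data parallelism (every element of a transform is independent, and scan, reduce and sort have well-known parallel formulations), which is why a library can map them onto a GPU.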
15 HSA SOLUTION STACK
[Diagram: layered solution stack. Labels, top to bottom: Application; Application SW; Domain Specific Libs (Bolt, OpenCV, many others); OpenCL Runtime, DirectX Runtime, Other Runtime; HSA Software; Drivers; HSA Runtime; HSAIL; Ctl; HSA Finalizer; Knl Driver; GPU ISA; Legacy Drivers; Differentiated HW; CPU(s), GPU(s), Other Accelerators]
16 OPENCL AND HSA
- HSA is an optimized platform architecture for OpenCL
  - Not an alternative to OpenCL
- OpenCL on HSA will benefit from
  - Avoidance of wasteful copies
  - Low latency dispatch
  - Improved memory model
  - Pointers shared between CPU and GPU
- HSA also exposes a lower-level programming interface, for those that want the ultimate in control and performance
  - Optimized libraries may choose the lower-level interface
17 MAKE GPUS EASIER TO PROGRAM: PRIMARY PROGRAMMING MODELS
- Microsoft C++ AMP
  - Address huge population of developers
  - Integrated in Visual Studio and Windows 8 Metro
  - Microsoft Community Promise License for open platform use
- Java acceleration
  - Aparapi on OpenCL today
  - Project Sumatra to add HSA support in an OpenJDK for Java 8
  - Driving to have Sumatra absorbed into Java 9
18 AMD'S OPEN SOURCE COMMITMENT TO HSA
- We will open source our Linux execution and compilation stack
  - Jump start the ecosystem
  - Allow a single shared implementation where appropriate
  - Enable university research in all areas

Component Name          AMD Specific   Rationale
HSA Bolt Library        No             Enable understanding and debug
HSAIL Code Generator    No             Enable research
LLVM Contributions      No             Industry and academic collaboration
HSA Assembler           No             Enable understanding and debug
HSA Runtime             No             Standardize on a single runtime
HSA Finalizer           Yes            Enable research and debug
HSA Kernel Driver       Yes            For inclusion in Linux distros
19 ACCELERATED WORKLOADS: CLIENT AND SERVER EXAMPLES
20 HAAR FACE DETECTION: CORNERSTONE TECHNOLOGY FOR COMPUTER VISION
21 LOOKING FOR FACES IN ALL THE RIGHT PLACES
Quick HD Calculations
- Search square = 21 x 21
- Pixels = 1920 x 1080 = 2,073,600
- Search squares = 1900 x 1060 = ~2 million
22 LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME
More HD Calculations
- 70% scaling in H and V
- Total pixels = 4.07 million
- Search squares = 3.8 million
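The totals on slides 21 and 22 can be approximated by summing over an image pyramid in which each level is 70% of the previous one in both dimensions. The sketch below is one way to do that, not the presenter's calculation: the rounding rule (integer floor) and the stopping condition (stop when a 21 x 21 window no longer fits) are assumptions, so the results land near the slide's figures rather than exactly on them.

```python
WINDOW = 21                      # search square edge, from the slide
SCALE_NUM, SCALE_DEN = 7, 10     # 70% scaling in H and V per pyramid level

def pyramid_totals(width=1920, height=1080):
    """Return (total_pixels, total_search_squares) over all pyramid levels.

    Assumption: level sizes are floored to whole pixels and the pyramid
    stops once a WINDOW x WINDOW square no longer fits.
    """
    total_pixels = 0
    total_squares = 0
    num, den = 1, 1
    while True:
        w = width * num // den
        h = height * num // den
        if w < WINDOW or h < WINDOW:
            break
        total_pixels += w * h
        # Number of distinct top-left positions for the search square.
        total_squares += (w - WINDOW + 1) * (h - WINDOW + 1)
        num *= SCALE_NUM
        den *= SCALE_DEN
    return total_pixels, total_squares

pixels, squares = pyramid_totals()
print(f"total pixels ~ {pixels / 1e6:.2f} million")
print(f"search squares ~ {squares / 1e6:.2f} million")
```

Because each level has 0.7 x 0.7 = 0.49 of the previous level's pixels, the pixel total is close to the geometric-series limit 2,073,600 / (1 - 0.49), which is about 4.07 million.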
23 HAAR CASCADE STAGES
[Flowchart: Stage N evaluates Feature k, Feature l and Feature m, then asks "Face still possible?". Yes: continue to Stage N+1 (Feature p, Feature q, Feature r). No: REJECT FRAME.]
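The control flow in this flowchart can be sketched as a loop that stops at the first failing stage. This is an illustrative sketch only: real Haar features are rectangle-sum differences computed over an integral image, whereas the feature functions, thresholds and window dictionaries below are hypothetical stand-ins.

```python
def run_cascade(window, stages):
    """Evaluate cascade stages in order with an early out.

    stages: list of (feature_functions, threshold) pairs.
    Returns (face_still_possible, number_of_stages_evaluated).
    """
    for index, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, index      # reject: later stages are never run
    return True, len(stages)

# Hypothetical two-stage cascade over toy "windows".
toy_stages = [
    ([lambda w: w["brightness"]], 0.5),
    ([lambda w: w["symmetry"], lambda w: w["edges"]], 1.0),
]

face_like = {"brightness": 0.9, "symmetry": 0.7, "edges": 0.6}
background = {"brightness": 0.1, "symmetry": 0.9, "edges": 0.9}

print(run_cascade(face_like, toy_stages))    # passes both stages
print(run_cascade(background, toy_stages))   # rejected at stage 1
```

The early out is what makes the workload irregular: most windows are rejected after a stage or two, while a few run the full cascade.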
24 22 CASCADE STAGES, EARLY OUT BETWEEN EACH
[Diagram: STAGE 1, STAGE 2, ... STAGE 21, STAGE 22 in sequence; passing every stage gives FACE CONFIRMED, failing any stage gives NO FACE]
Final HD Calculations
- Search squares = 3.8 million
- Average features per square = 124
- Calculations per feature = 100
- Calculations per frame = 47 GCalcs
Calculation Rate
- 30 frames/sec = 1.4 TCalcs/second
- 60 frames/sec = 2.8 TCalcs/second
... and this only gets front-facing faces
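The per-frame and per-second figures follow directly from multiplying the slide's three inputs. A short check, using only the numbers given on the slide (the function and variable names are ours):

```python
SEARCH_SQUARES = 3.8e6        # from the slide
FEATURES_PER_SQUARE = 124     # average, from the slide
CALCS_PER_FEATURE = 100       # from the slide

calcs_per_frame = SEARCH_SQUARES * FEATURES_PER_SQUARE * CALCS_PER_FEATURE

def calcs_per_second(frames_per_second):
    """Calculation rate needed to keep up with a given frame rate."""
    return calcs_per_frame * frames_per_second

print(f"per frame: {calcs_per_frame / 1e9:.1f} GCalcs")
for fps in (30, 60):
    print(f"{fps} frames/sec: {calcs_per_second(fps) / 1e12:.1f} TCalcs/second")
```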
25 CASCADE DEPTH ANALYSIS
[Figure: cascade depth]
26 PROCESSING TIME/STAGE
[Chart: time (ms) per cascade stage, GPU vs. CPU, on Trinity A M]
Test system: AMD A M APU with Radeon HD Graphics; CPU: MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL 1.1 (873.1)
27 PERFORMANCE CPU-VS-GPU
[Chart: images/sec vs. number of cascade stages on GPU, with CPU, HSA and GPU series, on Trinity A M]
Test system: AMD A M APU with Radeon HD Graphics; CPU: MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL 1.1 (873.1)
28 HAAR SOLUTION: RUN DIFFERENT CASCADES ON GPU AND CPU
By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload
- +2.5x INCREASED PERFORMANCE
- -2.5x DECREASED ENERGY PER FRAME
29 ACCELERATING JAVA: GOING BEYOND NATIVE LANGUAGES
30 JAVA ENABLEMENT BY APARAPI
Aparapi = Runtime capable of converting Java bytecode to OpenCL
- Developer creates Java source
- Source compiled to class files (bytecode) using standard compiler
- For execution on any OpenCL 1.1+ capable device
- OR execute via a thread pool if OpenCL is not available
31 JAVA AND APARAPI HSA ENABLEMENT ROADMAP
[Diagram: four software stacks side by side showing the roadmap. Labels: Application (x4); APARAPI (x3); JVM (x3); HSA-Enabled JVM; IR; HSA Runtime; LLVM Optimizer; HSAIL (x3); OpenCL; HSA Finalizer (x3); CPU ISA and GPU ISA (x4); CPU and GPU (x1); HSA CPU and HSA GPU (x3)]
32 ACCELERATING MEMCACHED: CLOUD SERVER WORKLOAD
33 MEMCACHED
- A distributed memory object caching system used in cloud servers
- Generally used for short-term storage and caching, handling requests that would otherwise require database or file system accesses
- Used by Facebook, YouTube, Twitter, Wikipedia, Flickr, and others
- Effectively a large distributed hash table
  - Responds to store and get requests received over the network
  - Conceptually: store(key, object); object = get(key)
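The "large distributed hash table" idea can be sketched in a few lines. This is an in-process stand-in, not memcached's wire protocol or any client library's API: the class name `TinyCache`, the node count and the modulo-based key placement are all assumptions made for illustration.

```python
import hashlib

class TinyCache:
    """Toy model of a distributed hash table: each 'node' is a local dict."""

    def __init__(self, node_count=4):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key to pick the node that owns it (assumed placement rule).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]

    def store(self, key, obj):
        self._node_for(key)[key] = obj

    def get(self, key):
        # A miss returns None, so the caller can fall back to the database.
        return self._node_for(key).get(key)

cache = TinyCache()
cache.store("user:42", {"name": "example"})
print(cache.get("user:42"))
print(cache.get("user:43"))
```

The key lookup inside `get` is the step the following slide considers offloading to the GPU.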
34 OFFLOADING MEMCACHED KEY LOOKUP TO THE GPU
[Charts: "Key Look Up Performance" and "Execution Breakdown" (data transfer vs. execution, in percent) for Multithreaded CPU, Radeon HD 5870, Trinity A K and Zacate E]
Source: T. H. Hetherington, T. G. Rogers, L. Hsu, M. O'Connor, and T. M. Aamodt, "Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems," Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2012), April
35 ACCELERATING B+TREE SEARCHES: CLOUD SERVER WORKLOAD
36 B+TREE SEARCHES
- B+Trees are a fundamental data structure
  - Used to reduce memory & disk access to locate a key
  - Can support index- and range-based queries
  - Can be updated efficiently
- B+Trees are used by enterprise DB applications
  - SQL: SQLite, MySQL, Oracle, and others
  - No-SQL: Apache CouchDB, Tokyo Cabinet, and others
- Audio search, video copy detection
[Figure: A simple B+Tree linking the keys 1-7. The linked list (red) allows rapid in-order traversal.]
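A small sketch can show why the leaf-level linked list matters for range queries. The tree below is built by hand over the keys 1-7 to mirror the figure's description; the exact node layout, class names and helper functions are assumptions, and insertion, splitting and deletion are deliberately left out.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the next leaf (the "red" list)

class Inner:
    def __init__(self, separators, children):
        self.separators = separators
        self.children = children

def find_leaf(node, key):
    """Walk from the root down to the leaf that could hold key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.separators, key)]
    return node

def contains(root, key):
    return key in find_leaf(root, key).keys

def range_query(root, low, high):
    """Collect keys in [low, high] by following leaf links, not the tree."""
    result = []
    leaf = find_leaf(root, low)
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

# Hand-built tree over keys 1-7 (layout assumed for illustration).
leaves = [Leaf([1, 2]), Leaf([3, 4]), Leaf([5, 6, 7])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([3, 5], leaves)

print(contains(root, 4))
print(range_query(root, 2, 6))
```

Each point lookup touches one root-to-leaf path, and lookups for different keys are independent of one another, which is the coarse-grained parallelism a batch of searches exposes.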
37 PARALLEL B+TREE SEARCHES ON HSA
- By efficiently sharing data between CPU and GPU, HSA increases performance versus multi-threaded CPU, even for tree structures that reside in host memory: +4x INCREASED PERFORMANCE
- With HSA, DB can be larger than GPU memory, and can be shared
- HSA lets us move compute to data
  - Parallel search can move to GPU
  - Sequential updates can remain on CPU
[Table: platforms dGPU (memory size = 3GB) and HSA compared for Size < 1.5 GB, Size GB and Size > 2.7 GB; cell contents not recoverable]
Source: M. Daga and M. Nutter, "Exploiting Coarse-Grained Parallelism in B+Tree Searches on an APU," accepted at Second Workshop on Irregular Applications: Algorithms and Architectures (IA3), November
Test system: AMD A M APU with Radeon HD Graphics; CPU: MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM
38 ACCELERATING SUFFIX ARRAY CONSTRUCTION: CLOUD SERVER WORKLOAD
39 SUFFIX ARRAYS
- Suffix Arrays are a fundamental data structure
  - Designed for efficient searching of a large text
  - Quickly locate every occurrence of a substring S in a text T
- Suffix Arrays are used to accelerate in-memory cloud workloads
  - Full text index search
  - Lossless data compression
  - Bio-informatics
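The "locate every occurrence of a substring S in a text T" property can be illustrated with a deliberately naive sketch. The construction below simply sorts all suffixes, which is far slower than the skew algorithm or any GPU implementation; it is here only to show what the finished array is and how a binary search over it finds matches. Function names are ours.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by direct sorting; fine for a short example only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, suffix_array, pattern):
    """Return every start position of pattern in text, in ascending order."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])

text = "banana"
sa = build_suffix_array(text)
print(sa)
print(find_occurrences(text, sa, "ana"))
```

All suffixes that begin with S sit next to each other in the sorted order, so two binary searches bracket the complete set of matches.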
40 ACCELERATED SUFFIX ARRAY CONSTRUCTION ON HSA
- By efficiently sharing data between CPU and GPU, HSA lets us move compute to data without penalty of intermediate copies.
- By offloading data parallel computations to GPU, HSA increases performance and reduces energy for Suffix Array Construction versus single-threaded CPU.
- +5.8x INCREASED PERFORMANCE
- -5x DECREASED ENERGY
[Diagram: Skew Algorithm for Compute SA, with steps labeled Radix Sort::GPU, Lexical Rank::CPU, Compute SA::CPU, Radix Sort::GPU, Merge Sort::GPU]
Source: M. Deo, "Parallel Suffix Array Construction and Least Common Prefix for the GPU," submitted to Principles and Practice of Parallel Programming (PPoPP '13), February
Test system: AMD A M APU with Radeon HD Graphics; CPU: MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM
41 VASCULAR IMAGE ENHANCEMENT: EMBEDDED MEDICAL WORKLOAD
42 VASCULATURE IMAGE ENHANCEMENT
- Automatic image enhancement over volumetric CT/MRI data
- Used to diagnose vascular pathologies
[Figure: input image and output image]
43 IMPROVED PERFORMANCE AND ENERGY
HSA increases performance and reduces energy for Medical Image Analysis by offloading data parallel compute to GPU
- +14x INCREASED PERFORMANCE
- -14x DECREASED ENERGY
Test system: AMD A M APU with Radeon HD Graphics; CPU: MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL 1.1 (873.1)
44 EASE OF PROGRAMMING: CODE COMPLEXITY VS. PERFORMANCE
45 LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS (Exemplary ISV Hessian Kernel)
[Chart: lines of code (LOC), broken down into Init, Compile, Copy, Launch, Algorithm and Copy-back, plotted together with performance for each programming model: Serial CPU, TBB, Intrinsics+TBB, OpenCL-C, OpenCL-C++, C++ AMP, HSA Bolt]
Test system: AMD A K APU with Radeon HD Graphics; CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM. Software: Windows 7 Professional SP1 (64-bit OS); AMD OpenCL 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta
46 THE HSA FUTURE
Highly productive programmers + Scalable performance + Power efficiency = AMAZING USER EXPERIENCES
47 THE HSA FOUNDATION MEMBERSHIP
[Member logos grouped under: Founders, Promoters, Supporters, Contributors, Academic, Associates]