Writing Applications for the GPU Using the RapidMind Development Platform
Contents

- Introduction
- Graphics Processing Units
- RapidMind Development Platform
- Writing RapidMind Enabled Applications
- How Does It Work?
- Example: A Particle System
- Example: Blinn-Phong Shading
- High Performance Made Simple
- About RapidMind Inc.

Abstract

The RapidMind Development Platform allows developers to use standard C++ programming to easily create applications targeted for high-performance processors, including GPUs, the Cell BE, and multi-core CPUs. In the case of the GPU, the RapidMind platform can be used for both shaders and general-purpose processing. For shaders, the platform provides many advantages over other shading systems, including support for object-oriented programming. For general-purpose processing, the platform provides a simple computational model that can be mapped onto any available computational resource in a system, including both GPUs and other processors. Code can be written once, then run in parallel on any of the processors that RapidMind supports.

Copyright 2006 RapidMind Inc.
Introduction

Graphics Processing Units (GPUs), the main processors of commodity video accelerator cards in PCs, achieve very high levels of performance by exploiting parallel processing. In fact, these processors typically have more than an order of magnitude more floating-point power than the host CPU they support. Although designed for graphics applications, they can be used for arbitrary computations; however, graphics APIs do not directly support this mode of usage. The RapidMind Development Platform makes it straightforward to access a GPU's power for general-purpose computation without any need to work through a graphics API. The RapidMind platform provides a simple data-parallel model of execution that is easy to understand and use, and maps efficiently onto the capabilities of GPUs.

The RapidMind platform's unique interface enables access to a GPU's power from within a single ISO-standard C++ program without any special extensions. It allows the use of familiar development environments and tools by building upon concepts and strategies already familiar to C++ programmers. It is also possible to use the platform to write shaders and graphical applications as a special case of the platform's general-purpose capabilities. The RapidMind platform supports both NVIDIA and ATI GPUs.

This document outlines the challenges general-purpose GPU programming presents to programmers and describes how these challenges can be overcome using the RapidMind Development Platform. A simple example of a particle system simulation shows how straightforward it is to leverage GPUs for general-purpose computation with the RapidMind Development Platform. We also demonstrate how the platform can be used to program shaders for graphical applications.
Graphics Processing Units

GPUs from ATI and NVIDIA are massively parallel processors that can offer more than 30x the floating-point power of typical host CPUs and more than 5x the memory bandwidth. This processing power is needed to execute shaders: small programs that run at every vertex to transform it in 3D space and at every pixel to compute that pixel's color. Despite their power, GPUs are widely available on inexpensive video accelerator cards and are standard components of most PCs.

The execution of shaders can be considered a form of stream processing, a massively parallel computing model that emphasizes coherent access to memory. However, the programming model for GPUs is usually expressed in graphics terms, and can be inconvenient to use for general-purpose computation. For instance, to apply a function to an array, it is necessary to bind a shader and draw a rectangle to the screen. To implement random-access reads from memory, it is necessary to set up texture maps and bind them to shaders. To write to output arrays, it is necessary to use frame buffer object interfaces to bind the output of the shaders to a texture. To write to computed locations in output arrays, it is necessary to reinterpret image pixel values as vertex positions and render a sequence of points that the rasterizer scatters to new locations on a destination buffer. This makes it very complex and confusing to program GPUs for general applications.

Although GPU hardware vendors provide languages for programming the shader units of their hardware, these languages only allow shaders to be programmed, not the whole system. Many other aspects of the GPU must be managed, in particular memory. GPU memory is separate from host memory, so data must be transferred to and from the video accelerator. Also, binding shader code to the host application requires a large amount of glue code.
Finally, after programming a GPU this way, the resulting code will only run on GPUs, and cannot easily be ported to other hardware targets, such as the Cell BE.
RapidMind Development Platform

RapidMind provides a software development platform that allows the developer to use standard C++ programming to easily create high-performance and massively parallel applications that run on the GPU. Developers are provided a single, simple, and standard way to program, which the RapidMind platform then maps onto all available computational resources in a given system. Developers can continue to use their existing C++ compilers and build systems. The RapidMind platform is embedded in the application and transparently manages massively parallel computations.

Application structure

Writing RapidMind Enabled Applications

Users of the RapidMind Development Platform continue to program in C++ using their existing compiler. After identifying the components of their application to accelerate, the overall integration process is as follows:

1. Replace types: The developer replaces numerical types representing floating-point numbers and integers with the equivalent RapidMind platform types.

2. Capture computations: While the user's application is running, sequences of numerical operations invoked by the application are captured, recorded, and dynamically compiled to a program object by the RapidMind platform.

3. Stream execution: The RapidMind platform runtime manages parallel execution of program objects on the target hardware (in this case, a GPU).

Process to change applications to use the RapidMind Development Platform
How Does It Work?

The RapidMind Development Platform is an advanced dynamic compiler and runtime management system for parallel processing. It has a sophisticated interface embedded within standard ISO C++. It can be used to express arbitrary parallel computations, but it is not a new language. Instead, it merely adds a new vocabulary to standard ISO C++: a set of nouns (types) and verbs (operations). In the case of the GPU, this new vocabulary allows the specification of high-performance parallel computations, while retaining close integration with standard C++ code running on the host processor. In essence, the RapidMind Development Platform enables programming of the GPU as a co-processor, under a single unified programming model, while using existing tools.

A user of the RapidMind Development Platform writes C++ code in the usual way, but uses specific types for numbers, vectors of numbers, matrices, and arrays. In immediate mode, operations on these values are executed on the host processor, in the manner of a simple operator-overloaded matrix-vector library. In this mode, the RapidMind Development Platform simply reflects standard practice in numerical programming under C++.

However, the RapidMind platform also supports a unique retained mode. In this mode, operations are recorded and dynamically compiled into a program object rather than being immediately executed. These program objects can be used as functions in the host program. In the case of GPUs, applying such a function to an array of values automatically invokes a massively parallel computation on the video accelerator. Data is automatically transferred to and from the video accelerator, overlapping computation with data transfers. Program objects mimic the behavior of native C++ functions, including support for modularity and scope, so standard C++ object-oriented programming techniques can be leveraged.
It should be noted that at runtime, program objects only execute the numerical computations they have recorded, and can completely avoid any overhead due to the object-oriented nature of the specification. The platform uses C++ only as scaffolding to define computations, but strips away this scaffolding for more efficient runtime execution. The RapidMind Development Platform can also target other processors; in particular, it can be used to program the Cell BE processor and the host CPU with dynamically compiled code.

Example: A Particle System

As a simple example of using the RapidMind Development Platform on the GPU, consider the particle system code below, which specifies a parallel computation that will take place on the GPU. Every particle has a state. At every timestep of the simulation, each particle must execute a rule to update its own state, and such rules can depend on a number of other inputs. Such simulations have many applications, especially when a sufficiently large number of particles is used, as is possible on the GPU. Appropriate choices of state and update rules can give rise to a wide variety of interesting phenomena.

We can store the state of all the particles in an array and then write a program object that will update all the states in parallel. A state update rule that integrates the acceleration due to gravity and checks for collision against a ground plane is given in the listing below. More complex rules are possible that check for collision against other objects or that use the state of other nearby particles to implement collective behaviors. After the simulation step, another rule can be run to convert the state of each particle to a visualization. These rules can also be written using the RapidMind Development Platform.

In this code example, the words in boldface are types and keywords (macros) provided by the RapidMind Development Platform.
These types and macros are implemented using standard ISO C++ features, and work with a variety of compilers. The types used in this example come in a number of flavors representing different geometric objects, but basically represent short vectors of floating-point numbers. The user can create types for their own purposes by extending the types provided.

The BEGIN macro enters retained mode, creates a program object called particle, and marks it for the streaming execution mode. This program object will act as a container for all the operations specified up to the invocation of the END macro. The InOut type modifier marks the pos and vel variables as both inputs and outputs. Finally, at the end of the code example, we invoke the program object particle on arrays of positions ps and velocities vs, updating them. The syntax shown can combine multiple arrays into a single stream, execute the program object, then split apart the output into multiple destination arrays. The computation specified in this code example executes completely on the GPU.

// STATE ARRAYS
Array<1,Value3f> ps(n_particles);   // array holding particle positions
Array<1,Value3f> vs(n_particles);   // array holding particle velocities

// PARAMETERS
float mu = 0.3f;                    // parameters controlling collision response
float eps = 0.6f;
Value1f delta(1.0f/60.0f);          // timestep delta (this type mimics a float)
Value3f g(0.0f,-9.8f,0.0f);         // acceleration due to gravity

// DEFINE STREAM PROGRAM
particle = BEGIN {                  // create a new program object
    InOut<Value3f> pos;             // position; both input and output
    InOut<Value3f> vel;             // velocity; both input and output

    // SPECIFY COMPUTATIONS
    // clamp acceleration to zero if particles at or below ground plane
    Value3f acc = cond(abs(pos[1])<0.05f, Value3f(0.0f,0.0f,0.0f), g);
    // integrate acceleration to get velocity
    vel += acc*delta;
    // integrate velocity to update position
    pos += vel*delta;
    // check if below ground level
    Value1f under = pos[1] < 0.0f;
    // clamp to ground level if necessary
    pos = cond(under, pos * Value3f(1.0f,0.0f,1.0f), pos);
    // modify velocity in case of collision
    Value3f veln = vel * Value3f(0.0f,1.0f,0.0f);
    Value3f velt = vel - veln;
    vel = cond(under, (1.0f - mu)*velt - eps*veln, vel);
    // clamp position to the edge of the plane, just to make it look nice
    pos = min(max(pos, Value3f(-5.0f, numeric_limits<float>::min(), -5.0f)),
              Value3f(5.0f, numeric_limits<float>::max(), 5.0f));
} END;

// EXECUTE STREAM PROGRAM WITH MULTIPLE VALUE RETURN BUNDLE
bundle(ps, vs) = particle(ps, vs);  // uses parallel assignment semantics

Stream program for particle system simulation

One of the important features of the RapidMind Development Platform is that no glue code is required: the code above is the API. Co-processor code can be treated simply as part of the host application. Program objects can be used, essentially, as dynamically definable functions that can be run in parallel on all the elements of arrays. On the GPU, execution of program objects, such as particle in the example, is automatically mapped to the graphics hardware and managed by a runtime scheduler.
You will notice that there is no explicit reference in the above program to a graphics API or the GPU. Neither the developer nor the application needs to be concerned about the actual processor resources that might be available at runtime. Whether targeting a multi-core CPU, a GPU, or the Cell BE, the program itself does not change. The RapidMind Development Platform takes care of mapping the program objects onto available computational resources.

Example: Blinn-Phong Shading

The RapidMind platform can also be used to implement shaders. The listing below presents an example where the platform is used to specify vertex and fragment shaders that implement the Blinn-Phong lighting model. Shaders are GPU-specific, but conceptually they are special versions of the program objects used for general-purpose computation. Functions and objects defined for general-purpose computation can also be used for shaders, and vice versa. The platform also manages texture maps and other memory, enabling a powerful form of data abstraction not available in other shading languages. Using the platform to program shaders is also useful when the results of general-purpose computations on the GPU need to be visualized.
// DEFINE VERTEX SHADER
vertex_shader = BEGIN_PROGRAM("vertex") {
    // declare input vertex attributes (unpacked in order given)
    In<Value3f> nm;    // normal vector (MCS)
    In<Value4f> pm;    // position (MCS)
    // declare output vertex attributes (packed in order given)
    Out<Value3f> nv;   // normal (VCS)
    Out<Value3f> pv;   // position (VCS)
    Out<Value3f> pd;   // position (DCS)
    // specify computations (transform positions and normals)
    pv = (MV*pm)(0,1,2);   // VCS position
    pd = (MD*pm)(0,1,2);   // DCS position
    nv = normalize(nm*inverse_mv(0,1,2)(0,1,2));   // VCS normal
} END;

// DEFINE FRAGMENT SHADER
fragment_shader = BEGIN_PROGRAM("fragment") {
    // declare input fragment attributes (unpacked in order given)
    In<Value3f> nv;    // normal (VCS)
    In<Value3f> pv;    // position (VCS)
    In<Value3f> pd;    // fragment position (DCS)
    // declare output fragment attributes (packed in order given)
    Out<Value3f> fc;   // fragment color
    // compute unit normal and view vector
    nv = normalize(nv);
    Value3f vv = -normalize(pv);
    // process each light source
    for (int i = 0; i < NLIGHTS; i++) {
        // compute per-light normalized vectors
        Value3f lv = normalize(light_position[i] - pv);
        Value3f hv = normalize(lv + vv);
        Value3f ec = light_color[i] * max(0.0f, dot(nv,lv));
        // sum up contribution of each light source
        fc += ec*(kd + ks*pow(pos(dot(hv,nv)), q));
    }
} END;
Vertex and fragment shaders for the Blinn-Phong lighting model

High Performance Made Simple

The RapidMind Development Platform allows developers to use standard C++ programming to easily create applications targeted for high-performance processors, including the Cell BE, GPUs, and CPUs. In the case of GPUs, the RapidMind platform implements arbitrary user-specified computations on the GPU, without any explicit reference by the developer to graphics APIs. The RapidMind platform provides a simple computational model that can be targeted by programmers, then maps this model onto any available computational resources in a system. Code can be written once, then run in parallel on any of the processors that the RapidMind Development Platform supports.

About RapidMind Inc.

RapidMind provides a software development platform that allows applications to take advantage of a new generation of high-performance processors, including the Cell BE, GPUs, and other multi-core processors. The RapidMind Development Platform enables applications to realize the performance breakthroughs offered by these processors. Based in Waterloo, Canada, RapidMind is a venture-backed private company built on over five years of advanced research and development. For more information on how RapidMind can help you with your high-performance application, visit the RapidMind website.

RapidMind and the RapidMind logo are trademarks of RapidMind Inc. NVIDIA is a registered trademark of NVIDIA Corporation in the United States and/or other countries. ATI is a registered trademark of ATI Technologies Inc. Other company, product, or service names may be trademarks or service marks of others.
Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth [email protected] Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex
Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH
Equalizer Parallel OpenGL Application Framework Stefan Eilemann, Eyescale Software GmbH Outline Overview High-Performance Visualization Equalizer Competitive Environment Equalizer Features Scalability
Developer Tools. Tim Purcell NVIDIA
Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging
A Crash Course on Programmable Graphics Hardware
A Crash Course on Programmable Graphics Hardware Li-Yi Wei Abstract Recent years have witnessed tremendous growth for programmable graphics hardware (GPU), both in terms of performance and functionality.
Interactive Visualization of Magnetic Fields
JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 21 No. 1 (2013), pp. 107-117 Interactive Visualization of Magnetic Fields Piotr Napieralski 1, Krzysztof Guzek 1 1 Institute of Information Technology, Lodz University
Chapter 19 Cloud Computing for Multimedia Services
Chapter 19 Cloud Computing for Multimedia Services 19.1 Cloud Computing Overview 19.2 Multimedia Cloud Computing 19.3 Cloud-Assisted Media Sharing 19.4 Computation Offloading for Multimedia Services 19.5
Radeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
Application. EDIUS and Intel s Sandy Bridge Technology
Application Note How to Turbo charge your workflow with Intel s Sandy Bridge processors and chipsets Alex Kataoka, Product Manager, Editing, Servers & Storage (ESS) August 2011 Would you like to cut the
Awards News. GDDR5 memory provides twice the bandwidth per pin of GDDR3 memory, delivering more speed and higher bandwidth.
SAPPHIRE FleX HD 6870 1GB GDDE5 SAPPHIRE HD 6870 FleX can support three DVI monitors in Eyefinity mode and deliver a true SLS (Single Large Surface) work area without the need for costly active adapters.
How To Build A Cloud Computer
Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology
APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE
APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE Tuyou Peng 1, Jun Peng 2 1 Electronics and information Technology Department Jiangmen Polytechnic, Jiangmen, Guangdong, China, [email protected]
MPSoC Designs: Driving Memory and Storage Management IP to Critical Importance
MPSoC Designs: Driving Storage Management IP to Critical Importance Design IP has become an essential part of SoC realization it is a powerful resource multiplier that allows SoC design teams to focus
Radeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Sample Exam Questions 2007
Monash University Clayton s School of Information Technology CSE3313 Computer Graphics Questions 2007 INSTRUCTIONS: Answer all questions. Spend approximately 1 minute per mark. Question 1 30 Marks Total
Web Based 3D Visualization for COMSOL Multiphysics
Web Based 3D Visualization for COMSOL Multiphysics M. Jüttner* 1, S. Grabmaier 1, W. M. Rucker 1 1 University of Stuttgart Institute for Theory of Electrical Engineering *Corresponding author: Pfaffenwaldring
Chapter 2 Parallel Architecture, Software And Performance
Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program
Technical Brief. Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output. April 2008 TB-03813-001_v01
Technical Brief Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output April 2008 TB-03813-001_v01 Quadro FX 5600 SDI and Quadro FX 4600 SDI Graphics to SDI Video Output Table of Contents
Course materials. In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful:
Course materials In addition to these slides, C++ API header files, a set of exercises, and solutions, the following are useful: OpenCL C 1.2 Reference Card OpenCL C++ 1.2 Reference Card These cards will
HPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS
NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS DU-05349-001_v6.0 February 2014 Installation and Verification on TABLE OF CONTENTS Chapter 1. Introduction...1 1.1. System Requirements... 1 1.2.
Computer Graphics. Computer graphics deals with all aspects of creating images with a computer
Computer Graphics Computer graphics deals with all aspects of creating images with a computer Hardware Software Applications Computer graphics is using computers to generate and display images based on
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,
Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg
Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing
Introduction to Computer Graphics
Introduction to Computer Graphics Torsten Möller TASC 8021 778-782-2215 [email protected] www.cs.sfu.ca/~torsten Today What is computer graphics? Contents of this course Syllabus Overview of course topics
IA-64 Application Developer s Architecture Guide
IA-64 Application Developer s Architecture Guide The IA-64 architecture was designed to overcome the performance limitations of today s architectures and provide maximum headroom for the future. To achieve
Parallel Firewalls on General-Purpose Graphics Processing Units
Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering
A Survey of General-Purpose Computation on Graphics Hardware
EUROGRAPHICS 2005 STAR State of The Art Report A Survey of General-Purpose Computation on Graphics Hardware Lefohn, and Timothy J. Purcell Abstract The rapid increase in the performance of graphics hardware,
1 External Model Access
1 External Model Access Function List The EMA package contains the following functions. Ema_Init() on page MFA-1-110 Ema_Model_Attr_Add() on page MFA-1-114 Ema_Model_Attr_Get() on page MFA-1-115 Ema_Model_Attr_Nth()
Multi-GPU Load Balancing for In-situ Visualization
Multi-GPU Load Balancing for In-situ Visualization R. Hagan and Y. Cao Department of Computer Science, Virginia Tech, Blacksburg, VA, USA Abstract Real-time visualization is an important tool for immediately
Enhancing Cloud-based Servers by GPU/CPU Virtualization Management
Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department
AMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009
AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU
ultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1
Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion
Data Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
