Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware


1 Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware
The University of Sheffield, UK
- Richmond Paul, Coakley Simon, Romano Daniela, "Cellular Level Agent Based Modelling on the Graphics Processing Unit (with FLAME GPU)", selected for review in the special issue "Parallel and Ubiquitous methods and tools in Systems Biology" of the international journal Briefings in Bioinformatics
- Richmond Paul, Coakley Simon, Romano Daniela (2009), "Cellular Level Agent Based Modelling on the Graphics Processing Unit", Proc. of HiBi09 - High Performance Computational Systems Biology, October 2009, Trento, Italy
- Richmond Paul, Coakley Simon, Romano Daniela (2009), "A High Performance Agent Based Modelling Framework on Graphics Card Hardware with CUDA", Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), May 10-15, 2009, Budapest, Hungary
- Richmond Paul, Romano Daniela (2008), "A High Performance Framework For Agent Based Pedestrian Dynamics On GPU Hardware", Proceedings of EUROSIS ESM 2008 (European Simulation and Modelling), October 27-29, 2008, Universite du Havre, Le Havre, France

2 Introduction and Scope
- Agent Based Modelling (ABM)
  - Emergence of complex natural behaviour from simple rules
  - Individuals are agents with memory
  - Each agent updates its own memory by considering its neighbours
- Of pedestrian behaviour
  - Continuous-space mobile agents
  - Discrete time steps
- On the GPU
  - Why? Performance and real-time visualisation
  - The aim is flexibility: modellers should be able to harness the GPU's power without having to understand GPU programming
  - Not continuum based (Treuille 06) and not using mobile discrete agents (D'Souza 07)

3 Outline
- FLAME and FLAME GPU
  - About FLAME
  - A simple example of a pedestrian model specification
- Implementing FLAME on the GPU
  - Brief overview of GPU technology
  - Mapping agent data and functions to the GPU
  - Agent communication patterns
- Case Study
  - Pedestrian modelling
  - Discrete agents
  - Performance results
- Conclusions

4 What is FLAME?
- What is FLAME (and what is it not)?
  - Flexible Large-scale Agent Modelling Environment
  - XML model specification based on the X-Machine
  - Template system for generating simulation code for a single CPU, a grid or a GPU
  - Not a modelling application itself (the API is dynamically generated)
- Why extend FLAME to the GPU?
  - Complete modelling environment (beyond that of simple swarms)
  - Formal and portable specification technique based on the X-Machine
  - Many existing models can be used for benchmarking
- What is FLAME GPU?
  - Data-parallel implementation of FLAME using CUDA
  - Offers real-time visualisation
  - Cost-effective solution for high-performance ABM

5 FLAME and Formal Agent Specification
The X-Machine was formally defined by Eilenberg (Eilenberg 74) as an 8-tuple (Σ, Γ, Q, M, Φ, F, q0, m0), where:
- Σ and Γ are the finite input and output alphabets respectively;
- Q is the finite set of states;
- M is the (possibly infinite) set called memory;
- Φ is a finite set of partial functions φ that map an input and a memory state to an output and a new memory state, φ: Σ × M → Γ × M;
- F is the next-state partial function that, given a state and a function from Φ, provides the next state, F: Q × Φ → Q (F is often described as a state transition diagram);
- q0 and m0 are the initial state and initial memory respectively.
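
To make the tuple concrete, the following is a minimal C sketch of a two-state X-Machine. It is only an illustration and not FLAME code: the alphabets Σ and Γ are reduced to a single float, and the memory fields, the bodies of the two functions and the name xm_step are assumptions made for this example. The state names follow the pedestrian agent used later in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Q: the finite set of states. */
typedef enum { START_STATE = 0, WAIT_INPUT = 1, NUM_STATES = 2 } State;

/* M: agent memory (hypothetical fields). */
typedef struct { float x; float vel_x; } Memory;

/* One element of Phi: maps (input, memory) to (output, new memory). */
typedef float (*Phi)(float input, Memory *m);

/* Hypothetical phi_1: ignore the input and emit the current position. */
static float output_location(float input, Memory *m) {
    (void)input;
    return m->x;
}

/* Hypothetical phi_2: treat the input as a steering impulse, then move. */
static float input_locations(float input, Memory *m) {
    m->vel_x += input;
    m->x += m->vel_x;
    return m->x;
}

/* The machine itself: current state q and memory m. */
typedef struct { State q; Memory m; } XMachine;

/* F: Q x Phi -> Q, stored as one (function, next state) pair per state. */
static const struct { Phi phi; State next; } transitions[NUM_STATES] = {
    { output_location, WAIT_INPUT  },
    { input_locations, START_STATE },
};

/* Apply the function enabled in the current state, then follow F. */
static float xm_step(XMachine *xm, float input) {
    assert(xm != NULL && xm->q < NUM_STATES);
    float out = transitions[xm->q].phi(input, &xm->m);
    xm->q = transitions[xm->q].next;
    return out;
}
```

Starting from q0 = START_STATE with m0 = {x = 1, vel_x = 0.5}, two calls to xm_step take the machine to WAIT_INPUT and back to START_STATE, updating the memory on the second call.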

6 Agents as Communicating X-Machines
- Each agent is a Communicating Stream X-Machine (Balanescu 99)
  - Stream: input and output are streams of data
  - Communicating: agents input and output messages
- State transitions (functions) describe agent behaviour
  - Update agent memory
  - Output messages (and agents) and process input messages

7 Specifying an Agent in XMML
<xagent>
  <name>pedestrian</name>
  <memory>
    <variable><type>float</type><name>x</name></variable>
    <variable><type>float</type><name>y</name></variable>
    <variable><type>float</type><name>velx</name></variable>
    <variable><type>float</type><name>vely</name></variable>
  </memory>
  <states>
    <state><name>start_state</name></state>
    <state><name>wait_input</name></state>
    <initialstate>start_state</initialstate>
  </states>
  <functions>
    <function>
      <name>output_location</name>
      <currentstate>start_state</currentstate>
      <nextstate>wait_input</nextstate>
      <outputs>
        <output><messagename>pedestrian_location</messagename></output>
      </outputs>
    </function>
    <function>
      <name>input_locations</name>
      <currentstate>wait_input</currentstate>
      <nextstate>start_state</nextstate>
      <inputs>
        <input><messagename>pedestrian_location</messagename></input>
      </inputs>
    </function>
  </functions>
  <type>continuous</type>
</xagent>

8 Specifying Agent Communication in XMML
<message>
  <name>pedestrian_location</name>
  <variables>
    <variable><type>float</type><name>x</name></variable>
    <variable><type>float</type><name>y</name></variable>
    <variable><type>float</type><name>velx</name></variable>
    <variable><type>float</type><name>vely</name></variable>
  </variables>
  <partitioningspatial>
    <radius>25</radius>
    <xmin>-100.0</xmin>
    <xmax>100.0</xmax>
    <ymin>-100.0</ymin>
    <ymax>100.0</ymax>
    <zmin>0.0</zmin>
    <zmax>25</zmax>
  </partitioningspatial>
  <!-- The slide also shows two other partitioning elements, which appear to be
       alternatives to spatial partitioning:
       <partitioningnone/>
       <partitioningdiscrete><radius>0</radius></partitioningdiscrete> -->
</message>

9 Specifying the Function Order
<layers>
  <layer>
    <layerfunction>
      <name>output_location</name>
    </layerfunction>
  </layer>
  <layer>
    <layerfunction>
      <name>input_locations</name>
    </layerfunction>
  </layer>
</layers>
Diagram: in start_state, output_location() writes agent->x, agent->y, agent->vel_x and agent->vel_y to the pedestrian_location message list (OUT) and moves the agent to wait_input; input_locations() then reads the message list (IN) and returns the agent to start_state.

10 Simulation Process and Code Generation
- XMML Model File
  - Syntax validated through XML Schema
  - The base XMML Schema describes the basic structure of an X-Machine agent
  - GPU-specific extensions (partitioning) are available through an XMMLGPU Schema
  - Object-oriented approach to extension of the base model
- C Function Files
- Code generation templates
  - Translate an XMML model file into simulation source code
  - Templates are written in XML (using the XSLT Schema) so they can be syntax validated
  - XSLT processors implement a W3C specification: any compliant processor can be used to generate code
  - FLAME GPU is therefore not dependent on internal tools or parsers
- XML Input Data
  - Defines the internal memory of an initial population of agents

12 Programming the GPU
- Purpose of the GPU
  - Data-parallel device for operating on streams of data
- Programming for general-purpose use
  - Graphics API technique: not ideal
  - High-level alternatives
    - Brook GPU (Buck 04): SIMD stream programming extension for C
    - Sh (McCool 02): C++ language with a compiler for GPU backends
  - Hardware specific
    - Stream SDK: low-level ATI-specific native instruction set and high-level support with Brook+
    - CUDA: NVIDIA programming for the GPU using a compiler and a C syntax with extensions
    - OpenCL: new standard with limited but growing support

13 NVIDIA CUDA Programming Model
- The GPU is a coprocessor to the CPU (with its own global memory)
- Many parallel threads of execution
  - Each thread runs the same kernel program (SPMD)
- Threads are grouped into regular-sized blocks
  - Threads within a block can communicate through shared memory
  - Simple synchronisation primitive
  - Threads in different blocks cannot communicate
Diagram: a grid of blocks (Block 0 to Block 3), each block containing threads (Thread 0 to Thread N).
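
The grid-of-blocks arrangement determines how each thread finds the agent it is responsible for. The plain C sketch below reproduces that index arithmetic on the CPU; the function names are hypothetical, and in a CUDA kernel the same index would be computed as blockIdx.x * blockDim.x + threadIdx.x.

```c
#include <assert.h>

/* Number of blocks needed so that each of n agents gets its own thread. */
static unsigned blocks_for(unsigned n, unsigned threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Global index of a thread from its block number and position in the block. */
static unsigned global_index(unsigned block, unsigned threads_per_block,
                             unsigned thread) {
    assert(thread < threads_per_block);
    return block * threads_per_block + thread;
}

/* Threads in the last block may fall past the end of the agent list;
   such threads must do no work. */
static int thread_is_active(unsigned index, unsigned n) {
    return index < n;
}
```

For 1000 agents and 256 threads per block, four blocks are launched and the last 24 threads of the final block stay idle.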

14 CUDA Hardware Model
- Thread blocks are mapped to multiprocessors (MPs)
- Each multiprocessor is a set of SIMD thread (vector) processors
- Limited shared memory per MP (and hence per block)
- Limited cache and registers per MP
Diagram: the GPU device contains multiprocessors 1 to N. Each multiprocessor has vector processors 1 to N with their own registers, plus shared memory, an instruction unit, a constant cache and a texture cache. All multiprocessors access device memory (GPU DRAM).

15 Mapping Agent Functions to the GPU
- Each transition function is wrapped by a GPU kernel
- Each agent is a thread performing the function
- Functions can input and output messages
- Functions can output new agents (agent birth)
- An agent can be removed (agent death) by returning a non-zero value

    FLAME_GPU_FUNC int input_function(
        xmachine_memory_pedestrian* xmemory,
        xmachine_message_pedestrian_location_list* location_messages)
    {
        /* Get the first message */
        xmachine_message_pedestrian_location* location_message =
            get_first_pedestrian_location_message(location_messages);

        /* Repeat until there are no more messages */
        while (location_message)
        {
            /* Process the message */
            if (distance_check(xmemory, location_message))
            {
                updatesteervelocity(xmemory, location_message);
            }

            /* Get the next message */
            location_message = get_next_pedestrian_location_message(
                location_message, location_messages);
        }

        /* Update any other xmemory variables */
        xmemory->x += xmemory->vel_x * time_step;
        ...

        /* Returning 0 keeps the agent alive */
        return 0;
    }

16 Mapping X-Machine Agent Data to the GPU
- All data (agents and messages) is mapped to global memory on the GPU
- Lists are stored using a Structure of Arrays (SoA) rather than an Array of Structures (AoS)
- Data is read from global memory into registers
- Agents and messages are referenced as C structures within function code

Structure of Arrays (used for lists in global memory):

    typedef struct agent_list {
        float x[N];
        float y[N];
    } xm_memory_agent_list;

Array of Structures (the alternative layout):

    typedef struct agent {
        float x;
        float y;
    } xm_memory_agent;

    xm_memory_agent agent_list[N];
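
The two statements "lists are stored as SoA" and "agents are referenced as C structures within function code" fit together through a load and store step around each agent function. The sketch below shows one way this could look in plain C; the type names, the helper names soa_load and soa_store and the capacity N are assumptions for illustration, not the generated FLAME GPU API.

```c
#include <assert.h>
#include <stddef.h>

#define N 4  /* hypothetical list capacity */

/* One agent, as seen by agent function code (held in registers on the GPU). */
typedef struct { float x; float y; } Agent;

/* The whole list, as laid out in global memory. */
typedef struct { float x[N]; float y[N]; } AgentListSoA;

/* Read agent i from the SoA list into a local structure. Threads i, i+1, ...
   touch consecutive floats in x[] and then in y[], which is the access
   pattern that lets the GPU coalesce the reads. */
static Agent soa_load(const AgentListSoA *list, size_t i) {
    assert(list != NULL && i < N);
    Agent a = { list->x[i], list->y[i] };
    return a;
}

/* Write a local agent structure back to slot i of the SoA list. */
static void soa_store(AgentListSoA *list, size_t i, Agent a) {
    assert(list != NULL && i < N);
    list->x[i] = a.x;
    list->y[i] = a.y;
}
```

With an AoS layout the same threads would read addresses sizeof(Agent) bytes apart, so each memory transaction would carry data that the neighbouring threads do not need at that moment.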

17 Use of Parallel Compaction
- Need to avoid divergence within thread blocks
  - Agents are stored and processed in state lists to avoid conditional branching
- Sparse lists still occur as a result of
  - Agent births
  - Function filters
  - Also during message outputs
Diagram: an agent function produces output flags for the agent list (colour represents state) and for agent births; a prefix sum over the flags is used to compact the result into the new agent list.
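
Compaction with a prefix sum can be sketched in a few lines. The version below is sequential C written only to show the data flow: an exclusive scan over the output flags gives every surviving agent its slot in the dense list, after which each agent can be scattered independently. The function names and the use of a single float per agent are assumptions; on the GPU both loops would be parallel kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum: out[i] = number of set flags before position i.
   Returns the total number of set flags. */
static size_t exclusive_scan(const int *flags, size_t *out, size_t n) {
    size_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = sum;
        sum += (size_t)(flags[i] != 0);
    }
    return sum;
}

/* Scatter every flagged agent to the slot given by the scan. Each iteration
   writes to a distinct slot, so the iterations are independent. */
static size_t compact(const float *agents, const int *flags,
                      size_t *scratch, float *dense, size_t n) {
    assert(agents && flags && scratch && dense);
    size_t kept = exclusive_scan(flags, scratch, n);
    for (size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            dense[scratch[i]] = agents[i];
        }
    }
    return kept;
}
```

For flags {1, 0, 0, 1, 1, 0} the scan yields {0, 1, 1, 1, 2, 3}, so the three surviving agents land in slots 0, 1 and 2 and keep their relative order.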

18 Brute Force Message Communication
- Tile message lists into shared memory to reduce global memory access (Nyland 07)
- Each thread in the thread block loads a single message into shared memory in the load_first_message function
- Each call to load_next_message then iterates through the messages in shared memory
- When load_next_message is called after every message in shared memory has been returned, a new batch of messages is tiled
- Repeat until all messages have been considered
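
The tiling scheme can be emulated on the CPU to show the control flow. In the sketch below a small array stands in for shared memory and a memcpy stands in for the cooperative load performed by the threads of a block. The names load_first_message and load_next_message come from the slide; the iterator structure, the signatures, the tile size and the use of a float as the message type are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 4  /* hypothetical: messages per tile = threads per block */

typedef struct {
    const float *global;  /* full message list ("global memory") */
    size_t count;         /* number of messages in the list */
    float tile[TILE];     /* current batch ("shared memory") */
    size_t tile_start;    /* index in global of tile[0] */
    size_t tile_len;      /* valid entries in tile */
    size_t pos;           /* next entry of tile to return */
    size_t tile_loads;    /* number of batches fetched so far */
} MessageIterator;

/* Copy the batch starting at `start` into the tile. On the GPU each thread
   of the block would copy one entry, followed by a barrier. */
static void load_tile(MessageIterator *it, size_t start) {
    size_t remaining = it->count - start;
    it->tile_start = start;
    it->tile_len = remaining < TILE ? remaining : TILE;
    memcpy(it->tile, it->global + start, it->tile_len * sizeof(float));
    it->pos = 0;
    it->tile_loads++;
}

/* Return the next message, or NULL once every message has been considered. */
static const float *load_next_message(MessageIterator *it) {
    if (it->pos == it->tile_len) {
        size_t next = it->tile_start + it->tile_len;
        if (next >= it->count) {
            return NULL;
        }
        load_tile(it, next);  /* tile exhausted: fetch a new batch */
    }
    return &it->tile[it->pos++];
}

/* Start iterating: fetch the first tile and return its first message. */
static const float *load_first_message(MessageIterator *it,
                                       const float *msgs, size_t n) {
    assert(it != NULL);
    it->global = msgs;
    it->count = n;
    it->tile_start = 0;
    it->tile_len = 0;
    it->pos = 0;
    it->tile_loads = 0;
    if (n == 0) {
        return NULL;
    }
    load_tile(it, 0);
    return &it->tile[it->pos++];
}
```

Iterating over 10 messages with a tile of 4 touches the full list in three batches, so "global memory" is read once per batch by the block rather than once per message by every thread.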

19 Effect of Optimisations on Brute Force Message Communication
- Simple benchmarking model
- Efficient data access methods double performance
- Massive performance gain from using shared memory
Figure: relative speedup over FLAME against population size for three configurations: SoA with shared memory (SoA +SM), SoA without shared memory (SoA -SM) and AoS without shared memory (AoS -SM).

20 Limited Range Message Communication
- For each message output
  - The environment is split into discrete partitions whose size equals the message range (each partition has a unique identifier)
  - The message list is sorted by the partition that each message falls within
  - A boundary matrix records how many messages are within each partition by storing the start and end index of that partition's messages in the sorted list
- To read all messages within a partition, the boundary matrix gives the range of the message list that needs to be iterated
- Each agent reads 27 partitions (for a 3D environment), including its own, which guarantees that all messages within the range are processed
  - Roughly 2/3 of the messages read are outside the range, but this is much better than O(n²)
- The texture cache is used to read messages from global memory
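
The partitioning steps above can be sketched on the CPU. The example below is a 2D simplification (9 neighbouring partitions instead of 27) and uses a sequential counting sort in place of the GPU sort; the grid size, type names and function names are assumptions made for this illustration. The start array plays the role of the boundary matrix: partition c owns sorted[start[c]] up to, but not including, sorted[start[c + 1]].

```c
#include <assert.h>
#include <stddef.h>

#define GRID 4               /* hypothetical: 4 x 4 partitions */
#define CELLS (GRID * GRID)

typedef struct { float x, y; } Message;

/* Partition identifier of a position inside [0, GRID * radius) squared. */
static int cell_of(float x, float y, float radius) {
    int cx = (int)(x / radius);
    int cy = (int)(y / radius);
    assert(cx >= 0 && cx < GRID && cy >= 0 && cy < GRID);
    return cy * GRID + cx;
}

/* Sort messages by partition (counting sort) and fill the boundary array. */
static void build_partitions(const Message *in, Message *sorted, size_t n,
                             float radius, size_t start[CELLS + 1]) {
    size_t fill[CELLS] = {0};
    for (size_t c = 0; c <= CELLS; ++c) start[c] = 0;
    for (size_t i = 0; i < n; ++i) {
        start[cell_of(in[i].x, in[i].y, radius) + 1]++;
    }
    for (size_t c = 0; c < CELLS; ++c) start[c + 1] += start[c];
    for (size_t i = 0; i < n; ++i) {
        int c = cell_of(in[i].x, in[i].y, radius);
        sorted[start[c] + fill[c]++] = in[i];
    }
}

/* Count messages within `radius` of (x, y) by reading the 3 x 3 block of
   partitions around it. *examined receives the number of messages read. */
static size_t count_in_range(const Message *sorted,
                             const size_t start[CELLS + 1],
                             float x, float y, float radius,
                             size_t *examined) {
    int cx = (int)(x / radius);
    int cy = (int)(y / radius);
    size_t hits = 0;
    *examined = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || nx >= GRID || ny < 0 || ny >= GRID) continue;
            int c = ny * GRID + nx;
            for (size_t i = start[c]; i < start[c + 1]; ++i) {
                float ex = sorted[i].x - x, ey = sorted[i].y - y;
                ++*examined;
                if (ex * ex + ey * ey <= radius * radius) ++hits;
            }
        }
    }
    return hits;
}
```

As a rough check on the "2/3" figure, assume messages are uniformly distributed and the agent is away from the environment edge. In 2D the disc of radius r covers π/9 ≈ 35% of the 3 x 3 block, so about 65% of the messages read are out of range, which matches the figure quoted. Under the same assumption in 3D, the sphere covers 4π/81 ≈ 16% of the 27 partitions, so about 84% of the messages read would be out of range.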

21 Evaluation of Limited Range Communication
Figure: percentage of GPU time against population size, broken down by kernel: GPUFLAME_inputdata, GPUFLAME_move, GPUFLAME_outputdata, radixsort_kernel, merge_kernel, memcpyhtod and other_gpuflame_kernels.

22 Discrete Agent Communication
- Discrete agents reading discrete messages
  - Load messages into shared memory
- Continuous agents reading discrete messages
  - Cannot ensure that all messages are loaded into shared memory
  - Use the texture cache instead
Diagram: a 2D message output grid is copied, through a series of message loads, into the shared memory (SM) of each thread block.

23 Performance of Discrete Message Communication
- Cellular automaton model (Game of Life)
- Over 1 million agents
- Shared memory is only suitable for very small interaction ranges
Figure: GPU time (ms) against message range; series labels as extracted: TEX 64, TEX 256, SMC 64, SM.

25 A Simple Pedestrian Model
Inter-agent interaction (using spatially partitioned messaging) is based on a hybrid of the Reynolds and Social Forces models.
- Social repulsion force
  - Navigates pedestrians towards areas of low concentration
  - Limited forward vision
    - Preference given to agents in the direct line of sight
    - Scaled depending on the distance to the neighbour
- Close-range interaction force
  - Very short range, with no vision limit
  - Acts as collision avoidance

26 Visualisation Technique
- Agent data is already on the GPU
  - Agent positions are made available to OpenGL by mapping them to a Buffer Object
  - We can also store geometry on the GPU to reduce draw calls
- For complex models (lots of vertices)
  - Store a single instance of the geometry in a Vertex Array
  - Draw the array for each agent and set a Vertex Attribute each time to indicate the agent index
  - A GLSL vertex shader is used to displace vertices in the same way
- For simple models
  - Use a single large Vertex Array to hold a geometry instance for each agent
  - Associate each vertex with an agent by using a Vertex Attribute stored in a Vertex Attribute Array
  - Only suitable for simple geometry, but requires very few draw calls

27 Animation and Level of Detail (LOD)
- Animation: very simple
  - Interpolate between 2 key frames
  - Rotate the model depending on velocity direction
  - Performed in a vertex shader
- LOD: all data is maintained on the GPU, so it must remain parallel
  - Set the view position as a GLOBAL variable
  - Use an agent script to calculate the viewing distance
  - Save the LOD level in an agent variable
  - Use a parallel reduction function to count the number of agents per level
  - Secondary sort of the agents by LOD level, then render in groups
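
The LOD steps listed above can be illustrated with a short sequential sketch. The distance thresholds, the number of levels and all names below are assumptions for this example; on the GPU the per-agent step would run in an agent function, the count would be a parallel reduction and the grouping would be a parallel sort.

```c
#include <assert.h>
#include <stddef.h>

#define LOD_LEVELS 3

typedef struct { float x, y; int lod; } Pedestrian;

/* Hypothetical squared-distance limits for levels 0 and 1; anything
   farther away falls into the last level. */
static const float lod_limit_sq[LOD_LEVELS - 1] = { 100.0f, 2500.0f };

/* Per-agent step: compute the viewing distance and save the LOD level
   in an agent variable. */
static void assign_lod(Pedestrian *p, float view_x, float view_y) {
    float dx = p->x - view_x, dy = p->y - view_y;
    float d2 = dx * dx + dy * dy;
    int level = 0;
    while (level < LOD_LEVELS - 1 && d2 > lod_limit_sq[level]) {
        ++level;
    }
    p->lod = level;
}

/* Count agents per level, then write agent indices into order[] grouped
   by level so that each level can be rendered as one contiguous range. */
static void group_by_lod(Pedestrian *agents, size_t n,
                         float view_x, float view_y,
                         size_t count[LOD_LEVELS], size_t *order) {
    size_t offset[LOD_LEVELS];
    assert(agents && count && order);
    for (int l = 0; l < LOD_LEVELS; ++l) count[l] = 0;
    for (size_t i = 0; i < n; ++i) {
        assign_lod(&agents[i], view_x, view_y);
        count[agents[i].lod]++;
    }
    offset[0] = 0;
    for (int l = 1; l < LOD_LEVELS; ++l) {
        offset[l] = offset[l - 1] + count[l - 1];
    }
    for (size_t i = 0; i < n; ++i) {
        order[offset[agents[i].lod]++] = i;
    }
}
```

The per-level counts are what tell the renderer where each group starts and ends in the sorted list, which is why the count is needed in addition to the sort.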

28 Demo Agents coloured by LOD

29 Performance Results
- Observables
  - Performance is dependent on the communication radius
    - Larger communication radius = fewer partitions = more agents considered per update
  - The LOD technique has a cost
    - Don't use it for small populations
  - Very large population sizes are possible in real time
Figure: frames per second (FPS) against dynamic pedestrian population for four rendering configurations: Billboards, Detail Level 0, Detail Level 1 and Detail Level 2.

30 Environment Collision Avoidance
- A discrete grid of agents encodes the environment
  - Static discrete agents
  - Repulsive forces direct agents away from walls
  - Automatically generated in advance
- Continuous pedestrian agents read discrete messages
  - Apply a collision force
  - Displace pedestrian agents by a height value

31 Long Range Navigation
- Many agents follow similar paths, so a global solution is used
- Fluid flow route for each path through the environment
  - Calculated offline in advance by backtracking from the exit point
  - Smooth movement around obstacles
- Discrete agents are also responsible for pedestrian birth allocation
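
The slides do not say which algorithm produces the flow routes. As a stand-in, the sketch below uses a breadth-first expansion from the exit cell over a small grid, which is one simple way to "backtrack from the exit": every reachable cell receives its step distance to the exit, and the flow direction of a cell is any neighbour that is one step closer. The grid size, names and 4-connected neighbourhood are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define W 5
#define H 4
#define CELLS (W * H)
#define UNREACHED (-1)

static const int step_x[4] = { 1, -1, 0, 0 };
static const int step_y[4] = { 0, 0, 1, -1 };

/* Breadth-first expansion from the exit. dist[c] becomes the number of
   steps from cell c to the exit; wall[c] != 0 marks an obstacle. */
static void distance_from_exit(const int wall[CELLS], int exit_cell,
                               int dist[CELLS]) {
    int queue[CELLS];
    int head = 0, tail = 0;
    assert(exit_cell >= 0 && exit_cell < CELLS && !wall[exit_cell]);
    for (int c = 0; c < CELLS; ++c) dist[c] = UNREACHED;
    dist[exit_cell] = 0;
    queue[tail++] = exit_cell;
    while (head < tail) {
        int c = queue[head++];
        int x = c % W, y = c / W;
        for (int k = 0; k < 4; ++k) {
            int nx = x + step_x[k], ny = y + step_y[k];
            if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
            int n = ny * W + nx;
            if (wall[n] || dist[n] != UNREACHED) continue;
            dist[n] = dist[c] + 1;
            queue[tail++] = n;
        }
    }
}

/* Flow direction of cell c: a neighbour one step closer to the exit.
   Returns UNREACHED for the exit itself, walls and unreachable cells. */
static int next_cell(const int dist[CELLS], int c) {
    assert(c >= 0 && c < CELLS);
    if (dist[c] <= 0) return UNREACHED;
    int x = c % W, y = c / W;
    for (int k = 0; k < 4; ++k) {
        int nx = x + step_x[k], ny = y + step_y[k];
        if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
        if (dist[ny * W + nx] == dist[c] - 1) return ny * W + nx;
    }
    return UNREACHED;
}
```

Because the field is computed once in advance, each pedestrian only needs to look up the value of the cell it is standing on, which suits the case described here where many agents share the same route.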

33 Conclusions and Future Work
- Summary
  - Flexible agent architecture for the GPU, suitable for force models
  - Easily extensible
  - Massive performance/cost benefits
- Scope for future work
  - Multi-GPU
    - Would enable extremely large populations of systems to be simulated
    - For spatial partitioning, only partition boundaries would need to be communicated between GPU devices
  - Improve pedestrian models
    - Improved (more accurate) collision detection
    - Long-range individual path planning without flow grids
    - Physically accurate animation and movement
    - Much larger models (need appropriate scenarios)

34 References
- A. Treuille, S. Cooper, and Z. Popović. Continuum crowds. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers. New York, NY, USA: ACM, 2006.
- R. M. D'Souza, M. Lysenko, and K. Rahmani. Sugarscape on steroids: simulating over a million agents at interactive rates. In Proceedings of Agent2007, 2007.
- Samuel Eilenberg. Automata, Languages, and Machines. Academic Press, Inc., Orlando, FL, USA, 1974.
- T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe, and C. Vertan. Communicating stream X-machines systems are no more than X-machines. Journal of Universal Computer Science, 5(9), 1999.
- Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph., 23(3), 2004.
- Michael D. McCool, Zheng Qin, and Tiberiu S. Popa. Shader metaprogramming. In HWWS '02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 57-68, Aire-la-Ville, Switzerland, 2002. Eurographics Association.
- Lars Nyland, Mark Harris, and Jan Prins. Fast N-body simulation with CUDA. In Hubert Nguyen, editor, GPU Gems 3, chapter 31. Addison Wesley Professional, August 2007.


More information

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA

NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA GFLOPS 3500 3000 NVPRO-PIPELINE Peak Double Precision FLOPS GPU perf improved

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

Real-time Visual Tracker by Stream Processing

Real-time Visual Tracker by Stream Processing Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol

More information

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group

ATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group ATI Radeon 4800 series Graphics Michael Doggett Graphics Architecture Group Graphics Product Group Graphics Processing Units ATI Radeon HD 4870 AMD Stream Computing Next Generation GPUs 2 Radeon 4800 series

More information

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information
