Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware
|
|
- Ira Wilkinson
- 7 years ago
- Views:
Transcription
1 Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware The University of Sheffield, UK Richmond Paul, Coakley Simon, Romano Daniela, "Cellular Level Agent Based Modelling on the Graphics Processing Unit (with FLAME GPU)", Selected for review in the special issue: "Parallel and Ubiquitous methods and tools in Systems Biology" of the international journal: Briefings in Bioinformatics Richmond Paul, Coakley Simon, Romano Daniela (2009), "Cellular Level Agent Based Modelling on the Graphics Processing Unit", Proc. of HiBi09 - High Performance Computational Systems Biology, October 2009,Trento, Italy Richmond Paul, Coakley Simon, Romano Daniela(2009), "A High Performance Agent Based Modelling Framework on Graphics Card Hardware with CUDA", Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), May, 10 15, 2009, Budapest, Hungary Richmond Paul, Romano Daniela(2008), "A High Performance Framework For Agent Based Pedestrian Dynamics On GPU Hardware", Proceedings of EUROSIS ESM 2008 (European Simulation and Modelling), October 27-29, 2008, Universite du Havre, Le Havre, France
2 Introduction and Scope Agent Based Modelling (ABM) Emergence of Complex natural behaviour for simple rules Individuals are agents with memory Update own memory by considering neighbours Of Pedestrian Behaviour Continuous space mobile agents Discrete time steps On the GPU Why?: Performance and real time visualisation Aim is for Flexibility: Want to be able to harness the GPUs power without modellers having to understand GPU programming Not Continuum based (Treuille 06) or using mobile discrete agents (D Souza 07)
3 Outline FLAME and FLAME GPU About FLAME A simple example of an pedestrian model specification Implementing FLAME on the GPU Brief overview of GPU technology Mapping agent data and functions to the GPU Agent communication patterns Case Study Pedestrian modelling Discrete agents Performance results Conclusions
4 What is FLAME? What is FLAME (and what FLAME is not)? Flexible Large-scale Agent Modelling Environment XML Model specification based on the X-Machine Template systems for generating simulation code Single CPU GRID GPU Not a modelling application itself (dynamically generated API) Why extend FLAME to the GPU Complete modelling environment (beyond that of simple swarms) Formal and portable specification technique based on the X-Machine Many existing models to be used for benchmarking What is FLAME GPU Data parallel implementation of FLAME using CUDA Offers real time visualisation Cost effective solution for high performance ABM
5 FLAME and Formal Agent Specification The X-Machine formally defined by Eilenberg (Eilenberg 74) as a 8-tuple (, Γ, Q, M, Φ, F, q0, m0), where; and Γ are the input and output finite alphabet respectively; Q is the finite set of states; M is the (possibly) infinite set called memory; Φ is a finite set of partial functions ø that map an input and a memory state to an output and a new memory state, ø: M Γ M; F is the next state partial function that, given a state and a function from the type Φ, provides the next state, F: Q Φ Q (F is often described as a transition state diagram); q0 and m0 are the initial state and memory respectively; and
6 Agents as Communicating X-Machine s Each agent is a Communicating Stream X- Machine (Balanescu 99) Stream: input and output are streams of data Communicating: agents input and output messages State transitions (functions) describe agent behaviour Updates agent memory Outputs messages (and agents) and process input messages
7 Specifying an Agent in XMML <xagent> <name>pedestrian</name> <memory> <variable><type>float</type><name>x</name></variable> <variable><type>float</type><name>y</name></variable> <variable><type>float</type><name>velx</name></variable> <variable><type>float</type><name>vely</name></variable> </memory> <states> <state><name>start_state</name></state> <state><name>wait_input</name></state> <initialstate>start_state</initialstate> </states> <functions> <function> <name>output_location</name> <currentstate>start_state</currentstate><nextstate>wait_input</nextstate> <outputs> <output><messagename>pedestrian_location</messagename></output> </outputs> </function> <function> <name>input_locations</name> <currentstate>wait_input</currentstate><nextstate>start_state</nextstate> <inputs> <input><messagename>pedestrian_location</messagename></input> </inputs> </function> </functions> <type>continuous</type> </xagent>
8 Specifying Agent Communication in XMML <message> <name>pedestrian_location</name> <variables> <variable> <type>float</type><name>x</name> </variable> <variable> <type>float</type><name>y</name> </variable> <variable> <type>float</type><name>velx</name> </variable> <variable> <type>float</type><name>vely</name> </variable> </variables> <partitioningspatial> <radius>25</radius> <xmin>-100.0</xmin> <xmax>100.0</xmax> <ymin>-100.0</ymin> <ymax>100.0</ymax> <zmin>0.0</zmin> <zmax>25</zmax> </partitioningspatial> </partitioningnone> <partitioningdiscrete> <radius>0</radius> </partitioningdiscrete> </message>
9 Specifying the Function Order <layers> <layer> <layerfunction> <name>output_location</name> </layerfunction> </layer> <layer> <layerfunction> <name>input_locations</name> </layerfunction> </layer> </layers> agent->x agent->y agent->vel_x agent->vel_x pedestrian_location Message list output_location() OUT start_state wait_input input_locations() IN
10 Simulation Process and Code Generation XMML Model File Syntax validated through XML Schema Base XMML Schema describes the basic structure of an X-Machine agent GPU Specific extensions (partitioning) available through a XMMLGPU Schema Object Orientated Approach to extension of the base model C Function Files Translates an XMML model file into simulation source code Templates are written in XML (using XSLT Schema) so can be syntax validated XSLT Processors implement a W3C specification: Any compliant processor can be used to generate code FLAME GPU is therefore not dependant on internal tools or parsers XML Input Data Defines the internal memory of an initial population of agents
11 FLAME and FLAME GPU About FLAME A simple example of an pedestrian model specification Implementing FLAME on the GPU Brief overview of GPU technology Mapping agent data and functions to the GPU Agent communication patterns Case Study Pedestrian modelling Discrete agents Performance results Conclusions
12 Programming the GPU Purpose of the GPU Data parallel device for operation on streams of data Programming for General Purpose Use Graphics API Technique: Not ideal High Level Alternatives Brook GPU (Buck 04): SIMD Stream programming extension for C Sh (McCool 02): C++ language with a Compiler for GPU backends Hardware Specific Stream SDK: Low level ATI specific native instruction set and High Level support with Brook + CUDA: NVIDIA programming for GPU using a compiler and a C syntax with extensions OpenCL: New standard but growing limited support
13 NVIDIA CUDA Programming Model GPU is a coprocessor to CPU (with its own global memory) Many parallel threads of execution Each thread runs the same kernel program (SPMD) Threads are grouped into regular sized blocks Threads within a block can communicate through shared memory Simple synchronisation primitive Threads across blocks can not communicate Block of Threads Thread 0 Thread 1 Thread 2 Thread 3 Thread 4 Thread N Grid of Blocks Block 0 Block 1 Block 2 Block 3
14 CUDA Hardware Model Thread blocks are mapped to Multi Processors (MPs) Multiprocessors are a set of SIMD thread (vector) processors Limited shared memory per MP (and hence blocks) Limited cache and registers per MP GPU Device Multiprocessor 1 Registers Vector Processor 1 Multiprocessor N Multiprocessor 2 Shared Memory Registers Vector Processor 2 Registers Vector Processor N GPU DRAM Device Memory Instruction Unit Constant Cache Texture Cache
15 Mapping Agent Functions to the GPU Each transition function is wrapped by a GPU kernel Each agent is a thread performing the function Functions can input and output messages Functions can output new agents (agent birth) An agent can be removed (agent death) by returning non 0 value FLAME_GPU_FUNC int input_function( xmachine_memory_pedestrian* xmemory, xmachine_message_pedestrian_location_list* location_messages) { /* Get the first message */ xmachine_message_pedestrian_location* location_message = get_first_pedestrian_location_message(location_messages); } /* Repeat untill there are no more messages */ while(location_message) { /* Process the message */ if distance_check(xmemory, location_message) { updatesteervelocity(xmemory, location_message); } } /* Get the next message */ location_message = get_next_pedestrian_location_message(location_message, location_messages); /* Update any other xmemory variables */ xmemory->x += xmemory->vel_x*time_step;... return 0;
16 Mapping X-Machine Agent Data to the GPU All data (agents and messages) is mapped to global memory on the GPU Lists are stored using an Structure of Arrays (SoA) rather than an Array of Structures (AoS) Data is read from global memory to registers Agents and messages are referenced as C structures within function code typedef struct agent_list{ float x[n]; float y[n]; } xm_memory_agent_list; typedef struct agent{ float x; float y; } xm_memory_agent_list [N]; N N N
17 Use of Parallel Compaction Need to avoid diversity within thread blocks Agents are stored and processed in state lists to avoid conditional branching Sparse lists still occur as a result off Agent births Function filters Also during message outputs Agent Function Agent List Agent Birth Output Flags Agent Function Agent list (colour represents state) Prefix Sum Algorithm Agent list after agent function Compact New Agent List
18 Brute Force Message Communication Tile message lists into shared memory to reduce global memory access (Nyland 07) Each thread in the thread block loads a single message into shared memory on the load_first_message function Each call to load_next_message then iterates through messages in shared memory When a call to load_next_message is made after each message in SM has been returned then tile a new batch of messages Repeat until all messages have been considered
19 Effect of Optimisations on for Brute Force Message Communication Simple benchmarking model Efficient data access methods double performance Massive performance gain by using shared memory 100 SoA +SM SoA -SM AoS -SM Relative Speedup over FLAME Population Size
20 Limited Range Message Communication For each message output Environment is split into discrete partitions equal to the message range (each has a unique identifier) The message list is sorted depending on the partition which the message is within A boundary matrix indicates how many messages are within each partition by indicating the start and end index of agents within the sorted list To read all messages within a partition the boundary matrix indicates the range within the message list which needs to be iterated Each agent reads 27 partitions (for a 3D environment) including its own which guarantees messages within the range are processed. Roughly 2/3 messages are outside the range but much better than O(n)² Texture cache is used to read messages from global memory
21 Evaluation of Limited Range Communication N % 80% Percentage of GPU Time 60% 40% 20% 0% Population Size GPUFLAME_inputdata GPUFLAME_move GPUFLAME_outputdata radixsort_kernel merge_kernel memcpyhtod other_gpuflame_kernels
22 Discrete Agent Communication Discrete Agents reading Discrete Messages Load messages into shared memory 2D Message Output Message Load Message Load Message Load Message Load Message Load Message Load 6, 7, 8 and SM in Block 1 SM in Block 2 SM in Block 3 SM in Block Continuous Agent Reading Discrete Messages Cant ensure all messages are loaded into shared memory Use the texture cache instead
23 Performance of Discrete Message Communication Cellular Automaton Model (Game of Life) Over 1 million agents Shared memory only suitable for very small interaction ranges 300 TEX 64 TEX 256 SMC 64 SM GPU Time (ms) Message Range
24 FLAME and FLAME GPU About FLAME A simple example of an pedestrian model specification Implementing FLAME on the GPU Brief overview of GPU technology Mapping agent data and functions to the GPU Agent communication patterns Case Study Pedestrian modelling Discrete agents Performance results Conclusions
25 A Simple Pedestrian Model Inter agent interaction (using spatially partitioned messaging) is based on a hybrid of Reynolds and Social Forces Social repulsion force Navigates pedestrians to area of low concentration Limited forward Vision Preference over agents in direct line of sight Scaled depending on distance to neighbour Close Range Interaction Force Very short range with no limited vision Acts as collision avoidance
26 Visualisation Technique Agent data is already on the GPU Agent positions are made available to OpenGL by mapping them to a Buffer Object We can also store geometry on the GPU to reduce draw calls For Complex models (lots of vertices) Store a single instance of the geometry in a Vertex Array Draw the array for each agent and set a Vertex Attribute each time to indicate the agent index GLSL vertex shader is used to displace vertices in the same way For Simple Models we can use a single large Vertex Array to hold a geometry instance for each agent Associate each vertex with an agent by using a Vertex Attribute stored in a Vertex Attribute Array Only suitable for simple geometry but very few draw calls
27 Animation and Level of Detail (LOD) Animation - Very simple Interpolate between 2 key frames Rotate the model depending on velocity direction Performed in a vertex shader LOD - All data is maintained on GPU so must remain parallel Set View position as a GLOBAL variable Use agent script to calculate viewing distance Save LOD Level in an agent variable Use parallel reduction function to count number of agents per Level Secondary sort of the agents by LOD Level and render in groups
28 Demo Agents coloured by LOD
29 Performance Results Observables Performance Dependant on Communication Radius Larger communication = less partitions = more agents considered per update LOD technique has a cost Don t use for small populations Very large population sizes possible in real time Fram es per Second (FPS) Billboards Detail Level 0 Detail Level 1 Detail Level 2 Dynamic Pedestrian Population
30 Environment Collision Avoidance Discrete grid of agents to encode the environment Static Discrete Agents Repulsive forces direct agents from wall Automatically generated in advance Continuous Pedestrian Agents read discrete messages Apply a collision force Displace pedestrian agents by height value
31 Long Range Navigation Many agents following similar paths so a global solution is used Fluid flow route for each path through the environment Calculated offline in advance by backtracking from exit point Smooth movement around obstacles Discrete Agents also responsible for pedestrian birth allocation
32
33 Conclusions and Future Work Summary Flexible agent architecture for the GPU suitable for force models Easily extendible Massive performance/cost benefits Scope for Future Work Multi GPU Would enable extremely large populations of systems to be simulated For Spatial partitioning only partition boundaries would need to be communicated between GPU devices Improve pedestrian models Improved collision detection (more accurate) Long range individual path planning without flow grids Physically accurate animation and movement Much larger models (need appropriate scenarios)
34 References A. Treuille, S. Cooper, and Z. Popović, "Continuum crowds," in SIGGRAPH '06: ACM SIGGRAPH 2006 Papers. New York, NY, USA: ACM, 2006, pp R. M. D Souza, M. Lysenko, and K. Rahmani. Sugarscape on steroids: simulating over a million agents at interactive rates. In Proceedings of Agent2007, Samuel Eilenberg. Automata, Languages, and Machines. Academic Press, Inc., Orlando, FL, USA, T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe, and C. Vertan. Communicating stream x-machines systems are no more than x-machines. j-jucs, 5(9): , Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for gpus: stream computing on graphics hardware. ACM Trans. Graph., 23(3): , Michael D. McCool, Zheng Qin, and Tiberiu S. Popa. Shader metaprogramming. In HWWS 02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 57 68, Aire-la-Ville, Switzerland, Switzerland, Eurographics Association. Lars Nyland, Mark Harris, and Jan Prins. Fast n-body simulation with cuda. In Hubert Nguyen, editor, GPU Gems 3, chapter 31. Addison Wesley Professional, August 2007.
Introduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
More informationTexture Cache Approximation on GPUs
Texture Cache Approximation on GPUs Mark Sutherland Joshua San Miguel Natalie Enright Jerger {suther68,enright}@ece.utoronto.ca, joshua.sanmiguel@mail.utoronto.ca 1 Our Contribution GPU Core Cache Cache
More informationIntroduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software
GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas
More informationRecent Advances and Future Trends in Graphics Hardware. Michael Doggett Architect November 23, 2005
Recent Advances and Future Trends in Graphics Hardware Michael Doggett Architect November 23, 2005 Overview XBOX360 GPU : Xenos Rendering performance GPU architecture Unified shader Memory Export Texture/Vertex
More informationGraphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
More informationGPU Hardware and Programming Models. Jeremy Appleyard, September 2015
GPU Hardware and Programming Models Jeremy Appleyard, September 2015 A brief history of GPUs In this talk Hardware Overview Programming Models Ask questions at any point! 2 A Brief History of GPUs 3 Once
More informationCUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationGPGPU Computing. Yong Cao
GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power
More informationGPU Architecture. Michael Doggett ATI
GPU Architecture Michael Doggett ATI GPU Architecture RADEON X1800/X1900 Microsoft s XBOX360 Xenos GPU GPU research areas ATI - Driving the Visual Experience Everywhere Products from cell phones to super
More informationGPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
More informationLBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR
LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:
More informationThe Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA
The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques
More informationMulti-GPU Load Balancing for Simulation and Rendering
Multi- Load Balancing for Simulation and Rendering Yong Cao Computer Science Department, Virginia Tech, USA In-situ ualization and ual Analytics Instant visualization and interaction of computing tasks
More informationParallel Prefix Sum (Scan) with CUDA. Mark Harris mharris@nvidia.com
Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com April 2007 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release April 2007
More informationRadeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
More informationL20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
More informationBrook for GPUs: Stream Computing on Graphics Hardware
Brook for GPUs: Stream Computing on Graphics Hardware Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan Computer Science Department Stanford University
More informationGPU Point List Generation through Histogram Pyramids
VMV 26, GPU Programming GPU Point List Generation through Histogram Pyramids Gernot Ziegler, Art Tevs, Christian Theobalt, Hans-Peter Seidel Agenda Overall task Problems Solution principle Algorithm: Discriminator
More informationGeneral Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University
General Purpose Computation on Graphics Processors (GPGPU) Mike Houston, Stanford University A little about me http://graphics.stanford.edu/~mhouston Education: UC San Diego, Computer Science BS Stanford
More informationShader Model 3.0. Ashu Rege. NVIDIA Developer Technology Group
Shader Model 3.0 Ashu Rege NVIDIA Developer Technology Group Talk Outline Quick Intro GeForce 6 Series (NV4X family) New Vertex Shader Features Vertex Texture Fetch Longer Programs and Dynamic Flow Control
More informationAMD GPU Architecture. OpenCL Tutorial, PPAM 2009. Dominik Behr September 13th, 2009
AMD GPU Architecture OpenCL Tutorial, PPAM 2009 Dominik Behr September 13th, 2009 Overview AMD GPU architecture How OpenCL maps on GPU and CPU How to optimize for AMD GPUs and CPUs in OpenCL 2 AMD GPU
More informationGPGPU: General-Purpose Computation on GPUs
GPGPU: General-Purpose Computation on GPUs Randy Fernando NVIDIA Developer Technology Group (Original Slides Courtesy of Mark Harris) Why GPGPU? The GPU has evolved into an extremely flexible and powerful
More informationGPU Shading and Rendering: Introduction & Graphics Hardware
GPU Shading and Rendering: Introduction & Graphics Hardware Marc Olano Computer Science and Electrical Engineering University of Maryland, Baltimore County SIGGRAPH 2005 Schedule Shading Technolgy 8:30
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More informationData Parallel Computing on Graphics Hardware. Ian Buck Stanford University
Data Parallel Computing on Graphics Hardware Ian Buck Stanford University Brook General purpose Streaming language DARPA Polymorphous Computing Architectures Stanford - Smart Memories UT Austin - TRIPS
More informationIntroduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it
t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate
More informationOverview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming
Overview Lecture 1: an introduction to CUDA Mike Giles mike.giles@maths.ox.ac.uk hardware view software view Oxford University Mathematical Institute Oxford e-research Centre Lecture 1 p. 1 Lecture 1 p.
More informationApplications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
More informationIntroduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1
Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?
More informationBachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries
First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as
More informationGPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2
GPGPU for Real-Time Data Analytics: Introduction Bingsheng He 1, Huynh Phung Huynh 2, Rick Siow Mong Goh 2 1 Nanyang Technological University, Singapore 2 A*STAR Institute of High Performance Computing,
More informationVALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,
More informationReal-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university
Real-Time Realistic Rendering Michael Doggett Docent Department of Computer Science Lund university 30-5-2011 Visually realistic goal force[d] us to completely rethink the entire rendering process. Cook
More informationInteractive Level-Set Deformation On the GPU
Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationNVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist
NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get
More informationultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
More informationHP ProLiant SL270s Gen8 Server. Evaluation Report
HP ProLiant SL270s Gen8 Server Evaluation Report Thomas Schoenemeyer, Hussein Harake and Daniel Peter Swiss National Supercomputing Centre (CSCS), Lugano Institute of Geophysics, ETH Zürich schoenemeyer@cscs.ch
More informationAccelerating Wavelet-Based Video Coding on Graphics Hardware
Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing
More informationAn investigation of the efficient implementation of Cellular. Automata on multi-core CPU and GPU hardware.
An investigation of the efficient implementation of Cellular Automata on multi-core CPU and GPU hardware. Mike Gibson (Corresponding author), Ed Keedwell and Dragan Savić College of Engineering, Mathematics
More informationRadeon GPU Architecture and the Radeon 4800 series. Michael Doggett Graphics Architecture Group June 27, 2008
Radeon GPU Architecture and the series Michael Doggett Graphics Architecture Group June 27, 2008 Graphics Processing Units Introduction GPU research 2 GPU Evolution GPU started as a triangle rasterizer
More informationOverview Motivation and applications Challenges. Dynamic Volume Computation and Visualization on the GPU. GPU feature requests Conclusions
Module 4: Beyond Static Scalar Fields Dynamic Volume Computation and Visualization on the GPU Visualization and Computer Graphics Group University of California, Davis Overview Motivation and applications
More informationGPU System Architecture. Alan Gray EPCC The University of Edinburgh
GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems
More informationGPU Parallel Computing Architecture and CUDA Programming Model
GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel
More informationOpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA
OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization
More informationCSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University
GPU Generations CSE 564: Visualization GPU Programming (First Steps) Klaus Mueller Computer Science Department Stony Brook University For the labs, 4th generation is desirable Graphics Hardware Pipeline
More informationNVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA
NVPRO-PIPELINE A RESEARCH RENDERING PIPELINE MARKUS TAVENRATH MATAVENRATH@NVIDIA.COM SENIOR DEVELOPER TECHNOLOGY ENGINEER, NVIDIA GFLOPS 3500 3000 NVPRO-PIPELINE Peak Double Precision FLOPS GPU perf improved
More informationIntroducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
More informationReal-time Visual Tracker by Stream Processing
Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol
More informationATI Radeon 4800 series Graphics. Michael Doggett Graphics Architecture Group Graphics Product Group
ATI Radeon 4800 series Graphics Michael Doggett Graphics Architecture Group Graphics Product Group Graphics Processing Units ATI Radeon HD 4870 AMD Stream Computing Next Generation GPUs 2 Radeon 4800 series
More informationE6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices
E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
More informationGUI GRAPHICS AND USER INTERFACES. Welcome to GUI! Mechanics. Mihail Gaianu 26/02/2014 1
Welcome to GUI! Mechanics 26/02/2014 1 Requirements Info If you don t know C++, you CAN take this class additional time investment required early on GUI Java to C++ transition tutorial on course website
More informationGPU for Scientific Computing. -Ali Saleh
1 GPU for Scientific Computing -Ali Saleh Contents Introduction What is GPU GPU for Scientific Computing K-Means Clustering K-nearest Neighbours When to use GPU and when not Commercial Programming GPU
More informationNext Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
More informationComputer Graphics Hardware An Overview
Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and
More informationGame Development in Android Disgruntled Rats LLC. Sean Godinez Brian Morgan Michael Boldischar
Game Development in Android Disgruntled Rats LLC Sean Godinez Brian Morgan Michael Boldischar Overview Introduction Android Tools Game Development OpenGL ES Marketing Summary Questions Introduction Disgruntled
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationCUDA Basics. Murphy Stein New York University
CUDA Basics Murphy Stein New York University Overview Device Architecture CUDA Programming Model Matrix Transpose in CUDA Further Reading What is CUDA? CUDA stands for: Compute Unified Device Architecture
More informationCUBE-MAP DATA STRUCTURE FOR INTERACTIVE GLOBAL ILLUMINATION COMPUTATION IN DYNAMIC DIFFUSE ENVIRONMENTS
ICCVG 2002 Zakopane, 25-29 Sept. 2002 Rafal Mantiuk (1,2), Sumanta Pattanaik (1), Karol Myszkowski (3) (1) University of Central Florida, USA, (2) Technical University of Szczecin, Poland, (3) Max- Planck-Institut
More informationMulti-GPU Load Balancing for In-situ Visualization
Multi-GPU Load Balancing for In-situ Visualization R. Hagan and Y. Cao Department of Computer Science, Virginia Tech, Blacksburg, VA, USA Abstract Real-time visualization is an important tool for immediately
More informationGPU Computing - CUDA
GPU Computing - CUDA A short overview of hardware and programing model Pierre Kestener 1 1 CEA Saclay, DSM, Maison de la Simulation Saclay, June 12, 2012 Atelier AO and GPU 1 / 37 Content Historical perspective
More informationHow To Teach Computer Graphics
Computer Graphics Thilo Kielmann Lecture 1: 1 Introduction (basic administrative information) Course Overview + Examples (a.o. Pixar, Blender, ) Graphics Systems Hands-on Session General Introduction http://www.cs.vu.nl/~graphics/
More informationNVIDIA GeForce GTX 580 GPU Datasheet
NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines
More informationA Computer Vision System on a Chip: a case study from the automotive domain
A Computer Vision System on a Chip: a case study from the automotive domain Gideon P. Stein Elchanan Rushinek Gaby Hayun Amnon Shashua Mobileye Vision Technologies Ltd. Hebrew University Jerusalem, Israel
More informationHardware design for ray tracing
Hardware design for ray tracing Jae-sung Yoon Introduction Realtime ray tracing performance has recently been achieved even on single CPU. [Wald et al. 2001, 2002, 2004] However, higher resolutions, complex
More informationWeb Based 3D Visualization for COMSOL Multiphysics
Web Based 3D Visualization for COMSOL Multiphysics M. Jüttner* 1, S. Grabmaier 1, W. M. Rucker 1 1 University of Stuttgart Institute for Theory of Electrical Engineering *Corresponding author: Pfaffenwaldring
More informationWriting Applications for the GPU Using the RapidMind Development Platform
Writing Applications for the GPU Using the RapidMind Development Platform Contents Introduction... 1 Graphics Processing Units... 1 RapidMind Development Platform... 2 Writing RapidMind Enabled Applications...
More informationOptimizing AAA Games for Mobile Platforms
Optimizing AAA Games for Mobile Platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Epic Games, Unreal Engine 15 years in the industry 30 years of programming C64 demo
More informationHardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
More informationTowards Large-Scale Molecular Dynamics Simulations on Graphics Processors
Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors Joe Davis, Sandeep Patel, and Michela Taufer University of Delaware Outline Introduction Introduction to GPU programming Why MD
More informationOptimizing Unity Games for Mobile Platforms. Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August
Optimizing Unity Games for Mobile Platforms Angelo Theodorou Software Engineer Unite 2013, 28 th -30 th August Agenda Introduction The author and ARM Preliminary knowledge Unity Pro, OpenGL ES 3.0 Identify
More informationIntroduction to GPU Architecture
Introduction to GPU Architecture Ofer Rosenberg, PMTS SW, OpenCL Dev. Team AMD Based on From Shader Code to a Teraflop: How GPU Shader Cores Work, By Kayvon Fatahalian, Stanford University Content 1. Three
More informationACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents
More informationGPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series. By: Binesh Tuladhar Clay Smith
GPU(Graphics Processing Unit) with a Focus on Nvidia GeForce 6 Series By: Binesh Tuladhar Clay Smith Overview History of GPU s GPU Definition Classical Graphics Pipeline Geforce 6 Series Architecture Vertex
More informationIntro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1
Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion
More informationGPGPU Parallel Merge Sort Algorithm
GPGPU Parallel Merge Sort Algorithm Jim Kukunas and James Devine May 4, 2009 Abstract The increasingly high data throughput and computational power of today s Graphics Processing Units (GPUs), has led
More informationRethinking SIMD Vectorization for In-Memory Databases
SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest
More informationBig Data Visualization on the MIC
Big Data Visualization on the MIC Tim Dykes School of Creative Technologies University of Portsmouth timothy.dykes@port.ac.uk Many-Core Seminar Series 26/02/14 Splotch Team Tim Dykes, University of Portsmouth
More informationGraphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:
More informationQCD as a Video Game?
QCD as a Video Game? Sándor D. Katz Eötvös University Budapest in collaboration with Győző Egri, Zoltán Fodor, Christian Hoelbling Dániel Nógrádi, Kálmán Szabó Outline 1. Introduction 2. GPU architecture
More informationGuided Performance Analysis with the NVIDIA Visual Profiler
Guided Performance Analysis with the NVIDIA Visual Profiler Identifying Performance Opportunities NVIDIA Nsight Eclipse Edition (nsight) NVIDIA Visual Profiler (nvvp) nvprof command-line profiler Guided
More informationGPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics
GPU Architectures A CPU Perspective Derek Hower AMD Research 5/21/2013 Goals Data Parallelism: What is it, and how to exploit it? Workload characteristics Execution Models / GPU Architectures MIMD (SPMD),
More informationOpenGL Performance Tuning
OpenGL Performance Tuning Evan Hart ATI Pipeline slides courtesy John Spitzer - NVIDIA Overview What to look for in tuning How it relates to the graphics pipeline Modern areas of interest Vertex Buffer
More information15-418 Final Project Report. Trading Platform Server
15-418 Final Project Report Yinghao Wang yinghaow@andrew.cmu.edu May 8, 214 Trading Platform Server Executive Summary The final project will implement a trading platform server that provides back-end support
More informationImage Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg
Image Processing and Computer Graphics Rendering Pipeline Matthias Teschner Computer Science Department University of Freiburg Outline introduction rendering pipeline vertex processing primitive processing
More informationCUDA SKILLS. Yu-Hang Tang. June 23-26, 2015 CSRC, Beijing
CUDA SKILLS Yu-Hang Tang June 23-26, 2015 CSRC, Beijing day1.pdf at /home/ytang/slides Referece solutions coming soon Online CUDA API documentation http://docs.nvidia.com/cuda/index.html Yu-Hang Tang @
More informationHPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
More informationString Matching on a multicore GPU using CUDA
String Matching on a multicore GPU using CUDA Charalampos S. Kouzinopoulos and Konstantinos G. Margaritis Parallel and Distributed Processing Laboratory Department of Applied Informatics, University of
More informationGPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile
GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy
More informationA Load Balancing Schema for Agent-based SPMD Applications
A Load Balancing Schema for Agent-based SPMD Applications Claudio Márquez, Eduardo César, and Joan Sorribes Computer Architecture and Operating Systems Department (CAOS), Universitat Autònoma de Barcelona,
More informationDeveloper Tools. Tim Purcell NVIDIA
Developer Tools Tim Purcell NVIDIA Programming Soap Box Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best
More informationA Performance-Oriented Data Parallel Virtual Machine for GPUs
A Performance-Oriented Data Parallel Virtual Machine for GPUs Mark Segal Mark Peercy ATI Technologies, Inc. Abstract Existing GPU programming interfaces require that applications employ a graphics-centric
More informationA Case Study - Scaling Legacy Code on Next Generation Platforms
Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 00 (2015) 000 000 www.elsevier.com/locate/procedia 24th International Meshing Roundtable (IMR24) A Case Study - Scaling Legacy
More informationDesign and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures
Design and Optimization of a Portable Lattice Boltzmann Code for Heterogeneous Architectures E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy Perspectives of GPU Computing in Physics
More informationPerformance Optimization and Debug Tools for mobile games with PlayCanvas
Performance Optimization and Debug Tools for mobile games with PlayCanvas Jonathan Kirkham, Senior Software Engineer, ARM Will Eastcott, CEO, PlayCanvas 1 Introduction Jonathan Kirkham, ARM Worked with
More informationUnleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers
Unleashing the Performance Potential of GPUs for Atmospheric Dynamic Solvers Haohuan Fu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group Center for Earth System Science Tsinghua University
More informationKrishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C
Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate
More information