# Real-time Visual Tracker by Stream Processing

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Real-time Visual Tracker by Stream Processing Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter Oscar Mateo Lozano & Kuzahiro Otsuka presented by Piotr Rudol

2

3 Outline Particle Filter Basics Visual Tracking of Faces - Problem Definition CUDA Implementation Results Conclusion

4 Particle Filtering Particle filtering is a state estimation technique based on Monte Carlo simulations. Particles are random values of a state-space variable which are checked against the current output measure to generate a weight value i.e. likeliness of a particle being the one that best describes the current state of the system. The collection of all particles and their weights in each instant is a numerical approximation of the probability density function of the system. The probabilistic approach of this method provides significant robustness, as several possible states of the system are tracked at any given moment.

5 Particle Filtering First, a population of M samples is created by sampling from the prior distribution at time 0, P(X0). Then the update cycle is repeated for each time step: Each sample is propagated forward by sampling the next state value xt+1 given the current value xt for the sample, using the transition model P(Xt+1 xt). Each sample is weighted by the likelihood it assigns to the new evidence P(et+1 xt+1). The population is resampled to generate a new population of M samples. Each new sample is selected from the current population; the probability that a particular sample is selected is proportional to its weight. This procedure puts in focus samples in high-probability regions of the state space.

6 Particle Filter-based Localization

7 Problem Definition We consider the state sequence of a target: xt = (Tx, Ty, S, Rx, Ry, Rz, α) Tx, Ty - translational coordinate S - scale Rx, Ry, Rz - rotations along all axis α - global illumination variable

8 Problem Definition The state-space vector is given in each discrete time step by: x t = ft(x t-1, v t-1 ) - state transition model ft is possibly non-linear function of state xt-1, {vt-1} is an independent and identically-distributed process noise sequence. The objective of tracking is to recursively estimate x t from measurements: z t = ht(x t, n t ) - observation model ht is possibly non-linear function, {nt-1} is an independent and identically-distributed measurement noise sequence. We seek filtered estimates of x t based on the set of all available measurements z 1:t up to time t, together with some degree of belief in them. Namely, we want to know the p.d.f. p(x t z 1:t ), assuming that the initial p.d.f. p( x 0 ) is known.

9 Particle Filter Solution is an approximation method that makes use of Monte Carlo simulations [2]. We now consider discrete particles forming the set: ( ) { X t 1, t 1 = ( x 0 t 1, π t 1 0 ),..., M ( x t 1, π t 1) } M (4) We consider discrete particles forming the set: Every particle contains information about one possible state every of the particle system x contains j information about j one possible state of the t 1 and its importance weight πt 1 system and its importance weight.. This sample set is propagated in the Condensation algorithm as follows: Select A set of M new particles ( X t, t ) is generated, by random selection from the previous particles by using a p.d.f. according to their weights t 1. Diffuse A Gaussian random noise is added to each parameter of each particle in X t. Measure weight The probability t is measured using template matching, based on the difference error between the template (the collection of all pixels forming the face in t = 0, or some carefully selected pixels of the template if we use a sparse template [16, 17] as we describe in Section 2.3) and the input image at each time step. j O. Mateo Lozano, K where R 2 3 denotes the 2 3 upper submatri rotation matrix R. The matching error betw template and input image is defined based on th sity of each feature point, J(q m ), m = 1,, the intensity of the corresponding projected I( p m ), modified by the global illumination fa More precisely, the matching error, defined average matching error of N feature points, written as This sample is propagated in the Condensation algorithm as follows: ε = 1 N N ρ (κ (J(q m ), I( p m ))) m=1 Select - A set of M new particles is generated by random selection from the previous particles by updating the p.d.f. according their weight. κ(j, I) = α I J J Diffuse - A Gaussian noise is added to each parameter of each particle ρ(κ) = κ2 κ Measure weight - The likelihood of each particle being the true state is calculated based on template matching using the input image at each step. where κ(j, I) denotes the relative residual b intensity values of the feature point q and the pr point p, and ρ(κ) represents a robust functio is used to increase the robustness against outlie [17]. From the matching error for each particle

10 The Algorithm

11 Initialization Every n-th frame is analyzed by a boosting algorithm to detect new faces giving a rectangular sub-image. a b c A fitting of a statistical deformable model is performed resulting in a series of 53 landmark points that correspond to known features of a human face. The landmarks include a pair of texture coordinates, a set of numbers that refer to a unique part of a texture resource stored in memory. This way two image buffers ( height- and normalmap ) are created by applying a 3D face model.

12 Initialization The feature points (position and gray level) that form the sparse template are selected using image processing techniques, and their depth and normal to the surface values are extracted from prior maps. Finally the template is formed by a stream of N feature points formed as follows: p = (Px, Py, Pz, Nx, Ny, Nz, J) Px, Py, Pz - coordinates of feature points Nx, Ny, Nz - form the vector normal to the face surface in the feature point J - gray level of the feature point

13 Initialization Why do feature points include normal vectors? If a normal vector points away from the camera it is likely occluded by another feature point. It s weight can be disregarded.

14 Tracking Tracking is performed by means of a Particle Filter. Out of selection, diffusion, and weighting the latter is the most computationally intensive. It requires performing the 3D transformation of each feature point as estimated by each of the particles, and then a comparison is made of the feature point gray level against the resulting point in the full image. The sum of all those comparisons for each feature point results in the weight of one particle. For example with M=1000 particles and N=230 feature points it s weight computation operations! Weight calculation of each particle is an independent process!

15 Display After every tracking step, the best particles are averaged to get an approximation of the system state. The resulting state-space values can displayed on the screen as rotated, scaled and translated 3D-mesh models.

16 CUDA implementation After initialization, feature point stream is copied to device memory. It doesn t change during tracking. At every step the current video frame and a buffer with particles is copied to device memory. The kernel is invoked and executed in M blocks, each with N threads i.e. one block per particle and one thread per feature point per block. Each thread computes the matching error of a particle-feature pair i.e. transforms the feature point position and normal vector, accesses the video frame pixel in that position (if normal vector does not mean occlusion) and compares the gray level. The result is placed by every thread in different position in the block s shared memory. When all threads in a block finish (sync), first thread of every block sums up matching errors and the result is placed in global device memory. The host retrieves stream of weights of every particle.

17 Hardware Intel Core II 2.66GHz host system with 2 GB RAM NVIDIA GeForce 8800GTX GPU: 128 processing units, 16 multiprocessors, 350GFLOPS under CUDA. For comparison purposes, a software-only version of the weighting algorithm was also developed (single-threaded, no SIMD extensions used) for execution on the host CPU alone.

18 Results Results of the simultaneous tracking of four people. The frame sequence is taken from a synthetic video, the sparse templates are composed of approximately 230 feature points, and each template is tracked using 1000 particles.

19 ks, cle ch rtisime nd ult of all he in ry m bal this GPGPU (as opposed to SP) technique. Brook also suffers from the excessive overhead associated with Time per frame (ms) NRT RT GPU Software Results Particle weighting speed Number of particles Speed of the particle weighting stage, comparing Stream processing in the GPU version to a serial CPU-only version. Video and 217 feature points ut Figure 6 Speed of the particle weighting stage, comparing Stream processing in the GPU version to a serial CPU-only

20 Results Real-time visual tracker by stream processing Frames per second RT NRT Full application speed Faces GPU Software Speed of the full application, comparing stream processing in the GPU version to a serial CPU-only version. Video 1024x768, 230 feature points per face, and 1000 particles per face. Figure 7 Speed of the full application, comparing stream processing in the GPU version to a serial CPU-only version. BE, or many (usuall mind) implem One effort PFs so hardwa proble sents a step, s CPU ( in [25] pling c

21 Summary Particle Filter on GPU-based visual tracking of human faces was presented. Thanks to easy parallelization of weight computation operations a significant performance increase was achieved. GPU based filter runs easily in real-time with 10k particles comparing to approx. 4.5k for CPU only version. GPU based filter can track up to 6 faces in a video stream in real time comparing to 0 for CPU only version. The GPU-based particle filter can easily by used in different domains.

### Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

GPU Computing Numerical Simulation - from Models to Software Andreas Barthels JASS 2009, Course 2, St. Petersburg, Russia Prof. Dr. Sergey Y. Slavyanov St. Petersburg State University Prof. Dr. Thomas

### Computer Graphics Hardware An Overview

Computer Graphics Hardware An Overview Graphics System Monitor Input devices CPU/Memory GPU Raster Graphics System Raster: An array of picture elements Based on raster-scan TV technology The screen (and

### Tracking Algorithms. Lecture17: Stochastic Tracking. Joint Probability and Graphical Model. Probabilistic Tracking

Tracking Algorithms (2015S) Lecture17: Stochastic Tracking Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Deterministic methods Given input video and current state, tracking result is always same. Local

### LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR Frédéric Kuznik, frederic.kuznik@insa lyon.fr 1 Framework Introduction Hardware architecture CUDA overview Implementation details A simple case:

### Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis

### Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Wolfram Burgard, Maren Bennewitz, Diego Tipaldi, Luciano Spinello 1 Motivation Recall: Discrete filter Discretize

### Texture Cache Approximation on GPUs

Texture Cache Approximation on GPUs Mark Sutherland Joshua San Miguel Natalie Enright Jerger {suther68,enright}@ece.utoronto.ca, joshua.sanmiguel@mail.utoronto.ca 1 Our Contribution GPU Core Cache Cache

### Real-Time Camera Tracking Using a Particle Filter

Real-Time Camera Tracking Using a Particle Filter Mark Pupilli and Andrew Calway Department of Computer Science University of Bristol, UK {pupilli,andrew}@cs.bris.ac.uk Abstract We describe a particle

### ultra fast SOM using CUDA

ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A

### GeoImaging Accelerator Pansharp Test Results

GeoImaging Accelerator Pansharp Test Results Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance

### A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof

A Robust Multiple Object Tracking for Sport Applications 1) Thomas Mauthner, Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology, Austria {mauthner,bischof}@icg.tu-graz.ac.at

### A Prototype For Eye-Gaze Corrected

A Prototype For Eye-Gaze Corrected Video Chat on Graphics Hardware Maarten Dumont, Steven Maesen, Sammy Rogmans and Philippe Bekaert Introduction Traditional webcam video chat: No eye contact. No extensive

### Interactive Level-Set Deformation On the GPU

Interactive Level-Set Deformation On the GPU Institute for Data Analysis and Visualization University of California, Davis Problem Statement Goal Interactive system for deformable surface manipulation

### Introduction to GPGPU. Tiziano Diamanti t.diamanti@cineca.it

t.diamanti@cineca.it Agenda From GPUs to GPGPUs GPGPU architecture CUDA programming model Perspective projection Vectors that connect the vanishing point to every point of the 3D model will intersecate

### Introduction to GPU Programming Languages

CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

### GPGPU Computing. Yong Cao

GPGPU Computing Yong Cao Why Graphics Card? It s powerful! A quiet trend Copyright 2009 by Yong Cao Why Graphics Card? It s powerful! Processor Processing Units FLOPs per Unit Clock Speed Processing Power

### SIMULTANEOUS LOCALIZATION AND MAPPING FOR EVENT-BASED VISION SYSTEMS

SIMULTANEOUS LOCALIZATION AND MAPPING FOR EVENT-BASED VISION SYSTEMS David Weikersdorfer, Raoul Hoffmann, Jörg Conradt 9-th International Conference on Computer Vision Systems 18-th July 2013, St. Petersburg,

### Program Optimization Study on a 128-Core GPU

Program Optimization Study on a 128-Core GPU Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, Sara S. Baghsorkhi, Sain-Zee Ueng, and Wen-mei W. Hwu Yu, Xuan Dept of Computer & Information Sciences University

### GPU Computing with CUDA Lecture 2 - CUDA Memories. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

GPU Computing with CUDA Lecture 2 - CUDA Memories Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1 Outline of lecture Recap of Lecture 1 Warp scheduling CUDA Memory hierarchy

### Mean-Shift Tracking with Random Sampling

1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of

### GPU Architecture Overview. John Owens UC Davis

GPU Architecture Overview John Owens UC Davis The Right-Hand Turn [H&P Figure 1.1] Why? [Architecture Reasons] ILP increasingly difficult to extract from instruction stream Control hardware dominates µprocessors

### Robust Infrared Vehicle Tracking across Target Pose Change using L 1 Regularization

Robust Infrared Vehicle Tracking across Target Pose Change using L 1 Regularization Haibin Ling 1, Li Bai, Erik Blasch 3, and Xue Mei 4 1 Computer and Information Science Department, Temple University,

### Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

### GPU Accelerated Monte Carlo Simulations and Time Series Analysis

GPU Accelerated Monte Carlo Simulations and Time Series Analysis Institute of Physics, Johannes Gutenberg-University of Mainz Center for Polymer Studies, Department of Physics, Boston University Artemis

### GTC 2014 San Jose, California

GTC 2014 San Jose, California An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management Yigal Jhirad and Blay Tarnoff March 26, 2014 GTC 2014: Table of Contents

### Interactive Level-Set Segmentation on the GPU

Interactive Level-Set Segmentation on the GPU Problem Statement Goal Interactive system for deformable surface manipulation Level-sets Challenges Deformation is slow Deformation is hard to control Solution

### Analysis of GPU Parallel Computing based on Matlab

Analysis of GPU Parallel Computing based on Matlab Mingzhe Wang, Bo Wang, Qiu He, Xiuxiu Liu, Kunshuai Zhu (School of Computer and Control Engineering, University of Chinese Academy of Sciences, Huairou,

### Tracking in flussi video 3D. Ing. Samuele Salti

Seminari XXIII ciclo Tracking in flussi video 3D Ing. Tutors: Prof. Tullio Salmon Cinotti Prof. Luigi Di Stefano The Tracking problem Detection Object model, Track initiation, Track termination, Tracking

### The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

The Evolution of Computer Graphics Tony Tamasi SVP, Content & Technology, NVIDIA Graphics Make great images intricate shapes complex optical effects seamless motion Make them fast invent clever techniques

### CSE 167: Lecture #18: Deferred Rendering. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

CSE 167: Introduction to Computer Graphics Lecture #18: Deferred Rendering Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012 Announcements Thursday, Dec 13: Final project presentations

### Intro to GPU computing. Spring 2015 Mark Silberstein, 048661, Technion 1

Intro to GPU computing Spring 2015 Mark Silberstein, 048661, Technion 1 Serial vs. parallel program One instruction at a time Multiple instructions in parallel Spring 2015 Mark Silberstein, 048661, Technion

### Performance evaluation of multi-camera visual tracking

Performance evaluation of multi-camera visual tracking Lucio Marcenaro, Pietro Morerio, Mauricio Soto, Andrea Zunino, Carlo S. Regazzoni DITEN, University of Genova Via Opera Pia 11A 16145 Genoa - Italy

### Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015

INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale

### Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis Martínez, Gerardo Fernández-Escribano, José M. Claver and José Luis Sánchez

Alberto Corrales-García, Rafael Rodríguez-Sánchez, José Luis artínez, Gerardo Fernández-Escribano, José. Claver and José Luis Sánchez 1. Introduction 2. Technical Background 3. Proposed DVC to H.264/AVC

### CSE168 Computer Graphics II, Rendering. Spring 2006 Matthias Zwicker

CSE168 Computer Graphics II, Rendering Spring 2006 Matthias Zwicker Last time Global illumination Light transport notation Path tracing Sampling patterns Reflection vs. rendering equation Reflection equation

### CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, slee2010@stanford.edu June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

### GPGPU accelerated Computational Fluid Dynamics

t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute

### GPUs for Scientific Computing

GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research

### GPGPU acceleration in OpenFOAM

Carl-Friedrich Gauß Faculty GPGPU acceleration in OpenFOAM Northern germany OpenFoam User meeting Braunschweig Institute of Technology Thorsten Grahs Institute of Scientific Computing/move-csc 2nd October

### Speed Performance Improvement of Vehicle Blob Tracking System

Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA sungchun@usc.edu, nevatia@usc.edu Abstract. A speed

### Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### The Future Of Animation Is Games

The Future Of Animation Is Games 王 銓 彰 Next Media Animation, Media Lab, Director cwang@1-apple.com.tw The Graphics Hardware Revolution ( 繪 圖 硬 體 革 命 ) : GPU-based Graphics Hardware Multi-core (20 Cores

### Experiments in Unstructured Mesh Finite Element CFD Using CUDA

Experiments in Unstructured Mesh Finite Element CFD Using CUDA Graham Markall Software Performance Imperial College London http://www.doc.ic.ac.uk/~grm08 grm08@doc.ic.ac.uk Joint work with David Ham and

### Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.

Medical Image Processing on the GPU Past, Present and Future Anders Eklund, PhD Virginia Tech Carilion Research Institute andek@vtc.vt.edu Outline Motivation why do we need GPUs? Past - how was GPU programming

### NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get

### 15-418 Final Project Report. Trading Platform Server

15-418 Final Project Report Yinghao Wang yinghaow@andrew.cmu.edu May 8, 214 Trading Platform Server Executive Summary The final project will implement a trading platform server that provides back-end support

### Introduction to GPU Computing

Matthis Hauschild Universität Hamburg Fakultät für Mathematik, Informatik und Naturwissenschaften Technische Aspekte Multimodaler Systeme December 4, 2014 M. Hauschild - 1 Table of Contents 1. Architecture

### Head and Facial Animation Tracking using Appearance-Adaptive Models and Particle Filters

Head and Facial Animation Tracking using Appearance-Adaptive Models and Particle Filters F. Dornaika and F. Davoine CNRS HEUDIASYC Compiègne University of Technology 625 Compiègne Cedex, FRANCE {dornaika,

### MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS Julien Demouth, NVIDIA STAC-A2 BENCHMARK STAC-A2 Benchmark Developed by banks Macro and micro, performance and accuracy Pricing and Greeks for American

### Tracking Groups of Pedestrians in Video Sequences

Tracking Groups of Pedestrians in Video Sequences Jorge S. Marques Pedro M. Jorge Arnaldo J. Abrantes J. M. Lemos IST / ISR ISEL / IST ISEL INESC-ID / IST Lisbon, Portugal Lisbon, Portugal Lisbon, Portugal

### Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle

Towards robust automatic detection of vulnerable road users: monocular pedestrian tracking from a moving vehicle Kristof Van Beeck 1, Floris De Smedt 1, Sander Beckers 1, Lars Struyf 1, Joost Vennekens

### PTask: Operating System Abstractions To Manage GPUs as Compute Devices

PTask: Operating System Abstractions To Manage GPUs as Compute Devices C.J. Rossbach, J. Currey - Microsoft Research B. Ray, E. Witchel - University of Texas M.Silberstein - Technion Presentation: Adam

### INFOGR Computer Graphics. J. Bikker - April-July 2016 - Lecture 12: Post-processing. Welcome!

INFOGR Computer Graphics J. Bikker - April-July 2016 - Lecture 12: Post-processing Welcome! Today s Agenda: The Postprocessing Pipeline Vignetting, Chromatic Aberration Film Grain HDR effects Color Grading

### Deterministic Sampling-based Switching Kalman Filtering for Vehicle Tracking

Proceedings of the IEEE ITSC 2006 2006 IEEE Intelligent Transportation Systems Conference Toronto, Canada, September 17-20, 2006 WA4.1 Deterministic Sampling-based Switching Kalman Filtering for Vehicle

### Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster

Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster Jonatan Ward Sergey Andreev Francisco Heredia Bogdan Lazar Zlatka Manevska Eindhoven University of Technology,

### Graphics Processing Unit (GPU) Memory Hierarchy. Presented by Vu Dinh and Donald MacIntyre

Graphics Processing Unit (GPU) Memory Hierarchy Presented by Vu Dinh and Donald MacIntyre 1 Agenda Introduction to Graphics Processing CPU Memory Hierarchy GPU Memory Hierarchy GPU Architecture Comparison

### Multi Grid for Multi Core

Multi Grid for Multi Core Harald Köstler, Daniel Ritter, Markus Stürmer and U. Rüde (LSS Erlangen, ruede@cs.fau.de) in collaboration with many more Lehrstuhl für Informatik 10 (Systemsimulation) Universität

### NVIDIA GeForce GTX 580 GPU Datasheet

NVIDIA GeForce GTX 580 GPU Datasheet NVIDIA GeForce GTX 580 GPU Datasheet 3D Graphics Full Microsoft DirectX 11 Shader Model 5.0 support: o NVIDIA PolyMorph Engine with distributed HW tessellation engines

### VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli Department of Electrical and Computer Engineering Northeastern University,

### Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors

Towards Large-Scale Molecular Dynamics Simulations on Graphics Processors Joe Davis, Sandeep Patel, and Michela Taufer University of Delaware Outline Introduction Introduction to GPU programming Why MD

### VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS

VEHICLE TRACKING USING ACOUSTIC AND VIDEO SENSORS Aswin C Sankaranayanan, Qinfen Zheng, Rama Chellappa University of Maryland College Park, MD - 277 {aswch, qinfen, rama}@cfar.umd.edu Volkan Cevher, James

### Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

### GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

### CUDA programming on NVIDIA GPUs

p. 1/21 on NVIDIA GPUs Mike Giles mike.giles@maths.ox.ac.uk Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view

### Dynamic Resolution Rendering

Dynamic Resolution Rendering Doug Binks Introduction The resolution selection screen has been one of the defining aspects of PC gaming since the birth of games. In this whitepaper and the accompanying

### Robust Visual Tracking and Vehicle Classification via Sparse Representation

Robust Visual Tracking and Vehicle Classification via Sparse Representation Xue Mei and Haibin Ling, Member, IEEE Abstract In this paper, we propose a robust visual tracking method by casting tracking

### multimodality image processing workstation Visualizing your SPECT-CT-PET-MRI images

multimodality image processing workstation Visualizing your SPECT-CT-PET-MRI images InterView FUSION InterView FUSION is a new visualization and evaluation software developed by Mediso built on state of

### GPU-BASED TUNING OF QUANTUM-INSPIRED GENETIC ALGORITHM FOR A COMBINATORIAL OPTIMIZATION PROBLEM

GPU-BASED TUNING OF QUANTUM-INSPIRED GENETIC ALGORITHM FOR A COMBINATORIAL OPTIMIZATION PROBLEM Robert Nowotniak, Jacek Kucharski Computer Engineering Department The Faculty of Electrical, Electronic,

### Fast Implementations of AES on Various Platforms

Fast Implementations of AES on Various Platforms Joppe W. Bos 1 Dag Arne Osvik 1 Deian Stefan 2 1 EPFL IC IIF LACAL, Station 14, CH-1015 Lausanne, Switzerland {joppe.bos, dagarne.osvik}@epfl.ch 2 Dept.

### A robot s Navigation Problems. Localization. The Localization Problem. Sensor Readings

A robot s Navigation Problems Localization Where am I? Localization Where have I been? Map making Where am I going? Mission planning What s the best way there? Path planning 1 2 The Localization Problem

### Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Daniel Weingaertner Informatics Department Federal University of Paraná - Brazil Hochschule Regensburg 02.05.2011 Daniel

### The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Qingyu Meng, Alan Humphrey, Martin Berzins Thanks to: John Schmidt and J. Davison de St. Germain, SCI Institute Justin Luitjens

### Parallel algorithms to a parallel hardware: Designing vision algorithms for a GPU

Parallel algorithms to a parallel hardware: Designing vision algorithms for a GPU Jun-Sik Kim, Myung Hwangbo and Takeo Kanade Robotics Institute Carnegie Mellon University {kimjs,myung,tk}@cs.cmu.edu Abstract

### Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture. Chirag Gupta,Sumod Mohan K cgupta@clemson.edu, sumodm@clemson.edu Abstract In this project we propose a method to improve

### Clustering Billions of Data Points Using GPUs

Clustering Billions of Data Points Using GPUs Ren Wu ren.wu@hp.com Bin Zhang bin.zhang2@hp.com Meichun Hsu meichun.hsu@hp.com ABSTRACT In this paper, we report our research on using GPUs to accelerate

### GPGPU for Real-Time Data Analytics: Introduction. Nanyang Technological University, Singapore 2

GPGPU for Real-Time Data Analytics: Introduction Bingsheng He 1, Huynh Phung Huynh 2, Rick Siow Mong Goh 2 1 Nanyang Technological University, Singapore 2 A*STAR Institute of High Performance Computing,

### Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:

### Accelerating Wavelet-Based Video Coding on Graphics Hardware

Wladimir J. van der Laan, Andrei C. Jalba, and Jos B.T.M. Roerdink. Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA. In Proc. 6th International Symposium on Image and Signal Processing

### Optimisation and SPH Tricks

Optimisation and SPH Tricks + José Manuel Domínguez Alonso jmdominguez@uvigo.es EPHYSLAB, Universidade de Vigo, Spain Outline DualSPHysics Users Workshop 2015, 8-9 September 2015, Manchester (United Kingdom)

### Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary

OpenCL Optimization Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary 2 Overall Optimization Strategies Maximize parallel

### Master s thesis tutorial: part III

for the Autonomous Compliant Research group Tinne De Laet, Wilm Decré, Diederik Verscheure Katholieke Universiteit Leuven, Department of Mechanical Engineering, PMA Division 30 oktober 2006 Outline General

### GPU Parallel Computing Architecture and CUDA Programming Model

GPU Parallel Computing Architecture and CUDA Programming Model John Nickolls Outline Why GPU Computing? GPU Computing Architecture Multithreading and Arrays Data Parallel Problem Decomposition Parallel

### ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents

### Amazing renderings of 3D data... in minutes.

Amazing renderings of 3D data... in minutes. What is it? KeyShot is a standalone 3D rendering and animation application for anyone with a need to quickly and easily create photo-realistic images of 3D

### Stream Processing on GPUs Using Distributed Multimedia Middleware

Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research

### GPGPU: General-Purpose Computation on GPUs

GPGPU: General-Purpose Computation on GPUs Randy Fernando NVIDIA Developer Technology Group (Original Slides Courtesy of Mark Harris) Why GPGPU? The GPU has evolved into an extremely flexible and powerful

### Depth from a single camera

Depth from a single camera Fundamental Matrix Essential Matrix Active Sensing Methods School of Computer Science & Statistics Trinity College Dublin Dublin 2 Ireland www.scss.tcd.ie 1 1 Geometry of two

### Parallel Prefix Sum (Scan) with CUDA. Mark Harris mharris@nvidia.com

Parallel Prefix Sum (Scan) with CUDA Mark Harris mharris@nvidia.com April 2007 Document Change History Version Date Responsible Reason for Change February 14, 2007 Mark Harris Initial release April 2007

### Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application

Comparing Improved Versions of K-Means and Subtractive Clustering in a Tracking Application Marta Marrón Romera, Miguel Angel Sotelo Vázquez, and Juan Carlos García García Electronics Department, University

### Algorithm (DCABES 2009)

People Tracking via a Modified CAMSHIFT Algorithm (DCABES 2009) Fahad Fazal Elahi Guraya, Pierre-Yves Bayle and Faouzi Alaya Cheikh Department of Computer Science and Media Technology, Gjovik University

### Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

### WAKING up without the sound of an alarm clock is a

EINDHOVEN UNIVERSITY OF TECHNOLOGY, MARCH 00 Adaptive Alarm Clock Using Movement Detection to Differentiate Sleep Phases Jasper Kuijsten Eindhoven University of Technology P.O. Box 53, 5600MB, Eindhoven,

### OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

### A System for Capturing High Resolution Images

A System for Capturing High Resolution Images G.Voyatzis, G.Angelopoulos, A.Bors and I.Pitas Department of Informatics University of Thessaloniki BOX 451, 54006 Thessaloniki GREECE e-mail: pitas@zeus.csd.auth.gr

### Probability and Random Variables. Generation of random variables (r.v.)

Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly

### TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments Shinpei Kato*, Karthik Lakshmanan*, Raj Rajkumar*, and Yutaka Ishikawa** * Carnegie Mellon University ** The University of Tokyo Graphics

### Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase