Real-time Visual Tracker by Stream Processing - Simultaneous and Fast 3D Tracking of Multiple Faces in Video Sequences by Using a Particle Filter. Oscar Mateo Lozano & Kazuhiro Otsuka. Presented by Piotr Rudol
Outline Particle Filter Basics Visual Tracking of Faces - Problem Definition CUDA Implementation Results Conclusion
Particle Filtering Particle filtering is a state estimation technique based on Monte Carlo simulations. Particles are random samples of a state-space variable that are checked against the current measurement to generate a weight value, i.e., the likelihood that a particle is the one that best describes the current state of the system. The collection of all particles and their weights at each instant is a numerical approximation of the probability density function of the system state. The probabilistic nature of this method provides significant robustness, since several possible states of the system are tracked at any given moment.
Particle Filtering First, a population of M samples is created by sampling from the prior distribution at time 0, P(X0). Then the update cycle is repeated for each time step: Each sample is propagated forward by sampling the next state value xt+1 given the current value xt for the sample, using the transition model P(Xt+1 | xt). Each sample is weighted by the likelihood it assigns to the new evidence, P(et+1 | xt+1). The population is resampled to generate a new population of M samples. Each new sample is selected from the current population with probability proportional to its weight. This procedure focuses samples on the high-probability regions of the state space.
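As a minimal sketch of the update cycle above, here is a hypothetical 1-D toy tracker (a Gaussian random-walk transition and a Gaussian observation likelihood, not the paper's face model):

```python
import math
import random

random.seed(0)

def particle_filter_step(particles, weights, observation,
                         transition_noise=0.1, obs_noise=0.5):
    """One update cycle: resample -> propagate -> weight."""
    M = len(particles)
    # Resample: pick M particles with probability proportional to weight
    resampled = random.choices(particles, weights=weights, k=M)
    # Propagate: sample the next state from the transition model (random walk)
    propagated = [x + random.gauss(0.0, transition_noise) for x in resampled]
    # Weight: likelihood of the new evidence under a Gaussian observation model
    new_weights = [
        math.exp(-((observation - x) ** 2) / (2 * obs_noise ** 2))
        for x in propagated
    ]
    total = sum(new_weights) or 1.0
    return propagated, [w / total for w in new_weights]

# Track a stationary target at x = 2.0, starting from a broad prior
M = 1000
particles = [random.uniform(-5, 5) for _ in range(M)]
weights = [1.0 / M] * M
for _ in range(30):
    particles, weights = particle_filter_step(particles, weights, observation=2.0)
estimate = sum(p * w for p, w in zip(particles, weights))
```

After a few dozen cycles the weighted mean of the particle set settles near the true state, illustrating how resampling concentrates the population in high-probability regions.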
Particle Filter-based Localization
Problem Definition We consider the state sequence of a target: xt = (Tx, Ty, S, Rx, Ry, Rz, α). Tx, Ty - translational coordinates; S - scale; Rx, Ry, Rz - rotations about the three axes; α - global illumination variable.
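The seven-dimensional state can be written down directly; a minimal sketch (field names are assumptions, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class FaceState:
    """One hypothesis about a tracked face: xt = (Tx, Ty, S, Rx, Ry, Rz, alpha)."""
    tx: float     # horizontal translation
    ty: float     # vertical translation
    s: float      # scale
    rx: float     # rotation about the x axis
    ry: float     # rotation about the y axis
    rz: float     # rotation about the z axis
    alpha: float  # global illumination variable

# Each particle in the filter holds one such state plus an importance weight
hypothesis = FaceState(tx=10.0, ty=-4.0, s=1.2, rx=0.0, ry=0.1, rz=0.0, alpha=1.0)
```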
Problem Definition The state-space vector at each discrete time step is given by the state transition model: xt = ft(xt-1, vt-1), where ft is a possibly non-linear function of the state xt-1 and {vt-1} is an independent and identically distributed process noise sequence. The objective of tracking is to recursively estimate xt from the measurements given by the observation model: zt = ht(xt, nt), where ht is a possibly non-linear function and {nt} is an independent and identically distributed measurement noise sequence. We seek filtered estimates of xt based on the set of all available measurements z1:t up to time t, together with some degree of belief in them. Namely, we want to know the p.d.f. p(xt | z1:t), assuming that the initial p.d.f. p(x0) is known.
Particle Filter The particle filter is an approximation method that makes use of Monte Carlo simulations [2]. We consider a set of discrete particles: Xt-1 = {(x1t-1, π1t-1), ..., (xMt-1, πMt-1)}. Every particle xjt-1 contains information about one possible state of the system, together with its importance weight πjt-1. This sample set is propagated in the Condensation algorithm as follows: Select - a set of M new particles Xt is generated by random selection from the previous particles, each selected with probability proportional to its weight. Diffuse - Gaussian random noise is added to each parameter of each particle in Xt. Measure weight - the likelihood of each particle being the true state is computed using template matching, based on the difference error between the template (the collection of all pixels forming the face at t = 0, or some carefully selected pixels of the template if a sparse template [16, 17] is used) and the input image at each time step. The matching error is defined as the average matching error of the N feature points: ε = (1/N) Σm=1..N ρ(κ(J(qm), I(pm))), with κ(J, I) = (α·I - J) / J and ρ(κ) = κ² / (κ² + 1), where J(qm) is the intensity of feature point qm, I(pm) is the intensity of the corresponding projected point pm modified by the global illumination factor α, κ(J, I) denotes the relative residual between the two intensity values, and ρ(κ) is a robust function used to increase robustness against outliers [17].
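The matching-error definition can be sketched directly in code (pure Python, with hypothetical intensity lists standing in for the template and the projected image samples):

```python
def kappa(J, I, alpha):
    """Relative residual between template intensity J and image intensity I,
    corrected by the global illumination factor alpha."""
    return (alpha * I - J) / J

def rho(k):
    """Robust function: bounded in [0, 1), it saturates for large residuals
    and thereby suppresses the influence of outliers."""
    return k * k / (k * k + 1.0)

def matching_error(template_intensities, image_intensities, alpha=1.0):
    """Average robust matching error over the N feature points (epsilon)."""
    N = len(template_intensities)
    return sum(
        rho(kappa(J, I, alpha))
        for J, I in zip(template_intensities, image_intensities)
    ) / N
```

A perfect match gives zero error, while a grossly wrong particle yields an error close to 1 regardless of how wrong it is, which is exactly what makes the estimate robust to outliers.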
The Algorithm
Initialization Every n-th frame is analyzed by a boosting algorithm to detect new faces, giving a rectangular sub-image. A fitting of a statistical deformable model is then performed, resulting in a series of 53 landmark points that correspond to known features of a human face. The landmarks include a pair of texture coordinates, a set of numbers that refer to a unique part of a texture resource stored in memory. This way two image buffers (a height map and a normal map) are created by applying a 3D face model.
Initialization The feature points (position and gray level) that form the sparse template are selected using image processing techniques, and their depth and surface-normal values are extracted from the prior maps. Finally, the template is formed by a stream of N feature points, each formed as follows: p = (Px, Py, Pz, Nx, Ny, Nz, J). Px, Py, Pz - coordinates of the feature point; Nx, Ny, Nz - the vector normal to the face surface at the feature point; J - gray level of the feature point.
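One such feature point can be sketched as a plain record (field names are assumptions; the paper only specifies the seven components):

```python
from dataclasses import dataclass

@dataclass
class FeaturePoint:
    """One entry of the sparse template stream: p = (Px, Py, Pz, Nx, Ny, Nz, J)."""
    px: float  # 3D position of the feature point
    py: float
    pz: float
    nx: float  # surface normal at the feature point
    ny: float
    nz: float
    j: float   # gray level sampled at initialization

# The sparse template is simply a stream (list) of N such points
template = [
    FeaturePoint(0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 128.0),
    FeaturePoint(0.5, 0.2, 0.9, 0.1, 0.0, 0.99, 96.0),
]
```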
Initialization Why do feature points include normal vectors? If a transformed normal vector points away from the camera, the feature point is likely occluded by another part of the face, so its weight can be disregarded.
Tracking Tracking is performed by means of a particle filter. Of the selection, diffusion, and weighting steps, the last is the most computationally intensive. It requires performing the 3D transformation of each feature point as estimated by each of the particles, and then comparing the feature point's gray level against the resulting point in the full image. The sum of these comparisons over all feature points yields the weight of one particle. For example, with M = 1000 particles and N = 230 feature points, that is 230,000 weight computation operations per frame! Weight calculation of each particle is an independent process!
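The M × N cost structure is easiest to see in the serial version, sketched here with a stand-in `match_error` callback (the real one would transform the point by the particle's state and compare gray levels in the frame):

```python
def weigh_particles(particles, feature_points, match_error):
    """Sequential weighting: M x N matching-error evaluations per frame.

    `match_error(particle, point)` stands in for transforming one feature
    point by one particle's state and comparing gray levels in the frame.
    """
    weights = []
    for particle in particles:           # M iterations
        total = 0.0
        for point in feature_points:     # N iterations for each particle
            total += match_error(particle, point)
        weights.append(total / len(feature_points))
    return weights

# With M = 1000 and N = 230 the inner body runs 230,000 times per frame.
# Crucially, every outer iteration is independent of all the others,
# which is what makes the weighting step an ideal fit for the GPU.
```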
Display After every tracking step, the best particles are averaged to get an approximation of the system state. The resulting state-space values can be displayed on the screen as rotated, scaled, and translated 3D mesh models.
CUDA implementation After initialization, the feature point stream is copied to device memory; it does not change during tracking. At every step, the current video frame and a buffer of particles are copied to device memory. The kernel is invoked and executed in M blocks, each with N threads, i.e., one block per particle and one thread per feature point. Each thread computes the matching error of one particle-feature pair: it transforms the feature point's position and normal vector, accesses the video frame pixel at that position (if the normal vector does not indicate occlusion), and compares the gray levels. Every thread places its result in a different position in the block's shared memory. When all threads in a block finish (synchronization), the first thread of every block sums up the matching errors and places the result in global device memory. The host then retrieves the stream of weights, one per particle.
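The block/thread mapping can be emulated in plain Python to make the data flow concrete (a sketch only; the real kernel runs one CUDA block per particle and one thread per feature point, with the inner loop replaced by parallel threads):

```python
def kernel_emulation(particles, feature_points, frame_lookup):
    """Emulate the CUDA launch: M blocks (particles) x N threads (feature points).

    `frame_lookup(particle, point)` is a hypothetical stand-in that transforms
    the point by the particle's state and reads the video frame; it returns the
    pixel's gray level, or None when the transformed normal faces away from
    the camera (occlusion).
    """
    weights = []
    for particle in particles:              # one block per particle
        shared = []                         # the block's shared memory
        for point in feature_points:        # one thread per feature point
            pixel = frame_lookup(particle, point)
            if pixel is None:               # occluded: contributes nothing
                shared.append(0.0)
            else:
                shared.append(abs(pixel - point["j"]))
        # after a barrier (__syncthreads), thread 0 reduces shared memory
        weights.append(sum(shared))
    return weights                          # copied back to the host
```

The shared-memory reduction is the only point of coordination; everything before the barrier is embarrassingly parallel, which is why the mapping to M × N threads is so direct.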
Hardware Intel Core 2 2.66 GHz host system with 2 GB RAM. NVIDIA GeForce 8800 GTX GPU: 128 processing units in 16 multiprocessors, 350 GFLOPS under CUDA. For comparison purposes, a software-only version of the weighting algorithm was also developed (single-threaded, no SIMD extensions used) for execution on the host CPU alone.
Results Results of the simultaneous tracking of four people. The frame sequence is taken from a synthetic 1024×768 video, the sparse templates are composed of approximately 230 feature points, and each template is tracked using 1000 particles.
Results - Particle weighting speed. Figure 6: Speed of the particle weighting stage (time per frame in ms vs. number of particles), comparing stream processing in the GPU version to a serial CPU-only version. Video 1024×768, 217 feature points.
Results - Full application speed. Figure 7: Speed of the full application (frames per second vs. number of tracked faces), comparing stream processing in the GPU version to a serial CPU-only version. Video 1024×768, 230 feature points per face, and 1000 particles per face.
Summary A GPU-based particle filter for visual tracking of human faces was presented. Thanks to the easy parallelization of the weight computation, a significant performance increase was achieved. The GPU-based filter easily runs in real time with 10,000 particles, compared to approximately 4,500 for the CPU-only version. The GPU-based filter can track up to 6 faces in a video stream in real time, while the CPU-only version cannot track even one. The GPU-based particle filter can easily be applied in other domains.