Computer Vision on GPU with OpenCV Anatoly Baksheev, Itseez

Similar documents
Computer Vision on GPU with OpenCV Anton Obukhov, NVIDIA

Introduction to OpenCV for Tegra. Shalini Gupta, Nvidia

Learn CUDA in an Afternoon: Hands-on Practical Exercises

HPC with Multicore and GPUs

Image Processing & Video Algorithms with CUDA

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter

CUDA Programming. Week 4. Shared memory and register

Local features and matching. Image classification & object localization

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Introduction to GPU Computing

Lecture 2: The SVM classifier

Introduction to Numerical General Purpose GPU Computing with NVIDIA CUDA. Part 1: Hardware design and programming model

The Visual Internet of Things System Based on Depth Camera

Introduction to GPU hardware and to CUDA

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

A Prototype For Eye-Gaze Corrected

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

CUDA Basics. Murphy Stein New York University

How To Program With Adaptive Vision Studio

Interactive Level-Set Deformation On the GPU

Computer Graphics Hardware An Overview

A Computer Vision System on a Chip: a case study from the automotive domain

Advanced CUDA Webinar. Memory Optimizations

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

OpenACC Basics Directive-based GPGPU Programming

CUDAMat: a CUDA-based matrix class for Python

Lecture Notes, CEng 477

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Rootbeer: Seamlessly using GPUs from Java

Introduction to GPGPU. Tiziano Diamanti

GPU Computing with CUDA Lecture 3 - Efficient Shared Memory Use. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Hands-on CUDA exercises

APPLICATIONS OF LINUX-BASED QT-CUDA PARALLEL ARCHITECTURE

Interactive Level-Set Segmentation on the GPU

The use of computer vision technologies to augment human monitoring of secure computing facilities

T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN

RIPL. An Image Processing DSL. Rob Stewart & Deepayan Bhowmik. 1st May, Heriot Watt University

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK

Analecta Vol. 8, No. 2 ISSN

Go Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING

Texture Cache Approximation on GPUs

GeoImaging Accelerator Pansharp Test Results

GPU Hardware and Programming Models. Jeremy Appleyard, September 2015

EXPLORING IMAGE-BASED CLASSIFICATION TO DETECT VEHICLE MAKE AND MODEL FINAL REPORT

How To Use Trackeye

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Distributed Vision Processing in Smart Camera Networks

MVTec Software GmbH.

GPU Point List Generation through Histogram Pyramids

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

QCD as a Video Game?

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

Image Classification for Dogs and Cats

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

Tutorial for Tracker and Supporting Software By David Chandler

Embedded Vision on FPGAs The MathWorks, Inc. 1

High Performance Matrix Inversion with Several GPUs

How does the Kinect work? John MacCormick

GPUs for Scientific Computing

Comp 410/510. Computer Graphics Spring Introduction to Graphics Systems

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

GPGPU Computing. Yong Cao

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Retargeting PLAPACK to Clusters with Hardware Accelerators

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

Lazy OpenCV installation and use with Visual Studio

Accelerating Wavelet-Based Video Coding on Graphics Hardware

OpenACC 2.0 and the PGI Accelerator Compilers

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Parallel algorithms to a parallel hardware: Designing vision algorithms for a GPU

Image Processing and Computer Graphics. Rendering Pipeline. Matthias Teschner. Computer Science Department University of Freiburg

Adaptive GPU-Accelerated Software Beacon Processing for Geospace Sensing

Sélection adaptative de codes polyédriques pour GPU/CPU

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

Turbomachinery CFD on many-core platforms experiences and strategies

GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

IMAGE PROCESSING WITH CUDA

GPU-Based Network Traffic Monitoring & Analysis Tools

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

GPU Programming in Computer Vision

CUDA Optimization with NVIDIA Tools. Julien Demouth, NVIDIA

Android Ros Application

Image Processing on Graphics Processing Unit with CUDA and C++

Feature Tracking and Optical Flow

Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang

Introduction to GPU Programming Languages

In: Proceedings of RECPAD th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal

Real time vehicle detection and tracking on multiple lanes

FPGA-BASED PARALLEL HARDWARE ARCHITECTURE FOR REAL-TIME OBJECT CLASSIFICATION. by Murad Mohammad Qasaimeh

Project Discussion Multi-Core Architectures and Programming

Introduction to Computer Graphics

CUDA Debugging. GPGPU Workshop, August Sandra Wienke Center for Computing and Communication, RWTH Aachen University

Transcription:

Computer Vision on GPU with OpenCV Anatoly Baksheev, Itseez (anatoly.baksheev@itseez.com)

Outline Introduction into OpenCV Overview of OpenCV GPU module Pedestrian detection on GPU 2

Lure into 3

OpenCV History OpenCV started Alpha Release CVPR 00 Beta 1 Beta 2 Beta 3 Beta 4 Beta 5 Release 1.0 Release 2.0 Release 1.1 Release 2.2 (GPU Beta) Release 2.3 R 2.1 (GPU) 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 Intel Willow Garage NVIDIA Original goal: Accelerate the field by lowering the bar to computer vision Find compelling uses for the increasing MIPS out in the market Staffing: Climbed in 1999 to average 7 first couple of years Little development from 2002 2008 Willow entered in 2008 to accelerate development, NVIDIA joined in 2010 8 full time professional developers, 3 of them dedicated to GPU 4

Projects Using OpenCV Academic and Research Large community Industry Google street view Structure from motion in movies Robotics Over 3 000 000 downloads 5

OpenCV Functionality Overview Image processing General Image Processing Segmentation Machine Learning, Detection Image Pyramids Transforms Fitting Video, Stereo, and 3D Camera Calibration Features Depth Maps Optical Flow Inpainting Tracking 6

OpenCV License Based on BSD license Free for commercial and research use Does not force your code to be open You need not contribute back 7

Outline Introduction into OpenCV OpenCV GPU module Pedestrian detection on GPU 8

OpenCV GPU Module Goals: Provide developers with a convenient computer vision framework on the GPU Maintain conceptual consistency with the current CPU functionality Achieve the best performance with GPUs Efficient kernels tuned for modern architectures Optimized dataflows (asynchronous execution, copy overlaps, zero-copy) 9

OpenCV GPU Module Example Mat frame; VideoCapture capture(camera); cv::hogdescriptor hog; hog.setsvmdetector(cv::hogdescriptor:: getdefaultpeopledetectorector()); capture >> frame; Mat frame; VideoCapture capture(camera); cv::gpu::hogdescriptor hog; hog.setsvmdetector(cv::hogdescriptor:: getdefaultpeopledetectorector()); capture >> frame; GpuMat gpu_frame; gpu_frame.upload(frame); vector<rect> found; hog.detectmultiscale(frame, found, 1.4, Size(8, 8), Size(0, 0), 1.05, 8); vector<rect> found; hog.detectmultiscale(gpu_frame, found, 1.4, Size(8, 8), Size(0, 0), 1.05, 8); Designed very similar! 10

OpenCV GPU Module Contents Image processing building blocks: Color conversions Geometrical transforms Per-element operations Integrals, reductions Template matching Filtering engine Feature detectors High-level algorithms: Stereo matching Face detection SURF 11

GPU Demo Pack: http://sourceforge.net/projects/opencvlibrary/ 12

Using in your application We recommend CMake for Quick Start. Allows to generate project for your favorite IDE CmakeLists.txt: cmake_minimum_required(version 2.8) set(the_target YourProjName} set(src "main.cpp" "other.cpp") project( ${the_target} ) find_package( OpenCV REQUIRED ) include_directories( ${OpenCV_INCLUDE_DIR} ) add_executable( ${the_target} ${src} ) target_link_libraries( ${the_target} ${OpenCV_LIBS} ) 13

OpenCV GPU Data Structures Class GpuMat For storing 2D image in GPU memory, just like class cv::mat Reference counting Can point to data allocated by user cv::gpumat edges; cv::mat edges_host; cv::gpumat image(size(1024, 768), CV_8UC1); // allocates GPU memory cv::mat image_host = cv::imread( file.png ); // loads image from file image.upload(image_host); cv::gpu::canny(image, edges, 1, 0, 3); // run GPU edges.download(edges_host); cv::imshow( derivate, edges_host); // displays result // cv::gpu::imshow( edges OpenGL, edges); // coming soon 14

GpuMat: Types explained cv::gpumat image(size(1024, 768), CV_8UC3); CV_<depth>C<number of channels> depth: 8U, 8S, 16U, 16S, 32S, 32F, 64F channels: 1, 2, 3, 4 Ex: CV_8UC1, CV_8UC3, CV_8UC4 CV_32F, CV32FC3 15

500 GpuMat: step explained cv::gpumat image(size(1000, 500), CV_8UC3); 1000 * 3 * sizeof(uchar) USED Bad idea: Size(1, n) step (x,y): - image.data + image.step * y + 3 * sizeof(unsigned char) * x - image.ptr<unsigned char>(y) + image.channels() * x; 16

Memory Allocation Policy GPU memory allocations are very expensive OpenCV GPU functions allocate memory inside if necessary cv::gpumat dx; cv::gpu::sobel(image, dx, 1, 0, 3); Pre-allocation on initialization stage cv::gpumat dx(size, CV_8U); //main functionality... cv::gpu::sobel(image, dx, 1, 0, 3);... Smart code organization cv::gpumat dx; for(;;) // video loop { GpuMat image = getnextvideoframe(); cv::gpu::sobel(image, dx, 1, 0, 3); } 17

Using OpenCV GPU with your code Constructing for memory allocated by you GpuMat(rows, cols, type, pointer, step) Passing to any other GPU libraries (CUFFT, CUBLAS, MAGMA) GpuMat::cols, rows, data, step Passing to your CUDA code Convertible to PtrStep<T>/PtrStepSz<T> global void func(cv::gpu::ptrstepsz<uchar3> mat) { int x = blockidx.x * blockdim.x + threadidx.x; int y = blockidx.y * blockdim.y + threadidx.y; } if (x < mat.cols && y < mat.rows) { int val = min(255, x + y); matr.ptr(y)[x] = make_char3(val, val, val); } 18

OpenCV GPU Module Performance Tesla C2050 (Fermi) vs. Core i5-760 2.8GHz (4 cores, TBB, SSE) Average speedup for primitives: 33 For good data (large images are better) Without copying to GPU What can you get from your computer? opencv\samples\gpu\performance 19

FULL HD FULL HD OpenCV GPU: Performance of High level algorithms HOG (8 ) SURF (12 ) Viola-Jones (6 ) Stereo matching BM (12 ) BP (20 ) GPU BM GPU CSBP 20

Documentation and tutorials http://opencv.itseez.com/ http://opencv.itseez.com/doc/tutorials/tutorials.html 22

Outline Introduction into OpenCV OpenCV GPU module Pedestrian detection on GPU 23

Pedestrian Detection HOG descriptor Introduced by Navneet Dalal and Bill Triggs (2005) Feature vectors are compatible with the INRIA Object Detection and Localization Toolkit http://pascal.inrialpes.fr/soft/olt/ 24

Pedestrian Detection: HOG Descriptor Object shape is characterized by distributions of: Gradient magnitude Edge/Gradient orientation Grid of orientation histograms Magnitude Edge Orientation 25

HOG Descriptor Sliding window algorithm Compute histograms Normalize blocks Blocks overlap Linear SVM Multi-scale Blocks of cells Cells Window descriptor 26

Multi-scale 27

Pedestrian Detection: Step 1 Gradients computation Block histograms calculation Histograms normalization Linear SVM Gamma correction improves quality Sobel filter 3x3 by columns and rows Output: magnitude and angle 28

Pedestrian Detection: Step 2 Gradients computation Block histograms calculation Histograms normalization Linear SVM Big intersection in close positions Require window stride to be multiple of cell size Histograms of blocks are computed independently Image 29

Pedestrian Detection: Step 2 Gradients computation Block histograms calculation Histograms normalization Linear SVM Pixels vote in proportion to gradient magnitude Tri-linear interpolation 2 orientation bins 4 cells Gaussian Decreases weight of pixels near block boundary 30

Pedestrian Detection: Step 3 Gradients computation Block histograms calculation Histograms normalization Linear SVM Normalization L2-Hys norm L2 norm, clipping, normalization 2 parallel reductions in shared memory 31

Pedestrian Detection: Step 4 Gradients computation Block histograms calculation Histograms normalization Linear SVM Linear SVM GPU time, % Classification is just a dot product 6% 5% 20% Gamma + Gradients 1 thread block per window position 30% Histograms SVM Normalize 39% Other 32

Pedestrian Detection Performance 8 times faster! Detection rate Same as CPU FPS 40 35 30 25 20 15 10 5 Core i5 2.8 GHz TBB, 4 cores Tesla C2050 0 768x576 1536x1152 33

Coming soon Implementation of Microsoft Kinect Fusion License??? 34

Questions? http://opencv.willowgarage.com/wiki http://opencv.itseez.com anatoly.bakshev@itseez.com 35