Computer Vision on GPU with OpenCV Anton Obukhov, NVIDIA (aobukhov@nvidia.com)

Similar documents
Introduction to OpenCV for Tegra. Shalini Gupta, Nvidia

Local features and matching. Image classification & object localization

A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK

How To Train A Face Recognition In Python And Opencv

CS231M Project Report - Automated Real-Time Face Tracking and Blending

PARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)

Computer Vision Technology. Dave Bolme and Steve O Hara

HPC with Multicore and GPUs

A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

Android and OpenCV Tutorial

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Texture Cache Approximation on GPUs

How To Program With Adaptive Vision Studio

A Computer Vision System on a Chip: a case study from the automotive domain

Advanced analytics at your hands

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

GPU Point List Generation through Histogram Pyramids

Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab

VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

GPU Usage. Requirements

A Prototype For Eye-Gaze Corrected

Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

Computer Graphics Hardware An Overview

Next Generation GPU Architecture Code-named Fermi

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

Interactive Level-Set Deformation On the GPU

Real Time Industrial Application of Single Board Computer Based Color Detection System

GETTING STARTED WITH ANDROID DEVELOPMENT FOR EMBEDDED SYSTEMS

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Equalizer. Parallel OpenGL Application Framework. Stefan Eilemann, Eyescale Software GmbH

Real-time Visual Tracker by Stream Processing

Distributed Vision Processing in Smart Camera Networks

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Get an Easy Performance Boost Even with Unthreaded Apps. with Intel Parallel Studio XE for Windows*

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

Getting Started with RemoteFX in Windows Embedded Compact 7

The use of computer vision technologies to augment human monitoring of secure computing facilities

IP Video Rendering Basics

Go Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING

Lazy OpenCV installation and use with Visual Studio

Basler. Line Scan Cameras

Real-time Person Detection and Tracking in Panoramic Video

RIPL. An Image Processing DSL. Rob Stewart & Deepayan Bhowmik. 1st May, Heriot Watt University

Implementation of Canny Edge Detector of color images on CELL/B.E. Architecture.

ultra fast SOM using CUDA

Lecture 2: The SVM classifier

Learn CUDA in an Afternoon: Hands-on Practical Exercises

Using Mobile Processors for Cost Effective Live Video Streaming to the Internet

In: Proceedings of RECPAD th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

Revision history: New comments added to: m)! Universal Driver

If you are working with the H4D-60 or multi-shot cameras we recommend 8GB of RAM on a 64 bit Windows and 1GB of video RAM.

CUDAMat: a CUDA-based matrix class for Python

Silverlight for Windows Embedded Graphics and Rendering Pipeline 1

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Visual Internet of Things System Based on Depth Camera

Network Traffic Monitoring and Analysis with GPUs

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

GPU-Based Network Traffic Monitoring & Analysis Tools

Analecta Vol. 8, No. 2 ISSN

NVIDIA CUDA GETTING STARTED GUIDE FOR MAC OS X

Anime Studio Debut vs. Pro

Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction

Intro to GPU computing. Spring 2015 Mark Silberstein, , Technion 1

NVIDIA Performance Primitives & Video Codecs on GPU. Gold Room Thursday 1 st October 2009 Anton Obukhov & Frank Jargstorff

EXPLORING IMAGE-BASED CLASSIFICATION TO DETECT VEHICLE MAKE AND MODEL FINAL REPORT

Example of Standard API

ROBOTRACKER A SYSTEM FOR TRACKING MULTIPLE ROBOTS IN REAL TIME. by Alex Sirota, alex@elbrus.com

Robust Algorithms for Current Deposition and Dynamic Load-balancing in a GPU Particle-in-Cell Code

Multi-GPU Load Balancing for Simulation and Rendering

An Active Head Tracking System for Distance Education and Videoconferencing Applications

Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Stream Processing on GPUs Using Distributed Multimedia Middleware

Research and Design of Universal and Open Software Development Platform for Digital Home

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Writing Applications for the GPU Using the RapidMind Development Platform

Network Traffic Monitoring & Analysis with GPUs

DEFERRED IMAGE PROCESSING IN INTEL IPP LIBRARY

Overview. Proven Image Quality and Easy to Use Without a Frame Grabber. Your benefits include:

Packet-based Network Traffic Monitoring and Analysis with GPUs

Basler. Area Scan Cameras

Lecture Notes, CEng 477

Affdex SDK for Windows!

Turbomachinery CFD on many-core platforms experiences and strategies

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

CS1112 Spring 2014 Project 4. Objectives. 3 Pixelation for Identity Protection. due Thursday, 3/27, at 11pm

Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Crosswalk: build world class hybrid mobile apps

Capacities Overview: 9.7 MultiTouch Screen with IPS technology Access to AndroidTM apps HD Multimedia playback

Introduction to GPU hardware and to CUDA

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

EFFICIENT VEHICLE TRACKING AND CLASSIFICATION FOR AN AUTOMATED TRAFFIC SURVEILLANCE SYSTEM

Transcription:

Computer Vision on GPU with OpenCV Anton Obukhov, NVIDIA (aobukhov@nvidia.com)

Outline Introduction into OpenCV OpenCV GPU module Face Detection on GPU Pedestrian detection on GPU 2

OpenCV History OpenCV started Alpha Release CVPR 00 Beta 1 Beta 2 Beta 3 Beta 4 Beta 5 Release 1.0 Release 2.0 Release 1.1 Release 2.2 (GPU Beta) Release 2.3 R 2.1 (GPU) 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 Intel Willow Garage NVIDIA Original goal: Accelerate the field by lowering the bar to computer vision Find compelling uses for the increasing MIPS out in the market Staffing: Climbed in 1999 to average 7 first couple of years Little development from 2002 2008 Willow entered in 2008 to accelerate development, NVIDIA joined in 2010 8 full time professional developers, 3 of them dedicated to GPU 3

OpenCV Functionality Overview Image processing General Image Processing Segmentation Machine Learning, Detection Image Pyramids Transforms Fitting Video, Stereo, and 3D Camera Calibration Features Depth Maps Optical Flow Inpainting Tracking 4

OpenCV Architecture and Development Languages: C C++ Python CUDA JAVA (plans) Technologies: CUDA SSE TBB 3 rd party libs: Eigen IPP Jasper JPEG, PNG OpenNI QT TBB VideoInput Development: Maintainers Contributors QA: Buildbot Google Tests Modules: Core ImgProc HighGUI GPU ML ObjDetect Video Calib3D Features2D FLANN Target archs: X86 X64 ARM CUDA Target OS: Windows Linux Mac OS Android 5

OpenCV License Based on BSD license Free for commercial and research use Does not force your code to be open You need not contribute back 6

Projects Using OpenCV Academic and Research Large community Industry Google street view Structure from motion in movies Robotics Over 3 000 000 downloads 7

Recent OpenCV release (2.3rc, June 2011) Added patent-free feature descriptors Improved camera calibration Added panorama stitching module GPU module release Android port Fine documentation (online too: http://opencv.itseez.com/) 250 bugfixes Full list on http://opencv.willowgarage.com/wiki/opencv Change Logs 8

Outline Introduction into OpenCV OpenCV GPU module Face Detection on GPU Pedestrian detection on GPU 9

OpenCV GPU Module Goals: Provide developers with a convenient computer vision framework on the GPU Maintain conceptual consistency with the current CPU functionality Achieve the best performance with GPUs Efficient kernels tuned for modern architectures Optimized dataflows (asynchronous execution, copy overlaps, zero-copy) 10

OpenCV GPU Module Contents Image processing building blocks: Color conversions Geometrical transforms Per-element operations Integrals, reductions Template matching Filtering engine Feature detectors High-level algorithms: Stereo matching Face detection SURF 11

OpenCV GPU Module Usage Prerequisites: Get sources from SourceForge or SVN CMake NVIDIA Display Driver NVIDIA GPU Computing Toolkit (for CUDA) Build OpenCV with CUDA support #include <opencv2/gpu/gpu.hpp> http://opencv.willowgarage.com/wiki/installguide 12

OpenCV GPU Data Structures Class GpuMat For storing 2D image in GPU memory, just like class cv::mat Reference counting Can point to data allocated by user Class CudaMem For pinned memory support Can be transformed into cv::mat or cv::gpu::gpumat Class Stream Overloads with extra Stream parameter // class GpuMat GpuMat(Size size, int type); GpuMat(const GpuMat& m); explicit GpuMat (const Mat& m); GpuMat& operator = (const GpuMat& m); GpuMat& operator = (const Mat& m); void upload(const Mat& m); void upload(const CudaMem& m, Stream& stream); void download(mat& m) const; void download(cudamem& m, Stream& stream) const; // class Stream bool queryifcomplete(); void waitforcompletion(); void enqueuedownload(const GpuMat& src, Mat& dst); void enqueueupload(const Mat& src, GpuMat& dst); void enqueuecopy(const GpuMat& src, GpuMat& dst); 13

OpenCV GPU Module Example Mat frame; VideoCapture capture(camera); cv::hogdescriptor hog; hog.setsvmdetector(cv::hogdescriptor:: getdefaultpeopledetectorector()); capture >> frame; Mat frame; VideoCapture capture(camera); cv::gpu::hogdescriptor hog; hog.setsvmdetector(cv::hogdescriptor:: getdefaultpeopledetectorector()); capture >> frame; GpuMat gpu_frame; gpu_frame.upload(frame); vector<rect> found; hog.detectmultiscale(frame, found, 1.4, Size(8, 8), Size(0, 0), 1.05, 8); vector<rect> found; hog.detectmultiscale(gpu_frame, found, 1.4, Size(8, 8), Size(0, 0), 1.05, 8); Designed very similar! 14

OpenCV and NPP NPP is NVIDIA Performance Primitives library of signal and image processing functions (similar to Intel IPP) NVIDIA will continue adding new primitives and optimizing for future architectures GPU module uses NPP whenever possible Highly optimized implementations for all supported NVIDIA architectures and OS Part of CUDA Toolkit no additional dependencies OpenCV extends NPP and uses it to build higher level CV 15

OpenCV GPU Module Performance Tesla C2050 (Fermi) vs. Core i5-760 2.8GHz (4 cores, TBB, SSE) Average speedup for primitives: 33 For good data (large images are better) Without copying to GPU What can you get from your computer? opencv\samples\gpu\perfomance 16

OpenCV GPU: Histogram of Oriented Gradients Used for pedestrian detection Speed-up ~ 8 17

OpenCV GPU: Speeded Up Robust Features SURF (12 ) Bruteforce matcher K-Nearest search (20-30 ) In radius search (3-5 ) 18

FULL HD FULL HD OpenCV GPU: Stereo Vision Stereo Block Matching (7 ) Can run Full HD real-time on Dual-GPU GPU BM Hierarchical Dense Stereo Belief Propagation (20 ) Constant space BP (50-100 ) GPU CSBP 19

OpenCV GPU: Viola-Jones Cascade Classifier Used for face detection Speed-up ~ 6 Based on NCV classes (NVIDIA implementation) 20

OpenCV with Multiple GPUs Algorithms designed with single GPU in mind You can split workload manually in slices: Stereo Block Matching (dual-gpu speedup ~ 1.8 ) Multi-scale pedestrian detection: linear speed-up (scale-parallel) 21

OpenCV Needs Your Feedback! Help us set development priorities Which OpenCV functions do you use? Which are the most painful and time-consuming today? The more information you can provide about your end application, the better Feature request/feedback form on OpenCV Wiki: http://opencv.willowgarage.com/wiki/opencv_gpu 22

Outline Introduction into OpenCV OpenCV GPU module Face Detection on GPU Pedestrian detection on GPU 23

GPU Face Detection: Motivation One of the first Computer Vision problems Soul of Human-Computer interaction Smart applications in real life 24

GPU Face Detection: Problem Locate all upright frontal faces: Where face detection does not work: 25

GPU Face Detection: Approaches Viola-Jones Haar classifiers framework: Y 1 2 N Y Y Y Face N N N Not face N Basic idea: reject most non-face regions on early stages 26

Classifiers Cascade Explained 20 pix White points represent face windows passed through the 1,2,3,6, and 20 classifier stages Time for CUDA to step in! (Parallel windows processing) 27

GPU Face Detection: Haar Classifier Each stage comprises a strong classifier: 1 h 1 h 2 h S T φ > f (X) Yes No α 1 α 2 28

Haar Features Explained Most representative Haar features for Face Detection 29

Integral Image Explained Each Integral Image pixel contains the sum of all pixels of the original image to the left and top Calculation of sum of pixels in a rectangle can be done in 4 accesses to the integral image 30

Integral Images with CUDA Algorithm: Integrate image rows Integrate image columns Known as Parallel Scan (one CUDA thread per element): Input: Output: 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 31

GPU Face Detection Performance 32

OpenCV NCV Framework Features: Native and Stack GPU memory allocators Protected allocations (fail-safety) Containers: NCVMatrix, NCVVector Runtime C++ template dispatcher NPP_staging a place for missing NPP functions Integral images Mean and StdDev calculation Vector compaction 33

OpenCV NCV Haar Cascade Classifiers Haar Object Detection from OpenCV GPU module: Implemented on top of NCV Uses NPP with extensions (NPP_staging) Not only faces! Suitable for production applications Reliable (fail-safe) Largest Object mode (up to 200 fps) All Objects mode 34

Outline Introduction into OpenCV OpenCV GPU module Face Detection on GPU Pedestrian detection on GPU 35

Pedestrian Detection HOG descriptor Introduced by Navneet Dalal and Bill Triggs Feature vectors are compatible with the INRIA Object Detection and Localization Toolkit http://pascal.inrialpes.fr/soft/olt/ 36

Pedestrian Detection: HOG Descriptor Object shape is characterized by distributions of: Gradient magnitude Edge/Gradient orientation Grid of orientation histograms Magnitude Edge Orientation 37

Pedestrian Detection: Working on Image Gamma correction Gradients calculation Sliding window algorithm Multi-scale 38

Pedestrian Detection: Inside Window Compute histograms inside cells Normalize blocks of cells One cell may belong to >1 block Apply linear SVM classifier Window descriptor Blocks of cells Cells 39

Pedestrian Detection: Step 1 Gradients computation Block histograms calculation Histograms normalization Linear SVM Gamma correction improves quality Sobel filter 3x3 by columns and rows Output: magnitude and angle 40

Pedestrian Detection: Step 2 Gradients computation Block histograms calculation Histograms normalization Linear SVM Big intersection in close positions Require window stride to be multiple of cell size Histograms of blocks are computed independently Image 41

Pedestrian Detection: Step 2 Gradients computation Block histograms calculation Histograms normalization Linear SVM Pixels vote in proportion to gradient magnitude Tri-linear interpolation 2 orientation bins 4 cells Gaussian Decreases weight of pixels near block boundary 42

Pedestrian Detection: Step 3 Gradients computation Block histograms calculation Histograms normalization Linear SVM Normalization L2-Hys norm L2 norm, clipping, normalization 2 parallel reductions in shared memory 43

Pedestrian Detection: Step 4 Gradients computation Block histograms calculation Histograms normalization Linear SVM Linear SVM GPU time, % Classification is just a dot product 6% 5% 20% Gamma + Gradients 1 thread block per window position 30% Histograms SVM Normalize 39% Other 44

Pedestrian Detection Performance 8 times faster! Detection rate Same as CPU FPS 40 35 30 25 20 15 10 5 Core i5 2.8 GHz TBB, 4 cores Tesla C2050 0 768x576 1536x1152 45

GPU Technology Conference Spring 2012 San Francisco Bay Area The one event you can t afford to miss Learn about leading-edge advances in GPU computing Explore the research as well as the commercial applications Discover advances in computational visualization Take a deep dive into parallel programming Ways to participate Speak share your work and gain exposure as a thought leader Register learn from the experts and network with your peers Exhibit/Sponsor promote your company as a key player in the GPU ecosystem www.gputechconf.com

Thank you http://opencv.willowgarage.com/wiki anatoly.bakshev@itseez.com 47