Sliding windows and face detection


Tuesday, Nov 10. Kristen Grauman, UT Austin

Last time: Modeling categories with local features and spatial information. Histograms and configurations of visual words capture global or local layout in the bag-of-words framework. Pyramid match, semi-local features.

Pyramid match: Histogram intersection counts the number of possible matches at a given partitioning.

Spatial pyramid match: Make a pyramid of bag-of-words histograms. This provides some loose (global) spatial layout information [Lazebnik, Schmid & Ponce, CVPR 2006].

Last time (continued): Part-based models encode a category's part appearance together with 2D layout, and allow detection within a cluttered image. Implicit shape model: generalized Hough transform for detection. Constellation model: exhaustive search for the best fit of features to parts.

Implicit shape models: A visual vocabulary is used to index votes for object position [a visual word = "part"]. (Figure: training image annotated with object localization info; visual codeword with displacement vectors.) B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision, 2004.

Implicit shape models: A visual vocabulary is used to index votes for object position [a visual word = "part"]. (Figure: test image.) B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision, 2004.

Shape representation in part-based models (N image features, P parts in the model):
Star shape model, e.g. implicit shape model. Parts mutually independent. Recognition complexity: O(NP). Method: generalized Hough transform.
Fully connected constellation model, e.g. Constellation Model. Parts fully connected. Recognition complexity: O(N^P). Method: exhaustive search.
Slide credit: Rob Fergus

Coarse genres of recognition approaches:
Alignment: hypothesize and test. Pose clustering with object instances; indexing invariant features + verification.
Local features: as parts or words. Part-based models; bags of words models.
Global appearance: "texture templates". With or without a sliding window.

Today:
Detection as classification. Supervised classification; skin color detection example.
Sliding window detection. Face detection example.

Supervised classification: Given a collection of labeled examples, come up with a function that will predict the labels of new examples. (Figure: training examples labeled "four" and "nine", and a novel input.) How good is some function we come up with to do the classification? It depends on the mistakes made and the cost associated with the mistakes.

Consider the two-class (binary) decision problem:
L(4 → 9): loss of classifying a 4 as a 9
L(9 → 4): loss of classifying a 9 as a 4
The risk of a classifier s is its expected loss:
R(s) = Pr(4 → 9 | using s) L(4 → 9) + Pr(9 → 4 | using s) L(9 → 4)
We want to choose a classifier so as to minimize this total risk.

Supervised classification: The optimal classifier will minimize total risk. (Figure axis: feature value x.) At the decision boundary, either choice of label yields the same expected loss.
If we choose class "four" at the boundary, the expected loss is
P(class is 9 | x) L(9 → 4) + P(class is 4 | x) L(4 → 4) = P(class is 9 | x) L(9 → 4),
because a correct decision incurs no loss, i.e. L(4 → 4) = 0.
If we choose class "nine" at the boundary, the expected loss is
P(class is 4 | x) L(4 → 9).
So the best decision boundary is at the point x where
P(class is 9 | x) L(9 → 4) = P(class is 4 | x) L(4 → 9).
To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if
P(4 | x) L(4 → 9) > P(9 | x) L(9 → 4).
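The decision rule above can be written out in a few lines. This is only a sketch: the function names and the example loss values are made up for illustration, and correct decisions are assumed to cost nothing, as on the slide.

```python
def choose_label(p4, loss_4_as_9, loss_9_as_4):
    """Pick "four" or "nine" for a point whose posterior is P(4 | x) = p4.

    Assumes a correct decision has zero loss, so each choice is only
    penalized when the true class is the other one.
    """
    p9 = 1.0 - p4
    risk_if_say_4 = p9 * loss_9_as_4  # wrong only when the truth is 9
    risk_if_say_9 = p4 * loss_4_as_9  # wrong only when the truth is 4
    return 4 if risk_if_say_4 < risk_if_say_9 else 9


def boundary_posterior(loss_4_as_9, loss_9_as_4):
    """Value of P(4 | x) at which both choices have equal expected loss."""
    return loss_9_as_4 / (loss_4_as_9 + loss_9_as_4)
```

With equal losses the boundary sits at P(4 | x) = 0.5; making L(4 → 9) larger pushes the boundary down, so the classifier says "four" even when a 4 is fairly unlikely.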

Supervised classification: (Figure: posteriors P(4 | x) and P(9 | x) plotted against feature value x.) The best decision boundary is at the point x where P(class is 9 | x) L(9 → 4) = P(class is 4 | x) L(4 → 9); choose "four" if P(4 | x) L(4 → 9) > P(9 | x) L(9 → 4). How do we evaluate these probabilities?

Basic probability: X is a random variable. P(X) is the probability that X achieves a certain value, called a PDF (probability distribution/density function), for continuous or discrete X. Conditional probability: P(X | Y), the probability of X given that we already know Y. Source: Steve Seitz

Example: learning skin colors. We can represent a class-conditional density using a histogram (a "non-parametric" distribution). (Figure: histograms of P(x | skin) and P(x | not skin) over feature x = hue; bar height is the percentage of skin pixels in each bin.)

Now we get a new image, and want to label each pixel as skin or non-skin. What's the probability we care about to do skin detection?

Bayes rule:
P(skin | x) = P(x | skin) P(skin) / P(x)
where P(skin | x) is the posterior, P(x | skin) the likelihood, and P(skin) the prior. Hence
P(skin | x) ∝ P(x | skin) P(skin)
Where does the prior come from? Why use a prior?

Example: classifying skin pixels. Now for every pixel in a new image, we can estimate the probability that it is generated by skin. (Figure: brighter pixels indicate higher probability of being skin.) Classify pixels based on these probabilities.
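A minimal sketch of the skin example, assuming hue is scaled to [0, 1) and that labeled skin and non-skin pixels are available as plain lists. The bin count, the default prior, and the function names are illustrative choices, not values from the lecture.

```python
def build_histogram(hues, n_bins=8):
    """Normalized hue histogram: a non-parametric estimate of P(x | class)."""
    counts = [0] * n_bins
    for h in hues:
        counts[min(int(h * n_bins), n_bins - 1)] += 1
    total = float(len(hues))
    return [c / total for c in counts]


def skin_posterior(hue, skin_hist, nonskin_hist, prior_skin=0.5):
    """P(skin | x) by Bayes rule, with P(x) expanded over the two classes."""
    n_bins = len(skin_hist)
    b = min(int(hue * n_bins), n_bins - 1)
    numerator = skin_hist[b] * prior_skin
    evidence = numerator + nonskin_hist[b] * (1.0 - prior_skin)
    # If neither class ever produced this hue, fall back to the prior.
    return numerator / evidence if evidence > 0 else prior_skin
```

Thresholding `skin_posterior` at 0.5 gives a per-pixel skin label; lowering `prior_skin` makes the detector more conservative for the same histograms.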

Example: classifying skin pixels (figures). Using skin color-based face detection and pose estimation as a video-based interface. Gary Bradski, 1998

Supervised classification: We want to minimize the expected misclassification. Two general strategies:
Use the training data to build a representative probability model; separately model class-conditional densities and priors (generative).
Directly construct a good decision boundary, model the posterior (discriminative).

Today: Detection as classification (supervised classification, skin color detection example); sliding window detection (face detection example).

Detection via classification: main idea. Basic component: a binary classifier. (Figure: a car/non-car classifier answers "Yes, a car" or "No, not a car".)

If the object may be in a cluttered scene, slide a window around looking for it. (Essentially, our skin detector was doing this, with a window that was one pixel big.)

Detection via classification: main idea. Fleshing out this pipeline a bit more, we need to:
1. Obtain training data
2. Define features
3. Define classifier
(Figure: training examples, feature extraction, car/non-car classifier.)

Consider all subwindows in an image: sample at multiple scales and positions (and orientations). Make a decision per window: "Does this contain object category X or not?"
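The per-window loop can be sketched as a generator over positions and scales. The base window size, the scale set, and the step fraction are assumed parameters for illustration, and `classify` stands in for whatever binary classifier is trained; orientations are left out for brevity.

```python
def sliding_windows(img_w, img_h, base=24, scales=(1.0, 1.5, 2.0), step_frac=0.25):
    """Yield square windows (x, y, size) over all positions and scales."""
    for s in scales:
        size = int(round(base * s))
        if size > img_w or size > img_h:
            continue  # window does not fit in the image at this scale
        step = max(1, int(size * step_frac))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size)


def detect(img_w, img_h, classify, **window_args):
    """Return every window for which the binary classifier says yes."""
    return [w for w in sliding_windows(img_w, img_h, **window_args) if classify(w)]
```

In a real detector `classify` would crop the window from the image, extract features, and score them; here it only receives the window geometry so the enumeration itself stays easy to check.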

Feature extraction: global appearance. Simple holistic descriptions of image content: grayscale / color histogram; vector of pixel intensities.

Pixel-based representations are sensitive to small shifts. Color or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation.

Gradient-based representations: Consider edges, contours, and (oriented) intensity gradients. Summarize the local distribution of gradients with a histogram. Locally orderless: offers invariance to small shifts and rotations. Contrast-normalization: try to correct for variable illumination.
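As a rough illustration of a gradient histogram with contrast normalization, the sketch below bins unsigned gradient orientations over one patch and L2-normalizes the result. It is a simplified stand-in, not the HoG descriptor itself: there are no cells, blocks, or interpolation, and the bin count is an arbitrary choice.

```python
import math


def gradient_histogram(img, n_bins=8):
    """Orientation histogram for a grayscale patch given as a list of rows.

    Each interior pixel votes for its gradient orientation (unsigned,
    in [0, pi)) with a weight equal to its gradient magnitude.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    # L2 normalization: scaling all intensities leaves the descriptor unchanged.
    return [v / norm for v in hist] if norm > 0 else hist
```

Because positions inside the patch are discarded, shifting an edge by a pixel changes the histogram little, which is the "locally orderless" property mentioned above.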

Classifier construction: How to compute a decision for each subwindow? (Figure axis: image feature.)

Discriminative classifier construction: many choices.
Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...
Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio 2001
Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006
Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003
K. Grauman, B. Leibe. Slide adapted from Antonio Torralba

Boosting: Build a strong classifier by combining a number of "weak classifiers", which need only be better than chance. Sequential learning process: at each iteration, add a weak classifier. Flexible to the choice of weak learner, including fast simple classifiers that alone may be inaccurate. We'll look at the AdaBoost algorithm: easy to implement; base learning algorithm for the Viola-Jones face detector.

AdaBoost: Intuition. Consider a 2-d feature space with positive and negative examples. Each weak classifier splits the training examples with at least 50% accuracy. Examples misclassified by a previous weak learner are given more emphasis at future rounds. Figure adapted from Freund and Schapire

AdaBoost: Intuition (figures)

AdaBoost: Intuition. The final classifier is a combination of the weak classifiers.

Boosting: training procedure. Initially, weight each training example equally. In each boosting round: find the weak learner that achieves the lowest weighted training error; raise the weights of training examples misclassified by the current weak learner. Compute the final classifier as a linear combination of all weak learners (the weight of each learner increases with its accuracy). Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost). Slide credit: Lana Lazebnik
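The training procedure above can be sketched for the simplest possible weak learner, a threshold on a one-dimensional feature. This follows the standard discrete AdaBoost update (learner weight alpha = 0.5 ln((1 - err) / err)); the stump family, the function names, and the clipping constant are choices made for this sketch.

```python
import math


def train_adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost with threshold stumps on 1-D data.

    xs: list of floats; ys: list of +1 / -1 labels.
    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform example weights
    thresholds = sorted(set(xs))
    model = []
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for pol in (1, -1):
                preds = [pol if x >= t else -pol for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, pol, preds)
        err, t, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        # Misclassified examples (y * p = -1) gain weight, the rest lose weight.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in model)
    return 1 if score >= 0 else -1
```

On data such as labels (+,+,+,-,-,-,+,+,+,-) at x = 0..9, no single stump is error-free, but the weighted vote of three stumps separates the training set.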

AdaBoost algorithm: Start with uniform weights on training examples {x_1, ..., x_n}. For T rounds: evaluate the weighted error for each feature and pick the best; re-weight the examples (incorrectly classified -> more weight, correctly classified -> less weight). The final classifier is a combination of the weak ones, weighted according to the error they had. Freund & Schapire 1995

Faces: terminology. Detection: given an image, where is the face? Recognition: whose face is it? (Figure: a face labeled "Ann".) Image credit: H. Rowley

Example: Face detection. Frontal faces are a good example of a class where global appearance models + a sliding window detection approach fit well: regular 2D structure; center of face almost shaped like a "patch"/window. Now we'll take AdaBoost and see how the Viola-Jones face detector works.

Feature extraction: "Rectangular" filters. Feature output is the difference between adjacent regions. Efficiently computable with the integral image: any sum can be computed in constant time. Avoid scaling images: scale features directly for the same cost. Integral image: the value at (x,y) is the sum of pixels above and to the left of (x,y). Viola & Jones, CVPR 2001
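A small sketch of the integral image and a two-rectangle feature. For convenience it uses a zero-padded table in which entry (y, x) holds the sum of pixels strictly above and to the left, a common implementation convention that differs slightly from the inclusive definition on the slide; the function names are illustrative.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] = sum of img[r][c] for r < y, c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Since `rect_sum` costs the same for any rectangle size, a feature can be evaluated at a larger scale by enlarging `w` and `h` rather than by resizing the image.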

Large library of filters: Considering all possible filter parameters (position, scale, and type), there are 180,000+ possible features associated with each 24 x 24 window. Which subset of these features should we use to determine if a window has a face? Use AdaBoost both to select the informative features and to form the classifier.

AdaBoost for feature+classifier selection: We want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error. (Figure: outputs of a possible rectangle feature on faces and non-faces, and the resulting weak classifier.) For the next round, reweight the examples according to errors and choose another filter/threshold combo.

Even if the filters are fast to compute, each new image has a lot of possible windows to search. How to make the detection more efficient?

Cascading classifiers for detection: For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g., filter for promising regions with an initial inexpensive classifier; build a chain of classifiers, choosing cheap ones with low false negative rates early in the chain. Figure from Viola & Jones CVPR 2001
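The early-rejection logic of a cascade fits in a few lines. The stage representation (a scoring function paired with a threshold) is an assumption made for this sketch; in the Viola-Jones detector each stage would be a boosted classifier over rectangle features.

```python
def cascade_classify(window, stages):
    """Run a window through a chain of (score_fn, threshold) stages.

    Stages are ordered cheapest first. The window is rejected as soon as
    one stage scores it below threshold, so most negatives cost very little.
    Returns (accepted, number_of_stages_evaluated).
    """
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i
    return True, len(stages)
```

Each early stage is tuned for a low false negative rate, because a face rejected at stage one can never be recovered later in the chain.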

Viola-Jones Face Detector: Summary. Train a cascade of classifiers with AdaBoost on faces and non-faces; this yields the selected features, thresholds, and weights, which are then applied to a new image. Train with 5K positives, 350M negatives. Real-time detector using a 38-layer cascade, with 6061 features in total in the final detector. [Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]

Viola-Jones Face Detector: Summary. A seminal approach to real-time object detection. Training is slow, but detection is very fast. Key ideas: integral images for fast feature evaluation; boosting for feature selection; attentional cascade for fast rejection of non-face windows.
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
Slide credit: Lana Lazebnik

Viola-Jones Face Detector: Results. (Figure: first two features selected.)

Detecting profile faces? Can we use the same detector?

Viola-Jones Face Detector: Results (figure). Paul Viola, ICCV tutorial

Example application: Frontal faces detected and then tracked, character names inferred with alignment of script and subtitles. Everingham, M., Sivic, J. and Zisserman, A. "Hello! My name is... Buffy" - Automatic naming of characters in TV video, BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html

Example application: faces in photos (figure)

Consumer application: iPhoto 2009. http://www.apple.com/ilife/iphoto/

Consumer application: iPhoto 2009. Can be trained to recognize pets! http://www.maclife.com/article/news/iphotos_faces_recognizes_cats
Slide credit: Lana Lazebnik

Consumer application: iPhoto 2009. Things iPhoto thinks are faces (figure). Slide credit: Lana Lazebnik

Other classes that might work with global appearance in a window?

Pedestrian detection: Detecting upright, walking humans is also possible using a sliding window's appearance/texture; e.g., SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]; space-time rectangle features [Viola, Jones & Snow, ICCV 2003]; SVM with HoGs [Dalal & Triggs, CVPR 2005].

Other classes that might work with global appearance in a window?

Penguin detection & identification: This project uses the Viola-Jones AdaBoost face detection algorithm to detect penguin chests, and then matches the pattern of spots to identify a particular penguin. Burghardt, Thomas, Barham, and Calic. Automated Visual Recognition of Individual African Penguins, 2004.

Use rectangular features, and select good features to distinguish the chest from non-chests with AdaBoost. (Figures: attentional cascade; penguin chest detections.) Burghardt, Thomas, Barham, and Calic. Automated Visual Recognition of Individual African Penguins, 2004.

Given a detected chest, try to extract the whole chest for this particular penguin. (Figure: example detections.) Burghardt, Thomas, Barham, and Calic. Automated Visual Recognition of Individual African Penguins, 2004.

Perform identification by matching the pattern of spots to a database of known penguins. (Figure: penguin detection & identification.) Burghardt, Thomas, Barham, and Calic. Automated Visual Recognition of Individual African Penguins, 2004.

Highlights: Sliding window detection and global appearance descriptors: simple detection protocol to implement; good feature choices critical; past successes for certain classes.

Limitations: High computational complexity. For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations! If training binary detectors independently, the cost increases linearly with the number of classes. With so many windows, the false positive rate had better be low.
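The arithmetic behind these limitations is easy to reproduce. The window counts come from the example above; the per-window false positive rate used in the usage note is a hypothetical figure chosen only to show the scale of the problem.

```python
def n_evaluations(locations, orientations, scales, n_classes=1):
    """Classifier evaluations per image; grows linearly with the number of classes."""
    return locations * orientations * scales * n_classes


def expected_false_positives(n_windows, false_positive_rate):
    """Expected number of false detections per image at a given per-window rate."""
    return n_windows * false_positive_rate


n = n_evaluations(250_000, 30, 4)  # the example from the slide
```

At an assumed per-window false positive rate of one in a million, `expected_false_positives(n, 1e-6)` still comes to roughly 30 false detections per image, which is why the rate has to be extremely low.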

Limitations (continued): Not all objects are "box" shaped.

Limitations (continued): Non-rigid, deformable objects are not captured well with representations assuming a fixed 2D structure; or one must assume a fixed viewpoint. Objects with less-regular textures are not captured well with holistic appearance-based descriptions.

Limitations (continued): If considering windows in isolation, context is lost. (Figure: sliding window vs. detector's view.) Figure credit: Derek Hoiem

Limitations (continued): In practice, often entails a large, cropped training set (expensive). Requiring a good match to a global appearance description can lead to sensitivity to partial occlusions. Image credit: Adam, Rivlin, & Shimshoni

Summary:
Detection as classification. Supervised classification: loss and risk, Bayes rule; skin color detection example.
Sliding window detection. Classifiers, boosting algorithm, cascades; face detection example.
Limitations of a global appearance description.
Limitations of sliding window detectors.