Object Detection - Basics [1]

Lecture 28

See Sections 10.1.1, 10.1.2, and 10.1.3 in
Reinhard Klette: Concise Computer Vision, Springer-Verlag, London, 2014

[1] See last slide for copyright information.

1 / 33
Agenda

1 Localization, Classification, Evaluation
2 Descriptors, Classifiers, Learning
3 Performance of Object Detectors
4 Descriptor Example: Histogram of Oriented Gradients

2 / 33
Localization, Classification, Evaluation | Descriptors, Classifiers, Learning | Performance of Object Detectors | HoG

Localization

Localization, classification, and evaluation are the three basic steps of an object detection system.

Object candidates are localized within a rectangular bounding box.

3 / 33
Classification

Localized object candidates are mapped by classification either onto detected objects or onto rejected candidates.

Face detection: one false-positive and two false-negatives (not counting the side view of a face).

4 / 33
Evaluation

A true-positive, also called a detection, is a correctly detected object.

A false-positive, also called a false detection, occurs if we detect an object where there is none.

A false-negative denotes a case where we miss an object.

A true-negative describes the cases where non-object regions are correctly identified as non-object regions (typically not of interest).

5 / 33
Which one is TP or FP or FN or TN?

6 / 33
Agenda

1 Localization, Classification, Evaluation
2 Descriptors, Classifiers, Learning
3 Performance of Object Detectors
4 Descriptor Example: Histogram of Oriented Gradients

7 / 33
Descriptors

Classification is membership in pairwise-disjoint classes, being subsets of R^n, where the dimension n > 0 is defined by the descriptors used.

A descriptor x = (x_1, ..., x_n) is a point in the n-dimensional descriptor space R^n, representing measured or calculated property values in a given order.

Two examples:
- n = 128 for SIFT
- n = 2 on the next page: the descriptor space is defined by the properties perimeter and area; e.g. descriptor x_1 = (621.605, 10,940) for Segment 1

8 / 33
Example: 2D Descriptor Space

Left: Regions in a segmented image (Segments 1-6). Right: Descriptor space, with perimeter on the horizontal axis and area on the vertical axis, showing the descriptors of the six segments and a separating blue line.

The blue line defines a binary classifier; it subdivides the descriptor space into two half-planes such that descriptors in one half-plane are assigned value +1 (i.e. +1 is a class identifier), and -1 if in the other half-plane.

9 / 33
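The half-plane rule of such a binary classifier can be sketched in a few lines. The weight vector and bias below are illustrative values only (the slide does not give the actual parameters of the blue line); the descriptor for Segment 1 is taken from the previous slide.

```python
import numpy as np

def linear_classify(x, w, b):
    """Binary classifier: +1 on one side of the line w . x + b = 0, -1 on the other."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical separating line in the (perimeter, area) plane
w = np.array([-50.0, 1.0])          # illustrative weights, not from the slide
b = 0.0

x1 = np.array([621.605, 10940.0])   # descriptor for Segment 1
print(linear_classify(x1, w, b))    # -> -1
```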
Classifiers

A classifier (i.e. a partitioning of the descriptor space) assigns class numbers to descriptors.

Training: using a given set {x_1, ..., x_m} of already-classified descriptors (the learning set) for defining the partitioning (the classifier).

Application: on descriptors generated for recorded data.

General classifier: Assigns class numbers 1, 2, ..., k for k > 1 classes, and 0 for "not classified".

Binary classifier: Assigns class numbers -1 or +1.

10 / 33
Weak or Strong Classifiers

A classifier is weak if it does not perform up to expectations (e.g., it might be just a bit better than random guessing).

Multiple weak classifiers can be combined into a strong classifier, aiming at a satisfactory solution of a classification problem.

Weak or strong classifiers can be general-case (i.e. multi-class) classifiers or just binary classifiers; being binary alone does not make a classifier weak.

Example: AdaBoost defines a statistical combination of multiple weak classifiers into one strong classifier (see later).

11 / 33
Example 1: Binary Classifier by Linear Separation

We define a binary classifier by constructing a hyperplane Π in R^n, for n ≥ 1:

    Π : w · x + b = 0

Vector w ∈ R^n is the weight vector, and the real b ∈ R is the bias of Π.

Example: for n = 2 or n = 3, w is the gradient or normal orthogonal to the defined line or plane Π, respectively.

12 / 33
Example 1: Continued

Left: Linearly separable distribution of descriptors pre-classified to be either in class +1 (green descriptors) or -1 (red descriptors), separated by a line Π.

Right: Not linearly separable; the sum of the shown distances (black line segments) of the misclassified descriptors to Π defines the total error for Π.

13 / 33
Example 1: Continued

    h(x) = w · x + b

h(x) ≥ 0: One side of the hyperplane (including the plane itself) defines value +1.
h(x) < 0: The other side (not including the plane itself) defines value -1.

A linear classifier defined by w and b can be calculated for a distribution of (pre-classified) training descriptors in the n-dimensional descriptor space.

The error for a misclassified descriptor x is its perpendicular distance to the hyperplane Π:

    d_2(x, Π) = |w · x + b| / ||w||_2

Task: Calculate Π such that the total error over all misclassified training descriptors is minimized.

14 / 33
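The total error that this task minimizes can be sketched directly from the two formulas above. The training descriptors, labels, and hyperplane below are toy values invented for illustration.

```python
import numpy as np

def total_error(X, y, w, b):
    """Sum of perpendicular distances d_2(x, Pi) = |w . x + b| / ||w||_2
    over all misclassified training descriptors."""
    scores = X @ w + b
    misclassified = np.sign(scores) != y          # h(x) disagrees with the label
    distances = np.abs(scores) / np.linalg.norm(w)
    return distances[misclassified].sum()

# Toy pre-classified descriptors in 2D
X = np.array([[0.0, 2.0], [1.0, 3.0], [2.0, -1.0], [3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.0, 1.0])                          # hyperplane Pi: x_2 = 0
b = 0.0
print(total_error(X, y, w, b))                    # all correct -> 0.0

y_bad = np.array([1, 1, 1, -1])                   # third label flipped
print(total_error(X, y_bad, w, b))                # one misclassified -> 1.0
```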
Example 2: Classification by Using a Binary Decision Tree

The classifier is defined by binary decisions at split nodes in a tree (i.e. "yes" or "no").

Each decision is formalized by a rule; given input data can be tested as to whether they satisfy the rule or not. Accordingly, we proceed with the identified successor node in the tree.

Each leaf node of the tree finally defines an assignment of data arriving at this node into classes.

Example: each leaf node identifies exactly one class in R^n; see next slide for n = 2.

15 / 33
Example 2: Continued

Left: Decision tree with split rules x_1 < 100, x_2 > 60, x_1 > 160, and x_1 + x_2 < 120, each branching on "yes" or "no". Right: Resulting subdivision of the 2D descriptor space (both axes ranging from 0 to 200).

The tested rules in the shown example of a tree define straight lines in the 2D descriptor space; descriptors arriving at one of the leaf nodes are then in one of the shown subsets of R^2.

16 / 33
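A tree like this is just nested if/else tests. The sketch below uses the four rules from the slide; the leaf labels A-E are hypothetical class names, and attaching the rule x_1 + x_2 < 120 under the (x_1 < 100, x_2 ≤ 60) branch is an assumed reading of the figure.

```python
def classify(x1, x2):
    """Binary decision tree over the 2D descriptor space.
    Leaf labels and the placement of the x1 + x2 < 120 rule are assumptions."""
    if x1 < 100:
        if x2 > 60:
            return "A"
        elif x1 + x2 < 120:
            return "B"
        else:
            return "C"
    else:
        if x1 > 160:
            return "D"
        else:
            return "E"

print(classify(50, 80))    # x1 < 100 and x2 > 60 -> "A"
print(classify(180, 20))   # x1 >= 100 and x1 > 160 -> "D"
```

Each path from the root to a leaf intersects the half-planes of the tested rules, which yields exactly the axis-parallel and diagonal region boundaries shown on the right of the slide.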
Trees, Forests, Cascades of Binary Classifiers

A single decision tree (defined by at least one split node) can be considered an example of a weak classifier.

A set of decision trees, called a forest, can then be used for defining a strong classifier.

Observation: A single decision tree provides a way to partition a descriptor space into multiple regions (i.e. classes). When applying binary classifiers defined by linear separation, we need to combine several of those (e.g. in a cascade) to achieve a similar partitioning of a descriptor space.

17 / 33
Learning

Learning is the process of defining or training a classifier based on a set of descriptors. Classification is the actual application of the classifier.

During classification we may also identify some misbehavior, and this can lead again to another phase of learning.

The set of descriptors used for learning may be pre-classified or not.

Supervised learning: We have a mechanism for assigning class numbers to descriptors (e.g. manually, based on expertise such as "yes, the driver does have closed eyes in this image").

Unsupervised learning: We do not have prior knowledge about class memberships of descriptors, e.g. for randomly selected patches in an image: a typical patch for a pedestrian or not?

18 / 33
Unsupervised Learning: Two Examples

The data distribution in the learning set determines the classifier.

Clustering: Apply a clustering algorithm to a given set of descriptors for identifying a separation of R^n into classes. Example: Analyze the density of the distribution of the given descriptors in R^n; a region having a dense distribution defines a seed point of one class, and we then assign all descriptors to the identified seed points by applying, for example, the nearest-neighbor rule.

Learn rules at split nodes in a decision tree: Learn decision rules at split nodes, e.g. by having a general scheme for how to define such rules, and optimise parameters by maximising the information gain at each split node (e.g. an equal number of training descriptors passing to either the left or the right successor).

19 / 33
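The clustering example above, assignment to seed points by the nearest-neighbor rule, can be sketched as follows. The descriptors and seed points are toy values; in practice the seeds would come from the density analysis described on the slide.

```python
import numpy as np

def assign_to_seeds(descriptors, seeds):
    """Assign every descriptor to its nearest seed point (nearest-neighbor rule).
    Returns the class index (row in `seeds`) per descriptor."""
    d = np.linalg.norm(descriptors[:, None, :] - seeds[None, :, :], axis=2)
    return d.argmin(axis=1)

# Two dense regions, represented by one seed point each (toy values)
descriptors = np.array([[1.0, 1.0], [1.5, 0.5], [9.0, 9.0], [8.5, 9.5]])
seeds = np.array([[1.0, 0.5], [9.0, 9.0]])
print(assign_to_seeds(descriptors, seeds))   # -> [0 0 1 1]
```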
Positive (for Pedestrian ) and Negative Class Examples 20 / 33
Combined Learning Approaches

There are also cases where we may combine supervised learning with strategies known from unsupervised learning.

Example:
Supervised: Decide whether a given bounding box shows a pedestrian, or decide for a patch, being a subwindow of a bounding box, whether it possibly belongs to a pedestrian.
Unsupervised: Generate a decision tree, e.g. by maximising information gain at split nodes.
Result: Assign class probabilities to a leaf node in the generated tree according to the percentages of pre-classified descriptors arriving at this leaf node.

21 / 33
Agenda

1 Localization, Classification, Evaluation
2 Descriptors, Classifiers, Learning
3 Performance of Object Detectors
4 Descriptor Example: Histogram of Oriented Gradients

22 / 33
Object Detector and Measures

An object detector is defined by applying a classifier to an object detection problem. We assume that any decision made can be evaluated as being either correct or false.

Evaluations of designed object detectors are required to compare their performance under particular conditions. There are common measures in pattern recognition and information retrieval for the performance evaluation of classifiers.

23 / 33
Basic Definitions

Let tp or fp denote the numbers of true-positives or false-positives, respectively. Let tn or fn denote the numbers of true-negatives or false-negatives, respectively.

What are the numbers for the example on Page 6?

Note: the image alone does not indicate how many non-object regions have been analyzed (and correctly identified as being no faces); thus we cannot specify the number tn; we need to analyze the applied classifier for obtaining tn.

24 / 33
PR, RC, MR, and FPPI

Precision is the ratio of true-positives to all detections. Recall (or sensitivity) is the ratio of true-positives to all potentially possible detections:

    PR = tp / (tp + fp)   and   RC = tp / (tp + fn)

PR = 1: no false-positive is detected. RC = 1: all visible objects are detected and there is no false-negative.

Miss rate is the ratio of false-negatives to all objects. False-positives per image is the ratio of false-positives to all detected objects:

    MR = fn / (tp + fn) = 1 - RC   and   FPPI = fp / (tp + fp) = 1 - PR

MR = 0: all visible objects are detected. FPPI = 0: all detected objects are correctly classified.

25 / 33
TNR and AC

tn is not a common entry for performance measures but, if available, we also have TNR and AC:

True-negative rate (or specificity) is the ratio of true-negatives to all decisions in no-object regions. Accuracy is the ratio of correct decisions to all decisions:

    TNR = tn / (tn + fp)   and   AC = (tp + tn) / (tp + tn + fp + fn)

26 / 33
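The measures of the last two slides reduce to a few ratios over the four counts; the counts used below (8 detections, 2 false detections, 2 missed objects, 88 true-negatives) are hypothetical.

```python
def detector_metrics(tp, fp, fn, tn=None):
    """Performance measures as defined on the slides;
    tn-based measures are included only if tn is known."""
    m = {
        "PR":   tp / (tp + fp),      # precision
        "RC":   tp / (tp + fn),      # recall / sensitivity
        "MR":   fn / (tp + fn),      # miss rate = 1 - RC
        "FPPI": fp / (tp + fp),      # as defined here: 1 - PR
    }
    if tn is not None:
        m["TNR"] = tn / (tn + fp)                      # specificity
        m["AC"]  = (tp + tn) / (tp + tn + fp + fn)     # accuracy
    return m

print(detector_metrics(tp=8, fp=2, fn=2, tn=88))
```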
Detected?

How to decide whether a detected object is a true-positive?

Assume: Objects in images have been locally identified (e.g. manually) by bounding boxes, serving as the ground truth. Detected objects are matched with these ground-truth boxes by calculating the ratio of the areas of the overlapping regions:

    a_o = A(D ∩ T) / A(D ∪ T)

where A denotes the area of a region in an image, D is the detected bounding box of the object, and T is the matched ground-truth bounding box.

If a_o exceeds a chosen threshold, say 0.5, the detected object is taken as a true-positive.

If there is more than one possible match for a detected bounding box, then use the one with the largest a_o-value.

27 / 33
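For axis-aligned bounding boxes the ratio a_o has a simple closed form; the box coordinates below are toy values in (x1, y1, x2, y2) format.

```python
def overlap_ratio(D, T):
    """a_o = A(D intersect T) / A(D union T) for axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(D[2], T[2]) - max(D[0], T[0]))   # intersection width
    iy = max(0.0, min(D[3], T[3]) - max(D[1], T[1]))   # intersection height
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(D) + area(T) - inter
    return inter / union

D = (0, 0, 10, 10)   # detected box
T = (5, 0, 15, 10)   # ground-truth box: overlap 5x10 = 50, union 150
print(overlap_ratio(D, T))   # -> 0.333..., below a 0.5 threshold, so not a TP
```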
Agenda

1 Localization, Classification, Evaluation
2 Descriptors, Classifiers, Learning
3 Performance of Object Detectors
4 Descriptor Example: Histogram of Oriented Gradients

28 / 33
Scanning an Image for Object Candidates

1 A window of the size of the expected bounding box scans through the image.
2 The scan stops at potential object candidates.
3 If a potential bounding box has been identified, a process for descriptor calculation starts.

The histogram of oriented gradients (HoG) is a common way to derive a descriptor for the bounding box of an object candidate.

29 / 33
Bounding Box, Blocks, and Cells A bounding box (here: of a pedestrian) is subdivided into blocks, and each block into smaller cells for calculating the HoG Yellow solid or dashed blocks are subdivided into red cells; a block moves left to right, top down, through a bounding box Right: Magnitudes of gradient vectors 30 / 33
Algorithm for Calculating the HoG Descriptor

1 Preprocessing: intensity normalization and smoothing.
2 Calculate an edge map: gradient magnitudes and gradient angles for each pixel, generating a magnitude map I_m and an angle map I_a.
3 Spatial binning:
  a Group pixels into non-overlapping cells (e.g. 8 × 8).
  b For each cell, accumulate the magnitude values in I_m into direction bins (e.g., nine bins for intervals of 20° each) to obtain a voting vector.
4 Normalize voting values for generating a descriptor:
  a Group cells (e.g., 2 × 2) into one block.
  b Normalize the voting vectors over each block, and combine them into one block vector.
5 Concatenation: append all block vectors consecutively; this produces the final HoG descriptor.

31 / 33
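The steps above can be sketched as follows. This is a simplified version (no preprocessing, hard binning instead of interpolated voting, plain L2 block normalization); production implementations such as the original Dalal-Triggs detector add several refinements.

```python
import numpy as np

def hog_descriptor(img, cell=8, block=2, bins=9):
    """Simplified HoG sketch: gradient maps I_m and I_a, per-cell orientation
    histograms, overlapping block normalization, concatenation."""
    gy, gx = np.gradient(img.astype(float))        # step 2: edge map
    mag = np.hypot(gx, gy)                         # magnitude map I_m
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned angle map I_a
    H, W = img.shape
    cy, cx = H // cell, W // cell
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros((cy, cx, bins))
    for i in range(cy):                            # step 3: spatial binning
        for j in range(cx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(bins):
                hist[i, j, k] = m[b == k].sum()
    blocks = []
    for i in range(cy - block + 1):                # steps 4-5: normalize, append
        for j in range(cx - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)

img = np.random.default_rng(0).integers(0, 256, (128, 64))
print(hog_descriptor(img).shape)   # 15*7 blocks of 2*2*9 values -> (3780,)
```

For the standard 64 × 128 pedestrian window this yields the familiar 3780-dimensional descriptor: 7 × 15 overlapping blocks, each contributing 2 × 2 × 9 = 36 normalized voting values.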
Two Examples

The length of the vector drawn in each of the nine directions in a cell represents the accumulated magnitude of the gradient vectors for that direction.

32 / 33
Copyright Information

This slide show was prepared by Reinhard Klette with kind permission from Springer Science+Business Media B.V.

The slide show can be used freely for presentations. However, all the material is copyrighted.

R. Klette. Concise Computer Vision. © Springer-Verlag, London, 2014.

In case of citation: just cite the book, that's fine.

33 / 33