RCNN, Fast RCNN, Faster RCNN


Presented by: Roi Shikler & Gil Elbaz
Advisor: Prof. Michael Lindenbaum

Topics of the lecture:
- Problem statement
- Review of slow R-CNN
- Review of Fast R-CNN
- Review of Faster R-CNN
- Compare with other methods
- Take away

PASCAL Visual Object Classes
The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes [from the challenge homepage]

The challenge poses two problems to the participants:
- Classification: for each of the twenty classes, predict the presence or absence of an example of that class in the test image.
- Detection (localization): predict the bounding box and label of each object from the twenty target classes in the test image.

The twenty object classes that have been selected are:
- Person: person
- Animal: bird, cat, cow, dog, horse, sheep
- Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
- Indoor: bottle, chair, dining table, potted plant, sofa, TV/monitor

mAP: Mean Average Precision
Mean average precision for a set of queries is the mean of the average precision scores for each query:

mAP = \frac{1}{Q} \sum_{q=1}^{Q} AvgP(q)

where Q is the number of queries and precision is P = TP / (TP + FP).

Plateau & increasing complexity
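The mAP definition above can be sketched in a few lines of Python. This is a minimal illustration, not the official PASCAL VOC evaluation code: it assumes the plain (non-interpolated) form of average precision, and the function names and toy inputs are hypothetical.

```python
def precision(tp, fp):
    """Precision P = TP / (TP + FP); 0.0 when there are no predictions."""
    total = tp + fp
    return tp / total if total else 0.0


def average_precision(ranked_hits, num_positives):
    """Average precision for one query (for example, one object class).

    ranked_hits: booleans for detections sorted by descending score,
                 True where the detection is a true positive.
    num_positives: number of ground-truth objects for this query.
    """
    tp = 0
    precisions_at_hits = []
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            # At this rank, every detection that is not a TP is an FP.
            precisions_at_hits.append(precision(tp, rank - tp))
    return sum(precisions_at_hits) / num_positives if num_positives else 0.0


def mean_average_precision(ap_per_query):
    """mAP = (1 / Q) * sum of AvgP(q) over the Q queries."""
    return sum(ap_per_query) / len(ap_per_query)


# Toy usage with made-up detections for two classes (assumed data).
ap_cat = average_precision([True, False, True], num_positives=2)
ap_dog = average_precision([False, True], num_positives=1)
print(mean_average_precision([ap_cat, ap_dog]))
```

In a detection setting each "query" is an object class, so Q would be twenty for the PASCAL VOC classes listed earlier.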

R-CNN - Article (2013)

R-CNN: Regions with CNN features
- Extract region proposals (~2k / image)
- Compute CNN features
- Classify regions (linear SVM)

RCNN at test time: Step 1
Extract region proposals (~2k / image)
Selective Search [van de Sande, Uijlings et al.] (proposal-method agnostic)

RCNN at test time: Step 2
Compute CNN features for each proposal:
- Dilate the proposal
- a. Crop
- b. Scale (anisotropic) to 227 x 227
- c. Forward propagate. Output: fc7

The network has five convolutional layers and two fully connected layers.
Input: 227 x 227 x 3 = 154,587 values
Output: 4096-dimensional feature vector
[A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks.]
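The dilate, crop, and scale operations of Step 2 can be sketched as follows. This is a simplified illustration rather than the R-CNN implementation: it assumes the image is a nested list of RGB tuples, boxes are (x1, y1, x2, y2) with exclusive upper bounds, the dilation is a fixed pixel margin in image coordinates, and resizing uses nearest-neighbor sampling. All function names and the padding value are hypothetical.

```python
def dilate_box(box, pad, width, height):
    """Grow a box by `pad` pixels on every side, clamped to the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(width, x2 + pad), min(height, y2 + pad))


def crop(image, box):
    """Cut the box out of an image stored as rows of pixels."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def warp(region, size=227):
    """Anisotropic scaling: stretch the region to size x size,
    ignoring its aspect ratio (nearest-neighbor sampling)."""
    src_h, src_w = len(region), len(region[0])
    return [[region[r * src_h // size][c * src_w // size]
             for c in range(size)]
            for r in range(size)]


# Toy usage: a synthetic 100 x 80 image and one made-up proposal.
image = [[(x % 256, y % 256, 0) for x in range(100)] for y in range(80)]
proposal = dilate_box((10, 20, 50, 60), pad=4, width=100, height=80)
network_input = warp(crop(image, proposal))
print(len(network_input), len(network_input[0]), len(network_input[0][0]))
```

Whatever the shape of the proposal, the warped region always has 227 x 227 x 3 values, which is what lets a fixed-size network process arbitrary boxes.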

5/26/16

RCNN at test time: Step 3
Classify regions
Linear classifiers (SVMs) score the 4096-dimensional fc7 feature vector of each scaled proposal.
Example scores: Person? 1.6, Horse? -0.3

RCNN at test time: Step 4
Object proposal refinement
- Coarse detection [using HOG + BoW] (~20-30% overlap)
- Feature extraction from color and edges
- Relocalization + resizing with graph cuts [affinity graph] [fine tuning]

Step 4: Object proposal refinement
Bounding-box regression: linear regression on CNN features
Original proposal (in R-CNN, from Selective Search) -> Predicted object bounding box
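Steps 3 and 4 can be illustrated with a small sketch. It assumes a toy 2-dimensional feature vector in place of the 4096-dimensional fc7 vector, made-up classifier weights chosen to reproduce the example scores, and the box parameterization described in the R-CNN paper (center shifts scaled by the proposal size, log-space width and height changes). The function names are hypothetical.

```python
import math


def svm_scores(feature, classifiers):
    """Score one feature vector with per-class linear classifiers.

    classifiers: dict mapping class name -> (weights, bias).
    """
    return {name: sum(w * x for w, x in zip(weights, feature)) + bias
            for name, (weights, bias) in classifiers.items()}


def apply_box_deltas(proposal, deltas):
    """Refine a proposal given regression outputs.

    proposal: (center_x, center_y, width, height)
    deltas:   (dx, dy, dw, dh) predicted by the linear regressor
    """
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx,        # shift center, relative to box width
            py + ph * dy,        # shift center, relative to box height
            pw * math.exp(dw),   # rescale width in log space
            ph * math.exp(dh))   # rescale height in log space


# Toy usage with assumed weights and deltas.
classifiers = {
    "person": ([0.5, 0.3], 0.5),
    "horse": ([-0.1, -0.2], 0.2),
}
print(svm_scores([1.0, 2.0], classifiers))
print(apply_box_deltas((100.0, 100.0, 50.0, 40.0), (0.1, -0.2, 0.0, 0.0)))
```

A positive score means the region is classified as that class; the regression step only moves and resizes a box that the proposal method already produced.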

Review of the slow R-CNN training pipeline
Steps for training a slow R-CNN detector:
1. [offline] M <- Pre-train a ConvNet for ImageNet classification
2. M <- Fine-tune M for object detection (softmax classifier)
3. F <- Cache feature vectors to disk using M
4. Train post hoc linear SVMs on F for object classification
5. Train post hoc linear bounding-box regressors on F
* post hoc means the parameters are learned after the ConvNet is fixed

Review slow R-CNN (Girshick et al. CVPR14)
- Regions of Interest (RoI) from a proposal method (~2k)
- Warped regions
- Forward each region through the ConvNet
- Classify regions with SVMs (post hoc component)
- Apply bounding-box regressors (post hoc component)

What's wrong with slow R-CNN?
- Ad hoc training objectives
  - Fine-tune the network with a softmax classifier (log loss)
  - Train post hoc linear SVMs (hinge loss)
  - Train post hoc bounding-box regressors (squared loss)
- Training is slow and takes a lot of disk space
  - The training process takes about 84h
- Detection (inference) is slow
  - About 47s per image
  - ~2k forward passes per image
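To make the "ad hoc training objectives" concrete, the three losses named above can be written side by side. This is a minimal sketch in their textbook forms, not code from R-CNN; the function names and toy inputs are assumptions.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_loss(scores, label):
    """Log loss of a softmax classifier (used when fine-tuning)."""
    return -math.log(softmax(scores)[label])


def hinge_loss(score, target):
    """Hinge loss of a linear SVM; target is +1 or -1."""
    return max(0.0, 1.0 - target * score)


def squared_loss(predicted, target):
    """Squared loss of a bounding-box regressor."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# Toy usage with made-up values.
print(log_loss([0.0, 0.0], label=1))
print(hinge_loss(1.6, +1), hinge_loss(-0.3, +1))
print(squared_loss([1.0, 2.0], [0.0, 4.0]))
```

Each of the three stages minimizes a different function, and the later stages cannot change the network features that the first stage learned.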

[Figure: PASCAL VOC detection history]

Fast R-CNN objectives
Fix most of what's wrong with slow R-CNN and SPP-net:
- Train the detector in a single stage, end-to-end
  - No caching to disk
  - No post hoc training steps
- Train all layers of the network
  [Training the conv layers is important for very deep networks]

[Diagram: ConvNet; external proposal algorithm (selective search)]

[Diagram: Fast R-CNN. Trainable ConvNet with an external proposal algorithm (selective search); FCs (fully connected layers); classifier (linear SVM & softmax) and bounding-box regressors; multi-task loss = classification loss + bounding-box regression loss]

Benefits of end-to-end training
- Faster training
  - No reading/writing from/to disk
  - No training of post hoc SVMs and bounding-box regressors
- Verified empirically: optimizing a single multi-task objective is more accurate than optimizing objectives independently

Slow R-CNN vs. Fast R-CNN
                                     Slow R-CNN   Fast R-CNN
Training time                        84 hours     8.75 hours
VOC07 test mAP                       66.0%        68.1%
Testing time per image               47s          0.32s
With selective search (+2s / image)  49s          2.32s
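A single multi-task objective of the kind described above can be sketched as one function that adds a classification term and a box-regression term. The sketch assumes the form given in the Fast R-CNN paper (softmax log loss plus a smooth L1 localization loss that is skipped for background regions); the weighting parameter `lam`, the background index, and the function names are illustrative assumptions.

```python
import math


def softmax_log_loss(scores, label):
    """Classification term: -log softmax(scores)[label], computed stably."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_scores, label, box_pred, box_target,
                    lam=1.0, background=0):
    """One objective for both tasks, so both gradients reach the ConvNet."""
    cls = softmax_log_loss(class_scores, label)
    if label == background:
        return cls  # background regions have no box to regress
    loc = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls + lam * loc


# Toy usage with made-up scores and box offsets.
print(multi_task_loss([0.0, 0.0], 1, [0.5, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

Because both terms are summed into one scalar, a single backward pass updates the shared layers for classification and localization at once, which is what removes the post hoc training steps.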

Review of the Faster R-CNN

In Fast R-CNN:
- External proposal algorithm (selective search)
- Trainable CNN
- FCs (fully connected layers)
- Classifier (linear SVM & softmax) and bounding-box regressors
- Single multi-task loss: classification loss + bounding-box regression loss

In Faster R-CNN:
- Built-in Region Proposal Network (RPN)
- Trainable CNN
- FCs (fully connected layers)
- Classifier (linear SVM & softmax) and bounding-box regressors
- Multi-task losses: RPN classification loss (anchor good/bad), RPN regression loss (anchor -> proposal), classification loss, bounding-box regression loss

Faster R-CNN: Region Proposal Network
- Slide a small window on the feature map.
- Build a small network for:
  - Classifying object or not-object
  - Regressing bounding-box locations
- The position of the sliding window provides localization information with reference to the image.
- Box regression provides finer localization information with reference to this sliding window.

Faster R-CNN: Region Proposal Network
- Use n anchor boxes at each location.
- Anchors are translation invariant.
- Regression gives offsets from the anchor boxes.
- Classification gives the probability that each regressed box shows an object.
- On top of the intermediate layer, the outputs at each location are 2n scores and 4n coordinates.
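The anchor mechanism can be sketched as follows. This is an illustration under stated assumptions, not the Faster R-CNN code: anchors are defined by a set of scales and aspect ratios, the sliding window moves with a fixed stride in image pixels, and the specific values (three scales, three ratios, stride 16) and function names are hypothetical choices.

```python
import math


def anchors_at(cx, cy, scales, ratios):
    """n = len(scales) * len(ratios) anchor boxes centered at (cx, cy).

    Each box is (x1, y1, x2, y2); ratio is height / width, and every
    anchor of a given scale covers the same area (scale * scale).
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_w, feat_h, stride, scales, ratios):
    """The same anchor set repeated at every sliding-window position."""
    boxes = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            boxes.extend(anchors_at(fx * stride, fy * stride, scales, ratios))
    return boxes


def rpn_head_outputs(n):
    """Per location: 2n object/not-object scores and 4n box coordinates."""
    return 2 * n, 4 * n


# Toy usage on a 3 x 2 feature map (assumed sizes).
scales, ratios = (128, 256, 512), (0.5, 1.0, 2.0)
n = len(anchors_at(0, 0, scales, ratios))
print(n, rpn_head_outputs(n), len(all_anchors(3, 2, 16, scales, ratios)))
```

Translation invariance shows up directly in the code: moving the window only shifts the anchor centers, while the anchor shapes stay the same at every location.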

R-CNN Results
- Results using pre-CNN methods
- Big improvement using CNN
- Bounding-box regression gives a small improvement

Fast R-CNN Results
- Features from a deeper network help a lot

Faster R-CNN Results
Selective Search vs. Region Proposal Network

YOLO: You Only Look Once
- Detection as regression
- Faster than Faster R-CNN, but not as accurate

Take Away
Classification and detection of objects in images is approaching high-quality results at real-time frame rates.
The major breakthroughs came from:
1. Utilizing CNNs instead of classical computer vision methods
2. Making deeper neural networks with multiple objectives
3. Replacing all stages of the algorithm with one unified network (end-to-end)
These algorithms are all available and applicable to real-world problems! Once you know how they work, you can change and adjust them to your specific research needs.

Object Detection code links:
R-CNN (Caffe + MATLAB): [Probably don't use this; too slow]
Fast R-CNN (Caffe + MATLAB): ast-rcnn
Faster R-CNN (Caffe + MATLAB):
Faster R-CNN (Caffe + Python):
YOLO:
