Multi-view Face Detection Using Deep Convolutional Neural Networks
Sachin Sudhakar Farfade (Yahoo), Mohammad Saberian (Yahoo), Li-Jia Li (Yahoo)

arXiv preprint, v1 [cs.CV], 10 Feb 2015

ABSTRACT

In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches require annotation of facial landmarks, e.g. TSM [17], or annotation of face poses [18, 16]. They also require training a dozen models to fully capture faces in all orientations, e.g. 22 models in the HeadHunter method [16]. In this paper we propose a method, called DDFD, that does not require pose/landmark annotation and is able to detect faces in all orientations using a single model based on deep convolutional neural networks. The proposed method has minimal complexity: unlike other deep learning methods, it does not require additional components such as segmentation, bounding box regression or SVM classifiers. Furthermore, we analyze the scores of the proposed face detector for faces in different orientations and find that 1) the proposed detector is able to detect faces from different angles and can handle occlusion to some extent, and 2) there is a correlation between the distribution of positive examples in the training set and the scores of the proposed detector. The latter suggests that the performance of the proposed method can be further improved by using better sampling strategies and more sophisticated data augmentation techniques. Evaluations on face detection benchmarks show that our single-model face detector has similar or better performance compared to previous methods that are more complex and require annotations of either face poses or facial landmarks.

Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous

General Terms: Application

Keywords: Face Detection, Convolutional Neural Network, Deep Learning

Figure 1: An example of user-generated photos on social networks containing faces in different poses, illuminations and occlusions. The bounding boxes and corresponding scores show the output of our proposed face detector.

1. INTRODUCTION

With the widespread adoption of smartphones and fast mobile networks, millions of photos are uploaded every day to cloud storage services such as Dropbox or to social networks such as Facebook, Twitter, Instagram, Google+ and Flickr. Organizing and retrieving relevant information from these photos is very challenging and directly impacts the user experience on those platforms. For example, it is very common for a user to look for photos that were taken at a particular location, at a particular time, or with a particular friend. The first two types of queries are fairly straightforward, as almost all of today's cameras embed time and GPS location into photos. The last query, i.e. the contextual query, is more challenging, as there is no explicit signal about the identity of the people in the photos. The key to this identification is detecting human faces. This has made low-complexity, rapid and accurate face detection an essential component of cloud-based photo sharing/storage platforms.

For the past two decades, face detection has been an active area of research in the vision community. The seminal work of Viola and Jones [25] made it possible to rapidly detect up-right faces in real time with very low computational complexity. Their detector, called the detector cascade, consists of a sequence of simple-to-complex face classifiers and has been the subject of extensive research in academia. Moreover, detector cascades have been deployed in many commercial products such as smartphones and digital cameras. However, while cascade detectors can accurately find visible up-right faces, they often fail to detect faces from different angles, e.g. side-view or partially occluded faces. This failure can significantly impact the performance of photo organizing software/applications, as much user-generated content contains faces from different angles or faces that are not fully visible; see for example figure 1. This has motivated many works on multi-view face detection over the past decade, and the current solutions can be classified into two categories:

Cascade based: These methods extend the Viola and Jones detector cascade. For example, [26] proposed to train a detector cascade for each view of the face and combine their results at test time. Recently, [16] combined this method with integral channel features [4] and soft-cascade [2], and showed that by using 22 cascades it is possible to obtain state-of-the-art results for multi-view face detection. This approach, however, requires face orientation annotations. Moreover, its complexity in training and at test time increases linearly with the number of models. To address the run-time complexity issue, Viola and Jones [24] proposed to first estimate the face pose using a tree classifier and then run the cascade corresponding to that pose to verify the detection. While improving detection speed, this method degrades accuracy because mistakes of the initial tree classifier are irreversible. This method was further improved by [10, 9], where instead of one detector cascade, several detectors are used after the initial classifier for verification. Finally, [22] and [18] combined the detector cascade with multiclass boosting and proposed methods for multiclass/multi-view object detection.
DPM based: These methods are based on the deformable part model technique [6], in which a face is defined as a collection of parts. The parts are defined via unsupervised or supervised training, and a classifier, a latent SVM, is trained to find those parts and their geometric relationships. As these detectors can detect faces even when some parts are not present, they are robust to partial occlusion. These methods are, however, computationally intensive because 1) they require solving a latent SVM for each candidate location and 2) multiple DPMs have to be trained and combined to achieve state-of-the-art performance [16, 6]. Finally, in some cases DPM-based models require annotation of facial landmarks for training, e.g. [17]. The key challenge in multi-view face detection, as pointed out by Viola and Jones [24], is that learning algorithms such as boosting or SVM and image features such as HOG or Haar wavelets are not strong enough to capture faces in different poses, and thus the resulting classifiers are hopelessly inaccurate. However, with the recent advances in deep learning, we propose to utilize the high capacity of deep convolutional neural networks for feature extraction/classification and to train a single model for the task of multi-view face detection. Deep convolutional neural networks have recently demonstrated outstanding performance in a variety of vision tasks. In particular, [13] trained an 8-layer network, called AlexNet, and showed that deep convolutional neural networks can significantly outperform other methods for the task of large-scale image classification. For the task of object detection, [7] proposed the R-CNN method, which uses an image segmentation technique, selective search [23], to find candidate image regions and classifies those candidates using a version of AlexNet that is fine-tuned for the target objects in the PASCAL VOC dataset.
More recently, [21] improved R-CNN by 1) augmenting the selective search proposals with candidate regions from the multibox approach [5], and 2) replacing the 8-layer AlexNet with the much deeper GoogLeNet CNN model [20]. Despite state-of-the-art performance, these methods are computationally sub-optimal, as they require evaluating a CNN independently over more than 2,000 candidate regions. To address this issue, [12] recently proposed to run the CNN model on the full image once and create a feature pyramid. The candidate regions, obtained by selective search, are then mapped into this feature pyramid space. [12] then uses spatial pyramid pooling [14] and an SVM on the mapped regions to classify candidate proposals. Beyond region-based methods, deep convolutional neural networks have also been used with sliding-window approaches, e.g. OverFeat [19], and with deformable part models, e.g. DenseNet [8]. In general, these methods still have inferior performance compared to region-based methods such as R-CNN [7] and [21]. In our face detection experiments, however, we found that the region-based methods are often very slow and yield relatively weak performance. In this paper we propose a method based on deep learning, called Deep Dense Face Detector, that does not require pose/landmark annotation and is able to detect faces in all orientations using a single model. The proposed method has minimal complexity: unlike other deep learning methods, it does not require additional components for segmentation, bounding box regression or SVM classifiers. We also analyze the performance of our proposed face detector on a variety of face images with different orientations and find that DDFD is able to detect faces in almost all orientations. In addition, by analyzing detector scores, we find that there is a correlation between the distribution of positive examples in the training set and the scores of the proposed detector.
This suggests that the performance of the proposed method can be further improved by using better sampling strategies and more sophisticated data augmentation techniques. In the evaluation we compare the proposed method with other deep-learning-based methods such as R-CNN and show that our method is both faster and more accurate. We also compare our detector with other types of face detectors, e.g. cascade and DPM based, and show that DDFD achieves similar or better performance while requiring neither pose annotation nor information about facial landmarks.

2. PROPOSED METHOD

In this section we provide details of the algorithm and the training process for our proposed face detector, called Deep Dense Face Detector (DDFD). The key ideas are 1) to leverage the high capacity of deep convolutional networks for classification and feature extraction in order to learn a single classifier for detecting faces from multiple views, and 2) to minimize the complexity and simplify the detector as much as possible.

We started by fine-tuning AlexNet [13] for face detection. For this we extracted training examples from the AFLW dataset [15], which consists of 21K images with 24K face annotations. To increase the number of positive examples, we randomly sampled sub-windows of the images and used them as positive examples if they had more than a 50% overlap with the ground truth. For further data augmentation, we also randomly flipped these training examples. This resulted in a total of 200K positive and 20 million negative training examples. These examples were then resized to 227 × 227 and used to fine-tune a pre-trained AlexNet model [13]. For fine-tuning, we used 50K iterations and a batch size of 128 images, where each batch contained 32 positive and 96 negative examples.

Figure 2: right) an example image with faces in different in-plane rotations, with the output of our proposed face detector after NMS and the corresponding confidence score for each detection. left) heat map of the DDFD scores over the image.

Using this fine-tuned deep network, it is possible to take either a region-based or a sliding-window approach to obtain the final face detector. In this work we selected the sliding-window approach because 1) it has less complexity and is independent of extra modules such as selective search, and 2) as discussed in the experiments section, it leads to better results than R-CNN. Running a deep CNN in a sliding-window fashion is not directly possible for a network with fully connected layers. Similar to AlexNet [13], our face classifier consists of 8 layers, where the first 5 layers are convolutional and the last 3 layers are fully connected. We therefore first convert the fully connected layers into convolutional layers by reshaping the layer parameters [8]. This makes it possible to run the CNN on images of any size and obtain a heat map of the face classifier. An example of this heat map is shown in figure 2-left, which shows the CNN response, i.e. the probability of a face, across the full image.
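The fully-connected-to-convolutional conversion can be illustrated with a small NumPy sketch. The 256 × 6 × 6 input size matches AlexNet's pool5 output, but the weights and feature map below are random stand-ins, not the trained network; the point is only that reshaping the FC weight matrix into kernels yields a dense heat map whose every cell equals the FC layer applied to the corresponding window:

```python
import numpy as np

def fc_as_conv(W_fc, feat, H=6, W=6):
    """Apply an FC layer (weights W_fc: [out_dim, C*H*W]) densely over a
    feature map feat of shape [C, H', W'] with H' >= H and W' >= W.
    Returns a heat map of shape [out_dim, H'-H+1, W'-W+1]."""
    out_dim = W_fc.shape[0]
    C = feat.shape[0]
    K = W_fc.reshape(out_dim, C, H, W)       # FC weights viewed as conv kernels
    Hp, Wp = feat.shape[1] - H + 1, feat.shape[2] - W + 1
    heat = np.empty((out_dim, Hp, Wp))
    for y in range(Hp):                      # naive dense "convolution"
        for x in range(Wp):
            heat[:, y, x] = (K * feat[:, y:y+H, x:x+W]).sum(axis=(1, 2, 3))
    return heat

rng = np.random.default_rng(0)
W_fc = rng.standard_normal((2, 256 * 6 * 6))  # toy 2-way face/background head
feat = rng.standard_normal((256, 8, 9))       # pool5-like map of a larger image
heat = fc_as_conv(W_fc, feat)                 # shape (2, 3, 4)

# every heat-map cell equals the FC layer applied to that 6x6 window
assert np.allclose(heat[:, 1, 2], W_fc @ feat[:, 1:7, 2:8].reshape(-1))
```

Conceptually, on a standard-size input the map collapses to a single cell and recovers the original FC output; on larger images it produces the dense face-probability heat map used by the sliding-window detector.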
Note that unlike R-CNN, which uses an SVM classifier to obtain the final score, we removed the SVM module and found that the network outputs are informative enough for the task of face detection. The heat map is then processed by non-maximal suppression to accurately localize the faces. Finally, to detect faces of different sizes, we scale the images up/down and obtain new heat maps. We tried different scaling schemes and found that rescaling the image 3 times per octave gives reasonably good performance. This is interesting, as many other methods, such as [16, 3], require a significantly larger number of resizings per octave, e.g. 8. The face localization could be further improved by using a bounding box regression module similar to [19, 7]; in our experiments, however, adding this module degraded the performance. Therefore, compared to other methods such as R-CNN [7], which uses selective search, an SVM and bounding box regression, or DenseNet [8], which is based on the deformable part model, our proposed method (DDFD) is quite simple. Despite this simplicity, as shown in the experiments section, DDFD achieves state-of-the-art performance for face detection.

2.1 Detector Analysis

In this section we look into the scores of the proposed face detector and show that there is a correlation between the scores of this detector and the positive examples in the training set. We can later use this correlation to 1) obtain a better training set or 2) design better data augmentation procedures and improve the performance of DDFD. We start by running our detector on a variety of faces with different in-plane and out-of-plane rotations, occlusions and lighting conditions; see for example figure 1, figure 2-right and figure 3. First, note that in all cases our detector was able to detect the faces, except for two highly occluded ones in figure 1. Second, for almost all the detected faces the detector score is quite high, e.g. close to 1, and, as shown in the heat map of figure 2-left, is close to zero for all other regions.
This shows that DDFD has a very strong classifier whose output can be used directly, without requiring any post-processing steps such as the SVM used in R-CNN [7]. Third, comparing the detector scores for the faces in figure 2-right, it is clear that the up-right frontal face at the bottom has a very high score, while faces with more in-plane rotation have lower scores; in fact the scores decrease as the in-plane rotation increases. We can see the same trend for out-of-plane rotated faces and occluded faces in figures 1 and 3. We hypothesize that this trend in the scores is not because detecting rotated faces is more difficult, but because of the lack of good training examples representing such faces in the training process. To examine this hypothesis, we looked into the face annotations of the AFLW dataset [15]; figure 4 shows the distribution of the annotated faces with respect to their in-plane, pitch (up and down) and yaw (left to right) rotations. As shown in this figure, the number of faces with more than 30 degrees of in-plane rotation is significantly lower than that of faces with less than 30 degrees of rotation. Similarly, the number of faces with yaw or pitch less than 50 degrees is significantly
larger than the other ones. Given this skewed training set, it is not surprising that the fine-tuned CNN tends toward up-right faces. This is because our CNN is trained to minimize the risk of the softmax loss function

R = - \sum_{x_i \in B} \log[\mathrm{prob}(y_i \mid x_i)]    (1)

where B is the batch of examples used in an iteration of stochastic gradient descent and y_i is the label of example x_i. The sampling method for selecting the examples in B can significantly impact the performance of the final detector. In an extreme case, if B never contains any example from a certain class, the CNN classifier will never learn the attributes of that class. In our implementation |B| = 128, and B is collected by sampling the training set randomly. However, since the number of negative examples is 100 times the number of positive examples, pure uniform sampling would result in only about 2 positive examples per batch. This would significantly degrade the chance of the CNN distinguishing faces from non-faces. To address this issue, we enforced that one quarter of each batch be positive examples, where these positive examples are uniformly sampled from the pool of positive images. But as illustrated in figure 4, the pool of positive examples is highly skewed in different aspects, e.g. in-plane and out-of-plane rotations. The CNN is therefore exposed to more up-right faces, and it is not surprising that the fine-tuned CNN is more confident about up-right faces than rotated ones. This analysis suggests that the key to improving the performance of DDFD is to make sure that all categories of training examples have a similar chance to contribute to the optimization of the CNN. This can be accomplished by enforcing population-based sampling strategies, such as increasing the selection probability for low-population categories, similar to our strategy for sampling more positive examples. Similarly, as shown in figure 1, the current face detector still fails to detect faces with heavy occlusions.
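The fixed-ratio batch construction and the risk of Eq. (1) can be sketched in a few lines of Python. The example pools and their tuple format below are hypothetical stand-ins for real face/background crops, not part of the paper's pipeline:

```python
import math
import random

def sample_batch(positives, negatives, batch_size=128, pos_fraction=0.25):
    """Draw a training batch with a fixed fraction of positive examples,
    regardless of the roughly 1:100 class imbalance in the full pool."""
    n_pos = int(batch_size * pos_fraction)   # 32 positives
    n_neg = batch_size - n_pos               # 96 negatives
    batch = random.sample(positives, n_pos) + random.sample(negatives, n_neg)
    random.shuffle(batch)
    return batch

def softmax_risk(true_label_probs):
    """Eq. (1): R = -sum_{x_i in B} log prob(y_i | x_i), where each entry of
    true_label_probs is the model's probability for an example's true label."""
    return -sum(math.log(p) for p in true_label_probs)

random.seed(0)
positives = [("pos", i) for i in range(200)]     # stand-ins for face crops
negatives = [("neg", i) for i in range(20000)]   # stand-ins for background crops
batch = sample_batch(positives, negatives)
assert sum(1 for label, _ in batch if label == "pos") == 32
```

Under pure uniform sampling the expected number of positives per 128-example batch would be about 128/101, i.e. one or two, which is what the enforced quota avoids.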
Similar to the issue with rotated faces, we believe this problem can also be addressed by modifying the training set. In fact, most face images in the AFLW dataset [15] are not occluded, which makes it difficult for a CNN to learn that faces can be occluded as well. This issue can be addressed by using more sophisticated data augmentation techniques, such as occluding parts of positive examples. Note that simply covering parts of positive examples with black/white or noise blocks is not useful, as the CNN may learn those artificial patterns. In summary, the proposed face detector based on a deep CNN is able to detect faces from different angles and handle occlusion to some extent. However, since the training set is skewed, the network is more confident about up-right faces, and better results can be achieved by using better sampling strategies and more sophisticated data augmentation techniques.

3. EXPERIMENTS

We implemented the proposed face detector using Caffe [11] and used its pre-trained AlexNet [13] model for fine-tuning. For further details on the training process of our proposed face detector please see section 2.

Figure 3: A set of faces with different out-of-plane rotations and occlusions, with the output of our proposed face detector after NMS and the corresponding confidence score for each detection.

After converting the fully connected layers to convolutional layers, it is possible to get the network response, the heat map, for the whole input image in one call to the Caffe code. The heat map gives the score of the CNN for every window with a stride of 32 pixels over the whole image. We directly used this response for classifying a window as face or background. To detect faces smaller or larger than 227 × 227, we scale the image up or down respectively. We tested our face detection approach on the PASCAL Face [1] and AFW [17] datasets.
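The multi-scale bookkeeping described above (a stride-32 heat map, a 227-pixel input window, and up/down scaling) amounts to two small helpers. This is a sketch under those assumptions, not the paper's implementation; the function names are ours:

```python
def pyramid_scales(height, width, up=5.0, fs=0.5 ** (1.0 / 3.0), min_dim=227):
    """Scales at which to evaluate the detector: start upscaled by `up` (so
    faces as small as ~227/up pixels become detectable), then shrink by fs
    until the smaller image dimension drops below the 227-pixel input size."""
    scales, s = [], up
    while min(height, width) * s >= min_dim:
        scales.append(s)
        s *= fs
    return scales

def heatmap_cell_to_window(ix, iy, scale, stride=32, win=227):
    """Map heat-map cell (ix, iy), computed on an image resized by `scale`,
    back to a box (x1, y1, x2, y2) in original-image coordinates."""
    x1, y1 = ix * stride / scale, iy * stride / scale
    return (x1, y1, x1 + win / scale, y1 + win / scale)

# cell (0, 0) at scale 1.0 covers the top-left 227x227 window
assert heatmap_cell_to_window(0, 0, 1.0) == (0.0, 0.0, 227.0, 227.0)
# on a downscaled image (scale 0.5) the same cell covers a 454x454 region
assert heatmap_cell_to_window(0, 0, 0.5) == (0.0, 0.0, 454.0, 454.0)
```

Running the CNN once per scale and mapping every high-scoring cell back through `heatmap_cell_to_window` yields the candidate boxes that NMS then prunes.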
For selecting and tuning the parameters of the proposed face detector we used the PASCAL Face dataset, while results are finally reported on the AFW dataset as well. For evaluation we used the toolbox provided by [16], with corrected annotations for the PASCAL Face and AFW datasets. The PASCAL Face dataset consists of 851 images and 1341 annotated faces, where annotated faces can be as small as 35 pixels. The AFW dataset is built from Flickr images. It has 205 images with 473 annotated faces, and its images tend to contain cluttered backgrounds with large variations in both face viewpoint and appearance (aging, sunglasses, make-up, skin color, expression, etc.).

We start by finding the required number of scales for the proposed detector using the PASCAL dataset. We upscale images by 5 times to detect faces as small as 227/5 ≈ 45 pixels. We then downscale the image by a factor f_s and repeat the process until the minimum image dimension is less than 227 pixels. For the choice of f_s we considered f_s ∈ {0.5^{1/2}, 0.5^{1/3}, 0.5^{1/5}, 0.5^{1/7}}, and figure 5 shows the effect of this parameter on the precision and recall of our face detector, DDFD. The legend shows the area under the curve for each setting. Decreasing f_s lets the detector scan the image
finer, at the cost of increased computation time. According to figure 5, the choice of f_s ∈ {0.5^{1/3}, 0.5^{1/5}, 0.5^{1/7}} has little impact on the performance of the detector. Surprisingly, f_s = 0.5^{1/3} seems to have slightly better performance even though it does not scan the image as thoroughly as f_s = 0.5^{1/5} or f_s = 0.5^{1/7}. Based on this experiment we use f_s = 0.5^{1/3} in the rest of this paper.

Figure 4: Histogram of faces in the AFLW dataset based on their top) in-plane, middle) pitch (up and down) and bottom) yaw (left to right) rotations.

Figure 5: Effect of the scaling factor f_s on the precision and recall of the detector.

Another component of the system is the non-maximum suppression (NMS) module. For this we evaluated two different strategies. NMS-max: we find the window of maximum score and remove all bounding boxes with an IOU (intersection over union) larger than an overlap threshold. NMS-avg: we first filter out windows whose confidence is lower than 90%; we then find the window of maximum score and average all bounding boxes with an IOU larger than an overlap threshold. We tested both strategies, and figure 6 shows the performance of each strategy for different overlap thresholds. As shown in this figure, the performance of both methods varies significantly with the overlap threshold. An overlap threshold of 0.3 gives the best performance for NMS-max, while for NMS-avg 0.2 performs best.

Figure 6: Effect of different NMS strategies and their overlap thresholds. Legend (selected entries): NMS-max, overlap 0.3 (AP 89.66); NMS-max, 0.2 (AP 89.07); NMS-max, 0.1 (AP 88.90); NMS-avg, 0.2 (AP 87.85); NMS-avg, 0.3 (AP 87.75); NMS-max, 0.5 (AP 78.45); NMS-avg, 0.5 (AP 75.46); NMS-avg, 0.1 (AP 17.85).
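The two NMS strategies can be written out in a few lines of plain Python. This is a sketch under our own conventions (detections as (box, score) pairs with boxes (x1, y1, x2, y2)), not the paper's code:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms_max(dets, thresh=0.3):
    """NMS-max: keep the highest-scoring box, drop boxes overlapping it by
    more than `thresh`, and repeat on the remainder."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    keep = []
    while dets:
        best = dets.pop(0)
        keep.append(best)
        dets = [d for d in dets if iou(best[0], d[0]) <= thresh]
    return keep

def nms_avg(dets, thresh=0.2, min_conf=0.9):
    """NMS-avg: discard boxes below `min_conf`, then replace each cluster of
    boxes overlapping the current maximum by their coordinate-wise average."""
    dets = sorted((d for d in dets if d[1] >= min_conf),
                  key=lambda d: d[1], reverse=True)
    keep = []
    while dets:
        best = dets.pop(0)
        cluster = [best] + [d for d in dets if iou(best[0], d[0]) > thresh]
        dets = [d for d in dets if iou(best[0], d[0]) <= thresh]
        avg = tuple(sum(d[0][k] for d in cluster) / len(cluster) for k in range(4))
        keep.append((avg, best[1]))
    return keep

dets = [((0, 0, 100, 100), 0.99), ((5, 5, 105, 105), 0.95),
        ((200, 200, 300, 300), 0.92)]
assert len(nms_max(dets)) == 2   # the two near-duplicate boxes collapse to one
```

Averaging the cluster (NMS-avg) can shift the final box toward the consensus of nearby detections, which is one plausible reason it trades some precision for recall.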
According to this figure, NMS-max has slightly better performance in terms of average precision, while NMS-avg reaches higher recall. In the rest of the paper we show our method with both strategies at their optimal thresholds. Finally, we examine the effect of a bounding box regression module for improving detector localization. The idea is to train regressors to predict the difference between the predicted bounding box and the ground truth. At test time these regressors can be used to estimate this difference and adjust the predicted bounding boxes accordingly. This idea has been shown to improve localization performance in several methods, including [6, 19, 5]. For training our bounding box regressors we followed the algorithm of [7], and figure 7 shows the performance of our detector with and without this module. As shown in this figure, surprisingly, adding bounding box regressors degrades the performance for both NMS strategies. We investigated this phenomenon and believe that a mismatch between the annotations of the training set and those of the test set is the reason. This mismatch occurs mostly for side-view faces and is illustrated in figure 8. Beyond degrading the performance of the bounding box regression module, this mismatch also leads to false miss-detections in the evaluation process.

3.1 Comparison with R-CNN
Figure 7: Performance of the proposed face detector with and without bounding box regression: DDFD NMS-max (AP 89.76), DDFD NMS-avg (AP 87.85), DDFD bbox NMS-max (AP 87.22), DDFD bbox NMS-avg (AP 85.67).

Figure 8: Annotation of a side face in the left) training set and right) test set. The dotted bounding box is the box predicted by our proposed detector; it is counted as a false positive because its IOU with the ground truth is less than 50%.

Figure 9: Comparison of our face detector, DDFD, with different R-CNN face detectors: DDFD NMS-max (AP 89.76), DDFD NMS-avg (AP 87.85), R-CNN face fine-tuned with bbox regression (AP 78.23), R-CNN VOC fine-tuned with bbox regression (AP 68.27), R-CNN face fine-tuned (AP 67.75), R-CNN VOC fine-tuned (AP 52.96).

R-CNN [7] is one of the current state-of-the-art methods for object detection. In this section we compare the performance of our proposed detector with that of R-CNN and its variants. We started by fine-tuning AlexNet for face detection using the process of section 2; we then trained an SVM classifier for face classification using fc7 features. We also trained a bounding box regression unit to further improve the results and used NMS-max for final localization. We repeated this experiment on a version of AlexNet that is tuned for the PASCAL VOC 2012 dataset and is provided with the R-CNN code. Figure 9 compares the performance of our detector with different NMS strategies against the R-CNN methods with and without bounding box regression. As shown in this figure, it is not surprising that the detectors whose AlexNet is fine-tuned on faces perform better than the others. In addition, it seems that bounding box regression can significantly improve R-CNN performance. However, even the best R-CNN classifier has significantly inferior performance compared to our proposed face detector, independent of the NMS strategy.
We believe the inferior performance of R-CNN is due to 1) loss of recall, since selective search may miss some face regions, and 2) loss in localization, as bounding box regression is not perfect and may not be able to fully align the segmentation bounding boxes provided by selective search [23] with the ground truth.

3.2 Comparisons with state-of-the-art

In this section we compare the performance of our proposed detector with other state-of-the-art face detectors using the publicly available PASCAL faces [1] and AFW [17] datasets. In particular, we compare our method with 1) deformable-part-based methods such as structural models [1] and TSM [17], and 2) cascade-based methods such as HeadHunter [16]. Figure 10 compares the performance of our detector with these detectors. Note that the illustrated comparison is not completely fair, as most of the other methods, such as DPM or HeadHunter, use extra view-point annotations during training. As shown in this figure, our single-model face detector was able to achieve similar or better results compared to the other state-of-the-art methods, while it does not require pose annotation or information about facial landmarks.

4. CONCLUSIONS AND FUTURE WORK

In this paper we proposed a face detection method based on deep learning, called Deep Dense Face Detector. The proposed method has minimal complexity and is independent of common modules in deep learning methods such as bounding box regression, SVM, or image segmentation. In addition, the proposed method does not require pose/landmark annotation and is able to detect faces in all orientations using a single model. We compared the proposed method with other deep-learning-based methods such as R-CNN and showed that our method is both faster and more accurate. Compared to other methods that are developed specifically for multi-view face detection, e.g.
cascade and DPM based, our detector was able to get similar or better results while it does not require pose annotation or information about facial landmarks. Finally, we analyzed the performance of our proposed face detector on a variety of face images and found that there is a correlation between the distribution of positive examples in the training set and the scores of the proposed detector. In the future we plan to use better sampling strategies and more sophisticated data augmentation techniques to further improve the performance of the proposed method for detecting occluded and rotated faces.

5. REFERENCES

[1] Face detection by structural models. Image and Vision Computing, 2014.
Figure 10: Comparing the performance of different face detectors on left) PASCAL faces and right) the AFW dataset. PASCAL faces: DPM (HeadHunter) (AP 90.29), DDFD NMS-max (AP 89.76), HeadHunter (AP 89.63), DDFD NMS-avg (AP 87.85), Structured Models (AP 83.87), TSM (AP 76.35), Sky Biometry (AP 68.57), OpenCV (AP 61.09), W.S. Boosting (AP 59.72), Face++, Picasa. AFW: DPM (HeadHunter) (AP 97.21), HeadHunter (AP 97.14), Structured Models (AP 95.19), DDFD NMS-avg (AP 94.46), DDFD NMS-max (AP 91.70), Shen et al. (AP 89.03), TSM (AP 87.99), Face++, Face.com, Picasa.

[2] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In CVPR, 2005.
[3] P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. PAMI.
[4] P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. In BMVC.
[5] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks.
[6] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.
[8] R. B. Girshick, F. N. Iandola, T. Darrell, and J. Malik. Deformable part models are convolutional neural networks. CoRR.
[9] C. Huang, H. Ai, Y. Li, and S. Lao. Vector boosting for rotation invariant multi-view face detection. In Tenth IEEE International Conference on Computer Vision (ICCV).
[10] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R.
Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint.
[12] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[15] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization.
[16] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In ECCV.
[17] D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.
[18] M. Saberian and N. Vasconcelos. Multi-resolution cascades for multiclass object detection. In Advances in Neural Information Processing Systems 27.
[19] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
[20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR.
[21] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. CoRR.
[22] A. Torralba, K. Murphy, and W. Freeman. Sharing visual features for multiclass and multiview object detection.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(5): , May [23] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. International Journal of Computer Vision, [24] M. Viola, M. J. Jones, and P. Viola. Fast multi-view face detection. In Proc. of Computer Vision and Pattern Recognition, [25] P. Viola and M. J. Jones. Robust real-time face detection. Int. J. Comput. Vision, [26] B. Wu, H. Ai, C. Huang, and S. Lao. Fast rotation invariant multi-view face detection based on real adaboost. In Automatic Face and Gesture Recognition, Proceedings. Sixth IEEE International Conference on, 2004.
Figure 11: Examples of output of our face detector on the AFW dataset. The results are shown in different colors: green boxes are true positives, magenta boxes are ground truth, red boxes are false positives and cyan boxes are false negatives.
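The color coding above follows the standard detection evaluation protocol: a detection counts as a true positive when it overlaps a not-yet-matched ground-truth box with sufficient intersection-over-union (IoU), unmatched detections are false positives, and unmatched ground-truth boxes are false negatives. A minimal sketch of this matching; the function names and the 0.5 IoU threshold are illustrative assumptions, not taken from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_detections(detections, ground_truth, thresh=0.5):
    """Greedily match detections, given as (box, score) pairs, to ground-truth
    boxes in decreasing order of score. Each ground-truth box can be matched
    at most once. Returns (true_positives, false_positives, false_negatives)."""
    unmatched = list(range(len(ground_truth)))
    tps, fps = [], []
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Find the best-overlapping, still-unmatched ground-truth box.
        best_j, best_iou = None, thresh
        for j in unmatched:
            o = iou(box, ground_truth[j])
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j is not None:
            unmatched.remove(best_j)
            tps.append(box)          # green boxes
        else:
            fps.append(box)          # red boxes
    fns = [ground_truth[j] for j in unmatched]  # cyan boxes
    return tps, fps, fns
```

Sweeping the detector's score threshold over such matches yields the precision-recall curves (and their APs) reported in Figure 10.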