arxiv: v1 [cs.cv] 29 Apr 2016
Faster R-CNN Features for Instance Search

Amaia Salvador, Xavier Giró-i-Nieto, Ferran Marqués
Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

Shin'ichi Satoh
National Institute of Informatics, Tokyo, Japan

Abstract

Image representations derived from pre-trained Convolutional Neural Networks (CNNs) have become the new state of the art in computer vision tasks such as instance retrieval. This work explores the suitability for instance retrieval of image- and region-wise representations pooled from an object detection CNN such as Faster R-CNN. We take advantage of the object proposals learned by a Region Proposal Network (RPN) and their associated CNN features to build an instance search pipeline composed of a first filtering stage followed by a spatial reranking. We further investigate the suitability of Faster R-CNN features when the network is fine-tuned for the same objects one wants to retrieve. We assess the performance of our proposed system with the Oxford Buildings 5k, Paris Buildings 6k and a subset of TRECVid Instance Search 2013, achieving competitive results.

1. Introduction

Visual media is nowadays the most common type of content in social media channels, thanks to the proliferation of ubiquitous cameras. This explosion of online visual content has motivated researchers to come up with effective yet efficient automatic content-based image retrieval systems. This work addresses the problem of instance search, understood as the task of retrieving those images from a database that contain an instance of a query.

Recently, Convolutional Neural Networks (CNNs) have been proven to achieve state-of-the-art performance in many computer vision tasks such as image classification [12, 22], object detection [19] or semantic segmentation [14].
CNNs trained with large amounts of data have been shown to learn feature representations that are generic enough to solve tasks for which they were not trained [18]. For image retrieval in particular, many works in the literature [3, 25, 11] have adopted solutions based on off-the-shelf features extracted from a CNN pretrained for the task of image classification [12, 22, 24], achieving state-of-the-art performance in popular retrieval benchmarks.

Figure 1. Examples of the rankings and object locations obtained by our proposed retrieval system for query objects (left, depicted with a blue contour) of three different datasets: TRECVid INS 2013, Paris Buildings and Oxford Buildings.

Instance search systems often combine fast first filtering stages, in which all images in a database are ranked according to their similarity to the query, with more computationally expensive mechanisms that are only applied to the top retrieved items. Geometric verification and spatial analysis [10, 29, 15, 28] are common reranking strategies, which are often followed by query expansion (pseudo-relevance feedback) [1, 5].

Spatial reranking usually involves sliding windows at different scales and aspect ratios over an image. Each window is then compared to the query instance in order to find the optimal location that contains the query,
which requires the computation of a visual descriptor on each of the considered windows. Such a strategy resembles that of an object detection algorithm, which usually evaluates many image locations and determines whether they contain the object or not. Object detection CNNs [8, 9, 7, 19] have rapidly evolved to a point where exhaustive search with sliding windows or the computation of object proposals [26, 2] is no longer required. Instead, state-of-the-art detection CNNs [19] are trained in an end-to-end manner to simultaneously learn object locations and labels.

This work explores the suitability of both off-the-shelf and fine-tuned features from an object detection CNN for the task of instance retrieval. We make the following three contributions:

- We propose to use a CNN pre-trained for object detection to extract convolutional features both at global and local scale in a single forward pass of the image through the network.
- We explore simple spatial reranking strategies, which take advantage of the locations learned by a Region Proposal Network (RPN) to provide a rough object localization for the top retrieved images of the ranking.
- We analyze the impact of fine-tuning an object detection CNN for the same instances one wants to query in the future. We find such a strategy to be suitable for learning better image representations.

This way, we put together a simple instance retrieval system that uses both local and global features from an object detection network. Figure 1 shows examples of rankings generated with our retrieval pipeline.

The remainder of the paper is structured as follows. Section 2 introduces the related works. Section 3 presents the methodology of this paper, including feature pooling, reranking and fine-tuning strategies. Section 4 includes the experiments performed on three different image retrieval benchmarks as well as the comparison to other state-of-the-art CNN-based instance search systems.
Finally, Section 5 draws the conclusions of this work.

2. Related Work

CNNs for Instance Search. Features from pre-trained image classification CNNs have been widely used for instance search in the literature. Early works in this direction demonstrated the suitability of features from fully connected layers for image retrieval [4]. Razavian et al. [18] later improved the results by combining fully connected layers extracted from different image sub-patches.

A second generation of works explored the usage of other layers in the pretrained CNN and found that convolutional layers significantly outperformed fully connected ones at image retrieval tasks [21]. Babenko and Lempitsky [3] later proposed a compact descriptor composed of the sum of the activations of each of the filter responses in a convolutional layer. Tolias et al. introduced R-MAC [25], a compact descriptor composed of the aggregation of multiple region features. Kalantidis et al. [11] found significant improvements when applying non-parametric spatial and channel-wise weighting strategies to the convolutional layers.

This work shares with all of the above the usage of convolutional features of a pretrained CNN. However, we choose to use a state-of-the-art object detection CNN to extract both image- and region-based convolutional features in a single forward pass.

Object Detection CNNs. Many works in the literature have proposed CNN-based object detection pipelines. Girshick et al. presented R-CNN [8], a version of Krizhevsky's AlexNet [12], fine-tuned for the Pascal VOC Detection data [6]. Instead of full images, the regions of an object proposal algorithm [26] were used as inputs to the network. At test time, fully connected layers for all windows were extracted and used to train a bounding box regressor and classifier.

Since then, great improvements to R-CNN have been released, both in terms of accuracy and speed. He et al.
proposed SPP-net [9], which used a Spatial Pyramid based pooling layer to improve classification and detection performance. Additionally, they significantly decreased computational time by pooling region features from convolutional features instead of forward passing each region crop through all layers in the CNN. This way, the computation of convolutional features is shared for all regions in an image. Girshick later released Fast R-CNN [7], which used the same speed strategy as SPP-net but, more importantly, replaced the post-hoc training of SVM classifiers and box regressors with an end-to-end training solution. Ren et al. introduced Faster R-CNN [19], which removed the object proposal dependency of former object detection CNN systems by introducing a Region Proposal Network (RPN). In Faster R-CNN, the RPN shares features with the object detection network in [7] to simultaneously learn prominent object proposals and their associated class probabilities.

In this work, we take advantage of the end-to-end self-contained object detection architecture of Faster R-CNN to extract both image and region features for instance search.

3. Methodology

3.1. CNN-based Representations

This paper explores the suitability of using features from an object detection CNN for the task of instance search. In our setup, query instances are defined by a bounding box over the query images. We choose the architecture and pretrained models of Faster R-CNN [19] and use it as a feature
extractor at both global and local scales. Faster R-CNN is composed of two branches that share convolutional layers. The first branch is a Region Proposal Network that learns a set of window locations, and the second one is a classifier that learns to label each window as one of the classes in the training set.

Figure 2. Image- and region-wise descriptor pooling from the Faster R-CNN architecture. (Diagram blocks: Conv layers, Conv5_3, Image-wise Pooling of Activations (IPA), RPN, RPN Proposals, RoI Pooling, FC6, FC7, FC8, Class probabilities, Region-wise Pooling of Activations (RPA).)

Similarly to other works [3, 25, 11], our goal is to extract a compact image representation built from the activations of a convolutional layer in a CNN. Since Faster R-CNN operates at global and local scales, we propose the following feature pooling strategies:

Image-wise pooling of activations (IPA). In order to construct a global image descriptor from Faster R-CNN layer activations, one can choose to ignore all layers in the network that operate with object proposals and extract features from the last convolutional layer. Given the activations of a convolutional layer extracted for an image, we aggregate the activations of each filter response to construct an image descriptor of the same dimension as the number of filters in the convolutional layer. Both max and sum pooling strategies are considered and compared in Section 4.3 of this paper.

Region-wise pooling of activations (RPA). After the last convolutional layer, Faster R-CNN implements a region pooling layer that extracts the convolutional activations for each of the object proposals learned by the RPN. This way, for each one of the window proposals, it is possible to compose a descriptor by aggregating the activations of that window in the RoI pooling layer, giving rise to the region-wise descriptors. For the region descriptor, both max and sum pooling strategies are tested as well.
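The two pooling strategies can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the function names `ipa` and `rpa` are hypothetical, the feature map is a toy array rather than real conv5 activations, and regions are cropped directly from the feature map instead of being read from the fixed-size output of the RoI pooling layer as in the pipeline described above.

```python
import numpy as np

def ipa(feature_map, mode="sum"):
    """Image-wise pooling: (C, H, W) activations -> one C-dim descriptor."""
    flat = feature_map.reshape(feature_map.shape[0], -1)
    return flat.sum(axis=1) if mode == "sum" else flat.max(axis=1)

def rpa(feature_map, boxes, mode="max"):
    """Region-wise pooling: one C-dim descriptor per box.

    Assumption: boxes are (x0, y0, x1, y1) in feature-map cells with
    exclusive x1/y1, and are cropped straight from the feature map.
    """
    descriptors = []
    for x0, y0, x1, y1 in boxes:
        region = feature_map[:, y0:y1, x0:x1]
        descriptors.append(ipa(region, mode))
    return np.stack(descriptors)

# Toy feature map with 2 filters on a 3x4 spatial grid.
fmap = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
image_desc = ipa(fmap, "sum")                                  # shape (2,)
region_descs = rpa(fmap, [(0, 0, 2, 2), (1, 1, 4, 3)], "max")  # shape (2, 2)
```

In both cases the descriptor dimension equals the number of filters, which is why the same code path can serve the global and the local scale.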
Figure 2 shows a schematic of the Faster R-CNN architecture and the two types of descriptor pooling described above. Following several other authors [3, 11], sum-pooled features are l2-normalized, followed by whitening and a second round of l2-normalization, while max-pooled features are just l2-normalized once (no whitening).

3.2. Fine-tuning Faster R-CNN

This paper explores the suitability of fine-tuning Faster R-CNN to 1) obtain better feature representations for image retrieval and 2) improve the performance of spatial analysis and reranking. To achieve this, we choose to fine-tune Faster R-CNN to detect the query objects to be retrieved by our system. This way, we modify the architecture of Faster R-CNN to output the regressed bounding box coordinates and the class scores for each one of the query instances of the tested datasets. In our experiments, we explore two modalities of fine-tuning:

- Fine-tuning Strategy #1: Only the weights of the fully connected layers in the classification branch are updated (i.e. the convolutional layers and the RPN are left unchanged).
- Fine-tuning Strategy #2: Weights of all layers after the first two convolutional layers are updated. This way, convolutional features, RPN proposals and fully connected layers are modified and adapted to the query instances.

The resulting fine-tuned networks are to be used to extract better image and region representations and to perform spatial reranking based on class scores instead of feature similarities.

3.3. Image Retrieval

The three stages of the proposed instance retrieval pipeline are described in this section: filtering stage, spatial reranking and query expansion.

Filtering Stage. The image-wise pooling (IPA) strategy is used to build image descriptors for both query and database images. At test time, the descriptor of the query image is compared to all the elements in the database, which are then ranked based on the cosine similarity. At this stage, the whole image is considered as the query.
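As a rough sketch of the filtering stage with sum-pooled descriptors, the snippet below chains l2-normalization, whitening, a second l2-normalization and cosine ranking. Several details are assumptions not stated in the text: whitening is implemented as PCA whitening fitted on the database descriptors themselves, the helper names (`l2n`, `fit_whitening`, `sum_pooled_descriptor`, `rank_by_cosine`) are hypothetical, and random vectors stand in for real pooled activations.

```python
import numpy as np

def l2n(x, eps=1e-12):
    """l2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fit_whitening(descs, eps=1e-8):
    """Fit PCA whitening (assumed variant) on (N, D) descriptors."""
    mean = descs.mean(axis=0)
    cov = np.cov(descs - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by the square root of its eigenvalue.
    proj = eigvec / np.sqrt(np.maximum(eigval, 0.0) + eps)
    return mean, proj

def sum_pooled_descriptor(x, mean, proj):
    """l2-normalize, whiten, then l2-normalize again."""
    return l2n((l2n(x) - mean) @ proj)

def rank_by_cosine(query, database):
    """Rank unit-norm database rows by cosine similarity to a unit-norm query."""
    scores = database @ query
    order = np.argsort(-scores)
    return order, scores[order]

rng = np.random.default_rng(0)
raw = rng.random((50, 8))            # stand-in for 50 sum-pooled descriptors
mean, proj = fit_whitening(l2n(raw))
db = sum_pooled_descriptor(raw, mean, proj)
order, scores = rank_by_cosine(db[3], db)   # use database image 3 as the query
```

Because every descriptor ends up with unit norm, the cosine similarity reduces to a dot product, so the whole database can be scored with a single matrix-vector product.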
Spatial Reranking. After the Filtering Stage, the top N elements are locally analyzed and reranked. We explore two reranking strategies:

- Class-Agnostic Spatial Reranking (CA-SR). For every image in the top N ranking, the region-wise descriptors (RPA) for all RPN proposals are compared to the region descriptor of the query bounding box. The region-wise descriptors of RPN proposals are pooled from the RoI pooling layer of Faster R-CNN (see Figure 2). To obtain the region-wise descriptor of the query object, we warp its bounding box to the size of the feature maps in the last convolutional layer and pool the activations within its area. The region with maximum cosine similarity for every image in the top N ranking gives the object localization, and its score is kept for ranking.
- Class-Specific Spatial Reranking (CS-SR). Using a network that has been fine-tuned with the same instances one wishes to retrieve, it is possible to use the direct classification scores for each RPN proposal as the similarity score to the query object. Similarly to CA-SR, the region with maximum score is kept for visualization, and the score is used to rank the image list.

Query Expansion (QE). The image descriptors of the top M elements of the ranking are averaged together with the query descriptor to perform a new search.

4. Experiments

4.1. Datasets

The methodologies described in Section 3 are assessed with the following datasets:

- Oxford Buildings [16]. 5,063 images, including 55 query images of 11 different buildings in Oxford (5 images per instance are provided). A bounding box surrounding the target object is provided for query images.
- Paris Buildings [17]. 6,412 still images of Paris landmarks, including 55 query images of 11 buildings with associated bounding box annotations.
- INS 2013 [23].
A subset of 23,614 keyframes from the TRECVid Instance Search (INS) dataset, containing only those keyframes that are relevant for at least one of the queries of INS 2013.

4.2. Experimental Setup

We use both the VGG16 [22] and ZF [27] architectures of Faster R-CNN to extract image and region features. In both cases, we use the last convolutional layer (conv5 and conv5_3 for ZF and VGG16, respectively) to build the image descriptors introduced in Section 3, which are of dimension 256 and 512 for the ZF and VGG16 architectures, respectively. Region-wise features are pooled from the RoI pooling layer of Faster R-CNN. Images are re-scaled such that their shortest side is 600 pixels. All experiments were run on an Nvidia Titan X GPU.

4.3. Off-the-shelf Faster R-CNN features

In this section, we assess the performance of using off-the-shelf features from the Faster R-CNN network for instance retrieval.

First, we compare the sum and max pooling strategies of image- and region-wise descriptors. Table 1 summarizes the results. According to our experiments, sum pooling is significantly superior to max pooling for the filtering stage. Such behaviour is consistent with other works in the literature [3, 11]. Sum pooling is, however, consistently outperformed by max pooling when reranking using region-wise features for all three datasets. Specifically for the Oxford and Paris datasets, we find the spatial reranking with max pooling to be beneficial after filtering (gain of 0.10 and 0.03 mAP points for Oxford and Paris, respectively). However, the spatial reranking (either with max or sum pooling) has little or no effect for the INS 13 dataset. To further interpret these results, we qualitatively evaluate the two pooling strategies. Figure 3 shows examples of top rankings for INS 13 queries, spatially reranked with region-wise max and sum pooled descriptors. These examples indicate that, although mAP is similar, the object locations obtained with max pooling are more accurate.
According to this analysis, we set IPA-sum descriptors for the filtering stage and RPA-max descriptors for the spatial reranking in all the upcoming experiments of this paper.

Table 2 shows the performance of different Faster R-CNN architectures (ZF and VGG16) trained on two datasets (Pascal VOC and COCO [13]), including experiments with query expansion with the M = 5 top retrieved images as well. As expected, features pooled from the deeper VGG16 network perform better in most cases, which is consistent with previous works in the literature showing that features from deeper networks reach better performance. Query expansion applied after the spatial reranking achieves significant gains for all tested datasets. Such behaviour was expected in particular with the Oxford and Paris datasets, for which the spatial reranking already provided a significant gain. Interestingly, query expansion is also most beneficial after spatial reranking for the INS 13 dataset. This suggests that, although the spatial reranking does not provide any gain in mAP in this case, the images that fall on the very top of the reranked list are more useful to expand the query than the ones at the top of the first ranking.
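The interplay between class-agnostic spatial reranking and query expansion discussed above can be sketched as follows. This is a toy illustration, not the evaluated system: the function names `ca_sr` and `query_expansion` are hypothetical, descriptors are hand-written 2-D unit vectors, and the values of N and M are shrunk from the N = 100 and M = 5 used in the experiments so the example stays readable.

```python
import numpy as np

def l2n(x, eps=1e-12):
    """l2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def ca_sr(order, query_region, region_descs, top_n=100):
    """Rerank the top_n images by their best-matching region.

    region_descs[i] is an (R_i, D) array of unit-norm region descriptors
    for image i; the best region index doubles as a rough localization.
    """
    head = [int(i) for i in order[:top_n]]
    best = {}
    for i in head:
        sims = region_descs[i] @ query_region
        best[i] = (float(sims.max()), int(sims.argmax()))
    head.sort(key=lambda i: -best[i][0])
    return head + [int(i) for i in order[top_n:]], best

def query_expansion(query_desc, image_descs, order, top_m=5):
    """Average the query with the top_m image descriptors and renormalize."""
    top = image_descs[[int(i) for i in order[:top_m]]]
    return l2n(np.vstack([query_desc[None, :], top]).mean(axis=0))

# Toy data: 4 database images described by 2-D unit vectors.
image_descs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
query_desc = np.array([1.0, 0.0])
first_order = np.argsort(-(image_descs @ query_desc))   # filtering stage

region_descs = {
    0: np.array([[0.6, 0.8]]),
    1: np.array([[1.0, 0.0], [0.0, 1.0]]),
    2: np.array([[0.8, 0.6]]),
    3: np.array([[0.0, 1.0]]),
}
query_region = np.array([1.0, 0.0])
reranked, best = ca_sr(first_order, query_region, region_descs, top_n=3)
expanded = query_expansion(query_desc, image_descs, reranked, top_m=2)
new_order = np.argsort(-(image_descs @ expanded))        # second search
```

Running the expansion on the reranked list rather than on the first ranking mirrors the observation above that the top of the reranked list tends to contain more useful images for building the expanded query.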
Figure 3. Examples of top 4 rankings and object locations obtained for queries 9098: a P (parking automat) sign and 9076: this monochrome bust of Queen Victoria from the INS 2013 dataset (query images surrounded in blue). Comparison between the rankings generated using RPA-sum (top) and RPA-max (bottom), after the filtering stage with IPA-sum. Regressed bounding box coordinates have been disabled for visualization.

Table 1. Mean Average Precision (mAP) comparison between sum and max pooling strategies for both filtering and reranking stages using conv5 features from the ZF Faster R-CNN model. Columns: Filtering, Reranking, Oxford 5k, Paris 6k, INS 13. Rows: IPA-sum and IPA-max filtering, each with None, RPA-sum and RPA-max reranking. [Numeric values not recoverable from the source.]

Table 2. mAP of pre-trained Faster R-CNN models with ZF and VGG16 architectures. (P) and (C) denote whether the network was trained with Pascal VOC or Microsoft COCO images, respectively. In all cases, IPA-sum descriptors are used for the filtering stage. The CA-SR column specifies whether Class-Agnostic Spatial Reranking with RPA-max is applied to the top N = 100 elements of the ranking. When indicated, QE is applied with M = 5. Columns: Net, CA-SR, QE, Oxford 5k, Paris 6k, INS 13. Rows: ZF (P), VGG16 (P), VGG16 (C). [Numeric values not recoverable from the source.]

4.4. Fine-tuning Faster R-CNN

In this section, we assess the impact on retrieval performance of fine-tuning a pretrained network with the query objects to be retrieved. We choose to fine-tune the VGG16 Faster R-CNN model, pretrained with the objects of the Microsoft COCO dataset. In the case of Oxford and Paris, we modify the output layers in the network to return 12 class probabilities (11 buildings in the dataset, plus an extra class for the background), and their corresponding regressed bounding box coordinates. We use the 5 images provided for each one of the buildings and their bounding box locations as training data. Additionally, we augment the training set by performing a horizontal flip on the training images (5 × 11 × 2 = 110 training images in total). For INS 13, we have 30 different query instances, with 4 images each, giving rise to 30 × 4 × 2 = 240 training examples. The number of output classes for INS 13 is 31 (30 queries plus the background class).

The original Faster R-CNN training parameters described in [19] are kept for fine-tuning, except for the number of iterations, which we decreased considering our small number of training samples. We use the approximate joint training strategy introduced in [20], which trains the RPN and classifier branches at the same time, using the multi-task loss defined in [19]. This way, we train a separate network for each one of the tested datasets, using the two different fine-tuning modalities described in Section 3.2. Fine-tuning was performed on an Nvidia Titan X GPU and took around 30 and 45 minutes for fine-tuning strategies #1 and #2, respectively.

We first take the networks fine-tuned with strategy #1 and run the retrieval pipeline from scratch. Table 3 shows the obtained results (ft#1 columns). Results of the filtering and CA-SR stages are the same as those obtained with the original Faster R-CNN model, because the weights of the convolutional layers were not modified during fine-tuning. Results indicate that, although mAP is not always improved after CS-SR (for example, on Oxford 5k), it is significantly better than CA-SR for Oxford and Paris when followed by query expansion. In the case of the INS 13 dataset, we do not find significant improvements when using CS-SR, which suggests that only fine-tuning fully connected layers might not be sufficient to effectively
detect the challenging query objects in this dataset.

The second experiment in this section involves fine-tuning a higher number of layers in the Faster R-CNN architecture (Fine-tuning Strategy #2). Using this modality, the weights in the last convolutional layer are modified. Figure 4 shows the difference in the activations in conv5_3 after fine-tuning it for the query instances in each dataset. These visualizations indicate that, after fine-tuning, more neurons in the convolutional layer react positively to the visual patterns that are present in the query objects of the dataset.

We then use the fine-tuned networks of the Fine-tuning Strategy #2 for each one of the datasets to extract image- and region-wise descriptors to perform instance search. Table 3 presents the results (ft#2 columns). As expected, fine-tuned features significantly outperform raw Faster R-CNN features for all datasets (mAP is 20% higher for Oxford and Paris, and 8% higher for INS 13). Results indicate that, for the Oxford and Paris datasets, the gain of CA-SR + QE is higher with raw features (10% and 11% mAP increase for Oxford and Paris, respectively) than with fine-tuned ones (8% and 3% mAP increase, respectively). This suggests that fine-tuned features are already discriminant enough to correctly retrieve the objects in these two datasets. However, results for the INS 13 dataset show that CA-SR + QE is most beneficial when using fine-tuned features (11% and 41% mAP increase for raw and fine-tuned features, respectively). This difference between the performance for Oxford/Paris and INS 13 suggests that queries from the latter are more challenging and therefore benefit from fine-tuned features and spatial reranking the most. A similar behaviour is observed for CS-SR which, for Oxford and Paris, is most beneficial when applied to a ranking obtained with raw features. For INS 13, however, the gain is greater when using fine-tuned features.
Overall, the performance of reranking + query expansion is higher for CS-SR than CA-SR. Figure 1 shows examples of rankings for queries of the three different datasets after applying CS-SR. For visualization, we disable the regressed bounding box coordinates predicted by Faster R-CNN and choose to display those that are directly returned by the RPN. We find that the locations returned by the regression layer are inaccurate in most cases, which we hypothesize is caused by the lack of training data.

Finally, in Figure 5 we qualitatively evaluate the object detections after CS-SR using the fine-tuning strategies #1 and #2. The comparison reveals that locations obtained with the latter are more accurate and tighter to the objects. The Fine-tuning Strategy #2 allows the RPN layers to adapt to the query objects, which causes the network to produce object proposals that are more suitable for the objects in the test datasets.

Figure 4. Difference between conv5_3 features (sum pooled over feature maps) extracted from the original Faster R-CNN model pretrained with MS COCO and conv5_3 features from the same model fine-tuned for INS 13 (bottom), Oxford and Paris (top) queries.

Table 3. Comparison between fine-tuning strategies #1 (ft#1) and #2 (ft#2) on the three datasets. Spatial Reranking (R) is applied to the N = 100 top elements of the ranking. QE is performed with M = 5. Columns: R, QE, then ft#1 and ft#2 for each of Oxford 5k, Paris 6k and INS 13. Rows: no reranking, CA-SR and CS-SR, each with and without QE. [Numeric values not recoverable from the source.]

4.5. Comparison with state-of-the-art

In this section, we compare our results with several instance search works in the literature. Table 4 shows the results of this comparison.

Our proposed pipeline using Faster R-CNN features shows competitive results with respect to the state of the art. However, other works [11, 25] achieve a very high performance without any reranking or query expansion strategies using similar feature pooling strategies.
We hypothesize that the difference in the CNN architecture (Faster R-CNN vs. VGG16), training data (Pascal VOC vs. ImageNet) and input image size (600px wide vs. full resolution) between these works and ours might be the reasons for the gap in performance.

Figure 5. Ranking examples after CS-SR with fine-tuning strategies #1 (left) and #2 (right).

Table 4. Comparison with CNN-based state-of-the-art works on instance retrieval. Columns: Oxford 5k, Paris 6k. Rows: Razavian et al. [18]; Tolias et al. [25]; Kalantidis et al. [11]; Babenko and Lempitsky [3]; Ours; Ours (ft#2); Tolias et al. (+ R + QE) [25]; Kalantidis et al. (+ QE) [11]; Ours (+ CA-SR + QE); Ours (ft#1) (+ CS-SR + QE); Ours (ft#2) (+ CS-SR + QE). [Numeric values not recoverable from the source.]

Our proposed reranking strategy CA-SR followed by query expansion is demonstrated to provide mAP gains similar to those of the one proposed in [25]. While CA-SR + QE gives us a gain in mAP of 10% both for Oxford and Paris (using raw Faster R-CNN features), Tolias et al. [25] use their reranking strategy to raise their mAP by 5 and 15% for the two datasets, respectively.

As expected, results obtained with fine-tuned features (ft#2) are very competitive compared to those in the state of the art, which suggests that fine-tuning the network for the object queries is an effective solution when time is not a constraint.

5. Conclusion

This paper has presented different strategies to make use of CNN features from an object detection CNN. It provides a simple baseline that uses off-the-shelf Faster R-CNN features to describe both images and their sub-parts. We have shown that it is possible to greatly improve the performance of an off-the-shelf based system, at the cost of fine-tuning the CNN for the query images that include the objects one wants to retrieve.

Acknowledgements

This work has been developed in the framework of the project BigGraph TEC R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF). The Image Processing Group at the UPC is a SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
Amaia Salvador developed this work thanks to the NII International Internship Program. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan X used in this work.

References

[1] R. Arandjelović and A. Zisserman. Three things everyone should know to improve object retrieval. In Computer Vision and Pattern Recognition (CVPR).
[2] P. Arbeláez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[3] A. Babenko and V. Lempitsky. Aggregating local deep features for image retrieval. In International Conference on Computer Vision (ICCV), December.
[4] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky. Neural codes for image retrieval. In Computer Vision - ECCV 2014.
[5] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In International Conference on Computer Vision (ICCV), pages 1-8.
[6] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), June.
[7] R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 37(9).
[10] H. Jégou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. International Journal of Computer Vision, 87(3).
[11] Y. Kalantidis, C. Mellina, and S. Osindero. Cross-dimensional weighting for aggregated deep convolutional features. arXiv preprint.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
[13] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014. Springer.
[14] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[15] T. Mei, Y. Rui, S. Li, and Q. Tian. Multimedia search reranking: A literature survey. ACM Computing Surveys (CSUR), 46(3):38.
[16] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Computer Vision and Pattern Recognition (CVPR), pages 1-8.
[17] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Computer Vision and Pattern Recognition (CVPR).
[18] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: an astounding baseline for recognition. In Computer Vision and Pattern Recognition Workshops (CVPRW).
[19] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91-99.
[20] S. Ren, K. He, R. B. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. CoRR.
[21] A. Sharif Razavian, J. Sullivan, A. Maki, and S. Carlsson. A baseline for visual instance retrieval with deep convolutional networks. In International Conference on Learning Representations (ICLR).
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR.
[23] A. F. Smeaton, P. Over, and W. Kraaij. Evaluation campaigns and TRECVid. In International Workshop on Multimedia Information Retrieval (MIR).
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9.
[25] G. Tolias, R. Sicre, and H. Jégou. Particular object retrieval with integral max-pooling of CNN activations. ICLR.
[26] J. R. Uijlings, K. E. van de Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2).
[27] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014. Springer.
[28] W. Zhang and C.-W. Ngo. Topological spatial verification for instance search. IEEE Transactions on Multimedia, 17(8).
[29] Y. Zhang, Z. Jia, and T. Chen. Image retrieval with geometry-preserving visual phrases. In Computer Vision and Pattern Recognition (CVPR).
