Bert Huang Department of Computer Science Virginia Tech
|
|
|
- Pierce Hines
- 9 years ago
- Views:
Transcription
1 This paper was submitted as a final project report for CS6424/ECE6424 Probabilistic Graphical Models and Structured Prediction in the spring semester of The work presented here is done by students on short time constraints, so it may be preliminary or have inconclusive results. I encouraged the students to choose projects that complemented or integrated with their own research, so it is possible the project has continued since this report was submitted. If you are interested in the topic, I encourage you to contact the authors to see the latest developments on these ideas. Bert Huang Department of Computer Science Virginia Tech
2 On Helping Stroke Rehabilitation with Faster R-CNN Jinwoo Choi Department of Electrical and Computer Engineering Virginia Tech Blacksburg, VA Abstract In this work, we address the problem of real-time hand detection problem for automatic stroke patient rehabilitation assessment system. We employ the stateof-the-art object detector, Faster R-CNN, to detect hands of various skin-tones and under illumination variations. We got preliminary experimental results of reasonable mean average precision score with real-time test speed in this work. As our future works, we will apply the Faster R-CNN detection to other objects and integrate the detection modules to the stroke patient rehabilitation system. 1 Introduction In this work, we address the hand detection problem from video for real-time applications. This work is designed with a purpose of application to a bigger project: helping stroke rehabilitation with an autonomous rehabilitation exercise evaluation system. In the system, a stroke survivor performs some pre-defined exercises which consist of grasping some objects, moving them from one place to another, releasing the objects, and manipulating the objects. In order to assess the quality of the exercises without any humans in the loop, the system should detect and track hands, torso, and other objects and it should also recognize which grasp type the survivor uses and assess the quality of the grasp. Object detection is a fundamental computer vision problem. In order to develop higher level vision algorithms such as grasp detection, action recognition, and event detection which are employed in the stroke rehabilitation system, we need more reliable and accurate detection method. Thus, in this work, we focus on the hand detection problem among all the building blocks of the large system. However, we expect that the hand detection problem addressed in this work can be applied to other detection problems such as torso and other object. 2 Related Works Li and Kitani proposed a pixel-level hand detection method in [1] in order to apply it to the stroke rehabilitation system. They used random tree regressors with various features (color, texture, histogram of oriented gradients, SIFT, ORB, super-pixel features). In order to account for various skin-tones, background clutter, various poses, and illumination changes, they trained the multiple regressors conditioned on the illumination and they used the mixture of multiple regressors as a final model. Even though they tried to build a model which is robust and accurate, it is still not very robust to the various skin-tones and illumination conditions. Recently, there have been great advances in the object detection task [2]-[5] and these advances have been driven by convolutional neural networks (CNNs) [6], [7]. In the CNN based detection method, CNN is mainly employed as a classifier. In [2], a region proposal method such as selective search Probabilistic Graphical Models and Structured Prediction Final Project Report. Spring 2016, Virginia Tech.
3 generates object-like regions. Then CNN classifies each region into categories and this method is called region-based CNN (R-CNN) method. Although R-CNN achieves state-of-the-art detection performance in terms of mean average precision (map), it is slow at test-time to be deployed for real-time applications due to many independent forward passes of the CNNs. In [3], they modified the network architecture to share computation of convolutional layers between object proposals and this method is called Fast R-CNN. In [4], which is called Faster R-CNN, the object proposal task is incorporated into the CNNs rather than having an external region proposal module. Fast R-CNN and Faster R-CNN achieve the 25 and 250 times faster test-time speed than R-CNN without loss of detection accuracy respectively. In this work, we employ Faster R-CNN because our goal is apply the hand detection module to the real-time stroke rehabilitation system. 3 Approach In order to detect hands in each image, we need not only a hand classifier but also a bounding box regressor for hands. In this work, we formulate the hand detection problem as a structured prediction problem. In this formulation we want to predict [x min,y min,x max,y max ] which correspond to the top-left and bottom-right coordinates of the hand, for a given image. There are only two classes in this problem definition: hand and background. Faster R-CNN [4] consists of two networks on top of shared convolutional layers. One network is region proposal network (RPN), another network is Fast R-CNN for object classification. Convolutional layers are shared across the two networks to efficiently train a model and to train the entire model in an end-to-end manner. In the forward pass, an input image is fed into the shared convolutional layers. The output of the shared convolutional layers are convolutional feature maps. The feature maps are fed into RPN. The proposed regions are fed into Fast R-CNN detector along with convolutional feature maps. Then the Fast R-CNN module classifies each region. Detailed illustration of the Faster R-CNN algorithm is described in [4]. We follow the 4-step training algorithm in [4]. 1) RPN training: Initialize the RPN with pre-trained ImageNet model, and train the RPN. 2) Fast R-CNN training: Train the detection network (Fast R-CNN) using the region proposals generated by step 1). 3) RPN fine-tuning: Fine-tune the layers which are unique to RPN. Fix the shared convolutional layers. 4) Fast R-CNN fine-tuning: Fine-tune the layers which are unique to Fast R-CNN. Fix the shared convolutional layers. In the RPN training, intersection over union (IoU) scores between ground truth bounding boxes and proposed bounding boxes are used to generate positive (object) and negative (background) examples. The boxes with IoU score greater than 0.7 are treated as positive examples and the boxes of IoU score lower than 0.3 are treated as negative examples. The loss function for the RPN consists of a classification (of objectness) term and a box regression term. The trained RPN is utilized when Fast R-CNN is being trained. RPN generates the region proposals during the forward pass and weights of the RPN are also updated during backward pass of the Fast R-CNN. In this work, we use the pre-trained ImageNet model for Zeiler and Fergus network (ZFNet) [10] architecture to initialize the weights of step 1) and 2). We aim to train a model for hand detection in this work. However, this work can be generalized and applied to other objects which are used in the stroke rehabilitation system such as torso, cylindrical object, hour-glass object, round-top object, and etc. The only issue for other object detection is how to collect sufficient amount of data for training. We do not deal with the data collection issue in this report because it is out of scope. 2
4 4 Experiments 4.1 Dataset and implementation details Figure 1: Training loss with respect to iterations. Each plot corresponds to each step of 4-step training pipeline. We used Oxford hand dataset [8]. This dataset consists of hand instances: 9163 instances are used for the training and 2031 instances are used for the test. We did not use the validation instances. We used the python and C++ implementation of Faster R-CNN by the author [4]. The code is implemented with an open sourced deep learning library Caffe [9]. Hyper parameters such as the number of scales and aspect ratios of anchor boxes were set to the same values used in [4]. For the training and testing, we ran the system on a Tesla K80 GPU equipped linux machine. We used 80k iterations per each training stage and the entire training time took about 12 hours to train the model with the Oxford hand dataset. We had a training loss convergence issue as shown in the Fig. 1. Training loss of RPN and Fast R-CNN decreased in stage 1 even though there were some amount of fluctuations. However, in stage 2, the training losses were not converged. We could not fully address this convergence issue. However it turns out that the trained model is working. We will further investigate this issue as a future work. 4.2 Quantitative results The system achieved a map of with ZFNet architecture [10]. If we use more advanced network architectures such as VGGNet [13] or ResNet [14], an increased map is expected. We would investigate this further as our future works. 3
5 Figure 2: Successful cases of hand detection. Note that in spite of various illuminations, skin-tones, occlusions, poses the system detects the hands successfully. 4
6 Figure 3: Failure cases of hand detection. Note that there are some false positives of similar textured objects such as ears and faces. 5
7 4.3 Qualitative results Some successful detection results are depicted in the Fig. 2. We marked the detected hands with red bounding boxes along with the confidence values. Multiple hands with different skin-tones, various illumination conditions, small resolutions and severe occlusions are detected precisely. Some representative failure cases are illustrated in the Fig. 3. The system detects the hand when there are visible hands in the image. However, there are many false positive detections when the similar textured regions such as ears and faces exist. Incorporating negative images without any hands to the training dataset could be a possible solution to this problem. We expect that this would also increase the map score. 4.4 Test-time speed Test-time speed of the hand detection is 100ms per image (=10 frames per second) on the average when using ZFNet [10] architecture on the linux machine equipped with Tesla K80 GPU. This is a reasonable speed for real-time detection module. However, if the rehabilitation system consist of more modules such as torso detector, grasp classifiers, and etc, 10fps hand detection would not be a sufficient speed. More speed up could be achieved if network compression techniques are employed. 5 Conclusion We addressed the real-time hand detection problem for stroke rehabilitation system. We applied Faster R-CNN to get robust and accurate detection for various skin-tones and illuminations. The experimental results in this work are preliminary for the future steps. We will investigate deeper and advanced network architectures such as VGGNet [13] and ResNet [14]. Furthermore, we can apply CNNs to grasp recognition task and other detections task as well. The test-time computation complexity is still not very light, because for the stroke rehabilitation system, we have a grasp recognition and a tracking modules as well as a hand detection module. Thus, as another future work, we could apply the deep compression techniques [11], [12] to the system to get some further speed up. Acknowledgments This work is done with a supervision of Dr. Dhruv Batra, Dr. Devi Parikh, and Dr. Aisling Kelliher of Virginia Tech. This work is also submitted to ECE6554: Advanced Computer Vision, Spring 2016, Virginia Tech as a class project. References [1] Cheng Li and Kris M. Kitani, Pixel-level Hand Detection in Ego-Centric Videos, in CVPR [2] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR [3] Ross Girshick, Fast R-CNN, in ICCV [4] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS [5] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, You only look once: unified real-time object detection, CVPR [6] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner, Gradient-based learning applied to document recognition. Proceedings of the IEEE, November, [7] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, Imagenet classification with deep convolutional neural networks, NIPS [8] Arpit Mittal, Andrew Zisserman, and Phillip H. S. Torr, Hand detection using multiple proposals, BMVC, [9] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arxiv preprint arxiv: ,
8 [10] Matthew D Zeiler, and Rob Fergus, Visualizing and understanding convolutional networks, ECCV [11] Song Han, Huizi Mao, and William J. Dally, Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR [12] Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin, Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications, arxiv preprint arxiv: , [13] Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR [14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, arxiv: ,
Compacting ConvNets for end to end Learning
Compacting ConvNets for end to end Learning Jose M. Alvarez Joint work with Lars Pertersson, Hao Zhou, Fatih Porikli. Success of CNN Image Classification Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton,
Convolutional Feature Maps
Convolutional Feature Maps Elements of efficient (and accurate) CNN-based object detection Kaiming He Microsoft Research Asia (MSRA) ICCV 2015 Tutorial on Tools for Efficient Object Detection Overview
Pedestrian Detection with RCNN
Pedestrian Detection with RCNN Matthew Chen Department of Computer Science Stanford University [email protected] Abstract In this paper we evaluate the effectiveness of using a Region-based Convolutional
Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection
CSED703R: Deep Learning for Visual Recognition (206S) Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection Bohyung Han Computer Vision Lab. [email protected] 2 3 Object detection
CAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong Jan 26, 2016 Today Administrivia A bigger picture and some common questions Object detection proposals, by Samer
Object Detection in Video using Faster R-CNN
Object Detection in Video using Faster R-CNN Prajit Ramachandran University of Illinois at Urbana-Champaign [email protected] Abstract Convolutional neural networks (CNN) currently dominate the computer
Fast R-CNN Object detection with Caffe
Fast R-CNN Object detection with Caffe Ross Girshick Microsoft Research arxiv code Latest roasts Goals for this section Super quick intro to object detection Show one way to tackle obj. det. with ConvNets
CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015
CS 1699: Intro to Computer Vision Deep Learning Prof. Adriana Kovashka University of Pittsburgh December 1, 2015 Today: Deep neural networks Background Architectures and basic operations Applications Visualizing
Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016
Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016 Previously, end-to-end.. Dog Slide credit: Jose M 2 Previously, end-to-end.. Dog Learned Representation Slide credit: Jose
Image Classification for Dogs and Cats
Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science [email protected] Abstract
Pedestrian Detection using R-CNN
Pedestrian Detection using R-CNN CS676A: Computer Vision Project Report Advisor: Prof. Vinay P. Namboodiri Deepak Kumar Mohit Singh Solanki (12228) (12419) Group-17 April 15, 2016 Abstract Pedestrian detection
Deformable Part Models with CNN Features
Deformable Part Models with CNN Features Pierre-André Savalle 1, Stavros Tsogkas 1,2, George Papandreou 3, Iasonas Kokkinos 1,2 1 Ecole Centrale Paris, 2 INRIA, 3 TTI-Chicago Abstract. In this work we
Image and Video Understanding
Image and Video Understanding 2VO 710.095 WS Christoph Feichtenhofer, Axel Pinz Slide credits: Many thanks to all the great computer vision researchers on which this presentation relies on. Most material
CNN Based Object Detection in Large Video Images. WangTao, [email protected] IQIYI ltd. 2016.4
CNN Based Object Detection in Large Video Images WangTao, [email protected] IQIYI ltd. 2016.4 Outline Introduction Background Challenge Our approach System framework Object detection Scene recognition Body
MulticoreWare. Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan
1 MulticoreWare Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan Focused on Heterogeneous Computing Multiple verticals spawned from core competency Machine Learning
Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083.
Fast R-CNN Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083. ECS 289G 001 Paper Presentation, Prof. Lee Result 1 67% Accuracy
Administrivia. Traditional Recognition Approach. Overview. CMPSCI 370: Intro. to Computer Vision Deep learning
: Intro. to Computer Vision Deep learning University of Massachusetts, Amherst April 19/21, 2016 Instructor: Subhransu Maji Finals (everyone) Thursday, May 5, 1-3pm, Hasbrouck 113 Final exam Tuesday, May
Lecture 6: Classification & Localization. boris. [email protected]
Lecture 6: Classification & Localization boris. [email protected] 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014
Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks
1 Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks Tomislav Hrkać, Karla Brkić, Zoran Kalafatić Faculty of Electrical Engineering and Computing University of
Network Morphism. Abstract. 1. Introduction. Tao Wei
Tao Wei [email protected] Changhu Wang [email protected] Yong Rui [email protected] Chang Wen Chen [email protected] Microsoft Research, Beijing, China, 18 Department of Computer Science and Engineering,
arxiv:1506.03365v2 [cs.cv] 19 Jun 2015
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop Fisher Yu Yinda Zhang Shuran Song Ari Seff Jianxiong Xiao arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 Princeton
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles [email protected] and lunbo
arxiv:1604.08893v1 [cs.cv] 29 Apr 2016
Faster R-CNN Features for Instance Search Amaia Salvador, Xavier Giró-i-Nieto, Ferran Marqués Universitat Politècnica de Catalunya (UPC) Barcelona, Spain {amaia.salvador,xavier.giro}@upc.edu Shin ichi
Deep Residual Networks
Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July 2016. Formerly affiliated with Microsoft Research Asia 7x7 conv,
Introduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey
Multi-view Face Detection Using Deep Convolutional Neural Networks
Multi-view Face Detection Using Deep Convolutional Neural Networks Sachin Sudhakar Farfade Yahoo [email protected] Mohammad Saberian Yahoo [email protected] Li-Jia Li Yahoo [email protected]
Semantic Recognition: Object Detection and Scene Segmentation
Semantic Recognition: Object Detection and Scene Segmentation Xuming He [email protected] Computer Vision Research Group NICTA Robotic Vision Summer School 2015 Acknowledgement: Slides from Fei-Fei
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
1 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun arxiv:1506.01497v3 [cs.cv] 6 Jan 2016 Abstract State-of-the-art object
Learning and transferring mid-level image representions using convolutional neural networks
Willow project-team Learning and transferring mid-level image representions using convolutional neural networks Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic 1 Image classification (easy) Is there
HE Shuncheng [email protected]. March 20, 2016
Department of Automation Association of Science and Technology of Automation March 20, 2016 Contents Binary Figure 1: a cat? Figure 2: a dog? Binary : Given input data x (e.g. a picture), the output of
Task-driven Progressive Part Localization for Fine-grained Recognition
Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He [email protected] University of Missouri [email protected] Abstract In this paper we propose a task-driven
SSD: Single Shot MultiBox Detector
SSD: Single Shot MultiBox Detector Wei Liu 1, Dragomir Anguelov 2, Dumitru Erhan 3, Christian Szegedy 3, Scott Reed 4, Cheng-Yang Fu 1, Alexander C. Berg 1 1 UNC Chapel Hill 2 Zoox Inc. 3 Google Inc. 4
Getting Started with Caffe Julien Demouth, Senior Engineer
Getting Started with Caffe Julien Demouth, Senior Engineer What is Caffe? Open Source Framework for Deep Learning http://github.com/bvlc/caffe Developed by the Berkeley Vision and Learning Center (BVLC)
Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.
Visual search: what's next? Cees Snoek University of Amsterdam The Netherlands Euvision Technologies The Netherlands Problem statement US flag Tree Aircraft Humans Dog Smoking Building Basketball Table
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected]
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected] Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual
arxiv:1312.6034v2 [cs.cv] 19 Apr 2014
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps arxiv:1312.6034v2 [cs.cv] 19 Apr 2014 Karen Simonyan Andrea Vedaldi Andrew Zisserman Visual Geometry Group,
SEMANTIC CONTEXT AND DEPTH-AWARE OBJECT PROPOSAL GENERATION
SEMANTIC TEXT AND DEPTH-AWARE OBJECT PROPOSAL GENERATION Haoyang Zhang,, Xuming He,, Fatih Porikli,, Laurent Kneip NICTA, Canberra; Australian National University, Canberra ABSTRACT This paper presents
Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features
Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features JT Turner 1, Kalyan Gupta 1, Brendan Morris 2, & David
Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling
Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling Ming Liang Xiaolin Hu Bo Zhang Tsinghua National Laboratory for Information Science and Technology (TNList) Department
Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?
Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not? Erjin Zhou [email protected] Zhimin Cao [email protected] Qi Yin [email protected] Abstract Face recognition performance improves rapidly
InstaNet: Object Classification Applied to Instagram Image Streams
InstaNet: Object Classification Applied to Instagram Image Streams Clifford Huang Stanford University [email protected] Mikhail Sushkov Stanford University [email protected] Abstract The growing
Do Convnets Learn Correspondence?
Do Convnets Learn Correspondence? Jonathan Long Ning Zhang Trevor Darrell University of California Berkeley {jonlong, nzhang, trevor}@cs.berkeley.edu Abstract Convolutional neural nets (convnets) trained
Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN
Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN Xiu Li 1, 2, Min Shang 1, 2, Hongwei Qin 1, 2, Liansheng Chen 1, 2 1. Department of Automation, Tsinghua University, Beijing
R-CNN minus R. 1 Introduction. Karel Lenc http://www.robots.ox.ac.uk/~karel. Department of Engineering Science, University of Oxford, Oxford, UK.
LENC, VEDALDI: R-CNN MINUS R 1 R-CNN minus R Karel Lenc http://www.robots.ox.ac.uk/~karel Andrea Vedaldi http://www.robots.ox.ac.uk/~vedaldi Department of Engineering Science, University of Oxford, Oxford,
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network Seunghoon Hong 1 Tackgeun You 1 Suha Kwak 2 Bohyung Han 1 1 Dept. of Computer Science and Engineering, POSTECH,
Applications of Deep Learning to the GEOINT mission. June 2015
Applications of Deep Learning to the GEOINT mission June 2015 Overview Motivation Deep Learning Recap GEOINT applications: Imagery exploitation OSINT exploitation Geospatial and activity based analytics
Edge Boxes: Locating Object Proposals from Edges
Edge Boxes: Locating Object Proposals from Edges C. Lawrence Zitnick and Piotr Dollár Microsoft Research Abstract. The use of object proposals is an effective recent approach for increasing the computational
Convolutional Networks for Stock Trading
Convolutional Networks for Stock Trading Ashwin Siripurapu Stanford University Department of Computer Science 353 Serra Mall, Stanford, CA 94305 [email protected] Abstract Convolutional neural networks
arxiv:1505.04597v1 [cs.cv] 18 May 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation Olaf Ronneberger, Philipp Fischer, and Thomas Brox arxiv:1505.04597v1 [cs.cv] 18 May 2015 Computer Science Department and BIOSS Centre for
A Convolutional Neural Network Cascade for Face Detection
A Neural Network Cascade for Face Detection Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua Stevens Institute of Technology Hoboken, NJ 07030 {hli18, ghua}@stevens.edu Adobe Research San
A Dynamic Convolutional Layer for Short Range Weather Prediction
A Dynamic Convolutional Layer for Short Range Weather Prediction Benjamin Klein, Lior Wolf and Yehuda Afek The Blavatnik School of Computer Science Tel Aviv University [email protected], [email protected],
arxiv:1504.08083v2 [cs.cv] 27 Sep 2015
Fast R-CNN Ross Girshick Microsoft Research [email protected] arxiv:1504.08083v2 [cs.cv] 27 Sep 2015 Abstract This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object
Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers Fan Yang 1,2, Wongun Choi 2, and Yuanqing Lin 2 1 Department of Computer Science,
Object Detection from Video Tubelets with Convolutional Neural Networks
Object Detection from Video Tubelets with Convolutional Neural Networks Kai Kang Wanli Ouyang Hongsheng Li Xiaogang Wang Department of Electronic Engineering, The Chinese University of Hong Kong {kkang,wlouyang,hsli,xgwang}@ee.cuhk.edu.hk
Latest Advances in Deep Learning. Yao Chou
Latest Advances in Deep Learning Yao Chou Outline Introduction Images Classification Object Detection R-CNN Traditional Feature Descriptor Selective Search Implementation Latest Application Deep Learning
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15
Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 GENIVI is a registered trademark of the GENIVI Alliance in the USA and other countries Copyright GENIVI Alliance
Local features and matching. Image classification & object localization
Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to
Simultaneous Deep Transfer Across Domains and Tasks
Simultaneous Deep Transfer Across Domains and Tasks Eric Tzeng, Judy Hoffman, Trevor Darrell UC Berkeley, EECS & ICSI {etzeng,jhoffman,trevor}@eecs.berkeley.edu Kate Saenko UMass Lowell, CS [email protected]
arxiv:1502.01852v1 [cs.cv] 6 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun arxiv:1502.01852v1 [cs.cv] 6 Feb 2015 Abstract Rectified activation
Denoising Convolutional Autoencoders for Noisy Speech Recognition
Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University [email protected] Victor Zhong Stanford University [email protected] Abstract We propose the use of
Novelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
Recognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
Classifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
Conditional Random Fields as Recurrent Neural Networks
Conditional Random Fields as Recurrent Neural Networks Shuai Zheng 1, Sadeep Jayasumana *1, Bernardino Romera-Paredes 1, Vibhav Vineet 1,2, Zhizhong Su 3, Dalong Du 3, Chang Huang 3, and Philip H. S. Torr
Tracking performance evaluation on PETS 2015 Challenge datasets
Tracking performance evaluation on PETS 2015 Challenge datasets Tahir Nawaz, Jonathan Boyle, Longzhen Li and James Ferryman Computational Vision Group, School of Systems Engineering University of Reading,
3D Model based Object Class Detection in An Arbitrary View
3D Model based Object Class Detection in An Arbitrary View Pingkun Yan, Saad M. Khan, Mubarak Shah School of Electrical Engineering and Computer Science University of Central Florida http://www.eecs.ucf.edu/
The Delicate Art of Flower Classification
The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC [email protected] Note: The following is my contribution to a group project for a graduate machine learning
arxiv:1409.1556v6 [cs.cv] 10 Apr 2015
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION Karen Simonyan & Andrew Zisserman + Visual Geometry Group, Department of Engineering Science, University of Oxford {karen,az}@robots.ox.ac.uk
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS
VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS Norbert Buch 1, Mark Cracknell 2, James Orwell 1 and Sergio A. Velastin 1 1. Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE,
arxiv:1412.6856v2 [cs.cv] 15 Apr 2015
Published as a conference paper at ICLR 215 OBJECT DETECTORS EMERGE IN DEEP SCENE CNNS Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Computer Science and Artificial Intelligence
Fast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels
3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels Luís A. Alexandre Department of Informatics and Instituto de Telecomunicações Univ. Beira Interior,
Supporting Online Material for
www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence
Taking Inverse Graphics Seriously
CSC2535: 2013 Advanced Machine Learning Taking Inverse Graphics Seriously Geoffrey Hinton Department of Computer Science University of Toronto The representation used by the neural nets that work best
Optical Flow. Shenlong Wang CSC2541 Course Presentation Feb 2, 2016
Optical Flow Shenlong Wang CSC2541 Course Presentation Feb 2, 2016 Outline Introduction Variation Models Feature Matching Methods End-to-end Learning based Methods Discussion Optical Flow Goal: Pixel motion
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks Matthew D. Zeiler Department of Computer Science Courant Institute, New York University [email protected] Rob Fergus Department
Speed Performance Improvement of Vehicle Blob Tracking System
Speed Performance Improvement of Vehicle Blob Tracking System Sung Chun Lee and Ram Nevatia University of Southern California, Los Angeles, CA 90089, USA [email protected], [email protected] Abstract. A speed
Mean-Shift Tracking with Random Sampling
1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of
Learning to Process Natural Language in Big Data Environment
CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview
Real-Time Grasp Detection Using Convolutional Neural Networks
Real-Time Grasp Detection Using Convolutional Neural Networks Joseph Redmon 1, Anelia Angelova 2 Abstract We present an accurate, real-time approach to robotic grasp detection based on convolutional neural
Limitations of Human Vision. What is computer vision? What is computer vision (cont d)?
What is computer vision? Limitations of Human Vision Slide 1 Computer vision (image understanding) is a discipline that studies how to reconstruct, interpret and understand a 3D scene from its 2D images
PANDA: Pose Aligned Networks for Deep Attribute Modeling
PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang 1,2, Manohar Paluri 1, Marc Aurelio Ranzato 1, Trevor Darrell 2, Lubomir Bourdev 1 1 Facebook AI Research 2 EECS, UC Berkeley {nzhang,
Colorado School of Mines Computer Vision Professor William Hoff
Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Introduction to 2 What is? A process that produces from images of the external world a description
Transform-based Domain Adaptation for Big Data
Transform-based Domain Adaptation for Big Data Erik Rodner University of Jena Judy Hoffman Jeff Donahue Trevor Darrell Kate Saenko UMass Lowell Abstract Images seen during test time are often not from
Masters in Information Technology
Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101
Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition
Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition Marc Aurelio Ranzato, Fu-Jie Huang, Y-Lan Boureau, Yann LeCun Courant Institute of Mathematical Sciences,
The Visual Internet of Things System Based on Depth Camera
The Visual Internet of Things System Based on Depth Camera Xucong Zhang 1, Xiaoyun Wang and Yingmin Jia Abstract The Visual Internet of Things is an important part of information technology. It is proposed
AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION
AN IMPROVED DOUBLE CODING LOCAL BINARY PATTERN ALGORITHM FOR FACE RECOGNITION Saurabh Asija 1, Rakesh Singh 2 1 Research Scholar (Computer Engineering Department), Punjabi University, Patiala. 2 Asst.
An Introduction to Deep Learning
Thought Leadership Paper Predictive Analytics An Introduction to Deep Learning Examining the Advantages of Hierarchical Learning Table of Contents 4 The Emergence of Deep Learning 7 Applying Deep-Learning
Privacy-CNH: A Framework to Detect Photo Privacy with Convolutional Neural Network Using Hierarchical Features
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Privacy-CNH: A Framework to Detect Photo Privacy with Convolutional Neural Network Using Hierarchical Features Lam Tran
CS231M Project Report - Automated Real-Time Face Tracking and Blending
CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, [email protected] June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android
Multi-view Intelligent Vehicle Surveillance System
Multi-view Intelligent Vehicle Surveillance System S. Denman, C. Fookes, J. Cook, C. Davoren, A. Mamic, G. Farquharson, D. Chen, B. Chen and S. Sridharan Image and Video Research Laboratory Queensland
Simplified Machine Learning for CUDA. Umar Arshad @arshad_umar Arrayfire @arrayfire
Simplified Machine Learning for CUDA Umar Arshad @arshad_umar Arrayfire @arrayfire ArrayFire CUDA and OpenCL experts since 2007 Headquartered in Atlanta, GA In search for the best and the brightest Expert
Object Recognition. Selim Aksoy. Bilkent University [email protected]
Image Classification and Object Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] Image classification Image (scene) classification is a fundamental
Tracking in flussi video 3D. Ing. Samuele Salti
Seminari XXIII ciclo Tracking in flussi video 3D Ing. Tutors: Prof. Tullio Salmon Cinotti Prof. Luigi Di Stefano The Tracking problem Detection Object model, Track initiation, Track termination, Tracking
