Image Classification for Dogs and Cats
Bang Liu, Yan Liu
Department of Electrical and Computer Engineering
Kai Zhou
Department of Computing Science

Abstract

In this project, our task is to develop an algorithm to classify images of dogs and cats, which is the Dogs vs. Cats competition from Kaggle. We mainly investigated two approaches to address this problem. The first is a traditional pattern recognition model: we extracted human-crafted features such as color and Dense-SIFT, represented images using the bag-of-words model, and then trained Support Vector Machine (SVM) classifiers. For the second approach, we used a Deep Convolutional Neural Network (CNN) to learn features of images and trained Backpropagation (BP) Neural Networks and SVMs for classification. We ran various experiments to improve our performance on the test dataset, and finally obtained our best accuracy of 94.00% with the second approach.

1 Introduction

1.1 Motivation and Background

The Dogs vs. Cats competition from Kaggle addresses the CAPTCHA [3] challenge, which relies on the problem of distinguishing images of dogs and cats. The task is easy for humans, but evidence [3] suggests that cats and dogs are particularly difficult to tell apart automatically. Many people have worked or are working on machine learning classifiers for this problem. In [3], a classifier based on color features achieved 56.9% accuracy on the Asirra dataset [2]. In [17], an accuracy of 82.7% was achieved by an SVM classifier based on a combination of color and texture features. And in [18], SIFT (Scale-Invariant Feature Transform) [13] features were used to train a classifier that finally reached an accuracy of 92.9%. In our project, we also set out to solve this problem and to achieve higher performance. We tried different strategies: for instance, Dense-SIFT features, the combination of Dense-SIFT and color features, and features learned by a CNN. We also employed SVMs on the learned features and finally achieved our best classification accuracy of 94.00%.

1.2 Task Definition

Our basic task is to create an algorithm to classify whether an image contains a dog or a cat. The input for this task is images of dogs or cats from the training dataset, while the output is the classification accuracy on the test dataset. The given dataset for this competition is the Asirra dataset provided by Microsoft Research. Our training set contains 25,000 images, including 12,500 images of dogs and 12,500 images of cats, while the test dataset contains 12,500 images. The images vary in size. Our learning task is to learn a classification model that determines the decision boundary for the training dataset. The whole process is illustrated in Figure 1, from which we can see that the input for the learning task is images from the training dataset, while the output is the learned classification model.
Figure 1: Architecture for the learning task
Figure 2: Architecture for the performance task

Our performance task is to apply the learned classification model to classify images from the test dataset, and then evaluate the classification accuracy. As seen from Figure 2, the input is images from the test dataset, and the output is the classification accuracy.

1.3 Our Solution

In our solution, we mainly tried two different approaches. The first method is a traditional pattern recognition model, in which we learned the classification model from human-crafted features, mainly including color features, Dense-SIFT features, and a combination of the two. The second method is a trainable model, in which we applied a CNN to learn features. In terms of classifiers, we mainly chose SVMs and BP Neural Networks, considering the high-dimensional feature space of images. We ran various experiments with different algorithms and parameter settings to achieve high accuracy on the test dataset.

The outline of our paper is as follows. We introduce the first approach in Section 2. The second approach is described in Section 3. Finally, we summarize our work and potential future work in Section 4.

2 Method One: Using Human-Crafted Features

2.1 Human-Crafted Features

In typical image classification problems, we choose some fixed human-crafted features to use. There are many well-studied features, such as SIFT, HoG [15], and RGB or HSV color features. In our project, the images are either dogs or cats, and we know that the shapes and the prior color distributions of dogs and cats differ. So we extracted the local SIFT feature descriptor and HSV color features to represent the original images.

SIFT features are local, based on the appearance of the object, and invariant to image scale and rotation. The scale-invariant feature transform of a neighborhood is a 128-dimensional vector of histograms of image gradients, as shown in Figure 3. The region, at the appropriate scale and orientation, is divided into a 4×4 square grid, each cell of which yields a histogram with 8 orientation bins. Dense-SIFT is a fast algorithm for the calculation of a large number of SIFT descriptors of densely sampled features [12]. The HSV (hue, saturation, value) representation is the most common cylindrical-coordinate representation of points in an RGB color model; it rearranges the geometry of RGB in an attempt to be more intuitive and perceptually relevant. We chose it because HSV is closer to human perception of color and easy to interpret.
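To make this feature-extraction step concrete, below is a minimal sketch of how dense SIFT descriptors and an HSV color histogram could be computed for one image with OpenCV. The grid step, descriptor scale, histogram bin counts, and file name are illustrative assumptions, not the exact settings we used.

```python
import cv2
import numpy as np

def dense_sift(gray, step=8, size=8):
    """128-D SIFT descriptors on a dense grid (step and size are assumptions)."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), size)
                 for y in range(step, gray.shape[0] - step, step)
                 for x in range(step, gray.shape[1] - step, step)]
    _, descriptors = sift.compute(gray, keypoints)  # shape: (num_keypoints, 128)
    return descriptors

def hsv_histogram(bgr, bins=(8, 4, 4)):
    """Normalized HSV color histogram; bin counts are an assumption."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, None).flatten()

image = cv2.imread("train/dog.1.jpg")          # hypothetical Kaggle file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
sift_desc = dense_sift(gray)                   # local descriptors for bag of words
color_feat = hsv_histogram(image)              # global color feature
```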
Figure 3: SIFT descriptor [14]
Figure 4: Bag of words [16]

2.2 Our Model

After obtaining the Dense-SIFT and HSV features, we applied the bag-of-words model [16] to represent images; Figure 4 shows this model. It is a simplifying representation used in natural language processing and computer vision. In this model, a dictionary is formed by clustering the extracted features of the training set with the k-means algorithm. Every cluster is a word of this visual dictionary. Images are then represented by frequency vectors in which every dimension gives the proportion of features belonging to a cluster.

For classification, we trained SVM classifiers on the Dense-SIFT features, the HSV features, and the combination of the two, respectively. SVMs with suitable parameters have the ability to prevent overfitting, and experience shows that they usually achieve good classification performance. Additionally, since the original images have various complicated backgrounds, we also tried the grab-cut [18] segmentation algorithm to remove the backgrounds of images.
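The sketch below shows how this pipeline could look under stated assumptions: k-means builds the visual dictionary from pooled Dense-SIFT descriptors, each image becomes a normalized word-frequency vector, and a linear SVM is trained on top. The vocabulary size and the SVM's C are illustrative, since the report does not record the exact values used with the linear kernels.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

K = 1000  # vocabulary size (assumption)

def build_vocabulary(descriptor_sets):
    """Cluster all training descriptors; each centroid is one visual word."""
    stacked = np.vstack(descriptor_sets)
    return MiniBatchKMeans(n_clusters=K, random_state=0).fit(stacked)

def bow_vector(kmeans, descriptors):
    """Represent one image as the normalized frequency of its visual words."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

# descriptor_sets: list of (n_i, 128) Dense-SIFT arrays, one per training image
# labels: 0 for cat, 1 for dog
def train(descriptor_sets, labels):
    kmeans = build_vocabulary(descriptor_sets)
    X = np.array([bow_vector(kmeans, d) for d in descriptor_sets])
    clf = LinearSVC(C=1.0).fit(X, labels)  # C = 1.0 is an assumed value
    return kmeans, clf
```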
2.3 Result and Analysis

When using only the Dense-SIFT features, the accuracy on the test dataset was only 67.60%. After combining them with the HSV features, the accuracy increased to 71.47%. The performance is not good, and here we analyze some potential reasons. The most likely one is that the Dense-SIFT and color features of dogs and cats are quite similar: for instance, both have one head, one tail, and four legs, and their colors have much in common. As for the image segmentation step, the classification performance actually decreased, due to the poor segmentation results. In some images, the dogs or cats were even cut out, and important information such as tails or ears was removed during the segmentation process. To further improve the performance, we could extract more distinctive features, use PCA to reduce the dimensionality, or try a Deep Neural Network to learn features of images.

3 Method Two: Using Features Learned by a Convolutional Neural Network

Our second approach is to learn features with a CNN [5][6] and then train BP Neural Network or SVM classifiers on these features. Unlike human-crafted features, which are fixed features extracted directly from images, deep neural networks learn features from images and discover multiple levels of representation, with higher-level features representing more abstract aspects of the data [4].

3.1 Deep Convolutional Neural Networks

The CNN is a kind of deep architecture that has achieved great performance in tasks like document recognition [5] and image recognition [6]. Unlike traditional BP Neural Networks, which contain an input layer, hidden layers, and an output layer, the CNN also contains convolutional layers and max-pooling layers.

Figure 5: Convolutional operation and max pooling in a CNN [9][10]

Convolutional layers contain many feature maps, which are two-dimensional grids of hidden nodes. Every feature map owns a weight matrix called a kernel, and different feature maps own different kernels. Each kernel is convolved with every feature map in the previous layer (layer j); the results are summed and passed through a sigmoid function, and the output gives the pixel values in layer j+1. With different kernels we can learn different representations of the data, and the number of parameters does not grow exponentially with the number of hidden nodes and layers.

Once a feature has been detected, its exact location becomes less important [5]; only its approximate position relative to other features is relevant. Not only is the precise position of each feature irrelevant for identifying the pattern, it is also potentially harmful, because positions are likely to vary across different instances of the pattern. Max-pooling layers therefore perform a sub-sampling operation on the feature maps: for every 2×2 block of pixels, only the maximum value is retained, so the side length of each feature map is halved. Sub-sampling reduces the resolution of the feature maps and the sensitivity of the output to shifts and distortions, making the model more robust. The max-pooling operation can be incorporated into convolutional layers, so we do not need additional layers for sub-sampling.
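As a minimal illustration of the two operations just described (a toy sketch, not the network we actually used), the following NumPy code convolves an input map with a kernel, applies the sigmoid, and then max-pools over 2×2 blocks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(feature_map, kernel):
    """Valid 2-D convolution (cross-correlation, as is standard in CNNs)."""
    kh, kw = kernel.shape
    h = feature_map.shape[0] - kh + 1
    w = feature_map.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of every 2x2 block, halving each side."""
    h, w = feature_map.shape[0] // 2 * 2, feature_map.shape[1] // 2 * 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.random.rand(8, 8)          # toy input feature map
k = np.random.randn(3, 3)         # one kernel (random here; learned in a real CNN)
hidden = sigmoid(conv2d_valid(x, k))   # 6x6 feature map
pooled = max_pool_2x2(hidden)          # 3x3 after sub-sampling
```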
3.2 Our Model

In 2012, Krizhevsky and Hinton trained a CNN that achieved state-of-the-art performance on the ImageNet 2012 classification benchmark [6]. Our model is based on Krizhevsky's model. The original model contains 8 layers, of which the last three are two fully connected layers (the same as hidden layers in a BP Neural Network) and the output layer. We cut off the last 2 layers and re-use the previous 6 layers to extract features from images. Recent work [8] showed that features extracted from the activations of a CNN trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks, which may differ significantly from the originally trained tasks. Moreover, training a Deep Neural Network requires much experience and skill and is very time-consuming. That is why we re-used Krizhevsky's pre-trained network rather than training a deep architecture ourselves.

Figure 6: An illustration of the architecture of our model. The CNN was trained in parallel on two GPUs, and the figure explicitly shows the delineation of responsibilities between the two GPUs: one GPU runs the layer parts at the top of the figure while the other runs the layer parts at the bottom. The network's input is 224×224×3 (RGB), and the number of neurons in the network's remaining layers is 253,440, 186,624, 64,896, 64,896, 43,264, 4096, 4096, and 1000.

The architecture of this model is illustrated in Figure 6. First, input images are normalized to 224×224 (we first normalize the short edge to 224 pixels and then take the center crop of the image). The first convolutional layer filters the input image with 96 kernels of size 11×11×3 with a stride of 4 pixels (the stride is the distance between the receptive field centers of neighboring neurons in a kernel map). The second convolutional layer takes as input the pooled output of the first convolutional layer and filters it with 256 kernels of size 5×5×48. The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling layers: the third convolutional layer has 384 kernels of size 3×3×256, the fourth has 384 kernels of size 3×3×192, and the fifth has 256 kernels of size 3×3×192. The fully connected layers have 4096 neurons each. Up to this point, the 6-layer CNN transforms an image into a feature vector of 4096 dimensions. We then train a classifier using the 4096-dimensional features and classify images as dog or cat; we refer to [6] for more details about the network. We trained the BP Neural Networks and SVMs with 5-fold cross-validation on the extracted features. After that, we applied the learned models to classify the test set and evaluated their performance.
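A sketch of this pipeline might look as follows, using torchvision's pre-trained AlexNet as a stand-in for Krizhevsky's original network (we used the original pre-trained model; the stand-in network, the file paths, and the small sample size here are assumptions). It truncates the classifier so the 4096-dimensional fc6 activations come out, then trains the RBF-kernel SVM with C = 10000 under 5-fold cross-validation, as in the text.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Pre-trained AlexNet; keep the conv layers plus the first FC layer (fc6) only.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
net.classifier = net.classifier[:3]   # Dropout, Linear(9216 -> 4096), ReLU
net.eval()

# Short edge to 224, then center crop, matching the normalization in the text.
preprocess = T.Compose([
    T.Resize(224), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return net(x).squeeze(0).numpy()          # 4096-D feature vector

# Hypothetical placeholder paths for a subset of the Kaggle training set.
paths = [f"train/dog.{i}.jpg" for i in range(100)] + \
        [f"train/cat.{i}.jpg" for i in range(100)]
labels = np.array([1] * 100 + [0] * 100)      # 1 = dog, 0 = cat
X = np.stack([extract_feature(p) for p in paths])

svm = SVC(kernel="rbf", C=10000)
print(cross_val_score(svm, X, labels, cv=5).mean())
```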
3.3 Result and Analysis

With this method we finally obtained our best accuracy of 94.00%, from an SVM classifier with an RBF kernel and C = 10000, which is impressive. To understand clearly how and why the CNN works, we now look into the internal operation and behavior of this complex model. To understand what each layer learns from images, we need to figure out what kind of inputs activate a feature map; a feature map is activated if its pixel value is 1. Zeiler [9] proposed a visualization technique that uses a multi-layered Deconvolutional Network (deconvnet) [11] to reveal the input stimuli that excite individual feature maps at any layer of the model. It projects feature activations back to the input pixel space. A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse: instead of mapping pixels to features, it does the opposite. To examine a convnet, a deconvnet is attached to each of its layers, providing a continuous path back to image pixels. To examine a given convnet activation, we set all other activations in the layer to zero and pass the feature maps as input to the attached deconvnet layer. We then reconstruct the activity in the layer beneath, repeating until the input pixel space is reached. For more details about the deconvnet, please refer to [9][11].
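As a small illustration of the "reverse" pooling step in a deconvnet (a sketch of the general idea, not Zeiler's implementation), PyTorch's max-pooling can record where each maximum came from, and max-unpooling places the values back at those locations:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # remember argmax locations
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.rand(1, 1, 4, 4)     # toy feature map
pooled, indices = pool(x)      # forward direction: 4x4 -> 2x2
# Deconvnet direction: put each pooled value back where its maximum was,
# zeros elsewhere (the "switches" in Zeiler's terminology).
reconstructed = unpool(pooled, indices)
print(reconstructed.shape)     # torch.Size([1, 1, 4, 4])
```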
Figure 7: Projecting back from higher layers
Figure 8: Feature visualization using a deconvnet; images are from the ImageNet 2012 validation set [9]

Figure 8 shows feature visualizations for different layers [9]. Instead of showing the single strongest activation for a given feature map, we show the top 9 activations. Projecting each one separately down to pixel space reveals the different structures that excite a given feature map, hence showing its invariance to input deformations. Alongside these visualizations we show the corresponding image patches. These have greater variation than the visualizations, since the latter focus solely on the discriminant structure within each patch. For example, in layer 5, row 1, column 2, the patches appear to have little in common, but the visualizations reveal that this particular feature map focuses on the grass in the background, not on the foreground objects.

The projections from each layer show the hierarchical nature of the features in the network. Layer 1 learns basic edges and colors. Layer 2 responds to corners and other edge/color conjunctions. Layer 3 has more complex invariances, capturing similar textures or patterns. Layer 4 shows significant variation but is more class-specific: dog faces, birds' legs. Layer 5 shows entire objects with significant pose variation, e.g. dogs and grass.
Figure 9: Some incorrectly classified images from the Kaggle test set

Now we have an insight into why deep architectures can achieve good performance. Features learned by Deep Neural Networks are hierarchical: with more hidden layers, the learned features become more high-level and specific. Compared to human-crafted features such as Dense-SIFT or color features, they are more expressive, more class-specific, and more invariant to backgrounds. We can also identify which layers are critical to recognition and change the parameters and architecture of the deep neural network accordingly, e.g. increase the number of feature maps in some layers, to achieve better performance.

To understand the remaining problems and achieve better performance, we investigated what kinds of images were incorrectly classified. Figure 9 reveals some characteristics of the images that failed to be classified. Some images' resolutions are too low to recognize (1 and 2), and in some images the critical parts, e.g. faces, are hidden (3 and 4). The test set also contains some cartoon images (5 and 6) that contain only simple outlines of a dog or cat, which are hard to classify. Also, many wrongly classified images have backgrounds (7 and 8) that are very complicated or similar to the foreground animals. Finally, some images cannot be recognized even by humans (9).

Given these properties of the misclassified images, we can try different strategies. We may try object localization to locate the animals in images, so that complicated backgrounds can be eliminated. We may also take advantage of extra training data to improve classification of special images such as cartoon dogs or cats. We can also combine features such as texture features with the features learned by a Deep Neural Network, which may help with the recognition of images that contain only the fur or part of an animal's body.

4 Conclusion and Future Work

In this report, we first briefly explained the motivation of this project and presented some background material. Then we precisely described our task, including the learning task and the performance task. After that, we introduced our solution in detail, comprising two approaches. The first approach is a traditional pattern recognition model, in which we learned the classification model from human-crafted features, mainly including color features, Dense-SIFT features, and a combination of the two. To improve performance, we also applied an image segmentation step to preprocess the data; however, due to poor segmentation results, this did not yield any improvement. The best accuracy we obtained with the first method is only 71.47% (from an SVM classifier).
To achieve better performance, we implemented our second approach, a trainable model that applies a CNN to learn features. We also looked into what the Deep Network learned from images and explained why it achieves good performance. The highest accuracy of this approach is 94.00% (from an SVM classifier), which is also our best result and ranked us 9th among 91 teams in the Kaggle competition.

In terms of classifiers, we mainly considered SVMs and BP Neural Networks, taking our high-dimensional feature space into account. Various parameter settings were explored to improve classification accuracy on the test dataset: for the BP Neural Networks we tried different numbers of hidden layers and hidden units, and for the SVMs different kernel functions and C parameters were used. Table 1 lists the best result of each model and the related parameters.

Table 1: Best Performance on the Test Dataset for Different Models

Feature             Classifier   Parameter Setting            Accuracy
Dense-SIFT          SVM          Linear kernel                67.60%
Dense-SIFT+Color    SVM          Linear kernel                71.47%
From Deep CNN       BP Network   1 hidden layer, 30 neurons   93.01%
From Deep CNN       SVM          RBF kernel, C = 10000        94.00%

In the future, we will explore further to achieve better performance. For instance, we will try changing the architecture and parameter settings of the Deep Neural Network based on the feature visualizations of different layers' feature maps. We will also try different parameter settings for the SVMs and Deep Neural Networks. We may also try object localization to eliminate the influence of complicated backgrounds. Additionally, we would like to extract more features or try a combination of human-crafted features and learned features.

Acknowledgments

We would like to express our appreciation to Dr. Russ Greiner; thanks for his time and effort in guiding the whole process of our project. We would also like to thank Junfeng for being our co-coach and providing suggestions. Apart from that, we would like to thank Dr. Mohamed Elgendi for his suggestions on image preprocessing.

References

[1] Kaggle Dogs vs. Cats competition:
[2] MSR Asirra:
[3] Elson, J., Douceur, J., Howell, J., & Saul, J. (2007). Asirra: A CAPTCHA that exploits interest-aligned manual image categorization. In Proceedings of ACM CCS 2007. ACM.
[4] Bengio, Y. (2013). Deep learning of representations: Looking forward. arXiv preprint.
[5] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11).
[6] Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25.
[7] Huang, F. J., & LeCun, Y. (2006, June). Large-scale learning with SVM and convolutional nets for generic object categorization. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (Vol. 1). IEEE.
[8] Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint.
[9] Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional neural networks. arXiv preprint.
[10] irkhan/conn2.html
[11] Zeiler, M. D., Taylor, G. W., & Fergus, R. (2011, November). Adaptive deconvolutional networks for mid and high level feature learning. In Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE.
[12] Vedaldi, A., & Fulkerson, B. (2010, October). VLFeat: An open and portable library of computer vision algorithms. In Proceedings of the International Conference on Multimedia. ACM.
[13] Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Computer Vision, The Proceedings of the Seventh IEEE International Conference on (Vol. 2). IEEE.
[14] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2).
[15] Dalal, N., & Triggs, B. (2005, June). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, CVPR 2005 IEEE Computer Society Conference on (Vol. 1). IEEE.
[16] Bag of Words: fergus/teaching/vision 2012/9 BoW.pdf
[17] Golle, P. (2008, October). Machine learning attacks against the Asirra CAPTCHA. In Proceedings of the 15th ACM Conference on Computer and Communications Security. ACM.
[18] Parkhi, O. M., Vedaldi, A., Zisserman, A., & Jawahar, C. V. (2012, June). Cats and dogs. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE.
[19] Dense SIFT:
Table 2: Team Members' Jobs

Job                                               Members
Conceptualize the problem                         Kai Zhou
Implement the algorithms (SIFT + SVM)             Kai Zhou & Yan Liu & Bang Liu
Implement the algorithms (Deep Neural Networks)   Bang Liu & Yan Liu
Run experiments                                   Bang Liu & Yan Liu & Kai Zhou
Prepare for presentation                          Bang Liu & Yan Liu & Kai Zhou
