Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Jonathan Tompson, Arjun Jain, Yann LeCun, Christoph Bregler
New York University
{tompson, ajain, yann, bregler}@cs.nyu.edu

Abstract

This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.

1 Introduction

Despite a long history of prior work, human body pose estimation, or specifically the localization of human joints in monocular RGB images, remains a very challenging task in computer vision. Complex joint inter-dependencies, partial or full joint occlusions, variations in body shape, clothing or lighting, and unrestricted viewing angles result in a very high dimensional input space, making naive search methods intractable. Recent approaches to this problem fall into two broad categories: 1) more traditional deformable part models [27] and 2) deep-learning based discriminative models [15, 30].

Bottom-up part-based models are a common choice for this problem since the human body naturally segments into articulated parts. Traditionally these approaches have relied on the aggregation of hand-crafted low-level features such as SIFT [18] or HoG [7], which are then input to a standard classifier or a higher-level generative model. Care is taken to ensure that these engineered features are sensitive to the part that they are trying to detect and are invariant to numerous deformations in the input space (such as variations in lighting). On the other hand, discriminative deep-learning approaches learn an empirical set of low- and high-level features which are typically more tolerant to variations in the training set and have recently outperformed part-based models [27]. However, incorporating priors about the structure of the human body (such as our prior knowledge about joint inter-connectivity) into such networks is difficult, since the low-level mechanics of these networks are often hard to interpret.

In this work we attempt to combine a Convolutional Network (ConvNet) Part-Detector, which alone outperforms all other existing methods, with a part-based Spatial-Model in a unified learning framework. Our translation-invariant ConvNet architecture utilizes a multi-resolution feature representation with overlapping receptive fields. Additionally, our Spatial-Model is able to approximate MRF loopy belief propagation, which is subsequently back-propagated through, and learned using, the same learning framework as the Part-Detector. We show that the combination and joint training of these two models improves performance, and allows us to significantly outperform existing state-of-the-art models on the task of human body pose recognition.
2 Related Work

For unconstrained image domains, many architectures have been proposed, including shape-context edge-based histograms from the human body [20] or just silhouette features [13]. Many techniques have been proposed that extract, learn, or reason over entire body features. Some use a combination of local detectors and structural reasoning (see [25] for coarse tracking and [5] for person-dependent tracking). In a similar spirit, more general techniques using Pictorial Structures, such as the work by Felzenszwalb et al. [10], made this approach tractable with so-called Deformable Part Models (DPM). Subsequently a large number of related models were developed [1, 9, 31, 8]. Algorithms which model more complex joint relationships, such as Yang and Ramanan [31], use a flexible mixture of templates modeled by linear SVMs. Johnson and Everingham [16] employ a cascade of body part detectors to obtain more discriminative templates. Most recent approaches aim to model higher-order part relationships. Pishchulin et al. [23, 24] propose a model that augments the DPM model with Poselet [3] priors. Sapp and Taskar [27] propose a multi-modal model which includes both holistic and local cues for mode selection and pose estimation. Following the Poselets approach, the Armlets approach by Gkioxari et al. [12] employs a semi-global classifier for part configuration and shows good performance on real-world data; however, it is tested only on arms. Furthermore, all these approaches suffer from the fact that they use hand-crafted features such as HoG features, edges, contours, and color histograms.

The best performing algorithms today for many vision tasks, and human pose estimation in particular [30, 15, 29], are based on deep convolutional networks. Toshev et al. [30] show state-of-the-art performance on the FLIC [27] and LSP [17] datasets. However, their method suffers from inaccuracy in the high-precision region, which we attribute to inefficient direct regression of pose vectors from images, which is a highly non-linear mapping that is difficult to learn.

Joint training of neural networks and graphical models has been previously reported by Ning et al. [22] for image segmentation, and by various groups in speech and language modeling [4, 21]. To our knowledge no such model has been successfully used for the problem of detecting and localizing body part positions of humans in images. Recently, Ross et al. [26] used a message-passing inspired procedure for structured prediction on computer vision tasks, such as 3D point cloud classification and 3D surface estimation from single images. In contrast to this work, we formulate our message-passing inspired network in a way that is more amenable to back-propagation and so can be implemented in existing neural networks. Heitz et al. [14] train a cascade of off-the-shelf classifiers to simultaneously perform object detection, region labeling, and geometric reasoning. However, because of the forward nature of the cascade, a later classifier is unable to encourage earlier ones to focus their efforts on fixing certain error modes, or to allow the earlier classifiers to ignore mistakes that can be undone by classifiers further along the cascade. Bergtholdt et al. [2] propose an approach for object class detection using a parts-based model in which they are able to create a fully connected graph on parts and perform MAP-inference using A* search, but they rely on SIFT and color features to create the unary and pairwise potentials.
3 Model

3.1 Convolutional Network Part-Detector

Figure 1: Multi-Resolution Sliding-Window With Overlapping Receptive Fields
The first stage of our detection pipeline is a deep ConvNet architecture for body part localization. The input is an RGB image containing one or more people, and the output is a heat-map which produces a per-pixel likelihood for key joint locations on the human skeleton.

A sliding-window ConvNet architecture is shown in Fig 1. The network is slid over the input image to produce a dense heat-map output for each body joint. Our model incorporates a multi-resolution input with overlapping receptive fields. The upper convolution bank in Fig 1 sees a standard 64x64 resolution input window, while the lower bank sees a larger 128x128 input context down-sampled to 64x64. The input images are then Local Contrast Normalized (LCN [6]) (after down-sampling with anti-aliasing in the lower-resolution bank) to produce an approximate Laplacian pyramid. The advantage of using overlapping contexts is that it allows the network to see a larger portion of the input image with only a moderate increase in the number of weights. The role of the Laplacian pyramid is to provide each bank with non-overlapping spectral content, which minimizes network redundancy.

Figure 2: Efficient Sliding-Window Model with Single Receptive Field

An advantage of the sliding-window model (Fig 1) is that the detector is translation invariant. However, a major drawback is that evaluation is expensive due to redundant convolutions. Recent work [11, 28] has addressed this problem by performing the convolution stages on the full input image to efficiently create dense feature maps. These dense feature maps are then processed through convolution stages to replicate the fully-connected network at each pixel. An equivalent but efficient version of the sliding-window model for a single resolution bank is shown in Fig 2. Note that due to pooling in the convolution stages, the output heat-map will be a lower resolution than the input image.

For our Part-Detector, we combine an efficient sliding-window-based architecture with multi-resolution and overlapping receptive fields; the resulting model is shown in Fig 3. Since the large-context (low-resolution) convolution bank requires a stride of 1/2 pixels in the lower-resolution image to produce the same dense output as the sliding-window model, the bank must process four down-sampled images, each with a 1/2 pixel offset, using shared-weight convolutions. These four outputs, along with the high-resolution convolutional features, are processed through a 9x9 convolution stage (with 512 output features) using the same weights as the first fully-connected stage (Fig 1), and then the outputs of the low-resolution bank are added and interleaved with the output of the high-resolution bank.

To improve training time we simplify the above architecture by replacing the lower-resolution stage with a single convolution bank, as shown in Fig 4, and then upscale the resulting feature map. In our practical implementation we use 3 resolution banks. Note that the simplified architecture is no longer equivalent to the original sliding-window network of Fig 1, since the lower-resolution convolution features are effectively decimated and replicated leading into the fully-connected stage; however, we have found empirically that the performance loss is minimal.
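To make the fully-connected-to-convolution replacement concrete, here is a minimal single-bank sketch in PyTorch (not the authors' Torch7 implementation; the filter counts loosely follow Fig 1, but the pooling, padding, and input sizes here are assumptions):

```python
import torch
import torch.nn as nn

class DensePartDetector(nn.Module):
    """Single-resolution bank: conv stages, then 1x1 "fully-connected" stages.

    Sketch only: filter counts loosely follow Fig 1 (128/128/128, then 512),
    but the pooling and padding choices are assumptions.
    """
    def __init__(self, n_joints: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                   # pooling lowers heat-map resolution
            nn.Conv2d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        # Equivalent of the fully-connected layers, applied at every pixel.
        self.classifier = nn.Sequential(
            nn.Conv2d(128, 512, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(512, 256, kernel_size=1), nn.ReLU(),
            nn.Conv2d(256, n_joints, kernel_size=1),   # one heat-map per joint
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A full image yields a 4x down-sampled heat-map per joint in one pass:
heatmaps = DensePartDetector()(torch.randn(1, 3, 240, 320))
print(heatmaps.shape)  # torch.Size([1, 4, 60, 80])
```

Because every stage is a convolution or pooling, the same weights evaluate the detector densely at every window position in a single forward pass, rather than re-convolving overlapping windows.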
Supervised training of the network is performed using batched Stochastic Gradient Descent (SGD) with Nesterov momentum. We use a Mean Squared Error (MSE) criterion to minimize the distance between the predicted output and a target heat-map. The target is a 2D Gaussian with a small variance and mean centered at the ground-truth joint locations. At training time we also perform random perturbations of the input images (randomly flipping and scaling the images) to increase generalization performance.
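As a sketch of this training setup (illustrative; the Gaussian standard deviation, learning rate, and heat-map size are assumptions, and DensePartDetector is the sketch above, not the paper's multi-bank model):

```python
import torch
import torch.nn.functional as F

def gaussian_target(height, width, joints_uv, sigma=1.5):
    """Render one 2D Gaussian per joint, centered on its (u, v) location.

    joints_uv: (n_joints, 2) ground-truth pixel coordinates in heat-map
    space. sigma is an assumed small variance, not the paper's value.
    """
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    maps = []
    for u, v in joints_uv:
        d2 = (xs - u) ** 2 + (ys - v) ** 2
        maps.append(torch.exp(-d2 / (2 * sigma ** 2)))
    return torch.stack(maps)  # (n_joints, height, width)

# One SGD step with Nesterov momentum against the MSE criterion:
model = DensePartDetector()
opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, nesterov=True)
image = torch.randn(1, 3, 240, 320)
target = gaussian_target(60, 80, torch.tensor([[40.0, 30.0]] * 4)).unsqueeze(0)
loss = F.mse_loss(model(image), target)
opt.zero_grad()
loss.backward()
opt.step()
```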
Figure 3: Efficient Sliding-Window Model with Overlapping Receptive Fields

Figure 4: Approximation of Fig 3

3.2 Higher-Level Spatial-Model

On our validation set, the Part-Detector (Section 3.1) predicts heat-maps that contain many false positives and poses that are anatomically incorrect, for instance when a peak for a face detection is unusually far from the peak of the corresponding shoulder detection. Therefore, in spite of the improved Part-Detector context, the feed-forward network still has difficulty learning an implicit model of the constraints of the body parts for the full range of body poses. We use a higher-level Spatial-Model to constrain joint inter-connectivity and enforce global pose consistency. The expectation of this stage is not to increase the performance of detections that are already close to the ground-truth pose, but to remove false-positive outliers that are anatomically incorrect.

Similar to Jain et al. [15], we formulate the Spatial-Model as an MRF-like model over the distribution of spatial locations for each body part. However, the biggest drawback of their model is that the body part priors and the graph structure are explicitly hand-crafted; we instead learn the prior model and, implicitly, the structure of the spatial model. Unlike [15], we start by connecting every body part to itself and to every other body part in a pair-wise fashion in the spatial model to create a fully connected graph. The Part-Detector (Section 3.1) provides the unary potentials for each body part location. The pair-wise potentials in the graph are computed using convolutional priors, which model the conditional distribution of the location of one body part given another. For instance, given that body part B is located at the center pixel, the convolutional prior $P_{A|B}(i, j)$ is the likelihood of body part A occurring at pixel location $(i, j)$. For a body part A, we calculate the final marginal likelihood $\bar{p}_A$ as:

$$\bar{p}_A = \frac{1}{Z} \prod_{v \in V} \left( p_{A|v} * p_v + b_{v \to A} \right) \qquad (1)$$

where $v$ is the joint location, $p_{A|v}$ is the conditional prior described above, $b_{v \to A}$ is a bias term used to describe the background probability for the message from joint $v$ to $A$, and $Z$ is the partition function.
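A rough sketch of a single factor of Eq 1 (illustrative PyTorch; the prior kernel here is a hand-set stand-in for a learned distribution, and the kernel size is an assumption):

```python
import torch
import torch.nn.functional as F

def pairwise_message(p_v, prior, bias):
    """One factor of Eq 1: the unary heat-map of joint v convolved with
    the conditional prior p_{A|v}, plus the background bias b_{v->A}.

    p_v:   (1, 1, H, W) unary heat-map for joint v
    prior: (1, 1, k, k) displacement prior, k odd
    bias:  scalar background probability
    """
    k = prior.shape[-1]
    return F.conv2d(p_v, prior, padding=k // 2) + bias

# Toy example: v is detected at the center of a 5x5 map, and the prior
# places all of A's probability mass at a one-pixel horizontal offset.
p_v = torch.zeros(1, 1, 5, 5)
p_v[0, 0, 2, 2] = 1.0
prior = torch.zeros(1, 1, 3, 3)
prior[0, 0, 1, 2] = 1.0
msg = pairwise_message(p_v, prior, bias=1e-3)
r, c = divmod(msg[0, 0].argmax().item(), 5)
print(r, c)  # 2 1 -- the peak for A is displaced from v's detected location
```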
Evaluation of Eq 1 is analogous to a single round of sum-product belief propagation. Convergence to a global optimum is not guaranteed given that our spatial model is not tree structured. However, as can be seen in our results (Fig 8b), the inferred solution is sufficiently accurate for all poses in our datasets. The learned pair-wise distributions are purely uniform when a pair-wise edge should be removed from the graph structure. Fig 5 shows a practical example of how the Spatial-Model is able to remove an anatomically incorrect strong outlier from the face heat-map by incorporating the presence of a strong shoulder detection. For simplicity only the shoulder and face joints are shown; however, this example can be extended to incorporate all body part pairs. If the shoulder heat-map shown in Fig 5 had an incorrect false negative (i.e. no detection at the correct shoulder location), the addition of the background bias $b_{v \to A}$ would prevent the output heat-map from having no maxima in the detected face region.

Figure 5: Didactic Example of Message Passing Between the Face and Shoulder Joints

Fig 5 contains the conditional distributions for the face and shoulder parts learned on the FLIC [27] dataset. For any part A the distribution $P_{A|A}$ is the identity map, and so the message passed from any joint to itself is its unary distribution. Since the FLIC dataset is biased towards front-facing poses where the right shoulder is directly to the lower right of the face, the model learns the correct spatial distribution between these body parts and has high probability in the spatial locations describing the likely displacement between the shoulder and face. For datasets that cover a larger range of the possible poses (for instance the LSP [17] dataset), we would expect these distributions to be less tightly constrained, and therefore this simple Spatial-Model will be less effective.

For our practical implementation we treat the distributions above as energies to avoid the evaluation of $Z$. There are three reasons why we do not include the partition function. Firstly, we are only concerned with the maximum output value of our network, and so we only need the output energy to be proportional to the normalized distribution. Secondly, since both the Part-Detector and Spatial-Model parameters contain only shared-weight (convolutional) parameters that are equal across pixel positions, evaluation of the partition function during back-propagation would only add a scalar constant to the gradient weight, which would be equivalent to applying a per-batch learning-rate modifier. Lastly, since the number of parts is not known a priori (there can be unlabeled people in the image), and since the distributions $p_v$ describe the part location of a single person, we cannot normalize the Part-Model output. Our final model is a modification to Eq 1:

$$\bar{e}_A = \exp\left( \sum_{v \in V} \left[ \log\left( \mathrm{SoftPlus}\left(e_{A|v}\right) * \mathrm{ReLU}\left(e_v\right) + \mathrm{SoftPlus}\left(b_{v \to A}\right) \right) \right] \right) \qquad (2)$$

$$\text{where: } \mathrm{SoftPlus}(x) = \tfrac{1}{\beta} \log\left(1 + \exp(\beta x)\right),\ \tfrac{1}{2} \le \beta \le 2$$
$$\mathrm{ReLU}(x) = \max(x, \epsilon),\ 0 < \epsilon \le 0.01$$

Note that the above formulation is no longer exactly equivalent to an MRF, but it still satisfactorily encodes the spatial constraints of Eq 1. The network-based implementation of Eq 2 is shown in Fig 6.
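A minimal sketch of one round of Eq 2 as a network stage (illustrative PyTorch; β, ε, and the 9x9 kernel size are assumptions within the stated ranges, whereas the implementation described below uses much larger kernels):

```python
import torch
import torch.nn.functional as F

def spatial_model_step(unaries, pair_kernels, pair_biases,
                       beta=1.0, eps=1e-3):
    """One round of the message-passing network of Eq 2.

    unaries:      (J, H, W) part-detector energies e_v, one map per joint
    pair_kernels: (J, J, k, k) raw weights for the priors e_{A|v}, k odd
    pair_biases:  (J, J) raw background biases b_{v->A}
    Returns (J, H, W) refined energies e_A.
    """
    J, H, W = unaries.shape
    k = pair_kernels.shape[-1]
    e_v = torch.clamp(unaries, min=eps)        # modified ReLU(x) = max(x, eps)
    log_sum = torch.zeros(J, H, W)
    for A in range(J):
        for v in range(J):
            prior = F.softplus(pair_kernels[A, v], beta=beta)  # SoftPlus on weights
            bias = F.softplus(pair_biases[A, v], beta=beta)    # ... and on biases
            msg = F.conv2d(e_v[v][None, None], prior[None, None],
                           padding=k // 2)[0, 0] + bias
            log_sum[A] += torch.log(msg)       # log-space sum decouples gradients
    return torch.exp(log_sum)

refined = spatial_model_step(torch.rand(4, 60, 80),
                             torch.randn(4, 4, 9, 9), torch.randn(4, 4))
```

The SoftPlus and clamp stages keep every value entering the log strictly positive, which is exactly the numerical-stability role they play in Eq 2.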
Eq 2 replaces the outer multiplication of Eq 1 with a log-space addition to improve numerical stability and to prevent coupling of the convolution output gradients (the addition in log space means that the partial derivative of the loss function with respect to each convolution output is not dependent on the output of any other stages). The inclusion of the SoftPlus and ReLU stages on the weights, biases, and input heat-map maintains a strictly greater-than-zero convolution output, which prevents numerical issues for the values leading into the Log stage. Finally, a SoftPlus stage is used to maintain continuous and non-zero weight and bias gradients during training.
With this modified formulation, Eq 2 is trained using back-propagation and SGD.

Figure 6: Single Round Message Passing Network

The convolution sizes are adjusted so that the largest joint displacement is covered within the convolution window. For our 90x60 pixel heat-map output, this results in large 128x128 convolution kernels to account for a joint displacement radius of 64 pixels (note that padding is added on the heat-map input to prevent pixel loss). For such large kernels we therefore use FFT convolutions based on the GPU implementation by Mathieu et al. [19]. The convolution weights are initialized using the empirical histogram of joint displacements created from the training examples. This initialization improves learned performance, decreases training time, and improves optimization stability. During training we randomly flip and scale the heat-map inputs to improve generalization performance.

3.3 Unified Model

Since our Spatial-Model (Section 3.2) is trained using back-propagation, we can combine our Part-Detector and Spatial-Model stages in a single Unified Model. To do so, we first train the Part-Detector separately and store the heat-map outputs. We then use these heat-maps to train the Spatial-Model. Finally, we combine the trained Part-Detector and Spatial-Model and back-propagate through the entire network.

This unified fine-tuning further improves performance. We hypothesize that because the Spatial-Model is able to effectively reduce the output dimension of possible heat-map activations, the Part-Detector can use its available learning capacity to better localize the precise target activation.
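A sketch of this three-stage schedule (illustrative; SpatialModel is a hypothetical learnable wrapper around the spatial_model_step sketch above, DensePartDetector is the earlier sketch, the optimizer settings are assumptions, and the random kernel initialization stands in for the empirical displacement histograms described above):

```python
import torch
import torch.nn as nn

class SpatialModel(nn.Module):
    """Hypothetical learnable wrapper around spatial_model_step."""
    def __init__(self, n_joints=4, k=9):
        super().__init__()
        self.kernels = nn.Parameter(0.01 * torch.randn(n_joints, n_joints, k, k))
        self.biases = nn.Parameter(torch.zeros(n_joints, n_joints))

    def forward(self, heatmaps):  # (B, J, H, W) part-detector energies
        return torch.stack([spatial_model_step(h, self.kernels, self.biases)
                            for h in heatmaps])

part, spatial = DensePartDetector(), SpatialModel()

# Stage 1: train `part` alone against Gaussian targets; cache its heat-maps.
# Stage 2: train `spatial` on the cached heat-maps.
# Stage 3: fine-tune the combined network end to end.
unified = nn.Sequential(part, spatial)
optimizer = torch.optim.SGD(unified.parameters(), lr=1e-3,
                            momentum=0.9, nesterov=True)
```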
4 Results

The models from Sections 3.1 and 3.2 were implemented within the Torch7 [6] framework (with custom GPU implementations for the non-standard stages above). Training the Part-Detector takes approximately 48 hours and the Spatial-Model 12 hours, and forward-propagation of a single image through both networks takes 51 ms (on a 12-CPU workstation with an NVIDIA Titan GPU).

We evaluated our architecture on the FLIC [27] and extended-LSP [17] datasets. These datasets consist of still RGB images with 2D ground-truth joint information generated using Amazon Mechanical Turk. The FLIC dataset is comprised of 3987 training images from Hollywood movies with actors in predominantly front-facing standing poses (with 1016 images used for testing), while the extended-LSP dataset contains a wider variety of poses of athletes playing sport (10442 training and 1000 test images). The FLIC dataset contains many frames with more than a single person, while the joint locations of only one person in the scene are labeled; therefore an approximate torso bounding box is provided for the single labeled person in the scene. We incorporate this data by including an extra torso-joint heat-map in the input of the Spatial-Model so that it can learn to select the correct feature activations in a cluttered scene.

The FLIC-full dataset contains 20928 training images; however, many of these training-set images contain samples from the 1016 test-set scenes and so would allow unfair overtraining on the FLIC test set. Therefore, we propose a new dataset, called FLIC-plus ( tompson/flic plus.htm), which is a 17380 image subset of the FLIC-full dataset. To create this dataset, we produced unique scene labels for both the FLIC test set and the FLIC-full training set using Amazon Mechanical Turk. We then removed all images from the FLIC-full training set that shared a scene with the test set. Since 253 of the sample images from the original 3987 FLIC training set came from the same scene as a test-set sample (and were therefore removed by the above procedure), we added these images back so that the FLIC-plus training set is a superset of the original FLIC training set. Using this procedure we can guarantee that the additional samples in FLIC-plus are sufficiently independent of the FLIC test-set samples.

For evaluation of test-set performance we use the measure suggested by Sapp et al. [27]: for a given normalized pixel radius (normalized by the torso height of each sample), we count the number of images in the test set for which the distance of the predicted UV joint location to the ground-truth location falls within the given radius. Figs 7a and 7b show our model's performance on the FLIC test set for the elbow and wrist joints respectively, trained using both the FLIC and FLIC-plus training sets. Performance on the LSP dataset is shown in Figs 7c and 8a. For LSP evaluation we use person-centric (or non-observer-centric) coordinates for fair comparison with prior work [30, 8]. Our model outperforms existing state-of-the-art techniques on both of these challenging datasets by a considerable margin.

Figure 7: Model Performance: (a) FLIC: Elbow, (b) FLIC: Wrist, (c) LSP: Wrist and Elbow

Fig 8b illustrates the performance improvement from our simple Spatial-Model. As expected, the Spatial-Model has little impact on accuracy for low radii thresholds; however, for large radii it increases performance by 8 to 12%. Unified training of both models (after independent pre-training) adds an additional 4-5% detection rate for large radii thresholds.

Figure 8: (a) LSP: Ankle and Knee, (b) FLIC: Wrist, With and Without Spatial-Model, (c) Part-Detector Performance vs. Number of Resolution Banks (FLIC subset)
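A sketch of the detection-rate measure described above (illustrative; the array shapes and radius sweep are assumptions):

```python
import torch

def detection_rate(pred_uv, gt_uv, torso_heights, radius):
    """Fraction of test samples whose predicted joint lands within a
    normalized radius of the ground truth (the measure of Sapp et al. [27]).

    pred_uv, gt_uv: (N, 2) predicted and ground-truth pixel locations
    torso_heights:  (N,) per-sample normalization (torso height)
    radius:         normalized pixel radius
    """
    dist = torch.linalg.norm(pred_uv - gt_uv, dim=1) / torso_heights
    return (dist <= radius).float().mean().item()

# Sweeping the radius traces a curve like those in Fig 7:
pred = torch.rand(100, 2) * 50
gt = torch.rand(100, 2) * 50
torso = torch.full((100,), 100.0)
curve = [detection_rate(pred, gt, torso, r / 100) for r in range(0, 21)]
```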
The impact of the number of resolution banks is shown in Fig 8c. As expected, we see a big improvement when multiple resolution banks are added. Note that the size of the receptive fields, as well as the number and size of the pooling stages in the network, also has a large impact on performance. We tune the network hyper-parameters using coarse meta-optimization to obtain maximal validation-set performance within our computational budget (less than 100 ms per forward-propagation).

Fig 9 shows the predicted joint locations for a variety of inputs in the FLIC and LSP test sets. Our network produces convincing results on the FLIC dataset (with low joint position error); however, because our simple Spatial-Model is less effective for a number of the highly articulated poses in the LSP dataset, our detector produces incorrect joint predictions for some images. We believe that increasing the size of the training set will improve performance for these difficult cases.

Figure 9: Predicted Joint Positions, Top Row: FLIC Test-Set, Bottom Row: LSP Test-Set

5 Conclusion

We have shown that the unification of a novel ConvNet Part-Detector and an MRF-inspired Spatial-Model into a single learning framework significantly outperforms existing architectures on the task of human body pose recognition. Training and inference of our architecture uses commodity-level hardware and runs at close to real-time frame rates, making this technique tractable for a wide variety of application areas. For future work we expect to further improve upon these results by increasing the complexity and expressiveness of our simple spatial model (especially for unconstrained datasets like LSP).

6 Acknowledgments

The authors would like to thank Mykhaylo Andriluka for his support. This research was funded in part by the Office of Naval Research ONR Award N

References

[1] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009.
[2] M. Bergtholdt, J. Kappes, S. Schmidt, and C. Schnörr. A study of parts-based object class detection using complete graphs. IJCV, 2010.
[3] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3d human pose annotations. In ICCV, 2009.
[4] H. Bourlard, Y. Konig, and N. Morgan. REMAP: Recursive estimation and maximization of a posteriori probabilities in connectionist speech recognition. In EUROSPEECH, 1995.
[5] P. Buehler, A. Zisserman, and M. Everingham. Learning sign language by watching TV (using weakly aligned subtitles). In CVPR, 2009.
[6] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[8] M. Dantone, J. Gall, C. Leistner, and L. Van Gool. Human pose estimation using body parts dependent joint regressors. In CVPR, 2013.
[9] M. Eichner and V. Ferrari. Better appearance models for pictorial structures. In BMVC, 2009.
[10] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.
[11] A. Giusti, D. Ciresan, J. Masci, L. Gambardella, and J. Schmidhuber. Fast image scanning with deep max-pooling convolutional neural networks. In CoRR, 2013.
[12] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation using discriminative armlet classifiers. In CVPR, 2013.
[13] K. Grauman, G. Shakhnarovich, and T. Darrell. Inferring 3d structure with a statistical image-based shape model. In ICCV, 2003.
[14] G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded classification models: Combining models for holistic scene understanding. In NIPS, 2008.
[15] A. Jain, J. Tompson, M. Andriluka, G. Taylor, and C. Bregler. Learning human pose estimation features with convolutional networks. In ICLR, 2014.
[16] S. Johnson and M. Everingham. Learning effective human pose estimation from inaccurate annotation. In CVPR, 2011.
[17] S. Johnson and M. Everingham. Clustered pose and nonlinear appearance models for human pose estimation. In BMVC, 2010.
[18] D. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.
[19] M. Mathieu, M. Henaff, and Y. LeCun. Fast training of convolutional networks through FFTs. In CoRR, 2013.
[20] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV, 2002.
[21] F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
[22] F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, and P. Barbano. Toward automatic phenotyping of developing embryos from videos. IEEE TIP, 2005.
[23] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Poselet conditioned pictorial structures. In CVPR, 2013.
[24] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Strong appearance and expressive spatial models for human pose estimation. In ICCV, 2013.
[25] D. Ramanan, D. Forsyth, and A. Zisserman. Strike a pose: Tracking people by finding stylized poses. In CVPR, 2005.
[26] S. Ross, D. Munoz, M. Hebert, and J. A. Bagnell. Learning message-passing inference machines for structured prediction. In CVPR, 2011.
[27] B. Sapp and B. Taskar. MODEC: Multimodal decomposable models for human pose estimation. In CVPR, 2013.
[28] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. In ICLR, 2014.
[29] J. Tompson, M. Stein, Y. LeCun, and K. Perlin. Real-time continuous pose recovery of human hands using convolutional networks. In TOG, 2014.
[30] A. Toshev and C. Szegedy. DeepPose: Human pose estimation via deep neural networks. In CVPR, 2014.
[31] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, 2011.