A Convolutional Neural Network Cascade for Face Detection
Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua
Stevens Institute of Technology, Hoboken, NJ; Adobe Research, San Jose, CA

Abstract

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high performance. The proposed CNN cascade operates at multiple resolutions, quickly rejects the background regions in the fast low-resolution stages, and carefully evaluates a small number of challenging candidates in the last high-resolution stage. To improve localization effectiveness, and to reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade. The output of each calibration stage is used to adjust the detection window position for input to the subsequent stage. The proposed method runs at 14 FPS on a single CPU core for VGA-resolution images and 100 FPS using a GPU, and achieves state-of-the-art detection performance on two public face detection benchmarks.

1. Introduction

Face detection is a well-studied problem in computer vision. Modern face detectors can easily detect near-frontal faces. Recent research in this area focuses more on the uncontrolled face detection problem, where a number of factors such as pose changes, exaggerated expressions and extreme illuminations can lead to large visual variations in face appearance, and can severely degrade the robustness of the face detector.
The difficulties in face detection mainly come from two aspects: 1) the large visual variations of human faces in cluttered backgrounds; 2) the large search space of possible face positions and face sizes. The former requires the face detector to accurately address a binary classification problem, while the latter further imposes a time-efficiency requirement. Ever since the seminal work of Viola et al. [27], the boosted cascade with simple features has been the most popular and effective design for practical face detection. The simple nature of the features enables fast evaluation and quick early rejection of false positive detections. Meanwhile, the boosted cascade constructs an ensemble of the simple features to achieve accurate face vs. non-face classification. The original Viola-Jones face detector uses Haar features, which are fast to evaluate yet discriminative enough for frontal faces. However, due to their simple nature, Haar features are relatively weak in the uncontrolled environment, where faces appear in varied poses and expressions under unexpected lighting. A number of improvements to the Viola-Jones face detector have been proposed in the past decade [30]. Most of them follow the boosted cascade framework with more advanced features. The advanced features help construct a more accurate binary classifier at the expense of extra computation. However, the number of cascade stages required to achieve similar detection accuracy can be reduced, so the overall computation may remain the same or even be reduced because of the fewer cascade stages. This observation suggests that it is possible to apply more advanced features in a practical face detection solution, as long as the false positive detections can be rejected quickly in the early stages. In this work, we propose to apply the Convolutional Neural Network (CNN) [13] to face detection.
Compared with previous hand-crafted features, a CNN can automatically learn features that capture complex visual variations by leveraging a large amount of training data, and its testing phase can be easily parallelized on GPU cores for acceleration. Considering the relatively high computational expense of CNNs, exhaustively scanning the full image at multiple scales with a deep CNN is not a practical solution. To achieve fast face detection, we present a CNN cascade,
which rejects false detections quickly in the early, low-resolution stages and carefully verifies the detections in the later, high-resolution stages. We show that this intuitive solution can outperform the state-of-the-art methods in face detection. For typical VGA-size images, our detector runs at 14 FPS on a single CPU core and 100 FPS on a GPU card (footnote 1). In this work, our contributions are four-fold:

- we propose a CNN cascade for fast face detection;
- we introduce a CNN-based face bounding box calibration step in the cascade to help accelerate the cascade and obtain high-quality localization;
- we present a multi-resolution CNN architecture that can be more discriminative than the single-resolution CNN with only a fractional overhead;
- we further improve the state-of-the-art performance on the Face Detection Data Set and Benchmark (FDDB) [7].

2. Related Work

2.1. Neural network based face detection

Early in 1994, Vaillant et al. [26] applied neural networks to face detection. In their work, they proposed to train a convolutional neural network to detect the presence or absence of a face in an image window, and to scan the whole image with the network at all possible locations. In 1996, Rowley et al. [22] presented a retinally connected neural network for upright frontal face detection. The method was later extended to rotation-invariant face detection in 1998 [23], with a router network to estimate the orientation and apply the proper detector network. In 2002, Garcia et al. [5] developed a neural network to detect semi-frontal human faces in complex images; in 2005, Osadchy et al. [20] trained a convolutional network for simultaneous face detection and pose estimation. It is unknown how these detectors would perform on today's benchmarks with faces in uncontrolled environments. Nevertheless, given the recent breakthrough results of CNNs [13] in image classification [24] and object detection [3], it is worth revisiting neural network based face detection.
One of the recent CNN-based detection methods is the R-CNN by Girshick et al. [6], which has achieved the state-of-the-art result on PASCAL VOC. R-CNN follows the recognition-using-regions paradigm. It generates category-independent region proposals and extracts CNN features from the regions. Then it applies class-specific classifiers to recognize the object category of the proposals. Compared with the general object detection task, uncontrolled face detection presents different challenges that make it impractical to directly apply the R-CNN method to face detection. For example, the general object proposal methods may not be effective for faces due to small-sized faces and complex appearance variations.

Footnote 1: Intel Xeon E GHz CPU and GeForce GTX TITAN BLACK GPU, with Caffe [9].

2.2. Face detection in uncontrolled environments

Previous uncontrolled face detection systems are mostly based on hand-crafted features. Since the seminal Viola-Jones face detector [27], a number of variants have been proposed for real-time face detection [10, 17, 29, 30]. Recently, within the boosted-cascade-with-simple-features framework, Chen et al. [2] propose to use shape-indexed features to jointly conduct face detection and face alignment. Similar to this idea, we have alternating stages of calibration and detection in our framework. Considering the success of CNNs in a number of visual tasks including face alignment [31], our framework is more general in that we can adopt a CNN-based face alignment method to achieve joint face alignment and detection, and we use a CNN to learn more robust features for faces. Zhang et al. [32] and Park et al. [21] adopt the multi-resolution idea in general object detection. While sharing a similar technique, our method utilizes CNNs as the classifiers and combines the multi-resolution and calibration ideas for face detection. Additionally, the part-based model has motivated a number of face detection methods. Zhu et al.
[33] propose the tree-structured model for face detection, which can simultaneously achieve pose estimation and facial landmark localization. Yan et al. [28] present a structural model for face detection. Mathias et al. [19] show that a carefully trained deformable part-based model [4] achieves state-of-the-art detection accuracy. Different from these model-based methods, Shen et al. [25] propose to detect faces by image retrieval. Li et al. [15] further improve it to a boosted exemplar-based face detector with state-of-the-art performance. Compared with these face detection systems, our work learns the classifier directly from the image instead of relying on hand-crafted features. Hence we benefit from the powerful features learned by the CNN to better differentiate faces from highly cluttered backgrounds. Meanwhile, our detector is many times faster than the model-based and exemplar-based detection systems, and has a frame rate comparable to the classical boosted cascade with simple features. Sharing the advantages of CNNs, our detector is easily parallelized on a GPU for much faster detection.

3. Convolutional Neural Network Cascade

We present a specific design of our detector here for a clear explanation of the proposed method. In practice, the CNN cascade can have varied settings for the accuracy-computation trade-off.
Figure 1: Test pipeline of our detector: from left to right, we show how the detection windows (green squares) are reduced and calibrated from stage to stage (test image, after 12-net, after 12-calibration-net and NMS, after 24-net, after 24-calibration-net and NMS, after 48-net and global NMS, output detections after 48-calibration-net). The detector runs on a single scale for better viewing.

3.1. Overall framework

The overall test pipeline of our face detector is shown in Figure 1. We briefly explain the work-flow here and will introduce all the CNNs in detail later. Given a test image, the 12-net scans the whole image densely across different scales to quickly reject more than 90% of the detection windows. The remaining detection windows are processed one by one by the 12-calibration-net, as 12x12 images, to adjust their size and location to approach a potential face nearby. Non-maximum suppression (NMS) is applied to eliminate highly overlapped detection windows. The remaining detection windows are cropped out and resized into 24x24 images as input for the 24-net, which further rejects nearly 90% of the remaining detection windows. Similar to the previous process, the remaining detection windows are adjusted by the 24-calibration-net, and we apply NMS to further reduce the number of detection windows. The last 48-net accepts the passed detection windows as 48x48 images to evaluate them. NMS eliminates overlapped detection windows with an Intersection-over-Union (IoU) ratio exceeding a pre-set threshold. The 48-calibration-net is then applied to calibrate the residual detection bounding boxes as the outputs.

3.2. CNN structure

There are 6 CNNs in the cascade, including 3 CNNs for face vs. non-face binary classification and 3 CNNs for bounding box calibration, which is formulated as multi-class classification of a discretized displacement pattern.
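Before detailing the individual networks, the stage-by-stage work-flow above can be sketched as a simple loop; this is an illustration only, with hypothetical stand-ins for the trained nets rather than the authors' code:

```python
# Hypothetical sketch of the cascade control flow in Section 3.1; the
# callables are stand-ins for the trained nets, not the authors' code.
def run_cascade(windows, stages):
    """windows: (x, y, w, h, score) candidates from the dense 12x12 scan.
    stages: one (classify, threshold, calibrate, nms) tuple per resolution
    (12, 24, 48). In the paper the last stage applies global NMS before
    its calibration net; that ordering detail is simplified here."""
    for classify, threshold, calibrate, nms in stages:
        # Re-score every surviving window at this resolution; reject non-faces.
        windows = [w for w in windows if classify(w) > threshold]
        # Nudge each window toward the nearby face (calibration net).
        windows = [calibrate(w) for w in windows]
        # Remove highly overlapped windows.
        windows = nms(windows)
    return windows
```

Each stage thus shrinks the candidate set before the next, more expensive net runs, which is what makes the deep 48-net affordable at the end.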
In these CNNs, unless otherwise specified, we follow AlexNet [12] and apply the ReLU nonlinearity after the pooling and fully-connected layers.

12-net

12-net refers to the first CNN in the test pipeline. The structure of this CNN is shown in Figure 2. 12-net is a very shallow binary classification CNN used to quickly scan the test image. Densely scanning an image of size W x H with 4-pixel spacing for 12x12 detection windows is equivalent to applying the 12-net to the whole image to obtain a ((W - 12)/4 + 1) x ((H - 12)/4 + 1) map of confidence scores. Each point on the confidence map refers to a 12x12 detection window on the test image. In practice, if the acceptable minimum face size is F, the test image is first built into an image pyramid to cover faces at different scales, and each level of the image pyramid is resized by 12/F as the input image for the 12-net. On a single CPU core, it takes the 12-net less than 36 ms to densely scan an image for faces with 4-pixel spacing, which generates 2,494 detection windows. The time reduces to 10 ms on a GPU card, most of which is overhead in data preparation.

12-calibration-net

12-calibration-net refers to the CNN after the 12-net for bounding box calibration. The structure is shown in Figure 4. 12-calibration-net is a shallow CNN. N calibration patterns are pre-defined as a set of 3-dimensional scale changes and offset vectors {[s_n, x_n, y_n]}, n = 1, ..., N. Given a detection window (x, y, w, h) with top-left corner at (x, y) and size (w, h), the calibration pattern adjusts the window to be

(x - x_n * w / s_n,  y - y_n * h / s_n,  w / s_n,  h / s_n).    (1)

In this work, we have N = 45 patterns, formed by all combinations of

s_n in {0.83, 0.91, 1.0, 1.10, 1.21},
x_n in {-0.17, 0, 0.17},
y_n in {-0.17, 0, 0.17}.

Given a detection window, the region is cropped out and resized to 12x12 as the input image for the 12-calibration-net. The calibration net outputs a vector of confidence scores [c_1, c_2, ..., c_N].
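The 12-net's dense-scan arithmetic earlier in this subsection is easy to check numerically; a minimal sketch, with the window size and 4-pixel stride taken from the text and the helper names being ours:

```python
# Sketch of the 12-net sliding-window arithmetic; helper names are ours.
def conf_map_shape(W, H, win=12, stride=4):
    # Running the 12-net fully-convolutionally over a W x H image yields a
    # confidence map of this size.
    return ((W - win) // stride + 1, (H - win) // stride + 1)

def window_for_point(i, j, win=12, stride=4):
    # Each confidence-map point (i, j) corresponds to a win x win detection
    # window with this top-left corner in the test image.
    return (i * stride, j * stride, win, win)

def pyramid_base_scale(F, win=12):
    # To cover faces of minimum size F, a pyramid level is resized by 12/F
    # so that such faces shrink to the 12x12 network input.
    return win / float(F)
```

For example, a 640x480 image gives a 158 x 118 score map, and with a minimum face size of 48 pixels the first pyramid level is resized by a factor of 0.25.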
Since the calibration patterns are not orthogonal to each other, we take the average of the patterns with high confidence scores as the adjustment
[s, x, y], i.e.,

[s, x, y] = (1/Z) * sum_{n=1}^{N} [s_n, x_n, y_n] * I(c_n > t),    (2)

Z = sum_{n=1}^{N} I(c_n > t),    (3)

I(c_n > t) = 1 if c_n > t, 0 otherwise.    (4)

Here t is a threshold used to filter out low-confidence patterns. In our experiment, we observe that the 12-net and 12-calibration-net reject 92.7% of the detection windows while keeping 94.8% recall on FDDB (see Table 1).

Figure 2: CNN structures of the 12-net, 24-net and 48-net.

24-net

24-net is an intermediate binary classification CNN used to further reduce the number of detection windows. The detection windows remaining after the 12-calibration-net are cropped out, resized into 24x24 images, and evaluated by the 24-net. The CNN structure is shown in Figure 2. A similarly shallow structure is chosen for time efficiency. Besides, we adopt a multi-resolution structure in the 24-net. In addition to the 24x24 input, we also feed the input at 12x12 resolution to a sub-structure identical to the 12-net within the 24-net. The fully-connected layer from the 12-net sub-structure is concatenated with the 128-output fully-connected layer for classification, as shown in Figure 2. With this multi-resolution structure, the 24-net is supplemented by the information at 12x12 resolution, which helps detect small faces. The overall CNN becomes more discriminative, and the overhead from the 12-net sub-structure is only a fraction of the overall computation. In Figure 3, we compare the detection performance with and without the multi-resolution design in the 24-net.
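A minimal sketch of the calibration step, assuming the pattern sets and Equations (1)-(4) above; returning the box unchanged when no pattern passes the threshold is our fallback, not specified in the text:

```python
# Sketch of the calibration step (Eqs. 1-4); names are ours.
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)
# The N = 45 patterns: all combinations of [s_n, x_n, y_n].
PATTERNS = [(s, dx, dy) for s in SCALES for dx in OFFSETS for dy in OFFSETS]

def apply_pattern(box, pattern):
    # Eq. (1): adjust window (x, y, w, h) by pattern [s_n, x_n, y_n].
    x, y, w, h = box
    s, dx, dy = pattern
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)

def calibrate(box, confidences, t):
    # Eqs. (2)-(4): average the patterns whose confidence exceeds t,
    # then apply the averaged adjustment once.
    picked = [p for p, c in zip(PATTERNS, confidences) if c > t]
    if not picked:
        return box  # fallback: no pattern is confident (our assumption)
    n = float(len(picked))  # Z in Eq. (3)
    avg = tuple(sum(vals) / n for vals in zip(*picked))
    return apply_pattern(box, avg)
```

Averaging in pattern space before applying Equation (1) once matches the text's description of treating the non-orthogonal patterns jointly rather than picking a single winner.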
Figure 3: Detection performance of the 24-net with and without the multi-resolution structure on the Annotated Faces in the Wild dataset (detection rate vs. number of false detections).

We observe that the one with the multi-resolution structure achieves the same recall level with fewer false detection windows, and the gap is more obvious at high recall levels.

24-calibration-net

Similar to the 12-calibration-net, the 24-calibration-net is another calibration net with the same N calibration patterns. The structure is shown in Figure 4. Except that the input size of the 24-calibration-net is 24x24, the pre-defined patterns and the calibration process are the same as in the 12-calibration-net. In our experiment, we observe that the 24-net and 24-calibration-net further reject 86.2% of the remaining detection windows while keeping 89.0% recall on FDDB (see Table 1).

48-net

48-net is the last binary classification CNN. At this stage of the cascade, it is feasible to apply a more powerful but
slower CNN. As shown in Figure 2, the 48-net is relatively more complicated. Similar to the 24-net, we adopt the multi-resolution design in the 48-net, with an additional input copy at 24x24 and a sub-structure the same as the 24-net.

48-calibration-net

48-calibration-net is the last stage in the cascade. The CNN structure is shown in Figure 4. The same N = 45 calibration patterns are pre-defined for the 48-calibration-net as for the 12-calibration-net. We use only one pooling layer in this CNN to allow more accurate calibration.

Figure 4: CNN structures of the 12-calibration-net, 24-calibration-net and 48-calibration-net.

Figure 5: Calibrated bounding boxes are better aligned to the faces: the blue rectangles are the most confident detection bounding boxes of the 12-net; the red rectangles are the bounding boxes adjusted by the 12-calibration-net.

Non-maximum suppression (NMS)

We adopt an efficient implementation of NMS in this work. We iteratively select the detection window with the highest confidence score and eliminate the detection windows whose IoU ratio with the selected window is higher than a pre-set threshold. After the 12-calibration-net and 24-calibration-net, challenging false positives may still have a higher confidence score than some true positives, so we conservatively apply NMS separately to the detection windows at each scale (of the same size) to avoid degrading the recall rate. After the 48-net, NMS is applied globally to all detection windows at different scales to make the most accurate detection window at the correct scale stand out and to avoid redundant evaluation in the 48-calibration-net.
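The greedy NMS procedure just described can be sketched as follows; this is a generic IoU-based implementation, not the authors' code:

```python
def iou(a, b):
    # Intersection-over-Union of two (x, y, w, h) boxes.
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(detections, threshold):
    # Greedy NMS as described above: keep the highest-scoring window,
    # drop remaining windows whose IoU with it exceeds the threshold, repeat.
    # detections: list of ((x, y, w, h), score) pairs.
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while detections:
        best, rest = detections[0], detections[1:]
        kept.append(best)
        detections = [d for d in rest if iou(best[0], d[0]) <= threshold]
    return kept
```

Running this per scale (conservatively) or once globally, as in the text, is just a matter of which detection list is passed in.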
In the 12-net and 24-net stages alone, the shallow CNNs may not be discriminative enough to reject all such challenging false positives.

3.3. CNN for calibration

We explain how the calibration nets help in the cascade for face detection. The motivation for applying the calibration is shown in Figure 5. The most confident detection window may not be well aligned to the face. As a result, without the calibration step, the next CNN in the cascade would have to evaluate more regions to maintain a good recall, and the overall detection runtime would increase significantly. This problem generally exists in object detection, and we explicitly address it with CNNs in this work. Instead of training a CNN for bounding box regression as in R-CNN, we train a multi-class classification CNN for calibration. We observe that a multi-class calibration CNN can be easily trained from a limited amount of training data, while a regression CNN for calibration requires more training data. We believe that the discretization decreases the difficulty of the calibration problem, so that we can achieve good calibration accuracy with simpler CNN structures. As shown in Figure 5, after calibration the detection bounding box is better aligned to the real face center. As a result, the calibration nets enable more accurate face localization using coarser scanning windows across fewer scales.

3.4. Training process

In training the CNNs in the cascade, we collect 5,800 background images to obtain negative training samples and
use the faces in the Annotated Facial Landmarks in the Wild (AFLW) [11] dataset as positive training samples. For both the binary and multi-class classification CNNs in the cascade, we use the multinomial logistic regression objective function for optimization in training.

Figure 6: Mismatched face annotations in (a) AFLW and (b) AFW.

Calibration nets

To collect training data for the calibration nets, we perturb the face annotations with the N = 45 calibration patterns defined for the 12-calibration-net. Specifically, for the n-th pattern [s_n, x_n, y_n], we apply [1/s_n, x_n, y_n] (following Equation 1) to adjust the face annotation bounding box, then crop and resize it into the proper input sizes (12x12, 24x24 and 48x48).

Detection nets

The detection nets 12-net, 24-net and 48-net are trained following the cascade structure. We resize all training faces into 12x12 and randomly sample 200,000 non-face patches from the background images to train the 12-net. We then apply a 2-stage cascade consisting of the 12-net and 12-calibration-net on a subset of the AFLW images to choose a threshold T_1 at a 99% recall rate. Then we densely scan all background images with the 2-stage cascade. All detection windows with a confidence score larger than T_1 become the negative training samples for the 24-net. The 24-net is trained with the mined negative training samples and all training faces resized into 24x24. After that, we follow the same process with the 4-stage cascade consisting of the 12-net, 12-calibration-net, 24-net and 24-calibration-net, setting the threshold T_2 to keep a 97% recall rate. Following the same procedure, we mine negative training samples for the 48-net with the 4-stage cascade on all the background images, and the 48-net is trained with the positive and negative training samples resized into 48x48. At each stage in the cascade, the CNN is trained to address a sub-problem which is easier than addressing the face vs. non-face classification globally.
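The threshold selection and hard-negative mining above can be sketched as follows; the score-list interface and helper names are ours, not the authors' training code:

```python
def threshold_for_recall(face_scores, target_recall):
    # Pick the stage threshold (T_1 or T_2 above): the lowest score still
    # kept when retaining target_recall of the annotated faces.
    scores = sorted(face_scores, reverse=True)
    k = max(1, int(round(target_recall * len(scores))))
    return scores[k - 1]

def mine_hard_negatives(background_windows, threshold):
    # Background windows that the current partial cascade still scores
    # above the threshold become negatives for the next detection net.
    # background_windows: list of ((x, y, w, h), score) pairs.
    return [(box, s) for box, s in background_windows if s > threshold]
```

The mined negatives are, by construction, exactly the false positives the earlier stages fail to reject, which is why each later net trains on a progressively harder sub-problem.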
Compared with the design of having one single CNN scan the full image for faces, the cascade makes it possible for simpler CNNs to achieve the same or even better accuracy.

Footnote 2: We use the cuda-convnet2 [1] CNN implementation.

Figure 7: Precision-recall curves on the AFW dataset, comparing our performance with the state-of-the-art methods including TSM [33] (AP 87.99), Shen et al. [25] (AP 89.03), Structured Models [28] (AP 95.19), HeadHunter [19] (AP 97.14), DPM [4, 19] (AP 97.21), Face.com, Face++ and Picasa; ours reaches AP 96.72, or AP 97.97 with manual evaluation.

4. Experiments

We verify the proposed detector on two public face detection benchmarks. On the Annotated Faces in the Wild (AFW) [33] test set, our detector is comparable to the state-of-the-art. This small-scale test set is almost saturated, and we observe that the evaluation is biased due to mismatched face annotations. On the challenging Face Detection Data Set and Benchmark (FDDB) dataset [7], our detector outperforms the state-of-the-art methods in the discontinuous score evaluation. Meanwhile, we show that our detector can easily be tuned into a faster version with only a minor performance decrease.

4.1. Annotated Faces in the Wild

Annotated Faces in the Wild (AFW) is a 205-image dataset created by Zhu et al. [33]. We evaluate our detector on AFW, and the precision-recall curves are shown in Figure 7. Our performance is comparable to the state-of-the-art on this dataset. As pointed out by Mathias et al. [19], one important problem in the evaluation of face detection methods is the mismatch of the face annotations between the training and testing stages. In our training stage, we generate square annotations to approximate the ellipse face annotations of AFLW in preparing the positive training samples. As shown in Figure 6, the annotations in AFLW and AFW are mismatched.
In the evaluation on AFW, we follow the remedial method proposed by Mathias et al.: using the shared evaluation tool [19], we search for a global rigid transformation of the detection outputs that maximizes the overlap ratio with the ground-truth annotations. However, our annotations cannot simply be linearly mapped to the AFW annotations, so mismatches still exist after the global transformation step. Therefore we also manually evaluate the output detections and obtain better results. All the curated detections are shown
in Figure 8, including unannotated faces and mis-evaluated faces.

Figure 8: Manually curated detection bounding boxes on AFW: blue boxes are faces mis-evaluated as false alarms; green boxes are unannotated faces. These are all detections generated by our approach but mis-classified by the evaluation; under our annotation standards, they are true detections.

Table 1: Performance statistics of the cascade on FDDB: the average number of detection windows per image after each stage (sliding window, 12-net, 12-calibration-net, 24-net, 24-calibration-net, 48-net, global NMS, 48-calibration-net) and the overall recall rate.

4.2. Face Detection Data Set and Benchmark

The Face Detection Data Set and Benchmark (FDDB) dataset [7] contains 5,171 annotated faces in 2,845 images. It is a large-scale face detection benchmark with a standardized evaluation process, and we follow the required procedure to report our detection performance with the toolbox provided by the authors. FDDB uses ellipse face annotations and defines two types of evaluation: the discontinuous score and the continuous score. The discontinuous score evaluation counts the number of detected faces versus the number of false alarms; a detection bounding box (or ellipse) is regarded as a true positive only if it has an Intersection-over-Union (IoU) ratio above 0.5 with a ground-truth face. The continuous score evaluation measures how well the faces are located, using the IoU ratio itself as the matching metric of the detection bounding box. We uniformly extend our square detection bounding boxes vertically by 20% into upright rectangles on FDDB to better approximate its ellipse annotations. As shown in Figure 9, our detector outperforms the best previous performance in the discontinuous score evaluation. Our detector outputs rectangles, while HeadHunter [19] and JointCascade [2] generate ellipse outputs.
For a fairer comparison under the continuous score evaluation, we uniformly fit upright ellipses to our rectangular bounding boxes. For a rectangle of size (w, h), we fit an upright ellipse at the same center with axis sizes 1.18h and 1.13w, which maximizes its overlap with the rectangle. As shown in Figure 9, with these naively fitted ellipses our detector outperforms HeadHunter and approaches JointCascade under the continuous score evaluation. The performance drops slightly in the discontinuous score evaluation due to this simple, inaccurate fitting strategy. The stage-by-stage performance of the cascade is shown in Table 1. We observe that the number of detection windows decreases quickly, and that the calibration nets help further reduce the detection windows and improve the recall.

4.3. Runtime efficiency

One of the important advantages of this work is its runtime efficiency: the CNN cascade achieves very fast face detection. Furthermore, by simply varying the thresholds T_1 and T_2, one can find a task-specific accuracy-computation trade-off. In Figure 11, we show that the performance of a faster version of our detector is comparable to Picasa on AFW. In the faster version, we set the thresholds aggressively high to reject a large portion of the detection windows in the early stages. The calibration nets help more in this faster version by adjusting the bounding boxes back toward the face centers to improve recall in the later stages. On average, only 2.36% of the detection windows pass the 12-net and 12-calibration-net, and 14.3% of the retained detection windows pass the 24-net and 24-calibration-net to feed into the most computationally expensive 48-net. With the same thresholds, we evaluate our detector in a typical surveillance scenario, detecting faces from VGA images. We scan for faces at only a single size in the 12-net stage, but with the calibration nets we can detect faces smaller or larger than that size within the calibration range.
In this scenario, our detector processes one image in 71 ms on average on a 2.0 GHz CPU core (footnote 3). Under the same setting, it takes 770 ms to scan the whole image with the 48-net alone, which demonstrates the time efficiency of our cascade design. On a GPU card, the detection time of the cascade is further reduced to 10 ms per image without code optimization.

Footnote 3: Over an image pyramid with scaling factor 1.414, the average detection time is 110 ms.

Figure 9: On the FDDB dataset (discontinuous score, left; continuous score, right; true positive rate vs. number of false positives), we compare our performance with the state-of-the-art methods including ACF-multiscale [29], PEP-Adapt [14], Boosted Exemplar [15], HeadHunter [19], Jain et al. [8], SURF-multiview [16], Pico [18], and Joint Cascade [2].

Figure 10: Qualitative results of our detector on FDDB.

This detection speed is very competitive compared with other state-of-the-art methods. Among the top performers on FDDB and AFW that report their detection speed, ACF-multiscale [29] runs at 20 FPS for full-yaw-pose face detection and 34 FPS for frontal faces on a single thread of an Intel Core i CPU for VGA images; Boosted Exemplar [15] takes 900 ms per image; Joint Cascade [2] takes 28.6 ms for VGA images on a 2.93 GHz CPU; SURF-multiview [16] runs in real time for VGA video on a personal workstation with a 3.2 GHz Core i7 CPU (4 cores, 8 threads); TSM [33] processes a VGA image in 33.8 seconds [2]; and Shen et al. [25] process a 1280-pixel-dimension image in less than 10 seconds.

Figure 11: Precision-recall curve of the faster version of our detector on the AFW dataset. All three false alarms are mis-evaluated or unannotated faces, as shown on the right.

5. Conclusion

In this work, we present a CNN cascade for fast face detection. Our detector evaluates the input image at low resolution to quickly reject non-face regions and carefully processes the challenging regions at higher resolution for accurate detection. Calibration nets are introduced in the cascade to accelerate detection and improve bounding box quality. Sharing the advantages of CNNs, the proposed face detector is robust to large visual variations. On the public face detection benchmark FDDB, the proposed detector outperforms the state-of-the-art methods. The proposed detector is also very fast, achieving 14 FPS for typical VGA images on a CPU, and can be accelerated to 100 FPS on a GPU.
Acknowledgment

This work was partially done while the first author was an intern at Adobe Research. Research reported in this publication was partly supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was also partly supported by a US National Science Foundation Grant IIS and GH's start-up funds from Stevens Institute of Technology.

References

[1] cuda-convnet2. code.google.com/p/cuda-convnet2.
[2] D. Chen, S. Ren, Y. Wei, X. Cao, and J. Sun. Joint cascade face detection and alignment. In Computer Vision - ECCV.
[3] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge.
[4] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] C. Garcia and M. Delakis. A neural architecture for fast and robust face detection. In Proceedings of the 16th International Conference on Pattern Recognition.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint.
[7] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical Report UM-CS, University of Massachusetts, Amherst.
[8] V. Jain and E. Learned-Miller. Online domain adaptation of a pre-trained cascade of classifiers. In Computer Vision and Pattern Recognition (CVPR), 2011.
[9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint.
[10] M. Jones and P. Viola. Fast multi-view face detection.
Mitsubishi Electric Research Lab TR , [11] M. Koestinger, P. Wohlhart, P. M. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, realworld database for facial landmark localization. In First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, [13] Y. LeCun and Y. Bengio. networks for images, speech, and time series. The handbook of brain theory and neural networks, , 2 [14] H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang. Probabilistic elastic part model for unsupervised face detector adaptation. In Proc. IEEE International Conference on Computer Vision, [15] H. Li, Z. Lin, J. Brandt, X. Shen, and G. Hua. Efficient boosted exemplar-based face detection. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, , 8 [16] J. Li, T. Wang, and Y. Zhang. Face detection using surf cascade. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, [17] R. Lienhart and J. Maydt. An extended set of haar-like features for rapid object detection. In Image Processing Proceedings International Conference on, [18] N. Markuš, M. Frljak, I. S. Pandžić, J. Ahlberg, and R. Forchheimer. A method for object detection based on pixel intensity comparisons organized in decision trees. arxiv preprint arxiv: , [19] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In Computer Vision ECCV , 6, 7, 8 [20] M. Osadchy, Y. L. Cun, M. L. Miller, and P. Perona. Synergistic face detection and pose estimation with energy-based model. In In Advances in Neural Information Processing Systems (NIPS), [21] D. Park, D. Ramanan, and C. Fowlkes. Multiresolution models for object detection. In Computer Vision ECCV [22] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. 
In Computer Vision and Pattern Recognition, [23] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, [24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, [25] X. Shen, Z. Lin, J. Brandt, and Y. Wu. Detecting and aligning faces by image retrieval. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, , 6, 8 [26] R. Vaillant, C. Monrocq, and Y. Le Cun. Original approach for the localisation of objects in images. IEE Proceedings- Vision, Image and Signal Processing, [27] P. A. Viola and M. J. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, , 2 [28] J. Yan, X. Zhang, Z. Lei, and S. Z. Li. Face detection by structural models. Image and Vision Computing, , 6 [29] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. arxiv preprint arxiv: , , 8 [30] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical Report MSR-TR , , 2 [31] J. Zhang, S. Shan, M. Kan, and X. Chen. Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment. In Computer Vision ECCV
10 [32] W. Zhang, G. Zelinsky, and D. Samaras. Real-time accurate object detection using multiple resolutions. In Proc. IEEE International Conference on Computer Vision, [33] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, , 6, 8