# arXiv: v1 [cs.CV] 6 Feb 2015

## Transcription

[Figure 1. ReLU vs. PReLU. ReLU panel: f(y) = y for positive y and f(y) = 0 otherwise. PReLU panel: f(y) = y for positive y and f(y) = ay otherwise. For PReLU, the coefficient of the negative part is not constant and is adaptively learned.]

### 2. Approach

In this section, we first present the PReLU activation function (Sec. 2.1). Then we derive our initialization method for deep rectifier networks (Sec. 2.2). Lastly we discuss our architecture designs (Sec. 2.3).

### 2.1. Parametric Rectifiers

We show that replacing the parameter-free ReLU activation by a learned parametric activation unit improves classification accuracy (footnote 1).

Footnote 1. Concurrent with our work, Agostinelli et al. [1] also investigated learning activation functions and showed improvement on other tasks.

Definition. Formally, we consider an activation function defined as:

$$f(y_i) = \begin{cases} y_i, & \text{if } y_i > 0 \\ a_i y_i, & \text{if } y_i \le 0 \end{cases} \tag{1}$$

Here $y_i$ is the input of the nonlinear activation $f$ on the $i$th channel, and $a_i$ is a coefficient controlling the slope of the negative part. The subscript $i$ in $a_i$ indicates that we allow the nonlinear activation to vary on different channels. When $a_i = 0$, it becomes ReLU; when $a_i$ is a learnable parameter, we refer to Eqn.(1) as Parametric ReLU (PReLU). Figure 1 shows the shapes of ReLU and PReLU. Eqn.(1) is equivalent to $f(y_i) = \max(0, y_i) + a_i \min(0, y_i)$.

If $a_i$ is a small and fixed value, PReLU becomes the Leaky ReLU (LReLU) in [20] ($a_i = 0.01$). The motivation of LReLU is to avoid zero gradients. Experiments in [20] show that LReLU has negligible impact on accuracy compared with ReLU. On the contrary, our method adaptively learns the PReLU parameters jointly with the whole model. We hope for end-to-end training that will lead to more specialized activations.

PReLU introduces a very small number of extra parameters. The number of extra parameters is equal to the total number of channels, which is negligible when considering the total number of weights. So we expect no extra risk of overfitting. We also consider a channel-shared variant:

$$f(y_i) = \max(0, y_i) + a \min(0, y_i)$$

where the coefficient is shared by all channels of one layer. This variant only introduces a single extra parameter into each layer.

Optimization. PReLU can be trained using backpropagation [17] and optimized simultaneously with other layers. The update formulations of $\{a_i\}$ are simply derived from the chain rule. The gradient of $a_i$ for one layer is:

$$\frac{\partial E}{\partial a_i} = \sum_{y_i} \frac{\partial E}{\partial f(y_i)} \frac{\partial f(y_i)}{\partial a_i}, \tag{2}$$

where $E$ represents the objective function. The term $\frac{\partial E}{\partial f(y_i)}$ is the gradient propagated from the deeper layer. The gradient of the activation is given by:

$$\frac{\partial f(y_i)}{\partial a_i} = \begin{cases} 0, & \text{if } y_i > 0 \\ y_i, & \text{if } y_i \le 0 \end{cases} \tag{3}$$

The summation $\sum_{y_i}$ runs over all positions of the feature map. For the channel-shared variant, the gradient of $a$ is $\frac{\partial E}{\partial a} = \sum_i \sum_{y_i} \frac{\partial E}{\partial f(y_i)} \frac{\partial f(y_i)}{\partial a}$, where $\sum_i$ sums over all channels of the layer. The time complexity due to PReLU is negligible for both forward and backward propagation.

We adopt the momentum method when updating $a_i$:

$$\Delta a_i := \mu \Delta a_i + \epsilon \frac{\partial E}{\partial a_i}. \tag{4}$$

Here $\mu$ is the momentum and $\epsilon$ is the learning rate. It is worth noticing that we do not use weight decay ($l_2$ regularization) when updating $a_i$. A weight decay tends to push $a_i$ to zero, and thus biases PReLU toward ReLU. Even without regularization, the learned coefficients rarely have a magnitude larger than 1 in our experiments. Further, we do not constrain the range of $a_i$ so that the activation function may be non-monotonic. We use $a_i = 0.25$ as the initialization throughout this paper.

Comparison Experiments. We conducted comparisons on a deep but efficient model with 14 weight layers. The model was studied in [10] (model E of [10]) and its architecture is described in Table 1. We choose this model because it is sufficient for representing a category of very deep models, as well as to make the experiments feasible.

As a baseline, we train this model with ReLU applied in the convolutional (conv) layers and the first two fully-connected (fc) layers. The training implementation follows [10]. The top-1 and top-5 errors are 33.82% and 13.34% on ImageNet 2012, using 10-view testing (Table 2).
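The definition in Eqn.(1), the gradient in Eqns.(2) and (3), and the momentum update in Eqn.(4) can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the default values of `mu` and `lr`, the toy inputs, and the final descent step `a - delta` are all assumptions made for the example.

```python
def prelu(y, a):
    """Eqn.(1): pass positive inputs through, scale the rest by a."""
    return y if y > 0 else a * y


def prelu_max_min(y, a):
    """Equivalent form: max(0, y) + a * min(0, y)."""
    return max(0.0, y) + a * min(0.0, y)


def dprelu_da(y):
    """Eqn.(3): derivative of f(y) with respect to the coefficient a."""
    return 0.0 if y > 0 else y


def grad_a(ys, upstream):
    """Eqn.(2): sum over feature-map positions of dE/df * df/da."""
    return sum(g * dprelu_da(y) for y, g in zip(ys, upstream))


def momentum_delta(delta, grad, mu=0.9, lr=0.01):
    """Eqn.(4): delta := mu * delta + lr * dE/da (mu and lr are assumed values)."""
    return mu * delta + lr * grad


# One channel with three positions; `upstream` stands for dE/df(y).
ys = [1.0, -2.0, -3.0]
upstream = [0.5, 1.0, 2.0]

a = 0.25                        # initialization stated in the text
g = grad_a(ys, upstream)        # 0.5*0 + 1.0*(-2.0) + 2.0*(-3.0)
delta = momentum_delta(0.0, g)  # first step, so the previous delta is zero
a_new = a - delta               # assumed descent step; no weight decay on a
```

Only the non-positive inputs contribute to the gradient of the coefficient, which is why the positive position drops out of the sum.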

[Table 1. A small but deep 14-layer model [10]. The filter size and filter number of each layer are listed. The number /s indicates the stride s that is used. The learned coefficients of PReLU are also shown. For the channel-wise case, the average of $\{a_i\}$ over the channels is shown for each layer. Columns: layer; learned coefficients (channel-shared, channel-wise). Rows: conv1 (7×7, 64), pool1 (3×3, /3), four conv layers, pool2 (2×2, /2), six conv layers, spp {6, 3, 2, 1}, three fc layers.]

[Table 2. Comparisons between ReLU and PReLU on the small model. The error rates are for ImageNet 2012 using 10-view testing. The images are resized so that the shorter side is 256, during both training and testing. Each view is […]. All models are trained for 75 epochs. Columns: top-1, top-5. Rows: ReLU; PReLU, channel-shared; PReLU, channel-wise.]

Then we train the same architecture from scratch, with all ReLUs replaced by PReLUs (Table 2). The top-1 error is reduced to 32.64%. This is a 1.2% gain over the ReLU baseline. Table 2 also shows that the channel-wise and channel-shared PReLUs perform comparably. For the channel-shared version, PReLU only introduces 13 extra free parameters compared with the ReLU counterpart. But this small number of free parameters plays a critical role, as evidenced by the 1.1% gain over the baseline. This implies the importance of adaptively learning the shapes of activation functions.

Table 1 also shows the learned coefficients of PReLUs for each layer. There are two interesting phenomena in Table 1. First, the first conv layer (conv1) has coefficients (0.681 and 0.596) significantly greater than 0. As the filters of conv1 are mostly Gabor-like filters such as edge or texture detectors, the learned results show that both positive and negative responses of the filters are respected. We believe that this is a more economical way of exploiting low-level information, given the limited number of filters (e.g., 64).
Second, for the channel-wise version, the deeper conv layers in general have smaller coefficients. This implies that the activations gradually become more nonlinear at increasing depths. In other words, the learned model tends to keep more information in earlier stages and becomes more discriminative in deeper stages.

### 2.2. Initialization of Filter Weights for Rectifiers

Rectifier networks are easier to train [8, 16, 34] compared with traditional sigmoid-like activation networks. But a bad initialization can still hamper the learning of a highly non-linear system. In this subsection, we propose a robust initialization method that removes an obstacle of training extremely deep rectifier networks.

Recent deep CNNs are mostly initialized by random weights drawn from Gaussian distributions [16]. With fixed standard deviations (e.g., 0.01 in [16]), very deep models (e.g., >8 conv layers) have difficulty converging, as reported by the VGG team [25] and also observed in our experiments. To address this issue, in [25] they pre-train a model with 8 conv layers to initialize deeper models. But this strategy requires more training time, and may also lead to a poorer local optimum. In [29, 18], auxiliary classifiers are added to intermediate layers to help with convergence.

Glorot and Bengio [7] proposed to adopt a properly scaled uniform distribution for initialization. This is called Xavier initialization in [14]. Its derivation is based on the assumption that the activations are linear. This assumption is invalid for ReLU and PReLU.

In the following, we derive a theoretically more sound initialization by taking ReLU/PReLU into account. In our experiments, our initialization method allows for extremely deep models (e.g., 30 conv/fc layers) to converge, while the Xavier method [7] cannot.

Forward Propagation Case. Our derivation mainly follows [7]. The central idea is to investigate the variance of the responses in each layer. For a conv layer, a response is:

$$\mathbf{y}_l = W_l \mathbf{x}_l + \mathbf{b}_l. \tag{5}$$

Here, $\mathbf{x}$ is a $k^2 c$-by-1 vector that represents co-located $k \times k$ pixels in $c$ input channels. $k$ is the spatial filter size of the layer. With $n = k^2 c$ denoting the number of connections of a response, $W$ is a $d$-by-$n$ matrix, where $d$ is the number of filters and each row of $W$ represents the weights of a filter. $\mathbf{b}$ is a vector of biases, and $\mathbf{y}$ is the response at a pixel of the output map. We use $l$ to index a layer. We have $\mathbf{x}_l = f(\mathbf{y}_{l-1})$ where $f$ is the activation. We also have $c_l = d_{l-1}$.
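As a concrete reading of Eqn.(5), the sketch below computes the responses of all filters at one output pixel. It is only an illustration of the shapes involved: the layer sizes (a 3×3 filter over 2 input channels with 2 filters), the function names, and every numeric value are hypothetical and do not come from the paper.

```python
def num_connections(k, c):
    """n = k^2 * c: inputs feeding one response of a k-by-k filter."""
    return k * k * c


def conv_response(W, x, b):
    """Eqn.(5) at one pixel: y = W x + b, with W stored as d rows of length n."""
    return [sum(w * t for w, t in zip(row, x)) + bias
            for row, bias in zip(W, b)]


k, c, d = 3, 2, 2                 # hypothetical layer sizes
n = num_connections(k, c)         # 18 connections per response
x = [1.0] * n                     # co-located k-by-k pixels in c channels, flattened
W = [[0.5] * n, [-0.25] * n]      # d-by-n weight matrix, one filter per row
b = [0.0, 1.0]                    # one bias per filter
y = conv_response(W, x, b)        # d responses at this pixel
```

Each of the d responses sums over the same n inputs, which is why n (and not d) appears in the forward-propagation variance analysis.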

We let the initialized elements in $W_l$ be mutually independent and share the same distribution. As in [7], we assume that the elements in $\mathbf{x}_l$ are also mutually independent and share the same distribution, and $\mathbf{x}_l$ and $W_l$ are independent of each other. Then we have:

$$\mathrm{Var}[y_l] = n_l \mathrm{Var}[w_l x_l], \tag{6}$$

where now $y_l$, $w_l$, and $x_l$ represent the random variables of each element in $\mathbf{y}_l$, $W_l$, and $\mathbf{x}_l$ respectively. We let $w_l$ have zero mean. Then the variance of the product of independent variables gives us:

$$\mathrm{Var}[y_l] = n_l \mathrm{Var}[w_l] E[x_l^2]. \tag{7}$$

Here $E[x_l^2]$ is the expectation of the square of $x_l$. It is worth noticing that $E[x_l^2] \ne \mathrm{Var}[x_l]$ unless $x_l$ has zero mean. For the ReLU activation, $x_l = \max(0, y_{l-1})$ and thus it does not have zero mean. This will lead to a conclusion different from [7].

If we let $w_{l-1}$ have a symmetric distribution around zero and $b_{l-1} = 0$, then $y_{l-1}$ has zero mean and has a symmetric distribution around zero. This leads to $E[x_l^2] = \frac{1}{2} \mathrm{Var}[y_{l-1}]$ when $f$ is ReLU. Putting this into Eqn.(7), we obtain:

$$\mathrm{Var}[y_l] = \frac{1}{2} n_l \mathrm{Var}[w_l] \mathrm{Var}[y_{l-1}]. \tag{8}$$

With $L$ layers put together, we have:

$$\mathrm{Var}[y_L] = \mathrm{Var}[y_1] \left( \prod_{l=2}^{L} \frac{1}{2} n_l \mathrm{Var}[w_l] \right). \tag{9}$$

This product is the key to the initialization design. A proper initialization method should avoid reducing or magnifying the magnitudes of input signals exponentially. So we expect the above product to take a proper scalar (e.g., 1). A sufficient condition is:

$$\frac{1}{2} n_l \mathrm{Var}[w_l] = 1, \quad \forall l. \tag{10}$$

This leads to a zero-mean Gaussian distribution whose standard deviation (std) is $\sqrt{2/n_l}$. This is our way of initialization. We also initialize $\mathbf{b} = 0$.

For the first layer ($l = 1$), we should have $n_1 \mathrm{Var}[w_1] = 1$ because there is no ReLU applied on the input signal. But the factor 1/2 does not matter if it just exists on one layer. So we also adopt Eqn.(10) in the first layer for simplicity.

Backward Propagation Case. For back-propagation, the gradient of a conv layer is computed by:

$$\Delta \mathbf{x}_l = \hat{W}_l \Delta \mathbf{y}_l. \tag{11}$$

Here we use $\Delta \mathbf{x}$ and $\Delta \mathbf{y}$ to denote gradients ($\frac{\partial E}{\partial \mathbf{x}}$ and $\frac{\partial E}{\partial \mathbf{y}}$) for simplicity. $\Delta \mathbf{y}$ represents $k$-by-$k$ pixels in $d$ channels, and is reshaped into a $k^2 d$-by-1 vector. We denote $\hat{n} = k^2 d$. Note that $\hat{n} \ne n = k^2 c$. $\hat{W}$ is a $c$-by-$\hat{n}$ matrix where the filters are rearranged in the way of back-propagation. Note that $W$ and $\hat{W}$ can be reshaped from each other. $\Delta \mathbf{x}$ is a $c$-by-1 vector representing the gradient at a pixel of this layer. As above, we assume that $w_l$ and $\Delta y_l$ are independent of each other. Then $\Delta x_l$ has zero mean for all $l$, when $w_l$ is initialized by a symmetric distribution around zero.

In back-propagation we also have $\Delta y_l = f'(y_l) \Delta x_{l+1}$ where $f'$ is the derivative of $f$. For the ReLU case, $f'(y_l)$ is zero or one, and their probabilities are equal. We assume that $f'(y_l)$ and $\Delta x_{l+1}$ are independent of each other. Thus we have $E[\Delta y_l] = E[\Delta x_{l+1}]/2 = 0$, and also $E[(\Delta y_l)^2] = \mathrm{Var}[\Delta y_l] = \frac{1}{2} \mathrm{Var}[\Delta x_{l+1}]$. Then we compute the variance of the gradient in Eqn.(11):

$$\mathrm{Var}[\Delta x_l] = \hat{n}_l \mathrm{Var}[w_l] \mathrm{Var}[\Delta y_l] = \frac{1}{2} \hat{n}_l \mathrm{Var}[w_l] \mathrm{Var}[\Delta x_{l+1}]. \tag{12}$$

The scalar 1/2 in both Eqn.(12) and Eqn.(8) is the result of ReLU, though the derivations are different. With $L$ layers put together, we have:

$$\mathrm{Var}[\Delta x_2] = \mathrm{Var}[\Delta x_{L+1}] \left( \prod_{l=2}^{L} \frac{1}{2} \hat{n}_l \mathrm{Var}[w_l] \right). \tag{13}$$

We consider a sufficient condition that the gradient is not exponentially large/small:

$$\frac{1}{2} \hat{n}_l \mathrm{Var}[w_l] = 1, \quad \forall l. \tag{14}$$

The only difference between this equation and Eqn.(10) is that $\hat{n}_l = k_l^2 d_l$ while $n_l = k_l^2 c_l = k_l^2 d_{l-1}$. Eqn.(14) results in a zero-mean Gaussian distribution whose std is $\sqrt{2/\hat{n}_l}$.

For the first layer ($l = 1$), we need not compute $\Delta x_1$ because it represents the image domain. But we can still adopt Eqn.(14) in the first layer, for the same reason as in the forward propagation case: the factor of a single layer does not make the overall product exponentially large/small.

We note that it is sufficient to use either Eqn.(14) or Eqn.(10) alone.
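The effect of the forward condition in Eqn.(10) can be checked numerically. The sketch below pushes random inputs through a stack of fully-connected ReLU layers with zero biases and reports Var[y_L] / Var[y_1], once with std sqrt(2/n) and once with the linear-activation choice std sqrt(1/n). It is a rough simulation under assumptions that are not in the paper: fully-connected layers of equal width stand in for conv layers, and the width, depth, sample count, seed, and function names are chosen only to keep the example fast.

```python
import math
import random


def rectifier_std(n):
    """Std from Eqn.(10): (1/2) * n * Var[w] = 1."""
    return math.sqrt(2.0 / n)


def linear_std(n):
    """Std from the linear-activation condition n * Var[w] = 1."""
    return math.sqrt(1.0 / n)


def second_moment(vectors):
    """Mean of squared elements; equals the variance for zero-mean responses."""
    values = [t for v in vectors for t in v]
    return sum(t * t for t in values) / len(values)


def variance_ratio(std_fn, width=128, depth=6, samples=16, seed=0):
    """Estimate Var[y_L] / Var[y_1] for a ReLU network with b = 0."""
    rng = random.Random(seed)
    std = std_fn(width)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(samples)]
    var_first = None
    for layer in range(1, depth + 1):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        ys = [[sum(w * t for w, t in zip(row, x)) for row in W] for x in xs]
        if layer == 1:
            var_first = second_moment(ys)
        xs = [[max(0.0, t) for t in y] for y in ys]  # x_{l+1} = ReLU(y_l)
    return second_moment(ys) / var_first


ratio_rect = variance_ratio(rectifier_std)  # expected to stay on the order of 1
ratio_lin = variance_ratio(linear_std)      # expected to shrink like (1/2)^(L-1)
```

Because both runs share a seed and ReLU is positively homogeneous, the two ratios differ by the factor (1/2)^(L-1) predicted by Eqn.(9), which is 1/32 for the six layers assumed here.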
As an example of why either condition alone suffices, if we use Eqn.(14), then in Eqn.(13) the product $\prod_{l=2}^{L} \frac{1}{2} \hat{n}_l \mathrm{Var}[w_l] = 1$, and in Eqn.(9) the product $\prod_{l=2}^{L} \frac{1}{2} n_l \mathrm{Var}[w_l] = \prod_{l=2}^{L} n_l / \hat{n}_l = c_2 / d_L$, which is not a diminishing number in common network designs. This means that if the initialization properly scales the backward signal, then this is also the case for the forward signal; and vice versa. For all models in this paper, both forms can make them converge.

Discussions. If the forward/backward signal is inappropriately scaled by a factor β in each layer, then the final propagated signal […]

[Table 3. Architectures of large models. Here /2 denotes a stride of 2. Columns: input size, VGG-19 [25], model A, model B, model C. Rows: conv and maxpool stages, spp {7, 3, 2, 1}, fc layers, depth (conv+fc), complexity (ops.).]

[…] and 1e-4, and is switched when the error plateaus. The total number of epochs is about 80 for each model.

Testing. We adopt the strategy of multi-view testing on feature maps used in the SPP-net paper [11]. We further improve this strategy using the dense sliding window method in [24, 25].

We first apply the convolutional layers on the resized full image and obtain the last convolutional feature map. In the feature map, each window is pooled using the SPP layer [11]. The fc layers are then applied on the pooled features to compute the scores. This is also done on the horizontally flipped images. The scores of all dense sliding windows are averaged [24, 25]. We further combine the results at multiple scales as in [11].

Multi-GPU Implementation. We adopt a simple variant of Krizhevsky's method [15] for parallel training on multiple GPUs. We adopt data parallelism [15] on the conv layers. The GPUs are synchronized before the first fc layer. Then the forward/backward propagations of the fc layers are performed on a single GPU, which means that we do not parallelize the computation of the fc layers. The time cost of the fc layers is low, so it is not necessary to parallelize them. This leads to a simpler implementation than the model parallelism in [15]. Besides, model parallelism introduces some overhead due to the communication of filter responses, and is not faster than computing the fc layers on just a single GPU.

We implement the above algorithm on our modification of the Caffe library [14]. We do not increase the mini-batch size (128) because the accuracy may be decreased [15]. For the large models in this paper, we have observed a 3.8x speedup using 4 GPUs, and a 6.0x speedup using 8 GPUs.

[Table 4. Comparisons between ReLU/PReLU on model A in ImageNet 2012 using dense testing. Columns: scale s; top-1 and top-5 for ReLU; top-1 and top-5 for PReLU. Rows: the single scales and the multi-scale combination.]

### 4. Experiments on ImageNet

We perform the experiments on the 1000-class ImageNet 2012 dataset [22] which contains about 1.2 million training images, 50,000 validation images, and 100,000 test images (with no published labels). The results are measured by top-1/top-5 error rates [22]. We only use the provided data for training. All results are evaluated on the validation set, except for the final results in Table 7, which are evaluated on the test set. The top-5 error rate is the metric officially used to rank the methods in the classification challenge [22].

Comparisons between ReLU and PReLU. In Table 4, we compare ReLU and PReLU on the large model A. We use the channel-wise version of PReLU. For fair comparisons, both ReLU/PReLU models are trained using the same total number of epochs, and the learning rates are also switched after running the same number of epochs.

Table 4 shows the results at three scales and the multi-scale combination. The best single scale is 384, possibly because it is in the middle of the jittering range [256, 512]. For the multi-scale combination, PReLU reduces the top-1 error by 1.05% and the top-5 error by 0.23% compared with ReLU. The results in Table 2 and Table 4 consistently show that PReLU improves both small and large models. This improvement is obtained with almost no computational cost.

Comparisons of Single-model Results. Next we compare single-model results. We first show 10-view testing results [16] in Table 5. Here, each view is a 224-crop. The 10-view results of VGG-16 are based on our testing using the publicly released model [25], as they are not reported in [25]. Our best 10-view result is 7.38% (Table 5). Our other models also outperform the existing results.

Table 6 shows the comparisons of single-model results, which are all obtained using multi-scale and multi-view (or dense) test. Our results are denoted as MSRA. Our baseline model (A+ReLU, 6.51%) is already substantially better than the best existing single-model result of 7.1% reported for VGG-19 in the latest update of [25] (arXiv v5). We believe that this gain is mainly due to our end-to-end training, without the need of pre-training shallow models.

Moreover, our best single model (C, PReLU) has 5.71% top-5 error. This result is even better than all previous multi-model results (Table 7). Comparing A+PReLU with B+PReLU, we see that the 19-layer model and the 22-layer model perform comparably. On the other hand, increasing the width (C vs. B, Table 6) can still improve accuracy. This indicates that when the models are deep enough, the width becomes an essential factor for accuracy.

Comparisons of Multi-model Results. We combine six models including those in Table 6. For the time being we have trained only one model with architecture C. The other models have accuracy inferior to C by considerable margins. We conjecture that we can obtain better results by using fewer stronger models.

The multi-model results are in Table 7. Our result is 4.94% top-5 error on the test set. This number is evaluated by the ILSVRC server, because the labels of the test set are not published. Our result is 1.7% better than the ILSVRC 2014 winner (GoogLeNet, 6.66% [29]), which represents a 26% relative improvement. This is also a 17% relative improvement over the latest result (Baidu, 5.98% [32]).

Analysis of Results. Figure 4 shows some example validation images successfully classified by our method. Besides the correctly predicted labels, we also pay attention to the other four predictions in the top-5 results. Some of these four labels are other objects in the multi-object images, e.g., the horse-cart image (Figure 4, row 1, col 1) contains a mini-bus and it is also recognized by the algorithm. Some of these four labels are due to the uncertainty among similar classes, e.g., the coucal image (Figure 4, row 2, col 1) has predicted labels of other bird species.

Figure 6 shows the per-class top-5 error of our result (average of 4.94%) on the test set, displayed in ascending order. Our result has zero top-5 error in 113 classes, meaning that the images in these classes are all correctly classified. The three classes with the highest top-5 error are letter opener (49%), spotlight (38%), and restaurant (36%). The error is due to the existence of multiple objects, small objects, or large intra-class variance. Figure 5 shows some example images misclassified by our method in these three classes. Some of the predicted labels still make some sense.

In Figure 7, we show the per-class difference of top-5 error rates between our result (average of 4.94%) and our team's in-competition result in ILSVRC 2014 (average of 8.06%). The error rates are reduced in 824 classes, unchanged in 127 classes, and increased in 49 classes.
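The relative improvements quoted for the multi-model result, and the per-class counts reported for Figure 7, can be checked with a few lines of arithmetic. The snippet below is only a sanity check on numbers stated in the text; the helper name `relative_improvement` is an assumption made for this example.

```python
def relative_improvement(baseline_error, new_error):
    """Fraction of the baseline error removed by the new result."""
    return (baseline_error - new_error) / baseline_error


ours = 4.94                       # multi-model top-5 test error (%)
vs_googlenet = relative_improvement(6.66, ours)   # about 0.26
vs_baidu = relative_improvement(5.98, ours)       # about 0.17
absolute_gain = 6.66 - ours       # about 1.7 percentage points

# Per-class changes versus the ILSVRC 2014 entry should cover all 1000 classes.
reduced, unchanged, increased = 824, 127, 49
total_classes = reduced + unchanged + increased
```

The "1.7% better" figure is an absolute difference in error rate, while the 26% and 17% figures are relative to the competing error rate.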

[Table 5. The single-model 10-view results for ImageNet 2012 val set. Columns: model, top-1, top-5. Rows: MSRA [11]; VGG-16 [25] (based on our tests); GoogLeNet [29]; A, ReLU; A, PReLU; B, PReLU; C, PReLU.]

[Table 6. The single-model results for ImageNet 2012 val set. Marked entries are evaluated from the test set. Columns: team, top-1, top-5. In competition ILSVRC 14: MSRA [11]; VGG [25]; GoogLeNet [29]. Post-competition: VGG [25] (arXiv v2); VGG [25] (arXiv v5); Baidu [32]; MSRA (A, ReLU); MSRA (A, PReLU); MSRA (B, PReLU); MSRA (C, PReLU).]

| | team | top-5 (test) |
|---|---|---|
| in competition ILSVRC 14 | MSRA, SPP-nets [11] | 8.06 |
| | VGG [25] | 7.32 |
| | GoogLeNet [29] | 6.66 |
| post-competition | VGG [25] (arXiv v5) | 6.8 |
| | Baidu [32] | 5.98 |
| | MSRA, PReLU-nets | 4.94 |

Table 7. The multi-model results for the ImageNet 2012 test set.

Comparisons with Human Performance from [22]. Russakovsky et al. [22] recently reported that human performance yields a 5.1% top-5 error on the ImageNet dataset. This number is achieved by a human annotator who is well trained on the validation images to be better aware of the existence of relevant classes. When annotating the test images, the human annotator is given a special interface, where each class title is accompanied by a row of 13 example training images. The reported human performance is estimated on a random subset of 1500 test images.

Our result (4.94%) exceeds the reported human-level performance. To our knowledge, our result is the first published instance of surpassing humans on this visual recognition challenge. The analysis in [22] reveals that the two major types of human errors come from fine-grained recognition and class unawareness. The investigation in [22] suggests that algorithms can do a better job on fine-grained recognition (e.g., 120 species of dogs in the dataset). The second row of Figure 4 shows some example fine-grained objects successfully recognized by our method: coucal, komondor, and yellow lady's slipper. While humans can easily recognize these objects as a bird, a dog, and a flower, it is nontrivial for most humans to tell their species. On the negative side, our algorithm still makes mistakes in cases that are not difficult for humans, especially for those requiring context understanding or high-level knowledge (e.g., the spotlight images in Figure 5).

While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general. On recognizing elementary object categories (i.e., common objects or concepts in daily lives) such as the Pascal VOC […]

[Figure 6. The per-class top-5 errors of our result (average of 4.94%) on the test set. Errors are displayed in ascending order.]

[Figure 7. The difference of top-5 error rates between our result (average of 4.94%) and our team's in-competition result for ILSVRC 2014 (average of 8.06%) on the test set, displayed in ascending order. A positive number indicates a reduced error rate.]

[7] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics.

[8] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics.

[9] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint.

[10] K. He and J. Sun. Convolutional neural networks at constrained time cost. arXiv preprint.

[11] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. arXiv preprint, v2.

[12] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint.

[13] A. G. Howard. Some improvements on deep convolutional neural network based image classification. arXiv preprint.

[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint.

[15] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint.

[16] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS.

[17] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation.

[18] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu. Deeply-supervised nets. arXiv preprint.

[19] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint.

[20] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural network acoustic models. In ICML.

[21] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML.

[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. arXiv preprint.

[23] A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint.

[24] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks.

[25] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.

[26] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research.

[27] R. K. Srivastava, J. Masci, S. Kazerounian, F. Gomez, and J. Schmidhuber. Compete to compute. In NIPS.

[28] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS.

[29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint.

[30] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR.

[31] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus. Regularization of neural networks using dropconnect. In ICML.

[32] R. Wu, S. Yan, Y. Shan, Q. Dang, and G. Sun. Deep image: Scaling up image recognition. arXiv preprint.

[33] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional neural networks. In ECCV.

[34] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton. On rectified linear units for speech processing. In ICASSP.

### CS 1699: Intro to Computer Vision. Deep Learning. Prof. Adriana Kovashka University of Pittsburgh December 1, 2015

CS 1699: Intro to Computer Vision Deep Learning Prof. Adriana Kovashka University of Pittsburgh December 1, 2015 Today: Deep neural networks Background Architectures and basic operations Applications Visualizing

### Convolutional Feature Maps

Convolutional Feature Maps Elements of efficient (and accurate) CNN-based object detection Kaiming He Microsoft Research Asia (MSRA) ICCV 2015 Tutorial on Tools for Efficient Object Detection Overview

### Abstract. (CNN) ImageNet CNN [18] CNN CNN. fine-tuning ( )[14] Neocognitron[10] 1990 LeCun (CNN)[27]

Abstract (CNN) ImageNet CNN CNN CNN fine-tuning 1 ( )[14] ( 1) ( ). (CNN)[27] CNN (pre-trained network) pre-trained CNN 2 2.1 (CNN) CNN CNN [18] 2 ( ). Neocognitron[10] 1990 LeCun [27] CNN 2000 CNN [34,

### Lecture 6: Classification & Localization. boris. ginzburg@intel.com

Lecture 6: Classification & Localization boris. ginzburg@intel.com 1 Agenda ILSVRC 2014 Overfeat: integrated classification, localization, and detection Classification with Localization Detection. 2 ILSVRC-2014

### arxiv:1409.1556v6 [cs.cv] 10 Apr 2015

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION Karen Simonyan & Andrew Zisserman + Visual Geometry Group, Department of Engineering Science, University of Oxford {karen,az}@robots.ox.ac.uk

### Image and Video Understanding

Image and Video Understanding 2VO 710.095 WS Christoph Feichtenhofer, Axel Pinz Slide credits: Many thanks to all the great computer vision researchers on which this presentation relies on. Most material

### Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey

### Using Convolutional Neural Network for the Tiny ImageNet Challenge

Using Convolutional Neural Network for the Tiny ImageNet Challenge Jason Ting Stanford University jmting@stanford.edu Abstract In this project we work on creating a model to classify images for the Tiny

### arxiv: v1 [cs.cv] 10 Dec 2015

Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun Microsoft Research {kahe, v-xiangz, v-shren, jiansun}@microsoft.com arxiv:1512.03385v1 [cs.cv] 10 Dec 2015 Abstract

### Visualizing and Comparing AlexNet and VGG using Deconvolutional Layers

Wei Yu Kuiyuan Yang Yalong Bai Tianjun Xiao Hongxun Yao Yong Rui W.YU@HIT.EDU.CN KUYANG@MICROSOFT.COM YLBAI@MTLAB.HIT.EDU.CN TIXIAO@MICROSOFT.COM H.YAO@HIT.EDU.CN YONGRUI@MICROSOFT.COM Abstract Convolutional

### Human Pose Estimation and Activity Classification Using Convolutional Neural Networks

Human Pose Estimation and Activity Classification Using Convolutional Neural Networks Amy Bearman Stanford University Catherine Dong Stanford University abearman@cs.stanford.edu cdong@cs.stanford.edu Abstract

### Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

Stochastic Pooling for Regularization of Deep Convolutional Neural Networks Matthew D. Zeiler Department of Computer Science Courant Institute, New York University zeiler@cs.nyu.edu Rob Fergus Department

### Deformable Part Models with CNN Features

Deformable Part Models with CNN Features Pierre-André Savalle 1, Stavros Tsogkas 1,2, George Papandreou 3, Iasonas Kokkinos 1,2 1 Ecole Centrale Paris, 2 INRIA, 3 TTI-Chicago Abstract. In this work we

### Fast R-CNN. Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th. Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083.

Fast R-CNN Author: Ross Girshick Speaker: Charlie Liu Date: Oct, 13 th Girshick, R. (2015). Fast R-CNN. arxiv preprint arxiv:1504.08083. ECS 289G 001 Paper Presentation, Prof. Lee Result 1 67% Accuracy

### Administrivia. Traditional Recognition Approach. Overview. CMPSCI 370: Intro. to Computer Vision Deep learning

: Intro. to Computer Vision Deep learning University of Massachusetts, Amherst April 19/21, 2016 Instructor: Subhransu Maji Finals (everyone) Thursday, May 5, 1-3pm, Hasbrouck 113 Final exam Tuesday, May

### Module 5. Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016

Module 5 Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016 Previously, end-to-end.. Dog Slide credit: Jose M 2 Previously, end-to-end.. Dog Learned Representation Slide credit: Jose

### Compacting ConvNets for end to end Learning

Compacting ConvNets for end to end Learning Jose M. Alvarez Joint work with Lars Pertersson, Hao Zhou, Fatih Porikli. Success of CNN Image Classification Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton,

### Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks SHAOQING REN, KAIMING HE, ROSS GIRSHICK, JIAN SUN Göksu Erdoğan Object Detection Detection as Regression? DOG, (x, y, w, h)

### Cascade Region Regression for Robust Object Detection

Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng Team Name: Amax Centre for Quantum Computation & Intelligent Systems (QCIS),

### Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg

Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual

### Image Classification for Dogs and Cats

Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science kzhou3@ualberta.ca Abstract

### Training R-CNNs of various velocities Slow, fast, and faster

Training R-CNNs of various velocities Slow, fast, and faster Ross Girshick Facebook AI Research (FAIR) Tools for Efficient Object Detection, ICCV 2015 Tutorial Section overview Kaiming just covered inference

### Deep Image: Scaling up Image Recognition

Deep Image: Scaling up Image Recognition Ren Wu 1, Shengen Yan, Yi Shan, Qingqing Dang, Gang Sun Baidu Research Abstract We present a state-of-the-art image recognition system, Deep Image, developed using

### InstaNet: Object Classification Applied to Instagram Image Streams

InstaNet: Object Classification Applied to Instagram Image Streams Clifford Huang Stanford University chuang8@stanford.edu Mikhail Sushkov Stanford University msushkov@stanford.edu Abstract The growing

### Learning and transferring mid-level image representations using convolutional neural networks

Willow project-team Learning and transferring mid-level image representations using convolutional neural networks Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic 1 Image classification (easy) Is there

### Bert Huang Department of Computer Science Virginia Tech

This paper was submitted as a final project report for CS6424/ECE6424 Probabilistic Graphical Models and Structured Prediction in the spring semester of 2016. The work presented here is done by students

### Deep Residual Networks

Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July 2016. Formerly affiliated with Microsoft Research Asia 7x7 conv,

### arxiv:1506.03365v2 [cs.cv] 19 Jun 2015

LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop Fisher Yu Yinda Zhang Shuran Song Ari Seff Jianxiong Xiao arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 Princeton

### arxiv:1312.6034v2 [cs.cv] 19 Apr 2014

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps arxiv:1312.6034v2 [cs.cv] 19 Apr 2014 Karen Simonyan Andrea Vedaldi Andrew Zisserman Visual Geometry Group,

### ImageNet Classification with Deep Convolutional Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky University of Toronto kriz@cs.utoronto.ca Ilya Sutskever University of Toronto ilya@cs.utoronto.ca Geoffrey E. Hinton University

### Pedestrian Detection with RCNN

Pedestrian Detection with RCNN Matthew Chen Department of Computer Science Stanford University mcc17@stanford.edu Abstract In this paper we evaluate the effectiveness of using a Region-based Convolutional

### Fast R-CNN Object detection with Caffe

Fast R-CNN Object detection with Caffe Ross Girshick Microsoft Research arxiv code Latest roasts Goals for this section Super quick intro to object detection Show one way to tackle obj. det. with ConvNets

### A Dynamic Convolutional Layer for Short Range Weather Prediction

A Dynamic Convolutional Layer for Short Range Weather Prediction Benjamin Klein, Lior Wolf and Yehuda Afek The Blavatnik School of Computer Science Tel Aviv University beni.klein@gmail.com, wolf@cs.tau.ac.il,

### Going Deeper with Convolutions

Going Deeper with Convolutions Christian Szegedy 1, Wei Liu 2, Yangqing Jia 1, Pierre Sermanet 1, Scott Reed 3, Dragomir Anguelov 1, Dumitru Erhan 1, Vincent Vanhoucke 1, Andrew Rabinovich 4 1 Google Inc.

### Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun arxiv:1406.4729v4 [cs.cv] 23 Apr 2015 Abstract Existing deep convolutional

### Deep Fisher Networks and Class Saliency Maps for Object Classification and Localisation

Deep Fisher Networks and Class Saliency Maps for Object Classification and Localisation Karén Simonyan, Andrea Vedaldi, Andrew Zisserman Visual Geometry Group, University of Oxford Outline Classification

### arxiv:1505.04597v1 [cs.cv] 18 May 2015

U-Net: Convolutional Networks for Biomedical Image Segmentation Olaf Ronneberger, Philipp Fischer, and Thomas Brox arxiv:1505.04597v1 [cs.cv] 18 May 2015 Computer Science Department and BIOSS Centre for

### Supporting Online Material for

www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence

### Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection

CSED703R: Deep Learning for Visual Recognition (206S) Lecture 6: CNNs for Detection, Tracking, and Segmentation Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 3 Object detection

### Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks Dong-Hyun Lee sayit78@gmail.com Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul,

### Regularization of Neural Networks using DropConnect

Regularization of Neural Networks using DropConnect Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus Dept. of Computer Science, Courant Institute of Mathematical Science, New York University

### Learning to Process Natural Language in Big Data Environment

CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah's Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview

### Image Super-Resolution Using Deep Convolutional Networks

Image Super-Resolution Using Deep Convolutional Networks Chao Dong, Chen Change Loy, Member, IEEE, Kaiming He, Member, IEEE, and Xiaoou Tang, Fellow, IEEE arxiv:1501.00092v3 [cs.cv] 31 Jul 2015 Abstract

### CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong Jan 26, 2016 Today Administrivia A bigger picture and some common questions Object detection proposals, by Samer

### Fixed Point Quantization of Deep Convolutional Networks

Darryl D. Lin Qualcomm Research, San Diego, CA 922, USA Sachin S. Talathi Qualcomm Research, San Diego, CA 922, USA DARRYL.DLIN@GMAIL.COM TALATHI@GMAIL.COM V. Sreekanth Annapureddy SREEKANTHAV@GMAIL.COM

### Two-Stream Convolutional Networks for Action Recognition in Videos

Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman Visual Geometry Group, University of Oxford {karen,az}@robots.ox.ac.uk Abstract We investigate architectures

### Network Morphism. Abstract. 1. Introduction. Tao Wei

Tao Wei TAOWEI@BUFFALO.EDU Changhu Wang CHW@MICROSOFT.COM Yong Rui YONGRUI@MICROSOFT.COM Chang Wen Chen CHENCW@BUFFALO.EDU Microsoft Research, Beijing, China, 18 Department of Computer Science and Engineering,

### Denoising Convolutional Autoencoders for Noisy Speech Recognition

Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University mkayser@stanford.edu Victor Zhong Stanford University vzhong@stanford.edu Abstract We propose the use of

### SIGNAL INTERPRETATION

SIGNAL INTERPRETATION Lecture 6: ConvNets February 11, 2016 Heikki Huttunen heikki.huttunen@tut.fi Department of Signal Processing Tampere University of Technology CONVNETS Continued from previous slideset

### Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?

Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not? Erjin Zhou zej@megvii.com Zhimin Cao czm@megvii.com Qi Yin yq@megvii.com Abstract Face recognition performance improves rapidly

### Generating Facial Expressions

Generating Facial Expressions Jonathan Suit Georgia Tech Abstract In this report, I use CAS-PEAL dataset [1] to try and generate facial expressions. That is, I take in, say, a smiling face, and try to

### A brief intro to computer vision and neural networks. 28 April 2016 Pamela Toman

1 A brief intro to computer vision and neural networks 28 April 2016 Pamela Toman 2 At 1:00, you will be able to Competence (able to perform on your own, with varying levels of perfection): Understand

### RCNN, Fast RCNN, Faster RCNN

RCNN, Fast RCNN, Faster RCNN Topics of the lecture: Problem statement Review of slow R-CNN Review of Fast R-CNN Review of Faster R-CNN Presented by: Roi Shikler & Gil Elbaz Advisor: Prof. Michael Lindenbaum

### 3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels

3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels Luís A. Alexandre Department of Informatics and Instituto de Telecomunicações Univ. Beira Interior,

### Scalable Object Detection by Filter Compression with Regularized Sparse Coding

Scalable Object Detection by Filter Compression with Regularized Sparse Coding Ting-Hsuan Chao, Yen-Liang Lin, Yin-Hsi Kuo, and Winston H Hsu National Taiwan University, Taipei, Taiwan Abstract For practical

### Observer Localization using ConvNets

Observer Localization using ConvNets (Stanford CS231n class project, Winter 2016) Alon Kipnis Department of Electrical Engineering Stanford University 350 Serra Mall, Stanford, CA kipnisal@stanford.edu

### Scene recognition with CNNs: objects, scales and dataset bias

Scene recognition with CNNs: objects, scales and dataset bias Luis Herranz, Shuqiang Jiang, Xiangyang Li Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS) Institute

### CSC321 Introduction to Neural Networks and Machine Learning. Lecture 21 Using Boltzmann machines to initialize backpropagation.

CSC321 Introduction to Neural Networks and Machine Learning Lecture 21 Using Boltzmann machines to initialize backpropagation Geoffrey Hinton Some problems with backpropagation The amount of information

### Towards a Deep Learning Framework for Unconstrained Face Detection

Towards a Deep Learning Framework for Unconstrained Face Detection Yutong Zheng Chenchen Zhu Khoa Luu Chandrasekhar Bhagavatula T. Hoang Ngan Le Marios Savvides CyLab Biometrics Center and the Department

### Deep Learning using Linear Support Vector Machines

Yichuan Tang tang@cs.toronto.edu Department of Computer Science, University of Toronto. Toronto, Ontario, Canada. Abstract Recently, fully-connected and convolutional neural networks have been trained

### Deep Face Recognition

PARKHI et al.: DEEP FACE RECOGNITION 1 Deep Face Recognition Omkar M. Parkhi omkar@robots.ox.ac.uk Andrea Vedaldi vedaldi@robots.ox.ac.uk Andrew Zisserman az@robots.ox.ac.uk Visual Geometry Group Department

### Recognizing Handwritten Digits and Characters

Recognizing Handwritten Digits and Characters Vishnu Sundaresan Stanford University vishnu@stanford.edu Jasper Lin Stanford University jasperlin@stanford.edu Abstract Our project is meant to implement

### Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun arxiv:1506.01497v3 [cs.cv] 6 Jan 2016 Abstract State-of-the-art object

### Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition Kaiming He 1, Xiangyu Zhang 2, Shaoqing Ren 3, and Jian Sun 1 1 Microsoft Research 2 Xi'an Jiaotong University 3 University

### Monza: Image Classification of Vehicle Make and Model Using Convolutional Neural Networks and Transfer Learning

Monza: Image Classification of Vehicle Make and Model Using Convolutional Neural Networks and Transfer Learning Derrick Liu Stanford University lediur@stanford.edu Yushi Wang Stanford University yushiw@stanford.edu

### Convolutional Neural Networks for Named Entity Recognition in Images of Documents

Convolutional Neural Networks for Named Entity Recognition in Images of Documents Master Thesis Jan van de Kerkhof Supervisors: Roelof Pieters (KTH), Erik Rehn (Dooer), David Hallvig (Dooer) Stockholm

### Rectified Linear Units Improve Restricted Boltzmann Machines

Rectified Linear Units Improve Restricted Boltzmann Machines Vinod Nair vnair@cs.toronto.edu Geoffrey E. Hinton hinton@cs.toronto.edu Department of Computer Science, University of Toronto, Toronto, ON

### Do Deep Nets Really Need to be Deep?

Do Deep Nets Really Need to be Deep? Lei Jimmy Ba University of Toronto jimmy@psi.utoronto.ca Rich Caruana Microsoft Research rcaruana@microsoft.com Abstract Currently, deep neural networks are the state

### HE Shuncheng hsc12@outlook.com. March 20, 2016

Department of Automation Association of Science and Technology of Automation March 20, 2016 Contents Binary Figure 1: a cat? Figure 2: a dog? Binary : Given input data x (e.g. a picture), the output of

### Multi-view Face Detection Using Deep Convolutional Neural Networks

Multi-view Face Detection Using Deep Convolutional Neural Networks Sachin Sudhakar Farfade Yahoo fsachin@yahoo-inc.com Mohammad Saberian Yahoo saberian@yahoo-inc.com Li-Jia Li Yahoo lijiali@cs.stanford.edu

### Programming Exercise 3: Multi-class Classification and Neural Networks

Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks

### SSD: Single Shot MultiBox Detector

SSD: Single Shot MultiBox Detector Wei Liu 1, Dragomir Anguelov 2, Dumitru Erhan 3, Christian Szegedy 3, Scott Reed 4, Cheng-Yang Fu 1, Alexander C. Berg 1 1 UNC Chapel Hill 2 Zoox Inc. 3 Google Inc. 4

### Visual Learning of Arithmetic Operations

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) Visual Learning of Arithmetic Operations Yedid Hoshen and Shmuel Peleg School of Computer Science and Engineering The Hebrew

### Implementation of Neural Networks with Theano. http://deeplearning.net/tutorial/

Implementation of Neural Networks with Theano http://deeplearning.net/tutorial/ Feed Forward Neural Network (MLP) Hidden Layer Object Hidden Layer Object Hidden Layer Object Logistic Regression Object

### Gnumpy: an easy way to use GPU boards in Python

Department of Computer Science 6 King's College Rd, Toronto University of Toronto M5S 3G4, Canada http://learning.cs.toronto.edu fax: +1 416 978 1455 July 9, 2010 UTML TR 2010 002 Gnumpy: an easy way to

### 6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

### Lecture 6. Artificial Neural Networks

Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm

### Efficient online learning of a non-negative sparse autoencoder

and Machine Learning. Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-93030-10-2. Efficient online learning of a non-negative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil

### Return of the Devil in the Details: Delving Deep into Convolutional Nets

CHATFIELD ET AL.: RETURN OF THE DEVIL 1 Return of the Devil in the Details: Delving Deep into Convolutional Nets Ken Chatfield ken@robots.ox.ac.uk Karen Simonyan karen@robots.ox.ac.uk Andrea Vedaldi vedaldi@robots.ox.ac.uk

### Do Convnets Learn Correspondence?

Do Convnets Learn Correspondence? Jonathan Long Ning Zhang Trevor Darrell University of California Berkeley {jonlong, nzhang, trevor}@cs.berkeley.edu Abstract Convolutional neural nets (convnets) trained

### Pedestrian Detection using R-CNN

Pedestrian Detection using R-CNN CS676A: Computer Vision Project Report Advisor: Prof. Vinay P. Namboodiri Deepak Kumar Mohit Singh Solanki (12228) (12419) Group-17 April 15, 2016 Abstract Pedestrian detection

### TRANSFER LEARNING USING CONVOLUTIONAL NEURAL NETWORKS FOR OBJECT CLASSIFICATION WITHIN X-RAY BAGGAGE SECURITY IMAGERY

TRANSFER LEARNING USING CONVOLUTIONAL NEURAL NETWORKS FOR OBJECT CLASSIFICATION WITHIN X-RAY BAGGAGE SECURITY IMAGERY Samet Akçay 1, Mikolaj E. Kundegorski 1, Michael Devereux 2, Toby P. Breckon 1 1 Durham

### Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems

### INTRODUCTION TO NEURAL NETWORKS

INTRODUCTION TO NEURAL NETWORKS Pictures are taken from http://www.cs.cmu.edu/~tom/mlbook-chapter-slides.html http://research.microsoft.com/~cmbishop/prml/index.htm By Nobel Khandaker Neural Networks An

### On the Number of Linear Regions of Deep Neural Networks

On the Number of Linear Regions of Deep Neural Networks Guido Montúfar Max Planck Institute for Mathematics in the Sciences montufar@mis.mpg.de Razvan Pascanu Université de Montréal pascanur@iro.umontreal.ca

### Selecting Receptive Fields in Deep Networks

Selecting Receptive Fields in Deep Networks Adam Coates Department of Computer Science Stanford University Stanford, CA 94305 acoates@cs.stanford.edu Andrew Y. Ng Department of Computer Science Stanford

### Object Detection in Video using Faster R-CNN

Object Detection in Video using Faster R-CNN Prajit Ramachandran University of Illinois at Urbana-Champaign prmchnd2@illinois.edu Abstract Convolutional neural networks (CNN) currently dominate the computer

### arxiv:1604.08893v1 [cs.cv] 29 Apr 2016

Faster R-CNN Features for Instance Search Amaia Salvador, Xavier Giró-i-Nieto, Ferran Marqués Universitat Politècnica de Catalunya (UPC) Barcelona, Spain {amaia.salvador,xavier.giro}@upc.edu Shin'ichi

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Task-driven Progressive Part Localization for Fine-grained Recognition

Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He chenhuang@mail.missouri.edu University of Missouri hezhi@missouri.edu Abstract In this paper we propose a task-driven

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information

### The Delicate Art of Flower Classification

The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC pvicol@sfu.ca Note: The following is my contribution to a group project for a graduate machine learning

### Learning Deep Hierarchies of Representations

Learning Deep Hierarchies of Representations Yoshua Bengio, U. Montreal Google Research, Mountain View, California September 23rd, 2009 Thanks to: Aaron Courville, Pascal Vincent, Dumitru Erhan, Olivier

### Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks

This version: December 12, 2013 Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks Lawrence Takeuchi * Yu-Ying (Albert) Lee ltakeuch@stanford.edu yy.albert.lee@gmail.com Abstract We

### Deep Learning Face Representation by Joint Identification-Verification

Deep Learning Face Representation by Joint Identification-Verification Yi Sun 1 Yuheng Chen 2 Xiaogang Wang 3,4 Xiaoou Tang 1,4 1 Department of Information Engineering, The Chinese University of Hong Kong

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu