Deep Residual Networks
1 Deep Residual Networks: Deep Learning Gets Way Deeper. ICML 2016 tutorial, 8:30-10:30am, June 19. Kaiming He, Facebook AI Research* (*as of July; formerly affiliated with Microsoft Research Asia). [Figure: layer-by-layer diagram of a deep ResNet, from 7x7 conv, 64, /2 through ave pool, fc 1000.]
2 Overview. Introduction; Background (from shallow to deep); Deep Residual Networks (from 10 layers to 100 layers; from 100 layers to 1000 layers); Applications; Q & A.
3 Introduction
4 Introduction. Deep Residual Networks (ResNets): "Deep Residual Learning for Image Recognition", CVPR 2016 (next week). A simple and clean framework for training very deep nets. State-of-the-art performance for image classification, object detection, semantic segmentation, and more. Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
5 ILSVRC & COCO 2015 Competitions. 1st place in all five main tracks. ImageNet Classification: "ultra-deep" 152-layer nets. ImageNet Detection: 16% better than 2nd. ImageNet Localization: 27% better than 2nd. COCO Detection: 11% better than 2nd. COCO Segmentation: 12% better than 2nd. (*improvements are relative numbers)
6 Revolution of Depth. [Figure: ImageNet classification top-5 error (%) per ILSVRC entry: ILSVRC'10 and ILSVRC'11 (shallow), ILSVRC'12 AlexNet (8 layers), ILSVRC'13, ILSVRC'14 VGG (19 layers), ILSVRC'14 GoogleNet (22 layers), ILSVRC'15 ResNet (152 layers).]
7 Revolution of Depth. AlexNet, 8 layers (ILSVRC 2012): 11x11 conv, 96, /4, pool/2; 5x5 conv, 256, pool/2; 3x3 conv, 384; 3x3 conv, 384, pool/2; fc, 4096; fc, 4096; fc, 1000.
8 Revolution of Depth. AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); GoogleNet, 22 layers (ILSVRC 2014). [Figure: side-by-side layer diagrams of the three networks.]
9 Revolution of Depth. AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); ResNet, 152 layers (ILSVRC 2015). [Figure: side-by-side layer diagrams of the three networks.]
10 Revolution of Depth. Engines of visual recognition. [Figure: PASCAL VOC 2007 object detection mAP (%) for HOG, DPM (shallow); AlexNet (RCNN), 8 layers; VGG (RCNN), 16 layers; ResNet (Faster RCNN)*, 101 layers. *w/ other improvements & more data.]
11 ResNet's object detection result on COCO. *the original image is from the COCO dataset
12 Very simple, easy to follow. Many third-party implementations: Facebook AI Research's Torch ResNet; Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves; Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code; Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves; Torch, MNIST, 100 layers; a winning entry in Kaggle's right whale recognition challenge; Neon, Place2 (mini), 40 layers. Easily reproduced results (e.g. Torch ResNet). A series of extensions and follow-ups: > 200 citations in 6 months after being posted on arXiv (Dec. 2015).
13 Background From shallow to deep
14 Traditional recognition, from shallower to deeper pipelines. pixels → classifier → "bus?". edges → classifier → "bus?". SIFT/HOG: edges → histogram → classifier → "bus?". edges → K-means/sparse code → histogram → classifier → "bus?". But what's next?
15 Deep Learning. Specialized components, domain knowledge required: edges → K-means/sparse code → histogram → classifier → "bus?". Generic components ("layers"), less domain knowledge. Repeat elementary layers => going deeper. End-to-end learning. Richer solution space.
16 Spectrum of Depth (shallower to deeper). 5 layers: easy. >10 layers: initialization, Batch Normalization. >30 layers: skip connections. >100 layers: identity skip connections. >1000 layers: ?
17 Initialization. Input $X$, weight $W$, output $Y = WX$, with $n^{in}$ inputs and $n^{out}$ outputs. 1-layer: $\mathrm{Var}[y] = (n^{in}\,\mathrm{Var}[w])\,\mathrm{Var}[x]$. Multi-layer: if the activation is linear and $x$, $y$, $w$ are independent, then $\mathrm{Var}[y] = \left(\prod_d n_d^{in}\,\mathrm{Var}[w_d]\right)\mathrm{Var}[x]$. LeCun et al. 1998, "Efficient Backprop"; Glorot & Bengio 2010, "Understanding the difficulty of training deep feedforward neural networks".
18 Initialization. Both forward (response) and backward (gradient) signals can vanish/explode. Forward: $\mathrm{Var}[y] = \left(\prod_d n_d^{in}\,\mathrm{Var}[w_d]\right)\mathrm{Var}[x]$. Backward: $\mathrm{Var}\!\left[\frac{\partial E}{\partial x}\right] = \left(\prod_d n_d^{out}\,\mathrm{Var}[w_d]\right)\mathrm{Var}\!\left[\frac{\partial E}{\partial y}\right]$. [Figure: signal magnitude vs. depth, with exploding, ideal, and vanishing regimes.]
19 Initialization under the linear assumption. $\prod_d n_d^{in}\,\mathrm{Var}[w_d] = const_{fw}$ (healthy forward) and $\prod_d n_d^{out}\,\mathrm{Var}[w_d] = const_{bw}$ (healthy backward), obtained with $n_d^{in}\,\mathrm{Var}[w_d] = 1$ or* $n_d^{out}\,\mathrm{Var}[w_d] = 1$ ("Xavier" init in Caffe). *: $n_d^{out} = n_{d+1}^{in}$, so $const_{fw}/const_{bw} = n_1^{in}/n_D^{out} < \infty$ for $D$ layers. It is sufficient to use either form.
20 Initialization under ReLU activation. $\prod_d \tfrac{1}{2} n_d^{in}\,\mathrm{Var}[w_d] = const_{fw}$ (healthy forward) and $\prod_d \tfrac{1}{2} n_d^{out}\,\mathrm{Var}[w_d] = const_{bw}$ (healthy backward), obtained with $\tfrac{1}{2} n_d^{in}\,\mathrm{Var}[w_d] = 1$ or $\tfrac{1}{2} n_d^{out}\,\mathrm{Var}[w_d] = 1$ ("MSRA" init in Caffe). With $D$ layers, a factor of 2 per layer has an exponential impact of $2^D$. Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ICCV 2015.
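The variance argument on the initialization slides can be checked numerically. The following is a minimal sketch, not the authors' experiment: it assumes square fully connected layers with Gaussian weights, and the function name `response_variance`, the width of 256, and the depth of 30 are illustrative choices only.

```python
import numpy as np

def response_variance(depth, n, var_w, relu=True, seed=0):
    """Variance of the last layer's response y = W x (before the activation)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 512))  # 512 random input columns with Var[x] = 1
    y = x
    for _ in range(depth):
        # Zero-mean Gaussian weights with the requested variance Var[w].
        w = rng.normal(0.0, np.sqrt(var_w), size=(n, n))
        y = w @ x
        x = np.maximum(y, 0.0) if relu else y
    return float(y.var())

n, depth = 256, 30
xavier_linear = response_variance(depth, n, 1.0 / n, relu=False)  # n Var[w] = 1
xavier_relu = response_variance(depth, n, 1.0 / n, relu=True)     # loses ~1/2 per layer
msra_relu = response_variance(depth, n, 2.0 / n, relu=True)       # (1/2) n Var[w] = 1
print(xavier_linear, xavier_relu, msra_relu)
```

Under these assumptions the Xavier setting keeps the response variance roughly constant only in the linear case; with ReLU it shrinks by about a factor of 2 per layer, while the $\tfrac{1}{2} n\,\mathrm{Var}[w] = 1$ setting stays at order 1.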
21 Initialization. 22-layer net: good init converges faster. 30-layer net: good init is able to converge. [Figures: training curves at the beginning of training, comparing ours, $\tfrac{1}{2} n\,\mathrm{Var}[w] = 1$, with Xavier, $n\,\mathrm{Var}[w] = 1$.]
22 Batch Normalization (BN). Normalizing input (LeCun et al. 1998, "Efficient Backprop"). BN: normalizing each layer, for each mini-batch. Greatly accelerates training. Less sensitive to initialization. Improves regularization. S. Ioffe & C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML 2015
23 Batch Normalization (BN), applied to a layer's response $x$: $\hat{x} = \frac{x - \mu}{\sigma}$, $y = \gamma\hat{x} + \beta$. $\mu$: mean of $x$ in mini-batch. $\sigma$: std of $x$ in mini-batch. $\gamma$: scale. $\beta$: shift. $\mu$, $\sigma$: functions of $x$, analogous to responses. $\gamma$, $\beta$: parameters to be learned, analogous to weights.
24 Batch Normalization (BN): $\hat{x} = \frac{x - \mu}{\sigma}$, $y = \gamma\hat{x} + \beta$. 2 modes of BN. Train mode: $\mu$, $\sigma$ are functions of $x$; backprop gradients. Test mode: $\mu$, $\sigma$ are pre-computed* on the training set. Caution: make sure your BN is in the correct mode. *: by running average, or post-processing after training.
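The two BN modes can be sketched in a few lines of NumPy. This is an illustrative sketch only: the class name `BatchNorm1D`, the `momentum` value for the running average, and the small `eps` added for numerical stability are assumptions that do not appear on the slide, and no backward pass is implemented.

```python
import numpy as np

class BatchNorm1D:
    """Forward pass of BN over a (batch, features) array, in train or test mode."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)   # scale, learned in a real network
        self.beta = np.zeros(num_features)   # shift, learned in a real network
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, train):
        if train:
            # Train mode: mu and sigma are functions of the current mini-batch.
            mu, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mu
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Test mode: mu and sigma are the pre-computed running statistics.
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

Calling the layer with `train=True` on a single test sample would normalize that sample against itself, which is the kind of mode mistake the caution on the slide warns about.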
25 Batch Normalization (BN). [Figure taken from S. Ioffe & C. Szegedy: accuracy vs. iter., with curves labelled "best of w/ BN" and "w/o BN".]
26 Deep Residual Networks From 10 layers to 100 layers
27 Going Deeper. Initialization algorithms; Batch Normalization. Is learning better networks as simple as stacking more layers?
28 Simply stacking layers? [Figure: CIFAR-10 train error (%) and test error (%) vs. iter. (1e4) for a 20-layer and a 56-layer plain net.] Plain nets: stacking 3x3 conv layers. The 56-layer net has higher training error and test error than the 20-layer net.
29 Simply stacking layers? [Figure: error (%) vs. iter. (1e4) for plain nets on CIFAR-10 (20 to 56 layers) and on ImageNet; solid: test/val, dashed: train.] Overly deep plain nets have higher training error. A general phenomenon, observed in many datasets.
30 A shallower model (18 layers) and a deeper counterpart (34 layers) with extra layers. Richer solution space: a deeper model should not have higher training error. A solution by construction: original layers copied from a learned shallower model; extra layers set as identity; at least the same training error. Optimization difficulties: solvers cannot find the solution when going deeper.
31 Deep Residual Learning. Plain net, any two stacked layers: x → weight layer → relu → weight layer → relu → $H(x)$. $H(x)$ is any desired mapping; hope the 2 weight layers fit $H(x)$.
32 Deep Residual Learning. Residual net: x → weight layer → relu → weight layer → $F(x)$, added to the identity shortcut $x$, then relu. $H(x)$ is any desired mapping; instead of hoping the 2 weight layers fit $H(x)$, hope the 2 weight layers fit $F(x)$, and let $H(x) = F(x) + x$.
33 Deep Residual Learning. $F(x)$ is a residual mapping w.r.t. identity, with $H(x) = F(x) + x$. If identity were optimal, it is easy to set the weights to 0. If the optimal mapping is closer to identity, it is easier to find the small fluctuations.
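The point about zero weights can be made concrete with a small sketch. It assumes two bias-free fully connected weight layers in place of conv layers, and the function names are hypothetical; it only illustrates $H(x) = F(x) + x$ with $F(x) = W_2\,\mathrm{relu}(W_1 x)$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_block(x, w1, w2):
    """Plain net: the two weight layers must fit H(x) directly."""
    return relu(w2 @ relu(w1 @ x))

def residual_block(x, w1, w2):
    """Residual net: the two weight layers fit F(x), and H(x) = F(x) + x."""
    return relu(w2 @ relu(w1 @ x) + x)

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(8))  # non-negative, like the output of a previous relu
zeros = np.zeros((8, 8))

# With all weights at 0 the residual block passes x through unchanged,
# while the plain block collapses to 0.
print(np.allclose(residual_block(x, zeros, zeros), x))
print(np.allclose(plain_block(x, zeros, zeros), 0.0))
```

In this sketch the identity solution sits at the origin of weight space for the residual block, whereas the plain block would need carefully chosen non-zero weights to reproduce it.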
34 Related Works: Residual Representations. VLAD & Fisher Vector [Jegou et al 2010], [Perronnin et al 2007]: encoding residual vectors; powerful shallower representations. Product Quantization (IVF-ADC) [Jegou et al 2011]: quantizing residual vectors; efficient nearest-neighbor search. MultiGrid & Hierarchical Precondition [Briggs, et al 2000], [Szeliski 1990, 2006]: solving residual sub-problems; efficient PDE solvers.
35 Network Design (plain net vs. ResNet). Keep it simple. Our basic design (VGG-style): all 3x3 conv (almost); spatial size /2 => # filters x2 (~same complexity per layer). Simple design; just deep! Other remarks: no hidden fc; no dropout. [Figure: plain net and ResNet side by side, each from 7x7 conv, 64, /2 and pool, /2 through avg pool and fc 1000.]
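The rule "spatial size /2 => # filters x2" keeps the cost of a 3x3 conv layer constant, which a short calculation shows. The stage sizes below (56x56 with 64 filters down to 7x7 with 512 filters) are taken from the ResNet paper rather than from this slide, and `conv_madds` is a hypothetical helper that counts multiply-adds while ignoring biases and padding effects.

```python
def conv_madds(height, width, c_in, c_out, kernel=3):
    """Multiply-adds of one conv layer producing a height x width output map."""
    return height * width * kernel * kernel * c_in * c_out

# (output spatial size, number of filters) per stage; assumed from the paper.
stages = [(56, 64), (28, 128), (14, 256), (7, 512)]
costs = [conv_madds(size, size, filters, filters) for size, filters in stages]

for (size, filters), cost in zip(stages, costs):
    print(f"{size}x{size}, {filters} filters: {cost} multiply-adds")
```

Halving both spatial dimensions divides the cost by 4, and doubling both the input and output channels multiplies it by 4, so the per-layer cost is unchanged across stages.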
36 Training. All plain/residual nets are trained from scratch. All plain/residual nets use Batch Normalization. Standard hyper-parameters & augmentation.
37 CIFAR-10 experiments. [Figure: error (%) vs. iter. (1e4); left panel: CIFAR-10 plain nets; right panel: CIFAR-10 ResNets (ResNet-20, ResNet-32, ResNet-44, ResNet-56, ResNet-110); solid: test, dashed: train.] Deep ResNets can be trained without difficulties. Deeper ResNets have lower training error, and also lower test error.
38 ImageNet experiments. [Figure: error (%) vs. iter. (1e4); left panel: ImageNet plain nets; right panel: ImageNet ResNets; solid: test, dashed: train.] Deep ResNets can be trained without difficulties. Deeper ResNets have lower training error, and also lower test error.
39 ImageNet experiments. A practical design for going deeper. All-3x3 block on a 64-d input: 3x3, 64 → relu → 3x3, 64, then add and relu. Bottleneck block on a 256-d input (for ResNet-50/101/152): 1x1, 64 → relu → 3x3, 64 → relu → 1x1, 256, then add and relu. The two blocks have similar complexity.
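The "similar complexity" claim for the two block designs can be checked by counting weights. This is a rough sketch that ignores biases and Batch Normalization parameters; `conv_params` is a hypothetical helper, not part of any library.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel conv layer (biases ignored)."""
    return kernel * kernel * c_in * c_out

# All-3x3 block on a 64-d input: 3x3, 64 followed by 3x3, 64.
basic = conv_params(3, 64, 64) + conv_params(3, 64, 64)

# Bottleneck block on a 256-d input: 1x1, 64 then 3x3, 64 then 1x1, 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic, bottleneck)
```

At equal spatial resolution the multiply-add counts scale with these weight counts, so the bottleneck handles a 4x wider input at roughly the same cost.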
40 ImageNet experiments. Deeper ResNets have lower error. [Figure: top-5 val error (%), crop testing, for ResNet-34, ResNet-50, ResNet-101, and ResNet-152; annotation on ResNet-152: this model has lower time complexity than VGG-16/19.]
41 ImageNet experiments. [Figure: ImageNet classification top-5 error (%) per ILSVRC entry: ILSVRC'10 and ILSVRC'11 (shallow), ILSVRC'12 AlexNet (8 layers), ILSVRC'13, ILSVRC'14 VGG (19 layers), ILSVRC'14 GoogleNet (22 layers), ILSVRC'15 ResNet (152 layers).]
42 Discussions Representation, Optimization, Generalization
43 Issues on learning deep models. Representation ability: the ability of the model to fit training data, if the optimum could be found; if model A's solution space is a superset of B's, A should be better. Optimization ability: the feasibility of finding an optimum; not all models are equally easy to optimize. Generalization ability: once training data is fit, how good is the test performance?
44 How do ResNets address these issues? Representation ability: no explicit advantage on representation (only re-parameterization), but ResNets allow models to go deeper. Optimization ability: enable very smooth forward/backward prop; greatly ease optimizing deeper models. Generalization ability: do not explicitly address generalization, but deeper+thinner is good generalization.
45 On the Importance of Identity Mapping From 100 layers to 1000 layers
46 On identity mappings for optimization. A residual unit from $x_l$ to $x_{l+1}$: $x_{l+1} = f(h(x_l) + F(x_l))$. Shortcut mapping: $h$ = identity. After-add mapping: $f$ = ReLU. Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Identity Mappings in Deep Residual Networks. arXiv 2016.
47 On identity mappings for optimization. $x_{l+1} = f(h(x_l) + F(x_l))$. Shortcut mapping: $h$ = identity. After-add mapping: $f$ = ReLU. What if $f$ = identity?
49 Very smooth forward propagation. $x_{l+1} = x_l + F(x_l)$; $x_{l+2} = x_{l+1} + F(x_{l+1})$.
50 Very smooth forward propagation. Substituting: $x_{l+2} = x_l + F(x_l) + F(x_{l+1})$.
51 Very smooth forward propagation. In general: $x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$.
52 Very smooth forward propagation. $x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$. Any $x_l$ is directly forward-propagated to any $x_L$, plus residual. Any $x_L$ is an additive outcome, in contrast to multiplicative: $x_L = \prod_{i=l}^{L-1} W_i\, x_l$.
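The additive form of forward propagation can be verified numerically. In this sketch the residual branch is a stand-in (one weight layer followed by tanh), chosen only for illustration; `residual_fn` and `run_units` are hypothetical names, and $h$ and $f$ are both taken as identity.

```python
import numpy as np

def residual_fn(x, w):
    """Stand-in residual branch F (hypothetical: one weight layer + tanh)."""
    return np.tanh(w @ x)

def run_units(x0, weights):
    """Apply x_{l+1} = x_l + F(x_l) repeatedly and return [x_0, ..., x_L]."""
    xs = [x0]
    for w in weights:
        xs.append(xs[-1] + residual_fn(xs[-1], w))
    return xs

rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, size=(8, 8)) for _ in range(10)]
xs = run_units(rng.standard_normal(8), weights)

# x_L equals x_l plus the sum of all residuals computed between l and L - 1.
l, L = 3, len(weights)
unrolled = xs[l] + sum(residual_fn(xs[i], weights[i]) for i in range(l, L))
print(np.allclose(xs[L], unrolled))
```

The same check works for any pair $l < L$, which is what "any $x_l$ is directly forward-propagated to any $x_L$" means in practice.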
53 Very smooth backward propagation. From $x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$: $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\frac{\partial x_L}{\partial x_l} = \frac{\partial E}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i)\right)$.
54 Very smooth backward propagation. $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i)\right)$. Any $\frac{\partial E}{\partial x_L}$ is directly back-propagated to any $\frac{\partial E}{\partial x_l}$, plus residual. Any $\frac{\partial E}{\partial x_l}$ is additive; unlikely to vanish, in contrast to multiplicative: $\frac{\partial E}{\partial x_l} = \prod_{i=l}^{L-1} W_i\, \frac{\partial E}{\partial x_L}$.
55 Residual for every layer. Forward: $x_L = x_l + \sum_{i=l}^{L-1} F(x_i)$. Backward: $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i)\right)$. Enabled by: shortcut mapping $h$ = identity; after-add mapping $f$ = identity.
56 Experiments. Set 1: what if shortcut mapping $h \neq$ identity? Set 2: what if after-add mapping $f$ is identity? Experiments on ResNets with more than 100 layers: deeper models suffer more from optimization difficulty.
57 Experiment Set 1: what if shortcut mapping $h \neq$ identity?
58 Shortcut variants, ResNet-110 on CIFAR-10. (a) original, $h(x) = x$: error 6.6%. (b) constant scaling, $h(x) = 0.5x$: error 12.4%. (c) exclusive gating*, $h(x) = \mathrm{gate} \cdot x$: error 8.7%. (d) shortcut-only gating, $h(x) = \mathrm{gate} \cdot x$: error 12.9%. (e) conv shortcut, $h(x) = \mathrm{conv}(x)$: error 12.2%. (f) dropout shortcut, $h(x) = \mathrm{dropout}(x)$: error > 20%. *similar to Highway Network
59 Shortcut variants, ResNet-110 on CIFAR-10. (a) original, $h(x) = x$: error 6.6%. (b) constant scaling: 12.4%. (c) exclusive gating: 8.7%. (d) shortcut-only gating: 12.9%. (e) conv shortcut: 12.2%. (f) dropout shortcut: > 20%. Annotation: shortcuts are blocked by multiplications.
60 If $h$ is multiplicative, e.g. $h(x) = \lambda x$*. Forward: $x_L = \lambda^{L-l} x_l + \sum_{i=l}^{L-1} \hat{F}(x_i)$. Backward: $\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\left(\lambda^{L-l} + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} \hat{F}(x_i)\right)$. If $h$ is multiplicative, shortcuts are blocked and direct propagation is decayed. *assuming $f$ = identity
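The size of the $\lambda^{L-l}$ factor is easy to underestimate, so here is a short calculation. It assumes 54 residual units, which is the unit count of ResNet-110 on CIFAR-10 according to the paper's 6n+2 layer rule with n = 18; `shortcut_gain` is a hypothetical helper name.

```python
def shortcut_gain(lam, num_units):
    """Factor lam**(L - l) applied to the direct shortcut term across num_units units."""
    return lam ** num_units

# ResNet-110 on CIFAR-10 stacks 54 two-layer residual units (assumed from the paper).
num_units = 54
for lam in (1.0, 0.5, 1.5):
    print(f"lambda = {lam}: direct term scaled by {shortcut_gain(lam, num_units):.3e}")
```

Only $\lambda = 1$ leaves the direct term untouched; a constant scaling of 0.5 shrinks it to a negligible value over the full depth, and a scaling above 1 makes it blow up.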
61 [Figure: training curves for "h is gating" vs. "h is identity"; solid: test, dashed: train.] Gating should have better representation ability (identity is a special case), but optimization difficulty dominates the results.
62 Experiment Set 2: what if after-add mapping $f$ is identity?
63 Three residual units from $x_l$ to $x_{l+1}$. $f$ is ReLU (original ResNet): weight → BN → ReLU → weight → BN → addition → ReLU. $f$ is BN+ReLU: weight → BN → ReLU → weight → addition → BN → ReLU. $f$ is identity (pre-activation ResNet): BN → ReLU → weight → BN → ReLU → weight → addition.
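The difference between the original and pre-activation orderings can be sketched with fully connected layers standing in for conv layers. This is an illustration under simplifying assumptions: `batch_norm` here normalizes over the batch axis without learned scale or shift, and all function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def batch_norm(z, eps=1e-5):
    """Train-mode normalization over the batch axis (no learned scale/shift)."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

def original_unit(x, w1, w2):
    """weight -> BN -> ReLU -> weight -> BN -> addition -> ReLU (f = ReLU)."""
    out = relu(batch_norm(x @ w1))
    out = batch_norm(out @ w2)
    return relu(x + out)

def preact_unit(x, w1, w2):
    """BN -> ReLU -> weight -> BN -> ReLU -> weight -> addition (f = identity)."""
    out = relu(batch_norm(x)) @ w1
    out = relu(batch_norm(out)) @ w2
    return x + out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))          # (batch, features)
w1 = rng.normal(0.0, 0.5, size=(4, 4))
w2 = rng.normal(0.0, 0.5, size=(4, 4))
print(original_unit(x, w1, w2).min(), preact_unit(x, w1, w2).min())
```

In the pre-activation unit nothing is applied after the addition, so the shortcut carries $x_l$ to $x_{l+1}$ unchanged; in the original unit the trailing ReLU clips the sum.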
64 [Figure: training curves for $f$ = BN+ReLU vs. $f$ = ReLU; solid: test, dashed: train.] With $f$ = BN+ReLU, BN could block prop. Keep the shortest path as smooth as possible.
65 1001-layer ResNets on CIFAR-10. [Figure: training curves for $f$ = ReLU vs. $f$ = identity; solid: test, dashed: train.] ReLU could block prop when there are 1000 layers. The pre-activation design eases optimization (and improves generalization; see paper).
66 Comparisons on CIFAR-10/100 (*all based on moderate augmentation). CIFAR-10 error (%): NIN 8.81; DSN 8.22; FitNet 8.39; Highway 7.72; ResNet-110 (1.7M) 6.61; ResNet-1202 (19.4M) 7.93; ResNet-164, pre-activation (1.7M) 5.46; ResNet-1001, pre-activation (10.2M) 4.92 (4.89±0.14). CIFAR-100 error (%), methods compared: NIN; DSN; FitNet; Highway; ResNet-164 (1.7M); ResNet-1001 (10.2M); ResNet-164, pre-activation (1.7M); ResNet-1001, pre-activation (10.2M), (22.68±0.22).
67 ImageNet experiments: single-crop (320x320) val error, top-1 and top-5 (%), by method and data augmentation. Methods compared: ResNet-152, original (scale); ResNet-152, pre-activation (scale); ResNet-200, original (scale); ResNet-200, pre-activation (scale); ResNet-200, pre-activation (scale + aspect ratio): top-1 20.1*, top-5 4.8*. *independently reproduced; training code and models available.
68 Summary of observations. Keep the shortest path as smooth as possible by making $h$ and $f$ identity: forward/backward signals directly flow through this path. Features of any layers are additive outcomes. 1000-layer ResNets can be easily trained and have better accuracy.
69 Future Works. Representation: skipping 1 layer vs. multiple layers? Flat vs. bottleneck? Inception-ResNet [Szegedy et al 2016]; ResNet in ResNet [Targ et al 2016]; width vs. depth [Zagoruyko & Komodakis 2016]. Generalization: DropOut, MaxOut, DropConnect, Drop Layer (Stochastic Depth) [Huang et al 2016]. Optimization: without residual/shortcut?
70 Applications Features matter
71 "Features matter." (quote [Girshick et al. 2014], the R-CNN paper). [Table: 2nd-place winner vs. ResNets, with relative margin, for ImageNet Localization (top-5 error), ImageNet Detection, COCO Detection, and COCO Segmentation; the localization row is annotated "absolute 8.5% better!".] Our results are all based on ResNet-101. Deeper features are well transferable.
72 Revolution of Depth. Engines of visual recognition. [Figure: PASCAL VOC 2007 object detection mAP (%) for HOG, DPM (shallow); AlexNet (RCNN), 8 layers; VGG (RCNN), 16 layers; ResNet (Faster RCNN)*, 101 layers. *w/ other improvements & more data.]
73 Deep Learning for Computer Vision. Pre-train a classification network (the backbone structure) on ImageNet data; fine-tune its features on target data in a detection network (e.g. R-CNN), a segmentation network (e.g. FCN), a human pose estimation network, a depth estimation network, ...
74 Example: Object Detection. Image classification (what?): ✓ boat, ✓ person. Object detection (what + where?).
75 Object Detection: R-CNN (Region-based CNN pipeline). Input image → region proposals (~2,000) → warped region → 1 CNN for each region → classify regions (aeroplane? no. ... person? yes. ... tvmonitor? no.). Figure credit: R. Girshick et al. Girshick, Donahue, Darrell, Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014
76 Object Detection: R-CNN. [Diagram, bottom to top: image; pre-computed Regions-of-Interest (RoIs); a separate CNN per RoI; one feature per RoI; labelled "End-to-End training".]
77 Object Detection: Fast R-CNN. [Diagram, bottom to top: image; CNN (shared conv layers); pre-computed Regions-of-Interest (RoIs); RoI pooling; one feature per RoI; labelled "End-to-End training".] Girshick. Fast R-CNN. ICCV 2015
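RoI pooling turns regions of different sizes on the shared feature map into fixed-size features. The sketch below is a simplified illustration, not the Fast R-CNN implementation: it assumes RoIs are already given in integer feature-map coordinates, uses a plain grid split via `np.linspace`, and the name `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one RoI of a (C, H, W) feature map into a (C, out_size, out_size) grid.

    roi = (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    channels, height, width = region.shape
    # Bin edges that split the region into an out_size x out_size grid.
    ys = np.linspace(0, height, out_size + 1).astype(int)
    xs = np.linspace(0, width, out_size + 1).astype(int)
    out = np.empty((channels, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Each bin covers at least one cell, even for very small RoIs.
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

feature_map = np.arange(36, dtype=float).reshape(1, 6, 6)
small = roi_max_pool(feature_map, (0, 0, 4, 4))
wide = roi_max_pool(feature_map, (1, 1, 6, 4))
print(small.shape, wide.shape)
```

Both RoIs yield the same output shape, which is what lets a single set of shared conv layers feed one classifier for every region.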
78 Object Detection: Faster R-CNN. Solely based on CNN. No external modules. Each step is end-to-end. [Diagram, bottom to top: image; CNN; feature map; Region Proposal Net producing proposals; RoI pooling; features; labelled "End-to-End training".] Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015.
79 Object Detection. Pre-train a classification network (the backbone structure) on ImageNet data; fine-tune a detection network on detection data. Plug-in features: AlexNet, VGG-16, GoogleNet, ResNet-101, ... Detectors: R-CNN, Fast R-CNN, Faster R-CNN, MultiBox, SSD, ... Features and detectors are independently developed.
80 Object Detection: simply "Faster R-CNN + ResNet". [Table: COCO detection results for the Faster R-CNN baseline, mAP@.5 and mAP@.5:.95, with VGG-16 and ResNet-101.] ResNet-101 has a 28% relative gain vs. VGG-16.
81 Object Detection. RPN learns proposals by extremely deep nets; we use only 300 proposals (no hand-designed proposals). Add components: iterative localization; context modeling; multi-scale testing. All are based on CNN features; all are end-to-end. All benefit more from deeper features: cumulative gains!
82 ResNet's object detection result on COCO. *the original image is from the COCO dataset
83 *the original image is from the COCO dataset
84 *the original image is from the COCO dataset
85 Results on real video. Models trained on MS COCO (80 categories). (frame-by-frame; no temporal processing) This video is available online.
86 More Visual Recognition Tasks. ResNet-based methods lead on these benchmarks (incomplete list): ImageNet classification, detection, localization; MS COCO detection, segmentation; PASCAL VOC detection, segmentation; human pose estimation [Newell et al 2016]; depth estimation [Laina et al 2016]; segment proposal [Pinheiro et al 2016]. [Screenshots: PASCAL segmentation leaderboard and PASCAL detection leaderboard, each with a ResNet-101 entry marked.]
87 Potential Applications. ResNets have shown outstanding or promising results on: visual recognition; image generation (Pixel RNN, Neural Art, etc.); natural language processing (very deep CNN); speech recognition (preliminary results); advertising, user prediction (preliminary results).
88 Conclusions of the Tutorial. Deep Residual Learning: ultra deep networks can be easy to train; ultra deep networks can gain accuracy from depth; ultra deep representations are well transferable. Now 200 layers on ImageNet and 1000 layers on CIFAR!
89 Resources: Models and Code. Our ImageNet models in Caffe. Many available implementations: Facebook AI Research's Torch ResNet; Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves; Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code; Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves; Torch, MNIST, 100 layers; a winning entry in Kaggle's right whale recognition challenge; Neon, Place2 (mini), 40 layers; ... Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016. Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. Identity Mappings in Deep Residual Networks. arXiv 2016.
More informationBert Huang Department of Computer Science Virginia Tech
This paper was submitted as a final project report for CS6424/ECE6424 Probabilistic Graphical Models and Structured Prediction in the spring semester of 2016. The work presented here is done by students
More informationCompacting ConvNets for end to end Learning
Compacting ConvNets for end to end Learning Jose M. Alvarez Joint work with Lars Pertersson, Hao Zhou, Fatih Porikli. Success of CNN Image Classification Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton,
More informationPedestrian Detection with RCNN
Pedestrian Detection with RCNN Matthew Chen Department of Computer Science Stanford University mcc17@stanford.edu Abstract In this paper we evaluate the effectiveness of using a Region-based Convolutional
More informationMulticoreWare. Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan
1 MulticoreWare Global Company, 250+ employees HQ = Sunnyvale, CA Other locations: US, China, India, Taiwan Focused on Heterogeneous Computing Multiple verticals spawned from core competency Machine Learning
More informationarxiv:1502.01852v1 [cs.cv] 6 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun arxiv:1502.01852v1 [cs.cv] 6 Feb 2015 Abstract Rectified activation
More informationDeformable Part Models with CNN Features
Deformable Part Models with CNN Features Pierre-André Savalle 1, Stavros Tsogkas 1,2, George Papandreou 3, Iasonas Kokkinos 1,2 1 Ecole Centrale Paris, 2 INRIA, 3 TTI-Chicago Abstract. In this work we
More informationPedestrian Detection using R-CNN
Pedestrian Detection using R-CNN CS676A: Computer Vision Project Report Advisor: Prof. Vinay P. Namboodiri Deepak Kumar Mohit Singh Solanki (12228) (12419) Group-17 April 15, 2016 Abstract Pedestrian detection
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong Jan 26, 2016 Today Administrivia A bigger picture and some common questions Object detection proposals, by Samer
More informationAdministrivia. Traditional Recognition Approach. Overview. CMPSCI 370: Intro. to Computer Vision Deep learning
: Intro. to Computer Vision Deep learning University of Massachusetts, Amherst April 19/21, 2016 Instructor: Subhransu Maji Finals (everyone) Thursday, May 5, 1-3pm, Hasbrouck 113 Final exam Tuesday, May
More informationSteven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg
Steven C.H. Hoi School of Information Systems Singapore Management University Email: chhoi@smu.edu.sg Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual
More informationFaster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
1 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun arxiv:1506.01497v3 [cs.cv] 6 Jan 2016 Abstract State-of-the-art object
More informationHE Shuncheng hsc12@outlook.com. March 20, 2016
Department of Automation Association of Science and Technology of Automation March 20, 2016 Contents Binary Figure 1: a cat? Figure 2: a dog? Binary : Given input data x (e.g. a picture), the output of
More informationSIGNAL INTERPRETATION
SIGNAL INTERPRETATION Lecture 6: ConvNets February 11, 2016 Heikki Huttunen heikki.huttunen@tut.fi Department of Signal Processing Tampere University of Technology CONVNETS Continued from previous slideset
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Deep Learning Barnabás Póczos & Aarti Singh Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey
More informationImage Classification for Dogs and Cats
Image Classification for Dogs and Cats Bang Liu, Yan Liu Department of Electrical and Computer Engineering {bang3,yan10}@ualberta.ca Kai Zhou Department of Computing Science kzhou3@ualberta.ca Abstract
More informationLearning and transferring mid-level image representions using convolutional neural networks
Willow project-team Learning and transferring mid-level image representions using convolutional neural networks Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic 1 Image classification (easy) Is there
More informationEfficient online learning of a non-negative sparse autoencoder
and Machine Learning. Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-93030-10-2. Efficient online learning of a non-negative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil
More informationInstaNet: Object Classification Applied to Instagram Image Streams
InstaNet: Object Classification Applied to Instagram Image Streams Clifford Huang Stanford University chuang8@stanford.edu Mikhail Sushkov Stanford University msushkov@stanford.edu Abstract The growing
More informationLatest Advances in Deep Learning. Yao Chou
Latest Advances in Deep Learning Yao Chou Outline Introduction Images Classification Object Detection R-CNN Traditional Feature Descriptor Selective Search Implementation Latest Application Deep Learning
More informationObject Detection in Video using Faster R-CNN
Object Detection in Video using Faster R-CNN Prajit Ramachandran University of Illinois at Urbana-Champaign prmchnd2@illinois.edu Abstract Convolutional neural networks (CNN) currently dominate the computer
More informationImplementation of Neural Networks with Theano. http://deeplearning.net/tutorial/
Implementation of Neural Networks with Theano http://deeplearning.net/tutorial/ Feed Forward Neural Network (MLP) Hidden Layer Object Hidden Layer Object Hidden Layer Object Logistic Regression Object
More informationGetting Started with Caffe Julien Demouth, Senior Engineer
Getting Started with Caffe Julien Demouth, Senior Engineer What is Caffe? Open Source Framework for Deep Learning http://github.com/bvlc/caffe Developed by the Berkeley Vision and Learning Center (BVLC)
More informationarxiv:1409.1556v6 [cs.cv] 10 Apr 2015
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION Karen Simonyan & Andrew Zisserman + Visual Geometry Group, Department of Engineering Science, University of Oxford {karen,az}@robots.ox.ac.uk
More informationNetwork Morphism. Abstract. 1. Introduction. Tao Wei
Tao Wei TAOWEI@BUFFALO.EDU Changhu Wang CHW@MICROSOFT.COM Yong Rui YONGRUI@MICROSOFT.COM Chang Wen Chen CHENCW@BUFFALO.EDU Microsoft Research, Beijing, China, 18 Department of Computer Science and Engineering,
More informationSSD: Single Shot MultiBox Detector
SSD: Single Shot MultiBox Detector Wei Liu 1, Dragomir Anguelov 2, Dumitru Erhan 3, Christian Szegedy 3, Scott Reed 4, Cheng-Yang Fu 1, Alexander C. Berg 1 1 UNC Chapel Hill 2 Zoox Inc. 3 Google Inc. 4
More informationGoing Deeper with Convolutional Neural Network for Intelligent Transportation
Going Deeper with Convolutional Neural Network for Intelligent Transportation by Tairui Chen A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements
More informationarxiv:1604.08893v1 [cs.cv] 29 Apr 2016
Faster R-CNN Features for Instance Search Amaia Salvador, Xavier Giró-i-Nieto, Ferran Marqués Universitat Politècnica de Catalunya (UPC) Barcelona, Spain {amaia.salvador,xavier.giro}@upc.edu Shin ichi
More informationStochastic Pooling for Regularization of Deep Convolutional Neural Networks
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks Matthew D. Zeiler Department of Computer Science Courant Institute, New York University zeiler@cs.nyu.edu Rob Fergus Department
More informationAutomatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report
Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo
More informationSimplified Machine Learning for CUDA. Umar Arshad @arshad_umar Arrayfire @arrayfire
Simplified Machine Learning for CUDA Umar Arshad @arshad_umar Arrayfire @arrayfire ArrayFire CUDA and OpenCL experts since 2007 Headquartered in Atlanta, GA In search for the best and the brightest Expert
More informationFast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN
Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN Xiu Li 1, 2, Min Shang 1, 2, Hongwei Qin 1, 2, Liansheng Chen 1, 2 1. Department of Automation, Tsinghua University, Beijing
More informationMulti-view Face Detection Using Deep Convolutional Neural Networks
Multi-view Face Detection Using Deep Convolutional Neural Networks Sachin Sudhakar Farfade Yahoo fsachin@yahoo-inc.com Mohammad Saberian Yahoo saberian@yahooinc.com Li-Jia Li Yahoo lijiali@cs.stanford.edu
More informationarxiv:1506.03365v2 [cs.cv] 19 Jun 2015
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop Fisher Yu Yinda Zhang Shuran Song Ari Seff Jianxiong Xiao arxiv:1506.03365v2 [cs.cv] 19 Jun 2015 Princeton
More informationarxiv:1505.04597v1 [cs.cv] 18 May 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation Olaf Ronneberger, Philipp Fischer, and Thomas Brox arxiv:1505.04597v1 [cs.cv] 18 May 2015 Computer Science Department and BIOSS Centre for
More informationSemantic Recognition: Object Detection and Scene Segmentation
Semantic Recognition: Object Detection and Scene Segmentation Xuming He xuming.he@nicta.com.au Computer Vision Research Group NICTA Robotic Vision Summer School 2015 Acknowledgement: Slides from Fei-Fei
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/313/5786/504/dc1 Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks G. E. Hinton* and R. R. Salakhutdinov *To whom correspondence
More informationarxiv:1504.08083v2 [cs.cv] 27 Sep 2015
Fast R-CNN Ross Girshick Microsoft Research rbg@microsoft.com arxiv:1504.08083v2 [cs.cv] 27 Sep 2015 Abstract This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object
More informationConditional Random Fields as Recurrent Neural Networks
Conditional Random Fields as Recurrent Neural Networks Shuai Zheng 1, Sadeep Jayasumana *1, Bernardino Romera-Paredes 1, Vibhav Vineet 1,2, Zhizhong Su 3, Dalong Du 3, Chang Huang 3, and Philip H. S. Torr
More informationTwo-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman Visual Geometry Group, University of Oxford {karen,az}@robots.ox.ac.uk Abstract We investigate architectures
More informationApplications of Deep Learning to the GEOINT mission. June 2015
Applications of Deep Learning to the GEOINT mission June 2015 Overview Motivation Deep Learning Recap GEOINT applications: Imagery exploitation OSINT exploitation Geospatial and activity based analytics
More informationImage Super-Resolution Using Deep Convolutional Networks
1 Image Super-Resolution Using Deep Convolutional Networks Chao Dong, Chen Change Loy, Member, IEEE, Kaiming He, Member, IEEE, and Xiaoou Tang, Fellow, IEEE arxiv:1501.00092v3 [cs.cv] 31 Jul 2015 Abstract
More informationPhD in Computer Science and Engineering Bologna, April 2016. Machine Learning. Marco Lippi. marco.lippi3@unibo.it. Marco Lippi Machine Learning 1 / 80
PhD in Computer Science and Engineering Bologna, April 2016 Machine Learning Marco Lippi marco.lippi3@unibo.it Marco Lippi Machine Learning 1 / 80 Recurrent Neural Networks Marco Lippi Machine Learning
More informationCNN Based Object Detection in Large Video Images. WangTao, wtao@qiyi.com IQIYI ltd. 2016.4
CNN Based Object Detection in Large Video Images WangTao, wtao@qiyi.com IQIYI ltd. 2016.4 Outline Introduction Background Challenge Our approach System framework Object detection Scene recognition Body
More informationSense Making in an IOT World: Sensor Data Analysis with Deep Learning
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information
More informationEdVidParse: Detecting People and Content in Educational Videos
EdVidParse: Detecting People and Content in Educational Videos by Michele Pratusevich S.B., Massachusetts Institute of Technology (2013) Submitted to the Department of Electrical Engineering and Computer
More informationRecognizing Cats and Dogs with Shape and Appearance based Models. Group Member: Chu Wang, Landu Jiang
Recognizing Cats and Dogs with Shape and Appearance based Models Group Member: Chu Wang, Landu Jiang Abstract Recognizing cats and dogs from images is a challenging competition raised by Kaggle platform
More informationApplying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15
Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15 GENIVI is a registered trademark of the GENIVI Alliance in the USA and other countries Copyright GENIVI Alliance
More informationA Dynamic Convolutional Layer for Short Range Weather Prediction
A Dynamic Convolutional Layer for Short Range Weather Prediction Benjamin Klein, Lior Wolf and Yehuda Afek The Blavatnik School of Computer Science Tel Aviv University beni.klein@gmail.com, wolf@cs.tau.ac.il,
More informationNAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju
NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised
More informationExploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers Fan Yang 1,2, Wongun Choi 2, and Yuanqing Lin 2 1 Department of Computer Science,
More informationDeep Learning using Linear Support Vector Machines
Yichuan Tang tang@cs.toronto.edu Department of Computer Science, University of Toronto. Toronto, Ontario, Canada. Abstract Recently, fully-connected and convolutional neural networks have been trained
More informationTattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks
1 Tattoo Detection for Soft Biometric De-Identification Based on Convolutional NeuralNetworks Tomislav Hrkać, Karla Brkić, Zoran Kalafatić Faculty of Electrical Engineering and Computing University of
More informationUnsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition
Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition Marc Aurelio Ranzato, Fu-Jie Huang, Y-Lan Boureau, Yann LeCun Courant Institute of Mathematical Sciences,
More informationLecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
More informationJava Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
More informationarxiv:1507.02672v2 [cs.ne] 24 Nov 2015
Semi-Supervised Learning with Ladder Networks Antti Rasmus The Curious AI Company, Finland Harri Valpola The Curious AI Company, Finland Mikko Honkala Nokia Labs, Finland arxiv:1507.02672v2 [cs.ne] 24
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationGroup Sparse Coding. Fernando Pereira Google Mountain View, CA pereira@google.com. Dennis Strelow Google Mountain View, CA strelow@google.
Group Sparse Coding Samy Bengio Google Mountain View, CA bengio@google.com Fernando Pereira Google Mountain View, CA pereira@google.com Yoram Singer Google Mountain View, CA singer@google.com Dennis Strelow
More informationReturn of the Devil in the Details: Delving Deep into Convolutional Nets
CHATFIELD ET AL.: RETURN OF THE DEVIL 1 Return of the Devil in the Details: Delving Deep into Convolutional Nets Ken Chatfield ken@robots.ox.ac.uk Karen Simonyan karen@robots.ox.ac.uk Andrea Vedaldi vedaldi@robots.ox.ac.uk
More informationNovelty Detection in image recognition using IRF Neural Networks properties
Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,
More informationDenoising Convolutional Autoencoders for Noisy Speech Recognition
Denoising Convolutional Autoencoders for Noisy Speech Recognition Mike Kayser Stanford University mkayser@stanford.edu Victor Zhong Stanford University vzhong@stanford.edu Abstract We propose the use of
More informationTaking Inverse Graphics Seriously
CSC2535: 2013 Advanced Machine Learning Taking Inverse Graphics Seriously Geoffrey Hinton Department of Computer Science University of Toronto The representation used by the neural nets that work best
More informationConvolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling
Convolutional Neural Networks with Intra-layer Recurrent Connections for Scene Labeling Ming Liang Xiaolin Hu Bo Zhang Tsinghua National Laboratory for Information Science and Technology (TNList) Department
More informationRecognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28
Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object
More informationWeakly Supervised Object Boundaries Supplementary material
Weakly Supervised Object Boundaries Supplementary material Anna Khoreva Rodrigo Benenson Mohamed Omran Matthias Hein 2 Bernt Schiele Max Planck Institute for Informatics, Saarbrücken, Germany 2 Saarland
More informationAn Analysis of Single-Layer Networks in Unsupervised Feature Learning
An Analysis of Single-Layer Networks in Unsupervised Feature Learning Adam Coates 1, Honglak Lee 2, Andrew Y. Ng 1 1 Computer Science Department, Stanford University {acoates,ang}@cs.stanford.edu 2 Computer
More informationLearning to Process Natural Language in Big Data Environment
CCF ADL 2015 Nanchang Oct 11, 2015 Learning to Process Natural Language in Big Data Environment Hang Li Noah s Ark Lab Huawei Technologies Part 1: Deep Learning - Present and Future Talk Outline Overview
More informationDEEP LEARNING WITH GPUS
DEEP LEARNING WITH GPUS GEOINT 2015 Larry Brown Ph.D. June 2015 AGENDA 1 Introducing NVIDIA 2 What is Deep Learning? 3 GPUs and Deep Learning 4 cudnn and DiGiTS 5 Machine Learning & Data Analytics and
More information3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels
3D Object Recognition using Convolutional Neural Networks with Transfer Learning between Input Channels Luís A. Alexandre Department of Informatics and Instituto de Telecomunicações Univ. Beira Interior,
More informationScalable Object Detection by Filter Compression with Regularized Sparse Coding
Scalable Object Detection by Filter Compression with Regularized Sparse Coding Ting-Hsuan Chao, Yen-Liang Lin, Yin-Hsi Kuo, and Winston H Hsu National Taiwan University, Taipei, Taiwan Abstract For practical
More informationTask-driven Progressive Part Localization for Fine-grained Recognition
Task-driven Progressive Part Localization for Fine-grained Recognition Chen Huang Zhihai He chenhuang@mail.missouri.edu University of Missouri hezhi@missouri.edu Abstract In this paper we propose a task-driven
More informationIntroduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trakovski trakovski@nyus.edu.mk Neural Networks 2 Neural Networks Analogy to biological neural systems, the most robust learning systems
More informationSemantic Image Segmentation and Web-Supervised Visual Learning
Semantic Image Segmentation and Web-Supervised Visual Learning Florian Schroff Andrew Zisserman University of Oxford, UK Antonio Criminisi Microsoft Research Ltd, Cambridge, UK Outline Part I: Semantic
More informationLecture 6. Artificial Neural Networks
Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm
More informationApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
This version: December 12, 2013 Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks Lawrence Takeuchi * Yu-Ying (Albert) Lee ltakeuch@stanford.edu yy.albert.lee@gmail.com Abstract We
More informationConvolutional Networks for Stock Trading
Convolutional Networks for Stock Trading Ashwin Siripurapu Stanford University Department of Computer Science 353 Serra Mall, Stanford, CA 94305 ashwin@cs.stanford.edu Abstract Convolutional neural networks
More informationAn Introduction to Neural Networks
An Introduction to Vincent Cheung Kevin Cannons Signal & Data Compression Laboratory Electrical & Computer Engineering University of Manitoba Winnipeg, Manitoba, Canada Advisor: Dr. W. Kinsner May 27,
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationarxiv:submit/1533655 [cs.cv] 13 Apr 2016
Bags of Local Convolutional Features for Scalable Instance Search Eva Mohedano, Kevin McGuinness and Noel E. O Connor Amaia Salvador, Ferran Marqués, and Xavier Giró-i-Nieto Insight Center for Data Analytics
More informationSimultaneous Deep Transfer Across Domains and Tasks
Simultaneous Deep Transfer Across Domains and Tasks Eric Tzeng, Judy Hoffman, Trevor Darrell UC Berkeley, EECS & ICSI {etzeng,jhoffman,trevor}@eecs.berkeley.edu Kate Saenko UMass Lowell, CS saenko@cs.uml.edu
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,
More informationSelecting Receptive Fields in Deep Networks
Selecting Receptive Fields in Deep Networks Adam Coates Department of Computer Science Stanford University Stanford, CA 94305 acoates@cs.stanford.edu Andrew Y. Ng Department of Computer Science Stanford
More informationGPU-Based Deep Learning Inference:
Whitepaper GPU-Based Deep Learning Inference: A Performance and Power Analysis November 2015 1 Contents Abstract... 3 Introduction... 3 Inference versus Training... 4 GPUs Excel at Neural Network Inference...
More informationKeypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features
Keypoint Density-based Region Proposal for Fine-Grained Object Detection and Classification using Regions with Convolutional Neural Network Features JT Turner 1, Kalyan Gupta 1, Brendan Morris 2, & David
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationDo Convnets Learn Correspondence?
Do Convnets Learn Correspondence? Jonathan Long Ning Zhang Trevor Darrell University of California Berkeley {jonlong, nzhang, trevor}@cs.berkeley.edu Abstract Convolutional neural nets (convnets) trained
More informationWeakly Supervised Fine-Grained Categorization with Part-Based Image Representation
ACCEPTED BY IEEE TIP 1 Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Member, IEEE, Jianfei Cai, Senior Member, IEEE, Jiangbo Lu,
More informationIBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
More informationSemi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
More informationDo Deep Nets Really Need to be Deep?
Do Deep Nets Really Need to be Deep? Lei Jimmy Ba University of Toronto jimmy@psi.utoronto.ca Rich Caruana Microsoft Research rcaruana@microsoft.com Abstract Currently, deep neural networks are the state
More informationThe Importance of Encoding Versus Training with Sparse Coding and Vector Quantization
The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization Adam Coates Andrew Y. Ng Stanford University, 353 Serra Mall, Stanford, CA 94305 acoates@cs.stanford.edu ang@cs.stanford.edu
More informationDeep Learning GPU-Based Hardware Platform
Deep Learning GPU-Based Hardware Platform Hardware and Software Criteria and Selection Mourad Bouache Yahoo! Performance Engineering Group Sunnyvale, CA +1.408.784.1446 bouache@yahoo-inc.com John Glover
More information