Compacting ConvNets for end to end Learning

1 Compacting ConvNets for end to end Learning
Jose M. Alvarez
Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.

2 Success of CNN: Image Classification
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012

3 Success of CNN: Object Detection
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arxiv:

4 Success of CNN: Semantic Segmentation
Jifeng Dai, Kaiming He, Jian Sun. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. arxiv:

5 Success of CNN: Image Captioning
Andrej Karpathy, Li Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. CVPR 2015
Video classification

6 Keys to success
- Better training algorithms: batch normalization, initializations, momentum

7 Keys to success
- Better training algorithms
- Large amount of data / labels

8 Keys to success
- Better training algorithms
- Large amount of data / labels
- Hardware / storage: GPU, parallel systems
[Figure: GPU memory (in GB) for GTX-580, Titan Black ('14), Titan X ('15)]

9 Keys to success
- Better training algorithms
- Large amount of data / labels
- Hardware / storage
- Larger community of researchers

10 Keys to success: enabled larger networks
[Figure: number of parameters (in millions) for LeNet-5, AlexNet, VGGNet-16]

14 Challenges
- Embedded devices with limited resources / power
- 2014: Jetson TK1
- 2015/16: Jetson TX1

15 Challenges
- Embedded devices with limited resources / power
  - Memory is a limiting factor
  - Real-time operation

16 Computational Cost: AlexNet
- Forward pass is time-consuming

17 Computational Cost: AlexNet
- Memory bottleneck

18 Computational Cost: VGGNet
- Memory bottleneck

Layer            Parameters
conv3-64 x 2         38,720
conv3-128 x 2       221,440
conv3-256 x 3     1,475,328
conv3-512 x 3     5,899,776
conv3-512 x 3     7,079,424
fc1             102,764,544
fc2              16,781,312
fc3               4,097,000
TOTAL           138,357,544
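
The per-layer figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the standard VGG-16 layout (3x3 kernels, biases counted, a 224x224 RGB input reduced to 7x7x512 before the first fully connected layer); the helper names are made up for illustration.

```python
# Parameter count of VGG-16, biases included.
# A conv layer has k*k*c_in*c_out weights plus c_out biases.

def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

# (label, input channels, output channels, number of conv layers in the block)
conv_blocks = [
    ("conv3-64 x 2", 3, 64, 2),
    ("conv3-128 x 2", 64, 128, 2),
    ("conv3-256 x 3", 128, 256, 3),
    ("conv3-512 x 3", 256, 512, 3),
    ("conv3-512 x 3 (last)", 512, 512, 3),
]

counts = {}
for label, c_in, c_out, n_layers in conv_blocks:
    total = conv_params(c_in, c_out)                     # first layer changes the width
    total += (n_layers - 1) * conv_params(c_out, c_out)  # remaining layers keep it
    counts[label] = total

# Assumed input to fc1: a 7x7 feature map with 512 channels.
counts["fc1"] = fc_params(7 * 7 * 512, 4096)
counts["fc2"] = fc_params(4096, 4096)
counts["fc3"] = fc_params(4096, 1000)

total_params = sum(counts.values())
fc_share = (counts["fc1"] + counts["fc2"] + counts["fc3"]) / total_params

for label, n in counts.items():
    print(f"{label:22s} {n:>12,d}")
print(f"{'TOTAL':22s} {total_params:>12,d}")
print(f"fully connected share: {fc_share:.1%}")
```

Under these assumptions the three fully connected layers account for roughly 89% of all parameters, which is why the slide labels this a memory bottleneck.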

19 Do we need all these parameters?

20 Over-Parameterization
- Needed for highly non-convex optimization [1]
[1] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun. The Loss Surfaces of Multilayer Networks

21 Over-Parameterization
- Needed for highly non-convex optimization
- Deeper structures, larger learning capacity [1]
[1] Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio. On the Number of Linear Regions of Deep Neural Networks. NIPS 2014

22 Over-Parameterization
- Needed for highly non-convex optimization
- Deeper structures, larger learning capacity
- From images to video: even larger nets?
A. Karpathy et al. Large-scale Video Classification with Convolutional Neural Networks. CVPR 2014

23 Compacting CNN

24 Compacting CNN
- Network distillation
- Network pruning
- Structured parameters
- Ours

25 Compacting CNN
- Network distillation

26 Compacting CNN: Network distillation
- Large network learns from data
- Generate labels using the trained network
- Train smaller nets using the output or soft layer
Geoffrey Hinton, Oriol Vinyals, Jeff Dean. Distilling the Knowledge in a Neural Network. NIPS Workshop 2015
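
A minimal sketch of the "soft" labels mentioned on this slide, assuming the temperature-scaled softmax described in the cited distillation paper. The logits and the temperature value are hypothetical; in practice they would come from a trained large network and a smaller network being trained.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

teacher_logits = [6.0, 2.0, 1.0]   # hypothetical output of the large network
student_logits = [3.0, 1.5, 0.5]   # hypothetical output of the small network
T = 4.0                            # hypothetical temperature

hard_targets = softmax(teacher_logits)        # nearly one-hot
soft_targets = softmax(teacher_logits, T)     # keeps relative class similarities
student_soft = softmax(student_logits, T)

# Loss the small network would minimise against the large network's soft labels.
distill_loss = cross_entropy(soft_targets, student_soft)
print(hard_targets, soft_targets, distill_loss)
```

Raising the temperature spreads probability mass onto the non-maximal classes, so the small network also sees which wrong classes the large network considers plausible.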

27 Compacting CNN: Network distillation (II)
- Use intermediate layers to guide the training
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio. FitNets: Hints for Thin Deep Nets. ICLR 2015

28 Compacting CNN
Pros
- In general, better generalization and faster
- Equal or slightly better performance
Cons
- Requires a larger network to learn from

29 Compacting CNN
- Network distillation
- Network pruning
  - Directly remove unimportant parameters during training
  - Requires second derivatives
  - Remove parameters + quantization [1]
  - Good compression rates (orthogonal to other approaches)
[1] S. Han, H. Mao, W. J. Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. CoRR, abs/ , 2015
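
A rough sketch of the "remove parameters + quantization" idea. It uses a plain magnitude criterion for pruning (not the second-derivative saliency also mentioned on the slide) and a uniform quantizer as a simple stand-in for trained quantization; the weight values, sparsity level, and function names are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_uniform(weights, n_levels):
    """Snap each surviving weight to one of n_levels evenly spaced values."""
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    if hi == lo:                       # a single distinct value: nothing to do
        return list(weights)
    step = (hi - lo) / (n_levels - 1)
    return [0.0 if w == 0.0 else lo + round((w - lo) / step) * step
            for w in weights]

weights = [0.91, -0.03, 0.42, 0.007, -0.68, 0.05, -0.12, 0.30]  # hypothetical layer
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized = quantize_uniform(pruned, n_levels=4)

print("pruned:   ", pruned)
print("quantized:", quantized)
```

After both steps only the positions of the surviving weights and a small index per weight need to be stored, which is where the compression comes from.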

30 Compacting CNN
- Network distillation
- Network pruning
- Structured parameters

31 Compacting CNN: Structured parameters
- Low rank approximations
Max Jaderberg, Andrea Vedaldi, Andrew Zisserman. Speeding up Convolutional Neural Networks with Low Rank Expansions. BMVC 2014

32 Compacting CNN: Structured parameters
- Low rank approximations (II)
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014

33 Compacting CNN: Structured parameters
- Low rank approximations (III)
- Weights are approximated by a sum of rank-1 tensors
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014
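
To illustrate approximating weights by a sum of rank-1 terms, here is a simplified sketch on a single 2D kernel rather than the full 4D weight tensor treated in the cited paper. It uses power iteration with deflation instead of a library SVD; the kernel values and function names are hypothetical.

```python
def outer(u, v):
    """Rank-1 matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frobenius(A):
    return sum(x * x for row in A for x in row) ** 0.5

def dominant_rank1(M, n_iter=200):
    """Return (u, v) so that outer(u, v) is the dominant rank-1 term of M.

    Power iteration on M^T M; assumes the all-ones start vector is not
    orthogonal to the dominant right singular vector.
    """
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    for _ in range(n_iter):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return u, v

def low_rank_terms(M, k):
    """Greedily approximate M by k rank-1 terms; also return the residual norm."""
    residual = [row[:] for row in M]
    terms = []
    for _ in range(k):
        u, v = dominant_rank1(residual)
        terms.append((u, v))
        residual = subtract(residual, outer(u, v))
    return terms, frobenius(residual)

kernel = [[1.0, 2.0, 1.0],   # hypothetical 3x3 filter of rank 2
          [2.0, 5.0, 2.0],
          [1.0, 2.0, 1.0]]

_, err_rank1 = low_rank_terms(kernel, 1)
_, err_rank2 = low_rank_terms(kernel, 2)
print(f"residual with 1 term: {err_rank1:.4f}, with 2 terms: {err_rank2:.2e}")
```

Each rank-1 term of a d x d kernel stores 2d numbers instead of d*d and can be applied as two 1D convolutions, so the saving grows with the kernel size.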

34 Compacting CNN: Structured parameters
Weak points
- Needs a fully trained full-rank network
- Not all filters can be approximated
- Theoretical speed-ups with a drop in performance
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014

35 Compacting CNN: Structured parameters
Weak points
- Needs a fully trained full-rank network
- Not all filters can be approximated
- Drop in performance
Strengths
- Potential ability to aid in regularization during or post training
- Parameter sharing within the layer

36 Compacting CNN: Structured parameters
- Low rank approximations (IV)
- VGG nets restrict filters during training
  - Same receptive field
  - Deeper networks (more nonlinearities)
  - Fewer parameters ($3 \times (3 \times 3)C^2 = 27C^2$ for three 3x3 layers vs. $49C^2$ for one 7x7 layer)
K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015
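
The parameter comparison on this slide can be checked numerically. The sketch assumes stride-1 convolutions with C input and C output channels per layer and ignores biases; the value of C and the helper names are hypothetical.

```python
def stacked_conv_params(kernel, n_layers, channels):
    """Weights in n_layers of kernel x kernel convs with C -> C channels."""
    return n_layers * kernel * kernel * channels * channels

def receptive_field(kernel, n_layers):
    """Receptive field of stacked stride-1 convs: each layer adds kernel - 1."""
    return 1 + n_layers * (kernel - 1)

C = 256  # hypothetical channel width

single_7x7 = stacked_conv_params(kernel=7, n_layers=1, channels=C)
three_3x3 = stacked_conv_params(kernel=3, n_layers=3, channels=C)

print("receptive field:", receptive_field(7, 1), "vs", receptive_field(3, 3))
print("parameters:     ", single_7x7, "vs", three_3x3)
print(f"saving: {1 - three_3x3 / single_7x7:.0%}")
```

Both configurations cover a 7x7 window, but the stack of three 3x3 layers needs 27/49 of the weights and applies three nonlinearities instead of one.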

37 Compacting CNN: Structured parameters
- Low rank approximations (Ours [1])
- Filter restriction during training
  - Larger receptive fields
  - Deeper networks (more nonlinearities)
  - Parameter sharing
  - Fewer parameters
[1] Joint work with Lars Petersson. Under review

38 Compacting CNN: Structured parameters
- Low rank approximations (Ours)
- ImageNet results (AlexNet)
Baseline: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012

39 Compacting CNN: Structured parameters
- Low rank approximations (Ours)
- Stereo matching
[Figure: results for Ours-3 32K, Ours-1 32K, Ours-1 48K]
Baseline: Jure Zbontar, Yann LeCun. Computing the Stereo Matching Cost With a Convolutional Neural Network. CVPR 2015

40 Memory?

43 Memory Bottleneck
- Sparse constraints during training (Ours [2])
  - Directly reduce the number of neurons
  - Select the optimum number of neurons
  - Significant memory reductions with a minor drop in performance
[2] Joint work with Hao Zhou, Fatih Porikli. Under review
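
The slide does not spell out the form of the sparse constraint. One common way to remove whole neurons is a group-sparsity penalty with one group per neuron, and the sketch below assumes that formulation purely for illustration; it is not claimed to be the authors' method. The weights, threshold, and function names are hypothetical.

```python
def group_soft_threshold(row, threshold):
    """Proximal step of a group-lasso penalty on one neuron's weights.

    The whole row shrinks towards zero and is set exactly to zero when its
    L2 norm falls below the threshold, which removes the neuron.
    """
    norm = sum(w * w for w in row) ** 0.5
    if norm <= threshold:
        return [0.0] * len(row)
    scale = 1.0 - threshold / norm
    return [w * scale for w in row]

# Hypothetical layer: one row of incoming weights per neuron.
layer = [
    [0.90, -0.70, 0.40],
    [0.02, -0.01, 0.03],
    [0.50, 0.60, -0.80],
    [-0.04, 0.05, 0.01],
]

shrunk = [group_soft_threshold(row, threshold=0.1) for row in layer]
kept = [row for row in shrunk if any(w != 0.0 for w in row)]

print(f"neurons kept: {len(kept)} of {len(layer)}")
```

Because entire rows vanish, the layer can be stored and evaluated as a smaller dense matrix, which is what reduces memory rather than just the number of nonzero weights.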

44 Memory Bottleneck
- Sparse constraints during training (Ours [2])
[2] Joint work with Hao Zhou, Fatih Porikli. Under review

45 Do we need all these parameters?

46 Compacting ConvNets for end to end Learning
Jose M. Alvarez
Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.
