Department of Automation, Association of Science and Technology of Automation. March 20, 2016
Contents
Binary Classification. Figure 1: a cat? Figure 2: a dog? Binary classification: given input data $x$ (e.g. a picture), the output of a binary classifier $y = f(x)$ is one label drawn from a set of two labels, $y \in \{\pm 1\}$.
Linear Classifier. (Figure: linearly separable data in the $(x_1, x_2)$ plane, split by the hyperplane $w^T x = b$.) Data set $D = \{(x_1^{(1)}, x_2^{(1)}), \dots, (x_1^{(n)}, x_2^{(n)})\}$. A linear binary classifier is a hyperplane $w^T x = b$: $f(x) = \mathrm{sgn}(w^T x - b)$.
Evaluation of a Linear Classifier. (Figure: the same data with the separating hyperplane $w^T x = b$.) True Positive: $y = +1$, $f(x) = +1$. True Negative: $y = -1$, $f(x) = -1$. False Positive: $y = -1$, $f(x) = +1$. False Negative: $y = +1$, $f(x) = -1$. Accuracy: $\frac{TP + TN}{n}$. Error rate: $\frac{FP + FN}{n}$. A good classifier minimizes the error rate.
Basic Concepts. (Figure: training data and a classifier $w^T x = b$.) Training set, test set, training error, generalization error, overfitting, loss function. (Diagram: the loss function compares the classifier output with the training data, giving the training error.)
Overfitting. (Figures: an over-complicated decision boundary in the $(x_1, x_2)$ plane, and error rate vs. iteration, where the training error keeps decreasing while the generalization error eventually rises.)
Perceptron. (Diagram: inputs $x_0 = 1, x_1, \dots, x_n$ with weights $w_0, w_1, \dots, w_n$ feeding the sum $\sum_{i=0}^{n} w_i x_i$.)
$$y = \begin{cases} +1, & \text{if } w^T x > 0 \\ -1, & \text{otherwise} \end{cases}$$
Perceptron. (Diagram: the linear unit computes $o(x_0, x_1, \dots, x_n) = \sum_{i=0}^{n} w_i x_i$ from inputs $x_0 = 1, x_1, \dots, x_n$ and weights $w_0, w_1, \dots, w_n$.)
Training Algorithm. Define a loss function: $E(w) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$. How to reduce the loss? (Figure: the loss surface $E$ over the weight space.)
Gradient Descent. The gradient w.r.t. $w$ is $\nabla E(w) = \left(\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\right)^T$, where $\frac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)(-x_i^{(d)})$. For every iteration ($\alpha$ denotes the learning rate): $w_i \leftarrow w_i + \Delta w_i$, with $\Delta w_i = -\alpha \frac{\partial E}{\partial w_i} = \alpha \sum_{d \in D} (t_d - o_d) x_i^{(d)}$, $i \in [n]$.
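A minimal NumPy sketch of this batch gradient-descent update for a linear unit (not from the original slides; the arrays X and t, and the default learning rate, are illustrative assumptions, and X is assumed to already contain the constant column x_0 = 1):

import numpy as np

def train_linear_unit(X, t, alpha=0.01, epochs=100):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2.
    X: (n_samples, n_features) inputs with x_0 = 1 already included.
    t: (n_samples,) targets."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                   # linear outputs o_d = w^T x^(d)
        grad = -(X.T @ (t - o))     # dE/dw_i = sum_d (t_d - o_d)(-x_i^(d))
        w += -alpha * grad          # w_i <- w_i + alpha * sum_d (t_d - o_d) x_i^(d)
    return w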
Artificial Neural Network. (Diagram: input layer $x^{(1)}_1, x^{(1)}_2$; hidden layer $x^{(2)}_1, x^{(2)}_2, x^{(2)}_3$; output layer $x^{(3)}_1$.) $h$ is a non-linear function, and $x^{l+1} = h((W^l)^T x^l)$.
Sigmoid Function. $h(x) = \frac{1}{1 + e^{-x}}$. (Figure: the sigmoid curve rising from 0 to 1.) Properties: 1. continuous and differentiable; 2. maps $(-\infty, +\infty)$ to $[0, 1]$; 3. nonlinear; 4. $h'(x)$ is easy to calculate: $h'(x) = h(x)(1 - h(x))$.
Back Propagation and Delta Rule. Please refer to this page. Mathematical model of an ANN: $x^l = f(u^l)$, $u^l = (W^{l-1})^T x^{l-1}$, where $l$ denotes the current layer, the output layer is designated as layer $L$, and the input layer is designated as layer 1. The function $f(\cdot)$ is a nonlinear function (e.g. sigmoid or hyperbolic tangent). Define the loss function as $E(x^L, t)$, where $x^L$ is the network output and $t$ is the target output.
Back Propagation and Delta Rule. Expand the loss function: $E(x^L, t) = E(f((W^{L-1})^T x^{L-1}), t)$. Using the chain rule, we can write the derivative w.r.t. $W^{L-1}$ as $\frac{\partial E}{\partial W^{L-1}} = x^{L-1} \left(f'(u^L) \circ \frac{\partial E}{\partial x^L}\right)^T$, where $\circ$ denotes elementwise multiplication. If we define $\delta^L = f'(u^L) \circ \frac{\partial E}{\partial x^L}$, we get $\frac{\partial E}{\partial W^{L-1}} = x^{L-1} (\delta^L)^T$.
Back Propagation and Delta Rule. The $\delta$ terms can be calculated recursively, which makes the remaining gradients easy to write: $\delta^l = f'(u^l) \circ (W^l \delta^{l+1})$ for $l = L-1, \dots, 2$, and $\frac{\partial E}{\partial W^l} = x^l (\delta^{l+1})^T$ for $l = L-2, \dots, 1$.
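A minimal NumPy sketch of these delta-rule recursions for a fully connected network with sigmoid units and a squared-error loss (the loss choice, the vector shapes, and the function names are assumptions for illustration, not part of the slides):

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def backprop(x1, t, Ws):
    """Gradients dE/dW^l for a net with x^{l+1} = f((W^l)^T x^l), sigmoid f,
    and E = 1/2 ||x^L - t||^2.  x1: input x^1;  t: target;  Ws = [W^1, ..., W^{L-1}].
    Returns [dE/dW^1, ..., dE/dW^{L-1}]."""
    xs = [x1]
    for W in Ws:                               # forward pass
        xs.append(sigmoid(W.T @ xs[-1]))
    # delta^L = f'(u^L) o dE/dx^L, and f'(u^L) = x^L (1 - x^L) for the sigmoid
    delta = xs[-1] * (1 - xs[-1]) * (xs[-1] - t)
    grads = [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):       # backward pass
        grads[l] = np.outer(xs[l], delta)      # dE/dW^l = x^l (delta^{l+1})^T
        if l > 0:
            delta = xs[l] * (1 - xs[l]) * (Ws[l] @ delta)  # delta^l = f'(u^l) o (W^l delta^{l+1})
    return grads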
Network Structure. Figure 3: structure of a convolutional neural network. Layer types: convolution layer, pooling layer (subsampling), fully connected layer (inner product), ReLU layer, softmax layer.
Convolution Layer. Kernel size $= 3 \times 3$: $g_{ij} = \sum_{s=i}^{i+2} \sum_{t=j}^{j+2} h_{st} k_{st}$.
Convolution Layer. Stride $= 1$: $g_{ij} = \sum_{s=i}^{i+2} \sum_{t=j}^{j+2} h_{st} k_{st}$.
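A minimal NumPy sketch of this operation (valid cross-correlation with stride 1; note that here the kernel is indexed from (0, 0), so the product is h[i+s, j+t] * k[s, t] -- an illustrative convention, not taken from the slide):

import numpy as np

def conv2d_valid(h, k):
    """g[i, j] = sum_{s, t} h[i+s, j+t] * k[s, t]  (stride 1, no padding)."""
    H, W = h.shape
    kh, kw = k.shape
    g = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = np.sum(h[i:i + kh, j:j + kw] * k)
    return g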
Convolution Layer. Why do we use convolution? It is inspired by feature extraction with different kernels; it rests on the assumption that pixels far from each other are (nearly) independent; and it reduces the number of adjustable network weights, which helps avoid overfitting.
Pooling Layer. $g_{ij} = \max\{h_{2i,2j},\; h_{2i+1,2j},\; h_{2i,2j+1},\; h_{2i+1,2j+1}\}$. There are no free parameters in the pooling layer.
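A minimal NumPy sketch of 2 x 2 max pooling matching the formula above (it assumes, for simplicity, that the input height and width are even):

import numpy as np

def max_pool_2x2(h):
    """g[i, j] = max{h[2i, 2j], h[2i+1, 2j], h[2i, 2j+1], h[2i+1, 2j+1]}."""
    H, W = h.shape
    g = np.zeros((H // 2, W // 2))
    for i in range(H // 2):
        for j in range(W // 2):
            g[i, j] = h[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()
    return g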
Inner Product. Also known as the fully connected layer. A weight connects every input to every output, namely $y = W^T x$.
Rectified Linear Unit. The rectifier: $y = \max\{0, x\}$. A smooth approximation (the softplus): $y = \ln(1 + e^x)$, with derivative $\frac{dy}{dx} = \frac{1}{1 + e^{-x}}$. (Figure: the rectifier curve.) ReLU improves CNN performance.
Softmax Loss. Softmax: $y_i(x) = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}$, $i \in [n]$. It is derived from softmax regression, the extension of logistic regression to multinomial classification. The outputs of the softmax layer are the probabilities of each label. Softmax loss function: let label $j$ be the ground truth; then $L = -\ln(y_j(x)) = -\ln\left(\frac{e^{x_j}}{\sum_{k=1}^{n} e^{x_k}}\right) = \ln\left(\sum_{k=1}^{n} e^{x_k}\right) - x_j$ and $\frac{\partial L}{\partial x_i} = y_i(x) - \delta_{ij}$, where $\delta_{ij} = 1$ iff $i = j$, and $\delta_{ij} = 0$ otherwise.
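A minimal NumPy sketch of the softmax loss and its gradient (the shift by max(x) is a standard numerical-stability trick added here, not part of the slide):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract max(x) for numerical stability
    return e / e.sum()

def softmax_loss_and_grad(x, j):
    """L = -ln y_j(x);  dL/dx_i = y_i(x) - delta_ij."""
    y = softmax(x)
    loss = -np.log(y[j])
    grad = y.copy()
    grad[j] -= 1.0
    return loss, grad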
MNIST Database. MNIST: Mixed National Institute of Standards and Technology. Figure 4: handwritten digits. There are 10 distinct classes (the digits 0 through 9).
LeNet Review. Figure 5: LeNet for MNIST. input: a picture (size $28 \times 28$); conv1: 4 kernels (size $5 \times 5$); pool1: max pooling (size $2 \times 2$); conv2: 3 kernels (size $5 \times 5$); pool2: max pooling (size $2 \times 2$); ip: fully connected ($192 \times 10$); softmax: 10 inputs, 10 probability outputs.
Stochastic gradient descent, early stopping, weight decay, rectified linear units, fine-tuning, overlapping pooling, dropout.
Stochastic Gradient Descent. Overall loss function: $E(w) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$. Batch loss function: $E_B(w) = \frac{1}{2} \sum_{d \in B} (t_d - o_d)^2$, where $B \subset D$ and $|B| \ll |D|$, e.g. 256 samples out of 100,000 samples. Stochastic gradient: $\frac{\partial E_B}{\partial w}$.
Stochastic Gradient Descent. SGD algorithm ($\alpha$: learning rate, $\mu$: momentum): $w^{t+1} = w^t + \Delta w^{t+1}$, $\Delta w^{t+1} = \mu \Delta w^t - \alpha \frac{\partial L}{\partial w^t}$. Advantages of SGD: it dramatically reduces the computation per update, and it helps escape poor local optima.
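A minimal NumPy sketch of this momentum update; grad_fn (the batch gradient of the loss), the data array, and the default hyperparameters are assumptions for illustration only:

import numpy as np

def sgd_momentum(w, grad_fn, data, alpha=0.01, mu=0.9, batch_size=256, iters=1000):
    """w^{t+1} = w^t + dw^{t+1},  dw^{t+1} = mu * dw^t - alpha * dL/dw^t."""
    dw = np.zeros_like(w)
    for _ in range(iters):
        batch = data[np.random.choice(len(data), batch_size, replace=False)]
        dw = mu * dw - alpha * grad_fn(w, batch)   # momentum update on a mini-batch
        w = w + dw
    return w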
Early Stopping. (Figure: error rate vs. iteration, showing the training error, the generalization error, and the point where training is stopped.) Stop before the generalization error rises.
Weight Decay. Regularization: $E(w) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 + \lambda w^T w$. In the algorithm: $\Delta w^{t+1} = \mu \Delta w^t - \alpha \frac{\partial L}{\partial w^t} - \alpha \lambda w^t$.
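Extending the SGD sketch above with the weight-decay term (lam stands for the regularization coefficient lambda; its default value here is only an illustrative choice):

def sgd_weight_decay_step(w, dw, grad, alpha=0.01, mu=0.9, lam=0.0005):
    """One step: dw^{t+1} = mu*dw^t - alpha*dL/dw^t - alpha*lambda*w^t;  w^{t+1} = w^t + dw^{t+1}."""
    dw = mu * dw - alpha * grad - alpha * lam * w
    return w + dw, dw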
Rectified Linear Unit. (Figure: the rectifier $y = \max\{0, x\}$.) In terms of training time with gradient descent, saturating nonlinearities such as the sigmoid and tanh are much slower than non-saturating nonlinearities.
Dropout Overfitting can be reduced by using dropout to prevent complex co-adaptations on the training data. On each presentation of each training case, each hidden unit is randomly omitted from the network with a probability of 0.5, so a hidden unit cannot rely on other hidden units being present.
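A minimal NumPy sketch of this idea (training-time masking with drop probability 0.5; scaling the activations by the keep probability at test time is the convention used here, an assumption rather than something stated on the slide):

import numpy as np

def dropout(h, p_drop=0.5, train=True):
    """Randomly omit each hidden unit with probability p_drop during training."""
    if not train:
        return h * (1.0 - p_drop)              # scale at test time to match expected activations
    mask = (np.random.rand(*h.shape) >= p_drop)
    return h * mask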
Network in Network. Network in network is a new concept used in GoogLeNet, the champion of ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2014. The key structure of network in network is called the Inception module. (Figure: the whole network structure.)
Network in Network. Motivations for this structure: deeper networks tend to have stronger learning ability, especially on large-scale datasets; the curse of dimensionality in neural networks leads to an exponential expansion of parameters; and a very sparse deep neural network can be constructed layer by layer by analyzing correlation statistics. The Inception module is an attempt to approximate such a sparse structure. Details of the Inception module: convolutions at various scales are needed to capture both small and large features (inspired by Gabor filters), and $1 \times 1$ convolutions are used for dimensionality reduction.
Network in Network. (Table: the results of ILSVRC 2014.)
Residual Network. A degradation problem is encountered as networks go deeper:
Residual Network. An intuitive way to construct a deep model from a shallow model: copy the trained shallow model ($m$ layers) and stack identity-mapping layers on top to reach $n$ layers ($n \geq m$). In practice, however, such a deeper model is unable to reach a better solution than the original model.
Residual Network. Instead, learn a residual ($F(x) = H(x) - x$). This finally reached depths of up to 152 layers.
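A minimal NumPy sketch of the idea: the block outputs $H(x) = F(x) + x$, so the stacked layers only have to learn the residual $F(x)$. The two-weight-matrix form of $F$ and the placement of the ReLUs are illustrative assumptions, not the exact block from the paper:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """H(x) = relu(F(x) + x) with residual branch F(x) = W2^T relu(W1^T x)."""
    F = W2.T @ relu(W1.T @ x)   # residual branch
    return relu(F + x)          # shortcut (identity) connection added back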
Residual Network. (Table: the results of ILSVRC 2015.)
BNN: Binarized Neural Networks. Deterministic or stochastic binarization:
$$x^b = \mathrm{sgn}(x) = \begin{cases} +1 & \text{if } x \geq 0, \\ -1 & \text{otherwise;} \end{cases} \qquad x^b = \begin{cases} +1 & \text{with probability } p = \sigma(x), \\ -1 & \text{with probability } 1 - p. \end{cases}$$
(Figure: the curve of $\sigma(x)$ over $x \in [-1, 1]$, rising from 0 to 1.)
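A minimal NumPy sketch of both binarization rules; sigma is taken here to be the "hard sigmoid" clip((x + 1)/2, 0, 1), an assumption recovered from the plot rather than stated in the text:

import numpy as np

def binarize_deterministic(x):
    """x^b = sgn(x): +1 if x >= 0, -1 otherwise."""
    return np.where(x >= 0, 1.0, -1.0)

def binarize_stochastic(x):
    """x^b = +1 with probability p = sigma(x), -1 with probability 1 - p."""
    p = np.clip((x + 1.0) / 2.0, 0.0, 1.0)   # assumed "hard sigmoid"
    return np.where(np.random.rand(*x.shape) < p, 1.0, -1.0)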
BNN: Binarized Neural Networks. Binarized convolution kernels: only 42% of the filters are unique on the CIFAR-10 ConvNet.
BNN: Binarized Neural Networks. 1. As accurate as 32-bit DNNs.
BNN: Binarized Neural Networks. 2. Lower computational consumption.
CNN Visualization. Use a deconvolutional network to visualize a higher layer's feature maps. Three key points are mentioned: unpooling, an approximate inverse of max pooling obtained by recording the locations of the maxima; rectification, passing the reconstructed signals through a ReLU; and filtering, using transposed versions of the same filters.
CNN Visualization. Different visualization results: feature maps in each layer, and the evolution of features during training.
Further Developments. Long Short-Term Memory (LSTM); region-based CNNs (R-CNN, Fast R-CNN, Faster R-CNN); Deep Belief Networks (Restricted Boltzmann Machines); AlphaGo and reinforcement learning; CNNs for speech recognition.
Caffe Tutorial. For more information please refer to this page. Key words: nets, layers and blobs; forward / backward; loss; solver; layer catalogue; interfaces; data.
Nets, Layers and Blobs. (Diagram: a softmax regression example, illustrating the Layer, Net, and Blob abstractions.)
Forward / Backward. (Diagram: the forward pass computes $w^T x$, then $f(w^T x)$, then the loss $L(f(w^T x))$; the backward pass propagates the gradients of $L$ with respect to $f$, $w$, and $x$ back through the same layers.)
Loss. Softmax: $y_i(x) = \frac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}$, $i \in [n]$. Softmax loss function: let label $j$ be the ground truth; then $L = -\ln(y_j(x)) = -\ln\left(\frac{e^{x_j}}{\sum_{k=1}^{n} e^{x_k}}\right) = \ln\left(\sum_{k=1}^{n} e^{x_k}\right) - x_j$ and $\frac{\partial L}{\partial x_i} = y_i(x) - \delta_{ij}$, where $\delta_{ij} = 1$ iff $i = j$, and $\delta_{ij} = 0$ otherwise.
Solver. SGD (Stochastic Gradient Descent), with $\alpha$ the learning rate and $\mu$ the momentum: $w^{t+1} = w^t + \Delta w^{t+1}$, $\Delta w^{t+1} = \mu \Delta w^t - \alpha \frac{\partial L}{\partial w^t}$.
Solver. Example solver parameters: base learning rate $\alpha = 0.01$; learning rate policy: step (reduce the learning rate according to the step size); step size: 100000; gamma: 0.1 (multiply the learning rate by a factor of 0.1 every step-size iterations); momentum $\mu = 0.9$; max iteration: 350000 (stop at iteration 350000).
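A small sketch of how the "step" policy above behaves (the learning rate is multiplied by gamma every stepsize iterations); the default values mirror the example parameters:

def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Caffe-style 'step' policy: lr = base_lr * gamma^(floor(iteration / stepsize))."""
    return base_lr * (gamma ** (iteration // stepsize))

# e.g. step_lr(0) == 0.01, step_lr(100000) == 0.001, step_lr(200000) == 0.0001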
Layer Catalogue. Please refer to this page. Vision layers: convolution, pooling. Loss layers: softmax loss, Euclidean loss, cross-entropy. Activation layers: sigmoid, ReLU, hyperbolic tangent.
Layer Catalogue. Data layers: database, in-memory, HDF5 input, HDF5 output. Common layers: inner product, splitting, flattening, reshape, concatenation.
Installation. Prerequisites: protobuf, CUDA, OpenBLAS, Boost, OpenCV, lmdb, leveldb, cuDNN (optional), Python (optional), numpy (optional), MATLAB (optional). Install: git clone git://github.com/bvlc/caffe /your/own/caffe/folder; go to the Caffe root folder; cp Makefile.config.example Makefile.config; make all; make test; make runtest. Hardware: K40, K20, or Titan for ImageNet-scale training; GTX-series cards or a GPU-equipped MacBook Pro for small datasets.
LeNet. LeNet structure: 1. protobuf protocol; 2. run!
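As a hedged sketch of step 2, training can also be driven from Caffe's Python interface; the prototxt and caffemodel paths below are placeholders assuming the standard MNIST example shipped with Caffe:

import caffe

caffe.set_mode_gpu()                                   # or caffe.set_mode_cpu()
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
solver.solve()                                         # run training to max_iter

# Load the trained weights for testing
net = caffe.Net('examples/mnist/lenet.prototxt',
                'examples/mnist/lenet_iter_10000.caffemodel', caffe.TEST)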
How to Become Professional? 1. Figure out the theoretical key points (read papers). 2. Read the Caffe source code. 3. Be proficient at programming and debugging. 4. Take advantage of search engines and the community. 5. Follow this pipeline: experiment design, data preparation (build the database with the provided tools), model selection (including network and solver), training, analysis and comparison.
Reference I Ossama Abdel-Hamid, Li Deng, and Dong Yu. Exploring convolutional neural network structures and optimization techniques for speech recognition. In INTERSPEECH, pages 3366-3370, 2013. Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006. Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440-1448, 2015. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
Reference II Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997. Itay Hubara, Daniel Soudry, and Ran El-Yaniv. Binarized neural networks. arXiv preprint arXiv:1602.02505, 2016. Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the echo state network approach. GMD-Forschungszentrum Informationstechnik, 2002.
Reference III Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675-678. ACM, 2014. Abdel-rahman Mohamed, George E Dahl, and Geoffrey Hinton. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1):14-22, 2012.
Reference IV Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91-99, 2015. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
Reference V Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9, 2015. Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014, pages 818-833. Springer, 2014.