Implementation of Neural Networks with Theano http://deeplearning.net/tutorial/
Feed Forward Neural Network (MLP)
[Diagram: input layer → several HiddenLayer objects → LogisticRegression object → output layer]
Example for an Output Layer: LogisticRegression (a softmax cost layer)
Logistic regression implemented as a class: class LogisticRegression(object)
- softmax output (classification), also used for prediction
- computation of the cost for training
Minibatches
Input data as a matrix:
- each row corresponds to a training example
- each column corresponds to a feature
input.shape == (mini_batch_size, nb_inputs)
MNIST Dataset
Handwritten digit images (size: 28x28), 10 classes
Input for the visible units: flattened image (vector), via numpy.reshape
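The flattening step can be sketched in plain NumPy; the toy batch of zero-filled images is a stand-in for real MNIST data:

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 5 images of 28x28 pixels.
images = np.zeros((5, 28, 28))

# Flatten each image into a 784-dimensional row vector, so the batch
# becomes a (mini_batch_size, nb_inputs) matrix as the layers expect.
flat = images.reshape(images.shape[0], -1)
```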
Training Loss (Classification)
Target class encoded as one-hot (1-of-k)
Cross-entropy loss: the (average) negative log-likelihood
Cost of a minibatch in Theano (y is an index vector with class-id values, self.p_y_given_x a matrix with shape (batch_size, nb_of_classes)):
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
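The fancy-indexing trick in that expression can be illustrated in NumPy with made-up probabilities (the Theano version works symbolically, but the arithmetic is the same):

```python
import numpy as np

# Toy class probabilities for a minibatch of 3 examples and 4 classes
# (each row sums to 1); y holds the target class index of each example.
p_y_given_x = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.2, 0.5, 0.2, 0.1],
                        [0.1, 0.1, 0.1, 0.7]])
y = np.array([0, 1, 3])

# Pick the probability of the correct class of each example via fancy
# indexing, like T.log(...)[T.arange(y.shape[0]), y] in Theano,
# then average the negative logs over the minibatch.
nll = -np.mean(np.log(p_y_given_x[np.arange(y.shape[0]), y]))
```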
Class Probabilities
The probability of each class is given by the softmax; the result is a matrix with shape (size_of_minibatch, num_classes):
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.w) + self.b)
- input is a matrix with shape (size_of_minibatch, nb_of_features)
- input is the first argument of the dot product, so each row of the result corresponds to one example
Class Prediction
Symbolic description of how to compute the prediction as the class whose probability is maximal:
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
argmax gives the index of the maximum of an array; computed per row (over the column axis, axis=1), it yields a vector of class labels
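A NumPy sketch of both steps, using invented scores for two examples (the stability trick of subtracting the row maximum is standard practice, not shown on the slide):

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability;
    # z has shape (minibatch_size, num_classes).
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy scores (input @ w + b) for 2 examples and 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 3.0, 1.0]])
p = softmax(scores)

# Prediction: index of the maximal probability per row (axis=1),
# mirroring T.argmax(self.p_y_given_x, axis=1).
y_pred = p.argmax(axis=1)
```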
Layer has a list of parameters self.params = [self.w, self.b]
Model Parameters are Theano Shared Variables
self.w = theano.shared(
    value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
    name='w', borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros((n_out,), dtype=theano.config.floatX),
    name='b', borrow=True
)
Gradient
Learning is done by mini-batch stochastic gradient descent (MSGD): use just a few examples for frequent updates
Symbolic gradient calculation by Theano:
g_w = T.grad(cost=cost, wrt=classifier.w)
g_b = T.grad(cost=cost, wrt=classifier.b)
In ML libraries this is typically encapsulated in a "Trainer" class (not in the tutorial)
Updates
Update rule (for each parameter): subtract the learning rate times the gradient of the cost
Store the updates in a list for the train function:
updates = [(classifier.w, classifier.w - learning_rate * g_w),
           (classifier.b, classifier.b - learning_rate * g_b)]
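Numerically, one such update is a single subtraction; a NumPy sketch with hypothetical parameter and gradient values:

```python
import numpy as np

# Hypothetical current parameter vector and its gradient for one update.
w = np.array([0.5, -0.3])
g_w = np.array([0.2, -0.1])
learning_rate = 0.1

# One mini-batch SGD step: move each parameter against its gradient.
w_new = w - learning_rate * g_w
```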
Generate symbolic variables for the input (x and y represent a minibatch):
x = T.matrix('x')   # data, presented as rasterized images
y = T.ivector('y')  # labels, presented as a 1D vector of [int] labels
Construct the logistic regression class (each MNIST image has size 28*28):
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
Compilation of the training function by theano.function
train_set_x and train_set_y are stored in shared variables, so they can be held in GPU memory. With givens, a dictionary of substitutions for the computational graph is given: y is the target used in the cost function, x is the input used in the constructor call.
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
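The slicing inside the givens dictionary is ordinary minibatch indexing; a NumPy sketch with a toy dataset shows which rows each minibatch index selects:

```python
import numpy as np

# Toy dataset: 10 examples with 4 features each (values 0..39 for clarity).
train_set_x = np.arange(40).reshape(10, 4)
batch_size = 3
index = 2  # minibatch index, as passed to train_model(index)

# The givens dictionary substitutes exactly this slice for x:
# rows 6, 7, 8 of the dataset.
x_batch = train_set_x[index * batch_size: (index + 1) * batch_size]
```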
Classification Error for Test and Validation
For validation and testing: T.mean(T.neq(self.y_pred, y))
T.neq is 1 if the two values differ, else 0, so the mean is the misclassification rate
test_model = theano.function(
    inputs=[index],
    outputs=classifier.errors(y),
    givens={
        x: test_set_x[index * batch_size: (index + 1) * batch_size],
        y: test_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
Same for validation_model (with validation_set_x and validation_set_y)
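The error expression reduces to elementwise comparison plus a mean; a NumPy sketch with made-up predictions and targets:

```python
import numpy as np

# Toy predictions and targets for a minibatch of 5 examples.
y_pred = np.array([0, 1, 2, 1, 0])
y      = np.array([0, 2, 2, 1, 1])

# Like T.neq: 1 where prediction and target differ, 0 where they agree;
# the mean of that 0/1 vector is the misclassification rate.
error = np.mean(y_pred != y)
```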
Training in a Loop: maximally n_epochs
while (epoch < n_epochs) and (not done_looping):
Early stopping based on the validation error plus a patience hyperparameter: if the stopping condition is satisfied, done_looping = True
http://deeplearning.net/tutorial/logreg.html
Feed Forward Neural Network (MLP)
[Diagram: input layer → several HiddenLayer objects → LogisticRegression object → output layer]
Hidden Layers Implemented as a Class: class HiddenLayer(object)
Each layer computes a non-linear transformation:
a^(l+1) = activation(W^(l) a^(l) + b^(l)), with input = a^(0)
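One such forward pass, sketched in NumPy with invented weights and following the minibatch convention from earlier (examples in rows, so the product is written a @ W rather than W a):

```python
import numpy as np

# One hidden layer forward pass: a^(l+1) = activation(a^(l) W + b).
# a0 has shape (mini_batch_size, n_in), W has shape (n_in, n_out).
a0 = np.array([[1.0, 2.0],
               [0.5, -1.0]])          # minibatch of 2 examples, 2 features
W = np.array([[0.1, 0.2, 0.0],
              [0.0, 0.1, 0.3]])       # n_in=2, n_out=3
b = np.zeros(3)

a1 = np.tanh(a0 @ W + b)              # tanh activation, as in the tutorial
```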
Activation Functions
- tanh
- sigmoid (logistic function)
- rectifiers: rectified linear units (ReLU)
- maxout
Weight Initialization
e.g. tanh weight initialization: X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, International Conference on Artificial Intelligence and Statistics, 2010
Pretraining (see e.g. autoencoders)
Orthogonal initialization: Andrew M. Saxe, James L. McClelland, Surya Ganguli, Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, ICLR 2014
See also http://christianherta.de/lehre/datascience/machinelearning/neuralnetworks/weightinitialization.html
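For the tanh case, the Glorot/Bengio recipe draws weights uniformly from a symmetric interval scaled by the fan-in and fan-out; a NumPy sketch with the tutorial's MNIST-style layer sizes (784 inputs, 500 hidden units are illustrative choices):

```python
import numpy as np

# Glorot uniform initialization for a tanh layer:
# W ~ U[-sqrt(6 / (n_in + n_out)), +sqrt(6 / (n_in + n_out))].
rng = np.random.RandomState(1234)
n_in, n_out = 784, 500
bound = np.sqrt(6.0 / (n_in + n_out))
W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
```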
Connecting the layers (in the tutorial: class MLP(object); self corresponds to the MLP):
self.hiddenlayer = HiddenLayer(
    rng=rng, input=input,
    n_in=n_in, n_out=n_hidden,
    activation=T.tanh
)
self.logregressionlayer = LogisticRegression(
    input=self.hiddenlayer.output,
    n_in=n_hidden, n_out=n_out
)
L1 and L2 Regularization (self corresponds to the MLP):
self.l1 = (
    abs(self.hiddenlayer.w).sum()
    + abs(self.logregressionlayer.w).sum()
)
self.l2_sqr = (
    (self.hiddenlayer.w ** 2).sum()
    + (self.logregressionlayer.w ** 2).sum()
)
self.l1 = (
    abs(self.hiddenlayer.w).sum()
    + abs(self.logregressionlayer.w).sum()
)
self.l2_sqr = (
    (self.hiddenlayer.w ** 2).sum()
    + (self.logregressionlayer.w ** 2).sum()
)
self.negative_log_likelihood = (
    self.logregressionlayer.negative_log_likelihood
)
self.params = self.hiddenlayer.params + self.logregressionlayer.params
(self corresponds to the MLP)
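Numerically the two penalties are just sums over both weight matrices; a NumPy sketch with tiny invented weights:

```python
import numpy as np

# Toy weight matrices of the hidden and logistic regression layers.
w_hidden = np.array([[0.5, -1.0],
                     [2.0,  0.0]])
w_logreg = np.array([[1.0], [-0.5]])

# L1: sum of absolute values; L2 (squared): sum of squared entries,
# each summed over both layers, as in the MLP class.
l1 = np.abs(w_hidden).sum() + np.abs(w_logreg).sum()
l2_sqr = (w_hidden ** 2).sum() + (w_logreg ** 2).sum()
```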
Gradient Calculation / Update Rules
cost = (
    classifier.negative_log_likelihood(y)
    + L1_reg * classifier.l1
    + L2_reg * classifier.l2_sqr
)
gparams = [T.grad(cost, param) for param in classifier.params]
updates = [
    (param, param - learning_rate * gparam)
    for param, gparam in zip(classifier.params, gparams)
]
Training in a Loop: maximally n_epochs
while (epoch < n_epochs) and (not done_looping):
Early stopping based on the validation error plus a patience hyperparameter: if the stopping condition is satisfied, done_looping = True
http://deeplearning.net/tutorial/mlp.html
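The patience logic can be sketched in pure Python. This is a simplified version of the tutorial's scheme (which counts patience in iterations rather than epochs); the validation-error sequence and the hyperparameter values are invented for illustration:

```python
# Hypothetical validation errors, one per epoch, instead of real training.
validation_errors = [0.30, 0.25, 0.24, 0.24, 0.24, 0.24, 0.24, 0.24]

patience = 4                    # minimum number of epochs to run (assumed)
patience_increase = 2           # extend patience when a new best is found
improvement_threshold = 0.995   # relative improvement considered significant

best_validation_loss = float('inf')
done_looping = False
epoch = 0
while (epoch < len(validation_errors)) and (not done_looping):
    this_loss = validation_errors[epoch]
    if this_loss < best_validation_loss * improvement_threshold:
        # Significant improvement: record it and allow more epochs.
        patience = max(patience, (epoch + 1) * patience_increase)
        best_validation_loss = this_loss
    if epoch + 1 >= patience:
        # Patience exhausted without sufficient improvement.
        done_looping = True
    epoch += 1
```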
Hyperparameters
- learning rate
- how many layers, how many units per layer
- regularization
- activation function of the units
- momentum
- etc.
Yoshua Bengio, Practical recommendations for gradient-based training of deep architectures, Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science Volume 7700, 2012, http://arxiv.org/abs/1206.5533
Y. A. LeCun, L. Bottou, G. B. Orr, K.-R. Müller, Efficient BackProp, Neural Networks: Tricks of the Trade, 1998
Libraries/Frameworks
- Pylearn2 (based on Theano)
- Torch
- Caffe (mainly focused on vision)
For a complete list see http://deeplearning.net/software_links/