PhD in Computer Science and Engineering Bologna, April 2016. Machine Learning. Marco Lippi. marco.lippi3@unibo.it. Marco Lippi Machine Learning 1 / 80


Recurrent Neural Networks

Material and references. Many figures are taken from Christopher Olah's tutorial (colah.github.io) and Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks".

Processing sequences. In many tasks we must deal with variable-size input (text, audio, video), while most algorithms can handle only fixed-size input. Typical settings:
1. Standard tasks such as image classification
2. Fixed input, sequence output: e.g., image captioning
3. Sequence classification: e.g., sentiment classification
4. Sequence input/output: e.g., machine translation
5. Synced sequence input/output: e.g., video frame classification

Processing sequences. One possibility: use convolutional neural networks with max pooling over all feature maps [Figure by Kim, 2014]. Problem: translation invariance...
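The pooling trick can be made concrete: each filter is slid over the sequence and max-pooled over time, so the output size depends only on the number of filters, not on the sequence length. A minimal NumPy sketch (the shapes, names, and tanh nonlinearity are our own illustration, not taken from Kim's paper):

```python
import numpy as np

def conv_max_over_time(X, filters):
    """X: (seq_len, emb_dim) matrix of token embeddings.
    filters: list of (width, emb_dim) convolution kernels.
    Returns one max-pooled feature per filter."""
    feats = []
    for W in filters:
        width = W.shape[0]
        # activation of this filter at every valid position in the sequence
        acts = [np.tanh(np.sum(W * X[i:i + width]))
                for i in range(X.shape[0] - width + 1)]
        feats.append(max(acts))          # max pooling over time
    return np.array(feats)

rng = np.random.default_rng(0)
filters = [rng.normal(size=(3, 5)) for _ in range(4)]
feats_short = conv_max_over_time(rng.normal(size=(7, 5)), filters)
feats_long = conv_max_over_time(rng.normal(size=(20, 5)), filters)
# both sequences map to a fixed-size vector (one entry per filter)
```

Note how a 7-token and a 20-token sequence both yield a 4-dimensional feature vector, which is exactly what makes this approach usable with fixed-size classifiers.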

Processing sequences. In a recurrent neural network, the hidden state at time t depends on the current input and on the hidden state at time t-1 (memory):
h_t = f_W(h_{t-1}, x_t)
y_t = f_V(h_t)

Processing sequences. We can parametrize the input-hidden connections (W_xh), the hidden-hidden connections (W_hh), and the hidden-output connections (W_hy):
h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t
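This parametrization maps directly onto code; a minimal NumPy sketch of one recurrence step (dimensions and initialization scale are our own illustration):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One step of a vanilla RNN:
    h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))

h = np.zeros(n_hid)                       # initial hidden state
for x_t in rng.normal(size=(5, n_in)):    # a length-5 input sequence
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy)
```

The same three weight matrices are reused at every time step, which is what lets the model process sequences of any length.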

Recurrent Neural Network Language Model:
x(t) = [w(t); s(t-1)]  (current word concatenated with the previous context)
s_j(t) = f(Σ_i x_i(t) u_{ji})
y_k(t) = g(Σ_j s_j(t) v_{kj})
[Figure by Mikolov et al., 2010] With respect to the NNLM, there is no need to specify the context dimension in advance!

Recurrent Neural Networks (RNNs). A classic RNN can be unrolled through time, so that the looping connections are made explicit.

Recurrent Neural Networks (RNNs). Backpropagation Through Time (BPTT) [Figure from Wikipedia]

Recurrent Neural Networks (RNNs). BPTT drawbacks:
- the value k for unfolding must be chosen
- exploding or vanishing gradients: exploding gradients can be controlled with gradient clipping, while vanishing gradients have to be faced with different models (LSTM)
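Gradient clipping can be sketched in a few lines; this is the common norm-based variant (the threshold value is an arbitrary choice, not one prescribed by the slides):

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm,
    preserving its direction while bounding its magnitude."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = clip_by_norm(np.array([30.0, 40.0]))  # norm 50, rescaled down to norm 5
```

Only the magnitude is changed: an exploding gradient still points in the same direction, which is why this simple fix works well in practice for the exploding case (but does nothing for vanishing gradients).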


Recurrent Neural Networks (RNNs). Some short- or mid-term dependencies can be captured...

Recurrent Neural Networks (RNNs). But the model fails to learn long-term dependencies! The main problem is vanishing gradients in BPTT...

Long Short-Term Memory Networks (LSTMs). An LSTM is basically an RNN with a different computational block. LSTMs were designed by Hochreiter & Schmidhuber in 1997!

Long Short-Term Memory Networks (LSTMs). An LSTM block features a memory cell, which can maintain its state over time, and non-linear gating units, which regulate the information flow into and out of the cell [Greff et al., 2015].

Long Short-Term Memory Networks (LSTMs). Cell state: just lets the information flow through the network; the gates can optionally let information through.

Long Short-Term Memory Networks (LSTMs). Forget gate: a sigmoid layer that produces weights for the cell state C_{t-1}, deciding what to keep (1) or forget (0) of the past cell state:
f_t = σ(W_f [h_{t-1}, x_t] + b_f)

Long Short-Term Memory Networks (LSTMs). Input gate: allows novel information to be used to update the cell state:
i_t = σ(W_i [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C)

Long Short-Term Memory Networks (LSTMs). Cell state update: combine the old state (after forgetting) with the novel input:
C_t = f_t C_{t-1} + i_t C̃_t

Long Short-Term Memory Networks (LSTMs). Output gate: build the output to be sent to the next layer and to the block at the next time step:
o_t = σ(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t tanh(C_t)

Long Short-Term Memory Networks (LSTMs). Putting it all together:
f_t = σ(W_f [h_{t-1}, x_t] + b_f)
i_t = σ(W_i [h_{t-1}, x_t] + b_i)
o_t = σ(W_o [h_{t-1}, x_t] + b_o)
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C)
C_t = f_t C_{t-1} + i_t C̃_t
h_t = o_t tanh(C_t)
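These equations map directly onto code. A minimal NumPy sketch of one LSTM step (stacking the four gate weight matrices into a single W is our implementation choice, not something prescribed by the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step. W maps the concatenated [h_{t-1}, x_t] to the four
    stacked pre-activations: forget, input, output, and candidate C~."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])         # forget gate
    i = sigmoid(z[1 * n:2 * n])         # input gate
    o = sigmoid(z[2 * n:3 * n])         # output gate
    C_tilde = np.tanh(z[3 * n:4 * n])   # candidate cell state
    C_t = f * C_prev + i * C_tilde      # cell state update
    h_t = o * np.tanh(C_t)              # block output
    return h_t, C_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, C = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

All products between gate activations and states are elementwise; the gates simply scale, per component, how much of the old state is kept and how much new information is written.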

Application: text generation. Example: character-level language model [Figure by A. Karpathy]

Application: text generation. Example: character-level language model:
- pick a (large) plain-text file
- feed the LSTM with the text
- predict the next character given the past history
- at prediction time, sample from the output distribution
- get LSTM-generated text
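The "sample from the output distribution" step can be sketched as softmax sampling; the temperature knob is a common addition (not mentioned in the slides) that makes sampling greedier (<1) or more diverse (>1):

```python
import numpy as np

def sample_next_char(logits, temperature=1.0, rng=None):
    """Sample the index of the next character from the model's output scores."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                       # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax over the character vocabulary
    return rng.choice(len(p), p=p)

# toy vocabulary of 3 characters; index 0 is the most likely next character
idx = sample_next_char([2.0, -1.0, 0.5], rng=np.random.default_rng(0))
```

Sampling (rather than always taking the argmax) is what makes the generated text varied instead of deterministic and repetitive.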

[Figure by A. Karpathy]

Application: text generation. Interpretation of cells (blue=off, red=on) [Figures by A. Karpathy]

Application: text generation. Example: training with Shakespeare poems [Figure by A. Karpathy]

Application: text generation. Example: training with LaTeX math papers [Figures by A. Karpathy]

Application: text generation. Example: training with the Linux kernel (C code) [Figure by A. Karpathy]

Application: image captioning [Figure by A. Karpathy]

Application: handwriting text generation [Figure by Graves, 2014]

Variants of the LSTM model. Many different LSTM variants have been proposed:
- peephole connections (almost standard now): all gates can also look at the cell state
- coupled forget and input gates: basically, we input only when we forget
- Gated Recurrent Units (GRUs), where a single gate controls forgetting and update
See "LSTM: a search space odyssey" by Greff et al., 2015.
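The GRU variant mentioned last can be sketched in NumPy; this follows the standard GRU equations, with biases omitted for brevity and names of our own choosing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: a single update gate z interpolates between
    keeping the old state and writing the candidate state."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                        # update gate
    r = sigmoid(W_r @ hx)                                        # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6
W_z = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
W_r = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W_z, W_r, W_h)
```

Compared to the LSTM, there is no separate cell state and no output gate: forgetting the old state and admitting the new candidate are controlled by the single gate z, which is what the "coupling" in the slide refers to.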

Recursive Neural Networks (RecNNs). A generalization of RNNs to handle structured data in the form of a dependency graph, such as a tree [Figure by Socher et al., 2014]. RNNs can be seen as RecNNs having a linear chain structure.

Recursive Neural Networks (RecNNs). Exploit compositionality (e.g., parse trees in text) [Figure by R. Socher]

Recursive Neural Networks (RecNNs). Exploit compositionality (e.g., object parts in images) [Figure by Socher et al., 2011]

Recursive Neural Tensor Networks (RNTNs). Proposed in 2013 by Stanford for sentiment analysis:
- composition function over the sentence parse tree
- parameters (tensors) that are shared by all nodes
- (tensor) backpropagation through structure
Later (2014-2015), tree-structured versions of LSTMs were also proposed.

Recursive Neural Tensor Networks (RNTNs). The Sentiment Treebank, built by crowdsourcing: 11,855 sentences from movie reviews, 215,154 labeled phrases (sub-sentences) [Figure by Socher et al., 2014]

Recursive Neural Tensor Networks (RNTNs) [Figures by Socher et al., 2014]

Recursive Neural Tensor Networks (RNTNs). A very nice model, but:
- not easy to adapt to other domains (Reddit, Twitter, etc.)
- the parse tree must be computed in advance
- labels at the phrase level are needed (expensive)

Other applications

Modeling speech. Several tasks:
- speech recognition
- speaker identification
- speaker gender classification
- phone classification
- music genre classification
- artist classification
- music retrieval

Modeling speech. Feature extraction from acoustic signals... Tons of unsupervised data! [Figure by H. Lee]

Modeling speech [Figure by H. Lee]

Multimodal Deep Learning [Figure by Ngiam et al.]. Can we gain something from shared representations? Most probably yes!

Other applications: bioinformatics, neuroscience, etc. Protein structure prediction (contact maps): Deep Spatio-Temporal Architectures [Di Lena et al., 2012]. Sequence: ASCDEVVGSACH...CPPGAERMMAYGV [Figure by Rafferty et al.]

Other applications: bioinformatics, neuroscience, etc. Predicting the aqueous solubility of drug-like molecules: recursive neural networks [Lusci et al., 2013] [Figure by Lusci et al.]

Ongoing Research and Concluding Remarks

Summary. Deep learning has obtained breakthrough results in many tasks, thanks to:
- exploiting unsupervised data
- learning feature hierarchies
- some optimization tricks (dropout, rectification, ...)
Is this the solution to all AI problems? Probably not, but:
- for certain types of task it is hard to compete with
- huge datasets and many computational resources are needed
- big companies will likely play the major role
- there is huge space for applications built upon deep learning systems
...but what is missing?

A nice overview of methods [Figure by M.A. Ranzato]

We are not yet there... Still far from human-level in unrestricted domains...

Again: symbolic vs. sub-symbolic. Deep learning is still basically a sub-symbolic approach. Representation learning is a step towards symbols... What are the connections with symbolic approaches? What about logic and reasoning? Some steps forward:
- bAbI tasks @ Facebook
- Neural Conversational Models @ Google
- Neural Turing Machines (NTMs) @ Google DeepMind

bAbI tasks (Facebook). Text understanding and reasoning with deep networks:
- the (20) bAbI tasks
- the Children's Book Test
- the Movie Dialog dataset
- the SimpleQuestions dataset

bAbI tasks (Facebook) [Tables by Weston et al.]

Children's Book Test [Table by Hill et al., 2016]

Movie Dialog dataset [Table by Dodge et al., 2016]

SimpleQuestions dataset [Table by Bordes et al., 2015]

Neural Conversational Model (Google) [Tables by Vinyals & Le, 2015]

Neural Turing Machines (Google DeepMind) [Figure by Graves et al., 2014]:
- inspired by Turing machines
- memory (tape) to read from and write to through deep networks
- capable of learning simple algorithms (e.g., sorting)

Learning concepts. Examples from the ImageNet category "Restaurant". Is this enough to learn the concept of a restaurant?

Software packages and a few references

TensorFlow
- developed by Google
- Python
- computational graph abstraction
- parallelizes over both data and model
- the multi-machine part is not open source
- nice tool to visualize stuff (TensorBoard)
- a little slower than other systems
- provides only a few (one?) pre-trained models

Theano
- University of Montréal (Yoshua Bengio's group)
- Python
- computational graph abstraction
- the code is somewhat difficult (low-level)
- long compile times
- easily runs on GPU and multi-GPU

Keras and Lasagne
Keras is a wrapper for Theano or TensorFlow:
- Python
- plug-and-play layers, losses, optimizers, etc.
- sometimes error messages can be cryptic...
Lasagne is a wrapper for Theano:
- Python
- plug-and-play layers, losses, optimizers, etc.
- model zoo with plenty of pre-trained architectures
- still exposes some symbolic computation, as plain Theano does

Torch
- Facebook, Google, Twitter, IDIAP, ...
- mostly written in Lua and C
- shares similarities with Python (e.g., tensors vs. numpy arrays)
- module nn to train neural networks
- same code runs on both CPU and GPU
One notable product built with Torch: OverFeat, a CNN for image classification, object detection, etc.

Caffe
- Berkeley Vision and Learning Center (BVLC)
- written in C++
- Python and MATLAB bindings (although not well documented)
- very popular for CNNs, not so much for RNNs...
- quite a large model zoo (AlexNet, GoogLeNet, ResNet, ...)
- scripts for training without writing code
Just to give an idea of Caffe's performance: during training, 60M images per day with a single GPU; at test time, 1 ms/image.

MATLAB. Plenty of resources:
- code by G. Hinton on RBMs and DBNs (easy to try!)
- autoencoders (many different implementations)
- convolutional neural networks
- ...

Websites and references
- http://deeplearning.net
- http://www.cs.toronto.edu/~hinton/
- http://www.iro.umontreal.ca/~bengioy/
- http://yann.lecun.com/
Introductory paper by Yoshua Bengio: "Learning Deep Architectures for AI"

Videos
- Geoffrey Hinton on DBNs: https://www.youtube.com/watch?v=ayzoubkuf3m
- Yoshua Bengio (Deep Learning lectures): https://www.youtube.com/watch?v=juimbuvewbg
- Yann LeCun (CNNs, energy-based models): https://www.youtube.com/watch?v=oob4evklemq