Introduction to Deep Learning: Variational Inference, Mean Field Theory




Iasonas Kokkinos, Iasonas.kokkinos@ecp.fr. Center for Visual Computing, Ecole Centrale Paris; Galen Group, INRIA-Saclay.

Lecture 3: recap. Network architectures: Boltzmann Machine, Restricted Boltzmann Machine.

Boltzmann Machine (Hinton & Sejnowski, 1983+). Full-blown Ising model. Parameter estimation, once again: statistics of the training data versus statistics estimated by MCMC.

Boltzmann Machine limitations. Underlying statistical model: constrains second-order moments. This will not get us too far, even with extra information.

Hidden variables to the rescue! Hidden: h; observed: x.

Boltzmann Machine: a big mixture model. Marginalization: $P(x) = \sum_h P(h)\, P(x \mid h)$, with mixture components $P(x \mid h)$ and mixing weights $P(h)$. Compositional structure of components: h mixes and mashes rows of U.

Boltzmann machine learning. As before, but with hidden variables.

Restricted Boltzmann Machine. Hidden: h; observed: x.

RBM

The perks of a Restricted Boltzmann Machine. All hidden units are conditionally independent given the visible units, and vice versa. We can update them in batch mode!

Restricted Boltzmann Machine sampling: block-Gibbs MCMC.

RBM inference: block-Gibbs MCMC.
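Conditional independence within each layer is what makes block-Gibbs sampling cheap: the whole hidden layer can be resampled in one pass given the visible layer, and then the whole visible layer given the hidden one. The following is a minimal sketch, assuming a binary RBM with energy E(x, h) = -x^T W h - b^T x - c^T h; the names `W`, `b`, `c` and all function names are illustrative and not taken from the lecture.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_hidden(x, W, c, rng):
    """Resample every hidden unit given the visible vector x (one block)."""
    h = []
    for j in range(len(c)):
        activation = c[j] + sum(x[i] * W[i][j] for i in range(len(x)))
        # P(h_j = 1 | x) depends only on x, never on the other hidden units
        h.append(1 if rng.random() < sigmoid(activation) else 0)
    return h

def sample_visible(h, W, b, rng):
    """Resample every visible unit given the hidden vector h (one block)."""
    x = []
    for i in range(len(b)):
        activation = b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        x.append(1 if rng.random() < sigmoid(activation) else 0)
    return x

def block_gibbs(x0, W, b, c, n_steps, rng):
    """Alternate the two block updates, starting from visible state x0."""
    x = list(x0)
    h = sample_hidden(x, W, c, rng)
    for _ in range(n_steps):
        x = sample_visible(h, W, b, rng)
        h = sample_hidden(x, W, c, rng)
    return x, h

# Toy model with small random weights (assumed values, for illustration only)
rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
x0 = [rng.randint(0, 1) for _ in range(n_visible)]
x_sample, h_sample = block_gibbs(x0, W, b, c, n_steps=100, rng=rng)
```

In a vectorized implementation each inner loop becomes a single matrix-vector product, which is the "batch mode" update.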

RBM learning. Maximize the log-likelihood of the training data with respect to the model parameters.

Lecture 4. Variational Approximations; Mean Field Inference.

Entropy reminder. Entropy = optimal coding length: $H(P) = -\sum_x P(x) \log_2 P(x)$.

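Relative Entropy (Kullback-Leibler divergence). Information lost when Q is used to approximate P: $\mathrm{KL}(P \,\|\, Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$. The KL divergence measures the expected number of extra bits required to code samples from P when using a code optimized for Q, rather than using the true code optimized for P. $\mathrm{KL}(P \,\|\, Q) \ge 0$, but $\mathrm{KL}(P \,\|\, Q) \ne \mathrm{KL}(Q \,\|\, P)$ in general (not a proper distance).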
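The coding-length reading can be checked numerically: the cost of coding samples from P with a code built for Q (the cross-entropy) is the entropy of P plus the KL divergence. The following is a small sketch with made-up distributions `p` and `q`; the function names are illustrative.

```python
import math

def entropy(p):
    """Optimal expected code length, in bits, for samples from p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected code length, in bits, for samples from p using a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Extra bits paid for using q's code on samples from p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two distributions over a binary variable (assumed values)
p = [0.5, 0.5]
q = [0.9, 0.1]

extra_bits = cross_entropy(p, q) - entropy(p)   # equals kl_divergence(p, q)
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)                   # differs from forward: no symmetry
```

Swapping the arguments changes the value, which is why the KL divergence is not a distance.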

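Step 1: Bounding the expectation of a convex function. Convex function: $f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$ for $\lambda \in [0, 1]$. For more summands (Jensen's inequality): $f\left(\sum_i \lambda_i x_i\right) \le \sum_i \lambda_i f(x_i)$, with $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$.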

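Step 2: Bounding the KL divergence. Convex function: $f(x) = -\log x$. Applied to the ratio of the two distributions, its expectation is the KL divergence. We also observe that the ratio itself has expectation 1. By Jensen's inequality, the KL divergence is therefore non-negative.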
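Written out, the bound takes three lines. This is a sketch under the assumption that P and Q are discrete, that Q(x) > 0 wherever P(x) > 0, and that Jensen's inequality is applied to -log with weights P(x); the slide's own notation and argument order may differ.

```latex
\begin{align*}
\mathrm{KL}(P \,\|\, Q)
  &= \sum_{x:\,P(x)>0} P(x) \left[ -\log \frac{Q(x)}{P(x)} \right] \\
  &\ge -\log \sum_{x:\,P(x)>0} P(x)\, \frac{Q(x)}{P(x)}
     && \text{(Jensen, $-\log$ is convex)} \\
  &= -\log \sum_{x:\,P(x)>0} Q(x) \;\ge\; -\log 1 = 0 .
\end{align*}
```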

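Variational Inference. $Q^* = \arg\min_{Q \in \mathcal{Q}} \mathrm{KL}(Q \,\|\, P)$, where the choice of the family $\mathcal{Q}$ makes the minimization tractable. Typical family ("naïve mean field"): $Q(x) = \prod_i Q_i(x_i)$.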

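Variational Inference versus MCMC. Gibbs sampling (one variant of MCMC) resamples one variable at a time from its conditional under the target distribution $\pi$:
$x_1^{(t+1)} \sim \pi(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_K^{(t)})$
$x_2^{(t+1)} \sim \pi(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_K^{(t)})$
$\vdots$
$x_K^{(t+1)} \sim \pi(x_K \mid x_1^{(t+1)}, \ldots, x_{K-1}^{(t+1)})$
Variational inference: try to match the distribution with a member of a tractable family $\mathcal{Q}$.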

Variational Inference for the Boltzmann-Gibbs distribution. Exponential family form; variational free energy.
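For reference, one common way to write these quantities is given below. It assumes the convention P(x) = exp(-E(x))/Z for the Boltzmann-Gibbs distribution; the lecture's own sign and parameter conventions are not recoverable from the transcript, so the symbols E, Z and F are assumptions.

```latex
\begin{align*}
P(x) &= \frac{1}{Z} \exp\bigl(-E(x)\bigr),
  \qquad Z = \sum_x \exp\bigl(-E(x)\bigr) \\
\mathrm{KL}(Q \,\|\, P)
  &= \sum_x Q(x) \log \frac{Q(x)}{P(x)}
   = \underbrace{\sum_x Q(x)\, E(x) - H(Q)}_{F(Q)} + \log Z \\
F(Q) &\ge -\log Z, \quad \text{with equality iff } Q = P .
\end{align*}
```

Since log Z does not depend on Q, minimizing the free energy F(Q) over a family is the same as minimizing KL(Q || P) over that family, and it never requires computing Z.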

Ising model. Boltzmann-Gibbs distribution; Ising model energy; variational free energy.

Lecture 4. Variational Approximations; Mean Field Inference.

Naïve Mean Field for binary random variables. Factored distribution: $Q(x) = \prod_i Q_i(x_i)$. Notation: each binary factor $Q_i$ is described by a single parameter.

Naïve Mean Field for Ising model. Independent variables: additive entropy, $H(Q) = \sum_i H(Q_i)$.

Putting it all together. Condition for an extremum, after some algebra: the mean field equations.
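The slide's equations are lost in the transcript, so the following is a sketch under a common convention rather than the lecture's exact form. It assumes spins x_i in {-1, +1} and P(x) proportional to exp(sum_{i<j} J_ij x_i x_j + sum_i h_i x_i). Setting the derivative of the free energy to zero then gives the fixed-point equations m_i = tanh(h_i + sum_j J_ij m_j), where m_i is the mean of x_i under Q_i. The function names and the toy couplings are illustrative.

```python
import itertools
import math

def mean_field_ising(J, h, n_iters=200, tol=1e-10):
    """Iterate m_i = tanh(h_i + sum_j J_ij m_j) until the means stop changing."""
    n = len(h)
    m = [0.0] * n
    for _ in range(n_iters):
        max_change = 0.0
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            new_m = math.tanh(field)
            max_change = max(max_change, abs(new_m - m[i]))
            m[i] = new_m          # coordinate-wise update, reused immediately
        if max_change < tol:
            break
    return m

def exact_means(J, h):
    """Exact E[x_i] by enumerating all 2^n states; only feasible for tiny n."""
    n = len(h)
    z = 0.0
    means = [0.0] * n
    for x in itertools.product([-1, 1], repeat=n):
        score = sum(h[i] * x[i] for i in range(n))
        score += sum(J[i][j] * x[i] * x[j]
                     for i in range(n) for j in range(i + 1, n))
        w = math.exp(score)
        z += w
        for i in range(n):
            means[i] += w * x[i]
    return [mi / z for mi in means]

# 4-node chain with weak ferromagnetic couplings and mixed-sign fields (assumed values)
n = 4
J = [[0.0] * n for _ in range(n)]
for i in range(n - 1):
    J[i][i + 1] = J[i + 1][i] = 0.2
h = [0.3, 0.0, 0.0, -0.1]

m_mf = mean_field_ising(J, h)
m_exact = exact_means(J, h)
```

With weak couplings the mean field estimates stay close to the exact means; with strong couplings the factored family becomes a poorer fit and the gap grows.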

Lecture 4. Variational Approximations; Mean Field Inference; Applications to computer vision (fully connected CRFs).

Mean Field Theory & Computer Vision. Discrete/continuous Hopfield networks (1982/1984); Yuille & coworkers (1985-199X); 1998+: Loopy Belief Propagation >(?) Mean Field; 2011: Mean Field for fully connected CRFs.

MRF nodes as pixels (Winkler, 1995, p. 32).

MRF nodes as patches. Image-scene compatibility: $\Phi(x_i, y_i)$; scene-scene compatibility: $\Psi(x_i, x_j)$.

Network joint probability: $P(x, y) = \frac{1}{Z} \prod_{i,j} \Psi(x_i, x_j) \prod_i \Phi(x_i, y_i)$, where x is the scene and y the image. $\Psi$ is the scene-scene compatibility function, defined over neighboring scene nodes; $\Phi$ is the image-scene compatibility function, defined over local observations.

MRFs for Denoising (Geman & Geman, 1984). $\Phi(x_i, y_i)$ ties each clean pixel to its noisy pixel intensity; $\Psi(x_i, x_j)$ couples neighboring pixels of the clean image.

MRFs for Segmentation

Ising model (two labels). Model for binary vectors. [Figure: samples from the Ising model at different temperatures.]

Potts model (K labels). Multiple labels. [Figure: samples from the Potts model at different temperatures.]

Network joint probability. Scene and image. Image-scene compatibility function: local observations. Scene-scene compatibility function: neighboring scene nodes.

Generative Framework for Vision. MRF: joint model over scene and observations. Vision task: recover the scene given the observations. Bayes' rule: $P(x \mid y) \propto P(y \mid x)\, P(x)$ (posterior, likelihood, prior).
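For a graph small enough to enumerate, the posterior over scenes can be computed directly from the two compatibility functions. The following is a toy sketch: the exponential forms of `psi` and `phi`, the strengths `beta` and `gamma`, and the 4-node chain are all assumptions chosen for illustration, not the lecture's model.

```python
import itertools
import math

def psi(xi, xj, beta=1.0):
    """Scene-scene compatibility: favors equal labels on neighboring nodes."""
    return math.exp(beta if xi == xj else -beta)

def phi(xi, yi, gamma=1.5):
    """Image-scene compatibility: favors a label that agrees with the observation."""
    return math.exp(gamma if xi == yi else -gamma)

def unnormalized(x, y, edges):
    """Product of all pairwise and local compatibilities for one scene x."""
    p = 1.0
    for i, j in edges:
        p *= psi(x[i], x[j])
    for xi, yi in zip(x, y):
        p *= phi(xi, yi)
    return p

def posterior(y, edges, labels=(0, 1)):
    """P(x | y) by brute-force enumeration; only feasible for tiny graphs."""
    configs = list(itertools.product(labels, repeat=len(y)))
    weights = [unnormalized(x, y, edges) for x in configs]
    total = sum(weights)
    return {x: w / total for x, w in zip(configs, weights)}

# Observed binary "image" on a 4-node chain; the third pixel looks like noise
y = (1, 1, 0, 1)
edges = [(0, 1), (1, 2), (2, 3)]

post = posterior(y, edges)
x_map = max(post, key=post.get)   # most probable scene under the posterior
```

Here the smoothness term outweighs the single disagreeing observation, so the most probable scene is the all-ones labeling. Enumeration costs 2^n evaluations, which is why sampling or variational approximations are needed for images.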

Conditional Random Fields. [Figure: MRF and CRF graphs over scene nodes x_1, ..., x_6 and observations y_1, ..., y_6.] CRFs: keep the MRF tools, drop the Bayesian aspect.

CRFs in a nutshell

Grid CRF

Grid CRF limitations

2011: Fully-connected CRF (Krähenbühl & Koltun). Philipp Krähenbühl and Vladlen Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011.

Fully-connected CRF

Fully-connected CRF: FAST. How? Mean field + some tricks.

Trick: pairwise term. Potts model with Gaussian kernels; fast summation through separable convolution.
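To see what the trick speeds up, it helps to write the mean field update with the kernel sum done naively. The following is a sketch under simplifying assumptions: a single Gaussian kernel over 1D positions, a Potts label compatibility, parallel updates, and an explicit O(N^2) sum over all pairs. The function names, `weight`, `sigma` and the toy unaries are illustrative and do not come from the paper's code.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dense_crf_mean_field(unary, positions, weight=1.0, sigma=2.0, n_iters=5):
    """Naive mean field for a fully connected Potts CRF.

    unary[i][l] is the cost (negative log-probability) of label l at node i.
    """
    n, n_labels = len(unary), len(unary[0])
    # Gaussian kernel between every pair of nodes, zero on the diagonal
    kernel = [[math.exp(-((positions[i] - positions[j]) ** 2) / (2 * sigma ** 2))
               if i != j else 0.0 for j in range(n)] for i in range(n)]
    q = [softmax([-u for u in unary[i]]) for i in range(n)]
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            # Kernel-weighted sum of the other nodes' beliefs, per label.
            # This all-pairs sum is the part that fast filtering replaces.
            msg = [sum(kernel[i][j] * q[j][l] for j in range(n))
                   for l in range(n_labels)]
            total = sum(msg)
            # Potts: label l is penalized by the mass placed on all other labels
            scores = [-unary[i][l] - weight * (total - msg[l])
                      for l in range(n_labels)]
            new_q.append(softmax(scores))
        q = new_q
    return q

# Seven nodes on a line; node 3's unary weakly prefers label 1, the rest prefer label 0
n_nodes = 7
confident = [-math.log(0.8), -math.log(0.2)]
outlier = [-math.log(0.4), -math.log(0.6)]
unary = [outlier if i == 3 else confident for i in range(n_nodes)]
positions = list(range(n_nodes))

q = dense_crf_mean_field(unary, positions)
labels = [max(range(2), key=lambda l: q_i[l]) for q_i in q]
```

In this toy run the isolated outlier is pulled to the label of its neighbors. Because the kernel is Gaussian, the inner sum is a Gaussian filtering of each label's belief map, so it can be computed with convolutions instead of visiting all pairs.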

2014: Fully connected CRFs + Deep Classifiers. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv:1412.7062v1, 2014.

Evolution from mean field updates

Results (input, DCNN, CRF-DCNN)

Comparisons to other techniques

Comparisons to previous state-of-the-art