Introduction to Deep Learning: Variational Inference, Mean Field Theory
Iasonas Kokkinos, Iasonas.kokkinos@ecp.fr
Center for Visual Computing, Ecole Centrale Paris
Galen Group, INRIA-Saclay
Lecture 3 recap: Network Architectures. Boltzmann Machine; Restricted Boltzmann Machine.
Boltzmann Machine (Hinton & Sejnowski, 1983+): a full-blown Ising model. Parameter estimation, once again: training data + MCMC.
Boltzmann Machine limitations: the underlying statistical model only constrains second-order moments. This will not get us far, even with extra information.
Hidden variables, to the rescue! Hidden: h; observed: x.
Boltzmann Machine: a big mixture model. Marginalization over h yields mixture components and mixing weights. Compositional structure of the components: h mixes and matches rows of U.
Boltzmann machine learning: as before, but with hidden variables.
Restricted Boltzmann Machine (RBM): hidden h, observed x.
The perks of a Restricted Boltzmann Machine: all hidden units are conditionally independent given the visible units, and vice versa. We can update them in batch mode!
Restricted Boltzmann Machine sampling: Block-Gibbs MCMC.
RBM inference: Block-Gibbs MCMC.
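A minimal sketch of block-Gibbs sampling in a binary RBM, exploiting the conditional independence of each layer (the whole layer is sampled in one batch). All function names and the toy parameter values below are hypothetical, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, b_h):
    """All hidden units are conditionally independent given v: sample them in one batch."""
    p = sigmoid(v @ W + b_h)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, b_v):
    """Symmetrically, all visible units are conditionally independent given h."""
    p = sigmoid(h @ W.T + b_v)
    return (rng.random(p.shape) < p).astype(float), p

def block_gibbs(v0, W, b_v, b_h, n_steps=100):
    """Alternate full-layer updates: v -> h -> v -> ..."""
    v = v0
    for _ in range(n_steps):
        h, _ = sample_h_given_v(v, W, b_h)
        v, _ = sample_v_given_h(h, W, b_v)
    return v

# Toy example: 6 visible and 3 hidden units, small random weights.
W = rng.normal(0, 0.1, size=(6, 3))
b_v, b_h = np.zeros(6), np.zeros(3)
v = block_gibbs(rng.integers(0, 2, size=6).astype(float), W, b_v, b_h)
print(v)  # one binary configuration of the visible layer
```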
RBM learning: maximize the likelihood with respect to the model parameters.
Lecture 4: Variational Approximations; Mean Field Inference.
Entropy reminder: entropy = optimal coding length.
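As a quick numerical illustration (hypothetical helper, not from the slides), Shannon entropy in bits is the average optimal code length per symbol:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: average optimal code length per symbol."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p = 0 contribute 0 by convention
    return -np.sum(p * np.log2(p))

# A fair coin needs 1 bit per flip; a biased coin needs less.
print(entropy([0.5, 0.5]))   # -> 1.0
print(entropy([0.9, 0.1]))   # -> ~0.469
```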
Relative Entropy (Kullback-Leibler divergence): the information lost when Q is used to approximate P. The KL divergence measures the expected number of extra bits required to code samples from P when using a code optimized for Q, rather than the true code optimized for P. Note that D(P||Q) ≠ D(Q||P) in general, so it is not a proper distance.
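The asymmetry is easy to check numerically (hypothetical helper, not from the slides):

```python
import numpy as np

def kl_divergence(p, q):
    """D(P||Q) in bits: extra bits paid for coding P with a code built for Q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q))  # -> ~0.737
print(kl_divergence(q, p))  # -> ~0.531, so D(P||Q) != D(Q||P): not a proper distance
```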
Step 1: Bounding the expectation of a convex function. Convex function: f(λx₁ + (1-λ)x₂) ≤ λf(x₁) + (1-λ)f(x₂). For more summands (Jensen's inequality): f(Σᵢ λᵢxᵢ) ≤ Σᵢ λᵢf(xᵢ), with λᵢ ≥ 0, Σᵢ λᵢ = 1.
Step 2: Bounding the KL divergence. For the convex function f(x) = -log(x) we get the KL divergence, and by Jensen's inequality we observe that D(P||Q) ≥ 0.
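Written out, this is the standard derivation of the non-negativity of the KL divergence (consistent with the two steps above, notation assumed):

```latex
D(P\|Q) = \sum_x p(x)\log\frac{p(x)}{q(x)}
        = \sum_x p(x)\left(-\log\frac{q(x)}{p(x)}\right)
        \;\ge\; -\log\sum_x p(x)\,\frac{q(x)}{p(x)}
        = -\log\sum_x q(x) = -\log 1 = 0,
```

where the inequality is Jensen's inequality applied to the convex function f(x) = -log(x), with weights p(x).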
Variational Inference: approximate P by the closest member of a family of distributions, where the choice of family makes the minimization tractable. Typical family ("naïve mean field"): fully factored distributions.
Variational Inference versus MCMC. Gibbs sampling (one variant of MCMC) updates one variable at a time from its conditional:
x₁⁽ᵗ⁺¹⁾ ~ π(x₁ | x₂⁽ᵗ⁾, x₃⁽ᵗ⁾, …, x_K⁽ᵗ⁾)
x₂⁽ᵗ⁺¹⁾ ~ π(x₂ | x₁⁽ᵗ⁺¹⁾, x₃⁽ᵗ⁾, …, x_K⁽ᵗ⁾)
⋮
x_K⁽ᵗ⁺¹⁾ ~ π(x_K | x₁⁽ᵗ⁺¹⁾, …, x_{K-1}⁽ᵗ⁺¹⁾)
Variational inference instead tries to match the distribution with a member of a tractable family.
Variational Inference for the Boltzmann-Gibbs distribution. Exponential family form; Variational Free Energy.
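Expanding the KL divergence for a Boltzmann-Gibbs distribution P(x) = exp(-E(x))/Z makes the connection to the variational free energy explicit (a standard manipulation; the notation is assumed, not reproduced from the slides):

```latex
D(Q\|P) = \sum_x Q(x)\log\frac{Q(x)}{P(x)}
        = \underbrace{\mathbb{E}_Q[E(x)] - H(Q)}_{F(Q)\ \text{(variational free energy)}} + \log Z .
```

Since log Z does not depend on Q, minimizing D(Q||P) over Q is equivalent to minimizing F(Q).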
Ising model: a Boltzmann-Gibbs distribution. Variational Free Energy for the Ising model.
Lecture 4: Variational Approximations; Mean Field Inference.
Naïve Mean Field for binary random variables: factored distribution; notation.
Naïve Mean Field for the Ising model.
Naïve Mean Field for the Ising model: independent variables give additive entropy.
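For a fully factored Q over binary variables, the entropy decomposes into a sum of per-variable binary entropies (a standard identity; notation assumed):

```latex
H(Q) = \sum_i H(Q_i), \qquad
H(Q_i) = -\,q_i \log q_i \;-\; (1-q_i)\log(1-q_i), \quad q_i = Q_i(x_i = 1).
```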
Putting it all together: setting the derivative of the variational free energy to zero (condition for an extremum) and doing some algebra yields the Mean Field Equations.
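A minimal numerical sketch of the resulting fixed-point iteration for the Ising model, assuming the standard form of the mean field equations m_i = tanh(β(Σ_j J_ij m_j + h_i)); the function name and toy values are hypothetical:

```python
import numpy as np

def mean_field_ising(J, h, beta=1.0, n_iters=200, damping=0.5):
    """Naive mean field for an Ising model with spins s_i in {-1, +1}.

    Fixed-point iteration of the mean field equations
        m_i = tanh( beta * (sum_j J_ij m_j + h_i) )
    where m_i = E_Q[s_i] under the factored distribution Q.
    """
    m = np.zeros(len(h))
    for _ in range(n_iters):
        m_new = np.tanh(beta * (J @ m + h))
        m = damping * m + (1 - damping) * m_new  # damping helps convergence
    return m

# Toy 4-spin chain with ferromagnetic couplings and a small positive field.
J = np.zeros((4, 4))
for i in range(3):
    J[i, i + 1] = J[i + 1, i] = 1.0
h = np.full(4, 0.2)
m = mean_field_ising(J, h, beta=0.5)
print(m)  # magnetizations in (-1, 1), biased positive by the field
```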
Lecture 4: Variational Approximations; Mean Field Inference; Applications to computer vision (fully connected CRFs).
Mean Field Theory & Computer Vision: Discrete/Continuous Hopfield Networks (1982/1984); Yuille & coworkers (1985-199X); 1998+: Loopy Belief Propagation >(?) Mean Field; 2011: Mean Field for fully connected CRFs.
MRF nodes as pixels (Winkler, 1995, p. 32).
MRF nodes as patches: Φ(x_i, y_i) links image and scene; Ψ(x_i, x_j) links neighboring scene nodes.
Network joint probability:
P(x, y) = (1/Z) ∏_{i,j} Ψ(x_i, x_j) ∏_i Φ(x_i, y_i)
with x the scene and y the image; Ψ(x_i, x_j) is the scene-scene compatibility function (neighboring scene nodes) and Φ(x_i, y_i) is the image-scene compatibility function (local observations).
MRFs for Denoising (Geman & Geman, 1984): Φ(x_i, y_i) links the noisy pixel intensities y to the clean image x; Ψ(x_i, x_j) couples neighboring clean-image pixels.
MRFs for Segmentation.
Ising model (two labels): a model for binary vectors. Samples from the Ising model at different temperatures.
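A single-site Gibbs sampler for a 2D Ising model reproduces the temperature effect described above: high temperature (low β) gives noise-like samples, low temperature gives large aligned domains. A minimal sketch with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_ising(L=16, beta=1.0, n_sweeps=50):
    """Single-site Gibbs sampling of a 2D Ising model on an L x L torus.

    P(s) ∝ exp(beta * sum over neighbor pairs of s_i * s_j).
    Each site is resampled from its conditional given its four neighbors.
    """
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps):
        for i in range(L):
            for j in range(L):
                nb = (s[(i - 1) % L, j] + s[(i + 1) % L, j]
                      + s[i, (j - 1) % L] + s[i, (j + 1) % L])
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

hot = gibbs_ising(beta=0.1)   # high temperature: disordered, noise-like
cold = gibbs_ising(beta=1.0)  # low temperature: large aligned domains
print(abs(hot.mean()), abs(cold.mean()))
```

The magnetization |mean(s)| stays near zero at high temperature and typically grows at low temperature, matching the qualitative behavior of the samples shown on the slide.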
Potts model (K labels): multiple labels. Samples from the Potts model at different temperatures.
Network Joint Probability: scene and image; image-scene compatibility function (local observations); scene-scene compatibility function (neighboring scene nodes).
Generative Framework for Vision. MRF: joint model over scene and observations. Vision task: recover the scene given the observations, via Bayes' rule: Posterior ∝ Likelihood × Prior.
Conditional Random Fields. [Figure: MRF vs. CRF graphical models over scene nodes x_1, …, x_6 and observations y_1, …, y_6.] CRFs: keep the MRF tools, drop the Bayesian aspect.
CRFs in a nutshell.
Grid CRF.
Grid CRF limitations.
2011: Fully-connected CRF (Krähenbühl & Koltun). Philipp Krähenbühl and Vladlen Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011.
Fully-connected CRF.
Fully-connected CRF: FAST. How? Mean Field + some tricks.
Trick: the pairwise term is a Potts model with Gaussian kernels, allowing fast summation through separable convolution.
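To make the update concrete, here is a naive O(N²) mean-field sketch for a fully connected Potts CRF with a single Gaussian kernel; the paper's contribution is precisely replacing the quadratic kernel summation below with fast high-dimensional (separable) filtering. All names and toy values are hypothetical:

```python
import numpy as np

def dense_crf_mean_field(unary, feats, w=1.0, bandwidth=1.0, n_iters=10):
    """Mean field for a fully connected Potts CRF (naive O(N^2) sketch).

    unary: (N, L) negative-log unary potentials; feats: (N, D) pixel features.
    The pairwise term is a Potts model weighted by the Gaussian kernel
    k(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 * bandwidth^2)); the O(N^2)
    kernel product below is what the paper replaces with fast filtering.
    """
    d2 = np.sum((feats[:, None, :] - feats[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)  # exclude j = i from the message

    Q = np.exp(-unary)
    Q /= Q.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        msg = K @ Q  # kernel-weighted label beliefs from all other nodes
        # Potts compatibility: each label is penalized by support for the others.
        pair = w * (msg.sum(axis=1, keepdims=True) - msg)
        Q = np.exp(-unary - pair)
        Q /= Q.sum(axis=1, keepdims=True)
    return Q

# Toy example: 5 points on a line, 2 labels; one noisy unary at index 1.
feats = np.arange(5.0)[:, None]
unary = np.array([[0.2, 1.0], [1.0, 0.2], [0.2, 1.0], [0.2, 1.0], [0.2, 1.0]])
Q = dense_crf_mean_field(unary, feats, w=2.0)
print(Q.argmax(axis=1))  # pairwise smoothing pulls the outlier toward its neighbors
```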
2014: Fully connected CRFs + Deep Classifiers. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv:1412.7062v1, 2014.
Evolution from mean field updates.
Results (input, DCNN, CRF-DCNN).
Comparisons to other techniques.
Comparisons to previous state-of-the-art.