Introduction to Deep Learning: Variational Inference, Mean Field Theory


1 Introduction to Deep Learning: Variational Inference, Mean Field Theory. Iasonas Kokkinos, Center for Visual Computing, Ecole Centrale Paris; Galen Group, INRIA-Saclay
2 Lecture 3 recap: Network Architectures, Boltzmann Machine, Restricted Boltzmann Machine
3 Boltzmann Machine (Hinton & Sejnowski). A full-blown Ising model. Parameter estimation, once again: training data + MCMC
4 Boltzmann Machine limitations. The underlying statistical model only constrains second-order moments; this will not get us too far, even with extra information
5 Hidden variables to the rescue! hidden: h, observed: x
6 Boltzmann Machine: a big mixture model. Marginalization over h yields mixture components and mixing weights; compositional structure of components: h mixes and matches rows of U
7 Boltzmann machine learning. As before, but with hidden variables
8 Boltzmann machine learning (continued)
9 Restricted Boltzmann Machine. hidden: h, observed: x
10 RBM
11 The perks of a Restricted Boltzmann Machine. All hidden units are conditionally independent given the visible units, and vice versa. We can update them in batch mode!
12 Restricted Boltzmann Machine sampling. Block-Gibbs MCMC
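As a sketch of what block-Gibbs sampling in an RBM looks like in practice (the layer sizes and the weight matrix `W` and biases `b`, `c` below are illustrative, not taken from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def block_gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep: all hidden units are conditionally
    independent given v, so they are sampled in one batch, and vice versa."""
    p_h = sigmoid(v @ W + c)                        # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)                      # p(v_i = 1 | h)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return v, h

# Toy RBM: 6 visible and 4 hidden units with small random weights.
W = rng.normal(scale=0.1, size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
v = rng.integers(0, 2, size=6).astype(float)
for _ in range(100):                                # run the chain
    v, h = block_gibbs_step(v, W, b, c, rng)
```

Each sweep touches every unit once, in two batched matrix operations; this is the practical payoff of the bipartite RBM structure over the fully connected Boltzmann machine.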
13 RBM inference. Block-Gibbs MCMC
14 RBM learning. Maximize the likelihood with respect to the model parameters
15 Lecture 4: Variational Approximations, Mean Field Inference
16 Entropy reminder. Entropy = optimal coding length
17 Relative Entropy (Kullback-Leibler divergence). Information lost when Q is used to approximate P: the KL divergence measures the expected number of extra bits required to code samples from P when using a code optimized for Q, rather than the true code optimized for P. KL(P‖Q) ≥ 0, but KL(P‖Q) ≠ KL(Q‖P) in general (not a proper distance)
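A minimal numeric illustration of the KL divergence and its asymmetry (the distributions `p` and `q` are made up for the example):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i log(p_i / q_i), in nats.
    Terms with p_i = 0 contribute 0 by the usual convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.25, 0.25]    # "true" distribution P
q = [1/3, 1/3, 1/3]      # approximation Q
extra_bits = kl_divergence(p, q) / np.log(2)   # nats -> bits: coding overhead
```

Evaluating both directions shows KL(P‖Q) and KL(Q‖P) differ, which is why the choice of direction matters in variational inference.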
18 Step 1: Bounding the expectation of a convex function. Convex function: f(λx_1 + (1−λ)x_2) ≤ λf(x_1) + (1−λ)f(x_2) for λ ∈ [0, 1]. For more summands (Jensen's inequality): f(Σ_i λ_i x_i) ≤ Σ_i λ_i f(x_i), with λ_i ≥ 0 and Σ_i λ_i = 1
19 Step 2: Bounding the KL divergence. Convex function: f(x) = −log x. For λ_i = p_i and x_i = q_i/p_i we get the KL divergence: KL(P‖Q) = Σ_i p_i log(p_i/q_i) = Σ_i p_i f(q_i/p_i). We also observe: Σ_i p_i f(q_i/p_i) ≥ f(Σ_i q_i) = −log 1 = 0, by Jensen's inequality
20 Variational Inference. Approximate an intractable P by the closest member Q of a family whose form makes the minimization of the KL divergence tractable. Typical family ("naïve mean field"): fully factored distributions, Q(x) = Π_i Q_i(x_i)
21 Variational Inference versus MCMC. Gibbs Sampling (one variant of MCMC) updates one variable at a time, conditioning on the current values of all the others:
x_1^(t+1) ~ π(x_1 | x_2^(t), x_3^(t), …, x_K^(t))
x_2^(t+1) ~ π(x_2 | x_1^(t+1), x_3^(t), …, x_K^(t))
…
x_K^(t+1) ~ π(x_K | x_1^(t+1), …, x_{K−1}^(t+1))
Variational inference instead tries to match the distribution with a member of a tractable family
22 Variational Inference for the Boltzmann-Gibbs distribution. Exponential family: P(x) = (1/Z) exp(−E(x)). Variational Free Energy: F(Q) = E_Q[E(x)] − H(Q)
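Spelling out the free-energy bound (a standard derivation, not transcribed from the slide): for P(x) = exp(−E(x))/Z,

```latex
\begin{aligned}
\mathrm{KL}(Q\,\|\,P) &= \sum_x Q(x)\,\log\frac{Q(x)}{P(x)}
 = \sum_x Q(x)\,E(x) + \sum_x Q(x)\log Q(x) + \log Z \\
 &= \underbrace{\mathbb{E}_Q[E(x)] - H(Q)}_{F(Q)} + \log Z \;\ge\; 0,
\end{aligned}
```

so F(Q) ≥ −log Z, with equality iff Q = P; minimizing F(Q) over a tractable family is exactly the variational inference objective.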
23 Ising model. Boltzmann-Gibbs distribution: P(x) = (1/Z) exp(−E(x)). Ising model: E(x) = −Σ_{i,j} θ_{ij} x_i x_j − Σ_i θ_i x_i. Variational Free Energy: F(Q) = E_Q[E(x)] − H(Q)
24 Lecture 4: Variational Approximations, Mean Field Inference
25 Naïve Mean Field for binary random variables. Factored distribution: Q(x) = Π_i Q_i(x_i). Notation: q_i = Q_i(x_i = 1)
26 Naïve Mean Field for Ising model
27 Naïve Mean Field for Ising model. Independent variables give an additive entropy: H(Q) = Σ_i H(Q_i)
28 Putting it all together. Condition for an extremum: ∂F/∂q_i = 0. After some algebra, the Mean Field Equations: q_i = σ(θ_i + Σ_j θ_{ij} q_j), where σ is the logistic sigmoid; iterate the updates to a fixed point
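The resulting fixed-point iteration is a few lines of code; a sketch for the binary (0/1) case, assuming a symmetric coupling matrix `J` with zero diagonal and unary terms `h` (values below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field_ising(J, h, n_iters=200):
    """Naive mean field for binary (0/1) variables with energy
    E(x) = -sum_{i,j} J_ij x_i x_j - sum_i h_i x_i.
    Iterates q_i <- sigmoid(h_i + sum_j J_ij q_j) to a fixed point."""
    q = np.full(len(h), 0.5)          # start from uniform marginals
    for _ in range(n_iters):
        for i in range(len(h)):       # coordinate-wise updates
            q[i] = sigmoid(h[i] + J[i] @ q)
    return q

# Two coupled variables: the coupling pulls both marginals up together.
J = np.array([[0.0, 1.0],
              [1.0, 0.0]])
h = np.array([-0.5, 1.0])
q = mean_field_ising(J, h)
```

With no coupling (J = 0) the fixed point is simply q_i = σ(h_i), which is a quick sanity check on the update.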
29 Lecture 4: Variational Approximations, Mean Field Inference, Applications to computer vision (fully connected CRFs)
30 Mean Field Theory & Computer Vision. Discrete/Continuous Hopfield Networks (1982/1984); Yuille & co-workers; Loopy Belief Propagation > (?) Mean Field; 2011: Mean Field for fully connected CRFs
31 MRF nodes as pixels (Winkler, 1995, p. 32)
32 MRF nodes as patches. Image-scene compatibility Φ(x_i, y_i); scene-scene compatibility Ψ(x_i, x_j)
33 Network joint probability: P(x, y) = (1/Z) Π_{i,j} Ψ(x_i, x_j) Π_i Φ(x_i, y_i), with scene x and image y. Ψ: scene-scene compatibility function over neighboring scene nodes; Φ: image-scene compatibility function tying local observations to the scene
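For a toy model this joint can be evaluated and normalized by brute force; a sketch with hypothetical compatibility tables `psi` and `phi` on a 3-pixel binary chain:

```python
import itertools
import numpy as np

# Hypothetical compatibility tables for a 3-pixel binary chain MRF.
psi = np.array([[2.0, 0.5],       # scene-scene: neighbors prefer equal labels
                [0.5, 2.0]])
phi = np.array([[1.5, 0.5],       # image-scene: labels prefer matching pixels
                [0.5, 1.5]])
y = [0, 0, 1]                     # fixed observed image
edges = [(0, 1), (1, 2)]          # chain neighborhood

def unnormalized(x):
    """Product of pairwise and unary compatibilities, i.e. Z * P(x, y)."""
    p = 1.0
    for i, j in edges:
        p *= psi[x[i], x[j]]
    for i, xi in enumerate(x):
        p *= phi[xi, y[i]]
    return p

# Brute-force partition function and posterior (feasible only for tiny models).
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)
posterior = {x: unnormalized(x) / Z for x in states}
```

Enumeration is exponential in the number of nodes, which is exactly why the lecture turns to MCMC and variational approximations for realistic grids.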
34 MRFs for Denoising (Geman & Geman, 1984). Φ(x_i, y_i): noisy pixel intensities; Ψ(x_i, x_j): clean-image prior
35 MRFs for Segmentation
36 Ising model (two labels). Model for binary vectors: samples from the Ising model for different temperatures
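A minimal Gibbs sampler that can reproduce such temperature-dependent samples (spins in {−1, +1}, 4-neighborhood, periodic boundary; the grid size and `beta` = 1/T below are illustrative):

```python
import numpy as np

def gibbs_ising(size=32, beta=0.8, sweeps=200, seed=0):
    """Gibbs sampler for a 2D Ising model: resample each spin from its
    conditional given the 4-neighborhood. Large beta (low temperature)
    produces large aligned regions; small beta produces noise."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(size, size))
    for _ in range(sweeps):
        for i in range(size):
            for j in range(size):
                nb = (s[(i - 1) % size, j] + s[(i + 1) % size, j]
                      + s[i, (j - 1) % size] + s[i, (j + 1) % size])
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

sample = gibbs_ising(size=16, beta=0.8, sweeps=50)
```

The conditional p(s_ij = +1 | neighbors) = σ(2β·Σ neighbors) follows directly from the Boltzmann-Gibbs form of the Ising model.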
37 Potts model (K labels). Multiple labels: samples from the Potts model for different temperatures
38 Network joint probability. Scene and image; image-scene compatibility function (local observations); scene-scene compatibility function (neighboring scene nodes)
39 Generative Framework for Vision. MRF: joint model over scene and observations. Vision task: recover the scene given the observations. Bayes' rule: Posterior ∝ Likelihood × Prior
40 Conditional Random Fields. MRF: joint model P(x, y) over labels x_1, …, x_6 and observations y_1, …, y_6; CRF: conditional model P(x | y) over the same graph. CRFs: keep MRF tools, drop the Bayesian (generative) aspect
41 CRFs in a nutshell
42 Grid CRF
43 Grid CRF limitations
44 Grid CRF limitations (continued)
45 2011: Fully-connected CRF (Krähenbühl & Koltun). Philipp Krähenbühl and Vladlen Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011
46 Fully-connected CRF
47 Fully-connected CRF
48 Fully-connected CRF
49 Fully-connected CRF: FAST. How? Mean Field + some tricks
50 Trick: Pairwise Term. Potts model compatibility combined with Gaussian kernels over pixel features; fast summation of mean-field messages through separable (Gaussian) convolution
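The core idea can be sketched with a single, purely spatial Gaussian kernel: each mean-field update then amounts to Gaussian-blurring the current label marginals, and a separable blur does this with one 1-D convolution per axis. (The paper's bilateral, color-dependent kernel needs high-dimensional filtering, e.g. the permutohedral lattice, omitted here; all parameter values below are illustrative.)

```python
import numpy as np

def gaussian_blur_separable(img, sigma):
    """Separable Gaussian filter: one 1-D convolution per axis instead of
    a full 2-D convolution (the kernel must be shorter than the image)."""
    t = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, out)

def mean_field_step(Q, unary, w=1.0, sigma=1.0):
    """One mean-field update for a fully connected CRF with a single spatial
    Gaussian kernel and Potts compatibility: blur each label's marginals,
    penalize each pixel by the blurred mass of the *other* labels, renormalize."""
    L = Q.shape[-1]
    msg = np.stack([gaussian_blur_separable(Q[..., l], sigma)
                    for l in range(L)], axis=-1)
    pairwise = w * (msg.sum(axis=-1, keepdims=True) - msg)  # Potts penalty
    logits = -unary - pairwise
    logits -= logits.max(axis=-1, keepdims=True)            # numerical stability
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=-1, keepdims=True)
```

The naive pairwise sum over all pixel pairs is quadratic in image size; the filtering view makes each update roughly linear, which is what makes dense CRF inference practical.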
51 2014: Fully connected CRFs + Deep Classifiers. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv v1, 2014
52 Evolution from mean field updates
53 Results (input, DCNN, CRF-DCNN)
54 Results (input, DCNN, CRF-DCNN)
55 Comparisons to other techniques
56 Comparisons to previous state-of-the-art