Introduction to Deep Learning: Variational Inference, Mean Field Theory


1 Introduction to Deep Learning: Variational Inference, Mean Field Theory. Iasonas Kokkinos, Center for Visual Computing, Ecole Centrale Paris; Galen Group, INRIA-Saclay

2 Lecture 3: recap. Network architectures: Boltzmann Machine, Restricted Boltzmann Machine

3 Boltzmann Machine (Hinton & Sejnowski): full-blown Ising model. Parameter estimation, once again: training data, MCMC

4 Boltzmann Machine limitations. Underlying statistical model: constrains second-order moments. This will not get us too far, even with extra information

5 Hidden variables, to the rescue! hidden: h, observed: x

6 Boltzmann Machine: a big mixture model. Marginalization; mixture components; mixing weights. Compositional structure of components: h mixes and mashes rows of U

7 Boltzmann machine learning: as before, but with hidden variables

8 Boltzmann machine learning

9 Restricted Boltzmann Machine. hidden: h, observed: x

10 RBM

11 The perks of a Restricted Boltzmann Machine: all hidden units are conditionally independent given the visible units, and vice versa. We can update them in batch mode!

12 Restricted Boltzmann Machine sampling: Block-Gibbs MCMC
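Because the hidden units are conditionally independent given the visible units (and vice versa), block Gibbs sampling can resample a whole layer in one step. The following is a minimal pure-Python sketch, not the lecture's own code. It assumes binary units and the common parametrization E(x, h) = -x^T W h - b^T x - c^T h; the names `W`, `b`, `c` and all function names are illustrative assumptions.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def sample_hidden(x, W, c, rng):
    """Sample every hidden unit given x: P(h_j = 1 | x) = sigmoid(c_j + sum_i x_i W[i][j])."""
    probs = [sigmoid(c[j] + sum(x[i] * W[i][j] for i in range(len(x))))
             for j in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def sample_visible(h, W, b, rng):
    """Sample every visible unit given h: P(x_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] h_j)."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def block_gibbs(x, W, b, c, n_steps, rng):
    """Alternate the two block updates n_steps times, starting from visible state x."""
    h, _probs = sample_hidden(x, W, c, rng)
    for _step in range(n_steps):
        x, _probs = sample_visible(h, W, b, rng)
        h, _probs = sample_hidden(x, W, c, rng)
    return x, h


rng = random.Random(0)
W = [[1.0, -1.0], [2.0, 0.5], [-0.5, 0.0]]   # 3 visible units, 2 hidden units
x, h = block_gibbs([1, 0, 1], W, [0.0, 0.0, 0.0], [0.0, 0.0], n_steps=10, rng=rng)
print(x, h)
```

Each block update touches a full layer at once, which is the "batch mode" the slide refers to; in a vectorized implementation each update would be a single matrix product.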

13 RBM inference: Block-Gibbs MCMC

14 RBM learning: maximize the log-likelihood of the training data with respect to the model parameters
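For reference, the standard form of the log-likelihood gradient can be written out as below. This is a sketch under the assumption that the RBM energy is E(x, h) = -x^T W h - b^T x - c^T h; the slide's own symbols did not survive, so W, b and c are assumed names.

```latex
% Assumed parametrization: E(x,h) = -x^\top W h - b^\top x - c^\top h
P(x) = \frac{1}{Z} \sum_{h} \exp\bigl(-E(x,h)\bigr)
\qquad
\frac{\partial \log P(x)}{\partial W_{ij}}
  = \mathbb{E}_{P(h \mid x)}\bigl[x_i h_j\bigr]
  - \mathbb{E}_{P(x,h)}\bigl[x_i h_j\bigr]
```

The first expectation is tractable because the hidden units factorize given x; the second is the term that requires sampling from the model, for example with block-Gibbs MCMC.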

15 Lecture 4: Variational Approximations, Mean Field Inference

16 Entropy reminder: entropy = optimal coding length
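To make the coding-length reading concrete, here is a small sketch that computes Shannon entropy in bits. The function name `entropy_bits` and the example distributions are illustrative assumptions.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i, in bits (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


print(entropy_bits([0.5, 0.5]))    # fair coin: 1 bit per symbol
print(entropy_bits([0.25] * 4))    # four equally likely symbols: 2 bits per symbol
print(entropy_bits([0.9, 0.1]))    # skewed coin: less than 1 bit per symbol
```

A fair coin needs one bit per outcome, while a predictable source can on average be coded with fewer bits.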

17 Relative Entropy (Kullback-Leibler divergence): information lost when Q is used to approximate P. The KL divergence measures the expected number of extra bits required to code samples from P when using a code optimized for Q, rather than the true code optimized for P. It is not symmetric in P and Q, so it is not a proper distance.
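The "extra bits" interpretation can be checked numerically: the KL divergence equals the cross-entropy of coding P with Q's code minus the entropy of P. The sketch below uses illustrative function names and example distributions, and assumes Q is non-zero wherever P is.

```python
import math


def entropy_bits(p):
    """H(P) in bits: average code length with the code optimized for P."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


def cross_entropy_bits(p, q):
    """Average code length for samples from P when using the code optimized for Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_bits(p, q):
    """KL(P || Q) = sum_i p_i log2(p_i / q_i), assuming q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_bits(p, q), cross_entropy_bits(p, q) - entropy_bits(p))  # same value: the extra bits
print(kl_bits(p, q), kl_bits(q, p))                               # differ: KL is not symmetric
```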


18 Step 1: Bounding the expectation of a convex function. Convex function; for more summands: Jensen's inequality
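The slide's formulas did not survive extraction. The standard statements it refers to are reproduced below as a reminder; the symbols f, \lambda and x_i are assumed notation.

```latex
% Convexity (two points), for 0 \le \lambda \le 1:
f\bigl(\lambda x + (1-\lambda) y\bigr) \;\le\; \lambda f(x) + (1-\lambda) f(y)

% Jensen's inequality (more summands), for \lambda_i \ge 0, \ \sum_i \lambda_i = 1:
f\Bigl(\sum_i \lambda_i x_i\Bigr) \;\le\; \sum_i \lambda_i f(x_i)
\qquad\text{i.e.}\qquad
f\bigl(\mathbb{E}[X]\bigr) \;\le\; \mathbb{E}\bigl[f(X)\bigr]
```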

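19 Step 2: Bounding the KL divergence. Take the convex function -log: applied to the ratio Q/P and averaged under P, it gives the KL divergence. We also observe that the ratio Q/P averages to 1 under P, so by Jensen's inequality the KL divergence is non-negative.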
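Written out, the argument is a one-line application of Jensen's inequality. The derivation below is the standard one, given here as a sketch since the slide's own equations are missing; it assumes discrete x and Q(x) > 0 wherever P(x) > 0.

```latex
\mathrm{KL}(P \,\|\, Q)
  = \sum_x P(x) \log \frac{P(x)}{Q(x)}
  = \sum_x P(x) \left( -\log \frac{Q(x)}{P(x)} \right)
  \;\ge\; -\log \sum_x P(x)\, \frac{Q(x)}{P(x)}
  = -\log \sum_x Q(x)
  = -\log 1
  = 0
```

Here f = -\log is the convex function, the weights are \lambda_x = P(x), and the points are the ratios Q(x)/P(x).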

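20 Variational Inference: approximate P by the member Q of a restricted family that minimizes the KL divergence, where the choice of family makes the minimization tractable. Typical family ("naïve mean field"): fully factored distributions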

21 Gibbs Sampling (one variant of MCMC)
x_1^{(t+1)} \sim \pi(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_K^{(t)})
x_2^{(t+1)} \sim \pi(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_K^{(t)})
\ldots
x_K^{(t+1)} \sim \pi(x_K \mid x_1^{(t+1)}, \ldots, x_{K-1}^{(t+1)})
Variational Inference versus MCMC. Variational inference: try to match the distribution with a member of a tractable family
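The sweep above can be sketched for a small pairwise binary model. This is an illustration under assumptions that the slide does not state: variables take values in {0, 1}, the target is \pi(x) \propto \exp(\sum_{i<j} W_{ij} x_i x_j + \sum_i b_i x_i) with symmetric W, and all function names are hypothetical.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def gibbs_sweep(x, W, b, rng):
    """One systematic-scan sweep: resample x_1, ..., x_K in turn, each from its
    conditional given the current values of all the other variables."""
    K = len(x)
    for k in range(K):
        field = b[k] + sum(W[k][j] * x[j] for j in range(K) if j != k)
        x[k] = 1 if rng.random() < sigmoid(field) else 0   # P(x_k = 1 | rest)
    return x


def estimate_means(W, b, n_samples=5000, burn_in=500, seed=0):
    """Monte Carlo estimate of E[x_k] from one Gibbs chain, discarding burn-in sweeps."""
    rng = random.Random(seed)
    K = len(b)
    x = [rng.randint(0, 1) for _ in range(K)]
    totals = [0] * K
    for t in range(burn_in + n_samples):
        gibbs_sweep(x, W, b, rng)
        if t >= burn_in:
            for k in range(K):
                totals[k] += x[k]
    return [s / n_samples for s in totals]


W = [[0.0, 1.0], [1.0, 0.0]]
print(estimate_means(W, [0.0, -0.5]))
```

MCMC answers queries by averaging over samples like these, whereas variational inference replaces sampling with an optimization over a family of simpler distributions.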

22 Variational Inference for Boltzmann-Gibbs distribution. Exponential family; variational free energy
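The slide's equations are missing; the usual derivation linking the KL divergence to the variational free energy is sketched below. The sign conventions and the symbols E(x), Z, F(Q) and H(Q) are assumptions, chosen to match the common Boltzmann-Gibbs form.

```latex
% Boltzmann-Gibbs distribution (assumed form):
P(x) = \frac{1}{Z} \exp\bigl(-E(x)\bigr), \qquad Z = \sum_x \exp\bigl(-E(x)\bigr)

% KL divergence from Q to P:
\mathrm{KL}(Q \,\|\, P)
  = \sum_x Q(x) \log Q(x) - \sum_x Q(x) \log P(x)
  = \underbrace{\mathbb{E}_Q\bigl[E(x)\bigr] - H(Q)}_{F(Q)} + \log Z

% Since KL(Q || P) >= 0:
F(Q) \;\ge\; -\log Z, \qquad \text{with equality iff } Q = P
```

Because \log Z does not depend on Q, minimizing the KL divergence over Q is the same as minimizing the variational free energy F(Q), which never requires computing Z.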

23 Ising model. Boltzmann-Gibbs distribution; Ising model; variational free energy

24 Lecture 4: Variational Approximations, Mean Field Inference

25 Naïve Mean Field for binary random variables. Factored distribution; notation

26 Naïve Mean Field for Ising model

27 Naïve Mean Field for Ising model. Independent variables: additive entropy
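Under a factored Q, both terms of the free energy decompose. The block below is a sketch of that decomposition under assumptions the slides do not confirm: variables x_i \in \{0, 1\}, energy E(x) = -\sum_{i<j} W_{ij} x_i x_j - \sum_i b_i x_i, and the notation \mu_i = Q_i(x_i = 1).

```latex
Q(x) = \prod_i Q_i(x_i), \qquad \mu_i = Q_i(x_i = 1)

% Independent variables: the entropy is additive
H(Q) = \sum_i H(Q_i)
     = -\sum_i \bigl[ \mu_i \log \mu_i + (1-\mu_i) \log (1-\mu_i) \bigr]

% Independent variables: pairwise expectations factorize
\mathbb{E}_Q\bigl[E(x)\bigr] = -\sum_{i<j} W_{ij}\, \mu_i \mu_j - \sum_i b_i\, \mu_i

% Naive mean-field free energy
F(\mu) = -\sum_{i<j} W_{ij}\, \mu_i \mu_j - \sum_i b_i\, \mu_i
         + \sum_i \bigl[ \mu_i \log \mu_i + (1-\mu_i) \log (1-\mu_i) \bigr]
```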

28 Putting it all together: the condition for an extremum gives, after some algebra, the Mean Field Equations
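The mean field equations are fixed-point equations that can be iterated one coordinate at a time. The sketch below assumes binary variables in {0, 1} and the energy E(x) = -\sum_{i<j} W_{ij} x_i x_j - \sum_i b_i x_i, for which setting the derivative of the free energy to zero gives \mu_i = sigmoid(b_i + \sum_j W_{ij} \mu_j). The slides may use a different convention (for example ±1 spins, which yields a tanh update), and all names here are illustrative.

```python
import math


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def mean_field(W, b, n_sweeps=500, tol=1e-10):
    """Coordinate-wise fixed-point iteration mu_i <- sigmoid(b_i + sum_{j != i} W[i][j] mu_j)."""
    K = len(b)
    mu = [0.5] * K                      # start from the uniform factored distribution
    for _ in range(n_sweeps):
        largest_change = 0.0
        for i in range(K):
            field = b[i] + sum(W[i][j] * mu[j] for j in range(K) if j != i)
            new_mu = sigmoid(field)
            largest_change = max(largest_change, abs(new_mu - mu[i]))
            mu[i] = new_mu
        if largest_change < tol:        # stop once a full sweep barely moves mu
            break
    return mu


def free_energy(mu, W, b):
    """Naive mean-field free energy F = E_Q[E] - H(Q) for the factored Q with means mu."""
    K = len(mu)
    energy = -sum(W[i][j] * mu[i] * mu[j] for i in range(K) for j in range(i + 1, K))
    energy -= sum(b[i] * mu[i] for i in range(K))
    neg_entropy = 0.0
    for m in mu:
        for p in (m, 1.0 - m):
            if p > 0.0:
                neg_entropy += p * math.log(p)
    return energy + neg_entropy


W = [[0.0, 1.0, -0.5], [1.0, 0.0, 0.3], [-0.5, 0.3, 0.0]]
b = [0.2, -0.4, 0.1]
mu = mean_field(W, b)
print(mu, free_energy(mu, W, b))
```

Each coordinate update minimizes the free energy exactly in that coordinate, so the free energy never increases from sweep to sweep; the result is a local optimum of the free energy, which need not match the exact marginals.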

29 Lecture 4: Variational Approximations, Mean Field Inference, Applications to computer vision (fully connected CRFs)

30 Mean Field Theory & Computer Vision. Discrete/Continuous Hopfield Networks (1982/1984); Yuille & coworkers ( X); Loopy Belief Propagation >(?) Mean Field; 2011: Mean Field for fully connected CRFs

31 MRF nodes as pixels (Winkler, 1995, p. 32)

32 MRF nodes as patches. Image-scene compatibility: Φ(x_i, y_i); scene-scene compatibility: Ψ(x_i, x_j)

33 Network joint probability
P(x, y) = \frac{1}{Z} \prod_{i,j} \Psi(x_i, x_j) \prod_i \Phi(x_i, y_i)
x: scene, y: image. \Psi(x_i, x_j): scene-scene compatibility function over neighboring scene nodes i, j. \Phi(x_i, y_i): image-scene compatibility function for the local observations at node i.

34 MRFs for Denoising (Geman & Geman, 1984). Observations y: noisy pixel intensities, with Φ(x_i, y_i); scene x: clean image, with Ψ(x_i, x_j)

35 MRFs for Segmentation

36 Ising model (two labels). Model for binary vectors. Samples from the Ising model for different temperatures

37 Potts model (K labels). Multiple labels. Samples from the Potts model for different temperatures

38 Network Joint Probability. Scene; image. Image-scene compatibility function: local observations. Scene-scene compatibility function: neighboring scene nodes

39 Generative Framework for Vision. MRF: joint model over scene and observations. Vision task: recover the scene given the observations. Bayes rule: posterior ∝ likelihood × prior

40 Conditional Random Fields. [Figure: graphical models of an MRF and a CRF over scene nodes x_1, ..., x_6 and observations y_1, ..., y_6.] CRFs: keep MRF tools, drop Bayesian aspect

41 CRFs in a nutshell

42 Grid CRF

43 Grid CRF limitations

44 Grid CRF limitations

45 Fully-connected CRF (Krähenbühl & Koltun). Philipp Krähenbühl and Vladlen Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011

46 Fully-connected CRF

47 Fully-connected CRF

48 Fully-connected CRF

49 Fully-connected CRF: FAST. How? Mean Field + some tricks

50 Trick: Pairwise Term. Potts model; Gaussian kernels; fast summation through separable convolution (Krähenbühl & Koltun, NIPS 2011)
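Why separability helps can be seen on a plain image grid: filtering with a 2-D Gaussian equals one 1-D pass along rows followed by one along columns, which reduces the per-pixel cost from O(r^2) to O(r) for kernel radius r. The pure-Python sketch below illustrates only this spatial case with zero padding. It is not the filtering scheme of the cited paper, which also involves appearance features, and every function name is an assumption.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian weights on offsets -radius..radius."""
    w = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def filter_rows(img, k):
    """Filter every row with the 1-D kernel k (zero padding outside the image)."""
    r = len(k) // 2
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = sum(k[d + r] * img[y][x + d]
                            for d in range(-r, r + 1) if 0 <= x + d < width)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def separable_filter(img, k):
    """2-D Gaussian filtering as two 1-D passes: along rows, then along columns."""
    return transpose(filter_rows(transpose(filter_rows(img, k)), k))


def full_2d_filter(img, k):
    """Reference: direct filtering with the 2-D outer-product kernel k[dy] * k[dx]."""
    r = len(k) // 2
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if 0 <= y + dy < height and 0 <= x + dx < width:
                        acc += k[dy + r] * k[dx + r] * img[y + dy][x + dx]
            out[y][x] = acc
    return out


# Toy "marginal map" for one label: in a mean-field update, each label's
# map Q(., l) would be filtered like this to sum the messages from all pixels.
q_map = [[((3 * y + 5 * x) % 7) / 7.0 for x in range(6)] for y in range(5)]
kernel = gaussian_kernel_1d(sigma=1.0, radius=2)
fast = separable_filter(q_map, kernel)
slow = full_2d_filter(q_map, kernel)
print(max(abs(fast[y][x] - slow[y][x]) for y in range(5) for x in range(6)))
```

In a real mean-field update the filtered map also contains each pixel's own contribution, which would need to be subtracted so that a pixel does not send a message to itself.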

51 2014: Fully connected CRFs + Deep Classifiers. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille, Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, arXiv preprint (v1), 2014

52 Evolution from mean field updates

53 Results (input, DCNN, CRF-DCNN)

54 Results (input, DCNN, CRF-DCNN)

55 Comparisons to other techniques

56 Comparisons to previous state-of-the-art
