Probabilistic Fitting


Probabilistic Fitting Shape Modeling April 2016 Sandro Schönborn University of Basel 1

Probabilistic Inference for Face Model Fitting Approximate Inference with Markov Chain Monte Carlo 2

Face Image Manipulation 3

Probabilistic Registration Model-based image registration Probabilistic framework of Gaussian Processes Combination? Probabilistic registration? 4

Face Model Fitting Target: 2D face image More difficulties: pose, illumination, color Deformation h is now a complicated function 5

Goal: Fit the 3D Morphable Model of Faces 3D face model: Color model: I_R + h_C Shape model: I_R ∘ h_S 3DMM: (I_R + h_C) ∘ h_S 3D-2D computer graphics: x_2D = T_img(Pr(T_3D(S(x_3D)))) Rigid 3D transform T_3D, transform in image T_img, projection Pr(x) = (x/z, y/z) Appearance: I(x_2D) = C_c(L(n(x_3D), C(x_3D), x_3D)) Normal n, color transform C_c(c), illumination L(n, c, x) Corresponding x_2D and x_3D 6

Overview Probabilistic Setup & Bayesian Inference Approximate Inference with Sampling Markov Chain Monte Carlo 3D Fitting Problem Landmarks Computer Graphics Overview 2D Face Image Analysis Filtering with unreliable information 7

Posterior vs. Optimization Registration, so far optimized (MAP): p(h | I_T, I_R) ∝ p(h) p(I_T | h, I_R), ĥ = arg max_h p(h) p(I_T | h, I_R) But it is actually a distribution: p(h | I_T, I_R) ( posterior ) Each solution is assigned a certain degree of belief Bayesian inference 8

Bayesian Inference Probability to express beliefs: credibility we assign a hypothesis, based on our current knowledge Observations change our knowledge -> beliefs change too! Bayes rule to update beliefs: p(q) → p(q | D_1) → p(q | D_1, D_2) → p(q | D_1, D_2, D_3) → ... p(q | D) = p(D | q) p(q) / p(D) = p(D | q) p(q) / ∫ p(D | q) p(q) dq Posterior becomes prior of next inference step 9

Example: Coin Toss With prior knowledge to capture prior beliefs No point estimate but posterior distribution Likelihood of observation HHH: p(HHH | q) Prior: p(q) Belief update with Bayes rule: p(q | HHH) = p(HHH | q) p(q) / ∫ p(HHH | q) p(q) dq Posterior: whole distribution available to express our knowledge about q! 10
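The coin-toss belief update can be sketched numerically. A minimal grid approximation of Bayes' rule in Python (NumPy assumed available; the function name is illustrative, not from the lecture):

```python
import numpy as np

def coin_posterior(n_heads, n_tails, grid_size=1001):
    # Grid approximation of Bayes' rule with a uniform prior p(q) on [0, 1]
    q = np.linspace(0.0, 1.0, grid_size)
    prior = np.ones_like(q)
    likelihood = q**n_heads * (1.0 - q)**n_tails   # p(data | q)
    unnorm = likelihood * prior                    # numerator of Bayes' rule
    posterior = unnorm / unnorm.sum()              # normalize on the grid
    return q, posterior

# Observing HHH shifts belief toward large q: the whole posterior is available
q, post = coin_posterior(3, 0)
posterior_mean = float(np.sum(q * post))           # analytic value: 4/5 (Beta(4,1))
```

With the uniform prior the posterior is the Beta(4, 1) distribution, so the grid estimate of the mean should land close to 0.8.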

Estimation Overview Maximum Likelihood: Single value Maximum of likelihood only: q̂ = arg max_q l(q; D) Maximum-A-Posteriori: Single value Maximum of posterior distribution ( regularized ): q̂ = arg max_q l(q; D) p(q) Bayes: Whole posterior distribution Belief update (Bayes rule, Bayesian inference ) Captures uncertainty: p(q | D) = l(q; D) p(q) / ∫ l(q; D) p(q) dq 11

3D Morphable Model: Probabilistic Setup Prior: deformations of the mean face: p(h) Shape & Color: h ~ GP(μ, k): h = μ + Σ_i α_i √λ_i φ_i, α_i ~ N(0, 1) Pose (camera setup), illumination, color transform: all independent Likelihood: p(I_T | α, I_R) Parameterization: low-rank models 12

3DMM Posterior Posterior captures the plausibility of each solution: p(θ | I_T, I_R) = p(θ) p(I_T | θ, I_R) / ∫ p(θ) p(I_T | θ, I_R) dθ Intractable posterior! Integration over complete parameter space! But expensive, point-wise and unnormalized evaluation is possible: p(θ | I_T, I_R) ∝ p(θ) p(I_T | θ, I_R) 13

Intractable Posteriors Cannot be represented in closed form Normalization often intractable in Bayesian setting: p(θ | D) = l(θ; D) p(θ) / N(D), N(D) = ∫ l(θ; D) p(θ) dθ, l(θ; D) = p(D | θ) Models are often designed to keep the posterior tractable For example through conjugate priors: posterior has same form as prior Is MAP the only option? No! Approximate posterior with tractable function 14

Approximate Bayesian Inference Variational methods Function approximation: q(θ) = arg min_q KL(q(θ) ‖ p(θ | D)) Variational Message Passing, Mean-Field Theory, Moment Matching, ... Sampling methods Numeric approximations through simulation Monte Carlo, Importance Sampling, Particle Filters, MCMC, ... KL: Kullback-Leibler divergence 15

Sampling Methods Simulate a distribution p through random samples x_i Evaluate expectations: E[f(x)] = ∫ f(x) p(x) dx ≈ (1/N) Σ_i f(x_i), x_i ~ p(x) — drawing the samples is the difficult part! Standard error decays as O(1/√N): independent of dimensionality More samples increase accuracy 16
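The Monte Carlo estimate of an expectation can be sketched in a few lines of Python (a minimal illustration, not the lecture's code; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sampler, n):
    # E[f(x)] ≈ (1/N) Σ_i f(x_i) with x_i ~ p; error decays as O(1/sqrt(N))
    x = sampler(n)
    return float(np.mean(f(x)))

# E[x²] under a standard Gaussian is exactly 1
est = mc_expectation(lambda x: x**2, lambda n: rng.standard_normal(n), 100_000)
```

With N = 100 000 samples the estimator's standard deviation is roughly √(2/N) ≈ 0.004, so the estimate sits very close to 1, regardless of how the same recipe would scale to high-dimensional x.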

Sampling from A Distribution Easy for standard distributions, is it? Uniform: Random.nextDouble() Gaussian: Random.nextGaussian() How to sample from more complex distributions? Beta, Exponential, Chi-square, Gamma, ... Target posteriors are very often not in a nice standard textbook form Sadly, only very few distributions are easy to sample from We need to sample from an unknown posterior with only unnormalized, expensive point-wise evaluation General samplers? Yes! Rejection, Importance, MCMC 17

Markov Chain Monte Carlo Markov Chain Monte Carlo Methods (MCMC) Design a Markov chain such that samples x obey the target distribution p Concept: Use an already existing sample to produce the next one Very powerful general sampling methods Many successful practical applications Proven: developed in the 1950s-1970s (Metropolis/Hastings) Recent: direct mapping of computing power to approximation accuracy Algorithms (buzz words): Metropolis/-Hastings, Gibbs, Slice Sampling 18

Markov Chain Sequence of random variables (X_i), X_i ∈ S, with joint distribution P(X_1, X_2, ..., X_n) = P(X_1) Π_{i=2..n} P(X_i | X_{i-1}) State space, initial distribution, transition probability Simplifications (for our analysis): Discrete state space: S = {1, 2, ..., K} Homogeneous chain: P(X_i = l | X_{i-1} = m) = T_lm 19

Example: Markov Chain Simple weather model: dry (D) or rainy (R) hour State space S = {D, R} Condition in next hour? Stochastic: P(X_{i+1} | X_i) Simple: depends only on current condition X_i Transition probabilities: P(D | D) = 0.95, P(R | D) = 0.05, P(D | R) = 0.2, P(R | R) = 0.8 Draw samples from chain: Initial: X_0 = D Evolution: P(X_{i+1} | X_i) DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDRDD... Long-term behavior: Does it converge? Average probability of rain? Dynamics? 20
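The weather chain's long-term behavior can be checked by iterating the transition matrix on an initial distribution (a small Python sketch using the transition probabilities above; NumPy assumed):

```python
import numpy as np

# Column-stochastic transition matrix of the weather chain:
# columns index the current state (D, R), rows the next state.
T = np.array([[0.95, 0.2],
              [0.05, 0.8]])

p = np.array([1.0, 0.0])      # start surely dry: X_0 = D
for _ in range(200):
    p = T @ p                 # evolve the distribution: p_{n+1} = T p_n

# p approaches the steady state p* = (0.8, 0.2): 20% long-term rain
```

Since the second eigenvalue of T is 0.75, the distance to the steady state shrinks by that factor every step, so 200 iterations are far more than enough.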

Discrete Homogeneous Markov Chain Formally linear algebra: Distribution (vector): p_i with entries (p_i)_l = P(X_i = l), l = 1, ..., K Transition probability (transition matrix): T with T_lm = P(l | m) = P(X_i = l | X_{i-1} = m) 21

Evolution of the Initial Distribution Evolution of P(X_1) → P(X_2): P(X_2 = l) = Σ_m P(l | m) P(X_1 = m), i.e. p_2 = T p_1 Evolution of n steps: p_{n+1} = T^n p_1 Is there a stable distribution p*? p* = T p*: eigenvector of T to eigenvalue λ = 1 22

Steady-State Distribution: p* It exists: T is subject to the normalization constraint Σ_l T_lm = 1, i.e. (1, ..., 1) T = (1, ..., 1): T has eigenvalue λ = 1, with (1, ..., 1) as left eigenvector (left and right eigenvalues are the same) Steady-state distribution p* as corresponding right eigenvector: T p* = p* Does an arbitrary initial distribution evolve to p*? Convergence? Uniqueness? 23

Equilibrium Distribution: p* Additional requirement for T: (T^n)_lm > 0 for n > N_0 The chain is then called irreducible and aperiodic (implies ergodic): All states are connected using at most N_0 steps Return intervals to a certain state are irregular Perron-Frobenius theorem for positive matrices: PF1: λ_1 = 1 is a simple eigenvalue with 1d eigenspace (uniqueness) PF2: λ_1 = 1 is dominant, |λ_i| < 1 for all i ≠ 1 (convergence) p* is a stable attractor, called the equilibrium distribution: T p* = p* 24

Convergence Time evolution of an arbitrary distribution p_0: p_n = T^n p_0 Expand p_0 in the eigenbasis of T: T e_i = λ_i e_i, |λ_i| < λ_1 = 1 for i ≠ 1 p_0 = Σ_i c_i e_i T p_0 = Σ_i c_i λ_i e_i T^n p_0 = Σ_i c_i λ_i^n e_i = c_1 e_1 + λ_2^n c_2 e_2 + λ_3^n c_3 e_3 + ... 25

Convergence (II) T^n p_0 = Σ_i c_i λ_i^n e_i ≈ c_1 e_1 + λ_2^n c_2 e_2 (for large n) Convergence: T^n p_0 → p* = c_1 e_1 Rate of convergence: ‖T^n p_0 - p*‖ ≈ |λ_2|^n ‖c_2 e_2‖ Normalizations: Σ_l (e_1)_l = 1 and Σ_l (p_0)_l = 1 give c_1 = 1, so p* = e_1 26

Example: Weather Dynamics Rain forecast for stable versus mixed weather: stable: T = [0.95 0.2; 0.05 0.8], eigenvalues 1, 0.75 mixed: T = [0.85 0.6; 0.15 0.4], eigenvalues 1, 0.25 Both: p* = (0.8, 0.2), long-term average probability of rain: 20% Rainy now, next hours? stable: RRRRDDDDDDDDDDDD DDDDDDDDDDDDD... mixed: RDDDDDDDDDDDDDDD RDDDRDDDDDDDD... 27

Detailed Balance Detailed balance is a local equilibrium Distribution p satisfies detailed balance if the total flow of probability between every pair of states is equal; the chain is then reversible: P(l | m) p(m) = P(m | l) p(l) Detailed balance implies: p is the equilibrium distribution: (Tp)_l = Σ_m T_lm p_m = Σ_m T_ml p_l = p_l Design Markov chains with specific equilibrium distributions! 28
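The implication "detailed balance ⇒ equilibrium" can be verified numerically on a small example (a sketch with an assumed symmetric 3-state chain, not a chain from the lecture):

```python
import numpy as np

# A reversible 3-state chain: the symmetric transition matrix satisfies
# detailed balance T[l, m] * p[m] == T[m, l] * p[l] with the uniform p.
T = np.array([[0.5 , 0.25, 0.25],
              [0.25, 0.5 , 0.25],
              [0.25, 0.25, 0.5 ]])
p = np.array([1/3, 1/3, 1/3])

flow = T * p                            # element (l, m): P(l|m) p(m)
balance_ok = bool(np.allclose(flow, flow.T))    # local equilibrium, pairwise
equilibrium_ok = bool(np.allclose(T @ p, p))    # implies global: Tp = p
```

Checking every pair of states (the `flow` matrix against its transpose) is exactly the local condition; the global fixed-point property Tp = p then follows, matching the derivation on the slide.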

Example: Detailed Balance Two chains on three states (figure): one with a local equilibrium between every pair of states (detailed balance), one with only a global equilibrium (circular probability flow) Same equilibrium distribution [1/3, 1/3, 1/3], different convergence mechanism 29

Summary: Markov Chains Sequential random variables: X_1, X_2, ... Aperiodic and irreducible chains are ergodic: Convergence towards a unique equilibrium distribution p* Equilibrium distribution p*: eigenvector of T with eigenvalue λ = 1: T p* = p* Rate of convergence: decay with second-largest eigenvalue |λ_2| Detailed balance: local equilibrium implies global equilibrium 30

The Metropolis-Hastings Algorithm Convert samples from Q(x) into samples from P(x) Requires: Can draw samples from Q(x' | x) Q(x' | x) > 0 if Q(x | x') > 0 Point-wise evaluation of P(x) Result: Sequence of samples approximately from P Behind the scenes: MH constructs a Markov chain 31

Metropolis-Hastings Algorithm Initialize at state x Generate next sample, starting at x: 1. Draw a sample x' from Q(x' | x) ( proposal ) 2. Update state x ← x' with probability α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } ( accept or reject ) 3. Sample is current state x 32
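The three steps above can be sketched as a generic sampler (a minimal Python illustration working in log space for numerical stability; function names are illustrative, and the 1D Gaussian target is an assumed toy example):

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis_hastings(log_p, propose, log_q, x0, n):
    # Turns samples from Q into samples from P, using only point-wise
    # (possibly unnormalized) evaluation of P.
    x, samples = x0, []
    for _ in range(n):
        xp = propose(x)                           # 1. draw x' ~ Q(x'|x)
        log_alpha = (log_p(xp) + log_q(x, xp)     # 2. acceptance ratio
                     - log_p(x) - log_q(xp, x))
        if np.log(rng.uniform()) < min(log_alpha, 0.0):
            x = xp                                # accept: move to x'
        samples.append(x)                         # 3. sample = current state
    return np.array(samples)

# Toy target: 1D standard Gaussian, symmetric random-walk proposal
samples = metropolis_hastings(
    log_p=lambda x: -0.5 * x**2,                  # unnormalized log target
    propose=lambda x: x + rng.normal(scale=1.0),  # random walk
    log_q=lambda a, b: 0.0,                       # symmetric: Q ratio cancels
    x0=0.0, n=20_000)
```

Note that `log_p` never needs the normalization constant: it only ever enters through the difference `log_p(xp) - log_p(x)`, which is exactly why the method works on an unnormalized posterior.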

Metropolis-Hastings Algorithm Defines a Markov chain with transition kernel T_MH(x' | x) = Q(x' | x) α(x' | x) + δ(x' - x) ∫ Q(x'' | x) (1 - α(x'' | x)) dx'' Target distribution P obeys detailed balance: P is the equilibrium distribution! Sampling with the algorithm samples from P (in asymptotia ) P appears only in ratios of point-wise evaluations: α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } ok for our unnormalized posterior! Application: easily accessible Q, difficult target P 33

MH: Detailed Balance T_MH(x' | x) = Q(x' | x) α(x' | x) + δ(x' - x) ∫ Q(x'' | x) (1 - α(x'' | x)) dx'' Check: does it hold? T_MH(x' | x) P(x) = T_MH(x | x') P(x') Expand: (blackboard) Result: P satisfies detailed balance for the MH kernel P is the stable distribution P is the equilibrium distribution if the chain is irreducible Samples from the chain converge to be drawn from P! 34

Example: 2D Gaussian Target: P(x) = (1 / (2π √|Σ|)) exp(-½ (x - μ)ᵀ Σ⁻¹ (x - μ)) Proposal: Q(x' | x) = N(x' | x, σ² I₂), symmetric: Q(x' | x) = Q(x | x') Random walk True: μ = (1.5, 1.5), Σ = [1.25 0.75; 0.75 1.25] Estimated: μ̂ = (1.56, 1.68), Σ̂ = [1.09 0.63; 0.63 1.07] 35

Different Proposals: 2D Gaussian σ = 0.2 σ = 1.0 36

Serial Correlation Memory of the Markov chain leads to dependent samples: the chain remembers its last sample(s) Correlations are due to: Variation of the proposal is too weak (different standard deviations) Too many rejections (rejects freeze the chain) May be relevant for estimation Resolve with subsampling (thinning) 37
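Serial correlation and the effect of subsampling can be demonstrated on a synthetic chain (a sketch: an AR(1) sequence stands in for a sticky Markov chain, which is an assumption, not the lecture's example):

```python
import numpy as np

def autocorr(x, lag):
    # Empirical autocorrelation of a chain at a given lag
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# An AR(1) sequence mimics the memory of a Markov chain
rng = np.random.default_rng(1)
chain = np.empty(50_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

rho1 = autocorr(chain, 1)            # strong serial correlation (~0.9)
thinned = chain[::20]                # keep every 20th sample
rho1_thinned = autocorr(thinned, 1)  # roughly 0.9**20 ≈ 0.12
```

Thinning trades away samples for near-independence; for pure expectation estimates the full chain is usually at least as accurate, but thinned samples are easier to reason about and cheaper to store.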

Metropolis-Hastings: Limitations Target distribution is approximate Need complicated diagnostics to know exactly how approximate There are methods for perfect sampling Serial correlation Vulnerable to highly correlated targets Proposal should match correlation Or be adapted to target Result depends on choice of proposal distribution Bishop. PRML, Springer, 2006 38

Probabilistic Fitting with MCMC Probabilistic Registration: Bayesian inference, posterior distribution p(θ | I_R, I_T) = l(θ; D) p(θ) / ∫ l(θ; D) p(θ) dθ Approximate Inference: Sampling, simulate the posterior distribution Metropolis-Hastings: MCMC, general sampler Sample from Q, transform to P, choose P: α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } 39

3D Fitting Example right.eye.corner_outer left.eye.corner_outer right.lips.corner left.lips.corner 40

3D Fitting Setup 3D face with statistical model Discrete low-rank Gaussian Process Arbitrary rigid transformation Pose, positioning in space Observations Observed positions x̃_1, x̃_2, ..., x̃_L Correspondence: x_1, x_2, ..., x_L Goal: Find the posterior distribution P(θ | x̃_1, ..., x̃_L) ∝ l(x̃_1, ..., x̃_L; θ) P(θ) Parameters θ = (α, φ, ψ, γ, t) Shape: x = μ(x) + Σ_i α_i √λ_i φ_i(x) Rigid transform: 3 angles (pitch, yaw, roll) φ, ψ, γ and translation t: x ↦ R_φ R_ψ R_γ x + t 41

Proposals Choose simple Gaussian random walk proposals (Metropolis): Q(θ' | θ) = N(θ' | θ, Σ_p) Normal perturbations of the current state Block-wise to account for different parameter types: Shape: N(α' | α, σ²_S E) Rotation: N(φ' | φ, σ²_φ), N(ψ' | ψ, σ²_ψ), N(γ' | γ, σ²_γ) Translation: N(t' | t, σ²_t E_3) E: identity matrix (I denotes the image) Large mixture distributions as proposals: Q(θ' | θ) = Σ_i c_i Q_i(θ' | θ) 42
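The block-wise mixture construction can be sketched generically (a Python illustration with hypothetical parameter blocks and made-up block sizes, not the lecture's Scala implementation):

```python
import numpy as np

rng = np.random.default_rng(7)

def mixture_proposal(components, weights):
    # Combine block-wise random-walk proposals into one mixture proposal
    # Q(θ'|θ) = Σ_i c_i Q_i(θ'|θ): pick a component by weight, then perturb.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    def propose(theta):
        i = rng.choice(len(components), p=weights)
        return components[i](theta)
    return propose

# Hypothetical layout of θ: shape coefficients θ[0:2], pose θ[2:5]
def shape_block(theta, s=0.05):
    out = theta.copy(); out[0:2] += rng.normal(scale=s, size=2); return out
def pose_block(theta, s=0.1):
    out = theta.copy(); out[2:5] += rng.normal(scale=s, size=3); return out

propose = mixture_proposal([shape_block, pose_block], [0.8, 0.2])
theta_new = propose(np.zeros(5))
```

Each draw perturbs only one parameter block, which keeps acceptance rates reasonable when different blocks need very different step sizes.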

3DMM Landmarks Likelihood Simple models: independent Gaussians Observation of L landmark locations x̃_i in the image Single landmark position model: x_i(θ) = R_{φ,ψ,γ} h_α(x_i) + t, l_i(θ; x̃_i) = N(x̃_i | x_i(θ), σ²_LM) Independent model (conditional independence): l(θ; x̃_1, x̃_2, ..., x̃_L) = Π_{i=1..L} l_i(θ; x̃_i), l(θ; D) = p(D | θ) Independence and Gaussian are just simple models (questionable) 43
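The independent Gaussian landmark likelihood amounts to a sum of per-landmark log densities. A minimal sketch in Python (the function name and array layout are assumptions for illustration):

```python
import numpy as np

def landmark_loglik(predicted, observed, sigma_lm):
    # log l(θ; x̃_1..x̃_L) = Σ_i log N(x̃_i | x_i(θ), σ²_LM I)
    # predicted, observed: (L, d) arrays of landmark positions
    d = predicted.shape[1]
    sq = np.sum((observed - predicted) ** 2, axis=1)   # per-landmark residual
    return float(np.sum(-0.5 * sq / sigma_lm**2
                        - 0.5 * d * np.log(2 * np.pi * sigma_lm**2)))

# A looser sigma penalizes a large residual much less
pred = np.zeros((4, 3))
obs = 10.0 * np.ones((4, 3))            # 4 landmarks, all off by (10,10,10)
ll_tight = landmark_loglik(pred, obs, sigma_lm=1.0)
ll_loose = landmark_loglik(pred, obs, sigma_lm=10.0)
```

This is where the landmark uncertainty σ_LM enters the posterior: tight likelihoods dominate the prior and concentrate the posterior, loose ones leave more of the prior's spread intact, exactly the effect studied on the following slides.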

3D Fit to Landmarks Influence of landmark uncertainty on the final posterior? σ_LM = 1 mm, σ_LM = 4 mm, σ_LM = 10 mm Only 4 landmark observations: Expect only weak shape impact Should still constrain pose Uncertain LM should be looser 44

3D Fitting: Code

val yawProposal = GaussianRotationProposal(AxisY, sdev = 0.05)
val pitchProposal = GaussianRotationProposal(AxisX, sdev = 0.05)
val rollProposal = GaussianRotationProposal(AxisZ, sdev = 0.05)
val rotationProposal = MixtureProposal(
  0.6 *: yawProposal + 0.3 *: pitchProposal + 0.1 *: rollProposal)

val translationProposal = GaussianTranslationProposal(Vector(2, 2, 2))
val poseProposal = MixtureProposal(rotationProposal + translationProposal)

val shapeProposal = GaussianShapeProposal(sdev = 0.05)

val lmFitter = MetropolisHastings(
  proposal = 0.2 *: poseProposal + 0.8 *: shapeProposal,
  evaluator = lmLikelihood * shapePrior)

val samples = lmFitter.iterator(initState).drop(2000).take(8000).toIndexedSeq

45

Posterior: Pose & Shape, 4mm μ_yaw = 0.511, σ̂_yaw = 0.073 (≈4°) μ_t = 1 mm, σ̂_t = 4 mm μ_α1 = 0.4, σ̂_α1 = 0.6 (Estimation from samples) 46

Posterior: Pose & Shape, 4mm Posterior values (log, unnormalized!) 47

Posterior: Pose & Shape, 1mm μ_yaw = 0.50, σ̂_yaw = 0.041 (≈2.4°) μ_t = 2 mm, σ̂_t = 0.8 mm μ_α1 = 1.5, σ̂_α1 = 0.35 48

Posterior: Pose & Shape, 10mm μ_yaw = 0.49, σ̂_yaw = 0.11 (≈7°) μ_t = 5 mm, σ̂_t = 10 mm μ_α1 = 0, σ̂_α1 = 0.6 49

Summary: MCMC for 3D Fitting Probabilistic inference for fitting probabilistic models Probabilistic inference is most often not tractable: use approximate inference methods Sampling methods approximate by simulation MCMC methods provide a powerful sampling framework: Markov chain with the target distribution as equilibrium distribution General algorithms, e.g. Metropolis-Hastings 3D landmark fitting example: posterior distribution, model likelihood, define proposals 50