Probabilistic Fitting


Probabilistic Fitting Shape Modeling April 2016 Sandro Schönborn University of Basel 1

Probabilistic Inference for Face Model Fitting Approximate Inference with Markov Chain Monte Carlo 2

Face Image Manipulation 3

Probabilistic Registration Model-based image registration Probabilistic framework of Gaussian Processes Combination? Probabilistic registration? 4

Face Model Fitting Target: 2D face image More difficulties: pose, illumination, color Deformation h is now a complicated function 5

Goal: Fit the 3D Morphable Model of Faces 3D face model: Color model: I_R + h_C Shape model: I_R ∘ h_S 3DMM: (I_R + h_C) ∘ h_S 3D-2D computer graphics: x_2D = T_img(Pr(T_3D(S(x_3D)))) Rigid 3D transform T_3D, transform in image T_img, projection Pr(x) = (x/z, y/z) Appearance: I(x_2D) = C_c(L(n(x_3D), C(x_3D), x_3D)) Normal n, color transform C_c(c), illumination L(n, c, x) Corresponding x_2D and x_3D 6

Overview Probabilistic Setup & Bayesian Inference Approximate Inference with Sampling Markov Chain Monte Carlo 3D Fitting Problem Landmarks Computer Graphics Overview 2D Face Image Analysis Filtering with unreliable information 7

Posterior vs. Optimization Registration, so far optimized (MAP): p(h | I_T, I_R) ∝ p(h) p(I_T | h, I_R), ĥ = arg max_h p(h) p(I_T | h, I_R) But it is actually a distribution: p(h | I_T, I_R) ( posterior ) Each solution is assigned a certain degree of belief Bayesian inference 8

Bayesian Inference Probability to express beliefs: credibility we assign a hypothesis, based on our current knowledge Observations change our knowledge -> beliefs change too! Bayes rule to update beliefs: p(q) → p(q | D_1) → p(q | D_1, D_2) → p(q | D_1, D_2, D_3) → ... p(q | D) = p(D | q) p(q) / p(D) = p(D | q) p(q) / ∫ p(D | q) p(q) dq Posterior becomes prior of next inference step 9

Example: Coin Toss With prior knowledge to capture prior beliefs No point estimate but posterior distribution Likelihood of observation HHH: p(HHH | q) Prior: p(q) Belief update with Bayes rule: p(q | HHH) = p(HHH | q) p(q) / ∫ p(HHH | q) p(q) dq Posterior: whole distribution available to express our knowledge about q! 10
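The coin-toss belief update can be sketched numerically. A minimal grid approximation of Bayes' rule in Python (NumPy assumed available; the function name is illustrative, not from the lecture):

```python
import numpy as np

def coin_posterior(n_heads, n_tails, grid_size=1001):
    # Grid approximation of Bayes' rule with a uniform prior p(q) on [0, 1]
    q = np.linspace(0.0, 1.0, grid_size)
    prior = np.ones_like(q)
    likelihood = q**n_heads * (1.0 - q)**n_tails   # p(data | q)
    unnorm = likelihood * prior                    # numerator of Bayes' rule
    posterior = unnorm / unnorm.sum()              # normalize on the grid
    return q, posterior

# Observing HHH shifts belief toward large q: the whole posterior is available
q, post = coin_posterior(3, 0)
posterior_mean = float(np.sum(q * post))           # analytic value: 4/5 (Beta(4,1))
```

With the uniform prior the posterior is the Beta(4, 1) distribution, so the grid estimate of the mean should land close to 0.8.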

Estimation Overview Maximum Likelihood: Single value Maximum of likelihood only: q̂ = arg max_q l(q; D) Maximum-A-Posteriori: Single value Maximum of posterior distribution ( regularized ): q̂ = arg max_q l(q; D) p(q) Bayes: Whole posterior distribution Belief update (Bayes rule, Bayesian inference ) Captures uncertainty: p(q | D) = l(q; D) p(q) / ∫ l(q; D) p(q) dq 11

3D Morphable Model: Probabilistic Setup Prior: deformations of the mean face: p(h) Shape & Color: h ~ GP(μ, k): h = μ + Σ_i α_i √λ_i φ_i, α_i ~ N(0, 1) Pose (camera setup), illumination, color transform: all independent Likelihood: p(I_T | α, I_R) Parameterization: low-rank models 12

3DMM Posterior Posterior captures the plausibility of each solution: p(θ | I_T, I_R) = p(θ) p(I_T | θ, I_R) / ∫ p(θ) p(I_T | θ, I_R) dθ Intractable posterior! Integration over complete parameter space! But expensive, point-wise and unnormalized evaluation is possible: p(θ | I_T, I_R) ∝ p(θ) p(I_T | θ, I_R) 13

Intractable Posteriors Cannot be represented in closed form Normalization often intractable in Bayesian setting: p(θ | D) = l(θ; D) p(θ) / N(D), N(D) = ∫ l(θ; D) p(θ) dθ, l(θ; D) = p(D | θ) Models are often designed to keep the posterior tractable For example through conjugate priors: posterior has same form as prior Is MAP the only option? No! Approximate posterior with tractable function 14

Approximate Bayesian Inference Variational methods Function approximation: q(θ) = arg min_q KL(q(θ) ‖ p(θ | D)) Variational Message Passing, Mean-Field Theory, Moment Matching, ... Sampling methods Numeric approximations through simulation Monte Carlo, Importance Sampling, Particle Filters, MCMC, ... KL: Kullback-Leibler divergence 15

Sampling Methods Simulate a distribution p through random samples x_i Evaluate expectations: E[f(x)] = ∫ f(x) p(x) dx ≈ (1/N) Σ_i f(x_i), x_i ~ p(x) — drawing the samples is the difficult part! Standard error decays as O(1/√N): independent of dimensionality More samples increase accuracy 16
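The Monte Carlo estimate of an expectation can be sketched in a few lines of Python (a minimal illustration, not the lecture's code; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(f, sampler, n):
    # E[f(x)] ≈ (1/N) Σ_i f(x_i) with x_i ~ p; error decays as O(1/sqrt(N))
    x = sampler(n)
    return float(np.mean(f(x)))

# E[x²] under a standard Gaussian is exactly 1
est = mc_expectation(lambda x: x**2, lambda n: rng.standard_normal(n), 100_000)
```

With N = 100 000 samples the estimator's standard deviation is roughly √(2/N) ≈ 0.004, so the estimate sits very close to 1, regardless of how the same recipe would scale to high-dimensional x.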

Sampling from A Distribution Easy for standard distributions, is it? Uniform: Random.nextDouble() Gaussian: Random.nextGaussian() How to sample from more complex distributions? Beta, Exponential, Chi-square, Gamma, ... Target posteriors are very often not in a nice standard textbook form Sadly, only very few distributions are easy to sample from We need to sample from an unknown posterior with only unnormalized, expensive point-wise evaluation General samplers? Yes! Rejection, Importance, MCMC 17

Markov Chain Monte Carlo Markov Chain Monte Carlo Methods (MCMC) Design a Markov chain such that samples x obey the target distribution p Concept: Use an already existing sample to produce the next one Very powerful general sampling methods Many successful practical applications Proven: developed in the 1950s-1970s (Metropolis/Hastings) Recent: direct mapping of computing power to approximation accuracy Algorithms (buzz words): Metropolis/-Hastings, Gibbs, Slice Sampling 18

Markov Chain Sequence of random variables (X_i), X_i ∈ S, with joint distribution P(X_1, X_2, ..., X_n) = P(X_1) Π_{i=2..n} P(X_i | X_{i-1}) State space, initial distribution, transition probability Simplifications (for our analysis): Discrete state space: S = {1, 2, ..., K} Homogeneous chain: P(X_i = l | X_{i-1} = m) = T_lm 19

Example: Markov Chain Simple weather model: dry (D) or rainy (R) hour State space S = {D, R} Condition in next hour? Stochastic: P(X_{i+1} | X_i) Simple: depends only on current condition X_i Transition probabilities: P(D | D) = 0.95, P(R | D) = 0.05, P(D | R) = 0.2, P(R | R) = 0.8 Draw samples from chain: Initial: X_0 = D Evolution: P(X_{i+1} | X_i) DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDRDD... Long-term behavior: Does it converge? Average probability of rain? Dynamics? 20
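The weather chain's long-term behavior can be checked by iterating the transition matrix on an initial distribution (a small Python sketch using the transition probabilities above; NumPy assumed):

```python
import numpy as np

# Column-stochastic transition matrix of the weather chain:
# columns index the current state (D, R), rows the next state.
T = np.array([[0.95, 0.2],
              [0.05, 0.8]])

p = np.array([1.0, 0.0])      # start surely dry: X_0 = D
for _ in range(200):
    p = T @ p                 # evolve the distribution: p_{n+1} = T p_n

# p approaches the steady state p* = (0.8, 0.2): 20% long-term rain
```

Since the second eigenvalue of T is 0.75, the distance to the steady state shrinks by that factor every step, so 200 iterations are far more than enough.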

Discrete Homogeneous Markov Chain Formally linear algebra: Distribution (vector): p_i with entries (p_i)_l = P(X_i = l), l = 1, ..., K Transition probability (transition matrix): T with T_lm = P(l | m) = P(X_i = l | X_{i-1} = m) 21

Evolution of the Initial Distribution Evolution of P(X_1) → P(X_2): P(X_2 = l) = Σ_m P(l | m) P(X_1 = m), i.e. p_2 = T p_1 Evolution of n steps: p_{n+1} = T^n p_1 Is there a stable distribution p*? p* = T p*: eigenvector of T to eigenvalue λ = 1 22

Steady-State Distribution: p* It exists: T is subject to the normalization constraint Σ_l T_lm = 1, i.e. (1, ..., 1) T = (1, ..., 1): T has eigenvalue λ = 1, with (1, ..., 1) as left eigenvector (left and right eigenvalues are the same) Steady-state distribution p* as corresponding right eigenvector: T p* = p* Does an arbitrary initial distribution evolve to p*? Convergence? Uniqueness? 23

Equilibrium Distribution: p* Additional requirement for T: (T^n)_lm > 0 for n > N_0 The chain is then called irreducible and aperiodic (implies ergodic): All states are connected using at most N_0 steps Return intervals to a certain state are irregular Perron-Frobenius theorem for positive matrices: PF1: λ_1 = 1 is a simple eigenvalue with 1d eigenspace (uniqueness) PF2: λ_1 = 1 is dominant, |λ_i| < 1 for all i ≠ 1 (convergence) p* is a stable attractor, called the equilibrium distribution: T p* = p* 24

Convergence Time evolution of an arbitrary distribution p_0: p_n = T^n p_0 Expand p_0 in the eigenbasis of T: T e_i = λ_i e_i, |λ_i| < λ_1 = 1 for i ≠ 1 p_0 = Σ_i c_i e_i T p_0 = Σ_i c_i λ_i e_i T^n p_0 = Σ_i c_i λ_i^n e_i = c_1 e_1 + λ_2^n c_2 e_2 + λ_3^n c_3 e_3 + ... 25

Convergence (II) T^n p_0 = Σ_i c_i λ_i^n e_i ≈ c_1 e_1 + λ_2^n c_2 e_2 (for large n) Convergence: T^n p_0 → p* = c_1 e_1 Rate of convergence: ‖T^n p_0 - p*‖ ≈ |λ_2|^n ‖c_2 e_2‖ Normalizations: Σ_l (e_1)_l = 1 and Σ_l (p_0)_l = 1 give c_1 = 1, so p* = e_1 26

Example: Weather Dynamics Rain forecast for stable versus mixed weather: stable: T = [0.95 0.2; 0.05 0.8], eigenvalues 1, 0.75 mixed: T = [0.85 0.6; 0.15 0.4], eigenvalues 1, 0.25 Both: p* = (0.8, 0.2), long-term average probability of rain: 20% Rainy now, next hours? stable: RRRRDDDDDDDDDDDD DDDDDDDDDDDDD... mixed: RDDDDDDDDDDDDDDD RDDDRDDDDDDDD... 27

Detailed Balance Detailed balance is a local equilibrium Distribution p satisfies detailed balance if the total flow of probability between every pair of states is equal; the chain is then reversible: P(l | m) p(m) = P(m | l) p(l) Detailed balance implies: p is the equilibrium distribution: (Tp)_l = Σ_m T_lm p_m = Σ_m T_ml p_l = p_l Design Markov chains with specific equilibrium distributions! 28
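The implication "detailed balance ⇒ equilibrium" can be verified numerically on a small example (a sketch with an assumed symmetric 3-state chain, not a chain from the lecture):

```python
import numpy as np

# A reversible 3-state chain: the symmetric transition matrix satisfies
# detailed balance T[l, m] * p[m] == T[m, l] * p[l] with the uniform p.
T = np.array([[0.5 , 0.25, 0.25],
              [0.25, 0.5 , 0.25],
              [0.25, 0.25, 0.5 ]])
p = np.array([1/3, 1/3, 1/3])

flow = T * p                            # element (l, m): P(l|m) p(m)
balance_ok = bool(np.allclose(flow, flow.T))    # local equilibrium, pairwise
equilibrium_ok = bool(np.allclose(T @ p, p))    # implies global: Tp = p
```

Checking every pair of states (the `flow` matrix against its transpose) is exactly the local condition; the global fixed-point property Tp = p then follows, matching the derivation on the slide.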

Example: Detailed Balance Two chains on three states (figure): one with a local equilibrium between every pair of states (detailed balance), one with only a global equilibrium (circular probability flow) Same equilibrium distribution [1/3, 1/3, 1/3], different convergence mechanism 29

Summary: Markov Chains Sequential random variables: X_1, X_2, ... Aperiodic and irreducible chains are ergodic: Convergence towards a unique equilibrium distribution p* Equilibrium distribution p*: eigenvector of T with eigenvalue λ = 1: T p* = p* Rate of convergence: decay with second-largest eigenvalue |λ_2| Detailed balance: local equilibrium implies global equilibrium 30

The Metropolis-Hastings Algorithm Convert samples from Q(x) into samples from P(x) Requires: Can draw samples from Q(x' | x) Q(x' | x) > 0 if Q(x | x') > 0 Point-wise evaluation of P(x) Result: Sequence of samples approximately from P Behind the scenes: MH constructs a Markov chain 31

Metropolis-Hastings Algorithm Initialize at state x Generate next sample, starting at x: 1. Draw a sample x' from Q(x' | x) ( proposal ) 2. Update state x ← x' with probability α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } ( accept or reject ) 3. Sample is current state x 32
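The three steps above can be sketched as a generic sampler (a minimal Python illustration working in log space for numerical stability; function names are illustrative, and the 1D Gaussian target is an assumed toy example):

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis_hastings(log_p, propose, log_q, x0, n):
    # Turns samples from Q into samples from P, using only point-wise
    # (possibly unnormalized) evaluation of P.
    x, samples = x0, []
    for _ in range(n):
        xp = propose(x)                           # 1. draw x' ~ Q(x'|x)
        log_alpha = (log_p(xp) + log_q(x, xp)     # 2. acceptance ratio
                     - log_p(x) - log_q(xp, x))
        if np.log(rng.uniform()) < min(log_alpha, 0.0):
            x = xp                                # accept: move to x'
        samples.append(x)                         # 3. sample = current state
    return np.array(samples)

# Toy target: 1D standard Gaussian, symmetric random-walk proposal
samples = metropolis_hastings(
    log_p=lambda x: -0.5 * x**2,                  # unnormalized log target
    propose=lambda x: x + rng.normal(scale=1.0),  # random walk
    log_q=lambda a, b: 0.0,                       # symmetric: Q ratio cancels
    x0=0.0, n=20_000)
```

Note that `log_p` never needs the normalization constant: it only ever enters through the difference `log_p(xp) - log_p(x)`, which is exactly why the method works on an unnormalized posterior.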

Metropolis-Hastings Algorithm Defines a Markov chain with transition kernel T_MH(x' | x) = Q(x' | x) α(x' | x) + δ(x' - x) ∫ Q(x'' | x) (1 - α(x'' | x)) dx'' Target distribution P obeys detailed balance: P is the equilibrium distribution! Sampling with the algorithm samples from P (in asymptotia ) P appears only in ratios of point-wise evaluations: α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } ok for our unnormalized posterior! Application: easily accessible Q, difficult target P 33

MH: Detailed Balance T_MH(x' | x) = Q(x' | x) α(x' | x) + δ(x' - x) ∫ Q(x'' | x) (1 - α(x'' | x)) dx'' Check: does it hold? T_MH(x' | x) P(x) = T_MH(x | x') P(x') Expand: (blackboard) Result: P satisfies detailed balance for the MH kernel P is the stable distribution P is the equilibrium distribution if the chain is irreducible Samples from the chain converge to be drawn from P! 34

Example: 2D Gaussian Target: P(x) = (1 / (2π √|Σ|)) exp(-½ (x - μ)ᵀ Σ⁻¹ (x - μ)) Proposal: Q(x' | x) = N(x' | x, σ² I₂), symmetric: Q(x' | x) = Q(x | x') Random walk True: μ = (1.5, 1.5), Σ = [1.25 0.75; 0.75 1.25] Estimated: μ̂ = (1.56, 1.68), Σ̂ = [1.09 0.63; 0.63 1.07] 35

Different Proposals: 2D Gaussian σ = 0.2 σ = 1.0 36

Serial Correlation Memory of the Markov chain leads to dependent samples: the chain remembers its last sample(s) Correlations are due to: Variation of the proposal is too weak (different standard deviations) Too many rejections (rejects freeze the chain) May be relevant for estimation Resolve with subsampling (thinning) 37
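Serial correlation and the effect of subsampling can be demonstrated on a synthetic chain (a sketch: an AR(1) sequence stands in for a sticky Markov chain, which is an assumption, not the lecture's example):

```python
import numpy as np

def autocorr(x, lag):
    # Empirical autocorrelation of a chain at a given lag
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# An AR(1) sequence mimics the memory of a Markov chain
rng = np.random.default_rng(1)
chain = np.empty(50_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

rho1 = autocorr(chain, 1)            # strong serial correlation (~0.9)
thinned = chain[::20]                # keep every 20th sample
rho1_thinned = autocorr(thinned, 1)  # roughly 0.9**20 ≈ 0.12
```

Thinning trades away samples for near-independence; for pure expectation estimates the full chain is usually at least as accurate, but thinned samples are easier to reason about and cheaper to store.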

Metropolis-Hastings: Limitations Target distribution is approximate Need complicated diagnostics to know exactly how approximate There are methods for perfect sampling Serial correlation Vulnerable to highly correlated targets Proposal should match correlation Or be adapted to target Result depends on choice of proposal distribution Bishop. PRML, Springer, 2006 38

Probabilistic Fitting with MCMC Probabilistic Registration: Bayesian inference, posterior distribution p(θ | I_R, I_T) = l(θ; D) p(θ) / ∫ l(θ; D) p(θ) dθ Approximate Inference: Sampling, simulate the posterior distribution Metropolis-Hastings: MCMC, general sampler Sample from Q, transform to P, choose P: α = min{ P(x') Q(x | x') / (P(x) Q(x' | x)), 1 } 39

3D Fitting Example right.eye.corner_outer left.eye.corner_outer right.lips.corner left.lips.corner 40

3D Fitting Setup 3D face with statistical model Discrete low-rank Gaussian Process Arbitrary rigid transformation Pose, positioning in space Observations Observed positions x̃_1, x̃_2, ..., x̃_L Correspondence: x_1, x_2, ..., x_L Goal: Find the posterior distribution P(θ | x̃_1, ..., x̃_L) ∝ l(x̃_1, ..., x̃_L; θ) P(θ) Parameters θ = (α, φ, ψ, γ, t) Shape: x = μ(x) + Σ_i α_i √λ_i φ_i(x) Rigid transform: 3 angles (pitch, yaw, roll) φ, ψ, γ and translation t: x ↦ R_φ R_ψ R_γ x + t 41

Proposals Choose simple Gaussian random walk proposals (Metropolis): Q(θ' | θ) = N(θ' | θ, Σ_p) Normal perturbations of the current state Block-wise to account for different parameter types: Shape: N(α' | α, σ²_S E) Rotation: N(φ' | φ, σ²_φ), N(ψ' | ψ, σ²_ψ), N(γ' | γ, σ²_γ) Translation: N(t' | t, σ²_t E_3) E: identity matrix (I denotes the image) Large mixture distributions as proposals: Q(θ' | θ) = Σ_i c_i Q_i(θ' | θ) 42
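The block-wise mixture construction can be sketched generically (a Python illustration with hypothetical parameter blocks and made-up block sizes, not the lecture's Scala implementation):

```python
import numpy as np

rng = np.random.default_rng(7)

def mixture_proposal(components, weights):
    # Combine block-wise random-walk proposals into one mixture proposal
    # Q(θ'|θ) = Σ_i c_i Q_i(θ'|θ): pick a component by weight, then perturb.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    def propose(theta):
        i = rng.choice(len(components), p=weights)
        return components[i](theta)
    return propose

# Hypothetical layout of θ: shape coefficients θ[0:2], pose θ[2:5]
def shape_block(theta, s=0.05):
    out = theta.copy(); out[0:2] += rng.normal(scale=s, size=2); return out
def pose_block(theta, s=0.1):
    out = theta.copy(); out[2:5] += rng.normal(scale=s, size=3); return out

propose = mixture_proposal([shape_block, pose_block], [0.8, 0.2])
theta_new = propose(np.zeros(5))
```

Each draw perturbs only one parameter block, which keeps acceptance rates reasonable when different blocks need very different step sizes.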

3DMM Landmarks Likelihood Simple models: independent Gaussians Observation of L landmark locations x̃_i in the image Single landmark position model: x_i(θ) = R_{φ,ψ,γ} h_α(x_i) + t, l_i(θ; x̃_i) = N(x̃_i | x_i(θ), σ²_LM) Independent model (conditional independence): l(θ; x̃_1, x̃_2, ..., x̃_L) = Π_{i=1..L} l_i(θ; x̃_i), l(θ; D) = p(D | θ) Independence and Gaussian are just simple models (questionable) 43
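The independent Gaussian landmark likelihood amounts to a sum of per-landmark log densities. A minimal sketch in Python (the function name and array layout are assumptions for illustration):

```python
import numpy as np

def landmark_loglik(predicted, observed, sigma_lm):
    # log l(θ; x̃_1..x̃_L) = Σ_i log N(x̃_i | x_i(θ), σ²_LM I)
    # predicted, observed: (L, d) arrays of landmark positions
    d = predicted.shape[1]
    sq = np.sum((observed - predicted) ** 2, axis=1)   # per-landmark residual
    return float(np.sum(-0.5 * sq / sigma_lm**2
                        - 0.5 * d * np.log(2 * np.pi * sigma_lm**2)))

# A looser sigma penalizes a large residual much less
pred = np.zeros((4, 3))
obs = 10.0 * np.ones((4, 3))            # 4 landmarks, all off by (10,10,10)
ll_tight = landmark_loglik(pred, obs, sigma_lm=1.0)
ll_loose = landmark_loglik(pred, obs, sigma_lm=10.0)
```

This is where the landmark uncertainty σ_LM enters the posterior: tight likelihoods dominate the prior and concentrate the posterior, loose ones leave more of the prior's spread intact, exactly the effect studied on the following slides.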

3D Fit to Landmarks Influence of landmark uncertainty on the final posterior? σ_LM = 1 mm, σ_LM = 4 mm, σ_LM = 10 mm Only 4 landmark observations: Expect only weak shape impact Should still constrain pose Uncertain LM should be looser 44

3D Fitting: Code

val yawProposal = GaussianRotationProposal(AxisY, sdev = 0.05)
val pitchProposal = GaussianRotationProposal(AxisX, sdev = 0.05)
val rollProposal = GaussianRotationProposal(AxisZ, sdev = 0.05)
val rotationProposal = MixtureProposal(
  0.6 *: yawProposal + 0.3 *: pitchProposal + 0.1 *: rollProposal)

val translationProposal = GaussianTranslationProposal(Vector(2, 2, 2))
val poseProposal = MixtureProposal(rotationProposal + translationProposal)

val shapeProposal = GaussianShapeProposal(sdev = 0.05)

val lmFitter = MetropolisHastings(
  proposal = 0.2 *: poseProposal + 0.8 *: shapeProposal,
  evaluator = lmLikelihood * shapePrior)

val samples = lmFitter.iterator(initState).drop(2000).take(8000).toIndexedSeq

45

Posterior: Pose & Shape, 4mm μ_yaw = 0.511, σ̂_yaw = 0.073 (≈4°) μ_t = 1 mm, σ̂_t = 4 mm μ_α1 = 0.4, σ̂_α1 = 0.6 (Estimation from samples) 46

Posterior: Pose & Shape, 4mm Posterior values (log, unnormalized!) 47

Posterior: Pose & Shape, 1mm μ_yaw = 0.50, σ̂_yaw = 0.041 (≈2.4°) μ_t = 2 mm, σ̂_t = 0.8 mm μ_α1 = 1.5, σ̂_α1 = 0.35 48

Posterior: Pose & Shape, 10mm μ_yaw = 0.49, σ̂_yaw = 0.11 (≈7°) μ_t = 5 mm, σ̂_t = 10 mm μ_α1 = 0, σ̂_α1 = 0.6 49

Summary: MCMC for 3D Fitting Probabilistic inference for fitting probabilistic models Probabilistic inference is most often not tractable: use approximate inference methods Sampling methods approximate by simulation MCMC methods provide a powerful sampling framework: Markov chain with the target distribution as equilibrium distribution General algorithms, e.g. Metropolis-Hastings 3D landmark fitting example: posterior distribution, model likelihood, define proposals 50