Sampling Methods: Particle Filtering

CSE586 Computer Vision II, CSE Dept, Penn State Univ (Robert Collins)

Recall: Importance Sampling

Procedure to estimate $E_P[f(x)]$:
1) Generate N samples $x^i$ from $Q(x)$
2) Form importance weights $w^i = P(x^i)/Q(x^i)$
3) Compute the empirical estimate of $E_P[f(x)]$, the expected value of $f(x)$ under distribution $P(x)$, as
$$\hat{E}_P[f(x)] = \frac{\sum_{i=1}^N w^i f(x^i)}{\sum_{i=1}^N w^i}$$
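As a concrete illustration (not from the original slides), here is a minimal Python sketch of this procedure. The specific target P (a Gaussian mixture) and proposal Q (a broad Gaussian) are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target P: a bimodal mixture of unit-variance Gaussians at +/-2 (illustrative choice).
def p_pdf(x):
    return 0.5 * np.exp(-0.5 * (x - 2)**2) / np.sqrt(2 * np.pi) \
         + 0.5 * np.exp(-0.5 * (x + 2)**2) / np.sqrt(2 * np.pi)

# Proposal Q: a broad zero-mean Gaussian that covers both modes.
q_sigma = 4.0
def q_pdf(x):
    return np.exp(-0.5 * (x / q_sigma)**2) / (q_sigma * np.sqrt(2 * np.pi))

N = 10000
x = rng.normal(0.0, q_sigma, size=N)   # 1) sample x^i from Q
w = p_pdf(x) / q_pdf(x)                # 2) importance weights w^i = P(x^i)/Q(x^i)
f = x**2                               # estimating E_P[x^2] as the example f
estimate = np.sum(w * f) / np.sum(w)   # 3) self-normalized empirical estimate
print(estimate)                        # should be near 5.0 for this mixture
```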

Resampling

Note: we thus have a set of weighted samples $(x^i, w^i),\ i = 1, \ldots, N$. If we really need random samples from P, we can generate them by resampling such that the likelihood of choosing value $x^i$ is proportional to its weight $w^i$. This involves sampling from a discrete distribution over N possible values (the N values of $x^i$). Therefore, regardless of the dimensionality of vector x, we are resampling from a 1D distribution (we are essentially sampling from the indices 1...N, in proportion to the importance weights $w^i$). So we can use the inverse transform sampling method we discussed earlier.
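A minimal sketch of that 1D inverse-transform resampling step (my own illustration, not code from the slides):

```python
import numpy as np

def resample(samples, weights, rng=np.random.default_rng()):
    """Draw N indices with probability proportional to the weights,
    using inverse transform sampling on the discrete weight CDF."""
    w = np.asarray(weights, dtype=float)
    cdf = np.cumsum(w / w.sum())        # discrete CDF over indices 1..N
    u = rng.random(len(w))              # N uniform draws on [0, 1)
    idx = np.searchsorted(cdf, u)       # invert the CDF
    return samples[idx]                 # equally weighted samples from P

# Usage: samples is an (N, d) array of states, weights an (N,) array.
```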

Sequential Monte Carlo Methods

Sequential Importance Sampling (SIS) and the closely related algorithm Sampling Importance Resampling (SIR) are known by various names in the literature:
- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest

General idea: importance sampling on time series data, with samples and weights updated as each new data term is observed. Well-suited for simulating recursive Bayes filtering!

Recall: Bayes Filtering

Two-step iteration at each time t:

Motion Prediction Step:
$$P(x_t \mid z_{1:t-1}) = \int P(x_t \mid x_{t-1})\, P(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

Data Correction Step (Bayes rule):
$$P(x_t \mid z_{1:t}) = \frac{P(z_t \mid x_t)\, P(x_t \mid z_{1:t-1})}{\int P(z_t \mid x_t)\, P(x_t \mid z_{1:t-1})\, dx_t}$$

Recall: Bayes Filtering

Problem: in general, both the motion prediction step and the data correction step (Bayes rule) involve intractable integrals, so the recursion above cannot be evaluated in closed form for arbitrary models.

Sequential Monte Carlo Methods

Intuition: represent probability distributions by samples (called particles). Each particle is a guess at the true state. For each one, simulate its motion update and add noise to get a motion prediction. Measure the likelihood of this prediction, and weight the resulting particles proportionally to their likelihoods.

Back to Bayes Filtering

Data Correction Step (Bayes rule):
$$P(x_t \mid z_{1:t}) = \frac{P(z_t \mid x_t)\, P(x_t \mid z_{1:t-1})}{\int P(z_t \mid x_t)\, P(x_t \mid z_{1:t-1})\, dx_t}$$

The integral in the denominator of Bayes rule disappears as a consequence of representing distributions by a weighted set of samples: since we have only a finite number of samples, the normalization constant is simply the sum of the weights!

Back to Bayes Filtering

Now let's write the Bayes filter by combining the motion prediction and data correction steps into one equation:
$$\underbrace{P(x_t \mid z_{1:t})}_{\text{new posterior}} = c\; \underbrace{P(z_t \mid x_t)}_{\text{data term}} \int \underbrace{P(x_t \mid x_{t-1})}_{\text{motion term}}\; \underbrace{P(x_{t-1} \mid z_{1:t-1})}_{\text{old posterior}}\; dx_{t-1}$$

Monte Carlo Bayes Filtering

Assume the posterior at time t-1 (which is the prior at time t) has been approximated as a set of N weighted particles $\{x^i_{t-1}, w^i_{t-1}\}_{i=1}^N$, so that
$$P(x_{t-1} \mid z_{1:t-1}) \approx \sum_{i=1}^N w^i_{t-1}\, \delta(x_{t-1} - x^i_{t-1})$$
where $\delta(\cdot)$ is the Dirac delta function. Useful property:
$$\int f(x)\, \delta(x - a)\, dx = f(a)$$

Monte Carlo Bayes Filtering

Then the motion prediction integral simplifies to a summation:
$$\int P(x_t \mid x_{t-1})\, P(x_{t-1} \mid z_{1:t-1})\, dx_{t-1} \approx \int P(x_t \mid x_{t-1}) \sum_{i=1}^N w^i_{t-1}\, \delta(x_{t-1} - x^i_{t-1})\, dx_{t-1}$$
(the prior had been approximated by N particles)
$$= \sum_{i=1}^N w^i_{t-1} \int P(x_t \mid x_{t-1})\, \delta(x_{t-1} - x^i_{t-1})\, dx_{t-1}$$
(exchange the order of summation and integration)
$$= \sum_{i=1}^N w^i_{t-1}\, P(x_t \mid x^i_{t-1})$$
(property of the Dirac delta function)

Monte Carlo Bayes Filtering

Our Bayes filtering equation thus simplifies as well:
$$P(x_t \mid z_{1:t}) = c\, P(z_t \mid x_t) \sum_{i=1}^N w^i_{t-1}\, P(x_t \mid x^i_{t-1})$$
(plugging in the result from the previous page)
$$= c \sum_{i=1}^N w^i_{t-1}\, P(z_t \mid x_t)\, P(x_t \mid x^i_{t-1})$$
(bringing the term that doesn't depend on i into the summation)

Monte Carlo Bayes Filtering

Our new posterior is therefore
$$P(x_t \mid z_{1:t}) = c \sum_{i=1}^N w^i_{t-1}\, P(z_t \mid x_t)\, P(x_t \mid x^i_{t-1})$$
but this is still not amenable to closed-form computation for arbitrary motion models and likelihood functions (e.g., we would have to integrate it to compute the normalization constant c).

Idea 1: Let's approximate the posterior as a set of N samples!

Idea 2: Hey, wait a minute, the prior was already represented as a set of N samples! Why don't we just update each of those?

Monte Carlo Bayes Filtering

Approach: for each sample $x^i_{t-1}$, generate a new sample $x^i_t$ by importance sampling, using some convenient proposal distribution $Q(x_t \mid x^i_{t-1}, z_t)$. So, generate a sample
$$x^i_t \sim Q(x_t \mid x^i_{t-1}, z_t)$$
and compute its importance weight
$$w^i_t = w^i_{t-1}\, \frac{P(z_t \mid x^i_t)\, P(x^i_t \mid x^i_{t-1})}{Q(x^i_t \mid x^i_{t-1}, z_t)}$$

Monte Carlo Bayes Filtering

We then can approximate our posterior as
$$P(x_t \mid z_{1:t}) \approx \sum_{i=1}^N \tilde{w}^i_t\, \delta(x_t - x^i_t)$$
where the normalized weights are
$$\tilde{w}^i_t = \frac{w^i_t}{\sum_{j=1}^N w^j_t}$$

SIS Algorithm
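The original slide shows the SIS pseudocode as an image. Below is a minimal Python sketch of one SIS step implementing the equations above; the callables `propose`, `proposal_pdf`, `transition_pdf`, and `likelihood` are hypothetical placeholders standing in for the model densities:

```python
import numpy as np

def sis_step(particles, weights, z,
             propose, proposal_pdf, transition_pdf, likelihood):
    """One Sequential Importance Sampling step.
    particles: (N, d) states x^i_{t-1};  weights: (N,) weights w^i_{t-1}."""
    new_particles = np.empty_like(particles)
    new_weights = np.empty_like(weights)
    for i in range(len(particles)):
        x_new = propose(particles[i], z)        # x^i_t ~ Q(. | x^i_{t-1}, z_t)
        new_particles[i] = x_new
        new_weights[i] = weights[i] * (         # SIS weight update
            likelihood(z, x_new) *
            transition_pdf(x_new, particles[i]) /
            proposal_pdf(x_new, particles[i], z))
    new_weights /= new_weights.sum()            # normalize the weights
    return new_particles, new_weights
```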

SIS Degeneracy

Unfortunately, pure SIS suffers from degeneracy: in many cases, after a few iterations, all but one particle will have negligible weight.

[Figure: illustration of degeneracy, plotting the particle weights w at Time 1, Time 10, and Time 19; the weight mass progressively concentrates on a single particle.]

Resampling to Combat Degeneracy

Sample with replacement to get N new samples, each having equal weight 1/N:
- samples with high weight get replicated
- samples with low weight die off
- this concentrates particles in areas of higher probability

Generic Particle Filter
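This slide also presents its algorithm as an image. A sketch of one plausible rendering, reusing `sis_step` and `resample` from the earlier sketches: an SIS update followed by resampling only when the effective sample size $N_{\text{eff}} = 1/\sum_i (w^i)^2$ drops below a threshold (a common criterion; the threshold N/2 is an assumption, not from the slides):

```python
import numpy as np

def effective_sample_size(weights):
    # N_eff = 1 / sum_i (w^i)^2: equals N for uniform weights,
    # approaches 1 as the weight mass concentrates on one particle.
    return 1.0 / np.sum(weights**2)

def particle_filter_step(particles, weights, z, propose, proposal_pdf,
                         transition_pdf, likelihood, n_threshold,
                         rng=np.random.default_rng()):
    """One generic particle filter iteration: SIS update, then
    resample only if degeneracy is detected."""
    particles, weights = sis_step(particles, weights, z, propose,
                                  proposal_pdf, transition_pdf, likelihood)
    if effective_sample_size(weights) < n_threshold:   # e.g. N / 2
        particles = resample(particles, weights, rng)  # earlier sketch
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```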

Sampling Importance Resampling (SIR)

SIR is a special case of the generic particle filter where:
- the prior density is used as the proposal density
- resampling is done every iteration

Therefore $Q(x^i_t \mid x^i_{t-1}, z_t) = P(x^i_t \mid x^i_{t-1})$, so the motion term and the proposal cancel in the weight update, and the old weights are all equal (1/N) due to resampling. The update thus reduces to
$$w^i_t \propto P(z_t \mid x^i_t)$$

SIR Algorithm
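Again the slide's pseudocode is an image; here is a minimal sketch of one SIR (Condensation) iteration, assuming a process-model function `f` and a `likelihood` function as hypothetical placeholders:

```python
import numpy as np

def sir_step(particles, z, f, sample_noise, likelihood,
             rng=np.random.default_rng()):
    """One SIR iteration, starting from equally weighted particles."""
    # Propagate through the process model: x^i_k = f(x^i_{k-1}, v^i_{k-1}),
    # i.e. sample from the prior P(x_k | x^i_{k-1}).
    predicted = np.array([f(x, sample_noise(rng)) for x in particles])
    # Weight each prediction by its observation likelihood: w^i ~ P(z_k | x^i_k).
    w = np.array([likelihood(z, x) for x in predicted])
    w /= w.sum()
    # Resample every iteration so the output particles are again equally weighted.
    return resample(predicted, w, rng)   # resample() from the earlier sketch
```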

Drawing from the Prior Density

Note: when we use the prior as the importance density, we only need to sample from the process noise distribution (typically uniform or Gaussian). Why? Recall:
$$x_k = f_k(x_{k-1}, v_{k-1}), \quad v \text{ is process noise}$$
Thus we can sample from the prior $P(x_k \mid x_{k-1})$ by starting with sample $x^i_{k-1}$, generating a noise vector $v^i_{k-1}$ from the noise process, and forming the noisy sample
$$x^i_k = f_k(x^i_{k-1}, v^i_{k-1})$$
If the noise is additive, this leads to a very simple interpretation: move each particle using motion prediction, then add noise.
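For the additive-noise case, that interpretation is essentially two lines; a sketch, where the deterministic motion model `motion` and the Gaussian noise scale are assumptions for illustration:

```python
import numpy as np

def sample_prior_additive(particles, motion, noise_sigma,
                          rng=np.random.default_rng()):
    # Move each particle with the deterministic motion model, then add noise.
    return motion(particles) + rng.normal(0.0, noise_sigma, particles.shape)
```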

SIR Filtering Illustration

[Figure: one cycle of SIR filtering on a one-dimensional state, showing the evolution of the particle sets: $\{x_{k-1}^{(m)}, 1/M\}_{m=1}^M$ (equally weighted), $\{x_k^{(m)}, w_k^{(m)}\}_{m=1}^M$ (after prediction and weighting), $\{\tilde{x}_k^{(m)}, 1/M\}_{m=1}^M$ (after resampling), then $\{x_{k+1}^{(m)}, w_{k+1}^{(m)}\}_{m=1}^M$, $\{\tilde{x}_{k+1}^{(m)}, 1/M\}_{m=1}^M$, and $\{x_{k+2}^{(m)}, 1/M\}_{m=1}^M$ for the following time steps.]

Problems with SIS/SIR

Degeneracy: in SIS, after several iterations all samples except one tend to have negligible weight, so a lot of computational effort is spent on particles that make no contribution. Resampling is supposed to fix this, but it also causes a problem...

Sample impoverishment: in SIR, after several iterations all samples tend to collapse into a single state. The ability to represent multimodal distributions is thus short-lived.

Particle Filter Failure Analysis: References

- King and Forsyth, "How Does CONDENSATION Behave with a Finite Number of Samples?" ECCV 2000, pp. 695-709.
- Karlin and Taylor, A First Course in Stochastic Processes, 2nd edition, Academic Press, 1975.

Particle Filter Failure Analysis: Summary

Condensation/SIR is asymptotically correct as the number of samples tends towards infinity. As a practical issue, however, it has to be run with a finite number of samples.

Iterations of Condensation form a Markov chain whose state space consists of quantized representations of a density. This Markov chain has some undesirable properties:
- high variance: different runs can lead to very different answers
- low apparent variance within each individual run (it appears stable)
- the state can collapse to a single peak in time roughly linear in the number of samples
- the tracker may appear to follow peaks in the posterior even in the absence of any meaningful measurements

These properties are generally known as sample impoverishment.

Stationary Analysis

For simplicity, we focus on tracking problems with stationary distributions (the posterior should be the same at any time step). [This is because it is hard to focus on what is going on when the posterior modes are deterministically moving around; any movement of modes in our analysis will therefore be due to the behavior of the particle filter.]

A Simple PMF State Space

Consider 10 particles representing a probability mass function over 2 locations. The PMF state space is:
{(0,10) (1,9) (2,8) (3,7) (4,6) (5,5) (6,4) (7,3) (8,2) (9,1) (10,0)}

[Figure: example configuration (4,6), with 4 particles at location 1 and 6 particles at location 2.]

We will now instantiate a particular two-state filtering model that we can analyze in closed form, and explore the Markov chain process (on the PMF state space above) that describes how particle filtering performs on that process.

Discrete, Stationary, No Noise

Assume a stationary process model $X_{k+1} = F X_k + v_k$ with $F = I$ (identity) and no process noise ($v_k = 0$), so the process model reduces to $X_{k+1} = X_k$.

Perfect Two-State Ambiguity

Let our two filtering states be {a, b}. We define both the prior distribution and the observation model to be ambiguous (equal belief in a and b):
$$P(X_0 = a) = 0.5, \qquad P(X_0 = b) = 0.5$$
$$P(Z \mid X_k = a) = 0.5, \qquad P(Z \mid X_k = b) = 0.5$$
From the process model, the state transition matrix is the identity (rows and columns indexed by a, b):
$$P(X_{k+1} \mid X_k) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Recall: Recursive Filtering

Prediction:
$$\underbrace{P(x_k \mid z_{1:k-1})}_{\text{predicted current state}} = \int \underbrace{P(x_k \mid x_{k-1})}_{\text{state transition}}\; \underbrace{P(x_{k-1} \mid z_{1:k-1})}_{\text{previous estimated state}}\; dx_{k-1}$$

Update:
$$\underbrace{P(x_k \mid z_{1:k})}_{\text{estimated current state}} = \frac{\overbrace{P(z_k \mid x_k)}^{\text{measurement}}\; \overbrace{P(x_k \mid z_{1:k-1})}^{\text{predicted current state}}}{\underbrace{\int P(z_k \mid x_k)\, P(x_k \mid z_{1:k-1})\, dx_k}_{\text{normalization term}}}$$

These are exact propagation equations.

Analytic Filter Analysis

Predict:
$$P(X_k = a \mid z_{1:k-1}) = 1 \cdot 0.5 + 0 \cdot 0.5 = 0.5$$
$$P(X_k = b \mid z_{1:k-1}) = 0 \cdot 0.5 + 1 \cdot 0.5 = 0.5$$

Update:
$$P(X_k = a \mid z_{1:k}) = \frac{0.5 \cdot 0.5}{0.5 \cdot 0.5 + 0.5 \cdot 0.5} = \frac{.25}{.25 + .25} = 0.5$$
$$P(X_k = b \mid z_{1:k}) = \frac{0.5 \cdot 0.5}{0.5 \cdot 0.5 + 0.5 \cdot 0.5} = \frac{.25}{.25 + .25} = 0.5$$

Analytic Filter Analysis

Therefore, for all k, the posterior distribution is
$$P(X_k \mid z_{1:k}) = \begin{cases} 0.5 & X_k = a \\ 0.5 & X_k = b \end{cases}$$
which agrees with our intuition regarding the stationarity and ambiguity of our two-state model. Now let's see how a particle filter behaves...

Particle Filter

Consider 10 particles representing a probability mass function over our 2 locations {a, b}. In accordance with our ambiguous prior, we will initialize with 5 particles in each location.

[Figure: $P(X_0)$ represented as 5 particles at a and 5 particles at b.]

Condensation (SIR) Particle Filter

1) Select N new samples with replacement, according to the sample weights (equal weights in this case)
2) Apply the process model to each sample: deterministic motion + noise (a no-op in this case)
3) For each new position, set the weight of the particle in accordance with the observation probability (all weights become .5 in this case)
4) Normalize the weights so they sum to one (weights are still equal)

Condensation as Markov Chain (Key Step)

Recall that 10 particles representing a probability mass function over 2 locations can be thought of as having a state space with 11 elements:
{(0,10) (1,9) (2,8) (3,7) (4,6) (5,5) (6,4) (7,3) (8,2) (9,1) (10,0)}

[Figure: the configuration (5,5), with 5 particles at a and 5 at b.]

Condensation as Markov Chain (Key Step) We want to characterize the probability that the particle filter procedure will transition from the current configuration to a new configuration: {(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)} (5,5)? a b

Condensation as Markov Chain (Key Step) We want to characterize the probability that the particle filter procedure will transition from the current configuration to a new configuration: {(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}? (5,5) Let P(j i) be prob of transitioning from (i,10-i) to (j,10-j) a b

Example: N = 10 samples

Starting from (5,5), each resampled particle independently lands in bucket a with probability 1/2, so the next configuration is Binomial(10, 0.5) distributed. For example:
$$P(5 \mid 5) = .2461, \quad P(4 \mid 5) = P(6 \mid 5) = .2051, \quad P(3 \mid 5) = P(7 \mid 5) = .1172$$

[Figure: bar plot of P(j | 5) for j = 0, ..., 10, peaked at j = 5 with maximum value about .25, together with example transitions from (5,5) to configurations such as (4,6), (3,7), (5,5), and (6,4).]

Full Transition Table

In general, $P(j \mid i) = \binom{10}{j} (i/10)^j (1 - i/10)^{10-j}$, since each of the 10 resampled particles independently lands in bucket a with probability i/10.

[Figure: the full transition table P(j | i) for i, j = 0, ..., 10, with the slice P(j | 5) highlighted.]
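As a check (my own sketch, not from the slides), the transition table can be computed directly from the binomial form:

```python
import numpy as np
from math import comb

N = 10  # number of particles
# P[i, j] = probability of moving from configuration (i, N-i) to (j, N-j):
# each of the N resampled particles independently lands in bucket a with prob i/N.
P = np.array([[comb(N, j) * (i / N)**j * (1 - i / N)**(N - j)
               for j in range(N + 1)]
              for i in range(N + 1)])

print(P[5, 5])              # ~0.2461, matching the example slide
print(P[0, 0], P[10, 10])   # both 1.0: absorbing states
```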

The Crux of the Problem

From (5,5), there is a good chance we will jump away from (5,5), say to (6,4). Once we do that, we are no longer sampling from the transition distribution at (5,5), but from the one at (6,4), which is biased off-center from (5,5). And so on: the behavior will be similar to that of a random walk.

[Figure: the transition distributions P(j | 5), P(j | 6), and P(j | 7), each successively shifted off-center.]

Another Problem

$P(0 \mid 0) = 1$ and $P(10 \mid 10) = 1$: the configurations (0,10) and (10,0) are absorbing states!

[Figure: the transition table P(j | i) with the two absorbing corner states highlighted.]

Observations

- The Markov chain has two absorbing states, (0,10) and (10,0).
- Once the chain gets into either of these two states, it can never get out (all the particles have collapsed into a single bucket).
- There is a nonzero probability of getting into either absorbing state, starting from (5,5).

These are the seeds of our destruction!

Simulation
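The simulation slides themselves are figures; below is a small Python sketch (an assumed reconstruction, not the original simulator) of the experiment they describe: run the two-state Condensation chain until all particles collapse into one bucket, and record the time to absorption.

```python
import numpy as np

def time_to_absorption(N, rng):
    """Run the two-state Condensation chain from (N/2, N/2) until all
    particles collapse into a single bucket; return the number of steps."""
    i = N // 2                      # particles currently in bucket a
    t = 0
    while 0 < i < N:
        # Resampling with equal weights: each of the N new particles
        # independently lands in bucket a with probability i/N.
        i = rng.binomial(N, i / N)
        t += 1
    return t

rng = np.random.default_rng(0)
for N in (10, 20, 50, 100):
    times = [time_to_absorption(N, rng) for _ in range(100)]
    print(N, np.mean(times))        # roughly 1.4 * N (King and Forsyth)
```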

Some sample runs with 10 particles

More Sample Runs

[Figure: more sample runs, for N = 10, N = 20, and N = 100 particles.]

Average Time to Absorption

[Figure: average time to absorption versus number of particles N. Dots: from running the simulator (100 trials at each of N = 10, 20, 30, ...). Line: plot of 1.4 N, the asymptotic analytic estimate (King and Forsyth).]

More Generally

Implications of a stationary process model with no noise, in a discrete state space:
- any time a bucket contains zero particles, it will forever after have zero particles (for that run)
- there is typically a nonzero probability of getting zero particles in a bucket sometime during the run
- thus, over time, the particles will inevitably collapse into a single bucket

Extending to the Continuous Case

A similar thing happens in more realistic cases. Consider a continuous case with two stationary modes in the likelihood, where each mode has small variance with respect to the distance between the modes.

[Figure: a bimodal likelihood with two narrow, well-separated peaks, labeled mode1 and mode2.]

Extending to the Continuous Case

Because each mode's variance is very small relative to the separation, the likelihood between the modes is essentially zero, which is fatal to any particles that try to cross from one mode to the other via diffusion.

[Figure: mode1 and mode2, with the near-zero likelihood valley between them.]

Extending to the Continuous Case

Each mode thus becomes an isolated island, and we can reduce this case to our previous two-state analysis (each mode is one discrete state).

[Figure: mode1 and mode2 mapped onto the discrete states a and b.]