Monte Carlo-based statistical methods (MASM11/FMS091)
Jimmy Olsson
Centre for Mathematical Sciences, Lund University, Sweden
Lecture 6: Sequential Monte Carlo methods II
February 3, 2012
Changes in HA1

Problem 4(c) in HA1 has been modified: you do NOT have to provide any confidence interval for the variance $\mathbb{V}(P(V_1) + P(V_2))$ and the standard deviation $\mathbb{D}(P(V_1) + P(V_2))$. Provide only point estimates of these quantities.
Plan of today's lecture

1. Last time: Sequential MC problems
2. Sequential importance sampling (SIS)
Last time: Sequential MC problems

In the sequential MC framework, we aim at sequentially estimating sequences $(\tau_n)_{n \ge 0}$ of expectations
$$\tau_n = \mathbb{E}_{f_n}[\phi(X_{0:n})] = \int_{\mathsf{X}^n} \phi(x_{0:n}) f_n(x_{0:n}) \, \mathrm{d}x_{0:n} \qquad (*)$$
over spaces $\mathsf{X}^n$ of increasing dimension, where the densities $(f_n)$ are known up to normalizing constants only, i.e., for every $n \ge 0$,
$$f_n(x_{0:n}) = \frac{z_n(x_{0:n})}{c_n},$$
where $c_n$ is an unknown constant.
Last time: Markov chains

Some applications involved the notion of Markov chains: a Markov chain on $\mathsf{X} \subseteq \mathbb{R}^d$ is a family of random variables (= stochastic process) $(X_k)_{k \ge 0}$ taking values in $\mathsf{X}$ such that
$$\mathbb{P}(X_{k+1} \in A \mid X_0, X_1, \ldots, X_k) = \mathbb{P}(X_{k+1} \in A \mid X_k).$$
The density $q$ of the distribution of $X_{k+1}$ given $X_k = x$ is called the transition density of $(X_k)$. Consequently,
$$\mathbb{P}(X_{k+1} \in A \mid X_k = x_k) = \int_A q(x_{k+1} \mid x_k) \, \mathrm{d}x_{k+1}.$$
As a first example we considered an AR(1) process: $X_0 = 0$, $X_{k+1} = \alpha X_k + \epsilon_{k+1}$, where $\alpha$ is a constant and $(\epsilon_k)$ are i.i.d. variables.
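A minimal MATLAB sketch of simulating the AR(1) chain above; the Gaussian i.i.d. noise $\epsilon_k \sim N(0, \sigma^2)$ and the parameter values are illustrative assumptions, not fixed by the slide.

% Simulate n steps of X_{k+1} = alpha*X_k + eps_{k+1}, started at X_0 = 0,
% assuming Gaussian noise (an illustrative assumption).
alpha = 0.8; sigma = 1; n = 100;   % hypothetical parameter values
X = zeros(n+1, 1);                 % X(1) holds X_0 = 0
for k = 1:n
    X(k+1) = alpha*X(k) + sigma*randn;
end
plot(0:n, X);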
Last time: Markov chains (cont.)

The following theorem provides the joint density $f_n(x_0, x_1, \ldots, x_n)$ of $X_0, X_1, \ldots, X_n$.

Theorem. Let $(X_k)$ be Markov with $X_0 \sim \chi$. Then for $n > 0$,
$$f_n(x_0, x_1, \ldots, x_n) = \chi(x_0) \prod_{k=0}^{n-1} q(x_{k+1} \mid x_k).$$

Corollary (The Chapman–Kolmogorov equation). Let $(X_k)$ be Markov. Then for $n > 1$,
$$f_n(x_n \mid x_0) = \int \cdots \int \left( \prod_{k=0}^{n-1} q(x_{k+1} \mid x_k) \right) \mathrm{d}x_1 \cdots \mathrm{d}x_{n-1}.$$
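To make the theorem concrete: continuing the AR(1) sketch above, the joint log-density of a simulated trajectory is a sum of log transition densities; since $X_0 = 0$ is deterministic there, the $\chi$-factor is omitted.

% log f_n(X_0,...,X_n) = log chi(X_0) + sum_k log q(X_{k+1} | X_k),
% with q(x'|x) = N(x'; alpha*x, sigma^2); chi is a point mass at 0 and is dropped.
logf = sum(log(normpdf(X(2:end), alpha*X(1:end-1), sigma)));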
Last time: Rare event analysis (REA) for Markov chains

Let $(X_k)$ be a Markov chain. Assume that we want to compute, for $n = 0, 1, 2, \ldots$,
$$\tau_n = \mathbb{E}(\phi(X_{0:n}) \mid X_{0:n} \in A) = \int_A \phi(x_{0:n}) \frac{f_n(x_{0:n})}{\mathbb{P}(X_{0:n} \in A)} \, \mathrm{d}x_{0:n} = \int_A \phi(x_{0:n}) \frac{\chi(x_0) \prod_{k=0}^{n-1} q(x_{k+1} \mid x_k)}{\mathbb{P}(X_{0:n} \in A)} \, \mathrm{d}x_{0:n},$$
where $A$ is a possibly rare event and $\mathbb{P}(X_{0:n} \in A)$ is generally unknown. We thus face a sequential MC problem $(*)$ with
$$\begin{cases} z_n(x_{0:n}) = \chi(x_0) \prod_{k=0}^{n-1} q(x_{k+1} \mid x_k), \\ c_n = \mathbb{P}(X_{0:n} \in A). \end{cases}$$
Last time: Estimation in general HMMs

Graphically, the hidden Markov chain $\cdots \to X_{k-1} \to X_k \to X_{k+1} \to \cdots$ generates one observation $Y_k$ per hidden state, according to
$$Y_k \mid X_k = x_k \sim g(y_k \mid x_k) \quad \text{(observation density)},$$
$$X_{k+1} \mid X_k = x_k \sim q(x_{k+1} \mid x_k) \quad \text{(transition density)},$$
$$X_0 \sim \chi(x_0) \quad \text{(initial distribution)}.$$
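A minimal sketch of simulating a trajectory $(X_{0:n}, Y_{0:n})$ from such an HMM; the concrete model below (Gaussian AR(1) hidden chain, additive Gaussian observation noise) is an illustrative assumption, not part of the slide.

% Toy HMM: X_0 ~ N(0,1), X_{k+1} | X_k ~ N(alpha*X_k, sigma^2)  (transition q),
%          Y_k | X_k ~ N(X_k, tau^2)                            (observation g).
alpha = 0.8; sigma = 1; tau = 0.5; n = 100;   % hypothetical parameters
X = zeros(n+1, 1); Y = zeros(n+1, 1);
X(1) = randn;                          % draw X_0 from chi
Y(1) = X(1) + tau*randn;               % observe Y_0
for k = 1:n
    X(k+1) = alpha*X(k) + sigma*randn; % propagate the hidden chain
    Y(k+1) = X(k+1) + tau*randn;       % generate the observation
end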
Last time: Estimation in general HMMs

In an HMM, the smoothing distribution $f_n(x_{0:n} \mid y_{0:n})$ is the conditional distribution of the hidden states $X_{0:n}$ given $Y_{0:n} = y_{0:n}$.

Theorem (Smoothing distribution).
$$f_n(x_{0:n} \mid y_{0:n}) = \frac{\chi(x_0) g(y_0 \mid x_0) \prod_{k=1}^{n} g(y_k \mid x_k) q(x_k \mid x_{k-1})}{L_n(y_{0:n})},$$
where
$$L_n(y_{0:n}) = \text{density of the observations } y_{0:n} = \int \cdots \int \chi(x_0) g(y_0 \mid x_0) \prod_{k=1}^{n} g(y_k \mid x_k) q(x_k \mid x_{k-1}) \, \mathrm{d}x_0 \cdots \mathrm{d}x_n.$$
Last time: Estimation in general HMMs

Assume that we want to compute, online for $n = 0, 1, 2, \ldots$,
$$\tau_n = \mathbb{E}(\phi(X_{0:n}) \mid Y_{0:n} = y_{0:n}) = \int \cdots \int \phi(x_{0:n}) f_n(x_{0:n} \mid y_{0:n}) \, \mathrm{d}x_0 \cdots \mathrm{d}x_n = \int \cdots \int \phi(x_{0:n}) \frac{\chi(x_0) g(y_0 \mid x_0) \prod_{k=1}^{n} g(y_k \mid x_k) q(x_k \mid x_{k-1})}{L_n(y_{0:n})} \, \mathrm{d}x_0 \cdots \mathrm{d}x_n,$$
where $L_n(y_{0:n})$ (an unwieldy high-dimensional integral) is generally unknown. We thus face a sequential MC problem $(*)$ with
$$\begin{cases} z_n(x_{0:n}) = \chi(x_0) g(y_0 \mid x_0) \prod_{k=1}^{n} g(y_k \mid x_k) q(x_k \mid x_{k-1}), \\ c_n = L_n(y_{0:n}). \end{cases}$$
Sequential importance sampling (SIS)

It is natural to aim at solving the problem using standard self-normalized IS. However, the generated samples $(X_{0:n}^i, \omega_n(X_{0:n}^i))$ should be such that
- given $(X_{0:n}^i, \omega_n(X_{0:n}^i))$, the next sample $(X_{0:n+1}^i, \omega_{n+1}(X_{0:n+1}^i))$ is easily generated, with a complexity that does not increase with $n$ (online sampling);
- the approximation remains stable as $n$ increases.

We call each draw $X_{0:n}^i = (X_0^i, \ldots, X_n^i)$ a particle. Moreover, we denote the importance weights by $\omega_n^i = \omega_n(X_{0:n}^i)$.
SIS

We proceed recursively. Assume that we have generated particles $(X_{0:n}^i)$ from $g_n(x_{0:n})$ so that
$$\sum_{i=1}^{N} \frac{\omega_n^i}{\sum_{\ell=1}^{N} \omega_n^\ell} \, \phi(X_{0:n}^i) \approx \mathbb{E}_{f_n}[\phi(X_{0:n})],$$
where, as usual, $\omega_n^i = \omega_n(X_{0:n}^i) = z_n(X_{0:n}^i)/g_n(X_{0:n}^i)$.

Key trick: choose an instrumental distribution satisfying
$$g_{n+1}(x_{0:n+1}) = g_{n+1}(x_{n+1} \mid x_{0:n}) \, g_n(x_{0:n}).$$
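In code, the self-normalized approximation above is a one-liner; a sketch, assuming the particles are stored row-wise in an N-by-(n+1) matrix part, the unnormalized weights in an N-by-1 vector w, and taking the hypothetical test function $\phi(x_{0:n}) = x_n$:

phi_vals = part(:, end);            % hypothetical phi(x_{0:n}) = x_n
tau_hat = sum(w.*phi_vals)/sum(w);  % sum_i omega_n^i/(sum_l omega_n^l)*phi(X_{0:n}^i)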
SIS (cont.)

Consequently, $X_{0:n+1}^i$ and $\omega_{n+1}^i$ can be generated by keeping the previous $X_{0:n}^i$, simulating $X_{n+1}^i \sim g_{n+1}(x_{n+1} \mid X_{0:n}^i)$, setting $X_{0:n+1}^i = (X_{0:n}^i, X_{n+1}^i)$, and computing
$$\omega_{n+1}^i = \frac{z_{n+1}(X_{0:n+1}^i)}{g_{n+1}(X_{0:n+1}^i)} = \frac{z_{n+1}(X_{0:n+1}^i)}{z_n(X_{0:n}^i)\, g_{n+1}(X_{n+1}^i \mid X_{0:n}^i)} \cdot \frac{z_n(X_{0:n}^i)}{g_n(X_{0:n}^i)} = \frac{z_{n+1}(X_{0:n+1}^i)}{z_n(X_{0:n}^i)\, g_{n+1}(X_{n+1}^i \mid X_{0:n}^i)} \, \omega_n^i.$$
SIS (cont.)

Voilà, the sample $(X_{0:n+1}^i, \omega_{n+1}^i)$ can now be used to approximate $\mathbb{E}_{f_{n+1}}[\phi(X_{0:n+1})]$! So, by running the SIS algorithm, we have updated the approximation
$$\sum_{i=1}^{N} \frac{\omega_n^i}{\sum_{\ell=1}^{N} \omega_n^\ell} \, \phi(X_{0:n}^i) \approx \mathbb{E}_{f_n}[\phi(X_{0:n})]$$
to the approximation
$$\sum_{i=1}^{N} \frac{\omega_{n+1}^i}{\sum_{\ell=1}^{N} \omega_{n+1}^\ell} \, \phi(X_{0:n+1}^i) \approx \mathbb{E}_{f_{n+1}}[\phi(X_{0:n+1})]$$
by only adding a component $X_{n+1}^i$ to $X_{0:n}^i$ and sequentially updating the weights.
SIS: Pseudo code

for i = 1, ..., N do
    draw $X_0^i \sim g_0$
    set $\omega_0^i = z_0(X_0^i) / g_0(X_0^i)$
end for
return $(X_0^i, \omega_0^i)$
for k = 0, 1, 2, ... do
    for i = 1, ..., N do
        draw $X_{k+1}^i \sim g_{k+1}(x_{k+1} \mid X_{0:k}^i)$
        set $X_{0:k+1}^i = (X_{0:k}^i, X_{k+1}^i)$
        set $\omega_{k+1}^i = \dfrac{z_{k+1}(X_{0:k+1}^i)}{z_k(X_{0:k}^i)\, g_{k+1}(X_{k+1}^i \mid X_{0:k}^i)} \, \omega_k^i$
    end for
    return $(X_{0:k+1}^i, \omega_{k+1}^i)$
end for
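A generic MATLAB rendering of the pseudo code; the function handles below (g0_rnd, z0_dens, g0_dens, g_rnd, w_ratio) are placeholders for the problem-specific ingredients and are not fixed by the slides.

% N particles, n time steps.
%   g0_rnd(N)           N-by-1 draws from g_0
%   z0_dens, g0_dens    z_0 and g_0, evaluated elementwise
%   g_rnd(part)         N-by-1 draws, row i from g_{k+1}(. | part(i,:))
%   w_ratio(part, new)  N-by-1 values of z_{k+1}/(z_k * g_{k+1})
part = g0_rnd(N);
w = z0_dens(part)./g0_dens(part);
for k = 0:(n-1)
    new = g_rnd(part);              % propagate each particle
    w = w.*w_ratio(part, new);      % sequential weight update
    part = [part, new];             % append the new component
end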
Example: REA reconsidered

We consider again the example of REA for Markov chains ($\mathsf{X} = \mathbb{R}$):
$$\tau_n = \mathbb{E}(\phi(X_{0:n}) \mid a \le X_\ell,\ \ell = 0, \ldots, n) = \int_{(a,\infty)^{n+1}} \phi(x_{0:n}) \underbrace{\frac{\chi(x_0) \prod_{k=0}^{n-1} q(x_{k+1} \mid x_k)}{\mathbb{P}(a \le X_\ell,\ \forall \ell)}}_{= z_n(x_{0:n})/c_n} \, \mathrm{d}x_{0:n}.$$
Choose $g_{k+1}(x_{k+1} \mid x_{0:k})$ to be the conditional density of $X_{k+1}$ given $X_k$ and $X_{k+1} \ge a$:
$$g_{k+1}(x_{k+1} \mid x_{0:k}) = \{\text{cf. HA1, Problem 1}\} = \frac{q(x_{k+1} \mid x_k)}{\int_a^\infty q(z \mid x_k) \, \mathrm{d}z}.$$
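For a Gaussian AR(1) transition density $q(\cdot \mid x) = N(\alpha x, \sigma^2)$, a draw from this truncated density can be obtained by inverting the conditional CDF; a sketch under that assumption (the Gaussian form matches the Matlab example below; whether this is exactly the construction of HA1, Problem 1 is not stated here):

% Draw from q(.|x) restricted to (a, infinity): simulate U ~ Unif(F(a), 1)
% and return F^{-1}(U), where F is the CDF of N(alpha*x, sigma^2).
trunk_td_rnd = @(x) norminv(normcdf(a, alpha*x, sigma) ...
    + rand(size(x)).*(1 - normcdf(a, alpha*x, sigma)), alpha*x, sigma);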
Example: REA

This implies that
$$g_n(x_{0:n}) = \frac{\chi(x_0)}{\int_a^\infty \chi(x_0) \, \mathrm{d}x_0} \prod_{k=0}^{n-1} \frac{q(x_{k+1} \mid x_k)}{\int_a^\infty q(z \mid x_k) \, \mathrm{d}z}.$$
In addition, the weights are updated as
$$\omega_{k+1}^i = \frac{z_{k+1}(X_{0:k+1}^i)}{z_k(X_{0:k}^i)\, g_{k+1}(X_{k+1}^i \mid X_{0:k}^i)} \, \omega_k^i = \frac{\prod_{\ell=0}^{k} q(X_{\ell+1}^i \mid X_\ell^i)}{\prod_{\ell=0}^{k-1} q(X_{\ell+1}^i \mid X_\ell^i) \cdot q(X_{k+1}^i \mid X_k^i)\big/\int_a^\infty q(z \mid X_k^i)\,\mathrm{d}z} \, \omega_k^i = \int_a^\infty q(z \mid X_k^i) \, \mathrm{d}z \cdot \omega_k^i.$$
Example: REA; Matlab implementation for AR(1) process with Gaussian noise

% design of instrumental distribution:
int = @(x) 1 - normcdf(a, alpha*x, sigma);
trunk_td_rnd = @(x) ...  % use HA1, Problem 1, to draw from
                         % the truncated transition density;

% SIS (N particles, starting position a):
part = a*ones(N, 1);     % initialization in a
w = ones(N, 1);
for k = 1:(n-1),         % main loop
    part_mut = trunk_td_rnd(part);  % propagate particles
    w = w.*int(part);               % weight update: int_a^inf q(z | X_k) dz
    part = part_mut;
end
c = mean(w);             % estimated probability
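The same run also yields point estimates of the conditional expectations $\tau_n$; as a sketch, for the hypothetical choice of $\phi$ equal to the terminal state (which is exactly what part holds when the loop ends), one could append:

tau_hat = sum(w.*part)/sum(w);   % self-normalized estimate of the conditional
                                 % expectation of the terminal state given that
                                 % the whole trajectory stays above a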
REA: Importance weight distribution

Serious drawback of SIS: the importance weights degenerate!

[Figure: histograms of the importance weights (base-10 logarithm) at n = 1, n = 10, and n = 100; as n grows, the weights spread over many orders of magnitude.]
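Histograms like the ones above can be produced directly from a SIS run; a minimal sketch, assuming the unnormalized importance weights at the current time step are stored in the vector w as in the code above:

hist(log10(w/sum(w)), 30);                         % 30 bins
xlabel('Importance weights (base 10 logarithm)');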
What's next?

Weight degeneration is a universal problem with the SIS method and is due to the fact that the particle weights are formed through successive multiplications. This drawback prevented the SIS method from being practically useful for several decades. Next week we will discuss an efficient solution to this problem: SIS with resampling.