Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 3, 2015 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (1)
Plan of today's lecture Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (2)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (3)
Last time: Variance reduction Last time we discussed how to reduce the variance of the standard MC sampler by introducing correlation between the variables of the sample. More specically, we used 1 a control variate Y such that E(Y ) = m is known: Z = φ(x) + β(y m), where β was tuned optimally to β = C(φ(X), Y )/V(Y ). 2 antithetic variables V and V such that E(V ) = E(V ) = τ and C(V, V ) < 0: W = V + V. 2 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (4)
Last time: Variance reduction The following theorem turned out to be useful when constructing antithetic variables. Theorem Let V = ϕ(u), where ϕ : R R is a monotone function. Moreover, assume that there exists a non-increasing transform T : R R such that U = d. T (U). Then V = ϕ(u) and V = ϕ(t (U)) are identically distributed and C(V, V ) = C(ϕ(U), ϕ(t (U))) 0. An important application of this theorem is the following: Let F be a distribution function. Then, letting U U(0, 1), T (u) = 1 u, and ϕ = F 1 yields, for X = F 1 (U) and X = F 1 (1 U), X d. = X (with distribution function F ) and C(X, X ) 0. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (5)
Last time: Variance reduction τ = 2 π/2 0 exp(cos 2 (x)) dx, V = 2 π 2 exp(cos2 (X)), V = 2 π 2 exp(sin2 (X)), W = V +V 2. 6.2 6 5.8 With Antithetic sampling 5.6 5.4 5.2 5 4.8 4.6 Standard MC 4.4 4.2 0 200 400 600 800 1000 Sample size N V (= 2 N W ) M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (6)
Control variates reconsidered A problem with the control variate approach is that the optimal β, i.e. β = C(φ(X), Y ), V(Y ) is generally not known explicitly. Thus, it was suggested to 1 draw (X i ) N i=1, 2 draw (Y i ) N i=1, 3 estimate, via MC, β using the drawn samples, and 4 use this to optimally construct (Z i ) N i=1. This yields a so-called batch estimator of β. However, this procedure is computationally somewhat complex. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (7)
An online approach to optimal control variates The estimators C N def = 1 N V N def = 1 N N φ(x i )(Y i m) i=1 N (Y i m) 2 i=1 of C(φ(X), Y ) and V(Y ), respectively, can be implemented recursively according to and C l+1 = with C 0 = V 0 = 0. V l+1 = l l + 1 C l + 1 l + 1 φ(x l+1 )(Y l+1 m) l l + 1 V l + 1 l + 1 (Y l+1 m) 2. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (8)
An online approach to optimal control variates (cont.) Inspired by this we set for l = 0, 1, 2,..., N 1, Z l+1 = φ(x l+1 ) + β l (Y l+1 m), τ l+1 = l l + 1 τ l + 1 l + 1 Z l+1, ( ) def def def where β 0 = 1, β l = C l /V l for l > 0, and τ 0 = 0 yielding an online estimator. One may then establish the following convergence results. Theorem Let τ N be obtained through ( ). Then, as N, (i) τ N τ (a.s.), (ii) N(τ N τ) where σ 2 d. N (0, σ ), 2 def = V(φ(X)){1 ρ(φ(x), Y ) 2 } is the optimal variance. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (9)
Example: the tricky integral again We estimate τ = using π/2 π/2 π/2 exp(cos 2 (x)) dx sym = 2 0 π 2 exp(cos2 (x)) } {{ } =φ(x) Z = φ(x) + β (Y m), where Y = cos 2 (X) is a control variate with m = E(Y ) = π/2 0 2 π }{{} =f(x) dx = E f (φ(x)) cos 2 (x) 2 π dx = {use integration by parts} = 1 2. However, the optimal coecient β is not known explicitly. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (10)
Example: the tricky integral again cos2 = @(x) cos(x).^2; phi = @(x) 2(pi/2)*exp(cos2(x)); m = 1/2; X = (pi/2)*rand; Y = cos2(x); c = phi(x)*(y m); v = (Y m)^2; tau_cv = phi(x) + (Y m); beta = c/v; for k = 2:N, X = (pi/2)*rand; Y = cos2(x); Z = phi(x) + beta*(y m); tau_cv = (k 1)*tau_CV/k + Z/k; c = (k 1)*c/k + phi(x)*(y m)/k; v = (k 1)*v/k + (Y m)^2/k; beta = c/v; end M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (11)
Example: the tricky integral again 5.6 5.5 5.4 5.3 Batch algorithm 5.2 Adaptive algorithm 5.1 Standard MC 5 0 500 1000 1500 Sample size N M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (12)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (13)
We will now (and for the coming two lectures) extend the principal goal of the course to the problem of estimating sequentially sequences (τ n ) n 0 of expectations τ n = E fn (φ(x 0:n )) = φ(x 0:n )f n (x 0:n ) dx 0:n X n over spaces X n of increasing dimension, where again the densities (f n ) n 0 are known up to normalizing constants only; i.e. for every n 0, f(x 0:n ) = z n(x 0:n ) c n, where c n is an unknown constant and z n is a known positive function on X n. As we will see, such sequences appear in many applications in statistics and numerical analysis. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (14)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (15)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (16)
A Markov chain on X R d is a family of random variables (= stochastic process) (X k ) k 0 taking values in X such that P(X k+1 B X 0, X 1,..., X k ) = P(X k+1 B X k ) for all B X. We call the chain time homogeneous if the conditional distribution of X k+1 given X k does not depend on k. The distribution of X k+1 given X k = x determines completely the dynamics of the process, and the density q of this distribution is called the transition density of (X k ). Consequently, P(X k+1 B X k = x k ) = q(x k+1 x k ) dx k+1. B M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (17)
Markov chains (cont.) Variance reduction reconsidered The following theorem provides the joint density f n (x 0, x 1,..., x n ) of X 0, X 1,..., X n. Theorem Let (X k ) be Markov with initial distribution χ. Then for n > 0, n 1 f n (x 0, x 1,..., x n ) = χ(x 0 ) q(x k+1 x k ). Corollary (Chapman-Kolmogorov equation) Let (X k ) be Markov. Then for n > 1, f n (x n x 0 ) = ( n 1 k=0 k=0 q(x k+1 x k ) ) dx 1 dx n 1. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (18)
Example: The AR(1) process As a rst example we consider a rst order autoregressive process (AR(1)) in R. Set X 0 = 0, X k+1 = αx k + ɛ k+1, where α is a constant and the variables (ɛ k ) k 1 of the noise sequence are i.i.d. with density function f. In this case, P(X k+1 x k+1 X k = x k ) = P(αX k + ɛ k+1 x k+1 X k = x k ) implying that q(x k+1 x k ) = = P(ɛ k+1 x k+1 αx k X k = x k ) = P(ɛ k+1 x k+1 αx k ), x k+1 P(X k+1 x k+1 X k = x k ) = x k+1 P(ɛ k+1 x k+1 αx k ) = f(x k+1 αx k ). M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (19)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (20)
Simulation of rare events for Markov chains Let (X k ) be a Markov chain on X = R and consider the rectangle B = B 0 B 1 B n R n, where every B l = (a l, b l ) is an interval. Here B can be a possibly extreme event. Say that we wish to to compute, sequentially as n increases, some expectation under the conditional distribution f n B of the states X 0:n = (X 0, X 2,..., X n ) given X 0:n B, i.e. τ n = E fn (φ(x 0:n ) X 0:n B) = E fn B (φ(x 0:n )) f(x 0:n ) = φ(x 0:n ) dx 0:n. B P(X 0:n B) } {{ } =f n B (x 0:n )=z n(x 0:n )/c n Here the unknown probability c n = P(X 0:n B) of the rare event B is often the quantity of interest. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (21)
Simulation of rare events for Markov chains (cont.) As c n = P(X 0:n B) = 1 B (x 0:n )f(x 0:n ) dx 0:n a rstnaiveapproach could of course be to use standard MC and simply 1 simulate the Markov chain N times, yielding trajectories (X 0:n i ) N i=1, 2 count the number N B of trajectories that fall into B, and 3 estimate c n using the MC estimator c N n = N B N. Problem: if c n = 10 9 we may expect to produce a billion draws before obtaining a single draw belonging to B! As we will se, SMC methods solve the problem eciently. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (22)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (23)
Estimation in general hidden Markov models (HMMs) A hidden Markov model (HMM) comprises two stochastic processes: 1 A Markov chain (X k ) k 0 with transition density q: X k+1 X k = x k q(x k+1 x k ). The Markov chain is not seen by us (hidden) but partially observed through 2 an observation process (Y k ) k 0 such that conditionally on the chain (X k ) k 0, (i) (ii) the Y k 's are independent with conditional distribution of each Y k depending on the corresponding X k only. The density of the conditional distribution Y k (X k ) k 0 d. = Y k X k will be denoted by p(y k x k ). M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (24)
Estimation in general HMMs (cont.) Graphically: Y k 1 Y k Y k+1 (Observations)... X k 1 X k X k+1... (Markov chain) Y k X k = x k p(y k x k ) X k+1 X k = x k q(x k+1 x k ) X 0 χ(x 0 ) (Observation density) (Transition density) (Initial distribution) M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (25)
Example HMM: A stochastic volatility model The following dynamical system is used in nancial economy (see e.g. Jacuquier et al., 1994). Let { Xk+1 = αx k + σɛ k+1, Y k = β exp ( Xk 2 ) ε k, where α (0, 1), σ > 0, and β > 0 are constants and (ɛ k ) k 1 and (ε k ) k 0 are sequences of i.i.d. standard normal-distributed noise variables. In this model the values of the observation process (Y k ) are observed daily log-returns (from e.g. the Swedish OMXS30 index) and the hidden chain (X k ) is the unobserved log-volatility (modeled by a stationary AR(1) process). The strength of this model is that it allows for volatility clustering, a phenomenon that is often observed in real nancial time series. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (26)
Example HMM: A stochastic volatility model A realization of the the model looks like follows (here α = 0.975, σ = 0.16, and β = 0.63). 4 3 2 Log returns Log volatility process log returns 1 0 1 2 0 100 200 300 400 500 600 700 800 900 1000 Days M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (27)
Example HMM: A stochastic volatility model Daily log-returns from the Swedish stock index OMXS30, from 2005-03-30 to 2007-03-30. 0.06 0.04 0.02 y k 0 0.02 0.04 0.06 0 50 100 150 200 250 300 350 400 450 500 k M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (28)
The smoothing distribution When operating on HMMs, one is most often interested in the smoothing distribution f n (x 0:n y 0:n ), i.e. the conditional distribution of a set X 0:n of hidden states given Y 0:n = y 0:n. Theorem (Smoothing distribution) f n (x 0:n y 0:n ) = χ(x 0)p(y 0 x 0 ) n k=1 p(y k x k )q(x k x k 1 ), L n (y 0:n ) where L n (y 0:n ) is the likelihood function given by L n (y 0:n ) = density of the observations y 0:n n = χ(x 0 )p(y 0 x 0 ) p(y k x k )q(x k x k 1 ) dx 0 dx n. k=1 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (29)
Estimation of smoothed expectations Being a high-dimensional (say n 1000 or 10, 000) integral over complicated integrands, L n (y 0:n ) is in general unknown. However by writing τ n = E(φ(X 0:n ) Y 0:n = y 0:n ) = φ(x 0:n )f n (x 0:n y 0:n ) dx 0 dx n with = φ(x 0:n ) z n(x 0:n ) c n dx 0 dx n, { z n (x 0:n ) = χ(x 0 )p(y 0 x 0 ) n k=1 p(y k x k )q(x k x k 1 ), c n = L n (y 0:n ), we may cast the problem of computing τ n into the framework of self-normalized IS. In particular we would like to update sequentially, in n, the approximation as new data (Y k ) appears. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (30)
We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (31)
Self-avoiding walks (SAWs) Denote by N (x) the set of neighbors of a point x in Z 2. Let def S n = {x 0:n Z 2n : x 0 = 0, x k+1 x k = 1, x k x l, 0 l < k n} be the set of n-step self-avoiding walks in Z 2. 6 A self avoiding walk of length 50 5 4 3 2 1 0 1 2 3 8 6 4 2 0 2 4 6 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (32)
Application of SAWs Variance reduction reconsidered In addition, let SAWs are used in c n = S n = The number of possible SAWs of length n. polymer science for describing long chain polymers, with the self-avoidance condition modeling the excluded volume eect. statistical mechanics and the theory of critical phenomena in equilibrium. However, computing c n (and in analyzing how c n depends on n) is known to be a very challenging combinatorial problem! M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (33)
An MC approach to SAWs Trick: let f n (x 0:n ) be the uniform distribution on S n : f n (x 0:n ) = 1 1 Sn (x 0:n ), x c n } {{ } 0:n Z 2n, =z(x 0:n ) We may thus cast the problem of computing the number c n (= the normalizing constant of f n ) into the framework of self-normalized IS based on some convenient instrumental distribution g n on Z 2n. In addition, solving this problem for n = 1, 2, 3,..., 508, 509,... calls for sequential implementation of IS. This will be the topic of HA2! M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (34)