Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Similar documents

Monte Carlo-based statistical methods (MASM11/FMS091)

Monte Carlo-based statistical methods (MASM11/FMS091)

How To Solve A Sequential Mca Problem

I. Pointwise convergence

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan , Fall 2010

1 Sufficient statistics

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

1 Prior Probability and Posterior Probability

Maximum Likelihood Estimation

Monte Carlo Methods in Finance

Model-based Synthesis. Tony O Hagan

Basics of Statistical Machine Learning

Sales forecasting # 2

Binomial lattice model for stock prices

Lecture notes: single-agent dynamics 1

Principle of Data Reduction

Time Series Analysis

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

Numerical Methods for Option Pricing

Lecture 8. Confidence intervals and the central limit theorem

Metric Spaces. Chapter Metrics

Lecture 13: Martingales

Bayesian Statistics in One Hour. Patrick Lam

Nonparametric adaptive age replacement with a one-cycle criterion

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

Introduction to General and Generalized Linear Models

Some stability results of parameter identification in a jump diffusion model

MATH4427 Notebook 2 Spring MATH4427 Notebook Definitions and Examples Performance Measures for Estimators...

Polarization codes and the rate of polarization

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Note on the EM Algorithm in Linear Regression Model

Separable First Order Differential Equations

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Introduction to Markov Chain Monte Carlo

ARMA, GARCH and Related Option Pricing Method

On closed-form solutions to a class of ordinary differential equations

CHAPTER 6: Continuous Uniform Distribution: 6.1. Definition: The density function of the continuous random variable X on the interval [A, B] is.

Statistics Graduate Courses

Scalar Valued Functions of Several Variables; the Gradient Vector

Gaussian Processes to Speed up Hamiltonian Monte Carlo

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Tutorial on Markov Chain Monte Carlo

Monte Carlo Simulation

Simulating Stochastic Differential Equations

Nonlinear Algebraic Equations. Lectures INF2320 p. 1/88

1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

The sample space for a pair of die rolls is the set. The sample space for a random number between 0 and 1 is the interval [0, 1].

An Introduction to Basic Statistics and Probability

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

The Heat Equation. Lectures INF2320 p. 1/88

Marshall-Olkin distributions and portfolio credit risk

Generating Random Variables and Stochastic Processes

1 Simulating Brownian motion (BM) and geometric Brownian motion (GBM)

Generating Random Variables and Stochastic Processes

A Dynamic Supply-Demand Model for Electricity Prices

Chapter 3 RANDOM VARIATE GENERATION

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Moreover, under the risk neutral measure, it must be the case that (5) r t = µ t.

Lecture 3: Linear methods for classification

Portfolio Distribution Modelling and Computation. Harry Zheng Department of Mathematics Imperial College

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Chapter 2: Binomial Methods and the Black-Scholes Formula

A Brief Review of Elementary Ordinary Differential Equations

2WB05 Simulation Lecture 8: Generating random variables

Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 12 04/08/2008. Sven Zenker

Non-Stationary Time Series andunitroottests

IEOR 6711: Stochastic Models I Fall 2012, Professor Whitt, Tuesday, September 11 Normal Approximations and the Central Limit Theorem

Message-passing sequential detection of multiple change points in networks

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia

LECTURE 4. Last time: Lecture outline

Recent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago

Example of the Glicko-2 system

Efficiency and the Cramér-Rao Inequality

ALGORITHMIC TRADING WITH MARKOV CHAINS

Markov Chain Monte Carlo Simulation Made Simple

3. Monte Carlo Simulations. Math6911 S08, HM Zhu

The Engle-Granger representation theorem

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

Inner Product Spaces

3. Regression & Exponential Smoothing

Statistical Machine Learning

Nonparametric Tests for Randomness

An Introduction to Machine Learning

Package SCperf. February 19, 2015

Financial Market Microstructure Theory

INTERPOLATION. Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y).

The Ergodic Theorem and randomness

Determining distribution parameters from quantiles

Econometrics Simple Linear Regression

Real Business Cycle Models

Detection of changes in variance using binary segmentation and optimal partitioning

False-Alarm and Non-Detection Probabilities for On-line Quality Control via HMM

Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

BayesX - Software for Bayesian Inference in Structured Additive Regression

1. First-order Ordinary Differential Equations

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Elliptical copulae. Dorota Kurowicka, Jolanta Misiewicz, Roger Cooke

Master s Theory Exam Spring 2006

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

Transcription:

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February 3, 2015 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (1)

Plan of today's lecture Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (2)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (3)

Last time: Variance reduction Last time we discussed how to reduce the variance of the standard MC sampler by introducing correlation between the variables of the sample. More specically, we used 1 a control variate Y such that E(Y ) = m is known: Z = φ(x) + β(y m), where β was tuned optimally to β = C(φ(X), Y )/V(Y ). 2 antithetic variables V and V such that E(V ) = E(V ) = τ and C(V, V ) < 0: W = V + V. 2 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (4)

Last time: Variance reduction The following theorem turned out to be useful when constructing antithetic variables. Theorem Let V = ϕ(u), where ϕ : R R is a monotone function. Moreover, assume that there exists a non-increasing transform T : R R such that U = d. T (U). Then V = ϕ(u) and V = ϕ(t (U)) are identically distributed and C(V, V ) = C(ϕ(U), ϕ(t (U))) 0. An important application of this theorem is the following: Let F be a distribution function. Then, letting U U(0, 1), T (u) = 1 u, and ϕ = F 1 yields, for X = F 1 (U) and X = F 1 (1 U), X d. = X (with distribution function F ) and C(X, X ) 0. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (5)

Last time: Variance reduction τ = 2 π/2 0 exp(cos 2 (x)) dx, V = 2 π 2 exp(cos2 (X)), V = 2 π 2 exp(sin2 (X)), W = V +V 2. 6.2 6 5.8 With Antithetic sampling 5.6 5.4 5.2 5 4.8 4.6 Standard MC 4.4 4.2 0 200 400 600 800 1000 Sample size N V (= 2 N W ) M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (6)

Control variates reconsidered A problem with the control variate approach is that the optimal β, i.e. β = C(φ(X), Y ), V(Y ) is generally not known explicitly. Thus, it was suggested to 1 draw (X i ) N i=1, 2 draw (Y i ) N i=1, 3 estimate, via MC, β using the drawn samples, and 4 use this to optimally construct (Z i ) N i=1. This yields a so-called batch estimator of β. However, this procedure is computationally somewhat complex. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (7)

An online approach to optimal control variates The estimators C N def = 1 N V N def = 1 N N φ(x i )(Y i m) i=1 N (Y i m) 2 i=1 of C(φ(X), Y ) and V(Y ), respectively, can be implemented recursively according to and C l+1 = with C 0 = V 0 = 0. V l+1 = l l + 1 C l + 1 l + 1 φ(x l+1 )(Y l+1 m) l l + 1 V l + 1 l + 1 (Y l+1 m) 2. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (8)

An online approach to optimal control variates (cont.) Inspired by this we set for l = 0, 1, 2,..., N 1, Z l+1 = φ(x l+1 ) + β l (Y l+1 m), τ l+1 = l l + 1 τ l + 1 l + 1 Z l+1, ( ) def def def where β 0 = 1, β l = C l /V l for l > 0, and τ 0 = 0 yielding an online estimator. One may then establish the following convergence results. Theorem Let τ N be obtained through ( ). Then, as N, (i) τ N τ (a.s.), (ii) N(τ N τ) where σ 2 d. N (0, σ ), 2 def = V(φ(X)){1 ρ(φ(x), Y ) 2 } is the optimal variance. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (9)

Example: the tricky integral again We estimate τ = using π/2 π/2 π/2 exp(cos 2 (x)) dx sym = 2 0 π 2 exp(cos2 (x)) } {{ } =φ(x) Z = φ(x) + β (Y m), where Y = cos 2 (X) is a control variate with m = E(Y ) = π/2 0 2 π }{{} =f(x) dx = E f (φ(x)) cos 2 (x) 2 π dx = {use integration by parts} = 1 2. However, the optimal coecient β is not known explicitly. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (10)

Example: the tricky integral again cos2 = @(x) cos(x).^2; phi = @(x) 2(pi/2)*exp(cos2(x)); m = 1/2; X = (pi/2)*rand; Y = cos2(x); c = phi(x)*(y m); v = (Y m)^2; tau_cv = phi(x) + (Y m); beta = c/v; for k = 2:N, X = (pi/2)*rand; Y = cos2(x); Z = phi(x) + beta*(y m); tau_cv = (k 1)*tau_CV/k + Z/k; c = (k 1)*c/k + phi(x)*(y m)/k; v = (k 1)*v/k + (Y m)^2/k; beta = c/v; end M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (11)

Example: the tricky integral again 5.6 5.5 5.4 5.3 Batch algorithm 5.2 Adaptive algorithm 5.1 Standard MC 5 0 500 1000 1500 Sample size N M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (12)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (13)

We will now (and for the coming two lectures) extend the principal goal of the course to the problem of estimating sequentially sequences (τ n ) n 0 of expectations τ n = E fn (φ(x 0:n )) = φ(x 0:n )f n (x 0:n ) dx 0:n X n over spaces X n of increasing dimension, where again the densities (f n ) n 0 are known up to normalizing constants only; i.e. for every n 0, f(x 0:n ) = z n(x 0:n ) c n, where c n is an unknown constant and z n is a known positive function on X n. As we will see, such sequences appear in many applications in statistics and numerical analysis. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (14)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (15)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (16)

A Markov chain on X R d is a family of random variables (= stochastic process) (X k ) k 0 taking values in X such that P(X k+1 B X 0, X 1,..., X k ) = P(X k+1 B X k ) for all B X. We call the chain time homogeneous if the conditional distribution of X k+1 given X k does not depend on k. The distribution of X k+1 given X k = x determines completely the dynamics of the process, and the density q of this distribution is called the transition density of (X k ). Consequently, P(X k+1 B X k = x k ) = q(x k+1 x k ) dx k+1. B M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (17)

Markov chains (cont.) Variance reduction reconsidered The following theorem provides the joint density f n (x 0, x 1,..., x n ) of X 0, X 1,..., X n. Theorem Let (X k ) be Markov with initial distribution χ. Then for n > 0, n 1 f n (x 0, x 1,..., x n ) = χ(x 0 ) q(x k+1 x k ). Corollary (Chapman-Kolmogorov equation) Let (X k ) be Markov. Then for n > 1, f n (x n x 0 ) = ( n 1 k=0 k=0 q(x k+1 x k ) ) dx 1 dx n 1. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (18)

Example: The AR(1) process As a rst example we consider a rst order autoregressive process (AR(1)) in R. Set X 0 = 0, X k+1 = αx k + ɛ k+1, where α is a constant and the variables (ɛ k ) k 1 of the noise sequence are i.i.d. with density function f. In this case, P(X k+1 x k+1 X k = x k ) = P(αX k + ɛ k+1 x k+1 X k = x k ) implying that q(x k+1 x k ) = = P(ɛ k+1 x k+1 αx k X k = x k ) = P(ɛ k+1 x k+1 αx k ), x k+1 P(X k+1 x k+1 X k = x k ) = x k+1 P(ɛ k+1 x k+1 αx k ) = f(x k+1 αx k ). M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (19)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (20)

Simulation of rare events for Markov chains Let (X k ) be a Markov chain on X = R and consider the rectangle B = B 0 B 1 B n R n, where every B l = (a l, b l ) is an interval. Here B can be a possibly extreme event. Say that we wish to to compute, sequentially as n increases, some expectation under the conditional distribution f n B of the states X 0:n = (X 0, X 2,..., X n ) given X 0:n B, i.e. τ n = E fn (φ(x 0:n ) X 0:n B) = E fn B (φ(x 0:n )) f(x 0:n ) = φ(x 0:n ) dx 0:n. B P(X 0:n B) } {{ } =f n B (x 0:n )=z n(x 0:n )/c n Here the unknown probability c n = P(X 0:n B) of the rare event B is often the quantity of interest. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (21)

Simulation of rare events for Markov chains (cont.) As c n = P(X 0:n B) = 1 B (x 0:n )f(x 0:n ) dx 0:n a rstnaiveapproach could of course be to use standard MC and simply 1 simulate the Markov chain N times, yielding trajectories (X 0:n i ) N i=1, 2 count the number N B of trajectories that fall into B, and 3 estimate c n using the MC estimator c N n = N B N. Problem: if c n = 10 9 we may expect to produce a billion draws before obtaining a single draw belonging to B! As we will se, SMC methods solve the problem eciently. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (22)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (23)

Estimation in general hidden Markov models (HMMs) A hidden Markov model (HMM) comprises two stochastic processes: 1 A Markov chain (X k ) k 0 with transition density q: X k+1 X k = x k q(x k+1 x k ). The Markov chain is not seen by us (hidden) but partially observed through 2 an observation process (Y k ) k 0 such that conditionally on the chain (X k ) k 0, (i) (ii) the Y k 's are independent with conditional distribution of each Y k depending on the corresponding X k only. The density of the conditional distribution Y k (X k ) k 0 d. = Y k X k will be denoted by p(y k x k ). M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (24)

Estimation in general HMMs (cont.) Graphically: Y k 1 Y k Y k+1 (Observations)... X k 1 X k X k+1... (Markov chain) Y k X k = x k p(y k x k ) X k+1 X k = x k q(x k+1 x k ) X 0 χ(x 0 ) (Observation density) (Transition density) (Initial distribution) M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (25)

Example HMM: A stochastic volatility model The following dynamical system is used in nancial economy (see e.g. Jacuquier et al., 1994). Let { Xk+1 = αx k + σɛ k+1, Y k = β exp ( Xk 2 ) ε k, where α (0, 1), σ > 0, and β > 0 are constants and (ɛ k ) k 1 and (ε k ) k 0 are sequences of i.i.d. standard normal-distributed noise variables. In this model the values of the observation process (Y k ) are observed daily log-returns (from e.g. the Swedish OMXS30 index) and the hidden chain (X k ) is the unobserved log-volatility (modeled by a stationary AR(1) process). The strength of this model is that it allows for volatility clustering, a phenomenon that is often observed in real nancial time series. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (26)

Example HMM: A stochastic volatility model A realization of the the model looks like follows (here α = 0.975, σ = 0.16, and β = 0.63). 4 3 2 Log returns Log volatility process log returns 1 0 1 2 0 100 200 300 400 500 600 700 800 900 1000 Days M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (27)

Example HMM: A stochastic volatility model Daily log-returns from the Swedish stock index OMXS30, from 2005-03-30 to 2007-03-30. 0.06 0.04 0.02 y k 0 0.02 0.04 0.06 0 50 100 150 200 250 300 350 400 450 500 k M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (28)

The smoothing distribution When operating on HMMs, one is most often interested in the smoothing distribution f n (x 0:n y 0:n ), i.e. the conditional distribution of a set X 0:n of hidden states given Y 0:n = y 0:n. Theorem (Smoothing distribution) f n (x 0:n y 0:n ) = χ(x 0)p(y 0 x 0 ) n k=1 p(y k x k )q(x k x k 1 ), L n (y 0:n ) where L n (y 0:n ) is the likelihood function given by L n (y 0:n ) = density of the observations y 0:n n = χ(x 0 )p(y 0 x 0 ) p(y k x k )q(x k x k 1 ) dx 0 dx n. k=1 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (29)

Estimation of smoothed expectations Being a high-dimensional (say n 1000 or 10, 000) integral over complicated integrands, L n (y 0:n ) is in general unknown. However by writing τ n = E(φ(X 0:n ) Y 0:n = y 0:n ) = φ(x 0:n )f n (x 0:n y 0:n ) dx 0 dx n with = φ(x 0:n ) z n(x 0:n ) c n dx 0 dx n, { z n (x 0:n ) = χ(x 0 )p(y 0 x 0 ) n k=1 p(y k x k )q(x k x k 1 ), c n = L n (y 0:n ), we may cast the problem of computing τ n into the framework of self-normalized IS. In particular we would like to update sequentially, in n, the approximation as new data (Y k ) appears. M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (30)

We are here Variance reduction reconsidered 1 Variance reduction reconsidered 2 3 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (31)

Self-avoiding walks (SAWs) Denote by N (x) the set of neighbors of a point x in Z 2. Let def S n = {x 0:n Z 2n : x 0 = 0, x k+1 x k = 1, x k x l, 0 l < k n} be the set of n-step self-avoiding walks in Z 2. 6 A self avoiding walk of length 50 5 4 3 2 1 0 1 2 3 8 6 4 2 0 2 4 6 M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (32)

Application of SAWs Variance reduction reconsidered In addition, let SAWs are used in c n = S n = The number of possible SAWs of length n. polymer science for describing long chain polymers, with the self-avoidance condition modeling the excluded volume eect. statistical mechanics and the theory of critical phenomena in equilibrium. However, computing c n (and in analyzing how c n depends on n) is known to be a very challenging combinatorial problem! M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (33)

An MC approach to SAWs Trick: let f n (x 0:n ) be the uniform distribution on S n : f n (x 0:n ) = 1 1 Sn (x 0:n ), x c n } {{ } 0:n Z 2n, =z(x 0:n ) We may thus cast the problem of computing the number c n (= the normalizing constant of f n ) into the framework of self-normalized IS based on some convenient instrumental distribution g n on Z 2n. In addition, solving this problem for n = 1, 2, 3,..., 508, 509,... calls for sequential implementation of IS. This will be the topic of HA2! M. Wiktorsson Monte Carlo and Empirical Methods for Stochastic Inference, L5 (34)