Monte Carlo-based statistical methods (MASM11/FMS091)




Monte Carlo-based statistical methods (MASM11/FMS091)
Jimmy Olsson
Centre for Mathematical Sciences, Lund University, Sweden
Lecture 5: Sequential Monte Carlo methods I
February 5, 2013
J. Olsson Monte Carlo-based statistical methods, L5 (1)

Plan of today's lecture
1 Variance reduction reconsidered
2
3


Last time: Variance reduction

Last time we discussed how to reduce the variance of the standard MC sampler by introducing correlation between the variables of the sample. More specifically, we used

1 a control variate Y such that E(Y) = m is known: Z = φ(X) + α(Y − m), where α was tuned optimally to α* = −C(φ(X), Y)/V(Y);
2 antithetic variables V and V′ such that E(V) = E(V′) = τ and C(V, V′) < 0: W = (V + V′)/2.

Last time: Variance reduction

The following theorem turned out to be useful when constructing antithetic variables.

Theorem. Let V = ϕ(U), where ϕ: R → R is a monotone function. Moreover, assume that there exists a non-increasing transform T: R → R such that U =d T(U). Then V = ϕ(U) and V′ = ϕ(T(U)) are identically distributed and

C(V, V′) = C(ϕ(U), ϕ(T(U))) ≤ 0.

An important application of this theorem is the following: let F be a distribution function. Then, letting U ~ U(0, 1), T(u) = 1 − u, and ϕ = F⁻¹ yields, for X = F⁻¹(U) and X′ = F⁻¹(1 − U), X =d X′ (with distribution function F) and C(X, X′) ≤ 0.
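The inverse-CDF construction above is easy to try out. Below is a minimal sketch (not from the slides) for the Exp(1) distribution, where F⁻¹(u) = −log(1 − u); the empirical covariance between X = F⁻¹(U) and X′ = F⁻¹(1 − U) comes out negative, as the theorem guarantees.

```python
import math
import random

# Antithetic pair for Exp(1) via the inverse CDF F^{-1}(u) = -log(1 - u),
# with the non-increasing transform T(u) = 1 - u.
random.seed(1)
N = 100_000
xs, xps = [], []
for _ in range(N):
    u = random.random()
    xs.append(-math.log(1.0 - u))   # X  = F^{-1}(U)
    xps.append(-math.log(u))        # X' = F^{-1}(1 - U)

mean = lambda v: sum(v) / len(v)
mx, mxp = mean(xs), mean(xps)
# empirical covariance C(X, X'); negative by the antithetic construction
cov = mean([(x - mx) * (xp - mxp) for x, xp in zip(xs, xps)])
print(mx, mxp, cov)
```

Both sample means are close to E(X) = 1, while the covariance is clearly negative, so averaging the pair reduces variance.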

Last time: Variance reduction

τ = ∫_0^{π/2} exp(cos²(x)) dx,  V = (π/2) exp(cos²(X)),  V′ = (π/2) exp(sin²(X)),  W = (V + V′)/2.

[Figure: running estimates versus sample size N (0 to 1000) for standard MC and for antithetic sampling; the antithetic estimator W fluctuates visibly less.]

Control variates reconsidered

A problem with the control variate approach is that the optimal α, i.e.

α* = −C(φ(X), Y)/V(Y),

is generally not known explicitly. Thus, it was suggested to

1 draw (X_i)_{i=1}^N,
2 draw (Y_i)_{i=1}^N,
3 estimate, via MC, α* using the drawn samples, and
4 use this to optimally construct (Z_i)_{i=1}^N.

This yields a so-called batch estimator of α*. However, this procedure is computationally somewhat complex.
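The four batch steps can be sketched as follows (a minimal illustration, not from the slides), using the lecture's running integral τ = ∫_0^{π/2} exp(cos²(x)) dx with control variate Y = cos²(X) and m = 1/2, and the variance-minimizing coefficient α* = −C(φ(X), Y)/V(Y):

```python
import math
import random

# Batch control-variate estimator (sketch). phi(x) = (pi/2) exp(cos^2 x),
# X ~ U(0, pi/2), Y = cos^2(X), E(Y) = m = 1/2.
random.seed(0)
N = 50_000
X = [(math.pi / 2) * random.random() for _ in range(N)]  # step 1: draw X_i
Y = [math.cos(x) ** 2 for x in X]                        # step 2: Y_i = cos^2(X_i)
phi = [(math.pi / 2) * math.exp(y) for y in Y]           # phi(X_i), reusing cos^2
m = 0.5

# step 3: estimate alpha* = -C(phi(X), Y)/V(Y) from the same draws
c = sum(p * (y - m) for p, y in zip(phi, Y)) / N
v = sum((y - m) ** 2 for y in Y) / N
alpha = -c / v

# step 4: construct the Z_i and average
tau_hat = sum(p + alpha * (y - m) for p, y in zip(phi, Y)) / N
print(tau_hat)  # estimate of tau = int_0^{pi/2} e^{cos^2 x} dx ~ 2.754
```

Note the double use of the sample: the same draws estimate α* and build the final estimator, which is exactly what makes the batch approach somewhat clumsy compared with the online version below.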

An online approach to optimal control variates

The estimators

C_N := (1/N) Σ_{i=1}^N φ(X_i)(Y_i − m),
V_N := (1/N) Σ_{i=1}^N (Y_i − m)²

of C(φ(X), Y) and V(Y), respectively, can be implemented recursively according to

C_{l+1} = (l/(l+1)) C_l + (1/(l+1)) φ(X_{l+1})(Y_{l+1} − m)

and

V_{l+1} = (l/(l+1)) V_l + (1/(l+1)) (Y_{l+1} − m)²,

with C_0 = V_0 = 0.

An online approach to optimal control variates (cont.)

Inspired by this we set, for l = 0, 1, 2, ..., N − 1,

Z_{l+1} = φ(X_{l+1}) + α_l (Y_{l+1} − m),
τ_{l+1} = (l/(l+1)) τ_l + (1/(l+1)) Z_{l+1},    (*)

where α_0 := −1, α_l := −C_l/V_l for l > 0, and τ_0 := 0, yielding an online estimator. One may then establish the following convergence results.

Theorem. Let τ_N be obtained through (*). Then, as N → ∞,
(i) τ_N → τ (a.s.),
(ii) √N (τ_N − τ) →d N(0, σ*²),

where σ*² := V(φ(X)){1 − ρ(φ(X), Y)²} is the optimal variance.

Example: the tricky integral again

We estimate

τ = ∫_0^{π/2} exp(cos²(x)) dx = ∫_0^{π/2} (π/2) exp(cos²(x)) · (2/π) dx = E_f(φ(X)),

where φ(x) = (π/2) exp(cos²(x)) and f(x) = 2/π is the U(0, π/2) density, using

Z = φ(X) + α*(Y − m),

where Y = cos²(X) is a control variate with

m = E(Y) = ∫_0^{π/2} cos²(x) · (2/π) dx = {use integration by parts} = 1/2.

However, the optimal coefficient α* is not known explicitly.

Example: the tricky integral again

cos2 = @(x) cos(x).^2;
phi = @(x) (pi/2)*exp(cos2(x));
m = 1/2;
N = 1500;                         % number of draws
X = (pi/2)*rand;
Y = cos2(X);
c = phi(X)*(Y - m);
v = (Y - m)^2;
tau_CV = phi(X) - (Y - m);        % alpha_0 = -1
alpha = -c/v;
for k = 2:N,
    X = (pi/2)*rand;
    Y = cos2(X);
    Z = phi(X) + alpha*(Y - m);
    tau_CV = (k - 1)*tau_CV/k + Z/k;
    c = (k - 1)*c/k + phi(X)*(Y - m)/k;
    v = (k - 1)*v/k + (Y - m)^2/k;
    alpha = -c/v;
end

Example: the tricky integral again

[Figure: running estimates versus sample size N (up to 1500) for standard MC, the adaptive (online) algorithm, and the batch algorithm.]


We will now (and for the coming two lectures) extend the principal goal of the course to the problem of estimating sequentially sequences (τ_n)_{n≥0} of expectations

τ_n = E_{f_n}(φ(X_{0:n})) = ∫_{X_n} φ(x_{0:n}) f_n(x_{0:n}) dx_{0:n}

over spaces X_n of increasing dimension, where again the densities (f_n)_{n≥0} are known up to normalizing constants only; i.e. for every n ≥ 0,

f_n(x_{0:n}) = z_n(x_{0:n})/c_n,

where c_n is an unknown constant and z_n is a known positive function on X_n. As we will see, such sequences appear in many applications in statistics and numerical analysis.


Markov chains

A Markov chain on X ⊆ R^d is a family of random variables (= stochastic process) (X_k)_{k≥0} taking values in X such that

P(X_{k+1} ∈ A | X_0, X_1, ..., X_k) = P(X_{k+1} ∈ A | X_k)

for all A ⊆ X. We call the chain time homogeneous if the conditional distribution of X_{k+1} given X_k does not depend on k. The distribution of X_{k+1} given X_k = x determines completely the dynamics of the process, and the density q of this distribution is called the transition density of (X_k). Consequently,

P(X_{k+1} ∈ A | X_k = x_k) = ∫_A q(x_{k+1} | x_k) dx_{k+1}.

Markov chains (cont.)

The following theorem provides the joint density f_n(x_0, x_1, ..., x_n) of X_0, X_1, ..., X_n.

Theorem. Let (X_k) be Markov with initial distribution χ. Then for n > 0,

f_n(x_0, x_1, ..., x_n) = χ(x_0) ∏_{k=0}^{n−1} q(x_{k+1} | x_k).

Corollary (Chapman-Kolmogorov equation). Let (X_k) be Markov. Then for n > 1,

f_n(x_n | x_0) = ∫ ··· ∫ ( ∏_{k=0}^{n−1} q(x_{k+1} | x_k) ) dx_1 ··· dx_{n−1}.

Example: The AR(1) process

As a first example we consider a first-order autoregressive process (AR(1)) in R. Set X_0 = 0 and

X_{k+1} = αX_k + ɛ_{k+1},

where α is a constant and the variables (ɛ_k)_{k≥1} of the noise sequence are i.i.d. with density function f. In this case,

P(X_{k+1} ≤ x_{k+1} | X_k = x_k) = P(αX_k + ɛ_{k+1} ≤ x_{k+1} | X_k = x_k)
  = P(ɛ_{k+1} ≤ x_{k+1} − αx_k | X_k = x_k)
  = P(ɛ_{k+1} ≤ x_{k+1} − αx_k),

implying that

q(x_{k+1} | x_k) = ∂/∂x_{k+1} P(X_{k+1} ≤ x_{k+1} | X_k = x_k) = ∂/∂x_{k+1} P(ɛ_{k+1} ≤ x_{k+1} − αx_k) = f(x_{k+1} − αx_k).
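The AR(1) chain and its transition density are easy to simulate. The sketch below (not from the slides; α = 0.5 and standard normal noise are illustrative choices) draws a trajectory and evaluates q(x′ | x) = f(x′ − αx) with f the N(0, 1) density.

```python
import math
import random

# Simulate X_{k+1} = a*X_k + eps_{k+1}, X_0 = 0, eps ~ N(0, 1) i.i.d.
random.seed(0)
a = 0.5
X = [0.0]
for _ in range(1000):
    X.append(a * X[-1] + random.gauss(0.0, 1.0))

def q(x_next, x):
    """Transition density q(x'|x) = f(x' - a*x) with f the N(0,1) density."""
    z = x_next - a * x
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

print(len(X), q(0.0, 0.0))  # q(0|0) is the standard normal density at 0
```

The Markov property is visible in the code itself: each new state is computed from the previous state only.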


Simulation of rare events for Markov chains

Let (X_k) be a Markov chain on X = R and consider the rectangle A = A_0 × A_1 × ··· × A_n ⊆ R^{n+1}, where every A_l = (a_l, b_l) is an interval. Here A can be a possibly extreme event. Say that we wish to compute, sequentially as n increases, some expectation under the conditional distribution f_{n|A} of the states X_{0:n} = (X_0, X_1, ..., X_n) given X_{0:n} ∈ A, i.e.

τ_n = E_{f_n}(φ(X_{0:n}) | X_{0:n} ∈ A) = E_{f_{n|A}}(φ(X_{0:n})) = ∫_A φ(x_{0:n}) f_n(x_{0:n})/P(X_{0:n} ∈ A) dx_{0:n},

where f_{n|A}(x_{0:n}) = z_n(x_{0:n})/c_n. Here the unknown probability c_n = P(X_{0:n} ∈ A) of the rare event A is often the quantity of interest.

Simulation of rare events for Markov chains (cont.)

As

c_n = P(X_{0:n} ∈ A) = ∫ 1_A(x_{0:n}) f_n(x_{0:n}) dx_{0:n},

a first naive approach could of course be to use standard MC and simply

1 simulate the Markov chain N times, yielding trajectories (X^i_{0:n})_{i=1}^N,
2 count the number N_A of trajectories that fall into A, and
3 estimate c_n using the MC estimator c_n^N = N_A/N.

Problem: if c_n = 10^{−9} we may expect to produce a billion draws before obtaining a single draw belonging to A! As we will see, SMC methods solve the problem efficiently.
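The naive estimator of steps 1-3 can be sketched as follows (an illustration, not from the slides: the AR(1) chain from the earlier example, with the hypothetical choices α = 0.5, A_l = (−1, 1) for every l, and n = 10, so that A is moderately rare rather than extreme):

```python
import random

# Naive MC estimate of c_n = P(X_{0:n} in A) for an AR(1) chain,
# with every interval A_l = (-1, 1). Illustrative parameters only.
random.seed(0)
a, n, N = 0.5, 10, 100_000

def stays_in_A():
    x = 0.0                        # X_0 = 0 lies in A_0 = (-1, 1)
    for _ in range(n):
        x = a * x + random.gauss(0.0, 1.0)
        if not (-1.0 < x < 1.0):   # trajectory left the rectangle
            return False
    return True

N_A = sum(stays_in_A() for _ in range(N))  # count trajectories inside A
c_hat = N_A / N
print(c_hat)
```

Already for this mild rectangle most trajectories are wasted; shrink the intervals or increase n and N_A quickly hits zero, which is exactly the failure mode the slide describes.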


Estimation in general hidden Markov models (HMMs)

A hidden Markov model (HMM) comprises two stochastic processes:

1 a Markov chain (X_k)_{k≥0} with transition density q: X_{k+1} | X_k = x_k ~ q(x_{k+1} | x_k). The Markov chain is not seen by us (hidden) but partially observed through
2 an observation process (Y_k)_{k≥0} such that, conditionally on the chain (X_k)_{k≥0}, (i) the Y_k's are independent, with (ii) the conditional distribution of each Y_k depending on the corresponding X_k only.

The density of the conditional distribution Y_k | (X_k)_{k≥0} =d Y_k | X_k will be denoted by g(y_k | x_k).

Estimation in general HMMs (cont.)

Graphically:

[Diagram: observations ... Y_{k−1}, Y_k, Y_{k+1} ..., each linked to the corresponding state of the hidden chain ... X_{k−1} → X_k → X_{k+1} ...]

Y_k | X_k = x_k ~ g(y_k | x_k)         (Observation density)
X_{k+1} | X_k = x_k ~ q(x_{k+1} | x_k) (Transition density)
X_0 ~ χ(x_0)                           (Initial distribution)

Example HMM: A stochastic volatility model

The following dynamical system is used in financial economics (see e.g. Jacquier et al., 1994). Let

X_{k+1} = αX_k + σɛ_{k+1},
Y_k = β exp(X_k/2) ε_k,

where α ∈ (0, 1), σ > 0, and β > 0 are constants and (ɛ_k)_{k≥1} and (ε_k)_{k≥0} are sequences of i.i.d. standard normally distributed noise variables. In this model the values of the observation process (Y_k) are observed daily log-returns (from e.g. the Swedish OMXS30 index) and the hidden chain (X_k) is the unobserved log-volatility (modeled by a stationary AR(1) process). The strength of this model is that it allows for volatility clustering, a phenomenon that is often observed in real financial time series.

Example HMM: A stochastic volatility model

A realization of the model looks as follows (here α = 0.975, σ = 0.16, and β = 0.63).

[Figure: 1000 days of simulated log-returns together with the log-volatility process.]
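A realization like the one in the figure can be produced with a few lines (a sketch, not the course's code), using the parameter values quoted above (α = 0.975, σ = 0.16, β = 0.63) and X_0 = 0 as an illustrative initial state:

```python
import math
import random

# Simulate the stochastic volatility HMM:
#   X_{k+1} = alpha*X_k + sigma*eps_{k+1}   (hidden log-volatility)
#   Y_k     = beta*exp(X_k/2)*eps'_k        (observed log-return)
random.seed(0)
alpha, sigma, beta = 0.975, 0.16, 0.63
n = 1000
X, Y = [0.0], []
for _ in range(n):
    Y.append(beta * math.exp(X[-1] / 2) * random.gauss(0.0, 1.0))
    X.append(alpha * X[-1] + sigma * random.gauss(0.0, 1.0))

print(len(X), len(Y))  # n+1 hidden states, n observations
```

Plotting Y against time reproduces the clustering effect: calm stretches where X is low, bursts of large returns where X is high.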

Example HMM: A stochastic volatility model

Daily log-returns from the Swedish stock index OMXS30, from 2005-03-30 to 2007-03-30.

[Figure: observed log-returns y_k, k = 0, ..., 500, ranging over roughly ±0.06.]

The smoothing distribution

When operating on HMMs, one is most often interested in the smoothing distribution f_n(x_{0:n} | y_{0:n}), i.e. the conditional distribution of the set X_{0:n} of hidden states given Y_{0:n} = y_{0:n}.

Theorem (Smoothing distribution).

f_n(x_{0:n} | y_{0:n}) = χ(x_0) g(y_0 | x_0) ∏_{k=1}^n g(y_k | x_k) q(x_k | x_{k−1}) / L_n(y_{0:n}),

where L_n(y_{0:n}) is the likelihood function given by

L_n(y_{0:n}) = density of the observations y_{0:n}
  = ∫ ··· ∫ χ(x_0) g(y_0 | x_0) ∏_{k=1}^n g(y_k | x_k) q(x_k | x_{k−1}) dx_0 ··· dx_n.

Estimation of smoothed expectations

Being a high-dimensional (say n ≈ 1000 or 10,000) integral over complicated integrands, L_n(y_{0:n}) is in general unknown. However, by writing

τ_n = E(φ(X_{0:n}) | Y_{0:n} = y_{0:n}) = ∫ ··· ∫ φ(x_{0:n}) f_n(x_{0:n} | y_{0:n}) dx_0 ··· dx_n = ∫ ··· ∫ φ(x_{0:n}) z_n(x_{0:n})/c_n dx_0 ··· dx_n,

with

z_n(x_{0:n}) = χ(x_0) g(y_0 | x_0) ∏_{k=1}^n g(y_k | x_k) q(x_k | x_{k−1}),
c_n = L_n(y_{0:n}),

we may cast the problem of computing τ_n into the framework of self-normalized IS. In particular we would like to update the approximation sequentially, in n, as new data (Y_k) appear.
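For intuition, here is a sketch of self-normalized IS in this setting (an illustration, not the lecture's algorithm): taking the prior of X_{0:n} as instrumental distribution, the weight of a trajectory reduces to ∏_k g(y_k | x_k), since the χ and q factors cancel. The stochastic volatility model, the choice φ(x_{0:n}) = x_n, X_0 = 0, and all numerical values are illustrative assumptions.

```python
import math
import random

# Self-normalized IS for a smoothed expectation E(X_n | Y_{0:n} = y_{0:n})
# in the SV model, with the prior of X_{0:n} as instrumental distribution.
random.seed(0)
alpha, sigma, beta, n, N = 0.975, 0.16, 0.63, 20, 5000

def g(y, x):
    """Observation density g(y|x): N(0, beta^2 * exp(x))."""
    sd = beta * math.exp(x / 2)
    return math.exp(-y * y / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

# synthetic observations y_{0:n} from one realization of the model
xs = [0.0]
for _ in range(n):
    xs.append(alpha * xs[-1] + sigma * random.gauss(0.0, 1.0))
ys = [beta * math.exp(x / 2) * random.gauss(0.0, 1.0) for x in xs]

# draw N prior trajectories; weight each by prod_k g(y_k | x_k)
num = den = 0.0
for _ in range(N):
    x, w = 0.0, g(ys[0], 0.0)          # X_0 = 0 under the prior
    for k in range(1, n + 1):
        x = alpha * x + sigma * random.gauss(0.0, 1.0)
        w *= g(ys[k], x)
    num, den = num + w * x, den + w    # phi(x_{0:n}) = x_n

print(num / den)  # SNIS estimate of the smoothed terminal log-volatility
```

Already for n = 20 most weight tends to concentrate on a few trajectories; this weight degeneracy is precisely what the sequential methods of the coming lectures address.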


Self-avoiding walks (SAWs)

Denote by N(x) the set of neighbors of a point x in Z². Let

S_n := {x_{0:n} ∈ (Z²)^{n+1} : x_0 = 0, |x_{k+1} − x_k| = 1, x_k ≠ x_l for 0 ≤ l < k ≤ n}

be the set of n-step self-avoiding walks in Z².

[Figure: a self-avoiding walk of length 50.]

Application of SAWs

In addition, let c_n = |S_n| = the number of possible SAWs of length n. SAWs are used in

- polymer science, for describing long chain polymers, with the self-avoidance condition modeling the excluded volume effect;
- statistical mechanics and the theory of critical phenomena in equilibrium.

However, computing c_n (and analyzing how c_n depends on n) is known to be a very challenging combinatorial problem!

An MC approach to SAWs

Trick: let f_n(x_{0:n}) be the uniform distribution on S_n:

f_n(x_{0:n}) = (1/c_n) 1_{S_n}(x_{0:n}) = z_n(x_{0:n})/c_n,  x_{0:n} ∈ (Z²)^{n+1},

with z_n(x_{0:n}) = 1_{S_n}(x_{0:n}). We may thus cast the problem of computing the number c_n (= the normalizing constant of f_n) into the framework of self-normalized IS based on some convenient instrumental distribution g_n on (Z²)^{n+1}. In addition, solving this problem for n = 1, 2, 3, ..., 508, 509, ... calls for a sequential implementation of IS. This will be the topic of HA2!
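As a first taste of the sequential idea (a classic one-step-growth sketch, not the HA2 solution): grow each walk one step at a time, choosing uniformly among the free neighbors of the current endpoint, and keep the product of the free-neighbor counts as an importance weight. Averaging the weights over many walks gives an unbiased estimate of c_n.

```python
import random

# One-step-growth (Rosenbluth-style) IS estimate of c_n = |S_n| on Z^2.
random.seed(0)

def grow_weight(n):
    """Grow one walk of length n; return the product of free-neighbor
    counts (the IS weight), or 0 if the walk gets trapped."""
    visited = {(0, 0)}
    x, w = (0, 0), 1
    for _ in range(n):
        free = [(x[0] + dx, x[1] + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (x[0] + dx, x[1] + dy) not in visited]
        if not free:
            return 0          # trapped: this trajectory contributes 0
        w *= len(free)        # number of admissible one-step extensions
        x = random.choice(free)
        visited.add(x)
    return w

N, n = 10_000, 5
c_hat = sum(grow_weight(n) for _ in range(N)) / N
print(c_hat)  # c_5 = 284 exactly; the estimate should land close to it
```

Each weight is exactly 1/g_n(x_{0:n}) for the one-step-growth instrumental distribution g_n, which is what makes the average unbiased for c_n; the weight degeneracy of this scheme for large n motivates the resampling ideas of the coming lectures.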