Diagnostics in MCMC. Hoff Chapter 7. October 7, 2009

Convergence to Posterior Distribution

Theory tells us that if we run the Gibbs sampler long enough, the samples we obtain will be draws from the joint posterior distribution (the target or stationary distribution), regardless of the starting point (the chain forgets its past). How long do we need to run the Markov chain to adequately explore the posterior distribution? Mixing of the chain plays a critical role in how quickly we obtain good results. How can we tell whether the chain is mixing well (or poorly)?

Three Component Mixture Model

Posterior for µ:

$$\mu \mid Y \sim 0.45\,N(-3, 1/3) + 0.10\,N(0, 1/3) + 0.45\,N(3, 1/3)$$

How can we draw samples from the posterior? Introduce a mixture component indicator δ, an unobserved latent variable that simplifies sampling:

if δ = 1 then µ | δ, Y ~ N(−3, 1/3) and P(δ = 1 | Y) = 0.45
if δ = 2 then µ | δ, Y ~ N(0, 1/3) and P(δ = 2 | Y) = 0.10
if δ = 3 then µ | δ, Y ~ N(3, 1/3) and P(δ = 3 | Y) = 0.45

Monte Carlo sampling: draw δ; given δ, draw µ.
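A minimal R sketch of this two-step draw (the weights, means, and common variance 1/3 are taken from the slide; variable names are illustrative):

    ## Direct Monte Carlo sampling from the three-component mixture posterior
    set.seed(1)
    M <- 1000
    w <- c(0.45, 0.10, 0.45)   # P(delta = d | Y)
    m <- c(-3, 0, 3)           # component means
    s <- sqrt(1/3)             # common component standard deviation

    delta <- sample(1:3, M, replace = TRUE, prob = w)   # draw delta
    mu    <- rnorm(M, mean = m[delta], sd = s)          # draw mu given delta

    hist(mu, prob = TRUE, breaks = 30)   # compare with the mixture density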

MC density

[Figure: histogram of µ from 1000 MC draws, with the posterior mixture density overlaid as a solid line.]

MC Variation

If we want to find the posterior mean of g(µ), the Monte Carlo estimate based on M MC samples is

$$\hat g_{MC} = \frac{1}{M}\sum_{m=1}^{M} g(\mu^{(m)}) \approx E[g(\mu) \mid Y]$$

with variance

$$\mathrm{Var}[\hat g_{MC}] = E\big[(\hat g_{MC} - E[\hat g_{MC}])^2\big] = \frac{\mathrm{Var}[g(\mu) \mid Y]}{M},$$

leading to the Monte Carlo standard error $\sqrt{\mathrm{Var}[\hat g_{MC}]}$. We expect the posterior mean of g(µ) to lie in the interval $\hat g_{MC} \pm 2\sqrt{\mathrm{Var}[\hat g_{MC}]}$ for roughly 95% of repeated MC samples. To increase accuracy, increase M.
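Continuing the sketch above with g(µ) = µ, the estimate and its Monte Carlo standard error can be computed directly (assuming the vector mu of MC draws from the previous sketch):

    g.hat <- mean(mu)                    # MC estimate of E[g(mu) | Y]
    mcse  <- sqrt(var(mu) / length(mu))  # Monte Carlo standard error
    c(estimate = g.hat,
      lower    = g.hat - 2 * mcse,       # rough 95% interval for the
      upper    = g.hat + 2 * mcse)       # posterior mean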

Markov Chain Monte Carlo

Full conditional for µ | δ, Y:

µ | δ = 1, Y ~ N(−3, 1/3)
µ | δ = 2, Y ~ N(0, 1/3)
µ | δ = 3, Y ~ N(3, 1/3)

Full conditional for δ | µ, Y (use Bayes' theorem):

$$P(\delta = d \mid \mu, Y) = \frac{p(\delta = d \mid Y)\,\mathrm{dnorm}(\mu, m_d, s_d^2)}{\sum_{d'=1}^{3} p(\delta = d' \mid Y)\,\mathrm{dnorm}(\mu, m_{d'}, s_{d'}^2)}$$

where dnorm is the normal density.
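A sketch of the resulting Gibbs sampler, alternating the two full conditionals (starting values µ = −6 and δ = 1 as on the next slide; names are illustrative):

    ## Gibbs sampler for (mu, delta)
    set.seed(1)
    S <- 1000
    w <- c(0.45, 0.10, 0.45); m <- c(-3, 0, 3); s <- sqrt(1/3)
    mu <- numeric(S); delta <- integer(S)
    mu[1] <- -6; delta[1] <- 1
    for (t in 2:S) {
      ## delta | mu, Y via Bayes' theorem
      p        <- w * dnorm(mu[t - 1], mean = m, sd = s)
      delta[t] <- sample(1:3, 1, prob = p / sum(p))
      ## mu | delta, Y
      mu[t]    <- rnorm(1, mean = m[delta[t]], sd = s)
    }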

MCMC density

[Figure: histogram of 1000 MCMC draws of µ starting at µ = −6 and δ = 1, with the posterior mixture density overlaid as a solid line.]

MC Trace Plots

[Figure: trace plot of the 1000 Monte Carlo samples of µ against iteration index.]

MCMC Traceplot

[Figure: trace plot of the 1000 Markov chain Monte Carlo samples of µ against iteration index.]

Stationarity

Partition the parameter space as follows:

A0 = (−∞, −5), A1 = (−5, −1), A2 = (−1, 1), A3 = (1, 5), A4 = (5, ∞)

Under the posterior, we should have P(A3) = P(A1) > P(A2) > P(A0) = P(A4). We need our MCMC sample size S to be big enough that we can
1. move out of A2 (or other areas of low probability) into regions of high posterior probability (convergence), and
2. move between A1 and A3 and the other high-probability regions (mixing).
A sketch that checks the partition probabilities empirically follows this list.
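The ordering can be checked from a set of draws; a sketch using the µ draws from the Gibbs sampler above:

    ## Empirical visit frequencies of the partition
    breaks  <- c(-Inf, -5, -1, 1, 5, Inf)
    regions <- cut(mu, breaks, labels = c("A0", "A1", "A2", "A3", "A4"))
    prop.table(table(regions))   # should approximate P(A0), ..., P(A4)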

Stickiness

The trace plots show:
- the MC samples can move from one region to another in one step (perfect mixing);
- the MCMC chain quickly moves away from the starting value −6;
- the MCMC chain has more difficulty moving from A2 into the higher-probability regions;
- the MCMC chain has difficulty moving between the different components and tends to get stuck in one component for a while (stickiness).

Consequences for the Variance of MCMC Estimates

$$\mathrm{Var}_{MCMC}[\hat g] = \mathrm{Var}_{MC}[\hat g] + \frac{1}{S^2}\sum_{s \neq t} E\Big[\big(g(\theta^{(s)}) - E[g(\theta) \mid Y]\big)\big(g(\theta^{(t)}) - E[g(\theta) \mid Y]\big)\Big]$$

The second term depends on the autocorrelation of samples within the Markov chain. The autocorrelation at lag t is the correlation between g(θ^(s)) and g(θ^(s+t)) (elements that are t time steps apart). It is often positive, so the MCMC variance is larger than the MC variance. High correlation is an indicator of poor mixing, so a larger sample size is needed to obtain a comparable variance. The effective sample size S_eff is defined through

$$\mathrm{Var}_{MCMC}[\hat g] = \frac{\mathrm{Var}[g(\theta) \mid Y]}{S_{\mathrm{eff}}}$$
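A rough sketch of the idea behind S_eff, using the sample autocorrelations of the µ chain from the Gibbs sampler above (coda's effectiveSize(), shown on a later slide, uses a more careful spectral estimate):

    rho   <- acf(mu, lag.max = 50, plot = FALSE)$acf[-1]  # drop lag 0
    S     <- length(mu)
    S.eff <- S / (1 + 2 * sum(rho[rho > 0]))  # keep only positive lags
    S.eff   # typically far below S for a sticky chain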

ACF Plots: Monte Carlo Sample

[Figure: autocorrelation function of the Monte Carlo samples, lags 0 to 30.]

ACF Plots: MCMC Sample

[Figure: autocorrelation function of the MCMC samples, lags 0 to 30.]

CODA

The CODA package provides many popular diagnostics for assessing convergence of MCMC output from WinBUGS (and other programs).

> library(coda)
> effectiveSize(theta.mcmc)
     var1      var2
 3.444007  4.390776

The precision of the MCMC estimate of the posterior mean based on 1000 samples is only as good as taking about 4 independent samples! Running for 1 million iterations still gives an effective sample size of only 1927.
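The object theta.mcmc above is presumably the raw MCMC output wrapped as a coda object; a sketch of that wrapping for the Gibbs draws from earlier (mu and delta are the illustrative names used above):

    library(coda)
    theta.mcmc <- mcmc(cbind(mu = mu, delta = delta))
    effectiveSize(theta.mcmc)   # ESS for each variable
    summary(theta.mcmc)         # posterior means with naive and time-series SEs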

Useful Diagnostics/Functions

Effective sample size: effectiveSize()
Autocorrelation: autocorr.plot()
Cross-variable correlations: crosscorr.plot()
Geweke: geweke.diag()
Gelman-Rubin: gelman.diag()
Heidelberg & Welch: heidel.diag()
Raftery-Lewis: raftery.diag()

Geweke

Geweke (1992) proposed a convergence diagnostic for Markov chains based on a test for equality of the means of the first and last parts of a Markov chain (by default the first 10% and the last 50%). If the samples are drawn from the stationary distribution of the chain, the two means are equal and Geweke's statistic has an asymptotically standard normal distribution.
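Applied to the coda object from above (the fractions shown are the defaults):

    geweke.diag(theta.mcmc, frac1 = 0.1, frac2 = 0.5)
    ## z-scores well outside +/- 2 suggest the early and late parts
    ## of the chain disagree, i.e. lack of convergence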

Gelman-Rubin

Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run from starting values that are overdispersed relative to the posterior distribution. Convergence is diagnosed when the chains have forgotten their initial values and the output from all chains is indistinguishable. The diagnostic is applied to a single variable from the chain. It is based on a comparison of within-chain and between-chain variances (similar to a classical analysis of variance) and assumes that the target distribution is normal (transformations may help). Values of $\hat R$ near 1 suggest convergence.
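A sketch with three chains started from overdispersed values; run.gibbs() is a small illustrative wrapper around the mixture Gibbs sampler from earlier, not part of coda:

    library(coda)
    run.gibbs <- function(mu0, S = 1000, w = c(0.45, 0.10, 0.45),
                          m = c(-3, 0, 3), s = sqrt(1/3)) {
      mu <- numeric(S); mu[1] <- mu0
      for (t in 2:S) {
        p     <- w * dnorm(mu[t - 1], mean = m, sd = s)
        d     <- sample(1:3, 1, prob = p / sum(p))
        mu[t] <- rnorm(1, mean = m[d], sd = s)
      }
      mu
    }
    chains <- mcmc.list(lapply(c(-6, 0, 6), function(x) mcmc(run.gibbs(x))))
    gelman.diag(chains)   # Rhat near 1 suggests convergence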

Heidelberg-Welch

The convergence test uses the Cramér-von Mises statistic to test the null hypothesis that the sampled values come from a stationary distribution. The test is applied successively, first to the whole chain, then after discarding the first 10%, 20%, ... of the chain, until either the null hypothesis is accepted or 50% of the chain has been discarded. The latter outcome constitutes failure of the stationarity test and indicates that a longer MCMC run is needed. If the stationarity test is passed, the number of iterations to keep and the number to discard (burn-in) are reported.
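A one-line sketch on the coda object from above:

    heidel.diag(theta.mcmc)   # reports pass/fail of the stationarity test
                              # and the suggested burn-in for each variable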

Raftery-Lewis

Calculates the number of iterations required to estimate the quantile q to within an accuracy of ±r with probability p. Separate calculations are performed for each variable within each chain. If the number of iterations in the data is too small, an error message is printed indicating the minimum length of pilot run. The minimum length is the required sample size for a chain with no correlation between consecutive samples. An estimate I (the "dependence factor") of the extent to which autocorrelation inflates the required sample size is also provided. Values of I larger than 5 indicate strong autocorrelation, which may be due to a poor choice of starting value, high posterior correlations, or stickiness of the MCMC algorithm. The number of burn-in iterations to be discarded at the beginning of the chain is also calculated.
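Applied to the chain from above with the coda defaults spelled out (coda's argument s plays the role of the probability p in the text); with only 1000 draws this prints the minimum pilot-run length rather than the full diagnostic, as described above:

    raftery.diag(theta.mcmc, q = 0.025, r = 0.005, s = 0.95)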

Summary

Diagnostics cannot guarantee that a chain has converged, but they can indicate that it has not. Possible remedies:
- Run longer and thin the output (see the sketch after this list)
- Reparametrize the model
- Block correlated variables together
- Integrate out variables
- Add auxiliary variables (the slice sampler, for example)
- Use Rao-Blackwellization in estimation
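A sketch of the first remedy, thinning the coda object from earlier (keeping every 10th draw trades raw sample size for lower autocorrelation):

    theta.thin <- window(theta.mcmc, thin = 10)
    effectiveSize(theta.thin)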