Bayesian model comparison with un-normalised likelihoods

Statistics and Computing manuscript
arXiv:1504.00298v3 [stat.CO] 20 Jan 2016

Richard G. Everitt · Adam M. Johansen · Ellen Rowing · Melina Evdemon-Hogan

Abstract

Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes factors (BFs) for these models remains challenging in general. This paper describes new random weight importance sampling and sequential Monte Carlo methods for estimating BFs that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates. An initial investigation into the theoretical and empirical properties of this class of methods is presented. Some support for the use of biased estimates is presented, but we advocate caution in the use of such estimates.

Keywords: approximate Bayesian computation · Bayes factors · importance sampling · marginal likelihood · Markov random field · partition function · sequential Monte Carlo

R. G. Everitt · E. Rowing · M. Evdemon-Hogan
Department of Mathematics and Statistics, University of Reading, UK. E-mail: r.g.everitt@reading.ac.uk

A. M. Johansen
Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK. E-mail: a.m.johansen@warwick.ac.uk

1 Introduction

There has been much recent interest in performing Bayesian inference in models where the posterior is intractable. In particular, we consider the situation in which the posterior distribution $\pi(\theta \mid y) \propto p(\theta) f(y \mid \theta)$ cannot be evaluated pointwise. This intractability typically occurs due to the intractability of the likelihood, i.e. $f(y \mid \theta)$ cannot be evaluated pointwise. Example scenarios include:

1. the use of big data sets, where $f(y \mid \theta)$ consists of a product of a large number of terms;
2. the existence of a large number of latent variables $x$, with $f(y \mid \theta)$ known only as a high dimensional integral $f(y \mid \theta) = \int_x f(y, x \mid \theta)\,dx$;
3. when $f(y \mid \theta) = \frac{1}{Z(\theta)} \gamma(y \mid \theta)$, with $Z(\theta)$ being an intractable normalising constant (INC) for the tractable term $\gamma(y \mid \theta)$ (e.g. when $f$ factorises as a Markov random field);
4. where it is possible to sample from $f(\cdot \mid \theta)$, but not to evaluate it, such as when the distribution of the data given $\theta$ is modelled by a complex stochastic computer model.

Each of these (overlapping) situations has been considered in some detail in previous work and each has inspired different methodologies. In this paper we focus on the third case, in which the likelihood has an INC. This is an important problem in its own right (Girolami et al (2013) refer to it as "one of the main challenges to methodology for computational statistics currently"). There exist several competing methodologies for inference in this setting (see Everitt (2012)). In particular, the exact approaches of Møller et al (2006) and Murray et al (2006) exploit the decomposition $f(y \mid \theta) = \frac{1}{Z(\theta)} \gamma(y \mid \theta)$, whereas simulation based methods such as approximate Bayesian computation (ABC) (Grelaud et al 2009) do not depend upon such a decomposition and can be applied more generally: to situation 1 in Picchini and Forman (2013); situations 2 and 3 (e.g. Everitt (2012)) and situation 4 (e.g. Wilkinson (2013)).

This paper considers the problem of Bayesian model comparison in the presence of an INC. We explore both exact and simulation-based methods, and find that elements of both approaches may also be more generally applicable. Specifically:

- For exact methods we find that approximations are required to allow practical implementation, and this leads us to investigate the use of approximate weights in importance sampling (IS) and sequential Monte Carlo (SMC). We examine the use of both exact-approximate approaches (as in Fearnhead et al (2010)) and also "inexact-approximate" methods, in which complete flexibility is allowed in the approximation of weights, at the cost of losing the exactness of the method. This work is a natural counterpart to Alquier et al (2015), which examines an analogous question (concerning the acceptance probability) for Markov chain Monte Carlo (MCMC) algorithms. These generally applicable methods, "noisy MCMC" (Alquier et al 2015) and "noisy SMC" (this paper), have some potential to address situations 1-3.
- We provide some comparison of these inexact approximations with simulation-based methods, including the "synthetic likelihood" (SL) of Wood (2010). In the applications considered here we find this to be a viable alternative to ABC. Our results are suggestive that this, and related methods, may find success in scenarios in which ABC is more usually applied.

In the remainder of this section we briefly outline the problem of, and methods for, parameter inference in the presence of an INC. We then detail the problem of Bayesian model comparison in this context, before discussing methods for addressing it in the following two sections.

1.1 Parameter inference

In this section we consider the problem of simulating from $\pi(\theta \mid y) \propto p(\theta)\gamma(y \mid \theta)/Z(\theta)$ using MCMC.
This problem has been well studied, and such models are termed doubly intractable because the acceptance probability in the Metropolis-Hastings (MH) algorithm

\[ \min\left\{1, \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)} \frac{p(\theta^*)}{p(\theta)} \frac{\gamma(y \mid \theta^*)}{\gamma(y \mid \theta)} \frac{Z(\theta)}{Z(\theta^*)}\right\} \tag{1} \]

cannot be evaluated due to the presence of the INC. We first review exact methods for simulating from such a target in sections 1.1.1-1.1.3, before looking at simulation-based methods in sections 1.1.4 and 1.1.5. The methods described here in the context of MCMC form the basis of the methods for evidence estimation developed in the remainder of the paper.

1.1.1 Single and multiple auxiliary variable methods

Møller et al (2006) avoid the evaluation of the INC by augmenting the target distribution with an extra variable $u$ that lies on the same space as $y$, and use an MH algorithm with target distribution $\pi(\theta, u \mid y) \propto q_u(u \mid \theta, y) f(y \mid \theta) p(\theta)$, where $q_u$ is some (normalised) arbitrary distribution. As the MH proposal in $(\theta, u)$-space they use $(\theta^*, u^*) \sim f(u^* \mid \theta^*) q(\theta^* \mid \theta)$, giving an acceptance probability of

\[ \min\left\{1, \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)} \frac{p(\theta^*)}{p(\theta)} \frac{\gamma(y \mid \theta^*)}{\gamma(y \mid \theta)} \frac{q_u(u^* \mid \theta^*, y)}{\gamma(u^* \mid \theta^*)} \frac{\gamma(u \mid \theta)}{q_u(u \mid \theta, y)}\right\}. \]

Note that, by viewing $q_u(u^* \mid \theta^*, y)/\gamma(u^* \mid \theta^*)$ as an unbiased IS estimator of $1/Z(\theta^*)$, this algorithm can be seen as an instance of the exact approximations described in Beaumont (2003) and Andrieu and Roberts (2009), where it is established that if an unbiased estimator of a target density is used appropriately in an MH algorithm, the $\theta$-marginal of the invariant distribution of this chain is the target distribution of interest. This automatically suggests extensions to the single auxiliary variable (SAV) method described above, where $M > 1$ importance points are used, yielding:

\[ \widehat{\frac{1}{Z(\theta)}} = \frac{1}{M} \sum_{m=1}^{M} \frac{q_u(u^{(m)} \mid \theta, y)}{\gamma(u^{(m)} \mid \theta)}. \tag{2} \]

Andrieu and Vihola (2012) show that the reduced variance of this estimator leads to a reduced asymptotic variance of estimators from the resultant Markov chain. The variance of the IS estimator is strongly dependent on an appropriate choice of IS target $q_u(\cdot \mid \theta, y)$, which should have lighter tails than $f(\cdot \mid \theta)$.
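As an illustration of estimator (2), the following minimal Python sketch applies it to a single Poisson observation, for which $\gamma(u \mid \theta) = \theta^u/u!$ and $Z(\theta) = \exp(\theta)$ is known and can serve as a check. The Poisson setting, the function names and the choice of a Poisson auxiliary density with a fixed parameter `lam_hat` are assumptions made purely for illustration and are not taken from the paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def log_gamma_unnorm(u, lam):
    """log of the tractable term gamma(u | lam) = lam**u / u!, so that Z(lam) = exp(lam)."""
    return u * math.log(lam) - math.lgamma(u + 1)


def log_q_u(u, lam_hat):
    """log of the normalised auxiliary density q_u, here a Poisson(lam_hat) pmf (assumed)."""
    return log_gamma_unnorm(u, lam_hat) - lam_hat


def sav_inverse_z(lam, lam_hat, n_points, rng):
    """Estimator (2): average of q_u(u) / gamma(u | lam) over draws u ~ f(. | lam)."""
    total = 0.0
    for _ in range(n_points):
        u = sample_poisson(lam, rng)
        total += math.exp(log_q_u(u, lam_hat) - log_gamma_unnorm(u, lam))
    return total / n_points
```

In this toy setting the estimate can be compared with `math.exp(-lam)`; when `lam_hat` equals `lam` every term equals $1/Z$ exactly, and the spread grows as `lam_hat` moves away from `lam`, which is one way to see why the choice of $q_u$ matters.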
Møller et al (2006) suggest that a reasonable choice may be $q_u(\cdot \mid \theta, y) = f(\cdot \mid \hat{\theta})$, where $\hat{\theta}$ is the maximum likelihood estimator of $\theta$. However, in practice $q_u(\cdot \mid \theta, y)$ can be difficult to choose well, particularly when $y$ lies on a high dimensional space. Motivated by this, annealed importance sampling (AIS) (Neal 2001) can be used as an alternative to IS, leading to the multiple auxiliary variable (MAV) method of Murray et al (2006). AIS makes use of a sequence of $K$ targets, which in Murray et al (2006) are chosen to be

\[ f_k(\cdot \mid \theta, \hat{\theta}, y) \propto \gamma_k(\cdot \mid \theta, \hat{\theta}, y) = \gamma(\cdot \mid \theta)^{(K+1-k)/(K+1)}\, q_u(\cdot \mid \theta, y)^{k/(K+1)} \tag{3} \]

between $f(\cdot \mid \theta)$ and $q_u(\cdot \mid \theta, y)$. After the initial draw $u_{K+1} \sim f(\cdot \mid \theta)$, the auxiliary point is taken through a sequence of $K$ MCMC moves which successively have target $f_k(\cdot \mid \theta, \hat{\theta}, y)$ for $k = K : 1$. The resultant IS estimator is given by

\[ \widehat{\frac{1}{Z(\theta)}} = \frac{1}{M} \sum_{m=1}^{M} \prod_{k=1}^{K} \frac{\gamma_k(u_{k-1}^{(m)} \mid \theta, \hat{\theta}, y)}{\gamma_{k-1}(u_{k-1}^{(m)} \mid \theta, \hat{\theta}, y)}. \tag{4} \]

This estimator has a lower variance (although at a higher computational cost) than the corresponding IS estimator. We note that AIS can be viewed as a particular case of SMC without resampling and one might expect to obtain additional improvements at negligible cost by incorporating resampling steps within such algorithms (see Zhou et al (2015) for an illustration of the potential improvement and some discussion); we do not pursue this here as it is not the focus of this work.

1.1.2 Exchange algorithms

An alternative approach to avoiding the ratio of INCs in equation (1) is given by Murray et al (2006), in which it is suggested to use the acceptance probability

\[ \min\left\{1, \frac{q(\theta \mid \theta^*)}{q(\theta^* \mid \theta)} \frac{p(\theta^*)}{p(\theta)} \frac{\gamma(y \mid \theta^*)}{\gamma(y \mid \theta)} \frac{\gamma(u \mid \theta)}{\gamma(u \mid \theta^*)}\right\}, \]

where $u \sim f(\cdot \mid \theta^*)$, motivated by the intuitive idea that $\gamma(u \mid \theta)/\gamma(u \mid \theta^*)$ is a single point IS estimator of $Z(\theta)/Z(\theta^*)$. This method is shown to have the correct invariant distribution, as is the extension in which AIS is used in place of IS. A potential extension might seem to be using multiple importance points $\{u^{(m)}\}_{m=1}^{M} \sim f(\cdot \mid \theta^*)$ to obtain an estimator of $Z(\theta)/Z(\theta^*)$ that has a smaller variance, with the aim of improving the statistical efficiency of estimators based on the resultant Markov chain. This scheme is shown to work well empirically in Alquier et al (2015). However, this chain does not have the desired target as its invariant distribution. Instead it can be seen as part of a wider class of algorithms that use a noisy estimate of the acceptance probability: noisy Monte Carlo algorithms (also referred to as "inexact approximations" in Girolami et al (2013)).
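The single-point exchange acceptance probability above can be sketched in a few lines of Python. The example below is a hypothetical illustration rather than the authors' implementation: it assumes i.i.d. Poisson data with $\gamma(y \mid \lambda) = \prod_i \lambda^{y_i}/y_i!$, an Exp(1) prior and a symmetric Gaussian random-walk proposal, so that $Z(\lambda)$ is never evaluated and the result can be checked against the known Gamma posterior.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def exchange_sampler(y, n_iter, step, rng):
    """Exchange algorithm for lam with prior Exp(1); the INC never appears."""
    n, s_y = len(y), sum(y)
    lam = max(s_y / n, 0.1)  # arbitrary positive starting value (assumption)
    chain = []
    for _ in range(n_iter):
        prop = lam + rng.gauss(0.0, step)  # symmetric proposal, so the q-ratio is 1
        if prop > 0.0:  # the prior density is zero for prop <= 0, so such moves are rejected
            # Auxiliary data set u ~ f(. | prop); only its sum enters the gamma ratio.
            s_u = sum(sample_poisson(prop, rng) for _ in range(n))
            log_step = math.log(prop) - math.log(lam)
            log_ratio = (-(prop - lam)       # p(prop) / p(lam)
                         + s_y * log_step    # gamma(y | prop) / gamma(y | lam)
                         - s_u * log_step)   # gamma(u | lam) / gamma(u | prop)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                lam = prop
        chain.append(lam)
    return chain
```

Because the factorial terms in $\gamma$ cancel in both ratios, only the sums of $y$ and $u$ are needed here. Under these assumptions the chain targets a Gamma posterior with shape $\sum_i y_i + 1$ and rate $n + 1$, which gives a simple sanity check on the output.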
Alquier et al (2015) shows that under uniform ergodicity of the ideal chain, a bound on the expected difference between the noisy and true acceptance probabilities can lead to bounds on the distance between the desired target distribution and the iterated noisy kernel. It also describes additional noisy MCMC algorithms for approximately simulating from the posterior, based on Langevin dynamics.

1.1.3 Russian Roulette and other approaches

Girolami et al (2013) use series-based approximations to intractable target distributions within the exact-approximation framework, where "Russian Roulette" methods from the physics literature are used to ensure the unbiasedness of truncations of infinite sums. These methods do not require exact simulation from $f(\cdot \mid \theta^*)$, as do the SAV and exchange approaches described in the previous two sections. However, SAV and exchange are often implemented in practice by generating the auxiliary variables by taking the final point of a long "internal" MCMC run in place of exact simulation (e.g. Caimo and Friel (2011)). For finite runs of the internal MCMC, this approach will not have exactly the desired invariant distribution, but Everitt (2012) shows that under regularity conditions the bias introduced by this approximation tends to zero as the run length of the internal MCMC increases: the same proof holds for the use of an MCMC chain for the simulation within an ABC-MCMC (i.e. MCMC applied to an ABC approximation of the posterior, Marjoram et al (2003)) or SL-MCMC (i.e. MCMC applied to an SL approximation) algorithm, as described in sections 1.1.4 and 1.1.5. Although the approach of Girolami et al (2013) is exact, as they note it is significantly more computationally expensive than this approximate approach. For this reason, we do not pursue Russian Roulette approaches further in this paper.

When a rejection sampler is available for simulating from $f(\cdot \mid \theta^*)$, Rao et al (2013) introduce an alternative exact algorithm that has some favourable properties compared to the exchange algorithm. Since a rejection sampler is not available in many cases, we do not pursue this approach further.
1.1.4 Approximate Bayesian computation

Approximate Bayesian Computation (Tavaré et al 1997) refers to methods that aim to approximate an intractable likelihood $f(y \mid \theta)$ through the integral

\[ f(S(y) \mid \theta) \approx \int \pi_\epsilon(S(y) \mid S(u))\, f(u \mid \theta)\, du \tag{5} \]

where $S(\cdot)$ gives a vector of summary statistics and $\pi_\epsilon(S(y) \mid S(u))$ is the density of a symmetric kernel with bandwidth $\epsilon$, centered at $S(u)$ and evaluated at $S(y)$. As $\epsilon \to 0$, this distribution becomes more concentrated, so that in the case where $S(\cdot)$ gives sufficient statistics for estimating $\theta$, as $\epsilon \to 0$ the approximate posterior becomes closer to the true posterior. This approximation is used within standard Monte Carlo methods for simulating from the posterior. For example, it may be used within an MCMC algorithm, where using an exact-approximation argument it can be seen that it is sufficient in the calculation of the acceptance probability to use the Monte Carlo approximation

\[ \hat{f}_\epsilon(S(y) \mid \theta^*) = \frac{1}{M} \sum_{m=1}^{M} \pi_\epsilon\left(S(y) \mid S(u^{(m)})\right) \tag{6} \]

for the likelihood at $\theta^*$ at each iteration, where $\{u^{(m)}\}_{m=1}^{M} \sim f(\cdot \mid \theta^*)$. Whilst the exact-approximation argument means that there is no additional bias due to this Monte Carlo approximation, the approximation introduced through using a tolerance $\epsilon > 0$ or insufficient summary statistics may be large. For this reason it might be considered a last resort to use ABC on likelihoods with an INC, but previous success on these models (e.g. Grelaud et al (2009) and Everitt (2012)) leads us to consider them further in this paper.

1.1.5 Synthetic likelihood

ABC is essentially using, based on simulations from $f$, a nonparametric estimator of $f_S(S \mid \theta)$, the distribution of the summary statistics of the data given $\theta$. In some situations, a parametric model might be more appropriate. For example, if the statistic is the sum of independent random variables, a Central Limit Theorem (CLT) might imply that it would be appropriate to assume that $f_S(S \mid \theta)$ is a multivariate Gaussian. The SL approach (Wood 2010) proceeds by making exactly this Gaussian assumption and uses this approximate likelihood within an MCMC algorithm. The parameters (the mean and variance) of this approximating distribution for a given $\theta$ are estimated based on the summary statistics of simulations $\{u^{(m)}\}_{m=1}^{M} \sim f(\cdot \mid \theta)$. Concretely, the estimate of the likelihood is

\[ \hat{f}_{SL}(S(y) \mid \theta) = \mathcal{N}\left(S(y); \hat{\mu}_\theta, \hat{\Sigma}_\theta\right), \]

where

\[ \hat{\mu}_\theta = \frac{1}{M} \sum_{m=1}^{M} S\left(u^{(m)}\right), \qquad \hat{\Sigma}_\theta = \frac{s s^T}{M - 1}, \tag{7} \]

with $s = \left(S(u_1) - \hat{\mu}_\theta, \ldots, S(u_M) - \hat{\mu}_\theta\right)$. Wood (2010) applies this method in a setting where the summary statistics are regression coefficients, motivated by their distribution being approximately normal.
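The estimate in (7) is straightforward to compute once the simulated summaries are available. The sketch below is a simplified illustration that assumes a scalar summary statistic, so that $\hat{\Sigma}_\theta$ reduces to a sample variance; the function name is hypothetical, and the multivariate case would need a covariance matrix and a multivariate normal density instead.

```python
import math
import statistics


def synthetic_log_likelihood(s_obs, simulated_summaries):
    """Log of the Gaussian synthetic likelihood for a scalar summary statistic.

    s_obs               -- S(y), the observed summary
    simulated_summaries -- S(u^(1)), ..., S(u^(M)) from simulations u^(m) ~ f(. | theta)
    """
    mu_hat = statistics.mean(simulated_summaries)
    # statistics.variance divides by M - 1, matching the estimator in (7).
    var_hat = statistics.variance(simulated_summaries)
    return -0.5 * math.log(2.0 * math.pi * var_hat) - 0.5 * (s_obs - mu_hat) ** 2 / var_hat
```

Working on the log scale is a design choice made here for numerical stability; it is not something the text prescribes.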
One of the approximations inherent in this method, as in ABC, is the use of summary statistics rather than the whole dataset. However, unlike ABC, there is no need to choose a bandwidth $\epsilon$: this approximation is replaced with that arising from the discrepancy between the normal approximation and the exact distribution of the chosen summary statistic. The SL method remains approximate even if the summary statistic distribution is Gaussian, as $\hat{f}_{SL}$ is not an unbiased estimate of the density and so the exact-approximation results do not apply. Rather, this is a special case of noisy MCMC, and we do not expect the additional bias introduced by estimating the parameters of $\hat{f}_{SL}$ to have large effects on the results, even if the parameters are estimated via an internal MCMC chain targeting $f(\cdot \mid \theta)$ as described in section 1.1.3.

SL is related to a number of other simulation based algorithms under the umbrella of Bayesian indirect inference (Drovandi et al 2015). This suggests a number of extensions to some of the methods presented in this paper that we do not explore here.

1.2 Bayesian model comparison

The main focus of this paper is estimating the marginal likelihood (also termed the evidence)

\[ p(y) = \int p(\theta) f(y \mid \theta)\, d\theta \]

and Bayes factors: ratios of evidences for different models ($M_1$ and $M_2$, say), $BF_{12} = p(y \mid M_1)/p(y \mid M_2)$. These quantities cannot usually be estimated reliably from MCMC output, and commonly used methods for estimating them require $f(y \mid \theta)$ to be tractable in $\theta$. This leads Friel (2013) to label their estimation as "triply intractable" when $f$ has an INC. To our knowledge the only published approach to estimating the evidence for such models is in Friel (2013), with that paper also giving one of the only approaches to estimating BFs in this setting. For estimating BFs, ABC provides a viable alternative (Grelaud et al 2009), at least for models within the exponential family.

Friel (2013) starts from Chib's approximation,

\[ \hat{p}(y) = \frac{f(y \mid \tilde{\theta})\, p(\tilde{\theta})}{\hat{\pi}(\tilde{\theta} \mid y)}, \tag{8} \]

where $\tilde{\theta}$ can be an arbitrary value of $\theta$ and $\hat{\pi}$ is an approximation to the posterior distribution. Such an approximation is intractable when $f$ has an INC.
Friel (2013) devises a "population" version of the exchange algorithm that simulates points $\theta^{(p)}$ from the posterior distribution, and which also gives an estimate $\hat{Z}(\theta^{(p)})$ of the INC at each of these points. The points $\theta^{(p)}$ can be used to find a kernel density approximation $\hat{\pi}$, and estimates $\hat{Z}(\theta^{(p)})$ of the INC. These are then used in a number of evaluations of (8) at points (generated by the population exchange algorithm) in a region of high posterior density, which are then averaged to find an estimate of the evidence. This method has a number of useful properties (including that it may be a more efficient approach for parameter inference than the standard exchange algorithm), but for evidence estimation it suffers the limitation of using a kernel density estimate which means that, as noted in the paper, its use is limited to low-dimensional parameter spaces.

In this paper we explore the alternative approach of methods based on IS, making use of the likelihood approximations described earlier in this section. These IS methods are outlined in section 2. In section 2 we note the good empirical performance of an inexact-approximate method and examine such approaches in more detail. As IS is itself not readily applicable to high dimensional parameter spaces, in section 3 we look at natural extensions to the IS methods based on SMC. Particular care is required when considering approximations within iterative algorithms: we provide a preliminary study of approximation in this context, demonstrating theoretically that the resulting error can be controlled uniformly in time, under very favorable assumptions. This, and the associated empirical study, are intended to provide motivation and proof of concept; caution is still required if approximation is used within such methods in practice, but the results presented suggest that further investigation is warranted. The algorithms presented later in the paper are viable alternatives to the MCMC approaches to parameter estimation described in this section, and may outperform the corresponding MCMC approach in some cases. In particular they all automatically make use of a population of points, an idea previously explored in the MCMC context by Caimo and Friel (2011) and Friel (2013). In section 4 we draw conclusions.
2 Importance sampling approaches

In this section we investigate the use of IS for estimating the evidence and BFs for models with INCs. We consider an "ideal" importance sampler that simulates $P$ points $\{\theta^{(p)}\}_{p=1}^{P}$ from a proposal $q(\cdot)$ and calculates their weight, in the presence of an INC, using

\[ w^{(p)} = \frac{p(\theta^{(p)})\, \gamma(y \mid \theta^{(p)})}{q(\theta^{(p)})\, Z(\theta^{(p)})}, \tag{9} \]

with an estimate of the evidence given by

\[ \hat{p}(y) = \frac{1}{P} \sum_{p=1}^{P} w^{(p)}. \tag{10} \]

To estimate a BF we simply take the ratio of estimates of the evidence for the two models under consideration. However, the presence of the INC in the weight expression in (9) means that importance samplers cannot be directly implemented for these models. To circumvent this problem we will investigate the use of the techniques described in section 1.1 in importance sampling. We begin by looking at exact-approximation based methods in section 2.1. We then examine the use of approximate likelihoods based on simulation, including ABC and SL, in section 2.2, before looking at the performance of all of these methods on a toy example in section 2.3. Finally, in sections 2.4 and 2.6 we examine applications to exponential random graph models (ERGMs) and Ising models, the latter of which leads us to consider the use of inexact-approximations in IS (first introduced in section 2.5).

2.1 Auxiliary variable IS

To avoid the evaluation of the INC in (9), we propose the use of the auxiliary variable method used in the MCMC context in section 1.1.1. Specifically, consider IS using the SAV target

\[ p(\theta, u \mid y) \propto q_u(u \mid \theta, y)\, f(y \mid \theta)\, p(\theta), \]

noting that it has the same normalizing constant as $p(\theta \mid y) \propto f(y \mid \theta) p(\theta)$, with proposal $q(\theta, u) = f(u \mid \theta) q(\theta)$. This results in weights

\[ w^{(p)} = \frac{q_u(u \mid \theta^{(p)}, y)\, \gamma(y \mid \theta^{(p)})\, p(\theta^{(p)}) / Z(\theta^{(p)})}{\gamma(u \mid \theta^{(p)})\, q(\theta^{(p)}) / Z(\theta^{(p)})} = \frac{\gamma(y \mid \theta^{(p)})\, p(\theta^{(p)})}{q(\theta^{(p)})} \frac{q_u(u \mid \theta^{(p)}, y)}{\gamma(u \mid \theta^{(p)})}, \]

and the estimate (10) of the evidence. In this method, which we will refer to as single auxiliary variable IS (SAVIS), we may view $q_u(u \mid \theta^{(p)}, y)/\gamma(u \mid \theta^{(p)})$ as an unbiased importance sampling (IS) estimator of $1/Z(\theta^{(p)})$.
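For intuition about (9) and (10), the "ideal" sampler can be run on a model in which $Z(\theta)$ happens to be known. The Python sketch below does this for i.i.d. Poisson data with an Exp(1) prior, taking the proposal $q$ equal to the prior so that $p/q$ cancels in the weight. The model, the choice of proposal and the function names are assumptions made for illustration; in the models the paper targets, the `log_z` term is exactly what is unavailable and would have to be replaced by an estimate such as the SAVIS factor above.

```python
import math
import random


def log_gamma_y(y, lam):
    """log gamma(y | lam) = sum_i [ y_i * log(lam) - log(y_i!) ] for the Poisson model."""
    return sum(v * math.log(lam) - math.lgamma(v + 1) for v in y)


def log_z(y, lam):
    """log Z(lam) = n * lam: known in this toy model, intractable in general."""
    return len(y) * lam


def ideal_is_evidence(y, n_points, rng):
    """Estimate (10) using weights (9) with proposal q = prior = Exp(1)."""
    total = 0.0
    for _ in range(n_points):
        lam = rng.expovariate(1.0)
        if lam > 0.0:  # guard against log(0) in the (measure-zero) event lam == 0
            total += math.exp(log_gamma_y(y, lam) - log_z(y, lam))
    return total / n_points
```

For this conjugate toy model the evidence has the closed form $\Gamma(S+1) / \left((n+1)^{S+1} \prod_i y_i!\right)$ with $S = \sum_i y_i$, so the Monte Carlo estimate can be checked directly.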
Although we are using an unbiased estimator of the weights in place of the ideal weights, the result is still an exact importance sampler. SAVIS is an exact-approximate IS method, as seen previously in Fearnhead et al (2010), Chopin et al (2013) and Tran et al (2013). As in the MCMC setting, to ensure the variance of estimators produced by this scheme is not large we must ensure the variance of the estimator of $1/Z(\theta^{(p)})$ is small. Thus in practice we found extensions to this basic algorithm were useful: using multiple $u$ importance points for each proposed $\theta^{(p)}$ as in (2); and using AIS, rather than simple IS, for estimating $1/Z(\theta^{(p)})$ as in (4) (giving an algorithm that we refer to as multiple auxiliary variable IS (MAVIS), in common with the terminology in Murray et al (2006)). Using $q_u(\cdot \mid \theta, y) = f(\cdot \mid \hat{\theta})$, as described in section 1.1.1, and $\gamma_k$ as in (3), we obtain

\[ \widehat{\frac{1}{Z(\theta)}} = \frac{1}{Z(\hat{\theta})} \frac{1}{M} \sum_{m=1}^{M} \prod_{k=1}^{K} \frac{\gamma_k(u_{k-1}^{(m)} \mid \theta, \hat{\theta}, y)}{\gamma_{k-1}(u_{k-1}^{(m)} \mid \theta, \hat{\theta}, y)}. \tag{11} \]

In this case the (A)IS methods are being used as unbiased estimators of the ratio $Z(\hat{\theta})/Z(\theta)$ and again SMC could be used in their place.

2.2 Simulation based methods

Didelot et al (2011) investigate the use of the ABC approximation when using IS for estimating marginal likelihoods. In this case the weight equation becomes

\[ w^{(p)} = \frac{p(\theta^{(p)})\, \frac{1}{R} \sum_{r=1}^{R} \pi_\epsilon\left(S(y) \mid S(x_r^{(p)})\right)}{q(\theta^{(p)})}, \]

where $\{x_r^{(p)}\}_{r=1}^{R} \sim f(\cdot \mid \theta^{(p)})$, and using the notation from section 1.1.4. However, using these weights within (10) gives an estimate for $p(S(y))$ rather than, as desired, an estimate of the evidence $p(y)$. Fortunately, there are cases in which ABC may be used to estimate BFs. Didelot et al (2011) establish that, for the BF for two exponential family models: if $S_1(y)$ is sufficient for the parameters in model 1 and $S_2(y)$ is sufficient for the parameters in model 2, then using $S(y) = (S_1(y), S_2(y))$ gives

\[ \frac{p(y \mid M_1)}{p(y \mid M_2)} = \frac{p(S(y) \mid M_1)}{p(S(y) \mid M_2)}. \]

Outside the exponential family, making an appropriate choice of summary statistics is harder (Robert et al 2011; Prangle et al 2014; Marin et al 2014). Just as in the parameter estimation case, the use of a tolerance $\epsilon > 0$ results in estimating an approximation to the true BF.

An alternative approximation, not previously used in model comparison, is to use SL (as described in section 1.1.5). In this case the weight equation becomes

\[ w^{(p)} = \frac{p(\theta^{(p)})\, \mathcal{N}\left(S(y); \hat{\mu}_{\theta^{(p)}}, \hat{\Sigma}_{\theta^{(p)}}\right)}{q(\theta^{(p)})}, \]

where $\hat{\mu}_\theta, \hat{\Sigma}_\theta$ are given by (7). As in parameter estimation, this approximation is only appropriate if the normality assumption is reasonable. The choice of summary statistics is as difficult as in the ABC case.
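The ABC importance weight above can be sketched as follows for a scalar summary statistic and a uniform kernel, for which $\pi_\epsilon(s \mid s') = \mathbb{1}\{|s - s'| \le \epsilon\}/(2\epsilon)$. The function signature, including the hypothetical callables `simulate_summary`, `prior_pdf` and `proposal_pdf`, is an assumption made for illustration rather than an interface from the paper.

```python
import random


def abc_is_weight(theta, s_obs, simulate_summary, prior_pdf, proposal_pdf, eps, n_sims, rng):
    """ABC importance weight for one proposed theta, with a uniform kernel of half-width eps.

    simulate_summary(theta, rng) must return S(x) for one simulation x ~ f(. | theta).
    """
    hits = 0
    for _ in range(n_sims):
        if abs(simulate_summary(theta, rng) - s_obs) <= eps:
            hits += 1
    # Monte Carlo average of the kernel density over the n_sims simulations.
    kernel_mean = hits / (n_sims * 2.0 * eps)
    return prior_pdf(theta) * kernel_mean / proposal_pdf(theta)
```

With a uniform kernel, a proposed point receives zero weight whenever none of its simulations lands within `eps` of the observed summary, so small tolerances can leave only a handful of points carrying all of the weight.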
2.3 Toy example

In this section we have discussed three alternative methods for estimating BFs: MAVIS, ABC and SL. To further understand their properties we now investigate the performance of each method on a toy example. Consider i.i.d. observations $y = \{y_i\}_{i=1}^{n}$, with $n = 100$, of a discrete random variable that takes values in $\mathbb{N}$. For such a dataset, we will find the BF for the models

1. $y \mid \theta \sim \text{Poisson}(\theta)$, $\theta = \lambda \sim \text{Exp}(1)$,
   \[ f_1(y \mid \theta) = \prod_{i=1}^{n} \frac{\lambda^{y_i}}{y_i!} \Big/ \exp(n\lambda) \]
2. $y \mid \theta \sim \text{Geometric}(\theta)$, $\theta = p \sim \text{Unif}(0, 1)$,
   \[ f_2(y \mid \theta) = \prod_{i=1}^{n} (1 - p)^{y_i} \Big/ p^{-n}. \]

In both cases we have rewritten the likelihoods $f_1$ and $f_2$ in the form $\gamma(y \mid \theta)/Z(\theta)$ in order to use MAVIS. Due to the use of conjugate priors the BF for these two models can be found analytically. As in Didelot et al (2011) we simulated (using an approximate rejection sampling scheme) 1000 datasets for which the values of $\frac{p(y \mid M_1)}{p(y \mid M_1) + p(y \mid M_2)}$ roughly uniformly cover the interval [0.01, 0.99], to ensure that testing is performed in a wide range of scenarios. For each algorithm we used the same computational effort, in terms of the number of simulations (100,000) from the likelihood (such simulations dominate the computational cost of all of the algorithms considered).

Our results are shown in figure 1, with the algorithm-specific parameters being given in figure 1a. We note that we achieved better results for MAVIS when: devoting more computational effort to the estimation of $1/Z(\theta)$ (thus we used only 100 importance points in $\theta$-space, compared to 1000 for the other algorithms); and using more intermediate bridging distributions in the AIS, rather than multiple importance points (thus, in equation (11) we used $K = 1000$ and $M = 1$). In the ABC case we found that reducing $\epsilon$ much further than 0.1 resulted in many importance points with zero weight (note that here, and throughout the paper, we use the uniform kernel for $\pi_\epsilon$). From the box plots in figure 1a, we might infer that overall SL has outperformed the other methods, but be concerned about the number of outliers. Figures 1b to 1d shed more light on the situations in which each algorithm performs well.
In figure 1b we observe that the non-zero $\epsilon$ results in a bias in the BF estimates (represented by the shallower slope in the estimated BFs compared to the true values). In this example we conclude that ABC has worked quite well, since the bias is only pronounced in situations where the true BF favours one model strongly over the other, and this conclusion would not be affected by the bias. For this reason it might be more relevant in this example to consider the deviations from the shallow slope, which are likely due to the Monte Carlo variance in the estimator (which becomes more pronounced as $\epsilon$ is reduced). We see that the choice of $\epsilon$ essentially governs a bias-variance trade-off, and that the difficulty in using the approach more generally is that it is not easy to evaluate whether a choice of $\epsilon$ that ensures a low variance also ensures that the bias is not significant in terms of affecting the conclusions that might be drawn from the estimated BF (see section 2.4). Figure 1c suggests that SL has worked extremely well (in terms of having a low variance) for the most important situations, where the BF is close to 1. However, we note the large biases introduced by the limitation of the Gaussian assumption when the BF is far from 1. Figure 1d indicates that there is little or no bias when using MAVIS, but that there is appreciable variance (due to using IS on the relatively high-dimensional $u$-space).

Fig. 1: Bayes factors for the Poisson and geometric models. (a) A box plot of the log of the estimated BF divided by the true BF. (b) The log of the BF estimated by ABC-IS against the log of the true BF. (c) The log of the BF estimated by SL-IS against the log of the true BF. (d) The log of the BF estimated by MAVIS against the log of the true BF.

These results highlight that the three methods will be most effective in slightly different situations. The approximations in ABC and SL introduce a bias, the effect of which might be difficult to assess. In ABC (assuming sufficient statistics) this bias can be reduced by an increased computational effort allowing a smaller $\epsilon$; however, it is essentially impossible to assess when this bias is small enough. SL is the simplest method to implement, and seems to work well in a wide variety of situations, but the advice in Wood (2010) should be followed in checking that the assumption of normality is appropriate.
MAVIS is limied by he need o perform imporance sampling on he high-dimensional (θ, u) space bu consequenly avoids specifying summary saisics, is bias is small, and his mehod is able o esimae he evidence of individual models.

Table 1: Model comparison results for Gamaneg data. Note that the ABC (ɛ = 0.05) estimate was based upon just 5 sample points of non-zero weight. MAVIS also provides estimates of the individual evidence ($\log[\hat{p}(y \mid M_1)] = -69.6$, $\log[\hat{p}(y \mid M_2)] = -73.3$).

                             ABC (ɛ = 0.1)   ABC (ɛ = 0.05)   SL    MAVIS
  p̂(y | M_1) / p̂(y | M_2)    4               20               40    41

2.4 Application to social networks

In this section we use our methods to compare the evidence for two alternative ERGMs for the Gamaneg data previously analysed in Friel (2013) (who illustrate the data in their figure 3). An ERGM has the general form

$$f(y \mid \theta) = \frac{1}{Z(\theta)} \exp\left( \theta^T S(y) \right),$$

where $S(y)$ is a vector of statistics of a network $y$ and θ is a parameter vector of the same length. We take $S(y)$ = (# of edges) in model 1 and $S(y)$ = (# of edges, # of two stars) in model 2. As in Friel (2013) we use the prior $p(\theta) = \mathcal{N}(\theta; 0, 25I)$. Using a computational budget of $10^5$ simulations from the likelihood (each simulation consisting of an internal MCMC run of length 1000 as a proxy for an exact sampler, as described in section 1.1.3), Friel (2013) finds that the evidence for model 1 is 37 times that for model 2. Using the same computational budget for our methods, consisting of 1000 importance points (with 100 simulations from the likelihood for each point), we obtained the results shown in Table 1.

This example highlights the issue with the bias-variance trade-off in ABC, with ɛ = 0.1 having too large a bias and ɛ = 0.05 having too large a variance. SL performs well in this particular case: the Gaussian assumption appears to be appropriate. One might expect this, since the statistics are sums of random variables. However, we note that this is not usually the case for ERGMs, particularly when modelling large networks, and that SL is a much more appropriate method for inference in the ERGMs with local dependence (Schweinberger and Handcock 2015). A more sophisticated ABC approach might exhibit improved performance, possibly outperforming SL.
However, the appeal of SL is in its simplicity, and we find it to be a useful method for obtaining good results with minimal tuning.

2.5 IS with biased weights

The implementation of MAVIS in the previous section is not an exact-approximate method for two reasons:

1. An internal MCMC chain was used in place of an exact sampler;
2. The $1/Z(\hat{\theta})$ term in (11) was estimated before running this algorithm, with this fixed estimate being used throughout. The estimate was obtained by using a standard SMC method, with the initial distribution being the Bernoulli random graph (which can be simulated from exactly) and the final distribution being $\gamma(\cdot \mid \hat{\theta})$, to estimate $Z(\hat{\theta})$ (the normalising constant of γ), and then taking the reciprocal.

However, in practice, we tend to find that such "inexact-approximations" do not introduce large errors into Bayes factor estimates, particularly when compared to standard implementations of ABC (as seen in the previous section). This example suggests that in practice it may sometimes be advantageous to use biased rather than unbiased estimates of importance weights within a random weight IS algorithm: an observation that is somewhat analogous to that made in Alquier et al (2015) in the context of MCMC. This section provides an initial theoretical exploration as to whether this might be a useful strategy in IS.

In order to analyse the behaviour of importance sampling with biased weights, we consider biased estimates of the weights in equation (10). Let

$$w(\theta) := \frac{p(\theta)\gamma(y \mid \theta)}{Z(\theta)q(\theta)}.$$

We consider biased randomised weights that admit an additive decomposition,

$$\grave{w}(\theta) := w(\theta) + b(\theta) + \grave{V}_\theta,$$

in which $b(\theta) = \mathbb{E}[\grave{w}(\theta) \mid \theta] - w(\theta)$ is a deterministic function describing the bias of the weights and $\grave{V}_\theta$ is a random variable (more precisely, there is an independent copy of such a random variable associated with every particle), which conditional upon θ is of mean zero and variance $\grave{\sigma}_\theta^2 = \mathrm{Var}(\grave{w}(\theta) \mid \theta)$. This decomposition will not generally be available in practice, but is flexible enough to allow the formal description of many settings of interest.
For instance, one might consider the algorithms presented here by setting $b(\theta)$ to the (conditional) expected value of the difference between the approximate and exact weights and $\grave{V}_\theta$ to the difference between the approximate weights and their expected value. We have immediately that the bias of such an estimate is, using a subscript of $q$ to denote expectations and variances with respect to $q(\theta)$, $\mathbb{E}_q[b(\theta)]$. By a simple application of the law of total variance, its variance is

$$\frac{1}{P} \mathrm{Var}_q(\grave{w}(\theta)) = \frac{1}{P} \left\{ \mathrm{Var}_q[w(\theta) + b(\theta)] + \mathbb{E}_q[\grave{\sigma}_\theta^2] \right\}.$$
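The variance expression above follows by conditioning on θ. As a short worked step, using only the stated assumptions (that $\grave{V}_\theta$ has conditional mean zero and conditional variance $\grave{\sigma}_\theta^2$, and that the $P$ particles are drawn independently from $q$), the law of total variance gives:

```latex
\begin{align*}
\mathrm{Var}_q(\grave{w}(\theta))
  &= \mathrm{Var}_q\!\left( \mathbb{E}[\grave{w}(\theta) \mid \theta] \right)
   + \mathbb{E}_q\!\left[ \mathrm{Var}(\grave{w}(\theta) \mid \theta) \right] \\
  &= \mathrm{Var}_q[w(\theta) + b(\theta)] + \mathbb{E}_q[\grave{\sigma}_\theta^2],
\end{align*}
```

since $\mathbb{E}[\grave{w}(\theta) \mid \theta] = w(\theta) + b(\theta)$. Averaging $P$ independent copies divides this quantity by $P$.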

Consequently, the mean squared error of this estimate is:

$$\frac{1}{P} \left\{ \mathrm{Var}_q[w(\theta) + b(\theta)] + \mathbb{E}_q[\grave{\sigma}_\theta^2] \right\} + \mathbb{E}_q[b(\theta)]^2.$$

If we compare such a biased estimator with a second estimator in which we use the same proposal distribution but instead use an unbiased random weight

$$\acute{w}(\theta) := w(\theta) + \acute{V}_\theta,$$

where $\acute{V}_\theta$ has conditional expectation zero and variance $\acute{\sigma}_\theta^2$, then it is clear that the biased estimator has smaller mean squared error for small enough samples if it has sufficiently smaller variance, i.e., when (assuming $\mathbb{E}_q[b(\theta)]^2 > 0$, otherwise one estimator dominates the other for all sample sizes):

$$\frac{1}{P} \left\{ \mathrm{Var}_q[w(\theta) + b(\theta)] + \mathbb{E}_q[\grave{\sigma}_\theta^2] \right\} + \mathbb{E}_q[b(\theta)]^2 < \frac{1}{P} \left\{ \mathrm{Var}_q[w(\theta)] + \mathbb{E}_q[\acute{\sigma}_\theta^2] \right\},$$

which holds when $P$ is smaller than

$$\frac{\mathbb{E}_q[\acute{\sigma}_\theta^2 - \grave{\sigma}_\theta^2] - \mathrm{Var}_q[b(\theta)] - 2\,\mathrm{Cov}_q[w(\theta), b(\theta)]}{\mathbb{E}_q[b(\theta)]^2}.$$

In the artificially simple setting in which $b(\theta) = b_0$ is constant, this would mean that the biased estimator would have smaller MSE for samples smaller than the ratio of the difference in variance to the square of that bias, suggesting that qualitatively a biased estimator might be better if the square of the average bias is small in comparison to the variance reduction that it provides. Given a family of increasingly expensive biased estimators with progressively smaller bias, one could envisage using such an argument to manage the trade-off between less biased estimators and larger sample sizes. In practice a negative covariance between $b(\theta)$ and $w(\theta)$ might also lead to favourable performance by biased estimators.

2.6 Applications to Ising models

In the current section we investigate this type of approach further empirically, estimating Bayes factors from data simulated from Ising models. In particular we reanalyse the data from Friel (2013), which consists of 20 realisations from a first-order 10 × 10 Ising model and 20 realisations from a second-order 10 × 10 Ising model for which accurate estimates (via Friel and Rue (2007)) of the evidence serve as a ground truth for comparison. We also analyse data from a 100 × 100 Ising model.
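The sample-size threshold derived in section 2.5 can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function and argument names are hypothetical, and the numerical values in the usage note are made up purely for illustration.

```python
def mse_biased(P, var_w_plus_b, mean_noise_var_biased, mean_bias):
    """MSE of the P-particle average of biased random weights:
    (Var_q[w + b] + E_q[sigma_grave^2]) / P + E_q[b]^2."""
    return (var_w_plus_b + mean_noise_var_biased) / P + mean_bias ** 2


def mse_unbiased(P, var_w, mean_noise_var_unbiased):
    """MSE of the P-particle average of unbiased random weights:
    (Var_q[w] + E_q[sigma_acute^2]) / P."""
    return (var_w + mean_noise_var_unbiased) / P


def sample_size_threshold(mean_noise_var_unbiased, mean_noise_var_biased,
                          var_b, cov_wb, mean_bias):
    """Value of P below which the biased estimator has the smaller MSE.

    Assumes a non-zero average bias; with zero average bias one estimator
    dominates the other for every sample size.
    """
    if mean_bias == 0:
        raise ValueError("threshold is undefined when the average bias is zero")
    numerator = (mean_noise_var_unbiased - mean_noise_var_biased
                 - var_b - 2.0 * cov_wb)
    return numerator / mean_bias ** 2
```

For example, with a constant bias of 0.1 (so that the variance of the bias and its covariance with the exact weight are both zero), an exact-weight variance of 1, and conditional noise variances of 4 (unbiased) and 1 (biased), `sample_size_threshold(4.0, 1.0, 0.0, 0.0, 0.1)` is approximately 300: the biased estimator has the lower MSE at P = 100 and the higher MSE at P = 1000.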
2.6.1 10 × 10 Ising models

As in the toy example, we examine several different configurations of the IS and AIS estimators of the $Z(\hat{\theta})/Z(\theta)$ term in the weight (9), using different values of $M$, $K$ and $B$, the burn in of the internal MCMC, that yield the same computational cost (in terms of the number of Gibbs sweeps used to simulate from the likelihood). Note that for small values of $B$ these estimators are biased; a bias that decreases as $B$ increases. The empirical results in Friel (2013) use a total of $2 \times 10^7$ Gibbs sweeps to estimate one Bayes factor; we quote this figure to allow comparison of our results with those in that paper.

Here, estimating a marginal likelihood is done in three stages: firstly $\hat{\theta}$ is estimated; followed by $Z(\hat{\theta})$, then finally the marginal likelihood. We took $\hat{\theta}$ to be the posterior expectation, estimated from a run of the exchange algorithm of 10,000 iterations. $Z(\hat{\theta})$ was then estimated using SMC with an MCMC move, with 200 particles and 100 targets, with the $i$th target being $\gamma_i(\cdot \mid \hat{\theta}) = \gamma(\cdot \mid i\hat{\theta}/100)$, employing stratified resampling when the effective sample size (ESS; Kong et al (1994)) falls below 100. The total cost of these three stages is $5 \times 10^6$ Gibbs sweeps (1/4 of the cost of population exchange) with the final IS stage costing $2 \times 10^4$ sweeps (1/1000 of the cost of population exchange). We note that the cost of the first two stages has been chosen conservatively; less computational effort here can also yield good results.

The importance proposal used in all cases was a multivariate normal distribution, with mean and variance taken to be the sample mean and variance from the initial run of the exchange algorithm. This proposal would clearly not be appropriate in high dimensions, but is reasonable for the low dimensional parameter spaces considered here. Figure 2 shows the results produced by these methods in comparison with those from Friel (2013).
We observe: improvements of the new methods over population exchange; an overall robustness of the new methods to different choices of parameters; and that there is a bias-variance trade-off in the internal estimate of $Z(\hat{\theta})/Z(\theta)$ in terms of producing the best behaviour of the Bayes factor estimates. Recall that as $B$ increases the bias of the internal estimate (the results of which can be observed in the results when using $B = 0$) decreases, but for a fixed computational effort it is beneficial to use a lower $B$ and to instead increase $M$, using more importance points to decrease the variance. As in Alquier et al (2015), we observe that it may be useful to move away from the exact-approximate approaches, and in this case, to simply use the best available estimator of $Z(\hat{\theta})/Z(\theta)$ (taking into account its statistical and computational efficiency) regardless of whether it is unbiased. In this example there is little observed difference in using our fixed computational budget on more AIS moves ($K$) in place of using more importance points ($M$). In general we might expect using more AIS moves to be more productive when the estimates of $Z(\hat{\theta})/Z(\theta)$ for θ far from $\hat{\theta}$ are required.

Figure 2: Box plots of the results of population exchange, SAVIS, and MAVIS on the Ising data. (Vertical axis: log(estimated BF / true BF); horizontal axis: algorithm configuration, labelled by M, B, K.)

2.6.2 100 × 100 Ising model

In this section we use SAVIS for estimating the marginal likelihood for a first order Ising model on data of size 100 × 100 pixels simulated from an Ising model with parameter θ = 10. Again, estimating a marginal likelihood is done in three stages: firstly $\hat{\theta}$ is estimated; followed by $Z(\hat{\theta})$, then finally the marginal likelihood. The methods used for the first two stages are identical to those used in section 2.6.1, as is the choice of proposal distribution. The third stage is performed using SAVIS with $M = 100$ and $B = 20$. From 20 runs of this third stage, a five-number summary of the log evidence estimates was (-5790.251, -5790.178, -5790.144, -5790.119, -5790.009), with the average ESS being 80.75. Note the low variance over these runs of the algorithm and the high ESS, which were also found for different configurations of the algorithm (including for more importance points and larger values of $M$ and $B$).

One might expect this example to be more difficult than the 10 × 10 grids considered in the previous section, due to the need to find good estimates of $Z(\hat{\theta})/Z(\theta)$, which here involve normalising constants of distributions on a space of higher dimension. However, since the posterior has lower variance in this case, only values of θ close to $\hat{\theta}$ are proposed, which makes estimating $Z(\hat{\theta})/Z(\theta)$ much easier, yielding the good results in this section.
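Log evidence values of the magnitude reported above (around -5790) cannot be exponentiated directly in floating point, so both the evidence estimate and the ESS are normally computed from log weights. The following is a minimal sketch under that assumption; the function names are hypothetical and the code is not taken from the implementation used for the experiments. The ESS is computed as the squared sum of the weights divided by the sum of their squares.

```python
import math


def log_mean_exp(log_weights):
    """Log of the average of exp(log_weights), computed stably.

    For importance sampling, this is the log of the evidence estimate
    when log_weights holds the log of each unnormalised importance weight.
    """
    m = max(log_weights)
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))


def effective_sample_size(log_weights):
    """ESS = (sum of weights)^2 / (sum of squared weights), from log weights.

    Subtracting the maximum log weight rescales every weight by the same
    constant, which leaves the ESS unchanged.
    """
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    s = sum(w)
    return s * s / sum(x * x for x in w)
```

With equal weights the ESS equals the number of importance points, and it falls towards 1 as a single point comes to dominate.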
2.7 Discussion

In this section we have compared the use of ABC-IS, SL-IS, MAVIS (and alternatives) for estimating marginal likelihoods and Bayes factors. The use of ABC for model comparison has received much attention, with much of the discussion centring around appropriate choices of summary statistics. We have avoided this in our examples by using exponential family models, but in general this remains an issue affecting both ABC and SL. It is the use of summary statistics that makes ABC and SL unable to provide evidence estimates. However, it is the use of summary statistics, usually essential in these settings, that provides ABC and SL with an advantage over MAVIS, in which importance sampling must be performed over the high dimensional data-space. Despite this disadvantage, MAVIS avoids the approximations made in the simulation based methods (illustrated in figures 1b to 1d, with the accuracy depending primarily on the quality of the estimate of 1/Z used). In section 2.6 we saw that there can be advantages of using biased, but lower variance estimates in place of standard IS.

The main weakness of all of the methods described in this section is that they are all based on standard IS and are thus not practical for use when θ is high dimensional. In the next section we examine the use of SMC samplers as an extension to IS for use on triply intractable problems, and in this framework discuss further the effect of inexact approximations.

3 Sequential Monte Carlo approaches

SMC samplers (Del Moral et al 2006) are a generalisation of IS, in which the problem of choosing an appropriate proposal distribution in IS is avoided by performing IS sequentially on a sequence of target distributions, starting at a target that is easy to simulate from, and ending at the target of interest. In standard IS the number of Monte Carlo points required in order to obtain a particular accuracy increases exponentially with the dimension of the space, but Beskos et al (2011) show (under appropriate regularity conditions) that the use of SMC circumvents this problem and can thus be practically useful in high dimensions.

In this section we introduce SMC algorithms for simulating from doubly intractable posteriors which have the by-product that, like IS, they also produce estimates of marginal likelihoods. We note that, although here we focus on estimating the evidence, the SMC sampler approaches described here are a natural alternative to the MCMC methods described in section 1.1 and inherently use a population of Monte Carlo points (shown to be beneficial on these models by Caimo and Friel (2011)). In section 3.1 we describe these algorithms, before examining an application to estimating the precision matrix of a Gaussian distribution in high dimensions in section 3.2. In section 3.4 we provide a preliminary investigation of the consequences of using biased weight estimates in an SMC framework.
3.1 SMC samplers in the presence of an INC

This section introduces two alternative SMC samplers for use on doubly intractable target distributions. The first, marginal SMC, directly follows from the IS methods in the previous section. The second, SMC-MCMC, requires a slightly different approach, but is more computationally efficient. Finally we briefly discuss simulation-based SMC samplers in section 3.1.2.

To begin, we introduce notation that is common to all algorithms that we discuss. SMC samplers perform sequential IS using $P$ "particles" $\theta^{(p)}$, each having (normalised) weight $w^{(p)}$, using a sequence of targets $\pi_0$ to $\pi_T$, with $\pi_T$ being the distribution of interest, in our case $\pi(\theta \mid y) \propto p(\theta)f(y \mid \theta)$. In this section we will take $\pi_t(\theta \mid y) \propto p(\theta)f_t(y \mid \theta) = p(\theta)\gamma_t(y \mid \theta)/Z_t(\theta)$. At target $t$, a forward kernel $K_t(\cdot \mid \theta_{t-1}^{(p)})$ is used to move particle $\theta_{t-1}^{(p)}$ to $\theta_t^{(p)}$, with each particle then being reweighted to give unnormalised weight

$$\tilde{w}_t^{(p)} = \frac{p(\theta_t^{(p)})\,\gamma_t(y \mid \theta_t^{(p)})}{p(\theta_{t-1}^{(p)})\,\gamma_{t-1}(y \mid \theta_{t-1}^{(p)})} \, \frac{Z_{t-1}(\theta_{t-1}^{(p)})}{Z_t(\theta_t^{(p)})} \, \frac{L_{t-1}(\theta_t^{(p)}, \theta_{t-1}^{(p)})}{K_t(\theta_{t-1}^{(p)}, \theta_t^{(p)})}.$$

Here, $L_{t-1}$ represents a backward kernel that we choose differently in the alternative algorithms below. We note the presence of the INC, which means that this algorithm cannot be implemented in practice in its current form. The weights are then normalised to give $\{w_t^{(p)}\}$, and a resampling step is carried out. In the following sections the focus is on the reweighting step: this is the main difference between the different algorithms. For more detail on these methods, see Del Moral et al (2007).

Zhou et al (2015) describe how BFs can be estimated directly by SMC samplers, simply by taking $\pi_1$ to be one model and $\pi_T$ to be the other (with the $\pi_t$ being intermediate distributions). This idea is also explored for Gibbs random fields in Friel (2013). However, the empirical results in Zhou et al (2015) suggest that in some cases this method does not necessarily perform better than estimating marginal likelihoods for the two models separately and taking the ratio of the estimates.
Here we do not investigate these algorithms further, but note that they offer an alternative to estimating the marginal likelihood separately.

3.1.1 Random weight SMC Samplers

SMC with an MCMC kernel

Suppose we were able to use a reversible MCMC kernel $K_t$ with invariant distribution $\pi_t(\theta \mid y) \propto p(\theta)f_t(y \mid \theta)$, and choose the $L_{t-1}$ kernel to be the time reversal of $K_t$ with respect to its invariant distribution. We then obtain the following incremental weight:

$$\tilde{w}_t^{(p)} = \frac{\gamma_t(y \mid \theta_{t-1}^{(p)})}{\gamma_{t-1}(y \mid \theta_{t-1}^{(p)})} \, \frac{Z_{t-1}(\theta_{t-1}^{(p)})}{Z_t(\theta_{t-1}^{(p)})}. \tag{12}$$

Once again, we cannot evaluate this incremental weight due to the presence of a ratio of normalising constants. Also, such an MCMC kernel cannot generally be directly constructed: the MH update itself involves evaluating the ratio of intractable normalising constants. However, appendix A shows that precisely the same weight update results when using either SAV or exchange MCMC moves in place of a direct MCMC step.
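The normalise-and-resample step that follows each reweighting in section 3.1 can be sketched generically. The code below is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, and it shows stratified resampling triggered when the ESS falls below a chosen threshold, as an assumption about how such a step is typically organised.

```python
import random


def ess(weights):
    """Effective sample size of a set of (possibly unnormalised) weights."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)


def stratified_resample(weights, rng):
    """Return P ancestor indices using stratified resampling.

    One uniform draw is made in each of the P equal-width strata of [0, 1),
    and each draw is mapped through the cumulative normalised weights.
    """
    P = len(weights)
    total = sum(weights)
    indices = []
    j = 0
    cumulative = weights[0] / total
    for i in range(P):
        u = (i + rng.random()) / P
        # Advance to the first particle whose cumulative weight exceeds u.
        while u >= cumulative and j < P - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices


def resample_if_needed(particles, weights, threshold, rng):
    """Resample and reset to equal weights when the ESS is below threshold."""
    if ess(weights) >= threshold:
        total = sum(weights)
        return particles, [w / total for w in weights]
    idx = stratified_resample(weights, rng)
    P = len(particles)
    return [particles[k] for k in idx], [1.0 / P] * P
```

A typical call would be `resample_if_needed(particles, weights, len(particles) / 2, random.Random(0))`, where the threshold of half the number of particles is one common choice among several.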

In order that this approach may be implemented we might consider, in the spirit of the approximations suggested in section 2, using an estimate of the ratio term $Z_{t-1}(\theta_{t-1}^{(p)})/Z_t(\theta_{t-1}^{(p)})$. For example, an unbiased IS estimate is given by

$$\widehat{\frac{Z_{t-1}(\theta_{t-1}^{(p)})}{Z_t(\theta_{t-1}^{(p)})}} = \frac{1}{M} \sum_{m=1}^{M} \frac{\gamma_{t-1}(u_t^{(m,p)} \mid \theta_{t-1}^{(p)})}{\gamma_t(u_t^{(m,p)} \mid \theta_{t-1}^{(p)})}, \tag{13}$$

where $u_t^{(m,p)} \sim f_t(\cdot \mid \theta_{t-1}^{(p)})$. Although this estimate is unbiased, we note that the resultant algorithm does not have precisely the same extended space interpretation as the methods in Del Moral et al (2006). Appendix B gives an explicit construction for this case, which incorporates a pseudomarginal-type construction (Andrieu and Roberts 2009).

Data point tempering

For the SMC approach to be efficient we require that the sequence of distributions $\{\pi_t\}$ be chosen such that $\pi_0$ is easy to simulate from, $\pi_T$ is the target of interest and the intermediate distributions provide a "route" between them. For the applications in this paper we found the data tempering approach of Chopin (2002) to be particularly useful. Suppose that the data $y$ consists of $N$ points, and that $N$ is exactly divisible by $T$ for ease of exposition. We then take $\pi_0(\theta \mid y) = p(\theta)$ and for $t = 1, \ldots, T$

$$\pi_t(\theta \mid y) \propto p(\theta)f_t(y \mid \theta) \quad \text{with} \quad f_t(y \mid \theta) = f\left( y_{1:Nt/T} \mid \theta \right), \tag{14}$$

i.e. essentially we incorporate $N/T$ additional data points for each increment of $t$. On this sequence of targets we then propose to use the SMC sampler with an MCMC kernel as described in the previous section. The only slightly non-standard point is the estimation of $Z_{t-1}(\theta_{t-1}^{(p)})/Z_t(\theta_{t-1}^{(p)})$, since in this case $Z_{t-1}(\theta_{t-1}^{(p)})$ and $Z_t(\theta_{t-1}^{(p)})$ are the normalising constants of distributions on different spaces. We use

$$\widehat{\frac{Z_{t-1}(\theta_{t-1}^{(p)})}{Z_t(\theta_{t-1}^{(p)})}} = \frac{1}{M} \sum_{m=1}^{M} \frac{\gamma_{t-1}(v_t^{(m,p)} \mid \theta_{t-1}^{(p)}) \, q_w(w_t^{(m,p)})}{\gamma_t(u_t^{(m,p)} \mid \theta_{t-1}^{(p)})}, \tag{15}$$

where $u_t^{(m,p)} \sim f_t(\cdot \mid \theta_{t-1}^{(p)})$ and $v_t^{(m,p)}$ and $w_t^{(m,p)}$ are subvectors of $u_t^{(m,p)}$. Here $w_t^{(m,p)}$ is in the space of the additional variables added when moving from $f_{t-1}$ to $f_t$ (providing the argument in an arbitrary auxiliary distribution $q_w(\cdot)$) and $v_t^{(m,p)}$ is in the space of the existing variables. For $t = 1$ this becomes

$$\widehat{\frac{1}{Z_1(\theta_0^{(p)})}} = \frac{1}{M} \sum_{m=1}^{M} \frac{q_w(u_1^{(m,p)})}{\gamma_1(u_1^{(m,p)} \mid \theta_0^{(p)})}, \tag{16}$$

with $u_1^{(m,p)} \sim f_1(\cdot \mid \theta_0^{(p)})$.

Analogous to the SAV method, a sensible choice for $q_w(w)$ might be to use $f(w \mid \hat{\theta})$, where $w$ is on the same space as $N/T$ data points. The normalising constant for this distribution needs to be known to calculate the importance weight in (19) so, as earlier, we advocate estimating this in advance of running the SMC sampler (aside from when the data points are added one at a time, in which case the normalising constant may usually be found analytically). Note that if $y$ does not consist of i.i.d. points, it is useful to choose the order in which data points are added such that the same $q_w$ (each with the same normalising constant) can be used in every weight update. For example, in an Ising model, the requirement would be to add the same shape grid of variables at each target.

Marginal SMC

An alternative method commonly used in ABC applications arises from the use of an approximation to the optimal backward kernel (Peters 2005; Klaas et al 2005). In this case the weight update is

$$\tilde{w}_t^{(p)} = \frac{p(\theta_t^{(p)})\,\gamma_t(y \mid \theta_t^{(p)})}{Z_t(\theta_t^{(p)}) \sum_{r=1}^{P} w_{t-1}^{(r)} K_t(\theta_t^{(p)} \mid \theta_{t-1}^{(r)})} \tag{17}$$

for an arbitrary forward kernel $K_t$. This results in a computational complexity of $O(P^2)$ compared to $O(P)$ for a standard SMC method, but we include it here in order to note that the $1/Z_t(\cdot)$ term in (17) could be dealt with in the same way as in the simple IS case. Considering the SAVM posterior, where in target $t$ we use the distribution $q_u$ for the auxiliary variable $u_t$, and the SAVM proposal, where $u_t^{(p)} \sim f_t(\cdot \mid \theta_t^{(p)})$, we arrive at the weight update:

$$\tilde{w}_t^{(p)} = \frac{q_u(u_t^{(p)} \mid \theta_t^{(p)}, y)\, p(\theta_t^{(p)})\, \gamma_t(y \mid \theta_t^{(p)})}{\gamma_t(u_t^{(p)} \mid \theta_t^{(p)}) \sum_{r=1}^{P} w_{t-1}^{(r)} K_t(\theta_t^{(p)} \mid \theta_{t-1}^{(r)})},$$

in which no normalising constant appears. We include this approach for completeness but do not investigate it further in this paper.

3.1.2 Simulation-based SMC samplers

Section 2.2 describes how the ABC and SL approximations may be used within IS. The same approximate likelihoods may be used in SMC.
In ABC (Sisson et al 2007), where the sequence of targets is chosen to be $\pi_t(\theta) \propto p(\theta) f_{\epsilon_t}(y \mid \theta)$ with a decreasing sequence $\epsilon_t$, this idea provides a useful alternative to MCMC for exploring ABC posterior distributions, whilst also providing estimates of Bayes factors (Didelot et al 2011). The use of SMC with SL does not appear to have been explored previously. One might expect SMC to be useful in this context (using, for example, the sequence of targets $\pi_t(\theta) \propto p(\theta) f_{SL}^{(t/T)}(S(y) \mid \theta)$), particularly when $f_{SL}$ is concentrated relative to the prior.
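The internal importance sampling estimate of a ratio of normalising constants in equation (13) can be checked on a toy problem where the answer is known. The sketch below is an illustrative assumption and not one of the models in this paper: it takes $\gamma_t(u) = \exp(-u^2 / (2 s_t^2))$, so that $Z_t = \sqrt{2\pi}\, s_t$, exact simulation from $f_t$ is available, and the true ratio $Z_{t-1}/Z_t$ is $s_{t-1}/s_t$. All names are hypothetical.

```python
import math
import random


def log_gamma_unnorm(u, scale):
    """Log of the unnormalised toy density exp(-u^2 / (2 * scale^2))."""
    return -0.5 * (u / scale) ** 2


def estimate_ratio(scale_prev, scale_curr, M, rng):
    """IS estimate of Z_{t-1} / Z_t in the style of equation (13).

    Draws u ~ f_t exactly (a zero-mean Gaussian with standard deviation
    scale_curr) and averages gamma_{t-1}(u) / gamma_t(u) over M draws.
    """
    total = 0.0
    for _ in range(M):
        u = rng.gauss(0.0, scale_curr)
        log_ratio = log_gamma_unnorm(u, scale_prev) - log_gamma_unnorm(u, scale_curr)
        total += math.exp(log_ratio)
    return total / M
```

For instance, `estimate_ratio(1.0, 2.0, 20000, random.Random(1))` should be close to the true value 0.5. In the doubly intractable setting exact draws from $f_t$ are generally unavailable and are replaced by the output of an internal MCMC run, which is the source of the bias discussed in section 2.6.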

3.2 Application to precision matrices

In this section we examine the performance of the SMC sampler, with MCMC proposal and data-tempered target distributions, for estimating the evidence in an example in which θ is of moderately high dimension. We consider the case in which $\theta = \Sigma^{-1}$ is an unknown precision matrix, $f(y \mid \theta)$ is the $d$-dimensional multivariate Gaussian distribution with zero mean and $p(\theta)$ is a Wishart distribution $\mathcal{W}(\nu, V)$ with parameters $\nu \geq d$ and $V \in \mathbb{R}^{d \times d}$. Suppose we observe $n$ i.i.d. observations $y = \{y_i\}_{i=1}^{n}$, where $y_i \in \mathbb{R}^d$. The true evidence can be calculated analytically, and is given by

$$p(y) = \frac{1}{\pi^{nd/2}} \, \frac{\Gamma_d\left(\frac{\nu+n}{2}\right)}{\Gamma_d\left(\frac{\nu}{2}\right)} \, \frac{\left| \left( V^{-1} + \sum_{i=1}^{n} y_i y_i^T \right)^{-1} \right|^{\frac{\nu+n}{2}}}{|V|^{\frac{\nu}{2}}}, \tag{18}$$

where $\Gamma_d$ denotes the $d$-dimensional gamma function. For ease of implementation, we parametrise the precision using a Cholesky decomposition $\Sigma^{-1} = LL^T$ with $L$ a lower triangular matrix whose $(i, j)$th element is denoted $a_{ij}$. As in section 2.3, we write $f(y \mid \theta)$ as $\gamma(y \mid \theta)/Z(\theta)$ as follows

$$f\left( \{y_i\}_{i=1}^{n} \mid \Sigma^{-1} \right) = |2\pi\Sigma|^{-n/2} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} y_i^T \Sigma^{-1} y_i \right),$$

where in some of the experiments that follow, $Z(\theta) = |2\pi\Sigma|^{n/2}$ is treated as if it is an INC. In the Wishart prior, we take $\nu = 10 + d$ and $V = I_d$. Taking $d = 10$, $n = 30$ points were simulated using $y_i \sim \mathcal{MVN}(0_d, 0.1 \times I_d)$. The parameter space is thus 55-dimensional, motivating the use of an SMC sampler in place of IS or the population exchange method, neither of which are suited to this problem.

In the SMC sampler, in which we used $P = 10{,}000$ particles, the sequence of targets is given by data point tempering. Specifically, the sequence of targets is to use $p(\Sigma^{-1})$ when $t = 0$ and $p(\Sigma^{-1}) f\left( \{y_i\}_{i=1}^{t} \mid \Sigma^{-1} \right)$ for $t = 1, \ldots, T$ (with $T = n$). The parameters are $\{a_{ij} \mid 1 \leq j \leq i \leq d\}$. We use single component MH kernels to update each of the parameters, with one (deterministic) sweep consisting of an update of each in turn. For each $a_{ij}$ we use a Gaussian random walk proposal, where at target $t$, the variance for the proposal used for $a_{ij}$ is taken to be the sample variance of $a_{ij}$ at target $t - 1$.
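Equation (18) is simple to evaluate on the log scale, which is how a ground-truth value for comparison would normally be obtained. The following is a minimal sketch using only the Python standard library; it assumes the prior scale matrix is the identity ($V = I_d$, as in the experiment described above), so that $|V| = 1$ and no matrix inversion is needed. The function names are hypothetical.

```python
import math


def log_multigamma(a, d):
    """Log of the d-dimensional gamma function Gamma_d(a)."""
    return (d * (d - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2.0) for j in range(d))


def log_det_spd(A):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    log_det = 0.0
    for i in range(d):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = s / L[j][j]
    return log_det


def log_evidence_identity_scale(ys, nu):
    """Log of equation (18) when V is the identity matrix.

    ys is a list of n observations, each a list of d floats.
    """
    n, d = len(ys), len(ys[0])
    # Form I_d + sum_i y_i y_i^T; the determinant of its inverse is the
    # reciprocal of its determinant.
    A = [[(1.0 if r == c else 0.0) + sum(y[r] * y[c] for y in ys)
          for c in range(d)] for r in range(d)]
    return (-(n * d / 2.0) * math.log(math.pi)
            + log_multigamma((nu + n) / 2.0, d)
            - log_multigamma(nu / 2.0, d)
            - ((nu + n) / 2.0) * log_det_spd(A))
```

For $d = 1$ the expression reduces to a ratio of ordinary gamma functions, which gives a convenient check of the implementation.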
For updating the weights of each particle we used equation 15, where we chose $q_w(\cdot) = f\left( \cdot \mid \hat{\Sigma}^{-1} \right)$ with $\hat{\Sigma}^{-1}$ the maximum likelihood estimate of the precision $\Sigma^{-1}$, and chose $M = 200$ internal importance sampling points. Systematic resampling was performed when the effective sample size (ESS) fell below $P/2$.

We estimated the evidence 10 times using the SMC sampler and compared the statistical properties of each algorithm using these estimates. For our simulated data, the log of the true evidence was -89.43. Over the 10 runs of the SMC sampler a five-number summary of the log evidence estimates was (-90.01, -89.51, -89.35, -88.92, -88.37).

3.3 Application to Ising models

In this section we apply the random weight SMC sampler to the Ising model data considered in section 2.6.1. We use SMC to estimate the marginal likelihood of both the first and second order Ising models, then take the ratio of these estimates to estimate the Bayes factor. Note that in this case the size of the parameter space is much smaller than in the precision example, with the models having parameter spaces of sizes 1 and 2 respectively. The excellent results achieved by IS in section 2.6.1 might seem to imply that SMC samplers are not required for this problem, but recall that we required preliminary runs of the exchange algorithm in order to design an appropriate importance proposal, along with an SMC sampler in order to estimate the normalising constant $Z(\hat{\theta})$ of the distribution $q_u$ used for the auxiliary variables $u^{(m)}$. An SMC sampler offers a cleaner approach that requires less user tuning.

We applied the random weight SMC sampler described in section 3.1.1, with 500 particles, data point tempering (adding one pixel at a time, taking $q_w$ to be Bern(0.5)), and using the estimate of the ratio of normalising constants in the weight update from equation (15) with $M = 20$ importance points. Each of these estimates requires simulating a single point from $\gamma_t(\cdot \mid \theta_{t-1}^{(p)})$ using a Gibbs sampler, which had a burn in of $B = 10$ iterations, yielding a total computational budget of 200 Gibbs sweeps for estimating the ratio of normalising constants.
Note that, as considered in section 2.6.1, this use of a Gibbs sampler results in an inexact algorithm, but this level of burn in was found to be sufficient for this bias to be minimal in the random weight IS algorithms. The MCMC kernel of the exchange algorithm was used (with the proposal variance taken to be the sample variance of the set of particles at each SMC iteration), using the approximate version where a Gibbs sampler with burn in $B = 10$ iterations is used to simulate from $\gamma_t(\cdot \mid \theta)$ at the proposed parameter value. The total cost of this algorithm is comparable to the IS approaches in section 2.6.1, with a total cost of $5.25 \times 10^6$ Gibbs sweeps and hence around a quarter of that of the algorithm of Friel (2013). Figure 3 shows