BEHAVIOUR OF ABC FOR BIG DATA

By Wentao Li and Paul Fearnhead

Lancaster University

arXiv:1506.03481v1 [stat.ME] 10 Jun 2015

Many statistical applications involve models for which the likelihood is difficult to evaluate but which are relatively easy to sample from; this is known as the intractable likelihood problem. Approximate Bayesian computation (ABC) is a useful Monte Carlo method for inference about the unknown parameter in intractable likelihood problems under the Bayesian framework. Without evaluating the likelihood function, ABC approximately samples from the posterior by jointly simulating the parameter and the data, and accepting or rejecting the parameter according to the distance between the simulated and observed data. Many successful applications have been seen in population genetics, systematic biology, ecology etc. In this work, we analyse the asymptotic properties of ABC as the number of data points goes to infinity, under the assumption that the data is summarised by a fixed-dimensional statistic, and that this statistic obeys a central limit theorem. We show that the ABC posterior mean for estimating a function of the parameter can be asymptotically normal, centred on the true value of the function, and with a mean square error that is equal to that of the maximum likelihood estimator based on the summary statistic. We further analyse the efficiency of importance sampling ABC for fixed Monte Carlo sample size. For a wide range of proposal distributions importance sampling ABC can be efficient, in the sense that the Monte Carlo error of ABC increases the mean square error of our estimate by a factor that is just $1 + O(1/N)$, where $N$ is the Monte Carlo sample size.

1. Introduction. There are many statistical applications which involve inference about models that are easy to simulate from, but for which it is difficult, or impossible, to calculate likelihoods. In such situations it is possible to use the fact that we can simulate from the model to enable us to perform inference.
There is a wide class of such likelihood-free methods of inference, including indirect inference [22, 23], the bootstrap filter [21] and the simulated method of moments [16]. We consider a Bayesian version of these methods, termed Approximate Bayesian Computation (ABC). This approach involves defining an approximation to the posterior distribution in such a way that it is possible to sample from this approximate posterior using only the ability to sample from the model for any given parameter value.

Let $K(x)$ be a density kernel, where $\max_x K(x) = 1$, and let $\varepsilon > 0$ be a bandwidth. Denote the data as $Y_{\mathrm{obs}} = (y_{\mathrm{obs},1}, \ldots, y_{\mathrm{obs},n})$. Assume we have chosen a finite-dimensional summary statistic $s_n(Y)$, and denote $s_{\mathrm{obs}} = s_n(Y_{\mathrm{obs}})$. If we model the data as a draw from a
parametric density $f_n(y \mid \theta)$, and assume prior $\pi(\theta)$, then define the ABC posterior as
\[
\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon) \propto \pi(\theta) \int f_n(s_{\mathrm{obs}} + \varepsilon v \mid \theta) K(v)\, dv, \tag{1}
\]
where $f_n(s \mid \theta)$ is the density for the summary statistic implied by $f_n(y \mid \theta)$. Let $f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon) = \int f_n(s_{\mathrm{obs}} + \varepsilon v \mid \theta) K(v)\, dv$. The idea is that $f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)$ is an approximation of the likelihood, and the ABC posterior, proportional to the prior multiplied by this likelihood approximation, is an approximation of the true posterior. The likelihood approximation can be interpreted as a measure of how close, on average, the summaries $s_n$ simulated from the model are to the summary of the observed data, $s_{\mathrm{obs}}$. The choices of kernel and bandwidth affect the definition of closeness.

By defining the approximate posterior in this way, we can simulate samples from it using standard Monte Carlo methods. One approach, which we will focus on later, uses importance sampling (IS). Let $K_\varepsilon(x) = K(x/\varepsilon)$. Given a proposal density, $q_n(\theta)$, a bandwidth, $\varepsilon$, and a Monte Carlo sample size, $N$, importance sampling ABC (IS-ABC) would proceed as in Algorithm 1. The set of accepted parameters and their associated weights provides a Monte Carlo approximation to $\pi_{\mathrm{ABC}}$. Note that if we set $q_n(\theta) = \pi(\theta)$ then this is just a rejection sampler with the ABC posterior as its target, which is called rejection ABC in this paper. In practice sequential importance sampling methods are often used to learn a good proposal distribution [3].

Algorithm 1: Importance and Rejection Sampling ABC
1. Simulate $\theta_1, \ldots, \theta_N \sim q_n(\theta)$;
2. For each $i = 1, \ldots, N$, simulate $Y^{(i)} = (y^{(i)}_1, \ldots, y^{(i)}_n) \sim f_n(y \mid \theta_i)$;
3. For each $i = 1, \ldots, N$, accept $\theta_i$ with probability $K_\varepsilon(s_n^{(i)} - s_{\mathrm{obs}})$, where $s_n^{(i)} = s_n(Y^{(i)})$; and define the associated weight as $w_i = \pi(\theta_i)/q_n(\theta_i)$.

There are three choices in implementing ABC: the choice of summary statistic, the choice of bandwidth, and the specifics of the Monte Carlo algorithm. For importance sampling, the last of these involves specifying the Monte Carlo sample size, $N$, and the proposal density, $q_n(\theta)$.
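Algorithm 1 can be sketched in Python for a scalar summary statistic. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `is_abc` and `normal_pdf`, the Gaussian kernel $K(x) = \exp(-x^2/2)$ (which satisfies $\max_x K(x) = 1$), and the toy model $y_i \sim N(\theta, 1)$ with the sample mean as summary are all choices made here for illustration. The summary is drawn directly from its $N(\theta, 1/n)$ distribution rather than by simulating $n$ data points.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def is_abc(s_obs, eps, N, prior_pdf, proposal_pdf, proposal_draw, simulate_summary, rng):
    """Sketch of Algorithm 1 for a scalar summary and a Gaussian kernel.

    Returns the accepted parameter values and their importance weights.
    """
    thetas, weights = [], []
    for _ in range(N):
        theta = proposal_draw(rng)                 # step 1: theta_i ~ q_n
        s = simulate_summary(theta, rng)           # step 2: summary of simulated data
        # step 3: accept with probability K_eps(s - s_obs), where K(x) = exp(-x^2 / 2)
        accept_prob = math.exp(-0.5 * ((s - s_obs) / eps) ** 2)
        if rng.random() < accept_prob:
            thetas.append(theta)
            weights.append(prior_pdf(theta) / proposal_pdf(theta))  # w_i = pi / q_n
    return thetas, weights


# Hypothetical usage: rejection ABC (proposal = prior) for y_i ~ N(theta, 1), prior N(0, 1).
rng = random.Random(1)
n, eps, s_obs, N = 100, 0.2, 0.5, 20000
thetas, weights = is_abc(
    s_obs, eps, N,
    prior_pdf=lambda t: normal_pdf(t, 0.0, 1.0),
    proposal_pdf=lambda t: normal_pdf(t, 0.0, 1.0),
    proposal_draw=lambda r: r.gauss(0.0, 1.0),
    simulate_summary=lambda t, r: r.gauss(t, 1.0 / math.sqrt(n)),  # sample mean of n points
    rng=rng,
)
acceptance_rate = len(thetas) / N
```

With the proposal equal to the prior every weight is 1, so the accepted values form a plain rejection-ABC sample; passing a different `proposal_pdf` and `proposal_draw` pair gives the importance sampling variant.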
These three choices relate, roughly, to three sources of approximation in ABC. To see this, note that as $\varepsilon \to 0$ we would expect the ABC posterior to converge to the posterior given $s_{\mathrm{obs}}$ [17]. Thus the choice of summary statistic governs the approximation, or loss of information, between using the full posterior distribution and using the posterior given the summary. The value of $\varepsilon$ then affects how close the ABC posterior is to the posterior given the summary. Finally there is the Monte Carlo error from approximating the true ABC posterior with a Monte Carlo sample. The Monte Carlo error is affected not only by the specifics of the Monte Carlo algorithm, but also by the choices of summary statistic and bandwidth, which together affect, say, the probability of acceptance in step 3 of Algorithm 1. Having a higher-dimensional summary statistic, or
smaller values of $\varepsilon$, will tend to reduce this acceptance probability and hence increase the Monte Carlo error. These three sources of approximation, together with the variation of the observations, determine the variation of the ABC estimator.

Arguably the first ABC method was that of [36], and these methods have been popular within population genetics [4, 11, 43], ecology [2] and systematic biology [42, 38]. More recently, there have been applications of ABC to other areas including stereology [9], stochastic differential equations [34] and finance [33]. The basic rejection scheme is limited by its low acceptance probability when the posterior is far away from the prior [31] or when the dimension of the summary statistic is high [4]. Importance sampling can improve upon rejection sampling by proposing parameter values in areas of high posterior density, in order to increase the acceptance probability. Alternatives to importance sampling include MCMC [31, 43, 41] and sequential Monte Carlo, which attempts to move the sample towards the area of high posterior density [3, 15]. The choice of the proposal distribution is key to the performance of importance sampling. [17] used a pilot stage to find the high posterior density region for constructing the proposal distribution, and [7] used an iterative procedure to learn good proposal distributions. However, as the proposal moves closer to the posterior distribution, one concern is the increased Monte Carlo variance due to increasingly skewed importance weights, the effect of which is unclear.

Whilst ABC methods have been widely used, their theoretical understanding is still limited, and theory to date has often focussed on specific aspects of ABC. By ignoring the Monte Carlo error, the asymptotic properties of some ABC estimators of the parameter have been analysed.
For example, [39, 26] show the consistency of the maximum a posteriori estimator of the ABC posterior density; [14] and [13] devise an ABC procedure for the hidden Markov model based on the full observations, instead of a summary statistic, and give the consistency and the asymptotic normality of the ABC posterior and of the estimator maximising the approximate likelihood. There are also results for choosing the optimal summary statistic for parameter estimation or model choice [17, 35], and conditions on the summary statistic that are required if we wish to be able to distinguish between competing models [30]. For the choice of $\varepsilon$ in rejection ABC, [6], [5] and [1] investigate how it should scale with the Monte Carlo sample size, $N$, by obtaining the asymptotic MSE relative to the posterior mean based on $s_{\mathrm{obs}}$.

There have been separate results around the implementation of different Monte Carlo algorithms for ABC. For example, [27] looks at the conditions under which MCMC algorithms in ABC are geometrically ergodic. [17] gives the optimal proposal density for the importance sampling implementation, in the sense of maximising the effective sample size (ESS) of [28] of the sample weights.

1.1. Contributions and Main Results. Assume the true parameter is $\theta_0$, and some function of the parameter, $h(\theta)$, is of interest. In Algorithm 1, the ABC estimator $\hat{h}$ of $h(\theta_0)$ is obtained using a weighted average of $h(\theta)$ over the accepted $\theta$. In this paper, we study the asymptotic behaviour of the approximation accuracy of $\hat{h}$, considering all sources of error, for fixed but large Monte Carlo size as the number of observations increases. Our
key assumption behind the results is that, as the size of the data set increases, the summary statistic obeys a central limit theorem. Our goal is to find out, for increasing $n$ and fixed $N$, whether the efficiency of $\hat{h}$ can increase at the same rate as that of the maximum likelihood estimator for $h(\theta)$ given the summary statistic. We will use the terminology MLES of $h(\theta)$ to denote this maximum likelihood estimator given the summary.

To help understand the results we will consider ABC applied to a simple Gaussian example, for which we can analytically calculate the ABC posterior and properties of IS-ABC. Informally, our assumption that the summary statistics obey a central limit theorem means that the asymptotic behaviour of ABC will be qualitatively similar to its behaviour on this example.

Assume a sample of even size, $y_1, \ldots, y_n$, with $y_i$ independently drawn from a $N(\theta, 1)$ distribution. Assume that we have a two-dimensional summary statistic
\[
s_n(y) = \Big( \frac{2}{n} \sum_{i=1}^{n/2} y_i,\ \frac{2}{n} \sum_{i=n/2+1}^{n} y_i \Big),
\]
the averages of the first $n/2$ and last $n/2$ data points respectively. The ABC posterior will depend on this two-dimensional summary through the average of its two components, and we let $\bar{s}(y)$ denote this average. For details of the derivation of the analytical expressions shown below, see Appendix D. We will assume a prior for $\theta$ which is standard normal.

Our first set of results relates to the ABC posterior. We choose a kernel and bandwidth which is equivalent to an independent marginal Gaussian density with variance $\varepsilon^2$, for which the bandwidth is proportional to $\varepsilon$. The ABC posterior for this simple model is
\[
N\Big( \frac{\bar{s}_{\mathrm{obs}}}{1/n + \varepsilon^2 + 1},\ \frac{1/n + \varepsilon^2}{1/n + \varepsilon^2 + 1} \Big).
\]
The ABC posterior differs from the true posterior due to terms which are $O(\varepsilon^2)$ in both the mean and variance. If we consider the ABC estimate of $h(\theta)$, for some function $h$ that has bounded derivatives, and assume $\varepsilon = o(n^{-1/4})$, its mean will be
\[
h\Big( \frac{\bar{s}_{\mathrm{obs}}}{1 + 1/n + \varepsilon^2} \Big) = h(\bar{s}_{\mathrm{obs}}) + o_p(n^{-1/2}).
\]
Now $\bar{s}_{\mathrm{obs}}$ is just the MLES for $\theta$. So the mean of the ABC estimate is just the MLES for $h(\theta)$ plus terms which are negligible as $n \to \infty$.
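The Gaussian ABC posterior above follows from a standard conjugate calculation. The following is a sketch of that calculation, assuming (as the text states) that the kernel acts on $\bar{s}$ like a Gaussian density with variance $\varepsilon^2$; the shorthand $v_n = 1/n + \varepsilon^2$ is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
\bar{s} \mid \theta &\sim N(\theta,\, 1/n), \\
f_{ABC}(\bar{s}_{obs} \mid \theta, \varepsilon)
  &\propto N(\bar{s}_{obs};\ \theta,\ v_n), \qquad v_n = 1/n + \varepsilon^2, \\
\pi_{ABC}(\theta \mid \bar{s}_{obs}, \varepsilon)
  &\propto \exp\{-\theta^2/2\}\, \exp\{-(\bar{s}_{obs} - \theta)^2 / (2 v_n)\}, \\
\text{posterior precision} &= 1 + \frac{1}{v_n} = \frac{1 + v_n}{v_n}, \\
\text{posterior mean} &= \frac{\bar{s}_{obs}/v_n}{1 + 1/v_n}
  = \frac{\bar{s}_{obs}}{1 + 1/n + \varepsilon^2}, \qquad
\text{posterior variance} = \frac{v_n}{1 + v_n}
  = \frac{1/n + \varepsilon^2}{1 + 1/n + \varepsilon^2}, \\
\frac{\bar{s}_{obs}}{1 + v_n} - \bar{s}_{obs}
  &= -\frac{v_n\, \bar{s}_{obs}}{1 + v_n} = O_p(1/n + \varepsilon^2) = o_p(n^{-1/2})
  \quad \text{when } \varepsilon = o(n^{-1/4}).
\end{align*}
```

The last line is the step behind the $o_p(n^{-1/2})$ remainder: since $h$ has bounded derivatives, a perturbation of this size in the argument of $h$ changes its value by the same order.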
The asymptotic distribution of the MLES is Gaussian, and thus the ABC posterior mean will also have the same asymptotic distribution. Our Theorem 3.1, a Bernstein-von Mises type result, shows that such behaviour holds in general. Furthermore, we can get an ABC estimate with asymptotically equivalent mean if we just use a one-dimensional summary statistic, $\bar{s}(y)$. We show in Proposition 3.1 that for any
$d$-dimensional summary statistic, with $d$ greater than the dimension, $p$, of the parameter, there is an equivalent $p$-dimensional summary statistic achieving the same asymptotic distribution for the ABC posterior mean.

Note that whilst for $\varepsilon = o(n^{-1/4})$ we have that the ABC posterior mean is asymptotically equivalent to the MLES, the ABC posterior is not necessarily a good approximation to the posterior distribution given the summaries. In particular the ABC posterior has a larger variance than the true posterior. If $\varepsilon = O(n^{-1/2})$ then it will over-estimate the posterior variance by a constant factor as $n \to \infty$, and this corresponds to an equivalent overestimate of the uncertainty in ABC estimates of the parameters. If $\varepsilon$ decreases to 0 more slowly than $O(n^{-1/2})$, then the ABC posterior variance will be $O(\varepsilon^2)$ rather than $O(1/n)$. To obtain an accurate estimate of the true posterior given the summary statistics as $n \to \infty$ we would need $n\varepsilon^2 = o(1)$, but as we shall see, this leads to a deterioration in the Monte Carlo performance of the IS-ABC algorithm.

Our second set of results focuses on how the Monte Carlo error of IS-ABC affects the accuracy of the final ABC estimator. Firstly note that we can bound the performance of IS-ABC by that of an algorithm which generates $N$ i.i.d. draws from the ABC posterior. The Monte Carlo variance of such an algorithm will be equal to the ABC posterior variance divided by the Monte Carlo sample size, $N$. So if $\varepsilon$ decreases to 0 more slowly than $O(n^{-1/2})$ the Monte Carlo variance will dominate the variance of $\hat{h}$.

For IS-ABC we will consider a class of proposal distributions that are tempered versions of the ABC posterior, defined for $\alpha \ge 0$ as
\[
\pi^{(\alpha)}_{\mathrm{ABC}}(\theta) \propto \pi(\theta) f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)^{\alpha}.
\]
For the above model with summary statistic $\bar{s}(y)$ this corresponds to the following proposal distribution for $\theta$:
\[
N\Big( \frac{\alpha\, \bar{s}_{\mathrm{obs}}}{1/n + \varepsilon^2 + \alpha},\ \frac{1/n + \varepsilon^2}{\alpha + 1/n + \varepsilon^2} \Big).
\]
Denote the mean and variance of this distribution as $\mu_\alpha$ and $\sigma^2_\alpha$ respectively.
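The mean and variance of the tempered proposal in the Gaussian example are simple enough to compute directly. The helper below is a small sketch; the function name `tempered_proposal` and its argument names are hypothetical, and it only transcribes the closed-form expressions above for a standard normal prior.

```python
def tempered_proposal(s_bar_obs, n, eps, alpha):
    """Mean and variance of the tempered ABC proposal in the Gaussian example.

    Assumes y_i ~ N(theta, 1), a N(0, 1) prior and a Gaussian kernel with
    variance eps^2 acting on the averaged summary s_bar.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    v = 1.0 / n + eps ** 2                  # variance of the ABC likelihood in theta
    mean = alpha * s_bar_obs / (v + alpha)  # mu_alpha
    variance = v / (alpha + v)              # sigma_alpha^2
    return mean, variance


# alpha = 0 recovers the N(0, 1) prior; alpha = 1 recovers the ABC posterior.
prior_mean, prior_var = tempered_proposal(0.5, n=100, eps=0.2, alpha=0.0)
post_mean, post_var = tempered_proposal(0.5, n=100, eps=0.2, alpha=1.0)
half_mean, half_var = tempered_proposal(0.5, n=100, eps=0.2, alpha=0.5)
```

For small $1/n + \varepsilon^2$ the variance at $\alpha = 1/2$ is close to twice the variance at $\alpha = 1$, while the two means are close to each other.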
It is straightforward to see that the marginal distribution of summary statistics simulated in IS-ABC will also be normal, with mean $\mu_\alpha$ and variance $\sigma^2_\alpha + 1/n$. Informally, to have non-negligible acceptance probability we need the simulated summary statistics to lie within squared distance $O(\varepsilon^2)$ of $\bar{s}_{\mathrm{obs}}$. This means that both $\sigma^2_\alpha + 1/n$ and $(\bar{s}_{\mathrm{obs}} - \mu_\alpha)^2$ must be $O(\varepsilon^2)$, and this occurs if and only if $\alpha > 0$ and $\varepsilon^2 \ge c/n$ for some $c > 0$. Analytic expressions for the acceptance probability for our model, which confirm this intuition, are given in Appendix D. In Theorem 5.1 we demonstrate that this behaviour holds for ABC in general.

For the Monte Carlo variance of IS-ABC to be well-behaved we also need the variance of the normalised weights assigned to the accepted $\theta$ values not to blow up as $n$ increases. Note that controlling this variance is non-trivial, as the expected value of the original, un-normalised, weights goes to 0 as $n$ increases. Thus standard methods [e.g. 25] which bound the original weights do not work. For our Gaussian example, the above
discussion of the acceptance probability suggests that to control the Monte Carlo variance we want $\varepsilon^2 = c/n$ for some positive constant $c$. Under this condition we can show that the variance of the normalised IS weights depends on the ratio of the ABC posterior variance to the variance of the distribution of $\theta$ values that are accepted in IS-ABC. Similar to the standard result for importance sampling with a Gaussian proposal and Gaussian target, we need the latter variance to be greater than half the former. For our example, as $n \to \infty$ this occurs if and only if $\alpha < 1$ (see Appendix D). In Theorem 5.2 we show that IS-ABC using a tempered proposal with $\alpha \in (0, 1)$ leads to a Monte Carlo variance that is well-behaved as $n \to \infty$, and that the resulting asymptotic variance of $\hat{h}$ is $1 + O(1/N)$ times the variance of the MLES.

1.2. Outline of Paper. The paper is organised as follows. Section 2 sets up some notation and presents the key assumptions for the main theorems. Section 3 gives the asymptotic normality of the ABC posterior mean of $h(\theta)$ for $n \to \infty$. Section 4 gives the asymptotic normality of $\hat{h}$ when $N \to \infty$. In Section 5, the relative asymptotic efficiency between the MLES and $\hat{h}$ is studied for various proposal densities. An iterative importance sampling algorithm is proposed and a comparison between ABC and indirect inference (II) is given. In Section 6 we demonstrate our results empirically on a stochastic volatility model. Section 7 concludes with some discussion.

2. Notation and Set-up. As mentioned above, we denote the data by $Y_{\mathrm{obs}} = (y_{\mathrm{obs},1}, \ldots, y_{\mathrm{obs},n})$, where $n$ is the sample size, and each observation, $y_{\mathrm{obs},i}$, can be of arbitrary dimension. We will be considering asymptotics as $n \to \infty$, and thus denote the density of $Y_{\mathrm{obs}}$ by $f_n(y \mid \theta)$. This density depends on an unknown parameter $\theta$. We will let $\theta_0$ denote the true parameter value, and $\pi(\theta)$ the prior distribution for the parameter. Let $p$ be the dimension of $\theta$ and $P$ be the parameter space. For a set $A$, let $A^c$ be its complement with respect to the whole space.
We assume that $\theta_0$ is in the interior of the parameter space, as implied by the following condition:

(C1) There exists some $\delta_0 > 0$, such that $P_0 = \{\theta : \|\theta - \theta_0\| < \delta_0\} \subset P$.

To implement ABC we will use a summary statistic of the data, $s_n(Y) \in \mathbb{R}^d$; for example a vector of sample means of appropriately chosen functions of the data. This summary statistic will be of fixed dimension, $d$, as we vary $n$. The density for $s_n(Y)$, implied by the density for the data, will depend on $n$, and we denote this by $f_n(s \mid \theta)$. We will use the shorthand $S_n$ to denote the random variable with density $f_n(s \mid \theta)$. In ABC we use a kernel, $K(x)$, with $\max_x K(x) = 1$, and a bandwidth $\varepsilon > 0$. As we vary $n$ we will often wish to vary $\varepsilon$, and in these situations we denote the bandwidth by $\varepsilon_n$. For the importance sampling algorithm we require a proposal distribution, $q_n(\theta)$, and allow for this to depend on $n$. We assume the following conditions on the kernel:

(C2) (i) $\int v K(v)\, dv = 0$ and $\int v_i v_j v_k K(v)\, dv = 0$ for any different coordinates $(v_i, v_j, v_k)$ of $v$.
(ii) $K(v)$ is spherically symmetric, i.e. $K(v) = K(\|v\|)$, and $K(v)$ is a decreasing function of $\|v\|$.
(iii) $K(v) = O(e^{-c_1 \|v\|^{\alpha_1}})$ for some $\alpha_1 > 0$ and $c_1 > 0$ as $\|v\| \to \infty$.

In (C2), (i) is satisfied by all commonly used kernels in ABC; (ii) can be assumed without loss of generality, since $\pi_{\mathrm{ABC}}$ with an elliptically symmetric kernel is equivalent to $\pi_{\mathrm{ABC}}$ with a spherically symmetric kernel and a linearly transformed $s_{\mathrm{obs}}$; (iii) is satisfied by kernels with bounded support or exponentially decreasing tails, such as the Gaussian kernel.

For a real function $g(x)$ with vector argument $x$, at $x = x_0$, denote its $k$th partial derivative by $D_{x_k} g(x_0)$, the gradient function by $D_x g(x_0)$ and the Hessian matrix by $H_x g(x_0)$. To simplify the notation, $D_{\theta_k}$, $D_\theta$ and $H_\theta$ are written as $D_k$, $D$ and $H$ respectively. For a sequence $x_n$, besides the limit notations $O(\cdot)$ and $o(\cdot)$, we use the following notation: for large enough $n$, $x_n = \Theta(a_n)$ if there exist constants $m$ and $M$ such that $0 < m < |x_n/a_n| < M < \infty$, and $x_n = \Omega(a_n)$ if $|x_n/a_n| \to \infty$.

The asymptotic results are based around assuming a central limit theorem for the summary statistic.

(C3) There exists a sequence $a_n$, with $a_n \to \infty$ as $n \to \infty$, a $d$-dimensional vector $s(\theta)$ and a $d \times d$ matrix $A(\theta)$, such that for all $\theta \in P$,
\[
a_n (S_n - s(\theta)) \xrightarrow{L} N(0, A(\theta)), \quad \text{as } n \to \infty.
\]
Furthermore,
(i) $s(\theta)$ and $A(\theta) \in C^1(P_0)$, and $A(\theta)$ is positive definite for any $\theta$;
(ii) $s(\theta) = s(\theta_0)$ if and only if $\theta = \theta_0$; and
(iii) $I(\theta) = Ds(\theta)^T A^{-1}(\theta) Ds(\theta)$ has full rank at $\theta = \theta_0$.

Under condition (C3), $a_n$ is the rate of convergence in the central limit theorem. If the data are independent and identically distributed, and the summaries consist of sample means of functions of the data, then $a_n = n^{1/2}$. Part (ii) of this condition is required for the true parameter to be identifiable given only the summary of the data. Furthermore, $I^{-1}(\theta_0)/a_n^2$ is the asymptotic variance of the MLES for $\theta$, so part (iii) requires it to be well defined at the true parameter.

We next require a condition that controls the difference between $f_n(s \mid \theta)$ and its limiting distribution for $\theta \in P_0$ and $s$ close to $s(\theta_0)$.
This condition is similar to that assumed by [12] when they looked at the asymptotics of the MLES for $\theta$. Let $N(x; \mu, \Sigma)$ be the normal density at $x$ with mean $\mu$ and variance $\Sigma$. Define $\tilde{f}_n(s \mid \theta) = N(s; s(\theta), A(\theta)/a_n^2)$, $LR_n(s, \theta) = \log(f_n(s \mid \theta)/\tilde{f}_n(s \mid \theta))$ and $LR_n(\theta) = LR_n(s_{\mathrm{obs}}, \theta)$. Then the condition is:

(C4) $\sup_{\theta \in P_0} \sup_{\|s - s(\theta_0)\| \le M} |LR_n(s, \theta)| = o(1)$ for any positive constant $M$, $a_n^{-1} D_\theta LR_n(\theta_0) = o_p(1)$ and $\sup_{\theta \in P_0} a_n^{-2} H_\theta LR_n(\theta) = o_p(1)$.

We also need a condition that ensures the tails of $f_n(s \mid \theta)$ are exponentially decreasing.
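(C5) $\sup_{\theta \in P_0^c} \sup_{\|s - s(\theta_0)\| \le M_1} f_n(s \mid \theta) = O(e^{-c_2 a_n^{\alpha_2}})$ for some positive constants $M_1$, $c_2$ and $\alpha_2$.

The following condition requires an appropriate choice of $K(v)$ such that the approximate likelihood $f_{\mathrm{ABC}}$, as an integral in $\mathbb{R}^d$, mainly depends on the integration over a compact set around $s_{\mathrm{obs}}$.

(C6) $\exists M_2 > 0$ such that
\[
\sup_{\theta \in P_0} \left[ \int_{\|v\| \ge M_2 \varepsilon_n^{-1}} f_n(s_{\mathrm{obs}} + \varepsilon_n v \mid \theta) K(v)\, dv \Big/ f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon_n) \right] = o_p(1).
\]

When the support of $K(v)$ is bounded, (C6) obviously holds. For $K(v)$ with unbounded support, a sufficient condition for (C6) to hold is that the tails of $K(v)$ decrease fast enough, as stated below.

(C6') $\exists M_2 > 0$ such that $\sup_{\|v\| \ge M_2} \varepsilon_n^{-d} K(\varepsilon_n^{-1} v) = o_p\big( \inf_{\theta \in P_0,\ \|s - s_{\mathrm{obs}}\| \le M_2} f_n(s \mid \theta) \big)$.

Some continuity and moment conditions on the prior distribution are required.

(C7) $\pi(\theta)$ is continuous in $P_0$ and $\pi(\theta_0) > 0$.
(C8) $\int \|\theta\| \pi(\theta)\, d\theta < \infty$ and $\int \|\theta\|^2 \pi(\theta)\, d\theta < \infty$.

Finally, the function of interest $h(\theta)$ needs to satisfy some differentiability and moment conditions in order that the remainders of its posterior moment expansion are small. Consider the $k$th coordinate $h_k(\theta)$ of $h(\theta)$.

(C9) $h_k(\theta) \in C^1(P_0)$ and $D_k h(\theta_0) \ne 0$.
(C10) $\int |h_k(\theta)| \pi(\theta)\, d\theta < \infty$ and $\int h_k(\theta)^2 \pi(\theta)\, d\theta < \infty$.

3. Asymptotics of $h_{\mathrm{ABC}}$. We first ignore the Monte Carlo error of ABC, and focus on the ideal ABC estimator, $h_{\mathrm{ABC}}$, where $h_{\mathrm{ABC}} = E_{\pi_{\mathrm{ABC}}}[h(\theta) \mid s_{\mathrm{obs}}, \varepsilon_n]$. As an approximation to the true posterior mean, $E[h(\theta) \mid Y_{\mathrm{obs}}]$, $h_{\mathrm{ABC}}$ contains the errors from the choice of the bandwidth, $\varepsilon_n$, and the summary statistic $s_{\mathrm{obs}}$. To understand the effect of these two sources of error, we derive a result for the asymptotic distribution of $h_{\mathrm{ABC}}$, where we consider randomness solely due to the randomness of the data.

Theorem 3.1. Assume conditions (C1)-(C5), (C7)-(C10), and (C11)-(C16) in the appendix. Then if $\varepsilon_n = o(1/\sqrt{a_n})$,
\[
a_n (h_{\mathrm{ABC}} - h(\theta_0)) \xrightarrow{L} N(0, Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)), \quad \text{as } n \to \infty.
\]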
Theorem 3.1 says that when $\varepsilon_n$ goes to 0 at a rate faster than $1/\sqrt{a_n}$, the bias brought by $\varepsilon_n$ is asymptotically negligible. Hence regardless of the sufficiency of $s_{\mathrm{obs}}$, the ABC estimator is consistent and asymptotically normal, with asymptotic variance equal to the Cramér-Rao lower bound for estimating $\theta$ given the summary statistic. This is minimised by any sufficient statistic satisfying (C3), as illustrated in the remark below, and also by choices such as $E[\theta \mid Y_{\mathrm{obs}}]$ suggested in [17, Theorem 3].

How to choose the dimension $d$ of $s_{\mathrm{obs}}$ is of interest, since larger $d$ gives a possibly more informative $s_{\mathrm{obs}}$ but slower convergence of $\hat{h}$ as $N$ increases [8]. The following proposition states that when $d$ exceeds the dimension of the parameter, $h_{\mathrm{ABC}}$ based on $s_{\mathrm{obs}}$ is equivalent to first order to $h_{\mathrm{ABC}}$ based on $p$ linear combinations of $s_{\mathrm{obs}}$. Thus we can use a $p$-dimensional statistic without loss of asymptotic efficiency.

Proposition 3.1. Assume the conditions of Theorem 3.1. If $d$ is larger than $p$, let $C = Ds(\theta_0)^T A^{-1}(\theta_0)$; then $I_C(\theta_0) = I(\theta_0)$, where $I_C(\theta)$ is the $I(\theta)$ matrix of the summary statistic $C S_n$. Therefore $h_{\mathrm{ABC}}$ based on $C s_{\mathrm{obs}}$ and on $s_{\mathrm{obs}}$ have the same asymptotic variance.

Proof. The equality can be verified by algebra.

Remark 3.1. Consider the MLES for the parameter,
\[
\theta_{\mathrm{MLES}} = \operatorname*{argmax}_{\theta \in P} \log f_n(s_{\mathrm{obs}} \mid \theta),
\]
and the corresponding MLES for our function of interest, $h(\theta_{\mathrm{MLES}})$. Theorem 3.1 is based on two results. First, Lemma 3 states that
\[
a_n (h(\theta_{\mathrm{MLES}}) - h(\theta_0)) \xrightarrow{L} N(0, Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)),
\]
which means that $h(\theta_{\mathrm{MLES}})$ shares a similar central limit theorem to the standard MLE based on the full data, but with a different asymptotic variance that depends on the convergence properties of $s_{\mathrm{obs}}$. This is more general than the convergence result for the MLES in [12], which assumes $P$ is compact. Second, $h_{\mathrm{ABC}}$ is the same as $h(\theta_{\mathrm{MLES}})$ to first order, through a Bernstein-von Mises type of convergence for the posterior distribution and expectations, stated in Lemmas 4 and 5 in Appendix A.
[46] developed a similar convergence result for the posterior distribution, which is limited to the case $p = d$. The equivalence between $h_{\mathrm{ABC}}$ and $h(\theta_{\mathrm{MLES}})$ also implies that the optimal asymptotic variance of $h_{\mathrm{ABC}}$ is the Cramér-Rao lower bound, achieved when $s_{\mathrm{obs}}$ is sufficient.

Remark 3.2. The order $o(1/\sqrt{a_n})$ of $\varepsilon_n$ is surprising due to the following observation. In [45] it is noted that the ABC posterior is the posterior under a wrong model likelihood. Specifically, let $S_{n,\varepsilon} = S_n + \varepsilon X$ where $X \sim K(x)$. The approximate likelihood $f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)$ used in ABC is the density of $S_{n,\varepsilon}$. If $\varepsilon = o(1/a_n)$ then $a_n (S_{n,\varepsilon} - S_n)$ will tend to 0 for large $n$, and we would expect the error introduced through using a non-zero $\varepsilon$ to be negligible. However, the theorem gives a much weaker condition on $\varepsilon$ for the bias to be asymptotically negligible.
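The algebra behind Proposition 3.1 can be written out in a few lines. The following is a sketch, treating $Ds(\theta)$ as the $d \times p$ Jacobian of $s(\theta)$ and writing $s_C$, $A_C$ for the limiting mean and covariance of the transformed summary $C S_n$; these two symbols are introduced here for the derivation and are not notation from the original.

```latex
\begin{align*}
a_n\big(C S_n - C s(\theta)\big) &\xrightarrow{L} N\big(0,\ C A(\theta) C^T\big)
  \quad\Longrightarrow\quad
  s_C(\theta) = C s(\theta),\quad A_C(\theta) = C A(\theta) C^T,\quad D s_C(\theta) = C\, Ds(\theta), \\
C\, Ds(\theta_0) &= Ds(\theta_0)^T A^{-1}(\theta_0)\, Ds(\theta_0) = I(\theta_0), \\
C A(\theta_0) C^T &= Ds(\theta_0)^T A^{-1}(\theta_0)\, A(\theta_0)\, A^{-1}(\theta_0)\, Ds(\theta_0) = I(\theta_0), \\
I_C(\theta_0) &= D s_C(\theta_0)^T A_C^{-1}(\theta_0)\, D s_C(\theta_0)
  = I(\theta_0)^T\, I(\theta_0)^{-1}\, I(\theta_0) = I(\theta_0).
\end{align*}
```

The second and third lines use the symmetry of $A(\theta_0)$, and the last line uses the symmetry of $I(\theta_0)$ together with its invertibility, which is guaranteed by (C3)(iii).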
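Theorem 3.1 leads to the following natural definition.

Definition 1. Assume that the conditions of Theorem 3.1 hold. Then the asymptotic variance of $h_{\mathrm{ABC}}$ is
\[
AV_{h_{\mathrm{ABC}}} = \frac{1}{a_n^2} Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0).
\]

4. Asymptotic Monte Carlo Error of ABC. We now consider the Monte Carlo error involved in estimating $h_{\mathrm{ABC}}$. Here we fix the data and consider randomness solely in terms of the stochasticity of the Monte Carlo algorithm. We focus on the importance sampling algorithm given in the introduction. Remember that $N$ is the Monte Carlo sample size. For $i = 1, \ldots, N$, $\theta_i$ is the proposed parameter value and $w_i$ is its importance sampling weight. Let $\phi_i$ be the indicator that is 1 if and only if $\theta_i$ is accepted in step 3 of Algorithm 1, and let $N_{acc} = \sum_{i=1}^{N} \phi_i$ be the number of accepted parameters. Provided $N_{acc} \ge 1$ we can estimate $h_{\mathrm{ABC}}$ from the output of the importance sampling algorithm with
\[
\hat{h} = \sum_{i=1}^{N} h(\theta_i) w_i \phi_i \Big/ \sum_{i=1}^{N} w_i \phi_i.
\]
Define
\[
p_{acc,q} = \int q(\theta) \int f_n(s \mid \theta) K_\varepsilon(s - s_{\mathrm{obs}})\, ds\, d\theta,
\]
which is the acceptance probability of the importance sampling algorithm proposing from $q(\theta)$. Furthermore, define $q_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon) \propto q_n(\theta) f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)$, the density of the accepted parameter; and
\[
\Sigma_{IS,n} = E_{\pi_{\mathrm{ABC}}}\left[ (h(\theta) - h_{\mathrm{ABC}})^2\, \frac{\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)}{q_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)} \right] \tag{2}
\]
and $\Sigma_{\mathrm{ABC},n} = p_{acc,q}^{-1} \Sigma_{IS,n}$, where $\Sigma_{IS,n}$ is the IS variance with $\pi_{\mathrm{ABC}}$ as the target density and $q_{\mathrm{ABC}}$ as the proposal density. Note that $p_{acc,q}$ and $\Sigma_{IS,n}$, and hence $\Sigma_{\mathrm{ABC},n}$, depend on $s_{\mathrm{obs}}$. Standard results give the following asymptotic distribution of $\hat{h}$.

Proposition 4.1. For a given $n$ and $s_{\mathrm{obs}}$, if $h_{\mathrm{ABC}}$ and $\Sigma_{\mathrm{ABC},n}$ are finite, then
\[
\sqrt{N} (\hat{h} - h_{\mathrm{ABC}}) \xrightarrow{L} N(0, \Sigma_{\mathrm{ABC},n}), \quad \text{as } N \to \infty.
\]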
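The self-normalised estimator $\hat{h}$ defined in this section is a weighted average over the accepted draws only. A minimal sketch follows; the function name `abc_estimate` and its argument names are hypothetical, and the error raised when $N_{acc} = 0$ is a design choice made here to reflect the requirement $N_{acc} \ge 1$.

```python
def abc_estimate(h, thetas, weights, accepted):
    """Weighted average of h over the accepted draws of an IS-ABC run.

    thetas   -- proposed parameter values theta_1, ..., theta_N
    weights  -- importance weights w_i = prior(theta_i) / proposal(theta_i)
    accepted -- indicators phi_i (True if theta_i was accepted in step 3)
    """
    numerator = 0.0
    denominator = 0.0
    for theta, w, phi in zip(thetas, weights, accepted):
        if phi:
            numerator += h(theta) * w
            denominator += w
    if denominator == 0.0:
        # The estimator is only defined when at least one draw is accepted.
        raise ValueError("no accepted draws: the estimate is undefined")
    return numerator / denominator


# Hypothetical toy input: the second draw is rejected and does not contribute.
example = abc_estimate(lambda t: t * t, [1.0, 2.0, 3.0], [1.0, 2.0, 1.0], [True, False, True])
```

Because the weights appear in both numerator and denominator, they only need to be known up to a common constant, so unnormalised prior and proposal densities are sufficient.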
Proposition 4.1 motivates the following definition.

Definition 2. For a given $n$ and $s_{\mathrm{obs}}$, assume that the conditions of Proposition 4.1 hold. Then the asymptotic Monte Carlo variance of $\hat{h}$ is
\[
MCV_{\hat{h}} = \frac{1}{N} \Sigma_{\mathrm{ABC},n}.
\]

From Proposition 4.1, it can be seen that the asymptotic Monte Carlo variance of $\hat{h}$ is equal to the IS variance $\Sigma_{IS,n}$ divided by the expected number of acceptances $N p_{acc,q}$, and therefore depends on the proposal distribution and $\varepsilon$ through these two terms.

Remark 4.1 (Optimal proposal density). According to the alternative expression for $\Sigma_{\mathrm{ABC},n}$ in the proof of Proposition 4.1,
\[
\Sigma_{\mathrm{ABC},n} = p_{acc,\pi}^{-1}\, E_{\pi_{\mathrm{ABC}}}\left[ (h(\theta) - h_{\mathrm{ABC}})^2\, \frac{\pi(\theta)}{q_n(\theta)} \right], \tag{3}
\]
the optimal proposal density minimising $MCV_{\hat{h}}$ is the density proportional to $|h(\theta) - h_{\mathrm{ABC}}|\, \pi(\theta) f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon)^{1/2}$. This can be obtained in the same way as the optimal proposal density for the ratio estimate in importance sampling [24, Chapter 2].

5. Asymptotic Properties of Rejection and Importance Sampling ABC. We have defined the asymptotic variance of $h_{\mathrm{ABC}}$ as $n \to \infty$, and the asymptotic Monte Carlo variance of $\hat{h}$ as $N \to \infty$. The error of $h_{\mathrm{ABC}}$ when estimating $h(\theta_0)$ and the Monte Carlo error of $\hat{h}$ when estimating $h_{\mathrm{ABC}}$ are independent of each other. This suggests the following definition.

Definition 3. Assume that the conditions of Theorem 3.1 hold, and that $h_{\mathrm{ABC}}$ and $\Sigma_{\mathrm{ABC},n}$ are bounded in probability for any $n$. Then the asymptotic variance of $\hat{h}$ is
\[
AV_{\hat{h}} = \frac{1}{a_n^2} Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0) + \frac{1}{N} \Sigma_{\mathrm{ABC},n}.
\]

That is, the asymptotic variance of $\hat{h}$ is the sum of its asymptotic Monte Carlo variance for estimating $h_{\mathrm{ABC}}$ and the asymptotic variance of $h_{\mathrm{ABC}}$. As mentioned in Remark 3.1, the first term on the right-hand side is the asymptotic variance of the MLES for $h(\theta)$. Therefore let $AV_{\mathrm{MLES}} = a_n^{-2} Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)$.

We now wish to investigate the properties of this asymptotic variance, for large but fixed $N$, as $n \to \infty$. In particular we are interested in how $AV_{\hat{h}}$ compares to $AV_{\mathrm{MLES}}$, and how this depends on the choice of $\varepsilon_n$ and $q_n(\theta)$. Thus we introduce the following definition:
Definition 4. For a choice of $\varepsilon_n$ and $q_n(\theta)$, we define the asymptotic efficiency of $\hat{h}$ as
\[
AE_{\hat{h}} = \lim_{n \to \infty} \frac{AV_{\mathrm{MLES}}}{AV_{\hat{h}}}.
\]
If this limiting value is 0, we say that $\hat{h}$ is asymptotically inefficient.

We will investigate the asymptotic efficiency of $\hat{h}$ under the assumption of Theorem 3.1 that $\varepsilon_n = o(1/\sqrt{a_n})$. We will further define $c_\varepsilon = \lim_{n \to \infty} a_n \varepsilon_n$, and assume that this limit exists. Note that $c_\varepsilon$ can be either a constant or infinity.

We will consider a family of proposal densities, defined for $\alpha \in [0, 1]$,
\[
\pi^{(\alpha)}_{\mathrm{ABC}}(\theta) \propto \pi(\theta) f_{\mathrm{ABC}}(s_{\mathrm{obs}} \mid \theta, \varepsilon_n)^{\alpha}.
\]
These can be viewed as tempered versions of the ABC posterior. For $\alpha = 0$ and $1$, $\pi^{(\alpha)}_{\mathrm{ABC}}(\theta)$ is $\pi(\theta)$ and $\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$ respectively. For $\alpha = 1/2$, $\pi^{(\alpha)}_{\mathrm{ABC}}(\theta)$ is the proposal density maximising the ESS of [28], as shown in [17]. Whilst we could not use $\pi^{(\alpha)}_{\mathrm{ABC}}$ directly as a proposal distribution, except when $\alpha = 0$, this family should give us insight into the behaviour of different proposal distributions as we try to sample increasingly in areas of high ABC-posterior mass.

First we show that if we propose from the prior ($\alpha = 0$) or the posterior ($\alpha = 1$) then the ABC estimator is asymptotically inefficient. Let $a_{n,\varepsilon} = a_n \mathbf{1}_{c_\varepsilon < \infty} + \varepsilon_n^{-1} \mathbf{1}_{c_\varepsilon = \infty}$. Recalling the interpretation in Remark 3.2, and given (C3), $a_{n,\varepsilon}$ is the convergence rate of $S_{n,\varepsilon}$.

Theorem 5.1. Assume the conditions of Theorem 3.1 and (C6). Consider a fixed $N$. Then we have:
(i) If $q_n(\theta) = \pi(\theta)$, $p_{acc,q} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^{d-p})$ and $\Sigma_{IS,n} = \Theta_p(a_{n,\varepsilon}^{-2})$.
(ii) If $q_n(\theta) = \pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$, $p_{acc,q} = \Theta_p(\varepsilon_n^d a_{n,\varepsilon}^{d})$ and $\Sigma_{IS,n} = \Theta_p(a_{n,\varepsilon}^{p})$.
In both cases $\hat{h}$ is asymptotically inefficient.

The reason why $\hat{h}$ is asymptotically inefficient is that the Monte Carlo variance decays more slowly than $1/a_n^2$ as $n \to \infty$. However, the problem with the Monte Carlo variance is caused by different factors in each case. To see this, consider the acceptance probability of a value of $\theta$ and corresponding summary $s_n$ simulated in one iteration of the IS-ABC algorithm.
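This acceptance probability depends on
\[
\frac{s_n - s_{\mathrm{obs}}}{\varepsilon_n} = \frac{1}{\varepsilon_n} \big[ (s_n - s(\theta)) + (s(\theta) - s(\theta_0)) + (s(\theta_0) - s_{\mathrm{obs}}) \big], \tag{4}
\]
where $s(\theta)$, defined in (C3), is the limiting value of $s_n$ as $n \to \infty$ if data is sampled from the model for parameter value $\theta$. By (C3) the first and third bracketed terms within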
the square brackets on the right-hand side are $O_p(a_n^{-1})$. If we sample from the prior, then the middle term is $O_p(1)$, and thus (4) will blow up as $\varepsilon_n$ goes to 0. Hence $p_{acc,\pi}$ goes to 0 as $\varepsilon_n$ goes to 0, and this causes the estimate to be inefficient. If we sample from the posterior, then by Theorem 3.1 we expect the middle term also to be $O_p(a_n^{-1})$. Hence (4) is well behaved as $n \to \infty$, and consequently the acceptance probability $p_{acc,q}$ with $q_n = \pi_{\mathrm{ABC}}$ is bounded away from 0, provided either $\varepsilon_n = \Theta(a_n^{-1})$ or $\varepsilon_n = \Omega(a_n^{-1})$. However, proposing from $\pi_{\mathrm{ABC}}(\theta \mid s_{\mathrm{obs}}, \varepsilon_n)$ still causes the estimate to be inefficient, due to an increasing variance of the importance weights. As $n$ increases the proposal is more and more concentrated around $\theta_0$, while $\pi$ does not change. Therefore the weight, which is the ratio of $\pi_{\mathrm{ABC}}$ to $q_{\mathrm{ABC}}$, is increasingly skewed and causes $\Sigma_{IS,n}$ to go to $\infty$.

Whilst using $\pi^{(\alpha)}_{\mathrm{ABC}}(\theta)$ with either $\alpha = 0$, the prior, or $\alpha = 1$, the posterior, leads to asymptotically inefficient estimators, the following result shows that by using $\pi^{(\alpha)}_{\mathrm{ABC}}(\theta)$ with $\alpha \in (0, 1)$ as a proposal we can avoid this problem. This is because such a choice of proposal leads to an acceptance probability that is bounded away from 0 and, if we further choose $\varepsilon_n = \Theta(a_n^{-1})$, the Monte Carlo IS variance for the accepted parameter values is $\Theta(a_n^{-2})$, i.e. of the same order as the variance of the MLES.

Theorem 5.2. Assume the conditions of Theorem 5.1 and (C17)-(C20). Consider a fixed $N$. If $q_n(\theta) = \pi^{(\alpha)}_{\mathrm{ABC}}$ with $\alpha \in (0, 1)$, then $p_{acc,q} = \Theta_p(a_{n,\varepsilon}^{d} \varepsilon_n^{d})$ and $\Sigma_{IS,n} = \Theta(a_{n,\varepsilon}^{-2})$. Then if $\varepsilon_n = \Theta(a_n^{-1})$, $AV_{\hat{h}} = (1 + K/N) AV_{\mathrm{MLES}}$ and $AE_{\hat{h}} = 1 - K/(N + K)$ for some constant $K$.

The two expressions in Theorem 5.2 agree, since $AV_{\mathrm{MLES}}/AV_{\hat{h}} = 1/(1 + K/N) = N/(N + K) = 1 - K/(N + K)$. The above result shows that a good proposal distribution, in the sense of resulting in an ABC estimator whose asymptotic efficiency is $1 - O(1/N)$, will have a threshold $\varepsilon_n$ that is $\Theta(a_n^{-1})$ and an acceptance probability that is bounded away from 0 as $n$ increases. This supports the intuitive idea of using the acceptance rate in ABC to choose the threshold, aiming for an appropriate proportion of acceptances [e.g. 15, 5].

5.1. Iterative Importance Sampling ABC.
From Theorem 5.2 and [17], we suggest proposing from an approximation to $\pi^{(1/2)}_{\mathrm{ABC}}(\theta)$. We suggest using an iterative procedure [similar in spirit to that of 3]; see Algorithm 2. In this algorithm, $N$ is the number of simulations allowed by the computing budget, $N_0 < N$, and $\{p_k\}$ is a sequence of acceptance rates, which we use to choose the bandwidth. The rule for choosing the new proposal distribution is based on the mean and variance of $\pi^{(1/2)}_{\mathrm{ABC}}(\theta)$ being approximately equal to the mean and twice the variance of $\pi_{\mathrm{ABC}}(\theta)$ respectively, as shown in the proof of Theorem 5.2. A natural choice of $q_1(\theta)$ is $\pi(\theta)$. The sequence $\{p_k\}$ can be set to decrease initially from a relatively large percentage and then stay at a small value, so that the centre $\mu_k$ can move stably towards the true parameter and a small enough bandwidth can eventually be achieved. Starting from a small percentage may accelerate the convergence, but if the summary is not accurate enough about the parameter, it may lead to an inaccurate $\mu_k$. The sequence can also be adjusted automatically by assessing some quality criterion of
the importance weights, like the ESS used in [15].

Algorithm 2: Iterative Importance Sampling ABC
At the $k$th step,
1. Run IS-ABC with simulation size $N_0$, proposal density $q_k(\theta)$ and acceptance rate $p_k$, and record the bandwidth $\varepsilon_k$.
2. If $\varepsilon_{k-1} - \varepsilon_k$ is smaller than some positive threshold, stop. Otherwise, let $\mu_{k+1}$ and $\Sigma_{k+1}$ be the empirical mean and variance matrix of the weighted sample from step 1, and let $q_{k+1}(\theta)$ be the density with centre $\mu_{k+1}$ and variance matrix $2\Sigma_{k+1}$.
3. If $q_k(\theta)$ is close to $q_{k+1}(\theta)$, stop. Otherwise, return to step 1.
After the iteration stops at the $K$th step, run IS-ABC with proposal density $q_{K+1}(\theta)$, $N - KN_0$ simulations and acceptance rate $p_{K+1}$.

When comparing $q_k(\theta)$ and $q_{k+1}(\theta)$, a simple criterion is the difference $\|\mu_k - \mu_{k+1}\| + \|\Sigma_k - \Sigma_{k+1}\|^{1/2}$. Besides constructing $q_k(\theta)$ as a unimodal density, other methods of constructing the importance proposal can be applied, including [37, 10, 44, 29]. Since Algorithm 2 uses the same number of simulations as rejection ABC and the additional calculation is negligible, the iterative procedure does not introduce additional computational cost.

5.2. Comparison with Indirect Inference. We can compare the efficiency of IS-ABC with that of Indirect Inference (II) [22]. II is an alternative likelihood-free method that involves (i) approximating the model of interest, henceforth the true model, by a tractable auxiliary model; (ii) estimating the parameters of the auxiliary model; (iii) mapping the estimates of these auxiliary model parameters to estimates of the parameters of the true model using simulation from the true model. The estimates of the auxiliary model parameters have the same role as the summary statistics in ABC. Thus if we implement ABC with these summary statistics, which of II and IS-ABC will be more accurate? In the situation where there are the same number of parameters in the auxiliary model, or equivalently summary statistics, as there are parameters in the true model, then both II and IS-ABC have similar asymptotic efficiency.
In both cases it is $1 - O(1/N)$ times the efficiency of the MLES [23]. Here $N$ is the number of simulations from the true model for either II or IS-ABC, and is proportional to the computational cost of the method. If the number of parameters in the auxiliary model is greater than the number of parameters in the true model, II requires a weight matrix to be specified. The asymptotic efficiency of II depends on this choice of weight matrix. If chosen optimally then II will obtain the same asymptotic efficiency as IS-ABC; otherwise, for sufficiently large $N$, IS-ABC will lead to more accurate estimates than II. (Note that there are simulation-based approaches that will consistently estimate the optimal weight matrix in indirect inference.)
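The iterative procedure of Algorithm 2 can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation, and several choices in it are assumptions made only for the example: the toy model (Gaussian data summarised by the sample mean), the uniform prior on [-10, 10], the Gaussian family used for $q_{k+1}$ (Algorithm 2 only fixes its centre and variance), and all function names. The proposal is updated before the bandwidth check so that $q_{K+1}$ is available for the final run, and only the bandwidth stopping rule of step 2 is implemented.

```python
import numpy as np

def simulate_summary(theta, n, rng):
    # Toy model (assumption): y_i ~ N(theta, 1), summarised by the sample
    # mean, so that s | theta ~ N(theta, 1/n).
    return theta + rng.standard_normal(theta.shape) / np.sqrt(n)

def prior_pdf(theta, lo=-10.0, hi=10.0):
    # Uniform prior density on [lo, hi] (assumption).
    return np.where((theta >= lo) & (theta <= hi), 1.0 / (hi - lo), 0.0)

def is_abc(s_obs, n, n_sim, accept_rate, sampler, proposal_pdf, rng):
    """One IS-ABC run with a uniform kernel.

    The bandwidth is the accept_rate-quantile of the distances; returns the
    accepted draws, their importance weights prior/proposal, and the bandwidth.
    """
    theta = sampler(n_sim, rng)
    dist = np.abs(simulate_summary(theta, n, rng) - s_obs)
    eps = np.quantile(dist, accept_rate)
    keep = dist <= eps
    weights = prior_pdf(theta[keep]) / proposal_pdf(theta[keep])
    return theta[keep], weights, eps

def normal_proposal(mu, var):
    # Gaussian proposal with the given centre and variance (assumption).
    sd = np.sqrt(var)
    sampler = lambda m, rng: mu + sd * rng.standard_normal(m)
    pdf = lambda t: np.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return sampler, pdf

def iterative_is_abc(s_obs, n, n_total, n0, rates, tol, rng):
    sampler = lambda m, r: r.uniform(-10.0, 10.0, m)   # q_1 is the prior
    pdf = prior_pdf
    eps_prev, k = np.inf, 0
    while (k + 2) * n0 <= n_total:      # keep at least n0 draws for the final run
        rate = rates[min(k, len(rates) - 1)]
        theta, w, eps = is_abc(s_obs, n, n0, rate, sampler, pdf, rng)
        k += 1
        # Weighted mean and variance of the accepted sample; the next proposal
        # is centred at the mean with twice the variance.
        mu = np.average(theta, weights=w)
        var = np.average((theta - mu) ** 2, weights=w)
        sampler, pdf = normal_proposal(mu, 2.0 * var)
        if eps_prev - eps < tol:        # bandwidth has stopped shrinking
            break
        eps_prev = eps
    # Final run with the remaining budget N - K * N_0.
    rate = rates[min(k, len(rates) - 1)]
    theta, w, eps = is_abc(s_obs, n, n_total - k * n0, rate, sampler, pdf, rng)
    return np.average(theta, weights=w), eps
```

For instance, `iterative_is_abc(1.0, 1000, 10000, 1000, [0.05, 0.04, 0.03, 0.02, 0.01], 1e-3, np.random.default_rng(0))` returns a weighted posterior-mean estimate and the final bandwidth; in this toy setting the estimate should lie close to the observed summary.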
[Figure 1: three panels, titled $\phi$, $\sigma_\eta$ and $\log\sigma$, each plotting $\log(n \cdot \mathrm{MSE})$ against $\log_{10} n$ for the two methods (legend: prior, IIS).]

Ratio of the MSEs of the two methods:

  $n$       $\phi$    $\sigma_v$    $\log\sigma$
  100       0.94      1.2           1.1
  500       0.48      1.2           1.3
  2000      0.17      0.51          0.94
  10000     0.055     0.2           0.61

Fig 1. Comparisons of R-ABC and IIS-ABC for increasing $n$. For each $n$, the logarithm of the average MSE over 100 datasets, multiplied by $n$, is reported. For each dataset, the Monte Carlo sample size of the ABC estimators is $10^4$. The ratio of the MSEs of the two methods is given in the table, and smaller values indicate better performance of IIS-ABC.

6. Stochastic Volatility with AR(1) Dynamics. Consider the stochastic volatility model in [40]

$$\begin{cases} x_n = \phi x_{n-1} + \eta_n, & \eta_n \sim N(0, \sigma_\eta^2) \\ y_n = \sigma e^{x_n/2} \xi_n, & \xi_n \sim N(0, 1), \end{cases}$$

where $\eta_n$ and $\xi_n$ are independent, $y_n$ is the demeaned return of a portfolio, obtained by subtracting the average of all returns from the actual return, and $\sigma$ is the average volatility level. By the transformation $y_n^* = \log y_n^2$ and $\xi_n^* = \log \xi_n^2$, the state-space model can be transformed to

$$\begin{cases} x_n = \phi x_{n-1} + \eta_n, & \eta_n \sim N(0, \sigma_\eta^2) \\ y_n^* = 2\log\sigma + x_n + \xi_n^*, & \exp\{\xi_n^*\} \sim \chi^2_1, \end{cases} \qquad (5)$$

which is linear and non-Gaussian. The ABC method can be used to obtain an off-line estimator for the unknown parameter of state-space models, as recently discussed in [32]. Here we illustrate the effectiveness of iteratively choosing the importance proposal for large $n$ by comparing the performance of rejection ABC (R-ABC) and iterative IS-ABC (IIS-ABC). Consider the estimation of the parameter $(\phi, \sigma_\eta, \log\sigma)$ with the uniform prior in
the area $[0, 1) \times [0.1, 3] \times [-10, -1]$. We study the setting with true parameter $(\phi, \sigma_\eta, \log\sigma) = (0.9, 0.675, -4.1)$, which is motivated by empirical studies; the details are given in [40]. For any dataset $Y = (y_1, \ldots, y_n)$, let $Y^* = (y_1^*, \ldots, y_n^*)$. The summary statistic $s_n(Y) = (\widetilde{Var}[Y^*], \widetilde{Cor}[Y^*], \tilde{E}[Y^*])$ is used, where $\widetilde{Var}$, $\widetilde{Cor}$ and $\tilde{E}$ denote the empirical variance, lag-1 autocorrelation and mean. If there were no noise in the state equation for $\xi_n^*$ in (5), then $s_n(Y)$ would be a sufficient statistic of $Y^*$, and hence it is a natural choice of summary statistic. The uniform kernel is used in the accept-reject step of ABC.

The performance of R-ABC and IIS-ABC is compared for $n = 100$, 500, 2000 and 10000 with the simulation budget $N = 10000$. For IIS-ABC, the sequence $\{p_k\}$ has its first five values running from 5% to 1%, decreasing by 1% each time, and all other values equal to 1%. For R-ABC, both the 5% and 1% quantiles are tried and 5% is chosen for its better performance. For each iteration, $N_0 = 1000$. The simulation results are shown in Figure 1. It can be seen that for all parameters, IIS-ABC shows an increasing advantage over R-ABC as $n$ increases. For larger $n$, since the summary statistic is more accurate about the parameter, by constructing the importance proposal with only the simulations within a small distance of the observed summary, the iterative procedure tends to obtain a centre closer to the true parameter and a smaller bandwidth than those used in R-ABC, and the difference becomes more marked as $n$ increases. For smaller $n$, both perform similarly, since when the summary statistic is not accurate enough, the ABC posterior is not much different from the prior, and the benefit of sampling from a slightly better proposal does not compensate for the increased Monte Carlo variance from the importance weights. For $\phi$ and $\sigma_v$, the values of $n$ at which IIS-ABC starts to show an advantage are smaller than that for $\log\sigma$.
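The simulation set-up of this section can be sketched as follows: a simulator for the transformed model (5) and the three-dimensional summary statistic. This is an illustrative sketch under stated assumptions, not the code used for Figure 1; in particular, the function names are hypothetical and the AR(1) state is started from its stationary distribution, which the text does not specify.

```python
import numpy as np

def simulate_sv(phi, sigma_eta, log_sigma, n, rng):
    """Simulate y*_1..y*_n from the transformed model (5)."""
    eta = sigma_eta * rng.standard_normal(n)
    x = np.empty(n)
    # Assumption: start the AR(1) state from its stationary law,
    # N(0, sigma_eta^2 / (1 - phi^2)); requires 0 <= phi < 1.
    x[0] = eta[0] / np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eta[t]
    # xi* = log(xi^2) with xi ~ N(0, 1), so that exp(xi*) ~ chi^2_1.
    xi_star = np.log(rng.standard_normal(n) ** 2)
    return 2.0 * log_sigma + x + xi_star

def summary(y_star):
    """Empirical variance, lag-1 autocorrelation and mean of y*."""
    mean = y_star.mean()
    centred = y_star - mean
    var = np.mean(centred ** 2)
    cor = np.mean(centred[1:] * centred[:-1]) / var
    return np.array([var, cor, mean])
```

With the true parameter of this section, `summary(simulate_sv(0.9, 0.675, -4.1, 20000, np.random.default_rng(1)))` gives a sample mean near $2\log\sigma + E[\log\chi^2_1] \approx 2\log\sigma - 1.27$, which is the linear relationship between the mean summary and $\log\sigma$ that the discussion of Figure 1 relies on.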
The estimation of $\log\sigma$ is easier than that of $\phi$ and $\sigma_v$, because the limit of the informative summary statistic $\tilde{E}[Y^*]$ is in a linear relationship with $\log\sigma$; hence more improvement can be made upon the R-ABC estimators of $\phi$ and $\sigma_v$.

7. Summary and Discussion. The results in this paper suggest that ABC can scale to large data, at least for models with a fixed number of parameters. Under the assumption that the summary statistics obey a central limit theorem (as defined in Condition C3), we have that asymptotically the ABC posterior mean of a function of the parameters is normally distributed about the true value of that function. The asymptotic variance of the estimator is equal to the asymptotic variance of the MLE for the function given the summary statistic. Moreover, without loss of asymptotic efficiency we can always use a summary statistic that has the same dimension as the number of parameters. This is a stronger result than that of [17], where they show that choosing the same number of summaries as parameters is optimal when interest is in estimating just the parameters.

We have further shown that appropriate importance sampling implementations of ABC are efficient, in the sense of increasing the asymptotic variance of our estimator by a factor that is just $1 + O(1/N)$. Similar results are, however, likely to apply to SMC and MCMC implementations of ABC. For example, ABC-MCMC will be efficient provided the acceptance
probability does not degenerate to 0 as $n$ increases. However, at stationarity ABC-MCMC will propose parameter values from a distribution close to the ABC posterior density, and Theorems 5.1 and 5.2 suggest that for such a proposal distribution the acceptance probability of ABC will be bounded away from 0.

Whilst our theoretical results suggest that point estimates based on the ABC posterior have good properties, they do not suggest that the ABC posterior is a good approximation to the true posterior, or that the ABC posterior will accurately quantify the uncertainty in estimates. As shown by the Gaussian example in Section 1.1, the ABC posterior will tend to over-estimate the uncertainty.

Acknowledgements. This work was supported by the Engineering and Physical Sciences Research Council, grant EP/K014463.

Appendix. Here technical lemmas and proofs of the main results are presented. Throughout the appendix the data are considered to be random, and $O(\cdot)$ and $\Theta(\cdot)$ denote the limiting behaviour when $n$ goes to $\infty$. For a vector $x$ and a density $f(x)$, let $x_{1:k}$ be the first $k$ coordinates of $x$ and $f(x_{1:k})$ be the marginal density of $x_{1:k}$. For two sets $A$ and $B$, the sum of integrals $\int_A f(x)\,dx + \int_B f(x)\,dx$ is written as $(\int_A + \int_B) f(x)\,dx$. Let $T_{obs} = A(\theta_0)^{-1/2} a_n (s_{obs} - s(\theta_0))$; by (C3), $T_{obs} \xrightarrow{L} N(0, I_d)$, where $I_d$ is the identity matrix of dimension $d$.

APPENDIX A: PROOF OF SECTION 3

Denote $Var_{\pi_{ABC}}[h(\theta) \mid s_{obs}, \varepsilon]$ by $V_{ABC}(\varepsilon)$ and $E_{\pi_{ABC}}[h(\theta) \mid s_{obs}, \varepsilon]$ by $h_{ABC}(\varepsilon)$. Then $h_{ABC} = h_{ABC}(\varepsilon_n)$. Consider the following conditions:

(C11) $E[h(\theta) \mid s_{obs}] = O_p(1)$ and $Var[h(\theta) \mid s_{obs}] = O_p(1)$.

(C12) Let $g_c(s_{obs}, \varepsilon) = \int \pi(\theta) f_{ABC}(s_{obs} \mid \theta, \varepsilon)\,d\theta$, $g_h(s_{obs}, \varepsilon) = \int h(\theta) \pi(\theta) f_{ABC}(s_{obs} \mid \theta, \varepsilon)\,d\theta$ and $g_{h2}(s_{obs}, \varepsilon) = \int (h(\theta) - h_{ABC}(\varepsilon))^2 \pi(\theta) f_{ABC}(s_{obs} \mid \theta, \varepsilon)\,d\theta$. Assume that in $D_\varepsilon g_h(s_{obs}, \varepsilon)$, $D_\varepsilon g_c(s_{obs}, \varepsilon)$ and $D_\varepsilon g_{h2}(s_{obs}, \varepsilon)$, the differentiation and integration can be exchanged.

(C13) $\exists\, c_{tol} > 0$ such that $\max_{\varepsilon \in (0, c_{tol})} H_\varepsilon h_{ABC}(\varepsilon) = O_p(1)$ and $\max_{\varepsilon \in (0, c_{tol})} H_\varepsilon V_{ABC}(\varepsilon) = O_p(1)$.

(C12) and (C13) are the technical conditions needed for applying Taylor expansions to the ABC posterior moments. (C13) can be interpreted in the following framework. By Remark 3.2, $\pi_{ABC}(\theta \mid s_{obs}, \varepsilon)$ is the posterior density taking the density of $S_{n,\varepsilon}$ as the likelihood, and then $h_{ABC}(\varepsilon)$ and $V_{ABC}(\varepsilon)$ are the corresponding posterior mean and variance given $S_{n,\varepsilon} = s_{obs}$. In this sense, since $S_{n,\varepsilon} = O_p(1)$ for any $\varepsilon > 0$ by condition (C3), it is reasonable to assume the uniform convergence of $h_{ABC}(\varepsilon)$ and $V_{ABC}(\varepsilon)$ in a compact set. Compared to this, (C13) is stronger in that it assumes uniform convergence of the second derivative.

Let $V_{ABC} = V_{ABC}(\varepsilon_n)$. The proof of Theorem 3.1 proceeds as follows. First, in Lemma 1, the ABC posterior mean $h_{ABC}$ and variance $V_{ABC}$ are expanded to separate the bandwidth $\varepsilon_n$ and the posterior moments based on $s_{obs}$. Then in Lemma 4, the Bernstein-von Mises
theorem is extended for the posterior distribution and expectation based on $s_{obs}$, which, in Lemma 5, leads to the expansions of the posterior mean, with the leading term being the MLES, and of the posterior variance, with the leading term being the asymptotic variance in Theorem 3.1. Lemma 3 and Lemma 2 give the convergence of the MLES.

Lemma 1. Assume conditions (C2)(i) and (C11)-(C13). For any $\varepsilon_n < c_{tol}$, $h_{ABC}$ and $V_{ABC}$ have the following expansions:

$$h_{ABC} = E[h(\theta) \mid s_{obs}] + r_1(s_{obs}, n)\varepsilon_n^2,$$

where $r_1(s_{obs}, n) = O_p(1)$, and

$$V_{ABC} = Var[h(\theta) \mid s_{obs}] + r_2(s_{obs}, n)\varepsilon_n^2,$$

where $r_2(s_{obs}, n) = O_p(1)$.

Proof. Given conditions (C2)(i) and (C12), a basic fact that will be used throughout this proof is that $D_\varepsilon f_{ABC}(s_{obs} \mid \theta, 0) = 0$ for any $\theta$. With the notation in (C12), $h_{ABC} = g_h(s_{obs}, \varepsilon_n)/g_c(s_{obs}, \varepsilon_n)$ and $V_{ABC}(\varepsilon_n) = g_{h2}(s_{obs}, \varepsilon_n)/g_c(s_{obs}, \varepsilon_n)$. Applying a Taylor expansion in $\varepsilon_n$, since $D_\varepsilon g_c(s_{obs}, 0) = 0$ and $D_\varepsilon g_h(s_{obs}, 0) = 0$ by condition (C12), we have $h_{ABC} = h_{ABC}(0) + r_1(s_{obs}, n)\varepsilon_n^2$, where $r_1(s_{obs}, n) = H_\varepsilon h_{ABC}(\varepsilon_\theta)$ and $0 < \varepsilon_\theta < \varepsilon_n$. By condition (C12) and the product rule of differentiation, it is not difficult to see that $D_\varepsilon g_{h2}(s_{obs}, 0) = 0$. Then $V_{ABC}(\varepsilon_n) = V_{ABC}(0) + r_2(s_{obs}, n)\varepsilon_n^2$, where $r_2(s_{obs}, n) = H_\varepsilon V_{ABC}(\varepsilon_V)$ and $0 < \varepsilon_V < \varepsilon_n$. By condition (C13), and noting that $h_{ABC}(0) = E[h(\theta) \mid s_{obs}]$ and $V_{ABC}(0) = Var[h(\theta) \mid s_{obs}]$, the lemma holds.

As $n$ increases to $\infty$, based on the classical Bernstein-von Mises theorem, it is well known that the posterior mean and variance of $\theta$ conditional on the full dataset can be expanded with the leading terms being the MLE and the Fisher information matrix respectively; see [19, Sections 4.1-4.2]. The difference here is that the posterior moments are for the function $h(\theta)$ and are conditional on the summary statistic instead of the full dataset. Therefore we need extensions of the classical result. [12] gives the central limit theorem for $\theta_{MLES}$ when $a_n = \sqrt{n}$ and $\mathcal{P}$ is compact. According to the proof in [12], extending the result to general $a_n$ is straightforward.
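Additionally, we give the extension for general $\mathcal{P}$.

Lemma 2. Assume conditions (C1) and (C3)-(C5). Then it holds that $a_n(\theta_{MLES} - \theta_0) \xrightarrow{L} N(0, I^{-1}(\theta_0))$ as $n \to \infty$.

Proof. According to (C4), $f_n(s_{obs} \mid \theta_0) = \tilde{f}_n(s_{obs} \mid \theta_0) + o_p(1)$ and hence $f_n(s_{obs} \mid \theta_0)$ has the order $O_p(a_n^d)$ by (C3). Then by (C5), for $n$ large enough, with probability 1, $f_n(s_{obs} \mid \theta_0)$ is larger than $f_n(s_{obs} \mid \theta)$ for any $\theta \in \mathcal{P}_0^c$ and hence $\theta_{MLES} = \arg\max_{\theta \in \mathcal{P}_0} f_n(s_{obs} \mid \theta)$. Then from [12], the lemma holds.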
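The $\varepsilon_n^2$ expansion of Lemma 1 can be checked numerically in a toy setting. The sketch below is an illustration under assumptions that are not part of the paper's set-up: a $N(0, \tau^2)$ prior, a summary with $s \mid \theta \sim N(\theta, 1/n)$, and a Gaussian kernel $K(v) = \exp(-v^2/2)$, for which $f_{ABC}(s_{obs} \mid \theta, \varepsilon)$ is proportional to a $N(\theta, 1/n + \varepsilon^2)$ density evaluated at $s_{obs}$. With $h(\theta) = \theta$, the quantity $(h_{ABC}(\varepsilon) - h_{ABC}(0))/\varepsilon^2$ plays the role of $r_1$ and should stay roughly constant as $\varepsilon$ shrinks. The function names are hypothetical.

```python
import numpy as np

def abc_posterior_mean(s_obs, n, eps, tau=1.0):
    """ABC posterior mean of theta, computed on a fine uniform grid."""
    theta = np.linspace(-6.0, 6.0, 200001)
    # Gaussian kernel (assumption): the ABC likelihood is proportional to
    # a normal density with the bandwidth added to the variance.
    var = 1.0 / n + eps ** 2
    log_w = -0.5 * (s_obs - theta) ** 2 / var - 0.5 * theta ** 2 / tau ** 2
    w = np.exp(log_w - log_w.max())
    # On a uniform grid the ratio of sums approximates the ratio of integrals.
    return np.sum(theta * w) / np.sum(w)

def r1_estimate(s_obs, n, eps, tau=1.0):
    """Finite-bandwidth estimate of the coefficient of eps^2 in Lemma 1."""
    exact = abc_posterior_mean(s_obs, n, 0.0, tau)
    return (abc_posterior_mean(s_obs, n, eps, tau) - exact) / eps ** 2
```

For `s_obs = 0.5` and `n = 100`, `r1_estimate` at `eps = 0.01` and `eps = 0.02` gives values close to each other (about $-0.49$), in line with a remainder of order $\varepsilon^2$ with a bounded coefficient.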
ABC ASYMPTOTICS 19 The cetral limit theorem of h(θ MLES ) is eeded. Give the coditio (C9), by Lemma 2 ad the delta method, the followig holds. Lemma 3. Assume the coditios of Lemma 2 ad (C9). The a (h(θ MLES ) h(θ 0 )) L N(0, Dh(θ 0 ) T I 1 (θ 0 )Dh(θ 0 )) as. Here we prove a posterior ormality result more geeral tha the Berstei Vo-Mises type of ormality, which is give i Corollary 1, by followig the the derivatios for posterior ormality i [20]. Let l (θ) = log f (s obs θ). The followig coditios about how fast the likelihood chages aroud θ 0 are eeded. (C14) l (θ) C 3 (P 0 ). (C15) a 2 sup θ P0 3 θ i θ j θ k l (θ) M(s obs ) for ay i, j, k of coordiate idices of θ, ad M(s obs ) = O p (1). Sice θ 0 is the true parameter, it is atural to assume that the log-likelihoods of the the parameters outside P 0 is smaller tha ad do ot coverge to that of θ 0 as, as stated below. (C16) P θ0 {lim a 2 sup θ P c 0 >δ [l 0 (θ) l (θ 0 )] < ɛ } = 1 for some ɛ > 0. Let τ = a (θ θ MLES ) be the ormalised θ. The we have the followig covergece results for the posterior distributio of θ. Lemma 4. Assume the coditios of Lemma 2, (C14)-(C16) ad (C7). Let π (t s obs ) be the posterior desity of τ. For ay real fuctio g (t; s obs ) = g (t; s obs )v(t) satisfyig the followig coditios: (a) The limit of g (0; s obs ), deoted by g 0, exists i probability ad t = o(a ), g (t ; s obs ) g 0 = o p (1); (b) max t δ0 a g (t; s obs ) = O p (1); (c) k 0 such that v(t) t k for ay t ad g (τ; s obs ) θ θ MLES k π(θ) dθ = O p (1), it holds that g (t; s obs )π (t s obs ) dt P g 0 1 I(θ v(t) 0 )t/2 (2π) p/2 I(θ 0 ) 1/2 e tt dt as. The itroductio of g (t; s obs ) is eeded for extedig the Berstei Vo-Mises covergece to the posterior momets, which ca be see later i the proof of Lemma 5. A example of g (t; s obs ) satisfyig (a)-(c) is g (t; s obs ) = h(a 1 t + θ MLES ) where the real fuctio h(θ) is cotiuous at θ 0, bouded i {θ : θ θ 0 δ 0 } ad h(θ)π(θ) dθ <.
20 LI AND FEARNHEAD Proof. By the fact that θ MLES is a costat give s obs, π (t s obs ) ca be obtaied by trasformig the posterior desity of θ which is proportioal to f (s obs θ)π(θ). The we have π (t s obs ) π (t s obs ) exp{l (θ MLES + a 1 t) l (θ MLES )}π(θ MLES + a 1 t), which holds sice l (θ MLES ) does ot deped o t. We oly eed to show (6) g (t; s obs )π (t s obs )dt P g 0 π(θ 0 ) v(t)e tt I(θ 0 )t/2 dt. Because g (t; s obs ) 1 obviously satisfies (a)-(c) with v(t) = 1, ad hece the ormalisig costat of π (t s obs ) coverges i probability to (2π) p/2 I(θ 0 ) 1/2 π(θ 0 ). The by Slutsky s theorem, Lemma 4 holds. We break P ito three regios, B 1 = {t P : t δ 0 a }, B 2 = {t P : c log a t < δ 0 a } ad B 3 = {t P : t < c log a } for o-egative c. (6) will be justified by showig that the itegrals of g (t; s obs )π (t s obs ) i B 1 ad B 2 are o p (1) ad that i B 3 coverges to the RHS of (6) i probability. I the regio B 1, we have g (t; s obs ) π (t s obs ) dt B 1 exp{ sup [l (θ MLES + a 1 t) l (θ MLES )]}a p g (τ; s obs ) π(θ) dθ t δ 0 a = exp{ sup [l (θ) l (θ 0 )] + o p (1)}a p+k θ θ 0 δ 0 g(τ; s obs )v(τ) a k π(θ) dθ, The by coditio (C16) ad (c), it holds that B 1 g (t; s obs ) π (t s obs )dt = o p (1). I the regio B 2, we have g (t; s obs ) π (t s obs )dt max g(t; s obs ) v(t) π (t s obs )dt. B 2 t δ 0 a B 2 By coditio (b), for provig the LHS of the above iequality is o p (1), we oly eed to show B 2 v(t) π (t s obs )dt is o p (1). By the defiitio of θ MLES, D θ l (θ MLES ) = 0. The t, (7) l (θ MLES + a 1 t) l (θ MLES ) = 1 2 tt D 2 t + 1 6 a 1 t T D 3 (ɛ 1 (t), t)t, where D 2 a 2 H θ l (θ MLES ), D 3 (ɛ 1 (t), t) [ a 2 3 l (θ MLES + ɛ 1 (t)) θ i θ j θ k k ] t k ad ɛ 1 (t) δ 0. p p
ABC ASYMPTOTICS 21 By coditio (C4), the Hessia matrix of l (θ) is similar to the Hessia matrix of its ormal log-likelihood approximatio, sice H θ l (θ) = H θ {log f(s obs θ) + a 2 LR (θ)} = H θ {a 2 (s obs η(θ)) T A 1 (θ)(s obs η(θ))/2} + a 2 H θ LR (θ) = a 2 I(θ) + a 2 O p (s obs η(θ)) + o p (1), where i the RHS of the last equatio, O p (s obs η(θ)) is a polyomial of s obs η(θ) without costat terms ad the covergece of o p (1) uiformly holds i P 0. The D 2 = I(θ 0 ) + o p (1). By coditio (C15), the absolute value of each elemet i D 3 (ɛ 1 (t), t) is less tha or equal to M(s obs ) k t k. Sice t k /a < δ 0 for ay t B 2, by choosig appropriate δ 0, I(θ 0 )/4 a 1 D 3 (ɛ 1 (t), t)/6 ca be positive defiite. The with probability 1, (8) l (θ MLES + a 1 t) l (θ MLES ) 1 4 tt I(θ 0 )t. Therefore for t B 2, v(t) π (t s obs ) t k exp{ 1 4 tt I(θ 0 )t}π(θ MLES + t a ) for some positive costat r, which implies that δ0a k k exp{ 1 4 c2 r log a } max π(θ), θ θ MLES δ 0 B 2 v(t) π (t s obs )dt 2a k+1 c2 r/4 δ k 0 max θ θ MLES δ 0 π(θ). Therefore by (C7) ad choosig a large eough c, B 2 v(t) π (t s obs )dt is o p (1). I the regio B 3, we have g (t; s obs )π (t s obs ) dt B 3 (9) =g 0 v(t)π (t s obs )dt + [g(t; B s obs ) g 0 ]v(t)π (t s obs ) dt. 3 B 3 For the first itegral i (9), by (7) we have v(t)π (t s obs ) dt B 3 = v(t) exp{ 1 B 3 2 tt D 2 t}π(θ MLES + t ) dt a (10) + v(t) exp{ 1 B 3 2 tt D 2 t}[exp{ 1 6 a 1 t T D 3 (ɛ 1 (t), t)t} 1]π(θ MLES + t ) dt. a
22 LI AND FEARNHEAD Sice B 3 goes to P ad log a /a 0 as, the first itegral i (10) coverges to π(θ 0 ) v(t)e tt I(θ 0 )t/2 dt i probability. For the secod itegral i (10), by otig that e x 1 e x x for x R, we have v(t) exp{ 1 B 3 2 tt D 2 t} exp{ 1 6 a 1 t T D 3 (ɛ 1 (t), t)t} 1 π(θ MLES + t ) dt a t k exp{ 1 B 3 2 tt D 2 t + 1 6 a 1 t T D 3 (ɛ 1 (t), t)t } 1 6 a 1 t T D 3 (ɛ 1 (t), t)t π(θ MLES + t ) dt a Ma 1 max t k+3 exp{ 1 t B 3 B 3 4 tt I(θ 0 )t} dt max π(θ), θ θ MLES δ 0 for some positive costat M, where the secod iequality follows the previous argumets about D 3 (ɛ 1 (t), t) whe t B 2. Sice max t B3 t k+3 = (c log a ) k+3, whe a is large eough, the secod itegral i (10) is o p (1). For the secod itegral i (9), we have [g(t; s obs ) g 0 ]v(t)π (t s obs ) dt max g(t; s obs ) g 0 v(t)π (t s obs ) dt. B 3 t <c log a B 3 By (a), sice log a = o(a ), max t <c log a g(t; s obs ) g 0 = o p (1). The sice the first itegral i (9) has bee show to be O p (1), the secod itegral is o p (1). Therefore B 3 g (t; s obs )π (t s obs ) dt P g 0 π(θ 0 ) v(t)e tt I(θ 0 )t/2 dt which cocludes the proof. The Berstei Vo-Mises type of asymptotic ormality is stated below ad holds obviously by lettig g (t; s obs ) = 1 t B. Corollary 1. Assume the coditios of Lemma 2, (C14)-(C16) ad (C7). It holds that for ay measurable set B P, π(t s obs )dt P 1 I(θ 0 )t/2 (2π) d/2 I(θ 0 ) 1/2 e tt dt as. B B Based o the covergece of τ, we ca ow expad the posterior momets E[h(θ) s obs ] ad V ar[h(θ) s obs ]. For simplicity, cosider the scalar fuctio h(θ) ad the results ca be easily exteded to the vector case. Lemma 5. Assume the coditios of Lemma 4, (C8)-(C10). The we have E[h(θ) s obs ] = h(θ MLES )+o p (a 1 Proof. Sice θ = θ MLES + a 1 τ, we have ) ad V ar[h(θ) s obs ] = a 2 Dh(θ 0 ) T I 1 (θ 0 )Dh(θ 0 )+o p (a 2 ), E[h(θ) s obs ] = h(θ MLES ) + E[g 1(τ; s obs ) s obs ], where g 1(τ; s obs ) = h(θ MLES + a 1 τ) h(θ MLES ),
ABC ASYMPTOTICS 23 ad V ar[h(θ) s obs ] =V ar[g2(τ; s obs )v 2 (τ) g3(τ; s obs )v 3 (τ) s obs ]a 2 =V ar[g2(τ; s obs )v 2 (τ) s obs ]a 2 + V ar[g3(τ; s obs ) T v 3 (τ) s obs ]a 2 2E[g23(τ; s obs ) T v 23 (τ) s obs ]a 2, where g 2(τ; s obs ) = h(θ MLES + a 1 τ) h(θ MLES ) Dh(θ MLES ) T a 1 a 1 τ τ, g 3(τ; s obs ) = Dh(θ MLES ), g 23(τ; s obs ) = g 2(τ; s obs )g 3(τ; s obs ), v 2 (τ) = τ, v 3 (τ) = τ ad v 23 (τ) = τ τ. If (a)-(c) of Lemma 4 are satisfied for the followig fuctios: g1 (τ; s obs ), g2 (τ; s obs )v 2 (τ), g3 (τ; s obs ) T v 3 (τ), (g2 (τ; s obs )v 2 (τ)) 2, ( g3 (τ; s obs ) T v 3 (τ) ) 2 ad g 23 (τ; s obs ) T v 23 (τ), the explicit forms of the expasios would be give by Lemma 4. For g1 (τ; s obs ), (a) is obviously satisfied with g 0 = 0 ad (b) is satisfied by coditio (C9). Sice by coditio (C10), g1(τ; s obs ) π(θ) dθ = h(θ) h(θ MLES ) π(θ) dθ = O p (1), (c) holds for v(t) = 1 ad k = 0. Therefore by Lemma 4, E[g 1 (τ; s obs ) s obs ] = o p (1). For g2 (τ; s obs )v 2 (τ) ad (g2 (τ; s obs )v 2 (τ)) 2, by coditio (C9), g2 (t ; s obs ) P 0 whe t = o(a ) ad g2 (t ; s obs ) is bouded whe t = O(a ). Hece (a) ad (b) are satisfied for g2 (τ; s obs ) ad g2 (τ; s obs ) 2 with g 0 = 0. For (c), by coditio (C10) ad (C8) ad otig that a 1 τ = θ θ MLES, we have g2(τ; s obs ) θ θ MLES π(θ) dθ = h(θ) h(θ MLES ) Dh(θ MLES ) T (θ θ MLES ) π(θ) dθ = O p (1). Similarly, g 2 (τ; s obs ) 2 θ θ MLES 2 π(θ) dθ = O p (1). Hece (c) is satisfied for g 2 (τ; s obs )v 2 (τ) ad (g 2 (τ; s obs )v 2 (τ)) 2. The by Lemma 4, V ar[g 2 (τ; s obs )v 2 (τ) s obs ] = o p (1). For g 3 (τ; s obs ) T v 3 (τ), cosider D k h(θ MLES )T k where T k is the k th coordiate of τ for each k. (a)-(c) are obviously satisfied with g 0 = D k h(θ 0 ). Hece by Lemma 4, E[g 3 (τ; s obs ) T v 3 (τ) s obs ] = o p (1). Similarly, ( g 3 (τ; s obs ) T v 3 (τ) ) 2, cosider Di h(θ MLES )D j h(θ MLES )T i T j for each (i, j) pair. (a)-(c) are obviously satisfied with g 0 = D i h(θ 0 )D j h(θ 0 ). 
Hence by Lemma 4, $E[(g_3(\tau; s_{obs})^T v_3(\tau))^2 \mid s_{obs}] \xrightarrow{P} Dh(\theta_0)^T I^{-1}(\theta_0) Dh(\theta_0)$. For $g_{23}(\tau; s_{obs})^T v_{23}(\tau)$, following the arguments for $g_2(\tau; s_{obs})$ and $g_3(\tau; s_{obs})$ and the Cauchy-Schwarz inequality, we have $E[g_{23}(\tau; s_{obs})^T v_{23}(\tau) \mid s_{obs}] = o_p(1)$. Therefore the lemma holds.

Proof of Theorem 3.1. Combining Lemma 1 and Lemma 5, we have $h_{ABC} = h(\theta_{MLES}) + o_p(a_n^{-1}) + r_1(s_{obs}, n)\varepsilon_n^2$.
24 LI AND FEARNHEAD The with ε = o(a 1/2 ) ad Lemma 3, the cetral limit theorem holds. APPENDIX B: PROOF OF SECTION 4 I the followig we use the covetio that for a d-dimesio vector x, the matrix xx T is deoted by x 2. Proof of Propositio 4.1. For the i.i.d sample (φ i, θ i, s (i) ), (θ i, s (i) ) are geerated from q (θ)f(s θ), ad coditioal o s = s (i), φ i are geerated from the Beroulli distributio with probability K ɛ (s s obs ). The sice ĥh is the ratio of meas of sample fuctios, we ca use the delta method to show that the cetral limit theorem holds, with mea E[h(θ 1 )w 1 φ 1 ] E[w 1 φ 1 ] = E[h(θ 1 )w 1 K ɛ (s (1) s obs )] h(θ)π(θ)f (s θ)k ɛ (s s obs ) ds dθ E[w 1 K ɛ (s (1) = s obs )] π(θ)f (s θ)k ɛ (s s obs ) ds dθ = h ABC, ad variace 1 E 2 [w 1 φ 1 ] V ar[h(θ 1 )w 1 φ 1 ] + E2 [h(θ 1 )w 1 φ 1 ] E 4 V ar[w 1 φ 1 ] 2 E[h(θ 1 )w 1 φ 1 ] [w 1 φ 1 ] E 3 Cov[h(θ 1 )w 1 φ 1, w 1 φ 1 ] T [w 1 φ 1 ] { (E[h(θ1 =p 2 acc,π ) 2 w1φ 2 1 ] h 2 ABCp 2 acc,π) + h 2 ( ABC E[w 2 1 φ 1 ] p 2 ) ( } acc,π 2hABC E[h(θ1 )w1φ 2 1 ] h ABC p 2 T acc,π) =p 2 acc,πe[(h(θ 1 ) 2 2h ABC h(θ 1 ) T + h 2 ABC)w1K 2 ε (s (1) s obs )] [ =p 1 acc,πe πabc (h(θ) h ABC ) 2 π(θ) ]. q (θ) I the above expressio we have used the fact that p acc,π = E[w 1 φ 1 ]. It is easy to verify by algebra that [ Σ ABC, = p 1 acc,πe πabc (h(θ) h ABC ) 2 π(θ) ]. q (θ) Therefore the CLT holds. APPENDIX C: PROOF OF SECTION 5 Deote f ABC (s obs θ, ε ) by f ABC (s obs θ) for short. By pluggig i the expressio of π (α) ABC (θ), the acceptace probability ad the IS variace for π(α) ABC (θ) are (11) (12) p acc,π (α) ABC ad Σ IS, = π(θ)fabc (s obs θ) 1+α dθ π(θ)fabc (s obs θ) α dθ, = ε d π(θ)fabc (s obs θ) 1+α dθ [ π(θ)f ABC (s obs θ) dθ] 2 (h(θ) h ABC ) 2 π(θ)f ABC (s obs θ) 1 α dθ.
ABC ASYMPTOTICS 25 Exted the defiitio of π (γ) ABC (θ) to γ [0, 2]. It ca be see that the proof of Theorem 5.1 ad 5.2 require to study the covergece order of the ormalisig costat of π (γ) ABC (θ), deoted by c γ (s obs ) = π(θ)f ABC (s obs θ) γ dθ. The mai idea is as follows. Divide R p ito B δ = {θ : θ θ 0 < δ} ad Bδ c for some δ < δ 0. First, i Bδ c, Lemma 6 shows that the itegratio is igorable. I B δ, by treatig f ABC (s θ) γ as a o-ormalised desity, Lemma 9 shows that its ormalisig costat is idepedet of θ, ad the (17) states that the itegratio i B δ ca be writte as the product of the margial desity of s obs, uder the posterior distributio with the ormalised f ABC (s θ) γ as likelihood, ad the ormalisig costat of f ABC (s θ) γ. The covergece rate of the ormalisig costat of f ABC (s θ) γ is give i Lemma 9, ad that of the margial desity is give i Lemma 11 by applyig the posterior covergece results o o-ormal likelihood i [18]. Fially, the covergece rate of c γ (s obs ) is stated i Lemma 12, implied by the above results. Cosider the o-trivial case where γ (0, 2]. For some δ < δ 0, decompose c γ (s obs ) ito two parts, icludig c γ,bδ (s obs ) π(θ)f ABC (s obs θ) γ dθ ad c γ,b c δ (s obs ) π(θ)f ABC (s obs θ) γ dθ. B δ First of all, the followig lemma shows that the itegral i Bδ c ca be igored. Lemma 6. Assume coditios (C2) ad (C5). The δ > 0, π(θ)f ABC (s obs θ) γ dθ = o p (a γd p,ε ). Proof. It is sufficiet to show that sup θ P c 0 f ABC(s obs θ) = O p (e aα,εc ) for some positive costats c ad α. Let c 3 = mi(c 1, c 2 ), α 3 = mi(α 1, α 2 ) ad v (1) = ε v. Note that e aα 3,εc 3 sup f (s obs + ε v θ)k(v) dv θ P c 0 = e aα 3,εc 3 sup f (s obs + v (1) θ)k(ε 1 v (1) )ε d dv (1) θ P c 0 v (1) >M 1 (13) + e aα 3,εc 3 sup θ P c 0 B c δ v (1) M 1 f (s obs + ε v θ)k(v) dv The first term i the RHS of (13) is bouded by K(ε 1 M 1 )ε d,εc 3 ad hece has the order O p (1) by (C2)(iii) ad otig that a α 3,ε/ε α 1 = O(1). 
The secod term is bouded by sup θ P c sup 0 s s obs M 1 f (s θ)e aα 3,εc 3 P ad hece has the order O p (1) by (C5), s obs s(θ0 ) ad otig that a α 3,ε/α α 2 = O(1). Therefore e aα 3,εc 3 sup θ P c f (s 0 obs + ε v θ)k(v) dv = O p (1) ad the lemma holds. B c δ e aα 3
26 LI AND FEARNHEAD The we oly eed to cosider the itegratio i B δ. Let f ABC (s θ) = f (s+ε v θ)k(v) dv ad LR ABC (θ) = log(f ABC (s obs θ)/ f ABC (s obs θ)). The followig lemma states a result similar to (C4) for f ABC (s obs θ). Lemma 7. Assume the coditios (C4) ad (C6). The sup θ P0 LR ABC (θ) = o p (1). Proof. By (C6), f ABC (s obs θ) is domiated by the itegratio i the set where v < M 2 ε 1. For f ABC (s obs θ), (C6) automatically holds ad therefore it is also domiated by the itegratio i this set. This ca be see by lettig M 3 to be big eough ad comparig the maximum of f (s obs + ε v θ) i {v : v M 2 ε 1 } ad its miimum i {v : v m}, where m satisfies K(m) > 0, for big eough. The f ABC (s obs θ) = v <M 2 ε 1 exp{log f (s obs + ε v θ) log f (s obs + ε v θ)} f (s obs + ε v θ)k(v) dv(1 + o p (1)) exp{ sup LR (s, θ) } f ABC (s obs θ)(1 + o p (1)), s s obs M 2 ad LR ABC (θ) sup s sobs M 2 LR (s, θ) +o p (1). Similarly, LR ABC (θ) sup s sobs M 2 LR (s, θ) + o p (1). Therefore by (C4), sup θ P0 LR ABC (θ) = o p (1). Lemma 7 implies the approximatio that c γ,bδ (s obs ) = π(θ) f ABC (s obs θ) γ dθ(1 + o p (1)), B δ ad therefore c γ,bδ (s obs ) ca be evaluated based o the aalytical form of f ABC (s obs θ). Regardig K(v), the followig lemma states several useful properties of K(v). Lemma 8. Assume the coditio (C2). The it holds that (i) K(v) γ dv < ad v >x K(v)γ dv = O(e cxα ) for some positive costats c ad α as x. (ii) K(v 1:p ) γ dv 1:p < ad v 1:p >x K(v 1:p ) γ dv 1:p = O(e cxα ) for some positive costats c ad αas x. (iii) Let K p (u) = K( u ) for u R p. The K p (u) γ du < ad u >x K p(u) γ du = O(e cxα ) for some positive costats c ad αas x. Proof. By (C2)(iii), whe v > x 0 for some large eough x 0, K(v) Me c 1 v α1 some positive costat M. The by the decompositio K(v) γ dv =( + )K(v) γ dv V x0 + K(v) γ dv, v x 0 v >x 0 v >x 0 for
ABC ASYMPTOTICS 27 where V x0 is the volume of the d-dimesio sphere with radius x 0, K(v) γ dv is bouded if v >x 0 K(v) γ dv is bouded. Sice v >x 0 K(v) γ dv M v >x 0 e c 1γ v α1 dv the RHS of which has the order O(e c 1γx α 1 0 ) by itegratig i the spherical coordiate, (i) of the lemma holds. For (ii), sice K(v 1:p ) = K(v) dv p+1:d, by the iequality that e c 1 v α1 e c 1( v 1:p + v p+1:d ) α 1 /2, it is easy to show that (ii) holds by the similar argumet ad itegratig i the spherical coordiate. For (iii), sice K p (u) = K( u ), it follows from the similar argumets. Cosider f ABC (s obs θ) γ as a o-ormalised desity of s obs ad defie the ormalised (γ) desity by f ABC (s θ) f ABC (s θ) γ, with which c γ,bδ (s obs ) may be evaluated through the (γ) results o posterior covergece. The followig lemma verifies the validity of f ABC (s obs θ) beig a desity ad evaluates its ormalisig costat. Lemma 9. Assume the coditios of Lemma 7 ad (C2), the for θ P 0 it holds that f ABC (s θ) γ ds = a (γ 1)d,ε M γ,, where M γ, = Θ(1) ad is idepedet of θ. More specifically, (i) whe c ε <, lim M γ, 2 d+1 (2π) (1 γ)d/2 γ d/2 + 2 d+1 c (1 γ)d ε (ii) whe c ε =, lim M γ, = K(v) γ dv. K(v) γ dv; Proof. Let M γ, = a,ε (1 γ)d fabc (s θ) γ ds. Whe c ε <, with trasformatio T (1) = a (s s(θ)), it ca be writte that (14) [ f ABC (s θ) γ ds = a (γ 1)d N(T (1) + a ε v; 0, I d )K(v)dv] γ dt (1). I the RHS of (14), T R d, by (C2)(ii), we have the decompositio that [ ] γ N(T + a ε v; 0, I d )K(v) dv [ ] γ = ( + )N(T + a ε v; 0, I d )K(v) dv a ε v T /2 a ε v > T /2 [ (2π) d/2 T 2 exp{ 8 } K(v) dv + K( T ] ) N(T + a ε v; 0, I d ) dv v T 2a ε v > T 2aε 2aε [(2π) d/2 T 2 exp{ 8 } + (a ε ) d K( T ] γ ). 2a ε
28 LI AND FEARNHEAD Note that for positive costats x ad y, whe γ 1, by Jese s iequality, (x + y) γ 2 γ 1 (x γ + y γ ) ad whe γ < 1, by the order of l p orm, (x + y) γ x γ + y γ. Therefore (15) M γ, 2 (2π) γd/2 exp{ γ T (1) 2 } dt (1) + 2(a ε ) (1 γ)d 8 (a ε ) d K( T (1) 2a ε ) γ dt (1), ad the iequality i (i) of the lemma holds by takig the limit of the RHS of (15). Whe c ε =, with the trasformatio T (2) = ε 1 (s s(θ)), it ca be writte that [ γ (16) f ABC (s θ) γ ds = ε (γ 1)d N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv] dt (2). By domiated covergece theorem, T R d, we have N(T + v; 0, (a ε ) 2 I d )K(v) dv = lim lim N(T + v; 0, (a ε ) 2 I d )K(v) dv = K(T ). If the limit ad the itegral of T (2) i M γ, ca be exchaged, the we have [ lim M γ, = lim γ N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv] dt (2) = K(T (2) ) γ dt (2), ad (iii) of the lemma holds. The exchageability holds by the uiform itegrability of the itegrads { N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv} N which is show i the followig. Let µ( ) be the Lebesgue measure. Whe γ < 1, ɛ > 0, choose σ = ɛ γ 1. By Jese s iequality, E R d satisfyig µ(e) < σ, [ ] γ N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv dt (2) E [ N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv 1 {T (2) E} µ(e) =µ(e) 1 γ µ(e) 1 γ < ɛ. E N(T (2) + v; 0, (a ε ) 2 I d ) dt (2) K(v) dv dt (2) ] γ µ(e) Whe γ 1, ɛ > 0, by Lemma 8, the σ ca be chose such that E R d satisfyig
ABC ASYMPTOTICS 29 µ(e) < σ, E K(v)γ v < ɛ. The let v (2) = v + T (2), by Jese s iequality, = = E E E [ ] γ N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv dt (2) [ ] γ N(v (2) ; 0, (a ε ) 2 I d )K(v (2) T (2) ) dv (2) dt (2) N(v (2) ; 0, (a ε ) 2 I d )K(v (2) T (2) ) γ dv (2) dt (2) N(v (2) ; 0, (a ε ) 2 I d ) K(T (3) ) γ dt (3) dv (2), E where T (3) = T (2) v (2) ad E is E uder the trasformatio. Sice µ(e ) = µ(e) for ay v (2), E K(T (3) ) γ dt (3) < ɛ ad E [ γ N(T (2) + v; 0, (a ε ) 2 I d )K(v) dv] dt (2) ɛ N(v (2) ; 0, (a ε ) 2 I d ) dv (2) = ɛ. Therefore the itegrads are uiformly itegrable. Sice the RHS of (14) ad (16) do ot deped o θ, M γ, is idepedet of θ. The c γ,bδ (s obs ) ca be writte as (γ) (17) c γ,bδ (s obs ) = π(θ) f ABC (s obs θ) dθ B δ f ABC (s θ 0 ) γ ds(1 + o p (1)), that is the product of the margial desity of s obs, for the posterior distributio with (γ) prior π(θ) ad f ABC (s (γ) obs θ) as the likelihood, ad the ormalisig costat of f ABC (s obs θ) which has bee evaluated i Lemma 9. For the margial desity, the results of posterior covergece i o-regular cases i [18] ca be applied. Sice the posterior covergece results i [18] are based o the covergece of the likelihood ratio betwee the true parameter ad its eighbourhood, the followig lemma about (γ) the covergece rate of f ABC (s obs θ 0 ) is eeded. Lemma 10. Assume the coditios of Lemma 9. The f (γ) ABC (s obs θ 0 ) = Θ p (a d,ε). Proof. Whe c ε =, sice K(v) 1, f ABC (s obs θ 0 ) = ε d f (s obs + v (1) θ 0 )K(ε 1 v (1) ) dv (1) ε d,
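and let $M_3$ be a positive constant satisfying $\inf_{\|v\|\le M_3}K(v) > 0$; then
\[
\tilde f_{ABC}(s_{obs}\mid\theta_0) \ge \inf_{\|v\|\le M_3}K(v)\int_{\|v\|\le M_3}\tilde f_n(s_{obs}+\varepsilon_n v\mid\theta_0)\,dv
= \varepsilon_n^{-d}\inf_{\|v\|\le M_3}K(v)\int_{\|v\|\le M_3}N(v+(a_n\varepsilon_n)^{-1}T_{obs};\,0,\,(a_n\varepsilon_n)^{-2}I_d)\,dv
\]
\[
= \varepsilon_n^{-d}\inf_{\|v\|\le M_3}K(v)\,(1+o_p(1)).
\]
Hence $\tilde f_{ABC}(s_{obs}\mid\theta_0) = \Theta_p(\varepsilon_n^{-d})$ and $\tilde f_{ABC}(s_{obs}\mid\theta_0)^\gamma = \Theta_p(\varepsilon_n^{-\gamma d})$. When $c_\varepsilon < \infty$, $\tilde f_{ABC}(s_{obs}\mid\theta_0) \le a_n^{d}(2\pi)^{-d/2}$, and
\[
\tilde f_{ABC}(s_{obs}\mid\theta_0) \ge a_n^{d}(2\pi)^{-d/2}\exp\Big\{-\frac12\sup_{\|v\|\le M_3}\|T_{obs}+(c_\varepsilon+o(1))v\|^2\Big\}\int_{\|v\|\le M_3}K(v)\,dv,
\]
where the RHS of this inequality has the order $\Theta_p(a_n^{d})$. Hence $\tilde f_{ABC}(s_{obs}\mid\theta_0) = \Theta_p(a_n^{d})$ and $\tilde f_{ABC}(s_{obs}\mid\theta_0)^\gamma = \Theta_p(a_n^{\gamma d})$. Then by Lemma 9, the lemma holds.

Now we are ready to prove the posterior convergence taking $\tilde f^{(\gamma)}_{ABC}(s_{obs}\mid\theta)$ as the likelihood.

Lemma 11. Assume conditions (C2)-(C5) and (C6). Then there exists $\delta > 0$ such that
\[
\int_{B_\delta}\pi(\theta)\tilde f^{(\gamma)}_{ABC}(s_{obs}\mid\theta)\,d\theta = \Theta_p(a_{n,\varepsilon}^{d-p}).
\]

Proof. Let $P(\theta)$ be the $p\times p$ matrix $Ds(\theta)Ds(\theta)^T$. Select $\delta$ such that, for all $\theta\in B_\delta$, $P(\theta)$ and $A(\theta)$ are positive definite. Such a $\delta$ exists since $Ds(\theta_0)$ has rank $p$ by (C3)(iii) and $A(\theta_0)$ is positive definite. Then we can choose positive constants $\lambda_{P,\min}$, $\lambda_{P,\max}$, $\lambda_{A,\min}$ and $\lambda_{A,\max}$ such that, for all $\theta\in B_\delta$, $P(\theta)-\lambda_{P,\min}I_p$, $\lambda_{P,\max}I_p-P(\theta)$, $A(\theta)-\lambda_{A,\min}I_d$ and $\lambda_{A,\max}I_d-A(\theta)$ are positive definite. To simplify the notation, we can assume $A(\theta)\equiv I_d$ without loss of generality by noting that
\[
\int_{B_\delta}\pi(\theta)\tilde f_{ABC}(s_{obs}\mid\theta)^\gamma\,d\theta \ge \frac{\lambda_{A,\min}^{\gamma d/2}}{\lambda_{A,\max}^{\gamma d/2}}\int_{B_\delta}\pi(\theta)\Big(\int N(s_{obs}+\varepsilon_n v;\,s(\theta),\,\lambda_{A,\min}I_d)K(v)\,dv\Big)^\gamma d\theta,
\]
\[
\int_{B_\delta}\pi(\theta)\tilde f_{ABC}(s_{obs}\mid\theta)^\gamma\,d\theta \le \frac{\lambda_{A,\max}^{\gamma d/2}}{\lambda_{A,\min}^{\gamma d/2}}\int_{B_\delta}\pi(\theta)\Big(\int N(s_{obs}+\varepsilon_n v;\,s(\theta),\,\lambda_{A,\max}I_d)K(v)\,dv\Big)^\gamma d\theta.
\]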
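Let $U_n = \{a_{n,\varepsilon}(\theta-\theta_0) : \theta\in B_\delta\}$, and for $u\in U_n$, let
\[
Z_n(u) = \frac{\tilde f_{ABC}(s_{obs}\mid\theta_0+a_{n,\varepsilon}^{-1}u)^\gamma}{\tilde f_{ABC}(s_{obs}\mid\theta_0)^\gamma}
\quad\text{and}\quad
\xi_n(u) = \frac{Z_n(u)\pi(\theta_0+a_{n,\varepsilon}^{-1}u)}{\int_{U_n}Z_n(u)\pi(\theta_0+a_{n,\varepsilon}^{-1}u)\,du}.
\]
Here $Z_n(u)$ is the likelihood ratio, since the normalising constants are identical, and $\xi_n(u)$ is the posterior density of $a_{n,\varepsilon}(\theta-\theta_0)$. Then we have
\[
\int_{B_\delta}\pi(\theta)\tilde f^{(\gamma)}_{ABC}(s_{obs}\mid\theta)\,d\theta = \tilde f^{(\gamma)}_{ABC}(s_{obs}\mid\theta_0)\,a_{n,\varepsilon}^{-p}\int_{U_n}Z_n(u)\pi(\theta_0+a_{n,\varepsilon}^{-1}u)\,du.
\]
From [18, Proposition 2], $\xi_n(u)\xrightarrow{L} Z(u)/\int_{\mathbb{R}^p}Z(u)\,du$ as a random element in $L^1(\mathbb{R}^p)$, and hence
\[
\int_{U_n}Z_n(u)\pi(\theta_0+a_{n,\varepsilon}^{-1}u)\,du \xrightarrow{L} \pi(\theta_0)\int_{\mathbb{R}^p}Z(u)\,du,
\]
by (IH3) below and the weak convergence of the ratio of random sequences, if $Z_n(u)$ satisfies the following conditions:

(IH1) For some $M>0$, $m>0$ and $\alpha>0$, $E_{\theta_0}[Z_n^{1/2}(u_1)-Z_n^{1/2}(u_2)]^2 \le M(1+R^m)\|u_1-u_2\|^\alpha$, for all $u_1,u_2\in U_n$ satisfying $\|u_1\|\le R$ and $\|u_2\|\le R$;

(IH2) For all $u\in U_n$, $E_{\theta_0}Z_n^{1/2}(u)\le\exp\{-g_n(\|u\|)\}$, where $\{g_n\}$ is a sequence of real-valued functions on $[0,\infty)$ satisfying the following: (a) for a fixed $n\ge 1$, $g_n(y)\to\infty$ as $y\to\infty$; (b) for any $N>0$, $\lim_{y\to\infty,\,n\to\infty}y^N\exp\{-g_n(y)\} = 0$;

(IH3) The finite-dimensional distributions of $\{Z_n(u) : u\in\mathbb{R}^p\}$ converge to those of a stochastic process $\{Z(u) : u\in\mathbb{R}^p\}$.

Therefore by Lemma 10, in order for the lemma to hold, we only need to verify (IH1)-(IH3) for $Z_n(u)$ and that $\int Z(u)\,du\in(0,\infty)$. The verification proceeds by discussing the cases $c_\varepsilon<\infty$ and $c_\varepsilon=\infty$.

When $c_\varepsilon<\infty$, $a_{n,\varepsilon}=a_n$. For (IH1),
\[
E_{\theta_0}[Z_n^{1/2}(u_1)-Z_n^{1/2}(u_2)]^2 = 2\Big[1-\frac{\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u_1)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u_2)^{\gamma/2}\,ds}{\int\tilde f_{ABC}(s\mid\theta_0)^\gamma\,ds}\Big]. \tag{18}
\]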
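The identity (18) can be unpacked in three short steps. The following is only a sketch under two assumptions that the surrounding proof suggests but does not state explicitly at this point: that $E_{\theta_0}$ integrates against the normalised density $\tilde f^{(\gamma)}_{ABC}(\cdot\mid\theta_0)$, and that the normalising constant, written here with the hypothetical shorthand $C_n$, does not depend on $\theta$. The abbreviation $\tilde f_i$ is likewise introduced only for this sketch.

```latex
% Sketch only. Shorthand (not in the original text):
%   \tilde f_i(s) = \tilde f_{ABC}(s \mid \theta_0 + a_n^{-1} u_i),  i = 1, 2,
%   \tilde f_0(s) = \tilde f_{ABC}(s \mid \theta_0),
%   C_n = \int \tilde f_{ABC}(s \mid \theta)^{\gamma} ds, assumed free of \theta.
\begin{align*}
E_{\theta_0} Z_n(u_i)
  &= \int \frac{\tilde f_i(s)^{\gamma}}{\tilde f_0(s)^{\gamma}}
     \cdot \frac{\tilde f_0(s)^{\gamma}}{C_n}\, ds
   = \frac{1}{C_n}\int \tilde f_i(s)^{\gamma}\, ds = 1, \\
E_{\theta_0}\big[Z_n^{1/2}(u_1) Z_n^{1/2}(u_2)\big]
  &= \frac{1}{C_n}\int \tilde f_1(s)^{\gamma/2}\, \tilde f_2(s)^{\gamma/2}\, ds, \\
E_{\theta_0}\big[Z_n^{1/2}(u_1) - Z_n^{1/2}(u_2)\big]^2
  &= E_{\theta_0} Z_n(u_1) + E_{\theta_0} Z_n(u_2)
     - 2 E_{\theta_0}\big[Z_n^{1/2}(u_1) Z_n^{1/2}(u_2)\big] \\
  &= 2\Big[1 - \frac{1}{C_n}\int \tilde f_1(s)^{\gamma/2}\,
       \tilde f_2(s)^{\gamma/2}\, ds\Big].
\end{align*}
```

Read this way, bounding the left-hand side from above amounts to bounding the overlap integral $\int\tilde f_1^{\gamma/2}\tilde f_2^{\gamma/2}\,ds$ from below.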
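By the Cauchy-Schwarz inequality, in the RHS of (18),
\[
\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u_1)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u_2)^{\gamma/2}\,ds
\ge \int\Big[\int\tilde f_n(s+\varepsilon_n v\mid\theta_0+a_n^{-1}u_1)^{1/2}\tilde f_n(s+\varepsilon_n v\mid\theta_0+a_n^{-1}u_2)^{1/2}K(v)\,dv\Big]^\gamma ds
\]
\[
= \int\Big[\exp\Big\{-\frac{a_n^2}{8}\|s(\theta_0+a_n^{-1}u_1)-s(\theta_0+a_n^{-1}u_2)\|^2\Big\}\int N\Big(s+\varepsilon_n v;\,\tfrac12\big(s(\theta_0+a_n^{-1}u_1)+s(\theta_0+a_n^{-1}u_2)\big),\,a_n^{-2}I_d\Big)K(v)\,dv\Big]^\gamma ds
\]
\[
= \exp\Big\{-\frac{\gamma a_n^2}{8}\|s(\theta_0+a_n^{-1}u_1)-s(\theta_0+a_n^{-1}u_2)\|^2\Big\}\,a_n^{(\gamma-1)d}\int\Big[\int N(T^{(3)}+a_n\varepsilon_n v;\,0,\,I_d)K(v)\,dv\Big]^\gamma dT^{(3)},
\]
where $T^{(3)} = a_n\big[s-2^{-1}\big(s(\theta_0+a_n^{-1}u_1)+s(\theta_0+a_n^{-1}u_2)\big)\big]$. By Taylor expansion, $a_n\big(s(\theta_0+a_n^{-1}u_1)-s(\theta_0+a_n^{-1}u_2)\big) = Ds(\theta_0+a_n^{-1}u^*)^T(u_1-u_2)$, where $\|a_n^{-1}u^*\|\le\delta$. Then by (14) and plugging the above inequality into (18), we have
\[
E_{\theta_0}[Z_n^{1/2}(u_1)-Z_n^{1/2}(u_2)]^2 \le 2\Big[1-\exp\Big\{-\frac{\gamma}{8}\|Ds(\theta_0+a_n^{-1}u^*)^T(u_1-u_2)\|^2\Big\}\Big] \le \gamma\lambda_{P,\max}\|u_1-u_2\|^2/4,
\]
where the last inequality holds by the fact that $1-e^{-x}\le x$ for $x>0$. Hence (IH1) is satisfied.

For (IH2),
\[
E_{\theta_0}Z_n^{1/2}(u) = \int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds. \tag{19}
\]
Since $\gamma/2\le 1$, by applying Jensen's inequality twice, we have
\[
E_{\theta_0}Z_n^{1/2}(u) = \Big[\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u)^{\gamma/2}\frac{\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}}{\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds}\,ds\Big]\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds
\]
\[
\le \Big[\frac{\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u)\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds}{\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds}\Big]^{\gamma/2}\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds,
\]
and hence
\[
E_{\theta_0}Z_n^{1/2}(u) \le \Big[a_n^{-d}\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u)\tilde f_{ABC}(s\mid\theta_0)\,ds\Big]^{(1-\gamma/2)\gamma/2}\Big[a_n^{\gamma d/2}\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds\Big]^{1-\gamma/2}. \tag{20}
\]
For the second term in (20), since $1-\gamma/2\in(0,2]$, by Lemma 9 and Lemma 8, it is bounded by some positive constant. For the first term in the RHS of (20), by algebra and exchanging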
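the order of integration,
\[
a_n^{-d}\int\tilde f_{ABC}(s\mid\theta_0+a_n^{-1}u)\tilde f_{ABC}(s\mid\theta_0)\,ds
= a_n^{-d}\int\!\!\int\!\!\int\tilde f_n(s+\varepsilon_n v\mid\theta_0)\tilde f_n(s+\varepsilon_n w\mid\theta_0+a_n^{-1}u)K(v)K(w)\,dv\,dw\,ds
\]
\[
= \int\!\!\int N\big(a_n(s(\theta_0+a_n^{-1}u)-s(\theta_0));\,v^{(3)}-w^{(1)},\,2I_d\big)K\Big(\frac{v^{(3)}}{a_n\varepsilon_n}\Big)K\Big(\frac{w^{(1)}}{a_n\varepsilon_n}\Big)(a_n\varepsilon_n)^{-2d}\,dv^{(3)}\,dw^{(1)},
\]
where $v^{(3)} = a_n\varepsilon_n v$ and $w^{(1)} = a_n\varepsilon_n w$. By Taylor expansion, $a_n\big(s(\theta_0+a_n^{-1}u)-s(\theta_0)\big) = Ds(\theta_0+a_n^{-1}u^*)^T u$, where $\|a_n^{-1}u^*\|\le\delta$. In order to evaluate the last integral in the above equality, we divide the integration space $\mathbb{R}^d\times\mathbb{R}^d$ into two parts: $R_1 = \{(v^{(3)},w^{(1)}) : \|v^{(3)}-w^{(1)}\|\le 2^{-1}\|Ds(\theta_0+a_n^{-1}u^*)^T u\|\}$ and $R_1^c$. Then we have
\[
\int_{R_1}N\big(Ds(\theta_0+a_n^{-1}u^*)^T u;\,v^{(3)}-w^{(1)},\,2I_d\big)K\Big(\frac{v^{(3)}}{a_n\varepsilon_n}\Big)K\Big(\frac{w^{(1)}}{a_n\varepsilon_n}\Big)(a_n\varepsilon_n)^{-2d}\,dv^{(3)}\,dw^{(1)}
\]
\[
\le (4\pi)^{-d/2}\exp\Big\{-\frac14\inf_{(v^{(3)},w^{(1)})\in R_1}\|Ds(\theta_0+a_n^{-1}u^*)^T u-(v^{(3)}-w^{(1)})\|^2\Big\}\int_{R_1}K\Big(\frac{v^{(3)}}{a_n\varepsilon_n}\Big)K\Big(\frac{w^{(1)}}{a_n\varepsilon_n}\Big)(a_n\varepsilon_n)^{-2d}\,dv^{(3)}\,dw^{(1)}
\]
\[
\le (4\pi)^{-d/2}\exp\Big\{-\frac18\lambda_{P,\min}\|u\|^2\Big\}, \tag{21}
\]
and by letting $V^{(3)}$ and $W^{(1)}$ be independent random vectors with density $K((a_n\varepsilon_n)^{-1}v)(a_n\varepsilon_n)^{-d}$,
\[
\int_{R_1^c}N\big(Ds(\theta_0+a_n^{-1}u^*)^T u;\,v^{(3)}-w^{(1)},\,2I_d\big)K\Big(\frac{v^{(3)}}{a_n\varepsilon_n}\Big)K\Big(\frac{w^{(1)}}{a_n\varepsilon_n}\Big)(a_n\varepsilon_n)^{-2d}\,dv^{(3)}\,dw^{(1)}
\le (4\pi)^{-d/2}P\big((V^{(3)},W^{(1)})\in R_1^c\big)
\]
\[
\le 2(4\pi)^{-d/2}P\Big(\|V^{(3)}\|+\|W^{(1)}\|>\tfrac12\|Ds(\theta_0+a_n^{-1}u^*)^T u\|\ \text{and}\ \|V^{(3)}\|\ge\|W^{(1)}\|\Big)
\le 2(4\pi)^{-d/2}P\Big(\|V^{(3)}\|\ge\tfrac14\|Ds(\theta_0+a_n^{-1}u^*)^T u\|\Big)
\]
\[
\le 2(4\pi)^{-d/2}P\Big(\|V\|>\frac{\sqrt{\lambda_{P,\min}}}{4a_n\varepsilon_n}\|u\|\Big), \tag{22}
\]
where $V$ is the random vector with density $K(v)$; the factor 2 in the second bound comes from the exchangeability of $V^{(3)}$ and $W^{(1)}$. Therefore,
\[
E_{\theta_0}Z_n^{1/2}(u) \le c\Big[\exp\Big\{-\frac18\lambda_{P,\min}\|u\|^2\Big\}+2P\Big(\|V\|>\frac{\sqrt{\lambda_{P,\min}}}{4a_n\varepsilon_n}\|u\|\Big)\Big]^{(1-\gamma/2)\gamma/2},
\]
where $c$ is some positive constant, and by Lemma 8, (IH2) is satisfied.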
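For (IH3), with probability 1, by the dominated convergence theorem,
\[
\lim_{n\to\infty}Z_n(u) = \Big[\frac{\lim_{n\to\infty}\int N\big(a_n(s_{obs}-s(\theta_0+a_n^{-1}u))+a_n\varepsilon_n v;\,0,\,I_d\big)K(v)\,dv}{\lim_{n\to\infty}\int N\big(a_n(s_{obs}-s(\theta_0))+a_n\varepsilon_n v;\,0,\,I_d\big)K(v)\,dv}\Big]^\gamma
\]
\[
= \Big[\frac{\lim_{n\to\infty}\int N\big(a_n(s_{obs}-s(\theta_0))+Ds(\theta_0+a_n^{-1}u^*)^T u+a_n\varepsilon_n v;\,0,\,I_d\big)K(v)\,dv}{\lim_{n\to\infty}\int N\big(a_n(s_{obs}-s(\theta_0))+a_n\varepsilon_n v;\,0,\,I_d\big)K(v)\,dv}\Big]^\gamma,
\]
so that $\lim_{n\to\infty}Z_n(u) = Z(u)$ with
\[
Z(u) = \frac{\big[\int N(T_{obs}+Ds(\theta_0)^T u+c_\varepsilon v;\,0,\,I_d)K(v)\,dv\big]^\gamma}{\big[\int N(T_{obs}+c_\varepsilon v;\,0,\,I_d)K(v)\,dv\big]^\gamma}.
\]
Hence (IH3) is satisfied.

For $\int Z(u)\,du$, when $c_\varepsilon = 0$, it is obviously in $(0,\infty)$. When $c_\varepsilon\in(0,\infty)$, we only need to show that the integral of the numerator of $Z(u)$ over $\mathbb{R}^p$ is in $(0,\infty)$. By algebra,
\[
\int\Big[\int N(T_{obs}+Ds(\theta_0)^T u+c_\varepsilon v;\,0,\,I_d)K(v)\,dv\Big]^\gamma du
= \int\Big[\int N\big(P(\theta_0)^{1/2}u+P(\theta_0)^{-1/2}Ds(\theta_0)(T_{obs}+c_\varepsilon v);\,0,\,I_p\big)(4\pi)^{-\frac{d-p}{2}}
\]
\[
\times\exp\Big\{-\frac12(T_{obs}+c_\varepsilon v)^T\big(I_d-Ds(\theta_0)^T P(\theta_0)^{-1}Ds(\theta_0)\big)(T_{obs}+c_\varepsilon v)\Big\}K(v)\,dv\Big]^\gamma du. \tag{23}
\]
Since $I_d-Ds(\theta_0)^T P(\theta_0)^{-1}Ds(\theta_0)$ is a projection matrix and hence positive-semidefinite, the RHS of (23) is bounded by
\[
(4\pi)^{-\frac{\gamma(d-p)}{2}}\int\Big[\int N\big(P(\theta_0)^{1/2}u+P(\theta_0)^{-1/2}Ds(\theta_0)(T_{obs}+c_\varepsilon v);\,0,\,I_p\big)K(v)\,dv\Big]^\gamma du. \tag{24}
\]
Since the eigenvalues of $Ds(\theta_0)^T P(\theta_0)^{-1}Ds(\theta_0)$ are 0 and 1, by singular value decomposition, there exist a $p$-dimensional unitary matrix $Q_1$ and a $d$-dimensional unitary matrix $Q_2$ such that $P(\theta_0)^{-1/2}Ds(\theta_0) = Q_1[I_p\ \vdots\ 0]_{p\times d}Q_2$. Let $v_{SVD} = Q_2 v$, $u_{SVD} = Q_1 v_{SVD,1:p}$, $u' = P(\theta_0)^{1/2}u+P(\theta_0)^{-1/2}Ds(\theta_0)T_{obs}$, $K_{SVD}(v_{SVD}) = K(Q_2^T v_{SVD})$ and $K_{SVD,p}(u_{SVD}) = K_{SVD}(Q_1^T u_{SVD})$. Then (24) can be transformed to be
\[
(4\pi)^{-\frac{\gamma(d-p)}{2}}|P(\theta_0)|^{-1/2}\int\Big[\int N\big(u'+c_\varepsilon Q_1[I_p\ \vdots\ 0]_{p\times d}v_{SVD};\,0,\,I_p\big)K(Q_2^T v_{SVD})\,dv_{SVD}\Big]^\gamma du'
\]
\[
= (4\pi)^{-\frac{\gamma(d-p)}{2}}|P(\theta_0)|^{-1/2}\int\Big[\int N(u'+c_\varepsilon u_{SVD};\,0,\,I_p)K_{SVD,p}(u_{SVD})\,du_{SVD}\Big]^\gamma du'. \tag{25}
\]
With Lemma 8 and arguments similar to the proof of Lemma 9(i), it can be shown that (25) is in $(0,\infty)$, which implies that the LHS of (23) is in $(0,\infty)$. Therefore $\int Z(u)\,du\in(0,\infty)$.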
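Now we consider the case when $c_\varepsilon = \infty$ and $a_{n,\varepsilon} = \varepsilon_n^{-1}$. For (IH1), in (18), with the transformation $T^{(4)} = \varepsilon_n^{-1}(s-s(\theta_0+\varepsilon_n u_2))$,
\[
\tilde f_{ABC}(s\mid\theta_0+\varepsilon_n u_2)^\gamma\,ds = \varepsilon_n^{(1-\gamma)d}\Big[\int N(T^{(4)}+v;\,0,\,(a_n\varepsilon_n)^{-2}I_d)K(v)\,dv\Big]^\gamma dT^{(4)},
\]
and
\[
\tilde f_{ABC}(s\mid\theta_0+\varepsilon_n u_1)^\gamma\,ds = \varepsilon_n^{(1-\gamma)d}\Big[\int N\big(T^{(4)}+\varepsilon_n^{-1}(s(\theta_0+\varepsilon_n u_2)-s(\theta_0+\varepsilon_n u_1))+v;\,0,\,(a_n\varepsilon_n)^{-2}I_d\big)K(v)\,dv\Big]^\gamma dT^{(4)}
\]
\[
= \varepsilon_n^{(1-\gamma)d}\Big[\int N(T^{(4)}+v^{(4)};\,0,\,(a_n\varepsilon_n)^{-2}I_d)K\big(v^{(4)}-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)\big)\,dv^{(4)}\Big]^\gamma dT^{(4)},
\]
where $v^{(4)} = v+Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)$. Then by the Cauchy-Schwarz inequality,
\[
\int\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_1)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_2)^{\gamma/2}\,ds
\]
\[
\ge \varepsilon_n^{(1-\gamma)d}\int\Big[\int N(T^{(4)}+v;\,0,\,(a_n\varepsilon_n)^{-2}I_d)K(v)^{1/2}K\big(v-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)\big)^{1/2}\,dv\Big]^\gamma dT^{(4)}. \tag{26}
\]
The RHS of (26) is similar to the RHS of (16), differing in that $K(v)^{1/2}K(v-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1))^{1/2}$ takes the place of $K(v)$ in the integrand. Note that in the proof of Lemma 9(ii), regarding $K(v)$, the uniform integrability only needs $\int K(v)^\gamma\,dv$ to be bounded. Since
\[
\int K(v)^{\gamma/2}K\big(v-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)\big)^{\gamma/2}\,dv \le \Big[\int K(v)^\gamma\,dv\int K\big(v-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)\big)^\gamma\,dv\Big]^{1/2},
\]
which is bounded, similar to the arguments in Lemma 9(ii), it can also be shown that the integrands in the RHS of (26) are uniformly integrable, and then
\[
\lim_{n\to\infty}\varepsilon_n^{(\gamma-1)d}\int\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_1)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_2)^{\gamma/2}\,ds
\]
\[
\ge \int\lim_{n\to\infty}\Big[\int N(T^{(4)}+v;\,0,\,(a_n\varepsilon_n)^{-2}I_d)K(v)^{1/2}K\big(v-Ds(\theta_0+a_{n,\varepsilon}^{-1}u^*)^T(u_2-u_1)\big)^{1/2}\,dv\Big]^\gamma dT^{(4)}
= \int K(T^{(4)})^{\gamma/2}K\big(T^{(4)}-Ds(\theta_0)^T(u_2-u_1)\big)^{\gamma/2}\,dT^{(4)}.
\]
Then by Lemma 9(ii),
\[
\frac{\int\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_1)^{\gamma/2}\tilde f_{ABC}(s\mid\theta_0+a_{n,\varepsilon}^{-1}u_2)^{\gamma/2}\,ds}{\int\tilde f_{ABC}(s\mid\theta_0)^\gamma\,ds}
\ge \frac{\int K(v)^{\gamma/2}K\big(v-Ds(\theta_0)^T(u_2-u_1)\big)^{\gamma/2}\,dv}{\int K(v)^\gamma\,dv}+o(1). \tag{27}
\]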
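The limit that drives the $c_\varepsilon = \infty$ case, namely that smoothing the kernel with a Gaussian of vanishing variance $(a_n\varepsilon_n)^{-2}$ recovers the kernel itself, can be checked numerically in one dimension. The sketch below is illustrative only: the Gaussian-shaped kernel $K(v) = \exp(-v^2/2)$, the midpoint-rule quadrature and all function names are assumptions made for this example and are not taken from the paper.

```python
import math


def kernel(v):
    """Illustrative kernel with max K(0) = 1 (an assumed choice, not the paper's)."""
    return math.exp(-0.5 * v * v)


def normal_pdf(x, sd):
    """Density of N(0, sd^2) evaluated at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def smoothed_kernel(t, sd, half_width=10.0, steps_per_sd=20):
    """Midpoint-rule approximation of the integral of N(t + v; 0, sd^2) K(v) dv.

    Here sd plays the role of (a_n * eps_n)^(-1), which tends to 0 when c_eps is infinite.
    """
    h = sd / steps_per_sd                 # grid step resolves the narrow Gaussian
    n_steps = int(2.0 * half_width / h)
    total = 0.0
    for i in range(n_steps):
        v = -half_width + (i + 0.5) * h   # midpoint of the i-th cell
        total += normal_pdf(t + v, sd) * kernel(v)
    return total * h


def smoothed_kernel_exact(t, sd):
    """Closed form for this particular kernel (Gaussian convolution)."""
    return math.exp(-0.5 * t * t / (1.0 + sd * sd)) / math.sqrt(1.0 + sd * sd)


if __name__ == "__main__":
    t = 0.7
    for sd in (1.0, 0.3, 0.1, 0.01):
        print(sd, smoothed_kernel(t, sd), kernel(t))
```

As `sd` shrinks, the printed smoothed value should approach `kernel(t)`; the closed form is available only because of the specific kernel assumed here.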
In order to obtain an appropriate lower bound for the leading term of the RHS of (27), write $\Delta = Ds(\theta_0)^T(u_2-u_1)$ for brevity and divide $\mathbb{R}^d$ into two parts: $R_2 = \{v : \|v-\Delta\|\ge\|v\|\}$ and $R_2^c$. Then
\[
\int K(v)^{\gamma/2}K(v-\Delta)^{\gamma/2}\,dv = \Big(\int_{R_2}+\int_{R_2^c}\Big)K(v)^{\gamma/2}K(v-\Delta)^{\gamma/2}\,dv
\ge \int_{R_2}K(v-\Delta)^\gamma\,dv+\int_{R_2^c}K(v)^\gamma\,dv
\]
\[
= \int_{R_2'}K(v^{(4)})^\gamma\,dv^{(4)}+\int_{R_2^c}K(v)^\gamma\,dv,
\]
where $v^{(4)} = v-\Delta$ and $R_2' = \{v^{(4)} : \|v^{(4)}\|\ge\|v^{(4)}+\Delta\|\}$. Let $Q_3$ be the rotation matrix such that $Q_3\Delta = \|\Delta\|e_1$, where $e_1 = (1,0,0,\ldots)\in\mathbb{R}^d$. Then by letting $v^{(5)} = Q_3 v$,
\[
\Big(\int_{R_2'}+\int_{R_2^c}\Big)K(v)^\gamma\,dv
= \Big(\int_{\|v^{(5)}\|\ge\|v^{(5)}+\|\Delta\|e_1\|}+\int_{\|v^{(5)}\|>\|v^{(5)}-\|\Delta\|e_1\|}\Big)K(v^{(5)})^\gamma\,dv^{(5)}
\]
\[
= \Big(\int_{|v^{(5)}_1|\ge|v^{(5)}_1+\|\Delta\||}+\int_{|v^{(5)}_1|>|v^{(5)}_1-\|\Delta\||}\Big)K(v^{(5)})^\gamma\,dv^{(5)}
\ge \int_{|v^{(5)}_1|\ge\frac12\sqrt{\lambda_{P,\max}}\|u_1-u_2\|}K(v^{(5)})^\gamma\,dv^{(5)},
\]
where $v^{(5)}_1$ is the first coordinate of $v^{(5)}$. Therefore by (18) and (27),
\[
E_{\theta_0}[Z_n^{1/2}(u_1)-Z_n^{1/2}(u_2)]^2 \le 2\Big[1-\frac{\int_{|v_1|\ge\frac12\sqrt{\lambda_{P,\max}}\|u_1-u_2\|}K(v)^\gamma\,dv}{\int K(v)^\gamma\,dv}\Big]
\le 2\sup_{v_1\in\mathbb{R}}K^{(\gamma)}(v_1)\sqrt{\lambda_{P,\max}}\,\|u_1-u_2\|,
\]
where $K^{(\gamma)}(v)\propto K(v)^\gamma$. Since $\int K(v)^\gamma\,dv<\infty$, $K^{(\gamma)}(v)$ is a valid density and hence $\sup_{v_1\in\mathbb{R}}K^{(\gamma)}(v_1)<\infty$, which implies that (IH1) is satisfied.

For (IH2), similar to (20), we have
\[
E_{\theta_0}Z_n^{1/2}(u) \le \Big[\varepsilon_n^{d}\int\tilde f_{ABC}(s\mid\theta_0+\varepsilon_n u)\tilde f_{ABC}(s\mid\theta_0)\,ds\Big]^{(1-\gamma/2)\gamma/2}\Big[\varepsilon_n^{-\gamma d/2}\int\tilde f_{ABC}(s\mid\theta_0)^{1-\gamma/2}\,ds\Big]^{1-\gamma/2}. \tag{28}
\]
For the second term in (28), since $1-\gamma/2\in(0,2]$, by Lemma 8 and Lemma 9, it is bounded by some positive constant. For the first term in the RHS of (28), by algebra and exchanging
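the order of integration,
\[
\varepsilon_n^{d}\int\tilde f_{ABC}(s\mid\theta_0+\varepsilon_n u)\tilde f_{ABC}(s\mid\theta_0)\,ds
= (a_n\varepsilon_n)^{d}\int\!\!\int N\big(a_n(s(\theta_0+\varepsilon_n u)-s(\theta_0));\,a_n\varepsilon_n(v-w),\,2I_d\big)K(v)K(w)\,dv\,dw
\]
\[
= \int\!\!\int N\big(Ds(\theta_0+\varepsilon_n u^*)^T u;\,v-w,\,2(a_n\varepsilon_n)^{-2}I_d\big)K(v)K(w)\,dv\,dw,
\]
where $\|\varepsilon_n u^*\|\le\delta$. Divide $\mathbb{R}^d\times\mathbb{R}^d$ into two parts: $R_1 = \{(v,w) : \|v-w\|\le 2^{-1}\|Ds(\theta_0+\varepsilon_n u^*)^T u\|\}$ and $R_1^c$. Then similar to (21), we have
\[
\int_{R_1}N\big(Ds(\theta_0+\varepsilon_n u^*)^T u;\,v-w,\,2(a_n\varepsilon_n)^{-2}I_d\big)K(v)K(w)\,dv\,dw \le (4\pi)^{-d/2}(a_n\varepsilon_n)^{d}\exp\Big\{-\frac18 a_n\varepsilon_n\lambda_{P,\min}\|u\|^2\Big\}. \tag{29}
\]
For the integral over $R_1^c$, let $w^{(2)} = v-w$ and we have
\[
\int_{R_1^c}N\big(Ds(\theta_0+\varepsilon_n u^*)^T u;\,v-w,\,2(a_n\varepsilon_n)^{-2}I_d\big)K(v)K(w)\,dv\,dw
\]
\[
= \int_{\|w^{(2)}\|>2^{-1}\|Ds(\theta_0+\varepsilon_n u^*)^T u\|}N\big(w^{(2)};\,Ds(\theta_0+\varepsilon_n u^*)^T u,\,2(a_n\varepsilon_n)^{-2}I_d\big)\int K(v)K(v-w^{(2)})\,dv\,dw^{(2)}.
\]
For any $w^{(2)}$ satisfying $\|w^{(2)}\|>2^{-1}\|Ds(\theta_0+\varepsilon_n u^*)^T u\|$,
\[
\int K(v)K(v-w^{(2)})\,dv = \Big(\int_{\|v\|>\frac14\|Ds(\theta_0+\varepsilon_n u^*)^T u\|}+\int_{\|v\|\le\frac14\|Ds(\theta_0+\varepsilon_n u^*)^T u\|}\Big)K(v)K(v-w^{(2)})\,dv
\]
\[
\le K\Big(\tfrac14\|Ds(\theta_0+\varepsilon_n u^*)^T u\|\Big)+K\Big(\inf_{\|v\|\le\frac14\|Ds(\theta_0+\varepsilon_n u^*)^T u\|}\|v-w^{(2)}\|\Big)
\le 2K\Big(\tfrac14\|Ds(\theta_0+\varepsilon_n u^*)^T u\|\Big) \le 2K\Big(\tfrac14\sqrt{\lambda_{P,\min}}\,\|u\|\Big).
\]
Then
\[
\int_{R_1^c}N\big(Ds(\theta_0+\varepsilon_n u^*)^T u;\,v-w,\,2(a_n\varepsilon_n)^{-2}I_d\big)K(v)K(w)\,dv\,dw \le 2K\Big(\tfrac14\sqrt{\lambda_{P,\min}}\,\|u\|\Big). \tag{30}
\]
Therefore by (28), (29) and (30), for some positive constant $c$,
\[
E_{\theta_0}Z_n^{1/2}(u) \le c\Big[(4\pi)^{-d/2}(a_n\varepsilon_n)^{d}\exp\Big\{-\frac18 a_n\varepsilon_n\lambda_{P,\min}\|u\|^2\Big\}+2K\Big(\tfrac14\sqrt{\lambda_{P,\min}}\,\|u\|\Big)\Big]^{(1-\gamma/2)\gamma/2},
\]
and by (C2)(iii), (IH2) is satisfied.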
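For (IH3), let $v^{(6)} = a_n(s_{obs}-s(\theta_0+\varepsilon_n u))+a_n\varepsilon_n v$; by the dominated convergence theorem, we have
\[
\lim_{n\to\infty}Z_n(u) = \Big[\frac{\lim_{n\to\infty}\int N(v^{(6)};\,0,\,I_d)K\big((a_n\varepsilon_n)^{-1}v^{(6)}-\varepsilon_n^{-1}(s_{obs}-s(\theta_0+\varepsilon_n u))\big)\,dv^{(6)}}{\lim_{n\to\infty}\int N(v^{(6)};\,0,\,I_d)K\big((a_n\varepsilon_n)^{-1}v^{(6)}-\varepsilon_n^{-1}(s_{obs}-s(\theta_0))\big)\,dv^{(6)}}\Big]^\gamma
\]
\[
= \Big[\frac{\int N(v^{(6)};\,0,\,I_d)\lim_{n\to\infty}K\big((a_n\varepsilon_n)^{-1}v^{(6)}-\varepsilon_n^{-1}(s_{obs}-s(\theta_0))+Ds(\theta_0+\varepsilon_n u^*)^T u\big)\,dv^{(6)}}{\int N(v^{(6)};\,0,\,I_d)\lim_{n\to\infty}K\big((a_n\varepsilon_n)^{-1}v^{(6)}-\varepsilon_n^{-1}(s_{obs}-s(\theta_0))\big)\,dv^{(6)}}\Big]^\gamma
= K\big(Ds(\theta_0)^T u\big)^\gamma.
\]
Therefore (IH3) is satisfied with $Z(u) = K(Ds(\theta_0)^T u)^\gamma$. Finally,
\[
\int Z(u)\,du = \int K\big(\|Ds(\theta_0)^T u\|\big)^\gamma\,du = |P(\theta_0)|^{-1/2}\int K(\|u\|)^\gamma\,du,
\]
which is in $(0,\infty)$ by Lemma 8(iii).

Lemma 12. Assume conditions (C2)-(C5) and (C6). Then $\int\pi(\theta)f_{ABC}(s_{obs}\mid\theta)^\gamma\,d\theta = \Theta_p(a_{n,\varepsilon}^{\gamma d-p})$ for $\gamma\in(0,2]$.

Proof. Since $c_\gamma(s_{obs}) = c_{\gamma,B_\delta}(s_{obs})+c_{\gamma,B_\delta^c}(s_{obs})$, by Lemma 6, (17), Lemma 9 and Lemma 11, the lemma immediately holds.

Proof of Theorem 5.1. Since $\pi(\theta) = \pi^{(0)}_{ABC}(\theta)$ and $\pi_{ABC}(\theta) = \pi^{(1)}_{ABC}(\theta)$, the orders of their acceptance probabilities follow immediately from (11) and Lemma 11. For $\Sigma_{IS,n}$, when $q_n(\theta) = \pi(\theta)$, $\Sigma_{IS,n} = V_{ABC}$. By Lemma 1 and Lemma 3, $\Sigma_{IS,n} = \Theta_p(a_n^{-2}+\varepsilon_n^2)$. When $q_n(\theta) = \pi_{ABC}(\theta\mid s_{obs},\varepsilon_n)$, plugging $q_n(\theta)$ into the alternative expression (3) of $\Sigma_{ABC,n}$, we have
\[
\Sigma_{ABC,n} = p_{acc,\pi}^{-1}\int(h(\theta)-h_{ABC})^2\pi(\theta)\,d\theta = p_{acc,\pi}^{-1}\big[Var_\pi[h(\theta)]+(E_\pi[h(\theta)]-h_{ABC})^2\big].
\]
Then since $\Sigma_{IS,n} = p_{acc,q}\Sigma_{ABC,n}$, $\Sigma_{IS,n} = c\,p_{acc,\pi_{ABC}}/p_{acc,\pi} = \Theta(a_{n,\varepsilon}^{p})$.

Due to the complication from the power $\alpha$ in $\Sigma_{ABC,n}$, the following notation and conditions, similar to (C12), (C13), (C8) and (C10), are needed.

(C17) Let $g_c^{(\gamma)}(s_{obs},\varepsilon) = \int\pi(\theta)f_{ABC}(s_{obs}\mid\theta,\varepsilon)^\gamma\,d\theta$ and $g_{h2}^{(\gamma)}(s_{obs},\varepsilon) = \int(h(\theta)-h_{ABC})^2\pi(\theta)f_{ABC}(s_{obs}\mid\theta,\varepsilon)^\gamma\,d\theta$. Assume that in $D_\varepsilon g_c^{(\gamma)}(s_{obs},\varepsilon)$ and $D_\varepsilon g_{h2}^{(\gamma)}(s_{obs},\varepsilon)$, the differentiation and integration can be exchanged.

(C18) $\max_{\varepsilon\in(0,c_{tol})}H_\varepsilon\{g_{h2}^{(\gamma)}(s_{obs},\varepsilon)/g_c^{(\gamma)}(s_{obs},\varepsilon)\} = O_p(1)$ for $\gamma\in(0,1)$.

(C19) $\int|A(\theta)|^{\alpha/2}\|\theta\|^k\pi(\theta)\,d\theta<\infty$ for $k = 0,1,2$.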
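(C20) $\int|A(\theta)|^{\alpha/2}|h_k(\theta)|\pi(\theta)\,d\theta<\infty$ and $\int|A(\theta)|^{\alpha/2}h_k(\theta)^2\pi(\theta)\,d\theta<\infty$.

Proof of Theorem 5.2. For $q(\theta) = \pi^{(\alpha)}_{ABC}(\theta)$, since $\alpha\in(0,1)$, the order of $p_{acc,q}$ is $O(a_{n,\varepsilon}^{d}\varepsilon_n^{d})$, following from (11) and Lemma 11. Using the notation in (C17), from (12),
\[
\Sigma_{IS,n} = \frac{g_c^{(1-\alpha)}(s_{obs},\varepsilon_n)\,g_c^{(1+\alpha)}(s_{obs},\varepsilon_n)}{g_c^{(1)}(s_{obs},\varepsilon_n)^2}\cdot\frac{g_{h2}^{(1-\alpha)}(s_{obs},\varepsilon_n)}{g_c^{(1-\alpha)}(s_{obs},\varepsilon_n)},
\]
which is a product of two ratios. The first ratio has the order $\Theta_p(1)$ by Lemma 11. For the second ratio, by Taylor expansion and (C18),
\[
\frac{g_{h2}^{(1-\alpha)}(s_{obs},\varepsilon_n)}{g_c^{(1-\alpha)}(s_{obs},\varepsilon_n)} = \frac{g_{h2}^{(1-\alpha)}(s_{obs},0)}{g_c^{(1-\alpha)}(s_{obs},0)}+H_\varepsilon\Big\{\frac{g_{h2}^{(1-\alpha)}(s_{obs},c_{\theta 2})}{g_c^{(1-\alpha)}(s_{obs},c_{\theta 2})}\Big\}\varepsilon_n^2
\]
\[
= \frac{\int(h(\theta)-E[h(\theta)\mid s_{obs}])^2\pi(\theta)f_n^{1-\alpha}(s_{obs}\mid\theta)\,d\theta}{\int\pi(\theta)f_n^{1-\alpha}(s_{obs}\mid\theta)\,d\theta}+O_p(\varepsilon_n^2)
\]
\[
= Var^{(1-\alpha)}[h(\theta)\mid s_{obs}]+\big(E[h(\theta)\mid s_{obs}]-E^{(1-\alpha)}[h(\theta)\mid s_{obs}]\big)^2+O_p(\varepsilon_n^2), \tag{31}
\]
where $E^{(1-\alpha)}[\cdot\mid s_{obs}]$ and $Var^{(1-\alpha)}[\cdot\mid s_{obs}]$ are the posterior mean and variance with prior density $\pi(\theta)$ and likelihood proportional to $f_n^{1-\alpha}(s_{obs}\mid\theta)$. In order to evaluate (31), results similar to Lemma 5 are needed. Although $f_n^{1-\alpha}(s_{obs}\mid\theta)$ is unnormalised, note that the proofs of Lemmas 4-5 do not use the fact that $f_n(s_{obs}\mid\theta)$ is normalised, and that $\theta_{MLES}$ is also the maximum point of $f_n^{1-\alpha}(s_{obs}\mid\theta)$. Similar results would therefore hold for $E^{(1-\alpha)}[h(\theta)\mid s_{obs}]$ and $Var^{(1-\alpha)}[h(\theta)\mid s_{obs}]$ if the corresponding conditions, which are (C4) and (C14)-(C16) with $f_n(s_{obs}\mid\theta)$ replaced by $f_n^{1-\alpha}(s_{obs}\mid\theta)$, are satisfied. It is easy to verify that (C4) and (C14)-(C16) are also satisfied for $f_n^{1-\alpha}(s_{obs}\mid\theta)$. Therefore $Var^{(1-\alpha)}[h(\theta)\mid s_{obs}] = a_n^{-2}(1-\alpha)^{-1}Dh(\theta_0)^T I^{-1}(\theta_0)Dh(\theta_0)+o_p(a_n^{-2})$ and $E^{(1-\alpha)}[h(\theta)\mid s_{obs}] = h(\theta_{MLES})+o_p(a_n^{-1})$, implying that $E[h(\theta)\mid s_{obs}]-E^{(1-\alpha)}[h(\theta)\mid s_{obs}] = o_p(a_n^{-1})$. Therefore $\Sigma_{IS,n} = \Theta_p(a_n^{-2}+\varepsilon_n^2)$.

APPENDIX D: CALCULATIONS FOR SIMPLE GAUSSIAN EXAMPLE

To obtain the ABC posterior for the Gaussian model of Section 1.1, we use the result of [45].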
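For a bandwidth chosen so that the kernel is that of a Gaussian with marginal variance $\varepsilon_n^2$, the ABC posterior is equivalent to the posterior distribution obtained if we fit a model where
\[
s_n(y)\sim MVN\left(\begin{pmatrix}\theta\\ \theta\end{pmatrix},\ \begin{pmatrix}2/n+\varepsilon_n^2 & 0\\ 0 & 2/n+\varepsilon_n^2\end{pmatrix}\right),
\]
from which the ABC posterior follows by standard calculations.

Now consider the acceptance probability of the IS-ABC algorithm using $\pi^{(\alpha)}_{ABC}(\theta)$ as a proposal. Conditional on the proposed $\theta$ value, the simulated summary statistic is $S\sim N(\theta,1/n)$. So, as the proposal distribution of $\theta$ is
\[
N\Big(\frac{\alpha s_{obs}}{1/n+\varepsilon_n^2+\alpha},\ \frac{1+n\varepsilon_n^2}{\alpha n+1+n\varepsilon_n^2}\Big),
\]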
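this gives the marginal proposal distribution of $S$ as
\[
N\Big(\frac{\alpha s_{obs}}{1/n+\varepsilon_n^2+\alpha},\ \frac{1+n\varepsilon_n^2}{\alpha n+1+n\varepsilon_n^2}+\frac1n\Big).
\]
The resulting acceptance probability can be calculated using the moment generating function of a non-central chi-squared distribution, as
\[
E\Big(\exp\Big\{-\frac12\frac{(S-s_{obs})^2}{\varepsilon_n^2}\Big\}\Big) = \frac{1}{\sqrt{1+t}}\exp\Big\{-\frac{\lambda t}{2+2t}\Big\},
\]
where
\[
\lambda = \frac{n(1+n\varepsilon_n^2)^2 s_{obs}^2}{(1+n\varepsilon_n^2+\alpha n)(n+n^2\varepsilon_n^2+\alpha n+1+n\varepsilon_n^2)}
\quad\text{and}\quad
t = \frac{1}{n\varepsilon_n^2}+\frac{1}{\varepsilon_n^2}\cdot\frac{1+n\varepsilon_n^2}{\alpha n+1+n\varepsilon_n^2}.
\]
Finally note that $\lambda\le s_{obs}^2$ and $t/(2+2t)<1/2$, so we can bound the acceptance probability both above and below by a constant times $1/\sqrt{1+t}$. For $\alpha = 0$, we have $t>1/\varepsilon_n^2$, and this will go to infinity as $n\to\infty$ because $\varepsilon_n = O(n^{-1/2})$. For $\alpha>0$, $t$ is bounded above and below by a constant times $1/(n\varepsilon_n^2)$.

Simple but tedious manipulations give that the variance of the importance sampling weights for accepted $\theta$ values is
\[
\int\frac{\sigma_{acc}}{\sigma^2_{ABC}}\exp\Big\{-\frac{(\theta-\mu_{ABC})^2}{\sigma^2_{ABC}}+\frac{(\theta-\mu_{acc})^2}{2\sigma^2_{acc}}\Big\}\,d\theta,
\]
where $\mu_{ABC}$ and $\sigma^2_{ABC}$ are the ABC posterior mean and variance, and $\mu_{acc}$ and $\sigma^2_{acc}$ are the mean and variance of accepted $\theta$ values. For this to be finite the exponent must have a negative quadratic coefficient in $\theta$, that is, we need $2\sigma^2_{acc}>\sigma^2_{ABC}$. Now we can calculate $\sigma^2_{acc}$ in a similar way to $\sigma^2_{ABC}$ above. This gives
\[
\sigma^2_{acc} = \frac{1+n\varepsilon_n^2}{\alpha n+1+n\varepsilon_n^2+n},
\]
which, using $n\varepsilon_n^2 = c$, simplifies to
\[
\sigma^2_{acc} = \frac{1+c}{\alpha n+1+c+n} = \sigma^2_{ABC}\,\frac{n+1+c}{\alpha n+1+c+n},
\]
as $\sigma^2_{ABC} = (1+c)/(n+1+c)$. This gives
\[
\frac{\sigma^2_{acc}}{\sigma^2_{ABC}} = \frac{n+1+c}{\alpha n+1+c+n}\to\frac{1}{1+\alpha},\quad\text{as } n\to\infty.
\]
Note further that this ratio is monotonically decreasing as $n$ increases. We require the ratio to be greater than $1/2$, which occurs if and only if $\alpha<1$.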
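The closed-form expressions of this appendix are easy to check numerically. The sketch below transcribes the formulas for the proposal moments, $\lambda$, $t$ and the variance ratio, and compares the chi-squared form of the acceptance probability with the equivalent direct Gaussian integral. The function names, the argument order and the example values are assumptions made for illustration; only the formulas themselves come from the text above.

```python
import math


def proposal_moments(n, eps, alpha, s_obs):
    """Mean and variance of the Gaussian proposal pi_ABC^(alpha) for theta."""
    mean = alpha * s_obs / (1.0 / n + eps**2 + alpha)
    var = (1.0 + n * eps**2) / (alpha * n + 1.0 + n * eps**2)
    return mean, var


def acceptance_probability(n, eps, alpha, s_obs):
    """Closed form (1 + t)^(-1/2) * exp(-lambda * t / (2 + 2t)) from the appendix."""
    ne2 = n * eps**2
    t = 1.0 / ne2 + (1.0 / eps**2) * (1.0 + ne2) / (alpha * n + 1.0 + ne2)
    lam = (n * (1.0 + ne2) ** 2 * s_obs**2) / (
        (1.0 + ne2 + alpha * n) * (n + n * ne2 + alpha * n + 1.0 + ne2)
    )
    return math.exp(-lam * t / (2.0 + 2.0 * t)) / math.sqrt(1.0 + t)


def acceptance_probability_direct(n, eps, alpha, s_obs):
    """Same expectation, E exp{-(S - s_obs)^2 / (2 eps^2)}, as a Gaussian integral."""
    mean, var_theta = proposal_moments(n, eps, alpha, s_obs)
    var_s = var_theta + 1.0 / n  # marginal variance of the simulated summary S
    total = eps**2 + var_s
    return math.sqrt(eps**2 / total) * math.exp(-((mean - s_obs) ** 2) / (2.0 * total))


def variance_ratio(n, c, alpha):
    """sigma_acc^2 / sigma_ABC^2 when n * eps^2 = c."""
    return (n + 1.0 + c) / (alpha * n + 1.0 + c + n)


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the formulas.
    n, c, s_obs = 1000, 1.0, 0.3
    eps = math.sqrt(c / n)
    for alpha in (0.0, 0.5, 0.9, 2.0):
        print(alpha,
              acceptance_probability(n, eps, alpha, s_obs),
              variance_ratio(n, c, alpha) > 0.5)
```

The two acceptance-probability functions should agree up to floating-point error, and the variance ratio should stay above $1/2$ for large $n$ exactly when $\alpha<1$, in line with the condition derived above.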
REFERENCES

[1] Barber, S., Voss, J., and Webster, M. (2013). The rate of convergence for approximate Bayesian computation. arXiv preprint arXiv:1311.2038.
[2] Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41:379–406.
[3] Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009). Adaptive approximate Bayesian computation. Biometrika, 96(4):983–990.
[4] Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162:2025–2035.
[5] Biau, G., Cérou, F., and Guyader, A. (2012). New insights into approximate Bayesian computation. arXiv preprint arXiv:1207.6461.
[6] Blum, M. G. (2010). Approximate Bayesian computation: a nonparametric perspective. Journal of the American Statistical Association, 105(491).
[7] Blum, M. G. and François, O. (2010). Non-linear regression models for approximate Bayesian computation. Statistics and Computing, 20(1):63–73.
[8] Blum, M. G., Nunes, M. A., Prangle, D., Sisson, S. A., et al. (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science, 28(2):189–208.
[9] Bortot, P., Coles, S. G., and Sisson, S. A. (2007). Inference for stereological extremes. Journal of the American Statistical Association, 102(477):84–92.
[10] Cappé, O., Douc, R., Guillin, A., Marin, J., and Robert, C. (2008). Adaptive importance sampling in general mixture classes. Statistics and Computing, 18(4):447–459.
[11] Cornuet, J.-M., Santos, F., Beaumont, M. A., Robert, C. P., Marin, J.-M., Balding, D. J., Guillemaud, T., and Estoup, A. (2008). Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics, 24(23):2713–2719.
[12] Creel, M. and Kristensen, D. (2013). Indirect likelihood inference (revised). UFAE and IAE working papers, Unitat de Fonaments de l'Analisi Economica (UAB) and Institut d'Analisi Economica (CSIC).
[13] Dean, T. A. and Singh, S. S. (2011). Asymptotic behaviour of approximate Bayesian estimators. arXiv preprint arXiv:1105.3655.
[14] Dean, T. A., Singh, S. S., Jasra, A., and Peters, G. W. (2014). Parameter estimation for hidden Markov models with intractable likelihoods. Scandinavian Journal of Statistics.
[15] Del Moral, P., Doucet, A., and Jasra, A. (2012). An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22(5):1009–1020.
[16] Duffie, D. and Singleton, K. J. (1993). Simulated moments estimation of Markov models of asset prices. Econometrica, 61(4):929–952.
[17] Fearnhead, P. and Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):419–474.
[18] Ghosal, S., Ghosh, J. K., and Samanta, T. (1995). On convergence of posterior distributions. The Annals of Statistics, pages 2145–2152.
[19] Ghosh, J. K., Delampady, M., and Samanta, T. (2006). An introduction to Bayesian analysis: theory and methods. Springer.
[20] Ghosh, J. K. and Ramamoorthi, R. (2003). Bayesian nonparametrics, volume 1. Springer.
[21] Gordon, N., Salmond, D., and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140:107–113.
[22] Gouriéroux, C. and Ronchetti, E. (1993). Indirect inference. Journal of Applied Econometrics, 8:s85–s118.
[23] Heggland, K. and Frigessi, A. (2004). Estimating functions in indirect inference. Journal of the Royal Statistical Society, Series B, 66:447–462.
[24] Hesterberg, T. (1988). Advances in Importance Sampling. PhD thesis, Stanford University.
[25] Hesterberg, T. (1995). Weighted average importance sampling and defensive mixture distributions. Technometrics, 37(2):185–194.
[26] Jasra, A., Kantas, N., and Ehrlich, E. (2014). Approximate inference for observation-driven time series models with intractable likelihoods. ACM Transactions on Modeling and Computer Simulation (TOMACS), 24(3):13.
[27] Lee, A. and Latuszynski, K. (2012). Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation. arXiv preprint arXiv:1210.6703.
[28] Liu, J. S. (1996). Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Statistics and Computing, 6(2):113–119.
[29] Loredo, T. J., Berger, J. O., Chernoff, D. F., Clyde, M. A., and Liu, B. (2012). Bayesian methods for analysis and adaptive scheduling of exoplanet observations. Statistical Methodology, 9(1):101–114.
[30] Marin, J.-M., Pillai, N. S., Robert, C. P., and Rousseau, J. (2013). Relevant statistics for Bayesian model choice. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[31] Marjoram, P., Molitor, J., Plagnol, V., and Tavare, S. (2003). Markov chain Monte Carlo without likelihoods. PNAS, 100:15324–15328.
[32] Martin, G. M., McCabe, B. P., Maneesoonthorn, W., and Robert, C. P. (2014). Approximate Bayesian computation in state space models. arXiv preprint arXiv:1409.8363.
[33] Peters, G. W., Kannan, B., Lasscock, B., Mellen, C., Godsill, S., et al. (2011). Bayesian cointegrated vector autoregression models incorporating alpha-stable noise for inter-day price movements via approximate Bayesian computation. Bayesian Analysis, 6(4):755–792.
[34] Picchini, U. (2013). Inference for SDE models via approximate Bayesian computation. Journal of Computational and Graphical Statistics, (just-accepted).
[35] Prangle, D., Fearnhead, P., Cox, M. P., Biggs, P. J., and French, N. P. (2013). Semi-automatic selection of summary statistics for ABC model choice. Statistical Applications in Genetics and Molecular Biology, 74:67–82.
[36] Pritchard, J. K., Seielstad, M. T., Perez-Lezaun, A., and Feldman, M. W. (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16:1791–1798.
[37] Raftery, A. and Bao, L. (2010). Estimating and projecting trends in HIV/AIDS generalized epidemics using incremental mixture importance sampling. Biometrics, 66(4):1162–1173.
[38] Ratmann, O., Andrieu, C., Wiuf, C., and Richardson, S. (2009). Model criticism based on likelihood-free inference, with an application to protein network evolution. Proceedings of the National Academy of Sciences, 106(26):10576–10581.
[39] Rubio, F., Johansen, A. M., et al. (2013). A simple approach to maximum intractable likelihood estimation. Electronic Journal of Statistics, 7:1632–1654.
[40] Sandmann, G. and Koopman, S. (1998). Estimation of stochastic volatility models via Monte Carlo maximum likelihood. Journal of Econometrics, 87(2):271–301.
[41] Sisson, S., Peters, G., Briers, M., and Fan, Y. (2010). A note on target distribution ambiguity of likelihood-free samplers. arXiv preprint arXiv:1005.5201.
[42] Toni, T., Welch, D., Strelkowa, N., Ipsen, A., and Stumpf, M. P. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface, 6(31):187–202.
[43] Wegmann, D., Leuenberger, C., and Excoffier, L. (2009). Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics.
[44] West, M. (1993). Approximating posterior distributions by mixture. Journal of the Royal Statistical Society. Series B (Methodological), 55(2):409–422.
[45] Wilkinson, R. D. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Statistical Applications in Genetics and Molecular Biology, 12(2):129–141.
[46] Yuan, A. and Clarke, B. (2004). Asymptotic normality of the posterior given a statistic. Canadian Journal of Statistics, 32(2):119–137.
Department of Mathematics and Statistics
E-mail: w.li@lancaster.ac.uk
E-mail: p.fearnhead@lancaster.ac.uk