Journal of Statistical Software


JSS Journal of Statistical Software
January 2015, Volume 63

Spatial Data Analysis with R-INLA with Some Extensions

Roger S. Bivand, NHH Norwegian School of Economics
Virgilio Gómez-Rubio, Universidad de Castilla-La Mancha
Håvard Rue, Norwegian University for Science and Technology

Abstract

The integrated nested Laplace approximation (INLA) provides an interesting way of approximating the posterior marginals of a wide range of Bayesian hierarchical models. This approximation is based on conducting a Laplace approximation of certain functions, and numerical integration is extensively used to integrate some of the model parameters out. The R-INLA package offers an interface to INLA, providing a suitable framework for data analysis. Although the INLA methodology can deal with a large number of models, only the most relevant have been implemented within R-INLA. However, many other important models are not available for R-INLA yet. In this paper we show how to fit a number of spatial models with R-INLA, including its interaction with other R packages for data analysis. Secondly, we describe a novel method to extend the number of latent models available for the model parameters. Our approach is based on conditioning on one or several model parameters and fitting these conditioned models with R-INLA. Then these models are combined using Bayesian model averaging to provide the final approximations to the posterior marginals of the model. Finally, we show some examples of the application of this technique in spatial statistics. It is worth noting that our approach can be extended to a number of other fields, and not only spatial statistics.

Keywords: INLA, spatial statistics, R.

1. Introduction

Bayesian inference has become very popular in spatial statistics in recent years. Part of this success is due to the availability of computational methods to tackle fitting of spatial models. Besag, York, and Mollié (1991) proposed in their seminal paper an appropriate way of fitting a spatial model using Markov chain Monte Carlo methods. This model has been extensively used and extended to consider different types of fixed and random effects for spatial and spatio-temporal analysis. In general, fitting these models has been possible because of the availability of different computational techniques, the most notable being Markov chain Monte Carlo (MCMC). For large models or big data sets, MCMC can be tedious and reaching the required number of samples can take a long time. In addition, autocorrelation may arise in the samples, so that an increased number of iterations may be required.

Alternatively, the posterior distributions of the parameters may be approximated in some way. However, most models are highly multivariate and approximating the full posterior distribution may not be possible in practice. The integrated nested Laplace approximation (INLA, Rue, Martino, and Chopin 2009) focuses on the posterior marginals for latent Gaussian models. Although these models may seem rather restricted, they appear in a fair number of fields. This also means that INLA will be particularly useful when only marginal inference on the model parameters is needed.

The R-INLA package (Rue, Martino, Lindgren, Simpson, Riebler, and Krainski 2014; Lindgren and Rue 2015) for the R programming language (R Core Team 2014) provides an interface to INLA (a free-standing programme) so that models can be fitted using standard R commands. Results are readily available for plotting or further analysis.

First of all, we describe how R-INLA can be used together with other R packages for spatial data analysis. Spatial data are often available in different formats that need to be loaded into R, and some pre-processing is required. Also, once the results are available, it is helpful to explain how to display them on a map.
Although INLA is a general method to approximate the posterior marginals, R-INLA implements a number of popular latent models and prior distributions for the model parameters. It is, however, difficult to fit new models with INLA if these are based on other distributions not available in R-INLA. This may be an inconvenience when trying to develop new models, as there is no easy way of extending R-INLA to fit other models without writing them into INLA itself.

This is why we also describe a way of extending the number of models that R-INLA can fit with little extra effort. First of all, we consider one (or more) parameters in our model so that, if they are fixed, the resulting model can be fitted with R-INLA. What we are doing here is, in fact, fitting a model conditioned on the values assigned to these parameters. Then, we can assign different values to these parameters and combine the resulting models in some way to obtain a fit of the original model. We have used Bayesian model averaging and numerical integration techniques to combine these models (Bivand, Gómez-Rubio, and Rue 2014b).

This paper is organized as follows. Section 2 describes the integrated nested Laplace approximation. In Section 3 the different latent models for spatial statistics are described. We describe how to extend R-INLA to fit new models in Section 4. Some examples are provided in Section 5. Finally, we discuss why our approach is relevant in the closing section.

2. Integrated nested Laplace approximation

Bayesian inference is based on computing the posterior distribution of a vector of model parameters $\mathbf{x}$ conditioned on the vector of observed data $\mathbf{y}$. Bayes' rule states that this posterior distribution can be written down as

$$\pi(\mathbf{x} \mid \mathbf{y}) \propto \pi(\mathbf{y} \mid \mathbf{x})\,\pi(\mathbf{x}) \tag{1}$$

Here, $\pi(\mathbf{y} \mid \mathbf{x})$ is the likelihood of the model and $\pi(\mathbf{x})$ represents the prior distribution on the model parameters. Usually, $\pi(\mathbf{x} \mid \mathbf{y})$ is a highly multivariate distribution and difficult to obtain. In particular, it is seldom possible to derive it in a closed form. For this reason, several computational approaches have been proposed to get approximations to it. MCMC is probably the most widely used family of computational approaches to estimate the posterior distribution.

The marginal distribution of parameter $x_i$ can be denoted by $\pi(x_i \mid \mathbf{y})$ and it can be derived from the full posterior by integrating out over the remaining set of parameters $\mathbf{x}_{-i}$.

Let us assume that we have a set of $n$ observations $\mathbf{y} = \{y_i\}_{i=1}^{n}$, whose distribution is of the exponential family. The mean of observation $i$ is $\mu_i$ and it can depend on a linear predictor $\eta_i$ via a link function. In turn, the linear predictor $\eta_i$ can be modelled as follows:

$$\eta_i = \alpha + \sum_{j=1}^{n_f} f^{(j)}(u_{ji}) + \sum_{k=1}^{n_\beta} \beta_k z_{ki} + \varepsilon_i \tag{2}$$

Here $\alpha$ is the intercept, $f^{(j)}$ are functions that define a set of $n_f$ random effects on a vector of covariates $\mathbf{u}$, $\beta_k$ are coefficients on some covariates $\mathbf{z}$ and $\varepsilon_i$ are error terms. Hence, the vector of latent effects is $\mathbf{x} = \{\{\eta_i\}, \alpha, \{\beta_k\}, \ldots\}$. Note that, given our particular interest in spatial models, the terms $f^{(j)}(u_{ji})$ can be defined so as to model spatial or spatio-temporal dependence.

$\mathbf{x}$ is modelled using a Gaussian distribution with zero mean and precision matrix $Q(\theta_1)$, where $\theta_1$ is a vector of hyperparameters. Furthermore, $\mathbf{x}$ is assumed to be a Gaussian Markov random field (GMRF, Rue and Held 2005). This means that $Q(\theta_1)$ will fulfil a number of Markov properties.

The distribution of observations $y_i$ will depend on the latent effects $\mathbf{x}$ and, possibly, a number of hyperparameters $\theta_2$. Taking the vector of hyperparameters $\theta = (\theta_1, \theta_2)$, observations $y_i$ will be independent of each other given $x_i$ and $\theta$ because of $\mathbf{x}$ being a GMRF.
Following Rue et al. (2009), the posterior distribution of the model latent effects $\mathbf{x}$ and hyperparameters $\theta$ can be written as

$$\pi(\mathbf{x}, \theta \mid \mathbf{y}) \propto \pi(\theta)\,\pi(\mathbf{x} \mid \theta) \prod_{i \in I} \pi(y_i \mid x_i, \theta) \propto \pi(\theta)\,|Q(\theta)|^{1/2} \exp\Big\{-\frac{1}{2}\mathbf{x}^{\top} Q(\theta)\,\mathbf{x} + \sum_{i \in I} \log(\pi(y_i \mid x_i, \theta))\Big\} \tag{3}$$

$I$ represents an index of observed data (from 1 to $n$), $Q(\theta)$ is a precision matrix on some hyperparameters $\theta$ and $\log(\pi(y_i \mid x_i, \theta))$ is the log-likelihood of observation $i$.

INLA allows different forms for the likelihood of the observations. This includes not only distributions from the exponential family but also mixtures of distributions. Also, INLA can handle observations with different likelihoods in the same model. Regarding the latent effects, different models can be used. We will describe some of these in more detail in Section 3.

The specification of the prior distributions $\pi(\theta)$ is also very flexible. These will often depend on the latent effect but, in principle, the most common distributions are available and the user can define their own prior distribution in the R-INLA package (but we will return to this later).

Hence, we can write the marginals of the elements in $\mathbf{x}$ and $\theta$ (i.e., latent effects and hyperparameters) as

$$\pi(x_i \mid \mathbf{y}) = \int \pi(x_i \mid \theta, \mathbf{y})\,\pi(\theta \mid \mathbf{y})\,d\theta \tag{4}$$

and

$$\pi(\theta_j \mid \mathbf{y}) = \int \pi(\theta \mid \mathbf{y})\,d\theta_{-j} \tag{5}$$

In order to estimate the previous marginals, we need $\pi(\theta \mid \mathbf{y})$ or, alternatively, a convenient approximation that we will denote by $\tilde{\pi}(\theta \mid \mathbf{y})$. Initially, this approximation can be taken as

$$\tilde{\pi}(\theta \mid \mathbf{y}) \propto \left.\frac{\pi(\mathbf{x}, \theta, \mathbf{y})}{\tilde{\pi}_G(\mathbf{x} \mid \theta, \mathbf{y})}\right|_{\mathbf{x} = \mathbf{x}^*(\theta)} \tag{6}$$

Here $\tilde{\pi}_G(\mathbf{x} \mid \theta, \mathbf{y})$ is a Gaussian approximation to the full conditional of $\mathbf{x}$ and $\mathbf{x}^*(\theta)$ is the mode of the full conditional for a given value of $\theta$.

Rue et al. (2009) take this approximation and use it to compute the marginal distribution of $x_i$ using numerical integration:

$$\tilde{\pi}(x_i \mid \mathbf{y}) = \sum_k \tilde{\pi}(x_i \mid \theta_k, \mathbf{y}) \times \tilde{\pi}(\theta_k \mid \mathbf{y}) \times \Delta_k \tag{7}$$

Here $\Delta_k$ are the weights associated with the ensemble of values $\theta_k$, defined on a multidimensional grid over the space of hyperparameters.

Note that in the previous equation it is important to have good approximations of $\pi(x_i \mid \theta_k, \mathbf{y})$. A Gaussian approximation $\tilde{\pi}_G(x_i \mid \theta_k, \mathbf{y})$, with mean $\mu_i(\theta)$ and variance $\sigma_i^2(\theta)$, may be a good starting point but a better approximation may be required in other cases. Rue et al. (2009) developed better approximations based on alternative approximation methods, such as the Laplace approximation. For example, they have used the Laplace approximation to obtain:

$$\tilde{\pi}_{LA}(x_i \mid \theta, \mathbf{y}) \propto \left.\frac{\pi(\mathbf{x}, \theta, \mathbf{y})}{\tilde{\pi}_{GG}(\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y})}\right|_{\mathbf{x}_{-i} = \mathbf{x}^*_{-i}(x_i, \theta)} \tag{8}$$

$\tilde{\pi}_{GG}(\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y})$ is a Gaussian approximation to $\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y}$ around its mode $\mathbf{x}^*_{-i}(x_i, \theta)$.

Rue et al. (2009) develop a simplified Laplace approximation to improve $\tilde{\pi}_{LA}(x_i \mid \theta, \mathbf{y})$ using a series expansion of the Laplace approximation around $x_i$. This approximation is computationally less expensive, and it also corrects for location and skewness.

The R-INLA package

An interface to INLA has been provided as an R package called R-INLA, which can be downloaded from the R-INLA website together with the free-standing external INLA programme.
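The numerical integration in Equation 7 can be illustrated with a small base R sketch. The toy model below is an assumption made purely for illustration (it is not one of the models in this paper, and it does not call R-INLA): a single latent effect with a Gaussian full conditional, and one hyperparameter whose posterior is evaluated on a regular grid with equal weights.

```r
# Toy model (assumed for illustration only):
#   y_j | x ~ N(x, 1),  x | theta ~ N(0, 1/theta),  theta ~ Gamma(1, 1)
set.seed(1)
y <- rnorm(20, mean = 1)
n <- length(y)
ybar <- mean(y)

# Regular grid of hyperparameter values theta_k with equal weights Delta_k
theta <- seq(0.01, 10, length.out = 400)
delta <- rep(theta[2] - theta[1], length(theta))

# Unnormalised pi(theta | y): prior times marginal density of ybar given theta
post_theta <- dgamma(theta, shape = 1, rate = 1) *
  dnorm(ybar, mean = 0, sd = sqrt(1 / n + 1 / theta))
post_theta <- post_theta / sum(post_theta * delta)  # normalise on the grid

# pi(x | theta_k, y) is exactly Gaussian in this toy model
cond_mean <- n * ybar / (n + theta)
cond_sd <- sqrt(1 / (n + theta))

# Equation 7: weighted sum of conditional densities over the grid
x <- seq(-1, 3, length.out = 501)
marg_x <- sapply(x, function(xi)
  sum(dnorm(xi, cond_mean, cond_sd) * post_theta * delta))

# The approximated marginal should integrate to roughly one
total_mass <- sum(marg_x) * (x[2] - x[1])
```

In INLA the conditional densities are themselves approximations (Gaussian or Laplace) and the grid is placed around the mode of the hyperparameter posterior; here both are simplified so that the structure of the weighted sum is easy to see.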
R-INLA provides a user model interface similar to the one used to fit generalized additive models (GAM) with function gam() in the mgcv package (Wood 2006). It can handle fixed effects, non-linear terms and random effects in a formula argument. The interface is flexible enough to allow for the specification of different priors and model fitting options. Non-linear terms and random effects are included in the formula as calls to the f() function.

The model is fitted with a call to function inla(), which will return the fitted model as an inla object. Note that, by default, only some results will be returned. These include the marginal distributions of the latent effects and hyperparameters, as well as summary statistics.

In addition to the posterior marginals, R-INLA can provide a number of additional quantities on the fitted model. For example, it can provide the log-marginal likelihood, $\log \pi(\mathbf{y})$, which can be used for model selection. Other model selection criteria such as the DIC (Spiegelhalter, Best, Carlin, and Van der Linde 2002) and CPO (Held, Schrödle, and Rue 2010) have also been implemented.

Furthermore, R-INLA includes a number of options to define the prior distributions for the parameters in the model. Well-known prior distributions are available and the user can define their own prior distributions as well.

In the next section we describe different examples of the use of R-INLA for spatial statistics, in which we have included a detailed description of how inla() should be called.

3. Spatial models with INLA

As discussed in Section 2, spatial dependence can be included as part of the vector of latent effects $\mathbf{x}$. In principle, any number of random effects can be included in the model. In this section, we will describe the different options available, depending on the type of problem. A full description of the models described here can be found on the R-INLA website, but we have included a summary. Blangiardo, Cameletti, Baio, and Rue (2013) and Gómez-Rubio, Bivand, and Rue (2014b) also discuss the different spatial models included in R-INLA.

First we will briefly introduce other papers describing the use of INLA and R-INLA for spatial statistics. Schrödle and Held (2010) describe the use of spatial and spatio-temporal models for disease mapping, including ecological regression.
Schrödle and Held (2011) expand the number of spatio-temporal models that can be used with R-INLA, and show the use of linear constraints to make complex spatio-temporal effects identifiable. Schrödle, Held, Riebler, and Danuser (2011) show how to use spatio-temporal models for disease surveillance. Eidsvik, Finley, Banerjee, and Rue (2012) focus on the use of R-INLA for the analysis of large spatial datasets. Finally, Ruiz-Cardenas, Krainski, and Rue (2012) develop spatio-temporal dynamic models with R-INLA.

Analysis of lattice data

First of all, we will discuss the analysis of lattice data because this will establish the basis for other types of analyses. In the analysis of lattice data, observations are grouped according to a set of areas, which usually represent some sort of administrative region (neighborhoods, municipalities, provinces, countries, etc.).

R-INLA includes a latent model for uncorrelated random effects. In this case, the random effects $u_i$ are modelled as

$$u_i \sim N(0, \tau_u) \tag{9}$$

where $\tau_u$ refers to the precision of the Gaussian distribution. It should be noted that R-INLA assigns a prior to $\log(\tau_u)$ which, by default, is a log-gamma distribution. Although this model is not spatial, it can be combined with other spatial models. Using $\log(\tau_u)$ instead of simply $\tau_u$ provides some advantages as $\log(\tau_u)$ is not constrained to be positive. This is particularly useful when optimising to find the mode of $\log(\tau_u)$, for example.

In order to model spatial correlation, neighborhoods must be defined among the study areas. It is often considered that two areas are neighbors if they share a common boundary. Spatial autocorrelation is modelled using a Gaussian distribution with zero mean and a precision matrix that will model correlation between neighbors.

Given that latent effects are a GMRF, we can define the variance-covariance matrix of the random effects as

$$\Sigma = \frac{1}{\tau} Q^{-1} \tag{10}$$

where $\tau$ is a precision parameter and matrix $Q$ encodes the spatial structure. Given that we are assuming a latent GMRF, matrix $Q$ is defined so that element $Q_{ij}$ is zero if areas $i$ and $j$ are not neighbors. This means that $Q$ is often a very sparse matrix. See, for example, Rue and Held (2005) for details.

Available specifications for spatial dependence include the intrinsic conditional autoregressive (CAR) specification (Besag et al. 1991). This will produce a $Q$ matrix in which element $Q_{ii}$ is $n_i$ (the number of neighbors of area $i$) and element $Q_{ij}$ (with $i \neq j$) is $-1$ if areas $i$ and $j$ are neighbors and 0 otherwise. This means that the spatial random effects $v_i$ are distributed as

$$v_i \mid v_j, \tau_v \sim N\Big(\frac{1}{n_i} \sum_{i \sim j} v_j,\ \frac{1}{\tau_v n_i}\Big), \quad i \neq j \tag{11}$$

$\tau_v$ is the conditional precision of the random effects. As in the previous model, R-INLA uses a log-gamma prior on $\log(\tau_v)$.

In addition, a proper version of this model is available as well, for which the spatial random effects are distributed as

$$v_i \mid v_j, \tau_v \sim N\Big(\frac{1}{n_i + d} \sum_{i \sim j} v_j,\ \frac{1}{\tau_v (n_i + d)}\Big), \quad i \neq j \tag{12}$$

$d$ is a positive quantity to make the distribution proper. By default, a log-gamma distribution is assigned to $\log(d)$.
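The structure of the intrinsic and proper CAR precision matrices described above can be sketched in base R on a tiny, hypothetical map. The four-area adjacency and the value of d below are assumptions chosen only to make the matrices small enough to inspect; the precision parameter is left out (set to one).

```r
# Hypothetical map: four areas on a line, 1 - 2 - 3 - 4
C <- matrix(0, 4, 4)
C[cbind(1:3, 2:4)] <- 1
C <- C + t(C)                 # symmetric binary adjacency matrix

n_i <- rowSums(C)             # number of neighbours of each area

# Intrinsic CAR: Q_ii = n_i, Q_ij = -1 for neighbours, 0 otherwise
Q_icar <- diag(n_i) - C

# Proper CAR: add d > 0 to the diagonal (d = 0.5 is an arbitrary choice)
d <- 0.5
Q_proper <- diag(n_i + d) - C

# Rows of the intrinsic precision sum to zero, so it is singular,
# whereas the proper version has strictly positive eigenvalues
icar_row_sums <- rowSums(Q_icar)
min_eig_proper <- min(eigen(Q_proper, symmetric = TRUE)$values)
```

The zero row sums show why the intrinsic specification is improper (the overall level of the effects is not identified), and the positive smallest eigenvalue shows how d restores a proper distribution.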
A more general approach is obtained with the following precision matrix:

$$Q = \Big(I - \frac{\rho}{\lambda_{\max}} C\Big) \tag{13}$$

Here $I$ is the identity matrix, $\rho$ a spatial autocorrelation parameter, $C$ an adjacency matrix and $\lambda_{\max}$ the maximum eigenvalue of $C$. R-INLA assigns a Gaussian prior on $\log(\rho/(1 - \rho))$. This specification ensures that $\rho$ takes values between 0 and 1.

In the following example we use the Boston housing data, which is described in Harrison and Rubinfeld (1978), to develop an example on several spatial models. This data set records the median price of houses that were occupied by their owners plus some other relevant covariates (see Harrison and Rubinfeld 1978; Pace and Gilley 1997, for details). Data have been recorded at the tract level. The neighborhood structure of the tracts is also available in the boston data set from the R package spdep (Bivand 2014). In addition, this data set is also available in a shapefile, which is the one we will use in this example. This will provide a more general example on how to load external data into R to fit models with R-INLA.

readShapePoly(), from package maptools (Bivand and Lewin-Koh 2014), can be used to load vector data from a shapefile. Alternatively, readOGR(), from package rgdal (Bivand, Keitt, and Rowlingson 2014a), provides a more general data loading framework for vector data since it supports a wider range of formats. This is the one we have used to load the Boston data set:

R> library("rgdal")
R> boston <- readOGR(system.file("etc/shapes", package = "spdep")[1],
+    "boston_tracts")

Here, readOGR() takes the directory where the layer (shapefile) is located and the layer name, which in this case is the name of the shapefile, as arguments and returns an object of type SpatialPolygonsDataFrame. This data object is used to store the tract boundaries plus the associated data (tract name and other variables).

Before fitting any spatial model, the neighborhood structure needs to be defined. A common criterion is to consider that two areas are neighbors if they share a common boundary. Function poly2nb() will take the tract boundaries and perform this operation to provide us with the adjacency structure of the Boston tracts as a nb object:

R> library("spdep")
R> bostonadj <- poly2nb(boston, queen = FALSE)

Here, we have also set queen = FALSE so that queen adjacency is not used, i.e., more than one shared point is required in order to consider two areas as neighbors. We have converted this into a binary matrix to be used with R-INLA using function nb2mat(). Furthermore, the adjacency matrix is converted into a sparse matrix of class dgTMatrix to reduce memory usage. This will be passed to function f() when defining the spatial model.

R> adj <- nb2mat(bostonadj, style = "B")
R> adj <- as(adj, "dgTMatrix")

A summary of some latent models implemented in R-INLA, and that can be used within the f() function, is available in Table 1.
Note that this is not an exhaustive list and that a complete list of the available latent models can be obtained from the R-INLA documentation. We have also included a column showing whether these models are restricted to a regular grid. Also, detailed examples are available from the R-INLA website.

Name in f()    Model                                                              Regular grid
besag          Intrinsic CAR                                                      No
besagproper    Proper CAR                                                         No
bym            Convolution model                                                  No
generic0       $\Sigma = \frac{1}{\tau} Q^{-1}$                                   No
generic1       $\Sigma = \frac{1}{\tau} (I_n - \frac{\rho}{\lambda_{\max}} C)^{-1}$   No
rw2d           2-D random walk                                                    Yes
matern2d       Matérn correlation                                                 Yes

Table 1: Summary of some latent models implemented in R-INLA for spatial statistics.

Fixed effects (including the intercept) in R-INLA have a Gaussian prior with fixed mean and precision, which are 0 and 0.01 (or 0 for the intercept) by default, respectively. These values can be changed using option control.fixed in the inla() call. control.fixed must take a named list of arguments, which are used to control how to handle the fixed effects in the model. In this named list, mean.intercept and prec.intercept can be used to set the parameters of the Gaussian prior of the intercept, whilst mean and prec are the analogous parameters to define the priors for the other fixed effects. These can be a numeric value or another named list, using the names of fixed effects, to set different priors for different effects. Note that, unlike the precisions of the random effects presented before, the precisions in the fixed effects priors cannot be estimated.

The model that we are fitting is:

$$y_i = \alpha + \beta X_i + v_i + \varepsilon_i \tag{14}$$

where $\alpha$ is the model intercept, $\beta$ a vector of coefficients of the covariates $X_i$, $v_i$ a random effect with an intrinsic CAR specification and $\varepsilon_i$ a random Gaussian error term. As f() needs an area index which must have different values for different areas, this is first defined in variable ID.

R> library("INLA")
R> boston$ID <- 1:nrow(boston)
R> form <- log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
+    AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT) +
+    f(ID, model = "besag", graph = adj)
R> btdf <- as.data.frame(boston)
R> m1 <- inla(form, data = btdf, control.predictor = list(compute = TRUE))

Note how the call to inla() is similar to fitting other regression models in R with glm() or gam(). Furthermore, it is very easy to include spatial random effects with function f() in the formula passed to inla(). Finally, control.predictor = list(compute = TRUE) is used to compute summary statistics on the fitted values. A summary of the model can be obtained as follows:

R> summary(m1)

Call:
"inla(formula = form, data = btdf, control.predictor = list(compute = TRUE))"

[Numeric values of this output did not survive extraction; only its layout is recoverable.]

Time used: Pre-processing, Running inla, Post-processing, Total
Fixed effects: columns mean, sd, 0.025quant, 0.5quant, 0.975quant, mode, kld; rows (Intercept), CRIM, ZN, INDUS, CHAS, I(NOX^2), I(RM^2), AGE, log(DIS), log(RAD), TAX, PTRATIO, B, log(LSTAT)
Random effects: Name ID, Model Besags ICAR model
Model hyperparameters: columns mean, sd, 0.025quant, 0.5quant, 0.975quant, mode; rows Precision for the Gaussian observations, Precision for ID
Expected number of effective parameters (std dev 5.348)
Number of equivalent replicates
Marginal Likelihood
Posterior marginals for linear predictor and fitted values computed

The output includes summary statistics of the posterior marginals of the coefficients of the fixed effects plus the precisions of the error term and intrinsic CAR random effect. In addition, kld reports the Kullback-Leibler divergence between the Gaussian and the (simplified) Laplace approximation to the marginal posterior densities. This provides information about the accuracy of the Gaussian approximation.

The marginal likelihood of the model is also reported and it is computed by integrating all the model parameters out. Hence, it is not the predictive marginal likelihood and it can be used to perform model selection, for example. The effective number of parameters, as defined in Spiegelhalter et al. (2002), and the associated number of equivalent replicates are also shown. See Martino and Rue (2010) for more details on the R-INLA output.
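The summary statistics reported in such an output are functionals of the posterior marginals, so they can be recomputed from any marginal that is stored as a set of (value, density) pairs. The sketch below uses a hypothetical marginal held in a two-column matrix; the Gaussian shape and the grid are assumptions chosen so that the results are easy to check, and no R-INLA function is called.

```r
# Hypothetical marginal as a two-column matrix of values and densities
x <- seq(-4, 6, length.out = 2001)
marg <- cbind(x = x, y = dnorm(x, mean = 1, sd = 0.8))

h <- diff(marg[, "x"])[1]                    # regular grid spacing
dens <- marg[, "y"] / sum(marg[, "y"] * h)   # renormalise on the grid

# Posterior mean, standard deviation and mode by simple quadrature
post_mean <- sum(marg[, "x"] * dens * h)
post_sd <- sqrt(sum((marg[, "x"] - post_mean)^2 * dens * h))
post_mode <- unname(marg[which.max(dens), "x"])

# Quantiles by inverting the cumulative distribution on the grid
cdf <- cumsum(dens * h)
probs <- c(0.025, 0.5, 0.975)
quant <- sapply(probs, function(p) marg[which(cdf >= p)[1], "x"])
names(quant) <- paste0(probs, "quant")
```

A finer grid or interpolation of the cumulative distribution would give more accurate quantiles; the simple inversion above is only meant to show where the reported columns come from.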

Figure 1: Marginals of the fixed effects, and the precisions of the error term and spatial random effects, Boston housing data. [Panels: (Intercept), CRIM, ZN, INDUS, CHAS1, I(NOX^2), I(RM^2), AGE, log(DIS), log(RAD), TAX, PTRATIO, B, log(LSTAT), Precision error, Precision spatial effects.]

Figure 1 shows the estimated marginals of the coefficients of the fixed effects and the precisions of the random effects in the model. These distributions can be used to compute summary statistics for the model parameters. In the previous R-INLA output these marginals have been used to compute the posterior mean, standard deviation, mode and some quantiles (0.025, 0.5 and 0.975).

Fitted values can be easily displayed in a map. First, we need to add all the required values to the SpatialPolygonsDataFrame:

R> boston$LOGCMEDV <- log(boston$CMEDV)
R> boston$FTDLOGCMEDV <- m1$summary.fitted.values[, "mean"]

Note that we will represent values in the log-scale. Next, we can use spplot() to display both the observed and the predicted values of house prices. In the following example, which can be seen in Figure 2, we have also used package RColorBrewer (Neuwirth 2014) to define a suitable color palette:

R> library("RColorBrewer")
R> spplot(boston, c("LOGCMEDV", "FTDLOGCMEDV"),
+    col.regions = brewer.pal(9, "Blues"), cuts = 8,
+    names.attr = c("Observed log-CMEDV", "Predicted log-CMEDV"))

Figure 2: Observed and predicted median values, Boston housing data. [Panels: Observed CMEDV, Predicted CMEDV.]

To provide an alternative visualisation of the results, we have included a short example using function qmap() from the ggmap package (Kahle and Wickham 2013). First of all we will reproject our data to WGS84. With fortify() the boston dataset is converted into a suitable format to be used when plotting and then the log median values are added to the new data.

R> bostonf <- spTransform(boston, CRS("+proj=longlat +datum=WGS84"))
R> library("ggmap")
R> bostonf <- fortify(bostonf, region = "TRACT")
R> id <- match(bostonf$id, as.character(boston$TRACT))
R> bostonf$LOGCMEDV <- boston$LOGCMEDV[id]

qmap() is based on the grammar of graphics implemented in the ggplot2 package (Wickham 2009). In the next example, qmap() is used to get satellite data from the Boston area, whilst geom_polygon() adds the boundaries:

R> qmap("boston", zoom = 10, maptype = "satellite") + geom_polygon(
+    data = bostonf, aes(x = long, y = lat, group = group, fill = LOGCMEDV),
+    colour = "white", alpha = 0.8, size = 0.3)

The resulting map can be seen in Figure 3.

Figure 3: Display of the Boston housing data set using ggmap and Google Maps.

Point patterns

Point patterns are analyzed with INLA as the result of a counting process, i.e., points are not modelled directly but they are aggregated over a grid of small squares. For this reason, the analysis of point patterns is conducted similarly to that of lattice data: counts are available for each square and these are assigned neighbors according to the adjacent squares. Then, counts can be smoothed using an appropriate non-linear term, such as spatial random effects. Hossain and Lawson (2009) compare different approximations to the analysis of point patterns, including methods that are based on discretisation of the study region.

In the following example we use the Japanese black pine data set from R package spatstat (Baddeley and Turner 2005). This data set records the location of Japanese black pine saplings in a square sampling region in a natural forest. This example is reproduced from Gómez-Rubio et al. (2014b). We first split the study area into smaller squares to create a grid of squares.

R> library("spatstat")
R> data("japanesepines")
R> japd <- as.data.frame(japanesepines)
R> Nrow <- 10
R> Ncol <- 10
R> n <- Nrow * Ncol
R> grd <- GridTopology(cellcentre.offset = c(0.05, 0.05),
+    cellsize = c(1/Nrow, 1/Ncol), cells.dim = c(Nrow, Ncol))
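The aggregation of points into counts over a grid of squares, described at the start of this section, can be sketched in base R without any spatial classes. The point pattern below is simulated (it is not the Japanese black pine data) and the helper names are made up for the illustration; the cell ordering also differs from the one used by the sp grid classes.

```r
# Hypothetical point pattern on the unit square (simulated)
set.seed(42)
pts <- cbind(x = runif(65), y = runif(65))

Nrow <- 10
Ncol <- 10
breaks_x <- seq(0, 1, length.out = Ncol + 1)
breaks_y <- seq(0, 1, length.out = Nrow + 1)

# Column and row index of the square that contains each point
col_id <- findInterval(pts[, "x"], breaks_x, rightmost.closed = TRUE)
row_id <- findInterval(pts[, "y"], breaks_y, rightmost.closed = TRUE)

# Counts per square; the factor levels keep empty squares as zeros
counts <- table(factor(row_id, levels = 1:Nrow),
                factor(col_id, levels = 1:Ncol))

n_squares <- length(counts)   # Nrow * Ncol squares
n_points <- sum(counts)       # every point falls in exactly one square
```

These counts play the role of the Poisson response in a lattice model, with one observation per square, including the squares that contain no points.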

After the creation of the grid, we have used function over() on the set of points and the newly defined squares to find how many points can be found in each square.

R> polygrdjap <- as(grd, "SpatialPolygons")
R> idpp <- over(SpatialPoints(japd), polygrdjap)
R> japgrd <- SpatialGridDataFrame(grd, data.frame(Ntrees = rep(0, n)))
R> tidpp <- table(idpp)
R> japgrd$Ntrees[as.numeric(names(tidpp))] <- tidpp

Next, an index variable is built to create the spatial neighborhood structure to be passed to the f() function. Note that care must be taken as R and R-INLA may have a different ordering of the areas when defining the adjacency matrix.

R> japgrd$SPIDX <- 1:n
R> japnb <- poly2nb(polygrdjap, queen = FALSE, row.names = 1:100)
R> adjpine <- nb2mat(japnb, style = "B")
R> adjpine <- as(adjpine, "dgTMatrix")

Here we have avoided using queen adjacency, as this would treat two areas that only share a corner as neighbors. Finally, we define the call to inla() using a formula which includes spatial random effects based on the grid of squares. In addition, we have set other options to compute the DIC, with control.compute = list(dic = TRUE), and the marginals of the linear predictors, using control.predictor = list(compute = TRUE). We have included the specification of the prior distributions of the log-precisions of unstructured and spatial random effects as well.

R> fpp <- Ntrees ~ 1 + f(japgrd$SPIDX, model = "bym", graph = adjpine,
+    hyper = list(prec.unstruct = list(prior = "loggamma",
+    param = c(0.001, 0.001)),
+    prec.spatial = list(prior = "loggamma", param = c(0.1, 0.1))))
R> japinlala <- inla(fpp, family = "poisson", data = as.data.frame(japgrd),
+    control.compute = list(dic = TRUE),
+    control.inla = list(tolerance = 1e-20, h = 1e-08),
+    control.predictor = list(compute = TRUE))
R> japgrd$INLALA <- japinlala$summary.fitted.values[, "mean"]

The former model is the one that we have employed with the Boston data set on an irregular lattice.
Given that we are now considering a regular lattice, it is also possible to use a two-dimensional random walk for spatial smoothing:

R> fpprw2d <- Ntrees ~ 1 + f(japgrd$SPIDX, model = "rw2d", nrow = 10,
+    ncol = 10, hyper = list(prec = list(prior = "loggamma",
+    param = c(0.001, 0.001))))
R> japinlalarw2d <- inla(fpprw2d, family = "poisson",
+    data = as.data.frame(japgrd), control.compute = list(dic = TRUE),
+    control.inla = list(tolerance = 1e-20, h = 1e-08),
+    control.predictor = list(compute = TRUE))
R> japgrd$INLALARW2D <- japinlalarw2d$summary.fitted.values[, "mean"]

Figure 4: Estimation of the intensity of a point pattern with R-INLA, Japanese black pine dataset. [Panels: DATA, INLA BYM, INLA RW2D.]

Figure 4 shows the original counts and the smoothed counts. Note that this is similar to estimating the intensity of an inhomogeneous point pattern using a smoothing method.

Geostatistics

R-INLA deals with geostatistical data on a regular grid. This means that observations need to be matched to the points in the grid and that those points with no observations attached are considered as missing values. Hence, this is somewhat similar to the analysis of lattice data and point patterns. However, R-INLA provides a number of options to build model-based geostatistical models (Diggle and Ribeiro 2007).

First of all, different likelihoods can be used. Secondly, there are different options to define the spatial random effects. Although it is still possible to model spatial dependence in the grid of points using a CAR specification, R-INLA provides a two-dimensional Matérn covariance function. This correlation allows, for example, the use of exponentially decaying functions such as

$$\Sigma_{ij} = \sigma^2 \exp(-d_{ij}/\phi) \tag{15}$$

where $d_{ij}$ is the distance between points $i$ and $j$, and $\phi$ is a parameter that controls the scale of the spatial dependence.

More recently, Lindgren, Rue, and Lindström (2011) follow a different approach based on a triangulation of the sampling points and the use of stochastic partial differential equations. Now, the spatial effects are defined as

$$u(s) = \sum_{k=1}^{n} \psi_k(s)\,w_k, \quad s \in \mathbb{R}^2 \tag{16}$$

Here, $\{\psi_k(s)\}$ is a set of basis functions and $w_k$ are the associated weights. Weights are assumed to be Gaussian. The advantages of this approach for spatial statistics are fully described in Cameletti, Lindgren, Simpson, and Rue (2013).

In order to show how to fit geostatistical models with R-INLA we reproduce here an example from Gómez-Rubio et al. (2014b) based on the Rongelap data set (Diggle and Ribeiro 2007), which records radionuclide concentration at 157 different locations in Rongelap island. We have restricted the analysis to one of the clusters in the north-east part of the island because observations need to be matched to a regular grid of points. For this analysis we have used R packages geoR (Ribeiro and Diggle 2001) and geoRglm (Christensen and Ribeiro 2002).

First of all, data are loaded and the data from the desired cluster are extracted from the original data set by checking that their coordinates are in the window $(-700, -500) \times (-1900, -1700)$.

R> library("geoR")
R> library("geoRglm")
R> data("rongelap")
R> rgldata <- as.data.frame(rongelap)
R> xy <- rongelap[[1]]
R> id1 <- (xy[, 1] < -500 & xy[, 1] > -700 & xy[, 2] > -1900 &
+    xy[, 2] < -1700)
R> rgldata <- rgldata[id1, ]

The next step is to define the grid topology for the grid to which these points will be matched. The grid is defined to be of dimension $5 \times 5$.

R> Nrow <- 5
R> Ncol <- 5
R> n <- Nrow * Ncol
R> grdoffset <- c(min(rgldata$X1), min(rgldata$X2))
R> csize1 <- diff(range(rgldata$X1))/(Nrow - 1)
R> csize2 <- diff(range(rgldata$X2))/(Ncol - 1)
R> grd <- GridTopology(cellcentre.offset = grdoffset,
+    cellsize = c(csize1, csize2), cells.dim = c(Nrow, Ncol))

Data will be placed in a SpatialGridDataFrame (using the previously defined grid topology) and re-organized according to what R-INLA expects for this model (i.e., grid data stored by column). An index variable IDX is added to be used in f() when defining the model. However, R-INLA will rely on how the rows are ordered in the data passed to inla() when defining distances and adjacencies (i.e., the index variable ordering will not be considered).

R> inla2sp <- inla.lattice2node.mapping(Nrow, Ncol)[, Ncol:1]
R> inla2sp <- as.vector(inla2sp)
R> spgrd <- SpatialGridDataFrame(grd, as.data.frame(rgldata[inla2sp, ]))
R> spgrd$IDX <- 1:nrow(spgrd@data)

Next, we create a SpatialPolygons object with the boundaries of the squares in the grid. This way, it is easy to match the data to the newly created grid using function over().
R> polgrd <- as(grd, "SpatialPolygons")
R> dataid <- over(SpatialPoints(as.matrix(rgldata[, 1:2])), polgrd)

It should be noted that radionuclide concentration is measured at each square by the average of the observations in the square, and this needs to be computed beforehand.

R> ag <- by(rgldata$data, dataid, sum)
R> umag <- by(rgldata$units.m, dataid, sum)
R> ratioag <- ag/umag
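As an illustration of this aggregation step, the following minimal sketch computes the ratio of summed counts to summed time units in each cell. All values and the names cell, counts and time are hypothetical (they are not taken from the Rongelap data), and tapply() is used here in place of by() purely for brevity.

```r
# Hypothetical toy data: six observations assigned to cells 1, 3 and 4
# of a small grid (cell 2 receives no observations).
cell   <- c(1, 1, 3, 3, 3, 4)
counts <- c(10, 14, 3, 5, 4, 20)          # made-up photon counts
time   <- c(100, 100, 50, 50, 100, 200)   # made-up time units

# Sum counts and time units within each cell, then take the ratio.
ag    <- tapply(counts, cell, sum)
umag  <- tapply(time, cell, sum)
ratio <- ag / umag

# The names of the result are the cell identifiers, which is what allows
# the values to be written back into the right rows of the grid.
print(ratio)
```

Cells without observations do not appear in the result, so they have to be handled separately when the ratios are attached to the grid.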

[Figure 5: Observed and estimated radionuclide concentration in Rongelap island. Panels: DATA, INLA MATERN2D, INLA RW2D.]

Then, a new column is added to the SpatialGridDataFrame with these averages. NA will be used for the squares with no data so that these values will be imputed from the model.

R> spgrd$ratioag <- NA
R> spgrd$ratioag[as.numeric(names(ratioag))] <- ratioag

Here we define a model with an intercept term and a random effect of the Matérn class. Note how we have fixed, for convenience, the values of the range and precision.

R> formula1 <- ratioag ~ 1 + f(spgrd$IDX, model = "matern2d", nrow = Nrow,
+    ncol = Ncol, hyper = list(range = list(initial = log(sqrt(8)/0.5),
+    fixed = TRUE), prec = list(initial = log(1), fixed = TRUE)))
R> rglinlala <- inla(formula1, family = "poisson",
+    control.predictor = list(compute = TRUE),
+    control.compute = list(dic = TRUE),
+    data = as.data.frame(spgrd))
R> spgrd$inlala <- rglinlala$summary.fitted.values[, "mean"]

As in the point pattern example, we have also used a two-dimensional random walk for spatial smoothing.

R> formularw2d <- ratioag ~ 1 + f(spgrd$IDX, model = "rw2d", nrow = Nrow,
+    ncol = Ncol, hyper = list(prec = list(prior = "loggamma",
+    param = c(1, 1))))
R> rglinlalarw2d <- inla(formularw2d, family = "poisson",
+    control.predictor = list(compute = TRUE),
+    control.compute = list(dic = TRUE),
+    data = as.data.frame(spgrd))
R> spgrd$inlalarw2d <- rglinlalarw2d$summary.fitted.values[, "mean"]

Figure 5 shows the observed and estimated radionuclide concentration in Rongelap island. It can be seen how our model has spatially smoothed the observed values.

[Figure 6: Estimation of the intensity of a point pattern with R-INLA and BayesX, Japanese black pine dataset. Panels: DATA, INLA BYM, INLA RW2D, BAYESX.]

R-INLA and other packages for Bayesian spatial modelling

R-INLA is not the only package for Bayesian spatial modelling. Bivand, Pebesma, and Gómez-Rubio (2013, Chapter 10) compare different packages for Bayesian modelling in the context of disease mapping. We will focus here on R2BayesX (Umlauf, Kneib, Lang, and Zeileis 2013; Umlauf, Adler, Kneib, Lang, and Zeileis 2015) because it provides a way of defining spatial models similar to that of R-INLA. For example, in order to reproduce the example on the Japanese black pine data with R2BayesX we can do the following:

R> library("R2BayesX")
R> bayesxadj <- nb2gra(japnb)
R> japb <- bayesx(Ntrees ~ 1 + s(SPIDX, bs = "re") +
+    s(SPIDX, bs = "spatial", map = bayesxadj), family = "poisson",
+    data = as.data.frame(japgrd))

Function nb2gra() is used to convert our adjacency matrix into an object of class gra, which is used in R2BayesX to store adjacencies. bayesx() takes arguments similar to those of inla(), and the model can be expressed using a formula, with s() used to define the random effects. The term with bs = "re" defines independent Gaussian random effects, and the term with bs = "spatial" defines the spatial random effects using the adjacency structure stored in bayesxadj. Retrieving the predicted data requires some care as they are reordered, but it is as simple as:

R> japgrd$BAYESX <- japb$fitted.values[
+    order(japb$bayesx.setup$order), "mu"]

Finally, we compare the fitted values obtained with R-INLA and R2BayesX in Figure 6. Note that differences appear not only because of the different models used but also because of the choice of prior distributions.

4. Extending R-INLA to fit new models

Although the current implementation of INLA in the R-INLA package provides a reasonable number of models for spatial dependence, it may be the case that we need to include some other models. As it is now, this is not possible without adding to the code of the external INLA programme. Bivand et al. (2014b) describe a simple way of extending INLA to use other latent models. In particular, they focus on some latent models used in spatial econometrics that are not available as part of the R-INLA package at the moment. A new latent class has been added recently and it is described in Gómez-Rubio, Bivand, and Rue (2014a).

This approach is based on considering a model where one or several parameters have been fixed in a way that makes the conditioned model fittable with R-INLA. If we denote by ρ the vector of parameters to fix and by $\hat{\rho}$ a specific set of fixed parameter values, the posterior of the conditioned model can be written as

$$\pi(\mathbf{x}, \theta \mid \mathbf{y}, \hat{\rho}) \qquad (17)$$

Taking this into account, it is clear that when conditioning on $\rho = \hat{\rho}$ R-INLA will give us an approximation to $\pi(x_i \mid \mathbf{y}, \hat{\rho})$ and $\pi(\theta_i \mid \mathbf{y}, \hat{\rho})$. Note that the full posterior distribution can be obtained by integrating ρ out, i.e.,

$$\pi(\mathbf{x}, \theta \mid \mathbf{y}) = \int \pi(\mathbf{x}, \theta \mid \mathbf{y}, \rho)\, \pi(\rho \mid \mathbf{y})\, d\rho \qquad (18)$$

where $\pi(\rho \mid \mathbf{y})$ is the posterior distribution of ρ. Also, note that this can be written as

$$\pi(\rho \mid \mathbf{y}) \propto \pi(\mathbf{y} \mid \rho)\, \pi(\rho) \qquad (19)$$

Here $\pi(\rho)$ is a prior distribution on ρ and $\pi(\mathbf{y} \mid \rho)$ is the marginal likelihood of the model, which is reported by R-INLA. Hence, $\pi(\rho \mid \mathbf{y})$ can be estimated by re-scaling the expression in Equation 19. The posterior distribution of ρ can be estimated by defining a fine grid of values $S = \{\rho_i\}_{i=1}^{r}$ so that $\pi(\rho_i \mid \mathbf{y})$, $i = 1, \ldots, r$, are computed. Then $\pi(\rho \mid \mathbf{y})$ can be obtained by fitting and rescaling a spline (or other non-linear function) to the previous values.
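The grid-and-spline step can be sketched in base R as follows. This is only an illustration under stated assumptions: the log marginal likelihoods are produced by a binomial stand-in rather than by fitted R-INLA models, the prior on ρ is taken to be uniform on (0, 1), and the names rho, logmlik, fpost and post are hypothetical.

```r
# Grid of values for the conditioning parameter rho.
rho <- seq(0.05, 0.95, by = 0.05)

# Stand-in for the log marginal likelihoods log pi(y | rho_i). In practice
# these would be collected from the conditioned model fits; here a binomial
# log-likelihood is used only to have some smooth values to work with.
logmlik  <- dbinom(7, 10, rho, log = TRUE)
logprior <- dunif(rho, 0, 1, log = TRUE)   # assumed flat prior on rho

# Unnormalised log posterior on the grid (Equation 19), shifted for stability.
logpost <- logmlik + logprior
logpost <- logpost - max(logpost)

# Fit a spline on the log scale and rescale so the density integrates to one.
fpost <- splinefun(rho, logpost)
dens  <- function(r) exp(fpost(r))
const <- integrate(dens, min(rho), max(rho))$value
post  <- function(r) dens(r) / const

# Check that the rescaled density integrates to one over the grid range.
integrate(post, min(rho), max(rho))$value
```

Working on the log scale before exponentiating keeps the interpolated density positive, which a spline fitted directly to the density values would not guarantee.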
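Using simple numerical integration techniques we can obtain an approximation to $\pi(\mathbf{x}, \theta \mid \mathbf{y})$ as follows:

$$\pi(\mathbf{x}, \theta \mid \mathbf{y}) = \int \pi(\mathbf{x}, \theta \mid \mathbf{y}, \rho)\, \pi(\rho \mid \mathbf{y})\, d\rho \simeq \sum_{\rho_i \in S} \pi(\mathbf{x}, \theta \mid \mathbf{y}, \rho_i)\, \pi(\rho_i \mid \mathbf{y})\, \Delta_i \qquad (20)$$

where $\Delta_i$ is the amplitude of the interval used in the discretisation of ρ. Note that the previous expression can be regarded as a weighted average of the different models fitted after conditioning on different values of ρ. From Equation 20 it is clear that we can obtain the following approximations to the posterior marginals of the individual latent parameters and hyperparameters:

$$\hat{\pi}(x_i \mid \mathbf{y}) = \sum_j \pi(x_i \mid \mathbf{y}, \rho_j)\, w_j \qquad (21)$$

$$\hat{\pi}(\theta_i \mid \mathbf{y}) = \sum_j \pi(\theta_i \mid \mathbf{y}, \rho_j)\, w_j \qquad (22)$$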

Here, $w_j$ is a weight associated with $\rho_j$, defined as:

$$w_j = \pi(\rho_j \mid \mathbf{y})\, \Delta_j \qquad (23)$$

This is like carrying out Bayesian model averaging (Hoeting, Madigan, Raftery, and Volinsky 1999) on the different conditioned models fitted with R-INLA. Altogether, this provides a way of combining simpler models to obtain our desired model. In Section 5 we show how to apply these ideas to different models in spatial statistics. Note that this approach can be easily extended to the case of ρ being a discrete random variable.

4.1. Implementation

We have implemented this approach in an R package called INLABMA, available from CRAN. The package includes some general functions to conduct Bayesian model averaging of models fitted with INLA. In addition, we have included some wrapper functions to fit the models described in Section 5.

5. Examples

5.1. Leroux model

Leroux, Lei, and Breslow (1999) propose a model for the analysis of spatial data in a lattice which is similar to the one by Besag et al. (1991), in the sense that they split variation according to spatial and non-spatial patterns. Rather than including the spatial and non-spatial random effects as a sum in the linear term, they consider a single random effect as follows:

$$u \sim \mathrm{MVN}(0, \Sigma); \quad \Sigma = \sigma^2 \left((1 - \lambda) I_n + \lambda M\right)^{-1} \qquad (24)$$

Here M is the precision matrix of a process with spatial structure and we will take that of an intrinsic CAR specification. Hence, the precision matrix is, in a sense, a mixture of the precisions of a non-spatial and a spatial one. λ controls how strong the spatial structure is. For λ = 1 the effect is entirely spatial whilst for λ = 0 there is no spatial dependence. In principle, this is not a model that R-INLA can fit. However, if λ is fixed, then the random effects are Gaussian with a known structure for the variance-covariance matrix, which can be fitted using a generic0 latent model.

Boston housing data

Here we revisit the Boston housing data to fit the Leroux et al. model.
First of all, it is worth mentioning that the model needs a wrapper function to be fitted for a given value of the spatial parameter λ. This wrapper function is based on the generic0 latent model available in R-INLA. Once λ is fixed the model can be easily fitted with R-INLA, as the latent effect is a multivariate Gaussian random effect with zero mean and precision matrix as implied by Equation 24. We repeat this procedure for different values of λ to obtain a list of fitted models to be combined later. Hence, we have written a simple wrapper function, which is included in package INLABMA (Gómez-Rubio and Bivand 2014):

R> library("INLABMA")
R> leroux.inla
function (formula, d, W, lambda, improve = TRUE, fhyper = NULL, ...)
{
    W2 <- diag(apply(W, 1, sum)) - W
    Q <- (1 - lambda) * diag(nrow(W)) + lambda * W2
    assign("Q", Q, environment(formula))
    if (is.null(fhyper)) {
        formula <- update(formula, . ~ . + f(idx, model = "generic0",
            Cmatrix = Q))
    }
    else {
        formula <- update(formula, . ~ . + f(idx, model = "generic0",
            Cmatrix = Q, hyper = fhyper))
    }
    res <- INLA::inla(formula, data = d, ...)
    if (improve)
        res <- INLA::inla.rerun(res)
    res$logdet <- as.numeric(Matrix::determinant(Q)$modulus)
    res$mlik <- res$mlik + res$logdet/2
    return(res)
}
<environment: namespace:INLABMA>

In the previous code, the precision matrix Q is created using the adjacency matrix W and the value of λ. Then the generic0 model is added to the formula with the fixed effects. Finally, we correct the marginal log-likelihood $\log \pi(\mathbf{y} \mid \lambda)$ (conditioned on the value of λ) by adding half the log-determinant of $(1 - \lambda) I_n + \lambda M$. Note that, in principle, this is not needed to fit a single model and obtain the approximations to the posterior marginals, as it is a constant. However, we are fitting and combining several models, so we need to correct for this because this scaling factor will change with the value of λ.

Argument ... is used to pass any other options to inla(). This can be used to tune and set a number of other options. Also, the adjacency matrix is taken from the data provided in the boston data set. Note that we will be using a binary adjacency matrix as the random effects have an intrinsic CAR specification:

R> boston.matb <- listw2mat(nb2listw(bostonadj, style = "B"))
R> bmspb <- as(boston.matb, "CsparseMatrix")

Function leroux.inla() is used in the example below to compute the fitted models for the Leroux et al. model. In this case, we take λ to be in the interval (0.8, 0.99) after previous assessment on where $\pi(\lambda \mid \mathbf{y})$ has its mode.
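The construction of the precision structure and the size of the log-determinant correction can be checked on a small example. The sketch below uses base R only and a hypothetical binary adjacency matrix for four areas on a line; leroux_Q() and half_logdet() are illustrative helper names, not functions from INLABMA or R-INLA.

```r
# Hypothetical binary adjacency matrix for four areas on a line: 1 - 2 - 3 - 4.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# Precision structure (1 - lambda) * I_n + lambda * M for a given lambda,
# where M = D - W is the intrinsic CAR structure (D holds the neighbour counts).
leroux_Q <- function(W, lambda) {
  M <- diag(rowSums(W)) - W
  (1 - lambda) * diag(nrow(W)) + lambda * M
}

# Half the log-determinant, i.e., the term added to the marginal log-likelihood.
half_logdet <- function(Q) {
  as.numeric(determinant(Q, logarithm = TRUE)$modulus) / 2
}

# The correction differs across values of lambda, which is why it cannot be
# ignored when models fitted for different lambda are compared or combined.
lambdas <- c(0, 0.5, 0.8, 0.99)
corrections <- sapply(lambdas, function(l) half_logdet(leroux_Q(W, l)))
print(data.frame(lambda = lambdas, correction = corrections))
```

For λ = 0 the matrix reduces to the identity and the correction is zero, while for λ = 1 the rows of the matrix sum to zero, so it is singular and the log-determinant is not finite.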
We also define a prior for the precision of the random effects in variable fhyper. The prior for the precision of the error term is defined in errorhyper. In addition, we have used mclapply() to parallelize the computations on operating systems supporting forking (not Windows). Note that the ability to fit the conditioned models independently, and hence in parallel, is an advantage over standard MCMC methods.
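The overall scheme of fitting one conditioned model per value of λ and then combining them as in Equations 21 to 23 might be sketched as follows. This is an illustration only: fit_conditioned() is a hypothetical stand-in that returns made-up numbers instead of calling inla(), a flat prior on λ is assumed, each conditional posterior marginal is taken to be Gaussian, and the weights are normalised to sum to one.

```r
library(parallel)

# Grid of values for lambda and the interval widths (Delta_j in Equation 23).
lambda <- seq(0.80, 0.99, length.out = 20)
delta  <- rep(diff(lambda)[1], length(lambda))

# HYPOTHETICAL stand-in for one conditioned fit. It returns a log marginal
# likelihood and the mean and standard deviation of a Gaussian posterior
# marginal for a single parameter. A real application would fit a model here.
fit_conditioned <- function(l) {
  list(mlik = -50 * (l - 0.9)^2, mean = 1 + l, sd = 0.2)
}

# mc.cores = 1 runs everywhere; it can be increased on systems that support forking.
fits <- mclapply(lambda, fit_conditioned, mc.cores = 1)

# Weights proportional to pi(y | lambda_j) * pi(lambda_j) * Delta_j.
logpost <- sapply(fits, function(f) f$mlik) + dunif(lambda, 0, 1, log = TRUE)
w <- exp(logpost - max(logpost)) * delta
w <- w / sum(w)

# Model-averaged marginal: weighted mixture of the conditional marginals.
theta <- seq(1, 3, length.out = 401)
cond  <- sapply(fits, function(f) dnorm(theta, f$mean, f$sd))
bma_density <- as.vector(cond %*% w)
```

Because each conditioned fit depends only on its own value of λ, the calls are independent of each other, which is what makes the parallel evaluation straightforward.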