Journal of Statistical Software


Journal of Statistical Software, January 2015, Volume 63, Issue 20.

Spatial Data Analysis with R-INLA with Some Extensions

Roger S. Bivand (NHH Norwegian School of Economics), Virgilio Gómez-Rubio (Universidad de Castilla-La Mancha), Håvard Rue (Norwegian University for Science and Technology)

Abstract

The integrated nested Laplace approximation (INLA) provides an interesting way of approximating the posterior marginals of a wide range of Bayesian hierarchical models. This approximation is based on conducting a Laplace approximation of certain functions, and numerical integration is extensively used to integrate some of the model parameters out. The R-INLA package offers an interface to INLA, providing a suitable framework for data analysis. Although the INLA methodology can deal with a large number of models, only the most relevant have been implemented within R-INLA, and many other important models are not available for R-INLA yet. In this paper we show how to fit a number of spatial models with R-INLA, including its interaction with other R packages for data analysis. Secondly, we describe a novel method to extend the number of latent models available for the model parameters. Our approach is based on conditioning on one or several model parameters and fitting these conditioned models with R-INLA. Then these models are combined using Bayesian model averaging to provide the final approximations to the posterior marginals of the model. Finally, we show some examples of the application of this technique in spatial statistics. It is worth noting that our approach can be extended to a number of other fields, and not only spatial statistics.

Keywords: INLA, spatial statistics, R.

1. Introduction

Bayesian inference has become very popular in spatial statistics in recent years. Part of this success is due to the availability of computational methods to tackle the fitting of spatial models. Besag, York, and Mollié (1991) proposed in their seminal paper an appropriate way of fitting
a spatial model using Markov chain Monte Carlo methods. This model has been extensively used and extended to consider different types of fixed and random effects for spatial and spatio-temporal analysis.

In general, fitting these models has been possible because of the availability of different computational techniques, the most notable being Markov chain Monte Carlo (MCMC). For large models or big data sets, MCMC can be tedious and reaching the required number of samples can take a long time. In addition, autocorrelation may arise in the chains, so that an increased number of iterations may be required. Alternatively, the posterior distributions of the parameters may be approximated in some way. However, most models are highly multivariate and approximating the full posterior distribution may not be possible in practice.

The integrated nested Laplace approximation (INLA, Rue, Martino, and Chopin 2009) focuses on the posterior marginals for latent Gaussian models. Although these models may seem rather restricted, they appear in a fair number of fields. This also means that INLA will be particularly useful when only marginal inference on the model parameters is needed.

The R-INLA package (Rue, Martino, Lindgren, Simpson, Riebler, and Krainski 2014; Lindgren and Rue 2015) for the R programming language (R Core Team 2014) provides an interface to INLA (a free-standing programme) so that models can be fitted using standard R commands. Results are readily available for plotting or further analysis.

First of all, we describe how R-INLA can be used together with other R packages for spatial data analysis. It is often the case that spatial data are available in different formats that need to be loaded into R, and some preprocessing is required. Also, once the results are available, it is helpful to explain how to display them on a map.
Although INLA is a general method to approximate the posterior marginals, R-INLA implements only a number of popular latent models and prior distributions for the model parameters. It is, however, difficult to fit new models with INLA if these are based on other distributions not available in R-INLA. This may be an inconvenience when trying to develop new models, as there is no easy way of extending R-INLA to fit other models without writing them into INLA itself.

This is why we also describe a way of extending the number of models that R-INLA can fit with little extra effort. First of all, we consider one (or more) parameters in our model so that, if they are fixed, the resulting model can be fitted with R-INLA. What we are doing here is, in fact, fitting a model conditioned on the values assigned to these parameters. Then, we can assign different values to these parameters and combine the resulting models in some way to obtain a fit of the original model. We have used Bayesian model averaging and numerical integration techniques to combine these models (Bivand, Gómez-Rubio, and Rue 2014b).

This paper is organized as follows. Section 2 describes the integrated nested Laplace approximation. In Section 3 the different latent models for spatial statistics are described. We describe how to extend R-INLA to fit new models in Section 4. Some examples are provided in Section 5. Finally, we discuss why our approach is relevant in Section 6.

2. Integrated nested Laplace approximation

Bayesian inference is based on computing the posterior distribution of a vector of model parameters $\mathbf{x}$ conditioned on the vector of observed data $\mathbf{y}$. Bayes' rule states that this
posterior distribution can be written down as

$$\pi(\mathbf{x} \mid \mathbf{y}) \propto \pi(\mathbf{y} \mid \mathbf{x})\,\pi(\mathbf{x}) \tag{1}$$

Here, $\pi(\mathbf{y} \mid \mathbf{x})$ is the likelihood of the model and $\pi(\mathbf{x})$ represents the prior distribution on the model parameters. Usually, $\pi(\mathbf{x} \mid \mathbf{y})$ is a highly multivariate distribution and difficult to obtain. In particular, it is seldom possible to derive it in closed form. For this reason, several computational approaches have been proposed to approximate it. MCMC is probably the most widely used family of computational approaches to estimate the posterior distribution.

The marginal distribution of parameter $x_i$ can be denoted by $\pi(x_i \mid \mathbf{y})$, and it can be derived from the full posterior by integrating out the remaining set of parameters $\mathbf{x}_{-i}$.

Let us assume that we have a set of $n$ observations $\mathbf{y} = \{y_i\}_{i=1}^{n}$, whose distribution is of the exponential family. The mean of observation $i$ is $\mu_i$, and it can depend on a linear predictor $\eta_i$ via a link function. In turn, the linear predictor $\eta_i$ can be modelled as follows:

$$\eta_i = \alpha + \sum_{j=1}^{n_f} f^{(j)}(u_{ji}) + \sum_{k=1}^{n_\beta} \beta_k z_{ki} + \varepsilon_i \tag{2}$$

Here $\alpha$ is the intercept, $f^{(j)}$ are functions on a set of $n_f$ random effects on a vector of covariates $\mathbf{u}$, $\beta_k$ are coefficients on some covariates $\mathbf{z}$ and $\varepsilon_i$ are error terms. Hence, the vector of latent effects is $\mathbf{x} = \{\{\eta_i\}, \alpha, \{\beta_k\}, \ldots\}$. Note that, given our particular interest in spatial models, terms $f^{(j)}(u_{ji})$ can be defined so as to model spatial or spatio-temporal dependence.

$\mathbf{x}$ is modelled using a Gaussian distribution with zero mean and precision matrix $Q(\theta_1)$ (i.e., variance-covariance matrix $Q(\theta_1)^{-1}$), where $\theta_1$ is a vector of hyperparameters. Furthermore, $\mathbf{x}$ is assumed to be a Gaussian Markov random field (GMRF, Rue and Held 2005). This means that $Q(\theta_1)$ will fulfil a number of Markov properties.

The distribution of observations $y_i$ will depend on the latent effects $\mathbf{x}$ and, possibly, a number of hyperparameters $\theta_2$. Taking the vector of hyperparameters $\theta = (\theta_1, \theta_2)$, observations $y_i$ will be independent of each other given $x_i$ and $\theta$ because $\mathbf{x}$ is a GMRF.
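The Markov properties mentioned above translate into sparsity of the precision matrix. As a minimal sketch in base R (a first-order autoregressive process chosen purely for illustration; the values of `n` and `phi` are hypothetical and this is not a model taken from the paper), the covariance matrix below is dense, while its inverse only has non-zero entries between neighbouring nodes:

```r
# Illustrative AR(1) process with 6 nodes; phi is a hypothetical correlation
n <- 6
phi <- 0.7

# Stationary covariance: Sigma[i, j] = phi^|i - j| / (1 - phi^2) (dense matrix)
Sigma <- phi^abs(outer(1:n, 1:n, "-")) / (1 - phi^2)

# The precision matrix is the inverse of the covariance matrix
Q <- solve(Sigma)

# Pairs of nodes that are not neighbours (|i - j| > 1)
far_apart <- abs(outer(1:n, 1:n, "-")) > 1

# Precision entries for non-neighbouring nodes vanish up to rounding error
print(round(Q, 10))
print(max(abs(Q[far_apart])))
```

Conditional independence between non-neighbouring nodes shows up as zeros in `Q`, which is what makes computations with GMRFs fast for large models.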
Following Rue et al. (2009), the posterior distribution of the model latent effects $\mathbf{x}$ and hyperparameters $\theta$ can be written as

$$\pi(\mathbf{x}, \theta \mid \mathbf{y}) \propto \pi(\theta)\,\pi(\mathbf{x} \mid \theta) \prod_{i \in I} \pi(y_i \mid x_i, \theta) \propto \pi(\theta)\,|Q(\theta)|^{1/2} \exp\Big\{ -\frac{1}{2}\,\mathbf{x}^\top Q(\theta)\,\mathbf{x} + \sum_{i \in I} \log \pi(y_i \mid x_i, \theta) \Big\} \tag{3}$$

Here $I$ represents an index of observed data (from 1 to $n$), $Q(\theta)$ is a precision matrix that depends on some hyperparameters $\theta$ and $\log \pi(y_i \mid x_i, \theta)$ is the log-likelihood of observation $i$.

INLA allows different forms for the likelihood of the observations. This includes not only distributions from the exponential family but also mixtures of distributions. Also, INLA can handle observations with different likelihoods in the same model. Regarding the latent effects, different models can be used. We will describe some of these in more detail in Section 3.

The specification of the prior distributions $\pi(\theta)$ is also very flexible. These will often depend on the latent effect but, in principle, the most common distributions are available and the
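user can define their own prior distribution in the R-INLA package (but we will return to this later).

Hence, we can write the marginals of the elements in $\mathbf{x}$ and $\theta$ (i.e., latent effects and hyperparameters) as

$$\pi(x_i \mid \mathbf{y}) = \int \pi(x_i \mid \theta, \mathbf{y})\,\pi(\theta \mid \mathbf{y})\,d\theta \tag{4}$$

and

$$\pi(\theta_j \mid \mathbf{y}) = \int \pi(\theta \mid \mathbf{y})\,d\theta_{-j} \tag{5}$$

In order to estimate the previous marginals, we need $\pi(\theta \mid \mathbf{y})$ or, alternatively, a convenient approximation that we will denote by $\tilde\pi(\theta \mid \mathbf{y})$. Initially, this approximation can be taken as

$$\tilde\pi(\theta \mid \mathbf{y}) \propto \left. \frac{\pi(\mathbf{x}, \theta, \mathbf{y})}{\tilde\pi_G(\mathbf{x} \mid \theta, \mathbf{y})} \right|_{\mathbf{x} = \mathbf{x}^*(\theta)} \tag{6}$$

Here $\tilde\pi_G(\mathbf{x} \mid \theta, \mathbf{y})$ is a Gaussian approximation to the full conditional of $\mathbf{x}$ and $\mathbf{x}^*(\theta)$ is the mode of the full conditional for a given value of $\theta$.

Rue et al. (2009) take this approximation and use it to compute the marginal distribution of $x_i$ using numerical integration:

$$\tilde\pi(x_i \mid \mathbf{y}) = \sum_k \tilde\pi(x_i \mid \theta_k, \mathbf{y}) \times \tilde\pi(\theta_k \mid \mathbf{y}) \times \Delta_k \tag{7}$$

Here $\Delta_k$ are the weights associated with the ensemble of values $\theta_k$, defined on a multidimensional grid over the space of hyperparameters.

Note that in the previous equation it is important to have good approximations of $\pi(x_i \mid \theta_k, \mathbf{y})$. A Gaussian approximation $\tilde\pi_G(x_i \mid \theta_k, \mathbf{y})$, with mean $\mu_i(\theta)$ and variance $\sigma_i^2(\theta)$, may be a good starting point, but a better approximation may be required in other cases. Rue et al. (2009) developed better approximations based on alternative approximation methods, such as the Laplace approximation. For example, they have used the Laplace approximation to obtain:

$$\tilde\pi_{LA}(x_i \mid \theta, \mathbf{y}) \propto \left. \frac{\pi(\mathbf{x}, \theta, \mathbf{y})}{\tilde\pi_{GG}(\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y})} \right|_{\mathbf{x}_{-i} = \mathbf{x}^*_{-i}(x_i, \theta)} \tag{8}$$

Here $\tilde\pi_{GG}(\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y})$ is a Gaussian approximation to $\mathbf{x}_{-i} \mid x_i, \theta, \mathbf{y}$ around its mode $\mathbf{x}^*_{-i}(x_i, \theta)$. Rue et al. (2009) develop a simplified Laplace approximation to improve $\tilde\pi_{LA}(x_i \mid \theta, \mathbf{y})$ using a series expansion of the Laplace approximation around $x_i$. This approximation is computationally less expensive, and it also corrects for location and skewness.

2.1. The R-INLA package

An interface to INLA has been provided as an R package called R-INLA, which can be downloaded from the R-INLA website together with the free-standing external INLA programme.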
R-INLA provides a user interface for model specification similar to the one used to fit generalized additive models (GAM) with function gam() in the mgcv package (Wood 2006). It can handle fixed effects, non-linear terms and random effects in a formula argument. The interface is flexible enough to allow for the specification of different priors and model fitting options. Non-linear terms and random effects are included in the formula as calls to the f() function.
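As a small sketch of what such a formula looks like, the snippet below builds one and inspects its terms. The variable names `y`, `x1` and `region` are hypothetical; `"iid"` is the R-INLA name for unstructured random effects. Because f() is only evaluated once the formula is passed to inla(), the snippet runs in base R without R-INLA installed.

```r
# Hypothetical response y, covariate x1 and area index region
form <- y ~ x1 + f(region, model = "iid")

# Inspect the terms of the formula without evaluating f():
# one fixed effect plus one f() term for the random effect
term_labels <- attr(terms(form), "term.labels")
print(term_labels)
```

Keeping fixed effects and f() terms in a single formula is what makes the call resemble glm() or gam().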
The model is fitted with a call to function inla(), which will return the fitted model as an inla object. Note that, by default, only some results will be returned. These include the marginal distributions of the latent effects and hyperparameters, as well as summary statistics.

In addition to the posterior marginals, R-INLA can provide a number of additional quantities on the fitted model. For example, it can provide the marginal likelihood $\pi(\mathbf{y})$ (reported on the log scale), which can be used for model selection. Other model selection criteria such as the DIC (Spiegelhalter, Best, Carlin, and Van der Linde 2002) and CPO (Held, Schrödle, and Rue 2010) have also been implemented.

Furthermore, R-INLA includes a number of options to define the prior distributions for the parameters in the model. Well-known prior distributions are available and the user can define their own prior distributions as well.

In the next section we describe different examples of the use of R-INLA for spatial statistics, in which we have included a detailed description of how inla() should be called.

3. Spatial models with INLA

As discussed in Section 2, spatial dependence can be included as part of the vector of latent effects. In principle, any number of random effects can be included in the model. In this section, we will describe the different options available, depending on the type of problem. A full description of the models described here can be found on the R-INLA website at http://www.r-inla.org/, but we have included a summary. Blangiardo, Cameletti, Baio, and Rue (2013) and Gómez-Rubio, Bivand, and Rue (2014b) also discuss the different spatial models included in R-INLA.

First we will briefly introduce other papers describing the use of INLA and R-INLA for spatial statistics. Schrödle and Held (2010) describe the use of spatial and spatio-temporal models for disease mapping, including ecological regression.
Schrödle and Held (2011) expand the number of spatio-temporal models that can be used with R-INLA, and show the use of linear constraints to make complex spatio-temporal effects identifiable. Schrödle, Held, Riebler, and Danuser (2011) show how to use spatio-temporal models for disease surveillance. Eidsvik, Finley, Banerjee, and Rue (2012) focus on the use of R-INLA for the analysis of large spatial data sets. Finally, Ruiz-Cardenas, Krainski, and Rue (2012) develop spatio-temporal dynamic models with R-INLA.

3.1. Analysis of lattice data

First of all, we will discuss the analysis of lattice data because this will establish the basis for other types of analyses. In the analysis of lattice data, observations are grouped according to a set of areas, which usually represent some sort of administrative region (neighborhoods, municipalities, provinces, countries, etc.).

R-INLA includes a latent model for uncorrelated random effects. In this case, the random effects $u_i$ are modelled as

$$u_i \sim N(0, 1/\tau_u) \tag{9}$$

where $\tau_u$ refers to the precision of the Gaussian distribution, so that the variance is $1/\tau_u$. It should be noted that R-INLA assigns a prior to $\log(\tau_u)$ which, by default, is a log-gamma distribution. Although this model
is not spatial, it can be combined with other spatial models. Using $\log(\tau_u)$ instead of simply $\tau_u$ provides some advantages, as $\log(\tau_u)$ is not constrained to be positive. This is particularly useful when optimising to find the mode of $\log(\tau_u)$, for example.

In order to model spatial correlation, neighborhoods must be defined among the study areas. It is often considered that two areas are neighbors if they share a common boundary. Spatial autocorrelation is modelled using a Gaussian distribution with zero mean and a precision matrix that will model correlation between neighbors.

Given that latent effects are a GMRF, we can define the variance-covariance matrix of the random effects as

$$\Sigma = \frac{1}{\tau} Q^{-1} \tag{10}$$

where $\tau$ is a precision parameter and matrix $Q$ encodes the spatial structure. Given that we are assuming a latent GMRF, this also means that matrix $Q$ will be defined such that element $Q_{ij}$ is zero if areas $i$ and $j$ are not neighbors. This means that $Q$ is often a very sparse matrix. See, for example, Rue and Held (2005) for details.

Available specifications for spatial dependence include the intrinsic conditional autoregressive (CAR) specification (Besag et al. 1991). This will produce a $Q$ matrix in which element $Q_{ii}$ is $n_i$ (the number of neighbors of area $i$) and element $Q_{ij}$ (with $i \neq j$) is $-1$ if areas $i$ and $j$ are neighbors and 0 otherwise. This means that the spatial random effects $v_i$ are distributed as

$$v_i \mid v_j, \tau_v \sim N\Big( \frac{1}{n_i} \sum_{j \sim i} v_j,\; \frac{1}{\tau_v n_i} \Big), \quad i \neq j \tag{11}$$

where $j \sim i$ denotes that areas $i$ and $j$ are neighbors and $\tau_v$ is the conditional precision of the random effects. As in the previous model, R-INLA uses a log-gamma prior on $\log(\tau_v)$.

In addition, a proper version of this model is available as well, for which the spatial random effects are distributed as

$$v_i \mid v_j, \tau_v \sim N\Big( \frac{1}{n_i + d} \sum_{j \sim i} v_j,\; \frac{1}{\tau_v (n_i + d)} \Big), \quad i \neq j \tag{12}$$

Here $d$ is a positive quantity that makes the distribution proper. By default, a log-gamma distribution is assigned to $\log(d)$.
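The structure matrices behind Equations 11 and 12 can be sketched in base R for a toy example. The four areas on a line and the value of `d` below are hypothetical, and the code only builds the matrices; it does not call R-INLA.

```r
# Hypothetical adjacency for 4 areas on a line: 1-2, 2-3, 3-4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# Intrinsic CAR structure: Q_ii = n_i (number of neighbours),
# Q_ij = -1 if i and j are neighbours, 0 otherwise
Q_icar <- diag(rowSums(W)) - W

# Proper version: adding d > 0 to the diagonal gives Q_ii = n_i + d
d <- 0.5
Q_proper <- Q_icar + d * diag(4)

print(Q_icar)
print(rowSums(Q_icar))                                 # rows sum to zero
print(min(eigen(Q_proper, symmetric = TRUE)$values))   # strictly positive
```

Because the rows of `Q_icar` sum to zero the matrix is singular, which is why the intrinsic CAR distribution is improper, whereas `Q_proper` is positive definite.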
A more general approach is obtained with the following precision matrix:

$$Q = \Big( I - \frac{\rho}{\lambda_{\max}} C \Big) \tag{13}$$

Here $I$ is the identity matrix, $\rho$ a spatial autocorrelation parameter, $C$ an adjacency matrix and $\lambda_{\max}$ the maximum eigenvalue of $C$. R-INLA assigns a Gaussian prior on $\log(\rho/(1 - \rho))$. This specification ensures that $\rho$ takes values between 0 and 1.

In the following example we use the Boston housing data, described in Harrison and Rubinfeld (1978), to illustrate several spatial models. This data set records the median price of owner-occupied houses plus some other relevant covariates (see Harrison and Rubinfeld 1978; Pace and Gilley 1997, for details). Data have been recorded at the tract level, and the neighborhood structure of the tracts is also available in the boston data set from the R package spdep (Bivand 2014). In addition, this data set is also available in a shapefile, which is the one we will use in this example. This
will provide a more general example on how to load external data into R to fit models with R-INLA.

readShapePoly(), from package maptools (Bivand and Lewin-Koh 2014), can be used to load vector data from a shapefile. Alternatively, readOGR(), from package rgdal (Bivand, Keitt, and Rowlingson 2014a), provides a more general data loading framework for vector data since it supports a wider range of formats. This is the one we have used to load the Boston data set:

R> library("rgdal")
R> boston <- readOGR(system.file("etc/shapes", package = "spdep")[1],
+    "boston_tracts")

Here, readOGR() takes as arguments the directory where the layer (shapefile) is located and the layer name, which in this case is the name of the shapefile, and returns an object of type SpatialPolygonsDataFrame. This data object is used to store the tract boundaries plus the associated data (tract name and other variables).

Before fitting any spatial model, the neighborhood structure needs to be defined. A common criterion is to consider that two areas are neighbors if they share a common boundary. Function poly2nb() will take the tract boundaries and perform this operation to provide us with the adjacency structure of the Boston tracts as an nb object:

R> library("spdep")
R> bostonadj <- poly2nb(boston, queen = FALSE)

Here, we have also set queen = FALSE so that queen adjacency is not used, i.e., more than one shared point is required to consider two areas as neighbors. We have converted this into a binary matrix to be used with R-INLA using function nb2mat(). Furthermore, the adjacency matrix is converted into a sparse matrix of class dgTMatrix to reduce memory usage. This will be passed to function f() when defining the spatial model.

R> adj <- nb2mat(bostonadj, style = "B")
R> adj <- as(adj, "dgTMatrix")

A summary of some latent models implemented in R-INLA, and that can be used within the f() function, is available in Table 1.
Note that this is not an exhaustive list and that a complete list of the available latent models can be obtained from the R-INLA documentation. We have also included a column showing whether these models are restricted to a regular grid. Also, detailed examples are available from the R-INLA website.

Fixed effects (including the intercept) in R-INLA have a Gaussian prior with fixed mean and precision, which are 0 and 0.01 (or 0 for the intercept) by default, respectively. These values can be changed using option control.fixed in the inla() call. control.fixed must take a named list of arguments, which are used to control how to handle the fixed effects in the model. In this named list, mean.intercept and prec.intercept can be used to set the parameters of the Gaussian prior of the intercept, whilst mean and prec are the analogous parameters to define the priors for the other fixed effects. These can be a numeric value or another named list, using the names of the fixed effects, to set different priors for different effects. Note that
precisions in the fixed effects priors cannot be estimated, as was the case with the different random effects presented before.

Table 1: Summary of some latent models implemented in R-INLA for spatial statistics.

Name in f()    Model                                                                  Regular grid
besag          Intrinsic CAR                                                          No
besagproper    Proper CAR                                                             No
bym            Convolution model                                                      No
generic0       $\Sigma = \frac{1}{\tau} Q^{-1}$                                       No
generic1       $\Sigma = \frac{1}{\tau} (I_n - \frac{\rho}{\lambda_{\max}} C)^{-1}$   No
rw2d           2D random walk                                                         Yes
matern2d       Matérn correlation                                                     Yes

The model that we are fitting is:

$$y_i = \alpha + \beta X_i + v_i + \varepsilon_i \tag{14}$$

where $\alpha$ is the model intercept, $\beta$ a vector of coefficients of the covariates $X_i$, $v_i$ a random effect with an intrinsic CAR specification and $\varepsilon_i$ is a random Gaussian error term. As f() needs an area index which must have different values for different areas, this is first defined in variable ID.

R> library("INLA")
R> boston$ID <- 1:nrow(boston)
R> form <- log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
+    AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT) +
+    f(ID, model = "besag", graph = adj)
R> btdf <- as.data.frame(boston)
R> m1 <- inla(form, data = btdf, control.predictor = list(compute = TRUE))

Note how the call to inla() is similar to fitting other regression models in R with glm() or gam(). Furthermore, it is very easy to include spatial random effects with function f() in the formula passed to inla(). Finally, control.predictor = list(compute = TRUE) is used to compute summary statistics on the fitted values. A summary of the model can be obtained as follows (numeric values that were lost in this transcription are shown as [..]):

R> summary(m1)

Call:
"inla(formula = form, data = btdf, control.predictor = list(compute = TRUE))"

Time used:
 Pre-processing    Running inla Post-processing           Total
           [..]            [..]            [..]            [..]

Fixed effects:
              mean  sd  0.025quant  0.5quant  0.975quant  mode  kld
(Intercept)   [..]
CRIM          [..]
ZN            [..]
INDUS         [..]
CHAS          [..]
I(NOX^2)      [..]
I(RM^2)       [..]
AGE           [..]
log(DIS)      [..]
log(RAD)      [..]
TAX           [..]
PTRATIO       [..]
B             [..]
log(LSTAT)    [..]

Random effects:
Name   Model
 ID    Besags ICAR model

Model hyperparameters:
                                          mean        sd
Precision for the Gaussian observations   1.626e[..]  [..]e+04
Precision for ID                          1.222e[..]  [..]e[..]
                                          0.025quant  0.5quant
Precision for the Gaussian observations   7.582e[..]  [..]e+04
Precision for ID                          1.074e[..]  [..]e[..]
                                          0.975quant  mode
Precision for the Gaussian observations   6.180e[..]  [..]e+03
Precision for ID                          1.381e[..]  [..]e+01

Expected number of effective parameters(std dev): [..](5.348)
Number of equivalent replicates : [..]

Marginal Likelihood: [..]
Posterior marginals for linear predictor and fitted values computed

(Entries shown as [..] did not survive in this transcription.)

The output includes summary statistics of the posterior marginals of the coefficients of the fixed effects plus the precisions of the error term and intrinsic CAR random effect. In addition, kld reports the Kullback-Leibler divergence between the Gaussian and the (simplified) Laplace approximation to the marginal posterior densities. This provides information about the accuracy of the Gaussian approximation. The marginal likelihood of the model is also reported, and it is computed by integrating all the model parameters out. Hence, it is not the predictive marginal likelihood, and it can be used to perform model selection, for example. The effective number of parameters, as defined in Spiegelhalter et al. (2002), and the associated number of equivalent replicates are also shown. See Martino and Rue (2010) for more details on the R-INLA output.
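To give a feeling for the scale of the kld column, the following base R sketch computes a Kullback-Leibler divergence numerically on a grid. The two normal densities are hypothetical stand-ins for a Gaussian approximation and a slightly shifted marginal; the exact definition and direction of the divergence used inside R-INLA are not stated here, so this is an illustration only.

```r
# Numerical Kullback-Leibler divergence KL(p || q) for densities
# evaluated on an equally spaced grid x
kld_grid <- function(p, q, x) {
  dx <- diff(x)[1]
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep])) * dx
}

x <- seq(-10, 10, length.out = 20001)
p <- dnorm(x, mean = 0, sd = 1)     # stand-in for the Gaussian approximation
q <- dnorm(x, mean = 0.1, sd = 1)   # stand-in for a slightly shifted marginal

kld <- kld_grid(p, q, x)
print(kld)
```

For two normal densities with equal variance the divergence has the closed form $(\mu_1 - \mu_2)^2 / (2\sigma^2)$, so a shift of a tenth of a standard deviation already gives a value of 0.005. Values close to zero therefore indicate that the Gaussian approximation is nearly indistinguishable from the refined one.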
[Figure 1: Marginals of the fixed effects, and the precisions of the error term and spatial random effects, Boston housing data. Panels: (Intercept), CRIM, ZN, INDUS, CHAS1, I(NOX^2), I(RM^2), AGE, log(DIS), log(RAD), TAX, PTRATIO, B, log(LSTAT), Precision error, Precision spatial effects.]

Figure 1 shows the estimated marginals of the coefficients of the fixed effects and the precisions of the random effects in the model. These distributions can be used to compute summary statistics for the model parameters. In the previous R-INLA output these marginals have been used to compute the posterior mean, standard deviation, mode and some quantiles (0.025, 0.5 and 0.975).

Fitted values can be easily displayed in a map. First, we need to add all the required values to the SpatialPolygonsDataFrame:

R> boston$LOGCMEDV <- log(boston$CMEDV)
R> boston$FTDLOGCMEDV <- m1$summary.fitted[, "mean"]

Note that we will represent values in the log-scale. Next, we can use spplot() to display
both the observed and the predicted values of house prices. In the following example, which can be seen in Figure 2, we have also used package RColorBrewer (Neuwirth 2014) to define a suitable color palette:

R> library("RColorBrewer")
R> spplot(boston, c("LOGCMEDV", "FTDLOGCMEDV"),
+    col.regions = brewer.pal(9, "Blues"), cuts = 8,
+    names.attr = c("Observed log-CMEDV", "Predicted log-CMEDV"))

[Figure 2: Observed and predicted median values, Boston housing data. Panels: Observed CMEDV, Predicted CMEDV.]

To provide an alternative visualisation of the results, we have included a short example using function qmap() from the ggmap package (Kahle and Wickham 2013). First of all, we reproject our data to WGS84. With fortify() the boston data set is converted into a format suitable for plotting, and then the log median values are added to the new data.

R> bostonf <- spTransform(boston, CRS("+proj=longlat +datum=WGS84"))
R> library("ggmap")
R> bostonf <- fortify(bostonf, region = "TRACT")
R> id <- match(bostonf$id, as.character(boston$TRACT))
R> bostonf$LOGCMEDV <- boston$LOGCMEDV[id]

qmap() is based on the grammar of graphics implemented in the ggplot2 package (Wickham 2009). In the next example, qmap() is used to get satellite data from the Boston area, whilst geom_polygon() adds the boundaries:

R> qmap("boston", zoom = 10, maptype = "satellite") + geom_polygon(
+    data = bostonf, aes(x = long, y = lat, group = group, fill = LOGCMEDV),
+    colour = "white", alpha = 0.8, size = 0.3)

The resulting map can be seen in Figure 3.
[Figure 3: Display of the Boston housing data set using ggmap and Google Maps.]

3.2. Point patterns

Point patterns are analyzed with INLA as the result of a counting process, i.e., points are not modelled directly but are aggregated over a grid of small squares. For this reason, the analysis of point patterns is conducted similarly to that of lattice data: counts are available for each square and these are assigned neighbors according to the adjacent squares. Then, counts can be smoothed using an appropriate non-linear term, such as spatial random effects. Hossain and Lawson (2009) compare different approximations to the analysis of point patterns, including methods that are based on discretisation of the study region.

In the following example we use the Japanese black pine data set from R package spatstat (Baddeley and Turner 2005). This data set records the location of Japanese black pine saplings in a square sampling region in a natural forest. This example is reproduced from Gómez-Rubio et al. (2014b). We first split the study area into smaller squares to create a grid of squares.

R> library("spatstat")
R> data("japanesepines")
R> japd <- as.data.frame(japanesepines)
R> Nrow <- 10
R> Ncol <- 10
R> n <- Nrow * Ncol
R> grd <- GridTopology(cellcentre.offset = c(0.05, 0.05),
+    cellsize = c(1/Nrow, 1/Ncol), cells.dim = c(Nrow, Ncol))
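The discretisation step described above, in which points are aggregated over a grid of small squares, can be sketched in base R without any spatial package. The points below are simulated (they are not the Japanese black pine data), and the cell ordering used here is an assumption that need not match the ordering used by GridTopology() or R-INLA.

```r
# Hypothetical point pattern in the unit square
set.seed(1)
pts <- cbind(x = runif(50), y = runif(50))

Nrow <- 10
Ncol <- 10

# Column and row index of the square containing each point
# (pmin keeps points lying exactly on the upper boundary in the last cell)
col_id <- pmin(floor(pts[, "x"] * Ncol) + 1, Ncol)
row_id <- pmin(floor(pts[, "y"] * Nrow) + 1, Nrow)

# Count points per square, keeping empty squares as zeros
counts <- table(factor(row_id, levels = 1:Nrow),
                factor(col_id, levels = 1:Ncol))
print(counts)
print(sum(counts))
```

The resulting matrix of counts plays the role of the response in a Poisson model on a regular lattice.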
After the creation of the grid, we have used function over() on the set of points and the newly defined squares to find how many points can be found in each square.

R> polygrdjap <- as(grd, "SpatialPolygons")
R> idpp <- over(SpatialPoints(japd), polygrdjap)
R> japgrd <- SpatialGridDataFrame(grd, data.frame(Ntrees = rep(0, n)))
R> tidpp <- table(idpp)
R> japgrd$Ntrees[as.numeric(names(tidpp))] <- tidpp

Next, an index variable is built to create the spatial neighborhood structure to be passed to the f() function. Note that care must be taken, as R and R-INLA may have a different ordering of the areas when defining the adjacency matrix.

R> japgrd$SPIDX <- 1:n
R> japnb <- poly2nb(polygrdjap, queen = FALSE, row.names = 1:100)
R> adjpine <- nb2mat(japnb, style = "B")
R> adjpine <- as(adjpine, "dgTMatrix")

Here we have avoided using queen adjacency, as this would consider as neighbors two areas which only share a corner.

Finally, we define the call to inla() using a formula which includes spatial random effects based on the grid of squares. In addition, we have set other options to compute the DIC, with control.compute = list(dic = TRUE), and the marginals of the linear predictors, using control.predictor = list(compute = TRUE). We have included the specification of the prior distributions of the log-precisions of unstructured and spatial random effects as well.

R> fpp <- Ntrees ~ 1 + f(japgrd$SPIDX, model = "bym", graph = adjpine,
+    hyper = list(prec.unstruct = list(prior = "loggamma",
+    param = c(0.001, 0.001)),
+    prec.spatial = list(prior = "loggamma", param = c(0.1, 0.1))))
R> japinlala <- inla(fpp, family = "poisson", data = as.data.frame(japgrd),
+    control.compute = list(dic = TRUE),
+    control.inla = list(tolerance = 1e-20, h = 1e-08),
+    control.predictor = list(compute = TRUE))
R> japgrd$INLALA <- japinlala$summary.fitted.values[, "mean"]

The spatial part of this model is the same type of specification, designed for irregular lattices, that we have employed with the Boston data set.
Given that we are now considering a regular lattice, it is also possible to use a two-dimensional random walk for spatial smoothing:

R> fpprw2d <- Ntrees ~ 1 + f(japgrd$SPIDX, model = "rw2d", nrow = 10,
+    ncol = 10, hyper = list(prec = list(prior = "loggamma",
+    param = c(0.001, 0.001))))
R> japinlalarw2d <- inla(fpprw2d, family = "poisson",
+    data = as.data.frame(japgrd), control.compute = list(dic = TRUE),
+    control.inla = list(tolerance = 1e-20, h = 1e-08),
+    control.predictor = list(compute = TRUE))
R> japgrd$INLALARW2D <- japinlalarw2d$summary.fitted.values[, "mean"]
[Figure 4: Estimation of the intensity of a point pattern with R-INLA, Japanese black pine data set. Panels: DATA, INLA BYM, INLA RW2D.]

Figure 4 shows the original counts and the smoothed counts. Note that this is similar to estimating the intensity of an inhomogeneous point pattern using a smoothing method.

3.3. Geostatistics

R-INLA deals with geostatistical data on a regular grid. This means that observations need to be matched to the points in the grid and that those points with no observations attached are considered as missing values. Hence, this is somewhat similar to the analysis of lattice data and point patterns. However, R-INLA provides a number of options to build model-based geostatistical models (Diggle and Ribeiro 2007).

First of all, different likelihoods can be used. Secondly, there are different options to define the spatial random effects. Although it is still possible to model spatial dependence in the grid of points using a CAR specification, R-INLA provides a two-dimensional Matérn covariance function. This correlation allows, for example, the use of exponentially decaying functions such as

$$\Sigma_{ij} = \sigma^2 \exp(-d_{ij}/\phi) \tag{15}$$

where $d_{ij}$ is the distance between points $i$ and $j$, and $\phi$ is a parameter that controls the scale of the spatial dependence.

More recently, Lindgren, Rue, and Lindström (2011) follow a different approach based on a triangulation of the sampling points and the use of stochastic partial differential equations. Now, the spatial effects are defined as

$$u(s) = \sum_{k=1}^{n} \psi_k(s)\, w_k, \quad s \in \mathbb{R}^2 \tag{16}$$

Here, $\{\psi_k(s)\}$ is a set of basis functions and $w_k$ are the associated weights. Weights are assumed to be Gaussian. The advantages of this approach for spatial statistics are fully described in Cameletti, Lindgren, Simpson, and Rue (2013).

In order to show how to fit geostatistical models with R-INLA, we reproduce here an example from Gómez-Rubio et al.
(2014b) based on the Rongelap data set (Diggle and Ribeiro 2007), which records radionuclide concentration at 157 different locations in Rongelap island. We have restricted the analysis to one of the clusters in the north-east part of the island because
observations need to be matched to a regular grid of points. For this analysis we have used R packages geoR (Ribeiro and Diggle 2001) and geoRglm (Christensen and Ribeiro 2002).

First of all, data are loaded and the data from the desired cluster are extracted from the original data set by checking that their coordinates are in the window $(-700, -500) \times (-1900, -1700)$.

R> library("geoR")
R> library("geoRglm")
R> data("rongelap")
R> rgldata <- as.data.frame(rongelap)
R> xy <- rongelap[[1]]
R> id1 <- (xy[, 1] < -500 & xy[, 1] > -700 & xy[, 2] > -1900 &
+    xy[, 2] < -1700)
R> rgldata <- rgldata[id1, ]

The next step is to define the grid topology for the grid to which these points will be matched. The grid is defined to be of dimension 5 × 5.

R> Nrow <- 5
R> Ncol <- 5
R> n <- Nrow * Ncol
R> grdoffset <- c(min(rgldata$X1), min(rgldata$X2))
R> csize1 <- diff(range(rgldata$X1))/(Nrow - 1)
R> csize2 <- diff(range(rgldata$X2))/(Ncol - 1)
R> grd <- GridTopology(cellcentre.offset = grdoffset,
+    cellsize = c(csize1, csize2), cells.dim = c(Nrow, Ncol))

Data will be placed in a SpatialGridDataFrame (using the previously defined grid topology) and reorganized according to what R-INLA expects for this model (i.e., grid data stored by column). An index variable IDX is added to be used in f() when defining the model. However, R-INLA will rely on how the rows are ordered in the data passed to inla() when defining distances and adjacencies (i.e., the index variable ordering will not be considered).

R> inla2sp <- inla.lattice2node.mapping(Nrow, Ncol)[, Ncol:1]
R> inla2sp <- as.vector(inla2sp)
R> spgrd <- SpatialGridDataFrame(grd, as.data.frame(rgldata[inla2sp, ]))
R> spgrd$IDX <- [right-hand side missing in this transcription]

Next, we create a SpatialPolygons object with the boundaries of the squares in the grid. This way, it is easy to match the data to the newly created grid using function over().
R> polgrd <- as(grd, "SpatialPolygons")
R> dataid <- over(SpatialPoints(as.matrix(rgldata[, 1:2])), polgrd)

It should be noted that the radionuclide concentration in each square is measured by the average of the observations in the square, and this needs to be computed beforehand.

R> ag <- by(rgldata$data, dataid, sum)
R> umag <- by(rgldata$units.m, dataid, sum)
R> ratioag <- ag/umag
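The aggregation step can be illustrated with a small base R sketch. The data frame `obs`, its column names and the number of cells are hypothetical, and `tapply()` is used here in place of `by()` purely for brevity. The sketch also shows one way of leaving cells without observations as `NA`.

```r
# Hypothetical observations: the grid cell each point falls in,
# the recorded count and the measurement time.
obs <- data.frame(
  cell   = c(2, 2, 5, 7, 7, 7),
  counts = c(10, 14, 3, 8, 6, 7),
  time   = c(100, 120, 50, 80, 60, 70)
)
n_cells <- 9  # assumed 3 x 3 grid

# Sum counts and times within each cell, then take the ratio.
sum_counts <- tapply(obs$counts, obs$cell, sum)
sum_time   <- tapply(obs$time, obs$cell, sum)
ratio      <- sum_counts / sum_time

# Place the ratios in a vector covering all cells; empty cells stay NA.
cell_ratio <- rep(NA_real_, n_cells)
cell_ratio[as.numeric(names(ratio))] <- ratio
cell_ratio
```

Indexing by `as.numeric(names(ratio))` relies on the cell identifiers being the integer positions of the cells in the grid.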
Figure 5: Observed and estimated radionuclide concentration in Rongelap island (panels: DATA, INLA MATERN2D, INLA RW2D).

Then, a new column is added to the SpatialGridDataFrame with these averages. NA will be used for the squares with no data so that these values will be imputed from the model.

R> spgrd$ratioag <- NA
R> spgrd$ratioag[as.numeric(names(ratioag))] <- ratioag

Here we define a model with an intercept term and a random effect of the Matérn class. Note how we have fixed, for convenience, the value of the range and precision.

R> formula1 <- ratioag ~ 1 + f(spgrd$IDX, model = "matern2d", nrow = Nrow,
+    ncol = Ncol, hyper = list(range = list(initial = log(sqrt(8)/0.5),
+    fixed = TRUE), prec = list(initial = log(1), fixed = TRUE)))
R> rglinlala <- inla(formula1, family = "poisson",
+    control.predictor = list(compute = TRUE),
+    control.compute = list(dic = TRUE),
+    data = as.data.frame(spgrd))
R> spgrd$inlala <- rglinlala$summary.fitted.values[, "mean"]

As in the point patterns example, here we have also used a two-dimensional random walk for spatial smoothing.

R> formularw2d <- ratioag ~ 1 + f(spgrd$IDX, model = "rw2d", nrow = Nrow,
+    ncol = Ncol, hyper = list(prec = list(prior = "loggamma",
+    param = c(1, 1))))
R> rglinlalarw2d <- inla(formularw2d, family = "poisson",
+    control.predictor = list(compute = TRUE),
+    control.compute = list(dic = TRUE),
+    data = as.data.frame(spgrd))
R> spgrd$inlalarw2d <- rglinlalarw2d$summary.fitted.values[, "mean"]

Figure 5 shows the observed and estimated radionuclide concentration in Rongelap island. It can be seen how our model has spatially smoothed the observed values.
Figure 6: Estimation of the intensity of a point pattern with R-INLA and BayesX, Japanese black pine data set (panels: DATA, INLA BYM, INLA RW2D, BAYESX).

R-INLA and other packages for Bayesian spatial modelling

R-INLA is not the only package for Bayesian spatial modelling. Bivand, Pebesma, and Gómez-Rubio (2013, Chapter 10) compare different packages for Bayesian modelling in the context of disease mapping. We will focus here on R2BayesX (Umlauf, Kneib, Lang, and Zeileis 2013; Umlauf, Adler, Kneib, Lang, and Zeileis 2015) because it provides a way of defining spatial models similar to that of R-INLA. For example, in order to reproduce the example on the Japanese black pine data with R2BayesX we can do the following:

R> library("R2BayesX")
R> bayesxadj <- nb2gra(japnb)
R> japbx <- bayesx(Ntrees ~ 1 + sx(SPIDX, bs = "re") +
+    sx(SPIDX, bs = "spatial", map = bayesxadj), family = "poisson",
+    data = as.data.frame(japgrd))

Function nb2gra() is used to convert our adjacency matrix into an object of class gra, which is used in R2BayesX to store adjacencies. bayesx() takes similar arguments to inla() and the model can be expressed using a formula, with sx() used to define the random effects. sx(SPIDX, bs = "re") defines independent Gaussian random effects and the spatial random effects are defined in sx(SPIDX, bs = "spatial", map = bayesxadj) using the adjacency matrix defined in bayesxadj. Retrieving the predicted data requires some care as they are reordered, but it is as simple as:

R> japgrd$BAYESX <- japbx$fitted.values[
+    order(japbx$bayesx.setup$order), "mu"]

Finally, we compare the fitted values obtained with R-INLA and R2BayesX in Figure 6. Note that differences appear not only because of the different models used but also because of the choice of prior distributions.
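The reordering idiom used to retrieve the fitted values can be checked on a toy vector. This is only a sketch under the assumption that the stored order vector is the permutation that was applied to the rows; the names `original` and `perm` are hypothetical and no R2BayesX object is involved.

```r
# Values in their original row order (hypothetical).
original <- c(a = 2.5, b = 0.3, c = 1.7, d = 4.1)

# Assumed permutation applied internally by a fitting routine.
perm <- c(3, 1, 4, 2)
reordered <- original[perm]

# order(perm) is the inverse permutation, so it restores the original order.
restored <- reordered[order(perm)]
restored
```

If the stored vector had a different meaning (for example, ranks rather than positions), the indexing would need to be adapted accordingly.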
4. Extending R-INLA to fit new models

Although the current implementation of INLA in the R-INLA package provides a reasonable number of models for spatial dependence, it may be the case that we need to include some other models. As it is now, this is not possible without adding to the code of the external INLA programme. Bivand et al. (2014b) describe a simple way of extending INLA to use other latent models. In particular, they focus on some latent models used in spatial econometrics that are not available as part of the R-INLA package at the moment. A new latent class has been added recently and it is described in Gómez-Rubio, Bivand, and Rue (2014a).

This approach is based on considering a model where one or several parameters have been fixed in a way that makes the conditioned model fittable with R-INLA. If we denote by $\rho$ the vector of parameters to fix and by $\hat{\rho}$ a specific set of fixed parameter values, the full posterior marginal could be written as

$$\pi(x, \theta \mid y, \hat{\rho}) \tag{17}$$

Taking this into account, it is clear that when conditioning on $\rho = \hat{\rho}$ R-INLA will give us an approximation to $\pi(x_i \mid y, \hat{\rho})$ and $\pi(\theta_i \mid y, \hat{\rho})$. Note that the full posterior distribution can be obtained by integrating $\rho$ out, i.e.,

$$\pi(x, \theta \mid y) = \int \pi(x, \theta \mid y, \rho)\, \pi(\rho \mid y)\, d\rho \tag{18}$$

where $\pi(\rho \mid y)$ is the posterior distribution of $\rho$. Also, note that this can be written as

$$\pi(\rho \mid y) \propto \pi(y \mid \rho)\, \pi(\rho) \tag{19}$$

Here $\pi(\rho)$ is a prior distribution on $\rho$ and $\pi(y \mid \rho)$ is the marginal likelihood of the model, which is reported by R-INLA. Hence, $\pi(\rho \mid y)$ can be estimated by rescaling the expression in Equation 19. The posterior distribution of $\rho$ can be estimated by defining a fine grid of values $S = \{\rho_i\}_{i=1}^{r}$ so that $\pi(\rho_i \mid y)$, $i = 1, \ldots, r$, are computed. Then $\pi(\rho \mid y)$ can be obtained by fitting and rescaling a spline (or other non-linear function) to the previous values.
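The spline-and-rescale step can be sketched in base R. The unnormalised log-posterior values below are hypothetical (a Beta(4, 2) shape plus an arbitrary constant stands in for the log marginal likelihood plus log prior that would come from the conditioned fits), and fitting the spline on the log scale is a choice made here to keep the interpolated density positive.

```r
# Coarse grid of values for rho and hypothetical unnormalised log-posterior.
rho_grid <- seq(0.05, 0.95, length.out = 10)
log_post <- dbeta(rho_grid, 4, 2, log = TRUE) + 123.4

# Subtract the maximum for numerical stability before exponentiating.
log_post <- log_post - max(log_post)

# Interpolate on the log scale and rescale so the density integrates to one.
spl    <- splinefun(rho_grid, log_post, method = "natural")
unnorm <- function(r) exp(spl(r))
lo <- min(rho_grid)
hi <- max(rho_grid)
Z  <- integrate(unnorm, lo, hi)$value
post_rho <- function(r) unnorm(r) / Z

# Approximate posterior mode of rho within the grid range.
rho_mode <- optimize(post_rho, c(lo, hi), maximum = TRUE)$maximum
rho_mode
```

The arbitrary additive constant cancels in the rescaling, which is why only the shape of the unnormalised values matters.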
Using simple numerical integration techniques we can obtain an approximation to $\pi(x, \theta \mid y)$ as follows:

$$\pi(x, \theta \mid y) = \int \pi(x, \theta \mid y, \rho)\, \pi(\rho \mid y)\, d\rho \approx \sum_{\rho_i \in S} \pi(x, \theta \mid y, \rho_i)\, \pi(\rho_i \mid y)\, \Delta_i \tag{20}$$

where $\Delta_i$ is the width of the interval used in the discretisation of $\rho$. Note that the previous expression can be regarded as a weighted average of the different models fitted after conditioning on different values of $\rho$. From Equation 20 it is clear that we can obtain the following approximations to the posterior marginals of the individual latent parameters and hyperparameters:

$$\hat{\pi}(x_i \mid y) = \sum_j \pi(x_i \mid y, \rho_j)\, w_j \tag{21}$$

$$\hat{\pi}(\theta_i \mid y) = \sum_j \pi(\theta_i \mid y, \rho_j)\, w_j \tag{22}$$
Here, $w_j$ is a weight associated with $\rho_j$ as follows:

$$w_j = \pi(\rho_j \mid y)\, \Delta_j \tag{23}$$

This is like carrying out Bayesian model averaging (Hoeting, Madigan, Raftery, and Volinsky 1999) on the different conditioned models fitted with R-INLA. Altogether, this provides a way of combining simpler models to obtain our desired model. In Section 5 we show how to apply these ideas to different models in spatial statistics. Note that this approach can be easily extended to the case of $\rho$ being a discrete random variable.

Implementation

We have implemented this approach in an R package called INLABMA, available from CRAN. The package includes some general functions to conduct Bayesian model averaging of models fitted with INLA. In addition, we have included some wrapper functions to fit the models described in Section 5.

5. Examples

5.1. Leroux model

Leroux, Lei, and Breslow (1999) propose a model for the analysis of spatial data in a lattice which is similar to the one by Besag et al. (1991), in the sense that they split variation according to spatial and non-spatial patterns. Rather than including the spatial and non-spatial random effects as a sum in the linear term, they consider a single random effect as follows:

$$u \sim \mathrm{MVN}(0, \Sigma); \quad \Sigma = \sigma^2 \left((1 - \lambda) I_n + \lambda M\right)^{-1} \tag{24}$$

Here $M$ is the precision matrix of a process with spatial structure and we will take that of an intrinsic CAR specification. Hence, the precision matrix is, in a sense, a mixture of the precisions of a non-spatial and a spatial one. $\lambda$ controls how strong the spatial structure is. For $\lambda = 1$ the effect is entirely spatial whilst for $\lambda = 0$ there is no spatial dependence. In principle, this is not a model that R-INLA can fit. However, if $\lambda$ is fixed, then the random effects are Gaussian with a known structure for the variance-covariance matrix, which can be fitted using a generic0 latent model.

Boston housing data

Here we revisit the Boston housing data to fit the Leroux et al. model.
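Before turning to the data, the structure in Equation 24 can be made concrete on a toy lattice. This is a minimal sketch in base R: the four-area adjacency matrix and the helper name `leroux_structure()` are hypothetical and are not part of any package.

```r
# Binary adjacency matrix of four areas on a line: 1 - 2 - 3 - 4.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# (1 - lambda) * I_n + lambda * M, with M the intrinsic CAR structure matrix.
leroux_structure <- function(lambda, W) {
  M <- diag(rowSums(W)) - W
  (1 - lambda) * diag(nrow(W)) + lambda * M
}

Q0   <- leroux_structure(0, W)    # identity: no spatial dependence
Q1   <- leroux_structure(1, W)    # intrinsic CAR: rows sum to zero
Qmid <- leroux_structure(0.5, W)  # mixture of the two structures

# The log-determinant depends on lambda.
logdet_mid <- as.numeric(determinant(Qmid, logarithm = TRUE)$modulus)
logdet_mid
```

For any $\lambda$ strictly below 1 the matrix is positive definite, whereas at $\lambda = 1$ it reduces to the singular intrinsic CAR structure.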
First of all, it is worth mentioning that the model needs a wrapper function to be fitted for a given value of the spatial parameter $\lambda$. This wrapper function is based on the generic0 latent model available in R-INLA. Once $\lambda$ is fixed the model can be easily fitted with R-INLA, as the latent effect is a multivariate Gaussian random effect with zero mean and precision matrix as in Equation 24. We repeat this procedure for different values of $\lambda$ to obtain a list of fitted models to be combined later. Hence, we have written a simple wrapper function which is included in package INLABMA (Gómez-Rubio and Bivand 2014):
R> library("INLABMA")
R> leroux.inla
function (formula, d, W, lambda, improve = TRUE, fhyper = NULL, ...)
{
    W2 <- diag(apply(W, 1, sum)) - W
    Q <- (1 - lambda) * diag(nrow(W)) + lambda * W2
    assign("Q", Q, environment(formula))
    if (is.null(fhyper)) {
        formula <- update(formula, . ~ . + f(idx, model = "generic0",
            Cmatrix = Q))
    }
    else {
        formula <- update(formula, . ~ . + f(idx, model = "generic0",
            Cmatrix = Q, hyper = fhyper))
    }
    res <- INLA::inla(formula, data = d, ...)
    if (improve)
        res <- INLA::inla.rerun(res)
    res$logdet <- as.numeric(Matrix::determinant(Q)$modulus)
    res$mlik <- res$mlik + res$logdet/2
    return(res)
}
<environment: namespace:INLABMA>

In the previous code, the precision matrix Q is created using the adjacency matrix W and the value of $\lambda$. Then the generic0 model is added to the formula with the fixed effects. Finally, we correct the marginal log-likelihood $\pi(y \mid \lambda)$ (conditioned on the value of $\lambda$) by adding half the log-determinant of $(1 - \lambda) I_n + \lambda M$. Note that, in principle, this is not needed to fit a single model and obtain the approximations to the posterior marginals, as it is a constant. However, we are fitting and combining several models, so we need to correct for this because this scaling factor will change with the value of $\lambda$.

Argument ... is used to pass any other options to inla(). This can be used to tune and set a number of other options. Also, the adjacency matrix is taken from the data provided in the boston data set. Note that we will be using a binary adjacency matrix as the random effects have an intrinsic CAR specification:

R> boston.matb <- listw2mat(nb2listw(bostonadj, style = "B"))
R> bmspb <- as(boston.matb, "CsparseMatrix")

Function leroux.inla() is used in the example below to compute the fitted models for the Leroux et al. model. In this case, we take $\lambda$ to be in the interval (0.8, 0.99) after previous assessment on where $\pi(\lambda \mid y)$ has its mode. Also, we define a prior for the precision of the random effects in variable fhyper.
The prior for the precision of the error term is defined in errorhyper. In addition, we have used mclapply to parallelize the computations on operating systems supporting forking (not Windows). Note that this is an advantage of fitting these conditioned models compared with standard MCMC methods.
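The overall workflow (fit conditioned models over a grid in parallel, then combine them with the weights of Equation 23) can be sketched without R-INLA on a toy conjugate model. Everything below is an assumption made for illustration: the model is $y_i \mid \theta \sim N(0 + \theta, 1)$ with $\theta \mid \rho \sim N(0, 1/\rho)$, so that the conditional posterior and the marginal likelihood have closed forms and stand in for the output of inla(); the Gamma(1, 1) prior on $\rho$, the grid and the function name `fit_conditional()` are likewise hypothetical.

```r
library(parallel)

set.seed(1)
y <- rnorm(20, mean = 1)  # simulated data, for illustration only
n <- length(y)

# "Fit" the model conditioned on rho (closed form here, R-INLA in the paper).
fit_conditional <- function(rho) {
  v <- 1 / (n + rho)              # posterior variance of theta given rho
  m <- sum(y) * v                 # posterior mean of theta given rho
  mlik <- -0.5 * n * log(2 * pi) - 0.5 * log(1 + n / rho) -
    0.5 * (sum(y^2) - sum(y)^2 / (n + rho))  # log pi(y | rho)
  list(rho = rho, mean = m, sd = sqrt(v), mlik = mlik)
}

# Conditioned fits over a regular grid; mc.cores = 1 runs on any platform,
# larger values require an operating system that supports forking.
rho_grid <- seq(0.05, 5, length.out = 100)
fits <- mclapply(rho_grid, fit_conditional, mc.cores = 1)

# Weights: marginal likelihood times prior, rescaled to sum to one.
# The grid is regular, so the interval width cancels in the rescaling.
log_w <- sapply(fits, function(f) f$mlik) +
  dgamma(rho_grid, shape = 1, rate = 1, log = TRUE)
w <- exp(log_w - max(log_w))
w <- w / sum(w)

# Model-averaged posterior marginal of theta as a weighted mixture.
theta <- seq(-1, 3, length.out = 401)
cond_dens <- sapply(fits, function(f) dnorm(theta, f$mean, f$sd))
bma_dens <- as.vector(cond_dens %*% w)
```

Because each conditioned fit is independent of the others, the grid can be split across processes with no communication between them, which is what makes this scheme straightforward to parallelize.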
More informationProbabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
More informationAnalytic Models of the ROC Curve: Applications to Credit Rating Model Validation
QUANTITATIVE FINANCE RESEARCH CENTRE QUANTITATIVE FINANCE RESEARCH CENTRE Research Paper 8 August 2006 Analtic Models of the ROC Curve: Applications to Credit Rating Model Validation Stephen Satchell and
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationPhysics 53. Kinematics 2. Our nature consists in movement; absolute rest is death. Pascal
Phsics 53 Kinematics 2 Our nature consists in movement; absolute rest is death. Pascal Velocit and Acceleration in 3D We have defined the velocit and acceleration of a particle as the first and second
More informationClient Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data
Client Based Power Iteration Clustering Algorithm to Reduce Dimensionalit in Big Data Jaalatchum. D 1, Thambidurai. P 1, Department of CSE, PKIET, Karaikal, India Abstract  Clustering is a group of objects
More informationGetting started with qplot
Chapter 2 Getting started with qplot 2.1 Introduction In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy
More informationMBA 611 STATISTICS AND QUANTITATIVE METHODS
MBA 611 STATISTICS AND QUANTITATIVE METHODS Part I. Review of Basic Statistics (Chapters 111) A. Introduction (Chapter 1) Uncertainty: Decisions are often based on incomplete information from uncertain
More informationD.2. The Cartesian Plane. The Cartesian Plane The Distance and Midpoint Formulas Equations of Circles. D10 APPENDIX D Precalculus Review
D0 APPENDIX D Precalculus Review APPENDIX D. The Cartesian Plane The Cartesian Plane The Distance and Midpoint Formulas Equations of Circles The Cartesian Plane Just as ou can represent real numbers b
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationA Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution
A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September
More informationSolving Quadratic Equations by Graphing. Consider an equation of the form. y ax 2 bx c a 0. In an equation of the form
SECTION 11.3 Solving Quadratic Equations b Graphing 11.3 OBJECTIVES 1. Find an ais of smmetr 2. Find a verte 3. Graph a parabola 4. Solve quadratic equations b graphing 5. Solve an application involving
More informationLinear Equations in Two Variables
Section. Sets of Numbers and Interval Notation 0 Linear Equations in Two Variables. The Rectangular Coordinate Sstem and Midpoint Formula. Linear Equations in Two Variables. Slope of a Line. Equations
More informationExploratory data analysis for microarray data
Eploratory data analysis for microarray data Anja von Heydebreck Ma Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Berlin, Germany heydebre@molgen.mpg.de Visualization
More informationJournal of Statistical Software
JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/ sptimer: SpatioTemporal Bayesian Modeling Using R K.S. Bakar Yale University, USA S.K. Sahu University of
More informationClassifying Solutions to Systems of Equations
CONCEPT DEVELOPMENT Mathematics Assessment Project CLASSROOM CHALLENGES A Formative Assessment Lesson Classifing Solutions to Sstems of Equations Mathematics Assessment Resource Service Universit of Nottingham
More informationTo Be or Not To Be a Linear Equation: That Is the Question
To Be or Not To Be a Linear Equation: That Is the Question Linear Equation in Two Variables A linear equation in two variables is an equation that can be written in the form A + B C where A and B are not
More informationSolving Nonlinear Equations Using Recurrent Neural Networks
Solving Nonlinear Equations Using Recurrent Neural Networks Karl Mathia and Richard Saeks, Ph.D. Accurate Automation Corporation 71 Shallowford Road Chattanooga, Tennessee 37421 Abstract A class of recurrent
More informationStudents Currently in Algebra 2 Maine East Math Placement Exam Review Problems
Students Currently in Algebra Maine East Math Placement Eam Review Problems The actual placement eam has 100 questions 3 hours. The placement eam is free response students must solve questions and write
More informationANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided
More informationGraduate Programs in Statistics
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
More information