Monte Carlo tests for spatial patterns and their change [a]
Finnish Forest Research Institute, Unioninkatu 40 A, 00170 Helsinki
juha.heikkinen@metla.fi
Workshop on Spatial Statistics and Ecology, Perämeri Research Station, Hailuoto, 16.-18.10.2003
[a] With inspiration from a problem presented by Juha Siitonen (Metla, Vantaa)
Slides available at http:///pp/juhe/pub/hailuoto03.pdf
Motivation
All available observations on long-horned beetles (sarvijäärät) Leptura quadrifasciata (nelivyöjäärä) and Monochamus sutor (suutari) in Finland, aggregated to 50 km × 50 km squares and two time intervals.
- Filled dot: observations both before and after 1960
- Empty dot: observations only before 1960
- +: observations only after 1960
Question: Are there changes in the biogeographic ranges?
Problems
The usual problem with such data: observational effort is spatially and temporally variable, and unknown. Also, species tend to be more abundant in the core of their range than at its limits.
Question: Can the spatial pattern of observations still reveal real changes?
E.g., when there are no changes, a random pattern of empty dots and +'s is expected along the limits. Does significant clustering of empty dots indicate a decline of the species?
Monte Carlo significance test
Let $H_0$ be a null hypothesis about the distribution of a (multidimensional) random variable X, such that
- $H_0$ is simple (no unknown parameters involved), or
- $H_0$ is composite, but sufficient statistics exist for all nuisance parameters.
Example (Besag & Diggle, 1977)
X are the locations of events (e.g., trees) in a study region R.
$H_0$: pattern X is completely random.
The null distribution is the homogeneous Poisson process on R:
- the number of events in X is $n \sim \mathrm{Poisson}(\lambda |R|)$, where $|R|$ is the size of R
- the n events are uniformly distributed over R, with locations mutually independent
$\lambda$ is a nuisance parameter with sufficient statistic n.
General test procedure (Barnard, 1963)
- Select any test statistic u that is sensitive to the suspected kind of departure from $H_0$. Suppose large values indicate departure.
- Compute $u_1 = u(x)$ from the observed data.
- Simulate $m - 1$ random samples $x_2, x_3, \ldots, x_m$ from the null distribution of X.
- Compute the simulated u-values $u_i = u(x_i)$, $i = 2, \ldots, m$.
- Order the complete set $\{u_1, u_2, \ldots, u_m\}$.
- If $u_1$ is the $k$th largest, then the exact significance level of the test is $k/m$.
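The ranking procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`monte_carlo_test`, `max_abs`), the toy data and the standard normal null model are hypothetical and not part of the original analysis. Ties are counted against the observed value, which is a conservative choice made here.

```python
import random

def monte_carlo_test(u_obs, simulate_u, m, rng):
    """Rank the observed statistic among m values (1 observed + m - 1 simulated).

    simulate_u(rng) must return the statistic of one sample drawn from the
    null distribution. Returns (k, k / m), where the observed value is the
    k-th largest; ties are counted against the observed value.
    """
    sims = [simulate_u(rng) for _ in range(m - 1)]
    k = 1 + sum(1 for u in sims if u >= u_obs)
    return k, k / m

def max_abs(sample):
    """Toy test statistic: largest absolute value in the sample."""
    return max(abs(v) for v in sample)

# Toy example (hypothetical data): H0 says the values are i.i.d. standard normal.
rng = random.Random(2003)
observed = [0.3, -1.2, 0.8, 2.9, -0.4, 0.1, 1.1, -0.7]
k, p = monte_carlo_test(
    max_abs(observed),
    lambda r: max_abs([r.gauss(0.0, 1.0) for _ in observed]),
    m=100,
    rng=rng,
)
print(f"observed statistic is number {k} from the top, p = {p:.2f}")
```

Any statistic and any null simulator can be plugged in, which is the main attraction of the procedure when the null distribution of the statistic is not known analytically.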
Citing Hope (1968)
It is preferable to use a known test of good efficiency instead of a Monte Carlo test procedure, assuming that the alternative statistical hypothesis can be completely specified.
Monte Carlo is useful (at least in the early stages of statistical analysis) when
- the conditions for applying a test based on (asymptotic) distributional assumptions are not satisfied (e.g., small non-normal sample); note that exact tests can be approximated by Monte Carlo
- the distribution of the test statistic under $H_0$ is unknown (this is often the case in spatial statistics)
- only vague alternative hypotheses exist
Example (ctd.)
In the point pattern example, each simulated $x_i$ is a pattern of n independent uniform points on R (conditioning on n).
A commonly recommended graphical test is obtained by choosing a vector u of estimated values of the so-called K-function (Bartlett, 1964; Ripley, 1977) at a number of distances $h > 0$.
For a stationary point process with intensity $\lambda$ (expected number of events per unit area),
$\lambda K(h) = E(\text{number of further events within distance } h \text{ from a random event})$.
For a random pattern $K(h) = \pi h^2$; for regular patterns K(h) tends to be smaller and for clustered patterns greater (at least for small h).
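As a rough illustration of the definition, the sketch below simulates n uniform points on a rectangle and computes a naive estimate of K(h). It is a simplification made for this example: no edge correction is applied (so the estimate is biased downwards near the boundary), R is assumed to be the rectangle [0, width] × [0, height], and the names `k_hat` and `csr_pattern` are hypothetical.

```python
import math
import random

def k_hat(points, h, area):
    """Naive K-function estimate without edge correction:
    area / (n (n - 1)) times the number of ordered pairs within distance h."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= h:
                close += 2  # each unordered pair counts as two ordered pairs
    return area * close / (n * (n - 1))

def csr_pattern(n, width, height, rng):
    """n independent uniform points on [0, width] x [0, height]."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

# Compare the estimate for one simulated random pattern with pi * h^2.
rng = random.Random(1977)
pattern = csr_pattern(200, 10.0, 10.0, rng)
for h in (0.5, 1.0, 1.5):
    print(h, round(k_hat(pattern, h, area=100.0), 3), round(math.pi * h * h, 3))
```

Evaluating `k_hat` on the observed pattern and on many patterns from `csr_pattern` gives the ingredients of the Monte Carlo test for complete spatial randomness.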
Tree map from a 50 m × 50 m sample plot in Lapland
[Figure: tree map of the sample plot, and a plot of K(h) against h.]
L-function
$L(h) = \sqrt{K(h)/\pi} - h$ [a]
motivated by the variance-stabilising square-root transformation (Besag, 1977; Silverman, 1977) and by comparison to the Poisson process (for which $L(h) \equiv 0$).
[a] or $L(h) = \sqrt{K(h)}$ (Silverman, 1977) or ...; the L-function does not seem to be a very well-defined concept.
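A small worked check of the first form of the transformation, $L(h) = \sqrt{K(h)/\pi} - h$: inserting the Poisson value $K(h) = \pi h^2$ should give zero at every distance. The function name `l_function` is hypothetical and used only for this sketch.

```python
import math

def l_function(k_value, h):
    """L(h) = sqrt(K(h) / pi) - h, the form that is zero for a Poisson process."""
    return math.sqrt(k_value / math.pi) - h

# For a completely random pattern K(h) = pi * h^2, so L(h) should vanish.
for h in (0.5, 1.0, 2.0, 5.0):
    print(h, l_function(math.pi * h * h, h))

# Values of K above pi * h^2 (clustering) give positive L, values below negative L.
print(l_function(1.5 * math.pi, 1.0), l_function(0.5 * math.pi, 1.0))
```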
L-function for Lapland trees
[Figure: tree map of the sample plot, and a plot of L(h) against h.]
A formal test can be based on the sum of L(h), for example.
Dependence between different types of events
The cross K-function can be defined as
$\lambda_j K_{ij}(h) = E(\text{number of type } j \text{ events within distance } h \text{ from a randomly chosen type } i \text{ event})$.
To test for association between events of different types, the bivariate Poisson process is not a particularly useful null hypothesis, because the patterns of each type should be allowed to have a structure among themselves.
The whole marginal models for each type are nuisance parameters!
Conditioning on marginal patterns (Lotwick & Silverman, 1982)
Generate replications under $H_0$: no association between events of different types
- by randomly shifting the whole pattern of one type relative to the other
- events moved outside of R by a shift reappear in R from the opposite side or corner (the toroidal idea, which works for rectangular R)
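The toroidal idea amounts to adding a common shift to every event of one type and wrapping the coordinates modulo the side lengths. The sketch below assumes that R is the rectangle [0, width) × [0, height) and that shifts are drawn uniformly over R; the function names are hypothetical.

```python
import random

def toroidal_shift(points, dx, dy, width, height):
    """Shift all points by (dx, dy); points leaving the rectangle
    [0, width) x [0, height) re-enter from the opposite side."""
    return [((x + dx) % width, (y + dy) % height) for x, y in points]

def random_toroidal_shift(points, width, height, rng):
    """Apply one shift drawn uniformly over the rectangle."""
    dx = rng.uniform(0, width)
    dy = rng.uniform(0, height)
    return toroidal_shift(points, dx, dy, width, height)

# One replicate: type 1 events are shifted, type 2 events stay where they are.
rng = random.Random(1982)
type1 = [(0.2, 0.3), (0.8, 0.9), (0.5, 0.1)]
print(random_toroidal_shift(type1, 1.0, 1.0, rng))
```

Because every event of the shifted type moves by the same vector, the internal structure of each marginal pattern is preserved (up to the wrap-around), and only the relative position of the two patterns is randomised.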
Amacrine cells in a rabbit's eye (Diggle, 1986)
Are the two types of cells formed initially in two separate layers, or does differentiation occur at a later stage of development?
Toroidal shift of black dots
[Figure: the pattern of black dots before and after a toroidal shift.]
Are patterns of black and white dots independent?
$\hat{L}_{12}(h)$ and simulation envelopes from 29 random toroidal shifts.
[Figure: cross L(h) plotted against h, with simulation envelopes.]
Evidence against $H_0$ is rather weak.
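Simulation envelopes are the pointwise minimum and maximum of the simulated curves. A minimal sketch follows, with hypothetical function names and made-up curve values. By the ranking argument of the general test procedure, with 29 simulations an observed value above the upper envelope at one distance h chosen in advance has rank 1 out of 30, that is, a one-sided level of 1/30; looking at many distances at once makes the graphical test less formal.

```python
def simulation_envelope(sim_curves):
    """Pointwise lower and upper envelopes of curves evaluated on a common grid."""
    lower = [min(values) for values in zip(*sim_curves)]
    upper = [max(values) for values in zip(*sim_curves)]
    return lower, upper

def outside_envelope(observed, lower, upper):
    """For each grid point, tell whether the observed curve leaves the envelope."""
    return [not (lo <= obs <= hi) for obs, lo, hi in zip(observed, lower, upper)]

# Made-up curves on a grid of three distances.
sims = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [0.5, 0.5, 2.5]]
lower, upper = simulation_envelope(sims)
print(lower, upper, outside_envelope([0.5, 2.0, 2.0], lower, upper))
```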
Random labelling of events
$H_0$: black dots are a random subset of all dots
is not the same hypothesis as independence between patterns.
Random labelling is often considered in epidemiological case-control studies, where apparent clustering of cases may result from inhomogeneous population density.
Controls (type 2 events): a random sample from the population at risk.
If there is no clustering, then the cases (type 1 events) are a random sample from the combined pattern of cases and controls.
Monte Carlo test for random labelling
$H_0$ suggests the obvious simulation method: choose the observed number of cases randomly from the combined pattern. In other words, fix the spatial locations and permute the type labels, so this leads back to Fisher (1935).
Under random labelling the marginal patterns are random thinnings of the combined pattern, which implies
$K_{11}(h) = K_{22}(h)$.
This suggests choosing u from the differences $\hat{K}_{11}(h) - \hat{K}_{22}(h)$ to study clustering of cases over and above the natural environmental spatial clustering of controls.
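A sketch of the label permutation test at a single distance h is given below. It makes several simplifying assumptions for illustration: the K-functions are estimated naively without edge correction, label 1 marks cases and label 2 controls, each type has at least two events, and large values of the difference are taken to indicate clustering of cases. All function names are hypothetical.

```python
import math
import random

def k_hat(points, h, area):
    """Naive K estimate without edge correction (needs at least two points)."""
    n = len(points)
    close = sum(
        2
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= h
    )
    return area * close / (n * (n - 1))

def k_diff(points, labels, h, area):
    """K11(h) - K22(h) estimate; label 1 = case, label 2 = control."""
    cases = [p for p, lab in zip(points, labels) if lab == 1]
    controls = [p for p, lab in zip(points, labels) if lab == 2]
    return k_hat(cases, h, area) - k_hat(controls, h, area)

def random_labelling_test(points, labels, h, area, m, rng):
    """Keep locations fixed, permute labels m - 1 times, return p = k / m."""
    u_obs = k_diff(points, labels, h, area)
    k = 1
    for _ in range(m - 1):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # same number of cases, random positions
        if k_diff(points, shuffled, h, area) >= u_obs:
            k += 1
    return k / m

# Made-up data: three tightly clustered cases, three scattered controls.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 1.0), (1.0, 9.0)]
labels = [1, 1, 1, 2, 2, 2]
print(random_labelling_test(points, labels, h=1.0, area=100.0, m=100,
                            rng=random.Random(1935)))
```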
Thefts by blacks and whites in Oklahoma City (Bailey & Gatrell, 1995)
[Figure: simulation envelopes under random labelling for $\hat{K}_1 - \hat{K}_2$, plotted against distance.]
$\hat{K}_1$ of white offenders minus $\hat{K}_2$ of black offenders.
Space-time clustering
The space-time K-function (Diggle et al., 1993) can be defined by
$\lambda K(h,t) = E(\text{number of events within distance } h \text{ and time interval } t \text{ from a randomly chosen event})$.
Here $\lambda$ is the intensity in space-time: the expected number of events in a space-time box of size one area unit by one time unit.
Monte Carlo test for space-time interaction
Simulation under $H_0$: no space-time interaction, by random permutations of the time labels, keeping the spatial locations fixed.
If the processes operating in time and space are independent (no space-time interaction), then
$K(h,t) = K_S(h) K_T(t)$,
where $K_S$ is the usual (spatial) K-function and $K_T$ is the similarly defined function in the time domain.
This suggests choosing u from the differences
$\hat{D}(h,t) = \hat{K}(h,t) - \hat{K}_S(h) \hat{K}_T(t)$
to study whether events clustered in space are also close together in time.
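The permutation test can be sketched at a single pair (h, t) as follows. The estimators here are naive pair-counting versions without edge correction, chosen for brevity, and the function names and data are hypothetical. Note that permuting the time labels leaves the purely spatial and purely temporal pair counts unchanged, so only the count of pairs close in both space and time varies between replicates.

```python
import math
import random

def d_hat(points, times, h, t, area, duration):
    """Naive estimate of D(h, t) = K(h, t) - K_S(h) * K_T(t)."""
    n = len(points)
    both = space = time = 0
    for i in range(n):
        for j in range(i + 1, n):
            near = math.dist(points[i], points[j]) <= h
            soon = abs(times[i] - times[j]) <= t
            space += near
            time += soon
            both += near and soon
    pairs = n * (n - 1) / 2
    # K = area * duration * both / pairs, K_S = area * space / pairs,
    # K_T = duration * time / pairs
    return area * duration * (both / pairs - (space / pairs) * (time / pairs))

def space_time_test(points, times, h, t, area, duration, m, rng):
    """Permute time labels m - 1 times with locations fixed; return p = k / m."""
    u_obs = d_hat(points, times, h, t, area, duration)
    k = 1
    for _ in range(m - 1):
        permuted = list(times)
        rng.shuffle(permuted)
        if d_hat(points, permuted, h, t, area, duration) >= u_obs:
            k += 1
    return k / m

# Made-up data: two spatial pairs, each pair also close in time.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
times = [0.0, 1.0, 10.0, 11.0]
print(d_hat(points, times, 1.0, 2.0, 100.0, 20.0))
print(space_time_test(points, times, 1.0, 2.0, 100.0, 20.0, 100, random.Random(1993)))
```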
Burkitt's lymphoma in Uganda (Bailey & Gatrell, 1995)
[Figure: cases by year, 1961 to 1970.]
Plot of $\hat{D}(h,t)$ and a Monte Carlo test
[Figure: "D plot", a surface of D over Distance and Time; "MC results", a histogram (Frequency against Statistic) with the value for the data indicated.]
Test statistic $u = \hat{D}(h,t)$ over a grid of h- and t-values.
Tools
All these methods are discussed in Bailey & Gatrell (1995) in the more general context of point pattern analysis. Manly (1997) is a gentle introduction to Monte Carlo methods in general.
All the analyses are easily accessible to anyone through the public-domain spatial point pattern analysis package splancs, which runs under commercial Splus and free R:
http://cran.r-project.org
http://www.maths.lancs.ac.uk/~rowlings/splancs
splancs functions for the examples
The splancs functions which did the essential work for the examples were
- Kenv.csr for the simple point pattern example
- Kenv.tor for random toroidal shifts of one pattern w.r.t. another
- Kenv.label for random labelling of two types of events
- stdiagn for space-time clustering analysis
Type help(<function>) or example(<function>) for further details.
Data sets for the examples (all but one)
- The amacrine cells data set is available as data set amacrine in the Splus/R package spatstat: http://www.maths.uwa.edu.au/~adrian/spatstat.html
- Locations of Oklahoma City offences (okblack, okwhite) and Burkitt's lymphoma cases (burkitt) are available in splancs.
Type help(<data set>) for further details.
Back to the beetle problem
The actual data are a space-time point pattern.
Random permutations of the time labels, keeping the spatial locations fixed, take care of conditioning on the marginal variation of observational effort and abundance both in space and in time.
Monte Carlo test and interpretation
Rejection of $H_0$: no space-time interaction indicates that either
- the spatial distribution of the species has changed, or
- observational effort has changed in some parts of the country differently from other parts.
Discrimination between these explanations is left to the ecologist (unless information on observational effort can somehow be extracted).
Test statistic?
Could perhaps be more focussed than the general space-time K-function. For example, K-functions of empty dots (to test for decline) and of +'s (to test for expansion). This is in line with the original idea.
Further ideas warmly welcome!
References
Bailey, T. C. & Gatrell, A. C. (1995). Interactive spatial data analysis. Longman Scientific & Technical, Harlow.
Barnard, G. A. (1963). Discussion on paper by M. S. Bartlett. J. R. Stat. Soc. Ser. B 25: 294.
Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51: 299-311.
Besag, J. & Diggle, P. J. (1977). Simple Monte Carlo tests for spatial pattern. Appl. Statist. 26: 327-333.
Besag, J. E. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 193-195.
Diggle, P. J. (1986). Displaced amacrine cells in the retina of a rabbit: analysis of a bivariate spatial point pattern. Journal of Neuroscience Methods 18: 115-125.
Diggle, P. J., Chetwynd, A. G., Haggkvist, R. & Morris, S. (1993). Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4: 124-136.
Fisher, R. A. (1935). The design of experiments. Oliver and Boyd, Edinburgh.
Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. J. R. Stat. Soc. Ser. B 30: 582-598.
Lotwick, H. W. & Silverman, B. W. (1982). Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. Ser. B 44: 406-413.
Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology. Chapman & Hall/CRC, Boca Raton, 2nd edn.
Ripley, B. D. (1977). Modelling spatial patterns (with discussion). J. R. Stat. Soc. Ser. B 39: 172-212.
Silverman, B. W. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 201.