Estimating the K-function for a univariate spatial point process on an arbitrary polygon.

B. S. Rowlingson & P. J. Diggle
University of Lancaster, Lancaster, UK
December 1991

Keywords: point process; spatial data.

1 Language

Fortran 77

2 Description and Purpose

One of the most useful summary descriptions of a spatial point pattern is an estimate of the reduced second moment measure, or K-function. The K-function of a stationary isotropic point process is

$$K(r) = \lambda^{-1} E[\text{number of further points within distance } r \text{ of an arbitrary point of the process}],$$

where $\lambda$ is the intensity, or expected number of points per unit area (Ripley, 1976, 1977).

For a set of data $\{x_i : i = 1, \ldots, n\}$ consisting of $n$ points in a planar region $A$, Ripley's estimator for $K(r)$ is defined as follows. Let $|A|$ denote the area of $A$, and $d_{ij}$ the distance between $x_i$ and $x_j$. Let $\delta_{ij}(r) = 1$ if $d_{ij} < r$, and $\delta_{ij}(r) = 0$ otherwise. Define $w(x, r)$ to be the reciprocal of the proportion of the circumference of the circle with centre $x$ and radius $r$ which is contained in $A$. Then

$$\hat{K}(r) = \frac{1}{n(n-1)} \, |A| \sum_{i} \sum_{j \neq i} w_{ij} \, \delta_{ij}(r) \qquad (1)$$

where $w_{ij} = w(x_i, d_{ij})$. See Figure 1 for an example in which $A$ is a rectangle, $w_{ij} = 1$ and $w_{ji} > 1$.

The estimator (1) is approximately unbiased for sufficiently small $r$. As $r$ increases, the value of $w_{ij}$ can become large until, eventually, it becomes unbounded. For example, if $A$ is a unit square then this happens when $r = \frac{1}{2}\sqrt{2}$, the half-diagonal of the square: a circle of this radius centred at the centre of the square meets $A$ only at its four corners. In practice this is not a serious problem, except possibly for very convoluted shapes of region $A$, since the precision of $\hat{K}(r)$ also deteriorates with increasing $r$ and the useful statistical information in $\hat{K}(r)$ is confined to values of $r$ which are small relative to the dimensions of $A$.

Applications of $\hat{K}(r)$ are described by many authors, including Cliff & Ord (1981), Ripley (1981), Diggle (1983), Upton & Fingleton (1985) and Stoyan et al. (1987). Motivated by recent applications in spatial epidemiology (e.g. Diggle & Chetwynd, 1991), in which $A$ is a geographical region with a complicated boundary, our objective here is to present an algorithm for computation of (1) when the region $A$ is an arbitrary polygon.

3 Method

The only non-routine step in the computation of (1) is the evaluation of the weights $w_{ij}$. Explicit formulae are available for simple shapes of $A$ such as rectangles or circles, e.g. Diggle (1983), p. 72. Our algorithm for a general $m$-sided polygon $A$ proceeds as follows.

1. Find all the $2k \le 2m$ points of intersection of the circle in question with the boundary of $A$, by solving a quadratic equation for each boundary segment. Store two coincident points if the circle touches the boundary. If the circle passes through a vertex of the polygon, the code determines whether or not the boundary crosses the circle at this point by considering the directions of the two polygon segments meeting at that vertex.

2. Order the $2k$ points in increasing angle around the circumference of the circle, from an arbitrary origin. Note that this partitions the circle into a sequence of arcs which alternate between being inside and outside the polygon.

3. Identify the largest arc and use a point-in-polygon test on the mid-point of this arc to establish whether that arc is inside or outside $A$.

4. Using the alternating property, sum the angles subtended by the $k$ arcs which lie within $A$. Call this total angle $\theta$. Then $w_{ij} = 2\pi/\theta$, the reciprocal of the proportion $\theta/(2\pi)$ of the circumference which lies within $A$.

The use of the mid-point of the largest arc in step 3 is simply a convenient way to avoid problems of numerical instability. In step 4, the possibility of $\theta = 0$ is precluded by the restriction on the range of $r$ noted earlier. In order to handle multiple coincident points, we define $w(x, 0) = 1$ for all $x$. This case is detected by the WEIGHT subroutine, which returns a value of 1 immediately without doing the full calculation.

4 Extensions

Essentially the same calculations are involved in estimating the K-functions for a multivariate spatial point process, in which the points are of two or more qualitatively different types (Lotwick & Silverman, 1982; Diggle, 1983, chapter 7), or for a spatial-temporal process (Diggle, Chetwynd & Häggkvist, 1991). In its present form the routine will correctly handle concave and self-intersecting polygons. With a more comprehensive boundary data structure it can easily be extended to boundaries that are disjoint or to regions $A$ with holes.

5 Structure

SUBROUTINE KHAT(X,Y,N,XV,YV,NV,NCELL,HCELL,HKHAT,IFAULT)

Formal Parameters

Input
X       Real*8 array(n)      X-coordinates of points
Y       Real*8 array(n)      Y-coordinates of points
N       Integer              Number of points
XV      Real*8 array(nv)     Ordered array of x-coordinates of vertices
YV      Real*8 array(nv)     Ordered array of y-coordinates of vertices
NV      Integer              Number of vertices defining the polygon
NCELL   Integer              Number of tabulated values of Khat
HCELL   Real*8               Step size between successive r-values

Output
HKHAT   Real*8 array(ncell)  Returned table of Khat values
IFAULT  Integer              Error code

The NV vertices define NV line segments forming the boundary of the polygon. The polygon is closed by a line segment from the NVth vertex to the first vertex. The values of the returned array HKHAT are such that HKHAT(J) = $\hat{K}$(J*HCELL) for J = 1, ..., NCELL.

6 Failure Indications

The error code IFAULT returns 0 on successful completion. Non-zero values indicate an error, in which case the contents of HKHAT are undefined.

IFAULT = 1 indicates that too many vertices were specified in the polygon data. Increase the parameter IBMAX and recompile.

IFAULT = 2 indicates that a zero-length line segment was found in the polygon data. Remove one of the points defining this segment, and rerun with the new polygon data.

IFAULT = 3 means that a point in the data was found to lie outside the polygon. Either remove this point from the dataset or enlarge the polygon to enclose it.

IFAULT = 4 means that an odd number of polygon-circle intersections was found at some particular point and radius. This could only occur if there were a problem with the finite precision of the calculation, and we have never experienced it in practice.

7 Timings

The most computationally intensive part of the routine is the calculation of the weight for a given pair of points. This is embedded inside two nested do-loops, giving $O(n^2)$ weight evaluations for $n$ point events. The weight routine loops over the boundary polygon segments once, so for $m$ boundary segments each weight evaluation is $O(m)$. For further comparison, two other weight-calculation routines were coded: one to calculate the weight on a rectangular boundary by simple trigonometry, and a slightly more general version that is valid for any convex polygon. These routines were timed using uniform random point data on a square boundary. The ratios of the times for the rectangular : convex : general algorithms were 1 : 1.8 : 2.6.
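The nested-loop structure described under Timings can be illustrated with a short Python sketch. This is not the Fortran 77 routine itself: the function name `khat`, its argument order, and the `weight(point, d)` callback are all hypothetical, and skipping pairs whose distance exceeds the largest tabulated radius is an assumption made here for economy. The sketch tabulates estimator (1) at r = HCELL, 2*HCELL, ..., NCELL*HCELL, mirroring the layout of HKHAT.

```python
import math

def khat(points, area, ncell, hcell, weight):
    """Tabulate estimator (1) at r = hcell, 2*hcell, ..., ncell*hcell.

    points : list of (x, y) tuples
    area   : |A|, the area of the study region
    weight : hypothetical callback, weight((x, y), d) -> w(x, d)
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    bins = [0.0] * ncell
    for i, (xi, yi) in enumerate(points):        # outer loop over x_i
        for j, (xj, yj) in enumerate(points):    # inner loop over x_j
            if i == j:
                continue
            d = math.hypot(xj - xi, yj - yi)
            # Smallest 0-based index k such that d < (k + 1) * hcell.
            k = int(d // hcell)
            if k < ncell:
                # One weight evaluation per ordered pair within range.
                bins[k] += weight((xi, yi), d)
    # delta_ij(r) = 1 for every tabulated r exceeding d_ij, so the
    # estimate at each r is a running sum of the binned weights.
    scale = area / (n * (n - 1))
    table, running = [], 0.0
    for b in bins:
        running += b
        table.append(scale * running)
    return table
```

With a callback that always returns 1 the sketch reduces to the estimator without edge correction, which is a convenient way to check the bookkeeping before plugging in a polygon weight function.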

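The weight calculation of Section 3 can likewise be sketched in Python under simplifying assumptions. The names `weight` and `point_in_polygon` are hypothetical and do not correspond to the WEIGHT subroutine; the point-in-polygon test is a standard even-odd ray-casting test, not necessarily the McElvain algorithm; tangencies are skipped, since two coincident points would only contribute an arc of zero length; and the direction test for a circle passing exactly through a vertex is omitted, so the polygon is assumed to be in general position relative to the circle.

```python
import math

def point_in_polygon(px, py, poly):
    """Even-odd ray-casting test (a standard method, used as a stand-in)."""
    inside = False
    n = len(poly)
    for a in range(n):
        x1, y1 = poly[a]
        x2, y2 = poly[(a + 1) % n]
        if (y1 > py) != (y2 > py):
            if px < x1 + (py - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def weight(cx, cy, r, poly):
    """Return 2*pi/theta for the circle of radius r centred at (cx, cy)."""
    if r == 0.0:
        return 1.0                      # w(x, 0) = 1 by definition
    two_pi = 2.0 * math.pi
    angles = []
    n = len(poly)
    for a in range(n):                  # step 1: one quadratic per segment
        x1, y1 = poly[a]
        x2, y2 = poly[(a + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        fx, fy = x1 - cx, y1 - cy
        qa = dx * dx + dy * dy
        qb = 2.0 * (fx * dx + fy * dy)
        qc = fx * fx + fy * fy - r * r
        disc = qb * qb - 4.0 * qa * qc
        if qa == 0.0 or disc <= 0.0:
            continue                    # degenerate segment, miss, or tangency
        root = math.sqrt(disc)
        for t in ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)):
            if 0.0 <= t < 1.0:          # half-open: a vertex is counted once
                angles.append(math.atan2(fy + t * dy, fx + t * dx))
    if not angles:
        if point_in_polygon(cx + r, cy, poly):
            return 1.0                  # whole circle lies inside A
        raise ValueError("theta = 0: weight is unbounded")
    if len(angles) % 2:
        raise ArithmeticError("odd number of intersections")
    angles.sort()                       # step 2: order by angle
    m = len(angles)
    arcs = [(angles[(s + 1) % m] - angles[s]) % two_pi for s in range(m)]
    big = max(range(m), key=arcs.__getitem__)    # step 3: largest arc
    mid = angles[big] + 0.5 * arcs[big]
    big_inside = point_in_polygon(cx + r * math.cos(mid),
                                  cy + r * math.sin(mid), poly)
    # Step 4: arcs alternate, so every second arc shares the status of 'big'.
    same = sum(arcs[s] for s in range(m) if (s - big) % 2 == 0)
    theta = same if big_inside else two_pi - same
    if theta <= 0.0:
        raise ValueError("theta = 0: weight is unbounded")
    return two_pi / theta
```

For the unit square, a circle of radius 0.2 centred at (0.5, 0.1) has one third of its circumference below the bottom edge, so the sketch should return a weight of 1.5.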
8 Acknowledgements

The authors would like to thank Ken McElvain for permission to use his point-in-polygon algorithm.

9 References

Cliff A.D. and Ord J.K. (1981) Spatial Processes: Models and Applications. London: Pion.

Diggle P.J. (1983) Statistical Analysis of Spatial Point Patterns. London: Academic Press.

Diggle P.J. and Chetwynd A.G. (1991) Second-order analysis of spatial clustering for inhomogeneous populations. Biometrics.

Diggle P.J., Chetwynd A.G. and Häggkvist R. (1991) Second order analysis of spatial-temporal point patterns. Bull. Int. Statist. Inst.

Lotwick H.W. and Silverman B.W. (1982) Methods for analysing spatial processes of several types of points. J. R. Statist. Soc. B 44, 406-413.

Ripley B.D. (1976) The second-order analysis of stationary point processes. J. Appl. Prob. 13, 255-266.

Ripley B.D. (1977) Modelling spatial patterns (with discussion). J. R. Statist. Soc. B 39, 172-212.

Ripley B.D. (1981) Spatial Statistics. New York: Wiley.

Stoyan D., Kendall W.S. and Mecke J. (1987) Stochastic Geometry and its Applications. Berlin: Akademie-Verlag.

Upton G. and Fingleton B. (1985) Spatial Data Analysis by Example, Vol. 1. Chichester: Wiley.