Monte Carlo tests for spatial patterns and their change




1 Monte Carlo tests for spatial patterns and their change (a)
Finnish Forest Research Institute, Unioninkatu 40 A, 00170 Helsinki
juha.heikkinen@metla.fi
Workshop on Spatial Statistics and Ecology, Perämeri Research Station, Hailuoto, 16.-18.10.2003
(a) With inspiration from a problem presented by Juha Siitonen (Metla, Vantaa)
Slides available at http:///pp/juhe/pub/hailuoto03.pdf

2 Motivation
All available observations on the long-horned beetles (sarvijäärät) Leptura quadrifasciata (nelivyöjäärä) and Monochamus sutor (suutari) in Finland, aggregated to 50 km × 50 km squares and two time intervals.
- Filled dot: observations both before and after 1960
- Empty dot: observations only before 1960
- +: observations only after 1960
Question: Are there changes in the biogeographic ranges?

3

4 Problems
The usual problem with such data: observational effort is spatially and temporally variable, and unknown. Also, species tend to be more abundant in the core of their range than at its limits.
Question: Can the spatial pattern of observations still reveal real changes? E.g., when there are no changes, a random pattern of empty dots and +'s is expected along the limits. Does significant clustering of empty dots indicate a decline of the species?

5 Monte Carlo significance test
Let $H_0$ be a null hypothesis about the distribution of a (multidimensional) random variable $X$, such that
- $H_0$ is simple (no unknown parameters involved), or
- $H_0$ is composite, but sufficient statistics exist for all nuisance parameters.

6 Example (Besag & Diggle, 1977)
$X$ are the locations of events (e.g., trees) in a study region $R$.
$H_0$: the pattern $X$ is completely random.
The null distribution is the homogeneous Poisson process on $R$:
- the number of events in $X$ is $n \sim \mathrm{Poisson}(\lambda |R|)$, where $|R|$ is the size of $R$;
- the $n$ events are uniformly distributed over $R$, with locations mutually independent.
$\lambda$ is a nuisance parameter with sufficient statistic $n$.
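The null model of this example can be simulated in a few lines. The following is a minimal sketch, assuming a rectangular study region; the function names `poisson_count` and `simulate_csr` are illustrative and not taken from the slides or from any package. Passing `n` fixes the number of events, which corresponds to conditioning on the sufficient statistic.

```python
import random

def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    arrival = rng.expovariate(1.0)
    while arrival <= mean:
        count += 1
        arrival += rng.expovariate(1.0)
    return count

def simulate_csr(lam, width, height, rng, n=None):
    """Simulate a completely random pattern on the rectangle [0, width] x [0, height].

    If n is given, the number of events is fixed (conditioning on n);
    otherwise it is drawn from Poisson(lam * area).
    """
    if n is None:
        n = poisson_count(lam * width * height, rng)
    # Given n, the events are independent and uniform over the rectangle.
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(2003)
free_pattern = simulate_csr(100.0, 1.0, 1.0, rng)         # n is random
fixed_pattern = simulate_csr(100.0, 1.0, 1.0, rng, n=50)  # n fixed at 50
```

Fixing `n` removes the nuisance parameter from the simulation altogether, since the intensity is then never used.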

7 General test procedure (Barnard, 1963)
1. Select any test statistic $u$ that is sensitive to the suspected kind of departure from $H_0$. Suppose large values indicate departure.
2. Compute $u_1 = u(x)$ from the observed data.
3. Simulate $m - 1$ random samples $x_2, x_3, \ldots, x_m$ from the null distribution of $X$.
4. Compute the simulated $u$-values $u_i = u(x_i)$, $i = 2, \ldots, m$.
5. Order the complete set $\{u_1, u_2, \ldots, u_m\}$.
6. If $u_1$ is the $k$th largest, then the exact significance level of the test is $k/m$.
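The procedure above can be sketched generically. This is an illustration under stated assumptions rather than code from the slides: `simulate_u` is a hypothetical callable that draws one sample from the null distribution and returns its statistic, and ties are counted against the observed value, which is a conservative convention chosen here.

```python
import random

def monte_carlo_p_value(u_obs, simulate_u, m=100, rng=None):
    """Monte Carlo test where large values of u indicate departure from H0.

    u_obs      -- statistic u_1 computed from the observed data
    simulate_u -- hypothetical callable: takes an rng and returns u for one
                  sample drawn from the null distribution
    m          -- size of the ranked set (observed value + m - 1 simulations)
    """
    rng = rng or random.Random()
    sims = [simulate_u(rng) for _ in range(m - 1)]
    # Rank of the observed value counted from the top (ties count against it).
    k = 1 + sum(1 for u in sims if u >= u_obs)
    return k / m

# Toy usage: H0 says X is 20 independent Uniform(0, 1) draws; u is the sample mean.
def null_mean(rng):
    return sum(rng.random() for _ in range(20)) / 20

p_extreme = monte_carlo_p_value(0.99, null_mean, m=100, rng=random.Random(1))
p_typical = monte_carlo_p_value(0.50, null_mean, m=100, rng=random.Random(1))
```

The smallest attainable level is $1/m$, so $m$ has to be chosen with the intended significance level in mind.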

8 Citing Hope (1968)
It is preferable to use a known test of good efficiency instead of a Monte Carlo test procedure, assuming that the alternative statistical hypothesis can be completely specified.
Monte Carlo is useful (at least in the early stages of statistical analysis) when
- the conditions for applying a test based on (asymptotic) distributional assumptions are not satisfied (e.g., a small non-normal sample); note that exact tests can be approximated by Monte Carlo;
- the distribution of the test statistic under $H_0$ is unknown (this is often the case in spatial statistics);
- only vague alternative hypotheses exist.

9 Example (ctd.)
In the point pattern example, each simulated $x_i$ is a pattern of $n$ independent uniform points on $R$ (conditioning on $n$).
A commonly recommended graphical test is obtained by choosing a vector $u$ of estimated values of the so-called K-function (Bartlett, 1964; Ripley, 1977) at a number of distances $h > 0$.
For a stationary point process with intensity $\lambda$ (expected number of events per unit area),
$\lambda K(h) = E(\text{number of further events within distance } h \text{ from a random event})$.
For a random pattern $K(h) = \pi h^2$; for regular patterns $K(h)$ tends to be smaller and for clustered patterns greater (at least for small $h$).
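To make the definition concrete, here is a deliberately naive estimate of $K(h)$. It is a sketch under two assumptions that are not from the slides: no edge correction is applied (so the estimate is biased downwards near the boundary), and the normalisation $|R| / (n(n-1))$ is used. The name `k_hat` is illustrative; production estimators, such as those in splancs, do correct for edge effects.

```python
import math
import random

def k_hat(points, h, area):
    """Naive K-function estimate at distance h, without edge correction."""
    n = len(points)
    if n < 2:
        return 0.0
    ordered_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                ordered_pairs += 1
    # Average number of further events within h of an event, divided by the
    # estimated intensity (n - 1) / area.
    return area * ordered_pairs / (n * (n - 1))

# Usage: 200 independent uniform points on the unit square.
rng = random.Random(7)
csr = [(rng.random(), rng.random()) for _ in range(200)]
k_csr = k_hat(csr, 0.05, 1.0)
k_reference = math.pi * 0.05 ** 2  # theoretical value for a random pattern
```

For a random pattern `k_csr` should land near `k_reference`, slightly below it because events close to the boundary have part of their neighbourhood outside the region.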

10 Tree map from a 50 m × 50 m sample plot in Lapland
[Figure: tree map of the plot and the estimated K(h) plotted against h]

11 L-function
$L(h) = \sqrt{K(h)/\pi} - h$ (a)
is motivated by the variance-stabilising square-root transformation (Besag, 1977; Silverman, 1977) and by comparison to the Poisson process (for which $L(h) \equiv 0$).
(a) or $L(h) = \sqrt{K(h)}$ (Silverman, 1977) or ...; the L-function does not seem to be a very well-defined concept.
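As a quick check of the centring, assuming the definition $L(h) = \sqrt{K(h)/\pi} - h$ and the value $K(h) = \pi h^2$ for a completely random pattern, the L-function vanishes identically. Clustering ($K(h) > \pi h^2$) then gives positive values and regularity negative ones.

```latex
L(h) = \sqrt{\frac{K(h)}{\pi}} - h
     = \sqrt{\frac{\pi h^2}{\pi}} - h
     = h - h = 0, \qquad h > 0 .
```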

12 L-function for Lapland trees
[Figure: tree map of the plot and the estimated L(h) plotted against h]
A formal test can be based on the sum of L(h), for example.

13 Dependence between different types of events
The cross K-function can be defined by
$\lambda_j K_{ij}(h) = E(\text{number of type } j \text{ events within distance } h \text{ from a randomly chosen type } i \text{ event})$.
To test for association between events of different types, the bivariate Poisson process is not a particularly useful null hypothesis, because the patterns of each type should be allowed to have a structure among themselves.
The whole marginal models for each type are nuisance parameters!

14 Conditioning on marginal patterns (Lotwick & Silverman, 1982)
Generate replications under $H_0$: no association between events of different types
- by randomly shifting the whole pattern of one type relative to the other;
- events moved outside of $R$ by a shift reappear in $R$ from the opposite side or corner (the toroidal idea, which works for rectangular $R$).
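A toroidal shift is short to write down. The sketch below assumes the rectangle is $[0, \text{width}) \times [0, \text{height})$ with its lower-left corner at the origin; the function name `toroidal_shift` is illustrative and not from the slides.

```python
import random

def toroidal_shift(points, width, height, rng):
    """Shift all points by one common random vector, wrapping around the rectangle.

    Every point moves by the same (dx, dy), so the internal structure of the
    shifted pattern is preserved on the torus; only its position relative to
    the other (unshifted) pattern changes.
    """
    dx = rng.uniform(0.0, width)
    dy = rng.uniform(0.0, height)
    # The modulo makes points leaving through one side re-enter from the opposite side.
    return [((x + dx) % width, (y + dy) % height) for x, y in points]

rng = random.Random(42)
pattern = [(0.1, 0.2), (0.9, 0.8), (0.5, 0.5)]
shifted = toroidal_shift(pattern, 1.0, 1.0, rng)
```

Each replication of the test shifts one pattern, keeps the other fixed, and recomputes the cross statistic for the pair.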

15 Amacrine cells in a rabbit's eye (Diggle, 1986)
Are the two types of cells formed initially in two separate layers, or does differentiation occur at a later stage of development?

16 Toroidal shift of black dots

17 Are patterns of black and white dots independent?
[Figure: $\hat{L}_{12}(h)$ (cross L-function) plotted against h, with simulation envelopes from 29 random toroidal shifts]
Evidence against $H_0$ is rather weak.
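Simulation envelopes of this kind are the pointwise minimum and maximum of the simulated curves. The following sketch assumes each curve is a list of function values on a common grid of distances; the function names and the toy numbers are illustrative only.

```python
def simulation_envelope(sim_curves):
    """Pointwise lower and upper envelopes over a list of simulated curves."""
    lower = [min(values) for values in zip(*sim_curves)]
    upper = [max(values) for values in zip(*sim_curves)]
    return lower, upper

def outside_envelope(observed, lower, upper):
    """Indices of grid distances where the observed curve leaves the envelope."""
    return [i for i, (obs, lo, hi) in enumerate(zip(observed, lower, upper))
            if obs < lo or obs > hi]

# Toy usage with three simulated curves on a grid of three distances.
sims = [[0.0, 0.1, -0.2], [0.3, -0.1, 0.0], [-0.1, 0.2, 0.1]]
lower, upper = simulation_envelope(sims)
flagged = outside_envelope([0.2, 0.5, -0.3], lower, upper)
```

With 29 simulations, the observed curve lies outside the envelope at any single fixed distance with probability 2/30 under the null hypothesis. The envelope is therefore a pointwise graphical aid and not a simultaneous test over all distances.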

18 Random labelling of events
$H_0$: black dots are a random subset of all dots
is not the same hypothesis as independence between patterns.
Random labelling is often considered in epidemiological case-control studies, where apparent clustering of cases may result from inhomogeneous population density.
Controls (type 2 events): a random sample from the population at risk.
If there is no clustering, then cases (type 1 events) are a random sample of the pattern of cases and controls.

19 Monte Carlo test for random labelling
$H_0$ suggests the obvious simulation method: choose the observed number of cases randomly from the combined pattern. In other words, fix the spatial locations and permute the type labels, so this leads back to Fisher (1935).
Under random labelling, the marginal patterns are random thinnings of the combined pattern, which implies
$K_{11}(h) = K_{22}(h)$.
This suggests choosing $u$ from the differences $\hat{K}_{11}(h) - \hat{K}_{22}(h)$ to study clustering of cases over the natural environmental spatial clustering of controls.
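The label-permutation test can be sketched as follows. The assumptions here are mine, not the slides': the K-functions are estimated naively without edge correction, the statistic is the difference at a single distance $h$ (the slides leave the exact choice of $u$ open), labels are coded 1 for cases and 2 for controls, and the toy data are made up.

```python
import math
import random

def k_within(points, h, area):
    """Naive K estimate for one type of event (no edge correction)."""
    n = len(points)
    if n < 2:
        return 0.0
    pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                if i != j and math.hypot(p[0] - q[0], p[1] - q[1]) <= h)
    return area * pairs / (n * (n - 1))

def k_difference(points, labels, h, area):
    """u = K11_hat(h) - K22_hat(h), with label 1 = case and label 2 = control."""
    cases = [p for p, lab in zip(points, labels) if lab == 1]
    controls = [p for p, lab in zip(points, labels) if lab == 2]
    return k_within(cases, h, area) - k_within(controls, h, area)

def random_labelling_test(points, labels, h, area, m=100, rng=None):
    """Permute the type labels over fixed locations and rank the observed u."""
    rng = rng or random.Random()
    u_obs = k_difference(points, labels, h, area)
    permuted = list(labels)
    k = 1
    for _ in range(m - 1):
        rng.shuffle(permuted)  # locations stay fixed, only labels move
        if k_difference(points, permuted, h, area) >= u_obs:
            k += 1
    return u_obs, k / m

# Made-up data: 5 tightly clustered cases and 25 controls on a regular grid.
cases = [(0.45, 0.45), (0.46, 0.45), (0.45, 0.46), (0.46, 0.46), (0.44, 0.44)]
controls = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
points = cases + controls
labels = [1] * len(cases) + [2] * len(controls)
u_obs, p_value = random_labelling_test(points, labels, h=0.05, area=1.0,
                                       m=100, rng=random.Random(0))
```

Shuffling the label list keeps the number of cases equal to the observed number, which matches the simulation method described above.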

20 Thefts by blacks and whites in Oklahoma City (Bailey & Gatrell, 1995)
[Figure: simulation envelopes under random labelling; $\hat{K}_1 - \hat{K}_2$ plotted against distance]
$\hat{K}_1$ of white offenders - $\hat{K}_2$ of blacks.

21 Space-time clustering
The space-time K-function (Diggle et al., 1993) can be defined by
$\lambda K(h,t) = E(\text{number of events within distance } h \text{ and time interval } t \text{ from a randomly chosen event})$.
Here $\lambda$ is the intensity in space-time: the expected number of events in a space-time box of size one area unit by one time unit.

22 Monte Carlo test for space-time interaction
Simulation under $H_0$: no space-time interaction is done by random permutations of the time labels, keeping the spatial locations fixed.
If the processes operating in time and space are independent (no space-time interaction), then
$K(h,t) = K_S(h) K_T(t)$,
where $K_S$ is the usual (spatial) K-function and $K_T$ is the similarly defined function in the time domain.
This suggests choosing $u$ from the differences
$\hat{D}(h,t) = \hat{K}(h,t) - \hat{K}_S(h) \hat{K}_T(t)$
to study whether events clustered in space are also close together in time.
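A minimal sketch of this permutation test follows. It rests on assumptions that are not from the slides: all three K-functions are estimated naively without edge correction, the statistic is $\hat{D}(h,t)$ at a single pair $(h, t)$ instead of a grid, and the data are made up. The names `d_hat` and `space_time_test` are illustrative.

```python
import math
import random

def d_hat(xy, times, h, t, area, duration):
    """D_hat(h, t) = K_hat(h, t) - K_hat_S(h) * K_hat_T(t), no edge correction."""
    n = len(xy)
    close_s = close_t = close_both = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            near_s = math.hypot(xy[i][0] - xy[j][0], xy[i][1] - xy[j][1]) <= h
            near_t = abs(times[i] - times[j]) <= t
            close_s += near_s
            close_t += near_t
            close_both += near_s and near_t
    norm = n * (n - 1)
    k_st = area * duration * close_both / norm
    k_s = area * close_s / norm
    k_t = duration * close_t / norm
    return k_st - k_s * k_t

def space_time_test(xy, times, h, t, area, duration, m=100, rng=None):
    """Permute time labels over fixed locations and rank the observed D_hat."""
    rng = rng or random.Random()
    u_obs = d_hat(xy, times, h, t, area, duration)
    shuffled = list(times)
    k = 1
    for _ in range(m - 1):
        rng.shuffle(shuffled)  # breaks any link between where and when
        if d_hat(xy, shuffled, h, t, area, duration) >= u_obs:
            k += 1
    return u_obs, k / m

# Made-up data: four spatial clusters, each active in its own time window.
xy = [(0.2 * c + 0.1 + 0.01 * i, 0.5) for c in range(4) for i in range(5)]
times = [10.0 * c + i for c in range(4) for i in range(5)]
u_obs, p_value = space_time_test(xy, times, h=0.05, t=5.0, area=1.0,
                                 duration=40.0, m=100, rng=random.Random(0))
```

Permuting the times leaves $\hat{K}_S$ and $\hat{K}_T$ unchanged, so only the joint term $\hat{K}(h,t)$ varies between replications. This is the sense in which the test conditions on both marginal patterns.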

23 Burkitt's lymphoma in Uganda (Bailey & Gatrell, 1995)
[Figure: one panel per year, 1961 to 1970]

24 Plot of $\hat{D}(h,t)$ and a Monte Carlo test
[Figure: panel "D plot", D against Distance and Time; panel "MC results", Frequency of the simulated test statistics with the data statistic marked]
Test statistic $u = \sum \hat{D}(h,t)$ over a grid of $h$ and $t$ values.

25 Tools
All these methods are discussed in Bailey & Gatrell (1995) in the more general context of point pattern analysis.
Manly (1997) is a gentle introduction to Monte Carlo methods in general.
All the analyses are easily accessible to anyone through the public domain spatial point pattern analysis package splancs, which runs under commercial Splus and free R:
http://cran.r-project.org
http://www.maths.lancs.ac.uk/~rowlings/splancs

26 splancs functions for the examples
The splancs functions which did the essential work for the examples were
- Kenv.csr for the simple point pattern example
- Kenv.tor for random toroidal shifts of one pattern w.r.t. another
- Kenv.label for random labelling of two types of events
- stdiagn for space-time clustering analysis
Type help(<function>) or example(<function>) for further details.

27 Data sets for the examples (all but one)
- The amacrine cells data set is available as data set amacrine in the Splus/R package spatstat: http://www.maths.uwa.edu.au/~adrian/spatstat.html
- Locations of Oklahoma City offences (okblack, okwhite) and Burkitt's lymphoma cases (burkitt) are available in splancs.
Type help(<data set>) for further details.

28 Back to beetle problem
The actual data are a space-time point pattern.
Random permutations of the time labels, keeping the spatial locations fixed, take care of conditioning on the marginal variation of observational effort and abundance both in space and time.

29 Monte Carlo test and interpretation
Rejection of $H_0$: no space-time interaction indicates that either
- the spatial distribution of the species has changed, or
- observational effort has changed in some parts of the country differently from other parts.
Discrimination between these explanations is left to the ecologist (unless information on observational effort can somehow be extracted).

30 Test statistic?
Could perhaps be more focussed than the general space-time K-function. For example, K-functions of empty dots (to test for decline) and +'s (to test for expansion). This is in line with the original idea.
Further ideas warmly welcome!

References
Bailey, T. C. & Gatrell, A. C. (1995). Interactive spatial data analysis. Longman Scientific & Technical, Harlow.
Barnard, G. A. (1963). Discussion on paper by M. S. Bartlett. J. R. Stat. Soc. Ser. B 25: 294.
Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51: 299-311.
Besag, J. & Diggle, P. J. (1977). Simple Monte Carlo tests for spatial pattern. Appl. Statist. 26: 327-333.
Besag, J. E. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 193-195.
Diggle, P. J. (1986). Displaced amacrine cells in the retina of a rabbit: analysis of a bivariate spatial point pattern. Journal of Neuroscience Methods 18: 115-125.
Diggle, P. J., Chetwynd, A. G., Haggkvist, R. & Morris, S. (1993). Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4: 124-136.
Fisher, R. A. (1935). The design of experiments. Oliver and Boyd, Edinburgh.
Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. J. R. Stat. Soc. Ser. B 30: 582-598.
Lotwick, H. W. & Silverman, B. W. (1982). Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. Ser. B 44: 406-413.
Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology. Chapman & Hall/CRC, Boca Raton, 2nd edn.
Ripley, B. D. (1977). Modelling spatial patterns (with discussion). J. R. Stat. Soc. Ser. B 39: 172-212.
Silverman, B. W. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 201.