MATH2740: Environmental Statistics




MATH2740: Environmental Statistics Lecture 6: Distance Methods I February 10, 2016

Table of contents
1 Introduction: Problem with quadrat data; Distance methods
2 Point-object distances: Poisson process case; Rayleigh distribution; Distribution of object-object distances
3 Clark-Evans test: Clark-Evans test of randomness; Problems with the Clark-Evans test; Examples of the Clark and Evans test; Problems with the Clark-Evans test II

Problems with quadrat data Quadrat methods can be inefficient to use in some circumstances: there is time and cost in laying out and searching all quadrats; the choice of quadrat size can influence the conclusions; and quadrat counts do not fully capture the underlying point pattern, since plots can have the same quadrat counts but different spatial patterns.

Distance methods Distance methods try to overcome some of the problems associated with quadrat counting methods.

Types of distance measurement Distance measurements involve measuring: distances from randomly selected points to the nearest neighbouring object, giving point-object distances; or distances from a randomly selected object to the nearest neighbouring object, giving object-object distances. The latter requires the locations of all objects within the study area to be known, so that objects can be selected at random.

Example: Types of distance measurement Have a Poisson process with 30 objects within a unit square. Left: distances from four randomly selected objects in the study area to their nearest object. Gives object-object distances. Right: distances from four randomly located points in the study area to their nearest object. Gives point-object distances.

Other types of distance measurement I (NOT examined) Other types of distance measurement can be considered: Random object to the nth nearest neighbour. Random point to the nth nearest neighbour. Besag and Gleaves (1973) T-square sampling.

Other types of distance measurement II (NOT examined) Besag and Gleaves (1973) T-square sampling: find the distance from a random point O to the nearest object P; then find the distance from P to its nearest object Q, where Q is restricted to the half-plane on the far side of P from O. This gives a point-object distance and an object-object distance.

Point-object distances I Suppose object locations occur as a Poisson process with intensity λ (mean number of objects per unit area is λ). The number X(A) of objects in a region A with area |A| has a Poisson distribution with mean µ = λ|A|, so

pr{X(A) = x} = µ^x e^{-µ} / x! = (λ|A|)^x e^{-λ|A|} / x!,  x = 0, 1, 2, ....

In particular, pr{X(A) = 0} = e^{-λ|A|}.

Point-object distances II Let R denote the distance from a random point to the nearest object, and consider a circle of radius r centred on that point. The distance R from the random point to the nearest object is greater than r exactly when the circle of radius r and area πr² contains no objects.

Point-object distances III The distance R from a random point to the nearest object satisfies

pr{R > r} = pr{no objects inside circle of radius r} = pr{X(A) = 0},

where X(A) ~ Poisson(µ = λ|A|) with |A| = πr². Hence pr{R > r} = exp(-λπr²). The cumulative distribution function of R is

F_R(r) = pr{R ≤ r} = 1 - pr{R > r} = 1 - exp(-λπr²),  r > 0.

The probability density function f_R(r) of R is

f_R(r) = dF_R(r)/dr = 2λπr exp(-λπr²),  r > 0.
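As a quick empirical check of pr{R > r} = exp(-λπr²), the following Python sketch (all names are my own; a fixed object count is used in place of a Poisson-distributed one, which makes little difference in a large window) simulates objects in a square and measures point-object distances:

```python
import math
import random

random.seed(1)

lam, side = 5.0, 20.0                      # intensity and window side length
n_obj = int(lam * side * side)             # approximate the Poisson count by its mean
objs = [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n_obj)]

def nearest_dist(px, py):
    # point-object distance: from (px, py) to the nearest object
    return min(math.hypot(px - ox, py - oy) for ox, oy in objs)

# Sample random points well inside the window to avoid edge effects.
pts = [(random.uniform(5, 15), random.uniform(5, 15)) for _ in range(500)]
dists = [nearest_dist(px, py) for px, py in pts]

r = 0.2
empirical = sum(d > r for d in dists) / len(dists)
theoretical = math.exp(-lam * math.pi * r * r)   # exp(-0.2 pi) = 0.533...
print(empirical, theoretical)
```

The empirical proportion of distances exceeding r should agree with exp(-λπr²) up to simulation noise.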

Rayleigh distribution I

f_R(r) = dF_R(r)/dr = 2λπr exp(-λπr²),  r > 0.

This is the probability density function of a Rayleigh distribution. It is a special case of the Weibull distribution with probability density function

f_X(x) = abx^{b-1} exp(-ax^b),  x > 0,

where a > 0 and b > 0; here a = λπ and b = 2. [Plots of the pdf for λ = 0.1 (left), λ = 0.2 (centre) and λ = 0.4 (right).]

Rayleigh distribution II

E[R] = ∫₀^∞ r f_R(r) dr = ∫₀^∞ 2λπr² exp(-λπr²) dr.

Let y = λπr², so dy = 2λπr dr and dr = dy / (2√(λπy)). Then

E[R] = ∫₀^∞ 2y e^{-y} / (2√(λπy)) dy = (1/√(λπ)) ∫₀^∞ y^{1/2} e^{-y} dy = Γ(3/2)/√(λπ) = 1/(2√λ),

since Γ(3/2) = (1/2)Γ(1/2) = √π/2 and the area under a gamma(α = 3/2, 1) density integrates to one, so that ∫₀^∞ y^{1/2} e^{-y} / Γ(3/2) dy = 1.

Rayleigh distribution III: revision of the gamma distribution A gamma(α, λ) distribution has probability density function

f_Y(y) = λ^α y^{α-1} e^{-λy} / Γ(α),  y > 0,

where the gamma function satisfies Γ(α) = (α-1)Γ(α-1) with Γ(1) = 1 and Γ(1/2) = √π.

Rayleigh distribution IV

E[R²] = ∫₀^∞ r² f_R(r) dr = ∫₀^∞ 2λπr³ exp(-λπr²) dr.

Putting y = λπr² and dy = 2λπr dr gives

E[R²] = ∫₀^∞ 2y e^{-y} / (2λπ) dy = (1/(λπ)) ∫₀^∞ y e^{-y} dy = 1/(λπ),

as the area under a gamma(α = 2, 1) density integrates to one so that, with Γ(2) = 1, ∫₀^∞ y e^{-y} / Γ(2) dy = 1. (Or recall that for Y ~ exponential(1), E[Y] = ∫₀^∞ y e^{-y} dy = 1.)

Hence Var[R] = E[R²] - {E[R]}² = 1/(λπ) - 1/(4λ) = (4-π)/(4λπ).
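These two moments can be sanity-checked numerically. This sketch (my own, using a simple midpoint rule with an assumed truncation point of 20) integrates r f_R(r) and r² f_R(r) for one choice of λ and compares with E[R] = 1/(2√λ) and Var[R] = (4-π)/(4λπ):

```python
import math

lam = 0.4
f = lambda r: 2 * lam * math.pi * r * math.exp(-lam * math.pi * r * r)

h, upper = 1e-4, 20.0          # step size and truncation point for the integrals
rs = [i * h + h / 2 for i in range(int(upper / h))]
ER = sum(r * f(r) * h for r in rs)          # E[R]
ER2 = sum(r * r * f(r) * h for r in rs)     # E[R^2]

print(ER, 1 / (2 * math.sqrt(lam)))                          # both about 0.7906
print(ER2 - ER**2, (4 - math.pi) / (4 * lam * math.pi))      # both about 0.1708
```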

Object-object distances Given a large number N of objects in the study area A, the distribution of the distance between a random object and the nearest neighbouring object is the same as the point-object distance. Suppose A contains N objects randomly positioned within A. The probability that any given object is located in a small region a ⊂ A is |a|/|A|; the probability that it is not located in a is 1 - |a|/|A|. If |a| = πr², the probability that none of the remaining N - 1 objects is within a distance r of a randomly chosen object is (1 - πr²/|A|)^{N-1}, by independence. Writing λ = N/|A| gives

pr{R ≤ r} ≈ 1 - (1 - λπr²/N)^{N-1}.

As N → ∞ this gives the same distribution function as the point-object case.

Clark-Evans test I We have N object-object nearest neighbour distances rᵢ, i = 1, 2, ..., N, with sample mean r̄. If the randomness (Poisson process) assumption is true, then for large N, Clark and Evans (1954) assume

r̄ ~ N( 1/(2√λ), (4-π)/(4λπN) ),

where E[R] = 1/(2√λ) and Var[R] = (4-π)/(4λπ). Hence

Z = ( r̄ - 1/(2√λ) ) / √( (4-π)/(4λπN) ) ~ N(0, 1).

Reject the randomness hypothesis at the 5% level if |Z| > 1.96.
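The statistic is a one-liner in practice. A minimal sketch (the function name clark_evans_z is my own, not from the lecture):

```python
import math

def clark_evans_z(rbar, lam, n):
    """Clark-Evans z from the mean nearest-neighbour distance rbar,
    the intensity lam, and the number of distances n."""
    expected = 1 / (2 * math.sqrt(lam))                      # E[R]
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))  # sqrt(Var[rbar])
    return (rbar - expected) / se
```

With the values of Example 1 later in the lecture (r̄ = 0.5015, λ̂ = 11/9, N = 11) this returns z ≈ 0.690.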

Clark-Evans test II For small N, Clark and Evans suggest using a suitable gamma distribution as an approximation to the distribution of r̄.

Clark-Evans measure of randomness Clark and Evans use¹

φ_R = r̄ / E[R] = 2√λ r̄

as a measure of randomness: φ_R ≈ 1 for a random process, φ_R < 1 for a clustered (aggregated) process, and φ_R > 1 for a regularly located process².

¹ Clark and Evans used the symbol R for their randomness measure, but to avoid confusion with the random variable R the symbol φ_R is used here.
² The most extreme case has objects on a hexagonal grid, each object the same distance r from six others. Each hexagon has area 3√3 r²/2 and is associated with 3 data points (the central point plus a weight of one third for each of the six surrounding points), so λ = 3/(3√3 r²/2). Thus r̄ = 1.0746/√λ, so φ_R = 2.149.

Problems with Clark-Evans test I The intensity λ should be known to carry out the test; it could be estimated using the mean number of objects per unit area in the study region. The Clark-Evans test uses all N object-object distances. These distances are not independent, but Diggle (1976) and Donnelly (1978) showed that the correlations are small³. The correlations between the object-object distances mean the central limit theorem does NOT apply; however, Z is still approximately N(0, 1), as shown by Donnelly (1978).

³ Donnelly (1978) obtained better approximations for the mean and variance of the object-object distances, but for large N these give the E[R] and Var[R] obtained by assuming the object-object distances are independent.

Using a border region I Clark and Evans (1954) advise having a border around the study region to avoid bias. For objects near the edge of the study region, the calculated object-object distance to objects within the study region will tend to be larger than it should be. This biases the test statistic Z upwards, favouring rejection of the randomness hypothesis in the direction of suggesting regularity of the data points.

Using a border region II Object-object distances are measured for all objects within the inner region and can be to points within the border.

Using a border region III Donnelly (1978) presented approximations for E[R] and Var[R] when a border is ignored. For a study region with perimeter P,

E[R] ≈ 1/(2√λ) + (P/N)(0.0514 + 0.0412/√N),

Var[R] ≈ 0.070/(λN) + 0.037P/(N²√λ).

Using a toroidal correction If the study region is rectangular, an alternative is to assume the region lies on a torus, so that opposite edges are adjacent to each other. The study region (centre below) is surrounded by a grid of identical copies of itself. Object-object distances are measured for all objects within the central region and can be to points outside the centre.
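The toroidal distance can be sketched with the minimum-image convention: wrap each coordinate difference before computing the Euclidean distance (function and parameter names below are my own):

```python
import math

def toroidal_nn_distances(points, side=1.0):
    """Nearest-neighbour distance for each point on a side x side torus."""
    out = []
    for i, (xi, yi) in enumerate(points):
        best = float("inf")
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx = abs(xi - xj)
            dx = min(dx, side - dx)   # wrap horizontally
            dy = abs(yi - yj)
            dy = min(dy, side - dy)   # wrap vertically
            best = min(best, math.hypot(dx, dy))
        out.append(best)
    return out

# Two points near opposite edges: across the edge they are only 0.1 apart.
print(toroidal_nn_distances([(0.05, 0.5), (0.95, 0.5)]))
```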

Example 1: Simulated data I The object-object nearest neighbour distances for the N = 11 objects within the inner study region below are: 0.201 0.201 0.327 0.327 0.350 0.350 0.500 0.500 0.657 0.826 1.278

Example 1: Simulated data II The data are: 0.201 0.201 0.327 0.327 0.350 0.350 0.500 0.500 0.657 0.826 1.278. These have mean r̄ = 0.5015. The inner region has area 9 m², so λ can be estimated by λ̂ = 11/9 = 1.222. The test statistic is thus

z = ( r̄ - 1/(2√λ̂) ) / √( (4-π)/(4λ̂πN) ) = (0.5015 - 0.4523)/0.07128 = 0.690.

Here |z| < 1.96, so accept the randomness hypothesis at the 5% level. Notice that many of the object-object distances are equal: these occur in pairs when two objects are each other's nearest neighbour.

Example 2: Ground ant nests in Panama I Levings and Franks (1982) present data on the number of ground ant nests in various study regions on Barro Colorado Island, in Gatun Lake, Panama. For one 100 m² square study region the number of nests of Ectatomma ruidum per m² was given as 0.61, with φ_R = 1.16. This suggests λ = 0.61, N = 100λ = 61 and r̄ = φ_R/(2√λ) = 0.7426. The Clark-Evans test statistic is then

z = ( r̄ - 1/(2√λ) ) / √( (4-π)/(4λπN) ) = (0.7426 - 0.6402)/0.04285 = 2.390.

As a two-sided test, the P-value is P = pr{|Z| > 2.390} = 0.0168, so reject the randomness hypothesis; φ_R > 1 suggests the ant nests are distributed regularly.
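This calculation can be reproduced directly (a sketch; the helper norm_cdf via math.erf is my own, not part of the lecture):

```python
import math

lam, n, phi = 0.61, 61, 1.16
rbar = phi / (2 * math.sqrt(lam))                 # since phi_R = 2*sqrt(lam)*rbar
expected = 1 / (2 * math.sqrt(lam))               # E[R]
se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
z = (rbar - expected) / se                        # about 2.390

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p = 2 * (1 - norm_cdf(abs(z)))                    # two-sided P-value, about 0.0168
print(z, p)
```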

Example 2: Ground ant nests in Panama II Unfortunately Levings and Franks do not appear to have used a border, so their results are invalid. For perimeter P (here 40 m), the corrected values of E[R] and Var[R] obtained by Donnelly (1978) give

E[R] ≈ 1/(2√λ) + (P/N)(0.0514 + 0.0412/√N) = 0.67735,

Var[R] ≈ 0.070/(λN) + 0.037P/(N²√λ) = 0.019321,

and the test statistic becomes z = 0.470, which is not significant. There is thus no evidence to reject the randomness hypothesis.

Intensive sampling If all the nearest neighbour distances in a region are calculated, the values are not independent; Cressie (1993, pp. 609-610) refers to this as intensive sampling. The consequence is that the true variance of r̄ is greater than that assumed (because of the correlations), so the test statistic Z used in the test tends to be larger than it should be, resulting in non-randomness being suggested more often than it should be. One solution is to use Monte-Carlo tests for inference: independent realizations of the data are simulated assuming the null hypothesis is true, and the test statistic Zᵢ is calculated for each. The observed value of the test statistic Z is compared with the simulated ones, and the test rejects the null hypothesis if the observed Z is too large or too small compared with the simulated Zᵢ.
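A minimal Monte-Carlo version of the test might look like this (all names are my own; edge effects and the border issues discussed earlier are ignored for brevity, and the simulated patterns fix N rather than drawing it from a Poisson distribution):

```python
import math
import random

random.seed(2)

def nn_mean(points):
    # mean object-object nearest neighbour distance (no edge correction)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def ce_z(points, area=1.0):
    # Clark-Evans z with lambda estimated as N / area
    n = len(points)
    lam = n / area
    expected = 1 / (2 * math.sqrt(lam))
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
    return (nn_mean(points) - expected) / se

n, n_sim = 30, 99
observed = [(random.random(), random.random()) for _ in range(n)]
z_obs = ce_z(observed)
z_sim = [ce_z([(random.random(), random.random()) for _ in range(n)])
         for _ in range(n_sim)]

# Two-sided Monte-Carlo P-value: rank |z_obs| among the simulated |z_i|.
rank = sum(abs(z) >= abs(z_obs) for z in z_sim)
p_mc = (rank + 1) / (n_sim + 1)
print(p_mc)
```

Here the observed pattern is itself simulated under the null, so the Monte-Carlo P-value should typically be unremarkable; with real data, `observed` would be the measured object locations.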