DP- MS - 88-24 9 PROPER STATISTICAL TREATMENT OF SPECIES-AREA DATA. Savannah River Laboratory Aiken, SC 29808-0001



Similar documents
Should Costing Version 1.1

This document was prepared in conjunction with work accomplished under Contract No. DE-AC09-96SR18500 with the U. S. Department of Energy.

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

Cost Effective Mass Standards Calibration Intervals (U)

Example: Boats and Manatees

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Derived Intervention Levels for Tritium Based on Food and Drug Administration Methodology Using ICRP 56 Dose Coefficients (U)

Testing of Kaonetics Devices at BNL

Software Quality Assurance Plan for the Hydrologic Evaluation of Landfill Performance (HELP) Model. Author: Mark A. Phifer OCTOBER, 2006

LIMITATIONS ON HIGH DATA RATE OPTICAL FIBER TRAN-SMISSION SYSTEMS DUE TO TRANSMISSION IMPAIRMENT

2-DFINITE ELEMENT CABLE & BOX IEMP ANALYSIS

Simple linear regression

Science Goals for the ARM Recovery Act Radars

LLNL-TR SAVANT Status Report. Arden Dougan. November 6, 2009

RAVEN: A GUI and an Artificial Intelligence Engine in a Dynamic PRA Framework

College Readiness LINKING STUDY


W. C. Reinig. Savannah River Laboratory E. I. du Pent de Nemours and Company Aiken, South Carolina 298o1

Simple Regression Theory II 2010 Samuel L. Baker

Microgrids. EEI TDM Meeting Charleston, South Carolina October 7, Neal Bartek Smart Grid Projects Manager

SENSITIVITY ANALYSIS AND INFERENCE. Lecture 12

Asynchronous data change notification between database server and accelerator control systems

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

WET BULB GLOBE TEMPERATURE MEASUREMENT AT THE Y-12 NATIONAL SECURITY COMPLEX

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

DISCLAIMER. This document was prepared as an account of work sponsored by an agency of the United States

Boost Libraries Boost Software License Version 1.0

Module 5: Multiple Regression Analysis

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

The Value of RECs in Renewable Project Financing

Analysis of Filter Coefficient Precision on LMS Algorithm Performance for G.165/G.168 Echo Cancellation

ESNET Requirements for Physics Research at the SSCL

S.L. Chang, S.A. Lottes, C.Q. Zhou,** B. Golchert, and M. Petrick

Drupal Automated Testing: Using Behat and Gherkin

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Panel Session #4 Smart Grid Workforce Training and Education: Identifying the Industry Perspective

Fairfield Public Schools

Introducing the Loan Pool Specific Factor in CreditManager

Regression III: Advanced Methods

Open Source Software used in the product

Small PV Systems Performance Evaluation at NREL's Outdoor Test Facility Using the PVUSA Power Rating Method

Premaster Statistics Tutorial 4 Full solutions

Part 2: Analysis of Relationship Between Two Variables

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

ACOT WEBSITE PRIVACY POLICY

Better decision making under uncertain conditions using Monte Carlo Simulation

This document was prepared in conjunction with work accomplished under Contract No. DE-AC09-96SR18500 with the U. S. Department of Energy.

Teaching Business Statistics through Problem Solving

Regression Analysis: A Complete Example

Open Source Used In Cisco IronPort Encryption SDK

Week TSX Index

Factors affecting online sales

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

Final Exam Practice Problem Answers

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

This document was prepared in conjunction with work accomplished under Contract No. DE-AC09-96SR18500 with the U. S. Department of Energy.

Measurement of BET Surface Area on Silica Nanosprings

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Pearson s Correlation

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

2. Simple Linear Regression

Integrated Tools for Multifamily Green Asset Management. Energy Upgrade CA Web Portal Funding Finder Compass Portfolio Tracker Webinar May 22, 2012

IDC Reengineering Phase 2 & 3 US Industry Standard Cost Estimate Summary

WinMagic Encryption Software Installation and Configuration

Module 5: Statistical Analysis

New Tools Using the Hardware Pertbrmaiihe... Monitor to Help Users Tune Programs on the Cray X-MP*

Seasonal Adjustment for Short Time Series in Excel Catherine C.H. Hood Catherine Hood Consulting

Terms of Submission In order to participate, you must be at least eighteen (18) years old.

Regression step-by-step using Microsoft Excel

User Guide. The Business Energy Dashboard

Distribution of Terminal Lung and Liver Dose Rates in United States Transuranium and Uranium Registries Registrants

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Chapter 7: Simple linear regression Learning Objectives

AP Physics 1 and 2 Lab Investigations

Free Trial - BIRT Analytics - IAAs

COMPARISON OF INDIRECT COST MULTIPLIERS FOR VEHICLE MANUFACTURING

Secure SCADA Communication Protocol Performance Test Results

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Scatter Plot, Correlation, and Regression on the TI-83/84

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Simple Methods and Procedures Used in Forecasting

Comment on "Observational and model evidence for positive low-level cloud feedback"

Preparing for the Meter Data Deluge An Intelligent Utility Reality Webcast

SPSS Guide: Regression Analysis

Up to $750 (or 30% of the project cost, whichever is less), for

Testing for Lack of Fit

Nuclear Engineering Enrollments and Degrees Survey, 2014 Data

Slope-Intercept Equation. Example

The Bayesian Approach to Forecasting. An Oracle White Paper Updated September 2006

2013 MBA Jump Start Program. Statistics Module Part 3

International Statistical Institute, 56th Session, 2007: Phil Everson

Situated Usability Testing for Security Systems

Open Source Used In Cisco Instant Connect for ios Devices 4.9(1)

Small Modular Nuclear Reactors: Parametric Modeling of Integrated Reactor Vessel Manufacturing Within A Factory Environment Volume 1

End-User Reference Guide

Transcription:

I / J DP- MS - 88-24 9 PROPER STATISTICAL TREATMENT OF SPECIES-AREA DATA by C. Loehle E. I. du Pont de Nemours and Company Savannah River Laboratory Aiken, SC 29808-0001 The information contained in this paper was developed during the course of work under Contract No. DE-AC09-76SR00001 with the U. S. Department of Energy. By acceptance of this paper, the publisher and/or recipient acknowledges the U. S. Government's right to retain a nonexclusive, royalty-free license in and to any copyright covering this paper, along with the right to reproduce and to authorize others to reproduce all or part of the copyrighted paper.

DISCLAIMER This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, make any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

DISCLAIMER Portions of this document may be illegible in electronic image products. Images are produced from the best available original document.

INTRODUCTION The purpose of this note is to comment on the entire process of analyzing species-area data, pa&cularly as performed by Rydin and Borgegbd (1988). They use three different models to test species-area relations for islands over a 100 year period. Several aspects of their analysis of species-area data could be improved, including their comparison of goodness-of-fit and testing of the expected value of z. The reason that these issues are important (their basic conclusions being correct) is that there is acrimonious debate over the best model to use for species-area curves and over whether the slope coefficient is constant or is an artifact, and because the species-area curve is being used for nature reserve design (see DUM and Loehle 1988 for discussion of these points). The problems pointed out here are common to a large class of allomemc-type analyses in ecology. I attempt to show the potential pitfalls inherent in allometric analyses and demonstrate methods for avoiding these problems. Rydin and Borgeghd (1988) document long-term trends in species numbers on islands formed in 1886 when the water level was lowered in Lake Hjalmaren, Sweden. At each remeasurement period, data on species richness was collected for all islands. They fit three different models at each time (Table 1, their Table 2). The three models are log S = C + z log A S=CAz S = C z log A 2

where S is species number, and A is island area. A is usually known with fairly high accuracy. The first mistake is that (1) and (2) are not really the same model. Taking the log transform of both sides of (2), log s = log (CAZ) log S = log C + zlog A Creating new variables from the log of the data values, we get a typical linear equation which can be fit by linear least-squares The coefficient BOobtained from regression is therefore not C but log C. Doing the inverse transform, C can be recovered and the C and z values for (1) and (2) will be identical when there is no error term. The two models do not in general give identical results, however, as shown next. In addition to C in (1) and (2) not having the same meaning, C and z in f3j are not comparable in meaning to either (1) or (2). C and z have come to have specific meanings in the ecological literam (C indicating species richness and other factors and z being related to Preston's canonical hypothesis and other things, Gould, 1979; Sugihara, 1981). Thus it would be much less confusing if Soand 81 were used as designations for arbitrary coefficients to be fit when comparing several equations, rather than C and z. 3

The second problem is the comparison of R* values from models with differently transformed data. It is valid to compare R2 values for one model under different treatments (years, in this case) and thus their comparisons of slope by sample year are valid. When different models are to be compared, however, the R2 values are not commensurate because a log-transform may change the R2 value and in general will increase it (Kvdseth, 1985). This is purely an artifact of the data transfonnation and results from axis compression. Since (2) has an additive error term, whereas (4) has a multiplicative error term (in the S-A plane), (4) and (2) will not in general give identical values for C and z. In fact, z values for (1) are much larger than for (2) (Table 1). Thus (4) and (2) cannot be considered interchangeable ways of fitting C and z, (as it seems many ecologists assume), and R2 values for (2), (3), and (4) cannot be compared when R2 is based'on the transformed variables, which is how it appears that Rydin and Borgeghd calculated R2. The question of which error term (additive or multiplicative) is more appropriate for species-area data has not been answered. This leads to a more general problem with the use of R2 as a measure of goodness of fit. KvAlseth (1985) points out that there are several methods for calculating R2. For linear models fit by least squares, most of these methods give the exact same results. For linear 'models with no intercept term and nonlinear models, the various methods give different results. This problem is not generally appreciated nor discussed in regression text books. Kvdseth (1985) 2 recommends a method (his R1 ) that allows comparison between linear, linear with no intercept, nonlinear (in the parameters) models and those fit by other than ordinary regression. For comparisons, all calculations are done using untransfonned values of x and y. His adjusted R2 is, 1 A where summation is over n data points, 4

a = n - 1 n - k - i for intercept model, and a= - n for no-intercept model, n-k where k is the number of parameters. This adjusted R2 calculation is not necessarily the one provided by standard computer packages, as shown by Kvillseth (1985). Thus care must be taken when different models are calculated different packages or by hand or compared and compiled from different published papers. It is not safe to assume that R2 values are computed from (5) unless the authors specify this. In general Rz values printed by packages will be based on the transformed (S') values for models such as (1) and (3), rather than on the original axes (Kvillseth, 1985). To emphasize the importance of using (3, KvAlseth (1985) notes that for nonlinear models several of the available R2 calculations can give values greater than 1. KvAlseth (1985) elaborates (5) for increased resistance to outliers, which may be useful if outliers are not fmt removed. He also notes that residual analysis should accompany R2 as a measure of which model is best. As a final point, Rydin and Borgegkd (1988) test whether the confidence limits around z in (2) include 0.25. This test is due to claims that the value of z has underfjing significance (Gould, 1979; Sugihara, 1981) or in contrast that z = 0.25 is a statistical artifact (Connor and McCoy, 1979). However, it is not entirely valid to compare 0.25 with the 95% confidence limits around z because the power of the test is unknown. After we fail to reject H, further tests are needed to conclude that z is not significantly different from 0.25. Range of the data (on the area axis) can easily be inadequate for species-area studies and small range can lead to low power. Inadequate sample size can also lead to the conclusion of no difference. Finally, very large islands are typically few in number in species-area studies, leading to very high leverage for the values for large islands, particularly if they are much larger than the other islands. Tests are available for examining leverage (Myers, 1986 p.155). These factors must be evaluated and eliminated as causes of the overlap of z with 0.25 before we can conclude that the lack of difference is?b 5

meaningful. A method for testing whether the measured z differs from 0.25 for a nonlinear model for species-area data is given in Dunn and Loehle (1988). Using Monte Carlo simulation, sample from a function (2) with z = 0.25 and variance the same as the actual data. This is essentially a bootstrap method. Generate in this way a large series of data sets to which z is fit, giving a distribution of z values against which the sample value can be tested (ie., is the sample value likely to have been obtained from a population with a true value of z = 0.25?). The effects of factors which give low power can be tested in this context by, e.g., varying the range of the data in the Monte Carlo trial (DUM and Loehle, 1988). In conclusion, care must be taken when working with transformed data, particuldrly for comparing equations and drawing conclusions about the values of parameters. The problems pointed out here are not well known to ecologists and deserve consideration. Particular care should be taken when using results from statistical computer packages. ACKNOWLEDGMENTS * -- This work was carried out under contract DE-AC09-76SR00001 with the U. S. Department of Energy, with whom the copyright remains. I would like to thank C. Comiskey and statisticians S. Harris and E. Smith for reviewing the manuscript. Connor, E. F. and E. D. McCoy. 1979. The statistics and biology of the species-area relationship. Am. Nat. 113:791-833. Dunn, C. P., and C. Loehle. 1988. Species-area parameter estimation: testing the null model of lack of relationship. J. Biogeog. In Press. 6

Gould, S. J. 1979. An allometric interpretation of species-area curves: the meaning of the coefficient Am. Nat. 114:335-343. Kvaseth, T. 0.1985. Cautionary note about R2. Am. Statis. 39:279-285. Myers, R. H. 1986. Classical and Modem Regression with Applications. Duxbury Press, Boston. Rydin, H., and S.-0. Borgeghd. 1988. Plant species richness on islands over a century of primary succession: Lake Hjiililmaren. Ecology 69:916-927. Sugihara, G. 1981. S = CAZ, z = 1/4: A reply to Connor and McCoy. Am. Nat. 117:790-793..- 7

Table 1. Species-area re; -essions. In the 1886 figures, islands formed in 1886, just prior to Callmes survey, are excluded. n is the number of islands included in each regression. From Rydin and Borgeghd (1988). S = C + z log A (exponen tiai) logs=c +zloga (transformed power) Year C Z n ~2 C z n R2 C Z n R2 1886-14.0 13.2 21 0.24 8.6 0.16 21 0.40 0.80 0.20 21 0.47 1892-53.9 32.7 31 0.65 8.5 0.24 31 0.53 0.16 0.46 30 0.56 1903-1904 -54.6 34.6 35 0.76 6.8 0.28 35 0.69 0.02 0.52 34 0.78 1927-1928 -61.0 39.3 37 0.85 6.3 0.30 37 0.81-0.02 0.56 37 0.78 1984-1985 -48.0 36.1 37 0.84 '0.8 0.24 37 0.78 0.65 0.36 37 0.72 8