Ecological ordination with Past

Øyvind Hammer, Natural History Museum, University of Oslo, 11-6-6

Introduction

This text concerns taxa-in-samples data. Such a data set contains a number of samples, each sample occupying one row in the spreadsheet. Each sample contains counts, percentages or presence-absence data for a number of taxa (in columns). The samples may come from different localities or different levels in a section or core. A basic requirement is to plot the samples as points in 2D or 3D, so that similar samples plot close to each other and dissimilar samples plot farther apart. From such a plot, called an ordination, it may be possible to extract different types of information: Are there groups of points? How may such groups be interpreted, e.g. in terms of biogeography, biostratigraphy or environment? Are the points ordered according to geographical, stratigraphical or environmental gradients? Ordination is a fundamental technique in modern ecology, and often one of the first things we try in order to get an overview of a complex taxa-in-samples data set.

Correspondence analysis

Correspondence analysis (CA) is one of the most popular ordination methods for taxa-in-samples data, especially for samples collected along one or several gradients along which taxa come and go in an overlapping sequence. Like other ordination methods, CA attempts to place similar samples in similar positions in the ordination plot. The distance measure between samples is based on the chi-squared statistic (the chi-squared distance between the samples' relative-abundance profiles). We will use a data set containing abundances of Recent benthic foraminifera along a depth gradient in the Gulf of Mexico. Open the file bentforams.dat, select the whole table and run Correspondence from the Multivar menu.
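As a minimal sketch of the chi-squared distance that CA is based on, the Python/NumPy function below (a hypothetical helper, not part of Past) compares two samples as relative-abundance profiles and divides each taxon's squared difference by that taxon's share of the grand total. The small count table is invented for illustration.

```python
import numpy as np


def chi2_distance(table, j, k):
    """Chi-squared distance between samples (rows) j and k of a taxa-in-samples table.

    Each row is converted to a profile (relative abundances), and the squared
    difference for each taxon is divided by that taxon's share of the grand total.
    """
    x = np.asarray(table, dtype=float)
    profiles = x / x.sum(axis=1, keepdims=True)  # row profiles
    col_mass = x.sum(axis=0) / x.sum()           # taxon weights (column masses)
    diff = profiles[j] - profiles[k]
    return float(np.sqrt(np.sum(diff**2 / col_mass)))


# Hypothetical counts: 3 samples (rows) x 4 taxa (columns)
counts = [[10, 5, 0, 1],
          [8, 6, 1, 1],
          [0, 1, 9, 6]]

print(chi2_distance(counts, 0, 1))  # similar profiles -> small distance
print(chi2_distance(counts, 0, 2))  # different profiles -> large distance
```

Because each squared difference is divided by the taxon's column mass, rare taxa receive large weights in this distance.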

The first window shows the 13 axes constructed by the analysis. Consider that two points will always lie on a straight line (one dimension), and three points always in a plane (two dimensions). In this case we have 14 samples, and the corresponding points occupy a 13-dimensional space. The axes are ordered according to their eigenvalues. The first axis, with the largest eigenvalue, contains 33.1% of the information in the data set, measured using the chi-squared criterion. The second axis contains 24.5%. This means that if we plot the points in two dimensions, using the first two CA axes, we retain 57.6% of the information, which is impressive considering that the dimensionality has been reduced from 13 to 2. This works because there is structure in the data, and the analysis has been successful in extracting this structure. Click the View scatter button to see the result of the ordination.

[Figure: CA ordination of the samples, axis 1 vs. axis 2, with points labeled Marsh, Bay and by depth interval]
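The eigenvalues and percentages reported in the first window can be illustrated with a short CA sketch. It assumes the common textbook formulation (a singular value decomposition of the table's standardized chi-squared residuals); Past's internal algorithm and its scaling of the scores may differ, and the count table is hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Basic correspondence analysis of a samples-by-taxa table via SVD.

    Returns the eigenvalues (principal inertias), their percentages of the total,
    and principal coordinates for the samples (rows) and taxa (columns).
    """
    x = np.asarray(table, dtype=float)
    p = x / x.sum()                 # correspondence matrix
    r = p.sum(axis=1)               # row (sample) masses
    c = p.sum(axis=0)               # column (taxon) masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)  # standardized chi-squared residuals
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    k = min(x.shape) - 1            # number of non-trivial axes
    u, sv, vt = u[:, :k], sv[:k], vt[:k]
    eig = sv**2                     # eigenvalues, largest first
    percent = 100 * eig / eig.sum()
    row_coords = u * sv / np.sqrt(r)[:, None]      # sample scores
    col_coords = vt.T * sv / np.sqrt(c)[:, None]   # taxon scores
    return eig, percent, row_coords, col_coords


# Hypothetical counts: 4 samples x 5 taxa
counts = np.array([[10, 5, 0, 1, 2],
                   [8, 6, 1, 1, 0],
                   [2, 4, 6, 3, 1],
                   [0, 1, 9, 6, 4]])

eig, percent, samples, taxa = correspondence_analysis(counts)
print("eigenvalues:", eig)
print("% of total:", percent)
print("axis 1-2 sample scores:", samples[:, :2])
```

The number of axes, min(samples, taxa) - 1, matches the text: 14 samples with more taxa than samples give 13 axes. Because sample and taxon scores come out of the same decomposition, both can be drawn in one plot.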

The important thing to note is that the shallowest sample (Marsh) is found at the right end of the plot. To the left of it is Bay, then -5m, 51-1m and 11-15m. Going deeper than this, the points lie very close together, and partly in the wrong order considering their depths. Still, it is clear that CA axis 1 can be interpreted as reflecting depth. Remember that the computer had no information about depth, but managed to place the samples in this order based only on their foram abundances. Firstly, this indicates that depth (or rather some other factor correlating with depth) is an important control on the foram fauna. Secondly, if this were a paleontological data set, with no a priori information on depth, such an analysis might provide clues about paleodepth. The interpretation of axis 2 is more obscure, but it is clearly dominated by the difference between the Marsh and the Bay samples.

Detrended correspondence analysis

Sometimes nonlinear relationships can cause the main gradient, reflected by CA axis 1, to spill over into axis 2, producing an arch rather than a linear trend in the CA plot. The points can also get compressed near the ends of the gradient. To reduce these perhaps annoying effects, one can attempt to straighten out the arch. Detrended correspondence analysis (DCA) is one method with this purpose. Select all, and run Detrended correspondence from the Multivar menu. In this case the effect is not dramatic compared with ordinary correspondence analysis, but the depth gradient is more parallel with axis 1 (tick and untick the Detrending box to compare the two methods).

[Figure: DCA ordination of the samples, axis 1 vs. axis 2]

An interesting feature of CA (and DCA) is that it can show both the samples and the taxa in the same plot, illustrating which taxa are more important in different regions of the diagram. Select the Column labels option. In the figure below I have moved and removed names to improve readability. For example, Rotalia and Miliammina are typical of the Marsh environment; Lagena and Elphidium are found mainly in the Bay; Virgulina and Bolivina on the inner shelf; and Bulimina and Cassidulina in deeper water.

[Figure: CA ordination with taxon (column) labels]

Principal coordinates analysis

The idea of placing the samples in the ordination plot so that similar samples are close can be generalized to any measure of sample distance. This leads to principal coordinates analysis (PCoA), which attempts to make the (Euclidean) distance between any pair of points proportional to the sample distance. We have great flexibility in the choice of distance (or similarity) measure, and different people have different favorites. In marine ecology, the Bray-Curtis distance is now the default choice. The Bray-Curtis distance between samples j and k is defined as follows (the sums indexed by i go over all taxa):

$$d_{jk} = \frac{\sum_i |x_{ji} - x_{ki}|}{\sum_i (x_{ji} + x_{ki})}$$

For binary (presence-absence) data, the Dice similarity is a good start. It puts more weight on joint occurrences than on mismatches. When comparing two samples, a match is counted for every taxon present in both samples. Using M for the number of matches and N for the total number of taxa present in just one of the two samples, the Dice similarity is

$$S_{jk} = \frac{2M}{2M + N}$$
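A minimal Python/NumPy sketch of the two measures, following the formulas above; the function names and example samples are hypothetical. Since Past offers Bray-Curtis as a similarity index, note that taking the similarity as 1 minus the distance is the usual convention, though it is worth confirming in the program's documentation.

```python
import numpy as np


def bray_curtis(xj, xk):
    """Bray-Curtis distance: sum_i |x_ji - x_ki| / sum_i (x_ji + x_ki)."""
    xj = np.asarray(xj, dtype=float)
    xk = np.asarray(xk, dtype=float)
    return float(np.abs(xj - xk).sum() / (xj + xk).sum())


def dice_similarity(xj, xk):
    """Dice similarity 2M / (2M + N) for presence-absence data.

    M = number of taxa present in both samples,
    N = number of taxa present in only one of the two samples.
    Any abundance > 0 counts as a presence. Raises ZeroDivisionError
    if neither sample contains any taxa.
    """
    a = np.asarray(xj) > 0
    b = np.asarray(xk) > 0
    m = int(np.sum(a & b))   # joint presences
    n = int(np.sum(a ^ b))   # presences in only one sample
    return 2 * m / (2 * m + n)


sample_a = [10, 5, 0, 1]  # hypothetical abundances of four taxa
sample_b = [8, 6, 1, 1]
print(bray_curtis(sample_a, sample_b))      # 0.125 (= 4 / 32)
print(dice_similarity(sample_a, sample_b))  # 6 / 7, about 0.857
```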

Select all, and run Principal coordinates from the Multivar menu. Then select the Bray-Curtis similarity index. As in correspondence analysis, each ordination axis has an associated eigenvalue. The first two axes explain 69% of the total variation, which is again impressive considering the dimensionality reduction. Click the View scatter button to see the ordination results. The samples are clearly placed along a depth gradient, but form a large arch instead of a straight line. The order of samples along the gradient is further emphasized by plotting the minimal spanning tree, which is a set of lines connecting all the dots so that the total length is as small as possible, measured with the selected index (Bray-Curtis) in the full-dimensional data set. We see that except for the 1-5m sample, the depth gradient is captured perfectly.
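PCoA is classical (metric) scaling of a distance matrix, and a minimal spanning tree can be found with Prim's algorithm. The sketch below shows both in Python/NumPy on a hypothetical table with Bray-Curtis distances. It illustrates the techniques rather than reproducing Past's code, and it simply discards the negative eigenvalues that non-Euclidean measures such as Bray-Curtis can produce.

```python
import numpy as np


def pcoa(d):
    """Principal coordinates analysis (classical scaling) of a distance matrix d.

    Returns all eigenvalues (largest first) and point coordinates on the axes
    with positive eigenvalues.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    b = -0.5 * centre @ (d**2) @ centre       # double-centred squared distances
    eig, vec = np.linalg.eigh(b)               # b is symmetric
    order = np.argsort(eig)[::-1]
    eig, vec = eig[order], vec[:, order]
    keep = eig > 1e-10 * eig[0]                # drop zero and negative eigenvalues
    return eig, vec[:, keep] * np.sqrt(eig[keep])


def minimal_spanning_tree(d):
    """Prim's algorithm: list of edges (i, j) of a minimal spanning tree of d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()                 # shortest known link from the tree to each point
    parent = np.zeros(n, dtype=int)    # tree point at the other end of that link
    edges = []
    for _ in range(n - 1):
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[k]), k))
        in_tree[k] = True
        closer = d[k] < best
        best = np.where(closer, d[k], best)
        parent = np.where(closer, k, parent)
    return edges


# Hypothetical abundances: 5 samples x 4 taxa, then pairwise Bray-Curtis distances
x = np.array([[20, 5, 1, 0],
              [15, 8, 2, 1],
              [6, 10, 6, 3],
              [2, 6, 10, 8],
              [0, 2, 8, 14]], dtype=float)
num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
dist = num / den

eig, coords = pcoa(dist)
positive = eig[eig > 0]
# One common convention: percentages relative to the sum of positive eigenvalues
print("% explained by axes 1-2:", 100 * positive[:2] / positive.sum())
print("axis 1-2 coordinates:", coords[:, :2])
print("MST edges:", minimal_spanning_tree(dist))
```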

Non-metric multidimensional scaling

PCoA is now relatively little used compared with a conceptually similar method called non-metric multidimensional scaling (NMDS). This method attempts to place the points in a two- or three-dimensional coordinate system such that the rank order of the distances is preserved. For example, if the original distance between points 4 and 7 is the ninth largest of all distances between any two points, points 4 and 7 will ideally be placed such that their Euclidean distance in the 2D plane or 3D space is still the ninth largest. NMDS intentionally does not take absolute distances into account. It usually performs better than PCoA. Because there is no closed algebraic solution to this problem, the computer must proceed iteratively, by trial and error. The program may converge on a different solution in each run, depending on the random initial conditions. Each run is actually a sequence of 11 trials, from which the best one is chosen. One of these trials uses PCoA as the initial condition. Select all, and run Non-metric MDS from the Multivar menu. You are asked to select a similarity measure; try Bray-Curtis. The Shepard plot of obtained versus observed (target) ranks indicates the quality of the result. Ideally, all points should lie on a straight ascending line (x = y). The stress value should be small: at least less than 0.2, and ideally less than 0.1. In this case the result is excellent (stress well below 0.1), showing that the reduction to two dimensions implies very little loss of information.

Canonical correspondence analysis

If we have independent measurements of environmental variables (temperature, pH, substrate type, etc.), it is possible to constrain the analysis so that the ordination axes are linear combinations of these variables. This can give a precise visualization of how the environment controls the faunal gradients. Canonical correspondence analysis (CCA) is one such method.

[Figure: CCA triplot of taxa sp1 to sp9, numbered samples, and the environmental variables depth, sand, coral and other shown as vectors]

In the example above, three different sets of items are shown in the same plot (a triplot). The taxa are sp1 to sp9. The samples are numbered 1-10. The environmental variables (depth and substrate type) are shown as lines (vectors) from the origin. For example, samples 1-3 are sandy, shallow, and characterized by taxa sp8 and sp9. CCA is now very popular in ecology, but less often used in paleontology because we do not have independent environmental information (on the contrary, the task is often to reconstruct the environment from fossil data).
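One standard way to compute CCA is to take the standardized residual matrix used in CA, replace it by its weighted least-squares fit on the environmental variables, and run a singular value decomposition on the fitted values, so that the axes are forced to be linear combinations of those variables. The Python/NumPy sketch below follows that recipe under simplifying assumptions (numeric environmental variables only, no biplot arrows, hypothetical data); it is not Past's implementation.

```python
import numpy as np


def cca(y, env):
    """Minimal canonical correspondence analysis.

    y   : samples x taxa abundance table
    env : samples x variables table of numeric environmental measurements
          (a variable that is constant across samples would cause division by zero)
    Returns the constrained eigenvalues, the total inertia of y, and sample
    scores that are linear combinations of the environmental variables.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(env, dtype=float)
    p = y / y.sum()
    r = p.sum(axis=1)                                       # sample weights
    c = p.sum(axis=0)                                       # taxon weights
    resid = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # CA residual matrix
    # Centre and scale the environmental variables using the sample weights
    xc = x - r @ x
    xs = xc / np.sqrt(r @ xc**2)
    xw = np.sqrt(r)[:, None] * xs
    # Least-squares fit of the residuals on the weighted variables (projection)
    coef, *_ = np.linalg.lstsq(xw, resid, rcond=None)
    u, sv, _ = np.linalg.svd(xw @ coef, full_matrices=False)
    k = min(x.shape[1], min(y.shape) - 1)                   # number of constrained axes
    eig = sv[:k] ** 2
    lc_scores = u[:, :k] / np.sqrt(r)[:, None]
    return eig, float(np.sum(resid**2)), lc_scores


# Hypothetical data: 6 samples x 5 taxa, with depth (m) and sand fraction per sample
y = [[12, 8, 1, 0, 0],
     [10, 9, 2, 1, 0],
     [4, 6, 7, 2, 1],
     [1, 3, 9, 6, 2],
     [0, 1, 5, 9, 6],
     [0, 0, 2, 7, 11]]
env = [[5, 0.9],
       [10, 0.8],
       [25, 0.6],
       [40, 0.5],
       [60, 0.2],
       [80, 0.1]]

eig, total, scores = cca(y, env)
print("constrained eigenvalues:", eig)
print("share of total inertia:", eig.sum() / total)
print("sample scores (axes 1-2):", scores)
```

The printed share is the fraction of the faunal variation (inertia) accounted for by the environmental variables; the rest remains unconstrained.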