Mantel Permutation Tests

PERMUTATION TESTS
Mary C. Christman (ALS5932/FOR6934, Fall 2006; STA 6934, Spring 2007)

Basic Idea: In some experiments, a test of treatment effects is of interest where the null hypothesis is that the different populations are actually samples from the same population. In other tests, the null hypothesis is one of complete randomness.

Example 1: ANOVA, where H_0 is that the treatment means are all equal. The required assumptions are that each treatment has the same variance and the same distributional shape. If the null hypothesis is true, then the observations are not distinguishable by treatment: they come from a single distribution (one shape, mean, and variance) and just happen to be randomly associated with a treatment.

Original dataset collected:

Sample ID   Pop 1   Pop 2
    1         7      12
    2         0       1
    3         6       5
    4         2       6
    5         3       5
    6         4       3
    7         7       3
    8         6       4
    9         5       7
  Mean      4.44    5.11

Permuted data (the same 18 values reshuffled between the two populations):

Sample ID   Pop 1   Pop 2
    1         5       7
    2         0       7
    3         3       5
    4         2       4
    5         3      12
    6         5       1
    7         7       6
    8         6       4
    9         6       3
  Mean      4.11    5.44
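As a concrete illustration, the permuted table above can be produced in R by pooling the 18 observations and randomly reassigning them to the two populations. This is a minimal sketch I have added (the object names pop1 and pop2 and the use of the means are my choices, not from the notes):

# Original data from the table above
pop1 <- c(7, 0, 6, 2, 3, 4, 7, 6, 5)
pop2 <- c(12, 1, 5, 6, 5, 3, 3, 4, 7)

# Under H0 the 18 values are exchangeable: pool them and reshuffle
pooled <- c(pop1, pop2)
shuffled <- sample(pooled)   # a random permutation of all 18 values
perm1 <- shuffled[1:9]       # first 9 values become "Pop 1"
perm2 <- shuffled[10:18]     # remaining 9 values become "Pop 2"
mean(perm1); mean(perm2)     # one permuted pair of means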

Permutation tests are based on this idea: if H_0 is true, then any assignment of the observed values among treatments is equally likely.

Method (under the assumptions that the distributions are identical under H_0, sampling is random and with replacement, and treatment assignment is random):

1) Calculate the test statistic for the hypotheses for the original observed arrangement of the data. This could be a sample correlation, an F-statistic, a mean square (MS), or some other statistic. Call it κ_0.

2) Randomly rearrange the data among the treatments (shuffle or permute the data according to the experimental design; see below for the case of matrices) and calculate the test statistic for the new arrangement. Call it κ*_p.

3) Store the permutation estimate κ*_p.

4) Repeat steps 2-3 many times. Call the total number of permutations P; that is, p = 1, 2, ..., P.

5) Compare κ_0 to the distribution of the permutation estimates κ*_p. The p-value for the test is

   p-value = #(κ*_p ≥ κ_0) / P.

Example: The most famous use of permutation tests for ecological problems is Mantel's test of similarity of two symmetric matrices. Mantel's test was extended to allow more than 2 matrices by Smouse et al. (1986). We'll look at the simple case (2 matrices). Mantel's test is a test of the correlation between the elements of one matrix and the elements of the other matrix, where the elements within the matrices have been organized in a very specific way (symmetric, with zeroes on the diagonal). The original use was to compare two distance matrices, and that is still the most common use today.
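Steps 1-5 translate directly into a short R function. This is a minimal sketch I have added (the name perm.test, the mean-difference statistic, and P = 999 are illustrative choices, not part of the original notes); it permutes group labels, which is appropriate for a one-way layout:

# Generic one-sided permutation test for a grouped design.
# stat(values, groups) computes the test statistic kappa.
perm.test <- function(values, groups, stat, P = 999) {
  k0 <- stat(values, groups)                # step 1: observed statistic
  kp <- numeric(P)
  for (p in 1:P) {                          # steps 2-4: P shuffles
    kp[p] <- stat(values, sample(groups))   # permute the group labels
  }
  sum(kp >= k0) / P                         # step 5: permutation p-value
}

# Example: difference in means for the two-population data above
mean.diff <- function(v, g) mean(v[g == 2]) - mean(v[g == 1])
perm.test(c(pop1, pop2), rep(1:2, each = 9), mean.diff)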

Matrix Y        Matrix X

a b c           α β χ
b d e           β δ ε
c e f           χ ε φ

Question: Are the element-wise pairs, (a, α), (b, β), (c, χ), (d, δ), (e, ε), (f, φ), correlated? Can we use Pearson's correlation coefficient to test that? Recall that Pearson's correlation assumes that 1) the variables are quantitative, and 2) if there is a relationship between the two variables, that relationship is linear.

Now, most matrices are not exactly as just shown above. More specifically, the matrices are usually distance matrices, where distance is some metric between the replicates involved in the study. For example, matrix Y could be the number of genes not in common between sampled animals in a study, and matrix X could be the Euclidean distance between the locations at which the animals were found. The distance between a replicate and itself is 0, and the distances are symmetric in the sense that the distance between F and H is the same as the distance between H and F. So commonly we have matrices with the structure

          Y                       X
animal   1  2  3       animal   1  2  3
  1      0  b  c         1      0  β  χ
  2      b  0  e         2      β  0  ε
  3      c  e  0         3      χ  ε  0

where b = # genes not in common between animals 1 and 2, β = geographic distance between animals 1 and 2, etc. We only need to test for correlation among the pairs (b, β), (c, χ), (e, ε).
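Since the diagonal is zero and each distance appears twice, only one triangle of each matrix carries information. Here is a small sketch I have added showing how to pull out those pairs in base R (the numeric values in Y and X are invented; only the structure matches the example above):

# Toy symmetric distance matrices (values made up for illustration)
Y <- matrix(c(0, 2, 5,
              2, 0, 3,
              5, 3, 0), nrow = 3, byrow = TRUE)
X <- matrix(c(0.0, 1.1, 4.2,
              1.1, 0.0, 2.7,
              4.2, 2.7, 0.0), nrow = 3, byrow = TRUE)

# lower.tri() picks out one copy of each distance: (b, c, e) and (beta, chi, epsilon)
y <- Y[lower.tri(Y)]
x <- X[lower.tri(X)]
cor(y, x)   # Pearson's r for the element-wise pairs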

Because the same individuals are used repeatedly in generating the distances in the matrices, the values within each matrix are also correlated among themselves. As a consequence, the usual method for testing Pearson's correlation coefficient would involve an estimated standard error that is biased low for the true standard deviation of the correlation estimator. This means we shouldn't use the usual large-sample test based on normality. Use a permutation test!

Example: Copepods in Ceiling Drips in Organ Cave, West Virginia
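The inflated type I error rate is easy to see in a simulation. This sketch is my own illustration, not from the notes: distance matrices built from two independent sets of random coordinates have no true association, yet the ordinary Pearson test, which treats the 78 pairwise distances as independent observations, tends to reject H_0 more often than its nominal 5% level:

set.seed(1)
nrej <- 0
for (s in 1:1000) {
  p1 <- matrix(runif(2 * 13), ncol = 2)   # 13 random points, set 1
  p2 <- matrix(runif(2 * 13), ncol = 2)   # 13 independent points, set 2
  d1 <- as.vector(dist(p1))               # 78 pairwise distances, set 1
  d2 <- as.vector(dist(p2))               # 78 pairwise distances, set 2
  if (cor.test(d1, d2)$p.value < 0.05) nrej <- nrej + 1
}
nrej / 1000   # typically noticeably above 0.05: the pairs are not independent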

# Title="Organ Cave Ceiling Drips" Partial Code for Testing Correlation Matrixsize= 13 #Y matrix matrix of dissimilarities (1-Jaccard Index) Jaccard <- matrix(c( 0.00, 0.83, 0.80, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 1.00, 0.83, 0.00, 0.43, 0.43, 0.44, 1.00, 0.67, 0.86, 0.62, 1.00, 0.67, 0.55, 0.08, 0.80, 0.43, 0.00, 0.33, 0.37, 1.00, 0.60, 0.83, 0.57, 1.00, 0.43, 0.50, 0.40, 1.00, 0.43, 0.33, 0.00, 0.56, 1.00, 0.60, 0.83, 0.33, 1.00, 0.43, 0.66, 0.40, 0.87, 0.44, 0.37, 0.56, 0.00, 0.87, 0.75, 0.89, 0.70, 0.87, 0.44, 0.20, 0.62, 1.00, 1.00, 1.00, 1.00, 0.87, 0.00, 1.00, 1.00, 1.00, 1.00, 0.83, 0.90, 1.00, 1.00, 0.67, 0.60, 0.60, 0.75, 1.00, 0.00, 0.67, 0.60, 1.00, 0.67, 0.80, 0.75, 1.00, 0.86, 0.83, 0.83, 0.89, 1.00, 0.67, 0.00, 0.83, 1.00, 0.86, 0.91, 1.00, 1.00, 0.62, 0.57, 0.33, 0.70, 1.00, 0.60, 0.83, 0.00, 1.00, 0.62, 0.64, 0.67, 1.00, 1.00, 1.00, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 0.00, 1.00, 0.00, 1.00, 1.00, 0.67, 0.43, 0.43, 0.44, 0.83, 0.67, 0.86, 0.62, 1.00, 0.00, 0.55, 0.50, 0.90, 0.55, 0.50, 0.66, 0.20, 0.90, 0.80, 0.91, 0.64, 0.00, 0.55, 0.00, 0.70, 1.00, 0.08, 0.40, 0.40, 0.62, 1.00, 0.75, 1.00, 0.67, 1.00, 0.50, 0.70, 0.00), #X1 matrix logdist=matrix(c( 0.00, 0.556, 0.607, 0.653, 0.708, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076, 0.556, 0.00, 0.161, 0.279, 0.398, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076, 0.607, 0.161, 0.00, 0.161, 0.312, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076, 0.653, 0.279, 0.161, 0.000, 0.204, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076, 0.708, 0.398, 0.312, 0.204, 0.000, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076, 3.097, 3.097, 3.097, 3.097, 3.097, 0.000, 1.959, 1.959, 1.959, 1.820, 1.820, 1.820, 1.820, 3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.000, 0.886, 0.896, 1.820, 1.820, 1.820, 1.820, 3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.886, 0.000, 0.072, 1.820, 1.820, 1.820, 1.820, 3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.896, 0.072, 0.000, 1.820, 1.820, 1.820, 1.820, 3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 0.000, 1.390, 1.405, 1.412, 3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.390, 0.000, 0.270, 0.356, 3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.405, 0.270, 0.000, 0.149, 3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.412, 0.356, 0.149, 0.000), To test if matrices Y and X1 are correlated, I need to permute one of the matrices repeatedly and then test the original correlation estimate against the distribution of correlations for the permuted matrices. So, permute Y by randomly rearranging the columns and then arranging the rows to match the random rearrangement of the columns: A <- matrix(c(11,12,13,21,22,23,31,32,33), byrow=t, nrow=3) >A [,1] [,2] [,3] [1,] 11 12 13 [2,] 21 22 23 [3,] 31 32 33 STA 6934 Spring 2007 6 Mary C. Christman

temp <- sample(3)
> temp
[1] 2 3 1

Aperm <- A[, temp]
> Aperm
     [,1] [,2] [,3]
[1,]   12   13   11
[2,]   22   23   21
[3,]   32   33   31

Aperm <- Aperm[temp, ]
> Aperm
     [,1] [,2] [,3]
[1,]   22   23   21
[2,]   32   33   31
[3,]   12   13   11

Aperm <- A[temp, temp]   # same result in one step: preserves the symmetry of the matrix
> Aperm
     [,1] [,2] [,3]
[1,]   22   23   21
[2,]   32   33   31
[3,]   12   13   11

Then do the permutations and get the resulting set of correlations, and compare the original correlation against the permuted correlations.

H_0: the two variables are not correlated
H_A: the two variables are positively correlated

The p-value of the one-sided test is the proportion of permutation correlation estimates greater than or equal to the original correlation estimate.

# Simple Mantel test of Jaccard and log(distance), ignoring system effects
# Observed correlation between X and Y

Jvector <- as.vector(Jaccard)     # note: keeps the diagonal and both triangles;
X1vector <- as.vector(logdist)    # harmless here, since every arrangement is treated the same way
obs.corr <- cor(Jvector, X1vector)

numPermutes <- 5000   # 13! = 6,227,020,800 possible arrangements
permuted.corr <- rep(0, numPermutes)
permuted.corr[1] <- obs.corr
for (i in 2:numPermutes) {
  temp <- sample(Matrixsize)
  permuted.jaccard <- Jaccard[temp, temp]   # permute rows and columns together
  Jvector <- as.vector(permuted.jaccard)
  permuted.corr[i] <- cor(Jvector, X1vector)
}
pvalue <- sum(permuted.corr >= obs.corr) / numPermutes

[Figure: frequency distribution of Pearson's r from the permutations; original data correlation r = 0.4062; permutation p-value = 0.0652]

Pearson's correlation assumes that the relationship, if it exists, is linear. Is that the case here?
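As a cross-check (my addition, not part of the original notes), the same test is available as mantel() in the vegan package. Note that vegan correlates only one triangle of each matrix, so its r will differ slightly from obs.corr above, which was computed on the full vectorized matrices including the diagonal zeroes:

# Cross-check with vegan's mantel(), assuming the package is installed
library(vegan)
m <- mantel(as.dist(logdist), as.dist(Jaccard),
            method = "pearson", permutations = 4999)
m$statistic   # Mantel r; should be close to obs.corr
m$signif      # permutation p-value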

I reran the test using Spearman's correlation coefficient instead: change cor(Jvector, X1vector) to cor(rank(Jvector), rank(X1vector)) and rerun the code above.

> obs.corr
[1] 0.3994808
> pvalue
[1] 0.0482

[Figure: frequency distribution of the permuted Spearman correlations]

The best method (not shown) is to incorporate a second variable that distinguishes the two regions from each other, following the method outlined in Smouse et al. (1986); these are sometimes called partial Mantel tests or multiple regression Mantel tests.

Mantel Correlogram

In order to study the structure of the Y matrix (usually the one of interest) with respect to distances in the other matrix, it is of interest to look at the correlation among values of Y for specific sets of distances in X. This is a case of looking at AUTOcorrelation among subsets of values within a matrix, rather than correlation between two different variables. The correlogram is a graphic displaying the autocorrelation for those different subsets.

For example, suppose I am interested in the autocorrelation among the dissimilarities of the copepods as a function of log(distance). The way to do that is to create a set of non-overlapping distance classes (called lag distances) and compute the autocorrelation of the observations that fall within each distance class.

First, I need to create the set of lag distances: (0, 1], (1, 2], and (> 2).

# Lag distance class matrix (1, 2, 3 = class of each pair; 0 on the diagonal)
lagdistmatrix <- matrix(c(
0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 0, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 1, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 0, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 0, 1, 1, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 1, 0, 1, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 1, 1, 0, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 2, 2, 2, 0, 2, 2, 2,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0, 1, 1,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 0, 1,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 0),
nrow = Matrixsize, byrow = TRUE)

Then, for each lag distance, I need to create another matrix of 0s and 1s, where 0 indicates that the pair falls within the lag class and 1 otherwise. Now perform Mantel's test on the lag indicator matrix and Y, and repeat until all lag classes have been done. (A sketch that builds these indicator matrices programmatically appears after the results below.)

# For example: Lag 1 matrix
lagdistmatrix1 <- matrix(c(
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
nrow = Matrixsize, byrow = TRUE)

# Lag distance 2 matrix
lagdistmatrix2 <- matrix(c(
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0,

1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0),
nrow = Matrixsize, byrow = TRUE)

# Lag distance 3 matrix
lagdistmatrix3 <- matrix(c(
0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0),
nrow = Matrixsize, byrow = TRUE)

Run Mantel's test on each lag distance matrix and Y. We obtain the following results:

Lag   Observed correlation   2-sided p-value
 1         0.4152                0.0004
 2        -0.0000561             0.3444
 3        -0.1931                0.0088

Strongly positive and strongly negative correlations indicate that the farther apart the locations are from one another, the more dissimilar the species composition (as measured by 1 - J).
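The three indicator matrices can also be generated programmatically from logdist, and the per-lag tests run in a loop. This sketch is my addition; it assumes the objects Jaccard, logdist, and Matrixsize from the earlier code are in the workspace, and the helper names lag.matrix and mantel.pvalue are mine:

# 0/1 indicator matrix for one lag class:
# 0 = pair falls within the class, 1 = otherwise (diagonal set to 0)
lag.matrix <- function(D, lower, upper) {
  M <- ifelse(D > lower & D <= upper, 0, 1)
  diag(M) <- 0
  M
}

# Two-sided Mantel permutation p-value for two square matrices
mantel.pvalue <- function(Y, X, P = 5000) {
  obs <- cor(as.vector(Y), as.vector(X))
  perm <- replicate(P - 1, {
    s <- sample(nrow(Y))
    cor(as.vector(Y[s, s]), as.vector(X))
  })
  perm <- c(obs, perm)   # include the observed arrangement, as in the code above
  c(corr = obs, pvalue = sum(abs(perm) >= abs(obs)) / P)
}

breaks <- c(0, 1, 2, Inf)   # lag classes (0, 1], (1, 2], (> 2)
for (k in 1:3) {
  L <- lag.matrix(logdist, breaks[k], breaks[k + 1])
  print(mantel.pvalue(L, Jaccard))
}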