Mantel Permutation Tests


PERMUTATION TESTS

Basic Idea: In some experiments, a test of treatment effects is of interest where the null hypothesis is that the different populations are actually the same population. In other tests, the null hypothesis is one of complete randomness.

Example 1: ANOVA, where $H_0$ is that the treatment means are all equal. The test assumes that each treatment has the same variance and the same distributional shape. If the null hypothesis is in fact true, then the observations are not distinguishable by treatment; they come from the same distribution (one shape, mean, and variance) and just happen to be randomly associated with a treatment.

[Table: the original dataset as collected, shown next to one permuted dataset. Each lists Sample ID, the population columns, and the Mean. The data values were not preserved in this transcription.]

ALS5932/FOR6934 Fall Mary C. Christman
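A minimal sketch in R of what a single permutation looks like, assuming two populations with five observations each. The numbers, the seed, and the names `pop1`, `pop2`, `labels`, and `perm_labels` are invented for illustration, since the values in the table were not preserved.

```r
# Hypothetical data: two populations of five observations each (values invented)
set.seed(1)
pop1 <- c(4.1, 5.3, 6.0, 5.8, 4.9)
pop2 <- c(6.2, 7.1, 5.9, 6.8, 7.4)

values <- c(pop1, pop2)
labels <- rep(c("Pop1", "Pop2"), each = 5)

# Group means for the arrangement that was actually observed
obs_means <- tapply(values, labels, mean)

# One permutation: shuffle the labels while the values stay fixed
perm_labels <- sample(labels)
perm_means <- tapply(values, perm_labels, mean)

print(obs_means)
print(perm_means)
```

Under $H_0$ the shuffled labels are just as plausible as the observed ones, so the permuted group means show the kind of difference that arises by chance alone.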

Permutation tests are based on this idea: if $H_0$ is true, then any set of values is just a random assignment among treatments.

Method, under the assumptions that the distributions are identical under $H_0$, that sampling is random and with replacement, and that treatment assignment is random:

1) Calculate the test statistic for the hypotheses for the original observed arrangement of the data. This could be a sample correlation, an F statistic, a mean square, or some other statistic. Call it $\kappa_0$.
2) Randomly rearrange the data among the treatments (shuffle or permute the data according to the experimental design; see below for the case of matrices) and calculate the test statistic for the new arrangement. Call it $\kappa_p^*$.
3) Store the permutation estimate $\kappa_p^*$.
4) Repeat steps 2-3 many times. Call the total number of permutations $P$; that is, $p = 1, 2, \ldots, P$.
5) Compare $\kappa_0$ to the distribution of the permutation estimates $\kappa_p^*$. The p-value for the test is

   $$p\text{-value} = \frac{\#(\kappa_p^* \geq \kappa_0)}{P}.$$

Example: The most famous use of permutation tests for ecological problems is Mantel's test of the similarity of two symmetric matrices. Mantel's test was extended to allow more than 2 matrices by Smouse et al. We'll look at the simple case (2 matrices).

Mantel's test is a test of the correlation between the elements of one matrix and the elements of the other matrix, where the elements within the matrices are organized in a very specific way (symmetric, with zeroes on the diagonal). The original use was to compare two distance matrices, and that is still the most common use today.

STA 6934 Spring Mary C. Christman
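Returning to the five-step method listed above, the following is a minimal R sketch for a one-way layout, using the ANOVA F statistic as the test statistic. The simulated data, the helper `f_stat`, and the choice of P = 999 are assumptions made for illustration; they are not part of the original notes.

```r
# Hypothetical one-way layout: 3 treatments x 6 replicates (values simulated)
set.seed(42)
treatment <- factor(rep(c("A", "B", "C"), each = 6))
response <- rnorm(18, mean = rep(c(10, 10, 12), each = 6), sd = 1)

# Step 1: test statistic for the observed arrangement (here the ANOVA F statistic)
f_stat <- function(y, g) anova(lm(y ~ g))[1, "F value"]
kappa0 <- f_stat(response, treatment)

# Steps 2-4: shuffle the responses among treatments P times and store each statistic
P <- 999
kappa_perm <- numeric(P)
for (p in seq_len(P)) {
  kappa_perm[p] <- f_stat(sample(response), treatment)
}

# Step 5: proportion of permutation statistics at least as large as the observed one
p_value <- sum(kappa_perm >= kappa0) / P
print(c(kappa0 = kappa0, p_value = p_value))
```

Any other statistic can be substituted by swapping out `f_stat`; only the shuffling step has to respect the experimental design.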

Matrix Y          Matrix X

a  b  c           α  β  χ
b  d  e           β  δ  ε
c  e  f           χ  ε  φ

Question: Are the element-wise pairs (a, α), (b, β), (c, χ), (d, δ), (e, ε), (f, φ) correlated? Can we use Pearson's correlation coefficient to test that?

Recall that Pearson's correlation assumes that 1) the variables are quantitative, and 2) if there is a relationship between the 2 variables, that relationship is linear.

Now, most matrices are not exactly as shown above. More specifically, the matrices usually hold distance measures, where distance is some metric between the replicates involved in the study. For example, matrix Y could be the number of genes not in common between sampled animals in a study, and matrix X could be the Euclidean distance between the locations at which the animals were found. The distance between a replicate and itself is 0, and the distances are symmetric in the sense that the distance between F and H is the same as the distance between H and F. So commonly we have matrices with the structure

            Y                          X
animal   1  2  3          animal   1  2  3
  1      0  b  c            1      0  β  χ
  2      b  0  e            2      β  0  ε
  3      c  e  0            3      χ  ε  0

where b = # genes not in common between animals 1 and 2, β = geographic distance between animals 1 and 2, etc. We only need to test (b, β), (c, χ), (e, ε) for correlation.
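As a small illustration of pairing up only the off-diagonal elements, here is an R sketch with two invented 3 x 3 distance matrices. The numbers and the names `y_pairs`, `x_pairs`, and `r_obs` are hypothetical; `lower.tri()` is the base R helper that selects the entries below the diagonal.

```r
# Hypothetical 3 x 3 distance matrices (values invented for illustration)
Y <- matrix(c(0, 2, 5,
              2, 0, 4,
              5, 4, 0), nrow = 3, byrow = TRUE)
X <- matrix(c(0.0, 1.5, 3.2,
              1.5, 0.0, 2.1,
              3.2, 2.1, 0.0), nrow = 3, byrow = TRUE)

# Keep each pair of animals once: the entries below the diagonal
y_pairs <- Y[lower.tri(Y)]   # (b, c, e) in the notation above
x_pairs <- X[lower.tri(X)]   # (beta, chi, epsilon)

# Pearson correlation between the paired distances
r_obs <- cor(y_pairs, x_pairs)
print(r_obs)
```

With n animals there are n(n-1)/2 such pairs, so three animals give only three pairs.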

Because the same individuals are used repeatedly in generating the distances in the matrices, the values within each matrix are also correlated among themselves. As a consequence, the usual method for testing Pearson's correlation coefficient would involve an estimated standard error that is biased low for the true standard deviation of the estimator of the correlation. This means we shouldn't use the usual large-sample test based on Normality.

Use a permutation test!

Example: Copepods in Ceiling Drips in Organ Cave, West Virginia


Partial Code for Testing Correlation

    # Title = "Organ Cave Ceiling Drips"
    matrixsize <- 13

    # Y matrix: matrix of dissimilarities (1 - Jaccard index)
    Jaccard <- matrix(c(
      0.00, 0.83, 0.80, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.90, 1.00,
      0.83, 0.00, 0.43, 0.43, 0.44, 1.00, 0.67, 0.86, 0.62, 1.00, 0.67, 0.55, 0.08,
      0.80, 0.43, 0.00, 0.33, 0.37, 1.00, 0.60, 0.83, 0.57, 1.00, 0.43, 0.50, 0.40,
      1.00, 0.43, 0.33, 0.00, 0.56, 1.00, 0.60, 0.83, 0.33, 1.00, 0.43, 0.66, 0.40,
      0.87, 0.44, 0.37, 0.56, 0.00, 0.87, 0.75, 0.89, 0.70, 0.87, 0.44, 0.20, 0.62,
      1.00, 1.00, 1.00, 1.00, 0.87, 0.00, 1.00, 1.00, 1.00, 1.00, 0.83, 0.90, 1.00,
      1.00, 0.67, 0.60, 0.60, 0.75, 1.00, 0.00, 0.67, 0.60, 1.00, 0.67, 0.80, 0.75,
      1.00, 0.86, 0.83, 0.83, 0.89, 1.00, 0.67, 0.00, 0.83, 1.00, 0.86, 0.91, 1.00,
      1.00, 0.62, 0.57, 0.33, 0.70, 1.00, 0.60, 0.83, 0.00, 1.00, 0.62, 0.64, 0.67,
      1.00, 1.00, 1.00, 1.00, 0.87, 1.00, 1.00, 1.00, 1.00, 0.00, 1.00, 0.00, 1.00,
      1.00, 0.67, 0.43, 0.43, 0.44, 0.83, 0.67, 0.86, 0.62, 1.00, 0.00, 0.55, 0.50,
      0.90, 0.55, 0.50, 0.66, 0.20, 0.90, 0.80, 0.91, 0.64, 0.00, 0.55, 0.00, 0.70,
      1.00, 0.08, 0.40, 0.40, 0.62, 1.00, 0.75, 1.00, 0.67, 1.00, 0.50, 0.70, 0.00),
      nrow = matrixsize, byrow = TRUE)

    # X1 matrix (log distance)
    logdist <- matrix(c(
      0.000, 0.556, 0.607, 0.653, 0.708, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.556, 0.000, 0.161, 0.279, 0.398, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.607, 0.161, 0.000, 0.161, 0.312, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.653, 0.279, 0.161, 0.000, 0.204, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      0.708, 0.398, 0.312, 0.204, 0.000, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
      3.097, 3.097, 3.097, 3.097, 3.097, 0.000, 1.959, 1.959, 1.959, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.000, 0.886, 0.896, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.886, 0.000, 0.072, 1.820, 1.820, 1.820, 1.820,
      3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.896, 0.072, 0.000, 1.820, 1.820, 1.820, 1.820,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 0.000, 1.390, 1.405, 1.412,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.390, 0.000, 0.270, 0.356,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.405, 0.270, 0.000, 0.149,
      3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.412, 0.356, 0.149, 0.000),
      nrow = matrixsize, byrow = TRUE)

To test whether matrices Y and X1 are correlated, I need to permute one of the matrices repeatedly and then test the original correlation estimate against the distribution of correlations for the permuted matrices. So, permute Y by randomly rearranging the columns and then arranging the rows to match the random rearrangement of the columns:

    > A <- matrix(c(11,12,13,21,22,23,31,32,33), byrow=TRUE, nrow=3)
    > A
         [,1] [,2] [,3]
    [1,]   11   12   13
    [2,]   21   22   23
    [3,]   31   32   33

    > temp <- sample(3)
    > temp
    > Aperm <- A[, temp]       # rearrange the columns
    > Aperm
    > Aperm <- Aperm[temp, ]   # rearrange the rows to match
    > Aperm
    > Aperm <- A[temp, temp]   # both steps at once; preserves the symmetry of the matrix
    > Aperm

[The printed values of temp and of each Aperm were not preserved in this transcription.]

Then do the permutations and get the resulting set of correlations. Compare the original correlation against the correlations for the permuted pairs.

$H_0$: the two variables are not correlated
$H_A$: the two variables are positively correlated

The p-value of the one-sided test is the proportion of permutation correlation estimates that are greater than or equal to the original correlation estimate.

    # simple Mantel test of Jaccard and log(distance), ignoring system effects
    # observed correlation between X and Y

    Jvector <- as.vector(Jaccard)
    X1vector <- as.vector(logdist)
    obs.corr <- cor(Jvector, X1vector)

    numPermutes <- [value not preserved in the transcription]   # 13! = 6,227,020,800 possible arrangements
    permuted.corr <- rep(0, numPermutes)
    permuted.corr[1] <- obs.corr                 # the observed arrangement counts as one permutation
    for (i in 2:numPermutes) {
      temp <- sample(matrixsize)                 # random reordering of 1..13
      permuted.jaccard <- Jaccard[temp, temp]    # permute rows and columns together
      Jvector <- as.vector(permuted.jaccard)
      permuted.corr[i] <- cor(Jvector, X1vector)
    }
    pvalue <- sum(permuted.corr >= obs.corr) / numPermutes

[Figure: frequency distribution of Pearson's r from the permutations (x-axis: permuted.corr), annotated with the original-data correlation r and the permutation p-value. The numeric values were not preserved in this transcription.]

Pearson's correlation assumes that the relationship, if it exists, is linear. Is that the case here?
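The loop above can also be wrapped in a reusable function. The following is a sketch under stated assumptions, not the author's code: the function name `mantel_perm`, its arguments, and the toy data are all invented for illustration. The `method` argument is passed straight to base R's `cor()`. Counting the observed arrangement in both numerator and denominator mirrors the step `permuted.corr[1] <- obs.corr` in the code above.

```r
# Sketch of a simple one-sided Mantel test; all names here are illustrative
mantel_perm <- function(Y, X, n_perm = 999, method = "pearson") {
  n <- nrow(Y)
  x_vec <- as.vector(X)
  obs <- cor(as.vector(Y), x_vec, method = method)
  perm <- numeric(n_perm)
  for (i in seq_len(n_perm)) {
    idx <- sample(n)   # one shuffle, applied to rows and columns alike
    perm[i] <- cor(as.vector(Y[idx, idx]), x_vec, method = method)
  }
  # The observed arrangement counts as one of the permutations
  p <- (sum(perm >= obs) + 1) / (n_perm + 1)
  list(obs.corr = obs, permuted.corr = perm, pvalue = p)
}

# Toy data (invented): 8 points on a line; Y is a noisy copy of the distances in X
set.seed(123)
pos <- c(0.5, 1.2, 2.9, 4.0, 5.6, 6.1, 8.3, 9.7)
X <- as.matrix(dist(pos))
noise <- matrix(rnorm(64, sd = 0.5), 8, 8)
Y <- X + (noise + t(noise)) / 2   # symmetrize the noise so Y stays symmetric
diag(Y) <- 0

res <- mantel_perm(Y, X, n_perm = 499)
print(res$obs.corr)
print(res$pvalue)
hist(res$permuted.corr, main = "Permutation distribution of r")
```

Like the code above, this sketch correlates the full vectorized matrices, so the diagonal zeroes and both triangles are included in every correlation.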

I reran the test using Spearman's correlation coefficient instead: change cor(Jvector, X1vector) to cor(rank(Jvector), rank(X1vector)) and rerun the above code.

[Output and figure: obs.corr, pvalue, and the frequency distribution of permuted.corr for the Spearman version. The numeric values were not preserved in this transcription.]

The best method (not shown) is to incorporate a second variable that distinguishes the two regions from each other, following the method outlined in Smouse et al. These are sometimes called partial Mantel tests or multiple regression Mantel tests.

Mantel Correlogram

In order to study the structure of the Y matrix (usually the one of interest) with respect to distances in the other matrix, it is of interest to look at the correlation among values of Y for specific sets of distances in X. This is a case of looking at AUTOcorrelation among subsets of values within a matrix rather than correlation between two different variables. The correlogram is a graphic displaying the autocorrelation for those different subsets.

For example, suppose I am interested in the autocorrelation among the dissimilarities of the copepods as a function of log(distance). The way to do that is to create a set of non-overlapping distance classes (called lag
distances) and do the autocorrelation of observations that fall within each distance class.

First, I need to create the set of lag distances: (>0 to 1), (>1 to 2), and (>2).

    # Lag distance matrix
    lagdistmatrix <- matrix(c(
      0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 0, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 1, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3,
      1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3, 3,
      3, 3, 3, 3, 3, 0, 2, 2, 2, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 0, 1, 1, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 1, 0, 1, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 1, 1, 0, 2, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 0, 2, 2, 2,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0, 1, 1,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 0, 1,
      3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 0),
      nrow = 13, byrow = TRUE)

Then, for each lag distance, I need to create another matrix of 0s and 1s, where a 0 indicates that the distance is within the lag class and a 1 indicates that it is not. Now perform Mantel's test on this indicator matrix and Y. Repeat until all lag classes have been done.

    # For example: Lag 1 matrix
    lagdistmatrix1 <- matrix(c(
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
      nrow = 13, byrow = TRUE)

    # Lag distance 2 matrix
    lagdistmatrix2 <- matrix(c(
      0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1,
      1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0),
      nrow = 13, byrow = TRUE)

    # Lag distance 3 matrix
    lagdistmatrix3 <- matrix(c(
      0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
      0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0),
      nrow = 13, byrow = TRUE)

Run Mantel's test on each lag distance matrix and Y. We obtain the following results:

[Table: Lag, Observed Correlation, 2-sided p-value. The values were not preserved in this transcription.]

Very positive and very negative values indicate that the farther apart the locations are from one another, the more dissimilar the species composition (as measured by 1 - J).
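The correlogram procedure can be sketched end to end in R. Everything in this block is an assumption made for illustration: the positions, the class breaks (0, 3, 6), the names, and in particular the two-sided p-value, which is computed here by comparing absolute correlations because the notes do not say how the two-sided value was obtained. The 0/1 coding follows the notes (0 = pair falls within the lag class).

```r
# Sketch of a Mantel correlogram; data, class breaks, and names are invented
set.seed(7)
pos <- c(0.2, 0.9, 1.7, 2.4, 3.8, 4.6, 5.5, 6.9, 7.7, 8.8)
n <- length(pos)
D <- as.matrix(dist(pos))                       # pairwise distances
noise <- matrix(rnorm(n * n, sd = 0.5), n, n)
Y <- D + (noise + t(noise)) / 2                 # dissimilarity that grows with distance
diag(Y) <- 0

# Assign every pair to a lag class: (0,3], (3,6], (6,Inf)
lag_class <- matrix(as.integer(cut(as.vector(D), breaks = c(0, 3, 6, Inf))), n, n)
lag_class[is.na(lag_class)] <- 0                # diagonal (distance 0) belongs to no class

n_perm <- 499
correlogram <- data.frame(lag = 1:3, obs.corr = NA_real_, pvalue = NA_real_)
for (k in 1:3) {
  # Indicator matrix: 0 = pair is in lag class k, 1 = otherwise, 0 on the diagonal
  indicator <- ifelse(lag_class == k, 0, 1)
  diag(indicator) <- 0
  x_vec <- as.vector(indicator)
  obs <- cor(as.vector(Y), x_vec)
  perm <- replicate(n_perm, {
    idx <- sample(n)                            # same shuffle for rows and columns
    cor(as.vector(Y[idx, idx]), x_vec)
  })
  correlogram$obs.corr[k] <- obs
  # Two-sided p-value via absolute values (an assumption, see the lead-in)
  correlogram$pvalue[k] <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)
}
print(correlogram)
plot(correlogram$lag, correlogram$obs.corr, type = "b",
     xlab = "Lag class", ylab = "Mantel correlation")
```

With this coding, a positive correlation for a lag class means that pairs inside the class tend to have smaller dissimilarity than pairs outside it, and a negative correlation means the reverse.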