Testing Hypotheses: Topology Comparisons. Last updated 04/09/12


Measures of Support (continued)
- Non-parametric bootstrap
- Bremer support
- Bayesian posterior probabilities
- Likelihood ratio tests (LRT): cannot be used here because the hypotheses (i.e., trees) are not nested
- Approximate LRT (implemented in PhyML)
- Topology comparison tests:
  - Paired-sites tests (parsimony, likelihood)
  - Parametric bootstrap (SOWH test)


Paired-sites tests
The basic question: is tree A significantly better than tree B, or are the differences (in parsimony or likelihood scores) within the expectations of random error?
1. Compare the two trees site by site, using either parsimony or likelihood scores (the number of steps or the likelihood score at each site).
2. Decide whether the difference is significant by comparing it to an assumed distribution (binomial or normal), or by creating a null distribution using bootstrap resampling techniques.

Paired-sites tests: parsimony
Developed by Alan Templeton (1983), originally for restriction-site data.
- Winning-sites test (Prager and Wilson 1988), in PAUP*: a simpler version of Templeton's test. For each site, score which tree is better, so that each site is assigned either + or - (or 0 if the trees are equivalent). Use a binomial distribution to test whether the fraction of + versus - sites is significantly different from 0.5.
- Wilcoxon signed-ranks test (Templeton 1983), in PAUP*: replace the absolute values of the differences by their ranks, then re-apply the signs.
- Kishino-Hasegawa test (1989), in PAUP*: assumes all sites are i.i.d. The test statistic is T = Σ T(i), where T(i) is the difference in the minimum number of substitutions on the two trees at the ith informative site. Under the null hypothesis that the two trees are not significantly different, the expectation of T is zero. The sample variance is estimated from the per-site differences and T is tested with a t-test with n - 1 degrees of freedom (n = number of informative sites). The test is two-tailed if there is no a priori reason to suspect that one tree is better than the other; if there is an a priori reason (such as one tree being the most parsimonious tree for the data), the test is not valid.
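To make the per-site logic concrete, here is a minimal Python sketch of the winning-sites (sign) test and the Templeton (Wilcoxon signed-ranks) test. This is an illustration, not PAUP*'s implementation; the per-site step counts below are made-up stand-ins for the values obtained by scoring each informative site on the two trees.

```python
# Paired-sites tests on per-site parsimony steps (illustrative sketch).
import numpy as np
from scipy import stats

steps_tree1 = np.array([2, 1, 4, 5, 2, 3, 4, 2, 2, 1, 1, 4])  # stand-in data
steps_tree2 = np.array([3, 2, 3, 6, 1, 4, 3, 3, 4, 2, 2, 3])  # stand-in data

diff = steps_tree2 - steps_tree1       # positive = site favors tree 1
informative = diff[diff != 0]          # drop sites where the trees are equivalent

# Winning-sites (sign) test: is the fraction of + vs. - sites different from 0.5?
n_pos = int((informative > 0).sum())
sign_p = stats.binomtest(n_pos, n=informative.size, p=0.5).pvalue

# Templeton (Wilcoxon signed-ranks) test on the same per-site differences
wilcoxon_p = stats.wilcoxon(informative).pvalue

print(f"sign test P = {sign_p:.4f}, Wilcoxon signed-ranks P = {wilcoxon_p:.4f}")
```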

Paired-sites tests: parsimony example
One dataset, two trees:
  Tree 1: 1153 steps (better)
  Tree 2: 1279 steps (worse)
Question: are the two trees significantly different?

Winning sites: PAUP* output

Comparison of tree 1 (best) to tree 2:

                ---- Changes ----                ---- Ranks ----
  Character    Tree 1    Tree 2    Difference    Positive    Negative
  --------------------------------------------------------------------
      7           2         3           1          81.5
      8           1         2           1          81.5
     19           4         3          -1                      -81.5
     25           5         6           1          81.5
     28           2         1          -1                      -81.5
     34           3         4           1          81.5
     43           4         3          -1                      -81.5
      .
     82           2         3           1          81.5
     88           2         4           2         164
     96           1         2           1          81.5
      .
    888           1         2           1          81.5
    890           4         3          -1                      -81.5
  --------------------------------------------------------------------
  Totals                              126        12068        -1793

Winning-sites (sign) test:
  N = 166
  Number of positive differences = 144
  Number of negative differences = 22
  Test statistic = 0.867470
  P < 0.0001*  (normal approximation with z = 9.391421)

Wilcoxon signed-ranks (Templeton) test: PAUP* output

Comparison of tree 1 (best) to tree 2 (same character-by-character table as above).

Templeton (Wilcoxon signed-ranks) test:
  N = 162
  Test statistic (T) = 1467
  P < 0.0001*  (normal approximation with z = -9.899495)

Kishino-Hasegawa test: PAUP* output

Kishino-Hasegawa test:

  Tree    Length    Length diff    s.d.(diff)       t          P*
  ------------------------------------------------------------------
  1        1153      (best)
  2        1279        126          12.02005     10.4825     <0.0001*

  * Probability of getting a more extreme T-value under the null hypothesis
    of no difference between the two trees (two-tailed test). Asterisked
    values in the table (if any) indicate a significant difference at P < 0.05.

Remember: this test is only valid if there is no a priori reason to believe that one tree is better than the other.

Paired-sites tests: likelihood
- Winning-sites test (Prager and Wilson 1988): similar to the parsimony approach, but using the likelihood score at each site.
- z test (and the practically identical t test), in PAUP* (KH test with normal approximation): if the likelihood difference between the two trees is more than 1.96 times its standard error, the two trees are significantly different.
- Wilcoxon signed-ranks test (Templeton 1983).
- Kishino-Hasegawa test (Kishino & Hasegawa 1989):
  - Normal approximation (see the z test above)
  - Bootstrap resampling to create a null distribution:
    - RELL approximation, in PAUP* (KH test, RELL)
    - Full optimization, in PAUP* (KH test, full)
- Shimodaira-Hasegawa test, in PAUP* (RELL and full optimization).
- Approximately unbiased (AU) test (Shimodaira 2002), in CONSEL: similar to SH, but uses a multiscale bootstrap to correct for the bias caused by standard bootstrap resampling.
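The z-test logic above can be written down compactly. The following is a minimal sketch under the usual i.i.d.-sites assumption (an illustration, not PAUP*'s code); the per-site log-likelihood arrays are random stand-ins for the values a likelihood program would report for each tree.

```python
# KH test, normal approximation, from per-site log-likelihoods (sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
site_lnl_tree1 = rng.normal(-6.0, 1.0, size=500)              # stand-in data
site_lnl_tree2 = site_lnl_tree1 - rng.normal(0.0, 0.3, 500)   # stand-in data

d = site_lnl_tree1 - site_lnl_tree2     # per-site lnL differences
n = d.size
delta = d.sum()                         # total lnL difference between the trees
se = np.sqrt(n * d.var(ddof=1))         # SE of the total difference (i.i.d. sites)
z = delta / se

# Two-tailed: the trees differ significantly if |z| > 1.96
p = 2 * stats.norm.sf(abs(z))
print(f"delta lnL = {delta:.2f}, z = {z:.2f}, P = {p:.4f}")
```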

Example Is the ML tree (right) significantly better than the accepted phylogeny (left)? See Goldman et al. 2000

Example: KH bootstrap test, frequency distribution of log-likelihood differences. Observed difference in lnL = 3.9 (P = 0.384). See Goldman et al. 2000.

Example: frequency distribution of log-likelihood differences. KH bootstrap histogram: P = 0.384. KH normal approximation: P = 0.384 (variance = number of sites x variance of the site lnL values).

RELL approximation vs. full optimization
In all paired-sites tests, the sum of the per-site likelihoods is compared among the different trees.
- Full optimization (option in PAUP*): for each bootstrap replicate, branch lengths are re-optimized for each tree being compared.
- RELL (Resampling Estimated Log Likelihoods) approximation (option in PAUP*): for each bootstrap replicate, keep the branch lengths obtained from the original data. The method keeps track of the likelihood at each individual site and then adds these up according to the sites that were resampled (much faster).
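A minimal sketch of the RELL resampling step, assuming the per-site log-likelihoods of two trees are already available as NumPy arrays (this illustrates the idea; it is not PAUP*'s implementation):

```python
# RELL: resample the original per-site log-likelihoods instead of
# re-optimizing branch lengths for every bootstrap replicate.
import numpy as np

def rell_diff_distribution(site_lnl_a, site_lnl_b, n_reps=1000, seed=42):
    """Bootstrap distribution of the lnL difference (tree A - tree B) under RELL."""
    rng = np.random.default_rng(seed)
    n_sites = site_lnl_a.size
    # Resample site indices with replacement; sum the stored per-site lnLs.
    idx = rng.integers(0, n_sites, size=(n_reps, n_sites))
    return site_lnl_a[idx].sum(axis=1) - site_lnl_b[idx].sum(axis=1)

# KH-style use: center the replicate differences on their mean to form the
# null distribution, then ask how extreme the observed difference is.
# observed = site_lnl_a.sum() - site_lnl_b.sum()
# null = rell_diff_distribution(site_lnl_a, site_lnl_b)
# p = (np.abs(null - null.mean()) >= abs(observed)).mean()   # two-tailed
```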

Shimodaira-Hasegawa (SH) test (PAUP* and CONSEL)
- The Kishino-Hasegawa (KH) test is only valid if the two compared trees are specified beforehand.
- However, the common practice is to estimate the ML tree from the dataset and then test alternative trees (from other authors or datasets) against that ML tree.
- Also, comparing more than two trees creates a multiple-comparisons problem that cannot be solved by Bonferroni corrections.
- Using the KH test in these cases leads to false rejections of the non-ML trees (selection bias), as pointed out by Shimodaira & Hasegawa (1999) and Goldman et al. (2000). Selection bias leads to overconfidence in the wrong trees.
- The Shimodaira-Hasegawa (SH) test corrects for this selection bias and accounts for the multiple comparisons, but it is very conservative.
- The test is constructed under the least favorable scenario: the null hypothesis is that all candidate trees are equally good. For example, if we have 10 reasonable trees to evaluate but add another 90 implausible trees to the comparison, the extra trees make the test more conservative. Thus, including too many trees dilutes the power.
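For concreteness, here is a minimal sketch of the SH test logic under the RELL approximation, as the test is usually described (an illustration, not CONSEL's or PAUP*'s code). The array `site_lnls` is an assumed (n_trees x n_sites) matrix of per-site log-likelihoods, one row per candidate tree.

```python
# SH test under RELL (sketch): least-favorable centering and a one-tailed
# comparison of each tree's log-likelihood deficit to its bootstrap distribution.
import numpy as np

def sh_test(site_lnls, n_reps=1000, seed=1):
    rng = np.random.default_rng(seed)
    n_trees, n_sites = site_lnls.shape
    lnl = site_lnls.sum(axis=1)
    obs = lnl.max() - lnl                      # observed deficit of each tree

    # RELL bootstrap of the tree log-likelihoods
    idx = rng.integers(0, n_sites, size=(n_reps, n_sites))
    boot = site_lnls[:, idx].sum(axis=2)       # shape (n_trees, n_reps)

    # Center each tree's replicates on its own mean (least-favorable null)
    centered = boot - boot.mean(axis=1, keepdims=True)
    stat = centered.max(axis=0) - centered     # deficit per tree per replicate

    # One-tailed p-value: how often the bootstrap deficit exceeds the observed one
    return (stat >= obs[:, None]).mean(axis=1)

# p = sh_test(site_lnls); trees with p < 0.05 are rejected
```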

Shimodaira-Hasegawa (SH) test (PAUP* and CONSEL)
- The conservative behavior is partially alleviated by the weighted Shimodaira-Hasegawa (WSH) test (CONSEL), in which the test statistic is standardized.
- The null distribution against which the test statistic is compared is obtained by bootstrap resampling, via full optimization or the RELL approximation.
- However, the bootstrap is biased (Zharkikh & Li 1992; Li & Zharkikh 1994; Hillis & Bull 1993; Felsenstein & Kishino 1993; Efron, Halloran & Holmes 1996; Newton 1996).
- Shimodaira (2002) introduced a modification of the bootstrap to reduce this bias when testing trees, known as the approximately unbiased (AU) test (CONSEL):
  - Uses a multiscale bootstrap to correct the p-value estimates
  - Runs a series of bootstraps of different sizes (still using the RELL approximation)
  - Fits curves through the resulting p-values to obtain a correction formula

KH and SH tests (RELL): PAUP* output

  Tree        1            2            3
  -ln L    5988.05924   6256.51841   6263.25718

Time used to compute likelihoods = 0.22 sec

Kishino-Hasegawa and Shimodaira-Hasegawa tests:
  KH test using RELL bootstrap, two-tailed test
  SH test using RELL bootstrap (one-tailed test)
  Number of bootstrap replicates = 1000

                                       KH-test   SH-test
  Tree     -ln L        Diff -ln L        P         P
  ------------------------------------------------------
  1       5988.05924     (best)
  2       6256.51841    268.45917      0.000*    0.000*
  3       6263.25718    275.19794      0.000*    0.000*

  * P < 0.05

KH and SH tests (Full): PAUP* output

  Tree        1            2            3
  -ln L    5988.05924   6256.51841   6263.25718

Time used to compute likelihoods = 0.22 sec

Kishino-Hasegawa and Shimodaira-Hasegawa tests:
  KH test using bootstrap with full optimization, two-tailed test
  SH test using bootstrap with full optimization (one-tailed test)
  Number of bootstrap replicates = 1000

                                       KH-test   SH-test
  Tree     -ln L        Diff -ln L        P         P
  ------------------------------------------------------
  1       5988.05924     (best)
  2       6256.51841    268.45917      0.000*    0.000*
  3       6263.25718    275.19794      0.000*    0.000*

  * P < 0.05

KH, SH & AU tests (RELL): CONSEL output
P-values for several tests comparing 3 topologies:

  # reading phyllo.pv
  # rank  item    obs     au      np      bp       pp      kh      sh      wkh     wsh
  #   1     1    -9.7   0.967   0.931   0.933   1.000    0.929   0.981   0.929   0.995
  #   2     2     9.7   0.056   0.063   0.061   6e-005   0.071   0.344   0.071   0.123
  #   3     3    46.0   0.007   0.006   0.007   1e-020   0.008   0.013   0.008   0.010

References
Chapter 12, "Testing tree topologies," in The Phylogenetic Handbook.
Chapter 21, "Paired-sites tests," and pp. 346-352 in Inferring Phylogenies.
Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49:652-670.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
Prager, E. M., and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. J. Mol. Evol. 27:326-335.
Shimodaira, H. 1998. An application of multiple comparison techniques to model selection. Ann. Inst. Statist. Math. 50:1-13.
Shimodaira, H. 2001. Multiple comparisons of log-likelihoods and combining nonnested models with applications to phylogenetic tree selection. Comm. in Statist., Part A - Theory Meth. 30:1751-1772.
Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492-508.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.
Shimodaira, H., and M. Hasegawa. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246-1247.
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of man and the apes. Evolution 37:221-224.
Templeton, A. R. 1983. Convergent evolution and nonparametric inferences from restriction data and DNA sequences. Pp. 151-179 in B. S. Weir, ed. Statistical Analysis of DNA Sequence Data. Marcel Dekker, New York, NY.

Non-parametric bootstrap (review) (from Felsenstein, 2004)

Parametric Bootstrap (from Felsenstein, 2004)

Parametric Bootstrap
- Simulate data under the most likely tree and model of evolution.
- Infer the ML tree for each simulated replicate.
- Compute a majority-rule consensus tree with P values for branches (as with the non-parametric bootstrap).
- Concern: close reliance on the correctness of the statistical model of evolution. When the model is correct, the variation from the parametric bootstrap will be very similar to that of the non-parametric bootstrap.
- Alternative (and much more widely used) application: comparing alternative topologies to the most likely tree (the SOWH test, or parametric bootstrap test of monophyly).

Parametric Bootstrap of Monophyly
Most useful when:
- The overall tree structure, rather than a particular branch, suggests that a null hypothesis is incorrect.
- No single branch is particularly well supported, but the cumulative effect of many branches contains enough phylogenetic signal to reject a particular null hypothesis.

[Figure: the ML tree versus the morphology tree (hypothesis A, the null), each containing taxa A, B, C and outgroup O]

[Figure, continued: the ML tree versus the morphology tree (hypothesis A, the null). Simulate evolution on the null tree to generate N datasets of the same size and base composition as the original dataset.]

Parametric bootstrap test of monophyly (SOWH test)
For each of the simulated datasets, run two ML analyses, one with no constraints and one constrained to hypothesis A (the null hypothesis), and record the difference in likelihood score:

  Dataset    Likelihood score      Likelihood score          Difference
             (no constraints)      (constraint: Hyp A)       in score
  -----------------------------------------------------------------------
     1             x_0                   x_A                    δ_1
     2             x_0                   x_A                    δ_2
     3             x_0                   x_A                    δ_3
    ...            ...                   ...                    ...
    100            x_0                   x_A                    δ_100

Parametric Bootstrap of Monophyly
[Diagram: from the original data, estimate the tree under the null hypothesis (constrained) and the ML tree (unconstrained); datasets #1 through #100 are simulated from the constrained (null) estimate and each is analyzed in turn.]

Parametric bootstrap test of monophyly (SOWH test)

Parametric bootstrap test of monophyly (SOWH test)
[Two histograms of the difference in score across the simulated datasets]
- If the observed difference is not significantly different from the values in the distribution: hypothesis A cannot be rejected.
- If the observed difference is significantly different from the values in the distribution (p < 0.01): hypothesis A is rejected.
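A minimal sketch of this decision step (illustration only), assuming the δ values from the table above have already been collected by running the constrained and unconstrained ML analyses on each simulated dataset; the numbers here are stand-ins:

```python
# SOWH decision step: compare the observed likelihood difference to the
# distribution of differences obtained from data simulated under hypothesis A.
import numpy as np

rng = np.random.default_rng(0)
sim_deltas = rng.gamma(shape=2.0, scale=1.5, size=100)  # stand-ins for δ_1..δ_100
observed_delta = 14.2                                   # stand-in observed difference

# One-tailed p-value: how often simulation under hypothesis A gives the
# unconstrained tree a likelihood advantage at least as large as observed.
p = (sim_deltas >= observed_delta).mean()
print(f"SOWH p = {p:.3f}; reject hypothesis A if p is small")
```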

In other words...
- If the observed difference falls within the null distribution: even if hypothesis A were true, we could often recover alternative hypotheses in an analysis, so we cannot reject hypothesis A.
- If the observed difference falls outside the null distribution: if hypothesis A were true, we would be unlikely to recover alternative hypotheses in an analysis. Thus hypothesis A is unlikely, at least with our dataset (and all the assumptions we made in the process).

Parametric Bootstrap of Monophyly
It is a way of testing whether we have enough power, with the data at hand, to reject a hypothesis.
There are many shortcut options for the SOWH test (Goldman et al. 2000 discuss the implications of some of these):
- Use parsimony instead of likelihood for the comparison of trees
- Use fixed topologies instead of constrained (to the null, or hypothesis A) and unconstrained searches
- Partial vs. full optimization of parameters and branch lengths

Additional References
Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.
Hillis, D. M. 1995. Approaches for assessing phylogenetic accuracy. Systematic Biology 44:3-16.
Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49:652-670.
Zharkikh, A., and W. H. Li. 1995. Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44-63.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407-514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular Systematics. Sinauer Associates, Sunderland.
Felsenstein, J. 2004. Inferring Phylogenies. Chapter 20.

Parametric Bootstrap test of Monophyly (SOWH test)
https://molevol.mbl.edu/wiki/index.php/ParametricBootstrappingLab
http://www.ebi.ac.uk/goldman/tests/SOWHinstr.html