A combinatorial test for significant codivergence between cool-season grasses and their symbiotic fungal endophytes



Similar documents
Currencies of Mutualisms: Sources of Alkaloid Genes in Vertically Transmitted Epichloae

Molecular Clocks and Tree Dating with r8s and BEAST

Phylogenetic Trees Made Easy

Introduction to Phylogenetic Analysis

Introduction to Bioinformatics AS Laboratory Assignment 6

Bayesian Phylogeny and Measures of Branch Support

Retail Lawn Seed Mixtures for Western Oregon and Western Washington

A data management framework for the Fungal Tree of Life

Relationships of Floras (& Faunas)

Arbres formels et Arbre(s) de la Vie

Visualization of Phylogenetic Trees and Metadata

An experimental study comparing linguistic phylogenetic reconstruction methods *

Tree Integrated Pest Management. Dan Nortman Virginia Cooperative Extension, York County

Data Partitions and Complex Models in Bayesian Analysis: The Phylogeny of Gymnophthalmid Lizards

Matter and Energy in Ecosystems

Endophytes of perennial ryegrass and tall fescue

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

A comparison of methods for estimating the transition:transversion ratio from DNA sequences

Ecology Symbiotic Relationships

Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE

by Erik Lehnhoff, Walt Woolbaugh, and Lisa Rew

Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted

Name Class Date. binomial nomenclature. MAIN IDEA: Linnaeus developed the scientific naming system still used today.

Natural History Note

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Data for phylogenetic analysis

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BIOL 3200 Spring 2015 DNA Subway and RNA-Seq Data Analysis

DNA Barcoding in Plants: Biodiversity Identification and Discovery

Protein Sequence Analysis - Overview -

Borges, J. L On exactitude in science. P. 325, In, Jorge Luis Borges, Collected Fictions (Trans. Hurley, H.) Penguin Books.

Pairwise Sequence Alignment

Crawling and Detecting Community Structure in Online Social Networks using Local Information

Internet Search Activity & Crowdsourcing

Fungi and plants practice

Heredity - Patterns of Inheritance

AP Biology Essential Knowledge Student Diagnostic

Scaling the gene duplication problem towards the Tree of Life: Accelerating the rspr heuristic search

KEY CONCEPT Organisms can be classified based on physical similarities. binomial nomenclature

Missing data and the accuracy of Bayesian phylogenetics

Routing in packet-switching networks

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Original article: COMPARISON OF FIVE CALLIGONUM SPECIES IN TARIM BASIN BASED ON MORPHOLOGICAL AND MOLECULAR DATA

Bayesian coalescent inference of population size history

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

PHYLOGENY AND EVOLUTION OF NEWCASTLE DISEASE VIRUS GENOTYPES

Molecular identification of the turf grass rapid blight pathogen

Bio-Informatics Lectures. A Short Introduction

PLANT EVOLUTION DISPLAY Handout

Cytospora Canker. A Hard Nut to Crack. My current ongoing projects 1/23/ % of Cherry trees

An Introduction to Phylogenetics

Turfgrass Selection for the Home Landscape

CSE 326: Data Structures B-Trees and B+ Trees

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Biological Sciences Initiative. Human Genome

Using multiple models: Bagging, Boosting, Ensembles, Forests

Phylogenetic relationships among Staphylococcus species and refinement of cluster groups based on multilocus data

a-cB. Code assigned:

In-Network Coding for Resilient Sensor Data Storage and Efficient Data Mule Collection

Bird and bat droppings

PHYLOGENETIC ANALYSIS

Supporting Online Material for

Speciation in fig pollinators and parasites

Question Bank Five Kingdom Classification

Practice Questions 1: Evolution

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Automated Plausibility Analysis of Large Phylogenies

DNA Sequence Alignment Analysis

The Central Dogma of Molecular Biology

Geography 4203 / GIS Modeling. Class (Block) 9: Variogram & Kriging

The enigmatic monotypic crab plover Dromas ardeola is closely related to pratincoles and coursers (Aves, Charadriiformes, Glareolidae)

Vascular Plants Bryophytes. Seedless Plants

Final Project Report

Common Core Unit Summary Grades 6 to 8

Virginia s Turfgrass Industry

Social Media Mining. Network Measures

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION

Time series experiments

Point of View. Missing Data in Phylogenetic Analysis: Reconciling Results from Simulations and Empirical Data JOHN J. WIENS AND MATTHEW C.

Biology 300 Homework assignment #1 Solutions. Assignment:

Avoiding Tree & Utility Conflicts

A short guide to phylogeny reconstruction

BIOL 1030 TOPIC 5 LECTURE NOTES TOPIC 5: SEEDLESS VASCULAR PLANTS (CH. 29)

Foamy Bark Canker. A New Insect-Disease Complex on Coast Live Oak in California Caused by Western Oak Bark Beetle and Geosmithia sp.

CLUSTER ANALYSIS FOR SEGMENTATION

PLANT DIVERSITY. EVOLUTION OF LAND PLANTS KINGDOM: Plantae

Taxonomy and Classification

Common Core State Standards for Mathematics Accelerated 7th Grade

2.3 Identify rrna sequences in DNA

COC131 Data Mining - Clustering

Host specificity and the probability of discovering species of helminth parasites

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

Inference of Large Phylogenetic Trees on Parallel Architectures. Michael Ott

Cells are tiny building blocks that make up all living things. Cells are so small that you need a microscope to see them.

How To Run Statistical Tests in Excel

Introduction to Plants

Vector storage and access; algorithms in GIS. This is lecture 6

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Transcription:

A combinatorial test for significant codivergence between cool-season grasses and their symbiotic fungal endophytes Ruriko Yoshida Dept. of Statistics University of Kentucky Joint work with C.L. Schardl, K.D. Craven, A. Lindstrom, A. Stromberg www.ms.uky.edu/ ruriko IMA 1

Endophytes species IMA 2

Life cycles of sexual ( (Epichloë spp.) endophytes asexual cycle & vertical transmission sexual cycle & horizontal transmission

Epichloë/Neotyphodium /Neotyphodium in a grass plant Symbioses are: Systemic Constitutive Often heritable Often mutualistic

Endophytes: Constitutive, heritable symbionts H. Koya, pers.. comm. Freeman 1904

Endophyte effects on host fitness Tall fescue without endophyte Tall fescue w/ Neotyphodium coenophialum

Fruiting structures of Epichloë species

Grass-endophyte mutualisms nutrition shelter dispersal H O O H N H H anti-insect insect OH O O anti-vertebrate anti-nematode nematode drought tolerance etc. O O O H O O N NR 2 H N N OH O O N HN O H 3 C N N H NH 2 NH HN N

Transmission strategies Relative importance of vertical vs horizontal transmission. A. Exclusively or almost exclusively transmitted horizontally. The endophyte will tend to shut down host seed production ( choke disease ), diverting available plant resources to production of infectious spores. Those spores spread to developing seeds of neighboring plants. B. Exclusive vertical transmission. The host exhibits no disease symptoms due to the endophyte infection. Its seeds develop and germinate normally, but bear the endophyte and thereby transmit it to the next generation. C. Mixed vertical and horizontal transmission strategy. IMA 3

Host range 1. Some endophytes are restricted to individual host species. This seems rare for endophyte categories A and C above, but typical of category B. 2. Some are restricted to individual genera. 3. Some are restricted to host tribes. 4. Some are associated mainly with one host tribe, but occasionally can be identified in the sister tribe. 5. Some are present in a phylogenetically broad range of host tribes. IMA 4

Problem Question. We would like to analyze how grasses and their endophytes evolved together? Method. 1. We use phylogenetic trees among grass species and among endophytes species. 2. Compute pairwise distances in the grass tree and in the endophyte tree. 3. Compute MRCA pairs of two trees. 4. Estimate the probability of codivergence between two trees and compute their correlations. IMA 5

Phylogenetic trees Data. Sequencing of Chloroplast DNA (cpdna) Non-Coding Regions. 27 species in each group. Sequences were entered into GenBank as accession numbers AY450932 AY450949 and xxxxx-xxxxx. Based on published phylogenetic inference for the grass subfamily Poöideae (Soreng and Davis 1998), Brachyelytrum erectum was chosen as the outgroup for reconstructing the grass phylogenies. The corresponding endophyte, Epichloë brachyelytri, was the outgroup chosen for endophyte phylogenies. We reconstruct phylogenetic trees of grasses (the host tree, T H ) and phylogenetic trees of endophytes (the parasite tree, T P ) via a software PAUP* under the GTR+G+I model. Then we used a software r8s to make trees unltrametric (using the least square method). IMA 6

100 A. tenuis 100 96 A. hiemalis 74 C. villosa Ech. ovatus K. cristata 100 S. obtusata Agrostideae 100 L. edwardii 97 L. multiflorum 96 L. perenne 65 100 L. arundinaceum 99 Lolium sp. P4074 Lolium sp. P4078 Poeae 92 96 100 F. rubra F. longifolia Hol. mollis 84 100 85 88 100 85 Bro. erectus Bro. purgans Bro. ramosus He. europaeus El. canadensis H. brevisubulatum Bromeae Hordeeae Brp. sylvaticum Brachypodieae Ach. inebrians Stipeae G. striata Meliceae 0.005 substitutions/site Bre. erectum Brachyelytreae Figure 1: Parametric ML tree estimated from cpdna intron and intergenic sequences. Numbers above branches indicate bootstrap support percentages (over 50%) obtained by 1000 maximum parsimony searches with branch swapping. IMA 7

Hosts Bre.erectum Brp.sylvaticum Ech.ovatus C.villosa A.tenuis A.hiemalis S.obtusata K.cristata Lolium sp.p4078 Lolium sp.p4074 L.arundinaceum L.multiflorum L.edwardii L.perenne L.perenne F.r.commutata F.longifolia Hol.mollis He.europaeus Bro.ramosus Bro.erectus Bro.purgans H.brevisubulatum El.canadensis G.striata Ach.inebrians 10 E.brachyelytri E.sylvatica E.typhina N.t.canariense N.aotearoae E.baconii E.amarillans E.amarillans E.baconii N.sp.FATG2 N.sp.FATG3 N.coenophialum N.occultans E.festucae E.festucae E.festucae N.lolii Epichloë sp. N.sp.HeTG1 Bro.ramosus E.bromicola N.sp.HbTG1 E.elymi E.elymi E.glyceriae N.gansuense Endophytes Figure 2: Ultrametric ML time trees for host grasses and their endophytes. Hosts and their endophytes are indicated opposite each other or by connecting dashed lines. Full taxon names are given in Table 1 in our paper. IMA 8

MRCA pairs A MRCA pair is a pair of a Most Recent Common Ancestor (MRCA) of any pair of host species and a Most Recent Common Ancestor (MRCA) of any pair of parasite species. IMA 9

MRCA pairs Congruent trees H 7 P 7' 5 6 5' 6' 1 2 3 4 1' 2' 3' 4' MRCA pair (5,5') (6,6') (7,7') Pairs of H and P taxon pairs ((1,2),(1',2')) ((3,4),(3',4')) ((1,3),(1',3')), ((1,4),(1',4')), ((2,3),(2',3')), ((2,4),(2',4')) Incongruent trees H 7 P 7' 6' 5 6 5' 1 2 3 4 1' 2' 3' 4' MRCA pair (5,5') (7,6') (7,7') (6,7') Pairs of H and P taxon pairs ((1,2),(1',2')) ((1,3),(1',3')), ((2,3),(2',3')) ((1,4),(1',4')), ((2,4),(2',4')) ((3,4),(3',4')) IMA 10

Analysis on codivergence [Legendre et al 2002] etc used all possible pairs of pairwise distances from the host tree and the parasite tree and used Principal Components Analysis (PCA) to compute their correlations. A problem of their method is that we possibly pick the same Most Recent Common Ancestor (MRCA) pair multiple times. This causes a bias in the result. In each tip clade a MRCA uniquely relates two taxa. However, a MRCA deeper in the tree relates multiple taxon pairs. For example, for congruent H and P trees the matrix of all pairwise distances of H taxon pairs against all pairwise distances of P taxon pairs represents each corresponding pair of tip clade MRCAs only once, and each corresponding pair of deeper MRCAs multiple times. The MRCALink algorithm samples corresponding H and P MRCA pairs only once. IMA 11

MRCALink algorithm We will go though the algorithm with an example. H 7 P 7' 6' 5 6 5' 1 2 3 4 1' 2' 3' 4' IMA 12

Step 1: Assign each node a unique number such that its number is bigger than its children. Step 2: for each interior node in H, from all possible pairs of offsprings, find corresponding pairs in P. 5: From 5 = (1,2), we find a new MRCA 5 = (1,2 ) in P. 6: From 6 = (3,4) we find a new MRCA 7 = (3,4 ). 7: From 7 = (1,3) = (2,3) = (2, 4), we find new MRCAs 6 = (1,3 ) and 7 = (1,4 ). Thus, we have pairs (5,5 ), (6,7 ), (7,7 ), (7,6 ). IMA 13

Computing the probability of codivergence Let τ H be the set of all ultrametric host trees with n taxa and let τ P be the set of all ultrametric parasite trees with n taxa. S(X, Y, T, t) = x X,y Y time(mrca(x)) time(mrca(y)), where T τ H, t τ P, X is a set of pairs of taxa in H, and Y is a set of pairs of taxa in P. Then we estimate the probability P(S(X, Y, T H, T P ) S(X, Y,T, t) : T τ H, t τ P ) which is the estimated probability of codivergence for T H and T P, by randomly generated trees from τ H and τ P. IMA 14

Results We analyzed 4 pairs of host trees and parasite trees, namely the full tree and T 1 T 4 by removing some of species in the full trees, trimmed trees (T 1 T 4 ). For each pair of trimmed trees, we removed some species from the endophytes and corresponding grasses because these endophytes seem to have holizontal or mixed transmission. Plots of MRCA ages of hosts and their corresponding endophytes identified by the MRCALink algorithm from ultrametric ML trees for the full dataset or trimmed datasets T 1 T 4. Root ages were set to 100 arbitrary units for both trees. The diagonal lines represent expectation for perfect cospeciation and perfect phylogenetic reconstruction. IMA 15

100 Full Rel. Symbiont MRCA Ages 50 0 p = 0.3618 0 50 100 Rel. Host MRCA Ages 100 T1 100 T2 Rel. Symbiont MRCA Ages 50 0 p = 0.8734 Rel. Symbiont MRCA Ages 50 0 p = 0.8679 0 50 100 Rel. Host MRCA Ages 0 50 100 Rel. Host MRCA Ages T3 T4 100 100 Rel. Symbiont MRCA Ages 50 p = 0.9923 0 0 50 100 Rel. Host MRCA Ages Rel. Symbiont MRCA Ages 50 p = 0.9981 0 IMA 16 0 50 100 Rel. Host MRCA Ages

Questions [Huelsenbeck et al, 2000] showed an algorithm to estimate the MLE for host switching via MCMC. We would like to estimate the MLE of the probability of host switching in endophytes. All existing methods assume that the host and parasite trees are true trees. Can we reconstruct phylogenetic trees including host switching and also information of relations between hosts and parasites? How are the host tree and the parasite tree related in the tree space? More detailed analysis needs to be done. IMA 17

A preprint is available at arxiv: http://arxiv.org/abs/q-bio.pe/0611084 MRCALink will be available soon IMA 18

Thank you... IMA 19