# On the Use of Nucleic Acid Sequences to Infer Early Branchings in the Tree of Life

Save this PDF as:

Size: px
Start display at page:

Download "On the Use of Nucleic Acid Sequences to Infer Early Branchings in the Tree of Life"

## Transcription

3 Use of DNAs to Infer Early Branchings 453 purposes (Navidi et al ; Yang 1994; Yang et al. 1994). The base frequencies are listed in table 1, and they are seen to differ among sequences. Pattern of Nucleotide Substitution A locally homogeneous Markov process was used to model nucleotide substitution along a branch in the tree, but different processes were allowed for different branches. The basic model was that of Hasegawa et al. ( 1985 ), by which the rate of nucleotide i changing into nucleotide j( j # i) is Qij = K C~ (for transitions: T ++ C, A * G) ( nj (for transversions: T, C ++ A, G) where 5 is the equilibrium frequency of nucleotide j, with c 5 = ZT + 7rc +,ra + no = 1. The diagonals Of the rate matrix Q = { Q,} are determined by the mathematical restriction that row sums of Q are zero; -Qii = Zj+i Qd is then the rate of substitution of nucleotide i. The matrix Q is multiplied by a scale factor such that the expected rate of substitution at equilibrium is -2 KiQii = 1. Th is means that time t, or the branch length in a tree, is measured by the expected number of substitutions per site accumulated during the time period or along the branch. The transition probability matrix for the branch (with length t) is then P(t) = exp( Qt). The model will be referred to as HKY, and, when combined with the gamma or discrete gamma distribution for variable rates among sites, as HKY+T or HKY+dG. Nonhomogeneous Process Models Different frequency parameters (7Ci s in eq. [ 11) are allowed for different branches in the tree so that base frequencies can drift toward different values in different lineages. The models may be considered special cases of Barry and Hartigan s ( 1987) parameter-rich model, whereby one whole-rate matrix (including 11 free parameters) was assigned for each branch. To get some feel for the cost of using many parameters on the one hand and the fit of the model to data on the other, two (1) models were constructed concerning the frequency parameters, referred to as Nl and N2. Model N2 is the more general, in which one set of frequency parameters (three free parameters) is assigned for each branch. Including the initial base frequencies at the root of the tree, this model involves as many sets of frequency parameters as the number of nodes in the (rooted) tree; for a bifurcating tree with s species, this number is 2s - 1. As the interior branches are usually short, one set of frequency parameters was used for all interior branches in model N 1. Including one set for the root and one set for each of the branches leading to the end nodes, model Nl involves s + 2 sets of frequency parameters and is a special case of N2. The likelihood function for given values of parameters and tree topology can be calculated following Felsenstein ( 198 1) (see also Barry and Hartigan 1987) for models assuming a single rate for all sites, and following Yang ( 1994) for models that assume the (discrete) gamma distribution for rates over sites. Removal of the homogeneity and stationarity assumption created two new problems. First, the placement of the root in a tree changes the likelihood, and therefore rooted trees should be considered, as implied above. This problem was ignored by Barry and Hartigan ( 1987). Second, as the process of substitution is not at equilibrium and base frequencies change with time, the branch length (t) as defined above is only an approximation to the real expected number of substitutions accumulated along the branch, which is an average over variable base frequencies. In this study, we use t as the (approximate) branch length, as this approximation will slightly affect estimates of branch lengths only, and estimates of other parameters or calculation of the likelihood are unaffected. Results Homogeneous Models Table 2 lists results obtained from using models with the homogeneity and stationarity assumptions. The HKY model of substitution is used, either assuming a single rate for all sites (HKY ) or gamma-distributed rates Table 1 Nucleotide Frequencies in ss rrnas (1,352 nucleotides) of Four Species Analyzed in this Paper Species T C A G G+C 1. Sulfolobus solfatarius Halobacterium salinarium Escherichia coli Homo sapiens Average

4 454 Yang and Roberts Table 2 Log-Likelihood Values and Estimates of Parameters Obtained from Models Assuming Homogeneity and Stationarity of the Substitution Processes HKY HKY+I HKY + dg TREE c - ~,a, i? c - 4nax i? a c - &xix 2 ci To: (1234) TI: ((12)34) Tz: ((13)24) TX: ((14)23) NOTE.-The ss rrnas of Suljdobus soljdarius (I), Halobacterium salinarium (2), Escherichia coli (3), and Homo sapiens (4) are analyzed. C, = -5J91.06 is the upper limit of the log likelihood (Navidi et al. 1991; Goldman 1993a). Parameter K is the transition:transversion rate ratio (see eq. [l]), and a is the shape parameter of the gamma (P) or discrete gamma (dg) distribution for rates among sites. Parameters in the models are estimated by maximum likelihood for each of the four (unrooted) tree topologies, and estimates of branch lengths are not shown. T,, is the star tree. Estimates of the frequency parameters obtained from the four trees are identical at the third decimal point; these are ft r = 0.205, iic = 0.265, 5, = 0.224, and & = for the HKY model; it, = 0.203, & = 0.270, ir, = 0.217, and & = for the HKY + P and HKY + dg models. Results for the ML trees under the models are shown in boldface type. among sites (HKY+T). In the discrete gamma model (HKY+dG), four categories of rates are used to approximate the gamma distribution (Yang 1994). The shape parameter a of the gamma distribution is inversely related to the extent of rate variation over sites; a = 00 corresponds to the case of a single rate for all sites. HKY is thus a special case of HKY+T or HKY+dG. In all three models, the substitution processes were assumed to be homogeneous and stationary, and one K and one set of frequency parameters were assumed for all sequences (branches). The existence of a molecular clock (i.e., rate constancy among lineages) was not assumed, and as HKY is a reversible process model, the root of the tree cannot be identified (Felsenstein 198 1). All parameters in the models were estimated from the data for each (unrooted) tree topology. The ML estimates of ni s are quite different from the averages of the observed frequencies; for example, using the average observed frequencies (table 1) in the HKY+T model gives log-likelihood values , , , and for tree topologies To, T1, T2, and T3, respectively (compare table 2). Rate variation over sites can be seen from the tremendous improvement in likelihood upon adding the a parameter of the gamma distribution (comparison between HKY and HKY+T or HKY+dG). For example, using tree topology T3, we can compare HKY and HKY+r by the likelihood-ratio test to test for rate constancy among sites, which means comparison of 2A& = 2( [ ) = with x:,14b = 6.63, and the difference is obviously significant. The same conclusion is drawn if T1 or T2 is used in the comparison instead of T3, or if HKY +dg is used instead of HKY+T (table 2). T, was the best tree by the HKY model, while assuming the gamma distribution of rates over sites favored T3 ( HYK+T and HKY+dG). The likelihood values for different trees were worryingly similar under the HKY+T and HKY+dG models. It was also noted that the performance of the HKY+dG model was quite good relative to HKY+T, with respect both to the fit to data reflected in the likelihood values and to the approximation to the continuous distribution reflected in the estimates of the a parameter. The discrete gamma model was used in later analysis in place of the continuous gamma. Nonhomogeneous Models In a preliminary analysis of the ss rrna data, a model that assumed one K for each branch in the tree was compared with another that assumed one K for all branches in the tree. The likelihood values obtained for these two models were not significantly different, indi- cating that the transition:transversion rate ratio is more or less the same in different parts of the tree although the sequences are drifting toward different base frequencies; at any rate, the ratio is not much larger than one (compare Yang et al. 1994). Later analyses were performed assuming one K for the whole tree. Results obtained from models that do not assume homogeneity and stationarity of substitution (models N 1 and N2) are listed in table 3. The 15 (rooted) bifurcating trees are classified into three groups according to their unrooted topology; for example, trees T1 1, T12, T13, T14, and TIS in table 3 have the same unrooted topology (i.e., T1 in table 2). Compared to HKY (table 2)) the HKY+N2 model (table 3 ) involves one extra branch (length) due to the addition of the root and 3 X (2s - 1-1) = 18 extra frequency parameters. The homogeneity and stationarity assumptions are clearly rejected; for example, using the likelihood values of T1 I (table 3) and T1 (table 2)) we compare 2AE = 2 X ( [ ) = withX:9,,9b = 36.19,

5 Use of DNAs to Infer Early Branchings 455 Table 3 Log-Likelihood Values and Parameter Estimates Under Nonhomogenous Models of Nucleotide Substitution HKY + N2 HKY+dG+N2 HKY+dG+Nl TREE 4 - &nax i? e - Lxx i? a -! - &lax i? & To: (1234) T,,: ((( 12)4)3) T,,: (((12)3)4) T,3: (((34)2)1) T,4: (((34)1)2) TIS: ((12)(34)) T2,: (((24)1)3) Tz2: ((( 13)2)4) T,,: ((( 13)4)2) Tz4: (((24)3)1) Tz5: (( 13)(24)) T3,: (((14)2)3) TJ2: (((14)3)2) TJ3: (((23)4)1) T3,: (((23)1)4) TJ5: ((14)(23)) NOTE.-The ss rrnas of Sulfolobus soljdarius (l), Halobacterium salinarium (2), Escherichia coli (3), and Homo sapiens (4) are analyzed. To is the star tree, and other multifurcating trees are not evaluated. Results for the ML trees are given in boldface type, while those for the best tree in each group corresponding to the same unrooted topology are listed in italicized type. A four-category discrete gamma (dg) model is used to account for variable rates over sites in the dg models. One set of frequency parameters is assumed for each branch in the tree by the N2 model, while the Nl model differs from N2 by assuming one set of frequency parameters for all the interior branches. Estimates of the frequency parameters and of branch lengths are not presented. and the difference is significant. Similar results are obtained if other reasonable tree topologies such as Tzl or T3i are used, or if rate variation over sites has been taken into account ( comparison between HKY +dg+n2 with HKY+dG). In sum, both rate variation among sites and nonhomogeneity of substitution are characteristics of the evolutionary processes of these sequences. The HKY+dG+N 1 model has three fewer frequency parameters than HKY+dG+N2 but very similar log-likelihood values. Results obtained from HKY+Nl (not shown) are also very similar to those obtained from HKY+N2; for example, the three best trees under.. HKY+Nl are also T1 1, Tsl, and T2 1, with log hkehhoods of , , and , respectively. The reason appears to be that one of the two interior branches in any tree topology is very short, and not much information exists in the data concerning the pattern of substitution along the short branch. It is noteworthy that different rooted trees sharing the same unrooted topology have quite different likelihood values (table 3). The three tree topologies that suggest first separation of Escherichia coli from other species- T1 1, T2 1, and T3 1,-have likelihood values much higher than others. This seems to suggest that the data contain considerable information concerning the position of the root (the earlist splitting) even though the topology may be uncertain. Notably the likelihood difference between the ML tree and the second best tree under the HKY+dG+Nl or HKY+dG+N2 models (table 3) is larger than that between the ML tree and the star tree under the HKY+dG model (table 2). This seems to suggest that the application of nonhomogeneous models to sequences with different base frequencies not only allows the root of the tree to be located but also leads to better power in discriminating among tree topologies. Estimates of parameters that are common to all The best tree under HKY+dG+N2 (fig. 1) sepatree topologies; that is, K and a are remarkably similar rates the species in the order E. coli, Halobacterium safor different tree topologies. Furthermore, the transition: linarium, Sulfolobus solfatarius, and Homo sapiens, sugtransversion rate ratio is underestimated under the HKY gesting that E. coli, which represents eubacteria, separates model which ignores rate variation over sites. These were first from the other species and that the two archaebacobserved and discussed for other models or data sets terial species (H. salinarium and S. solfatarius) are not ( Wakeley 1994; Yang et al. 1994). Branch lengths (not monophyletic. The position of the root of the universal shown) are also underestimated when rate variation over tree of life is widely, if not universally, accepted as being sites is ignored. within the branch of eubacteria. Support for this position

6 456 Yang and Roberts [0.16,0.19,0.32,0.33] c \ (0.28,0.27,0.22,0.23) 4: Homo sapiens [0.24,0.26,0.22,0.28] has been derived by using gene duplications which are believed to have occurred before the separation of these lineages (Gotgarten et al. 1989; Iwabe et al. 1989). In contrast, the nonhomogeneous models identified the root of the tree by using one single gene. Although the earliest separation of E. coli seems to be strongly supported as tree topologies Tr 1, Tzl,.and TJI have much higher likelihood values than other tree topologies, the relationship among the remaining three species is much less certain, in that the likelihood values for the three trees T1 I, Tzl, and Tjl are similar (table 3). No attempt is made here to evaluate the reliability (sampling error) of the ML tree ( Tjl ), and we suggest that the present results do not contribute to the debate over monophyly of the archaebacteria (Hoffman 1992). The eukaryotic lineage, represented by Homo sapiens, is placed on the longest branch indicating greatest divergence; this is true as long as one of the three best trees (i.e., T1,, Tzl, and T3,) is the true tree. This might reflect two important events in eukaryotic evolution: the appearance of chromosomes and mitotic division, and the evolution of sexual reproduction. These two events might affect the rate of mutation within the lineage, particularly within the early eukaryotes. Estimates of branch lengths and frequency parameters for the branches are shown in figure 1 for the ML tree under the HKY+dG+N2 model (i.e., T3r). The expected base frequency distribution at a node of the tree is calculated from the formula pt = po P(t), where the row vectors po and pt are the base frequency distributions at the start and end of the branch, respectively. We note that parameters in a model are estimated with different levels of accuracy. Poor estimates usually have large sampling variances and are sensitive to small perturbations in the model or data. Because of the parameter richness of the nonhomogeneous models, it is important to find out which parameters are reliably estimated and which are not. We have examined this problem by calculating the standard errors of the parameter estimates, FIG. 1.-The maximum-likelihood tree and estimates of parameters for ss rrnas under the HKY+dG+N2 model. One K parameter by comparing estimates under the HKY+dG+Nl and in the HKY model is assumed for the whole tree, with estimate I? HKY+dG+N2 models, and by comparing estimates = One set of frequency parameters is assumed for each obtained from different tree topologies. Either of the two branch (N2). A discrete gamma model (dg) is used to describe variable models may be considered a slight variation of the other, rates among sites, with & = 0.75 f The log likelihood for this and so are some of the tree topologies such as Tr 1, T2,, tree is t - C, = -5, (-5,591.06) = Branch lengths, approximately measured by the expected numbers of nucleotide sub- and T31. We note that estimates of branch lengths are stitutions per site, are shown in boldface type. Numbers in parentheses quite stable no matter which of the two models is asare estimates of the frequency parameters in the HKY model (eq. [ 11) sumed; their sampling variances are also comparable to for the branch, while those in brackets are the base frequencies in those of branch length estimates under the homogeneous sequences at the nodes of the tree, estimated from the model. For models. One exception is for the two branches around example, the base frequencies at the root of the tree (node 5) are estimated as 0.16 (7), 0.19 (C), 0.32 (A), and 0.33 (G). These estimates the root of the tree; using figure 1 as an example, estiof frequency parameters, however, involve large sampling errors and mates of lengths for branches 5-3 and 5-6 can be quite do not appear reliable. different by the two models although their sum is almost the same; although the root of the tree is quite certain, the exact position of the root is not reliably estimated. Likelihood values and estimates of K, a, and other branch lengths appear to be quite reliable by this evidence. The frequency parameters for the branches involve large sampling variances and may have quite different estimates under the two models considered. Their estimates are the least reliable. Discussion Use of the nonhomogeneous models in this study suggests that the ss rrnas can lead to reasonable estimation of phylogeny despite the fact that base frequencies are quite different in different species. The results in table 3 also suggest that it may be possible to identify the root of the tree even though the topology is uncertain. If this can be confirmed with more data sets, methods that infer rooted trees should be considered more seriously.

7 Use of DNAs to Infer Early Branchings 457 As mentioned before, the N2 model involves 3 X (2s - 1) frequency parameters, besides branch lengths in the tree and parameters such as K and a which are common to all tree topologies. With s larger than 4 or 5, this means many parameters. The N 1 model, by using one set of frequency parameters for all interior branches, reduces the number of frequency parameters to 3 X (s + 2). However, this restriction may be too unrealistic when there are many (long) interior branches. We note that even with four species, the nonhomogeneous models still pose computational problems, especially if the sequences are short; it is difficult to locate the optimum values of the frequency parameters during the iteration. The models do not appear to be usable for data of more than five sequences, and more practical methods are needed. When some species in the data have very similar base frequencies and are known to belong to a monophyletic group (which suggests that within the group the patterns of substitution are more or less the same), we may use one set of frequency parameters for all branches within the group. Another possibility may be to take the frequencies in the rate matrices for different branches as random variables generated from a probabilistic distribution, analogous to the case of using a gamma distribution (with a single parameter a) to describe variable rates among sites rather than estimating one rate parameter for each site. At present, it is unclear how such a distribution for a superprocess of base frequency drift can be constructed. Even within the framework of the nonhomogeneous models of this paper, it may be feasible to obtain approximations to the transition probabilities for the branches in the tree without iteration, using, for example, the parsimony inference of the ancestral sequences; this approach will remove the computational problems of the nonhomogeneous models mentioned above. It should be noted that the models considered in this paper are all formulated at the level of nucleotide substitution, which is the product of a complicated process driven by many factors, notably mutation, random drift and natural selection. Unequal base compositions in different species may have a strong selectional basis; for example, thermophiles probably have gained a selective advantage in maintaining high G+C content. These factors are, however, very difficult to model, and in our formulation the effects of selection are reflected in the affected substitution rates. Therefore, the locally homogeneous Markov process assumed for one branch in the tree should be interpreted as an average over time of a substitution pattern that varies along the branch. The argument of Barry and Hartigan ( 1987) suggests that it may be impossible to distinguish using sequence data the average of a variable substitution pattern along a branch from a constant pattern. For data to which the nonhomogeneous models are expected to apply, the interpretation of a variable pattern along a branch is biologically more reasonable. Instead of using a rate matrix Q for a branch as described in equation ( 1 ), the models may as well be formulated using only the matrix P( t) of transition probabilities along the branch; the rate matrix Q may be considered a way of placing restrictions on the structure of P( t) to reduce the number of parameters. Acknowledgment We wish to thank Dr. N. Takahata and two anonymous referees for many useful comments. Dr. M. Nei provided facilities during the revision of the manuscript. This study was partially supported by a grant from the National Science Foundation of China to Z.Y. LITERATURE CITED ADACHI, J., Y. CAO, and M. HASEGAWA Tempo and mode of mitochondrial DNA evolution in vertebrates at the amino acid sequence level: rapid evolution in warmblooded vertebrates. J. Mol. Evol. 36: BARRY, D., and J. A. HARTIGAN Statistical analysis of hominoid molecular evolution. Statist. Sci. 2: 19 l CAVALIER-SMITH, T Kingdom Protozoa and its 18 phyla. Microbial. Rev. 57: EMBLEY, T. M., R. H. THOMAS, and R. A. D. WILLIAMS Reduced thermophilic bias in the 16s rdna sequence from Thermus ruber provides further support for a relationship between Thermus and Deinococcus. Syst. Appl. Microbial. 16: FELSENSTEIN, J Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27: Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17: FIELD, K. G., G. J. OLSEN, D. J. LANE, S. J. GIOVANNONI, M. T. GHISELIN, E. C. RAFF, N. R. PACE, and R. A. RAFF Molecular phylogeny of the animal kingdom. Science 239: FORTERRE, P., N. BENACHENHOU-LAHFA, F. CONFALONIERI, M. DUGUET, C. ELIE, and B. LABEDAN The nature of the last universal ancestor and the root of the tree of life, still open questions. BioSystems 28: GILLESPIE, J. H Rates of molecular evolution. Ann. Rev. Ecol. Syst. 17: GOLDMAN, N. 1993a. Statistical tests of models of DNA substitution. J. Mol. Evol. 36: b. Simple diagnostic statistical tests of models for DNA substitution. J. Mol. Evol. 37: GOTGARTEN, J. P., H. KIBAK, P. DITTRICH, L. TAIZ, E. J. BOWMAN,B.J.BOWMAN,M.F.MANOLSON,R.J.POOLE, T. DATE, T. OSHIMA, J. KONISHI, K. DENDA, and M. YOSHIDA Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86:666 l-6665.

8 458 Yang and Roberts HASEGAWA, M., and T. HASHIMOTO Ribosomal RNA trees misleading? Nature 361:23. HASEGAWA, M., T. HASHIMOTO, E. OTAKA, J. ADACHI, N. IWABE, and T. MIYATA Early divergences in the evolution of eukaryotes: ancient divergence of Entamoeba that lacks mitochondria revealed by protein sequence data. J. Mol. Evol. 36: HASEGAWA, M., H. KISHINO, and T. YANO Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22: HOFFMAN, M Researchers find organisms they can really relate to. Science 257:32. IWABE, N., K. KUMA, M. HASEGAWA, S. OSAWA, and T. MI- YATA Evolutionary relationships of archaebacteria, eubacteria and eukaryotes infered from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86: JIN, L., and M. NEI Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. 7: LAKE, J. A Origin of the Metazoa. Proc. Natl. Acad. Sci. USA 87: LI, W.-H., M. GOUY, P. M. SHARP, C. O HUIGIN, and Y.-W. YANG Molecular phylogeny of rodentia, lagomorpha, primates, artidactyla and carnivora and molecular clocks. Proc. Natl. Acad. Sci. USA 87: LOCKHART, P. S., C. J. HOWE, D. A. BRYANT, T. J. BEANLAND, and A. W. D. LARKUM Substitutional bias confounds inference of cyanelle origins from sequence data. J. Mol. Evol. 34: LOOMIS, W. F., and D. W. SMITH Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc. Natl. Acad. Sci. USA 87: NAVIDI, W. C., G. A. CHURCHILL, and A. VON HAESELER Methods for inferring phylogenies from nucleotide acid sequence data by using maximum likelihood and linear invariants. Mol. Biol. Evol. 8: OLSEN, G. J., C. R. WOESE, and R. OVERBEEK The winds of (evolutionary) change: breathing new life into microbiology. J. Bacterial. 176: l-6. REEVES, J. H Heterogeneity in the substitution process of amino acid sites of proteins coded for by mitochondrial DNA. J. Mol. Evol. 35: RITLAND, K., and M. T. CLEGG Evolutionary analysis of plant DNA sequences. Am. Nat. 13O:S74-SlOO. RIVERA, M. C., and J. A. LAKE Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257: SOGIN, M. L Early evolution and the origin of the eukaryotes. Curr. Opin. Genet. Dev. 1: SCKXN, M. L., J. H. GUNDERSON, H. J. ELWOOD, R. A. ALONSO, and D. A. PEATTIE Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science 243~ STEEL, M. A., P. J. LOCKHART, and D. PENNY Confidence in evolutionary trees from biological sequence data. Nature 364: TAKAHATA, N Overdispersed molecular clock at the major histocompatibility complex loci. Proc. R. Sot. Lond. B 243: TAMURA, K., and M. NEI Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: THORNE, J. L., H. KISHINO, and J. FELSENSTEIN An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33: (Erratum: J. Mol. Evol. 34:9 1 [ ) Inching toward reliability: an improved likelihood model of sequence evolution. J. Mol. Evol. 34:3-16. WAINRIGHT, P. O., G. HINKLE, M. L. SOGIN, and S. K. STICKEL Monophyletic origins of the metazoa: an evolutionary link with the fungi. Science 260: WAKELEY, J Substitution rate variation among sites in hypervariable region 1 of human mitochondrial DNA. J. Mol. Evol. 37: Substitution rate variation among sites and the estimation of transition bias. Mol. Biol. Evol. 11: WOESE, C. R Bacterial evolution. Microbial. Rev. 51: WOESE, C. R., L. KANDLER, and M. L. WHEELIS Towards a natural system of organisms: proposal for the domains archaea, bacteria, and eucarya. Proc. Natl. Acad. Sci. USA 87: YANG, Z Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39: YANG, Z., N. GOLDMAN, and A. E. FRIDAY Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation. Mol. Biol. Evol. 11: ZILLIG, W., H. P. KLENK, P. PALM, H. LEFFERS, G. PUHLER, F. GROPP, and R. GARRETT Did eukaryotes originate by a fusion event? Endocytobiosis Cell Res. 6: l-25. NAOYUKI TAKAHATA, reviewing Received August 12, 1994 Accepted December 13, 1994 editor

### Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1

Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood

### Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

### How to Build a Phylogenetic Tree

How to Build a Phylogenetic Tree Phylogenetics tree is a structure in which species are arranged on branches that link them according to their relationship and/or evolutionary descent. A typical rooted

### Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.

Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,

### Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

### Parallel Adaptations to High Temperatures in the Archean Eon

Parallel Adaptations to High Temperatures in the Archean Eon Samuel Blanquart a1 Bastien Boussau b1 Anamaria Necşulea b Nicolas Lartillot a Manolo Gouy b June 9, 2008 a LIRMM, CNRS. b BBE, CNRS, Université

### Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics

Distributions of Statistics Used for the Comparison of Models of Sequence Evolution in Phylogenetics Simon Whelan and Nick Goldman Department of Genetics, University of Cambridge Asymptotic statistical

### Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods

J Mol Evol (1997) 44(Suppl 1):S139 S146 Springer-Verlag New York Inc. 1997 Accuracies of Ancestral Amino Acid Sequences Inferred by the Parsimony, Likelihood, and Distance Methods Jianzhi Zhang, Masatoshi

### INTRODUCTION: Topic I: RIBOSOMAL RNA

INTRODUCTION: The rrna gene is the most conserved (least variable) DNA in all cells. Portions of the rdna sequence from distantly related organisms are remarkably similar. This means that sequences from

### A branch-and-bound algorithm for the inference of ancestral. amino-acid sequences when the replacement rate varies among

A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites Tal Pupko 1,*, Itsik Pe er 2, Masami Hasegawa 1, Dan Graur 3, and Nir Friedman

### A comparison of methods for estimating the transition:transversion ratio from DNA sequences

Molecular Phylogenetics and Evolution 32 (2004) 495 503 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev A comparison of methods for estimating the transition:transversion ratio from

### Gene Trees in Species Trees

Computational Biology Class Presentation Gene Trees in Species Trees Wayne P. Maddison Published in Systematic Biology, Vol. 46, No.3 (Sept 1997), pp. 523-536 Overview Gene Trees and Species Trees Processes

### 18.1 Finding Order in Diversity

18.1 Finding Order in Diversity Lesson Objectives Describe the goals of binomial nomenclature and systematics. Identify the taxa in the classification system devised by Linnaeus. Lesson Summary Assigning

### Three Domains of Life

Image from Scientific American blog Three Domains of Life http://www.buzzle.com/articles/three-domains-of-life.html The three-domain system, which classifies life on the planet into three different domains

### Simon Tavarg Statistics Department Colorado State University Fort Collins, CO ABSTRACT

Simon Tavarg Statistics Department Colorado State University Fort Collins, CO 80523 ABSTRACT Some models for the estimation of substitution rates from pairs of functionally homologous DNA sequences are

### Name Class Date. binomial nomenclature. MAIN IDEA: Linnaeus developed the scientific naming system still used today.

Section 1: The Linnaean System of Classification 17.1 Reading Guide KEY CONCEPT Organisms can be classified based on physical similarities. VOCABULARY taxonomy taxon binomial nomenclature genus MAIN IDEA:

### A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

### Unit 6: Classification and Diversity. KEY CONCEPT Organisms can be classified based on physical similarities.

KEY CONCEPT Organisms can be classified based on physical similarities. Linnaeus developed the scientific naming system still used today. Taxonomy is the science of naming and classifying organisms. White

### A response to charges of error in Biology by Miller & Levine

A response to charges of error in Biology by Miller & Levine According to TEA, a citizen disputes two sentences on page 767 of our textbook, Biology, by Miller & Levine. These sentences are: SE 767, par.

### A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution

A Novel Use of Equilibrium Frequencies in Models of Sequence Evolution Nick Goldman and Simon Whelan Department of Zoology, University of Cambridge, U.K. Current mathematical models of amino acid sequence

### 4. Why are common names not good to use when classifying organisms? Give an example.

1. Define taxonomy. Classification of organisms 2. Who was first to classify organisms? Aristotle 3. Explain Aristotle s taxonomy of organisms. Patterns of nature: looked like 4. Why are common names not

### 2. Discuss testable models, including terms for them and why testable matters. How does this relate to the supernatural?

Chapter 1: Introduction The Science of Biology 1. Discuss in your group how the scientific method works, and the difference between inductive and deductive reasoning. Come up with examples of inductive

### Overview. bayesian coalescent analysis (of viruses) The coalescent. Coalescent inference

Overview bayesian coalescent analysis (of viruses) Introduction to the Coalescent Phylodynamics and the Bayesian skyline plot Molecular population genetics for viruses Testing model assumptions Alexei

### II. Pathways of Discovery in Microbiology. 1.6 The Historical Roots of Microbiology. Robert Hooke and Early Microscopy

II. Pathways of Discovery in Microbiology 1.6 The Historical Roots of Microbiology 1.6 The Historical Roots of Microbiology 1.7 Pasteur and the Defeat of Spontaneous Generation 1.8 Koch, Infectious Disease,

### Last Universal Common Ancestor

Last Universal Common Ancestor Nothing in Biology Makes Sense Except in the Light of Evolution Theodosius Dobzhansky (1900 1975) Dephney Mathebula Dieter Winkler Hloniphile Sithole INTRODUCTION LUCA Last

### DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

### Lecture 5 Mutation and Genetic Variation

1 Lecture 5 Mutation and Genetic Variation I. Review of DNA structure and function you should already know this. A. The Central Dogma DNA mrna Protein where the mistakes are made. 1. Some definitions based

### TAXONOMY & the 3 Kingdom idea Chapter 9 There are 2 important videos in this chapter. 1 is Microbial Evolution and is accessed from the Nav page

TAXONOMY & the 3 Kingdom idea Chapter 9 There are 2 important videos in this chapter. 1 is Microbial Evolution and is accessed from the Nav page videos. The other is found as Origins Video with a worksheet

### BIOL 1030 TOPIC 1 LECTURE NOTES Topic 1: Classification and the Diversity of Life (Chapters 25, 26.6)

Topic 1: Classification and the Diversity of Life (Chapters 25, 26.6) I. Background review (Biology 1020 material) A. Scientific Method 1. observations 2. scientific model explains observations makes testable

### Biology Performance Level Descriptors

Limited A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Biology. A student at this level has an emerging ability to describe genetic patterns of

### Molecular Clocks and Tree Dating with r8s and BEAST

Integrative Biology 200B University of California, Berkeley Principals of Phylogenetics: Ecology and Evolution Spring 2011 Updated by Nick Matzke Molecular Clocks and Tree Dating with r8s and BEAST Today

### What Do I Need to Know About Classification?

What Do I Need to Know About Classification? (Chapter 17 Honors) MULTIPLE CHOICE: Circle ALL that are true. There may be MORE THAN one correct answer. *Correct Answer(s) 1. The science that specializes

### Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need

### Evolutionary analysis of gorilla, chimps and humans using sequence divergence

Journal of Bioinformatics and Sequence Analysis Vol. 3(1), pp. 1-5, January 2011 Available online at http://www.academicjournals.org/jbsa ISSN 2141-2464 2011 Academic Journals Full length Research Paper

### The Central Dogma of Molecular Biology

Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines

### Protein Phylogenies and Signature Sequences: A Reappraisal of Evolutionary Relationships among Archaebacteria, Eubacteria, and Eukaryotes

MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, Dec. 1998, p. 1435 1491 Vol. 62, No. 4 1092-2172/98/\$04.00 0 Copyright 1998, American Society for Microbiology. All Rights Reserved. Protein Phylogenies and

### Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

### of interrelatedness and decent with modification. However, technological advances are opening another powerful avenue of comparison: DNA sequences.

SCI115SC Module 3 AVP Transcript Title: Genetic Similarity Title Slide Narrator: Welcome to this presentation on genetic similarity. Slide 2 Title: Underlying the Outwardly Visible Similarities, We Find

### Cells. DNA and Heredity

Cells DNA and Heredity ! Nucleic acids DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) Determines how cell function " change the DNA and you change the nature of the organism Changes of DNA allows

### 2. The domain Archaea is made up of three phyla. Fill in the table below: Phylum Sister group of phylum Example taxon

Study Guide MICROBIAL DIVERSITY BROCK CH. 13 This study guide covers Brock Ch. 13 the Archaea. I will hand out Table 11.3, which is referenced at the start of this chapter. Note that Brock uses Archaeon

### Optimization of Gene Prediction via More Accurate Phylogenetic Substitution Models

Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2011-78 2011 Optimization of

### Organizing Life s Diversity Section 17.1 The History of Classification

Organizing Life s Diversity Section 17.1 The History of Classification Scan Section 1 of the chapter. Write three questions that come to mind from reading the headings and the illustration captions. 1.

### High Throughput Network Analysis

High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department

### A CONTENT STANDARD IS NOT MET UNLESS APPLICABLE CHARACTERISTICS OF SCIENCE ARE ALSO ADDRESSED AT THE SAME TIME.

Biology Curriculum The Georgia Performance Standards are designed to provide students with the knowledge and skills for proficiency in science. The Project 2061 s Benchmarks for Science Literacy is used

### An Investigation of Codon Usage Bias Including Visualization and Quantification in Organisms Exhibiting Multiple Biases

An Investigation of Codon Usage Bias Including Visualization and Quantification in Organisms Exhibiting Multiple Biases Douglas W. Raiford, Travis E. Doom, Dan E. Krane, and Michael L. Raymer Abstract

### MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788

MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,

### Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

13 Multiple Choice RNA and Protein Synthesis Chapter Test A Write the letter that best answers the question or completes the statement on the line provided. 1. Which of the following are found in both

### Network Protocol Analysis using Bioinformatics Algorithms

Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe Marshall_Beddoe@McAfee.com ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol

### Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6

Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues

### Missing data and the accuracy of Bayesian phylogenetics

Journal of Systematics and Evolution 46 (3): 307 314 (2008) (formerly Acta Phytotaxonomica Sinica) doi: 10.3724/SP.J.1002.2008.08040 http://www.plantsystematics.com Missing data and the accuracy of Bayesian

### Introduction to Phylogenetic Analysis

Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.

### Bayesian Phylogeny and Measures of Branch Support

Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The

9th Grade 9th -12th Grade History - Social Science Historical and Social Sciences Analysis Skills Chronological and Spatial Thinking 1. Students compare and contrast the present with the past, evaluating

### FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES

FROM PROTEIN SEQUENCES TO PHYLOGENETIC TREES Robert Hirt Department of Zoology, The Natural History Museum, London Agenda Remind you that molecular phylogenetics is complex the more you know about the

### Q1: Hint: Q2: Hint: Q3: Hint: Q4: Hint: Q5: Hint: Q6: Hint:

Q1: Human origins expert Chris Stringer says that there are still questions about the early part of the human family tree. What is the important question he is asking? Hint: Audio/slideshow: Our family

### MCMC A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C. (Saitou and Nei, 1987) (Swofford and Begle, 1993)

MCMC 1 1 1 DNA 1 2 3 4 5 6 7 A T T T G C T C B T T C C C T C C G C C T C T C D C C T T C T C ( 2) (Saitou and Nei, 1987) (Swofford and Begle, 1993) 1 A B C D (Vos, 2003) (Zwickl, 2006) (Morrison, 2007)

### Achievement Level Descriptors for Biology

Achievement Level Descriptors for Biology Georgia Department of Education September 2015 All Rights Reserved Achievement Levels and Achievement Level Descriptors With the implementation of the Georgia

### Concept 27.1 Structural and functional adaptations contribute to prokaryotic success

Name Period Chapter 27: Bacteria and Archaea Overview 1. The chapter opens with amazing tales of life at the extreme edge. What are the masters of adaptation? Describe what this means and the one case

Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

### NSilico Life Science Introductory Bioinformatics Course

NSilico Life Science Introductory Bioinformatics Course INTRODUCTORY BIOINFORMATICS COURSE A public course delivered over three days on the fundamentals of bioinformatics and illustrated with lectures,

### Student Guide for Mesquite

MESQUITE Student User Guide 1 Student Guide for Mesquite This guide describes how to 1. create a project file, 2. construct phylogenetic trees, and 3. map trait evolution on branches (e.g. morphological

### PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

### Pairwise Sequence Alignment

Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

### S = {1, 2,..., n}. P (1, 1) P (1, 2)... P (1, n) P (2, 1) P (2, 2)... P (2, n) P = . P (n, 1) P (n, 2)... P (n, n)

last revised: 26 January 2009 1 Markov Chains A Markov chain process is a simple type of stochastic process with many social science applications. We ll start with an abstract description before moving

### Schools of Systematics. Schools of Systematics. What is Evolutionary Systematics? What is Evolutionary Systematics? What is Evolutionary Systematics?

Topic 3: Systematics II What are the schools of thought of systematics? How does one make a cladogram? What are higher taxa & ranks? How should they be used in this course? Applying phylogenies Evolution

### AP Biology Learning Objective Cards

1.1 The student is able to convert a data set from a table of numbers that reflect a change in the genetic makeup of a population over time and to apply mathematical methods and conceptual understandings

### Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations

Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of

### Phylogenetic systematics turns over a new leaf

30 Review Phylogenetic systematics turns over a new leaf Paul O. Lewis Long restricted to the domain of molecular systematics and studies of molecular evolution, likelihood methods are now being used in

### Bayesian coalescent inference of population size history

Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39 BEAST tutorials Population

### ECO-1.1: I can describe the processes that move carbon and nitrogen through ecosystems.

Cycles of Matter ECO-1.1: I can describe the processes that move carbon and nitrogen through ecosystems. ECO-1.2: I can explain how carbon and nitrogen are stored in ecosystems. ECO-1.3: I can describe

### SAM Teacher s Guide DNA to Proteins

SAM Teacher s Guide DNA to Proteins Note: Answers to activity and homework questions are only included in the Teacher Guides available after registering for the SAM activities, and not in this sample version.

### Scientific Process Skills: Scientific Process Skills:

Texas University Interscholastic League Contest Event: Science The contest challenges students to read widely in biology, to understand the significance of experiments rather than to recall obscure details,

### SAM Teacher s Guide DNA to Proteins

SAM Teacher s Guide DNA to Proteins Overview Students examine the structure of DNA and the processes of translation and transcription, and then explore the impact of various kinds of mutations. Learning

### The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

Module 3F Protein Synthesis So far in this unit, we have examined: How genes are transmitted from one generation to the next Where genes are located What genes are made of How genes are replicated How

### Grade EOC Biology STAAR and STAAR-M Fall 2012 by Objective

TEKS: (4) The student knows that cells are the basic structures of all living things with specialized parts that perform specific functions, and that viruses are different from cells. Objective: (A) Compare

Marine Microplankton Ecology Reading Microbes dominate our planet, especially the Earth s oceans. The distinguishing feature of microorganisms is their small size, usually defined as less than 200 micrometers

### Ch. 12: DNA and RNA 12.1 DNA Chromosomes and DNA Replication

Ch. 12: DNA and RNA 12.1 DNA A. To understand genetics, biologists had to learn the chemical makeup of the gene Genes are made of DNA DNA stores and transmits the genetic information from one generation

### Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

### Course Name: BIOL 10A. Title: Cellular Biology, Genetics & Evolution. Units: 5

Course Name: BIOL 10A Title: Cellular Biology, Genetics & Evolution Units: 5 Course Description: Investigates the principles governing cell biology, metabolism, genetics, evolution and history of life

### Biology 164 Laboratory PHYLOGENETIC SYSTEMATICS

Biology 164 Laboratory PHYLOGENETIC SYSTEMATICS Objectives 1. To become familiar with the cladistic approach to reconstruction of phylogenies. 2. To construct a character matrix and phylogeny for a group

### How Populations Evolve

How Populations Evolve Darwin and the Origin of the Species Charles Darwin published On the Origin of Species by Means of Natural Selection, November 24, 1859. Darwin presented two main concepts: Life

### Chapter 2 Biological Inventions

Note: When any ambiguity of interpretation is found in this provisional translation, the Japanese text shall prevail. Chapter 2 Biological Inventions (Applied to any patent applications filed on or after

### RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

### In silico comparison of nucleotide composition and codon usage bias between the essential and non- essential genes of Staphylococcus aureus NCTC 8325

International Journal of Current Microbiology and Applied Sciences ISSN: 2319-7706 Volume 3 Number 12 (2014) pp. 8-15 http://www.ijcmas.com Original Research Article In silico comparison of nucleotide

### Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes

Gene 295 (2002) 205 211 www.elsevier.com/locate/gene Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes Yves Van de Peer a, *, Tancred Frickey b,

### THE HISTORY OF LIFE AND NOW FOR A BRIEF INTRODUCTION TO: FUNDAMENTAL CHARACTERISTICS OF LIFE

AND NOW FOR A BRIEF INTRODUCTION TO: THE HISTORY OF LIFE FUNDAMENTAL CHARACTERISTICS OF LIFE Energy acquisition and utilization --- metabolism, growth, behavior. Information storage --- presence of a genome

### Lab 10 Mitosis. Background. Mitosis. Prokaryotic fission. Prophase During prophase, the chromatin. Eukaryotic cell division

Lab 10 Mitosis Background Reproduction means producing a new organism from an existing organism. The new offspring must receive hereditary information and enough cytoplasmic material to maintain its own

### Y-DNA FACT SHEET. Bruce A. Crawford

Y-DNA FACT SHEET By Bruce A. Crawford For those not familiar with DNA analysis and particularly Y-DNA, the following explanation may help. DNA is the basic building block of cell information and heredity.

### Mammalian Housekeeping Genes Evolve more Slowly. than Tissue-specific Genes

MBE Advance Access published October 31, 2003 Mammalian Housekeeping Genes Evolve more Slowly than Tissue-specific Genes Liqing Zhang and Wen-Hsiung Li Department of Ecology and Evolution, University of

### Evolution of early cells

1 Life: Early Cells, Classification of Life EVPP 110 Lecture Fall 2003 Dr. Largen 2 evolution of cells earliest cells prokaryotic cells eukaryotic cells classification of life 3 Evolution of early cells

### THE INTERESTING HISTORY OF CELLS STUDENT HANDOUT. There are two basic types of cells, prokaryotic cells and eukaryotic cells.

THE INTERESTING HISTORY OF CELLS STUDENT HANDOUT There are two basic types of cells, prokaryotic cells and eukaryotic cells. Figure 1. A PROKARYOTIC CELL The prokaryotes are very small singlecelled organisms

### Markov chains and Markov Random Fields (MRFs)

Markov chains and Markov Random Fields (MRFs) 1 Why Markov Models We discuss Markov models now. This is the simplest statistical model in which we don t assume that all variables are independent; we assume

### Lab 2/Phylogenetics/September 16, 2002 1 PHYLOGENETICS

Lab 2/Phylogenetics/September 16, 2002 1 Read: Tudge Chapter 2 PHYLOGENETICS Objective of the Lab: To understand how DNA and protein sequence information can be used to make comparisons and assess evolutionary

### The root of the universal tree and the origin of eukaryotes based

Proc. Natl. Acad. Sci. USA Vol. 93, pp. 7749-7754, July 1996 Evolution The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny SANDRA L. ALDAUF*t, JEFFREY D. PALMERt,

### Base Compositional Bias and Phylogenetic Analyses: A Test of the Flying DNA Hypothesis

MOLECULAR PHYLOGENETICS AND EVOLUTION Vol. 3, No. 3, December, pp. 408 46, 998 ARTICLE NO. FY98053 Base Compositional Bias and Phylogenetic Analyses: A Test of the Flying DNA Hypothesis Ronald A. Van Den

### Taxonomy Five Kingdoms

Taxonomy Five Kingdoms R.H Whittaker 1969 Three Domains Evolutionary Trees Cladistics: Cladograms and molecular data What are the tools used by scientists to observe and understand evolutionary relationships?

### PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference Stephane Guindon, F. Le Thiec, Patrice Duroux, Olivier Gascuel To cite this version: Stephane Guindon, F. Le Thiec, Patrice

### Amino Acids and Their Properties

Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

### DnaSP, DNA polymorphism analyses by the coalescent and other methods.

DnaSP, DNA polymorphism analyses by the coalescent and other methods. Author affiliation: Julio Rozas 1, *, Juan C. Sánchez-DelBarrio 2,3, Xavier Messeguer 2 and Ricardo Rozas 1 1 Departament de Genètica,

### Theory of Evolution. A. the beginning of life B. the evolution of eukaryotes C. the evolution of archaebacteria D. the beginning of terrestrial life

Theory of Evolution 1. In 1966, American biologist Lynn Margulis proposed the theory of endosymbiosis, or the idea that mitochondria are the descendents of symbiotic, aerobic eubacteria. What does the