A short guide to phylogeny reconstruction
E. Michu
Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic

ABSTRACT

This review is a short introduction to phylogenetic analysis. Phylogenetic analysis allows comprehensive understanding of the origin and evolution of species. Generally, it is possible to construct phylogenetic trees according to different features and characters (e.g. morphological and anatomical characters, RAPD patterns, FISH patterns, sequences of DNA/RNA, and amino acid sequences). DNA sequences are preferable for phylogenetic analyses of closely related species, whereas amino acid sequences are used for phylogenetic analyses of more distant relationships. The sequences can be analysed using many computer programs. The methods most often used for phylogenetic analyses are neighbor-joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference.

Keywords: sequence alignment; phylogenetic analysis; neighbor-joining; maximum parsimony; maximum likelihood; Bayesian inference

Phylogenetic analysis allows comprehensive understanding of the origin and evolution of species. The main aim of this article is to provide basic information and to discuss difficulties in phylogeny reconstruction.

The result of phylogeny reconstruction is a phylogenetic tree, which can be either rooted or unrooted. In an unrooted tree, groupings are inferred, but no direction of evolutionary change is implied; the tree only displays the relationships between the taxa (Figure 1). Unlike the unrooted tree, the rooted tree implies directionality in time and shows the relationships with regard to an outgroup (Figure 2) (reviewed in Doyle and Gaut 2000). To root the tree, it is necessary to add an outgroup, which is a group of species or a single species from outside the group of species under study (reviewed in Harrison and Langdale 2006).
The outgroup can be selected on the basis of prior knowledge of the group of interest, or may become apparent during the sequence alignment. Generally, the most informative outgroup is the actual sister group.

Figure 1. An example of unrooted tree

Figure 2. An example of rooted tree

This contribution was presented at the 4th Conference on Methods in Experimental Biology held in 2006 in the Horizont Hotel in Šumava, Czech Republic.
PLANT SOIL ENVIRON., 53, 2007 (10):

The first step in the phylogeny reconstruction is to choose the species used in the phylogenetic analyses. The selection of species is very important because a wrong (incongruent) selection (e.g. too few taxa or the wrong taxa) can negatively influence the phylogeny reconstruction; an example of incongruence arising from wrong taxon selection is discussed by Soltis et al. (2004).

Generally, it is possible to construct phylogenetic trees according to different features and characters (e.g. morphological and anatomical characters, RAPD patterns, FISH patterns, sequences of DNA/RNA, and amino acid sequences). In the case of molecular phylogeny based on sequencing data, an important consideration in building molecular trees from protein-coding genes is whether to analyse the sequences at the DNA or the protein level. In DNA sequences, there are only four possible nucleotides, and when DNA substitution rates are high, the probability that two lineages will independently evolve the same nucleotide at the same site increases (reviewed in Bergsten 2005). However, the increased number of characters in nucleotide sequences can lead to a better resolution of the tree. DNA sequences are used for phylogenetic analyses of closely related species because there is more information at the DNA level than at the protein level. Moreover, by coding amino acid characters the potentially informative silent substitutions are ignored (Simmons and Freudenstein 2002).

At the protein level, there are more possible character states than at the nucleotide level (20 amino acids versus 4 nucleotides). Amino acid sequences are used for phylogenetic analyses of more distant relationships (Simmons and Freudenstein 2002). Generally, they are analysed when transitions and/or third-codon positions are determined to be saturated, as indicated by high divergence values between sequences (reviewed in Simmons 2000). However, in some cases DNA sequences are still used for phylogenetic analyses of distant relationships, but then it is important to remove the third codon positions as these could present pure noise (reviewed in Baldauf 2003).

The primary source of data used for molecular phylogenetic analyses are sequenced PCR or RT-PCR products, which are either used directly in analyses or translated to amino acid sequences. The obtained PCR (RT-PCR) products can be sequenced directly, or they can be cloned into a suitable vector and the inserted PCR product then sequenced. Direct sequencing makes it possible to find polymorphisms in the sequence, in contrast to sequencing of a cloned PCR product insert (sequenced plasmids coming from one colony contain only one variant of the gene). Subsequent segregation analysis of the polymorphism found makes it possible to identify whether different alleles or different copies of the gene were found.

If the phylogenetic tree is constructed from DNA sequences, either repetitive sequences (for example rDNA spacers) or single-copy genes can be chosen for further analyses. The main disadvantage of repetitive sequences is that not all of their copies are identical. For example, Desfeux and Lejeune (1996) sequenced the rDNA spacer and obtained two types of sequences of Silene dioica with completely different branching patterns. Therefore, it is better to sequence more monomers. Furthermore, repetitive sequences do not always offer sufficient distinction between closely related species. On the other hand, phylogenetic analysis of introns of nuclear genes can provide better resolution than repetitive sequences. However, in the case of less conserved sequences, it can be difficult to find orthologs from all studied species by PCR with the same pair of primers.

In general, the genes used for the analysis can have more than one copy in the respective genome (i.e. they are paralogous), which may cause problems in the construction of phylogenetic trees. For example, consider a gene with two copies in all analysed species. If both copies of this gene are detected, the resulting phylogenetic tree will agree with the true relationships between the species (Figure 4). However, when different copies are found in different species, the resulting phylogenetic tree will not provide the correct relationships between the species (Figure 3) (reviewed in Baldauf 2003). To distinguish between a single-copy and a multicopy gene, Southern or PCR analysis can be performed.

Figure 3. Phylogenetic tree constructed on the basis of different copies of a gene. This tree does not provide the correct relationships between the species

Figure 4. Phylogenetic tree constructed according to all found paralogs of a gene. This tree indicates the correct relationships between the species

In some cases, sequencing of a single gene does not provide the best resolution in the phylogenetic tree. For this reason, it is better to sequence more genes (Sanderson and Driskell 2003). A phylogenetic tree constructed from a larger number of sequences coming from multiple genes generally provides much more information than a tree constructed from the sequences of one gene.

The second step in the phylogeny reconstruction is to check the sequences from the studied species and to align them. The resulting sequences can be visualized, for example, using the program BioEdit, available at: BioEdit/bioedit.html (Hall 1999). Besides BioEdit, it is possible to use the program MEGA3 (Kumar et al. 2004), which I consider not very intuitive. The alignment of the sequences can be performed automatically or manually. Automatic alignments may fail to correctly identify conserved regions, whereas manual alignments allow this, but they are much more laborious. Using a computer-based alignment as a guide to manual alignment offers a good compromise. For example, the programs Clustal W1.81, available at: Downloads/clustalw.html (Thompson et al. 1994); T-Coffee, available at: (Notredame et al. 2000); Muscle, available at: com/muscle/ (Edgar 2004); Musca, available at: (Parida et al. 1998); Mafft, available at: (Katoh et al. 2002); and ProbCons, available at: probcons.stanford.edu/ (Do et al. 2005) can be used for automatic alignment of sequences. To check and manually correct the alignments, the program SeaView, available at: univ-lyon1.fr/software/seaview.html (Galtier et al. 1996), can be used.

Once the data are aligned, many different types of phylogenetic analyses can be performed (Holder and Lewis 2003). The methods for calculating phylogenetic trees can be generally divided into two categories: distance-matrix based methods, also known as clustering or algorithmic methods (e.g. neighbor-joining, Fitch-Margoliash, UPGMA), and discrete data based methods, also known as tree searching methods (maximum parsimony, maximum likelihood and Bayesian methods). The distance approach is relatively simple and direct.
The distance (roughly, the percent sequence difference) is calculated for all pairwise combinations of operational taxonomic units, and then the distances are assembled into a tree. Discrete data methods examine each column of the alignment separately and look for the tree that best accommodates all of this information. Unsurprisingly, distance methods are much faster than discrete data methods. However, a distance analysis yields little information other than the tree, while discrete data analyses are information rich. There is a hypothesis for every column in the alignment, so it is possible to trace the evolution at specific sites in the molecule (e.g. catalytic sites or regulatory regions; reviewed in Baldauf 2003).

The most often used methods for phylogenetic analyses are neighbor-joining (NJ), maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference.

NJ is a fast method suited for large datasets. It permits different branch lengths, indicating the evolutionary time or the amount of evolutionary change along the branch. The disadvantages of this method are that it gives only one possible tree and that it is strongly dependent on the model of evolution used (a model is a mathematical description of sequence evolution, and it can be complicated to incorporate other biologically important processes like insertions or deletions; Saitou and Nei 1987). Moreover, the algorithm reduces the sequence information when transforming the data to the distance matrix, which can also be disadvantageous.

The maximum parsimony method is based on shared and derived characters. It does not reduce sequence information to a single number. It works with the original data (the alignment) and tries to provide information about the ancestral sequences.
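To make concrete what reducing sequence information to a single number means for distance methods, the following is a minimal Python sketch of the pairwise step described above: the proportion of differing sites (p-distance) is computed for every pair of aligned sequences. The function names, the gap handling (columns with a gap in either sequence are skipped) and the toy sequences are assumptions made for illustration; real analyses usually apply a substitution model to correct these raw distances.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of compared alignment columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption for this sketch: columns with a gap in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for all pairs of sequences."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy alignment, not taken from the article.
alignment = {
    "species_D": "ACGTACGTAC",
    "species_E": "ACGTACGTAT",
    "species_F": "ACGAACGTTT",
}

for pair, dist in distance_matrix(alignment).items():
    print(pair, f"{dist:.2f}")
```

Each pair of sequences ends up as one number, which is all that a clustering method such as NJ sees afterwards; the information about which sites differ is lost at this point.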
The principle of maximum parsimony is to find a tree with the smallest number of evolutionary changes (based on the theory that evolution prefers the smallest number of mutations); in comparison with the distance based methods it is relatively slow. MP does not use all the sequence information because only informative sites are used, and it does not provide information on the branch lengths. An advantage of the maximum parsimony method is that it does not imply a specific model of evolution and provides all equally parsimonious topologies.

A specific problem of MP is long-branch attraction. It is a phenomenon in phylogenetic analyses (especially those using maximum parsimony) in which rapidly evolving lineages are inferred to be closely related, regardless of their true evolutionary relationships (reviewed in Bergsten 2005). This problem can be minimized by using methods that accommodate differential rates of substitution among lineages (e.g. maximum likelihood), or by breaking up long branches by adding taxa that are related to those with the long branches (reviewed in Bergsten 2005).

The principle of maximum likelihood is to find the tree topology that explains the relationships between the sequences with the highest probability. The ML method requires a model of evolution. This is an advantage because it makes us aware of the assumptions being made. The disadvantage is that the use of inadequate likelihood models can lead to incorrect interpretation of real data sets. To decide which model best fits the data, the likelihood values given by the different models for the data are calculated and compared. The model to choose is the simplest model that gives a likelihood not significantly lower than the likelihood given by a more general model (reviewed in Felsenstein 2004).

Bayesian inference suggests a natural way to accommodate uncertainty in phylogenies and provides an intuitive measure of support for trees and a practical way to estimate large phylogenies using a statistical approach (Huelsenbeck et al. 2000).
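One common way to make "a likelihood not significantly lower" concrete for the ML model choice described above is a likelihood ratio test between nested models. The sketch below is only an illustration under stated assumptions: the two models are nested and differ by exactly one free parameter, so that twice the difference in log-likelihood can be compared with a chi-square distribution with one degree of freedom. The function names and the log-likelihood values are hypothetical and do not come from the article.

```python
import math


def lrt_one_extra_parameter(lnl_simple, lnl_general):
    """Likelihood ratio test for nested models differing by one free parameter.

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    # Clamp at zero: a nested simpler model should not fit better, but
    # numerical optimisation can produce tiny negative differences.
    statistic = max(0.0, 2.0 * (lnl_general - lnl_simple))
    # Upper tail probability of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def choose_model(lnl_simple, lnl_general, alpha=0.05):
    """Keep the simpler model unless the general one fits significantly better."""
    _, p_value = lrt_one_extra_parameter(lnl_simple, lnl_general)
    return "general" if p_value < alpha else "simple"


# Hypothetical log-likelihoods for two nested substitution models.
print(choose_model(-1234.5, -1233.9))  # small improvement in fit
print(choose_model(-1234.5, -1228.0))  # large improvement in fit
```

The design mirrors the rule in the text: the simpler model is the default, and the more general model is chosen only when its gain in likelihood is larger than chance alone would be expected to produce. Models that are not nested need a different criterion.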
Bayesian inference of phylogenetic trees uses Markov Chain Monte Carlo (MCMC) to approximate the posterior probabilities of trees (Huelsenbeck and Ronquist 2001). The most important aspect of MCMC Bayesian inference is its computational efficiency. The method allows the incorporation of complex models of the DNA substitution process and of other aspects of evolution. Although Bayesian analysis using MCMC is an elegant method for solving many problems, it is relatively new and there are a number of unresolved questions, e.g. convergence of the Markov chain and the discrepancy between Bayesian posterior probabilities and nonparametric bootstrap values (Huelsenbeck et al. 2002).

Phylogenetic analysis can be performed, for example, using the program PhyloWin, available at: (Galtier et al. 1996). The direct output of a phylogenetic analysis is a set of files in a user-unfriendly format. To visualise and edit the tree, many different programs can be used, for instance NJplot (Perrière and Gouy 1996, accessible on univ-lyon1.fr/software/njplot.html).

It is important to know how well the individual branches are supported within the tree. Finally, the reliability of the resulting phylogeny can be assessed by different methods. The most commonly used method is bootstrapping (reviewed in Felsenstein 1985). This technique assesses whether the whole dataset supports the tree, or whether the tree is only a marginal winner among many nearly equal alternatives. In this method, numerous subsamples are generated. Each subsample has the same size as the original, which is accomplished by random sampling of sites with replacement. A tree is constructed from each subsample, and the frequency with which each branching pattern is reproduced in the subsample trees is calculated (Hillis and Bull 1993). The bootstrap value of a grouping is the percentage of subsample trees in which the sequences were grouped together.
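The resampling procedure described above can be sketched in a few lines of Python. This is a hedged illustration, not a reimplementation of any of the programs mentioned: the function names are hypothetical, and instead of building a full tree for each subsample the sketch uses a deliberately crude stand-in (the pair of sequences with the fewest differences) so that the resampling logic stays visible.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw one bootstrap subsample: columns sampled with replacement,
    giving a pseudo-alignment of the same length as the original."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building (an assumption of this sketch): return the
    pair of sequences with the fewest differences. Ties go to the pair that
    comes first in sorted name order."""
    def differences(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return min(combinations(sorted(alignment), 2), key=differences)


def bootstrap_support(alignment, pair, replicates=200, seed=1):
    """Percentage of subsamples in which `pair` is recovered as the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical toy alignment, not taken from the article.
alignment = {
    "species_D": "ACGTACGTACGTACGT",
    "species_E": "ACGTACGTACGAACGT",
    "species_F": "ACCTAGGTACGAACTT",
    "species_G": "TCCTAGGAACGAACTA",
}
support = bootstrap_support(alignment, ("species_D", "species_E"))
print(f"support for D+E: {support:.0f}%")
```

In a real analysis the stand-in would be replaced by the tree-building method used for the original data, and support would be tallied for every grouping in the tree rather than for a single pair.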
For example, if a given grouping of species is found in every subsample tree, then its bootstrap support is 100%. If it is found in only two-thirds of the subsample trees, its bootstrap support is 67%. Generally, bootstrap values of 70% and higher indicate reliable groupings. When the bootstrap values all over the tree are low, this can indicate problems with long-branch attraction. It is then possible to remove the long-branch sequences from the dataset and observe whether the bootstrap values increase.

To present phylogenetic trees, there are several widely accepted rules. Branch lengths are almost always drawn to scale. Bootstrap values should be displayed as percentages, and only values of 50% and higher are presented, which makes the tree easier to read and to compare with other trees.

To conclude, phylogenetic analysis is a powerful tool for organizing and interpreting molecular data. With even a very basic understanding of general principles and conventions, it is possible to glean valuable information from a phylogenetic tree on the origin, evolution and possible function of genes and the proteins they might encode (reviewed in Baldauf 2003). I hope that this short article will help readers to understand the basic problems of phylogenetic analysis.
REFERENCES

Baldauf S.L. (2003): Phylogeny for the faint of heart: a tutorial. Trends Genet., 19:
Bergsten J. (2005): A review of long-branch attraction. Cladistics, 21:
Desfeux C., Lejeune B. (1996): Systematics of Euromediterranean Silene (Caryophyllaceae): Evidence from a phylogenetic analysis using ITS sequences. Life Sci., 319:
Do C.B., Mahabhashyam M.S.P., Brudno M., Batzoglou S. (2005): PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Res., 15:
Doyle J.J., Gaut B.S. (2000): Evolution of genes and taxa: a primer. Plant Mol. Biol., 42:
Edgar R.C. (2004): MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32:
Felsenstein J. (1985): Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39:
Felsenstein J. (2004): Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts.
Galtier N., Gouy M., Gautier C. (1996): SeaView and PhyloWin, two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci., 12:
Hall T.A. (1999): BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser., 41:
Harrison C.J., Langdale J.A. (2006): A step by step guide to phylogeny reconstruction. Plant J., 45:
Hillis D.M., Bull J.J. (1993): An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analyses. Syst. Biol., 42:
Holder M., Lewis P.O. (2003): Phylogeny estimation: traditional and Bayesian inference of phylogenetic trees. Bioinformatics, 17:
Huelsenbeck J.P.H., Larget B., Miller R.E., Ronquist F. (2002): Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol., 51:
Huelsenbeck J.P., Rannala B., Masly J.P. (2000): Accommodating phylogenetic uncertainty in evolutionary studies. Science, 288:
Huelsenbeck J.P., Ronquist F. (2001): MRBAYES: Bayesian inference of phylogeny. Bioinformatics, 17:
Katoh K., Misawa K., Kuma K., Miyata T. (2002): MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res., 30:
Kumar S., Tamura K., Nei M. (2004): MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief. Bioinform., 5:
Notredame C., Higgins D., Heringa J. (2000): T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol., 302:
Parida L., Floratos A., Rigoutsos I. (1998): Musca: An algorithm for constrained alignment of multiple data sequences. In: Proc. 9th Workshop on Genome Informatics, Tokyo, Japan.
Perrière G., Gouy M. (1996): WWW-Query: An online retrieval system for biological sequence banks. Biochimie, 78:
Saitou N., Nei M. (1987): The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4:
Sanderson M.J., Driskell A.C. (2003): The challenge of constructing large phylogenetic trees. Trends Plant Sci., 8:
Simmons M.P. (2000): A fundamental problem with amino-acid-sequence characters for phylogenetic analyses. Cladistics, 16:
Simmons M.P., Freudenstein J.V. (2002): Artifacts of coding amino acids and other composite characters for phylogenetic analysis. Cladistics, 18:
Soltis D.E., Albert V.A., Savolainen V., Hilu K., Qiu Y.L., Chase M.W., Farris J.S., Stefanović S., Rice D.W., Palmer J.D., Soltis P.S. (2004): Genome-scale data, angiosperm relationships, and ending incongruence: a cautionary tale in phylogenetics. Trends Plant Sci., 9:
Thompson J.D., Higgins D.G., Gibson T.J. (1994): CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22:

Received on February 27, 2007

Corresponding author: Mgr. Elleni Michu, Akademie věd České republiky, Biofyzikální ústav, v. v. i., Královopolská 135, Brno, Česká republika
Phylogenetic Trees Made Easy
Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts
Bayesian Phylogeny and Measures of Branch Support
Bayesian Phylogeny and Measures of Branch Support Bayesian Statistics Imagine we have a bag containing 100 dice of which we know that 90 are fair and 10 are biased. The
PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: [email protected]
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
Bio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference
PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference Stephane Guindon, F. Le Thiec, Patrice Duroux, Olivier Gascuel To cite this version: Stephane Guindon, F. Le Thiec, Patrice
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues
Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1
Maximum-Likelihood Estimation of Phylogeny from DNA Sequences When Substitution Rates Differ over Sites1 Ziheng Yang Department of Animal Science, Beijing Agricultural University Felsenstein s maximum-likelihood
Introduction to Phylogenetic Analysis
Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.
A comparison of methods for estimating the transition:transversion ratio from DNA sequences
Molecular Phylogenetics and Evolution 32 (2004) 495 503 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev A comparison of methods for estimating the transition:transversion ratio from
The Central Dogma of Molecular Biology
Vierstraete Andy (version 1.01) 1/02/2000 -Page 1 - The Central Dogma of Molecular Biology Figure 1 : The Central Dogma of molecular biology. DNA contains the complete genetic information that defines
A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML
9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze
DNA Sequence Alignment Analysis
Analysis of DNA sequence data p. 1 Analysis of DNA sequence data using MEGA and DNAsp. Analysis of two genes from the X and Y chromosomes of plant species from the genus Silene The first two computer classes
Final Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
Network Protocol Analysis using Bioinformatics Algorithms
Network Protocol Analysis using Bioinformatics Algorithms Marshall A. Beddoe [email protected] ABSTRACT Network protocol analysis is currently performed by hand using only intuition and a protocol
Genome Explorer For Comparative Genome Analysis
Genome Explorer For Comparative Genome Analysis Jenn Conn 1, Jo L. Dicks 1 and Ian N. Roberts 2 Abstract Genome Explorer brings together the tools required to build and compare phylogenies from both sequence
DnaSP, DNA polymorphism analyses by the coalescent and other methods.
DnaSP, DNA polymorphism analyses by the coalescent and other methods. Author affiliation: Julio Rozas 1, *, Juan C. Sánchez-DelBarrio 2,3, Xavier Messeguer 2 and Ricardo Rozas 1 1 Departament de Genètica,
Protein Sequence Analysis - Overview -
Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein
Algorithms in Computational Biology (236522) spring 2007 Lecture #1
Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office
Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations
Activity IT S ALL RELATIVES The Role of DNA Evidence in Forensic Investigations SCENARIO You have responded, as a result of a call from the police to the Coroner s Office, to the scene of the death of
Comparing Bootstrap and Posterior Probability Values in the Four-Taxon Case
Syst. Biol. 52(4):477 487, 2003 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150390218213 Comparing Bootstrap and Posterior Probability Values
PHYLOGENETIC ANALYSIS
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);
Pairwise Sequence Alignment
Pairwise Sequence Alignment [email protected] SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
A data management framework for the Fungal Tree of Life
Web Accessible Sequence Analysis for Biological Inference A data management framework for the Fungal Tree of Life Kauff F, Cox CJ, Lutzoni F. 2007. WASABI: An automated sequence processing system for multi-gene
Keywords: evolution, genomics, software, data mining, sequence alignment, distance, phylogenetics, selection
Sudhir Kumar has been Director of the Center for Evolutionary Functional Genomics in The Biodesign Institute at Arizona State University since 2002. His research interests include development of software,
Bayesian coalescent inference of population size history
Bayesian coalescent inference of population size history Alexei Drummond University of Auckland Workshop on Population and Speciation Genomics, 2016 1st February 2016 1 / 39 BEAST tutorials Population
PHYLOGENY AND COMPARATIVE METHODS SYMBIOMICS WORKSHOP
PHYLOGENY AND COMPARATIVE METHODS SYMBIOMICS WORKSHOP March 4-7, 2013 Valencia, Spain Parc Cientific of the University of Valencia Goals The aim of this workshop is to provide the attendees with a broad
Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: [email protected]
Visualization of Phylogenetic Trees and Metadata
Visualization of Phylogenetic Trees and Metadata November 27, 2015 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com [email protected]
Bioinformatics Resources at a Glance
Bioinformatics Resources at a Glance A Note about FASTA Format There are MANY free bioinformatics tools available online. Bioinformaticists have developed a standard format for nucleotide and protein sequences
Introduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Principles of Evolution - Origin of Species
Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL
What mathematical optimization can, and cannot, do for biologists Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL Introduction There is no shortage of literature about the
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS
BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS NEW YORK CITY COLLEGE OF TECHNOLOGY The City University Of New York School of Arts and Sciences Biological Sciences Department Course title:
Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
Focusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 15: lecture 5 Substitution matrices Multiple sequence alignment A teacher's dilemma To understand... Multiple sequence alignment Substitution matrices Phylogenetic trees You first need
SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications
Product Bulletin Sequencing Software SeqScape Software Version 2.5 Comprehensive Analysis Solution for Resequencing Applications Comprehensive reference sequence handling Helps interpret the role of each
OD-seq: outlier detection in multiple sequence alignments
Jehl et al. BMC Bioinformatics (2015) 16:269 DOI 10.1186/s12859-015-0702-1 RESEARCH ARTICLE Open Access OD-seq: outlier detection in multiple sequence alignments Peter Jehl, Fabian Sievers * and Desmond
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources
National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
High Throughput Network Analysis
High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department
Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Name: Class: Date: Chapter 17 Practice Multiple Choice Identify the choice that best completes the statement or answers the question. 1. The correct order for the levels of Linnaeus's classification system,
Innovations in Molecular Epidemiology
Innovations in Molecular Epidemiology Molecular Epidemiology Measure current rates of active transmission Determine whether recurrent tuberculosis is attributable to exogenous reinfection Determine whether
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE Q5B
INTERNATIONAL CONFERENCE ON HARMONISATION OF TECHNICAL REQUIREMENTS FOR REGISTRATION OF PHARMACEUTICALS FOR HUMAN USE ICH HARMONISED TRIPARTITE GUIDELINE QUALITY OF BIOTECHNOLOGICAL PRODUCTS: ANALYSIS
An experimental study comparing linguistic phylogenetic reconstruction methods *
An experimental study comparing linguistic phylogenetic reconstruction methods * François Barbançon, a Steven N. Evans, b Luay Nakhleh c, Don Ringe, d and Tandy Warnow, e, a Palantir Technologies, 100
Chapter 8: Recombinant DNA 2002 by W. H. Freeman and Company
Genetic engineering: humans Gene replacement therapy or gene therapy Many technical and ethical issues implications for gene pool for germ-line gene therapy what traits constitute disease rather than just
DNA Insertions and Deletions in the Human Genome. Philipp W. Messer
DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.
Molecular Clocks and Tree Dating with r8s and BEAST
Integrative Biology 200B University of California, Berkeley Principles of Phylogenetics: Ecology and Evolution Spring 2011 Updated by Nick Matzke Molecular Clocks and Tree Dating with r8s and BEAST Today
Data Partitions and Complex Models in Bayesian Analysis: The Phylogeny of Gymnophthalmid Lizards
Syst. Biol. 53(3):448–469, 2004 Copyright © Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490445797 Data Partitions and Complex Models in Bayesian Analysis:
AP Biology Essential Knowledge Student Diagnostic
AP Biology Essential Knowledge Student Diagnostic Background The Essential Knowledge statements provided in the AP Biology Curriculum Framework are scientific claims describing phenomenon occurring in
MAKING AN EVOLUTIONARY TREE
Student manual MAKING AN EVOLUTIONARY TREE THEORY The relationship between different species can be derived from different information sources. The connection between species may turn out by similarities
Name Class Date. binomial nomenclature. MAIN IDEA: Linnaeus developed the scientific naming system still used today.
Section 1: The Linnaean System of Classification 17.1 Reading Guide KEY CONCEPT Organisms can be classified based on physical similarities. VOCABULARY taxonomy taxon binomial nomenclature genus MAIN IDEA:
Molecular typing of VTEC: from PFGE to NGS-based phylogeny
Molecular typing of VTEC: from PFGE to NGS-based phylogeny Valeria Michelacci 10th Annual Workshop of the National Reference Laboratories for E. coli in the EU Rome, November 5th 2015 Molecular typing
Genetomic Promototypes
Genetomic Promototypes Mirkó Palla and Dana Pe'er Department of Mechanical Engineering Clarkson University Potsdam, New York and Department of Genetics Harvard Medical School 77 Avenue Louis Pasteur Boston,
BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
Scaling the gene duplication problem towards the Tree of Life: Accelerating the rspr heuristic search
Scaling the gene duplication problem towards the Tree of Life: Accelerating the rspr heuristic search André Wehe 1 and J. Gordon Burleigh 2 1 Department of Computer Science, Iowa State University, Ames,
Data for phylogenetic analysis
Data for phylogenetic analysis The data that are used to estimate the phylogeny of a set of tips are the characteristics of those tips. Therefore the success of phylogenetic inference depends in large
Amino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rRNA and mutations Ribosomal RNA (rRNA) evolves very slowly Much slower than proteins ss-rRNA is typically used So by aligning ss-rRNA of one organism with that
Evaluating the Performance of a Successive-Approximations Approach to Parameter Optimization in Maximum-Likelihood Phylogeny Estimation
Evaluating the Performance of a Successive-Approximations Approach to Parameter Optimization in Maximum-Likelihood Phylogeny Estimation Jack Sullivan, Zaid Abdo, Paul Joyce, and David L. Swofford
PHYLOGENY AND EVOLUTION OF NEWCASTLE DISEASE VIRUS GENOTYPES
Eötvös Lóránd University Biology Doctorate School Classical and molecular genetics program Project leader: Dr. László Orosz, corresponding member of HAS PHYLOGENY AND EVOLUTION OF NEWCASTLE DISEASE VIRUS
Building a phylogenetic tree
bioscience explained 134567 Wojciech Grajkowski Szkoła Festiwalu Nauki, ul. Ks. Trojdena 4, 02-109 Warszawa Building a phylogenetic tree Aim This activity shows how phylogenetic trees are constructed using
Becker Muscular Dystrophy
Muscular Dystrophy A Case Study of Positional Cloning Described by Benjamin Duchenne (1868) X-linked recessive disease causing severe muscular degeneration. 100 % penetrance X d Y affected male Frequency
Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
4 Techniques for Analyzing Large Data Sets
4 Techniques for Analyzing Large Data Sets Pablo A. Goloboff Contents 1 Introduction 70 2 Traditional Techniques 71 3 Composite Optima: Why Do Traditional Techniques Fail? 72 4 Techniques for Analyzing
Clone Manager. Getting Started
Clone Manager for Windows Professional Edition Volume 2 Alignment, Primer Operations Version 9.5 Getting Started Copyright 1994-2015 Scientific & Educational Software. All rights reserved. The software
2.3 Identify rRNA sequences in DNA
2.3 Identify rRNA sequences in DNA For identifying rRNA sequences in DNA we will use RNAmmer, a program that implements an algorithm designed to find rRNA sequences in DNA [5]. The program was made by
Bioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As the amount of data from genomics and proteomics experiments increases, problems arise for the current sequence analysis
CCR Biology - Chapter 9 Practice Test - Summer 2012
Name: Class: Date: CCR Biology - Chapter 9 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Genetic engineering is possible
Model-based Synthesis. Tony O'Hagan
Model-based Synthesis Tony O'Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
Phylogenetic Analysis using MapReduce Programming Model
2015 IEEE International Parallel and Distributed Processing Symposium Workshops Phylogenetic Analysis using MapReduce Programming Model Siddesh G M, K G Srinivasa*, Ishank Mishra, Abhinav Anurag, Eklavya
GenBank, Entrez, & FASTA
GenBank, Entrez, & FASTA Nucleotide Sequence Databases First generation GenBank is a representative example started as sort of a museum to preserve knowledge of a sequence from first discovery great repositories,
Sample Size and Power in Clinical Trials
Sample Size and Power in Clinical Trials Version 1.0 May 2011 1. Power of a Test 2. Factors affecting Power 3. Required Sample Size RELATED ISSUES 1. Effect Size 2. Test Statistics 3. Variation 4. Significance
MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788
MATCH Communications in Mathematical and in Computer Chemistry MATCH Commun. Math. Comput. Chem. 61 (2009) 781-788 ISSN 0340-6253 Three distances for rapid similarity analysis of DNA sequences Wei Chen,
A Tutorial in Genetic Sequence Classification Tools and Techniques
A Tutorial in Genetic Sequence Classification Tools and Techniques Jake Drew Data Mining CSE 8331 Southern Methodist University www.jakemdrew.com Sequence Characters IUPAC nucleotide
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON, SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions
BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES, Vol. 59, No. 1, 2011 DOI: 10.2478/v10175-011-0015-0 Varia A greedy algorithm for the DNA sequencing by hybridization with positive and negative
Biological Sciences Initiative. Human Genome
Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.
Scottish Qualifications Authority
National Unit specification: general information Unit code: FH2G 12 Superclass: RH Publication date: March 2011 Source: Scottish Qualifications Authority Version: 01 Summary This Unit is a mandatory Unit
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARACTERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARACTERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laboratory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
Typing in the NGS era: The way forward!
Typing in the NGS era: The way forward! Valeria Michelacci NGS course, June 2015 Typing from sequence data NGS-derived conventional Multi Locus Sequence Typing (University of Warwick, 7 housekeeping genes)
Guide for Bioinformatics Project Module 3
Structure-Based Evidence and Multiple Sequence Alignment In this module we will revisit some topics we started to look at while performing our BLAST search and looking at the CDD database in the first
Genetics Module B, Anchor 3
Genetics Module B, Anchor 3 Key Concepts: - An individual's characteristics are determined by factors that are passed from one parental generation to the next. - During gamete formation, the alleles for
A COMPARISON OF PHYLOGENETIC RECONSTRUCTION METHODS ON AN INDO-EUROPEAN DATASET
Transactions of the Philological Society Volume 103:2 (2005) 171–192 A COMPARISON OF PHYLOGENETIC RECONSTRUCTION METHODS ON AN INDO-EUROPEAN DATASET By LUAY NAKHLEH a, TANDY WARNOW b, DON RINGE c AND STEVEN
PAML FAQ
PAML FAQ Ziheng Yang Last updated: 5 January 2005 (not all items are up to date) Table of Contents: Data Files; Why don't paml programs read my files correctly?
