Phylogenetic Trees Made Easy

Phylogenetic Trees Made Easy: A How-To Manual
Fourth Edition

Barry G. Hall
University of Rochester, Emeritus, and Bellingham Research Institute

Sinauer Associates, Inc. Publishers
Sunderland, Massachusetts, U.S.A.

Table of Contents

Chapter 1 Read Me First!
  New and Improved Software
  Just What Is a Phylogenetic Tree?
  Estimating Phylogenetic Trees: The Basics
  Beyond the Basics
  Learn More about the Principles
  About Appendix III: F.A.Q.
  Computer Programs and Where to Obtain Them
    MEGA 5
    MrBayes
    FigTree
    Codeml
    SplitsTree and Dendroscope
    Utility Programs
    Text Editors
  Acknowledging Computer Programs
  The Phylogenetic Trees Made Easy Website

Chapter 2 Tutorial: Estimate a Tree
  Why Create Phylogenetic Trees?
  About this Tutorial
    Macintosh and Linux users
    A word about screen shots
  Search for Sequences Related to Your Sequence
  Decide Which Related Sequences to Include on Your Tree
    Establishing homology
    To include or not to include, that is the question
  Download the Sequences
  Align the Sequences
  Make a Neighbor Joining Tree
  Summary

Chapter 3 Acquiring the Sequences
  Hunting Homologs: What Sequences Can Be Included on a Single Tree?
  Becoming More Familiar with BLAST
    BLAST help
  Using the Nucleotide BLAST Page
  Using BLAST to Search for Related Protein Sequences
  Finalizing Selected Sequences for a Tree
  Other Ways to Find Sequences of Interest (Beware! The Risks Are High)

Chapter 4 Aligning the Sequences
  Aligning Sequences with MUSCLE
  Examine and Possibly Manually Adjust the Alignment
    Trim excess sequence
    Eliminate duplicate sequences
  Check Average Identity to Estimate Reliability of the Alignment
    Codons: Pairwise amino acid identity
    Non-coding DNA sequences
  Increasing Alignment Speed by Adjusting MUSCLE's Parameter Settings
    How MUSCLE works
    Adjusting parameters to increase alignment speed
  Aligning Sequences with ClustalW

Chapter 5 Major Methods for Estimating Phylogenetic Trees
  Learn More about Tree-Searching Methods
  Distance versus Character-Based Methods
  Learn More about Distance Methods
  Which Method Should You Use?
    Accuracy
    Ease of interpretation
    Time and convenience

Chapter 6 Neighbor Joining Trees
  Using MEGA 5 to Estimate a Neighbor Joining Tree
  Learn More about Phylogenetic Trees
    Determine the suitability of the data for a Neighbor Joining tree
    Estimate the tree
  Learn More about Evolutionary Models
    Unrooted and Rooted trees
  Estimating the Reliability of a Tree
  Learn More about Estimating the Reliability of Phylogenetic Trees
  What about Protein Sequences?

Chapter 7 Drawing Phylogenetic Trees
  Changing the Appearance of a Tree
    The Options dialog
    Branch styles
    Fine-tuning the appearance of a tree
  Subtrees
  Rooting a Tree
    Finding an outgroup
  Saving Trees
    Saving a tree description
    Saving a tree image
  Captions

Chapter 8 Parsimony
  Learn More about Parsimony
  MP Search Methods
  Multiple Equally Parsimonious Trees
    Calculating branch lengths
    Consensus and bootstrap trees
  In the Final Analysis

Chapter 9 Maximum Likelihood
  Learn More about Maximum Likelihood
  ML Analysis Using MEGA
    Test alternative models
    Rooting the ML tree
    The special case of zero length branches
  Estimating the Reliability of an ML Tree by Bootstrapping
  What about Protein Sequences?

Chapter 10 Bayesian Inference of Trees Using MrBayes
  MrBayes: An Overview
  Learn More about Bayesian Inference
    Saving time (and perhaps your sanity)
    Choose a model
  A General Strategy for Estimating Trees Using MrBayes
  Creating the Execution File
    What the statements in the example mrbayes block do
    How the stoprule option of the mcmc command is implemented
  How Do You Run a MrBayes Analysis?
  More Complex (and More Useful) MrBayes Blocks
    Including a user tree
    The nperts option of the mcmc command
    Coding sequences and the charset statement
  The Screen Output while MrBayes Is Running
  What If You Don't Get Convergence?
  What about Protein Sequences?
  Visualizing the MrBayes Tree
  Using FigTree
    The side panel
    The icons above the tree

Chapter 11 Working with Various Computer Platforms
  Command Line Programs
  MEGA on the Macintosh Platform
    Navigating among folders on the Mac
    Printing trees and text from MEGA
  The Line Endings Issue
  Installing Command Line Programs
    Macintosh and Linux: Use the bin folder
    Windows: Create a bin folder and a path to it
  Command Line Programs: The Running Environment
    Windows: A brief visit to the Command Prompt program
    Macintosh and Linux: A brief visit to Terminal and Unix
  Acquiring and Installing MrBayes
    Windows users
    Macintosh and Linux users
    Compile MrBayes for your Mac
  Running the Utility Programs
    Utility programs for Windows
    Utility programs for Macintosh and Linux

Chapter 12 Advanced Alignment Using GUIDANCE
  Issues of Alignment Reliability
    Unreliable sequences
    Unreliable regions
  How GUIDANCE Works
  An Example Illustrated by the SmallData Data Set
    Make a file of the unaligned sequences in FASTA format
    Starting the run
    Viewing the results
    Eliminate unreliable sequences
  Applications of GUIDANCE

Chapter 13 Reconstructing Ancestral Sequences
  Using MEGA to Estimate Ancestral Sequences by Maximum Likelihood
    Create the alignment
    Construct the phylogeny
    Examine the ancestral states at each site in the alignment
    Estimate the ancestral sequence
    Calculating the ancestral protein sequence and amino acid probabilities
  How Accurate are the Estimated Ancestral Sequences?

Chapter 14 Detecting Adaptive Evolution
  Effect of Alignment Accuracy on Detecting Adaptive Evolution
  Using MEGA to Detect Adaptive Evolution
    Detecting overall selection
    Detecting selection between pairs
    Finding the region of the gene that has been subject to positive selection
  Using Codeml to Detect Adaptive Evolution
    Installation
    The files you need to run codeml
    Questions that underlie the models
    Run codeml
    Identify the branches along which selection may have occurred
    Test the statistical significance of the dN/dS ratios
  Summary

Chapter 15 Phylogenetic Networks
  Why Trees Are Not Always Sufficient
  Unrooted and Rooted Phylogenetic Networks
  Using SplitsTree to Estimate Unrooted Phylogenetic Networks
    Estimating networks from alignments
  Learn More about Phylogenetic Networks
    Rooting an unrooted network
    Estimating networks from trees
    Consensus networks
    Supernetworks
  Using Dendroscope to Estimate Rooted Networks from Rooted Trees

Chapter 16 Some Final Advice: Learn to Program

Appendix I File Formats and Their Interconversion
  Format Descriptions
    The MEGA format
    The FASTA format
    The Nexus format
    The PHYLIP format
  Interconverting Formats
    FastaConvert and MEGA
    Other format conversion programs

Appendix II Additional Programs

Appendix III Frequently Asked Questions

Literature Cited

Index to Major Program Discussions

Subject Index