
Supplementary Material for Karen D. Crow, Peter F. Stadler, Vincent J. Lynch, Chris Amemiya, and Günter P. Wagner. 2005. The fish specific Hox cluster duplication is coincident with the origin of teleosts. Molecular Biology and Evolution, in press.

HoxD4 amino acid sequence analysis
Günter P. Wagner
Date document finished: 1/20/05

Objective: Analyze the fish HoxD4 genes to clarify whether the fugu/zebrafish HoxD4a genes are orthologous. New sequences are added one by one, and each analysis is restricted to the species most critical for determining the orthology of the new sequences.

1) Orthology of zebrafish HoxD4 and euteleost HoxD4a:

Approach: Use the complete amino acid sequences of exon 1 of the zebrafish and euteleost genes to establish orthology of zebrafish HoxD4 and euteleost HoxD4a.

Sequences: shark (Hfr) HoxD4, coelacanth (Lme) HoxD4, zebrafish HoxD4, pufferfish (Sne) HoxD4a/b, medaka (Ola) HoxD4a/b, fugu (Tru) HoxD4b (8 sequences).

Alignment: The ClustalW amino acid alignment is good, with few corrections necessary. For the analysis I used the file D4ab_aaAligNoGap.phy. The original alignment has 150 positions; the no-gap alignment has 127 amino acid positions. The analysis was done with the protein sequence algorithms of Phylip 3.6 (NJ, MP, ML).

Results: In all analyses the zebrafish HoxD4 gene groups with the a-clade of euteleosts. The bootstrap support is reasonable (77/70/57 for NJ/MP/ML). Hence it is most likely that the zebrafish HoxD4 gene is orthologous to HoxD4a. This is consistent with the HoxD9 result of Prohaska and Stadler (2004).
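The gap-removal step in section 1 (150 aligned positions reduced to 127) can be sketched as follows. This is a minimal illustration in Python, assuming that "no-gap" means dropping every alignment column in which at least one sequence has a gap. The function name, the toy sequences, and the use of Python are assumptions made for illustration; the original analysis used ClustalW and Phylip 3.6, and the exact procedure used to produce D4ab_aaAligNoGap.phy is not stated.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Keep only the alignment columns in which no sequence has a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) iterates over columns; keep the gap-free ones.
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy alignment with made-up residues (NOT the HoxD4 data).
toy = {
    "seqA": "MSS-YLVNS",
    "seqB": "MSSAYL-NS",
    "seqC": "MTSAYLVNA",
}
nogap = strip_gapped_columns(toy)
for name, seq in nogap.items():
    print(name, seq)
```

In the toy case, two of the nine columns contain a gap, so seven positions remain. Removing whole columns keeps all sequences the same length, which the distance, parsimony and likelihood programs require.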

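NJ: [tree figure; bootstrap values shown: 97, 77, 93, 57]

MP: [tree figure; bootstrap values shown: 70, 49, 52, 97]

ML: [tree figure; bootstrap values shown: 92, 51, 57, 79]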
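The bootstrap support values quoted for the NJ, MP and ML trees rest on resampling alignment columns with replacement and counting how often a clade reappears. The sketch below illustrates that idea in Python under stated assumptions: the function names and the toy alignment are hypothetical, tree building is left out entirely, and this is not the Phylip implementation used in the analysis.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    # The same sampled column indices are applied to every sequence.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}


def support_percent(clade_recovered):
    """Percentage of replicates in which the clade of interest was recovered.

    `clade_recovered` is a list of booleans, one per replicate tree.
    """
    return 100.0 * sum(clade_recovered) / len(clade_recovered)


# Toy alignment with made-up residues (NOT the HoxD4 data).
toy = {"seqA": "MSSYLN", "seqB": "MSTYLN", "seqC": "MTSYLA"}
replicate = bootstrap_replicate(toy, random.Random(0))

# If a clade appeared in 77 of 100 replicate trees, its support is 77%.
print(support_percent([True] * 77 + [False] * 23))
```

Each replicate has the same length as the original alignment; a tree is inferred from every replicate, and the support value of a node is the share of those trees that contain it.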

2) Identity of elopomorph genes:

For both elopomorph species, tarpon and eel, we obtained two paralogs. The orthology of these genes was investigated with an alignment of the known zebrafish and euteleost genes and the four elopomorph sequences. The file is D4ab_aaAroMat copy.phy.

In all analyses the four elopomorph sequences form a clade nested in the HoxD4a clade, with support values of 97/86/74 (NJ/MP/ML). Hence the elopomorph paralogs are probably the result of a gene duplication that happened after the split of the zebrafish and elopomorph lineages but before the split of the eel and tarpon lineages.

NJ: [tree figure; labeled tips: Aro3D4a, MatD4a, MatD4, AroD4n1; bootstrap values shown: 84, 66, 64, 97, 31, 47, 68, 86]

MP: [tree figure; labeled tips: MatD4a, AroD4n1, Aro3D4an, MatD4; bootstrap values shown: 81, 39, 86, 86, 67, 78, 79, 91]

ML: [tree figure; labeled tips: MatD4a, AroD4n1, Aro3D4an, MatD4; bootstrap values shown: 50, 51, 54, 74, 49, 52, 66, 74]
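The statement that the elopomorph sequences "form a clade nested in the HoxD4a clade" is a claim about leaf sets of subtrees, and it can be checked mechanically. The following Python sketch shows one way to do so on a tree written as nested tuples. The topology below is a hypothetical toy arrangement built only to illustrate the check; it is not taken from the NJ, MP or ML figures, and the pairing of the elopomorph sequences, the label "zebrafishD4", and the other tip labels are assumptions.

```python
def leaves(tree):
    """Return the set of tip labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def subtrees(tree):
    """Yield the tree itself and every internal subtree below it."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree:
        yield from subtrees(child)


def is_clade(tree, taxa):
    """True if some subtree contains exactly the given tips."""
    taxa = set(taxa)
    return any(leaves(sub) == taxa for sub in subtrees(tree))


def is_nested_in(tree, inner, outer):
    """True if `inner` and `outer` are both clades and inner is a proper subset."""
    return is_clade(tree, inner) and is_clade(tree, outer) and set(inner) < set(outer)


# Hypothetical rooted topology (illustration only, NOT the published trees).
toy_tree = (
    "HfrD4",
    ("LmeD4",
     (("SneD4b", "OlaD4b", "TruD4b"),
      ("zebrafishD4",
       (("Aro3D4a", "MatD4a"), ("AroD4n1", "MatD4")),
       ("SneD4a", "OlaD4a")))),
)

elopomorphs = {"Aro3D4a", "MatD4a", "AroD4n1", "MatD4"}
a_clade = elopomorphs | {"zebrafishD4", "SneD4a", "OlaD4a"}
print(is_nested_in(toy_tree, elopomorphs, a_clade))
```

On a real result the same check would be run on each inferred tree after parsing it from the tree file written by the inference program.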

3) The orthology of Hiodon paralogs:

From the goldeye two genes were found and compared to the known HoxD4a/b sequences and the outgroups, shark and coelacanth (d4ab_aahalnogap copy.phy). The trees place one Hiodon sequence with the HoxD4a clade and the other with the HoxD4b clade, suggesting orthology. The most significant observation in this analysis is the association of the HalD4b gene with the euteleost HoxD4b genes, because the association of the other paralog with the HoxD4a clade could in principle be artifactual, as the clustering of the coelacanth sequence shows.

NJ: [tree figure; labeled tips: HalD4b, HalD4a; bootstrap values shown: 86, 99, 86, 33, 80, 55]

MP: [tree figure; labeled tips: HalD4a, HalD4b; bootstrap values shown: 99, 99, 52, 50, 99]

ML: [tree figure; labeled tips: HalD4b, HalD4a; bootstrap values shown: 50, 86, 40, 27, 45, 43]

4) The orthology of the Amia HoxD4 gene:

One sequence was recovered from bowfin (D4ab_aaAcaLme copy.phy). This sequence is consistently placed inside the HoxD4a clade, but this result is certainly artifactual: even if the duplication occurred before the split of the Amia and teleost lineages, one would predict the Amia sequence to diverge before the zebrafish sequence, because the monophyly of teleosts is not in question. In summary, these data cannot constrain the duplication event downwards. On the other hand, they also provide no evidence for a duplication date different from that suggested by the HoxA11 and HoxB5 data.

NJ: [tree figure; labeled tip: AcaD4; bootstrap values shown: 74, 74, 50, 79, 95]

MP: [tree figure; labeled tips: AcaD4, OlaD4; bootstrap values shown: 60, 78, 50, 39, 98]

ML: [tree figure; labeled tip: AcaD4; bootstrap values shown: 50, 59, 51, 65, 90]

Conclusions: The zebrafish HoxD4 gene is orthologous to HoxD4a. The two paralogs of Hiodon are orthologous to HoxD4a and HoxD4b respectively, i.e. the duplication occurred before the most recent common ancestor of teleosts. The two HoxD4 paralogs recovered from each elopomorph species in this study arose through one additional duplication event in the stem of the elopomorph clade. The present analysis cannot decide whether the duplication occurred after the most recent common ancestor of Amia and teleosts, but it also contains no evidence against a duplication along with HoxA11 and HoxB5.