A data management framework for the Fungal Tree of Life


Web Accessible Sequence Analysis for Biological Inference
A data management framework for the Fungal Tree of Life

Kauff F, Cox CJ, Lutzoni F. 2007. WASABI: An automated sequence processing system for multi-gene phylogenies. Syst. Biol. 56(3): 523-531.

Tree Thinking
Source: www.wikipedia.de

Cladistics
From: Assembling the Tree of Life, Oxford University Press, 2004

ATOL: Assembling the Tree of Life

"... Along with comparative data on morphology, fossils, development, behavior, and interactions of all forms of life on earth, these new data streams make even more critical the need for an organizing framework for information retrieval, analysis, and prediction. ... Currently, single investigators or small teams of researchers are studying the evolutionary pathways of heredity usually concentrating on phylogenetic groups of modest size and lower taxonomic rank. Assembly of a framework phylogeny, or Tree of Life, for all 1.7 million described species requires a greatly magnified effort by large teams working across institutions and disciplines. ... Teams of investigators also will be supported for projects in data acquisition, analysis, algorithm development and dissemination in computational phylogenetics and phyloinformatics."
(NSF website at http://www.nsf.gov/pubs/2003/nsf03536/nsf03536.htm)

AFTOL: the Fungal Tree of Life
Part of the NSF-financed ATOL project
Cooperation: Clark University, Duke University, Oregon State University, University of Minnesota
Goal: sequencing of 8 genetic loci for a total of 1500 taxa
TEM / ultrastructural data of selected specimens

AFTOL Bioinformatics: Web Accessible Sequence Analysis for Biological Inference
Central storage for all project data
Participant and public interface to the project data
Automated analyses of raw sequence data: Phred, Phrap, local BLAST, ...
Automated analyses of gene sequence data: alignment, test for topological congruence
Provide conflict-free datasets of single and combined loci for further analysis (e.g. CIPRES) and individual download
Interface to GenBank
Taxon information
Voucher & sample plate submission
Diagram labels: WASABI; GenBank; DNA, analyses, & results

WASABI: components
Sequencing facility (Duke sequencing lab)
Raw sequence processing: Phred, BLAST, verification, Phrap, BLAST
BLAST database
GenBank (EUtils server)
MOA
PostgreSQL database
Zope Application Server
Alignment, congruence (compat & tct), phylogenetic analyses (MrBayes, PAUP, p4)
Users (Internet)
Python
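The component list names an EUtils server as the link to GenBank, but the slides do not show how WASABI calls it. As a minimal sketch only, the following Python builds a request URL for the public NCBI E-utilities `efetch` endpoint and parses FASTA text. The function names and the accession `AY123456` are hypothetical placeholders, not part of WASABI.

```python
from typing import Dict
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str, db: str = "nuccore") -> str:
    """Return an efetch URL that requests one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a mapping of header line to sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# "AY123456" is a placeholder accession used only for illustration.
print(build_efetch_url("AY123456"))
```

The URL could be passed to `urllib.request.urlopen` and the response text to `parse_fasta`; the network call is left out so the sketch runs offline.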

Data analysis
[Diagram: new LSU, SSU, and RPB1 sequences from the AFTOL DB enter the alignment step together with the existing LSU core, SSU core, and RPB1 core alignments.]

Alignment

atrich_hirs  CTTAGGTATCGGGCGATGTTAATTTTAT---GTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
atrype_unkn  CTTAGGGATCGGGCGATGTTATCTTTTT---ATGTCGCTCTTGGGCTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Auric_auri   CTTAGGGATCGGGCGACCTCTTTTTT---ATGTGGCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Aurip_aure   CTTAGGGATCGGGCGGTCTCAATTAT---ATATGTCGATCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Auris_vulg   CTTAGGGATCGGGCGACCTCAATTTAA---TTTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Auxar_zuff   CTTAGGGATCGGGCGGGCAACTTTTAA---TATGTCGCTCTTGGGTTCTCGATCGGCTACGAGCGGACTAGCGGCGGCGCATCGAGCAGGGA---------------------GGGGGAGTATGGT
averpa_coni  CTTAGGGATCGGGCGATGCTTAATAGAT---GTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
axanth_cons  CTTAGGGATCGGGCGGTGTTATTATTTT---GTGTCGGTCTTGTTTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
axylar_acut  CTTAGGGATCGGGCGATGTTATTTTTT----GTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
axylar_hypo  CTTAGGGATCGGGCGATGTTATTTTTT----GTGTCGCTCCTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Backu_circ   CTTAAGGATCGGGCCTGTTTATT------ATGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Backu_cten   CTTAGGGATCGGGCTTGTTTATT------ATGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
BAEPLAx      CTTAGGTATCGGGCGGTGTTATCATTTT---GTGTCGCTCCTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Banke_fuli   CTTAGGGATCGGGCGAACTCAATTCTA---TGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Basid_hapt   CTTAGGGATCGGGCAAT---GT------TATGTGCCGCTCTTAGGTTCT----------------------------------------GGAACGGGCAGGATGTCGTAGGCTGGGGGAGTATGGT
Basid_rana   CTTAGGGATCGGGCAAT---GT------TATGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Benja_poit   CTAAGGGATCGGGCTTGTTTATT------ATGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Bimur_nova   CTTAGGGATCGGGCGGTGTTTCTATTG---TGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT
Blake_tris   CTTAGGGATCGGGCTTGTTTATT------ATGTGTCGCTCTTGGGTTCT----------------------------------------GGA---------------------GGGGGAGTATGGT

Esto_nia     GGGGGTTCGCTTAGGGATCGGGCTTGTTTATTATGTGTCGCTCTTGGGTTCTCTACGAGCGGACTAGCGGCGGCGCATCGAGGAGGGGGAGTATGGTCGGGCGGTGTTTATTAGATTTTAGATGGT

Alignment (annotated)
[The same alignment of atrich_hirs through Blake_tris, with regions marked as ambiguous, intron, and indel and numbered 1 to 4; the unaligned sequence Esto_nia is shown below it.]
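The slide marks regions of the alignment as ambiguous, intron, or indel, but does not say how those regions were delimited. As a rough illustration only, the following Python flags contiguous column ranges in which at least one row has a gap, as candidates for manual inspection. The function name and the toy alignment are assumptions for this sketch; this is not WASALIGN.

```python
def gap_regions(alignment):
    """Return (start, end) column ranges, 0-based and end-exclusive,
    in which at least one row of the alignment contains a gap ('-')."""
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same length")

    regions = []
    start = None
    for col in range(length):
        has_gap = any(row[col] == "-" for row in rows)
        if has_gap and start is None:
            start = col                      # a gapped region begins
        elif not has_gap and start is not None:
            regions.append((start, col))     # the region ended at the previous column
            start = None
    if start is not None:
        regions.append((start, length))      # region runs to the end of the alignment
    return regions


# Toy alignment with made-up taxa, for illustration only.
toy = {
    "taxon_a": "ACGT--ACGTAC",
    "taxon_b": "ACGTTTACG-AC",
    "taxon_c": "ACGT--ACGTAC",
}
print(gap_regions(toy))  # [(4, 6), (9, 10)]
```

Whether a flagged range is an intron, an indel, or simply ambiguous still needs a human decision; the sketch only narrows down where to look.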

Data analysis
[Diagram: new LSU, SSU, and RPB1 sequences from the AFTOL DB are aligned with the existing core alignments, producing a new LSU core, a new SSU core, and a new RPB1 core.]

Data set combination
[Diagram: data set 1 + data set 2 → combined data set → phylogenetic estimate]

Data set combination
[Diagram: data set 1 and data set 2 pass through a test for congruence with two outcomes, "yes" and "no"; the "no" branch is labeled "eliminate conflicting".]
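The slides do not describe how the congruence test works internally. As a sketch under stated assumptions, the following Python applies the standard pairwise split-compatibility criterion: two bipartitions A|B and C|D of the same taxon set are compatible when at least one of A∩C, A∩D, B∩C, B∩D is empty. It assumes each single-locus tree has already been reduced to a set of well-supported splits on the shared taxa. All names are hypothetical.

```python
from itertools import product


def splits_compatible(side_a, side_c, taxa):
    """Return True if the splits A|B and C|D (B and D are the complements
    of A and C within `taxa`) can occur together on one tree."""
    taxa = frozenset(taxa)
    a, c = frozenset(side_a), frozenset(side_c)
    b, d = taxa - a, taxa - c
    # Compatible if at least one of the four pairwise intersections is empty.
    return any(not (x & y) for x, y in product((a, b), (c, d)))


def find_conflicts(splits_1, splits_2, taxa):
    """List every pair of splits (one per tree) that is incompatible."""
    return [
        (s1, s2)
        for s1 in splits_1
        for s2 in splits_2
        if not splits_compatible(s1, s2, taxa)
    ]


# Toy example with made-up taxa A..E.
taxa = {"A", "B", "C", "D", "E"}
tree_1 = [frozenset({"A", "B"}), frozenset({"A", "B", "C"})]
tree_2 = [frozenset({"A", "C"}), frozenset({"D", "E"})]
for s1, s2 in find_conflicts(tree_1, tree_2, taxa):
    print(sorted(s1), "conflicts with", sorted(s2))
```

An empty result would correspond to the "yes" branch of the diagram. The sketch does not decide what to eliminate when conflicts are found.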

Data analysis
New sequences and the AFTOL DB core alignments (LSU core, SSU core, RPB1 core)
Alignment: LSU, SSU, RPB1
Test for topological congruence
Taxon pruning
Combined data sets: LSU + SSU, SSU + RPB1, LSU + RPB1, LSU + SSU + RPB1
Multiprocessor cluster
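The combined data sets listed on this slide are all combinations of two or more of the three loci. A minimal Python sketch of that enumeration, plus a simple concatenation step, follows. Restricting each combined data set to taxa present in every chosen locus is an assumption of this sketch; the slides do not say how WASABI handles missing loci. The toy alignments are made up.

```python
from itertools import combinations


def locus_combinations(loci):
    """Yield every combination of two or more loci."""
    for size in range(2, len(loci) + 1):
        yield from combinations(loci, size)


def concatenate(alignments, loci):
    """Concatenate the named loci, keeping only taxa present in all of them."""
    shared = set.intersection(*(set(alignments[locus]) for locus in loci))
    return {
        taxon: "".join(alignments[locus][taxon] for locus in loci)
        for taxon in sorted(shared)
    }


# Toy single-locus alignments with made-up taxa and sequences.
alignments = {
    "LSU": {"t1": "AC", "t2": "AG"},
    "SSU": {"t1": "TT", "t2": "TA", "t3": "TC"},
    "RPB1": {"t1": "GG", "t3": "GA"},
}

for combo in locus_combinations(["LSU", "SSU", "RPB1"]):
    print(" + ".join(combo), concatenate(alignments, combo))
```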

Data analysis
Combined data sets (LSU + SSU, SSU + RPB1, LSU + RPB1, LSU + SSU + RPB1): very sophisticated phylogenetic analysis on the multiprocessor cluster

Data flow overview: automated data processing pipeline
A. WASABI Database: automated sequencer, DNA sequence chromatograms, single read, contig, BLAST results, finalized gene, core alignments, single locus trees, combined loci trees; deleted; mandatory user verification
B. WASABI Pipeline: PHRED, PHRAP, local BLAST, CLUSTALW, WASALIGN, conflict detection; GenBank, final analysis, publication
C. WASABI Data Interface: external editing and visualization (e.g. Sequencher), ZOPE WWW interface, MESQUITE interface, direct data access, editing, and visualization (future development)

Provenance in WASABI: keep track of user interactions
Current implementation gives access only to owners of the data
Other data access only by admins (direct SQL)
Authors are supposed to keep track of their changes
WASABI only keeps the most recent version
Future data access with third-party software and access by multiple users will need more sophisticated access control
Access to different versions of the data and a 'roll-back' feature are desirable
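The slide describes version access and roll-back as desirable, not as implemented. One way such a feature could look, as a hypothetical sketch and not a description of WASABI, is an append-only history in which rolling back saves a copy of an earlier version as the newest one, so that no history is lost. All class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    number: int      # 1-based position in the history
    user: str        # who saved this version
    content: str     # e.g. an edited sequence


@dataclass
class VersionedRecord:
    versions: List[Version] = field(default_factory=list)

    def save(self, user: str, content: str) -> Version:
        """Append a new version; earlier versions are never overwritten."""
        version = Version(len(self.versions) + 1, user, content)
        self.versions.append(version)
        return version

    @property
    def current(self) -> Version:
        return self.versions[-1]

    def roll_back(self, number: int, user: str) -> Version:
        """Restore an earlier version by saving a copy of it as the newest one."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"no version {number}")
        return self.save(user, self.versions[number - 1].content)


record = VersionedRecord()
record.save("alice", "ACGT")
record.save("bob", "ACGA")
record.roll_back(1, "alice")
print(record.current)
```

Because every save records a user, the same history also answers who changed what, which the slide currently leaves to the authors themselves.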

Tracing back final results to original data
Markers A to D on the data flow diagram label successive levels of the WASABI database:
A final result (A) is based on multiple core alignments (B)
Core alignments (B) consist of many finalized genes (C)
Finalized genes (C) are created from many single reads and DNA sequence chromatograms (D)
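The chain from final result to core alignments, finalized genes, and single reads can be modeled as a "derived from" graph. The following Python sketch walks such a graph back to the original inputs. The identifiers and the dictionary layout are hypothetical and do not reflect the actual WASABI database schema.

```python
# Each key was derived from the items in its list. All identifiers are made up.
derived_from = {
    "final_analysis": ["core_LSU", "core_SSU"],
    "core_LSU": ["gene_1", "gene_2"],
    "core_SSU": ["gene_3"],
    "gene_1": ["read_1", "read_2"],
    "gene_2": ["read_3"],
    "gene_3": ["read_4", "read_5"],
}


def trace_to_sources(item, derived_from):
    """Return the sorted original inputs that `item` ultimately depends on."""
    sources = set()
    seen = set()
    stack = [item]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = derived_from.get(node, [])
        if parents:
            stack.extend(parents)   # keep walking back towards the raw data
        else:
            sources.add(node)       # nothing upstream: an original input
    return sorted(sources)


print(trace_to_sources("final_analysis", derived_from))
print(trace_to_sources("core_SSU", derived_from))
```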

Thanks to...
... Cymon Cox (Natural History Museum, London)
... Francois Lutzoni and all lab members in Duke Biology Department
... AFTOL and its participants
... NSF (DEB-0228668)