Web Accessible Sequence Analysis for Biological Inference: A data management framework for the Fungal Tree of Life. Reference: Kauff F, Cox CJ, Lutzoni F. 2007. WASABI: An automated sequence processing system for multi-gene phylogenies. Syst. Biol. 56(3): 523-531.
Tree Thinking (Source: www.wikipedia.de)
Cladistics (From: Assembling the Tree of Life, Oxford University Press, 2004)
ATOL Assembling the Tree of Life... Along with comparative data on morphology, fossils, development, behavior, and interactions of all forms of life on earth, these new data streams make even more critical the need for an organizing framework for information retrieval, analysis, and prediction.... Currently, single investigators or small teams of researchers are studying the evolutionary pathways of heredity usually concentrating on phylogenetic groups of modest size and lower taxonomic rank. Assembly of a framework phylogeny, or Tree of Life, for all 1.7 million described species requires a greatly magnified effort by large teams working across institutions and disciplines.... Teams of investigators also will be supported for projects in data acquisition, analysis, algorithm development and dissemination in computational phylogenetics and phyloinformatics. (NSF website at http://www.nsf.gov/pubs/2003/nsf03536/nsf03536.htm)
AFTOL: the Fungal Tree of Life
Part of the NSF-financed ATOL project.
Cooperation: Clark University, Duke University, Oregon State University, University of Minnesota.
Goal: sequencing of 8 genetic loci for a total of 1500 taxa.
TEM / ultrastructural data of selected specimens.
AFTOL Bioinformatics: Web Accessible Sequence Analysis for Biological Inference (WASABI)
Central storage for all project data.
Participant and public interface to the project data.
Automated analyses of raw sequence data: Phred, Phrap, local BLAST,...
Automated analyses of gene sequence data: alignment, test for topological congruence.
Provides conflict-free datasets of single and combined loci for further analysis (e.g. CIPRES) and individual download.
Interface to GenBank.
[Diagram labels: Taxon information; Voucher & sample plate submission; WASABI; GenBank; DNA, analyses, & results.]
WASABI: components. [Diagram labels: User (Internet); Zope Application Server; PostgreSQL database.]
WASABI: components. [Diagram labels: Duke Sequencing lab; Phred; Blast; Phrap; Blast; PostgreSQL database; Zope Application Server; User (Internet).]
WASABI: components. [Diagram labels: Blast database; Duke Sequencing lab; Phred; Blast; Verification; Phrap; Blast; PostgreSQL database; Zope Application Server; User (Internet).]
WASABI: components. [Diagram labels: Blast database; GenBank; Duke Sequencing lab; Phred; Blast; Phrap; Blast; PostgreSQL database; Zope Application Server; Alignment; Congruence; Phylogenetic analysis (MrBayes, Paup, p4); User (Internet).]
[Component diagram labels: Sequencing facility; MOA; Phred; Blast; Phrap; Blast; Blast database; GenBank; EUtils Server; PostgreSQL database; Zope Application Server; Python; alignment; congruence (compat & tct); phylogenetic analyses (MrBayes, Paup, p4); Users (Internet).]
Data analysis. [Diagram: new LSU, SSU, and RPB1 sequences and the LSU core, SSU core, and RPB1 core alignments from the AFTOL DB feed into the alignment step.]
Alignment. [Figure: multiple sequence alignment of 19 taxa (atrich_hirs, atrype_unkn, Auric_auri, Aurip_aure, Auris_vulg, Auxar_zuff, averpa_coni, axanth_cons, axylar_acut, axylar_hypo, Backu_circ, Backu_cten, BAEPLAx, Banke_fuli, Basid_hapt, Basid_rana, Benja_poit, Bimur_nova, Blake_tris), with gaps shown as dashes, followed by the sequence Esto_nia shown without gaps.]
Alignment. [Figure: the same alignment of 19 taxa (atrich_hirs to Blake_tris), annotated with the labels "ambiguous", "intron", and "indel" and with regions numbered 1 to 4; the sequence Esto_nia is again shown below without gaps.]
Data analysis. [Diagram: new LSU, SSU, and RPB1 sequences and the LSU core, SSU core, and RPB1 core alignments from the AFTOL DB feed into the alignment step, which yields a new LSU core, a new SSU core, and a new RPB1 core.]
Data set combination. [Diagram: data set 1 + data set 2 -> combined data set -> phylogenetic estimate.]
Data set combination. [Diagram: data set 1 and data set 2 are tested for congruence. Yes: combine. No: eliminate conflicting taxa.]
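The decision on this slide can be sketched in Python. This is only an illustration and not the compat/tct code used by WASABI: it assumes that each single-locus tree has already been reduced to a list of well-supported splits (each given as one side of the bipartition), and all function names and the toy data are hypothetical. Restricting the combined data set to taxa present in both loci is likewise an assumption.

```python
from itertools import product


def splits_conflict(a, b, taxa):
    """Return True if splits a and b cannot both occur on one tree.

    A split is given by one of its sides (a frozenset of taxon names);
    the other side is taxa - side. Two splits are compatible when at
    least one of the four pairwise side intersections is empty.
    """
    sides_a = (a, taxa - a)
    sides_b = (b, taxa - b)
    return all(x & y for x, y in product(sides_a, sides_b))


def find_conflicts(splits1, splits2, taxa):
    """List every pair of splits (one per locus) that conflict."""
    return [(a, b) for a in splits1 for b in splits2
            if splits_conflict(a, b, taxa)]


def prune(dataset, taxa_to_drop):
    """Remove the given taxa from a taxon -> aligned sequence mapping."""
    return {t: s for t, s in dataset.items() if t not in taxa_to_drop}


def combine(ds1, ds2):
    """Concatenate the aligned sequences of taxa present in both data sets."""
    shared = sorted(set(ds1) & set(ds2))
    return {t: ds1[t] + ds2[t] for t in shared}


# Toy example (hypothetical data): the loci disagree on where taxon B belongs.
taxa = frozenset("ABCDE")
locus1_splits = [frozenset("AB")]
locus2_splits = [frozenset("BC")]
ds1 = {t: "ACGT" for t in taxa}
ds2 = {t: "TTGA" for t in taxa}

conflicts = find_conflicts(locus1_splits, locus2_splits, taxa)
if conflicts:
    # Which taxa to eliminate is a separate decision; B is chosen by hand here.
    ds1, ds2 = prune(ds1, {"B"}), prune(ds2, {"B"})
combined = combine(ds1, ds2)
```

Choosing which taxa to eliminate is the hard part in practice and is deliberately left outside this sketch.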
Data analysis. [Diagram: new sequences and the LSU core, SSU core, and RPB1 core alignments from the AFTOL DB -> alignment of LSU, SSU, and RPB1 -> test for topological congruence -> taxon pruning; the analyses run on a multiprocessor cluster.]
Data analysis. [Diagram: new sequences and the LSU core, SSU core, and RPB1 core alignments from the AFTOL DB -> alignment of LSU, SSU, and RPB1 -> test for topological congruence -> taxon pruning -> combined data sets LSU + SSU, SSU + RPB1, LSU + RPB1, and LSU + SSU + RPB1; the analyses run on a multiprocessor cluster.]
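The combined data sets named on this slide are simply all combinations of two or more loci. A minimal Python sketch, assuming the loci are identified by name only (the function name is hypothetical):

```python
from itertools import combinations


def locus_combinations(loci, min_size=2):
    """Return every combination of at least min_size loci, smallest first."""
    return [combo
            for size in range(min_size, len(loci) + 1)
            for combo in combinations(loci, size)]


for combo in locus_combinations(["LSU", "SSU", "RPB1"]):
    print(" + ".join(combo))
```

With three loci this yields the three pairs and the one three-locus set. The count grows quickly with the number of loci (8 loci give 247 combinations of two or more), which is one reason such analyses are run on a cluster.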
Data analysis. [Diagram: the combined data sets (LSU + SSU, SSU + RPB1, LSU + RPB1, LSU + SSU + RPB1) go into a very sophisticated phylogenetic analysis on the multiprocessor cluster.]
Data flow overview
[Diagram labels, grouped by panel as extracted:]
A. WASABI Database: Automated sequencer; DNA sequence chromatograms; Single read; Contig; BLAST results; Finalized gene; Core alignments; Deleted; Single locus trees; Combined loci trees.
B. WASABI Pipeline: PHRED; PHRAP; Local BLAST; WASALIGN; CLUSTALW; Conflict detection; GenBank; Final analysis; Publication.
C. WASABI Data Interface: ZOPE WWW interface; MESQUITE interface; External editing and visualization (e.g. Sequencher); Direct data access, editing, and visualization (future development).
Also labelled: Mandatory user verification.
Data flow: automated data processing pipeline. [Figure: the data flow diagram from the overview slide, with panels A. WASABI Database, B. WASABI Pipeline, and C. WASABI Data Interface.]
Provenance in WASABI: keep track of user interactions. [Figure: the data flow diagram from the overview slide, with additional PHRED, PHRAP, and Local BLAST labels next to the WASABI Data Interface panel.]
Provenance in WASABI: keep track of user interactions
Current implementation gives access only to owners of the data.
Other data access only by admins (direct SQL).
Authors are supposed to keep track of their changes; WASABI only keeps the most recent version.
Future data access with third-party software and access by multiple users will need more sophisticated access control.
Access to different versions of the data and a 'roll-back' feature are desirable.
[Figure: the data flow diagram from the overview slide, shown behind the text.]
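The desired behaviour (owner-only editing, access to earlier versions, and a roll-back) can be illustrated with a small in-memory Python sketch. It is not how WASABI stores data: the class names, the owner check, and the choice to record a roll-back as a new revision are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    user: str
    content: str
    saved_at: datetime


class VersionedRecord:
    """Keeps every saved version of a record instead of only the latest."""

    def __init__(self, owner):
        self.owner = owner
        self.history = []

    def save(self, user, content):
        # Mirrors the current rule: only the owner of the data may change it.
        if user != self.owner:
            raise PermissionError(f"{user} does not own this record")
        self.history.append(
            Revision(user, content, datetime.now(timezone.utc)))

    @property
    def current(self):
        return self.history[-1].content

    def roll_back(self, user, steps=1):
        """Restore an earlier version by saving it again as a new revision."""
        if not 1 <= steps < len(self.history):
            raise ValueError("no such earlier revision")
        self.save(user, self.history[-1 - steps].content)


record = VersionedRecord(owner="alice")
record.save("alice", "ACGTTGCA")   # first edit
record.save("alice", "ACGTTGCC")   # second edit, later found to be wrong
record.roll_back("alice")          # back to the first edit, history kept
```

Recording the roll-back as a new revision rather than deleting the newer one keeps the full trail of who changed what and when.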
Provenance in WASABI: keep track of user interactions. [Figure: the data flow diagram from the overview slide, with four points in the pipeline marked A, B, C, and D.]
Tracing back final results to original data. [Figure: the data flow diagram with points A to D, annotated with the chain: a final analysis is based on multiple core alignments, each consisting of many finalized genes, each created from many DNA sequence chromatograms / single reads.]
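Tracing a result back to its raw data amounts to walking a "derived from" relation until items with no recorded source are reached. A minimal Python sketch, assuming the relation is available as a dictionary and contains no cycles; the item names are hypothetical and do not reflect the WASABI database schema.

```python
def trace_to_sources(item, derived_from):
    """Return the set of original items that `item` ultimately depends on.

    derived_from maps an item to the list of items it was built from.
    Items without an entry are treated as original data.
    """
    parents = derived_from.get(item, [])
    if not parents:
        return {item}
    sources = set()
    for parent in parents:
        sources |= trace_to_sources(parent, derived_from)
    return sources


# Hypothetical provenance records following the chain on the slide.
derived_from = {
    "final analysis": ["LSU core alignment", "SSU core alignment"],
    "LSU core alignment": ["LSU gene, taxon 1", "LSU gene, taxon 2"],
    "SSU core alignment": ["SSU gene, taxon 1"],
    "LSU gene, taxon 1": ["chromatogram 001", "chromatogram 002"],
    "LSU gene, taxon 2": ["chromatogram 003"],
    "SSU gene, taxon 1": ["chromatogram 004", "chromatogram 005"],
}

originals = trace_to_sources("final analysis", derived_from)
```

The same walk started from a single core alignment or finalized gene returns only the chromatograms behind that item.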
Thanks to: Cymon Cox (Natural History Museum, London); Francois Lutzoni and all lab members in the Duke Biology Department; AFTOL and its participants; NSF (DEB-0228668).