MoBEDAC -- Integrated data and analysis for the indoor and built environment. Folker Meyer Argonne National Laboratory GSC 13 Shenzhen, China





NGS is causing a paradigm shift
Environmental clone libraries (functional metagenomics)
  $250 / 96 clones/reads (prep + sequencing)
Amplicon studies (single-gene studies, 16S rDNA)
  $17 / 100,000 reads (PCR, barcoding, sequencing)
Shotgun metagenomics (cost for library, barcoding and sequencing)
  $1200 / 10 GBp / 100 million reads (single-ended)
  $2400 / 20 GBp / 100 million fragments (paired ends)
What are they doing? Who are they?
Data is cheap!
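To make the price gap on this slide concrete, the per-read cost of each approach can be worked out directly from the quoted figures. This is only a small arithmetic sketch: the labels and the `cost_per_read` helper are illustrative names, and the dollar amounts and read counts are the ones listed on the slide.

```python
# Figures as quoted on the slide: (label, cost in USD, reads produced).
approaches = [
    ("clone library", 250.0, 96),
    ("amplicon (16S rDNA)", 17.0, 100_000),
    ("shotgun, single-ended", 1200.0, 100_000_000),
]


def cost_per_read(cost_usd, reads):
    """Return the cost of a single read in USD."""
    return cost_usd / reads


# Map each approach to its per-read cost.
costs = {label: cost_per_read(cost, reads) for label, cost, reads in approaches}

for label, value in costs.items():
    print(f"{label}: ${value:.6f} per read")

# Ratio between the most and least expensive approach per read.
ratio = costs["clone library"] / costs["shotgun, single-ended"]
print(f"clone library reads cost roughly {ratio:,.0f}x more than shotgun reads")
```

The comparison is per read only; read length and information content differ between the three approaches, so it should be taken as an order-of-magnitude illustration.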

Background: Metagenomics data challenge
Data growing fast:
  2004: C. Venter's GOS with 600 MBp (or 0.6 GBp)
  2011: HMP with 6 TBp (or 6,000 GBp)
  2012: MG-RAST hits 11 TBp (10 * 10^12 bases)
Sequencing cost will continue to drop
Analysis needs to speed up 10x annually
Analysis cost is 10x of sequencing cost
Driving force
Source: Rob Knight, UColorado

Background: Numerous data sources
In the past just a few genome centers produced data; now hundreds of groups do
MG-RAST alone has 2500 data submitters
Metadata coverage is sparse
[Figure: map of submissions]

Background: Integration is missing
There is no GenBank for metagenomes
SRA is not functioning in that role
Even if it did, it would be raw data only
We lack an integration of data, analysis and pre-analyzed data!
Microbiome of the Built Environment Data Analysis Core (MoBEDAC)

What is MoBEDAC?
The MoBEDAC provides a data repository and bioinformatics tools for analyzing molecular sequence data and for visualizing ecological and functional similarities between microbial communities in the indoor environment and other field sites.

What is MoBEDAC?
[Diagram labels: FungiDB, QIIME, MG-RAST, VAMPS; Common Submission API; Analysis (BIOM format); Metadata standard working group]

The MoBEDAC PIs
Mitch Sogin, MBL
Folker Meyer, University of Chicago / ANL
Rob Knight, University of Colorado Boulder
Jason Stajich, University of California Riverside
BE minimal metadata working group:
  Argonne: Elizabeth Glass, Folker Meyer, Andreas Wilke
  Colorado: Rob Knight, Doug Wendel, Bob Van Pelt
  microbenet: Hal Levin
  UC Davis: Jonathan Eisen
  UMD-SOM, IGS: Lynn Schriml
  MBL: Mitch Sogin, Anna Shipunova
  Sloan: Paula Olsiewski

Complex Queries and Analysis
www.mobedac.org
Retrieve and compare all 16S sequences and metadata from samples from industrial buildings.
Retrieve the set of samples for which both the V2 and V6 regions have been sequenced (for comparisons of primer bias).
Retrieve and compare a set of samples from cities in which both drinking water and sewage have been sampled (to allow comparisons of contamination levels and source tracking).
Retrieve and compare metabolic profiles of samples from waste water that were sequenced using HiSeq.
[Diagram labels: GOLD, INSDC; Web Services, Web Interface; Repository, Export; Sequence, MetaData, Analyses; Upload, Download; Comparative Tools: MG-RAST (Meyer), QIIME (Knight), VAMPS (Sogin), FungiDB (Stajich)]
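Two of the example queries on this slide can be sketched as simple set operations over sample metadata. The records, field names (`sample`, `city`, `material`, `region`) and function names below are all hypothetical and chosen only for illustration; they are not the MoBEDAC schema or API.

```python
from collections import defaultdict

# Hypothetical metadata records, one per sequenced region of a sample.
records = [
    {"sample": "S1", "city": "Chicago", "material": "drinking water", "region": "V2"},
    {"sample": "S1", "city": "Chicago", "material": "drinking water", "region": "V6"},
    {"sample": "S2", "city": "Chicago", "material": "sewage", "region": "V6"},
    {"sample": "S3", "city": "Boulder", "material": "drinking water", "region": "V2"},
]


def groups_covering(records, group_key, value_key, required):
    """Return the groups whose records cover every value in `required`."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec[group_key]].add(rec[value_key])
    needed = set(required)
    return sorted(group for group, found in seen.items() if needed <= found)


# Samples for which both the V2 and V6 regions have been sequenced.
both_regions = groups_covering(records, "sample", "region", ["V2", "V6"])

# Cities in which both drinking water and sewage have been sampled.
paired_cities = groups_covering(
    records, "city", "material", ["drinking water", "sewage"]
)

print(both_regions)
print(paired_cities)
```

Both queries reduce to the same pattern (group by one metadata field, require full coverage of another), which is why sparse or inconsistent metadata makes them hard to answer in practice.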

Data Portal and Repository
No single analysis tool could satisfy all researchers across metagenomics; a federated approach to analysis is required. At the same time, the size of data sets from next-generation sequencing platforms has made these data sets difficult to move and share. The MoBEDAC will act as an archive for all sequence data (plus metadata) and analysis generated in the Sloan IE program, allowing PIs easy upload directly or via one of the tools participating in the MoBEDAC project. We will provide unified access to all sequence data created in the program, as well as from other relevant IE programs.

Repository and Data Synchronization
MoBEDAC will include mechanisms to automatically retrieve pertinent datasets from various websites and archives, including data relevant to the indoor environment from INSDC, KEGG, SEED, VAMPS, GOLD, SRA, QIIME, MG-RAST, FungiDB, and IMG/M as well as corresponding metadata. We will accommodate existing exchange and data formats for inclusion in the repository. Sequence data collected and integrated will be provided in various formats and made available via FTP download or web services. Metadata will be available in GCDML format.

Metadata
Metadata provides an essential complement to sequence data, helping answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the IE community, but considerable challenges remain, including exchange, curation, distribution, and IE-specific standards. Communication with and feedback from the IE community are vital. We have developed a GSC-compliant BE minimal metadata package (Glass and Schriml).

Mechanisms Enabling Metadata-driven Queries for Sequence Data
A mechanism enables download from the MoBEDAC and linking to analysis results on existing analysis servers (VAMPS, QIIME, MG-RAST, and FungiDB). The query results can be of two kinds: datasets for download or links to the analysis of those datasets in existing tools. This enables researchers to obtain an overview of microbial communities for existing data sets with various tools. Query results are returned via web pages or web services. The MoBEDAC team is also developing data management capabilities for the core. These will support prepublication project creation and data sharing by PIs via web-based tools.
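The two kinds of query result described on this slide (a dataset for download, or a link to its analysis in an existing tool) could be modeled as two small record types. This is a sketch under stated assumptions: the class names, the `build_results` helper, and the `example.org` URLs are all hypothetical placeholders, not the MoBEDAC web-service interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Analysis servers named on the slide.
TOOLS = ("VAMPS", "QIIME", "MG-RAST", "FungiDB")


@dataclass(frozen=True)
class DatasetDownload:
    """Result kind 1: a dataset that can be downloaded."""
    dataset_id: str
    download_url: str


@dataclass(frozen=True)
class AnalysisLink:
    """Result kind 2: a link to the dataset's analysis in an existing tool."""
    dataset_id: str
    tool: str
    analysis_url: str


QueryResult = Union[DatasetDownload, AnalysisLink]


def build_results(
    dataset_ids: List[str], want: str = "download", tool: Optional[str] = None
) -> List[QueryResult]:
    """Turn matching dataset ids into one of the two result kinds.

    The URL patterns are placeholders on example.org, not real endpoints.
    """
    if want == "download":
        return [
            DatasetDownload(d, f"https://repository.example.org/download/{d}")
            for d in dataset_ids
        ]
    if want == "analysis":
        if tool not in TOOLS:
            raise ValueError(f"unknown analysis tool: {tool!r}")
        return [
            AnalysisLink(d, tool, f"https://analysis.example.org/{tool.lower()}/{d}")
            for d in dataset_ids
        ]
    raise ValueError(f"unknown result kind: {want!r}")


downloads = build_results(["ds1", "ds2"])
links = build_results(["ds1"], want="analysis", tool="QIIME")
```

Keeping the two result kinds as separate types lets a web page or web-service client decide how to render each one without inspecting URLs.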

APIs

When will this be available?
Timeline
  1st Beta testing: March 2012
  Integration of Feedback: April 2012
  2nd Beta testing: March/April 2012
  MoBEDAC Public Launch: May 2012

Web integration
Widgets!
Next phase: widget to allow integration of views into MoBEDAC (prototype)
Example: user interface code (~100 lines) allows views into MoBEDAC from other web sites.