Alison Yao, Ph.D.
Program Officer, Office of Genomics and Advanced Technologies
Division of Microbiology and Infectious Diseases
National Institute of Allergy and Infectious Diseases
National Institutes of Health
July 2014
BIG DATA: NIH Big Data to Knowledge (BD2K) Initiative for Research Data
Types of biomedical data: genomic, other omic, imaging, phenotypic, exposure, clinical (courtesy of NHGRI)
BIG DATA (courtesy of Richard Scheuermann)
* Primary data
* Derived data
* Interpreted data/knowledge
* Experimental metadata
* Analytical metadata
BIG DATA: lots and lots of data in individual labs (Labs 1 through 6) (courtesy of Michael F. Huerta)
NIH is Tackling the Big Data Problem (courtesy of NHGRI)
* Associate Director for Data Science (ADDS)
* Scientific Data Council (SDC)
* Big Data to Knowledge (BD2K)
Big Data to Knowledge (BD2K)
* Major trans-NIH initiative addressing an NIH imperative and a key roadblock
* Aims to be catalytic to biomedical research and synergistic across different scientific communities
* Overarching goal: to develop the new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science
NIH BD2K Initiative
* Data: data standards, catalog, and data sharing policies to facilitate the broad use of biomedical research data
* Computing centers and software development: advance the science and technology of biomedical big data
* Training: enhance and develop the workforce in biomedical big data
Impact of NIH BD2K
* Increased data sharing will make data available
* Promotion of standards will make data usable
* Data will be brought into the research ecosystem: discoverable, citable, and linked to data, tools, and literature
* Data science and tools will enable scientific innovation
BD2K will make the biomedical research enterprise more data centric, transforming biomedical research from hypothesis driven (today) to data centric (tomorrow).
Data Discovery Index Coordination Consortium (DDICC)
* The DDICC will support: data discoverability, data access, and data citation
* Approaches: community engagement and outreach, task forces, pilot projects
* Deliverables: a white paper and examples to help inform development of a fully functional Data Discovery Index (DDI)
NIAID/DMID Genomics Program
* Sequencing: Genomic Sequencing Centers
* Functional genomics: Functional Genomic Research Centers
* Proteomics: Clinical Proteomics Centers
* Structural genomics: Structural Genomics Centers
* Systems biology: Systems Biology Centers
* Bioinformatics: Bioinformatics Resource Centers
Genomic research resources (genomic/omics data sets, databases, bioinformatics tools, biomarkers, 3D structures, protein clones, and predictive models) to address key questions in microbiology and infectious disease.
Bioinformatics Resource Centers (BRCs), Genome Sequencing Centers, Systems Biology Centers, Structural Genomics Centers, Clinical Proteomics Centers
Bioinformatics Resource Centers (BRCs)
Goal: provide integrated bioinformatics resources in support of basic and applied infectious diseases research
* Data and metadata management and integration solutions
* Computational analysis and visualization tools
* Work spaces and web interfaces
* Training and outreach activities
* Free bioinformatics services
* Rapid response to new and emerging pandemic threats
*Bioinformatics Resource Centers (BRCs)
Software engineering; data management and integration; web interfaces and workspaces; social engineering; computational analysis tools; collaboration; bioinformatics services; training workshops
* Data Tools
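CEIRS (Centers of Excellence for Influenza Research and Surveillance), ICEMR (International Centers of Excellence for Malaria Research), BRCs (Bioinformatics Resource Centers), DBPs (Driving Biological Projects)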
Key features:
* ~16,000 bacterial genomes with standardized annotations
* Free bioinformatics services: genome annotation service (RAST), comparative genome analysis
* Integrated genomic and omics data, metadata, and tools
* Comparative analyses and interactive visualizations
* Personal workspace
* TB Portal
Genomes, metadata, phylogenetic trees, genes and proteins
Protein-protein interactions, structures, transcriptomics (microarray, RNA-Seq), pathways; proteomics and ChIP-Seq data coming January 2014
tb.patricbrc.org: reference genomes (H37Rv), gene/protein search, analysis tools, omics data
* Data generation: infectious disease community, CEIRS
* Data processing: bioinformatics centers (IRD), CEIRS data coordinating center
* Knowledge presentation: open access; visualization, analysis, and query; training, services, and collaboration
* Insight and hypothesis
Acknowledgment
* DMID/OGAT: Maria Giovanni, Valentina Di Francesco, Julia Puzak, Eun Mi Lee, Punam Mathur, Malu Polanski, Vivien Dugan, Christina Giblin
* The Influenza Research Database Team: J. Craig Venter Institute, Northrop Grumman Health Solutions, Vecna Technologies, Los Alamos National Laboratory, University of California Davis