Building the Systems Biology Knowledgebase



Tom Brettin, Oak Ridge National Laboratory, brettints@ornl.gov
Contacts: outreach@kbase.us, kbase-users@lists.kbase.us, kbase-devel@lists.kbase.us

Integrate science and the science community
Integrate Science Across Activities: JGI Sequencing; Genome Annotation; Carbon Cycling Processes; Bioenergy Research; Metabolic Modeling; Plant Feedstocks for Bioenergy; Computational Biology; Foundational Research.
There is a tremendous wealth of data and information in the Genomic Sciences program. The Knowledgebase (KBase) is an opportunity to integrate this data and information both within individual activities and across different activities.

Everyone should be a contributor!
KBase users:
A. Professional Computational Biologists
B. Data generators and basic analysts
C. Knowledge Seekers
D. Knowledge Generators
Therefore we aim to:
Create a powerful framework for programmatic access to data and functions of KBase (Users A, B). Ultimately provide stubs for use in Perl, Python, R, MATLAB, Galaxy, etc.
Create a set of packaged widgets that make placement and recognizable display of KBase functions on web pages (or perhaps within other apps) easy and identifiable (Users B).
Create a simplified portal for search and aggregation of data for data consumers and Knowledge Seekers (Users C, D).
Create an innovative platform for knowledge creation, evolution and sharing.
instances of minimum inventory/maximum diversity systems, a term coined by Peter Pearce in his book, Structure in Nature Is a Strategy for Design (MIT Press, 1978).
DOE Office of Science, Office of Biological and Environmental Research
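The programmatic-access goal above could look roughly like the following from a Python user's side. This is a minimal sketch, not the KBase API: the class `KBaseStub`, the placeholder URL, the method name `Genome.get_features`, and the JSON-RPC-style envelope are all hypothetical, chosen only to illustrate what a generated language stub might wrap.

```python
import itertools
import json


class KBaseStub:
    """Hypothetical client stub: turns a Python call into a request payload."""

    def __init__(self, url):
        self.url = url                  # assumed service endpoint (illustrative only)
        self._ids = itertools.count(1)  # request ids, so replies can be matched

    def build_request(self, method, *params):
        # Assumed JSON-RPC-style envelope; the slides do not specify a wire format.
        return json.dumps(
            {"id": next(self._ids), "method": method, "params": list(params)},
            sort_keys=True,
        )


stub = KBaseStub("https://example.invalid/services")  # placeholder URL
payload = stub.build_request("Genome.get_features", "genome-001")
print(payload)
```

Stubs for Perl, R, or MATLAB would wrap the same payload in their own idioms, so users in each language would see only a native-looking function call.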

An Integrated View of Modeling, Simulation, Experiment, and Bioinformatics
Diagram labels: Problem Specification; Modeling and Simulation; Analysis & Visualization; Bioinformatics Analysis Tools; Integrated Biological Databases; Experimental Design; High-throughput Experiments; Analysis & Visualization.

Systems Biology Knowledgebase: enabling predictive systems biology.
Powerful modeling framework.
Community-driven, extensible and scalable open source software and application system.
Infrastructure for integration and reconciliation of algorithms and data sources.
Framework for standardization, search, and association of data.
Enable model-based experimental design and interpretation of results.
Microbes, Communities, Plants

Engineering a Microbe for Biofuel Production
KBase Tool Integration: Annotation algorithms; Metabolic reconstruction; Metabolic model generation; Model optimization algorithms; Regulatory network inference; Other functional modeling; Fitting kinetic model parameters; Predicting pathway fluxes; Proposing strain optimizations.
KBase Data Integration: Genome Sequence; Comparative Genomics; KEGG; Brenda; BioCyc; Published models; Gene KO Phenotypes; Transcriptomics; Metabolomics; Proteomics; Growth curves; Flux tracing experiments.
Other diagram labels: Annotated Genome; Feed Stock; Stresses (hydrolysate, pH, salt, end product, intermediates); Biomass; Isoprene; DNA replication; transcription; protein folding; translation; Regulation.

Modifying Lignin Biosynthesis
Diagram labels: S G H; S G H; PolyPhen 2; Genome annotation algorithms; Comparative genomics; Genome-wide correlative analysis; SNP-influenced changes in protein structure and function; Pathway predictions; Network inference; Pathway reconstruction; Omics & SNP overlay; Model optimization validation; Phylogenomics; Modeling phase I; Plant systems modification; Phenotype; Mutant population; Resequencing data; Transcriptomics; Proteomics; Metabolomics.

Culturing Recalcitrant Microbes from Communities
Diagram labels: Covariation Analysis, Phylogenetically and Functionally Interesting Keystone Species; Phylogenetic Inference; Gene Functional Annotation; Trp N; Differential Gene Expression; Population Statistics; Comparative Metagenomics; Isolate Genomes and Models; Genome Assembly from Metagenomics; Annotation and Metabolic Reconstruction; Regulation and Functional Modeling; Predict Syntrophic Interactions; Predict Culturing Conditions; Isolate vs. Community Phenotype; Species Abundance; Functional Gene Abundance; Phylo binning and scaffolding; Transcriptomics; Metabolomics; Proteomics; Temp; pH; Salinity; Amino Acids; Cofactors; Syntrophies.

What KBase Needs to Provide
Scalable compute and data capabilities beyond those available locally.
Distributed infrastructure available 24x7 worldwide.
Integration with local bioinformatics systems for seamless computing and data management.
Leverage of remote systems administration and support via service providers.
Access to state-of-the-art facilities at a fraction of the cost (service providers just add more servers).
Centralized support of tools and data.
Bottom line: enable biologists to focus on biology.

Leverage Existing Investments
We leverage the considerable investments in existing integrated databases and analysis environments.
Key challenge: how to build on these systems while giving the community an integrated view for future development.

Microbes Online; Model SEED; MG-RAST; 1000s Data Sets, 300+ Daily Users; Meta Microbes Online; 6532 Models, 1000+ Users; 41,000 Metagenomes, 500+ Daily Users; Phytozome; 153 Metagenomes, 100+ Daily Users; RegFam; 1000s Papers, 100+ Daily Users; 20,000+ users; The SEED; 1166 Subsystems, 5859 Users; 25 Plant Genomes, 300 Daily Users; RAST; 39,000 Genomes, 6000+ Users

Infrastructure Goals
Our vision is to put users in the driver's seat.

DOE Systems Biology Knowledgebase (KBase): Data and modeling for predictive biology
Overview of Infrastructure
Tom Brettin and Rick Stevens, Oak Ridge and Argonne National Laboratories

Working As One Team
Plant CDM Design and Build: Jan 2012, ORNL
Communities Hackathon: Jan 2012, LBL
First Internal KBase Build: Feb 2012, ANL

Scientific Software Technical Reviews (May 2-3, 2012)

Energy Sciences Network (ESnet)
KBase leverages ESnet for 10+ Gb/s data transfer between all nodes.
The ESnet backbone (ESnet4) is a national 10 Gbps optical circuit infrastructure.
ESnet shares its optical network with Internet2.
ESnet's IP network functions as a Tier 1 internet service provider.
Map label: BNL
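To give a feel for what a 10 Gb/s path means for moving data between sites, here is a back-of-the-envelope calculation. It assumes an ideal, fully utilized link with no protocol overhead, and the 1 TB dataset size is an illustrative figure, not one taken from the slides.

```python
def transfer_seconds(size_bytes, link_gbps):
    """Ideal transfer time: bits to move divided by link rate in bits per second."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


one_tb = 10**12                      # 1 TB (decimal); hypothetical dataset size
secs = transfer_seconds(one_tb, 10)  # 10 Gb/s, the rate named on the slide
print(f"{secs:.0f} s (~{secs / 60:.1f} min)")  # prints "800 s (~13.3 min)"
```

Real transfers would be slower than this ideal figure, since storage throughput and protocol overhead usually limit the achievable rate.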

The DOE KBase Cloud
Built on the DOE ASCR investment in the Magellan cloud infrastructure.
Current configuration of 700 nodes homed at ANL, optimized for heterogeneous applications.
OpenStack cloud @ Argonne; OpenStack cloud @ Oak Ridge; cluster system @ Berkeley; cluster system @ Brookhaven.

The KBase Cloud Architecture
Uses: Data Intensive Science; KBase Application Development; Large Scale Computation; Method Development.
Images: HPC Cluster Image; MapReduce Image; Ubuntu Image; KBase Image.
OpenStack IaaS Cloud Software Stack (EC2/S3 APIs)
Commodity Compute Cluster Hardware

The KBase Services
Services Oriented Architecture: the KBase Unified API gives access to a highly diverse set of services, ranging from quick retrieval of simple data to massive computations on the KBase Cloud. In a SOA the system is functionally decomposed into many services, each of which is implemented as one or more servers. Our long-term goal includes community-developed and contributed services. Our initial set of services will be backed by the following example servers: Genomic Servers; Protein Family Servers; Phenotype Servers; Polymorphism Servers; Compound and Reaction Data Servers; Metabolic Modeling Servers; Expression Data Servers; Regulatory Models Servers.
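The functional decomposition described above can be pictured as a thin unified entry point that routes each call to whichever service implements it. The sketch below is illustrative only: `UnifiedAPI`, the service names, and the handler functions are hypothetical stand-ins, and real services would run as separate servers rather than in-process functions.

```python
class UnifiedAPI:
    """Hypothetical facade: one entry point, many independent services."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        # Each service is registered under a name; community-contributed
        # services could be added the same way.
        self._services[name] = handler

    def call(self, name, **kwargs):
        if name not in self._services:
            raise KeyError(f"no such service: {name}")
        return self._services[name](**kwargs)


# Stand-in handlers; the returned data is made up for illustration.
def genomic_lookup(genome_id):
    return {"genome_id": genome_id, "kind": "genome"}


def expression_lookup(gene_id):
    return {"gene_id": gene_id, "kind": "expression"}


api = UnifiedAPI()
api.register("genomic", genomic_lookup)
api.register("expression", expression_lookup)
print(api.call("genomic", genome_id="g1"))
```

Keeping the routing table separate from the handlers is what lets a quick lookup and a long-running cloud computation sit behind the same calling convention.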

Concept: KBase User Experience

Development Schedule A series of system builds occurring every quarter will enable a graded process. Successive builds will expand community involvement.