Vivien Bonazzi ADDS Office (OD) George Komatsoulis (NCBI)



The Commons
Vivien Bonazzi, ADDS Office (OD)
George Komatsoulis (NCBI)

Why do we need a Commons?

[Figure: NIH Data]

System.out.println("the Commons");

What are the PRINCIPLES of The Commons?
- Supports a digital biomedical ecosystem
- Treats products of research (data, software, methods, papers, etc.) as digital research objects
- Digital research objects exist in a shared virtual space
  - Find, Deposit, Manage, Share and Reuse data, software, metadata and workflows
- Digital objects need to conform to FAIR principles:
  - Findable
  - Accessible (and usable)
  - Interoperable
  - Reusable

What is The Commons Framework?
- Exploits new scalable computing technologies: Cloud
- Provides physical or logical access to data
- Simplifies access, sharing and interoperability of digital research objects such as data, software, metadata and workflows
- Makes digital research objects indexable and findable: FAIR
- Provides understanding and accounting of usage patterns
- Is potentially more cost effective given digital growth
- Gives currency to digital objects and the people who develop and support them

The Commons Framework
[Diagram: layered stack, top to bottom]
- Software: Services & Tools
  - App store/user interface
  - Scientific analysis tools/workflows
- Services: APIs, Containers, Indexing
- Data
  - Reference Data Sets
  - User defined data
- Digital Object Compliance
- Compute Platform: Cloud or HPC

The Commons Framework
[Diagram: the same layered stack, annotated with cloud service models]
- SaaS, PaaS:
  - Software: Services & Tools
    - App store/user interface
    - Scientific analysis tools/workflows
  - Services: APIs, Containers, Indexing
  - Data
    - Reference Data Sets
    - User defined data
  - Digital Object Compliance
- IaaS:
  - Compute Platform: Cloud or HPC

Commons: Digital Object Compliance
Attributes of digital research objects in the Commons
Initial Phase
- Unique digital object identifiers, resolvable to the original authoritative source
- Machine readable
- A minimal set of searchable metadata
- Physically available in a cloud-based Commons provider
- Clear access rules (especially important for human subjects data)
- An entry (with metadata) in one or more indices
Future Phases
- Standard, community-based unique digital object identifiers
- Conform to community-approved standard metadata for enhanced searching
- Digital objects accessible via open standard APIs
- Physically and logically available to the Commons
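The initial-phase attributes above read naturally as a checklist, so a minimal sketch of such a check follows. It is an illustration only, not a Commons API: the `DigitalObject` fields, the `initial_phase_gaps` function, the example identifier and URL, and in particular the `ASSUMED_MIN_METADATA` keys are all hypothetical, since the slides do not say which metadata fields make up the "minimal set".

```python
from dataclasses import dataclass, field

# Hypothetical placeholder: the slides do not specify the required metadata fields.
ASSUMED_MIN_METADATA = ("title", "creator")


@dataclass
class DigitalObject:
    identifier: str = ""        # unique digital object identifier
    resolves_to: str = ""       # authoritative source the identifier resolves to
    machine_readable: bool = False
    metadata: dict = field(default_factory=dict)
    cloud_location: str = ""    # physical location at a Commons provider
    access_rule: str = ""       # e.g. "open" or "controlled"
    indices: list = field(default_factory=list)  # indices holding an entry


def initial_phase_gaps(obj, min_metadata=ASSUMED_MIN_METADATA):
    """Return the initial-phase attributes that obj does not yet satisfy."""
    checks = [
        ("unique identifier resolvable to source",
         bool(obj.identifier and obj.resolves_to)),
        ("machine readable", obj.machine_readable),
        ("minimal searchable metadata",
         all(obj.metadata.get(key) for key in min_metadata)),
        ("physically available at a cloud-based provider",
         bool(obj.cloud_location)),
        ("clear access rules", bool(obj.access_rule)),
        ("entry in one or more indices", len(obj.indices) > 0),
    ]
    return [name for name, ok in checks if not ok]


# Made-up example object: missing a "creator" field and any index entry.
example = DigitalObject(
    identifier="example:0001",
    resolves_to="https://example.org/data/0001",
    machine_readable=True,
    metadata={"title": "Example data set"},
    cloud_location="provider-a/bucket/0001",
    access_rule="controlled",
)
print(initial_phase_gaps(example))
```

Returning the list of unmet attributes, rather than a single pass/fail flag, tells a depositor what is still missing.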

Commons PILOTS

Commons Pilots - current
- The Cloud Credits Model
  - Infrastructure building blocks:
    - IaaS: accessing cloud services for NIH grantees
    - Eventual portal for academic and commercial PaaS and SaaS
- Commons Supplements
  - Data, analysis tools, APIs, containers
    - BD2K Centers
    - MODs (Model Organism Databases)
    - Interoperability (some)
    - HMP (Human Microbiome Project)
- NIH Affiliated Commons projects
  - NIAID/CF/ADDS - Microbiome/HMP Cloud Pilot
  - NCI Cloud Pilots & Genomic Data Commons

Mapping pilots to the Commons framework: Cloud Credits Model (George Komatsoulis)
[Diagram: the Commons framework stack, with the Cloud Credits Model (CCM) mapped to the Compute Platform: Cloud or HPC (IaaS) layer]

Drivers of the Cloud Credits Model
- Scalability
- Exploiting new computing models
- Cost effectiveness
- Simplified sharing of digital objects
- FAIR: Findable, Accessible, Interoperable and Reusable
Cloud computing supports many of these objectives

The Cloud Credits Model
[Diagram. Nodes: NIH; Investigator; Discovery Index; The Commons (Cloud Provider A, Cloud Provider B, HPC Provider). Arrow labels: "Provides credits"; "Uses credits in the Commons"; "Indexes"; "Enables Search"; "Option: Direct Funding".]

Advantages of this Model
- Supports simplified data sharing by driving science into publicly accessible computing environments that still provide for investigator-level access control
- Scalable for the needs of the scientific community for the next 5 years
- Democratizes access to data and computational tools
- Cost effective
  - Competitive marketplace for biomedical computing services
  - Reduces redundancy
  - Uses resources efficiently

Potential Disadvantages of this Model
- Novelty: Never been tried, so we don't have data about likelihood of success
- Cost Models: Assumes stable or declining prices among providers
  - True for the last several years, but we can't guarantee that it will continue, particularly if there is significant consolidation in industry
- Service Providers: Assumes that providers are willing to make the investment to become conformant
  - Market research suggests 3-5 providers within 2-3 months of launch
- Persistence: The model is "Pay As You Go", which means if you stop paying it stops going
  - Giving investigators an unprecedented level of control over what lives (or dies) in the Commons
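The cost-model and persistence points can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: `months_funded` is a hypothetical helper, and the credit balance, monthly charge and price-change rates are made-up numbers, not figures from the pilot. It counts how many whole months a fixed credit balance keeps an object alive when the provider's price is flat, falling or rising.

```python
def months_funded(credits, monthly_cost, monthly_price_change=0.0, max_months=1200):
    """Count the whole months a credit balance covers a recurring charge.

    monthly_price_change is the fractional change in the price each month:
    negative for declining prices, positive for rising prices.
    max_months caps the loop for prices that fall fast enough that the
    balance is never exhausted.
    """
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")
    months = 0
    cost = monthly_cost
    while months < max_months and credits >= cost:
        credits -= cost                      # pay for this month
        cost *= 1.0 + monthly_price_change   # price for the next month
        months += 1
    return months


# Illustrative numbers only: 1200 credits against a charge of 100 per month.
flat = months_funded(1200, 100)             # stable prices
falling = months_funded(1200, 100, -0.02)   # prices fall 2% per month
rising = months_funded(1200, 100, 0.10)     # prices rise 10% per month
print(flat, falling, rising)
```

Under these assumed numbers the same balance lasts longest when prices fall and shortest when they rise, which is why the model depends on stable or declining provider prices.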

What does it mean for a provider to be conformant?
Minimum set of requirements for:
- Business relationships (reseller, investigators)
- Interfaces (upload, download, manage, compute)
- Capacity (storage, compute)
- Networking and Connectivity
- Information Assurance
- Authentication and authorization
A conformant cloud is not necessarily a provider of Infrastructure as a Service (IaaS), although all providers must provide IaaS

Pilot of the Commons Cloud Credits Business Model
- NIH intends to run a 3-year pilot to test the efficacy of this business model in enhancing data sharing and reducing costs.
- The pilot will not directly interact with the existing grant system; rather, it is being modeled on the mechanisms used to gain access to NSF and DOE national resources (HPC, light sources, etc.)
- The only required qualification for applying for credits will be that the investigator has an existing NIH grant
- A major element will be the collection of metrics to assess the effectiveness of this model

Status and requests
NIH recently completed a contract with the CAMH Federally Funded Research and Development Center (FFRDC) to act as the coordinating center for this effort.
We need you to:
- Identify what capabilities will be useful to investigators
- Provide guidance on the conformance requirements
- Help identify good metrics
- Define the criteria that are used to decide if credit requests are selected

Cloud Credits Model Breakout Session Tomorrow!
Co-moderated by Victor Jongeneel (UIUC), Valentina di Francesco (NHGRI) & George Komatsoulis (NCBI)
FRIDAY 10:45 AM - 12:30 PM

Mapping pilots to the Commons framework: BD2K Centers, HMP, MODs & Interoperability
[Diagram: the Commons framework stack, with the BD2K Centers, MODs, HMP & Interoperability Supplements mapped onto it]

Commons Supplements: BD2K Centers, MODs, HMP & Interoperability
- Testing the Commons framework
- Facilitating connectivity, interoperability and access to digital research objects
  - Interoperable (APIs, containers)
  - Digital object compliant: FAIR
  - Indexable
  - Publishable
  - Privacy/security (PHI)
  - Available on cloud platforms
- Providing digital research objects to populate the Commons

Commons Supplements Program Session Tomorrow!
Moderated by Valentina di Francesco (NHGRI) & Vivien Bonazzi (ADDS)
FRIDAY 2:00 - 3:30 PM

Mapping pilots to the Commons framework: NIAID and NCI Cloud Pilots/GDC
[Diagram: the Commons framework stack, with the NCI & NIAID Cloud Pilots/GDC mapped onto it]

NIAID/CF/ADDS Microbiome* Cloud Pilot: HMP data and tools in the AWS cloud
Making Human Microbiome Project (HMP) data broadly accessible, computable, and usable.
- Moving ~20 TB of HMP data to AWS
- Providing access to a suite of tools and APIs to facilitate data access and use
- Data and tools will follow FAIR principles (digital object compliance)
* In collaboration with Owen White (UM), HMP DCC

NCI Cloud Pilots and Genomic Data Commons
Making cancer genomics data broadly accessible, computable, and usable by researchers worldwide.
- The Genomic Data Commons (GDC) will store, analyze and distribute ~2.5 PB of cancer genomics data and associated clinical data generated by the TCGA and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) initiatives
- The NCI cloud pilots will make TCGA data available on the AWS and Google clouds, along with a suite of tools and APIs to facilitate their access and use

Commons Pilots Poster Session Today!
THURSDAY 2:00 - 4:00 PM

Commons Pilots - proposed
- Indexing and searching digital research objects in the Commons
  - Leveraging indexing methods within BD2K, e.g. BioCADDIE
- Accessing large, high-value, commonly used data sets in the cloud
  - e.g. 1000 Genomes, HMP, modENCODE
  - Co-location of data and compute
  - Digital research objects are FAIR
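To give a feel for what "indexing and searching" digital research objects by their metadata involves, here is a minimal sketch of an inverted index. It is a generic textbook technique shown for illustration, not BioCADDIE's or any BD2K project's actual method; the function names, object identifiers and metadata records are all hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase metadata token to the ids of objects containing it."""
    index = defaultdict(set)
    for object_id, metadata in records.items():
        for value in metadata.values():
            for token in str(value).lower().split():
                index[token].add(object_id)
    return index


def search(index, query):
    """Return ids of objects whose metadata contains every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up metadata records keyed by hypothetical object identifiers.
records = {
    "obj-1": {"title": "Human microbiome 16S reads", "type": "data"},
    "obj-2": {"title": "Human variant calls", "type": "data"},
    "obj-3": {"title": "Microbiome analysis workflow", "type": "software"},
}
index = build_index(records)
print(search(index, "human data"))
print(search(index, "microbiome"))
```

Because only metadata is indexed, the objects themselves can stay with whichever provider hosts them; the index needs just an identifier that resolves to each one.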

Mapping proposed pilots to the Commons framework: Indexing & Large Data Sets
[Diagram: the Commons framework stack, with the proposed pilots mapped onto it: BD2K indexing (e.g. BioCADDIE) and NIH + community-defined data sets]

Thank you
- ADDS Office: Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
- NCBI: George Komatsoulis
- NHGRI: Valentina di Francesco, Kevin Lee
- CIT: Debbie Sinmao, Andrea Norris, Stacy Charland
- Trans NIH BD2K Executive Committee & Working groups
- NCI: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
- NIAID: Nick Weber, Darrell Hurt, Maria Giovanni, JJ McGowan
- Many biomedical researchers, cloud providers, IT professionals

QUESTIONS? Come to the poster session!