Big Data to Knowledge (BD2K)




Big Data to Knowledge (BD2K): potential funding agency synergies
Jennie Larkin, PhD
Office of the Associate Director for Data Science, National Institutes of Health
iDASH-pSCANNER meeting, UCSD, September 16, 2014

Talk Outline
- BD2K/ADDS mission
- Who we are
- What we are doing
- Partnerships

Mission Statement
To foster an ecosystem that enables biomedical research to be conducted as a sustainable digital enterprise that enhances health, lengthens life, and reduces illness and disability.

The Challenge Facing Us: increasing data versus flat budgets
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Biomedical Research Enterprise - Today
- Digital assets (data, software, and tools) are distributed.
- Digital assets are difficult to find and use.
- Much of the biomedical research enterprise is concept-centric.
- The major products of research are scientific papers.
- Rewards, incentives, and metrics are publication-based.

Biomedical Research Enterprise - Tomorrow
- A liberated digital asset ecosystem: increased data and software sharing
- Greater prominence of digital assets (data, software, standards, etc.) in science & scholarship
- Ways to make digital assets:
  - Discoverable
  - Useful to others
  - Citable
  - Linked to scientific literature
- Data science-savvy workforce

Today → Tomorrow

Data and Informatics Working Group (DIWG): acd.od.nih.gov/diwg.htm

DIWG Report: Overarching Themes
- NIH at a pivotal point:
  - Risk failing to capitalize on technology advances
  - Bordering on institutional malpractice
- Cultural changes at NIH and across biomedical research are essential
- Long-term NIH commitment is required

Major Data Science Problems to Solve
1. Locating and citing digital assets: data and software discovery indices
2. Ensuring digital assets are useful and usable: standards activities
3. Extending policies and practices for data sharing: working across NIH and agencies
4. Developing new methods to analyze and manage biomedical Big Data (computing across data types, data integration): new data science research
5. Training researchers who can use biomedical Big Data effectively: workforce development
(DIWG report, 2012)

The Biomedical Digital Enterprise
Enable biomedical research as a sustainable digital research enterprise to facilitate discovery, support new knowledge, and maximize community engagement.

Components of The Digital Enterprise
- Consists of digital assets, e.g. datasets, papers, software, associated metadata, lab notes
- Each asset is uniquely identified and has provenance, including access control, e.g. publishing simply involves changing the access control
- Requires robust, scalable, interoperable computer infrastructure
- Digital assets are interoperable across the enterprise
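The asset model on this slide can be illustrated with a minimal Python sketch. Everything named here (`DigitalAsset`, `Access`, the `publish` method, the example title) is a hypothetical illustration and not part of any NIH system; the sketch only shows the idea that an asset carries a unique identifier, a provenance trail, and an access-control setting, and that "publishing" amounts to changing that setting.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access-control levels for an asset."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    kind: str    # e.g. "dataset", "paper", "software"
    title: str
    # Unique identifier assigned when the asset is created.
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    access: Access = Access.PRIVATE
    # Ordered list of events describing the asset's history.
    provenance: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an event to the provenance trail."""
        self.provenance.append(event)

    def publish(self) -> None:
        """Publishing only changes access control; the asset itself is unchanged."""
        self.access = Access.PUBLIC
        self.record("access changed to public")


asset = DigitalAsset(kind="dataset", title="Example cohort measurements")
asset.record("created")
asset.publish()
```

Keeping the access change in the provenance trail is one possible design choice: it means the act of publication is itself part of the asset's recorded history.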

Goals & Examples of the Digital Enterprise
- Sustainability: 50% business model
- Efficiency: sharing best practices in longitudinal clinical studies
- Collaboration: identification of collaborators at the point of data collection, not publication
- Reproducibility: data accessible with publication
- Integration: phenotype harmonization
- Accessibility: clinical trials registration
- Quality: sharing CDEs across institutes
- Training: keeping trainees in the ecosystem

Developing the Digital Research Enterprise: a community endeavor
- Not just NIH and federal mandates
- Not only innovation from the extramural community
- A collaboration between NIH and stakeholders in the biomedical research community: research institutions, publishers, societies, researchers, libraries, industry, funders, international partners, etc.

Example Participants of the Ecosystem
- NIH: 20/27 ICs
- Agencies: NSF, DOE, DARPA, NIST
- Government: OSTP, HHS HDI, ONC, CDC, FDA
- Private sector: PhRMA, Google, Amazon
- Organizations: PCORI, RDA, ELIXIR, CCC, CATS, FASEB, Biophysical Society, Sloan Foundation, Moore Foundation

Raw Materials to Build the Digital Enterprise
- NIH mandate & support
- ADDS team of 8 people
- Intramural participation of over 100 team members across ICs
- Funding through BD2K: ~$30M in FY14, ~$80M in FY15...

Associate Director for Data Science: Phil Bourne Scientific Data Council Multi-Council Working Group Programmatic Theme Sustainability Education Innovation Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis Communication ICs Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise

ADDS/BD2K Activities in 2014
- Hired the NIH Associate Director for Data Science, Dr. Phil Bourne
- Workshops and RFIs for community input
- Established new office and procedures (multi-council working group for trans-NIH perspective)
- Published and will award new data science research:
  - Data Discovery Index Coordination Consortium
  - Training: short courses, open educational resources, mentored career development
  - Investigator-Initiated Centers of Excellence
  - BD2K-LINCS Perturbation Data Coordination and Integration Center

Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability Education Innovation Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication ICs Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise

The Commons
Vivien Bonazzi (ADDS) & George Komatsoulis (NCBI)
- Dropbox-like storage
- The opportunity to apply quality metrics
- Bring compute to the data
- A place to collaborate
- A place to discover

Piloting the Commons
- Public/private partnership
- Work with ICs, NCBI, and CIT to identify and run pilots (cloud, HPC centers)
- Port dbGaP or SRA to the cloud?
- Experiment with new funding strategies
- Evaluate

What is a Commons?
- Supports sharing, accessibility, and discoverability of biomedical data and analytical tools.
- Enables scientific innovation by having data resources co-located with advanced computing resources.
- Uses flexible, cloud-based computing to support big data and scalable analysis.
- Allows researchers to bring their analysis to the data (important when the data are too big to move).
- Allows scientists to interact, share, analyze, and explore large data sets.
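The point about data being "too big to move" can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 1 PB dataset, the 10 MB analysis script, and the 10 Gbit/s link are assumed figures chosen for the arithmetic, not numbers from the talk, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link at a sustained rate.

    efficiency is the fraction of the nominal link rate actually achieved.
    """
    if link_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


PETABYTE = 10**15   # bytes (decimal definition)
MEGABYTE = 10**6    # bytes
GIGABIT = 10**9     # bits per second

# Assumed scenario: move the data to the analysis...
dataset_hours = transfer_hours(1 * PETABYTE, 10 * GIGABIT)
# ...versus move the analysis to the data.
script_hours = transfer_hours(10 * MEGABYTE, 10 * GIGABIT)
```

Under these assumptions, moving the dataset takes roughly 222 hours (more than nine days) even at the full nominal link rate, while moving the analysis code takes a fraction of a second. That asymmetry is the motivation for co-locating compute with the data.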

Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability* Education* Innovation* Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication ICs Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise

Training
Michelle Dunn
Training goals:
- Develop a sufficient cadre of researchers skilled in the science of Big Data
- Elevate general competencies in data usage and analysis across the biomedical research workforce
- Combat the "Google bus"
How:
- Traditional training grants
- Work with ICs on a needs assessment
- Work with institutions on raising awareness
- Training center(s)?

Innovation
Jennie Larkin and Mark Guyer
- Data Discovery Index Coordination Consortium (U24). Funding in September 2014.
- Centers of Excellence in Data Science. Funding in September 2014.
- Data and Metadata Standards: in development. See RFI: NOT-CA-14-054.
- Targeted Software Development: Development of Software and Analysis Methods for Biomedical Big Data in Targeted Areas of High Need (U01), RFA-HG-14-020. Funding in FY15. Topics: data compression/reduction, visualization, provenance, or wrangling.

Innovation FY15
- Governance model to foster the ecosystem
- Workshops identified; others will be considered:
  - Sustainability
  - Standards
  - ELSI for research use of clinical data
  - Private sector engagement for research use of clinical data
  - Using EHRs for outcomes research
  - Gaming community contribution to biomedical research
- Software index? Standards framework? Other?

What is active now in BD2K?
- Request for Information: Input on Information Resources for Data-Related Standards Widely Used in Biomedical Science (NOT-CA-14-054)
  http://grants.nih.gov/grants/guide/notice-files/not-ca-14-054.html
  Response Date: September 30, 2014
- All Training FOAs: T32, T15, K01, R25
- BISTI FOAs

Process: Current Efforts
- Clinical data harmonization
- Data citation
- Machine-readable data sharing plans
- New review models, audiences, etc.
- Open review
- Micro funding
- Standing data committees to explore best practices
- Crowdsourcing

Collaboration: Current Efforts
- Joint public-private partnership workshop with NOAA?
- 2 joint workshops with NSF + Dear Colleague letter
- OSTP Open Data 2.0
- HIRO's big data meeting
- ELIXIR working groups

NIH Policy and Cultural Change
NIH and other Federal Agencies are working to make digital assets from federally funded research available.
Public Access to Data Memo: http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
- Applies to publications and digital scientific data
- Develop a strategy for:
  - leveraging existing archives (where appropriate)
  - fostering public-private partnerships with scientific journals relevant to the agency's research

NIH Policy and Cultural Change
New NIH Genomic Data Sharing policy; see: http://gds.nih.gov
How technology and policy may impact culture:
- If data management plans are included in applications, they can be peer-reviewed for strength and appropriateness to their field.
- Indexing of digital assets would allow grantees to report to funders and link data to publications.
- Improved ways to support citation and impact of digital assets (publications, data sets, software, standards) would support credit to scientists.
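As a rough illustration of the indexing point above, the following Python sketch shows how a simple index could let a grantee list the assets produced under a grant and look up the assets behind a publication. The class `AssetIndex` and every identifier in it ("GRANT-A", "PUB-1", "dataset-001", and so on) are hypothetical placeholders; this is not a description of any actual NIH index or of the Data Discovery Index.

```python
from collections import defaultdict


class AssetIndex:
    """Hypothetical in-memory index linking assets to grants and publications."""

    def __init__(self):
        self._by_grant = defaultdict(set)
        self._by_publication = defaultdict(set)

    def register(self, asset_id, grant_id, publication_ids=()):
        """Record that an asset was produced under a grant and cited by publications."""
        self._by_grant[grant_id].add(asset_id)
        for publication_id in publication_ids:
            self._by_publication[publication_id].add(asset_id)

    def assets_for_grant(self, grant_id):
        """Assets a grantee could report to the funder for this grant."""
        return sorted(self._by_grant.get(grant_id, set()))

    def assets_for_publication(self, publication_id):
        """Assets (data, software, ...) linked to a given publication."""
        return sorted(self._by_publication.get(publication_id, set()))


index = AssetIndex()
index.register("dataset-001", grant_id="GRANT-A", publication_ids=["PUB-1"])
index.register("software-002", grant_id="GRANT-A",
               publication_ids=["PUB-1", "PUB-2"])

grant_report = index.assets_for_grant("GRANT-A")
```

The same two lookups serve both uses named on the slide: `assets_for_grant` supports reporting to funders, and `assets_for_publication` supports linking data to the literature.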

Challenges to Creating a Digital Research Enterprise
- Relies on participation of the larger community (funders, researchers, publishers, industry, academics)
- Requires technological improvements and coordination
- Will need associated cultural changes in reward systems that value sharing of digital assets

Result of BD2K?
Enable a new digital enterprise that will:
- include researchers, clinicians, computer scientists, and others who work with digital research assets
- recognize and support the importance of publications, data, software, and analyses
- ensure that knowledge and resources coming from biomedical research can be more informative and reusable
- promote cultural changes in the scientific community

Find out More!
- ADDS Office: http://www.nih.gov/adds
- BD2K: http://bd2k.nih.gov
  - Workshop reports and videos
  - Funding Opportunities
  - Focus areas, structure, and people in BD2K
  - Links to Phil's blog
  - Join the listserv or follow on Twitter
- Email: Jennie.Larkin@nih.gov

NIH Turning Discovery Into Health