Big Data to Knowledge () potential funding agency synergies Jennie Larkin, PhD Office of the Associate Director of Data Science National Institutes of Health idash-pscanner meeting UCSD September 16, 2014
Talk Outline /ADDS mission Who we are What we are doing Partnerships
Mission Statement To foster an ecosystem that enables biomedical research to be conducted as a sustainable digital enterprise that enhances health, lengthens life and reduces illness and disability
The Challenge Facing Us: increasing data versus flat budgets Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Biomedical Research Enterprise - Today Digital assets (data, software, and tools) are distributed. Digital assets are difficult to find and use. Much of biomedical research enterprise is concept-centric. Major product of research are scientific papers Reward/incentives/metrics are publication-based
Biomedical Research Enterprise - Tomorrow Liberated digital asset ecosystem increase data and software sharing Greater prominence of digital assets (data, software, standards, etc) in science & scholarship Ways to make digital assets Discoverable Useful to others Citable Linked to scientific literature Data science-savvy workforce
Today? Tomorrow
Data and Informatics Working Group acd.od.nih.gov/diwg.htm
DIWG Report Overarching Themes NIH at a pivotal point: Risk failing to capitalize on technology advances Bordering on institutional malpractice Cultural changes at NIH and across biomedical research are essential Long-term NIH commitment is required
Major Data Science Problems to Solve 1. Locating and citing the digital assets. data and software discovery indices 2. Ensuring digital assets are useful and usable. standards activities 3. Extending policies and practices for data sharing. working across NIH and agencies 4. Developing new methods to analyze and manage biomedical Big Data (computing across data types, data integration). new data science research 5. Training researchers who can use biomedical Big Data effectively. workforce development (DIWG report, 2012)
The Biomedical Digital Enterprise enable biomedical research as a sustainable digital research enterprise to facilitate discovery and support new knowledge and maximize community engagement. 11
Components of The Digital Enterprise Consists of digital assets e.g. datasets, papers, software, associated metadata, lab notes Each asset is uniquely identified and has provenance, including access control e.g. publishing simply involves changing the access control Requires robust, scalable, interoperable computer infrastructure Digital assets interoperable across the enterprise
Goals & Examples of the Digital Enterprise Sustainability 50% business model Efficiency sharing best practices in longitudinal clinical studies Collaboration - identification of collaborators at the point of data collection not publication Reproducibility data accessible with publication Integration phenotype harmonization Accessibility clinical trials registration Quality sharing CDEs across institutes Training keeping trainees in the ecosystem
Developing the Digital Research Enterprise: a community endeavor Not just NIH and federal mandates Not only innovation from the extramural community. A collaboration between NIH and stakeholders in the biomedical research community research institutions, publishers, societies, researchers, libraries, industry, funders, international partners, etc.
Example Participants of the Ecosystem NIH 20/27 ICs Agencies NSF DOE DARPA NIST Government OSTP HHS HDI ONC CDC FDA Private sector Phrma Google Amazon Organizations PCORI RDA, ELIXIR CCC CATS FASEB Biophysical Society Sloan Foundation Moore Foundation
Raw Materials to Build the Digital Enterprise NIH mandate & support ADDS team of 8 people Intramural participation of over 100 team members across ICs Funding through : ~$30M in FY14 ~$80M in FY15...
Associate Director for Data Science: Phil Bourne Scientific Data Council Multi-Council Working Group Programmatic Theme Sustainability Education Innovation Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis Communication IC s Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise
ADDS/ Activities in 2014 Hired the NIH Associate Director for Data Science, Dr. Phil Bourne Workshops and RFIs for community input Established new office and procedures (multicouncil working group for trans-nih perspective) Published and will award new data science research: Data Discovery Index Coordination Consortium Training: short courses, open educational resources, mentored career development Investigator-Initiated Centers of Excellence -LINCS Perturbation Data Coordination and Integration Center
Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability Education Innovation Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication IC s Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise
The Commons Vivien Bonazzi & George Komatsoulis ADDS NCBI Dropbox like storage The opportunity to apply quality metrics Bring compute to the data A place to collaborate A place to discover
Piloting the Commons Public/private partnership Work with IC s, NCBI and CIT to identify and run pilots cloud, HPC centers Port DbGAP or SRA to the cloud? Experiment with new funding strategies Evaluate
What is a Commons? Supports sharing, accessibility, and discoverability of biomedical data and analytical tools. Enables scientific innovation, by having data resources co-located with advanced computing resources. Uses flexible, cloud-based computing to support big data and scalable analysis. Allows researchers to bring their analysis to the data (important when the data are too big to move) Allows scientists to interact, share, analyze and explore large data sets.
Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability* Education* Innovation* Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication IC s Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise
Training Michelle Dunn Training Goals: Develop a sufficient cadre of researchers skilled in the science of Big Data Elevate general competencies in data usage and analysis across the biomedical research workforce Combat the Google bus How: Traditional training grants Work with IC s on a needs assessment Work with institutions on raising awareness Training center(s)?
Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability* Education* Innovation* Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication IC s Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise
Innovation Jennie Larkin and Mark Guyer Data Discovery Index Coordination Consortium (U24) Funding in September 2014 Centers of Excellence in Data Science. Funding in September 2014 Data and Metadata Standards in development. See RFI: NOT-CA-14-054 Targeted Software Development Development of Software and Analysis Methods for Biomedical Big Data in Targeted Areas of High Need (U01) RFA-HG-14-020. Funding in FY15. Topics: data compression/reduction, visualization, provenance, or wrangling.
Innovation FY 15 Governance model to foster the ecosystem Workshops identified, others will be considered Sustainability Standards ELSI for research use of clinical data Private sector engagement for research use of clinical data Using EHRs for outcomes research Gaming community contribution to biomedical research Software index? Standards framework? Other?
What is active now in? Request for Information: Input on Information Resources for Data-Related Standards Widely Used in Biomedical Science NOT-CA-14-054 http://grants.nih.gov/grants/guide/notice-files/not-ca- 14-054.html Response Date: September 30, 2014 All Training FOAs: T32, T15, K01, R25 BISTI FOAs
Associate Director for Data Science Scientific Data Council External Advisory Board Programmatic Theme Sustainability* Education* Innovation* Process Collaboration Deliverable Commons Example Features Cloud Data & Compute Search Security Reproducibility Standards App Store Training Center Coordinate Hands-on Syllabus MOOCs Community Centers Training Grants Catalogs Standards Analysis Modified Review Data Resource Support Metrics Best Practices Evaluation Portfolio Analysis * Hires made Communication IC s Researchers Federal Agencies International Partners Computer Scientists The Biomedical Research Digital Enterprise
Process Current Efforts Clinical data harmonization Data citation Machine readable data sharing plans New review models, audiences etc. Open review Micro funding Standing data committees to explore best practices Crowd sourcing
Collaboration Current Efforts Joint public private partnership workshop with NOAA? 2 joint workshops with NSF + Dear Colleague letter OSTP Open Data 2.0 HIRO s big data meeting ELIXIR working groups
NIH Policy and Cultural Change NIH and other Federal Agencies are working to make digital assets from federally funded research available. Public Access to Data Memo: http://www.whitehouse.gov/sites/default/files/microsites/ostp/o stp_public_access_memo_2013.pdf Applies to publications and digital scientific data Develop a strategy for: leveraging existing archives (where appropriate) fostering public-private partnerships with scientific journals relevant to the agency s research
NIH Policy and Cultural Change New NIH Genomic Data Sharing policy see: http://gds.nih.gov How technology and policy may impact culture If data management plans are included in applications, they can be peer-reviewed for strength and appropriateness to their field. Indexing of digital assets would allow grantees to report to funders and link data to publications. Improved ways to support citation and impact of digital assets (publications, data sets, software, standards), would support credit to scientists.
Challenges to Creating a Digital Research Enterprise Relies on participation of the larger community (funders, researchers, publishers, industry, academics ) Requires technological improvements and coordination Will need associated cultural changes in reward systems that value sharing of digital assets.
Result of? Enable a new digital enterprise that will: include researchers, clinicians, computer scientists, and others who work with digital research assets. recognize and support the importance of publications, data, software, and analyses. ensure that knowledge and resources coming from biomedical research can be more informative and reusable. promote cultural changes in the scientific community
Find out More! ADDS Office: http://www.nih.gov/adds : http://bd2k.nih.gov Workshop reports and videos Funding Opportunities Focus areas, structure, and people in Links to Phil s blog Join the listserve or follow the twitter email: Jennie.Larkin@nih.gov
NIH Turning Discovery Into Health