Integrating Environmental Health Data to Advance Discovery
NAS Committee on Use of Emerging Science for Environmental Health Decisions
January 10-11, 2013
Allen Dearry, Ph.D., NIEHS
Outline
- Is there a data problem?
- NIEHS strategic plan and knowledge management
- Data sharing
- NIH data and informatics planning
- What can NIEHS do?
Why are data an issue?
- Biomedical research datasets are growing at an accelerating rate.
- Science is increasingly computational, data-intensive, and collaborative because digital technologies give scientists new ways both to create scientific information and to communicate, replicate, and reuse scientific knowledge and data.
- Changes in biomedical research technologies and methods have shifted the bottleneck in scientific productivity from data production to data management, communication, and interpretation.
- Encourage and support a research ecosystem that leverages data and tools.
Data Scale
- We are currently generating 1.8 zettabytes (1 ZB = 10^21 bytes) of electronic data every year (Science, 2012).
- Every two days we create as much information as we did from the dawn of civilization up until 2003, approximately 5 exabytes (1 EB = 10^18 bytes) of data (Google, 2010).
- The world's information is doubling every two years (International Data Corp, 2011).
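The three figures above come from different sources, years, and units. As a rough sanity check, the short Python sketch below puts them on a common scale. It is an illustration added here, not part of the cited analyses, and it assumes decimal SI prefixes (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes), a 365-day year, and a strict two-year doubling period; the helper name `growth_factor` is hypothetical.

```python
# Rough sanity check of the data-scale figures on this slide.
# Assumptions: decimal SI prefixes, a 365-day year, strict 2-year doubling.
EXABYTE = 10**18    # bytes
ZETTABYTE = 10**21  # bytes

# Science (2012): about 1.8 ZB of electronic data generated per year
annual_2012 = 1.8 * ZETTABYTE

# Google (2010): about 5 EB created every two days, scaled to a full year
annual_2010 = 5 * EXABYTE * 365 / 2


def growth_factor(years, doubling_period=2.0):
    """Multiplicative growth after `years` if volume doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Project the 2010 annual rate forward two years using the IDC (2011) doubling period
projected_2012 = annual_2010 * growth_factor(2)

print(f"2012 (Science): {annual_2012 / EXABYTE:,.0f} EB/year")
print(f"2010 (Google):  {annual_2010 / EXABYTE:,.1f} EB/year")
print(f"2010 rate projected to 2012: {projected_2012 / EXABYTE:,.1f} EB/year")
```

Under these assumptions the 2010 rate works out to about 912.5 EB (roughly 0.9 ZB) per year, and one doubling brings it to about 1,825 EB, close to the 1.8 ZB cited for 2012. Because the sources use different methods, the agreement is indicative only.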
NIEHS Strategic Plan, 2012-2017
- Mission: The mission of the NIEHS is to discover how the environment affects people in order to promote healthier lives.
- Vision: The vision of the NIEHS is to provide global leadership for innovative research that improves public health by preventing disease and disability.
NIEHS Strategic Plan: Knowledge Management
- The pace of data generation in environmental health science has outstripped the existing infrastructure for acquiring, managing, analyzing, visualizing, and disseminating information.
- Information, data, and knowledge management is an overarching issue that bears on all of the strategic planning themes.
- There is broad consensus that more informatics expertise and resources are needed to support environmental health science research.
NIEHS Strategic Plan: Knowledge Management
- Dedicated strategic investments of resources are necessary to support the information and knowledge needs of the many disciplines within environmental health science and to encourage an interdisciplinary approach to investigating, analyzing, and disseminating findings.
- Develop bioinformatics, biostatistics, and data integration tools for interdisciplinary research in environmental health science.
- Develop and invest in publicly available resources and computational tools for integrating and analyzing environmental health data.
NIEHS and NTP interest in a more integrated approach to data science in environmental health
- Advance basic and translational science by facilitating and enhancing the sharing of research-generated data.
- Promote development of new analytical methods for emerging data.
- Offer continuously upgraded tools, systems, and services to facilitate data management and sharing.
- Maximize NIEHS and NTP investments by enhancing coordination among research groups that address common and prevalent exposures and diseases.
- Accelerate translation of findings into clinical and public health practice.
Unique considerations for sharing environmental health data
- Heterogeneity of environmental and biological measurements.
- Potential to identify individuals by linking environmental exposure data with geographic data.
- Increased interest in returning individual- or community-level results from environmental health research.
- Regulatory implications of the use of environmental exposure and health data.
- Concerns of vulnerable populations who may be disproportionately affected by environmental exposures.
NIEHS Workshop on Data Sharing, February 2012: Recommendations
- Develop common environmental measurement vocabularies.
- Harmonize data collection methodologies.
- Address future sharing of data in informed consent processes.
- Address privacy and confidentiality concerns.
- Incorporate the specific needs and stated preferences of individuals and communities into data sharing plans.
- Articulate expectations for the return of research results, scientific publications, and other forms of dissemination in informed consent processes.
NIH Data and Informatics Planning
Data and Informatics Working Group (DIWG) of the Advisory Committee to the Director (ACD): Recommendations, June 2012
- Promote data sharing through central and federated catalogs.
- Support the development, implementation, evaluation, maintenance, and dissemination of informatics methods and applications.
- Build capacity by training the workforce in relevant quantitative sciences.
- Develop an NIH-wide data strategic plan.
- Provide a serious funding commitment to support these recommendations.
What Are the Big Problems to Solve?
1. Locating the data
2. Getting access to the data
3. Extending policies and practices for data sharing
4. Organizing, managing, and processing biomedical Big Data
5. Developing new methods for analyzing biomedical Big Data
6. Training researchers who can use biomedical Big Data effectively
NIH Big Data to Knowledge (BD2K)
I. Facilitating Broad Use of Biomedical Big Data: new policies, cataloging datasets, data and metadata standards
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data: large-scale computing, engaging developers
III. Enhancing Training for Biomedical Big Data: strengthening quantitative skills
IV. Establishing Centers of Excellence for Biomedical Big Data: catalog and citation mechanisms, EHRs, privacy, imaging
What Can NIEHS Do To Enhance Data Integration and Sharing?
- Sharing of NIEHS-produced and NIEHS-supported data requires fundamental changes in current data management and dissemination practices.
- Leverage and expand existing data science knowledge and resources.
- Identify means for harmonization, storage, analysis, and management of data, including:
  - data management and sharing standards for the entire lifecycle of various kinds of digital data;
  - criteria for deciding the appropriate level of management and preservation.
- Consider one or more trusted digital repositories (TDRs) for long-term storage, stewardship, and access to scientific data.
- Coordinate with colleagues at other NIH Institutes, federal agencies, academic institutions, NGOs, and private industry.