Data Management Considerations for the Data Life Cycle
1 Data Management Considerations for the Data Life Cycle. NRC STS Panel 2011, November 17, 2011, Washington DC. Peter Fox (RPI), Tetherless World Constellation
2 Working premise: Scientists (actually ANYONE) should be able to access and use a global, distributed knowledge base of scientific data that appears to be integrated and appears to be locally available. But data and information are obtained by multiple means (instruments, models, analysis) using various (often opaque) protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. They may be inconsistent, incomplete, evolving, and distributed, AND created in a form that facilitates generation, not use (except by accident). And there are significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, and inflexible and unsustainable implementation technology. Uh-oh.
3 Life cycle overused? Data Life Cycle: The data life cycle is a term coined to represent the entire process of data management. It starts with concept study and data collection, but importantly has no end, as data is continually repurposed, creating new data products that may be processed, distributed, discovered, analyzed and archived. Fully supporting the different steps in the life cycle puts demands on metadata, standards, tools and people.
4 Definition, simply put. Data life-cycle elements (simple 3-level): 1. Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction). 2. Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future. 3. Preservation: Process of retaining usability of data in some source form for intended and unintended use. Stewardship: Process of maintaining integrity across acquisition, curation and preservation, plus arranging for discovery, access and use of data, information and all related elements. Involves fiscal and intellectual responsibility.
5 People! Curation stages. Fox VSTO et al.
6 People, organizations, norms: not so simply put. Even cast into three broad stages (Acquisition / Curation / Preservation), the people and roles, their organizations, their means of generation, and what matters to them can be very different across and (especially) between stages of the life cycle, even if they overlap. Stewardship: So how is an effective and coherent process of maintaining integrity across acquisition, curation and preservation, plus arranging for discovery, access and use of data, information and all related elements, achieved? Especially as it involves fiscal and intellectual responsibility. Sustainability = resources + social + organizational constructs.
7 Digital Curation Center model
8 MIT DDA Alliance model
9 It does not go on forever
10 Business or software model?
11 Can we really fulfil futures with diagrams? GEO Secretariat
12 GeoData 2011: Metadata at All Stages of the Data Life Cycle; Best Practices in Data Life Cycle Management; The Human Factor; Training and Outreach; Technology gaps; A business model for the longer term (NSF, USGS, NASA, NOAA, )
13 The Human Factor in Data Life Cycle Management: Tendency of people to continue working the way they have in the past: the inertia of habit. This leads to slow growth in data submissions to archives. Both carrots (such as wider data use and data citations) and sticks (contingent funding and contingent publication of papers) were suggested as potential motivators to change this inertia. "Will publish data for credit."
14 Personal view: Over-emphasis on preservation. Heavier, rather than supportable or scalable, approaches to curation. Significant attention deficit disorder to acquisition. Why value? Return on investment but those that need to perform the key data management functions (across life cycle). A-I-G coalitions!
15 E.g. Identifiers. Idea: not to pre-empt an implementation. DOI = e.g /s URI, entifier e.g n85/fulltext.pdf. But has fundamental consequences for data in the life cycle (sharing, annotation).
16 To be sustainable: Bridge, or eliminate, disconnects between people, organizations, norms. Yes, it's so easy to write. Problem: the stakeholders are not all at the table. Lightweight solutions, e.g. linked data, have a lot of potential but terrify data curators and preservationists. Heavyweight solutions, e.g. full life cycle OAIS approaches, are robust. Lightweight solutions terrify (data) generators.
17 Temptation: To run screaming from the room? Been there, done that. Or look for a GOOD solution, one that addresses the goal of the virtual organization across the life cycle (they exist "in the small"). Focus on value: the real and immediate value to the people, their institutions/communities, and funders.
18 Thanks for listening Questions? Comments?
19 Data has Lots of Audiences: More Strategic, Less Strategic. Science too! From "Why EPO?", a NASA internal report on science education,
20 Recommendations (1) Develop review criteria for data management plans that recognize the criticality of upfront planning and of metadata and data design, as well as the importance of enabling future repurposing of the data. Develop proposal and review criteria for instrument development that ensure adequate metadata collection at the time of measurement. Develop a searchable repository of best practices in data life cycle management, capturing those noted in GeoData 2011, the NSF Research Data Lifecycle Management Workshop in July 2011, and related work in the Earth Science Information Partners and similar organizations.
21 Recommendations (2) Identify Data Life Cycle Communities of Practice within NSF programs as well as other organizations such as ESIP and the American Geophysical Union. Develop methods, such as workshops and AGU special sessions, for bringing together overlapping Communities of Practice for information exchange. Develop domain-specific tools for adding and editing metadata, particularly within ISO metadata standards.
22 Recommendations (3) Develop incentives (both carrots and sticks) to induce data providers to develop metadata and data products that will be usable by both narrow initial users of data and the wider community of interdisciplinary users reusing data for other purposes. Develop curricula targeted at both the practicing researcher and science students in data science, including data collection, management and integration. Continue supporting work in recording provenance of data, while shifting some resources to the practical application of this work.
23 Recommendations (4) Support research and applications aimed at improving cross-disciplinary and inter-disciplinary discovery of data and related services. Develop Business Models for persistent long-term archives that take into account the funding cycle as well as the data life cycle.
