INTRODUCTION TO DATA MANAGEMENT

By Michelle Lloyd, Kate Crosby, Peter Lawton
Data Management Team
Canadian Healthy Oceans Network
November 2013

Approved and endorsed by the Canadian Healthy Oceans Network Scientific Advisory Committee (SAC) and Board of Directors

Copyright by Canadian Healthy Oceans Network, 2008-2014

TABLE OF CONTENTS

Introduction
Responsibilities and Benefits of a structured approach to data management
So what is all the fuss about METADATA?
What are we asking CHONe researchers to do?

INTRODUCTION

Management of marine ecological data is a complex and daunting task, especially due to the variety of data types that may be acquired in scientific studies (genetic, ecological, taxonomic, oceanographic, geological, geographical, etc.). In addition to managing data within the original time frame and scope of ecological studies, there is also the issue of ensuring the legacy value of newly acquired data and derived information products. Legacy in this context refers both to the protection of the new information from loss (e.g. through a comprehensive backup strategy) and to the extent to which the data can be reused by others in the future (e.g. by storing contextual information, or metadata, in an accessible format that lets others know the what, where, when, why, and how of the original data collection).

Given the national scope and diversity of its research projects, the Canadian Healthy Oceans Network (CHONe) faces all the above challenges and more. Prior efforts to develop a comprehensive data management approach in CHONe were not successful, due to a number of interacting factors. Nonetheless, as we enter the final stages of the first research network and begin to prepare proposals for a renewal, it is critical that we undertake a data management and data rescue process. To achieve this goal, CHONe has recently hired two personnel, Michelle Lloyd and Kate Crosby, as Data Management Coordinator and Data Manager, respectively.
They, along with guidance from Peter Lawton (co-theme leader for the Marine Biodiversity theme) [...] Data Management Team [...] The data management process must be completed by March 2014 to meet CHONe funding [...] gauge the range and magnitude of issues facing CHONe in meeting some of its end-of-program obligations to the Natural Sciences and Engineering Research Council (NSERC) [...] with research project and subproject outputs (theses, publications, thesis chapters) accessible through CHONe [...] was found to be incomplete, and research data documentation and metadata were either incomplete, out of date and/or in some cases non-existent. These issues have complicated, and in some cases halted, our data management progress.

Terms that appear in bold are typically words or phrases that have a specific meaning with respect to the data management approach. These terms, and various other abbreviations used in the text, are defined in a Glossary.

[...]-government research network undertaking a broad range of marine biodiversity-[...] oceans, CHONe has to address the problem of archiving and reusing diverse ecological data. Recent surveys have estimated that only 1% of ecological data are accessibly archived for reuse (Reichman et al. 2011). CHONe aims to have a searchable and widely available Discovery Database and Ocean Biogeographic Information System (OBIS) [...] Discovery Database will contain discovery metadata from all research projects and subprojects, while the OBIS [...]ine species. Both databases will provide a link to and/or the location of complete data packages that may be accessed through Public Digital Repositories (e.g. DataONE, Data Dryad, Figshare, Integrated Science Data Management (ISDM)). Researchers will be able to use CHONe data to infer larger regional and global patterns in marine biodiversity, ecosystem function and population connectivity, and begin to examine the effects of cumulative impacts and risks for ocean sustainability.

RESPONSIBILITIES AND BENEFITS OF A STRUCTURED APPROACH TO DATA MANAGEMENT

Data management refers to all aspects of creating, storing, delivering, maintaining, archiving and preserving data. It is one of the essential areas of responsible research conduct (Whitlock et al. 2010). From an individual project standpoint, data management approaches may be user-specific, minimally structured and minimally documented (Wallis et al. 2013), yet still meet the needs of the researcher in completing and publishing a particular study. The benefits of a structured data management approach include:

1. Meeting NSERC Strategic Network Grant requirements;
2. Enabling reproducibility;
3. Increasing research efficiency and organization among projects within CHONe, as well as subprojects within projects;
4. Ensuring research metadata and data are accurate, complete, authentic and reliable;
5. Enabling the development of CHONe summary products from structured data and metadata that can be applied across projects;
6. Saving time and resources in the long run;
7. Enhancing validation and quality control of the data;
8. Enhancing data durability and minimising the risk of data loss;
9. Preventing duplication of effort by enabling others to re-use data; and,
10. Complying with practices conducted in industry and commerce.

[...] research discovery, private data repository, data submission, and data sharing and permanent archival, and timeline, to ensure data is accessible, understandable and reusable by CHONe partners or other users.

So what is all the fuss about METADATA?

Metadata is the data (i.e. documentation and information) about research data. One of the founding principles of science is reproducibility and replication; however, ecological studies are not easily reproduced or replicated (Reichman et al. 2011). While evolution and genetics have been archiving and re-using data in centralized data repositories for ~30 years, ecology, because of its diverse nature, is still developing its own framework for data archival and retrieval. Meta-analyses make use of past genetic, ecological, taxonomic, oceanographic, geological, and geographical data and metadata to infer larger regional and global patterns. Without metadata to explain the data, the interpretation and merging of data records becomes impossible, or at the very least an arduous task. Here are three examples of different levels of metadata:

Example 1
Consider a simple data record with no column headers. Without metadata, a data record like this one is useless. In addition, there is no information about the location where the data was collected, the focal organism or system, or the identity of the data owner.

Example 2
Now the simple data record contains column headers. We know the day, month, year and time the samples were collected, but what do d, T, S, [...], Fl, w, v, u, w, v, u, RI, SN, and A stand for? The lack of adequate metadata makes this data record useless to anyone other than the owner.

Example 3
Now the simple data record contains defined column headers, including the variable units, and is accompanied by a corresponding metadata record that identifies the data record owner, where, when, why and how the data was collected, who funded the research, etc.

[Figure: Data Record and Metadata Record]
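The three levels of metadata in Examples 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the sample values, the meanings assigned to the codes d, T and S, their units, and the metadata field names are all assumptions made for the sketch, since the source does not define them.

```python
import csv
import io

# Hypothetical data row; the source does not reproduce the actual record.
RAW = "12,7,2011,10:30,5.0,8.2,31.4\n"

# Level 1 (Example 1): no headers, so each row is just a list of strings.
level1 = next(csv.reader(io.StringIO(RAW)))

# Level 2 (Example 2): terse headers that only the data owner can interpret.
HEADERS = ["day", "month", "year", "time", "d", "T", "S"]
level2 = dict(zip(HEADERS, level1))

# Level 3 (Example 3): headers defined with units.
# These definitions are assumptions for illustration only.
DEFINITIONS = {
    "d": ("depth", "m"),
    "T": ("temperature", "degrees C"),
    "S": ("salinity", "PSU"),
}


def describe(column):
    """Return 'name (unit)' for a defined column, or the bare code otherwise."""
    if column in DEFINITIONS:
        name, unit = DEFINITIONS[column]
        return f"{name} ({unit})"
    return column


level3 = {describe(column): value for column, value in level2.items()}

# A matching metadata record; every value here is a placeholder.
metadata_record = {
    "owner": "hypothetical researcher",
    "where": "hypothetical sampling site",
    "when": "2011-07-12",
    "why": "hypothetical study objective",
    "how": "hypothetical sampling method",
    "funder": "hypothetical funding source",
}

for label, value in level3.items():
    print(f"{label}: {value}")
```

An undefined code such as Fl passes through `describe` unchanged, which mirrors the problem in Example 2: without a definition, the header tells a reader nothing.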

See Dryad data package or Appendices 1 and 3 for more examples.

What are we asking CHONe researchers to do?

[...] researchers (i.e. investigators, students and postdoctoral fellows) for all CHONe-related research projects and subprojects. The data management team has estimated that 36 project and >60 subproject discovery metadata records will need to be collected, and that >100 data packages must be collected by March 2014 (or 2 years after completion of data collection, whichever comes last) (Figure 1).

Within this context, project denotes the 36 project grants awarded to CHONe investigators. Subproject refers to individual student projects and other initiatives within those 36 projects. As of June 2013 there were ~40 completed and 58 ongoing research subprojects. The data management team will first target those researchers who have completed their research projects and subprojects.

Figure 1. Anticipated number of discovery metadata records to accompany each data record.

The data management process is multi-stepped, requiring researchers to complete 4 tasks:

1. Update researcher information and research project or subproject description.
2. Complete an online standardized discovery metadata survey for each project or subproject (<15 minutes to complete). [1]
3. [...] data record) for each subproject (or project if no subprojects exist) to the CHONe Data Repository to back up and secure the data against loss, unless your research data is already archived in a publicly accessible repository, in which case provide us with a permanent DOI link to and/or the location of your data. [2]
4. Submit the corresponding metadata record and documentation for each data record. Only researchers really know their data. The metadata record must be accurate, complete, authentic and reliable.

See Data Management Plan for further details.

[1] Students and postdoctoral fellows MUST have their investigator approve their surveys.
[2] Should CHONe be re-funded, the raw data record and processed data record will be uploaded regularly using versioning software to prevent accidental loss.
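As a rough illustration of how progress on the four tasks above might be tracked per subproject, the following Python sketch lists what is still outstanding. The task keys, the `Subproject` class and the `outstanding` function are hypothetical names invented for this sketch; they do not describe any actual CHONe tool.

```python
from dataclasses import dataclass, field

# Short keys for the four tasks, in the order listed above (hypothetical labels).
TASKS = ("researcher_info", "discovery_survey", "data_record", "metadata_record")


@dataclass
class Subproject:
    name: str
    completed: set = field(default_factory=set)  # task keys already done
    needs_approval: bool = False   # True for students and postdoctoral fellows
    survey_approved: bool = False  # investigator has approved the survey


def outstanding(sub):
    """Return the tasks still to do, in the order given in the text."""
    todo = [task for task in TASKS if task not in sub.completed]
    # Footnote 1: a completed survey by a student or postdoctoral fellow
    # still needs investigator approval.
    if (sub.needs_approval and not sub.survey_approved
            and "discovery_survey" in sub.completed):
        todo.insert(0, "survey_approval")
    return todo


example = Subproject(
    "hypothetical subproject",
    completed={"researcher_info", "discovery_survey"},
    needs_approval=True,
)
print(outstanding(example))
```

For the example subproject this prints `['survey_approval', 'data_record', 'metadata_record']`: the survey is done but not yet approved, and tasks 3 and 4 remain.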