Checklist for a Data Management Plan draft




The Consortium Partners involved in data creation and analysis are kindly asked to fill out this form in order to provide information for each dataset that will be generated in the research project. Details provided by the survey will help to define the current state of the project's data, enabling the staff in charge of DMP preparation to address relevant data management issues at this stage of the research project.

The DMP requires constant updating, and at least three reviewed versions must be provided (within six months, mid-term, end of the project). Updates are required when major changes occur, such as the creation of new datasets or changes in the access policies. The DMP must cover all datasets generated in the research project.

The following questionnaire must be filled out for every research dataset.

NB. Identifiers for researchers and datasets are not required at this stage: they must be provided once the preservation policies have been finalised.

1. Define and describe datasets created

1.1 Provide a title, an identifier (not covered in this report) and a brief description of the dataset

Give a proper name to the dataset and provide a brief description (abstract): main content, time/space coverage and methodologies applied for data creation.
e.g. Qualitative data from GIS-P - Recordings and maps from GIS for Participation focus groups.

When the final Data Management Plan is submitted, each dataset will have to be identified by a persistent identifier (e.g. DOI) and referenced for access, re-use and validation of results (i.e. the URI of the dataset in the chosen research data repository). You will then have to describe how to cite the dataset properly. The data citation should include at least:
- all contributors
- date of dataset publication
- title of dataset
- media or URL
- data publisher
- an identifier (Digital Object Identifier)
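As an illustration of the citation elements listed in 1.1, the following minimal Python sketch assembles them into a single citation string and refuses to format a citation that has gaps. The class name, the ordering and punctuation of the elements, and all example values (contributors, repository name, URL, DOI) are assumptions made for this sketch; the repository or publisher you choose may prescribe its own citation style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Minimum citation elements listed in section 1.1 (hypothetical helper)."""
    contributors: List[str]
    published: str   # date of dataset publication
    title: str
    publisher: str   # data publisher, e.g. the repository
    url: str         # media or URL
    doi: str         # persistent identifier

    def missing_fields(self) -> List[str]:
        """Return the names of required elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def format(self) -> str:
        """Build the citation string; the element order is an assumption."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("citation is incomplete: " + ", ".join(missing))
        authors = "; ".join(self.contributors)
        return (f"{authors} ({self.published}). {self.title}. "
                f"{self.publisher}. {self.url}. doi:{self.doi}")


# All values below are invented placeholders, not real records.
citation = DatasetCitation(
    contributors=["Rossi, A.", "Smith, B."],
    published="2016",
    title="Qualitative data from GIS-P",
    publisher="Example Data Repository",
    url="https://repository.example.org/gis-p",
    doi="10.1234/example.gis-p",
)
print(citation.format())
```

Checking for missing elements before formatting mirrors the purpose of the checklist: an incomplete citation is reported rather than silently published.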

1.2 Provide provenance information for each dataset

Name the people responsible for the dataset throughout its lifecycle, including: name, contact information and role (e.g. principal investigator, technician, data manager). When the first version of the Data Management Plan is submitted, the main researcher should be identified with a persistent identifier (e.g. ORCID).

1.3 Describe data sources used by the project

Give a description of the context of the dataset with respect to other projects or studies, and include links and related documentation, if applicable.
- Explain whether existing data sources are used by the research project, including references to data and publications.
- Analyse the gap between previous or current data and yours.
- Describe the limits of previous works and how you will improve on their results in your project.
- Describe the trustworthiness of third-party data.
If new data are created, explain why no suitable existing data are available.

1.4 Describe nature of data

Describe which types of data you will collect or generate. For example:
- Instrument measurements
- Experimental observations
- Still images, video and audio
- Text documents, spreadsheets, databases
- Quantitative data (e.g. household survey data)
- Survey results & interview transcripts
- Simulation data, models & software
- Slides, artefacts, specimens, samples
- Sketches, diaries, lab notebooks

1.5 Describe a potential target for your dataset

Describe who may be interested in accessing your data and, where possible, how your results could be re-used in other research studies.

2. Describe standards and metadata used by the project

2.1 How will data be created?

Describe how data are (or will be) created: e.g. starting from a public data source that is manipulated to create a dataset; merging different datasets into a new one; creating new data.

2.2 Describe typologies, content and quality of data

State whether they are raw, derived or secondary data. Data may be numerical, textual/descriptive, visual or tactile. They may be created in digital form (born digital) or converted (digitized). Describe the content of the data and the goals they serve.

2.3 Data format

Open, standard file formats should be used, as they are easier to preserve; some funders may state preferred formats. If necessary, explain why you adopt specific formats instead of more common ones. If proprietary formats are used, conversion of data across formats is required. A good practice in data management is to provide the same data in several formats. (see http://ukdataservice.ac.uk/manage-data/format.aspx for further information about suggested data formats)
e.g. a spreadsheet in Excel (.xls) can be converted into a comma-separated values file (.csv): both can be stored for long-term preservation.

2.4 Data volume

The amount of data to be produced should be estimated from the start of the project. Consider both the amount of information (e.g. 5000 interview transcripts) and the storage required (e.g. 300MB). Refer to the current state and, where applicable, describe how the data will grow during the life cycle of the dataset and what is expected at the mid-term and final stages of the project.

2.5 How will you document your data? Describe metadata standards

Contextual information for data is called metadata. A good practice is to provide self-explanatory data (in terms of variables, codes, abbreviations) and contextual documentation describing what your data mean, how they are collected and which methods are used to create them. Details can be recorded in a text file (such as a readme file) in the same directory as the data. Both data and metadata have to be in an open format for preservation and dissemination.
e.g. documentation about datasets and data can be provided as text files (.txt, .odt), tabular data (.csv, .tsv) or spreadsheets, and semi-structured data (.xml).
Metadata can cover several features of the data: e.g. provenance (of the dataset), bibliographic information, administrative issues, research field information. (see http://ukdataservice.ac.uk/manage-data/document.aspx)

2.6 Versioning, file naming conventions and structure of files in dataset

These details help to ensure that data will be easily understood, for the purpose of re-use and preservation, and must also be covered in the documentation attached to the dataset. If data are held in various places or change considerably during their life cycle, a method for keeping track of versions is required.
e.g. a research group can keep track of all versions of its dataset, or only the first and the last version, or it may not need to keep track of previous versions and decide to preserve only the latest version.
These choices are also relevant when deciding which repository will hold the data. Furthermore, a well-organized folder structure and self-descriptive file names make re-use easier. Establish from the start of data collection how files will be organized and which file naming conventions will apply (and document this in the metadata). (see http://ukdataservice.ac.uk/manage-data/format/organising.aspx for further information)

2.7 Methodologies for data collection and/or processing

Data may be raw, cleaned or processed, and may be held in any format or media. Describe standardized procedures to collect and process your data.

Describe whether data cleaning or anonymisation is required.

2.8 Data quality assurance

Describe the quality assurance procedures that will be carried out on data at the time of data collection, data entry, digitization and data checking. (see http://ukdataservice.ac.uk/manage-data/format/quality.aspx for further information)
e.g. quality control measures can include:
- calibration of instruments to check the precision, bias and/or scale of measurement
- using standardised methods and protocols for capturing observations, alongside recording forms with clear instructions
- computer-assisted interview software to: standardise interviews, verify response consistency, route and customise questions so that only appropriate questions are asked, confirm responses against previous answers where appropriate and detect inadmissible responses
- setting up validation rules or input masks in data entry software
- using controlled vocabularies, code lists and choice lists to minimise manual data entry
- detailed labelling of variable and record names to avoid confusion
- designing a purpose-built database or spreadsheet structure to organise data and data files
- accompanying notes and documentation about the data
- checking data completeness
- correcting errors made during transcription
- peer review

3. Describe how your data will be shared and disseminated during data life-cycle

3.1 How will data be shared?

Explain how data can be made accessible to others: download, open repository accessible via the web, web sites, etc. Identify a repository for the current deposit of data (e.g. public thematic repositories). Give details about the repository used by partners for sharing data during the research project and the repository used for long-term preservation.
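Before data are deposited for sharing, some of the quality control measures listed in 2.8 (validation rules, controlled vocabularies, completeness checks, detection of inadmissible responses) can be automated. The sketch below uses only the Python standard library; the column names, the admissible age range and the region code list are hypothetical and would need to be replaced by the rules documented for your own dataset.

```python
import csv
import io

# Hypothetical code list (controlled vocabulary) for one variable.
ALLOWED_REGIONS = {"north", "south", "east", "west"}
REQUIRED_FIELDS = ("respondent_id", "age", "region")


def check_row(row):
    """Return a list of (field, problem) pairs for one record."""
    problems = []
    # Completeness check: every required field must have a value.
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append((field, "missing value"))
    # Validation rule: age must be a whole number in an assumed range.
    age = (row.get("age") or "").strip()
    if age and not (age.isdecimal() and 0 <= int(age) <= 120):
        problems.append(("age", "inadmissible value"))
    # Controlled vocabulary: region must come from the code list.
    region = (row.get("region") or "").strip().lower()
    if region and region not in ALLOWED_REGIONS:
        problems.append(("region", "not in code list"))
    return problems


def validate_csv(text):
    """Check CSV text and return (line number, field, problem) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    report = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field, problem in check_row(row):
            report.append((line_no, field, problem))
    return report


# Invented sample data for illustration only.
SAMPLE = """respondent_id,age,region
r001,34,north
r002,,south
r003,240,centre
"""

report = validate_csv(SAMPLE)
for entry in report:
    print(entry)
```

The report lists each problem with its line number so that errors can be corrected at the source and the check re-run; the rules themselves belong in the documentation that accompanies the dataset.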

3.2 Identify a proper license for re-using data

In the H2020 Open Data Pilot, Creative Commons licenses are recommended (e.g. CC0, CC BY): use a specific license for data. Establish whether an embargo period is required and explain why. Consider whether you need to anonymise data, e.g. to remove identifying information or personal data, during research or in preparation for sharing. If necessary, define which restrictions on data access will be applied and why (different access regulations may be needed during and after research). Consider the policies of your institution and the Consortium Grant Agreement.
e.g.
- when gaining informed consent, include consent for data sharing;
- where needed, protect participants' identities by anonymising data;
- address access restrictions to data before commencing research;
- define when restrictions are needed on sensitive data, accessible only by the research team;
- declare if an embargo period is required by an editor or for commercial exploitation.

When choosing licenses, consider the requirements of the Open Data Pilot in terms of data sharing: "[...] a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship (community standards, will continue to provide the mechanism for enforcement of proper attribution and responsible use of the published work, as they do now), as well as the right to make small numbers of printed copies for their personal use" [1] (see http://www.dcc.ac.uk/resources/how-guides/license-research-data for further information)

When using third-party data, license agreements should allow for the derived data to be shared and preserved. If third-party data are re-used, please give details about distribution licenses and terms and conditions for re-use, with reference to IP and privacy issues. Establish who will own the copyright and IPR of any new data that you will generate.

[1] Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf, p.2

3.3 Tools and software used and necessary for re-using data

In order to enable sharing, re-use and preservation of data, the software required to access the data must also be open source (non-proprietary software and software based on open standards). Describe and explain here what a user needs in order to re-use your data. If needed, tools and software should also be preserved along with the data.

4. Data curation and preservation (Not covered here)

Further issues have to be addressed when drafting the DMP, which requires more than a description of the current state of the research data. Policies, protocols, tools and environments for long-term preservation are to be detailed in the fifth section of the DMP. These details will be discussed among the Coordinator, WP Leaders, technicians and the people identified as responsible for data management and curation, after collecting relevant documentation from partners. If relevant, some issues can already be addressed.

4.1 Do data need further processing for long-term preservation?

When data cannot simply be stored as they are for long-term preservation, further manipulation is needed in order to meet H2020 requirements.
e.g. conversion to an open format, data cleaning, anonymisation, elimination of certain datasets, metadata validation, etc.

4.2 What data will be stored for long-term preservation?

The H2020 Open Data Pilot requires storage in a research data repository only of underlying data, i.e. the data needed to validate publications, which have to be accessible and preserved in a long-term perspective. At the latest when a scientific publication is submitted for peer review, data have to be deposited in a research data repository for preservation, ensuring that they will be citable and accessible.

4.3 Which back-up and security system will be applied?

Are there particular types of data that need non-standard storage or handling, such as secure storage of personal or sensitive data?

4.4 Which OpenAIRE-compliant repository is chosen for long-term preservation?

A research repository must meet interoperability requirements: the OpenAIRE Portal provides a list of repositories compliant with the minimal Open Data requirements in H2020.

4.5 Estimated data volume in final stage of project

4.6 Define costs, responsibilities and team for data management

Identify, through the costing tool attached to this documentation, which packages you will require for data management and which skills and features your research group will manage in-house. Establish who within your research team will be responsible for data management, metadata production, dealing with quality issues and the final delivery of data for sharing or archiving. For collaborative projects, explain how data management responsibilities are coordinated across partners. Explain whether additional specialist expertise and software/hardware resources are required. Outline and justify costs.

4.7 Are there any other relevant issues not covered by the above questions?

Bibliography

- Jones, S., Guy, M., Pickton, M., Digital Curation Centre, Research Data Management for librarians, 2013, http://www.dcc.ac.uk/sites/default/files/documents/events/rdm-for-librarians/rdm-for-librarians-booklet.pdf
- Jones, S., Pryor, G., and Whyte, A., Digital Curation Centre, How to Develop RDM Services - a guide for HEIs, 25 March 2013, http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services
- DMPonline, a tool for writing DMPs, https://dmponline.dcc.ac.uk
- UK Data Archive, Managing and sharing data: best practice for researchers, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
- UK Data Service, http://ukdataservice.ac.uk/