Twente Grants Week: Data management
Maarten van Bentum (Library & Archive)
Overview
1. Questions to be answered (storage, description and sharing/archiving)
2. Scientific integrity
3. Why data management
4. Data Management Plan (DMP)
5. Archiving data (data repositories/centres)
Questions to be answered
1. Where do you keep your research data?
2. Is there a backup? Where? How many copies?
3. How do you document/describe your data?
4. Who can access your data?
5. Do you share your data during or after research, for instance for reuse? If not, why?
6. What will happen to your data after finishing your research?
Question 1-2: Where do you keep your data? Is there a backup? Where? How many copies? (1/3)
Criteria
- Sustainability/reliability: backup frequency (offline / off-site?)
- Dataset type: raw dataset, versions during processing and analysis, final datasets
- Dataset size: capacity, costs, data transfer
- Legal or contractual regulations
- Access: individual, community, open
Question 1-2: Where do you keep your data? Is there a backup? Where? How many copies? (2/3)
Storage options
1. UT central storage p- or m-disk (ICTS): http://www.utwente.nl/icts/diensten/catalogus/dataopslag_mw/storage/
2. Project, community or research institute storage IGS Datalab: https://www.utwente.nl/igs/datalab/
3. Individual data storage (computer, dvd/cd, external hard disk, ...)
4. Non-commercial cloud storage Surfdrive: https://www.surfdrive.nl/en DataverseNL: https://dataverse.nl/dvn/
5. Commercial cloud storage: Dropbox, OneDrive, ...
DMP - Data storage and backup (3/3)
Backup
- 3 copies (original, external/local, external/remote)
- Local vs. remote depends on the recovery time needed
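A backup copy is only useful if it is identical to the original. The following is a minimal sketch, not a UT or ICTS tool, of how the three copies mentioned above could be compared using checksums. The function names `sha256_of` and `copies_match` are hypothetical, as are the file paths in the usage note.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: Iterable[Path]) -> bool:
    """Return True if every copy exists and has the same checksum as the original."""
    reference = sha256_of(original)
    return all(copy.is_file() and sha256_of(copy) == reference for copy in copies)
```

For example, `copies_match(Path("raw/survey.csv"), [Path("/mnt/local_backup/survey.csv"), Path("/mnt/remote_backup/survey.csv")])` would return False if either backup is missing or differs from the original (all three paths are made up for illustration).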
Question 3: How do you document/describe your data? (1/2)
- Documentation during research of dynamic data sets (for yourself, fellow researchers in the project and/or group)
- Documentation after research of static data sets (for discovery, verification, replication, and reuse)
- Documentation: standard metadata schemes enhanced with specific descriptive elements necessary for verification, replication, and reuse
See list: http://www.dcc.ac.uk/resources/metadata-standards/list
See also 3TU.Datacentrum Data description and formats
Question 3: How do you document/describe your data? (2/2)
Metadata 3TU.Datacentrum
- Creator*: Main researcher(s) involved in producing the data
- Contributor: Institution where the data was created or collected
- Publisher*: Institution which submitted the work
- Title*: Name or title by which a resource is known
- Publication year*: The year when the data was or will be made publicly available
- Date created: Date the resource itself was put together; date range or a single date
- Description*: Concise description of the contents of the dataset
- Subject: Subject, keyword, classification code, or key phrase describing the resource
- Coverage temporal: Indicate the dates to which the data refer
- Coverage spatial: Describe the geographic area to which the data refer
- Identifier: A persistent identifier to a dataset
- URL to publication: Include the web addresses for any publication
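The metadata elements above can be checked automatically before a dataset is deposited. The sketch below assumes that the asterisk marks a mandatory element, which the slide itself does not state. The key names, the function `missing_fields` and the example record are hypothetical and do not represent a 3TU.Datacentrum API.

```python
# Assumption: elements marked with * on the slide are treated as mandatory.
REQUIRED_FIELDS = ("creator", "publisher", "title", "publication_year", "description")


def missing_fields(record: dict) -> list:
    """Return the required metadata fields that are absent or empty in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None and whitespace-only values as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical example record, for illustration only.
example_record = {
    "creator": "J. Jansen",
    "publisher": "University of Twente",
    "title": "Example sensor measurements",
    "publication_year": 2016,
    "description": "",  # left empty on purpose
}
print(missing_fields(example_record))  # prints ['description']
```

A check like this only verifies that the fields are filled in; whether the description is good enough for verification and reuse still needs a human reader.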
Question 4: Who can access your data? (1/2)
Verifiability
3.1. Research must be replicable in order to verify its accuracy. The choice of research question, the research set-up, the choice of method and the references to sources used are accurately documented in a form that allows for verification of all steps in the research process.
3.2. The quality of data collection, data input, data storage and data processing is closely guarded. All steps taken must be properly reported and their execution must be properly monitored (lab journals, progress reports, documentation of arrangements and decisions, etc.).
3.3. Raw research data are stored for at least ten years. These data are made available to other academic practitioners upon request, unless legal provisions dictate otherwise.
3.4. Raw research data are archived in such a way that they can be consulted at all times and with a minimum expense of time and effort.
3.5. The source of all educational material, written as well as oral, is stated.
(From: The Netherlands Code of Conduct for Academic Practice)
Question 4: Who can access your data? (2/2)
- UT data policy?
- Funder requirements?
- Requirements other parties? Contracts?
- Open Access required? Possible?
Dutch Personal Data Protection Act (UT Data Protection Officer)
Question 5: Do you share your data during or after research, for instance for reuse? If not, why?
Why share your data?
- Replication / verification
- Promote your research
- Enable new discoveries (reuse)
"Open where possible, protected where needed"
See NWO policy: http://www.nwo.nl/en/policies/open+science
After research: public, linked to publication(s) > 3TU.Datacentrum, DANS, DataverseNL
Question 6: What will happen to your data after finishing your research?
Proper archiving:
- Trusted data repositories (DANS, 3TU.Datacentrum)
- Linked to publications
- Open or restricted access (DANS)
Open: funder requirements
- NWO data management pilot: http://www.nwo.nl/en/policies/open+science/data+management
- EC Horizon 2020 data management pilot: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Restricted: legal and contractual regulations (Dutch Personal Data Protection Act, http://www.utwente.nl/az/gegevensbescherming/, in Dutch)
Scientific integrity
1. Criteria: Fabrication, Falsification and Plagiarism (FFP)
2. Fabrication of data (Stapel, Schön)
3. Untraceable data (Poldermans)
- Neglect of basic preservation of data
- Neglect of data management
- No proper mechanism for quality control: no data or instruments for easy data reproduction means no possible check
See also: https://www.utwente.nl/en/organization/structure/management/goodmanagement/
Why manage research data
- Validate research results or verification of data (e.g. Netherlands code of conduct for scientific practice)
- Use/reuse research data (secondary user)
- Obligation by the research funding body (EC and NWO)
- Uniqueness of the data (e.g. innovative character of the research)
- Value of the data (non-repeatable observations)
- Importance of data / heritage (e.g. history of science)
Data Management Plan
Formal research project document about what and how data will be collected, stored, described, and archived and how access, reuse and linking to publications will be realised.
- Responsibility
- Description of data
- Methodology data collection
- Documentation: metadata (standards)
- Quality assurance
- Storage and backup
- Policies for access and sharing and provisions for appropriate protection/privacy
- Policies and provisions for reuse, redistribution
- Plans for archiving and preservation of access
From: National Science Foundation and University of California
Data Management Plan
Information, templates and checklists
- UT template: website RDM on Library & Archive
- 3TU.Datacentrum: template
- DANS checklist
- NWO form
Data repositories
- Data centres: 3TU.Datacentrum, DANS
- List of data repositories: Databib or Data repositories
Enhanced publication
Support and/or advice
- Information specialist in your faculty, or
- Maarten van Bentum (data librarian): m.vanbentum@utwente.nl, tel. 489 4474