Introduction to Research Data Management Marta Teperek, Veronica Phillips 30/10/2015 University of Cambridge
TODAY: Mixture of activities and talking
Introduction
1. Backup and exchange strategies
2. How to organise your data well
3. Data sharing
4. How to avoid problems => data management plans
We will send you the slides
Part 1: Data backup and data exchange strategies
Disastrous data loss Credit: Peter Murray-Rust http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-datamanagement-plan/ August 2011, CC-BY
How much of your data would you lose if:
- your laptop got stolen?
- your lab/office burnt down?
- you lost your USB stick?
- your portable hard drive got damaged?
- data from your Dropbox account disappeared?
Copyright: https://www.lacie.com/more/?id=10129
Backup strategies: Your own examples? Copyright: http://blog.baroan.com/
Backup strategies:
- Departmental backup system
- External drives
- Online backups
At least two backups, at two different locations
At least 2 backups at 2 locations:
- Back up on a fixed schedule, e.g. every Monday morning, or every day at 10am (automated!)
- Store one copy at home!
- Shiny new exciting data? Copy ASAP!
Free software to manage backups (there is plenty of free software!): http://www.2brightsparks.com/download-syncbackfree.html
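The "at least 2 backups at 2 locations" rule can also be scripted. The following is a minimal sketch, not a replacement for the software linked above: it copies a data folder to two or more destinations and checks each copy against the original with a SHA-256 checksum. The function names `sha256_of` and `back_up` and all folder paths are hypothetical and chosen only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file under `source` to each destination and verify the copies.

    Returns the copies whose checksum does NOT match the original
    (an empty list means every copy was verified).
    """
    if len(destinations) < 2:
        raise ValueError("use at least two backup locations")
    mismatches = []
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        for destination in destinations:
            copy = destination / relative
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, copy)  # copy2 also keeps file timestamps
            if sha256_of(copy) != sha256_of(original):
                mismatches.append(copy)
    return mismatches
```

A call such as `back_up(Path("data"), [Path("/mnt/external_drive"), Path("/mnt/department_share")])` (both paths are made up) could then be run by the operating system's task scheduler, so that the "every day at 10am" backup does not depend on anyone remembering it.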
File sharing: Your own strategies?
File sharing:
- Google Drive/Dropbox (be cautious!)
- E-mail
- Moodle
- SharePoint
- FTP/SFTP
- University of Cambridge private cloud (under development)
Part 2: Data organisation Copyright: DAM Learning Centre
Data organisation: Your own examples?
Data organisation should:
- be consistent
- be meaningful to you and your colleagues
- allow you to find files easily
Would you be able to easily get hold of your own data?
Copyright: http://www.vukovicnikola.info/folder-structure-for-research/
Data organisation: also applies to physical samples (Marta's PhD project)
Organisation of physical samples:
- create maps of your samples
  o and keep them up to date!
- reference your samples:
  o date in the lab books
  o supplier's name/code
- add any relevant notes
File naming conventions: why do they matter? Copyright: http://10pm.com/
File naming conventions: why do they matter? Would you know in 3 years' time what all these are?
File naming convention: http://www.data.cam.ac.uk/files/gdl_tilsdocnaming_v1_20090612.pdf
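A naming convention is easiest to follow when file names are generated rather than typed by hand. The sketch below assumes one possible convention (project, short description, version, date); this pattern and the function name `build_file_name` are illustrative assumptions and are not taken from the guideline linked above. The date is written as YYYYMMDD so that names sort chronologically.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, version: int,
                    created: date, extension: str) -> str:
    """Build a name like 'project-x_growth-curves_v2_20151030.csv'."""
    def clean(text: str) -> str:
        # Lower-case the text and turn each run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{clean(project)}_{clean(description)}_v{version}_"
            f"{created:%Y%m%d}.{extension.lstrip('.').lower()}")
```

For example, `build_file_name("Project X", "Growth curves", 2, date(2015, 10, 30), ".CSV")` gives `project-x_growth-curves_v2_20151030.csv`. Whatever pattern a group agrees on, the point is that it is written down once and applied the same way by everyone.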
Part 3: Data sharing
Data sharing Copyright: http://mikeholtzer.com/
What is your opinion?
It would be useful if research data underpinning publications was available
I (/my group) regularly share research data underpinning publications
Why not share? Outline the top 4 reasons for your group Credit: Dr Jenny Molloy, Open Knowledge Foundation
Benefits of data sharing:
- Moving science forward
- Transparency in research
- Higher citation, better recognition
- More collaborations
- Public money better spent
- Less time wasted
- Better data management
- Cultural change
Funders' policies for research data
Publicly funded research data are a public good (...), which should be made openly available with as few restrictions as possible
How to share data?
- Describe your data
- Deposit your data in suitable repositories
- Use persistent links, e.g. DOIs (Digital Object Identifiers)
- Store data for (at least) 10 years
http://www.bbsrc.ac.uk/documents/data-sharing-policy-pdf/
Exemptions:
- Personal/sensitive data
- IP protection/commercial data
- Too expensive to share via a repository
An appropriate statement in the publication needs to explain the reasons for restrictions
http://www.data.cam.ac.uk/funders/bbsrc
On the horizon:
Random checks on all publications from 1 May 2015 that acknowledge EPSRC, plus sanctions for not sharing
What do I need to do?
- For every new publication: share what is shareable and add a statement
- Be aware of the help available to you at the University of Cambridge
Cambridge support for data management and sharing
www.data.cam.ac.uk
FUNDERS' POLICIES www.data.cam.ac.uk/funders
DATA REPOSITORY www.data.cam.ac.uk/repository
DATA REPOSITORY University of Cambridge data repository www.data.cam.ac.uk/upload
Part 4: How to avoid problems with data management
Data management plan: a roadmap to help you not get lost in your data
Data management plan: You now have 10 mins to write your own data plan!
Data management plan: 1. Identify the type of data you are working with: E.g. microscopic images, video recordings; big data, small data; physical samples; lab books. 2 mins!
Data management plan: 1. Identify the type of data you are working with:
Big volume data: microscopic images in proprietary file formats (hundreds of images, each 250MB in size; total volume of images: 200GB); genomic and transcriptomic data (about 150 raw sequencing reads, each around 2GB in size, plus processed files of similar size for each)
Small volume data: experimental files from other instruments (stored in proprietary formats and exported to non-proprietary formats), spreadsheets and graphs, reports, dissemination documents, scans of lab book pages
Physical data: various types of physical samples and lab books
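The figures in this sample answer translate into a rough storage estimate, which is useful when choosing backup media. The arithmetic below is a sketch under two assumptions that are not in the original answer: decimal units (1GB = 1000MB), and processed sequencing files taking exactly as much space as the raw ones ("similar size"). The last line also assumes the two backup copies recommended earlier in the session.

```python
MB_PER_GB = 1000  # assumption: decimal units

# Microscopic images: 200GB in total, 250MB each.
image_size_mb = 250
total_images_gb = 200
image_count = total_images_gb * MB_PER_GB // image_size_mb  # 800 images

# Sequencing data: about 150 raw files of around 2GB each.
raw_read_count = 150
read_size_gb = 2
raw_sequencing_gb = raw_read_count * read_size_gb  # 300GB
processed_sequencing_gb = raw_sequencing_gb  # assumption: same size as raw

total_big_data_gb = (total_images_gb + raw_sequencing_gb
                     + processed_sequencing_gb)  # 800GB

# Original plus two backup copies.
backup_copies = 2
storage_needed_gb = total_big_data_gb * (1 + backup_copies)  # 2400GB
```

Under these assumptions the big volume data alone comes to roughly 800GB, or about 2.4TB once two backup copies are included.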
Data management plan: 2. Decide on your data organisation strategy E.g. how do you organise your folders and name your files? How do you organise your physical samples? 2 mins
Data management plan: 2. Decide on your data organisation strategy
All digital data is organised by project name: all projects have separate folders, with similar organisation across the projects (experimental data, project management, dissemination of results). Physical samples have their location and description recorded in the laboratory database. Naming of files is decided and documented separately for each project, and accepted by all co-workers.
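One way to keep "similar organisation across the projects" is to create each project's folders from the same template. This is a minimal sketch: the three subfolder names mirror the categories in the sample answer above, while the exact spellings, the function name `create_project_folders` and the root folder are hypothetical.

```python
from pathlib import Path

# Subfolders mirroring the categories named in the sample answer.
SUBFOLDERS = ("experimental_data", "project_management", "dissemination")


def create_project_folders(root: Path, project_name: str) -> list[Path]:
    """Create the standard subfolders for one project and return their paths."""
    created = []
    for name in SUBFOLDERS:
        folder = root / project_name / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(folder)
    return created
```

Calling `create_project_folders(Path("research"), "project_a")` for every new project gives each one an identical layout, so colleagues know where to look without asking.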
Data management plan: 3. Define your backup plan (and follow it!) E.g. How frequently do you do your backups? At how many independent locations? Can you back up your physical samples/lab books? 2 mins
Data management plan: 3. Define your backup plan (and follow it!)
I am working on a desktop computer. All digital data are backed up daily on the departmental server (the server is located outside the department and is itself routinely backed up). Additionally, all digital data are backed up weekly to multiple external hard drives (stored in the lab and at an independent location). Lab book pages are digitised weekly and backed up in the same way as other digital data.
Data management plan: 4. Decide on a file exchange strategy How do you exchange files (and other information) with your collaborators? How will you share your files with internal and external collaborators? 2 mins
Data management plan: 4. Decide on a file exchange strategy Internal collaborators: shared drive on the departmental server or physical exchange of USB drives. External collaborators: e-mail exchange for small files, and secure file transfer protocols (set up by the departmental IT support) for sharing of bigger files.
Data management plan: 5. What are your plans for data sharing? Are your plans in line with your funder's requirements? How will you share your data? Will you deposit data into a repository? 2 mins
Data management plan: 5. What are your plans for data sharing?
All genomic and transcriptomic data will be shared using discipline-specific repositories (NCBI). Microscopic data is very expensive to share (high volumes of data), but experiments are easily reproducible (precise and well-defined protocols will be shared); therefore only representative images will be included in the publication. Datasets essential for the reproducibility of the experiments will be shared via the University of Cambridge data repository.
Data management plan: 6. Will you experience any problems with data sharing? Are you working with commercial/sensitive/personal/patentable data? Will you be able to share these data? 2 mins
Data management plan: 6. Will you experience any problems with data sharing?
One of my projects is co-funded by a commercial company, and therefore the research data cannot be shared. However, metadata will be made available for discovery, as well as details of the Non-Disclosure Agreement (NDA) with the company. Anyone who would like to access these data will therefore be able to discover them, and can explore the possibility of signing the NDA with the company to get access.
Your data management plan
Fill in and send to info@data.cam.ac.uk
  o benefit from free advice on your data management strategy
  o take a copy of a sample data plan
TODAY'S SUMMARY:
Problems with data
1. Backup and exchange strategies
2. How to organise your data well
3. Data sharing
4. How to avoid problems: data management plans
Final conclusions: A data management plan can save you a lot of trouble www.data.cam.ac.uk info@data.cam.ac.uk
Announcements University of Cambridge
www.data.cam.ac.uk/events
THANK YOU Questions: info@data.cam.ac.uk Follow us on Twitter: @CamOpenData