The data landscape: lessons from the UK
Veerle Van den Eynden
UK Data Archive, University of Essex
Faculty of Psychology and Educational Sciences, University of Ghent, Belgium
23 October 2014
UK data landscape
- Data centres (domain-specific)
- International data repositories (domain-specific)
- Institutional data repositories (emerging)
- Funder data policies & mandates
- National data support services
- Institutional research data management services
UK Data Archive / UK Data Service
- Put together a collection of the most valuable data and enhance that over time
- Preserve data in the long term for future research purposes
- Make the data and documentation available for reuse
- Provide data management advice for data creators
- Provide support for users of the service
- Information about the use to which data are put
- Easy access through Discover: ukdataservice.ac.uk
UKDS: some statistics
- Holdings: data for research and teaching purposes, used in all sectors and for many different disciplines
- 6,000 datasets in the collection
- 25,000 registered users
- 60,000 downloads worldwide p.a.
- 4,000+ user support queries p.a.
UKDS data sources
- Official agencies, mainly central government
- International statistical time series
- Individual academics (research grants)
- Market research agencies
- Public records / historical sources
- Access to international data via links with other data archives worldwide
What do users do with the data?
- Comparative research, restudy or follow-up study
- Re-analysis / secondary analysis
- Research design and methodological advancement
- Replication of published statistics
- Teaching and learning
UK data centres
- Archaeology Data Service
- Biomedical Informatics Research Network Data Repository
- British Atmospheric Data Centre
- British Library National Sound Archive
- British Oceanographic Data Centre
- Cambridge Crystallographic Data Centre
- ChemSpider
- ChemSpider Synthetic Pages
- ecrystals
- Endangered Language Archive
- Environmental Information Data Centre
- Ethno-ornithology World Archive
- National Biodiversity Network
- National Geoscience Data Centre
- NERC Earth Observation Data Centre
- NERC Environmental Bioinformatics Centre
- Polar Data Centre
- The Oxford Text Archive
- UK Data Archive
- UK Solar System Data Centre
- Visual Arts Data Service
International data repositories (examples)
- Dataverse
- Dryad
- figshare
- GenBank
- European Bioinformatics Institute
- European Nucleotide Archive / EMBL
- Publishing Network for Geoscientific and Environmental Data (PANGAEA)
- The Arabidopsis Information Resource (TAIR)
- Zenodo
Institutional repositories
- Almost all UK universities now have an institutional repository for publications
- Increasingly providing for datasets too
- Project underway to develop a national registry for research data, which will bring together data collections from across national data centres and institutional data repositories (Jisc, DCC, UKDA)
Data repository platforms
- EPrints (www.eprints.org/): widely used as an IR solution already; active community of users and developers
- CKAN (ckan.org/): great for open, active data; includes visualisation features
- DSpace (www.dspace.org/): another widely supported IR, now doing data
- Hydra (projecthydra.org/): very customisable top layer; Fedora repository underneath
Research funder data policies (RCUK)
- Publicly funded research data are a public good, produced in the public interest, that should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property
- Data management policies and plans in accordance with relevant standards and community best practice
- Metadata to make research data discoverable
- Legal, ethical and commercial constraints on release of research data
- Recognition for collecting & analysing data; limited privileged use
- Acknowledge sources of data, intellectual contributions, terms & conditions
- Use public funds to support the management and sharing of publicly funded research data
Research Councils UK Common Principles on Data Policy
Research funder policies (RCUK)
- Peer-reviewed research papers are to be published in journals that are compliant with Research Council policy on Open Access
- Papers are to include a statement on how the underlying research materials, such as data, samples or models, can be accessed
- Applies to papers submitted for publication from 1 April 2013
Research Councils UK Policy on Open Access
Research funder data policies
- Data sharing policy mandating or encouraging data sharing
- Data management / sharing planning required
- Grant holders responsible for managing & sharing data
- Except EPSRC: institutional responsibility
  - publish metadata online, with DOI (digital object identifier)
  - maintain data securely for 10 years after last access request
  - papers include statement on access to supporting data
- Wellcome Trust, DFID, Cancer Research UK, British Academy and Nuffield Foundation also have data sharing policies
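The EPSRC retention rule above (keep data securely for 10 years after the last access request) is a rolling deadline: every new request resets the clock. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the handling of 29 February is an assumption, not something the policy text on the slide specifies.

```python
from datetime import date

RETENTION_YEARS = 10  # from the slide: 10 years after the last access request


def retention_end(last_access: date, years: int = RETENTION_YEARS) -> date:
    """Return the date on which the retention period has elapsed."""
    try:
        return last_access.replace(year=last_access.year + years)
    except ValueError:
        # 29 February with a non-leap target year.
        # Assumption: fall back to 28 February of the target year.
        return last_access.replace(year=last_access.year + years, day=28)


def must_retain(last_access: date, today: date) -> bool:
    """True while the dataset is still inside its retention window."""
    return today < retention_end(last_access)


if __name__ == "__main__":
    last_request = date(2014, 10, 23)
    print(retention_end(last_request))
    print(must_retain(last_request, date(2020, 1, 1)))
```

An institution tracking this would store the date of the most recent access request per dataset and recompute the end date whenever a new request arrives.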
Research councils
- Arts and Humanities Research Council (AHRC)
- Biotechnology and Biological Sciences Research Council (BBSRC)
- Engineering and Physical Sciences Research Council (EPSRC)
- Economic and Social Research Council (ESRC)
- Medical Research Council (MRC)
- Natural Environment Research Council (NERC)
- Science and Technology Facilities Council (STFC)
Research funder data investments
Fund data sharing support services and infrastructure, e.g.
- UK Data Service (ESRC)
- NERC data centres (NERC)
- MRC Data Support Service
- GenBank (BBSRC, MRC)
- Atlas Petabyte Storage (STFC)
- Archaeology Data Service (AHRC)
Example: ESRC research data policy
- Research data should be openly available to the maximum extent possible through long-term preservation and high quality data management. (ESRC Research Data Policy, 2010)
- ESRC grant applicants planning to create data during their research include a data management plan with their application, as an attachment to the Je-S form
- ESRC award holders offer their research data to the ReShare repository (managed by UK Data Service) within three months of the end of their grant, to preserve them and to make them available for new research
- Researchers who collect the data initially should be aware that ESRC expects that others will also use it, so consent should be obtained on this basis and the original researcher must take into account the long-term use and preservation of data. (ESRC Framework for Research Ethics, 2012)
ESRC data management plan
- Assessment of existing data
- Information on new data
- Quality assurance of data
- Backup and security of data
- Expected difficulties in data sharing
- Copyright / Intellectual Property Rights
- Responsibilities
- Preparation of data for sharing and archiving
ESRC DMP guidance
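The plan headings above can be treated as a checklist when drafting a plan. The sketch below is illustrative only: the section names are copied from the slide, while the function name and the dictionary layout of a draft plan are assumptions and not part of any ESRC or Je-S tooling.

```python
# Section headings as listed on the slide.
ESRC_DMP_SECTIONS = [
    "Assessment of existing data",
    "Information on new data",
    "Quality assurance of data",
    "Backup and security of data",
    "Expected difficulties in data sharing",
    "Copyright / Intellectual Property Rights",
    "Responsibilities",
    "Preparation of data for sharing and archiving",
]


def missing_sections(plan: dict[str, str]) -> list[str]:
    """Return the checklist headings that have no text (or only whitespace) in a draft plan.

    `plan` is a hypothetical structure mapping section heading to draft text.
    """
    return [s for s in ESRC_DMP_SECTIONS if not plan.get(s, "").strip()]


if __name__ == "__main__":
    draft = {
        "Assessment of existing data": "Reviewed existing survey collections for reuse.",
        "Responsibilities": "   ",  # blank entries count as missing
    }
    for heading in missing_sections(draft):
        print("Still to write:", heading)
```

Keeping the headings in one list means a change in the funder's guidance only has to be reflected in one place.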
JISC / Jisc
Managing Research Data programme (2009-2013), funding research and development projects by institutions:
- Research data management infrastructure
- Data management planning
- Data publishing
- Data management training materials
- Tools development
Data support services
Digital Curation Centre (www.dcc.ac.uk):
- Data infrastructure / services audits
- Tools, e.g. DMPonline
- Data management planning
- Data management training / courses
Research Data Management Training MANTRA (Edinburgh):
- Online learning units (datalib.edina.ac.uk/mantra/)
UKDS guidance and resources
- Online best practice guidance: ukdataservice.ac.uk/manage-data.aspx
- Managing and Sharing Research Data: A Guide to Good Practice (Sage Publications Ltd)
- Helpdesk for queries: ukdataservice.ac.uk/help/get-in-touch.aspx
- Training: www.data-archive.ac.uk/create-manage/advice-training/events
Institutional RDM guidance
And researchers
Data landscape.
Worth the investment?
Return on investment evaluation: ADS, UKDA, BADC
Economic analysis indicates:
- Very significant increases in research, teaching and studying efficiency were realised by the users as a result of their use of the data centres
- The value to users exceeds the investment made in data sharing and curation via the centres in all three cases
- By facilitating additional use, the data centres significantly increase the measurable returns on investment in the creation / collection of the data hosted
Jisc (2014). The value and impact of data sharing and curation - synthesis of three recent UK studies.
Worth the investment? (cont.)
Qualitative analysis indicated that:
- Interviewees underlined the value seen by users and depositors in the data centres. Overall feedback was very positive
- Surveyed academic users reported that use of the centres was very or extremely important for their academic research. A majority of respondents (between 53% and 61% across the three surveys) reported that it would have a major or severe impact on their work if they could not access the data and services
- For surveyed depositors, having the data preserved for the long term and its dissemination being targeted to the academic community were seen as the most beneficial aspects of depositing data with the centres
Jisc (2014). The value and impact of data sharing and curation - synthesis of three recent UK studies.
Worth the investment? (cont.) Jisc (2014). The value and impact of data sharing and curation - synthesis of three recent UK studies.
Should all research data be archived?
Are all archived data re-used or in demand?
High demand for:
- National / international data
- Time-series, longitudinal data
- Data with broad coverage
UKDS:
- Curation and preservation of high demand datasets
- Short-term management of smaller research datasets: ReShare repository (self-deposit by researchers)
- But joined up discovery / access
Developments in Europe?
More and more funders / institutions adopt data policies, e.g.:
- Max-Planck-Society Data Policy
- Grundsätze zu Forschungsdaten an der Universität Bielefeld (Principles on research data at Bielefeld University)
- NWO: data that emerge from research funded by NWO are co-owned by NWO, with NWO having a say in making data accessible
- KNAW: policy on open access and digital preservation requires researchers to digitally preserve research data, ideally via deposit in recognised repositories, and to make them openly accessible as much as possible
- Danish e-infrastructure Cooperation (DeIC) is developing a national strategy for data management for all research institutions
Developments in Europe (cont.)
- Data management plans becoming the norm
- Investment in data infrastructure alongside research infrastructure
- Open data ~ open access policies for publications
- Data sharing, especially for data that underpin publications
Questions
Contact details: veerle@essex.ac.uk