Data Archiving and Networked Services (DANS)
A federated data infrastructure: the Dutch way forward
Ingrid Dillo (DANS)
DeIC Conference, Middelfart, 30 September 2014
DANS is an institute of KNAW and NWO
The Dutch case as best practice?
Content
- DANS
- Data infrastructure
- Trust and certification
- Business model
What is DANS?
- Institute of the Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
- Mission: promote and provide permanent access to digital research information
- First predecessor dates back to 1964 (Steinmetz Foundation); Historical Data Archive 1989
Our services
- NARCIS: gateway to scholarly information in the Netherlands
- EASY: Electronic Archiving System for self-deposit
- New service from May 1st onward: Dutch Dataverse
Proliferation of data
- Growing recognition of the value of data
- Advantages: transparency and replication of research; re-use of data
- Trend of data sharing/open data policies
- Challenges: data management; data infrastructure
RDM: research data life cycle
The federated data infrastructure: a collaborative framework (diagram)
- Data generators and data users. User functions: data capture and transfer
- Front offices: local data facilities (university libraries), domain-specific research infrastructures. Community support services
- Back offices: DANS, 3TU.Datacentrum, SURFsara. Common data services: archiving, access
- Basic technical infrastructure: SURFsara, Target. Common data services: storage, backups
- Diagram labels: Trust; Data Curation
FO-BO institutions
- Front offices: universities (libraries, local data centers); disciplinary research infrastructures (ESFRI/NL-National Roadmap)
- Back offices: DANS (humanities, social sciences); 3TU.Datacentrum (technical sciences); SURFsara (big data)
-> Trusted digital repositories!
Services in the model
- Information and awareness raising
- Training (data librarians and researchers)
- Storage (during and after the research)
Roles and responsibilities: the Front Office
Focus on information and awareness raising:
- Information portal for the research community
- Awareness raising, support and training for the research community
- Supporting VREs (research tools; data storage during research; Sharepoint, Dataverse, etc.; transfer for long-term archiving in a trusted digital repository)
- Liaising with the back office
- Data acquisition
Roles and responsibilities: the Back Office
Focus on expertise and long-term storage:
- Expertise and innovation in the area of permanent storage, data management and re-use of data
- Providing expertise to the research community: training courses
- Providing expertise to the front office: training courses for data experts, consultancy, contact persons
- Long-term preservation of data in a trusted digital repository
Research Data Netherlands
- BO collaboration in order to serve the FO better and more efficiently
- RDA P4: signing SURFsara (training data experts, Dutch Dataprize, data acquisition, exchange of technical expertise)
- Expanding areas of collaboration
- Open to other trusted digital repositories
Challenges
- Expanding the model to all universities (institutional agreements)
- Creating one single back office desk (RDNL)
- Creating a technical infrastructure for automatic data ingest
- Developing a business model to cover the costs
Reality of data sharing
- On a central storage facility outside my department or institute: 9%
- On a network disk of my department or institute: 31%
- Other: 1%
- Locally, on my own computer(s) or on computer(s) of my department or laboratory: 36%
- On external hard disks or backup media (CD, DVD, tape, etc.): 23%
Why not share?
- Those data are mine!
- Discredit my findings
- Still analyzing the data
- I cannot trust the data produced somewhere else
Trust
Trust is at the very heart of storing and sharing data
- Users
- Depositors
- Funders
What do we rely on? "Rang is only Rang if it says Rang on it" (Dutch slogan: RANG IS ALLEEN RANG ALS ER RANG OP STAAT). You can rely on us. Can you?
What is a trusted digital repository? Things are not always what they say they are. Things do not always state what they are.
What is trust built on?
- Dedicate yourself (mission statement)
- Do what you promise (stable, sincere and competent reputation)
- Be transparent (peer review, get certified)
Trust in data archives: an example
European Framework for Audit and Certification of Digital Repositories (diagram), to be promoted by the EU
- Underlying standards: OAIS (ISO 14721); Trusted Digital Repositories: Attributes and Responsibilities; TRAC; Audit and Certification of Trustworthy Digital Repositories (ISO 16363); requirements for audit and certification bodies (ISO 16919); DIN 31644; Data Seal of Approval
- ISO 16363 metrics concern Organizational Infrastructure (e.g. "The repository shall have a documented history of the changes to its operations, procedures, software, and hardware."), Digital Object Management (e.g. "The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains.") and Infrastructure and Security Risk Management (e.g. "The repository shall have procedures in place to evaluate when changes are needed to current software.")
- ISO 16919 covers principles needed to inspire confidence that third party certification of the management of the digital repository has been performed with impartiality, competence, responsibility, openness, confidentiality, and responsiveness to complaints
- Three levels: Basic Certification (monitored self-audit using DSA); Extended Certification (monitored self-audit using ISO 16363, or DIN 31644 in Germany); Formal Certification (audit by external auditors)
- Links: http://www.ccsds.org ; http://www.trusteddigitalrepository.eu ; …i.digitalrepositoryauditandcertification.org ; …lliancepermanentaccess.org/membership/member-resources/audit-and-certification
Framework levels
- Basic Certification is granted to repositories which obtain DSA certification
- Extended Certification is granted to Basic Certification repositories which in addition perform a structured, externally reviewed and publicly available self-audit based on ISO 16363 or DIN 31644
- Formal Certification is granted to repositories which in addition to Basic Certification obtain full external audit and certification based on ISO 16363 or DIN 31644
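The three framework levels can be read as a small decision rule. The following is only an illustrative sketch: the names `FrameworkLevel` and `framework_level` and the boolean inputs are hypothetical and not part of the framework itself. It assumes the highest level whose conditions are met is the one reported.

```python
from enum import Enum
from typing import Optional


class FrameworkLevel(Enum):
    """The three levels named on the slide (hypothetical enum)."""
    BASIC = "Basic Certification"
    EXTENDED = "Extended Certification"
    FORMAL = "Formal Certification"


def framework_level(
    has_dsa: bool,
    has_public_reviewed_self_audit: bool = False,
    has_full_external_audit: bool = False,
) -> Optional[FrameworkLevel]:
    """Return the highest level the repository qualifies for, or None.

    Assumption: the audit flags refer to ISO 16363 or DIN 31644,
    as described on the slide.
    """
    if not has_dsa:
        # Every level builds on Basic Certification, i.e. on the DSA.
        return None
    if has_full_external_audit:
        return FrameworkLevel.FORMAL
    if has_public_reviewed_self_audit:
        return FrameworkLevel.EXTENDED
    return FrameworkLevel.BASIC
```

For example, `framework_level(True, True)` yields `FrameworkLevel.EXTENDED`, while a repository without the DSA gets `None` regardless of other audits.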
DSA: basic certification
- DANS initiative (2005/6); International Board (2009)
- 16 guidelines
- Self-assessment & review; transparency
- Around 35 seals awarded; 30 applications in process
- Data producers are responsible for the quality of research data, repositories for storage and long-term access, and users for correct use of data
- The research data: can be found on the Internet; are accessible (clear rights and licenses); are in a usable format; are reliable; can be referred to (persistent identifier)
- http://datasealofapproval.org/
DIN 31644: extended certification
- 34 criteria written by the German NESTOR group and adopted in Germany as DIN 31644
- Self-assessment procedure by NESTOR leads to the NESTOR seal
- Review of the assessment by 2 reviewers, appointed by NESTOR
- Self-assessment and evidence on website
- No seals acquired yet
- http://www.langzeitarchivierung.de/subsites/nestor/en/nestor-Siegel/siegel_node.html
ISO 16363: formal certification
- Based on Open Archival Information System (OAIS) and Trusted Repository Audit and Certification (TRAC)
- Over 100 metrics
- Test audits in 2011 by PTAB (Primary Trustworthy Digital Repository Authorisation Body)
- Full external auditing process
- ISO 16919: Requirements for bodies providing audit and certification of candidate trustworthy digital repositories
- No ISO certifications yet
- http://www.iso16363.org/
ESFRI Research Infrastructures and Trust
- Requirements for CLARIN Centres: centres need to have a proper and clearly specified repository system and participate in a quality assessment procedure as proposed by the Data Seal of Approval or MOIMS-RAC approaches
- Building Trust: CESSDA Self-Assessment Project. Participants from fifteen CESSDA member organisations discussed the CESSDA-ERIC requirements and agreed upon using the Data Seal of Approval (DSA) guidelines as a tool to gain information on the level of their conformance with the DSA and the CESSDA-ERIC requirements.
CTRUST proposal
- Building on the existing European certification framework
- Aiming to develop a sustainable certification mechanism for trusted repositories of digital research data (archives, libraries, etc.)
- Project as a launching pad to permanent services supporting all European research communities
CTRUST deliverables
1. Professionalisation of the existing standards (DSA, DIN, ISO and ICSU/WDS)
2. Boost the number of European TDRs
10 partners, 4 years, 4.5 million euro
Proliferation of data
- Growing recognition of the value of data -> trend of data sharing/open data policies
- Challenges: data management; data infrastructure (BO); volume and complexity of data
DANS pricing structure: why?
- Exponential growth in volume and complexity of data
- Budget growth cannot keep pace with this
- Therefore, a business model in which all data storage remains free forever is not viable
Costs to be charged
- Processing data and organising documentation, if data and metadata are not supplied in the agreed format
- Basic data storage costs including back-up, in the case of institutional depositors, based on a framework agreement (so not for individual depositors up to 1 GB!)
- Consultancy: larger projects for third parties
How much do we charge?
- Processing and documentation: 75 euro per hour
- Consultancy: 100 euro per hour
- Storage: 1. Archeological companies; 2. Institutional agreements; 3. Long-term storage in dark archive
Archeological companies
- Excavation research only
- Fixed price per research project, based on the average size of a project, determined in consultation with the sector
Institutional agreements
- For institutions with whom DANS concludes an agreement (FOBO) on data archiving; data available through DANS
- Direct access via the EASY system, with two back-ups
Long-term storage in dark archive
- For institutions that provide access to the data themselves
- Long-term storage in dark archive (two back-ups, slow access)
Rates for storage
We charge the rates that we pay our third party for the different levels of basic storage (yearly prices). In return for a one-off payment of these costs for five years in advance, DANS ensures conservation of the data forever. This will enable the long-term safekeeping of data from projects with temporary funding.
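As a rough illustration of the arithmetic behind this pricing, the sketch below computes the one-off storage payment (five years of the yearly rate, paid in advance) and the hourly service fees quoted earlier (75 euro per hour for processing, 100 euro per hour for consultancy). The function names are hypothetical, and so is the assumption that the yearly rate is quoted per unit of data volume; the slides do not state the actual rates or their unit.

```python
# Hourly rates as quoted on the "How much do we charge?" slide.
PROCESSING_RATE_EUR_PER_HOUR = 75
CONSULTANCY_RATE_EUR_PER_HOUR = 100

# The one-off payment covers five years of yearly storage costs.
PREPAID_YEARS = 5


def one_off_storage_payment(yearly_rate_eur: float, volume: float) -> float:
    """One-off payment for storage.

    Assumption (not in the slides): yearly_rate_eur is the yearly
    third-party rate per unit of volume, and volume uses that same unit.
    """
    return PREPAID_YEARS * yearly_rate_eur * volume


def service_fee(processing_hours: float = 0, consultancy_hours: float = 0) -> float:
    """Fee for processing/documentation and consultancy work."""
    return (
        processing_hours * PROCESSING_RATE_EUR_PER_HOUR
        + consultancy_hours * CONSULTANCY_RATE_EUR_PER_HOUR
    )
```

With a made-up yearly rate of 2 euro per unit and 10 units of data, `one_off_storage_payment(2.0, 10)` gives 100.0 euro; three hours of processing plus two hours of consultancy give `service_fee(3, 2)`, i.e. 425 euro.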
Thank you for your attention
www.dans.knaw.nl
http://datasealofapproval.org/en/
ingrid.dillo@dans.knaw.nl