EUDAT: Towards a pan-European Collaborative Data Infrastructure. Willem Elbers

EUDAT: Towards a pan-European Collaborative Data Infrastructure. Willem Elbers, EUDAT / MPI-TLA. Focus meeting: Data repositories, SURF, Utrecht, March 3, 2014

Outline: EUDAT project; EUDAT services; summary and conclusion.

Data deluge: exponential growth (gigabytes, terabytes, petabytes, exabytes, zettabytes) and increasing complexity and variety. Where to store it? How to find it? How to make the most of it?

Consortium

EUDAT's mission: a Collaborative Data Infrastructure connecting data generators and users. User-focused functionality: data capture and transfer, virtual research environments (VREs). Community Support Services: data discovery and navigation, workflow creation, annotation, interpretability. Common Data Services: persistent storage, identification, authenticity, workflow execution, mining. Cross-cutting: trust and data curation.

... implementing services initially motivated by early community use cases.

EUDAT addresses all data: large volumes of data (big data), which are more uniform in format and quality, undergo lots of automatic processing and have high reduction as a goal; irregular big data, such as automatically derived or aggregated data, with semi-automatic processing; and long-tail data, with large variety (complexity) and many sources and owners, which makes it difficult to manage.

The CDI network architecture: generic data centres and community data sites (repositories). Community sites may join the data infrastructure or just use EUDAT services.

Domain of registered data: data in the EUDAT domain must have (descriptive) metadata and a persistent identifier. Ingest points define the boundary between domains. Joining EUDAT: community centre. Using EUDAT: EUDAT data centre. Specific case: B2SHARE, where EUDAT centre(s) act as the repository.
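To make the registration rule concrete, here is a minimal Python sketch, assuming a hypothetical `DigitalObject` record with a `metadata` dict and an optional `pid` string. These names and the PID value in the usage lines are illustrative only and are not an EUDAT interface; the point is simply that an object enters the domain of registered data only when both requirements from the slide are met.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """Hypothetical model of a data object offered at an ingest point (not an EUDAT API)."""
    path: str
    metadata: dict = field(default_factory=dict)  # (descriptive) metadata
    pid: Optional[str] = None                      # persistent identifier, if assigned


def missing_requirements(obj: DigitalObject) -> list[str]:
    """List the registration requirements from the slide that obj does not yet meet."""
    missing = []
    if not obj.metadata:
        missing.append("descriptive metadata")
    if not obj.pid:
        missing.append("persistent identifier")
    return missing


def is_registered(obj: DigitalObject) -> bool:
    """True only when the object may enter the domain of registered data."""
    return not missing_requirements(obj)


# Usage with made-up values; the PID string is hypothetical.
raw = DigitalObject("recordings/session1.wav")
print(missing_requirements(raw))  # ['descriptive metadata', 'persistent identifier']
ready = DigitalObject("recordings/session1.wav", {"title": "Session 1"}, "hypothetical/0001")
print(is_registered(ready))       # True
```

Keeping the check as a list of missing items, rather than a bare boolean, lets an ingest point tell a data provider exactly what is still needed.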

Diagram: data lifecycle in the domain of registered data (generation, acquisition, description, processing, enrichment, reduction, analysis, publication, preservation), supported by the Identifier Service and delivering individual value (short timescale), community value (medium timescale) and society value (long timescale).

EUDAT services portfolio: Metadata Catalogue (aggregated EUDAT metadata domain, data inventory); Data Staging (dynamic replication to HPC workspace for processing); Safe Replication (data preservation, access optimization); Simple Store (researcher data store: simple upload, share and access); PID (identity, integrity, authenticity, locations); AAI (network of trust among authentication and authorization actors).
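The PID entry bundles identity, integrity and locations with one identifier. The sketch below illustrates that idea under stated assumptions: a hypothetical `PidRecord` stores a SHA-256 checksum and a list of replica URLs next to the identifier. It is not the Handle System's record layout, the URLs and PID are made up, and authenticity (for example signatures) is deliberately left out.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, used here as the integrity checksum."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PidRecord:
    """Hypothetical PID record; not the actual Handle System record format."""
    pid: str                                            # identity
    checksum: str                                       # integrity (SHA-256 hex digest)
    locations: list[str] = field(default_factory=list)  # where copies are stored

    def add_location(self, url: str) -> None:
        """Register another copy of the object, ignoring duplicate URLs."""
        if url not in self.locations:
            self.locations.append(url)

    def verify(self, data: bytes) -> bool:
        """A copy is intact if its digest matches the checksum kept with the PID."""
        return sha256_hex(data) == self.checksum


# Usage with hypothetical identifiers and URLs.
payload = b"example payload"
record = PidRecord("hypothetical/0002", sha256_hex(payload), ["https://repo.example.org/obj"])
record.add_location("https://replica.example.org/obj")
print(record.verify(payload))            # True
print(record.verify(b"tampered bytes"))  # False
```

Because the checksum travels with the identifier, any copy found through `locations` can be checked against the same reference value.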

B2SAFE: replication from repositories to data storage in different administrative domains, for (long-term) archiving and preservation, to optimize access for users from different regions, and to bring data closer to powerful computers for data analytics. Typical policies triggered by community data managers: "Replicate collection X from my repository to data centres A and B"; "Store the replica safely for N years"; "Check the integrity of the replica every M years".
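The three example policies can be read as parameters of one schedule. Here is a minimal sketch, assuming a hypothetical `ReplicationPolicy` type with targets, a retention period of N years and a check interval of M years; it is not B2SAFE's rule language, only an illustration of which integrity checks fall inside the retention period.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical encoding of the example policies; not B2SAFE's rule language."""
    collection: str           # "collection X"
    targets: tuple[str, ...]  # "data centres A and B"
    retention_years: int      # "store the replica safely for N years"
    check_every_years: int    # "check the integrity of the replica every M years"


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def integrity_check_plan(policy: ReplicationPolicy, replicated_on: date) -> dict[str, list[date]]:
    """For each target centre, list the integrity-check dates within the retention period."""
    if policy.check_every_years < 1:
        raise ValueError("check interval must be at least one year")
    expires = add_years(replicated_on, policy.retention_years)
    checks = []
    k = 1
    while (when := add_years(replicated_on, k * policy.check_every_years)) <= expires:
        checks.append(when)
        k += 1
    # Every replica follows the same schedule; each target gets its own list.
    return {target: list(checks) for target in policy.targets}


# Usage: N = 10 years, M = 3 years (values chosen for illustration).
policy = ReplicationPolicy("collection-X", ("centre-A", "centre-B"), 10, 3)
plan = integrity_check_plan(policy, date(2014, 3, 3))
print(plan["centre-A"])
# [datetime.date(2017, 3, 3), datetime.date(2020, 3, 3), datetime.date(2023, 3, 3)]
```

A check that would land after the replica's retention period (here 2026) is simply not scheduled.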

B2STAGE: transferring data from EUDAT storage to compute facilities, with reliable, efficient, easy-to-use tools to manage data transfers, and ingesting data into the EUDAT domain of registered data.
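A "reliable" transfer usually means the staged copy is verified against a known checksum before it is used. The sketch below assumes a hypothetical `stage` helper in which a local `shutil.copy2` stands in for the real transfer protocol, which the slide does not name; the expected SHA-256 digest would come from the object's registration record.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files are not loaded into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stage(source: Path, workspace: Path, expected_sha256: str) -> Path:
    """Hypothetical staging step: copy into an HPC workspace, then verify the copy.

    A local copy stands in for the actual transfer tool. A copy whose digest
    does not match is deleted so that no corrupted data is left for processing.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / source.name
    shutil.copy2(source, target)
    if file_sha256(target) != expected_sha256:
        target.unlink()
        raise OSError(f"checksum mismatch for {target}")
    return target
```

Verifying at the destination, rather than trusting the copy call, is what catches corruption introduced in transit.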

Diagram: enabled EUDAT sites, repositories and replica storages.

B2SHARE: offering simple self-service registration for data providers; lowering barriers so that registered users can upload and store smaller scientific data sets in the B2SHARE repository; enabling users to share their data with other researchers.

B2FIND: makes collections of scientific data easy to find; provides access to those data collections through the references given in the metadata; offers commenting functionality.
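The find-then-follow-the-reference pattern can be illustrated with a toy search over harvested metadata. This is a minimal sketch, assuming a hypothetical `MetadataRecord` with a title, keywords and a reference (for example a PID) taken from the metadata; it is not B2FIND's search engine or harvesting pipeline, and all records and identifiers below are made up.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Hypothetical harvested record: descriptive fields plus a reference to the data."""
    title: str
    keywords: list[str]
    reference: str  # e.g. a PID or landing-page URL given in the metadata


def search(records: list[MetadataRecord], query: str) -> list[str]:
    """Return references of records whose title or keywords contain every query term.

    Matching is case-insensitive substring matching, which keeps the sketch short
    but is far simpler than a real catalogue's full-text search.
    """
    terms = [t.lower() for t in query.split()]
    hits = []
    for record in records:
        text = " ".join([record.title, *record.keywords]).lower()
        if all(term in text for term in terms):
            hits.append(record.reference)
    return hits


# Usage with made-up records and hypothetical identifiers.
catalogue = [
    MetadataRecord("Ocean temperature 2010", ["climate", "ocean"], "hypothetical/1"),
    MetadataRecord("Spoken language corpus", ["linguistics"], "hypothetical/2"),
]
print(search(catalogue, "ocean climate"))  # ['hypothetical/1']
```

Returning references rather than the data itself mirrors the slide: the catalogue makes collections findable, and access goes through the reference stored in the metadata.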

Summary: The EUDAT project is driven by community requirements, bridging the gap between community support services and common data services. It provides services to store your data safely and easily, make it discoverable and run HPC analysis on it, within a domain of registered data.

Thank you. B2SAFE: eudat-safereplication@postit.csc.fi; B2STAGE: eudat-b2stage@postit.csc.f; B2FIND: http://b2find.eudat.eu/; B2SHARE: https://b2share.eudat.eu/; www.eudat.eu