EUDAT: Towards a Pan-European Collaborative Data Infrastructure. Willem Elbers
1 EUDAT: Towards a Pan-European Collaborative Data Infrastructure
Willem Elbers, EUDAT / MPI-TLA
Focus meeting: Data repositories, SURF, Utrecht, March 3, 2014
2 Outline
- EUDAT project
- EUDAT services
- Summary and conclusion
3 Data deluge
- Exponential growth in volume, from gigabytes through terabytes, petabytes and exabytes to zettabytes
- Increasing complexity and variety
- Where to store it? How to find it? How to make the most of it?
4 Consortium
5 EUDAT's mission: a Collaborative Data Infrastructure
Diagram: data generators and users, connected through three layers of services, with trust and data curation shown alongside:
- User-focused functionality: data capture & transfer, VREs
- Community support services: data discovery & navigation, workflow creation, annotation, interpretability
- Common data services: persistent storage, identification, authenticity, workflow execution, mining
6 ... implementing services initially motivated by early community use cases
7 EUDAT: addressing all data
Large volumes of data (big data):
- more uniform in terms of formats and quality
- lots of automatic processing
- high reduction as goal
Irregular big data:
- automatically derived data
- aggregated data
- semi-automatic processing
Long-tail data:
- large variety (complexity)
- many sources, many owners
- difficult to manage
8 The CDI network architecture
- Generic data centres
- Community data sites (repositories), which may join the data infrastructure or just use EUDAT services
9 Domain of registered data
Data in the EUDAT domain must have:
- (descriptive) metadata
- a persistent identifier
Ingest points define the boundary between domains:
- Joining EUDAT: community centre
- Using EUDAT: EUDAT data centre
- Specific cases: B2SHARE, where EUDAT centre(s) act as the repository
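The registration rule on this slide can be sketched as a check applied at an ingest point. This is a minimal sketch under assumptions: the `DataObject` class, its field names, `is_registered`, `ingest`, and the example PID `12345/abc-001` are hypothetical illustrations, not part of any EUDAT API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataObject:
    # Hypothetical model of a data object; field names are illustrative only.
    name: str
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    pid: Optional[str] = None                     # persistent identifier, if assigned


def is_registered(obj: DataObject) -> bool:
    """An object qualifies for the registered domain only with metadata AND a PID."""
    return bool(obj.metadata) and bool(obj.pid)


def ingest(obj: DataObject, domain: List[DataObject]) -> None:
    """Hypothetical ingest point: reject objects that do not meet the rule."""
    if not is_registered(obj):
        raise ValueError(f"{obj.name}: descriptive metadata and a PID are required")
    domain.append(obj)


# Usage: one object passes the boundary check and enters the domain.
registered: List[DataObject] = []
ingest(DataObject("corpus.zip", {"title": "Speech corpus"}, "12345/abc-001"), registered)
```

Placing the check at the ingest point mirrors the slide's idea that ingest points mark the boundary: objects outside the domain may lack a PID, objects inside may not.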
10 Data life cycle in the domain of registered data (diagram)
- Stages shown: enrichment, processing, reduction, analysis, publication, acquisition, generation, description, preservation, supported by an identifier service
- Value grows from individual value (short timescale) to community value (medium timescale) to society value (long timescale)
11 EUDAT services portfolio
- Metadata catalogue: aggregated EUDAT metadata domain; data inventory
- Data staging: dynamic replication to HPC workspace for processing
- Safe replication: data preservation, access optimization
- Simple store: researcher data store (simple upload, share and access)
- PID: identity, integrity, authenticity, locations
- AAI: network of trust among authentication and authorization actors
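To illustrate how a single PID can tie together identity, integrity and locations, here is a minimal sketch assuming a SHA-256 checksum is stored alongside the identifier. `PidRecord`, `make_record`, `add_location`, `verify`, the example PID and the URLs are hypothetical and do not reflect the actual EUDAT PID service or Handle record layout.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PidRecord:
    # Hypothetical PID record: one identifier, one checksum, every known copy.
    pid: str
    checksum: str                                        # SHA-256 hex digest of the content
    locations: List[str] = field(default_factory=list)   # URLs of the original and replicas


def make_record(pid: str, content: bytes, location: str) -> PidRecord:
    """Create a record for a newly registered object at its first location."""
    return PidRecord(pid, hashlib.sha256(content).hexdigest(), [location])


def add_location(record: PidRecord, location: str) -> None:
    """Register a replica under the same identifier instead of minting a new PID."""
    if location not in record.locations:
        record.locations.append(location)


def verify(record: PidRecord, content: bytes) -> bool:
    """Integrity check: does this copy match the checksum stored with the PID?"""
    return hashlib.sha256(content).hexdigest() == record.checksum


record = make_record("12345/abc-001", b"example bytes", "https://repo.example.org/abc")
add_location(record, "https://dc-a.example.org/abc")
```

Because every replica is listed under one identifier, a citation that uses the PID stays valid whichever copy a user is eventually served.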
12 Replication from repositories to data storage in different administrative domains, in order to:
- support (long-term) archiving and preservation
- optimize access for users from different regions
- bring data closer to powerful computers for data analytics
Typical policies triggered by community data managers:
- Replicate collection X from my repository to data centres A and B
- Store the replica safely for N years
- Check the integrity of the replica every M years
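The three policy statements on this slide map naturally onto a small data structure. The sketch below is hypothetical: `ReplicationPolicy`, `replication_tasks`, `check_schedule` and `add_years` are illustrative names, not the B2SAFE rule language, and it assumes N and M are whole numbers of years.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class ReplicationPolicy:
    # Hypothetical encoding of "replicate X to A and B, keep N years, check every M years".
    collection: str
    targets: List[str]       # data centres A and B
    retention_years: int     # N
    check_every_years: int   # M


def replication_tasks(policy: ReplicationPolicy) -> List[Tuple[str, str]]:
    """One (collection, target) copy task per target data centre."""
    return [(policy.collection, target) for target in policy.targets]


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def check_schedule(policy: ReplicationPolicy, replicated_on: date) -> List[date]:
    """Integrity-check dates every M years, up to the end of the N-year retention period."""
    if policy.check_every_years <= 0:
        raise ValueError("check interval must be a positive number of years")
    step = policy.check_every_years
    return [add_years(replicated_on, k) for k in range(step, policy.retention_years + 1, step)]
```

For example, a policy with N = 10 and M = 3 for a replica made on March 3, 2014 yields integrity checks on March 3 of 2017, 2020 and 2023; no check falls after the retention period ends.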
13 Transferring data from EUDAT storage to compute facilities
- Reliable, efficient, easy-to-use tools to manage data transfers
- Ingest data into the EUDAT domain of registered data
14 Map of enabled EUDAT sites: repositories and replica storages
15 B2SHARE
- Offering simple self-service registration for data providers
- Lowering barriers so that registered users can upload and store smaller scientific data sets in the B2SHARE repository
- Enabling users to share their data with other researchers
16 B2FIND
- Make collections of scientific data easy to find
- Provide access to those data collections through the references given in the metadata
- Commenting functionality
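The first two B2FIND points, finding collections and reaching them through the references in their metadata, can be illustrated with a toy catalogue. This sketch assumes metadata records have already been harvested from each community into plain dictionaries with hypothetical `title`, `keywords` and `reference` fields; `aggregate` and `search` are illustrative names, and this is not how B2FIND is implemented.

```python
from typing import Dict, List


def aggregate(harvests: Dict[str, List[dict]]) -> List[dict]:
    """Merge records harvested from several communities into one catalogue,
    tagging each record with the community it came from."""
    catalogue = []
    for community, records in harvests.items():
        for record in records:
            catalogue.append({**record, "community": community})
    return catalogue


def search(catalogue: List[dict], term: str) -> List[str]:
    """Return the access references of records whose title or keywords
    contain the search term (case-insensitive substring match)."""
    term = term.lower()
    hits = []
    for record in catalogue:
        text = " ".join([record.get("title", "")] + record.get("keywords", [])).lower()
        if term in text:
            hits.append(record["reference"])
    return hits


catalogue = aggregate({
    "linguistics": [{"title": "Dutch speech corpus", "keywords": ["audio"],
                     "reference": "https://a.example.org/1"}],
    "climate": [{"title": "Sea surface temperature", "keywords": ["ocean"],
                 "reference": "https://b.example.org/2"}],
})
```

The search returns references rather than the data itself, which matches the slide's point that access goes through the references given in the metadata.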
17 Summary
- The EUDAT project is driven by community requirements, bridging the gap between community support services and common data services
- The EUDAT project provides services to safely and easily store your data, make it discoverable, and run HPC analyses on it, within a domain of registered data
18 Thank you
B2SAFE, B2STAGE, B2FIND, B2SHARE
