Image Data, RDA and Practical Policies

Size: px
Start display at page:

Download "Image Data, RDA and Practical Policies"

Transcription

1 Image Data, RDA and Practical Policies Rainer Stotzka and many others KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association

2 Data Life Cycle Lab Key Technologies Mostly scientific imaging projects with common needs in Data and metadata management High throughput analysis in near real time Data archiving and sharing High performance data ingest Automatic metadata extraction Visualization 2

3 Light Sheet Microscopy Novel microscope to image living Zebrafish embryos High 3D resolution Extreme short data acquisition time Growth of an embryo with in 16 h 16 TB raw data Acknowledgements: Andrey Kobitskiy, G. Ulrich Nienhaus Jens C. Otte, Masanari Takamiya, Uwe Strähle Johannes Stegmaier, Ralf Mikut Francesca Rindone, Volker Hartmann, Thomas Jejkal Achim Streit, Christopher Jung, Jos van Wezel Institute of Applied Physics Institute of Toxicology and Genetics Institute for Applied Computer Science Steinbuch Centre for Computing 3

4 Image Data Workflow Data Transfer Storage Preservation and Curation Sharing and Reuse 4

5 Image Data Production Data Transfer Storage Preservation and Curation Sharing and Reuse Experiment design Big Data: variety Data acquisition and quality control 200 MB/s 16 TB/d 5

6 Image Data Transfer Data Transfer Storage Data disposal : Data description Metadata Transfer to storage: 400 MB/s Preservation and Curation Sharing and Reuse 6

7 Image Data Storage Data Transfer Storage Big Data: volume, velocity Data access Security Preservation and Curation Sharing and Reuse 7

8 Image Data Data Transfer Storage Preservation and Curation Sharing and Reuse Big Data: value 8

9 Image Data Data Transfer Storage Big Data: value costly to be reproduced Worth to be preserved for future generations Sharing: Reproducible science Re-use: New (interdisciplinary) scientific outcomes Big Data: veracity Trust, well maintained Open Data Preservation and Curation Sharing and Reuse 9

10 Image Workflow Data Transfer Storage Preservation and Curation Sharing and Reuse Problems: HELP: Where is my data? HOW to manage, preserve, and to curate the data? 10

11 Image Data Repository Data Transfer Storage Preservation and Curation Repository Sharing and Reuse 11

12 Repository Repository: Managed location/destination/directory/bucket where digital data objects are Registered Permanently stored Made accessible and retrievable Curated E.g.: libraries & museums Digital data object (dataset): Consists of Data Description for re-use Digital Data Object (Dataset) Data (Data) Storage Organization Metadata 12

13 Repository System: KIT Data Manager Administrative Metadata Base MD Curation MD Sharing MD Policy MD MD OID Data Organization Metadata Content Metadata Repository System: Optimized for large scale and heterogeneous research data Internals are based on metadata Users & data curators don t see the complexity: Visible: Content metadata Discovery tool: search Representation of data 13

14 Repository Design Discover Access Curate Repository HELP: Where is my data? HOW to manage, preserve, and to curate the data? HOW to design the repository? Need for content metadata Need for rules for automation practical policies 14

15 RDA Working Group: Practical Policies Chairs: Reagan Moore, Rainer Stotzka Catalog with 11 important policy areas: Contextual metadata extraction Data access control Data backup Data format control Data retention Disposition Integrity (including replication) Notification Restricted searching Storage cost reports Use agreements 15

16 Example: Contextual Metadata Extraction Scenario: Extract metadata from an associated document, e.g. DICOM, OME,... Extract metadata from Discover Access Curate Repository 16

17 Example: Contextual Metadata Extraction Policy MD enable Contextual MD metadata extraction On file File_name On digital data object OID On user User_ID Attribute_name Extract metadata Attribute_value enable Data access control enable Data backup enable Data format control enable Data retention enable Disposition Expert knowledge required Data Life Cycle Lab Minimize the effort of manually recording metadata! 17

18 Policy Examples Policy: Contextual metadata extraction Policy: Restricted searching Policy: Data access control Discover Access Curate Repository Policy: Data backup Policy: Data integrity 18

19 Interaction with RDA Important groups for imaging repositories WG Practical Policies WG Data Foundation & Terminology PID groups Metadata groups IG Data Fabric Repository groups Domain related groups Typical PhD researcher Cooperates with domain partners Solves their data problems Specific research topic Monitors the activities of groups Transfer to the domain partners If funding available RDA plenaries Inhibition threshold: world experts Active in RDA groups Results: Motivation booster Networking Future State of the art 19

20 Conclusions Goal: Better and easier management of research data Sustain the value Enable sharing, reproducibility and re-use Enable better research RDA has real value and impact RDA is alive and growing, efficiency needs further improvement Technical Advisory Board, Organizational Advisory Board, Council, Secretariat, and individuals Plenaries (group sessions) produce results: 2 plenaries per year, low registration fee Need to invest: contributions gain results RDA is producing internationally connected data experts 20

Software Entwicklungen für das LSDF Datenmanagement

Software Entwicklungen für das LSDF Datenmanagement Software Entwicklungen für das LSDF Datenmanagement Rainer Stotzka, V. Hartmann, T. Jejkal,, P. Neuberger, S. Ochsenreither, F. Rindone, T. Schmidt, H. Pasic J. van Wezel, A. Garcia, R. Kupsch, S. Bourov,

More information

Data Management at UT

Data Management at UT Data Management at UT Maria Esteva, TACC, [email protected] Colleen Lyon, UT Libraries, [email protected] Angela Newell, ITS, [email protected] What is data management? systematic organization

More information

Management of Research Data Procedure

Management of Research Data Procedure Management of Research Data Procedure Related Policy Management of Research Data Policy Responsible Officer Deputy Vice Chancellor (Research) Approved by Deputy Vice Chancellor (Research) Approved and

More information

OpenAIRE Research Data Management Briefing paper

OpenAIRE Research Data Management Briefing paper OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement

More information

Globus Research Data Management: Introduction and Service Overview

Globus Research Data Management: Introduction and Service Overview Globus Research Data Management: Introduction and Service Overview Kyle Chard [email protected] Ben Blaiszik [email protected] Thank you to our sponsors! U. S. D E P A R T M E N T OF ENERGY 2 Agenda

More information

The Australian War Memorial s Digital Asset Management System

The Australian War Memorial s Digital Asset Management System The Australian War Memorial s Digital Asset Management System Abstract The Memorial is currently developing an Enterprise Content Management System (ECM) of which a Digital Asset Management System (DAMS)

More information

USGS Guidelines for the Preservation of Digital Scientific Data

USGS Guidelines for the Preservation of Digital Scientific Data USGS Guidelines for the Preservation of Digital Scientific Data Introduction This document provides guidelines for use by USGS scientists, management, and IT staff in technical evaluation of systems for

More information

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer Survey of Canadian and International Data Management Initiatives By Diego Argáez and Kathleen Shearer on behalf of the CARL Data Management Working Group (Working paper) April 28, 2008 Introduction Today,

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers EUDAT Towards a pan-european Collaborative Data Infrastructure Willem Elbers EUDAT / MPI-TLA Focus meeting: Data repositories SURF, Utrecht March 3, 2014 Outline EUDAT project EUDAT services Summary and

More information

Long-term archiving and preservation planning

Long-term archiving and preservation planning Long-term archiving and preservation planning Workflow in digital preservation Hilde van Wijngaarden Head, Digital Preservation Department National Library of the Netherlands The Challenge: Long-term Preservation

More information

Cloud computing based big data ecosystem and requirements

Cloud computing based big data ecosystem and requirements Cloud computing based big data ecosystem and requirements Yongshun Cai ( 蔡 永 顺 ) Associate Rapporteur of ITU T SG13 Q17 China Telecom Dong Wang ( 王 东 ) Rapporteur of ITU T SG13 Q18 ZTE Corporation Agenda

More information

DMBI: Data Management for Bio-Imaging.

DMBI: Data Management for Bio-Imaging. DMBI: Data Management for Bio-Imaging. Questionnaire Data Report 1.0 Overview As part of the DMBI project an international meeting was organized around the project to bring together the bio-imaging community

More information

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM Introduction The Institute of Museum and Library Services (IMLS) is committed to expanding public access to federally funded research, data, software,

More information

Research Data Management Guide

Research Data Management Guide Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that

More information

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014 Best Practices for Data Management RMACC HPC Symposium, 8/13/2014 Presenters Andrew Johnson Research Data Librarian CU-Boulder Libraries Shelley Knuth Research Data Specialist CU-Boulder Research Computing

More information

The Concept of Big Data Reference Model

The Concept of Big Data Reference Model 2013-11-14 ISO/IEC JTC1/SC32/WG2N1853 The Concept of Reference Model Sungjoon Lim, KoDB*, [email protected] Dongwon Jeong, KNU**, [email protected] Jangwon Gim, KISTI***, [email protected] Hanmin Jung,

More information

Interagency Science Working Group. National Archives and Records Administration

Interagency Science Working Group. National Archives and Records Administration Interagency Science Working Group 1 National Archives and Records Administration Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open Archival Information System (OAIS)

More information

Records Management and SharePoint 2013

Records Management and SharePoint 2013 Records Management and SharePoint 2013 SHAREPOINT MANAGEMENT, ARCHITECTURE AND DESIGN Bob Mixon Senior SharePoint Architect, Information Architect, Project Manager Copyright Protected by 2013, 2014. Bob

More information

SURFsara Data Services

SURFsara Data Services SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Mark van de Sanden The world of the many Many different users (well organised (international) user communities, research groups, universities,

More information

Lesson 3: Data Management Planning

Lesson 3: Data Management Planning Lesson 3: CC image by Joe Hall on Flickr What is a data management plan (DMP)? Why prepare a DMP? Components of a DMP NSF requirements for DMPs Example of NSF DMP CC image by Darla Hueske on Flickr After

More information

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy 2015-2018. Page 1 of 8

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy 2015-2018. Page 1 of 8 THE BRITISH LIBRARY Unlocking The Value The British Library s Collection Metadata Strategy 2015-2018 Page 1 of 8 Summary Our vision is that by 2020 the Library s collection metadata assets will be comprehensive,

More information

How To Write A Blog Post On Globus

How To Write A Blog Post On Globus Globus Software as a Service data publication and discovery Kyle Chard, University of Chicago Computation Institute, [email protected] Jim Pruyne, University of Chicago Computation Institute, [email protected]

More information

Report of the DTL focus meeting on Life Science Data Repositories

Report of the DTL focus meeting on Life Science Data Repositories Report of the DTL focus meeting on Life Science Data Repositories Goal The goal of the meeting was to inform and discuss research data repositories for life sciences. The big data era adds to the complexity

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Ingrid Hsieh Yee, Professor School of Library & Information Science Catholic University of America

Ingrid Hsieh Yee, Professor School of Library & Information Science Catholic University of America Ingrid Hsieh Yee, Professor School of Library & Information Science Catholic University of America Best Practices Exchange, Dec. 4, 2012, Annapolis, MD 21 st Century Information Professionals Manage data,

More information

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any Data Management Plan Name of Contractor Name of project Project Duration Start date : End: DMP Version Date Amended, if any Name of all authors, and ORCID number for each author WYDOT Project Number Any

More information

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015 Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a

More information

Why long time storage does not equate to archive

Why long time storage does not equate to archive Why long time storage does not equate to archive Jos van Wezel HUF Toronto 2015 STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz

More information

Research Data Management Policy

Research Data Management Policy Research Data Management Policy Version Number: 1.0 Effective from 06 January 2016 Author: Research Data Manager The Library Document Control Information Status and reason for development New as no previous

More information

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets!! Large data collections appear in many scientific domains like climate studies.!! Users and

More information

Data Management in NeuroMat and the Neuroscience Experiments System (NES)

Data Management in NeuroMat and the Neuroscience Experiments System (NES) Data Management in NeuroMat and the Neuroscience Experiments System (NES) Kelly Rosa Braghetto Department of Computer Science Institute of Mathematics and Statistics - University of São Paulo 1st NeuroMat

More information

Data Management Plans & the DMPTool. IAP: January 26, 2016

Data Management Plans & the DMPTool. IAP: January 26, 2016 Data Management Plans & the DMPTool IAP: January 26, 2016 Data Management Services @ MIT Libraries Workshops/Webinars Web guide: http://libraries.mit.edu/data-management Individual consultations includes

More information

IST687 Scientific Data Management

IST687 Scientific Data Management 1 IST687 Scientific Data Management Spring 2012 Instructor: Jian Qin Email: [email protected] Office: 311 Hinds Hall Phone: 315-443-5642 Time: any time Location: anywhere Course Description The Scientific Data

More information

Archiving the Web: the mass preservation challenge

Archiving the Web: the mass preservation challenge Archiving the Web: the mass preservation challenge Catherine Lupovici Chargée de Mission auprès du Directeur des Services et des Réseaux Bibliothèque nationale de France 1-, Koninklijke Bibliotheek, Den

More information

National Earth System Science Data Sharing Platform of China

National Earth System Science Data Sharing Platform of China RDA Fifth Plenary Session: Large Scale Projects meet RDA National Earth System Science Sharing Platform of China Yunqiang Zhu, Jia Song, Mo Wang, Xiaoxuan Wang Department of Geo-data Science and Sharing

More information

Exploitation of ISS scientific data

Exploitation of ISS scientific data Cooperative ISS Research data Conservation and Exploitation Exploitation of ISS scientific data Luigi Carotenuto Telespazio s.p.a. Copernicus Big Data Workshop March 13-14 2014 European Commission Brussels

More information

E-Content Service Group Virtual Meeting. Digital Preservation: How to Get Started

E-Content Service Group Virtual Meeting. Digital Preservation: How to Get Started E-Content Service Group Virtual Meeting Digital Preservation: How to Get Started Slide 2 of 17 Agenda Committee Members E-Content Purpose Presentation by Greg Zick Questions Discussion for May Meeting

More information

Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network

Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network November 16, 2010 Data Management Short Course Series Sponsored by the Odum Institute and the UNC Libraries Campus

More information

North Carolina Digital Preservation Policy. April 2014

North Carolina Digital Preservation Policy. April 2014 North Carolina Digital Preservation Policy April 2014 North Carolina Digital Preservation Policy Page 1 Executive Summary The North Carolina Digital Preservation Policy (Policy) governs the operation,

More information

Checklist: Persistent identifiers

Checklist: Persistent identifiers Checklist: Persistent identifiers Persistent Identifiers (PID) are unique character strings 1 attached to various items. Attached to digital items, they are a prerequisite for linked data. Persistent identifiers

More information

LabArchives Electronic Lab Notebook:

LabArchives Electronic Lab Notebook: Electronic Lab Notebook: Cloud platform to manage research workflow & data Support Data Management Plans Annotate and prove discovery Secure compliance Improve compliance with your data management plans,

More information

Library Strategic Planning

Library Strategic Planning Library Strategic Planning Develop campus partnerships to collect, manage, share, and preserve Georgia Tech digital research data 1. Needs Assessment Research data survey & interviews NSF DMP content analysis

More information

How To Useuk Data Service

How To Useuk Data Service Publishing and citing research data Research Data Management Support Services UK Data Service University of Essex April 2014 Overview While research data is often exchanged in informal ways with collaborators

More information

A grant number provides unique identification for the grant.

A grant number provides unique identification for the grant. Data Management Plan template Name of student/researcher(s) Name of group/project Description of your research Briefly summarise the type of your research to help others understand the purposes for which

More information

Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing. David Lechevalier Anantha Narayanan Sudarsan Rachuri

Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing. David Lechevalier Anantha Narayanan Sudarsan Rachuri Towards a Framework for Predictive Analytics in Manufacturing David Lechevalier Anantha Narayanan Sudarsan Rachuri Outline 2 1. Motivation 1. Why Big in Manufacturing? 2. What is needed to apply Big in

More information

Data Management Exercises

Data Management Exercises Data Management Exercises Exercise 1 Defining Post-Graduate Research Data 1. Discuss your research project and research data in groups of 3-4 Think about the following questions: What is your research

More information

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un marciano @un.

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un marciano @un. Policy-driven Distributed Data Management (irods) Richard Marciano [email protected] Professor @ SILS / Chief Scientist for Persistent Archives and Digital Preservation @ RENCI Director of the Sustainable

More information

Why archiving erecords influences the creation of erecords. Martin Stürzlinger scopepartner Vienna, Austria

Why archiving erecords influences the creation of erecords. Martin Stürzlinger scopepartner Vienna, Austria Why archiving erecords influences the creation of erecords Martin Stürzlinger scopepartner Vienna, Austria Electronic Records In a Productive System Created Used Changed Deleted In an Archival System No

More information

Digital Preservation Strategy, 2012-2015

Digital Preservation Strategy, 2012-2015 Digital Preservation Strategy, 2012-2015 Preface This digital preservation strategy sets out what the National Library of Wales (NLW) intends to do to preserve digital materials over the next three years.

More information

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS) INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS) Todd BenDor Associate Professor Dept. of City and Regional Planning UNC-Chapel Hill [email protected] http://irods.org/ SESYNC Model Integration Workshop Important

More information

On the Radar: Tessella

On the Radar: Tessella On the Radar: Tessella Creating an archive for the long-term preservation of digital content Reference Code: IT014-002789 Publication Date: 04 Sep 2013 Author: Sue Clarke SUMMARY Catalyst Ensuring that

More information

Data Publishing Workflows with Dataverse

Data Publishing Workflows with Dataverse Data Publishing Workflows with Dataverse Mercè Crosas, Ph.D. Twitter: @mercecrosas Director of Data Science Institute for Quantitative Social Science, Harvard University MIT, May 6, 2014 Intro to our Data

More information

Strategies and New Technology for Long Term Preservation of Big Data

Strategies and New Technology for Long Term Preservation of Big Data Strategies and New Technology for Long Term Preservation of Big Data Presenter: Sam Fineberg, HP Storage Co-Authors: Simona Rabinovici-Cohen, IBM Research Haifa Roger Cummings, Antesignanus Mary Baker,

More information

How To Understand And Understand The Science Of Astronomy

How To Understand And Understand The Science Of Astronomy Introduction to the VO [email protected] ESAVO ESA/ESAC Madrid, Spain The way Astronomy works Telescopes (ground- and space-based, covering the full electromagnetic spectrum) Observatories Instruments

More information

RESEARCH DATA MANAGEMENT POLICY

RESEARCH DATA MANAGEMENT POLICY Document Title Version 1.1 Document Review Date March 2016 Document Owner Revision Timetable / Process RESEARCH DATA MANAGEMENT POLICY RESEARCH DATA MANAGEMENT POLICY Director of the Research Office Regular

More information

Outline of a Research Data Management Policy for Australian Universities / Institutions

Outline of a Research Data Management Policy for Australian Universities / Institutions Outline of a Research Data Management Policy for Australian Universities / Institutions This documents purpose: This document is intended as a basic starting point for institutions that are intending to

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Harvard Library Preparing for a Trustworthy Repository Certification of Harvard Library s DRS.

Harvard Library Preparing for a Trustworthy Repository Certification of Harvard Library s DRS. National Digital Stewardship Residency - Boston Project Summaries 2015-16 Residency Harvard Library Preparing for a Trustworthy Repository Certification of Harvard Library s DRS. Harvard Library s Digital

More information

A Service for Data-Intensive Computations on Virtual Clusters

A Service for Data-Intensive Computations on Virtual Clusters A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent

More information

NIDA Big Data Strategic Planning Workgroup

NIDA Big Data Strategic Planning Workgroup Big Data Strategic Planning Workgroup June 17, 2015 Co- Chairs: Roger Little, Ph.D. () Massoud Vahabzadeh, Ph.D. () Today s Agenda Welcome Review of timeline, work group priorities/activities and June

More information

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

Long Term Preservation of Earth Observation Space Data. Preservation Workflow

Long Term Preservation of Earth Observation Space Data. Preservation Workflow Long Term Preservation of Earth Observation Space Data Preservation Workflow CEOS-WGISS Doc. Ref.: CEOS/WGISS/DSIG/PW Data Stewardship Interest Group Date: March 2015 Issue: Version 1.0 Preservation Workflow

More information