Image Data, RDA and Practical Policies

Similar documents
Software Entwicklungen für das LSDF Datenmanagement

Data Management at UT

Management of Research Data Procedure

OpenAIRE Research Data Management Briefing paper

Globus Research Data Management: Introduction and Service Overview

The Australian War Memorial s Digital Asset Management System

USGS Guidelines for the Preservation of Digital Scientific Data

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Willem Elbers

Long-term archiving and preservation planning

Cloud computing based big data ecosystem and requirements

DMBI: Data Management for Bio-Imaging.

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Research Data Management Guide

Best Practices for Data Management. RMACC HPC Symposium, 8/13/2014

The Concept of Big Data Reference Model

Interagency Science Working Group. National Archives and Records Administration

Records Management and SharePoint 2013

SURFsara Data Services

Lesson 3: Data Management Planning

THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy Page 1 of 8

How To Write A Blog Post On Globus

Report of the DTL focus meeting on Life Science Data Repositories

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Ingrid Hsieh Yee, Professor School of Library & Information Science Catholic University of America

Data Management Plan. Name of Contractor. Name of project. Project Duration Start date : End: DMP Version. Date Amended, if any

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Why long time storage does not equate to archive

Research Data Management Policy

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Data Management in NeuroMat and the Neuroscience Experiments System (NES)

Data Management Plans & the DMPTool. IAP: January 26, 2016

IST687 Scientific Data Management

Archiving the Web: the mass preservation challenge

National Earth System Science Data Sharing Platform of China

Exploitation of ISS scientific data

E-Content Service Group Virtual Meeting. Digital Preservation: How to Get Started

Data Management Resources at UNC: The Carolina Digital Repository and Dataverse Network

North Carolina Digital Preservation Policy. April 2014

Checklist: Persistent identifiers

LabArchives Electronic Lab Notebook:

Library Strategic Planning

How To Useuk Data Service

A grant number provides unique identification for the grant.

Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing. David Lechevalier Anantha Narayanan Sudarsan Rachuri

Data Management Exercises

Policy Policy--driven Distributed driven Distributed Data Management (irods) Richard M arciano Marciano marciano@un

Why archiving erecords influences the creation of erecords. Martin Stürzlinger scopepartner Vienna, Austria

Digital Preservation Strategy,

INTEGRATED RULE ORIENTED DATA SYSTEM (IRODS)

On the Radar: Tessella

Data Publishing Workflows with Dataverse

Strategies and New Technology for Long Term Preservation of Big Data

How To Understand And Understand The Science Of Astronomy

RESEARCH DATA MANAGEMENT POLICY

Outline of a Research Data Management Policy for Australian Universities / Institutions

Are You Big Data Ready?

Harvard Library Preparing for a Trustworthy Repository Certification of Harvard Library s DRS.

A Service for Data-Intensive Computations on Virtual Clusters

NIDA Big Data Strategic Planning Workgroup

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Long Term Preservation of Earth Observation Space Data. Preservation Workflow

Transcription:

Image Data, RDA and Practical Policies Rainer Stotzka and many others KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu

Data Life Cycle Lab Key Technologies Mostly scientific imaging projects with common needs in Data and metadata management High throughput analysis in near real time Data archiving and sharing High performance data ingest Automatic metadata extraction Visualization 2

Light Sheet Microscopy Novel microscope to image living Zebrafish embryos High 3D resolution Extreme short data acquisition time Growth of an embryo with in 16 h 16 TB raw data Acknowledgements: Andrey Kobitskiy, G. Ulrich Nienhaus Jens C. Otte, Masanari Takamiya, Uwe Strähle Johannes Stegmaier, Ralf Mikut Francesca Rindone, Volker Hartmann, Thomas Jejkal Achim Streit, Christopher Jung, Jos van Wezel Institute of Applied Physics Institute of Toxicology and Genetics Institute for Applied Computer Science Steinbuch Centre for Computing 3

Image Data Workflow Data Transfer Storage Preservation and Curation Sharing and Reuse 4

Image Data Production Data Transfer Storage Preservation and Curation Sharing and Reuse Experiment design Big Data: variety Data acquisition and quality control 200 MB/s 16 TB/d 5

Image Data Transfer Data Transfer Storage Data disposal : Data description Metadata Transfer to storage: 400 MB/s Preservation and Curation Sharing and Reuse 6

Image Data Storage Data Transfer Storage Big Data: volume, velocity Data access Security Preservation and Curation Sharing and Reuse 7

Image Data Data Transfer Storage Preservation and Curation Sharing and Reuse Big Data: value 8

Image Data Data Transfer Storage Big Data: value costly to be reproduced Worth to be preserved for future generations Sharing: Reproducible science Re-use: New (interdisciplinary) scientific outcomes Big Data: veracity Trust, well maintained Open Data Preservation and Curation Sharing and Reuse 9

Image Workflow Data Transfer Storage Preservation and Curation Sharing and Reuse Problems: HELP: Where is my data? HOW to manage, preserve, and to curate the data? 10

Image Data Repository Data Transfer Storage Preservation and Curation Repository Sharing and Reuse 11

Repository Repository: Managed location/destination/directory/bucket where digital data objects are Registered Permanently stored Made accessible and retrievable Curated E.g.: libraries & museums Digital data object (dataset): Consists of Data Description for re-use Digital Data Object (Dataset) Data (Data) Storage Organization Metadata 12

Repository System: KIT Data Manager Administrative Metadata Base MD Curation MD Sharing MD Policy MD MD OID Data Organization Metadata Content Metadata Repository System: Optimized for large scale and heterogeneous research data Internals are based on metadata Users & data curators don t see the complexity: Visible: Content metadata Discovery tool: search Representation of data 13

Repository Design Discover Access Curate Repository HELP: Where is my data? HOW to manage, preserve, and to curate the data? HOW to design the repository? Need for content metadata Need for rules for automation practical policies 14

RDA Working Group: Practical Policies Chairs: Reagan Moore, Rainer Stotzka Catalog with 11 important policy areas: Contextual metadata extraction Data access control Data backup Data format control Data retention Disposition Integrity (including replication) Notification Restricted searching Storage cost reports Use agreements 15

Example: Contextual Metadata Extraction Scenario: Extract metadata from an associated document, e.g. DICOM, OME,... Extract metadata from Discover Access Curate Repository 16

Example: Contextual Metadata Extraction Policy MD enable Contextual MD metadata extraction On file File_name On digital data object OID On user User_ID Attribute_name Extract metadata Attribute_value enable Data access control enable Data backup enable Data format control enable Data retention enable Disposition +++ +++ Expert knowledge required Data Life Cycle Lab Minimize the effort of manually recording metadata! 17

Policy Examples Policy: Contextual metadata extraction Policy: Restricted searching Policy: Data access control Discover Access Curate Repository Policy: Data backup Policy: Data integrity 18

Interaction with RDA Important groups for imaging repositories WG Practical Policies WG Data Foundation & Terminology PID groups Metadata groups IG Data Fabric Repository groups Domain related groups Typical PhD researcher Cooperates with domain partners Solves their data problems Specific research topic Monitors the activities of groups Transfer to the domain partners If funding available RDA plenaries Inhibition threshold: world experts Active in RDA groups Results: Motivation booster Networking Future State of the art 19

Conclusions Goal: Better and easier management of research data Sustain the value Enable sharing, reproducibility and re-use Enable better research RDA has real value and impact RDA is alive and growing, efficiency needs further improvement Technical Advisory Board, Organizational Advisory Board, Council, Secretariat, and individuals Plenaries (group sessions) produce results: 2 plenaries per year, low registration fee Need to invest: contributions gain results RDA is producing internationally connected data experts 20