Report on data management and infrastructure




ESONET All Regions Workshop, Barcelona, 5-7 September 2007

Introduction, objectives and context

On 6 September 2007, the participants of the ESONET All Regions Workshop in Barcelona (5-7 September 2007) were invited to join a parallel technical discussion session on data management and infrastructure. This topic is crucial for ESONET's aim to address the objectives of the Global Earth Observation System of Systems (GEOSS). GEOSS is intended to provide the overall conceptual and organisational framework for integrated Earth observations as a system of systems. The discussion focused on identifying potential problems to be solved on the way to standardized data flow and services in a commonly usable network.

Discussion format and participants

The discussion was chaired by Michael Diepenbroek (Networking and data infrastructure) and took place in a single session. Representatives of the ESONET work packages as well as of the European Union and NEPTUNE Canada were present. The participants of the discussion are listed below.

Participants of the session on data management and infrastructure:

- Cieslikiewicz, Witold (European Union, Witold.CIESLIKIEWICZ@ec.europa.eu)
- Barnes, Christopher (NEPTUNE Canada, University of Victoria, crbarnes@uvic.ca)
- Carval, Thierry (Ifremer, Thierry.Carval@ifremer.fr)
- Pulliat-Felix, Ingrid (Ifremer, ingrid.puillat@jrc.it)
- Sarradin, Pierre Marie (Ifremer, Pierre.Marie.Sarradin@ifremer.fr)
- Vangriesheim, Annick (Ifremer, avangri@ifremer.fr)
- Diepenbroek, Michael (MARUM / KDM, mdiepenbroek@pangaea.de)
- Schietke, Johanna (MARUM / KDM, schietke@uni-bremen.de)
- Favali, Paolo (INGV, paolofa@ingv.it)
- Hageberg, Anne A. (CMR, anne@cmr.no)
- Piera, Jaume (UTM CSIC, jpiera@cmima.csic.es)
- Sigray, Peter (Stockholm University, peters@misu.su.se)

Discussion content, main topics

The main objective of the working group on data management and infrastructure is to generate standardized data flow and services in a commonly usable network. Workflows for data infrastructure, management and interoperability must be developed to cover the data flow from sensors to laboratories to data centres and on to the final distribution through data portals or other services. The participants identified the following main problems to be solved:

- heterogeneity of the organisational and technical approaches
- heterogeneity of equipment, analytical methods and data
- dynamics of technical developments

To break down this complex and multilayered challenge, the discussion group focused primarily on six thematic sub-sections:

- Data capture
- Data products
- Quality control
- Archiving
- Publication
- Dissemination

Action items:

- Guidelines etc. as a dynamic document (wiki type document)
- Working group on CI
- Vocabularies -> standards -> interoperability
- Certification of data centres
- Topology of observatories, data centres etc.

Thematic discussion focus

Data capture

The participants agreed that, prior to data capture, the community has to agree on a sufficient sampling density, the continuity of measurements, and long-term operation, so that the captured data are comparable and usable within the community and for data products. The expected data range from (near) real-time data to offline data, taken from cabled or wireless observatories, and also include sampled data. This results in a wide variety of data types. One key problem identified in the discussion is achieving the convergence of these data types.

Data products

The participants think ESONET will be able to offer standard products as well as more complex products. These products require at least an agreement within the community on minimum measurements, convergence in methodologies as far as sensors and processing are concerned, and often data-type-specific standards. To estimate the need for other data products, more input from the community and the users is needed, since this is mainly a user-driven system.

Quality control

Reliable data quality is necessary for the use of data in data products as well as for data sharing among the scientists of a community. The discussion recommended that quality control should be embedded in the data flow on several levels, depending on the processing level. Methods of quality control should include at least a check of the completeness of metainformation, the flagging of data following identical procedures, and a check of the validity of the methods used. The aim is to know the quality of the data and to be able to judge whether that quality is good. Quality control should be implemented in conformance with accepted and agreed standards and protocols.

Archiving

ESONET will produce data volumes on different scales. One challenge will be the migration of data (e.g. when storage media change). The discussion group agreed that long-term archiving of these data requires the identification of certified data centres. These data centres should be specifically committed to the long-term archiving of data, with established responsibilities and the resources needed for extra capacity. In support of a distributed system, the participants suggested using existing data centres, e.g. national and world data centres.

Publication

The participants agreed that the data should be (mostly) openly accessible, in accordance with the GEOSS data sharing principles. The aim is to create citable and identifiable (e.g. through DOIs) datasets compliant with existing standards. The participants stressed the importance of crediting the data producers in the datasets.

Dissemination

Dissemination of data will be carried out mostly via data portals and data services, with the Internet as the access medium of first choice. The discussion group suggested that Internet-based services should be compatible with US/Canadian services (OOI cyberinfrastructure, Scripps IO). Compatibility with Japanese services, IODP and other cabled observatories would also be desirable in order to maximise exploitation of the data.

Potential dissemination infrastructure:

- ESONET data portal, Scientific Commons
- GEOSS, GMES, Marine Core Services
- Compatibility with US/Canada (OOI cyberinfrastructure -> Scripps IO)
- Japan, IODP, cabled observatories
- Google Scholar, Scientific Commons
- Other systems & communities

Action items

The session on data management and infrastructure closed with the specification of concrete tasks for the work package. The participants agreed to draft guidelines on good data management as a dynamic (wiki type) document. A working group on CI should be formed, which would also be concerned with vocabularies, standards, interoperability and the certification of data centres. The discussion group supported the idea of a topology of observatories, data centres, etc.

Implementation and test of spatial data infrastructure (SDI):

- GSDI as a worldwide effort to network georeferenced data, supported by many countries and initiatives (INSPIRE, GRID, IPCC, IGBP, OGC etc.)
- Interoperability based on global standards (IEEE standards, SensorML, ISO 19xxx family of standards, SOAP/WSDL etc.)
- Service interfaces to be contained within the data exchange and dissemination components
- Mechanisms will include many varieties of communication modes, with a primary emphasis on the Internet wherever appropriate, ranging from very low technology approaches to highly specialized technologies
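
As an illustration of two of the quality control recommendations above (checking the completeness of metainformation, and flagging data following identical procedures), a minimal Python sketch follows. The report does not define any of the specifics: the required metadata fields, the flag values and the simple range check are all hypothetical and serve only to show how such checks could be embedded in a data flow.

```python
# Hypothetical list of required metadata fields; the report does not define one.
REQUIRED_METADATA = ("station", "parameter", "unit", "method")

# Hypothetical flag values; the report only asks that flagging follow
# identical procedures for all data.
GOOD, SUSPECT, MISSING = 1, 2, 9


def missing_metadata(metadata):
    """Return the required metadata fields that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not metadata.get(key)]


def flag_values(values, valid_min, valid_max):
    """Apply the same range check to every value and return one flag per value."""
    flags = []
    for value in values:
        if value is None:
            flags.append(MISSING)
        elif valid_min <= value <= valid_max:
            flags.append(GOOD)
        else:
            flags.append(SUSPECT)
    return flags


# Made-up example record: the "method" field is deliberately left out.
metadata = {"station": "example-node", "parameter": "temperature", "unit": "degC"}
values = [4.1, 4.2, None, 57.0]

print(missing_metadata(metadata))                            # ['method']
print(flag_values(values, valid_min=-2.0, valid_max=35.0))   # [1, 1, 9, 2]
```

Keeping the flags alongside the values, rather than deleting suspect values, leaves the judgment about data quality visible to later users of the data.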