ESONET All Regions Workshop - Barcelona, 5-7 September 2007

Report on data management and infrastructure

Introduction, objectives and context

On 6 September, the participants of the ESONET All Regions Workshop in Barcelona (5-7 September 2007) were invited to join a parallel technical discussion session on data management and infrastructure. The topic of data management and infrastructure is crucial for ESONET's aim to address the objectives of the Global Earth Observation System of Systems (GEOSS). GEOSS is intended to provide the overall conceptual and organizational framework for integrated earth observations as the system of systems. The discussion focused on identifying potential problems to be solved on the way to a standardized data flow and services in a commonly usable network.

Discussion format and participants

The discussion was chaired by Michael Diepenbroek (Networking and data infrastructure) and took place in one session. Representatives from ESONET work packages as well as from the European Union and NEPTUNE Canada were present. The participants of the discussion are listed below.
Participants of the session on data management and infrastructure:

- Cieslikiewicz, Witold (European Union, Witold.CIESLIKIEWICZ@ec.europa.eu)
- Barnes, Christopher (NEPTUNE Canada, University of Victoria, crbarnes@uvic.ca)
- Carval, Thierry (Ifremer, Thierry.Carval@ifremer.fr)
- Pulliat-Felix, Ingrid (Ifremer, ingrid.puillat@jrc.it)
- Sarradin, Pierre Marie (Ifremer, Pierre.Marie.Sarradin@ifremer.fr)
- Vangriesheim, Annick (Ifremer, avangri@ifremer.fr)
- Diepenbroek, Michael (MARUM / KDM, mdiepenbroek@pangaea.de)
- Schietke, Johanna (MARUM / KDM, schietke@uni-bremen.de)
- Favali, Paolo (INGV, paolofa@ingv.it)
- Hageberg, Anne A. (CMR, anne@cmr.no)
- Piera, Jaume (UTM CSIC, jpiera@cmima.csic.es)
- Sigray, Peter (Stockholm University, peters@misu.su.se)

Discussion content, main topics:

The main objective of the working group on data management and infrastructure is to generate a standardized data flow and services in a commonly usable network. Workflows in data infrastructure, management and interoperability must be developed for the data flow from sensors to laboratories to data centres to the final distribution through data portals or other services. The participants of the discussion identified the following main problems to be solved:

- heterogeneity of the organisational and technical approaches
- heterogeneity of equipment, analytical methods and data
- dynamics of technical developments

To break down this complex and multilayered challenge, the discussion group focused primarily on seven thematic sub-sections:

- Data capture
- Data products
- Quality control
- Archiving
- Publication
- Dissemination
- Action items:
  - Guidelines etc. as a dynamic document (wiki-type document)
  - Working group on CI
  - Vocabularies -> standards -> interoperability
  - Certification of data centres
  - Topology of observatories, data centres etc.

Thematic discussion focus:

Data capture

The discussion participants agreed that, prior to data capture, the community has to agree on a sufficient sampling density, the continuity of measurements, and long-term operation, so that the captured data are comparable and usable within the community and for data products. The data expected to be captured range from (near) real-time data to offline data, taken from cabled or wireless observatories, and also include sampled data. This results in a wide variety of data types. One key problem the discussion identified is achieving convergence of these data types.

Data products

The participants of the discussion group think ESONET will be able to offer standard products as well as more complex products. These products require at least an agreement within the community on minimum measurements, convergence in methodologies as far as sensors and processing are concerned, and often data-type-specific standards. To estimate the need for other data products, more input from the community and the users is needed, since this is mainly a user-driven system.

Quality control

Reliable data quality is necessary for the use of data in data products as well as for data sharing among the scientists of a community. The discussion recommended that quality
control should be embedded in the data flow on several levels, depending on the processing level. Methods of quality control should include at least the control of completeness of metainformation, the flagging of data following identical procedures, and checking the validity of the methods used. The aim is to have knowledge about the quality of data and to be able to distinguish poor quality from good quality. Quality control should be implemented in conformance with accepted and agreed-upon standards and protocols.

Archiving

ESONET will produce data volumes on different scales. One challenge will be the migration of data (e.g. with media changes involved). The discussion group agreed that long-term archiving of these data requires the identification of certified data centres. These data centres should be especially committed to the long-term archiving of data, with established responsibilities and the resources needed for extra capacities. To support a distributed system, the discussion participants suggested using existing data centres, e.g. national and world data centres.

Publication

The participants agreed that the data should be (mostly) openly accessible according to the GEOSS data sharing principles. The aim is to create citable and identifiable (e.g. through DOIs) datasets compliant with existing standards. The participants stressed the importance of crediting the data producers in the datasets.

Dissemination

Dissemination of data will be carried out mostly via data portals and data services. The Internet is the access medium of first choice. The discussion group suggested that Internet-based services should be compatible with US/Canadian services (OOI cyberinfrastructure, Scripps IO), and that compatibility with Japanese services, IODP and other cabled observatories would also be desirable in order to maximise exploitation of the data.
Potential dissemination infrastructure:

- ESONET data portal, Scientific Commons
- GEOSS, GMES, Marine Core Services
- Compatibility with US/Canada (OOI cyberinfrastructure -> Scripps IO), Japan, IODP, cabled observatories
- Google Scholar, Scientific Commons
- Other systems & communities

Action items:

The session on data management and infrastructure closed with the specification of concrete tasks for the work package. The participants agreed to phrase guidelines on good data management as a dynamic (wiki-type) document. A working group on CI should be formed, which should also address vocabularies, standards, interoperability and the certification of data centres. The discussion group supported the idea of a topology of observatories, data centres, etc.

Implementation and test of spatial data infrastructure (SDI):

- GSDI as a worldwide effort to network georeferenced data, supported by many countries and initiatives (INSPIRE, GRID, IPCC, IGBP, OGC etc.)
- Interoperability based on global standards (IEEE standards, SensorML, ISO 19xxx family of standards, SOAP/WSDL etc.)
- Service interfaces to be contained within the data exchange and dissemination components
- Mechanisms will include many varieties of communication modes, with a primary emphasis on the Internet wherever appropriate, ranging from very low-technology approaches to highly specialised technologies
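
Illustration of the quality control methods:

Two of the quality control methods named in the thematic discussion, the control of completeness of metainformation and the flagging of data following identical procedures, can be illustrated with a minimal Python sketch. It is not part of the workshop results. The required metadata fields, the flag values and the simple range check are all assumptions chosen for illustration, since the discussion did not define any of them.

```python
from dataclasses import dataclass, field

# Hypothetical minimum metadata set; the discussion did not define one.
REQUIRED_METADATA = ("parameter", "unit", "sensor_id",
                     "latitude", "longitude", "method")

# Hypothetical flag values, chosen for illustration only.
FLAG_GOOD = 1
FLAG_SUSPECT = 3
FLAG_MISSING = 9


@dataclass
class Dataset:
    metadata: dict
    values: list                      # measurements; None marks a gap
    flags: list = field(default_factory=list)


def missing_metadata(dataset):
    """Return the required metadata keys that are absent or empty."""
    return [key for key in REQUIRED_METADATA
            if dataset.metadata.get(key) in (None, "")]


def flag_values(dataset, valid_min, valid_max):
    """Apply the same range check to every value and store one flag each."""
    flags = []
    for value in dataset.values:
        if value is None:
            flags.append(FLAG_MISSING)
        elif valid_min <= value <= valid_max:
            flags.append(FLAG_GOOD)
        else:
            flags.append(FLAG_SUSPECT)
    dataset.flags = flags
    return flags


# Example: a temperature series lacking a description of the method used.
example = Dataset(
    metadata={"parameter": "temperature", "unit": "degC", "sensor_id": "S1",
              "latitude": 0.0, "longitude": 12.5},
    values=[3.9, None, 45.0],
)
print(missing_metadata(example))         # ['method']
print(flag_values(example, -2.0, 40.0))  # [1, 9, 3]
```

Keeping the flags alongside the values, rather than dropping suspect values, preserves the knowledge about data quality that the discussion asked for and leaves the judgement to the user of the data.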