The GEOmon Distributed DataBase (GDDB)
A data discovery and download portal for atmospheric composition data
http://geomon.nilu.no
Presentation at the 2nd MACC general assembly, October 19th 2010
Aasmund Fahre Vik, afv@nilu.no
Outline
Background; the GDDB system and contributing databases; the GDDB user interface; applicability for MACC purposes; future prospects for the system.
Background of the work
Previous attempts to create superdatabases for all types of data had failed, and experiments with metadatabases were of limited use. The GEOSS 10-year implementation plan calls for distributed data systems. The GEOmon project (coordinator: Philippe Ciais) is a European contribution to GEOSS. Its aims are to organise and harmonise atmospheric composition observations in Europe, and to organise and manage the data through a virtual data centre, which led to the creation of the GDDB.
GEOmon project
GEOmon data management vision (text written 3.5 years ago)
Interact with GEOMON participants to agree on how to manage data generated through the project (Data Management Committee).
Build upon existing infrastructures and data flows as far as possible.
Balance data originators' intellectual property rights with openness and transparency; manage protocols for data access rights.
Keep the burden on individual data originators (DOs) as low as possible: how can data reporting be simplified?
Investigate the use of metadata exchange in developing a distributed data centre.
Use different approaches for different data types (multi-dimensional data vs. simple time series).
Conduct an extensive review of data sources and data flow routes to serve as a basis for choosing a solution.
Provide special data transfer for NRT data: all GEOMON NRT data are routed through a common system.
External interfaces: a web portal and a machine-to-machine interface.
Data flow diagrams: aerosol example
GDDB overall design
[Diagram components: GEOSS services; GEOmon Data Centre web portal; GEOmon RD data archive; EBAS data archive with EBAS WWW; ESA-CDB data archive with CDB WWW; several external data archives.]
GDDB web portal design
Data catalogue: a metadatabase
The catalogue contains records of core metadata for datasets stored in databases elsewhere, one record per dataset. A dataset is defined as one component measured at one location, so a single physical data file may contain several datasets. Each record states where the dataset is stored and whether and how it can be downloaded through the GDDB. The catalogue is implemented as an Oracle database.
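The dataset definition above (one component at one location) determines how a physical file maps to catalogue records. The Python sketch below illustrates that split; the record fields and the function name `records_from_file` are hypothetical, and it keeps everything in memory rather than in the Oracle database the real catalogue uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    """Core metadata for one dataset: one component measured at one location."""
    component: str
    location: str
    database: str       # contributing archive that actually stores the data
    file_path: str      # physical data file holding the dataset
    downloadable: bool  # whether the portal can fetch it, or only link to it


def records_from_file(file_path, database, location, components, downloadable=True):
    """Split one physical data file into one catalogue record per component.

    A file from a single location that reports several components therefore
    yields several datasets, and hence several records.
    """
    return [
        CatalogueRecord(component, location, database, file_path, downloadable)
        for component in sorted(set(components))  # de-duplicate, stable order
    ]
```

For example, a file reporting ozone and nitrogen dioxide at one station would produce two records pointing at the same file path.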
Data catalogue generator
A series of Perl scripts prepares the metadata and inserts it into the catalogue, with a different approach for each archive linked to the GDDB. Metadata, especially component names and locations, are harmonised: original names are converted to a GDDB naming convention. Metadata are normally exchanged through simple text files made available by the contributing databases. They are harvested routinely by a cron job and the data catalogue is updated automatically, synchronising with the external databases once per day.
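The harvest, harmonise and synchronise cycle can be sketched as follows. This is a Python illustration, not the actual Perl scripts; the semicolon-separated file layout, the alias table `COMPONENT_ALIASES` and the function names are hypothetical. In a daily cron run, `sync` would yield the rows to insert into and delete from the catalogue.

```python
import csv
import io

# Hypothetical mapping from archive-specific component names to harmonised names.
COMPONENT_ALIASES = {
    "o3": "ozone",
    "ozone": "ozone",
    "so4--": "sulphate",
    "sulfate": "sulphate",
    "no2": "nitrogen_dioxide",
}


def harmonise(name):
    """Map an original component name to its harmonised name (lower-cased if unknown)."""
    key = name.strip().lower()
    return COMPONENT_ALIASES.get(key, key)


def parse_metadata(text):
    """Parse a hypothetical metadata file with lines 'component;location;start;end'."""
    records = set()
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        component, location, start, end = (field.strip() for field in row)
        records.add((harmonise(component), location, start, end))
    return records


def sync(catalogue, harvested):
    """Return (to_insert, to_delete) so that the catalogue mirrors the harvested metadata."""
    return harvested - catalogue, catalogue - harvested
```

Comparing whole sets keeps the sketch simple; a production catalogue would more likely diff on a stable dataset key and update changed fields in place.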
Contributing databases
Implemented data connections: EMEP, AMAP, Helcom, Osparcom, EUCAARI, EUSAAR, CREATE, HTAP observations and more; GAW-WDCA, GAWSIS-WOUDC, GAWSIS-WDCGG, GAWSIS-WRDC; NDACC, Aura Validation Data Centre, Envisat Validation Data Centre, EARLINET; Aerocom model median (aerosol sulphate only).
Yet to be implemented: RAMCES (GHG), NRT O3 sondes, more Aerocom data.
GDDB user interface
Demonstration: http://geomon.nilu.no
A two-page system: search, and info/download. Users can search by component, location, database, platform, data type and matrix, with 4D boundary selection (latitude, longitude, altitude and time). Descriptions of terms and usage guides are available, as is a link to Rapid Delivery data.
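A 4D boundary selection amounts to a bounding box in space plus a time window. The Python sketch below shows one way to filter catalogue entries with it; the class and field names are hypothetical, each dataset is simplified to a single point location with a time coverage, and boxes crossing the dateline are not handled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dataset:
    component: str
    lat: float     # degrees north
    lon: float     # degrees east
    alt_m: float   # altitude in metres
    start: date    # first day of data coverage
    end: date      # last day of data coverage


@dataclass(frozen=True)
class Box4D:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    alt_min: float
    alt_max: float
    t_start: date
    t_end: date

    def contains(self, d):
        """True if the dataset lies inside the box and its coverage overlaps the time window."""
        in_space = (self.lat_min <= d.lat <= self.lat_max
                    and self.lon_min <= d.lon <= self.lon_max
                    and self.alt_min <= d.alt_m <= self.alt_max)
        overlaps_time = d.start <= self.t_end and d.end >= self.t_start
        return in_space and overlaps_time


def search(datasets, box, component=None):
    """Return datasets inside the 4D box, optionally restricted to one component."""
    return [d for d in datasets
            if box.contains(d) and (component is None or d.component == component)]
```

Testing for time overlap rather than full containment means a year-long record is still found when the query window is only a few days long.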
GDDB info and download page
Demonstration: http://geomon.nilu.no
Features: sorting of metadata results; viewing of metadata details; access information and login details (for restricted datasets); download of data. In some cases only an HTTP link to the contributing data centre is provided. The system takes care of all data transfer (HTTP, FTP, web services). The download module works in the background: users can come back later and retrieve the data through a unique URL (development version only).
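The background download with a unique retrieval URL can be sketched as a job table keyed by random tokens. In this Python sketch the class `DownloadManager`, the `fetch` callable and the `/download/<token>` URL layout are hypothetical; a real `fetch` could, for instance, use `urllib.request.urlopen`, which handles http, https and ftp URLs.

```python
import threading
import uuid


class DownloadManager:
    """Run download jobs in the background; each job is retrieved later by a unique token."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(dataset_id) -> bytes; protocol handling lives here
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, dataset_ids):
        """Start a background job and return its unique token immediately."""
        token = uuid.uuid4().hex
        with self._lock:
            self._jobs[token] = {"status": "running", "result": None,
                                 "error": None, "finished": threading.Event()}
        threading.Thread(target=self._run, args=(token, list(dataset_ids)),
                         daemon=True).start()
        return token

    def _run(self, token, dataset_ids):
        try:
            update = {"status": "done",
                      "result": {ds: self._fetch(ds) for ds in dataset_ids}}
        except Exception as exc:  # record the failure instead of losing it in the thread
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            job = self._jobs[token]
            job.update(update)
        job["finished"].set()

    def wait(self, token, timeout=None):
        """Block until the job finishes; return False if the timeout expires first."""
        with self._lock:
            finished = self._jobs[token]["finished"]
        return finished.wait(timeout)

    def status(self, token):
        with self._lock:
            return self._jobs[token]["status"]

    def retrieve(self, token):
        with self._lock:
            job = self._jobs[token]
            if job["status"] != "done":
                raise RuntimeError(f"job {token} is {job['status']}")
            return job["result"]


def retrieval_url(base, token):
    """Build the hypothetical unique URL a user would revisit to collect the data."""
    return f"{base.rstrip('/')}/download/{token}"
```

Returning the token immediately is what lets the user leave the page and come back; the web server would call `status` and `retrieve` when the URL is opened.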
Applicability for MACC purposes
GDDB is a powerful data discovery tool! It is an easy way to learn about the existence of observations and contains an updated overview of available data from key databases. Multiple datasets can be downloaded from several databases simultaneously. The GDDB system is continuously evolving.
Applicability for MACC purposes
Possible use cases:
I am studying a forest fire episode over Europe on August 14-15 2008 (imaginary event). What data are available to constrain my model?
I am studying solar proton events in 2007. What stratospheric HNO3 measurements are available?
Which databases contain aerosol data?
We are a group of modellers comparing sulphate concentrations, and we want to download a common reference dataset for 2007.
Future prospects of the GDDB
GEOmon is funded until April 2011. The ACTRIS infrastructure project starts in April 2011 and will use the GDDB system, which ensures support for five years. More databases will be added. Metadata exchange mechanisms will be improved and standardised (ESA DCIO). Better support for scripts and automatic operations may be added: something for MACC II?
Thank you for your attention