Data archiving and data policies

Benjamin Pfeil, Stephane Pesant, Michael Diepenbroek, Hannes Grobe
Bjerknes Centre for Climate Research / University of Bergen, Norway (WWW.BJERKNES.UIB.NO)
EPOCA-BIOACID-CalMarO-OCB training workshop, Kiel, Germany, 12.03.2010

...but also

Data often shows a snapshot of the environment at a particular time and place. Sampling can be very expensive (on average over 100,000 per data set in the bio- and geosciences, including the costs of expeditions, laboratories, etc.). Data is therefore very valuable for future scientific work and has to be archived and made available.

Why do we need data?
- Verification of research results
- Comparison of results
- Indication of trends
- Model input
- Remote sensing
- ...and so on

Some facts about data in the scientific community:
- Scientific instruments and computer simulations create large amounts of data.
- Due to new measurement techniques (and better precision), data volumes are doubling each year.
- Scientific data has to be archived according to "Good scientific practice in research and scholarship" (European Science Foundation, 2000).
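The doubling claim is easier to appreciate as arithmetic. Writing V_0 for the data volume in a reference year and t for the number of elapsed years (our notation, assuming a constant annual doubling), growth is exponential, so over the 10-year minimum storage period that codes of good practice recommend, the volume rises roughly a thousandfold:

```latex
V(t) = V_0 \cdot 2^{t}, \qquad \frac{V(10)}{V_0} = 2^{10} = 1024
```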

Global increase in publications in empirical sciences?

But how is data archived?

For a long time this was common practice:

In 1958 the World Data Center system was established.
Mission statement of the World Data Center system: "Data constitute the raw material of scientific understanding. The World Data Center system works to guarantee access to solar, geophysical and related environmental data. It serves the whole scientific community by assembling, scrutinizing, organizing and disseminating data and information."

Network of ICSU WDCs (discipline: location)
- Airglow: Mitaka, Japan
- Astronomy: Beijing, China
- Atmospheric Trace Gases: Oak Ridge TN, USA
- Aurora: Tokyo, Japan
- Cosmic Rays: Toyokawa, Japan
- Earth Tides: Brussels, Belgium
- Geology: Beijing, China
- Geomagnetism: Copenhagen, Denmark; Edinburgh, UK; Kyoto, Japan; Colaba, India
- Glaciology: Boulder CO, USA; Cambridge, UK; Lanzhou, China
- Human Interactions in the Environment: Palisades NY, USA
- Ionosphere: Tokyo, Japan
- Marine Environmental Sciences: Bremen, Germany
- Meteorology: Asheville NC, USA; Beijing, China; Obninsk, Russia
- Oceanography: Obninsk, Russia; Silver Spring MD, USA; Tianjin, China
- Paleoclimatology: Boulder CO, USA
- Marine Geology and Geophysics: Boulder CO, USA
- Nuclear Radiation: Moscow, Russia; Tokyo, Japan
- WDC Co-ordination Offices: Washington DC, USA; Beijing, China
- Recent Crustal Movements: Ondrejov, Czech Republic
- Remotely Sensed Land Data: Sioux Falls SD, USA
- Renewable Resources and Environment: Beijing, China
- Rockets and Satellites: Obninsk, Russia
- Rotation of the Earth: Obninsk, Russia; Washington DC, USA
- Satellite Information: Greenbelt MD, USA
- Seismology: Denver CO, USA; Beijing, China
- Soils: Wageningen, The Netherlands
- Solar Activity: Meudon, France
- Solar Radio Emission: Nagano, Japan
- Solar Terrestrial Physics: Boulder CO, USA; Didcot Oxon, UK; Moscow, Russia; Haymarket, Australia
- Solid Earth Geophysics: Beijing, China; Boulder CO, USA; Moscow, Russia
- Space Science: Beijing, China
- Space Science Satellites: Kanagawa, Japan
- Sunspot Index: Brussels, Belgium

(Some) important WDCs for environmental data:
- WDC for Atmospheric Trace Gases: Carbon Dioxide Information Analysis Center, USA
- WDC for Climate: Model and Data, Max Planck Institute for Meteorology, Germany
- WDC for Glaciology, Boulder: University of Colorado, USA
- WDC for Marine Environmental Sciences: Center for Marine Environmental Sciences (MARUM), Germany
- WDC for Marine Geology & Geophysics, Boulder: USA
- WDC for Oceanography, Silver Spring: USA

But WDC is a status (a designation), not a type of institution! There are also many national and international data centres that are not WDCs, e.g.:
- ICES: International Council for the Exploration of the Sea, Denmark
- BODC: British Oceanographic Data Centre, UK
- BADC: British Atmospheric Data Centre, UK
- NODC: National Oceanographic Data Center, USA
- NMD: Norsk marint datasenter, Norway

But there are different data archiving systems in use, e.g.:
- Data servers (e.g. an FTP server)
- Different project websites
- Different long-term data archives (World Data Centers (WDC), National Data Centers (NDC)), as mentioned above
- A combination of the above

Data server, e.g. an FTP server
+ very fast to archive data (data dump)
+ very cheap
+ easy to archive large data sets (model output, for example)
- data is not structured (different file formats, units, etc.)
- not easy to search for data
- difficult to know that the data exists
- different versions of the data
- not a long-term archive: data can be lost!
- requires maintenance

Data archived at project websites (normally a small database or FTP server is used)
+ useful for project members, since relevant data is available at one site
+ an easy way to inform about the project and its achievements
- websites only represent data coming from the project
- it can take a long time to get all relevant data
- links stop working after a while: data will be lost!
- no more funding, no more maintenance!

Data archived at a data center (WDC or NDC)
+ data is archived long-term and available online
+ many WDCs are linked to each other, so data can be found via different websites (GCMD, WDC-CLUSTER, data portals, etc.)
+ often a relational database is used, which enables Google-like queries and the extraction of large amounts of data: data and metadata are structured!
+ data sets get a DOI and are citable
- time- and cost-intensive
- depending on the type of data, it can be very expensive

The next step at and between data centers:
- Data portals
- LAS (Live Access Server)
- Data warehouse

What happens in data portals?
- All relevant data centers are searched daily for new data, so by searching one website you search many at once.
- All metadata is available at the data portal.
- Changes at the different data centers are applied automatically: the latest version is always used.
- Scientists can use it like Google and get a direct link to the data.
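The portal behaviour described above (one search over many centers, automatic replacement by the latest version) can be sketched as a small in-memory index. This is a hypothetical illustration, not the CARBOOCEAN portal's code: the `MetadataRecord` fields, the integer `version` counter and the `.example` URLs are assumptions, and real portals fetch records over a harvesting protocol (OAI-PMH, for example), a step omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical minimal metadata record exposed by one data center."""
    dataset_id: str            # identifier shared by all versions of a data set
    version: int               # assumed to be incremented by the center on every update
    title: str
    keywords: Tuple[str, ...]
    url: str                   # direct link to the data


class DataPortal:
    """Toy portal that collects metadata from many data centers into one index."""

    def __init__(self) -> None:
        self.index: Dict[str, MetadataRecord] = {}

    def harvest(self, records: Iterable[MetadataRecord]) -> None:
        # Keep only the newest version of each data set, so an update made at
        # a data center replaces the older entry on the next harvest.
        for rec in records:
            known = self.index.get(rec.dataset_id)
            if known is None or rec.version > known.version:
                self.index[rec.dataset_id] = rec

    def search(self, term: str) -> List[str]:
        # One case-insensitive search over all harvested centers; returns data links.
        term = term.lower()
        return sorted(
            rec.url
            for rec in self.index.values()
            if term in rec.title.lower() or any(term in k.lower() for k in rec.keywords)
        )


# Usage with two hypothetical data centers, harvested on different days.
portal = DataPortal()
portal.harvest([MetadataRecord("ds-a1", 1, "Surface ocean pCO2, cruise X", ("carbon", "pCO2"),
                               "https://center-a.example/ds-a1/v1")])
portal.harvest([MetadataRecord("ds-b7", 1, "Alkalinity profiles", ("carbon", "alkalinity"),
                               "https://center-b.example/ds-b7/v1")])
portal.harvest([MetadataRecord("ds-a1", 2, "Surface ocean pCO2, cruise X (QC)", ("carbon", "pCO2"),
                               "https://center-a.example/ds-a1/v2")])
print(portal.search("carbon"))
# ['https://center-a.example/ds-a1/v2', 'https://center-b.example/ds-b7/v1']
```

Keying the index on a persistent data set identifier is what lets the portal recognise an updated version as the same data set rather than a new one.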

CARBOOCEAN data portal

Live Access Server (LAS): a web server for visualizing gridded and in-situ data. It can offer a wide range of data products for interpolation, comparison, visualization, and analysis.
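LAS is a complete server application, and the snippet below is not taken from it. It is a minimal, hypothetical sketch of one operation behind a "comparison" product: bilinear interpolation of a regular gridded field (e.g. model output) to the position of an in-situ observation. The grid, its values and the function name `bilinear` are made up for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def bilinear(lats: Sequence[float], lons: Sequence[float],
             field: Sequence[Sequence[float]], lat: float, lon: float) -> float:
    """Interpolate field[i][j], given at (lats[i], lons[j]), to the point (lat, lon).

    Assumes lats and lons are strictly increasing with at least two entries each.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so that i+1 and j+1 exist.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the point inside the cell (0..1 in each direction).
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Weighted average of the four surrounding grid values.
    return ((1 - t) * (1 - u) * field[i][j] + (1 - t) * u * field[i][j + 1]
            + t * (1 - u) * field[i + 1][j] + t * u * field[i + 1][j + 1])


# Hypothetical 2x2 sea surface temperature grid (degC) and one station position.
lats, lons = [50.0, 51.0], [0.0, 2.0]
sst = [[10.0, 12.0],
       [14.0, 16.0]]
model_at_station = bilinear(lats, lons, sst, 50.5, 1.0)
print(model_at_station)  # 13.0
```

The interpolated model value can then be set against the value actually measured at the station, which is the basic step of a model/observation comparison.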

Data warehouse: enables online retrieval of data archived in a relational database. Queries can be limited by parameter, area, geocode, etc.
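A query restricted by parameter, area and a geocode can be sketched with Python's built-in `sqlite3` module. The one-table schema, column names and sample rows below are hypothetical and far simpler than a real data warehouse; depth stands in for the geocode.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per measured value with its position (geocodes).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    " dataset_id TEXT, parameter TEXT, unit TEXT,"
    " latitude REAL, longitude REAL, depth_m REAL, value REAL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("ds-1", "temperature", "degC", 60.1, 5.2, 10.0, 8.4),
        ("ds-1", "salinity", "psu", 60.1, 5.2, 10.0, 34.9),
        ("ds-2", "temperature", "degC", 75.3, 12.0, 5.0, 2.1),
        ("ds-3", "temperature", "degC", 61.0, 4.8, 250.0, 7.0),
    ],
)


def query(conn: sqlite3.Connection, parameter: str,
          lat_range: Tuple[float, float], lon_range: Tuple[float, float],
          max_depth_m: float) -> List[tuple]:
    """Select one parameter inside a lat/lon box, down to a maximum depth."""
    sql = (
        "SELECT dataset_id, latitude, longitude, depth_m, value FROM measurement"
        " WHERE parameter = ?"
        " AND latitude BETWEEN ? AND ?"
        " AND longitude BETWEEN ? AND ?"
        " AND depth_m <= ?"
        " ORDER BY dataset_id, depth_m"
    )
    # Placeholders (?) keep user input out of the SQL string itself.
    return conn.execute(sql, (parameter, *lat_range, *lon_range, max_depth_m)).fetchall()


print(query(conn, "temperature", (55.0, 65.0), (0.0, 10.0), 100.0))
# [('ds-1', 60.1, 5.2, 10.0, 8.4)]
```

Because every value sits in the same structured table, a single query spans all data sets at once; with files in different formats the same selection would mean opening and converting each one.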

If data is structured in the same format, retrieving it is a matter of minutes!

Data publications following the OECD Principles and Guidelines for Access to Research Data from Public Funding (2007):
- peer-reviewed, citable data sets referenced by persistent identifiers (DOI)
- a DOI registry for scientific data, analogous to CrossRef for articles
- collaborations with publishers and data journals, cross-referencing supplementary data with traditional publications (SCOR working group, Elsevier, Nature, Springer, Thomson Reuters)
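A citable data set ends its citation with a resolvable DOI. The sketch below builds such a citation string; the field order is a hypothetical style rather than one prescribed by any journal or data center, the DOI check is deliberately simplified (real DOIs can be more varied), and all names and the DOI in the example are made up. `https://doi.org/` is the public DOI resolver.

```python
import re
from typing import Sequence

# Simplified DOI syntax check: "10.", a 4-9 digit registrant code, "/", a suffix.
# This covers the common modern form only, not every valid DOI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def format_data_citation(authors: Sequence[str], year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Build a data citation ending in a resolvable DOI link (hypothetical style)."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"{'; '.join(authors)} ({year}): {title}. {publisher}. https://doi.org/{doi}"


print(format_data_citation(["Doe, J.", "Roe, R."], 2010,
                           "Hypothetical carbonate chemistry data set",
                           "Hypothetical Data Center", "10.1000/xyz123"))
# Doe, J.; Roe, R. (2010): Hypothetical carbonate chemistry data set. Hypothetical Data Center. https://doi.org/10.1000/xyz123
```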

Data can be published in a journal!

Data policies are based on the guidelines for good scientific practice in research and scholarship issued by the European Science Foundation and the International Council for Science, which state:

Good scientific practice in research and scholarship, European Science Foundation (ESF), 2000
Data accumulation, handling, and storage
36. Data are produced at all stages in experimental research and in scholarship. Data sets are an important resource, which enable later verification of scientific interpretations and conclusions. They may also be the starting point for further studies. It is vital, therefore, that all primary and secondary data are stored in a secure and accessible form.
37. Institutions may pay particular attention to documenting and archiving original research and scholarship data. Several codes of good practice recommend a minimum period of 10 years, longer in the case of especially significant or sensitive data. National or regional discipline-based archives should be considered where there are practical or other problems in storing data at the institution where the research was conducted.

Principles for dissemination of scientific data (International Council for Science/CODATA)
4. Scientific advances rely on full and open access to data. Both science and the public are well served by a system of scholarly research and communication with minimal constraints on the availability of data for further analysis. The tradition of full and open access to data has led to breakthroughs in scientific understanding, as well as to later economic and public policy benefits. The idea that an individual or organization can control access to or claim ownership of the facts of nature is foreign to science.
5. The interests of database owners must be balanced with society's need for open exchange of ideas. Given the substantial investment in data collection and its importance to society, it is equally important that data are used to the maximum extent possible. Data that were collected for a variety of purposes may be useful to science. Legal foundations and societal attitudes should foster a balance between individual rights to data and the public good of shared data.

There are different data policies in use. They all state when and how data (and metadata) have to be made available to project members and the general public, where data have to be archived, and who shall get the credit. Funding agencies, institutes, projects, organizations, etc. often have their own policies on when data have to be publicly released (ranging from 0.5 to 3 years!) and where they have to be archived.

Example: data from one cruise can fall under several different data policies, those of:
- the project
- the institute
- the owner of the vessel
- the national funding agency
- the international funding agency
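When one cruise falls under several policies, each with its own release period, a practical question is which deadline binds first. The sketch below assumes that every policy sets a release deadline counted in whole months from the end of the cruise, so that complying with all of them means releasing by the earliest deadline; the month values are hypothetical, chosen within the 0.5 to 3 year range mentioned above.

```python
import calendar
from datetime import date
from typing import Dict, Tuple


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the length of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def binding_deadline(end_of_cruise: date, policies: Dict[str, int]) -> Tuple[str, date]:
    """Return the policy (and its date) that requires public release first."""
    deadlines = {name: add_months(end_of_cruise, months) for name, months in policies.items()}
    first = min(deadlines, key=deadlines.__getitem__)
    return first, deadlines[first]


# Hypothetical release periods in months for the five policies of one cruise.
policies = {
    "project": 24,
    "institute": 36,
    "vessel owner": 12,
    "national funding agency": 6,
    "international funding agency": 24,
}
print(binding_deadline(date(2010, 3, 12), policies))
# ('national funding agency', datetime.date(2010, 9, 12))
```

If a policy instead granted an embargo (a period during which data may be withheld), the comparison would flip to the latest date; which reading applies depends on the wording of each policy.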

It is slightly confusing, but it sounds as if everything is in place and data is available.

But these are recent publications:

and

Why is data often reported late or not at all, or not made available to the community?

Please discuss

Why is data archiving important?
- Data sets are an important resource, which enable later verification of scientific interpretations and conclusions.
- Data has been lost in the past due to missing or insufficient data management.
- It is essential for syntheses.
- Several codes of good practice recommend storing data for at least 10 years.
- Funding agencies require data to be archived long-term.
- More and more data is being gathered.

What gives data high impact?
- Making a data set available is a publication!
- Make data sets citable and get cited (use DOIs).
- Make data available according to internationally agreed standards (for the data reporting itself and for the infrastructure being used).
- Use established data institutions that people search.
- The more scientists find and use your data, the more your paper will be cited.
- Offer easy access to the data and make it easy to use.

Possible problems in retrieving data from different sources:
- Version conflicts (data is archived in many data centres at different stages, e.g. raw data, quality-controlled, etc.)
- Poorly documented metadata and data (methods, units, unclear parameter definitions, etc.)
- Only metadata is available online; the data has to be requested
- Cruise naming varies between countries, so it is hard to identify the same cruise
- Date formats (mm/dd/yyyy, yy/mm/dd, dd/mm/yyyy, etc.)
- Ways to report the position (lat/long, UTM)
- Different export formats (plain text, XML, netCDF, etc.)
- Different entities (one data set = data from one cruise, or from one station, or from one sample)
- Data set is too large to be downloaded (e.g. model data)
Result: it can take a lot of time to create large homogeneous data collections!
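The date-format problem shows why harmonisation needs per-source knowledge: a string such as "03/04/2009" is ambiguous on its own. A minimal sketch, assuming each (hypothetical) source's format has been documented, converts everything to ISO 8601 with the standard library's `datetime.strptime`:

```python
from datetime import datetime

# Hypothetical sources, each with the date format it is known to use.
# The format has to be recorded per source: "03/04/2009" alone is ambiguous.
SOURCE_DATE_FORMATS = {
    "source_a": "%m/%d/%Y",  # mm/dd/yyyy
    "source_b": "%y/%m/%d",  # yy/mm/dd (two-digit years 00-68 are read as 2000-2068)
    "source_c": "%d/%m/%Y",  # dd/mm/yyyy
}


def to_iso_date(raw: str, source: str) -> str:
    """Convert a date string from a known source to ISO 8601 (yyyy-mm-dd)."""
    parsed = datetime.strptime(raw.strip(), SOURCE_DATE_FORMATS[source])
    return parsed.date().isoformat()


for raw, source in [("03/04/2009", "source_a"), ("09/03/04", "source_b"), ("03/04/2009", "source_c")]:
    print(source, raw, "->", to_iso_date(raw, source))
# source_a 03/04/2009 -> 2009-03-04
# source_b 09/03/04 -> 2009-03-04
# source_c 03/04/2009 -> 2009-04-03
```

The same input string yields two different dates depending on its source, which is exactly the kind of silent error that makes building homogeneous collections slow.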