Data Formats for Long-term Archiving of Climate Model Data at WDC Climate and DKRZ



Michael Lautenschlager and Jörg Wegner, WDC Climate / Max-Planck-Institute for Meteorology, Hamburg. MPG e-science Seminar, October 25-26, 2007, Berlin, Germany.

DKRZ: Earth system model development; simulations of past, present and future climate.
WDC Climate: long-term data archiving; interdisciplinary data dissemination.

Diagram of the climate system

Diagram of the Hamburg IPCC climate model ECHAM5/MPI-OM

Forcing of climate projections for IPCC AR4

Near-surface temperature change for the scenarios A1B and B1. Shown is the difference between the 30-year means for 2071-2100 and 1961-1990.

Comparison of the present-day sea ice cover in March and September (top) with the climate projection for scenario A1B (bottom) in 2100. Snow cover over land is shown as well.

Spatial resolution of the North Atlantic sector in ECHAM5/MPI-OM

Data volumes in climate projections:
- IPCC AR4: ECHAM5[T63L19]/MPI-OM produces 23 GB/year. A climate projection over 240 years (1860-2100) amounts to 5.5 TB and takes approx. 2 months of computing time.
- Future: ECHAM5[T106L31] produces approx. 440 GB/year. A climate projection over 240 years (1860-2100) amounts to 106 TB, i.e. the complexity is approx. 20 times that of T63.
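The yearly output rates and totals on this slide are related by simple arithmetic: the total volume of a projection equals the average output per simulated year times the number of simulated years. The sketch below derives the yearly rates from the stated totals (5.5 TB and 106 TB over 240 years) and their ratio. It assumes decimal units (1 TB = 1000 GB), and the function name is illustrative only.

```python
# Sketch: relate total projection volume to average output per simulated year.
# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

def yearly_rate_gb(total_tb: float, years: int) -> float:
    """Average output per simulated year (GB/year) for a given total volume."""
    return total_tb * GB_PER_TB / years

years = 2100 - 1860                  # 240 simulated years
t63 = yearly_rate_gb(5.5, years)     # ECHAM5[T63L19]/MPI-OM, IPCC AR4
t106 = yearly_rate_gb(106, years)    # ECHAM5[T106L31], future configuration

print(f"T63L19:  {t63:.1f} GB/year")
print(f"T106L31: {t106:.1f} GB/year")
print(f"ratio:   {t106 / t63:.1f}")
```

The result is roughly 23 GB/year for T63L19 and roughly 440 GB/year for T106L31. Their ratio is about 19, which agrees with the "approx. 20 times" stated for the higher resolution.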

Figure: extrapolated linear archive increase for HLRE2 (10 times HLRE). Compute server architectures: C90: Cray C90; HLRE: NEC SX-6; MPP: SUN cluster; HLRE2: new system. (HLRE: Höchstleistungsrechnersystem für die Erdsystemforschung, i.e. high-performance computing system for Earth system research.)

DKRZ system diagram (figure). Labels: GFS, local systems, DS, CS, NW, remote systems.

HLRE system architecture at DKRZ (figure). Recoverable labels: 24 NEC SX-6 nodes linked by the IXS interconnect; GFS disk (70 TB); UCFM cache (17 TB); DBMS disk (30 TB); Oracle9i database for archive and backup; HSM with 9840C, 9940B, T10000 and LTO2 tape drives; application servers (SUN) on the LAN.

Data classes
- Test data from model code development; life cycle: weeks to months
- Project data from scientific model evaluation and research projects (DKRZ resources allocated at project level); life cycle: 3 to 5 years
- Final results as data products for international projects (IPCC) and scientific publications; life cycle: 10 years and longer
Data hierarchy levels
- Temp(orary): scratch disks at the compute server
- Work: fixed disk space at project level for evaluation
- Arch(ive): tape storage space (single copy) with an expiration date, for project data that exceed the available disk space
- Docu(mentation): documented long-term tape archive (with security copy) for data products; focus on interdisciplinary data use; the data are fixed and no longer subject to change

Tape space distribution across archive classes at DKRZ, beginning of 2007 (figure):
- part of the work space is on tape because the GFS is too small
- the docu domain consists of the WDCC
- no expiration dates are set in the arch domain; parts of the arch domain belong to docu but are not yet documented

Data documentation requirements are met by the WDCC infrastructure:
- CERA-2 metadata model, developed in 1999
- Catalogue interface: cera.wdc-climate.de
- Input interface: input.wdc-climate.de
- CERA-2 metadata content is complete with respect to browsing, discovering and using climate data, whether they are stored in the database system or outside it in flat files.
- The WDCC conforms to international description standards such as ISO 19115, Dublin Core and GCMD, and is integrated into international data federations.
- The data storage structure stores climate time series per variable in BLOB data tables. This allows web-based data catalogue search and data access in small data granules.
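Storing one time series per variable in a BLOB table can be sketched with Python's built-in sqlite3 module. Here sqlite3 only stands in for the archive's database system. This is an illustration under stated assumptions: the table layout, the column names, the variable name `tas` and the tiny grid size are all hypothetical. Each row holds one time step as a 2D field serialised to little-endian float32. A client can therefore fetch a single time step, one small granule, without reading the whole series.

```python
import sqlite3
import struct

NLON, NLAT = 4, 3  # tiny hypothetical grid; real model fields are much larger

def pack_field(values):
    """Serialise a flattened 2D field as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_field(blob):
    """Inverse of pack_field."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

con = sqlite3.connect(":memory:")
# Hypothetical layout: one table per variable, one BLOB row per time step.
con.execute("CREATE TABLE tas (step INTEGER PRIMARY KEY, field BLOB NOT NULL)")
for step in range(10):
    field = [float(step * 100 + i) for i in range(NLON * NLAT)]
    con.execute("INSERT INTO tas (step, field) VALUES (?, ?)",
                (step, pack_field(field)))
con.commit()

def read_granule(con, table, step):
    """Fetch one time step of one variable (a small data granule)."""
    # Table names cannot be bound as SQL parameters; only trusted names are used.
    row = con.execute(f"SELECT field FROM {table} WHERE step = ?",
                      (step,)).fetchone()
    return unpack_field(row[0]) if row else None

granule = read_granule(con, "tas", 7)
print(len(granule), granule[:3])
```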

CERA data model (figure). Metadata blocks: Reference, Contact, Coverage, Status, Entry, Parameter, Distribution, Data Org, Local Adm., Data Access, Spatial Reference.

Coloured columns correspond to BLOB data tables in the WDCC. Collections of matrix rows represent storage in model raw data files (complete model output, written one output time step after another).
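The matrix view on this slide can be made concrete. Raw model output is written row by row, with all variables for one output time step together. The archive stores it column by column, as one time series per variable. A minimal sketch of that reorganisation follows. It assumes each raw record is a mapping from variable name to a flattened field; the variable names `tas` and `pr` are hypothetical.

```python
from collections import defaultdict

def rows_to_columns(raw_records):
    """Transpose time-step-ordered model output into per-variable time series.

    raw_records: iterable of dicts {variable_name: field}, one dict per output
    time step in chronological order (a "row" of the matrix).
    Returns {variable_name: [field_t0, field_t1, ...]} (the "columns").
    """
    columns = defaultdict(list)
    for record in raw_records:
        for name, field in record.items():
            columns[name].append(field)
    return dict(columns)

# Three hypothetical output time steps, two variables each.
raw = [
    {"tas": [280.0, 281.0], "pr": [0.1, 0.0]},
    {"tas": [282.0, 283.0], "pr": [0.2, 0.3]},
    {"tas": [284.0, 285.0], "pr": [0.0, 0.1]},
]
series = rows_to_columns(raw)
print(series["tas"])
```

The sketch assumes every time step contains the same variables. Checking that assumption is exactly the kind of completeness test listed under quality assurance.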

The data infrastructure integrates data stewardship into the long-term archive:
- Bit-stream preservation
- Quality assurance
- Usability enabling

Long-term archive data stewardship
Bit-stream preservation
- Secondary tape copies on different tapes and tape technology at a separate location
- Copying to new tapes once the maximum number of tape accesses is reached (refreshment)
Quality assurance
- Semantic examinations: behavior of a numerical model compared to observations and to other models; part of the scientific evaluation process
- Syntactic examinations: formal aspects of data archiving, ensuring as far as possible that archiving is free of errors:
  - consistency between metadata and climate data
  - completeness of climate data
  - standard range of values
  - spatial and temporal data arrangement
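The syntactic examinations lend themselves to automation. The following sketch checks three things for one variable:
- completeness: the number of time steps against the count declared in the metadata;
- consistency: each field's size against the declared grid;
- standard range: every value against a plausible range.

The metadata keys (`nsteps`, `nlon`, `nlat`), the range limits used in the example and the function name are assumptions for illustration, not WDCC's actual rules.

```python
import math

def syntactic_check(metadata, series, valid_range):
    """Return a list of problems found in one variable's time series.

    metadata: dict with hypothetical keys 'nsteps', 'nlon', 'nlat'.
    series: list of flattened 2D fields (one list of floats per time step).
    valid_range: (lo, hi) tuple of physically plausible values.
    """
    problems = []
    # Completeness: all time steps announced in the metadata are present.
    if len(series) != metadata["nsteps"]:
        problems.append(f"expected {metadata['nsteps']} steps, found {len(series)}")
    npoints = metadata["nlon"] * metadata["nlat"]
    lo, hi = valid_range
    for t, field in enumerate(series):
        # Consistency between metadata (grid size) and data.
        if len(field) != npoints:
            problems.append(f"step {t}: {len(field)} values, expected {npoints}")
        # Standard range of values; NaN is reported as out of range too.
        bad = [v for v in field if math.isnan(v) or not lo <= v <= hi]
        if bad:
            problems.append(f"step {t}: {len(bad)} value(s) outside [{lo}, {hi}]")
    return problems

# Hypothetical near-surface temperature series with a missing step and an outlier.
meta = {"nsteps": 3, "nlon": 2, "nlat": 1}
tas = [[280.0, 281.5], [279.0, 400.0]]
problems = syntactic_check(meta, tas, (180.0, 340.0))
print(problems)
```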

Long-term archive data stewardship (continued)
Usability enabling
- Complete and searchable documentation of climate data entities (database tables and flat files) in the catalogue system of the WDCC
- The WDCC offers web-based access to small data granules (individual entries in BLOB database tables)
- Archive technology transfer must be backward compatible to keep old data technically readable
- Data processing tools and data format access libraries must be migrated to new architectures

Standard Data Formats (SDFs) at WDC Climate
Requirements
- Self-descriptive (contain metadata)
- Machine-independent
- Should support compression or packing
Benefits
- SDFs support long-term data preservation
- SDFs support data exchange and dissemination
- SDFs allow the use of standardized data processing tools and packages
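GRIB's simple packing scheme illustrates the packing requirement. Each value Y is stored as an unsigned integer X such that Y · 10^D = R + X · 2^E, where R is a reference value, E a binary scale factor and D a decimal scale factor. The sketch below picks the smallest E for which all packed integers fit into a given number of bits. It is a simplified illustration, not a GRIB encoder: it does no bit-level packing into bytes and has no bitmap. The function names are hypothetical.

```python
import math

def pack_simple(values, nbits=16, decimal_scale=0):
    """GRIB-style simple packing: Y * 10**D = R + X * 2**E.

    Returns (R, E, X) with all integers X in [0, 2**nbits - 1].
    """
    scaled = [v * 10**decimal_scale for v in values]
    ref = min(scaled)
    span = max(scaled) - ref
    max_int = 2**nbits - 1
    e = 0 if span == 0 else math.ceil(math.log2(span / max_int))
    while span / 2**e > max_int:  # guard against rounding in log2
        e += 1
    packed = [round((s - ref) / 2**e) for s in scaled]
    return ref, e, packed

def unpack_simple(ref, e, packed, decimal_scale=0):
    """Invert the packing: Y = (R + X * 2**E) / 10**D."""
    return [(ref + x * 2**e) / 10**decimal_scale for x in packed]

temps = [250.0, 273.15, 300.5, 288.2]   # e.g. temperatures in K
ref, e, packed = pack_simple(temps, nbits=16)
restored = unpack_simple(ref, e, packed)
print(ref, e, max(packed))
```

With 16 bits per value, the packed field needs half the space of 32-bit floats. The reconstruction error is bounded by half a quantisation step, 2^(E-1) / 10^D.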

Data formats at WDCC (figure): climate model output is stored as GRIB 1, GRIB 2, NetCDF 3.x or NetCDF 4.x. Tools:
- convert: cdo, cdat, xconv, IDV
- manipulate: cdo, cdat, nco, ncl
- visualize: cdat, GrADS, ncview, GMT

GRIB 1 (GRIdded Binary)
Section 0 - indicator section: 'GRIB', length of message, edition number
Section 1 - product description section
Section 2 - grid description section
Section 3 - bit map section
Section 4 - binary data section
Section 5 - end section: '7777'
The section sequence is repeated for every record.

Example session:
ds8 55 % grib -ginfo zzz.grb
Rec : Position Size : V PDS GDS BMS BDS : Code Level : LType GType
1 : 0 36948 : 1 28 32 0 36876 : 133 20000 : 100 4
2 : 36948 36948 : 1 28 32 0 36876 : 133 20000 : 100 4
3 : 73896 36948 : 1 28 32 0 36876 : 133 20000 : 100 4
4 : 110844 36948 : 1 28 32 0 36876 : 133 20000 : 100 4
5 : 147792 36948 : 1 28 32 0 36876 : 133 20000 : 100 4
ds8 56 % grib -gdsinfo zzz.grb
Rec : GDS NV PVPL Typ : xsize ysize Lat1 Lon1 Lat2 Lon2 dx dy
1 : 32 0 255 4 : 192 96 88572 0 -88572 358125 1875 48

Properties:
- compressed data, hence small file size
- every 2D field (record) is a self-contained GRIB message, so files can be concatenated with UNIX commands
- library support for Fortran and C
- strong restrictions on header information
- header information is coded as numbers
- code tables are needed for decoding
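Every GRIB1 record is a self-contained message that starts with an 8-octet Section 0:
- octets 1-4: 'GRIB'
- octets 5-7: total message length (24-bit, big-endian)
- octet 8: edition number

This is why concatenated records can still be split apart. The record sizes in the listing fit this layout: 8 (Section 0) + 28 (PDS) + 32 (GDS) + 0 (BMS) + 36876 (BDS) + 4 ('7777') = 36948 bytes. The sketch below walks a byte stream using only Section 0 and the '7777' trailer. The fake messages it builds are hypothetical stand-ins, not valid GRIB, and the sketch ignores the extended length conventions used for very large messages.

```python
def scan_grib1(data: bytes):
    """Yield (position, length, edition) for each GRIB1 message in data."""
    pos = 0
    while True:
        pos = data.find(b"GRIB", pos)
        if pos < 0:
            return
        if pos + 8 > len(data):
            raise ValueError(f"truncated Section 0 at byte {pos}")
        length = int.from_bytes(data[pos + 4:pos + 7], "big")  # octets 5-7
        edition = data[pos + 7]                                # octet 8
        end = pos + length
        if length < 12 or end > len(data) or data[end - 4:end] != b"7777":
            raise ValueError(f"invalid or truncated message at byte {pos}")
        yield pos, length, edition
        pos = end

def fake_grib1(length: int) -> bytes:
    """Hypothetical stand-in message: Section 0, zero padding, '7777'."""
    return (b"GRIB" + length.to_bytes(3, "big") + bytes([1])
            + bytes(length - 12) + b"7777")

# Five records of 36948 bytes, joined as 'cat' would join single-record files.
stream = b"".join(fake_grib1(36948) for _ in range(5))
for rec, (pos, size, ed) in enumerate(scan_grib1(stream), start=1):
    print(rec, pos, size, ed)
```

The positions it reports (0, 36948, 73896, 110844, 147792) match the Position column of the `grib -ginfo` output.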

GRIB 2 (General Regularly-distributed Information in Binary form)
Section 0 - indicator section: 'GRIB'
Section 1 - identification section*
Section 2 - local use section (optional)
Section 3 - grid definition section*
Section 4 - product definition section*
Section 5 - data representation section
Section 6 - bit map section
Section 7 - data section
Section 8 - end section: '7777'
Sequences of Sections 2 to 7, 3 to 7 or 4 to 7 can be repeated within one message.
* Sections 1, 3 and 4 together carry the information of the GRIB1 product description and grid description sections. This splitting, combined with the option of repeating sections and the concept of templates, is the main difference from GRIB1 and makes GRIB2 very flexible.
Concept of templates: grid definition, product definition and data representation are each described by a template, and you can define your own templates.
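Unlike GRIB1, a GRIB2 message is a chain of self-describing sections:
- Section 0 has 16 octets: 'GRIB', two reserved octets, the discipline, the edition number (2) and an 8-octet total length.
- Sections 1 to 7 each start with a 4-octet length followed by a 1-octet section number.
- Section 8 is the string '7777'.

The sketch below lists the section numbers of a message in file order. To keep it self-contained, it builds a synthetic message whose sections are hypothetical empty placeholders, not real GRIB2 content.

```python
import struct

def grib2_sections(msg: bytes):
    """List the section numbers of one GRIB2 message in file order."""
    if len(msg) < 16 or msg[:4] != b"GRIB" or msg[7] != 2:
        raise ValueError("not a GRIB edition 2 message")
    total = struct.unpack(">Q", msg[8:16])[0]  # octets 9-16: total length
    sections, pos = [0], 16
    while pos < total:
        if msg[pos:pos + 4] == b"7777":
            sections.append(8)
            break
        length, number = struct.unpack(">IB", msg[pos:pos + 5])
        if length < 5:
            raise ValueError(f"corrupt section length at byte {pos}")
        sections.append(number)
        pos += length
    return sections

def section(number: int, body: bytes = b"") -> bytes:
    """Hypothetical minimal section: 4-octet length, section number, body."""
    return struct.pack(">IB", 5 + len(body), number) + body

def message(*parts: bytes) -> bytes:
    """Wrap sections into a message with Section 0 and Section 8."""
    payload = b"".join(parts) + b"7777"
    # Octets 5-6 reserved, octet 7 discipline (0), octet 8 edition (2).
    section0 = b"GRIB" + bytes([0, 0, 0, 2]) + struct.pack(">Q", 16 + len(payload))
    return section0 + payload

# One grid definition (Section 3) followed by Sections 4-7 repeated twice.
msg = message(*(section(n) for n in (1, 3, 4, 5, 6, 7, 4, 5, 6, 7)))
print(grib2_sections(msg))
```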

GRIB 2 example
Consider 500 hPa height field forecasts on a Northern Hemisphere polar stereographic grid, produced by a particular numerical model at forecast hours 12, 24, 36 and 48. These four fields can be represented by a single GRIB2 message by repeating the sequence of Sections 4 to 7 four times, changing the forecast time in the Product Definition Section in each iteration of the sequence.
Section 0: Indicator Section
Section 1: Identification Section
Section 2: Local Use Section (optional)
Section 3: Grid Definition Section
Section 4: Product Definition Section (hour = 12) - repetition 1
Section 5: Data Representation Section
Section 6: Bit-Map Section
Section 7: Data Section
Section 4: Product Definition Section (hour = 24) - repetition 2
Section 5: Data Representation Section
Section 6: Bit-Map Section
Section 7: Data Section
Section 4: Product Definition Section (hour = 36) - repetition 3
Section 5: Data Representation Section
Section 6: Bit-Map Section
Section 7: Data Section
Section 4: Product Definition Section (hour = 48) - repetition 4
Section 5: Data Representation Section
Section 6: Bit-Map Section
Section 7: Data Section
Section 8: End Section
Note that since the Grid Definition Section is not repeated, it remains in effect for all four forecast hours.
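The rule that a section stays in effect until a later section with the same number replaces it can be expressed in a few lines. The sketch works on an abstract list of (section number, description) pairs rather than bytes. The descriptions are hypothetical labels for the 500 hPa example. Each Data Section is paired with the Grid and Product Definition Sections that are current at that point.

```python
def fields_in_message(sections):
    """Pair every Data Section (7) with the Grid (3) and Product (4)
    Definition Sections in effect at that point.

    sections: list of (section_number, description) tuples in file order.
    A section stays in effect until a later section with the same number
    replaces it.
    """
    current = {}
    fields = []
    for number, description in sections:
        current[number] = description
        if number == 7:
            fields.append((current.get(3), current.get(4)))
    return fields

# The 500 hPa example: Section 3 once, Sections 4-7 repeated four times.
msg = [(0, "indicator"), (1, "identification"),
       (3, "N. Hemisphere polar stereographic")]
for hour in (12, 24, 36, 48):
    msg += [(4, f"500 hPa height, +{hour} h"), (5, "data representation"),
            (6, "bit map"), (7, f"data +{hour} h")]
msg.append((8, "7777"))

for grid, product in fields_in_message(msg):
    print(grid, "|", product)
```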

NetCDF 3.x (NETwork Common Data Form)
- dimensions (one unlimited dimension possible)
- variables and attributes
- global attributes
- data

netcdf simple_xy {
dimensions:
  x = 6 ;
  y = 12 ;
variables:
  int data(x, y) ;
// global attributes:
  :CDO = "Climate Data Operators version 0.9.5" ;
  :source = "ECHAM5.2" ;
data:
 data = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71 ;
}

- no compression of data, so files are bigger than GRIB1 files
- data stored as n-dimensional arrays
- library support for Fortran and C
- file sizes above 2 GByte possible with NetCDF 3.6 (64-bit offset format)
- no restrictions on header information
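In the CDL example, `data(x, y)` is stored with the last dimension varying fastest (row-major, or C, order). That is why the listed values run 0, 1, ..., 71 with y changing first. The offset of element (i, j) in a variable of shape (nx, ny) is i·ny + j. The sketch below generalises this to n dimensions; the function name is illustrative.

```python
def flat_index(index, shape):
    """Offset of an n-dimensional index in row-major (C) order,
    i.e. the last dimension varies fastest, as in netCDF."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i
    return offset

shape = (6, 12)               # x = 6, y = 12 from the CDL header
values = list(range(6 * 12))  # data = 0, 1, ..., 71 as in the example
print(values[flat_index((0, 11), shape)],
      values[flat_index((1, 0), shape)],
      values[flat_index((5, 11), shape)])
```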

NetCDF 4.x, HDF5
NetCDF-4/HDF5 format: with version 4.0 of netCDF, another new data format was introduced, the netCDF-4/HDF5 format. This format is HDF5, with full use of the new dimension scales, creation ordering, and other features of HDF5 added in its version 1.8.0 release.
- multiple unlimited dimensions
- groups to organize data
- new types, including compound types and variable-length arrays
- parallel I/O

netcdf4 "example" {
  group "/" {
    group "group1" {
      dataset "set1" { dimensions, variables, data }
      dataset "set2" { dimensions, variables, data }
    }
    group "group2" { ... }
  }
}
(each dataset holds dimensions, variables and data, like a NetCDF 3.x file)
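The formats discussed here can usually be told apart from their first bytes:
- GRIB messages start with 'GRIB' and carry the edition number in octet 8, in both editions.
- NetCDF classic files start with 'CDF' followed by a version byte: 1 for classic, 2 for 64-bit offset, 5 for CDF-5.
- NetCDF-4 files are HDF5 files and start with the 8-byte HDF5 signature.

The sketch below is a heuristic under stated assumptions. It only inspects offset 0, whereas HDF5 also allows the signature at offsets 512, 1024 and so on after a user block, and GRIB data may be preceded by other headers. It also cannot distinguish NetCDF-4 from plain HDF5. The function name is illustrative.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def sniff_format(header: bytes) -> str:
    """Guess the data format from the first bytes of a file (offset 0 only)."""
    if header[:4] == b"GRIB" and len(header) >= 8:
        return f"GRIB edition {header[7]}"
    if header[:3] == b"CDF" and len(header) >= 4:
        variants = {1: "NetCDF classic", 2: "NetCDF 64-bit offset", 5: "NetCDF CDF-5"}
        return variants.get(header[3], "unknown NetCDF variant")
    if header[:8] == HDF5_SIGNATURE:
        return "HDF5 (possibly NetCDF-4)"
    return "unknown"

# Typical use on a file: sniff_format(open(path, "rb").read(8))
print(sniff_format(b"GRIB\x00\x00\x00\x02"))
```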

Tools
- nco: for NetCDF, http://nco.sourceforge.net/
- ncl: for NetCDF3, NetCDF4, GRIB1, GRIB2, HDF4, http://www.ncl.ucar.edu/
- ncview: for NetCDF, http://meteora.ucsd.edu/~pierce/ncview_home_page.html
- cdo: for GRIB1, NetCDF, IEG, EXTRA, SERVICE, http://www.mpimet.mpg.de/fileadmin/software/cdo/
- cdat: for GRIB1 (with GrADS control file), NetCDF, HDF, http://www-pcmdi.llnl.gov/software/
- xconv: for NetCDF, GRIB, http://badc.nerc.ac.uk/help/software/xconv/
- IDV: for GRIB, NetCDF, http://www.unidata.ucar.edu/software/
- THG HDF5 tools: for HDF, http://www.hdfgroup.org/products/hdf5_tools/
- GrADS: for GRIB1 (with control file), NetCDF 3.x, http://grads.iges.org/grads/grads.html