CMIP5 Data Management CAS2K13
|
|
|
- Camron Gilmore
- 10 years ago
- Views:
Transcription
1 CMIP5 Data Management CAS2K September 2013, Annecy Michael Lautenschlager (DKRZ) With Contributions from ESGF CMIP5 Core Data Centres PCMDI, BADC and DKRZ
2 Status DKRZ Data Archive HLRE-2 archive concept from 2009: Annual growth rate with 6 PB/year is less than expected and total number of files is small compared to HLRE-1 WDCC CMIP-5 reference data: 105 TB, expected 1 PB 2
3 CMIP5 Protocol +Timeline Taylor et al (2009), "A Summary of the CMIP5 Experiment Design Centennial Experiments Decadal Experiments Timeline: : CMIP5 definition with Taylor et al (2009) as result : Climate model calculations and archive design : CMIP5 archive build up (presentation at CAS2K11) 3
4 CMIP5 Results 4 CMIP5 Release WS MPI-M/DKRZ CAS2K13 (Feb. 2012) (Lautenschlager, DKRZ)
5 Data Amounts CMIP3/CMIP5 CMIP3 / IPCC AR4 (Report 2007) Participation: 17 modelling centres with 25 models In total 36 TB model data central at PCMDI and ca. ½ TB in IPCC DDC at WDCC/DKRZ as reference data CMIP5 / IPCC AR5 (Report 2013/2014) Participation: 29 modelling groups with 61 models Produced data volume: ca. 10 PB with 640 TB from MPI ESM CMIP5 requested data volume: ca. 2 PB (in CMIP5 data federation) Data volume for IPCC DDC: ca. 1 PB (complete quality assurance process) with 60 TB from MPI ESM Status CMIP5 data archive (June 2013): 1.8 PB for data sets stored in 4.3 Mio Files in 23 data nodes CMIP5 data is about 50 times CMIP3 5
6 Usage Requirements for CMIP5 Results from CMIP5 (Coupled Model Intercomparison Project No. 5) are for Model intercomparisons with respect to climate model improvement and consolidation of the climate system knowledge Usage as common data basis for scientific publications as basis for the IPCC Assessment Report No. 5 (IPCC AR5) New in IPCC AR5: all three working groups should use the same model data base Resulting interdisciplinary applications (IAV Impact, Adaptation/Mitigation, Vulnerability) imposes high requirements to data quality and documentation This has implications for treatment and provision of climate data in the IPCC DDC (IPCC Data Distribution Centre) compared to AR4 This means accomplishment of quality control and data documentation in connection or just after the climate model runs in order to remove data errors and inconsistencies prior to the (interdisciplinary) usage. 6
7 CMIP5 Data Federation (P2P) BADC: CIM Metadata, Help- Desk, Replicates IPCC DDC PCMDI: CMIP5 Data Access Control, CMIP5 Coordination with WCRP/WGCM WDCC/DKRZ: Quality Control, DataCite Data Publication, Long-term Archive IPCC DDC ipcc-ar5.dkrz.de Currently 16 Index Nodes and 23 Data Nodes 1,2 PB bmbf-ipcc-ar5.dkrz.de 7
8 European Contribution to ESGF-CMIP5 The FP7 project IS-ENES contributes to ESGF-CMIP5 with 7 European data nodes and 4 index nodes 8
9 CMIP5 Data Federation 3 central management components have been planned for interdisciplinary data re use Highly structured data files in self descriptive data format NetCDF/CF with use metadata New: searchable model and experiment descriptions (CIM metadata from EU Project METAFOR) New: 3 layer quality assurance concept for data and metadata QC L1: ESGF publisher conformance checks QC L2: Data consistency checks QC L3: Double and cross checks of data and metadata and DataCite data publication 9
10 CIM Metadata Development: FP7 METAFOR New: Documentation of data creation process in close connection with climate data Improvement: Searchable model and experiment description Climate Model Experiment 10
11 3-Layer Quality Assurance Concept CIM QC1 MD I I WDCC/ IL IL I PCMDI IPSL ANU DIAS NCAR DKRZ QC Level 1 at ESG Data Nodes BADC All CMIP5 requested data (CMOR-2) PCMDI WDCC/ DKRZ BADC QC Level 2 at Archive Centers CMIP5 Output1 / replicated Data Documentation of QC: Stockhause, M., Höck, H., Toussaint, F., and Lautenschlager, M Quality assessment concept of the World Data Center for Climate and its application to CMIP5 CIM WDCC/ DKRZ QC QC Level 3 at PubAgency data. Geosci. Model Dev., 5, , 2012 DOI : /gmd CMIP5 Output1 / replicated data and metadata CMIP5 Output1 / replicated data and metadata in the long-term archive (IPCC DDC/WDCC)
12 Finalisation of Quality Assessment After final control of data and metadata (CIM und CF) CMIP5 data are transferred from the ESGF archive (most recent version) into the reference data archive (snapshot around March 2013) Quality status: approved by author Data are marked as irrevocable Long term archiving in WDC Climate of DKRZ Final step is the DataCite data publication and integration of associated citation reference into library catalogues Data entity (here one climate model experiment) receives a citation reference for direct usage in scientific publications and a DOI (Digital Object Identifier) for the transparent data access Citation reference contains data author and title as well as WDC Climate as DataCite DOI publisher and the DOI Resolution of the DOI leads to a Landing Page, which address is stored in the central data base of the DOI Handle Server at DataCite 12
13 DOI Landing Page Citation Reference Contact Person Metadata Summary Information on Quality Assurance Direct Access to Climate Data 13
14 Status QC for CMIP5 QC Status CMIP5 (8. August 2013) Quality Control 1: 1142 Experiments Quality Control 2: 830 Experiments (finalised 403) Quality Control 3: 174 Experiments DataCite DOI: 116 Experiments (WDCC / IPCC DDC) RCPs, AMIP, Historical 14
15 CMIP5 data management achievements CMIP5 federation with 3 core data nodes (PCMDI, DKRZ, BADC), 16 index nodes and 23 data nodes operates an distributed archive of nearly 2 PB of climate model data which is an increase by a factor of 50 compared to the last CMIP in A searchable data catalogue is available across the federation. A description of climate models and experiments has been established. A three layer quality assurance process has been established which ends in a DataCite data publication for finalised reference data. Long term archiving of reference data in the WDCC/DKRZ and integration in the ICSU WDS (World Data System) and the WIS (WMO Information System) Approved terms of use are available with open access for non commercial use and 2/3 of the archive is available without any restrictions. 15
16 ESGF started to analyse the CMIP5 experiences in order to improve the ESGF data infrastructure: Managing large data archives is not only a technical problem. The establishment of a stable distributed ESGF infrastructure requires stable commitments and funding ESGF has requests from alternative modelling efforts and related observations to be included in ESGF in order to have all these data more easily inter comparable. Federated data infrastructures like ESGF or Data Clouds seem the way to go for the next generation of climate data archives CMIP3 to CMIP5: 36 TB to 1.8 PB, which means factor 50 increase CMIP5 to CMIP6: 1.8 PB * 50 = 90 PB for one these MIPS If a few or several of these MIPs are considered then Requested improvements 16 Future Usability of ESGF data access interface Automated data replication between ESGF data nodes More powerful, more stable and scalable wide area data networks (service level agreements)
Big Data Research at DKRZ
Big Data Research at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scien:fic Compu:ng Research Group Symposium Big Data in Science Karlsruhe October 7th, 2014 Big Data in Climate Research Big
Big Data Services at DKRZ
Big Data Services at DKRZ Michael Lautenschlager and Colleagues from DKRZ and Scientific Computing Research Group MPI-M Seminar Hamburg, March 31st, 2015 Big Data in Climate Research Big data is an all-encompassing
CMIP6 Data Management at DKRZ
CMIP6 Data Management at DKRZ icas2015 Annecy, France on 13 17 September 2015 Michael Lautenschlager Deutsches Klimarechenzentrum (DKRZ) With contributions from ESGF Executive Committee and WGCM Infrastructure
Data management and archiving
Data management and archiving for COP / GOP / D_PHASE 4 th COPS Workshop 25./26.09. 2006 Stuttgart Claudia Wunram Hannes Thiemann 25.-26.09.06 / 1 Data archive Long term data archive for COPS, GOP and
Addendum to the CMIP5 Experiment Design Document: A compendium of relevant emails sent to the modeling groups
Addendum to the CMIP5 Experiment Design Document: A compendium of relevant emails sent to the modeling groups CMIP5 Update 13 November 2010: Dear all, Here are some items that should be of interest to
Integration strategy
C3-INAD and ESGF: Integration strategy C3-INAD Middleware Team: Stephan Kindermann, Carsten Ehbrecht [DKRZ] Bernadette Fritzsch [AWI] Maik Jorra, Florian Schintke, Stefan Plantikov [ZUSE Institute] Markus
Interactive Data Visualization with Focus on Climate Research
Interactive Data Visualization with Focus on Climate Research Michael Böttinger German Climate Computing Center (DKRZ) 1 Agenda Visualization in HPC Environments Climate System, Climate Models and Climate
Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS
Big Data and the Earth Observation and Climate Modelling Communities: JASMIN and CEMS Workshop on the Future of Big Data Management 27-28 June 2013 Philip Kershaw Centre for Environmental Data Archival
NERC Data Policy Guidance Notes
NERC Data Policy Guidance Notes Author: Mark Thorley NERC Data Management Coordinator Contents 1. Data covered by the NERC Data Policy 2. Definition of terms a. Environmental data b. Information products
Research Data Management Guide
Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that
Jochen Schirrwagen, Najko Jahn. Bielefeld University Library, Germany. Research in Context
Jochen Schirrwagen, Najko Jahn Bielefeld University Library, Germany Research in Context In the light of recent results from OpenAIREplus and from the Library perspective Seminar to Access of Grey Literature
ICSU/WMO World Data Center for Remote Sensing of the Atmosphere (WDC RSAT)
ICSU/WMO World Data Center for Remote Sensing of the Atmosphere (WDC RSAT) Beate Hildenbrand (et al.) German Aerospace Center (DLR) GAW 2009, Geneva, 05 07 May 2009 http://wdc.dlr.de WDC RSAT overview
ETERNUS CS High End Unified Data Protection
ETERNUS CS High End Unified Data Protection Optimized Backup and Archiving with ETERNUS CS High End 0 Data Protection Issues addressed by ETERNUS CS HE 60% of data growth p.a. Rising back-up windows Too
Data Formats for Long-term Archiving of Climate Model Data at WDC Climate and DKRZ
Data Formats for Long-term Archiving of Climate Model Data at WDC Climate and DKRZ Michael Lautenschlager and Jörg Wegner WDC Climate / Max-Planck-Institute for Meteorology, Hamburg MPG e-science Seminar
CEDA Storage. Dr Matt Pritchard. Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk
CEDA Storage Dr Matt Pritchard Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk How we store our data NAS Technology Backup JASMIN/CEMS CEDA Storage Data stored as files on disk. Data is migrated
Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan
Metadata for Data Discovery: The NERC Data Catalogue Service Steve Donegan Introduction NERC, Science and Data Centres NERC Discovery Metadata The Data Catalogue Service NERC Data Services Case study:
Data Management Plan Deliverable 5.4
Data Management Plan Deliverable 5.4 This project has received funding from the European Union s Horizon 2020 Research and Innovation Programme under Grant Agreement No 675191 Page 1 About this document
CDI SSF Category 1: Management, Policy and Standards
CDI SSF Category 1: Management, Policy and Standards Developing a Data Management Plan for Implementation: Best Practices for the Collection, Management, Storing, and Sharing of Geospatial and Non-geospatial
The Design and Implementation of the Zetta Storage Service. October 27, 2009
The Design and Implementation of the Zetta Storage Service October 27, 2009 Zetta s Mission Simplify Enterprise Storage Zetta delivers enterprise-grade storage as a service for IT professionals needing
NATIONAL CLIMATE CHANGE & WILDLIFE SCIENCE CENTER & CLIMATE SCIENCE CENTERS DATA MANAGEMENT PLAN GUIDANCE
NATIONAL CLIMATE CHANGE & WILDLIFE SCIENCE CENTER & CLIMATE SCIENCE CENTERS DATA MANAGEMENT PLAN GUIDANCE Prepared by: NCCWSC/CSC Data Management Working Group US Geological Survey February 26, 2013 Version
Why aren t climate models getting better? Bjorn Stevens, UCLA
Why aren t climate models getting better? Bjorn Stevens, UCLA Four Hypotheses 1. Our premise is false, models are getting better. 2. We don t know what better means. 3. It is difficult, models have rough
Diagram 1: Islands of storage across a digital broadcast workflow
XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,
Cite My Data M2M Service Technical Description
Cite My Data M2M Service Technical Description 1 Introduction... 2 2 How Does it Work?... 2 2.1 Integration with the Global DOI System... 2 2.2 Minting DOIs... 2 2.3 DOI Resolution... 3 3 Cite My Data
NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons
The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,
Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future
Big Data at ECMWF Providing access to multi-petabyte datasets Past, present and future Baudouin Raoult Principal Software Strategist ECMWF Slide 1 ECMWF An independent intergovernmental organisation established
Benchmarking OPeNDAP services for modern ESM data workloads
Benchmarking OPeNDAP services for modern ESM data workloads Stephen Pascoe ([email protected]) Richard Wilkinson (Tessella plc. Abingdon, UK) Phil Kershaw ([email protected]) 1 BADC: British
The Ophidia framework: toward cloud- based big data analy;cs for escience Sandro Fiore, Giovanni Aloisio, Ian Foster, Dean Williams
The Ophidia framework: toward cloud- based big data analy;cs for escience Sandro Fiore, Giovanni Aloisio, Ian Foster, Dean Williams Sandro Fiore, Ph.D. CMCC Scientific Computing and Operations Division
met.no AeroCom tools for HTAP Michael Schulz, Jan Griesfeller MetNo
met.no AeroCom tools for HTAP Michael Schulz, Jan Griesfeller MetNo cf,wcs: Snehal Waychal, Martin Schultz, Michael Decker FZJulich Geneve 20.03.13 acknowledgment: MetNo, EU service contract Aerocom Tools
DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group
DSS High performance storage pools for LHC Łukasz Janyst on behalf of the CERN IT-DSS group CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Introduction The goal of EOS is to provide a
Long term retention and archiving the challenges and the solution
Long term retention and archiving the challenges and the solution NAME: Yoel Ben-Ari TITLE: VP Business Development, GH Israel 1 Archive Before Backup EMC recommended practice 2 1 Backup/recovery process
Cloud computing in the Enterprise: An Overview
Systems & Technology Group Cloud computing in the Enterprise: An Overview v Andrea Greggo Cloud Computing Initiative Leader, System z Market Strategy What is cloud computing? A user experience and a business
NASA s Big Data Challenges in Climate Science
NASA s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014 1 2 7-km GEOS-5 Nature Run
Primary author: Kaspar, Frank (DWD - Deutscher Wetterdienst), [email protected]
Primary author: Kaspar, Frank (DWD - Deutscher Wetterdienst), [email protected] Co-authors: Johannes Behrendt (DWD - Deutscher Wetterdienst), Klaus-Jürgen Schreiber (DWD - Deutscher Wetterdienst) Abstract
Integrating a Multi-tiered Deduplication Approach to Simplify Enterprise-wide Backup & Recovery
Integrating a Multi-tiered Deduplication Approach to Simplify Enterprise-wide Backup & Recovery Travis Melo IT Manager, EMC IT EMC IT At a Glance User Profiles 48,000 internal users IT Environment 400,000+
SAP IT Infrastructure Management
SAP IT Infrastructure Management Legal Disclaimer This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined
A. Document repository services for EU policy support
A. Document repository services for EU policy support 1. CONTEXT Type of Action Type of Activity Service in charge Associated Services Project Reusable generic tools DG DIGIT Policy DGs (e.g. FP7 DGs,
Local Loading. The OCUL, Scholars Portal, and Publisher Relationship
Local Loading Scholars)Portal)has)successfully)maintained)relationships)with)publishers)for)over)a)decade)and)continues) to)attract)new)publishers)that)recognize)both)the)competitive)advantage)of)perpetual)access)through)
Data Protection as Part of Your Cloud Journey
Data Protection as Part of Your Cloud Journey Jim Vanek DPAD Area Manager IL / WI EMC Data Protection & Availability Division October 23, 2014 Copyright 2014 EMC Corporation. All rights reserved. 1 Setting
OpenStack Private Cloud Hosting in an Tier 3 Data Centre. G-Cloud Lot 1 IaaS
OpenStack Private Cloud Hosting in an Tier 3 Data Centre This is service provides a dedicated private cloud environment built on the open source technology, OpenStack. This is service provides a dedicated
How To Write An Nccwsc/Csc Data Management Plan
Guidance and Requirements for NCCWSC/CSC Plans (Required for NCCWSC and CSC Proposals and Funded Projects) Prepared by the CSC/NCCWSC Working Group Emily Fort, Data and IT Manager for the National Climate
Cost effective methods of test environment management. Prabhu Meruga Director - Solution Engineering 16 th July SCQAA Irvine, CA
Cost effective methods of test environment management Prabhu Meruga Director - Solution Engineering 16 th July SCQAA Irvine, CA 2013 Agenda Basic complexity Dynamic needs for test environments Traditional
Working with the British Library and DataCite A guide for Higher Education Institutions in the UK
Working with the British Library and DataCite A guide for Higher Education Institutions in the UK Contents About this guide This booklet is intended as an introduction to the DataCite service that UK organisations
Managed Cloud Services
Managed Services From Data Centre to Managed Public Traditional data centre Virtual Data Centre In-house Dedicated External Multi-tenant External Managed Public Consulting approach: Breakdown of Business
DRIVER Providing value-added services on top of Open Access institutional repositories
DRIVER Providing value-added services on top of Open Access institutional repositories Dr Dale Peters Scientific Technical Manager : DRIVER SUB Goettingen Germany Gaining the momentum: Open Access and
Data Management in Science and the Legacy of the International Polar Year
Data Management in Science and the Legacy of the International Polar Year Ellsworth LeDrew Mark Parsons Taco de Bruin Peter Yoon Christine Barnard Scott Tomlinson Warwick Vincent http://www.earthzine.org/2008/03/27/securing-the-legacy-of-ipy/
WMO Climate Database Management System Evaluation Criteria
ANNEX 8 WMO Climate Database Management System Evaluation Criteria System Name: Version: Contributing Country: Contact Information Contact Person: Telephone: FAX: Email address: Postal address: Date: General
A Service for Data-Intensive Computations on Virtual Clusters
A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández
DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING Carlos de Alfonso Andrés García Vicente Hernández 2 INDEX Introduction Our approach Platform design Storage Security
(Scale Out NAS System)
For Unlimited Capacity & Performance Clustered NAS System (Scale Out NAS System) Copyright 2010 by Netclips, Ltd. All rights reserved -0- 1 2 3 4 5 NAS Storage Trend Scale-Out NAS Solution Scaleway Advantages
Service Guidelines. This document describes the key services and core policies underlying California Digital Library (CDL) s EZID Service.
http://ezid.cdlib.org Service Guidelines 1 Purpose This document describes the key services and core policies underlying (CDL) s EZID Service. 2 Introduction to EZID EZID (easy eye dee) is a service providing
Scala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
Oracle Data Protection Concepts
Oracle Data Protection Concepts Matthew Ellis Advisory Systems Engineer BRS Database Technologist, EMC Corporation Accelerating Transformation EMC Backup Recovery Systems Division 1 Agenda Market Conditions.
AIIM & ASSUREON AN ASSUREON BRIEF
SOLUTIONBRIEF AIIM & ASSUREON AN ASSUREON BRIEF AIIM (Association for Information and Image Management) is the global community of information professionals. Their mission is to help organizations thrive
