DUE GlobBiomass
D9 System Specification Document
Prepared for European Space Agency (ESA-ESRIN)
In response to ESRIN/Contract No. 4000113100/14/I_NB
Prepared by
GlobBiomass Page 2/20 Revision History Deliverable Work Package Due date Authors Distribution Reason for change Issue 1 Revision 30 WP4000 KO+12 Date 20.12.2015 Release 1 Version 01 Andreas Wiesmann, Maurizio Santoro, Oliver Cartus FSU: Christiane Schmullius, Evelin Matejka ESA: Frank Martin Seifert; Nathalie Boisard
Contents

1. Summary
2. Introduction
2.1. Purpose
2.2. Document Overview
2.3. Symbols and Acronyms
2.4. References
3. GlobBiomass Processing System Overview
3.1. Context
3.2. User Requirements
3.3. Main System requirements
3.4. Main functions
3.5. High Level Decomposition
3.6. Hardware Infrastructure
4. GlobBiomass Processing System Workflow and Operational Scenarios
4.1. Roles
4.2. User information and data access
4.3. Processing System Workflow
4.3.1. Pre-processing
4.3.2. Retrieval
4.3.3. Product Generation
4.3.4. Verification/Validation
4.4. Algorithm Improvement
5. Functional Design
5.1. Services
5.2. Processors
5.3. Concept for continuous improvement
5.4. System Documentation
6. Development, Life Cycle, Cost and Performance
6.1. Re-use and development
6.2. System life cycle drivers and considerations
6.3. Sizing and performance analysis
6.4. Cost estimation
7. Requirements Traceability
1. SUMMARY

This document is the System Specification Document of the ESA DUE GlobBiomass project. The recently released CEOS Strategy for Carbon Observations from Space, the response of the Committee on Earth Observation Satellites (CEOS) to the Carbon Strategy of the Group on Earth Observation (GEO), lists thirty challenges. The first listed challenge is to provide accurate measurements of forest canopy height and estimates of above-ground biomass. This prominent position reflects the importance of quantifying the Essential Climate Variable (ECV) Biomass and highlights the need to improve technical capabilities for monitoring the above-ground biomass of forests. The ESA DUE GlobBiomass project addresses this need, particularly in its core task, the development and prototyping of the processing system for the global biomass product.

This System Specification Document (SSD) describes the specifications of the GlobBiomass processing system (GBPS), aiming at an operational GBPS. The GBPS specifications are derived from the GlobBiomass documents, in particular the System Requirements Document (SRD), but also the Algorithm Theoretical Basis Document (ATBD) and the User Requirements Document (URD). The specifications are based on experience with the processors developed within the BIOMASAR project. They foresee a modular approach covering the sequential modules from data uptake to product verification, which results in the workflow specifications. Operational scenarios specify the roles relevant for the GBPS. The most relevant roles are the development team, which develops and implements the GBPS; external experts, who validate the products; and ESA, which has overall control. The functional design outlines the services and processors to be implemented. The most relevant are the processors for the retrieval of the AGB products and the product uncertainty.
Services at this stage mainly cover support for development: the software repository and the issue tracker. Development and documentation are important aspects of the functional design specifications. The aim is a processing system that is well documented, easy to maintain and easy to develop further. At this stage the processing time specifications are not critical and can be met with standard COTS servers. The data volume is also modest, about 20 TB for input and output in total.
2. INTRODUCTION

2.1. Purpose

This System Specification Document (SSD) defines the GBPS for the DUE GlobBiomass project and potentially for the time beyond. The system shall be developed within the project and used later for production. Biomass is one of the Essential Climate Variables (ECV) named in the recently released CEOS Strategy for Carbon Observations from Space, the response of the Committee on Earth Observation Satellites (CEOS) to the Carbon Strategy of the Group on Earth Observation (GEO). The system covers production for the GB contribution to the Climate Data Record (CDR) to be generated.

The SSD is a response to the GBPS Requirements Document (SRD) [AD-4] and is a deliverable of the ESA DUE GlobBiomass project as requested in the Statement of Work (SOW) [AD-1]. The system design is based on experience with a prototype developed within BIOMASAR and other projects (Santoro et al., 2011; Santoro et al., 2015). The degree of re-use from the prototype is one of the topics of this SSD.

2.2. Document Overview

After this formal introduction:

- Section 3 provides an overview of the GBPS, describing its purpose and intended use, its main requirements, its context, and its main functions and components.
- Section 4 describes the main operational scenarios and use cases of the system.
- Section 5 describes the functional design with components and interfaces.
- Section 6 is a collection of further analyses regarding re-use of components, system life cycle, and cost and performance.
- Section 7 traces the requirements from the System Requirements Document to this GBPS System Specification Document.

2.3. Symbols and Acronyms

AGB     Above Ground Biomass
ALOS    Advanced Land Observing Satellite
AR      Acceptance Review
ARR     Acceptance Review Report
ASAR    Advanced Synthetic Aperture Radar
ATBD    Algorithm Theoretical Basis Document
ATD     Acceptance Test Document
ATR     Acceptance Test Review
CCI-LC  Climate Change Initiative Land Cover
CDR     Critical Design Review
COTS    Commercial Off-The-Shelf software
DDF     Design Definition File
DDS     Demonstration Data Set
DJF     Design Justification File
DUE     Data User Element of ESA's Earth Observation Envelope Programme
ECSS    European Cooperation for Space Standardization
ECV     Essential Climate Variable
Envisat Environmental Satellite
EOEP    Earth Observation Envelope Programme
EO      Earth Observation
FAO     Food and Agriculture Organization
GLAS    Geoscience Laser Altimeter System
GSV     Growing stock volume
ICESAT  Ice, Cloud, and land Elevation Satellite
MB      Megabytes
MODIS   Moderate Resolution Imaging Spectroradiometer
PALSAR  Phased-array L-band type Synthetic Aperture Radar
SOW     Statement of Work
VCF     Vegetation Continuous Fields

2.4. References

The following documents are applicable to this document:

[AD-1] EOEP-4 Data User Element GlobBiomass Statement of Work, 30.4.2014
[AD-2] Data User Element GlobBiomass Algorithm Theoretical Basis Document (ATBD), preliminary version
[AD-3] Data User Element GlobBiomass User Requirements Document (URD), 14.8.2015
[AD-4] Data User Element GlobBiomass System Requirements Document (SRD), 14.12.2015

The following documents are referenced in this document:

Cartus, O., Santoro, M. and Kellndorfer, J. (2012), Mapping forest aboveground biomass in the Northeastern United States with ALOS PALSAR dual-polarization L-band. Remote Sensing of Environment, 124, 466-478.

Santoro, M., Beer, C., Cartus, O., Schmullius, C., Shvidenko, A., McCallum, I., Wegmüller, U. and Wiesmann, A. (2011), Retrieval of growing stock volume in boreal forest using hyper-temporal series of Envisat ASAR ScanSAR backscatter measurements. Remote Sensing of Environment, 115, 490-507.

Santoro, M., Beaudoin, A., Beer, C., Cartus, O., Fransson, J. E. S., Hall, R. J., Pathe, C., Schepaschenko, D., Schmullius, C., Shvidenko, A., Thurner, M. and Wegmüller, U. (2015), Forest growing stock volume
of the northern hemisphere: spatially explicit estimates for 2010 derived from Envisat ASAR data. Remote Sensing of Environment, 168, 316-334.

3. GLOBBIOMASS PROCESSING SYSTEM OVERVIEW

This section highlights only the most important information concerning the GBPS. For a detailed description, see Section 3 of the SRD.

3.1. Context

Since no EO observable is directly related to biomass, remote sensing of biomass has developed a wide range of approaches to extract from the existing data what is believed to be the best possible result. The ATBD [AD-2] outlines the general global mapping approach, as shown in Figure 1. The GlobBiomass retrieval approach shall follow a multi-sensor strategy that makes the best possible use of the biomass information contained in each of the input and training datasets.

Figure 1: General overview of the global mapping approach [AD-2].

3.2. User Requirements

The input requirements for this document stem from:

- the Statement of Work (SOW) [AD-1]
- the GlobBiomass project documents URD [AD-3] and ATBD [AD-2]

Different user groups with different needs were identified:
- Science
- Policy
- Forest management

The summary of the requirements is presented in the SRD [AD-4].

3.3. Main System requirements

The GB SRD lists about sixty functional, operational and performance requirements for the GBPS. Some high level requirements, and some performance and sizing requirements, have an impact on the system design. The high level requirements are to generate the GB products (GB-SR-0010, GB-SR-0100), to implement the GBPS workflows and scenarios (GB-SR-0400), to ensure the stability of the outputs (GB-SR-0200), and to be reactive to improvements (GB-SR-0300). This SSD foresees proper versioning as a measure for stability, and support for future improvements. The quantitative requirements concern performance and sizing more than reliability and security, because the main scenario is reprocessing. Good overall performance is more important than high reliability in the short term.

3.4. Main functions

The GBPS performs pre-processing, retrieval, product generation, and validation/verification. The transparent upgrade of processors is important. To fulfil its purpose in this context the GBPS provides three high level functions:

- production in a broader sense
- product verification
- processor migration

3.5. High Level Decomposition

The GBPS has only one subsystem, the processing environment (production and development subsystem). A subsystem consists of functional components. The production control, the processing storage and the processors provide the basic infrastructure of the processing environment. A test environment with read access to all data, and the option to use the production infrastructure for bulk tests, serves the development needs.
3.6. Hardware Infrastructure

The hardware used within the GB project consists of COTS computers with multiple cores and large hardware RAIDs to host the data. The OS is Ubuntu Linux in LTS versions. The hardware is not performance critical and is mainly used for the development and processing foreseen in the GB project. However, the software development is generic and shall also be able to take advantage of more powerful environments.

4. GLOBBIOMASS PROCESSING SYSTEM WORKFLOW AND OPERATIONAL SCENARIOS

4.1. Roles

The development team consists of scientists, operators and system integrators. Together they manage the production and the continuous development of the GBPS. The following roles are identified:

- The development team with scientists, operators, and system integrators. The group has a mandate to push the GBPS forward and to decide, in agreement with ESA, which requirements to analyse and implement and which algorithms to test.
- External experts for GB validation.
- ESA, which supervises the project and decides on its overall direction.

4.2. User information and data access

At this stage no user interaction with the GBPS is foreseen.

4.3. Processing System Workflow

A thorough presentation of the Processing System Workflow is given in Section 5 of the SRD. Here we summarize the core elements. The high level workflow is shown in Figure 2. It highlights the main modules of the workflow: preprocessing, retrieval, product generation, validation and verification. These are discussed in more detail below.
Figure 2: High level outline of the GBPS processing workflow (modules: Preprocessing, Retrieval, Verification, Product Generation, Validation).

4.3.1. Pre-processing

The preprocessing covers all steps from SAR data intake to orthorectified, calibrated backscatter images. This includes filtering, preparation of the auxiliary data needed for the processing, gridding to a common reference grid, etc.

4.3.2. Retrieval

In the retrieval module the biomass and the related uncertainties are estimated. The retrieval is arranged as a cascade with three sequential stages (Figure 3, taken from the draft ATBD):

- A global dataset of GSV, referred to as the "biomass indicator", is derived from the hyper-temporal dataset of ASAR backscatter images with the BIOMASAR algorithm (Santoro et al., 2011; Santoro et al., 2015). This is the BIOMASAR-C processor.
- The retrieval of AGB follows the final version of the ATBD, to be released in early 2016, as required by the Invitation to Tender, i.e. a global map of forest based on high-resolution information available from global mosaics of ALOS PALSAR dual-polarization backscatter provided by JAXA.
Figure 3: Flowchart of the GlobBiomass global biomass retrieval algorithm (draft ATBD).

4.3.3. Product Generation

The product generation process includes the conversion of GSV to above ground biomass and the conversion of the maps to the product format specifications. All AGB data products will be in GeoTIFF format.

4.3.4. Verification/Validation

Verification of the system is done during all steps above by comparing intermediate and final results obtained on local machines in a controlled environment (benchmark data) with those of the same version of the software on the GBPS.

Data validation is done by external experts. However, the GBPS should perform consistency checks on the products before their release to the experts. For the consistency checks, comparisons against in situ observations of AGB and raster datasets of AGB are foreseen. A database of such observations has been set up on a data exchange platform hosted by the University of Jena. It is administered by GAMMA with the aid of Excel sheets listing the available datasets.
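The verification against benchmark data described above could be sketched as follows. This is a minimal illustration in Python and not part of the GBPS: the function names, the directory layout and the use of SHA-256 checksums are assumptions. A bitwise comparison is the strictest possible check; results computed on different machines may require a tolerance-based comparison instead.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(benchmark_dir, gbps_dir, pattern="*.tif"):
    """Compare GBPS outputs with benchmark files of the same name.

    Returns a dict mapping each benchmark file name to "match",
    "mismatch" or "missing". The directory names are hypothetical.
    """
    report = {}
    for reference in sorted(Path(benchmark_dir).glob(pattern)):
        candidate = Path(gbps_dir) / reference.name
        if not candidate.is_file():
            report[reference.name] = "missing"
        elif file_checksum(reference) == file_checksum(candidate):
            report[reference.name] = "match"
        else:
            report[reference.name] = "mismatch"
    return report
```

A call such as compare_outputs("benchmark/", "gbps_output/") would then list every benchmark product together with its status, which an operator could inspect before the products are passed on for validation.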
4.4. Algorithm Improvement

The development team decides about features or processes to be improved in order to meet user requirements. The development team implements the improvements as new versions of processors. Then the team tests and validates the new version. Finally the development team decides if the new version is to be released.
5. FUNCTIONAL DESIGN

5.1. Services

Most of the necessary services, such as the data exchange storage, the document management system and user interaction, are provided through the GB project. For the GBPS development, a software repository and an issue tracker are hosted at GAMMA as part of the GBPS (Figure 4, https://redmine.gamma-rs.ch/projects/globbiomass).

Figure 4: GlobBiomass instance in Redmine at GAMMA Remote Sensing. The Subversion repository can also be accessed from within Redmine.

Processor software repository

An important element of the modern software development process is source control (or version control). Cooperating developers commit their changes incrementally to a common source repository, which allows them to collaborate on code without resorting to crude file-sharing techniques (shared drives, email). Source control tools track all prior versions of all files, allowing developers to "time travel" backward and forward in their software to determine when and where bugs were introduced. These tools also identify conflicting simultaneous modifications made by two (poorly communicating) team members, forcing them to work out the correct solution rather than blindly overwriting one or the other submission. For the GBPS development, Subversion is used as the version control tool (https://subversion.apache.org/).

The software repository contains the actual processing code and all prior versions. Write access to the processor repository is restricted to the development team. As all software changes are committed directly to the repository, they are published almost immediately and are available for review. The software of the GBPS and the processing
algorithm code are under configuration control. In addition to the raw source code, the repository should also contain the information required to build the software from the source code.

Issue Tracker

During software development a Redmine issue tracker is used (http://www.redmine.org/). Redmine is a flexible project management web application written using the Ruby on Rails framework. Redmine integrates the version control system into its user interface and manages access control to the version control system, resulting in a state-of-the-art FOSS software development environment.

5.2. Processors

The GBPS processors are developed based on the prototype BIOMASAR processors, mainly BIOMASAR-C and BIOMASAR-L. Processors for the preprocessing are not necessary at the moment, as all data are already preprocessed and available. The uptake of the intermediate data is done using shell scripts. The prototype processors for the retrieval are available in Matlab. The code will be ported from Matlab to Octave. GNU Octave is software featuring a high-level programming language, primarily intended for numerical computations. It provides a command-line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments, using a language that is mostly compatible with MATLAB (https://gnu.org/software/octave/). It may also be used as a batch-oriented language. It is part of the GNU Project and is FOSS (GNU General Public License). Octave allows the code to be run in parallel and decentralized, if necessary in the future, without license restrictions.

Product generation and verification are implemented from scratch. It is foreseen to use shell scripts and the GDAL tools.

5.3. Concept for continuous improvement

Processors (processor bundles including software and configuration) are under configuration control in the GBPS.
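As an illustration of what a processor bundle under configuration control might record, the following minimal Python sketch groups a processor name, the repository revision of its code and its configuration. The class, its field names and the label format are hypothetical and are not taken from the GBPS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessorBundle:
    """Hypothetical description of one processor bundle."""

    name: str                 # processor name, e.g. "BIOMASAR-C"
    repository_revision: int  # revision of the code in the version control system
    configuration: dict = field(default_factory=dict)  # processor settings

    def label(self):
        """Return a label that could be stored with each generated product."""
        return f"{self.name}-r{self.repository_revision}"
```

Storing such a label with every product would make it possible to tell which processor version generated it; the revision number and settings used in any example call are invented for illustration.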
The processor information includes the release version in the version control system that hosts the code.

5.4. System Documentation

The system documentation comprises manuals, specifications and reports. It is supplemented by more general GB project documentation such as product specifications, algorithm definitions, and validation reports. In detail, the system documentation comprises requirement documents, design and interface control documents, test documents, manuals, and maintenance information:

- The GB SRD and SSD define the requirements and the design of the system.
- An operations manual includes an instruction part with step-by-step descriptions of different use cases, and a reference part describing the functions and functional components, their capabilities, and how to use them.
- An installation and administration manual describes the initial setup and configuration of the GBPS.
- The processor integration guide describes the most important internal interface of the system.
- The system verification documents define a set of tests and report their results for the versions of the system that have been provided.
- The software release notes describe valid combinations of versions of components and software packages, identify the corresponding documentation, and identify the versions currently in use.
- The issue tracking system documents, among others, system issues and their status.

6. DEVELOPMENT, LIFE CYCLE, COST AND PERFORMANCE

6.1. Re-use and development

The GBPS re-uses components from the prototype, and configures and adapts them. Matlab code is converted to Octave to increase portability.

Software | Role | Adaptation, Integration and Configuration
Preprocessor shell scripts | Preprocessor Module | Adapt to GBPS architecture
Retrieval Matlab code, C-band processor (BIOMASAR-C) | Retrieval Module | Convert to Octave
Retrieval Matlab code, L-band processor (BIOMASAR-L) | Retrieval Module | Convert to Octave
Product generation scripts | Product Generation Module | Adapt to GBPS architecture
Redmine | Issue tracker | Used for development
Subversion | Software and document repository | Used for development

6.2. System life cycle drivers and considerations

Starting with the prototype migrated to the target platform, the GBPS is operated and incrementally developed through functional extensions and improved algorithms. In the future, new input datasets will also have to be considered, along with a stepwise extension of the hardware infrastructure. As the project itself is on a short timescale and aims at an initial production of a global AGB map, the development
is very likely to continue after the project period. However, the development should be designed in a sustainable way, so that it is open for future development and enhancement.

6.3. Sizing and performance analysis

From the prototype implementations a data storage and processing load budget can be made:

Dataset | Size | Use in GlobBiomass
Envisat ASAR, backscattered intensity (2005-2012) | 15 TB | Input, retrieval of GSV at 1 km spatial resolution
ALOS PALSAR mosaics of backscattered intensity (2007-2010) | 6 TB | Input, retrieval of GSV at high spatial resolution
Landsat reflectances | 730 GB | Input, retrieval of GSV at high spatial resolution
MODIS Vegetation Continuous Fields | 4 GB | Auxiliary dataset, support to the training of the model inverting ASAR data to retrieve GSV
Landsat canopy density and density change | 90 GB | Auxiliary dataset, support to the training of the model inverting high resolution data to estimate GSV
ICESAT GLAS | 64 GB | Auxiliary dataset, support to the training of the models inverting EO data to estimate GSV
CCI Land Cover | 4 GB | Auxiliary dataset, support to the training of the model inverting EO data to retrieve GSV
ERA Interim | 11 GB | Auxiliary dataset, support to the training of the model inverting ASAR data to retrieve GSV
FAO Global Ecological Zones (resampled to 0.01° pixel size) | 4 GB | Auxiliary dataset, support to the training of the model inverting ASAR data to retrieve GSV
Map of forest GSV, pixel size 0.01°, and related uncertainties | 8 GB | Output of stage 1 and input of stage 2
Map of forest AGB, pixel size 0.000444°, and related uncertainties | 16 GB | Output of stage 2 and input of stage 2

The retrieval of GSV with C-band data is by far the most time consuming approach and calls for strong parallelization of the processing. The processing works on a tile-by-tile basis and, in each tile, the model training is applied to each image and at each pixel. For each pixel, backscatter values are extracted from a window centred at the pixel, the size of which is adaptive.
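The adaptive window can be illustrated with a minimal Python sketch. It assumes a square window that grows by one pixel on each side per step and stops once a minimum number of valid backscatter values has been collected or a maximum size is reached. The function name, the use of None to mark invalid pixels and the parameters are illustrative assumptions; the ATBD remains the reference for the actual rule.

```python
def adaptive_window_values(image, row, col, min_valid, max_half_width):
    """Collect valid values around (row, col) from a growing square window.

    image is a list of rows; invalid pixels are marked with None.
    Returns (values, half_width). If max_half_width is reached before
    min_valid values are found, the values found so far are returned.
    """
    n_rows, n_cols = len(image), len(image[0])
    values = []
    half_width = 0
    for half_width in range(max_half_width + 1):
        # Clip the window to the image borders.
        r0, r1 = max(0, row - half_width), min(n_rows, row + half_width + 1)
        c0, c1 = max(0, col - half_width), min(n_cols, col + half_width + 1)
        values = [
            image[r][c]
            for r in range(r0, r1)
            for c in range(c0, c1)
            if image[r][c] is not None
        ]
        if len(values) >= min_valid:
            break  # enough valid samples, stop growing the window
    return values, half_width
```

In this sketch a homogeneous neighbourhood reaches the threshold with a small window, while a fragmented one keeps growing, which is why heterogeneous areas cost more processing time.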
The size of the window depends on a threshold on the minimum number of valid backscatter values used to generate the estimates of the model parameters. As long as this requirement is not met, the window size is increased or, alternatively, other requirements are considered. See the ATBD for more details. The adaptivity of the window size depends strongly on the heterogeneity of the forest cover. The estimation of the model parameters for all valid backscatter values in a tiled image can take a few seconds for a rather homogeneous distribution of forest cover. It takes up to 10-20 seconds in very heterogeneous forest cover. Assuming that a tile contains 200-300 tiled images of the ASAR backscatter, the estimation of the model parameters can take from 5-6 minutes up to 100 minutes. The estimation of GSV once the model is trained is straightforward (less than 1 second per image and
per tile). The multi-temporal combination requires an additional 150 seconds (on average) per tile. The same figures apply to the computation of the uncertainties. These values apply to a single machine with an Intel Core i7-2600k CPU @ 3.40 GHz, 16 GB of RAM and a 64-bit architecture. The performance of the algorithm has also been benchmarked on less powerful machines, indicating a significant loss of processing speed during the model training phase.

The retrieval with L-band data is less time consuming since it does not adapt at the pixel level, nor does it have a multi-temporal combination. Again depending on the heterogeneity of the forest landscape, the processing time of the model training phase varies between 1 and 4 minutes for a single 1° × 1° tile. The retrieval of GSV / AGB once the forest backscatter model is trained is straightforward. The processing times of BIOMASAR-L and Cubist do not differ significantly. Given that the CESBIO method has only been assessed regionally, figures cannot be presented at this stage.

6.4. Cost estimation

Not applicable at this stage.

7. REQUIREMENTS TRACEABILITY

ID | Title | Reference Section in SSD
GB-SR-0010 | Generate and publish global above ground biomass maps | 3.1
GB-SR-0020 | Standalone system | 3.1
GB-SR-0030 | Generate the outputs required in URD | 3.2
GB-SR-0100 | Implement the processing workflows and the scenarios developed within DUE GlobBiomass | 4, 4.3
GB-SR-0200 | Stability of the processing chain for re-processing and its continuous extension in order to support consistency of the output dataset. | 5.3, 6.1
GB-SR-0300 | Re-producible output. | 4.3
GB-SR-0400 | Implementation of improved algorithms. Frequent reprocessing shall be possible in a TBD period. | 5.3, 6.1
GB-SR-0500 | There shall be an operational procedure to perform the necessary validation and other steps to perform the transfer to operations. | 4.3.4, 5.4
GB-SR-1010 | Production according to the processing workflows. | 3.4
GB-SR-1020 | Processors | 4.3, 5.2
GB-SR-1021 | Processors for pre-processing. | 4.3.1, 5.2
GB-SR-1022 | Processors for retrieval | 4.3.2, 5.2
GB-SR-1023 | Processors for product generation | 4.3.3, 5.2
GB-SR-1024 | Processors for verification. | 4.3.4, 5.2
GB-SR-1030 | The GBPS shall provide means to ensure the quality of its outputs. | 4.4
GB-SR-1040 | The GBPS shall provide means to do quality checks and manual inspection of intermediate results during processing. | 4.3.4, 4.4
GB-SR-1110 | Read Envisat and Palsar Mosaic data and prepare the data for injection into the Retrieval module. | 4.3.1
GB-SR-1210 | Produce growing stock volume maps | 3.2, 3.3
GB-SR-1220 | C-band module capable to derive growing stock volume from C-band SAR data | 4.3.2
GB-SR-1230 | L-band module capable to derive growing stock volume from L-band SAR data | 4.3.2
GB-SR-1310 | Converter to derive AGB from GSV. | 4.3.2
GB-SR-1320 | Produce standardized products in Geotiff format. | 4.3.3
GB-SR-1410 | Support the validation | 4.3.4
GB-SR-1420 | Validate land cover products automatically | 4.3.4
GB-SR-1430 | Support different validation datasets | 4.3.4
GB-SR-1510 | Allow to keep several versions of outputs at the same time. | 4.3.3, 5.4
GB-SR-1610 | Provide subsets of the global product and products in different resolutions and projections | 6.1
GB-SR-2010 | Optimised for repeated re-generation of parts or the complete dataset | 3.3
GB-SR-2020 | Production of a global dataset within TBD days. | 3.3
GB-SR-2030 | Prepare for massive parallel processing. | 3.6, 6.1
GB-SR-2040 | Scalable by additional hardware | 6.1
GB-SR-3010 | The GBPS shall allow for a collaborative but distributed operation. | 6.1
GB-SR-3020 | The GBPS shall ingest and preprocess Envisat ASAR data and JAXA ALOS Mosaic | 4.3.1
GB-SR-4010 | The GBPS supports the detection and handling of failures. | 4.3.4
GB-SR-4020 | The GBPS supports the transfer of a processor version to operations | 6.1
GB-SR-4030 | The used version of the processor shall be transparent | 6.1
GB-SR-4040 | The GBPS shall support re-processing of parts or all of the dataset | 3.5, 5.2
GB-SR-5010 | Maintain its software components under configuration control. | 6.1
GB-SR-5020 | Follow a modular approach in its design | 3.3, 3.5
GB-SR-5030 | Support continuous improvement of methods and inputs | 6.1, 6.2
Gumligen (Switzerland), 20.12.2015 Dr. Andreas Wiesmann