NASA's Earth Observing System Data and Information System (EOSDIS)
Copernicus Big Data Workshop, 13 March 2014, Brussels, Belgium
Kevin Murphy, EOSDIS System Architect, NASA Goddard Space Flight Center
Topics to be Covered
  Overview of NASA's Earth Observing System Data and Information System (EOSDIS)
  Architecture and Capabilities
  NASA International Collaboration
  Sentinel Mirror Site to Support U.S. Users
National Aeronautics and Space Administration
[Figure: EOSDIS end-to-end data flow, from mission operations to data access]
  Operations areas: Mission Operations; Data Acquisition; Earth Science Data Operations; Science Operations
  Stages: Flight Operations, Data Capture, Initial Processing, Backup Archive; Data Transport to Data Centers/SIPSs; Science Data Processing, Data Management, Interoperable Data, Archive, and Distribution; Distribution and Data Access
  Space and ground elements: Tracking and Data Relay Satellite (TDRS); EOS Spacecraft; Direct Broadcast (DB); Direct Broadcast/Direct Readout Stations; White Sands Complex (WSC); EOS Polar Ground Stations; EOS Data Operations System (EDOS) Data Processing; EOS Operations Center (EOC) Mission Control
  Network: NASA Integrated Services Network (NISN) Mission Services
  Processing and archive: EOSDIS Data Centers; Instrument Teams and Science Investigator-led Processing Systems (SIPSs); Infrastructure (Search, Order, Distribution)
  Users: Research; Education; Value-Added Providers; Interagency Data Centers; Earth System Models; International Partners; Decision Support Systems
NASA's Earth Observing System Data and Information System
  Earth science data are held at Distributed Active Archive Centers (DAACs) to provide knowledgeable curation and science-discipline-based support
  NASA provides high-bandwidth network connectivity to support production data flows and community access to data
  NASA develops tools for users to obtain needed data and information while minimizing the burden associated with unwanted data
  NASA engages with multiple US agency efforts to facilitate use of data by the broadest possible community with minimal effort and maximal consistency with other data sources
[Figure: NASA's DAACs]
Key EOSDIS Metrics
  Unique Data Products (Collections): 6,861
  Distinct Data and Service Users per Year: 1.7 M
  Average Daily Archive Growth: 8.5 TB/day
  Total Archive Volume: 9.8 PB
  End User Products Distributed per Year: 839 M
  End User Average Daily Distribution Volume: 22 TB/day
EOSDIS as a Seamless, Efficient, User-Driven System
  Present NASA's EOSDIS as an interoperable system of systems where users can select, view, interact with, and download the data they need transparently from all subsystems, in support of interdisciplinary Earth science research.
  Supplement current data system capabilities with new interoperable technologies to create a foundation for future evolution.
  Support technology infusion of tools developed by internal programs and by industry.
  EOSDIS capabilities and feature development are prioritized through DAAC User Working Groups (science experts) and input from community data system programs.
[Figure: evolution roadmap with stages Today, Tomorrow, Working Towards; labels: Data, Metadata as a Service, Imagery as a Service, Data as a Service, Partnerships, commercial clouds]
Central Reusable Capabilities
Earthdata: the EOSDIS website (http://earthdata.nasa.gov)
Metadata Services
  ECHO: searchable catalog of granule metadata for NASA datasets (OpenSearch, CSW, OGC interfaces)
  Global Change Master Directory (GCMD): searchable catalog of over 26,000 NASA and international dataset collections
User Tools (e.g.)
  Reverb: search and order tool
  Global Imagery Browse Services (GIBS): delivers full-resolution imagery derived from NASA products in a standardized manner to any web-connected client (open sourced)
  Worldview: highly responsive interface to explore GIBS imagery and download the underlying data
  Giovanni: Quick-Start Exploratory Data Analysis
Metrics System (EMS): collects and reports on data ingest, production, archive, and distribution across all EOSDIS data centers
User Registration System: provides a centralized mechanism for user registration and account management for all EOSDIS systems
[Figure: component diagram with labels LANCE, DAAC, SIPS, ECHO, GCMD, EMS, GIBS, Earthdata Web Infrast., Web Applications & Services]
Examples of Relevant DAAC Capabilities
Sentinel 1 (ASF DAAC)
  ASF DAAC offers a variety of SAR-derived higher-level products via easy-to-use Web interfaces
  MapReady Toolkit
Sentinel 3 (ODPS, LAADS, LPDAAC, )
  SeaDAS: science processing from Level-1B through Level-3 with a host of NASA standard and alternative ocean product algorithms (including source code). Also data product analysis and visualization based on ESA's BEAM tool.
  Match-up to field data: the Level-1/Level-2 browser identifies all data granules for which coincident field data exist in the NASA SeaBASS in situ bio-optical archive, and provides the data as a unified order.
  In situ SeaBASS archive for product validation and the NOMAD database for bio-optical algorithm development
Sentinel 5p
  Giovanni
  Subsetting and reformatting
EOSDIS Networks
  Provide end-to-end network connectivity between users and geographically distributed EOSDIS data centers.
  Globally connected to serve the diverse needs of NASA's worldwide science and research community.
Network Operations
  Monitor and operate all aspects of the network to meet the required level of service
  Develop tools to identify and resolve bandwidth and connectivity issues
  Plan for future bandwidth needs
Examples of International Collaborations
European Space Agency (ESA): ESA/NASA Bilateral MERIS-MODIS/SeaWiFS Data Exchange
  NASA provided the entire Aqua MODIS and SeaWiFS Level 1A data archives to ESA in exchange for MERIS Reduced Resolution and Full Resolution data (for redistribution to the NASA user community)
  First-time non-commercial third-party data distributor for ESA
Japan Aerospace Exploration Agency (JAXA)
  Under an agreement between NASA and JAXA, the NASA ASF DAAC began acquiring PALSAR data from ALOS in 2010
  MapReady tool modified by ASF DAAC to support ALOS PALSAR data
  ASF DAAC integrated the JAXA processor into the DAAC processing stream
  The ASF DAAC currently archives over 1 petabyte of ALOS PALSAR data and has distributed 1.6 million scenes.
Canadian Space Agency (CSA)
  NASA ASF DAAC provides mission planning, downlink, and data distribution for RADARSAT-1
[Figure: Long-term record of Chla averaged over the global deep-water region. Shown is the monthly anomaly after subtraction of the monthly climatological mean for consistently processed SeaWiFS (black), MERIS (brown), MODIS-Aqua (blue), and NASA NPP VIIRS evaluation product (red). The gray region is the estimated uncertainty.]
DHuS Plugs into EOSDIS
[Figure: EOSDIS data flow diagram with the Sentinel Mirror System added]
  Stages: Copernicus Services; Data Transport; NASA Science Operations; Science Data Processing, Data Management, Interoperable Data, Archive, and Distribution; Distribution and Data Access
  Elements: Sentinel Mirror System; EOSDIS Data Centers; Infrastructure (Search, Order, Distribution)
  Network: NASA Integrated Services Network (NISN) Mission Services
  Users: Research; Education; Value-Added Providers; Interagency Data Centers; Earth System Models; International Partners; Decision Support Systems
NASA Sentinel Mirror: Leveraging Existing EOSDIS Capabilities
  For Sentinel 1, 3, and 5P, NASA will leverage proven mirroring and redistribution capabilities currently used for S-NPP
  Single network interface relieves bandwidth load on European networks
  Long-term archival and end-user distribution by DAACs (e.g., Sentinel-1 by ASF DAAC)
  Provides metric reports back to the EC/ESA on product distribution and usage
  Leverages the entire suite of NASA's EOSDIS capabilities, including capturing and reporting metrics on distribution and usage of Sentinel products by U.S. scientists
Sentinel Mirror: Leveraging Experience with S-NPP
Current capabilities of NASA's S-NPP Science Data System:
  Server system dedicated to:
    Acquiring data from multiple locations
    Storing data temporarily (~30 days)
    Making data available to six data processing centers
  Ingests 6 TB daily
  Capable of distributing 2.5 times the ingest volume; routinely distributes 15 TB daily
  Products available to data processing centers within 30 minutes of receipt
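The S-NPP figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes) and a load spread evenly over 24 hours, neither of which the slide states, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the S-NPP figures quoted on the slide.
# Assumptions (not from the slide): 1 TB = 10**12 bytes, even 24-hour load.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12


def sustained_gbps(tb_per_day: float) -> float:
    """Average line rate, in gigabits per second, needed to move tb_per_day."""
    bits_per_day = tb_per_day * BYTES_PER_TB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def buffer_tb(ingest_tb_per_day: float, retention_days: int) -> float:
    """Disk needed to hold retention_days of ingest, ignoring any overhead."""
    return ingest_tb_per_day * retention_days


ingest = 6.0               # TB/day, from the slide
distribution_factor = 2.5  # from the slide
retention = 30             # days, approximate per the slide

distribution = ingest * distribution_factor
print(f"distribution capacity: {distribution:.1f} TB/day")
print(f"ingest rate:           {sustained_gbps(ingest):.2f} Gbps")
print(f"distribution rate:     {sustained_gbps(distribution):.2f} Gbps")
print(f"temporary storage:     {buffer_tb(ingest, retention):.0f} TB")
```

Under these assumptions, 2.5 times the 6 TB ingest gives the 15 TB/day quoted on the slide, which averages to roughly 1.4 Gbps sustained, and 30 days of ingest occupies about 180 TB of temporary storage. Real traffic is bursty, so peak requirements would be higher.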
Lessons Learned
Data Acquisition and Archives
  Pick the best protocol for high-bandwidth circuits over long distances. NASA's recent test results indicate the most effective protocols are:
    bbFTP (http://doc.in2p3.fr/bbftp/)
    GridFTP (http://toolkit.globus.org/toolkit/data/gridftp/)
  Monitor system health, and provide system failover, database failover, and replication to protect against data loss resulting from system faults
  Detect data gaps and automatically reacquire missing products
  Provide users with data gap reports and data archive status
  Provide data consumers with subscription capabilities and a manifest file to streamline data access
User Experience
  Data on spinning disk is essential for providing interactive services
  Quality, consistency, and flexibility of metadata services enable service-oriented architectures
  Open source software and standards are vital for interoperability
  Machine-accessible APIs
  Engage users early and often
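The gap-detection and manifest lessons can be illustrated with a small sketch. This is a hypothetical example, not the NASA implementation: the manifest layout (one filename and size in bytes per line), the file names, and the function names are all assumptions made for illustration.

```python
# Hypothetical sketch: compare a provider manifest with the local archive
# and list the products that need to be reacquired.
# Assumed manifest format: "<filename> <size_bytes>" per line, "#" for comments.

from typing import Dict, List


def parse_manifest(text: str) -> Dict[str, int]:
    """Map each product filename to its expected size in bytes."""
    expected: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, size = line.split()
        expected[name] = int(size)
    return expected


def find_gaps(expected: Dict[str, int], archived: Dict[str, int]) -> List[str]:
    """Return products that are missing or whose archived size differs."""
    return sorted(
        name for name, size in expected.items()
        if archived.get(name) != size
    )


manifest = """\
# filename size_bytes
granule_0001.h5 1048576
granule_0002.h5 2097152
granule_0003.h5 524288
"""

# Local archive state: 0002 never arrived, 0003 arrived truncated.
archive = {"granule_0001.h5": 1048576, "granule_0003.h5": 1000}

gaps = find_gaps(parse_manifest(manifest), archive)
print(gaps)  # ['granule_0002.h5', 'granule_0003.h5']
```

The returned list can drive both an automatic reacquisition queue and a user-facing gap report. A production system would more likely compare checksums than sizes; size is used here only to keep the sketch short.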
Conclusion
In partnership with the Copernicus program, NASA's EOSDIS is prepared to invest in capabilities to maximize Sentinel data utilization.
Backup Slides
NASA Sentinel Connectivity
[Figure: network diagram of NASA Sentinel connectivity]
  Nodes shown: ESA; GÉANT; Internet2; DC WIX; MAX (College Park, MD); Starlight (Chicago, IL); PNW (Seattle, WA); NASA GSFC (Greenbelt, MD) with the NASA Sentinel Reflector and EOS LAN; NASA Backbone (NISN); Commodity Internet; FedNets; EROS (Sioux Falls, SD); ASF (Fairbanks, AK)
  User groups: GSFC Users; Other NASA Users, e.g., LaRC, JPL; DAACs; Users at Other Federal Agencies; Universities; Other Users
  Link speed labels in the diagram: 622 Mbps, 1 g, 2.4 g, 10 g, 2 x 10 g, 100 g
Protocol Performance Test
  ESDIS Networks recently completed an extensive test of open source protocols: FTP, SFTP, FTPS, SCP, HTTP, HTTPS, bbFTP, GridFTP
  Test variables:
    Latency (low, medium, long)
    Packet loss (low, medium, high)
  Testing was completed in the lab using a delay and packet loss simulator
  Lab test results were validated by testing between GSFC and a NASA facility in Alaska
  The test results show that some protocols use high-bandwidth circuits more effectively over long distances:
    bbFTP
    GridFTP
  Links:
    http://doc.in2p3.fr/bbftp/
    http://en.wikipedia.org/wiki/gridftp
    http://toolkit.globus.org/toolkit/data/gridftp/
    http://toolkit.globus.org/toolkit/data/gridftp/downloads.html
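The slide does not say why bbFTP and GridFTP perform better. One commonly cited explanation, offered here as background and not as the test team's analysis, is that a single TCP stream is limited by round-trip time and packet loss, while both tools can open several parallel streams. The sketch below uses the well-known Mathis et al. approximation, rate <= (MSS / RTT) * (C / sqrt(p)) with C = sqrt(3/2). The RTT, loss rate, stream count, and link speed are made-up illustrative values, not results from the test.

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound for one TCP stream (Mathis et al. model), in Mbit/s."""
    c = math.sqrt(3.0 / 2.0)  # about 1.22
    bytes_per_s = (mss_bytes / rtt_s) * (c / math.sqrt(loss))
    return bytes_per_s * 8 / 1e6


def parallel_throughput_mbps(streams: int, mss_bytes: int, rtt_s: float,
                             loss: float, link_mbps: float) -> float:
    """Idealized aggregate of independent streams, capped by link capacity."""
    per_stream = mathis_throughput_mbps(mss_bytes, rtt_s, loss)
    return min(link_mbps, streams * per_stream)


def bandwidth_delay_product_mb(link_mbps: float, rtt_s: float) -> float:
    """Data that must be in flight to fill the link, in megabytes."""
    return link_mbps * 1e6 * rtt_s / 8 / 1e6


# Illustrative values only (assumptions, not measurements).
MSS = 1460      # bytes, typical Ethernet payload
RTT = 0.100     # seconds, a long-distance path
LOSS = 1e-4     # 0.01 % packet loss
LINK = 10_000   # Mbit/s, a 10 gbps circuit

single = mathis_throughput_mbps(MSS, RTT, LOSS)
eight = parallel_throughput_mbps(8, MSS, RTT, LOSS, LINK)
print(f"one stream:    {single:.1f} Mbit/s")
print(f"eight streams: {eight:.1f} Mbit/s")
print(f"BDP:           {bandwidth_delay_product_mb(LINK, RTT):.0f} MB in flight")
```

With these illustrative inputs a single stream tops out at roughly 14 Mbit/s on a 10 gbps circuit, and the model scales linearly with stream count until the link is full. Real transfers also depend on window sizes, congestion control, and disk speed, so this is an intuition aid only.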
Exploratory Analysis of Remote Sensing Data with Giovanni*
[Figure: workflow diagram comparing exploratory data analysis steps with the steps Giovanni performs, plus a linked interactive scatterplot and map]
  Exploratory Data Analysis: Find; Download; Learn format; Write Code; Read; Subset; Quality Filter; Summarize / Analyze; Visualize
  Giovanni: Read; Extract Variables; Subset; Filter Quality; Reformat; Regrid; Visualize; Explore
  Main Analysis Phase: Select Data; Analyze; Derive Conclusions; Publish
Giovanni provides Quick-Start Exploratory Data Analysis: no coding necessary
*Geospatial Interactive Online Visualization and Analysis Infrastructure: http://giovanni.gsfc.nasa.gov
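To make the preparatory steps concrete, here is a minimal sketch of the subset, quality-filter, and summarize steps that a user would otherwise code by hand. It is not Giovanni code: the record layout, the quality-flag convention (0 means good), and the sample values are all invented for illustration.

```python
from statistics import mean
from typing import List, NamedTuple, Tuple


class Pixel(NamedTuple):
    lat: float
    lon: float
    value: float
    quality: int  # assumed convention: 0 = good, anything else = suspect


def subset(pixels: List[Pixel], lat_range: Tuple[float, float],
           lon_range: Tuple[float, float]) -> List[Pixel]:
    """Keep pixels inside the bounding box (bounds inclusive)."""
    return [
        p for p in pixels
        if lat_range[0] <= p.lat <= lat_range[1]
        and lon_range[0] <= p.lon <= lon_range[1]
    ]


def quality_filter(pixels: List[Pixel]) -> List[Pixel]:
    """Drop pixels whose quality flag is not 'good'."""
    return [p for p in pixels if p.quality == 0]


def summarize(pixels: List[Pixel]) -> float:
    """Area-unweighted mean of the remaining values."""
    return mean(p.value for p in pixels)


# Invented sample data.
data = [
    Pixel(10.0, 20.0, 1.0, 0),
    Pixel(11.0, 21.0, 3.0, 0),
    Pixel(12.0, 22.0, 100.0, 1),  # inside the box but flagged suspect
    Pixel(50.0, 60.0, 7.0, 0),    # good quality but outside the box
]

region = subset(data, lat_range=(0.0, 20.0), lon_range=(0.0, 30.0))
good = quality_filter(region)
print(summarize(good))  # 2.0
```

Even this toy version needs three functions before any analysis begins, which is the overhead the slide says Giovanni removes. A real workflow would also handle file formats, regridding, and area weighting.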