Big Data Infrastructures for Processing Sentinel Data




Wolfgang Wagner
Department for Geodesy and Geoinformation, Technische Universität Wien
Earth Observation Data Centre for Water Resources Monitoring

What is Big Data?
  Big Data, Big Hype? (Steve Dodson, 2014)
  An intrusion of privacy
  A successful business model of a few big, primarily American enterprises
  Sven Schade (2015) describes the Big Data era as a situation in which the volume, variety, velocity and veracity (3+1 Vs) with which data sets and streams become available challenge current management and processing capabilities.

Schade, S. (2015) Big Data breaking barriers - first steps on a long trail, ISPRS Archives, XL-7/W3, 691-697.

Infrastructures for Processing Big Data
  Figure: Google Council Bluffs data center (http://www.google.com/about/datacenters/gallery/#/all/2)

Sentinel Programme
  A fleet of European earth observation satellites for environmental monitoring

Sentinel-1: A Game Changer
  C-band SAR satellite in continuation of ERS-1/2 and ENVISAT
  High spatio-temporal coverage
    Spatial resolution: 20-80 m
    Temporal resolution: < 3 days over Europe and Canada with 2 satellites
  Excellent data quality
  Highly dynamic land surface processes can be captured
  Impact on water management, health and other applications could be high if the challenges in the ground segment can be overcome
  Figure: Sentinel-1 image of Upper Austria taken on 13/04/2015
  Figure: Solar panel and SAR antenna of Sentinel-1, launched 3 April 2014. Image acquired by the satellite's onboard camera. ESA

Sentinel-1 Data Volume: From Byte to PetaByte
  Figure: scale of data volumes (1 Byte, 1 KiloByte, 1 MegaByte, 1 GigaByte, 1 TeraByte, 1 PetaByte)

Speed of Data Transmission
  Download of 500 Gigabyte (daily Sentinel-1 data volume over land)
    Wireless with 7 Mbit/s
    Landline with 1 Gbit/s
  Download of 1 Petabyte (7 years of Sentinel-1 data over land)
    Landline with 1 Gbit/s

Speed of Data Processing
  Assumed processing speed of Sentinel-1 data with one computer/node: ~4 Mbit/s
  Processing of 500 Gigabyte (daily Sentinel-1 data over land)
    1 computer
  Processing of 1 Petabyte (7 years of Sentinel-1 data over land)
    1 computer
    100 nodes
    1000 nodes
  One needs supercomputers for processing Sentinel data!
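The durations behind these scenarios follow from dividing the data volume by the transfer or processing rate. The sketch below works through that arithmetic. It assumes decimal units (1 Gigabyte = 10^9 bytes, 1 Mbit/s = 10^6 bit/s) and ideal linear scaling across nodes, neither of which is stated on the slides. The function and constant names are illustrative only.

```python
# Assumption: decimal (SI) units; the slides do not say which convention is used.
GB = 10**9      # bytes in one Gigabyte
PB = 10**15     # bytes in one Petabyte
MBIT = 10**6    # bit/s in one Mbit/s
GBIT = 10**9    # bit/s in one Gbit/s


def duration_seconds(volume_bytes, rate_bits_per_s, nodes=1):
    """Time to move or process a volume at a given per-node rate.

    Assumes the work splits perfectly across `nodes` (ideal scaling).
    """
    return volume_bytes * 8 / (rate_bits_per_s * nodes)


def as_days(seconds):
    """Convert seconds to days."""
    return seconds / 86400


# Scenarios named on the slides: (label, volume, rate, number of nodes).
scenarios = [
    ("Download 500 GB, wireless 7 Mbit/s", 500 * GB, 7 * MBIT, 1),
    ("Download 500 GB, landline 1 Gbit/s", 500 * GB, 1 * GBIT, 1),
    ("Download 1 PB, landline 1 Gbit/s", 1 * PB, 1 * GBIT, 1),
    ("Process 500 GB, 1 computer", 500 * GB, 4 * MBIT, 1),
    ("Process 1 PB, 1 computer", 1 * PB, 4 * MBIT, 1),
    ("Process 1 PB, 100 nodes", 1 * PB, 4 * MBIT, 100),
    ("Process 1 PB, 1000 nodes", 1 * PB, 4 * MBIT, 1000),
]

for label, volume, rate, nodes in scenarios:
    days = as_days(duration_seconds(volume, rate, nodes))
    print(f"{label}: {days:.2f} days")
```

Under these assumptions, downloading 500 Gigabyte takes roughly 6.6 days at 7 Mbit/s and about 67 minutes at 1 Gbit/s, while 1 Petabyte at 1 Gbit/s takes about 93 days. Processing 1 Petabyte at 4 Mbit/s takes about 63 years on one computer, about 231 days on 100 nodes and about 23 days on 1000 nodes, which is the point of the slide's closing remark.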

Approaching Technological Frontiers?
  Information and communications technology (ICT) has improved dramatically over the past decades
  Moore's law, which states that the number of transistors in a dense integrated circuit doubles approximately every two years, still holds
  But there are physical limits to every technology! E.g. of all thermodynamic cycles operating between a hot temperature $T_H$ and a cold temperature $T_C$, none can exceed the efficiency of a Carnot cycle: $\eta = 1 - T_C / T_H$
  Increasingly we face challenges related to
    Data volume
    Bandwidth and I/O
    Algorithmic complexity

Earth Observation Ground Segment: Past

Earth Observation Ground Segment: Present

Earth Observation Ground Segment: Future

A New Paradigm for Earth Observation
  Reasons
    Fast growing volume and increasing variety of EO data
    Increasing complexity of algorithms with increasing resolution
    Higher scientific standards
      Algorithms must be validated with big data sets and against competing algorithms
      Algorithm ensembles needed
  Solution
    Bring users and their software to the data
  Consequence
    Need for cooperation & specialisation

An Opportunity for New Business Models
  Business model of the Munich-based company CloudEO (http://www.cloudeo-ag.com/how-it-works)

Big Data Infrastructures for the Sentinels
  Private Sector
    Google Earth Engine
    Amazon Web Services
      Offers Landsat data (complete from 2015 onwards) for its cloud users
    Helix Nebula Science Cloud
      Consortium of European ICT providers teaming up with ESA, CERN, etc.
    etc.
  Public Sector
    Initiatives triggered mainly by national space programmes
      THEIA Land Data Centre (France)
      Climate, Environment and Monitoring from Space (CEMS) (UK)
      OPUS/Copernicus Centre (Germany)
    European Space Agency
      Thematic Exploitation Platforms
      Mission Exploitation Platforms
    etc.

Google Earth Engine
  Premier platform for the scientific analysis of high-resolution imagery
  Combines the strength of an ICT giant with expertise in earth observation (team of > 100 programmers)
  Rolled out on three Google data centres (US, Europe, Asia)
  Access through JavaScript or Python API
  Programming in "Googlish", i.e. code can only run on Google Earth Engine
  Image-oriented data structure, including image pyramids for interactive analysis
  Commercial applications are not free
  Data download possible (original and processed data)
    Landsat: complete archive
    MODIS: many geophysical variables
    Sentinel-1: already about 10,000 scenes
    Sentinel-2: will likely follow soon
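The image pyramids mentioned above are a general technique: the same image is stored at successively halved resolutions, so an interactive viewer can read a coarse level when zoomed out instead of the full-resolution data. The following is a minimal, generic sketch of that idea in plain Python. It is not Google Earth Engine's implementation, and the function names and the 2x2 averaging rule are assumptions chosen for illustration.

```python
def downsample(image):
    """Halve the resolution by averaging non-overlapping 2x2 pixel blocks.

    `image` is a list of rows of numbers. A trailing odd row or column
    is dropped (a simplification made for this sketch).
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def build_pyramid(image):
    """Return all pyramid levels, from full resolution to the coarsest."""
    levels = [image]
    # Stop once a level can no longer be split into 2x2 blocks.
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


# Hypothetical 4x4 test image with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image)
for level, data in enumerate(pyramid):
    print(f"level {level}: {len(data)} x {len(data[0])}")
```

For the 4x4 example this yields three levels (4x4, 2x2 and 1x1). A viewer would pick the level whose pixel size best matches the current zoom, which keeps the amount of data read per screen roughly constant.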

Figure: Snapshot of the Google Earth Engine interface showing the Sentinel-1 data holding as of 4/9/2016 (https://ee-api.appspot.com)

Earth Observation Data Centre (EODC)
  Founded in May 2014 as a Public-Private Partnership
  Mission
    EODC works together with its partners from science and the public and private sectors to foster the use of EO data for monitoring of water and land
  EODC acts as a community facilitator
    Joint developments
      Cloud infrastructure
      Operational data services
      Software (Open Source)
  EODC works towards a federation of data centres

EODC Cooperation Network
  Work is done within the Communities
    Infrastructure
    Sentinel-1
    Sentinel-2
  Already 13 Cooperation Partners from 6 countries
    Austria, Australia, Czech Republic, Italy, France, The Netherlands

EODC Infrastructure in Vienna
  Virtual Machines (VMs)
  Supercomputer VSC-3
    Rank 85 of the world's most powerful computers (11/2014)
  24/7 Operations & Rolling Archive
  Petabyte-Scale Disk Storage
  Tape Storage

EODC Status
  Operations started in June 2015 after a one-year development phase
    Operational data reception and processing by ZAMG
    Computer cluster operated by EODC
      Virtual Machines via OpenStack
      Cloud Services
    Supercomputer VSC-3 operated by TU Wien
  Figure: EODC service overview (Data and Platform Services, Community Building, PaaS, User VMs, Repositories, Community File Repository, VSC-3 Login Node, NORA Router, Job Scheduler, High Availability, Continuous Integration, Various Inspection Tools, Web Conferencing, Development, Collaboration)

Sentinel-1 Data Availability @ EODC
  Sentinel-1 data are currently available ~2.5 hours after their processing time and 6.25 hours after acquisition time (median values for August 2015)
  54888 acquisitions with 39.65 TB (>1.5 times our 10-year ENVISAT ASAR archive)
  Ramp-up of Sentinel-1 acquisition scenario to full operational status
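The archive figures above imply an average size per acquisition and an upper bound on the size of the ENVISAT ASAR archive. The short sketch below makes that arithmetic explicit. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the slide does not specify, and the variable names are illustrative.

```python
# Figures quoted on the slide.
acquisitions = 54888
archive_tb = 39.65          # Sentinel-1 holding in TB
envisat_ratio = 1.5         # holding is ">1.5 times" the ENVISAT ASAR archive

# Assumption: decimal units (1 TB = 1000 GB).
mean_size_gb = archive_tb * 1000 / acquisitions

# ">1.5 times" means the ENVISAT ASAR archive is smaller than this bound.
envisat_upper_bound_tb = archive_tb / envisat_ratio

print(f"Mean size per acquisition: {mean_size_gb:.2f} GB")
print(f"ENVISAT ASAR archive: below {envisat_upper_bound_tb:.2f} TB")
```

Under these assumptions an acquisition averages about 0.72 GB, and the 10-year ENVISAT ASAR archive is smaller than about 26.4 TB.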

Supercomputing Experiment
  Vienna Scientific Cluster 3
    High-performance computing (HPC) system with 2020 nodes
    Each node has 2 Intel Xeon E5-2650v2 processors (2.6 GHz) and 64 Gbytes of RAM
    Simple Linux Utility for Resource Management (SLURM)
  Experiment
    Geocoding of 624 Sentinel-1 images from Austria, Sudan and Zambia with the Sentinel-1 toolbox
    Each image is about 1 Gbyte in size
    Serial processing with one processor would take about two weeks
  Approach
    Parallel processing on 312 nodes, with 2 images launched simultaneously on each computing node
  Results
    Processing was completed within 45 min (without queuing)
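The layout of this experiment (624 images, 312 nodes, 2 images per node) and the resulting speedup can be sketched as follows. This is an illustration of the arithmetic, not the job scripts used on VSC-3: the scene names are hypothetical, "about two weeks" is taken as exactly 14 days, and the efficiency figure treats each of the 624 concurrent image jobs as one unit of parallelism.

```python
def assign_images(images, n_nodes, per_node):
    """Split a list of images into one batch per node.

    Raises ValueError if the images do not fit on the available nodes.
    """
    if len(images) > n_nodes * per_node:
        raise ValueError("not enough node capacity for all images")
    return [images[i:i + per_node] for i in range(0, len(images), per_node)]


# Hypothetical scene identifiers; the real file names are not given.
images = [f"S1_scene_{i:03d}" for i in range(624)]
batches = assign_images(images, n_nodes=312, per_node=2)

# Assumption: "about two weeks" of serial processing = 14 days.
serial_minutes = 14 * 24 * 60
parallel_minutes = 45

speedup = serial_minutes / parallel_minutes
# Efficiency relative to the 624 image jobs that ran concurrently.
efficiency = speedup / len(images)

print(f"{len(batches)} node batches of {len(batches[0])} images each")
print(f"Speedup: {speedup:.0f}x, parallel efficiency: {efficiency:.2f}")
```

With these assumptions the run is about 448 times faster than serial processing, or roughly 72% of the ideal 624-fold speedup. The slides do not break down where the remainder goes; shared storage I/O and differences in per-image run time are typical candidates.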

Conclusions
  Big Data is a broad term for data sets so large or complex that traditional data processing applications are inadequate.
  Earth Observation is entering the Big Data era
  Big Data infrastructures for processing of Sentinel data are being developed along two main lines
    Deployment of EO-specific services on general-purpose cloud computing environments
    Building of new, or expansion of existing, dedicated EO data centres

Acknowledgements
  My colleagues at TU Wien and EODC: Christian Briese, Vahid Naeimi, Bernhard Bauer-Marschallinger, Christoph Paulik, Alena Dostalova, Stefano Elefante, Thomas Mistelbauer, Hans Thüminger, and Andreas Roncat
  Austrian Space Application Programme: Projects 844350 Prepare4EODC and 88001 WetMon
  European Space Agency: Contract No. 4000107319/12/I-BG EODC Water Study