The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing




ESA EO Open Science 2.0 Conference, 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans and Philip Lewis (UCL), Nicola Pounder and Jon Styles (Assimila Ltd.)
Title image: JASMIN (STFC/Stephen Kill)

Introduction
- OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
- Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
- Funded by ESA
- Overview of technical solution
  - Introduction to IPython (Jupyter) Notebook
  - Deployment on JASMIN-CEMS science cloud
- Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0

OPTIRAD Goals
Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:
- Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
- Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
- Usability and access: easy execution and development of existing Python code, and the provision of interactive tutorials for new users.

IPython Notebook
- Provides Python kernels accessible via a web browser
- Sessions can be saved and shared
- Trivial access to parallel processing capabilities: IPython.parallel (ipyparallel)
- IPython Notebook is becoming Jupyter Notebook
  - Support for other languages such as R
  - New JupyterHub allows multi-user management of notebooks
- Has gained traction as a teaching and collaborative tool
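As an illustration of the parallel processing point above, here is a minimal sketch of mapping a task over a set of inputs from a notebook. It is not taken from the OPTIRAD code: `parallel_map`, `mean_reflectance` and the `tiles` data are hypothetical names invented for this example. The only ipyparallel call assumed is `map_sync` on a view object; when no view is passed, a local thread pool stands in for the engines so the sketch runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, view=None, max_workers=4):
    """Apply func to every item, on ipyparallel engines if a view is given.

    view: an ipyparallel DirectView or LoadBalancedView (optional).
    Without a view, a local thread pool is used as a stand-in.
    """
    items = list(items)
    if view is not None:
        # map_sync blocks until all engines have returned their results
        return list(view.map_sync(func, items))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def mean_reflectance(tile):
    """Hypothetical per-tile task: mean of a list of reflectance values."""
    return sum(tile) / len(tile)


# Made-up input: three small "tiles" of reflectance values
tiles = [[0.1, 0.3], [0.2, 0.4, 0.6], [0.5]]
results = parallel_map(mean_reflectance, tiles)
print(results)
```

With a running ipyparallel cluster, the same call would typically take a view obtained from `ipyparallel.Client().load_balanced_view()` as the `view` argument, so the notebook code stays the same whether it runs locally or on the engines.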

IPython Notebook + Cloud
- Cloud's characteristics: broad network access, resource pooling, elasticity, scalable compute and storage
  - Good fit for Big Data science applications
- Cloud-hosted Notebook: a model already demonstrated with public cloud services, e.g. Wakari, Azure, Rackspace
- Central hosting allows central management of software packages: no installation steps needed for the user
- Algorithm prototyping environment next to Big Data
  - Acts as a precursor to operational processing services

Different classes of user
- Notebook: a user application perspective
- Support a spectrum of usage models
- Long tail of science users

Design and development considerations
- Host on JASMIN-CEMS
  - Data analysis facility and science cloud at Rutherford Appleton Lab, UK
  - Advantage of proximity to locally hosted EO and climate science datasets
  - Integration with environmental sciences community
- Lightweight development and deployment philosophy
  - Build on Open Source and community efforts to use what's already available
- How to meet the multi-user support requirement?
  - Buy off-the-shelf: run Wakari on the JASMIN-CEMS platform, or
  - Try JupyterHub: multi-user IPython Notebook solution, or
  - Roll our own solution
- How to integrate parallel processing?
  - IPython.parallel (ipyparallel): Python API accessed via the Notebook

Deployment Architecture
[Figure: architecture diagram. Browser access passes through a firewall into the OPTIRAD JASMIN cloud tenancy.]
- VM: shared services (NFS, LDAP)
- JupyterHub: manages users and provision of notebooks
- VMs: Swarm pool, running Docker containers; each container holds an IPython Notebook, its kernel and a parallel controller
  - Swarm manages allocation of containers for notebooks
  - Notebooks and kernels run in containers
- VMs: slave nodes, each running a parallel engine
  - Nodes for parallel processing
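To make the role of Swarm in this architecture more concrete, the following toy model sketches how notebook containers might be spread across the pool VMs. It is a deliberately simplified assumption, not Docker Swarm's actual scheduler and not OPTIRAD code: `place_containers`, the node names, the user names and the fixed per-node `capacity` are all hypothetical. It imitates a "spread" style rule in which each new container goes to the node currently running the fewest containers.

```python
def place_containers(users, nodes, capacity):
    """Assign one notebook container per user with a simple 'spread' rule.

    users:    list of user names, each needing one container
    nodes:    list of pool VM names
    capacity: maximum containers per node (assumed equal for all nodes)
    """
    load = {node: 0 for node in nodes}
    placement = {}
    for user in users:
        # Pick the least-loaded node; ties go to the first-listed node
        node = min(nodes, key=lambda n: load[n])
        if load[node] >= capacity:
            raise RuntimeError("no free capacity for " + user)
        load[node] += 1
        placement[user] = node
    return placement


nodes = ["swarm-pool-0", "swarm-pool-1"]
placement = place_containers(["alice", "bob", "carol"], nodes, capacity=2)
print(placement)
```

In this model, a user who arrives when every node is full triggers an error; in a real deployment that is the point at which extra host VMs would have to be added to the pool.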

Conclusions + Next Steps
- Experiences from project delivery
  - Off-the-shelf solution using JupyterHub paid off
  - JupyterHub and Swarm were new, but installation was straightforward and both proved operationally robust
- Challenges and future development
  - Extend use of containers for parallel compute
  - Challenge: managing cloud elasticity with both containers and host VMs
  - Provide object storage: Ceph likely to be adopted
  - Expand from OPTIRAD pilot to wider user community
  - Deploy with toolboxes, e.g. Sentinels or CIS

Demo...
- A tutorial on EO data assimilation
- Notebook blurs the traditional separation between tutorial documentation and using the target system
- The two are one self-contained interactive unit

Further information
- OPTIRAD: Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS 14), http://dx.doi.org/10.2788/1823
- JASMIN paper (Sept 2013): http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/lawea13_jasmin.pdf
- Cloud paper to follow soon
- Cloud-hosted JupyterHub with Docker for teaching: https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
- JASMIN and CEDA: http://jasmin.ac.uk/ and http://www.ceda.ac.uk
- @PhilipJKershaw