The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
ESA EO Open Science 2.0 Conference, 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc.), José Gómez-Dans, Philip Lewis (UCL), Nicola Pounder, Jon Styles (Assimila Ltd.)
Image: JASMIN (STFC/Stephen Kill)

Introduction
- OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
- Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
- Funded by ESA
- Overview of technical solution
  - Introduction to IPython (Jupyter) Notebook
  - Deployment on JASMIN-CEMS science cloud
- Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0

OPTIRAD Goals
Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:
- Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
- Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
- Usability and access: easy execution and development of existing Python code and the provision of interactive tutorials for new users.

IPython Notebook
- Provides Python kernels accessible via a web browser
- Sessions can be saved and shared
- Trivial access to parallel processing capabilities: IPython.parallel (ipyparallel)
- IPython Notebook is becoming Jupyter Notebook, with support for other languages such as R
- New JupyterHub allows multi-user management of notebooks
- Gained traction as a teaching and collaborative tool

IPython Notebook + Cloud
- Cloud's characteristics: broad network access, resource pooling, elasticity, scalable compute and storage
- Good fit for Big Data science applications
- Cloud-hosted Notebook: a model already demonstrated with public cloud services, e.g. Wakari, Azure, Rackspace
- Central hosting allows central management of software packages, so no installation steps are needed for the user
- Algorithm prototyping environment next to Big Data
- Acts as a precursor to operational processing services

Different classes of user
- Notebook: a user application perspective
- Support a spectrum of usage models
- Long-tail of science users

Design and development considerations
- Host on JASMIN-CEMS: data analysis facility and science cloud at Rutherford Appleton Lab, UK
  - Advantage of proximity to locally hosted EO and climate science datasets
  - Integration with environmental sciences community
- Lightweight development and deployment philosophy
  - Build on Open Source and community efforts to use what's already available
- How to meet multi-user support requirement?
  - Buy off-the-shelf: run Wakari on JASMIN-CEMS platform, or
  - Try JupyterHub: multi-user IPython Notebook solution, or
  - Roll our own solution
- How to integrate parallel processing?
  - IPython.parallel (ipyparallel): Python API accessed via the Notebook
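
The map-style pattern behind the parallel-processing item can be sketched as follows. This is only an illustration: ipyparallel needs a running controller and engines, so the sketch swaps in the standard library's concurrent.futures thread pool to show the same idea of mapping a function over many inputs. The function retrieve_pixel and its arithmetic are hypothetical stand-ins, not OPTIRAD code.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_pixel(radiance):
    """Hypothetical stand-in for an expensive per-pixel retrieval."""
    return radiance * 2 + 1


def parallel_retrieve(radiances, workers=4):
    """Map retrieve_pixel over the inputs using a pool of workers.

    With ipyparallel on a running cluster, the corresponding call would be
    roughly ipyparallel.Client().load_balanced_view().map_sync(...);
    here a local thread pool plays the role of the engines.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(retrieve_pixel, radiances))


if __name__ == "__main__":
    print(parallel_retrieve([1, 2, 3]))
```

From a notebook cell the call looks the same whether the workers are local threads or remote engines, which is what makes the notebook a convenient front end for parallel jobs.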

Deployment Architecture
[Figure: architecture diagram. Browser access passes through a firewall to the OPTIRAD JASMIN Cloud Tenancy. Labelled components: VM: shared services (NFS, LDAP); JupyterHub (manage users and provision of notebooks); Docker Swarm over a pool of VMs (Swarm manages allocation of containers for notebooks); Docker containers holding an IPython Notebook, kernel and parallel controller (notebooks and kernels in containers); slave VMs running parallel engines (nodes for parallel processing).]
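
To make the architecture more concrete, a JupyterHub configuration along the following lines is one way such components could be wired together. This is a sketch under stated assumptions, not the OPTIRAD project's actual configuration: it assumes the dockerspawner and ldapauthenticator plugin packages are installed, and the host name, image name, DN template and NFS path are all hypothetical placeholders. It is a fragment of jupyterhub_config.py and only runs when loaded by JupyterHub.

```python
# jupyterhub_config.py (fragment; loaded by JupyterHub, not run standalone)
c = get_config()  # noqa: F821  (injected by JupyterHub at load time)

# Authenticate users against the LDAP service on the shared-services VM.
# Host name and DN template are hypothetical placeholders.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
c.LDAPAuthenticator.server_address = "ldap.example.org"
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},ou=people,dc=example,dc=org",
]

# Start each user's notebook server in its own Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "example/optirad-notebook"  # hypothetical image name

# Mount a per-user directory from the shared NFS export into the container.
# The host path is a hypothetical placeholder.
c.DockerSpawner.volumes = {"/nfs/home/{username}": "/home/jovyan/work"}

# Containers must be able to reach the Hub, so listen on all interfaces.
c.JupyterHub.hub_ip = "0.0.0.0"
```

Pointing the Docker client at a Swarm manager, so that containers are scheduled across the VM pool rather than on a single host, is a deployment detail not shown here.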

Conclusions + Next Steps
- Experiences from project delivery
  - Off-the-shelf solution using JupyterHub paid off
  - JupyterHub and Swarm were new, but installation was straightforward and both proved operationally robust
- Challenges and future development
  - Extend use of containers for parallel compute
  - Challenge: managing cloud elasticity with both containers and host VMs
  - Provide object storage: Ceph likely to be adopted
  - Expand from OPTIRAD pilot to wider user community
  - Deploy with toolboxes, e.g. Sentinels or CIS

Demo...
- A tutorial on EO data assimilation
- Notebook blurs the traditional separation between tutorial documentation and using the target system
- The two are one self-contained interactive unit

Further information
- OPTIRAD: Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS'14) http://dx.doi.org/10.2788/1823
- JASMIN paper (Sept 2013) http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/lawea13_jasmin.pdf
- Cloud paper to follow soon
- Cloud-hosted JupyterHub with Docker for teaching: https://developer.rackspace.com/blog/deploying-jupyterhub-foreducation/
- JASMIN and CEDA: http://jasmin.ac.uk/ http://www.ceda.ac.uk
- @PhilipJKershaw