Data Mining as a Service DMaaS



Similar documents
User-friendly access to Grid and Cloud resources for 18scientific th 19 th computing January / 21

Book of Abstracts CS3 Workshop

The most powerful open source data science technologies in your browser.!! Yves Hilpisch

PyCompArch: Python-Based Modules for Exploring Computer Architecture Concepts

Status and Integration of AP2 Monitoring and Online Steering

ArcGIS for Server: Administrative Scripting and Automation

Data Lab System Architecture

Eloquence Training What s new in Eloquence B.08.00

Parallels Cloud Server 6.0

Implementing a SAS Metadata Server Configuration for Use with SAS Enterprise Guide

Obelisk: Summoning Minions on a HPC Cluster

System Administration Training Guide. S100 Installation and Site Management


HEP Data-Intensive Distributed Cloud Computing System Requirements Specification Document

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

Assignment # 1 (Cloud Computing Security)

Experience in integrating enduser cloud storage for CMS Analysis

Introducing ScienceCloud

SonicWALL SRA Virtual Appliance Getting Started Guide

EOS Monitoring and Analytics Tools

PES. Batch virtualization and Cloud computing. Part 1: Batch virtualization. Batch virtualization and Cloud computing

IBM Cloud Manager with OpenStack

locuz.com HPC App Portal V2.0 DATASHEET

Monitoring HP OO 10. Overview. Available Tools. HP OO Community Guides

MIGRATING DESKTOP AND ROAMING ACCESS. Migrating Desktop and Roaming Access Whitepaper

Linstantiation of applications. Docker accelerate

Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm!

NSi Mobile Installation Guide. Version 6.2

w w w. u l t i m u m t e c h n o l o g i e s. c o m Infrastructure-as-a-Service on the OpenStack platform

Parallels Cloud Server 6.0

Long term analysis in HEP: Use of virtualization and emulation techniques

Client/Server Grid applications to manage complex workflows

1 Download & Installation Usernames and... Passwords

Features of AnyShare

Setup Cisco Call Manager on VMware

Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools

VIPERVAULT APPASSURE REPLICATION SETUP GUIDE

owncloud Architecture Overview

ATLAS job monitoring in the Dashboard Framework

MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration

MONITORING RED HAT GLUSTER SERVER DEPLOYMENTS With the Nagios IT infrastructure monitoring tool

Service Product: IBM Cloud Automated Modular Management (AMM) for SAP HANA One

Using GitHub for Rally Apps (Mac Version)

Scyld Cloud Manager User Guide

files without borders

Cloud Computing. Adam Barker

Scaling out a SharePoint Farm and Configuring Network Load Balancing on the Web Servers. Steve Smith Combined Knowledge MVP SharePoint Server

PC-Duo Web Console Installation Guide

How To Install Project Photon On Vsphere 5.5 & 6.0 (Vmware Vspher) With Docker (Virtual) On Linux (Amd64) On A Ubuntu Vspheon Vspheres 5.4

SSL VPN. Virtual Appliance Installation Guide. Virtual Private Networks

The Julia Language Seminar Talk. Francisco Vidal Meca

An Introduction to Using Python with Microsoft Azure

Docker : devops, shared registries, HPC and emerging use cases. François Moreews & Olivier Sallou

Dynamic Extension of a Virtualized Cluster by using Cloud Resources CHEP 2012

SHAREPOINT 2013 TO EMPOWER END USERS

Virtuozzo 7 Technical Preview - Virtual Machines Getting Started Guide

Experiences and challenges in the development of the JASMIN cloud service for the environmental science community

Improved metrics collection and correlation for the CERN cloud storage test framework

Operating system Dr. Shroouq J.

User Manual. Version connmove GmbH Version: Seite 1 von 33

The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing

DSS. High performance storage pools for LHC. Data & Storage Services. Łukasz Janyst. on behalf of the CERN IT-DSS group

McAfee SMC Installation Guide 5.7. Security Management Center

Common Services Platform Collector (CSPC) Self-Service - Getting Started Guide. November 2015

Status and Evolution of ATLAS Workload Management System PanDA

Configuration Guide. SafeNet Authentication Service AD FS Agent

Getting Started Using Project Photon on VMware Fusion/Workstation

WEBTITAN CLOUD. User Identification Guide BLOCK WEB THREATS BOOST PRODUCTIVITY REDUCE LIABILITIES

Cloud Computing. What is it? Presented by Prof. Dr.Prabhas CHONGSTITVATANA Asst. Prof. Dr.Chaiyachet SAIVICHIT. Source : Montana State Library Archive

SAS 9.4 Intelligence Platform

Extending Remote Desktop for Large Installations. Distributed Package Installs

OrgChart Now SSL Certificate Installation. OfficeWork Software LLC

Team: May15-17 Advisor: Dr. Mitra. Lighthouse Project Plan Client: Workiva Version 2.1

Mitglied der Helmholtz-Gemeinschaft UNICORE. Uniform Access to JSC Resources. Michael Rambadt, 20.

ULTEO OPEN VIRTUAL DESKTOP UBUNTU (PRECISE PANGOLIN) SUPPORT

Parallels Virtual Automation 6.1

Software. Enabling Technologies for the 3D Clouds. Paolo Maggi R&D Manager

CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT

About Node Manager and the WebLogic Scripting Tool

Oracle Virtual Desktop Infrastructure. VDI Demo (Microsoft Remote Desktop Services) for Version 3.2

v7.1 Technical Specification

DSS. The Data Storage Services (DSS) Strategy at CERN. Jakub T. Moscicki. (Input from J. Iven, M. Lamanna A. Pace, A. Peters and A.

Summer 2013 Cloud Initiative. Release Bulletin

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Integration of Virtualized Workernodes in Batch Queueing Systems The ViBatch Concept

Mod 2: User Management

Parallels Remote Application Server

Network Monitoring Assignments

Overview of HPC Resources at Vanderbilt

BarTender Print Portal. Web-based Software for Printing BarTender Documents WHITE PAPER

2X ApplicationServer & LoadBalancer Manual

PIVOTAL CRM ARCHITECTURE

Gladinet Cloud Enterprise

Full disk encryption with Sophos Safeguard Enterprise With Two-Factor authentication of Users Using SecurAccess by SecurEnvoy

SAML 2.0 SSO Deployment with Okta

How To Use Exchange Reporter Plus On A Microsoft Mailbox On A Windows (Windows) On A Server Or Ipa (Windows 7) On An Ubuntu 7.6 (Windows 8) On Your Pc Or

CMB 207 1I Citrix XenApp and XenDesktop Fast Track

Parallels Cloud Server 6.0

Tenrox. Single Sign-On (SSO) Setup Guide. January, Tenrox. All rights reserved.

Transcription:

Data Mining as a Service DMaaS P. Mato, D. Piparo, E. Tejedor EP-SFT M. Lamanna, L. Mascetti, J. Moscicki IT-ST Cloud Services for Synchronisation and Services (CS3) 18/01/2016

Data Mining As a Service 2 Describe a distributed service offering a web interface for data analysis based on Jupyter Notebooks Demonstrate how provision of CPU and storage resources as well as software are its building blocks Focus on sync d and mass storage Illustrate with a demo analysis how it can boost productivity and give access to innovative workflows Give you the possibility to try it out!

Prelude: The Notebook

A web-based interactive computing interface and platform that combines code, equations, text and visualisations. http://www.jupyter.org Many supported languages: Python, Haskell, Julia, R One generally speaks about a kernel for a specific language In a nutshell: an interactive shell opened within the browser Also called: Jupyter Notebook or IPython Notebook Data Mining As a Service 4

In a browser A Choice of Kernels http:// Kernels are processes that run interactive code in a particular programming language and return output to the user. Kernels also respond to tab completion and introspection requests.

6 http:// Text and Formulas

7 http:// Code

8 http:// This is a notebook in Python Code

9 http:// Code

10 We can invoke http:// commands in the shell Shell Commands

11 http:// And capture their output Shell Commands

http:// 12

13 http:// http:// Images

14 In a browser Text and Formulas Code http:// Shell Commands http:// Images

A Distributed Service Building on top of CERN Services Portfolio

Platform independent: only with a web browser Analyse data via the Notebook web interface Calculations, input and results in the cloud Allow easy sharing of scientific results: plots, data, code Storage is crucial Simplify teaching of data processing and programming Not HEP specific, not only for cutting edge fundamental research C++, Python and other languages or analysis ecosystems Also interface to widely adopted scientific libraries (e.g. ROOT*) Data Mining As a Service * www.root.cern.ch 16

Data Mining As a Service 17 The DMaaS project relies on technologies provided by CERN Scientific libraries Notebook integration (EP-SFT) Software distribution (EP-SFT, IT-ST): CVMFS All software potentially available Virtualised CPU resources in OpenStack Cloud (IT-CM) Interactive and batch usage Synergy with document sharing and publication (IT-CDA) Security, e.g. CERN credentials (IT-DI-CSO) Storage access (IT-ST): CERNBox, EOS All data potentially available A coherent view at CERN

EOS Disk-based low latency storage infrastructure for physics users. Main target: physics data analysis. Storage backend for CERNBox. Indico Manage complex conferences, workshops and meetings. CVMFS HTTP based network FS, optimized to deliver experiment software Files aggressively cached and downloaded on demand. ROOT Software framework for data mining, visualisation and storage. Hundreds of PB of HEP data saved in ROOT format. Try it in your browser (notebooks!): mybinder.org/repo/cernphsft/rootbinder Data Mining As a Service 18

Data Mining As a Service 19 Jupyterhub: Server application - manages login of users and redirection to notebook Existing solution Allows encapsulation: spawn Docker container at logon Docker Isolation of users Boot faster than Virtual Machines Openstack support Both have large user bases and an active community behind

CERN Authentication CERN Cloud Web Portal Notebook Container Container Scheduler C C C C 12/10/2015 Data Mining As a Service C C C C 20

Launch jobs on the batch farm Access notebook running on a container in the OpenStack instance Inspect produced data via CERNBox/EOS from the notebook Create plots and output data Share, access plots (and output data!) on the web with CERNBox web interface Security guaranteed by the usual CERN standards e.g. Added value: remote users cannot open graphical connections to CERN (latency): Problem automatically solved in the above workflow Data Mining As a Service 21

Data Mining As a Service 22 Time for a demo: Download data Produce a plot after a simple analysis Share it via CERNBox Test Node at CERN being used DEMO Number of cinemas and their screens in canton Zurich in the last 45 years

Data Mining As a Service 23 Intermediate steps accomplished: 1) Single node, CERNBox, no CERN credentials 2) Single node on Openstack, CVMFS, CERNBox, CERN authentication (just demoed) TODO: Distributed setup on Openstack, CVMFS, CERNBox, CERN authentication DMaaS accessible to CERN users: 2 nd quarter 2016 See backup for more details about these setups

CERN Summer Student Program, ROOT lectures: Interactive notebooks offered 50 participants, perfect scaling, a success! https://indico.cern.ch/event/407519 Data Science @ LHC Workshop, Multivariate analysis tutorial: http://indico.cern.ch/event/395374/ E-Planet exchange @ UERJ, Brazil 30 participants, every day for a week, 3h a day https://indico.cern.ch/event/402660/ Data Mining As a Service Example of notebooks on Indico In addition, clear signs of appreciation of Notebook technology: see backup 24

We will provide a service for data analysis in the cloud via a web interface Platform independent: no need to install software Rely on the robust services already provided by CERN Sync d storage and access to the mass storage are crucial Share data, code, documentation, results CERNBox + EOS An optimal solution New ways of approaching data mining made accessible: boost productivity thanks to sync and file sharing services Give you the possibility to try it out! Data Mining As a Service 25

Data Mining As a Service 26 Access from the conference site until tomorrow dmaasdemo.web.cern.ch Take a look to the provided notebooks, modify them, run them Produce results! Access them via CERNBox (https://cernbox.cern.ch) From the ETH public network only! Ask me for your user name and password!

Data Mining As a Service 27 1 3 2

Backup Slides

29 Local users Authenticator Spawner A single powerful machine C C C C Data Mining As a Service Containers (notebook servers)

CERN Authentication CERN Cloud Web Portal C C C C C 17/1/2016 Data Mining As a Service Notebook Container 30

Data Mining As a Service 31 Large volume of data complex analysis: need to use many cores 1) Single node: TProcPool, IPython Parallel, etherogeneous/ multithreaded code 2) Many nodes: Batch/Grid jobs Several production grade, Python based job submission tools available: Ganga, GridControl, Panda, See A. Richards Presentation CERN Batch Service being considered in the full picture! Opportunity: Steer job submission to WLCG or local batch resources from the notebook.

Define a custom software environment via a web form Same mechanism for selecting hardware (e.g. GPU, N CPU cores, SSD disk) Future: more fields to specify hw requirements N releases C 82 ~150 packages 32

And Notebooks

Python flavour import ROOT: all goodies activated %%cpp magic ROOT C++ flavour Kernel distributed with ROOT itself Goodies Integration of ROOT & Jupyter Notebooks delivered Tab completion Display of graphics Syntax highlighting Asynchronous output capturing ROOT comes with a C++ 11/14 compatible interpreter based on LLVM Technology 34

35 Follow some simple instructions at: https://root.cern.ch/how/how-create-rootbook (basically build ROOT) and $ root --notebook This command: 1. Starts a local notebook server 2. Connects to it via the browser Provides a ROOT C++ kernel and the rest of ROOTbook goodies

ROOTbooks How-Tos https://root.cern.ch/howtos#jupyter%20notebooks ROOT bindings for Jupyter https://github.com/root-mirror/root/tree/master/bindings/pyroot/ JupyROOT ROOT C++ Kernel https://github.com/ipython/ipython/wiki/ipython-kernels-for-otherlanguages Examples (15 already) from the new ROOT Tutorials can be found at: https://root.cern.ch/code-examples#notebooks both in Python and C++ (and mixed!) 36

Data Mining As a Service 37 Binder is a software package and a webservice (100% free and open source) to turn a GitHub repo into a collection of interactive notebooks powered by Jupyter and Kubernetes. ROOT is on Binder: you can try it at ROOTBinder. On ROOTBinder you can find a collection of Notebooks aiming to illustrate the potential of the ROOT Framework. Anonymous access, no persistent storage Dec 2015 Jan 2016 View, Create and Run ROOTbooks!

mybinder.org/repo/cernphsft/rootbinder Data Mining As a Service 38

mybinder.org/repo/cernphsft/rootbinder Data Mining As a Service 39