Data Lab System Architecture

Data Lab Context

Data Lab Architecture
[Figure: layered architecture diagram]
- Presentation Layer
  - Astronomer's Desktop: Web Page, Cmdline Tools, Legacy Apps, User Code
  - Data Lab Ops: User Mgmt, Monitoring
- Services Layer
  - Public Services: Authentication, Query Manager, Job Manager, Storage Mgr, Resource Resolver, Public Repo
  - Private Services: Ops Monitor, Private Repo
- Data Access Layer
  - Data Access Services: SIA, SSA, SCS, UWS, VOSpace, TAP, SQL Service
- Resources Layer
  - Databases: MyDB, Large Cats, Data Pub, Ops DBs
  - Storage Resource: User Space, Virtual Space
  - Compute Resource: UWS Compute Jobs
  - External Resources: VO Data, VO Svcs, NSA


Presentation Layer
This layer contains the primary user interfaces.
- Astronomer's Desktop
  - Web clients: data query forms, content browsers, monitors, etc.
  - Command-line tools: for local desktop access
  - Legacy apps: including scripting environments such as Python
  - User-written code: custom science clients
  - Login shells
- Operators' Tools
  - System monitoring / administration
  - User and resource management

Services Layer
This layer provides interfaces used mostly by software.
- Public Services
  - Authentication / Authorization: controlled access to D/L
  - Job Manager: manage compute jobs
  - Query Manager: manage large data queries
  - Storage Manager: manage the virtual storage resource
  - Resource Resolver: locate services / resources within D/L
- Private Services
  - Operations monitoring service: automated resource checking

Data Access Layer
This layer provides interfaces to data services.
- Simple VO data services
  - Catalog / image / spectra queries based on position (plus constraints)
  - Anonymous access allowed
- Advanced Catalog Services
  - Full SQL query capability
  - VO standard interface (public access)
  - Custom SQL interface (authorized access)
- Virtual storage
  - Authorized access, user-controlled sharing
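The position-plus-constraint query offered by the simple VO data services can be illustrated with a small cone search. This is only a sketch under stated assumptions: the in-memory `CATALOG`, its column names, and the `cone_search` helper are hypothetical and stand in for a real catalog service.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(rows, ra, dec, radius_deg, constraint=None):
    """Return rows within radius_deg of (ra, dec), optionally filtered further."""
    hits = []
    for row in rows:
        if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg:
            if constraint is None or constraint(row):
                hits.append(row)
    return hits

# Hypothetical three-row catalog used only for illustration.
CATALOG = [
    {"id": "a", "ra": 10.00, "dec": -5.00, "mag": 18.2},
    {"id": "b", "ra": 10.05, "dec": -5.02, "mag": 21.7},
    {"id": "c", "ra": 50.00, "dec": 20.00, "mag": 17.0},
]

nearby = cone_search(CATALOG, ra=10.0, dec=-5.0, radius_deg=0.1)
bright = cone_search(CATALOG, 10.0, -5.0, 0.1, constraint=lambda r: r["mag"] < 20)
```

A production service would push the positional test into an indexed database query rather than scanning rows in Python.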

Service vs. Access Layers
Why the need for different layers?
Compared: Service Layer, Access Layer
- Astronomer Friendly: X
- Authorized Access: X
- Anonymous Access: X, X
- Direct VO Protocols: X
- Job Control: X, Depends
- Data Lab API: X, X
- Virtual Observatory API: X
- Web Interface: X*, X
- Programmatic (Desktop) Interface: X*, X*
- Legacy App Support: X*

Resources Layer
This layer describes physical / logical resources in the D/L.
- Databases
  - Large (distributed) catalog DB
  - Personal DB (similar to SDSS MyDB)
  - User-published datasets
  - Operational DB
- Physical Storage
  - Persistent user storage
  - Virtual storage
- Compute Resources
  - Servers for processing workflows
- External Services
  - Data and processing
  - VO tools (e.g. cross-match)

Large Catalogs
- Require a low-cost, scalable and reliable solution
- No viable turnkey system available
- The LSST QServ project will gain us valuable experience
  - Presents a normal DB interface to the client
  - Can put a TAP/SQL service in front of it
  - Can optimize data partitioning through experimentation
- QServ requires dedicated hardware for each catalog instance

Virtual Storage
- Implemented using a disk filesystem as back-end
  - Simplifies the exported service for use on local user file systems
  - Provides options for D/L operations: user-based partition scheme
  - Legacy code can bypass VOSpace protocols (via a FUSE-mounted filesystem)
  - Cons: potential synchronization issues
- Containers used to package the service
  - Bundle dependencies
  - FUSE mounts for other containers
- Exploit the protocol's support of:
  - Capabilities
  - Views
[Figure: Virtual Storage Service Container (Python VOSpace, Database, Data Lab Interfaces, Base Docker OS Image); Image/Table Support Apps; Local Disk Container]

Example - Bringing It All Together
[Figure: workflow diagram. Labels: PI/Survey, NSA, NOAO Data Lab (Virtual Storage Svcs, MyDB, Large Catalog Svcs, Data Publication Svcs, DL Tasks), User 1 Desktop (Virtual Storage Svc, MyDB, DL Tasks, Data Publication Svc), User 2 Laptop (Virtual Storage Svc, Legacy Tools); steps 1(a), 1(b), 1(c), 2(a), 2(b)]

Compute Services / Virtualization
Task Containers: why are they interesting?
- Provide task-level virtualization
- Much smaller in size, faster to start up
- Bundle / isolate dependencies
- Container images can be layered (e.g. a base Python 2.7 environment)
- Containers have their own IP address
- Users can log in to a container
- Can be deployed to other clouds easily
- Growing user / developer community
- Repository of public containers available
[Figure: Task Container. Labels: Tasking Interface (Params, Results), Data Lab Support Code, Base OS Image, <<Task>>, Disk Cache, FUSE mount, Virtual Storage]

Compute Services / Virtualization
Task Containers: what can you contain?
- Web applications
- Desktop tools
- Almost anything
Tasking Interface
- Handles UWS communications with the Job Manager
- Allows for setting of parameters, results collection, timeouts
- Redirects stdio streams back to the calling client
Container Storage
- Persistent cache container shared in a workflow
- Virtual storage can be mounted as part of the environment
[Figure: Task Container. Labels: Tasking Interface (Params, Results), Data Lab Support Code, Base OS Image, <<Task>>, Disk Cache, FUSE mount, Virtual Storage]
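The Tasking Interface duties listed above (parameter setting, result collection, timeouts, stdio redirection) can be sketched with the standard library alone. Everything named here is an assumption made for illustration: `run_task`, the `name=value` argument convention, and the stand-in tasks are not part of the Data Lab design, and a local subprocess replaces the container.

```python
import subprocess
import sys

def run_task(command, params, timeout_s=10.0):
    """Run a task command with parameters; capture stdout/stderr for the caller."""
    # Assumption: parameters are passed as "name=value" command-line arguments.
    args = list(command) + [f"{k}={v}" for k, v in params.items()]
    try:
        proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "TIMEOUT", "stdout": "", "stderr": "", "returncode": None}
    status = "COMPLETED" if proc.returncode == 0 else "ERROR"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr,
            "returncode": proc.returncode}

# Stand-in tasks: tiny Python programs run in a child process.
ECHO_TASK = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
SLOW_TASK = [sys.executable, "-c", "import time; time.sleep(5)"]

ok = run_task(ECHO_TASK, {"ra": 10.0, "radius": 0.1})
late = run_task(SLOW_TASK, {}, timeout_s=0.5)
```

The returned dictionary carries the captured streams back to the caller, which is the role the slide assigns to stdio redirection.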

Compute Services / Job Manager
- Parallelizes a request based on user parameters
  - User-defined independent input list to parallelize
- Initializes a job on the remote compute server
  - Executes as sync or async job
  - UWS for job control
- Polls for completion
- Gets result objects
  - Returns results to client
  - Or, creates a new transfer job
- Manages hundreds of jobs
[Figure: sync and async job paths. Labels: Job Manager, fork(), ssh, UWS Client, Tasking Interface, stdio streams, <<Task>>]
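The Job Manager flow above (split a request over an independent input list, start jobs, poll for completion, collect results) might look roughly like the following. This is a sketch under loud assumptions: a local thread pool stands in for the fork()/ssh launch on a remote compute server, and the `Job` and `JobManager` classes are hypothetical. Only the phase names are borrowed from the UWS job phases.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class Job:
    """Tracks one task; phase names follow the UWS job phases."""
    def __init__(self, future):
        self._future = future

    @property
    def phase(self):
        if not self._future.done():
            return "EXECUTING" if self._future.running() else "QUEUED"
        return "ERROR" if self._future.exception() else "COMPLETED"

    def result(self):
        return self._future.result()

class JobManager:
    def __init__(self, max_workers=4):
        # Assumption: threads replace remote compute servers in this sketch.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit_all(self, task, inputs):
        """Asynchronous mode: start one job per independent input, return at once."""
        return [Job(self._pool.submit(task, item)) for item in inputs]

    def wait(self, jobs, poll_s=0.01):
        """Poll until every job reaches a terminal phase, then collect results."""
        while any(j.phase in ("QUEUED", "EXECUTING") for j in jobs):
            time.sleep(poll_s)
        return [j.result() for j in jobs if j.phase == "COMPLETED"]

    def run_sync(self, task, inputs):
        """Synchronous mode: submit, then block until results are available."""
        return self.wait(self.submit_all(task, inputs))

def measure(n):
    """Stand-in for a real compute task."""
    return n * n

manager = JobManager()
results = manager.run_sync(measure, [1, 2, 3, 4])
```

Failed jobs end in the `ERROR` phase and are left out of the collected results, so a caller can inspect them separately.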

Query Manager / SQL Service
Query Manager
- Provides a high-level, uniform interface for clients to query data services
- Hides the sync/async job handling and VO protocols from clients
- Orchestrates result handling (download, save to virtual storage, etc.)
SQL Service
- Provides job control for queries by implementing UWS
- Offers options for query-result handling
  - Store to personal database, virtual storage, direct download, etc.
  - Download format options (FITS, etc.)
- Offers an alternative to VO TAP
  - Greater re-use of existing DB client software
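The result-handling options described above (direct download, store to a personal database, save to virtual storage) can be sketched as a single `query` call with a destination argument. The names are assumptions for illustration only: `QueryManager.query`, the `mydb://` and `vos://` destination prefixes, and the use of in-memory SQLite and a dictionary as stand-ins for the catalog service, MyDB and virtual storage.

```python
import csv
import io
import sqlite3

class QueryManager:
    def __init__(self, catalog_db, mydb, vospace):
        self.catalog_db = catalog_db   # stand-in for a large-catalog SQL service
        self.mydb = mydb               # stand-in for the user's personal database
        self.vospace = vospace         # stand-in for virtual storage (path -> text)

    def query(self, sql, out=None):
        """Run a query and route the result according to the 'out' destination."""
        cursor = self.catalog_db.execute(sql)
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        if out is None:                       # direct download
            return columns, rows
        if out.startswith("mydb://"):         # store to personal database
            table = out[len("mydb://"):]
            if not table.isidentifier():
                raise ValueError(f"invalid table name: {table!r}")
            cols = ", ".join(f'"{c}"' for c in columns)
            marks = ", ".join("?" for _ in columns)
            self.mydb.execute(f'CREATE TABLE "{table}" ({cols})')
            self.mydb.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
            return len(rows)
        if out.startswith("vos://"):          # save to virtual storage as CSV
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(columns)
            writer.writerows(rows)
            self.vospace[out] = buf.getvalue()
            return len(rows)
        raise ValueError(f"unknown destination: {out!r}")

# Hypothetical catalog content used only for illustration.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE objects (id INTEGER, ra REAL, dec REAL)")
catalog.executemany("INSERT INTO objects VALUES (?, ?, ?)",
                    [(1, 10.0, -5.0), (2, 10.05, -5.02), (3, 50.0, 20.0)])

qm = QueryManager(catalog, mydb=sqlite3.connect(":memory:"), vospace={})
columns, rows = qm.query("SELECT id, ra FROM objects WHERE ra < 20 ORDER BY id")
stored = qm.query("SELECT id FROM objects WHERE ra < 20", out="mydb://nearby")
saved = qm.query("SELECT id FROM objects ORDER BY id", out="vos://results/all.csv")
```

The client sees one call regardless of destination, which mirrors the slide's point about hiding job handling and protocols. The sync/async distinction is left out of this sketch.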

Data Publication
- Capability is used in multiple contexts
  - Public access to high-level data products (static)
  - Private access used in workflows (transient)
  - Semi-private access within a collaboration (shared)
- Shared responsibility between D/L and users
  - D/L provides tools, resources and a publishing framework
  - Users provide the content and the scientific curation
- Low-cost, simple services for all datasets
- Higher-cost, advanced services to support collaborations

Storage Manager
- Provides a simple interface for user applications
  - Hides details of the Virtual Storage implementation (VOSpace)
  - Maps easily to idiomatic filesystem operations (e.g. get, put, list)
  - Abstracts easily to web, desktop and programmatic APIs
- Provides authenticated access to data holdings
- Manages the details for other Data Lab services
  - Endpoint resolution, authentication, etc. when used to save results
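The get / put / list mapping mentioned above can be sketched over a plain directory tree, in line with the disk-filesystem back-end described for Virtual Storage. The `StorageManager` class, the per-user directory layout, and the path check are assumptions for illustration, not the Data Lab API. No VOSpace protocol is involved here.

```python
import tempfile
from pathlib import Path

class StorageManager:
    """get/put/list over a per-user directory tree (hypothetical layout)."""
    def __init__(self, root, user):
        self._base = (Path(root) / user).resolve()
        self._base.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name):
        path = (self._base / name).resolve()
        # Refuse names that escape the user's partition (e.g. "../other").
        if path != self._base and self._base not in path.parents:
            raise PermissionError(f"path outside user space: {name!r}")
        return path

    def put(self, name, data):
        path = self._resolve(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, name):
        return self._resolve(name).read_bytes()

    def list(self, name=""):
        return sorted(p.name for p in self._resolve(name).iterdir())

root = tempfile.mkdtemp()
store = StorageManager(root, user="alice")
store.put("results/run1.csv", b"id,mag\n1,18.2\n")
listing = store.list("results")
content = store.get("results/run1.csv")
```

Because callers only see `get`, `put` and `list`, the back-end could later be swapped for a VOSpace client without changing user code.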

Authentication / Authorization
- Implementation deferred in Year 1 due to potential landmines in a changing landscape
  - General user support not needed; trusted users only
  - Year-1 services use a null interface, which marks where the service is needed in the code without requiring a working service
  - Various authentication methods under discussion
- Requests to public services are passed through automatically
  - Implies the service knows public vs. private services
- Manages user- and group-level access to resources
- Manages multiple authentication methods as needed
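The null-interface idea above can be sketched as an authenticator that accepts every caller while keeping the call sites in place, so a real implementation can be swapped in later. All names here are hypothetical: `NullAuthenticator`, `handle_request`, `DenyAll`, and the service names in `PUBLIC_SERVICES`.

```python
class NullAuthenticator:
    """Year-1 style stand-in: accepts every caller, but keeps the call sites."""
    def authenticate(self, user, token=None):
        return True

    def is_authorized(self, user, resource, action):
        return True

PUBLIC_SERVICES = {"cone_search", "tap"}   # hypothetical service names

def handle_request(auth, service, user=None, token=None):
    """Pass public requests through; route everything else via the authenticator."""
    if service in PUBLIC_SERVICES:
        return f"{service}: ok (anonymous)"
    if not auth.authenticate(user, token):
        return f"{service}: denied (authentication failed)"
    if not auth.is_authorized(user, service, "use"):
        return f"{service}: denied (not authorized)"
    return f"{service}: ok ({user})"

class DenyAll(NullAuthenticator):
    """Shows that a stricter implementation can be swapped in without code changes."""
    def authenticate(self, user, token=None):
        return False

public = handle_request(NullAuthenticator(), "cone_search")
private = handle_request(NullAuthenticator(), "mydb", user="alice")
blocked = handle_request(DenyAll(), "mydb", user="alice")
```

The request handler never changes between the null and the stricter authenticator, which is the point of placing the interface in the code before a working service exists.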