Processing big data by WS-PGRADE/gUSE and Data Avenue




http://www.sci-bus.eu
Peter Kacsuk, Zoltan Farkas, Krisztian Karoczkai, Istvan Marton, Akos Hajnal, Tamas Pinter
MTA SZTAKI
SCI-BUS is supported by the FP7 Capacities Programme under contract no. RI-283481

Processing big data by workflows
Processing big data often requires a set of activities that can be combined into a scientific workflow, so that the activities can be repeated automatically over a large set of data items. Scientific workflows that can run in Globus-based distributed computing infrastructures (DCIs) and access large data storages are therefore crucial for processing big data.

The SCI-BUS approach
- Integrate workflows with the Data Avenue services
- Run these workflows in an environment that can execute the nodes of a workflow in many different types of DCIs, in order to achieve:
  - Highly parallel and distributed workflow execution
  - Workflow-level interoperability among DCIs and data storages
- The environment offered by SCI-BUS is the WS-PGRADE/gUSE gateway framework

WS-PGRADE/gUSE
- General-purpose, workflow-oriented gateway framework based on Liferay
- Supports the development and execution of workflow-based applications
- Supports the fast development of domain-specific gateways through a customization technology
- The most important design aspects are flexibility and robustness

Flexibility in exploiting parallelism
- Multiple instances of the same workflow with different data files
- Parallel execution inside a workflow node
- Parallel execution among workflow nodes
- Parameter study execution of the workflow
- Multiple jobs run in parallel
- Each job can be a parallel program
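The parameter study idea above can be illustrated with a minimal Python sketch. It is not WS-PGRADE/gUSE code: `run_job`, `parameter_study`, and the parameter names `alpha` and `dataset` are hypothetical, and a local thread pool stands in for the DCI that would execute the jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_job(params):
    """Hypothetical stand-in for one workflow job; a real job would run on a DCI."""
    alpha, dataset = params
    return f"{dataset}:alpha={alpha}"


def parameter_study(alphas, datasets, max_workers=4):
    """Run one job per (alpha, dataset) combination, several at a time."""
    combos = list(product(alphas, datasets))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(run_job, combos))


if __name__ == "__main__":
    for result in parameter_study([1, 2], ["a", "b"]):
        print(result)
```

Each combination of input values becomes an independent job, which is why this kind of execution parallelizes so easily.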

Flexibility of using various DCIs
- Flexible management of security:
  - Individual users' certificates
  - Robot certificates
- Flexible access to various types of DCIs:
  - Clusters (PBS, LSF, MOAB, SGE)
  - Cluster grids (ARC, gLite, GT2, GT4, GT5, UNICORE)
  - Supercomputers (e.g. via UNICORE)
  - Desktop grids (BOINC)
  - Clouds

Using IGE Globus resources in the DRIHM gateway
[Figure; recoverable label: start.sh $modelname $jobid]

Flexibility in data storage access
- Use the Data Avenue Blacktop service:
  - To access data storages in different DCIs
  - To transfer files among the storages of different DCIs
  - To upload/download files to/from the storages of different DCIs
- Data Avenue Liferay portlet to access the data transfer services of Data Avenue Blacktop
- See details: http://data-avenue.eu/home
- Currently supported protocols: http, https, ftp, gsiftp, srm, S3 (iRODS in beta phase)
- Protocols coming soon: LFC, further cloud storage protocols
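As an illustration of the protocol list above, the following Python sketch classifies a storage URI by its scheme. It is only a client-side check written for this text, not part of Data Avenue: the function name `protocol_status`, the example hosts, and the mapping of the listed protocol names to lower-case URI schemes (including `s3` and `irods`) are assumptions.

```python
from urllib.parse import urlparse

# Protocols listed on the slide, written as lower-case URI schemes (assumed spelling).
SUPPORTED_SCHEMES = {"http", "https", "ftp", "gsiftp", "srm", "s3"}
BETA_SCHEMES = {"irods"}


def protocol_status(uri):
    """Classify a storage URI as 'supported', 'beta' or 'unsupported' by its scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in SUPPORTED_SCHEMES:
        return "supported"
    if scheme in BETA_SCHEMES:
        return "beta"
    return "unsupported"


if __name__ == "__main__":
    # Hypothetical example URIs.
    for uri in ("gsiftp://grid.example.org/data/in.dat",
                "irods://zone.example.org/home/user",
                "lfc://lfc.example.org/grid/vo"):
        print(uri, "->", protocol_status(uri))
```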

Data Avenue services
[Figure labels: Data Avenue @ SZTAKI, Data Avenue @ XY, Data Avenue Portlet, WS-PGRADE gateway, Data Avenue Blacktop service, storages FS1, FS2, FS3, FSn, OpenStack, Amazon, gLite, GT5]

Use cases to be supported
- Browse, download, upload
- Create dir, remove item
[Figure labels: Data Avenue, Produce data, Use data, Storage Service (three instances)]
EGI Community Forum 2014, Helsinki, Finland

Data Avenue services
- Data Avenue Blacktop:
  - Core service accessible through SOAP (Java API provided)
  - Hides access details of storage services
- Data Avenue Portlet:
  - User-friendly interface to manage data, upload and download files, ...
  - Can be deployed onto any Liferay-based portal
- Data Avenue in WS-PGRADE/gUSE:
  - Integration in a science gateway, enabling easy data usage from workflows

Data Avenue Blacktop
- Core service accessible through SOAP
- File transfers
- Directory operations
- Easy to add new protocols using the Adaptor interface
- HTTP(S), SFTP, SRM, GSIFTP, S3
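The adaptor idea can be sketched as follows. This is a generic illustration of a pluggable-adaptor design in Python, not the actual Data Avenue Adaptor interface (which is Java and is not shown in these slides); every class and method name here, and the `mem` scheme, is hypothetical.

```python
from abc import ABC, abstractmethod


class StorageAdaptor(ABC):
    """Hypothetical adaptor contract: one implementation per storage protocol."""

    schemes = ()  # URI schemes this adaptor handles

    @abstractmethod
    def list_dir(self, uri):
        """Return the entry names found under the given directory URI."""


class InMemoryAdaptor(StorageAdaptor):
    """Toy adaptor backed by a dict, used only to make the sketch runnable."""

    schemes = ("mem",)

    def __init__(self, tree):
        self.tree = tree

    def list_dir(self, uri):
        return sorted(self.tree.get(uri, []))


class AdaptorRegistry:
    """Dispatches each request to the adaptor registered for the URI scheme."""

    def __init__(self):
        self._by_scheme = {}

    def register(self, adaptor):
        for scheme in adaptor.schemes:
            self._by_scheme[scheme] = adaptor

    def list_dir(self, uri):
        scheme = uri.split("://", 1)[0].lower()
        adaptor = self._by_scheme.get(scheme)
        if adaptor is None:
            raise ValueError(f"no adaptor registered for scheme '{scheme}'")
        return adaptor.list_dir(uri)


if __name__ == "__main__":
    registry = AdaptorRegistry()
    registry.register(InMemoryAdaptor({"mem://store/": ["b.txt", "a.txt"]}))
    print(registry.list_dir("mem://store/"))
```

With this kind of design, supporting a new protocol means writing one more adaptor class and registering it; callers of the core service do not change.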

Data Avenue Blacktop API
- Java API available: https://data-avenue.eu/webservice-api-doc
- Designed with ease of use in mind

Data Avenue Blacktop usage
- An API or portlet ticket must be requested: https://data-avenue.eu/ticket-request-form
- The ticket is used to identify DA Blacktop clients

Data Avenue Portlet
- Try it for yourself: https://data-avenue.eu/
- Also available as a JSR-286 portlet (can be deployed over e.g. Liferay)
- Included in WS-PGRADE releases
- Two-panel layout
- Data upload and download
- Copy/move
- Favorites
- Progress monitoring

[Figure: Data Avenue @ SZTAKI]

[Figures: Data Avenue Liferay portlet (four slides)]

Generic data transfer among WS-PGRADE workflow nodes
[Figure labels: WS-PGRADE Workflow, DCI1, DCI2, jobs J1 to J5, file storages FS1, FS2, FS3, FS5]
The Data Avenue Blacktop services are available not only through the Data Avenue portlet but also to the nodes of a WS-PGRADE workflow.
J: job. FS: file storage system, e.g. gsiftp, iRODS, SRM.

Data Avenue in WS-PGRADE/gUSE
- Data sources and destinations of jobs can be selected
- gUSE automatically manages data transfers using Blacktop
- The actual transfer is delegated to the worker node wherever possible (two-phase upload and download), bypassing the Blacktop service if the middleware is capable of handling the protocol
- To be released before summer
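The bypass rule above can be written as a small decision function. This is only a sketch of the rule as stated on the slide, not gUSE code: `plan_transfer`, the return labels `direct` and `blacktop`, and the example URIs are hypothetical.

```python
def plan_transfer(storage_uri, middleware_protocols):
    """Choose a transfer route for one job input or output.

    Returns 'direct' when the target middleware can handle the URI's protocol
    itself (the Blacktop service is bypassed), otherwise 'blacktop'.
    """
    scheme = storage_uri.split("://", 1)[0].lower()
    handled = {protocol.lower() for protocol in middleware_protocols}
    return "direct" if scheme in handled else "blacktop"


if __name__ == "__main__":
    # Hypothetical case: a middleware that understands gsiftp and srm only.
    print(plan_transfer("gsiftp://se.example.org/data/in.dat", ["gsiftp", "srm"]))
    print(plan_transfer("s3://bucket/in.dat", ["gsiftp", "srm"]))
```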

Comparison with Globus Online
1. Globus Online is excellent inside a Globus grid
2. But it supports only the Globus storage protocols
3. It cannot be used inside a workflow
4. Data Avenue is a generalization of Globus Online
5. It enables access to many different types of storage, even in a workflow that runs across several kinds of DCIs
6. This technology enables the easy integration of Globus and cloud resources at workflow level

Flexibility for collaboration among community members
[Figure labels: SHIWA Repository, WF upload, WF download, gUSE Portal and gUSE WF Repo (two instances each), Cloud 1 (OpenNebula), Cloud 2 (Amazon), Cloud n (OpenStack)]

Flexibility in using different workflow systems
[Figure labels: Cyberspace; WS-PGRADE Gateways (Bio1, Bio2, BioN); WF systems (Taverna, Galaxy, Kepler); Infrastructures (EMI Grids, Globus, Cloud)]
By combining SCI-BUS and SHIWA technologies (supported by ER-Flow), users can access and use many WFs and many infrastructures in an interoperable way, no matter which is their home WF system.

Flexibility of gateway types and user views
1. Generic-purpose gateways for clouds (workflow view)
   - Core WS-PGRADE/gUSE (e.g. Greek NGI)
2. Generic-purpose gateway for specific technologies (workflow view)
   - SHIWA gateway for workflow sharing and interoperation
3. Domain-specific science gateway instances
   - Autodock gateway (end-user view)
   - Swiss proteomics portal (customized GUI using ASM API)
   - VisIVO Mobile (use of Remote API)

[Figure: Some examples of SCI-BUS domain-specific gateways]

The DRIHM project's gateway
[Figure label: Other data sources]

gUSE-based gateways
- More than 100 deployments worldwide
- More than 15,000 downloads from 75 countries on SourceForge

Conclusions
Join SCI-BUS as an associated member.
Why select WS-PGRADE/gUSE and join the SCI-BUS community?
1. Robustness: a large number of gateways are already used in production
2. Sustainability: guaranteed by the SCI-BUS project and its sustainability and commercialization plan
3. Functionality: rich functionality that grows according to the needs of the SCI-BUS and SourceForge communities
4. Ease of adaptation to the needs of a new user community: a large number of gateways have already been customized from gUSE/WS-PGRADE
5. You can influence the progress of WS-PGRADE/gUSE

Where to find further information?
- SCI-BUS web page: http://www.sci-bus.eu/
- gUSE/WS-PGRADE: http://www.guse.hu/
- gUSE on SourceForge:
  - http://sourceforge.net/projects/guse/
  - http://sourceforge.net/projects/guse/forums/forum/
  - http://sourceforge.net/projects/guse/develop