Processing big data by WS-PGRADE/gUSE and Data Avenue
http://www.sci-bus.eu
Peter Kacsuk, Zoltan Farkas, Krisztian Karoczkai, Istvan Marton, Akos Hajnal, Tamas Pinter
MTA SZTAKI
SCI-BUS is supported by the FP7 Capacities Programme under contract no. RI-283481
Processing big data by workflows
Processing big data often requires a set of activities that can be combined into a scientific workflow, so that the activities can be repeated automatically for a large set of data components. Scientific workflows that can run in Globus-based DCIs and can access large data storages are therefore crucial for processing big data.
The SCI-BUS approach
- Integrate workflows with the Data Avenue services
- Run these workflows in an environment that can execute the nodes of a workflow in many different types of DCIs, to achieve:
  - Highly parallel and distributed workflow execution
  - Workflow-level interoperability among DCIs and data storages
- The environment offered by SCI-BUS is the WS-PGRADE/gUSE gateway framework
WS-PGRADE/gUSE
- General-purpose, workflow-oriented gateway framework based on Liferay
- Supports the development and execution of workflow-based applications
- Supports the fast development of domain-specific gateways by a customization technology
- Most important design aspects are flexibility and robustness
Flexibility in exploiting parallelism
- Multiple instances of the same workflow with different data files
- Parallel execution inside a workflow node
- Parallel execution among workflow nodes
- Parameter study execution of the workflow
- Multiple jobs run in parallel
- Each job can be a parallel program
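The parameter-study idea on this slide (the same job run once per input file, with the runs executing in parallel) can be sketched in a few lines of Python. This is an illustration only, not WS-PGRADE/gUSE code: `run_job` and `parameter_study` are hypothetical names, and a local thread pool stands in for submission to a DCI.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(data_file: str) -> str:
    """Hypothetical stand-in for one workflow node processing one input file."""
    return f"processed:{data_file}"


def parameter_study(data_files, max_workers=4):
    """Run the same job once per input file in parallel.

    Results are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, data_files))


if __name__ == "__main__":
    for result in parameter_study(["a.dat", "b.dat", "c.dat"]):
        print(result)
```

In a real gateway each call would be a job submitted to a cluster, grid or cloud resource, and each job could itself be a parallel program.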
Flexibility of using various DCIs
- Flexible management of security:
  - Individual user certificates
  - Robot certificates
- Flexible access to various types of DCIs:
  - Clusters (PBS, LSF, MOAB, SGE)
  - Cluster grids (ARC, gLite, GT2, GT4, GT5, UNICORE)
  - Supercomputers (e.g. via UNICORE)
  - Desktop grids (BOINC)
  - Clouds
Using IGE Globus resources in the DRIHM gateway
[Figure label: start.sh $modelname $jobid]
Flexibility in data storage access
- Use the Data Avenue Blacktop service:
  - To access data storages in different DCIs
  - To transfer files among the storages of different DCIs
  - To upload/download files to/from the storages of different DCIs
- Data Avenue Liferay portlet to access the data transfer services of Data Avenue Blacktop
- See details: http://data-avenue.eu/home
- Currently supported protocols: http, https, ftp, gsiftp, srm, S3 (iRODS in beta phase)
- Protocols coming soon: LFC, further cloud storage protocols
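As a small illustration of the protocol list above, a client could check a storage URI's scheme before attempting a transfer. The sketch below is not part of Data Avenue. The function name `protocol_status` is hypothetical, and the protocol sets are copied from this slide, so they reflect the state at the time of the talk.

```python
from urllib.parse import urlparse

# Protocol lists copied from the slide (state at the time of the talk).
SUPPORTED = {"http", "https", "ftp", "gsiftp", "srm", "s3"}
BETA = {"irods"}


def protocol_status(uri: str) -> str:
    """Classify a storage URI by its scheme: supported, beta or unsupported."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in SUPPORTED:
        return "supported"
    if scheme in BETA:
        return "beta"
    return "unsupported"


if __name__ == "__main__":
    for uri in ("gsiftp://host.example.org/data/f.dat", "irods://zone/home"):
        print(uri, protocol_status(uri))
```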
Data Avenue services
[Figure labels: Data Avenue @ SZTAKI; Data Avenue @ XY; Data Avenue Portlet; WS-PGRADE gateway; Data Avenue Blacktop service; OpenStack; Amazon; gLite; GT5; FS1, FS2, FS3, ..., FSn]
Use cases to be supported
- Browse, download, upload
- Create dir, remove item
[Figure labels: Data Avenue; Produce data; Use data; Storage Service (x3)]
EGI Community Forum 2014, Helsinki, Finland
Data Avenue services
- Data Avenue Blacktop:
  - Core service accessible through SOAP (Java API provided)
  - Hides access details of storage services
- Data Avenue Portlet:
  - User-friendly interface to manage data, upload and download files, ...
  - Can be deployed onto any Liferay-based portal
- Data Avenue in WS-PGRADE/gUSE:
  - Integration in a science gateway enabling easy data usage from workflows
Data Avenue Blacktop
- Core service accessible through SOAP
- File transfers
- Directory operations
- Easy to add new protocols using the Adaptor interface
  - HTTP(S), SFTP, SRM, GSIFTP, S3
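The adaptor idea on this slide (one interface, one implementation per protocol, selected by URI scheme) can be sketched as follows. This is a generic illustration of the pattern and not the actual Blacktop Adaptor interface, which is a Java API. Every class and method name here is hypothetical, and the `mem` scheme is a toy in-memory backend used only to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageAdaptor(ABC):
    """Hypothetical adaptor interface: one subclass per storage protocol."""

    schemes: tuple = ()

    @abstractmethod
    def list_dir(self, uri: str) -> list:
        """Return the entry names found under the given directory URI."""


class InMemoryAdaptor(StorageAdaptor):
    """Toy adaptor backed by a dict that maps directory paths to entry names."""

    schemes = ("mem",)

    def __init__(self, tree: dict):
        self._tree = tree

    def list_dir(self, uri: str) -> list:
        return sorted(self._tree.get(urlparse(uri).path, []))


class AdaptorRegistry:
    """Dispatches an operation to the adaptor registered for the URI scheme."""

    def __init__(self):
        self._by_scheme = {}

    def register(self, adaptor: StorageAdaptor) -> None:
        for scheme in adaptor.schemes:
            self._by_scheme[scheme.lower()] = adaptor

    def list_dir(self, uri: str) -> list:
        scheme = urlparse(uri).scheme.lower()
        if scheme not in self._by_scheme:
            raise ValueError(f"no adaptor registered for scheme {scheme!r}")
        return self._by_scheme[scheme].list_dir(uri)


if __name__ == "__main__":
    registry = AdaptorRegistry()
    registry.register(InMemoryAdaptor({"/data": ["b.txt", "a.txt"]}))
    print(registry.list_dir("mem://store/data"))
```

Supporting a new protocol then means writing one more adaptor class and registering it. The calling code does not change.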
Data Avenue Blacktop API
- Java API available: https://data-avenue.eu/webservice-api-doc
- Designed with ease of use in mind
Data Avenue Blacktop usage
- An API or portlet ticket must be requested: https://data-avenue.eu/ticket-request-form
- The ticket is used to identify DA Blacktop clients
Data Avenue Portlet
- Try it for yourself: https://data-avenue.eu/
- Also available as a JSR-286 portlet (can be deployed on e.g. Liferay)
- Included in WS-PGRADE releases
- Two-panel layout
- Data upload and download
- Copy/move
- Favorites
- Progress monitoring
[Figure: Data Avenue @ SZTAKI]
[Figure: Data Avenue Liferay portlet]
Generic data transfer among WS-PGRADE workflow nodes
[Figure labels: WS-PGRADE workflow with jobs J1, J2, J3, J4, J5; DCI1; DCI2; file storages FS1, FS2, FS3, FS5]
The Data Avenue Blacktop services are available not only through the Data Avenue portlet but also to the nodes of a WS-PGRADE workflow.
J: job. FS: file storage system, e.g. gsiftp, iRODS, SRM.
Data Avenue in WS-PGRADE/gUSE
- Data sources and destinations of jobs can be selected
- gUSE automatically manages data transfers using Blacktop
- The actual transfer is delegated to the worker node wherever possible (two-phase upload and download), bypassing the Blacktop service if the middleware is capable of handling the protocol
- To be released before summer
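The routing rule described on this slide can be read as a simple decision: if the target middleware handles the protocol of a job's data source natively, the file is fetched directly; otherwise the transfer goes through the Blacktop service. The sketch below only illustrates that decision under this reading. It is not gUSE code, and `plan_transfer` and its return values are hypothetical.

```python
from urllib.parse import urlparse


def plan_transfer(source_uri: str, middleware_protocols) -> str:
    """Decide how a job input is fetched.

    Returns "direct" when the middleware can handle the URI's protocol itself
    (bypassing Blacktop), otherwise "via-blacktop".
    """
    scheme = urlparse(source_uri).scheme.lower()
    native = {protocol.lower() for protocol in middleware_protocols}
    return "direct" if scheme in native else "via-blacktop"


if __name__ == "__main__":
    # Assumed example: a middleware that natively speaks gsiftp and srm.
    print(plan_transfer("gsiftp://se.example.org/data/in.dat", ["gsiftp", "srm"]))
    print(plan_transfer("s3://bucket/in.dat", ["gsiftp", "srm"]))
```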
Comparison with Globus Online
1. Globus Online is excellent inside a Globus grid
2. But it supports only the Globus storage protocols
3. It cannot be used inside a workflow
4. Data Avenue is a generalization of Globus Online
5. It enables access to many different types of storages, even in a workflow that runs across several kinds of DCIs
6. This technology enables the easy integration of Globus and cloud resources at workflow level
Flexibility for collaboration among community members
[Figure labels: SHIWA Repository; WF upload; WF download; gUSE Portal and gUSE WF Repo (x2); Cloud 1 (OpenNebula); Cloud 2 (Amazon); Cloud n (OpenStack)]
Flexibility in using different workflow systems
[Figure labels: Cyberspace; WS-PGRADE; Gateways; Bio1, Bio2, ..., BioN; Taverna; Galaxy; Kepler; WF systems; EMI; Grids; Globus; Infrastructures; Cloud]
By combining SCI-BUS and SHIWA technologies (supported by ER-Flow), users can access and use many WFs and many infrastructures in an interoperable way, no matter which WF system is their home system.
Flexibility of gateway types and user views
1. General-purpose gateways for clouds (workflow view)
   - Core WS-PGRADE/gUSE (e.g. Greek NGI)
2. General-purpose gateway for specific technologies (workflow view)
   - SHIWA gateway for workflow sharing and interoperation
3. Domain-specific science gateway instances
   - Autodock gateway (end-user view)
   - Swiss proteomics portal (customized GUI using ASM API)
   - VisIVO Mobile (use of Remote API)
Some examples of SCI-BUS domain-specific gateways
The DRIHM project's gateway
[Figure label: Other data sources]
gUSE-based gateways
- More than 100 deployments worldwide
- More than 15,000 downloads from 75 countries on SourceForge
Conclusions
Join SCI-BUS as an associated member
Why select WS-PGRADE/gUSE and join the SCI-BUS community?
1. Robustness
   - A large number of gateways are already used in production
2. Sustainability
   - The SCI-BUS project and its sustainability and commercialization plan guarantee it
3. Functionalities
   - Rich functionalities that are growing according to the needs of the SCI-BUS and SourceForge communities
4. How easy is it to adapt to the needs of a new user community?
   - A large number of gateways have already been customized from gUSE/WS-PGRADE
5. You can influence the progress of WS-PGRADE/gUSE
Where to find further information?
- SCI-BUS web page: http://www.sci-bus.eu/
- gUSE/WS-PGRADE: http://www.guse.hu/
- gUSE on SourceForge:
  - http://sourceforge.net/projects/guse/
  - http://sourceforge.net/projects/guse/forums/forum/
  - http://sourceforge.net/projects/guse/develop