Development and Execution of Collaborative Application on the ViroLab Virtual Laboratory




Marek Kasztelnik 3, Tomasz Gubała 2,3, Maciej Malawski 1, and Marian Bubak 1,3

1 Institute of Computer Science AGH, al. Mickiewicza 30, 30-059 Kraków, Poland
2 Informatics Institute, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
3 ACC CYFRONET AGH, ul. Nawojki 11, 30-950 Kraków, Poland
email: m.kasztelnik@cyfronet.pl

Abstract

Creating applications that use distributed Grid resources is a complex and time-consuming process. An integrated environment is needed to help developers and end users create, test and execute such applications. This paper shows how to develop and execute collaborative applications on the ViroLab virtual laboratory. It also presents the collaboration tools that support communication between end users (scientists) and developers.

Keywords: virtual laboratory, collaborative applications, collaborative environment, ViroLab, e-science experiments, grid

1 Introduction

Modern scientific practice in areas such as the investigation of HIV drug resistance requires collaborative sharing, processing and analysis of virological, immunological, clinical and experimental data, as well as advanced tools for (bio)statistical analysis, visualization, modelling and simulation [1]. A process that integrates these computational tools and data and leads to results relevant to the application domain is called an in-silico experiment. An experiment in the virtual laboratory combines data and activities that are available on the distributed Web- and Grid-based infrastructure, and it needs to orchestrate them in possibly complex scenarios.

A common approach to experiment orchestration is to use one of the many scientific workflow systems available for the Grid, such as Pegasus [2], Triana [3] and K-WfGrid [4]. They are intended to assist non-programmers in developing applications; however, workflows with many components and complex interactions can become difficult to understand and use. To overcome these limitations of workflow systems, we decided to define an experiment plan notation based on a high-level scripting language, namely Ruby [5]. An experiment plan is a Ruby script. Ruby offers a concise and clear syntax combined with a full set of control structures, so experiments of arbitrary complexity can be expressed.
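To illustrate what a scripting notation with ordinary control structures looks like in practice, the following is a minimal sketch of an experiment plan written as plain Ruby. The service stand-ins (`CLEAN`, `SCORE`), the `run_plan` method and the sample data are all hypothetical and are not part of the ViroLab API; a real plan would call remote resources rather than local lambdas, and the score carries no biological meaning.

```ruby
# Toy stand-ins for remote computational services (hypothetical, not ViroLab API).
CLEAN = ->(raw) { raw.upcase.delete('^ACGT') }                # keep only A, C, G, T
SCORE = ->(seq, drug) { (seq.count('G') + drug.length) % 3 }  # dummy score, no biology

# The "experiment plan": ordinary loops and conditionals orchestrate the steps.
def run_plan(samples, drugs)
  report = {}
  samples.each do |id, raw|
    seq = CLEAN.call(raw)
    next if seq.empty?                      # skip samples with no usable data
    report[id] = drugs.each_with_object({}) do |drug, row|
      row[drug] = SCORE.call(seq, drug).zero? ? 'flag' : 'ok'
    end
  end
  report
end
```

Calling `run_plan({ 'p1' => 'acgtn', 'p2' => '---' }, %w[aaa bb])` processes the first sample and skips the second, because nothing usable remains after cleaning. The early exit and the nested iteration are the kind of logic that tends to be awkward to draw as a workflow graph.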

Fig. 1: Application lifecycle

The process of experiment planning and execution in the ViroLab virtual laboratory is collaborative in the sense that the virtual laboratory supports the cooperation of multiple experiment developers and users (see Fig. 1).

2 Experiment planning

When a developer prepares an experiment script, it can be published in the experiment repository and thus become available to others. A scientist who does not intend to get into the details of scripting can then access the virtual laboratory through a portal and execute the published experiments using a Web browser, providing only input data when necessary. The main idea is that experiments in the repository can be shared and reused, which is an efficient way of promoting collaboration between scientists. Provenance data related to the experiment is also recorded and available for queries, which makes the results more reliable, reproducible and scientifically relevant.

The first stage of the experiment lifecycle is performed by the experiment developer, whose task (with the assistance of the domain researcher) is to develop an experiment plan using a scripting notation. To hide the sophisticated details of the underlying Grid infrastructure, a high-level object-oriented API has been introduced. It allows the developer to define which computational functionality is required without specifying how to access it with the available middleware. Uniform access to computational resources in a Grid environment is possible thanks to three levels of abstraction that describe resources, namely Grid Object Class, Grid Object Implementation and Grid Object Instance. When creating an experiment, the developer may use only the highest level of resource description; however, a developer who needs to retain full control over the experiment plan can specify all the technical details on one of the lower levels of abstraction.
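The three levels of abstraction can be pictured with a small data model. The sketch below is only an illustration under stated assumptions: the struct names mirror the terms used in the text, but the fields, the `resolve` method, the example endpoints and the "pick the least loaded instance" rule are hypothetical and do not describe the actual ViroLab implementation.

```ruby
# Lowest level: a concrete running resource (fields are assumptions).
GridObjectInstance = Struct.new(:endpoint, :load)

# Middle level: one technology-specific realisation of a class.
GridObjectImplementation = Struct.new(:technology, :instances)

# Highest level: the functionality the developer asks for.
GridObjectClass = Struct.new(:name, :implementations) do
  # With no arguments, only the highest level is used and the choice is automatic.
  # Optional keywords pin the lower levels for developers who want full control.
  def resolve(technology: nil, endpoint: nil)
    impls = implementations.select { |i| technology.nil? || i.technology == technology }
    candidates = impls.flat_map(&:instances)
    candidates = candidates.select { |c| c.endpoint == endpoint } if endpoint
    # Selection rule assumed for illustration: least loaded instance wins.
    candidates.min_by(&:load) || raise(ArgumentError, "no instance of #{name} matches")
  end
end

aligner = GridObjectClass.new('SequenceAligner', [
  GridObjectImplementation.new(:web_service, [
    GridObjectInstance.new('http://host-a.example/aligner', 0.7),
    GridObjectInstance.new('http://host-b.example/aligner', 0.2)
  ]),
  GridObjectImplementation.new(:component, [
    GridObjectInstance.new('component://host-c.example/aligner', 0.5)
  ])
])
```

Here `aligner.resolve` stays at the class level, `aligner.resolve(technology: :component)` pins the implementation, and `aligner.resolve(endpoint: 'http://host-a.example/aligner')` pins a single instance.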
When a resource is registered in the virtual laboratory, it is available to the whole community, and other developers can reuse it in new applications. Another feature of the ViroLab virtual laboratory that is useful during development is the high-level API for connecting to and querying various databases. Thanks to the OGSA-DAI [9] system, which is accessible through this API, applications are able to query data located in distributed data sources. It is an important tool that allows users to share experiment results with the whole community.

When preparing the experiment plan, the experiment developer uses the Experiment Planning Environment (EPE) [10], based on the Eclipse [11] platform, which offers a user-friendly script editor integrated with a set of tools. The developer can use the graphical browser of the semantic-web-based Domain Ontology Store to discover available data and computational services. The browser is coupled with the Grid Resources Registry, which provides the available operations that can be invoked directly from a script. To facilitate collaboration, EPE is integrated with the Experiment Repository, which is based on the Subversion (SVN) version control system; as a result, many developers can work on a single experiment or share experiment code. When the experiment is ready (it fulfills all requirements of the end user), the developer uses a dedicated EPE wizard to release it. Once released, it instantly becomes available to the scientific community through the web interface. Moreover, a clear releasing and versioning policy is very important for provenance data, which has to be connected with a specific version of the experiment.

3 Experiment execution

A script can be executed during the development phase from the EPE, which is integrated with the GridSpace Engine [12] (GSEngine, Fig. 2), the core of the runtime system. GSEngine includes the Grid Operation Invoker [13, 14], which translates high-level operations specified in the script into concrete invocations on computational resources using appropriate technologies.
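The translation step performed by an invoker can be sketched as a dispatch on technology. The code below is a hypothetical illustration, not the Grid Operation Invoker itself: `OperationInvoker`, `GridObjectProxy`, the adapter registration scheme and the example endpoint are all assumptions, and the adapters only return a description of the call instead of contacting a remote resource.

```ruby
# Maps a technology name to an adapter that knows how to perform the call.
class OperationInvoker
  def initialize
    @adapters = {}
  end

  def register(technology, &adapter)
    @adapters[technology] = adapter
  end

  def invoke(technology, endpoint, operation, args)
    adapter = @adapters.fetch(technology) do
      raise ArgumentError, "no adapter for #{technology}"
    end
    adapter.call(endpoint, operation, args)
  end
end

# Script-level object: any method call is forwarded to the invoker.
class GridObjectProxy
  def initialize(invoker, technology, endpoint)
    @invoker = invoker
    @technology = technology
    @endpoint = endpoint
  end

  def method_missing(name, *args)
    @invoker.invoke(@technology, @endpoint, name, args)
  end

  # The sketch accepts every operation name; a real proxy would check a registry.
  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

invoker = OperationInvoker.new
# Stub adapter: records what would be sent rather than doing any networking.
invoker.register(:web_service) { |endpoint, op, args| [:web_service, endpoint, op, args] }

aligner = GridObjectProxy.new(invoker, :web_service, 'http://host-a.example/aligner')
```

With this in place, a script line such as `aligner.align('ACGT')` reads like a local method call, while the choice of protocol stays inside the registered adapter.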
After an experiment plan is developed, tested and released, a scientist can use a dedicated web interface, the Experiment Management Interface (EMI) [10], to execute the experiment. The main advantage of the web-based interface is that the scientist does not need to install any additional software (only a web browser with JavaScript support is required). This tool hides the complexity of the underlying technology. Scientists can browse released applications, read their documentation and execute them. EMI is also connected with the GridSpace Engine, which is able to connect to the SVN repository, download the experiment and run it. While the application executes, data for the PROToS provenance system [15, 16] is recorded and stored in a dedicated provenance storage. Scientists can search this information through the QUery TRanslation tools (QUaTRO) [16, 17], a web interface dedicated to searching provenance data.
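The idea of recording provenance during execution and querying it afterwards can be sketched in a few lines. This is a hypothetical in-memory model, not the PROToS or QUaTRO interface: the `ProvenanceLog` class, its event fields, and the experiment name and version label in the example are assumptions chosen to show why each record is tied to a specific experiment version.

```ruby
# In-memory stand-in for a provenance store (hypothetical, for illustration only).
class ProvenanceLog
  Event = Struct.new(:experiment, :version, :operation, :inputs, :output)

  def initialize
    @events = []
  end

  # Runs the given block as one experiment step and records what happened.
  def record(experiment, version, operation, inputs)
    output = yield
    @events << Event.new(experiment, version, operation, inputs, output)
    output
  end

  # Returns all events whose fields equal the given criteria.
  def where(**criteria)
    @events.select { |e| criteria.all? { |field, value| e[field] == value } }
  end
end

log = ProvenanceLog.new
cleaned = log.record('demo_experiment', 'r12', :clean, ['acgt']) { 'ACGT' }
log.record('demo_experiment', 'r13', :clean, ['ttga']) { 'TTGA' }
```

A query such as `log.where(version: 'r12')` then returns only the steps executed by that release of the experiment, which is what makes a result traceable to the exact script that produced it.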

Fig. 2: Execution of a collaborative application on the ViroLab virtual laboratory

4 Feedback from the end user to the experiment developer

Experiment development is a long-lasting process: bugs may occur in the code, not every user requirement is met correctly, additional enhancements need to be implemented, etc. In most cases, these problems are discovered by the end user of the application during its execution. The ViroLab virtual laboratory helps to solve such problems by simplifying communication between the experiment user and its developer. EMI allows the end user to submit feedback, which the developer can browse using EPE, respond to, and take into account when creating a new version of the experiment. Scientists can track the progress of the new version of the experiment using a simple SVN client (standalone or web-based) and help the developer during this phase (e.g. with further comments).

5 Conclusions and future work

The unique feature of the virtual laboratory developed for ViroLab is that, by providing a set of user-friendly tools, it enables both advanced experiment developers and domain scientists to collaborate productively and conduct their research in a modern, highly distributed environment. Thanks to the scripting language, even complex experiments can be defined easily while remaining at a high level of abstraction and concealing the details of the underlying Grid middleware. Currently, the first prototype of the integrated virtual laboratory is released, installed and accessible to experiment developers and scientists (see [18] for software download and access to the Experiment Management Environment), who are starting to set up communities that share data, resources, experiment scripts and knowledge.

Future work will concentrate on providing additional methodologies and tools for creating applications based on the data, resources and experience of the community. It is worth mentioning that those applications are created by many developers and are available to many scientists. Work on the management of results produced by experiments is currently in progress. Afterwards, data will be connected with an ontological description of the environment. Consequently, finding interesting information will become easier, and connecting it with provenance data will make it possible to trace the origin of this data. This functionality is very important for virologists, who are the main end users of the ViroLab virtual laboratory.

Acknowledgements

This work was partially funded by the European Commission, Project ViroLab IST-027446, the related Polish grant SPUB-M and the Foundation for Polish Science.

References

1. Peter M.A. Sloot, Ilkay Altintas, Marian Bubak, Charles A. Boucher: From Molecule to Man: Decision Support in Individualized E-Health; IEEE Computer Society, vol. 39, no. 11, pp. 40-46, Nov. 2006.
2. Ewa Deelman, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Sonal Patil, Mei-Hui Su, Karan Vahi, and Miron Livny: Pegasus: Mapping scientific workflows onto the grid. In Grid Computing: Second European AcrossGrids Conference, AxGrids, volume 3165 of Lecture Notes in Computer Science, pages 11-20. Springer, 2004.
3. Ian Taylor, Matthew Shields, Ian Wang, and Andrew Harrison: Visual grid workflow in Triana. Journal of Grid Computing, 3(3-4):153-169, September 2005.
4. Tomasz Gubała and Andreas Hoheisel: Highly dynamic workflow orchestration for scientific applications. In CoreGRID Integration Workshop 2006 (CIW06), pages 309-320. ACC CYFRONET AGH, 2006.
5. Ruby programming language. http://www.ruby-lang.org
6. Web Service. http://www.w3.org/2002/ws
7. Web Service Resource Framework. http://www.oasis-open.org/committees/wsrf
8. Maciej Malawski, Marian Bubak, Michał Placek, Dawid Kurzyniec, and Vaidy Sunderam: Experiments with distributed component computing across grid boundaries. In Proceedings of the HPC-GECO/CompFrame workshop in conjunction with HPDC 2006, Paris, France, 2006.
9. OGSA-DAI homepage. http://www.ogsadai.org.uk

10. Włodzimierz Funika, Daniel Harężlak, Dariusz Król, Piotr Pęgiel, Marian Bubak: Developer and User Interfaces to the Virolab Virtual Laboratory. In Proceedings of Cracow Grid Workshop 2007, this volume.
11. Eclipse - an open development platform. www.eclipse.org
12. Eryk Ciepiela, Joanna Kocot, Tomasz Gubała, Maciej Malawski, Marek Kasztelnik, Marian Bubak: Virtual Laboratory Engine - GridSpace Engine. In Proceedings of Cracow Grid Workshop 2007, this volume.
13. Tomasz Bartyński, Marian Bubak, Tomasz Gubała, Maciej Malawski: Universal Grid Client: Grid Operation Invoker. In Proc. PPAM 2007, Seventh International Conference on Parallel Processing and Applied Mathematics, LNCS, Gdansk, Poland, Sept. 2007. Springer. In print.
14. Maciej Malawski, Tomasz Bartyński, Marian Bubak: Invocation of Grid Operations in the ViroLab Virtual Laboratory. In Proceedings of Cracow Grid Workshop 2007, this volume.
15. Bartosz Balis, Marian Bubak, and Jakub Wach: Provenance Tracking in the ViroLab Virtual Laboratory. In Proc. PPAM 2007, Seventh International Conference on Parallel Processing and Applied Mathematics, LNCS, Gdansk, Poland, Sept. 2007. Springer. In print.
16. Bartosz Balis, Marian Bubak, Michał Pelczar, Jakub Wach: Provenance Tracking and Querying in ViroLab. In Proceedings of Cracow Grid Workshop 2007, this volume.
17. Bartosz Balis, Marian Bubak, Jakub Wach: User-Oriented Querying over Repositories of Data and Provenance. In Proc. 3rd IEEE International Conference on e-Science and Grid Computing, e-Science 2007, IEEE Computer Society, Bangalore, India, Dec. 2007.
18. The ViroLab Project Consortium: The ViroLab Virtual Laboratory Website, 2007. http://virolab.cyfronet.pl