A Workflow Service for Biomedical Applications
Emanuela Merelli (Università di Camerino, Italy)
Paolo Romano (National Cancer Research Institute, Italy)
Lorenzo Scortichini (Università di Camerino, Italy)
2004 Bioinformatics Italian Society Meeting, Padova, 27 March 2004
Outline
- The Workflow in the BioScience Domain
- The Workflow-based Activity Coordination
- Workflow service
- The supporting technology: agent-based middleware supports the Workflow Service
- Future Activities and Conclusions
Workflows in the BioScience Domain
Definition
- "The computerised facilitation or automation of a business process, in whole or part" (Workflow Management Coalition, Reference Model)
Goals
- to design and implement a data analysis process (standardized protocols, S. Hoon et al. 03)
- to simulate a high-level biological process (Peleg et al. 03, Amici et al. 04)
Advantages: for data analysis, a workflow makes it possible
- to reproduce the analysis
- to reuse intermediate results
- to create a transparent analysis environment
- to support good practice
- to free the bioscientist from repetitive interaction with the web
- to verify structural, functional and dynamic process requirements
Hypothetical Scenario for the O2I project
The Oncology over Internet (O2I) project aims to develop a framework to support searching, retrieving and filtering information from the Internet for oncology research and clinics.
A possible scenario involving the use of biological resources:
- Biological resources, such as micro-organisms and cell lines, are essential for implementing a good, reproducible experiment
- High-quality biological resources are available at specialized centers (Biological Resource Centers such as ATCC and DSMZ), and the related catalogues are available on-line
- Molecular Biology (MB) databases, e.g. sequence databases, often include information (strain numbers, accession numbers) on the original resources
- Researchers accessing MB databases often need extended information on resources before they can request materials
A simple workflow example
Use context: to verify a mutation experiment by reproducing it
Goal: retrieve abstracts from a literature database to identify the best cell line for reproducing a human TP53 mutation experiment linked to a particular tumour-habits-sex combination
Activities: use Bioinformatics Services available on the Internet to achieve the desired result
1. (1st Activity) Retrieve all mutations (IDs) observed in the 7th exon in men who are ex-smokers and drinkers, by searching the p53 mutations database (SRS implementation at IST, Genova)
2. (2nd Activity) Retrieve all mutations (IDs) observed by using the B9 cell line as original resource, by searching the p53 mutations database (SRS implementation at IST, Genova)
3. (3rd Activity) Retrieve all abstracts of the correlated bibliographic references of a specific mutation ID, by searching Medline
Achievement: on-line Bioinformatics data are integrated into a single result, freeing the bioscientist from the need to interact personally with remote sites
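The three activities above can be sketched as plain function calls. This is only an illustration under stated assumptions: the lookup tables stand in for the remote SRS and Medline services, the mutation IDs and abstracts are invented placeholders, and combining the first two result sets by intersection follows the name Intersezione_Mutazioni used in the user-level diagram rather than any documented implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in data; the real activities would query remote services.
PROFILE_INDEX: Dict[Tuple[str, str, str, str], Set[str]] = {
    ("7-exon", "M", "ex-fumatore", "bevitore"): {"mut-1", "mut-2", "mut-3"},
}
CELL_LINE_INDEX: Dict[str, Set[str]] = {"B9": {"mut-2", "mut-3", "mut-4"}}
ABSTRACT_INDEX: Dict[str, List[str]] = {
    "mut-2": ["abstract A"],
    "mut-3": ["abstract B", "abstract C"],
}


def mutations_by_profile(intron_exon: str, sex: str, fumo: str, alcool: str) -> Set[str]:
    """1st activity: mutation IDs for an intron/exon and a sex/habits profile."""
    return set(PROFILE_INDEX.get((intron_exon, sex, fumo, alcool), set()))


def mutations_by_cell_line(linea_cellulare: str) -> Set[str]:
    """2nd activity: mutation IDs observed with a given cell line."""
    return set(CELL_LINE_INDEX.get(linea_cellulare, set()))


def abstracts_for(mutation_id: str) -> List[str]:
    """3rd activity: abstracts of the references linked to one mutation ID."""
    return list(ABSTRACT_INDEX.get(mutation_id, []))


def run_workflow(intron_exon: str, sex: str, fumo: str, alcool: str,
                 linea_cellulare: str) -> Dict[str, List[str]]:
    info_mutazioni = mutations_by_profile(intron_exon, sex, fumo, alcool)
    nome_mutazioni = mutations_by_cell_line(linea_cellulare)
    intersezione = sorted(info_mutazioni & nome_mutazioni)
    if not intersezione:  # guard: nothing in common, so no abstracts to fetch
        return {}
    return {mutation_id: abstracts_for(mutation_id) for mutation_id in intersezione}
```

Calling `run_workflow("7-exon", "M", "ex-fumatore", "bevitore", "B9")` returns the abstracts grouped by the mutation IDs that both searches have in common.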
An example of workflow at user level
[Figure: user-level workflow diagram. Recoverable content:]
- Input values: linea_cellulare = B9, intron_exon = 7-exon, sex = M, fumo = ex-fumatore (ex-smoker), alcool = bevitore (drinker)
- B1: find all mutations observed in the 7th exon in men who are ex-smokers and drinkers. Inputs: Intron exon (intron_exon), Sex (sex), Smoking (fumo), Alcohol (alcool). Output: Info_Mutazioni
- B4: retrieve all mutations (IDs) observed by using the B9 cell line. Input: Cell line (linea_cellulare). Output: Nome_Mutazioni
- merge: combines Info_Mutazioni and Nome_Mutazioni into Intersezione_Mutazioni
- B5: retrieve all abstracts of the correlated bibliographic references of a specific mutation ID. Input: Intersezione_Mutazioni. Output: Abstract
- Guards after the merge: [Intersezione_Mutazioni > 0] and [Intersezione_Mutazioni = 0]
Use Cases
A use case is a set of scenarios tied together by a common user goal, where a scenario is a sequence of steps describing an interaction between a user and a system (UML definition).
Some examples:
- find all possible mutations that involve a given protein X
- find all possible cell lines related to a given tumour Y
- select all possible abstracts referring to a given protein Z
- select all possible abstracts referring to cell line W
Workflow as a composition of use cases
Use cases and Application Domains
Use cases in Cell Line domain
- A1: Find information about the cell line named x
- A2: Find all cell lines derived from a specific tumour or pathology
- A3: Find all cell lines producing a specific protein
- A4: Given a specific cell line, find all related bibliographic references
- A5: Given a specific cell line, find all information about produced proteins
Use cases in Mutation domain
- B1: Find all mutations observed in a specific intron/exon in subjects with specific sex and life habits (e.g. smokers/drinkers)
- B2: Find all mutations in subjects affected by a given pathology
- B3: Find all subjects affected by a tumoural pathology and with a given protein mutation
- B4: Find all mutations observed by using a given cell line
- B5: Given a specific mutation, find all abstracts of the correlated bibliographic references
Use cases in Bibliographic database domain
- C1: Select all abstracts of bibliographic references whose text includes a given term
- C2: ...
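A catalogue like this lends itself to a two-level lookup: first the application domain, then the use case within it. The sketch below is one hypothetical way to hold it in memory; the dictionary layout and the function names are assumptions, and only a few of the listed use cases are copied in to keep it short.

```python
from typing import Dict, List, Optional

# Domain name -> {use case ID -> description}; a subset of the slide's list.
CATALOGUE: Dict[str, Dict[str, str]] = {
    "Cell Line": {
        "A1": "Find information about the cell line named x",
        "A4": "Given a specific cell line, find all related bibliographic references",
    },
    "Mutation": {
        "B1": "Find all mutations observed in a specific intron/exon in subjects "
              "with specific sex and life habits",
        "B4": "Find all mutations observed by using a given cell line",
        "B5": "Given a specific mutation, find all abstracts of the correlated "
              "bibliographic references",
    },
    "Bibliographic database": {
        "C1": "Select all abstracts of bibliographic references whose text "
              "includes a given term",
    },
}


def list_domains() -> List[str]:
    """Step 1 of a selection: the available application domains."""
    return list(CATALOGUE)


def list_use_cases(domain: str) -> List[str]:
    """Step 2 of a selection: the use case IDs offered in one domain."""
    return sorted(CATALOGUE[domain])


def find_domain(use_case_id: str) -> Optional[str]:
    """Reverse lookup: which domain a use case ID belongs to, if any."""
    for domain, cases in CATALOGUE.items():
        if use_case_id in cases:
            return domain
    return None
```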
Tools suitcase
Wf Management
- Wf Editor: modelling and definition
- Wf Checker: analyzing consistency of the model
- Wf Compiler: translating model to executable code
User interface
- Web-based GUI
- Console
System Management
- Middleware platform deployment tool
- Account management tool
- System diagnosis tool
- System maintenance tool
- Traceability tool
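The slide does not say which consistency rules the Wf Checker applies. As one plausible example, the sketch below checks that every input of a step is either supplied by the user or produced by an earlier step. The `UseCase` class, the `check_workflow` function and the modelling of steps as input/output name sets are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class UseCase:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def check_workflow(steps: List[UseCase], user_params: Set[str]) -> List[str]:
    """Return one message per input that nothing upstream provides."""
    available = set(user_params)
    problems: List[str] = []
    for step in steps:
        for missing in sorted(step.inputs - available):
            problems.append(f"{step.name}: missing input {missing}")
        available |= step.outputs  # later steps may consume these
    return problems


# The example workflow from the earlier slides, expressed in this model.
B1 = UseCase("B1", frozenset({"intron_exon", "sex", "fumo", "alcool"}),
             frozenset({"Info_Mutazioni"}))
B4 = UseCase("B4", frozenset({"linea_cellulare"}), frozenset({"Nome_Mutazioni"}))
MERGE = UseCase("merge", frozenset({"Info_Mutazioni", "Nome_Mutazioni"}),
                frozenset({"Intersezione_Mutazioni"}))
B5 = UseCase("B5", frozenset({"Intersezione_Mutazioni"}), frozenset({"Abstract"}))
USER_PARAMS = {"intron_exon", "sex", "fumo", "alcool", "linea_cellulare"}
```

With all four steps in order the check returns an empty list; dropping B4 makes the merge step report that Nome_Mutazioni is missing.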
User account management
This page is served by the BioAgent platform site and gives access to the system services.
- If you already have a login and password, use Submit
- Use the New User button to register
The offered services
- Remove a user
- Change user information
- Add a new use case
- View the available workflows
- Get agent data
- Exit
Use Cases management
The Get Use Case button lets the user configure a new use case:
1. Choose the application domain
2. Choose one of the available use cases
Use Cases configuration
B1: Find all mutations observed in a specific intron/exon in subjects with specific sex and life habits (e.g. smokers/drinkers)
- Input parameters
- Choose the coordination operator
The Workflow and Use cases
Global textual view of the workflow. Available actions:
- Remove the use case
- Change configuration of a use case
- Submit a single use case
- Submit the workflow
Data management
The Get Agents Data button lets the user manage the data resulting from a workflow submission.
Once the data of interest have been selected, they can be removed or viewed in XML or HTML format.
Data view
- XML data format
- XHTML data format
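The two views can be illustrated by serialising the same result set twice. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`results`, `mutation`, `abstract`) and the shape of the input dictionary are assumptions, since the slides do not show the tool's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def results_to_xml(results: Dict[str, List[str]]) -> str:
    """Serialise {mutation ID: [abstracts]} with hypothetical element names."""
    root = ET.Element("results")
    for mutation_id, abstracts in results.items():
        mutation = ET.SubElement(root, "mutation", {"id": mutation_id})
        for text in abstracts:
            ET.SubElement(mutation, "abstract").text = text
    return ET.tostring(root, encoding="unicode")


def results_to_xhtml(results: Dict[str, List[str]]) -> str:
    """Render the same data as a minimal XHTML page for browsing."""
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(html, "body")
    for mutation_id, abstracts in results.items():
        ET.SubElement(body, "h2").text = mutation_id
        items = ET.SubElement(body, "ul")
        for text in abstracts:
            ET.SubElement(items, "li").text = text
    return ET.tostring(html, encoding="unicode")
```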
Future extensions to the tool
- Design and implementation of a knowledge database
- Automatic generation, from a workflow, of a multi-agent system that behaves as the execution engine of a WfMS
- Development of an ontological service to support the user during use case creation
The agent-based supporting technology
From Data to Knowledge and vice versa (Merelli et al. 02)
[Figure: layered diagram. Recoverable labels: Web; meta-data; ontologies (human concepts) + workflow; MAS; XML + RDF elements; information + coordination; services; data format; data access + code]
System's software architecture (Corradini et al. 03)
The O2I system's architecture
[Figure: architecture diagram. Layers: User Layer, System Layer, Run-Time Layer. Labelled components: User Application Workflow; Workflow Mng; Long-transaction; Short-transaction; Retrieval Service; High Level Integration Module; Low Level Integration Module; Web Services; Knowledge Base; Temporary Data Repository; Remote Place where Tools are available]
BioAgent System Architecture for O2I
[Figure: architecture diagram. Layers: User Layer, System Layer, Run-Time Layer. Labelled components: User Application Workflow; Workflow Mng; Long-transaction; Workflow Executor Agents; Retrieval Service; Wrapper Agent; Kw db and Use Cases repository; Knw Mng Agent Service; Temporary Data Repository. Data format labels: EMBL, FASTA, ASN.1, GenBank, RDB; HTML, XML, TXT; ADb, ULAD, ALAID, DoS; Remote data format]
Software Architecture (Corradini et al. 04)
[Figure: three-layer architecture diagram. Recoverable content:]
- User Layer: User Application Workflow, Workflow Management; user-level workflow (B1, B6, B4)
- System Layer: Application Agents, Service Agents, Management; agent-level workflow with Activities A1, A2, B1, B2, C1; a set of workflow executors (WE: WE A, WE B, WE C) and WA agents
- Run-Time Layer: Core; Resources (data, tools and services): Tool A, Tool B, Tool C
Workflow Management Service
- The workflow management service is a prototype that allows the definition of complex queries (use cases) by using workflow as a coordination model
- The application uses a database to let the user configure use cases and manage the resulting data
- The application provides a graphical interface
- The application is a plug-in for the BioAgent platform
[Figure: user-level workflow diagram. Input values: linea_cellulare = B9, intron_exon = 7-intron, sex = M, fumo = ex-fumatore (ex-smoker), alcool = bevitore (drinker). Parameters: Intron exon (intron_exon), Sex (sex), Smoking (fumo), Alcohol (alcool), Cell line (linea_cellulare). Boxes: B1, B4, B6, B5. Data: Info_Mutazioni, Nome_Mutazioni, Intersezione_Mutazioni, Abstract. Guards: [Intersezione_Mutazioni > 0], [Intersezione_Mutazioni = 0]]
An example of workflow at agent level
[Figure: agent-level workflow diagram with three agents. Recoverable content:]
- Agent A: introne/exon = 7-intron, sex = M, fumo = ex-fumatore (ex-smoker), alcool = bevitore (drinker), lista = {"www.a...", "www.b..."}; works at Place: lista[indice]; data: Indice, Info_Mutazioni[], controlla_risultato
- Agent B: linea_cellulare = B9, lista = {"www.a...", "www.b..."}; works at Place: lista[indice]; data: Indice, Nomi_Mutazioni[], controlla_risultato
- Agent C: lista = {"www.f...", "www.g..."}; data: Indice, abstract[], controlla_risultato, mutazioni, Intersezione_Mutazioni
- Agents A and B: activities "Cerca mutazione" (search mutation) and "Controlla lista" (check list); guard [controlla_risultato = no fine lista] (not end of list) leads to Move, guard [controlla_risultato = fine lista] (end of list) leaves the loop
- B6: box associated with Intersezione_Mutazioni
- Agent C: activities "Cerca Abstract" (search abstract), "Controlla Mutazioni" (check mutations) and "Controlla lista" (check list); guards [mutazioni = altre mutazioni] (more mutations), [mutazioni = mutazioni finite] (no more mutations), [controlla_risultato = no fine lista], [controlla_risultato = fine lista]
- Parameter bindings: Intron exon (intron_exon), Sex (sex), Smoking (fumo), Alcohol (alcool), Cell line (linea_cellulare), List (lista), Element (indice), Mutations (Intersezione_Mutazioni)
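The loop each agent runs over its list of places can be sketched as follows. This is an interpretation of the diagram, not the BioAgent code: the agent "moves" by advancing `indice`, the `cerca` callable stands in for the search performed at each place, and accumulating the results of every visited place is an assumption suggested by the array-typed outputs such as Info_Mutazioni[].

```python
from typing import Callable, List

FINE_LISTA = "fine lista"        # end of list
NO_FINE_LISTA = "no fine lista"  # more places to visit


def controlla_lista(lista: List[str], indice: int) -> str:
    """'Controlla lista' (check list): is there still a place to visit?"""
    return NO_FINE_LISTA if indice < len(lista) else FINE_LISTA


def run_agent(lista: List[str], cerca: Callable[[str], List[str]]) -> List[str]:
    """Visit each place in turn and collect whatever the search finds there."""
    risultati: List[str] = []
    indice = 0
    controlla_risultato = controlla_lista(lista, indice)
    while controlla_risultato == NO_FINE_LISTA:
        risultati.extend(cerca(lista[indice]))  # work at Place: lista[indice]
        indice += 1                             # Move to the next place
        controlla_risultato = controlla_lista(lista, indice)
    return risultati


# Hypothetical place contents used only to exercise the loop.
PLACE_DATA = {"www.a.example": ["mut-1"], "www.b.example": ["mut-2", "mut-3"]}


def cerca_mutazione(place: str) -> List[str]:
    return PLACE_DATA.get(place, [])
```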
Compiler stage: executable application agents
[Figure: diagram with a Compiler box, repeated Activity boxes, and the labels DoS and ALAID]
Wrapper-Agent: general scenario (E. Bartocci et al. 03)
[Figure: three wrapper agents (WA), each exchanging XML with an AIXO adaptor. Sources behind the adaptors: HTML web page; flat files from a command-line program; RDBMS. Request forms shown: QueryString; ProgramOption; SELECT ... FROM ... WHERE ...]
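The idea of one adaptor per source type, all producing XML, can be sketched with two small adaptors that share an output shape. This is not the AIXO API: the function names, the `rows`/`row` element names, the example table and its values are all hypothetical, and an in-memory SQLite database stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterable, List, Sequence


def rows_to_xml(columns: Sequence[str], rows: Iterable[Sequence[object]]) -> ET.Element:
    """Common output shape: one <row> per record, one child element per column."""
    root = ET.Element("rows")
    for row in rows:
        element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(element, column).text = str(value)
    return root


def rdbms_adaptor(conn: sqlite3.Connection, query: str, params: Sequence[object] = ()) -> ET.Element:
    """Adaptor for a relational source: run a SELECT and wrap the rows as XML."""
    cursor = conn.execute(query, params)
    columns: List[str] = [description[0] for description in cursor.description]
    return rows_to_xml(columns, cursor.fetchall())


def flat_text_adaptor(text: str, columns: Sequence[str], sep: str = "\t") -> ET.Element:
    """Adaptor for separator-delimited program output or flat files."""
    rows = [line.split(sep) for line in text.splitlines() if line.strip()]
    return rows_to_xml(columns, rows)


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with placeholder values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cell_line (name TEXT, tumour TEXT)")
    conn.execute("INSERT INTO cell_line VALUES (?, ?)", ("B9", "tumour-Y"))
    return conn
```

Because both adaptors return the same element structure, a wrapper agent built this way would not need to know which kind of source answered the request.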
Wrapper-based System: retrieving articles about the P53 protein
[Figure: wrapper pipeline. Stage labels: Access (TEXT); Transl. (XML grammar); Filter and Map (XSLT); XML between stages]
Flat-file (text) entry:
    ID   P53_HUMAN   STANDARD;   PRT;   393 AA.
    AC   P04637; Q16848; Q9UBI2;
    DT   13-AUG-1987 (Rel. 05, Created)
Resulting XML:
    <entry>
      <ID name="p53_human" type="standard" molecule="prt" length="393"/>
      <AC value="P04637"/>
      <AC value="Q16848"/>
      <AC value="Q9UBI2"/>
      <DT day="13" month="AUG" year="1987" rel="05"/>
    </entry>
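The text-to-XML translation step for this entry can be sketched with regular expressions and the standard `xml.etree.ElementTree` module. It is a minimal illustration that handles only the three line types shown on the slide (ID, AC, DT); the function name and the parsing rules are assumptions, not the grammar the actual wrapper uses.

```python
import re
import xml.etree.ElementTree as ET

ID_PATTERN = re.compile(r"(\S+)\s+(\w+);\s+(\w+);\s+(\d+) AA\.")
DT_PATTERN = re.compile(r"(\d{2})-([A-Z]{3})-(\d{4}) \(Rel\. (\d+),")


def entry_to_xml(flat: str) -> ET.Element:
    """Translate the ID, AC and DT lines of a flat-file entry into XML."""
    entry = ET.Element("entry")
    for line in flat.splitlines():
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "ID":
            match = ID_PATTERN.match(rest)
            if match:
                name, kind, molecule, length = match.groups()
                ET.SubElement(entry, "ID", {
                    "name": name.lower(), "type": kind.lower(),
                    "molecule": molecule.lower(), "length": length,
                })
        elif code == "AC":
            # One <AC> element per accession number on the line.
            for accession in (part.strip() for part in rest.split(";")):
                if accession:
                    ET.SubElement(entry, "AC", {"value": accession})
        elif code == "DT":
            match = DT_PATTERN.match(rest)
            if match:
                day, month, year, rel = match.groups()
                ET.SubElement(entry, "DT", {
                    "day": day, "month": month, "year": year, "rel": rel,
                })
    return entry


SAMPLE_ENTRY = (
    "ID   P53_HUMAN      STANDARD;      PRT;   393 AA.\n"
    "AC   P04637; Q16848; Q9UBI2;\n"
    "DT   13-AUG-1987 (Rel. 05, Created)\n"
)
```

Lines with any other code are skipped, so the sketch degrades quietly on a full entry instead of failing.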
BioAgent for O2I (Merelli et al. 02)
http://www.bioagent.net
Future Activities and Conclusions
We are
- implementing an ontology-based wrapper agent
- developing the first prototype of the compiler, to allow the automatic generation of user agents
- enriching the set of tools supporting workflow specification
We plan to evaluate our approach of using workflow as a coordination model for modelling biological processes.
In conclusion, developing an integrated platform for real applications, such as those in the bioinformatics domain, is very difficult because of both the heterogeneity of data formats and the wide variety of continuously evolving tools.
Our references for this work
- E. Merelli, R. Culmone & L. Mariani 02: Bioagent: an agent based platform for bioinformatics
- E. Bartocci, L. Mariani et al. 03: AIXO: Any Input XML Output, a generalized wrapper, ICEIS03
- F. Corradini, L. Mariani et al. 03: PEGAA: A Programming Environment for Global Activity-based Applications, WOA03
- R. Amici, F. Corradini et al. 04: A Process Algebra View of Coordination Models with a Case Study in Computational Systems Biology
- F. Corradini, L. Mariani et al. 04: An agent-based approach to tool integration