An agent-based layered middleware as tool integration

Similar documents
A Workflow Service for Biomedical Applications

FarMAS: a MAS for Extended

Module 1. Sequence Formats and Retrieval. Charles Steward

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Biological Databases and Protein Sequence Analysis

This document presents the new features available in ngklast release 4.4 and KServer 4.2.

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

How to solve relevant problems in bioinformatics using agents

org.rn.eg.db December 16, 2015 org.rn.egaccnum is an R object that contains mappings between Entrez Gene identifiers and GenBank accession numbers.

GenBank, Entrez, & FASTA

Library page. SRS first view. Different types of database in SRS. Standard query form

Linear Sequence Analysis. 3-D Structure Analysis

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

CD-HIT User s Guide. Last updated: April 5,

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire 25th

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

Laboratorio di Bioinformatica

Bioinformatics Resources at a Glance

Biological Sequence Data Formats

Sequence Formats and Sequence Database Searches. Gloria Rendon SC11 Education June, 2011

PeptidomicsDB: a new platform for sharing MS/MS data.

UGENE Quick Start Guide

Scientific databases. Biological data management

PPInterFinder A Web Server for Mining Human Protein Protein Interaction

Lecture Outline. Introduction to Databases. Introduction. Data Formats Sample databases How to text search databases. Shifra Ben-Dor Irit Orr

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

AN INTEGRATION APPROACH FOR THE STATISTICAL INFORMATION SYSTEM OF ISTAT USING SDMX STANDARDS

BIOINFORMATICS TUTORIAL

A collaborative platform for knowledge management

GenBank: A Database of Genetic Sequence Data

Integrating Bioinformatics, Medical Sciences and Drug Discovery

Building the European Biodiversity. Observation Network (EU BON)

DNA Sequence Analysis Software

Data Grids. Lidan Wang April 5, 2007

Databases and mapping BWA. Samtools

Introduction to Bioinformatics 2. DNA Sequence Retrieval and comparison

Distributed Data Mining in Discovery Net. Dr. Moustafa Ghanem Department of Computing Imperial College London

Middleware support for the Internet of Things

Processing Genome Data using Scalable Database Technology. My Background

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

EMBL-EBI Web Services

1 What Are Web Services?

Enterprise Application Designs In Relation to ERP and SOA

Guzmán Llambías and Raúl Ruggia Universidad de la República, Facultad de Ingeniería, Montevideo, Uruguay, {gllambi,

Bioinformatics Grid - Enabled Tools For Biologists.

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

Integration of Distributed Healthcare Records: Publishing Legacy Data as XML Documents Compliant with CEN/TC251 ENV13606

Committee on WIPO Standards (CWS)

Lesson 4 Web Service Interface Definition (Part I)

Government Service Bus

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

Rotorcraft Health Management System (RHMS)

IMPLEMENTATION GUIDE. API Service. More Power to You. May For more information, please contact

IBM WebSphere DataStage Online training from Yes-M Systems

A Tutorial in Genetic Sequence Classification Tools and Techniques

MD Link Integration MDI Solutions Limited

Apply PERL to BioInformatics (II)

General principles and architecture of Adlib and Adlib API. Petra Otten Manager Customer Support

Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane

Bio-Informatics Lectures. A Short Introduction

THE GENBANK SEQUENCE DATABASE

A W orkflow Management System for Bioinformatics Grid

LDIF - Linked Data Integration Framework

Core Bioinformatics. Degree Type Year Semester

Module 10: Bioinformatics

Integrating Heterogeneous Data Sources Using XML

DNA Sequencing Overview

Data Integration and Data Cleaning in DWH

From Oracle Warehouse Builder to Oracle Data Integrator fast and safe.

Software Architecture Document

Data Integration using Agent based Mediator-Wrapper Architecture. Tutorial Report For Agent Based Software Engineering (SENG 609.

Embedded Systems in Healthcare. Pierre America Healthcare Systems Architecture Philips Research, Eindhoven, the Netherlands November 12, 2008

Oracle Warehouse Builder 10g

Transcription:

An agent-based layered middleware as tool integration Flavio Corradini Leonardo Mariani Emanuela Merelli University of L Aquila University of Milano University of Camerino ITALY ITALY ITALY Helsinki FSE/ESEC 2003 Tool Integration Workshop

Outline The Tool Integration problem in the Bioinformatics Domain The Workflow-based Task Coordination (High Level Tool Integration) The Wrapper-based Data Integration (Low level Tool Integration) The Proposed Approach: An Agent-based Middleware for Tool Integration Preliminary Results Future Activities and Conclusions 2003 2

The Tool Integration problem in Bioinformatics Domain Problem: To find the crystallographic structure of the 10 proteins more similar to a new genetic sequence, e.g X=MEEP DSD, Objective: To use several Bioinformatics Software Tools available on Internet in order to find the wanted result For Tool Integration we mean 1. Select the 10 proteins more 1) similar Supporting to the X=MEEP Tasks Coordination DSD sequence 2) Allowing Data Integration by using BLASTn in GenBank at NCBI 1 st Tool 2. Search for the PDB ID (crystallographic in order structure to automatically identifier) of each selected execute proteins, by using BLASTp in SWISS-PROT at EMBL-EBI 2 nd Tool by retrieving from PubMed via Entrez an experiment Retrieval System at NCBI, abstracts containing PDB-ID information 3 rd Tool 3. Search for the Crystallographic Structure of any selected PDB ID find 3-D biological macromolecular structure in Protein DataBank repository 4 th Tool Aim: To integrate the four Bioinformatics tools freeing the Bioscientist from the need to continous interact with remote sites. 2003 3

Workflow-based User BioApplication 1 st Tool 3 rd Tool 2 nd Tool 4 th Tool 2003 4

Wrapper-based System: general scenario QueryString: XML XML XML ProgramOption:.. SELECT. FROM WHERE.. AIXO AIXO AIXO HTML Web Page Flat Files from Command Line Program RDBMS 2003 5

Wrapper-based System: Bioinformatics Tools Tool 1: Environment: NCBI (WebSite): html format Data: GenBank (DB): proprietary format Tool: BLASTn (Algorithm): Takes nucleotides sequences in FASTA format, GenBank Accession numbers or GI numbers and compares them against the NCBI nucleotide databases Output: GenBank Format Tool 2: Environment: EMBL-EBI (WebSite): html format Data: Swiss-Prot (DB): proprietary format Tool: BLASTp (Algorithm): Takes protein sequences in FASTA format, GenBank Accession numbers or GI numbers Output: FASTA format Tool 3: Environment: NCBI (WebSite): html format Data: PubMed & MEDLINE: ANS.1 format Tool: Entrez Retrieval System Output: XML Tool 4: Environment: Protein DataBank web site Data: PDB(DB): proprietary format Tool: FASTA (Algorithm): Output: FASTA Format and compares them against the NCBI protein databases. 2003 6

Wrapper-based System: Retrieval MedLine articles about P53 proteine XML Filter and Map XSLT XML XML Trasl. TEXT Access XML GRAMMAR XML 2003 7

Wrapper-based System: the software architecture (AIXO) Wrapper addxslfilter (pathfile : String) : void retrievalxmldocument(parmeter : Object) : org.jdom.document access : DataSource, XMLResource : ResourceToXMLReader, engine : WaterfallXSLTProcessor ResourceToXMLReader toxmlreader (Input : Object) : Reader DataSource WaterfallXSLTProcessor getaccess (parameter : Object) : Object getdocument (input: java.io.reader) : Document.. DataSource: HTTP, RDBMS, Command Line program,. ResourceToXMLReader: HTML, FlatFile, 2003 8

The Tool Integration Problem in Activity-Based Applications Problem: To integrate and coordinate multiple software tools for retrieving and integrating heterogeneous, distributed and frequently redundant data Objective: To integrate and coordinate several software tools in order to provide a uniform way and an high level of abstraction for users Aim: To define an integrated environment freeing the user from the need to know details on data repository and to coordinate the intermediate steps of an experiment (tasks) Proposed Approach: To define an application as a workflow of tasks; to coordinate the execution of cooperative tasks by using software agent tools 2003 9

System s software architecture 2003 10

A general system s architecture User Layer System Layer Run-Time Layer Long-transaction Retrieval Service User Application Workflow Short-transaction Workflow Mng High Level Integration Module Low Level Integration Module Web Services Knowledge Base Temporary Data Repository Remote Place where Tools are available 2003 11

Agent-based System Architecture User Layer System Layer Run-Time Layer User Application Workflow Long-transaction Workflow Mng User Agents Retrieval Service Wrapper Agent EMBL FASTA ASN.1 GenBank Short-transaction RDB Web Services Knw Mng Agent Service HTML XML TXT Knowledge Base Temporary Data Repository Remote data format 2003 12

From Data to Knowledge and vice versa tool Integration meta-data XML elements ontologies (human concepts) + workflow Information + coordination Different data format data + algorithms 2003 13

The Proposed Approach: an Agent-based Middleware 2003 14

Preliminary Results: User-agent as high level Tool Integration 2003 15

Preliminary Results: Wrapper-agent as low level Tool Integration 2003 16

BioAgent 2003 17

Future Activities and Conclusions For different application domains (i.e supply chain, components traceability for testing ) we plan to: Develop wrapper agents Design and develop the knowledge database to manage software tools Develop the compiler to allow the automatic generation of user-agents Evaluate the possibility to include mobility to user-agents in order to minimize the data transfer during tasks execution. We conclude saying that software tool integration for real applications, as those in Bioinformatics domain, is a very difficult task due to both heterogeneity of data format and wide variety of tools which continuously evolve. 2003 18

NCBI Home page 2003 19

NCBI main databses 2003 20

NCBI site map 2003 21

NCBI - Entrez 2003 22

PDB Home page 2003 23

NCBI - BLAST 2003 24

NCBI ASN.1 2003 25

NCBI - fomats 2003 26

PDB-ID (P53) = 1TSR www.rcsb.org ( 2003 27

DNA and nucleotide sequence atggaggagccgcagtcagatcctagcgtcgagccccctctgagtcaggaaacattttca M atggaggagccgcagtcagatcctagcgtcgagccccctctgagtcaggaaacattttca E E P Q S D P S V E P P L S Q E T F S gacctatggaaactacttcctgaaaacaacgttctgtcccccttgccgtcccaagcaatg M E E P Q S D P S V E P P L S Q E T F S D gacctatggaaactacttcctgaaaacaacgttctgtcccccttgccgtcccaagcaatg L W K L L P E N N V L S P L P S Q A M gatgatttgatgctgtccccggacgatattgaacaatggttcactgaagacccaggtcca D L W K L L P E N N V L S P L P S Q A M D D L M L S P D D I E Q W F T E D P G P gatgaagctcccagaatgccagaggctgctccccgcgtggcccctggaccagcagctcct D ccagggagcactaagcgagcactgcccaacaacaccagctcctctccccagccaaagaag E A P R M P E A A P R V A P G P A A P acaccggcggcccctgcaccagccccctcctggcccctgtcatcttctgtcccttcccag P G S T K R A L P N N T S S S P Q P K K T aaaccactggatggagaatatttcacccttcagatccgtgggcgtgagcgcttcgagatg P A A P A P A P S W P L S S S V P S Q aaaacctaccagggcagctacggtttccgtctgggcttcttgcattctgggacagccaag K P L D G E Y F T L Q I R G R E R F E M K ttccgagagctgaatgaggccttggaactcaaggatgcccaggctgggaaggagccaggg T Y Q G S Y G F R L G F L H S G T A K tctgtgacttgcacgtactcccctgccctcaacaagatgttttgccaactggccaagacc F R E L N E A L E L K D A Q A G K E P G S gggagcagggctcactccagccacctgaagtccaaaaagggtcagtctacctcccgccat V T C T Y S P A L N K M F C Q L A K T tgccctgtgcagctgtgggttgattccacacccccgcccggcacccgcgtccgcgccatg G S R A H S S H L K S K K G Q S T S R H C aaaaaactcatgttcaagacagaagggcctgactcagactga P V Q L W V D S T P P P G T R V R A M gccatctacaagcagtcacagcacatgacggaggttgtgaggcgctgcccccaccatgag K K L M F K T E G P D S D - A I Y K Q S Q H M T E V V R R C P H H E cgctgctcagatagcgatggtctggcccctcctcagcatcttatccgagtggaaggaaat R C S D S D G L A P P Q H L I R V E G N ttgcgtgtggagtatttggatgacagaaacacttttcgacatagtgtggtggtgccctat Tool 2003 Integration workshop 28 2003 32

ULAD 2003 29

Significant References Y. Papakonstantinou, H. Garcia-Molina & J. Widom 95 S. Bergamaschi 00 G. Cabri, L.Leonardi & F. Zambonelli 00 E. Bartocci, L. Mariani & E. Merelli 03 F. Corradini, L. Mariani & E. Merelli 03 OEM: Object Exchange Across Heterogeneous Information Sources Momis: Mediator environment for Multiple Information Sources MARS: A Programmable coordination Architecture for Mobile Agents AIXO: Any Input XML Output, a generalized wrapper PEGAA: A Programming Environment for Global Activity-based Applications 2003 30