EGEE-2 NA4 Biomed Bioinformatics in CNRS




Enabling Grids for E-sciencE

EGEE-2 NA4 Biomed Bioinformatics in CNRS
Christophe Blanchet, Institute of Biology and Chemistry of Proteins
Lyon, April 28, 2006
www.eu-egee.org

Outline

- NPS@ Web portal
  - Online since 1998
  - Production mode
- Gridification of NPS@: the GPS@ pilot application of EGEE
  - Bioinformatics description with an XML-based framework
  - Legacy mode for application file access
  - GPS@ Web portal for bioinformatics on the grid
- External application about phylogeny: phylojava, A. Dehne-Garcia (CNRS BBE)

EGEE-2, Bioinformatics in CNRS partner, IBCP, Lyon, April 28th, 2006

Institute of Biology and Chemistry of Proteins

- French CNRS institute, associated with Univ. Lyon1
- Life science, about 160 people
- http://www.ibcp.fr
- Located in Lyon, France
- Study of proteins in their biological context
- Approaches include integrative cellular techniques (cell culture, various types of microscopy) and molecular techniques, both experimental (including biocrystallography and nuclear magnetic resonance) and theoretical (structural bioinformatics)
- Three main departments bring together 13 groups working on topics such as cancer, extracellular matrix, tissue engineering, membranes, cell transport and signalling, bioinformatics and structural biology

EGEE Biomedical Activity

- Chair: Johan Montagnat; deputy: Christophe Blanchet
- Biomedical activity areas
  - Bioinformatics
  - Medical imaging
  - Other health-related areas
- Three types of application
  - Pilots: LCG-2 compliant applications at day 0
  - Internal: from project partners, to be deployed on EGEE
  - External: from other projects, to go through a selection procedure
- EGEE User Forum, CERN, March 1-3, 2006: http://egee-intranet.web.cern.ch/egee-intranet/user-forum

NPS@: bioinformatics Web portal

- http://npsa-pbil.ibcp.fr/
- Online since 1998; NPS@ release 3
- 46 integrated methods for protein sequence analysis
- 12 online, up-to-date biological databanks
- 1-click download of NPS@ results into biological software: MPSA, AnTheProt, Clustal X, RasMol
- International references: Expasy, University of California, InfoBioGen, ...

Reference: Combet C., Blanchet C., Geourjon C. and Deléage G. NPS@: Network Protein Sequence Analysis. TIBS, 2000, 25, 147-150.

NPS@ hits

- More than 7 million analyses since 1998
- More than 5000 analyses per day

Examples of Bioinformatics Applications

- Different algorithms
  - Sequence similarity, multiple alignment
  - Structural prediction
- Numerous programs
  - BioCatalog: more than 600 at the end of the 1990s
  - EMBOSS: more than 200 (world-famous)
- Data access
  - Text files
  - Standard I/O with a local file interface
  - No modification of source code, to preserve the generic model

Expected benefits of gridification

- Biological data: distribute international databases; store more and larger data
- Bioinformatics algorithms: compute larger datasets; run more complex workflows
- NPS@ Web portal: a well-known Web interface, open to a wider user community

XML integration into GPS@ portal

[Architecture diagram. Labelled components:
- Biologist, connecting over HTTP
- GPSA UI (http://gpsa.ibcp.fr): GPSA bio portal, bio gateway to the grid, with Bio_cgi, the XML bioinformatics descriptors and Bio_launcher
- EGEE UI (Genomics-ui.ibcp.fr)
- Bio resources at IBCP: international DBs, bioinformatics algorithms
- EGEE: CE, SE, WMS, RMS
- DataGRID, EGEE]

Reference: Christophe Blanchet, Christophe Combet and Gilbert Deléage: Integrating Bioinformatics Resources on the EGEE Grid Platform. IEEE Proceedings of Biogrid 2006, Singapore, May 16-19.

Bioinformatics description with XML-based Framework

    <?xml version="1.0"?>
    <!DOCTYPE bio_method SYSTEM "/opt/bio/etc/bio_method.dtd">
    <bio_method version="2.0" mode="egee" >
      <method name="pattinprot" class="scanprot" type="sequential"
              root="/var/www/gpsa.ibcp.fr/pbil/servers/gpsa/w3-gpsa/" >
        <bio_binary path="gbio_lfn://pattinprot/newpattinprot" arch="i686" version="1" />
        <bio_parameter usage="cliio" >
          <parameter class="sequence_bank" type="file" option="-p"
                     value="gbio_lfn://work_space/pattinprot_0.inputdata"
                     visibility="external" IO="in" />
          <parameter class="pattern_bank" type="file" option="-m"
                     value="gbio_lfn://work_space/pattinprot_1.inputdata"
                     visibility="external" IO="in" />
          <parameter class="result" type="file" option="-r" link="biodata"
                     value="gbio_lfn://work_space/pattinprot.out"
                     visibility="external" IO="out" />
        </bio_parameter>
      </method>
    </bio_method>
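To illustrate how a descriptor like the one above could be consumed, here is a minimal Python sketch. It is not the Bio_launcher implementation, which the slides do not show. It assumes that each `parameter` element contributes its `option` followed by its `value` to the command line, and that the `IO` attribute separates files to stage in from files to retrieve. The function names `build_command` and `io_files` are hypothetical, and the XML declaration and DOCTYPE are left out of the embedded copy to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Copy of the pattinprot descriptor from the slide, without the XML
# declaration and DOCTYPE (omitted here only to keep the sketch standalone).
DESCRIPTOR = """\
<bio_method version="2.0" mode="egee">
  <method name="pattinprot" class="scanprot" type="sequential"
          root="/var/www/gpsa.ibcp.fr/pbil/servers/gpsa/w3-gpsa/">
    <bio_binary path="gbio_lfn://pattinprot/newpattinprot" arch="i686" version="1"/>
    <bio_parameter usage="cliio">
      <parameter class="sequence_bank" type="file" option="-p"
                 value="gbio_lfn://work_space/pattinprot_0.inputdata"
                 visibility="external" IO="in"/>
      <parameter class="pattern_bank" type="file" option="-m"
                 value="gbio_lfn://work_space/pattinprot_1.inputdata"
                 visibility="external" IO="in"/>
      <parameter class="result" type="file" option="-r" link="biodata"
                 value="gbio_lfn://work_space/pattinprot.out"
                 visibility="external" IO="out"/>
    </bio_parameter>
  </method>
</bio_method>
"""


def build_command(descriptor_xml):
    """Return [binary, option, value, ...] in document order (assumed convention)."""
    method = ET.fromstring(descriptor_xml).find("method")
    argv = [method.find("bio_binary").get("path")]
    for param in method.iter("parameter"):
        argv += [param.get("option"), param.get("value")]
    return argv


def io_files(descriptor_xml):
    """Split file parameters into (inputs, outputs) using the IO attribute."""
    inputs, outputs = [], []
    for param in ET.fromstring(descriptor_xml).iter("parameter"):
        if param.get("type") != "file":
            continue
        target = inputs if param.get("IO") == "in" else outputs
        target.append(param.get("value"))
    return inputs, outputs


if __name__ == "__main__":
    print(" ".join(build_command(DESCRIPTOR)))
    print(io_files(DESCRIPTOR))
```

Keeping the option letters and file roles in the descriptor means that a legacy program such as pattinprot needs no source change: only the XML has to be written for each new method.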

Legacy and secured data I/O

Reference: C. Blanchet, R. Mollon and G. Deléage: Building an Encrypted File System on the EGEE grid: Application to Protein Sequence Analysis. IEEE Proceedings of ARES 2006, Vienna, 20-22 April.

[Figure: time to download a 205-MB gridified file, in seconds (axis from 0 to 90), for six access methods:
- Encrypted LFN + lcg-cp (+ decryption)
- Encrypted LFN + Perroquet (with cache)
- Encrypted LFN + Perroquet (without cache)
- Plain LFN + lcg-cp
- Plain LFN + Perroquet (with cache)
- Plain LFN + Perroquet (without cache)]

GPS@: Gridification of NPS@

GPS@ portal on Grid

BLAST on grid through GPS@

Examples of gridified bio-resources

Resource                        Grid descriptor
Swiss-Prot and BLAST indexes    lfn://genomics_gpsa/db/swissprot/swissprot.fasta
                                lfn://genomics_gpsa/db/swissprot/swissprot.fasta.phr
                                lfn://genomics_gpsa/db/swissprot/swissprot.fasta.pin
                                lfn://genomics_gpsa/db/swissprot/swissprot.fasta.psq
TrEMBL                          lfn://genomics_gpsa/db/trembl/trembl.fasta
PROSITE                         lfn://genomics_gpsa/db/prosite/prosite.dat
                                lfn://genomics_gpsa/db/prosite/prosite.doc
ClustalW                        ESM tag "genomics_gpsa_clustalw"
SSearch                         ESM tag "genomics_gpsa_ssearch"

Examples of biological databases and bioinformatics programs registered and deployed onto the EGEE grid. Database files have been registered as logical files in the replica manager system, each with its own logical filename (LFN, lfn://), and programs have been registered with a tag of the experiment software manager (ESM tag).
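The registrations listed above can be pictured as a small catalog. The following Python sketch is illustrative only: the dictionary layout and the helper `missing_blast_indexes` are hypothetical and are not part of GPS@ or of the EGEE middleware. It records the LFNs and ESM tags from the table and checks whether a FASTA databank is accompanied by the three BLAST protein index files (`.phr`, `.pin`, `.psq`), as Swiss-Prot is in the table.

```python
# LFNs and ESM tags copied from the table; the structure itself is an assumption.
GRID_DATABANKS = {
    "Swiss-Prot": [
        "lfn://genomics_gpsa/db/swissprot/swissprot.fasta",
        "lfn://genomics_gpsa/db/swissprot/swissprot.fasta.phr",
        "lfn://genomics_gpsa/db/swissprot/swissprot.fasta.pin",
        "lfn://genomics_gpsa/db/swissprot/swissprot.fasta.psq",
    ],
    "TrEMBL": ["lfn://genomics_gpsa/db/trembl/trembl.fasta"],
    "PROSITE": [
        "lfn://genomics_gpsa/db/prosite/prosite.dat",
        "lfn://genomics_gpsa/db/prosite/prosite.doc",
    ],
}

GRID_PROGRAMS = {
    "ClustalW": "genomics_gpsa_clustalw",
    "SSearch": "genomics_gpsa_ssearch",
}

# Index files that accompany a protein FASTA file formatted for BLAST.
BLAST_INDEX_SUFFIXES = (".phr", ".pin", ".psq")


def missing_blast_indexes(lfns):
    """Return the index LFNs that are absent for each .fasta file in lfns."""
    registered = set(lfns)
    missing = []
    for lfn in lfns:
        if not lfn.endswith(".fasta"):
            continue
        for suffix in BLAST_INDEX_SUFFIXES:
            if lfn + suffix not in registered:
                missing.append(lfn + suffix)
    return missing


if __name__ == "__main__":
    for name, lfns in GRID_DATABANKS.items():
        print(name, "missing BLAST indexes:", missing_blast_indexes(lfns))
```

With the entries from the table, Swiss-Prot has all three index files, while TrEMBL is listed with its FASTA file only.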

Conclusion and perspectives

- GPS@ Web portal for bioinformatics on the grid
  - Access to grid resources of EGEE (computation and storage)
  - Well-known interface
- Integration of legacy resources
  - XML-based
  - Automatic deployment of legacy applications
- Integration with the Perroquet and EncFile tools
  - Transparent and local file access to remote data
  - On-the-fly encryption/decryption
  - Good performance
- Perspectives
  - Short jobs: execution time < 5 minutes
  - EGEE TCG working group on Short Deadline Jobs