IFB's e-infrastructure




Christophe Blanchet
Institut Français de Bioinformatique - IFB
French Institute of Bioinformatics - ELIXIR-FR
CNRS UMS3601 - Gif-sur-Yvette - FRANCE

Life Sciences Platforms in France

National platforms (GIS IBISA), number per domain:
- Cellular imaging: 19
- Genomic, transcriptomic: 16
- Proteomic: 13
- Structural biology, biophysics: 11

[Figure: map of the French NGS platforms. Source: omicsmaps.com. Legend: NGS / IMG / PRO = biological platform (genomics, imaging, proteomics...); BI = bioinformatics center; C = cloud resources; scientists.]

Regional centers distribute the computing and storage load and provide closer interaction with scientists.

National and European Infrastructures FR / EU

IFB e-infrastructure team

Staff
- Christophe Blanchet (CNRS)
- Marie Grosjean - Data and tools integration (fixed-term contract, Oct 2014 - Mar 2016)
- Mohamed Bedri - Cloud technology (fixed-term contract, Dec 2014 - May 2016)
- Fedi Ben Ali - Information technology (fixed-term contract, Feb 2015 - Jul 2016)
- Xxx Xxx - Data and tools integration (fixed-term contract, to be hired)

Services
- Provide scientists with bioinformatics resources, data and tools, as cloud appliances
  - Provide user support
  - Provide developer support to integrate their tools and databases
- Deploy and operate IFB's national IT infrastructure as a cloud
  - In collaboration with the CNRS IDRIS SC center teams
  - Evaluate and deploy cloud technologies
- Interact with the communities (academic and industrial)
  - Collaborate with IFB's partners, national and European infrastructures, scientific and technological projects
  - Animate and train the national community: GRISBI workgroup, tutorials, thematic school CumuloNumBIO 2015

http://www.france-bioinformatique.fr/?q=fr/core/cellule-infrastructure

Deploy and operate IFB's national IT infrastructure as a cloud

Available IT resources

Distributed infrastructure
- a national IFB-core resource (see table) hosted at the CNRS IDRIS SC center (Paris)
- plus regional resources: 6 regional bioinformatics centers (IFB-GO, IFB-SO, APLIBIO, IFB-GS, IFB-NE, PRABI)
- 11,000 cores and 6 PB in total, but spread over more than 20 platforms
- Create a federation of clouds for life sciences

IFB-core   Compute cores   Storage (TB)   RAM (TB)   Max VM size   Technology   Location
Pilot      200             50             2          40c, 256 GB   StratusLab   CNRS-IDRIS, Paris
2015       3,000           500            -          96c, 1 TB     StratusLab   CNRS-IDRIS, Paris
2016       10,000          2,000          -          96c, 2 TB     StratusLab   CNRS-IDRIS, Paris
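As a rough reading of the capacity table, the sketch below computes an upper bound on how many maximum-size VMs each phase could host at once. It assumes that 1 TB = 1024 GB, that rows without a RAM figure are limited by cores only, and it ignores hypervisor overhead and node boundaries, so the real numbers would be lower. The class and function names are illustrative, not part of any IFB or StratusLab tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudPhase:
    """One row of the IFB-core capacity table."""
    name: str
    cores: int
    ram_gb: Optional[int]  # None when the table gives no RAM figure
    vm_cores: int          # cores of the largest VM size
    vm_ram_gb: int         # RAM of the largest VM size, in GB


def max_size_vm_bound(phase: CloudPhase) -> int:
    """Upper bound on the number of maximum-size VMs running at the same time."""
    bound = phase.cores // phase.vm_cores
    if phase.ram_gb is not None:
        bound = min(bound, phase.ram_gb // phase.vm_ram_gb)
    return bound


# Figures copied from the table; 1 TB is taken as 1024 GB (assumption).
PHASES = [
    CloudPhase("Pilot", 200, 2 * 1024, 40, 256),
    CloudPhase("2015", 3_000, None, 96, 1024),
    CloudPhase("2016", 10_000, None, 96, 2 * 1024),
]

for phase in PHASES:
    print(phase.name, max_size_vm_bound(phase))
```

With the pilot figures the bound is set by cores rather than RAM, which is the kind of check that helps when sizing appliances for a given site.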

Cloud?

Essential characteristics
- On-demand self-service: no human intervention
- Broad network access: fast, reliable remote access
- Rapid elasticity: scale based on application needs
- Resource pooling: multi-tenant sharing
- Measured service: direct or indirect economic model with measured use

Deployment models
- Private: single administrative domain, limited number of users
- Community: different administrative domains with common interests and procedures
- Public: people outside of the institute's administrative domain
- Hybrid: federation via combination of other deployment models

Service models
- Software as a Service (SaaS): direct (scalable) hosting of end-user applications
- Platform as a Service (PaaS): framework and infrastructure for creating web applications
- Infrastructure as a Service (IaaS): access to remote virtual machines with root access

http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf

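IFB-core's cloud

[Figure: architecture of the IFB-core cloud, hosted at the IDRIS CNRS SC center]
- Service layers: SaaS, PaaS (NGS, imaging, statistics, ...) and IaaS
- Scientists connect through RENATER (10 Gb) to the frontend web portal
- A virtualization layer runs master and worker VMs, which launch jobs over a shared FS
- Pdisk storage is attached over iSCSI on 10 Gb Ethernet
- Cloud hypervisors: standard nodes with 32 cores and 128 GB; bigmem nodes with 40 cores and 256 GB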

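Storage for biological data

[Figure: data flows in the bioinformatics cloud]
- Upload your data over sftp/http/s3, with a CLI (scp, sftp) or a GUI (Cyberduck, Transmit, Filezilla, ...)
- Public data sources (genomes, EMBL, PDB, UNIPROT, PROSITE) are shared read-only over NFS, for use with BLAST, Clustal, etc.
- User data is stored on virtual disks; the portal provides identity management (example accounts: j. doe, e. martin, chb, cg, you)
- PaaS / IaaS: connect with ssh and launch jobs from a master & storage VM to worker VMs over a shared FS (VM examples in the figure: ARIA, CNS)
- Get your results over sftp/http/s3, with a CLI (scp, sftp) or a GUI (Cyberduck, Transmit, Filezilla, ...)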
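The upload and download steps with the scp CLI can be sketched as below. This only builds the command lines and does not run them; the user name, host name and paths are hypothetical placeholders, not real IFB endpoints, and the helper names are made up for this illustration.

```python
import shlex
from typing import List, Optional


def scp_upload(local_path: str, user: str, host: str, remote_dir: str,
               port: Optional[int] = None) -> List[str]:
    """Build the argv list that copies one local file to a cloud VM with scp."""
    argv = ["scp"]
    if port is not None:
        argv += ["-P", str(port)]  # scp takes the port with a capital -P
    argv += [local_path, f"{user}@{host}:{remote_dir}"]
    return argv


def scp_download(user: str, host: str, remote_path: str, local_dir: str,
                 port: Optional[int] = None) -> List[str]:
    """Build the argv list that copies one result file from the VM back home."""
    argv = ["scp"]
    if port is not None:
        argv += ["-P", str(port)]
    argv += [f"{user}@{host}:{remote_path}", local_dir]
    return argv


# Hypothetical account and VM address, for illustration only.
print(shlex.join(scp_upload("reads.fastq.gz", "jdoe", "vm.example.org", "/data/")))
print(shlex.join(scp_download("jdoe", "vm.example.org", "/data/results.tsv", ".")))
```

The lists can be passed to `subprocess.run` once a real VM address is known; building them separately keeps the sketch free of network access.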

A cloud driven through a web dashboard http://cloud.france-bioinformatique.fr/cloud

Moving VMs vs data

[Figure: map showing VMs from the IFB life sciences marketplace & VMs repository, and NGS, IMG and PRO data, exchanged between platforms, bioinformatics centers and cloud resources. Legend: NGS / IMG / PRO = biological platform (genomics, imaging, proteomics...); BI = bioinformatics center; C = cloud resources; scientists.]

Provide scientists with bioinformatics resources, data and tools, as cloud appliances

Make an inventory of national resources

- Make an inventory of the resources provided by IFB's platforms
  - Data: through the federation of existing BioMAJ servers
  - Tools: with a service registry to be set up in IFB-core; current developments are based on a graph-database model (Neo4J) and an ontology (EDAM)
- Information is stored in text files, web pages and wikis, DBMS, ...
- Large numbers of resources (10s to 100s)
- From 21 platforms and more labs
- Goal: provide the most-used resources in IFB's cloud
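The idea of a graph-based service registry can be illustrated with a minimal in-memory sketch. Plain Python dictionaries stand in for the graph database here, so this is not Neo4J code. The operation labels are informal stand-ins rather than verified EDAM terms, and the platform names are invented placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class Registry:
    """Tiny labeled graph: (node, relation) -> set of neighbor nodes."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        """Store an edge and its inverse so it can be followed both ways."""
        self._edges[(source, relation)].add(target)
        self._edges[(target, "inverse:" + relation)].add(source)

    def tools_for(self, operation: str) -> List[str]:
        """Tools annotated as performing the given operation."""
        return sorted(self._edges.get((operation, "inverse:performs"), set()))

    def tools_at(self, platform: str) -> List[str]:
        """Tools registered as provided by the given platform."""
        return sorted(self._edges.get((platform, "inverse:provided_by"), set()))


registry = Registry()
# Tool names come from the appliance list; labels and platforms are illustrative.
registry.add("BLAST", "performs", "sequence similarity search")
registry.add("ClustalW", "performs", "multiple sequence alignment")
registry.add("Muscle", "performs", "multiple sequence alignment")
registry.add("BWA", "performs", "read mapping")
registry.add("BLAST", "provided_by", "platform-A")
registry.add("BWA", "provided_by", "platform-A")
registry.add("Muscle", "provided_by", "platform-B")

print(registry.tools_for("multiple sequence alignment"))
print(registry.tools_at("platform-A"))
```

Annotating tools with shared ontology terms is what lets a query such as "which tools align sequences" return results across platforms that describe their tools differently.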

Cloud reference databases repository

[Figure: a BioMAJ VM managed by IFB maintains the reference databases on shared storage, which is available to all virtual machines in the cloud.]

Create bioinformatics cloud appliances

Integration of bioinformatics tools
- Bioinformatics appliances are pre-defined virtual machines
  - small: a few GB, easy to convert to most virtualization formats
  - installed and pre-configured with bioinformatics tools, e.g. BLAST, ClustalW, ARIA, MEME, HMMer, TopHat, BWA, Samtools, RSAT, etc.
- Referenced in the IFB marketplace, a catalog of VM templates devoted to bioinformatics tools
- Create new cloud services

[Figure: tools (BLAST, FastA, SSearch, OMSSA, PeptideShaker, X!Tandem, ARIA, BWA, HMMer, TopHat, samtools, Galaxy, ClustalW2, Clustal Omega, Muscle, fastqc, R) are installed on a Linux system to build virtual machines, which are published in the bioinformatics marketplace under categories such as Structures, Sequences, Proteomics and Galaxy.]

Current bioinformatics appliances

- Scientific apps (CLI, virtual desktop or web): Galaxy MODAL, Proteomics, Galaxy Imaging, Galaxy AVIESAN 2013, RSAT, PhyML, RSAT mini, R statistics, Aria
- Utilities: biocompute Node
- Data management: biodata BioMaj, BlobSeer, biodata NFS, Cassandra, biohadoop
- Base OS: CentOS, Ubuntu

Browse the appliances and run yours!

[Screenshot: the IFB Marketplace, with appliance categories such as Proteomics, Sequences, Galaxy, Structures, ...]

Usecase: cloud Galaxy portal

- The Galaxy portal is widely used in the community
  - to analyse NGS data (mainly, but not only)
  - connected to community knowledge: data and indexes, tools, workflows
- Cloud advantages:
  - The user is administrator of his/her own Galaxy instance and can install data and tools
  - Workflows and results are preserved in cloud storage
  - It eases the integration of monthly updates and new tools
- The cloud permits different appliances to be built from the same base:
  - a base appliance with common tools for NGS
  - specific appliances for a domain or a set of tools, e.g. Galaxy-MODAL (MOdels for Data Analysis and Learning)
  - for training: a special appliance with dedicated datasets, tools or workflows, e.g. for the French AVIESAN school 2013

Usecase: RSAT

- A specialized software suite for the analysis of noncoding sequences
  - motif discovery in promoters of co-expressed genes
  - ChIP-seq analysis
  - evolutionarily conserved motifs (phylogenetic footprints)
- Contact: J. van Helden (TAGC)
- Used for ECCB'14 tutorial T01

RSAT offers a series of tools dedicated to the detection of regulatory signals in noncoding sequences. Starting from a list of genes of interest, you can:
- retrieve the upstream sequences over a desired distance,
- discover putative regulatory signals,
- search the matching positions for these signals in your original dataset or in whole genomes,
- display the results graphically in the form of a feature map.
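Two of the steps above, retrieving upstream sequences and searching for the positions that match a signal, can be illustrated with a toy sketch. This is not RSAT code and does not use its algorithms: the genome string, gene coordinates and motif are invented, only the forward strand is considered, and the motif is matched as an exact IUPAC pattern.

```python
import re
from typing import Dict, List

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def upstream(genome: str, gene_start: int, distance: int) -> str:
    """Forward-strand bases (at most `distance`) ending just before gene_start (0-based)."""
    return genome[max(0, gene_start - distance):gene_start]


def motif_positions(sequence: str, motif: str) -> List[int]:
    """0-based start positions of all matches, overlapping ones included."""
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile("(?=" + body + ")")  # lookahead keeps overlapping hits
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Invented toy genome with two genes starting at their ATG codon.
GENOME = "GGCACGTGTTTT" "ATGAAACCCGGG" "TTTCACGTGAA" "ATGCCC"
GENE_STARTS: Dict[str, int] = {"geneA": 12, "geneB": 35}

for gene, start in GENE_STARTS.items():
    region = upstream(GENOME, start, distance=10)
    print(gene, region, motif_positions(region, "CAYGTG"))
```

Printing the positions per gene is the textual equivalent of a feature map: one line per sequence, with the coordinates of each hit.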

Usecase: proteomics virtual desktop

- Motivation
  - Collaboration with a mass spectrometry platform
  - Running out of space on their local resources
- Protein identification tools
  - Mass experimental data
  - Reference databases: nr, Swiss-Prot
  - Reference screening tools: OMSSA, X!Tandem
- User interface
  - Remote virtual desktop (NX)
  - Reference GUI: PeptideShaker

Interacting with the communities - animate the community, train people and participate in projects

GRISBI
http://www.france-bioinformatique.fr/?q=fr/groupe-de-travail/grisbi

Groupe de réflexion sur les InfraStructures BioInformatiques (working group on bioinformatics infrastructures)
- Technology coordination between IFB-core, regional centers and platforms
  - identification of needs, choice of directions, follow-up of developments
  - transfer of skills in cloud technology
- Link with technological partners: IDRIS, IDGC, regional computing centers (mésocentres), StratusLab, ELIXIR, ...

[Figure: number of participants per GRISBI meeting, broken down by group (IFB-core, APLIBIO, PRABI, IFB-GO, IFB-SO, IFB-GS, IFB-NE, partners). Totals: Grisbi-22 (2012-06): 15; Grisbi-23 (2012-12): 15; Grisbi-24 (2013-11): 22; Grisbi-25 (2014-06): 22; Grisbi-26: 28.]

Tutorials "Cloud pour la Biologie" (Cloud for Biology)

2 sessions in 2014
- 19 June 2014 - 23 participants - IFB-core, Gif-sur-Yvette
- 6 November 2014 - 20 participants - GenOuest, IFB-GO, Rennes

Introductory training on using cloud computing to analyse biological data with the usual bioinformatics tools. It covers the general concepts and principles of the cloud, and describes the tools, the usages and the benefits of the cloud for bioinformatics. A hands-on part is run on the cloud concerned by the session (IFB-core or Genocloud).

Trainers
- Christophe BLANCHET (IFB-core)
- Olivier COLLIN (GenOuest)
- Jean-François GIBRAT (IFB-core)
- Marie GROSJEAN (IFB-core)
- Charles LOOMIS (LAL)
- Cyril MONJEAUD (GenOuest)
- Yvan LE BRAS (GenOuest)
- Olivier SALLOU (GenOuest)

Programme
09h00-09h30   Welcome of participants
09h30-10h00   Presentation of IFB (Salle Markov)
10h00-11h00   Cloud computing (Salle Markov)
11h00-11h30   Coffee break (Salle Markov)
11h30-13h00   Cloud for biology (Salle Markov)
13h00-14h00   Lunch (Hall Amphi G)
14h00-15h30   Examples of cloud applications (Salle Markov)
15h30-16h00   Coffee break

Cumulo NumBIO 2015 school

Objectives
Bring together, as widely as possible, life science researchers and engineers, with their needs for large-scale analysis of heterogeneous biological data, and computer science researchers and engineers, with their research developments and the existing cloud solutions for integrating software and data.

- 1 to 5 June 2015; venue: to be confirmed
- Expected attendance: 100 people
- Supported by CNRS and two of its institutes: INSB and INS2I

Sessions and chairs
- Needs of biology, examples from genomics and proteomics: C. Bruley, C. Gaspin, T. Grange, C. Médigue, C. Thermes
- State of the art of bioinformatics infrastructures: J.F. Gibrat, H. Touzet
- Integration of data and tools, workflows and provenance: S. Cohen-Boulakia, C. Froidevaux
- Cloud: academic production infrastructures: V. Breton, C. Loomis
- Cloud: research developments: M. Daydé, F. Desprez
- Data management: G. Antoniu
- The IFB national infrastructure: C. Blanchet, O. Collin
- Presentations by participants (selected beforehand)

ELIXIR
- Workstreams: Tools interoperability and service registry; ELIXIR technical services
- Partners: Technical Core Group and task forces (cloud, services registry, AAI)
- Excelerate project, work packages: Tools Interoperability and Service Registry; Technical Services

PIA BIODATACLOUD
- WP-2.3 IT needs

EU H2020 Cyclone
- Bioinformatics applications

National infrastructures: ProFi, France Génomique, MetaboHub

Conclusion - Current usage of IFB's cloud

Scientific production - 100 users
- open to members of IFB (at least with standard user level)
- open to partners, academic and industrial, infrastructures and projects, e.g. BioDataCloud, ProFi, MetaboHub, ...
- resource allocation according to scientific and financial criteria

Training - 60 users
- short- and medium-term accounts
- two tutorials "Cloud pour la Biologie" in June and Nov. 2014
- tutorial at ECCB'14 about "Analysis of Cis-Regulatory Motifs from High-Throughput Sequence Sets", in Sept. 2014
- Master 2 hands-on "Méthodes bioinformatiques pour la cis-régulation" (AMU SBBCU16L), Oct.-Dec. 2014

Perspectives

- Create more bioinformatics appliances
  - built by the experts of the different life sciences domains and made available to the community
  - appliances currently in progress: BioDataCloud-RNAseq, ProFi, SymBioWatch, Clinical NGS for cancerology, REPET, TriAnnot, BWA-mpi
- IFB supports different domain-specific developments
  - First round: Microbial Bioinformatics, Evolutionary Bioinformatics, Plant Bioinformatics, Structural Biology, NGS data processing
  - Future call for projects
- Technical pilots
  - Interoperability of appliances on different cloud infrastructures
  - Registry of distributed multi-cloud datasets
  - Live remote cloud processing of sequencing data

Questions?
http://www.france-bioinformatique.fr

Acknowledgments
- Clément Gauthey (CNRS IDRIS, formerly IDB-IBCP)
- Developers who integrated their tools as IFB appliances: Samuel Blanck (Inria Lille), Jacques van Helden (TAGC), ... You?
- IFB members
- StratusLab members
- IFB is funded by the French program PIA INBS 2012