The IFB cloud and its Galaxy instance

Transcription:

The IFB cloud and its Galaxy instance
Christophe BLANCHET
Institut Français de Bioinformatique (IFB), French Institute of Bioinformatics, ELIXIR-FR
CNRS UMS3601, Gif-sur-Yvette, France
Ecole Bioinformatique Aviesan, 28 September 2015, Roscoff

Experimental data in life sciences (FR)

French national platforms (GIS IBISA), number per field:
  Cellular imaging                  19
  Genomics, transcriptomics         16
  Proteomics                        13
  Structural biology, biophysics    11

French NGS platforms (map; source: omicsmaps.com)

[Figure: network linking biological platforms (NGS genomics, IMG imaging, PRO proteomics...), bioinformatics centers (BI C), cloud resources and scientists]

Regional centers distribute the computing and storage load and provide closer interaction with scientists.

Reference: "Un déluge de données", Blanchet C. and Collin O., 2011, Biofutur, 323: 64-67

A lot of bioinformatics tools

[Figure: word cloud of tool names: BLAST, FastA, OMSSA, ClustalW2, SSearch, PeptideShaker, ARIA, BWA, X!tandem, HMMer, TopHat, samtools, Galaxy, Clustal Omega, Muscle, fastqc, R]

Tools and versions:
ABYSS 1.3.4, ARIA 2.3, Bioconductor 2.11, biomaj, BLAST+ 2.2.27, Blat 35,
Bowtie 0.12.8, Bowtie2 2.0.0-beta7, BWA 0.6.2, BWA 0.7.10, CAP3, CD-HIT 4.6.1,
Clustal Omega 1.0.3, CLUSTALW 2.1, Cufflinks 2.0.2, Cutadapt 1.2.1, E-SURGE 1.9.0,
Exonerate 2.2.0, express 1.5.1, FastA 3.6, FastQC 0.10.1, Galaxy portal, GATK 2.3.4,
HMMer 3.0, ImageJ 1.48, khmer 1.1, M-SURGE 1.8.5, MEME 4.7, MMSEQ 0.11.2a, Mobyle,
MODAL, MultAlin 5.4.1, MUSCLE 3.8.31, neo4j, Oases 0.2.08, OMSSA 2.1.9,
PeptideShaker 0.18.3, phyml 3.1, PREDATOR 2.1.2, proline, python 2.7, R 2.13,
R 3.1.1, R 3.1.2, R-studio, Ray 1.3, RSAT, samtools 0.1.18, Samtools 1.1,
SearchGUI 1.10.4, SeqClean, Shiny, Stacks, STAR 2.4.0f1, SuMo v1, TGICL,
TopHat 2.0.6, trim_galore 0.3.7, Trinity 2.0.4, U-CARE 2.3.2, VCFtools 0.1.11,
Velvet 1.2.10, X!tandem 12-10-01-1, XPLOR-NIH 2.30

Many interfaces

The French Institute of Bioinformatics and its e-infrastructure

IFB - Institut Français de Bioinformatique

IFB, the French distributed infrastructure for life-science information
Mission: to make core bioinformatics resources available to the national and international life science research community.
- To provide support for national biology programs
- To provide an IT infrastructure devoted to the management and analysis of biological data
- To act as a middleman between the life science community and the bioinformatics / computer science research community
http://www.france-bioinformatique.fr
CNRS UMS3601, Avenue de la Terrasse, Bât 21, 91190 Gif-sur-Yvette

ELIXIR French Node
ELIXIR, the European distributed infrastructure for life-science information
- To optimize interactions and coordination between the national level, ELIXIR and other ESFRI infrastructures in the biomedical and environmental fields
- To promote consistency and complementarity between the components offered by the ELIXIR French node and those of other European nodes

IFB e-Infrastructure

Mission: to provide core bioinformatics resources to the life science research community.
- To set up a French IT infrastructure (cloud) devoted to the management and analysis of biological data
- To provide hardware, data collections and bioinformatics tools
- To collaborate with international infrastructures (ELIXIR)

Current resources
- A national hub: IFB-core, with IT resources hosted at the CNRS IDRIS supercomputing center
- A network of regional centers
- 32 bioinformatics platforms, 15,000 cores, 5 PB
- Two running clouds: IFB-core and GenOuest

Goal: create a federation of clouds for life sciences

Virtualization

[Figure: a physical server hosting virtual machines 1 to N. Each virtual machine stacks an application on an operating system on virtual hardware (processor, RAM, storage, network). Underneath, the hypervisor runs on the physical hardware (processor, RAM, storage, network).]

IFB-core's cloud

IFB-core   Compute cores   Storage (TB)   RAM (TB)   Max VM size       Technology   Location
Pilot      200             50             2          40c, 256 GB       StratusLab   CNRS-IDRIS, Paris
2016-S1    3,000           500            -          ? 144c, 3 TB ?    StratusLab   CNRS-IDRIS, Paris
2017       10,000          2,000          -          ??                StratusLab   CNRS-IDRIS, Paris

Architecture (hosted at the IDRIS CNRS supercomputing center)
- Scientists connect over RENATER (10 Gb)
- Service layers: SaaS, PaaS, IaaS (NGS, imaging, statistics; launch jobs)
- Components: web portal, frontend, master, workers, shared FS, virtualization layer
- Pdisk storage over iSCSI, 10 Gb Ethernet
- Cloud hypervisors
  - std nodes: 32c, 128 GB
  - bigmem nodes: 40c, 256 GB
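As a rough illustration of what these node sizes mean in practice, the sketch below estimates how many identical VMs fit on one hypervisor node. The node sizes come from this slide and the VM size (2 CPU, 8 GB RAM, called c3.medium in the practical part of the talk) from a later one. The function name is hypothetical, and the calculation assumes no over-commit and no hypervisor overhead, which the slides do not specify.

```python
def vms_per_node(node_cores, node_ram_gb, vm_cores, vm_ram_gb):
    """Upper bound on identical VMs that fit on one hypervisor node.

    Assumption: no CPU/RAM over-commit and no hypervisor overhead.
    """
    return min(node_cores // vm_cores, node_ram_gb // vm_ram_gb)


STD_NODE = (32, 128)     # cores, GB of RAM (from the slide)
BIGMEM_NODE = (40, 256)  # cores, GB of RAM (from the slide)
C3_MEDIUM = (2, 8)       # cores, GB of RAM (from the practical session)

print(vms_per_node(*STD_NODE, *C3_MEDIUM))     # 16
print(vms_per_node(*BIGMEM_NODE, *C3_MEDIUM))  # 20
```

On a std node both CPU and RAM run out at the same time; on a bigmem node the cores are the limiting resource for this VM size.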

Provide scientists with bioinformatics resources (data and tools) as cloud appliances

Create bioinformatics appliances

[Figure: virtual machines VM 1 to VM n (application, OS, virtual hardware) running on a hypervisor and physical hardware. Bioinformatics tools (BLAST, FastA, OMSSA, ClustalW2, SSearch, PeptideShaker, ARIA, BWA, X!tandem, HMMer, TopHat, samtools, Galaxy, Clustal Omega, Muscle, fastqc, R) plus a Linux system become virtual machines, published as new cloud services in the Bioinformatics Marketplace.]

What is an appliance?
- A predefined virtual machine including tools, pipelines and recipes
- Ready to run

Appliance annotation
- Title
- Description
- With controlled vocabulary: Topics, Tools
- Contact
- Developer(s) and maintainer(s)

Marketplace categories: Structures, Sequences, Proteomics, Galaxy...
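The annotation fields listed on this slide can be pictured as a small record per appliance. The sketch below is only an illustration of that idea, not the marketplace's actual data model: the class name, the field types, the catalog entries and the helper function are all hypothetical. The tool versions are borrowed from the tool list earlier in the talk, but which appliance ships which tool is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Appliance:
    """Annotation of one appliance; field names follow the slide."""
    title: str
    description: str
    topics: List[str] = field(default_factory=list)
    tools: Dict[str, str] = field(default_factory=dict)  # tool name -> version
    contact: str = ""
    developers: List[str] = field(default_factory=list)


def appliances_with_tool(catalog, tool):
    """Return (appliance title, tool version) for appliances shipping `tool`."""
    wanted = tool.lower()
    return [
        (app.title, version)
        for app in catalog
        for name, version in app.tools.items()
        if name.lower() == wanted
    ]


# Hypothetical catalog, for illustration only.
catalog = [
    Appliance(
        title="Example NGS appliance",
        description="Common tools for NGS analysis",
        topics=["Sequences"],
        tools={"BWA": "0.7.10", "samtools": "1.1"},
    ),
    Appliance(
        title="Example R appliance",
        description="R statistical computing",
        topics=["Statistics"],
        tools={"R": "3.1.2"},
    ),
]

print(appliances_with_tool(catalog, "bwa"))
```

A lookup of this kind is what the practical session later asks for by hand: finding which appliance provides a given tool, and in which version.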

IFB's bioinformatics appliances

[Figure: catalog of appliances. Group labels: Remote desktop, Web, Scientific apps, CLI, Utilities, Data mgmt, Base OS. Appliances shown: Proteomics, Imaging, Galaxy (MODAL, Eco Pop, RADseq, AVIESAN 2015), BioDataCloud, IGV, RSAT, PhyML, MacSyFinder, SynBioWatch, R statistics, biocompute Node, Aria, biohadoop, biodata BioMaj, BlobSeer, biodata NFS, Cassandra, Neo4j, Docker, CentOS, Ubuntu.]

App: Cloud Galaxy Portal

The Galaxy portal is widely used in the community
- to analyse NGS data (mainly, but not only)
- connected to community knowledge: data and indexes, tools, workflows

Cloud advantages
- The user is administrator of his/her own Galaxy instance and can install data and tools
- Versions 1, 2, ..., n and updates
- Workflows and results are preserved in cloud storage
- Eases the integration of monthly updates and new tools

The cloud permits different appliances to be built from the same base
- a basic one with common tools for NGS
- specific ones for a domain or a set of tools, e.g. Galaxy-MODAL, Galaxy-RADseq, EBA-ChIP-Seq
- for training: a special appliance with dedicated datasets, tools or workflows, e.g. AVIESAN school 2013

[Figure: build chain. A Linux system plus Galaxy and tools (manual or automatic installation) gives the Galaxy NGS, Galaxy RADseq, Galaxy MODAL and Aviesan appliances, which are published to the cloud marketplace, then installed and run as VMs.]

App: R Statistical Computing

R
- software environment for statistical computing and graphics
- includes common bioinformatics modules: Biobase, BiocGenerics, BiocInstaller, GenomeInfoDb

RStudio IDE
- integrated development environment (IDE) for R
- features: console, syntax-highlighting editor

Shiny web framework
- powerful web framework for building web applications using R, without requiring HTML, CSS or JavaScript knowledge

Contact: Stéphane Delmotte (PRABI-LBBE)

IFB's cloud for Bioinformatics

[Figure: public data sources (EMBL, PDB, Genomes, UNIPROT, PROSITE) are fetched with BioMAJ into reference datasets on a common share used by all VMs. Each user (j. doe, e. martin, you, ...) has cloud credentials and authorization, personalized interfaces, VMs, and virtual disks holding user data.]

A cloud driven through a web dashboard
http://cloud.france-bioinformatique.fr/cloud

Browse the marketplace and run an App!
IFB's bioinformatics marketplace: Proteomics, Sequences, Galaxy, Structures...

Practice

Connecting to the IFB cloud
http://cloud.france-bioinformatique.fr/cloud
Log in to the IFB cloud via Sign in

Connecting to the IFB cloud (2)
On first login, complete your parameters in the Settings section.
SSH key
- file ~/.ssh/dsa.pub
- watch out for line breaks when copy-pasting
- create it with ssh-keygen (or PuTTYgen)
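The warning about line breaks can be checked before pasting the key into the Settings form. The sketch below is a hypothetical helper, not part of the IFB dashboard. It assumes the usual OpenSSH one-line public key layout, `<type> <base64-data> [comment]`, and relies on the fact that the decoded data starts with a length-prefixed copy of the key type.

```python
import base64
import binascii
import struct


def check_public_key(pasted: str) -> str:
    """Return the key as one clean line, or raise ValueError."""
    text = pasted.strip()
    if "\n" in text or "\r" in text:
        raise ValueError("key contains a line break; paste it as one line")
    fields = text.split(None, 2)  # type, base64 data, optional comment
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-data> [comment]'")
    key_type, data = fields[0], fields[1]
    try:
        blob = base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("key data is not valid base64") from exc
    # The decoded blob starts with a 4-byte big-endian length,
    # followed by the key type name (e.g. b"ssh-dss").
    if len(blob) < 4:
        raise ValueError("key data is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != key_type.encode("ascii"):
        raise ValueError("key data does not match the declared key type")
    return text
```

Typical use would be `check_public_key(open(path).read())` on the public key file before copying its content. The check catches a key split across lines or with a corrupted start; it cannot detect every possible truncation.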

The IFB cloud dashboard
Manage your VMs
- create / stop / rename
Manage your virtual disks
- create / delete
View the parameters
- name / type / size
- state / CPU load
- attached virtual disk

Create a virtual machine
IFB's Marketplace: Proteomics, Sequences, Galaxy, Structures...

Choose the right appliance

Characteristics of a VM (also called an instance)
Define your VM
- a name
- number of CPUs
- memory size
- attach a virtual disk
Cluster of VMs
- fill in the number of VMs
- choose a unique name

Virtual hard disks
To store your data
- variable size and number (within quota)
- find your data again from one VM to the next
- Warning: no backup!
Using a vdisk
- attach it to one (and only one) VM
- when the VM is created
Sharing a vdisk
- cluster mode
- VM NFS
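The vdisk rules on this slide (one VM at a time, attachment at VM creation, data carried over to the next VM) can be summarized in a toy model. The sketch below is purely illustrative: the class and function names are hypothetical and do not correspond to any IFB or StratusLab API.

```python
class VDisk:
    """A persistent virtual disk (toy model)."""

    def __init__(self, name, size_gb):
        self.name = name
        self.size_gb = size_gb
        self.attached_to = None  # name of the VM currently using the disk
        self.files = []          # stands in for the data stored on the disk


def create_vm(vm_name, vdisk=None):
    """Create a VM; a vdisk can only be attached here, at creation time."""
    if vdisk is not None:
        if vdisk.attached_to is not None:
            raise RuntimeError(
                f"{vdisk.name} is already attached to {vdisk.attached_to}")
        vdisk.attached_to = vm_name
    return {"name": vm_name, "vdisk": vdisk}


def terminate_vm(vm):
    """Stop a VM and release its vdisk; the data on the disk is kept."""
    if vm["vdisk"] is not None:
        vm["vdisk"].attached_to = None


disk = VDisk("monddgalaxy", 10)
first = create_vm("vm1", disk)
disk.files.append("results.txt")
terminate_vm(first)
second = create_vm("vm2", disk)   # the same data is visible from the new VM
print(second["vdisk"].files)      # ['results.txt']
```

The model deliberately has no backup step: as the slide warns, data on a vdisk exists in one place only.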

Practice
From the cloud dashboard: http://cloud.france-bioinformatique.fr/cloud/
Create your virtual disk
- button New vdisk
- monddgalaxy, 10 GB

Practice
From the cloud dashboard: http://cloud.france-bioinformatique.fr/cloud/
Button New instance
- identify which appliance provides the deeptools tool
- which version of the tool?
Create an instance of EBA15 Galaxy ChIP-seq
- name: galaxy_roscoff_2015
- size: c3.medium (2 CPU, 8 GB RAM)
- attach your virtual disk monddgalaxy

Monitor your usage

Questions?

Acknowledgments
IFB members
- IFB hub: Patricia, Jean-François, Mohamed, Jonathan, Maxime, Dominique
- Alumni: Marie, Quentin
- We are hiring!
Working group IFB-GRISBI (co-chair with Olivier Collin)
Appliance developers: Samuel Blanck (Inria Lille), Jacques van Helden (TAGC), Stéphane Delmotte (PRABI-Doua), Bruno Spataro (PRABI-Doua), Marie-Laure Franchinard (MIGALE), Anis Djari (BioinfoGenoToul), Bertrand Néron (Institut Pasteur), Adrien Josso (MicroScope), Thomas Lacroix (MIGALE), Christian Baudet (CLB), Germain Paimparay & Baptiste Brault (CFB)
CNRS IDRIS: R. Medeiros, C. Gauthey and staff
StratusLab members

Funding
- French programs: PIA INBS 2012, BioDataCloud
- EU H2020 projects: CYCLONE (644925), EGI-Engage (654142)
http://www.france-bioinformatique.fr