Institut Français de Bioinformatique: A Cloud for Life Sciences

Christophe Blanchet
Institut Français de Bioinformatique (IFB), French Institute of Bioinformatics, ELIXIR French Node
CNRS UMS3601, Gif-sur-Yvette, France

Sequencing
- Complete genome sequencing has become a lab commodity with [...] (cheap and efficient)
- Sources: www.genomesonline.org, www.politigenomics.com/next-generation-, omicsmaps.com

And other experimental data
- Growth of EMBL-EBI resources, FR - EU (source: EMBL-EBI Annual Report 2013)

Experimental Platforms in Biology
National platforms (GIS IBiSA)

  Platform type                     Number
  Cell imaging                      19
  Genomics, transcriptomics         16
  Proteomics                        13
  Structural biology, biophysics    11

Location of platforms
- Intermediate sites spread the storage and compute load while keeping resources close to the scientists
- Source: omicsmaps.com

Infrastructures in Biology
[Diagram: biological platform (genomics, imaging, proteomics...), center, cloud resources, scientists]
- Many bioinformatics tools and services to process and visualize the biological data

IFB mission
Mission: to make core bioinformatics resources available to the national life science research community
- by providing support for biology programs and projects
- by bolstering user training
- by setting up an IT infrastructure devoted to the management and analysis of biological data
  - material resources: CPUs, disks, etc.
  - availability of biology data collections
  - deployment of bioinformatics tools (Cloud)
- by acting as a middleman between the needs of the life science community and the bioinformatics/computer science research community
- by being the ELIXIR French node

IFB structure
IFB: a national research infrastructure of service in bioinformatics
IFB consists of:
- A network of 6 regional centers (> 20 PFs): about 110 FTE permanent staff + 70 FTC staff
- A national node, IFB-core: planned at 10 FTE permanent staff (currently 3) + a few FTC staff

IFB-core: IFB's national hub
IFB-core consists of two teams:
- The "Communication, Training, Exploitation" unit, in charge of the web site and of the training program based on e-learning
- The "Infrastructure" unit, in charge of the IFB national IT infrastructure

IFB-core tasks
- to provide technical and administrative support to IFB
- to implement IFB scientific policy and facilitate the dissemination of actions
- to ensure effective coordination between the PFs
- to serve as an interface by providing a unique entry point to IFB partners (supervisory authorities, the life science community, European and national bioinformatics communities)
- to set up and manage the IFB's national IT infrastructure
- to facilitate access to this IT infrastructure by deploying an academic cloud

IFB e-infrastructure
- Support: help members to deploy and use their tools
- e-infrastructure: hardware, biology data collections, bioinformatics tools
- Academic cloud for life science
  - a core resource: IFB-core, hosted at the CNRS IDRIS center (Paris)
  - + regional resources: 6 regional bioinformatics centers with 2 clouds, 11,000 cores, +6 PB, but +20 bioinformatics platforms
- Create a federation of clouds for life sciences

Technical organization
- GRISBI: a national technical working group (all platforms)
- Participation in ELIXIR task forces
[Map labels: ReNaBi-GO, ReNaBi-SO, APLIBIO, IFB-core, ReNaBi-GS, ReNaBi-NE, PRABI]

Cloud?
- Service models: SaaS, IaaS, PaaS
- Deployment models: public, community, private, hybrid

Cloud Resources

  Resource        Location            # Compute cores  # TB storage  # TB RAM  Max VM size  Technology
  IFB-core        CNRS-IDRIS, Paris   200              50            2         40c 256GB    StratusLab
  IFB-core 2015   CNRS-IDRIS, Paris   3,000            500           -         96c 1TB      StratusLab
  IFB-core 2016   CNRS-IDRIS, Paris   10,000           2,000         -         96c 2TB      StratusLab
  Genocloud       IFB-GO, Rennes      240              8             1         -            ONE

Ack.: C. Loomis

A cloud driven through a web dashboard

Ready-to-use bioinformatics cloud appliances
- Appliances are usual virtual machines
- Create new cloud services, e.g. BLAST, Clustalw, ARIA, MEME, HMMer, TopHat, BWA, Samtools, etc.
- Run bioinformatics appliances
[Diagram labels: MODAL, CLI, PhyML, Web, AVIESAN 2013, virtual desktop, biocompute node, RSAT]

A marketplace devoted to bioinformatics
- both a virtual machines repository: store life science VMs
- and a catalogue: help users to select the appropriate VM for their analysis
[Diagram labels: Aria, RSAT, biodata, BioMaj, BlobSeer, NFS, biohadoop, CentOS, Cassandra...]

Current bioinformatics appliances @ IFB
- Virtual machines: Linux system + scientific apps
- Referenced in a marketplace
- Installed and preconfigured with bioinformatics tools: R, Clustalw2, BLAST, FastA, OMSSA, SSearch, PeptideShaker, ARIA, BWA, Xtandem, HMMer, TopHat, samtools, Clustal Omega, Muscle, fastq
- Small: a few GB, easy to convert to most virtualization formats
- http://cloud.france-bioinformatique.fr/cloud

Using the appliances
(1) Filter: scientists filter the appliances through a Web interface, using image metadata related to bioinformatics
(2) Launch: they identify and launch the appropriate ones (tools VM: BLAST, Clustalw2, etc.)
(3) Use tools: scientists have access to their own cloud resources through web portal, remote virtual desktop or SSH
[Diagram labels: public data (UniProt, EMBL, Genomes, PDB, PROSITE), IDB cloud, base OS (Ubuntu), select tools, analyze, move cloud virtual machines]

Attribute <bio:tool> in VM manifests
- Scientists can select the appropriate appliance according to the tools required for their analyses, e.g. the BLAST tool

Deploy on several clouds
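
The tool-based selection described above (a <bio:tool> attribute in VM manifests, used to filter appliances) could be sketched as follows. This is only an illustration: the namespace URI, the manifest layout and the appliance names are assumptions made for the example, not the actual IFB or StratusLab marketplace schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and manifest layout, for illustration only.
BIO_NS = "http://example.org/bio#"

# Hypothetical appliance manifests; the tool names come from the slides.
MANIFESTS = {
    "biocompute": (
        f'<manifest xmlns:bio="{BIO_NS}">'
        "<bio:tool>BLAST</bio:tool><bio:tool>Clustalw2</bio:tool>"
        "</manifest>"
    ),
    "proteomics-desktop": (
        f'<manifest xmlns:bio="{BIO_NS}">'
        "<bio:tool>OMSSA</bio:tool><bio:tool>PeptideShaker</bio:tool>"
        "</manifest>"
    ),
}


def tools_in_manifest(xml_text):
    """Return the set of tool names declared with <bio:tool> in one manifest."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(f"{{{BIO_NS}}}tool") if el.text}


def find_appliances(manifests, required_tools):
    """Return the names of appliances that declare every required tool."""
    required = {tool.lower() for tool in required_tools}
    return sorted(
        name
        for name, xml_text in manifests.items()
        if required <= {tool.lower() for tool in tools_in_manifest(xml_text)}
    )


if __name__ == "__main__":
    print(find_appliances(MANIFESTS, ["BLAST"]))
```

Matching is case-insensitive here so that a request for "blast" still finds an appliance declaring "BLAST"; a real catalogue might instead rely on a controlled vocabulary of tool names.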

Storage for biological data
[Architecture diagram, recoverable elements:]
- Public data sources (Genomes, EMBL, PDB, UniProt, PROSITE), shared with the VMs (NFS ro)
- PaaS: BLAST, Clustal, etc.
- IaaS: Master & Storage VM and Workers VM with a shared FS; jobs are launched through ssh
- Cloud portal: identity management, virtual disks, usage monitoring
- User: upload your data and get your results over sftp/http/s3, with a CLI (scp, sftp) or a GUI (Cyberduck, Transmit, Filezilla, ...)

Moving VMs vs Data
[Diagram: biological platform (genomics, imaging, proteomics...), center, cloud resources, scientists, VMs, IFB life sciences marketplace & VMs repository]

Case 1: Standard node (appliance Biocompute)
- Use your own instance(s)
- With pre-installed standard bioinformatics tools
  - BLAST, FastA, SSearch, HMM, ...
  - Clustalw2, Clustal-Omega, Muscle, ...
  - Bowtie(2), BWA, samtools, ...
  - MEME, R, etc.
- Connected to public reference data
  - Uniprot, EMBL, genomes, PDB, etc.
  - Automatically shared to the VMs
- Cluster mode
  - turn several instances into a single virtual cluster
  - shared file system
  - batch scheduling
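
To make the "cluster mode" idea concrete (several instances acting as one virtual cluster with batch scheduling), here is a toy sketch of how a batch of jobs might be spread over worker VMs. It is not the scheduler used on the IFB cloud, which the slides do not name; the job names, durations and worker names are invented for the example.

```python
import heapq


def schedule(jobs, workers):
    """Greedy batch plan: longest jobs first, each sent to the least-loaded worker.

    jobs    -- list of (job_name, estimated_minutes) tuples (hypothetical values)
    workers -- list of worker VM names (hypothetical values)
    Returns a dict mapping each worker to the list of job names assigned to it.
    """
    # Min-heap of (current load in minutes, worker name).
    heap = [(0, worker) for worker in workers]
    heapq.heapify(heap)
    plan = {worker: [] for worker in workers}
    for name, minutes in sorted(jobs, key=lambda job: -job[1]):
        load, worker = heapq.heappop(heap)
        plan[worker].append(name)
        heapq.heappush(heap, (load + minutes, worker))
    return plan


if __name__ == "__main__":
    jobs = [("blast-1", 30), ("blast-2", 20), ("bwa-1", 10)]
    print(schedule(jobs, ["worker-1", "worker-2"]))
```

Because all workers mount the same shared file system, the plan only has to decide where each job runs; inputs and results stay at the same paths on every node.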

Case 2: Cloud portal
- Web interface: the portal is widely used in the community
- Analyse (mainly but not only) [...]
- Connected to community knowledge: data and indexes, tools, workflows
- Preserve workflows and results (cloud virtual disk)
- Help the integration of monthly updates and new tools

Appliances @ IFB cloud
The cloud permits different appliances to be built from the same base:
- generic: with common tools
- specific: for a set of tools, e.g. MODAL (MOdels for Data Analysis and Learning)
- for training: create a special appliance with dedicated datasets, tools or workflows (French AVIESAN school 2013)
- domain-specific appliances: RNAseq, ChIPseq, etc. (planned)
Created by interactive installation:
- Main: Linux system with standard tools
- Devoted: manual installation, for tools (e.g. MODAL) or for a specific event (training, demo, e.g. AVIESAN 2013)
[Diagram labels: MODAL Version 1.0, Version 2.0]

Case 3: RSAT
- A specialized software suite for the analysis of noncoding sequences
- Used for
  - motif discovery in promoters of co-expressed genes
  - ChIPseq analysis
  - evolutionary conserved motifs (phylogenetic footprints)
- ECCB 14 tutorial T01
- Contact: J. van Helden (TAGC)
RSAT offers a series of tools dedicated to the detection of regulatory signals in noncoding sequences. Starting from a list of genes of interest, you can:
- retrieve the upstream sequences over a desired distance,
- discover putative regulatory signals,
- search the matching positions for these signals in your original set or in whole genomes,
- display the results graphically in the form of a feature map.

Case 4: virtual desktop
- Motivation
  - Collaboration with a mass spectroscopy platform
  - Running out of space on their local resources
- Mass experimental data
- Reference databases: nr, Swiss-Prot
- Reference screening tools: OMSSA, XTandem
- Remote Virtual Desktop (NX)
- Reference GUIs: SearchGUI, PeptideShaker
[Figure: protein identification tools, user interface (source: PeptideShaker site)]
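
Returning to the RSAT workflow of Case 3 (retrieve upstream sequences, discover putative signals, search their matching positions), the toy sketch below illustrates the last two steps with plain k-mer counting. It is not RSAT's algorithm or API: RSAT relies on statistical models that are not described in these slides, and the sequences here are invented for the example.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a set of upstream sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def match_positions(sequence, motif):
    """Return the 0-based start positions of exact motif matches in a sequence."""
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return [
        i for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == motif
    ]


if __name__ == "__main__":
    # Invented upstream sequences for three hypothetical co-expressed genes.
    upstream = ["ACGTGACC", "TTCACGTG", "CACGTGAA"]
    candidate, hits = count_kmers(upstream, 5).most_common(1)[0]
    print(candidate, hits)
    for seq in upstream:
        print(seq, match_positions(seq, candidate))
```

Raw counts only rank candidates; judging whether a k-mer is over-represented requires comparing it with background frequencies, which this sketch deliberately leaves out.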

IFB - an academic cloud for life sciences
IFB's cloud
- simplifies access to biological data and tools
  - reference images related to life science
  - helps users to select the appropriate VM for their analysis
- integrates tools and pipelines in turnkey cloud appliances
  - built by the experts of the domains
  - published in the IFB marketplace to make them available to the scientists
- is tightly connected to existing bioinformatics resources, e.g. public reference data sources
- 16 bioinformatics appliances: standard compute nodes, proteomics virtual desktop, portal, structural biology
- +50 users from all IFB regional centers

IFB established priorities for 5 scientific domains
- Microbial [...]
- Evolutionary bioinformatics
- Plant bioinformatics
- Structural Biology
- [...] processing

Questions?
http://cloud.france-bioinformatique.fr

Acknowledgments
- Clément Gauthey (CNRS IDRIS, form. IDB-IBCP)
- Developers of tools who integrated them as an IFB cloud appliance: Samuel Blanck (Inria Lille), Jacques van Helden (TAGC), You?
- StratusLab members
- IFB's funding by the French program PIA INBS 2012

AVIESAN School 2014, 7 September 2014, Roscoff

Perspectives
- Create more bioinformatics appliances and technical pilots
- Interoperability of appliances on different cloud infrastructures
- Registry of distributed multi-cloud datasets
- Live remote cloud processing of sequencing