Cloud Ready for Bioinformatics?



Similar documents
Bioinformatique sur Cloud Cas d usage avec le portail Galaxy

Cloud pour la Bioinformatique

Sequencing data. And other experimental data. EMBL-EBI data resources growth

Une e-infrastructure nationale en bioinformatique

Le cloud IFB et son instance Galaxy

IFB s e-infrastructure

Le cloud IFB et son instance Galaxy

StratusLab project. Standards, Interoperability and Asset Exploitation. Vangelis Floros, GRNET

Towards a galaxy.prabi.fr

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

Final Report on StratusLab Adoption

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille

SURFsara HPC Cloud Workshop

UGENE Quick Start Guide

Hadoopizer : a cloud environment for bioinformatics data analysis

Three data delivery cases for EMBL- EBI s Embassy. Guy Cochrane

Managing and Conducting Biomedical Research on the Cloud Prasad Patil

Deploying Business Virtual Appliances on Open Source Cloud Computing

Early Cloud Experiences with the Kepler Scientific Workflow System

HPC Cloud. Focus on your research. Floris Sluiter Project leader SARA

Bioinformatics Grid - Enabled Tools For Biologists.

Restricted Document. Pulsant Technical Specification

vnebula Cloud. Made Easy. Introducing vnebula from Stream Networks. A simple, self-service cloud portal for our partner community.

VDI: What Does it Mean, Deploying challenges & Will It Save You Money?

EMBL-EBI Web Services

DATA MANAGEMENT PLAN IN THE REAL LIFE SCIENCES

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

SURFsara HPC Cloud Workshop

OnApp Cloud. The complete platform for cloud service providers. 114 Cores. 286 Cores / 400 Cores

Scientific and Technical Applications as a Service in the Cloud

THE EUCALYPTUS OPEN-SOURCE PRIVATE CLOUD

Big Data and Cloud Computing for GHRSST

Global and Discovery Proteomics Lecture Agenda

CLOUD. MADE EASY. vnebula Portal

Grid Computing Perspectives for IBM

icer Bioinformatics Support Fall 2011

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

How To Run A Cloud Server On A Server Farm (Cloud)

OVERVIEW. The complete IaaS platform for service providers

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

Solution for private cloud computing

Overview. The OnApp Cloud Platform. Dashboard APPLIANCES. Used Total Used Total. Virtual Servers. Blueprint Servers. Load Balancers.

Hyper-V Private Cloud Virtualization & Optimization

Enabling Large-Scale Testing of IaaS Cloud Platforms on the Grid 5000 Testbed

Content Distribution Management

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Hyper-V vs ESX at the datacenter

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Big Data in BioMedical Sciences. Steven Newhouse, Head of Technical Services, EMBL-EBI

Cloud Success story in India

A curated Domain centric shared Docker registry linked to the Galaxy toolshed

OpenNebula Leading Innovation in Cloud Computing Management

How To Install Eucalyptus (Cont'D) On A Cloud) On An Ubuntu Or Linux (Contd) Or A Windows 7 (Cont') (Cont'T) (Bsd) (Dll) (Amd)

Cloud UT. Pay-as-you-go computing explained

Boas Betzler. Planet. Globally Distributed IaaS Platform Examples AWS and SoftLayer. November 9, IBM Corporation

GTC Presentation March 19, Copyright 2012 Penguin Computing, Inc. All rights reserved

Virtualisation Cloud Computing at the RAL Tier 1. Ian Collier STFC RAL Tier 1 HEPiX, Bologna, 18 th April 2013

Integration of Virtualized Workernodes in Batch Queueing Systems The ViBatch Concept

Cloud Services. May 28 th, 2014 Athens, Greece

Good Morning Wireless! SSID: MSFTOPEN No Username or Password Required

Introduction to Cloud Technology

Data Sharing Initiative: International Cancer Genome Consortium

NAS Storage needs to be purchased; Will not be offered IAAS - Utility SMTP Per SMTP account Per server

Personalized Medicine and IT

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Computing Power at your Service: IaaS from the Private Cloud Dynamic Services for Infrastructure

Hadoop Distributed File System Propagation Adapter for Nimbus

BioHPC Web Computing Resources at CBSU

Cloud Federations in Contrail

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

SUSE Cloud Installation: Best Practices Using an Existing SMT and KVM Environment

Intel Cloud Builder Guide to Cloud Design and Deployment on Intel Xeon Processor-based Platforms

The most powerful open source data science technologies in your browser.!! Yves Hilpisch

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 7

Brian Connolly Systems Engineer, LabKey Software LabKey Server in the Cloud

Transcription:

IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001) Cloud Ready for Bioinformatics? Christophe Blanchet, Clément Gauthey C. Blanchet and C. Gauthey Plateforme Infrastructure Distributed Distribuée for pour Biology la Biologie IDB-IBCP CNRS FR3302 - LYON - FRANCE Journée Clouds pour le Calcul http://idee-b.ibcp.fr Scientifique, Paris Sud (Orsay, France) 27 novembre 2012

A Bioinformatics Today Size of biological data are tremendous Institut Sanger, UK, 5 PB Beijing Genome Institute, China, 4 sites, 10 PB Huge data in lot of places Analysing such data became difficult Scale-up of the analyses : gene/protein to complete genome/proteome,... Lot of different daily-used tools That need to be combined in workflows Usual interfaces: portals, Web services, federation,... Datacenters with ease of access/use Distributed resources Experimental platforms: NGS, structural,... ADN Bioinformatics platfomr Federation M BI ADN ADN BI CC BI ADN ADN

Infrastructure in Biology

Infrastructure in Biology

Infrastructure in Biology

Infrastructure in Biology

Infrastructure in Biology

Infrastructure in Biology Lot of tools and web services to treat and vizualize lot of data

The scene Core facility providers Is it easy to deploy lot of (incompatible) tools? To make them connected to public databases? To limit transfer of huge data? To provide users with their own computing resources? With their own isolated storage? Scientific users Is it easy to access/use these tools? To adapt to your usage? To get your/other tools deployed on a datacenter? To combine them?

Integrate Bioinformatics Tools in Cloud Bioinformatics Tools GOR4 ClustalW BLAST Ray BWA FastA SSearch Abyss PhyML Create new Appliance RedHat, CentOS Linux Virtual machines Debian, Ubuntu Suse Bioinformatics Marketplace Sequence Structure NGS Galaxy ARIA ( ) Predefined virtual machines small : few GB, easy to convert in most virtualization formats Installed and pre-configured with common bioinformatics tools e.g. BLAST, Clustalw, ARIA, MEME, HMMer, TopHat, BWA, Samtools, etc.

IDB s Cloud Workbench Demonstrate usefulness of cloud for Biology 13 turnkey bioinformatics appliances (as of Nov. 2012) Running since Sept. 2011, opened to Biology community 184 cores, 536 GB RAM, 36 TB de stockage Powered by Toolkit StratusLab (EU FP7 INFSO-RI-261552) Rel. 1.4 Specific developments of a Web interface ready for biologists

Bioinformatics Appliances

Bioinformatics Appliances (2)

Select your bioinformatics tools

Select your bioinformatics tools

Run Bioinformatics Cloud Instances Bioinformatics Marketplace Sequence Structure NGS Galaxy ARIA ( ) Launch Instances BLAST, Clustal, etc. PaaS IaaS launch jobs ssh Shared FS Master & Storage VM ARIA Workers VM CNS Portal IBCP's Cloud Resources

Manage your Cloud Instances

Data in cloud Upload your data Public Data sources Genomes EMBL PDB UNIPROT PROSITE shared (NFS) BLAST, Clustal, etc. PaaS scp http IaaS launch jobs ssh Shared FS User Persistent data pdisk (iscsi) Portal Master & Storage VM ARIA Bioinformatics Cloud Workers VM CNS scp http Get your results

Examples Structural Biology: the TOSCANI usecase Galaxy portal for NGS analyses Proteomics

Structural Biology TOwards StruCtural AssignmeNt Improvement To improve the determination of protein structures based on Nuclear Magnetic Resonance (NMR) information with ARIA software Large computational needs. A NMR laboratory will not specially invest in building a cluster of about 100 nodes to be able to run such NMR structure calculations. Flexibility of the cloud to deploy the different required bioinformatics tools can accelerate such a procedure. Commercial interest in providing such tools to structural biologists on a pay as you go basis. Endorsers: Institut Pasteur Paris and CNRS IBCP

IaaS deployment of ARIA ARIA CNS CNS CNS CNS Structure preparation... (20-100) Intermediate results CNS CNS CNS CNS (8x) Read Write Shared Storage Input data: 10s MB Results: GB Virtual Cluster Significant increase in the number of calculated protein conformations improves the statistics on the NMR conformations and can help to overcome the ambiguity bottleneck. launch jobs ssh Final results Master & Storage VM ARIA Shared FS Workers VM CNS

Galaxy portal for NGS analyses

Galaxy portal for NGS analyses

Galaxy portal for NGS analyses

Galaxy portal for NGS analyses

Motivation Protein identification Proteomics Mass spectroscopy platform Running out of space on their local resources Mass spectroscopy experimental data reference databases : nr, Swiss-Prot Screening tools: OMSSA, X!Tandem User interface Remote display Common GUIs SearchGUI PeptidShaker source: PeptideShaker site

Conclusion Provide turnkey bioinformatics cloud services Standard tools and pipelines Ready to run on core facilities and commercial datacenters Easier to transfer appliances than data (GB vs TB) Cloud infrastructure tightly connected to existing bioinformatics infrastructure Public IDB s bioinformatics cloud Linked to public biological databases Collaboration with national Research Infrastructures like the French RENABI GRISBI Ease the access Web interface for cloud management Usual scientific gateways Persistent and large ubiquitous storage

What s next? Help bioinformatics centres to provide academic and commercial community with bioinformatics services! Pre-defined bioinformatics appliances, webservices and portal (PaaS) Multi-nodes applications, e.g. ARIA, or comprehensive pipelines (IaaS) Referenced in a bioinformatics marketplace Ready to deploy on the future cloud infrastructure of the French Bioinformatics Institute (IA ReNaBi-IFB)

Questions? Platform Infrastructure Distributed for Biology - IDB acknowledges Institut Pasteur: M Nilges, T Malliavin, F. Mareuil StratusLab partners co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001) http://idee-b.ibcp.fr