Scientific and Technical Applications as a Service in the Cloud
  University of Bern, 28.11.2011, adapted version
  Wibke Sudholt
  CloudBroker GmbH
  Technoparkstrasse 1, CH-8005 Zurich, Switzerland
  Phone: +41 44 633 79 34
  Email: info@cloudbroker.com
  Web: http://www.cloudbroker.com
Overview
  High performance computing (HPC) in the cloud
  CloudBroker Platform
  Example: Protein modeling in the IBM Cloud for ETH Zurich
  EuroCloud Swiss
HPC in the Cloud
Cloud Terms
  Clusters, Virtualization, Hybrid Cloud, Computing on Demand, Private Cloud, Infrastructure as a Service, Elasticity, Self Service, Cloud Bursting, Grids, Multi-Tenancy, Platform as a Service, Utility Computing, Public Cloud, Pay-per-Use, ASP, Internet, Cloud Storage, Software as a Service, Scalability, SOA, Web Services
Cloud Computing Definition
  Access to computer resources on demand without much initial investment in time or money (self service)
  Only pay for what is actually used, in small steps (OpEx instead of CapEx)
  Nearly unlimited scalability (elasticity)
  = Change in business model
  = Interfaces set at the right place
Cloud Services
  Software as a Service (SaaS)
    Web / office / business applications, Salesforce, Google Apps, ...
  Platform as a Service (PaaS)
    Development / deployment frameworks, distribution / messaging / monitoring systems, databases, Microsoft Windows Azure, Google App Engine, ...
  Infrastructure as a Service (IaaS)
    Computing power, virtual machines, storage space, Amazon Web Services, IBM SmartCloud Enterprise, ...
Problems of Traditional HPC
  Scientific and technical applications
    Complex algorithms and applications needing HPC resources (supercomputers, clusters, grids)
    Mainly used in research and development (R&D), often project-based, with increasing importance
  HPC computer infrastructure, middleware tools and application software
    Require expert knowledge
    Expensive, time-consuming and complex to buy, set up, use and maintain
    Hard to integrate with existing systems and processes
    Often operating at capacity limit
  HPC is hardly accessible or affordable for SMEs / small research groups, specialized application purposes or short-term projects
Advantages of Cloud for HPC
  Immediate access to computer resources on demand
  Availability of resources not existing in-house
  Possibility for spill-over / cloud bursting
  Temporary, non-binding utilization
  Pay-per-use with minimal initial investment
  Nearly unlimited scalability
  Hardware and partly software maintained by cloud providers
Challenges of Cloud for HPC
  HPC infrastructure, middleware and applications remain complex to set up, use and maintain, even in the cloud
  Dynamic features of the cloud and pay-per-use billing add to the complexity
  Performance limitations for some HPC calculations due to virtualization and available hardware
  Security concerns for R&D because of outsourcing, internationality, SLAs, multi-tenancy and potential vendor lock-in
  Hardware and software vendors have to adapt to the pay-per-use and self-service business model
CloudBroker Platform
Solutions of CloudBroker GmbH
  Easy, scalable, secure, integrable and pay-per-use access to scientific and technical applications in the cloud
  HPC application store / marketplace with direct deployment and execution of applications in the cloud and one bill for everything
    Using infrastructure as a service (IaaS) from cloud providers
    Offering platform as a service (PaaS) for software vendors
    Providing software as a service (SaaS) to end users
  Application parameters and files remain the same as for local execution and can be easily exported
  Integration into third party tools
CloudBroker Platform
  [Architecture diagram; labels recovered from the figure]
  Users: R&D end users and software vendors
  Access: web browser UI, CLI, web service API, generic workbenches, domain-specific gateways
  Applications: bioinformatics, molecular modeling, fluid dynamics, other applications
  Middleware: CloudBroker Platform (CloudBroker integration)
  Clouds: Amazon Cloud, IBM Cloud, other clouds
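Business Model
  [Diagram; labels recovered from the figure]
  Parties: end users, CloudBroker, cloud providers, software vendors
  Usage: between end users and CloudBroker, paired with a payment ($)
  Resources: between cloud providers and CloudBroker, paired with a payment ($)
  Applications: between software vendors and CloudBroker, paired with a payment ($)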
Functionality
  [Component diagram; labels recovered from the figure]
  Users: end users, software vendors
  Interfaces: web browser UI, web service API, portals, clients
  Platform modules: process manager, application manager, process monitor, user manager, queuing system, accounting module, resource manager, billing module, storage manager, payment module, image manager, scalability and fault tolerance handler
  Cloud access: cloud provider access manager with Amazon adapter, IBM adapter and further adapters
  Security frame: transport layer security, access rights security
  Clouds: Amazon Cloud, IBM Cloud, other clouds
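The per-provider adapters behind the cloud provider access manager suggest the classic adapter pattern. The following is a minimal Python sketch of that idea only, not the platform's actual code: all class and method names (`CloudAdapter`, `FakeAdapter`, `start_instance`, `stop_instance`, `register`) are hypothetical, and the in-memory adapter stands in for real provider API calls.

```python
from abc import ABC, abstractmethod


class CloudAdapter(ABC):
    """Hypothetical common interface that every provider adapter implements."""

    @abstractmethod
    def start_instance(self, image: str) -> str:
        """Start a compute instance from an image and return its identifier."""

    @abstractmethod
    def stop_instance(self, instance_id: str) -> None:
        """Shut down a previously started instance."""


class FakeAdapter(CloudAdapter):
    """In-memory stand-in; a real adapter would call the provider's API here."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self.running: set[str] = set()
        self._counter = 0

    def start_instance(self, image: str) -> str:
        self._counter += 1
        instance_id = f"{self.provider}-{image}-{self._counter}"
        self.running.add(instance_id)
        return instance_id

    def stop_instance(self, instance_id: str) -> None:
        self.running.discard(instance_id)


class CloudProviderAccessManager:
    """Routes each request to the adapter registered under a provider name."""

    def __init__(self) -> None:
        self._adapters: dict[str, CloudAdapter] = {}

    def register(self, name: str, adapter: CloudAdapter) -> None:
        self._adapters[name] = adapter

    def start(self, provider: str, image: str) -> str:
        return self._adapters[provider].start_instance(image)

    def stop(self, provider: str, instance_id: str) -> None:
        self._adapters[provider].stop_instance(instance_id)
```

With this shape, supporting another cloud would mean writing one more adapter class and registering it; the modules above the access manager would not need to change.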
Security
  [Diagram; three parties connected by secured links]
  Customer
    Corporate IT
    Client browser or application
    Corporate security policies and standards
  Customer to CloudBroker: SSL secured connection, authentication
  CloudBroker
    CloudBroker Platform (CBP)
    Industry standard application security technology
    Industry standard server security technology
    Industry standard secure data center
  CloudBroker to cloud provider: SSL secured connections, authentication to cloud, authentication to VM
  Cloud provider
    Cloud instances
    Dedicated, secured and restricted virtual machines
    Security certified compute and storage cloud technology
    Security certified data center
Typical Calculation Lifecycle
  1. Prepayment (user)
  2. Software selection and job creation (user)
  3. Data file upload (user) to cloud storage (platform)
  4. Job submission (user)
  5. Compute instance startup or reuse (platform)
  6. Data file upload from cloud storage to master instance (platform)
  7. Calculation on compute instances (platform, application)
  8. Data file download from master instance to cloud storage (platform)
  9. Compute instance shutdown or reuse (platform)
  10. Data file download (user) from cloud storage (platform)
  11. Billing (platform)
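The eleven steps can be written down as plain data to see which ones need the user and which the platform handles on its own. This is a small illustrative sketch; the `Step` record and the `steps_for` helper are hypothetical names introduced here, and only the step texts and actors come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    actors: tuple[str, ...]  # who is involved, as stated on the slide


LIFECYCLE = (
    Step(1, "Prepayment", ("user",)),
    Step(2, "Software selection and job creation", ("user",)),
    Step(3, "Data file upload to cloud storage", ("user", "platform")),
    Step(4, "Job submission", ("user",)),
    Step(5, "Compute instance startup or reuse", ("platform",)),
    Step(6, "Data file upload from cloud storage to master instance", ("platform",)),
    Step(7, "Calculation on compute instances", ("platform", "application")),
    Step(8, "Data file download from master instance to cloud storage", ("platform",)),
    Step(9, "Compute instance shutdown or reuse", ("platform",)),
    Step(10, "Data file download from cloud storage", ("user", "platform")),
    Step(11, "Billing", ("platform",)),
)


def steps_for(actor: str) -> list[int]:
    """Return the numbers of all steps in which the given actor takes part."""
    return [step.number for step in LIFECYCLE if actor in step.actors]


if __name__ == "__main__":
    print("user:", steps_for("user"))
    print("platform:", steps_for("platform"))
```

Read this way, the user acts only in steps 1 to 4 and in step 10; everything between job submission and result download is handled by the platform.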
Current Applications
  Application   Domain                                Remarks
  GAMESS        Quantum chemical calculations
  BLAST         DNA and protein sequence alignment
  AutoDock      Protein-ligand docking
  Gromacs       Molecular dynamics simulations
  X! Tandem     Mass spectrometry data matching
  OpenFOAM      Computational fluid dynamics
  Rosetta       Protein modeling                      Only with own license
  ???           Computational fluid dynamics          In preparation
  ???           Material science                      In preparation
  ???           DNA and protein sequence alignment    Requested
  ???           Protein modeling                      Requested
  More applications continuously to be added
  Applications can also be added by users
Application Requirements
  Software characteristics
    Scientific and technical applications, open source or commercial, independent of domain
    Compute-intensive, not data-intensive
    Batch-oriented, non-interactive, command line, running for hours or days
    Installable on Linux
    Single-threaded, multithreaded or parallel / MPI
  Deployment in the cloud
    Installation shell script and software package
    Configuration through the platform
    Selection of pricing options
    Validation and execution by the CloudBroker team
    Several software versions possible
Integration into Third Party Tools
  Provide all platform and cloud advantages within an environment known to the user
  Public or private, generic or domain-specific clients, workflows, workbenches, portals, etc.
  Utilize platform as cloud middleware in the background
  KNIME
    Konstanz Information Miner
    http://www.knime.org
    Workflow framework
  SCI-BUS
    SCIentific gateway Based User Support
    http://www.sci-bus.eu
    EU FP7 project
    11 user communities from different domains
SCI-BUS Project
  SCI-BUS is supported by the FP7 Capacities Programme under contract no. RI-283481
Example: Protein Modeling in the IBM Cloud for ETH Zurich
Scientific Background
  R&D group: Institute of Molecular Systems Biology (IMSB) at ETH Zurich (http://www.imsb.ethz.ch)
  Goal: Better understand the mechanisms of infectious diseases to fight antibiotic resistance
  Example: Streptococcus bacterium
  Method: Computationally model the 3D structures of important proteins from their 1D sequence
  Software: Rosetta (http://www.rosettacommons.org)
  Analysis: Find the important structural differences between less and more harmful bacterial strains
Example Protein Model
  Source: Dr. Lars Malmström, IMSB, ETH Zurich
Infrastructure
  Problem: Calculations would need several months on ETH Zurich's compute cluster due to long queue waiting times and low job throughput
  Calculations: Embarrassingly parallel and thus highly scalable, compute-intensive and not data-intensive, can be automated and outsourced
  Conclusion: Perfect fit for cloud computing
  Solution: Use the CloudBroker Platform to deploy the Rosetta software and manage data and calculations on IBM SmartCloud Enterprise cloud resources
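"Embarrassingly parallel" means that each job depends only on its own input, so jobs can be handed to any number of workers without coordination. The sketch below illustrates that property in Python under clear assumptions: `model_protein` is a hypothetical placeholder (it only returns the sequence length, not a Rosetta run), and local threads stand in for cloud compute instances, so it demonstrates independence of the jobs rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor


def model_protein(sequence: str) -> int:
    """Placeholder for one independent modeling job (here: the sequence length)."""
    return len(sequence)


def run_all(sequences: list[str], workers: int = 4) -> list[int]:
    """Run every job independently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model_protein, sequences))


if __name__ == "__main__":
    # Hypothetical toy inputs; the result does not depend on the worker count.
    toy_sequences = ["MKV", "MKVLA", "M"]
    print(run_all(toy_sequences, workers=1) == run_all(toy_sequences, workers=3))
```

Because no job waits on another, adding workers changes only the elapsed time, never the results, which is what makes this kind of workload easy to outsource to on-demand instances.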
Project Architecture
  Source: IBM Schweiz AG, CloudBroker GmbH
Showcase Results
  249 Streptococcus target proteins modeled using special Rosetta client for automation
  Up to 63 compute instances with 1 008 virtual CPUs in parallel provided by the IBM SmartCloud Enterprise
  Number of instances in the cloud automatically adjusted to the workload by the CloudBroker Platform
  Optimized data transfer between ETH Zurich file server and compute and storage instances in the cloud
  About 36 000 single-threaded jobs created by the client, managed by the platform and computed in the cloud
  Almost 250 000 CPU hours utilized for the production calculations
  About 2.3 million 3D protein structure models created
  Calculations finished within two weeks
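The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers from the list above, so the derived values are estimates; the variable names are chosen here for illustration.

```python
# Rounded figures as reported in the showcase results.
proteins = 249
instances = 63
virtual_cpus = 1008
jobs = 36_000          # "about"
cpu_hours = 250_000    # "almost", so treat as an upper bound
models = 2_300_000     # "about"

cpus_per_instance = virtual_cpus / instances   # exactly 16 virtual CPUs per instance
min_wall_hours = cpu_hours / virtual_cpus      # elapsed time if all CPUs were busy throughout
min_wall_days = min_wall_hours / 24
jobs_per_protein = jobs / proteins
models_per_job = models / jobs
cpu_hours_per_job = cpu_hours / jobs

if __name__ == "__main__":
    print(f"virtual CPUs per instance:     {cpus_per_instance:.0f}")
    print(f"minimum elapsed time:          {min_wall_days:.1f} days")
    print(f"jobs per target protein:       {jobs_per_protein:.0f}")
    print(f"structure models per job:      {models_per_job:.0f}")
    print(f"CPU hours per job:             {cpu_hours_per_job:.1f}")
```

At full capacity, 250 000 CPU hours on 1 008 virtual CPUs take a little over ten days, which is consistent with the statement that the calculations finished within two weeks, given that the instance count was scaled up and down with the workload.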
EuroCloud Swiss
EuroCloud Swiss
  Swiss association for cloud computing
  Platform and lobbying for cloud computing in Switzerland
  http://www.eurocloudswiss.ch
  Representative of Switzerland in the EuroCloud Europe network
  Collaboration with simsa
  Swiss Cloud Conference: 21.03.2012, Technopark Zurich
  Swiss Cloud Award 2012
  Code of practice
  Certification
Thank You
  CloudBroker management team, in particular Nicola Fantini
  CloudBroker development team
  SystemsX.ch, in particular Dr. Peter Kunszt
  ETH Zurich, in particular Dr. Lars Malmström
  IBM, in particular Marcel Lautenschlager, Roland Reifler and Stefan Ruckstuhl
  EuroCloud Swiss
  and you for listening!
For More Information
  Contact
  Dr. Wibke Sudholt
  CloudBroker GmbH
  Technoparkstrasse 1
  CH-8005 Zurich
  Switzerland
  Phone: +41 44 633 79 34
  Email: wibke.sudholt@cloudbroker.com
  Web: http://www.cloudbroker.com