LabKey Server in the Cloud
  Brian Connolly
  Systems Engineer, LabKey Software
  brian@labkey.com
Agenda
  What is the Cloud?
  Why would I want to use the cloud?
  What will it cost?
  Using LabKey in the cloud
  How does LabKey use the cloud?
  Other scientific tools in the Cloud
Introduction
  Who am I?
  What do I bring to the conversation?
What is the Cloud?
What is the cloud?
  Wikipedia says:
    "Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet)."
    - http://en.wikipedia.org/wiki/cloud_computing
  What does this mean to me?
    rent vs. buy
    only pay for what you use
    it takes minutes to have a new computer instead of days or weeks
Types of Clouds
  Datacenter as a Service*
  Platform as a Service*
  Software as a Service*
  How non-IT folks see the cloud vendors
  * http://en.wikipedia.org/wiki/cloud_computing
Datacenter as a Service
  From an API or GUI, you can provision and manage all pieces of a datacenter (servers, network hardware, storage, databases, etc.).
  What do you get?
    Full control over:
      servers (install and configure any way you want)
      network hardware (firewall, load balancing)
    Pay as you use, self-service, and support for custom configs
    Access to all of the vendor's services
      Storage, database, message queues, CDN, load balancers, firewall, etc.
  Major vendors: Amazon Web Services
Platform as a Service
  "delivers a computing platform and/or solution stack as a service, often consuming cloud infrastructure and sustaining cloud applications."
  - Wikipedia (http://en.wikipedia.org/wiki/iaas#platform)
  What do you get?
    servers (or their equivalent)
    storage service
    database service, etc.
    no firewall, load balancing, etc.
    failover, clustering, custom configs: maybe
    pay as you use and self-service
Platform as a Service (cont)
  There are two types of Platform as a Service:
  You get a server (or servers)
    essentially the old hosted server model
    existing hosting and VPS companies are in this space
    major vendors: Rackspace, GoGrid, IBM, etc.
  You write some code and hit the deploy button
    you never interact with the servers directly
    your application code is bundled with deployment descriptors and sent to the cloud via an API
    major vendors: Microsoft, Google App Engine, Heroku
Software as a Service
  "deliver software over the Internet, eliminating the need to install and run the application on the customer's own computers and simplifying maintenance and support."
  - Wikipedia (http://en.wikipedia.org/wiki/iaas#application)
  What do you get?
    you are the end user of this software
    personalization available
    pay as you use
  Major vendors: Salesforce, Spotify, Flickr, etc.
Why would I want to use the cloud?
Why would I want to use the cloud?
  To meet a deadline
    A reviewer asked for the samples to be processed using a new method
    I need to process a large number of samples for a grant application
  Prototyping: try a new processing method
    Proteomics: use an updated FASTA file or an additional parameter
    Genomics: the reference sequence has changed
    I have a new hypothesis and want to quickly re-process my data
Why would I want to use the cloud? (cont)
  I want to try out new software to see if it meets my needs
    LabKey Online
    Galaxy's free public server
    UCSC Genome Browser
  I want to automate my pipeline
    Cyclecomputing.com (push-button HPC in the cloud)
    Starcluster
    CloudBioLinux
  No need to wait 3 months for IT to purchase and set up hardware
Why would I NOT want to use the cloud?
  Processing huge amounts of data
    Data transfer time is too long
      small network pipe to the internet
      transfer time + processing time in the cloud >= processing time on your laptop
  I have a long-running study (a year or more) and I need the computing around 24x7 (iffy)
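The transfer-time rule of thumb above can be checked with a little arithmetic. The sketch below is illustrative only: the function names, the decimal unit conversion, and every number in the usage example are assumptions, not figures from this talk.

```python
def transfer_hours(data_gb: float, uplink_mbps: float) -> float:
    """Hours needed to send data_gb gigabytes over an uplink of uplink_mbps megabits/s."""
    megabits = data_gb * 1000 * 8          # GB -> megabits (decimal units, an assumption)
    return megabits / uplink_mbps / 3600   # seconds -> hours


def cloud_is_faster(data_gb: float, uplink_mbps: float,
                    cloud_hours: float, local_hours: float) -> bool:
    """True when transfer time plus cloud processing time beats local processing time."""
    return transfer_hours(data_gb, uplink_mbps) + cloud_hours < local_hours


if __name__ == "__main__":
    # Hypothetical job: 500 GB of raw data, 2 hours in the cloud vs. 24 hours locally.
    print(round(transfer_hours(500, 100), 1))                       # 100 Mbit/s uplink
    print(cloud_is_faster(500, 100, cloud_hours=2, local_hours=24))
    print(cloud_is_faster(500, 10, cloud_hours=2, local_hours=24))  # 10 Mbit/s uplink
```

With these made-up numbers the cloud wins on the faster link and loses on the slower one, which is the "small network pipe" case from the slide.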
What will it cost?
What will it cost?
  What will I be charged for?
  How will I be billed?
  How to estimate my costs
What will I get charged for?
  When using the cloud you are renting the computers you need.
  Most clouds bill by the hour
    vendors: AWS, Rackspace, Windows Azure, Google App Engine, etc.
    some do not (Heroku)
  SAAS usually bills by the month
    vendors: Salesforce, etc.
  If you forget to turn it off you will still be billed
How will I be billed?
  Billed monthly
  Billed to a credit card
    Large institutions or large companies use purchase orders
  Monitoring usage and cost during the month
How to estimate your costs
  In general, the things you will get charged for are:
    Servers (instances)
    Network usage, i.e. what you send into and out of the cloud
    Storage, i.e. how much data you store in the cloud
Estimating Costs: Servers
  What do I mean by Servers?
    Called instances at AWS, Google App Engine and Windows Azure
    Called Dynos at Heroku
  Usage is charged per hour
  Price goes up with the size of the server
  How to estimate:
    how many servers will you need?
    what type of servers do you need?
      Windows or Linux
      what will they be doing?
    how big a server do you need?
    where should they be located?
  AWS: Spot instances
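Because usage is charged per hour, a server estimate is just servers x hours x hourly rate. The sketch below assumes a flat rate of $0.34 per hour purely for illustration; that rate, the function name, and the two usage patterns are hypothetical and are not prices quoted in this talk.

```python
def server_cost(num_servers: int, hours_each: float, hourly_rate: float) -> float:
    """Dollars for running num_servers servers for hours_each hours each."""
    return round(num_servers * hours_each * hourly_rate, 2)


if __name__ == "__main__":
    ASSUMED_RATE = 0.34  # dollars per hour; hypothetical, check your vendor's price list

    always_on = server_cost(1, 24 * 30, ASSUMED_RATE)     # left running for a 30-day month
    working_hours = server_cost(1, 8 * 22, ASSUMED_RATE)  # 8 hours a day, 22 working days

    print(f"always on:     ${always_on:.2f}")
    print(f"working hours: ${working_hours:.2f}")
```

Comparing the two lines shows why forgetting to turn a server off matters: the meter runs on hours the server is up, not on hours it is doing useful work.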
Estimating Costs: Network
  What do I mean by Network?
    Bandwidth into and out of the cloud
  You are charged only for bandwidth out of the cloud
    Bandwidth into the cloud is generally free
    Bandwidth between servers is generally free
    Bandwidth between datacenters is not free
  For most scientific applications
    This is usually small compared to Servers
    100GB of traffic in a month = $15
Estimating Costs: Storage
  What do I mean by Storage?
    Amount of data you have stored in the cloud
  Windows Azure
    $0.15/GB per month, based on daily average
    You are charged for # of transactions
  AWS
    $0.10/GB per month
    You are charged for # of I/O requests
  For most scientific applications:
    This can be a significant cost
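The three line items (servers, network, storage) can be combined into a rough monthly estimate. This is a minimal sketch under stated assumptions: the outbound rate of $0.15/GB is derived from the "100GB = $15" figure on the network slide, the storage rate is the AWS figure above, the hourly server rate is a hypothetical placeholder, and per-transaction or per-I/O-request charges are ignored. The class and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    server_hours: float  # total instance-hours across all servers
    gb_out: float        # data sent out of the cloud (inbound is treated as free)
    gb_stored: float     # average amount of data stored during the month


def estimate_monthly_cost(usage: MonthlyUsage,
                          hourly_rate: float,
                          out_rate_per_gb: float = 0.15,
                          storage_rate_per_gb: float = 0.10) -> dict:
    """Return a per-item breakdown in dollars; request/transaction fees are not modeled."""
    servers = usage.server_hours * hourly_rate
    network = usage.gb_out * out_rate_per_gb
    storage = usage.gb_stored * storage_rate_per_gb
    return {
        "servers": round(servers, 2),
        "network": round(network, 2),
        "storage": round(storage, 2),
        "total": round(servers + network + storage, 2),
    }


if __name__ == "__main__":
    # Hypothetical month: one server always on, 100 GB downloaded, 85 GB stored.
    usage = MonthlyUsage(server_hours=24 * 30, gb_out=100, gb_stored=85)
    for item, dollars in estimate_monthly_cost(usage, hourly_rate=0.34).items():
        print(f"{item:8s} ${dollars:.2f}")
```

Swapping in your own vendor's rates and expected usage is the whole exercise; the structure of the sum stays the same.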
Using LabKey in the cloud
Using LabKey in the cloud
  Who is doing it
  Which clouds can run a LabKey Server
  Installing LabKey from scratch
Who is running LabKey in the cloud?
  LabKey
    LabKey Online
    Test servers
  Non-profit research institute
  Seattle-based biotech company
Which clouds can run LabKey Server?
  Datacenter as a Service clouds
    Amazon Web Services
  Some Platform as a Service clouds
    Rackspace
    GoGrid
    IBM Smartcloud
  LabKey currently cannot be used on
    Windows Azure
    Google App Engine
    Heroku
Installing LabKey in a cloud
  1. Start a new instance at your cloud provider
  2. Download the LabKey Server installer
       Windows Installer
       Linux Installer (coming in 11.3)
  3. Install LabKey Server
       Instructions at http://www.labkey.org
  4. Start using your LabKey Server in the cloud
How does LabKey use the cloud?
How does LabKey do it?
  Use Amazon Web Services and Rackspace Cloud offerings
  Operating systems
    Linux: Ubuntu 10.04 LTS
    Windows: Windows Server 2008
How does LabKey do it? (cont)
  Installation/Configuration
    Choose the latest Ubuntu AMI (http://uecimages.ubuntu.com/releases/10.04/release/)
    Use EBS-backed instances
    AWS: use CloudFormation to provision
      Instances
      Networks (firewalls)
      Disks
    Use Chef (http://opscode.com/chef) to automate install/configuration
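To give a feel for what "use CloudFormation to provision instances, firewalls, and disks" looks like, here is a minimal sketch that builds a template as a Python dictionary and prints it as JSON. It is not LabKey's actual template: the logical resource names, the open ports, the device name, and the placeholder AMI ID are all assumptions made for illustration, and the instance size and disk size are simply borrowed from the LabKey Online figures in this talk. It also assumes the classic (non-VPC) EC2 setup, where a security group is referenced by name.

```python
import json


def build_template(ami_id: str, admin_cidr: str,
                   instance_type: str = "m1.large", disk_gb: int = 85) -> dict:
    """Return a small CloudFormation template: one server, one firewall, one EBS disk."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative single-server stack (hypothetical, not LabKey's)",
        "Resources": {
            # Firewall: HTTPS from anywhere, SSH only from the admin network.
            "ServerFirewall": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "HTTPS for users, SSH for admins",
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                         "CidrIp": "0.0.0.0/0"},
                        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                         "CidrIp": admin_cidr},
                    ],
                },
            },
            # Instance: the AMI ID is supplied by the caller.
            "Server": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                    "SecurityGroups": [{"Ref": "ServerFirewall"}],
                },
            },
            # Disk: an EBS volume created in the same zone as the server.
            "DataDisk": {
                "Type": "AWS::EC2::Volume",
                "Properties": {
                    "Size": disk_gb,
                    "AvailabilityZone": {"Fn::GetAtt": ["Server", "AvailabilityZone"]},
                },
            },
            "DataDiskAttachment": {
                "Type": "AWS::EC2::VolumeAttachment",
                "Properties": {
                    "InstanceId": {"Ref": "Server"},
                    "VolumeId": {"Ref": "DataDisk"},
                    "Device": "/dev/sdf",
                },
            },
        },
    }


if __name__ == "__main__":
    # "ami-PLACEHOLDER" and the CIDR block are stand-ins; substitute real values.
    template = build_template("ami-PLACEHOLDER", admin_cidr="192.0.2.0/24")
    print(json.dumps(template, indent=2))
```

The printed JSON could then be handed to CloudFormation, with a tool such as Chef taking over software installation once the instance is up.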
How does LabKey do it? (cont)
  Data upload/download speeds
    what do we see here at FHCRC
    the "ship us a hard drive" option
  Processor/memory combinations
    test and measure
  Pipelines in the cloud
    our experience working with Galaxy
What does it cost us?
  Let's use LabKey Online as an example:
  Server stats
    instance type: m1.large (2)
    EBS volumes: 85GB total
    Operating system: Linux
    Datacenter: us-east-1c
  Cost breakdown (average monthly price, July to October 2011)
    Cost      Price    Percentage of Total
    Instance  $250.92  95.8%
    Storage   $10.92   4.1%
    Network   $0.04    0.1%
Other scientific tools in the cloud
Other scientific tools in the cloud
  Galaxy
    Both SAAS and install on your own instances in the cloud
  GenomeSpace
    Cytoscape, Galaxy, GenePattern, Genomica, Integrative Genomics Viewer (IGV), and the UCSC Browser in the cloud
  The Gaggle
    A framework for exchanging data between independently developed software tools and databases
  CloudBioLinux
  Starcluster
Key Messages
  LabKey has been run successfully in the cloud by both LabKey and a number of customers
  We would love to help you get started using LabKey in the cloud
Any questions?
  Brian Connolly
  brian@labkey.com
  206-667-7521
If you use LabKey Server for your research, please reference one of these publications about the platform:
  General Use: Nelson EK, Piehler B, Eckels J, Rauch A, Bellew M, Hussey P, Ramsay S, Nathe C, Lum K, Krouse K, Stearns D, Connolly B, Skillman T, Igra M. LabKey Server: An open source platform for scientific data integration, analysis and collaboration. BMC Bioinformatics 2011 Mar 9; 12(1): 71.
  Proteomics: Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whitaker J, States D, Hanash S, Paulovich A, McIntosh MW. Computational Proteomics Analysis System (CPAS): An Extensible, Open-Source Analytic System for Evaluating and Publishing Proteomic Data and High Throughput Biological Experiments. Journal of Proteome Research 2006, 5:112-121.
  Flow Cytometry: Shulman N, Bellew M, Snelling G, Carter D, Huang Y, Li H, Self SG, McElrath MJ, De Rosa SC. Development of an automated analysis system for data from flow cytometric intracellular cytokine staining assays from clinical vaccine trials. Cytometry 2008, 73A:847-856.