Cloud Computing For Bioinformatics




Cloud Computing: what is it?
- Cloud computing is a distributed infrastructure in which resources, software, and data are provided on demand.
- Cloud computing abstracts infrastructure from the application.
- Cloud computing should save you time the way software packages save you time.

Cloud Computing
Before:
- Purchase hardware & ensure it's all compatible
- Appropriate resources for the hardware (power, cooling, rack space, etc.)
- Set up & configure hardware
- Install baseline software (OS, packages)
- Develop & deploy your application
With the Cloud:
- Request resource
- Develop & deploy your application

Cloud Computing
Advantages:
- Reliability: Decoupling applications from hardware removes hardware failure concerns
- Scalability: Many cloud services have built-in linear scaling, allowing more resources to be brought online on demand
- Turnaround: Greatly reduces the time taken to procure hardware resources
- Cost: Limited upfront cost when compared to a hardware purchase
- Pay as you go: Pay for what you use. Don't pay for servers sitting idle, sucking power & cooling
- Experimentation: Because of the above, the opportunity costs of experimentation are tiny
- Sharing & Collaboration: Share resources such as machine images & data without worry

Cloud Computing
Disadvantages:
- Learning Curve: One must learn how to leverage the cloud & its advantages; it is not how one is used to working
- Data Transfer: Getting data into & out of the cloud happens at internet speed, not local network speed
- Opacity: The underlying infrastructure is hidden from view

Cloud Computing: Components

Cloud Computing: Components
Glossary:
- AWS: Amazon Web Services
- EC2 / Elastic Compute Cloud: Compute resources in the cloud. Essentially virtual computers with varying CPU & memory resources.
- EBS / Elastic Block Store: Block-level storage for data. These are virtual hard drives for EC2 instances.
- S3 / Simple Storage Service: An object store allowing you to save data in the cloud in a highly redundant fashion.
- EMR / Elastic MapReduce: Auto-managed MapReduce infrastructure for running highly parallel computation problems against a farm of computers.
- SDB / SimpleDB: Run queries against structured data in real time. A very simple version of RDS.
- RDS / Relational Database Service: Web service that lets you place a relational database in the cloud.
- AWS Import/Export: Load your data onto a device and mail it to Amazon, and let them load your data for you!
There's plenty more, but these are the most important for bioinformatics.

Cloud Computing: Components
OK, here are some others:
- CloudWatch: Monitor AWS cloud resources, such as EC2 instances.
- Elastic Load Balancing: Amazon-hosted load balancers distributing incoming traffic among EC2 nodes.
- SQS / Simple Queue Service: Hosted queue for storing messages as they pass between computers, letting disparate programs communicate with each other.
- VPC / Virtual Private Cloud: Fence off AWS services over an IP range via VPN, allowing cloud services to fit in with legacy security protocols.
- CloudFront: Content delivery network (CDN) on Amazon's collection of edge servers.
- SNS / Simple Notification Service: Set up, operate, and send notifications from the cloud to a variety of destinations such as a web page, email, SMS, etc.
- Amazon Mechanical Turk: As the name implies, you create a Human Intelligence Task (HIT) which a human can do easily, then you pay a modest fee each time a human performs the task. Examples would be rating quality between items, filling out forms, or solving CAPTCHAs.

Cloud Computing: Components
Let's learn more about those important services.

Cloud Computing: Services
EC2:
- Virtual computers offered with varying memory / CPU power
- How is CPU power measured in a virtual world?
  - ECU (EC2 Compute Unit): the measure of computing power on AWS. Equivalent to a 1.0GHz 2007 Xeon processor.
- 4 classes of instances:
  - Standard Instances: inexpensive instances used for testing, web services, and many less intensive jobs
  - High-Memory Instances: large-RAM instances for high-throughput applications, e.g. databases, caches
  - High-CPU Instances: high-ECU instances for compute-intensive applications
  - Cluster Compute Instances: increased network performance for HPC applications, e.g. map-reduce

Cloud Computing: Services

Class            Instance Type   ECUs   RAM (GB)   Local Storage (GB)
Standard         Small            1       1.7        160
Standard         Large            4       7.5        850
Standard         XL               8      15         1690
High-Memory      XL               6.5    17.1        420
High-Memory      Double XL       13      34.2        850
High-Memory      Quadruple XL    26      68.4       1690
High-CPU         Medium           5       1.7        350
High-CPU         XL              20       7         1690
Cluster Compute  Quadruple XL    33.5    23         1690
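
One way to read the table above is as a lookup: given minimum ECU and RAM needs, which instance is the smallest that fits? The sketch below is only an illustration. The `InstanceType` class and `smallest_fit` function are hypothetical names, the figures are copied from the table, and "smallest" is judged by ECU count as a stand-in for cost, since the table lists no prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    family: str
    size: str
    ecu: float
    ram_gb: float
    storage_gb: int


# Figures copied from the instance table on the slide.
CATALOG = [
    InstanceType("Standard", "Small", 1, 1.7, 160),
    InstanceType("Standard", "Large", 4, 7.5, 850),
    InstanceType("Standard", "XL", 8, 15, 1690),
    InstanceType("High-Memory", "XL", 6.5, 17.1, 420),
    InstanceType("High-Memory", "Double XL", 13, 34.2, 850),
    InstanceType("High-Memory", "Quadruple XL", 26, 68.4, 1690),
    InstanceType("High-CPU", "Medium", 5, 1.7, 350),
    InstanceType("High-CPU", "XL", 20, 7, 1690),
    InstanceType("Cluster Compute", "Quadruple XL", 33.5, 23, 1690),
]


def smallest_fit(min_ecu, min_ram_gb, catalog=CATALOG):
    """Return the lowest-ECU instance meeting both minimums, or None.

    Assumption: fewer ECUs is used as a rough proxy for "cheaper";
    real prices are not part of the table.
    """
    candidates = [
        inst for inst in catalog
        if inst.ecu >= min_ecu and inst.ram_gb >= min_ram_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda inst: (inst.ecu, inst.ram_gb))


if __name__ == "__main__":
    choice = smallest_fit(min_ecu=4, min_ram_gb=16)
    print(choice.family, choice.size)
```

A memory-hungry job (say, at least 16 GB of RAM) lands in the High-Memory class even when its CPU needs are modest, which matches how the classes are described on the previous slide.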

Cloud Computing: Services
Pricing:
- Lots of factors affect pricing
- Prices are commensurate with the class of instance used (Standard, High-Memory, etc.)
- Prices are adjusted by OS: Linux (cheaper) and Windows (pricier)
- Prices are adjusted by purchasing option:
  - On-Demand Instances: Always available to start. Priciest option. No commitment, no contract
  - Reserved Instances: Pay upfront for the ability to run an instance at a reduced rate
  - Spot Instances: eBay-style! Bid a max price for compute instances, and procure them when the demand price meets your top bid. You cannot count on getting an instance at a given price, but you can save money on instances.
- Prices are adjusted by availability zone (strictly speaking, these four are AWS regions, each of which contains several availability zones). 4 available:
  - US East (cheapest across the board)
  - US West
  - EU Ireland
  - APAC Singapore (new!)
- Estimating costs is hard, even with Amazon-provided calculators, as YMMV.
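
To make the trade-off between the purchasing options concrete, here is a minimal sketch of the arithmetic. All rates and fees in it are made-up placeholders, not Amazon prices, and the function names are hypothetical. The spot model is deliberately simplified: it assumes the instance runs only during hours in which the spot price is at or below your bid, and that you pay the spot price for those hours.

```python
def on_demand_cost(hours, hourly_rate):
    """Total cost with no upfront fee."""
    return hours * hourly_rate


def reserved_cost(hours, upfront_fee, reduced_hourly_rate):
    """Total cost when an upfront fee buys a lower hourly rate."""
    return upfront_fee + hours * reduced_hourly_rate


def break_even_hours(hourly_rate, upfront_fee, reduced_hourly_rate):
    """Hours of use at which reserved and on-demand cost the same.

    Returns None when the reserved rate is not actually cheaper.
    """
    saving_per_hour = hourly_rate - reduced_hourly_rate
    if saving_per_hour <= 0:
        return None
    return upfront_fee / saving_per_hour


def spot_cost(hourly_spot_prices, max_bid):
    """Return (total paid, hours run) under the simplified spot model."""
    won = [price for price in hourly_spot_prices if price <= max_bid]
    return sum(won), len(won)


if __name__ == "__main__":
    # Placeholder numbers chosen only to make the arithmetic easy to follow.
    print(break_even_hours(hourly_rate=0.5, upfront_fee=500.0,
                           reduced_hourly_rate=0.25))
    print(spot_cost([0.25, 0.5, 0.125, 0.75], max_bid=0.5))
```

Below the break-even point, on-demand is the cheaper choice; above it, the reserved instance wins. The spot function also shows why spot capacity cannot be relied on: any hour priced above the bid is an hour the job does not run.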

Cloud Computing: Services
Availability Zone? What's that?
- Amazon data centers are located around the globe.
- This ensures protection from data-center-wide failure
  - The problem is that many services are independent between zones, which makes this moot in most cases
- Proximity to your work environment will reduce latency (the time information takes to travel from you to Amazon and back)
- Choose the one closest to you, or the one with the cheapest price, or somewhere in between
- This will trip you up, trust me.

Cloud Computing: Services
EBS:
- Create disks that can be attached to your EC2 instances
- Disks are also placed in Availability Zones, and priced accordingly
- Can create new volumes based on public data sets
- Can create snapshots: user-initiated copies of all the data, stored in super-durable Amazon S3

Cloud Computing: Services
S3:
- Stores objects in a bucket and allows retrieval based on a unique key (URI)
- Can store objects ranging from 1 byte to 5GB. Unlimited objects can be stored
- RESTful interface (Representational State Transfer)
- Extreme durability of data, with an option for cheaper service (but reduced durability)
- Backed by the Amazon S3 SLA (service level agreement)
Unlimited objects and extreme durability? What's the catch?
- Simple object stores are bad when disk I/O operations are needed
- 5GB may be too small for data sets
- At the end of the day you can save data to S3, but you'll be transferring it to EBS for any operations you're going to do with it.
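
When a data set is larger than the 5GB object limit quoted above, one workaround is to split it into several objects, each stored under its own key. The sketch below only plans the split and never talks to S3. The `plan_parts` function and the `.partNNNN` key naming are hypothetical, and it assumes 5GB means 5 x 1024^3 bytes.

```python
# Assumption: the 5GB limit from the slide, taken as 5 * 1024**3 bytes.
MAX_OBJECT_BYTES = 5 * 1024**3


def plan_parts(total_bytes, key_prefix, max_bytes=MAX_OBJECT_BYTES):
    """Return a list of (key, offset, length) tuples covering total_bytes.

    Each tuple describes one object: the key it would be stored under,
    where its bytes start in the original file, and how many bytes it holds.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")

    parts = []
    offset = 0
    index = 0
    while offset < total_bytes:
        length = min(max_bytes, total_bytes - offset)
        parts.append((f"{key_prefix}.part{index:04d}", offset, length))
        offset += length
        index += 1
    return parts


if __name__ == "__main__":
    # A 12GB file becomes three objects: 5GB, 5GB, and 2GB.
    for key, offset, length in plan_parts(12 * 1024**3, "genome.fa"):
        print(key, offset, length)
```

Reassembly is the reverse: fetch the parts in key order and concatenate them, which is one more reason data usually ends up on EBS before any real work is done on it.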

Cloud Computing: Services
EMR:
- Hosted Hadoop infrastructure for use of the MapReduce paradigm in the cloud
- Allows... Wait, do you know what MapReduce is? No? Then let's back up a moment.

Cloud Computing: MapReduce Super-quick Primer
MapReduce:
- Inspired by functional programming, and introduced by Google.
- A way to process large amounts of data by farming out work to a cluster.
- Works by using two functions:
  - Mapper: Takes huge input data and chunks it out into smaller sub-problems, applying one or more functions to each, resulting in key/value pairs of the data
  - Reducer: Takes the key/value pairs and combines them into useful data
- This is just a way of thinking about a problem. You need to code everything by hand. (Think of this not as a solution, but as a way to think about creating one.)
- Hadoop is software that handles distribution and collection of the data through your Map and Reduce functions, abstracting away the bookkeeping.
- If this still seems unclear, Vince & Daniel have great talks on this.
- Also, for more information, Google has the answer. Check out Google's MapReduce in a Week (http://code.google.com/edu/submissions/mapreduce/listing.html)
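
The mapper/reducer split described above can be sketched in a few lines of plain Python. This is a single-machine toy, not Hadoop: the `shuffle` step stands in for the bookkeeping a framework would do across a cluster, and counting k-mers in sequencing reads is just an example problem chosen for this illustration.

```python
from collections import defaultdict


def mapper(read, k=3):
    """Emit a (k-mer, 1) pair for every k-length window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Group values by key; a framework such as Hadoop does this for you."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(reads, k=3):
    """Run the map, shuffle, and reduce phases in sequence on one machine."""
    pairs = (pair for read in reads for pair in mapper(read, k))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["ACGTAC", "CGTA"], k=3))
```

Because each call to `mapper` looks at one read and each call to `reducer` looks at one key, both phases can be spread over many machines without the functions knowing about each other. That independence is the whole point of the paradigm.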

Cloud Computing: Services
EMR: With that out of the way...
- Hosted Hadoop infrastructure for use of the MapReduce paradigm in the cloud
- Allows processing of vast amounts of data
- Built to take advantage of other AWS services, such as S3, to process data & store results
- Most bioinformatics tools cannot make good use of EMR at this time

Cloud Computing: Services
SDB:
- Non-relational data store (more like Excel than MySQL)
- Think of it as S3 for data instead of files
- Primarily for index & query capabilities
- Comes with a free tier for testing, making this service easy to approach
  - First 25 machine hours & 1GB storage / month free
  - After that, pricing is per machine hour used
- Syntax:
  - Domains: Think of this as your spreadsheet name
  - Attributes: These would be the data in a column. Attributes have a name (header) and a value
- Limit of 10GB per domain
- Reads come in two flavors:
  - Consistent: Your read reflects the data previously written
  - Eventually Consistent: Higher read throughput, but reads are not guaranteed to reflect everything written before them. There is latency between writing and reading updated information.
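
The difference between the two read flavors is easier to see in a toy model. The sketch below is not the SimpleDB API. `ToyDomain` and its methods are hypothetical, and the "replica" that lags behind until `sync()` is called is only a stand-in for whatever replication delay the real service has.

```python
class ToyDomain:
    """In-memory stand-in for a domain: item name -> {attribute: value}."""

    def __init__(self):
        self._primary = {}   # always reflects every write
        self._replica = {}   # lagging copy used for eventually consistent reads

    def put(self, item, **attributes):
        """Write attributes for an item; only the primary copy sees it at first."""
        self._primary.setdefault(item, {}).update(attributes)

    def sync(self):
        """Simulate replication catching up with all earlier writes."""
        self._replica = {
            name: dict(attrs) for name, attrs in self._primary.items()
        }

    def get(self, item, consistent=False):
        """Read one item's attributes from the primary or the lagging replica."""
        source = self._primary if consistent else self._replica
        return dict(source.get(item, {}))

    def select(self, attribute, value, consistent=False):
        """Return the sorted names of items whose attribute equals value."""
        source = self._primary if consistent else self._replica
        return sorted(
            name for name, attrs in source.items()
            if attrs.get(attribute) == value
        )


if __name__ == "__main__":
    samples = ToyDomain()
    samples.put("sample1", organism="E. coli", platform="Illumina")
    print(samples.get("sample1", consistent=True))   # sees the write
    print(samples.get("sample1"))                    # may not see it yet
    samples.sync()
    print(samples.get("sample1"))                    # now it does
```

A pipeline that writes a record and immediately reads it back should ask for the consistent flavor; bulk queries that can tolerate slightly stale results can take the faster, eventually consistent one.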

Cloud Computing: Services
RDS:
- Literally a hosted relational database (like MySQL)
- Features reserved & on-demand pricing
- Patches the software and handles backups for a user-defined retention period
- Designed for use with other services (as you can imagine), so an EC2 instance will have low-latency access to an RDS instance and vice versa
- Can create snapshots (sound familiar?): user-initiated backups with indefinite retention (they last until you delete them)
- Multi-zone deployment: Allows replication of data across availability zones for durability of data
- RDS instances come in various sizes, which will look familiar to anyone who knows EC2 instance sizes.

Cloud Computing: Components
Questions?

Cloud Computing: Components
Oh yeah, here's some free money! (Weren't expecting that, were ya?)