Managing and Conducting Biomedical Research on the Cloud
Prasad Patil




Laboratory for Personalized Medicine, Center for Biomedical Informatics, Harvard Medical School

What is Cloud Computing?
Definition: Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms, and/or services). These resources can be dynamically reconfigured to adjust to a variable load (scale), also allowing for optimum resource utilization.
Examples: SaaS (Gmail, Google Docs, MS Office Online); PaaS (Google App Engine); IaaS (virtualized hardware)

Why Cloud Computing?
1. Ability to scale: a job that takes 10 hours on a single server can be done in 1 hour on 10 servers, provided it splits evenly across them
2. Pay-per-use: you pay only for what you use, when you use it, which reduces the need to purchase hardware
3. Increased flexibility: a variety of server types and operating systems
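The scaling and pay-per-use points combine into a simple piece of arithmetic: under per-server-hour pricing, 10 servers for 1 hour cost the same as 1 server for 10 hours. The sketch below assumes two things. First, the job divides perfectly with no overhead. Second, billing rounds up to whole hours, which is how EC2 billed at the time. The hourly rate is a hypothetical placeholder.

```python
import math


def ideal_wall_clock_hours(serial_hours: float, n_servers: int) -> float:
    """Wall-clock time if the job splits perfectly across n_servers (no overhead)."""
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    return serial_hours / n_servers


def billed_cost(serial_hours: float, n_servers: int, hourly_rate: float) -> float:
    """Pay-per-use cost: each server is billed for the wall-clock time, rounded up to whole hours."""
    hours_each = math.ceil(ideal_wall_clock_hours(serial_hours, n_servers))
    return n_servers * hours_each * hourly_rate


RATE = 0.10  # hypothetical price per server-hour, in dollars

# The slide's example: 10 hours on 1 server versus 1 hour on 10 servers.
print(ideal_wall_clock_hours(10, 1), billed_cost(10, 1, RATE))
print(ideal_wall_clock_hours(10, 10), billed_cost(10, 10, RATE))
```

The equivalence is exact only when the work divides evenly. With hourly rounding, the same 10-hour job on 3 servers is billed as 12 server-hours.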

Cloud Computing at Harvard?
Datacenter, CBMI, Countway hardware and software
Goal: go beyond the hype and explore the utility of cloud computing for novice users
Armbrust et al., Above the Clouds (2009)

LPM Question
Can we implement a systematic strategy and best-practice process to efficiently manage cloud computing resources and clouded translational science projects?
Goals:
1. Facilitate global, multi-institutional, multidisciplinary research collaborations
2. Significantly reduce overall administrative and management requirements
3. Create a process that other research centers or labs can repeat to enhance scientific progress at reduced cost

Challenge for Scientific Research
Objective: create a low-cost computational center with a low administrative footprint under typical academic, technical, and resource constraints:
Tremendous project diversity (in scientific scope, complexity, and computational needs)
Multiple project leads
Multiple project teams at several US and foreign locations
Diverse levels of experience with the cloud
Varying levels of project team access and resource control
One AWS account
A single administrator (20% effort)
Limited resources: time, AWS services, coordination
Overall focus: the scientific objective of the project. Do not allow resource management and administration (configuration, software downloads, upgrades, version control, etc.) to distract from or impede the scientific objectives.

Clouded Translational Science Seminar
Participants in the Clouded Translational Science seminar will conduct a series of exercises in biomedical discovery and translational science using cloud computing technology. Participants represent Harvard, Children's Hospital of Boston, Brigham and Women's Hospital, Beth Israel Hospital, Mass General, the Broad Institute, two University of Wisconsin campuses (Madison and Milwaukee), and Tokyo Medical and Dental University. They will learn about and implement databases, analysis tools, and web application development environments on the Amazon cloud. http://lpm.hms.harvard.edu/palaver/

Types of Projects
Network analysis for disease genetics (Roundup)
Translational Variome
Next-generation sequence analysis (DNA & RNA)
i2b2 (www.i2b2.org)
Pharmacogenetics using clinical avatars
Cloud computing center

LPM Project Breakdown
Inelastic: Clinical Avatars project development; i2b2 AMI development; Clinical Variome
Managed Elastic: Clinical Avatars web deployment; i2b2 federated queries; NGS RNA algorithm testing
Elastic: RoundUp; Crossbow; NGS DNA whole genome mapping

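Resource Access Management
Options compared: RightScale sub-accounts; AWS Identity and Access Management (IAM); secure server access with SSH keys/passwords
RightScale sub-accounts and AWS IAM, advantages: individual control, easy to implement, customizable, elastic, free educational license (RightScale), free for AWS accounts (IAM), control service usage
RightScale sub-accounts and AWS IAM, disadvantages: steep learning curve, must trust users, no UI (code-intensive), beta
Also listed for these two options: minimal restrictions, RightScale hooks
SSH keys/passwords, advantages: easy to implement, we control user access
SSH keys/passwords, disadvantages: requires more management, not elastic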
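The "control service usage" advantage of IAM comes from attaching JSON policies to users. The sketch below builds one such policy. It lets a project member list instances and start or stop only the instances tagged with their own project. It uses today's IAM policy grammar, and IAM has changed since it was in beta at the time of this talk. The `Project` tag key and the project name are hypothetical.

```python
import json


def project_ec2_policy(project_tag_value: str) -> dict:
    """Build an IAM policy: list all instances, but start/stop only this project's tagged ones."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # DescribeInstances does not support resource-level restrictions.
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {
                # Start/stop is limited to instances whose hypothetical "Project" tag matches.
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {"ec2:ResourceTag/Project": project_tag_value}
                },
            },
        ],
    }


print(json.dumps(project_ec2_policy("whole-genome-mapping"), indent=2))
```

The printed document could then be attached to a user with, for example, `aws iam put-user-policy --user-name NAME --policy-name NAME --policy-document file://policy.json`.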

Cloud Management Strategy
Projects are sorted by decision criteria into three tiers: Inelastic, Managed Elastic, Elastic
AWS/RightScale main account: project deployment, server configuration, SSH key pair
RightScale sub-account: project deployment, server configuration, RightScale SSH
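The slide names the three tiers but does not list its actual decision criteria. The rule below is only a hypothetical illustration of how such criteria might be encoded. It borrows two factors from the challenges listed earlier: whether a project needs to scale out, and how much cloud experience its team has.

```python
from enum import Enum


class Tier(Enum):
    INELASTIC = "Inelastic"
    MANAGED_ELASTIC = "Managed Elastic"
    ELASTIC = "Elastic"


def choose_tier(needs_scale_out: bool, team_has_cloud_experience: bool) -> Tier:
    """Hypothetical decision rule; the talk does not spell out its real criteria."""
    if not needs_scale_out:
        # A fixed set of servers (e.g. development work) gains little from elasticity.
        return Tier.INELASTIC
    if team_has_cloud_experience:
        # Assumption: experienced teams manage their own scaling.
        return Tier.ELASTIC
    # Assumption: otherwise the administrator scales resources on the team's behalf.
    return Tier.MANAGED_ELASTIC


print(choose_tier(needs_scale_out=True, team_has_cloud_experience=False).value)
```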

RightScale Cloud Management (www.rightscale.com)
Inelastic, Elastic, Managed Elastic

RightScale Instance Resource Usage

Best Practices
1. Analyze the details of the project to take full advantage of the cloud (type of OS required, data size, CPU, memory)
2. Create a backup strategy before you launch an instance
3. Launch an instance only when you are ready to start working, and shut it down when the work is complete
4. Access an instance over a secure connection
5. Actively monitor your account
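Practices 3 and 5 both come down to knowing what is still running. The sketch below pulls the running instance IDs out of a response shaped like the output of boto3's EC2 `describe_instances()`. It works on a plain dictionary, so it can be tried without AWS credentials. The sample instance IDs are made up.

```python
def running_instance_ids(response: dict) -> list[str]:
    """Collect IDs of instances in the 'running' state from a describe_instances-style response."""
    ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "running":
                ids.append(instance["InstanceId"])
    return ids


# Sample response with the same nesting as describe_instances() output (IDs are made up).
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
            {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}},
        ]},
        {"Instances": [{"InstanceId": "i-0ccc", "State": {"Name": "running"}}]},
    ]
}
print(running_instance_ids(sample))
```

Against a real account, the input would come from `boto3.client("ec2").describe_instances()`. For large accounts, use `get_paginator("describe_instances")`, because results are paginated. Finished work could then be shut down with `stop_instances(InstanceIds=ids)`.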

Whole Genome Mapping Strategy
MAQ: sequence alignment and assembly; uses short reads from NGS technology; publicly available; developed by Heng Li at the Sanger Institute
Apply to the African genome (Bentley et al., Nature, 2008)
Using AWS: take advantage of the elasticity of the cloud
Goal: a flexible framework for any application
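A single-node MAQ run follows a short chain of steps:

1. Convert the reference and the reads to MAQ's binary formats.
2. Map the reads.
3. Assemble a consensus.
4. Call SNPs.

The sketch below builds those command lines for one chunk of reads without running them. The subcommands (`fasta2bfa`, `fastq2bfq`, `map`, `assemble`, `cns2snp`) follow MAQ's documented workflow. Argument order should still be checked against the usage message of the installed version. The file-naming scheme is hypothetical.

```python
def maq_commands(ref_fasta: str, reads_fastq: str, prefix: str) -> list[list[str]]:
    """Build MAQ command lines for one chunk of reads (file names derived from a hypothetical prefix)."""
    ref_bfa = prefix + ".ref.bfa"
    reads_bfq = prefix + ".reads.bfq"
    aln_map = prefix + ".aln.map"
    cns = prefix + ".cns"
    return [
        ["maq", "fasta2bfa", ref_fasta, ref_bfa],      # reference FASTA -> binary FASTA
        ["maq", "fastq2bfq", reads_fastq, reads_bfq],  # reads FASTQ -> binary FASTQ
        ["maq", "map", aln_map, ref_bfa, reads_bfq],   # align reads to the reference
        ["maq", "assemble", cns, ref_bfa, aln_map],    # build the consensus
        ["maq", "cns2snp", cns],                       # SNP calls are written to stdout
    ]


for cmd in maq_commands("reference.fasta", "chunk1.fastq", "chunk1"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. For the last step, redirect stdout to a `.snp` file. In a cluster, the reference conversion only needs to happen once, for instance before the AMI is built.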

Where to Start?
1. Launch an AWS Linux instance
2. Install Maq, the NCBI reference genome, and the Apache web server
3. Package the instance into an AMI
4. Launch identical copies of the instance
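Launching "identical copies" amounts to starting several instances from the same AMI in one request. The sketch below wraps boto3's `run_instances`. Setting `MinCount` equal to `MaxCount` means EC2 launches nothing rather than a partial cluster. To run it without AWS, it uses a small stand-in client. The AMI ID, instance type, and key name are hypothetical.

```python
def launch_identical_copies(ec2_client, ami_id: str, count: int,
                            instance_type: str, key_name: str) -> list[str]:
    """Start `count` instances from one AMI so every node carries the same software."""
    response = ec2_client.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        KeyName=key_name,
        MinCount=count,  # launch nothing unless all `count` nodes can start
        MaxCount=count,
    )
    return [inst["InstanceId"] for inst in response["Instances"]]


class FakeEC2:
    """Stand-in for boto3.client("ec2") so the function can be tried without AWS."""

    def run_instances(self, **kwargs):
        self.last_request = kwargs
        return {"Instances": [{"InstanceId": f"i-fake{n}"} for n in range(kwargs["MaxCount"])]}


fake = FakeEC2()
ids = launch_identical_copies(fake, "ami-12345678", 3, "m1.large", "lab-key")
print(ids)
```

With real AWS, `boto3.client("ec2")` replaces `FakeEC2`. The AMI itself can be created from the configured instance with the same client's `create_image(InstanceId=..., Name=...)`.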

Job Distribution Architecture
Local terminal: create the cluster (a master AMI and slave AMIs 1, 2, and 3)
Input files and scripts, from an EBS volume or S3, are copied to the slaves
Each slave runs its share of the work and copies its output files back to the master
Web monitoring
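One simple way for the master to split work across slaves is round-robin assignment of pre-split read chunks. The master then pushes each slave its share. The sketch below does both. It is only a guess at the talk's scheme, which the slide does not detail. Host names, chunk names, and the remote directory are hypothetical, and `scp` assumes key-based SSH access between nodes.

```python
def assign_chunks(chunks: list[str], slaves: list[str]) -> dict[str, list[str]]:
    """Round-robin read chunks across slave nodes so each gets a near-equal share."""
    if not slaves:
        raise ValueError("need at least one slave node")
    plan = {slave: [] for slave in slaves}
    for i, chunk in enumerate(chunks):
        plan[slaves[i % len(slaves)]].append(chunk)
    return plan


def copy_commands(plan: dict[str, list[str]], remote_dir: str) -> list[list[str]]:
    """Build the scp commands the master could run to push each slave its chunks."""
    cmds = []
    for slave, files in plan.items():
        if files:
            cmds.append(["scp", *files, f"{slave}:{remote_dir}/"])
    return cmds


plan = assign_chunks(["c1", "c2", "c3", "c4", "c5"], ["slave1", "slave2", "slave3"])
for cmd in copy_commands(plan, "/data"):
    print(" ".join(cmd))
```

Collecting results reverses the direction: each slave's output files are copied back to the master once its jobs finish.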

Web Monitoring
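The setup installs Apache on the AMI, and the slide title suggests a browser view of job progress. The slide itself shows only a screenshot. A minimal hypothetical approach is to regenerate a static HTML status page that Apache serves. The sketch below renders such a page from a mapping of node name to job status. The status strings are assumptions.

```python
from html import escape


def status_page(statuses: dict[str, str]) -> str:
    """Render node -> job status as a minimal HTML table for a web server to serve."""
    rows = "\n".join(
        f"<tr><td>{escape(node)}</td><td>{escape(state)}</td></tr>"
        for node, state in sorted(statuses.items())
    )
    done = sum(1 for state in statuses.values() if state == "done")
    return (
        "<html><body>\n"
        f"<p>{done} of {len(statuses)} nodes finished</p>\n"
        "<table>\n<tr><th>Node</th><th>Status</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


print(status_page({"slave2": "running", "slave1": "done", "slave3": "running"}))
```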

Costs and Results
African genome NGS data: 370 GB
Compute nodes: 25 (5 x 5-core)
Total cost: $2,600
The same cost can be spread over more instances to decrease computation time.
New mapping software significantly reduces the cost (<$100).
Resulting annotated variant file: 22 MB
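Dividing the slide's totals gives a per-unit view of the run. The slide does not report node-hours, so the per-node figure is only an average over whatever time each node ran.

```python
DATA_GB = 370           # African genome NGS input (from the slide)
NODES = 25              # compute nodes (from the slide)
TOTAL_COST = 2600.0     # US dollars, original Maq run (from the slide)
NEW_COST_BOUND = 100.0  # "<$100" with newer mapping software (from the slide)

cost_per_gb = TOTAL_COST / DATA_GB
cost_per_node = TOTAL_COST / NODES
new_per_gb_bound = NEW_COST_BOUND / DATA_GB

print(f"Maq run: ${cost_per_gb:.2f}/GB, ${cost_per_node:.2f} per node")
print(f"Newer software: under ${new_per_gb_bound:.2f}/GB")
```

This works out to about $7.03 per GB for the Maq run. The newer software brings that to under $0.27 per GB.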