Genome Analysis in a Dynamically Scaled Hybrid Cloud




Chris Smowton*, Georgiana Copil**, Hong-Linh Truong**, Crispin Miller* and Wei Xing*
* CRUK Manchester
** TU Vienna

In a Nutshell
Users want to run analysis tasks
Admin wants to dynamically allocate resources
Non-expert users are bad at assigning resources to their jobs
We propose to automatically control global and per-job parallelism, and show via simulation that this could work

Background: CELAR Project
Cloud applications are often scalable
Example: web database frontend
How many database servers? How many web servers? What spec cloud instance for each one?
Parameters are adjusted in an ad hoc manner, using the admin's intuition
CELAR: automate setting these parameters instead

Background: CELAR Project
Applications expose metrics
Web example: avg. page load time, web server memory usage, ...
Admin sets goals
Example: keep page load time below 1 second
CELAR can hire and release cloud resources
Learns how different resources help achieve the goals

Background: Resource Managers
Users submit jobs, annotated with resource needs
Admin provides machines (usually fixed physical hardware, maybe cloud-hired)
Resource manager (RM) queues jobs and assigns jobs to machines

CELAR-ised Resource Manager
Expose metrics to CELAR: job queue length, user happiness
Set goals: maximise happiness!
Let CELAR pick how many cloud VMs the RM gets, and what spec they have

This Work
Show that CELAR-ising our RM can be a good thing
CELAR chooses how many VMs the RM has at its disposal, and their spec
Show that there is scope to improve over simple control policies (for example, hire a VM whenever there is a queue)
Evaluate by simulating the interaction between users, resource manager and CELAR
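The simple control policy mentioned above can be sketched in a few lines. This is only an illustration, not the simulator used in the talk: the function name `baseline_decision`, its parameters, and the rule of falling back to the expensive tier when no cheap machine is free are all assumptions.

```python
from typing import Optional


def baseline_decision(queue_length: int, cheap_available: bool) -> Optional[str]:
    """Baseline policy: hire a VM whenever there is a queue.

    Returns the tier to hire from ("cheap" or "expensive"), or None if
    nothing should be hired. The tier fallback is an assumption made for
    this sketch; the baseline itself never looks at prices or predictions.
    """
    if queue_length == 0:
        return None
    return "cheap" if cheap_available else "expensive"
```

A policy like this reacts to the queue alone, which is why there may be scope to improve on it.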

Model
Users: submit jobs (randomly)
CELAR: observes metrics; takes hiring decisions
Resource Manager
Infrastructure: provides variable-cost machines (2-tier)

Model: Users
Users pay for service; we pay the infrastructure provider to hire machines
In practical academic environments, instead of money these might be quota credits
Users pay more for better service
We try to make as much profit as possible
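The profit objective can be written down as a small sketch. The slides do not give the payment function, so the linear lateness penalty in `job_payment`, and every name and rate below, are hypothetical; only the overall shape (revenue from users minus machine hire cost across two tiers) comes from the model above.

```python
from typing import Iterable


def job_payment(base_fee: float, completion_hours: float,
                target_hours: float, penalty_per_hour: float) -> float:
    """Assumed payment rule: the full fee if the job finishes by the target,
    reduced linearly for each hour late, and never below zero."""
    hours_late = max(0.0, completion_hours - target_hours)
    return max(0.0, base_fee - penalty_per_hour * hours_late)


def profit(payments: Iterable[float],
           cheap_hours: float, expensive_hours: float,
           cheap_rate: float, expensive_rate: float) -> float:
    """Profit = what users pay us minus what we pay the infrastructure provider."""
    revenue = sum(payments)
    cost = cheap_hours * cheap_rate + expensive_hours * expensive_rate
    return revenue - cost
```

In an academic setting the same arithmetic would apply with quota credits in place of money.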

Model: Application
GATK 7-stage analysis pipeline
Users care how long the whole pipeline takes
Different stages have different scaling behaviour, measured through profiling

CELAR Task 1: When to hire, when to wait?
Infrastructure has two cost tiers: cheap and expensive
Suppose a queue is building up at the RM but no cheap resources are available
Should we hire an expensive machine or wait for a cheap one?

Proposed Solution: Predict
From our GATK profiling we can predict when running tasks will finish
Weigh the amount the user will pay for faster completion against the expected extra cost
If the queue is long, all queued jobs will benefit if we accelerate one
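The weighing described above might look roughly like the following. It is a sketch under stated assumptions rather than the algorithm evaluated in the talk: it assumes that each queued job is worth a fixed amount per hour saved (`value_per_job_hour`), that the predicted wait for a cheap machine is known from profiling, and that the only extra cost is the price difference between tiers over the job's run time. All names are hypothetical.

```python
def should_hire_expensive(wait_hours: float, queued_jobs: int,
                          value_per_job_hour: float, job_hours: float,
                          cheap_rate: float, expensive_rate: float) -> bool:
    """Decide whether to hire an expensive machine now or wait for a cheap one.

    wait_hours:         predicted time until a cheap machine frees up
    queued_jobs:        jobs waiting; starting the head job earlier moves all of them forward
    value_per_job_hour: assumed extra payment per job for each hour saved
    job_hours:          predicted run time of the job at the head of the queue
    """
    # Every queued job finishes wait_hours sooner if we start the head job now.
    extra_reward = wait_hours * queued_jobs * value_per_job_hour
    # Running the head job on the expensive tier costs the price difference.
    extra_cost = job_hours * (expensive_rate - cheap_rate)
    return extra_reward > extra_cost
```

With a 2-hour predicted wait, a 4-hour job and a price gap of 1 per hour, a single queued job does not justify the expensive machine under these assumptions, but a queue of three does.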

Predictive Scaling Example

Caveat: Prediction Quality

CELAR Task 2: To Thread or Not To Thread?
Most GATK pipeline stages support multithreading, but some better than others, leading to diminishing returns
During idle times, best to run multithreaded and make the users happy
During busy times, best to run single-threaded and maximise throughput

Proposed Solution: Weigh Expected Reward vs. Extra Cost
Profiling lets us predict the effect of adding a thread
Compare against cost, taking into account whether we would use cheap or expensive resources
Three candidate schemes: greedy, long-term and long-term adaptive
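A greedy version of this comparison can be sketched as follows. The slides do not describe the profiled scaling curves or the details of the three schemes, so this example substitutes an Amdahl's-law style model (`parallel_fraction`) for the profile and a fixed value per hour saved for the user's reward; these, and all names, are assumptions.

```python
def stage_time(single_thread_hours: float, parallel_fraction: float, threads: int) -> float:
    """Assumed scaling model (Amdahl's law): only parallel_fraction of the work speeds up."""
    return single_thread_hours * ((1.0 - parallel_fraction) + parallel_fraction / threads)


def greedy_thread_count(single_thread_hours: float, parallel_fraction: float,
                        max_threads: int, value_per_hour_saved: float,
                        core_hour_rate: float) -> int:
    """Add threads one at a time while the extra reward exceeds the extra cost.

    core_hour_rate is the price of one core for one hour on whichever tier
    (cheap or expensive) the extra thread would have to run on.
    """
    threads = 1
    while threads < max_threads:
        t_now = stage_time(single_thread_hours, parallel_fraction, threads)
        t_next = stage_time(single_thread_hours, parallel_fraction, threads + 1)
        extra_reward = (t_now - t_next) * value_per_hour_saved
        # Cost is core-hours consumed: more cores for a somewhat shorter time.
        extra_cost = ((threads + 1) * t_next - threads * t_now) * core_hour_rate
        if extra_reward <= extra_cost:
            break  # diminishing returns: the next thread no longer pays for itself
        threads += 1
    return threads
```

Under this model a 10-hour stage that is 90% parallel gets 4 threads when an hour saved is worth 2 and a core-hour costs 1, only 2 threads when the core-hour price rises to 4, and a single thread when an hour saved is worth just 0.1.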

Vertical Scaling Example

Caveat: Greed?
If there are only 8 cheap cores left, and the next job would ideally use them all, should we really grab them all ourselves?
Surprisingly, yes!

Summary
Developed candidate algorithms for automatically controlling the size of an RM's pool of worker VMs and the parallelism used by each job
Showed ability to improve over baseline algorithms

What's Next?
Apply findings in the real world: use auto-scaling on our real cluster
Compare against existing work on e.g. Hadoop scheduling, where jobs are flexible by default
Demonstrate within the CELAR framework

More Information
www.celarcloud.eu
christopher.smowton@cruk.manchester.ac.uk