Data Sharing Options for Scientific Workflows on Amazon EC2



Similar documents
Scientific Workflow Applications on Amazon EC2

The Application of Cloud Computing to Scientific Workflows: A Study of Cost and Performance

Data Sharing Options for Scientific Workflows on Amazon EC2

Hosted Science: Managing Computational Workflows in the Cloud. Ewa Deelman USC Information Sciences Institute

The application of cloud computing to scientific workflows: a study of cost and performance

An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2

The Application of Cloud Computing to Astronomy: A Study of Cost and Performance

How can new technologies can be of service to astronomy? Community effort

Scientific Workflows in the Cloud

Cloud Computing. Alex Crawford Ben Johnstone

Rapid 3D Seismic Source Inversion Using Windows Azure and Amazon EC2

On the Use of Cloud Computing for Scientific Workflows

Creating A Galactic Plane Atlas With Amazon Web Services

Peer-to-Peer Data Sharing for Scientific Workflows on Amazon EC2

Hopes and Facts in Evaluating the Performance of HPC-I/O on a Cloud Environment

DynamicCloudSim: Simulating Heterogeneity in Computational Clouds

The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman

PUBLIC CLOUD USAGE TRENDS

Running R from Amazon's Elastic Compute Cloud

Performance Analysis of a Numerical Weather Prediction Application in Microsoft Azure

Cloud Computing and Amazon Web Services

The Cost of Doing Science on the Cloud: The Montage Example

A Cloud Computing Approach for Big DInSAR Data Processing

Cornell University Center for Advanced Computing

The Case for Resource Sharing in Scientific Workflow Executions

Shared Parallel File System

Clearing the Clouds. Understanding cloud computing. Ali Khajeh-Hosseini ST ANDREWS CLOUD COMPUTING CO-LABORATORY. Cloud computing

OCRP Implementation to Optimize Resource Provisioning Cost in Cloud Computing

Rethinking Data Management for Big Data Scientific Workflows

Rethinking Data Management for Big Data Scientific Workflows

Cloud Server as IaaS Pricing

A Quantitative Analysis of High Performance Computing with Amazon s EC2 Infrastructure: The Death of the Local Cluster?

Cloud Server as IaaS Pricing

Amazon Elastic Compute Cloud Getting Started Guide. My experience

wu.cloud: Insights Gained from Operating a Private Cloud System

Understanding the Performance and Potential of Cloud Computing for Scientific Applications

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille

Enabling Technologies for Distributed and Cloud Computing

Product: Order Delivery Tracking

UBUNTU DISK IO BENCHMARK TEST RESULTS

Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

CENIC Private Cloud Pilot Using Amazon Reserved Instances (RIs) September 9, 2011

Cloud Performance Benchmark Series

Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds

Cornell University Center for Advanced Computing

Department of Computer Sciences University of Salzburg. HPC In The Cloud? Seminar aus Informatik SS 2011/2012. July 16, 2012

Connecting Scientific Data to Scientific Experiments with Provenance

Adapting Market-Oriented Scheduling Policies for Cloud Computing

Cloud Computing and E-Commerce

Automating Application Deployment in Infrastructure Clouds

Failure Analysis of Distributed Scientific Workflows Executing in the Cloud

Neptune. A Domain Specific Language for Deploying HPC Software on Cloud Platforms. Chris Bunch Navraj Chohan Chandra Krintz Khawaja Shams

DYNAMIC CLOUD PROVISIONING FOR SCIENTIFIC GRID WORKFLOWS

Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice

Cost estimation for the provisioning of computing resources to execute bag-of-tasks applications in the Amazon cloud

DataCenter optimization for Cloud Computing

HPC performance applications on Virtual Clusters

SURVEY ON THE ALGORITHMS FOR WORKFLOW PLANNING AND EXECUTION

Resource Sizing: Spotfire for AWS

Cloud Computing through Virtualization and HPC technologies

SURFsara HPC Cloud Workshop

Enabling multi-cloud resources at CERN within the Helix Nebula project. D. Giordano (CERN IT-SDC) HEPiX Spring 2014 Workshop 23 May 2014

Cloud computing for fire engineering. Chris Salter Hoare Lea, London, United Kingdom,

Excalibur: An Autonomic Cloud Architecture for Executing Parallel Applications

Scientific Computing Meets Big Data Technology: An Astronomy Use Case

MATE-EC2: A Middleware for Processing Data with AWS

The Performance and Cost Variability of Amazon EC2

SURFsara HPC Cloud Workshop

Deploying Business Virtual Appliances on Open Source Cloud Computing

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

Enabling Technologies for Distributed Computing

Challenges of Managing Scientific Workflows in High-Throughput and High- Performance Computing Environments

Keywords: Cloudsim, MIPS, Gridlet, Virtual machine, Data center, Simulation, SaaS, PaaS, IaaS, VM. Introduction

A Cost-Evaluation of MapReduce Applications in the Cloud

Grids and Clouds: Making Workflow Applications Work in Heterogeneous Distributed Environments

IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud

Comparing major cloud-service providers: virtual processor performance. A Cloud Report by Danny Gee, and Kenny Li

for my computation? Stefano Cozzini Which infrastructure Which infrastructure Democrito and SISSA/eLAB - Trieste

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

CloudFTP: A free Storage Cloud

When Does Colocation Become Competitive With The Public Cloud? WHITE PAPER SEPTEMBER 2014

When Does Colocation Become Competitive With The Public Cloud?

Investigation of storage options for scientific computing on Grid and Cloud facilities

Développement logiciel pour le Cloud (TLC)

Benchmark Results of Fengqi.Asia

5 SCS Deployment Infrastructure in Use

Cloud Computing For Bioinformatics

Towards a Model for Cloud Computing Cost Estimation with Reserved Instances

Auto-Scaling Model for Cloud Computing System

Performance measurement of a private Cloud in the OpenCirrus Testbed

SYSTEM SETUP FOR SPE PLATFORMS

Comparing FutureGrid, Amazon EC2, and Open Science Grid for Scientific Workflows

How To Evaluate The Cost Performance Of A Cloud Cache On A Microsoft Cloud Instance (Aws Cloud)

XtreemFS Extreme cloud file system?! Udo Seidel

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

Chao He (A paper written under the guidance of Prof.

FortiGate Amazon Machine Image (AMI) Selection Guide for Amazon EC2

High Performance Computing in CST STUDIO SUITE

CloudAnalyst: A CloudSim-based Visual Modeller for Analysing Cloud Computing Environments and Applications

Transcription:

Data Sharing Options for Scientific Workflows on Amazon EC2 Gideon Juve, Ewa Deelman, Karan Vahi, Gaurang Mehta, Benjamin P. Berman, Bruce Berriman, Phil Maechling Francesco Allertsen Vrije Universiteit March 16, 2012 Francesco Allertsen (Vrije Universiteit) March 16, 2012 1 / 22

Introduction Workflow applications needs to exchange lots of files between the tasks Workflow applications are starting to be deployed in the cloud What is the best (for costs and efficiency) way of store files for workflow applications in the cloud? Francesco Allertsen (Vrije Universiteit) March 16, 2012 2 / 22

Overview What is a workflow application? Why cloud? Workflow applications Epigenome Broadband Montage Storage options Amazon S3 NFS GlusterFS PVFS Performance results Cost comparison Conclusions Francesco Allertsen (Vrije Universiteit) March 16, 2012 3 / 22

What is a workflow application? Workflow application is a parallel application where tasks typically communicate through the use of files Each task in a workflow produces one or more output files that become input files to other tasks Francesco Allertsen (Vrije Universiteit) March 16, 2012 4 / 22

Why cloud? Traditionally, large-scale workflows have been run on academic HPC systems such as cluster and grids. Why moving to the cloud? root access control over the entire software environment using of VM images on-demand provisioning capabilities Francesco Allertsen (Vrije Universiteit) March 16, 2012 5 / 22

Workflow Applications Epigenome (bio informatics application) Broadband (seismology application) Montage (astronomy application) Application I/O Memory CPU Epigenome Low Medium High Broadband Medium High Medium Montage High Low Low Francesco Allertsen (Vrije Universiteit) March 16, 2012 6 / 22

Epigenome Application I/O Memory CPU Epigenome Low Medium High Broadband Medium High Medium Montage High Low Low Contains 529 tasks Read 1.9GB of input data Produces 300MB of output data 99% of its runtime in the CPU and only 1% on I/O and other activities Francesco Allertsen (Vrije Universiteit) March 16, 2012 7 / 22

Broadband Application I/O Memory CPU Epigenome Low Medium High Broadband Medium High Medium Montage High Low Low Contains 768 tasks Read 6GB of input data Produces 303MB of output data 75% is consumed by tasks requiring more than 1GB of physical memory Francesco Allertsen (Vrije Universiteit) March 16, 2012 8 / 22

Montage Application I/O Memory CPU Epigenome Low Medium High Broadband Medium High Medium Montage High Low Low Contains 10429 tasks Read 4.2GB of input data Produces 7.9GB of output data 95% of its time waiting on I/O operations Francesco Allertsen (Vrije Universiteit) March 16, 2012 9 / 22

Execution environment For the experiments it is used the Amazon EC2 infrastructure, because it is the most popular, feature-rich and stable commercial cloud available. Francesco Allertsen (Vrije Universiteit) March 16, 2012 10 / 22

Resources For the experiments it is used the c1.xlarge instance type. two quad core 2.33-2.66 GHz Xeon processors 7GB RAM 1690 GB local disk storage It delivers the best overall performance for the applications considered. Francesco Allertsen (Vrije Universiteit) March 16, 2012 11 / 22

Storage For each workflow application we need to allocate storage for Application executables Input data Intermediate and output data For all the experiments we did not perform or measure data transfer to/from the cloud. Francesco Allertsen (Vrije Universiteit) March 16, 2012 12 / 22

Storage Options Amazon S3 NFS Dedicated note using the m1.xlarge instance type (16GB memory) async option GlusterFS NUFA (Non-Uniform File Access) distributed PVFS Version 2.6 instead of 2.8 XtreemFS Francesco Allertsen (Vrije Universiteit) March 16, 2012 13 / 22

Performance results for Epigenome Francesco Allertsen (Vrije Universiteit) March 16, 2012 14 / 22

Performance results for Broadband Francesco Allertsen (Vrije Universiteit) March 16, 2012 15 / 22

Performance results for Montage Francesco Allertsen (Vrije Universiteit) March 16, 2012 16 / 22

Cost comparison There are three different cost categories when running an application on EC2 resource cost storage cost transfer cost Amazon charges for resources by the hour, and any partial hours are rounded up. We consider also how much does it cost if Amazon would charge per second. Francesco Allertsen (Vrije Universiteit) March 16, 2012 17 / 22

Cost for Epigenome Francesco Allertsen (Vrije Universiteit) March 16, 2012 18 / 22

Cost for Broadband Francesco Allertsen (Vrije Universiteit) March 16, 2012 19 / 22

Cost for Montage Francesco Allertsen (Vrije Universiteit) March 16, 2012 20 / 22

Conclusions For different type of workflow application the performance of each file system can be different Using a cost-per-second model can be cheaper that the cost-per-hour, so it s better if you use one virtual cluster for more workflow applications instead of creating one for each application Francesco Allertsen (Vrije Universiteit) March 16, 2012 21 / 22

Take Home Message Test your workflow application with different storage options to find the optimal solution for you, because there isn t the perfect one Francesco Allertsen (Vrije Universiteit) March 16, 2012 22 / 22