A Batch Computing Service for the Spot Market. Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, Prashant Shenoy

Similar documents
SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Data Centers and Cloud Computing

Improving MapReduce Performance in Heterogeneous Environments

Cutting the Cost of Hosting Online Services Using Cloud Spot Markets

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Amazon CloudWatch to monitor cloud resource usage

Black-box and Gray-box Strategies for Virtual Machine Migration

Cloud Computing and Amazon Web Services

Chapter 19 Cloud Computing for Multimedia Services

A Very Brief Introduction To Cloud Computing. Jens Vöckler, Gideon Juve, Ewa Deelman, G. Bruce Berriman

ADAPTIVE LIVE VM MIGRATION OVER A WAN MODELING AND IMPLEMENTATION

Ensuring Collective Availability in Volatile Resource Pools via Forecasting

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

CHAPTER 8 CLOUD COMPUTING

Disaster Recovery as a Cloud Service: Economic Benefits and Deployment Challenges

Google File System. Web and scalability

Data Centers and Cloud Computing. Data Centers

Characterizing Task Usage Shapes in Google s Compute Clusters

Data Centers and Cloud Computing. Data Centers

Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

Ground up Introduction to In-Memory Data (Grids)

Part 1: Price Comparison Among The 10 Top Iaas Providers

Multilevel Communication Aware Approach for Load Balancing

Data Centers and Cloud Computing. Data Centers. MGHPCC Data Center. Inside a Data Center

Who moved my cloud? Part II: What s behind the cloud vendors AWS, GCE and Azure?

Dynamic Resource Distribution Across Clouds

CloudFTP: A free Storage Cloud

Takahiro Hirofuchi, Hidemoto Nakada, Satoshi Itoh, and Satoshi Sekiguchi

International Journal of Computer & Organization Trends Volume21 Number1 June 2015 A Study on Load Balancing in Cloud Computing

Cloud Panel Service Evaluation Scenarios

IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 3, NO. X, XXXXX Monetary Cost Optimizations for Hosting Workflow-as-a-Service in IaaS Clouds

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

Fujitsu Managed Hosting Delivers your Cloud Infrastructure as a Service environment with confidence

Experimental Investigation Decentralized IaaS Cloud Architecture Open Stack with CDT

Database high availability

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

The Case for Enterprise Ready Virtual Private Clouds

Process Replication for HPC Applications on the Cloud

HPC performance applications on Virtual Clusters

How To Create A Cloud Based System For Aaas (Networking)

Last time. Data Center as a Computer. Today. Data Center Construction (and management)

Coding Techniques for Efficient, Reliable Networked Distributed Storage in Data Centers

Cloud Services. May 28 th, 2014 Athens, Greece

Protecting your SQL database with Hybrid Cloud Backup and Recovery. Session Code CL02

Network Attached Storage. Jinfeng Yang Oct/19/2015

Towards a New Model for the Infrastructure Grid

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Server Virtualization and Cloud Computing

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

- Behind The Cloud -

Traditional v/s CONVRGD

On the Use of Fuzzy Modeling in Virtualized Data Center Management Jing Xu, Ming Zhao, José A. B. Fortes, Robert Carpenter*, Mazin Yousif*

The Virtual Grid Application Development Software (VGrADS) Project

Best Practices for Using MySQL in the Cloud

Security management in the internet era

Hadoop Architecture. Part 1

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Cloud 101. Mike Gangl, Caltech/JPL, 2015 California Institute of Technology. Government sponsorship acknowledged

Cloud Computing. Summary

Virtualization and Cloud Computing

Last time. Today. IaaS Providers. Amazon Web Services, overview

Amazon EC2 Product Details Page 1 of 5

Building an Internal Cloud that is ready for the external Cloud

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

The Google File System

Analyzing the Applicatbility of Airline Booking Systems for Cloud Computing Offerings

DESIGN OF A PLATFORM OF VIRTUAL SERVICE CONTAINERS FOR SERVICE ORIENTED CLOUD COMPUTING. Carlos de Alfonso Andrés García Vicente Hernández

Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)

Cloud Computing through Virtualization and HPC technologies

Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum

Oracle Solutions on Top of VMware vsphere 4. Saša Hederić VMware Adriatic

Virtual Machine Synchronization for High Availability Clusters

A New Strategy for Data Management in a Time of Hyper Change

SERVER VIRTUALIZATION IN MANUFACTURING

IBM Spectrum Protect in the Cloud

Azure VM Performance Considerations Running SQL Server

Dynamic Load Balancing: Improve Efficiency in Cloud Computing Argha Roy * M.Tech CSE Netaji Subhash Engineering College West Bengal, India.

DISTRIBUTED SYSTEMS AND CLOUD COMPUTING. A Comparative Study

Big data blue print for cloud architecture

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

JOHNSON COUNTY COMMUNITY COLLEGE College Blvd., Overland Park, KS Ph Fax

Cloud Management: Knowing is Half The Battle

Comparison of Open Source Cloud System for Small and Medium Sized Enterprises

Introduction to AWS Economics

TGL VMware Presentation. Guangzhou Macau Hong Kong Shanghai Beijing

Cloud Server Performance A Comparative Analysis of 5 Large Cloud IaaS Providers

Network Infrastructure Services CS848 Project

PeerMon: A Peer-to-Peer Network Monitoring System

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo

Daniel J. Adabi. Workshop presentation by Lukas Probst

Outline. MCSA: Server Virtualization

An Energy Efficient Server Load Balancing Algorithm

The Easiest Way to Run Spark Jobs. How-To Guide

High Availability for Citrix XenServer

A Secure Strategy using Weighted Active Monitoring Load Balancing Algorithm for Maintaining Privacy in Multi-Cloud Environments

Cloud Computing #6 - Virtualization

Millennium Systems on the Cloud:

Box Leangsuksun+ * Thammasat University, Patumtani, Thailand # Oak Ridge National Laboratory, Oak Ridge, TN, USA + Louisiana Tech University, Ruston,

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Transcription:

SpotOn: A Batch Computing Service for the Spot Market Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, Prashant Shenoy University of Massachusetts Amherst

infrastructure cloud Cost vs. Availability Tradeoff in IaaS Clouds On-demand Cost (per hour) Cheap Expensive Reserved Spot Spot Instances Preemptible VM Guaranteed, Non-revocable Not guaranteed, Non-revocable Not guaranteed, Revocable Availability

spot markets Amazon EC2 Spot User bid Bid in a 2nd price auction Acquire when bid > spot price Price Terminate when spot price > bid Time of the day How do we mitigate the impact of revocation? 1. Raise the bid 2. Employ fault-tolerance mechanisms mechanisms

challenges spot market complexity Amazon EC2 operates ~4000 spot markets Selecting an instance that yields lowest cost per unit of computation while also considering the probability of revocation is complex Scatterplot of ranks for EC2 spot markets

challenges application complexity CPU : IO 1:1 Working Set 8GB Disk Type Remote Running Time 1 hour Resource Vector Cost 20% On-demand Revocation Rate 2.4 per day Fault-tolerance Mechanism Checkpoint (every 900s) Spot VM 100 8000 Cost ( ) 75 50 25 6000 4000 2000 Completion time (s) 0 On-demand Checkpoint 0

challenges application complexity Resource Vector CPU : IO 1:1 Working Set 8GB Disk Type Remote Local Running Time 1 hour Spot VM Cost 20% On-demand Revocation Rate 2.4 per day Fault-tolerance Mechanism Checkpoint (every 900s) Replicate (deg=2) 100 8000 Cost ( ) 75 50 25 6000 4000 2000 Completion time (s) 0 On-demand Checkpoint Replicate 0

challenges application complexity CPU : IO 1:1 Working Set 8GB Disk Type Local Running Time 1 hour Resource Vector Spot VM Cost 20% On-demand Revocation Rate 2.4 per day >24 per day Fault-tolerance Mechanism Checkpoint (every 900s) 100 8000 Cost ( ) 75 50 25 6000 4000 2000 Completion time (s) 0 0 On-demand Checkpoint Replicate Replicate (Revoked)

Spoton: A batch computing service Service that accepts batch jobs from users and runs them on spot instances SpotOn Manages application and spot market complexity transparently Run batch jobs at on-demand performance but paying spot market price

greedy selection algorithm Selecting the best spot market and fault-tolerance mechanism Migrate Dup Chkp Spot Markets Fault Tolerance S i F t J b $ Batch Job Minimum cost of running J b using F t on S i Repeat on spot revocation (until J b finishes) Acquire S i

fault tolerance (1/3) Reactive Migration Migrated T M size of memory + local disk remote disk bandwidth Z k Random variable measuring time to revocation P k Probability job gets revoked before completion Cost = E[Price k ] = [(1 Pk) * T + Pk * (E(Zk) + T M )] * spot-price E[Time k ] (1 Pk) * T + Pk * E(Zk)

fault tolerance (2/3) Proactive Checkpoint Restored T C size of memory + local disk remote disk bandwidth T L (τ/2) τ: checkpoint frequency Total Overhead = (T /τ) * T c + T L Cost overhead is primarily a function of job s resource usage

fault tolerance (3/3) Spot Replication T L market volatility Total Overhead (Replication factor, T L ) Cost overhead is primarily a function of market characteristics

evaluation (1/3) SpotOn Prototype App emulator Built on Linux Containers for efficient checkpointing / migration App Emulator to create synthetic jobs with varying resource usage SpotOn job manager daemon daemon

evaluation (2/3) Effect of Application and Spot Market Characteristics Best choice of fault-tolerance mechanism is a function of the spot market and job characteristics

evaluation (3/3) Effect of Cost-aware Selection 2x better than just checkpointing On Google cluster trace, Cost-aware selection achieved 91.9% savings with little impact on performance

conclusion Spot markets offer arbitrage opportunities SpotOn manages application and market complexities We model fault tolerance and propose a selection algorithm Prototype on Amazon EC2 Achieves ~90% cost savings Thank you!

backup slides

bidding Spot price distribution has long tail Spot price changes are peaky Bidding does affect volatility but not drastically SpotOn always bids at the on-demand price If spot price goes above on-demand, SpotOn would choose on-demand