PipeCloud : Using Causality to Overcome Speed-of-Light Delays in Cloud-Based Disaster Recovery. Razvan Ghitulete Vrije Universiteit



Similar documents
Part2 Hyper-V Replica and Hyper-V Recovery Manager. Datacenter Specialist

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

High Availability for Database Systems in Cloud Computing Environments. Ashraf Aboulnaga University of Waterloo

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

HRG Assessment: Stratus everrun Enterprise

Scaling Database Performance in Azure

Deployment Topologies

Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module

Chapter 19 Cloud Computing for Multimedia Services

Enterprise Linux Business Continuity Solutions for Critical Applications

RackWare Solutions Disaster Recovery

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

Berkeley Ninja Architecture

Cloud Computing: Meet the Players. Performance Analysis of Cloud Providers

High Availability & Disaster Recovery Development Project. Concepts, Design and Implementation

Evaluation Methodology of Converged Cloud Environments

Storage as a Service: Leverage the benefits of scalability and elasticity with Storage as a Service

Remus: : High Availability via Asynchronous Virtual Machine Replication

BeBanjo Infrastructure and Security Overview

Top 10 Reasons why MySQL Experts Switch to SchoonerSQL - Solving the common problems users face with MySQL

Whitepaper - Disaster Recovery with StepWise

AppDynamics Lite Performance Benchmark. For KonaKart E-commerce Server (Tomcat/JSP/Struts)

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

SERVICE SCHEDULE PULSANT ENTERPRISE CLOUD SERVICES

Tushar Joshi Turtle Networks Ltd

EMC VPLEX FAMILY. Continuous Availability and data Mobility Within and Across Data Centers

Products for the registry databases and preparation for the disaster recovery

High Performance Data-Transfers in Grid Environment using GridFTP over InfiniBand

Drobo How-To Guide. Cloud Storage Using Amazon Storage Gateway with Drobo iscsi SAN

EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Celerra Unified Storage Platforms Using iscsi

How AWS Pricing Works

Unitt Zero Data Loss Service (ZDLS) The ultimate weapon against data loss

Cisco Wide Area Application Services Optimizes Application Delivery from the Cloud

VMware vcloud Automation Center 6.0

ArcGIS for Server in the Amazon Cloud. Michele Lundeen Esri

Fax Server Cluster Configuration

Why is Redundancy Important?

be architected pool of servers reliability and

HA / DR Jargon Buster High Availability / Disaster Recovery

PostgreSQL Performance Characteristics on Joyent and Amazon EC2

TOP FIVE REASONS WHY CUSTOMERS USE EMC AND VMWARE TO VIRTUALIZE ORACLE ENVIRONMENTS

Performance Testing of a Cloud Service

Riverbed WAN Acceleration for EMC Isilon Sync IQ Replication

Building disaster-recovery solution using Azure Site Recovery (ASR) for Hyper-V (Part 1)

WINDOWS AZURE EXECUTION MODELS

Leveraging Public Clouds to Ensure Data Availability

Designing a Data Solution with Microsoft SQL Server 2014

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

A Middleware Strategy to Survive Compute Peak Loads in Cloud

CLOUD DEVELOPMENT BEST PRACTICES & SUPPORT APPLICATIONS

Informix Dynamic Server May Availability Solutions with Informix Dynamic Server 11

Network Performance Between Geo-Isolated Data Centers. Testing Trans-Atlantic and Intra-European Network Performance between Cloud Service Providers

Intro to Virtualization

Dependency Free Distributed Database Caching for Web Applications and Web Services

SLA Driven Load Balancing For Web Applications in Cloud Computing Environment

Performance Monitoring AlwaysOn Availability Groups. Anthony E. Nocentino

Dimension Data Enabling the Journey to the Cloud

Deep Dive on SimpliVity s OmniStack A Technical Whitepaper

High Availability Solutions for the MariaDB and MySQL Database

IBM FlashSystem and Atlantis ILIO

The Evolution of Cloud Storage - From "Disk Drive in the Sky" to "Storage Array in the Sky" Allen Samuels Co-founder & Chief Architect

Westek Technology Snapshot and HA iscsi Replication Suite

The Benefits of Virtualizing

Deploying in a Distributed Environment

Upgrading a Telecom Billing System with Intel Xeon Processors

Amazon Cloud Storage Options

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

VMware vcloud Automation Center 6.1

Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware

Server Deployment and Configuration. Qlik Sense 1.1 Copyright QlikTech International AB. All rights reserved.

Dell AppAssure cloud replication

How AWS Pricing Works May 2015

IBM Spectrum Protect in the Cloud

SQL Server on Azure An e2e Overview. Nosheen Syed Principal Group Program Manager Microsoft

Testing ARES on the GTS framework: lesson learned and open issues. Mauro Femminella University of Perugia

SAP HANA Operation Expert Summit BUILD - High Availability & Disaster Recovery

CloudNet: Dynamic Pooling of Cloud Resources by Live WAN Migration of Virtual Machines


Disaster Recovery Solution Achieved by EXPRESSCLUSTER

HyperQ DR Replication White Paper. The Easy Way to Protect Your Data

SteelFusion with AWS Hybrid Cloud Storage

Contents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11

VMware vrealize Automation

Expert Reference Series of White Papers. Unlock the Power of Microsoft SQL Server 2012

Transcription:

PipeCloud : Using Causality to Overcome Speed-of-Light Delays in Cloud-Based Disaster Recovery Razvan Ghitulete Vrije Universiteit

Introduction /introduction Ubiquity: the final frontier Internet needs to become ubiquitous disaster recovery low response time low downtime incurred by catastrophic system failures March 23, 2012 1/27

Overview /overview Disaster recovery challenges synchronous vs. asynchronous Pipelined synchrony single vs. multiple writers PipeCloud Evaluation Conclusions March 23, 2012 2/27

Disaster recovery challenges /disaster_recovery_challenges/system_model The system should be composed of modern virtualized data center primary site cloud data center for use a secondary site Applications can be distributed across multiple VMs data on any disk requires DR protection March 23, 2012 3/27

Disaster recovery challenges /disaster_recovery_challenges/replication_strategies fig. 1: Existing replication strategies Synchronous replication no application progress until the data has been backed up Asynchronous replication unsafe replies in order to reduce latency for the client March 23, 2012 5/27

Disaster recovery challenges /disaster_recovery_challenges/replication_strategies Synchronous replication dependent of geographical distance to the backup cluster Asynchronous replication fig. 2: Distance impact on response time separates application performance from data preservation March 23, 2012 6/27

Progress /overview Disaster recovery challenges Pipelined synchrony PipeCloud Evaluation Conclusions March 23, 2012 7/27

Pipelined synchrony /pipelined_sync fig. 3: Replication in a simple multi-tiered app. Synchronous flow Asynchronous flow Pipelined sync flow 1. submitting data 2. web server submits data for storage to database 3. database writes data to disk 4. disk write replicated across WAN link 5. the system waits for replication to complete 6. doing additional processing 7. responding to the client 8. the replication ack unlocks the reply to the client 9. reply is returned to the client browser March 23, 2012 8/27

Pipelined synchrony /pipelined_sync fig. 3: Replication in a simple multi-tiered app. Synchronous flow Asynchronous flow Pipelined sync flow 1. submitting data 2. web server submits data for storage to database 3. database writes data to disk 4. disk write replicated across WAN link 5. the system waits for replication to complete 6. doing additional processing 7. responding to the client 8. the replication ack unlocks the reply to the client 9. reply is returned to the client browser March 23, 2012 9/27

Pipelined synchrony /pipelined_sync fig. 3: Replication in a simple multi-tiered app. Synchronous flow Asynchronous flow Pipelined sync flow 1. submitting data 2. web server submits data for storage to database 3. database writes data to disk 4. disk write replicated across WAN link 5. the system waits for replication to complete 6. doing additional processing 7. waiting for response from the replication site 8. the replication ack unlocks the reply to the client 9. reply is returned to the client browser March 23, 2012 10/27

Pipelined synchrony /pipelined_sync/challenge To keep a maximum degree of transparency for the application developer we consider the application to be a black-box How do we determine when is it safe to send the reply to the client? March 23, 2012 11/27

Pipelined synchrony /pipelined_sync/challenge To keep a maximum degree of transparency for the application developer we consider the application to be a black-box How do we determine when is it safe to send the reply to the client? in theory: global clock timestamps March 23, 2012 11/27

Pipelined synchrony /pipelined_sync/challenge To keep a maximum degree of transparency for the application developer we consider the application to be a black-box How do we determine when is it safe to send the reply to the client? in theory: global clock timestamps in practice: timestamps March 23, 2012 11/27

Pipelined synchrony /pipelined_sync/challenge To keep a maximum degree of transparency for the application developer we consider the application to be a black-box How do we determine when is it safe to send the reply to the client? in theory: global clock timestamps in practice: timestamps (logical clocks) March 23, 2012 12/27

Pipelined synchrony /pipelined_sync/single_writer Buffer messages in a deque in the frontend tier of the application add new messages by appending them to the end remove messages and forward them to the client as the latest received timestamp is higher than the message creation timestamp March 23, 2012 13/27

Pipelined synchrony /pipelined_sync/multiple_writers How do we decide which ack does a message wait for? we can t, so we wait for until the message respects every writer s timestamp The logical clocks are also used by the backup server to order the writes before submitting them to disk There could be the possibility that a read operation has to wait for an unrelated write acknowledgement March 23, 2012 14/27

Pipelined synchrony /pipelined_sync/how_does_it_actually_work fig. 4: Read delay. March 23, 2012 15/27

Progress /overview Disaster recovery challenges Pipelined synchrony PipeCloud Evaluation Conclusions March 23, 2012 16/27

PipeCloud /pipecloud/prototype_specs PipeCloud must run in the VMM(hypervisor) of each physical machine as to be able to monitor and replicate disk writes PipeCloud needs to benefit from information regarding application deployment (configuration gray box) March 23, 2012 17/27

PipeCloud /pipecloud/prototype_specs fig. 5: PipeCloud architectural design. March 23, 2012 18/27

PipeCloud /pipecloud/prototype_specs In order to maintain the causality and propagate the logical clocks between tiers, they are injected into packet headers Also instead of an actual timestamp, the local outbound nodes maintain two logical clocks pending writes updated by either issuing disk writes or propagation causality committed writes updated by the DR backup site March 23, 2012 19/27

PipeCloud /pipecloud/storage_in_the_cloud_backup At the secondary site a backup server collects all incoming disk writes and commits them to a storage volume The backup server is implemented as a user level process and doesn t require any privileges on the backup site March 23, 2012 20/27

Progress /overview Disaster recovery challenges Pipelined synchrony PipeCloud Evaluation Conclusions March 23, 2012 21/27

Evaluation /evaluation/setup Physical setup Amazon EC2 Large virtual machines CentOS 5.1 Local test-bed quad-core 2.12GHz, 4Gb RAM Linux tc tool to emulate network latency CentOS 5.1 March 23, 2012 22/27

Evaluation /evaluation/setup Test cases MySQL TPC-W e-commerce web benchmark composed of Tomcat webserver and MySQL database March 23, 2012 23/27

Evaluation /evaluation/single_writer fig. 6: MySQL single-writer Test. March 23, 2012 24/27

Evaluation /evaluation/multiple_writer fig. 7: TPC-W multiple-writer Test. March 23, 2012 25/27

Conclusions /conclusions The pipelined sync replication approach proves very effective with overcoming the deleterious effects of speed-of-light delays over large areas PipeCloud demonstrates dramatic performance benefits over synchronous replication both in throughput and response time for a variety of workloads March 23, 2012 26/27

THM /conclusions PipeCloud is most definitely the right step towards a transparent internet. March 23, 2012 27/27