Remus: High Availability via Asynchronous Virtual Machine Replication
Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield
Department of Computer Science, The University of British Columbia
NSDI '08: 5th USENIX Symposium on Networked Systems Design and Implementation (2008)
Presented by Yeh Tsung-Yu
Outline
- Introduction
- Design and Implementation
- Evaluation
- Conclusion and future work
Introduction
It is hard and expensive to design a highly available system that survives hardware failure:
- redundant components
- customized hardware and software, e.g. the HP NonStop Server
Our goal is a high-availability system that provides:
- generality and transparency: no modification of the OS or applications
- seamless recovery from hardware failure
Introduction: Our approach
VM-based whole-system replication
- Frequently checkpoint the whole virtual machine state.
- The protected VM and the backup VM are located on different physical hosts.
Speculative execution
- Buffer state for later synchronization with the backup, and continue execution ahead of the synchronization point.
Asynchronous replication
- Primary VM execution overlaps with state transmission.
Introduction
[Figure 1: speculative execution and asynchronous replication in Remus]
Our implementation is based on:
- the Xen virtual machine monitor
- Xen's support for live migration, used to provide fine-grained checkpoints
- two host machines connected over redundant gigabit Ethernet links
The virtual machine does not actually execute on the backup host until a failure occurs.
Pipelined checkpoint (see figure 1, slide 5)
1. Pause the running VM and copy any changed state into a buffer (more detail later).
2. With the state changes preserved in the buffer, unpause the VM and resume speculative execution.
3. Transmit the buffered state to the backup host.
4. When the complete state has been received, the backup acknowledges the primary.
5. Finally, release the buffered network output (more detail later).
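The five steps above can be sketched as one checkpoint epoch. This is a toy model, not Xen code: `ToyVM`, `ToyBackup`, and the packet lists are hypothetical stand-ins for the real VMM interfaces.

```python
# Minimal sketch of Remus's five-step pipelined checkpoint. ToyVM and
# ToyBackup are illustrative stand-ins, not real Xen interfaces.

class ToyVM:
    def __init__(self):
        self.paused = False
        self.dirty = {"page0": b"new"}
    def pause(self):
        self.paused = True
    def resume(self):
        self.paused = False
    def copy_dirty_state(self):
        state, self.dirty = dict(self.dirty), {}
        return state

class ToyBackup:
    def __init__(self):
        self.committed = []
    def send_and_ack(self, state):
        # Steps 3+4: transmit buffered state; backup acks on receipt.
        self.committed.append(state)
        return True

def run_epoch(vm, backup, net_buffer, released):
    vm.pause()                          # 1. pause, copy changed state
    state = vm.copy_dirty_state()
    vm.resume()                         # 2. speculative execution resumes
    acked = backup.send_and_ack(state)  # 3/4. async transmit + ack
    if acked:                           # 5. release buffered packets
        released.extend(net_buffer)
        net_buffer.clear()

vm, backup = ToyVM(), ToyBackup()
net_buffer, released = ["pkt1", "pkt2"], []
run_epoch(vm, backup, net_buffer, released)
print(released)  # buffered output is released only after the ack
```

The key property the sketch captures is ordering: the VM resumes before the state is transmitted (asynchronous replication), but externally visible output waits for the acknowledgment.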
Next we discuss how to checkpoint the CPU state, memory, disk, and network.
CPU and memory state
Checkpointing is implemented on top of Xen's existing machinery for live migration.
What is live migration? A technique by which a VM is relocated to another physical host with only a slight interruption.
How does live migration work?
1. Memory is copied to the new location while the VM continues to run at the old location.
2. During migration, writes to memory are intercepted, and dirty pages are copied to the new location in rounds.
3. After a specified number of rounds, the guest is suspended and the remaining dirty pages and CPU state are copied out (the final round).
Page protection, enforced by the hardware MMU, is used to trap writes to dirty pages.
Remus implements checkpointing as repeated executions of the final round of live migration.
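The pre-copy rounds plus the final stop-and-copy round can be modeled roughly as follows. This is a toy sketch: real Xen traps dirtied pages through MMU page protection, whereas here the per-round dirty sets are supplied directly, and the CPU state is a placeholder.

```python
# Toy model of Xen pre-copy live migration. Real Xen intercepts writes
# with MMU page protection; here the per-round dirty sets are given
# directly, and the CPU state is an illustrative placeholder.

def live_migrate(src_pages, dirty_per_round):
    """Copy memory in rounds, then finish with a stop-and-copy round."""
    dest = dict(src_pages)            # 1. bulk copy while the VM runs
    for dirty in dirty_per_round:     # 2. re-copy pages dirtied meanwhile
        for page in dirty:
            dest[page] = src_pages[page]
    # 3. final round: the guest is suspended; remaining dirty pages and
    #    the CPU state are copied out. Remus repeats only this round.
    cpu_state = {"pc": 0x1000}        # placeholder register state
    return dest, cpu_state

src = {"p0": "a", "p1": "b"}
dest, cpu = live_migrate(src, [{"p1"}, set()])
print(dest == src)  # True
```

Remus's checkpoint corresponds to skipping the iterative rounds and repeatedly executing only the final suspend-and-copy step, once per epoch.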
Disk buffering
Requirement: all disk writes in the VM are configured as write-through.
As a result, we can preserve crash consistency even when both the active and backup hosts fail.
On-disk state at the backup does not change until the entire checkpoint has been received (see figure 4, slide 13).
[Figure 4: disk write buffering at the backup]
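The disk rule above (hold incoming writes in memory at the backup, apply them only when the checkpoint commits) can be sketched like this; `BackupDisk` is a hypothetical stand-in for Remus's backup disk driver:

```python
# Sketch of backup-side disk buffering: writes streamed from the primary
# are held in RAM and applied to disk only after the whole checkpoint
# has been received. BackupDisk is a toy stand-in, not the real driver.

class BackupDisk:
    def __init__(self):
        self.disk = {}        # durable, checkpoint-consistent state
        self.pending = {}     # in-flight writes for the current epoch

    def receive_write(self, block, data):
        self.pending[block] = data      # buffer; do not touch disk yet

    def commit_checkpoint(self):
        self.disk.update(self.pending)  # apply atomically at commit
        self.pending.clear()

    def abort_checkpoint(self):
        self.pending.clear()            # incomplete epoch: discard writes

bd = BackupDisk()
bd.receive_write("blk0", b"A")
print(bd.disk)   # {} -- nothing reaches disk until commit
bd.commit_checkpoint()
print(bd.disk)   # {'blk0': b'A'}
```

Keeping the pending writes separate from the durable state is what makes the on-disk image always correspond to a complete checkpoint, even if the primary dies mid-epoch.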
Network buffering
Most networks cannot be counted on for reliable data delivery, so networked applications use a reliable protocol such as TCP to deal with packet loss and duplication.
This fact simplifies the network buffering problem considerably: transmitted packets do not require replication.
In order to ensure atomic packet transmission and checkpoint consistency:
- Outbound packets generated since the previous checkpoint are queued.
- Outbound packets are released only after that checkpoint has been acknowledged by the backup site.
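The two rules above amount to an egress queue keyed to checkpoint acknowledgments. A minimal sketch, with `EgressBuffer` as a hypothetical stand-in for Remus's network queuing layer:

```python
# Sketch of outbound packet buffering: packets produced during an epoch
# are queued, and released to the wire only once the backup acknowledges
# that epoch's checkpoint. EgressBuffer is illustrative, not real code.

from collections import deque

class EgressBuffer:
    def __init__(self):
        self.queue = deque()   # packets awaiting the checkpoint ack
        self.wire = []         # packets actually sent on the wire

    def transmit(self, pkt):
        self.queue.append(pkt)          # hold until the epoch commits

    def on_checkpoint_ack(self):
        while self.queue:               # now safe to release, in order
            self.wire.append(self.queue.popleft())

eb = EgressBuffer()
eb.transmit("syn")
eb.transmit("data")
print(eb.wire)         # [] -- nothing externally visible yet
eb.on_checkpoint_ack()
print(eb.wire)         # ['syn', 'data']
```

If the primary fails before the ack, the queued packets are simply lost; to the remote peer this looks like ordinary packet loss, which TCP already handles.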
Detecting failure
We use a simple failure detector that is integrated directly into the checkpointing stream:
- a timeout of the backup responding to commit requests
- a timeout of new checkpoints being transmitted from the primary
Either timeout event indicates the other host's failure.
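Because checkpoints and their acks already flow continuously between the hosts, the detector is just a timeout on that traffic. A sketch, with the timeout threshold and clock injection being illustrative choices, not values from the paper:

```python
# Sketch of the timeout-based failure detector embedded in the
# checkpoint stream. The 1-second threshold is illustrative; the clock
# is injectable so the example runs deterministically.

import time

class FailureDetector:
    def __init__(self, timeout_s=1.0, clock=time.monotonic):
        self.timeout = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def heartbeat(self):
        # Called whenever a checkpoint (on the backup) or an ack
        # (on the primary) arrives; no separate heartbeat messages.
        self.last_seen = self.clock()

    def peer_failed(self):
        return self.clock() - self.last_seen > self.timeout

# Usage with a fake clock:
now = [0.0]
fd = FailureDetector(timeout_s=1.0, clock=lambda: now[0])
fd.heartbeat()
now[0] = 0.5
print(fd.peer_failed())  # False
now[0] = 2.0
print(fd.peer_failed())  # True
```

On the backup, a fired timeout triggers failover: the backup resumes the VM from the last committed checkpoint.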
Evaluation
[Figures: benchmark results]
Conclusion and future work
We developed a novel method that provides high availability in the face of hardware failure, at low cost and with full transparency.
However, buffering outbound packets adds latency and lowers network throughput; reducing this overhead remains future work.