3. Replication

Goal:
- Avoid a single point of failure by replicating the server
- Increase scalability by sharing the load among replicas

Problems:
- Partial failures of replicas and messages
- No global ordering of messages: some replicas might execute operations that others did not know about, so the states of the replicas diverge

[Diagram: three replicas]

Replicated State Machine
Known also as Active Replication. The idea: every replica sees exactly the same set of messages in the same order and will execute them in that order.

Benefit: immediate fail-over.

Limitations:
- Waste of resources, since all replicas are doing the same work
- Requires determinism, which is not trivial to ensure

An important issue: at what level? Option 1: the machine instruction level (virtual machine level).

Machine Instruction Level Replication
Benefits:
- Fast consistency resolution
- Transparent
Problems:
- Requires special hardware, usually behind the technology curve
- Resource wasteful
- Requires physical proximity
- Does not overcome software bugs / no multi-versioning
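To make the determinism requirement concrete, here is a minimal replicated-state-machine sketch in Python (all names are hypothetical, for illustration only): every replica applies the same totally ordered log of commands, and because apply() depends only on the current state and the command, any two replicas that consume the same log reach the same state.

    # Minimal active-replication sketch (hypothetical names, illustration only).
    class KeyValueStateMachine:
        def __init__(self):
            self.state = {}

        def apply(self, command):
            # Deterministic: the result depends only on the current state and
            # the command; no clocks, randomness, or thread interleavings.
            op, key, *rest = command
            if op == "put":
                self.state[key] = rest[0]
                return "ok"
            if op == "get":
                return self.state.get(key)
            raise ValueError("unknown operation: %r" % op)

    class Replica:
        def __init__(self):
            self.machine = KeyValueStateMachine()
            self.applied = 0  # index of the next log slot to apply

        def deliver(self, log):
            # 'log' is the totally ordered command sequence produced by some
            # ordering mechanism (e.g., atomic broadcast); apply it in order.
            while self.applied < len(log):
                self.machine.apply(log[self.applied])
                self.applied += 1

Since every replica executes every command, fail-over is immediate, but no work is saved; the sketch makes the resource-waste limitation above visible.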

N-Modular Redundancy
[Diagram: Node 1, Node 2, ..., Node N feed their outputs into a voter, whose output goes to the monitoring system; a voter sketch appears below]

Logical State Replication
Idea: the important thing is the logical/semantic state of the application.
Benefits: the negation of the shortcomings of the machine instruction level.
Problems:
- Not transparent: it changes the programming model (although one tries to minimize this)
- Slower consistency resolution, which may result in lower throughput and higher latency
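Returning to the voter in the N-modular redundancy diagram: a bare-bones majority voter might look like the following sketch (hypothetical, and assuming module outputs are directly comparable values).

    # Majority voter for N-modular redundancy (illustrative sketch).
    from collections import Counter

    def vote(outputs):
        # Return the value produced by a majority of the modules; fail if
        # no majority exists (too many faulty modules to mask).
        value, count = Counter(outputs).most_common(1)[0]
        if count * 2 > len(outputs):
            return value
        raise RuntimeError("no majority: the faults cannot be masked")

    # With triple-modular redundancy (N=3), one faulty module is masked:
    print(vote([42, 42, 41]))  # -> 42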

Total Ordering Based Replication
A simple replication protocol:
- Clients can send requests to any replica
- All replicas utilize a black-box total ordering mechanism to apply the updates in the same order

[Diagram: Client1 and Client2 send Request1 and Request2 to the replicated servers; the total ordering mechanism ensures all replicas apply them in the same order before Reply1 and Reply2 are returned]

Total Ordering Based Replication: Fine Details
How do clients find an alive server?
- Name servers
- Local directors
- A virtual IP being migrated between alive servers
What happens if the server fails?
- Before servicing the request: resubmit
- After servicing the request: we need to avoid re-execution (sequence numbers), and we need to return the pre-computed result (a cache of recent results); see the sketch below
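The last point is commonly implemented as sketched below (hypothetical names): each client stamps its requests with increasing sequence numbers, and every server remembers, per client, the highest sequence number applied and the corresponding reply, so a resubmitted request returns the cached result instead of executing twice.

    # At-most-once execution despite resubmission (illustrative sketch).
    # Assumes each client waits for a reply before sending its next request,
    # so caching only the most recent result per client suffices.
    class DedupServer:
        def __init__(self, execute):
            self.execute = execute   # the actual service logic
            self.last_seq = {}       # client_id -> highest seq applied
            self.last_result = {}    # client_id -> cached reply

        def handle(self, client_id, seq, request):
            if seq <= self.last_seq.get(client_id, -1):
                # Duplicate (the client resubmitted after a failover):
                # return the pre-computed result, do not re-execute.
                return self.last_result[client_id]
            result = self.execute(request)
            self.last_seq[client_id] = seq
            self.last_result[client_id] = result
            return result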

Primary-Backup Replication
Cold backup:
- Only the primary is active; it periodically checkpoints its state to storage that is available to the backup: stable storage or shared storage (SCSI, SAN)
- When the primary fails, the backup is initiated, loads the state from storage, and continues from there (a minimal checkpoint/recovery sketch appears after this slide)
- Slow recovery: the backup needs to be started (run applications, obtain resources, etc.), and either the backup replays the last actions from a log file, or it misses the updates since the most recent checkpoint
+ Resource efficient
+ Invocations need not be deterministic
- It is possible to have several backups, to survive multiple failures
- Requires consistent failure detection, e.g., by a group membership service
- It is possible to have several nodes, each primary for some services and backup for others

Warm backup:
- The backup is (at least) partially alive, so the recovery phase is faster
- But it typically still requires some replaying of the last transactions, or losing the last few updates
- Typically, updates are also frequent

Hot standby (leader/follower):
- The backup is also up, and is constantly updated about the state of the primary
+ Fast and up-to-date recovery (special protocols are required to ensure truly up-to-date recovery; we will talk about such protocols later)
+ More efficient than active replication
- Higher overhead than cold and warm backup
- Slower recovery than active replication
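To make the cold-backup variant concrete, here is a minimal sketch (the path is hypothetical, and pickle stands in for stable or shared storage): the primary checkpoints periodically, and the backup restores from the latest checkpoint after the primary is declared failed.

    # Cold-backup checkpointing sketch (illustration only).
    import pickle

    CHECKPOINT = "/shared/storage/service.ckpt"  # reachable by both nodes

    def checkpoint(state):
        # Primary: called periodically to persist the current service state.
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)

    def recover():
        # Backup: invoked only after the primary is declared failed.
        # Updates applied after the last checkpoint are lost, unless the
        # service also keeps a log that can be replayed on top of this state.
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)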

Challenges in Primary/Backup
- How do we consistently agree on who the primary is?
- How do we detect that the primary has failed?
- How do we ensure that, once we suspect the primary, it is indeed no longer operating on behalf of the system?
- How do we enable additional backups to join the system without manual intervention and reset?
We will discuss these issues when we talk about Consensus, Failure Detection, and Membership.

Quorum Replication: the Atomic Register
Intuitively, operations should appear as if they were executed on a single server. This is a special case of linearizability, and it is well suited for distributed storage and distributed shared memory.

More specifically:
- A read always returns a value written either by the last write that terminated before the read started, or by a write that is concurrent with the read
- If several writes are concurrent, then a subsequent read can return a value written by any one of them
- A read must not return a value older than the value returned by a previous read (for example, if a write of v2 overlaps two non-overlapping reads and the first read returns v2, the second read must not return the older v1)

Quorums
A quorum system consists of a set of subsets of the universe of servers such that the intersection of every two subsets is not empty. Each of these subsets is called a quorum.

Example: U = {1,2,3,4}, S = {{1,2,3}, {2,3,4}, {1,4}}

The simplest generic quorum system is majority: every majority subset intersects every other majority subset. Another common type of generic quorum system is any row + any column of a grid (lattice).

Bi-Quorums
- A bi-quorum system over the universe U is composed of two sets of subsets, A and B
- Every subset from A must intersect every subset from B
- Clearly, every quorum system is also a bi-quorum system (take A = B)
- Majority is a generic bi-quorum system

Scalable generic bi-quorums:
- Idea: servers are arranged in a logical matrix; a write quorum consists of any one row of the matrix, and a read quorum consists of any one column (see the sketch below)
- Drawback: a tradeoff between availability and the size of the quorum

[Diagram: a server matrix with columns highlighted as read quorums and rows as write quorums]
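The row/column construction is easy to state in code. The sketch below (hypothetical names) arranges rows*cols servers in a logical matrix, takes every row as a write quorum and every column as a read quorum, and checks the defining bi-quorum intersection property.

    # Grid (matrix) bi-quorum sketch: rows are write quorums, columns are
    # read quorums; every row intersects every column in exactly one server.
    def grid_biquorum(rows, cols):
        ids = [[r * cols + c for c in range(cols)] for r in range(rows)]
        write_quorums = [set(row) for row in ids]
        read_quorums = [set(col) for col in zip(*ids)]
        return read_quorums, write_quorums

    reads, writes = grid_biquorum(3, 4)  # 12 servers in a 3x4 matrix
    # Every A-quorum (column) intersects every B-quorum (row):
    assert all(r & w for r in reads for w in writes)
    # A write contacts only 4 servers instead of a 7-server majority:
    print(sorted(writes[0]))  # -> [0, 1, 2, 3]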

Implementing Read/Write Registers with Quorums
- Quorum replication: clients read and write directly to quorums of servers
- Essentially an adaptation of the Attiya, Bar-Noy, Dolev protocol for robustly emulating distributed shared memory, but using any given bi-quorum system rather than majority
- Tradeoff between the availability of the system and its scalability (size of quorum)
- Cannot implement read-modify-write semantics

Implementing Quorum Replication
Each server maintains a logical timestamp for each register.

The servers' protocol:
- Upon receiving an r-request(x) message:
    o := values[x]
    reply with an r-reply(x, o.val, <o.ts, o.id>) message
- Upon receiving a w-request(x, v, <ts,id>) message:
    o := values[x]
    if <ts,id> is lexicographically larger than <o.ts, o.id> then
        o.ts := ts; o.id := id; o.val := v
    reply with a w-reply(x) message
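A direct transcription of the servers' protocol into Python might look like the following sketch (message passing is elided, and the names are hypothetical).

    # Quorum-replication server: one logical timestamp <ts, id> per register.
    class RegisterServer:
        def __init__(self, server_id):
            self.id = server_id
            self.values = {}  # x -> (val, ts, writer_id)

        def r_request(self, x):
            val, ts, wid = self.values.get(x, (None, 0, 0))
            return (x, val, (ts, wid))            # the r-reply message

        def w_request(self, x, v, stamp):
            ts, wid = stamp
            _, cur_ts, cur_wid = self.values.get(x, (None, 0, 0))
            # Lexicographic comparison of <ts, id> pairs:
            if (ts, wid) > (cur_ts, cur_wid):
                self.values[x] = (v, ts, wid)
            return x                              # the w-reply message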

Implementing Quorum Replication, continued

A read operation R(x):
1. Send a read request r-request(x) to all servers
2. Wait for r-reply(x, v, <ts,id>) messages from a READ quorum of servers; let v be the value with the largest logical timestamp <ts,id>
3. Send a write request w-request(x, v, <ts,id>) to all servers
4. Wait for w-reply(x) replies from a WRITE quorum of servers
5. Return v // the value selected in step 2

Why are steps 3 and 4 important? They write the chosen value back to a write quorum, so every later read is guaranteed to see a timestamp at least as large; without this write-back, a read concurrent with a write could return the new value while a later read returns the old one, violating atomicity.

Optimization: it is possible to first send requests only to the corresponding quorums, and to contact more servers only if some do not reply.

Implementing Quorum Replication, continued

A write operation W(x,v):
1. Send a read request r-request(x) to all servers
2. Wait for r-reply(x, -, <ts,id>) messages from a READ quorum of servers; let <ts,id> be the largest returned logical timestamp
3. Send a write request w-request(x, v, <ts+1, my_id>) to all servers
4. Wait for w-reply(x) messages from a WRITE quorum of servers
5. Return

A client-side sketch of both operations follows.
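Here is a client-side sketch of both operations, built on the RegisterServer sketch above. For brevity it contacts every server and assumes all of them reply; a real client would proceed as soon as any read or write quorum has answered.

    # Client-side sketch of the quorum read/write protocol (illustrative).
    def read(servers, x):
        # Steps 1-2: query the servers and pick the value with the largest
        # logical timestamp <ts, id>.
        replies = [s.r_request(x) for s in servers]
        _, v, stamp = max(replies, key=lambda reply: reply[2])
        # Steps 3-4: write the value back so later reads cannot return
        # anything older.
        for s in servers:
            s.w_request(x, v, stamp)
        return v  # step 5

    def write(servers, x, v, my_id):
        # Steps 1-2: learn the largest logical timestamp in use.
        replies = [s.r_request(x) for s in servers]
        ts, _ = max(reply[2] for reply in replies)
        # Steps 3-4: store v under the strictly larger stamp <ts+1, my_id>.
        for s in servers:
            s.w_request(x, v, (ts + 1, my_id))

    servers = [RegisterServer(i) for i in range(4)]
    write(servers, "x", "hello", my_id=1)
    print(read(servers, "x"))  # -> hello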