Software Replication



Similar documents
Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas

Distributed Systems (5DV147) What is Replication? Replication. Replication requirements. Problems that you may find. Replication.

State-Machine Replication

DATA RECOVERY SOLUTIONS EXPERT DATA RECOVERY SOLUTIONS FOR ALL DATA LOSS SCENARIOS.

Replication on Virtual Machines

Pronto: High Availability for Standard Off-the-shelf Databases

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

A Framework for Highly Available Services Based on Group Communication

Distributed Data Management

High Availability Design Patterns

Database Replication Techniques: a Three Parameter Classification

Dr Markus Hagenbuchner CSCI319. Distributed Systems

DFSgc. Distributed File System for Multipurpose Grid Applications and Cloud Computing

Remus: : High Availability via Asynchronous Virtual Machine Replication

Virtual Infrastructure Security

TIBCO StreamBase High Availability Deploy Mission-Critical TIBCO StreamBase Applications in a Fault Tolerant Configuration

High Availability and Clustering

Chapter 10: Distributed DBMS Reliability

High Availability And Disaster Recovery

Data Management in the Cloud

Getting to Know the SQL Server Management Studio

Architectures Haute-Dispo Joffrey MICHAÏE Consultant MySQL

Redis Cluster. a pragmatic approach to distribution

Architecture Evaluation Methods: Introduction to ATAM

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina

How To Understand The Power Of Zab In Zookeeper

Distributed Databases

DHCP Failover. Necessary for a secure and stable network. DHCP Failover White Paper Page 1

Exchange DAG backup and design best practices

Chancery SMS Database Split

Huang-Ming Huang and Christopher Gill. Bala Natarajan and Aniruddha Gokhale

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

Pervasive PSQL Meets Critical Business Requirements

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

BME CLEARING s Business Continuity Policy

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

TIBCO StreamBase High Availability Deploy Mission-Critical TIBCO StreamBase Applications in a Fault Tolerant Configuration

TECHILA INTERCONNECT END-USER GUIDE

Distributed Software Systems

Chapter 3 - Data Replication and Materialized Integration

Real-time Protection for Hyper-V

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Review: The ACID properties

Survey on Comparative Analysis of Database Replication Techniques

Performance tuning policies for application level fault tolerance in distributed object systems

All about Eve: Execute-Verify Replication for Multi-Core Servers

Topics. Distributed Databases. Desirable Properties. Introduction. Distributed DBMS Architectures. Types of Distributed Databases

Fault Tolerance and Resilience in Cloud Computing Environments

Distributed Commit Protocols

VERITAS Volume Replicator in an Oracle Environment

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Westek Technology Snapshot and HA iscsi Replication Suite

Introduction CORBA Distributed COM. Sections 9.1 & 9.2. Corba & DCOM. John P. Daigle. Department of Computer Science Georgia State University

CA ARCserve and CA XOsoft r12.5 Best Practices for protecting Microsoft SQL Server

Achieving High Availability

GPI Global Address Space Programming Interface

Paxos for System Builders

ELIXIR LOAD BALANCER 2

Highly Available AMPS Client Programming

Protecting SQL Server Databases Software Pursuits, Inc.

Microsoft s Advantages and Goals for Hyper-V for Server 2016

CA ARCserve Replication and High Availability for Windows

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

Providing Load Balancing and Fault Tolerance in the OSGi Service Platform

Unification of Transactions and Replication in Three-Tier Architectures Based on CORBA

Geo-Replication in Large-Scale Cloud Computing Applications

Fault tolerant stateful firewalling with GNU/Linux. Pablo Neira Ayuso Proyecto Netfilter University of Sevilla

Outline. Failure Types

Highly Available Monitoring System Architecture (HAMSA)

How To Write A Transaction System

Chapter 2: Remote Procedure Call (RPC)

The Aspect Unified IP Five 9s Environment

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

How to Set Up Disaster Recovery for HP OO

CA ARCserve Replication and High Availability for Windows

Transcription:

Software Replication

Motivation Assure the availability of an object (service) despite failures Assuming p the failure probability of an object O. O s availability is 1-p. Replicating an object O on n nodes and assuming p the failure probability of each replica, O s availability is 1- p n (considering independent failures probability) 2

Linearizability In a Shared Memory System 1. LINEARIZZABILE p 1 x write(a) p 1 ok() x read() p 1 ok(b) p 2 x write(b) p 2 ok() 2. NON LINEARIZZABILE p 1 x write(a) p 1 ok() x read() p 1 ok(a) p 2 x write(b) p 2 ok() 3

Linearizability Liveness: every invocation has a matching response Safety: there exists a permutation P of all operations on O such that P is legal wrt O (respects the semantic of the object) If the response of O occurs before O in the run, then O precedes O in P (respects the real-time ordering of operations) 4

Replication Techniques Due main techniques implementing linearizability: Primary Backup Active Replication 5

Sistem Model Processes are either clients or replicas Clients and replicas exchanges messages through Perfect point-point links Processes can crash 6

Passive replication: Primary-Backup

Primary Backup Primary: Receives invocations from clients and sends back the answers. Given an object x, prim(x) returns the primary of x. Backup: Interacts with prim(x) is used to guarantee fault tolerance by replacing a primary when crashes 8

Primary Backup Scenario 9

Primary Backup: the case of no crash When update messages are received by backups, they update their state and send back the ack to prim(x). prim(x) waits for an ack message from each correct backup and then sends back the answer, res, to the client. Guarantee Linearizability: the order in which prim(x) receive clients invocations define the order of the operation on the object. 10

Primary Backup: Presence of Crash Three scenarios : 1. Scenario 1: Primary fails after the client receives the answer. 2. Scenario 2: Primary fails before sending update messages 3. Scenario 3: Primary fails after sending update messages and before receiving all the ack messages. In all cases there is the need of electing a new leader. 11

Scenario 1 Primary fails after the client receives the answer: Client could not receive the response due to perfect point-topoint link If the response is lost, client retransmits the request after a timeout The new primary will recognize the request re-issued by the client as already processed and sends back the result without updating the replicas 12

Scenario 2 Primary fails before sending update messages Client does not get an answer and resends the requests after a timeout The new primary will handle the request as new 13

Scenario 3 Primary fails after sending update messages and before receiving all the ack messages: Guarantee atomicity: update it is received either by all or by no one. When a primary fails there is the need to elect another primary among the correct replicas 14

Example of Primary Backup on Synchronous Systems USE Perfect Point-to-point link; rb broadcast, leader election; perfect failure detector; upon event <Init> do recovery:= false; primary:=p; state:=initial; storage:= Ø; lastreq:=(-,-,-,-,initial); ack_recovery:=ø; correct:=π; upon event < pp2pdeliver client, [request(arg,sn]> and not recovery do If (sn,c,arg) in storage then result:=retrieve(sn,c,arg)from storage trigger <pp2psend c, [response(result)]> else (result,new state):= execute request(arg) trigger <rbbcast [up(sn,c,arg,result,newstate)]>; state:=newstate; create(ack_set c,sn ); ack_set c,sn := Ø; Upon <rbdeliver [up(sn,c,arg,result,newstate)]> do lastreq := (sn,c,arg,result,newstate); state:=newstate; store (sn,c,arg,result,newstate) into storage; trigger <pp2psend primary, [ack(sn,c)]>; upon event <crash p > do correct:=correct-p; if p=primary then trigger <leaderelection> ; upon event <leader p> do if p=myself then recovery:= true; trigger <rbbcast [recovery(lastreq)]>; register the primary to a name server; Upon event <pp2pdeliver replica, ack(c,sn)> ack_set c,sn := ack_set c,sn U replica; Upon event ackset c,sn correct do store (sn,c,arg,result,newstate) into storage; trigger <pp2psend c, [response(result)]>; Upon event ack_recovery correct do recovery:=false; Upon <rbdeliver [recovery(lastreq)]> do if state lastreq.state then remove lastreq from storage; state:=lastreq.state; trigger <pp2psend primary, [ack_rec]>; Upon event <pp2pdeliver replica, ack_recovery> ack_recovery:= ack_recovery U replica;

Active Replication

Active Replication There is no coordinator all replicas have the same role Each replica is deterministic. If any replica starts from the same state and receives the same input, they will produce the same output As a matter of fact clients will receive the same response one from each replica 17

Active Replication 18

Active Replication: Garantire Linearizability To ensure linearlizability we need to preserve: Atomicity: if a replica executes an invocation, all correct replicas execute the same invocation. Ordering: (at least) no two correct replicas have to execute two invocations in different order. We need: TOTAL ORDER Broadcast INCLUDING THE CLIENTS! 19

Active Replication:Crash Active Replication does not need recovery action upon the failure of a replica 20